Date: Thu, 7 Feb 2019 09:35:04 -0800
From: Ira Weiny
To: Christopher Lameter
Cc: Doug Ledford, Dan Williams, Jason Gunthorpe, Dave Chinner, Matthew Wilcox, Jan Kara, lsf-pc@lists.linux-foundation.org, linux-rdma, Linux MM, Linux Kernel Mailing List, John Hubbard, Jerome Glisse, Michal Hocko
Subject: Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA
Message-ID: <20190207173504.GD29531@iweiny-DESK2.sc.intel.com>
References: <20190206173114.GB12227@ziepe.ca> <20190206175233.GN21860@bombadil.infradead.org> <47820c4d696aee41225854071ec73373a273fd4a.camel@redhat.com>
 <01000168c43d594c-7979fcf8-b9c1-4bda-b29a-500efe001d66-000000@email.amazonses.com> <20190206210356.GZ6173@dastard> <20190206220828.GJ12227@ziepe.ca> <0c868bc615a60c44d618fb0183fcbe0c418c7c83.camel@redhat.com> <01000168c8e2de6b-9ab820ed-38ad-469c-b210-60fcff8ea81c-000000@email.amazonses.com>
In-Reply-To: <01000168c8e2de6b-9ab820ed-38ad-469c-b210-60fcff8ea81c-000000@email.amazonses.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Feb 07, 2019 at 04:55:37PM +0000, Christopher Lameter wrote:
> One approach that may be a clean way to solve this:
>
> 1. Long term GUP usage requires the virtual mapping to the pages be fixed
>    for the duration of the GUP Map. There never has been a way to break
>    the pinning and thus this needs to be preserved.

How does this fit in with the changes John is making?

>
> 2. Page Cache Long term pins are not allowed since regular filesystems
>    depend on COW and other tricks which are incompatible with a long term
>    pin.

Unless the hardware supports ODP or equivalent functionality.  Right?

>
> 3. Filesystems that allow bypass of the page cache (like XFS / DAX) will
>    provide the virtual mapping when the PIN is done and DO NO OPERATIONS
>    on the longterm pinned range until the long term pin is removed.
>    Hardware may do its job (like for persistent memory) but no data
>    consistency on the NVDIMM medium is guaranteed until the long term pin
>    is removed and the filesystem regains control over the area.

I believe Dan attempted something like this and it became pretty difficult.

>
> 4. Long term pin means that the mapped sections are an actively used part
>    of the file (like a filesystem write) and it cannot be truncated for
>    the duration of the pin.
>    It can be thought of as if the truncate is
>    immediately followed by a write extending the file again. The mapping
>    by RDMA implies after all that remote writes can occur at any time
>    within the area pinned long term.
>

This is a very interesting idea.  I've never quite thought of it that way.
That would be essentially like failing the truncate but without actually
failing it... sneaky.  ;-)

What if user space then writes to the end of the file?  Does that write end
up at the point they truncated to or off the end of the mmapped area (old
length)?

I can see the behavior being defined either way.  But one interferes with the
RDMA data and the other does not.  Not sure which is easier for the FS to
handle either.

Ira
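
To make the truncate-then-append question concrete, here is a minimal
user-space sketch of what happens today on an ordinary file with no long term
pin involved.  It is only an illustration under stated assumptions: the helper
name append_after_truncate and the 8192/4096 byte sizes are made up for the
example, and nothing here models RDMA, GUP or DAX.  It shows the baseline the
proposal would have to either keep (append lands at the truncated length) or
change (append lands past the old length).

```python
import os
import tempfile

PAGE = 4096  # illustrative size only, not tied to the real page size


def append_after_truncate(old_len=2 * PAGE, new_len=PAGE, payload=b"tail"):
    """Shrink a file, then append to it; return (append offset, final size).

    Hypothetical helper for illustration. No pin is held on the file, so
    this only demonstrates today's ordinary truncate semantics.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * old_len)   # file starts out old_len bytes long
        os.ftruncate(fd, new_len)       # user space truncates it
        # O_APPEND positions every write at the current end of file.
        afd = os.open(path, os.O_WRONLY | os.O_APPEND)
        try:
            os.write(afd, payload)
        finally:
            os.close(afd)
        size = os.fstat(fd).st_size
        # Read the payload back from where the file used to end after truncate.
        os.lseek(fd, new_len, os.SEEK_SET)
        assert os.read(fd, len(payload)) == payload
        return size - len(payload), size
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    offset, size = append_after_truncate()
    print("append landed at", offset, "final size", size)
```

Without a pin the append lands at the truncated length (4096 here).  Under the
"truncate immediately followed by a write" reading, the open question is
whether it should instead land at the old length (8192), beyond the pinned
range.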