Subject: [PATCH v2 5/5] vfio: disable filesystem-dax page pinning
From: Dan Williams
To: linux-nvdimm@lists.01.org
Cc: Haozhong Zhang, Michal Hocko, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org, linux-mm@kvack.org, Alex Williamson, linux-fsdevel@vger.kernel.org, Christoph Hellwig
Date: Thu, 22 Feb 2018 23:18:06 -0800
Message-ID: <151937028640.18973.6759836444320779319.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <151937026001.18973.12034171121582300402.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <151937026001.18973.12034171121582300402.stgit@dwillia2-desk3.amr.corp.intel.com>
User-Agent: StGit/0.17.1-9-g687f
X-Mailing-List: linux-kernel@vger.kernel.org

Filesystem-DAX is incompatible with 'longterm' page pinning. Without
page cache indirection a DAX mapping maps filesystem blocks directly.
This means that the filesystem must not modify a file's block map while
any page in a mapping is pinned. In order to prevent the situation of
userspace holding off filesystem operations indefinitely, disallow
'longterm' Filesystem-DAX mappings.

RDMA has the same conflict and the plan there is to add a 'with lease'
mechanism to allow the kernel to notify userspace that the mapping is
being torn down for block-map maintenance. Perhaps something similar can
be put in place for vfio.

Note that xfs and ext4 still report:

   "DAX enabled. Warning: EXPERIMENTAL, use at your own risk"

...at mount time, and resolving the dax-dma-vs-truncate problem is one
of the last hurdles to remove that designation.

Acked-by: Alex Williamson
Cc: Michal Hocko
Cc: Christoph Hellwig
Cc: kvm@vger.kernel.org
Cc:
Reported-by: Haozhong Zhang
Fixes: d475c6346a38 ("dax,ext2: replace XIP read and write with DAX I/O")
Signed-off-by: Dan Williams
---
 drivers/vfio/vfio_iommu_type1.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index e30e29ae4819..45657e2b1ff7 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -338,11 +338,12 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
 {
 	struct page *page[1];
 	struct vm_area_struct *vma;
+	struct vm_area_struct *vmas[1];
 	int ret;
 
 	if (mm == current->mm) {
-		ret = get_user_pages_fast(vaddr, 1, !!(prot & IOMMU_WRITE),
-					  page);
+		ret = get_user_pages_longterm(vaddr, 1, !!(prot & IOMMU_WRITE),
+					      page, vmas);
 	} else {
 		unsigned int flags = 0;
 
@@ -351,7 +352,18 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
 
 		down_read(&mm->mmap_sem);
 		ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
-					    NULL, NULL);
+					    vmas, NULL);
+		/*
+		 * The lifetime of a vaddr_get_pfn() page pin is
+		 * userspace-controlled. In the fs-dax case this could
+		 * lead to indefinite stalls in filesystem operations.
+		 * Disallow attempts to pin fs-dax pages via this
+		 * interface.
+		 */
+		if (ret > 0 && vma_is_fsdax(vmas[0])) {
+			ret = -EOPNOTSUPP;
+			put_page(page[0]);
+		}
 		up_read(&mm->mmap_sem);
 	}
 
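
For readers less familiar with the pattern in the get_user_pages_remote()
branch, here is a rough userspace model of it: pin the page, look at the
VMA that backs it, and if that VMA is fs-dax, drop the pin again and
return -EOPNOTSUPP. This is only a sketch and not kernel code. Every
model_* name below is hypothetical and merely stands in for struct page,
struct vm_area_struct, the GUP call, put_page() and vma_is_fsdax().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only a pin count. */
struct model_page {
	int refcount;
};

/* Hypothetical stand-in for struct vm_area_struct. */
struct model_vma {
	bool is_fsdax;              /* what vma_is_fsdax() would report */
	struct model_page *backing; /* the single page behind this VMA */
};

/* Models a successful one-page GUP: take a pin and report the VMA. */
int model_pin_one(struct model_vma *vma, struct model_page **page,
		  struct model_vma **vma_out)
{
	vma->backing->refcount++;
	*page = vma->backing;
	*vma_out = vma;
	return 1; /* number of pages pinned */
}

/* Models put_page(): drop one pin. */
void model_unpin(struct model_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/*
 * Pin, inspect, release: the pin is taken first because the VMA is
 * only known once the lookup has succeeded. An fs-dax backed page is
 * released immediately and the caller gets -EOPNOTSUPP.
 */
int model_pin_longterm(struct model_vma *vma, struct model_page **page)
{
	struct model_vma *found;
	int ret = model_pin_one(vma, page, &found);

	if (ret > 0 && found->is_fsdax) {
		model_unpin(*page);
		*page = NULL;
		ret = -EOPNOTSUPP;
	}
	return ret;
}
```

In this model, calling model_pin_longterm() on a VMA with is_fsdax set
leaves the page's refcount where it started, which is the property the
put_page() in the hunk above is there to preserve.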