Subject: [PATCH 3/3] vfio: disable filesystem-dax page pinning
From: Dan Williams
To: alex.williamson@redhat.com
Cc: Haozhong Zhang, Michal Hocko, jack@suse.cz, kvm@vger.kernel.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org, linux-fsdevel@vger.kernel.org, hch@lst.de
Date: Sun, 04 Feb 2018 15:05:30 -0800
Message-ID: <151778553083.7139.6601964812589807125.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <151778551496.7139.17808629759104553625.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <151778551496.7139.17808629759104553625.stgit@dwillia2-desk3.amr.corp.intel.com>
User-Agent: StGit/0.17.1-9-g687f

Filesystem-DAX is incompatible with 'longterm' page pinning. Without
page cache indirection, a DAX mapping maps filesystem blocks directly.
This means that the filesystem must not modify a file's block map while
any page in a mapping is pinned. In order to prevent the situation of
userspace holding off filesystem operations indefinitely, disallow
'longterm' Filesystem-DAX mappings.

RDMA has the same conflict, and the plan there is to add a 'with lease'
mechanism to allow the kernel to notify userspace that the mapping is
being torn down for block-map maintenance. Perhaps something similar can
be put in place for vfio.

Note that xfs and ext4 still report:

   "DAX enabled. Warning: EXPERIMENTAL, use at your own risk"

...at mount time, and resolving the dax-dma-vs-truncate problem is one
of the last hurdles to remove that designation.

Cc: Alex Williamson
Cc: Michal Hocko
Cc: Christoph Hellwig
Cc: kvm@vger.kernel.org
Cc:
Reported-by: Haozhong Zhang
Fixes: d475c6346a38 ("dax,ext2: replace XIP read and write with DAX I/O")
Signed-off-by: Dan Williams
---
 drivers/vfio/vfio_iommu_type1.c |   18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index e30e29ae4819..45657e2b1ff7 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -338,11 +338,12 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
 {
 	struct page *page[1];
 	struct vm_area_struct *vma;
+	struct vm_area_struct *vmas[1];
 	int ret;
 
 	if (mm == current->mm) {
-		ret = get_user_pages_fast(vaddr, 1, !!(prot & IOMMU_WRITE),
-					  page);
+		ret = get_user_pages_longterm(vaddr, 1, !!(prot & IOMMU_WRITE),
+					      page, vmas);
 	} else {
 		unsigned int flags = 0;
 
@@ -351,7 +352,18 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
 
 		down_read(&mm->mmap_sem);
 		ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
-					    NULL, NULL);
+					    vmas, NULL);
+		/*
+		 * The lifetime of a vaddr_get_pfn() page pin is
+		 * userspace-controlled. In the fs-dax case this could
+		 * lead to indefinite stalls in filesystem operations.
+		 * Disallow attempts to pin fs-dax pages via this
+		 * interface.
+		 */
+		if (ret > 0 && vma_is_fsdax(vmas[0])) {
+			ret = -EOPNOTSUPP;
+			put_page(page[0]);
+		}
 		up_read(&mm->mmap_sem);
 	}
 
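
For readers who want to experiment with the pin-then-reject pattern outside the kernel, here is a minimal userspace sketch of the check added in the remote-mm branch above. It is an illustration only, not kernel code: struct mock_vma, struct mock_page and every mock_*() helper are hypothetical stand-ins for vm_area_struct, struct page, get_user_pages_remote(), vma_is_fsdax() and put_page(), and locking (mmap_sem) is left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for vm_area_struct; only models the fs-dax property. */
struct mock_vma {
	bool fsdax;	/* true if the mapping is backed by a filesystem-DAX file */
};

/* Hypothetical stand-in for struct page; only models the reference count. */
struct mock_page {
	int refcount;
};

/* Models vma_is_fsdax(): reports whether the mapping is filesystem-DAX. */
static bool mock_vma_is_fsdax(const struct mock_vma *vma)
{
	return vma->fsdax;
}

/* Models a successful one-page get_user_pages*() call: takes a reference. */
static int mock_pin_page(struct mock_page *page)
{
	page->refcount++;
	return 1;	/* number of pages pinned */
}

/* Models put_page(): drops a reference previously taken by a pin. */
static void mock_put_page(struct mock_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/*
 * Pin one page for a userspace-controlled lifetime. If the page turns out
 * to belong to an fs-dax mapping, drop the pin again and fail, so that the
 * caller can never hold off filesystem block-map changes indefinitely.
 * Returns 1 on success or -EOPNOTSUPP for an fs-dax mapping.
 */
static int mock_pin_longterm(struct mock_page *page, const struct mock_vma *vma)
{
	int ret = mock_pin_page(page);

	if (ret > 0 && mock_vma_is_fsdax(vma)) {
		ret = -EOPNOTSUPP;
		mock_put_page(page);
	}
	return ret;
}
```

Calling mock_pin_longterm() with a non-DAX mock_vma returns 1 and leaves one reference held on the page. Calling it with fsdax set returns -EOPNOTSUPP and leaves the reference count unchanged, which mirrors the put_page() in the error path of the patch.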