Date: Mon, 5 Feb 2018 14:44:22 -0700
From: Alex Williamson
To: Dan Williams
Cc: Haozhong Zhang, Michal Hocko, jack@suse.cz, kvm@vger.kernel.org,
 linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org,
 stable@vger.kernel.org, linux-fsdevel@vger.kernel.org, hch@lst.de
Subject: Re: [PATCH 3/3] vfio: disable filesystem-dax page pinning
Message-ID: <20180205144422.1ca67ab5@w520.home>
In-Reply-To: <151778553083.7139.6601964812589807125.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <151778551496.7139.17808629759104553625.stgit@dwillia2-desk3.amr.corp.intel.com>
 <151778553083.7139.6601964812589807125.stgit@dwillia2-desk3.amr.corp.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, 04 Feb 2018 15:05:30 -0800
Dan Williams wrote:

> Filesystem-DAX is incompatible with 'longterm' page pinning. Without
> page cache indirection a DAX mapping maps filesystem blocks directly.
> This means that the filesystem must not modify a file's block map while
> any page in a mapping is pinned. In order to prevent the situation of
> userspace holding off filesystem operations indefinitely, disallow
> 'longterm' Filesystem-DAX mappings.
>
> RDMA has the same conflict and the plan there is to add a 'with lease'
> mechanism to allow the kernel to notify userspace that the mapping is
> being torn down for block-map maintenance. Perhaps something similar can
> be put in place for vfio.
>
> Note that xfs and ext4 still report:
>
>    "DAX enabled. Warning: EXPERIMENTAL, use at your own risk"
>
> ...at mount time, and resolving the dax-dma-vs-truncate problem is one
> of the last hurdles to remove that designation.
>
> Cc: Alex Williamson
> Cc: Michal Hocko
> Cc: Christoph Hellwig
> Cc: kvm@vger.kernel.org
> Cc:
> Reported-by: Haozhong Zhang
> Fixes: d475c6346a38 ("dax,ext2: replace XIP read and write with DAX I/O")
> Signed-off-by: Dan Williams
> ---
>  drivers/vfio/vfio_iommu_type1.c |   18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)

This isn't without some expense: a vfio mapping and unmapping unit test
incurs a ~1.5% increase in system time from losing access to gup_fast().
Also, I think tce_iommu_use_page() is going to have the same problem; it
provides the same sort of functionality for a different vfio IOMMU
backend. Please take this through your tree and I'll add a todo list
item to see how we might improve this.

Acked-by: Alex Williamson

Thanks,
Alex

> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index e30e29ae4819..45657e2b1ff7 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -338,11 +338,12 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
>  {
>  	struct page *page[1];
>  	struct vm_area_struct *vma;
> +	struct vm_area_struct *vmas[1];
>  	int ret;
>
>  	if (mm == current->mm) {
> -		ret = get_user_pages_fast(vaddr, 1, !!(prot & IOMMU_WRITE),
> -					  page);
> +		ret = get_user_pages_longterm(vaddr, 1, !!(prot & IOMMU_WRITE),
> +					      page, vmas);
>  	} else {
>  		unsigned int flags = 0;
>
> @@ -351,7 +352,18 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
>
>  		down_read(&mm->mmap_sem);
>  		ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
> -					    NULL, NULL);
> +					    vmas, NULL);
> +		/*
> +		 * The lifetime of a vaddr_get_pfn() page pin is
> +		 * userspace-controlled. In the fs-dax case this could
> +		 * lead to indefinite stalls in filesystem operations.
> +		 * Disallow attempts to pin fs-dax pages via this
> +		 * interface.
> +		 */
> +		if (ret > 0 && vma_is_fsdax(vmas[0])) {
> +			ret = -EOPNOTSUPP;
> +			put_page(page[0]);
> +		}
>  		up_read(&mm->mmap_sem);
>  	}
>
>
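
For anyone following along, the ordering in the remote path above (take
the pin, inspect the vma, then undo the pin if it turns out to be fs-dax)
can be modeled in plain userspace C. This is only a rough sketch under
stated assumptions: model_page, model_vma, model_put_page() and
model_pin_longterm() are hypothetical names made up for illustration, not
kernel interfaces, and the reference count is a bare int with no locking.

```c
/*
 * Userspace model only. Every name here is a hypothetical stand-in;
 * nothing below is the kernel's definition of a page or a vma.
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_page {
	int refcount;			/* stands in for the page refcount */
};

struct model_vma {
	bool is_fsdax;			/* stands in for vma_is_fsdax(vma) */
	struct model_page *page;	/* single page behind this mapping */
};

/* Drop one reference, the role put_page() plays in the patch. */
static void model_put_page(struct model_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/*
 * Pin the page behind @vma. Returns 1 when one page was pinned, or
 * -EOPNOTSUPP when the mapping is fs-dax, in which case the reference
 * taken by the pin is dropped again before returning.
 */
static int model_pin_longterm(struct model_vma *vma, struct model_page **out)
{
	int ret;

	/* The pin is taken first; the vma is only inspected afterwards. */
	vma->page->refcount++;
	*out = vma->page;
	ret = 1;

	if (ret > 0 && vma->is_fsdax) {
		ret = -EOPNOTSUPP;
		model_put_page(*out);
		*out = NULL;	/* model-only tidy-up, not in the patch */
	}
	return ret;
}
```

The point of the model is that the failure path leaves the reference
count where it started, so a rejected fs-dax pin does not leak a
reference that would keep holding off the filesystem.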