Date: Mon, 5 Feb 2018 11:46:53 +0800
From: Haozhong Zhang
To: Dan Williams
Cc: alex.williamson@redhat.com, Michal Hocko, jack@suse.cz,
    kvm@vger.kernel.org, linux-nvdimm@lists.01.org,
    linux-kernel@vger.kernel.org, stable@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, hch@lst.de
Subject: Re: [PATCH 3/3] vfio: disable filesystem-dax page pinning
Message-ID: <20180205034653.mfnla2nq55ikkhav@hz-desktop>
In-Reply-To: <151778553083.7139.6601964812589807125.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <151778551496.7139.17808629759104553625.stgit@dwillia2-desk3.amr.corp.intel.com>
    <151778553083.7139.6601964812589807125.stgit@dwillia2-desk3.amr.corp.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 02/04/18 15:05 -0800, Dan Williams wrote:
> Filesystem-DAX is incompatible with 'longterm' page pinning. Without
> page cache indirection a DAX mapping maps filesystem blocks directly.
> This means that the filesystem must not modify a file's block map while
> any page in a mapping is pinned. In order to prevent the situation of
> userspace holding of filesystem operations indefinitely, disallow
> 'longterm' Filesystem-DAX mappings.
>
> RDMA has the same conflict and the plan there is to add a 'with lease'
> mechanism to allow the kernel to notify userspace that the mapping is
> being torn down for block-map maintenance. Perhaps something similar can
> be put in place for vfio.
>
> Note that xfs and ext4 still report:
>
>    "DAX enabled. Warning: EXPERIMENTAL, use at your own risk"
>
> ...at mount time, and resolving the dax-dma-vs-truncate problem is one
> of the last hurdles to remove that designation.
>
> Cc: Alex Williamson
> Cc: Michal Hocko
> Cc: Christoph Hellwig
> Cc: kvm@vger.kernel.org
> Cc:
> Reported-by: Haozhong Zhang
> Fixes: d475c6346a38 ("dax,ext2: replace XIP read and write with DAX I/O")
> Signed-off-by: Dan Williams
> ---
>  drivers/vfio/vfio_iommu_type1.c |   18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index e30e29ae4819..45657e2b1ff7 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -338,11 +338,12 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
>  {
>          struct page *page[1];
>          struct vm_area_struct *vma;
> +        struct vm_area_struct *vmas[1];
>          int ret;
>
>          if (mm == current->mm) {
> -                ret = get_user_pages_fast(vaddr, 1, !!(prot & IOMMU_WRITE),
> -                                          page);
> +                ret = get_user_pages_longterm(vaddr, 1, !!(prot & IOMMU_WRITE),
> +                                              page, vmas);

vmas is not used subsequently if this branch is taken, so can we use
NULL here?

Thanks,
Haozhong

>          } else {
>                  unsigned int flags = 0;
>
> @@ -351,7 +352,18 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
>
>                  down_read(&mm->mmap_sem);
>                  ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
> -                                            NULL, NULL);
> +                                            vmas, NULL);
> +                /*
> +                 * The lifetime of a vaddr_get_pfn() page pin is
> +                 * userspace-controlled. In the fs-dax case this could
> +                 * lead to indefinite stalls in filesystem operations.
> +                 * Disallow attempts to pin fs-dax pages via this
> +                 * interface.
> +                 */
> +                if (ret > 0 && vma_is_fsdax(vmas[0])) {
> +                        ret = -EOPNOTSUPP;
> +                        put_page(page[0]);
> +                }
>                  up_read(&mm->mmap_sem);
>          }
>
>
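
For anyone following the control flow of the second hunk, here is a rough
userspace model of the pin-then-reject sequence in the remote branch. It is
only a sketch, not kernel code: the struct layouts and every mock_* helper
are hypothetical stand-ins for the kernel symbols of similar name, and
model_pin_remote() is an illustrative name that does not appear in the patch.

```c
/*
 * Userspace model only. All types and mock_* helpers are hypothetical
 * stand-ins for the kernel symbols they are named after.
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct vm_area_struct {
	bool is_fsdax;			/* models what vma_is_fsdax() reports */
};

struct page {
	int refcount;			/* models the pin reference */
	struct vm_area_struct *vma;	/* mapping that backs this page */
};

static bool mock_vma_is_fsdax(const struct vm_area_struct *vma)
{
	return vma->is_fsdax;
}

static void mock_put_page(struct page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/* Stand-in for get_user_pages_remote(): pins one page, reports its VMA. */
static long mock_get_user_pages_remote(struct page *target,
				       struct page **pages,
				       struct vm_area_struct **vmas)
{
	target->refcount++;
	pages[0] = target;
	if (vmas)
		vmas[0] = target->vma;
	return 1;
}

/* Mirrors the added check: pin first, then drop the pin if it is fs-dax. */
static long model_pin_remote(struct page *target)
{
	struct page *page[1];
	struct vm_area_struct *vmas[1];
	long ret;

	ret = mock_get_user_pages_remote(target, page, vmas);
	if (ret > 0 && mock_vma_is_fsdax(vmas[0])) {
		ret = -EOPNOTSUPP;
		mock_put_page(page[0]);
	}
	return ret;
}
```

In this model a page backed by an fs-dax mapping comes back as -EOPNOTSUPP
with its reference already dropped, while any other page stays pinned and the
call returns 1. The VMA array is only consulted on this path, which is the
reason for the question above about the other branch.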