Received: by 2002:a25:2c96:0:0:0:0:0 with SMTP id s144csp218633ybs; Tue, 26 May 2020 07:35:27 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxHF5fYRKZqY+zQlJBOx0Q6tnh15tXYEV64njPD6V5CNwZVcY54Xr932ejZzD9icgz/aXup X-Received: by 2002:a17:906:9719:: with SMTP id k25mr1417209ejx.411.1590503726878; Tue, 26 May 2020 07:35:26 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1590503726; cv=none; d=google.com; s=arc-20160816; b=Z99EQy8+KwhbNNZ/16kCtZo1UubkJvLLRgoG8LDlqfoUNyAZXpPibizCUaDl5hCf/X O+BoJ9YghNUzz1U+iA3HlVX4zBm/CBpAazKoPituHXau8iDrwkXk5CzClvTLrr7Iv0Or ARijI/MscQg7MFh83IrygVMOqlKx+3WzTkqY0QedFvpwrCCjGFpn1VL+CISaxswK08sn 1sTzvw1qBtHSMiwKDtFRWMqHkryeZvWsF//GgXhKwHYYPTm1Q7Hme7Gl1mAkG2sDB9AN qbOxJah5HPX5e6Pi6lFdLk7rKkIvxI7hBjQ0cKIxpUUjhXtK3D74JWazDsgv0k5VgbAL LB9A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :organization:references:in-reply-to:message-id:subject:cc:to:from :date:dkim-signature; bh=ABpC662MbprtTlQAkgM8E5X8nM5UH23rSp9Nzpyl7nM=; b=c1g5ovbpSO7AYWApMUoIYdE1Jbld31fwOuqE0yBh3aBiIeT7D8+Fj9czT0BbXeNjLb S6ZpzFzgYE+23bTa5lhYNWLhifWy7RZo0hj7NjisNqto+Akb35XOwph+nGrCzTUXAxwY 8c37RdExPPgWgtP5LWVq8bPn+uxL13IwgEpBAXnq3obaCkRxw+YSrDa2MRxI8ydxWt8A /pV28EK4zru97pdtWeLBDxD3uOGx95Cp5tvjUgDUH9+no+PF5rIEaFXIo5m8h2+kLc7X YL119Lo82Ui7K9KsebohzzOSw+jq7p4w8wPP4hzYSfvl8K46QI+bDJAesXTs9vg7V9/H iKtQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=BifQQgy4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id m22si11368092edp.611.2020.05.26.07.35.03; Tue, 26 May 2020 07:35:26 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=BifQQgy4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729016AbgEZOc3 (ORCPT + 99 others); Tue, 26 May 2020 10:32:29 -0400 Received: from us-smtp-delivery-1.mimecast.com ([207.211.31.120]:32285 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726856AbgEZOc3 (ORCPT ); Tue, 26 May 2020 10:32:29 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1590503547; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ABpC662MbprtTlQAkgM8E5X8nM5UH23rSp9Nzpyl7nM=; b=BifQQgy4e5IB8gk3OweWNmpvAqc4LBI7TpTbFS8YquRzN1T3GGyjjhtGTaS95rBM5sxizc R9g3CclyAZo7ik0134LUnUAKr4HLzZmeR9WukgkuEpM/GqgRo9UEow8aqaAEeERzWIkgh7 lzKhg/P3jKJDxyfpIH9FWX4JmYMSr/8= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-73-H3XJY1yJN3-ObR5G8tu1EA-1; Tue, 26 May 2020 10:32:25 -0400 X-MC-Unique: H3XJY1yJN3-ObR5G8tu1EA-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id C776B800053; Tue, 26 May 2020 14:32:22 +0000 (UTC) Received: from x1.home (ovpn-114-203.phx2.redhat.com [10.3.114.203]) by smtp.corp.redhat.com (Postfix) with ESMTP id EB99E78B52; Tue, 26 May 2020 14:32:19 +0000 (UTC) Date: Tue, 26 May 2020 08:32:18 -0600 From: Alex Williamson To: Peter Xu Cc: Jason Gunthorpe , John Hubbard , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, cohuck@redhat.com, cai@lca.pw, Andrea Arcangeli Subject: Re: [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory Message-ID: <20200526083218.40402f01@x1.home> In-Reply-To: <20200526134954.GA1125781@xz-x1> References: <20200523193417.GI766834@xz-x1> <20200523170602.5eb09a66@x1.home> <20200523235257.GC939059@xz-x1> <20200525122607.GC744@ziepe.ca> <20200525142806.GC1058657@xz-x1> <20200525144651.GE744@ziepe.ca> <20200525151142.GE1058657@xz-x1> <20200525165637.GG744@ziepe.ca> <3d9c1c8b-5278-1c4d-0e9c-e6f8fdb75853@nvidia.com> <20200526003705.GK744@ziepe.ca> <20200526134954.GA1125781@xz-x1> Organization: Red Hat MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 26 May 2020 09:49:54 -0400 Peter Xu wrote: > On Mon, May 25, 2020 at 09:37:05PM -0300, Jason Gunthorpe wrote: > > On Mon, May 25, 2020 at 01:56:28PM -0700, John Hubbard wrote: > > > On 2020-05-25 09:56, Jason Gunthorpe wrote: > > > > On Mon, May 25, 2020 at 11:11:42AM -0400, Peter Xu wrote: > > > > > On Mon, May 25, 2020 at 11:46:51AM -0300, Jason Gunthorpe wrote: > > > > > > On Mon, May 25, 2020 at 10:28:06AM -0400, Peter Xu wrote: > > > > > > > On Mon, May 25, 2020 at 09:26:07AM -0300, Jason Gunthorpe wrote: > > > > > > > > On Sat, May 23, 2020 at 07:52:57PM -0400, Peter Xu wrote: > > > > > > > > > > > > > > > > > For what I understand now, IMHO we should still need all those handlings of > > > > > > > > > FAULT_FLAG_RETRY_NOWAIT like in the initial version. E.g., IIUC KVM gup will > > > > > > > > > try with FOLL_NOWAIT when async is allowed, before the complete slow path. I'm > > > > > > > > > not sure what would be the side effect of that if fault() blocked it. E.g., > > > > > > > > > the caller could be in an atomic context. > > > > > > > > > > > > > > > > AFAICT FAULT_FLAG_RETRY_NOWAIT only impacts what happens when > > > > > > > > VM_FAULT_RETRY is returned, which this doesn't do? > > > > > > > > > > > > > > Yes, that's why I think we should still properly return VM_FAULT_RETRY if > > > > > > > needed.. because IMHO it is still possible that the caller calls with > > > > > > > FAULT_FLAG_RETRY_NOWAIT. > > > > > > > > > > > > > > My understanding is that FAULT_FLAG_RETRY_NOWAIT majorly means: > > > > > > > > > > > > > > - We cannot release the mmap_sem, and, > > > > > > > - We cannot sleep > > > > > > > > > > > > Sleeping looks fine, look at any FS implementation of fault, say, > > > > > > xfs. The first thing it does is xfs_ilock() which does down_write(). > > > > > > > > > > Yeah. My wild guess is that maybe fs code will always be without > > > > > FAULT_FLAG_RETRY_NOWAIT so it's safe to sleep unconditionally (e.g., I think > > > > > the general #PF should be fine to sleep in fault(); gup should be special, but > > > > > I didn't observe any gup code called upon file systems)? > > > > > > > > get_user_pages is called on filesystem backed pages. > > > > > > > > I have no idea what FAULT_FLAG_RETRY_NOWAIT is supposed to do. Maybe > > > > John was able to guess when he reworked that stuff? > > > > > > > > > > Although I didn't end up touching that particular area, I'm sure it's going > > > to come up sometime soon, so I poked around just now, and found that > > > FAULT_FLAG_RETRY_NOWAIT was added almost exactly 9 years ago. This flag was > > > intended to make KVM and similar things behave better when doing GUP on > > > file-backed pages that might, or might not be in memory. > > > > > > The idea is described in the changelog, but not in the code comments or > > > Documentation, sigh: > > > > > > commit 318b275fbca1ab9ec0862de71420e0e92c3d1aa7 > > > Author: Gleb Natapov > > > Date: Tue Mar 22 16:30:51 2011 -0700 > > > > > > mm: allow GUP to fail instead of waiting on a page > > > > > > GUP user may want to try to acquire a reference to a page if it is already > > > in memory, but not if IO, to bring it in, is needed. For example KVM may > > > tell vcpu to schedule another guest process if current one is trying to > > > access swapped out page. Meanwhile, the page will be swapped in and the > > > guest process, that depends on it, will be able to run again. > > > > > > This patch adds FAULT_FLAG_RETRY_NOWAIT (suggested by Linus) and > > > FOLL_NOWAIT follow_page flags. FAULT_FLAG_RETRY_NOWAIT, when used in > > > conjunction with VM_FAULT_ALLOW_RETRY, indicates to handle_mm_fault that > > > it shouldn't drop mmap_sem and wait on a page, but return VM_FAULT_RETRY > > > instead. > > > > So, from kvm's perspective it was to avoid excessively long blocking in > > common paths when it could rejoin the completed IO by somehow waiting > > on a page itself? > > > > It all seems like it should not be used unless the page is going to go > > to IO? > > I think NOWAIT is used as a common flag for kvm for its initial attempt to > fault in a normal page, however... I just noticed another fact that actually > __get_user_pages() won't work with PFNMAP (check_vma_flags should fail), but > KVM just started to support fault() for PFNMAP from commit add6a0cd1c5b (2016) > using fixup_user_fault(), where nvidia seems to have a similar request to have > a fault handler on some mapped BARs. > > > > > Certainly there is no reason to optimize the fringe case of vfio > > sleeping if there is and incorrect concurrnent attempt to disable the > > a BAR. > > If fixup_user_fault() (which is always with ALLOW_RETRY && !RETRY_NOWAIT) is > the only path for the new fault(), then current way seems ok. Not sure whether > this would worth a WARN_ON_ONCE(RETRY_NOWAIT) in the fault() to be clear of > that fact. Thanks for the discussion over the weekend folks. Peter, I take it you'd be satisfied if this patch were updated as: diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index aabba6439a5b..35bd7cd4e268 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -1528,6 +1528,13 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf) struct vfio_pci_device *vdev = vma->vm_private_data; vm_fault_t ret = VM_FAULT_NOPAGE; + /* + * We don't expect to be called with NOWAIT and there are conflicting + * opinions on whether NOWAIT suggests we shouldn't wait for locks or + * just shouldn't wait for I/O. + */ + WARN_ON_ONCE(vmf->flags & FAULT_FLAG_RETRY_NOWAIT); + mutex_lock(&vdev->vma_lock); down_read(&vdev->memory_lock); Is that correct? Thanks, Alex