Subject: [PATCH v3 2/3] vfio-pci: Fault mmaps to enable vma tracking
From: Alex Williamson
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, cohuck@redhat.com, jgg@ziepe.ca, peterx@redhat.com, cai@lca.pw
Date: Fri, 22 May 2020 13:17:32 -0600
Message-ID: <159017505275.18853.4012365704798047911.stgit@gimli.home>
In-Reply-To: <159017449210.18853.15037950701494323009.stgit@gimli.home>
References: <159017449210.18853.15037950701494323009.stgit@gimli.home>
User-Agent: StGit/0.19-dirty

Rather than calling remap_pfn_range() when a region is mmap'd, set up a
vm_ops handler to support dynamic faulting of the range on access.  This
allows us to manage a list of vmas actively mapping the area that we can
later use to invalidate those mappings.  The open callback invalidates
the vma range so that all tracking is inserted in the fault handler and
removed in the close handler.

Reviewed-by: Peter Xu
Signed-off-by: Alex Williamson
---
 drivers/vfio/pci/vfio_pci.c         |   76 ++++++++++++++++++++++++++++++++++-
 drivers/vfio/pci/vfio_pci_private.h |    7 +++
 2 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 6c6b37b5c04e..66a545a01f8f 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1299,6 +1299,70 @@ static ssize_t vfio_pci_write(void *device_data, const char __user *buf,
 	return vfio_pci_rw(device_data, (char __user *)buf, count, ppos, true);
 }
 
+static int vfio_pci_add_vma(struct vfio_pci_device *vdev,
+			    struct vm_area_struct *vma)
+{
+	struct vfio_pci_mmap_vma *mmap_vma;
+
+	mmap_vma = kmalloc(sizeof(*mmap_vma), GFP_KERNEL);
+	if (!mmap_vma)
+		return -ENOMEM;
+
+	mmap_vma->vma = vma;
+
+	mutex_lock(&vdev->vma_lock);
+	list_add(&mmap_vma->vma_next, &vdev->vma_list);
+	mutex_unlock(&vdev->vma_lock);
+
+	return 0;
+}
+
+/*
+ * Zap mmaps on open so that we can fault them in on access and therefore
+ * our vma_list only tracks mappings accessed since last zap.
+ */
+static void vfio_pci_mmap_open(struct vm_area_struct *vma)
+{
+	zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start);
+}
+
+static void vfio_pci_mmap_close(struct vm_area_struct *vma)
+{
+	struct vfio_pci_device *vdev = vma->vm_private_data;
+	struct vfio_pci_mmap_vma *mmap_vma;
+
+	mutex_lock(&vdev->vma_lock);
+	list_for_each_entry(mmap_vma, &vdev->vma_list, vma_next) {
+		if (mmap_vma->vma == vma) {
+			list_del(&mmap_vma->vma_next);
+			kfree(mmap_vma);
+			break;
+		}
+	}
+	mutex_unlock(&vdev->vma_lock);
+}
+
+static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct vfio_pci_device *vdev = vma->vm_private_data;
+
+	if (vfio_pci_add_vma(vdev, vma))
+		return VM_FAULT_OOM;
+
+	if (remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
+			    vma->vm_end - vma->vm_start, vma->vm_page_prot))
+		return VM_FAULT_SIGBUS;
+
+	return VM_FAULT_NOPAGE;
+}
+
+static const struct vm_operations_struct vfio_pci_mmap_ops = {
+	.open = vfio_pci_mmap_open,
+	.close = vfio_pci_mmap_close,
+	.fault = vfio_pci_mmap_fault,
+};
+
 static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 {
 	struct vfio_pci_device *vdev = device_data;
@@ -1357,8 +1421,14 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
 	vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;
 
-	return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
-			       req_len, vma->vm_page_prot);
+	/*
+	 * See remap_pfn_range(), called from vfio_pci_mmap_fault() but we can't
+	 * change vm_flags within the fault handler. Set them now.
+	 */
+	vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
+	vma->vm_ops = &vfio_pci_mmap_ops;
+
+	return 0;
 }
 
 static void vfio_pci_request(void *device_data, unsigned int count)
@@ -1608,6 +1678,8 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	spin_lock_init(&vdev->irqlock);
 	mutex_init(&vdev->ioeventfds_lock);
 	INIT_LIST_HEAD(&vdev->ioeventfds_list);
+	mutex_init(&vdev->vma_lock);
+	INIT_LIST_HEAD(&vdev->vma_list);
 
 	ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
 	if (ret)
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 36ec69081ecd..9b25f9f6ce1d 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -92,6 +92,11 @@ struct vfio_pci_vf_token {
 	int			users;
 };
 
+struct vfio_pci_mmap_vma {
+	struct vm_area_struct	*vma;
+	struct list_head	vma_next;
+};
+
 struct vfio_pci_device {
 	struct pci_dev		*pdev;
 	void __iomem		*barmap[PCI_STD_NUM_BARS];
@@ -132,6 +137,8 @@ struct vfio_pci_device {
 	struct list_head	ioeventfds_list;
 	struct vfio_pci_vf_token	*vf_token;
 	struct notifier_block	nb;
+	struct mutex		vma_lock;
+	struct list_head	vma_list;
 };
 
 #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
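
For readers who want to follow the bookkeeping without a kernel tree, here is
a minimal userspace sketch of the tracking scheme described in the changelog.
It is an illustration under stated assumptions, not the driver code: every
model_* name is hypothetical, the "mapped" flag stands in for populated PTEs,
a hand-rolled singly linked list stands in for list_head, and the vma_lock
mutex is left out because the model is single-threaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vm_area_struct. */
struct model_vma {
	int mapped;	/* 1 while the "PTEs" are populated */
};

/* Mirrors struct vfio_pci_mmap_vma: one node per tracked mapping. */
struct model_mmap_vma {
	struct model_vma *vma;
	struct model_mmap_vma *next;
};

/* Mirrors the vma_list head added to struct vfio_pci_device. */
struct model_dev {
	struct model_mmap_vma *vma_list;
};

/* Fault path: record the vma, then populate it (remap_pfn_range() in the patch). */
static int model_fault(struct model_dev *dev, struct model_vma *vma)
{
	struct model_mmap_vma *node = malloc(sizeof(*node));

	if (!node)
		return -1;	/* the patch returns VM_FAULT_OOM here */

	node->vma = vma;
	node->next = dev->vma_list;	/* the patch holds vma_lock around this */
	dev->vma_list = node;

	vma->mapped = 1;
	return 0;
}

/* Open path: zap the new vma so its first access goes through model_fault(). */
static void model_open(struct model_vma *vma)
{
	vma->mapped = 0;
}

/* Close path: drop the first tracking node that refers to this vma, if any. */
static void model_close(struct model_dev *dev, struct model_vma *vma)
{
	struct model_mmap_vma **link;

	for (link = &dev->vma_list; *link; link = &(*link)->next) {
		if ((*link)->vma == vma) {
			struct model_mmap_vma *node = *link;

			*link = node->next;
			free(node);
			break;
		}
	}
}

/* Number of mappings currently tracked. */
static size_t model_tracked(const struct model_dev *dev)
{
	const struct model_mmap_vma *node;
	size_t n = 0;

	for (node = dev->vma_list; node; node = node->next)
		n++;
	return n;
}
```

In this model a mapping enters the list only through model_fault(). A copy
that has been through model_open() is unmapped and untracked until it faults,
and model_close() on a vma that never faulted is a no-op.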