Subject: [PATCH v1 11/14] vfio/type1: Register device notifier
From: Alex Williamson
To: alex.williamson@redhat.com
Cc: cohuck@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, jgg@nvidia.com, peterx@redhat.com
Date: Mon, 08 Mar 2021 14:49:18 -0700
Message-ID: <161524015876.3480.18404153016941080011.stgit@gimli.home>
In-Reply-To: <161523878883.3480.12103845207889888280.stgit@gimli.home>
References: <161523878883.3480.12103845207889888280.stgit@gimli.home>
User-Agent: StGit/0.21-2-g8ef5
X-Mailing-List: linux-kernel@vger.kernel.org

Impose a new default strict MMIO mapping mode where the vma for
a VM_PFNMAP mapping must be backed by a vfio device.  This allows
holding a reference to the device and registering a notifier for the
device, which additionally keeps the device in an IOMMU context for
the extent of the DMA mapping.  On notification of device release,
automatically drop the DMA mappings for it.

Signed-off-by: Alex Williamson
---
 drivers/vfio/vfio_iommu_type1.c |  163 ++++++++++++++++++++++++++++-----------
 1 file changed, 116 insertions(+), 47 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index f22c07a40521..e89f11141dee 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -101,6 +101,20 @@ struct vfio_dma {
 	struct task_struct *task;
 	struct rb_root pfn_list; /* Ex-user pinned pfn list */
 	unsigned long *bitmap;
+	struct pfnmap_obj *pfnmap;
+};
+
+/*
+ * Separate object used for tracking pfnmaps to allow reference release and
+ * unregistering notifier outside of callback chain.
+ */
+struct pfnmap_obj {
+	struct notifier_block nb;
+	struct work_struct work;
+	struct vfio_iommu *iommu;
+	struct vfio_dma *dma;
+	struct vfio_device *device;
+	unsigned long base_pfn;
 };
 
 struct vfio_batch {
@@ -506,42 +520,6 @@ static void vfio_batch_fini(struct vfio_batch *batch)
 		free_page((unsigned long)batch->pages);
 }
 
-static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
-			    unsigned long vaddr, unsigned long *pfn,
-			    bool write_fault)
-{
-	pte_t *ptep;
-	spinlock_t *ptl;
-	int ret;
-
-	ret = follow_pte(vma->vm_mm, vaddr, &ptep, &ptl);
-	if (ret) {
-		bool unlocked = false;
-
-		ret = fixup_user_fault(mm, vaddr,
-				       FAULT_FLAG_REMOTE |
-				       (write_fault ? FAULT_FLAG_WRITE : 0),
-				       &unlocked);
-		if (unlocked)
-			return -EAGAIN;
-
-		if (ret)
-			return ret;
-
-		ret = follow_pte(vma->vm_mm, vaddr, &ptep, &ptl);
-		if (ret)
-			return ret;
-	}
-
-	if (write_fault && !pte_write(*ptep))
-		ret = -EFAULT;
-	else
-		*pfn = pte_pfn(*ptep);
-
-	pte_unmap_unlock(ptep, ptl);
-	return ret;
-}
-
 /* Return 1 if iommu->lock dropped and notified, 0 if done */
 static int unmap_dma_pfn_list(struct vfio_iommu *iommu, struct vfio_dma *dma,
 			      struct vfio_dma **dma_last, int *retries)
@@ -575,6 +553,52 @@ static int unmap_dma_pfn_list(struct vfio_iommu *iommu, struct vfio_dma *dma,
 	return 0;
 }
 
+static void unregister_device_bg(struct work_struct *work)
+{
+	struct pfnmap_obj *pfnmap = container_of(work, struct pfnmap_obj, work);
+
+	vfio_device_unregister_notifier(pfnmap->device, &pfnmap->nb);
+	vfio_device_put(pfnmap->device);
+	kfree(pfnmap);
+}
+
+static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma);
+
+static int vfio_device_nb_cb(struct notifier_block *nb,
+			     unsigned long action, void *unused)
+{
+	struct pfnmap_obj *pfnmap = container_of(nb, struct pfnmap_obj, nb);
+
+	switch (action) {
+	case VFIO_DEVICE_RELEASE:
+	{
+		struct vfio_dma *dma_last = NULL;
+		int retries = 0;
+again:
+		mutex_lock(&pfnmap->iommu->lock);
+		if (pfnmap->dma) {
+			struct vfio_dma *dma = pfnmap->dma;
+
+			if (unmap_dma_pfn_list(pfnmap->iommu, dma,
+					       &dma_last, &retries))
+				goto again;
+
+			dma->pfnmap = NULL;
+			pfnmap->dma = NULL;
+			vfio_remove_dma(pfnmap->iommu, dma);
+		}
+		mutex_unlock(&pfnmap->iommu->lock);
+
+		/* Cannot unregister notifier from callback chain */
+		INIT_WORK(&pfnmap->work, unregister_device_bg);
+		schedule_work(&pfnmap->work);
+		break;
+	}
+	}
+
+	return NOTIFY_OK;
+}
+
 /*
  * Returns the positive number of pfns successfully obtained or a negative
  * error code.
@@ -601,21 +625,60 @@ static int vaddr_get_pfns(struct vfio_iommu *iommu, struct vfio_dma *dma,
 
 	vaddr = untagged_addr(vaddr);
 
-retry:
 	vma = find_vma_intersection(mm, vaddr, vaddr + 1);
 
 	if (vma && vma->vm_flags & VM_PFNMAP) {
-		ret = follow_fault_pfn(vma, mm, vaddr, pfn,
-				       dma->prot & IOMMU_WRITE);
-		if (ret == -EAGAIN)
-			goto retry;
-
-		if (!ret) {
-			if (is_invalid_reserved_pfn(*pfn))
-				ret = 1;
-			else
-				ret = -EFAULT;
+		if ((dma->prot & IOMMU_WRITE && !(vma->vm_flags & VM_WRITE)) ||
+		    (dma->prot & IOMMU_READ && !(vma->vm_flags & VM_READ))) {
+			ret = -EFAULT;
+			goto done;
+		}
+
+		if (!dma->pfnmap) {
+			struct vfio_device *device;
+			unsigned long base_pfn;
+			struct pfnmap_obj *pfnmap;
+
+			device = vfio_device_get_from_vma(vma);
+			if (IS_ERR(device)) {
+				ret = PTR_ERR(device);
+				goto done;
+			}
+
+			ret = vfio_vma_to_pfn(vma, &base_pfn);
+			if (ret) {
+				vfio_device_put(device);
+				goto done;
+			}
+
+			pfnmap = kzalloc(sizeof(*pfnmap), GFP_KERNEL);
+			if (!pfnmap) {
+				vfio_device_put(device);
+				ret = -ENOMEM;
+				goto done;
+			}
+
+			pfnmap->nb.notifier_call = vfio_device_nb_cb;
+			pfnmap->iommu = iommu;
+			pfnmap->dma = dma;
+			pfnmap->device = device;
+			pfnmap->base_pfn = base_pfn;
+
+			dma->pfnmap = pfnmap;
+
+			ret = vfio_device_register_notifier(device,
+							    &pfnmap->nb);
+			if (ret) {
+				dma->pfnmap = NULL;
+				kfree(pfnmap);
+				vfio_device_put(device);
+				goto done;
+			}
 		}
+
+		*pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) +
+			dma->pfnmap->base_pfn;
+		ret = 1;
 	}
 done:
 	mmap_read_unlock(mm);
@@ -1189,6 +1252,12 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
 {
 	WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list));
+	if (dma->pfnmap) {
+		vfio_device_unregister_notifier(dma->pfnmap->device,
+						&dma->pfnmap->nb);
+		vfio_device_put(dma->pfnmap->device);
+		kfree(dma->pfnmap);
+	}
 	vfio_unmap_unpin(iommu, dma, true);
 	vfio_unlink_dma(iommu, dma);
 	put_task_struct(dma->task);
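
For readers following the vaddr_get_pfns() hunk, here is a rough
user-space sketch of what the new VM_PFNMAP branch computes once the
device and base pfn are known. It is not kernel code. The names
sketch_vma, sketch_vaddr_to_pfn and the SKETCH_* constants are
hypothetical, the 4 KiB page size is an assumption, and the two
protection bits stand in for both IOMMU_READ/IOMMU_WRITE and
VM_READ/VM_WRITE.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; none of these are kernel definitions. */
#define SKETCH_PAGE_SHIFT 12u	/* assumes 4 KiB pages */
#define SKETCH_PROT_READ  0x1u
#define SKETCH_PROT_WRITE 0x2u

struct sketch_vma {
	uint64_t vm_start;	/* first virtual address covered by the mapping */
	uint64_t vm_end;	/* one past the last address covered */
	unsigned int prot;	/* access the mapping itself allows */
};

/*
 * Mirrors the shape of the new branch: refuse a DMA mapping that asks for
 * more access than the vma grants, otherwise derive the pfn from the page
 * offset into the vma plus the base pfn recorded for the device.
 * Returns 1 (one pfn obtained) or -EFAULT.
 */
static inline int sketch_vaddr_to_pfn(const struct sketch_vma *vma,
				      uint64_t base_pfn, uint64_t vaddr,
				      unsigned int dma_prot, uint64_t *pfn)
{
	/* The caller is expected to have looked up a vma containing vaddr. */
	assert(vaddr >= vma->vm_start && vaddr < vma->vm_end);

	/* Any requested bit the vma does not grant is a fault. */
	if ((dma_prot & ~vma->prot) != 0)
		return -EFAULT;

	*pfn = ((vaddr - vma->vm_start) >> SKETCH_PAGE_SHIFT) + base_pfn;
	return 1;
}
```

With a read-only vma starting at 0x10000 and a base pfn of 0x800, an
address two pages in resolves to pfn 0x802, while the same request with
write access added is refused with -EFAULT. Unlike the removed
follow_fault_pfn() path, nothing here walks page tables or faults the
page in. The pfn is purely arithmetic on the vma offset.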