Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp2698590pxb; Tue, 23 Feb 2021 13:20:30 -0800 (PST) X-Google-Smtp-Source: ABdhPJxB/dXG21FHpDTEv6rvC5KWFxxZWeGyAIJDmWPBb7NYNF6ZiLGmDe11Ry9yoB52U65jzhBi X-Received: by 2002:a17:907:1186:: with SMTP id uz6mr28515023ejb.330.1614115230625; Tue, 23 Feb 2021 13:20:30 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1614115230; cv=none; d=google.com; s=arc-20160816; b=KDLEgDp3rX0rwXu2tFIrPEvWA5zYucegKwr/NTuFvrr6d6FVxth5/uPCOQUtWthMi1 9PF7ofJC2HSJtQLgjyAfIXAd8lzkyf914VOmSb48Wj3TAxJIiYVBWTkEIAuyS5HuUyRu T5mH2cRtuxvWk5LawKElc9GwpqS4BfE5jiATOvn9v5QJl4SMr6wfhF+ljyz5VIWcoGng uFjGosm2Bm4ejPctCbk7xvhJ+snCddIdjctR1pduQ4P3JktaOYqkVGr7uiDEOOzMNLuw MLUc8xqh1ndiHn0rupncfG6KyR2o3afqQi0wdLUfG3rnlAa3+A8cLQzBKdqkPQNaua8B wv1w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=okMFFWo3gv2/OGYT/RzykJJjAdWwhFrwI+btUkgz6rE=; b=FKa+mfwjFih/0gs7MUzvZJfkdqdRQUe1qYHJyoVXRD9f7MAJnq+2XlbL4JusPM0zv6 nwXUEiLhe1MCwFKuaefZPUjOR2vkNMpGIOpuMPfjLMeafx8/XaEj1pIf2gxxVSEdM4EM EfQuUH4UdL5sBwSjzTma4IyQScaK4chvGcgCw42ANoWEU6JC7Qm3j5qxV7NpZ9ACi1eA 2vjHJT7QmOGAmQo3GxF8AAeHDVh//lReEoxtlxHaUl4yBxBmZzI5tCmtlZo1CwoiVQOR eWii7DX1okmGCP5ZtYO7Eh96UaiHZtnEnl2weB9xO3MCnLOD8srlKAG6Xpg4fTNCPpF1 1wxA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=hCLK28Qa; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id p26si7608203eji.740.2021.02.23.13.19.42; Tue, 23 Feb 2021 13:20:30 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=hCLK28Qa; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234545AbhBWVJm (ORCPT + 99 others); Tue, 23 Feb 2021 16:09:42 -0500 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:40515 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234358AbhBWVIg (ORCPT ); Tue, 23 Feb 2021 16:08:36 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1614114428; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=okMFFWo3gv2/OGYT/RzykJJjAdWwhFrwI+btUkgz6rE=; b=hCLK28Qa3/Uv0XLHdukmtH4XZBCNsAduaBHNZYv804f0urLIJGAVXn6bG7haE0zaIOVd45 3TzU5HXXkQ/5X4v2Kpu2+5Sm/akqOwkYnhZ/TlPYqo5Axt03VyH5a3DhhxLahrn6umjRHA u+20T+ajA0Es8WYGQTgESI8heG/T3tI= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-498-sxYJZProOBW96tlabNpWqQ-1; Tue, 23 Feb 2021 16:07:05 -0500 X-MC-Unique: sxYJZProOBW96tlabNpWqQ-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 81015CC62D; Tue, 23 Feb 2021 21:07:01 +0000 (UTC) Received: from laptop.redhat.com (ovpn-114-34.ams2.redhat.com [10.36.114.34]) by smtp.corp.redhat.com (Postfix) with ESMTP id 886925D9D0; Tue, 23 Feb 2021 21:06:51 +0000 (UTC) From: Eric Auger To: eric.auger.pro@gmail.com, eric.auger@redhat.com, iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu, will@kernel.org, maz@kernel.org, robin.murphy@arm.com, joro@8bytes.org, alex.williamson@redhat.com, tn@semihalf.com, zhukeqian1@huawei.com Cc: jacob.jun.pan@linux.intel.com, yi.l.liu@intel.com, wangxingang5@huawei.com, jiangkunkun@huawei.com, jean-philippe@linaro.org, zhangfei.gao@linaro.org, zhangfei.gao@gmail.com, vivek.gautam@arm.com, shameerali.kolothum.thodi@huawei.com, yuzenghui@huawei.com, nicoleotsuka@gmail.com, lushenming@huawei.com, vsethi@nvidia.com Subject: [PATCH v12 04/13] vfio/pci: Add VFIO_REGION_TYPE_NESTED region type Date: Tue, 23 Feb 2021 22:06:16 +0100 Message-Id: <20210223210625.604517-5-eric.auger@redhat.com> In-Reply-To: <20210223210625.604517-1-eric.auger@redhat.com> References: <20210223210625.604517-1-eric.auger@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Add a new specific DMA_FAULT region aiming to exposed nested mode translation faults. This region only is exposed if the device is attached to a nested domain. The region has a ring buffer that contains the actual fault records plus a header allowing to handle it (tail/head indices, max capacity, entry size). At the moment the region is dimensionned for 512 fault records. Signed-off-by: Eric Auger --- v11 -> v12: - set fault_pages to NULL after free - check new_tail >= header->nb_entries (Zenghui) - update value of VFIO_REGION_TYPE_NESTED - handle the case where the domain is NULL - pass an int pointer to iommu_domain_get_attr [Shenming] v10 -> v11: - rename vfio_pci_init_dma_fault_region into vfio_pci_dma_fault_init - free fault_pages in vfio_pci_dma_fault_release - only register the region if the device is attached to a nested domain v8 -> v9: - Use a single region instead of a prod/cons region v4 -> v5 - check cons is not null in vfio_pci_check_cons_fault v3 -> v4: - use 2 separate regions, respectively in read and write modes - add the version capability --- drivers/vfio/pci/vfio_pci.c | 79 +++++++++++++++++++++++++++++ drivers/vfio/pci/vfio_pci_private.h | 6 +++ drivers/vfio/pci/vfio_pci_rdwr.c | 44 ++++++++++++++++ include/uapi/linux/vfio.h | 35 +++++++++++++ 4 files changed, 164 insertions(+) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 706de3ef94bb..e9a4a1c502c7 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -309,6 +309,81 @@ int vfio_pci_set_power_state(struct vfio_pci_device *vdev, pci_power_t state) return ret; } +static void vfio_pci_dma_fault_release(struct vfio_pci_device *vdev, + struct vfio_pci_region *region) +{ + kfree(vdev->fault_pages); +} + +static int vfio_pci_dma_fault_add_capability(struct vfio_pci_device *vdev, + struct vfio_pci_region *region, + struct vfio_info_cap *caps) +{ + struct vfio_region_info_cap_fault cap = { + .header.id = VFIO_REGION_INFO_CAP_DMA_FAULT, + .header.version = 1, + .version = 1, + }; + return vfio_info_add_capability(caps, &cap.header, sizeof(cap)); +} + +static const struct vfio_pci_regops vfio_pci_dma_fault_regops = { + .rw = vfio_pci_dma_fault_rw, + .release = vfio_pci_dma_fault_release, + .add_capability = vfio_pci_dma_fault_add_capability, +}; + +#define DMA_FAULT_RING_LENGTH 512 + +static int vfio_pci_dma_fault_init(struct vfio_pci_device *vdev) +{ + struct vfio_region_dma_fault *header; + struct iommu_domain *domain; + size_t size; + int nested; + int ret; + + domain = iommu_get_domain_for_dev(&vdev->pdev->dev); + if (!domain) + return 0; + + ret = iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, &nested); + if (ret || !nested) + return ret; + + mutex_init(&vdev->fault_queue_lock); + + /* + * We provision 1 page for the header and space for + * DMA_FAULT_RING_LENGTH fault records in the ring buffer. + */ + size = ALIGN(sizeof(struct iommu_fault) * + DMA_FAULT_RING_LENGTH, PAGE_SIZE) + PAGE_SIZE; + + vdev->fault_pages = kzalloc(size, GFP_KERNEL); + if (!vdev->fault_pages) + return -ENOMEM; + + ret = vfio_pci_register_dev_region(vdev, + VFIO_REGION_TYPE_NESTED, + VFIO_REGION_SUBTYPE_NESTED_DMA_FAULT, + &vfio_pci_dma_fault_regops, size, + VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE, + vdev->fault_pages); + if (ret) + goto out; + + header = (struct vfio_region_dma_fault *)vdev->fault_pages; + header->entry_size = sizeof(struct iommu_fault); + header->nb_entries = DMA_FAULT_RING_LENGTH; + header->offset = sizeof(struct vfio_region_dma_fault); + return 0; +out: + kfree(vdev->fault_pages); + vdev->fault_pages = NULL; + return ret; +} + static int vfio_pci_enable(struct vfio_pci_device *vdev) { struct pci_dev *pdev = vdev->pdev; @@ -407,6 +482,10 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev) } } + ret = vfio_pci_dma_fault_init(vdev); + if (ret) + goto disable_exit; + vfio_pci_probe_mmaps(vdev); return 0; diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h index 5c90e560c5c7..1d9b0f648133 100644 --- a/drivers/vfio/pci/vfio_pci_private.h +++ b/drivers/vfio/pci/vfio_pci_private.h @@ -134,6 +134,8 @@ struct vfio_pci_device { int ioeventfds_nr; struct eventfd_ctx *err_trigger; struct eventfd_ctx *req_trigger; + u8 *fault_pages; + struct mutex fault_queue_lock; struct list_head dummy_resources_list; struct mutex ioeventfds_lock; struct list_head ioeventfds_list; @@ -170,6 +172,10 @@ extern ssize_t vfio_pci_vga_rw(struct vfio_pci_device *vdev, char __user *buf, extern long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset, uint64_t data, int count, int fd); +extern size_t vfio_pci_dma_fault_rw(struct vfio_pci_device *vdev, + char __user *buf, size_t count, + loff_t *ppos, bool iswrite); + extern int vfio_pci_init_perm_bits(void); extern void vfio_pci_uninit_perm_bits(void); diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c index a0b5fc8e46f4..164120607469 100644 --- a/drivers/vfio/pci/vfio_pci_rdwr.c +++ b/drivers/vfio/pci/vfio_pci_rdwr.c @@ -356,6 +356,50 @@ ssize_t vfio_pci_vga_rw(struct vfio_pci_device *vdev, char __user *buf, return done; } +size_t vfio_pci_dma_fault_rw(struct vfio_pci_device *vdev, char __user *buf, + size_t count, loff_t *ppos, bool iswrite) +{ + unsigned int i = VFIO_PCI_OFFSET_TO_INDEX(*ppos) - VFIO_PCI_NUM_REGIONS; + loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK; + void *base = vdev->region[i].data; + int ret = -EFAULT; + + if (pos >= vdev->region[i].size) + return -EINVAL; + + count = min(count, (size_t)(vdev->region[i].size - pos)); + + mutex_lock(&vdev->fault_queue_lock); + + if (iswrite) { + struct vfio_region_dma_fault *header = + (struct vfio_region_dma_fault *)base; + u32 new_tail; + + if (pos != 0 || count != 4) { + ret = -EINVAL; + goto unlock; + } + + if (copy_from_user((void *)&new_tail, buf, count)) + goto unlock; + + if (new_tail >= header->nb_entries) { + ret = -EINVAL; + goto unlock; + } + header->tail = new_tail; + } else { + if (copy_to_user(buf, base + pos, count)) + goto unlock; + } + *ppos += count; + ret = count; +unlock: + mutex_unlock(&vdev->fault_queue_lock); + return ret; +} + static void vfio_pci_ioeventfd_do_write(struct vfio_pci_ioeventfd *ioeventfd, bool test_mem) { diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index 4c82e8f21618..bc46e5d6daa4 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -319,6 +319,7 @@ struct vfio_region_info_cap_type { #define VFIO_REGION_TYPE_GFX (1) #define VFIO_REGION_TYPE_CCW (2) #define VFIO_REGION_TYPE_MIGRATION (3) +#define VFIO_REGION_TYPE_NESTED (4) /* sub-types for VFIO_REGION_TYPE_PCI_* */ @@ -343,6 +344,9 @@ struct vfio_region_info_cap_type { /* sub-types for VFIO_REGION_TYPE_GFX */ #define VFIO_REGION_SUBTYPE_GFX_EDID (1) +/* sub-types for VFIO_REGION_TYPE_NESTED */ +#define VFIO_REGION_SUBTYPE_NESTED_DMA_FAULT (1) + /** * struct vfio_region_gfx_edid - EDID region layout. * @@ -989,6 +993,37 @@ struct vfio_device_feature { */ #define VFIO_DEVICE_FEATURE_PCI_VF_TOKEN (0) +/* + * Capability exposed by the DMA fault region + * @version: ABI version + */ +#define VFIO_REGION_INFO_CAP_DMA_FAULT 6 + +struct vfio_region_info_cap_fault { + struct vfio_info_cap_header header; + __u32 version; +}; + +/* + * DMA Fault Region Layout + * @tail: index relative to the start of the ring buffer at which the + * consumer finds the next item in the buffer + * @entry_size: fault ring buffer entry size in bytes + * @nb_entries: max capacity of the fault ring buffer + * @offset: ring buffer offset relative to the start of the region + * @head: index relative to the start of the ring buffer at which the + * producer (kernel) inserts items into the buffers + */ +struct vfio_region_dma_fault { + /* Write-Only */ + __u32 tail; + /* Read-Only */ + __u32 entry_size; + __u32 nb_entries; + __u32 offset; + __u32 head; +}; + /* -------- API for Type1 VFIO IOMMU -------- */ /** -- 2.26.2