From: "Liu, Yi L"
To: alex.williamson@redhat.com, eric.auger@redhat.com
Cc: kevin.tian@intel.com, jacob.jun.pan@linux.intel.com, joro@8bytes.org, ashok.raj@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, jean-philippe@linaro.org, peterx@redhat.com, iommu@lists.linux-foundation.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, hao.wu@intel.com
Subject: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
Date: Sun, 22 Mar 2020 05:32:03 -0700
Message-Id: <1584880325-10561-7-git-send-email-yi.l.liu@intel.com>
In-Reply-To: <1584880325-10561-1-git-send-email-yi.l.liu@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Liu Yi L

VFIO_TYPE1_NESTING_IOMMU is an IOMMU type backed by hardware IOMMUs
that support nesting DMA translation (a.k.a. dual-stage address
translation). These IOMMUs perform two stages/levels of address
translation, and software may let userspace/VM own the
first-level/stage-1 translation structures. One example of this usage
is vSVA (virtual Shared Virtual Addressing). The VM owns the
first-level/stage-1 translation structures and binds them to the host.
The hardware IOMMU then applies nesting translation when it performs
DMA translation for the devices behind it.

This patch adds VFIO support for binding guest translation (a.k.a.
stage-1) structures to the host IOMMU. Binding the guest page table is
not the only requirement for VFIO_TYPE1_NESTING_IOMMU. The guest also
needs an interface for IOMMU cache invalidation: when the guest modifies
the first-level/stage-1 translation structures, the hardware must be
notified so it can flush stale IOTLB entries. That interface is
introduced in the next patch.

In this patch, the VFIO_IOMMU_BIND ioctl binds and unbinds guest page
tables through the flags VFIO_IOMMU_BIND_GUEST_PGTBL and
VFIO_IOMMU_UNBIND_GUEST_PGTBL. The bind/unbind data is carried in
struct iommu_gpasid_bind_data. Before it binds a guest page table to the
host, the VM should already have a PASID that the host allocated via
VFIO_IOMMU_PASID_REQUEST.

Binding guest translation structures (here, the guest page table) to
the host is the first step in setting up vSVA (Virtual Shared Virtual
Addressing).
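The following sketch shows how a VMM might package a VFIO_IOMMU_BIND argument in userspace. It assumes the uapi layout proposed in this patch: a fixed argsz/flags header followed by a variable-length payload, whose first __u32 the kernel handler reads as the struct iommu_gpasid_bind_data version. Installed headers may not provide the struct, so the code redeclares it locally. The helper build_bind_arg() is hypothetical, and the test payload is a placeholder rather than the real iommu_gpasid_bind_data layout. The code only builds the buffer. It never calls the ioctl, which would need a container set to VFIO_TYPE1_NESTING_IOMMU and a kernel carrying this series.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Local mirror of the uapi struct proposed in this patch; redeclared
 * here because installed headers may not provide it.
 */
struct vfio_iommu_type1_bind {
	uint32_t argsz;
	uint32_t flags;
	uint8_t data[];
};

#define VFIO_IOMMU_BIND_GUEST_PGTBL	(1 << 0)
#define VFIO_IOMMU_UNBIND_GUEST_PGTBL	(1 << 1)

/*
 * Hypothetical helper: allocate the fixed header plus an opaque payload.
 * In the real flow the payload is struct iommu_gpasid_bind_data, whose
 * leading __u32 is the version the kernel reads at offset minsz (8).
 * argsz covers header and payload. Returns NULL on allocation failure;
 * the caller frees the buffer.
 */
static struct vfio_iommu_type1_bind *
build_bind_arg(uint32_t flags, const void *payload, size_t payload_size)
{
	size_t total = sizeof(struct vfio_iommu_type1_bind) + payload_size;
	struct vfio_iommu_type1_bind *bind = calloc(1, total);

	if (!bind)
		return NULL;
	bind->argsz = (uint32_t)total;
	bind->flags = flags;
	if (payload_size)
		memcpy(bind->data, payload, payload_size);
	/* A VMM would then issue: ioctl(container_fd, VFIO_IOMMU_BIND, bind) */
	return bind;
}
```

A matching unbind request would reuse the same payload and pass VFIO_IOMMU_UNBIND_GUEST_PGTBL instead. The kernel handler takes the PASID to unbind from the payload's hpasid field.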
Cc: Kevin Tian
Cc: Jacob Pan
Cc: Alex Williamson
Cc: Eric Auger
Cc: Jean-Philippe Brucker
Signed-off-by: Jean-Philippe Brucker
Signed-off-by: Liu Yi L
Signed-off-by: Jacob Pan
---
 drivers/vfio/vfio_iommu_type1.c | 158 ++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/vfio.h       |  46 ++++++++++++
 2 files changed, 204 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 82a9e0b..a877747 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -130,6 +130,33 @@ struct vfio_regions {
 #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)	\
 					(!list_empty(&iommu->domain_list))
 
+struct domain_capsule {
+	struct iommu_domain *domain;
+	void *data;
+};
+
+/* iommu->lock must be held */
+static int vfio_iommu_for_each_dev(struct vfio_iommu *iommu,
+		int (*fn)(struct device *dev, void *data),
+		void *data)
+{
+	struct domain_capsule dc = {.data = data};
+	struct vfio_domain *d;
+	struct vfio_group *g;
+	int ret = 0;
+
+	list_for_each_entry(d, &iommu->domain_list, next) {
+		dc.domain = d->domain;
+		list_for_each_entry(g, &d->group_list, next) {
+			ret = iommu_group_for_each_dev(g->iommu_group,
+						       &dc, fn);
+			if (ret)
+				break;
+		}
+	}
+	return ret;
+}
+
 static int put_pfn(unsigned long pfn, int prot);
 
 /*
@@ -2314,6 +2341,88 @@ static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
 	return 0;
 }
 
+static int vfio_bind_gpasid_fn(struct device *dev, void *data)
+{
+	struct domain_capsule *dc = (struct domain_capsule *)data;
+	struct iommu_gpasid_bind_data *gbind_data =
+		(struct iommu_gpasid_bind_data *) dc->data;
+
+	return iommu_sva_bind_gpasid(dc->domain, dev, gbind_data);
+}
+
+static int vfio_unbind_gpasid_fn(struct device *dev, void *data)
+{
+	struct domain_capsule *dc = (struct domain_capsule *)data;
+	struct iommu_gpasid_bind_data *gbind_data =
+		(struct iommu_gpasid_bind_data *) dc->data;
+
+	return iommu_sva_unbind_gpasid(dc->domain, dev,
+					gbind_data->hpasid);
+}
+
+/**
+ * Unbind a specific gpasid; the caller of this function must hold
+ * vfio_iommu->lock.
+ */
+static long vfio_iommu_type1_do_guest_unbind(struct vfio_iommu *iommu,
+				struct iommu_gpasid_bind_data *gbind_data)
+{
+	return vfio_iommu_for_each_dev(iommu,
+				vfio_unbind_gpasid_fn, gbind_data);
+}
+
+static long vfio_iommu_type1_bind_gpasid(struct vfio_iommu *iommu,
+			    struct iommu_gpasid_bind_data *gbind_data)
+{
+	int ret = 0;
+
+	mutex_lock(&iommu->lock);
+	if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	ret = vfio_iommu_for_each_dev(iommu,
+			vfio_bind_gpasid_fn, gbind_data);
+	/*
+	 * If bind failed, it may not be a total failure. Some devices
+	 * within the iommu group may have been bound successfully. Although
+	 * we don't enable pasid capability for non-singleton iommu
+	 * groups, an unbind operation would be helpful to ensure no
+	 * partial binding for an iommu group.
+	 */
+	if (ret)
+		/*
+		 * Undo all binds that already succeeded. There is no need
+		 * to check the return value here, since some devices
+		 * within the group may have no successful bind at this
+		 * point.
+		 */
+		vfio_iommu_type1_do_guest_unbind(iommu, gbind_data);
+
+out_unlock:
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
+static long vfio_iommu_type1_unbind_gpasid(struct vfio_iommu *iommu,
+			    struct iommu_gpasid_bind_data *gbind_data)
+{
+	int ret = 0;
+
+	mutex_lock(&iommu->lock);
+	if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	ret = vfio_iommu_type1_do_guest_unbind(iommu, gbind_data);
+
+out_unlock:
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
 				   unsigned int cmd, unsigned long arg)
 {
@@ -2471,6 +2580,55 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 		default:
 			return -EINVAL;
 		}
+
+	} else if (cmd == VFIO_IOMMU_BIND) {
+		struct vfio_iommu_type1_bind bind;
+		u32 version;
+		int data_size;
+		void *gbind_data;
+		int ret;
+
+		minsz = offsetofend(struct vfio_iommu_type1_bind, flags);
+
+		if (copy_from_user(&bind, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (bind.argsz < minsz)
+			return -EINVAL;
+
+		/* Get the version of struct iommu_gpasid_bind_data */
+		if (copy_from_user(&version,
+				(void __user *) (arg + minsz),
+				sizeof(version)))
+			return -EFAULT;
+
+		data_size = iommu_uapi_get_data_size(
+				IOMMU_UAPI_BIND_GPASID, version);
+		gbind_data = kzalloc(data_size, GFP_KERNEL);
+		if (!gbind_data)
+			return -ENOMEM;
+
+		if (copy_from_user(gbind_data,
+			 (void __user *) (arg + minsz), data_size)) {
+			kfree(gbind_data);
+			return -EFAULT;
+		}
+
+		switch (bind.flags & VFIO_IOMMU_BIND_MASK) {
+		case VFIO_IOMMU_BIND_GUEST_PGTBL:
+			ret = vfio_iommu_type1_bind_gpasid(iommu,
+							   gbind_data);
+			break;
+		case VFIO_IOMMU_UNBIND_GUEST_PGTBL:
+			ret = vfio_iommu_type1_unbind_gpasid(iommu,
+							     gbind_data);
+			break;
+		default:
+			ret = -EINVAL;
+			break;
+		}
+		kfree(gbind_data);
+		return ret;
 	}
 
 	return -ENOTTY;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index ebeaf3e..2235bc6 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -14,6 +14,7 @@
 
 #include
 #include
+#include
 
 #define VFIO_API_VERSION	0
 
@@ -853,6 +854,51 @@ struct vfio_iommu_type1_pasid_request {
  */
 #define VFIO_IOMMU_PASID_REQUEST	_IO(VFIO_TYPE, VFIO_BASE + 22)
 
+/**
+ * Supported flags:
+ *	- VFIO_IOMMU_BIND_GUEST_PGTBL: bind guest page tables to host for
+ *			nesting type IOMMUs. The @data field takes struct
+ *			iommu_gpasid_bind_data.
+ *	- VFIO_IOMMU_UNBIND_GUEST_PGTBL: undo a bind guest page table operation
+ *			invoked by VFIO_IOMMU_BIND_GUEST_PGTBL.
+ *
+ */
+struct vfio_iommu_type1_bind {
+	__u32		argsz;
+	__u32		flags;
+#define VFIO_IOMMU_BIND_GUEST_PGTBL	(1 << 0)
+#define VFIO_IOMMU_UNBIND_GUEST_PGTBL	(1 << 1)
+	__u8		data[];
+};
+
+#define VFIO_IOMMU_BIND_MASK	(VFIO_IOMMU_BIND_GUEST_PGTBL | \
+					VFIO_IOMMU_UNBIND_GUEST_PGTBL)
+
+/**
+ * VFIO_IOMMU_BIND - _IOW(VFIO_TYPE, VFIO_BASE + 23,
+ *				struct vfio_iommu_type1_bind)
+ *
+ * Manage address spaces of devices in this container. Initially a TYPE1
+ * container can only have one address space, managed with
+ * VFIO_IOMMU_MAP/UNMAP_DMA.
+ *
+ * An IOMMU of type VFIO_TYPE1_NESTING_IOMMU can be managed by both MAP/UNMAP
+ * and BIND ioctls at the same time. MAP/UNMAP acts on the stage-2 (host) page
+ * tables, and BIND manages the stage-1 (guest) page tables. Other types of
+ * IOMMU may allow MAP/UNMAP and BIND to coexist, where MAP/UNMAP controls
+ * traffic that only requires single-stage translation while BIND controls
+ * traffic that requires nesting translation. But this depends on the
+ * underlying IOMMU architecture and isn't guaranteed. An example is guest
+ * SVA traffic, which needs nesting translation to perform gVA->gPA and
+ * then gPA->hPA translation.
+ *
+ * Availability of this feature depends on the device, its bus, the underlying
+ * IOMMU and the CPU architecture.
+ *
+ * returns: 0 on success, -errno on failure.
+ */
+#define VFIO_IOMMU_BIND		_IO(VFIO_TYPE, VFIO_BASE + 23)
+
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
 /*
-- 
2.7.4
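The flag and size checks in the VFIO_IOMMU_BIND handler can be modelled in plain userspace C to see which requests reach the bind and unbind paths. This is a minimal sketch. The function classify_bind_request() and the enum bind_op are hypothetical names for this model only. The model leaves out copy_from_user(), the iommu_uapi_get_data_size() lookup, allocation and locking. Note that in the patch the version and payload are copied in before the flags are examined, so the model's early -EINVAL returns happen in a different order than in the kernel.

```c
#include <errno.h>
#include <stdint.h>

/* Flag values as defined in the uapi hunk of this patch. */
#define VFIO_IOMMU_BIND_GUEST_PGTBL	(1 << 0)
#define VFIO_IOMMU_UNBIND_GUEST_PGTBL	(1 << 1)
#define VFIO_IOMMU_BIND_MASK	(VFIO_IOMMU_BIND_GUEST_PGTBL | \
				 VFIO_IOMMU_UNBIND_GUEST_PGTBL)

/* Hypothetical outcome codes; the kernel calls the bind/unbind helpers. */
enum bind_op { BIND_OP_BIND = 1, BIND_OP_UNBIND = 2 };

/*
 * Model of the handler's validation: argsz must at least cover the
 * argsz and flags fields, and after masking with VFIO_IOMMU_BIND_MASK
 * exactly one of the two known flags must remain.
 */
static int classify_bind_request(uint32_t argsz, uint32_t flags)
{
	/* minsz = offsetofend(struct vfio_iommu_type1_bind, flags) = 8 */
	const uint32_t minsz = 2 * sizeof(uint32_t);

	if (argsz < minsz)
		return -EINVAL;

	switch (flags & VFIO_IOMMU_BIND_MASK) {
	case VFIO_IOMMU_BIND_GUEST_PGTBL:
		return BIND_OP_BIND;
	case VFIO_IOMMU_UNBIND_GUEST_PGTBL:
		return BIND_OP_UNBIND;
	default:
		/* neither flag set, or both set */
		return -EINVAL;
	}
}
```

The model makes three behaviours of the handler visible:

- Setting both flags is rejected.
- Bits outside VFIO_IOMMU_BIND_MASK are ignored rather than rejected.
- argsz is compared only against minsz. It is never compared against the data_size bytes that the handler copies from the payload.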