From: Auger Eric
Subject: Re: [PATCH v7 11/11] iommu/vt-d: Add svm/sva invalidate function
To: Jacob Pan, iommu@lists.linux-foundation.org, LKML, Joerg Roedel,
 David Woodhouse, Alex Williamson, Jean-Philippe Brucker
Cc: Yi Liu, "Tian, Kevin", Raj Ashok, Christoph Hellwig, Lu Baolu,
 Jonathan Cameron
References: <1571946904-86776-1-git-send-email-jacob.jun.pan@linux.intel.com>
 <1571946904-86776-12-git-send-email-jacob.jun.pan@linux.intel.com>
Message-ID: <56c3222f-0457-8e1f-73aa-0360f49d66a7@redhat.com>
In-Reply-To: <1571946904-86776-12-git-send-email-jacob.jun.pan@linux.intel.com>
Date: Tue, 12 Nov 2019 11:28:37 +0100

Hi Jacob,

On 10/24/19 9:55 PM, Jacob Pan wrote:
> When Shared Virtual Address (SVA) is enabled for a guest OS via
> vIOMMU, we need to provide invalidation support at IOMMU API and driver
> level. This patch adds Intel VT-d specific function to implement
> iommu passdown invalidate API for shared virtual address.
>
> The use case is for supporting caching structure invalidation
> of assigned SVM capable devices. Emulated IOMMU exposes queue
> invalidation capability and passes down all descriptors from the guest
> to the physical IOMMU.
>
> The assumption is that guest to host device ID mapping should be
> resolved prior to calling IOMMU driver. Based on the device handle,
> host IOMMU driver can replace certain fields before submit to the
> invalidation queue.
>
> Signed-off-by: Jacob Pan
> Signed-off-by: Ashok Raj
> Signed-off-by: Liu, Yi L
> ---
>  drivers/iommu/intel-iommu.c | 170 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 170 insertions(+)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 5fab32fbc4b4..a73e76d6457a 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5491,6 +5491,175 @@ static void intel_iommu_aux_detach_device(struct iommu_domain *domain,
> 	aux_domain_remove_dev(to_dmar_domain(domain), dev);
>  }
>
> +/*
> + * 2D array for converting and sanitizing IOMMU generic TLB granularity to
> + * VT-d granularity. Invalidation is typically included in the unmap operation
> + * as a result of DMA or VFIO unmap. However, for assigned device where guest
> + * could own the first level page tables without being shadowed by QEMU. In
above sentence needs to be rephrased.
> + * this case there is no pass down unmap to the host IOMMU as a result of unmap
> + * in the guest. Only invalidations are trapped and passed down.
> + * In all cases, only first level TLB invalidation (request with PASID) can be
> + * passed down, therefore we do not include IOTLB granularity for request
> + * without PASID (second level).
> + *
> + * For an example, to find the VT-d granularity encoding for IOTLB
for example
> + * type and page selective granularity within PASID:
> + * X: indexed by iommu cache type
> + * Y: indexed by enum iommu_inv_granularity
> + * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
> + *
> + * Granu_map array indicates validity of the table. 1: valid, 0: invalid
> + *
> + */
> +const static int inv_type_granu_map[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
> +	/* PASID based IOTLB, support PASID selective and page selective */
I would rather use the generic terminology, i.e. IOTLB invalidation
supports PASID and ADDR granularities.
> +	{0, 1, 1},
> +	/* PASID based dev TLBs, only support all PASIDs or single PASID */
Device IOTLB invalidation supports DOMAIN and PASID granularities.
> +	{1, 1, 0},
> +	/* PASID cache */
PASID cache invalidation supports DOMAIN and PASID granularities.
> +	{1, 1, 0}
> +};
> +
> +const static u64 inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
> +	/* PASID based IOTLB */
> +	{0, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID},
> +	/* PASID based dev TLBs */
> +	{QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, 0},
> +	/* PASID cache */
> +	{QI_PC_ALL_PASIDS, QI_PC_PASID_SEL, 0},
> +};
> +
> +static inline int to_vtd_granularity(int type, int granu, u64 *vtd_granu)
nit: it looks a bit weird to me to manipulate a u64 here. Why not use
an int?
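Something like the below is what I have in mind (an untested sketch on my
side, simply reusing the two tables above and returning the encoding
directly, with a negative value on error):

/*
 * Sketch only: return the VT-d granularity encoding as an int, or
 * -EINVAL when the (type, granu) combination is not supported.
 * This assumes the QI_* encodings held in inv_type_granu_table all
 * fit in an int.
 */
static inline int to_vtd_granularity(int type, int granu)
{
	if (type >= IOMMU_CACHE_INV_TYPE_NR || granu >= IOMMU_INV_GRANU_NR ||
	    !inv_type_granu_map[type][granu])
		return -EINVAL;

	return inv_type_granu_table[type][granu];
}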
> +{
> +	if (type >= IOMMU_CACHE_INV_TYPE_NR || granu >= IOMMU_INV_GRANU_NR ||
> +		!inv_type_granu_map[type][granu])
> +		return -EINVAL;
> +
> +	*vtd_granu = inv_type_granu_table[type][granu];
> +
> +	return 0;
> +}
> +
> +static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules)
> +{
> +	u64 nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT;
> +
> +	/* VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9 for 2MB, etc.
> +	 * IOMMU cache invalidate API passes granu_size in bytes, and number of
> +	 * granu size in contiguous memory.
> +	 */
> +	return order_base_2(nr_pages);
> +}
> +
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +static int intel_iommu_sva_invalidate(struct iommu_domain *domain,
> +		struct device *dev, struct iommu_cache_invalidate_info *inv_info)
> +{
> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> +	struct device_domain_info *info;
> +	struct intel_iommu *iommu;
> +	unsigned long flags;
> +	int cache_type;
> +	u8 bus, devfn;
> +	u16 did, sid;
> +	int ret = 0;
> +	u64 size;
> +
> +	if (!inv_info || !dmar_domain ||
> +		inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
> +		return -EINVAL;
> +
> +	if (!dev || !dev_is_pci(dev))
> +		return -ENODEV;
> +
> +	iommu = device_to_iommu(dev, &bus, &devfn);
> +	if (!iommu)
> +		return -ENODEV;
> +
> +	spin_lock_irqsave(&device_domain_lock, flags);
> +	spin_lock(&iommu->lock);
> +	info = iommu_support_dev_iotlb(dmar_domain, iommu, bus, devfn);
> +	if (!info) {
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +	did = dmar_domain->iommu_did[iommu->seq_id];
> +	sid = PCI_DEVID(bus, devfn);
> +	size = to_vtd_size(inv_info->addr_info.granule_size, inv_info->addr_info.nb_granules);
> +
> +	for_each_set_bit(cache_type, (unsigned long *)&inv_info->cache, IOMMU_CACHE_INV_TYPE_NR) {
> +		u64 granu = 0;
> +		u64 pasid = 0;
> +
> +		ret = to_vtd_granularity(cache_type, inv_info->granularity, &granu);
> +		if (ret) {
> +			pr_err("Invalid cache type and granu combination %d/%d\n", cache_type,
> +				inv_info->granularity);
> +			break;
> +		}
> +
> +		/* PASID is stored in different locations based on granularity */
> +		if (inv_info->granularity == IOMMU_INV_GRANU_PASID)
> +			pasid = inv_info->pasid_info.pasid;
you need to check IOMMU_INV_ADDR_FLAGS_PASID in flags
> +		else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
> +			pasid = inv_info->addr_info.pasid;
same
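I mean something along those lines (an untested sketch; IOMMU_INV_ADDR_FLAGS_PASID
is the flag I referred to above, while IOMMU_INV_PASID_FLAGS_PASID is only my
assumption for the pasid_info equivalent):

		/* sketch: only use the PASID field when the caller marked it valid */
		if (inv_info->granularity == IOMMU_INV_GRANU_PASID &&
		    (inv_info->pasid_info.flags & IOMMU_INV_PASID_FLAGS_PASID))
			pasid = inv_info->pasid_info.pasid;
		else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
			 (inv_info->addr_info.flags & IOMMU_INV_ADDR_FLAGS_PASID))
			pasid = inv_info->addr_info.pasid;
		/* otherwise fall through to the existing error path below */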
> +		else {
> +			pr_err("Cannot find PASID for given cache type and granularity\n");
> +			break;
> +		}
> +
> +		switch (BIT(cache_type)) {
> +		case IOMMU_CACHE_INV_TYPE_IOTLB:
> +			if (size && (inv_info->addr_info.addr & ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
> +				pr_err("Address out of range, 0x%llx, size order %llu\n",
don't you mean the address is not correctly aligned?
> +					inv_info->addr_info.addr, size);
> +				ret = -ERANGE;
> +				goto out_unlock;
> +			}
> +
> +			qi_flush_piotlb(iommu, did, mm_to_dma_pfn(inv_info->addr_info.addr),
> +					pasid, size, granu, inv_info->addr_info.flags & IOMMU_INV_ADDR_FLAGS_LEAF);
> +
> +			/*
> +			 * Always flush device IOTLB if ATS is enabled since guest
> +			 * vIOMMU exposes CM = 1, no device IOTLB flush will be passed
> +			 * down.
> +			 */
> +			if (info->ats_enabled) {
> +				qi_flush_dev_piotlb(iommu, sid, info->pfsid,
> +						pasid, info->ats_qdep,
> +						inv_info->addr_info.addr, size,
> +						granu);
> +			}
> +			break;
> +		case IOMMU_CACHE_INV_TYPE_DEV_IOTLB:
> +			if (info->ats_enabled) {
> +				qi_flush_dev_piotlb(iommu, sid, info->pfsid,
> +						inv_info->addr_info.pasid, info->ats_qdep,
> +						inv_info->addr_info.addr, size,
> +						granu);
> +			} else
> +				pr_warn("Passdown device IOTLB flush w/o ATS!\n");
> +
> +			break;
> +		case IOMMU_CACHE_INV_TYPE_PASID:
> +			qi_flush_pasid_cache(iommu, did, granu, inv_info->pasid_info.pasid);
> +
> +			break;
> +		default:
> +			dev_err(dev, "Unsupported IOMMU invalidation type %d\n",
> +				cache_type);
> +			ret = -EINVAL;
> +		}
> +	}
> +out_unlock:
> +	spin_unlock(&iommu->lock);
> +	spin_unlock_irqrestore(&device_domain_lock, flags);
> +
> +	return ret;
> +}
> +#endif
> +
>  static int intel_iommu_map(struct iommu_domain *domain,
> 			   unsigned long iova, phys_addr_t hpa,
> 			   size_t size, int iommu_prot)
> @@ -6027,6 +6196,7 @@ const struct iommu_ops intel_iommu_ops = {
> 	.is_attach_deferred	= intel_iommu_is_attach_deferred,
> 	.pgsize_bitmap		= INTEL_IOMMU_PGSIZES,
>  #ifdef CONFIG_INTEL_IOMMU_SVM
> +	.cache_invalidate	= intel_iommu_sva_invalidate,
> 	.sva_bind_gpasid	= intel_svm_bind_gpasid,
> 	.sva_unbind_gpasid	= intel_svm_unbind_gpasid,
>  #endif
>

Thanks

Eric