Date: Thu, 13 Sep 2018 09:55:21 -0700
From: "Raj, Ashok"
To: Jean-Philippe Brucker
Cc: "Tian, Kevin", Lu Baolu, Joerg Roedel, David Woodhouse,
    Alex Williamson, Kirti Wankhede, "Bie, Tiwei", "Kumar, Sanjay K",
    iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
    "Sun, Yi Y", "Pan, Jacob jun", kvm@vger.kernel.org, Ashok Raj
Subject: Re: [RFC PATCH v2 00/10] vfio/mdev: IOMMU aware mediated device
Message-ID: <20180913165520.GA14731@otc-nc-03>
References: <20180830040922.30426-1-baolu.lu@linux.intel.com>
 <380dc154-5d72-0085-2056-fa466789e1ab@arm.com>
 <3602f8c1-df17-4894-1bcc-4d779f9aa7fd@arm.com>
 <03d496b0-84c2-b3ca-5be5-d4540c6d8ec7@arm.com>
In-Reply-To: <03d496b0-84c2-b3ca-5be5-d4540c6d8ec7@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Sep 13, 2018 at 04:03:01PM +0100, Jean-Philippe Brucker wrote:
> On 13/09/2018 01:19, Tian, Kevin wrote:
> >>> This is proposed for architectures which support finer-granularity
> >>> second-level translation, with no impact on architectures which
> >>> only support Source ID or similar granularity.
> >>
> >> Just to be clear, in this paragraph you're only referring to the
> >> nested/second-level translation for mdev, which is specific to vt-d
> >> rev3? Other architectures can still do first-level translation with
> >> PASID, to support some use cases of IOMMU-aware mediated devices
> >> (assigning mdevs to userspace drivers, for example).
> >
> > Yes. The aux domain concept applies only to vt-d rev3, which
> > introduces scalable mode. Care is taken to avoid breaking usages on
> > existing architectures.
> >
> > One note: assigning mdevs to user space alone doesn't imply IOMMU
> > awareness. All existing mdev usages use software or proprietary
> > methods to isolate DMA. There is only one potential IOMMU-aware mdev
> > usage we have discussed that does not rely on vt-d rev3 scalable
> > mode: wrapping a random PCI device into a single mdev instance (no
> > sharing). In that case the mdev inherits the RID of its parent PCI
> > device, and is thus isolated by the IOMMU at RID granularity. Our
> > RFC supports this usage too. In VFIO the two usages (PASID-based and
> > RID-based) use the same code path, i.e. always binding the domain to
> > the parent device of the mdev. But within the IOMMU they go down
> > different paths. PASID-based will go to the aux domain, as
> > iommu_enable_aux_domain has been called on that device.
> > RID-based will follow the existing unmanaged domain path, as if it
> > were a parent device assignment.

> For Arm SMMU we're more interested in the PASID-granular case than
> the RID-granular one. It doesn't necessarily require vt-d rev3
> scalable mode; the following example can be implemented with an
> SMMUv3, since it only needs PASID-granular first-level translation:

You are right: you can simply use the first level as IOVA for every
PASID. The only issue arises when you need to assign that to a guest,
because then you would be required to shadow the first level. If you
have a second level per PASID, the first level can be managed in the
guest and does not need to be shadowed.

> We have a PCI function that supports PASID, and can be partitioned
> into multiple isolated entities, mdevs. Each mdev has an MMIO frame,
> an MSI vector and a PASID.
>
> Different processes (userspace drivers, not QEMU) each open one mdev.
> A process controlling one mdev has two ways of doing DMA:
>
> (1) Classically, the process uses a VFIO_TYPE1v2_IOMMU container.
> This creates an auxiliary domain for the mdev, with PASID #35. The
> process creates DMA mappings with VFIO_IOMMU_MAP_DMA. VFIO calls
> iommu_map on the auxiliary domain. The IOMMU driver populates the
> pgtables associated with PASID #35.
>
> (2) SVA. One way of doing it: the process uses a new
> "VFIO_TYPE1_SVA_IOMMU" type of container. VFIO binds the process
> address space to the device, gets PASID #35. Simpler, but not
> everyone wants to use SVA, especially not userspace drivers which
> need the highest performance.
>
> This example only needs to modify first-level translation, and works
> with SMMUv3. The kernel here could be the host, in which case
> second-level translation is disabled in the SMMU, or it could be the
> guest, in which case second-level mappings are created by QEMU and
> first-level translation is managed by assigning PASID tables to the
> guest.
>
> So (2) would use iommu_sva_bind_device(), but (1) needs something
> else.
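For concreteness, flow (1) might look roughly like this on the kernel
side. This is a sketch only: iommu_enable_aux_domain() is the name used
in this RFC, and iommu_aux_get_pasid() is a hypothetical helper for
reading back the PASID the IOMMU driver allocated; error handling is
omitted.

```
/* Parent PF opts in to PASID-granular (aux domain) isolation */
iommu_enable_aux_domain(parent_dev);

/* One auxiliary domain per mdev; attaching allocates a PASID (#35) */
struct iommu_domain *dom = iommu_domain_alloc(parent_dev->bus);
iommu_attach_device(dom, mdev_dev);               /* aux-domain path */
int pasid = iommu_aux_get_pasid(dom, parent_dev); /* hypothetical */

/* VFIO_IOMMU_MAP_DMA lands here: populate pgtables for that PASID */
iommu_map(dom, iova, paddr, size, IOMMU_READ | IOMMU_WRITE);
```

Whether those pgtables are first-level (SMMUv3, as described above) or
second-level (vt-d rev3 aux domain) would be an IOMMU-driver detail
invisible to VFIO.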
> Aren't auxiliary domains suitable for (1)? Why limit auxiliary
> domains to second-level or nested translation? It seems silly to use
> a different API for first-level, since the flow in userspace and VFIO
> is the same as in your second-level case as far as the MAP_DMA ioctl
> goes. The difference is that in your case the auxiliary domain
> supports an additional operation which binds first-level page tables.
> An auxiliary domain that only supports first-level wouldn't support
> this operation, but it can still implement iommu_map/unmap/etc.
>
> Another note: if for some reason you did want to allow userspace to
> choose between first-level and second-level, you could implement the
> VFIO_TYPE1_NESTING_IOMMU container. It acts like a VFIO_TYPE1v2_IOMMU,
> but also sets DOMAIN_ATTR_NESTING on the IOMMU domain. So a DMA_MAP
> ioctl on a NESTING container would populate the second level, and
> DMA_MAP on a normal container would populate the first level. But if
> you're always going to use the second level by default, the
> distinction isn't necessary.

Where is the nesting attribute specified? In vt-d2 it was part of the
context entry, so it also meant that all PASIDs were nested. In vt-d3
it is part of the PASID context. It seems unsafe to share PASIDs
between different VMs, since any request without a PASID has only one
mapping.

> >> Sounds good, I'll drop the private PASID patch if we can figure
> >> out a solution to the attach/detach_dev problem discussed on patch
> >> 8/10
> >
> > Can you elaborate a bit on private PASID usage? What is the
> > high-level flow for it?
> >
> > Again, based on the earlier explanation, aux domains are specific
> > to IOMMU architectures supporting a vt-d scalable-mode-like
> > capability, which allows separate 2nd/1st-level translations per
> > PASID. Need a better understanding of how private PASIDs are
> > relevant here.
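From userspace, the choice Jean-Philippe describes would be made
entirely at VFIO_SET_IOMMU time; the MAP_DMA ioctl itself is identical
on both container types. A sketch against the existing VFIO uAPI
(group/device setup and error handling omitted, `buf` assumed to be a
page-aligned allocation):

```
int container = open("/dev/vfio/vfio", O_RDWR);

/* Normal container: DMA_MAP populates the first (or only) level */
ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1v2_IOMMU);
/* NESTING container instead: the kernel sets DOMAIN_ATTR_NESTING on
 * the domain, so the same DMA_MAP populates the second level:
 *     ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU);
 */

struct vfio_iommu_type1_dma_map map = {
	.argsz = sizeof(map),
	.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
	.vaddr = (__u64)(uintptr_t)buf,
	.iova  = 0x100000,
	.size  = 0x10000,
};
ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
```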
> Private PASIDs are used for doing iommu_map/iommu_unmap on PASIDs
> (first-level translation):
> https://www.spinics.net/lists/dri-devel/msg177003.html
>
> As above, some people don't want SVA, some can't do it, and some may
> even want a few private address spaces just for their kernel driver.
> They need a way to allocate PASIDs and do iommu_map/iommu_unmap on
> them, without binding to a process. I was planning to add the private
> PASID patch to my SVA series, but in my opinion the feature overlaps
> with auxiliary domains.

It sounds like that maps to aux domains.
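If private PASIDs are indeed expressed as auxiliary domains, a kernel
driver wanting an extra private address space would do roughly the
following. This is a sketch: the names follow the aux-domain direction
of this RFC, iommu_aux_get_pasid() is a hypothetical helper, and the
final API may differ.

```
struct iommu_domain *dom = iommu_domain_alloc(dev->bus);
iommu_attach_device(dom, dev);            /* aux path: allocates a PASID */
int pasid = iommu_aux_get_pasid(dom, dev);   /* hypothetical helper */

/* Program the device to tag its DMA with 'pasid', then map and unmap
 * at will, without ever binding an mm: */
iommu_map(dom, iova, phys, size, IOMMU_READ | IOMMU_WRITE);
iommu_unmap(dom, iova, size);

iommu_detach_device(dom, dev);
iommu_domain_free(dom);
```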