Received: by 10.192.165.156 with SMTP id m28csp408821imm; Tue, 17 Apr 2018 12:12:58 -0700 (PDT) X-Google-Smtp-Source: AIpwx489MRkFXpa5acodi3e7X9z3+Qb3wvs/aM180EBWPVo26OaUH5y8GTf+Q+V3FocCZ5mlkcuY X-Received: by 10.98.8.12 with SMTP id c12mr3068381pfd.77.1523992378397; Tue, 17 Apr 2018 12:12:58 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1523992378; cv=none; d=google.com; s=arc-20160816; b=oc45G1qTjnFX2JP4BwlRAOtE97ZAchGaUeryUPuo6cJt0B3l9eWnv1bIRxCpEIFKzO RZ0OXL/2SffPwlIwR53qG4SkSJGHkOAAca9kMPTsjU4jEjpB9ltig2c2LvWAL823fijz 7TxVXw/Ph7OIqzc+r5Q8Psrkp9XlU8GACjXU/Uv9g7nMTid6X9JHVHuqdPeMClOGatkO qg+6BNArPsWpgS9EFBg+WRdZO50sru8/enCbJS3SRPmMVQ03haG54z6bvyR7nTZVPXdj 7cRoEyjb9gQC0mi2fovcqJWboso50R0VaKfZr4tM7W//WqxTJoXZIyQCkh6511Kl4X7V 5i4A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:subject:cc:to:from:date :arc-authentication-results; bh=528SzXtl3kv6MAaZWvs1xuZm/1EvjF97Aa/Ka48AWbs=; b=iQDn1JkybA0PKDqp/G4ytFF7Gpqmq4B8s8PMdkNgAkOAwWY5MsVqdAQWBg0Um5DSEF 8O3yd//fRapX5DHZO9d1miSeRYLsEXcTuyQJYtjLRf0pT/R/aetHsddzBMtH1ZNB1H2/ MNX0J5+5VsArJ6G4pgW5SxPgFOQLWvWllXtWm1QzDGjUy5HcREwax5GHFSeX2nll0Szo TZJcENs1mr/kvReTNOsoLlUQtf8nszBLISZfM5Hh1phojgrOUqJACuO7wxFlsBqToaJk SIYskBnaoVqumHJ4wZjxhltRUkZFWssb/2egHfyddSG1DckUqJ6+waFLUSEGN9FbWghi SYzQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g17si5323942pgv.372.2018.04.17.12.12.43; Tue, 17 Apr 2018 12:12:58 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752699AbeDQTLK (ORCPT + 99 others); Tue, 17 Apr 2018 15:11:10 -0400 Received: from mx1.redhat.com ([209.132.183.28]:37572 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752511AbeDQTLH (ORCPT ); Tue, 17 Apr 2018 15:11:07 -0400 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 2C86031327DB; Tue, 17 Apr 2018 19:11:07 +0000 (UTC) Received: from w520.home (ovpn-116-103.phx2.redhat.com [10.3.116.103]) by smtp.corp.redhat.com (Postfix) with ESMTP id 28F896266D; Tue, 17 Apr 2018 19:11:06 +0000 (UTC) Date: Tue, 17 Apr 2018 13:10:47 -0600 From: Alex Williamson To: Jacob Pan Cc: iommu@lists.linux-foundation.org, LKML , Joerg Roedel , David Woodhouse , Greg Kroah-Hartman , Jean-Philippe Brucker , Rafael Wysocki , "Liu, Yi L" , "Tian, Kevin" , Raj Ashok , Jean Delvare , "Christoph Hellwig" , "Lu Baolu" , Yi L Subject: Re: [PATCH v4 04/22] iommu/vt-d: add bind_pasid_table function Message-ID: <20180417131047.0a9c310f@w520.home> In-Reply-To: <1523915351-54415-5-git-send-email-jacob.jun.pan@linux.intel.com> References: <1523915351-54415-1-git-send-email-jacob.jun.pan@linux.intel.com> <1523915351-54415-5-git-send-email-jacob.jun.pan@linux.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.47]); Tue, 17 Apr 2018 19:11:07 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 16 Apr 2018 14:48:53 -0700 Jacob Pan wrote: > Add Intel VT-d ops to the generic iommu_bind_pasid_table API > functions. > > The primary use case is for direct assignment of SVM capable > device. Originated from emulated IOMMU in the guest, the request goes > through many layers (e.g. VFIO). Upon calling host IOMMU driver, caller > passes guest PASID table pointer (GPA) and size. > > Device context table entry is modified by Intel IOMMU specific > bind_pasid_table function. This will turn on nesting mode and matching > translation type. > > The unbind operation restores default context mapping. > > Signed-off-by: Jacob Pan > Signed-off-by: Liu, Yi L > Signed-off-by: Ashok Raj > --- > drivers/iommu/intel-iommu.c | 119 ++++++++++++++++++++++++++++++++++++++++++ > include/linux/dma_remapping.h | 1 + > 2 files changed, 120 insertions(+) > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c > index a0f81a4..d8058be 100644 > --- a/drivers/iommu/intel-iommu.c > +++ b/drivers/iommu/intel-iommu.c > @@ -2409,6 +2409,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu, > info->ats_supported = info->pasid_supported = info->pri_supported = 0; > info->ats_enabled = info->pasid_enabled = info->pri_enabled = 0; > info->ats_qdep = 0; > + info->pasid_table_bound = 0; > info->dev = dev; > info->domain = domain; > info->iommu = iommu; > @@ -5132,6 +5133,7 @@ static void intel_iommu_put_resv_regions(struct device *dev, > > #ifdef CONFIG_INTEL_IOMMU_SVM > #define MAX_NR_PASID_BITS (20) > +#define MIN_NR_PASID_BITS (5) > static inline unsigned long intel_iommu_get_pts(struct intel_iommu *iommu) > { > /* > @@ -5258,6 +5260,119 @@ struct intel_iommu *intel_svm_device_to_iommu(struct device *dev) > > return iommu; > } > + > +static int intel_iommu_bind_pasid_table(struct iommu_domain *domain, > + struct device *dev, struct pasid_table_config *pasidt_binfo) > +{ Never validates pasidt_binfo->{version,bytes} > + struct intel_iommu *iommu; > + struct context_entry *context; > + struct dmar_domain *dmar_domain = to_dmar_domain(domain); > + struct device_domain_info *info; > + struct pci_dev *pdev; > + u8 bus, devfn, host_table_pasid_bits; > + u16 did, sid; > + int ret = 0; > + unsigned long flags; > + u64 ctx_lo; > + > + iommu = device_to_iommu(dev, &bus, &devfn); > + if (!iommu) > + return -ENODEV; > + /* VT-d spec section 9.4 says pasid table size is encoded as 2^(x+5) */ > + host_table_pasid_bits = intel_iommu_get_pts(iommu) + MIN_NR_PASID_BITS; > + if (!pasidt_binfo || pasidt_binfo->pasid_bits > host_table_pasid_bits || > + pasidt_binfo->pasid_bits < MIN_NR_PASID_BITS) { > + pr_err("Invalid gPASID bits %d, host range %d - %d\n", > + pasidt_binfo->pasid_bits, > + MIN_NR_PASID_BITS, host_table_pasid_bits); > + return -ERANGE; > + } > + if (!ecap_nest(iommu->ecap)) { > + dev_err(dev, "Cannot bind PASID table, no nested translation\n"); > + ret = -EINVAL; > + goto out; > + } Gratuitous use of pr_err, some of these look user reachable, for instance can vfio know in advance the supported widths or can the user trigger that pr_err at will? Some of these errno values are also maybe not as descriptive as they could be. For instance if the iommu doesn't support nesting, that's not a calling argument error, that's an unsupported device error, right? > + pdev = to_pci_dev(dev); > + sid = PCI_DEVID(bus, devfn); > + info = dev->archdata.iommu; > + > + if (!info) { > + dev_err(dev, "Invalid device domain info\n"); > + ret = -EINVAL; > + goto out; > + } > + if (info->pasid_table_bound) { > + dev_err(dev, "Device PASID table already bound\n"); > + ret = -EBUSY; > + goto out; > + } > + if (!info->pasid_enabled) { > + ret = pci_enable_pasid(pdev, info->pasid_supported & ~1); > + if (ret) { > + dev_err(dev, "Failed to enable PASID\n"); > + goto out; > + } > + } > + spin_lock_irqsave(&iommu->lock, flags); > + context = iommu_context_addr(iommu, bus, devfn, 0); > + if (!context_present(context)) { > + dev_err(dev, "Context not present\n"); > + ret = -EINVAL; > + goto out_unlock; > + } > + > + /* Anticipate guest to use SVM and owns the first level, so we turn > + * nested mode on > + */ > + ctx_lo = context[0].lo; > + ctx_lo |= CONTEXT_NESTE | CONTEXT_PRS | CONTEXT_PASIDE; > + ctx_lo &= ~CONTEXT_TT_MASK; > + ctx_lo |= CONTEXT_TT_DEV_IOTLB << 2; > + context[0].lo = ctx_lo; > + > + /* Assign guest PASID table pointer and size order */ > + ctx_lo = (pasidt_binfo->base_ptr & VTD_PAGE_MASK) | > + (pasidt_binfo->pasid_bits - MIN_NR_PASID_BITS); Where does this IOMMU API interface define that base_ptr is 4K aligned or the format of the PASID table? Are these all standardized or do they vary by host IOMMU? If they're standards, maybe we could note that and the spec which defines them when we declare base_ptr. If they're IOMMU specific then I don't understand how we'll match a user provided PASID table to the requirements and format of the host IOMMU. Thanks, Alex > + context[1].lo = ctx_lo; > + /* make sure context entry is updated before flushing */ > + wmb(); > + did = dmar_domain->iommu_did[iommu->seq_id]; > + iommu->flush.flush_context(iommu, did, > + (((u16)bus) << 8) | devfn, > + DMA_CCMD_MASK_NOBIT, > + DMA_CCMD_DEVICE_INVL); > + iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH); > + info->pasid_table_bound = 1; > +out_unlock: > + spin_unlock_irqrestore(&iommu->lock, flags); > +out: > + return ret; > +} > + > +static void intel_iommu_unbind_pasid_table(struct iommu_domain *domain, > + struct device *dev) > +{ > + struct intel_iommu *iommu; > + struct dmar_domain *dmar_domain = to_dmar_domain(domain); > + struct device_domain_info *info; > + u8 bus, devfn; > + > + info = dev->archdata.iommu; > + if (!info) { > + dev_err(dev, "Invalid device domain info\n"); > + return; > + } > + iommu = device_to_iommu(dev, &bus, &devfn); > + if (!iommu) { > + dev_err(dev, "No IOMMU for device to unbind PASID table\n"); > + return; > + } > + > + domain_context_clear(iommu, dev); > + > + domain_context_mapping_one(dmar_domain, iommu, bus, devfn); > + info->pasid_table_bound = 0; > +} > #endif /* CONFIG_INTEL_IOMMU_SVM */ > > const struct iommu_ops intel_iommu_ops = { > @@ -5266,6 +5381,10 @@ const struct iommu_ops intel_iommu_ops = { > .domain_free = intel_iommu_domain_free, > .attach_dev = intel_iommu_attach_device, > .detach_dev = intel_iommu_detach_device, > +#ifdef CONFIG_INTEL_IOMMU_SVM > + .bind_pasid_table = intel_iommu_bind_pasid_table, > + .unbind_pasid_table = intel_iommu_unbind_pasid_table, > +#endif > .map = intel_iommu_map, > .unmap = intel_iommu_unmap, > .map_sg = default_iommu_map_sg, > diff --git a/include/linux/dma_remapping.h b/include/linux/dma_remapping.h > index 21b3e7d..db290b2 100644 > --- a/include/linux/dma_remapping.h > +++ b/include/linux/dma_remapping.h > @@ -28,6 +28,7 @@ > > #define CONTEXT_DINVE (1ULL << 8) > #define CONTEXT_PRS (1ULL << 9) > +#define CONTEXT_NESTE (1ULL << 10) > #define CONTEXT_PASIDE (1ULL << 11) > > struct intel_iommu;