From: Jacob Pan
To: iommu@lists.linux-foundation.org, LKML, Joerg Roedel, David Woodhouse, Eric Auger, Alex Williamson, Jean-Philippe Brucker
Cc: "Yi Liu", "Tian, Kevin", Raj Ashok, "Christoph Hellwig", "Lu Baolu", Andriy Shevchenko, Jacob Pan, Liu@vger.kernel.org, Yi L
Subject: [PATCH v3 14/16] iommu/vt-d: Add bind guest PASID support
Date: Fri, 3 May 2019 15:32:15 -0700
Message-Id: <1556922737-76313-15-git-send-email-jacob.jun.pan@linux.intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1556922737-76313-1-git-send-email-jacob.jun.pan@linux.intel.com>
References: <1556922737-76313-1-git-send-email-jacob.jun.pan@linux.intel.com>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

When supporting guest SVA with an emulated IOMMU, the guest PASID table
is shadowed in the VMM. Updates to the guest vIOMMU PASID table result
in a PASID cache flush, which is passed down to the host as bind guest
PASID calls.

The SL page tables are harvested from the device's default domain
(request w/o PASID), or from the aux domain in the case of a mediated
device.

    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest process CR3, FL only|
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush -
    '-------------'                       |
    |             |                       V
    |             |                CR3 in GPA
    '-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
    .-------------.  .----------------------.
    |   pIOMMU    |  | Bind FL for GVA-GPA  |
    |             |  '----------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.------------------------------.
    |             |   |SL for GPA-HPA, default domain|
    |             |   '------------------------------'
    '-------------'
Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables

Signed-off-by: Jacob Pan
Signed-off-by: Liu, Yi L
---
 drivers/iommu/intel-iommu.c |   4 +
 drivers/iommu/intel-svm.c   | 175 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/intel-iommu.h |  10 ++-
 include/linux/intel-svm.h   |   7 ++
 4 files changed, 194 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 1316c96..a10cb70 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5770,6 +5770,10 @@ const struct iommu_ops intel_iommu_ops = {
 	.dev_enable_feat	= intel_iommu_dev_enable_feat,
 	.dev_disable_feat	= intel_iommu_dev_disable_feat,
 	.pgsize_bitmap		= INTEL_IOMMU_PGSIZES,
+#ifdef CONFIG_INTEL_IOMMU_SVM
+	.sva_bind_gpasid	= intel_svm_bind_gpasid,
+	.sva_unbind_gpasid	= intel_svm_unbind_gpasid,
+#endif
 };
 
 static void quirk_iommu_g4x_gfx(struct pci_dev *dev)
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 068dd9e..0815615 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -231,6 +231,181 @@ static LIST_HEAD(global_svm_list);
 	list_for_each_entry(sdev, &svm->devs, list)	\
 	if (dev == sdev->dev)				\
 
+int intel_svm_bind_gpasid(struct iommu_domain *domain,
+			struct device *dev,
+			struct gpasid_bind_data *data)
+{
+	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
+	struct intel_svm_dev *sdev;
+	struct intel_svm *svm = NULL;
+	struct dmar_domain *ddomain;
+	int ret = 0;
+
+	if (WARN_ON(!iommu) || !data)
+		return -EINVAL;
+
+	if (dev_is_pci(dev)) {
+		/* VT-d supports devices with full 20 bit PASIDs only */
+		if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
+			return -EINVAL;
+	}
+
+	if (data->pasid <= 0 || data->pasid >= PASID_MAX)
+		return -EINVAL;
+
+	ddomain = to_dmar_domain(domain);
+	/* REVISIT:
+	 * Sanity check address width and paging mode support
+	 * width matching in two dimensions:
+	 * 1. paging mode CPU <= IOMMU
+	 * 2. address width Guest <= Host.
+	 */
+	mutex_lock(&pasid_mutex);
+	svm = ioasid_find(NULL, data->pasid, NULL);
+	if (IS_ERR(svm)) {
+		ret = PTR_ERR(svm);
+		goto out;
+	}
+	if (svm) {
+		/*
+		 * If we found svm for the PASID, there must be at
+		 * least one device bound, otherwise svm should be freed.
+		 */
+		BUG_ON(list_empty(&svm->devs));
+
+		for_each_svm_dev() {
+			/* In case of multiple sub-devices of the same pdev assigned, we should
+			 * allow multiple bind calls with the same PASID and pdev.
+			 */
+			sdev->users++;
+			goto out;
+		}
+	} else {
+		/* We come here when PASID has never been bound to a device. */
+		svm = kzalloc(sizeof(*svm), GFP_KERNEL);
+		if (!svm) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		/* REVISIT: upper layer/VFIO can track host process that binds the PASID.
+		 * ioasid_set = mm might be sufficient for vfio to check pasid VMM
+		 * ownership.
+		 */
+		svm->mm = get_task_mm(current);
+		svm->pasid = data->pasid;
+		refcount_set(&svm->refs, 0);
+		ioasid_set_data(data->pasid, svm);
+		INIT_LIST_HEAD_RCU(&svm->devs);
+		INIT_LIST_HEAD(&svm->list);
+
+		mmput(svm->mm);
+	}
+	sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
+	if (!sdev) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	sdev->dev = dev;
+	sdev->users = 1;
+
+	/* Set up device context entry for PASID if not enabled already */
+	ret = intel_iommu_enable_pasid(iommu, sdev->dev);
+	if (ret) {
+		dev_err(dev, "Failed to enable PASID capability\n");
+		kfree(sdev);
+		goto out;
+	}
+
+	/*
+	 * For guest bind, we need to set up PASID table entry as follows:
+	 * - FLPM matches guest paging mode
+	 * - turn on nested mode
+	 * - SL guest address width matching
+	 */
+	ret = intel_pasid_setup_nested(iommu,
+				dev,
+				(pgd_t *)data->gcr3,
+				data->pasid,
+				data->flags,
+				ddomain,
+				data->addr_width);
+	if (ret) {
+		dev_err(dev, "Failed to set up PASID %d in nested mode, Err %d\n",
+			data->pasid, ret);
+		kfree(sdev);
+		goto out;
+	}
+	svm->flags |= SVM_FLAG_GUEST_MODE;
+
+	init_rcu_head(&sdev->rcu);
+	refcount_inc(&svm->refs);
+	list_add_rcu(&sdev->list, &svm->devs);
+ out:
+	mutex_unlock(&pasid_mutex);
+	return ret;
+}
+
+int intel_svm_unbind_gpasid(struct device *dev, int pasid)
+{
+	struct intel_svm_dev *sdev;
+	struct intel_iommu *iommu;
+	struct intel_svm *svm;
+	int ret = -EINVAL;
+
+	mutex_lock(&pasid_mutex);
+	iommu = intel_svm_device_to_iommu(dev);
+	if (!iommu)
+		goto out;
+
+	svm = ioasid_find(NULL, pasid, NULL);
+	if (IS_ERR(svm)) {
+		ret = PTR_ERR(svm);
+		goto out;
+	}
+
+	if (!svm)
+		goto out;
+
+	for_each_svm_dev() {
+		ret = 0;
+		sdev->users--;
+		if (!sdev->users) {
+			list_del_rcu(&sdev->list);
+			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
+			/* TODO: Drain in flight PRQ for the PASID since it
+			 * may get reused soon, we don't want to
+			 * confuse with its previous life.
+			 * intel_svm_drain_prq(dev, pasid);
+			 */
+			kfree_rcu(sdev, rcu);
+
+			if (list_empty(&svm->devs)) {
+				list_del(&svm->list);
+				kfree(svm);
+				/*
+				 * We do not free PASID here until explicit call
+				 * from VFIO to free. The PASID life cycle
+				 * management is largely tied to VFIO management
+				 * of assigned device life cycles. In case of
+				 * guest exit without an explicit free PASID call,
+				 * the responsibility lies in VFIO layer to free
+				 * the PASIDs allocated for the guest.
+				 * For security reasons, VFIO has to track the
+				 * PASID ownership per guest anyway to ensure
+				 * that PASID allocated by one guest cannot be
+				 * used by another.
+				 */
+				ioasid_set_data(pasid, NULL);
+			}
+		}
+		break;
+	}
+ out:
+	mutex_unlock(&pasid_mutex);
+
+	return ret;
+}
+
 int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops)
 {
 	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 48fa164..774f368 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -677,7 +677,9 @@ int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev);
 int intel_svm_init(struct intel_iommu *iommu);
 extern int intel_svm_enable_prq(struct intel_iommu *iommu);
 extern int intel_svm_finish_prq(struct intel_iommu *iommu);
-
+extern int intel_svm_bind_gpasid(struct iommu_domain *domain,
+		struct device *dev, struct gpasid_bind_data *data);
+extern int intel_svm_unbind_gpasid(struct device *dev, int pasid);
 struct svm_dev_ops;
 
 struct intel_svm_dev {
@@ -693,12 +695,16 @@ struct intel_svm_dev {
 
 struct intel_svm {
 	struct mmu_notifier notifier;
-	struct mm_struct *mm;
+	union {
+		struct mm_struct *mm;
+		u64 gcr3;
+	};
 	struct intel_iommu *iommu;
 	int flags;
 	int pasid;
 	struct list_head devs;
 	struct list_head list;
+	refcount_t refs; /* Number of devices sharing this PASID */
 };
 
 extern struct intel_iommu *intel_svm_device_to_iommu(struct device *dev);
diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
index e3f7631..34b0a3b 100644
--- a/include/linux/intel-svm.h
+++ b/include/linux/intel-svm.h
@@ -52,6 +52,13 @@ struct svm_dev_ops {
  * do such IOTLB flushes automatically.
  */
 #define SVM_FLAG_SUPERVISOR_MODE	(1<<1)
+/*
+ * The SVM_FLAG_GUEST_MODE flag is used when a guest process binds to a device.
+ * In this case the mm_struct is in the guest kernel or userspace, its life
+ * cycle is managed by VMM and VFIO layer. For IOMMU driver, this API provides
+ * means to bind/unbind guest CR3 with PASIDs allocated for a device.
+ */
+#define SVM_FLAG_GUEST_MODE	(1<<2)
 
 #ifdef CONFIG_INTEL_IOMMU_SVM
 
-- 
2.7.4
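
Illustration (not part of the patch): the per-device "users" counting
that the bind/unbind pair above relies on can be modelled in user space
roughly as below. This is only a sketch under stated assumptions. Every
model_* name, the fixed-size table and the integer device id are
hypothetical stand-ins for struct intel_svm_dev, the svm->devs list and
struct device, and no locking, RCU or hardware programming is modelled.
It shows three behaviours taken from the patch: the PASID range check,
a repeated bind of the same (device, PASID) pair bumping a user count
instead of creating a second entry, and teardown happening only when
the last user unbinds.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_PASID_MAX    (1 << 20)  /* 20-bit PASID space, as VT-d uses */
#define MODEL_MAX_BINDINGS 16         /* hypothetical fixed table size */

/* Hypothetical stand-in for one struct intel_svm_dev entry. */
struct model_binding {
	int pasid;   /* 0 marks an unused slot; PASID 0 is never bound */
	int dev_id;  /* stand-in for struct device * */
	int users;   /* number of outstanding binds for this pair */
};

static struct model_binding model_table[MODEL_MAX_BINDINGS];

/* Look up the slot for (dev_id, pasid), or NULL if it is not bound. */
static struct model_binding *model_find(int dev_id, int pasid)
{
	size_t i;

	if (pasid <= 0)
		return NULL;
	for (i = 0; i < MODEL_MAX_BINDINGS; i++) {
		if (model_table[i].pasid == pasid &&
		    model_table[i].dev_id == dev_id)
			return &model_table[i];
	}
	return NULL;
}

/* Bind: reject out-of-range PASIDs, reuse an existing entry if present. */
int model_bind_gpasid(int dev_id, int pasid)
{
	struct model_binding *b;
	size_t i;

	if (pasid <= 0 || pasid >= MODEL_PASID_MAX)
		return -EINVAL;

	b = model_find(dev_id, pasid);
	if (b) {
		b->users++;  /* same PASID and device bound again */
		return 0;
	}

	for (i = 0; i < MODEL_MAX_BINDINGS; i++) {
		if (model_table[i].pasid == 0) {
			model_table[i].pasid = pasid;
			model_table[i].dev_id = dev_id;
			model_table[i].users = 1;
			return 0;
		}
	}
	return -ENOMEM;  /* table full; the kernel code allocates instead */
}

/* Unbind: drop one user, release the slot when the count reaches zero. */
int model_unbind_gpasid(int dev_id, int pasid)
{
	struct model_binding *b = model_find(dev_id, pasid);

	if (!b)
		return -EINVAL;

	assert(b->users > 0);
	if (--b->users == 0) {
		/* Where the driver would tear down the PASID entry. */
		b->pasid = 0;
		b->dev_id = 0;
	}
	return 0;
}

/* Report the current user count for (dev_id, pasid); 0 if not bound. */
int model_users(int dev_id, int pasid)
{
	struct model_binding *b = model_find(dev_id, pasid);

	return b ? b->users : 0;
}
```

Binding device 1 to PASID 5 twice and then unbinding twice leaves the
slot free again, and a third unbind returns -EINVAL, which mirrors the
initial value of ret in intel_svm_unbind_gpasid().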