From: "Tian, Kevin"
To: Lu Baolu, Joerg Roedel, "David Woodhouse"
CC: "Raj, Ashok", "Kumar, Sanjay K", "Pan, Jacob jun", "Liu, Yi L", "Sun, Yi Y", "peterx@redhat.com", Jean-Philippe Brucker, "iommu@lists.linux-foundation.org", "linux-kernel@vger.kernel.org", Jacob Pan
Subject: RE: [PATCH v2 06/12] iommu/vt-d: Add second level page table interface
Date: Thu, 6 Sep 2018 03:11:50 +0000
References: <20180830013524.28743-1-baolu.lu@linux.intel.com> <20180830013524.28743-7-baolu.lu@linux.intel.com>
In-Reply-To: <20180830013524.28743-7-baolu.lu@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
> Sent: Thursday, August 30, 2018 9:35 AM
>
> This adds the interfaces to setup or tear down the structures
> for second level page table translations. This includes types
> of second level only translation and pass through.
>
> Cc: Ashok Raj
> Cc: Jacob Pan
> Cc: Kevin Tian
> Cc: Liu Yi L
> Signed-off-by: Sanjay Kumar
> Signed-off-by: Lu Baolu
> Reviewed-by: Ashok Raj
> ---
>  drivers/iommu/intel-iommu.c |   2 +-
>  drivers/iommu/intel-pasid.c | 246
> ++++++++++++++++++++++++++++++++++++
>  drivers/iommu/intel-pasid.h |   7 +
>  include/linux/intel-iommu.h |   3 +
>  4 files changed, 257 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 562da10bf93e..de6b909bb47a 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1232,7 +1232,7 @@ static void iommu_set_root_entry(struct
> intel_iommu *iommu)
>  	raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
>  }
>
> -static void iommu_flush_write_buffer(struct intel_iommu *iommu)
> +void iommu_flush_write_buffer(struct intel_iommu *iommu)
>  {
>  	u32 val;
>  	unsigned long flag;
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index d6e90cd5b062..edcea1d8b9fc 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -9,6 +9,7 @@
>
>  #define pr_fmt(fmt)	"DMAR: " fmt
>
> +#include
>  #include
>  #include
>  #include
> @@ -291,3 +292,248 @@ void intel_pasid_clear_entry(struct device *dev,
> int pasid)
>
>  	pasid_clear_entry(pe);
>  }
> +
> +static inline void pasid_set_bits(u64 *ptr, u64 mask, u64 bits)
> +{
> +	u64 old;
> +
> +	old = READ_ONCE(*ptr);
> +	WRITE_ONCE(*ptr, (old & ~mask) | bits);
> +}
> +
> +/*
> + * Setup the DID(Domain Identifier) field (Bit 64~79) of scalable mode
> + * PASID entry.
> + */
> +static inline void
> +pasid_set_domain_id(struct pasid_entry *pe, u64 value)
> +{
> +	pasid_set_bits(&pe->val[1], GENMASK_ULL(15, 0), value);
> +}
> +
> +/*
> + * Setup the SLPTPTR(Second Level Page Table Pointer) field (Bit 12~63)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_address_root(struct pasid_entry *pe, u64 value)

Is address_root too general, especially when the entry could contain
both 1st level and 2nd level pointers?

> +{
> +	pasid_set_bits(&pe->val[0], VTD_PAGE_MASK, value);
> +}
> +
> +/*
> + * Setup the AW(Address Width) field (Bit 2~4) of a scalable mode PASID
> + * entry.
> + */
> +static inline void
> +pasid_set_address_width(struct pasid_entry *pe, u64 value)
> +{
> +	pasid_set_bits(&pe->val[0], GENMASK_ULL(4, 2), value << 2);
> +}
> +
> +/*
> + * Setup the PGTT(PASID Granular Translation Type) field (Bit 6~8)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_translation_type(struct pasid_entry *pe, u64 value)
> +{
> +	pasid_set_bits(&pe->val[0], GENMASK_ULL(8, 6), value << 6);
> +}
> +
> +/*
> + * Enable fault processing by clearing the FPD(Fault Processing
> + * Disable) field (Bit 1) of a scalable mode PASID entry.
> + */
> +static inline void pasid_set_fault_enable(struct pasid_entry *pe)
> +{
> +	pasid_set_bits(&pe->val[0], 1 << 1, 0);
> +}
> +
> +/*
> + * Setup the SRE(Supervisor Request Enable) field (Bit 128) of a
> + * scalable mode PASID entry.
> + */
> +static inline void pasid_set_sre(struct pasid_entry *pe)
> +{
> +	pasid_set_bits(&pe->val[2], 1 << 0, 1);
> +}
> +
> +/*
> + * Setup the P(Present) field (Bit 0) of a scalable mode PASID
> + * entry.
> + */
> +static inline void pasid_set_present(struct pasid_entry *pe)
> +{
> +	pasid_set_bits(&pe->val[0], 1 << 0, 1);
> +}

It's a long list and there could be more in the future. What about
defining a macro to cut down the LOC, e.g.

#define PASID_SET(name, i, m, b)				\
static inline void pasid_set_##name(struct pasid_entry *pe)	\
{								\
	pasid_set_bits(&pe->val[i], m, b);			\
}

PASID_SET(present, 0, 1<<0, 1);
PASID_SET(sre, 2, 1<<0, 1);
...

> +
> +/*
> + * Setup Page Walk Snoop bit (Bit 87) of a scalable mode PASID
> + * entry.
> + */
> +static inline void pasid_set_page_snoop(struct pasid_entry *pe, bool value)
> +{
> +	pasid_set_bits(&pe->val[1], 1 << 23, value);
> +}
> +
> +static void
> +pasid_based_pasid_cache_invalidation(struct intel_iommu *iommu,
> +				     int did, int pasid)

pasid_cache_invalidation_with_pasid

> +{
> +	struct qi_desc desc;
> +
> +	desc.qw0 = QI_PC_DID(did) | QI_PC_PASID_SEL |
> QI_PC_PASID(pasid);
> +	desc.qw1 = 0;
> +	desc.qw2 = 0;
> +	desc.qw3 = 0;
> +
> +	qi_submit_sync(&desc, iommu);
> +}
> +
> +static void
> +pasid_based_iotlb_cache_invalidation(struct intel_iommu *iommu,
> +				     u16 did, u32 pasid)

iotlb_invalidation_with_pasid

> +{
> +	struct qi_desc desc;
> +
> +	desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
> +			QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) |
> QI_EIOTLB_TYPE;
> +	desc.qw1 = 0;
> +	desc.qw2 = 0;
> +	desc.qw3 = 0;
> +
> +	qi_submit_sync(&desc, iommu);
> +}
> +
> +static void
> +pasid_based_dev_iotlb_cache_invalidation(struct intel_iommu *iommu,
> +					 struct device *dev, int pasid)

devtlb_invalidation_with_pasid

> +{
> +	struct device_domain_info *info;
> +	u16 sid, qdep, pfsid;
> +
> +	info = dev->archdata.iommu;
> +	if (!info || !info->ats_enabled)
> +		return;
> +
> +	sid = info->bus << 8 | info->devfn;
> +	qdep = info->ats_qdep;
> +	pfsid = info->pfsid;
> +
> +	qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 -
> VTD_PAGE_SHIFT);
> +}
> +
> +static void tear_down_one_pasid_entry(struct intel_iommu *iommu,
> +				      struct device *dev, u16 did,
> +				      int pasid)
> +{
> +	struct pasid_entry *pte;

ptep

> +
> +	intel_pasid_clear_entry(dev, pasid);
> +
> +	if (!ecap_coherent(iommu->ecap)) {
> +		pte = intel_pasid_get_entry(dev, pasid);
> +		clflush_cache_range(pte, sizeof(*pte));
> +	}
> +
> +	pasid_based_pasid_cache_invalidation(iommu, did, pasid);
> +	pasid_based_iotlb_cache_invalidation(iommu, did, pasid);
> +
> +	/* Device IOTLB doesn't need to be flushed in caching mode. */
> +	if (!cap_caching_mode(iommu->cap))
> +		pasid_based_dev_iotlb_cache_invalidation(iommu, dev,
> pasid);

can you elaborate, or point to any spec reference?

> +}
> +
> +/*
> + * Set up the scalable mode pasid table entry for second only or
> + * passthrough translation type.
> + */
> +int intel_pasid_setup_second_level(struct intel_iommu *iommu,

second_level doesn't imply passthrough. what about
intel_pasid_setup_common, which is then invoked by SL or PT
individually (or even FL)?

> +				   struct dmar_domain *domain,
> +				   struct device *dev, int pasid,
> +				   bool pass_through)
> +{
> +	struct pasid_entry *pte;
> +	struct dma_pte *pgd;
> +	u64 pgd_val;
> +	int agaw;
> +	u16 did;
> +
> +	/*
> +	 * If hardware advertises no support for second level translation,
> +	 * we only allow pass through translation setup.
> +	 */
> +	if (!(ecap_slts(iommu->ecap) || pass_through)) {
> +		pr_err("No first level translation support on %s, only pass-

first->second

> through mode allowed\n",
> +			iommu->name);
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * Skip top levels of page tables for iommu which has less agaw

skip doesn't mean error

> +	 * than default. Unnecessary for PT mode.
> +	 */
> +	pgd = domain->pgd;
> +	if (!pass_through) {
> +		for (agaw = domain->agaw; agaw != iommu->agaw; agaw--)
> {
> +			pgd = phys_to_virt(dma_pte_addr(pgd));
> +			if (!dma_pte_present(pgd)) {
> +				dev_err(dev, "Invalid domain page table\n");
> +				return -EINVAL;
> +			}
> +		}
> +	}
> +	pgd_val = pass_through ? 0 : virt_to_phys(pgd);
> +	did = pass_through ? FLPT_DEFAULT_DID :
> +			domain->iommu_did[iommu->seq_id];
> +
> +	pte = intel_pasid_get_entry(dev, pasid);
> +	if (!pte) {
> +		dev_err(dev, "Failed to get pasid entry of PASID %d\n",
> pasid);
> +		return -ENODEV;
> +	}
> +
> +	pasid_clear_entry(pte);
> +	pasid_set_domain_id(pte, did);
> +
> +	if (!pass_through)
> +		pasid_set_address_root(pte, pgd_val);
> +
> +	pasid_set_address_width(pte, iommu->agaw);
> +	pasid_set_translation_type(pte, pass_through ? 4 : 2);
> +	pasid_set_fault_enable(pte);
> +	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
> +
> +	/*
> +	 * Since it is a second level only translation setup, we should
> +	 * set SRE bit as well (addresses are expected to be GPAs).
> +	 */
> +	pasid_set_sre(pte);
> +	pasid_set_present(pte);
> +
> +	if (!ecap_coherent(iommu->ecap))
> +		clflush_cache_range(pte, sizeof(*pte));
> +
> +	if (cap_caching_mode(iommu->cap)) {
> +		pasid_based_pasid_cache_invalidation(iommu, did, pasid);
> +		pasid_based_iotlb_cache_invalidation(iommu, did, pasid);
> +	} else {
> +		iommu_flush_write_buffer(iommu);
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Tear down the scalable mode pasid table entry for second only or
> + * passthrough translation type.
> + */
> +void intel_pasid_tear_down_second_level(struct intel_iommu *iommu,
> +					struct dmar_domain *domain,
> +					struct device *dev, int pasid)
> +{
> +	u16 did = domain->iommu_did[iommu->seq_id];
> +
> +	tear_down_one_pasid_entry(iommu, dev, did, pasid);
> +}
> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
> index 03c1612d173c..85b158a1826a 100644
> --- a/drivers/iommu/intel-pasid.h
> +++ b/drivers/iommu/intel-pasid.h
> @@ -49,5 +49,12 @@ struct pasid_table *intel_pasid_get_table(struct
> device *dev);
>  int intel_pasid_get_dev_max_id(struct device *dev);
>  struct pasid_entry *intel_pasid_get_entry(struct device *dev, int pasid);
>  void intel_pasid_clear_entry(struct device *dev, int pasid);
> +int intel_pasid_setup_second_level(struct intel_iommu *iommu,
> +				   struct dmar_domain *domain,
> +				   struct device *dev, int pasid,
> +				   bool pass_through);
> +void intel_pasid_tear_down_second_level(struct intel_iommu *iommu,
> +					struct dmar_domain *domain,
> +					struct device *dev, int pasid);
>
>  #endif /* __INTEL_PASID_H */
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 72aff482b293..d77d23dfd221 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -115,6 +115,8 @@
>   * Extended Capability Register
>   */
>
> +#define ecap_smpwc(e)		(((e) >> 48) & 0x1)
> +#define ecap_slts(e)		(((e) >> 46) & 0x1)
>  #define ecap_smts(e)		(((e) >> 43) & 0x1)
>  #define ecap_dit(e)		((e >> 41) & 0x1)
>  #define ecap_pasid(e)		((e >> 40) & 0x1)
> @@ -571,6 +573,7 @@ void free_pgtable_page(void *vaddr);
>  struct intel_iommu *domain_get_iommu(struct dmar_domain *domain);
>  int for_each_device_domain(int (*fn)(struct device_domain_info *info,
>  				     void *data), void *data);
> +void iommu_flush_write_buffer(struct intel_iommu *iommu);
>
>  #ifdef CONFIG_INTEL_IOMMU_SVM
>  int intel_svm_init(struct intel_iommu *iommu);
> --
> 2.17.1
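
For reference, the macro idea raised in the review above could look roughly like the user-space sketch below. It is only an illustration, not the code that was merged. It assumes the macro is meant to paste the name into the function name with ##, that struct pasid_entry is eight 64-bit words, and it drops READ_ONCE/WRITE_ONCE. The names PASID_SET_FLAG and PASID_SET_FIELD and the assert inside pasid_set_bits are hypothetical additions for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel structure; eight 64-bit words is an assumption. */
struct pasid_entry {
	uint64_t val[8];
};

/* Same read-modify-write as the patch, without READ_ONCE/WRITE_ONCE. */
static inline void pasid_set_bits(uint64_t *ptr, uint64_t mask, uint64_t bits)
{
	/* Illustrative check only: the new bits must lie inside the mask. */
	assert((bits & ~mask) == 0);
	*ptr = (*ptr & ~mask) | bits;
}

/* Hypothetical generator for setters that write a fixed bit pattern. */
#define PASID_SET_FLAG(name, i, m, b)					\
static inline void pasid_set_##name(struct pasid_entry *pe)		\
{									\
	pasid_set_bits(&pe->val[i], (m), (b));				\
}

/* Hypothetical generator for setters that take a value for bits hi..lo. */
#define PASID_SET_FIELD(name, i, hi, lo)				\
static inline void pasid_set_##name(struct pasid_entry *pe, uint64_t v)	\
{									\
	uint64_t mask = (~0ULL >> (63 - (hi))) & (~0ULL << (lo));	\
	pasid_set_bits(&pe->val[i], mask, (v << (lo)) & mask);		\
}

/* Bit positions follow the comments in the quoted patch. */
PASID_SET_FLAG(present, 0, 1ULL << 0, 1ULL << 0)	/* P, bit 0 */
PASID_SET_FLAG(fault_enable, 0, 1ULL << 1, 0)		/* clear FPD, bit 1 */
PASID_SET_FLAG(sre, 2, 1ULL << 0, 1ULL << 0)		/* SRE, bit 128 */
PASID_SET_FIELD(address_width, 0, 4, 2)			/* AW, bits 2~4 */
PASID_SET_FIELD(translation_type, 0, 8, 6)		/* PGTT, bits 6~8 */
PASID_SET_FIELD(domain_id, 1, 15, 0)			/* DID, bits 64~79 */
PASID_SET_FIELD(page_snoop, 1, 23, 23)			/* bit 87 */
```

In this sketch the value setters shift their argument into place, so the page snoop setter writes bit 23 of val[1]. Passing an unshifted 1 together with a mask of 1 << 23 would trip the assert, because bit 0 lies outside that mask.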