Received: by 2002:a05:6358:d09b:b0:dc:cd0c:909e with SMTP id jc27csp6154432rwb; Tue, 22 Nov 2022 09:24:46 -0800 (PST) X-Google-Smtp-Source: AA0mqf52Ea8Fu2nkWqQrKpgiVjleOYY91BcTtczwg+8p3y6sl4+PxTnIzqVwe3Lxd/jbd2UILw10 X-Received: by 2002:a62:19c8:0:b0:56b:f390:36f with SMTP id 191-20020a6219c8000000b0056bf390036fmr26928802pfz.2.1669137886010; Tue, 22 Nov 2022 09:24:46 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1669137886; cv=none; d=google.com; s=arc-20160816; b=hOz+ad3k0aDvnhu4Ut1sM8Cxp+mkIeRTdsZBx+U1Nd6P5g5x5/yFfHbIQx6gcfYmrR BEyqEEc2VEgi/PNVBWQRj7wlS89lEcu2uhFPsJCNLjyKCmrBVgvQNp7DUzE157rog6N1 poM4iRWOaEbImoNgsUibjN5eCsRU0UsqFLzYswxa+kqiiKU+OU6ikJxNu1gKLixZ9Bvu PuckMgAPFmrh6U5rOv6X0sUiUR5+t4yM1KPyqd+sYFnqZzHuK4Mehc1fVSSqL1LmTrXB CfJPwBhJco9OHbxhZevY9D/+a8I156TPgHfdSbj0e6XlNnS9oAYROljE6niiIeZOuLTr vE/g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=iwQgscnLeDBFBF9+aiLe4u5vXP9kTqQWP2JaqhRntG8=; b=ZreQ2RXvlNokpb5owLmW6A7HEKekANzzxYDnA/CSP+Yyz5dX00pCAQ28vAQwd2bWNm 27D1/6Sj5xF3pBGefaOcva9VUHiDafGD0Ks5wVTrRo2M49rejXsPo1zX7ISr3e1FMPrV jP8ILY3pU8zPp5izN2E23JT8Gzv9Hon6G9PWtk73C5IPR7Q7aEroNjhMGw1NQhBhEt10 REnJ1RtOyhS/poT95SOHOQXSzRakQR15aGrqNLaD6KCtnbTaSbKqC21b+mTJcIHL9qku DHXlkW9nP1m4xSbKFR3T8XzuggkwPTX8x+bKvUUt255o2MCmViEtuyY5WD9/8XvkbKse mslw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=eqm7dQIT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id a37-20020a631a65000000b0044d72a10aafsi15368511pgm.34.2022.11.22.09.24.32; Tue, 22 Nov 2022 09:24:45 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=eqm7dQIT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234306AbiKVQwy (ORCPT + 90 others); Tue, 22 Nov 2022 11:52:54 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52228 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234333AbiKVQwk (ORCPT ); Tue, 22 Nov 2022 11:52:40 -0500 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6A2D8616F for ; Tue, 22 Nov 2022 08:52:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1669135957; x=1700671957; h=date:from:to:cc:subject:message-id:references: mime-version:in-reply-to; bh=1InFcshJ6BkGxovepIOtFxf1177oRXbVXt0TP4XS+cs=; b=eqm7dQITGa2l77dfAOK+hEzWiuMsQiVFUoqqhyJdHaumzveciDM1fzUe +qltIfrxoR0P9emfiefNusELxEF6mDJkw1Njw6l4SK5TK+WVVv+p/ulJO hNO085wo62XJQu2pVHtDmllrKkwtfyVUoQBdDI9s6Wm2sS1WfdQXCcp4E lkO0ezHE/GMoOgK1x3AyxI91/liV8l0Oqva7uu+hsIIIg0Ytrf4d7jPgL lDN68Lo/MXa8bNc7wGY2Jrr7ccRaaSe0wHuGI2GBwLHC5Ml/8Ar9BuhYX n0oDFNCctCwTDPVX51mF2yrxn1LIjTi8g2pZbegTNILk7GlZVXsEUIfgk Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10539"; a="312562376" X-IronPort-AV: E=Sophos;i="5.96,184,1665471600"; d="scan'208";a="312562376" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Nov 2022 08:52:36 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10539"; a="970532308" X-IronPort-AV: E=Sophos;i="5.96,184,1665471600"; d="scan'208";a="970532308" Received: from araj-dh-work.jf.intel.com (HELO araj-dh-work) ([10.165.157.158]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Nov 2022 08:52:36 -0800 Date: Tue, 22 Nov 2022 16:52:26 +0000 From: Ashok Raj To: Jacob Pan Cc: LKML , iommu@lists.linux.dev, Lu Baolu , Joerg Roedel , Robin Murphy , Will Deacon , David Woodhouse , Raj Ashok , "Tian, Kevin" , Yi Liu , Yuzhang Luo Subject: Re: [PATCH] iommu/vt-d: Add a fix for devices need extra dtlb flush Message-ID: References: <20221122034529.3311562-1-jacob.jun.pan@linux.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20221122034529.3311562-1-jacob.jun.pan@linux.intel.com> X-Spam-Status: No, score=-4.3 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE, SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Nov 21, 2022 at 07:45:29PM -0800, Jacob Pan wrote: > QAT devices on Intel Sapphire Rapids and Emerald Rapids has a defect in s/has/have > address translation service (ATS). These devices may inadvertently issue > ATS invalidation completion before posted writes initiated with > translated address that utilized translations matching the invalidation > address range, violating the invalidation completion ordering. Above is a bit hard to read, can you see if the rephrasing works? These devices send the invalidation response ahead of servicing any posted writes that could be using the same address range. This can lead to memory corruption. > > An additional device TLB invalidation is needed to ensure no more posted > writes with translated address following the invalidation completion. > > Without this fix, DMA writes with translated address can happen after > device TLB invalidation is completed. Random data corruption can occur > as the DMA buffer gets reused after invalidation. Maybe drop both para above and replace with. An additional dTLB invalidation ensures the ordering is preserved and prevents any data-corruption. > > This patch adds a flag to mark the need for an extra flush based on given > device IDs, this flag is set during initialization when ATS support is > enabled. > > At runtime, this flag is used to control the extra dTLB flush. The > overhead of checking this unlikely true flag is smaller than checking > PCI device IDs every time. The above 2 para's can be dropped? Since that's what the code does and probably not needed in the commit log. > > Device TLBs are invalidated under the following six conditions: > 1. Device driver does DMA API unmap IOVA > 2. Device driver unbind a PASID from a process, sva_unbind_device() > 3. PASID is torn down, after PASID cache is flushed. e.g. process > exit_mmap() due to crash > 4. Under SVA usage, called by mmu_notifier.invalidate_range() where > VM has to free pages that were unmapped > 5. userspace driver unmaps a DMA buffer > 6. Cache invalidation in vSVA usage (upcoming) > > For #1 and #2, device drivers are responsible for stopping DMA traffic > before unmap/unbind. For #3, iommu driver gets mmu_notifier to > invalidate TLB the same way as normal user unmap which will do an extra > invalidation. The dTLB invalidation after PASID cache flush does not > need an extra invalidation. > > Therefore, we only need to deal with #4 and #5 in this patch. #1 is also > covered by this patch due to common code path with #5. I was told to add an imperative statement to tell what the patch does, without the code. Maybe something like: Add an extra device TLB invalidation for affected devices. > > Tested-by: Yuzhang Luo > Signed-off-by: Jacob Pan > --- Reviewed-by: Ashok Raj > drivers/iommu/intel/iommu.c | 100 +++++++++++++++++++++++++++++------- > drivers/iommu/intel/iommu.h | 3 ++ > drivers/iommu/intel/svm.c | 4 +- > 3 files changed, 87 insertions(+), 20 deletions(-) > > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c > index 996a8b5ee5ee..6254788330b8 100644 > --- a/drivers/iommu/intel/iommu.c > +++ b/drivers/iommu/intel/iommu.c > @@ -1396,6 +1396,39 @@ static void domain_update_iotlb(struct dmar_domain *domain) > spin_unlock_irqrestore(&domain->lock, flags); > } > > +/* > + * Check that the device does not live on an external facing PCI port that is > + * marked as untrusted. Such devices should not be able to apply quirks and > + * thus not be able to bypass the IOMMU restrictions. > + */ > +static bool risky_device(struct pci_dev *pdev) > +{ > + if (pdev->untrusted) { > + pci_info(pdev, > + "Skipping IOMMU quirk for dev [%04X:%04X] on untrusted PCI link\n", > + pdev->vendor, pdev->device); > + pci_info(pdev, "Please check with your BIOS/Platform vendor about this\n"); > + return true; > + } > + return false; > +} > + > +/* Impacted QAT device IDs ranging from 0x4940 to 0x4943 */ > +#define BUGGY_QAT_DEVID_MASK 0x494c > +static bool dev_needs_extra_dtlb_flush(struct pci_dev *pdev) > +{ > + if (pdev->vendor != PCI_VENDOR_ID_INTEL) > + return false; > + > + if ((pdev->device & 0xfffc) != BUGGY_QAT_DEVID_MASK) > + return false; > + > + if (risky_device(pdev)) > + return false; > + > + return true; > +} > + > static void iommu_enable_pci_caps(struct device_domain_info *info) > { > struct pci_dev *pdev; > @@ -1478,6 +1511,7 @@ static void __iommu_flush_dev_iotlb(struct device_domain_info *info, > qdep = info->ats_qdep; > qi_flush_dev_iotlb(info->iommu, sid, info->pfsid, > qdep, addr, mask); > + quirk_dev_tlb(info, addr, mask, PASID_RID2PASID, qdep); > } > > static void iommu_flush_dev_iotlb(struct dmar_domain *domain, > @@ -4490,9 +4524,10 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev) > if (dev_is_pci(dev)) { > if (ecap_dev_iotlb_support(iommu->ecap) && > pci_ats_supported(pdev) && > - dmar_ats_supported(pdev, iommu)) > + dmar_ats_supported(pdev, iommu)) { > info->ats_supported = 1; > - > + info->dtlb_extra_inval = dev_needs_extra_dtlb_flush(pdev); > + } > if (sm_supported(iommu)) { > if (pasid_supported(iommu)) { > int features = pci_pasid_features(pdev); > @@ -4680,23 +4715,6 @@ static bool intel_iommu_is_attach_deferred(struct device *dev) > return translation_pre_enabled(info->iommu) && !info->domain; > } > > -/* > - * Check that the device does not live on an external facing PCI port that is > - * marked as untrusted. Such devices should not be able to apply quirks and > - * thus not be able to bypass the IOMMU restrictions. > - */ > -static bool risky_device(struct pci_dev *pdev) > -{ > - if (pdev->untrusted) { > - pci_info(pdev, > - "Skipping IOMMU quirk for dev [%04X:%04X] on untrusted PCI link\n", > - pdev->vendor, pdev->device); > - pci_info(pdev, "Please check with your BIOS/Platform vendor about this\n"); > - return true; > - } > - return false; > -} > - > static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain, > unsigned long iova, size_t size) > { > @@ -4931,3 +4949,47 @@ static void __init check_tylersburg_isoch(void) > pr_warn("Recommended TLB entries for ISOCH unit is 16; your BIOS set %d\n", > vtisochctrl); > } > + > +/* > + * Here we deal with a device TLB defect where device may inadvertently issue ATS > + * invalidation completion before posted writes initiated with translated address > + * that utilized translations matching the invalidation address range, violating > + * the invalidation completion ordering. > + * Therefore, any use cases that cannot guarantee DMA is stopped before unmap is > + * vulnerable to this defect. In other words, any dTLB invalidation initiated not > + * under the control of the trusted/privileged host device driver must use this > + * quirk. > + * Device TLBs are invalidated under the following six conditions: > + * 1. Device driver does DMA API unmap IOVA > + * 2. Device driver unbind a PASID from a process, sva_unbind_device() > + * 3. PASID is torn down, after PASID cache is flushed. e.g. process > + * exit_mmap() due to crash > + * 4. Under SVA usage, called by mmu_notifier.invalidate_range() where > + * VM has to free pages that were unmapped > + * 5. Userspace driver unmaps a DMA buffer > + * 6. Cache invalidation in vSVA usage (upcoming) > + * > + * For #1 and #2, device drivers are responsible for stopping DMA traffic > + * before unmap/unbind. For #3, iommu driver gets mmu_notifier to > + * invalidate TLB the same way as normal user unmap which will use this quirk. > + * The dTLB invalidation after PASID cache flush does not need this quirk. > + * > + * As a reminder, #6 will *NEED* this quirk as we enable nested translation. > + */ > +void quirk_dev_tlb(struct device_domain_info *info, unsigned long address, > + unsigned long mask, u32 pasid, u16 qdep) > +{ > + u16 sid; > + > + if (likely(!info->dtlb_extra_inval)) > + return; > + > + sid = PCI_DEVID(info->bus, info->devfn); > + if (pasid == PASID_RID2PASID) { > + qi_flush_dev_iotlb(info->iommu, sid, info->pfsid, > + qdep, address, mask); > + } else { > + qi_flush_dev_iotlb_pasid(info->iommu, sid, info->pfsid, > + pasid, qdep, address, mask); > + } > +} > diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h > index 92023dff9513..09b989bf545f 100644 > --- a/drivers/iommu/intel/iommu.h > +++ b/drivers/iommu/intel/iommu.h > @@ -623,6 +623,7 @@ struct device_domain_info { > u8 pri_enabled:1; > u8 ats_supported:1; > u8 ats_enabled:1; > + u8 dtlb_extra_inval:1; /* Quirk for devices need extra flush */ > u8 ats_qdep; > struct device *dev; /* it's NULL for PCIe-to-PCI bridge */ > struct intel_iommu *iommu; /* IOMMU used by this device */ > @@ -728,6 +729,8 @@ void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr, > void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid, > u32 pasid, u16 qdep, u64 addr, > unsigned int size_order); > +void quirk_dev_tlb(struct device_domain_info *info, unsigned long address, > + unsigned long pages, u32 pasid, u16 qdep); > void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu, > u32 pasid); > > diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c > index 7d08eb034f2d..117430b97ba9 100644 > --- a/drivers/iommu/intel/svm.c > +++ b/drivers/iommu/intel/svm.c > @@ -184,10 +184,12 @@ static void __flush_svm_range_dev(struct intel_svm *svm, > return; > > qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, address, pages, ih); > - if (info->ats_enabled) > + if (info->ats_enabled) { > qi_flush_dev_iotlb_pasid(sdev->iommu, sdev->sid, info->pfsid, > svm->pasid, sdev->qdep, address, > order_base_2(pages)); > + quirk_dev_tlb(info, address, order_base_2(pages), svm->pasid, sdev->qdep); > + } > } > > static void intel_flush_svm_range_dev(struct intel_svm *svm, > -- > 2.25.1 > >