From: Logan Gunthorpe
To: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
    linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org,
    linux-nvdimm@lists.01.org, linux-block@vger.kernel.org
Cc: Stephen Bates, Christoph Hellwig, Jens Axboe, Keith Busch,
    Sagi Grimberg, Bjorn Helgaas, Jason Gunthorpe, Max Gurtovoy,
    Dan Williams, Jérôme Glisse, Benjamin Herrenschmidt,
    Alex Williamson, Logan Gunthorpe
Date: Mon, 12 Mar 2018 13:35:18 -0600
Message-Id: <20180312193525.2855-5-logang@deltatee.com>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20180312193525.2855-1-logang@deltatee.com>
References: <20180312193525.2855-1-logang@deltatee.com>
Subject: [PATCH v3 04/11] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches
X-Mailing-List: linux-kernel@vger.kernel.org

For peer-to-peer transactions to work the downstream ports in each
switch must not have the ACS flags set. At this time there is no way
to dynamically change the flags and update the corresponding IOMMU
groups so this is done at enumeration time before the groups are
assigned.

This effectively means that if CONFIG_PCI_P2PDMA is selected then
all devices behind any PCIe switch will be in the same IOMMU group,
which implies that individual devices behind any switch cannot be
assigned to separate VMs because there is no isolation between them.
Additionally, any malicious PCIe devices will be able to DMA to memory
exposed by other EPs in the same domain as TLPs will not be checked
by the IOMMU.

Given that the intended use case of P2P Memory is for users with
custom hardware designed for purpose, we do not expect distributors
to ever need to enable this option.
Users that want to use P2P must have compiled a custom kernel with
this configuration option and understand the implications regarding
ACS. They will either not require ACS or will have designed the system
in such a way that devices that require isolation will be separate
from those using P2P transactions.

Signed-off-by: Logan Gunthorpe
---
 drivers/pci/Kconfig        |  9 +++++++++
 drivers/pci/p2pdma.c       | 44 ++++++++++++++++++++++++++++++++++++++++++++
 drivers/pci/pci.c          |  6 ++++++
 include/linux/pci-p2pdma.h |  5 +++++
 4 files changed, 64 insertions(+)

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index d59f6f5ddfcd..c7a9d155baca 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -138,6 +138,15 @@ config PCI_P2PDMA
 	  it's hard to tell which support it at all, so at this time you
 	  will need a PCIe switch.
 
+	  Enabling this option will also disable ACS on all ports behind
+	  any PCIe switch. This effectively puts all devices behind any
+	  switch into the same IOMMU group. Which implies that individual
+	  devices behind any switch will not be able to be assigned to
+	  separate VMs because there is no isolation between them.
+	  Additionally, any malicious PCIe devices will be able to DMA
+	  to memory exposed by other EPs in the same domain as TLPs will
+	  not be checked by the IOMMU.
+
 	  If unsure, say N.
 
 config PCI_LABEL
diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index ab810c3a93eb..3e70b0662def 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -264,6 +264,50 @@ static struct pci_dev *get_upstream_bridge_port(struct pci_dev *pdev)
 }
 
 /*
+ * pci_p2pdma_disable_acs - disable ACS flags for ports in PCI
+ * bridges/switches
+ * @pdev: device to disable ACS flags for
+ *
+ * The ACS flags for P2P Request Redirect and P2P Completion Redirect need
+ * to be disabled on any downstream port in any switch in order for
+ * the TLPs to not be forwarded up to the RC which is not what we want
+ * for P2P.
+ *
+ * This function is called when the devices are first enumerated and
+ * will result in all devices behind any switch to be in the same IOMMU
+ * group. At this time there is no way to "hotplug" IOMMU groups so we rely
+ * on this largish hammer. If you need the devices to be in separate groups
+ * don't enable CONFIG_PCI_P2PDMA.
+ *
+ * Returns 1 if the ACS bits for this device were cleared, otherwise 0.
+ */
+int pci_p2pdma_disable_acs(struct pci_dev *pdev)
+{
+	struct pci_dev *up;
+	int pos;
+	u16 ctrl;
+
+	up = get_upstream_bridge_port(pdev);
+	if (!up)
+		return 0;
+	pci_dev_put(up);
+
+	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS);
+	if (!pos)
+		return 0;
+
+	pci_info(pdev, "disabling ACS flags for peer-to-peer DMA\n");
+
+	pci_read_config_word(pdev, pos + PCI_ACS_CTRL, &ctrl);
+
+	ctrl &= ~(PCI_ACS_RR | PCI_ACS_CR);
+
+	pci_write_config_word(pdev, pos + PCI_ACS_CTRL, ctrl);
+
+	return 1;
+}
+
+/*
  * This function checks if two PCI devices are behind the same switch.
  * (ie. they share the same second upstream port as returned by
  *  get_upstream_bridge_port().)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index f6a4dd10d9b0..e5da8f482e94 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -2826,6 +2827,11 @@ static void pci_std_enable_acs(struct pci_dev *dev)
  */
 void pci_enable_acs(struct pci_dev *dev)
 {
+#ifdef CONFIG_PCI_P2PDMA
+	if (pci_p2pdma_disable_acs(dev))
+		return;
+#endif
+
 	if (!pci_acs_enable)
 		return;
 
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
index 59eb218bdb25..2a2bf2ca018e 100644
--- a/include/linux/pci-p2pdma.h
+++ b/include/linux/pci-p2pdma.h
@@ -18,6 +18,7 @@ struct block_device;
 struct scatterlist;
 
 #ifdef CONFIG_PCI_P2PDMA
+int pci_p2pdma_disable_acs(struct pci_dev *pdev);
 int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
 		u64 offset);
 int pci_p2pdma_add_client(struct list_head *head, struct device *dev);
@@ -41,6 +42,10 @@ int pci_p2pdma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
 void pci_p2pdma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
 			 enum dma_data_direction dir);
 #else /* CONFIG_PCI_P2PDMA */
+static inline int pci_p2pdma_disable_acs(struct pci_dev *pdev)
+{
+	return 0;
+}
 static inline int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar,
 		size_t size, u64 offset)
 {
-- 
2.11.0
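
As a standalone illustration of the mask operation in
pci_p2pdma_disable_acs(), the userspace sketch below clears the P2P
Request Redirect and P2P Completion Redirect bits from an ACS control
word and leaves the remaining bits alone. The bit values mirror the
PCI_ACS_* definitions in include/uapi/linux/pci_regs.h. The helper names
acs_ctrl_clear_p2p_redirect() and acs_ctrl_self_check() are hypothetical
and not part of this patch, and no config space access is performed.

```c
#include <assert.h>
#include <stdint.h>

/* ACS control bits, mirroring include/uapi/linux/pci_regs.h. */
#define PCI_ACS_SV 0x01 /* Source Validation */
#define PCI_ACS_TB 0x02 /* Translation Blocking */
#define PCI_ACS_RR 0x04 /* P2P Request Redirect */
#define PCI_ACS_CR 0x08 /* P2P Completion Redirect */
#define PCI_ACS_UF 0x10 /* Upstream Forwarding */

/*
 * Hypothetical helper (not in the patch): apply the same mask that
 * pci_p2pdma_disable_acs() applies to the ACS control word, without
 * touching any hardware.
 */
uint16_t acs_ctrl_clear_p2p_redirect(uint16_t ctrl)
{
	return ctrl & (uint16_t)~(PCI_ACS_RR | PCI_ACS_CR);
}

/* Hypothetical self-check: only the two redirect bits may change. */
void acs_ctrl_self_check(void)
{
	uint16_t all = PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR |
		       PCI_ACS_CR | PCI_ACS_UF;

	assert(acs_ctrl_clear_p2p_redirect(all) ==
	       (PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_UF));
	assert(acs_ctrl_clear_p2p_redirect(0) == 0);
}
```

Note that only Request Redirect and Completion Redirect are cleared;
Source Validation, Translation Blocking and Upstream Forwarding keep
whatever value they had before the mask is applied.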