Received: by 10.192.165.148 with SMTP id m20csp3638612imm; Mon, 7 May 2018 16:13:37 -0700 (PDT) X-Google-Smtp-Source: AB8JxZqGhI9Cp9f88/dXqhUtqgrXEmQ7GPnDTvs3nPwE9oPgHN6uRB3Gyf1+B1hBUnKwOm/JVEex X-Received: by 2002:a17:902:9a90:: with SMTP id w16-v6mr39036636plp.390.1525734817383; Mon, 07 May 2018 16:13:37 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1525734817; cv=none; d=google.com; s=arc-20160816; b=Nuj9cHbXDXDxka1efyjt9dMiu1LhD/y2rHvmFnhS/wSE6L/kLQ4EpsfrtxZ+1LWKI/ NO3PTQGJjUF+lVqqj43trMQfNidF+VbaPE6KpVEORwN1UbLQn9PQB2ze03ezY1mBFHnB 4U9Augk457kYSIguNFBZ0KYNTTdusELZpdtGdUBKKrwujRfyy1ChgeFtb8HFwARaR5XC 84vMbRPBDrK5Lf3QQFJcfAHef5CUV2xcVUGD+rLEOUhw84I3fvVEXeinm7aVDsIMR4f3 Blr6qjG5ufmvZyz14flWJcc2MEy/NNb2BaQEwxCMz4KO7kAyNrj5HPVIvNmjjpVJhz6U s8Fg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature:arc-authentication-results; bh=qXXrtN7zm4lK5w4mxlE2vX43Db6jU7dap44phlG5Hy4=; b=YoF4GVsNVagTJI4uiq+kulPRci83q0TxsmZPyaLQsoOtyt4ghuiPs+GUlKt/gtNKNi jvoJldjjIJ+eMPGjwHCU5fbcK6HpD12i1GJwcvt8+xbZQG68Jhe4ktptvrYjl8Yn6Pc9 TF7ASYcAbh9oIYk5su7vgOM75k08JllO5ypx2dGKZHpuknEXNjNH5NR3RllpRz8wCTKf uSPl+KPQqHdcus+ZH9VjvnGg7SyDoGVOvGvWpEvQEVUk5/55R+bXky7xzneJeMu004Kf Kk1D8yNSy9YdOvdDb6EFi8n6FzrgIXZ2/vZFkgsjCsRzQ9eC0F9yz4vIQlgDrz9cwLGi sjyA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=kHKUU/pk; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 34-v6si22585531plm.495.2018.05.07.16.13.22; Mon, 07 May 2018 16:13:37 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=kHKUU/pk; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753312AbeEGXNL (ORCPT + 99 others); Mon, 7 May 2018 19:13:11 -0400 Received: from mail.kernel.org ([198.145.29.99]:53336 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752773AbeEGXNJ (ORCPT ); Mon, 7 May 2018 19:13:09 -0400 Received: from localhost (unknown [69.71.5.252]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 80CD720740; Mon, 7 May 2018 23:13:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1525734789; bh=wxf18GLD5goxABALB2gS3mv4p1pWc66BR19Y6k1b/ws=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=kHKUU/pkH69HVw+DSvX0mHKpXXgb/lwDfpJu+pL7m737SRnM+8u6PWoOD0ia94SwS CfU5sFsjWpOo6y5S5FfbXNoCq+ade0dxN9M6mainfqb0153T1KmzMF80+81UlEeTnX aeTMLqBzAsuw3o8QJt0G4vrIVURvHS6Zb2FzCgmg= Date: Mon, 7 May 2018 18:13:06 -0500 From: Bjorn Helgaas To: Logan Gunthorpe , Alex Williamson Cc: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org, linux-nvdimm@lists.01.org, linux-block@vger.kernel.org, Stephen Bates , Christoph Hellwig , Jens Axboe , Keith Busch , Sagi Grimberg , Bjorn Helgaas , Jason Gunthorpe , Max Gurtovoy , Dan Williams , =?iso-8859-1?B?Suly9G1l?= Glisse , Benjamin Herrenschmidt , Christian =?iso-8859-1?Q?K=F6nig?= Subject: Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches Message-ID: <20180507231306.GG161390@bhelgaas-glaptop.roam.corp.google.com> References: <20180423233046.21476-1-logang@deltatee.com> <20180423233046.21476-5-logang@deltatee.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180423233046.21476-5-logang@deltatee.com> User-Agent: Mutt/1.9.2 (2017-12-15) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org [+to Alex] Alex, Are you happy with this strategy of turning off ACS based on CONFIG_PCI_P2PDMA? We only check this at enumeration-time and I don't know if there are other places we would care? On Mon, Apr 23, 2018 at 05:30:36PM -0600, Logan Gunthorpe wrote: > For peer-to-peer transactions to work the downstream ports in each > switch must not have the ACS flags set. At this time there is no way > to dynamically change the flags and update the corresponding IOMMU > groups so this is done at enumeration time before the groups are > assigned. > > This effectively means that if CONFIG_PCI_P2PDMA is selected then > all devices behind any PCIe switch heirarchy will be in the same IOMMU > group. Which implies that individual devices behind any switch > heirarchy will not be able to be assigned to separate VMs because > there is no isolation between them. Additionally, any malicious PCIe > devices will be able to DMA to memory exposed by other EPs in the same > domain as TLPs will not be checked by the IOMMU. > > Given that the intended use case of P2P Memory is for users with > custom hardware designed for purpose, we do not expect distributors > to ever need to enable this option. Users that want to use P2P > must have compiled a custom kernel with this configuration option > and understand the implications regarding ACS. They will either > not require ACS or will have design the system in such a way that > devices that require isolation will be separate from those using P2P > transactions. > > Signed-off-by: Logan Gunthorpe > --- > drivers/pci/Kconfig | 9 +++++++++ > drivers/pci/p2pdma.c | 45 ++++++++++++++++++++++++++++++--------------- > drivers/pci/pci.c | 6 ++++++ > include/linux/pci-p2pdma.h | 5 +++++ > 4 files changed, 50 insertions(+), 15 deletions(-) > > diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig > index b2396c22b53e..b6db41d4b708 100644 > --- a/drivers/pci/Kconfig > +++ b/drivers/pci/Kconfig > @@ -139,6 +139,15 @@ config PCI_P2PDMA > transations must be between devices behind the same root port. > (Typically behind a network of PCIe switches). > > + Enabling this option will also disable ACS on all ports behind > + any PCIe switch. This effectively puts all devices behind any > + switch heirarchy into the same IOMMU group. Which implies that s/heirarchy/hierarchy/ (also above in changelog) > + individual devices behind any switch will not be able to be > + assigned to separate VMs because there is no isolation between > + them. Additionally, any malicious PCIe devices will be able to > + DMA to memory exposed by other EPs in the same domain as TLPs > + will not be checked by the IOMMU. > + > If unsure, say N. > > config PCI_LABEL > diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c > index ed9dce8552a2..e9f43b43acac 100644 > --- a/drivers/pci/p2pdma.c > +++ b/drivers/pci/p2pdma.c > @@ -240,27 +240,42 @@ static struct pci_dev *find_parent_pci_dev(struct device *dev) > } > > /* > - * If a device is behind a switch, we try to find the upstream bridge > - * port of the switch. This requires two calls to pci_upstream_bridge(): > - * one for the upstream port on the switch, one on the upstream port > - * for the next level in the hierarchy. Because of this, devices connected > - * to the root port will be rejected. > + * pci_p2pdma_disable_acs - disable ACS flags for all PCI bridges > + * @pdev: device to disable ACS flags for > + * > + * The ACS flags for P2P Request Redirect and P2P Completion Redirect need > + * to be disabled on any PCI bridge in order for the TLPS to not be forwarded > + * up to the RC which is not what we want for P2P. s/PCI bridge/PCIe switch/ (ACS doesn't apply to conventional PCI) > + * > + * This function is called when the devices are first enumerated and > + * will result in all devices behind any bridge to be in the same IOMMU > + * group. At this time, there is no way to "hotplug" IOMMU groups so we rely > + * on this largish hammer. If you need the devices to be in separate groups > + * don't enable CONFIG_PCI_P2PDMA. > + * > + * Returns 1 if the ACS bits for this device was cleared, otherwise 0. > */ > -static struct pci_dev *get_upstream_bridge_port(struct pci_dev *pdev) > +int pci_p2pdma_disable_acs(struct pci_dev *pdev) > { > - struct pci_dev *up1, *up2; > + int pos; > + u16 ctrl; > > - if (!pdev) > - return NULL; > + if (!pci_is_bridge(pdev)) > + return 0; > > - up1 = pci_dev_get(pci_upstream_bridge(pdev)); > - if (!up1) > - return NULL; > + pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS); > + if (!pos) > + return 0; > + > + pci_info(pdev, "disabling ACS flags for peer-to-peer DMA\n"); > + > + pci_read_config_word(pdev, pos + PCI_ACS_CTRL, &ctrl); > + > + ctrl &= ~(PCI_ACS_RR | PCI_ACS_CR); > > - up2 = pci_dev_get(pci_upstream_bridge(up1)); > - pci_dev_put(up1); > + pci_write_config_word(pdev, pos + PCI_ACS_CTRL, ctrl); > > - return up2; > + return 1; > } > > /* > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index e597655a5643..7e2f5724ba22 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -16,6 +16,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -2835,6 +2836,11 @@ static void pci_std_enable_acs(struct pci_dev *dev) > */ > void pci_enable_acs(struct pci_dev *dev) > { > +#ifdef CONFIG_PCI_P2PDMA > + if (pci_p2pdma_disable_acs(dev)) > + return; > +#endif > + > if (!pci_acs_enable) > return; > > diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h > index 0cde88341eeb..fcb3437a2f3c 100644 > --- a/include/linux/pci-p2pdma.h > +++ b/include/linux/pci-p2pdma.h > @@ -18,6 +18,7 @@ struct block_device; > struct scatterlist; > > #ifdef CONFIG_PCI_P2PDMA > +int pci_p2pdma_disable_acs(struct pci_dev *pdev); > int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size, > u64 offset); > int pci_p2pdma_add_client(struct list_head *head, struct device *dev); > @@ -40,6 +41,10 @@ int pci_p2pdma_map_sg(struct device *dev, struct scatterlist *sg, int nents, > void pci_p2pdma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents, > enum dma_data_direction dir); > #else /* CONFIG_PCI_P2PDMA */ > +static inline int pci_p2pdma_disable_acs(struct pci_dev *pdev) > +{ > + return 0; > +} > static inline int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, > size_t size, u64 offset) > { > -- > 2.11.0 >