Date: Wed, 14 Mar 2018 13:51:59 -0500
From: Bjorn Helgaas
To: Logan Gunthorpe
Cc: Stephen Bates, Sinan Kaya,
 "linux-kernel@vger.kernel.org", "linux-pci@vger.kernel.org",
 "linux-nvme@lists.infradead.org", "linux-rdma@vger.kernel.org",
 "linux-nvdimm@lists.01.org", "linux-block@vger.kernel.org",
 Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg,
 Bjorn Helgaas, Jason Gunthorpe, Max Gurtovoy, Dan Williams,
 Jérôme Glisse, Benjamin Herrenschmidt, Alex Williamson
Subject: Re: [PATCH v3 01/11] PCI/P2PDMA: Support peer-to-peer memory
Message-ID:
 <20180314185159.GD179719@bhelgaas-glaptop.roam.corp.google.com>
References: <3ea80992-a0fc-08f2-d93d-ae0ec4e3f4ce@codeaurora.org>
 <4eb6850c-df1b-fd44-3ee0-d43a50270b53@deltatee.com>
 <757fca36-dee4-e070-669e-f2788bd78e41@codeaurora.org>
 <4f761f55-4e9a-dccb-d12f-c59d2cd689db@deltatee.com>
 <20180313230850.GA45763@bhelgaas-glaptop.roam.corp.google.com>
 <8de5d3dd-a78f-02d5-0eea-4365364143b6@deltatee.com>
 <20180314025639.GA50067@bhelgaas-glaptop.roam.corp.google.com>
 <112493af-ccd0-455b-6600-b50764f7ab7e@deltatee.com>
In-Reply-To: <112493af-ccd0-455b-6600-b50764f7ab7e@deltatee.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 14, 2018 at 10:17:34AM -0600, Logan Gunthorpe wrote:
> On 13/03/18 08:56 PM, Bjorn Helgaas wrote:
> > I agree that peers need to have a common upstream bridge. I think
> > you're saying peers need to have *two* common upstream bridges. If I
> > understand correctly, requiring two common bridges is a way to ensure
> > that peers directly below Root Ports don't try to DMA to each other.
>
> No, I don't get where you think we need to have two common upstream
> bridges. I'm not sure when such a case would ever happen. But you seem
> to understand based on what you wrote below.

Sorry, I phrased that wrong. You don't require two common upstream
bridges; you require two upstream bridges, with the upper one being
common, i.e.,

  static struct pci_dev *get_upstream_bridge_port(struct pci_dev *pdev)
  {
	struct pci_dev *up1, *up2;

	up1 = pci_dev_get(pci_upstream_bridge(pdev));
	up2 = pci_dev_get(pci_upstream_bridge(up1));
	return up2;
  }

So if you're starting with pdev, up1 is the immediately upstream
bridge and up2 is the second upstream bridge. If this is PCIe, up1
may be a Root Port and there is no up2, or up1 and up2 are in a
switch.

This is more restrictive than the spec requires. As long as there is
a single common upstream bridge, peer-to-peer DMA should work. In
fact, in conventional PCI, I think the upstream bridge could even be
the host bridge (not a PCI-to-PCI bridge).

You are focused on PCIe systems, and in those systems, most topologies
do have an upstream switch, which means two upstream bridges. I'm
trying to remove that assumption because I don't think there's a
requirement for it in the spec. Enforcing this assumption complicates
the code and makes it harder to understand because the reader says
"huh, I know peer-to-peer DMA should work inside any PCI hierarchy*,
so why do we need these two bridges?"

[*] For conventional PCI, this means anything below the same host
bridge. Two devices on a conventional PCI root bus should be able to
DMA to each other, even though there's no PCI-to-PCI bridge above
them. For PCIe, it means a "hierarchy domain" as used in PCIe r4.0,
sec 1.3.1, i.e., anything below the same Root Port.

> > So I guess the first order of business is to nail down whether peers
> > below a Root Port are prohibited from DMAing to each other. My
> > assumption, based on 6.12.1.2 and the fact that I haven't yet found
> > a prohibition, is that they can.
>
> If you have a multifunction device designed to DMA to itself below a
> root port, it can. But determining this is on a device by device basis,
> just as determining whether a root complex can do peer to peer is on a
> per device basis. So I'd say we don't want to allow it by default and
> let someone who has such a device figure out what's necessary if and
> when one comes along.

It's not the job of this infrastructure to answer the device-dependent
question of whether DMA initiators or targets support peer-to-peer
DMA.

All we want to do here is figure out whether the PCI topology supports
it, using the mechanisms guaranteed by the spec.
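
To make the topology-only question concrete, here is a minimal sketch
in C. It is an illustration under stated assumptions, not code from
the patch: struct toy_dev and the toy_* helpers are hypothetical
stand-ins for struct pci_dev and the kernel's helpers, and the model
assumes the top of each hierarchy (upstream == NULL) is a Root Port.
It contrasts the "second upstream bridge" walk with a search for any
single common upstream bridge.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of a PCI tree; not the kernel's struct pci_dev. */
struct toy_dev {
	const char *name;
	struct toy_dev *upstream;	/* bridge directly above; NULL at the top */
};

/* Count the bridges between dev and the top of its hierarchy. */
int toy_depth(const struct toy_dev *dev)
{
	int depth = 0;

	assert(dev);
	while (dev->upstream) {
		dev = dev->upstream;
		depth++;
	}
	return depth;
}

/* Toy analogue of the two-level walk: NULL if there is no second bridge. */
const struct toy_dev *toy_second_upstream(const struct toy_dev *dev)
{
	assert(dev);
	return dev->upstream ? dev->upstream->upstream : NULL;
}

/*
 * Return the nearest bridge upstream of both a and b, or NULL if they
 * share none (in this model: they sit below different Root Ports).
 */
const struct toy_dev *toy_common_upstream_bridge(const struct toy_dev *a,
						 const struct toy_dev *b)
{
	const struct toy_dev *pa, *pb;
	int da, db;

	assert(a && b);
	pa = a->upstream;
	pb = b->upstream;
	if (!pa || !pb)
		return NULL;

	/* Bring both walkers to the same depth, then climb in lockstep. */
	da = toy_depth(pa);
	db = toy_depth(pb);
	for (; da > db; da--)
		pa = pa->upstream;
	for (; db > da; db--)
		pb = pb->upstream;
	while (pa != pb) {
		pa = pa->upstream;
		pb = pb->upstream;
	}
	return pa;	/* NULL when the two hierarchies never meet */
}
```

In this model, two endpoints directly below the same Root Port have a
common upstream bridge (the Root Port itself) even though
toy_second_upstream() returns NULL for each of them, while endpoints
below different Root Ports have none.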
We can derive that from the basic rules about how PCI bridges work,
i.e., from the PCI-to-PCI Bridge spec r1.2, sec 4.3:

  A bridge forwards PCI memory transactions from its primary interface
  to its secondary interface (downstream) if a memory address is in
  the range defined by the Memory Base and Memory Limit registers
  (when the base is less than or equal to the limit) as illustrated in
  Figure 4-3. Conversely, a memory transaction on the secondary
  interface that is within this address range will not be forwarded
  upstream to the primary interface. Any memory transactions on the
  secondary interface that are outside this address range will be
  forwarded upstream to the primary interface (provided they are not
  in the address range defined by the prefetchable memory address
  range registers).

This works for either PCI or PCIe. The only wrinkle PCIe adds is that
the very top of the hierarchy is a Root Port, and we can't rely on it
to route traffic to other Root Ports. I also doubt Root Complex
Integrated Endpoints can participate in peer-to-peer DMA.

Thanks for your patience in working through all this. I know it
sometimes feels like being bounced around in all directions. It's
just a normal consequence of trying to add complex functionality to an
already complex system, with interest and expertise spread unevenly
across a crowd of people.

Bjorn
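
The forwarding rule quoted above from the PCI-to-PCI Bridge spec can
be sketched as a pair of predicates. This is a simplified illustration
only: struct toy_bridge_window and the toy_* functions are
hypothetical names, the window is held as already-decoded full
addresses rather than the raw Memory Base and Memory Limit register
encoding, and the prefetchable window that the spec text carves out
is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of a bridge's non-prefetchable memory window. */
struct toy_bridge_window {
	uint64_t mem_base;	/* first address in the window (inclusive) */
	uint64_t mem_limit;	/* last address in the window (inclusive) */
};

/* The window only claims addresses when base <= limit. */
bool toy_window_contains(const struct toy_bridge_window *w, uint64_t addr)
{
	assert(w);
	return w->mem_base <= w->mem_limit &&
	       addr >= w->mem_base && addr <= w->mem_limit;
}

/* Primary to secondary: forwarded downstream only if inside the window. */
bool toy_forwards_downstream(const struct toy_bridge_window *w, uint64_t addr)
{
	return toy_window_contains(w, addr);
}

/*
 * Secondary to primary: forwarded upstream only if outside the window.
 * (Assumption: no prefetchable window is configured.)
 */
bool toy_forwards_upstream(const struct toy_bridge_window *w, uint64_t addr)
{
	return !toy_window_contains(w, addr);
}
```

Read this way, a transaction from one device to a peer's address stays
below a bridge whenever that address falls inside the bridge's window,
and is passed upstream otherwise, which is the property the topology
check relies on.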