From: Dan Williams
Date: Tue, 18 Apr 2017 11:34:56 -0700
Subject: Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
To: Jason Gunthorpe
Cc: Benjamin Herrenschmidt, Logan Gunthorpe, Bjorn Helgaas, Christoph Hellwig,
    Sagi Grimberg, "James E.J. Bottomley", "Martin K. Petersen", Jens Axboe,
    Steve Wise, Stephen Bates, Max Gurtovoy, Keith Busch,
    linux-pci@vger.kernel.org, linux-scsi, linux-nvme@lists.infradead.org,
    linux-rdma@vger.kernel.org, linux-nvdimm, linux-kernel@vger.kernel.org,
    Jerome Glisse
In-Reply-To: <20170418180020.GE7181@obsidianresearch.com>

On Tue, Apr 18, 2017 at 11:00 AM, Jason Gunthorpe wrote:
> On Tue, Apr 18, 2017 at 10:27:47AM -0700, Dan Williams wrote:
>> > FWIW, RDMA probably wouldn't want to use a p2mem device either, we
>> > already have APIs that map BAR memory to user space, and would like to
>> > keep using them. An 'enable P2P for bar' helper function sounds better
>> > to me.
>>
>> ...and I think it's not a helper function as much as asking the bus
>> provider "can these two devices dma to each other".
>
> What I mean is I could write in an RDMA driver:
>
>   /* Allow the memory in BAR 1 to be the target of P2P transactions */
>   pci_enable_p2p_bar(dev, 1);
>
> And not require anything else..
>
>> The "helper" is the dma api redirecting through a software-iommu
>> that handles bus address translation differently than it would
>> handle host memory dma mapping.
>
> Not sure, until we see what arches actually need to do here it is hard
> to design common helpers.
>
> Here are a few obvious things that arches will need to implement to
> support this broadly:
>
> - Virtualization might need to do a hypervisor call to get the right
>   translation, or consult some hypervisor-specific description table.
>
> - Anything using IOMMUs for virtualization will need to set up IOMMU
>   permissions to allow the P2P flow, this might require translation to
>   an address cookie.
>
> - Fail if the PCI devices are in different domains, or set up hardware to
>   do completer bus/device/function translation.
>
> - All platforms can succeed if the PCI devices are under the same
>   'segment', but where segments begin is somewhat platform-specific
>   knowledge. (This is the 'same switch' idea Logan has talked about.)
>
> So, we can eventually design helpers for various common scenarios, but
> until we see what arch code actually needs to do it seems
> premature. Much of this seems to involve interaction with some kind of
> hardware, or consultation of some kind of currently platform-specific
> data, so I'm not sure what a software-iommu would be doing??
>
> The main thing to agree on is that this code belongs under dma ops and
> that arches have to support struct page mapped BAR addresses in their
> dma ops inputs. Is that reasonable?
I think we're saying the same thing by "software-iommu" and "custom dma_ops", so yes.
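
To make the two ideas above concrete, here is a minimal sketch of the "ask
the bus provider whether these two devices can DMA to each other" check,
using the conservative "same domain, same upstream switch" tests from
Jason's list. The helper name pci_p2p_dma_supported() is an assumption
invented for this sketch, not an existing kernel interface; only
pci_domain_nr() and pci_upstream_bridge() are real kernel APIs.

#include <linux/pci.h>

/*
 * Hypothetical helper, sketched for this discussion only: can @initiator
 * issue peer-to-peer DMA to a BAR on @target?
 */
static bool pci_p2p_dma_supported(struct pci_dev *initiator,
                                  struct pci_dev *target)
{
        /*
         * Different PCI domains ('segments'): refuse unless the platform
         * has hardware that can do completer bus/device/function
         * translation.
         */
        if (pci_domain_nr(initiator->bus) != pci_domain_nr(target->bus))
                return false;

        /*
         * Conservative 'same switch' test: both functions hang directly
         * off the same upstream bridge.  A real implementation would walk
         * the whole hierarchy and consult platform knowledge about where
         * segments begin.
         */
        return pci_upstream_bridge(initiator) &&
               pci_upstream_bridge(initiator) == pci_upstream_bridge(target);
}

An RDMA or NVMe driver could make such a query (or call something like the
pci_enable_p2p_bar() helper Jason proposes) before exposing BAR pages as
DMA targets.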
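
And a rough sketch of what "software-iommu" / "custom dma_ops" could mean
in practice: a .map_page hook (signature as in v4.11's struct dma_map_ops)
that recognizes a ZONE_DEVICE page backed by a peer BAR and hands back a
PCI bus address instead of setting up a host-memory mapping. The helpers
p2p_page_to_bus(), bar_bus_offset and host_dma_ops are assumptions invented
for this sketch.

#include <linux/dma-mapping.h>
#include <linux/io.h>
#include <linux/mm.h>

/* Assumed bookkeeping, filled in when the P2P-aware ops are installed:
 * the platform's original dma_map_ops and the (bus - phys) offset that
 * was recorded when the BAR was registered as ZONE_DEVICE memory. */
static const struct dma_map_ops *host_dma_ops;
static u64 bar_bus_offset;

/* Assumed helper: translate a BAR-backed struct page to a PCI bus address. */
static dma_addr_t p2p_page_to_bus(struct page *page, unsigned long offset)
{
        return (dma_addr_t)(page_to_phys(page) + offset + bar_bus_offset);
}

static dma_addr_t p2p_dma_map_page(struct device *dev, struct page *page,
                                   unsigned long offset, size_t size,
                                   enum dma_data_direction dir,
                                   unsigned long attrs)
{
        /*
         * A BAR that was hot-added as ZONE_DEVICE memory arrives here as
         * an ordinary struct page; detect it and return a bus address
         * rather than programming the host IOMMU / mapping host memory.
         */
        if (is_zone_device_page(page))
                return p2p_page_to_bus(page, offset);

        /* Normal host memory: defer to the platform's original ops. */
        return host_dma_ops->map_page(dev, page, offset, size, dir, attrs);
}

Whether that logic lives in per-arch dma_ops, in shared code the arches opt
into, or behind a hypervisor call is exactly the open question in this
thread.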