From: Dan Williams
Date: Tue, 18 Apr 2017 15:28:17 -0700
Subject: Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
To: Logan Gunthorpe
Cc: Jason Gunthorpe, Benjamin Herrenschmidt, Bjorn Helgaas,
    Christoph Hellwig, Sagi Grimberg, "James E.J. Bottomley",
    "Martin K. Petersen", Jens Axboe, Steve Wise, Stephen Bates,
    Max Gurtovoy, Keith Busch, linux-pci@vger.kernel.org, linux-scsi,
    linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org,
    linux-nvdimm, linux-kernel@vger.kernel.org, Jerome Glisse
In-Reply-To: <96198489-1af5-abcf-f23f-9a7e41aa17f7@deltatee.com>
References: <1492381396.25766.43.camel@kernel.crashing.org>
 <20170418164557.GA7181@obsidianresearch.com>
 <20170418190138.GH7181@obsidianresearch.com>
 <20170418210339.GA24257@obsidianresearch.com>
 <20170418212258.GA26838@obsidianresearch.com>
 <96198489-1af5-abcf-f23f-9a7e41aa17f7@deltatee.com>

On Tue, Apr 18, 2017 at 3:15 PM, Logan Gunthorpe wrote:
>
>
> On 18/04/17 03:36 PM, Dan Williams wrote:
>> On Tue, Apr 18, 2017 at 2:22 PM, Jason Gunthorpe wrote:
>>> On Tue, Apr 18, 2017 at 02:11:33PM -0700, Dan Williams wrote:
>>>>> I think this opens an even bigger can of worms..
>>>>
>>>> No, I don't think it does. You'd only shim when the target page is
>>>> backed by a device, not host memory, and you can figure this out by a
>>>> is_zone_device_page()-style lookup.
>>>
>>> The bigger can of worms is how do you meaningfully stack dma_ops.
>>
>> This goes back to my original comment to make this capability a
>> function of the pci bridge itself. The kernel has an implementation of
>> a dynamically created bridge device that injects its own dma_ops for
>> the devices behind the bridge. See vmd_setup_dma_ops() in
>> drivers/pci/host/vmd.c.
>
> Well the issue I think Jason is pointing out is that the ops don't
> stack. The map_* function in the injected dma_ops needs to be able to
> call the original map_* for any page that is not p2p memory. This is
> especially annoying in the map_sg function which may need to call a
> different op based on the contents of the sgl. (And please correct me if
> I'm not seeing how this can be done in the vmd example.)

Unlike the pci bus address offset case, which I think is fundamental to
support since shipping archs do this today, I think it is ok to say p2p
is restricted to a single sgl that gets to talk to host memory or a
single device.

That said, what's wrong with a p2p aware map_sg implementation calling
up to the host memory map_sg implementation on a per sgl basis?

> Also, what happens if p2p pages end up getting passed to a device that
> doesn't have the injected dma_ops?

This goes back to limiting p2p to a single pci host bridge. If the p2p
capability is coordinated with the bridge rather than between the
individual devices, then we have a central point to catch this case.

...of course this is all hand wavy until someone writes the code and
proves otherwise.
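To make the "per sgl" dispatch slightly less hand wavy, here is the
rough shape I have in mind. This is purely illustrative:
p2p_parent_dma_ops() and p2p_bus_addr() are made-up placeholders, not
existing APIs, and error handling plus the IOMMU case are skipped.

#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>
#include <linux/mm.h>

/*
 * Placeholders, not existing kernel APIs: the first returns whatever
 * dma_map_ops the device had before the shim was installed, the
 * second translates a p2p struct page into the peer's PCI bus address.
 */
const struct dma_map_ops *p2p_parent_dma_ops(struct device *dev);
dma_addr_t p2p_bus_addr(struct device *dev, struct page *page);

static int p2p_shim_map_sg(struct device *dev, struct scatterlist *sgl,
                           int nents, enum dma_data_direction dir,
                           unsigned long attrs)
{
        const struct dma_map_ops *parent = p2p_parent_dma_ops(dev);
        struct scatterlist *sg;
        int i;

        for_each_sg(sgl, sg, nents, i) {
                if (is_zone_device_page(sg_page(sg))) {
                        /* p2p page: map to the peer device's bus address */
                        sg->dma_address = p2p_bus_addr(dev, sg_page(sg)) +
                                          sg->offset;
                } else {
                        /* host page: punt to the original dma_ops */
                        sg->dma_address = parent->map_page(dev, sg_page(sg),
                                        sg->offset, sg->length, dir, attrs);
                }
                sg_dma_len(sg) = sg->length;
        }

        return nents;
}

i.e. the shim only special-cases segments that an
is_zone_device_page()-style lookup identifies as device-backed, and
forwards everything else to the ops it displaced.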
> However, the concept of replacing the dma_ops for all devices behind a
> supporting bridge is interesting and may be a good piece of the final
> solution.

It's at least a proof point for injecting special behavior for devices
behind a (virtual) pci bridge without needing to go touch a bunch of
drivers.
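As a strawman, the bridge side might not need to be much more than the
below, reusing the map_sg sketch above. Again illustrative only:
treating set_dma_ops() as a generally available helper is an assumption
on my part, and vmd_setup_dma_ops() achieves something similar for the
PCI domain it creates rather than per device.

#include <linux/dma-mapping.h>
#include <linux/pci.h>

/*
 * Only map_sg from the sketch above is wired up; the other ops would
 * forward the same way: p2p pages handled locally, everything else
 * passed through to the original ops.
 */
static const struct dma_map_ops p2p_shim_dma_ops = {
        .map_sg = p2p_shim_map_sg,
};

/*
 * Strawman attach point: point every device enumerated behind the
 * bridge at the shim ops, giving the bridge a central place to catch
 * p2p traffic.
 */
static void p2p_bridge_setup_dma_ops(struct pci_bus *bus)
{
        struct pci_dev *pdev;

        list_for_each_entry(pdev, &bus->devices, bus_list)
                set_dma_ops(&pdev->dev, &p2p_shim_dma_ops);
}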