From: Logan Gunthorpe
To: Dan Williams, Jason Gunthorpe
Cc: Benjamin Herrenschmidt, Bjorn Helgaas, Christoph Hellwig,
    Sagi Grimberg, "James E.J. Bottomley", "Martin K. Petersen",
    Jens Axboe, Steve Wise, Stephen Bates, Max Gurtovoy, Keith Busch,
    linux-pci@vger.kernel.org, linux-scsi, linux-nvme@lists.infradead.org,
    linux-rdma@vger.kernel.org, linux-nvdimm,
    "linux-kernel@vger.kernel.org", Jerome Glisse
Subject: Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
Date: Tue, 18 Apr 2017 16:15:52 -0600
Message-ID: <96198489-1af5-abcf-f23f-9a7e41aa17f7@deltatee.com>

On 18/04/17 03:36 PM, Dan Williams wrote:
> On Tue, Apr 18, 2017 at 2:22 PM, Jason Gunthorpe wrote:
>> On Tue, Apr 18, 2017 at 02:11:33PM -0700, Dan Williams wrote:
>>>> I think this opens an even bigger can of worms..
>>>
>>> No, I don't think it does. You'd only shim when the target page is
>>> backed by a device, not host memory, and you can figure this out by a
>>> is_zone_device_page()-style lookup.
>>
>> The bigger can of worms is how do you meaningfully stack dma_ops.
>
> This goes back to my original comment to make this capability a
> function of the pci bridge itself. The kernel has an implementation of
> a dynamically created bridge device that injects its own dma_ops for
> the devices behind the bridge. See vmd_setup_dma_ops() in
> drivers/pci/host/vmd.c.

Well the issue I think Jason is pointing out is that the ops don't
stack. The map_* function in the injected dma_ops needs to be able to
call the original map_* for any page that is not p2p memory. This is
especially annoying in the map_sg function which may need to call a
different op based on the contents of the sgl. (And please correct me
if I'm not seeing how this can be done in the vmd example.)

Also, what happens if p2p pages end up getting passed to a device that
doesn't have the injected dma_ops?

However, the concept of replacing the dma_ops for all devices behind a
supporting bridge is interesting and may be a good piece of the final
solution.

Logan
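
To make the stacking problem concrete, here is a rough user-space sketch
in C of what an injected shim would have to do. It is not kernel code and
not taken from the RFC or from vmd.c: every type, function, and constant
below (mock_page, mock_sg, mock_dev, mock_dma_ops, inject_shim,
HOST_DMA_OFFSET, P2P_BUS_OFFSET) is invented for illustration, and the
is_p2p flag merely stands in for an is_zone_device_page()-style lookup.
The sketch assumes the shim saves the original ops at injection time so
it can fall back to them for host pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address offsets, chosen only so the two paths are distinguishable. */
#define HOST_DMA_OFFSET 0x80000000ULL
#define P2P_BUS_OFFSET  0x10000000ULL

struct mock_page {
    uint64_t phys;    /* CPU physical address */
    bool     is_p2p;  /* stand-in for an is_zone_device_page()-style lookup */
};

struct mock_sg {
    struct mock_page *page;
    uint64_t dma_addr; /* filled in by map_sg */
};

struct mock_dev;

struct mock_dma_ops {
    uint64_t (*map_page)(struct mock_dev *dev, struct mock_page *page);
    int (*map_sg)(struct mock_dev *dev, struct mock_sg *sgl, int nents);
};

struct mock_dev {
    const struct mock_dma_ops *ops;      /* ops currently in use */
    const struct mock_dma_ops *orig_ops; /* saved when the shim is injected */
};

/* "Original" ops: treat every page as host memory. */
static uint64_t host_map_page(struct mock_dev *dev, struct mock_page *page)
{
    (void)dev;
    return page->phys + HOST_DMA_OFFSET;
}

static int host_map_sg(struct mock_dev *dev, struct mock_sg *sgl, int nents)
{
    for (int i = 0; i < nents; i++)
        sgl[i].dma_addr = host_map_page(dev, sgl[i].page);
    return nents;
}

const struct mock_dma_ops host_ops = {
    .map_page = host_map_page,
    .map_sg   = host_map_sg,
};

/* Shim ops: handle p2p pages, defer everything else to the saved ops. */
static uint64_t shim_map_page(struct mock_dev *dev, struct mock_page *page)
{
    assert(dev->orig_ops != NULL);
    if (page->is_p2p)
        return page->phys + P2P_BUS_OFFSET;
    return dev->orig_ops->map_page(dev, page);
}

static int shim_map_sg(struct mock_dev *dev, struct mock_sg *sgl, int nents)
{
    int mapped = 0;

    assert(dev->orig_ops != NULL);
    /*
     * The list may mix p2p and host pages, so it cannot be handed to the
     * original map_sg in one call; each entry is dispatched on its own.
     */
    for (int i = 0; i < nents; i++) {
        if (sgl[i].page->is_p2p) {
            sgl[i].dma_addr = sgl[i].page->phys + P2P_BUS_OFFSET;
            mapped++;
        } else {
            mapped += dev->orig_ops->map_sg(dev, &sgl[i], 1);
        }
    }
    return mapped;
}

const struct mock_dma_ops shim_ops = {
    .map_page = shim_map_page,
    .map_sg   = shim_map_sg,
};

/* Swap in the shim while remembering what was there before. */
void inject_shim(struct mock_dev *dev)
{
    dev->orig_ops = dev->ops;
    dev->ops = &shim_ops;
}
```

Two things stand out in the sketch. First, shim_map_sg has to walk the
list and call the original op one entry at a time, so whatever the
original map_sg does across a whole list (merging entries, for example)
is lost for mixed lists. Second, a device that never went through
inject_shim() still points at host_ops, so a p2p page passed to it is
silently mapped as if it were host memory, which is the failure mode
asked about above.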