From: Logan Gunthorpe <logang@deltatee.com>
To: Benjamin Herrenschmidt, Dan Williams
Cc: Bjorn Helgaas, Jason Gunthorpe, Christoph Hellwig, Sagi Grimberg,
 "James E.J. Bottomley", "Martin K. Petersen", Jens Axboe, Steve Wise,
 Stephen Bates, Max Gurtovoy, Keith Busch, linux-pci@vger.kernel.org,
 linux-scsi, linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org,
 linux-nvdimm, linux-kernel@vger.kernel.org, Jerome Glisse
Subject: Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
Date: Sun, 16 Apr 2017 23:13:13 -0600
Message-ID: <6149ab5e-c981-6881-8c5a-22349561c3e8@deltatee.com>
In-Reply-To: <1492381958.25766.50.camel@kernel.crashing.org>

On 16/04/17 04:32 PM, Benjamin Herrenschmidt wrote:
>> I'll consider this. Given the fact I can use your existing
>> get_dev_pagemap infrastructure to look up the p2pmem device, this
>> probably isn't as hard as I thought it would be anyway (we probably
>> don't even need a page flag). We'd just have to look up the
>> dev_pagemap, test whether it's a p2pmem device, and if so, call a
>> p2pmem_dma_map function which could apply the offset or do any other
>> arch-specific logic (if necessary).
>
> I'm still not 100% sure why you need a "p2pmem device", mind you ...

Well, you don't "need" it, but it is a design choice that I think makes
a lot of sense for the following reasons:

1) p2pmem is in fact a device on the PCI bus. A PCI driver will need to
set it up and create the device, and thus it will have a natural parent
PCI device. Instantiating a struct device for it means it will appear in
the device hierarchy, and one can use that to reason about its position
in the topology.

2) In order to create the struct pages we use the ZONE_DEVICE
infrastructure, which requires a struct device (see
devm_memremap_pages). This conveniently also gets us the
get_dev_pagemap infrastructure, which uses a struct device as well. So
by using a p2pmem device we can go from struct page to struct device to
p2pmem device quickly and effortlessly.

3) You wouldn't want to use the PCI device's own struct device because
it doesn't really describe what's going on. For example, there may be
multiple devices behind the PCI device in question: e.g. an NVMe card
with some p2pmem, a NIC with some p2pmem, or just p2pmem by itself. And
the logic to figure out what memory is available and where it is
addressed will be non-standard, so it's really straightforward to have
any PCI driver just instantiate a p2pmem device. Rough sketches of both
the registration and the mapping path follow below.
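To make (1) and (2) a little more concrete, here is a rough, hypothetical
sketch (not the code from the RFC patches) of how a PCI driver could hand
one of its BARs to a p2pmem device. struct p2pmem_dev, p2pmem_register()
and the release callbacks are made-up names for illustration; only the
driver-core calls and devm_memremap_pages() are real kernel APIs:

/* Hypothetical sketch only -- not the code from the RFC patches. */
#include <linux/device.h>
#include <linux/err.h>
#include <linux/memremap.h>
#include <linux/pci.h>
#include <linux/percpu-refcount.h>
#include <linux/slab.h>

struct p2pmem_dev {
        struct device dev;      /* child of the PCI device owning the BAR */
        struct percpu_ref ref;  /* pins the ZONE_DEVICE pages */
        void *kaddr;            /* kernel view of the remapped BAR */
};

static void p2pmem_release(struct device *dev)
{
        kfree(container_of(dev, struct p2pmem_dev, dev));
}

static void p2pmem_ref_release(struct percpu_ref *ref)
{
        /* last page reference dropped; nothing more to do in a sketch */
}

/* Called by whatever PCI driver (NVMe, NIC, ...) owns the BAR. */
static struct p2pmem_dev *p2pmem_register(struct pci_dev *pdev, int bar)
{
        struct p2pmem_dev *p;
        int rc;

        p = kzalloc(sizeof(*p), GFP_KERNEL);
        if (!p)
                return ERR_PTR(-ENOMEM);

        /* Reason 1: the p2pmem device hangs off its natural PCI parent. */
        device_initialize(&p->dev);
        p->dev.parent = &pdev->dev;
        p->dev.release = p2pmem_release;
        dev_set_name(&p->dev, "p2pmem-%s", pci_name(pdev));

        rc = device_add(&p->dev);
        if (rc)
                goto err_put;

        rc = percpu_ref_init(&p->ref, p2pmem_ref_release, 0, GFP_KERNEL);
        if (rc)
                goto err_del;

        /* Reason 2: the ZONE_DEVICE pages are created against this device. */
        p->kaddr = devm_memremap_pages(&p->dev, &pdev->resource[bar],
                                       &p->ref, NULL);
        if (IS_ERR(p->kaddr)) {
                rc = PTR_ERR(p->kaddr);
                goto err_ref;
        }

        return p;

err_ref:
        percpu_ref_exit(&p->ref);
err_del:
        device_del(&p->dev);
err_put:
        put_device(&p->dev);    /* release() frees the allocation */
        return ERR_PTR(rc);
}

Because the pages are remapped against &p->dev, their lifetime follows
the p2pmem device itself rather than the owning PCI function's driver.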
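And to illustrate the lookup described in the quoted text at the top: a
hypothetical p2pmem_dma_map_page() could go from struct page to
dev_pagemap to the owning struct device, and only then decide how to
produce a DMA address. is_p2pmem_device() and p2pmem_bus_offset() below
are placeholder helpers I'm assuming for the sketch;
get_dev_pagemap()/put_dev_pagemap() and dma_map_page() are the real APIs:

/* Hypothetical sketch only -- not the code from the RFC patches. */
#include <linux/dma-mapping.h>
#include <linux/memremap.h>
#include <linux/mm.h>
#include <linux/pfn.h>

static bool is_p2pmem_device(struct device *dev)
{
        /* Placeholder: check dev's class/release against the p2pmem code. */
        return false;
}

static u64 p2pmem_bus_offset(struct device *dev)
{
        /* Placeholder: CPU-physical to PCI-bus offset for this p2pmem BAR. */
        return 0;
}

static dma_addr_t p2pmem_dma_map_page(struct device *dma_dev,
                                      struct page *page, unsigned long offset,
                                      size_t len, enum dma_data_direction dir)
{
        struct dev_pagemap *pgmap;
        dma_addr_t addr;

        /* Look up the dev_pagemap backing this page (takes a reference). */
        pgmap = get_dev_pagemap(page_to_pfn(page), NULL);

        if (pgmap && is_p2pmem_device(pgmap->dev)) {
                /*
                 * Peer-to-peer case: hand back the bus address of the BAR
                 * page, applying any arch/host-bridge specific offset
                 * between CPU physical and PCI bus addresses.
                 */
                addr = PFN_PHYS(page_to_pfn(page)) + offset -
                       p2pmem_bus_offset(pgmap->dev);
        } else {
                /* Ordinary system memory: fall back to the normal DMA API. */
                addr = dma_map_page(dma_dev, page, offset, len, dir);
        }

        if (pgmap)
                put_dev_pagemap(pgmap);

        return addr;
}

The nice property (point 2 above) is that get_dev_pagemap() hands back
the struct device, so no page flag is needed to recognize p2pmem pages.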
It is probably worth you reading the RFC patches at this point to get a
better feel for this.

Logan