From: Logan Gunthorpe <logang@deltatee.com>
To: Dan Williams
Cc: Benjamin Herrenschmidt, Bjorn Helgaas, Jason Gunthorpe,
 Christoph Hellwig, Sagi Grimberg, "James E.J. Bottomley",
 "Martin K. Petersen", Jens Axboe, Steve Wise, Stephen Bates,
 Max Gurtovoy, Keith Busch, Jerome Glisse,
 linux-pci@vger.kernel.org, linux-scsi@vger.kernel.org,
 linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org,
 linux-nvdimm@ml01.01.org, linux-kernel@vger.kernel.org
Date: Sun, 16 Apr 2017 10:47:50 -0600
Message-ID: <6e732d6a-9baf-1768-3e9c-f6c887a836b2@deltatee.com>
Subject: Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

On 16/04/17 09:44 AM, Dan Williams wrote:
> I think
> we very much want the dma mapping layer to be in the way.
> It's the only sane semantic we have to communicate this translation.

Yes, I wasn't proposing bypassing that layer, per se. I just meant that
the layer would, in the end, have to return the address without any
translations.

> The difference is that there was nothing fundamental in the core
> design of pmem + DAX that prevented other archs from growing pmem
> support. THP and memory hotplug existed on other architectures and
> they just need to plug in their arch-specific enabling. p2p support
> needs the same starting point of something more than one architecture
> can plug into, and handling the bus address offset case needs to be
> incorporated into the design.

I don't think there's a difference there either. There'd have been
nothing fundamental in our core design that says offsets couldn't have
been added later.

> pmem + dax did not change the meaning of what a dma_addr_t is, p2p does.

I don't think p2p really changes the meaning of dma_addr_t either. We
are just putting addresses in there that weren't used previously. Our
RFC makes no changes to anything even remotely related to dma_addr_t.

> I think you need to give other archs a chance to support this with a
> design that considers the offset case as a first class citizen rather
> than an afterthought.

I'll consider this. Given that I can use your existing get_dev_pagemap
infrastructure to look up the p2pmem device, this probably isn't as
hard as I thought it would be anyway (we probably don't even need a
page flag). We'd just have to look up the dev_pagemap, test whether it
belongs to a p2pmem device, and if so, call a p2pmem_dma_map function
which could apply the offset or do any other arch-specific logic (if
necessary).

Logan