Date: Wed, 19 Apr 2017 13:32:26 -0400
From: Jerome Glisse <jglisse@redhat.com>
To: Dan Williams
Cc: Logan Gunthorpe, Jason Gunthorpe, Benjamin Herrenschmidt,
	Bjorn Helgaas, Christoph Hellwig, Sagi Grimberg,
	"James E.J. Bottomley", "Martin K. Petersen", Jens Axboe,
	Steve Wise, Stephen Bates, Max Gurtovoy, Keith Busch,
	linux-pci@vger.kernel.org, linux-scsi, linux-nvme@lists.infradead.org,
	linux-rdma@vger.kernel.org, linux-nvdimm, linux-kernel@vger.kernel.org
Subject: Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
Message-ID: <20170419173225.GA11255@redhat.com>
References: <20170418164557.GA7181@obsidianresearch.com>
	<20170418190138.GH7181@obsidianresearch.com>
	<20170418210339.GA24257@obsidianresearch.com>
	<1492564806.25766.124.camel@kernel.crashing.org>
	<20170419155557.GA8497@obsidianresearch.com>
	<4899b011-bdfb-18d8-ef00-33a1516216a6@deltatee.com>
User-Agent: Mutt/1.8.0 (2017-02-23)
X-Mailing-List: linux-kernel@vger.kernel.org
On Wed, Apr 19, 2017 at 10:01:23AM -0700, Dan Williams wrote:
> On Wed, Apr 19, 2017 at 9:48 AM, Logan Gunthorpe wrote:
> >
> > On 19/04/17 09:55 AM, Jason Gunthorpe wrote:
> >> I was thinking only this one would be supported with a core code
> >> helper..
> >
> > Pivoting slightly: I was looking at how HMM uses ZONE_DEVICE. They add a
> > type flag to the dev_pagemap structure which would be very useful to us.
> > We could add another MEMORY_DEVICE_P2P type to distinguish p2p pages.
> > Then, potentially, we could add a dma_map callback to the structure
> > (possibly unioned with an hmm field). The dev_ops providers would then
> > just need to do something like this (enclosed in a helper):
> >
> >     if (is_zone_device_page(page)) {
> >         pgmap = get_dev_pagemap(page_to_pfn(page));
> >         if (!pgmap || pgmap->type != MEMORY_DEVICE_P2P ||
> >             !pgmap->dma_map)
> >             return 0;
> >
> >         dma_addr = pgmap->dma_map(dev, pgmap->dev, page);
> >         put_dev_pagemap(pgmap);
> >         if (!dma_addr)
> >             return 0;
> >         ...
> >     }
> >
> > The pci_enable_p2p_bar function would then just need to call
> > devm_memremap_pages with the dma_map callback set to a function that
> > does the segment check and the offset calculation.
> >
> > Thoughts?
> >
> > @Jerome: my feedback to you would be that your patch assumes all users
> > of devm_memremap_pages are MEMORY_DEVICE_PERSISTENT. It would be more
> > useful if it was generic. My suggestion would be to have the caller
> > allocate the dev_pagemap structure, populate it and pass it into
> > devm_memremap_pages. Given that pretty much everything in that structure
> > are already arguments to that function, I feel like this makes sense.
> > This should also help to unify hmm_devmem_pages_create and
> > devm_memremap_pages which look very similar to each other.
>
> I like that change. Also the types should describe the memory relative
> to its relationship to struct page, not whether it is persistent or
> not.
> I would consider volatile and persistent memory that is attached
> to the cpu memory controller and i/o coherent as the same type of
> memory. DMA incoherent ranges like P2P and HMM should get their own
> types.

Dan, you asked me not to use devm_memremap_pages() because you didn't
want to have HMM memory in the pgmap_radix; have you changed your opinion
on that? :) Note I won't make any change on that front now, but if it
makes sense I am happy to do it as a separate patchset on top of HMM.

Also, I don't want p2pmem to be mutually exclusive with HMM: we will
want GPUs to do peer-to-peer DMA, and thus HMM ZONE_DEVICE pages will
need to support this too.

I do believe it is easier to special-case ZONE_DEVICE in the existing
dma_ops for each architecture. For x86 I think there are only 3 different
sets of dma_ops to modify; for other architectures I guess there are even
fewer. But in any case I think p2pmem should stay out of the memory
management business. If some set of devices does not have memory
management, it is better to propose helpers for that as part of the
subsystem to which those devices belong. Just wanted to reiterate that
point.

Cheers,
Jérôme