Date: Wed, 23 Nov 2016 16:25:03 -0700
From: Jason Gunthorpe
To: Dan Williams
Cc: Logan Gunthorpe, Serguei Sagalovitch, "Deucher, Alexander",
	"linux-nvdimm@lists.01.org", "linux-rdma@vger.kernel.org",
	"linux-pci@vger.kernel.org", "Kuehling, Felix", "Bridgman, John",
	"linux-kernel@vger.kernel.org", "dri-devel@lists.freedesktop.org",
	"Koenig, Christian", "Sander, Ben", "Suthikulpanit, Suravee",
	"Blinzer, Paul", "Linux-media@vger.kernel.org", Haggai Eran
Subject: Re: Enabling peer to peer device transactions for PCIe devices
Message-ID: <20161123232503.GA13965@obsidianresearch.com>
References: <75a1f44f-c495-7d1e-7e1c-17e89555edba@amd.com>
	<45c6e878-bece-7987-aee7-0e940044158c@deltatee.com>
	<20161123190515.GA12146@obsidianresearch.com>
	<7bc38037-b6ab-943f-59db-6280e16901ab@amd.com>
	<20161123193228.GC12146@obsidianresearch.com>
	<20161123203332.GA15062@obsidianresearch.com>
	<20161123215510.GA16311@obsidianresearch.com>

On Wed, Nov 23, 2016 at 02:42:12PM -0800, Dan Williams wrote:
> > The crucial part for this discussion is the ability to fence and block
> > DMA for a specific range. This is the hardware capability that lets
> > page migration happen: fence&block DMA, migrate page, update page
> > table in HCA, unblock DMA.
>
> Wait, ODP requires migratable pages, ZONE_DEVICE pages are not
> migratable.

Does it? I didn't think so.. Does ZONE_DEVICE break MMU notifiers/etc
or something? There is certainly nothing about the hardware that cares
about ZONE_DEVICE vs System memory.

I used 'migration' in the broader sense of doing any transformation to
the page such that the DMA address changes - not the specific kernel MM
process...

> You can't replace a PCIe mapping with just any other System RAM
> physical address, right?

I thought that was exactly what HMM was trying to do? Migrate pages
between CPU and GPU memory as needed. As Serguei has said, this process
needs to be driven by the GPU driver.

The peer-peer issue is how you do that while RDMA is possible on those
pages, because when the page migrates to GPU memory you want the RDMA
to follow it seamlessly.

This is why page table mirroring is the best solution - use the
existing mm machinery to link the DMA driver and whatever is
controlling the VMA (rough mmu_notifier sketch further below).

> At least not without a filesystem recording where things went, but
> at point we're no longer talking about the base P2P-DMA mapping

In the filesystem/DAX case, it would be the filesystem that initiates
any change in the page physical address. ODP *follows* changes in the
VMA; it does not cause any change in address mapping. That has to be
done by whoever is in charge of the VMA.

> something like pnfs-rdma to a DAX filesystem.

Something in the kernel (ie nfs-rdma) would be entirely different.
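To be concrete about what I mean by page table mirroring, the shape of
it is roughly the sketch below. The my_hca_*() helpers are hypothetical
stand-ins for whatever the HCA driver does to fence DMA and force a
refault, and the callback signatures are roughly the current ones:

#include <linux/mm.h>
#include <linux/mmu_notifier.h>

/* Hypothetical driver hooks, not real kernel APIs: */
extern void my_hca_fence_dma(struct mmu_notifier *mn,
			     unsigned long start, unsigned long end);
extern void my_hca_invalidate_translations(struct mmu_notifier *mn,
					   unsigned long start,
					   unsigned long end);

static void my_hca_invalidate_range_start(struct mmu_notifier *mn,
					  struct mm_struct *mm,
					  unsigned long start,
					  unsigned long end)
{
	/* Fence and block HCA DMA to [start, end) before whoever owns
	 * the VMA (core mm, DAX filesystem, GPU driver) moves pages. */
	my_hca_fence_dma(mn, start, end);
}

static void my_hca_invalidate_range_end(struct mmu_notifier *mn,
					struct mm_struct *mm,
					unsigned long start,
					unsigned long end)
{
	/* The pages may now have new addresses - drop the stale HCA
	 * translations so the next access faults and remaps. */
	my_hca_invalidate_translations(mn, start, end);
}

static const struct mmu_notifier_ops my_hca_mn_ops = {
	.invalidate_range_start = my_hca_invalidate_range_start,
	.invalidate_range_end   = my_hca_invalidate_range_end,
};

/* Registered once against the process address space, eg:
 *	mn->ops = &my_hca_mn_ops;
 *	mmu_notifier_register(mn, current->mm);
 */

That is the flow I keep describing: fence & block DMA in
invalidate_range_start, let whoever owns the VMA move the page, then
refault to pick up the new DMA address.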
We generally don't do long-lived mappings in the kernel for RDMA
(certainly not for NFS), so it is much more like your basic everyday
DMA operation: map, execute, unmap. We probably don't need to use page
table mirroring for this.

ODP comes in when userspace mmaps a DAX file and then tries to use it
for RDMA (rough userspace sketch below). Page table mirroring lets the
DAX filesystem decide to move the backing pages at any time. When it
wants to do that it interacts with the MM in the usual way, which links
to ODP and makes sure the migration is seamless.

Jason
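For reference, the userspace side of the DAX case I mean is nothing
exotic - roughly the sketch below (PD/device setup and error handling
elided, the path is just a placeholder, and this assumes an ODP-capable
HCA):

#include <fcntl.h>
#include <sys/mman.h>
#include <infiniband/verbs.h>

/* Register an ODP MR over an mmap of a file on a DAX mount.  Nothing
 * is pinned at registration; the HCA faults pages in on access, and
 * the invalidate path keeps it coherent if the filesystem later moves
 * the backing blocks. */
static struct ibv_mr *reg_dax_odp_mr(struct ibv_pd *pd, size_t len)
{
	int fd = open("/mnt/dax/file", O_RDWR);
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_SHARED, fd, 0);

	return ibv_reg_mr(pd, buf, len,
			  IBV_ACCESS_ON_DEMAND |
			  IBV_ACCESS_LOCAL_WRITE |
			  IBV_ACCESS_REMOTE_READ |
			  IBV_ACCESS_REMOTE_WRITE);
}

Everything after ibv_reg_mr() is ordinary RDMA; the interesting work
all happens in the kernel when the filesystem decides to move things.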