Date: Wed, 23 Nov 2016 14:55:10 -0700
From: Jason Gunthorpe
To: Logan Gunthorpe
Cc: Serguei Sagalovitch, Dan Williams, "Deucher, Alexander",
    linux-nvdimm@lists.01.org, linux-rdma@vger.kernel.org,
    linux-pci@vger.kernel.org, "Kuehling, Felix", "Bridgman, John",
    linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
    "Koenig, Christian", "Sander, Ben", "Suthikulpanit, Suravee",
    "Blinzer, Paul", Linux-media@vger.kernel.org, Haggai Eran
Subject: Re: Enabling peer to peer device transactions for PCIe devices

On Wed, Nov 23, 2016 at 02:11:29PM -0700, Logan Gunthorpe wrote:

> > As I said, there is no possible special handling. Standard IB hardware
> > does not support changing the DMA address once an MR is created. Forget
> > about doing that.
>
> Yeah, that's essentially the point I was trying to make. Not to mention
> all the other unrelated hardware that can't DMA to an address that might
> disappear mid-transfer.

Right, it is impossible to ask for generic page migration with ongoing
DMA. That is simply not supported by any of the hardware at all.

> > Only ODP hardware allows changing the DMA address on the fly, and it
> > works at the page table level. We do not need special handling for
> > RDMA.
>
> I am aware of ODP but, as noted by others, it doesn't provide a general
> solution to the points above.

How do you mean?

Perhaps I am not following what Serguei is asking for, but I understood
the desire was for a complex GPU allocator that could migrate pages
between GPU and CPU memory under the control of the GPU driver, among
other things. The desire is for DMA to continue to work even after these
migrations happen.

Page table mirroring *is* the general solution for this problem. The GPU
driver controls the VMA and the DMA driver mirrors that VMA.

Do you know of another option that doesn't just degenerate to page table
mirroring?

Remember, there are two facets to the RDMA ODP implementation, and I
feel there is some confusion here.

The crucial part for this discussion is the ability to fence and block
DMA for a specific range. This is the hardware capability that lets page
migration happen: fence & block DMA, migrate the page, update the page
table in the HCA, unblock DMA.

Without that hardware support the DMA address must be unchanging, and
there is nothing we can do about it. This is why standard IB hardware
must have fixed MRs - it lacks the fence & block capability.
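To spell that sequence out, a minimal sketch in kernel-style C. struct
odp_mr and the odp_*/hca_* helpers are invented names for illustration
only - they are not the real ib_umem_odp/mlx5 entry points - while
copy_highpage() and PAGE_SIZE are the usual kernel ones:

#include <linux/highmem.h>
#include <linux/mm.h>

/* Illustration of the fence & block flow described above. */
static int migrate_page_under_dma(struct odp_mr *mr, unsigned long va,
				  struct page *old_page,
				  struct page *new_page)
{
	int ret;

	/* 1. Fence outstanding DMA and block new DMA to this range. */
	ret = odp_fence_and_block(mr, va, PAGE_SIZE);
	if (ret)
		return ret;

	/* 2. Migrate the data while nothing can DMA to it. */
	copy_highpage(new_page, old_page);

	/* 3. Map the new page and point the HCA's page table at the new
	 *    DMA address.
	 */
	ret = hca_update_translation(mr, va, new_page);

	/* 4. Unblock DMA; transfers resume against the new page. */
	odp_unblock(mr, va, PAGE_SIZE);
	return ret;
}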
The other part is the page faulting implementation, but that is not
required, and, to Serguei's point, is not desired for the GPU anyhow.

> > To me this means at least items #1 and #3 should be removed from
> > Alexander's list.
>
> It's also worth noting that #4 makes use of ZONE_DEVICE (#2), so they
> are really the same option. iopmem is really just one way to get BAR
> addresses to user-space while inside the kernel it's ZONE_DEVICE.

Seems fine for RDMA? Didn't we just strike off everything on the list
except #2? :\

Jason
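P.S. For anyone not following the ZONE_DEVICE part, it amounts to
something like the sketch below. iopmem_map_bar() is a made-up name, the
devm_memremap_pages() call is the ~4.9-era four-argument form, and page
lifetime is only hinted at via the percpu_ref argument, so treat this as
an illustration rather than the actual iopmem patches:

#include <linux/pci.h>
#include <linux/memremap.h>
#include <linux/err.h>

/* Hypothetical helper: give a PCI BAR struct pages via ZONE_DEVICE so
 * the existing DMA paths can use it; user-space access is then just an
 * mmap of those pages. */
static void *iopmem_map_bar(struct pci_dev *pdev, int bar,
			    struct percpu_ref *ref)
{
	struct resource *res = &pdev->resource[bar];	/* the BAR aperture */

	if (pci_request_region(pdev, bar, "iopmem"))
		return ERR_PTR(-EBUSY);

	/* Create struct pages covering the BAR (ZONE_DEVICE); lifetime
	 * of the pages is tracked through @ref. */
	return devm_memremap_pages(&pdev->dev, res, ref, NULL);
}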