Date: Wed, 23 Nov 2016 13:33:32 -0700
From: Jason Gunthorpe
To: Serguei Sagalovitch
Cc: Logan Gunthorpe, Dan Williams, "Deucher, Alexander",
    "linux-nvdimm@lists.01.org", "linux-rdma@vger.kernel.org",
    "linux-pci@vger.kernel.org", "Kuehling, Felix", "Bridgman, John",
    "linux-kernel@vger.kernel.org", "dri-devel@lists.freedesktop.org",
    "Koenig, Christian", "Sander, Ben", "Suthikulpanit, Suravee",
    "Blinzer, Paul", "Linux-media@vger.kernel.org", Haggai Eran
Subject: Re: Enabling peer to peer device transactions for PCIe devices
Message-ID: <20161123203332.GA15062@obsidianresearch.com>

On Wed, Nov 23, 2016 at 02:58:38PM -0500, Serguei Sagalovitch wrote:
> We do not want to have "highly" dynamic translation due to the
> performance cost. We need to support "overcommit" but would like to
> minimize the impact. To support RDMA MRs for GPU/VRAM/PCIe device
> memory (which is a must) we need to either globally force pinning
> for the scope of get_user_pages()/put_page() or have special
> handling for RDMA MRs and similar cases.

As I said, there is no possible special handling. Standard IB
hardware does not support changing the DMA address once an MR is
created. Forget about doing that.

Only ODP hardware allows changing the DMA address on the fly, and it
works at the page table level. We do not need special handling for
RDMA.

> Generally it could be difficult to correctly handle "DMA in
> progress" due to the facts that (a) DMA could originate from
> numerous PCIe devices simultaneously, including requests to receive
> network data.

We handle all of this today in the kernel via the page pinning
mechanism. This needs to be copied into the peer-peer memory and GPU
memory schemes as well. A pinned page means the DMA address cannot
be changed and there is active non-CPU access to it. Any hardware
that does not support page table mirroring must go this route.

> (b) In the HSA case DMA could originate from user space without
> kernel driver knowledge. So without corresponding h/w support
> everywhere I do not see how it could be solved effectively.

All truly user-triggered DMA must go through some kind of coherent
page table mirroring scheme (e.g. this is what CAPI does, and
presumably AMD's HSA is similar). A page table mirroring scheme is
basically the same thing as what ODP does.

Like I said, this is the direction the industry seems to be moving
in, so any solution here should focus on VMAs/page tables as the way
to link the peer-peer devices.

To me this means at least items #1 and #3 should be removed from
Alexander's list.
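To make the fixed-DMA-address point concrete: with standard verbs
the pinning and DMA mapping happen exactly once, inside
ibv_reg_mr(), and there is simply no verb to retarget an existing MR
to different physical pages afterwards. A minimal userspace sketch
(error handling elided; register_pinned_buffer() is a hypothetical
helper name):

/* Register a buffer as an RDMA MR with libibverbs. */
#include <stdlib.h>
#include <infiniband/verbs.h>

struct ibv_mr *register_pinned_buffer(struct ibv_pd *pd, size_t len)
{
	void *buf = malloc(len);

	/*
	 * Here the kernel driver pins the buffer's pages
	 * (get_user_pages()) and programs their DMA addresses into
	 * the HCA's translation tables.  On standard (non-ODP) IB
	 * hardware those addresses are fixed until ibv_dereg_mr().
	 */
	return ibv_reg_mr(pd, buf, len,
			  IBV_ACCESS_LOCAL_WRITE |
			  IBV_ACCESS_REMOTE_READ |
			  IBV_ACCESS_REMOTE_WRITE);
}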
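The pinning mechanism I mean is the usual pattern every driver doing
DMA to user memory follows today, roughly as sketched below (the
exact get_user_pages*() signatures vary between kernel versions;
this follows get_user_pages_fast() as it stands now, and
pin_user_buffer() is a hypothetical name):

#include <linux/mm.h>

/*
 * Take a reference on each page backing the user buffer.  While the
 * references are held the mm will not migrate or reclaim the pages,
 * so their DMA addresses stay stable.
 */
static int pin_user_buffer(unsigned long uaddr, int nr_pages,
			   struct page **pages)
{
	int got = get_user_pages_fast(uaddr, nr_pages,
				      1 /* writable */, pages);

	if (got != nr_pages) {
		while (got > 0)
			put_page(pages[--got]);
		return -EFAULT;
	}

	/* ... dma_map_sg(), run the DMA, then put_page() each page */
	return 0;
}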
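And page table mirroring is not magic either: in the kernel it hangs
off mmu_notifiers, which is exactly what ODP is built on. A rough
sketch of the shape (callback signature as in today's kernels; the
my_* names are hypothetical):

#include <linux/mmu_notifier.h>

/* Hypothetical driver hook that shoots down device PTEs. */
void my_dev_invalidate(struct mmu_notifier *mn,
		       unsigned long start, unsigned long end);

/*
 * The core mm calls this before changing any CPU translation in
 * [start, end).  The driver tears down the matching device page
 * table entries and drains in-flight device access; the device
 * re-faults the pages on its next access.
 */
static void my_invalidate_range_start(struct mmu_notifier *mn,
				      struct mm_struct *mm,
				      unsigned long start,
				      unsigned long end)
{
	my_dev_invalidate(mn, start, end);
}

static const struct mmu_notifier_ops my_mirror_ops = {
	.invalidate_range_start = my_invalidate_range_start,
};

Jason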