From: "Sagalovitch, Serguei"
To: Jason Gunthorpe, Logan Gunthorpe
CC: Dan Williams, "Deucher, Alexander", "linux-nvdimm@lists.01.org",
    "linux-rdma@vger.kernel.org", "linux-pci@vger.kernel.org",
    "Kuehling, Felix", "Bridgman, John", "linux-kernel@vger.kernel.org",
    "dri-devel@lists.freedesktop.org", "Koenig, Christian", "Sander, Ben",
    "Suthikulpanit, Suravee", "Blinzer, Paul",
    "Linux-media@vger.kernel.org", Haggai Eran
Subject: Re: Enabling peer to peer device transactions for PCIe devices
Date: Thu, 24 Nov 2016 00:40:37 +0000
References: <75a1f44f-c495-7d1e-7e1c-17e89555edba@amd.com>
    <45c6e878-bece-7987-aee7-0e940044158c@deltatee.com>
    <20161123190515.GA12146@obsidianresearch.com>
    <7bc38037-b6ab-943f-59db-6280e16901ab@amd.com>
    <20161123193228.GC12146@obsidianresearch.com>
    <20161123203332.GA15062@obsidianresearch.com>
    <20161123215510.GA16311@obsidianresearch.com>
In-Reply-To: <20161123215510.GA16311@obsidianresearch.com>

On Wed, Nov 23, 2016 at 02:11:29PM -0700, Logan Gunthorpe wrote:
> Perhaps I am not following what Serguei is asking for, but I
> understood the desire was for a complex GPU allocator that could
> migrate pages between GPU and CPU memory under control of the GPU
> driver, among other things. The desire is for DMA to continue to work
> even after these migrations happen.

The main issue is how to handle use cases where p2p is
requested/initiated via CPU pointers, and those pointers may point
to a non-system memory location, e.g. VRAM.

Solving this would give the user a consistent working model that deals
only with pointers (HSA, CUDA, OpenCL 2.0 SVM). It would also improve
performance by avoiding double-buffering and extra special-case code
when dealing with PCIe device memory.

Examples are:

- RDMA network operations: RDMA MRs where the registered memory
could be, e.g., VRAM. Currently this is solved using the so-called
PeerDirect interface, which is out-of-tree and provided as part
of OFED.
- File operations (fread/fwrite) when the user wants to transfer file
data directly to/from, e.g., VRAM.

Challenges are:

- Because the graphics sub-system must support overcommit (at least
each application/process should independently see all resources),
ideally such memory should be movable without changing the CPU pointer
value, and should also support being "paged out", with a "page fault"
at least on access from the CPU.
- We must co-exist with the existing DRM infrastructure, as well as
support sharing VRAM between different processes.
- We should be able to deal with large allocations: tens or hundreds
of MBs, or maybe GBs.
- We may have PCIe devices where p2p does not work.
- Potentially any GPU memory should be supported, including
memory carved out from system RAM (e.g. allocated via
get_free_pages()).

Note:
- In the case of RDMA MRs, the life-span of "pinning"
("get_user_pages"/"put_page") may be defined/controlled by the
application, not the kernel, which perhaps should be treated
differently as a special case.

The original proposal was to create "struct pages" for VRAM to
allow "get_user_pages" to work transparently, similar to how it
is/was done for the "DAX Device" case. Unfortunately, based on my
understanding, the "DAX Device" implementation deals only with
permanently "locked" memory (fixed location), unrelated to the
"get_user_pages"/"put_page" scope, which doesn't satisfy the
requirements for "eviction" / "moving" of memory while keeping the
CPU address intact.

> The desire is for DMA to continue to work
> even after these migrations happen

We need at least some kind of mm notifier callback to inform about
changes in location (pre- and post-), similar to how it is done for
system pages. My understanding is that it will not solve the RDMA MR
issue, where the "lock" could last for the whole application life, but
(a) it will not make the RDMA MR case worse and (b) it should be enough
for all other cases where "get_user_pages"/"put_page" is controlled by
the kernel.
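
To make the pre-/post- notification idea a bit more concrete, here is a
rough user-space model in plain C. It is only a sketch of the control
flow, not kernel code and not an existing API: every name in it
(struct region, struct move_notifier, region_move(), struct dma_user)
is made up for illustration. The point is that the CPU address stays
fixed while the bus address changes, and each DMA user drops its stale
mapping before the move and picks up the new one afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; this is a user-space model only. */

enum mem_location { LOC_VRAM, LOC_SYSRAM };

struct region;

/* Callbacks a DMA user registers to hear about relocation. */
struct move_notifier {
    void (*pre_move)(struct move_notifier *mn, struct region *r);
    void (*post_move)(struct move_notifier *mn, struct region *r);
    struct move_notifier *next;
};

/* A buffer whose CPU pointer stays fixed while its backing store moves. */
struct region {
    uintptr_t cpu_addr;              /* stable across migrations */
    enum mem_location loc;           /* where the data currently lives */
    uint64_t bus_addr;               /* changes on every migration */
    struct move_notifier *notifiers; /* singly linked list of listeners */
};

void region_register(struct region *r, struct move_notifier *mn)
{
    mn->next = r->notifiers;
    r->notifiers = mn;
}

/* Migrate the backing store, bracketing the move with notifications. */
void region_move(struct region *r, enum mem_location loc, uint64_t bus_addr)
{
    struct move_notifier *mn;

    assert(r != NULL);
    for (mn = r->notifiers; mn; mn = mn->next)
        if (mn->pre_move)
            mn->pre_move(mn, r);     /* listeners must stop DMA here */

    r->loc = loc;
    r->bus_addr = bus_addr;          /* cpu_addr is deliberately untouched */

    for (mn = r->notifiers; mn; mn = mn->next)
        if (mn->post_move)
            mn->post_move(mn, r);    /* listeners may re-map here */
}

/* Example listener: a device that caches the bus address for DMA. */
struct dma_user {
    struct move_notifier mn;         /* first member, so the cast below is valid */
    uint64_t mapped_bus_addr;
    int mapped;
    int remaps;
};

void dma_user_pre(struct move_notifier *mn, struct region *r)
{
    struct dma_user *u = (struct dma_user *)mn;

    (void)r;
    u->mapped = 0;                   /* drop the stale mapping */
}

void dma_user_post(struct move_notifier *mn, struct region *r)
{
    struct dma_user *u = (struct dma_user *)mn;

    u->mapped_bus_addr = r->bus_addr;
    u->mapped = 1;
    u->remaps++;
}

void dma_user_attach(struct dma_user *u, struct region *r)
{
    u->mn.pre_move = dma_user_pre;
    u->mn.post_move = dma_user_post;
    u->mn.next = NULL;
    u->mapped_bus_addr = r->bus_addr;
    u->mapped = 1;
    u->remaps = 0;
    region_register(r, &u->mn);
}
```

A caller would attach a dma_user to a region and then call
region_move() whenever the allocator evicts or migrates the buffer.
The RDMA MR case does not fit this model well, since a long-lived
registration cannot simply be quiesced in the pre-move callback.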