From: Christian König
To: Jason Gunthorpe, Logan Gunthorpe
CC: Serguei Sagalovitch, Dan Williams, "Deucher, Alexander",
 "linux-nvdimm@lists.01.org", "linux-rdma@vger.kernel.org",
 "linux-pci@vger.kernel.org", "Kuehling, Felix", "Bridgman, John",
 "linux-kernel@vger.kernel.org", "dri-devel@lists.freedesktop.org",
 "Sander, Ben", "Suthikulpanit, Suravee", "Blinzer, Paul",
 "Linux-media@vger.kernel.org", Haggai Eran
Subject: Re: Enabling peer to peer device transactions for PCIe devices
Date: Fri, 25 Nov 2016 14:22:17 +0100
Message-ID: <3f2d2db3-fb75-2422-2a18-a8497fd5d70e@amd.com>
In-Reply-To: <20161124164249.GD20818@obsidianresearch.com>
References: <75a1f44f-c495-7d1e-7e1c-17e89555edba@amd.com>
 <45c6e878-bece-7987-aee7-0e940044158c@deltatee.com>
 <20161123190515.GA12146@obsidianresearch.com>
 <7bc38037-b6ab-943f-59db-6280e16901ab@amd.com>
 <20161123193228.GC12146@obsidianresearch.com>
 <20161123203332.GA15062@obsidianresearch.com>
 <20161123215510.GA16311@obsidianresearch.com>
 <91d28749-bc64-622f-56a1-26c00e6b462a@deltatee.com>
 <20161124164249.GD20818@obsidianresearch.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 24.11.2016 at 17:42, Jason Gunthorpe wrote:
> On Wed, Nov 23, 2016 at 06:25:21PM -0700, Logan Gunthorpe wrote:
>>
>> On 23/11/16 02:55 PM, Jason Gunthorpe wrote:
>>>>> Only ODP hardware allows changing the DMA address on the fly, and it
>>>>> works at the page table level. We do not need special handling for
>>>>> RDMA.
>>>> I am aware of ODP but, noted by others, it doesn't provide a general
>>>> solution to the points above.
>>> How do you mean?
>> I was only saying it wasn't general in that it wouldn't work for IB
>> hardware that doesn't support ODP or other hardware that doesn't do
>> similar things (like an NVMe drive).
> There are three cases to worry about:
> - Coherent long lived page table mirroring (RDMA ODP MR)
> - Non-coherent long lived page table mirroring (RDMA MR)
> - Short lived DMA mapping (everything else)
>
> Like you say below we have to handle short lived in the usual way, and
> that covers basically every device except IB MRs, including the
> command queue on a NVMe drive.

Well, a problem which hasn't been mentioned so far is that while GPUs do
have a page table to mirror the CPU page table, they usually can't
recover from page faults.

So what we do is make sure that all memory accessed by the GPU jobs
stays in place while those jobs run (pretty much the same pinning you do
for DMA).

But since this can lock down huge amounts of memory, the whole command
submission to GPUs is bound to the memory management. So when too much
memory would get blocked by the GPU, we block further command
submissions until the situation resolves.

>> any complex allocators (GPU or otherwise) should respect that. And that
>> seems like it should be the default way most of this works -- and I
>> think it wouldn't actually take too much effort to make it all work now
>> as is. (Our iopmem work is actually quite small and simple.)
> Yes, absolutely, some kind of page pinning like locking is a hard
> requirement.
>
>> Yeah, we've had RDMA and O_DIRECT transfers to PCIe backed ZONE_DEVICE
>> memory working for some time. I'd say it's a good fit. The main question
>> we've had is how to expose PCIe bars to userspace to be used as MRs and
>> such.
> Is there any progress on that?
>
> I still don't quite get what iopmem was about.. I thought the
> objection to uncachable ZONE_DEVICE & DAX made sense, so running DAX
> over iopmem and still ending up with uncacheable mmaps still seems
> like a non-starter to me...
>
> Serguei, what is your plan in GPU land for migration? Ie if I have a
> CPU mapped page and the GPU moves it to VRAM, it becomes non-cachable
> - do you still allow the CPU to access it? Or do you swap it back to
> cachable memory if the CPU touches it?

That depends on the policy in place, but currently it's the other way
around most of the time. E.g. we allocate memory in VRAM, the CPU writes
to it write-combined (WC) and avoids reading because that is slow, and
the GPU in turn can access it at full speed.

When we run out of VRAM we move those allocations to system memory and
update both the CPU and the GPU page tables.

So that move is transparent both for userspace and for shaders running
on the GPU.

> One approach might be to mmap the uncachable ZONE_DEVICE memory and
> mark it inaccessible to the CPU - DMA could still translate. If the
> CPU needs it then the kernel migrates it to system memory so it
> becomes cachable. ??

The whole purpose of this effort is that we can do I/O on VRAM directly
without migrating everything back to system memory.

Allowing this, but then doing the migration on the first touch by the
CPU, is clearly not a good idea.

Regards,
Christian.

>
> Jason
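
To illustrate the command-submission throttling described above (jobs
pin their memory while they run, and further submissions block once too
much memory would be pinned), here is a minimal sketch. It is a toy
Python model, not amdgpu or kernel code: PinBudget, limit_bytes, submit
and retire are hypothetical names, and the in-order wait queue is an
assumption. The mail only says that submissions block until the
situation resolves.

```python
from collections import deque


class PinBudget:
    """Toy model: jobs pin memory while running; submissions stall past a limit."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes   # hypothetical cap on pinned memory
        self.pinned_bytes = 0            # memory pinned by running jobs
        self.running = {}                # job id -> pinned size
        self.waiting = deque()           # submissions blocked on the budget

    def submit(self, job_id, size):
        """Start the job if its memory fits in the budget, else queue it."""
        # Also queue behind earlier waiters so submissions keep their order.
        if self.waiting or self.pinned_bytes + size > self.limit_bytes:
            self.waiting.append((job_id, size))
            return False
        self._start(job_id, size)
        return True

    def _start(self, job_id, size):
        self.running[job_id] = size
        self.pinned_bytes += size

    def retire(self, job_id):
        """Job finished: unpin its memory and start whatever now fits."""
        self.pinned_bytes -= self.running.pop(job_id)
        started = []
        while (self.waiting
               and self.pinned_bytes + self.waiting[0][1] <= self.limit_bytes):
            next_id, size = self.waiting.popleft()
            self._start(next_id, size)
            started.append(next_id)
        return started


budget = PinBudget(limit_bytes=100)
first = budget.submit("a", 60)    # fits within the budget
second = budget.submit("b", 50)   # would exceed the limit, so it waits
resumed = budget.retire("a")      # unpinning "a" lets "b" start
```

In this model a job larger than the whole budget would wait forever; a
real driver would have to reject or split such a submission, which is
outside what the mail describes.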