From: Serguei Sagalovitch
To: Christian König, Dan Williams, Dave Hansen, linux-nvdimm@lists.01.org, linux-rdma@vger.kernel.org, linux-pci@vger.kernel.org, "Kuehling, Felix", linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org, "Sander, Ben", "Suthikulpanit, Suravee", "Deucher, Alexander", "Blinzer, Paul", Linux-media@vger.kernel.org
Subject: Re: Enabling peer to peer device transactions for PCIe devices
Date: Wed, 23 Nov 2016 14:27:06 -0500
Message-ID: <6ed2c861-7493-902b-bfbd-9c937a3efba3@amd.com>
In-Reply-To: <2a8a6582-f3de-5cda-0c6e-1c93774147e0@amd.com>

On 2016-11-23 03:51 AM, Christian König wrote:
> On 23.11.2016 at 08:49, Daniel Vetter wrote:
>> On Tue, Nov 22, 2016 at 01:21:03PM -0800, Dan Williams wrote:
>>> On Tue, Nov 22, 2016 at 1:03 PM, Daniel Vetter wrote:
>>>> On Tue, Nov 22, 2016 at 9:35 PM, Serguei Sagalovitch wrote:
>>>>> On 2016-11-22 03:10 PM, Daniel Vetter wrote:
>>>>>> On Tue, Nov 22, 2016 at 9:01 PM, Dan Williams wrote:
>>>>>>> On Tue, Nov 22, 2016 at 10:59 AM, Serguei Sagalovitch wrote:
>>>>>>>> I personally like the "device-DAX" idea, but my concerns are:
>>>>>>>>
>>>>>>>> - How well will it co-exist with the DRM infrastructure /
>>>>>>>>   implementations, in the part dealing with CPU pointers?
>>>>>>> Inside the kernel a device-DAX range is "just memory" in the sense
>>>>>>> that you can perform pfn_to_page() on it and issue I/O, but the vma
>>>>>>> is not migratable. To be honest I do not know how well that
>>>>>>> co-exists with drm infrastructure.
>>>>>>>
>>>>>>>> - How well will we be able to handle the case when we need to
>>>>>>>>   "move"/"evict" memory/data to a new location, so that the CPU
>>>>>>>>   pointer should point to the new physical location/address
>>>>>>>>   (which may not be in PCI device memory at all)?
>>>>>>> So, device-DAX deliberately avoids support for in-kernel migration
>>>>>>> or overcommit. Those cases are left to the core mm or drm. The
>>>>>>> device-dax interface is for cases where all that is needed is a
>>>>>>> direct-mapping to a statically-allocated physical-address range be
>>>>>>> it persistent memory or some other special reserved memory range.
>>>>>> For some of the fancy use-cases (e.g. to be comparable to what HMM
>>>>>> can pull off) I think we want all the magic in core mm, i.e.
>>>>>> migration and overcommit.
>>>>>> At least that seems to be the very strong drive in all
>>>>>> general-purpose gpu abstractions and implementations, where memory
>>>>>> is allocated with malloc, and then mapped/moved into vram/gpu
>>>>>> address space through some magic,
>>>>> It is possible that it is the other way around: memory is requested
>>>>> to be allocated and should be kept in vram for performance reasons,
>>>>> but due to a possible overcommit case we need, at least temporarily,
>>>>> to "move" such an allocation to system memory.
>>>> With migration I meant migrating both ways of course. And with stuff
>>>> like numactl we can also influence where exactly the malloc'ed memory
>>>> is allocated originally, at least if we'd expose the vram range as a
>>>> very special numa node that happens to be far away and not hold any
>>>> cpu cores.
>>> I don't think we should be using numa distance to reverse engineer a
>>> certain allocation behavior. The latency data should be truthful, but
>>> you're right we'll need a mechanism to keep general purpose
>>> allocations out of that range by default. Btw, strict isolation is
>>> another design point of device-dax, but I think in this case we're
>>> describing something between the two extremes of full isolation and
>>> full compatibility with existing numactl apis.
>> Yes, agreed. My idea with exposing vram sections using numa nodes wasn't
>> to reuse all the existing allocation policies directly, those won't
>> work. So at boot-up your default numa policy would exclude any vram
>> nodes.
>>
>> But I think (as an -mm layman) that numa gives us a lot of the tools and
>> policy interface that we need to implement what we want for gpus.
>
> Agree completely. From a ten mile high view our GPUs are just command
> processors with local memory as well.
>
> Basically this is also the whole idea of what AMD has been pushing with
> HSA for a while.
>
> It's just that a lot of problems start to pop up when you look at all
> the nasty details.
> For example only part of the GPU memory is usually accessible by the
> CPU.
>
> So even when numa nodes expose a good foundation for this I think
> there is still a lot of code to write.
>
> BTW: I should probably start to read into the numa code of the kernel.
> Any good pointers for that?

I would assume that the "page" allocation logic itself should live inside
the graphics driver, because graphics may have different requirements,
such as alignment.

>
> Regards,
> Christian.
>
>> Wrt isolation: There's a sliding scale of what different users expect,
>> from full auto everything, including migrating pages around if needed to
>> full isolation all seems to be on the table. As long as we keep vram
>> nodes out of any default allocation numasets, full isolation should be
>> possible.
>> -Daniel
>
>

Sincerely yours,
Serguei Sagalovitch
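
The policy discussed in this thread (keep vram nodes out of the default allocation set, but allow an explicit opt-in) can be sketched in plain C as below. This is only an illustration of the idea, not kernel or libnuma code: `nodemask`, `struct topology`, `default_policy`, `explicit_policy` and `node_allowed` are hypothetical names, and a 64-bit mask with one bit per node is an assumption made for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per NUMA node (at most 64 nodes). */
typedef uint64_t nodemask;

struct topology {
    nodemask all_nodes;   /* every node present in the system */
    nodemask vram_nodes;  /* nodes backed by device memory, holding no CPU cores */
};

/* Default policy: general-purpose allocations may use any node except vram nodes. */
static nodemask default_policy(const struct topology *t)
{
    return t->all_nodes & ~t->vram_nodes;
}

/* Explicit policy: the caller names the nodes it wants; nodes that do not exist are dropped. */
static nodemask explicit_policy(const struct topology *t, nodemask wanted)
{
    return wanted & t->all_nodes;
}

/* True if the given node may be used under the given policy. */
static bool node_allowed(nodemask policy, unsigned node)
{
    return node < 64 && ((policy >> node) & 1u) != 0;
}
```

With nodes 0 and 1 as system memory and node 2 as vram, `default_policy()` yields a mask that excludes node 2, while `explicit_policy()` called with node 2 set still allows it. Alignment or CPU-visibility constraints, as raised above, would remain the driver's job in this model.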