Date: Fri, 24 Apr 2015 16:32:28 -0400
From: Jerome Glisse
To: Christoph Lameter
Cc: Benjamin Herrenschmidt, paulmck@linux.vnet.ibm.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, jglisse@redhat.com, mgorman@suse.de, aarcange@redhat.com,
	riel@redhat.com, airlied@redhat.com, aneesh.kumar@linux.vnet.ibm.com,
	Cameron Buschardt, Mark Hairgrove, Geoffrey Gerfin, John McKenna,
	akpm@linux-foundation.org
Subject: Re: Interacting with coherent memory on external devices

On Fri, Apr 24, 2015 at 03:00:18PM -0500, Christoph Lameter wrote:
> On Fri, 24 Apr 2015, Jerome Glisse wrote:
>
> > > Still no answer as to why that is not possible with the current scheme?
> > > You keep talking about pointers and I keep responding that this is a
> > > matter of making the address space compatible on both sides.
> >
> > So if we do that in a naive way, how can we migrate a chunk of memory to
> > video memory while still properly handling the case where the CPU tries to
> > access that same memory while it is migrated to GPU memory?
>
> Well, that is the same issue that the migration code I submitted to the
> kernel a long time ago already handles.

Yes, so you had to modify the kernel for that! So do we, and no, page
migration as it exists today is not sufficient and does not cover all the
use cases we have.

> > Without modifying a single line of mm code, the only way to do this is to
> > either unmap the range being migrated from the CPU page table or to
> > mprotect it in some way. In both cases a CPU access will trigger some kind
> > of fault.
>
> Yes, that is how Linux migration works. If you can fix that then how about
> improving page migration in Linux between NUMA nodes first?

In my case I cannot use page migration because there is nowhere to hook in
order to explain how to migrate things back and forth with a device. The
page migration code runs entirely on the CPU and enjoys the benefit of being
able to do things atomically; I do not have that luxury. Moreover, the core
mm code assumes that a CPU PTE migration entry is a short-lived state. In
the case of migration to device memory we are talking about time spans of
several minutes. So obviously page migration is not what we want; we want
something similar but with different properties. That is exactly what my
HMM patchset provides. (A toy userspace illustration of the fault-on-access
trick quoted above follows below.)

What Paul wants to do, however, should be able to leverage the page
migration that does exist. But again, he has a far more advanced platform.
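To make the fault-on-access point a bit more concrete for people not
following the mm details, here is a toy userspace sketch. This is
emphatically not the kernel fault path, HMM, or any driver code; the flow
is only an illustration. Revoking CPU access to a range with mprotect()
means any CPU touch during the "migration" traps, and the fault handler is
where a policy like "migrate it back to system memory" would hook in:

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static void *range;
static size_t range_size;

static void fault_handler(int sig, siginfo_t *si, void *ctx)
{
	(void)sig; (void)si; (void)ctx;
	/* CPU touched the range while it was "migrated away": give access
	 * back so the faulting instruction can retry. A real implementation
	 * would migrate the data back to system memory here. */
	mprotect(range, range_size, PROT_READ | PROT_WRITE);
}

int main(void)
{
	struct sigaction sa;

	range_size = sysconf(_SC_PAGESIZE);
	range = mmap(NULL, range_size, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = fault_handler;
	sa.sa_flags = SA_SIGINFO;
	sigaction(SIGSEGV, &sa, NULL);

	/* "Migrate" the range: revoke CPU access for the duration. */
	mprotect(range, range_size, PROT_NONE);

	/* This access faults, lands in fault_handler(), then retries. */
	((char *)range)[0] = 42;
	printf("CPU access was trapped and handled, value = %d\n",
	       ((char *)range)[0]);
	return 0;
}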
> > This is not the behavior we want. What we want is the same address space
> > while being able to migrate system memory to device memory (who makes that
> > decision should not be part of this discussion) while still gracefully
> > handling any CPU access.
>
> Well then there could be a situation where you have concurrent write
> access. How do you reconcile that then? Somehow you need to stall one or
> the other until the transaction is complete.

No, it is exactly like threads on a CPU: if you have two threads that write
to the same address without any kind of synchronization between them, you
cannot predict what the end result will be. The same will happen here;
either the GPU write lands last or the CPU one does. Anyway, this is not the
use case we have in mind. We are thinking about concurrent access to the
same page in a non-conflicting way. Any conflicting access is a software
bug, just as it is in the case of CPU threads.

> > This means that if the CPU accesses it, we want to migrate the memory back
> > to system memory. To achieve this there is no way around adding a couple
> > of ifs inside the mm page fault code path. Now, do you want each driver to
> > add its own if branch, or do you want a common infrastructure to do just
> > that?
>
> If you can improve the page migration in general then we certainly would
> love that. Having faultless migration is certainly a good thing for a lot
> of functionality that depends on page migration.

The faultless migration I am talking about is only on the GPU side, and it
is just an extra feature: you keep something mapped read-only while
migrating it to device memory and update the GPU page table once done. So
the GPU keeps accessing system memory without interruption; this assumes
read-only access. Otherwise you need a faulting migration, though you can
cooperate with the thread scheduler to schedule other threads while the
migration is ongoing.

> > As I keep saying, the solution you propose is what we have today: today we
> > have a fake shared address space through the trick of remapping system
> > memory at the same address inside the GPU address space, and also by
> > enforcing the use of a special memory allocator that goes behind the back
> > of the mm code.
>
> Hmmm... I'd like to know more details about that.

Well, there is no open source OpenCL 2.0 stack for discrete GPUs. But the
idea is that you need a special allocator because the GPU driver needs to
know about all the possible pages that might be used, i.e. there is no page
fault, so all objects need to be mapped and thus all pages are pinned down.
Well, it is a little more complex than that, as the special allocator keeps
track of each allocation, creating an object for each of them and trying to
pin only the objects used by the current shader. Anyway, the bottom line is
that it needs a special allocator: you cannot use mmaped files directly, or
shared memory directly, or anonymous memory allocated outside the special
allocator. It requires pinning memory. It cannot migrate memory to device
memory. We want to fix all of that. (A rough sketch of such a pinning
allocator is at the bottom of this mail.)

> > As you pointed out, not using GPU memory is a waste and we want to be able
> > to use it. Now, Paul has more sophisticated hardware that offers
> > opportunities to do things in a more transparent and efficient way.
>
> Does this also work between NUMA nodes in a Power8 system?

My guess is that it just improves the data exchange between the device and
the CPU, trying to make device memory access cost about as much as a remote
CPU memory access would. I do not think it improves regular NUMA nodes.
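To expand a bit on the "special allocator" point above, here is a very rough
userspace sketch of what "pin everything up front" means in practice. This
is not actual OpenCL runtime or driver code; mlock() merely stands in for
the driver pinning pages and mapping them into the GPU address space, and
all the names (gpu_buffer, gpu_alloc, gpu_free) are made up:

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical bookkeeping object the allocator keeps per allocation. */
struct gpu_buffer {
	void *cpu_addr;		/* same address the GPU would use */
	size_t size;
};

static struct gpu_buffer *gpu_alloc(size_t size)
{
	struct gpu_buffer *buf = malloc(sizeof(*buf));

	if (!buf)
		return NULL;
	buf->size = size;
	buf->cpu_addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
			     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf->cpu_addr == MAP_FAILED) {
		free(buf);
		return NULL;
	}
	if (mlock(buf->cpu_addr, size)) {	/* cannot pin -> GPU cannot use it */
		munmap(buf->cpu_addr, size);
		free(buf);
		return NULL;
	}
	/* A real driver would now map these pinned pages at the same address
	 * into the GPU page tables. No GPU page fault, no migration. */
	return buf;
}

static void gpu_free(struct gpu_buffer *buf)
{
	munlock(buf->cpu_addr, buf->size);
	munmap(buf->cpu_addr, buf->size);
	free(buf);
}

int main(void)
{
	struct gpu_buffer *buf = gpu_alloc(1 << 20);

	if (!buf)
		return 1;
	/* Anything NOT allocated through gpu_alloc() (plain malloc, mmaped
	 * files, shared memory) stays invisible to the GPU in this model. */
	printf("pinned %zu bytes at %p\n", buf->size, buf->cpu_addr);
	gpu_free(buf);
	return 0;
}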
Cheers,
Jérôme