Date: Fri, 24 Apr 2015 15:29:00 -0400
From: Jerome Glisse
To: Christoph Lameter
Cc: Benjamin Herrenschmidt, paulmck@linux.vnet.ibm.com,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, jglisse@redhat.com,
    mgorman@suse.de, aarcange@redhat.com, riel@redhat.com, airlied@redhat.com,
    aneesh.kumar@linux.vnet.ibm.com, Cameron Buschardt, Mark Hairgrove,
    Geoffrey Gerfin, John McKenna, akpm@linux-foundation.org
Subject: Re: Interacting with coherent memory on external devices
Message-ID: <20150424192859.GF3840@gmail.com>

On Fri, Apr 24, 2015 at 01:56:45PM -0500, Christoph Lameter wrote:
> On Fri, 24 Apr 2015, Jerome Glisse wrote:
>
> > > Right, this is how things work and you could improve on that. Stay with
> > > the scheme. Why would that not work if you map things the same way in
> > > both environments, if both accelerator and host processor can access
> > > each other's memory?
> >
> > Again and again, shared address space: having a pointer means the same
> > thing for the GPU as it means for the CPU, ie a random pointer points to
> > the same memory whether it is accessed by the GPU or the CPU, while also
> > keeping the properties of the backing memory. It can be memory shared
> > with another process, a file mmaped from disk or simply anonymous memory,
> > and thus we have no control whatsoever over how such memory is allocated.
>
> Still no answer as to why that is not possible with the current scheme?
> You keep on talking about pointers and I keep on responding that this is a
> matter of making the address space compatible on both sides.

So if we do that in a naive way, how can we migrate a chunk of memory to
video memory while still properly handling the case where the CPU tries to
access that same memory while it is migrated to GPU memory? Without
modifying a single line of mm code, the only way to do this is either to
unmap the range being migrated from the CPU page table or to mprotect it in
some way. In both cases a CPU access will trigger some kind of fault.

This is not the behavior we want. What we want is the same address space
while being able to migrate system memory to device memory (who makes that
decision should not be part of this discussion) while still gracefully
handling any CPU access. This means that if the CPU accesses it, we want to
migrate the memory back to system memory. To achieve this there is no way
around adding a couple of ifs inside the mm page fault code path. Now do you
want each driver to add its own if branch, or do you want a common
infrastructure to do just that? (A rough sketch of such a branch follows
below.)
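To make that concrete, here is a minimal sketch, not an actual patch, of
what such a branch in the non-present fault path could look like. Assume a
page migrated to device memory leaves a special swap-like entry behind in
the CPU page table; is_device_entry() and migrate_back_to_system() are
hypothetical helper names used only for illustration, standing in for the
common infrastructure being argued for here.

/*
 * Sketch only.  A non-present pte carrying a device entry means the
 * page currently lives in device memory, so a CPU fault migrates it
 * back to system memory instead of failing the access.
 * is_device_entry() and migrate_back_to_system() are made-up names.
 */
static int sketch_nonpresent_fault(struct mm_struct *mm,
				   struct vm_area_struct *vma,
				   unsigned long address, pte_t *ptep,
				   pmd_t *pmd, unsigned int flags,
				   pte_t orig_pte)
{
	if (is_device_entry(orig_pte)) {
		/*
		 * CPU touched memory that was migrated to the device:
		 * bring it back to system memory and fix up the page
		 * table, transparently to the faulting process.
		 */
		return migrate_back_to_system(mm, vma, address, ptep,
					      pmd, flags, orig_pte);
	}

	/* Otherwise fall back to the usual swap fault handling. */
	return do_swap_page(mm, vma, address, ptep, pmd, flags, orig_pte);
}

The point is that this branch would live once, in common mm code, rather
than being duplicated as driver-private hacks.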
As I keep saying, the solution you propose is what we have today: a fake
shared address space built by remapping system memory at the same address
inside the GPU address space, and by enforcing the use of a special memory
allocator that goes behind the back of the mm code. But this limits you to
system memory only; you cannot use video memory transparently through such a
scheme. Another trick used today is to copy memory to device memory, not
bother with CPU access and pretend it cannot happen, so the GPU and the CPU
can diverge in what they see at the same address. We want to avoid tricks
like this that just lead to weird and unexpected behavior. As you pointed
out, not using GPU memory is a waste and we want to be able to use it. Now
Paul has more sophisticated hardware that offers opportunities to do things
in a more transparent and efficient way.

> > Then you add transparent migration (transparent in the sense that we can
> > handle CPU page faults on migrated memory) and you will see that you
> > need to modify the kernel to become aware of this and provide common
> > code to deal with all this.
>
> If the GPU works like a CPU (which I keep hearing) then you should also be
> able to run a linux kernel on it and make it a regular NUMA node. Hey why
> don't we make the host cpu a GPU (hello Xeon Phi).

I am not saying it works like a CPU, I am saying it should face the same
kind of pattern when it comes to page faults, ie a page fault is not the end
of the world for the GPU, and you should not assume that all GPU threads
will wait for a page fault, because that is not the common case on the CPU
either. Yes, we prefer when page faults never happen; so does the CPU.

No, you cannot run the Linux kernel on the GPU unless you are willing to let
the kernel run on a heterogeneous architecture with different instruction
sets, not even going into the problems of ring level/system level. We might
one day go down that road, but I see no compelling reason for it today.

Cheers,
Jérôme