Message-ID: <553A8E62.4060802@amd.com>
Date: Fri, 24 Apr 2015 21:41:38 +0300
From: Oded Gabbay
To: Jerome Glisse, Christoph Lameter
CC: Benjamin Herrenschmidt, Cameron Buschardt, Mark Hairgrove,
    Geoffrey Gerfin, John McKenna, "Bridgman, John"
Subject: Re: Interacting with coherent memory on external devices
In-Reply-To: <20150423162245.GC2399@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org
On 04/23/2015 07:22 PM, Jerome Glisse wrote:
> On Thu, Apr 23, 2015 at 09:20:55AM -0500, Christoph Lameter wrote:
>> On Thu, 23 Apr 2015, Benjamin Herrenschmidt wrote:
>>
>>>> There are hooks in glibc where you can replace the memory
>>>> management of the apps if you want that.
>>>
>>> We don't control the app. Let's say we are doing a plugin for libfoo
>>> which accelerates "foo" using GPUs.
>>
>> There are numerous examples of malloc implementations that can be used
>> for apps without modifying the app.
>
> What about shared memory passed between processes? Or mmaped files? Or
> a library that is loaded through dlopen and thus has no way to control
> any allocation that happened before it became active?
>
>>>
>>> Now some other app we have no control on uses libfoo. So pointers
>>> already allocated/mapped, possibly a long time ago, will hit libfoo
>>> (or the plugin) and we need GPUs to churn on the data.
>>
>> If the GPU would need to suspend one of its computation threads to
>> wait on a mapping to be established on demand, then it looks like the
>> performance of the parallel threads on the GPU will be significantly
>> compromised. You would want to do the transfer explicitly in some
>> fashion that meshes with the concurrent calculation in the GPU. You
>> do not want stalls while GPU number crunching is ongoing.
>
> You do not understand how GPUs work. A GPU has a pool of threads, and
> it always tries to keep that pool as big as possible, so that when one
> group of threads is waiting on a memory access, there are other threads
> ready to perform some operation. GPUs are about hiding memory latency;
> that is what they are good at. But they only achieve that when they
> have more threads in flight than compute units. The whole thread
> scheduling is done by hardware and is barely controlled by the device
> driver.
>
> So no, having the GPU wait for a page fault is not as dramatic as you
> think. If you use GPUs as they are intended to be used, you might never
> even notice the page fault and still reach close to the theoretical
> throughput of the GPU.
>
>>
>>> The point I'm making is you are arguing against a usage model which
>>> has been repeatedly asked for by a large number of customers (after
>>> all, that's also why HMM exists).
>>
>> I am still not clear on what the use case for this would be. Who is
>> asking for this?
>
> Everyone but you?
> The OpenCL 2.0 spec specifically calls for it and defines several
> levels of support for a transparent address space. The lowest one is
> the one implemented today, in which the application needs to use a
> special memory allocator.
>
> The most advanced one implies integration with the kernel, in which
> any memory (mmaped file, shared memory or anonymous memory) can be
> used by the GPU and does not need to come from a special allocator.
>
> Everyone in the industry is moving toward the most advanced one. That
> is the raison d'être of HMM: to provide this functionality on hw
> platforms that do not have things such as CAPI. Which is x86/arm.
>
> So the use case is every application using OpenCL or CUDA. Pretty much
> everyone doing GPGPU wants this. I don't know how you can't see that.
> A shared address space is so much easier. Believe it or not, most
> coders do not have deep knowledge of how things work, and if you can
> remove the complexity of different memory allocations and different
> address spaces from them, they will be happy.
>
> Cheers,
> Jérôme

I second what Jerome said, and add that one of the key features of HSA
is the ptr-is-a-ptr scheme, where applications do *not* need to handle
different address spaces. Instead, all the memory is seen as a unified
address space.

See slide 6 in the following presentation:
http://www.slideshare.net/hsafoundation/hsa-overview

Thanks,
Oded