Subject: Re: [PATCH 0/6] Cache coherent device memory (CDM) with HMM v5
To: Jerome Glisse
CC: John Hubbard, David Nellans, Dan Williams, Balbir Singh, Michal Hocko
From: Bob Liu
Date: Wed, 19 Jul 2017 09:46:10 +0800
References: <20170713211532.970-1-jglisse@redhat.com> <2d534afc-28c5-4c81-c452-7e4c013ab4d0@huawei.com> <20170718153816.GA3135@redhat.com>
In-Reply-To: <20170718153816.GA3135@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2017/7/18 23:38, Jerome Glisse wrote:
> On Tue, Jul 18, 2017 at 11:26:51AM +0800, Bob Liu wrote:
>> On 2017/7/14 5:15, Jérôme Glisse wrote:
>>> Sorry i made horrible mistake on names in v4, i completly miss-
>>> understood the suggestion. So here i repost with proper naming.
>>> This is the only change since v3. Again sorry about the noise
>>> with v4.
>>>
>>> Changes since v4:
>>>   - s/DEVICE_HOST/DEVICE_PUBLIC
>>>
>>> Git tree:
>>> https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-cdm-v5
>>>
>>>
>>> Cache coherent device memory apply to architecture with system bus
>>> like CAPI or CCIX. Device connected to such system bus can expose
>>> their memory to the system and allow cache coherent access to it
>>> from the CPU.
>>>
>>> Even if for all intent and purposes device memory behave like regular
>>> memory, we still want to manage it in isolation from regular memory.
>>> Several reasons for that, first and foremost this memory is less
>>> reliable than regular memory if the device hangs because of invalid
>>> commands we can loose access to device memory. Second CPU access to
>>> this memory is expected to be slower than to regular memory. Third
>>> having random memory into device means that some of the bus bandwith
>>> wouldn't be available to the device but would be use by CPU access.
>>>
>>> This is why we want to manage such memory in isolation from regular
>>> memory. Kernel should not try to use this memory even as last resort
>>> when running out of memory, at least for now.
>>>
>>
>> I think set a very large node distance for "Cache Coherent Device Memory"
>> may be a easier way to address these concerns.
>
> Such approach was discuss at length in the past see links below. Outcome
> of discussion:
>   - CPU less node are bad
>   - device memory can be unreliable (device hang) no way for application
>     to understand that

Device memory can also be more reliable if it uses high-quality,
expensive memory.

>   - application and driver NUMA madvise/mbind/mempolicy ... can conflict
>     with each other and no way the kernel can figure out which should
>     apply
>   - NUMA as it is now would not work as we need further isolation that
>     what a large node distance would provide
>

Agreed, that's where we need to spend time.

One drawback of HMM-CDM I'm worried about is one extra copy.
In the cache coherent case, the CPU can write data to device memory
directly and then start the FPGA/GPU/other accelerators.

Thanks,
Bob Liu
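
To make the extra-copy concern above concrete, here is a toy sketch in
plain userspace C. It is only an illustration under loud assumptions:
"device memory" is just a heap buffer, and fake_device,
fill_via_staging(), fill_direct() and bytes_copied are hypothetical
names invented for this example. None of this is HMM or kernel API, and
it does not claim to model how HMM-CDM actually migrates pages.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: a heap buffer plays the role of device memory. */
typedef struct {
    unsigned char *mem;
    size_t size;
} fake_device;

/* Bytes moved by explicit staging copies (the cost being discussed). */
static size_t bytes_copied;

/*
 * Path A: the CPU produces data in a system-memory staging buffer,
 * which is then copied into device memory before the device starts.
 */
static void fill_via_staging(fake_device *dev, unsigned char value)
{
    assert(dev && dev->mem);

    unsigned char *staging = malloc(dev->size);
    if (!staging)
        abort();

    memset(staging, value, dev->size);     /* CPU writes system memory */
    memcpy(dev->mem, staging, dev->size);  /* the one extra copy */
    bytes_copied += dev->size;

    free(staging);
}

/*
 * Path B: with cache coherent access, the CPU writes straight into
 * device memory, so no staging copy is needed.
 */
static void fill_direct(fake_device *dev, unsigned char value)
{
    assert(dev && dev->mem);
    memset(dev->mem, value, dev->size);
}
```

Calling fill_via_staging() on a 64-byte fake_device adds 64 to
bytes_copied, while fill_direct() leaves the counter untouched. Both
leave the same contents in the buffer; the difference is only the
intermediate copy.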