From: Bob Liu
To: Jerome Glisse
CC: John Hubbard, David Nellans, Dan Williams, Balbir Singh, Michal Hocko
Subject: Re: [PATCH 0/6] Cache coherent device memory (CDM) with HMM v5
Date: Fri, 21 Jul 2017 10:10:11 +0800
Message-ID: <052b3b89-6382-a1b8-270f-3a4e44158964@huawei.com>
In-Reply-To: <20170721014106.GB25991@redhat.com>

On 2017/7/21 9:41, Jerome Glisse wrote:
> On Fri, Jul 21, 2017 at 09:15:29AM +0800, Bob Liu wrote:
>> On 2017/7/20 23:03, Jerome Glisse wrote:
>>> On Wed, Jul 19, 2017 at 05:09:04PM +0800, Bob Liu wrote:
>>>> On 2017/7/19 10:25, Jerome Glisse wrote:
>>>>> On Wed, Jul 19, 2017 at 09:46:10AM +0800, Bob Liu wrote:
>>>>>> On 2017/7/18 23:38, Jerome Glisse wrote:
>>>>>>> On Tue, Jul 18, 2017 at 11:26:51AM +0800, Bob Liu wrote:
>>>>>>>> On 2017/7/14 5:15, Jérôme Glisse wrote:
>
> [...]
>
>>>> Then it's more like replacing the NUMA node solution (CDM) with ZONE_DEVICE
>>>> (type MEMORY_DEVICE_PUBLIC). But the problem is the same, e.g. how to make
>>>> sure the device memory, say HBM, won't be occupied by normal CPU
>>>> allocations. Things will be more complex if there are multiple GPUs
>>>> connected by NVLink (also cache coherent) in a system, each GPU having its
>>>> own HBM.
>>>>
>>>> How to decide whether to allocate physical memory from local HBM/DDR or
>>>> remote HBM/DDR?
>>>>
>>>> With the NUMA (CDM) approach there are at least the NUMA mempolicy and
>>>> autonuma mechanisms.
>>>
>>> NUMA is not as easy as you think. First, like I said, we want the device
>>> memory to be isolated from most existing mm mechanisms, because the memory
>>> is unreliable and also because the device might need to be able to evict
>>> memory to make contiguous physical memory allocations for graphics.
>>>
>>
>> Right, but we need isolation anyway.
>> For hmm-cdm, the isolation consists of not adding device memory to the LRU
>> list, plus many
>> if (is_device_public_page(page)) ...
>>
>> But how to evict device memory?
>
> What do you mean by evict? A device driver can evict whenever it sees the
> need to do so. A CPU page fault will evict too. Process exit or munmap()
> will free the device memory.
>
> Are you referring to eviction in the sense of memory reclaim under pressure?
>
> So the way it flows for memory pressure is that if the device driver wants
> to make room it can evict stuff to system memory, and if there is not enough

Yes, I mean this.
So every driver has to maintain its own LRU-like list instead of reusing what
is already in the Linux kernel.

> system memory then things get reclaimed as usual before the device driver
> can make progress on device memory reclaim.
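
To make the eviction flow discussed above concrete, here is a minimal
userspace sketch in C. It is a toy simulation, not kernel or HMM code: the
names dev_pool, pool_alloc, pool_touch and pool_evict_lru, the four-page
device size and the counter standing in for system memory are all
hypothetical. It only illustrates a driver keeping its own LRU stamps,
evicting to system memory when the device is full, and waiting on ordinary
reclaim when system memory is full as well.

```c
#include <assert.h>

#define DEV_PAGES 4               /* hypothetical device memory size, in pages */

struct dev_page {
    int owner;                    /* hypothetical buffer id, -1 when free */
    unsigned long last_use;       /* driver-private access stamp */
};

struct dev_pool {
    struct dev_page pages[DEV_PAGES];
    unsigned long clock;          /* incremented on every recorded access */
    int sys_free;                 /* free system-memory pages (simulated) */
    int sys_reclaimed;            /* times regular reclaim had to run first */
};

static void pool_init(struct dev_pool *p, int sys_free)
{
    for (int i = 0; i < DEV_PAGES; i++) {
        p->pages[i].owner = -1;
        p->pages[i].last_use = 0;
    }
    p->clock = 0;
    p->sys_free = sys_free;
    p->sys_reclaimed = 0;
}

/* Record a device access: this is the driver's own LRU bookkeeping. */
static void pool_touch(struct dev_pool *p, int idx)
{
    assert(idx >= 0 && idx < DEV_PAGES);
    p->pages[idx].last_use = ++p->clock;
}

/* Evict the least recently used resident page to system memory.
 * Returns the evicted owner id, or -1 if nothing is resident. */
static int pool_evict_lru(struct dev_pool *p)
{
    int victim = -1;

    for (int i = 0; i < DEV_PAGES; i++) {
        if (p->pages[i].owner == -1)
            continue;
        if (victim < 0 || p->pages[i].last_use < p->pages[victim].last_use)
            victim = i;
    }
    if (victim < 0)
        return -1;

    /* No room in system memory: regular reclaim has to make room first. */
    if (p->sys_free == 0) {
        p->sys_reclaimed++;
        p->sys_free++;
    }
    p->sys_free--;                /* evicted data now occupies system memory */

    int owner = p->pages[victim].owner;
    p->pages[victim].owner = -1;
    return owner;
}

/* Allocate one device page for 'owner', evicting when the device is full.
 * Returns the index of the device page that was used. */
static int pool_alloc(struct dev_pool *p, int owner)
{
    for (;;) {
        for (int i = 0; i < DEV_PAGES; i++) {
            if (p->pages[i].owner == -1) {
                p->pages[i].owner = owner;
                pool_touch(p, i);
                return i;
            }
        }
        pool_evict_lru(p);        /* pool is full, so this frees one page */
    }
}
```

A caller would run pool_init() once, call pool_touch() on every device
access, and call pool_alloc() for new buffers. The linear scan stands in for
whatever list a real driver would keep, and the point of the sketch is that
this bookkeeping lives in the driver rather than in the core mm LRU.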
>
>
>>> Second, device drivers are not integrated closely enough with mm and the
>>> scheduler kernel code to efficiently plug device access notifications
>>> into struct page (i.e. to update struct page so that the NUMA worker
>>> thread can migrate memory based on accurate information).
>>>
>>> Third, it can be hard to decide who wins between CPU and device access
>>> when it comes to updating things like the last CPU id.
>>>
>>> Fourth, there is no such thing as a device id, i.e. an equivalent of the
>>> CPU id. If we were to add one, the CPU id field in the flags of struct
>>> page would not be big enough, so this can have repercussions on struct
>>> page size. This is not an easy sell.
>>>
>>> There are other issues I can't think of right now. I think for now it
>>
>> My opinion is that most of the issues are the same whether we use CDM or
>> HMM-CDM. I just care about a more complete solution, whether CDM, HMM-CDM
>> or something else. HMM and HMM-CDM depend on a device driver, but I haven't
>> seen a public, full driver that demonstrates the whole solution works.
>
> I am working with the NVIDIA closed-source driver team to make sure that it
> works well for them. I am also working on the nouveau open-source driver for
> the same NVIDIA hardware, though it will be of less use, as what is missing
> there is a solid open-source userspace to leverage this. Nonetheless,
> open-source drivers are in the works.
>

Looking forward to seeing these drivers become public.

> The way I see it, we start with HMM-CDM, which isolates most of the changes
> in the hmm code. Once we get more experience with real workloads, and not
> just device driver test suites, we can start revisiting NUMA and deeper
> integration with the Linux kernel. I would rather grow organically toward
> that than try to design something that would make major changes all over
> the kernel without knowing for sure that we are going in the right
> direction. I hope that this makes sense to others too.
>

Makes sense.

Thanks,
Bob Liu