Subject: Re: [HMM-v25 19/19] mm/hmm: add new helper to hotplug CDM memory region v3
To: Jerome Glisse
References: <20170817000548.32038-1-jglisse@redhat.com>
 <20170904155123.GA3161@redhat.com>
 <7026dfda-9fd0-2661-5efc-66063dfdf6bc@huawei.com>
 <20170905023826.GA4836@redhat.com>
 <20170905185414.GB24073@linux.intel.com>
 <0bc5047d-d27c-65b6-acab-921263e715c8@huawei.com>
 <20170906021216.GA23436@redhat.com>
 <4f4a2196-228d-5d54-5386-72c3ffb1481b@huawei.com>
 <1726639990.10465990.1504805251676.JavaMail.zimbra@redhat.com>
CC: Ross Zwisler, John Hubbard, Dan Williams, David Nellans, Balbir Singh,
 majiuyue, "xieyisheng (A)"
From: Bob Liu
Message-ID: <863afc77-ed84-fed5-ebb8-d88e636816a3@huawei.com>
Date: Fri, 8 Sep 2017 09:59:15 +0800
In-Reply-To: <1726639990.10465990.1504805251676.JavaMail.zimbra@redhat.com>

On 2017/9/8 1:27, Jerome Glisse wrote:
>> On 2017/9/6 10:12, Jerome Glisse wrote:
>>> On Wed, Sep 06, 2017 at 09:25:36AM +0800, Bob Liu wrote:
>>>> On 2017/9/6 2:54, Ross Zwisler wrote:
>>>>> On Mon, Sep 04, 2017 at 10:38:27PM -0400, Jerome Glisse wrote:
>>>>>> On Tue, Sep 05, 2017 at 09:13:24AM +0800, Bob Liu wrote:
>>>>>>> On 2017/9/4 23:51, Jerome Glisse wrote:
>>>>>>>> On Mon, Sep 04, 2017 at 11:09:14AM +0800, Bob Liu wrote:
>>>>>>>>> On 2017/8/17 8:05, Jérôme Glisse wrote:
> 
> [...]
> 
>>> For HMM, each process gives hints (somewhat similar to mbind) for ranges of
>>> virtual addresses to the device kernel driver (through some API like OpenCL
>>> or CUDA for a GPU, for instance). All of this is device-driver-specific ioctl.
>>>
>>> The kernel device driver has an overall view of all the processes that use
>>> the device and of the memory advice each of them gave. From that information
>>> the kernel device driver decides what part of each process address space to
>>> migrate to device memory.
>>
>> Oh, I mean CDM-HMM. I'm fine with HMM.
> 
> They are one and the same really. In both cases HMM is just a set of helpers
> for device drivers.
> 
>>> This is obviously dynamic and likely to change over the process lifetime.
>>>
>>> My understanding is that HMAT wants a similar API to allow a process to give
>>> direction on where each range of virtual addresses should be allocated. It is
>>> expected that most
>>
>> Right, but it is not clear who should manage the physical memory allocation
>> and set up the page table mapping. A new driver or the kernel?
> 
> Physical device memory is managed by the kernel device driver as it is today
> and as it will be tomorrow. HMM does not change that, nor does it require
> any change to that.
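
To check my own understanding of the flow this patch enables: a driver that
owns such coherent device memory would register it with HMM roughly as in the
untested sketch below. The hmm_devmem_ops callback prototypes are taken from
my reading of this series, the "cdm_" names are made up, and error handling is
trimmed, so please treat it as a sketch rather than as what the patch mandates.

#include <linux/err.h>
#include <linux/hmm.h>
#include <linux/ioport.h>

static void cdm_devmem_free(struct hmm_devmem *devmem, struct page *page)
{
	/* Hand the backing device page back to the driver's own allocator. */
}

static int cdm_devmem_fault(struct hmm_devmem *devmem,
			    struct vm_area_struct *vma,
			    unsigned long addr,
			    const struct page *page,
			    unsigned int flags,
			    pmd_t *pmdp)
{
	/*
	 * CDM memory is CPU-addressable and cache-coherent, so unlike the
	 * device-private case the CPU should never fault on these pages.
	 */
	return -EINVAL;
}

static const struct hmm_devmem_ops cdm_devmem_ops = {
	.free	= cdm_devmem_free,
	.fault	= cdm_devmem_fault,
};

/*
 * Register an already-existing, coherent physical range as ZONE_DEVICE
 * pages managed by the driver, instead of onlining it as a NUMA node.
 */
static int cdm_hotplug(struct device *dev, struct resource *res)
{
	struct hmm_devmem *devmem;

	devmem = hmm_devmem_add_resource(&cdm_devmem_ops, dev, res);
	if (IS_ERR(devmem))
		return PTR_ERR(devmem);

	return 0;
}
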
Can someone from Intel give more information about the plan for managing
HMAT-reported memory?

> Migrating process memory to or from the device is done by the kernel through
> regular page migration. HMM provides new helpers for the device driver to
> initiate such migration. There is no mechanism like auto NUMA migration,
> for the reasons I explained previously.
> 
> The kernel device driver uses all the knowledge it has to decide what to
> migrate to device memory. Nothing new here either: it is what happens today
> for specially allocated device objects, and it will just happen all the same
> for regular mmap memory (private anonymous or mmap of a regular file on a
> filesystem).
> 
> 
> So every low-level thing happens in the kernel. Userspace only provides
> directives to the kernel device driver through a device-specific API. But the
> kernel device driver can ignore or override those directives.
> 
> 
>>> software can easily infer what part of its address space will need more
>>> bandwidth or smaller latency versus what part is sparsely accessed ...
>>>
>>> For HMAT I think the first targets are HBM and persistent memory, and device
>>> memory might be added later if that makes sense.
>>>
>>
>> Okay, so there are two potential ways to handle CPU-addressable, cache-coherent
>> device memory (or cpu-less NUMA memory, or "target domain" memory in the ACPI
>> spec)?
>> 1. CDM-HMM
>> 2. HMAT
> 
> No, these are two orthogonal things; they do not conflict with each other,
> quite the contrary. HMM (the CDM part is no different) is a set of helpers,
> see it as a toolbox, for device drivers.
> 
> HMAT is a way for firmware to report memory resources with more information
> than just ranges of physical addresses. HMAT is specific to platforms that
> rely on ACPI. HMAT does not provide any helpers to manage this memory.
> 
> So a device driver can get information about device memory from HMAT and then
> use HMM to help in managing and using this memory.
> 

Yes, but as Balbir mentioned, that requires:
1. Don't online the memory as a NUMA node
2. Use the HMM-CDM APIs to map the memory to ZONE_DEVICE via the driver

And I'm not sure whether Intel is going to use this HMM-CDM based method for
their "target domain" memory, or whether they prefer the NUMA approach.
Ross? Dan?

-- 
Thanks,
Bob Liu
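
P.S. For my own notes on the migration side Jerome describes above: as far as
I can tell the "new helper for device driver to initiate such migration" is
migrate_vma(), so a driver-initiated migration would look roughly like the
untested sketch below. The "gpu_" names and the chunk size are placeholders,
and the callback bodies are elided.

#include <linux/migrate.h>
#include <linux/mm.h>

#define GPU_MIGRATE_NPAGES	16UL

static void gpu_alloc_and_copy(struct vm_area_struct *vma,
			       const unsigned long *src, unsigned long *dst,
			       unsigned long start, unsigned long end,
			       void *private)
{
	/* Allocate device pages and copy the data over; driver specific. */
}

static void gpu_finalize_and_map(struct vm_area_struct *vma,
				 const unsigned long *src,
				 const unsigned long *dst,
				 unsigned long start, unsigned long end,
				 void *private)
{
	/*
	 * Point the device page tables at the new pages once the core
	 * migration code has switched the CPU side over.
	 */
}

static const struct migrate_vma_ops gpu_migrate_ops = {
	.alloc_and_copy	  = gpu_alloc_and_copy,
	.finalize_and_map = gpu_finalize_and_map,
};

/*
 * Migrate one small chunk of a process VMA toward device memory. The actual
 * page migration stays in the core mm, as Jerome says; the driver only
 * supplies the two callbacks above.
 */
static int gpu_migrate_range(struct vm_area_struct *vma, unsigned long start)
{
	unsigned long src[GPU_MIGRATE_NPAGES] = {};
	unsigned long dst[GPU_MIGRATE_NPAGES] = {};
	unsigned long end = start + GPU_MIGRATE_NPAGES * PAGE_SIZE;

	return migrate_vma(&gpu_migrate_ops, vma, start, end, src, dst, NULL);
}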