Date: Mon, 4 Sep 2017 22:38:27 -0400
From: Jerome Glisse
To: Bob Liu
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, John Hubbard, Dan Williams, David Nellans, Balbir Singh, majiuyue, "xieyisheng (A)", ross.zwisler@linux.intel.com
Subject: Re: [HMM-v25 19/19] mm/hmm: add new helper to hotplug CDM memory region v3

On Tue, Sep 05, 2017 at 09:13:24AM +0800, Bob Liu wrote:
> On 2017/9/4 23:51, Jerome Glisse wrote:
> > On Mon, Sep 04, 2017 at 11:09:14AM +0800, Bob Liu wrote:
> >> On 2017/8/17 8:05, Jérôme Glisse wrote:
> >>> Unlike unaddressable memory, coherent device memory has a real resource
> >>> associated with it on the system (as CPU can address it). Add a
> >>> new helper to hotplug such memory within the HMM framework.
> >>>
> >>
> >> Got an new question, coherent device( e.g CCIX) memory are likely reported to OS
> >> through ACPI and recognized as NUMA memory node.
> >> Then how can their memory be captured and managed by HMM framework?
> >>
> >
> > Only platform that has such memory today is powerpc and it is not reported
> > as regular memory by the firmware hence why they need this helper.
> >
> > I don't think anyone has defined anything yet for x86 and acpi. As this is
>
> Not yet, but now the ACPI spec has Heterogeneous Memory Attribute
> Table (HMAT) table defined in ACPI 6.2.
> The HMAT can cover CPU-addressable memory types(though not non-cache
> coherent on-device memory).
>
> Ross from Intel already done some work on this, see:
> https://lwn.net/Articles/724562/
>
> arm64 supports APCI also, there is likely more this kind of device when CCIX
> is out (should be very soon if on schedule).

HMAT is not for the same thing. AFAIK HMAT is for deep memory "hierarchies",
i.e. when you have several kinds of memory, each with different characteristics:
  - HBM: very fast (latency) and high bandwidth, non-persistent, somewhat
    small (i.e. a few gigabytes)
  - Persistent memory: slower (both latency and bandwidth), big (terabytes)
  - DDR (good old memory): characteristics fall between HBM and persistent
    memory

So AFAICT this has nothing to do with what HMM is for, i.e. device memory. Note
that device memory can itself have a hierarchy of memory (HBM, GDDR and
maybe even persistent memory).

> > memory on PCIE like interface then i don't expect it to be reported as NUMA
> > memory node but as io range like any regular PCIE resources. Device driver
> > through capabilities flags would then figure out if the link between the
> > device and CPU is CCIX capable if so it can use this helper to hotplug it
> > as device memory.
> >
>
> From my point of view, Cache coherent device memory will popular soon and
> reported through ACPI/UEFI. Extending NUMA policy still sounds more reasonable
> to me.

Cache coherent devices will be reported through the standard mechanisms defined
by the bus standard they are using. To my knowledge all the standards are either
on top of PCIE or are similar to PCIE.

It is true that on many platforms PCIE resources are managed/initialized by the
BIOS (UEFI), but that is platform specific. In some cases we reprogram what the
BIOS picked.

So like I was saying, I don't expect the BIOS/UEFI to report device memory as
regular memory. It will be reported as a regular PCIE resource, and then the
device driver will be able to determine through some flags whether the link
between the CPU(s) and the device is cache coherent or not. At that point the
device driver can register it with the HMM helper.

The whole NUMA discussion has happened several times in the past; I suggest
looking in the mm list archive for it. It was ruled out for several reasons.
Off the top of my head:
  - people hate CPU-less nodes, and device memory is inherently CPU-less
  - device drivers want total control over memory and thus to be isolated from
    mm mechanisms, and doing all those special cases was not welcome
  - existing NUMA migration mechanisms are ill suited for this memory, as
    access by the device to the memory is unknown to core mm and there is no
    easy way to report it or track it (this kind of depends on the platform
    and hardware)

I am likely missing other big points.

Cheers,
Jérôme