Date: Fri, 28 Dec 2018 20:52:24 +0100
From: Michal Hocko
To: Fengguang Wu
Cc: Andrew Morton, Linux Memory Management List, kvm@vger.kernel.org,
    LKML, Fan Du, Yao Yuan, Peng Dong, Huang Ying, Liu Jingqi, Dong Eddie,
    Dave Hansen, Zhang Yi, Dan Williams, Mel Gorman, Andrea Arcangeli
Subject: Re: [RFC][PATCH v2 00/21] PMEM NUMA node and hotness accounting/migration
Message-ID: <20181228195224.GY16738@dhcp22.suse.cz>
References: <20181226131446.330864849@intel.com>
    <20181227203158.GO16738@dhcp22.suse.cz>
    <20181228050806.ewpxtwo3fpw7h3lq@wfg-t540p.sh.intel.com>
    <20181228084105.GQ16738@dhcp22.suse.cz>
    <20181228094208.7lgxhha34zpqu4db@wfg-t540p.sh.intel.com>
    <20181228121515.GS16738@dhcp22.suse.cz>
    <20181228133111.zromvopkfcg3m5oy@wfg-t540p.sh.intel.com>
In-Reply-To: <20181228133111.zromvopkfcg3m5oy@wfg-t540p.sh.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

[Ccing Mel and Andrea]

On Fri 28-12-18 21:31:11, Wu Fengguang wrote:
> > > > I haven't looked at the implementation yet but if you are proposing a
> > > > special cased zone lists then this is something CDM (Coherent Device
> > > > Memory) was trying to do two years ago and there was quite some
> > > > skepticism in the approach.
> > >
> > > It looks we are pretty different than CDM. :)
> > > We creating new NUMA nodes rather than CDM's new ZONE.
> > > The zonelists modification is just to make PMEM nodes more separated.
> >
> > Yes, this is exactly what CDM was after. Have a zone which is not
> > reachable without explicit request AFAIR. So no, I do not think you are
> > too different, you just use a different terminology ;)
>
> Got it. OK.. The fall back zonelists patch does need more thoughts.
>
> In long term POV, Linux should be prepared for multi-level memory.
> Then there will arise the need to "allocate from this level memory".
> So it looks good to have separated zonelists for each level of memory.

Well, I do not have a good answer for you here. We do not have good
experiences with those systems, I am afraid. NUMA has been with us for
more than a decade, yet our APIs are coarse to say the least and have
been broken so many times as well. Starting a new API just based on PMEM
sounds like a ticket to another disaster to me.

I would like to see solid arguments why the current model of numa nodes
with fallback in distance order cannot be used for those new
technologies in the beginning, and then develop something better based
on the experience that we gain on the way.
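
To make "fallback in distance order" concrete, here is a minimal toy
sketch of the idea. It is not the kernel's zonelist code. The distance
table, the DRAM/PMEM roles of the nodes, and the names fallback_order()
and allocate() are all hypothetical and chosen only for illustration.

```python
# Toy model only: the distance table and node roles are made-up assumptions.
# Nodes 0 and 1 stand for DRAM nodes, nodes 2 and 3 for PMEM nodes.
DISTANCE = [
    [10, 21, 17, 28],
    [21, 10, 28, 17],
    [17, 28, 10, 28],
    [28, 17, 28, 10],
]


def fallback_order(node, distance):
    """Return all node ids sorted by distance from `node`, nearest first."""
    # Ties are broken by node id so the result is deterministic.
    return sorted(range(len(distance)), key=lambda n: (distance[node][n], n))


def allocate(node, free_pages, distance):
    """Take one page from the nearest node that still has free pages."""
    for candidate in fallback_order(node, distance):
        if free_pages[candidate] > 0:
            free_pages[candidate] -= 1
            return candidate
    return None  # every node is exhausted


if __name__ == "__main__":
    free = [0, 5, 5, 5]  # node 0 has run out of memory
    print(fallback_order(0, DISTANCE))
    print(allocate(0, free, DISTANCE))
```

In this model a PMEM node is just another node that sits further away in
the distance table, so an allocation that cannot be satisfied locally
spills over to it without any special-cased list.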

I would be especially interested in the possibility of the memory
migration idea during memory pressure and relying on numa balancing to
resort the locality on demand, rather than hiding certain NUMA nodes or
zones from the allocator and exposing them only to the userspace.

> On the other hand, there will also be page allocations that don't care
> about the exact memory level. So it looks reasonable to expect
> different kind of fallback zonelists that can be selected by NUMA policy.
>
> Thanks,
> Fengguang

-- 
Michal Hocko
SUSE Labs