Date: Fri, 28 Dec 2018 09:41:05 +0100
From: Michal Hocko
To: Fengguang Wu
Cc: Andrew Morton, Linux Memory Management List, kvm@vger.kernel.org,
    LKML, Fan Du, Yao Yuan, Peng Dong, Huang Ying, Liu Jingqi,
    Dong Eddie, Dave Hansen, Zhang Yi, Dan Williams
Subject: Re: [RFC][PATCH v2 00/21] PMEM NUMA node and hotness accounting/migration
Message-ID: <20181228084105.GQ16738@dhcp22.suse.cz>
References: <20181226131446.330864849@intel.com>
 <20181227203158.GO16738@dhcp22.suse.cz>
 <20181228050806.ewpxtwo3fpw7h3lq@wfg-t540p.sh.intel.com>
In-Reply-To: <20181228050806.ewpxtwo3fpw7h3lq@wfg-t540p.sh.intel.com>

On Fri 28-12-18 13:08:06, Wu
Fengguang wrote:
[...]
> Optimization: do hot/cold page tracking and migration
> =====================================================
>
> Since PMEM is slower than DRAM, we need to make sure hot pages go to
> DRAM and cold pages stay in PMEM, to get the best out of PMEM and DRAM.
>
> - DRAM=>PMEM cold page migration
>
>   It can be done in the kernel page reclaim path, near the anonymous
>   page swap-out point. Instead of swapping out, we now have the option
>   to migrate cold pages to PMEM NUMA nodes.

OK, this makes sense to me, except I am not sure this is something that
should be pmem specific. Is there any reason why we shouldn't migrate
pages on memory pressure to other nodes in general? In other words,
rather than paging out, we would migrate over to the next node that is
not under memory pressure. Swapout would be the next level, when memory
is (almost) fully utilized. That wouldn't be pmem specific.

> User space may also do it, however it cannot act on demand when there
> is memory pressure in DRAM nodes.
>
> - PMEM=>DRAM hot page migration
>
>   While LRU can be good enough for identifying cold pages, frequency
>   based accounting can be more suitable for identifying hot pages.
>
>   Our design choice is to create a flexible user space daemon to drive
>   the accounting and migration, with the necessary kernel support
>   provided by this patchset.

We do have NUMA balancing, so why can we not rely on it? This, along
with the above, would allow pmem NUMA nodes (cpuless nodes, in fact)
without any special casing, as a natural part of the MM. It would then
only be a matter of configuration to set the appropriate node distance
to allow a reasonable allocation fallback strategy.

I haven't looked at the implementation yet, but if you are proposing
special-cased zonelists, then this is something CDM (Coherent Device
Memory) was trying to do two years ago, and there was quite some
skepticism about that approach.
-- 
Michal Hocko
SUSE Labs
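[Editor's illustration] The demote-on-pressure / promote-on-frequency
scheme under discussion can be sketched as a toy userspace simulation.
All names, capacities, and thresholds below are invented for
illustration; this is not the patchset's code, only a model of the
policy: LRU-style selection demotes the coldest DRAM page instead of
swapping it out, and a periodic daemon pass promotes PMEM pages whose
access frequency crosses a threshold.

```python
# Toy model of the DRAM<=>PMEM hot/cold policy discussed above.
# Invented names and thresholds; not kernel code.

DRAM_CAPACITY = 4      # pages that fit on the (fast) DRAM node
HOT_THRESHOLD = 3      # accesses per scan interval to count as "hot"

class Page:
    def __init__(self, pid):
        self.pid = pid
        self.accesses = 0  # frequency counter, reset on every scan

dram, pmem = [], []    # page lists standing in for the two NUMA nodes

def touch(page):
    """Record one access (the kernel would sample this via hinting faults)."""
    page.accesses += 1

def allocate(pid):
    """New pages land in DRAM; under pressure, demote the coldest page
    to PMEM instead of swapping it out."""
    page = Page(pid)
    if len(dram) >= DRAM_CAPACITY:
        coldest = min(dram, key=lambda p: p.accesses)
        dram.remove(coldest)
        pmem.append(coldest)       # DRAM=>PMEM cold migration
    dram.append(page)
    return page

def daemon_scan():
    """User-space-daemon pass: promote hot PMEM pages, displacing the
    coldest DRAM pages if needed, then reset all frequency counters."""
    for page in [p for p in pmem if p.accesses >= HOT_THRESHOLD]:
        pmem.remove(page)
        if len(dram) >= DRAM_CAPACITY:
            coldest = min(dram, key=lambda p: p.accesses)
            dram.remove(coldest)
            pmem.append(coldest)
        dram.append(page)          # PMEM=>DRAM hot migration
    for page in dram + pmem:
        page.accesses = 0
```

In the kernel, the `allocate` branch corresponds to demotion in the
reclaim path, and `daemon_scan` to the proposed user-space daemon (or,
per the reply above, to a generalized NUMA-balancing promotion path);
the same model works for any pair of nodes ordered by distance, which is
the point of not making it pmem specific.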