Date: Tue, 8 Jan 2019 15:53:02 +0100
From: Michal Hocko
To: Dave Hansen
Cc: Fengguang Wu, Andrew Morton, Linux Memory Management List, kvm@vger.kernel.org, LKML, Fan Du, Yao Yuan, Peng Dong, Huang Ying, Liu Jingqi, Dong Eddie, Zhang Yi, Dan Williams
Subject: Re: [RFC][PATCH v2 00/21] PMEM NUMA node and hotness accounting/migration
Message-ID: <20190108145302.GY31793@dhcp22.suse.cz>
References: <20181226131446.330864849@intel.com> <20181227203158.GO16738@dhcp22.suse.cz> <20181228050806.ewpxtwo3fpw7h3lq@wfg-t540p.sh.intel.com> <20181228084105.GQ16738@dhcp22.suse.cz>
User-Agent: Mutt/1.10.1 (2018-07-13)

On Wed 02-01-19 10:12:04, Dave Hansen wrote:
> On 12/28/18 12:41 AM, Michal Hocko wrote:
> >>
> >> It can be done in the kernel page reclaim path, near the anonymous
> >> page swap-out point. Instead of swapping out, we now have the option
> >> to migrate cold pages to PMEM NUMA nodes.
> >
> > OK, this makes sense to me, except I am not sure this is something
> > that should be pmem specific. Is there any reason why we shouldn't
> > migrate pages on memory pressure to other nodes in general? In other
> > words, rather than paging out we would migrate over to the next node
> > that is not under memory pressure. Swapout would be the next level
> > when the memory is (almost) fully utilized. That wouldn't be pmem
> > specific.
>
> Yeah, we don't want to make this specific to any particular kind of
> memory. For instance, with lots of pressure on expensive, small
> high-bandwidth memory (HBM), we might want to migrate some HBM contents
> to DRAM.
>
> We need to decide on whether we want to cause pressure on the
> destination nodes or not, though. I think you're suggesting that we try
> to look for things under some pressure and totally avoid them. That
> sounds sane, but I also like the idea of this being somewhat ordered.
>
> Think of it this way: we have three nodes, A, B, C. A is fast, B is
> medium, C is slow. If A and B are "full" and we want to reclaim some of
> A, do we:
>
> 1. Migrate A->B, and put pressure on a later B->C migration, or
> 2. Migrate A->C directly
>
> ?
>
> Doing A->C is less resource intensive because there's only one
> migration involved. But doing A->B/B->C probably makes the app behave
> better, because the "A data" is presumably more valuable and is more
> appropriately placed in B rather than being demoted all the way to C.

This is a good question and I do not have a good answer because I lack
experience with such "many levels" systems. If we followed the CPU cache
model, then you are right that the fallback should be gradual. That is
more complex implementation-wise, of course.

Anyway, I believe there is a lot of room for experimentation. As long as
this stays an internal implementation detail without a user API, there
is also no promise about future behavior, so nothing gets carved in
stone on day 1, while our experience is still limited.
-- 
Michal Hocko
SUSE Labs
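
As a rough sketch of the two policies above -- a toy userspace model only,
with made-up tier names, capacities, and helpers, and no relation to the
actual kernel reclaim code -- the gradual (A->B, then B->C) versus direct
(A->C) demotion choice could be modelled like this:

/*
 * Toy model of the two demotion policies.  Everything here is
 * hypothetical: three fixed tiers and page counts instead of real memory.
 */
#include <stdio.h>

#define NTIERS 3

static const char *tier_name[NTIERS] = { "A (fast)", "B (medium)", "C (slow)" };
static int capacity[NTIERS] = { 2, 2, 4 };	/* pages each tier can hold */
static int used[NTIERS]     = { 2, 2, 0 };	/* A and B start out "full" */

/*
 * Policy 1: demote one page from tier t to the next slower tier,
 * recursively pushing the pressure down the chain (B->C first, then
 * A->B).  More migrations, but the "A data" only drops one level.
 */
static int demote_gradual(int t)
{
	if (t + 1 >= NTIERS)
		return -1;	/* nothing slower left: would swap out */
	if (used[t + 1] >= capacity[t + 1] && demote_gradual(t + 1))
		return -1;	/* could not make room below */
	used[t]--;
	used[t + 1]++;
	printf("gradual: %s -> %s\n", tier_name[t], tier_name[t + 1]);
	return 0;
}

/*
 * Policy 2: demote directly to the first slower tier with free space,
 * skipping intermediate tiers (A->C).  Only one migration, but the page
 * ends up further away than it may deserve.
 */
static int demote_direct(int t)
{
	int dst;

	for (dst = t + 1; dst < NTIERS; dst++) {
		if (used[dst] < capacity[dst]) {
			used[t]--;
			used[dst]++;
			printf("direct:  %s -> %s\n", tier_name[t], tier_name[dst]);
			return 0;
		}
	}
	return -1;	/* all slower tiers full: swap out */
}

int main(void)
{
	demote_gradual(0);	/* prints B->C, then A->B */
	demote_direct(0);	/* prints a single A->C   */
	return 0;
}

Compiled and run, the gradual policy performs two migrations (B->C, then
A->B) where the direct policy performs one (A->C), which is exactly the
resource-cost versus placement-quality trade-off described above.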