Date: Wed, 17 Apr 2019 19:57:43 +0200
From: Michal Hocko
To: Dave Hansen
Cc: Yang Shi, mgorman@techsingularity.net, riel@surriel.com,
 hannes@cmpxchg.org, akpm@linux-foundation.org, keith.busch@intel.com,
 dan.j.williams@intel.com, fengguang.wu@intel.com, fan.du@intel.com,
 ying.huang@intel.com, ziy@nvidia.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Subject: Re: [v2 RFC PATCH 0/9] Another Approach to Use PMEM as NUMA Node
Message-ID: <20190417175743.GC9523@dhcp22.suse.cz>
References: <1554955019-29472-1-git-send-email-yang.shi@linux.alibaba.com>
 <20190412084702.GD13373@dhcp22.suse.cz>
 <20190416074714.GD11561@dhcp22.suse.cz>
 <876768ad-a63a-99c3-59de-458403f008c4@linux.alibaba.com>
 <20190417092318.GG655@dhcp22.suse.cz>
 <5c2d37e1-c7f6-5b7b-4f8e-a34e981b841e@intel.com>
In-Reply-To: <5c2d37e1-c7f6-5b7b-4f8e-a34e981b841e@intel.com>

On Wed 17-04-19 10:13:44, Dave Hansen wrote:
> On 4/17/19 2:23 AM, Michal Hocko wrote:
> >> 3. The demotion path can not have cycles
> > yes. This could be achieved by GFP_NOWAIT opportunistic allocation for
> > the migration target. That should prevent from loops or artificial nodes
> > exhausting quite naturally AFAICS. Maybe we will need some tricks to
> > raise the watermark but I am not convinced something like that is really
> > necessary.
>
> I don't think GFP_NOWAIT alone is good enough.
>
> Let's say we have a system full of clean page cache and only two nodes:
> 0 and 1. GFP_NOWAIT will eventually kick off kswapd on both nodes.
> Each kswapd will be migrating pages to the *other* node since each is in
> the other's fallback path.

I was thinking along the lines of node_reclaim-like migration. You are
right that a parallel kswapd might reclaim enough to cause the ping-pong,
and we might need to play some watermark tricks, but as you say below
this remains to be seen and is a playground to explore. All I am saying
is to try the most simplistic approach first, without all the bells and
whistles, to see how this plays out with real workloads, and build on
top of that.

We already have a model - node_reclaim - which turned out to suck a lot
because the reclaim was just too aggressive wrt. refault. Maybe
migration will turn out to be much more feasible. And maybe I am
completely wrong and we need a much more complex solution.

> I think what you're saying is that, eventually, the kswapds will see
> allocation failures and stop migrating, providing hysteresis. This is
> probably true.
>
> But, I'm more concerned about that window where the kswapds are throwing
> pages at each other because they're effectively just wasting resources
> in this window. I guess we should figure out how large this window is
> and how fast (or if) the dampening occurs in practice.

-- 
Michal Hocko
SUSE Labs