Date: Wed, 17 Apr 2019 11:17:48 +0200
From: Michal Hocko
To: Yang Shi
Cc: mgorman@techsingularity.net, riel@surriel.com, hannes@cmpxchg.org, akpm@linux-foundation.org, dave.hansen@intel.com, keith.busch@intel.com, dan.j.williams@intel.com, fengguang.wu@intel.com, fan.du@intel.com, ying.huang@intel.com, ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [v2 RFC PATCH 0/9] Another Approach to Use PMEM as NUMA Node
Message-ID: <20190417091748.GF655@dhcp22.suse.cz>
References: <1554955019-29472-1-git-send-email-yang.shi@linux.alibaba.com> <20190412084702.GD13373@dhcp22.suse.cz> <20190416074714.GD11561@dhcp22.suse.cz> <876768ad-a63a-99c3-59de-458403f008c4@linux.alibaba.com>
In-Reply-To:
 <876768ad-a63a-99c3-59de-458403f008c4@linux.alibaba.com>

On Tue 16-04-19 12:19:21, Yang Shi wrote:
>
>
> On 4/16/19 12:47 AM, Michal Hocko wrote:
[...]
> > Why cannot we simply demote in the proximity order? Why do you make
> > cpuless nodes so special? If other close nodes are vacant then just use
> > them.
>
> We could. But, this raises another question, would we prefer to just demote
> to the next fallback node (just try once), if it is contended, then just
> swap (i.e. DRAM0 -> PMEM0 -> Swap); or would we prefer to try all the nodes
> in the fallback order to find the first less contended one (i.e. DRAM0 ->
> PMEM0 -> DRAM1 -> PMEM1 -> Swap)?

I would go with the latter, because it is more natural: that is the
natural allocation path, so I do not see why it shouldn't be the natural
demotion path as well.

>
> |------|     |------| |------|        |------|
> |PMEM0|---|DRAM0| --- CPU0 --- CPU1 --- |DRAM1| --- |PMEM1|
> |------|     |------| |------|       |------|
>
> The first one sounds simpler, and the current implementation does so and
> this needs find out the closest PMEM node by recognizing cpuless node.

Unless you specify an explicit nodemask, the allocator will do the
allocation fallback for the migration target for you.

> If we prefer go with the second option, it is definitely unnecessary to
> specialize any node.
>
> > > > I would expect that the very first attempt wouldn't do much more than
> > > > migrate to-be-reclaimed pages (without an explicit binding) with a
> > > Do you mean respect mempolicy or cpuset when doing demotion? I was wondering
> > > this, but I didn't do so in the current implementation since it may need
> > > walk the rmap to retrieve the mempolicy in the reclaim path. Is there any
> > > easier way to do so?
> > You definitely have to follow policy.
> > You cannot demote to a node which
> > is outside of the cpuset/mempolicy because you are breaking contract
> > expected by the userspace. That implies doing a rmap walk.
>
> OK, however, this may prevent from demoting unmapped page cache since there
> is no way to find those pages' policy.

I do not really expect that hard NUMA binding for the page cache is a
use case we have to lose sleep over for now.

> And, we have to think about what we should do when the demotion target has
> conflict with the mempolicy.

Simply skip it.

> The easiest way is to just skip those conflict
> pages in demotion. Or we may have to do the demotion one page by one page
> instead of migrating a list of pages.

Yes, one page at a time sounds reasonable to me. This is how we do
reclaim anyway.
-- 
Michal Hocko
SUSE Labs
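
For illustration only, the two demotion orders discussed in this thread can be modeled in user-space C. This is not kernel code and not the patch set's implementation: `pick_demotion_target`, the node numbering, the `contended` flags and the `allowed_mask` bitmask are all hypothetical stand-ins for the allocator's fallback order and for the cpuset/mempolicy nodemask.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_NODES 4
#define NO_NODE  (-1)  /* hypothetical marker: no target found, caller swaps */

/* Hypothetical node ids matching the diagram in the thread. */
enum { DRAM0, PMEM0, DRAM1, PMEM1 };

/*
 * Walk a fallback list in proximity order and return the first node that
 * is both permitted by the policy bitmask and not contended.
 *
 * fallback     - candidate nodes, nearest first (assumption: precomputed)
 * n            - number of entries to try; 1 models the "try once" variant
 * contended    - per-node flag, indexed by node id (assumption)
 * allowed_mask - bit i set means node i is inside the cpuset/mempolicy
 */
static int pick_demotion_target(const int *fallback, int n,
                                const bool *contended,
                                unsigned int allowed_mask)
{
    for (int i = 0; i < n; i++) {
        int nid = fallback[i];

        assert(nid >= 0 && nid < NR_NODES);

        /* Never demote outside the policy: skip the node. */
        if (!(allowed_mask & (1u << nid)))
            continue;

        /* Node is under pressure: try the next one in the order. */
        if (contended[nid])
            continue;

        return nid;
    }
    return NO_NODE;
}
```

Passing only the first entry of the fallback list models the "try once" variant (DRAM0 -> PMEM0 -> Swap); passing the whole list models trying every fallback node before swapping (DRAM0 -> PMEM0 -> DRAM1 -> PMEM1 -> Swap). Calling it once per page mirrors the one-page-at-a-time demotion suggested above, with pages whose policy excludes every candidate simply skipped.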