Message-Id: <20181226133351.521151384@intel.com>
User-Agent: quilt/0.65
Date: Wed, 26 Dec 2018 21:14:54 +0800
From: Fengguang Wu
To: Andrew Morton
Cc: Linux Memory Management List, Fan Du, Fengguang Wu
Cc: kvm@vger.kernel.org
Cc: LKML
Cc: Yao Yuan
Cc: Peng Dong
Cc: Huang Ying
Cc: Liu Jingqi
Cc: Dong Eddie
Cc: Dave Hansen
Cc: Zhang Yi
Cc: Dan Williams
Subject: [RFC][PATCH v2 08/21] mm: introduce and export pgdat peer_node
References: <20181226131446.330864849@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Disposition: inline; filename=0019-mm-Introduce-and-export-peer_node-for-pgdat.patch

From: Fan Du

Each CPU socket can have one DRAM node and one PMEM node; we call them
"peer nodes". Migration between DRAM and PMEM will by default happen
between peer nodes.

This is a temporary solution: with multiple memory layers, a node can
have both promotion and demotion targets instead of a single peer node.
User space may also be able to infer promotion/demotion targets based
on future HMAT info.

Signed-off-by: Fan Du
Signed-off-by: Fengguang Wu
---
 drivers/base/node.c    |   11 +++++++++++
 include/linux/mmzone.h |   12 ++++++++++++
 mm/page_alloc.c        |   29 +++++++++++++++++++++++++++++
 3 files changed, 52 insertions(+)

--- linux.orig/drivers/base/node.c	2018-12-23 19:39:51.647261099 +0800
+++ linux/drivers/base/node.c	2018-12-23 19:39:51.643261112 +0800
@@ -242,6 +242,16 @@ static ssize_t type_show(struct device *
 }
 static DEVICE_ATTR(type, S_IRUGO, type_show, NULL);
 
+static ssize_t peer_node_show(struct device *dev,
+			      struct device_attribute *attr, char *buf)
+{
+	int nid = dev->id;
+	struct pglist_data *pgdat = NODE_DATA(nid);
+
+	return sprintf(buf, "%d\n", pgdat->peer_node);
+}
+static DEVICE_ATTR(peer_node, S_IRUGO, peer_node_show, NULL);
+
 static struct attribute *node_dev_attrs[] = {
 	&dev_attr_cpumap.attr,
 	&dev_attr_cpulist.attr,
@@ -250,6 +260,7 @@ static struct attribute *node_dev_attrs[
 	&dev_attr_distance.attr,
 	&dev_attr_vmstat.attr,
 	&dev_attr_type.attr,
+	&dev_attr_peer_node.attr,
 	NULL
 };
 ATTRIBUTE_GROUPS(node_dev);
--- linux.orig/include/linux/mmzone.h	2018-12-23 19:39:51.647261099 +0800
+++ linux/include/linux/mmzone.h	2018-12-23 19:39:51.643261112 +0800
@@ -713,6 +713,18 @@ typedef struct pglist_data {
 	/* Per-node vmstats */
 	struct per_cpu_nodestat __percpu *per_cpu_nodestats;
 	atomic_long_t		vm_stat[NR_VM_NODE_STAT_ITEMS];
+
+	/*
+	 * Points to the nearest node in terms of latency
+	 * E.g. peer of node 0 is node 2 per SLIT
+	 * node distances:
+	 * node   0   1   2   3
+	 *   0:  10  21  17  28
+	 *   1:  21  10  28  17
+	 *   2:  17  28  10  28
+	 *   3:  28  17  28  10
+	 */
+	int peer_node;
 } pg_data_t;
 
 #define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
--- linux.orig/mm/page_alloc.c	2018-12-23 19:39:51.647261099 +0800
+++ linux/mm/page_alloc.c	2018-12-23 19:39:51.643261112 +0800
@@ -6926,6 +6926,34 @@ static void check_for_memory(pg_data_t *
 	}
 }
 
+/*
+ * Return the nearest peer node in terms of *locality*
+ * E.g. peer of node 0 is node 2 per SLIT
+ * node distances:
+ * node   0   1   2   3
+ *   0:  10  21  17  28
+ *   1:  21  10  28  17
+ *   2:  17  28  10  28
+ *   3:  28  17  28  10
+ */
+static int find_best_peer_node(int nid)
+{
+	int n, val;
+	int min_val = INT_MAX;
+	int peer = NUMA_NO_NODE;
+
+	for_each_online_node(n) {
+		if (n == nid)
+			continue;
+		val = node_distance(nid, n);
+		if (val < min_val) {
+			min_val = val;
+			peer = n;
+		}
+	}
+	return peer;
+}
+
 /**
  * free_area_init_nodes - Initialise all pg_data_t and zone data
  * @max_zone_pfn: an array of max PFNs for each zone
@@ -7012,6 +7040,7 @@ void __init free_area_init_nodes(unsigne
 		if (pgdat->node_present_pages)
			node_set_state(nid, N_MEMORY);
 		check_for_memory(pgdat, nid);
+		pgdat->peer_node = find_best_peer_node(nid);
 	}
 }
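For illustration only, the selection that find_best_peer_node() performs can be
modeled in a few lines of standalone user-space Python (this is not kernel code;
the matrix is the example SLIT from the patch comment, and which nodes are DRAM
vs. PMEM is an assumption taken from the changelog):

```python
# Model of find_best_peer_node(): for each node, pick the other node
# with the smallest SLIT distance. Matrix copied from the patch comment.
SLIT = [
    [10, 21, 17, 28],  # node 0
    [21, 10, 28, 17],  # node 1
    [17, 28, 10, 28],  # node 2
    [28, 17, 28, 10],  # node 3
]

def find_best_peer_node(nid, distances=SLIT):
    """Nearest node other than nid, mirroring the kernel loop."""
    peer, min_val = None, float("inf")  # None plays the role of NUMA_NO_NODE
    for n, val in enumerate(distances[nid]):
        if n == nid:
            continue
        if val < min_val:
            min_val, peer = val, n
    return peer

peers = {nid: find_best_peer_node(nid) for nid in range(len(SLIT))}
print(peers)  # {0: 2, 1: 3, 2: 0, 3: 1} -- nodes pair up 0<->2 and 1<->3
```

On a kernel with this patch applied, the chosen peer is then readable from the
new sysfs attribute, e.g. `cat /sys/devices/system/node/node0/peer_node` would
print 2 on the example topology above.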