From: Yang Shi
To: mhocko@suse.com, mgorman@techsingularity.net, riel@surriel.com, hannes@cmpxchg.org, akpm@linux-foundation.org, dave.hansen@intel.com, keith.busch@intel.com, dan.j.williams@intel.com, fengguang.wu@intel.com, fan.du@intel.com, ying.huang@intel.com, ziy@nvidia.com
Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [v2 PATCH 2/9] mm: page_alloc: make find_next_best_node find return cpuless node
Date: Thu, 11 Apr 2019 11:56:52 +0800
Message-Id: <1554955019-29472-3-git-send-email-yang.shi@linux.alibaba.com>

Demotion needs to find the closest cpuless node to which DRAM pages can
be demoted. Add a "cpuless" parameter to find_next_best_node() so that
it can skip DRAM nodes, i.e. nodes with CPUs (N_CPU), on demand.

When cpuless is true, the local node is not returned first and every
node in N_CPU is skipped. build_zonelists() passes false, so the
existing zonelist ordering is unchanged. The function is also made
non-static and declared in mm/internal.h so that it can be called from
outside page_alloc.c.

Signed-off-by: Yang Shi
---
 mm/internal.h   | 11 +++++++++++
 mm/page_alloc.c | 14 ++++++++++----
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 9eeaf2b..a514808 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -292,6 +292,17 @@ static inline bool is_data_mapping(vm_flags_t flags)
 	return (flags & (VM_WRITE | VM_SHARED | VM_STACK)) == VM_WRITE;
 }
 
+#ifdef CONFIG_NUMA
+extern int find_next_best_node(int node, nodemask_t *used_node_mask,
+				bool cpuless);
+#else
+static inline int find_next_best_node(int node, nodemask_t *used_node_mask,
+				bool cpuless)
+{
+	return 0;
+}
+#endif
+
 /* mm/util.c */
 void __vma_link_list(struct mm_struct *mm, struct vm_area_struct *vma,
 		struct vm_area_struct *prev, struct rb_node *rb_parent);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7cd88a4..bda17c2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5362,6 +5362,7 @@ int numa_zonelist_order_handler(struct ctl_table *table, int write,
  * find_next_best_node - find the next node that should appear in a given node's fallback list
  * @node: node whose fallback list we're appending
  * @used_node_mask: nodemask_t of already used nodes
+ * @cpuless: find next best cpuless node
  *
  * We use a number of factors to determine which is the next node that should
  * appear on a given node's fallback list. The node should not have appeared
@@ -5373,7 +5374,8 @@ int numa_zonelist_order_handler(struct ctl_table *table, int write,
  *
  * Return: node id of the found node or %NUMA_NO_NODE if no node is found.
  */
-static int find_next_best_node(int node, nodemask_t *used_node_mask)
+int find_next_best_node(int node, nodemask_t *used_node_mask,
+			bool cpuless)
 {
 	int n, val;
 	int min_val = INT_MAX;
@@ -5381,13 +5383,18 @@ static int find_next_best_node(int node, nodemask_t *used_node_mask)
 	const struct cpumask *tmp = cpumask_of_node(0);
 
 	/* Use the local node if we haven't already */
-	if (!node_isset(node, *used_node_mask)) {
+	if (!node_isset(node, *used_node_mask) &&
+			!cpuless) {
 		node_set(node, *used_node_mask);
 		return node;
 	}
 
 	for_each_node_state(n, N_MEMORY) {
 
+		/* Find next best cpuless node */
+		if (cpuless && (node_state(n, N_CPU)))
+			continue;
+
 		/* Don't want a node to appear more than once */
 		if (node_isset(n, *used_node_mask))
 			continue;
@@ -5419,7 +5426,6 @@ static int find_next_best_node(int node, nodemask_t *used_node_mask)
 	return best_node;
 }
 
-
 /*
  * Build zonelists ordered by node and zones within node.
  * This results in maximum locality--normal zone overflows into local
@@ -5481,7 +5487,7 @@ static void build_zonelists(pg_data_t *pgdat)
 	nodes_clear(used_mask);
 
 	memset(node_order, 0, sizeof(node_order));
-	while ((node = find_next_best_node(local_node, &used_mask)) >= 0) {
+	while ((node = find_next_best_node(local_node, &used_mask, false)) >= 0) {
 		/*
 		 * We don't want to pressure a particular node.
 		 * So adding penalty to the first node in same
-- 
1.8.3.1
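
The selection rule this patch adds can be tried outside the kernel. What follows is a minimal userspace sketch, not kernel code: sketch_next_best_node(), the bool used[] array standing in for nodemask_t, the has_cpu[] array standing in for the N_CPU node state, and the four-node distance table (nodes 0 and 1 with CPUs, nodes 2 and 3 cpuless) are all hypothetical. It ranks candidates by raw distance only; the real find_next_best_node() also folds tie-breaking penalties and per-node load into the score, which are omitted here.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MAX_NODES 4
#define NO_NODE (-1)	/* hypothetical stand-in for NUMA_NO_NODE */

/*
 * Hypothetical topology: nodes 0 and 1 have CPUs (DRAM), nodes 2 and 3
 * are cpuless (e.g. persistent memory). The distances are made up.
 */
static const int distance[MAX_NODES][MAX_NODES] = {
	{ 10, 21, 17, 28 },
	{ 21, 10, 28, 17 },
	{ 17, 28, 10, 28 },
	{ 28, 17, 28, 10 },
};
static const bool has_cpu[MAX_NODES] = { true, true, false, false };

/*
 * Return the closest node not yet in used[] and mark it used, or NO_NODE
 * if none qualifies. With cpuless == true, the local node is not returned
 * first and nodes with CPUs are skipped, mirroring the checks the patch adds.
 */
int sketch_next_best_node(int node, bool used[MAX_NODES], bool cpuless)
{
	int best = NO_NODE;
	int min_val = INT_MAX;

	assert(node >= 0 && node < MAX_NODES);

	/* Use the local node if we haven't already (not in cpuless mode) */
	if (!used[node] && !cpuless) {
		used[node] = true;
		return node;
	}

	for (int n = 0; n < MAX_NODES; n++) {
		/* Only cpuless nodes qualify when cpuless is requested */
		if (cpuless && has_cpu[n])
			continue;
		/* Don't return a node more than once */
		if (used[n])
			continue;
		if (distance[node][n] < min_val) {
			min_val = distance[node][n];
			best = n;
		}
	}

	if (best != NO_NODE)
		used[best] = true;
	return best;
}
```

With this table, repeated calls from node 0 with cpuless set to true return node 2, then node 3, then NO_NODE. With cpuless set to false they return 0, 2, 1, 3 and then NO_NODE, which is the local-node-first ordering that build_zonelists() keeps by passing false.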