From: Huang Ying <ying.huang@intel.com>
To: Peter Zijlstra
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 Andrew Morton, Ingo Molnar, Mel Gorman, Rik van Riel, Johannes Weiner,
 "Matthew Wilcox (Oracle)", Dave Hansen, Andi Kleen, Michal Hocko,
 David Rientjes
Subject: [RFC -V2] autonuma: Migrate on fault among multiple bound nodes
Date: Tue, 22 Sep 2020 14:54:01 +0800
Message-Id: <20200922065401.376348-1-ying.huang@intel.com>
X-Mailer: git-send-email 2.28.0
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Now, AutoNUMA can only optimize the page placement among the NUMA
nodes if the default memory policy is used, because an explicitly
specified memory policy should take precedence.  But this seems too
strict in some situations.  For example, on a system with 4 NUMA
nodes, if the memory of an application is bound to nodes 0 and 1,
AutoNUMA could still migrate pages between nodes 0 and 1 to reduce
cross-node accesses without breaking the explicit memory binding
policy.

So with this patch, if mbind(.mode=MPOL_BIND, .flags=MPOL_MF_LAZY) is
used to bind the memory of the application to multiple nodes, and, in
the hint page fault handler, both the node of the faulting page and
the accessing node are in the policy nodemask, the kernel will try to
migrate the page to the accessing node to reduce cross-node accesses.

[Peter Zijlstra: provided the simplified implementation method.]

Questions:

The sysctl knob kernel.numa_balancing can enable/disable AutoNUMA
optimization globally.  But for memory areas that are bound to
multiple NUMA nodes, even if AutoNUMA is enabled globally via the
sysctl knob, we still need to enable AutoNUMA again with a special
flag.  Why not just optimize the page placement whenever possible, as
long as AutoNUMA is enabled globally?  The interface would look
simpler that way.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Andrew Morton
Cc: Ingo Molnar
Cc: Mel Gorman
Cc: Rik van Riel
Cc: Johannes Weiner
Cc: "Matthew Wilcox (Oracle)"
Cc: Dave Hansen
Cc: Andi Kleen
Cc: Michal Hocko
Cc: David Rientjes
---
 mm/mempolicy.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index eddbe4e56c73..273969204732 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2494,15 +2494,19 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 		break;
 
 	case MPOL_BIND:
-
 		/*
-		 * allows binding to multiple nodes.
-		 * use current page if in policy nodemask,
-		 * else select nearest allowed node, if any.
-		 * If no allowed nodes, use current [!misplaced].
+		 * Allows binding to multiple nodes. If both current and
+		 * accessing nodes are in policy nodemask, migrate to
+		 * accessing node to optimize page placement. Otherwise,
+		 * use current page if in policy nodemask, else select
+		 * nearest allowed node, if any. If no allowed nodes, use
+		 * current [!misplaced].
 		 */
-		if (node_isset(curnid, pol->v.nodes))
+		if (node_isset(curnid, pol->v.nodes)) {
+			if (node_isset(thisnid, pol->v.nodes))
+				goto moron;
 			goto out;
+		}
 		z = first_zones_zonelist(
 				node_zonelist(numa_node_id(), GFP_HIGHUSER),
 				gfp_zone(GFP_HIGHUSER),
@@ -2516,6 +2520,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 
 	/* Migrate the page towards the node whose CPU is referencing it */
 	if (pol->flags & MPOL_F_MORON) {
+moron:
 		polnid = thisnid;
 
 		if (!should_numa_migrate_memory(current, page, curnid, thiscpu))
-- 
2.28.0
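
P.S. For reference, the intended user-space usage looks roughly like
the sketch below.  This is only an illustrative example, not part of
the patch: it assumes the libnuma <numaif.h> wrapper for mbind()
(build with -lnuma), defines MPOL_MF_LAZY itself in case the
installed headers omit it, and uses arbitrary node numbers and an
arbitrary buffer size.

/*
 * Illustrative sketch: bind a buffer to nodes 0 and 1 and request
 * lazy migrate-on-fault, so that NUMA balancing may move its pages
 * among the bound nodes only.  Error handling is minimal.
 */
#include <numaif.h>		/* mbind(), MPOL_BIND; link with -lnuma */
#include <stdio.h>
#include <stdlib.h>

#ifndef MPOL_MF_LAZY		/* some numaif.h versions omit this */
#define MPOL_MF_LAZY (1 << 3)	/* from include/uapi/linux/mempolicy.h */
#endif

int main(void)
{
	size_t len = 64UL << 20;			/* 64 MB */
	void *buf = aligned_alloc(4096, len);
	unsigned long nodemask = (1UL << 0) | (1UL << 1); /* nodes 0, 1 */

	if (!buf)
		return 1;
	if (mbind(buf, len, MPOL_BIND, &nodemask,
		  sizeof(nodemask) * 8, MPOL_MF_LAZY)) {
		perror("mbind");
		return 1;
	}
	/*
	 * Work on buf from CPUs on either node; with this patch
	 * applied, NUMA hint page faults may migrate each page toward
	 * the node of the CPU that accesses it, within nodes 0 and 1.
	 */
	return 0;
}

With NUMA balancing enabled globally (sysctl kernel.numa_balancing=1),
pages in the range then remain candidates for migrate-on-fault, but
only among the bound nodes {0, 1}.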