From: Donet Tom
To: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Aneesh Kumar, Huang Ying, Michal Hocko, Dave Hansen, Mel Gorman,
    Feng Tang, Andrea Arcangeli, Peter Zijlstra, Ingo Molnar,
    Rik van Riel, Johannes Weiner, Matthew Wilcox, Vlastimil Babka,
    Dan Williams, Hugh Dickins, Kefeng Wang, Suren Baghdasaryan,
    Donet Tom
Subject: [PATCH v3 2/2] mm/numa_balancing: Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy
Date: Thu, 21 Mar 2024 06:29:51 -0500

Commit bda420b98505 ("numa balancing: migrate on fault among multiple
bound nodes") added support for migrate on protnone reference with the
MPOL_BIND memory policy. This allowed NUMA fault migration when the
executing node is part of the policy mask for MPOL_BIND. This patch
extends migration support to the MPOL_PREFERRED_MANY policy.

Currently, we cannot specify MPOL_PREFERRED_MANY with the mempolicy flag
MPOL_F_NUMA_BALANCING. This causes issues when we want to use
NUMA_BALANCING_MEMORY_TIERING.
To effectively use the slow memory tier, the kernel should not allocate
pages from the slower memory tier via zonelist fallback during
allocation. Instead, we should move cold pages off the faster memory
nodes via memory demotion. For a page allocation, kswapd is only woken
up after we try to allocate pages from all nodes in the allocation zone
list. This implies that, without using memory policies, we will end up
allocating hot pages in the slower memory tier.

MPOL_PREFERRED_MANY was added by commit b27abaccf8e8 ("mm/mempolicy: add
MPOL_PREFERRED_MANY for multiple preferred nodes") to allow better
allocation control when we have memory tiers in the system. With
MPOL_PREFERRED_MANY, the user can use a policy node mask consisting only
of faster memory nodes. When we fail to allocate pages from the faster
memory nodes, kswapd is woken up, allowing demotion of cold pages to
slower memory nodes.

With the current kernel, such usage of memory policies implies we can't
do page promotion from a slower memory tier to a faster memory tier
using NUMA fault. This patch fixes this issue.

For MPOL_PREFERRED_MANY, if the executing node is in the policy node
mask, we allow NUMA migration to the executing node. If the executing
node is not in the policy node mask, we do not allow NUMA migration.
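As a rough illustration of the rule in the last paragraph (not the kernel code itself), the decision can be modelled in plain C. The function name `model_numa_migration_allowed`, the `MODEL_MAX_NODES` limit, and the 64-bit node mask are assumptions made for this sketch; the kernel uses `nodemask_t` and `node_isset()` instead. The sketch covers only the case where MPOL_F_NUMA_BALANCING was requested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit for this sketch: one bit per node in a 64-bit mask. */
#define MODEL_MAX_NODES 64

/*
 * Model of the rule for a policy with MPOL_F_NUMA_BALANCING set:
 * a NUMA hinting fault taken on node 'thisnid' may migrate the page
 * to that node only if 'thisnid' is in the policy node mask.
 * Bit n of 'policy_nodes' set means node n is in the mask.
 */
bool model_numa_migration_allowed(uint64_t policy_nodes, int thisnid)
{
	assert(thisnid >= 0 && thisnid < MODEL_MAX_NODES);
	return (policy_nodes >> thisnid) & 1u;
}
```

For example, with a mask covering nodes 0 and 1 (the fast tier), a fault on node 1 may migrate the page there, while a fault on node 2 does not trigger migration.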
Signed-off-by: Aneesh Kumar K.V (IBM)
Signed-off-by: Donet Tom
---
 mm/mempolicy.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index aa48376e2d34..13100a290918 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1504,9 +1504,10 @@ static inline int sanitize_mpol_flags(int *mode, unsigned short *flags)
 	if ((*flags & MPOL_F_STATIC_NODES) && (*flags & MPOL_F_RELATIVE_NODES))
 		return -EINVAL;
 	if (*flags & MPOL_F_NUMA_BALANCING) {
-		if (*mode != MPOL_BIND)
+		if (*mode == MPOL_BIND || *mode == MPOL_PREFERRED_MANY)
+			*flags |= (MPOL_F_MOF | MPOL_F_MORON);
+		else
 			return -EINVAL;
-		*flags |= (MPOL_F_MOF | MPOL_F_MORON);
 	}
 	return 0;
 }
@@ -2770,15 +2771,26 @@ int mpol_misplaced(struct folio *folio, struct vm_fault *vmf,
 		break;
 
 	case MPOL_BIND:
-		/* Optimize placement among multiple nodes via NUMA balancing */
+	case MPOL_PREFERRED_MANY:
+		/*
+		 * Even though MPOL_PREFERRED_MANY can allocate pages outside
+		 * policy nodemask we don't allow numa migration to nodes
+		 * outside policy nodemask for now. This is done so that if we
+		 * want demotion to slow memory to happen, before allocating
+		 * from some DRAM node say 'x', we will end up using a
+		 * MPOL_PREFERRED_MANY mask excluding node 'x'. In such scenario
+		 * we should not promote to node 'x' from slow memory node.
+		 */
 		if (pol->flags & MPOL_F_MORON) {
+			/*
+			 * Optimize placement among multiple nodes
+			 * via NUMA balancing
+			 */
 			if (node_isset(thisnid, pol->nodes))
 				break;
 			goto out;
 		}
-		fallthrough;
 
-	case MPOL_PREFERRED_MANY:
 		/*
 		 * use current page if in policy nodemask,
 		 * else select nearest allowed node, if any.
-- 
2.39.3
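For readers who want to experiment with the validation change in the sanitize_mpol_flags() hunk outside the kernel, here is a minimal userspace model. The `MODEL_`-prefixed names and both functions are hypothetical. The numeric values are meant to mirror the kernel's uapi mempolicy header, so treat them as assumptions and check them against your tree. The model deliberately omits the internal MPOL_F_MOF and MPOL_F_MORON flags that the real function sets on success.

```c
#include <assert.h>
#include <errno.h>

/* Assumed to match include/uapi/linux/mempolicy.h; verify in your tree. */
#define MODEL_MPOL_BIND              2
#define MODEL_MPOL_INTERLEAVE        3
#define MODEL_MPOL_PREFERRED_MANY    5
#define MODEL_MPOL_F_NUMA_BALANCING  (1 << 13)

/*
 * Userspace model of the post-patch check: the NUMA balancing flag is
 * accepted only together with MPOL_BIND or MPOL_PREFERRED_MANY.
 * Returns 0 if the combination is accepted, -EINVAL otherwise.
 */
int model_check_numa_balancing(int mode, unsigned short flags)
{
	if (!(flags & MODEL_MPOL_F_NUMA_BALANCING))
		return 0;	/* flag not requested: nothing to validate here */
	if (mode == MODEL_MPOL_BIND || mode == MODEL_MPOL_PREFERRED_MANY)
		return 0;
	return -EINVAL;
}

/* Usage example: the mode and flag combinations discussed in this patch. */
void model_usage_example(void)
{
	unsigned short nb = MODEL_MPOL_F_NUMA_BALANCING;

	assert(model_check_numa_balancing(MODEL_MPOL_BIND, nb) == 0);
	assert(model_check_numa_balancing(MODEL_MPOL_PREFERRED_MANY, nb) == 0);
	assert(model_check_numa_balancing(MODEL_MPOL_INTERLEAVE, nb) == -EINVAL);
}
```

Before this patch, the MPOL_PREFERRED_MANY case in the model would have returned -EINVAL, which is the limitation the commit message describes.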