From: Donet Tom
Date: Mon, 25 Mar 2024 10:30:32 +0530
Subject: Re: [PATCH v3 2/2] mm/numa_balancing: Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy
To: "Huang, Ying"
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Aneesh Kumar, Michal Hocko, Dave Hansen, Mel Gorman, Feng Tang, Andrea Arcangeli, Peter Zijlstra, Ingo Molnar, Rik van Riel, Johannes Weiner, Matthew Wilcox, Vlastimil Babka, Dan Williams, Hugh Dickins, Kefeng Wang, Suren Baghdasaryan
Message-ID: <29936a4b-95c3-4592-8eae-7d4741e4a51f@linux.ibm.com>
In-Reply-To: <875xxbqb51.fsf@yhuang6-desk2.ccr.corp.intel.com>
References: <87h6gyr7jf.fsf@yhuang6-desk2.ccr.corp.intel.com> <875xxbqb51.fsf@yhuang6-desk2.ccr.corp.intel.com>

On 3/25/24 08:18, Huang, Ying wrote:
> Donet Tom writes:
>
>> On 3/22/24 14:02, Huang, Ying wrote:
>>> Donet Tom writes:
>>>
>>>> commit bda420b98505 ("numa balancing: migrate on fault among multiple bound
>>>> nodes") added support for migrate on protnone reference with the MPOL_BIND
>>>> memory policy. This allowed numa fault migration when the executing node
>>>> is part of the policy mask for MPOL_BIND. This patch extends migration
>>>> support to the MPOL_PREFERRED_MANY policy.
>>>>
>>>> Currently, we cannot specify MPOL_PREFERRED_MANY with the mempolicy flag
>>>> MPOL_F_NUMA_BALANCING. This causes issues when we want to use
>>>> NUMA_BALANCING_MEMORY_TIERING. To effectively use the slow memory tier,
>>>> the kernel should not allocate pages from the slower memory tier via
>>>> allocation control zonelist fallback. Instead, we should move cold pages
>>>> from the faster memory node via memory demotion. For a page allocation,
>>>> kswapd is only woken up after we try to allocate pages from all nodes in
>>>> the allocation zone list. This implies that, without using memory
>>>> policies, we will end up allocating hot pages in the slower memory tier.
>>>>
>>>> MPOL_PREFERRED_MANY was added by commit b27abaccf8e8 ("mm/mempolicy: add
>>>> MPOL_PREFERRED_MANY for multiple preferred nodes") to allow better
>>>> allocation control when we have memory tiers in the system. With
>>>> MPOL_PREFERRED_MANY, the user can use a policy node mask consisting only
>>>> of faster memory nodes. When we fail to allocate pages from the faster
>>>> memory node, kswapd would be woken up, allowing demotion of cold pages
>>>> to slower memory nodes.
>>>>
>>>> With the current kernel, such usage of memory policies implies we can't
>>>> do page promotion from a slower memory tier to a faster memory tier
>>>> using numa fault. This patch fixes this issue.
>>>>
>>>> For MPOL_PREFERRED_MANY, if the executing node is in the policy node
>>>> mask, we allow numa migration to the executing nodes. If the executing
>>>> node is not in the policy node mask, we do not allow numa migration.
>>> Can we provide more information about this? I suggest using an
>>> example; for instance, pages may be distributed among multiple sockets
>>> unexpectedly.
>> Thank you for your suggestion. However, this commit message explains all
>> the scenarios.
> Yes. The commit message is correct and covers many cases. What I
> suggested is to describe why we do that. An example cannot cover every
> possibility, but it is easy to understand. For example, something as
> below?
>
> For example, on a 2-socket system, there are N0, N1 and N2 in socket 0,
> and N3 in socket 1. N0, N1 and N3 have fast memory and CPUs, while N2
> has slow memory and no CPUs. For a workload, we may use
> MPOL_PREFERRED_MANY with a nodemask with N0 and N1 set because the
> workload runs on the CPUs of socket 0 most of the time. Then, even if
> the workload runs on the CPUs of N3 occasionally, we will not try to
> migrate the workload pages from N2 to N3 because users may want to avoid
> cross-socket access as much as possible in the long term.

Thank you. I will change the commit message and post V4.

Thanks
Donet Tom

>
>> For example, consider a system with 3 numa nodes (N0, N1 and N6).
>> N0 and N1 are tier 1 DRAM nodes and N6 is a tier 2 PMEM node.
>>
>> Scenario 1: The process is executing on N1, and the executing node is
>> in the policy node mask.
>> Curr Loc Pages - the numa node where the page is present (folio node)
>> ===================================================================
>> Process   Policy     Curr Loc Pages   Observations
>> -------------------------------------------------------------------
>> N1        N0 N1 N6   N0               Pages Migrated from N0 to N1
>> N1        N0 N1 N6   N6               Pages Migrated from N6 to N1
>> N1        N0 N1      N1               Pages Migrated from N1 to N6
> Pages are not Migrating?
>
>> N1        N0 N1      N6               Pages Migrated from N6 to N1
>> -------------------------------------------------------------------
>>
>> Scenario 2: The process is executing on N1, and the executing node is
>> NOT in the policy node mask.
>> Curr Loc Pages - the numa node where the page is present (folio node)
>> ===================================================================
>> Process   Policy     Curr Loc Pages   Observations
>> -------------------------------------------------------------------
>> N1        N0 N6      N0               Pages are not Migrating
>> N1        N0 N6      N6               Pages are not Migrating
>> N1        N0         N0               Pages are not Migrating
>> -------------------------------------------------------------------
>>
>> Scenario 3: The process is executing on N1, and the executing node and
>> the folio node are NOT in the policy node mask.
>> Curr Loc Pages - the numa node where the page is present (folio node)
>> ===================================================================
>> Process   Policy     Curr Loc Pages   Observations
>> -------------------------------------------------------------------
>> N1        N0         N6               Pages are not Migrating
>> N1        N6         N0               Pages are not Migrating
>> -------------------------------------------------------------------
>>
>> We can conclude that even if the pages are distributed among multiple
>> sockets, if the executing node is in the policy node mask, we allow
>> numa migration to the executing nodes. If the executing node is not in
>> the policy node mask, we do not allow numa migration.
>>
> [snip]
>
> --
> Best Regards,
> Huang, Ying
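
To make the migration rule concrete, the decision described in the commit
message and exercised by the scenario tables above can be modeled in a few
lines of standalone C. This is an illustrative sketch only, not the kernel
code from the patch; numa_migrate_ok and the bitmask encoding of node masks
are names made up for this example. It encodes: migrate the folio to the
executing node only when that node is in the policy node mask, and do
nothing when the folio is already local (per the correction to the first
row flagged in the Scenario 1 table).

/* model_numa_migrate.c - standalone model of the MPOL_PREFERRED_MANY
 * NUMA-fault migration rule discussed above; not kernel code.
 * Build and run: cc model_numa_migrate.c && ./a.out
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Node masks encoded as bitmasks: bit n set => node n is in the policy. */
static bool numa_migrate_ok(unsigned long policy_mask, int exec_node,
                            int folio_node)
{
        /* Executing node outside the policy mask: never migrate. */
        if (!(policy_mask & (1UL << exec_node)))
                return false;
        /* Folio already on the executing node: nothing to migrate. */
        return folio_node != exec_node;
}

int main(void)
{
        const unsigned long n0_n1_n6 = (1UL << 0) | (1UL << 1) | (1UL << 6);
        const unsigned long n0_n6    = (1UL << 0) | (1UL << 6);

        /* Scenario 1: executing on N1, N1 in the mask -> migrate. */
        assert(numa_migrate_ok(n0_n1_n6, 1, 0));   /* N0 -> N1 */
        assert(numa_migrate_ok(n0_n1_n6, 1, 6));   /* N6 -> N1 */
        /* Scenario 2: executing on N1, N1 not in the mask -> no migration. */
        assert(!numa_migrate_ok(n0_n6, 1, 0));
        assert(!numa_migrate_ok(n0_n6, 1, 6));
        puts("all scenario checks passed");
        return 0;
}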
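
For completeness, a userspace process would opt in to this behavior roughly
as sketched below. This assumes a kernel that defines MPOL_PREFERRED_MANY
and, with this patch applied, accepts it together with MPOL_F_NUMA_BALANCING;
on current kernels, as the commit message notes, set_mempolicy() rejects the
combination with EINVAL. The fallback #define values match
include/uapi/linux/mempolicy.h, and the nodemask mirrors the fast DRAM nodes
N0 and N1 from the tables above.

/* preferred_many_balancing.c - illustrative sketch, not from the patch.
 * Build: cc preferred_many_balancing.c -lnuma
 */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <numaif.h>  /* set_mempolicy(), from libnuma */

/* Fallbacks in case the installed numaif.h predates these constants;
 * values match include/uapi/linux/mempolicy.h. */
#ifndef MPOL_PREFERRED_MANY
#define MPOL_PREFERRED_MANY 5
#endif
#ifndef MPOL_F_NUMA_BALANCING
#define MPOL_F_NUMA_BALANCING (1 << 13)
#endif

int main(void)
{
        /* Prefer the fast (DRAM) nodes N0 and N1. */
        unsigned long nodemask = (1UL << 0) | (1UL << 1);

        if (set_mempolicy(MPOL_PREFERRED_MANY | MPOL_F_NUMA_BALANCING,
                          &nodemask, 8 * sizeof(nodemask)) != 0) {
                /* Kernels without this patch fail here with EINVAL. */
                fprintf(stderr, "set_mempolicy: %s\n", strerror(errno));
                return 1;
        }
        puts("MPOL_PREFERRED_MANY with MPOL_F_NUMA_BALANCING is in effect");
        return 0;
}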