From: Aneesh Kumar K.V
To: "Huang, Ying", Donet Tom
Cc: Andrew Morton, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Dave Hansen, Mel Gorman,
	Ben Widawsky, Feng Tang, Michal Hocko, Andrea Arcangeli,
	Peter Zijlstra, Ingo Molnar, Rik van Riel, Johannes Weiner,
	Matthew Wilcox, Mike Kravetz, Vlastimil Babka, Dan Williams,
	Hugh Dickins, Kefeng Wang, Suren Baghdasaryan
Subject: Re: [PATCH 3/3] mm/numa_balancing:Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy
In-Reply-To: <877cizppsa.fsf@yhuang6-desk2.ccr.corp.intel.com>
References: <9c3f7b743477560d1c5b12b8c111a584a2cc92ee.1708097962.git.donettom@linux.ibm.com>
	<8d7737208bd24e754dc7a538a3f7f02de84f1f72.1708097962.git.donettom@linux.ibm.com>
	<877cizppsa.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Tue, 20 Feb 2024 13:23:59 +0530
Message-ID: <87sf1nzi3s.fsf@kernel.org>

"Huang, Ying" writes:

> Donet Tom writes:
>
>> commit bda420b98505 ("numa balancing: migrate on fault among multiple bound
>> nodes") added support for migrate on protnone reference with the MPOL_BIND
>> memory policy. This allowed numa fault migration when the executing node
>> is part of the policy mask for MPOL_BIND. This patch extends migration
>> support to the MPOL_PREFERRED_MANY policy.
>>
>> Currently, we cannot specify MPOL_PREFERRED_MANY with the mempolicy flag
>> MPOL_F_NUMA_BALANCING. This causes issues when we want to use
>> NUMA_BALANCING_MEMORY_TIERING. To use the slow memory tier effectively,
>> the kernel should not allocate pages from the slower memory tier via
>> allocation-control zonelist fallback. Instead, we should move cold pages
>> from the faster memory node via memory demotion. For a page allocation,
>> kswapd is only woken up after we try to allocate pages from all nodes in
>> the allocation zonelist. This implies that, without using memory
>> policies, we will end up allocating hot pages in the slower memory tier.
>>
>> MPOL_PREFERRED_MANY was added by commit b27abaccf8e8 ("mm/mempolicy: add
>> MPOL_PREFERRED_MANY for multiple preferred nodes") to allow better
>> allocation control when we have memory tiers in the system. With
>> MPOL_PREFERRED_MANY, the user can use a policy node mask consisting only
>> of faster memory nodes. When we fail to allocate pages from the faster
>> memory nodes, kswapd is woken up, allowing demotion of cold pages to
>> slower memory nodes.
>>
>> With the current kernel, such usage of memory policies implies we can't
>> do page promotion from a slower memory tier to a faster memory tier
>> using numa faults. This patch fixes this issue.
>>
>> For MPOL_PREFERRED_MANY, if the executing node is in the policy node
>> mask, we allow numa migration to the executing node. If the executing
>> node is not in the policy node mask but the folio is already allocated
>> based on policy preference (the folio node is in the policy node mask),
>> we don't allow numa migration. If both the executing node and the folio
>> node are outside the policy node mask, we allow numa migration to the
>> executing node.
>>
>> Signed-off-by: Aneesh Kumar K.V (IBM)
>> Signed-off-by: Donet Tom
>> ---
>>  mm/mempolicy.c | 28 ++++++++++++++++++++++++++--
>>  1 file changed, 26 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>> index 73d698e21dae..8c4c92b10371 100644
>> --- a/mm/mempolicy.c
>> +++ b/mm/mempolicy.c
>> @@ -1458,9 +1458,10 @@ static inline int sanitize_mpol_flags(int *mode, unsigned short *flags)
>>  	if ((*flags & MPOL_F_STATIC_NODES) && (*flags & MPOL_F_RELATIVE_NODES))
>>  		return -EINVAL;
>>  	if (*flags & MPOL_F_NUMA_BALANCING) {
>> -		if (*mode != MPOL_BIND)
>> +		if (*mode == MPOL_BIND || *mode == MPOL_PREFERRED_MANY)
>> +			*flags |= (MPOL_F_MOF | MPOL_F_MORON);
>> +		else
>>  			return -EINVAL;
>> -		*flags |= (MPOL_F_MOF | MPOL_F_MORON);
>>  	}
>>  	return 0;
>>  }
>> @@ -2463,6 +2464,23 @@ static void sp_free(struct sp_node *n)
>>  	kmem_cache_free(sn_cache, n);
>>  }
>>
>> +static inline bool mpol_preferred_should_numa_migrate(int exec_node, int folio_node,
>> +						      struct mempolicy *pol)
>> +{
>> +	/* If the executing node is in the policy node mask, migrate. */
>> +	if (node_isset(exec_node, pol->nodes))
>> +		return true;
>> +
>> +	/* If the folio node is in the policy node mask, don't migrate. */
>> +	if (node_isset(folio_node, pol->nodes))
>> +		return false;
>> +
>> +	/*
>> +	 * Both the folio node and the executing node are outside the policy
>> +	 * nodemask; migrate as in normal numa fault migration.
>> +	 */
>> +	return true;
>
> Why? This may cause some unexpected result. For example, pages may be
> distributed among multiple sockets unexpectedly. So, I prefer the more
> conservative policy, that is, only migrate if this node is in
> pol->nodes.
>

This will only have an impact if the user specifies MPOL_F_NUMA_BALANCING,
which means the user is explicitly requesting that frequently accessed
memory pages be migrated.

The MPOL_PREFERRED_MANY memory policy is able to allocate pages from nodes
outside policy->nodes. For the specific use case I am interested in, it
should be okay to restrict migration to policy->nodes. However, I wonder
whether that is too restrictive given the definition of
MPOL_PREFERRED_MANY. A rough sketch of the userspace side of this
interface follows the signature.

-aneesh
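
A minimal userspace sketch, assuming the uapi values from
include/uapi/linux/mempolicy.h; the fallback #defines and the two-node
fast-tier mask below are illustrative, not part of the patch:

/*
 * Sketch: request numa fault promotion with MPOL_PREFERRED_MANY.
 * Build with: cc sketch.c -lnuma
 *
 * Without this patch, sanitize_mpol_flags() rejects the combination
 * MPOL_PREFERRED_MANY | MPOL_F_NUMA_BALANCING with -EINVAL.
 */
#include <numaif.h>		/* set_mempolicy() and MPOL_* constants */
#include <stdio.h>

/* Fallbacks mirroring include/uapi/linux/mempolicy.h for older headers. */
#ifndef MPOL_PREFERRED_MANY
#define MPOL_PREFERRED_MANY	5
#endif
#ifndef MPOL_F_NUMA_BALANCING
#define MPOL_F_NUMA_BALANCING	(1 << 13)
#endif

int main(void)
{
	/* Illustrative policy mask: fast-tier nodes 0 and 1. */
	unsigned long nodemask = (1UL << 0) | (1UL << 1);

	if (set_mempolicy(MPOL_PREFERRED_MANY | MPOL_F_NUMA_BALANCING,
			  &nodemask, 8 * sizeof(nodemask)))
		perror("set_mempolicy");	/* EINVAL on older kernels */
	return 0;
}

Allocations then prefer nodes 0-1 but may fall back to other nodes, and
with MPOL_F_NUMA_BALANCING accepted, numa hint faults can promote hot
pages back toward the preferred nodes per the rules in the hunk above.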