From: "Huang, Ying"
To: Donet Tom
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Aneesh Kumar, Michal Hocko, Dave Hansen, Mel Gorman, Feng Tang,
	Andrea Arcangeli, Peter Zijlstra, Ingo Molnar, Rik van Riel,
	Johannes Weiner, Matthew Wilcox, Vlastimil Babka, Dan Williams,
	Hugh Dickins, Kefeng Wang, Suren Baghdasaryan
Subject: Re: [PATCH v3 2/2] mm/numa_balancing:Allow migrate on protnone
	reference with MPOL_PREFERRED_MANY policy
In-Reply-To: (Donet Tom's message of "Thu, 21 Mar 2024 06:29:51 -0500")
Date: Fri, 22 Mar 2024 16:32:20 +0800
Message-ID: <87h6gyr7jf.fsf@yhuang6-desk2.ccr.corp.intel.com>

Donet Tom writes:

> commit bda420b98505 ("numa balancing: migrate on fault among multiple
> bound nodes") added support for migrate on protnone reference with
> MPOL_BIND memory policy. This allowed numa fault migration when the
> executing node is part of the policy mask for MPOL_BIND. This patch
> extends migration support to MPOL_PREFERRED_MANY policy.
>
> Currently, we cannot specify MPOL_PREFERRED_MANY with the mempolicy
> flag MPOL_F_NUMA_BALANCING. This causes issues when we want to use
> NUMA_BALANCING_MEMORY_TIERING. To effectively use the slow memory
> tier, the kernel should not allocate pages from the slower memory
> tier via allocation control zonelist fallback. Instead, we should
> move cold pages from the faster memory node via memory demotion. For
> a page allocation, kswapd is only woken up after we try to allocate
> pages from all nodes in the allocation zone list. This implies that,
> without using memory policies, we will end up allocating hot pages in
> the slower memory tier.
>
> MPOL_PREFERRED_MANY was added by commit b27abaccf8e8 ("mm/mempolicy:
> add MPOL_PREFERRED_MANY for multiple preferred nodes") to allow
> better allocation control when we have memory tiers in the system.
> With MPOL_PREFERRED_MANY, the user can use a policy node mask
> consisting only of faster memory nodes. When we fail to allocate
> pages from the faster memory node, kswapd would be woken up, allowing
> demotion of cold pages to slower memory nodes.
>
> With the current kernel, such usage of memory policies implies we
> can't do page promotion from a slower memory tier to a faster memory
> tier using numa fault. This patch fixes this issue.
>
> For MPOL_PREFERRED_MANY, if the executing node is in the policy node
> mask, we allow numa migration to the executing nodes. If the
> executing node is not in the policy node mask, we do not allow numa
> migration.
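
FWIW, the userspace combination this enables would look roughly like
the sketch below. This is my illustration rather than anything from
the patch; it assumes libnuma's <numaif.h> (build with -lnuma), and
the fallback defines are only needed when the installed headers
predate these values:

  /*
   * Minimal sketch: prefer fast-tier nodes 0 and 1 and opt in to
   * NUMA-fault migration.
   */
  #include <stdio.h>
  #include <numaif.h>

  #ifndef MPOL_PREFERRED_MANY
  #define MPOL_PREFERRED_MANY 5
  #endif
  #ifndef MPOL_F_NUMA_BALANCING
  #define MPOL_F_NUMA_BALANCING (1 << 13)
  #endif

  int main(void)
  {
          /* Policy nodemask: fast-tier nodes 0 and 1 only. */
          unsigned long nodemask = (1UL << 0) | (1UL << 1);

          /* Without this patch the kernel rejects the combination
           * with -EINVAL, because MPOL_F_NUMA_BALANCING was only
           * accepted together with MPOL_BIND. */
          if (set_mempolicy(MPOL_PREFERRED_MANY | MPOL_F_NUMA_BALANCING,
                            &nodemask, sizeof(nodemask) * 8) < 0) {
                  perror("set_mempolicy");
                  return 1;
          }

          /* ... allocate and touch memory; hot pages left on a
           * slow-tier node can now be promoted into nodes 0/1 via
           * NUMA hint faults, while cold pages are still demoted. */
          return 0;
  }

That is, the policy mask names only the fast-tier nodes, and the new
flag combination lets hint faults promote hot pages into them.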
Can we provide more information about this?  I suggest using an
example; for instance, pages may be distributed among multiple sockets
unexpectedly.

--
Best Regards,
Huang, Ying

> Signed-off-by: Aneesh Kumar K.V (IBM)
> Signed-off-by: Donet Tom
> ---
>  mm/mempolicy.c | 22 +++++++++++++++++-----
>  1 file changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index aa48376e2d34..13100a290918 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1504,9 +1504,10 @@ static inline int sanitize_mpol_flags(int *mode, unsigned short *flags)
>  	if ((*flags & MPOL_F_STATIC_NODES) && (*flags & MPOL_F_RELATIVE_NODES))
>  		return -EINVAL;
>  	if (*flags & MPOL_F_NUMA_BALANCING) {
> -		if (*mode != MPOL_BIND)
> +		if (*mode == MPOL_BIND || *mode == MPOL_PREFERRED_MANY)
> +			*flags |= (MPOL_F_MOF | MPOL_F_MORON);
> +		else
>  			return -EINVAL;
> -		*flags |= (MPOL_F_MOF | MPOL_F_MORON);
>  	}
>  	return 0;
>  }
> @@ -2770,15 +2771,26 @@ int mpol_misplaced(struct folio *folio, struct vm_fault *vmf,
>  		break;
>
>  	case MPOL_BIND:
> -		/* Optimize placement among multiple nodes via NUMA balancing */
> +	case MPOL_PREFERRED_MANY:
> +		/*
> +		 * Even though MPOL_PREFERRED_MANY can allocate pages outside
> +		 * policy nodemask we don't allow numa migration to nodes
> +		 * outside policy nodemask for now. This is done so that if we
> +		 * want demotion to slow memory to happen, before allocating
> +		 * from some DRAM node say 'x', we will end up using a
> +		 * MPOL_PREFERRED_MANY mask excluding node 'x'. In such scenario
> +		 * we should not promote to node 'x' from slow memory node.
> +		 */
>  		if (pol->flags & MPOL_F_MORON) {
> +			/*
> +			 * Optimize placement among multiple nodes
> +			 * via NUMA balancing
> +			 */
>  			if (node_isset(thisnid, pol->nodes))
>  				break;
>  			goto out;
>  		}
> -		fallthrough;
>
> -	case MPOL_PREFERRED_MANY:
>  		/*
>  		 * use current page if in policy nodemask,
>  		 * else select nearest allowed node, if any.
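
Restated outside the kernel for clarity, the MPOL_F_MORON decision in
this hunk reduces to a single nodemask test. The helper below is a toy
model of that decision, not the kernel code; a plain bitmask stands in
for the kernel's nodemask_t and node_isset():

  #include <stdbool.h>

  /* Toy model: may a page faulted from 'executing_node' be migrated
   * (promoted) there under MPOL_BIND / MPOL_PREFERRED_MANY with
   * MPOL_F_MORON set? */
  static bool migrate_on_protnone_fault(int executing_node,
                                        unsigned long policy_nodes)
  {
          /* Only nodes inside the policy mask are migration targets,
           * so a slow-tier node deliberately left out of an
           * MPOL_PREFERRED_MANY mask never receives promotions. */
          return policy_nodes & (1UL << executing_node);
  }

That keeps the tiering flow one-way: kswapd demotes cold pages out of
the preferred nodes, and hint faults promote hot pages back into them,
but never into an excluded node.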