From: "Huang, Ying"
To: Donet Tom
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Aneesh Kumar, Michal Hocko, Dave Hansen, Mel Gorman, Feng Tang,
    Andrea Arcangeli, Peter Zijlstra, Ingo Molnar, Rik van Riel,
    Johannes Weiner, Matthew Wilcox, Vlastimil Babka, Dan Williams,
    Hugh Dickins, Kefeng Wang, Suren Baghdasaryan
Subject: Re: [PATCH v2 2/2] mm/numa_balancing:Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy
In-Reply-To: <369d6a58758396335fd1176d97bbca4e7730d75a.1709909210.git.donettom@linux.ibm.com> (Donet Tom's message of "Fri, 8 Mar 2024 09:15:38 -0600")
References: <369d6a58758396335fd1176d97bbca4e7730d75a.1709909210.git.donettom@linux.ibm.com>
Date: Mon, 11 Mar 2024 09:37:36 +0800
Message-ID: <874jdd5z1b.fsf@yhuang6-desk2.ccr.corp.intel.com>

Donet Tom writes:

> commit bda420b98505 ("numa balancing: migrate on fault among multiple bound
> nodes") added support for migrate on protnone reference with MPOL_BIND
> memory policy. This allowed numa fault migration when the executing node
> is part of the policy mask for MPOL_BIND. This patch extends migration
> support to MPOL_PREFERRED_MANY policy.
>
> Currently, we cannot specify MPOL_PREFERRED_MANY with the mempolicy flag
> MPOL_F_NUMA_BALANCING. This causes issues when we want to use
> NUMA_BALANCING_MEMORY_TIERING. To effectively use the slow memory tier,
> the kernel should not allocate pages from the slower memory tier via
> allocation control zonelist fallback. Instead, we should move cold pages
> from the faster memory node via memory demotion. For a page allocation,
> kswapd is only woken up after we try to allocate pages from all nodes in
> the allocation zone list. This implies that, without using memory
> policies, we will end up allocating hot pages in the slower memory tier.
>
> MPOL_PREFERRED_MANY was added by commit b27abaccf8e8 ("mm/mempolicy: add
> MPOL_PREFERRED_MANY for multiple preferred nodes") to allow better
> allocation control when we have memory tiers in the system. With
> MPOL_PREFERRED_MANY, the user can use a policy node mask consisting only
> of faster memory nodes. When we fail to allocate pages from the faster
> memory node, kswapd would be woken up, allowing demotion of cold pages
> to slower memory nodes.
>
> With the current kernel, such usage of memory policies implies we can't
> do page promotion from a slower memory tier to a faster memory tier
> using numa fault. This patch fixes this issue.
>
> For MPOL_PREFERRED_MANY, if the executing node is in the policy node
> mask, we allow numa migration to the executing nodes. If the executing
> node is not in the policy node mask, we do not allow numa migration.
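As a usage sketch (not part of the patch, and untested): after this
change, user space should be able to combine MPOL_PREFERRED_MANY with
MPOL_F_NUMA_BALANCING as below.  The constants are mirrored from
include/uapi/linux/mempolicy.h in case the installed <numaif.h>
predates MPOL_PREFERRED_MANY, and the node numbers are made up for
illustration.

  /*
   * Minimal sketch, assuming nodes 0 and 1 are the fast (DRAM) tier.
   * Without this patch, set_mempolicy() returns EINVAL for this flag
   * combination, because MPOL_F_NUMA_BALANCING was accepted only with
   * MPOL_BIND; with the patch applied the call should succeed.
   */
  #include <errno.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  #ifndef MPOL_PREFERRED_MANY
  #define MPOL_PREFERRED_MANY    5          /* include/uapi/linux/mempolicy.h */
  #endif
  #ifndef MPOL_F_NUMA_BALANCING
  #define MPOL_F_NUMA_BALANCING  (1 << 13)  /* optimize with NUMA balancing */
  #endif

  int main(void)
  {
          unsigned long nodemask = (1UL << 0) | (1UL << 1); /* fast nodes */

          if (syscall(SYS_set_mempolicy,
                      MPOL_PREFERRED_MANY | MPOL_F_NUMA_BALANCING,
                      &nodemask, 8 * sizeof(nodemask))) {
                  fprintf(stderr, "set_mempolicy: %s\n", strerror(errno));
                  return 1;
          }
          /*
           * Allocate and touch memory here: hot pages that land on slow
           * nodes become candidates for NUMA-fault promotion into the
           * policy nodes, while cold pages can still be demoted.
           */
          return 0;
  }

On a tiered system this is intended to pair with the
NUMA_BALANCING_MEMORY_TIERING mode of the kernel.numa_balancing
sysctl, so that hot pages are promoted into the preferred fast nodes
while cold pages are demoted out of them.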
> Signed-off-by: Aneesh Kumar K.V (IBM)
> Signed-off-by: Donet Tom
> ---
>  mm/mempolicy.c | 22 +++++++++++++++++-----
>  1 file changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index e635d7ed501b..ccd9c6c5fcf5 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1458,9 +1458,10 @@ static inline int sanitize_mpol_flags(int *mode, unsigned short *flags)
>          if ((*flags & MPOL_F_STATIC_NODES) && (*flags & MPOL_F_RELATIVE_NODES))
>                  return -EINVAL;
>          if (*flags & MPOL_F_NUMA_BALANCING) {
> -                if (*mode != MPOL_BIND)
> +                if (*mode == MPOL_BIND || *mode == MPOL_PREFERRED_MANY)
> +                        *flags |= (MPOL_F_MOF | MPOL_F_MORON);
> +                else
>                          return -EINVAL;
> -                *flags |= (MPOL_F_MOF | MPOL_F_MORON);
>          }
>          return 0;
>  }
> @@ -2515,15 +2516,26 @@ int mpol_misplaced(struct folio *folio, struct vm_fault *vmf,
>                  break;
>
>          case MPOL_BIND:
> -                /* Optimize placement among multiple nodes via NUMA balancing */
> +        case MPOL_PREFERRED_MANY:
> +                /*
> +                 * Even though MPOL_PREFERRED_MANY can allocate pages outside
> +                 * policy nodemask we don't allow numa migration to nodes
> +                 * outside policy nodemask for now. This is done so that if we
> +                 * want demotion to slow memory to happen, before allocating
> +                 * from some DRAM node say 'x', we will end up using a
> +                 * MPOL_PREFERRED_MANY mask excluding node 'x'. In such scenario
> +                 * we should not promote to node 'x' from slow memory node.
> +                 */

This is a little hard to digest for me.  And, I don't think that we
need to put this policy choice in code comments.  It's better to put
it in the patch description, where we can give more background, for
example, to avoid cross-socket traffic, etc.

Otherwise, the patchset looks good to me.  Thanks!

>                  if (pol->flags & MPOL_F_MORON) {
> +                        /*
> +                         * Optimize placement among multiple nodes
> +                         * via NUMA balancing
> +                         */
>                          if (node_isset(thisnid, pol->nodes))
>                                  break;
>                          goto out;
>                  }
> -                fallthrough;
>
> -        case MPOL_PREFERRED_MANY:
>                  /*
>                   * use current page if in policy nodemask,
>                   * else select nearest allowed node, if any.

-- 
Best Regards,
Huang, Ying