From: Feng Tang
To: linux-mm@kvack.org, Andrew Morton, Michal Hocko, David Rientjes,
 Dave Hansen, Ben Widawsky
Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
 Andrea Arcangeli, Mel Gorman, Mike Kravetz, Randy Dunlap,
 Vlastimil Babka, Andi Kleen, Dan Williams, ying.huang@intel.com,
 Feng Tang
Subject: [PATCH v6 5/6] mm/mempolicy: Advertise new MPOL_PREFERRED_MANY
Date: Mon, 12 Jul 2021 16:09:33 +0800
Message-Id: <1626077374-81682-6-git-send-email-feng.tang@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1626077374-81682-1-git-send-email-feng.tang@intel.com>
References:
 <1626077374-81682-1-git-send-email-feng.tang@intel.com>
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Ben Widawsky

Adds a new mode to the existing mempolicy modes, MPOL_PREFERRED_MANY.

MPOL_PREFERRED_MANY will be adequately documented in the internal
admin-guide with this patch. Eventually, the man pages for mbind(2),
get_mempolicy(2), set_mempolicy(2) and numactl(8) will also have text
about this mode. Those shall contain the canonical reference.

NUMA systems continue to become more prevalent. New technologies like
PMEM make finer-grained control over memory access patterns increasingly
desirable. MPOL_PREFERRED_MANY allows userspace to specify a set of nodes
that will be tried first when performing allocations. If those
allocations fail, all remaining nodes will be tried. It's a
straightforward API which solves many of the presumptive needs of system
administrators wanting to optimize workloads on such machines. The mode
will work either per VMA, or per thread.

Link: https://lore.kernel.org/r/20200630212517.308045-13-ben.widawsky@intel.com
Signed-off-by: Ben Widawsky
Signed-off-by: Feng Tang
---
 Documentation/admin-guide/mm/numa_memory_policy.rst | 16 ++++++++++++----
 mm/mempolicy.c                                      |  7 +------
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
index 067a90a1499c..cd653561e531 100644
--- a/Documentation/admin-guide/mm/numa_memory_policy.rst
+++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
@@ -245,6 +245,14 @@ MPOL_INTERLEAVED
 	address range or file. During system boot up, the temporary
 	interleaved system default policy works in this mode.
 
+MPOL_PREFERRED_MANY
+	This mode specifies that the allocation should be attempted from the
+	nodemask specified in the policy. If that allocation fails, the kernel
+	will search other nodes, in order of increasing distance from the first
+	set bit in the nodemask based on information provided by the platform
+	firmware. It is similar to MPOL_PREFERRED with the main exception that
+	it is an error to have an empty nodemask.
+
 NUMA memory policy supports the following optional mode flags:
 
 MPOL_F_STATIC_NODES
@@ -253,10 +261,10 @@ MPOL_F_STATIC_NODES
 	nodes changes after the memory policy has been defined.
 
 	Without this flag, any time a mempolicy is rebound because of a
-	change in the set of allowed nodes, the node (Preferred) or
-	nodemask (Bind, Interleave) is remapped to the new set of
-	allowed nodes. This may result in nodes being used that were
-	previously undesired.
+	change in the set of allowed nodes, the preferred nodemask (Preferred
+	Many), preferred node (Preferred) or nodemask (Bind, Interleave) is
+	remapped to the new set of allowed nodes. This may result in nodes
+	being used that were previously undesired.
 
 	With this flag, if the user-specified nodes overlap with the
 	nodes allowed by the task's cpuset, then the memory policy is
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 93f8789758a7..d90247d6a71b 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1463,12 +1463,7 @@ static inline int sanitize_mpol_flags(int *mode, unsigned short *flags)
 	*flags = *mode & MPOL_MODE_FLAGS;
 	*mode &= ~MPOL_MODE_FLAGS;
 
-	/*
-	 * The check should be 'mode >= MPOL_MAX', but as 'prefer_many'
-	 * is not fully implemented, don't permit it to be used for now,
-	 * and the logic will be restored in following patch
-	 */
-	if ((unsigned int)(*mode) >= MPOL_PREFERRED_MANY)
+	if ((unsigned int)(*mode) >= MPOL_MAX)
 		return -EINVAL;
 	if ((*flags & MPOL_F_STATIC_NODES) && (*flags & MPOL_F_RELATIVE_NODES))
 		return -EINVAL;
-- 
2.7.4
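
For readers who want to see what the mm/mempolicy.c hunk changes in
practice, here is a minimal userspace sketch of the mode check as it
reads after this patch. It is not the kernel function. All sk_/SK_ names
are hypothetical, the enum order and flag bit values are my reading of
include/uapi/linux/mempolicy.h and should be treated as assumptions, and
the flag mask is reduced to the two flags visible in the hunk. Anything
the real function does beyond the lines shown in the hunk is left out.

```c
#include <assert.h>
#include <errno.h>

/*
 * Assumed to mirror the mode enum in include/uapi/linux/mempolicy.h once
 * MPOL_PREFERRED_MANY exists; the SK_ names are illustrative only.
 */
enum sk_mpol_mode {
	SK_MPOL_DEFAULT,
	SK_MPOL_PREFERRED,
	SK_MPOL_BIND,
	SK_MPOL_INTERLEAVE,
	SK_MPOL_LOCAL,
	SK_MPOL_PREFERRED_MANY,
	SK_MPOL_MAX,	/* one past the last valid mode */
};

/* Assumed flag bit values; only the two flags seen in the hunk are modeled. */
#define SK_MPOL_F_STATIC_NODES		(1 << 15)
#define SK_MPOL_F_RELATIVE_NODES	(1 << 14)
#define SK_MPOL_MODE_FLAGS \
	(SK_MPOL_F_STATIC_NODES | SK_MPOL_F_RELATIVE_NODES)

/*
 * Split the optional mode flags out of *mode, then validate what is left.
 * Returns 0 on success or -EINVAL, following the logic shown in the hunk.
 */
static int sk_sanitize_mpol_flags(int *mode, unsigned short *flags)
{
	assert(mode && flags);

	*flags = *mode & SK_MPOL_MODE_FLAGS;
	*mode &= ~SK_MPOL_MODE_FLAGS;

	/* After the patch: every mode below SK_MPOL_MAX is accepted. */
	if ((unsigned int)(*mode) >= SK_MPOL_MAX)
		return -EINVAL;
	/* The two node-remapping flags are mutually exclusive. */
	if ((*flags & SK_MPOL_F_STATIC_NODES) &&
	    (*flags & SK_MPOL_F_RELATIVE_NODES))
		return -EINVAL;
	return 0;
}
```

With the pre-patch bound (>= MPOL_PREFERRED_MANY) a request for the new
mode would have been rejected with -EINVAL. With the bound at MPOL_MAX it
passes, with or without a single node-remapping flag OR'ed into the mode
argument.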