From: Feng Tang
To: linux-mm@kvack.org, Andrew Morton, Michal Hocko, David Rientjes,
 Dave Hansen, Ben Widawsky
Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
 Andrea Arcangeli, Mel Gorman, Mike Kravetz, Randy Dunlap,
 Vlastimil Babka, Andi Kleen, Dan Williams, ying.huang@intel.com,
 Feng Tang
Subject: [PATCH v7 4/5] mm/mempolicy: Advertise new MPOL_PREFERRED_MANY
Date: Tue, 3 Aug 2021 13:59:21 +0800
Message-Id: <1627970362-61305-5-git-send-email-feng.tang@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1627970362-61305-1-git-send-email-feng.tang@intel.com>
References:
 <1627970362-61305-1-git-send-email-feng.tang@intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Ben Widawsky

Adds a new mode to the existing mempolicy modes, MPOL_PREFERRED_MANY.

MPOL_PREFERRED_MANY will be adequately documented in the internal
admin-guide with this patch. Eventually, the man pages for mbind(2),
get_mempolicy(2), set_mempolicy(2) and numactl(8) will also have text
about this mode. Those shall contain the canonical reference.

NUMA systems continue to become more prevalent. New technologies like
PMEM make finer-grained control over memory access patterns increasingly
desirable. MPOL_PREFERRED_MANY allows userspace to specify a set of
nodes that will be tried first when performing allocations. If those
allocations fail, all remaining nodes will be tried. It's a
straightforward API which solves many of the presumptive needs of system
administrators wanting to optimize workloads on such machines. The mode
will work either per VMA, or per thread.

[Michal Hocko: refine kernel doc for MPOL_PREFERRED_MANY]
Link: https://lore.kernel.org/r/20200630212517.308045-13-ben.widawsky@intel.com
Signed-off-by: Ben Widawsky
Signed-off-by: Feng Tang
Acked-by: Michal Hocko
---
 Documentation/admin-guide/mm/numa_memory_policy.rst | 15 +++++++++++----
 mm/mempolicy.c                                      |  7 +------
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
index 067a90a1499c..64fd0ba0d057 100644
--- a/Documentation/admin-guide/mm/numa_memory_policy.rst
+++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
@@ -245,6 +245,13 @@ MPOL_INTERLEAVED
 	address range or file.  During system boot up, the temporary
 	interleaved system default policy works in this mode.
 
+MPOL_PREFERRED_MANY
+	This mode specifies that the allocation should be preferably
+	satisfied from the nodemask specified in the policy.  If there is
+	memory pressure on all nodes in the nodemask, the allocation
+	can fall back to all existing NUMA nodes.  This is effectively
+	MPOL_PREFERRED allowed for a mask rather than a single node.
+
 NUMA memory policy supports the following optional mode flags:
 
 MPOL_F_STATIC_NODES
@@ -253,10 +260,10 @@ MPOL_F_STATIC_NODES
 	nodes changes after the memory policy has been defined.
 
 	Without this flag, any time a mempolicy is rebound because of a
-	change in the set of allowed nodes, the node (Preferred) or
-	nodemask (Bind, Interleave) is remapped to the new set of
-	allowed nodes.  This may result in nodes being used that were
-	previously undesired.
+	change in the set of allowed nodes, the preferred nodemask (Preferred
+	Many), preferred node (Preferred) or nodemask (Bind, Interleave) is
+	remapped to the new set of allowed nodes.  This may result in nodes
+	being used that were previously undesired.
 
 	With this flag, if the user-specified nodes overlap with the
 	nodes allowed by the task's cpuset, then the memory policy is
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index a00bb1c48a15..e437fe96acd0 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1463,12 +1463,7 @@ static inline int sanitize_mpol_flags(int *mode, unsigned short *flags)
 	*flags = *mode & MPOL_MODE_FLAGS;
 	*mode &= ~MPOL_MODE_FLAGS;
 
-	/*
-	 * The check should be 'mode >= MPOL_MAX', but as 'prefer_many'
-	 * is not fully implemented, don't permit it to be used for now,
-	 * and the logic will be restored in following patch
-	 */
-	if ((unsigned int)(*mode) >= MPOL_PREFERRED_MANY)
+	if ((unsigned int)(*mode) >= MPOL_MAX)
 		return -EINVAL;
 	if ((*flags & MPOL_F_STATIC_NODES) && (*flags & MPOL_F_RELATIVE_NODES))
 		return -EINVAL;
-- 
2.14.1
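
For readers who want to see the effect of the mm/mempolicy.c hunk in isolation, here is a rough userspace sketch of the mode check before and after the change. It is not the kernel code: the enum and flag values are local copies that are assumed to mirror include/uapi/linux/mempolicy.h as of this series, the helper names (sanitize_model, sanitize_before, sanitize_after) are made up for illustration, and only the two checks visible in the hunk are modelled (MPOL_F_NUMA_BALANCING handling is left out).

```c
#include <assert.h>
#include <errno.h>

/*
 * Local copies of the policy modes. ASSUMPTION: this ordering mirrors
 * include/uapi/linux/mempolicy.h at the time of this series.
 */
enum {
	MPOL_DEFAULT,
	MPOL_PREFERRED,
	MPOL_BIND,
	MPOL_INTERLEAVE,
	MPOL_LOCAL,
	MPOL_PREFERRED_MANY,
	MPOL_MAX,	/* always last */
};

/* Make the assumed layout explicit so a mismatch fails at build time. */
static_assert(MPOL_PREFERRED_MANY == 5 && MPOL_MAX == 6,
	      "mode numbering assumed by this sketch");

/* Simplified: the kernel's MPOL_MODE_FLAGS has more bits than these. */
#define MPOL_F_STATIC_NODES	(1 << 15)
#define MPOL_F_RELATIVE_NODES	(1 << 14)
#define MPOL_MODE_FLAGS	(MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES)

/*
 * Hypothetical model of the two checks shown in the hunk.
 * 'limit' is the first mode value that gets rejected.
 */
int sanitize_model(int *mode, unsigned short *flags, unsigned int limit)
{
	/* Split the optional mode flags off the mode value. */
	*flags = *mode & MPOL_MODE_FLAGS;
	*mode &= ~MPOL_MODE_FLAGS;

	if ((unsigned int)(*mode) >= limit)
		return -EINVAL;
	if ((*flags & MPOL_F_STATIC_NODES) && (*flags & MPOL_F_RELATIVE_NODES))
		return -EINVAL;
	return 0;
}

/* Check as it stood before this patch: MPOL_PREFERRED_MANY is refused. */
int sanitize_before(int mode)
{
	unsigned short flags;

	return sanitize_model(&mode, &flags, MPOL_PREFERRED_MANY);
}

/* Check after this patch: every mode below MPOL_MAX is accepted. */
int sanitize_after(int mode)
{
	unsigned short flags;

	return sanitize_model(&mode, &flags, MPOL_MAX);
}
```

In this model, sanitize_before(MPOL_PREFERRED_MANY) returns -EINVAL while sanitize_after(MPOL_PREFERRED_MANY) returns 0, which is the userspace-visible difference this patch is meant to make. Combining MPOL_F_STATIC_NODES with MPOL_F_RELATIVE_NODES stays rejected in both variants.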