Message-ID: <0d8849467053cf48f5d7356de5f1e3e600a85b39.camel@linux.intel.com>
Subject: Re: [PATCH v5 9/9] mm/demotion: Update node_is_toptier to work with memory tiers
From: Tim Chen
To: "Aneesh Kumar K.V", Ying Huang, linux-mm@kvack.org, akpm@linux-foundation.org
Cc: Wei Xu, Greg Thelen, Yang Shi, Davidlohr Bueso, Tim C Chen, Brice Goglin, Michal Hocko, Linux Kernel Mailing List, Hesham Almatary, Dave Hansen, Jonathan Cameron, Alistair Popple, Dan Williams, Feng Tang, Jagdish Gediya, Baolin Wang, David Rientjes
Date: Wed, 08 Jun 2022 13:14:55 -0700
In-Reply-To: <87sfoffcfz.fsf@linux.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2022-06-08 at 20:07 +0530, Aneesh Kumar K.V wrote:
>
>
> This is what I am testing now. We still need to closely audit that lock
> free access to the NODE_DATA()->memtier.

Are you referring to this, or to something else? This is a write, so it seems okay.

> +	for_each_node_state(node, N_MEMORY) {
> +		/*
> +		 * Should be safe to do this early in the boot.
> +		 */
> +		NODE_DATA(node)->memtier = memtier;
> +		node_set(node, memtier->nodelist);
> +	}
>  	migrate_on_reclaim_init();

> For v6 I will keep this as a
> separate patch and once we all agree that it is safe, I will fold it
> back.

Please update the code that uses __node_get_memory_tier(node) to use
NODE_DATA(node)->memtier. Otherwise, the code looks okay at first glance.

Tim

>
> diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
> index a388a806b61a..3e733de1a8a0 100644
> --- a/include/linux/memory-tiers.h
> +++ b/include/linux/memory-tiers.h
> @@ -17,7 +17,6 @@
>  #define MAX_MEMORY_TIERS (MAX_STATIC_MEMORY_TIERS + 2)
>
>  extern bool numa_demotion_enabled;
> -extern nodemask_t promotion_mask;
>  int node_create_and_set_memory_tier(int node, int tier);
>  int next_demotion_node(int node);
>  int node_set_memory_tier(int node, int tier);
> @@ -25,15 +24,7 @@ int node_get_memory_tier_id(int node);
>  int node_reset_memory_tier(int node, int tier);
>  void node_remove_from_memory_tier(int node);
>  void node_get_allowed_targets(int node, nodemask_t *targets);
> -
> -/*
> - * By default all nodes are top tiper. As we create new memory tiers
> - * we below top tiers we add them to NON_TOP_TIER state.
> - */
> -static inline bool node_is_toptier(int node)
> -{
> -	return !node_isset(node, promotion_mask);
> -}
> +bool node_is_toptier(int node);
>
>  #else
>  #define numa_demotion_enabled	false
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index aab70355d64f..c4fcfd2b9980 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -928,6 +928,9 @@ typedef struct pglist_data {
>  	/* Per-node vmstats */
>  	struct per_cpu_nodestat __percpu *per_cpu_nodestats;
>  	atomic_long_t		vm_stat[NR_VM_NODE_STAT_ITEMS];
> +#ifdef CONFIG_TIERED_MEMORY
> +	struct memory_tier *memtier;
> +#endif
>  } pg_data_t;
>
>  #define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
> diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
> index 29a038bb38b0..31ef0fab5f19 100644
> --- a/mm/memory-tiers.c
> +++ b/mm/memory-tiers.c
> @@ -7,6 +7,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include "internal.h"
>
> @@ -26,7 +27,7 @@ struct demotion_nodes {
>  static void establish_migration_targets(void);
>  static DEFINE_MUTEX(memory_tier_lock);
>  static LIST_HEAD(memory_tiers);
> -nodemask_t promotion_mask;
> +static int top_tier_rank;
>  /*
>   * node_demotion[] examples:
>   *
> @@ -135,7 +136,7 @@ static void memory_tier_device_release(struct device *dev)
>  	if (tier->dev.id >= MAX_STATIC_MEMORY_TIERS)
>  		ida_free(&memtier_dev_id, tier->dev.id);
>
> -	kfree(tier);
> +	kfree_rcu(tier);
>  }
>
>  /*
> @@ -233,6 +234,70 @@ static struct memory_tier *__get_memory_tier_from_id(int id)
>  	return NULL;
>  }
>
> +/*
> + * Called with memory_tier_lock. Hence the device references cannot
> + * be dropped during this function.
> + */
> +static void memtier_node_clear(int node, struct memory_tier *memtier)
> +{
> +	pg_data_t *pgdat;
> +
> +	pgdat = NODE_DATA(node);
> +	if (!pgdat)
> +		return;
> +
> +	rcu_assign_pointer(pgdat->memtier, NULL);
> +	/*
> +	 * Make sure read side see the NULL value before we clear the node
> +	 * from the nodelist.
> +	 */
> +	synchronize_rcu();
> +	node_clear(node, memtier->nodelist);
> +}
> +
> +static void memtier_node_set(int node, struct memory_tier *memtier)
> +{
> +	pg_data_t *pgdat;
> +
> +	pgdat = NODE_DATA(node);
> +	if (!pgdat)
> +		return;
> +	/*
> +	 * Make sure we mark the memtier NULL before we assign the new memory tier
> +	 * to the NUMA node. This make sure that anybody looking at NODE_DATA
> +	 * finds a NULL memtier or the one which is still valid.
> +	 */
> +	rcu_assign_pointer(pgdat->memtier, NULL);
> +	synchronize_rcu();
> +	node_set(node, memtier->nodelist);
> +	rcu_assign_pointer(pgdat->memtier, memtier);
> +}
> +
> +bool node_is_toptier(int node)
> +{
> +	bool toptier;
> +	pg_data_t *pgdat;
> +	struct memory_tier *memtier;
> +
> +	pgdat = NODE_DATA(node);
> +	if (!pgdat)
> +		return false;
> +
> +	rcu_read_lock();
> +	memtier = rcu_dereference(pgdat->memtier);
> +	if (!memtier) {
> +		toptier = true;
> +		goto out;
> +	}
> +	if (memtier->rank >= top_tier_rank)
> +		toptier = true;
> +	else
> +		toptier = false;
> +out:
> +	rcu_read_unlock();
> +	return toptier;
> +}
> +
>  static int __node_create_and_set_memory_tier(int node, int tier)
>  {
>  	int ret = 0;
> @@ -253,7 +318,7 @@ static int __node_create_and_set_memory_tier(int node, int tier)
>  			goto out;
>  		}
>  	}
> -	node_set(node, memtier->nodelist);
> +	memtier_node_set(node, memtier);
>  out:
>  	return ret;
>  }
> @@ -275,12 +340,12 @@ int node_create_and_set_memory_tier(int node, int tier)
>  	if (current_tier->dev.id == tier)
>  		goto out;
>
> -	node_clear(node, current_tier->nodelist);
> +	memtier_node_clear(node, current_tier);
>
>  	ret = __node_create_and_set_memory_tier(node, tier);
>  	if (ret) {
>  		/* reset it back to older tier */
> -		node_set(node, current_tier->nodelist);
> +		memtier_node_set(node, current_tier);
>  		goto out;
>  	}
>
> @@ -305,7 +370,7 @@ static int __node_set_memory_tier(int node, int tier)
>  		ret = -EINVAL;
>  		goto out;
>  	}
> -	node_set(node, memtier->nodelist);
> +	memtier_node_set(node, memtier);
>  out:
>  	return ret;
>  }
> @@ -374,12 +439,12 @@ int node_reset_memory_tier(int node, int tier)
>  	if (current_tier->dev.id == tier)
>  		goto out;
>
> -	node_clear(node, current_tier->nodelist);
> +	memtier_node_clear(node, current_tier);
>
>  	ret = __node_set_memory_tier(node, tier);
>  	if (ret) {
>  		/* reset it back to older tier */
> -		node_set(node, current_tier->nodelist);
> +		memtier_node_set(node, current_tier);
>  		goto out;
>  	}
>
> @@ -407,7 +472,7 @@ void node_remove_from_memory_tier(int node)
>  	 * empty then unregister it to make it invisible
>  	 * in sysfs.
>  	 */
> -	node_clear(node, memtier->nodelist);
> +	memtier_node_clear(node, memtier);
>  	if (nodes_empty(memtier->nodelist))
>  		unregister_memory_tier(memtier);
>
> @@ -570,15 +635,13 @@ static void establish_migration_targets(void)
>  	 * a memory tier, we consider that tier as top tiper from
>  	 * which promotion is not allowed.
>  	 */
> -	promotion_mask = NODE_MASK_NONE;
>  	list_for_each_entry_reverse(memtier, &memory_tiers, list) {
>  		nodes_and(allowed, node_states[N_CPU], memtier->nodelist);
> -		if (nodes_empty(allowed))
> -			nodes_or(promotion_mask, promotion_mask, memtier->nodelist);
> -		else
> +		if (!nodes_empty(allowed)) {
> +			top_tier_rank = memtier->rank;
>  			break;
> +		}
>  	}
> -
>  	pr_emerg("top tier rank is %d\n", top_tier_rank);
>  	allowed = NODE_MASK_NONE;
>  	/*
> @@ -748,7 +811,7 @@ static const struct attribute_group *memory_tier_attr_groups[] = {
>
>  static int __init memory_tier_init(void)
>  {
> -	int ret;
> +	int ret, node;
>  	struct memory_tier *memtier;
>
>  	ret = subsys_system_register(&memory_tier_subsys, memory_tier_attr_groups);
> @@ -766,7 +829,13 @@ static int __init memory_tier_init(void)
>  		panic("%s() failed to register memory tier: %d\n", __func__, ret);
>
>  	/* CPU only nodes are not part of memory tiers. */
> -	memtier->nodelist = node_states[N_MEMORY];
> +	for_each_node_state(node, N_MEMORY) {
> +		/*
> +		 * Should be safe to do this early in the boot.
> +		 */
> +		NODE_DATA(node)->memtier = memtier;
> +		node_set(node, memtier->nodelist);
> +	}
>  	migrate_on_reclaim_init();
>
>  	return 0;
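The lock-free read of pgdat->memtier that the patch relies on pairs a publishing store with an ordered load. Below is a minimal userspace sketch of that pairing, written against C11 <stdatomic.h>. A release store stands in for rcu_assign_pointer() and an acquire load stands in for rcu_dereference(). The names struct node_data, publish_memtier() and node_is_toptier_sketch() are hypothetical and do not come from the patch. The sketch models only publication ordering, not RCU grace periods, so it says nothing about when an old tier may be freed. In the patch, synchronize_rcu() and kfree_rcu() handle that.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct memory_tier. */
struct memory_tier {
	int rank;
};

/* Hypothetical stand-in for pg_data_t; only the memtier field is modelled. */
struct node_data {
	struct memory_tier *_Atomic memtier;
};

/* Assumed to be computed elsewhere, like top_tier_rank in the patch. */
static int top_tier_rank;

/* Writer side: a release store plays the role of rcu_assign_pointer(). */
static void publish_memtier(struct node_data *nd, struct memory_tier *tier)
{
	assert(nd != NULL);
	atomic_store_explicit(&nd->memtier, tier, memory_order_release);
}

/* Reader side: an acquire load plays the role of rcu_dereference(). */
static bool node_is_toptier_sketch(struct node_data *nd)
{
	struct memory_tier *tier =
		atomic_load_explicit(&nd->memtier, memory_order_acquire);

	/* As in the patch, a node with no tier assigned counts as top tier. */
	if (!tier)
		return true;
	return tier->rank >= top_tier_rank;
}
```

To keep readers from ever observing a stale tier, a writer would first publish NULL, wait until all existing readers have finished, and only then publish the new tier. That waiting step is what synchronize_rcu() provides in memtier_node_set(), and it is not modelled here.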