Subject: Re: [PATCH v5 4/9] mm/demotion: Build demotion targets based on explicit memory tiers
From: Ying Huang
To: Tim Chen, "Aneesh Kumar K.V", linux-mm@kvack.org, akpm@linux-foundation.org
Cc: Wei Xu, Greg Thelen, Yang Shi, Davidlohr Bueso, Tim C Chen, Brice Goglin, Michal Hocko, Linux Kernel Mailing List, Hesham Almatary, Dave Hansen, Jonathan Cameron, Alistair Popple, Dan Williams, Feng Tang, Jagdish Gediya, Baolin Wang, David Rientjes
Date: Wed, 08 Jun 2022 14:52:20 +0800
References: <20220603134237.131362-1-aneesh.kumar@linux.ibm.com> <20220603134237.131362-5-aneesh.kumar@linux.ibm.com>

On Tue, 2022-06-07 at 15:51 -0700, Tim Chen wrote:
> On Fri, 2022-06-03 at 19:12 +0530, Aneesh Kumar K.V wrote:
> >
> > +int next_demotion_node(int node)
> > +{
> > +	struct demotion_nodes *nd;
> > +	int target, nnodes, i;
> > +
> > +	if (!node_demotion)
> > +		return NUMA_NO_NODE;
> > +
> > +	nd = &node_demotion[node];
> > +
> > +	/*
> > +	 * node_demotion[] is updated without excluding this
> > +	 * function from running.
> > +	 *
> > +	 * Make sure to use RCU over entire code blocks if
> > +	 * node_demotion[] reads need to be consistent.
> > +	 */
> > +	rcu_read_lock();
> > +
> > +	nnodes = nodes_weight(nd->preferred);
> > +	if (!nnodes)
> > +		return NUMA_NO_NODE;
> > +
> > +	/*
> > +	 * If there are multiple target nodes, just select one
> > +	 * target node randomly.
> > +	 *
> > +	 * In addition, we can also use round-robin to select
> > +	 * target node, but we should introduce another variable
> > +	 * for node_demotion[] to record last selected target node,
> > +	 * that may cause cache ping-pong due to the changing of
> > +	 * last target node. Or introducing per-cpu data to avoid
> > +	 * caching issue, which seems more complicated. So selecting
> > +	 * target node randomly seems better until now.
> > +	 */
> > +	nnodes = get_random_int() % nnodes;
> > +	target = first_node(nd->preferred);
> > +	for (i = 0; i < nnodes; i++)
> > +		target = next_node(target, nd->preferred);
>
> We can simplify the above 4 lines.
>
> 	target = node_random(nd->preferred);
>
> There's still a loop overhead though :(

To avoid the loop overhead, we can use the original implementation of
next_demotion_node(). Its performance is much better for the most
common case, where the number of preferred nodes is 1.

Best Regards,
Huang, Ying

> >
> >
> > +
> > +	rcu_read_unlock();
> > +
> > +	return target;
> > +}
> > +
> >
> > + */
> > +static int __meminit migrate_on_reclaim_callback(struct notifier_block *self,
> > +						 unsigned long action, void *_arg)
> > +{
> > +	struct memory_notify *arg = _arg;
> > +
> > +	/*
> > +	 * Only update the node migration order when a node is
> > +	 * changing status, like online->offline.
> > +	 */
> > +	if (arg->status_change_nid < 0)
> > +		return notifier_from_errno(0);
> > +
> > +	switch (action) {
> > +	case MEM_OFFLINE:
> > +		/*
> > +		 * In case we are moving out of N_MEMORY. Keep the node
> > +		 * in the memory tier so that when we bring memory online,
> > +		 * they appear in the right memory tier. We still need
> > +		 * to rebuild the demotion order.
> > +		 */
> > +		mutex_lock(&memory_tier_lock);
> > +		establish_migration_targets();
> > +		mutex_unlock(&memory_tier_lock);
> > +		break;
> > +	case MEM_ONLINE:
> > +		/*
> > +		 * We ignore the error here, if the node already have the tier
> > +		 * registered, we will continue to use that for the new memory
> > +		 * we are adding here.
> > +		 */
> > +		node_set_memory_tier(arg->status_change_nid, DEFAULT_MEMORY_TIER);
>
> Should establish_migration_targets() be run here? Otherwise what are the
> demotion targets for this newly onlined node?
>
> > +		break;
> > +	}
> > +
> > +	return notifier_from_errno(0);
> > +}
> > +
>
> Tim
>
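
To illustrate the point about the single-target case, here is a rough user-space sketch of target selection that skips both the random number and the walk when only one preferred node exists. It is not the kernel code: nodemask64, mask_weight(), mask_first(), mask_nth(), pick_demotion_target() and NO_NODE are hypothetical stand-ins for nodemask_t, nodes_weight(), first_node(), the first_node()/next_node() walk, next_demotion_node() and NUMA_NO_NODE, and rand() stands in for get_random_int(). Locking and RCU are left out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's NUMA_NO_NODE. */
#define NO_NODE (-1)

/* Hypothetical stand-in for nodemask_t: bit n set means node n is a preferred target. */
typedef uint64_t nodemask64;

/* Count the set bits (the role nodes_weight() plays in the patch). */
static int mask_weight(nodemask64 m)
{
	int n = 0;

	while (m) {
		m &= m - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Index of the lowest set bit; the mask must not be empty. */
static int mask_first(nodemask64 m)
{
	int i = 0;

	assert(m != 0);
	while (!(m & 1)) {
		m >>= 1;
		i++;
	}
	return i;
}

/* Index of the k-th set bit, counting from 0; this is the loop the thread wants to avoid. */
static int mask_nth(nodemask64 m, int k)
{
	assert(k >= 0 && k < mask_weight(m));
	while (k--)
		m &= m - 1;	/* drop the k lowest set bits */
	return mask_first(m);
}

/* Pick a demotion target: no candidates, exactly one, or a random one of several. */
static int pick_demotion_target(nodemask64 preferred)
{
	int n = mask_weight(preferred);

	if (n == 0)
		return NO_NODE;
	if (n == 1)
		return mask_first(preferred);	/* common case: no random number, no walk */
	return mask_nth(preferred, rand() % n);
}
```

With a mask that has only bit 5 set, pick_demotion_target() returns 5 without calling rand(); with bits 2 and 7 set it returns one of the two. Whether the extra branch pays off in the kernel would need measuring.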