From: "Huang, Ying"
To: "Aneesh Kumar K.V", Jagdish Gediya
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, baolin.wang@linux.alibaba.com, dave.hansen@linux.intel.com, Fan Du
Subject: Re: [PATCH] mm: migrate: set demotion targets differently
References: <20220329115222.8923-1-jvgediya@linux.ibm.com> <87pmm4c4ys.fsf@yhuang6-desk2.ccr.corp.intel.com> <87lewrxsv1.fsf@linux.ibm.com> <878rsrc672.fsf@yhuang6-desk2.ccr.corp.intel.com> <87ilruy5zt.fsf@linux.ibm.com>
Date: Thu, 31 Mar 2022 15:23:16 +0800
In-Reply-To: <87ilruy5zt.fsf@linux.ibm.com> (Aneesh Kumar K. V.'s message of "Thu, 31 Mar 2022 12:15:58 +0530")
Message-ID: <87h77ebn6j.fsf@yhuang6-desk2.ccr.corp.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

"Aneesh Kumar K.V" writes:

> "Huang, Ying" writes:
>
>> "Aneesh Kumar K.V" writes:
>>
>>> "Huang, Ying" writes:
>>>
>>>> Hi, Jagdish,
>>>>
>>>> Jagdish Gediya writes:
>>>>
>>>
>>> ...
>>>
>>>>> e.g.
>>>>> with below NUMA topology, where node 0 & 1 are
>>>>> cpu + dram nodes, node 2 & 3 are equally slower memory
>>>>> only nodes, and node 4 is slowest memory only node,
>>>>>
>>>>> available: 5 nodes (0-4)
>>>>> node 0 cpus: 0 1
>>>>> node 0 size: n MB
>>>>> node 0 free: n MB
>>>>> node 1 cpus: 2 3
>>>>> node 1 size: n MB
>>>>> node 1 free: n MB
>>>>> node 2 cpus:
>>>>> node 2 size: n MB
>>>>> node 2 free: n MB
>>>>> node 3 cpus:
>>>>> node 3 size: n MB
>>>>> node 3 free: n MB
>>>>> node 4 cpus:
>>>>> node 4 size: n MB
>>>>> node 4 free: n MB
>>>>> node distances:
>>>>> node   0   1   2   3   4
>>>>>   0:  10  20  40  40  80
>>>>>   1:  20  10  40  40  80
>>>>>   2:  40  40  10  40  80
>>>>>   3:  40  40  40  10  80
>>>>>   4:  80  80  80  80  10
>>>>>
>>>>> The existing implementation gives below demotion targets,
>>>>>
>>>>> node    demotion_target
>>>>>  0              3, 2
>>>>>  1              4
>>>>>  2              X
>>>>>  3              X
>>>>>  4              X
>>>>>
>>>>> With this patch applied, below are the demotion targets,
>>>>>
>>>>> node    demotion_target
>>>>>  0              3, 2
>>>>>  1              3, 2
>>>>>  2              3
>>>>>  3              4
>>>>>  4              X
>>>>
>>>> For such machine, I think the perfect demotion order is,
>>>>
>>>> node    demotion_target
>>>>  0              2, 3
>>>>  1              2, 3
>>>>  2              4
>>>>  3              4
>>>>  4              X
>>>
>>> I guess the "equally slow nodes" is a confusing definition here. Now if the
>>> system consists of 2 1GB equally slow memory and the firmware doesn't want to
>>> differentiate between them, firmware can present a single NUMA node
>>> with 2GB capacity? The fact that we are finding two NUMA nodes is a hint
>>> that there is some difference between these two memory devices. This is
>>> also captured by the fact that the distance between 2 and 3 is 40 and not 10.
>>
>> Do you have more information about this?
>
> Not sure I follow the question there. I was checking shouldn't firmware
> do a single NUMA node if two memory devices are of the same type? How will
> optane present such a config? Both the DIMMs will have the same
> proximity domain value and hence dax kmem will add them to the same NUMA
> node?

Sorry for the confusion. I just wanted to check whether you have more
information about the machine configuration above. The machines I have
on hand have no NUMA topology as complex as the one in the patch
description.

> If you are suggesting that firmware doesn't do that, then I agree with you
> that a demotion target like the below is good.
>
> node    demotion_target
>  0              2, 3
>  1              2, 3
>  2              4
>  3              4
>  4              X
>
> We can also achieve that with a simple change as below.

Glad to see the demotion order can be implemented in a simple way.

My concern is whether it is necessary to do this. If there are real
machines with this NUMA topology, then I think it's good to add the
support. But if not, why make the code complex unnecessarily?

I don't have this kind of machine. Do you have one, or expect to have
one?

> @@ -3120,7 +3120,7 @@ static void __set_migration_target_nodes(void)
>  {
>      nodemask_t next_pass = NODE_MASK_NONE;
>      nodemask_t this_pass = NODE_MASK_NONE;
> -    nodemask_t used_targets = NODE_MASK_NONE;
> +    nodemask_t this_pass_used_targets = NODE_MASK_NONE;
>      int node, best_distance;
>
>      /*
> @@ -3141,17 +3141,20 @@ static void __set_migration_target_nodes(void)
>      /*
>       * To avoid cycles in the migration "graph", ensure
>       * that migration sources are not future targets by
> -     * setting them in 'used_targets'. Do this only
> +     * setting them in 'this_pass_used_targets'. Do this only
>       * once per pass so that multiple source nodes can
>       * share a target node.
>       *
> -     * 'used_targets' will become unavailable in future
> +     * 'this_pass_used_targets' will become unavailable in future
>       * passes. This limits some opportunities for
>       * multiple source nodes to share a destination.
>       */
> -    nodes_or(used_targets, used_targets, this_pass);
> +    nodes_or(this_pass_used_targets, this_pass_used_targets, this_pass);
>
>      for_each_node_mask(node, this_pass) {
> +
> +        nodemask_t used_targets = this_pass_used_targets;
> +
>          best_distance = -1;
>
>          /*

Best Regards,
Huang, Ying
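
For reference, the effect of the quoted change can be checked outside
the kernel with a small user-space model. The sketch below is not the
code in mm/migrate.c. It assumes a plain "nearest node first, lowest
node id wins ties" selection in place of find_next_best_node(), uses
bitmasks in place of nodemask_t, and hard-codes the distance table from
the patch description. The names next_best_node(),
set_demotion_targets(), struct targets, and per_source_mask are made up
for this illustration.

```c
#include <assert.h>
#include <string.h>

#define NR_NODES 5      /* node count of the example topology */
#define NO_NODE  (-1)   /* stand-in for the kernel's NUMA_NO_NODE */

/* Node distances copied from the quoted numactl output. */
static const int distance[NR_NODES][NR_NODES] = {
	{10, 20, 40, 40, 80},
	{20, 10, 40, 40, 80},
	{40, 40, 10, 40, 80},
	{40, 40, 40, 10, 80},
	{80, 80, 80, 80, 10},
};

/* Demotion targets of one node, in the order they were chosen. */
struct targets {
	int nr;
	int nodes[NR_NODES];
};

/*
 * Assumed simplification of find_next_best_node(): return the nearest
 * node not set in the 'used' bitmask, lowest node id winning ties.
 */
static int next_best_node(int src, unsigned int used)
{
	int best = NO_NODE;

	for (int n = 0; n < NR_NODES; n++) {
		if (used & (1u << n))
			continue;
		if (best == NO_NODE || distance[src][n] < distance[src][best])
			best = n;
	}
	return best;
}

/*
 * Compute demotion targets, starting from the nodes in 'cpu_nodes'.
 * per_source_mask == 0 models one 'used' mask shared by all sources;
 * per_source_mask != 0 models the quoted change, where every source
 * starts from its own copy of the per-pass mask.
 */
static void set_demotion_targets(unsigned int cpu_nodes, int per_source_mask,
				 struct targets out[NR_NODES])
{
	unsigned int next_pass = cpu_nodes, this_pass, pass_used = 0;

	memset(out, 0, sizeof(out[0]) * NR_NODES);

	while (next_pass) {
		this_pass = next_pass;
		next_pass = 0;
		pass_used |= this_pass;	/* sources never become targets */

		for (int src = 0; src < NR_NODES; src++) {
			unsigned int used_copy = pass_used;
			unsigned int *used = per_source_mask ? &used_copy
							     : &pass_used;
			int best_distance = -1;

			if (!(this_pass & (1u << src)))
				continue;

			for (;;) {
				int t = next_best_node(src, *used);

				if (t == NO_NODE)
					break;
				/* Farther than the best distance: stop. */
				if (best_distance != -1 &&
				    distance[src][t] > best_distance)
					break;
				best_distance = distance[src][t];
				*used |= 1u << t;
				assert(out[src].nr < NR_NODES);
				out[src].nodes[out[src].nr++] = t;
				next_pass |= 1u << t;	/* source in next pass */
			}
		}
	}
}
```

Calling set_demotion_targets(0x3, 0, t) models the shared mask and
set_demotion_targets(0x3, 1, t) models the per-source copy, with 0x3
marking nodes 0 and 1 as the CPU nodes. Under the tie-breaking assumed
here the targets come out as "2, 3" rather than "3, 2"; the real
kernel ordering may differ.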