From: "Huang, Ying"
To: Jonathan Cameron
Cc: Gregory Price, Srinivasulu Thanneeru, Srinivasulu Opensrc, "linux-cxl@vger.kernel.org", "linux-mm@kvack.org", "aneesh.kumar@linux.ibm.com", "dan.j.williams@intel.com", "mhocko@suse.com", "tj@kernel.org", "john@jagalactic.com", Eishan Mirakhur, Vinicius Tavares Petrucci, Ravis OpenSrc, "linux-kernel@vger.kernel.org", "Johannes Weiner", Wei Xu, Hao Xiang, "Ho-Ren (Jack) Chuang"
Subject: Re: [EXT] Re: [RFC PATCH v2 0/2] Node migration between memory tiers
Date: Wed, 10 Jan 2024 14:06:44 +0800
Message-ID: <874jfl90y3.fsf@yhuang6-desk2.ccr.corp.intel.com>

Jonathan Cameron writes:

> On Tue, 09 Jan 2024 11:41:11 +0800
> "Huang, Ying" wrote:
>
>> Gregory Price writes:
>>
>> > On Thu, Jan 04, 2024 at 02:05:01PM +0800, Huang, Ying wrote:
>> >> >
>> >> > From https://lpc.events/event/16/contributions/1209/attachments/1042/1995/Live%20In%20a%20World%20With%20Multiple%20Memory%20Types.pdf
>> >> > abstract_distance_offset: override by users to deal with firmware issue.
>> >> >
>> >> > say firmware can configure the cxl node into wrong tiers, similar to
>> >> > that it may also configure all cxl nodes into single memtype, hence
>> >> > all these nodes can fall into a single wrong tier.
>> >> > In this case, per node adistance_offset would be good to have ?
>> >>
>> >> I think that it's better to fix the error firmware if possible. And
>> >> these are only theoretical, not practical issues. Do you have some
>> >> practical issues?
>> >>
>> >> I understand that users may want to move nodes between memory tiers for
>> >> different policy choices. For that, memory_type based adistance_offset
>> >> should be good.
>> >>
>> >
>> > There's actually an affirmative case to change memory tiering to allow
>> > either movement of nodes between tiers, or at least base placement on
>> > HMAT information. Preferably, membership would be changable to allow
>> > hotplug/DCD to be managed (there's no guarantee that the memory passed
>> > through will always be what HMAT says on initial boot).
>>
>> IIUC, from Jonathan Cameron as below, the performance of memory
>> shouldn't change even for DCD devices.
>>
>> https://lore.kernel.org/linux-mm/20231103141636.000007e4@Huawei.com/
>>
>> It's possible to change the performance of a NUMA node changed, if we
>> hot-remove a memory device, then hot-add another different memory
>> device. It's hoped that the CDAT changes too.
>
> Not supported, but ACPI has _HMA methods to in theory allow changing
> HMAT values based on firmware notifications... So we 'could' make
> it work for HMAT based description.
>
> Ultimately my current thinking is we'll end up emulating CXL type3
> devices (hiding topology complexity) and you can update CDAT but
> IIRC that is only meant to be for degraded situations - so if you
> want multiple performance regions, CDAT should describe them form the start.

Thank you very much for the input!

So, to support degraded performance, we will need to move a NUMA node
between memory tiers. And, per my understanding, we should do that in
the kernel.

>>
>> So, all in all, HMAT + CDAT can help us to put the memory device in
>> appropriate memory tiers. Now, we have HMAT support in upstream. We
>> will working on CDAT support.

-- 
Best Regards,
Huang, Ying
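The thread weighs a per-node adistance_offset against a memory_type based one, and argues that a node whose performance degrades should be moved between tiers by the kernel. The toy Python model below is only a sketch of those ideas, not kernel code. It assumes, as a simplification, that a node's tier index is its memory type's abstract distance plus a per-type offset, divided by a fixed chunk size, with a smaller distance meaning faster memory. The constants, `MemoryType`, `NumaNode`, `node_tier()` and `tiers()` are hypothetical names chosen for illustration; the real logic in the kernel's mm/memory-tiers.c uses its own constants and data structures.

```python
from dataclasses import dataclass

# Hypothetical constants for illustration only; the kernel defines its own.
CHUNK_SIZE = 128                               # abstract-distance width of one tier
ADIST_DRAM = 4 * CHUNK_SIZE + CHUNK_SIZE // 2  # 576 -> tier 4
ADIST_CXL = 5 * CHUNK_SIZE + CHUNK_SIZE // 2   # 704 -> tier 5


@dataclass
class MemoryType:
    """Nodes with the same performance share one (hypothetical) memory type."""
    name: str
    adistance: int             # e.g. derived from HMAT/CDAT latency and bandwidth
    adistance_offset: int = 0  # hypothetical per-type user override


@dataclass
class NumaNode:
    nid: int
    mtype: MemoryType


def node_tier(node: NumaNode) -> int:
    """Smaller abstract distance means faster memory, i.e. a lower tier index."""
    adist = node.mtype.adistance + node.mtype.adistance_offset
    return adist // CHUNK_SIZE


def tiers(nodes: list) -> dict:
    """Group node IDs by tier index, fastest tier first."""
    result = {}
    for n in nodes:
        result.setdefault(node_tier(n), []).append(n.nid)
    return dict(sorted(result.items()))


if __name__ == "__main__":
    dram = MemoryType("dram", ADIST_DRAM)
    cxl = MemoryType("cxl", ADIST_CXL)
    nodes = [NumaNode(0, dram), NumaNode(1, cxl), NumaNode(2, cxl)]
    print(tiers(nodes))  # {4: [0], 5: [1, 2]}

    # A per-type offset moves every CXL node at once.
    cxl.adistance_offset = -CHUNK_SIZE
    print(tiers(nodes))  # {4: [0, 1, 2]}

    # A degraded node gets its own, slower type and changes tier alone.
    cxl.adistance_offset = 0
    nodes[2].mtype = MemoryType("cxl-degraded", ADIST_CXL + CHUNK_SIZE)
    print(tiers(nodes))  # {4: [0], 5: [1], 6: [2]}
```

Because the offset lives on the shared memory type, changing it moves every node of that type together. Handling a single degraded node, in this model, means giving that node a different memory type so that its tier is recomputed on its own.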