From: "Huang, Ying"
To: Michal Hocko
Cc: Bharata B Rao, Aneesh Kumar K V, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Andrew Morton, Alistair Popple,
	Dan Williams, Dave Hansen, Davidlohr Bueso, Hesham Almatary,
	Jagdish Gediya, Johannes Weiner, Jonathan Cameron, Tim Chen,
	Wei Xu, Yang Shi
Subject: Re: [RFC] memory tiering: use small chunk size and more tiers
Date: Wed, 02 Nov 2022 16:45:38 +0800
In-Reply-To: (Michal Hocko's message of "Wed, 2 Nov 2022 09:39:25 +0100")
Message-ID: <8735b1bv7x.fsf@yhuang6-desk2.ccr.corp.intel.com>
References: <0d938c9f-c810-b10a-e489-c2b312475c52@amd.com>
	<87tu3oibyr.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<07912a0d-eb91-a6ef-2b9d-74593805f29e@amd.com>
	<87leowepz6.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<878rkuchpm.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<87bkppbx75.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<877d0dbw13.fsf@yhuang6-desk2.ccr.corp.intel.com>

Michal Hocko writes:

> On Wed 02-11-22 16:28:08, Huang, Ying wrote:
>> Michal Hocko writes:
>>
>> > On Wed 02-11-22 16:02:54, Huang, Ying wrote:
>> >> Michal Hocko writes:
>> >>
>> >> > On Wed 02-11-22 08:39:49, Huang, Ying wrote:
>> >> >> Michal Hocko writes:
>> >> >>
>> >> >> > On Mon 31-10-22 09:33:49, Huang, Ying wrote:
>> >> >> > [...]
>> >> >> >> In the upstream implementation, 4 tiers are possible below
>> >> >> >> DRAM. That's enough for now. But in the long run, it may
>> >> >> >> be better to define more. 100 possible tiers below DRAM
>> >> >> >> may be too extreme.
>> >> >> >
>> >> >> > I am just curious. Is any configuration with more than a
>> >> >> > couple of tiers even manageable? I mean, applications have
>> >> >> > been struggling even with regular NUMA systems for years,
>> >> >> > and the vast majority of them are largely NUMA unaware. How
>> >> >> > are they going to configure for a more complex system when
>> >> >> > a) there is no resource access control, so whatever you aim
>> >> >> > for might not be available, and b) in which situations is
>> >> >> > there going to be demand for only a subset of tiers (GPU
>> >> >> > memory?)?
>> >> >>
>> >> >> Sorry for the confusion. I think that there are only several
>> >> >> (fewer than 10) tiers in a system in practice. Yes, here I
>> >> >> suggested defining 100 (10 in the later text) POSSIBLE tiers
>> >> >> below DRAM. My intention isn't to manage a system with tens
>> >> >> of memory tiers. Instead, my intention is to avoid putting 2
>> >> >> memory types into one memory tier by accident, by making the
>> >> >> abstract distance range of each memory tier as small as
>> >> >> possible. The more possible memory tiers, the smaller the
>> >> >> abstract distance range of each memory tier.
>> >> >
>> >> > TBH I do not really understand how tweaking ranges helps
>> >> > anything. IIUC drivers are free to assign any abstract
>> >> > distance, so they will clash without any higher-level
>> >> > coordination.
>> >>
>> >> Yes, that's possible. Each memory tier corresponds to one
>> >> abstract distance range. The larger the range is, the higher
>> >> the possibility of clashing. So I suggest making the abstract
>> >> distance range smaller to reduce the possibility of clashing.
>> >
>> > I am sorry, but I really do not understand how the size of the
>> > range actually addresses the fundamental issue that each driver
>> > simply picks what it wants. Is there any enumeration defining
>> > the basic characteristics of each tier? How does a driver
>> > developer know which tier to assign their driver to?
>>
>> A smaller range size will not guarantee anything. It just tries to
>> help the default behavior.
>>
>> The drivers are expected to assign the abstract distance based on
>> the memory latency/bandwidth, etc.
>
> Would it be possible/feasible to have a canonical way to calculate
> the abstract distance from these characteristics by the core
> kernel, so that drivers do not even have to fall into that trap?

Yes, that sounds like a good idea. We can provide a function that
maps the memory latency/bandwidth to an abstract distance for the
drivers.

Best Regards,
Huang, Ying
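For illustration, here is a minimal user-space sketch of the scheme
discussed above. The MEMTIER_* constants follow the pattern of
include/linux/memory-tiers.h (the numeric values here are
illustrative), and perf_to_adistance() is a hypothetical helper
showing how a canonical latency/bandwidth-to-abstract-distance
mapping could look; it is not existing kernel code, and the device
performance numbers are made up:

#include <stdio.h>

#define MEMTIER_CHUNK_BITS	7
#define MEMTIER_CHUNK_SIZE	(1 << MEMTIER_CHUNK_BITS)	/* 128 */
/* DRAM sits in the middle of the 5th chunk, so 4 tiers fit below it. */
#define MEMTIER_ADISTANCE_DRAM \
	((4 * MEMTIER_CHUNK_SIZE) + (MEMTIER_CHUNK_SIZE / 2))

/* Each abstract-distance chunk corresponds to one memory tier. */
static int adistance_to_tier(unsigned long long adistance)
{
	return (int)(adistance >> MEMTIER_CHUNK_BITS);
}

/*
 * Hypothetical canonical mapping: scale DRAM's abstract distance by
 * how much slower a device is than DRAM.  Higher latency or lower
 * bandwidth gives a larger abstract distance, i.e. a slower tier.
 */
static unsigned long long perf_to_adistance(unsigned int latency_ns,
					    unsigned int bw_mbps,
					    unsigned int dram_latency_ns,
					    unsigned int dram_bw_mbps)
{
	return (unsigned long long)MEMTIER_ADISTANCE_DRAM *
	       latency_ns / dram_latency_ns * dram_bw_mbps / bw_mbps;
}

int main(void)
{
	/* Made-up numbers: DRAM ~100 ns / 100 GB/s; CXL ~250 ns / 30 GB/s. */
	unsigned long long adist = perf_to_adistance(250, 30000, 100, 100000);

	printf("CXL adistance %llu -> tier %d (DRAM: adistance %d, tier %d)\n",
	       adist, adistance_to_tier(adist),
	       MEMTIER_ADISTANCE_DRAM,
	       adistance_to_tier(MEMTIER_ADISTANCE_DRAM));
	return 0;
}

With a mapping of this shape, a slower device lands in a
higher-numbered tier automatically, and shrinking MEMTIER_CHUNK_SIZE
increases the number of distinct tiers that the same abstract-distance
spread can express, which is the clash-avoidance argument made in the
thread above.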