From: "Huang, Ying"
To: "Aneesh Kumar K.V"
Cc: Johannes Weiner, linux-mm@kvack.org, akpm@linux-foundation.org, Wei Xu,
    Yang Shi, Davidlohr Bueso, Tim C Chen, Michal Hocko,
    Linux Kernel Mailing List, Hesham Almatary, Dave Hansen,
    Jonathan Cameron, Alistair Popple, Dan Williams, jvgediya.oss@gmail.com
Subject: Re: [PATCH v8 00/12] mm/demotion: Memory tiers and demotion
References: <20220704070612.299585-1-aneesh.kumar@linux.ibm.com>
    <87r130b2rh.fsf@yhuang6-desk2.ccr.corp.intel.com>
    <60e97fa2-0b89-cf42-5307-5a57c956f741@linux.ibm.com>
    <87r12r5dwu.fsf@yhuang6-desk2.ccr.corp.intel.com>
    <0a55e48a-b4b7-4477-a72f-73644b5fc4cb@linux.ibm.com>
    <87mtde6cla.fsf@yhuang6-desk2.ccr.corp.intel.com>
    <87ilo267jl.fsf@yhuang6-desk2.ccr.corp.intel.com>
    <87edyp67m1.fsf@yhuang6-desk2.ccr.corp.intel.com>
    <878roxuz9l.fsf@linux.ibm.com>
Date: Thu, 14 Jul 2022 12:56:55 +0800
In-Reply-To: <878roxuz9l.fsf@linux.ibm.com> (Aneesh Kumar K. V.'s message of
    "Wed, 13 Jul 2022 15:10:06 +0530")
Message-ID: <87o7xs47hk.fsf@yhuang6-desk2.ccr.corp.intel.com>

"Aneesh Kumar K.V" writes:

> "Huang, Ying" writes:

[snip]

>>
>> I believe that sparse memory tier IDs can make memory tiers more stable
>> in some cases.  But this is different from the system suggested by
>> Johannes.  Per my understanding, with Johannes' system, we will have:
>>
>> - one driver may online different memory types (such as kmem_dax may
>>   online HBM, PMEM, etc.)
>>
>> - one memory type manages several memory nodes (NUMA nodes)
>>
>> - one "abstract distance" for each memory type
>>
>> - the "abstract distance" can be offset by a user space override knob
>>
>> - memory tiers generated dynamically from the different memory types
>>   according to "abstract distance" and the overridden "offset"
>>
>> - the granularity used to group several memory types into one memory
>>   tier can be overridden via a user space knob
>>
>> In this way, the memory tiers may change completely after a user space
>> override.  It may be hard to link memory tiers before/after the
>> override.  So we may need to reset all per-memory-tier configuration,
>> such as cgroup partition limits or interleave weights, etc.
>
> Making sure we all agree on the details.
>
> In the proposal
> https://lore.kernel.org/linux-mm/7b72ccf4-f4ae-cb4e-f411-74d055482026@linux.ibm.com
> instead of calling it "abstract distance" I was referring to it as
> device attributes.
>
> Johannes also suggested these device attributes/"abstract distance" be
> used to derive the memory tier to which the memory type/memory device
> will be assigned.
>
> So dax kmem would manage different types of memory and, based on the
> device attributes, we would assign them to different memory tiers
> (memory tiers in the range [0-200)).
>
> Now the additional detail here is that we might add knobs that will be
> used by dax kmem to fine-tune the memory type to memory tier
> assignment.  On updating these knob values, the kernel should rebuild
> the entire memory tier hierarchy.  (Earlier I was considering that only
> newly added memory devices would be impacted by such a change, but I
> agree it makes sense to rebuild the entire hierarchy.)  That rebuilding
> will, however, be restricted to the dax kmem driver.
>

Thanks for the explanation and pointer.

Per my understanding, memory types and memory devices, including their
abstract distances, are used to describe the *physical* memory devices,
not *policy*.  We may add more physical attributes to these memory
devices, such as latency, throughput, etc.  I think we can reach
consensus on this point?

In contrast, memory tiers are more about policy, such as
demotion/promotion, interleaving, and possible partitioning among
cgroups.

How do we derive memory tiers from memory types (or devices)?  We have
multiple choices.  Per my understanding, Johannes suggested using some
policy parameters such as a distance granularity (e.g., if the
granularity is 100, then memory devices with abstract distance 0-100,
100-200, 200-300, ... will be put into memory tier 0, 1, 2, ...) to
build the memory tiers.  Distance granularity may not be flexible
enough; we may need something like a set of cutoffs or ranges, e.g.,
50, 100, 200, 500, or 0-50, 50-100, 100-200, 200-500, >500.  These
policy parameters should be overridable from user space.
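As a rough illustration of the difference between the two derivation
policies (a fixed granularity versus an explicit cutoff table), a
minimal C sketch could look like the following.  The function names and
the example values are assumptions for illustration only, not existing
kernel code.

#include <stdio.h>
#include <stddef.h>

/*
 * Illustrative sketch only: two possible policies for mapping a memory
 * type's "abstract distance" to a memory tier index.
 */

/* Policy 1: a fixed distance granularity, e.g. 100 => 0-99 -> tier 0, ... */
static int tier_from_granularity(int abstract_distance, int granularity)
{
	return abstract_distance / granularity;
}

/* Policy 2: an explicit cutoff table, e.g. {50, 100, 200, 500} => 5 tiers. */
static int tier_from_cutoffs(int abstract_distance,
			     const int *cutoffs, size_t nr_cutoffs)
{
	size_t tier;

	for (tier = 0; tier < nr_cutoffs; tier++)
		if (abstract_distance < cutoffs[tier])
			break;
	/* Distances past the last cutoff all share the top tier. */
	return (int)tier;
}

int main(void)
{
	const int cutoffs[] = { 50, 100, 200, 500 };

	printf("granularity 100, distance 120 -> tier %d\n",
	       tier_from_granularity(120, 100));
	printf("cutoffs {50,100,200,500}, distance 300 -> tier %d\n",
	       tier_from_cutoffs(300, cutoffs, 4));
	return 0;
}

Either way, the policy parameter (granularity or cutoff table) is the
only thing user space needs to override to regroup the tiers.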
And per my understanding, you suggested placing memory devices into
memory tiers directly via a knob of the memory types (or memory
devices), e.g., memory_type/memtier could be written to place the
memory devices of that memory_type into the specified memtier, or
memory_type/distance_offset could be used to do that.

Best Regards,
Huang, Ying

[snip]
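For comparison, a minimal sketch of the direct-placement alternative
discussed above, assuming a hypothetical per-memory-type memtier
override and distance_offset.  The structure fields and function are
invented for illustration; no such sysfs interface exists in the kernel
today.

#include <stdio.h>

/* Hypothetical per-memory-type state with two user-visible overrides. */
struct memory_type {
	const char *name;
	int abstract_distance;	/* derived from the device attributes */
	int distance_offset;	/* e.g. written via memory_type/distance_offset */
	int memtier_override;	/* e.g. written via memory_type/memtier; -1 = unset */
};

static int effective_tier(const struct memory_type *mt, int granularity)
{
	/* An explicit tier assignment wins over distance-based derivation. */
	if (mt->memtier_override >= 0)
		return mt->memtier_override;
	return (mt->abstract_distance + mt->distance_offset) / granularity;
}

int main(void)
{
	struct memory_type pmem = {
		.name = "pmem",
		.abstract_distance = 300,
		.distance_offset = 0,
		.memtier_override = -1,
	};

	printf("%s -> tier %d\n", pmem.name, effective_tier(&pmem, 100));

	/* Simulate "echo 1 > .../memory_type/memtier" from user space. */
	pmem.memtier_override = 1;
	printf("%s -> tier %d\n", pmem.name, effective_tier(&pmem, 100));
	return 0;
}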