From: "Huang, Ying"
To: Aneesh Kumar K V
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, Wei Xu, Yang Shi,
    Davidlohr Bueso, Tim C Chen, Michal Hocko, Linux Kernel Mailing List,
    Hesham Almatary, Dave Hansen, Jonathan Cameron, Alistair Popple,
    Dan Williams, Johannes Weiner, jvgediya.oss@gmail.com, Jagdish Gediya
Subject: Re: [PATCH v9 1/8] mm/demotion: Add support for explicit memory tiers
References: <20220714045351.434957-1-aneesh.kumar@linux.ibm.com>
    <20220714045351.434957-2-aneesh.kumar@linux.ibm.com>
    <87bktq4xs7.fsf@yhuang6-desk2.ccr.corp.intel.com>
    <3659f1bb-a82e-1aad-f297-808a2c17687d@linux.ibm.com>
Date: Mon, 18 Jul 2022 14:57:42 +0800
In-Reply-To: <3659f1bb-a82e-1aad-f297-808a2c17687d@linux.ibm.com> (Aneesh Kumar K. V.'s message of "Fri, 15 Jul 2022 14:38:43 +0530")
Message-ID: <87tu7e3o2h.fsf@yhuang6-desk2.ccr.corp.intel.com>

Aneesh Kumar K V writes:

> On 7/15/22 1:23 PM, Huang, Ying wrote:

[snip]

>>
>> You dropped the original sysfs interface patches from the series, but
>> the kernel internal implementation is still for the original sysfs
>> interface.
>> For example, memory tier ID is for the original sysfs interface, not
>> for the new proposed sysfs interface.  So I suggest that you implement
>> it with the new interface in mind.  What do you think about the
>> following design?
>>
>
> Sorry, I am not able to follow you here. This patchset completely drops
> exposing memory tiers to userspace via sysfs. Instead it allows
> creation of memory tiers with a specific tierID from within the
> kernel/device driver. The default tierID is 200, and dax kmem creates
> a memory tier with tierID 100.
>
>
>> - Each NUMA node belongs to a memory type, and each memory type
>>   corresponds to an "abstract distance", so each NUMA node corresponds
>>   to a "distance".  For simplicity, we can start with static distances,
>>   for example, DRAM (default): 150, PMEM: 250.  The distance of each
>>   NUMA node can be recorded in a global array,
>>
>>     int node_distances[MAX_NUMNODES];
>>
>>   or, just
>>
>>     pgdat->distance
>>
>
> I don't follow this. I guess you are trying to have a different design.
> Would it be easier if you could write this in the form of a patch?

I have written some pseudocode, as follows, to show my basic idea.

#define MEMORY_TIER_ADISTANCE_DRAM	150
#define MEMORY_TIER_ADISTANCE_PMEM	250

struct memory_tier {
	/* abstract distance range covered by the memory tier */
	int adistance_start;
	int adistance_len;
	struct list_head list;
	nodemask_t nodemask;
};

/* RCU list of memory tiers */
static LIST_HEAD(memory_tiers);

/* abstract distance of each NUMA node, 0 means "not set" */
int node_adistances[MAX_NUMNODES];

struct memory_tier *find_create_memory_tier(int adistance)
{
	struct memory_tier *tier;

	/* reuse the tier whose range already covers this distance */
	list_for_each_entry(tier, &memory_tiers, list) {
		if (adistance >= tier->adistance_start &&
		    adistance < tier->adistance_start + tier->adistance_len)
			return tier;
	}
	/* allocate a new memory tier and return */
}

void memory_tier_add_node(int nid)
{
	int adistance;
	struct memory_tier *tier;

	/*
	 * Fall back to the DRAM distance if none was set.  Note "?:" and
	 * not "||": in C, "a || b" evaluates to 0 or 1, not to a or b.
	 */
	adistance = node_adistances[nid] ?: MEMORY_TIER_ADISTANCE_DRAM;
	tier = find_create_memory_tier(adistance);
	/* node_set() takes the nodemask itself, not a pointer to it */
	node_set(nid, tier->nodemask);
	/* setup demotion data structure, etc */
}

static int __meminit migrate_on_reclaim_callback(struct notifier_block *self,
						 unsigned long action, void *_arg)
{
	struct memory_notify *arg = _arg;
	int nid;

	nid = arg->status_change_nid;
	if (nid < 0)
		return notifier_from_errno(0);

	switch (action) {
	case MEM_ONLINE:
		memory_tier_add_node(nid);
		break;
	}

	return notifier_from_errno(0);
}

/* kmem.c */
static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
{
	node_adistances[dev_dax->target_node] = MEMORY_TIER_ADISTANCE_PMEM;
	/* add_memory_driver_managed() */
}

[snip]

Best Regards,
Huang, Ying
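
For anyone who wants to try the range lookup outside the kernel, here is a rough userspace sketch of the same idea. It is only an illustration under several assumptions that are not in the pseudocode above: the tier list is a plain singly linked list instead of an RCU list, the nodemask is a 64-bit integer, MAX_NUMNODES is fixed at 64, and TIER_WIDTH (the size of the abstract distance range that one tier covers) is a hypothetical value of 100. How a real implementation would choose adistance_start and adistance_len is left open above, so the rounding used here is just one possible choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MEMORY_TIER_ADISTANCE_DRAM 150
#define MEMORY_TIER_ADISTANCE_PMEM 250
#define MAX_NUMNODES 64   /* assumption: fits in one 64-bit mask */
#define TIER_WIDTH 100    /* hypothetical width of one tier's range */

struct memory_tier {
	/* abstract distance range covered by the memory tier */
	int adistance_start;
	int adistance_len;
	unsigned long long nodemask;  /* bit n set: node n is in this tier */
	struct memory_tier *next;
};

/* plain linked list standing in for the RCU list of memory tiers */
static struct memory_tier *memory_tiers;

/* abstract distance of each NUMA node, 0 means "not set" */
static int node_adistances[MAX_NUMNODES];

/* Return the tier covering adistance, creating it if none exists yet. */
struct memory_tier *find_create_memory_tier(int adistance)
{
	struct memory_tier *tier;

	assert(adistance >= 0);
	for (tier = memory_tiers; tier; tier = tier->next) {
		if (adistance >= tier->adistance_start &&
		    adistance < tier->adistance_start + tier->adistance_len)
			return tier;
	}

	tier = calloc(1, sizeof(*tier));
	if (!tier)
		return NULL;
	/* assumed policy: align the range down to a TIER_WIDTH boundary */
	tier->adistance_start = adistance / TIER_WIDTH * TIER_WIDTH;
	tier->adistance_len = TIER_WIDTH;
	tier->next = memory_tiers;
	memory_tiers = tier;
	return tier;
}

/* Put node nid into the tier matching its abstract distance. */
struct memory_tier *memory_tier_add_node(int nid)
{
	struct memory_tier *tier;
	int adistance;

	assert(nid >= 0 && nid < MAX_NUMNODES);
	/* nodes without an explicit distance are treated as DRAM */
	adistance = node_adistances[nid] ? node_adistances[nid]
					 : MEMORY_TIER_ADISTANCE_DRAM;
	tier = find_create_memory_tier(adistance);
	if (tier)
		tier->nodemask |= 1ULL << nid;
	return tier;
}
```

With these assumed values, a node with no recorded distance lands in the tier covering [100, 200), while a node whose distance a driver set to MEMORY_TIER_ADISTANCE_PMEM lands in a separate tier covering [200, 300). Any further DRAM node reuses the first tier instead of creating a new one.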