From: Wei Xu
Date: Fri, 26 Aug 2022 01:00:57 -0700
Subject: Re: [RFC PATCH 1/2] mm/demotion: Expose memory type details via sysfs
To: Aneesh Kumar K V
Cc: "Huang, Ying", Linux MM, Andrew Morton, Yang Shi, Davidlohr Bueso,
 Tim C Chen, Michal Hocko, Linux Kernel Mailing List, Hesham Almatary,
 Dave Hansen, Jonathan Cameron, Alistair Popple, Dan Williams,
 Johannes Weiner, jvgediya.oss@gmail.com, Bharata B Rao
References: <20220825092325.381517-1-aneesh.kumar@linux.ibm.com>
 <877d2v3h8s.fsf@yhuang6-desk2.ccr.corp.intel.com>
X-Mailing-List:
linux-kernel@vger.kernel.org

On Thu, Aug 25, 2022 at 8:00 PM Aneesh Kumar K V wrote:
>
> On 8/26/22 7:20 AM, Huang, Ying wrote:
> > "Aneesh Kumar K.V" writes:
> >
> >> This patch adds /sys/devices/virtual/memtier/ where all memory tier related
> >> details can be found. All allocated memory types will be listed there as
> >> /sys/devices/virtual/memtier/memtypeN/
> >
> > Another choice is to make memory types and memory tiers system devices.
> > That is,
> >
> > /sys/devices/system/memory_type/memory_typeN
> > /sys/devices/system/memory_tier/memory_tierN
> >
>
> subsys_system_register() documentation says
>
>  * Do not use this interface for anything new, it exists for compatibility
>  * with bad ideas only. New subsystems should use plain subsystems; and
>  * add the subsystem-wide attributes should be added to the subsystem
>  * directory itself and not some create fake root-device placed in
>  * /sys/devices/system/.
>
> memtier being a virtual device, I was under the impression that /sys/devices/virtual
> is the recommended place.
>
> > That looks more natural to me. Because we already have "node" and
> > "memory" devices there. Why don't you put memory types and memory tiers
> > there?
> >
> > And, I think we shouldn't put "memory_type" in the "memory_tier"
> > directory. "memory_type" isn't a part of "memory_tier".
> >
>
> I was looking consolidating both memory tier and memory type into the same sysfs subsystem.
> Your recommendation imply we create two subsystem memory_tier and memtype. I was
> trying to avoid that. May be a generic term like "memory_tiering" can help to
> consolidate all tiering related details there?
>

A generic term "memory_tiering" sounds good to me.

Given that this will be a user-facing, stable kernel API, I think we
should add only what is most useful for userspace; we don't have to
mirror the kernel-internal data structures in this interface.

My understanding is that we haven't fully settled on how to customize
memory tiers from userspace. So we don't have to show memory_type yet,
which is a kernel data structure at this point.

Userspace does need to know what the memory tiers are and which NUMA
nodes are included in each memory tier. How about we provide the
"nodelist" interface for each memory tier, as in the original proposal?

Userspace would also like to know which memory tiers/nodes belong to
the top tiers (the promotion targets). We can provide a "toptiers" or
"toptiers_nodelist" interface to report that.

Both should still be useful even if we decide to add memory_type for
memory tier customization.

> >> The nodes which are part of a specific memory type can be listed via
> >> /sys/devices/system/memtier/memtypeN/nodes.
> >
> > How about create links to /sys/devices/system/node/nodeN in
> > "memory_type". But I'm OK to have "nodes" file too.
> >
> >> The adistance value of a specific memory type can be listed via
> >> /sys/devices/system/memtier/memtypeN/adistance.
> >>
> >> A directory listing looks like:
> >> :/sys/devices/virtual/memtier# tree memtype1
> >> memtype1
> >> ├── adistance
> >
> > Why not just use "abstract_distance"? This is user space interface,
> > it's better to be intuitive.
> >
> >> ├── nodes
> >> ├── subsystem -> ../../../../bus/memtier
> >> └── uevent
> >>
> >> Since we will be using struct device to expose details via sysfs, drop struct
> >> kref and use struct device for refcounting the memtype.
> >>
> >
> > Best Regards,
> > Huang, Ying
>
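
To make the per-tier "nodelist" suggestion above more concrete, here is a rough userspace sketch of how such a file might be consumed. Nothing in it is settled by this thread: the /sys/devices/virtual/memory_tiering root, the memory_tierN directory names, and the helper names parse_nodelist() and read_tiers() are all assumptions made for illustration, and the file contents are assumed to use the usual kernel list format (for example "0-1,4").

```python
from pathlib import Path

# Assumed location only: the thread proposes the name "memory_tiering"
# and a per-tier "nodelist" file, but no final path has been agreed.
TIER_ROOT = Path("/sys/devices/virtual/memory_tiering")


def parse_nodelist(text):
    """Expand a kernel-style list such as '0-1,4' into [0, 1, 4]."""
    nodes = []
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # an empty file means the tier has no nodes
        lo, sep, hi = chunk.partition("-")
        # A chunk without '-' is a single node, so the range is lo..lo.
        nodes.extend(range(int(lo), int(hi if sep else lo) + 1))
    return nodes


def read_tiers(root=TIER_ROOT):
    """Map each (hypothetical) memory_tierN directory to its NUMA nodes."""
    tiers = {}
    for entry in sorted(root.glob("memory_tier*")):
        nodelist = entry / "nodelist"
        if nodelist.is_file():
            tiers[entry.name] = parse_nodelist(nodelist.read_text())
    return tiers


if __name__ == "__main__":
    for name, nodes in read_tiers().items():
        print(name, nodes)
```

A "toptiers_nodelist" file, if added, could be read with the same parse_nodelist() helper, which is part of why one flat node list per tier looks sufficient for userspace without exposing memory_type.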