From: "Huang, Ying"
To: Aneesh Kumar K V
Cc: Wei Xu, Johannes Weiner, Linux MM, Andrew Morton, Yang Shi, Davidlohr Bueso, Tim C Chen, Michal Hocko, Linux Kernel Mailing List, Hesham Almatary, Dave Hansen, Jonathan Cameron, Alistair Popple, Dan Williams, jvgediya.oss@gmail.com, Bharata B Rao, Greg Thelen
Subject: Re: [PATCH v3 updated] mm/demotion: Expose memory tier details via sysfs
References: <20220830081736.119281-1-aneesh.kumar@linux.ibm.com> <87tu5rzigc.fsf@yhuang6-desk2.ccr.corp.intel.com> <87pmgezkhp.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Fri, 02 Sep 2022 13:40:50 +0800
In-Reply-To: (Aneesh Kumar K. V.'s message of "Fri, 2 Sep 2022 10:53:40 +0530")
Message-ID: <87fshaz63h.fsf@yhuang6-desk2.ccr.corp.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Aneesh Kumar K V writes:

> On 9/2/22 10:39 AM, Wei Xu wrote:
>> On Thu, Sep 1, 2022 at 5:33 PM Huang, Ying wrote:
>>>
>>> Aneesh Kumar K V writes:
>>>
>>>> On 9/1/22 12:31 PM, Huang, Ying wrote:
>>>>> "Aneesh Kumar K.V" writes:
>>>>>
>>>>>> This patch adds /sys/devices/virtual/memory_tiering/ where all memory tier
>>>>>> related details can be found. All allocated memory tiers will be listed
>>>>>> there as /sys/devices/virtual/memory_tiering/memory_tierN/
>>>>>>
>>>>>> The nodes which are part of a specific memory tier can be listed via
>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN/nodes
>>>>>
>>>>> I think "memory_tier" is a better subsystem/bus name than
>>>>> memory_tiering, because we have a set of memory_tierN devices inside.
>>>>> "memory_tier" sounds more natural. I know this is subjective, just my
>>>>> preference.
>>>>>
>
>
> I missed replying to this earlier. I will keep memory_tiering as the subsystem name in v4
> because we want it to be a subsystem where all memory tiering related details can be found,
> including memory type in the future. This is as per the discussion
>
> https://lore.kernel.org/linux-mm/CAAPL-u9TKbHGztAF=r-io3gkX7gorUunS2UfstudCWuihrA=0g@mail.gmail.com

I don't think that it's a good idea to mix 2 types of devices in one
subsystem (bus).
If my understanding is correct, that breaks the driver core convention.

>>>>>>
>>>>>> A directory hierarchy looks like
>>>>>> :/sys/devices/virtual/memory_tiering$ tree memory_tier4/
>>>>>> memory_tier4/
>>>>>> ├── nodes
>>>>>> ├── subsystem -> ../../../../bus/memory_tiering
>>>>>> └── uevent
>>>>>>
>>>>>> All toptier nodes are listed via
>>>>>> /sys/devices/virtual/memory_tiering/toptier_nodes
>>>>>>
>>>>>> :/sys/devices/virtual/memory_tiering$ cat toptier_nodes
>>>>>> 0,2
>>>>>> :/sys/devices/virtual/memory_tiering$ cat memory_tier4/nodes
>>>>>> 0,2
>>>>>
>>>>> I don't think that it is a good idea to show toptier information in a user
>>>>> space interface, because it is just an in-kernel implementation
>>>>> detail. Now, we only promote pages from !toptier to toptier. But
>>>>> there may be multiple memory tiers in toptier and !toptier, and we may
>>>>> change the implementation in the future. For example, we may promote
>>>>> pages from DRAM to HBM in the future.
>>>>>
>>>>
>>>>
>>>> In the case you describe above and others, we will always have a list of
>>>> NUMA nodes from which memory promotion is not done.
>>>> /sys/devices/virtual/memory_tiering/toptier_nodes shows that list.
>>>
>>> I don't think we will need that interface if we don't restrict promotion
>>> in the future. For example, the user can just check the memory tier with
>>> the smallest number.
>>>
>>> TBH, I don't know why we need that interface. What is it for? We
>>> don't want to expose unnecessary information that restricts our in-kernel
>>> implementation in the future.
>>>
>>> So, please remove that interface, at least until we have discussed it
>>> thoroughly.
>>
>> I have asked for this interface to allow userspace to query a list
>> of top-tier nodes as the targets of userspace-driven promotions.
>> The idea is that demotion can gradually go down tier by tier, but we
>> promote hot pages directly to the top tier and bypass the intermediate
>> tiers.
>>
>> Certainly, this can be viewed as a policy choice. Given that we now
>> have a clearly defined memory tier hierarchy in sysfs, and the
>> toptier_nodes content can be constructed from this memory tier
>> hierarchy and other information from the node sysfs interfaces, I am
>> fine if we want to remove toptier_nodes and keep the current memory
>> tier sysfs interfaces to a minimum.
>>
>
>
> Ok, I can do a v4 with toptier_nodes dropped.

Thanks!

Best Regards,
Huang, Ying
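Wei's point that the toptier_nodes content can be reconstructed from the memory tier hierarchy can be illustrated with a minimal userspace sketch. It assumes the v3 layout described in this thread (/sys/devices/virtual/memory_tiering/memory_tierN/nodes), the kernel's usual node-list format ("0,2" or "0-3,5"), and, following the suggestion that userspace check the memory tier with the smallest number, that the lowest-numbered tier is the top tier. The helper names are hypothetical, and the sysfs layout may change in later versions of the patch.

```python
import os
import re

# Assumed v3 layout from this thread; later versions may rename or drop entries.
TIERING_ROOT = "/sys/devices/virtual/memory_tiering"


def parse_nodelist(text):
    """Parse a kernel node list such as "0,2" or "0-3,5" into a sorted list."""
    nodes = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means the tier currently has no nodes
        if "-" in part:
            lo, hi = part.split("-")
            nodes.update(range(int(lo), int(hi) + 1))
        else:
            nodes.add(int(part))
    return sorted(nodes)


def tier_nodes(root=TIERING_ROOT):
    """Map tier id -> node list for every memory_tierN directory under root."""
    tiers = {}
    for name in os.listdir(root):
        m = re.fullmatch(r"memory_tier(\d+)", name)
        if m is None:
            continue  # skip non-tier entries (hypothetically, toptier_nodes)
        with open(os.path.join(root, name, "nodes")) as f:
            tiers[int(m.group(1))] = parse_nodelist(f.read())
    return tiers


def top_tier_nodes(tiers):
    """Assumption: the tier with the smallest id is the top tier."""
    if not tiers:
        return []
    return tiers[min(tiers)]
```

On the example system above, if memory_tier4 is the lowest-numbered tier present, `top_tier_nodes(tier_nodes())` would yield `[0, 2]`, the same list the proposed toptier_nodes file printed. Deriving the list in userspace keeps the promotion policy (which tier counts as "top") out of the kernel ABI, which is the concern raised in the thread.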