Date: Mon, 25 Apr 2022 13:17:28 -0700
From: Davidlohr Bueso
To: Aneesh Kumar K V
Cc: "ying.huang@intel.com", Jagdish Gediya, Wei Xu, Yang Shi, Dave Hansen,
 Dan Williams, Linux MM, Linux Kernel Mailing List, Andrew Morton,
 Baolin Wang, Greg Thelen, Michal Hocko, Brice Goglin
Subject: Re: [PATCH v2 0/5] mm: demotion: Introduce new node state N_DEMOTION_TARGETS
Message-ID: <20220425201728.5kzm4seu7rep7ndr@offworld>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 25 Apr 2022, Aneesh Kumar K V wrote:

>On 4/25/22 11:40 AM, ying.huang@intel.com
>wrote:
>>On Mon, 2022-04-25 at 09:20 +0530, Aneesh Kumar K.V wrote:
>>>"ying.huang@intel.com" writes:
>>>
>>>>Hi, All,
>>>>
>>>>On Fri, 2022-04-22 at 16:30 +0530, Jagdish Gediya wrote:
>>>>
>>>>[snip]
>>>>
>>>>>I think it is necessary to either have per node demotion targets
>>>>>configuration or the user space interface supported by this patch
>>>>>series. As we don't have clear consensus on what the user interface
>>>>>should look like, we can defer the per node demotion target set
>>>>>interface to the future until the real need arises.
>>>>>
>>>>>Current patch series sets N_DEMOTION_TARGETS from dax device kmem
>>>>>driver, it may be possible that some memory node desired as demotion
>>>>>target is not detected in the system from dax-device kmem probe path.
>>>>>
>>>>>It is also possible that some of the dax-devices are not preferred as
>>>>>demotion target e.g. HBM, for such devices, node shouldn't be set to
>>>>>N_DEMOTION_TARGETS. In future, support should be added to distinguish
>>>>>such dax-devices and not mark them as N_DEMOTION_TARGETS from the
>>>>>kernel, but for now this user space interface will be useful to avoid
>>>>>such devices as demotion targets.
>>>>>
>>>>>We can add read only interface to view per node demotion targets
>>>>>from /sys/devices/system/node/nodeX/demotion_targets, remove
>>>>>duplicated /sys/kernel/mm/numa/demotion_target interface and instead
>>>>>make /sys/devices/system/node/demotion_targets writable.
>>>>>
>>>>>Huang, Wei, Yang,
>>>>>What do you suggest?
>>>>
>>>>We cannot remove a kernel ABI in practice. So we need to make it right
>>>>at the first time. Let's try to collect some information for the kernel
>>>>ABI definition.
>>>>
>>>>The below is just a starting point, please add your requirements.
>>>>
>>>>1. Jagdish has some machines with DRAM only NUMA nodes, but they don't
>>>>want to use that as the demotion targets.
>>>>But I don't think this is an issue in practice for now, because
>>>>demote-in-reclaim is disabled by default.
>>>
>>>It is not just that the demotion can be disabled. We should be able to
>>>use demotion on a system where we can find DRAM only NUMA nodes. That
>>>cannot be achieved by /sys/kernel/mm/numa/demotion_enabled. It needs
>>>something similar to N_DEMOTION_TARGETS
>>>
>>
>>Can you show NUMA information of your machines with DRAM-only nodes and
>>PMEM nodes? We can try to find the proper demotion order for the
>>system. If you can not show it, we can defer N_DEMOTION_TARGETS until
>>the machine is available.
>
>
>Sure will find one such config. As you might have noticed this is very
>easy to have in a virtualization setup because the hypervisor can
>assign memory to a guest VM from a numa node that doesn't have CPU
>assigned to the same guest. This depends on the other guest VM
>instance config running on the system. So on any virtualization config
>that has got persistent memory attached, this can become an easy
>config to end up with.

And as hw becomes available things like CXL will also start to show
"interesting" setups. You have a mix of volatile and/or pmem nodes
with different access costs, so: CPU+DRAM, DRAM (?), volatile CXL mem,
CXL pmem, non-CXL pmem.

imo, by default, slower mem should be demotion candidates regardless of
type or socket layout (which can be a last consideration such that this
is somewhat mitigated). And afaict this is along the lines of what
Jagdish's first example refers to in patch 1/5.

>
>>>>2. For machines with PMEM installed in only 1 of 2 sockets, for example,
>>>>
>>>>Node 0 & 2 are cpu + dram nodes and node 1 is a slow
>>>>memory node near node 0,
>>>>
>>>>available: 3 nodes (0-2)
>>>>node 0 cpus: 0 1
>>>>node 0 size: n MB
>>>>node 0 free: n MB
>>>>node 1 cpus:
>>>>node 1 size: n MB
>>>>node 1 free: n MB
>>>>node 2 cpus: 2 3
>>>>node 2 size: n MB
>>>>node 2 free: n MB
>>>>node distances:
>>>>node   0   1   2
>>>>  0:  10  40  20
>>>>  1:  40  10  80
>>>>  2:  20  80  10
>>>>
>>>>We have 2 choices,
>>>>
>>>>a)
>>>>node    demotion targets
>>>>0       1
>>>>2       1
>>>
>>>This is achieved by
>>>
>>>[PATCH v2 1/5] mm: demotion: Set demotion list differently

Yes, I think it makes sense to do 2a.

Thanks,
Davidlohr
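
Illustration only (not part of the patch series, and not the kernel's
algorithm): a minimal sketch of how choice 2a falls out of the distance
table quoted above. It assumes, as a simplification that nothing in the
thread states, that a node without CPUs is the slower tier and that each
CPU node demotes to its nearest such node. The name
pick_demotion_targets and the data layout are hypothetical.

```python
# Illustrative sketch only; this is NOT the kernel's demotion algorithm.
# Node layout and distances are copied from the example quoted above.
# Assumption: "CPU-less node" stands in for "slower memory".

NODE_CPUS = {0: [0, 1], 1: [], 2: [2, 3]}
DISTANCE = [
    [10, 40, 20],
    [40, 10, 80],
    [20, 80, 10],
]


def pick_demotion_targets(node_cpus, distance):
    """Map each node that has CPUs to its nearest CPU-less node(s)."""
    slow_nodes = sorted(n for n, cpus in node_cpus.items() if not cpus)
    targets = {}
    for node, cpus in sorted(node_cpus.items()):
        if not cpus or not slow_nodes:
            continue  # CPU-less nodes are treated as the last tier here
        best = min(distance[node][s] for s in slow_nodes)
        # Keep every slow node at the minimum distance (ties are possible).
        targets[node] = [s for s in slow_nodes if distance[node][s] == best]
    return targets


if __name__ == "__main__":
    for node, tgt in pick_demotion_targets(NODE_CPUS, DISTANCE).items():
        print(node, *tgt)
```

With the table above this maps node 0 to node 1 and node 2 to node 1,
i.e. the 2a layout. Node 2 still picks node 1 despite the distance of
80, because node 1 is the only CPU-less node in this example.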