Message-ID: <2f716f45-f8c6-a078-6cfc-b4fb5ef74cd5@linux.ibm.com>
Date: Mon, 25 Apr 2022 14:24:15 +0530
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0)
 Gecko/20100101 Thunderbird/91.8.0
Subject: Re: [PATCH v2 0/5] mm: demotion: Introduce new node state N_DEMOTION_TARGETS
From: Aneesh Kumar K V
To: "ying.huang@intel.com", Jagdish Gediya, Wei Xu, Yang Shi, Dave Hansen,
 Dan Williams, Davidlohr Bueso
Cc: Linux MM, Linux Kernel Mailing List, Andrew Morton, Baolin Wang,
 Greg Thelen, MichalHocko, Brice Goglin
References: <610ccaad03f168440ce765ae5570634f3b77555e.camel@intel.com>
 <8e31c744a7712bb05dbf7ceb2accf1a35e60306a.camel@intel.com>
 <78b5f4cfd86efda14c61d515e4db9424e811c5be.camel@intel.com>
 <200e95cf36c1642512d99431014db8943fed715d.camel@intel.com>
 <8735i1zurt.fsf@linux.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 4/25/22 1:39 PM, Aneesh Kumar K V wrote:
> On 4/25/22 11:40 AM, ying.huang@intel.com wrote:
>> On Mon, 2022-04-25 at 09:20 +0530, Aneesh Kumar K.V wrote:
>>> "ying.huang@intel.com" writes:
>>>
>>>> Hi, All,
>>>>
>>>> On Fri, 2022-04-22 at 16:30 +0530, Jagdish Gediya wrote:
>>>>
>>>> [snip]
>>>>
>>>>> I think it is necessary to either have per node demotion targets
>>>>> configuration or the user space interface supported by this patch
>>>>> series. As we don't have clear consensus on what the user interface
>>>>> should look like, we can defer the per node demotion target set
>>>>> interface to the future, until the real need arises.
>>>>>
>>>>> Current patch series sets N_DEMOTION_TARGETS from the dax device kmem
>>>>> driver; it may be possible that some memory node desired as demotion
>>>>> target is not detected in the system from dax-device kmem probe path.
>>>>>
>>>>> It is also possible that some of the dax-devices are not preferred as
>>>>> demotion target, e.g. HBM; for such devices, node shouldn't be set to
>>>>> N_DEMOTION_TARGETS. In future, support should be added to distinguish
>>>>> such dax-devices and not mark them as N_DEMOTION_TARGETS from the
>>>>> kernel, but for now this user space interface will be useful to avoid
>>>>> such devices as demotion targets.
>>>>>
>>>>> We can add read only interface to view per node demotion targets
>>>>> from /sys/devices/system/node/nodeX/demotion_targets, remove
>>>>> duplicated /sys/kernel/mm/numa/demotion_target interface and instead
>>>>> make /sys/devices/system/node/demotion_targets writable.
>>>>>
>>>>> Huang, Wei, Yang,
>>>>> What do you suggest?
>>>>
>>>> We cannot remove a kernel ABI in practice.  So we need to make it right
>>>> at the first time.  Let's try to collect some information for the
>>>> kernel ABI definition.
>>>>
>>>> The below is just a starting point, please add your requirements.
>>>>
>>>> 1. Jagdish has some machines with DRAM only NUMA nodes, but they don't
>>>> want to use that as the demotion targets.  But I don't think this is an
>>>> issue in practice for now, because demote-in-reclaim is disabled by
>>>> default.
>>>
>>> It is not just that the demotion can be disabled. We should be able to
>>> use demotion on a system where we can find DRAM only NUMA nodes. That
>>> cannot be achieved by /sys/kernel/mm/numa/demotion_enabled. It needs
>>> something similar to N_DEMOTION_TARGETS
>>>
>>
>> Can you show NUMA information of your machines with DRAM-only nodes and
>> PMEM nodes?  We can try to find the proper demotion order for the
>> system.  If you can not show it, we can defer N_DEMOTION_TARGETS until
>> the machine is available.
>
>
> Sure will find one such config. As you might have noticed this is very
> easy to have in a virtualization setup because the hypervisor can assign
> memory to a guest VM from a numa node that doesn't have CPU assigned to
> the same guest. This depends on the other guest VM instance config
> running on the system. So on any virtualization config that has got
> persistent memory attached, this can become an easy config to end up with.
>
>

something like this

$ numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 14272 MB
node 0 free: 13392 MB
node 1 cpus:
node 1 size: 2028 MB
node 1 free: 1971 MB
node distances:
node   0   1
  0:  10  40
  1:  40  10
$ cat /sys/bus/nd/devices/dax0.0/target_node
2
$
# cd /sys/bus/dax/drivers/
:/sys/bus/dax/drivers# ls
device_dax  kmem
:/sys/bus/dax/drivers# cd device_dax/
:/sys/bus/dax/drivers/device_dax# echo dax0.0 > unbind
:/sys/bus/dax/drivers/device_dax# echo dax0.0 > ../kmem/new_id
:/sys/bus/dax/drivers/device_dax# numactl -H
available: 3 nodes (0-2)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 14272 MB
node 0 free: 13380 MB
node 1 cpus:
node 1 size: 2028 MB
node 1 free: 1961 MB
node 2 cpus:
node 2 size: 0 MB
node 2 free: 0 MB
node distances:
node   0   1   2
  0:  10  40  80
  1:  40  10  80
  2:  80  80  10
:/sys/bus/dax/drivers/device_dax#
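
For illustration only (this is not part of the patch series), here is a
minimal Python sketch of how the CPU-less nodes in the second `numactl -H`
listing above could be picked out from user space. The function names
parse_numactl() and cpuless_nodes() are hypothetical, and the parser assumes
the output layout shown in this mail. It only reports which nodes have no
CPUs; it cannot tell a DRAM-only node from a PMEM node, which is the
distinction N_DEMOTION_TARGETS is meant to carry.

```python
import re

def parse_numactl(text):
    """Parse `numactl -H` output into {node_id: {"cpus": [...], "size_mb": int}}.

    Hypothetical helper; assumes the line layout shown in this mail.
    """
    nodes = {}
    for line in text.splitlines():
        m = re.match(r"\s*node (\d+) cpus:(.*)$", line)
        if m:
            # An empty CPU list (e.g. "node 1 cpus:") yields [].
            nodes.setdefault(int(m.group(1)), {})["cpus"] = [
                int(c) for c in m.group(2).split()
            ]
            continue
        m = re.match(r"\s*node (\d+) size: (\d+) MB", line)
        if m:
            nodes.setdefault(int(m.group(1)), {})["size_mb"] = int(m.group(2))
    return nodes

def cpuless_nodes(nodes):
    """Return the sorted ids of nodes that have no CPUs assigned."""
    return sorted(n for n, info in nodes.items() if not info.get("cpus"))

# Second listing from the mail (after dax0.0 was moved to the kmem driver).
SAMPLE = """\
available: 3 nodes (0-2)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 14272 MB
node 0 free: 13380 MB
node 1 cpus:
node 1 size: 2028 MB
node 1 free: 1961 MB
node 2 cpus:
node 2 size: 0 MB
node 2 free: 0 MB
node distances:
node   0   1   2
  0:  10  40  80
  1:  40  10  80
  2:  80  80  10
"""

if __name__ == "__main__":
    print(cpuless_nodes(parse_numactl(SAMPLE)))  # [1, 2]
```

Both node 1 (DRAM, 2028 MB) and node 2 (the dax0.0 target node) come out as
CPU-less here, so "has no CPUs" alone does not identify a demotion target.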