Message-ID: <1301311f-12f0-0fda-1245-82bb4c3f5e93@linux.ibm.com>
Date: Mon, 6 Jun 2022 14:32:17 +0530
Subject: Re: [RFC PATCH v4 1/7] mm/demotion: Add support for explicit memory tiers
To: Ying Huang
Cc: Greg Thelen, Yang Shi, Davidlohr Bueso, Tim C Chen, Brice Goglin,
 Michal Hocko, Linux Kernel Mailing List, Hesham Almatary, Dave Hansen,
 Jonathan Cameron, Alistair Popple, Dan Williams, Feng Tang,
 Jagdish Gediya, Baolin Wang, David Rientjes, linux-mm@kvack.org,
 akpm@linux-foundation.org
References: <20220527122528.129445-1-aneesh.kumar@linux.ibm.com>
 <20220527122528.129445-2-aneesh.kumar@linux.ibm.com>
 <352ae5f408b6d7d4d3d820d68e2f2c6b494e95e1.camel@intel.com>
 <143e40bcf46097d14514504518fdc1870fd8d4a1.camel@intel.com>
 <87ilpe8fxh.fsf@linux.ibm.com>
From: Aneesh Kumar K V
X-Mailing-List: linux-kernel@vger.kernel.org

On 6/6/22 2:22 PM, Ying Huang wrote:

....

>>>> I can move the patch "mm/demotion/dax/kmem: Set node's memory tier to
>>>> MEMORY_TIER_PMEM" before switching the demotion logic so that on systems
>>>> with two memory tiers (DRAM and pmem) the demotion continues to work
>>>> as expected after patch 3 ("mm/demotion: Build demotion targets based on
>>>> explicit memory tiers"). With that, there will not be any regression in
>>>> between the patch series.
>>>>
>>>
>>> Thanks! Please do that. And I think you can add sysfs interface after
>>> that patch too. That is, in [1/7]
>>>
>>
>> I am not sure why you insist on moving sysfs interfaces later. They are
>> introduced based on the helper added. It make patch review easier to
>> look at both the helpers and the user of the helper together in a patch.
>
> Yes. We should introduce a function and its user in one patch for
> review. But this doesn't mean that we should introduce the user space
> interface as the first step. I think the user space interface should
> output correct information when we expose it.
>

If you look at this patchset, we are not exposing any wrong information.

patch 1 -> adds the ability to register memory tiers and exposes the
details of each registered memory tier. At this point the patchset only
supports the DRAM tier, and hence only one tier is shown.

patch 2 -> adds the per-node memtier attribute. Only DRAM nodes show the
details, because the patchset has not yet introduced a slower memory
tier like PMEM.

patch 4 -> introduces demotion. I will make that patch 5.

patch 5 -> adds dax kmem NUMA nodes as a slower memory tier. This now
becomes patch 4, at which point we will correctly show two memory tiers
in the system.

>>> +struct memory_tier {
>>> +	nodemask_t nodelist;
>>> +};
>>>
>>> And struct device can be added after the kernel has switched the
>>> implementation based on explicit memory tiers.
>>>
>>> +struct memory_tier {
>>> +	struct device dev;
>>> +	nodemask_t nodelist;
>>> +};
>>>
>>
>>
>> Can you elaborate on this? or possibly review the v5 series indicating
>> what change you are suggesting here?
>>
>>
>>> But I don't think it's a good idea to have "struct device" embedded in
>>> "struct memory_tier". We don't have "struct device" embedded in "struct
>>> pgdata_list"...
>>>
>>
>> I avoided creating an array for memory_tier (memory_tier[]) so that we
>> can keep it dynamic. Keeping dev embedded in struct memory_tier simplify
>> the life cycle management of that dynamic list. We free the struct
>> memory_tier allocation via device release function (memtier->dev.release
>> = memory_tier_device_release )
>>
>> Why do you think it is not a good idea?
>
> I think that we shouldn't bind our kernel internal implementation with
> user space interface too much. Yes. We can expose kernel internal
> implementation to user space in a direct way. I suggest you to follow
> the style of "struct pglist_data" and "struct node". If we decouple
> "struct memory_tier" and "struct memory_tier_dev" (or some other name),
> we can refer to "struct memory_tier" without depending on all device
> core. Memory tier should be accessible inside the kernel even without a
> user interface. And memory tier isn't a device in concept.
>

Memory tiers differ from pglist_data and struct node in that we also
allow them to be created from userspace. That is, the lifetime of a
memory tier is driven from userspace, and it is much easier to manage
that lifetime via the sysfs file lifetime mechanism than to invent an
independent and more complex way of doing the same.

> For life cycle management, I think that we can do that without sysfs
> too.
>

Unless there are specific details that you think will be broken by
embedding struct device inside struct memory_tier, I still consider the
embedded implementation much simpler and in accordance with other kernel
design patterns.

-aneesh
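
To make the life cycle argument concrete, here is a minimal userspace
sketch of the embedded-object pattern discussed above. It is not the
patch code: struct fake_device, fake_get_device(), fake_put_device(),
fake_nodemask_t, register_memory_tier() and the tiers_freed counter are
hypothetical stand-ins invented for illustration, and the reference
counting is a toy model of what the driver core does. Only the layout of
struct memory_tier and the name memory_tier_device_release come from the
thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's container_of() helper. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily simplified stand-in for struct device. */
struct fake_device {
	int refcount;
	void (*release)(struct fake_device *dev);
};

/* Hypothetical stand-in for nodemask_t: one bit per NUMA node. */
typedef unsigned long fake_nodemask_t;

struct memory_tier {
	struct fake_device dev;		/* embedded, as in the layout above */
	fake_nodemask_t nodelist;
};

/* Illustration only: counts how many tiers the release hook has freed. */
static int tiers_freed;

/* Runs when the last reference is dropped; frees the enclosing tier. */
static void memory_tier_device_release(struct fake_device *dev)
{
	struct memory_tier *memtier =
		container_of(dev, struct memory_tier, dev);

	free(memtier);
	tiers_freed++;
}

/* Hypothetical helper: allocate a tier holding one initial reference. */
static struct memory_tier *register_memory_tier(fake_nodemask_t nodes)
{
	struct memory_tier *memtier = calloc(1, sizeof(*memtier));

	if (!memtier)
		return NULL;
	memtier->nodelist = nodes;
	memtier->dev.refcount = 1;
	memtier->dev.release = memory_tier_device_release;
	return memtier;
}

/* Toy equivalents of taking and dropping a device reference. */
static void fake_get_device(struct fake_device *dev)
{
	dev->refcount++;
}

static void fake_put_device(struct fake_device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		dev->release(dev);
}
```

In this sketch a caller would obtain a tier from register_memory_tier(),
and the allocation goes away only when the final fake_put_device() runs
the release hook, so no separate teardown path is needed for the dynamic
list. A decoupled design along the lines Ying suggests would instead
keep struct memory_tier free of the device member and manage its
lifetime separately.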