From: "Huang, Ying"
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Dave Hansen,
    yang.shi@linux.alibaba.com, rientjes@google.com,
    dan.j.williams@intel.com, david@redhat.com, osalvador@suse.de,
    weixugc@google.com, Michal Hocko, Yang Shi, Zi Yan
Subject: Re: [PATCH -V10 0/9] Migrate Pages in lieu of discard
References: <20210715055145.195411-1-ying.huang@intel.com>
    <20210715123836.ad76b0a2e29c0bbd3cd67767@linux-foundation.org>
Date: Fri, 16 Jul 2021 11:32:09 +0800
In-Reply-To: <20210715123836.ad76b0a2e29c0bbd3cd67767@linux-foundation.org>
    (Andrew Morton's message of "Thu, 15 Jul 2021 12:38:36 -0700")
Message-ID: <87k0lrndc6.fsf@yhuang6-desk2.ccr.corp.intel.com>

Andrew Morton writes:

> On Thu, 15 Jul 2021 13:51:36 +0800 Huang Ying wrote:
>
> The [0/n] description talks a lot about PMEM, but the patches
> themselves are all about NUMA nodes.  I assume that what ties this
> together is that the PMEM tends to be organized as a NUMA node on its
> own, and that by enabling migrate-to-remote-node-during-reclaim, we
> get this PMEM behaviour as a desired side-effect?
>
> IOW, perhaps this [0/n] description could explain the linkage between
> PMEM and NUMA nodes more explicitly.

Hi, Andrew,

I have added some words to the [0/9] description to link PMEM and NUMA
nodes.  The updated description is below.  Can you take a look at it?

Best Regards,
Huang, Ying

--------------------------8<-----------------------------------

We're starting to see systems with more and more kinds of memory, such
as Intel's implementation of persistent memory.

Let's say you have a system with some DRAM and some persistent memory.
Today, once DRAM fills up, reclaim will start and some of the DRAM
contents will be thrown out.  Allocations will, at some point, start
falling over to the slower persistent memory.

That has two nasty properties.  First, the newer allocations can end
up in the slower persistent memory.  Second, reclaimed data in DRAM
are just discarded even if there are gobs of space in persistent
memory that could be used.

This patchset implements a solution to these problems.  At the end of
the reclaim process in shrink_page_list(), just before the last page
refcount is dropped, the page is migrated to persistent memory instead
of being dropped.

While I've talked about a DRAM/PMEM pairing, this approach would
function in any environment where memory tiers exist.

This is not perfect.  It "strands" pages in slower memory and never
brings them back to fast DRAM.  Huang Ying has follow-on work which
repurposes autonuma to promote hot pages back to DRAM.

This is also all based on an upstream mechanism that allows persistent
memory to be onlined and used as if it were volatile:

	http://lkml.kernel.org/r/20190124231441.37A4A305@viggo.jf.intel.com

With that, the DRAM and PMEM in each socket will be represented as two
separate NUMA nodes, with the CPUs sitting in the DRAM node.  So the
general inter-NUMA-node demotion mechanism introduced in the patchset
can migrate the cold DRAM pages to the PMEM node.

We have tested the patchset with PostgreSQL and pgbench.  On a
2-socket server machine with DRAM and PMEM, the kernel with this
patchset improves the pgbench score by up to 22.1% compared with the
DRAM-only + disk case.  The improvement comes from the reduced disk
read throughput, which drops by up to 70.8%.

== Open Issues ==

 * Memory policies and cpusets that, for instance, restrict allocations
   to DRAM can still have their pages demoted to PMEM whenever they opt
   in to this new mechanism.  A cgroup-level API to opt in to or out of
   these migrations will likely be required as a follow-on.
 * Could be more aggressive about where anon LRU scanning occurs, since
   it no longer necessarily involves I/O.
   get_scan_count(), for instance, says: "If we have no swap space, do
   not bother scanning anon pages".
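
For reference, the core demote-instead-of-discard step described above
can be sketched roughly as below.  This is only a simplified
illustration of the idea, not the actual patch code: helper names such
as next_demotion_node() and alloc_demote_page() follow the patchset,
but the real demote_page_list() in mm/vmscan.c also handles statistics,
THP, and the failure fallback that are omitted here, and the
migrate_pages() signature differs slightly once the patchset is
applied.

	/*
	 * Simplified sketch: instead of freeing the cold pages that
	 * shrink_page_list() has decided to reclaim, try to migrate
	 * them to this node's demotion target (e.g. the PMEM node
	 * paired with this DRAM node).
	 */
	static struct page *alloc_demote_page(struct page *page,
					      unsigned long node)
	{
		/* Allocate the destination page on the target node. */
		return alloc_pages_node(node,
					GFP_HIGHUSER_MOVABLE |
					__GFP_THISNODE, 0);
	}

	static void demote_page_list(struct list_head *demote_pages,
				     struct pglist_data *pgdat)
	{
		int target_nid = next_demotion_node(pgdat->node_id);

		if (list_empty(demote_pages))
			return;
		if (target_nid == NUMA_NO_NODE)
			return;

		/*
		 * Pages that cannot be migrated are left on the list
		 * and fall back to the normal discard path in the
		 * caller.
		 */
		migrate_pages(demote_pages, alloc_demote_page, NULL,
			      target_nid, MIGRATE_ASYNC, MR_DEMOTION);
	}

With DRAM and PMEM in one socket exposed as two NUMA nodes, the
demotion target of the DRAM node is the PMEM node, which is what lets
the general inter-NUMA-node mechanism cover the PMEM case described
above.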