Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753411AbbD2E6P (ORCPT ); Wed, 29 Apr 2015 00:58:15 -0400
Received: from mail-pd0-f178.google.com ([209.85.192.178]:34651 "EHLO
	mail-pd0-f178.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753326AbbD2E6L (ORCPT );
	Wed, 29 Apr 2015 00:58:11 -0400
Date: Wed, 29 Apr 2015 13:57:59 +0900
From: Minchan Kim
To: Vladimir Davydov
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Greg Thelen,
	Michel Lespinasse, David Rientjes, Pavel Emelyanov, Cyrill Gorcunov,
	Jonathan Corbet, linux-api@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 3/3] proc: add kpageidle file
Message-ID: <20150429045759.GA27051@blaptop>
References: <4c24a6bf2c9711dd4dbb72a43a16eba6867527b7.1430217477.git.vdavydov@parallels.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4c24a6bf2c9711dd4dbb72a43a16eba6867527b7.1430217477.git.vdavydov@parallels.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 4444
Lines: 89

On Tue, Apr 28, 2015 at 03:24:42PM +0300, Vladimir Davydov wrote:
> Knowing the portion of memory that is not used by a certain application
> or memory cgroup (idle memory) can be useful for partitioning the system
> efficiently, e.g. by setting memory cgroup limits appropriately.
> Currently, the only means to estimate the amount of idle memory provided
> by the kernel is /proc/PID/{clear_refs,smaps}: the user can clear the
> access bit for all pages mapped to a particular process by writing 1 to
> clear_refs, wait for some time, and then count smaps:Referenced.
> However, this method has two serious shortcomings:
>
>  - it does not count unmapped file pages
>  - it affects the reclaimer logic
>
> To overcome these drawbacks, this patch introduces two new page flags,
> Idle and Young, and a new proc file, /proc/kpageidle. A page's Idle flag
> can only be set from userspace by writing 1 to /proc/kpageidle at the
> offset corresponding to the page, and it is cleared whenever the page is
> accessed either through page tables (it is cleared in page_referenced()
> in this case) or using the read(2) system call (mark_page_accessed()).
> Thus by setting the Idle flag for pages of a particular workload, which
> can be found e.g. by reading /proc/PID/pagemap, waiting for some time to
> let the workload access its working set, and then reading the kpageidle
> file, one can estimate the amount of pages that are not used by the
> workload.
>
> The Young page flag is used to avoid interference with the memory
> reclaimer. A page's Young flag is set whenever the Access bit of a page
> table entry pointing to the page is cleared by writing to kpageidle. If
> page_referenced() is called on a Young page, it will add 1 to its return
> value, therefore concealing the fact that the Access bit was cleared.
>
> Note, since there is no room for extra page flags on 32 bit, this
> feature uses extended page flags when compiled on 32 bit.
>
> Signed-off-by: Vladimir Davydov
> ---
>  Documentation/vm/pagemap.txt |  10 ++-
>  fs/proc/page.c               | 154 ++++++++++++++++++++++++++++++++++++++++++
>  fs/proc/task_mmu.c           |   4 +-
>  include/linux/mm.h           |  88 ++++++++++++++++++++++++
>  include/linux/page-flags.h   |   9 +++
>  include/linux/page_ext.h     |   4 ++
>  mm/Kconfig                   |  12 ++++
>  mm/debug.c                   |   4 ++
>  mm/page_ext.c                |   3 +
>  mm/rmap.c                    |   7 ++
>  mm/swap.c                    |   2 +
>  11 files changed, 295 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt
> index a9b7afc8fbc6..ac6fd32a9296 100644
> --- a/Documentation/vm/pagemap.txt
> +++ b/Documentation/vm/pagemap.txt
> @@ -5,7 +5,7 @@ pagemap is a new (as of 2.6.25) set of interfaces in the kernel that allow
>  userspace programs to examine the page tables and related information by
>  reading files in /proc.
>
> -There are four components to pagemap:
> +There are five components to pagemap:
>
>   * /proc/pid/pagemap. This file lets a userspace process find out which
>     physical frame each virtual page is mapped to. It contains one 64-bit
> @@ -69,6 +69,14 @@ There are four components to pagemap:
>     memory cgroup each page is charged to, indexed by PFN. Only available when
>     CONFIG_MEMCG is set.
>
> + * /proc/kpageidle. For each page this file contains a 64-bit number, which
> +   equals 1 if the page is idle or 0 otherwise, indexed by PFN. A page is
> +   considered idle if it has not been accessed since it was marked idle. To
> +   mark a page idle one should write 1 to this file at the offset corresponding
> +   to the page. Only user memory pages can be marked idle, for other page types
> +   input is silently ignored. Writing to this file beyond max PFN results in
> +   the ENXIO error. Only available when CONFIG_IDLE_PAGE_TRACKING is set.
> +

How about using kpageflags for the reading part?

I mean, PG_idle is one of the page flags, and we already have a facility
for reporting the flags of each PFN, so we could reuse that existing
feature for reading idleness.
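
To make the two read paths concrete, here is a rough userspace sketch of
the offset arithmetic. It is an illustration only, not code from the
patch. It relies on the pagemap layout documented in pagemap.txt (bits
0-54 hold the PFN, bit 63 means present) and on the kpageidle format
quoted above (one 64-bit word per PFN). All helper names are made up,
the 8-byte write of the value 1 is an assumption about how "write 1" is
meant, and idle_bit is a hypothetical kpageflags bit number, since the
patch does not export one.

```python
import struct

ENTRY_SIZE = 8                 # pagemap, kpageflags and the proposed kpageidle
                               # all use one 64-bit word per page
PM_PFN_MASK = (1 << 55) - 1    # pagemap bits 0-54: PFN, valid only if present
PM_PRESENT = 1 << 63           # pagemap bit 63: page present in RAM


def pagemap_offset(vaddr, page_size=4096):
    """Byte offset in /proc/PID/pagemap of the entry covering vaddr."""
    return (vaddr // page_size) * ENTRY_SIZE


def pfn_from_pagemap_entry(raw):
    """Extract the PFN from an 8-byte pagemap entry, or None if not present."""
    (entry,) = struct.unpack("=Q", raw)   # native byte order, 64-bit
    if not entry & PM_PRESENT:
        return None
    return entry & PM_PFN_MASK


def read_word(f, pfn):
    """Read the 64-bit word for pfn from a PFN-indexed file object."""
    f.seek(pfn * ENTRY_SIZE)
    (word,) = struct.unpack("=Q", f.read(ENTRY_SIZE))
    return word


def mark_idle(kpageidle, pfn):
    """Assumed write format: the 64-bit value 1 at offset pfn * 8."""
    kpageidle.seek(pfn * ENTRY_SIZE)
    kpageidle.write(struct.pack("=Q", 1))


def is_idle(kpageidle, pfn):
    """Read path as proposed in the patch: the word is 1 if the page is idle."""
    return read_word(kpageidle, pfn) == 1


def is_idle_via_flags(kpageflags, pfn, idle_bit):
    """Read path suggested in this reply; idle_bit is hypothetical."""
    return bool((read_word(kpageflags, pfn) >> idle_bit) & 1)
```

The helpers take any seekable binary file object, so they can be tried
on an in-memory buffer first. On a real system the /proc files would
presumably need to be opened unbuffered and with sufficient privileges.
With the kpageflags variant, kpageidle would only be needed for the
write side.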
--
Kind regards,
Minchan Kim