From: Naoya Horiguchi
To: Mike Kravetz, David Rientjes, Jörn Engel, Andrew Morton
CC: "linux-mm@kvack.org", linux-kernel, Naoya Horiguchi
Subject: [PATCH] smaps: fill missing fields for vma(VM_HUGETLB)
Date: Tue, 4 Aug 2015 05:13:39 +0000
Message-ID: <20150804051339.GA24931@hori1.linux.bs1.fc.nec.co.jp>
In-Reply-To: <20150804025530.GA13210@hori1.linux.bs1.fc.nec.co.jp>
References: <55B6BE37.3010804@oracle.com>
 <20150728183248.GB1406@Sligo.logfs.org>
 <55B7F0F8.8080909@oracle.com>
 <20150728222654.GA28456@Sligo.logfs.org>
 <20150729005332.GB17938@Sligo.logfs.org>
 <55B95FDB.1000801@oracle.com>
 <20150804025530.GA13210@hori1.linux.bs1.fc.nec.co.jp>

On Tue, Aug 04, 2015 at 02:55:30AM +0000, Naoya Horiguchi wrote:
> On Wed, Jul 29, 2015 at 04:20:59PM -0700, Mike Kravetz wrote:
> > On 07/29/2015 12:08 PM, David Rientjes wrote:
> > >On Tue, 28 Jul 2015, Jörn Engel wrote:
> > >
> > >>Well, we definitely need something. Having a 100GB process show 3GB of
> > >>rss is not very useful. How would we notice a memory leak if it only
> > >>affects hugepages, for example?
> > >>
> > >
> > >Since the hugetlb pool is a global resource, it would also be helpful to
> > >determine if a process is mapping more than expected. You can't do that
> > >just by adding a huge rss metric, however: if you have 2MB and 1GB
> > >hugepages configured you wouldn't know if a process was mapping 512 2MB
> > >hugepages or 1 1GB hugepage.
> > >
> > >That's the purpose of hugetlb_cgroup, after all, and it supports usage
> > >counters for all hstates. The test could be converted to use that to
> > >measure usage if configured in the kernel.
> > >
> > >Beyond that, I'm not sure how a per-hstate rss metric would be exported to
> > >userspace in a clean way and other ways of obtaining the same data are
> > >possible with hugetlb_cgroup. I'm not sure how successful you'd be in
> > >arguing that we need separate rss counters for it.
> >
> > If I want to track hugetlb usage on a per-task basis, do I then need to
> > create one cgroup per task?
> >
> > For example, suppose I have many tasks using hugetlb and the global pool
> > is getting low on free pages. It might be useful to know which tasks are
> > using hugetlb pages, and how many they are using.
> >
> > I don't actually have this need (I think), but it appears to be what
> > Jörn is asking for.
>
> One possible way to get hugetlb metric in per-task basis is to walk page
> table via /proc/pid/pagemap, and counting page flags for each mapped page
> (we can easily do this with tools/vm/page-types.c like "page-types -p <pid>
> -b huge"). This is obviously slower than just storing the counter as
> in-kernel data and just exporting it, but might be useful in some situation.
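As a rough userspace sketch of that pagemap-based counting (it only assumes
the bit layout documented in Documentation/vm/pagemap.txt: bit 63 of a
pagemap entry means "present", bits 0-54 hold the PFN, and kpageflags bit 17
is KPF_HUGE; reading /proc/kpageflags needs root), something like the
following counts hugetlb-backed memory for a single pid, much like
"page-types -b huge" does:

/*
 * Minimal sketch: count hugetlb-backed memory of a process by walking
 * /proc/<pid>/maps and looking up each present page's flags in
 * /proc/kpageflags (KPF_HUGE, bit 17).  Needs root for kpageflags.
 */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

#define PM_PRESENT	(1ULL << 63)		/* pagemap: page present */
#define PM_PFN_MASK	((1ULL << 55) - 1)	/* pagemap: PFN in bits 0-54 */
#define KPF_HUGE	17			/* kpageflags: hugetlb page */

int main(int argc, char **argv)
{
	char path[64], line[256];
	unsigned long start, end, addr;
	long pagesize = sysconf(_SC_PAGESIZE);
	uint64_t pme, kpf, huge_kb = 0;
	FILE *maps;
	int pagemap, kpageflags;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <pid>\n", argv[0]);
		return 1;
	}
	snprintf(path, sizeof(path), "/proc/%s/maps", argv[1]);
	maps = fopen(path, "r");
	snprintf(path, sizeof(path), "/proc/%s/pagemap", argv[1]);
	pagemap = open(path, O_RDONLY);
	kpageflags = open("/proc/kpageflags", O_RDONLY);
	if (!maps || pagemap < 0 || kpageflags < 0) {
		perror("open");
		return 1;
	}

	while (fgets(line, sizeof(line), maps)) {
		if (sscanf(line, "%lx-%lx", &start, &end) != 2)
			continue;
		/* one pagemap entry per base page, even inside hugetlb vmas */
		for (addr = start; addr < end; addr += pagesize) {
			if (pread(pagemap, &pme, 8, (off_t)(addr / pagesize) * 8) != 8)
				break;
			if (!(pme & PM_PRESENT))
				continue;
			if (pread(kpageflags, &kpf, 8, (off_t)(pme & PM_PFN_MASK) * 8) != 8)
				continue;
			if (kpf & (1ULL << KPF_HUGE))
				huge_kb += pagesize / 1024;
		}
	}
	printf("hugetlb: %llu kB\n", (unsigned long long)huge_kb);
	return 0;
}

Each 2MB hugepage shows up as 512 present base-page entries with KPF_HUGE
set, so the reported total comes out in the same kB units as smaps.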
BTW, currently smaps doesn't report any meaningful info for vma(VM_HUGETLB).
I wrote the following patch, which hopefully is helpful for your purpose.

Thanks,
Naoya Horiguchi
---
From: Naoya Horiguchi
Subject: [PATCH] smaps: fill missing fields for vma(VM_HUGETLB)

Currently smaps reports many zero fields for vma(VM_HUGETLB), which is
inconvenient when we want to know hugetlb usage on a per-task or per-vma
basis. This patch enables these fields by introducing smaps_hugetlb_range().

before patch:

Size:              20480 kB
Rss:                   0 kB
Pss:                   0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:     2048 kB
MMUPageSize:        2048 kB
Locked:                0 kB
VmFlags: rd wr mr mw me de ht

after patch:

Size:              20480 kB
Rss:               18432 kB
Pss:               18432 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:     18432 kB
Referenced:        18432 kB
Anonymous:         18432 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:     2048 kB
MMUPageSize:        2048 kB
Locked:                0 kB
VmFlags: rd wr mr mw me de ht

Signed-off-by: Naoya Horiguchi
---
 fs/proc/task_mmu.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index ca1e091881d4..c7218603306d 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -610,12 +610,39 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 	seq_putc(m, '\n');
 }
 
+#ifdef CONFIG_HUGETLB_PAGE
+static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
+				 unsigned long addr, unsigned long end,
+				 struct mm_walk *walk)
+{
+	struct mem_size_stats *mss = walk->private;
+	struct vm_area_struct *vma = walk->vma;
+	struct page *page = NULL;
+
+	if (pte_present(*pte)) {
+		page = vm_normal_page(vma, addr, *pte);
+	} else if (is_swap_pte(*pte)) {
+		swp_entry_t swpent = pte_to_swp_entry(*pte);
+
+		if (is_migration_entry(swpent))
+			page = migration_entry_to_page(swpent);
+	}
+	if (page)
+		smaps_account(mss, page, huge_page_size(hstate_vma(vma)),
+			      pte_young(*pte), pte_dirty(*pte));
+	return 0;
+}
+#endif /* HUGETLB_PAGE */
+
 static int show_smap(struct seq_file *m, void *v, int is_pid)
 {
 	struct vm_area_struct *vma = v;
 	struct mem_size_stats mss;
 	struct mm_walk smaps_walk = {
 		.pmd_entry = smaps_pte_range,
+#ifdef CONFIG_HUGETLB_PAGE
+		.hugetlb_entry = smaps_hugetlb_range,
+#endif
 		.mm = vma->vm_mm,
 		.private = &mss,
 	};
-- 
2.4.3