Andrew, these have gone through a couple of review rounds. Can
they have a spin in -mm?
--
I'm working on some more reports that transparent huge pages and
KSM do not play nicely together. Basically, whenever THPs are
present along with KSM, there is a lot of attrition over time,
and we do not see much overall progress keeping THPs around:
http://sr71.net/~dave/ibm/038_System_Anonymous_Pages.png
(That's Karl Rister's graph, thanks Karl!)
However, I realized that we do not currently have a nice way to
find out where individual THPs might be on the system. We
have an overall count, but no way of telling which processes or
VMAs they might be in.
I started to implement this in the /proc/$pid/smaps code, but
quickly realized that the lib/pagewalk.c code unconditionally
splits THPs up. This set reworks that code a bit and, in the
end, gives you a per-map count of the number of huge pages.
It also makes it possible for page walks to _not_ split THPs.
v2 - rework if() block, and remove now redundant split_huge_page()
Right now, if a mm_walk has either ->pte_entry or ->pmd_entry
set, it will unconditionally split any transparent huge pages
it runs into. In practice, that means that anyone doing a
cat /proc/$pid/smaps
will unconditionally break down every huge page in the process
and depend on khugepaged to re-collapse it later. This is
fairly suboptimal.
This patch changes that behavior. It teaches each ->pmd_entry
handler (there are five) that they must break down the THPs
themselves. Also, the _generic_ code will never break down
a THP unless a ->pte_entry handler is actually set.
This means that the ->pmd_entry handlers can now choose to
deal with THPs without breaking them down.
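To make the new contract concrete, here is a minimal sketch of a
->pmd_entry handler under this patch (illustration only, not code
from the series; example_pmd_entry() is a made-up name). A handler
that does not want to deal with huge pmds simply splits them itself
before walking the ptes, which is what the five converted handlers
below do:

  static int example_pmd_entry(pmd_t *pmd, unsigned long addr,
                               unsigned long end, struct mm_walk *walk)
  {
          pte_t *pte;
          spinlock_t *ptl;

          /*
           * The generic walker no longer splits trans_huge pmds for
           * us, so either handle pmd_trans_huge() pmds here or just
           * break them down, restoring the old behavior:
           */
          split_huge_page_pmd(walk->mm, pmd);

          pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
          for (; addr != end; pte++, addr += PAGE_SIZE) {
                  /* per-pte work goes here */
          }
          pte_unmap_unlock(pte - 1, ptl);
          return 0;
  }

A handler that wants to keep THPs intact instead checks
pmd_trans_huge(*pmd) and accounts the whole pmd in one go, as the
smaps patch later in this series does.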
Acked-by: Mel Gorman <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
---
linux-2.6.git-dave/fs/proc/task_mmu.c | 6 ++++++
linux-2.6.git-dave/include/linux/mm.h | 3 +++
linux-2.6.git-dave/mm/memcontrol.c | 5 +++--
linux-2.6.git-dave/mm/pagewalk.c | 24 ++++++++++++++++++++----
4 files changed, 32 insertions(+), 6 deletions(-)
diff -puN fs/proc/task_mmu.c~pagewalk-dont-always-split-thp fs/proc/task_mmu.c
--- linux-2.6.git/fs/proc/task_mmu.c~pagewalk-dont-always-split-thp 2011-02-14 09:59:42.438543522 -0800
+++ linux-2.6.git-dave/fs/proc/task_mmu.c 2011-02-14 09:59:42.458544115 -0800
@@ -343,6 +343,8 @@ static int smaps_pte_range(pmd_t *pmd, u
struct page *page;
int mapcount;
+ split_huge_page_pmd(walk->mm, pmd);
+
pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
for (; addr != end; pte++, addr += PAGE_SIZE) {
ptent = *pte;
@@ -467,6 +469,8 @@ static int clear_refs_pte_range(pmd_t *p
spinlock_t *ptl;
struct page *page;
+ split_huge_page_pmd(walk->mm, pmd);
+
pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
for (; addr != end; pte++, addr += PAGE_SIZE) {
ptent = *pte;
@@ -623,6 +627,8 @@ static int pagemap_pte_range(pmd_t *pmd,
pte_t *pte;
int err = 0;
+ split_huge_page_pmd(walk->mm, pmd);
+
/* find the first VMA at or above 'addr' */
vma = find_vma(walk->mm, addr);
for (; addr != end; addr += PAGE_SIZE) {
diff -puN include/linux/mm.h~pagewalk-dont-always-split-thp include/linux/mm.h
--- linux-2.6.git/include/linux/mm.h~pagewalk-dont-always-split-thp 2011-02-14 09:59:42.442543640 -0800
+++ linux-2.6.git-dave/include/linux/mm.h 2011-02-14 09:59:42.458544115 -0800
@@ -899,6 +899,9 @@ unsigned long unmap_vmas(struct mmu_gath
* @pgd_entry: if set, called for each non-empty PGD (top-level) entry
* @pud_entry: if set, called for each non-empty PUD (2nd-level) entry
* @pmd_entry: if set, called for each non-empty PMD (3rd-level) entry
+ * this handler is required to be able to handle
+ * pmd_trans_huge() pmds. They may simply choose to
+ * split_huge_page() instead of handling it explicitly.
* @pte_entry: if set, called for each non-empty PTE (4th-level) entry
* @pte_hole: if set, called for each hole at all levels
* @hugetlb_entry: if set, called for each hugetlb entry
diff -puN mm/memcontrol.c~pagewalk-dont-always-split-thp mm/memcontrol.c
--- linux-2.6.git/mm/memcontrol.c~pagewalk-dont-always-split-thp 2011-02-14 09:59:42.446543758 -0800
+++ linux-2.6.git-dave/mm/memcontrol.c 2011-02-14 09:59:42.462544233 -0800
@@ -4737,7 +4737,8 @@ static int mem_cgroup_count_precharge_pt
pte_t *pte;
spinlock_t *ptl;
- VM_BUG_ON(pmd_trans_huge(*pmd));
+ split_huge_page_pmd(walk->mm, pmd);
+
pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
for (; addr != end; pte++, addr += PAGE_SIZE)
if (is_target_pte_for_mc(vma, addr, *pte, NULL))
@@ -4899,8 +4900,8 @@ static int mem_cgroup_move_charge_pte_ra
pte_t *pte;
spinlock_t *ptl;
+ split_huge_page_pmd(walk->mm, pmd);
retry:
- VM_BUG_ON(pmd_trans_huge(*pmd));
pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
for (; addr != end; addr += PAGE_SIZE) {
pte_t ptent = *(pte++);
diff -puN mm/pagewalk.c~pagewalk-dont-always-split-thp mm/pagewalk.c
--- linux-2.6.git/mm/pagewalk.c~pagewalk-dont-always-split-thp 2011-02-14 09:59:42.450543877 -0800
+++ linux-2.6.git-dave/mm/pagewalk.c 2011-02-14 09:59:42.466544351 -0800
@@ -33,19 +33,35 @@ static int walk_pmd_range(pud_t *pud, un
pmd = pmd_offset(pud, addr);
do {
+ again:
next = pmd_addr_end(addr, end);
- split_huge_page_pmd(walk->mm, pmd);
- if (pmd_none_or_clear_bad(pmd)) {
+ if (pmd_none(*pmd)) {
if (walk->pte_hole)
err = walk->pte_hole(addr, next, walk);
if (err)
break;
continue;
}
+ /*
+ * This implies that each ->pmd_entry() handler
+ * needs to know about pmd_trans_huge() pmds
+ */
if (walk->pmd_entry)
err = walk->pmd_entry(pmd, addr, next, walk);
- if (!err && walk->pte_entry)
- err = walk_pte_range(pmd, addr, next, walk);
+ if (err)
+ break;
+
+ /*
+ * Check this here so we only break down trans_huge
+ * pages when we _need_ to
+ */
+ if (!walk->pte_entry)
+ continue;
+
+ split_huge_page_pmd(walk->mm, pmd);
+ if (pmd_none_or_clear_bad(pmd))
+ goto again;
+ err = walk_pte_range(pmd, addr, next, walk);
if (err)
break;
} while (pmd++, addr = next, addr != end);
_
v2 - used mm->page_table_lock to fix up locking bug that
Mel pointed out. Also remove Acks since things
got changed significantly.
This adds code to explicitly detect and handle
pmd_trans_huge() pmds. It then passes HPAGE_PMD_SIZE units
into the smaps_pte_entry() function instead of PAGE_SIZE.
This means that using /proc/$pid/smaps now will no longer
cause THPs to be broken down into small pages.
Signed-off-by: Dave Hansen <[email protected]>
---
linux-2.6.git-dave/fs/proc/task_mmu.c | 23 +++++++++++++++++++++--
1 file changed, 21 insertions(+), 2 deletions(-)
diff -puN fs/proc/task_mmu.c~teach-smaps_pte_range-about-thp-pmds fs/proc/task_mmu.c
--- linux-2.6.git/fs/proc/task_mmu.c~teach-smaps_pte_range-about-thp-pmds 2011-02-14 09:59:44.034590716 -0800
+++ linux-2.6.git-dave/fs/proc/task_mmu.c 2011-02-21 15:12:46.144181298 -0800
@@ -1,5 +1,6 @@
#include <linux/mm.h>
#include <linux/hugetlb.h>
+#include <linux/huge_mm.h>
#include <linux/mount.h>
#include <linux/seq_file.h>
#include <linux/highmem.h>
@@ -7,6 +8,7 @@
#include <linux/slab.h>
#include <linux/pagemap.h>
#include <linux/mempolicy.h>
+#include <linux/rmap.h>
#include <linux/swap.h>
#include <linux/swapops.h>
@@ -385,8 +387,25 @@ static int smaps_pte_range(pmd_t *pmd, u
pte_t *pte;
spinlock_t *ptl;
- split_huge_page_pmd(walk->mm, pmd);
-
+ spin_lock(&walk->mm->page_table_lock);
+ if (pmd_trans_huge(*pmd)) {
+ if (pmd_trans_splitting(*pmd)) {
+ spin_unlock(&walk->mm->page_table_lock);
+ wait_split_huge_page(vma->anon_vma, pmd);
+ } else {
+ smaps_pte_entry(*(pte_t *)pmd, addr,
+ HPAGE_PMD_SIZE, walk);
+ spin_unlock(&walk->mm->page_table_lock);
+ return 0;
+ }
+ } else {
+ spin_unlock(&walk->mm->page_table_lock);
+ }
+ /*
+ * The mmap_sem held all the way back in m_start() is what
+ * keeps khugepaged out of here and from collapsing things
+ * in here.
+ */
pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
for (; addr != end; pte++, addr += PAGE_SIZE)
smaps_pte_entry(*pte, addr, PAGE_SIZE, walk);
diff -puN mm/migrate.c~teach-smaps_pte_range-about-thp-pmds mm/migrate.c
diff -puN mm/mincore.c~teach-smaps_pte_range-about-thp-pmds mm/mincore.c
diff -puN include/linux/mm.h~teach-smaps_pte_range-about-thp-pmds include/linux/mm.h
diff -puN mm/mempolicy.c~teach-smaps_pte_range-about-thp-pmds mm/mempolicy.c
_
This patch adds an argument to the new smaps_pte_entry()
function to let it account in units other than PAGE_SIZE.
I changed all of the PAGE_SIZE sites, even though
not all of them can be reached for transparent huge pages,
just so this will continue to work without changes as THPs
are improved.
Acked-by: Mel Gorman <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
---
linux-2.6.git-dave/fs/proc/task_mmu.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff -puN fs/proc/task_mmu.c~give-smaps_pte_range-a-size-arg fs/proc/task_mmu.c
--- linux-2.6.git/fs/proc/task_mmu.c~give-smaps_pte_range-a-size-arg 2011-02-14 09:59:43.530575814 -0800
+++ linux-2.6.git-dave/fs/proc/task_mmu.c 2011-02-14 09:59:43.538576050 -0800
@@ -335,7 +335,7 @@ struct mem_size_stats {
static void smaps_pte_entry(pte_t ptent, unsigned long addr,
- struct mm_walk *walk)
+ unsigned long ptent_size, struct mm_walk *walk)
{
struct mem_size_stats *mss = walk->private;
struct vm_area_struct *vma = mss->vma;
@@ -343,7 +343,7 @@ static void smaps_pte_entry(pte_t ptent,
int mapcount;
if (is_swap_pte(ptent)) {
- mss->swap += PAGE_SIZE;
+ mss->swap += ptent_size;
return;
}
@@ -355,25 +355,25 @@ static void smaps_pte_entry(pte_t ptent,
return;
if (PageAnon(page))
- mss->anonymous += PAGE_SIZE;
+ mss->anonymous += ptent_size;
- mss->resident += PAGE_SIZE;
+ mss->resident += ptent_size;
/* Accumulate the size in pages that have been accessed. */
if (pte_young(ptent) || PageReferenced(page))
- mss->referenced += PAGE_SIZE;
+ mss->referenced += ptent_size;
mapcount = page_mapcount(page);
if (mapcount >= 2) {
if (pte_dirty(ptent) || PageDirty(page))
- mss->shared_dirty += PAGE_SIZE;
+ mss->shared_dirty += ptent_size;
else
- mss->shared_clean += PAGE_SIZE;
- mss->pss += (PAGE_SIZE << PSS_SHIFT) / mapcount;
+ mss->shared_clean += ptent_size;
+ mss->pss += (ptent_size << PSS_SHIFT) / mapcount;
} else {
if (pte_dirty(ptent) || PageDirty(page))
- mss->private_dirty += PAGE_SIZE;
+ mss->private_dirty += ptent_size;
else
- mss->private_clean += PAGE_SIZE;
- mss->pss += (PAGE_SIZE << PSS_SHIFT);
+ mss->private_clean += ptent_size;
+ mss->pss += (ptent_size << PSS_SHIFT);
}
}
@@ -389,7 +389,7 @@ static int smaps_pte_range(pmd_t *pmd, u
pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
for (; addr != end; pte++, addr += PAGE_SIZE)
- smaps_pte_entry(*pte, addr, walk);
+ smaps_pte_entry(*pte, addr, PAGE_SIZE, walk);
pte_unmap_unlock(pte - 1, ptl);
cond_resched();
return 0;
_
Now that the mere act of _looking_ at /proc/$pid/smaps will not
destroy transparent huge pages, tell how much of the VMA is
actually mapped with them.
This way, we can make sure that we're getting THPs where we
expect to see them.
v3 - * changed HPAGE_SIZE to HPAGE_PMD_SIZE, probably more correct
and also has a nice BUG() in case there was a .config mishap
* removed direct reference to ->page_table_lock, and used the
passed-in ptl pointer instead
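For completeness, here is a tiny userspace sketch (illustrative
only, not part of the patch; the sscanf() format just mirrors the
show_smap() output added below) that sums the new AnonHugePages
fields across all of a process's mappings:

  #include <stdio.h>

  int main(int argc, char **argv)
  {
          const char *path = (argc > 1) ? argv[1] : "/proc/self/smaps";
          char line[256];
          unsigned long kb, total = 0;
          FILE *f = fopen(path, "r");

          if (!f) {
                  perror(path);
                  return 1;
          }
          /* Each mapping prints "AnonHugePages:  <n> kB"; sum them. */
          while (fgets(line, sizeof(line), f))
                  if (sscanf(line, "AnonHugePages: %lu kB", &kb) == 1)
                          total += kb;
          fclose(f);
          printf("AnonHugePages total: %lu kB\n", total);
          return 0;
  }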
Acked-by: Mel Gorman <[email protected]>
Acked-by: David Rientjes <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
---
linux-2.6.git-dave/fs/proc/task_mmu.c | 4 ++++
1 file changed, 4 insertions(+)
diff -puN fs/proc/task_mmu.c~teach-smaps-thp fs/proc/task_mmu.c
--- linux-2.6.git/fs/proc/task_mmu.c~teach-smaps-thp 2011-02-21 15:07:55.707591741 -0800
+++ linux-2.6.git-dave/fs/proc/task_mmu.c 2011-02-21 15:07:55.803594580 -0800
@@ -331,6 +331,7 @@ struct mem_size_stats {
unsigned long private_dirty;
unsigned long referenced;
unsigned long anonymous;
+ unsigned long anonymous_thp;
unsigned long swap;
u64 pss;
};
@@ -396,6 +397,7 @@ static int smaps_pte_range(pmd_t *pmd, u
smaps_pte_entry(*(pte_t *)pmd, addr,
HPAGE_PMD_SIZE, walk);
spin_unlock(&walk->mm->page_table_lock);
+ mss->anonymous_thp += HPAGE_PMD_SIZE;
return 0;
}
} else {
@@ -444,6 +446,7 @@ static int show_smap(struct seq_file *m,
"Private_Dirty: %8lu kB\n"
"Referenced: %8lu kB\n"
"Anonymous: %8lu kB\n"
+ "AnonHugePages: %8lu kB\n"
"Swap: %8lu kB\n"
"KernelPageSize: %8lu kB\n"
"MMUPageSize: %8lu kB\n"
@@ -457,6 +460,7 @@ static int show_smap(struct seq_file *m,
mss.private_dirty >> 10,
mss.referenced >> 10,
mss.anonymous >> 10,
+ mss.anonymous_thp >> 10,
mss.swap >> 10,
vma_kernel_pagesize(vma) >> 10,
vma_mmu_pagesize(vma) >> 10,
diff -puN include/linux/huge_mm.h~teach-smaps-thp include/linux/huge_mm.h
diff -puN mm/memory-failure.c~teach-smaps-thp mm/memory-failure.c
diff -puN mm/huge_memory.c~teach-smaps-thp mm/huge_memory.c
_
We will use smaps_pte_entry() in a moment to handle both small
and transparent huge pages. But we must break it out of
smaps_pte_range() first.
Acked-by: Mel Gorman <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
---
linux-2.6.git-dave/fs/proc/task_mmu.c | 85 ++++++++++++++++++----------------
1 file changed, 46 insertions(+), 39 deletions(-)
diff -puN fs/proc/task_mmu.c~break-out-smaps_pte_entry fs/proc/task_mmu.c
--- linux-2.6.git/fs/proc/task_mmu.c~break-out-smaps_pte_entry 2011-02-14 09:59:43.030561028 -0800
+++ linux-2.6.git-dave/fs/proc/task_mmu.c 2011-02-14 09:59:43.038561264 -0800
@@ -333,56 +333,63 @@ struct mem_size_stats {
u64 pss;
};
-static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
- struct mm_walk *walk)
+
+static void smaps_pte_entry(pte_t ptent, unsigned long addr,
+ struct mm_walk *walk)
{
struct mem_size_stats *mss = walk->private;
struct vm_area_struct *vma = mss->vma;
- pte_t *pte, ptent;
- spinlock_t *ptl;
struct page *page;
int mapcount;
- split_huge_page_pmd(walk->mm, pmd);
-
- pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
- for (; addr != end; pte++, addr += PAGE_SIZE) {
- ptent = *pte;
+ if (is_swap_pte(ptent)) {
+ mss->swap += PAGE_SIZE;
+ return;
+ }
- if (is_swap_pte(ptent)) {
- mss->swap += PAGE_SIZE;
- continue;
- }
+ if (!pte_present(ptent))
+ return;
- if (!pte_present(ptent))
- continue;
+ page = vm_normal_page(vma, addr, ptent);
+ if (!page)
+ return;
+
+ if (PageAnon(page))
+ mss->anonymous += PAGE_SIZE;
+
+ mss->resident += PAGE_SIZE;
+ /* Accumulate the size in pages that have been accessed. */
+ if (pte_young(ptent) || PageReferenced(page))
+ mss->referenced += PAGE_SIZE;
+ mapcount = page_mapcount(page);
+ if (mapcount >= 2) {
+ if (pte_dirty(ptent) || PageDirty(page))
+ mss->shared_dirty += PAGE_SIZE;
+ else
+ mss->shared_clean += PAGE_SIZE;
+ mss->pss += (PAGE_SIZE << PSS_SHIFT) / mapcount;
+ } else {
+ if (pte_dirty(ptent) || PageDirty(page))
+ mss->private_dirty += PAGE_SIZE;
+ else
+ mss->private_clean += PAGE_SIZE;
+ mss->pss += (PAGE_SIZE << PSS_SHIFT);
+ }
+}
- page = vm_normal_page(vma, addr, ptent);
- if (!page)
- continue;
+static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
+ struct mm_walk *walk)
+{
+ struct mem_size_stats *mss = walk->private;
+ struct vm_area_struct *vma = mss->vma;
+ pte_t *pte;
+ spinlock_t *ptl;
- if (PageAnon(page))
- mss->anonymous += PAGE_SIZE;
+ split_huge_page_pmd(walk->mm, pmd);
- mss->resident += PAGE_SIZE;
- /* Accumulate the size in pages that have been accessed. */
- if (pte_young(ptent) || PageReferenced(page))
- mss->referenced += PAGE_SIZE;
- mapcount = page_mapcount(page);
- if (mapcount >= 2) {
- if (pte_dirty(ptent) || PageDirty(page))
- mss->shared_dirty += PAGE_SIZE;
- else
- mss->shared_clean += PAGE_SIZE;
- mss->pss += (PAGE_SIZE << PSS_SHIFT) / mapcount;
- } else {
- if (pte_dirty(ptent) || PageDirty(page))
- mss->private_dirty += PAGE_SIZE;
- else
- mss->private_clean += PAGE_SIZE;
- mss->pss += (PAGE_SIZE << PSS_SHIFT);
- }
- }
+ pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+ for (; addr != end; pte++, addr += PAGE_SIZE)
+ smaps_pte_entry(*pte, addr, walk);
pte_unmap_unlock(pte - 1, ptl);
cond_resched();
return 0;
_
On Mon, Feb 21, 2011 at 05:53:43PM -0800, Dave Hansen wrote:
> @@ -385,8 +387,25 @@ static int smaps_pte_range(pmd_t *pmd, u
> pte_t *pte;
> spinlock_t *ptl;
>
> - split_huge_page_pmd(walk->mm, pmd);
> -
> + spin_lock(&walk->mm->page_table_lock);
> + if (pmd_trans_huge(*pmd)) {
> + if (pmd_trans_splitting(*pmd)) {
> + spin_unlock(&walk->mm->page_table_lock);
> + wait_split_huge_page(vma->anon_vma, pmd);
> + } else {
> + smaps_pte_entry(*(pte_t *)pmd, addr,
> + HPAGE_PMD_SIZE, walk);
> + spin_unlock(&walk->mm->page_table_lock);
> + return 0;
> + }
> + } else {
> + spin_unlock(&walk->mm->page_table_lock);
> + }
> + /*
> + * The mmap_sem held all the way back in m_start() is what
> + * keeps khugepaged out of here and from collapsing things
> + * in here.
> + */
This time the locking is right and HPAGE_PMD_SIZE is used instead of
HPAGE_SIZE, thanks! I think all 5 patches can go in -mm and upstream
anytime (not mandatory for 2.6.38 but definitely we want this for
2.6.39).
BTW, Andi in his NUMA THP improvement series added a THP_SPLIT vmstat
per-cpu counter, so that part, removed from his series, is taken
care of by him.
Acked-by: Andrea Arcangeli <[email protected]>
On Mon, 21 Feb 2011, Dave Hansen wrote:
>
> v2 - rework if() block, and remove now redundant split_huge_page()
>
> Right now, if a mm_walk has either ->pte_entry or ->pmd_entry
> set, it will unconditionally split any transparent huge pages
> it runs into. In practice, that means that anyone doing a
>
> cat /proc/$pid/smaps
>
> will unconditionally break down every huge page in the process
> and depend on khugepaged to re-collapse it later. This is
> fairly suboptimal.
>
> This patch changes that behavior. It teaches each ->pmd_entry
> handler (there are five) that they must break down the THPs
> themselves. Also, the _generic_ code will never break down
> a THP unless a ->pte_entry handler is actually set.
>
> This means that the ->pmd_entry handlers can now choose to
> deal with THPs without breaking them down.
>
> Acked-by: Mel Gorman <[email protected]>
> Signed-off-by: Dave Hansen <[email protected]>
Acked-by: David Rientjes <[email protected]>
Thanks for adding the comment about ->pmd_entry() being required to split
the pages in include/linux/mm.h!
On Mon, 21 Feb 2011, Dave Hansen wrote:
>
> We will use smaps_pte_entry() in a moment to handle both small
> and transparent huge pages. But we must break it out of
> smaps_pte_range() first.
>
> Acked-by: Mel Gorman <[email protected]>
> Acked-by: Johannes Weiner <[email protected]>
> Signed-off-by: Dave Hansen <[email protected]>
Acked-by: David Rientjes <[email protected]>
On Mon, 21 Feb 2011, Dave Hansen wrote:
>
> v2 - used mm->page_table_lock to fix up locking bug that
> Mel pointed out. Also remove Acks since things
> got changed significantly.
>
> This adds code to explicitly detect and handle
> pmd_trans_huge() pmds. It then passes HPAGE_PMD_SIZE units
> into the smaps_pte_entry() function instead of PAGE_SIZE.
>
> This means that using /proc/$pid/smaps now will no longer
> cause THPs to be broken down into small pages.
>
> Signed-off-by: Dave Hansen <[email protected]>
Acked-by: David Rientjes <[email protected]>
On Mon, 21 Feb 2011, Dave Hansen wrote:
>
> This patch adds an argument to the new smaps_pte_entry()
> function to let it account in units other than PAGE_SIZE.
> I changed all of the PAGE_SIZE sites, even though
> not all of them can be reached for transparent huge pages,
> just so this will continue to work without changes as THPs
> are improved.
>
> Acked-by: Mel Gorman <[email protected]>
> Acked-by: Johannes Weiner <[email protected]>
> Signed-off-by: Dave Hansen <[email protected]>
Acked-by: David Rientjes <[email protected]>
On Mon, 21 Feb 2011, Dave Hansen wrote:
>
> v2 - rework if() block, and remove now redundant split_huge_page()
>
> Right now, if a mm_walk has either ->pte_entry or ->pmd_entry
> set, it will unconditionally split any transparent huge pages
> it runs into. In practice, that means that anyone doing a
>
> cat /proc/$pid/smaps
>
> will unconditionally break down every huge page in the process
> and depend on khugepaged to re-collapse it later. This is
> fairly suboptimal.
>
> This patch changes that behavior. It teaches each ->pmd_entry
> handler (there are five) that they must break down the THPs
> themselves. Also, the _generic_ code will never break down
> a THP unless a ->pte_entry handler is actually set.
>
> This means that the ->pmd_entry handlers can now choose to
> deal with THPs without breaking them down.
>
> Acked-by: Mel Gorman <[email protected]>
> Signed-off-by: Dave Hansen <[email protected]>
I have been running this set for several hours now and viewing
various smaps files is not causing wild shifts in my AnonHugePages:
counter.
Reviewed-and-tested-by: Eric B Munson <[email protected]>
On Mon, 21 Feb 2011, Dave Hansen wrote:
>
> v2 - used mm->page_table_lock to fix up locking bug that
> Mel pointed out. Also remove Acks since things
> got changed significantly.
>
> This adds code to explicitly detect and handle
> pmd_trans_huge() pmds. It then passes HPAGE_PMD_SIZE units
> into the smaps_pte_entry() function instead of PAGE_SIZE.
>
> This means that using /proc/$pid/smaps now will no longer
> cause THPs to be broken down into small pages.
>
> Signed-off-by: Dave Hansen <[email protected]>
Reviewed-and-tested-by: Eric B Munson <[email protected]>
On Mon, 21 Feb 2011, Dave Hansen wrote:
>
> We will use smaps_pte_entry() in a moment to handle both small
> and transparent huge pages. But we must break it out of
> smaps_pte_range() first.
>
> Acked-by: Mel Gorman <[email protected]>
> Acked-by: Johannes Weiner <[email protected]>
> Signed-off-by: Dave Hansen <[email protected]>
Reviewed-and-tested-by: Eric B Munson <[email protected]>
On Mon, 21 Feb 2011, Dave Hansen wrote:
>
> Now that the mere act of _looking_ at /proc/$pid/smaps will not
> destroy transparent huge pages, tell how much of the VMA is
> actually mapped with them.
>
> This way, we can make sure that we're getting THPs where we
> expect to see them.
>
> v3 - * changed HPAGE_SIZE to HPAGE_PMD_SIZE, probably more correct
> and also has a nice BUG() in case there was a .config mishap
> * removed direct reference to ->page_table_lock, and used the
> passed-in ptl pointer instead
>
> Acked-by: Mel Gorman <[email protected]>
> Acked-by: David Rientjes <[email protected]>
> Signed-off-by: Dave Hansen <[email protected]>
Reviewed-and-tested-by: Eric B Munson <[email protected]>
On Mon, 21 Feb 2011, Dave Hansen wrote:
>
> This patch adds an argument to the new smaps_pte_entry()
> function to let it account in units other than PAGE_SIZE.
> I changed all of the PAGE_SIZE sites, even though
> not all of them can be reached for transparent huge pages,
> just so this will continue to work without changes as THPs
> are improved.
>
> Acked-by: Mel Gorman <[email protected]>
> Acked-by: Johannes Weiner <[email protected]>
> Signed-off-by: Dave Hansen <[email protected]>
Reviewed-and-tested-by: Eric B Munson <[email protected]>