This patch improves THP collapse rates by allowing khugepaged to
collapse ranges that contain mapped zero pages.

Currently khugepaged can collapse 4kB pages into a THP when up to
khugepaged_max_ptes_none ptes in a 2MB range are pte_none. This
patch counts pte_none ptes and ptes that map the zero page with the
same variable, against the same limit.

The patch was tested with a program that allocates 800MB of
memory and performs interleaved reads and writes, in a pattern
that causes some 2MB areas to first see read accesses, resulting
in the zero pfn being mapped there.

To simulate memory fragmentation at allocation time, I modified
do_huge_pmd_anonymous_page to return VM_FAULT_FALLBACK for read
faults.

Without the patch, only 50% of the program's memory was collapsed
into THPs, and the percentage did not increase over time.

With this patch, after 10 minutes of waiting, khugepaged had
collapsed 89% of the program's memory.

Signed-off-by: Ebru Akagunduz <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
---
mm/huge_memory.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e08e37a..83ef846 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2150,13 +2150,13 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 {
 	struct page *page;
 	pte_t *_pte;
-	int none = 0;
+	int none_or_zero = 0;
 	bool referenced = false, writable = false;
 	for (_pte = pte; _pte < pte+HPAGE_PMD_NR;
 	     _pte++, address += PAGE_SIZE) {
 		pte_t pteval = *_pte;
-		if (pte_none(pteval)) {
-			if (++none <= khugepaged_max_ptes_none)
+		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
+			if (++none_or_zero <= khugepaged_max_ptes_none)
 				continue;
 			else
 				goto out;
@@ -2237,7 +2237,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
 		pte_t pteval = *_pte;
 		struct page *src_page;
 
-		if (pte_none(pteval)) {
+		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
 			clear_user_highpage(page, address);
 			add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1);
 		} else {
@@ -2573,7 +2573,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 {
 	pmd_t *pmd;
 	pte_t *pte, *_pte;
-	int ret = 0, none = 0;
+	int ret = 0, none_or_zero = 0;
 	struct page *page;
 	unsigned long _address;
 	spinlock_t *ptl;
@@ -2591,8 +2591,8 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 	for (_address = address, _pte = pte; _pte < pte+HPAGE_PMD_NR;
 	     _pte++, _address += PAGE_SIZE) {
 		pte_t pteval = *_pte;
-		if (pte_none(pteval)) {
-			if (++none <= khugepaged_max_ptes_none)
+		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
+			if (++none_or_zero <= khugepaged_max_ptes_none)
 				continue;
 			else
 				goto out_unmap;
--
1.9.1
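
The test program itself is not part of the posting. The access
pattern described in the commit message might be produced by something
like the sketch below. It is only a guess at the shape of such a
program: alloc_anon, touch_interleaved, PAGE_SZ and HPAGE_SZ are
hypothetical names, and the choice of "every other 2MB area is read
first, then half of the pages in each area are written" is an
assumption, not the author's actual pattern.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define PAGE_SZ  4096UL                 /* assumed base page size */
#define HPAGE_SZ (2UL * 1024 * 1024)    /* assumed huge page size */

/* Map len bytes of private anonymous memory; returns NULL on failure. */
static char *alloc_anon(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/*
 * Touch buf so that every other 2MB area is read before it is written.
 * Returns the number of 2MB areas whose first access was a read.
 */
static size_t touch_interleaved(volatile char *buf, size_t len)
{
	size_t read_first = 0;

	assert(len % HPAGE_SZ == 0);
	for (size_t off = 0; off < len; off += HPAGE_SZ) {
		if ((off / HPAGE_SZ) % 2) {
			/* A read fault on untouched anonymous memory can map the zero page. */
			for (size_t p = 0; p < HPAGE_SZ; p += PAGE_SZ)
				(void)buf[off + p];
			read_first++;
		}
		/* Write every second page so only part of the area gets real pages. */
		for (size_t p = 0; p < HPAGE_SZ; p += 2 * PAGE_SZ)
			buf[off + p] = 1;
	}
	return read_first;
}
```

Calling touch_interleaved(alloc_anon(len), len) with len set to 800MB
and then sleeping would give khugepaged time to scan the mapping. The
zero pfn only ends up in the page tables if the read faults are not
served with a huge page, which is what the VM_FAULT_FALLBACK change
mentioned in the commit message is for.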
On Tue, Feb 10, 2015 at 12:47:37AM +0200, Ebru Akagunduz wrote:
> This patch improves THP collapse rates by allowing khugepaged to
> collapse ranges that contain mapped zero pages.
>
> Currently khugepaged can collapse 4kB pages into a THP when up to
> khugepaged_max_ptes_none ptes in a 2MB range are pte_none. This
> patch counts pte_none ptes and ptes that map the zero page with the
> same variable, against the same limit.
>
> The patch was tested with a program that allocates 800MB of
> memory and performs interleaved reads and writes, in a pattern
> that causes some 2MB areas to first see read accesses, resulting
> in the zero pfn being mapped there.
>
> To simulate memory fragmentation at allocation time, I modified
> do_huge_pmd_anonymous_page to return VM_FAULT_FALLBACK for read
> faults.
>
> Without the patch, only 50% of the program's memory was collapsed
> into THPs, and the percentage did not increase over time.
>
> With this patch, after 10 minutes of waiting, khugepaged had
> collapsed 89% of the program's memory.

This is a very good idea. Associating it with the sysctl is sensible
here, as collapsing zeropages would affect the memory footprint in the
same way as none ptes.
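
The shared budget can be modelled in userspace as in the sketch below.
It is a simplified illustration, not kernel code: enum pte_kind and
range_collapsible are hypothetical names, and the count_zero == false
branch (a zero-page pte ends the scan) is an assumption about the
unpatched behaviour rather than something stated in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HPAGE_PMD_NR 512 /* 4kB ptes per 2MB range, as on x86-64 */

/* Hypothetical stand-in for the state of one pte; not the kernel's pte_t. */
enum pte_kind { PTE_NONE, PTE_ZERO_PAGE, PTE_NORMAL };

/*
 * Model of the scan loop. With count_zero == true (patched behaviour),
 * zero-page ptes share the max_ptes_none budget with pte_none ptes.
 * With count_zero == false, a zero-page pte ends the scan; that branch
 * is an assumption about the unpatched code.
 */
static bool range_collapsible(const enum pte_kind *ptes, int max_ptes_none,
			      bool count_zero)
{
	int none_or_zero = 0;

	assert(max_ptes_none >= 0 && max_ptes_none < HPAGE_PMD_NR);
	for (size_t i = 0; i < HPAGE_PMD_NR; i++) {
		bool zero = ptes[i] == PTE_ZERO_PAGE;

		if (zero && !count_zero)
			return false;
		if (ptes[i] == PTE_NONE || zero) {
			if (++none_or_zero > max_ptes_none)
				return false;
		}
	}
	return true;
}
```

In this model a range with 100 none ptes and 100 zero-page ptes is
collapsible for any limit of 200 or more, and is rejected once the
limit drops to 199, so lowering the limit bounds the extra memory a
collapse can consume for both kinds of pte alike.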

__collapse_huge_page_copy, however, is likely screwing with the
refcounts of the zero page. Did you have DEBUG_VM=y enabled? If yes,
you should get a warning that the zeropage refcount underflowed, which
would confirm my concern:

static inline int put_page_testzero(struct page *page)
{
	VM_BUG_ON_PAGE(atomic_read(&page->_count) == 0, page);
	return atomic_dec_and_test(&page->_count);
}

Zeropages are normally implemented as pte_special, if the arch supports
pte_special, and have no refcounting. vm_normal_page returns NULL for
them, which lets callers skip the refcounting. But
__collapse_huge_page_copy would call both release_pte_page and
free_page_and_swap_cache after a src_page = pte_page(pteval); and not
a src_page = vm_normal_page(pteval).

So in short I think __collapse_huge_page_copy and release_pte_pages
need an additional case that complements the already existing special
pte_none case, to account for those zeropages. The special zeropage
case can also use clear_user_highpage(page, address) instead of
copy_user_highpage (clearing uses half the CPU cache of copying, so
it's more efficient to use that, as for the pte_none case).
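
The symmetry being asked for can be illustrated with a small userspace
model. This is a sketch under loud assumptions: struct page, struct
pte, put_ref, isolate_ptes and release_ptes are hypothetical stand-ins,
and a single counter stands in for everything the real isolate and
release paths do to a page. The only point it makes is that the release
side has to skip exactly the ptes the isolate side skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's struct page or pte_t. */
struct page { int count; };
enum pte_kind { PTE_NONE, PTE_ZERO_PAGE, PTE_NORMAL };
struct pte { enum pte_kind kind; struct page *page; };

/* Like the VM_BUG_ON_PAGE check: dropping a reference at count 0 is a bug. */
static void put_ref(struct page *page)
{
	assert(page->count > 0);
	page->count--;
}

/* Isolation pins only normal pages; none and zero-page ptes are skipped. */
static void isolate_ptes(struct pte *ptes, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (ptes[i].kind == PTE_NORMAL)
			ptes[i].page->count++;
}

/* Release must skip exactly what isolation skipped. */
static void release_ptes(struct pte *ptes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (ptes[i].kind == PTE_NONE || ptes[i].kind == PTE_ZERO_PAGE)
			continue;
		put_ref(ptes[i].page);
	}
}
```

If release_ptes skipped only PTE_NONE, the zero page entry would reach
put_ref with a count that isolation never raised, which is the model's
equivalent of the underflow warning.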
Thanks,
Andrea
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 02/10/2015 04:06 PM, Andrea Arcangeli wrote:
> This is a very good idea. Associating it with the sysctl is sensible
> here, as collapsing zeropages would affect the memory footprint in
> the same way as none ptes.
>
> __collapse_huge_page_copy, however, is likely screwing with the
> refcounts of the zero page. Did you have DEBUG_VM=y enabled? If
> yes, you should get a warning that the zeropage refcount
> underflowed, which would confirm my concern:

In __collapse_huge_page_copy, the zero pte takes the same path
as pte_none, so I believe that part of the code is correct.

> So in short I think __collapse_huge_page_copy and
> release_pte_pages need an additional case that complements the
> already existing special

You are right that release_pte_pages needs a special case too,
in order to skip refcounting on the zero page.

Ebru?

- --
All rights reversed
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAEBAgAGBQJU2r3fAAoJEM553pKExN6DJPAH/1TT9uzS0/1wRcN7gn/UP0rb
TpkKzihDOeQgEPfGjd6wUgepU0iVhMX80qBCqk0wIAPgZLnt4IxSl24f09Sm38Cn
zAV0mLySmoaYNisf+qieZ/NF/PDiUOrxGzWJzvm7Ymqq8Mh94qdgpsLy2I+EQioT
RqwbYMMB2XvH3mWOzhQUfnyG5mJMmZtpVcrJ4MIVVq5a3x+Ry668ZT75oNegni5W
Hfax6/8jf4Bjpxc9I/9FvZXzZr9m9yVcGHoCckdGxlnsSSgd60B9b+EYy6AlJpqS
xYkGhKSL0iAAoXYkmrtFdLpdhU/eqhgLb0V2NxcimjrzNG/0LE8fGhb/0SmPXUU=
=085q
-----END PGP SIGNATURE-----