Hi,
Here are a few bugfixes for the THP refcounting patchset. They should address
most of the reported bugs.
I still need to track down one more bug: an rss-counter mismatch on exit.
Kirill A. Shutemov (4):
mm: do not crash on PageDoubleMap() for non-head pages
mm: duplicate rmap reference for hugetlb pages as compound
thp: fix split vs. unmap race
mm: prepare page_referenced() and page_idle to new THP refcounting
include/linux/huge_mm.h | 4 --
include/linux/mm.h | 19 +++++++
include/linux/page-flags.h | 3 +-
mm/huge_memory.c | 76 ++++++-------------------
mm/migrate.c | 2 +-
mm/page_idle.c | 64 ++++++++++++++++++---
mm/rmap.c | 137 ++++++++++++++++++++++++++++-----------------
7 files changed, 180 insertions(+), 125 deletions(-)
--
2.6.1
We usually don't call PageDoubleMap() on small or tail pages, but during a
read from /proc/kpageflags we don't protect the page from being freed under
us, which can trigger the VM_BUG_ON_PAGE() in PageDoubleMap():
page:ffffea00033e0000 count:0 mapcount:0 mapping: (null) index:0x700000200
flags: 0x4000000000000000()
page dumped because: VM_BUG_ON_PAGE(!PageHead(page))
page->mem_cgroup:ffff88021588cc00
------------[ cut here ]------------
kernel BUG at /src/linux-dev/include/linux/page-flags.h:552!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: cfg80211 rfkill crc32c_intel virtio_balloon serio_raw i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi
CPU: 0 PID: 1183 Comm: page-types Not tainted 4.2.0-mmotm-2015-10-21-14-41-151027-1418-00014-41+ #179
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff880214a08bc0 ti: ffff880213e2c000 task.ti: ffff880213e2c000
RIP: 0010:[<ffffffff812434b6>] [<ffffffff812434b6>] stable_page_flags+0x336/0x340
RSP: 0018:ffff880213e2fda8 EFLAGS: 00010292
RAX: 0000000000000021 RBX: ffff8802150a39c0 RCX: 0000000000000000
RDX: ffff88021ec0ff38 RSI: ffff88021ec0d658 RDI: ffff88021ec0d658
RBP: ffff880213e2fdc8 R08: 000000000000000a R09: 000000000000132f
R10: 0000000000000000 R11: 000000000000132f R12: 4000000000000000
R13: ffffea00033e6340 R14: 00007fff8449e430 R15: ffffea00033e6340
FS: 00007ff7f9525700(0000) GS:ffff88021ec00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000063b800 CR3: 00000000d9e71000 CR4: 00000000000006f0
Stack:
ffff8800db82df80 ffff8802150a39c0 0000000000000008 00000000000cf98d
ffff880213e2fe18 ffffffff81243588 00007fff8449e430 ffff880213e2ff20
000000000063b800 ffff8802150a39c0 fffffffffffffffb ffff880213e2ff20
Call Trace:
[<ffffffff81243588>] kpageflags_read+0xc8/0x130
[<ffffffff81235848>] proc_reg_read+0x48/0x70
[<ffffffff811d6b08>] __vfs_read+0x28/0xd0
[<ffffffff812ee43e>] ? security_file_permission+0xae/0xc0
[<ffffffff811d6f53>] ? rw_verify_area+0x53/0xf0
[<ffffffff811d707a>] vfs_read+0x8a/0x130
[<ffffffff811d7bf7>] SyS_pread64+0x77/0x90
[<ffffffff81648117>] entry_SYSCALL_64_fastpath+0x12/0x6a
Code: ca 00 00 40 01 48 39 c1 48 0f 44 da e9 a2 fd ff ff 48 c7 c6 50 a6 a1 81 e8 58 ab f4 ff 0f 0b 48 c7 c6 90 a2 a1 81 e8 4a ab f4 ff <0f> 0b 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 57
RIP [<ffffffff812434b6>] stable_page_flags+0x336/0x340
RSP <ffff880213e2fda8>
---[ end trace e5d18553088c026a ]---
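For reference, here's a minimal userspace sketch of the kind of scan that can
hit this; the pfn range is an arbitrary assumption for illustration, and the
actual trigger was the page-types tool doing the same sort of pread loop:

/* scan /proc/kpageflags the way tools/vm/page-types does (needs root) */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/proc/kpageflags", O_RDONLY);
	uint64_t flags;

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* one 64-bit flags word per pfn */
	for (uint64_t pfn = 0; pfn < 0x100000; pfn++) {
		if (pread(fd, &flags, sizeof(flags),
			  pfn * sizeof(flags)) != sizeof(flags))
			break;
		/* nothing here pins the page: it can be freed or split
		 * while the kernel is evaluating its flags */
	}
	close(fd);
	return 0;
}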
Let's drop the VM_BUG_ON_PAGE() from PageDoubleMap() and return false for
non-head pages.
The patch can be folded into
"mm: rework mapcount accounting to enable 4k mapping of THPs"
Signed-off-by: Kirill A. Shutemov <[email protected]>
Reported-by: Naoya Horiguchi <[email protected]>
---
include/linux/page-flags.h | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 72356fbc3f2d..26cc7a068126 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -549,8 +549,7 @@ static inline int PageTransTail(struct page *page)
*/
static inline int PageDoubleMap(struct page *page)
{
- VM_BUG_ON_PAGE(!PageHead(page), page);
- return test_bit(PG_double_map, &page[1].flags);
+ return PageHead(page) && test_bit(PG_double_map, &page[1].flags);
}
static inline int TestSetPageDoubleMap(struct page *page)
--
2.6.1
Naoya noticed that I wrongly duplicate the rmap reference for hugetlb pages
as non-compound in remove_migration_pte(). Let's fix this.
The patch can be folded into
"mm: rework mapcount accounting to enable 4k mapping of THPs"
Signed-off-by: Kirill A. Shutemov <[email protected]>
Reported-by: Naoya Horiguchi <[email protected]>
---
mm/migrate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 1ae0113559c9..b1034f9c77e7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -165,7 +165,7 @@ static int remove_migration_pte(struct page *new, struct vm_area_struct *vma,
if (PageAnon(new))
hugepage_add_anon_rmap(new, vma, addr);
else
- page_dup_rmap(new, false);
+ page_dup_rmap(new, true);
} else if (PageAnon(new))
page_add_anon_rmap(new, vma, addr, false);
else
--
2.6.1
To stabilize a compound page during split we use migration entries.
The code implementing this is buggy: I wrongly assumed that the kernel would
wait for migration to finish before zapping ptes.
It turns out that's not true.
As a result, if zap_pte_range() races with split_huge_page(), we can end up
with a page which is not mapped anymore but still has _count and _mapcount
elevated. The page is on the LRU too, so it's still reachable by vmscan and
by pfn scanners. page->mapping in this case would likely point to a freed
anon_vma.
BOOM!
The patch modifies the freeze/unfreeze_page() code to match normal migration
entry logic: on setup we remove the page from rmap and drop the pin; on
removal we take the pin back and put the page back on rmap. This way, even if
the migration entry is removed under us, we don't corrupt the page's state.
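Schematically (a sketch of the pairing described above, mirroring the calls
the diff below adds, not separate new code):

	/* freeze side (freeze_page_vma): the pte is replaced by a migration
	 * entry, so the mapping's rmap reference and page pin go away with it */
	set_pte_at(vma->vm_mm, address, pte + i, swp_pte);
	page_remove_rmap(page, false);
	put_page(page);

	/* unfreeze side (unfreeze_page_vma): take the pin and the rmap
	 * reference back before the present pte is re-established, so a
	 * racing zap of the migration entry leaves the counters consistent */
	get_page(page);
	page_add_anon_rmap(page, vma, address, false);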
Signed-off-by: Kirill A. Shutemov <[email protected]>
Reported-by: Minchan Kim <[email protected]>
Reported-by: Sasha Levin <[email protected]>
---
mm/huge_memory.c | 22 ++++++++++++++++++----
mm/rmap.c | 19 +++++--------------
2 files changed, 23 insertions(+), 18 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5009f68786d0..3700981f8035 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2934,6 +2934,13 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
smp_wmb(); /* make pte visible before pmd */
pmd_populate(mm, pmd, pgtable);
+
+ if (freeze) {
+ for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
+ page_remove_rmap(page + i, false);
+ put_page(page + i);
+ }
+ }
}
void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
@@ -3079,6 +3086,8 @@ static void freeze_page_vma(struct vm_area_struct *vma, struct page *page,
if (pte_soft_dirty(entry))
swp_pte = pte_swp_mksoft_dirty(swp_pte);
set_pte_at(vma->vm_mm, address, pte + i, swp_pte);
+ page_remove_rmap(page, false);
+ put_page(page);
}
pte_unmap_unlock(pte, ptl);
}
@@ -3117,8 +3126,6 @@ static void unfreeze_page_vma(struct vm_area_struct *vma, struct page *page,
return;
pte = pte_offset_map_lock(vma->vm_mm, pmd, address, &ptl);
for (i = 0; i < HPAGE_PMD_NR; i++, address += PAGE_SIZE, page++) {
- if (!page_mapped(page))
- continue;
if (!is_swap_pte(pte[i]))
continue;
@@ -3128,6 +3135,9 @@ static void unfreeze_page_vma(struct vm_area_struct *vma, struct page *page,
if (migration_entry_to_page(swp_entry) != page)
continue;
+ get_page(page);
+ page_add_anon_rmap(page, vma, address, false);
+
entry = pte_mkold(mk_pte(page, vma->vm_page_prot));
entry = pte_mkdirty(entry);
if (is_write_migration_entry(swp_entry))
@@ -3195,8 +3205,6 @@ static int __split_huge_page_tail(struct page *head, int tail,
*/
atomic_add(mapcount + 1, &page_tail->_count);
- /* after clearing PageTail the gup refcount can be released */
- smp_mb__after_atomic();
page_tail->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
page_tail->flags |= (head->flags &
@@ -3209,6 +3217,12 @@ static int __split_huge_page_tail(struct page *head, int tail,
(1L << PG_unevictable)));
page_tail->flags |= (1L << PG_dirty);
+ /*
+ * After clearing PageTail the gup refcount can be released.
+ * Page flags also must be visible before we make the page non-compound.
+ */
+ smp_wmb();
+
clear_compound_head(page_tail);
if (page_is_young(head))
diff --git a/mm/rmap.c b/mm/rmap.c
index 288622f5f34d..ad9af8b3a381 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1135,20 +1135,12 @@ void do_page_add_anon_rmap(struct page *page,
bool compound = flags & RMAP_COMPOUND;
bool first;
- if (PageTransCompound(page)) {
+ if (compound) {
+ atomic_t *mapcount;
VM_BUG_ON_PAGE(!PageLocked(page), page);
- if (compound) {
- atomic_t *mapcount;
-
- VM_BUG_ON_PAGE(!PageTransHuge(page), page);
- mapcount = compound_mapcount_ptr(page);
- first = atomic_inc_and_test(mapcount);
- } else {
- /* Anon THP always mapped first with PMD */
- first = 0;
- VM_BUG_ON_PAGE(!page_mapcount(page), page);
- atomic_inc(&page->_mapcount);
- }
+ VM_BUG_ON_PAGE(!PageTransHuge(page), page);
+ mapcount = compound_mapcount_ptr(page);
+ first = atomic_inc_and_test(mapcount);
} else {
VM_BUG_ON_PAGE(compound, page);
first = atomic_inc_and_test(&page->_mapcount);
@@ -1163,7 +1155,6 @@ void do_page_add_anon_rmap(struct page *page,
* disabled.
*/
if (compound) {
- VM_BUG_ON_PAGE(!PageTransHuge(page), page);
__inc_zone_page_state(page,
NR_ANON_TRANSPARENT_HUGEPAGES);
}
--
2.6.1
I've missed two similar codepaths which need some preparation to work well
with the reworked THP refcounting.
Both page_referenced() and page_idle_clear_pte_refs_one() assume that a
THP can only be mapped with a PMD, so there's no reason to look at PTEs
for PageTransHuge() pages. That's not true anymore: a THP can be mapped
with PTEs too.
The patch removes the PageTransHuge() test from these functions and
open-codes the page table check.
Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Vladimir Davydov <[email protected]>
---
include/linux/huge_mm.h | 4 --
include/linux/mm.h | 19 ++++++++
mm/huge_memory.c | 54 ----------------------
mm/page_idle.c | 64 ++++++++++++++++++++++----
mm/rmap.c | 118 +++++++++++++++++++++++++++++++++---------------
5 files changed, 155 insertions(+), 104 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index f7c3f13f3a9c..5c7b00e88236 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -51,10 +51,6 @@ enum transparent_hugepage_flag {
#endif
};
-extern pmd_t *page_check_address_pmd(struct page *page,
- struct mm_struct *mm,
- unsigned long address,
- spinlock_t **ptl);
extern int pmd_freeable(pmd_t pmd);
#define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b4cd988a794a..a36f9fa4e4cd 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -432,6 +432,25 @@ static inline int page_mapcount(struct page *page)
return ret;
}
+static inline int total_mapcount(struct page *page)
+{
+ int i, ret;
+
+ VM_BUG_ON_PAGE(PageTail(page), page);
+
+ if (likely(!PageCompound(page)))
+ return atomic_read(&page->_mapcount) + 1;
+
+ ret = compound_mapcount(page);
+ if (PageHuge(page))
+ return ret;
+ for (i = 0; i < HPAGE_PMD_NR; i++)
+ ret += atomic_read(&page[i]._mapcount) + 1;
+ if (PageDoubleMap(page))
+ ret -= HPAGE_PMD_NR;
+ return ret;
+}
+
static inline int page_count(struct page *page)
{
return atomic_read(&compound_head(page)->_count);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3700981f8035..14cbbad54a3e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1713,46 +1713,6 @@ bool __pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma,
return false;
}
-/*
- * This function returns whether a given @page is mapped onto the @address
- * in the virtual space of @mm.
- *
- * When it's true, this function returns *pmd with holding the page table lock
- * and passing it back to the caller via @ptl.
- * If it's false, returns NULL without holding the page table lock.
- */
-pmd_t *page_check_address_pmd(struct page *page,
- struct mm_struct *mm,
- unsigned long address,
- spinlock_t **ptl)
-{
- pgd_t *pgd;
- pud_t *pud;
- pmd_t *pmd;
-
- if (address & ~HPAGE_PMD_MASK)
- return NULL;
-
- pgd = pgd_offset(mm, address);
- if (!pgd_present(*pgd))
- return NULL;
- pud = pud_offset(pgd, address);
- if (!pud_present(*pud))
- return NULL;
- pmd = pmd_offset(pud, address);
-
- *ptl = pmd_lock(mm, pmd);
- if (!pmd_present(*pmd))
- goto unlock;
- if (pmd_page(*pmd) != page)
- goto unlock;
- if (pmd_trans_huge(*pmd))
- return pmd;
-unlock:
- spin_unlock(*ptl);
- return NULL;
-}
-
#define VM_NO_THP (VM_SPECIAL | VM_HUGETLB | VM_SHARED | VM_MAYSHARE)
int hugepage_madvise(struct vm_area_struct *vma,
@@ -3169,20 +3129,6 @@ static void unfreeze_page(struct anon_vma *anon_vma, struct page *page)
}
}
-static int total_mapcount(struct page *page)
-{
- int i, ret;
-
- ret = compound_mapcount(page);
- for (i = 0; i < HPAGE_PMD_NR; i++)
- ret += atomic_read(&page[i]._mapcount) + 1;
-
- if (PageDoubleMap(page))
- ret -= HPAGE_PMD_NR;
-
- return ret;
-}
-
static int __split_huge_page_tail(struct page *head, int tail,
struct lruvec *lruvec, struct list_head *list)
{
diff --git a/mm/page_idle.c b/mm/page_idle.c
index 1c245d9027e3..2c9ebe12b40d 100644
--- a/mm/page_idle.c
+++ b/mm/page_idle.c
@@ -56,23 +56,69 @@ static int page_idle_clear_pte_refs_one(struct page *page,
{
struct mm_struct *mm = vma->vm_mm;
spinlock_t *ptl;
+ pgd_t *pgd;
+ pud_t *pud;
pmd_t *pmd;
pte_t *pte;
bool referenced = false;
- if (unlikely(PageTransHuge(page))) {
- pmd = page_check_address_pmd(page, mm, addr, &ptl);
- if (pmd) {
- referenced = pmdp_clear_young_notify(vma, addr, pmd);
+ pgd = pgd_offset(mm, addr);
+ if (!pgd_present(*pgd))
+ return SWAP_AGAIN;
+ pud = pud_offset(pgd, addr);
+ if (!pud_present(*pud))
+ return SWAP_AGAIN;
+ pmd = pmd_offset(pud, addr);
+
+ if (pmd_trans_huge(*pmd)) {
+ ptl = pmd_lock(mm, pmd);
+ if (!pmd_present(*pmd))
+ goto unlock_pmd;
+ if (unlikely(!pmd_trans_huge(*pmd))) {
spin_unlock(ptl);
+ goto map_pte;
}
+
+ if (pmd_page(*pmd) != page)
+ goto unlock_pmd;
+
+ referenced = pmdp_clear_young_notify(vma, addr, pmd);
+ spin_unlock(ptl);
+ goto found;
+unlock_pmd:
+ spin_unlock(ptl);
+ return SWAP_AGAIN;
} else {
- pte = page_check_address(page, mm, addr, &ptl, 0);
- if (pte) {
- referenced = ptep_clear_young_notify(vma, addr, pte);
- pte_unmap_unlock(pte, ptl);
- }
+ pmd_t pmde = *pmd;
+ barrier();
+ if (!pmd_present(pmde) || pmd_trans_huge(pmde))
+ return SWAP_AGAIN;
+
+ }
+map_pte:
+ pte = pte_offset_map(pmd, addr);
+ if (!pte_present(*pte)) {
+ pte_unmap(pte);
+ return SWAP_AGAIN;
}
+
+ ptl = pte_lockptr(mm, pmd);
+ spin_lock(ptl);
+
+ if (!pte_present(*pte)) {
+ pte_unmap_unlock(pte, ptl);
+ return SWAP_AGAIN;
+ }
+
+ /* THP can be referenced by any subpage */
+ if (pte_pfn(*pte) - page_to_pfn(page) >= hpage_nr_pages(page)) {
+ pte_unmap_unlock(pte, ptl);
+ return SWAP_AGAIN;
+ }
+
+ referenced = ptep_clear_young_notify(vma, addr, pte);
+ pte_unmap_unlock(pte, ptl);
+found:
if (referenced) {
clear_page_idle(page);
/*
diff --git a/mm/rmap.c b/mm/rmap.c
index ad9af8b3a381..0837487d3737 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -812,60 +812,104 @@ static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
spinlock_t *ptl;
int referenced = 0;
struct page_referenced_arg *pra = arg;
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *pte;
- if (unlikely(PageTransHuge(page))) {
- pmd_t *pmd;
-
- /*
- * rmap might return false positives; we must filter
- * these out using page_check_address_pmd().
- */
- pmd = page_check_address_pmd(page, mm, address, &ptl);
- if (!pmd)
+ if (unlikely(PageHuge(page))) {
+ /* when pud is not present, pte will be NULL */
+ pte = huge_pte_offset(mm, address);
+ if (!pte)
return SWAP_AGAIN;
- if (vma->vm_flags & VM_LOCKED) {
+ ptl = huge_pte_lockptr(page_hstate(page), mm, pte);
+ goto check_pte;
+ }
+
+ pgd = pgd_offset(mm, address);
+ if (!pgd_present(*pgd))
+ return SWAP_AGAIN;
+ pud = pud_offset(pgd, address);
+ if (!pud_present(*pud))
+ return SWAP_AGAIN;
+ pmd = pmd_offset(pud, address);
+
+ if (pmd_trans_huge(*pmd)) {
+ int ret = SWAP_AGAIN;
+
+ ptl = pmd_lock(mm, pmd);
+ if (!pmd_present(*pmd))
+ goto unlock_pmd;
+ if (unlikely(!pmd_trans_huge(*pmd))) {
spin_unlock(ptl);
+ goto map_pte;
+ }
+
+ if (pmd_page(*pmd) != page)
+ goto unlock_pmd;
+
+ if (vma->vm_flags & VM_LOCKED) {
pra->vm_flags |= VM_LOCKED;
- return SWAP_FAIL; /* To break the loop */
+ ret = SWAP_FAIL; /* To break the loop */
+ goto unlock_pmd;
}
if (pmdp_clear_flush_young_notify(vma, address, pmd))
referenced++;
-
spin_unlock(ptl);
+ goto found;
+unlock_pmd:
+ spin_unlock(ptl);
+ return ret;
} else {
- pte_t *pte;
-
- /*
- * rmap might return false positives; we must filter
- * these out using page_check_address().
- */
- pte = page_check_address(page, mm, address, &ptl, 0);
- if (!pte)
+ pmd_t pmde = *pmd;
+ barrier();
+ if (!pmd_present(pmde) || pmd_trans_huge(pmde))
return SWAP_AGAIN;
+ }
+map_pte:
+ pte = pte_offset_map(pmd, address);
+ if (!pte_present(*pte)) {
+ pte_unmap(pte);
+ return SWAP_AGAIN;
+ }
- if (vma->vm_flags & VM_LOCKED) {
- pte_unmap_unlock(pte, ptl);
- pra->vm_flags |= VM_LOCKED;
- return SWAP_FAIL; /* To break the loop */
- }
+ ptl = pte_lockptr(mm, pmd);
+check_pte:
+ spin_lock(ptl);
- if (ptep_clear_flush_young_notify(vma, address, pte)) {
- /*
- * Don't treat a reference through a sequentially read
- * mapping as such. If the page has been used in
- * another mapping, we will catch it; if this other
- * mapping is already gone, the unmap path will have
- * set PG_referenced or activated the page.
- */
- if (likely(!(vma->vm_flags & VM_SEQ_READ)))
- referenced++;
- }
+ if (!pte_present(*pte)) {
+ pte_unmap_unlock(pte, ptl);
+ return SWAP_AGAIN;
+ }
+
+ /* THP can be referenced by any subpage */
+ if (pte_pfn(*pte) - page_to_pfn(page) >= hpage_nr_pages(page)) {
+ pte_unmap_unlock(pte, ptl);
+ return SWAP_AGAIN;
+ }
+ if (vma->vm_flags & VM_LOCKED) {
pte_unmap_unlock(pte, ptl);
+ pra->vm_flags |= VM_LOCKED;
+ return SWAP_FAIL; /* To break the loop */
}
+ if (ptep_clear_flush_young_notify(vma, address, pte)) {
+ /*
+ * Don't treat a reference through a sequentially read
+ * mapping as such. If the page has been used in
+ * another mapping, we will catch it; if this other
+ * mapping is already gone, the unmap path will have
+ * set PG_referenced or activated the page.
+ */
+ if (likely(!(vma->vm_flags & VM_SEQ_READ)))
+ referenced++;
+ }
+ pte_unmap_unlock(pte, ptl);
+
+found:
if (referenced)
clear_page_idle(page);
if (test_and_clear_page_young(page))
@@ -912,7 +956,7 @@ int page_referenced(struct page *page,
int ret;
int we_locked = 0;
struct page_referenced_arg pra = {
- .mapcount = page_mapcount(page),
+ .mapcount = total_mapcount(page),
.memcg = memcg,
};
struct rmap_walk_control rwc = {
--
2.6.1
> @@ -1135,20 +1135,12 @@ void do_page_add_anon_rmap(struct page *page,
> bool compound = flags & RMAP_COMPOUND;
> bool first;
>
> - if (PageTransCompound(page)) {
> + if (compound) {
> + atomic_t *mapcount;
> VM_BUG_ON_PAGE(!PageLocked(page), page);
> - if (compound) {
> - atomic_t *mapcount;
> -
> - VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> - mapcount = compound_mapcount_ptr(page);
> - first = atomic_inc_and_test(mapcount);
> - } else {
> - /* Anon THP always mapped first with PMD */
> - first = 0;
> - VM_BUG_ON_PAGE(!page_mapcount(page), page);
> - atomic_inc(&page->_mapcount);
> - }
> + VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> + mapcount = compound_mapcount_ptr(page);
> + first = atomic_inc_and_test(mapcount);
> } else {
> VM_BUG_ON_PAGE(compound, page);
Then this debug info is no longer needed.
> first = atomic_inc_and_test(&page->_mapcount);
On Tue, Nov 03, 2015 at 05:26:15PM +0200, Kirill A. Shutemov wrote:
...
> @@ -56,23 +56,69 @@ static int page_idle_clear_pte_refs_one(struct page *page,
> {
> struct mm_struct *mm = vma->vm_mm;
> spinlock_t *ptl;
> + pgd_t *pgd;
> + pud_t *pud;
> pmd_t *pmd;
> pte_t *pte;
> bool referenced = false;
>
> - if (unlikely(PageTransHuge(page))) {
> - pmd = page_check_address_pmd(page, mm, addr, &ptl);
> - if (pmd) {
> - referenced = pmdp_clear_young_notify(vma, addr, pmd);
> + pgd = pgd_offset(mm, addr);
> + if (!pgd_present(*pgd))
> + return SWAP_AGAIN;
> + pud = pud_offset(pgd, addr);
> + if (!pud_present(*pud))
> + return SWAP_AGAIN;
> + pmd = pmd_offset(pud, addr);
> +
> + if (pmd_trans_huge(*pmd)) {
> + ptl = pmd_lock(mm, pmd);
> + if (!pmd_present(*pmd))
> + goto unlock_pmd;
> + if (unlikely(!pmd_trans_huge(*pmd))) {
> spin_unlock(ptl);
> + goto map_pte;
> }
> +
> + if (pmd_page(*pmd) != page)
> + goto unlock_pmd;
> +
> + referenced = pmdp_clear_young_notify(vma, addr, pmd);
> + spin_unlock(ptl);
> + goto found;
> +unlock_pmd:
> + spin_unlock(ptl);
> + return SWAP_AGAIN;
> } else {
> - pte = page_check_address(page, mm, addr, &ptl, 0);
> - if (pte) {
> - referenced = ptep_clear_young_notify(vma, addr, pte);
> - pte_unmap_unlock(pte, ptl);
> - }
> + pmd_t pmde = *pmd;
> + barrier();
> + if (!pmd_present(pmde) || pmd_trans_huge(pmde))
> + return SWAP_AGAIN;
> +
> + }
> +map_pte:
> + pte = pte_offset_map(pmd, addr);
> + if (!pte_present(*pte)) {
> + pte_unmap(pte);
> + return SWAP_AGAIN;
> }
> +
> + ptl = pte_lockptr(mm, pmd);
> + spin_lock(ptl);
> +
> + if (!pte_present(*pte)) {
> + pte_unmap_unlock(pte, ptl);
> + return SWAP_AGAIN;
> + }
> +
> + /* THP can be referenced by any subpage */
> + if (pte_pfn(*pte) - page_to_pfn(page) >= hpage_nr_pages(page)) {
> + pte_unmap_unlock(pte, ptl);
> + return SWAP_AGAIN;
> + }
> +
> + referenced = ptep_clear_young_notify(vma, addr, pte);
> + pte_unmap_unlock(pte, ptl);
> +found:
Can't we hide this stuff in a helper function, which would be used by
both page_referenced_one and page_idle_clear_pte_refs_one, instead of
duplicating page_referenced_one code here?
Thanks,
Vladimir
On Thu, Nov 05, 2015 at 12:10:13PM +0300, Vladimir Davydov wrote:
> On Tue, Nov 03, 2015 at 05:26:15PM +0200, Kirill A. Shutemov wrote:
> ...
> > @@ -56,23 +56,69 @@ static int page_idle_clear_pte_refs_one(struct page *page,
> > {
> > struct mm_struct *mm = vma->vm_mm;
> > spinlock_t *ptl;
> > + pgd_t *pgd;
> > + pud_t *pud;
> > pmd_t *pmd;
> > pte_t *pte;
> > bool referenced = false;
> >
> > - if (unlikely(PageTransHuge(page))) {
> > - pmd = page_check_address_pmd(page, mm, addr, &ptl);
> > - if (pmd) {
> > - referenced = pmdp_clear_young_notify(vma, addr, pmd);
> > + pgd = pgd_offset(mm, addr);
> > + if (!pgd_present(*pgd))
> > + return SWAP_AGAIN;
> > + pud = pud_offset(pgd, addr);
> > + if (!pud_present(*pud))
> > + return SWAP_AGAIN;
> > + pmd = pmd_offset(pud, addr);
> > +
> > + if (pmd_trans_huge(*pmd)) {
> > + ptl = pmd_lock(mm, pmd);
> > + if (!pmd_present(*pmd))
> > + goto unlock_pmd;
> > + if (unlikely(!pmd_trans_huge(*pmd))) {
> > spin_unlock(ptl);
> > + goto map_pte;
> > }
> > +
> > + if (pmd_page(*pmd) != page)
> > + goto unlock_pmd;
> > +
> > + referenced = pmdp_clear_young_notify(vma, addr, pmd);
> > + spin_unlock(ptl);
> > + goto found;
> > +unlock_pmd:
> > + spin_unlock(ptl);
> > + return SWAP_AGAIN;
> > } else {
> > - pte = page_check_address(page, mm, addr, &ptl, 0);
> > - if (pte) {
> > - referenced = ptep_clear_young_notify(vma, addr, pte);
> > - pte_unmap_unlock(pte, ptl);
> > - }
> > + pmd_t pmde = *pmd;
> > + barrier();
> > + if (!pmd_present(pmde) || pmd_trans_huge(pmde))
> > + return SWAP_AGAIN;
> > +
> > + }
> > +map_pte:
> > + pte = pte_offset_map(pmd, addr);
> > + if (!pte_present(*pte)) {
> > + pte_unmap(pte);
> > + return SWAP_AGAIN;
> > }
> > +
> > + ptl = pte_lockptr(mm, pmd);
> > + spin_lock(ptl);
> > +
> > + if (!pte_present(*pte)) {
> > + pte_unmap_unlock(pte, ptl);
> > + return SWAP_AGAIN;
> > + }
> > +
> > + /* THP can be referenced by any subpage */
> > + if (pte_pfn(*pte) - page_to_pfn(page) >= hpage_nr_pages(page)) {
> > + pte_unmap_unlock(pte, ptl);
> > + return SWAP_AGAIN;
> > + }
> > +
> > + referenced = ptep_clear_young_notify(vma, addr, pte);
> > + pte_unmap_unlock(pte, ptl);
> > +found:
>
> Can't we hide this stuff in a helper function, which would be used by
> both page_referenced_one and page_idle_clear_pte_refs_one, instead of
> duplicating page_referenced_one code here?
I would like to, but there's no obvious way to do that: PMDs and PTEs
require different handling.
Any ideas?
--
Kirill A. Shutemov
On Wed, Nov 04, 2015 at 05:20:15PM +0800, Hillf Danton wrote:
> > @@ -1135,20 +1135,12 @@ void do_page_add_anon_rmap(struct page *page,
> > bool compound = flags & RMAP_COMPOUND;
> > bool first;
> >
> > - if (PageTransCompound(page)) {
> > + if (compound) {
> > + atomic_t *mapcount;
> > VM_BUG_ON_PAGE(!PageLocked(page), page);
> > - if (compound) {
> > - atomic_t *mapcount;
> > -
> > - VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> > - mapcount = compound_mapcount_ptr(page);
> > - first = atomic_inc_and_test(mapcount);
> > - } else {
> > - /* Anon THP always mapped first with PMD */
> > - first = 0;
> > - VM_BUG_ON_PAGE(!page_mapcount(page), page);
> > - atomic_inc(&page->_mapcount);
> > - }
> > + VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> > + mapcount = compound_mapcount_ptr(page);
> > + first = atomic_inc_and_test(mapcount);
> > } else {
> > VM_BUG_ON_PAGE(compound, page);
>
> Then this debug info is no longer needed.
> > first = atomic_inc_and_test(&page->_mapcount);
Right.
diff --git a/mm/rmap.c b/mm/rmap.c
index 0837487d3737..a9550b1f74cd 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1186,7 +1186,6 @@ void do_page_add_anon_rmap(struct page *page,
mapcount = compound_mapcount_ptr(page);
first = atomic_inc_and_test(mapcount);
} else {
- VM_BUG_ON_PAGE(compound, page);
first = atomic_inc_and_test(&page->_mapcount);
}
--
Kirill A. Shutemov
On Thu, Nov 05, 2015 at 11:24:59AM +0200, Kirill A. Shutemov wrote:
> On Thu, Nov 05, 2015 at 12:10:13PM +0300, Vladimir Davydov wrote:
> > On Tue, Nov 03, 2015 at 05:26:15PM +0200, Kirill A. Shutemov wrote:
> > ...
> > > @@ -56,23 +56,69 @@ static int page_idle_clear_pte_refs_one(struct page *page,
> > > {
> > > struct mm_struct *mm = vma->vm_mm;
> > > spinlock_t *ptl;
> > > + pgd_t *pgd;
> > > + pud_t *pud;
> > > pmd_t *pmd;
> > > pte_t *pte;
> > > bool referenced = false;
> > >
> > > - if (unlikely(PageTransHuge(page))) {
> > > - pmd = page_check_address_pmd(page, mm, addr, &ptl);
> > > - if (pmd) {
> > > - referenced = pmdp_clear_young_notify(vma, addr, pmd);
> > > + pgd = pgd_offset(mm, addr);
> > > + if (!pgd_present(*pgd))
> > > + return SWAP_AGAIN;
> > > + pud = pud_offset(pgd, addr);
> > > + if (!pud_present(*pud))
> > > + return SWAP_AGAIN;
> > > + pmd = pmd_offset(pud, addr);
> > > +
> > > + if (pmd_trans_huge(*pmd)) {
> > > + ptl = pmd_lock(mm, pmd);
> > > + if (!pmd_present(*pmd))
> > > + goto unlock_pmd;
> > > + if (unlikely(!pmd_trans_huge(*pmd))) {
> > > spin_unlock(ptl);
> > > + goto map_pte;
> > > }
> > > +
> > > + if (pmd_page(*pmd) != page)
> > > + goto unlock_pmd;
> > > +
> > > + referenced = pmdp_clear_young_notify(vma, addr, pmd);
> > > + spin_unlock(ptl);
> > > + goto found;
> > > +unlock_pmd:
> > > + spin_unlock(ptl);
> > > + return SWAP_AGAIN;
> > > } else {
> > > - pte = page_check_address(page, mm, addr, &ptl, 0);
> > > - if (pte) {
> > > - referenced = ptep_clear_young_notify(vma, addr, pte);
> > > - pte_unmap_unlock(pte, ptl);
> > > - }
> > > + pmd_t pmde = *pmd;
> > > + barrier();
> > > + if (!pmd_present(pmde) || pmd_trans_huge(pmde))
> > > + return SWAP_AGAIN;
> > > +
> > > + }
> > > +map_pte:
> > > + pte = pte_offset_map(pmd, addr);
> > > + if (!pte_present(*pte)) {
> > > + pte_unmap(pte);
> > > + return SWAP_AGAIN;
> > > }
> > > +
> > > + ptl = pte_lockptr(mm, pmd);
> > > + spin_lock(ptl);
> > > +
> > > + if (!pte_present(*pte)) {
> > > + pte_unmap_unlock(pte, ptl);
> > > + return SWAP_AGAIN;
> > > + }
> > > +
> > > + /* THP can be referenced by any subpage */
> > > + if (pte_pfn(*pte) - page_to_pfn(page) >= hpage_nr_pages(page)) {
> > > + pte_unmap_unlock(pte, ptl);
> > > + return SWAP_AGAIN;
> > > + }
> > > +
> > > + referenced = ptep_clear_young_notify(vma, addr, pte);
> > > + pte_unmap_unlock(pte, ptl);
> > > +found:
> >
> > Can't we hide this stuff in a helper function, which would be used by
> > both page_referenced_one and page_idle_clear_pte_refs_one, instead of
> > duplicating page_referenced_one code here?
>
> I would like to, but there's no obvious way to do that: PMDs and PTEs
> require different handling.
>
> Any ideas?
Something like this? [COMPLETELY UNTESTED]
---
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 853f4f3c6742..bb9169d07c2b 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -216,6 +216,10 @@ static inline pte_t *page_check_address(struct page *page, struct mm_struct *mm,
return ptep;
}
+pte_t *page_check_address_transhuge(struct page *page, struct mm_struct *mm,
+ unsigned long address,
+ pmd_t **pmdp, spinlock_t **ptlp);
+
/*
* Used by swapoff to help locate where page is expected in vma.
*/
diff --git a/mm/page_idle.c b/mm/page_idle.c
index 2c9ebe12b40d..6574ef6a1a96 100644
--- a/mm/page_idle.c
+++ b/mm/page_idle.c
@@ -56,69 +56,21 @@ static int page_idle_clear_pte_refs_one(struct page *page,
{
struct mm_struct *mm = vma->vm_mm;
spinlock_t *ptl;
- pgd_t *pgd;
- pud_t *pud;
pmd_t *pmd;
pte_t *pte;
bool referenced = false;
- pgd = pgd_offset(mm, addr);
- if (!pgd_present(*pgd))
+ pte = page_check_address_transhuge(page, mm, address, &pmd, &ptl);
+ if (!pte)
return SWAP_AGAIN;
- pud = pud_offset(pgd, addr);
- if (!pud_present(*pud))
- return SWAP_AGAIN;
- pmd = pmd_offset(pud, addr);
-
- if (pmd_trans_huge(*pmd)) {
- ptl = pmd_lock(mm, pmd);
- if (!pmd_present(*pmd))
- goto unlock_pmd;
- if (unlikely(!pmd_trans_huge(*pmd))) {
- spin_unlock(ptl);
- goto map_pte;
- }
- if (pmd_page(*pmd) != page)
- goto unlock_pmd;
-
- referenced = pmdp_clear_young_notify(vma, addr, pmd);
- spin_unlock(ptl);
- goto found;
-unlock_pmd:
- spin_unlock(ptl);
- return SWAP_AGAIN;
- } else {
- pmd_t pmde = *pmd;
- barrier();
- if (!pmd_present(pmde) || pmd_trans_huge(pmde))
- return SWAP_AGAIN;
-
- }
-map_pte:
- pte = pte_offset_map(pmd, addr);
- if (!pte_present(*pte)) {
- pte_unmap(pte);
- return SWAP_AGAIN;
- }
+ if (pte == pmd) /* trans huge */
+ referenced = pmdp_clear_young_notify(vma, address, pmd);
+ else
+ referenced = ptep_clear_young_notify(vma, addr, pte);
- ptl = pte_lockptr(mm, pmd);
- spin_lock(ptl);
-
- if (!pte_present(*pte)) {
- pte_unmap_unlock(pte, ptl);
- return SWAP_AGAIN;
- }
-
- /* THP can be referenced by any subpage */
- if (pte_pfn(*pte) - page_to_pfn(page) >= hpage_nr_pages(page)) {
- pte_unmap_unlock(pte, ptl);
- return SWAP_AGAIN;
- }
-
- referenced = ptep_clear_young_notify(vma, addr, pte);
pte_unmap_unlock(pte, ptl);
-found:
+
if (referenced) {
clear_page_idle(page);
/*
diff --git a/mm/rmap.c b/mm/rmap.c
index 1f90bda685b6..3638190cf7bc 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -796,48 +796,35 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma)
return 1;
}
-struct page_referenced_arg {
- int mapcount;
- int referenced;
- unsigned long vm_flags;
- struct mem_cgroup *memcg;
-};
-/*
- * arg: page_referenced_arg will be passed
- */
-static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
- unsigned long address, void *arg)
+pte_t *page_check_address_transhuge(struct page *page, struct mm_struct *mm,
+ unsigned long address,
+ pmd_t **pmdp, spinlock_t **ptlp)
{
- struct mm_struct *mm = vma->vm_mm;
- spinlock_t *ptl;
- int referenced = 0;
- struct page_referenced_arg *pra = arg;
pgd_t *pgd;
pud_t *pud;
pmd_t *pmd;
pte_t *pte;
+ spinlock_t *ptl;
if (unlikely(PageHuge(page))) {
/* when pud is not present, pte will be NULL */
pte = huge_pte_offset(mm, address);
if (!pte)
- return SWAP_AGAIN;
+ return NULL;
ptl = huge_pte_lockptr(page_hstate(page), mm, pte);
+ pmd = NULL;
goto check_pte;
}
pgd = pgd_offset(mm, address);
if (!pgd_present(*pgd))
- return SWAP_AGAIN;
- pud = pud_offset(pgd, address);
+ return NULL;
+ pud = pud_offset(pgd, address);
if (!pud_present(*pud))
- return SWAP_AGAIN;
+ return NULL;
pmd = pmd_offset(pud, address);
if (pmd_trans_huge(*pmd)) {
- int ret = SWAP_AGAIN;
-
ptl = pmd_lock(mm, pmd);
if (!pmd_present(*pmd))
goto unlock_pmd;
@@ -849,30 +836,23 @@ static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
if (pmd_page(*pmd) != page)
goto unlock_pmd;
- if (vma->vm_flags & VM_LOCKED) {
- pra->vm_flags |= VM_LOCKED;
- ret = SWAP_FAIL; /* To break the loop */
- goto unlock_pmd;
- }
-
- if (pmdp_clear_flush_young_notify(vma, address, pmd))
- referenced++;
- spin_unlock(ptl);
+ pte = (pte_t *)pmd;
goto found;
unlock_pmd:
spin_unlock(ptl);
- return ret;
+ return NULL;
} else {
pmd_t pmde = *pmd;
barrier();
if (!pmd_present(pmde) || pmd_trans_huge(pmde))
- return SWAP_AGAIN;
+ return NULL;
}
+
map_pte:
pte = pte_offset_map(pmd, address);
if (!pte_present(*pte)) {
pte_unmap(pte);
- return SWAP_AGAIN;
+ return NULL;
}
ptl = pte_lockptr(mm, pmd);
@@ -881,35 +861,66 @@ check_pte:
if (!pte_present(*pte)) {
pte_unmap_unlock(pte, ptl);
- return SWAP_AGAIN;
+ return NULL;
}
/* THP can be referenced by any subpage */
if (pte_pfn(*pte) - page_to_pfn(page) >= hpage_nr_pages(page)) {
pte_unmap_unlock(pte, ptl);
- return SWAP_AGAIN;
+ return NULL;
}
+found:
+ *ptlp = ptl;
+ *pmdp = pmd;
+ return pte;
+}
+
+struct page_referenced_arg {
+ int mapcount;
+ int referenced;
+ unsigned long vm_flags;
+ struct mem_cgroup *memcg;
+};
+/*
+ * arg: page_referenced_arg will be passed
+ */
+static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
+ unsigned long address, void *arg)
+{
+ struct mm_struct *mm = vma->vm_mm;
+ int referenced = 0;
+ struct page_referenced_arg *pra = arg;
+ pmd_t *pmd;
+ pte_t *pte;
+ spinlock_t *ptl;
+
+ pte = page_check_address_transhuge(page, mm, address, &pmd, &ptl);
+ if (!pte)
+ return SWAP_AGAIN;
if (vma->vm_flags & VM_LOCKED) {
pte_unmap_unlock(pte, ptl);
- pra->vm_flags |= VM_LOCKED;
return SWAP_FAIL; /* To break the loop */
}
- if (ptep_clear_flush_young_notify(vma, address, pte)) {
- /*
- * Don't treat a reference through a sequentially read
- * mapping as such. If the page has been used in
- * another mapping, we will catch it; if this other
- * mapping is already gone, the unmap path will have
- * set PG_referenced or activated the page.
- */
- if (likely(!(vma->vm_flags & VM_SEQ_READ)))
+ if (pte == pmd) { /* trans huge */
+ if (pmdp_clear_flush_young_notify(vma, address, pmd))
referenced++;
+ } else {
+ if (ptep_clear_flush_young_notify(vma, address, pte)) {
+ /*
+ * Don't treat a reference through a sequentially read
+ * mapping as such. If the page has been used in
+ * another mapping, we will catch it; if this other
+ * mapping is already gone, the unmap path will have
+ * set PG_referenced or activated the page.
+ */
+ if (likely(!(vma->vm_flags & VM_SEQ_READ)))
+ referenced++;
+ }
}
pte_unmap_unlock(pte, ptl);
-found:
if (referenced)
clear_page_idle(page);
if (test_and_clear_page_young(page))
On Thu, Nov 05, 2015 at 03:07:26PM +0300, Vladimir Davydov wrote:
> @@ -849,30 +836,23 @@ static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
> if (pmd_page(*pmd) != page)
> goto unlock_pmd;
>
> - if (vma->vm_flags & VM_LOCKED) {
> - pra->vm_flags |= VM_LOCKED;
> - ret = SWAP_FAIL; /* To break the loop */
> - goto unlock_pmd;
> - }
> -
> - if (pmdp_clear_flush_young_notify(vma, address, pmd))
> - referenced++;
> - spin_unlock(ptl);
> + pte = (pte_t *)pmd;
pmd_t and pte_t are not always compatible. We shouldn't pretend they are.
And we shouldn't use pte_unmap_unlock() to unlock pmd table.
What about interface like this (I'm not sure about helper's name):
void page_check_address_transhuge(struct page *page, struct mm_struct *mm,
unsigned long address,
pmd_t **pmdp, pte_t **ptep,
spinlock_t **ptlp);
page_check_address_transhuge(page, mm, address, &pmd, &pte, &ptl);
if (pmd) {
/* handle pmd... */
} else if (pte) {
/* handle pte... */
} else {
return SWAP_AGAIN;
}
/* common stuff */
if (pmd)
spin_unlock(ptl);
else
pte_unmap_unlock(pte, ptl);
/* ... */
The helper shouldn't set pmd if the page is mapped with a pte.
--
Kirill A. Shutemov
On Thu, Nov 05, 2015 at 02:36:06PM +0200, Kirill A. Shutemov wrote:
> On Thu, Nov 05, 2015 at 03:07:26PM +0300, Vladimir Davydov wrote:
> > @@ -849,30 +836,23 @@ static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
> > if (pmd_page(*pmd) != page)
> > goto unlock_pmd;
> >
> > - if (vma->vm_flags & VM_LOCKED) {
> > - pra->vm_flags |= VM_LOCKED;
> > - ret = SWAP_FAIL; /* To break the loop */
> > - goto unlock_pmd;
> > - }
> > -
> > - if (pmdp_clear_flush_young_notify(vma, address, pmd))
> > - referenced++;
> > - spin_unlock(ptl);
> > + pte = (pte_t *)pmd;
>
> pmd_t and pte_t are not always compatible. We shouldn't pretend they are.
> And we shouldn't use pte_unmap_unlock() to unlock pmd table.
Out of curiosity, is it OK that __page_check_address can call
pte_unmap_unlock on pte returned by huge_pte_offset, which isn't really
pte, but pmd or pud?
>
> What about interface like this (I'm not sure about helper's name):
>
> void page_check_address_transhuge(struct page *page, struct mm_struct *mm,
> unsigned long address,
> pmd_t **pmdp, pte_t **ptep,
> spinlock_t **ptlp);
>
> page_check_address_transhuge(page, mm, address, &pmd, &pte, &ptl);
> if (pmd) {
> /* handle pmd... */
> } else if (pte) {
> /* handle pte... */
> } else {
> return SWAP_AGAIN;
> }
>
> /* common stuff */
>
> if (pmd)
> spin_unlock(ptl);
> else
> pte_unmap_unlock(pte, ptl);
spin_unlock(ptl);
if (pte)
pte_unmap(pte);
would look neater IMO. Other than that, I think it's OK. At least, it
looks better and less error-prone than duplicating such a huge chunk of
code IMO.
Thanks,
Vladimir
>
> /* ... */
>
> The helper shouldn't set pmd if the page is mapped with a pte.
On Thu, Nov 05, 2015 at 03:53:54PM +0300, Vladimir Davydov wrote:
> On Thu, Nov 05, 2015 at 02:36:06PM +0200, Kirill A. Shutemov wrote:
> > On Thu, Nov 05, 2015 at 03:07:26PM +0300, Vladimir Davydov wrote:
> > > @@ -849,30 +836,23 @@ static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
> > > if (pmd_page(*pmd) != page)
> > > goto unlock_pmd;
> > >
> > > - if (vma->vm_flags & VM_LOCKED) {
> > > - pra->vm_flags |= VM_LOCKED;
> > > - ret = SWAP_FAIL; /* To break the loop */
> > > - goto unlock_pmd;
> > > - }
> > > -
> > > - if (pmdp_clear_flush_young_notify(vma, address, pmd))
> > > - referenced++;
> > > - spin_unlock(ptl);
> > > + pte = (pte_t *)pmd;
> >
> > pmd_t and pte_t are not always compatible. We shouldn't pretend they are.
> > And we shouldn't use pte_unmap_unlock() to unlock pmd table.
>
> Out of curiosity, is it OK that __page_check_address can call
> pte_unmap_unlock on pte returned by huge_pte_offset, which isn't really
> pte, but pmd or pud?
hugetlb is usually implemented on architectures where you can expect some
level of compatibility between page table entries on different levels.
> > What about interface like this (I'm not sure about helper's name):
> >
> > void page_check_address_transhuge(struct page *page, struct mm_struct *mm,
> > unsigned long address,
> > pmd_t **pmdp, pte_t **ptep,
> > spinlock_t **ptlp);
> >
> > page_check_address_transhuge(page, mm, address, &pmd, &pte, &ptl);
> > if (pmd) {
> > /* handle pmd... */
> > } else if (pte) {
> > /* handle pte... */
> > } else {
> > return SWAP_AGAIN;
> > }
> >
> > /* common stuff */
> >
> > if (pmd)
> > spin_unlock(ptl);
> > else
> > pte_unmap_unlock(pte, ptl);
>
> spin_unlock(ptl);
> if (pte)
> pte_unmap(pte);
>
> would look neater IMO. Other than that, I think it's OK. At least, it
> looks better and less error-prone than duplicating such a huge chunk of
> code IMO.
Okay. Could you prepare the patch?
--
Kirill A. Shutemov
On Tue, Nov 03, 2015 at 05:26:15PM +0200, Kirill A. Shutemov wrote:
...
> @@ -812,60 +812,104 @@ static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
> spinlock_t *ptl;
> int referenced = 0;
> struct page_referenced_arg *pra = arg;
> + pgd_t *pgd;
> + pud_t *pud;
> + pmd_t *pmd;
> + pte_t *pte;
>
> - if (unlikely(PageTransHuge(page))) {
> - pmd_t *pmd;
> -
> - /*
> - * rmap might return false positives; we must filter
> - * these out using page_check_address_pmd().
> - */
> - pmd = page_check_address_pmd(page, mm, address, &ptl);
> - if (!pmd)
> + if (unlikely(PageHuge(page))) {
> + /* when pud is not present, pte will be NULL */
> + pte = huge_pte_offset(mm, address);
> + if (!pte)
> return SWAP_AGAIN;
>
> - if (vma->vm_flags & VM_LOCKED) {
> + ptl = huge_pte_lockptr(page_hstate(page), mm, pte);
> + goto check_pte;
> + }
> +
> + pgd = pgd_offset(mm, address);
> + if (!pgd_present(*pgd))
> + return SWAP_AGAIN;
> + pud = pud_offset(pgd, address);
> + if (!pud_present(*pud))
> + return SWAP_AGAIN;
> + pmd = pmd_offset(pud, address);
> +
> + if (pmd_trans_huge(*pmd)) {
> + int ret = SWAP_AGAIN;
> +
> + ptl = pmd_lock(mm, pmd);
> + if (!pmd_present(*pmd))
> + goto unlock_pmd;
> + if (unlikely(!pmd_trans_huge(*pmd))) {
> spin_unlock(ptl);
> + goto map_pte;
> + }
> +
> + if (pmd_page(*pmd) != page)
> + goto unlock_pmd;
> +
> + if (vma->vm_flags & VM_LOCKED) {
> pra->vm_flags |= VM_LOCKED;
> - return SWAP_FAIL; /* To break the loop */
> + ret = SWAP_FAIL; /* To break the loop */
> + goto unlock_pmd;
> }
>
> if (pmdp_clear_flush_young_notify(vma, address, pmd))
> referenced++;
> -
> spin_unlock(ptl);
> + goto found;
> +unlock_pmd:
> + spin_unlock(ptl);
> + return ret;
> } else {
> - pte_t *pte;
> -
> - /*
> - * rmap might return false positives; we must filter
> - * these out using page_check_address().
> - */
> - pte = page_check_address(page, mm, address, &ptl, 0);
> - if (!pte)
> + pmd_t pmde = *pmd;
> + barrier();
This is supposed to be
pmd_t pmde = READ_ONCE(*pmd);
Right?
I don't understand why we need a barrier here. Why can't we just do
} else if (!pmd_present(*pmd))
return SWAP_AGAIN;
?
Thanks,
Vladimir
> + if (!pmd_present(pmde) || pmd_trans_huge(pmde))
> return SWAP_AGAIN;
> + }
On Thu, Nov 05, 2015 at 02:58:38PM +0200, Kirill A. Shutemov wrote:
> Okay. Could you prepare the patch?
OK, give me some time.
Thanks,
Vladimir
On Thu, Nov 05, 2015 at 07:03:24PM +0300, Vladimir Davydov wrote:
> On Tue, Nov 03, 2015 at 05:26:15PM +0200, Kirill A. Shutemov wrote:
> ...
> > @@ -812,60 +812,104 @@ static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
> > spinlock_t *ptl;
> > int referenced = 0;
> > struct page_referenced_arg *pra = arg;
> > + pgd_t *pgd;
> > + pud_t *pud;
> > + pmd_t *pmd;
> > + pte_t *pte;
> >
> > - if (unlikely(PageTransHuge(page))) {
> > - pmd_t *pmd;
> > -
> > - /*
> > - * rmap might return false positives; we must filter
> > - * these out using page_check_address_pmd().
> > - */
> > - pmd = page_check_address_pmd(page, mm, address, &ptl);
> > - if (!pmd)
> > + if (unlikely(PageHuge(page))) {
> > + /* when pud is not present, pte will be NULL */
> > + pte = huge_pte_offset(mm, address);
> > + if (!pte)
> > return SWAP_AGAIN;
> >
> > - if (vma->vm_flags & VM_LOCKED) {
> > + ptl = huge_pte_lockptr(page_hstate(page), mm, pte);
> > + goto check_pte;
> > + }
> > +
> > + pgd = pgd_offset(mm, address);
> > + if (!pgd_present(*pgd))
> > + return SWAP_AGAIN;
> > + pud = pud_offset(pgd, address);
> > + if (!pud_present(*pud))
> > + return SWAP_AGAIN;
> > + pmd = pmd_offset(pud, address);
> > +
> > + if (pmd_trans_huge(*pmd)) {
> > + int ret = SWAP_AGAIN;
> > +
> > + ptl = pmd_lock(mm, pmd);
> > + if (!pmd_present(*pmd))
> > + goto unlock_pmd;
> > + if (unlikely(!pmd_trans_huge(*pmd))) {
> > spin_unlock(ptl);
> > + goto map_pte;
> > + }
> > +
> > + if (pmd_page(*pmd) != page)
> > + goto unlock_pmd;
> > +
> > + if (vma->vm_flags & VM_LOCKED) {
> > pra->vm_flags |= VM_LOCKED;
> > - return SWAP_FAIL; /* To break the loop */
> > + ret = SWAP_FAIL; /* To break the loop */
> > + goto unlock_pmd;
> > }
> >
> > if (pmdp_clear_flush_young_notify(vma, address, pmd))
> > referenced++;
> > -
> > spin_unlock(ptl);
> > + goto found;
> > +unlock_pmd:
> > + spin_unlock(ptl);
> > + return ret;
> > } else {
> > - pte_t *pte;
> > -
> > - /*
> > - * rmap might return false positives; we must filter
> > - * these out using page_check_address().
> > - */
> > - pte = page_check_address(page, mm, address, &ptl, 0);
> > - if (!pte)
> > + pmd_t pmde = *pmd;
> > + barrier();
>
> This is supposed to be
>
> pmd_t pmde = READ_ONCE(*pmd);
>
> Right?
See e37c69827063. If I read this correctly, barrier() has less overhead on
some archs.
>
> I don't understand why we need a barrier here. Why can't we just do
>
> } else if (!pmd_present(*pmd))
> return SWAP_AGAIN;
>
> ?
See f72e7dcdd252 too.
> > + if (!pmd_present(pmde) || pmd_trans_huge(pmde))
> > return SWAP_AGAIN;
> > + }
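To spell that out (a commented restatement of the hunk quoted above, not a
new change): the local copy plus the compiler barrier make both checks look
at one consistent value even if a parallel THP split rewrites *pmd under us:

	pmd_t pmde = *pmd;	/* take a single snapshot of the pmd entry */
	barrier();		/* don't let the compiler re-read *pmd below */

	/* both tests run against the same snapshot */
	if (!pmd_present(pmde) || pmd_trans_huge(pmde))
		return SWAP_AGAIN;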
--
Kirill A. Shutemov
On Tue, 3 Nov 2015 17:26:15 +0200 "Kirill A. Shutemov" <[email protected]> wrote:
> I've missed two similar codepaths which need some preparation to work well
> with the reworked THP refcounting.
>
> Both page_referenced() and page_idle_clear_pte_refs_one() assume that a
> THP can only be mapped with a PMD, so there's no reason to look at PTEs
> for PageTransHuge() pages. That's not true anymore: a THP can be mapped
> with PTEs too.
>
> The patch removes the PageTransHuge() test from these functions and
> open-codes the page table check.
x86_64 allnoconfig:
In file included from mm/rmap.c:47:
include/linux/mm.h: In function 'page_referenced':
include/linux/mm.h:448: error: call to '__compiletime_assert_448' declared with attribute error: BUILD_BUG failed
make[1]: *** [mm/rmap.o] Error 1
make: *** [mm/rmap.o] Error 2
because
#else /* CONFIG_TRANSPARENT_HUGEPAGE */
#define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
btw, total_mapcount() is far too large to be inlined and
page_mapcount() is getting pretty bad too.
On Thu, Nov 05, 2015 at 04:32:11PM -0800, Andrew Morton wrote:
> On Tue, 3 Nov 2015 17:26:15 +0200 "Kirill A. Shutemov" <[email protected]> wrote:
>
> > I've missed two similar codepaths which need some preparation to work well
> > with the reworked THP refcounting.
> >
> > Both page_referenced() and page_idle_clear_pte_refs_one() assume that a
> > THP can only be mapped with a PMD, so there's no reason to look at PTEs
> > for PageTransHuge() pages. That's not true anymore: a THP can be mapped
> > with PTEs too.
> >
> > The patch removes the PageTransHuge() test from these functions and
> > open-codes the page table check.
>
> x86_64 allnoconfig:
>
> In file included from mm/rmap.c:47:
> include/linux/mm.h: In function 'page_referenced':
> include/linux/mm.h:448: error: call to '__compiletime_assert_448' declared with attribute error: BUILD_BUG failed
> make[1]: *** [mm/rmap.o] Error 1
> make: *** [mm/rmap.o] Error 2
>
> because
>
> #else /* CONFIG_TRANSPARENT_HUGEPAGE */
> #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
>
>
> btw, total_mapcount() is far too large to be inlined and
The patch below is my proposal to fix this.
> page_mapcount() is getting pretty bad too.
Do you want me to uninline slow path (PageCompound())?
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a36f9fa4e4cd..f874d2a1d1a6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -432,24 +432,14 @@ static inline int page_mapcount(struct page *page)
return ret;
}
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+int total_mapcount(struct page *page);
+#else
static inline int total_mapcount(struct page *page)
{
- int i, ret;
-
- VM_BUG_ON_PAGE(PageTail(page), page);
-
- if (likely(!PageCompound(page)))
- return atomic_read(&page->_mapcount) + 1;
-
- ret = compound_mapcount(page);
- if (PageHuge(page))
- return ret;
- for (i = 0; i < HPAGE_PMD_NR; i++)
- ret += atomic_read(&page[i]._mapcount) + 1;
- if (PageDoubleMap(page))
- ret -= HPAGE_PMD_NR;
- return ret;
+ return page_mapcount(page);
}
+#endif
static inline int page_count(struct page *page)
{
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 14cbbad54a3e..287bc009bc10 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3236,6 +3236,25 @@ static void __split_huge_page(struct page *page, struct list_head *list)
}
}
+int total_mapcount(struct page *page)
+{
+ int i, ret;
+
+ VM_BUG_ON_PAGE(PageTail(page), page);
+
+ if (likely(!PageCompound(page)))
+ return atomic_read(&page->_mapcount) + 1;
+
+ ret = compound_mapcount(page);
+ if (PageHuge(page))
+ return ret;
+ for (i = 0; i < HPAGE_PMD_NR; i++)
+ ret += atomic_read(&page[i]._mapcount) + 1;
+ if (PageDoubleMap(page))
+ ret -= HPAGE_PMD_NR;
+ return ret;
+}
+
/*
* This function splits huge page into normal pages. @page can point to any
* subpage of huge page to split. Split doesn't change the position of @page.
--
Kirill A. Shutemov
page_referenced_one() and page_idle_clear_pte_refs_one() duplicate the
code for looking up the pte of a (possibly transhuge) page. Move this code
into a new helper function, page_check_address_transhuge(), and make the
above-mentioned functions use it.
This is just a cleanup; no functional changes are intended.
Signed-off-by: Vladimir Davydov <[email protected]>
---
include/linux/rmap.h | 8 ++++
mm/page_idle.c | 62 ++++-------------------------
mm/rmap.c | 110 ++++++++++++++++++++++++++++++---------------------
3 files changed, 81 insertions(+), 99 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 853f4f3c6742..7b7ce0282c2d 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -217,6 +217,14 @@ static inline pte_t *page_check_address(struct page *page, struct mm_struct *mm,
}
/*
+ * Used by idle page tracking to check if a page was referenced via page
+ * tables.
+ */
+bool page_check_address_transhuge(struct page *page, struct mm_struct *mm,
+ unsigned long address, pmd_t **pmdp,
+ pte_t **ptep, spinlock_t **ptlp);
+
+/*
* Used by swapoff to help locate where page is expected in vma.
*/
unsigned long page_address_in_vma(struct page *, struct vm_area_struct *);
diff --git a/mm/page_idle.c b/mm/page_idle.c
index 2c9ebe12b40d..374931f32ebc 100644
--- a/mm/page_idle.c
+++ b/mm/page_idle.c
@@ -55,70 +55,22 @@ static int page_idle_clear_pte_refs_one(struct page *page,
unsigned long addr, void *arg)
{
struct mm_struct *mm = vma->vm_mm;
- spinlock_t *ptl;
- pgd_t *pgd;
- pud_t *pud;
pmd_t *pmd;
pte_t *pte;
+ spinlock_t *ptl;
bool referenced = false;
- pgd = pgd_offset(mm, addr);
- if (!pgd_present(*pgd))
- return SWAP_AGAIN;
- pud = pud_offset(pgd, addr);
- if (!pud_present(*pud))
- return SWAP_AGAIN;
- pmd = pmd_offset(pud, addr);
-
- if (pmd_trans_huge(*pmd)) {
- ptl = pmd_lock(mm, pmd);
- if (!pmd_present(*pmd))
- goto unlock_pmd;
- if (unlikely(!pmd_trans_huge(*pmd))) {
- spin_unlock(ptl);
- goto map_pte;
- }
-
- if (pmd_page(*pmd) != page)
- goto unlock_pmd;
-
- referenced = pmdp_clear_young_notify(vma, addr, pmd);
- spin_unlock(ptl);
- goto found;
-unlock_pmd:
- spin_unlock(ptl);
+ if (!page_check_address_transhuge(page, mm, addr, &pmd, &pte, &ptl))
return SWAP_AGAIN;
- } else {
- pmd_t pmde = *pmd;
- barrier();
- if (!pmd_present(pmde) || pmd_trans_huge(pmde))
- return SWAP_AGAIN;
- }
-map_pte:
- pte = pte_offset_map(pmd, addr);
- if (!pte_present(*pte)) {
+ if (pte) {
+ referenced = ptep_clear_young_notify(vma, addr, pte);
pte_unmap(pte);
- return SWAP_AGAIN;
- }
-
- ptl = pte_lockptr(mm, pmd);
- spin_lock(ptl);
-
- if (!pte_present(*pte)) {
- pte_unmap_unlock(pte, ptl);
- return SWAP_AGAIN;
- }
+ } else
+ referenced = pmdp_clear_young_notify(vma, addr, pmd);
- /* THP can be referenced by any subpage */
- if (pte_pfn(*pte) - page_to_pfn(page) >= hpage_nr_pages(page)) {
- pte_unmap_unlock(pte, ptl);
- return SWAP_AGAIN;
- }
+ spin_unlock(ptl);
- referenced = ptep_clear_young_notify(vma, addr, pte);
- pte_unmap_unlock(pte, ptl);
-found:
if (referenced) {
clear_page_idle(page);
/*
diff --git a/mm/rmap.c b/mm/rmap.c
index 0837487d3737..7ac775e41820 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -796,48 +796,43 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma)
return 1;
}
-struct page_referenced_arg {
- int mapcount;
- int referenced;
- unsigned long vm_flags;
- struct mem_cgroup *memcg;
-};
/*
- * arg: page_referenced_arg will be passed
+ * Check that @page is mapped at @address into @mm. In contrast to
+ * page_check_address(), this function can handle transparent huge pages.
+ *
+ * On success returns true with pte mapped and locked. For transparent huge
+ * pages *@ptep is set to NULL.
*/
-static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
- unsigned long address, void *arg)
+bool page_check_address_transhuge(struct page *page, struct mm_struct *mm,
+ unsigned long address, pmd_t **pmdp,
+ pte_t **ptep, spinlock_t **ptlp)
{
- struct mm_struct *mm = vma->vm_mm;
- spinlock_t *ptl;
- int referenced = 0;
- struct page_referenced_arg *pra = arg;
pgd_t *pgd;
pud_t *pud;
pmd_t *pmd;
pte_t *pte;
+ spinlock_t *ptl;
if (unlikely(PageHuge(page))) {
/* when pud is not present, pte will be NULL */
pte = huge_pte_offset(mm, address);
if (!pte)
- return SWAP_AGAIN;
+ return false;
ptl = huge_pte_lockptr(page_hstate(page), mm, pte);
+ pmd = NULL;
goto check_pte;
}
pgd = pgd_offset(mm, address);
if (!pgd_present(*pgd))
- return SWAP_AGAIN;
+ return false;
pud = pud_offset(pgd, address);
if (!pud_present(*pud))
- return SWAP_AGAIN;
+ return false;
pmd = pmd_offset(pud, address);
if (pmd_trans_huge(*pmd)) {
- int ret = SWAP_AGAIN;
-
ptl = pmd_lock(mm, pmd);
if (!pmd_present(*pmd))
goto unlock_pmd;
@@ -849,30 +844,22 @@ static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
if (pmd_page(*pmd) != page)
goto unlock_pmd;
- if (vma->vm_flags & VM_LOCKED) {
- pra->vm_flags |= VM_LOCKED;
- ret = SWAP_FAIL; /* To break the loop */
- goto unlock_pmd;
- }
-
- if (pmdp_clear_flush_young_notify(vma, address, pmd))
- referenced++;
- spin_unlock(ptl);
+ pte = NULL;
goto found;
unlock_pmd:
spin_unlock(ptl);
- return ret;
+ return false;
} else {
pmd_t pmde = *pmd;
barrier();
if (!pmd_present(pmde) || pmd_trans_huge(pmde))
- return SWAP_AGAIN;
+ return false;
}
map_pte:
pte = pte_offset_map(pmd, address);
if (!pte_present(*pte)) {
pte_unmap(pte);
- return SWAP_AGAIN;
+ return false;
}
ptl = pte_lockptr(mm, pmd);
@@ -881,35 +868,70 @@ check_pte:
if (!pte_present(*pte)) {
pte_unmap_unlock(pte, ptl);
- return SWAP_AGAIN;
+ return false;
}
/* THP can be referenced by any subpage */
if (pte_pfn(*pte) - page_to_pfn(page) >= hpage_nr_pages(page)) {
pte_unmap_unlock(pte, ptl);
- return SWAP_AGAIN;
+ return false;
}
+found:
+ *ptep = pte;
+ *pmdp = pmd;
+ *ptlp = ptl;
+ return true;
+}
+
+struct page_referenced_arg {
+ int mapcount;
+ int referenced;
+ unsigned long vm_flags;
+ struct mem_cgroup *memcg;
+};
+/*
+ * arg: page_referenced_arg will be passed
+ */
+static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
+ unsigned long address, void *arg)
+{
+ struct mm_struct *mm = vma->vm_mm;
+ struct page_referenced_arg *pra = arg;
+ pmd_t *pmd;
+ pte_t *pte;
+ spinlock_t *ptl;
+ int referenced = 0;
+
+ if (!page_check_address_transhuge(page, mm, address, &pmd, &pte, &ptl))
+ return SWAP_AGAIN;
if (vma->vm_flags & VM_LOCKED) {
- pte_unmap_unlock(pte, ptl);
+ if (pte)
+ pte_unmap(pte);
+ spin_unlock(ptl);
pra->vm_flags |= VM_LOCKED;
return SWAP_FAIL; /* To break the loop */
}
- if (ptep_clear_flush_young_notify(vma, address, pte)) {
- /*
- * Don't treat a reference through a sequentially read
- * mapping as such. If the page has been used in
- * another mapping, we will catch it; if this other
- * mapping is already gone, the unmap path will have
- * set PG_referenced or activated the page.
- */
- if (likely(!(vma->vm_flags & VM_SEQ_READ)))
+ if (pte) {
+ if (ptep_clear_flush_young_notify(vma, address, pte)) {
+ /*
+ * Don't treat a reference through a sequentially read
+ * mapping as such. If the page has been used in
+ * another mapping, we will catch it; if this other
+ * mapping is already gone, the unmap path will have
+ * set PG_referenced or activated the page.
+ */
+ if (likely(!(vma->vm_flags & VM_SEQ_READ)))
+ referenced++;
+ }
+ pte_unmap(pte);
+ } else {
+ if (pmdp_clear_flush_young_notify(vma, address, pmd))
referenced++;
}
- pte_unmap_unlock(pte, ptl);
+ spin_unlock(ptl);
-found:
if (referenced)
clear_page_idle(page);
if (test_and_clear_page_young(page))
--
2.1.4
On Fri, Nov 06, 2015 at 05:37:07PM +0300, Vladimir Davydov wrote:
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 0837487d3737..7ac775e41820 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -796,48 +796,43 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma)
> return 1;
> }
>
> -struct page_referenced_arg {
> - int mapcount;
> - int referenced;
> - unsigned long vm_flags;
> - struct mem_cgroup *memcg;
> -};
> /*
> - * arg: page_referenced_arg will be passed
> + * Check that @page is mapped at @address into @mm. In contrast to
> + * page_check_address(), this function can handle transparent huge pages.
> + *
> + * On success returns true with pte mapped and locked. For transparent huge
> + * pages *@ptep is set to NULL.
I think
"For PMD-mapped transparent huge pages"...
would be more correct.
Otherwise looks great!
Reviewed-by: Kirill A. Shutemov <[email protected]>
--
Kirill A. Shutemov
On Fri, 6 Nov 2015 12:29:21 +0200 "Kirill A. Shutemov" <[email protected]> wrote:
> > page_mapcount() is getting pretty bad too.
>
> Do you want me to uninline slow path (PageCompound())?
I guess so. Uninlining all of page_mapcount() does this:
gcc-4.4.4:
text data bss dec hex filename
973702 273954 831512 2079168 1fb9c0 mm/built-in.o-before
970148 273954 831000 2075102 1fa9de mm/built-in.o-after
That's quite a bit of bloat.
I don't know why bss changed; this usually (always?) happens. Seems
bogus.
On Fri, Nov 06, 2015 at 02:39:00PM -0800, Andrew Morton wrote:
> On Fri, 6 Nov 2015 12:29:21 +0200 "Kirill A. Shutemov" <[email protected]> wrote:
>
> > > page_mapcount() is getting pretty bad too.
> >
> > Do you want me to uninline slow path (PageCompound())?
>
> I guess so. Uninlining all of page_mapcount() does this:
>
> gcc-4.4.4:
>
> text data bss dec hex filename
> 973702 273954 831512 2079168 1fb9c0 mm/built-in.o-before
> 970148 273954 831000 2075102 1fa9de mm/built-in.o-after
>
> That's quite a bit of bloat.
>
> I don't know why bss changed; this usually (always?) happens. Seems
> bogus.
Here it is.
From 4bd3af3b6b9498254bd71e8288721dcff641156c Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <[email protected]>
Date: Mon, 9 Nov 2015 01:34:08 +0200
Subject: [PATCH] mm: uninline slowpath of page_mapcount()
Let's move the compound-page part of page_mapcount() into mm/util.c.
make allyesconfig:
text data bss dec hex filename
188515051 153360535 85458720 427334306 19789aa2 vmlinux.o.before
188512917 153356439 85458720 427328076 1978824c vmlinux.o.after
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
include/linux/mm.h | 14 +++++---------
mm/util.c | 14 ++++++++++++++
2 files changed, 19 insertions(+), 9 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f874d2a1d1a6..72edbbec7b91 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -417,19 +417,15 @@ static inline void page_mapcount_reset(struct page *page)
atomic_set(&(page)->_mapcount, -1);
}
+int __page_mapcount(struct page *page);
+
static inline int page_mapcount(struct page *page)
{
- int ret;
VM_BUG_ON_PAGE(PageSlab(page), page);
- ret = atomic_read(&page->_mapcount) + 1;
- if (PageCompound(page)) {
- page = compound_head(page);
- ret += atomic_read(compound_mapcount_ptr(page)) + 1;
- if (PageDoubleMap(page))
- ret--;
- }
- return ret;
+ if (unlikely(PageCompound(page)))
+ return __page_mapcount(page);
+ return atomic_read(&page->_mapcount) + 1;
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/mm/util.c b/mm/util.c
index 902b65a43899..68535c0bb9da 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -376,6 +376,20 @@ struct address_space *page_mapping(struct page *page)
return mapping;
}
+/* Slow path of page_mapcount() for compound pages */
+int __page_mapcount(struct page *page)
+{
+ int ret;
+
+ page = compound_head(page);
+ ret = atomic_read(&page->_mapcount) + 1;
+ ret += atomic_read(compound_mapcount_ptr(page)) + 1;
+ if (PageDoubleMap(page))
+ ret--;
+ return ret;
+}
+EXPORT_SYMBOL_GPL(__page_mapcount);
+
int overcommit_ratio_handler(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos)
--
Kirill A. Shutemov