2016-03-03 07:42:05

by Naoya Horiguchi

Subject: [PATCH v1 00/11] mm: page migration enhancement for thp

Hi everyone,

This patchset enhances page migration functionality to handle thp migration
for various page migration's callers:
- mbind(2)
- move_pages(2)
- migrate_pages(2)
- cgroup/cpuset migration
- memory hotremove
- soft offline

The main benefit is that we can avoid unnecessary thp splits, which helps avoid
a performance decrease when applications handle NUMA optimization on their own.

The implementation is similar to that of normal page migration; the key point is
that we convert a pmd into a pmd migration entry, encoded in a swap-entry-like
format. Note that pmd_present() is not a simple present-bit check, and it's not
enough by itself to determine whether a given pmd is a pmd migration entry.
See patches 3/11 and 5/11 for details.
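
To make that concrete, the detection this series relies on boils down to the
following sketch (helper names are the ones introduced in patches 2/11 and 3/11;
this is an illustration, not a hunk from the series):

	/*
	 * A huge pmd under migration has _PAGE_PRESENT cleared, but
	 * pmd_present() can still return true because of _PAGE_PSE, so the
	 * bare present bit has to be combined with the swap-entry check.
	 */
	static inline int is_pmd_migration_entry(pmd_t pmd)
	{
		return !__pmd_present(pmd) &&
		       is_migration_entry(pmd_to_swp_entry(pmd));
	}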

Here are some topics which might be helpful to start the discussion:

- at this point, this functionality is limited to x86_64.

- there's already an implementation of thp migration in the autonuma code;
this patchset doesn't touch it because it works fine as it is.

- fallback to thp split: the current implementation simply fails a migration attempt
if thp migration fails. It's possible to retry migration after splitting the thp,
but that's not included in this version.

Any comments or advice are welcome.

Thanks,
Naoya Horiguchi
---
Summary:

Naoya Horiguchi (11):
mm: mempolicy: add queue_pages_node_check()
mm: thp: introduce CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
mm: thp: add helpers related to thp/pmd migration
mm: thp: enable thp migration in generic path
mm: thp: check pmd migration entry in common path
mm: soft-dirty: keep soft-dirty bits over thp migration
mm: hwpoison: fix race between unpoisoning and freeing migrate source page
mm: hwpoison: soft offline supports thp migration
mm: mempolicy: mbind and migrate_pages support thp migration
mm: migrate: move_pages() supports thp migration
mm: memory_hotplug: memory hotremove supports thp migration

arch/x86/Kconfig | 4 +
arch/x86/include/asm/pgtable.h | 28 ++++++
arch/x86/include/asm/pgtable_64.h | 2 +
arch/x86/include/asm/pgtable_types.h | 8 +-
arch/x86/mm/gup.c | 3 +
fs/proc/task_mmu.c | 25 +++--
include/asm-generic/pgtable.h | 34 ++++++-
include/linux/huge_mm.h | 17 ++++
include/linux/swapops.h | 64 +++++++++++++
mm/Kconfig | 3 +
mm/gup.c | 8 ++
mm/huge_memory.c | 175 +++++++++++++++++++++++++++++++++--
mm/memcontrol.c | 2 +
mm/memory-failure.c | 41 ++++----
mm/memory.c | 5 +
mm/memory_hotplug.c | 8 ++
mm/mempolicy.c | 110 ++++++++++++++++------
mm/migrate.c | 57 +++++++++---
mm/page_isolation.c | 8 ++
mm/rmap.c | 7 +-
20 files changed, 527 insertions(+), 82 deletions(-)


2016-03-03 07:42:08

by Naoya Horiguchi

Subject: [PATCH v1 01/11] mm: mempolicy: add queue_pages_node_check()

Introduce a separate check routine for the MPOL_MF_INVERT flag. This patch is
just a cleanup; there is no behavioral change.

Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/mempolicy.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)

diff --git v4.5-rc5-mmotm-2016-02-24-16-18/mm/mempolicy.c v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/mempolicy.c
index 8c5fd08..840a0ad 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/mm/mempolicy.c
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/mempolicy.c
@@ -478,6 +478,15 @@ struct queue_pages {
struct vm_area_struct *prev;
};

+static inline bool queue_pages_node_check(struct page *page,
+ struct queue_pages *qp)
+{
+ int nid = page_to_nid(page);
+ unsigned long flags = qp->flags;
+
+ return node_isset(nid, *qp->nmask) == !!(flags & MPOL_MF_INVERT);
+}
+
/*
* Scan through pages checking if pages follow certain conditions,
* and move them to the pagelist if they do.
@@ -529,8 +538,7 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
*/
if (PageReserved(page))
continue;
- nid = page_to_nid(page);
- if (node_isset(nid, *qp->nmask) == !!(flags & MPOL_MF_INVERT))
+ if (queue_pages_node_check(page, qp))
continue;
if (PageTail(page) && PageAnon(page)) {
get_page(page);
@@ -562,7 +570,6 @@ static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask,
#ifdef CONFIG_HUGETLB_PAGE
struct queue_pages *qp = walk->private;
unsigned long flags = qp->flags;
- int nid;
struct page *page;
spinlock_t *ptl;
pte_t entry;
@@ -572,8 +579,7 @@ static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask,
if (!pte_present(entry))
goto unlock;
page = pte_page(entry);
- nid = page_to_nid(page);
- if (node_isset(nid, *qp->nmask) == !!(flags & MPOL_MF_INVERT))
+ if (queue_pages_node_check(page, qp))
goto unlock;
/* With MPOL_MF_MOVE, we migrate only unshared hugepage. */
if (flags & (MPOL_MF_MOVE_ALL) ||
--
2.7.0

2016-03-03 07:42:21

by Naoya Horiguchi

Subject: [PATCH v1 03/11] mm: thp: add helpers related to thp/pmd migration

This patch prepares thp migration's core code. This code will be exercised once
unmap_and_move() stops unconditionally splitting thps and get_new_page() callbacks
start to allocate destination thps.

Signed-off-by: Naoya Horiguchi <[email protected]>
---
arch/x86/include/asm/pgtable.h | 11 ++++++
arch/x86/include/asm/pgtable_64.h | 2 +
include/linux/swapops.h | 62 +++++++++++++++++++++++++++++++
mm/huge_memory.c | 78 +++++++++++++++++++++++++++++++++++++++
mm/migrate.c | 23 ++++++++++++
5 files changed, 176 insertions(+)

diff --git v4.5-rc5-mmotm-2016-02-24-16-18/arch/x86/include/asm/pgtable.h v4.5-rc5-mmotm-2016-02-24-16-18_patched/arch/x86/include/asm/pgtable.h
index 0687c47..0df9afe 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/arch/x86/include/asm/pgtable.h
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/arch/x86/include/asm/pgtable.h
@@ -515,6 +515,17 @@ static inline int pmd_present(pmd_t pmd)
return pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE);
}

+/*
+ * Unlike pmd_present(), __pmd_present() checks only _PAGE_PRESENT bit.
+ * Combined with is_migration_entry(), this routine is used to detect pmd
+ * migration entries. To make it work fine, callers should make sure that
+ * pmd_trans_huge() returns true beforehand.
+ */
+static inline int __pmd_present(pmd_t pmd)
+{
+ return pmd_flags(pmd) & _PAGE_PRESENT;
+}
+
#ifdef CONFIG_NUMA_BALANCING
/*
* These work without NUMA balancing but the kernel does not care. See the
diff --git v4.5-rc5-mmotm-2016-02-24-16-18/arch/x86/include/asm/pgtable_64.h v4.5-rc5-mmotm-2016-02-24-16-18_patched/arch/x86/include/asm/pgtable_64.h
index 2ee7811..df869d0 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/arch/x86/include/asm/pgtable_64.h
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/arch/x86/include/asm/pgtable_64.h
@@ -153,7 +153,9 @@ static inline int pgd_large(pgd_t pgd) { return 0; }
((type) << (_PAGE_BIT_PRESENT + 1)) \
| ((offset) << SWP_OFFSET_SHIFT) })
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val((pte)) })
+#define __pmd_to_swp_entry(pte) ((swp_entry_t) { pmd_val((pmd)) })
#define __swp_entry_to_pte(x) ((pte_t) { .pte = (x).val })
+#define __swp_entry_to_pmd(x) ((pmd_t) { .pmd = (x).val })

extern int kern_addr_valid(unsigned long addr);
extern void cleanup_highmap(void);
diff --git v4.5-rc5-mmotm-2016-02-24-16-18/include/linux/swapops.h v4.5-rc5-mmotm-2016-02-24-16-18_patched/include/linux/swapops.h
index 5c3a5f3..b402a2c 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/include/linux/swapops.h
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/include/linux/swapops.h
@@ -163,6 +163,68 @@ static inline int is_write_migration_entry(swp_entry_t entry)

#endif

+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+extern int set_pmd_migration_entry(struct page *page,
+ struct mm_struct *mm, unsigned long address);
+
+extern int remove_migration_pmd(struct page *new,
+ struct vm_area_struct *vma, unsigned long addr, void *old);
+
+extern void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd);
+
+static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
+{
+ swp_entry_t arch_entry;
+
+ arch_entry = __pmd_to_swp_entry(pmd);
+ return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
+}
+
+static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
+{
+ swp_entry_t arch_entry;
+
+ arch_entry = __swp_entry(swp_type(entry), swp_offset(entry));
+ return __swp_entry_to_pmd(arch_entry);
+}
+
+static inline int is_pmd_migration_entry(pmd_t pmd)
+{
+ return !__pmd_present(pmd) && is_migration_entry(pmd_to_swp_entry(pmd));
+}
+#else
+static inline int set_pmd_migration_entry(struct page *page,
+ struct mm_struct *mm, unsigned long address)
+{
+ return 0;
+}
+
+static inline int remove_migration_pmd(struct page *new,
+ struct vm_area_struct *vma, unsigned long addr, void *old)
+{
+ return 0;
+}
+
+static inline void pmd_migration_entry_wait(struct mm_struct *m, pmd_t *p) { }
+
+static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
+{
+ return swp_entry(0, 0);
+}
+
+static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
+{
+ pmd_t pmd = {};
+
+ return pmd;
+}
+
+static inline int is_pmd_migration_entry(pmd_t pmd)
+{
+ return 0;
+}
+#endif
+
#ifdef CONFIG_MEMORY_FAILURE

extern atomic_long_t num_poisoned_pages __read_mostly;
diff --git v4.5-rc5-mmotm-2016-02-24-16-18/mm/huge_memory.c v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/huge_memory.c
index 46ad357..c6d5406 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/mm/huge_memory.c
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/huge_memory.c
@@ -3657,3 +3657,81 @@ static int __init split_huge_pages_debugfs(void)
}
late_initcall(split_huge_pages_debugfs);
#endif
+
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+int set_pmd_migration_entry(struct page *page, struct mm_struct *mm,
+ unsigned long addr)
+{
+ pte_t *pte;
+ pmd_t *pmd;
+ pmd_t pmdval;
+ pmd_t pmdswp;
+ swp_entry_t entry;
+ spinlock_t *ptl;
+
+ mmu_notifier_invalidate_range_start(mm, addr, addr + HPAGE_PMD_SIZE);
+ if (!page_check_address_transhuge(page, mm, addr, &pmd, &pte, &ptl))
+ goto out;
+ if (pte)
+ goto out;
+ pmdval = pmdp_huge_get_and_clear(mm, addr, pmd);
+ entry = make_migration_entry(page, pmd_write(pmdval));
+ pmdswp = swp_entry_to_pmd(entry);
+ pmdswp = pmd_mkhuge(pmdswp);
+ set_pmd_at(mm, addr, pmd, pmdswp);
+ page_remove_rmap(page, true);
+ page_cache_release(page);
+ spin_unlock(ptl);
+out:
+ mmu_notifier_invalidate_range_end(mm, addr, addr + HPAGE_PMD_SIZE);
+ return SWAP_AGAIN;
+}
+
+int remove_migration_pmd(struct page *new, struct vm_area_struct *vma,
+ unsigned long addr, void *old)
+{
+ struct mm_struct *mm = vma->vm_mm;
+ spinlock_t *ptl;
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd;
+ pmd_t pmde;
+ swp_entry_t entry;
+ unsigned long mmun_start = addr & HPAGE_PMD_MASK;
+ unsigned long mmun_end = mmun_start + HPAGE_PMD_SIZE;
+
+ pgd = pgd_offset(mm, addr);
+ if (!pgd_present(*pgd))
+ goto out;
+ pud = pud_offset(pgd, addr);
+ if (!pud_present(*pud))
+ goto out;
+ pmd = pmd_offset(pud, addr);
+ if (!pmd)
+ goto out;
+ ptl = pmd_lock(mm, pmd);
+ pmde = *pmd;
+ barrier();
+ if (!is_pmd_migration_entry(pmde))
+ goto unlock_ptl;
+ entry = pmd_to_swp_entry(pmde);
+ if (migration_entry_to_page(entry) != old)
+ goto unlock_ptl;
+ get_page(new);
+ pmde = mk_huge_pmd(new, vma->vm_page_prot);
+ if (is_write_migration_entry(entry))
+ pmde = maybe_pmd_mkwrite(pmde, vma);
+ flush_cache_range(vma, mmun_start, mmun_end);
+ page_add_anon_rmap(new, vma, mmun_start, true);
+ pmdp_huge_clear_flush_notify(vma, mmun_start, pmd);
+ set_pmd_at(mm, mmun_start, pmd, pmde);
+ flush_tlb_range(vma, mmun_start, mmun_end);
+ if (vma->vm_flags & VM_LOCKED)
+ mlock_vma_page(new);
+ update_mmu_cache_pmd(vma, addr, pmd);
+unlock_ptl:
+ spin_unlock(ptl);
+out:
+ return SWAP_AGAIN;
+}
+#endif
diff --git v4.5-rc5-mmotm-2016-02-24-16-18/mm/migrate.c v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/migrate.c
index 577c94b..14164f6 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/mm/migrate.c
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/migrate.c
@@ -118,6 +118,8 @@ static int remove_migration_pte(struct page *new, struct vm_area_struct *vma,
if (!ptep)
goto out;
ptl = huge_pte_lockptr(hstate_vma(vma), mm, ptep);
+ } else if (PageTransHuge(new)) {
+ return remove_migration_pmd(new, vma, addr, old);
} else {
pmd = mm_find_pmd(mm, addr);
if (!pmd)
@@ -252,6 +254,27 @@ void migration_entry_wait_huge(struct vm_area_struct *vma,
__migration_entry_wait(mm, pte, ptl);
}

+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd)
+{
+ spinlock_t *ptl;
+ struct page *page;
+
+ ptl = pmd_lock(mm, pmd);
+ if (!is_pmd_migration_entry(*pmd))
+ goto unlock;
+ page = migration_entry_to_page(pmd_to_swp_entry(*pmd));
+ if (!get_page_unless_zero(page))
+ goto unlock;
+ spin_unlock(ptl);
+ wait_on_page_locked(page);
+ put_page(page);
+ return;
+unlock:
+ spin_unlock(ptl);
+}
+#endif
+
#ifdef CONFIG_BLOCK
/* Returns true if all buffers are successfully locked */
static bool buffer_migrate_lock_buffers(struct buffer_head *head,
--
2.7.0

2016-03-03 07:42:13

by Naoya Horiguchi

Subject: [PATCH v1 02/11] mm: thp: introduce CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION

Introduces CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION to limit thp migration
functionality to x86_64, which should be safer at the first step.
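
For illustration, the helper is meant to let get_new_page() callbacks decide
between allocating a destination thp and falling back to base pages; the sketch
below condenses the allocation pattern that patches 8/11 through 11/11 add (not a
hunk from this patch):

	if (thp_migration_supported() && PageTransHuge(page)) {
		struct page *thp = alloc_pages_node(nid,
				(GFP_TRANSHUGE | __GFP_THISNODE) & ~__GFP_RECLAIM,
				HPAGE_PMD_ORDER);
		if (!thp)
			return NULL;
		prep_transhuge_page(thp);
		return thp;
	}
	/* otherwise allocate a base page and let the thp be split as before */
	return __alloc_pages_node(nid, GFP_HIGHUSER_MOVABLE | __GFP_THISNODE, 0);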

Signed-off-by: Naoya Horiguchi <[email protected]>
---
arch/x86/Kconfig | 4 ++++
include/linux/huge_mm.h | 14 ++++++++++++++
mm/Kconfig | 3 +++
3 files changed, 21 insertions(+)

diff --git v4.5-rc5-mmotm-2016-02-24-16-18/arch/x86/Kconfig v4.5-rc5-mmotm-2016-02-24-16-18_patched/arch/x86/Kconfig
index 993aca4..7a563cf 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/arch/x86/Kconfig
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/arch/x86/Kconfig
@@ -2198,6 +2198,10 @@ config ARCH_ENABLE_HUGEPAGE_MIGRATION
def_bool y
depends on X86_64 && HUGETLB_PAGE && MIGRATION

+config ARCH_ENABLE_THP_MIGRATION
+ def_bool y
+ depends on X86_64 && TRANSPARENT_HUGEPAGE && MIGRATION
+
menu "Power management and ACPI options"

config ARCH_HIBERNATION_HEADER
diff --git v4.5-rc5-mmotm-2016-02-24-16-18/include/linux/huge_mm.h v4.5-rc5-mmotm-2016-02-24-16-18_patched/include/linux/huge_mm.h
index 459fd25..09b215d 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/include/linux/huge_mm.h
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/include/linux/huge_mm.h
@@ -156,6 +156,15 @@ static inline bool is_huge_zero_pmd(pmd_t pmd)

struct page *get_huge_zero_page(void);

+static inline bool thp_migration_supported(void)
+{
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+ return true;
+#else
+ return false;
+#endif
+}
+
#else /* CONFIG_TRANSPARENT_HUGEPAGE */
#define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
#define HPAGE_PMD_MASK ({ BUILD_BUG(); 0; })
@@ -213,6 +222,11 @@ static inline struct page *follow_devmap_pmd(struct vm_area_struct *vma,
{
return NULL;
}
+
+static inline bool thp_migration_supported(void)
+{
+ return false;
+}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */

#endif /* _LINUX_HUGE_MM_H */
diff --git v4.5-rc5-mmotm-2016-02-24-16-18/mm/Kconfig v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/Kconfig
index f2c1a07..64e7ab6 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/mm/Kconfig
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/Kconfig
@@ -265,6 +265,9 @@ config MIGRATION
config ARCH_ENABLE_HUGEPAGE_MIGRATION
bool

+config ARCH_ENABLE_THP_MIGRATION
+ bool
+
config PHYS_ADDR_T_64BIT
def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT

--
2.7.0

2016-03-03 07:42:28

by Naoya Horiguchi

Subject: [PATCH v1 06/11] mm: soft-dirty: keep soft-dirty bits over thp migration

The soft-dirty bit is designed to be preserved over page migration, so this patch
preserves it over thp migration too.

This patch also relocates the _PAGE_SWP_SOFT_DIRTY bit, because the current bit is
needed for thp migration (i.e. both _PAGE_PSE and _PAGE_PRESENT are used to detect
pmd migration entries.) When soft-dirty was introduced, bit 6 was used for
nonlinear file mapping, but now that feature has been replaced with emulation, so
we can relocate _PAGE_SWP_SOFT_DIRTY to bit 6.
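
Condensed, the bit travels like this during a thp migration (a sketch of the
huge_memory.c hunks below, not literal code; "migration_pmd" is a stand-in name
for the non-present pmd read back under the lock):

	/* unmap side (set_pmd_migration_entry): present pmd -> migration entry */
	if (pmd_soft_dirty(pmdval))
		pmdswp = pmd_swp_mksoft_dirty(pmdswp);

	/* remap side (remove_migration_pmd): migration entry -> new present pmd */
	if (pmd_swp_soft_dirty(migration_pmd))
		pmde = pmd_mksoft_dirty(pmde);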

Signed-off-by: Naoya Horiguchi <[email protected]>
---
arch/x86/include/asm/pgtable.h | 17 +++++++++++++++++
arch/x86/include/asm/pgtable_types.h | 8 ++++----
include/asm-generic/pgtable.h | 34 +++++++++++++++++++++++++++++++++-
include/linux/swapops.h | 2 ++
mm/huge_memory.c | 33 +++++++++++++++++++++++++++++++--
5 files changed, 87 insertions(+), 7 deletions(-)

diff --git v4.5-rc5-mmotm-2016-02-24-16-18/arch/x86/include/asm/pgtable.h v4.5-rc5-mmotm-2016-02-24-16-18_patched/arch/x86/include/asm/pgtable.h
index 0df9afe..e3da9fe 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/arch/x86/include/asm/pgtable.h
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/arch/x86/include/asm/pgtable.h
@@ -920,6 +920,23 @@ static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
{
return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
}
+
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
+{
+ return pmd_set_flags(pmd, _PAGE_SWP_SOFT_DIRTY);
+}
+
+static inline int pmd_swp_soft_dirty(pmd_t pmd)
+{
+ return pmd_flags(pmd) & _PAGE_SWP_SOFT_DIRTY;
+}
+
+static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t pmd)
+{
+ return pmd_clear_flags(pmd, _PAGE_SWP_SOFT_DIRTY);
+}
+#endif
#endif

#include <asm-generic/pgtable.h>
diff --git v4.5-rc5-mmotm-2016-02-24-16-18/arch/x86/include/asm/pgtable_types.h v4.5-rc5-mmotm-2016-02-24-16-18_patched/arch/x86/include/asm/pgtable_types.h
index 4432ab7..a5d5e43 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/arch/x86/include/asm/pgtable_types.h
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/arch/x86/include/asm/pgtable_types.h
@@ -71,14 +71,14 @@
* Tracking soft dirty bit when a page goes to a swap is tricky.
* We need a bit which can be stored in pte _and_ not conflict
* with swap entry format. On x86 bits 6 and 7 are *not* involved
- * into swap entry computation, but bit 6 is used for nonlinear
- * file mapping, so we borrow bit 7 for soft dirty tracking.
+ * into swap entry computation, but bit 7 is used for thp migration,
+ * so we borrow bit 6 for soft dirty tracking.
*
* Please note that this bit must be treated as swap dirty page
- * mark if and only if the PTE has present bit clear!
+ * mark if and only if the PTE/PMD has present bit clear!
*/
#ifdef CONFIG_MEM_SOFT_DIRTY
-#define _PAGE_SWP_SOFT_DIRTY _PAGE_PSE
+#define _PAGE_SWP_SOFT_DIRTY _PAGE_DIRTY
#else
#define _PAGE_SWP_SOFT_DIRTY (_AT(pteval_t, 0))
#endif
diff --git v4.5-rc5-mmotm-2016-02-24-16-18/include/asm-generic/pgtable.h v4.5-rc5-mmotm-2016-02-24-16-18_patched/include/asm-generic/pgtable.h
index 9401f48..1b0d610 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/include/asm-generic/pgtable.h
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/include/asm-generic/pgtable.h
@@ -489,7 +489,24 @@ static inline void ptep_modify_prot_commit(struct mm_struct *mm,
#define arch_start_context_switch(prev) do {} while (0)
#endif

-#ifndef CONFIG_HAVE_ARCH_SOFT_DIRTY
+#ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
+#ifndef CONFIG_ARCH_ENABLE_THP_MIGRATION
+static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
+{
+ return pmd;
+}
+
+static inline int pmd_swp_soft_dirty(pmd_t pmd)
+{
+ return 0;
+}
+
+static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t pmd)
+{
+ return pmd;
+}
+#endif
+#else /* !CONFIG_HAVE_ARCH_SOFT_DIRTY */
static inline int pte_soft_dirty(pte_t pte)
{
return 0;
@@ -534,6 +551,21 @@ static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
{
return pte;
}
+
+static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
+{
+ return pmd;
+}
+
+static inline int pmd_swp_soft_dirty(pmd_t pmd)
+{
+ return 0;
+}
+
+static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t pmd)
+{
+ return pmd;
+}
#endif

#ifndef __HAVE_PFNMAP_TRACKING
diff --git v4.5-rc5-mmotm-2016-02-24-16-18/include/linux/swapops.h v4.5-rc5-mmotm-2016-02-24-16-18_patched/include/linux/swapops.h
index b402a2c..18f3744 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/include/linux/swapops.h
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/include/linux/swapops.h
@@ -176,6 +176,8 @@ static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
{
swp_entry_t arch_entry;

+ if (pmd_swp_soft_dirty(pmd))
+ pmd = pmd_swp_clear_soft_dirty(pmd);
arch_entry = __pmd_to_swp_entry(pmd);
return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
}
diff --git v4.5-rc5-mmotm-2016-02-24-16-18/mm/huge_memory.c v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/huge_memory.c
index 7120036..a3f98ea 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/mm/huge_memory.c
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/huge_memory.c
@@ -1113,6 +1113,8 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
if (is_write_migration_entry(entry)) {
make_migration_entry_read(&entry);
pmd = swp_entry_to_pmd(entry);
+ if (pmd_swp_soft_dirty(pmd))
+ pmd = pmd_swp_mksoft_dirty(pmd);
set_pmd_at(src_mm, addr, src_pmd, pmd);
}
set_pmd_at(dst_mm, addr, dst_pmd, pmd);
@@ -1733,6 +1735,17 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
return 1;
}

+static pmd_t move_soft_dirty_pmd(pmd_t pmd)
+{
+#ifdef CONFIG_MEM_SOFT_DIRTY
+ if (unlikely(is_pmd_migration_entry(pmd)))
+ pmd = pmd_mksoft_dirty(pmd);
+ else if (pmd_present(pmd))
+ pmd = pmd_swp_mksoft_dirty(pmd);
+#endif
+ return pmd;
+}
+
bool move_huge_pmd(struct vm_area_struct *vma, struct vm_area_struct *new_vma,
unsigned long old_addr,
unsigned long new_addr, unsigned long old_end,
@@ -1776,7 +1789,8 @@ bool move_huge_pmd(struct vm_area_struct *vma, struct vm_area_struct *new_vma,
pgtable = pgtable_trans_huge_withdraw(mm, old_pmd);
pgtable_trans_huge_deposit(mm, new_pmd, pgtable);
}
- set_pmd_at(mm, new_addr, new_pmd, pmd_mksoft_dirty(pmd));
+ pmd = move_soft_dirty_pmd(pmd);
+ set_pmd_at(mm, new_addr, new_pmd, pmd);
if (new_ptl != old_ptl)
spin_unlock(new_ptl);
spin_unlock(old_ptl);
@@ -1815,6 +1829,17 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
}

if (is_pmd_migration_entry(*pmd)) {
+ swp_entry_t entry = pmd_to_swp_entry(*pmd);
+
+ if (is_write_migration_entry(entry)) {
+ pmd_t newpmd;
+
+ make_migration_entry_read(&entry);
+ newpmd = swp_entry_to_pmd(entry);
+ if (pmd_swp_soft_dirty(newpmd))
+ newpmd = pmd_swp_mksoft_dirty(newpmd);
+ set_pmd_at(mm, addr, pmd, newpmd);
+ }
spin_unlock(ptl);
return ret;
}
@@ -3730,6 +3755,8 @@ int set_pmd_migration_entry(struct page *page, struct mm_struct *mm,
entry = make_migration_entry(page, pmd_write(pmdval));
pmdswp = swp_entry_to_pmd(entry);
pmdswp = pmd_mkhuge(pmdswp);
+ if (pmd_soft_dirty(pmdval))
+ pmdswp = pmd_swp_mksoft_dirty(pmdswp);
set_pmd_at(mm, addr, pmd, pmdswp);
page_remove_rmap(page, true);
page_cache_release(page);
@@ -3770,7 +3797,9 @@ int remove_migration_pmd(struct page *new, struct vm_area_struct *vma,
if (migration_entry_to_page(entry) != old)
goto unlock_ptl;
get_page(new);
- pmde = mk_huge_pmd(new, vma->vm_page_prot);
+ pmde = pmd_mkold(mk_huge_pmd(new, vma->vm_page_prot));
+ if (pmd_swp_soft_dirty(pmde))
+ pmde = pmd_mksoft_dirty(pmde);
if (is_write_migration_entry(entry))
pmde = maybe_pmd_mkwrite(pmde, vma);
flush_cache_range(vma, mmun_start, mmun_end);
--
2.7.0

2016-03-03 07:42:38

by Naoya Horiguchi

Subject: [PATCH v1 09/11] mm: mempolicy: mbind and migrate_pages support thp migration

This patch enables thp migration for mbind(2) and migrate_pages(2).

Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/mempolicy.c | 94 ++++++++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 72 insertions(+), 22 deletions(-)

diff --git v4.5-rc5-mmotm-2016-02-24-16-18/mm/mempolicy.c v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/mempolicy.c
index 840a0ad..a9754dd 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/mm/mempolicy.c
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/mempolicy.c
@@ -94,6 +94,7 @@
#include <linux/mm_inline.h>
#include <linux/mmu_notifier.h>
#include <linux/printk.h>
+#include <linux/swapops.h>

#include <asm/tlbflush.h>
#include <asm/uaccess.h>
@@ -487,6 +488,49 @@ static inline bool queue_pages_node_check(struct page *page,
return node_isset(nid, *qp->nmask) == !!(flags & MPOL_MF_INVERT);
}

+static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
+ unsigned long end, struct mm_walk *walk)
+{
+ int ret = 0;
+ struct page *page;
+ struct queue_pages *qp = walk->private;
+ unsigned long flags;
+
+ if (unlikely(is_pmd_migration_entry(*pmd))) {
+ ret = 1;
+ goto unlock;
+ }
+ page = pmd_page(*pmd);
+ if (is_huge_zero_page(page)) {
+ spin_unlock(ptl);
+ split_huge_pmd(walk->vma, pmd, addr);
+ goto out;
+ }
+ if ((end - addr != HPAGE_PMD_SIZE) || !thp_migration_supported()) {
+ get_page(page);
+ spin_unlock(ptl);
+ lock_page(page);
+ ret = split_huge_page(page);
+ unlock_page(page);
+ put_page(page);
+ goto out;
+ }
+ if (queue_pages_node_check(page, qp)) {
+ ret = 1;
+ goto unlock;
+ }
+
+ ret = 1;
+ flags = qp->flags;
+ /* go to thp migration */
+ if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
+ migrate_page_add(page, qp->pagelist, flags);
+unlock:
+ spin_unlock(ptl);
+out:
+ return ret;
+}
+
/*
* Scan through pages checking if pages follow certain conditions,
* and move them to the pagelist if they do.
@@ -498,32 +542,19 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
struct page *page;
struct queue_pages *qp = walk->private;
unsigned long flags = qp->flags;
- int nid, ret;
+ int ret;
pte_t *pte;
spinlock_t *ptl;

- if (pmd_trans_huge(*pmd)) {
- ptl = pmd_lock(walk->mm, pmd);
- if (pmd_trans_huge(*pmd)) {
- page = pmd_page(*pmd);
- if (is_huge_zero_page(page)) {
- spin_unlock(ptl);
- split_huge_pmd(vma, pmd, addr);
- } else {
- get_page(page);
- spin_unlock(ptl);
- lock_page(page);
- ret = split_huge_page(page);
- unlock_page(page);
- put_page(page);
- if (ret)
- return 0;
- }
- } else {
- spin_unlock(ptl);
- }
+ ptl = pmd_trans_huge_lock(pmd, vma);
+ if (ptl) {
+ ret = queue_pages_pmd(pmd, ptl, addr, end, walk);
+ if (ret)
+ return 0;
}

+ if (pmd_trans_unstable(pmd))
+ return 0;
retry:
pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
for (; addr != end; pte++, addr += PAGE_SIZE) {
@@ -980,7 +1011,17 @@ static struct page *new_node_page(struct page *page, unsigned long node, int **x
if (PageHuge(page))
return alloc_huge_page_node(page_hstate(compound_head(page)),
node);
- else
+ else if (thp_migration_supported() && PageTransHuge(page)) {
+ struct page *thp;
+
+ thp = alloc_pages_node(node,
+ (GFP_TRANSHUGE | __GFP_THISNODE) & ~__GFP_RECLAIM,
+ HPAGE_PMD_ORDER);
+ if (!thp)
+ return NULL;
+ prep_transhuge_page(thp);
+ return thp;
+ } else
return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE |
__GFP_THISNODE, 0);
}
@@ -1146,6 +1187,15 @@ static struct page *new_page(struct page *page, unsigned long start, int **x)
if (PageHuge(page)) {
BUG_ON(!vma);
return alloc_huge_page_noerr(vma, address, 1);
+ } else if (thp_migration_supported() && PageTransHuge(page)) {
+ struct page *thp;
+
+ thp = alloc_hugepage_vma(GFP_TRANSHUGE, vma, address,
+ HPAGE_PMD_ORDER);
+ if (!thp)
+ return NULL;
+ prep_transhuge_page(thp);
+ return thp;
}
/*
* if !vma, alloc_page_vma() will use task or system default policy
--
2.7.0

2016-03-03 07:42:35

by Naoya Horiguchi

Subject: [PATCH v1 08/11] mm: hwpoison: soft offline supports thp migration

This patch enables thp migration for soft offline.

Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/memory-failure.c | 31 ++++++++++++-------------------
1 file changed, 12 insertions(+), 19 deletions(-)

diff --git v4.5-rc5-mmotm-2016-02-24-16-18/mm/memory-failure.c v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/memory-failure.c
index bfb63c6..9099e78 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/mm/memory-failure.c
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/memory-failure.c
@@ -1490,7 +1490,17 @@ static struct page *new_page(struct page *p, unsigned long private, int **x)
if (PageHuge(p))
return alloc_huge_page_node(page_hstate(compound_head(p)),
nid);
- else
+ else if (thp_migration_supported() && PageTransHuge(p)) {
+ struct page *thp;
+
+ thp = alloc_pages_node(nid,
+ (GFP_TRANSHUGE | __GFP_THISNODE) & ~__GFP_RECLAIM,
+ HPAGE_PMD_ORDER);
+ if (!thp)
+ return NULL;
+ prep_transhuge_page(thp);
+ return thp;
+ } else
return __alloc_pages_node(nid, GFP_HIGHUSER_MOVABLE, 0);
}

@@ -1693,28 +1703,11 @@ static int __soft_offline_page(struct page *page, int flags)
static int soft_offline_in_use_page(struct page *page, int flags)
{
int ret;
- struct page *hpage = compound_head(page);
-
- if (!PageHuge(page) && PageTransHuge(hpage)) {
- lock_page(hpage);
- if (!PageAnon(hpage) || unlikely(split_huge_page(hpage))) {
- unlock_page(hpage);
- if (!PageAnon(hpage))
- pr_info("soft offline: %#lx: non anonymous thp\n", page_to_pfn(page));
- else
- pr_info("soft offline: %#lx: thp split failed\n", page_to_pfn(page));
- put_hwpoison_page(hpage);
- return -EBUSY;
- }
- unlock_page(hpage);
- get_hwpoison_page(page);
- put_hwpoison_page(hpage);
- }

if (PageHuge(page))
ret = soft_offline_huge_page(page, flags);
else
- ret = __soft_offline_page(page, flags);
+ ret = __soft_offline_page(compound_head(page), flags);

return ret;
}
--
2.7.0

2016-03-03 07:42:43

by Naoya Horiguchi

Subject: [PATCH v1 11/11] mm: memory_hotplug: memory hotremove supports thp migration

This patch enables thp migration for memory hotremove. Stub definition of
prep_transhuge_page() is added for CONFIG_TRANSPARENT_HUGEPAGE=n.

Signed-off-by: Naoya Horiguchi <[email protected]>
---
include/linux/huge_mm.h | 3 +++
mm/memory_hotplug.c | 8 ++++++++
mm/page_isolation.c | 8 ++++++++
3 files changed, 19 insertions(+)

diff --git v4.5-rc5-mmotm-2016-02-24-16-18/include/linux/huge_mm.h v4.5-rc5-mmotm-2016-02-24-16-18_patched/include/linux/huge_mm.h
index 09b215d..7944346 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/include/linux/huge_mm.h
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/include/linux/huge_mm.h
@@ -175,6 +175,9 @@ static inline bool thp_migration_supported(void)
#define transparent_hugepage_enabled(__vma) 0

#define transparent_hugepage_flags 0UL
+static inline void prep_transhuge_page(struct page *page)
+{
+}
static inline int
split_huge_page_to_list(struct page *page, struct list_head *list)
{
diff --git v4.5-rc5-mmotm-2016-02-24-16-18/mm/memory_hotplug.c v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/memory_hotplug.c
index e62aa07..b4b23d5 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/mm/memory_hotplug.c
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/memory_hotplug.c
@@ -1511,6 +1511,14 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
if (isolate_huge_page(page, &source))
move_pages -= 1 << compound_order(head);
continue;
+ } else if (thp_migration_supported() && PageTransHuge(page)) {
+ struct page *head = compound_head(page);
+
+ pfn = page_to_pfn(head) + (1<<compound_order(head)) - 1;
+ if (compound_order(head) > PFN_SECTION_SHIFT) {
+ ret = -EBUSY;
+ break;
+ }
}

if (!get_page_unless_zero(page))
diff --git v4.5-rc5-mmotm-2016-02-24-16-18/mm/page_isolation.c v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/page_isolation.c
index 92c4c36..b2d22e8 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/mm/page_isolation.c
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/page_isolation.c
@@ -294,6 +294,14 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private,
nodes_complement(dst, src);
return alloc_huge_page_node(page_hstate(compound_head(page)),
next_node(page_to_nid(page), dst));
+ } else if (thp_migration_supported() && PageTransHuge(page)) {
+ struct page *thp;
+
+ thp = alloc_pages(GFP_TRANSHUGE, HPAGE_PMD_ORDER);
+ if (!thp)
+ return NULL;
+ prep_transhuge_page(thp);
+ return thp;
}

if (PageHighMem(page))
--
2.7.0

2016-03-03 07:43:01

by Naoya Horiguchi

Subject: [PATCH v1 10/11] mm: migrate: move_pages() supports thp migration

This patch enables thp migration for move_pages(2).

Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/migrate.c | 24 +++++++++++++++++++++---
1 file changed, 21 insertions(+), 3 deletions(-)

diff --git v4.5-rc5-mmotm-2016-02-24-16-18/mm/migrate.c v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/migrate.c
index 31bc724..5653d49 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/mm/migrate.c
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/migrate.c
@@ -1240,7 +1240,17 @@ static struct page *new_page_node(struct page *p, unsigned long private,
if (PageHuge(p))
return alloc_huge_page_node(page_hstate(compound_head(p)),
pm->node);
- else
+ else if (thp_migration_supported() && PageTransHuge(p)) {
+ struct page *thp;
+
+ thp = alloc_pages_node(pm->node,
+ (GFP_TRANSHUGE | __GFP_THISNODE) & ~__GFP_RECLAIM,
+ HPAGE_PMD_ORDER);
+ if (!thp)
+ return NULL;
+ prep_transhuge_page(thp);
+ return thp;
+ } else
return __alloc_pages_node(pm->node,
GFP_HIGHUSER_MOVABLE | __GFP_THISNODE, 0);
}
@@ -1267,6 +1277,7 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
for (pp = pm; pp->node != MAX_NUMNODES; pp++) {
struct vm_area_struct *vma;
struct page *page;
+ unsigned int follflags;

err = -EFAULT;
vma = find_vma(mm, pp->addr);
@@ -1274,8 +1285,10 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
goto set_status;

/* FOLL_DUMP to ignore special (like zero) pages */
- page = follow_page(vma, pp->addr,
- FOLL_GET | FOLL_SPLIT | FOLL_DUMP);
+ follflags = FOLL_GET | FOLL_SPLIT | FOLL_DUMP;
+ if (thp_migration_supported())
+ follflags &= ~FOLL_SPLIT;
+ page = follow_page(vma, pp->addr, follflags);

err = PTR_ERR(page);
if (IS_ERR(page))
@@ -1303,6 +1316,11 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
if (PageHead(page))
isolate_huge_page(page, &pagelist);
goto put_and_set;
+ } else if (PageTransCompound(page)) {
+ if (PageTail(page)) {
+ err = pp->node;
+ goto put_and_set;
+ }
}

err = isolate_lru_page(page);
--
2.7.0

2016-03-03 07:43:38

by Naoya Horiguchi

Subject: [PATCH v1 07/11] mm: hwpoison: fix race between unpoisoning and freeing migrate source page

While testing thp migration, I saw a BUG_ON triggered by a race between soft
offline and unpoison (what I actually saw was a "bad page" warning about freeing a
page with PageActive set; the subsequent bug messages differ from run to run.)

I have tried to solve similar problems a few times (see commit f4c18e6f7b5b ("mm:
check __PG_HWPOISON separately from PAGE_FLAGS_CHECK_AT_*")), but the new
workload exposes a problem with the previous solution.

Let's say that unpoison never works well if the target page is not properly
contained, so now I'm going in the direction of limiting the unpoison function
(as commit 230ac719c500 ("mm/hwpoison: don't try to unpoison containment-failed
pages") does). This patch takes another step in that direction by ensuring that
the target page is kicked out from any pcplist. With this change, the dirty hack
of calling put_page() instead of putback_lru_page() when the migration reason is
MR_MEMORY_FAILURE is no longer necessary, so it's reverted.

Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/memory-failure.c | 10 +++++++++-
mm/migrate.c | 8 +-------
2 files changed, 10 insertions(+), 8 deletions(-)

diff --git v4.5-rc5-mmotm-2016-02-24-16-18/mm/memory-failure.c v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/memory-failure.c
index 67c30eb..bfb63c6 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/mm/memory-failure.c
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/memory-failure.c
@@ -1431,6 +1431,13 @@ int unpoison_memory(unsigned long pfn)
return 0;
}

+ /*
+ * Soft-offlined pages might stay in PCP list because it's freed via
+ * putback_lru_page(), and such pages shouldn't be unpoisoned because
+ * it could cause list corruption. So let's drain pages to avoid that.
+ */
+ shake_page(page, 0);
+
nr_pages = 1 << compound_order(page);

if (!get_hwpoison_page(p)) {
@@ -1674,7 +1681,8 @@ static int __soft_offline_page(struct page *page, int flags)
pfn, ret, page->flags);
if (ret > 0)
ret = -EIO;
- }
+ } else if (!TestSetPageHWPoison(page))
+ num_poisoned_pages_inc();
} else {
pr_info("soft offline: %#lx: isolation failed: %d, page count %d, type %lx\n",
pfn, ret, page_count(page), page->flags);
diff --git v4.5-rc5-mmotm-2016-02-24-16-18/mm/migrate.c v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/migrate.c
index bd8bfa4..31bc724 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/mm/migrate.c
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/migrate.c
@@ -994,13 +994,7 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
- /* Soft-offlined page shouldn't go through lru cache list */
- if (reason == MR_MEMORY_FAILURE) {
- put_page(page);
- if (!test_set_page_hwpoison(page))
- num_poisoned_pages_inc();
- } else
- putback_lru_page(page);
+ putback_lru_page(page);
}

/*
--
2.7.0

2016-03-03 07:44:09

by Naoya Horiguchi

Subject: [PATCH v1 05/11] mm: thp: check pmd migration entry in common path

Once any of the callers of page migration starts to handle thps, memory management
code will start to see pmd migration entries, so we need to prepare for them before
enabling that. This patch changes the various code points that check the status of
a given pmd, in order to prevent races between thp migration and pmd-related work.
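
The pattern added across those code points boils down to the sketch below
(condensed for illustration, not a literal hunk; most paths simply bail out,
while the fault path waits for the migration to finish instead):

	ptl = pmd_lock(mm, pmd);
	if (unlikely(is_pmd_migration_entry(*pmd))) {
		spin_unlock(ptl);
		/* __handle_mm_fault() calls pmd_migration_entry_wait(mm, pmd) here */
		return 0;
	}
	/* ... normal huge-pmd handling continues under the lock ... */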

Signed-off-by: Naoya Horiguchi <[email protected]>
---
arch/x86/mm/gup.c | 3 +++
fs/proc/task_mmu.c | 25 +++++++++++++--------
mm/gup.c | 8 +++++++
mm/huge_memory.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++------
mm/memcontrol.c | 2 ++
mm/memory.c | 5 +++++
6 files changed, 93 insertions(+), 16 deletions(-)

diff --git v4.5-rc5-mmotm-2016-02-24-16-18/arch/x86/mm/gup.c v4.5-rc5-mmotm-2016-02-24-16-18_patched/arch/x86/mm/gup.c
index f8d0b5e..34c3d43 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/arch/x86/mm/gup.c
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/arch/x86/mm/gup.c
@@ -10,6 +10,7 @@
#include <linux/highmem.h>
#include <linux/swap.h>
#include <linux/memremap.h>
+#include <linux/swapops.h>

#include <asm/pgtable.h>

@@ -210,6 +211,8 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
if (pmd_none(pmd))
return 0;
if (unlikely(pmd_large(pmd) || !pmd_present(pmd))) {
+ if (unlikely(is_pmd_migration_entry(pmd)))
+ return 0;
/*
* NUMA hinting faults need to be handled in the GUP
* slowpath for accounting purposes and so that they
diff --git v4.5-rc5-mmotm-2016-02-24-16-18/fs/proc/task_mmu.c v4.5-rc5-mmotm-2016-02-24-16-18_patched/fs/proc/task_mmu.c
index fa95ab2..20205d4 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/fs/proc/task_mmu.c
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/fs/proc/task_mmu.c
@@ -907,6 +907,9 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,

ptl = pmd_trans_huge_lock(pmd, vma);
if (ptl) {
+ if (unlikely(is_pmd_migration_entry(*pmd)))
+ goto out;
+
if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
clear_soft_dirty_pmd(vma, addr, pmd);
goto out;
@@ -1184,19 +1187,18 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
if (ptl) {
u64 flags = 0, frame = 0;
pmd_t pmd = *pmdp;
+ struct page *page;

if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(pmd))
flags |= PM_SOFT_DIRTY;

- /*
- * Currently pmd for thp is always present because thp
- * can not be swapped-out, migrated, or HWPOISONed
- * (split in such cases instead.)
- * This if-check is just to prepare for future implementation.
- */
- if (pmd_present(pmd)) {
- struct page *page = pmd_page(pmd);
-
+ if (is_pmd_migration_entry(pmd)) {
+ swp_entry_t entry = pmd_to_swp_entry(pmd);
+ frame = swp_type(entry) |
+ (swp_offset(entry) << MAX_SWAPFILES_SHIFT);
+ page = migration_entry_to_page(entry);
+ } else if (pmd_present(pmd)) {
+ page = pmd_page(pmd);
if (page_mapcount(page) == 1)
flags |= PM_MMAP_EXCLUSIVE;

@@ -1518,6 +1520,11 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
pte_t huge_pte = *(pte_t *)pmd;
struct page *page;

+ if (unlikely(is_pmd_migration_entry(*pmd))) {
+ spin_unlock(ptl);
+ return 0;
+ }
+
page = can_gather_numa_stats(huge_pte, vma, addr);
if (page)
gather_stats(page, md, pte_dirty(huge_pte),
diff --git v4.5-rc5-mmotm-2016-02-24-16-18/mm/gup.c v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/gup.c
index 36ca850..113930b 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/mm/gup.c
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/gup.c
@@ -271,6 +271,11 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
spin_unlock(ptl);
return follow_page_pte(vma, address, pmd, flags);
}
+ if (is_pmd_migration_entry(*pmd)) {
+ spin_unlock(ptl);
+ return no_page_table(vma, flags);
+ }
+
if (flags & FOLL_SPLIT) {
int ret;
page = pmd_page(*pmd);
@@ -1324,6 +1329,9 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
return 0;

if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd))) {
+ if (unlikely(is_pmd_migration_entry(pmd)))
+ return 0;
+
/*
* NUMA hinting faults need to be handled in the GUP
* slowpath for accounting purposes and so that they
diff --git v4.5-rc5-mmotm-2016-02-24-16-18/mm/huge_memory.c v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/huge_memory.c
index c6d5406..7120036 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/mm/huge_memory.c
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/huge_memory.c
@@ -1107,6 +1107,19 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
goto out_unlock;
}

+ if (unlikely(is_pmd_migration_entry(pmd))) {
+ swp_entry_t entry = pmd_to_swp_entry(pmd);
+
+ if (is_write_migration_entry(entry)) {
+ make_migration_entry_read(&entry);
+ pmd = swp_entry_to_pmd(entry);
+ set_pmd_at(src_mm, addr, src_pmd, pmd);
+ }
+ set_pmd_at(dst_mm, addr, dst_pmd, pmd);
+ ret = 0;
+ goto out_unlock;
+ }
+
if (!vma_is_dax(vma)) {
/* thp accounting separate from pmd_devmap accounting */
src_page = pmd_page(pmd);
@@ -1284,6 +1297,9 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
if (unlikely(!pmd_same(*pmd, orig_pmd)))
goto out_unlock;

+ if (unlikely(is_pmd_migration_entry(*pmd)))
+ goto out_unlock;
+
page = pmd_page(orig_pmd);
VM_BUG_ON_PAGE(!PageCompound(page) || !PageHead(page), page);
/*
@@ -1418,7 +1434,14 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
if ((flags & FOLL_NUMA) && pmd_protnone(*pmd))
goto out;

- page = pmd_page(*pmd);
+ if (is_pmd_migration_entry(*pmd)) {
+ swp_entry_t entry;
+ entry = pmd_to_swp_entry(*pmd);
+ page = pfn_to_page(swp_offset(entry));
+ if (!is_migration_entry(entry))
+ goto out;
+ } else
+ page = pmd_page(*pmd);
VM_BUG_ON_PAGE(!PageHead(page), page);
if (flags & FOLL_TOUCH)
touch_pmd(vma, addr, pmd);
@@ -1601,6 +1624,9 @@ int madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
goto out;
}

+ if (unlikely(is_pmd_migration_entry(orig_pmd)))
+ goto out;
+
page = pmd_page(orig_pmd);
/*
* If other processes are mapping this page, we couldn't discard
@@ -1681,15 +1707,28 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
spin_unlock(ptl);
put_huge_zero_page();
} else {
- struct page *page = pmd_page(orig_pmd);
- page_remove_rmap(page, true);
- VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
- add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
- VM_BUG_ON_PAGE(!PageHead(page), page);
+ struct page *page;
+ int migration = 0;
+
+ if (!is_pmd_migration_entry(orig_pmd)) {
+ page = pmd_page(orig_pmd);
+ page_remove_rmap(page, true);
+ VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
+ add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
+ VM_BUG_ON_PAGE(!PageHead(page), page);
+ } else {
+ swp_entry_t entry;
+
+ entry = pmd_to_swp_entry(orig_pmd);
+ free_swap_and_cache(entry); /* waring in failure? */
+ add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
+ migration = 1;
+ }
pte_free(tlb->mm, pgtable_trans_huge_withdraw(tlb->mm, pmd));
atomic_long_dec(&tlb->mm->nr_ptes);
spin_unlock(ptl);
- tlb_remove_page(tlb, page);
+ if (!migration)
+ tlb_remove_page(tlb, page);
}
return 1;
}
@@ -1775,6 +1814,11 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
return ret;
}

+ if (is_pmd_migration_entry(*pmd)) {
+ spin_unlock(ptl);
+ return ret;
+ }
+
if (!prot_numa || !pmd_protnone(*pmd)) {
entry = pmdp_huge_get_and_clear_notify(mm, addr, pmd);
entry = pmd_modify(entry, newprot);
@@ -3071,6 +3115,9 @@ static void split_huge_pmd_address(struct vm_area_struct *vma,
pmd = pmd_offset(pud, address);
if (!pmd_present(*pmd) || (!pmd_trans_huge(*pmd) && !pmd_devmap(*pmd)))
return;
+ if (pmd_trans_huge(*pmd) && is_pmd_migration_entry(*pmd))
+ return;
+
/*
* Caller holds the mmap_sem write mode, so a huge pmd cannot
* materialize from under us.
@@ -3151,6 +3198,11 @@ static void freeze_page_vma(struct vm_area_struct *vma, struct page *page,
return;
}
if (pmd_trans_huge(*pmd)) {
+ if (is_pmd_migration_entry(*pmd)) {
+ spin_unlock(ptl);
+ return;
+ }
+
if (page == pmd_page(*pmd))
__split_huge_pmd_locked(vma, pmd, haddr, true);
spin_unlock(ptl);
diff --git v4.5-rc5-mmotm-2016-02-24-16-18/mm/memcontrol.c v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/memcontrol.c
index ae8b81c..1772043 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/mm/memcontrol.c
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/memcontrol.c
@@ -4548,6 +4548,8 @@ static enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma,
struct page *page = NULL;
enum mc_target_type ret = MC_TARGET_NONE;

+ if (unlikely(is_pmd_migration_entry(pmd)))
+ return ret;
page = pmd_page(pmd);
VM_BUG_ON_PAGE(!page || !PageHead(page), page);
if (!(mc.flags & MOVE_ANON))
diff --git v4.5-rc5-mmotm-2016-02-24-16-18/mm/memory.c v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/memory.c
index 6c92a99..a04a685 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/mm/memory.c
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/memory.c
@@ -3405,6 +3405,11 @@ static int __handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
unsigned int dirty = flags & FAULT_FLAG_WRITE;

+ if (unlikely(is_pmd_migration_entry(orig_pmd))) {
+ pmd_migration_entry_wait(mm, pmd);
+ return 0;
+ }
+
if (pmd_protnone(orig_pmd))
return do_huge_pmd_numa_page(mm, vma, address,
orig_pmd, pmd);
--
2.7.0

2016-03-03 07:44:34

by Naoya Horiguchi

Subject: [PATCH v1 04/11] mm: thp: enable thp migration in generic path

This patch makes it possible to support thp migration gradually. If we fail to
allocate a destination page as a thp, we just split the source thp as we do now,
and then enter normal page migration. If we succeed in allocating a destination
thp, we enter thp migration. Subsequent patches actually enable thp migration for
each caller of page migration by allowing its get_new_page() callback to allocate
thps.
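
In code, the fallback decision reduces to the check below (simplified from the
migrate.c hunk in this patch; error handling abridged):

	/* get_new_page() handed us a base page for a thp source: split first */
	if (unlikely(PageTransHuge(page) && !PageTransHuge(newpage))) {
		lock_page(page);
		rc = split_huge_page(page);
		unlock_page(page);
		if (rc)
			goto out;	/* split failed, give up on this page */
	}
	/* otherwise newpage is a thp and we proceed with thp migration */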

Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/migrate.c | 2 +-
mm/rmap.c | 7 +++++--
2 files changed, 6 insertions(+), 3 deletions(-)

diff --git v4.5-rc5-mmotm-2016-02-24-16-18/mm/migrate.c v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/migrate.c
index 14164f6..bd8bfa4 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/mm/migrate.c
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/migrate.c
@@ -969,7 +969,7 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
goto out;
}

- if (unlikely(PageTransHuge(page))) {
+ if (unlikely(PageTransHuge(page) && !PageTransHuge(newpage))) {
lock_page(page);
rc = split_huge_page(page);
unlock_page(page);
diff --git v4.5-rc5-mmotm-2016-02-24-16-18/mm/rmap.c v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/rmap.c
index 02f0bfc..49198b8 100644
--- v4.5-rc5-mmotm-2016-02-24-16-18/mm/rmap.c
+++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/rmap.c
@@ -1427,6 +1427,11 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
struct rmap_private *rp = arg;
enum ttu_flags flags = rp->flags;

+ if (!PageHuge(page) && PageTransHuge(page)) {
+ VM_BUG_ON_PAGE(!(flags & TTU_MIGRATION), page);
+ return set_pmd_migration_entry(page, mm, address);
+ }
+
/* munlock has nothing to gain from examining un-locked vmas */
if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
goto out;
@@ -1610,8 +1615,6 @@ int try_to_unmap(struct page *page, enum ttu_flags flags)
.anon_lock = page_lock_anon_vma_read,
};

- VM_BUG_ON_PAGE(!PageHuge(page) && PageTransHuge(page), page);
-
/*
* During exec, a temporary VMA is setup and later moved.
* The VMA is moved under the anon_vma lock but not the
--
2.7.0

2016-03-03 08:07:27

by kernel test robot

Subject: Re: [PATCH v1 08/11] mm: hwpoison: soft offline supports thp migration

Hi Naoya,

[auto build test ERROR on v4.5-rc6]
[also build test ERROR on next-20160302]
[cannot apply to tip/x86/core asm-generic/master]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url: https://github.com/0day-ci/linux/commits/Naoya-Horiguchi/mm-page-migration-enhancement-for-thp/20160303-154610
config: i386-randconfig-x013-201609 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

Note: the linux-review/Naoya-Horiguchi/mm-page-migration-enhancement-for-thp/20160303-154610 HEAD a0e4b096d1324d23aed454a230d4d31b99b92ae7 builds fine.
It only hurts bisectibility.

All errors (new ones prefixed by >>):

mm/memory-failure.c: In function 'new_page':
>> mm/memory-failure.c:1503:3: error: implicit declaration of function 'prep_transhuge_page' [-Werror=implicit-function-declaration]
prep_transhuge_page(thp);
^
cc1: some warnings being treated as errors

vim +/prep_transhuge_page +1503 mm/memory-failure.c

1497
1498 thp = alloc_pages_node(nid,
1499 (GFP_TRANSHUGE | __GFP_THISNODE) & ~__GFP_RECLAIM,
1500 HPAGE_PMD_ORDER);
1501 if (!thp)
1502 return NULL;
> 1503 prep_transhuge_page(thp);
1504 return thp;
1505 } else
1506 return __alloc_pages_node(nid, GFP_HIGHUSER_MOVABLE, 0);

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation



2016-03-03 08:09:21

by kernel test robot

Subject: Re: [PATCH v1 09/11] mm: mempolicy: mbind and migrate_pages support thp migration

Hi Naoya,

[auto build test ERROR on v4.5-rc6]
[also build test ERROR on next-20160303]
[cannot apply to tip/x86/core asm-generic/master]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url: https://github.com/0day-ci/linux/commits/Naoya-Horiguchi/mm-page-migration-enhancement-for-thp/20160303-154610
config: ia64-allmodconfig (attached as .config)
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=ia64

Note: the linux-review/Naoya-Horiguchi/mm-page-migration-enhancement-for-thp/20160303-154610 HEAD a0e4b096d1324d23aed454a230d4d31b99b92ae7 builds fine.
It only hurts bisectibility.

All errors (new ones prefixed by >>):

mm/mempolicy.c: In function 'new_node_page':
>> mm/mempolicy.c:1020:3: error: implicit declaration of function 'prep_transhuge_page' [-Werror=implicit-function-declaration]
prep_transhuge_page(thp);
^
cc1: some warnings being treated as errors

vim +/prep_transhuge_page +1020 mm/mempolicy.c

1014
1015 thp = alloc_pages_node(node,
1016 (GFP_TRANSHUGE | __GFP_THISNODE) & ~__GFP_RECLAIM,
1017 HPAGE_PMD_ORDER);
1018 if (!thp)
1019 return NULL;
> 1020 prep_transhuge_page(thp);
1021 return thp;
1022 } else
1023 return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE |

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation



2016-03-03 08:25:51

by kernel test robot

Subject: Re: [PATCH v1 10/11] mm: migrate: move_pages() supports thp migration

Hi Naoya,

[auto build test ERROR on v4.5-rc6]
[also build test ERROR on next-20160303]
[cannot apply to tip/x86/core asm-generic/master]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url: https://github.com/0day-ci/linux/commits/Naoya-Horiguchi/mm-page-migration-enhancement-for-thp/20160303-154610
config: ia64-allmodconfig (attached as .config)
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=ia64

Note: the linux-review/Naoya-Horiguchi/mm-page-migration-enhancement-for-thp/20160303-154610 HEAD a0e4b096d1324d23aed454a230d4d31b99b92ae7 builds fine.
It only hurts bisectibility.

All errors (new ones prefixed by >>):

mm/migrate.c: In function 'new_page_node':
>> mm/migrate.c:1245:3: error: implicit declaration of function 'prep_transhuge_page' [-Werror=implicit-function-declaration]
prep_transhuge_page(thp);
^
cc1: some warnings being treated as errors

vim +/prep_transhuge_page +1245 mm/migrate.c

1239
1240 thp = alloc_pages_node(pm->node,
1241 (GFP_TRANSHUGE | __GFP_THISNODE) & ~__GFP_RECLAIM,
1242 HPAGE_PMD_ORDER);
1243 if (!thp)
1244 return NULL;
> 1245 prep_transhuge_page(thp);
1246 return thp;
1247 } else
1248 return __alloc_pages_node(pm->node,

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation



2016-03-03 08:42:23

by Naoya Horiguchi

Subject: Re: [PATCH v1 11/11] mm: memory_hotplug: memory hotremove supports thp migration

On Thu, Mar 03, 2016 at 04:41:58PM +0900, Naoya Horiguchi wrote:
> This patch enables thp migration for memory hotremove. Stub definition of
> prep_transhuge_page() is added for CONFIG_TRANSPARENT_HUGEPAGE=n.
>
> Signed-off-by: Naoya Horiguchi <[email protected]>
> ---
> include/linux/huge_mm.h | 3 +++
> mm/memory_hotplug.c | 8 ++++++++
> mm/page_isolation.c | 8 ++++++++
> 3 files changed, 19 insertions(+)
>
> diff --git v4.5-rc5-mmotm-2016-02-24-16-18/include/linux/huge_mm.h v4.5-rc5-mmotm-2016-02-24-16-18_patched/include/linux/huge_mm.h
> index 09b215d..7944346 100644
> --- v4.5-rc5-mmotm-2016-02-24-16-18/include/linux/huge_mm.h
> +++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/include/linux/huge_mm.h
> @@ -175,6 +175,9 @@ static inline bool thp_migration_supported(void)
> #define transparent_hugepage_enabled(__vma) 0
>
> #define transparent_hugepage_flags 0UL
> +static inline void prep_transhuge_page(struct page *page)
> +{
> +}
> static inline int
> split_huge_page_to_list(struct page *page, struct list_head *list)
> {

According to the warnings from kbuild bot, this chunk should come with
patch 8/11 or earlier. I'll fix this.

Thanks,
Naoya Horiguchi

2016-03-03 09:27:26

by kernel test robot

Subject: Re: [PATCH v1 03/11] mm: thp: add helpers related to thp/pmd migration

Hi Naoya,

[auto build test ERROR on v4.5-rc6]
[also build test ERROR on next-20160303]
[cannot apply to tip/x86/core asm-generic/master]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url: https://github.com/0day-ci/linux/commits/Naoya-Horiguchi/mm-page-migration-enhancement-for-thp/20160303-154610
config: arm-at91_dt_defconfig (attached as .config)
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm

All errors (new ones prefixed by >>):

In file included from mm/vmscan.c:54:0:
include/linux/swapops.h: In function 'swp_entry_to_pmd':
>> include/linux/swapops.h:217:2: error: empty scalar initializer
pmd_t pmd = {};
^
include/linux/swapops.h:217:2: error: (near initialization for 'pmd')

vim +217 include/linux/swapops.h

211 {
212 return swp_entry(0, 0);
213 }
214
215 static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
216 {
> 217 pmd_t pmd = {};
218
219 return pmd;
220 }

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation



2016-03-03 10:24:24

by Kirill A. Shutemov

Subject: Re: [PATCH v1 02/11] mm: thp: introduce CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION

On Thu, Mar 03, 2016 at 04:41:49PM +0900, Naoya Horiguchi wrote:
> Introduces CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION to limit thp migration
> functionality to x86_64, which should be safer at the first step.

The name of the config option in description doesn't match the code.
>
> Signed-off-by: Naoya Horiguchi <[email protected]>
> ---
> arch/x86/Kconfig | 4 ++++
> include/linux/huge_mm.h | 14 ++++++++++++++
> mm/Kconfig | 3 +++
> 3 files changed, 21 insertions(+)
>
> diff --git v4.5-rc5-mmotm-2016-02-24-16-18/arch/x86/Kconfig v4.5-rc5-mmotm-2016-02-24-16-18_patched/arch/x86/Kconfig
> index 993aca4..7a563cf 100644
> --- v4.5-rc5-mmotm-2016-02-24-16-18/arch/x86/Kconfig
> +++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/arch/x86/Kconfig
> @@ -2198,6 +2198,10 @@ config ARCH_ENABLE_HUGEPAGE_MIGRATION
> def_bool y
> depends on X86_64 && HUGETLB_PAGE && MIGRATION
>
> +config ARCH_ENABLE_THP_MIGRATION
> + def_bool y
> + depends on X86_64 && TRANSPARENT_HUGEPAGE && MIGRATION
> +
> menu "Power management and ACPI options"
>
> config ARCH_HIBERNATION_HEADER
> diff --git v4.5-rc5-mmotm-2016-02-24-16-18/include/linux/huge_mm.h v4.5-rc5-mmotm-2016-02-24-16-18_patched/include/linux/huge_mm.h
> index 459fd25..09b215d 100644
> --- v4.5-rc5-mmotm-2016-02-24-16-18/include/linux/huge_mm.h
> +++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/include/linux/huge_mm.h
> @@ -156,6 +156,15 @@ static inline bool is_huge_zero_pmd(pmd_t pmd)
>
> struct page *get_huge_zero_page(void);
>
> +static inline bool thp_migration_supported(void)
> +{
> +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
> + return true;
> +#else
> + return false;
> +#endif

return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION);

?
> +}
> +
> #else /* CONFIG_TRANSPARENT_HUGEPAGE */
> #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
> #define HPAGE_PMD_MASK ({ BUILD_BUG(); 0; })
> @@ -213,6 +222,11 @@ static inline struct page *follow_devmap_pmd(struct vm_area_struct *vma,
> {
> return NULL;
> }
> +
> +static inline bool thp_migration_supported(void)
> +{
> + return false;
> +}
> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>
> #endif /* _LINUX_HUGE_MM_H */
> diff --git v4.5-rc5-mmotm-2016-02-24-16-18/mm/Kconfig v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/Kconfig
> index f2c1a07..64e7ab6 100644
> --- v4.5-rc5-mmotm-2016-02-24-16-18/mm/Kconfig
> +++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/Kconfig
> @@ -265,6 +265,9 @@ config MIGRATION
> config ARCH_ENABLE_HUGEPAGE_MIGRATION
> bool
>
> +config ARCH_ENABLE_THP_MIGRATION
> + bool
> +
> config PHYS_ADDR_T_64BIT
> def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT
>
> --
> 2.7.0
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: [email protected]

--
Kirill A. Shutemov
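
A minimal sketch of the simplification suggested above, assuming only that
IS_ENABLED() (which evaluates to 1 when a bool Kconfig symbol is set and to 0
otherwise) is available in this tree:

static inline bool thp_migration_supported(void)
{
	/* Collapses the #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION / #else pair. */
	return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION);
}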

2016-03-03 10:40:56

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH v1 03/11] mm: thp: add helpers related to thp/pmd migration

On Thu, Mar 03, 2016 at 04:41:50PM +0900, Naoya Horiguchi wrote:
> This patch prepares thp migration's core code. This code will be enabled when
> unmap_and_move() stops unconditionally splitting thp and get_new_page() starts
> to allocate destination thps.
>
> Signed-off-by: Naoya Horiguchi <[email protected]>
> ---
> arch/x86/include/asm/pgtable.h | 11 ++++++
> arch/x86/include/asm/pgtable_64.h | 2 +
> include/linux/swapops.h | 62 +++++++++++++++++++++++++++++++
> mm/huge_memory.c | 78 +++++++++++++++++++++++++++++++++++++++
> mm/migrate.c | 23 ++++++++++++
> 5 files changed, 176 insertions(+)
>
> diff --git v4.5-rc5-mmotm-2016-02-24-16-18/arch/x86/include/asm/pgtable.h v4.5-rc5-mmotm-2016-02-24-16-18_patched/arch/x86/include/asm/pgtable.h
> index 0687c47..0df9afe 100644
> --- v4.5-rc5-mmotm-2016-02-24-16-18/arch/x86/include/asm/pgtable.h
> +++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/arch/x86/include/asm/pgtable.h
> @@ -515,6 +515,17 @@ static inline int pmd_present(pmd_t pmd)
> return pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE);
> }
>
> +/*
> + * Unlike pmd_present(), __pmd_present() checks only _PAGE_PRESENT bit.
> + * Combined with is_migration_entry(), this routine is used to detect pmd
> + * migration entries. To make it work fine, callers should make sure that
> + * pmd_trans_huge() returns true beforehand.
> + */

Hm. I don't think this would fly. What prevents false positives for PROT_NONE
pmds?

I guess the problem is _PAGE_PSE, right? I don't really understand why we
need it in pmd_present().

Andrea?

> +static inline int __pmd_present(pmd_t pmd)
> +{
> + return pmd_flags(pmd) & _PAGE_PRESENT;
> +}
> +
> #ifdef CONFIG_NUMA_BALANCING
> /*
> * These work without NUMA balancing but the kernel does not care. See the
> diff --git v4.5-rc5-mmotm-2016-02-24-16-18/arch/x86/include/asm/pgtable_64.h v4.5-rc5-mmotm-2016-02-24-16-18_patched/arch/x86/include/asm/pgtable_64.h
> index 2ee7811..df869d0 100644
> --- v4.5-rc5-mmotm-2016-02-24-16-18/arch/x86/include/asm/pgtable_64.h
> +++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/arch/x86/include/asm/pgtable_64.h
> @@ -153,7 +153,9 @@ static inline int pgd_large(pgd_t pgd) { return 0; }
> ((type) << (_PAGE_BIT_PRESENT + 1)) \
> | ((offset) << SWP_OFFSET_SHIFT) })
> #define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val((pte)) })
> +#define __pmd_to_swp_entry(pmd) ((swp_entry_t) { pmd_val((pmd)) })
> #define __swp_entry_to_pte(x) ((pte_t) { .pte = (x).val })
> +#define __swp_entry_to_pmd(x) ((pmd_t) { .pmd = (x).val })
>
> extern int kern_addr_valid(unsigned long addr);
> extern void cleanup_highmap(void);
> diff --git v4.5-rc5-mmotm-2016-02-24-16-18/include/linux/swapops.h v4.5-rc5-mmotm-2016-02-24-16-18_patched/include/linux/swapops.h
> index 5c3a5f3..b402a2c 100644
> --- v4.5-rc5-mmotm-2016-02-24-16-18/include/linux/swapops.h
> +++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/include/linux/swapops.h
> @@ -163,6 +163,68 @@ static inline int is_write_migration_entry(swp_entry_t entry)
>
> #endif
>
> +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
> +extern int set_pmd_migration_entry(struct page *page,
> + struct mm_struct *mm, unsigned long address);
> +
> +extern int remove_migration_pmd(struct page *new,
> + struct vm_area_struct *vma, unsigned long addr, void *old);
> +
> +extern void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd);
> +
> +static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
> +{
> + swp_entry_t arch_entry;
> +
> + arch_entry = __pmd_to_swp_entry(pmd);
> + return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
> +}
> +
> +static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
> +{
> + swp_entry_t arch_entry;
> +
> + arch_entry = __swp_entry(swp_type(entry), swp_offset(entry));
> + return __swp_entry_to_pmd(arch_entry);
> +}
> +
> +static inline int is_pmd_migration_entry(pmd_t pmd)
> +{
> + return !__pmd_present(pmd) && is_migration_entry(pmd_to_swp_entry(pmd));
> +}
> +#else
> +static inline int set_pmd_migration_entry(struct page *page,
> + struct mm_struct *mm, unsigned long address)
> +{
> + return 0;
> +}
> +
> +static inline int remove_migration_pmd(struct page *new,
> + struct vm_area_struct *vma, unsigned long addr, void *old)
> +{
> + return 0;
> +}
> +
> +static inline void pmd_migration_entry_wait(struct mm_struct *m, pmd_t *p) { }
> +
> +static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
> +{
> + return swp_entry(0, 0);
> +}
> +
> +static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
> +{
> + pmd_t pmd = {};
> +
> + return pmd;
> +}
> +
> +static inline int is_pmd_migration_entry(pmd_t pmd)
> +{
> + return 0;
> +}
> +#endif
> +
> #ifdef CONFIG_MEMORY_FAILURE
>
> extern atomic_long_t num_poisoned_pages __read_mostly;
> diff --git v4.5-rc5-mmotm-2016-02-24-16-18/mm/huge_memory.c v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/huge_memory.c
> index 46ad357..c6d5406 100644
> --- v4.5-rc5-mmotm-2016-02-24-16-18/mm/huge_memory.c
> +++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/huge_memory.c
> @@ -3657,3 +3657,81 @@ static int __init split_huge_pages_debugfs(void)
> }
> late_initcall(split_huge_pages_debugfs);
> #endif
> +
> +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
> +int set_pmd_migration_entry(struct page *page, struct mm_struct *mm,
> + unsigned long addr)
> +{
> + pte_t *pte;
> + pmd_t *pmd;
> + pmd_t pmdval;
> + pmd_t pmdswp;
> + swp_entry_t entry;
> + spinlock_t *ptl;
> +
> + mmu_notifier_invalidate_range_start(mm, addr, addr + HPAGE_PMD_SIZE);
> + if (!page_check_address_transhuge(page, mm, addr, &pmd, &pte, &ptl))
> + goto out;
> + if (pte)
> + goto out;
> + pmdval = pmdp_huge_get_and_clear(mm, addr, pmd);
> + entry = make_migration_entry(page, pmd_write(pmdval));
> + pmdswp = swp_entry_to_pmd(entry);
> + pmdswp = pmd_mkhuge(pmdswp);
> + set_pmd_at(mm, addr, pmd, pmdswp);
> + page_remove_rmap(page, true);
> + page_cache_release(page);
> + spin_unlock(ptl);
> +out:
> + mmu_notifier_invalidate_range_end(mm, addr, addr + HPAGE_PMD_SIZE);
> + return SWAP_AGAIN;
> +}
> +
> +int remove_migration_pmd(struct page *new, struct vm_area_struct *vma,
> + unsigned long addr, void *old)
> +{
> + struct mm_struct *mm = vma->vm_mm;
> + spinlock_t *ptl;
> + pgd_t *pgd;
> + pud_t *pud;
> + pmd_t *pmd;
> + pmd_t pmde;
> + swp_entry_t entry;
> + unsigned long mmun_start = addr & HPAGE_PMD_MASK;
> + unsigned long mmun_end = mmun_start + HPAGE_PMD_SIZE;
> +
> + pgd = pgd_offset(mm, addr);
> + if (!pgd_present(*pgd))
> + goto out;
> + pud = pud_offset(pgd, addr);
> + if (!pud_present(*pud))
> + goto out;
> + pmd = pmd_offset(pud, addr);
> + if (!pmd)
> + goto out;
> + ptl = pmd_lock(mm, pmd);
> + pmde = *pmd;
> + barrier();

Do we need a barrier under ptl?

> + if (!is_pmd_migration_entry(pmde))
> + goto unlock_ptl;
> + entry = pmd_to_swp_entry(pmde);
> + if (migration_entry_to_page(entry) != old)
> + goto unlock_ptl;
> + get_page(new);
> + pmde = mk_huge_pmd(new, vma->vm_page_prot);
> + if (is_write_migration_entry(entry))
> + pmde = maybe_pmd_mkwrite(pmde, vma);
> + flush_cache_range(vma, mmun_start, mmun_end);
> + page_add_anon_rmap(new, vma, mmun_start, true);
> + pmdp_huge_clear_flush_notify(vma, mmun_start, pmd);
> + set_pmd_at(mm, mmun_start, pmd, pmde);
> + flush_tlb_range(vma, mmun_start, mmun_end);
> + if (vma->vm_flags & VM_LOCKED)
> + mlock_vma_page(new);
> + update_mmu_cache_pmd(vma, addr, pmd);
> +unlock_ptl:
> + spin_unlock(ptl);
> +out:
> + return SWAP_AGAIN;
> +}
> +#endif
> diff --git v4.5-rc5-mmotm-2016-02-24-16-18/mm/migrate.c v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/migrate.c
> index 577c94b..14164f6 100644
> --- v4.5-rc5-mmotm-2016-02-24-16-18/mm/migrate.c
> +++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/migrate.c
> @@ -118,6 +118,8 @@ static int remove_migration_pte(struct page *new, struct vm_area_struct *vma,
> if (!ptep)
> goto out;
> ptl = huge_pte_lockptr(hstate_vma(vma), mm, ptep);
> + } else if (PageTransHuge(new)) {
> + return remove_migration_pmd(new, vma, addr, old);

Hm. THP can now be mapped with PTEs too.

> } else {
> pmd = mm_find_pmd(mm, addr);
> if (!pmd)
> @@ -252,6 +254,27 @@ void migration_entry_wait_huge(struct vm_area_struct *vma,
> __migration_entry_wait(mm, pte, ptl);
> }
>
> +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
> +void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd)
> +{
> + spinlock_t *ptl;
> + struct page *page;
> +
> + ptl = pmd_lock(mm, pmd);
> + if (!is_pmd_migration_entry(*pmd))
> + goto unlock;
> + page = migration_entry_to_page(pmd_to_swp_entry(*pmd));
> + if (!get_page_unless_zero(page))
> + goto unlock;
> + spin_unlock(ptl);
> + wait_on_page_locked(page);
> + put_page(page);
> + return;
> +unlock:
> + spin_unlock(ptl);
> +}
> +#endif
> +
> #ifdef CONFIG_BLOCK
> /* Returns true if all buffers are successfully locked */
> static bool buffer_migrate_lock_buffers(struct buffer_head *head,
> --
> 2.7.0
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: [email protected]

--
Kirill A. Shutemov

2016-03-03 10:51:06

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH v1 05/11] mm: thp: check pmd migration entry in common path

On Thu, Mar 03, 2016 at 04:41:52PM +0900, Naoya Horiguchi wrote:
> If one of the callers of page migration starts to handle thp, memory management code
> starts to see pmd migration entries, so we need to prepare for that before enabling it.
> This patch changes various code points which check the status of given pmds in
> order to prevent races between thp migration and pmd-related work.
>
> Signed-off-by: Naoya Horiguchi <[email protected]>
> ---
> arch/x86/mm/gup.c | 3 +++
> fs/proc/task_mmu.c | 25 +++++++++++++--------
> mm/gup.c | 8 +++++++
> mm/huge_memory.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++------
> mm/memcontrol.c | 2 ++
> mm/memory.c | 5 +++++
> 6 files changed, 93 insertions(+), 16 deletions(-)
>
> diff --git v4.5-rc5-mmotm-2016-02-24-16-18/arch/x86/mm/gup.c v4.5-rc5-mmotm-2016-02-24-16-18_patched/arch/x86/mm/gup.c
> index f8d0b5e..34c3d43 100644
> --- v4.5-rc5-mmotm-2016-02-24-16-18/arch/x86/mm/gup.c
> +++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/arch/x86/mm/gup.c
> @@ -10,6 +10,7 @@
> #include <linux/highmem.h>
> #include <linux/swap.h>
> #include <linux/memremap.h>
> +#include <linux/swapops.h>
>
> #include <asm/pgtable.h>
>
> @@ -210,6 +211,8 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
> if (pmd_none(pmd))
> return 0;
> if (unlikely(pmd_large(pmd) || !pmd_present(pmd))) {
> + if (unlikely(is_pmd_migration_entry(pmd)))
> + return 0;

Hm. I'd expected to see a bunch of pmd_none() to pmd_present() conversions.
That seems the right way to guard the code. Otherwise we would need even more
checks once PMD-level swap is implemented.

I think we need to check for migration entries only if we have something
to do with migration. In all other cases pmd_present() should be enough to
bail out.

--
Kirill A. Shutemov

2016-03-07 00:58:22

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH v1 02/11] mm: thp: introduce CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION

On Thu, Mar 03, 2016 at 04:41:49PM +0900, Naoya Horiguchi wrote:
> Introduces CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION to limit thp migration
> functionality to x86_64, which should be safer at the first step.
>

The changelog is not helpful. Could you please describe what is
architecture specific in these changes? What do other arches need to do
to port these changes over?


2016-03-07 06:29:56

by Naoya Horiguchi

[permalink] [raw]
Subject: Re: [PATCH v1 02/11] mm: thp: introduce CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION

On Mon, Mar 07, 2016 at 11:58:04AM +1100, Balbir Singh wrote:
> On Thu, Mar 03, 2016 at 04:41:49PM +0900, Naoya Horiguchi wrote:
> > Introduces CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION to limit thp migration
> > functionality to x86_64, which should be safer at the first step.
> >
>
> The changelog is not helpful. Could you please describe what is
> architecture specific in these changes? What do other arches need to do
> to port these changes over?

The arch-specific parts are pmd_present() and the swap entry format. Currently
pmd_present() on x86_64 is not simple, so it is not easy to determine a pmd's
state (none, a normal pmd entry pointing to a pte page, a pmd for a thp, or a
pmd migration entry ...). That requires me to assume in this version that a pmd
migration entry should have _PAGE_PSE set, which should not be necessary once
that complexity is fixed. So I will mention this pmd_present() problem in the
next version.

So if it's fixed, what developers need to do to port this feature to their
architectures is just to enable CONFIG_ARCH_ENABLE_THP_MIGRATION (and test it.)
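
A rough sketch, derived only from the x86_64 pieces in patches 2/11 and 3/11,
of what such a port would touch for a hypothetical architecture; the paths and
macro bodies below are placeholders rather than code from this series:

  # arch/<arch>/Kconfig
  config ARCH_ENABLE_THP_MIGRATION
  	def_bool y
  	depends on TRANSPARENT_HUGEPAGE && MIGRATION

  /* the arch's pgtable header: map a pmd to/from the swap entry format */
  #define __pmd_to_swp_entry(pmd)	((swp_entry_t) { pmd_val(pmd) })
  #define __swp_entry_to_pmd(x)		((pmd_t) { .pmd = (x).val })

plus a pmd_present() that can reliably distinguish a pmd migration entry from a
present (huge) pmd, which is exactly the open problem described above.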

Thanks,
Naoya Horiguchi

2016-03-03 16:22:31

by Naoya Horiguchi

[permalink] [raw]
Subject: Re: [PATCH v1 03/11] mm: thp: add helpers related to thp/pmd migration

On Thu, Mar 03, 2016 at 01:40:51PM +0300, Kirill A. Shutemov wrote:
> On Thu, Mar 03, 2016 at 04:41:50PM +0900, Naoya Horiguchi wrote:
> > This patch prepares thp migration's core code. This code will be enabled when
> > unmap_and_move() stops unconditionally splitting thp and get_new_page() starts
> > to allocate destination thps.
> >
> > Signed-off-by: Naoya Horiguchi <[email protected]>
> > ---
> > arch/x86/include/asm/pgtable.h | 11 ++++++
> > arch/x86/include/asm/pgtable_64.h | 2 +
> > include/linux/swapops.h | 62 +++++++++++++++++++++++++++++++
> > mm/huge_memory.c | 78 +++++++++++++++++++++++++++++++++++++++
> > mm/migrate.c | 23 ++++++++++++
> > 5 files changed, 176 insertions(+)
> >
> > diff --git v4.5-rc5-mmotm-2016-02-24-16-18/arch/x86/include/asm/pgtable.h v4.5-rc5-mmotm-2016-02-24-16-18_patched/arch/x86/include/asm/pgtable.h
> > index 0687c47..0df9afe 100644
> > --- v4.5-rc5-mmotm-2016-02-24-16-18/arch/x86/include/asm/pgtable.h
> > +++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/arch/x86/include/asm/pgtable.h
> > @@ -515,6 +515,17 @@ static inline int pmd_present(pmd_t pmd)
> > return pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE);
> > }
> >
> > +/*
> > + * Unlike pmd_present(), __pmd_present() checks only _PAGE_PRESENT bit.
> > + * Combined with is_migration_entry(), this routine is used to detect pmd
> > + * migration entries. To make it work fine, callers should make sure that
> > + * pmd_trans_huge() returns true beforehand.
> > + */
>
> Hm. I don't think this would fly. What prevents false positives for PROT_NONE
> pmds?

Nothing, actually, if we used __pmd_present() alone. But __pmd_present() is now
used only via is_pmd_migration_entry(), combined with is_migration_entry(), and
is_migration_entry() should return false for PROT_NONE pmds (because
is_migration_entry() accepts only the characteristic swap types
SWP_MIGRATION_READ/SWP_MIGRATION_WRITE, and a PROT_NONE bit pattern doesn't
match them). But I admit it might not be robust enough.
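
To make that concrete, here is an illustrative restatement of the helper quoted
above with the reasoning spelled out as comments (the function name is made up
for the illustration and is not part of the patchset):

static inline int sketch_is_pmd_migration_entry(pmd_t pmd)
{
	/*
	 * A PROT_NONE pmd also has _PAGE_PRESENT clear, so this first check
	 * does not by itself rule it out.
	 */
	if (__pmd_present(pmd))
		return 0;

	/*
	 * Reinterpret the pmd bits as a swap entry.  is_migration_entry()
	 * accepts only the swap types SWP_MIGRATION_READ/SWP_MIGRATION_WRITE,
	 * and the claim above is that a PROT_NONE bit pattern does not decode
	 * to either of them; that holds today, but implicitly rather than via
	 * an explicit flag.
	 */
	return is_migration_entry(pmd_to_swp_entry(pmd));
}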

>
> I guess the problem is _PAGE_PSE, right? I don't really understand why we
> need it in pmd_present().

Yes, _PAGE_PSE in pmd_present() makes this branching more complicated than it
needs to be. Some simplification seems necessary.

>
> Andrea?
>
> > +static inline int __pmd_present(pmd_t pmd)
> > +{
> > + return pmd_flags(pmd) & _PAGE_PRESENT;
> > +}
> > +
> > #ifdef CONFIG_NUMA_BALANCING
> > /*
> > * These work without NUMA balancing but the kernel does not care. See the
> > diff --git v4.5-rc5-mmotm-2016-02-24-16-18/arch/x86/include/asm/pgtable_64.h v4.5-rc5-mmotm-2016-02-24-16-18_patched/arch/x86/include/asm/pgtable_64.h
> > index 2ee7811..df869d0 100644
> > --- v4.5-rc5-mmotm-2016-02-24-16-18/arch/x86/include/asm/pgtable_64.h
> > +++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/arch/x86/include/asm/pgtable_64.h
> > @@ -153,7 +153,9 @@ static inline int pgd_large(pgd_t pgd) { return 0; }
> > ((type) << (_PAGE_BIT_PRESENT + 1)) \
> > | ((offset) << SWP_OFFSET_SHIFT) })
> > #define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val((pte)) })
> > +#define __pmd_to_swp_entry(pmd) ((swp_entry_t) { pmd_val((pmd)) })
> > #define __swp_entry_to_pte(x) ((pte_t) { .pte = (x).val })
> > +#define __swp_entry_to_pmd(x) ((pmd_t) { .pmd = (x).val })
> >
> > extern int kern_addr_valid(unsigned long addr);
> > extern void cleanup_highmap(void);
> > diff --git v4.5-rc5-mmotm-2016-02-24-16-18/include/linux/swapops.h v4.5-rc5-mmotm-2016-02-24-16-18_patched/include/linux/swapops.h
> > index 5c3a5f3..b402a2c 100644
> > --- v4.5-rc5-mmotm-2016-02-24-16-18/include/linux/swapops.h
> > +++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/include/linux/swapops.h
> > @@ -163,6 +163,68 @@ static inline int is_write_migration_entry(swp_entry_t entry)
> >
> > #endif
> >
> > +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
> > +extern int set_pmd_migration_entry(struct page *page,
> > + struct mm_struct *mm, unsigned long address);
> > +
> > +extern int remove_migration_pmd(struct page *new,
> > + struct vm_area_struct *vma, unsigned long addr, void *old);
> > +
> > +extern void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd);
> > +
> > +static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
> > +{
> > + swp_entry_t arch_entry;
> > +
> > + arch_entry = __pmd_to_swp_entry(pmd);
> > + return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
> > +}
> > +
> > +static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
> > +{
> > + swp_entry_t arch_entry;
> > +
> > + arch_entry = __swp_entry(swp_type(entry), swp_offset(entry));
> > + return __swp_entry_to_pmd(arch_entry);
> > +}
> > +
> > +static inline int is_pmd_migration_entry(pmd_t pmd)
> > +{
> > + return !__pmd_present(pmd) && is_migration_entry(pmd_to_swp_entry(pmd));
> > +}
> > +#else
> > +static inline int set_pmd_migration_entry(struct page *page,
> > + struct mm_struct *mm, unsigned long address)
> > +{
> > + return 0;
> > +}
> > +
> > +static inline int remove_migration_pmd(struct page *new,
> > + struct vm_area_struct *vma, unsigned long addr, void *old)
> > +{
> > + return 0;
> > +}
> > +
> > +static inline void pmd_migration_entry_wait(struct mm_struct *m, pmd_t *p) { }
> > +
> > +static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
> > +{
> > + return swp_entry(0, 0);
> > +}
> > +
> > +static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
> > +{
> > + pmd_t pmd = {};
> > +
> > + return pmd;
> > +}
> > +
> > +static inline int is_pmd_migration_entry(pmd_t pmd)
> > +{
> > + return 0;
> > +}
> > +#endif
> > +
> > #ifdef CONFIG_MEMORY_FAILURE
> >
> > extern atomic_long_t num_poisoned_pages __read_mostly;
> > diff --git v4.5-rc5-mmotm-2016-02-24-16-18/mm/huge_memory.c v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/huge_memory.c
> > index 46ad357..c6d5406 100644
> > --- v4.5-rc5-mmotm-2016-02-24-16-18/mm/huge_memory.c
> > +++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/huge_memory.c
> > @@ -3657,3 +3657,81 @@ static int __init split_huge_pages_debugfs(void)
> > }
> > late_initcall(split_huge_pages_debugfs);
> > #endif
> > +
> > +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
> > +int set_pmd_migration_entry(struct page *page, struct mm_struct *mm,
> > + unsigned long addr)
> > +{
> > + pte_t *pte;
> > + pmd_t *pmd;
> > + pmd_t pmdval;
> > + pmd_t pmdswp;
> > + swp_entry_t entry;
> > + spinlock_t *ptl;
> > +
> > + mmu_notifier_invalidate_range_start(mm, addr, addr + HPAGE_PMD_SIZE);
> > + if (!page_check_address_transhuge(page, mm, addr, &pmd, &pte, &ptl))
> > + goto out;
> > + if (pte)
> > + goto out;
> > + pmdval = pmdp_huge_get_and_clear(mm, addr, pmd);
> > + entry = make_migration_entry(page, pmd_write(pmdval));
> > + pmdswp = swp_entry_to_pmd(entry);
> > + pmdswp = pmd_mkhuge(pmdswp);
> > + set_pmd_at(mm, addr, pmd, pmdswp);
> > + page_remove_rmap(page, true);
> > + page_cache_release(page);
> > + spin_unlock(ptl);
> > +out:
> > + mmu_notifier_invalidate_range_end(mm, addr, addr + HPAGE_PMD_SIZE);
> > + return SWAP_AGAIN;
> > +}
> > +
> > +int remove_migration_pmd(struct page *new, struct vm_area_struct *vma,
> > + unsigned long addr, void *old)
> > +{
> > + struct mm_struct *mm = vma->vm_mm;
> > + spinlock_t *ptl;
> > + pgd_t *pgd;
> > + pud_t *pud;
> > + pmd_t *pmd;
> > + pmd_t pmde;
> > + swp_entry_t entry;
> > + unsigned long mmun_start = addr & HPAGE_PMD_MASK;
> > + unsigned long mmun_end = mmun_start + HPAGE_PMD_SIZE;
> > +
> > + pgd = pgd_offset(mm, addr);
> > + if (!pgd_present(*pgd))
> > + goto out;
> > + pud = pud_offset(pgd, addr);
> > + if (!pud_present(*pud))
> > + goto out;
> > + pmd = pmd_offset(pud, addr);
> > + if (!pmd)
> > + goto out;
> > + ptl = pmd_lock(mm, pmd);
> > + pmde = *pmd;
> > + barrier();
>
> Do we need a barrier under ptl?

No, I'll drop this. Thank you.

> > + if (!is_pmd_migration_entry(pmde))
> > + goto unlock_ptl;
> > + entry = pmd_to_swp_entry(pmde);
> > + if (migration_entry_to_page(entry) != old)
> > + goto unlock_ptl;
> > + get_page(new);
> > + pmde = mk_huge_pmd(new, vma->vm_page_prot);
> > + if (is_write_migration_entry(entry))
> > + pmde = maybe_pmd_mkwrite(pmde, vma);
> > + flush_cache_range(vma, mmun_start, mmun_end);
> > + page_add_anon_rmap(new, vma, mmun_start, true);
> > + pmdp_huge_clear_flush_notify(vma, mmun_start, pmd);
> > + set_pmd_at(mm, mmun_start, pmd, pmde);
> > + flush_tlb_range(vma, mmun_start, mmun_end);
> > + if (vma->vm_flags & VM_LOCKED)
> > + mlock_vma_page(new);
> > + update_mmu_cache_pmd(vma, addr, pmd);
> > +unlock_ptl:
> > + spin_unlock(ptl);
> > +out:
> > + return SWAP_AGAIN;
> > +}
> > +#endif
> > diff --git v4.5-rc5-mmotm-2016-02-24-16-18/mm/migrate.c v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/migrate.c
> > index 577c94b..14164f6 100644
> > --- v4.5-rc5-mmotm-2016-02-24-16-18/mm/migrate.c
> > +++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/mm/migrate.c
> > @@ -118,6 +118,8 @@ static int remove_migration_pte(struct page *new, struct vm_area_struct *vma,
> > if (!ptep)
> > goto out;
> > ptl = huge_pte_lockptr(hstate_vma(vma), mm, ptep);
> > + } else if (PageTransHuge(new)) {
> > + return remove_migration_pmd(new, vma, addr, old);
>
> Hm. THP can now be mapped with PTEs too.

Right, and different calls of remove_migration_pte() handle pmd/pte migration
entries separately, so this particular code seems OK to me.

Thanks,
Naoya

2016-03-03 16:22:30

by Naoya Horiguchi

[permalink] [raw]
Subject: Re: [PATCH v1 05/11] mm: thp: check pmd migration entry in common path

On Thu, Mar 03, 2016 at 01:50:58PM +0300, Kirill A. Shutemov wrote:
> On Thu, Mar 03, 2016 at 04:41:52PM +0900, Naoya Horiguchi wrote:
> > If one of the callers of page migration starts to handle thp, memory management code
> > starts to see pmd migration entries, so we need to prepare for that before enabling it.
> > This patch changes various code points which check the status of given pmds in
> > order to prevent races between thp migration and pmd-related work.
> >
> > Signed-off-by: Naoya Horiguchi <[email protected]>
> > ---
> > arch/x86/mm/gup.c | 3 +++
> > fs/proc/task_mmu.c | 25 +++++++++++++--------
> > mm/gup.c | 8 +++++++
> > mm/huge_memory.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++------
> > mm/memcontrol.c | 2 ++
> > mm/memory.c | 5 +++++
> > 6 files changed, 93 insertions(+), 16 deletions(-)
> >
> > diff --git v4.5-rc5-mmotm-2016-02-24-16-18/arch/x86/mm/gup.c v4.5-rc5-mmotm-2016-02-24-16-18_patched/arch/x86/mm/gup.c
> > index f8d0b5e..34c3d43 100644
> > --- v4.5-rc5-mmotm-2016-02-24-16-18/arch/x86/mm/gup.c
> > +++ v4.5-rc5-mmotm-2016-02-24-16-18_patched/arch/x86/mm/gup.c
> > @@ -10,6 +10,7 @@
> > #include <linux/highmem.h>
> > #include <linux/swap.h>
> > #include <linux/memremap.h>
> > +#include <linux/swapops.h>
> >
> > #include <asm/pgtable.h>
> >
> > @@ -210,6 +211,8 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
> > if (pmd_none(pmd))
> > return 0;
> > if (unlikely(pmd_large(pmd) || !pmd_present(pmd))) {
> > + if (unlikely(is_pmd_migration_entry(pmd)))
> > + return 0;
>
> Hm. I'd expected to see a bunch of pmd_none() to pmd_present() conversions.
> That seems the right way to guard the code. Otherwise we would need even more
> checks once PMD-level swap is implemented.

Yes, I agree. I'll try some conversions for this pmd_none/pmd_present issue.

Thanks,
Naoya

>
> I think we need to check for migration entries only if we have something
> to do with migration. In all other cases pmd_present() should be enough to
> bail out.
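
A minimal sketch of the kind of conversion being agreed on here, as it might
look in a walker that has nothing to do with migration; it is illustrative
only, and it presumes pmd_present() really does return false for a pmd
migration entry, which is the _PAGE_PSE question raised earlier in the thread:

	pmd_t pmd = READ_ONCE(*pmdp);	/* pmdp: the pmd slot being walked */

	/*
	 * Bail out on any non-present pmd (a migration entry, a future
	 * PMD-level swap entry, none, ...) instead of enumerating each case;
	 * only paths that actually act on migration need an explicit
	 * is_pmd_migration_entry() test.
	 */
	if (!pmd_present(pmd))
		return 0;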