2013-07-25 04:56:21

by Naoya Horiguchi

Subject: [PATCH v4 0/8] extend hugepage migration

Here is the 4th version of hugepage migration patchset.
I added Reviewed/Acked tags and applied the feedback from the previous
discussion (thank you, all reviewers!):
- fixed macro (1/8)
- improved comment and readability (1/8, 3/8, 4/8, 7/8)
- improved node choice in allocating destination hugepage (7/8)

TODOs (likely to be done after this work):
- split page table lock for pmd/pud-based hugepages (possibly applicable to thp)
- improve alloc_migrate_target (especially its node choice)
- use the page walker in check_range

I hope that this series is now ready to be merged into the -mm tree.
Andrew, could you review and judge this?

Thanks,
Naoya Horiguchi
---
GitHub:
git://github.com/Naoya-Horiguchi/linux.git extend_hugepage_migration.v4

Test code:
git://github.com/Naoya-Horiguchi/test_hugepage_migration_extension.git

Naoya Horiguchi (8):
migrate: make core migration code aware of hugepage
soft-offline: use migrate_pages() instead of migrate_huge_page()
migrate: add hugepage migration code to migrate_pages()
migrate: add hugepage migration code to move_pages()
mbind: add hugepage migration code to mbind()
migrate: remove VM_HUGETLB from vma flag check in vma_migratable()
memory-hotplug: enable memory hotplug to handle hugepage
prepare to remove /proc/sys/vm/hugepages_treat_as_movable

Documentation/sysctl/vm.txt | 13 +----
include/linux/hugetlb.h | 15 +++++
include/linux/mempolicy.h | 2 +-
include/linux/migrate.h | 5 --
mm/hugetlb.c | 134 +++++++++++++++++++++++++++++++++++++++-----
mm/memory-failure.c | 15 ++++-
mm/memory.c | 17 +++++-
mm/memory_hotplug.c | 42 +++++++++++---
mm/mempolicy.c | 46 +++++++++++++--
mm/migrate.c | 51 ++++++++---------
mm/page_alloc.c | 12 ++++
mm/page_isolation.c | 14 +++++
12 files changed, 288 insertions(+), 78 deletions(-)


2013-07-25 04:55:58

by Naoya Horiguchi

Subject: [PATCH 5/8] mbind: add hugepage migration code to mbind()

This patch extends do_mbind() to handle VMAs with VM_HUGETLB set.
Hugepages can then be migrated with mbind(2) once the enablement
patch later in this series is applied.
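On this path the destination page comes from the new_vma_page() callback, and migrate_pages() treats a NULL return from that callback as -ENOMEM, whereas alloc_huge_page() reports failure with an ERR_PTR()-encoded errno. The userspace sketch below mirrors how alloc_huge_page_noerr() bridges the two. ERR_PTR()/IS_ERR()/PTR_ERR() are local re-implementations modeled on <linux/err.h>, and fake_alloc_huge_page() with its one-page pool is a hypothetical stand-in, not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace copy of the <linux/err.h> idiom: the top MAX_ERRNO
 * addresses encode a negative errno instead of a real pointer. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct page { int nid; };

static struct page pool_page;	/* hypothetical: the only pool hugepage */
static int pool_free = 1;	/* hypothetical: free pages left in the pool */

/* Hypothetical stand-in for alloc_huge_page(): ERR_PTR() on failure. */
static struct page *fake_alloc_huge_page(void)
{
	if (!pool_free)
		return ERR_PTR(-ENOSPC);
	pool_free--;
	return &pool_page;
}

/* Mirrors alloc_huge_page_noerr(): turn an ERR_PTR() value into NULL
 * so callers that only check for NULL handle the failure correctly. */
static struct page *fake_alloc_huge_page_noerr(void)
{
	struct page *page = fake_alloc_huge_page();

	if (IS_ERR(page))
		page = NULL;
	return page;
}
```

The first call hands out the pool page; once the pool is empty, the wrapper returns NULL while the raw allocator returns ERR_PTR(-ENOSPC).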

ChangeLog v3:
- revert introducing migrate_movable_pages
- add alloc_huge_page_noerr(), which returns NULL instead of an ERR_PTR value

ChangeLog v2:
- updated description and renamed patch title

Signed-off-by: Naoya Horiguchi <[email protected]>
Acked-by: Andi Kleen <[email protected]>
Reviewed-by: Wanpeng Li <[email protected]>
---
include/linux/hugetlb.h | 3 +++
mm/hugetlb.c | 14 ++++++++++++++
mm/mempolicy.c | 4 +++-
3 files changed, 20 insertions(+), 1 deletion(-)

diff --git v3.11-rc1.orig/include/linux/hugetlb.h v3.11-rc1/include/linux/hugetlb.h
index c7a14a4..cae5539 100644
--- v3.11-rc1.orig/include/linux/hugetlb.h
+++ v3.11-rc1/include/linux/hugetlb.h
@@ -267,6 +267,8 @@ struct huge_bootmem_page {
};

struct page *alloc_huge_page_node(struct hstate *h, int nid);
+struct page *alloc_huge_page_noerr(struct vm_area_struct *vma,
+ unsigned long addr, int avoid_reserve);

/* arch callback */
int __init alloc_bootmem_huge_page(struct hstate *h);
@@ -380,6 +382,7 @@ static inline pgoff_t basepage_index(struct page *page)
#else /* CONFIG_HUGETLB_PAGE */
struct hstate {};
#define alloc_huge_page_node(h, nid) NULL
+#define alloc_huge_page_noerr(v, a, r) NULL
#define alloc_bootmem_huge_page(h) NULL
#define hstate_file(f) NULL
#define hstate_sizelog(s) NULL
diff --git v3.11-rc1.orig/mm/hugetlb.c v3.11-rc1/mm/hugetlb.c
index 506d195..f6d8d67 100644
--- v3.11-rc1.orig/mm/hugetlb.c
+++ v3.11-rc1/mm/hugetlb.c
@@ -1195,6 +1195,20 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
return page;
}

+/*
+ * alloc_huge_page()'s wrapper which simply returns the page if allocation
+ * succeeds, otherwise NULL. This function is called from new_vma_page(),
+ * where no ERR_VALUE is expected to be returned.
+ */
+struct page *alloc_huge_page_noerr(struct vm_area_struct *vma,
+ unsigned long addr, int avoid_reserve)
+{
+ struct page *page = alloc_huge_page(vma, addr, avoid_reserve);
+ if (IS_ERR(page))
+ page = NULL;
+ return page;
+}
+
int __weak alloc_bootmem_huge_page(struct hstate *h)
{
struct huge_bootmem_page *m;
diff --git v3.11-rc1.orig/mm/mempolicy.c v3.11-rc1/mm/mempolicy.c
index d96afc1..4a03c14 100644
--- v3.11-rc1.orig/mm/mempolicy.c
+++ v3.11-rc1/mm/mempolicy.c
@@ -1183,6 +1183,8 @@ static struct page *new_vma_page(struct page *page, unsigned long private, int *
vma = vma->vm_next;
}

+ if (PageHuge(page))
+ return alloc_huge_page_noerr(vma, address, 1);
/*
* if !vma, alloc_page_vma() will use task or system default policy
*/
@@ -1293,7 +1295,7 @@ static long do_mbind(unsigned long start, unsigned long len,
(unsigned long)vma,
MIGRATE_SYNC, MR_MEMPOLICY_MBIND);
if (nr_failed)
- putback_lru_pages(&pagelist);
+ putback_movable_pages(&pagelist);
}

if (nr_failed && (flags & MPOL_MF_STRICT))
--
1.8.3.1

2013-07-25 04:55:52

by Naoya Horiguchi

Subject: [PATCH 2/8] soft-offline: use migrate_pages() instead of migrate_huge_page()

Currently migrate_huge_page() takes as its argument a pointer to the
hugepage to be migrated, instead of a pointer to a list of hugepages to
be migrated. This behavior was introduced in commit 189ebff28
("hugetlb: simplify migrate_huge_page()"), and was fine because until
now hugepage migration has been enabled only for soft-offlining, which
migrates a single hugepage per call.

This changes with the later patches in this series, which enable other
users of page migration to support hugepage migration. Those users can
trigger migration of both normal pages and hugepages in a single call,
so we need to go back to the original implementation, which uses a
linked list to collect the hugepages to be migrated.
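To illustrate what a list-based interface buys, here is a userspace sketch of a migrate_pages()-style loop: one call walks a list of pages, retries those that return -EAGAIN for up to ten passes (the pass count of the removed migrate_huge_page() loop), and reports how many pages were left behind. struct page, unmap_and_move_one() and migrate_list() are hypothetical simplifications, not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MIGRATEPAGE_SUCCESS 0
#define MAX_PASSES 10	/* same pass count as the removed migrate_huge_page() */

/* Hypothetical page model: 'next' stands in for the page->lru linkage. */
struct page {
	int busy_passes;	/* passes that still return -EAGAIN */
	int fail;		/* nonzero: migration fails permanently */
	int migrated;
	struct page *next;
};

/* Hypothetical per-page step; 'force' mirrors the "pass > 2" argument
 * but is ignored in this model. */
static int unmap_and_move_one(struct page *page, int force)
{
	(void)force;
	if (page->fail)
		return -EBUSY;
	if (page->busy_passes > 0) {
		page->busy_passes--;
		return -EAGAIN;
	}
	page->migrated = 1;
	return MIGRATEPAGE_SUCCESS;
}

/* Migrate every page on *from, retrying -EAGAIN pages on later passes.
 * Migrated pages are unlinked; permanent failures move to *failed.
 * Returns the number of pages not migrated (failed + still retrying). */
static int migrate_list(struct page **from, struct page **failed)
{
	int pass, retry = 1, nr_failed = 0;

	for (pass = 0; pass < MAX_PASSES && retry; pass++) {
		struct page **pp = from;

		retry = 0;
		while (*pp) {
			struct page *page = *pp;
			int rc = unmap_and_move_one(page, pass > 2);

			if (rc == -EAGAIN) {
				retry++;
				pp = &page->next;	/* keep it for the next pass */
				continue;
			}
			*pp = page->next;		/* unlink from the source list */
			if (rc != MIGRATEPAGE_SUCCESS) {
				page->next = *failed;
				*failed = page;
				nr_failed++;
			}
		}
	}
	return nr_failed + retry;
}
```

With this shape, soft-offline simply passes a one-entry list, while the callers enabled later in the series can pass many pages in one call.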

With this patch, soft_offline_huge_page() switches to migrate_pages(),
and migrate_huge_page() is no longer used, so remove it.

ChangeLog v3:
- Merged with another cleanup patch (4/10 in previous version)

Signed-off-by: Naoya Horiguchi <[email protected]>
Acked-by: Andi Kleen <[email protected]>
Reviewed-by: Wanpeng Li <[email protected]>
---
include/linux/migrate.h | 5 -----
mm/memory-failure.c | 15 ++++++++++++---
mm/migrate.c | 28 ++--------------------------
3 files changed, 14 insertions(+), 34 deletions(-)

diff --git v3.11-rc1.orig/include/linux/migrate.h v3.11-rc1/include/linux/migrate.h
index a405d3dc..6fe5214 100644
--- v3.11-rc1.orig/include/linux/migrate.h
+++ v3.11-rc1/include/linux/migrate.h
@@ -41,8 +41,6 @@ extern int migrate_page(struct address_space *,
struct page *, struct page *, enum migrate_mode);
extern int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, enum migrate_mode mode, int reason);
-extern int migrate_huge_page(struct page *, new_page_t x,
- unsigned long private, enum migrate_mode mode);

extern int fail_migrate_page(struct address_space *,
struct page *, struct page *);
@@ -62,9 +60,6 @@ static inline void putback_movable_pages(struct list_head *l) {}
static inline int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, enum migrate_mode mode, int reason)
{ return -ENOSYS; }
-static inline int migrate_huge_page(struct page *page, new_page_t x,
- unsigned long private, enum migrate_mode mode)
- { return -ENOSYS; }

static inline int migrate_prep(void) { return -ENOSYS; }
static inline int migrate_prep_local(void) { return -ENOSYS; }
diff --git v3.11-rc1.orig/mm/memory-failure.c v3.11-rc1/mm/memory-failure.c
index 2c13aa7..af6f61c 100644
--- v3.11-rc1.orig/mm/memory-failure.c
+++ v3.11-rc1/mm/memory-failure.c
@@ -1467,6 +1467,7 @@ static int soft_offline_huge_page(struct page *page, int flags)
int ret;
unsigned long pfn = page_to_pfn(page);
struct page *hpage = compound_head(page);
+ LIST_HEAD(pagelist);

/*
* This double-check of PageHWPoison is to avoid the race with
@@ -1482,12 +1483,20 @@ static int soft_offline_huge_page(struct page *page, int flags)
unlock_page(hpage);

/* Keep page count to indicate a given hugepage is isolated. */
- ret = migrate_huge_page(hpage, new_page, MPOL_MF_MOVE_ALL,
- MIGRATE_SYNC);
- put_page(hpage);
+ list_move(&hpage->lru, &pagelist);
+ ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
+ MIGRATE_SYNC, MR_MEMORY_FAILURE);
if (ret) {
pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
pfn, ret, page->flags);
+ /*
+ * We know that soft_offline_huge_page() tries to migrate
+ * only one hugepage pointed to by hpage, so we need not
+ * run through the pagelist here.
+ */
+ putback_active_hugepage(hpage);
+ if (ret > 0)
+ ret = -EIO;
} else {
set_page_hwpoison_huge_page(hpage);
dequeue_hwpoisoned_huge_page(hpage);
diff --git v3.11-rc1.orig/mm/migrate.c v3.11-rc1/mm/migrate.c
index b44a067..3ec47d3 100644
--- v3.11-rc1.orig/mm/migrate.c
+++ v3.11-rc1/mm/migrate.c
@@ -979,6 +979,8 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,

unlock_page(hpage);
out:
+ if (rc != -EAGAIN)
+ putback_active_hugepage(hpage);
put_page(new_hpage);
if (result) {
if (rc)
@@ -1066,32 +1068,6 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
return rc;
}

-int migrate_huge_page(struct page *hpage, new_page_t get_new_page,
- unsigned long private, enum migrate_mode mode)
-{
- int pass, rc;
-
- for (pass = 0; pass < 10; pass++) {
- rc = unmap_and_move_huge_page(get_new_page, private,
- hpage, pass > 2, mode);
- switch (rc) {
- case -ENOMEM:
- goto out;
- case -EAGAIN:
- /* try again */
- cond_resched();
- break;
- case MIGRATEPAGE_SUCCESS:
- goto out;
- default:
- rc = -EIO;
- goto out;
- }
- }
-out:
- return rc;
-}
-
#ifdef CONFIG_NUMA
/*
* Move a list of individual pages
--
1.8.3.1

2013-07-25 04:56:14

by Naoya Horiguchi

Subject: [PATCH 7/8] memory-hotplug: enable memory hotplug to handle hugepage

Until now we could not offline memory blocks containing hugepages,
because a hugepage was considered an unmovable page. With this patch
series hugepages become movable, so such memory blocks can now be
offlined by using hugepage migration.

Unlike other users of hugepage migration, memory hotplug needs to
decompose all the hugepages inside the target memory block into free
buddy pages after hugepage migration; otherwise free hugepages remaining
in the memory block would prevent it from being offlined. For this
reason we introduce the new functions dissolve_free_huge_page() and
dissolve_free_huge_pages().
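dissolve_free_huge_pages() walks the pfn range in steps of the smallest supported hugepage size. The userspace sketch below reproduces that step computation. The hstate orders (9 and 18, i.e. 2MB and 1GB with 4KB base pages) and the counting dissolve_one() hook are assumptions for illustration; the alignment assert plays the role of the patch's VM_BUG_ON().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hstate orders: 1GB (order 18) and 2MB (order 9) hugepages
 * with 4KB base pages, as on x86_64. */
static const unsigned int hstate_orders[] = { 18, 9 };

/* Hypothetical per-pfn hook that only counts how often it runs. */
static unsigned long visits;
static void dissolve_one(unsigned long pfn) { (void)pfn; visits++; }

static void dissolve_range(unsigned long start_pfn, unsigned long end_pfn)
{
	unsigned int order = 8 * sizeof(void *);
	unsigned long pfn;
	size_t i;

	/* The scan step is the smallest hugepage size in use. */
	for (i = 0; i < sizeof(hstate_orders) / sizeof(hstate_orders[0]); i++)
		if (order > hstate_orders[i])
			order = hstate_orders[i];

	/* start_pfn must be aligned to that step, like the VM_BUG_ON(). */
	assert((start_pfn & ((1UL << order) - 1)) == 0);

	for (pfn = start_pfn; pfn < end_pfn; pfn += 1UL << order)
		dissolve_one(pfn);
}
```

For a 128MB block (32768 pfns) the sketch examines 64 candidate pfns rather than every pfn in the block.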

Other than that, this patch straightforwardly adds hugepage migration
code: hugepage handling in the functions which scan over pfns and
collect hugepages to be migrated, and a hugepage allocation path in
alloc_migrate_target().
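The pfn scan in scan_movable_pages() returns the first LRU page or active hugepage head it meets, and otherwise jumps over the whole hugepage with round_up(). A userspace sketch of that skip logic follows; the fake_page map, its kinds, and round_up_pow2() (modeled on the kernel's power-of-two round_up()) are hypothetical.

```c
#include <assert.h>

/* Hypothetical per-pfn page kinds. */
enum kind {
	OTHER,			/* neither LRU nor hugepage */
	LRU,			/* PageLRU() */
	HUGE_ACTIVE_HEAD,	/* head page of an in-use hugepage */
	HUGE_SKIP,		/* free hugepage, or a tail page */
};

struct fake_page {
	enum kind kind;
	unsigned int order;	/* compound order for hugepage pfns, else 0 */
};

/* Power-of-two round_up(), in the style of the kernel macro. */
#define round_up_pow2(x, y) ((((x) - 1) | ((y) - 1)) + 1)

/* Return the first movable pfn in [start, end), or 0 if none is found. */
static unsigned long scan_movable(const struct fake_page *map,
				  unsigned long start, unsigned long end)
{
	unsigned long pfn;

	for (pfn = start; pfn < end; pfn++) {
		const struct fake_page *p = &map[pfn];

		if (p->kind == LRU || p->kind == HUGE_ACTIVE_HEAD)
			return pfn;
		if (p->kind == HUGE_SKIP) {
			/* Jump to the last pfn of this hugepage; the
			 * loop's pfn++ then moves past it. */
			pfn = round_up_pow2(pfn + 1, 1UL << p->order) - 1;
		}
	}
	return 0;
}
```

Starting the scan on a tail pfn skips straight to the first pfn after that hugepage, which is why a scan starting at pfn 17 in the test below lands on pfn 24.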

As for larger hugepages (1GB on x86_64), hot-removing them is not easy
because they are larger than a memory block, so for now we simply let
such offlining fail.
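A rough size comparison, assuming 4KB base pages and the 128MB memory sections x86_64 used at the time (SECTION_SIZE_BITS = 27, hence PFN_SECTION_SHIFT = 15):

$$
\frac{1\,\mathrm{GB}}{4\,\mathrm{KB}} = \frac{2^{30}}{2^{12}} = 2^{18}\ \text{base pages (order 18)},
\qquad
\frac{128\,\mathrm{MB}}{4\,\mathrm{KB}} = \frac{2^{27}}{2^{12}} = 2^{15}\ \text{base pages per section}
$$

Since 18 > 15, the compound_order(head) > PFN_SECTION_SHIFT check added to do_migrate_range() returns -EBUSY for a 1GB hugepage, while a 2MB hugepage (order 9) goes on to isolate_huge_page().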

ChangeLog v4:
- add comment on dequeue_huge_page_node
- alloc_migrate_target allocates the destination hugepage from the node
next to the source node
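The destination choice in alloc_migrate_target() complements the source node's mask and asks next_node() for the first node after the source. A userspace sketch of that choice, assuming a simplified single-word nodemask and a hypothetical MAX_NUMNODES of 8:

```c
#include <assert.h>

#define MAX_NUMNODES 8			/* hypothetical node count */

typedef unsigned long nodemask_model_t;	/* one bit per node */

/* Like next_node(): first set bit strictly above n, else MAX_NUMNODES. */
static int next_node_model(int n, nodemask_model_t mask)
{
	int nid;

	for (nid = n + 1; nid < MAX_NUMNODES; nid++)
		if (mask & (1UL << nid))
			return nid;
	return MAX_NUMNODES;
}

/* Destination node as chosen in alloc_migrate_target(): the next node
 * in the complement of the source node's mask. */
static int pick_destination(int src_nid)
{
	nodemask_model_t src = 1UL << src_nid;
	nodemask_model_t all = (1UL << MAX_NUMNODES) - 1;
	nodemask_model_t dst = ~src & all;	/* nodes_complement() */

	return next_node_model(src_nid, dst);
}
```

Like next_node(), the sketch searches only upward, so for the highest-numbered node it yields MAX_NUMNODES rather than wrapping around to node 0.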

ChangeLog v3:
- revert introducing migrate_movable_pages (the function was open-coded)
- add migratetype check in dequeue_huge_page_node to close the race
between scan and allocation
- make is_hugepage_movable use refcount to find active hugepages
instead of running through hugepage_activelist
- rename is_hugepage_movable to is_hugepage_active
- add alignment check in dissolve_free_huge_pages
- use round_up in calculating next scanning pfn
- use isolate_huge_page
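The migratetype check in dequeue_huge_page_node() amounts to taking the first free hugepage that is not in an isolated pageblock, so a hugepage that hotremove is working on is not handed out again. A userspace sketch, assuming a hypothetical singly linked free list and an isolated flag standing in for is_migrate_isolate_page():

```c
#include <assert.h>
#include <stddef.h>

struct hpage {
	int isolated;		/* stand-in for is_migrate_isolate_page() */
	struct hpage *next;	/* stand-in for the per-node free list */
};

/* Take the first free hugepage that is not isolated; return NULL if
 * every free hugepage on the list is isolated. */
static struct hpage *dequeue_non_isolated(struct hpage **freelist)
{
	struct hpage **pp;

	for (pp = freelist; *pp; pp = &(*pp)->next) {
		if (!(*pp)->isolated) {
			struct hpage *page = *pp;

			*pp = page->next;	/* unlink it from the free list */
			page->next = NULL;
			return page;
		}
	}
	return NULL;
}
```

Isolated pages stay on the free list untouched, so dissolve_free_huge_pages() can still find and dissolve them afterwards.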

ChangeLog v2:
- changed return value type of is_hugepage_movable() to bool
- is_hugepage_movable() uses list_for_each_entry() instead of *_safe()
- moved if(PageHuge) block before get_page_unless_zero() in do_migrate_range()
- do_migrate_range() returns -EBUSY for hugepages larger than memory block
- dissolve_free_huge_pages() calculates scan step and sets it to minimum
hugepage size

Signed-off-by: Naoya Horiguchi <[email protected]>
Acked-by: Andi Kleen <[email protected]>
---
include/linux/hugetlb.h | 6 +++++
mm/hugetlb.c | 71 +++++++++++++++++++++++++++++++++++++++++++++++--
mm/memory_hotplug.c | 42 ++++++++++++++++++++++++-----
mm/page_alloc.c | 12 +++++++++
mm/page_isolation.c | 14 ++++++++++
5 files changed, 136 insertions(+), 9 deletions(-)

diff --git v3.11-rc1.orig/include/linux/hugetlb.h v3.11-rc1/include/linux/hugetlb.h
index cae5539..e486c50 100644
--- v3.11-rc1.orig/include/linux/hugetlb.h
+++ v3.11-rc1/include/linux/hugetlb.h
@@ -69,6 +69,7 @@ int dequeue_hwpoisoned_huge_page(struct page *page);
bool isolate_huge_page(struct page *page, struct list_head *list);
void putback_active_hugepage(struct page *page);
void putback_active_hugepages(struct list_head *list);
+bool is_hugepage_active(struct page *page);
void copy_huge_page(struct page *dst, struct page *src);

#ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
@@ -140,6 +141,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page)
#define isolate_huge_page(p, l) false
#define putback_active_hugepage(p) do {} while (0)
#define putback_active_hugepages(l) do {} while (0)
+#define is_hugepage_active(x) false
static inline void copy_huge_page(struct page *dst, struct page *src)
{
}
@@ -379,6 +381,9 @@ static inline pgoff_t basepage_index(struct page *page)
return __basepage_index(page);
}

+extern void dissolve_free_huge_pages(unsigned long start_pfn,
+ unsigned long end_pfn);
+
#else /* CONFIG_HUGETLB_PAGE */
struct hstate {};
#define alloc_huge_page_node(h, nid) NULL
@@ -405,6 +410,7 @@ static inline pgoff_t basepage_index(struct page *page)
{
return page->index;
}
+#define dissolve_free_huge_pages(s, e) do {} while (0)
#endif /* CONFIG_HUGETLB_PAGE */

#endif /* _LINUX_HUGETLB_H */
diff --git v3.11-rc1.orig/mm/hugetlb.c v3.11-rc1/mm/hugetlb.c
index f6d8d67..75bbdc0 100644
--- v3.11-rc1.orig/mm/hugetlb.c
+++ v3.11-rc1/mm/hugetlb.c
@@ -21,6 +21,7 @@
#include <linux/rmap.h>
#include <linux/swap.h>
#include <linux/swapops.h>
+#include <linux/page-isolation.h>

#include <asm/page.h>
#include <asm/pgtable.h>
@@ -518,9 +519,15 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid)
{
struct page *page;

- if (list_empty(&h->hugepage_freelists[nid]))
+ list_for_each_entry(page, &h->hugepage_freelists[nid], lru)
+ if (!is_migrate_isolate_page(page))
+ break;
+ /*
+ * if 'non-isolated free hugepage' not found on the list,
+ * the allocation fails.
+ */
+ if (&h->hugepage_freelists[nid] == &page->lru)
return NULL;
- page = list_entry(h->hugepage_freelists[nid].next, struct page, lru);
list_move(&page->lru, &h->hugepage_activelist);
set_page_refcounted(page);
h->free_huge_pages--;
@@ -861,6 +868,44 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
return ret;
}

+/*
+ * Dissolve a given free hugepage into free buddy pages. This function does
+ * nothing for in-use (including surplus) hugepages.
+ */
+static void dissolve_free_huge_page(struct page *page)
+{
+ spin_lock(&hugetlb_lock);
+ if (PageHuge(page) && !page_count(page)) {
+ struct hstate *h = page_hstate(page);
+ int nid = page_to_nid(page);
+ list_del(&page->lru);
+ h->free_huge_pages--;
+ h->free_huge_pages_node[nid]--;
+ update_and_free_page(h, page);
+ }
+ spin_unlock(&hugetlb_lock);
+}
+
+/*
+ * Dissolve free hugepages in a given pfn range. Used by memory hotplug to
+ * make specified memory blocks removable from the system.
+ * Note that start_pfn should be aligned with (minimum) hugepage size.
+ */
+void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
+{
+ unsigned int order = 8 * sizeof(void *);
+ unsigned long pfn;
+ struct hstate *h;
+
+ /* Set scan step to minimum hugepage size */
+ for_each_hstate(h)
+ if (order > huge_page_order(h))
+ order = huge_page_order(h);
+ VM_BUG_ON(!IS_ALIGNED(start_pfn, 1 << order));
+ for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << order)
+ dissolve_free_huge_page(pfn_to_page(pfn));
+}
+
static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
{
struct page *page;
@@ -3418,6 +3463,28 @@ static int is_hugepage_on_freelist(struct page *hpage)
return 0;
}

+bool is_hugepage_active(struct page *page)
+{
+ VM_BUG_ON(!PageHuge(page));
+ /*
+ * This function can be called for a tail page because the caller,
+ * scan_movable_pages, scans through a given pfn-range which typically
+ * covers one memory block. In systems using gigantic hugepage (1GB
+ * for x86_64,) a hugepage is larger than a memory block, and we don't
+ * support migrating such large hugepages for now, so return false
+ * when called for tail pages.
+ */
+ if (PageTail(page))
+ return false;
+ /*
+ * Refcount of hwpoisoned hugepages is 1, but they are not active,
+ * so we should return false for them.
+ */
+ if (unlikely(PageHWPoison(page)))
+ return false;
+ return page_count(page) > 0;
+}
+
/*
* This function is called from memory failure code.
* Assume the caller holds page lock of the head page.
diff --git v3.11-rc1.orig/mm/memory_hotplug.c v3.11-rc1/mm/memory_hotplug.c
index ca1dd3a..31f08fa 100644
--- v3.11-rc1.orig/mm/memory_hotplug.c
+++ v3.11-rc1/mm/memory_hotplug.c
@@ -30,6 +30,7 @@
#include <linux/mm_inline.h>
#include <linux/firmware-map.h>
#include <linux/stop_machine.h>
+#include <linux/hugetlb.h>

#include <asm/tlbflush.h>

@@ -1208,10 +1209,12 @@ static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
}

/*
- * Scanning pfn is much easier than scanning lru list.
- * Scan pfn from start to end and Find LRU page.
+ * Scan pfn range [start,end) to find movable/migratable pages (LRU pages
+ * and hugepages). We scan pfn because it's much easier than scanning over
+ * linked list. This function returns the pfn of the first found movable
+ * page if it's found, otherwise 0.
*/
-static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
+static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
{
unsigned long pfn;
struct page *page;
@@ -1220,6 +1223,13 @@ static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
page = pfn_to_page(pfn);
if (PageLRU(page))
return pfn;
+ if (PageHuge(page)) {
+ if (is_hugepage_active(page))
+ return pfn;
+ else
+ pfn = round_up(pfn + 1,
+ 1 << compound_order(page)) - 1;
+ }
}
}
return 0;
@@ -1240,6 +1250,19 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
if (!pfn_valid(pfn))
continue;
page = pfn_to_page(pfn);
+
+ if (PageHuge(page)) {
+ struct page *head = compound_head(page);
+ pfn = page_to_pfn(head) + (1<<compound_order(head)) - 1;
+ if (compound_order(head) > PFN_SECTION_SHIFT) {
+ ret = -EBUSY;
+ break;
+ }
+ if (isolate_huge_page(page, &source))
+ move_pages -= 1 << compound_order(head);
+ continue;
+ }
+
if (!get_page_unless_zero(page))
continue;
/*
@@ -1272,7 +1295,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
}
if (!list_empty(&source)) {
if (not_managed) {
- putback_lru_pages(&source);
+ putback_movable_pages(&source);
goto out;
}

@@ -1283,7 +1306,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
ret = migrate_pages(&source, alloc_migrate_target, 0,
MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
if (ret)
- putback_lru_pages(&source);
+ putback_movable_pages(&source);
}
out:
return ret;
@@ -1527,8 +1550,8 @@ static int __ref __offline_pages(unsigned long start_pfn,
drain_all_pages();
}

- pfn = scan_lru_pages(start_pfn, end_pfn);
- if (pfn) { /* We have page on LRU */
+ pfn = scan_movable_pages(start_pfn, end_pfn);
+ if (pfn) { /* We have movable pages */
ret = do_migrate_range(pfn, end_pfn);
if (!ret) {
drain = 1;
@@ -1547,6 +1570,11 @@ static int __ref __offline_pages(unsigned long start_pfn,
yield();
/* drain pcp pages, this is synchronous. */
drain_all_pages();
+ /*
+ * dissolve free hugepages in the memory block before actually offlining
+ * it, in order to keep hugetlbfs's object counting consistent.
+ */
+ dissolve_free_huge_pages(start_pfn, end_pfn);
/* check again */
offlined_pages = check_pages_isolated(start_pfn, end_pfn);
if (offlined_pages < 0) {
diff --git v3.11-rc1.orig/mm/page_alloc.c v3.11-rc1/mm/page_alloc.c
index b100255..24fe228 100644
--- v3.11-rc1.orig/mm/page_alloc.c
+++ v3.11-rc1/mm/page_alloc.c
@@ -60,6 +60,7 @@
#include <linux/page-debug-flags.h>
#include <linux/hugetlb.h>
#include <linux/sched/rt.h>
+#include <linux/hugetlb.h>

#include <asm/sections.h>
#include <asm/tlbflush.h>
@@ -5928,6 +5929,17 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
continue;

page = pfn_to_page(check);
+
+ /*
+ * Hugepages are not in LRU lists, but they're movable.
+ * We need not scan over tail pages because we don't
+ * handle each tail page individually in migration.
+ */
+ if (PageHuge(page)) {
+ iter = round_up(iter + 1, 1<<compound_order(page)) - 1;
+ continue;
+ }
+
/*
* We can't use page_count without pin a page
* because another CPU can free compound page.
diff --git v3.11-rc1.orig/mm/page_isolation.c v3.11-rc1/mm/page_isolation.c
index 383bdbb..229d66f 100644
--- v3.11-rc1.orig/mm/page_isolation.c
+++ v3.11-rc1/mm/page_isolation.c
@@ -6,6 +6,7 @@
#include <linux/page-isolation.h>
#include <linux/pageblock-flags.h>
#include <linux/memory.h>
+#include <linux/hugetlb.h>
#include "internal.h"

int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
@@ -252,6 +253,19 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private,
{
gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE;

+ /*
+ * TODO: allocate a destination hugepage from a nearest neighbor node,
+ * in accordance with memory policy of the user process if possible. For
+ * now as a simple work-around, we use the next node for destination.
+ */
+ if (PageHuge(page)) {
+ nodemask_t src = nodemask_of_node(page_to_nid(page));
+ nodemask_t dst;
+ nodes_complement(dst, src);
+ return alloc_huge_page_node(page_hstate(compound_head(page)),
+ next_node(page_to_nid(page), dst));
+ }
+
if (PageHighMem(page))
gfp_mask |= __GFP_HIGHMEM;

--
1.8.3.1

2013-07-25 04:56:52

by Naoya Horiguchi

Subject: [PATCH 1/8] migrate: make core migration code aware of hugepage

Before enabling each user of page migration to support hugepages,
this patch allows the list of pages for migration to link not only
LRU pages but also hugepages. As a result, putback_movable_pages()
and migrate_pages() can handle both LRU pages and hugepages.
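Once hugepages and LRU pages share one list, putback has to dispatch per entry. A userspace sketch of that dispatch; the page model and the two counting putback functions are hypothetical stand-ins for putback_active_hugepage() and the LRU putback path:

```c
#include <assert.h>
#include <stddef.h>

struct page {
	int huge;		/* stand-in for PageHuge() */
	struct page *next;	/* stand-in for page->lru */
};

/* Hypothetical counters standing in for the two putback destinations. */
static int lru_putbacks, hugepage_putbacks;

static void putback_lru_page_model(struct page *p)
{
	(void)p;
	lru_putbacks++;
}

static void putback_active_hugepage_model(struct page *p)
{
	(void)p;
	hugepage_putbacks++;
}

/* One list may hold both kinds of pages, so dispatch per entry, as
 * putback_movable_pages() does after this patch. */
static void putback_movable_pages_model(struct page **list)
{
	while (*list) {
		struct page *page = *list;

		*list = page->next;	/* unlink before handing the page back */
		page->next = NULL;
		if (page->huge)
			putback_active_hugepage_model(page);
		else
			putback_lru_page_model(page);
	}
}
```

The unlink-then-dispatch order matches the need to remove each entry from the migration list before returning it to its owner.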

ChangeLog v4:
- make some macros return 'do {} while(0)'
- use more readable variable name

ChangeLog v3:
- revert introducing migrate_movable_pages
- add isolate_huge_page

ChangeLog v2:
- move code removing VM_HUGETLB from vma_migratable check into a
separate patch
- hold hugetlb_lock in putback_active_hugepage
- update comment near the definition of hugetlb_lock

Signed-off-by: Naoya Horiguchi <[email protected]>
Acked-by: Andi Kleen <[email protected]>
Reviewed-by: Wanpeng Li <[email protected]>
---
include/linux/hugetlb.h | 6 ++++++
mm/hugetlb.c | 32 +++++++++++++++++++++++++++++++-
mm/migrate.c | 10 +++++++++-
3 files changed, 46 insertions(+), 2 deletions(-)

diff --git v3.11-rc1.orig/include/linux/hugetlb.h v3.11-rc1/include/linux/hugetlb.h
index c2b1801..c7a14a4 100644
--- v3.11-rc1.orig/include/linux/hugetlb.h
+++ v3.11-rc1/include/linux/hugetlb.h
@@ -66,6 +66,9 @@ int hugetlb_reserve_pages(struct inode *inode, long from, long to,
vm_flags_t vm_flags);
void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed);
int dequeue_hwpoisoned_huge_page(struct page *page);
+bool isolate_huge_page(struct page *page, struct list_head *list);
+void putback_active_hugepage(struct page *page);
+void putback_active_hugepages(struct list_head *list);
void copy_huge_page(struct page *dst, struct page *src);

#ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
@@ -134,6 +137,9 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page)
return 0;
}

+#define isolate_huge_page(p, l) false
+#define putback_active_hugepage(p) do {} while (0)
+#define putback_active_hugepages(l) do {} while (0)
static inline void copy_huge_page(struct page *dst, struct page *src)
{
}
diff --git v3.11-rc1.orig/mm/hugetlb.c v3.11-rc1/mm/hugetlb.c
index 83aff0a..506d195 100644
--- v3.11-rc1.orig/mm/hugetlb.c
+++ v3.11-rc1/mm/hugetlb.c
@@ -48,7 +48,8 @@ static unsigned long __initdata default_hstate_max_huge_pages;
static unsigned long __initdata default_hstate_size;

/*
- * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages
+ * Protects updates to hugepage_freelists, hugepage_activelist, nr_huge_pages,
+ * free_huge_pages, and surplus_huge_pages.
*/
DEFINE_SPINLOCK(hugetlb_lock);

@@ -3431,3 +3432,32 @@ int dequeue_hwpoisoned_huge_page(struct page *hpage)
return ret;
}
#endif
+
+bool isolate_huge_page(struct page *page, struct list_head *list)
+{
+ VM_BUG_ON(!PageHead(page));
+ if (!get_page_unless_zero(page))
+ return false;
+ spin_lock(&hugetlb_lock);
+ list_move_tail(&page->lru, list);
+ spin_unlock(&hugetlb_lock);
+ return true;
+}
+
+void putback_active_hugepage(struct page *page)
+{
+ VM_BUG_ON(!PageHead(page));
+ spin_lock(&hugetlb_lock);
+ list_move_tail(&page->lru, &(page_hstate(page))->hugepage_activelist);
+ spin_unlock(&hugetlb_lock);
+ put_page(page);
+}
+
+void putback_active_hugepages(struct list_head *list)
+{
+ struct page *page;
+ struct page *page2;
+
+ list_for_each_entry_safe(page, page2, list, lru)
+ putback_active_hugepage(page);
+}
diff --git v3.11-rc1.orig/mm/migrate.c v3.11-rc1/mm/migrate.c
index 6f0c244..b44a067 100644
--- v3.11-rc1.orig/mm/migrate.c
+++ v3.11-rc1/mm/migrate.c
@@ -100,6 +100,10 @@ void putback_movable_pages(struct list_head *l)
struct page *page2;

list_for_each_entry_safe(page, page2, l, lru) {
+ if (unlikely(PageHuge(page))) {
+ putback_active_hugepage(page);
+ continue;
+ }
list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
@@ -1025,7 +1029,11 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
list_for_each_entry_safe(page, page2, from, lru) {
cond_resched();

- rc = unmap_and_move(get_new_page, private,
+ if (PageHuge(page))
+ rc = unmap_and_move_huge_page(get_new_page,
+ private, page, pass > 2, mode);
+ else
+ rc = unmap_and_move(get_new_page, private,
page, pass > 2, mode);

switch(rc) {
--
1.8.3.1

2013-07-25 04:56:25

by Naoya Horiguchi

Subject: [PATCH 6/8] migrate: remove VM_HUGETLB from vma flag check in vma_migratable()

This patch enables hugepage migration from migrate_pages(2),
move_pages(2), and mbind(2).

Signed-off-by: Naoya Horiguchi <[email protected]>
Acked-by: Hillf Danton <[email protected]>
Acked-by: Andi Kleen <[email protected]>
Reviewed-by: Wanpeng Li <[email protected]>
---
include/linux/mempolicy.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git v3.11-rc1.orig/include/linux/mempolicy.h v3.11-rc1/include/linux/mempolicy.h
index 0d7df39..2e475b5 100644
--- v3.11-rc1.orig/include/linux/mempolicy.h
+++ v3.11-rc1/include/linux/mempolicy.h
@@ -173,7 +173,7 @@ extern int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol);
/* Check if a vma is migratable */
static inline int vma_migratable(struct vm_area_struct *vma)
{
- if (vma->vm_flags & (VM_IO | VM_HUGETLB | VM_PFNMAP))
+ if (vma->vm_flags & (VM_IO | VM_PFNMAP))
return 0;
/*
* Migration allocates pages in the highest zone. If we cannot
--
1.8.3.1

2013-07-25 04:57:09

by Naoya Horiguchi

Subject: [PATCH 8/8] prepare to remove /proc/sys/vm/hugepages_treat_as_movable

Hugepages are now movable, so allocating them from ZONE_MOVABLE is
natural and there is no reason to keep this parameter. To allow
userspace to prepare for its removal, leave this sysctl handler in
place as a no-op for a while.

ChangeLog v3:
- use WARN_ON_ONCE

ChangeLog v2:
- shift to noop function instead of completely removing the parameter
- rename patch title

Signed-off-by: Naoya Horiguchi <[email protected]>
Acked-by: Andi Kleen <[email protected]>
Reviewed-by: Wanpeng Li <[email protected]>
---
Documentation/sysctl/vm.txt | 13 ++-----------
mm/hugetlb.c | 17 ++++++-----------
2 files changed, 8 insertions(+), 22 deletions(-)

diff --git v3.11-rc1.orig/Documentation/sysctl/vm.txt v3.11-rc1/Documentation/sysctl/vm.txt
index 36ecc26..6e211a1 100644
--- v3.11-rc1.orig/Documentation/sysctl/vm.txt
+++ v3.11-rc1/Documentation/sysctl/vm.txt
@@ -200,17 +200,8 @@ fragmentation index is <= extfrag_threshold. The default value is 500.

hugepages_treat_as_movable

-This parameter is only useful when kernelcore= is specified at boot time to
-create ZONE_MOVABLE for pages that may be reclaimed or migrated. Huge pages
-are not movable so are not normally allocated from ZONE_MOVABLE. A non-zero
-value written to hugepages_treat_as_movable allows huge pages to be allocated
-from ZONE_MOVABLE.
-
-Once enabled, the ZONE_MOVABLE is treated as an area of memory the huge
-pages pool can easily grow or shrink within. Assuming that applications are
-not running that mlock() a lot of memory, it is likely the huge pages pool
-can grow to the size of ZONE_MOVABLE by repeatedly entering the desired value
-into nr_hugepages and triggering page reclaim.
+This parameter is obsolete and scheduled for removal. Its value has no effect
+on the kernel's behavior.

==============================================================

diff --git v3.11-rc1.orig/mm/hugetlb.c v3.11-rc1/mm/hugetlb.c
index 75bbdc0..30456e5 100644
--- v3.11-rc1.orig/mm/hugetlb.c
+++ v3.11-rc1/mm/hugetlb.c
@@ -34,7 +34,6 @@
#include "internal.h"

const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
-static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
unsigned long hugepages_treat_as_movable;

int hugetlb_max_hstate __read_mostly;
@@ -550,7 +549,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
retry_cpuset:
cpuset_mems_cookie = get_mems_allowed();
zonelist = huge_zonelist(vma, address,
- htlb_alloc_mask, &mpol, &nodemask);
+ GFP_HIGHUSER_MOVABLE, &mpol, &nodemask);
/*
* A child process with MAP_PRIVATE mappings created by their parent
* have no page reserves. This check ensures that reservations are
@@ -566,7 +565,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,

for_each_zone_zonelist_nodemask(zone, z, zonelist,
MAX_NR_ZONES - 1, nodemask) {
- if (cpuset_zone_allowed_softwall(zone, htlb_alloc_mask)) {
+ if (cpuset_zone_allowed_softwall(zone, GFP_HIGHUSER_MOVABLE)) {
page = dequeue_huge_page_node(h, zone_to_nid(zone));
if (page) {
if (!avoid_reserve)
@@ -723,7 +722,7 @@ static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid)
return NULL;

page = alloc_pages_exact_node(nid,
- htlb_alloc_mask|__GFP_COMP|__GFP_THISNODE|
+ GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_THISNODE|
__GFP_REPEAT|__GFP_NOWARN,
huge_page_order(h));
if (page) {
@@ -948,12 +947,12 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
spin_unlock(&hugetlb_lock);

if (nid == NUMA_NO_NODE)
- page = alloc_pages(htlb_alloc_mask|__GFP_COMP|
+ page = alloc_pages(GFP_HIGHUSER_MOVABLE|__GFP_COMP|
__GFP_REPEAT|__GFP_NOWARN,
huge_page_order(h));
else
page = alloc_pages_exact_node(nid,
- htlb_alloc_mask|__GFP_COMP|__GFP_THISNODE|
+ GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_THISNODE|
__GFP_REPEAT|__GFP_NOWARN, huge_page_order(h));

if (page && arch_prepare_hugepage(page)) {
@@ -2132,11 +2131,7 @@ int hugetlb_treat_movable_handler(struct ctl_table *table, int write,
void __user *buffer,
size_t *length, loff_t *ppos)
{
- proc_dointvec(table, write, buffer, length, ppos);
- if (hugepages_treat_as_movable)
- htlb_alloc_mask = GFP_HIGHUSER_MOVABLE;
- else
- htlb_alloc_mask = GFP_HIGHUSER;
+ WARN_ONCE(1, "This knob is obsolete and has no effect. It is scheduled for removal.\n");
return 0;
}

--
1.8.3.1

2013-07-25 04:56:12

by Naoya Horiguchi

Subject: [PATCH 3/8] migrate: add hugepage migration code to migrate_pages()

This patch extends check_range() to handle VMAs with VM_HUGETLB set.
Hugepages can then be migrated with migrate_pages(2) once the
enablement patch later in this series is applied.

Note that for larger hugepages (covered by pud entries, 1GB for
x86_64 for example), we simply skip it now.

Note that using pmd_huge/pud_huge assumes that hugepages are pointed to
by pmd/pud. This is not true in some architectures implementing hugepage
with other mechanisms like ia64, but it's OK because pmd_huge/pud_huge
simply return 0 in such arch and page walker simply ignores such hugepages.

ChangeLog v4:
- refactored check_hugetlb_pmd_range for better readability

ChangeLog v3:
- revert introducing migrate_movable_pages
- use isolate_huge_page

ChangeLog v2:
- remove unnecessary extern
- fix page table lock in check_hugetlb_pmd_range
- updated description and renamed patch title

Signed-off-by: Naoya Horiguchi <[email protected]>
Acked-by: Andi Kleen <[email protected]>
Reviewed-by: Wanpeng Li <[email protected]>
---
mm/mempolicy.c | 42 +++++++++++++++++++++++++++++++++++++-----
1 file changed, 37 insertions(+), 5 deletions(-)

diff --git v3.11-rc1.orig/mm/mempolicy.c v3.11-rc1/mm/mempolicy.c
index 7431001..d96afc1 100644
--- v3.11-rc1.orig/mm/mempolicy.c
+++ v3.11-rc1/mm/mempolicy.c
@@ -512,6 +512,30 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
return addr != end;
}

+static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd,
+ const nodemask_t *nodes, unsigned long flags,
+ void *private)
+{
+#ifdef CONFIG_HUGETLB_PAGE
+ int nid;
+ struct page *page;
+
+ spin_lock(&vma->vm_mm->page_table_lock);
+ page = pte_page(huge_ptep_get((pte_t *)pmd));
+ nid = page_to_nid(page);
+ if (node_isset(nid, *nodes) == !!(flags & MPOL_MF_INVERT))
+ goto unlock;
+ /* With MPOL_MF_MOVE, we migrate only unshared hugepage. */
+ if (flags & (MPOL_MF_MOVE_ALL) ||
+ (flags & MPOL_MF_MOVE && page_mapcount(page) == 1))
+ isolate_huge_page(page, private);
+unlock:
+ spin_unlock(&vma->vm_mm->page_table_lock);
+#else
+ BUG();
+#endif
+}
+
static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
unsigned long addr, unsigned long end,
const nodemask_t *nodes, unsigned long flags,
@@ -523,6 +547,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
pmd = pmd_offset(pud, addr);
do {
next = pmd_addr_end(addr, end);
+ if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) {
+ check_hugetlb_pmd_range(vma, pmd, nodes,
+ flags, private);
+ continue;
+ }
split_huge_page_pmd(vma, addr, pmd);
if (pmd_none_or_trans_huge_or_clear_bad(pmd))
continue;
@@ -544,6 +573,8 @@ static inline int check_pud_range(struct vm_area_struct *vma, pgd_t *pgd,
pud = pud_offset(pgd, addr);
do {
next = pud_addr_end(addr, end);
+ if (pud_huge(*pud) && is_vm_hugetlb_page(vma))
+ continue;
if (pud_none_or_clear_bad(pud))
continue;
if (check_pmd_range(vma, pud, addr, next, nodes,
@@ -635,9 +666,6 @@ check_range(struct mm_struct *mm, unsigned long start, unsigned long end,
return ERR_PTR(-EFAULT);
}

- if (is_vm_hugetlb_page(vma))
- goto next;
-
if (flags & MPOL_MF_LAZY) {
change_prot_numa(vma, start, endvma);
goto next;
@@ -986,7 +1014,11 @@ static void migrate_page_add(struct page *page, struct list_head *pagelist,

static struct page *new_node_page(struct page *page, unsigned long node, int **x)
{
- return alloc_pages_exact_node(node, GFP_HIGHUSER_MOVABLE, 0);
+ if (PageHuge(page))
+ return alloc_huge_page_node(page_hstate(compound_head(page)),
+ node);
+ else
+ return alloc_pages_exact_node(node, GFP_HIGHUSER_MOVABLE, 0);
}

/*
@@ -1016,7 +1048,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
err = migrate_pages(&pagelist, new_node_page, dest,
MIGRATE_SYNC, MR_SYSCALL);
if (err)
- putback_lru_pages(&pagelist);
+ putback_movable_pages(&pagelist);
}

return err;
--
1.8.3.1

2013-07-25 04:57:54

by Naoya Horiguchi

Subject: [PATCH 4/8] migrate: add hugepage migration code to move_pages()

This patch extends move_pages() to handle vmas with VM_HUGETLB set.
We will be able to migrate hugepages with move_pages(2) after
applying the enablement patch which comes later in this series.

We avoid taking a refcount on tail pages of a hugepage because, unlike thp,
hugepages are never split, so we need not care about races with splitting.

Migration of larger hugepages (1GB on x86_64) is not enabled.

ChangeLog v4:
- use get_page instead of get_page_foll
- add comment in follow_page_mask

ChangeLog v3:
- revert introducing migrate_movable_pages
- follow_page_mask(FOLL_GET) returns NULL for tail pages
- use isolate_huge_page

ChangeLog v2:
- updated description and renamed patch title

Signed-off-by: Naoya Horiguchi <[email protected]>
Acked-by: Andi Kleen <[email protected]>
Reviewed-by: Wanpeng Li <[email protected]>
---
mm/memory.c | 17 +++++++++++++++--
mm/migrate.c | 13 +++++++++++--
2 files changed, 26 insertions(+), 4 deletions(-)

diff --git v3.11-rc1.orig/mm/memory.c v3.11-rc1/mm/memory.c
index 1ce2e2a..7ec1252 100644
--- v3.11-rc1.orig/mm/memory.c
+++ v3.11-rc1/mm/memory.c
@@ -1496,7 +1496,8 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
if (pud_none(*pud))
goto no_page_table;
if (pud_huge(*pud) && vma->vm_flags & VM_HUGETLB) {
- BUG_ON(flags & FOLL_GET);
+ if (flags & FOLL_GET)
+ goto out;
page = follow_huge_pud(mm, address, pud, flags & FOLL_WRITE);
goto out;
}
@@ -1507,8 +1508,20 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
if (pmd_none(*pmd))
goto no_page_table;
if (pmd_huge(*pmd) && vma->vm_flags & VM_HUGETLB) {
- BUG_ON(flags & FOLL_GET);
page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE);
+ if (flags & FOLL_GET) {
+ /*
+ * Refcount on tail pages is not well-defined and
+ * shouldn't be taken. The caller should handle a NULL
+ * return when trying to follow tail pages.
+ */
+ if (PageHead(page))
+ get_page(page);
+ else {
+ page = NULL;
+ goto out;
+ }
+ }
goto out;
}
if ((flags & FOLL_NUMA) && pmd_numa(*pmd))
diff --git v3.11-rc1.orig/mm/migrate.c v3.11-rc1/mm/migrate.c
index 3ec47d3..d313737 100644
--- v3.11-rc1.orig/mm/migrate.c
+++ v3.11-rc1/mm/migrate.c
@@ -1092,7 +1092,11 @@ static struct page *new_page_node(struct page *p, unsigned long private,

*result = &pm->status;

- return alloc_pages_exact_node(pm->node,
+ if (PageHuge(p))
+ return alloc_huge_page_node(page_hstate(compound_head(p)),
+ pm->node);
+ else
+ return alloc_pages_exact_node(pm->node,
GFP_HIGHUSER_MOVABLE | GFP_THISNODE, 0);
}

@@ -1152,6 +1156,11 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
!migrate_all)
goto put_and_set;

+ if (PageHuge(page)) {
+ isolate_huge_page(page, &pagelist);
+ goto put_and_set;
+ }
+
err = isolate_lru_page(page);
if (!err) {
list_add_tail(&page->lru, &pagelist);
@@ -1174,7 +1183,7 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
err = migrate_pages(&pagelist, new_page_node,
(unsigned long)pm, MIGRATE_SYNC, MR_SYSCALL);
if (err)
- putback_lru_pages(&pagelist);
+ putback_movable_pages(&pagelist);
}

up_read(&mm->mmap_sem);
--
1.8.3.1

2013-07-25 06:06:34

by Hillf Danton

Subject: Re: [PATCH 1/8] migrate: make core migration code aware of hugepage

On Thu, Jul 25, 2013 at 12:54 PM, Naoya Horiguchi
<[email protected]> wrote:
> Before enabling each user of page migration to support hugepage,
> this patch enables the list of pages for migration to link not only
> LRU pages, but also hugepages. As a result, putback_movable_pages()
> and migrate_pages() can handle both of LRU pages and hugepages.
>
> ChangeLog v4:
> - make some macros return 'do {} while(0)'
> - use more readable variable name
>
> ChangeLog v3:
> - revert introducing migrate_movable_pages
> - add isolate_huge_page
>
> ChangeLog v2:
> - move code removing VM_HUGETLB from vma_migratable check into a
> separate patch
> - hold hugetlb_lock in putback_active_hugepage
> - update comment near the definition of hugetlb_lock
>
> Signed-off-by: Naoya Horiguchi <[email protected]>
> Acked-by: Andi Kleen <[email protected]>
> Reviewed-by: Wanpeng Li <[email protected]>
> ---
Acked-by: Hillf Danton <[email protected]>

> include/linux/hugetlb.h | 6 ++++++
> mm/hugetlb.c | 32 +++++++++++++++++++++++++++++++-
> mm/migrate.c | 10 +++++++++-
> 3 files changed, 46 insertions(+), 2 deletions(-)
>
> diff --git v3.11-rc1.orig/include/linux/hugetlb.h v3.11-rc1/include/linux/hugetlb.h
> index c2b1801..c7a14a4 100644
> --- v3.11-rc1.orig/include/linux/hugetlb.h
> +++ v3.11-rc1/include/linux/hugetlb.h
> @@ -66,6 +66,9 @@ int hugetlb_reserve_pages(struct inode *inode, long from, long to,
> vm_flags_t vm_flags);
> void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed);
> int dequeue_hwpoisoned_huge_page(struct page *page);
> +bool isolate_huge_page(struct page *page, struct list_head *list);
> +void putback_active_hugepage(struct page *page);
> +void putback_active_hugepages(struct list_head *list);
> void copy_huge_page(struct page *dst, struct page *src);
>
> #ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
> @@ -134,6 +137,9 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page)
> return 0;
> }
>
> +#define isolate_huge_page(p, l) false
> +#define putback_active_hugepage(p) do {} while (0)
> +#define putback_active_hugepages(l) do {} while (0)
> static inline void copy_huge_page(struct page *dst, struct page *src)
> {
> }
> diff --git v3.11-rc1.orig/mm/hugetlb.c v3.11-rc1/mm/hugetlb.c
> index 83aff0a..506d195 100644
> --- v3.11-rc1.orig/mm/hugetlb.c
> +++ v3.11-rc1/mm/hugetlb.c
> @@ -48,7 +48,8 @@ static unsigned long __initdata default_hstate_max_huge_pages;
> static unsigned long __initdata default_hstate_size;
>
> /*
> - * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages
> + * Protects updates to hugepage_freelists, hugepage_activelist, nr_huge_pages,
> + * free_huge_pages, and surplus_huge_pages.
> */
> DEFINE_SPINLOCK(hugetlb_lock);
>
> @@ -3431,3 +3432,32 @@ int dequeue_hwpoisoned_huge_page(struct page *hpage)
> return ret;
> }
> #endif
> +
> +bool isolate_huge_page(struct page *page, struct list_head *list)
> +{
> + VM_BUG_ON(!PageHead(page));
> + if (!get_page_unless_zero(page))
> + return false;
> + spin_lock(&hugetlb_lock);
> + list_move_tail(&page->lru, list);
> + spin_unlock(&hugetlb_lock);
> + return true;
> +}
> +
> +void putback_active_hugepage(struct page *page)
> +{
> + VM_BUG_ON(!PageHead(page));
> + spin_lock(&hugetlb_lock);
> + list_move_tail(&page->lru, &(page_hstate(page))->hugepage_activelist);
> + spin_unlock(&hugetlb_lock);
> + put_page(page);
> +}
> +
> +void putback_active_hugepages(struct list_head *list)
> +{
> + struct page *page;
> + struct page *page2;
> +
> + list_for_each_entry_safe(page, page2, list, lru)
> + putback_active_hugepage(page);
> +}
> diff --git v3.11-rc1.orig/mm/migrate.c v3.11-rc1/mm/migrate.c
> index 6f0c244..b44a067 100644
> --- v3.11-rc1.orig/mm/migrate.c
> +++ v3.11-rc1/mm/migrate.c
> @@ -100,6 +100,10 @@ void putback_movable_pages(struct list_head *l)
> struct page *page2;
>
> list_for_each_entry_safe(page, page2, l, lru) {
> + if (unlikely(PageHuge(page))) {
> + putback_active_hugepage(page);
> + continue;
> + }
> list_del(&page->lru);
> dec_zone_page_state(page, NR_ISOLATED_ANON +
> page_is_file_cache(page));
> @@ -1025,7 +1029,11 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
> list_for_each_entry_safe(page, page2, from, lru) {
> cond_resched();
>
> - rc = unmap_and_move(get_new_page, private,
> + if (PageHuge(page))
> + rc = unmap_and_move_huge_page(get_new_page,
> + private, page, pass > 2, mode);
> + else
> + rc = unmap_and_move(get_new_page, private,
> page, pass > 2, mode);
>
> switch(rc) {
> --
> 1.8.3.1
>

2013-07-25 06:09:18

by Hillf Danton

Subject: Re: [PATCH 2/8] soft-offline: use migrate_pages() instead of migrate_huge_page()

On Thu, Jul 25, 2013 at 12:54 PM, Naoya Horiguchi
<[email protected]> wrote:
> Currently migrate_huge_page() takes a pointer to a hugepage to be
> migrated as an argument, instead of taking a pointer to the list of
> hugepages to be migrated. This behavior was introduced in commit
> 189ebff28 ("hugetlb: simplify migrate_huge_page()"), and was OK
> because until now hugepage migration is enabled only for soft-offlining
> which migrates only one hugepage in a single call.
>
> But the situation will change in the later patches in this series
> which enable other users of page migration to support hugepage migration.
> They can kick migration for both of normal pages and hugepages
> in a single call, so we need to go back to original implementation
> which uses linked lists to collect the hugepages to be migrated.
>
> With this patch, soft_offline_huge_page() switches to use migrate_pages(),
> and migrate_huge_page() is not used any more. So let's remove it.
>
> ChangeLog v3:
> - Merged with another cleanup patch (4/10 in previous version)
>
> Signed-off-by: Naoya Horiguchi <[email protected]>
> Acked-by: Andi Kleen <[email protected]>
> Reviewed-by: Wanpeng Li <[email protected]>
> ---
Acked-by: Hillf Danton <[email protected]>

> include/linux/migrate.h | 5 -----
> mm/memory-failure.c | 15 ++++++++++++---
> mm/migrate.c | 28 ++--------------------------
> 3 files changed, 14 insertions(+), 34 deletions(-)
>
> diff --git v3.11-rc1.orig/include/linux/migrate.h v3.11-rc1/include/linux/migrate.h
> index a405d3dc..6fe5214 100644
> --- v3.11-rc1.orig/include/linux/migrate.h
> +++ v3.11-rc1/include/linux/migrate.h
> @@ -41,8 +41,6 @@ extern int migrate_page(struct address_space *,
> struct page *, struct page *, enum migrate_mode);
> extern int migrate_pages(struct list_head *l, new_page_t x,
> unsigned long private, enum migrate_mode mode, int reason);
> -extern int migrate_huge_page(struct page *, new_page_t x,
> - unsigned long private, enum migrate_mode mode);
>
> extern int fail_migrate_page(struct address_space *,
> struct page *, struct page *);
> @@ -62,9 +60,6 @@ static inline void putback_movable_pages(struct list_head *l) {}
> static inline int migrate_pages(struct list_head *l, new_page_t x,
> unsigned long private, enum migrate_mode mode, int reason)
> { return -ENOSYS; }
> -static inline int migrate_huge_page(struct page *page, new_page_t x,
> - unsigned long private, enum migrate_mode mode)
> - { return -ENOSYS; }
>
> static inline int migrate_prep(void) { return -ENOSYS; }
> static inline int migrate_prep_local(void) { return -ENOSYS; }
> diff --git v3.11-rc1.orig/mm/memory-failure.c v3.11-rc1/mm/memory-failure.c
> index 2c13aa7..af6f61c 100644
> --- v3.11-rc1.orig/mm/memory-failure.c
> +++ v3.11-rc1/mm/memory-failure.c
> @@ -1467,6 +1467,7 @@ static int soft_offline_huge_page(struct page *page, int flags)
> int ret;
> unsigned long pfn = page_to_pfn(page);
> struct page *hpage = compound_head(page);
> + LIST_HEAD(pagelist);
>
> /*
> * This double-check of PageHWPoison is to avoid the race with
> @@ -1482,12 +1483,20 @@ static int soft_offline_huge_page(struct page *page, int flags)
> unlock_page(hpage);
>
> /* Keep page count to indicate a given hugepage is isolated. */
> - ret = migrate_huge_page(hpage, new_page, MPOL_MF_MOVE_ALL,
> - MIGRATE_SYNC);
> - put_page(hpage);
> + list_move(&hpage->lru, &pagelist);
> + ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
> + MIGRATE_SYNC, MR_MEMORY_FAILURE);
> if (ret) {
> pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
> pfn, ret, page->flags);
> + /*
> + * We know that soft_offline_huge_page() tries to migrate
> + * only one hugepage pointed to by hpage, so we need not
> + * run through the pagelist here.
> + */
> + putback_active_hugepage(hpage);
> + if (ret > 0)
> + ret = -EIO;
> } else {
> set_page_hwpoison_huge_page(hpage);
> dequeue_hwpoisoned_huge_page(hpage);
> diff --git v3.11-rc1.orig/mm/migrate.c v3.11-rc1/mm/migrate.c
> index b44a067..3ec47d3 100644
> --- v3.11-rc1.orig/mm/migrate.c
> +++ v3.11-rc1/mm/migrate.c
> @@ -979,6 +979,8 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>
> unlock_page(hpage);
> out:
> + if (rc != -EAGAIN)
> + putback_active_hugepage(hpage);
> put_page(new_hpage);
> if (result) {
> if (rc)
> @@ -1066,32 +1068,6 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
> return rc;
> }
>
> -int migrate_huge_page(struct page *hpage, new_page_t get_new_page,
> - unsigned long private, enum migrate_mode mode)
> -{
> - int pass, rc;
> -
> - for (pass = 0; pass < 10; pass++) {
> - rc = unmap_and_move_huge_page(get_new_page, private,
> - hpage, pass > 2, mode);
> - switch (rc) {
> - case -ENOMEM:
> - goto out;
> - case -EAGAIN:
> - /* try again */
> - cond_resched();
> - break;
> - case MIGRATEPAGE_SUCCESS:
> - goto out;
> - default:
> - rc = -EIO;
> - goto out;
> - }
> - }
> -out:
> - return rc;
> -}
> -
> #ifdef CONFIG_NUMA
> /*
> * Move a list of individual pages
> --
> 1.8.3.1
>

2013-07-25 06:17:31

by Hillf Danton

Subject: Re: [PATCH 3/8] migrate: add hugepage migration code to migrate_pages()

On Thu, Jul 25, 2013 at 12:54 PM, Naoya Horiguchi
<[email protected]> wrote:
Acked-by: Hillf Danton <[email protected]>

2013-07-25 06:33:12

by Hillf Danton

Subject: Re: [PATCH 5/8] mbind: add hugepage migration code to mbind()

On Thu, Jul 25, 2013 at 12:55 PM, Naoya Horiguchi
<[email protected]> wrote:
> This patch extends do_mbind() to handle vma with VM_HUGETLB set.
> We will be able to migrate hugepage with mbind(2) after
> applying the enablement patch which comes later in this series.
>
> ChangeLog v3:
> - revert introducing migrate_movable_pages
> - added alloc_huge_page_noerr free from ERR_VALUE
>
> ChangeLog v2:
> - updated description and renamed patch title
>
> Signed-off-by: Naoya Horiguchi <[email protected]>
> Acked-by: Andi Kleen <[email protected]>
> Reviewed-by: Wanpeng Li <[email protected]>
> ---
Acked-by: Hillf Danton <[email protected]>

> include/linux/hugetlb.h | 3 +++
> mm/hugetlb.c | 14 ++++++++++++++
> mm/mempolicy.c | 4 +++-
> 3 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git v3.11-rc1.orig/include/linux/hugetlb.h v3.11-rc1/include/linux/hugetlb.h
> index c7a14a4..cae5539 100644
> --- v3.11-rc1.orig/include/linux/hugetlb.h
> +++ v3.11-rc1/include/linux/hugetlb.h
> @@ -267,6 +267,8 @@ struct huge_bootmem_page {
> };
>
> struct page *alloc_huge_page_node(struct hstate *h, int nid);
> +struct page *alloc_huge_page_noerr(struct vm_area_struct *vma,
> + unsigned long addr, int avoid_reserve);
>
> /* arch callback */
> int __init alloc_bootmem_huge_page(struct hstate *h);
> @@ -380,6 +382,7 @@ static inline pgoff_t basepage_index(struct page *page)
> #else /* CONFIG_HUGETLB_PAGE */
> struct hstate {};
> #define alloc_huge_page_node(h, nid) NULL
> +#define alloc_huge_page_noerr(v, a, r) NULL
> #define alloc_bootmem_huge_page(h) NULL
> #define hstate_file(f) NULL
> #define hstate_sizelog(s) NULL
> diff --git v3.11-rc1.orig/mm/hugetlb.c v3.11-rc1/mm/hugetlb.c
> index 506d195..f6d8d67 100644
> --- v3.11-rc1.orig/mm/hugetlb.c
> +++ v3.11-rc1/mm/hugetlb.c
> @@ -1195,6 +1195,20 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
> return page;
> }
>
> +/*
> + * alloc_huge_page()'s wrapper which simply returns the page if allocation
> + * succeeds, otherwise NULL. This function is called from new_vma_page(),
> + * where no ERR_VALUE is expected to be returned.
> + */
> +struct page *alloc_huge_page_noerr(struct vm_area_struct *vma,
> + unsigned long addr, int avoid_reserve)
> +{
> + struct page *page = alloc_huge_page(vma, addr, avoid_reserve);
> + if (IS_ERR(page))
> + page = NULL;
> + return page;
> +}
> +
> int __weak alloc_bootmem_huge_page(struct hstate *h)
> {
> struct huge_bootmem_page *m;
> diff --git v3.11-rc1.orig/mm/mempolicy.c v3.11-rc1/mm/mempolicy.c
> index d96afc1..4a03c14 100644
> --- v3.11-rc1.orig/mm/mempolicy.c
> +++ v3.11-rc1/mm/mempolicy.c
> @@ -1183,6 +1183,8 @@ static struct page *new_vma_page(struct page *page, unsigned long private, int *
> vma = vma->vm_next;
> }
>
> + if (PageHuge(page))
> + return alloc_huge_page_noerr(vma, address, 1);
> /*
> * if !vma, alloc_page_vma() will use task or system default policy
> */
> @@ -1293,7 +1295,7 @@ static long do_mbind(unsigned long start, unsigned long len,
> (unsigned long)vma,
> MIGRATE_SYNC, MR_MEMPOLICY_MBIND);
> if (nr_failed)
> - putback_lru_pages(&pagelist);
> + putback_movable_pages(&pagelist);
> }
>
> if (nr_failed && (flags & MPOL_MF_STRICT))
> --
> 1.8.3.1
>

2013-07-30 18:28:37

by Aneesh Kumar K.V

Subject: Re: [PATCH 1/8] migrate: make core migration code aware of hugepage

Naoya Horiguchi <[email protected]> writes:

> Before enabling each user of page migration to support hugepage,
> this patch enables the list of pages for migration to link not only
> LRU pages, but also hugepages. As a result, putback_movable_pages()
> and migrate_pages() can handle both of LRU pages and hugepages.
>
> ChangeLog v4:
> - make some macros return 'do {} while(0)'
> - use more readable variable name
>
> ChangeLog v3:
> - revert introducing migrate_movable_pages
> - add isolate_huge_page
>
> ChangeLog v2:
> - move code removing VM_HUGETLB from vma_migratable check into a
> separate patch
> - hold hugetlb_lock in putback_active_hugepage
> - update comment near the definition of hugetlb_lock
>
> Signed-off-by: Naoya Horiguchi <[email protected]>
> Acked-by: Andi Kleen <[email protected]>
> Reviewed-by: Wanpeng Li <[email protected]>
> ---
> include/linux/hugetlb.h | 6 ++++++
> mm/hugetlb.c | 32 +++++++++++++++++++++++++++++++-
> mm/migrate.c | 10 +++++++++-
> 3 files changed, 46 insertions(+), 2 deletions(-)
>
> diff --git v3.11-rc1.orig/include/linux/hugetlb.h v3.11-rc1/include/linux/hugetlb.h
> index c2b1801..c7a14a4 100644
> --- v3.11-rc1.orig/include/linux/hugetlb.h
> +++ v3.11-rc1/include/linux/hugetlb.h
> @@ -66,6 +66,9 @@ int hugetlb_reserve_pages(struct inode *inode, long from, long to,
> vm_flags_t vm_flags);
> void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed);
> int dequeue_hwpoisoned_huge_page(struct page *page);
> +bool isolate_huge_page(struct page *page, struct list_head *list);
> +void putback_active_hugepage(struct page *page);
> +void putback_active_hugepages(struct list_head *list);

Are we using putback_active_hugepages() anywhere in this patch series?


> void copy_huge_page(struct page *dst, struct page *src);
>
> #ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
> @@ -134,6 +137,9 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page)
> return 0;
> }

-aneesh

2013-07-30 18:32:40

by Aneesh Kumar K.V

Subject: Re: [PATCH 8/8] prepare to remove /proc/sys/vm/hugepages_treat_as_movable

Naoya Horiguchi <[email protected]> writes:

> Now hugepages are definitely movable. So allocating hugepages from
> ZONE_MOVABLE is natural and we have no reason to keep this parameter.
> In order to allow userspace to prepare for the removal, let's leave
> this sysctl handler as noop for a while.

I guess you still need to handle architectures for which pmd_huge is

int pmd_huge(pmd_t pmd)
{
return 0;
}

Embedded powerpc is one. It doesn't store pte information at the PMD
level; instead, the pmd contains a pointer to a hugepage directory,
which contains the huge ptes.

-aneesh

2013-07-30 18:49:16

by Naoya Horiguchi

Subject: Re: [PATCH 1/8] migrate: make core migration code aware of hugepage

On Tue, Jul 30, 2013 at 11:58:27PM +0530, Aneesh Kumar K.V wrote:
> Naoya Horiguchi <[email protected]> writes:
>
> > Before enabling each user of page migration to support hugepage,
> > this patch enables the list of pages for migration to link not only
> > LRU pages, but also hugepages. As a result, putback_movable_pages()
> > and migrate_pages() can handle both of LRU pages and hugepages.
> >
> > ChangeLog v4:
> > - make some macros return 'do {} while(0)'
> > - use more readable variable name
> >
> > ChangeLog v3:
> > - revert introducing migrate_movable_pages
> > - add isolate_huge_page
> >
> > ChangeLog v2:
> > - move code removing VM_HUGETLB from vma_migratable check into a
> > separate patch
> > - hold hugetlb_lock in putback_active_hugepage
> > - update comment near the definition of hugetlb_lock
> >
> > Signed-off-by: Naoya Horiguchi <[email protected]>
> > Acked-by: Andi Kleen <[email protected]>
> > Reviewed-by: Wanpeng Li <[email protected]>
> > ---
> > include/linux/hugetlb.h | 6 ++++++
> > mm/hugetlb.c | 32 +++++++++++++++++++++++++++++++-
> > mm/migrate.c | 10 +++++++++-
> > 3 files changed, 46 insertions(+), 2 deletions(-)
> >
> > diff --git v3.11-rc1.orig/include/linux/hugetlb.h v3.11-rc1/include/linux/hugetlb.h
> > index c2b1801..c7a14a4 100644
> > --- v3.11-rc1.orig/include/linux/hugetlb.h
> > +++ v3.11-rc1/include/linux/hugetlb.h
> > @@ -66,6 +66,9 @@ int hugetlb_reserve_pages(struct inode *inode, long from, long to,
> > vm_flags_t vm_flags);
> > void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed);
> > int dequeue_hwpoisoned_huge_page(struct page *page);
> > +bool isolate_huge_page(struct page *page, struct list_head *list);
> > +void putback_active_hugepage(struct page *page);
> > +void putback_active_hugepages(struct list_head *list);
>
> Are we using putback_active_hugepages() anywhere in this patch series?

This function has no user, so it shouldn't be added.
I forgot to clean it up when changing the code.
Thanks for pointing it out.

Naoya

>
> > void copy_huge_page(struct page *dst, struct page *src);
> >
> > #ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
> > @@ -134,6 +137,9 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page)
> > return 0;
> > }
>
> -aneesh

2013-07-31 20:24:44

by Naoya Horiguchi

Subject: Re: [PATCH 8/8] prepare to remove /proc/sys/vm/hugepages_treat_as_movable

On Wed, Jul 31, 2013 at 12:02:30AM +0530, Aneesh Kumar K.V wrote:
> Naoya Horiguchi <[email protected]> writes:
>
> > Now hugepages are definitely movable. So allocating hugepages from
> > ZONE_MOVABLE is natural and we have no reason to keep this parameter.
> > In order to allow userspace to prepare for the removal, let's leave
> > this sysctl handler as noop for a while.
>
> I guess you still need to handle architectures for which pmd_huge is
>
> int pmd_huge(pmd_t pmd)
> {
> return 0;
> }
>
> embedded powerpc is one. They don't store pte information at the PMD
> level. Instead pmd contains a pointer to hugepage directory which
> contain huge pte.

It seems that this comment is for the whole series, not just for this
patch, right?

Some users of hugepage migration (mbind, move_pages, migrate_pages)
walk page tables to collect the hugepages to be migrated; on such
architectures those hugepages are simply ignored because pmd_huge()
returns 0. So there is no problem for these users.

But the other users (soft offline, memory hot-remove) choose hugepages
to be migrated by pfn and never check pmd_huge().
As you wrote, this can be problematic on such architectures.
So I'm thinking of adding a pmd_huge() check somewhere (in
unmap_and_move_huge_page(), for example) to make migration fail on
such architectures.

Thanks,
Naoya Horiguchi

2013-08-01 06:00:04

by Aneesh Kumar K.V

Subject: Re: [PATCH 8/8] prepare to remove /proc/sys/vm/hugepages_treat_as_movable

Naoya Horiguchi <[email protected]> writes:

> On Wed, Jul 31, 2013 at 12:02:30AM +0530, Aneesh Kumar K.V wrote:
>> Naoya Horiguchi <[email protected]> writes:
>>
>> > Now hugepages are definitely movable. So allocating hugepages from
>> > ZONE_MOVABLE is natural and we have no reason to keep this parameter.
>> > In order to allow userspace to prepare for the removal, let's leave
>> > this sysctl handler as noop for a while.
>>
>> I guess you still need to handle architectures for which pmd_huge is
>>
>> int pmd_huge(pmd_t pmd)
>> {
>> return 0;
>> }
>>
>> Embedded powerpc is one. They don't store pte information at the PMD
>> level. Instead, the pmd contains a pointer to a hugepage directory,
>> which contains the huge ptes.
>
> It seems that this comment is for the whole series, not just for this
> patch, right?
>
> Some users of hugepage migration (mbind, move_pages, migrate_pages)
> walk the page tables to collect the hugepages to be migrated, and on
> such architectures those hugepages are simply skipped because
> pmd_huge() returns 0. So there is no problem for these users.
>
> But the other users (soft offline, memory hotremove) choose the
> hugepages to migrate by pfn and never check pmd_huge(). As you wrote,
> this can be a problem on such architectures. So I'm thinking of adding
> a pmd_huge() check somewhere (in unmap_and_move_huge_page(), for
> example) to make migration fail on them.

Considering that we have architectures that won't support migrating
explicit hugepages with this patch series, is it ok to use
GFP_HIGHUSER_MOVABLE for hugepage allocation?

-aneesh

2013-08-02 02:43:53

by Naoya Horiguchi

Subject: Re: [PATCH 8/8] prepare to remove /proc/sys/vm/hugepages_treat_as_movable

On Thu, Aug 01, 2013 at 11:29:39AM +0530, Aneesh Kumar K.V wrote:
> Naoya Horiguchi <[email protected]> writes:
>
> > On Wed, Jul 31, 2013 at 12:02:30AM +0530, Aneesh Kumar K.V wrote:
> >> Naoya Horiguchi <[email protected]> writes:
> >>
> >> > Now hugepages are definitely movable. So allocating hugepages from
> >> > ZONE_MOVABLE is natural and we have no reason to keep this parameter.
> >> > In order to allow userspace to prepare for the removal, let's leave
> >> > this sysctl handler as noop for a while.
> >>
> >> I guess you still need to handle architectures for which pmd_huge is
> >>
> >> int pmd_huge(pmd_t pmd)
> >> {
> >> 	return 0;
> >> }
> >>
> >> Embedded powerpc is one. They don't store pte information at the PMD
> >> level. Instead, the pmd contains a pointer to a hugepage directory,
> >> which contains the huge ptes.
> >
> > It seems that this comment is for the whole series, not just for this
> > patch, right?
> >
> > Some users of hugepage migration (mbind, move_pages, migrate_pages)
> > walk the page tables to collect the hugepages to be migrated, and on
> > such architectures those hugepages are simply skipped because
> > pmd_huge() returns 0. So there is no problem for these users.
> >
> > But the other users (soft offline, memory hotremove) choose the
> > hugepages to migrate by pfn and never check pmd_huge(). As you wrote,
> > this can be a problem on such architectures. So I'm thinking of adding
> > a pmd_huge() check somewhere (in unmap_and_move_huge_page(), for
> > example) to make migration fail on them.
>
> Considering that we have architectures that won't support migrating
> explicit hugepages with this patch series, is it ok to use
> GFP_HIGHUSER_MOVABLE for hugepage allocation?

Originally this parameter was introduced to allow the hugepage pool to be
allocated from ZONE_MOVABLE. The benefit is that the pool can be grown more
easily, because external fragmentation is less likely in ZONE_MOVABLE than
in other zone types: fragmented pages there can be rearranged by page
migration/reclaim.

So I think allocating hugepages from ZONE_MOVABLE by default makes sense
even on architectures that don't support hugepage migration.
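A minimal userspace model of the mask choice being discussed. It assumes, from my reading of mm/hugetlb.c of this period rather than from this thread, that the pool allocation mask defaulted to GFP_HIGHUSER and that the hugepages_treat_as_movable handler switched it to GFP_HIGHUSER_MOVABLE. The MODEL_* values and function names are hypothetical; the sketch only contrasts that behaviour with the default proposed in patch 8/8, where the sysctl is a no-op and the pool always comes from ZONE_MOVABLE.

```c
/*
 * Userspace model, not kernel code. The MODEL_* bit values are made up;
 * only the names echo the kernel's GFP_HIGHUSER and __GFP_MOVABLE.
 */
#include <assert.h>
#include <stdbool.h>

typedef unsigned int model_gfp_t;

#define MODEL_GFP_HIGHUSER         0x1u
#define MODEL___GFP_MOVABLE        0x2u
#define MODEL_GFP_HIGHUSER_MOVABLE (MODEL_GFP_HIGHUSER | MODEL___GFP_MOVABLE)

/* Before the series (assumed): the sysctl selects the pool's mask. */
static inline model_gfp_t mask_before(bool treat_as_movable)
{
	return treat_as_movable ? MODEL_GFP_HIGHUSER_MOVABLE : MODEL_GFP_HIGHUSER;
}

/* After patch 8/8: the sysctl value is accepted but ignored. */
static inline model_gfp_t mask_after(bool treat_as_movable)
{
	(void)treat_as_movable;
	return MODEL_GFP_HIGHUSER_MOVABLE;
}

/* Only allocations carrying the movable flag may be placed in ZONE_MOVABLE. */
static inline bool may_use_zone_movable(model_gfp_t mask)
{
	return (mask & MODEL___GFP_MOVABLE) != 0;
}
```

In this model, mask_after() makes the pool movable regardless of whether the arch can actually migrate hugepages, which is the case Aneesh asks about.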

Thanks,
Naoya Horiguchi

2013-08-05 20:28:15

by Naoya Horiguchi

Subject: [PATCH 9/8] hugetlb: add pmd_huge_support() to migrate only pmd-based hugepage

This patch is motivated by the discussion with Aneesh about the "extend
hugepage migration" patchset.
http://thread.gmane.org/gmane.linux.kernel.mm/103933/focus=104391
I'll append this to the patchset in the next post, but before that
I'd like this patch to be reviewed (I don't want to repost the
whole set just for minor changes.)

Any comments?

Thanks,
Naoya Horiguchi
---
From: Naoya Horiguchi <[email protected]>
Date: Mon, 5 Aug 2013 13:33:02 -0400
Subject: [PATCH] hugetlb: add pmd_huge_support() to migrate only pmd-based
hugepage

Currently hugepage migration works well only for pmd-based hugepages,
because core routines of hugepage migration use pmd-specific internal
functions like huge_pte_offset(). So we should not enable the migration
of other levels of hugepages until we are ready for it.

Some users of hugepage migration (mbind, move_pages, and migrate_pages)
do a page table walk and check pud/pmd_huge() there, so they are safe.
But the other users (soft offline and memory hotremove) don't do this,
so they can try to migrate unexpected types of hugepages.

To prevent this, we introduce an architecture-dependent check of whether
hugepages are implemented on a pmd basis. It returns 0 if pmd_huge()
always returns 0, and 1 otherwise.

Signed-off-by: Naoya Horiguchi <[email protected]>
---
arch/arm/mm/hugetlbpage.c | 5 +++++
arch/arm64/mm/hugetlbpage.c | 5 +++++
arch/ia64/mm/hugetlbpage.c | 5 +++++
arch/metag/mm/hugetlbpage.c | 5 +++++
arch/mips/mm/hugetlbpage.c | 5 +++++
arch/powerpc/mm/hugetlbpage.c | 10 ++++++++++
arch/s390/mm/hugetlbpage.c | 5 +++++
arch/sh/mm/hugetlbpage.c | 5 +++++
arch/sparc/mm/hugetlbpage.c | 5 +++++
arch/tile/mm/hugetlbpage.c | 5 +++++
arch/x86/mm/hugetlbpage.c | 8 ++++++++
include/linux/hugetlb.h | 2 ++
mm/migrate.c | 11 +++++++++++
13 files changed, 76 insertions(+)

diff --git a/arch/arm/mm/hugetlbpage.c b/arch/arm/mm/hugetlbpage.c
index 3d1e4a2..3f3b6a7 100644
--- a/arch/arm/mm/hugetlbpage.c
+++ b/arch/arm/mm/hugetlbpage.c
@@ -99,3 +99,8 @@ int pmd_huge(pmd_t pmd)
{
return pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
}
+
+int pmd_huge_support(void)
+{
+ return 1;
+}
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 2fc8258..5e9aec3 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -54,6 +54,11 @@ int pud_huge(pud_t pud)
return !(pud_val(pud) & PUD_TABLE_BIT);
}

+int pmd_huge_support(void)
+{
+ return 1;
+}
+
static __init int setup_hugepagesz(char *opt)
{
unsigned long ps = memparse(opt, &opt);
diff --git a/arch/ia64/mm/hugetlbpage.c b/arch/ia64/mm/hugetlbpage.c
index 76069c1..68232db 100644
--- a/arch/ia64/mm/hugetlbpage.c
+++ b/arch/ia64/mm/hugetlbpage.c
@@ -114,6 +114,11 @@ int pud_huge(pud_t pud)
return 0;
}

+int pmd_huge_support(void)
+{
+ return 0;
+}
+
struct page *
follow_huge_pmd(struct mm_struct *mm, unsigned long address, pmd_t *pmd, int write)
{
diff --git a/arch/metag/mm/hugetlbpage.c b/arch/metag/mm/hugetlbpage.c
index 3c52fa6..0424315 100644
--- a/arch/metag/mm/hugetlbpage.c
+++ b/arch/metag/mm/hugetlbpage.c
@@ -110,6 +110,11 @@ int pud_huge(pud_t pud)
return 0;
}

+int pmd_huge_support(void)
+{
+ return 1;
+}
+
struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
pmd_t *pmd, int write)
{
diff --git a/arch/mips/mm/hugetlbpage.c b/arch/mips/mm/hugetlbpage.c
index a7fee0d..01fda44 100644
--- a/arch/mips/mm/hugetlbpage.c
+++ b/arch/mips/mm/hugetlbpage.c
@@ -85,6 +85,11 @@ int pud_huge(pud_t pud)
return (pud_val(pud) & _PAGE_HUGE) != 0;
}

+int pmd_huge_support(void)
+{
+ return 1;
+}
+
struct page *
follow_huge_pmd(struct mm_struct *mm, unsigned long address,
pmd_t *pmd, int write)
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 834ca8e..d67db4b 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -86,6 +86,11 @@ int pgd_huge(pgd_t pgd)
*/
return ((pgd_val(pgd) & 0x3) != 0x0);
}
+
+int pmd_huge_support(void)
+{
+ return 1;
+}
#else
int pmd_huge(pmd_t pmd)
{
@@ -101,6 +106,11 @@ int pgd_huge(pgd_t pgd)
{
return 0;
}
+
+int pmd_huge_support(void)
+{
+ return 0;
+}
#endif

pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index 121089d..951ee25 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -117,6 +117,11 @@ int pud_huge(pud_t pud)
return 0;
}

+int pmd_huge_support(void)
+{
+ return 1;
+}
+
struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
pmd_t *pmdp, int write)
{
diff --git a/arch/sh/mm/hugetlbpage.c b/arch/sh/mm/hugetlbpage.c
index d776234..0d676a4 100644
--- a/arch/sh/mm/hugetlbpage.c
+++ b/arch/sh/mm/hugetlbpage.c
@@ -83,6 +83,11 @@ int pud_huge(pud_t pud)
return 0;
}

+int pmd_huge_support(void)
+{
+ return 0;
+}
+
struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
pmd_t *pmd, int write)
{
diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index d2b5944..9639964 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -234,6 +234,11 @@ int pud_huge(pud_t pud)
return 0;
}

+int pmd_huge_support(void)
+{
+ return 0;
+}
+
struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
pmd_t *pmd, int write)
{
diff --git a/arch/tile/mm/hugetlbpage.c b/arch/tile/mm/hugetlbpage.c
index 650ccff..0ac3599 100644
--- a/arch/tile/mm/hugetlbpage.c
+++ b/arch/tile/mm/hugetlbpage.c
@@ -198,6 +198,11 @@ int pud_huge(pud_t pud)
return !!(pud_val(pud) & _PAGE_HUGE_PAGE);
}

+int pmd_huge_support(void)
+{
+ return 1;
+}
+
struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
pmd_t *pmd, int write)
{
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 7e73e8c..9d980d8 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -59,6 +59,10 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address,
return NULL;
}

+int pmd_huge_support(void)
+{
+ return 0;
+}
#else

struct page *
@@ -77,6 +81,10 @@ int pud_huge(pud_t pud)
return !!(pud_val(pud) & _PAGE_PSE);
}

+int pmd_huge_support(void)
+{
+ return 1;
+}
#endif

/* x86_64 also uses this file */
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 2e02c4e..115b553 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -94,6 +94,7 @@ struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
pud_t *pud, int write);
int pmd_huge(pmd_t pmd);
int pud_huge(pud_t pmd);
+int pmd_huge_support(void);
unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
unsigned long address, unsigned long end, pgprot_t newprot);

@@ -128,6 +129,7 @@ static inline void hugetlb_show_meminfo(void)
#define prepare_hugepage_range(file, addr, len) (-EINVAL)
#define pmd_huge(x) 0
#define pud_huge(x) 0
+#define pmd_huge_support() 0
#define is_hugepage_only_range(mm, addr, len) 0
#define hugetlb_free_pgd_range(tlb, addr, end, floor, ceiling) ({BUG(); 0; })
#define hugetlb_fault(mm, vma, addr, flags) ({ BUG(); 0; })
diff --git a/mm/migrate.c b/mm/migrate.c
index d313737..7082e30 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -949,6 +949,17 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
struct page *new_hpage = get_new_page(hpage, private, &result);
struct anon_vma *anon_vma = NULL;

+ /*
+ * This restriction ensures that only pmd-based hugepages can migrate,
+ * because migration of other types of hugepages is neither completely
+ * implemented nor tested. Some callers of hugepage migration, like
+ * soft offline and memory hotremove, don't walk through page tables
+ * before kicking migration, so we need this check to prevent hugepage
+ * migration on architectures with non-pmd-based hugepages.
+ */
+ if (!pmd_huge_support())
+ return -ENOSYS;
+
if (!new_hpage)
return -ENOMEM;

--
1.8.3.1
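A userspace sketch of the gate added above, assuming the per-arch answer is a constant, as in the hugetlbpage.c hunks. struct model_page, arch_has_pmd_hugepages and alloc_dst() are hypothetical stand-ins for struct page, the per-arch value and get_new_page(). One deliberate difference: in the mm/migrate.c hunk, get_new_page() runs before the pmd_huge_support() check, so the -ENOSYS return appears to leave new_hpage unreleased; this sketch checks first, so nothing is allocated on that path.

```c
/*
 * Userspace model of the -ENOSYS gate; not kernel code. All names are
 * hypothetical stand-ins for the kernel objects named in the patch.
 */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct model_page { int nid; };

/* Per-arch constant, as each hugetlbpage.c hardcodes 0 or 1. */
static int arch_has_pmd_hugepages = 1;

static int model_pmd_huge_support(void)
{
	return arch_has_pmd_hugepages;
}

/* Stand-in for get_new_page(): allocate a destination on the same node. */
static struct model_page *alloc_dst(const struct model_page *src, int *allocs)
{
	struct model_page *p = malloc(sizeof(*p));

	if (p) {
		p->nid = src->nid;
		(*allocs)++;
	}
	return p;
}

static int model_unmap_and_move_huge_page(const struct model_page *hpage,
					  int *allocs, struct model_page **out)
{
	struct model_page *new_hpage;

	assert(hpage != NULL && out != NULL);

	/* Refuse before allocating, so -ENOSYS leaves nothing to free. */
	if (!model_pmd_huge_support())
		return -ENOSYS;

	new_hpage = alloc_dst(hpage, allocs);
	if (!new_hpage)
		return -ENOMEM;

	/* The real code would unmap, copy and remap here. */
	*out = new_hpage;
	return 0;
}
```

Callers such as soft offline or memory hotremove would then see -ENOSYS on archs without pmd-based hugepages instead of attempting an unsupported migration.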

2013-08-06 01:52:09

by Aneesh Kumar K.V

Subject: Re: [PATCH 8/8] prepare to remove /proc/sys/vm/hugepages_treat_as_movable

Naoya Horiguchi <[email protected]> writes:


>>
>> Considering that we have architectures that won't support migrating
>> explicit hugepages with this patch series, is it ok to use
>> GFP_HIGHUSER_MOVABLE for hugepage allocation?
>
> Originally this parameter was introduced to allow the hugepage pool to be
> allocated from ZONE_MOVABLE. The benefit is that the pool can be grown more
> easily, because external fragmentation is less likely in ZONE_MOVABLE than
> in other zone types: fragmented pages there can be rearranged by page
> migration/reclaim.
>
> So I think allocating hugepages from ZONE_MOVABLE by default makes sense
> even on architectures that don't support hugepage migration.

But allocating hugepages from ZONE_MOVABLE means we have pages in that
zone which we can't migrate. Doesn't that impact other features like
hotplug ?


-aneesh

2013-08-06 01:56:22

by Aneesh Kumar K.V

Subject: Re: [PATCH 9/8] hugetlb: add pmd_huge_support() to migrate only pmd-based hugepage

Naoya Horiguchi <[email protected]> writes:

> This patch is motivated by the discussion with Aneesh about the "extend
> hugepage migration" patchset.
> http://thread.gmane.org/gmane.linux.kernel.mm/103933/focus=104391
> I'll append this to the patchset in the next post, but before that
> I'd like this patch to be reviewed (I don't want to repost the
> whole set just for minor changes.)
>
> Any comments?
>
> Thanks,
> Naoya Horiguchi
> ---
> From: Naoya Horiguchi <[email protected]>
> Date: Mon, 5 Aug 2013 13:33:02 -0400
> Subject: [PATCH] hugetlb: add pmd_huge_support() to migrate only pmd-based
> hugepage
>
> Currently hugepage migration works well only for pmd-based hugepages,
> because core routines of hugepage migration use pmd-specific internal
> functions like huge_pte_offset(). So we should not enable the migration
> of other levels of hugepages until we are ready for it.

I guess huge_pte_offset() may not be the right reason, because archs do
implement huge_pte_offset() even when their hugepages are not pmd-based:

pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
{
	/* Only called for hugetlbfs pages, hence can ignore THP */
	return find_linux_pte_or_hugepte(mm->pgd, addr, NULL);
}

>
> Some users of hugepage migration (mbind, move_pages, and migrate_pages)
> do a page table walk and check pud/pmd_huge() there, so they are safe.
> But the other users (soft offline and memory hotremove) don't do this,
> so they can try to migrate unexpected types of hugepages.
>
> To prevent this, we introduce an architecture-dependent check of whether
> hugepages are implemented on a pmd basis. It returns 0 if pmd_huge()
> always returns 0, and 1 otherwise.
>

So why not #define pmd_huge_support pmd_huge, or use pmd_huge() directly?

> Signed-off-by: Naoya Horiguchi <[email protected]>
> ---
> arch/arm/mm/hugetlbpage.c | 5 +++++
> arch/arm64/mm/hugetlbpage.c | 5 +++++
> arch/ia64/mm/hugetlbpage.c | 5 +++++
> arch/metag/mm/hugetlbpage.c | 5 +++++
> arch/mips/mm/hugetlbpage.c | 5 +++++
> arch/powerpc/mm/hugetlbpage.c | 10 ++++++++++
> arch/s390/mm/hugetlbpage.c | 5 +++++
> arch/sh/mm/hugetlbpage.c | 5 +++++
> arch/sparc/mm/hugetlbpage.c | 5 +++++
> arch/tile/mm/hugetlbpage.c | 5 +++++
> arch/x86/mm/hugetlbpage.c | 8 ++++++++
> include/linux/hugetlb.h | 2 ++
> mm/migrate.c | 11 +++++++++++
> 13 files changed, 76 insertions(+)
>
> diff --git a/arch/arm/mm/hugetlbpage.c b/arch/arm/mm/hugetlbpage.c
> index 3d1e4a2..3f3b6a7 100644
> --- a/arch/arm/mm/hugetlbpage.c
> +++ b/arch/arm/mm/hugetlbpage.c
> @@ -99,3 +99,8 @@ int pmd_huge(pmd_t pmd)
> {
> return pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
> }
> +
> +int pmd_huge_support(void)
> +{
> + return 1;
> +}

-aneesh

2013-08-06 04:30:59

by Naoya Horiguchi

Subject: Re: [PATCH 8/8] prepare to remove /proc/sys/vm/hugepages_treat_as_movable

On Tue, Aug 06, 2013 at 07:22:02AM +0530, Aneesh Kumar K.V wrote:
> Naoya Horiguchi <[email protected]> writes:
> >>
> >> Considering that we have architectures that won't support migrating
> >> explicit hugepages with this patch series, is it ok to use
> >> GFP_HIGHUSER_MOVABLE for hugepage allocation?
> >
> > Originally this parameter was introduced to allow the hugepage pool to be
> > allocated from ZONE_MOVABLE. The benefit is that the pool can be grown more
> > easily, because external fragmentation is less likely in ZONE_MOVABLE than
> > in other zone types: fragmented pages there can be rearranged by page
> > migration/reclaim.
> >
> > So I think allocating hugepages from ZONE_MOVABLE by default makes sense
> > even on architectures that don't support hugepage migration.
>
> But allocating hugepages from ZONE_MOVABLE means we have pages in that
> zone which we can't migrate. Doesn't that impact other features like
> hotplug ?

Memory blocks occupied by hugepages were not removable before this patchset,
whether they came from ZONE_MOVABLE or not, and hugepage users have accepted
that so far. So I think this change doesn't make things worse than they are now.

It may be preferable to switch the __GFP_MOVABLE flag on or off depending on
the arch instead of using the tunable parameter. I'm OK with that direction,
but I want to do it as separate work.
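A hypothetical sketch of that direction, as a userspace model: request __GFP_MOVABLE only when the arch can migrate hugepages (for instance, using the pmd_huge_support() answer), so that unmovable hugepages do not end up in ZONE_MOVABLE, which is the hotplug concern raised above. The flag values and the function name are made up for illustration; this is not proposed kernel code.

```c
/*
 * Hypothetical userspace model of an arch-dependent hugepage pool mask.
 * Flag values are made up; not kernel code.
 */
#include <assert.h>
#include <stdbool.h>

typedef unsigned int model_gfp_t;

#define MODEL_GFP_HIGHUSER  0x1u
#define MODEL___GFP_MOVABLE 0x2u

/*
 * Ask for ZONE_MOVABLE only when the arch can later migrate the hugepage
 * out again, so memory hotremove is not blocked by unmovable pages
 * sitting in ZONE_MOVABLE.
 */
static inline model_gfp_t htlb_mask_for_arch(bool arch_can_migrate_hugepages)
{
	model_gfp_t mask = MODEL_GFP_HIGHUSER;

	if (arch_can_migrate_hugepages)
		mask |= MODEL___GFP_MOVABLE;
	return mask;
}
```

Compared with a global sysctl, this ties the placement decision to what the arch can actually do, at the cost of losing the easier pool growth on archs that cannot migrate hugepages.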

Thanks,
Naoya Horiguchi

2013-08-06 04:48:42

by Naoya Horiguchi

Subject: Re: [PATCH 9/8] hugetlb: add pmd_huge_support() to migrate only pmd-based hugepage

On Tue, Aug 06, 2013 at 07:26:10AM +0530, Aneesh Kumar K.V wrote:
> Naoya Horiguchi <[email protected]> writes:
>
> > This patch is motivated by the discussion with Aneesh about the "extend
> > hugepage migration" patchset.
> > http://thread.gmane.org/gmane.linux.kernel.mm/103933/focus=104391
> > I'll append this to the patchset in the next post, but before that
> > I'd like this patch to be reviewed (I don't want to repost the
> > whole set just for minor changes.)
> >
> > Any comments?
> >
> > Thanks,
> > Naoya Horiguchi
> > ---
> > From: Naoya Horiguchi <[email protected]>
> > Date: Mon, 5 Aug 2013 13:33:02 -0400
> > Subject: [PATCH] hugetlb: add pmd_huge_support() to migrate only pmd-based
> > hugepage
> >
> > Currently hugepage migration works well only for pmd-based hugepages,
> > because core routines of hugepage migration use pmd-specific internal
> > functions like huge_pte_offset(). So we should not enable the migration
> > of other levels of hugepages until we are ready for it.
>
> I guess huge_pte_offset() may not be the right reason, because archs do
> implement huge_pte_offset() even when their hugepages are not pmd-based:
>
> pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
> {
> 	/* Only called for hugetlbfs pages, hence can ignore THP */
> 	return find_linux_pte_or_hugepte(mm->pgd, addr, NULL);
> }

You're right, sorry.
To be honest, I tested only on x86, and my testing of pud-based hugepages
is not sufficient (I ran into bugs I haven't resolved yet), so I want to
restrict the target for now.

> >
> > Some users of hugepage migration (mbind, move_pages, and migrate_pages)
> > do a page table walk and check pud/pmd_huge() there, so they are safe.
> > But the other users (soft offline and memory hotremove) don't do this,
> > so they can try to migrate unexpected types of hugepages.
> >
> > To prevent this, we introduce an architecture-dependent check of whether
> > hugepages are implemented on a pmd basis. It returns 0 if pmd_huge()
> > always returns 0, and 1 otherwise.
> >
>
> So why not #define pmd_huge_support pmd_huge, or use pmd_huge() directly?

The caller (unmap_and_move_huge_page()) doesn't have the pmd, so we would
need an rmap walk to get the pmd associated with the source hugepage. That
might make the patch smaller, but it would probably be slower.
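A simplified userspace model of that trade-off: the migration path holds only the page, so a per-page pmd_huge() check first needs a reverse-map lookup to reach a pmd, while a per-arch constant needs no page table access at all. The flat mapping_pmds array standing in for the rmap chain and the MODEL_PMD_LEAF bit are hypothetical simplifications, not kernel structures.

```c
/*
 * Userspace model contrasting the two options; not kernel code.
 * MODEL_PMD_LEAF and the mapping_pmds array are hypothetical.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { unsigned long val; } model_pmd_t;

#define MODEL_PMD_LEAF 0x80UL

static inline bool model_pmd_huge(model_pmd_t pmd)
{
	return (pmd.val & MODEL_PMD_LEAF) != 0;
}

/* The migration path only has the page; pmds are reachable via rmap. */
struct model_page {
	const model_pmd_t *mapping_pmds; /* one pmd per mapping found by rmap */
	size_t nr_mappings;
};

/* Option 1: per-page check through the reverse map. */
static inline bool check_via_rmap(const struct model_page *page, size_t *pt_reads)
{
	if (page->nr_mappings == 0)
		return false;	/* no mapping, so no pmd to inspect */
	(*pt_reads)++;		/* one rmap lookup plus a page table read */
	return model_pmd_huge(page->mapping_pmds[0]);
}

/* Option 2 (this patch): per-arch constant, no page table access at all. */
static inline int model_pmd_huge_support(void)
{
	return 1;
}
```

In this model, an unmapped page gives the per-page check nothing to inspect, so it answers no, while the per-arch constant answers the same way for every page.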

Thanks,
Naoya Horiguchi
