Here is the 3rd version of the hugepage migration patchset.
I rebased it onto v3.11-rc1 and addressed most of your feedback.
Some work mentioned in the previous discussion (listed below) is not included
in this patchset, but will likely be done after this work:
- using page walker in check_range
- split page table lock for pmd/pud based hugepage (maybe applicable to thp)
Thanks,
Naoya Horiguchi
--- General Description (exactly the same as in the previous post) ---
Hugepage migration is currently available only for soft offlining (moving
data on a half-corrupted page to another page to save the data).
But it is also useful for some other users of page migration, so this
patchset extends some of those users to support hugepages.
The targets of this patchset are NUMA related system calls (i.e.
migrate_pages(2), move_pages(2), and mbind(2)), and memory hotplug.
This patchset does not extend page migration in memory compaction,
because I think that users of memory compaction mainly expect to
construct thp by arranging raw pages, and hugepage migration doesn't
help with that.
CMA, another user of page migration, could benefit from hugepage
migration, but is not enabled to support it yet. This is because
I've never used CMA and need to learn more before extending and/or testing
hugepage migration in CMA. I'll add this in a later version if it
becomes ready, or will post it as a separate patchset.
Migration of 1GB hugepages is not enabled for now, because
I'm not sure whether users of 1GB hugepages really want it.
We need to keep a spare free hugepage in order to do migration, but I don't
think that users want to leave 1GB of memory idle for that purpose
(currently we can't expand/shrink the 1GB hugepage pool after boot).
---
GitHub:
git://github.com/Naoya-Horiguchi/linux.git extend_hugepage_migration.v3
Test code:
git://github.com/Naoya-Horiguchi/test_hugepage_migration_extension.git
Naoya Horiguchi (8):
migrate: make core migration code aware of hugepage
soft-offline: use migrate_pages() instead of migrate_huge_page()
migrate: add hugepage migration code to migrate_pages()
migrate: add hugepage migration code to move_pages()
mbind: add hugepage migration code to mbind()
migrate: remove VM_HUGETLB from vma flag check in vma_migratable()
memory-hotplug: enable memory hotplug to handle hugepage
prepare to remove /proc/sys/vm/hugepages_treat_as_movable
Documentation/sysctl/vm.txt | 13 +----
include/linux/hugetlb.h | 15 +++++
include/linux/mempolicy.h | 2 +-
include/linux/migrate.h | 5 --
mm/hugetlb.c | 130 +++++++++++++++++++++++++++++++++++++++-----
mm/memory-failure.c | 15 ++++-
mm/memory.c | 12 +++-
mm/memory_hotplug.c | 42 +++++++++++---
mm/mempolicy.c | 43 +++++++++++++--
mm/migrate.c | 51 ++++++++---------
mm/page_alloc.c | 12 ++++
mm/page_isolation.c | 5 ++
12 files changed, 267 insertions(+), 78 deletions(-)
Currently migrate_huge_page() takes a pointer to a hugepage to be
migrated as an argument, instead of taking a pointer to the list of
hugepages to be migrated. This behavior was introduced in commit
189ebff28 ("hugetlb: simplify migrate_huge_page()"), and was OK
because until now hugepage migration has been enabled only for soft-offlining,
which migrates only one hugepage in a single call.
But the situation will change in the later patches in this series,
which enable other users of page migration to support hugepage migration.
They can kick migration for both normal pages and hugepages
in a single call, so we need to go back to the original implementation,
which uses linked lists to collect the hugepages to be migrated.
With this patch, soft_offline_huge_page() switches to using migrate_pages(),
and migrate_huge_page() is no longer used. So let's remove it.
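For illustration only (this is not code from the patch): migrate_pages()
returns 0 on success, a negative errno, or a positive count of pages that
could not be migrated, whereas the removed migrate_huge_page() never returned
a positive value. The userspace sketch below shows the normalization that
soft_offline_huge_page() now has to do; the helper name
soft_offline_result() is hypothetical.

```c
#include <errno.h>

/*
 * Hypothetical helper, not part of the series: map the return value of a
 * list-based migration call to what the soft-offline caller reports.
 *   0        -> success, passed through
 *   negative -> errno-style failure, passed through
 *   positive -> "N pages were not migrated", reported as -EIO
 */
static int soft_offline_result(int migrate_ret)
{
	if (migrate_ret > 0)
		return -EIO;
	return migrate_ret;
}
```

Since soft offlining puts exactly one hugepage on the list, a positive return
can only mean that this one page failed, so collapsing it to -EIO loses no
information.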
ChangeLog v3:
- Merged with another cleanup patch (4/10 in previous version)
Signed-off-by: Naoya Horiguchi <[email protected]>
---
include/linux/migrate.h | 5 -----
mm/memory-failure.c | 15 ++++++++++++---
mm/migrate.c | 28 ++--------------------------
3 files changed, 14 insertions(+), 34 deletions(-)
diff --git v3.11-rc1.orig/include/linux/migrate.h v3.11-rc1/include/linux/migrate.h
index a405d3dc..6fe5214 100644
--- v3.11-rc1.orig/include/linux/migrate.h
+++ v3.11-rc1/include/linux/migrate.h
@@ -41,8 +41,6 @@ extern int migrate_page(struct address_space *,
struct page *, struct page *, enum migrate_mode);
extern int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, enum migrate_mode mode, int reason);
-extern int migrate_huge_page(struct page *, new_page_t x,
- unsigned long private, enum migrate_mode mode);
extern int fail_migrate_page(struct address_space *,
struct page *, struct page *);
@@ -62,9 +60,6 @@ static inline void putback_movable_pages(struct list_head *l) {}
static inline int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, enum migrate_mode mode, int reason)
{ return -ENOSYS; }
-static inline int migrate_huge_page(struct page *page, new_page_t x,
- unsigned long private, enum migrate_mode mode)
- { return -ENOSYS; }
static inline int migrate_prep(void) { return -ENOSYS; }
static inline int migrate_prep_local(void) { return -ENOSYS; }
diff --git v3.11-rc1.orig/mm/memory-failure.c v3.11-rc1/mm/memory-failure.c
index 2c13aa7..af6f61c 100644
--- v3.11-rc1.orig/mm/memory-failure.c
+++ v3.11-rc1/mm/memory-failure.c
@@ -1467,6 +1467,7 @@ static int soft_offline_huge_page(struct page *page, int flags)
int ret;
unsigned long pfn = page_to_pfn(page);
struct page *hpage = compound_head(page);
+ LIST_HEAD(pagelist);
/*
* This double-check of PageHWPoison is to avoid the race with
@@ -1482,12 +1483,20 @@ static int soft_offline_huge_page(struct page *page, int flags)
unlock_page(hpage);
/* Keep page count to indicate a given hugepage is isolated. */
- ret = migrate_huge_page(hpage, new_page, MPOL_MF_MOVE_ALL,
- MIGRATE_SYNC);
- put_page(hpage);
+ list_move(&hpage->lru, &pagelist);
+ ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
+ MIGRATE_SYNC, MR_MEMORY_FAILURE);
if (ret) {
pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
pfn, ret, page->flags);
+ /*
+ * We know that soft_offline_huge_page() tries to migrate
+ * only one hugepage pointed to by hpage, so we need not
+ * run through the pagelist here.
+ */
+ putback_active_hugepage(hpage);
+ if (ret > 0)
+ ret = -EIO;
} else {
set_page_hwpoison_huge_page(hpage);
dequeue_hwpoisoned_huge_page(hpage);
diff --git v3.11-rc1.orig/mm/migrate.c v3.11-rc1/mm/migrate.c
index b44a067..3ec47d3 100644
--- v3.11-rc1.orig/mm/migrate.c
+++ v3.11-rc1/mm/migrate.c
@@ -979,6 +979,8 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
unlock_page(hpage);
out:
+ if (rc != -EAGAIN)
+ putback_active_hugepage(hpage);
put_page(new_hpage);
if (result) {
if (rc)
@@ -1066,32 +1068,6 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
return rc;
}
-int migrate_huge_page(struct page *hpage, new_page_t get_new_page,
- unsigned long private, enum migrate_mode mode)
-{
- int pass, rc;
-
- for (pass = 0; pass < 10; pass++) {
- rc = unmap_and_move_huge_page(get_new_page, private,
- hpage, pass > 2, mode);
- switch (rc) {
- case -ENOMEM:
- goto out;
- case -EAGAIN:
- /* try again */
- cond_resched();
- break;
- case MIGRATEPAGE_SUCCESS:
- goto out;
- default:
- rc = -EIO;
- goto out;
- }
- }
-out:
- return rc;
-}
-
#ifdef CONFIG_NUMA
/*
* Move a list of individual pages
--
1.8.3.1
Now hugepages are definitely movable. So allocating hugepages from
ZONE_MOVABLE is natural and we have no reason to keep this parameter.
In order to allow userspace to prepare for the removal, let's leave
this sysctl handler as a no-op for a while.
ChangeLog v3:
- use WARN_ON_ONCE
ChangeLog v2:
- shift to noop function instead of completely removing the parameter
- rename patch title
Signed-off-by: Naoya Horiguchi <[email protected]>
---
Documentation/sysctl/vm.txt | 13 ++-----------
mm/hugetlb.c | 17 ++++++-----------
2 files changed, 8 insertions(+), 22 deletions(-)
diff --git v3.11-rc1.orig/Documentation/sysctl/vm.txt v3.11-rc1/Documentation/sysctl/vm.txt
index 36ecc26..6e211a1 100644
--- v3.11-rc1.orig/Documentation/sysctl/vm.txt
+++ v3.11-rc1/Documentation/sysctl/vm.txt
@@ -200,17 +200,8 @@ fragmentation index is <= extfrag_threshold. The default value is 500.
hugepages_treat_as_movable
-This parameter is only useful when kernelcore= is specified at boot time to
-create ZONE_MOVABLE for pages that may be reclaimed or migrated. Huge pages
-are not movable so are not normally allocated from ZONE_MOVABLE. A non-zero
-value written to hugepages_treat_as_movable allows huge pages to be allocated
-from ZONE_MOVABLE.
-
-Once enabled, the ZONE_MOVABLE is treated as an area of memory the huge
-pages pool can easily grow or shrink within. Assuming that applications are
-not running that mlock() a lot of memory, it is likely the huge pages pool
-can grow to the size of ZONE_MOVABLE by repeatedly entering the desired value
-into nr_hugepages and triggering page reclaim.
+This parameter is obsolete and is planned to be removed. The value has no effect
+on the kernel's behavior.
==============================================================
diff --git v3.11-rc1.orig/mm/hugetlb.c v3.11-rc1/mm/hugetlb.c
index 9575e8a..aab5aef 100644
--- v3.11-rc1.orig/mm/hugetlb.c
+++ v3.11-rc1/mm/hugetlb.c
@@ -34,7 +34,6 @@
#include "internal.h"
const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
-static gfp_t htlb_alloc_mask = GFP_HIGHUSER;
unsigned long hugepages_treat_as_movable;
int hugetlb_max_hstate __read_mostly;
@@ -546,7 +545,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
retry_cpuset:
cpuset_mems_cookie = get_mems_allowed();
zonelist = huge_zonelist(vma, address,
- htlb_alloc_mask, &mpol, &nodemask);
+ GFP_HIGHUSER_MOVABLE, &mpol, &nodemask);
/*
* A child process with MAP_PRIVATE mappings created by their parent
* have no page reserves. This check ensures that reservations are
@@ -562,7 +561,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
for_each_zone_zonelist_nodemask(zone, z, zonelist,
MAX_NR_ZONES - 1, nodemask) {
- if (cpuset_zone_allowed_softwall(zone, htlb_alloc_mask)) {
+ if (cpuset_zone_allowed_softwall(zone, GFP_HIGHUSER_MOVABLE)) {
page = dequeue_huge_page_node(h, zone_to_nid(zone));
if (page) {
if (!avoid_reserve)
@@ -719,7 +718,7 @@ static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid)
return NULL;
page = alloc_pages_exact_node(nid,
- htlb_alloc_mask|__GFP_COMP|__GFP_THISNODE|
+ GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_THISNODE|
__GFP_REPEAT|__GFP_NOWARN,
huge_page_order(h));
if (page) {
@@ -944,12 +943,12 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
spin_unlock(&hugetlb_lock);
if (nid == NUMA_NO_NODE)
- page = alloc_pages(htlb_alloc_mask|__GFP_COMP|
+ page = alloc_pages(GFP_HIGHUSER_MOVABLE|__GFP_COMP|
__GFP_REPEAT|__GFP_NOWARN,
huge_page_order(h));
else
page = alloc_pages_exact_node(nid,
- htlb_alloc_mask|__GFP_COMP|__GFP_THISNODE|
+ GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_THISNODE|
__GFP_REPEAT|__GFP_NOWARN, huge_page_order(h));
if (page && arch_prepare_hugepage(page)) {
@@ -2128,11 +2127,7 @@ int hugetlb_treat_movable_handler(struct ctl_table *table, int write,
void __user *buffer,
size_t *length, loff_t *ppos)
{
- proc_dointvec(table, write, buffer, length, ppos);
- if (hugepages_treat_as_movable)
- htlb_alloc_mask = GFP_HIGHUSER_MOVABLE;
- else
- htlb_alloc_mask = GFP_HIGHUSER;
+ WARN_ONCE(1, "This knob is obsolete and has no effect. It is scheduled for removal.\n");
return 0;
}
--
1.8.3.1
Until now we couldn't offline memory blocks which contain hugepages because
a hugepage was considered an unmovable page. But with this patch
series, hugepages have become movable, so by using hugepage migration we
can offline such memory blocks.
What's different from other users of hugepage migration is that we need
to decompose all the hugepages inside the target memory block into free
buddy pages after hugepage migration, because otherwise free hugepages
remaining in the memory block interfere with memory offlining.
For this reason we introduce new functions dissolve_free_huge_page() and
dissolve_free_huge_pages().
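As a rough userspace sketch of how dissolve_free_huge_pages() walks a pfn
range (illustration only, not code from the patch): the scan step is the
smallest hugepage size configured, so that every possible hugepage head in
the range is visited. The function names and the orders array below are
assumptions made for the example.

```c
#include <stddef.h>

/*
 * Smallest order among the configured hugepage sizes. "orders" is a
 * hypothetical stand-in for iterating over hstates; n must be > 0,
 * otherwise the sentinel (the pointer width in bits) is returned and
 * must not be used as a shift count.
 */
static unsigned int min_hugepage_order(const unsigned int *orders, size_t n)
{
	unsigned int order = 8 * sizeof(void *);
	size_t i;

	for (i = 0; i < n; i++)
		if (orders[i] < order)
			order = orders[i];
	return order;
}

/*
 * Count the pfns visited in [start_pfn, end_pfn) when stepping by
 * 1 << order. Each visited pfn is a candidate hugepage head.
 */
static unsigned long scan_candidates(unsigned long start_pfn,
				     unsigned long end_pfn,
				     unsigned int order)
{
	unsigned long pfn, visited = 0;

	for (pfn = start_pfn; pfn < end_pfn; pfn += 1UL << order)
		visited++;
	return visited;
}
```

With 4kB base pages, orders 9 and 18 correspond to 2MB and 1GB hugepages, so
a 128MB block (32768 pfns) is scanned in 64 steps of 512 pfns.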
Other than that, this patch straightforwardly adds hugepage
migration code, that is, it adds hugepage handling to the functions which scan
over pfns and collect hugepages to be migrated, and adds hugepage
allocation to alloc_migrate_target().
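The pfn scanners skip the rest of a hugepage with the expression
round_up(pfn + 1, 1 << order) - 1, relying on the loop's own pfn++ to land on
the next hugepage boundary. A small userspace sketch of that arithmetic
follows (illustration only; round_up_pow2() and last_pfn_of_hugepage() are
hypothetical names, and the alignment is assumed to be a power of two):

```c
/* Round x up to the next multiple of align (align must be a power of two). */
static unsigned long round_up_pow2(unsigned long x, unsigned long align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Given any pfn inside a naturally aligned hugepage of the given order,
 * return the last pfn of that hugepage. A subsequent pfn++ in the caller's
 * loop then moves to the first pfn after the hugepage.
 */
static unsigned long last_pfn_of_hugepage(unsigned long pfn, unsigned int order)
{
	return round_up_pow2(pfn + 1, 1UL << order) - 1;
}
```

The "+ 1" matters when pfn is already aligned: without it, round_up() would
return pfn itself and the scan would step backwards by one.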
As for larger hugepages (1GB for x86_64), it's not easy to hot-remove
them because they are larger than a memory block. So for now we simply let
the operation fail as it does today.
ChangeLog v3:
- revert introducing migrate_movable_pages (the function was open-coded)
- add migratetype check in dequeue_huge_page_node to close the race
between scan and allocation
- make is_hugepage_movable use refcount to find active hugepages
instead of running through hugepage_activelist
- rename is_hugepage_movable to is_hugepage_active
- add alignment check in dissolve_free_huge_pages
- use round_up in calculating next scanning pfn
- use isolate_huge_page
ChangeLog v2:
- changed return value type of is_hugepage_movable() to bool
- is_hugepage_movable() uses list_for_each_entry() instead of *_safe()
- moved if(PageHuge) block before get_page_unless_zero() in do_migrate_range()
- do_migrate_range() returns -EBUSY for hugepages larger than memory block
- dissolve_free_huge_pages() calculates scan step and sets it to minimum
hugepage size
Signed-off-by: Naoya Horiguchi <[email protected]>
---
include/linux/hugetlb.h | 6 +++++
mm/hugetlb.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++--
mm/memory_hotplug.c | 42 +++++++++++++++++++++++++------
mm/page_alloc.c | 12 +++++++++
mm/page_isolation.c | 5 ++++
5 files changed, 123 insertions(+), 9 deletions(-)
diff --git v3.11-rc1.orig/include/linux/hugetlb.h v3.11-rc1/include/linux/hugetlb.h
index 768ebbe..bb7651e 100644
--- v3.11-rc1.orig/include/linux/hugetlb.h
+++ v3.11-rc1/include/linux/hugetlb.h
@@ -69,6 +69,7 @@ int dequeue_hwpoisoned_huge_page(struct page *page);
bool isolate_huge_page(struct page *page, struct list_head *l);
void putback_active_hugepage(struct page *page);
void putback_active_hugepages(struct list_head *l);
+bool is_hugepage_active(struct page *page);
void copy_huge_page(struct page *dst, struct page *src);
#ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
@@ -140,6 +141,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page)
#define isolate_huge_page(p, l) false
#define putback_active_hugepage(p)
#define putback_active_hugepages(l)
+#define is_hugepage_active(x) false
static inline void copy_huge_page(struct page *dst, struct page *src)
{
}
@@ -379,6 +381,9 @@ static inline pgoff_t basepage_index(struct page *page)
return __basepage_index(page);
}
+extern void dissolve_free_huge_pages(unsigned long start_pfn,
+ unsigned long end_pfn);
+
#else /* CONFIG_HUGETLB_PAGE */
struct hstate {};
#define alloc_huge_page_node(h, nid) NULL
@@ -405,6 +410,7 @@ static inline pgoff_t basepage_index(struct page *page)
{
return page->index;
}
+#define dissolve_free_huge_pages(s, e)
#endif /* CONFIG_HUGETLB_PAGE */
#endif /* _LINUX_HUGETLB_H */
diff --git v3.11-rc1.orig/mm/hugetlb.c v3.11-rc1/mm/hugetlb.c
index fab29a1..9575e8a 100644
--- v3.11-rc1.orig/mm/hugetlb.c
+++ v3.11-rc1/mm/hugetlb.c
@@ -21,6 +21,7 @@
#include <linux/rmap.h>
#include <linux/swap.h>
#include <linux/swapops.h>
+#include <linux/page-isolation.h>
#include <asm/page.h>
#include <asm/pgtable.h>
@@ -518,9 +519,11 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid)
{
struct page *page;
- if (list_empty(&h->hugepage_freelists[nid]))
+ list_for_each_entry(page, &h->hugepage_freelists[nid], lru)
+ if (!is_migrate_isolate_page(page))
+ break;
+ if (&h->hugepage_freelists[nid] == &page->lru)
return NULL;
- page = list_entry(h->hugepage_freelists[nid].next, struct page, lru);
list_move(&page->lru, &h->hugepage_activelist);
set_page_refcounted(page);
h->free_huge_pages--;
@@ -861,6 +864,44 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
return ret;
}
+/*
+ * Dissolve a given free hugepage into free buddy pages. This function does
+ * nothing for in-use (including surplus) hugepages.
+ */
+static void dissolve_free_huge_page(struct page *page)
+{
+ spin_lock(&hugetlb_lock);
+ if (PageHuge(page) && !page_count(page)) {
+ struct hstate *h = page_hstate(page);
+ int nid = page_to_nid(page);
+ list_del(&page->lru);
+ h->free_huge_pages--;
+ h->free_huge_pages_node[nid]--;
+ update_and_free_page(h, page);
+ }
+ spin_unlock(&hugetlb_lock);
+}
+
+/*
+ * Dissolve free hugepages in a given pfn range. Used by memory hotplug to
+ * make specified memory blocks removable from the system.
+ * Note that start_pfn should be aligned with (minimum) hugepage size.
+ */
+void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
+{
+ unsigned int order = 8 * sizeof(void *);
+ unsigned long pfn;
+ struct hstate *h;
+
+ /* Set scan step to minimum hugepage size */
+ for_each_hstate(h)
+ if (order > huge_page_order(h))
+ order = huge_page_order(h);
+ VM_BUG_ON(!IS_ALIGNED(start_pfn, 1 << order));
+ for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << order)
+ dissolve_free_huge_page(pfn_to_page(pfn));
+}
+
static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
{
struct page *page;
@@ -3418,6 +3459,28 @@ static int is_hugepage_on_freelist(struct page *hpage)
return 0;
}
+bool is_hugepage_active(struct page *page)
+{
+ VM_BUG_ON(!PageHuge(page));
+ /*
+ * This function can be called for a tail page because the caller,
+ * scan_movable_pages, scans through a given pfn-range which typically
+ * covers one memory block. In systems using gigantic hugepage (1GB
+ * for x86_64,) a hugepage is larger than a memory block, and we don't
+ * support migrating such large hugepages for now, so return false
+ * when called for tail pages.
+ */
+ if (PageTail(page))
+ return false;
+ /*
+ * The refcount of a hwpoisoned hugepage is 1, but it is not active,
+ * so we should return false for it.
+ */
+ if (unlikely(PageHWPoison(page)))
+ return false;
+ return page_count(page) > 0;
+}
+
/*
* This function is called from memory failure code.
* Assume the caller holds page lock of the head page.
diff --git v3.11-rc1.orig/mm/memory_hotplug.c v3.11-rc1/mm/memory_hotplug.c
index ca1dd3a..31f08fa 100644
--- v3.11-rc1.orig/mm/memory_hotplug.c
+++ v3.11-rc1/mm/memory_hotplug.c
@@ -30,6 +30,7 @@
#include <linux/mm_inline.h>
#include <linux/firmware-map.h>
#include <linux/stop_machine.h>
+#include <linux/hugetlb.h>
#include <asm/tlbflush.h>
@@ -1208,10 +1209,12 @@ static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
}
/*
- * Scanning pfn is much easier than scanning lru list.
- * Scan pfn from start to end and Find LRU page.
+ * Scan pfn range [start,end) to find movable/migratable pages (LRU pages
+ * and hugepages). We scan pfn because it's much easier than scanning over
+ * linked list. This function returns the pfn of the first found movable
+ * page if it's found, otherwise 0.
*/
-static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
+static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
{
unsigned long pfn;
struct page *page;
@@ -1220,6 +1223,13 @@ static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
page = pfn_to_page(pfn);
if (PageLRU(page))
return pfn;
+ if (PageHuge(page)) {
+ if (is_hugepage_active(page))
+ return pfn;
+ else
+ pfn = round_up(pfn + 1,
+ 1 << compound_order(page)) - 1;
+ }
}
}
return 0;
@@ -1240,6 +1250,19 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
if (!pfn_valid(pfn))
continue;
page = pfn_to_page(pfn);
+
+ if (PageHuge(page)) {
+ struct page *head = compound_head(page);
+ pfn = page_to_pfn(head) + (1<<compound_order(head)) - 1;
+ if (compound_order(head) > PFN_SECTION_SHIFT) {
+ ret = -EBUSY;
+ break;
+ }
+ if (isolate_huge_page(page, &source))
+ move_pages -= 1 << compound_order(head);
+ continue;
+ }
+
if (!get_page_unless_zero(page))
continue;
/*
@@ -1272,7 +1295,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
}
if (!list_empty(&source)) {
if (not_managed) {
- putback_lru_pages(&source);
+ putback_movable_pages(&source);
goto out;
}
@@ -1283,7 +1306,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
ret = migrate_pages(&source, alloc_migrate_target, 0,
MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
if (ret)
- putback_lru_pages(&source);
+ putback_movable_pages(&source);
}
out:
return ret;
@@ -1527,8 +1550,8 @@ static int __ref __offline_pages(unsigned long start_pfn,
drain_all_pages();
}
- pfn = scan_lru_pages(start_pfn, end_pfn);
- if (pfn) { /* We have page on LRU */
+ pfn = scan_movable_pages(start_pfn, end_pfn);
+ if (pfn) { /* We have movable pages */
ret = do_migrate_range(pfn, end_pfn);
if (!ret) {
drain = 1;
@@ -1547,6 +1570,11 @@ static int __ref __offline_pages(unsigned long start_pfn,
yield();
/* drain pcp pages, this is synchronous. */
drain_all_pages();
+ /*
+ * dissolve free hugepages in the memory block before actually offlining
+ * it, in order to keep hugetlbfs's object counting consistent.
+ */
+ dissolve_free_huge_pages(start_pfn, end_pfn);
/* check again */
offlined_pages = check_pages_isolated(start_pfn, end_pfn);
if (offlined_pages < 0) {
diff --git v3.11-rc1.orig/mm/page_alloc.c v3.11-rc1/mm/page_alloc.c
index b100255..24fe228 100644
--- v3.11-rc1.orig/mm/page_alloc.c
+++ v3.11-rc1/mm/page_alloc.c
@@ -60,6 +60,7 @@
#include <linux/page-debug-flags.h>
#include <linux/hugetlb.h>
#include <linux/sched/rt.h>
+#include <linux/hugetlb.h>
#include <asm/sections.h>
#include <asm/tlbflush.h>
@@ -5928,6 +5929,17 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
continue;
page = pfn_to_page(check);
+
+ /*
+ * Hugepages are not in LRU lists, but they're movable.
+ * We need not scan over tail pages because we don't
+ * handle each tail page individually in migration.
+ */
+ if (PageHuge(page)) {
+ iter = round_up(iter + 1, 1<<compound_order(page)) - 1;
+ continue;
+ }
+
/*
* We can't use page_count without pin a page
* because another CPU can free compound page.
diff --git v3.11-rc1.orig/mm/page_isolation.c v3.11-rc1/mm/page_isolation.c
index 383bdbb..cf48ef6 100644
--- v3.11-rc1.orig/mm/page_isolation.c
+++ v3.11-rc1/mm/page_isolation.c
@@ -6,6 +6,7 @@
#include <linux/page-isolation.h>
#include <linux/pageblock-flags.h>
#include <linux/memory.h>
+#include <linux/hugetlb.h>
#include "internal.h"
int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
@@ -252,6 +253,10 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private,
{
gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE;
+ if (PageHuge(page))
+ return alloc_huge_page_node(page_hstate(compound_head(page)),
+ numa_node_id());
+
if (PageHighMem(page))
gfp_mask |= __GFP_HIGHMEM;
--
1.8.3.1
This patch enables hugepage migration from migrate_pages(2),
move_pages(2), and mbind(2).
Signed-off-by: Naoya Horiguchi <[email protected]>
---
include/linux/mempolicy.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git v3.11-rc1.orig/include/linux/mempolicy.h v3.11-rc1/include/linux/mempolicy.h
index 0d7df39..2e475b5 100644
--- v3.11-rc1.orig/include/linux/mempolicy.h
+++ v3.11-rc1/include/linux/mempolicy.h
@@ -173,7 +173,7 @@ extern int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol);
/* Check if a vma is migratable */
static inline int vma_migratable(struct vm_area_struct *vma)
{
- if (vma->vm_flags & (VM_IO | VM_HUGETLB | VM_PFNMAP))
+ if (vma->vm_flags & (VM_IO | VM_PFNMAP))
return 0;
/*
* Migration allocates pages in the highest zone. If we cannot
--
1.8.3.1
This patch extends do_mbind() to handle vmas with VM_HUGETLB set.
We will be able to migrate hugepages with mbind(2) after
applying the enablement patch which comes later in this series.
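This patch also adds a wrapper, alloc_huge_page_noerr(), because
alloc_huge_page() reports failure with an error-encoded pointer while the
migration allocation callback expects NULL. The userspace sketch below
imitates that conversion (illustration only; err_ptr(), is_err() and noerr()
are hypothetical stand-ins modelled on the kernel's ERR_PTR()/IS_ERR()
convention, where the top 4095 addresses encode errno values):

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno as a pointer value (stand-in for ERR_PTR()). */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True if ptr lies in the top MAX_ERRNO addresses (stand-in for IS_ERR()). */
static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Collapse any error-encoded pointer to NULL; pass everything else through. */
static void *noerr(void *ptr)
{
	return is_err(ptr) ? NULL : ptr;
}
```

The point of the wrapper is that callers which only test for NULL never see
an error-encoded pointer and so cannot dereference one by accident.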
ChangeLog v3:
- revert introducing migrate_movable_pages
- added alloc_huge_page_noerr(), which never returns an ERR_PTR value
ChangeLog v2:
- updated description and renamed patch title
Signed-off-by: Naoya Horiguchi <[email protected]>
---
include/linux/hugetlb.h | 3 +++
mm/hugetlb.c | 14 ++++++++++++++
mm/mempolicy.c | 4 +++-
3 files changed, 20 insertions(+), 1 deletion(-)
diff --git v3.11-rc1.orig/include/linux/hugetlb.h v3.11-rc1/include/linux/hugetlb.h
index 0b7a9e7..768ebbe 100644
--- v3.11-rc1.orig/include/linux/hugetlb.h
+++ v3.11-rc1/include/linux/hugetlb.h
@@ -267,6 +267,8 @@ struct huge_bootmem_page {
};
struct page *alloc_huge_page_node(struct hstate *h, int nid);
+struct page *alloc_huge_page_noerr(struct vm_area_struct *vma,
+ unsigned long addr, int avoid_reserve);
/* arch callback */
int __init alloc_bootmem_huge_page(struct hstate *h);
@@ -380,6 +382,7 @@ static inline pgoff_t basepage_index(struct page *page)
#else /* CONFIG_HUGETLB_PAGE */
struct hstate {};
#define alloc_huge_page_node(h, nid) NULL
+#define alloc_huge_page_noerr(v, a, r) NULL
#define alloc_bootmem_huge_page(h) NULL
#define hstate_file(f) NULL
#define hstate_sizelog(s) NULL
diff --git v3.11-rc1.orig/mm/hugetlb.c v3.11-rc1/mm/hugetlb.c
index 4c48a70..fab29a1 100644
--- v3.11-rc1.orig/mm/hugetlb.c
+++ v3.11-rc1/mm/hugetlb.c
@@ -1195,6 +1195,20 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
return page;
}
+/*
+ * alloc_huge_page()'s wrapper which simply returns the page if allocation
+ * succeeds, otherwise NULL. This function is called from new_vma_page(),
+ * where no ERR_VALUE is expected to be returned.
+ */
+struct page *alloc_huge_page_noerr(struct vm_area_struct *vma,
+ unsigned long addr, int avoid_reserve)
+{
+ struct page *page = alloc_huge_page(vma, addr, avoid_reserve);
+ if (IS_ERR(page))
+ page = NULL;
+ return page;
+}
+
int __weak alloc_bootmem_huge_page(struct hstate *h)
{
struct huge_bootmem_page *m;
diff --git v3.11-rc1.orig/mm/mempolicy.c v3.11-rc1/mm/mempolicy.c
index f3b65c0..d8ced3e 100644
--- v3.11-rc1.orig/mm/mempolicy.c
+++ v3.11-rc1/mm/mempolicy.c
@@ -1180,6 +1180,8 @@ static struct page *new_vma_page(struct page *page, unsigned long private, int *
vma = vma->vm_next;
}
+ if (PageHuge(page))
+ return alloc_huge_page_noerr(vma, address, 1);
/*
* if !vma, alloc_page_vma() will use task or system default policy
*/
@@ -1290,7 +1292,7 @@ static long do_mbind(unsigned long start, unsigned long len,
(unsigned long)vma,
MIGRATE_SYNC, MR_MEMPOLICY_MBIND);
if (nr_failed)
- putback_lru_pages(&pagelist);
+ putback_movable_pages(&pagelist);
}
if (nr_failed && (flags & MPOL_MF_STRICT))
--
1.8.3.1
Before enabling each user of page migration to support hugepages,
this patch enables the list of pages for migration to link not only
LRU pages, but also hugepages. As a result, putback_movable_pages()
and migrate_pages() can handle both LRU pages and hugepages.
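The idea of one list carrying both kinds of pages, with the handler chosen
per entry, can be sketched in userspace as follows (illustration only; struct
fake_page, struct putback_stats and putback_all() are hypothetical and only
model the dispatch, not the real putback work):

```c
#include <stddef.h>

/* Hypothetical stand-in for struct page on a singly linked list. */
struct fake_page {
	int is_huge;			/* models PageHuge() */
	struct fake_page *next;
};

struct putback_stats {
	unsigned int huge;		/* entries sent to the hugepage path */
	unsigned int lru;		/* entries sent to the LRU path */
};

/*
 * Walk a mixed list and dispatch each entry by type. The next pointer is
 * saved before the entry is unlinked, which is the same reason the kernel
 * code uses the _safe list iterator.
 */
static struct putback_stats putback_all(struct fake_page *head)
{
	struct putback_stats st = { 0, 0 };

	while (head) {
		struct fake_page *next = head->next;

		head->next = NULL;	/* unlink from the migration list */
		if (head->is_huge)
			st.huge++;
		else
			st.lru++;
		head = next;
	}
	return st;
}
```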
ChangeLog v3:
- revert introducing migrate_movable_pages
- add isolate_huge_page
ChangeLog v2:
- move code removing VM_HUGETLB from vma_migratable check into a
separate patch
- hold hugetlb_lock in putback_active_hugepage
- update comment near the definition of hugetlb_lock
Signed-off-by: Naoya Horiguchi <[email protected]>
---
include/linux/hugetlb.h | 6 ++++++
mm/hugetlb.c | 32 +++++++++++++++++++++++++++++++-
mm/migrate.c | 10 +++++++++-
3 files changed, 46 insertions(+), 2 deletions(-)
diff --git v3.11-rc1.orig/include/linux/hugetlb.h v3.11-rc1/include/linux/hugetlb.h
index c2b1801..0b7a9e7 100644
--- v3.11-rc1.orig/include/linux/hugetlb.h
+++ v3.11-rc1/include/linux/hugetlb.h
@@ -66,6 +66,9 @@ int hugetlb_reserve_pages(struct inode *inode, long from, long to,
vm_flags_t vm_flags);
void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed);
int dequeue_hwpoisoned_huge_page(struct page *page);
+bool isolate_huge_page(struct page *page, struct list_head *l);
+void putback_active_hugepage(struct page *page);
+void putback_active_hugepages(struct list_head *l);
void copy_huge_page(struct page *dst, struct page *src);
#ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
@@ -134,6 +137,9 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page)
return 0;
}
+#define isolate_huge_page(p, l) false
+#define putback_active_hugepage(p)
+#define putback_active_hugepages(l)
static inline void copy_huge_page(struct page *dst, struct page *src)
{
}
diff --git v3.11-rc1.orig/mm/hugetlb.c v3.11-rc1/mm/hugetlb.c
index 83aff0a..4c48a70 100644
--- v3.11-rc1.orig/mm/hugetlb.c
+++ v3.11-rc1/mm/hugetlb.c
@@ -48,7 +48,8 @@ static unsigned long __initdata default_hstate_max_huge_pages;
static unsigned long __initdata default_hstate_size;
/*
- * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages
+ * Protects updates to hugepage_freelists, hugepage_activelist, nr_huge_pages,
+ * free_huge_pages, and surplus_huge_pages.
*/
DEFINE_SPINLOCK(hugetlb_lock);
@@ -3431,3 +3432,32 @@ int dequeue_hwpoisoned_huge_page(struct page *hpage)
return ret;
}
#endif
+
+bool isolate_huge_page(struct page *page, struct list_head *l)
+{
+ VM_BUG_ON(!PageHead(page));
+ if (!get_page_unless_zero(page))
+ return false;
+ spin_lock(&hugetlb_lock);
+ list_move_tail(&page->lru, l);
+ spin_unlock(&hugetlb_lock);
+ return true;
+}
+
+void putback_active_hugepage(struct page *page)
+{
+ VM_BUG_ON(!PageHead(page));
+ spin_lock(&hugetlb_lock);
+ list_move_tail(&page->lru, &(page_hstate(page))->hugepage_activelist);
+ spin_unlock(&hugetlb_lock);
+ put_page(page);
+}
+
+void putback_active_hugepages(struct list_head *l)
+{
+ struct page *page;
+ struct page *page2;
+
+ list_for_each_entry_safe(page, page2, l, lru)
+ putback_active_hugepage(page);
+}
diff --git v3.11-rc1.orig/mm/migrate.c v3.11-rc1/mm/migrate.c
index 6f0c244..b44a067 100644
--- v3.11-rc1.orig/mm/migrate.c
+++ v3.11-rc1/mm/migrate.c
@@ -100,6 +100,10 @@ void putback_movable_pages(struct list_head *l)
struct page *page2;
list_for_each_entry_safe(page, page2, l, lru) {
+ if (unlikely(PageHuge(page))) {
+ putback_active_hugepage(page);
+ continue;
+ }
list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
@@ -1025,7 +1029,11 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
list_for_each_entry_safe(page, page2, from, lru) {
cond_resched();
- rc = unmap_and_move(get_new_page, private,
+ if (PageHuge(page))
+ rc = unmap_and_move_huge_page(get_new_page,
+ private, page, pass > 2, mode);
+ else
+ rc = unmap_and_move(get_new_page, private,
page, pass > 2, mode);
switch(rc) {
--
1.8.3.1
This patch extends move_pages() to handle vmas with VM_HUGETLB set.
We will be able to migrate hugepages with move_pages(2) after
applying the enablement patch which comes later in this series.
We avoid getting a refcount on tail pages of a hugepage because, unlike thp,
a hugepage is not split and we need not care about races with splitting.
Migration of larger (1GB for x86_64) hugepages is not enabled.
ChangeLog v3:
- revert introducing migrate_movable_pages
- follow_page_mask(FOLL_GET) returns NULL for tail pages
- use isolate_huge_page
ChangeLog v2:
- updated description and renamed patch title
Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/memory.c | 12 ++++++++++--
mm/migrate.c | 13 +++++++++++--
2 files changed, 21 insertions(+), 4 deletions(-)
diff --git v3.11-rc1.orig/mm/memory.c v3.11-rc1/mm/memory.c
index 1ce2e2a..8c9a2cb 100644
--- v3.11-rc1.orig/mm/memory.c
+++ v3.11-rc1/mm/memory.c
@@ -1496,7 +1496,8 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
if (pud_none(*pud))
goto no_page_table;
if (pud_huge(*pud) && vma->vm_flags & VM_HUGETLB) {
- BUG_ON(flags & FOLL_GET);
+ if (flags & FOLL_GET)
+ goto out;
page = follow_huge_pud(mm, address, pud, flags & FOLL_WRITE);
goto out;
}
@@ -1507,8 +1508,15 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
if (pmd_none(*pmd))
goto no_page_table;
if (pmd_huge(*pmd) && vma->vm_flags & VM_HUGETLB) {
- BUG_ON(flags & FOLL_GET);
page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE);
+ if (flags & FOLL_GET) {
+ if (PageHead(page))
+ get_page_foll(page);
+ else {
+ page = NULL;
+ goto out;
+ }
+ }
goto out;
}
if ((flags & FOLL_NUMA) && pmd_numa(*pmd))
diff --git v3.11-rc1.orig/mm/migrate.c v3.11-rc1/mm/migrate.c
index 3ec47d3..d313737 100644
--- v3.11-rc1.orig/mm/migrate.c
+++ v3.11-rc1/mm/migrate.c
@@ -1092,7 +1092,11 @@ static struct page *new_page_node(struct page *p, unsigned long private,
*result = &pm->status;
- return alloc_pages_exact_node(pm->node,
+ if (PageHuge(p))
+ return alloc_huge_page_node(page_hstate(compound_head(p)),
+ pm->node);
+ else
+ return alloc_pages_exact_node(pm->node,
GFP_HIGHUSER_MOVABLE | GFP_THISNODE, 0);
}
@@ -1152,6 +1156,11 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
!migrate_all)
goto put_and_set;
+ if (PageHuge(page)) {
+ isolate_huge_page(page, &pagelist);
+ goto put_and_set;
+ }
+
err = isolate_lru_page(page);
if (!err) {
list_add_tail(&page->lru, &pagelist);
@@ -1174,7 +1183,7 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
err = migrate_pages(&pagelist, new_page_node,
(unsigned long)pm, MIGRATE_SYNC, MR_SYSCALL);
if (err)
- putback_lru_pages(&pagelist);
+ putback_movable_pages(&pagelist);
}
up_read(&mm->mmap_sem);
--
1.8.3.1
This patch extends check_range() to handle vma with VM_HUGETLB set.
We will be able to migrate hugepage with migrate_pages(2) after
applying the enablement patch which comes later in this series.
Note that larger hugepages (covered by pud entries, e.g. 1GB on
x86_64) are simply skipped for now.
Note that using pmd_huge/pud_huge assumes that hugepages are pointed to
by pmd/pud. This is not true in some architectures implementing hugepage
with other mechanisms like ia64, but it's OK because pmd_huge/pud_huge
simply return 0 in such arch and page walker simply ignores such hugepages.
ChangeLog v3:
- revert introducing migrate_movable_pages
- use isolate_huge_page
ChangeLog v2:
- remove unnecessary extern
- fix page table lock in check_hugetlb_pmd_range
- updated description and renamed patch title
Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/mempolicy.c | 39 ++++++++++++++++++++++++++++++++++-----
1 file changed, 34 insertions(+), 5 deletions(-)
diff --git v3.11-rc1.orig/mm/mempolicy.c v3.11-rc1/mm/mempolicy.c
index 7431001..f3b65c0 100644
--- v3.11-rc1.orig/mm/mempolicy.c
+++ v3.11-rc1/mm/mempolicy.c
@@ -512,6 +512,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
return addr != end;
}
+static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd,
+ const nodemask_t *nodes, unsigned long flags,
+ void *private)
+{
+#ifdef CONFIG_HUGETLB_PAGE
+ int nid;
+ struct page *page;
+
+ spin_lock(&vma->vm_mm->page_table_lock);
+ page = pte_page(huge_ptep_get((pte_t *)pmd));
+ nid = page_to_nid(page);
+ if (node_isset(nid, *nodes) != !!(flags & MPOL_MF_INVERT)
+ && ((flags & MPOL_MF_MOVE && page_mapcount(page) == 1)
+ || flags & MPOL_MF_MOVE_ALL))
+ isolate_huge_page(page, private);
+ spin_unlock(&vma->vm_mm->page_table_lock);
+#else
+ BUG();
+#endif
+}
+
static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
unsigned long addr, unsigned long end,
const nodemask_t *nodes, unsigned long flags,
@@ -523,6 +544,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
pmd = pmd_offset(pud, addr);
do {
next = pmd_addr_end(addr, end);
+ if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) {
+ check_hugetlb_pmd_range(vma, pmd, nodes,
+ flags, private);
+ continue;
+ }
split_huge_page_pmd(vma, addr, pmd);
if (pmd_none_or_trans_huge_or_clear_bad(pmd))
continue;
@@ -544,6 +570,8 @@ static inline int check_pud_range(struct vm_area_struct *vma, pgd_t *pgd,
pud = pud_offset(pgd, addr);
do {
next = pud_addr_end(addr, end);
+ if (pud_huge(*pud) && is_vm_hugetlb_page(vma))
+ continue;
if (pud_none_or_clear_bad(pud))
continue;
if (check_pmd_range(vma, pud, addr, next, nodes,
@@ -635,9 +663,6 @@ check_range(struct mm_struct *mm, unsigned long start, unsigned long end,
return ERR_PTR(-EFAULT);
}
- if (is_vm_hugetlb_page(vma))
- goto next;
-
if (flags & MPOL_MF_LAZY) {
change_prot_numa(vma, start, endvma);
goto next;
@@ -986,7 +1011,11 @@ static void migrate_page_add(struct page *page, struct list_head *pagelist,
static struct page *new_node_page(struct page *page, unsigned long node, int **x)
{
- return alloc_pages_exact_node(node, GFP_HIGHUSER_MOVABLE, 0);
+ if (PageHuge(page))
+ return alloc_huge_page_node(page_hstate(compound_head(page)),
+ node);
+ else
+ return alloc_pages_exact_node(node, GFP_HIGHUSER_MOVABLE, 0);
}
/*
@@ -1016,7 +1045,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
err = migrate_pages(&pagelist, new_node_page, dest,
MIGRATE_SYNC, MR_SYSCALL);
if (err)
- putback_lru_pages(&pagelist);
+ putback_movable_pages(&pagelist);
}
return err;
--
1.8.3.1
Hey Naoya,
On Fri, Jul 19, 2013 at 5:34 AM, Naoya Horiguchi
<[email protected]> wrote:
> Before enabling each user of page migration to support hugepage,
> this patch enables the list of pages for migration to link not only
> LRU pages, but also hugepages. As a result, putback_movable_pages()
> and migrate_pages() can handle both of LRU pages and hugepages.
>
> ChangeLog v3:
> - revert introducing migrate_movable_pages
> - add isolate_huge_page
>
> ChangeLog v2:
> - move code removing VM_HUGETLB from vma_migratable check into a
> separate patch
> - hold hugetlb_lock in putback_active_hugepage
> - update comment near the definition of hugetlb_lock
>
> Signed-off-by: Naoya Horiguchi <[email protected]>
> ---
> include/linux/hugetlb.h | 6 ++++++
> mm/hugetlb.c | 32 +++++++++++++++++++++++++++++++-
> mm/migrate.c | 10 +++++++++-
> 3 files changed, 46 insertions(+), 2 deletions(-)
>
> diff --git v3.11-rc1.orig/include/linux/hugetlb.h v3.11-rc1/include/linux/hugetlb.h
> index c2b1801..0b7a9e7 100644
> --- v3.11-rc1.orig/include/linux/hugetlb.h
> +++ v3.11-rc1/include/linux/hugetlb.h
> @@ -66,6 +66,9 @@ int hugetlb_reserve_pages(struct inode *inode, long from, long to,
> vm_flags_t vm_flags);
> void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed);
> int dequeue_hwpoisoned_huge_page(struct page *page);
> +bool isolate_huge_page(struct page *page, struct list_head *l);
> +void putback_active_hugepage(struct page *page);
> +void putback_active_hugepages(struct list_head *l);
> void copy_huge_page(struct page *dst, struct page *src);
>
> #ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
> @@ -134,6 +137,9 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page)
> return 0;
> }
>
> +#define isolate_huge_page(p, l) false
> +#define putback_active_hugepage(p)
Add do {} while (0), ok?
> +#define putback_active_hugepages(l)
> static inline void copy_huge_page(struct page *dst, struct page *src)
> {
> }
> diff --git v3.11-rc1.orig/mm/hugetlb.c v3.11-rc1/mm/hugetlb.c
> index 83aff0a..4c48a70 100644
> --- v3.11-rc1.orig/mm/hugetlb.c
> +++ v3.11-rc1/mm/hugetlb.c
> @@ -48,7 +48,8 @@ static unsigned long __initdata default_hstate_max_huge_pages;
> static unsigned long __initdata default_hstate_size;
>
> /*
> - * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages
> + * Protects updates to hugepage_freelists, hugepage_activelist, nr_huge_pages,
> + * free_huge_pages, and surplus_huge_pages.
> */
> DEFINE_SPINLOCK(hugetlb_lock);
>
> @@ -3431,3 +3432,32 @@ int dequeue_hwpoisoned_huge_page(struct page *hpage)
> return ret;
> }
> #endif
> +
> +bool isolate_huge_page(struct page *page, struct list_head *l)
Can we replace the page parameter with p?
> +{
> + VM_BUG_ON(!PageHead(page));
> + if (!get_page_unless_zero(page))
> + return false;
> + spin_lock(&hugetlb_lock);
> + list_move_tail(&page->lru, l);
> + spin_unlock(&hugetlb_lock);
> + return true;
> +}
> +
> +void putback_active_hugepage(struct page *page)
> +{
> + VM_BUG_ON(!PageHead(page));
> + spin_lock(&hugetlb_lock);
> + list_move_tail(&page->lru, &(page_hstate(page))->hugepage_activelist);
> + spin_unlock(&hugetlb_lock);
> + put_page(page);
> +}
> +
> +void putback_active_hugepages(struct list_head *l)
> +{
> + struct page *page;
> + struct page *page2;
> +
> + list_for_each_entry_safe(page, page2, l, lru)
> + putback_active_hugepage(page);
Can we acquire hugetlb_lock only once?
> +}
> diff --git v3.11-rc1.orig/mm/migrate.c v3.11-rc1/mm/migrate.c
> index 6f0c244..b44a067 100644
> --- v3.11-rc1.orig/mm/migrate.c
> +++ v3.11-rc1/mm/migrate.c
> @@ -100,6 +100,10 @@ void putback_movable_pages(struct list_head *l)
> struct page *page2;
>
> list_for_each_entry_safe(page, page2, l, lru) {
> + if (unlikely(PageHuge(page))) {
> + putback_active_hugepage(page);
> + continue;
> + }
> list_del(&page->lru);
> dec_zone_page_state(page, NR_ISOLATED_ANON +
> page_is_file_cache(page));
> @@ -1025,7 +1029,11 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
> list_for_each_entry_safe(page, page2, from, lru) {
> cond_resched();
>
> - rc = unmap_and_move(get_new_page, private,
> + if (PageHuge(page))
> + rc = unmap_and_move_huge_page(get_new_page,
> + private, page, pass > 2, mode);
> + else
> + rc = unmap_and_move(get_new_page, private,
> page, pass > 2, mode);
>
Is this hunk the result of an unclean merge?
On Fri, Jul 19, 2013 at 5:34 AM, Naoya Horiguchi
<[email protected]> wrote:
> This patch extends check_range() to handle vma with VM_HUGETLB set.
> We will be able to migrate hugepage with migrate_pages(2) after
> applying the enablement patch which comes later in this series.
>
> Note that for larger hugepages (covered by pud entries, 1GB for
> x86_64 for example), we simply skip it now.
>
> Note that using pmd_huge/pud_huge assumes that hugepages are pointed to
> by pmd/pud. This is not true in some architectures implementing hugepage
> with other mechanisms like ia64, but it's OK because pmd_huge/pud_huge
> simply return 0 in such arch and page walker simply ignores such hugepages.
>
> ChangeLog v3:
> - revert introducing migrate_movable_pages
> - use isolate_huge_page
>
> ChangeLog v2:
> - remove unnecessary extern
> - fix page table lock in check_hugetlb_pmd_range
> - updated description and renamed patch title
>
> Signed-off-by: Naoya Horiguchi <[email protected]>
> ---
> mm/mempolicy.c | 39 ++++++++++++++++++++++++++++++++++-----
> 1 file changed, 34 insertions(+), 5 deletions(-)
>
> diff --git v3.11-rc1.orig/mm/mempolicy.c v3.11-rc1/mm/mempolicy.c
> index 7431001..f3b65c0 100644
> --- v3.11-rc1.orig/mm/mempolicy.c
> +++ v3.11-rc1/mm/mempolicy.c
> @@ -512,6 +512,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
> return addr != end;
> }
>
> +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd,
> + const nodemask_t *nodes, unsigned long flags,
> + void *private)
> +{
> +#ifdef CONFIG_HUGETLB_PAGE
> + int nid;
> + struct page *page;
> +
> + spin_lock(&vma->vm_mm->page_table_lock);
> + page = pte_page(huge_ptep_get((pte_t *)pmd));
> + nid = page_to_nid(page);
Can you please add a brief comment for the if block?
> + if (node_isset(nid, *nodes) != !!(flags & MPOL_MF_INVERT)
> + && ((flags & MPOL_MF_MOVE && page_mapcount(page) == 1)
> + || flags & MPOL_MF_MOVE_ALL))
> + isolate_huge_page(page, private);
> + spin_unlock(&vma->vm_mm->page_table_lock);
> +#else
> + BUG();
> +#endif
> +}
> +
> static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
> unsigned long addr, unsigned long end,
> const nodemask_t *nodes, unsigned long flags,
> @@ -523,6 +544,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
> pmd = pmd_offset(pud, addr);
> do {
> next = pmd_addr_end(addr, end);
> + if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) {
> + check_hugetlb_pmd_range(vma, pmd, nodes,
> + flags, private);
> + continue;
> + }
> split_huge_page_pmd(vma, addr, pmd);
> if (pmd_none_or_trans_huge_or_clear_bad(pmd))
> continue;
> @@ -544,6 +570,8 @@ static inline int check_pud_range(struct vm_area_struct *vma, pgd_t *pgd,
> pud = pud_offset(pgd, addr);
> do {
> next = pud_addr_end(addr, end);
> + if (pud_huge(*pud) && is_vm_hugetlb_page(vma))
> + continue;
> if (pud_none_or_clear_bad(pud))
> continue;
> if (check_pmd_range(vma, pud, addr, next, nodes,
> @@ -635,9 +663,6 @@ check_range(struct mm_struct *mm, unsigned long start, unsigned long end,
> return ERR_PTR(-EFAULT);
> }
>
> - if (is_vm_hugetlb_page(vma))
> - goto next;
> -
> if (flags & MPOL_MF_LAZY) {
> change_prot_numa(vma, start, endvma);
> goto next;
> @@ -986,7 +1011,11 @@ static void migrate_page_add(struct page *page, struct list_head *pagelist,
>
> static struct page *new_node_page(struct page *page, unsigned long node, int **x)
> {
> - return alloc_pages_exact_node(node, GFP_HIGHUSER_MOVABLE, 0);
> + if (PageHuge(page))
> + return alloc_huge_page_node(page_hstate(compound_head(page)),
> + node);
> + else
> + return alloc_pages_exact_node(node, GFP_HIGHUSER_MOVABLE, 0);
> }
>
> /*
> @@ -1016,7 +1045,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
> err = migrate_pages(&pagelist, new_node_page, dest,
> MIGRATE_SYNC, MR_SYSCALL);
> if (err)
> - putback_lru_pages(&pagelist);
> + putback_movable_pages(&pagelist);
> }
>
> return err;
> --
> 1.8.3.1
>
Hello Hillf,
Thanks for your review.
On Fri, Jul 19, 2013 at 10:38:35AM +0800, Hillf Danton wrote:
> Hey Naoya,
>
> On Fri, Jul 19, 2013 at 5:34 AM, Naoya Horiguchi
> <[email protected]> wrote:
> > Before enabling each user of page migration to support hugepage,
> > this patch enables the list of pages for migration to link not only
> > LRU pages, but also hugepages. As a result, putback_movable_pages()
> > and migrate_pages() can handle both of LRU pages and hugepages.
> >
> > ChangeLog v3:
> > - revert introducing migrate_movable_pages
> > - add isolate_huge_page
> >
> > ChangeLog v2:
> > - move code removing VM_HUGETLB from vma_migratable check into a
> > separate patch
> > - hold hugetlb_lock in putback_active_hugepage
> > - update comment near the definition of hugetlb_lock
> >
> > Signed-off-by: Naoya Horiguchi <[email protected]>
> > ---
> > include/linux/hugetlb.h | 6 ++++++
> > mm/hugetlb.c | 32 +++++++++++++++++++++++++++++++-
> > mm/migrate.c | 10 +++++++++-
> > 3 files changed, 46 insertions(+), 2 deletions(-)
> >
> > diff --git v3.11-rc1.orig/include/linux/hugetlb.h v3.11-rc1/include/linux/hugetlb.h
> > index c2b1801..0b7a9e7 100644
> > --- v3.11-rc1.orig/include/linux/hugetlb.h
> > +++ v3.11-rc1/include/linux/hugetlb.h
> > @@ -66,6 +66,9 @@ int hugetlb_reserve_pages(struct inode *inode, long from, long to,
> > vm_flags_t vm_flags);
> > void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed);
> > int dequeue_hwpoisoned_huge_page(struct page *page);
> > +bool isolate_huge_page(struct page *page, struct list_head *l);
> > +void putback_active_hugepage(struct page *page);
> > +void putback_active_hugepages(struct list_head *l);
> > void copy_huge_page(struct page *dst, struct page *src);
> >
> > #ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
> > @@ -134,6 +137,9 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page)
> > return 0;
> > }
> >
> > +#define isolate_huge_page(p, l) false
> > +#define putback_active_hugepage(p)
>
> Add do{}while(o), ok?
OK. The same comment applies to patch 7/8, so I will fix it there too.
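The do {} while (0) form agreed on here can be sketched in userspace C as
follows. The macro names mirror the stubs in the patch, but putback_demo()
is a hypothetical caller that exists only for illustration. With an empty
macro body, an unbraced "if" is left with a bare ";" as its body, which
compilers may warn about; the do/while form always expands to exactly one
statement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stubs for the !CONFIG_HUGETLB_PAGE case, written as single statements. */
#define isolate_huge_page(p, l)      false
#define putback_active_hugepage(p)   do { } while (0)
#define putback_active_hugepages(l)  do { } while (0)

/*
 * Hypothetical caller: returns 1 when the else branch runs, 0 otherwise.
 * The stub is used as an unbraced if body followed by else, which is the
 * situation the do/while form is meant to keep well-formed.
 */
static int putback_demo(bool is_huge, void *page)
{
	int fell_through = 0;

	(void)page;	/* the stub does not evaluate its argument */

	if (is_huge)
		putback_active_hugepage(page);
	else
		fell_through = 1;

	return fell_through;
}
```

Calling putback_demo(true, NULL) takes the stubbed branch and returns 0,
while putback_demo(false, NULL) returns 1.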
> > +#define putback_active_hugepages(l)
> > static inline void copy_huge_page(struct page *dst, struct page *src)
> > {
> > }
> > diff --git v3.11-rc1.orig/mm/hugetlb.c v3.11-rc1/mm/hugetlb.c
> > index 83aff0a..4c48a70 100644
> > --- v3.11-rc1.orig/mm/hugetlb.c
> > +++ v3.11-rc1/mm/hugetlb.c
> > @@ -48,7 +48,8 @@ static unsigned long __initdata default_hstate_max_huge_pages;
> > static unsigned long __initdata default_hstate_size;
> >
> > /*
> > - * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages
> > + * Protects updates to hugepage_freelists, hugepage_activelist, nr_huge_pages,
> > + * free_huge_pages, and surplus_huge_pages.
> > */
> > DEFINE_SPINLOCK(hugetlb_lock);
> >
> > @@ -3431,3 +3432,32 @@ int dequeue_hwpoisoned_huge_page(struct page *hpage)
> > return ret;
> > }
> > #endif
> > +
> > +bool isolate_huge_page(struct page *page, struct list_head *l)
>
> Can we replace the page parameter with p?
Yes. Maybe it's strange to use the full name "page" for one parameter
and an extremely shortened one "l" for another one.
> > +{
> > + VM_BUG_ON(!PageHead(page));
> > + if (!get_page_unless_zero(page))
> > + return false;
> > + spin_lock(&hugetlb_lock);
> > + list_move_tail(&page->lru, l);
> > + spin_unlock(&hugetlb_lock);
> > + return true;
> > +}
> > +
> > +void putback_active_hugepage(struct page *page)
> > +{
> > + VM_BUG_ON(!PageHead(page));
> > + spin_lock(&hugetlb_lock);
> > + list_move_tail(&page->lru, &(page_hstate(page))->hugepage_activelist);
> > + spin_unlock(&hugetlb_lock);
> > + put_page(page);
> > +}
> > +
> > +void putback_active_hugepages(struct list_head *l)
> > +{
> > + struct page *page;
> > + struct page *page2;
> > +
> > + list_for_each_entry_safe(page, page2, l, lru)
> > + putback_active_hugepage(page);
>
> Can we acquire hugetlb_lock only once?
I'm not sure which is best. In general, fine-grained locking is
preferred because other lock contenders wait less.
Could you give a specific reason to hold the lock outside the loop?
> > +}
> > diff --git v3.11-rc1.orig/mm/migrate.c v3.11-rc1/mm/migrate.c
> > index 6f0c244..b44a067 100644
> > --- v3.11-rc1.orig/mm/migrate.c
> > +++ v3.11-rc1/mm/migrate.c
> > @@ -100,6 +100,10 @@ void putback_movable_pages(struct list_head *l)
> > struct page *page2;
> >
> > list_for_each_entry_safe(page, page2, l, lru) {
> > + if (unlikely(PageHuge(page))) {
> > + putback_active_hugepage(page);
> > + continue;
> > + }
> > list_del(&page->lru);
> > dec_zone_page_state(page, NR_ISOLATED_ANON +
> > page_is_file_cache(page));
> > @@ -1025,7 +1029,11 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
> > list_for_each_entry_safe(page, page2, from, lru) {
> > cond_resched();
> >
> > - rc = unmap_and_move(get_new_page, private,
> > + if (PageHuge(page))
> > + rc = unmap_and_move_huge_page(get_new_page,
> > + private, page, pass > 2, mode);
> > + else
> > + rc = unmap_and_move(get_new_page, private,
> > page, pass > 2, mode);
> >
> Is this hunk unclean merge?
Sorry, I don't get the point. This patch is based on v3.11-rc1 and
the present HEAD has no changes from that release.
Or do you mean that other trees have some conflicts? (A brief check
of -mm/-next didn't find any...)
Thanks,
Naoya Horiguchi
On Fri, Jul 19, 2013 at 5:34 AM, Naoya Horiguchi
<[email protected]> wrote:
> This patch extends move_pages() to handle vma with VM_HUGETLB set.
> We will be able to migrate hugepage with move_pages(2) after
> applying the enablement patch which comes later in this series.
>
> We avoid getting refcount on tail pages of hugepage, because unlike thp,
> hugepage is not split and we need not care about races with splitting.
>
> And migration of larger (1GB for x86_64) hugepage are not enabled.
>
> ChangeLog v3:
> - revert introducing migrate_movable_pages
> - follow_page_mask(FOLL_GET) returns NULL for tail pages
> - use isolate_huge_page
>
> ChangeLog v2:
> - updated description and renamed patch title
>
> Signed-off-by: Naoya Horiguchi <[email protected]>
> ---
> mm/memory.c | 12 ++++++++++--
> mm/migrate.c | 13 +++++++++++--
> 2 files changed, 21 insertions(+), 4 deletions(-)
>
> diff --git v3.11-rc1.orig/mm/memory.c v3.11-rc1/mm/memory.c
> index 1ce2e2a..8c9a2cb 100644
> --- v3.11-rc1.orig/mm/memory.c
> +++ v3.11-rc1/mm/memory.c
> @@ -1496,7 +1496,8 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
> if (pud_none(*pud))
> goto no_page_table;
> if (pud_huge(*pud) && vma->vm_flags & VM_HUGETLB) {
> - BUG_ON(flags & FOLL_GET);
> + if (flags & FOLL_GET)
> + goto out;
> page = follow_huge_pud(mm, address, pud, flags & FOLL_WRITE);
> goto out;
> }
> @@ -1507,8 +1508,15 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
> if (pmd_none(*pmd))
> goto no_page_table;
> if (pmd_huge(*pmd) && vma->vm_flags & VM_HUGETLB) {
> - BUG_ON(flags & FOLL_GET);
> page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE);
> + if (flags & FOLL_GET) {
> + if (PageHead(page))
> + get_page_foll(page);
> + else {
> + page = NULL;
> + goto out;
> + }
> + }
Can get_page do the work for us, like the following?
if (flags & FOLL_GET)
get_page(page);
> goto out;
> }
> if ((flags & FOLL_NUMA) && pmd_numa(*pmd))
> diff --git v3.11-rc1.orig/mm/migrate.c v3.11-rc1/mm/migrate.c
> index 3ec47d3..d313737 100644
> --- v3.11-rc1.orig/mm/migrate.c
> +++ v3.11-rc1/mm/migrate.c
> @@ -1092,7 +1092,11 @@ static struct page *new_page_node(struct page *p, unsigned long private,
>
> *result = &pm->status;
>
> - return alloc_pages_exact_node(pm->node,
> + if (PageHuge(p))
> + return alloc_huge_page_node(page_hstate(compound_head(p)),
> + pm->node);
> + else
> + return alloc_pages_exact_node(pm->node,
> GFP_HIGHUSER_MOVABLE | GFP_THISNODE, 0);
> }
>
> @@ -1152,6 +1156,11 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
> !migrate_all)
> goto put_and_set;
>
> + if (PageHuge(page)) {
> + isolate_huge_page(page, &pagelist);
> + goto put_and_set;
> + }
> +
> err = isolate_lru_page(page);
> if (!err) {
> list_add_tail(&page->lru, &pagelist);
> @@ -1174,7 +1183,7 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
> err = migrate_pages(&pagelist, new_page_node,
> (unsigned long)pm, MIGRATE_SYNC, MR_SYSCALL);
> if (err)
> - putback_lru_pages(&pagelist);
> + putback_movable_pages(&pagelist);
> }
>
> up_read(&mm->mmap_sem);
> --
> 1.8.3.1
>
On Fri, Jul 19, 2013 at 11:18 AM, Naoya Horiguchi
<[email protected]> wrote:
>> > +bool isolate_huge_page(struct page *page, struct list_head *l)
>>
>> Can we replace the page parameter with p?
>
> Yes. Maybe it's strange to use the full name "page" for one parameter
> and an extremely shortened one "l" for another one.
>
Actually, I meant that the l arg could be replaced with something else ;)
>> > +
>> > +void putback_active_hugepage(struct page *page)
>> > +{
>> > + VM_BUG_ON(!PageHead(page));
>> > + spin_lock(&hugetlb_lock);
>> > + list_move_tail(&page->lru, &(page_hstate(page))->hugepage_activelist);
>> > + spin_unlock(&hugetlb_lock);
>> > + put_page(page);
>> > +}
>> > +
>> > +void putback_active_hugepages(struct list_head *l)
>> > +{
>> > + struct page *page;
>> > + struct page *page2;
>> > +
>> > + list_for_each_entry_safe(page, page2, l, lru)
>> > + putback_active_hugepage(page);
>>
>> Can we acquire hugetlb_lock only once?
>
> I'm not sure which is the best. In general, fine-grained locking is
> preferred because other lock contenders wait less.
> Could you tell some specific reason to hold lock outside the loop?
>
Nothing special; it looks like we can splice the list after taking the
lock, so that we no longer contend on it.
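A rough userspace sketch of the list-splice idea follows. Everything in it
is a stand-in: the list helpers are a minimal reimplementation rather than
the kernel's <linux/list.h>, demo_lock() only counts acquisitions instead
of taking hugetlb_lock, and putback_all_spliced() is a hypothetical name.
The per-page put_page() that putback_active_hugepage() performs is left
out, so a real version would still need to walk the list to drop the
references.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Move every entry of src to the tail of dst and leave src empty. */
static void list_splice_tail_init(struct list_head *src, struct list_head *dst)
{
	if (list_empty(src))
		return;
	src->next->prev = dst->prev;
	dst->prev->next = src->next;
	src->prev->next = dst;
	dst->prev = src->prev;
	list_init(src);
}

static size_t list_count(const struct list_head *h)
{
	size_t n = 0;
	const struct list_head *pos;

	for (pos = h->next; pos != h; pos = pos->next)
		n++;
	return n;
}

/* Stand-in for hugetlb_lock: counts acquisitions instead of locking. */
static int lock_acquisitions;
static void demo_lock(void) { lock_acquisitions++; }
static void demo_unlock(void) { }

/* Put back all isolated entries with a single lock acquisition. */
static void putback_all_spliced(struct list_head *isolated,
				struct list_head *activelist)
{
	demo_lock();
	list_splice_tail_init(isolated, activelist);
	demo_unlock();
}
```

The trade-off raised in the reply still applies: one acquisition means
less lock traffic, but the lock is held for the whole batch rather than
once per page.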
>> > @@ -1025,7 +1029,11 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>> > list_for_each_entry_safe(page, page2, from, lru) {
>> > cond_resched();
>> >
>> > - rc = unmap_and_move(get_new_page, private,
>> > + if (PageHuge(page))
>> > + rc = unmap_and_move_huge_page(get_new_page,
>> > + private, page, pass > 2, mode);
>> > + else
>> > + rc = unmap_and_move(get_new_page, private,
>> > page, pass > 2, mode);
>> >
>> Is this hunk unclean merge?
>
> Sorry, I don't catch the point. This patch is based on v3.11-rc1 and
> the present HEAD has no changes from that release.
> Or do you mean that other trees have some conflicts? (my brief checking
> on -mm/-next didn't find that...)
>
It looks like this hunk should appear in 2/8 or later, as 1/8 focuses
on hugepage->lru?
On Fri, Jul 19, 2013 at 11:05:37AM +0800, Hillf Danton wrote:
> On Fri, Jul 19, 2013 at 5:34 AM, Naoya Horiguchi
> <[email protected]> wrote:
> > This patch extends check_range() to handle vma with VM_HUGETLB set.
> > We will be able to migrate hugepage with migrate_pages(2) after
> > applying the enablement patch which comes later in this series.
> >
> > Note that for larger hugepages (covered by pud entries, 1GB for
> > x86_64 for example), we simply skip it now.
> >
> > Note that using pmd_huge/pud_huge assumes that hugepages are pointed to
> > by pmd/pud. This is not true in some architectures implementing hugepage
> > with other mechanisms like ia64, but it's OK because pmd_huge/pud_huge
> > simply return 0 in such arch and page walker simply ignores such hugepages.
> >
> > ChangeLog v3:
> > - revert introducing migrate_movable_pages
> > - use isolate_huge_page
> >
> > ChangeLog v2:
> > - remove unnecessary extern
> > - fix page table lock in check_hugetlb_pmd_range
> > - updated description and renamed patch title
> >
> > Signed-off-by: Naoya Horiguchi <[email protected]>
> > ---
> > mm/mempolicy.c | 39 ++++++++++++++++++++++++++++++++++-----
> > 1 file changed, 34 insertions(+), 5 deletions(-)
> >
> > diff --git v3.11-rc1.orig/mm/mempolicy.c v3.11-rc1/mm/mempolicy.c
> > index 7431001..f3b65c0 100644
> > --- v3.11-rc1.orig/mm/mempolicy.c
> > +++ v3.11-rc1/mm/mempolicy.c
> > @@ -512,6 +512,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
> > return addr != end;
> > }
> >
> > +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd,
> > + const nodemask_t *nodes, unsigned long flags,
> > + void *private)
> > +{
> > +#ifdef CONFIG_HUGETLB_PAGE
> > + int nid;
> > + struct page *page;
> > +
> > + spin_lock(&vma->vm_mm->page_table_lock);
> > + page = pte_page(huge_ptep_get((pte_t *)pmd));
> > + nid = page_to_nid(page);
>
> Can you please add a brief comment for the if block?
Hmm, honestly, I just copied this complex if-condition from
check_pte_range(), open-coded migrate_page_add(), and refactored the result.
But this refactoring might hurt readability.
I will factor the duplicated logic out into a single function and
add a comment to make it more readable.
Thanks,
Naoya
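As a sketch of what such a factored-out predicate might look like, here is
a userspace C version of the condition in check_hugetlb_pmd_range(). The
function name demo_should_isolate() and the DEMO_MF_* values are
hypothetical stand-ins (the kernel defines its own MPOL_MF_* values), and
the node lookup and mapcount are passed in as plain arguments.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins; not the kernel's MPOL_MF_* definitions. */
#define DEMO_MF_MOVE      (1u << 1)
#define DEMO_MF_MOVE_ALL  (1u << 2)
#define DEMO_MF_INVERT    (1u << 4)

/*
 * Decide whether a page should be queued for migration.
 *  node_in_mask: result of node_isset(nid, *nodes) for the page's node
 *  mapcount:     number of mappings of the page
 */
static bool demo_should_isolate(bool node_in_mask, int mapcount,
				unsigned int flags)
{
	bool invert = (flags & DEMO_MF_INVERT) != 0;

	/* Skip pages whose node membership does not match the request. */
	if (node_in_mask == invert)
		return false;

	/* MOVE_ALL migrates the page regardless of how many users map it. */
	if (flags & DEMO_MF_MOVE_ALL)
		return true;

	/* MOVE only migrates pages mapped by a single user. */
	return (flags & DEMO_MF_MOVE) && mapcount == 1;
}
```

For example, a page on a selected node with two mappings is skipped under
DEMO_MF_MOVE but queued under DEMO_MF_MOVE_ALL.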
> > + if (node_isset(nid, *nodes) != !!(flags & MPOL_MF_INVERT)
> > + && ((flags & MPOL_MF_MOVE && page_mapcount(page) == 1)
> > + || flags & MPOL_MF_MOVE_ALL))
> > + isolate_huge_page(page, private);
> > + spin_unlock(&vma->vm_mm->page_table_lock);
> > +#else
> > + BUG();
> > +#endif
> > +}
> > +
> > static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
> > unsigned long addr, unsigned long end,
> > const nodemask_t *nodes, unsigned long flags,
On Fri, Jul 19, 2013 at 11:36:19AM +0800, Hillf Danton wrote:
> On Fri, Jul 19, 2013 at 5:34 AM, Naoya Horiguchi
> <[email protected]> wrote:
> > This patch extends move_pages() to handle vma with VM_HUGETLB set.
> > We will be able to migrate hugepage with move_pages(2) after
> > applying the enablement patch which comes later in this series.
> >
> > We avoid getting refcount on tail pages of hugepage, because unlike thp,
> > hugepage is not split and we need not care about races with splitting.
> >
> > And migration of larger (1GB for x86_64) hugepage are not enabled.
> >
> > ChangeLog v3:
> > - revert introducing migrate_movable_pages
> > - follow_page_mask(FOLL_GET) returns NULL for tail pages
> > - use isolate_huge_page
> >
> > ChangeLog v2:
> > - updated description and renamed patch title
> >
> > Signed-off-by: Naoya Horiguchi <[email protected]>
> > ---
> > mm/memory.c | 12 ++++++++++--
> > mm/migrate.c | 13 +++++++++++--
> > 2 files changed, 21 insertions(+), 4 deletions(-)
> >
> > diff --git v3.11-rc1.orig/mm/memory.c v3.11-rc1/mm/memory.c
> > index 1ce2e2a..8c9a2cb 100644
> > --- v3.11-rc1.orig/mm/memory.c
> > +++ v3.11-rc1/mm/memory.c
> > @@ -1496,7 +1496,8 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
> > if (pud_none(*pud))
> > goto no_page_table;
> > if (pud_huge(*pud) && vma->vm_flags & VM_HUGETLB) {
> > - BUG_ON(flags & FOLL_GET);
> > + if (flags & FOLL_GET)
> > + goto out;
> > page = follow_huge_pud(mm, address, pud, flags & FOLL_WRITE);
> > goto out;
> > }
> > @@ -1507,8 +1508,15 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
> > if (pmd_none(*pmd))
> > goto no_page_table;
> > if (pmd_huge(*pmd) && vma->vm_flags & VM_HUGETLB) {
> > - BUG_ON(flags & FOLL_GET);
> > page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE);
> > + if (flags & FOLL_GET) {
> > + if (PageHead(page))
> > + get_page_foll(page);
> > + else {
> > + page = NULL;
> > + goto out;
> > + }
> > + }
>
> Can get_page do the work for us, like the following?
>
> if (flags & FOLL_GET)
> get_page(page);
Ohh, OK. We should use get_page instead of get_page_foll, because
get_page_foll is for thp.
However, I think that the if (PageHead) check is necessary, because
otherwise we take refcounts on tail pages and release them immediately
on the caller's side, which is fragile (this was discussed previously).
http://thread.gmane.org/gmane.linux.kernel.mm/96665/focus=96818
Anyway, I'll add a comment on this hunk in the next post.
Thanks,
Naoya
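The head-only reference rule described above can be modelled in userspace
C roughly as follows. struct demo_page and demo_follow_huge() are
hypothetical and only mimic the shape of the follow_page_mask() hunk: with
the "want a reference" flag set, a head page gets its count raised and a
tail page yields NULL without any count being touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a page: just enough state to show the refcount rule. */
struct demo_page {
	bool is_head;
	int refcount;
};

/*
 * Model of the FOLL_GET handling for a pmd-mapped hugepage.
 * want_ref plays the role of (flags & FOLL_GET).
 */
static struct demo_page *demo_follow_huge(struct demo_page *page,
					  bool want_ref)
{
	if (!want_ref)
		return page;

	/* Never pin a tail page: report "no page" to the caller instead. */
	if (!page->is_head)
		return NULL;

	page->refcount++;
	return page;
}
```

Returning NULL for tail pages means the caller never has to undo a
reference it did not want in the first place.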
> > goto out;
> > }
> > if ((flags & FOLL_NUMA) && pmd_numa(*pmd))
On Fri, Jul 19, 2013 at 12:04:56PM +0800, Hillf Danton wrote:
...
> >> > +
> >> > +void putback_active_hugepage(struct page *page)
> >> > +{
> >> > + VM_BUG_ON(!PageHead(page));
> >> > + spin_lock(&hugetlb_lock);
> >> > + list_move_tail(&page->lru, &(page_hstate(page))->hugepage_activelist);
> >> > + spin_unlock(&hugetlb_lock);
> >> > + put_page(page);
> >> > +}
> >> > +
> >> > +void putback_active_hugepages(struct list_head *l)
> >> > +{
> >> > + struct page *page;
> >> > + struct page *page2;
> >> > +
> >> > + list_for_each_entry_safe(page, page2, l, lru)
> >> > + putback_active_hugepage(page);
> >>
> >> Can we acquire hugetlb_lock only once?
> >
> > I'm not sure which is the best. In general, fine-grained locking is
> > preferred because other lock contenders wait less.
> > Could you tell some specific reason to hold lock outside the loop?
> >
> No anything special, looks we can do list splice after taking lock,
> then we no longer contend it.
>
> >> > @@ -1025,7 +1029,11 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
> >> > list_for_each_entry_safe(page, page2, from, lru) {
> >> > cond_resched();
> >> >
> >> > - rc = unmap_and_move(get_new_page, private,
> >> > + if (PageHuge(page))
> >> > + rc = unmap_and_move_huge_page(get_new_page,
> >> > + private, page, pass > 2, mode);
> >> > + else
> >> > + rc = unmap_and_move(get_new_page, private,
> >> > page, pass > 2, mode);
> >> >
> >> Is this hunk unclean merge?
> >
> > Sorry, I don't catch the point. This patch is based on v3.11-rc1 and
> > the present HEAD has no changes from that release.
> > Or do you mean that other trees have some conflicts? (my brief checking
> > on -mm/-next didn't find that...)
> >
> Looks this hunk should appear in 2/8 or later, as 1/8 is focusing
> on hugepage->lru?
I intended 1/8 to prepare the common code used by all users of hugepage
migration. If I put this hunk in a patch which implements one of the
users, the other patches implementing other users would depend on it,
which looks to me like an odd dependency...
Naoya
On Fri, Jul 19, 2013 at 5:34 AM, Naoya Horiguchi
<[email protected]> wrote:
> This patch enables hugepage migration from migrate_pages(2),
> move_pages(2), and mbind(2).
>
> Signed-off-by: Naoya Horiguchi <[email protected]>
> ---
Acked-by: Hillf Danton <[email protected]>
> include/linux/mempolicy.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git v3.11-rc1.orig/include/linux/mempolicy.h v3.11-rc1/include/linux/mempolicy.h
> index 0d7df39..2e475b5 100644
> --- v3.11-rc1.orig/include/linux/mempolicy.h
> +++ v3.11-rc1/include/linux/mempolicy.h
> @@ -173,7 +173,7 @@ extern int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol);
> /* Check if a vma is migratable */
> static inline int vma_migratable(struct vm_area_struct *vma)
> {
> - if (vma->vm_flags & (VM_IO | VM_HUGETLB | VM_PFNMAP))
> + if (vma->vm_flags & (VM_IO | VM_PFNMAP))
> return 0;
> /*
> * Migration allocates pages in the highest zone. If we cannot
> --
> 1.8.3.1
>
On Fri, Jul 19, 2013 at 5:34 AM, Naoya Horiguchi
<[email protected]> wrote:
> @@ -518,9 +519,11 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid)
> {
> struct page *page;
>
> - if (list_empty(&h->hugepage_freelists[nid]))
> + list_for_each_entry(page, &h->hugepage_freelists[nid], lru)
> + if (!is_migrate_isolate_page(page))
> + break;
> + if (&h->hugepage_freelists[nid] == &page->lru)
For what is this check?
> return NULL;
> - page = list_entry(h->hugepage_freelists[nid].next, struct page, lru);
> list_move(&page->lru, &h->hugepage_activelist);
> set_page_refcounted(page);
> h->free_huge_pages--;
On Fri, Jul 19, 2013 at 01:40:38PM +0800, Hillf Danton wrote:
> On Fri, Jul 19, 2013 at 5:34 AM, Naoya Horiguchi
> <[email protected]> wrote:
> > @@ -518,9 +519,11 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid)
> > {
> > struct page *page;
> >
> > - if (list_empty(&h->hugepage_freelists[nid]))
> > + list_for_each_entry(page, &h->hugepage_freelists[nid], lru)
> > + if (!is_migrate_isolate_page(page))
> > + break;
> > + if (&h->hugepage_freelists[nid] == &page->lru)
>
> For what is this check?
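This check is true when no non-isolated free hugepage was found.
In the "not found" case the loop runs to the end of the list, so &page->lru
is the list head h->hugepage_freelists[nid] itself and page does not refer
to a valid page. Without this check the code that follows would not work
correctly.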
Thanks,
Naoya
> > return NULL;
> > - page = list_entry(h->hugepage_freelists[nid].next, struct page, lru);
> > list_move(&page->lru, &h->hugepage_activelist);
> > set_page_refcounted(page);
> > h->free_huge_pages--;
>
On Thu, Jul 18, 2013 at 05:34:24PM -0400, Naoya Horiguchi wrote:
> Here is the 3rd version of hugepage migration patchset.
> I rebased it onto v3.11-rc1 and applied most of your feedbacks.
>
> Some works referred to in previous discussion (shown below) are not included
> in this patchset, but likely to be done after this work.
> - using page walker in check_range
> - split page table lock for pmd/pud based hugepage (maybe applicable to thp)
I did a quick read through the patchkit and it looks all good to me.
It also closes a long standing gap. Thanks!
Acked-by: Andi Kleen <[email protected]>
> Hugepage migration of 1GB hugepage is not enabled for now, because
> I'm not sure whether users of 1GB hugepage really want it.
> We need to spare free hugepage in order to do migration, but I don't
> think that users want to 1GB memory to idle for that purpose
> (currently we can't expand/shrink 1GB hugepage pool after boot).
I think we'll need 1GB migration sooner or later. As memory sizes
go up 1GB use will be more common, and the limitation of not
expanding/shrinking 1GB will be eventually fixed.
It would be just a straight forward extension of your patchkit,
right?
-Andi
On Fri, Jul 19, 2013 at 05:33:32PM +0200, Andi Kleen wrote:
> On Thu, Jul 18, 2013 at 05:34:24PM -0400, Naoya Horiguchi wrote:
> > Here is the 3rd version of hugepage migration patchset.
> > I rebased it onto v3.11-rc1 and applied most of your feedbacks.
> >
> > Some works referred to in previous discussion (shown below) are not included
> > in this patchset, but likely to be done after this work.
> > - using page walker in check_range
> > - split page table lock for pmd/pud based hugepage (maybe applicable to thp)
>
> I did a quick read through the patchkit and it looks all good to me.
> It also closes a long standing gap. Thanks!
>
> Acked-by: Andi Kleen <[email protected]>
Thank you.
Can I add your Ack on the whole series?
> > Hugepage migration of 1GB hugepage is not enabled for now, because
> > I'm not sure whether users of 1GB hugepage really want it.
> > We need to spare free hugepage in order to do migration, but I don't
> > think that users want to 1GB memory to idle for that purpose
> > (currently we can't expand/shrink 1GB hugepage pool after boot).
>
> I think we'll need 1GB migration sooner or later. As memory sizes
> go up 1GB use will be more common, and the limitation of not
> expanding/shrinking 1GB will be eventually fixed.
>
> It would be just a straight forward extension of your patchkit,
> right?
Right, I'm preparing for 1GB hugepage migration, but it still
lacks testing.
Thanks,
Naoya
> > I did a quick read through the patchkit and it looks all good to me.
> > It also closes a long standing gap. Thanks!
> >
> > Acked-by: Andi Kleen <[email protected]>
>
> Thank you.
> Can I add your Ack on the whole series?
Yes.
-Andi
On Fri, Jul 19, 2013 at 10:39 PM, Naoya Horiguchi
<[email protected]> wrote:
> On Fri, Jul 19, 2013 at 01:40:38PM +0800, Hillf Danton wrote:
>> On Fri, Jul 19, 2013 at 5:34 AM, Naoya Horiguchi
>> <[email protected]> wrote:
>> > @@ -518,9 +519,11 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid)
>> > {
>> > struct page *page;
>> >
>> > - if (list_empty(&h->hugepage_freelists[nid]))
>> > + list_for_each_entry(page, &h->hugepage_freelists[nid], lru)
>> > + if (!is_migrate_isolate_page(page))
>> > + break;
>> > + if (&h->hugepage_freelists[nid] == &page->lru)
>>
>> For what is this check?
>
> This check returns true unless a non-isolated free hugepage is found.
> In "not found" case page points to h->hugepage_freelists, so without
> this check successive code doesn't work fine.
>
Thanks for your explanation. It looks like another local variable,
struct page *found, would make this easier for the reader.
Good weekend
Hillf
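
The `found` variant suggested above can be sketched in userspace roughly as
follows. This is only an illustration, not the code from the patch:
`struct fake_page`, its `isolated` flag (standing in for
is_migrate_isolate_page()), `list_init()`, and `first_usable_page()` are
hypothetical names, and the list helpers are simplified stand-ins for the
kernel's list.h.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical stand-in for struct page. */
struct fake_page {
	bool isolated;		/* models is_migrate_isolate_page(page) */
	struct list_head lru;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/*
 * Return the first non-isolated page on the free list, or NULL.
 * Using a separate 'found' pointer avoids comparing &page->lru
 * against the list head after the loop.
 */
static struct fake_page *first_usable_page(struct list_head *freelist)
{
	struct fake_page *found = NULL;
	struct list_head *pos;

	for (pos = freelist->next; pos != freelist; pos = pos->next) {
		struct fake_page *page = container_of(pos, struct fake_page, lru);

		if (!page->isolated) {
			found = page;
			break;
		}
	}
	return found;
}
```

With this shape the "not found" case is an explicit NULL rather than a
cursor that aliases the list head, at the cost of one extra local variable.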
On Wed, Jul 24, 2013 at 02:10:07PM +0800, Wanpeng Li wrote:
...
> >diff --git v3.11-rc1.orig/mm/page_isolation.c v3.11-rc1/mm/page_isolation.c
> >index 383bdbb..cf48ef6 100644
> >--- v3.11-rc1.orig/mm/page_isolation.c
> >+++ v3.11-rc1/mm/page_isolation.c
> >@@ -6,6 +6,7 @@
> > #include <linux/page-isolation.h>
> > #include <linux/pageblock-flags.h>
> > #include <linux/memory.h>
> >+#include <linux/hugetlb.h>
> > #include "internal.h"
> >
> > int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
> >@@ -252,6 +253,10 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private,
> > {
> > gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE;
> >
> >+ if (PageHuge(page))
> >+ return alloc_huge_page_node(page_hstate(compound_head(page)),
> >+ numa_node_id());
> >+
>
> Why specify current node? Maybe current node is under remove.
Yes. One difficulty is that this function doesn't have a vma, so we can't
rely on mempolicy for node choice. I think that simply choosing the next
node by incrementing the node id can be a workaround, though it's not the
best solution.
Thanks,
Naoya Horiguchi
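
The "next node" workaround mentioned above could look roughly like the
userspace sketch below. It is an assumption-laden illustration, not kernel
code: `next_allowed_node()`, `MAX_NODES`, and the 64-bit `allowed` mask
(one bit per usable node) are hypothetical, and the caller is assumed to
pass a node id in the range 0..MAX_NODES-1.

```c
#include <stdint.h>

#define MAX_NODES 64	/* hypothetical upper bound on node ids */

/*
 * Starting from the node after 'nid', return the first node whose bit
 * is set in 'allowed', wrapping around at MAX_NODES. 'nid' itself is
 * never returned. Returns -1 when no other node is allowed.
 */
static int next_allowed_node(int nid, uint64_t allowed)
{
	for (int step = 1; step < MAX_NODES; step++) {
		int candidate = (nid + step) % MAX_NODES;

		if (allowed & (UINT64_C(1) << candidate))
			return candidate;
	}
	return -1;
}
```

Skipping `nid` itself matters for the hot-remove case raised in the review:
the node being removed is never chosen as the migration target, and the
caller has to handle the -1 case where no other node is available.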