From: Joonsoo Kim <[email protected]>
This patchset cleans up the migration target allocation functions.
* Changes on v4
- use full gfp_mask
- use memalloc_nocma_{save,restore} to exclude CMA memory
- separate __GFP_RECLAIM handling for THP allocation
- remove more wrapper functions
* Changes on v3
- As Vlastimil suggested, do not introduce alloc_control for hugetlb functions
- do not change the signature of migrate_pages()
- rename alloc_control to migration_target_control
* Changes on v2
- add acked-by tags
- fix missing compound_head() call for patch #3
- remove thisnode field on alloc_control and use __GFP_THISNODE directly
- fix missing __gfp_mask setup for patch
"mm/hugetlb: do not modify user provided gfp_mask"
* Cover-letter
Contributions of this patchset are:
1. unify the two hugetlb alloc functions. As a result, only one remains.
2. make the remaining external hugetlb alloc function internal.
3. unify three functions for migration target allocation.
The patchset is based on next-20200703 + revert v3 of this patchset.
git revert ddc017c727e429488cccd401a7794c8152e50a5b~1..583c2617fd3244fff79ba3b445964884c5cd7780
The patchset is available on:
https://github.com/JoonsooKim/linux/tree/cleanup-migration-target-allocation-v4.00-next-20200703
Thanks.
Joonsoo Kim (11):
mm/page_isolation: prefer the node of the source page
mm/migrate: move migration helper from .h to .c
mm/hugetlb: unify migration callbacks
mm/hugetlb: make hugetlb migration callback CMA aware
mm/migrate: clear __GFP_RECLAIM for THP allocation for migration
mm/migrate: make a standard migration target allocation function
mm/gup: use a standard migration target allocation callback
mm/mempolicy: use a standard migration target allocation callback
mm/page_alloc: remove a wrapper for alloc_migration_target()
mm/memory-failure: remove a wrapper for alloc_migration_target()
mm/memory_hotplug: remove a wrapper for alloc_migration_target()
include/linux/hugetlb.h | 28 ++++++++++++-------
include/linux/migrate.h | 34 +++++------------------
mm/gup.c | 60 +++++------------------------------------
mm/hugetlb.c | 71 +++++++++++++++++++------------------------------
mm/internal.h | 9 ++++++-
mm/memory-failure.c | 15 +++++------
mm/memory_hotplug.c | 42 +++++++++++++++--------------
mm/mempolicy.c | 29 +++++---------------
mm/migrate.c | 59 ++++++++++++++++++++++++++++++++++++++--
mm/page_alloc.c | 8 ++++--
mm/page_isolation.c | 5 ----
11 files changed, 163 insertions(+), 197 deletions(-)
--
2.7.4
From: Joonsoo Kim <[email protected]>
For locality, it's better to migrate the page to the same node as the
source page rather than to the node of the current caller's CPU.
Acked-by: Roman Gushchin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/page_isolation.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index f6d07c5..aec26d9 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -309,5 +309,7 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
struct page *alloc_migrate_target(struct page *page, unsigned long private)
{
- return new_page_nodemask(page, numa_node_id(), &node_states[N_MEMORY]);
+ int nid = page_to_nid(page);
+
+ return new_page_nodemask(page, nid, &node_states[N_MEMORY]);
}
--
2.7.4
From: Joonsoo Kim <[email protected]>
new_page_nodemask() is not a performance-sensitive function, so move it
to .c. This is a preparation step for a future change.
Acked-by: Mike Kravetz <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
---
include/linux/migrate.h | 33 +++++----------------------------
mm/migrate.c | 29 +++++++++++++++++++++++++++++
2 files changed, 34 insertions(+), 28 deletions(-)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 3e546cb..1d70b4a 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -31,34 +31,6 @@ enum migrate_reason {
/* In mm/debug.c; also keep sync with include/trace/events/migrate.h */
extern const char *migrate_reason_names[MR_TYPES];
-static inline struct page *new_page_nodemask(struct page *page,
- int preferred_nid, nodemask_t *nodemask)
-{
- gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL;
- unsigned int order = 0;
- struct page *new_page = NULL;
-
- if (PageHuge(page))
- return alloc_huge_page_nodemask(page_hstate(compound_head(page)),
- preferred_nid, nodemask);
-
- if (PageTransHuge(page)) {
- gfp_mask |= GFP_TRANSHUGE;
- order = HPAGE_PMD_ORDER;
- }
-
- if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE))
- gfp_mask |= __GFP_HIGHMEM;
-
- new_page = __alloc_pages_nodemask(gfp_mask, order,
- preferred_nid, nodemask);
-
- if (new_page && PageTransHuge(new_page))
- prep_transhuge_page(new_page);
-
- return new_page;
-}
-
#ifdef CONFIG_MIGRATION
extern void putback_movable_pages(struct list_head *l);
@@ -67,6 +39,8 @@ extern int migrate_page(struct address_space *mapping,
enum migrate_mode mode);
extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
unsigned long private, enum migrate_mode mode, int reason);
+extern struct page *new_page_nodemask(struct page *page,
+ int preferred_nid, nodemask_t *nodemask);
extern int isolate_movable_page(struct page *page, isolate_mode_t mode);
extern void putback_movable_page(struct page *page);
@@ -85,6 +59,9 @@ static inline int migrate_pages(struct list_head *l, new_page_t new,
free_page_t free, unsigned long private, enum migrate_mode mode,
int reason)
{ return -ENOSYS; }
+static inline struct page *new_page_nodemask(struct page *page,
+ int preferred_nid, nodemask_t *nodemask)
+ { return NULL; }
static inline int isolate_movable_page(struct page *page, isolate_mode_t mode)
{ return -EBUSY; }
diff --git a/mm/migrate.c b/mm/migrate.c
index d105b67..7370a66 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1531,6 +1531,35 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
return rc;
}
+struct page *new_page_nodemask(struct page *page,
+ int preferred_nid, nodemask_t *nodemask)
+{
+ gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL;
+ unsigned int order = 0;
+ struct page *new_page = NULL;
+
+ if (PageHuge(page))
+ return alloc_huge_page_nodemask(
+ page_hstate(compound_head(page)),
+ preferred_nid, nodemask);
+
+ if (PageTransHuge(page)) {
+ gfp_mask |= GFP_TRANSHUGE;
+ order = HPAGE_PMD_ORDER;
+ }
+
+ if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE))
+ gfp_mask |= __GFP_HIGHMEM;
+
+ new_page = __alloc_pages_nodemask(gfp_mask, order,
+ preferred_nid, nodemask);
+
+ if (new_page && PageTransHuge(new_page))
+ prep_transhuge_page(new_page);
+
+ return new_page;
+}
+
#ifdef CONFIG_NUMA
static int store_status(int __user *status, int start, int value, int nr)
--
2.7.4
From: Joonsoo Kim <[email protected]>
There is no real difference between the two migration callback functions,
alloc_huge_page_node() and alloc_huge_page_nodemask(), except for the
__GFP_THISNODE handling. It's redundant to keep two almost identical
functions just to handle this flag, so this patch removes one of them by
introducing a new argument, gfp_mask, to alloc_huge_page_nodemask().
With the new gfp_mask argument, it is the caller's job to provide the
correct gfp_mask. Therefore, every call site of alloc_huge_page_nodemask()
is changed to pass a gfp_mask.
Note that it's safe to remove the node id check in alloc_huge_page_node()
since no caller passes NUMA_NO_NODE as the node id.
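For illustration only, a former alloc_huge_page_node() user now looks
roughly like this (a caller-side sketch; the real conversions are in the
hunks below):

	struct hstate *h = page_hstate(compound_head(page));
	/* Callers that relied on the old node id check now OR the flag in. */
	gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;

	page = alloc_huge_page_nodemask(h, nid, NULL, gfp_mask);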
Reviewed-by: Mike Kravetz <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
---
include/linux/hugetlb.h | 26 ++++++++++++++++++--------
mm/hugetlb.c | 35 ++---------------------------------
mm/mempolicy.c | 10 ++++++----
mm/migrate.c | 11 +++++++----
4 files changed, 33 insertions(+), 49 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 50650d0..bb93e95 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -10,6 +10,7 @@
#include <linux/list.h>
#include <linux/kref.h>
#include <linux/pgtable.h>
+#include <linux/gfp.h>
struct ctl_table;
struct user_struct;
@@ -504,9 +505,8 @@ struct huge_bootmem_page {
struct page *alloc_huge_page(struct vm_area_struct *vma,
unsigned long addr, int avoid_reserve);
-struct page *alloc_huge_page_node(struct hstate *h, int nid);
struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
- nodemask_t *nmask);
+ nodemask_t *nmask, gfp_t gfp_mask);
struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
unsigned long address);
struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
@@ -692,6 +692,15 @@ static inline bool hugepage_movable_supported(struct hstate *h)
return true;
}
+/* Movability of hugepages depends on migration support. */
+static inline gfp_t htlb_alloc_mask(struct hstate *h)
+{
+ if (hugepage_movable_supported(h))
+ return GFP_HIGHUSER_MOVABLE;
+ else
+ return GFP_HIGHUSER;
+}
+
static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
struct mm_struct *mm, pte_t *pte)
{
@@ -759,13 +768,9 @@ static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
return NULL;
}
-static inline struct page *alloc_huge_page_node(struct hstate *h, int nid)
-{
- return NULL;
-}
-
static inline struct page *
-alloc_huge_page_nodemask(struct hstate *h, int preferred_nid, nodemask_t *nmask)
+alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
+ nodemask_t *nmask, gfp_t gfp_mask)
{
return NULL;
}
@@ -878,6 +883,11 @@ static inline bool hugepage_movable_supported(struct hstate *h)
return false;
}
+static inline gfp_t htlb_alloc_mask(struct hstate *h)
+{
+ return 0;
+}
+
static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
struct mm_struct *mm, pte_t *pte)
{
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7e5ba5c0..3245aa0 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1089,15 +1089,6 @@ static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t gfp_mask,
return NULL;
}
-/* Movability of hugepages depends on migration support. */
-static inline gfp_t htlb_alloc_mask(struct hstate *h)
-{
- if (hugepage_movable_supported(h))
- return GFP_HIGHUSER_MOVABLE;
- else
- return GFP_HIGHUSER;
-}
-
static struct page *dequeue_huge_page_vma(struct hstate *h,
struct vm_area_struct *vma,
unsigned long address, int avoid_reserve,
@@ -1979,31 +1970,9 @@ struct page *alloc_buddy_huge_page_with_mpol(struct hstate *h,
}
/* page migration callback function */
-struct page *alloc_huge_page_node(struct hstate *h, int nid)
-{
- gfp_t gfp_mask = htlb_alloc_mask(h);
- struct page *page = NULL;
-
- if (nid != NUMA_NO_NODE)
- gfp_mask |= __GFP_THISNODE;
-
- spin_lock(&hugetlb_lock);
- if (h->free_huge_pages - h->resv_huge_pages > 0)
- page = dequeue_huge_page_nodemask(h, gfp_mask, nid, NULL);
- spin_unlock(&hugetlb_lock);
-
- if (!page)
- page = alloc_migrate_huge_page(h, gfp_mask, nid, NULL);
-
- return page;
-}
-
-/* page migration callback function */
struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
- nodemask_t *nmask)
+ nodemask_t *nmask, gfp_t gfp_mask)
{
- gfp_t gfp_mask = htlb_alloc_mask(h);
-
spin_lock(&hugetlb_lock);
if (h->free_huge_pages - h->resv_huge_pages > 0) {
struct page *page;
@@ -2031,7 +2000,7 @@ struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
gfp_mask = htlb_alloc_mask(h);
node = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
- page = alloc_huge_page_nodemask(h, node, nodemask);
+ page = alloc_huge_page_nodemask(h, node, nodemask, gfp_mask);
mpol_cond_put(mpol);
return page;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index dabcee8..9034a53 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1068,10 +1068,12 @@ static int migrate_page_add(struct page *page, struct list_head *pagelist,
/* page allocation callback for NUMA node migration */
struct page *alloc_new_node_page(struct page *page, unsigned long node)
{
- if (PageHuge(page))
- return alloc_huge_page_node(page_hstate(compound_head(page)),
- node);
- else if (PageTransHuge(page)) {
+ if (PageHuge(page)) {
+ struct hstate *h = page_hstate(compound_head(page));
+ gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
+
+ return alloc_huge_page_nodemask(h, node, NULL, gfp_mask);
+ } else if (PageTransHuge(page)) {
struct page *thp;
thp = alloc_pages_node(node,
diff --git a/mm/migrate.c b/mm/migrate.c
index 7370a66..3b3d918 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1538,10 +1538,13 @@ struct page *new_page_nodemask(struct page *page,
unsigned int order = 0;
struct page *new_page = NULL;
- if (PageHuge(page))
- return alloc_huge_page_nodemask(
- page_hstate(compound_head(page)),
- preferred_nid, nodemask);
+ if (PageHuge(page)) {
+ struct hstate *h = page_hstate(compound_head(page));
+
+ gfp_mask = htlb_alloc_mask(h);
+ return alloc_huge_page_nodemask(h, preferred_nid,
+ nodemask, gfp_mask);
+ }
if (PageTransHuge(page)) {
gfp_mask |= GFP_TRANSHUGE;
--
2.7.4
From: Joonsoo Kim <[email protected]>
new_non_cma_page() in gup.c, which tries to allocate a migration target
page, requires that the new page not be on the CMA area.
new_non_cma_page() implements this by removing the __GFP_MOVABLE flag.
This works well for a THP or a normal page, but not for a hugetlb page.
hugetlb page allocation consists of two steps. The first is dequeuing
from the pool. The second, if there is no available page in the pool, is
allocating from the page allocator.
new_non_cma_page() can control allocation from the page allocator by
specifying the correct gfp flags. However, dequeuing could not be
controlled until now, so new_non_cma_page() skips dequeuing completely.
This is suboptimal since new_non_cma_page() cannot utilize the hugetlb
pages in the pool, so this patch tries to fix this situation.
This patch makes the hugetlb dequeue function CMA aware so that it skips
CMA pages if the newly added skip_cma argument is passed as true.
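Condensed, the resulting two-step flow looks roughly like this (a sketch of
the hunks below, not a literal copy):

	struct page *page = NULL;
	unsigned int flags = 0;

	if (skip_cma)
		flags = memalloc_nocma_save();

	/* Step 1: try to dequeue a pre-allocated hugepage, skipping CMA pages. */
	spin_lock(&hugetlb_lock);
	if (h->free_huge_pages - h->resv_huge_pages > 0)
		page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nmask, skip_cma);
	spin_unlock(&hugetlb_lock);

	/* Step 2: fall back to allocating a fresh page from the page allocator. */
	if (!page)
		page = alloc_migrate_huge_page(h, gfp_mask, nid, nmask);

	if (skip_cma)
		memalloc_nocma_restore(flags);

	return page;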
Acked-by: Mike Kravetz <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
---
include/linux/hugetlb.h | 6 ++----
mm/gup.c | 3 ++-
mm/hugetlb.c | 46 ++++++++++++++++++++++++++++++----------------
mm/mempolicy.c | 2 +-
mm/migrate.c | 2 +-
5 files changed, 36 insertions(+), 23 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index bb93e95..5a9ddf1 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -506,11 +506,9 @@ struct huge_bootmem_page {
struct page *alloc_huge_page(struct vm_area_struct *vma,
unsigned long addr, int avoid_reserve);
struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
- nodemask_t *nmask, gfp_t gfp_mask);
+ nodemask_t *nmask, gfp_t gfp_mask, bool skip_cma);
struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
unsigned long address);
-struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
- int nid, nodemask_t *nmask);
int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
pgoff_t idx);
@@ -770,7 +768,7 @@ static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
static inline struct page *
alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
- nodemask_t *nmask, gfp_t gfp_mask)
+ nodemask_t *nmask, gfp_t gfp_mask, bool skip_cma)
{
return NULL;
}
diff --git a/mm/gup.c b/mm/gup.c
index 5daadae..2c3dab4 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1630,11 +1630,12 @@ static struct page *new_non_cma_page(struct page *page, unsigned long private)
#ifdef CONFIG_HUGETLB_PAGE
if (PageHuge(page)) {
struct hstate *h = page_hstate(page);
+
/*
* We don't want to dequeue from the pool because pool pages will
* mostly be from the CMA region.
*/
- return alloc_migrate_huge_page(h, gfp_mask, nid, NULL);
+ return alloc_huge_page_nodemask(h, nid, NULL, gfp_mask, true);
}
#endif
if (PageTransHuge(page)) {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3245aa0..bcf4abe 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -29,6 +29,7 @@
#include <linux/numa.h>
#include <linux/llist.h>
#include <linux/cma.h>
+#include <linux/sched/mm.h>
#include <asm/page.h>
#include <asm/tlb.h>
@@ -1033,13 +1034,18 @@ static void enqueue_huge_page(struct hstate *h, struct page *page)
h->free_huge_pages_node[nid]++;
}
-static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
+static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid, bool skip_cma)
{
struct page *page;
- list_for_each_entry(page, &h->hugepage_freelists[nid], lru)
+ list_for_each_entry(page, &h->hugepage_freelists[nid], lru) {
+ if (skip_cma && is_migrate_cma_page(page))
+ continue;
+
if (!PageHWPoison(page))
break;
+ }
+
/*
* if 'non-isolated free hugepage' not found on the list,
* the allocation fails.
@@ -1054,7 +1060,7 @@ static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
}
static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t gfp_mask, int nid,
- nodemask_t *nmask)
+ nodemask_t *nmask, bool skip_cma)
{
unsigned int cpuset_mems_cookie;
struct zonelist *zonelist;
@@ -1079,7 +1085,7 @@ static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t gfp_mask,
continue;
node = zone_to_nid(zone);
- page = dequeue_huge_page_node_exact(h, node);
+ page = dequeue_huge_page_node_exact(h, node, skip_cma);
if (page)
return page;
}
@@ -1115,7 +1121,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
gfp_mask = htlb_alloc_mask(h);
nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
- page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nodemask);
+ page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nodemask, false);
if (page && !avoid_reserve && vma_has_reserves(vma, chg)) {
SetPagePrivate(page);
h->resv_huge_pages--;
@@ -1928,7 +1934,7 @@ static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask,
return page;
}
-struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
+static struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
int nid, nodemask_t *nmask)
{
struct page *page;
@@ -1971,21 +1977,29 @@ struct page *alloc_buddy_huge_page_with_mpol(struct hstate *h,
/* page migration callback function */
struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
- nodemask_t *nmask, gfp_t gfp_mask)
+ nodemask_t *nmask, gfp_t gfp_mask, bool skip_cma)
{
+ unsigned int flags = 0;
+ struct page *page = NULL;
+
+ if (skip_cma)
+ flags = memalloc_nocma_save();
+
spin_lock(&hugetlb_lock);
if (h->free_huge_pages - h->resv_huge_pages > 0) {
- struct page *page;
-
- page = dequeue_huge_page_nodemask(h, gfp_mask, preferred_nid, nmask);
- if (page) {
- spin_unlock(&hugetlb_lock);
- return page;
- }
+ page = dequeue_huge_page_nodemask(h, gfp_mask,
+ preferred_nid, nmask, skip_cma);
}
spin_unlock(&hugetlb_lock);
- return alloc_migrate_huge_page(h, gfp_mask, preferred_nid, nmask);
+ if (!page)
+ page = alloc_migrate_huge_page(h, gfp_mask,
+ preferred_nid, nmask);
+
+ if (skip_cma)
+ memalloc_nocma_restore(flags);
+
+ return page;
}
/* mempolicy aware migration callback */
@@ -2000,7 +2014,7 @@ struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
gfp_mask = htlb_alloc_mask(h);
node = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
- page = alloc_huge_page_nodemask(h, node, nodemask, gfp_mask);
+ page = alloc_huge_page_nodemask(h, node, nodemask, gfp_mask, false);
mpol_cond_put(mpol);
return page;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 9034a53..667b453 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1072,7 +1072,7 @@ struct page *alloc_new_node_page(struct page *page, unsigned long node)
struct hstate *h = page_hstate(compound_head(page));
gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
- return alloc_huge_page_nodemask(h, node, NULL, gfp_mask);
+ return alloc_huge_page_nodemask(h, node, NULL, gfp_mask, false);
} else if (PageTransHuge(page)) {
struct page *thp;
diff --git a/mm/migrate.c b/mm/migrate.c
index 3b3d918..02b31fe 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1543,7 +1543,7 @@ struct page *new_page_nodemask(struct page *page,
gfp_mask = htlb_alloc_mask(h);
return alloc_huge_page_nodemask(h, preferred_nid,
- nodemask, gfp_mask);
+ nodemask, gfp_mask, false);
}
if (PageTransHuge(page)) {
--
2.7.4
From: Joonsoo Kim <[email protected]>
In mm/migrate.c, THP allocation for migration is called with the provided
gfp_mask | GFP_TRANSHUGE. This gfp_mask contains __GFP_RECLAIM, which
conflicts with the intention of GFP_TRANSHUGE.
GFP_TRANSHUGE/GFP_TRANSHUGE_LIGHT were introduced to control the reclaim
behaviour in a well-defined manner, since the overhead of THP allocation
is quite large and the whole system could suffer from it. So, they deal
with the __GFP_RECLAIM bits deliberately. If gfp_mask contains
__GFP_RECLAIM and gfp_mask | GFP_TRANSHUGE(_LIGHT) is used for THP
allocation, the purpose of GFP_TRANSHUGE(_LIGHT) is defeated.
This patch fixes this situation by clearing __GFP_RECLAIM from the
provided gfp_mask. Note that some other THP allocations for migration
just use GFP_TRANSHUGE(_LIGHT) directly; this patch makes all THP
allocations for migration consistent.
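For reference, with the gfp definitions in this tree (a sketch; the exact
flag composition may differ between kernel versions):

	/*
	 * GFP_USER      contains __GFP_RECLAIM
	 *               (= __GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM).
	 * GFP_TRANSHUGE = GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM, and
	 *                 GFP_TRANSHUGE_LIGHT deliberately masks __GFP_RECLAIM out.
	 *
	 * So "gfp_mask | GFP_TRANSHUGE" keeps __GFP_KSWAPD_RECLAIM from the
	 * caller's mask, while GFP_TRANSHUGE alone only allows direct reclaim.
	 * Clearing __GFP_RECLAIM first restores the intended behaviour:
	 */
	gfp_mask &= ~__GFP_RECLAIM;
	gfp_mask |= GFP_TRANSHUGE;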
Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/migrate.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/mm/migrate.c b/mm/migrate.c
index 02b31fe..ecd7615 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1547,6 +1547,11 @@ struct page *new_page_nodemask(struct page *page,
}
if (PageTransHuge(page)) {
+ /*
+ * clear __GFP_RECLAIM since GFP_TRANSHUGE is the gfp_mask
+ * that chooses the reclaim masks deliberately.
+ */
+ gfp_mask &= ~__GFP_RECLAIM;
gfp_mask |= GFP_TRANSHUGE;
order = HPAGE_PMD_ORDER;
}
--
2.7.4
From: Joonsoo Kim <[email protected]>
There is a well-defined migration target allocation callback. Use it.
Acked-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/internal.h | 1 -
mm/mempolicy.c | 31 ++++++-------------------------
mm/migrate.c | 8 ++++++--
3 files changed, 12 insertions(+), 28 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index 3236fef..6205d8a 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -613,7 +613,6 @@ static inline bool is_migrate_highatomic_page(struct page *page)
}
void setup_zone_pageset(struct zone *zone);
-extern struct page *alloc_new_node_page(struct page *page, unsigned long node);
struct migration_target_control {
int nid; /* preferred node id */
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 667b453..93fcfc1 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1065,29 +1065,6 @@ static int migrate_page_add(struct page *page, struct list_head *pagelist,
return 0;
}
-/* page allocation callback for NUMA node migration */
-struct page *alloc_new_node_page(struct page *page, unsigned long node)
-{
- if (PageHuge(page)) {
- struct hstate *h = page_hstate(compound_head(page));
- gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
-
- return alloc_huge_page_nodemask(h, node, NULL, gfp_mask, false);
- } else if (PageTransHuge(page)) {
- struct page *thp;
-
- thp = alloc_pages_node(node,
- (GFP_TRANSHUGE | __GFP_THISNODE),
- HPAGE_PMD_ORDER);
- if (!thp)
- return NULL;
- prep_transhuge_page(thp);
- return thp;
- } else
- return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE |
- __GFP_THISNODE, 0);
-}
-
/*
* Migrate pages from one node to a target node.
* Returns error or the number of pages not migrated.
@@ -1098,6 +1075,10 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
nodemask_t nmask;
LIST_HEAD(pagelist);
int err = 0;
+ struct migration_target_control mtc = {
+ .nid = dest,
+ .gfp_mask = GFP_HIGHUSER_MOVABLE | __GFP_THISNODE,
+ };
nodes_clear(nmask);
node_set(source, nmask);
@@ -1112,8 +1093,8 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
flags | MPOL_MF_DISCONTIG_OK, &pagelist);
if (!list_empty(&pagelist)) {
- err = migrate_pages(&pagelist, alloc_new_node_page, NULL, dest,
- MIGRATE_SYNC, MR_SYSCALL);
+ err = migrate_pages(&pagelist, alloc_migration_target, NULL,
+ (unsigned long)&mtc, MIGRATE_SYNC, MR_SYSCALL);
if (err)
putback_movable_pages(&pagelist);
}
diff --git a/mm/migrate.c b/mm/migrate.c
index ab18b9c..b7eac38 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1599,9 +1599,13 @@ static int do_move_pages_to_node(struct mm_struct *mm,
struct list_head *pagelist, int node)
{
int err;
+ struct migration_target_control mtc = {
+ .nid = node,
+ .gfp_mask = GFP_HIGHUSER_MOVABLE | __GFP_THISNODE,
+ };
- err = migrate_pages(pagelist, alloc_new_node_page, NULL, node,
- MIGRATE_SYNC, MR_SYSCALL);
+ err = migrate_pages(pagelist, alloc_migration_target, NULL,
+ (unsigned long)&mtc, MIGRATE_SYNC, MR_SYSCALL);
if (err)
putback_movable_pages(pagelist);
return err;
--
2.7.4
From: Joonsoo Kim <[email protected]>
There is a well-defined standard migration target callback. Use it
directly.
Acked-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/page_alloc.c | 8 ++++++--
mm/page_isolation.c | 10 ----------
2 files changed, 6 insertions(+), 12 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f3b70ee..6416d08 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8354,6 +8354,10 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
unsigned long pfn = start;
unsigned int tries = 0;
int ret = 0;
+ struct migration_target_control mtc = {
+ .nid = zone_to_nid(cc->zone),
+ .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
+ };
migrate_prep();
@@ -8380,8 +8384,8 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
&cc->migratepages);
cc->nr_migratepages -= nr_reclaimed;
- ret = migrate_pages(&cc->migratepages, alloc_migrate_target,
- NULL, 0, cc->mode, MR_CONTIG_RANGE);
+ ret = migrate_pages(&cc->migratepages, alloc_migration_target,
+ NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
}
if (ret < 0) {
putback_movable_pages(&cc->migratepages);
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index f25c66e..242c031 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -306,13 +306,3 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
return pfn < end_pfn ? -EBUSY : 0;
}
-
-struct page *alloc_migrate_target(struct page *page, unsigned long private)
-{
- struct migration_target_control mtc = {
- .nid = page_to_nid(page),
- .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
- };
-
- return alloc_migration_target(page, (unsigned long)&mtc);
-}
--
2.7.4
From: Joonsoo Kim <[email protected]>
To calculate the correct node to which a page should be migrated for
hotplug, we need to check the node id of the page. A wrapper around
alloc_migration_target() exists for this purpose.
However, Vlastimil points out that all migration source pages come from
a single node. In this case, we don't need to check the node id for each
page, and we don't need to re-set the target nodemask for each page via
the wrapper. Set up the migration_target_control once and use it for
all pages.
Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/memory_hotplug.c | 46 ++++++++++++++++++++++------------------------
1 file changed, 22 insertions(+), 24 deletions(-)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 86bc2ad..269e8ca 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1265,27 +1265,6 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
return 0;
}
-static struct page *new_node_page(struct page *page, unsigned long private)
-{
- nodemask_t nmask = node_states[N_MEMORY];
- struct migration_target_control mtc = {
- .nid = page_to_nid(page),
- .nmask = &nmask,
- .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
- };
-
- /*
- * try to allocate from a different node but reuse this node if there
- * are no other online nodes to be used (e.g. we are offlining a part
- * of the only existing node)
- */
- node_clear(mtc.nid, *mtc.nmask);
- if (nodes_empty(*mtc.nmask))
- node_set(mtc.nid, *mtc.nmask);
-
- return alloc_migration_target(page, (unsigned long)&mtc);
-}
-
static int
do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
{
@@ -1345,9 +1324,28 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
put_page(page);
}
if (!list_empty(&source)) {
- /* Allocate a new page from the nearest neighbor node */
- ret = migrate_pages(&source, new_node_page, NULL, 0,
- MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
+ nodemask_t nmask = node_states[N_MEMORY];
+ struct migration_target_control mtc = {
+ .nmask = &nmask,
+ .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
+ };
+
+ /*
+ * We have checked that the migration range is on a single zone so
+ * we can use the nid of the first page for all the others.
+ */
+ mtc.nid = page_to_nid(list_first_entry(&source, struct page, lru));
+
+ /*
+ * try to allocate from a different node but reuse this node
+ * if there are no other online nodes to be used (e.g. we are
+ * offlining a part of the only existing node)
+ */
+ node_clear(mtc.nid, *mtc.nmask);
+ if (nodes_empty(*mtc.nmask))
+ node_set(mtc.nid, *mtc.nmask);
+ ret = migrate_pages(&source, alloc_migration_target, NULL,
+ (unsigned long)&mtc, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
if (ret) {
list_for_each_entry(page, &source, lru) {
pr_warn("migrating pfn %lx failed ret:%d ",
--
2.7.4
From: Joonsoo Kim <[email protected]>
There are several similar functions for migration target allocation. Since
there is no fundamental difference between them, it's better to keep just
one rather than keeping all the variants. This patch implements the base
migration target allocation function. In the following patches, the
variants will be converted to use this function.
The changes should be mechanical, but there are some differences. First,
some callers' nodemask is assigned NULL, since a NULL nodemask is treated
as all available nodes, that is, &node_states[N_MEMORY]. Second, for
hugetlb page allocation, gfp_mask is ORed in, since a user can provide a
gfp_mask from now on. Third, if the provided node id is NUMA_NO_NODE, the
node id is set to the node where the migration source lives.
Note that the PageHighMem() call in the previous function is changed to an
explicit is_highmem_idx() check since it is more readable.
Acked-by: Vlastimil Babka <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
---
include/linux/migrate.h | 9 +++++----
mm/internal.h | 7 +++++++
mm/memory-failure.c | 7 +++++--
mm/memory_hotplug.c | 14 +++++++++-----
mm/migrate.c | 27 +++++++++++++++++----------
mm/page_isolation.c | 7 +++++--
6 files changed, 48 insertions(+), 23 deletions(-)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 1d70b4a..cc56f0d 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -10,6 +10,8 @@
typedef struct page *new_page_t(struct page *page, unsigned long private);
typedef void free_page_t(struct page *page, unsigned long private);
+struct migration_target_control;
+
/*
* Return values from addresss_space_operations.migratepage():
* - negative errno on page migration failure;
@@ -39,8 +41,7 @@ extern int migrate_page(struct address_space *mapping,
enum migrate_mode mode);
extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
unsigned long private, enum migrate_mode mode, int reason);
-extern struct page *new_page_nodemask(struct page *page,
- int preferred_nid, nodemask_t *nodemask);
+extern struct page *alloc_migration_target(struct page *page, unsigned long private);
extern int isolate_movable_page(struct page *page, isolate_mode_t mode);
extern void putback_movable_page(struct page *page);
@@ -59,8 +60,8 @@ static inline int migrate_pages(struct list_head *l, new_page_t new,
free_page_t free, unsigned long private, enum migrate_mode mode,
int reason)
{ return -ENOSYS; }
-static inline struct page *new_page_nodemask(struct page *page,
- int preferred_nid, nodemask_t *nodemask)
+static inline struct page *alloc_migration_target(struct page *page,
+ unsigned long private)
{ return NULL; }
static inline int isolate_movable_page(struct page *page, isolate_mode_t mode)
{ return -EBUSY; }
diff --git a/mm/internal.h b/mm/internal.h
index dd14c53..0beacf3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -614,4 +614,11 @@ static inline bool is_migrate_highatomic_page(struct page *page)
void setup_zone_pageset(struct zone *zone);
extern struct page *alloc_new_node_page(struct page *page, unsigned long node);
+
+struct migration_target_control {
+ int nid; /* preferred node id */
+ nodemask_t *nmask;
+ gfp_t gfp_mask;
+};
+
#endif /* __MM_INTERNAL_H */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index c5e4cee..609d42b6 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1679,9 +1679,12 @@ EXPORT_SYMBOL(unpoison_memory);
static struct page *new_page(struct page *p, unsigned long private)
{
- int nid = page_to_nid(p);
+ struct migration_target_control mtc = {
+ .nid = page_to_nid(p),
+ .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
+ };
- return new_page_nodemask(p, nid, &node_states[N_MEMORY]);
+ return alloc_migration_target(p, (unsigned long)&mtc);
}
/*
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index cafe65eb..86bc2ad 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1267,19 +1267,23 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
static struct page *new_node_page(struct page *page, unsigned long private)
{
- int nid = page_to_nid(page);
nodemask_t nmask = node_states[N_MEMORY];
+ struct migration_target_control mtc = {
+ .nid = page_to_nid(page),
+ .nmask = &nmask,
+ .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
+ };
/*
* try to allocate from a different node but reuse this node if there
* are no other online nodes to be used (e.g. we are offlining a part
* of the only existing node)
*/
- node_clear(nid, nmask);
- if (nodes_empty(nmask))
- node_set(nid, nmask);
+ node_clear(mtc.nid, *mtc.nmask);
+ if (nodes_empty(*mtc.nmask))
+ node_set(mtc.nid, *mtc.nmask);
- return new_page_nodemask(page, nid, &nmask);
+ return alloc_migration_target(page, (unsigned long)&mtc);
}
static int
diff --git a/mm/migrate.c b/mm/migrate.c
index ecd7615..00cd81c 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1531,19 +1531,27 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
return rc;
}
-struct page *new_page_nodemask(struct page *page,
- int preferred_nid, nodemask_t *nodemask)
+struct page *alloc_migration_target(struct page *page, unsigned long private)
{
- gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL;
+ struct migration_target_control *mtc;
+ gfp_t gfp_mask;
unsigned int order = 0;
struct page *new_page = NULL;
+ int nid;
+ int zidx;
+
+ mtc = (struct migration_target_control *)private;
+ gfp_mask = mtc->gfp_mask;
+ nid = mtc->nid;
+ if (nid == NUMA_NO_NODE)
+ nid = page_to_nid(page);
if (PageHuge(page)) {
struct hstate *h = page_hstate(compound_head(page));
- gfp_mask = htlb_alloc_mask(h);
- return alloc_huge_page_nodemask(h, preferred_nid,
- nodemask, gfp_mask, false);
+ gfp_mask |= htlb_alloc_mask(h);
+ return alloc_huge_page_nodemask(h, nid, mtc->nmask,
+ gfp_mask, false);
}
if (PageTransHuge(page)) {
@@ -1555,12 +1563,11 @@ struct page *new_page_nodemask(struct page *page,
gfp_mask |= GFP_TRANSHUGE;
order = HPAGE_PMD_ORDER;
}
-
- if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE))
+ zidx = zone_idx(page_zone(page));
+ if (is_highmem_idx(zidx) || zidx == ZONE_MOVABLE)
gfp_mask |= __GFP_HIGHMEM;
- new_page = __alloc_pages_nodemask(gfp_mask, order,
- preferred_nid, nodemask);
+ new_page = __alloc_pages_nodemask(gfp_mask, order, nid, mtc->nmask);
if (new_page && PageTransHuge(new_page))
prep_transhuge_page(new_page);
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index aec26d9..f25c66e 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -309,7 +309,10 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
struct page *alloc_migrate_target(struct page *page, unsigned long private)
{
- int nid = page_to_nid(page);
+ struct migration_target_control mtc = {
+ .nid = page_to_nid(page),
+ .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
+ };
- return new_page_nodemask(page, nid, &node_states[N_MEMORY]);
+ return alloc_migration_target(page, (unsigned long)&mtc);
}
--
2.7.4
From: Joonsoo Kim <[email protected]>
There is a well-defined migration target allocation callback. It's mostly
similar to new_non_cma_page() except for the handling of CMA pages.
This patch adds CMA handling to the standard migration target allocation
callback and uses it in gup.c.
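A rough sketch of the replacement in gup.c (see the hunks below for the
actual change):

	struct migration_target_control mtc = {
		.nid = NUMA_NO_NODE,		/* stay on the node of the source page */
		.gfp_mask = GFP_USER | __GFP_NOWARN,
		.skip_cma = true,		/* keep migration targets out of CMA */
	};

	migrate_pages(&cma_page_list, alloc_migration_target, NULL,
			(unsigned long)&mtc, MIGRATE_SYNC, MR_CONTIG_RANGE);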
Acked-by: Vlastimil Babka <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/gup.c | 61 +++++++----------------------------------------------------
mm/internal.h | 1 +
mm/migrate.c | 9 ++++++++-
3 files changed, 16 insertions(+), 55 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 2c3dab4..6a74c30 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1608,58 +1608,6 @@ static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
}
#ifdef CONFIG_CMA
-static struct page *new_non_cma_page(struct page *page, unsigned long private)
-{
- /*
- * We want to make sure we allocate the new page from the same node
- * as the source page.
- */
- int nid = page_to_nid(page);
- /*
- * Trying to allocate a page for migration. Ignore allocation
- * failure warnings. We don't force __GFP_THISNODE here because
- * this node here is the node where we have CMA reservation and
- * in some case these nodes will have really less non movable
- * allocation memory.
- */
- gfp_t gfp_mask = GFP_USER | __GFP_NOWARN;
-
- if (PageHighMem(page))
- gfp_mask |= __GFP_HIGHMEM;
-
-#ifdef CONFIG_HUGETLB_PAGE
- if (PageHuge(page)) {
- struct hstate *h = page_hstate(page);
-
- /*
- * We don't want to dequeue from the pool because pool pages will
- * mostly be from the CMA region.
- */
- return alloc_huge_page_nodemask(h, nid, NULL, gfp_mask, true);
- }
-#endif
- if (PageTransHuge(page)) {
- struct page *thp;
- /*
- * ignore allocation failure warnings
- */
- gfp_t thp_gfpmask = GFP_TRANSHUGE | __GFP_NOWARN;
-
- /*
- * Remove the movable mask so that we don't allocate from
- * CMA area again.
- */
- thp_gfpmask &= ~__GFP_MOVABLE;
- thp = __alloc_pages_node(nid, thp_gfpmask, HPAGE_PMD_ORDER);
- if (!thp)
- return NULL;
- prep_transhuge_page(thp);
- return thp;
- }
-
- return __alloc_pages_node(nid, gfp_mask, 0);
-}
-
static long check_and_migrate_cma_pages(struct task_struct *tsk,
struct mm_struct *mm,
unsigned long start,
@@ -1674,6 +1622,11 @@ static long check_and_migrate_cma_pages(struct task_struct *tsk,
bool migrate_allow = true;
LIST_HEAD(cma_page_list);
long ret = nr_pages;
+ struct migration_target_control mtc = {
+ .nid = NUMA_NO_NODE,
+ .gfp_mask = GFP_USER | __GFP_NOWARN,
+ .skip_cma = true,
+ };
check_again:
for (i = 0; i < nr_pages;) {
@@ -1719,8 +1672,8 @@ static long check_and_migrate_cma_pages(struct task_struct *tsk,
for (i = 0; i < nr_pages; i++)
put_page(pages[i]);
- if (migrate_pages(&cma_page_list, new_non_cma_page,
- NULL, 0, MIGRATE_SYNC, MR_CONTIG_RANGE)) {
+ if (migrate_pages(&cma_page_list, alloc_migration_target, NULL,
+ (unsigned long)&mtc, MIGRATE_SYNC, MR_CONTIG_RANGE)) {
/*
* some of the pages failed migration. Do get_user_pages
* without migration.
diff --git a/mm/internal.h b/mm/internal.h
index 0beacf3..3236fef 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -619,6 +619,7 @@ struct migration_target_control {
int nid; /* preferred node id */
nodemask_t *nmask;
gfp_t gfp_mask;
+ bool skip_cma;
};
#endif /* __MM_INTERNAL_H */
diff --git a/mm/migrate.c b/mm/migrate.c
index 00cd81c..ab18b9c 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1539,6 +1539,7 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
struct page *new_page = NULL;
int nid;
int zidx;
+ unsigned int flags = 0;
mtc = (struct migration_target_control *)private;
gfp_mask = mtc->gfp_mask;
@@ -1551,9 +1552,12 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
gfp_mask |= htlb_alloc_mask(h);
return alloc_huge_page_nodemask(h, nid, mtc->nmask,
- gfp_mask, false);
+ gfp_mask, mtc->skip_cma);
}
+ if (mtc->skip_cma)
+ flags = memalloc_nocma_save();
+
if (PageTransHuge(page)) {
/*
* clear __GFP_RECLAIM since GFP_TRANSHUGE is the gfp_mask
@@ -1572,6 +1576,9 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
if (new_page && PageTransHuge(new_page))
prep_transhuge_page(new_page);
+ if (mtc->skip_cma)
+ memalloc_nocma_restore(flags);
+
return new_page;
}
--
2.7.4
From: Joonsoo Kim <[email protected]>
There is a well-defined standard migration target callback. Use it
directly.
Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/memory-failure.c | 18 ++++++------------
1 file changed, 6 insertions(+), 12 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 609d42b6..3b89804 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1677,16 +1677,6 @@ int unpoison_memory(unsigned long pfn)
}
EXPORT_SYMBOL(unpoison_memory);
-static struct page *new_page(struct page *p, unsigned long private)
-{
- struct migration_target_control mtc = {
- .nid = page_to_nid(p),
- .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
- };
-
- return alloc_migration_target(p, (unsigned long)&mtc);
-}
-
/*
* Safely get reference count of an arbitrary page.
* Returns 0 for a free page, -EIO for a zero refcount page
@@ -1793,6 +1783,10 @@ static int __soft_offline_page(struct page *page)
const char *msg_page[] = {"page", "hugepage"};
bool huge = PageHuge(page);
LIST_HEAD(pagelist);
+ struct migration_target_control mtc = {
+ .nid = NUMA_NO_NODE,
+ .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
+ };
/*
* Check PageHWPoison again inside page lock because PageHWPoison
@@ -1829,8 +1823,8 @@ static int __soft_offline_page(struct page *page)
}
if (isolate_page(hpage, &pagelist)) {
- ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
- MIGRATE_SYNC, MR_MEMORY_FAILURE);
+ ret = migrate_pages(&pagelist, alloc_migration_target, NULL,
+ (unsigned long)&mtc, MIGRATE_SYNC, MR_MEMORY_FAILURE);
if (!ret) {
bool release = !huge;
--
2.7.4
On 7/7/20 9:44 AM, [email protected] wrote:
> From: Joonsoo Kim <[email protected]>
>
> There is no difference between two migration callback functions,
> alloc_huge_page_node() and alloc_huge_page_nodemask(), except
> __GFP_THISNODE handling. It's redundant to have two almost similar
> functions in order to handle this flag. So, this patch tries to
> remove one by introducing a new argument, gfp_mask, to
> alloc_huge_page_nodemask().
>
> After introducing gfp_mask argument, it's caller's job to provide correct
> gfp_mask. So, every callsites for alloc_huge_page_nodemask() are changed
> to provide gfp_mask.
>
> Note that it's safe to remove a node id check in alloc_huge_page_node()
> since there is no caller passing NUMA_NO_NODE as a node id.
>
> Reviewed-by: Mike Kravetz <[email protected]>
> Signed-off-by: Joonsoo Kim <[email protected]>
Yeah, this version looks very good :)
Reviewed-by: Vlastimil Babka <[email protected]>
Thanks!
On Tue 07-07-20 16:44:41, Joonsoo Kim wrote:
> From: Joonsoo Kim <[email protected]>
>
> There is no difference between two migration callback functions,
> alloc_huge_page_node() and alloc_huge_page_nodemask(), except
> __GFP_THISNODE handling. It's redundant to have two almost similar
> functions in order to handle this flag. So, this patch tries to
> remove one by introducing a new argument, gfp_mask, to
> alloc_huge_page_nodemask().
>
> After introducing gfp_mask argument, it's caller's job to provide correct
> gfp_mask. So, every callsites for alloc_huge_page_nodemask() are changed
> to provide gfp_mask.
>
> Note that it's safe to remove a node id check in alloc_huge_page_node()
> since there is no caller passing NUMA_NO_NODE as a node id.
>
> Reviewed-by: Mike Kravetz <[email protected]>
> Signed-off-by: Joonsoo Kim <[email protected]>
Thanks for doing it this way. This makes much more sense than the
previous gfp_mask as a modifier approach.
I hope there won't be any weird include dependency problems but 0day
will tell us soon about that.
For the patch, feel free to add
Acked-by: Michal Hocko <[email protected]>
> ---
> include/linux/hugetlb.h | 26 ++++++++++++++++++--------
> mm/hugetlb.c | 35 ++---------------------------------
> mm/mempolicy.c | 10 ++++++----
> mm/migrate.c | 11 +++++++----
> 4 files changed, 33 insertions(+), 49 deletions(-)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 50650d0..bb93e95 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -10,6 +10,7 @@
> #include <linux/list.h>
> #include <linux/kref.h>
> #include <linux/pgtable.h>
> +#include <linux/gfp.h>
>
> struct ctl_table;
> struct user_struct;
> @@ -504,9 +505,8 @@ struct huge_bootmem_page {
>
> struct page *alloc_huge_page(struct vm_area_struct *vma,
> unsigned long addr, int avoid_reserve);
> -struct page *alloc_huge_page_node(struct hstate *h, int nid);
> struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
> - nodemask_t *nmask);
> + nodemask_t *nmask, gfp_t gfp_mask);
> struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
> unsigned long address);
> struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
> @@ -692,6 +692,15 @@ static inline bool hugepage_movable_supported(struct hstate *h)
> return true;
> }
>
> +/* Movability of hugepages depends on migration support. */
> +static inline gfp_t htlb_alloc_mask(struct hstate *h)
> +{
> + if (hugepage_movable_supported(h))
> + return GFP_HIGHUSER_MOVABLE;
> + else
> + return GFP_HIGHUSER;
> +}
> +
> static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
> struct mm_struct *mm, pte_t *pte)
> {
> @@ -759,13 +768,9 @@ static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
> return NULL;
> }
>
> -static inline struct page *alloc_huge_page_node(struct hstate *h, int nid)
> -{
> - return NULL;
> -}
> -
> static inline struct page *
> -alloc_huge_page_nodemask(struct hstate *h, int preferred_nid, nodemask_t *nmask)
> +alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
> + nodemask_t *nmask, gfp_t gfp_mask)
> {
> return NULL;
> }
> @@ -878,6 +883,11 @@ static inline bool hugepage_movable_supported(struct hstate *h)
> return false;
> }
>
> +static inline gfp_t htlb_alloc_mask(struct hstate *h)
> +{
> + return 0;
> +}
> +
> static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
> struct mm_struct *mm, pte_t *pte)
> {
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 7e5ba5c0..3245aa0 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1089,15 +1089,6 @@ static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t gfp_mask,
> return NULL;
> }
>
> -/* Movability of hugepages depends on migration support. */
> -static inline gfp_t htlb_alloc_mask(struct hstate *h)
> -{
> - if (hugepage_movable_supported(h))
> - return GFP_HIGHUSER_MOVABLE;
> - else
> - return GFP_HIGHUSER;
> -}
> -
> static struct page *dequeue_huge_page_vma(struct hstate *h,
> struct vm_area_struct *vma,
> unsigned long address, int avoid_reserve,
> @@ -1979,31 +1970,9 @@ struct page *alloc_buddy_huge_page_with_mpol(struct hstate *h,
> }
>
> /* page migration callback function */
> -struct page *alloc_huge_page_node(struct hstate *h, int nid)
> -{
> - gfp_t gfp_mask = htlb_alloc_mask(h);
> - struct page *page = NULL;
> -
> - if (nid != NUMA_NO_NODE)
> - gfp_mask |= __GFP_THISNODE;
> -
> - spin_lock(&hugetlb_lock);
> - if (h->free_huge_pages - h->resv_huge_pages > 0)
> - page = dequeue_huge_page_nodemask(h, gfp_mask, nid, NULL);
> - spin_unlock(&hugetlb_lock);
> -
> - if (!page)
> - page = alloc_migrate_huge_page(h, gfp_mask, nid, NULL);
> -
> - return page;
> -}
> -
> -/* page migration callback function */
> struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
> - nodemask_t *nmask)
> + nodemask_t *nmask, gfp_t gfp_mask)
> {
> - gfp_t gfp_mask = htlb_alloc_mask(h);
> -
> spin_lock(&hugetlb_lock);
> if (h->free_huge_pages - h->resv_huge_pages > 0) {
> struct page *page;
> @@ -2031,7 +2000,7 @@ struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
>
> gfp_mask = htlb_alloc_mask(h);
> node = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
> - page = alloc_huge_page_nodemask(h, node, nodemask);
> + page = alloc_huge_page_nodemask(h, node, nodemask, gfp_mask);
> mpol_cond_put(mpol);
>
> return page;
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index dabcee8..9034a53 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1068,10 +1068,12 @@ static int migrate_page_add(struct page *page, struct list_head *pagelist,
> /* page allocation callback for NUMA node migration */
> struct page *alloc_new_node_page(struct page *page, unsigned long node)
> {
> - if (PageHuge(page))
> - return alloc_huge_page_node(page_hstate(compound_head(page)),
> - node);
> - else if (PageTransHuge(page)) {
> + if (PageHuge(page)) {
> + struct hstate *h = page_hstate(compound_head(page));
> + gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
> +
> + return alloc_huge_page_nodemask(h, node, NULL, gfp_mask);
> + } else if (PageTransHuge(page)) {
> struct page *thp;
>
> thp = alloc_pages_node(node,
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 7370a66..3b3d918 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1538,10 +1538,13 @@ struct page *new_page_nodemask(struct page *page,
> unsigned int order = 0;
> struct page *new_page = NULL;
>
> - if (PageHuge(page))
> - return alloc_huge_page_nodemask(
> - page_hstate(compound_head(page)),
> - preferred_nid, nodemask);
> + if (PageHuge(page)) {
> + struct hstate *h = page_hstate(compound_head(page));
> +
> + gfp_mask = htlb_alloc_mask(h);
> + return alloc_huge_page_nodemask(h, preferred_nid,
> + nodemask, gfp_mask);
> + }
>
> if (PageTransHuge(page)) {
> gfp_mask |= GFP_TRANSHUGE;
> --
> 2.7.4
>
--
Michal Hocko
SUSE Labs
On 7/7/20 9:44 AM, [email protected] wrote:
> From: Joonsoo Kim <[email protected]>
>
> new_non_cma_page() in gup.c which try to allocate migration target page
> requires to allocate the new page that is not on the CMA area.
> new_non_cma_page() implements it by removing __GFP_MOVABLE flag. This way
> works well for THP page or normal page but not for hugetlb page.
>
> hugetlb page allocation process consists of two steps. First is dequeing
> from the pool. Second is, if there is no available page on the queue,
> allocating from the page allocator.
>
> new_non_cma_page() can control allocation from the page allocator by
> specifying correct gfp flag. However, dequeing cannot be controlled until
> now, so, new_non_cma_page() skips dequeing completely. It is a suboptimal
> since new_non_cma_page() cannot utilize hugetlb pages on the queue so this
> patch tries to fix this situation.
>
> This patch makes the deque function on hugetlb CMA aware and skip CMA
> pages if newly added skip_cma argument is passed as true.
Hmm, can't you instead change dequeue_huge_page_node_exact() to test the
PF_MEMALLOC_NOCMA flag and avoid adding bool skip_cma everywhere?
I think that's what Michal suggested [1] except he said "the code already does
by memalloc_nocma_{save,restore} API". It needs extending a bit though, AFAICS.
__gup_longterm_locked() indeed does the save/restore, but the restore comes
before check_and_migrate_cma_pages() and thus before new_non_cma_page() is
called, so an adjustment is needed there, but that's all?
Hm, the adjustment should also be done because save/restore is done around
__get_user_pages_locked(), but check_and_migrate_cma_pages() also calls
__get_user_pages_locked(), and that call not being between the nocma save
and restore is thus also a correctness issue?
[1] https://lore.kernel.org/r/[email protected]
> Acked-by: Mike Kravetz <[email protected]>
> Signed-off-by: Joonsoo Kim <[email protected]>
> ---
> include/linux/hugetlb.h | 6 ++----
> mm/gup.c | 3 ++-
> mm/hugetlb.c | 46 ++++++++++++++++++++++++++++++----------------
> mm/mempolicy.c | 2 +-
> mm/migrate.c | 2 +-
> 5 files changed, 36 insertions(+), 23 deletions(-)
>
On Tue 07-07-20 16:44:42, Joonsoo Kim wrote:
> From: Joonsoo Kim <[email protected]>
>
> new_non_cma_page() in gup.c which try to allocate migration target page
> requires to allocate the new page that is not on the CMA area.
> new_non_cma_page() implements it by removing __GFP_MOVABLE flag. This way
> works well for THP page or normal page but not for hugetlb page.
>
> hugetlb page allocation process consists of two steps. First is dequeing
> from the pool. Second is, if there is no available page on the queue,
> allocating from the page allocator.
>
> new_non_cma_page() can control allocation from the page allocator by
> specifying correct gfp flag. However, dequeing cannot be controlled until
> now, so, new_non_cma_page() skips dequeing completely. It is a suboptimal
> since new_non_cma_page() cannot utilize hugetlb pages on the queue so this
> patch tries to fix this situation.
>
> This patch makes the deque function on hugetlb CMA aware and skip CMA
> pages if newly added skip_cma argument is passed as true.
I really dislike this as already mentioned in the previous version of
the patch. You are making hugetlb and only one part of its allocator a
special snowflake which is almost always a bad idea. Your changelog
lacks any real justification for this inconsistency.
Also by doing so you are keeping an existing bug that the hugetlb
allocator doesn't respect scope gfp flags as I have mentioned when
reviewing the previous version. That bug likely doesn't matter now but
it might in future and as soon as it is fixed all this is just a
pointless exercise.
I do not have the energy and time to burn repeating that argumentation, so
I will let Mike have the final word. Btw. you are keeping his acks even
after considerable changes to the patches, which I am not really sure he is
ok with.
> Acked-by: Mike Kravetz <[email protected]>
> Signed-off-by: Joonsoo Kim <[email protected]>
To this particular patch.
[...]
> diff --git a/mm/gup.c b/mm/gup.c
> index 5daadae..2c3dab4 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1630,11 +1630,12 @@ static struct page *new_non_cma_page(struct page *page, unsigned long private)
> #ifdef CONFIG_HUGETLB_PAGE
> if (PageHuge(page)) {
> struct hstate *h = page_hstate(page);
> +
> /*
> * We don't want to dequeue from the pool because pool pages will
> * mostly be from the CMA region.
> */
> - return alloc_migrate_huge_page(h, gfp_mask, nid, NULL);
> + return alloc_huge_page_nodemask(h, nid, NULL, gfp_mask, true);
Let me repeat that this whole thing is running under
memalloc_nocma_save. So the additional parameter is bogus.
[...]
> -static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
> +static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid, bool skip_cma)
If you really insist on an additional parameter at this layer then it
should be checking for PF_MEMALLOC_NOCMA instead.
[...]
> @@ -1971,21 +1977,29 @@ struct page *alloc_buddy_huge_page_with_mpol(struct hstate *h,
>
> /* page migration callback function */
> struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
> - nodemask_t *nmask, gfp_t gfp_mask)
> + nodemask_t *nmask, gfp_t gfp_mask, bool skip_cma)
> {
> + unsigned int flags = 0;
> + struct page *page = NULL;
> +
> + if (skip_cma)
> + flags = memalloc_nocma_save();
This is pointless for a scope that is already defined up in the call
chain and fundamentally this is breaking the expected use of the scope
API. The primary reason for that API to exist is to define the scope and
have it sticky for _all_ allocation contexts. So if you have to use it
deep in the allocator then you are doing something wrong.
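For illustration, the expected pattern is roughly (a sketch; the helper
name is just a placeholder):

	unsigned int flags = memalloc_nocma_save();

	/*
	 * Every allocation issued from this scope, no matter how deep in
	 * the call chain, implicitly avoids CMA pageblocks.
	 */
	ret = do_the_work_that_allocates();	/* hypothetical helper */

	memalloc_nocma_restore(flags);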
--
Michal Hocko
SUSE Labs
On Tue 07-07-20 16:44:43, Joonsoo Kim wrote:
> From: Joonsoo Kim <[email protected]>
>
> In mm/migrate.c, THP allocation for migration is called with the provided
> gfp_mask | GFP_TRANSHUGE. This gfp_mask contains __GFP_RECLAIM and it
> would be conflict with the intention of the GFP_TRANSHUGE.
>
> GFP_TRANSHUGE/GFP_TRANSHUGE_LIGHT is introduced to control the reclaim
> behaviour by well defined manner since overhead of THP allocation is
> quite large and the whole system could suffer from it. So, they deals
> with __GFP_RECLAIM mask deliberately. If gfp_mask contains __GFP_RECLAIM
> and uses gfp_mask | GFP_TRANSHUGE(_LIGHT) for THP allocation, it means
> that it breaks the purpose of the GFP_TRANSHUGE(_LIGHT).
GFP_TRANSHUGE* is not a carved-in-stone design. Their primary reason to
exist is to control how hard to try for different allocation
paths/configurations, because their latency expectations might be
largely different. It is mostly the #PF path which aims to be as
lightweight as possible; I believe nobody simply considered migration
significant enough to even care. And I am still not sure it matters, but
I would tend to agree that consistency here is probably a very minor
plus.
Your changelog is slightly misleading in that regard because it suggests
that this is a real problem while it doesn't present any actual data.
It would be really nice to make the effective change stand out.
We are only talking about __GFP_KSWAPD_RECLAIM here, so the only
difference is that the migration won't wake up kswapd anymore.
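[Editor's note: to spell out the flag arithmetic, a small illustration
assuming the <linux/gfp.h> definitions of this era; the helper names are
made up for the example.]

#include <linux/gfp.h>

/*
 * __GFP_RECLAIM == __GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM, while
 * GFP_TRANSHUGE == GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM, i.e. it
 * deliberately leaves out __GFP_KSWAPD_RECLAIM.
 */
static inline gfp_t thp_migration_mask_before(gfp_t base)
{
	return base | GFP_TRANSHUGE;			/* keeps the kswapd bit from base */
}

static inline gfp_t thp_migration_mask_after(gfp_t base)
{
	return (base & ~__GFP_RECLAIM) | GFP_TRANSHUGE;	/* direct reclaim only */
}

/*
 * With base = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL the two results
 * differ exactly in __GFP_KSWAPD_RECLAIM: the patched path no longer wakes
 * kswapd, everything else is unchanged.
 */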
All that being said, the changelog should probably be more explicit
about the fact that this is done solely for consistency, and be honest
that the runtime effect is not really clear. That would help people
reading it in the future.
--
Michal Hocko
SUSE Labs
On Tue 07-07-20 16:44:44, Joonsoo Kim wrote:
> From: Joonsoo Kim <[email protected]>
>
> There are some similar functions for migration target allocation. Since
> there is no fundamental difference, it's better to keep just one rather
> than keeping all variants. This patch implements base migration target
> allocation function. In the following patches, variants will be converted
> to use this function.
>
> Changes should be mechanical but there are some differences. First, Some
> callers' nodemask is assgined to NULL since NULL nodemask will be
> considered as all available nodes, that is, &node_states[N_MEMORY].
> Second, for hugetlb page allocation, gfp_mask is ORed since a user could
> provide a gfp_mask from now on. Third, if provided node id is NUMA_NO_NODE,
> node id is set up to the node where migration source lives.
>
> Note that PageHighmem() call in previous function is changed to open-code
> "is_highmem_idx()" since it provides more readability.
>
> Acked-by: Vlastimil Babka <[email protected]>
> Signed-off-by: Joonsoo Kim <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Thanks!
> ---
> include/linux/migrate.h | 9 +++++----
> mm/internal.h | 7 +++++++
> mm/memory-failure.c | 7 +++++--
> mm/memory_hotplug.c | 14 +++++++++-----
> mm/migrate.c | 27 +++++++++++++++++----------
> mm/page_isolation.c | 7 +++++--
> 6 files changed, 48 insertions(+), 23 deletions(-)
>
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index 1d70b4a..cc56f0d 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -10,6 +10,8 @@
> typedef struct page *new_page_t(struct page *page, unsigned long private);
> typedef void free_page_t(struct page *page, unsigned long private);
>
> +struct migration_target_control;
> +
> /*
> * Return values from addresss_space_operations.migratepage():
> * - negative errno on page migration failure;
> @@ -39,8 +41,7 @@ extern int migrate_page(struct address_space *mapping,
> enum migrate_mode mode);
> extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
> unsigned long private, enum migrate_mode mode, int reason);
> -extern struct page *new_page_nodemask(struct page *page,
> - int preferred_nid, nodemask_t *nodemask);
> +extern struct page *alloc_migration_target(struct page *page, unsigned long private);
> extern int isolate_movable_page(struct page *page, isolate_mode_t mode);
> extern void putback_movable_page(struct page *page);
>
> @@ -59,8 +60,8 @@ static inline int migrate_pages(struct list_head *l, new_page_t new,
> free_page_t free, unsigned long private, enum migrate_mode mode,
> int reason)
> { return -ENOSYS; }
> -static inline struct page *new_page_nodemask(struct page *page,
> - int preferred_nid, nodemask_t *nodemask)
> +static inline struct page *alloc_migration_target(struct page *page,
> + unsigned long private)
> { return NULL; }
> static inline int isolate_movable_page(struct page *page, isolate_mode_t mode)
> { return -EBUSY; }
> diff --git a/mm/internal.h b/mm/internal.h
> index dd14c53..0beacf3 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -614,4 +614,11 @@ static inline bool is_migrate_highatomic_page(struct page *page)
>
> void setup_zone_pageset(struct zone *zone);
> extern struct page *alloc_new_node_page(struct page *page, unsigned long node);
> +
> +struct migration_target_control {
> + int nid; /* preferred node id */
> + nodemask_t *nmask;
> + gfp_t gfp_mask;
> +};
> +
> #endif /* __MM_INTERNAL_H */
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index c5e4cee..609d42b6 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1679,9 +1679,12 @@ EXPORT_SYMBOL(unpoison_memory);
>
> static struct page *new_page(struct page *p, unsigned long private)
> {
> - int nid = page_to_nid(p);
> + struct migration_target_control mtc = {
> + .nid = page_to_nid(p),
> + .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
> + };
>
> - return new_page_nodemask(p, nid, &node_states[N_MEMORY]);
> + return alloc_migration_target(p, (unsigned long)&mtc);
> }
>
> /*
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index cafe65eb..86bc2ad 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1267,19 +1267,23 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
>
> static struct page *new_node_page(struct page *page, unsigned long private)
> {
> - int nid = page_to_nid(page);
> nodemask_t nmask = node_states[N_MEMORY];
> + struct migration_target_control mtc = {
> + .nid = page_to_nid(page),
> + .nmask = &nmask,
> + .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
> + };
>
> /*
> * try to allocate from a different node but reuse this node if there
> * are no other online nodes to be used (e.g. we are offlining a part
> * of the only existing node)
> */
> - node_clear(nid, nmask);
> - if (nodes_empty(nmask))
> - node_set(nid, nmask);
> + node_clear(mtc.nid, *mtc.nmask);
> + if (nodes_empty(*mtc.nmask))
> + node_set(mtc.nid, *mtc.nmask);
>
> - return new_page_nodemask(page, nid, &nmask);
> + return alloc_migration_target(page, (unsigned long)&mtc);
> }
>
> static int
> diff --git a/mm/migrate.c b/mm/migrate.c
> index ecd7615..00cd81c 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1531,19 +1531,27 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
> return rc;
> }
>
> -struct page *new_page_nodemask(struct page *page,
> - int preferred_nid, nodemask_t *nodemask)
> +struct page *alloc_migration_target(struct page *page, unsigned long private)
> {
> - gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL;
> + struct migration_target_control *mtc;
> + gfp_t gfp_mask;
> unsigned int order = 0;
> struct page *new_page = NULL;
> + int nid;
> + int zidx;
> +
> + mtc = (struct migration_target_control *)private;
> + gfp_mask = mtc->gfp_mask;
> + nid = mtc->nid;
> + if (nid == NUMA_NO_NODE)
> + nid = page_to_nid(page);
>
> if (PageHuge(page)) {
> struct hstate *h = page_hstate(compound_head(page));
>
> - gfp_mask = htlb_alloc_mask(h);
> - return alloc_huge_page_nodemask(h, preferred_nid,
> - nodemask, gfp_mask, false);
> + gfp_mask |= htlb_alloc_mask(h);
> + return alloc_huge_page_nodemask(h, nid, mtc->nmask,
> + gfp_mask, false);
> }
>
> if (PageTransHuge(page)) {
> @@ -1555,12 +1563,11 @@ struct page *new_page_nodemask(struct page *page,
> gfp_mask |= GFP_TRANSHUGE;
> order = HPAGE_PMD_ORDER;
> }
> -
> - if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE))
> + zidx = zone_idx(page_zone(page));
> + if (is_highmem_idx(zidx) || zidx == ZONE_MOVABLE)
> gfp_mask |= __GFP_HIGHMEM;
>
> - new_page = __alloc_pages_nodemask(gfp_mask, order,
> - preferred_nid, nodemask);
> + new_page = __alloc_pages_nodemask(gfp_mask, order, nid, mtc->nmask);
>
> if (new_page && PageTransHuge(new_page))
> prep_transhuge_page(new_page);
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index aec26d9..f25c66e 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -309,7 +309,10 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
>
> struct page *alloc_migrate_target(struct page *page, unsigned long private)
> {
> - int nid = page_to_nid(page);
> + struct migration_target_control mtc = {
> + .nid = page_to_nid(page),
> + .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
> + };
>
> - return new_page_nodemask(page, nid, &node_states[N_MEMORY]);
> + return alloc_migration_target(page, (unsigned long)&mtc);
> }
> --
> 2.7.4
--
Michal Hocko
SUSE Labs
On Tue 07-07-20 16:44:45, Joonsoo Kim wrote:
[...]
> @@ -1551,9 +1552,12 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
>
> gfp_mask |= htlb_alloc_mask(h);
> return alloc_huge_page_nodemask(h, nid, mtc->nmask,
> - gfp_mask, false);
> + gfp_mask, mtc->skip_cma);
> }
>
> + if (mtc->skip_cma)
> + flags = memalloc_nocma_save();
> +
As already mentioned in previous email this is a completely wrong usage
of the scope API. The scope should be defined by the caller and this
should be all transparent by the allocator layer.
> if (PageTransHuge(page)) {
> /*
> * clear __GFP_RECALIM since GFP_TRANSHUGE is the gfp_mask
> @@ -1572,6 +1576,9 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
> if (new_page && PageTransHuge(new_page))
> prep_transhuge_page(new_page);
>
> + if (mtc->skip_cma)
> + memalloc_nocma_restore(flags);
> +
> return new_page;
> }
>
> --
> 2.7.4
--
Michal Hocko
SUSE Labs
On Tue 07-07-20 16:44:48, Joonsoo Kim wrote:
> From: Joonsoo Kim <[email protected]>
>
> There is a well-defined standard migration target callback. Use it
> directly.
>
> Signed-off-by: Joonsoo Kim <[email protected]>
> ---
> mm/memory-failure.c | 18 ++++++------------
> 1 file changed, 6 insertions(+), 12 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 609d42b6..3b89804 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1677,16 +1677,6 @@ int unpoison_memory(unsigned long pfn)
> }
> EXPORT_SYMBOL(unpoison_memory);
>
> -static struct page *new_page(struct page *p, unsigned long private)
> -{
> - struct migration_target_control mtc = {
> - .nid = page_to_nid(p),
> - .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
> - };
> -
> - return alloc_migration_target(p, (unsigned long)&mtc);
> -}
> -
> /*
> * Safely get reference count of an arbitrary page.
> * Returns 0 for a free page, -EIO for a zero refcount page
> @@ -1793,6 +1783,10 @@ static int __soft_offline_page(struct page *page)
> const char *msg_page[] = {"page", "hugepage"};
> bool huge = PageHuge(page);
> LIST_HEAD(pagelist);
> + struct migration_target_control mtc = {
> + .nid = NUMA_NO_NODE,
> + .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
> + };
Is NUMA_NO_NODE really intended here? The original code preferred to
stay on the same node. If this is intentional, then the changelog should
be explicit about that.
>
> /*
> * Check PageHWPoison again inside page lock because PageHWPoison
> @@ -1829,8 +1823,8 @@ static int __soft_offline_page(struct page *page)
> }
>
> if (isolate_page(hpage, &pagelist)) {
> - ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
> - MIGRATE_SYNC, MR_MEMORY_FAILURE);
> + ret = migrate_pages(&pagelist, alloc_migration_target, NULL,
> + (unsigned long)&mtc, MIGRATE_SYNC, MR_MEMORY_FAILURE);
> if (!ret) {
> bool release = !huge;
>
> --
> 2.7.4
>
--
Michal Hocko
SUSE Labs
On Tue 07-07-20 16:44:49, Joonsoo Kim wrote:
> From: Joonsoo Kim <[email protected]>
>
> To calculate the correct node to migrate the page for hotplug, we need
> to check node id of the page. Wrapper for alloc_migration_target() exists
> for this purpose.
>
> However, Vlastimil informs that all migration source pages come from
> a single node. In this case, we don't need to check the node id for each
> page and we don't need to re-set the target nodemask for each page by
> using the wrapper. Set up the migration_target_control once and use it for
> all pages.
Yes, memory offlining only operates on a single zone. Have a look at
test_pages_in_a_zone().
>
> Signed-off-by: Joonsoo Kim <[email protected]>
Acked-by: Michal Hocko <[email protected]>
> ---
> mm/memory_hotplug.c | 46 ++++++++++++++++++++++------------------------
> 1 file changed, 22 insertions(+), 24 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 86bc2ad..269e8ca 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1265,27 +1265,6 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
> return 0;
> }
>
> -static struct page *new_node_page(struct page *page, unsigned long private)
> -{
> - nodemask_t nmask = node_states[N_MEMORY];
> - struct migration_target_control mtc = {
> - .nid = page_to_nid(page),
> - .nmask = &nmask,
> - .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
> - };
> -
> - /*
> - * try to allocate from a different node but reuse this node if there
> - * are no other online nodes to be used (e.g. we are offlining a part
> - * of the only existing node)
> - */
> - node_clear(mtc.nid, *mtc.nmask);
> - if (nodes_empty(*mtc.nmask))
> - node_set(mtc.nid, *mtc.nmask);
> -
> - return alloc_migration_target(page, (unsigned long)&mtc);
> -}
> -
> static int
> do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> {
> @@ -1345,9 +1324,28 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> put_page(page);
> }
> if (!list_empty(&source)) {
> - /* Allocate a new page from the nearest neighbor node */
> - ret = migrate_pages(&source, new_node_page, NULL, 0,
> - MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
> + nodemask_t nmask = node_states[N_MEMORY];
> + struct migration_target_control mtc = {
> + .nmask = &nmask,
> + .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
> + };
> +
> + /*
> + * We have checked that migration range is on a single zone so
> + * we can use the nid of the first page to all the others.
> + */
> + mtc.nid = page_to_nid(list_first_entry(&source, struct page, lru));
> +
> + /*
> + * try to allocate from a different node but reuse this node
> + * if there are no other online nodes to be used (e.g. we are
> + * offlining a part of the only existing node)
> + */
> + node_clear(mtc.nid, *mtc.nmask);
> + if (nodes_empty(*mtc.nmask))
> + node_set(mtc.nid, *mtc.nmask);
> + ret = migrate_pages(&source, alloc_migration_target, NULL,
> + (unsigned long)&mtc, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
> if (ret) {
> list_for_each_entry(page, &source, lru) {
> pr_warn("migrating pfn %lx failed ret:%d ",
> --
> 2.7.4
>
--
Michal Hocko
SUSE Labs
On 7/7/20 9:44 AM, [email protected] wrote:
> From: Joonsoo Kim <[email protected]>
>
> In mm/migrate.c, THP allocation for migration is called with the provided
> gfp_mask | GFP_TRANSHUGE. This gfp_mask contains __GFP_RECLAIM and it
> would be conflict with the intention of the GFP_TRANSHUGE.
>
> GFP_TRANSHUGE/GFP_TRANSHUGE_LIGHT is introduced to control the reclaim
> behaviour by well defined manner since overhead of THP allocation is
> quite large and the whole system could suffer from it. So, they deals
> with __GFP_RECLAIM mask deliberately. If gfp_mask contains __GFP_RECLAIM
> and uses gfp_mask | GFP_TRANSHUGE(_LIGHT) for THP allocation, it means
> that it breaks the purpose of the GFP_TRANSHUGE(_LIGHT).
>
> This patch fixes this situation by clearing __GFP_RECLAIM in provided
> gfp_mask. Note that there are some other THP allocations for migration
> and they just uses GFP_TRANSHUGE(_LIGHT) directly. This patch would make
> all THP allocation for migration consistent.
>
> Signed-off-by: Joonsoo Kim <[email protected]>
> ---
> mm/migrate.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 02b31fe..ecd7615 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1547,6 +1547,11 @@ struct page *new_page_nodemask(struct page *page,
> }
>
> if (PageTransHuge(page)) {
> + /*
> + * clear __GFP_RECALIM since GFP_TRANSHUGE is the gfp_mask
> + * that chooses the reclaim masks deliberately.
> + */
> + gfp_mask &= ~__GFP_RECLAIM;
> gfp_mask |= GFP_TRANSHUGE;
In addition to what Michal said...
The mask is not passed to this function, so I would just redefine it, as is done
in the hugetlb case. We probably don't even need __GFP_RETRY_MAYFAIL for the
THP case, as it's just there to prevent the OOM kill (per commit 0f55685627d6d)
and the costly order of THP is enough for that (a minimal sketch follows the
quoted hunk below).
> order = HPAGE_PMD_ORDER;
> }
>
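[Editor's note: a minimal sketch of the alternative suggested above,
redefining the mask in the THP branch instead of masking bits out of the
shared one; whether __GFP_RETRY_MAYFAIL can really be dropped is exactly
the open question raised here, so this is not a definitive hunk.]

	if (PageTransHuge(page)) {
		/*
		 * State the THP gfp policy in one place instead of patching
		 * the common mask; the costly order already keeps the OOM
		 * killer away, so __GFP_RETRY_MAYFAIL is arguably redundant.
		 */
		gfp_mask = GFP_TRANSHUGE;
		order = HPAGE_PMD_ORDER;
	}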
On 7/7/20 9:44 AM, [email protected] wrote:
> From: Joonsoo Kim <[email protected]>
>
> There are some similar functions for migration target allocation. Since
> there is no fundamental difference, it's better to keep just one rather
> than keeping all variants. This patch implements base migration target
> allocation function. In the following patches, variants will be converted
> to use this function.
>
> Changes should be mechanical but there are some differences. First, Some
> callers' nodemask is assgined to NULL since NULL nodemask will be
> considered as all available nodes, that is, &node_states[N_MEMORY].
> Second, for hugetlb page allocation, gfp_mask is ORed since a user could
> provide a gfp_mask from now on.
I think that's wrong. See how htlb_alloc_mask() chooses between
GFP_HIGHUSER_MOVABLE and GFP_HIGHUSER, but then you OR it with __GFP_MOVABLE, so
the result is always GFP_HIGHUSER_MOVABLE.
Yeah, the gfp_mask for hugetlb becomes exposed in new_page_nodemask() after the
v4 3/11 patch, but that doesn't mean we can start modifying it :/
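[Editor's note: to make the concern concrete, a sketch of the two
variants; htlb_alloc_mask() returns GFP_HIGHUSER_MOVABLE only when
migration is supported for the hstate, GFP_HIGHUSER otherwise. This is
illustrative, not a proposed hunk.]

	if (PageHuge(page)) {
		struct hstate *h = page_hstate(compound_head(page));

		/*
		 * ORing loses the distinction: the caller-provided mask
		 * already contains __GFP_MOVABLE, so the result is always
		 * GFP_HIGHUSER_MOVABLE even when htlb_alloc_mask() chose
		 * GFP_HIGHUSER.
		 *
		 *	gfp_mask |= htlb_alloc_mask(h);
		 *
		 * Overriding keeps htlb_alloc_mask()'s decision intact
		 * before calling alloc_huge_page_nodemask() as before.
		 */
		gfp_mask = htlb_alloc_mask(h);
	}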
On 7/7/20 9:44 AM, [email protected] wrote:
> From: Joonsoo Kim <[email protected]>
>
> To calculate the correct node to migrate the page for hotplug, we need
> to check node id of the page. Wrapper for alloc_migration_target() exists
> for this purpose.
>
> However, Vlastimil informs that all migration source pages come from
> a single node. In this case, we don't need to check the node id for each
> page and we don't need to re-set the target nodemask for each page by
> using the wrapper. Set up the migration_target_control once and use it for
> all pages.
>
> Signed-off-by: Joonsoo Kim <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Thanks! Nitpick below.
> @@ -1345,9 +1324,28 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> put_page(page);
> }
> if (!list_empty(&source)) {
> - /* Allocate a new page from the nearest neighbor node */
> - ret = migrate_pages(&source, new_node_page, NULL, 0,
> - MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
> + nodemask_t nmask = node_states[N_MEMORY];
> + struct migration_target_control mtc = {
> + .nmask = &nmask,
> + .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
> + };
> +
> + /*
> + * We have checked that migration range is on a single zone so
> + * we can use the nid of the first page to all the others.
> + */
> + mtc.nid = page_to_nid(list_first_entry(&source, struct page, lru));
> +
> + /*
> + * try to allocate from a different node but reuse this node
> + * if there are no other online nodes to be used (e.g. we are
> + * offlining a part of the only existing node)
> + */
> + node_clear(mtc.nid, *mtc.nmask);
> + if (nodes_empty(*mtc.nmask))
> + node_set(mtc.nid, *mtc.nmask);
You could have kept using 'nmask' instead of '*mtc.nmask'. Actually that applies
to patch 6 too, for less churn.
> + ret = migrate_pages(&source, alloc_migration_target, NULL,
> + (unsigned long)&mtc, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
> if (ret) {
> list_for_each_entry(page, &source, lru) {
> pr_warn("migrating pfn %lx failed ret:%d ",
>
On 7/7/20 9:44 AM, [email protected] wrote:
> From: Joonsoo Kim <[email protected]>
>
> There is a well-defined standard migration target callback. Use it
> directly.
>
> Signed-off-by: Joonsoo Kim <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
> ---
> mm/memory-failure.c | 18 ++++++------------
> 1 file changed, 6 insertions(+), 12 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 609d42b6..3b89804 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1677,16 +1677,6 @@ int unpoison_memory(unsigned long pfn)
> }
> EXPORT_SYMBOL(unpoison_memory);
>
> -static struct page *new_page(struct page *p, unsigned long private)
> -{
> - struct migration_target_control mtc = {
> - .nid = page_to_nid(p),
> - .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
> - };
> -
> - return alloc_migration_target(p, (unsigned long)&mtc);
> -}
> -
> /*
> * Safely get reference count of an arbitrary page.
> * Returns 0 for a free page, -EIO for a zero refcount page
> @@ -1793,6 +1783,10 @@ static int __soft_offline_page(struct page *page)
> const char *msg_page[] = {"page", "hugepage"};
> bool huge = PageHuge(page);
> LIST_HEAD(pagelist);
> + struct migration_target_control mtc = {
> + .nid = NUMA_NO_NODE,
> + .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
> + };
>
> /*
> * Check PageHWPoison again inside page lock because PageHWPoison
> @@ -1829,8 +1823,8 @@ static int __soft_offline_page(struct page *page)
> }
>
> if (isolate_page(hpage, &pagelist)) {
> - ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
> - MIGRATE_SYNC, MR_MEMORY_FAILURE);
> + ret = migrate_pages(&pagelist, alloc_migration_target, NULL,
> + (unsigned long)&mtc, MIGRATE_SYNC, MR_MEMORY_FAILURE);
> if (!ret) {
> bool release = !huge;
>
>
On 7/7/20 1:48 PM, Michal Hocko wrote:
> On Tue 07-07-20 16:44:48, Joonsoo Kim wrote:
>> From: Joonsoo Kim <[email protected]>
>>
>> There is a well-defined standard migration target callback. Use it
>> directly.
>>
>> Signed-off-by: Joonsoo Kim <[email protected]>
>> ---
>> mm/memory-failure.c | 18 ++++++------------
>> 1 file changed, 6 insertions(+), 12 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 609d42b6..3b89804 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1677,16 +1677,6 @@ int unpoison_memory(unsigned long pfn)
>> }
>> EXPORT_SYMBOL(unpoison_memory);
>>
>> -static struct page *new_page(struct page *p, unsigned long private)
>> -{
>> - struct migration_target_control mtc = {
>> - .nid = page_to_nid(p),
>> - .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
>> - };
>> -
>> - return alloc_migration_target(p, (unsigned long)&mtc);
>> -}
>> -
>> /*
>> * Safely get reference count of an arbitrary page.
>> * Returns 0 for a free page, -EIO for a zero refcount page
>> @@ -1793,6 +1783,10 @@ static int __soft_offline_page(struct page *page)
>> const char *msg_page[] = {"page", "hugepage"};
>> bool huge = PageHuge(page);
>> LIST_HEAD(pagelist);
>> + struct migration_target_control mtc = {
>> + .nid = NUMA_NO_NODE,
>> + .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
>> + };
>
> Is NUMA_NO_NODE really intended here? The original code has preferred to
> stay on the same node.
alloc_migration_target() interprets NUMA_NO_NODE as a request to use
page_to_nid(), so we don't need these thin wrappers that do just that. I
suggested this in the v3 review and it's mentioned in 06/11.
> If this is intentional then the changelog should
> be explicit about that.
>
>>
>> /*
>> * Check PageHWPoison again inside page lock because PageHWPoison
>> @@ -1829,8 +1823,8 @@ static int __soft_offline_page(struct page *page)
>> }
>>
>> if (isolate_page(hpage, &pagelist)) {
>> - ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
>> - MIGRATE_SYNC, MR_MEMORY_FAILURE);
>> + ret = migrate_pages(&pagelist, alloc_migration_target, NULL,
>> + (unsigned long)&mtc, MIGRATE_SYNC, MR_MEMORY_FAILURE);
>> if (!ret) {
>> bool release = !huge;
>>
>> --
>> 2.7.4
>>
>
On Tue 07-07-20 17:03:50, Vlastimil Babka wrote:
> On 7/7/20 1:48 PM, Michal Hocko wrote:
> > On Tue 07-07-20 16:44:48, Joonsoo Kim wrote:
> >> From: Joonsoo Kim <[email protected]>
> >>
> >> There is a well-defined standard migration target callback. Use it
> >> directly.
> >>
> >> Signed-off-by: Joonsoo Kim <[email protected]>
> >> ---
> >> mm/memory-failure.c | 18 ++++++------------
> >> 1 file changed, 6 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >> index 609d42b6..3b89804 100644
> >> --- a/mm/memory-failure.c
> >> +++ b/mm/memory-failure.c
> >> @@ -1677,16 +1677,6 @@ int unpoison_memory(unsigned long pfn)
> >> }
> >> EXPORT_SYMBOL(unpoison_memory);
> >>
> >> -static struct page *new_page(struct page *p, unsigned long private)
> >> -{
> >> - struct migration_target_control mtc = {
> >> - .nid = page_to_nid(p),
> >> - .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
> >> - };
> >> -
> >> - return alloc_migration_target(p, (unsigned long)&mtc);
> >> -}
> >> -
> >> /*
> >> * Safely get reference count of an arbitrary page.
> >> * Returns 0 for a free page, -EIO for a zero refcount page
> >> @@ -1793,6 +1783,10 @@ static int __soft_offline_page(struct page *page)
> >> const char *msg_page[] = {"page", "hugepage"};
> >> bool huge = PageHuge(page);
> >> LIST_HEAD(pagelist);
> >> + struct migration_target_control mtc = {
> >> + .nid = NUMA_NO_NODE,
> >> + .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
> >> + };
> >
> > Is NUMA_NO_NODE really intended here? The original code has preferred to
> > stay on the same node.
>
> The alloc_migration_target() interprets NUMA_NO_NODE as a request to call
> page_to_nid(), so we don't need these thin wrappers that do just that. I have
> suggested this in v3 review and it's mentioned in 06/11.
Ohh, right. I just lost that piece of information on the way. It
wouldn't hurt to keep page_to_nid here for readability though.
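[Editor's note: the readability trade-off in question, side by side;
illustrative only, assuming the migration_target_control struct from
mm/internal.h introduced earlier in the series, with 'page' standing for
the page being soft-offlined.]

	/* v4: rely on alloc_migration_target() resolving NUMA_NO_NODE itself */
	struct migration_target_control mtc = {
		.nid = NUMA_NO_NODE,
		.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
	};

	/* The preference voiced above: spell out the "stay on the source node" intent */
	struct migration_target_control mtc_explicit = {
		.nid = page_to_nid(page),
		.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
	};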
--
Michal Hocko
SUSE Labs
On Tue 07-07-20 16:49:51, Vlastimil Babka wrote:
> On 7/7/20 9:44 AM, [email protected] wrote:
> > From: Joonsoo Kim <[email protected]>
> >
> > There are some similar functions for migration target allocation. Since
> > there is no fundamental difference, it's better to keep just one rather
> > than keeping all variants. This patch implements base migration target
> > allocation function. In the following patches, variants will be converted
> > to use this function.
> >
> > Changes should be mechanical but there are some differences. First, Some
> > callers' nodemask is assgined to NULL since NULL nodemask will be
> > considered as all available nodes, that is, &node_states[N_MEMORY].
> > Second, for hugetlb page allocation, gfp_mask is ORed since a user could
> > provide a gfp_mask from now on.
>
> I think that's wrong. See how htlb_alloc_mask() determines between
> GFP_HIGHUSER_MOVABLE and GFP_HIGHUSER, but then you OR it with __GFP_MOVABLE so
> it's always GFP_HIGHUSER_MOVABLE.
Right you are! Not that it would make any real difference, because only
migratable hugetlb pages will get __GFP_MOVABLE and so we shouldn't
really end up here for !movable pages in the first place (not sure about
soft offlining at this moment). But yeah, it would simply be better to
override the gfp mask for hugetlb, which we have been doing anyway.
--
Michal Hocko
SUSE Labs
On Tue 07-07-20 13:31:19, Michal Hocko wrote:
> Btw. you are keeping his acks even
> after considerable changes to patches which I am not really sure he is
> ok with.
I am sorry but I have missed the last email from Mike in v3.
--
Michal Hocko
SUSE Labs
On Tue, Jul 07, 2020 at 01:31:16PM +0200, Michal Hocko wrote:
> On Tue 07-07-20 16:44:42, Joonsoo Kim wrote:
> > From: Joonsoo Kim <[email protected]>
> >
> > new_non_cma_page() in gup.c which try to allocate migration target page
> > requires to allocate the new page that is not on the CMA area.
> > new_non_cma_page() implements it by removing __GFP_MOVABLE flag. This way
> > works well for THP page or normal page but not for hugetlb page.
> >
> > hugetlb page allocation process consists of two steps. First is dequeing
> > from the pool. Second is, if there is no available page on the queue,
> > allocating from the page allocator.
> >
> > new_non_cma_page() can control allocation from the page allocator by
> > specifying correct gfp flag. However, dequeing cannot be controlled until
> > now, so, new_non_cma_page() skips dequeing completely. It is a suboptimal
> > since new_non_cma_page() cannot utilize hugetlb pages on the queue so this
> > patch tries to fix this situation.
> >
> > This patch makes the deque function on hugetlb CMA aware and skip CMA
> > pages if newly added skip_cma argument is passed as true.
>
> I really dislike this as already mentioned in the previous version of
> the patch. You are making hugetlb and only one part of its allocator a
> special snowflake which is almost always a bad idea. Your changelog
> lacks any real justification for this inconsistency.
>
> Also by doing so you are keeping an existing bug that the hugetlb
> allocator doesn't respect scope gfp flags as I have mentioned when
> reviewing the previous version. That bug likely doesn't matter now but
> it might in future and as soon as it is fixed all this is just a
> pointless exercise.
>
> I do not have energy and time to burn to repeat that argumentation to I
> will let Mike to have a final word. Btw. you are keeping his acks even
> after considerable changes to patches which I am not really sure he is
> ok with.
As you replied a minute ago, Mike acked.
> > Acked-by: Mike Kravetz <[email protected]>
> > Signed-off-by: Joonsoo Kim <[email protected]>
>
> To this particular patch.
> [...]
>
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 5daadae..2c3dab4 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -1630,11 +1630,12 @@ static struct page *new_non_cma_page(struct page *page, unsigned long private)
> > #ifdef CONFIG_HUGETLB_PAGE
> > if (PageHuge(page)) {
> > struct hstate *h = page_hstate(page);
> > +
> > /*
> > * We don't want to dequeue from the pool because pool pages will
> > * mostly be from the CMA region.
> > */
> > - return alloc_migrate_huge_page(h, gfp_mask, nid, NULL);
> > + return alloc_huge_page_nodemask(h, nid, NULL, gfp_mask, true);
>
> Let me repeat that this whole thing is running under
> memalloc_nocma_save. So additional parameter is bogus.
As Vlastimil said in another reply, we are not under
memalloc_nocma_save(). Anyway, I now also think that the additional
parameter isn't needed.
> [...]
> > -static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
> > +static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid, bool skip_cma)
>
> If you really insist on an additional parameter at this layer than it
> should be checking for the PF_MEMALLOC_NOCMA instead.
I will change the patch to check PF_MEMALLOC_NOCMA instead of
introducing and checking skip_cma.
> [...]
> > @@ -1971,21 +1977,29 @@ struct page *alloc_buddy_huge_page_with_mpol(struct hstate *h,
> >
> > /* page migration callback function */
> > struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
> > - nodemask_t *nmask, gfp_t gfp_mask)
> > + nodemask_t *nmask, gfp_t gfp_mask, bool skip_cma)
> > {
> > + unsigned int flags = 0;
> > + struct page *page = NULL;
> > +
> > + if (skip_cma)
> > + flags = memalloc_nocma_save();
>
> This is pointless for a scope that is already defined up in the call
> chain and fundamentally this is breaking the expected use of the scope
> API. The primary reason for that API to exist is to define the scope and
> have it sticky for _all_ allocation contexts. So if you have to use it
> deep in the allocator then you are doing something wrong.
As mentioned above, we are not under memalloc_nocma_save(). Anyway, I
will rework the patch and attach it to Vlastimil's reply. It would be
appreciated if you could check it again.
Thanks.
On Tue, Jul 07, 2020 at 01:22:31PM +0200, Vlastimil Babka wrote:
> On 7/7/20 9:44 AM, [email protected] wrote:
> > From: Joonsoo Kim <[email protected]>
> >
> > new_non_cma_page() in gup.c which try to allocate migration target page
> > requires to allocate the new page that is not on the CMA area.
> > new_non_cma_page() implements it by removing __GFP_MOVABLE flag. This way
> > works well for THP page or normal page but not for hugetlb page.
> >
> > hugetlb page allocation process consists of two steps. First is dequeing
> > from the pool. Second is, if there is no available page on the queue,
> > allocating from the page allocator.
> >
> > new_non_cma_page() can control allocation from the page allocator by
> > specifying correct gfp flag. However, dequeing cannot be controlled until
> > now, so, new_non_cma_page() skips dequeing completely. It is a suboptimal
> > since new_non_cma_page() cannot utilize hugetlb pages on the queue so this
> > patch tries to fix this situation.
> >
> > This patch makes the deque function on hugetlb CMA aware and skip CMA
> > pages if newly added skip_cma argument is passed as true.
>
> Hmm, can't you instead change dequeue_huge_page_node_exact() to test the PF_
> flag and avoid adding bool skip_cma everywhere?
Okay! Please check the following patch.
>
> I think that's what Michal suggested [1] except he said "the code already does
> by memalloc_nocma_{save,restore} API". It needs extending a bit though, AFAICS.
> __gup_longterm_locked() indeed does the save/restore, but restore comes before
> check_and_migrate_cma_pages() and thus new_non_cma_page() is called, so an
> adjustment is needed there, but that's all?
>
> Hm the adjustment should be also done because save/restore is done around
> __get_user_pages_locked(), but check_and_migrate_cma_pages() also calls
> __get_user_pages_locked(), and that call not being between nocma save and
> restore is thus also a correctness issue?
Simply, I call memalloc_nocma_{save,restore} in new_non_cma_page(). It
would not cause any problem.
------------------>8-------------------
From bcfc57e3c6f2df1ad2940308b89d740cd3f0fba8 Mon Sep 17 00:00:00 2001
From: Joonsoo Kim <[email protected]>
Date: Wed, 8 Jul 2020 14:39:26 +0900
Subject: [PATCH] mm/hugetlb: make hugetlb migration callback CMA aware
new_non_cma_page() in gup.c, which tries to allocate a migration target
page, needs to allocate the new page outside of the CMA area.
new_non_cma_page() implements this by removing the __GFP_MOVABLE flag.
This works well for THP and normal pages but not for hugetlb pages.
hugetlb page allocation consists of two steps. The first is dequeuing
from the pool; the second, if there is no available page in the pool, is
allocating from the page allocator.
new_non_cma_page() can control allocation from the page allocator by
specifying the correct gfp flags. However, dequeuing could not be
controlled until now, so new_non_cma_page() skips dequeuing completely.
That is suboptimal since new_non_cma_page() cannot utilize hugetlb pages
in the pool, so this patch tries to fix this situation.
This patch makes new_non_cma_page() use memalloc_nocma_{save,restore}
to exclude CMA memory rather than manually clearing __GFP_MOVABLE. It
also makes the dequeue function in hugetlb CMA aware: in the dequeue
function, CMA pages are skipped if the PF_MEMALLOC_NOCMA flag is set
by memalloc_nocma_{save,restore}.
Acked-by: Mike Kravetz <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
---
include/linux/hugetlb.h | 2 --
mm/gup.c | 32 +++++++++++++++-----------------
mm/hugetlb.c | 11 +++++++++--
3 files changed, 24 insertions(+), 21 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index bb93e95..34a10e5 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -509,8 +509,6 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
nodemask_t *nmask, gfp_t gfp_mask);
struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
unsigned long address);
-struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
- int nid, nodemask_t *nmask);
int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
pgoff_t idx);
diff --git a/mm/gup.c b/mm/gup.c
index 5daadae..79142a9 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1623,6 +1623,8 @@ static struct page *new_non_cma_page(struct page *page, unsigned long private)
* allocation memory.
*/
gfp_t gfp_mask = GFP_USER | __GFP_NOWARN;
+ unsigned int flags = memalloc_nocma_save();
+ struct page *new_page = NULL;
if (PageHighMem(page))
gfp_mask |= __GFP_HIGHMEM;
@@ -1630,33 +1632,29 @@ static struct page *new_non_cma_page(struct page *page, unsigned long private)
#ifdef CONFIG_HUGETLB_PAGE
if (PageHuge(page)) {
struct hstate *h = page_hstate(page);
- /*
- * We don't want to dequeue from the pool because pool pages will
- * mostly be from the CMA region.
- */
- return alloc_migrate_huge_page(h, gfp_mask, nid, NULL);
+
+ new_page = alloc_huge_page_nodemask(h, nid, NULL, gfp_mask);
+ goto out;
}
#endif
+
if (PageTransHuge(page)) {
- struct page *thp;
/*
* ignore allocation failure warnings
*/
gfp_t thp_gfpmask = GFP_TRANSHUGE | __GFP_NOWARN;
- /*
- * Remove the movable mask so that we don't allocate from
- * CMA area again.
- */
- thp_gfpmask &= ~__GFP_MOVABLE;
- thp = __alloc_pages_node(nid, thp_gfpmask, HPAGE_PMD_ORDER);
- if (!thp)
- return NULL;
- prep_transhuge_page(thp);
- return thp;
+ new_page = __alloc_pages_node(nid, thp_gfpmask, HPAGE_PMD_ORDER);
+ if (new_page)
+ prep_transhuge_page(new_page);
+ goto out;
}
- return __alloc_pages_node(nid, gfp_mask, 0);
+ new_page = __alloc_pages_node(nid, gfp_mask, 0);
+
+out:
+ memalloc_nocma_restore(flags);
+ return new_page;
}
static long check_and_migrate_cma_pages(struct task_struct *tsk,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3245aa0..514e29c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -29,6 +29,7 @@
#include <linux/numa.h>
#include <linux/llist.h>
#include <linux/cma.h>
+#include <linux/sched/mm.h>
#include <asm/page.h>
#include <asm/tlb.h>
@@ -1036,10 +1037,16 @@ static void enqueue_huge_page(struct hstate *h, struct page *page)
static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
{
struct page *page;
+ bool nocma = !!(READ_ONCE(current->flags) & PF_MEMALLOC_NOCMA);
+
+ list_for_each_entry(page, &h->hugepage_freelists[nid], lru) {
+ if (nocma && is_migrate_cma_page(page))
+ continue;
- list_for_each_entry(page, &h->hugepage_freelists[nid], lru)
if (!PageHWPoison(page))
break;
+ }
+
/*
* if 'non-isolated free hugepage' not found on the list,
* the allocation fails.
@@ -1928,7 +1935,7 @@ static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask,
return page;
}
-struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
+static struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
int nid, nodemask_t *nmask)
{
struct page *page;
--
2.7.4
On Tue, Jul 07, 2020 at 02:17:55PM +0200, Vlastimil Babka wrote:
> On 7/7/20 9:44 AM, [email protected] wrote:
> > From: Joonsoo Kim <[email protected]>
> >
> > In mm/migrate.c, THP allocation for migration is called with the provided
> > gfp_mask | GFP_TRANSHUGE. This gfp_mask contains __GFP_RECLAIM and it
> > would be conflict with the intention of the GFP_TRANSHUGE.
> >
> > GFP_TRANSHUGE/GFP_TRANSHUGE_LIGHT is introduced to control the reclaim
> > behaviour by well defined manner since overhead of THP allocation is
> > quite large and the whole system could suffer from it. So, they deals
> > with __GFP_RECLAIM mask deliberately. If gfp_mask contains __GFP_RECLAIM
> > and uses gfp_mask | GFP_TRANSHUGE(_LIGHT) for THP allocation, it means
> > that it breaks the purpose of the GFP_TRANSHUGE(_LIGHT).
> >
> > This patch fixes this situation by clearing __GFP_RECLAIM in provided
> > gfp_mask. Note that there are some other THP allocations for migration
> > and they just uses GFP_TRANSHUGE(_LIGHT) directly. This patch would make
> > all THP allocation for migration consistent.
> >
> > Signed-off-by: Joonsoo Kim <[email protected]>
> > ---
> > mm/migrate.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index 02b31fe..ecd7615 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -1547,6 +1547,11 @@ struct page *new_page_nodemask(struct page *page,
> > }
> >
> > if (PageTransHuge(page)) {
> > + /*
> > + * clear __GFP_RECALIM since GFP_TRANSHUGE is the gfp_mask
> > + * that chooses the reclaim masks deliberately.
> > + */
> > + gfp_mask &= ~__GFP_RECLAIM;
> > gfp_mask |= GFP_TRANSHUGE;
>
> In addition to what Michal said...
>
> The mask is not passed to this function, so I would just redefine it, as is done
> in the hugetlb case. We probably don't even need the __GFP_RETRY_MAYFAIL for the
> THP case asi it's just there to prevent OOM kill (per commit 0f55685627d6d ) and
> the costly order of THP is enough for that.
Will check.
Thanks.
On Tue, Jul 07, 2020 at 01:40:19PM +0200, Michal Hocko wrote:
> On Tue 07-07-20 16:44:43, Joonsoo Kim wrote:
> > From: Joonsoo Kim <[email protected]>
> >
> > In mm/migrate.c, THP allocation for migration is called with the provided
> > gfp_mask | GFP_TRANSHUGE. This gfp_mask contains __GFP_RECLAIM and it
> > would be conflict with the intention of the GFP_TRANSHUGE.
> >
> > GFP_TRANSHUGE/GFP_TRANSHUGE_LIGHT is introduced to control the reclaim
> > behaviour by well defined manner since overhead of THP allocation is
> > quite large and the whole system could suffer from it. So, they deals
> > with __GFP_RECLAIM mask deliberately. If gfp_mask contains __GFP_RECLAIM
> > and uses gfp_mask | GFP_TRANSHUGE(_LIGHT) for THP allocation, it means
> > that it breaks the purpose of the GFP_TRANSHUGE(_LIGHT).
>
> GFP_TRANSHUGE* is not a carved in stone design. Their primary reason to
> exist is to control how hard to try for different allocation
> paths/configurations because their latency expectations might be
> largerly different. It is mostly the #PF path which aims to be as
> lightweight as possible I believe nobody simply considered migration to be
> very significant to even care. And I am still not sure it matters but
> I would tend to agree that a consistency here is probably a very minor
> plus.
>
> Your changelog is slightly misleading in that regard because it suggests
> that this is a real problem while it doesn't present any actual data.
> It would be really nice to make the effective change really stand out.
> We are only talking about __GFP_RECLAIM_KSWAPD here. So the only
> difference is that the migration won't wake up kswapd now.
>
> All that being said the changelog should be probably more explicit about
> the fact that this is solely done for consistency and be honest that the
> runtime effect is not really clear. This would help people reading it in
> future.
Okay. How about the following changelog?
Thanks.
----------->8--------------------
Subject: [PATCH] mm/migrate: clear __GFP_RECLAIM for THP allocation for
migration
In the migration target allocation functions, THP allocations use
different gfp_masks, especially with regard to the reclaim gfp_mask.
There is no reason to use a different reclaim gfp_mask for each case,
and it is an obstacle to making a common function in order to clean up
the migration target allocation functions. This patch fixes this
situation by using a common reclaim gfp_mask for THP allocation.
On Tue, Jul 07, 2020 at 01:46:14PM +0200, Michal Hocko wrote:
> On Tue 07-07-20 16:44:45, Joonsoo Kim wrote:
> [...]
> > @@ -1551,9 +1552,12 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
> >
> > gfp_mask |= htlb_alloc_mask(h);
> > return alloc_huge_page_nodemask(h, nid, mtc->nmask,
> > - gfp_mask, false);
> > + gfp_mask, mtc->skip_cma);
> > }
> >
> > + if (mtc->skip_cma)
> > + flags = memalloc_nocma_save();
> > +
>
> As already mentioned in previous email this is a completely wrong usage
> of the scope API. The scope should be defined by the caller and this
> should be all transparent by the allocator layer.
Okay. Like as newly sent patch for 04/11, this patch will also be changed.
Thanks.
On Wed 08-07-20 16:16:02, Joonsoo Kim wrote:
> On Tue, Jul 07, 2020 at 01:22:31PM +0200, Vlastimil Babka wrote:
> > On 7/7/20 9:44 AM, [email protected] wrote:
> > > From: Joonsoo Kim <[email protected]>
> > >
> > > new_non_cma_page() in gup.c which try to allocate migration target page
> > > requires to allocate the new page that is not on the CMA area.
> > > new_non_cma_page() implements it by removing __GFP_MOVABLE flag. This way
> > > works well for THP page or normal page but not for hugetlb page.
> > >
> > > hugetlb page allocation process consists of two steps. First is dequeing
> > > from the pool. Second is, if there is no available page on the queue,
> > > allocating from the page allocator.
> > >
> > > new_non_cma_page() can control allocation from the page allocator by
> > > specifying correct gfp flag. However, dequeing cannot be controlled until
> > > now, so, new_non_cma_page() skips dequeing completely. It is a suboptimal
> > > since new_non_cma_page() cannot utilize hugetlb pages on the queue so this
> > > patch tries to fix this situation.
> > >
> > > This patch makes the deque function on hugetlb CMA aware and skip CMA
> > > pages if newly added skip_cma argument is passed as true.
> >
> > Hmm, can't you instead change dequeue_huge_page_node_exact() to test the PF_
> > flag and avoid adding bool skip_cma everywhere?
>
> Okay! Please check following patch.
> >
> > I think that's what Michal suggested [1] except he said "the code already does
> > by memalloc_nocma_{save,restore} API". It needs extending a bit though, AFAICS.
> > __gup_longterm_locked() indeed does the save/restore, but restore comes before
> > check_and_migrate_cma_pages() and thus new_non_cma_page() is called, so an
> > adjustment is needed there, but that's all?
> >
> > Hm the adjustment should be also done because save/restore is done around
> > __get_user_pages_locked(), but check_and_migrate_cma_pages() also calls
> > __get_user_pages_locked(), and that call not being between nocma save and
> > restore is thus also a correctness issue?
>
> Simply, I call memalloc_nocma_{save,restore} in new_non_cma_page(). It
> would not cause any problem.
I believe a proper fix is the following. The scope is really defined for
FOLL_LONGTERM pins, and pushing it inside check_and_migrate_cma_pages
would solve the problem as well, but IMHO it makes more sense to do it
in the caller, the same way we do for the others.
Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region")
I am not sure this is worth backporting to stable yet.
diff --git a/mm/gup.c b/mm/gup.c
index de9e36262ccb..75980dd5a2fc 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1794,7 +1794,6 @@ static long __gup_longterm_locked(struct task_struct *tsk,
vmas_tmp, NULL, gup_flags);
if (gup_flags & FOLL_LONGTERM) {
- memalloc_nocma_restore(flags);
if (rc < 0)
goto out;
@@ -1802,11 +1801,13 @@ static long __gup_longterm_locked(struct task_struct *tsk,
for (i = 0; i < rc; i++)
put_page(pages[i]);
rc = -EOPNOTSUPP;
+ memalloc_nocma_restore(flags);
goto out;
}
rc = check_and_migrate_cma_pages(tsk, mm, start, rc, pages,
vmas_tmp, gup_flags);
+ memalloc_nocma_restore(flags);
}
out:
--
Michal Hocko
SUSE Labs
On Wed 08-07-20 16:19:17, Joonsoo Kim wrote:
> On Tue, Jul 07, 2020 at 01:40:19PM +0200, Michal Hocko wrote:
[...]
> Subject: [PATCH] mm/migrate: clear __GFP_RECLAIM for THP allocation for
> migration
>
> In migration target allocation functions, THP allocations uses different
> gfp_mask, especially, in regard to the reclaim gfp_mask. There is no
> reason to use different reclaim gfp_mask for each cases and it is
> an obstacle to make a common function in order to clean-up migration
> target allocation functions. This patch fixes this situation by using
> common reclaim gfp_mask for THP allocation.
I would find the following more understandable, feel free to reuse parts
that you like:
"
new_page_nodemask is a migration callback and it tries to use common
gfp flags for the target page allocation whether it is a base page or a
THP. The latter only adds GFP_TRANSHUGE to the given mask. This results
in the allocation being slightly more aggressive than necessary because
the resulting gfp mask will also contain __GFP_KSWAPD_RECLAIM. THP
allocations usually exclude this flag to reduce over-eager background
reclaim during a high THP allocation load, which has been seen during
large mmaps initialization. There is no indication that this is a
problem for migration as well, but theoretically the same might happen
when migrating large mappings to a different node. Make the migration
callback consistent with regular THP allocations.
"
--
Michal Hocko
SUSE Labs
On 7/8/20 9:41 AM, Michal Hocko wrote:
> On Wed 08-07-20 16:16:02, Joonsoo Kim wrote:
>> On Tue, Jul 07, 2020 at 01:22:31PM +0200, Vlastimil Babka wrote:
>>
>> Simply, I call memalloc_nocma_{save,restore} in new_non_cma_page(). It
>> would not cause any problem.
>
> I believe a proper fix is the following. The scope is really defined for
> FOLL_LONGTERM pins and pushing it inside check_and_migrate_cma_pages
> will solve the problem as well but it imho makes more sense to do it in
> the caller the same way we do for any others.
>
> Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region")
Agreed.
>
> I am not sure this is worth backporting to stable yet.
CC Aneesh.
Context: since check_and_migrate_cma_pages() calls __get_user_pages_locked(), it
should also be called under memalloc_nocma_save().
> diff --git a/mm/gup.c b/mm/gup.c
> index de9e36262ccb..75980dd5a2fc 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1794,7 +1794,6 @@ static long __gup_longterm_locked(struct task_struct *tsk,
> vmas_tmp, NULL, gup_flags);
>
> if (gup_flags & FOLL_LONGTERM) {
> - memalloc_nocma_restore(flags);
> if (rc < 0)
> goto out;
>
> @@ -1802,11 +1801,13 @@ static long __gup_longterm_locked(struct task_struct *tsk,
> for (i = 0; i < rc; i++)
> put_page(pages[i]);
> rc = -EOPNOTSUPP;
> + memalloc_nocma_restore(flags);
> goto out;
> }
>
> rc = check_and_migrate_cma_pages(tsk, mm, start, rc, pages,
> vmas_tmp, gup_flags);
> + memalloc_nocma_restore(flags);
> }
>
> out:
>
Vlastimil Babka <[email protected]> writes:
> On 7/8/20 9:41 AM, Michal Hocko wrote:
>> On Wed 08-07-20 16:16:02, Joonsoo Kim wrote:
>>> On Tue, Jul 07, 2020 at 01:22:31PM +0200, Vlastimil Babka wrote:
>>>
>>> Simply, I call memalloc_nocma_{save,restore} in new_non_cma_page(). It
>>> would not cause any problem.
>>
>> I believe a proper fix is the following. The scope is really defined for
>> FOLL_LONGTERM pins and pushing it inside check_and_migrate_cma_pages
>> will solve the problem as well but it imho makes more sense to do it in
>> the caller the same way we do for any others.
>>
>> Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region")
>
> Agreed.
>
>>
>> I am not sure this is worth backporting to stable yet.
>
> CC Aneesh.
>
> Context: since check_and_migrate_cma_pages() calls __get_user_pages_locked(), it
> should also be called under memalloc_nocma_save().
But by then we have faulted in all the relevant pages and migrated them
out of the CMA area, right?
>
>> diff --git a/mm/gup.c b/mm/gup.c
>> index de9e36262ccb..75980dd5a2fc 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -1794,7 +1794,6 @@ static long __gup_longterm_locked(struct task_struct *tsk,
>> vmas_tmp, NULL, gup_flags);
>>
>> if (gup_flags & FOLL_LONGTERM) {
>> - memalloc_nocma_restore(flags);
>> if (rc < 0)
>> goto out;
>>
>> @@ -1802,11 +1801,13 @@ static long __gup_longterm_locked(struct task_struct *tsk,
>> for (i = 0; i < rc; i++)
>> put_page(pages[i]);
>> rc = -EOPNOTSUPP;
>> + memalloc_nocma_restore(flags);
>> goto out;
>> }
>>
>> rc = check_and_migrate_cma_pages(tsk, mm, start, rc, pages,
>> vmas_tmp, gup_flags);
>> + memalloc_nocma_restore(flags);
>> }
>>
>> out:
>>
-aneesh
On Wed 08-07-20 16:27:16, Aneesh Kumar K.V wrote:
> Vlastimil Babka <[email protected]> writes:
>
> > On 7/8/20 9:41 AM, Michal Hocko wrote:
> >> On Wed 08-07-20 16:16:02, Joonsoo Kim wrote:
> >>> On Tue, Jul 07, 2020 at 01:22:31PM +0200, Vlastimil Babka wrote:
> >>>
> >>> Simply, I call memalloc_nocma_{save,restore} in new_non_cma_page(). It
> >>> would not cause any problem.
> >>
> >> I believe a proper fix is the following. The scope is really defined for
> >> FOLL_LONGTERM pins and pushing it inside check_and_migrate_cma_pages
> >> will solve the problem as well but it imho makes more sense to do it in
> >> the caller the same way we do for any others.
> >>
> >> Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region")
> >
> > Agreed.
> >
> >>
> >> I am not sure this is worth backporting to stable yet.
> >
> > CC Aneesh.
> >
> > Context: since check_and_migrate_cma_pages() calls __get_user_pages_locked(), it
> > should also be called under memalloc_nocma_save().
>
> But by then we faulted in all relevant pages and migrated them out of
> CMA rea right?
check_and_migrate_cma_pages() will allocate the target pages that you
want to migrate off the CMA region, unless I am misreading the code. And
those allocations need to be placed outside of CMA.
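[Editor's note: for context, a simplified sketch of the resulting
ordering in __gup_longterm_locked() with the fix above applied;
signatures are abbreviated and error handling omitted, so this is not
the literal mm/gup.c code.]

static long gup_longterm_sketch(struct task_struct *tsk, struct mm_struct *mm,
				unsigned long start, unsigned long nr_pages,
				struct page **pages, unsigned int gup_flags)
{
	unsigned int flags = memalloc_nocma_save();
	long rc;

	/* Fault the pages in; any allocation here must avoid CMA. */
	rc = __get_user_pages_locked(tsk, mm, start, nr_pages, pages,
				     NULL, NULL, gup_flags);

	/*
	 * The migration pass allocates the target pages and may call
	 * __get_user_pages_locked() again, so it has to stay inside the
	 * no-CMA scope as well.
	 */
	if (rc > 0)
		rc = check_and_migrate_cma_pages(tsk, mm, start, rc, pages,
						 NULL, gup_flags);

	memalloc_nocma_restore(flags);	/* restore only after the migration pass */
	return rc;
}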
--
Michal Hocko
SUSE Labs
On 7/8/20 12:16 AM, Joonsoo Kim wrote:
> On Tue, Jul 07, 2020 at 01:22:31PM +0200, Vlastimil Babka wrote:
>> On 7/7/20 9:44 AM, [email protected] wrote:
>>> From: Joonsoo Kim <[email protected]>
>>>
<...>
>>> This patch makes the deque function on hugetlb CMA aware and skip CMA
>>> pages if newly added skip_cma argument is passed as true.
>>
>> Hmm, can't you instead change dequeue_huge_page_node_exact() to test the PF_
>> flag and avoid adding bool skip_cma everywhere?
>
> Okay! Please check following patch.
>>
>> I think that's what Michal suggested [1] except he said "the code already does
>> by memalloc_nocma_{save,restore} API". It needs extending a bit though, AFAICS.
>> __gup_longterm_locked() indeed does the save/restore, but restore comes before
>> check_and_migrate_cma_pages() and thus new_non_cma_page() is called, so an
>> adjustment is needed there, but that's all?
>>
>> Hm the adjustment should be also done because save/restore is done around
>> __get_user_pages_locked(), but check_and_migrate_cma_pages() also calls
>> __get_user_pages_locked(), and that call not being between nocma save and
>> restore is thus also a correctness issue?
>
> Simply, I call memalloc_nocma_{save,restore} in new_non_cma_page(). It
> would not cause any problem.
>
> ------------------>8-------------------
> From bcfc57e3c6f2df1ad2940308b89d740cd3f0fba8 Mon Sep 17 00:00:00 2001
> From: Joonsoo Kim <[email protected]>
> Date: Wed, 8 Jul 2020 14:39:26 +0900
> Subject: [PATCH] mm/hugetlb: make hugetlb migration callback CMA aware
>
<...>
>
> This patch makes new_non_cma_page() uses memalloc_nocma_{save,restore}
> to exclude CMA memory rather than manually clearing __GFP_MOVABLE. And,
> this patch also makes the deque function on hugetlb CMA aware. In the
> deque function, CMA memory is skipped if PF_MEMALLOC_NOCMA flag is set
> by memalloc_nocma_{save,restore}.
>
> Acked-by: Mike Kravetz <[email protected]>
> Signed-off-by: Joonsoo Kim <[email protected]>
I did ACK the previous version of the patch, but I like this much better.
I assume there will be a new version built on top of Michal's patch to
change the placement of memalloc_nocma_restore calls in __gup_longterm_locked.
--
Mike Kravetz
On Wed, Jul 8, 2020 at 1:34 AM, Vlastimil Babka <[email protected]> wrote:
>
> On 7/7/20 9:44 AM, [email protected] wrote:
> > From: Joonsoo Kim <[email protected]>
> >
> > To calculate the correct node to migrate the page for hotplug, we need
> > to check node id of the page. Wrapper for alloc_migration_target() exists
> > for this purpose.
> >
> > However, Vlastimil informs that all migration source pages come from
> > a single node. In this case, we don't need to check the node id for each
> > page and we don't need to re-set the target nodemask for each page by
> > using the wrapper. Set up the migration_target_control once and use it for
> > all pages.
> >
> > Signed-off-by: Joonsoo Kim <[email protected]>
>
> Acked-by: Vlastimil Babka <[email protected]>
>
> Thanks! Nitpick below.
>
> > @@ -1345,9 +1324,28 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> > put_page(page);
> > }
> > if (!list_empty(&source)) {
> > - /* Allocate a new page from the nearest neighbor node */
> > - ret = migrate_pages(&source, new_node_page, NULL, 0,
> > - MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
> > + nodemask_t nmask = node_states[N_MEMORY];
> > + struct migration_target_control mtc = {
> > + .nmask = &nmask,
> > + .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
> > + };
> > +
> > + /*
> > + * We have checked that migration range is on a single zone so
> > + * we can use the nid of the first page to all the others.
> > + */
> > + mtc.nid = page_to_nid(list_first_entry(&source, struct page, lru));
> > +
> > + /*
> > + * try to allocate from a different node but reuse this node
> > + * if there are no other online nodes to be used (e.g. we are
> > + * offlining a part of the only existing node)
> > + */
> > + node_clear(mtc.nid, *mtc.nmask);
> > + if (nodes_empty(*mtc.nmask))
> > + node_set(mtc.nid, *mtc.nmask);
>
> You could have kept using 'nmask' instead of '*mtc.nmask'. Actually that applies
> to patch 6 too, for less churn.
You are right. I will change it.
Thanks.
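
The mtc set up above is then handed to migrate_pages() through its existing
'private' argument, roughly as below. This is a sketch based on the standard
allocation callback introduced by this series; the exact call site may differ.

	/* pass the per-range control to the common allocation callback */
	ret = migrate_pages(&source, alloc_migration_target, NULL,
			    (unsigned long)&mtc, MIGRATE_SYNC,
			    MR_MEMORY_HOTPLUG);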
On Wed, Jul 8, 2020 at 4:48 PM, Michal Hocko <[email protected]> wrote:
>
> On Wed 08-07-20 16:19:17, Joonsoo Kim wrote:
> > On Tue, Jul 07, 2020 at 01:40:19PM +0200, Michal Hocko wrote:
> [...]
> > Subject: [PATCH] mm/migrate: clear __GFP_RECLAIM for THP allocation for
> > migration
> >
> > In migration target allocation functions, THP allocation uses a different
> > gfp_mask, especially in regard to the reclaim gfp_mask. There is no
> > reason to use a different reclaim gfp_mask for each case, and it is
> > an obstacle to making a common function in order to clean up migration
> > target allocation functions. This patch fixes this situation by using
> > a common reclaim gfp_mask for THP allocation.
>
> I would find the following more understandable, feel free to reuse parts
> that you like:
> "
> new_page_nodemask is a migration callback and it tries to use common
> gfp flags for the target page allocation whether it is a base page or a
> THP. The latter only adds GFP_TRANSHUGE to the given mask. This results
> in the allocation being slightly more aggressive than necessary because
> the resulting gfp mask will contain also __GFP_RECLAIM_KSWAPD. THP
> allocations usually exclude this flag to reduce over eager background
> reclaim during a high THP allocation load which has been seen during
> large mmaps initialization. There is no indication that this is a
> problem for migration as well but theoretically the same might happen
> when migrating large mappings to a different node. Make the migration
> callback consistent with regular THP allocations.
> "
Looks good!
I will use this description.
Thanks.
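
For context, this is roughly how the relevant flags are defined in
include/linux/gfp.h around this time (paraphrased, check the exact tree). It
shows why ORing a caller mask that already contains __GFP_RECLAIM into
GFP_TRANSHUGE(_LIGHT) re-adds the kswapd bit the THP masks deliberately drop.

#define __GFP_RECLAIM		((__force gfp_t)(___GFP_DIRECT_RECLAIM | \
					 ___GFP_KSWAPD_RECLAIM))
#define GFP_TRANSHUGE_LIGHT	((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
				  __GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM)
#define GFP_TRANSHUGE		(GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM)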
On Wed 08-07-20 09:41:06, Michal Hocko wrote:
> On Wed 08-07-20 16:16:02, Joonsoo Kim wrote:
> > On Tue, Jul 07, 2020 at 01:22:31PM +0200, Vlastimil Babka wrote:
> > > On 7/7/20 9:44 AM, [email protected] wrote:
> > > > From: Joonsoo Kim <[email protected]>
> > > >
> > > > new_non_cma_page() in gup.c, which tries to allocate a migration target
> > > > page, requires the new page to be allocated outside the CMA area.
> > > > new_non_cma_page() implements this by removing the __GFP_MOVABLE flag.
> > > > This works well for THP and normal pages but not for hugetlb pages.
> > > >
> > > > The hugetlb page allocation process consists of two steps. The first is
> > > > dequeuing from the pool. The second, if there is no available page on
> > > > the queue, is allocating from the page allocator.
> > > >
> > > > new_non_cma_page() can control allocation from the page allocator by
> > > > specifying the correct gfp flag. However, dequeuing could not be
> > > > controlled until now, so new_non_cma_page() skips dequeuing completely.
> > > > This is suboptimal since new_non_cma_page() cannot utilize hugetlb pages
> > > > on the queue, so this patch tries to fix this situation.
> > > >
> > > > This patch makes the dequeue function on hugetlb CMA aware and skips CMA
> > > > pages if the newly added skip_cma argument is passed as true.
> > >
> > > Hmm, can't you instead change dequeue_huge_page_node_exact() to test the PF_
> > > flag and avoid adding bool skip_cma everywhere?
> >
> > Okay! Please check the following patch.
> > >
> > > I think that's what Michal suggested [1] except he said "the code already does
> > > by memalloc_nocma_{save,restore} API". It needs extending a bit though, AFAICS.
> > > __gup_longterm_locked() indeed does the save/restore, but restore comes before
> > > check_and_migrate_cma_pages() and thus new_non_cma_page() is called, so an
> > > adjustment is needed there, but that's all?
> > >
> > > Hm the adjustment should be also done because save/restore is done around
> > > __get_user_pages_locked(), but check_and_migrate_cma_pages() also calls
> > > __get_user_pages_locked(), and that call not being between nocma save and
> > > restore is thus also a correctness issue?
> >
> > Simply, I call memalloc_nocma_{save,restore} in new_non_cma_page(). It
> > would not cause any problem.
>
> I believe a proper fix is the following. The scope is really defined for
> FOLL_LONGTERM pins and pushing it inside check_and_migrate_cma_pages
> will solve the problem as well but it imho makes more sense to do it in
> the caller the same way we do for any others.
>
> Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region")
>
> I am not sure this is worth backporting to stable yet.
Should I post it as a separate patch or do you plan to include this in your next version?
>
> diff --git a/mm/gup.c b/mm/gup.c
> index de9e36262ccb..75980dd5a2fc 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1794,7 +1794,6 @@ static long __gup_longterm_locked(struct task_struct *tsk,
> vmas_tmp, NULL, gup_flags);
>
> if (gup_flags & FOLL_LONGTERM) {
> - memalloc_nocma_restore(flags);
> if (rc < 0)
> goto out;
>
> @@ -1802,11 +1801,13 @@ static long __gup_longterm_locked(struct task_struct *tsk,
> for (i = 0; i < rc; i++)
> put_page(pages[i]);
> rc = -EOPNOTSUPP;
> + memalloc_nocma_restore(flags);
> goto out;
> }
>
> rc = check_and_migrate_cma_pages(tsk, mm, start, rc, pages,
> vmas_tmp, gup_flags);
> + memalloc_nocma_restore(flags);
> }
>
> out:
> --
> Michal Hocko
> SUSE Labs
--
Michal Hocko
SUSE Labs
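
After the change above, the nocma scope covers both the page pinning and the
CMA migration that may follow. A simplified sketch of the resulting flow in
__gup_longterm_locked() (error handling and the DAX check are trimmed, so this
is not the exact function body):

	if (gup_flags & FOLL_LONGTERM)
		flags = memalloc_nocma_save();

	rc = __get_user_pages_locked(tsk, mm, start, nr_pages, pages,
				     vmas_tmp, NULL, gup_flags);

	if (gup_flags & FOLL_LONGTERM) {
		if (rc > 0)
			rc = check_and_migrate_cma_pages(tsk, mm, start, rc,
							 pages, vmas_tmp,
							 gup_flags);
		/* the nocma scope now also covers the migration allocations */
		memalloc_nocma_restore(flags);
	}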
On Thu, Jul 9, 2020 at 3:43 PM, Michal Hocko <[email protected]> wrote:
>
> On Wed 08-07-20 09:41:06, Michal Hocko wrote:
> > On Wed 08-07-20 16:16:02, Joonsoo Kim wrote:
> > > On Tue, Jul 07, 2020 at 01:22:31PM +0200, Vlastimil Babka wrote:
> > > > On 7/7/20 9:44 AM, [email protected] wrote:
> > > > > From: Joonsoo Kim <[email protected]>
> > > > >
> > > > > new_non_cma_page() in gup.c, which tries to allocate a migration target
> > > > > page, requires the new page to be allocated outside the CMA area.
> > > > > new_non_cma_page() implements this by removing the __GFP_MOVABLE flag.
> > > > > This works well for THP and normal pages but not for hugetlb pages.
> > > > >
> > > > > The hugetlb page allocation process consists of two steps. The first is
> > > > > dequeuing from the pool. The second, if there is no available page on
> > > > > the queue, is allocating from the page allocator.
> > > > >
> > > > > new_non_cma_page() can control allocation from the page allocator by
> > > > > specifying the correct gfp flag. However, dequeuing could not be
> > > > > controlled until now, so new_non_cma_page() skips dequeuing completely.
> > > > > This is suboptimal since new_non_cma_page() cannot utilize hugetlb pages
> > > > > on the queue, so this patch tries to fix this situation.
> > > > >
> > > > > This patch makes the dequeue function on hugetlb CMA aware and skips CMA
> > > > > pages if the newly added skip_cma argument is passed as true.
> > > >
> > > > Hmm, can't you instead change dequeue_huge_page_node_exact() to test the PF_
> > > > flag and avoid adding bool skip_cma everywhere?
> > >
> > > Okay! Please check the following patch.
> > > >
> > > > I think that's what Michal suggested [1] except he said "the code already does
> > > > by memalloc_nocma_{save,restore} API". It needs extending a bit though, AFAICS.
> > > > __gup_longterm_locked() indeed does the save/restore, but restore comes before
> > > > check_and_migrate_cma_pages() and thus new_non_cma_page() is called, so an
> > > > adjustment is needed there, but that's all?
> > > >
> > > > Hm the adjustment should be also done because save/restore is done around
> > > > __get_user_pages_locked(), but check_and_migrate_cma_pages() also calls
> > > > __get_user_pages_locked(), and that call not being between nocma save and
> > > > restore is thus also a correctness issue?
> > >
> > > Simply, I call memalloc_nocma_{save,restore} in new_non_cma_page(). It
> > > would not cause any problem.
> >
> > I believe a proper fix is the following. The scope is really defined for
> > FOLL_LONGTERM pins and pushing it inside check_and_migrate_cma_pages
> > will solve the problem as well but it imho makes more sense to do it in
> > the caller the same way we do for any others.
> >
> > Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region")
> >
> > I am not sure this is worth backporting to stable yet.
>
> Should I post it as a separate patch or do you plan to include this in your next version?
It's better to include it in my next version, since this patch would
conflict with the next tree, which includes v3 of this patchset.
Thanks.
On Wed, Jul 8, 2020 at 4:00 AM, Michal Hocko <[email protected]> wrote:
>
> On Tue 07-07-20 16:49:51, Vlastimil Babka wrote:
> > On 7/7/20 9:44 AM, [email protected] wrote:
> > > From: Joonsoo Kim <[email protected]>
> > >
> > > There are some similar functions for migration target allocation. Since
> > > there is no fundamental difference, it's better to keep just one rather
> > > than keeping all variants. This patch implements a base migration target
> > > allocation function. In the following patches, variants will be converted
> > > to use this function.
> > >
> > > Changes should be mechanical, but there are some differences. First, some
> > > callers' nodemask is set to NULL, since a NULL nodemask will be
> > > considered as all available nodes, that is, &node_states[N_MEMORY].
> > > Second, for hugetlb page allocation, gfp_mask is ORed since a user could
> > > provide a gfp_mask from now on.
> >
> > I think that's wrong. See how htlb_alloc_mask() determines between
> > GFP_HIGHUSER_MOVABLE and GFP_HIGHUSER, but then you OR it with __GFP_MOVABLE so
> > it's always GFP_HIGHUSER_MOVABLE.
Indeed.
> Right you are! Not that it would make any real difference because only
> migrateable hugetlb pages will get __GFP_MOVABLE and so we shouldn't
> really end up here for !movable pages in the first place (not sure about
> soft offlining at this moment). But yeah it would be simply better to
> override gfp mask for hugetlb which we have been doing anyway.
Overriding the gfp mask doesn't work since some users will call this function
with __GFP_THISNODE. I will use hugepage_movable_supported() here and
clear __GFP_MOVABLE if needed.
Thanks.
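
A sketch of what the hugepage_movable_supported() approach mentioned above
could look like (an assumption for illustration; later in the thread a
different helper is suggested instead):

	if (PageHuge(page)) {
		struct hstate *h = page_hstate(compound_head(page));

		/* do not ask for movable memory if this hstate cannot migrate */
		if (!hugepage_movable_supported(h))
			gfp_mask &= ~__GFP_MOVABLE;
	}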
On Tue, Jul 7, 2020 at 9:17 PM, Vlastimil Babka <[email protected]> wrote:
>
> On 7/7/20 9:44 AM, [email protected] wrote:
> > From: Joonsoo Kim <[email protected]>
> >
> > In mm/migrate.c, THP allocation for migration is called with the provided
> > gfp_mask | GFP_TRANSHUGE. This gfp_mask contains __GFP_RECLAIM and it
> > would conflict with the intention of GFP_TRANSHUGE.
> >
> > GFP_TRANSHUGE/GFP_TRANSHUGE_LIGHT were introduced to control the reclaim
> > behaviour in a well defined manner, since the overhead of THP allocation is
> > quite large and the whole system could suffer from it. So, they deal
> > with the __GFP_RECLAIM mask deliberately. If gfp_mask contains __GFP_RECLAIM
> > and gfp_mask | GFP_TRANSHUGE(_LIGHT) is used for THP allocation, it
> > breaks the purpose of GFP_TRANSHUGE(_LIGHT).
> >
> > This patch fixes this situation by clearing __GFP_RECLAIM in the provided
> > gfp_mask. Note that there are some other THP allocations for migration
> > and they just use GFP_TRANSHUGE(_LIGHT) directly. This patch makes
> > all THP allocations for migration consistent.
> >
> > Signed-off-by: Joonsoo Kim <[email protected]>
> > ---
> > mm/migrate.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index 02b31fe..ecd7615 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -1547,6 +1547,11 @@ struct page *new_page_nodemask(struct page *page,
> > }
> >
> > if (PageTransHuge(page)) {
> > + /*
> > +              * clear __GFP_RECLAIM since GFP_TRANSHUGE is the gfp_mask
> > + * that chooses the reclaim masks deliberately.
> > + */
> > + gfp_mask &= ~__GFP_RECLAIM;
> > gfp_mask |= GFP_TRANSHUGE;
>
> In addition to what Michal said...
>
> The mask is not passed to this function, so I would just redefine it, as is done
> in the hugetlb case. We probably don't even need the __GFP_RETRY_MAYFAIL for the
> THP case as it's just there to prevent OOM kill (per commit 0f55685627d6d) and
> the costly order of THP is enough for that.
As I said in another reply, a provided __GFP_THISNODE should be handled,
so just redefining the mask would not work.
Thanks.
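
To illustrate the point about __GFP_THISNODE: if the THP mask were simply
redefined rather than adjusted, the caller's node constraint would still have
to be carried over, for example (a sketch of the argument, not a quote from
the series):

	if (PageTransHuge(page)) {
		/* rebuild the mask but preserve an explicit node request */
		gfp_mask = GFP_TRANSHUGE | (gfp_mask & __GFP_THISNODE);
		order = HPAGE_PMD_ORDER;
	}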
On Thu 09-07-20 16:15:07, Joonsoo Kim wrote:
> On Wed, Jul 8, 2020 at 4:00 AM, Michal Hocko <[email protected]> wrote:
> >
> > On Tue 07-07-20 16:49:51, Vlastimil Babka wrote:
> > > On 7/7/20 9:44 AM, [email protected] wrote:
> > > > From: Joonsoo Kim <[email protected]>
> > > >
> > > > There are some similar functions for migration target allocation. Since
> > > > there is no fundamental difference, it's better to keep just one rather
> > > > than keeping all variants. This patch implements a base migration target
> > > > allocation function. In the following patches, variants will be converted
> > > > to use this function.
> > > >
> > > > Changes should be mechanical, but there are some differences. First, some
> > > > callers' nodemask is set to NULL, since a NULL nodemask will be
> > > > considered as all available nodes, that is, &node_states[N_MEMORY].
> > > > Second, for hugetlb page allocation, gfp_mask is ORed since a user could
> > > > provide a gfp_mask from now on.
> > >
> > > I think that's wrong. See how htlb_alloc_mask() determines between
> > > GFP_HIGHUSER_MOVABLE and GFP_HIGHUSER, but then you OR it with __GFP_MOVABLE so
> > > it's always GFP_HIGHUSER_MOVABLE.
>
> Indeed.
>
> > Right you are! Not that it would make any real difference because only
> > migrateable hugetlb pages will get __GFP_MOVABLE and so we shouldn't
> > really end up here for !movable pages in the first place (not sure about
> > soft offlining at this moment). But yeah it would be simply better to
> > override gfp mask for hugetlb which we have been doing anyway.
>
> Overriding the gfp mask doesn't work since some users will call this function
> with __GFP_THISNODE.
> I will use hugepage_movable_supported() here and
> clear __GFP_MOVABLE if needed.
hugepage_movable_supported is really an implementation detail, do not
use it here. I think it would be better to add
gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t mask)
{
	gfp_t default_mask = htlb_alloc_mask(h);

	/* Some callers might want to enforce node */
	return default_mask | (mask & __GFP_THISNODE);
}
If we need to special-case others, e.g. reclaim restrictions, there would
be a single place to do so.
--
Michal Hocko
SUSE Labs
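
A sketch of how such a helper might sit in the common allocation function
discussed in this series, assuming the migration_target_control layout shown
earlier and a gfp-taking alloc_huge_page_nodemask() from the hugetlb
unification patch; names and details may differ in the final version:

struct page *alloc_migration_target(struct page *page, unsigned long private)
{
	struct migration_target_control *mtc = (void *)private;
	gfp_t gfp_mask = mtc->gfp_mask;

	if (PageHuge(page)) {
		struct hstate *h = page_hstate(compound_head(page));

		/* hugetlb default mask, honouring only __GFP_THISNODE */
		gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
		return alloc_huge_page_nodemask(h, mtc->nid, mtc->nmask,
						gfp_mask);
	}

	/* order-0 fallback; the THP handling discussed above is omitted */
	return __alloc_pages_nodemask(gfp_mask, 0, mtc->nid, mtc->nmask);
}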