2020-07-15 05:57:43

by Joonsoo Kim

Subject: [PATCH 1/4] mm/page_alloc: fix non cma alloc context

From: Joonsoo Kim <[email protected]>

Currently, excluding the CMA area from page allocation is implemented by using
current_gfp_context(). However, there are two problems with this
implementation.

First, it doesn't work for the allocation fastpath. The fastpath uses the
original gfp_mask, since current_gfp_context() was introduced to control
reclaim and is only applied on the slowpath.
Second, clearing __GFP_MOVABLE has the side effect of also excluding
ZONE_MOVABLE memory from the allocation target.

To fix these problems, this patch changes how the CMA area is excluded from
page allocation. The main point of this change is to use alloc_flags:
alloc_flags is already the mechanism for controlling allocation behaviour, so
it is the natural place to exclude the CMA area.
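
For context, the scope API that this series keys off is used roughly as below
from the caller side (a minimal sketch; the wrapper function is made up for
illustration). With this patch, an allocation inside the scope keeps
__GFP_MOVABLE, so ZONE_MOVABLE stays eligible, and only ALLOC_CMA is dropped.

#include <linux/gfp.h>
#include <linux/sched/mm.h>

/* Sketch only: open a "no CMA" scope around an ordinary allocation. */
static struct page *alloc_page_outside_cma(gfp_t gfp_mask)
{
        unsigned int noncma_flags = memalloc_nocma_save();
        struct page *page = alloc_page(gfp_mask);

        memalloc_nocma_restore(noncma_flags);
        return page;
}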

Fixes: d7fefcc (mm/cma: add PF flag to force non cma alloc)
Cc: <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
---
include/linux/sched/mm.h | 4 ----
mm/page_alloc.c | 27 +++++++++++++++------------
2 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 44ad5b7..a73847a 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -191,10 +191,6 @@ static inline gfp_t current_gfp_context(gfp_t flags)
flags &= ~(__GFP_IO | __GFP_FS);
else if (pflags & PF_MEMALLOC_NOFS)
flags &= ~__GFP_FS;
-#ifdef CONFIG_CMA
- if (pflags & PF_MEMALLOC_NOCMA)
- flags &= ~__GFP_MOVABLE;
-#endif
}
return flags;
}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6416d08..cd53894 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2791,7 +2791,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
* allocating from CMA when over half of the zone's free memory
* is in the CMA area.
*/
- if (migratetype == MIGRATE_MOVABLE &&
+ if (alloc_flags & ALLOC_CMA &&
zone_page_state(zone, NR_FREE_CMA_PAGES) >
zone_page_state(zone, NR_FREE_PAGES) / 2) {
page = __rmqueue_cma_fallback(zone, order);
@@ -2802,7 +2802,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
retry:
page = __rmqueue_smallest(zone, order, migratetype);
if (unlikely(!page)) {
- if (migratetype == MIGRATE_MOVABLE)
+ if (alloc_flags & ALLOC_CMA)
page = __rmqueue_cma_fallback(zone, order);

if (!page && __rmqueue_fallback(zone, order, migratetype,
@@ -3502,11 +3502,9 @@ static inline long __zone_watermark_unusable_free(struct zone *z,
if (likely(!alloc_harder))
unusable_free += z->nr_reserved_highatomic;

-#ifdef CONFIG_CMA
/* If allocation can't use CMA areas don't use free CMA pages */
- if (!(alloc_flags & ALLOC_CMA))
+ if (IS_ENABLED(CONFIG_CMA) && !(alloc_flags & ALLOC_CMA))
unusable_free += zone_page_state(z, NR_FREE_CMA_PAGES);
-#endif

return unusable_free;
}
@@ -3693,6 +3691,16 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
return alloc_flags;
}

+static inline void current_alloc_flags(gfp_t gfp_mask,
+ unsigned int *alloc_flags)
+{
+ unsigned int pflags = READ_ONCE(current->flags);
+
+ if (!(pflags & PF_MEMALLOC_NOCMA) &&
+ gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
+ *alloc_flags |= ALLOC_CMA;
+}
+
/*
* get_page_from_freelist goes through the zonelist trying to allocate
* a page.
@@ -3706,6 +3714,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
struct pglist_data *last_pgdat_dirty_limit = NULL;
bool no_fallback;

+ current_alloc_flags(gfp_mask, &alloc_flags);
+
retry:
/*
* Scan zonelist, looking for a zone with enough free.
@@ -4339,10 +4349,6 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
} else if (unlikely(rt_task(current)) && !in_interrupt())
alloc_flags |= ALLOC_HARDER;

-#ifdef CONFIG_CMA
- if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
- alloc_flags |= ALLOC_CMA;
-#endif
return alloc_flags;
}

@@ -4808,9 +4814,6 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
if (should_fail_alloc_page(gfp_mask, order))
return false;

- if (IS_ENABLED(CONFIG_CMA) && ac->migratetype == MIGRATE_MOVABLE)
- *alloc_flags |= ALLOC_CMA;
-
return true;
}

--
2.7.4


2020-07-15 05:57:59

by Joonsoo Kim

Subject: [PATCH 2/4] mm/gup: restrict CMA region by using allocation scope API

From: Joonsoo Kim <[email protected]>

We have a well-defined scope API to exclude the CMA region.
Use it rather than manipulating gfp_mask manually. With this change,
we can now keep __GFP_MOVABLE in gfp_mask, so ZONE_MOVABLE is also
searched by the page allocator. For hugetlb, gfp_mask is redefined since
hugetlb has its own allocation mask filter for migration targets.
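
The hugetlb mask filter mentioned above, with the hunk below applied, looks
roughly like this (reconstructed here for context; only the __GFP_NOWARN line
is added by this patch):

static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
{
        gfp_t modified_mask = htlb_alloc_mask(h);

        /* Some callers might want to enforce node */
        modified_mask |= (gfp_mask & __GFP_THISNODE);

        /* added by this patch: preserve the caller's no-warn request */
        modified_mask |= (gfp_mask & __GFP_NOWARN);

        return modified_mask;
}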

Note that this can be considered a fix for commit 9a4e9f3b2d73
("mm: update get_user_pages_longterm to migrate pages allocated from
CMA region"). However, a "Fixes" tag isn't added here since the old code
was merely suboptimal and didn't cause any problem.

Suggested-by: Michal Hocko <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
---
include/linux/hugetlb.h | 2 ++
mm/gup.c | 17 ++++++++---------
2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 6b9508d..2660b04 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -708,6 +708,8 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
/* Some callers might want to enfoce node */
modified_mask |= (gfp_mask & __GFP_THISNODE);

+ modified_mask |= (gfp_mask & __GFP_NOWARN);
+
return modified_mask;
}

diff --git a/mm/gup.c b/mm/gup.c
index 5daadae..bbd36a1 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1619,10 +1619,12 @@ static struct page *new_non_cma_page(struct page *page, unsigned long private)
* Trying to allocate a page for migration. Ignore allocation
* failure warnings. We don't force __GFP_THISNODE here because
* this node here is the node where we have CMA reservation and
- * in some case these nodes will have really less non movable
+ * in some case these nodes will have really less non CMA
* allocation memory.
+ *
+ * Note that CMA region is prohibited by allocation scope.
*/
- gfp_t gfp_mask = GFP_USER | __GFP_NOWARN;
+ gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_NOWARN;

if (PageHighMem(page))
gfp_mask |= __GFP_HIGHMEM;
@@ -1630,6 +1632,8 @@ static struct page *new_non_cma_page(struct page *page, unsigned long private)
#ifdef CONFIG_HUGETLB_PAGE
if (PageHuge(page)) {
struct hstate *h = page_hstate(page);
+
+ gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
/*
* We don't want to dequeue from the pool because pool pages will
* mostly be from the CMA region.
@@ -1644,11 +1648,6 @@ static struct page *new_non_cma_page(struct page *page, unsigned long private)
*/
gfp_t thp_gfpmask = GFP_TRANSHUGE | __GFP_NOWARN;

- /*
- * Remove the movable mask so that we don't allocate from
- * CMA area again.
- */
- thp_gfpmask &= ~__GFP_MOVABLE;
thp = __alloc_pages_node(nid, thp_gfpmask, HPAGE_PMD_ORDER);
if (!thp)
return NULL;
@@ -1794,7 +1793,6 @@ static long __gup_longterm_locked(struct task_struct *tsk,
vmas_tmp, NULL, gup_flags);

if (gup_flags & FOLL_LONGTERM) {
- memalloc_nocma_restore(flags);
if (rc < 0)
goto out;

@@ -1807,9 +1805,10 @@ static long __gup_longterm_locked(struct task_struct *tsk,

rc = check_and_migrate_cma_pages(tsk, mm, start, rc, pages,
vmas_tmp, gup_flags);
+out:
+ memalloc_nocma_restore(flags);
}

-out:
if (vmas_tmp != vmas)
kfree(vmas_tmp);
return rc;
--
2.7.4

2020-07-15 06:00:21

by Joonsoo Kim

Subject: [PATCH 3/4] mm/hugetlb: make hugetlb migration callback CMA aware

From: Joonsoo Kim <[email protected]>

new_non_cma_page() in gup.c needs to allocate a new page that is not in the
CMA area. It implements this by using the allocation scope APIs.

However, there is a work-around for hugetlb. The normal hugetlb page
allocation API for migration is alloc_huge_page_nodemask(). It consists of
two steps: first, dequeue a page from the pool; second, if no page is
available there, allocate one with the page allocator.
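
The two steps have roughly this shape (a simplified sketch; hugetlb_lock
handling and reservation details are elided):

struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
                                      nodemask_t *nmask, gfp_t gfp_mask)
{
        if (h->free_huge_pages - h->resv_huge_pages > 0) {
                struct page *page;

                /* step 1: try to dequeue a pre-allocated page from the pool */
                page = dequeue_huge_page_nodemask(h, gfp_mask,
                                                  preferred_nid, nmask);
                if (page)
                        return page;
        }

        /* step 2: fall back to a fresh allocation from the page allocator */
        return alloc_migrate_huge_page(h, gfp_mask, preferred_nid, nmask);
}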

new_non_cma_page() can't use this API since the first step (dequeue) isn't
aware of the scope API that excludes the CMA area. So, new_non_cma_page()
exports the hugetlb-internal function for the second step,
alloc_migrate_huge_page(), to global scope and uses it directly. This is
suboptimal since hugetlb pages already sitting in the pool cannot be used.

This patch fixes the situation by making the hugetlb dequeue path CMA aware:
in the dequeue function, CMA pages are skipped when the PF_MEMALLOC_NOCMA
flag is set.

Acked-by: Mike Kravetz <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
---
include/linux/hugetlb.h | 2 --
mm/gup.c | 6 +-----
mm/hugetlb.c | 11 +++++++++--
3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 2660b04..fb2b5aa 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -509,8 +509,6 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
nodemask_t *nmask, gfp_t gfp_mask);
struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
unsigned long address);
-struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
- int nid, nodemask_t *nmask);
int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
pgoff_t idx);

diff --git a/mm/gup.c b/mm/gup.c
index bbd36a1..4ba822a 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1634,11 +1634,7 @@ static struct page *new_non_cma_page(struct page *page, unsigned long private)
struct hstate *h = page_hstate(page);

gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
- /*
- * We don't want to dequeue from the pool because pool pages will
- * mostly be from the CMA region.
- */
- return alloc_migrate_huge_page(h, gfp_mask, nid, NULL);
+ return alloc_huge_page_nodemask(h, nid, NULL, gfp_mask);
}
#endif
if (PageTransHuge(page)) {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3245aa0..514e29c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -29,6 +29,7 @@
#include <linux/numa.h>
#include <linux/llist.h>
#include <linux/cma.h>
+#include <linux/sched/mm.h>

#include <asm/page.h>
#include <asm/tlb.h>
@@ -1036,10 +1037,16 @@ static void enqueue_huge_page(struct hstate *h, struct page *page)
static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
{
struct page *page;
+ bool nocma = !!(READ_ONCE(current->flags) & PF_MEMALLOC_NOCMA);
+
+ list_for_each_entry(page, &h->hugepage_freelists[nid], lru) {
+ if (nocma && is_migrate_cma_page(page))
+ continue;

- list_for_each_entry(page, &h->hugepage_freelists[nid], lru)
if (!PageHWPoison(page))
break;
+ }
+
/*
* if 'non-isolated free hugepage' not found on the list,
* the allocation fails.
@@ -1928,7 +1935,7 @@ static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask,
return page;
}

-struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
+static struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
int nid, nodemask_t *nmask)
{
struct page *page;
--
2.7.4

2020-07-15 06:00:32

by Joonsoo Kim

Subject: [PATCH 4/4] mm/gup: use a standard migration target allocation callback

From: Joonsoo Kim <[email protected]>

There is a well-defined migration target allocation callback. Use it.
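
For reference, the standard callback takes the control structure that
migrate_pages() passes through the 'private' argument and behaves roughly as
below (a condensed sketch; see alloc_migration_target() in mm/migrate.c for
the real thing, which also handles THP preparation and reclaim-flag details):

struct page *alloc_migration_target(struct page *page, unsigned long private)
{
        struct migration_target_control *mtc = (void *)private;
        gfp_t gfp_mask = mtc->gfp_mask;
        unsigned int order = 0;
        int nid = mtc->nid;

        if (nid == NUMA_NO_NODE)
                nid = page_to_nid(page);        /* stay on the source node */

        if (PageHuge(page)) {
                struct hstate *h = page_hstate(compound_head(page));

                gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
                return alloc_huge_page_nodemask(h, nid, mtc->nmask, gfp_mask);
        }

        if (PageTransHuge(page)) {
                gfp_mask |= GFP_TRANSHUGE;
                order = HPAGE_PMD_ORDER;
        }
        if (zone_idx(page_zone(page)) > ZONE_NORMAL)
                gfp_mask |= __GFP_HIGHMEM;

        return __alloc_pages_nodemask(gfp_mask, order, nid, mtc->nmask);
}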

Acked-by: Vlastimil Babka <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/gup.c | 54 ++++++------------------------------------------------
1 file changed, 6 insertions(+), 48 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 4ba822a..628ca4c 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1608,52 +1608,6 @@ static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
}

#ifdef CONFIG_CMA
-static struct page *new_non_cma_page(struct page *page, unsigned long private)
-{
- /*
- * We want to make sure we allocate the new page from the same node
- * as the source page.
- */
- int nid = page_to_nid(page);
- /*
- * Trying to allocate a page for migration. Ignore allocation
- * failure warnings. We don't force __GFP_THISNODE here because
- * this node here is the node where we have CMA reservation and
- * in some case these nodes will have really less non CMA
- * allocation memory.
- *
- * Note that CMA region is prohibited by allocation scope.
- */
- gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_NOWARN;
-
- if (PageHighMem(page))
- gfp_mask |= __GFP_HIGHMEM;
-
-#ifdef CONFIG_HUGETLB_PAGE
- if (PageHuge(page)) {
- struct hstate *h = page_hstate(page);
-
- gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
- return alloc_huge_page_nodemask(h, nid, NULL, gfp_mask);
- }
-#endif
- if (PageTransHuge(page)) {
- struct page *thp;
- /*
- * ignore allocation failure warnings
- */
- gfp_t thp_gfpmask = GFP_TRANSHUGE | __GFP_NOWARN;
-
- thp = __alloc_pages_node(nid, thp_gfpmask, HPAGE_PMD_ORDER);
- if (!thp)
- return NULL;
- prep_transhuge_page(thp);
- return thp;
- }
-
- return __alloc_pages_node(nid, gfp_mask, 0);
-}
-
static long check_and_migrate_cma_pages(struct task_struct *tsk,
struct mm_struct *mm,
unsigned long start,
@@ -1668,6 +1622,10 @@ static long check_and_migrate_cma_pages(struct task_struct *tsk,
bool migrate_allow = true;
LIST_HEAD(cma_page_list);
long ret = nr_pages;
+ struct migration_target_control mtc = {
+ .nid = NUMA_NO_NODE,
+ .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_NOWARN,
+ };

check_again:
for (i = 0; i < nr_pages;) {
@@ -1713,8 +1671,8 @@ static long check_and_migrate_cma_pages(struct task_struct *tsk,
for (i = 0; i < nr_pages; i++)
put_page(pages[i]);

- if (migrate_pages(&cma_page_list, new_non_cma_page,
- NULL, 0, MIGRATE_SYNC, MR_CONTIG_RANGE)) {
+ if (migrate_pages(&cma_page_list, alloc_migration_target, NULL,
+ (unsigned long)&mtc, MIGRATE_SYNC, MR_CONTIG_RANGE)) {
/*
* some of the pages failed migration. Do get_user_pages
* without migration.
--
2.7.4

2020-07-15 08:51:57

by Michal Hocko

Subject: Re: [PATCH 2/4] mm/gup: restrict CMA region by using allocation scope API

On Wed 15-07-20 14:05:27, Joonsoo Kim wrote:
> From: Joonsoo Kim <[email protected]>
>
> We have well defined scope API to exclude CMA region.
> Use it rather than manipulating gfp_mask manually. With this change,
> we can now use __GFP_MOVABLE for gfp_mask and the ZONE_MOVABLE is also
> searched by page allocator. For hugetlb, gfp_mask is redefined since
> it has a regular allocation mask filter for migration target.
>
> Note that this can be considered as a fix for the commit 9a4e9f3b2d73
> ("mm: update get_user_pages_longterm to migrate pages allocated from
> CMA region"). However, "Fixes" tag isn't added here since it is just
> suboptimal but it doesn't cause any problem.

But it is breaking the contract that the longterm pins never end up in a
cma managed memory. So I think Fixes tag is really due. I am not sure
about stable backport. If the patch was the trivial move of
memalloc_nocma_restore then it would be probably worth it because it is
trivial to review and backport. I suspect that longterm pins in CMA
regions would cause hard to debug issues where CMA memory will not be
available. But I am not really sure this is a real problem considering
how many long term pin users we have and I have also no idea whether
those are usually used along with CMA users.

Anyway I think it would really be much better to isolate the
memalloc_nocma_restore and have it first in the series. The rework of
the __GFP_MOVABLE functionality is orthogonal.

Btw __GFP_NOWARN change is not documented.

> Suggested-by: Michal Hocko <[email protected]>
> Signed-off-by: Joonsoo Kim <[email protected]>
> ---
> include/linux/hugetlb.h | 2 ++
> mm/gup.c | 17 ++++++++---------
> 2 files changed, 10 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 6b9508d..2660b04 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -708,6 +708,8 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
> /* Some callers might want to enfoce node */
> modified_mask |= (gfp_mask & __GFP_THISNODE);
>
> + modified_mask |= (gfp_mask & __GFP_NOWARN);
> +
> return modified_mask;
> }
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 5daadae..bbd36a1 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1619,10 +1619,12 @@ static struct page *new_non_cma_page(struct page *page, unsigned long private)
> * Trying to allocate a page for migration. Ignore allocation
> * failure warnings. We don't force __GFP_THISNODE here because
> * this node here is the node where we have CMA reservation and
> - * in some case these nodes will have really less non movable
> + * in some case these nodes will have really less non CMA
> * allocation memory.
> + *
> + * Note that CMA region is prohibited by allocation scope.
> */
> - gfp_t gfp_mask = GFP_USER | __GFP_NOWARN;
> + gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_NOWARN;
>
> if (PageHighMem(page))
> gfp_mask |= __GFP_HIGHMEM;
> @@ -1630,6 +1632,8 @@ static struct page *new_non_cma_page(struct page *page, unsigned long private)
> #ifdef CONFIG_HUGETLB_PAGE
> if (PageHuge(page)) {
> struct hstate *h = page_hstate(page);
> +
> + gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
> /*
> * We don't want to dequeue from the pool because pool pages will
> * mostly be from the CMA region.
> @@ -1644,11 +1648,6 @@ static struct page *new_non_cma_page(struct page *page, unsigned long private)
> */
> gfp_t thp_gfpmask = GFP_TRANSHUGE | __GFP_NOWARN;
>
> - /*
> - * Remove the movable mask so that we don't allocate from
> - * CMA area again.
> - */
> - thp_gfpmask &= ~__GFP_MOVABLE;
> thp = __alloc_pages_node(nid, thp_gfpmask, HPAGE_PMD_ORDER);
> if (!thp)
> return NULL;
> @@ -1794,7 +1793,6 @@ static long __gup_longterm_locked(struct task_struct *tsk,
> vmas_tmp, NULL, gup_flags);
>
> if (gup_flags & FOLL_LONGTERM) {
> - memalloc_nocma_restore(flags);
> if (rc < 0)
> goto out;
>
> @@ -1807,9 +1805,10 @@ static long __gup_longterm_locked(struct task_struct *tsk,
>
> rc = check_and_migrate_cma_pages(tsk, mm, start, rc, pages,
> vmas_tmp, gup_flags);
> +out:
> + memalloc_nocma_restore(flags);
> }
>
> -out:
> if (vmas_tmp != vmas)
> kfree(vmas_tmp);
> return rc;
> --
> 2.7.4

--
Michal Hocko
SUSE Labs

2020-07-15 08:52:41

by Vlastimil Babka

Subject: Re: [PATCH 1/4] mm/page_alloc: fix non cma alloc context

On 7/15/20 7:05 AM, [email protected] wrote:
> From: Joonsoo Kim <[email protected]>
>
> Currently, preventing cma area in page allocation is implemented by using
> current_gfp_context(). However, there are two problems of this
> implementation.
>
> First, this doesn't work for allocation fastpath. In the fastpath,
> original gfp_mask is used since current_gfp_context() is introduced in
> order to control reclaim and it is on slowpath.
> Second, clearing __GFP_MOVABLE has a side effect to exclude the memory
> on the ZONE_MOVABLE for allocation target.
>
> To fix these problems, this patch changes the implementation to exclude
> cma area in page allocation. Main point of this change is using the
> alloc_flags. alloc_flags is mainly used to control allocation so it fits
> for excluding cma area in allocation.

Agreed, should have been done with ALLOC_CMA since the beginning.

> Fixes: d7fefcc (mm/cma: add PF flag to force non cma alloc)

More digits please.
Fixes: d7fefcc8de91 ("mm/cma: add PF flag to force non cma alloc")

> Cc: <[email protected]>
> Signed-off-by: Joonsoo Kim <[email protected]>
> ---
> include/linux/sched/mm.h | 4 ----
> mm/page_alloc.c | 27 +++++++++++++++------------
> 2 files changed, 15 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index 44ad5b7..a73847a 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -191,10 +191,6 @@ static inline gfp_t current_gfp_context(gfp_t flags)
> flags &= ~(__GFP_IO | __GFP_FS);
> else if (pflags & PF_MEMALLOC_NOFS)
> flags &= ~__GFP_FS;

Above this hunk you should also remove PF_MEMALLOC_NOCMA from the test.

> -#ifdef CONFIG_CMA
> - if (pflags & PF_MEMALLOC_NOCMA)
> - flags &= ~__GFP_MOVABLE;
> -#endif
> }
> return flags;
> }
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6416d08..cd53894 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2791,7 +2791,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
> * allocating from CMA when over half of the zone's free memory
> * is in the CMA area.
> */
> - if (migratetype == MIGRATE_MOVABLE &&
> + if (alloc_flags & ALLOC_CMA &&
> zone_page_state(zone, NR_FREE_CMA_PAGES) >
> zone_page_state(zone, NR_FREE_PAGES) / 2) {
> page = __rmqueue_cma_fallback(zone, order);
> @@ -2802,7 +2802,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
> retry:
> page = __rmqueue_smallest(zone, order, migratetype);
> if (unlikely(!page)) {
> - if (migratetype == MIGRATE_MOVABLE)
> + if (alloc_flags & ALLOC_CMA)
> page = __rmqueue_cma_fallback(zone, order);
>
> if (!page && __rmqueue_fallback(zone, order, migratetype,
> @@ -3502,11 +3502,9 @@ static inline long __zone_watermark_unusable_free(struct zone *z,
> if (likely(!alloc_harder))
> unusable_free += z->nr_reserved_highatomic;
>
> -#ifdef CONFIG_CMA
> /* If allocation can't use CMA areas don't use free CMA pages */
> - if (!(alloc_flags & ALLOC_CMA))
> + if (IS_ENABLED(CONFIG_CMA) && !(alloc_flags & ALLOC_CMA))
> unusable_free += zone_page_state(z, NR_FREE_CMA_PAGES);
> -#endif
>
> return unusable_free;
> }
> @@ -3693,6 +3691,16 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
> return alloc_flags;
> }
>
> +static inline void current_alloc_flags(gfp_t gfp_mask,
> + unsigned int *alloc_flags)
> +{
> + unsigned int pflags = READ_ONCE(current->flags);
> +
> + if (!(pflags & PF_MEMALLOC_NOCMA) &&
> + gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
> + *alloc_flags |= ALLOC_CMA;
> +}

I don't like the modification through parameter, would just do what
current_gfp_context() does and return the modified value.
Also make it a no-op (including no READ_ONCE(current->flags)) if !CONFIG_CMA,
please.
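
Something along these lines, i.e. (an illustrative sketch of the suggested
shape, not a tested patch):

static inline unsigned int current_alloc_flags(gfp_t gfp_mask,
                                               unsigned int alloc_flags)
{
#ifdef CONFIG_CMA
        unsigned int pflags = current->flags;

        if (!(pflags & PF_MEMALLOC_NOCMA) &&
            gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
                alloc_flags |= ALLOC_CMA;
#endif
        return alloc_flags;
}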

> /*
> * get_page_from_freelist goes through the zonelist trying to allocate
> * a page.
> @@ -3706,6 +3714,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
> struct pglist_data *last_pgdat_dirty_limit = NULL;
> bool no_fallback;
>
> + current_alloc_flags(gfp_mask, &alloc_flags);

I don't see why to move the test here? It will still be executed in the
fastpath, if that's what you wanted to avoid.

> +
> retry:
> /*
> * Scan zonelist, looking for a zone with enough free.
> @@ -4339,10 +4349,6 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
> } else if (unlikely(rt_task(current)) && !in_interrupt())
> alloc_flags |= ALLOC_HARDER;
>
> -#ifdef CONFIG_CMA
> - if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
> - alloc_flags |= ALLOC_CMA;
> -#endif

I would just replace this here with:
alloc_flags = current_alloc_flags(gfp_mask, alloc_flags);

> return alloc_flags;
> }
>
> @@ -4808,9 +4814,6 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
> if (should_fail_alloc_page(gfp_mask, order))
> return false;
>
> - if (IS_ENABLED(CONFIG_CMA) && ac->migratetype == MIGRATE_MOVABLE)
> - *alloc_flags |= ALLOC_CMA;

And same here... Ah, I see. current_alloc_flags() should probably take a
migratetype parameter instead of gfp_mask then.
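
I.e. here it could then be something like (sketch only):

        *alloc_flags = current_alloc_flags(ac->migratetype, *alloc_flags);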

> -
> return true;
> }
>
>

2020-07-15 09:04:40

by Michal Hocko

Subject: Re: [PATCH 1/4] mm/page_alloc: fix non cma alloc context

On Wed 15-07-20 14:05:26, Joonsoo Kim wrote:
> From: Joonsoo Kim <[email protected]>
>
> Currently, preventing cma area in page allocation is implemented by using
> current_gfp_context(). However, there are two problems of this
> implementation.
>
> First, this doesn't work for allocation fastpath. In the fastpath,
> original gfp_mask is used since current_gfp_context() is introduced in
> order to control reclaim and it is on slowpath.
> Second, clearing __GFP_MOVABLE has a side effect to exclude the memory
> on the ZONE_MOVABLE for allocation target.

This can be especially a problem with movable_node configurations where
a large portion of the memory is in movable zones.

> To fix these problems, this patch changes the implementation to exclude
> cma area in page allocation. Main point of this change is using the
> alloc_flags. alloc_flags is mainly used to control allocation so it fits
> for excluding cma area in allocation.

The approach is sensible and the patch makes sense to me from a quick
glance but I am not really familiar with all subtle details about cma
integration with the allocator so I do not feel confident to provide my
ack.

Thanks!

> Fixes: d7fefcc (mm/cma: add PF flag to force non cma alloc)
> Cc: <[email protected]>
> Signed-off-by: Joonsoo Kim <[email protected]>
> ---
> include/linux/sched/mm.h | 4 ----
> mm/page_alloc.c | 27 +++++++++++++++------------
> 2 files changed, 15 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index 44ad5b7..a73847a 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -191,10 +191,6 @@ static inline gfp_t current_gfp_context(gfp_t flags)
> flags &= ~(__GFP_IO | __GFP_FS);
> else if (pflags & PF_MEMALLOC_NOFS)
> flags &= ~__GFP_FS;
> -#ifdef CONFIG_CMA
> - if (pflags & PF_MEMALLOC_NOCMA)
> - flags &= ~__GFP_MOVABLE;
> -#endif
> }
> return flags;
> }
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6416d08..cd53894 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2791,7 +2791,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
> * allocating from CMA when over half of the zone's free memory
> * is in the CMA area.
> */
> - if (migratetype == MIGRATE_MOVABLE &&
> + if (alloc_flags & ALLOC_CMA &&
> zone_page_state(zone, NR_FREE_CMA_PAGES) >
> zone_page_state(zone, NR_FREE_PAGES) / 2) {
> page = __rmqueue_cma_fallback(zone, order);
> @@ -2802,7 +2802,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
> retry:
> page = __rmqueue_smallest(zone, order, migratetype);
> if (unlikely(!page)) {
> - if (migratetype == MIGRATE_MOVABLE)
> + if (alloc_flags & ALLOC_CMA)
> page = __rmqueue_cma_fallback(zone, order);
>
> if (!page && __rmqueue_fallback(zone, order, migratetype,
> @@ -3502,11 +3502,9 @@ static inline long __zone_watermark_unusable_free(struct zone *z,
> if (likely(!alloc_harder))
> unusable_free += z->nr_reserved_highatomic;
>
> -#ifdef CONFIG_CMA
> /* If allocation can't use CMA areas don't use free CMA pages */
> - if (!(alloc_flags & ALLOC_CMA))
> + if (IS_ENABLED(CONFIG_CMA) && !(alloc_flags & ALLOC_CMA))
> unusable_free += zone_page_state(z, NR_FREE_CMA_PAGES);
> -#endif
>
> return unusable_free;
> }
> @@ -3693,6 +3691,16 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
> return alloc_flags;
> }
>
> +static inline void current_alloc_flags(gfp_t gfp_mask,
> + unsigned int *alloc_flags)
> +{
> + unsigned int pflags = READ_ONCE(current->flags);
> +
> + if (!(pflags & PF_MEMALLOC_NOCMA) &&
> + gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
> + *alloc_flags |= ALLOC_CMA;
> +}
> +
> /*
> * get_page_from_freelist goes through the zonelist trying to allocate
> * a page.
> @@ -3706,6 +3714,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
> struct pglist_data *last_pgdat_dirty_limit = NULL;
> bool no_fallback;
>
> + current_alloc_flags(gfp_mask, &alloc_flags);
> +
> retry:
> /*
> * Scan zonelist, looking for a zone with enough free.
> @@ -4339,10 +4349,6 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
> } else if (unlikely(rt_task(current)) && !in_interrupt())
> alloc_flags |= ALLOC_HARDER;
>
> -#ifdef CONFIG_CMA
> - if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
> - alloc_flags |= ALLOC_CMA;
> -#endif
> return alloc_flags;
> }
>
> @@ -4808,9 +4814,6 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
> if (should_fail_alloc_page(gfp_mask, order))
> return false;
>
> - if (IS_ENABLED(CONFIG_CMA) && ac->migratetype == MIGRATE_MOVABLE)
> - *alloc_flags |= ALLOC_CMA;
> -
> return true;
> }
>
> --
> 2.7.4

--
Michal Hocko
SUSE Labs

2020-07-15 09:07:50

by Michal Hocko

Subject: Re: [PATCH 3/4] mm/hugetlb: make hugetlb migration callback CMA aware

On Wed 15-07-20 14:05:28, Joonsoo Kim wrote:
> From: Joonsoo Kim <[email protected]>
>
> new_non_cma_page() in gup.c requires to allocate the new page that is not
> on the CMA area. new_non_cma_page() implements it by using allocation
> scope APIs.
>
> However, there is a work-around for hugetlb. Normal hugetlb page
> allocation API for migration is alloc_huge_page_nodemask(). It consists
> of two steps. First is dequeing from the pool. Second is, if there is no
> available page on the queue, allocating by using the page allocator.
>
> new_non_cma_page() can't use this API since first step (deque) isn't
> aware of scope API to exclude CMA area. So, new_non_cma_page() exports
> hugetlb internal function for the second step, alloc_migrate_huge_page(),
> to global scope and uses it directly. This is suboptimal since hugetlb
> pages on the queue cannot be utilized.
>
> This patch tries to fix this situation by making the deque function on
> hugetlb CMA aware. In the deque function, CMA memory is skipped if
> PF_MEMALLOC_NOCMA flag is found.

Now that this is in sync with the global case I do not have any
objections.

> Acked-by: Mike Kravetz <[email protected]>
> Signed-off-by: Joonsoo Kim <[email protected]>

Acked-by: Michal Hocko <[email protected]>

Minor nit below

[...]
> @@ -1036,10 +1037,16 @@ static void enqueue_huge_page(struct hstate *h, struct page *page)
> static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
> {
> struct page *page;
> + bool nocma = !!(READ_ONCE(current->flags) & PF_MEMALLOC_NOCMA);

READ_ONCE is not really needed because current->flags is only ever set on
the current task itself, so no race is possible.

> +
> + list_for_each_entry(page, &h->hugepage_freelists[nid], lru) {
> + if (nocma && is_migrate_cma_page(page))
> + continue;
>
> - list_for_each_entry(page, &h->hugepage_freelists[nid], lru)
> if (!PageHWPoison(page))
> break;
> + }
> +
> /*
> * if 'non-isolated free hugepage' not found on the list,
> * the allocation fails.
> @@ -1928,7 +1935,7 @@ static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask,
> return page;
> }
>
> -struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
> +static struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
> int nid, nodemask_t *nmask)
> {
> struct page *page;
> --
> 2.7.4

--
Michal Hocko
SUSE Labs

2020-07-15 09:11:52

by Michal Hocko

Subject: Re: [PATCH 4/4] mm/gup: use a standard migration target allocation callback

On Wed 15-07-20 14:05:29, Joonsoo Kim wrote:
> From: Joonsoo Kim <[email protected]>
>
> There is a well-defined migration target allocation callback. Use it.
>
> Acked-by: Vlastimil Babka <[email protected]>
> Signed-off-by: Joonsoo Kim <[email protected]>

Acked-by: Michal Hocko <[email protected]>

> ---
> mm/gup.c | 54 ++++++------------------------------------------------
> 1 file changed, 6 insertions(+), 48 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 4ba822a..628ca4c 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1608,52 +1608,6 @@ static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
> }
>
> #ifdef CONFIG_CMA
> -static struct page *new_non_cma_page(struct page *page, unsigned long private)
> -{
> - /*
> - * We want to make sure we allocate the new page from the same node
> - * as the source page.
> - */
> - int nid = page_to_nid(page);
> - /*
> - * Trying to allocate a page for migration. Ignore allocation
> - * failure warnings. We don't force __GFP_THISNODE here because
> - * this node here is the node where we have CMA reservation and
> - * in some case these nodes will have really less non CMA
> - * allocation memory.
> - *
> - * Note that CMA region is prohibited by allocation scope.
> - */
> - gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_NOWARN;
> -
> - if (PageHighMem(page))
> - gfp_mask |= __GFP_HIGHMEM;
> -
> -#ifdef CONFIG_HUGETLB_PAGE
> - if (PageHuge(page)) {
> - struct hstate *h = page_hstate(page);
> -
> - gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
> - return alloc_huge_page_nodemask(h, nid, NULL, gfp_mask);
> - }
> -#endif
> - if (PageTransHuge(page)) {
> - struct page *thp;
> - /*
> - * ignore allocation failure warnings
> - */
> - gfp_t thp_gfpmask = GFP_TRANSHUGE | __GFP_NOWARN;
> -
> - thp = __alloc_pages_node(nid, thp_gfpmask, HPAGE_PMD_ORDER);
> - if (!thp)
> - return NULL;
> - prep_transhuge_page(thp);
> - return thp;
> - }
> -
> - return __alloc_pages_node(nid, gfp_mask, 0);
> -}
> -
> static long check_and_migrate_cma_pages(struct task_struct *tsk,
> struct mm_struct *mm,
> unsigned long start,
> @@ -1668,6 +1622,10 @@ static long check_and_migrate_cma_pages(struct task_struct *tsk,
> bool migrate_allow = true;
> LIST_HEAD(cma_page_list);
> long ret = nr_pages;
> + struct migration_target_control mtc = {
> + .nid = NUMA_NO_NODE,
> + .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_NOWARN,
> + };
>
> check_again:
> for (i = 0; i < nr_pages;) {
> @@ -1713,8 +1671,8 @@ static long check_and_migrate_cma_pages(struct task_struct *tsk,
> for (i = 0; i < nr_pages; i++)
> put_page(pages[i]);
>
> - if (migrate_pages(&cma_page_list, new_non_cma_page,
> - NULL, 0, MIGRATE_SYNC, MR_CONTIG_RANGE)) {
> + if (migrate_pages(&cma_page_list, alloc_migration_target, NULL,
> + (unsigned long)&mtc, MIGRATE_SYNC, MR_CONTIG_RANGE)) {
> /*
> * some of the pages failed migration. Do get_user_pages
> * without migration.
> --
> 2.7.4
>

--
Michal Hocko
SUSE Labs

2020-07-15 09:53:28

by Vlastimil Babka

Subject: Re: [PATCH 3/4] mm/hugetlb: make hugetlb migration callback CMA aware

On 7/15/20 7:05 AM, [email protected] wrote:
> From: Joonsoo Kim <[email protected]>
>
> new_non_cma_page() in gup.c requires to allocate the new page that is not
> on the CMA area. new_non_cma_page() implements it by using allocation
> scope APIs.
>
> However, there is a work-around for hugetlb. Normal hugetlb page
> allocation API for migration is alloc_huge_page_nodemask(). It consists
> of two steps. First is dequeing from the pool. Second is, if there is no
> available page on the queue, allocating by using the page allocator.
>
> new_non_cma_page() can't use this API since first step (deque) isn't
> aware of scope API to exclude CMA area. So, new_non_cma_page() exports
> hugetlb internal function for the second step, alloc_migrate_huge_page(),
> to global scope and uses it directly. This is suboptimal since hugetlb
> pages on the queue cannot be utilized.
>
> This patch tries to fix this situation by making the deque function on
> hugetlb CMA aware. In the deque function, CMA memory is skipped if
> PF_MEMALLOC_NOCMA flag is found.
>
> Acked-by: Mike Kravetz <[email protected]>
> Signed-off-by: Joonsoo Kim <[email protected]>

Acked-by: Vlastimil Babka <[email protected]>

2020-07-16 07:27:50

by Joonsoo Kim

Subject: Re: [PATCH 1/4] mm/page_alloc: fix non cma alloc context

On Wed, Jul 15, 2020 at 5:24 PM, Vlastimil Babka <[email protected]> wrote:
>
> On 7/15/20 7:05 AM, [email protected] wrote:
> > From: Joonsoo Kim <[email protected]>
> >
> > Currently, preventing cma area in page allocation is implemented by using
> > current_gfp_context(). However, there are two problems of this
> > implementation.
> >
> > First, this doesn't work for allocation fastpath. In the fastpath,
> > original gfp_mask is used since current_gfp_context() is introduced in
> > order to control reclaim and it is on slowpath.
> > Second, clearing __GFP_MOVABLE has a side effect to exclude the memory
> > on the ZONE_MOVABLE for allocation target.
> >
> > To fix these problems, this patch changes the implementation to exclude
> > cma area in page allocation. Main point of this change is using the
> > alloc_flags. alloc_flags is mainly used to control allocation so it fits
> > for excluding cma area in allocation.
>
> Agreed, should have been done with ALLOC_CMA since the beginning.
>
> > Fixes: d7fefcc (mm/cma: add PF flag to force non cma alloc)
>
> More digits please.
> Fixes: d7fefcc8de91 ("mm/cma: add PF flag to force non cma alloc")

Will do.

> > Cc: <[email protected]>
> > Signed-off-by: Joonsoo Kim <[email protected]>
> > ---
> > include/linux/sched/mm.h | 4 ----
> > mm/page_alloc.c | 27 +++++++++++++++------------
> > 2 files changed, 15 insertions(+), 16 deletions(-)
> >
> > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> > index 44ad5b7..a73847a 100644
> > --- a/include/linux/sched/mm.h
> > +++ b/include/linux/sched/mm.h
> > @@ -191,10 +191,6 @@ static inline gfp_t current_gfp_context(gfp_t flags)
> > flags &= ~(__GFP_IO | __GFP_FS);
> > else if (pflags & PF_MEMALLOC_NOFS)
> > flags &= ~__GFP_FS;
>
> Above this hunk you should also remove PF_MEMALLOC_NOCMA from the test.

Will do.

> > -#ifdef CONFIG_CMA
> > - if (pflags & PF_MEMALLOC_NOCMA)
> > - flags &= ~__GFP_MOVABLE;
> > -#endif
> > }
> > return flags;
> > }
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 6416d08..cd53894 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2791,7 +2791,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
> > * allocating from CMA when over half of the zone's free memory
> > * is in the CMA area.
> > */
> > - if (migratetype == MIGRATE_MOVABLE &&
> > + if (alloc_flags & ALLOC_CMA &&
> > zone_page_state(zone, NR_FREE_CMA_PAGES) >
> > zone_page_state(zone, NR_FREE_PAGES) / 2) {
> > page = __rmqueue_cma_fallback(zone, order);
> > @@ -2802,7 +2802,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
> > retry:
> > page = __rmqueue_smallest(zone, order, migratetype);
> > if (unlikely(!page)) {
> > - if (migratetype == MIGRATE_MOVABLE)
> > + if (alloc_flags & ALLOC_CMA)
> > page = __rmqueue_cma_fallback(zone, order);
> >
> > if (!page && __rmqueue_fallback(zone, order, migratetype,
> > @@ -3502,11 +3502,9 @@ static inline long __zone_watermark_unusable_free(struct zone *z,
> > if (likely(!alloc_harder))
> > unusable_free += z->nr_reserved_highatomic;
> >
> > -#ifdef CONFIG_CMA
> > /* If allocation can't use CMA areas don't use free CMA pages */
> > - if (!(alloc_flags & ALLOC_CMA))
> > + if (IS_ENABLED(CONFIG_CMA) && !(alloc_flags & ALLOC_CMA))
> > unusable_free += zone_page_state(z, NR_FREE_CMA_PAGES);
> > -#endif
> >
> > return unusable_free;
> > }
> > @@ -3693,6 +3691,16 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
> > return alloc_flags;
> > }
> >
> > +static inline void current_alloc_flags(gfp_t gfp_mask,
> > + unsigned int *alloc_flags)
> > +{
> > + unsigned int pflags = READ_ONCE(current->flags);
> > +
> > + if (!(pflags & PF_MEMALLOC_NOCMA) &&
> > + gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
> > + *alloc_flags |= ALLOC_CMA;
> > +}
>
> I don't like the modification through parameter, would just do what
> current_gfp_context() does and return the modified value.
> Also make it a no-op (including no READ_ONCE(current->flags)) if !CONFIG_CMA,
> please.

Okay.

> > /*
> > * get_page_from_freelist goes through the zonelist trying to allocate
> > * a page.
> > @@ -3706,6 +3714,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
> > struct pglist_data *last_pgdat_dirty_limit = NULL;
> > bool no_fallback;
> >
> > + current_alloc_flags(gfp_mask, &alloc_flags);
>
> I don't see why to move the test here? It will still be executed in the
> fastpath, if that's what you wanted to avoid.

I want to execute it on the fastpath, too. Reason that I moved it here
is that alloc_flags could be reset on slowpath. See the code where
__gfp_pfmemalloc_flags() is on. This is the only place that I can apply
this option to all the allocation paths at once.

Thanks.

> > +
> > retry:
> > /*
> > * Scan zonelist, looking for a zone with enough free.
> > @@ -4339,10 +4349,6 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
> > } else if (unlikely(rt_task(current)) && !in_interrupt())
> > alloc_flags |= ALLOC_HARDER;
> >
> > -#ifdef CONFIG_CMA
> > - if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
> > - alloc_flags |= ALLOC_CMA;
> > -#endif
>
> I would just replace this here with:
> alloc_flags = current_alloc_flags(gfp_mask, alloc_flags);
>
> > return alloc_flags;
> > }
> >
> > @@ -4808,9 +4814,6 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
> > if (should_fail_alloc_page(gfp_mask, order))
> > return false;
> >
> > - if (IS_ENABLED(CONFIG_CMA) && ac->migratetype == MIGRATE_MOVABLE)
> > - *alloc_flags |= ALLOC_CMA;
>
> And same here... Ah, I see. current_alloc_flags() should probably take a
> migratetype parameter instead of gfp_mask then.
>
> > -
> > return true;
> > }
> >
> >
>

2020-07-16 07:48:56

by Vlastimil Babka

Subject: Re: [PATCH 1/4] mm/page_alloc: fix non cma alloc context

On 7/16/20 9:27 AM, Joonsoo Kim wrote:
> On Wed, Jul 15, 2020 at 5:24 PM, Vlastimil Babka <[email protected]> wrote:
>> > /*
>> > * get_page_from_freelist goes through the zonelist trying to allocate
>> > * a page.
>> > @@ -3706,6 +3714,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
>> > struct pglist_data *last_pgdat_dirty_limit = NULL;
>> > bool no_fallback;
>> >
>> > + current_alloc_flags(gfp_mask, &alloc_flags);
>>
>> I don't see why to move the test here? It will still be executed in the
>> fastpath, if that's what you wanted to avoid.
>
> I want to execute it on the fastpath, too. Reason that I moved it here
> is that alloc_flags could be reset on slowpath. See the code where
> __gfp_pfmemalloc_flags() is on. This is the only place that I can apply
> this option to all the allocation paths at once.

But get_page_from_freelist() might be called multiple times in the slowpath, and
also anyone looking for gfp and alloc flags setup will likely not examine this
function. I don't see a problem in having it in two places that already deal
with alloc_flags setup, as it is now.

> Thanks.
>
>> > +
>> > retry:
>> > /*
>> > * Scan zonelist, looking for a zone with enough free.
>> > @@ -4339,10 +4349,6 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
>> > } else if (unlikely(rt_task(current)) && !in_interrupt())
>> > alloc_flags |= ALLOC_HARDER;
>> >
>> > -#ifdef CONFIG_CMA
>> > - if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
>> > - alloc_flags |= ALLOC_CMA;
>> > -#endif
>>
>> I would just replace this here with:
>> alloc_flags = current_alloc_flags(gfp_mask, alloc_flags);
>>
>> > return alloc_flags;
>> > }
>> >
>> > @@ -4808,9 +4814,6 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
>> > if (should_fail_alloc_page(gfp_mask, order))
>> > return false;
>> >
>> > - if (IS_ENABLED(CONFIG_CMA) && ac->migratetype == MIGRATE_MOVABLE)
>> > - *alloc_flags |= ALLOC_CMA;
>>
>> And same here... Ah, I see. current_alloc_flags() should probably take a
>> migratetype parameter instead of gfp_mask then.
>>
>> > -
>> > return true;
>> > }
>> >
>> >
>>
>

2020-07-17 07:32:39

by Joonsoo Kim

Subject: Re: [PATCH 1/4] mm/page_alloc: fix non cma alloc context

On Thu, Jul 16, 2020 at 4:45 PM, Vlastimil Babka <[email protected]> wrote:
>
> On 7/16/20 9:27 AM, Joonsoo Kim wrote:
> > On Wed, Jul 15, 2020 at 5:24 PM, Vlastimil Babka <[email protected]> wrote:
> >> > /*
> >> > * get_page_from_freelist goes through the zonelist trying to allocate
> >> > * a page.
> >> > @@ -3706,6 +3714,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
> >> > struct pglist_data *last_pgdat_dirty_limit = NULL;
> >> > bool no_fallback;
> >> >
> >> > + current_alloc_flags(gfp_mask, &alloc_flags);
> >>
> >> I don't see why to move the test here? It will still be executed in the
> >> fastpath, if that's what you wanted to avoid.
> >
> > I want to execute it on the fastpath, too. Reason that I moved it here
> > is that alloc_flags could be reset on slowpath. See the code where
> > __gfp_pfmemalloc_flags() is on. This is the only place that I can apply
> > this option to all the allocation paths at once.
>
> But get_page_from_freelist() might be called multiple times in the slowpath, and
> also anyone looking for gfp and alloc flags setup will likely not examine this
> function. I don't see a problem in having it in two places that already deal
> with alloc_flags setup, as it is now.

I agree that anyone looking at alloc_flags would easily miss that function.
Okay, I will put it back in its original places, although we now need to add
one more place.
The *three places* are gfp_to_alloc_flags(), prepare_alloc_pages() and
__gfp_pfmemalloc_flags().

Thanks.

2020-07-17 07:48:50

by Joonsoo Kim

Subject: Re: [PATCH 2/4] mm/gup: restrict CMA region by using allocation scope API

On Wed, Jul 15, 2020 at 5:24 PM, Michal Hocko <[email protected]> wrote:
>
> On Wed 15-07-20 14:05:27, Joonsoo Kim wrote:
> > From: Joonsoo Kim <[email protected]>
> >
> > We have well defined scope API to exclude CMA region.
> > Use it rather than manipulating gfp_mask manually. With this change,
> > we can now use __GFP_MOVABLE for gfp_mask and the ZONE_MOVABLE is also
> > searched by page allocator. For hugetlb, gfp_mask is redefined since
> > it has a regular allocation mask filter for migration target.
> >
> > Note that this can be considered as a fix for the commit 9a4e9f3b2d73
> > ("mm: update get_user_pages_longterm to migrate pages allocated from
> > CMA region"). However, "Fixes" tag isn't added here since it is just
> > suboptimal but it doesn't cause any problem.
>
> But it is breaking the contract that the longterm pins never end up in a
> cma managed memory. So I think Fixes tag is really due. I am not sure
> about stable backport. If the patch was the trivial move of

The previous implementation is correct, since longterm pins never end up in
CMA managed memory with it. It's just a different, suboptimal way of
excluding the CMA area. This is why I don't add the "Fixes" tag to the
patch.

> memalloc_nocma_restore then it would be probably worth it because it is
> trivial to review and backport. I suspect that longterm pins in CMA
> regions would cause hard to debug issues where CMA memory will not be
> available. But I am not really sure this is a real problem considering
> how many long term pin users we have and I have also no idea whether
> those are usually used along with CMA users.
>
> Anyway I think it would really be much better to isolate the
> memalloc_nocma_restore and have it first in the series. The reword of

Unfortunately, it's not possible to place it first in the series since the
memalloc_nocma_XXX API has a bug that can still hand out CMA pages even when
the scope is set up. It is fixed by the first patch in this series.

> the __GFP_MOVABLE functionality is orthogonal.

My logic is that we basically assume __GFP_MOVABLE can be used for migration
target allocation; it was only cleared in order to exclude the CMA area. Now
that we use the other method to exclude the CMA area, __GFP_MOVABLE is added
back as usual. If you think this deserves to be a separate patch, I will do
it in the next version.

> Btw __GFP_NOWARN change is not documented.

Will document it.

Thanks.

2020-07-17 08:10:42

by Vlastimil Babka

Subject: Re: [PATCH 1/4] mm/page_alloc: fix non cma alloc context

On 7/17/20 9:29 AM, Joonsoo Kim wrote:
> On Thu, Jul 16, 2020 at 4:45 PM, Vlastimil Babka <[email protected]> wrote:
>>
>> On 7/16/20 9:27 AM, Joonsoo Kim wrote:
>> > On Wed, Jul 15, 2020 at 5:24 PM, Vlastimil Babka <[email protected]> wrote:
>> >> > /*
>> >> > * get_page_from_freelist goes through the zonelist trying to allocate
>> >> > * a page.
>> >> > @@ -3706,6 +3714,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
>> >> > struct pglist_data *last_pgdat_dirty_limit = NULL;
>> >> > bool no_fallback;
>> >> >
>> >> > + current_alloc_flags(gfp_mask, &alloc_flags);
>> >>
>> >> I don't see why to move the test here? It will still be executed in the
>> >> fastpath, if that's what you wanted to avoid.
>> >
>> > I want to execute it on the fastpath, too. Reason that I moved it here
>> > is that alloc_flags could be reset on slowpath. See the code where
>> > __gfp_pfmemalloc_flags() is on. This is the only place that I can apply
>> > this option to all the allocation paths at once.
>>
>> But get_page_from_freelist() might be called multiple times in the slowpath, and
>> also anyone looking for gfp and alloc flags setup will likely not examine this
>> function. I don't see a problem in having it in two places that already deal
>> with alloc_flags setup, as it is now.
>
> I agree that anyone looking alloc flags will miss that function easily. Okay.
> I will place it on its original place, although we now need to add one
> more place.
> *Three places* are gfp_to_alloc_flags(), prepare_alloc_pages() and
> __gfp_pfmemalloc_flags().

Hm the check below should also work for ALLOC_OOM|ALLOC_NOCMA then.

        /* Avoid allocations with no watermarks from looping endlessly */
        if (tsk_is_oom_victim(current) &&
            (alloc_flags == ALLOC_OOM ||
             (gfp_mask & __GFP_NOMEMALLOC)))
                goto nopage;

Maybe it's simpler to change get_page_from_freelist() then. But document well.

> Thanks.
>

2020-07-17 08:17:37

by Vlastimil Babka

Subject: Re: [PATCH 1/4] mm/page_alloc: fix non cma alloc context

On 7/17/20 10:10 AM, Vlastimil Babka wrote:
> On 7/17/20 9:29 AM, Joonsoo Kim wrote:
>> On Thu, Jul 16, 2020 at 4:45 PM, Vlastimil Babka <[email protected]> wrote:
>>>
>>> On 7/16/20 9:27 AM, Joonsoo Kim wrote:
>>> > On Wed, Jul 15, 2020 at 5:24 PM, Vlastimil Babka <[email protected]> wrote:
>>> >> > /*
>>> >> > * get_page_from_freelist goes through the zonelist trying to allocate
>>> >> > * a page.
>>> >> > @@ -3706,6 +3714,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
>>> >> > struct pglist_data *last_pgdat_dirty_limit = NULL;
>>> >> > bool no_fallback;
>>> >> >
>>> >> > + current_alloc_flags(gfp_mask, &alloc_flags);
>>> >>
>>> >> I don't see why to move the test here? It will still be executed in the
>>> >> fastpath, if that's what you wanted to avoid.
>>> >
>>> > I want to execute it on the fastpath, too. Reason that I moved it here
>>> > is that alloc_flags could be reset on slowpath. See the code where
>>> > __gfp_pfmemalloc_flags() is on. This is the only place that I can apply
>>> > this option to all the allocation paths at once.
>>>
>>> But get_page_from_freelist() might be called multiple times in the slowpath, and
>>> also anyone looking for gfp and alloc flags setup will likely not examine this
>>> function. I don't see a problem in having it in two places that already deal
>>> with alloc_flags setup, as it is now.
>>
>> I agree that anyone looking alloc flags will miss that function easily. Okay.
>> I will place it on its original place, although we now need to add one
>> more place.
>> *Three places* are gfp_to_alloc_flags(), prepare_alloc_pages() and
>> __gfp_pfmemalloc_flags().
>
> Hm the check below should also work for ALLOC_OOM|ALLOC_NOCMA then.
>
>         /* Avoid allocations with no watermarks from looping endlessly */
>         if (tsk_is_oom_victim(current) &&
>             (alloc_flags == ALLOC_OOM ||
>              (gfp_mask & __GFP_NOMEMALLOC)))
>                 goto nopage;
>
> Maybe it's simpler to change get_page_from_freelist() then. But document well.

But then we have e.g. should_reclaim_retry() which calls __zone_watermark_ok()
where ALLOC_CMA plays a role too, so that means we should have alloc_mask set up
correctly wrt ALLOC_CMA at the __alloc_pages_slowpath() level...

>> Thanks.
>>
>

2020-07-17 08:31:00

by Michal Hocko

Subject: Re: [PATCH 2/4] mm/gup: restrict CMA region by using allocation scope API

On Fri 17-07-20 16:46:38, Joonsoo Kim wrote:
> On Wed, Jul 15, 2020 at 5:24 PM, Michal Hocko <[email protected]> wrote:
> >
> > On Wed 15-07-20 14:05:27, Joonsoo Kim wrote:
> > > From: Joonsoo Kim <[email protected]>
> > >
> > > We have well defined scope API to exclude CMA region.
> > > Use it rather than manipulating gfp_mask manually. With this change,
> > > we can now use __GFP_MOVABLE for gfp_mask and the ZONE_MOVABLE is also
> > > searched by page allocator. For hugetlb, gfp_mask is redefined since
> > > it has a regular allocation mask filter for migration target.
> > >
> > > Note that this can be considered as a fix for the commit 9a4e9f3b2d73
> > > ("mm: update get_user_pages_longterm to migrate pages allocated from
> > > CMA region"). However, "Fixes" tag isn't added here since it is just
> > > suboptimal but it doesn't cause any problem.
> >
> > But it is breaking the contract that the longterm pins never end up in a
> > cma managed memory. So I think Fixes tag is really due. I am not sure
> > about stable backport. If the patch was the trivial move of
>
> Previous implementation is correct since longterm pins never end up in a CMA
> managed memory with that implementation. It's just a different and suboptimal
> > implementation to exclude the CMA area. This is why I don't add the "Fixes"
> tag on the patch.

But the current implementation calls memalloc_nocma_restore too early, so
__gup_longterm_locked will possibly migrate pages to CMA ranges as there
is no __GFP_MOVABLE restriction in place. Or am I missing something?

--
Michal Hocko
SUSE Labs

2020-07-17 08:33:30

by David Laight

Subject: RE: [PATCH 1/4] mm/page_alloc: fix non cma alloc context

From: [email protected]
> Sent: 15 July 2020 06:05
> From: Joonsoo Kim <[email protected]>
>
> Currently, preventing cma area in page allocation is implemented by using
> current_gfp_context(). However, there are two problems of this
> implementation.
...
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6416d08..cd53894 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
...
> @@ -3693,6 +3691,16 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
> return alloc_flags;
> }
>
> +static inline void current_alloc_flags(gfp_t gfp_mask,
> + unsigned int *alloc_flags)
> +{
> + unsigned int pflags = READ_ONCE(current->flags);
> +
> + if (!(pflags & PF_MEMALLOC_NOCMA) &&
> + gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
> + *alloc_flags |= ALLOC_CMA;
> +}
> +

I'd guess this would be easier to understand and probably generate
better code if renamed and used as:
alloc_flags |= can_alloc_cma(gfp_mask);

Given it is a 'static inline' the compiler might end up
generating the same code.
If you needed to clear a flag doing:
alloc_flags = current_alloc_flags(gfp_mask, alloc_flags);
is much better for a non-inlined function.

David


2020-07-17 09:13:14

by Joonsoo Kim

Subject: Re: [PATCH 1/4] mm/page_alloc: fix non cma alloc context

On Fri, Jul 17, 2020 at 5:15 PM, Vlastimil Babka <[email protected]> wrote:
>
> On 7/17/20 10:10 AM, Vlastimil Babka wrote:
> > On 7/17/20 9:29 AM, Joonsoo Kim wrote:
> >> On Thu, Jul 16, 2020 at 4:45 PM, Vlastimil Babka <[email protected]> wrote:
> >>>
> >>> On 7/16/20 9:27 AM, Joonsoo Kim wrote:
> >>> > On Wed, Jul 15, 2020 at 5:24 PM, Vlastimil Babka <[email protected]> wrote:
> >>> >> > /*
> >>> >> > * get_page_from_freelist goes through the zonelist trying to allocate
> >>> >> > * a page.
> >>> >> > @@ -3706,6 +3714,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
> >>> >> > struct pglist_data *last_pgdat_dirty_limit = NULL;
> >>> >> > bool no_fallback;
> >>> >> >
> >>> >> > + current_alloc_flags(gfp_mask, &alloc_flags);
> >>> >>
> >>> >> I don't see why to move the test here? It will still be executed in the
> >>> >> fastpath, if that's what you wanted to avoid.
> >>> >
> >>> > I want to execute it on the fastpath, too. Reason that I moved it here
> >>> > is that alloc_flags could be reset on slowpath. See the code where
> >>> > __gfp_pfmemalloc_flags() is on. This is the only place that I can apply
> >>> > this option to all the allocation paths at once.
> >>>
> >>> But get_page_from_freelist() might be called multiple times in the slowpath, and
> >>> also anyone looking for gfp and alloc flags setup will likely not examine this
> >>> function. I don't see a problem in having it in two places that already deal
> >>> with alloc_flags setup, as it is now.
> >>
> >> I agree that anyone looking alloc flags will miss that function easily. Okay.
> >> I will place it on its original place, although we now need to add one
> >> more place.
> >> *Three places* are gfp_to_alloc_flags(), prepare_alloc_pages() and
> >> __gfp_pfmemalloc_flags().
> >
> > Hm the check below should also work for ALLOC_OOM|ALLOC_NOCMA then.
> >
> >         /* Avoid allocations with no watermarks from looping endlessly */
> >         if (tsk_is_oom_victim(current) &&
> >             (alloc_flags == ALLOC_OOM ||
> >              (gfp_mask & __GFP_NOMEMALLOC)))
> >                 goto nopage;
> >
> > Maybe it's simpler to change get_page_from_freelist() then. But document well.
>
> But then we have e.g. should_reclaim_retry() which calls __zone_watermark_ok()
> where ALLOC_CMA plays a role too, so that means we should have alloc_mask set up
> correctly wrt ALLOC_CMA at the __alloc_pages_slowpath() level...

Good catch! Hmm... Okay. It needs to be handled in all three places.
I will fix it in the next version. Anyway, we need some clean-up of the
alloc_flags handling since it is not great for maintenance.
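
For reference, a rough sketch of that shape, with the helper changed to return
the updated flags and hooked into the three places above (an illustration of
where the next version could go, not the posted patch):

	static inline unsigned int current_alloc_flags(gfp_t gfp_mask,
					unsigned int alloc_flags)
	{
	#ifdef CONFIG_CMA
		unsigned int pflags = READ_ONCE(current->flags);

		if (!(pflags & PF_MEMALLOC_NOCMA) &&
		    gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
			alloc_flags |= ALLOC_CMA;
	#endif
		return alloc_flags;
	}

	/* 1) prepare_alloc_pages(): covers the fastpath */
	*alloc_flags = current_alloc_flags(gfp_mask, *alloc_flags);

	/* 2) gfp_to_alloc_flags(): covers the slowpath */
	alloc_flags = current_alloc_flags(gfp_mask, alloc_flags);

	/* 3) __alloc_pages_slowpath(), where __gfp_pfmemalloc_flags() resets flags */
	reserve_flags = __gfp_pfmemalloc_flags(gfp_mask);
	if (reserve_flags)
		alloc_flags = current_alloc_flags(gfp_mask, reserve_flags);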

Thanks.

2020-07-17 09:16:59

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [PATCH 1/4] mm/page_alloc: fix non cma alloc context

On Fri, Jul 17, 2020 at 5:32 PM, David Laight <[email protected]> wrote:
>
> From: [email protected]
> > Sent: 15 July 2020 06:05
> > From: Joonsoo Kim <[email protected]>
> >
> > Currently, preventing cma area in page allocation is implemented by using
> > current_gfp_context(). However, there are two problems of this
> > implementation.
> ...
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 6416d08..cd53894 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> ...
> > @@ -3693,6 +3691,16 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
> > return alloc_flags;
> > }
> >
> > +static inline void current_alloc_flags(gfp_t gfp_mask,
> > + unsigned int *alloc_flags)
> > +{
> > + unsigned int pflags = READ_ONCE(current->flags);
> > +
> > + if (!(pflags & PF_MEMALLOC_NOCMA) &&
> > + gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
> > + *alloc_flags |= ALLOC_CMA;
> > +}
> > +
>
> I'd guess this would be easier to understand and probably generate
> better code if renamed and used as:
> alloc_flags |= can_alloc_cma(gfp_mask);
>
> Given it is a 'static inline', the compiler might end up
> generating the same code.
> If you needed to clear a flag, doing:
> alloc_flags = current_alloc_flags(gfp_mask, alloc_flags);
> is much better for a non-inlined function.

Vlastimil suggested this approach and I have agreed with it. Anyway,
thanks for the suggestion.

Thanks.

2020-07-17 09:28:59

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [PATCH 2/4] mm/gup: restrict CMA region by using allocation scope API

On Fri, Jul 17, 2020 at 5:26 PM, Michal Hocko <[email protected]> wrote:
>
> On Fri 17-07-20 16:46:38, Joonsoo Kim wrote:
> > On Wed, Jul 15, 2020 at 5:24 PM, Michal Hocko <[email protected]> wrote:
> > >
> > > On Wed 15-07-20 14:05:27, Joonsoo Kim wrote:
> > > > From: Joonsoo Kim <[email protected]>
> > > >
> > > > We have well defined scope API to exclude CMA region.
> > > > Use it rather than manipulating gfp_mask manually. With this change,
> > > > we can now use __GFP_MOVABLE for gfp_mask and the ZONE_MOVABLE is also
> > > > searched by page allocator. For hugetlb, gfp_mask is redefined since
> > > > it has a regular allocation mask filter for migration target.
> > > >
> > > > Note that this can be considered as a fix for the commit 9a4e9f3b2d73
> > > > ("mm: update get_user_pages_longterm to migrate pages allocated from
> > > > CMA region"). However, "Fixes" tag isn't added here since it is just
> > > > suboptimal but it doesn't cause any problem.
> > >
> > > But it is breaking the contract that the longterm pins never end up in a
> > > cma managed memory. So I think Fixes tag is really due. I am not sure
> > > about stable backport. If the patch was the trivial move of
> >
> > The previous implementation is correct since longterm pins never end up in
> > CMA managed memory with that implementation. It's just a different and
> > suboptimal way to exclude the CMA area. This is why I don't add the "Fixes"
> > tag to the patch.
>
> But the current implementation calls memalloc_nocma_restore too early so
> __gup_longterm_locked will possibly migrate pages to CMA ranges as there
> is no __GFP_MOVABLE restriction in place. Or am I missing something?

IIUC, calling memalloc_nocma_restore() too early doesn't cause an
actual problem.

The final check is done by check_and_migrate_cma_pages(), which is outside
of the scope API. If we find a CMA page within the gup range there, the page
is migrated to a migration target page, and this target page is allocated by
new_non_cma_page().

new_non_cma_page() tries to allocate the page without __GFP_MOVABLE, so the
returned page will not be CMA area memory. Therefore, even if
memalloc_nocma_restore() is called early, there is no actual problem.
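
For context, the relevant part of new_non_cma_page() in mm/gup.c looks roughly
like this (heavily trimmed; the highmem/hugetlb/THP branches are elided):

	static struct page *new_non_cma_page(struct page *page, unsigned long private)
	{
		/* Allocate the migration target on the same node as the source page. */
		int nid = page_to_nid(page);
		gfp_t gfp_mask = GFP_USER | __GFP_NOWARN;

		/* ... PageHighMem / PageHuge / PageTransHuge handling elided ... */

		/* No __GFP_MOVABLE here, so the new page cannot come from CMA pageblocks. */
		return __alloc_pages_node(nid, gfp_mask, 0);
	}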

Thanks.

2020-07-17 11:09:07

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 2/4] mm/gup: restrict CMA region by using allocation scope API

On Fri 17-07-20 18:28:16, Joonsoo Kim wrote:
> On Fri, Jul 17, 2020 at 5:26 PM, Michal Hocko <[email protected]> wrote:
> >
> > On Fri 17-07-20 16:46:38, Joonsoo Kim wrote:
> > > On Wed, Jul 15, 2020 at 5:24 PM, Michal Hocko <[email protected]> wrote:
> > > >
> > > > On Wed 15-07-20 14:05:27, Joonsoo Kim wrote:
> > > > > From: Joonsoo Kim <[email protected]>
> > > > >
> > > > > We have well defined scope API to exclude CMA region.
> > > > > Use it rather than manipulating gfp_mask manually. With this change,
> > > > > we can now use __GFP_MOVABLE for gfp_mask and the ZONE_MOVABLE is also
> > > > > searched by page allocator. For hugetlb, gfp_mask is redefined since
> > > > > it has a regular allocation mask filter for migration target.
> > > > >
> > > > > Note that this can be considered as a fix for the commit 9a4e9f3b2d73
> > > > > ("mm: update get_user_pages_longterm to migrate pages allocated from
> > > > > CMA region"). However, "Fixes" tag isn't added here since it is just
> > > > > suboptimal but it doesn't cause any problem.
> > > >
> > > > But it is breaking the contract that the longterm pins never end up in a
> > > > cma managed memory. So I think Fixes tag is really due. I am not sure
> > > > about stable backport. If the patch was the trivial move of
> > >
> > > The previous implementation is correct since longterm pins never end up in
> > > CMA managed memory with that implementation. It's just a different and
> > > suboptimal way to exclude the CMA area. This is why I don't add the "Fixes"
> > > tag to the patch.
> >
> > But the current implementation calls memalloc_nocma_restore too early so
> > __gup_longterm_locked will possibly migrate pages to CMA ranges as there
> > is no __GFP_MOVABLE restriction in place. Or am I missing something?
>
> IIUC, calling memalloc_nocma_restore() too early doesn't cause an
> actual problem.
>
> The final check is done by check_and_migrate_cma_pages(), which is outside
> of the scope API. If we find a CMA page within the gup range there, the page
> is migrated to a migration target page, and this target page is allocated by
> new_non_cma_page().
>
> new_non_cma_page() tries to allocate the page without __GFP_MOVABLE, so the
> returned page will not be CMA area memory. Therefore, even if
> memalloc_nocma_restore() is called early, there is no actual problem.

Right you are! I misread gfp_t gfp_mask = GFP_USER | __GFP_NOWARN
and didn't realize that it doesn't include the __GFP_MOVABLE flag.

Sorry about the noise! The issue is only a matter of coding style, so a Fixes
tag could indeed cause more confusion than it would help.
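
For reference, the scope API that the series relies on is used roughly like
this (a minimal usage sketch; the placeholder below stands in for the actual
gup work and is not a real function):

	unsigned int flags = memalloc_nocma_save();

	/*
	 * PF_MEMALLOC_NOCMA is now set in current->flags, so page
	 * allocations in this section are not served from CMA pageblocks.
	 */
	do_longterm_pin_work();	/* placeholder */

	memalloc_nocma_restore(flags);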

--
Michal Hocko
SUSE Labs