2020-09-28 08:52:18

by Joonsoo Kim

[permalink] [raw]
Subject: [PATCH v2 for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs

From: Joonsoo Kim <[email protected]>

memalloc_nocma_{save/restore} APIs can be used to skip page allocation
on CMA area, but, there is a missing case and the page on CMA area could
be allocated even if APIs are used. This patch handles this case to fix
the potential issue.

Missing case is an allocation from the pcplist. MIGRATE_MOVABLE pcplist
could have the pages on CMA area so we need to skip it if ALLOC_CMA isn't
specified.

Fixes: 8510e69c8efe (mm/page_alloc: fix memalloc_nocma_{save/restore} APIs)
Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/page_alloc.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fab5e97..104d2e1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3367,9 +3367,16 @@ struct page *rmqueue(struct zone *preferred_zone,
struct page *page;

if (likely(order == 0)) {
- page = rmqueue_pcplist(preferred_zone, zone, gfp_flags,
+ /*
+ * MIGRATE_MOVABLE pcplist could have the pages on CMA area and
+ * we need to skip it when CMA area isn't allowed.
+ */
+ if (!IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA ||
+ migratetype != MIGRATE_MOVABLE) {
+ page = rmqueue_pcplist(preferred_zone, zone, gfp_flags,
migratetype, alloc_flags);
- goto out;
+ goto out;
+ }
}

/*
@@ -3381,7 +3388,7 @@ struct page *rmqueue(struct zone *preferred_zone,

do {
page = NULL;
- if (alloc_flags & ALLOC_HARDER) {
+ if (order > 0 && alloc_flags & ALLOC_HARDER) {
page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
if (page)
trace_mm_page_alloc_zone_locked(page, order, migratetype);
--
2.7.4


2020-09-28 23:56:12

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v2 for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs

On Mon, 28 Sep 2020 17:50:46 +0900 [email protected] wrote:

> From: Joonsoo Kim <[email protected]>
>
> memalloc_nocma_{save/restore} APIs can be used to skip page allocation
> on CMA area, but, there is a missing case and the page on CMA area could
> be allocated even if APIs are used. This patch handles this case to fix
> the potential issue.
>
> Missing case is an allocation from the pcplist. MIGRATE_MOVABLE pcplist
> could have the pages on CMA area so we need to skip it if ALLOC_CMA isn't
> specified.

Changelog doesn't describe the end-user visible effects of the bug.
Please send this description?

> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3367,9 +3367,16 @@ struct page *rmqueue(struct zone *preferred_zone,
> struct page *page;
>
> if (likely(order == 0)) {
> - page = rmqueue_pcplist(preferred_zone, zone, gfp_flags,
> + /*
> + * MIGRATE_MOVABLE pcplist could have the pages on CMA area and
> + * we need to skip it when CMA area isn't allowed.
> + */
> + if (!IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA ||
> + migratetype != MIGRATE_MOVABLE) {
> + page = rmqueue_pcplist(preferred_zone, zone, gfp_flags,
> migratetype, alloc_flags);
> - goto out;
> + goto out;
> + }
> }
>
> /*

We still really really don't want to be adding overhead to the page
allocation hotpath for a really really obscure feature from a single
callsite.

Do we have an understanding of how many people's kernels are enabling
CONFIG_CMA?

I previously suggested retrying the allocation in
__gup_longterm_locked() but you said "it cannot ensure that we
eventually get the non-CMA page". Please explain why?

What about manually emptying the pcplists beforehand?

Or byassing the pcplists for this caller and calling __rmqueue() directly?

> @@ -3381,7 +3388,7 @@ struct page *rmqueue(struct zone *preferred_zone,
>
> do {
> page = NULL;
> - if (alloc_flags & ALLOC_HARDER) {
> + if (order > 0 && alloc_flags & ALLOC_HARDER) {
> page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
> if (page)
> trace_mm_page_alloc_zone_locked(page, order, migratetype);

What does this hunk do?

2020-09-29 01:30:53

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [PATCH v2 for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs

2020년 9월 29일 (화) 오전 8:52, Andrew Morton <[email protected]>님이 작성:
>
> On Mon, 28 Sep 2020 17:50:46 +0900 [email protected] wrote:
>
> > From: Joonsoo Kim <[email protected]>
> >
> > memalloc_nocma_{save/restore} APIs can be used to skip page allocation
> > on CMA area, but, there is a missing case and the page on CMA area could
> > be allocated even if APIs are used. This patch handles this case to fix
> > the potential issue.
> >
> > Missing case is an allocation from the pcplist. MIGRATE_MOVABLE pcplist
> > could have the pages on CMA area so we need to skip it if ALLOC_CMA isn't
> > specified.
>
> Changelog doesn't describe the end-user visible effects of the bug.
> Please send this description?

How about this one?

memalloc_nocma_{save/restore} APIs can be used to skip page allocation
on CMA area, but, there is a missing case and the page on CMA area could
be allocated even if APIs are used. This patch handles this case to fix
the potential issue.

For now, these APIs are used to prevent long-term pinning on the CMA page.
When the long-term pinning is requested on the CMA page, it is migrated to
the non-CMA page before pinning. This non-CMA page is allocated by using
memalloc_nocma_{save/restore} APIs. If APIs doesn't work as intended,
the CMA page is allocated and it is pinned for a long time. This long-term pin
for the CMA page causes cma_alloc() failure and it could result in wrong
behaviour on the device driver who uses the cma_alloc().

Missing case is an allocation from the pcplist. MIGRATE_MOVABLE pcplist
could have the pages on CMA area so we need to skip it if ALLOC_CMA isn't
specified.

> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3367,9 +3367,16 @@ struct page *rmqueue(struct zone *preferred_zone,
> > struct page *page;
> >
> > if (likely(order == 0)) {
> > - page = rmqueue_pcplist(preferred_zone, zone, gfp_flags,
> > + /*
> > + * MIGRATE_MOVABLE pcplist could have the pages on CMA area and
> > + * we need to skip it when CMA area isn't allowed.
> > + */
> > + if (!IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA ||
> > + migratetype != MIGRATE_MOVABLE) {
> > + page = rmqueue_pcplist(preferred_zone, zone, gfp_flags,
> > migratetype, alloc_flags);
> > - goto out;
> > + goto out;
> > + }
> > }
> >
> > /*
>
> We still really really don't want to be adding overhead to the page
> allocation hotpath for a really really obscure feature from a single
> callsite.

In fact, if we clear the __GFP_MOVABLE flag when initializing the
allocation context, we can avoid CMA page allocation without
adding this overhead to the page allocation hotpath. But, I think
this is not a good practice since it cheats the allocation type. There
would be a side-effect, for example, we cannot use the memory on
the ZONE_MOVABLE in this case.

> Do we have an understanding of how many people's kernels are enabling
> CONFIG_CMA?

AFAIK, the almost embedded system enables CONFIG_CMA. For example,
the handset, TV, AVN in a car. Recently, Roman makes CMA usable for huge
page allocation so users are continuously increased.

> I previously suggested retrying the allocation in
> __gup_longterm_locked() but you said "it cannot ensure that we
> eventually get the non-CMA page". Please explain why?

To avoid allocating a CMA page, the pcplist should be empty. Retry doesn't
guarantee that we will hit the case that the pcplist is empty. It increases
the probability for this case, but, I think that relying on the
probability is not
a good practice.

> What about manually emptying the pcplists beforehand?

It also increases the probability. schedule() or interrupt after emptying but
before the allocation could invalidate the effect.

> Or byassing the pcplists for this caller and calling __rmqueue() directly?

What this patch does is this one.

> > @@ -3381,7 +3388,7 @@ struct page *rmqueue(struct zone *preferred_zone,
> >
> > do {
> > page = NULL;
> > - if (alloc_flags & ALLOC_HARDER) {
> > + if (order > 0 && alloc_flags & ALLOC_HARDER) {
> > page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
> > if (page)
> > trace_mm_page_alloc_zone_locked(page, order, migratetype);
>
> What does this hunk do?

MIGRATE_HIGHATOMIC is a reserved area for high order atomic allocation.
This hunk makes that order-0 allocation skip this area.

Thanks.

2020-09-29 04:53:08

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v2 for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs

On Tue, 29 Sep 2020 10:28:05 +0900 Joonsoo Kim <[email protected]> wrote:

> > What about manually emptying the pcplists beforehand?
>
> It also increases the probability. schedule() or interrupt after emptying but
> before the allocation could invalidate the effect.

Keep local interrupts disabled across the pcp drain and the allocation
attempt.

> > Or byassing the pcplists for this caller and calling __rmqueue() directly?
>
> What this patch does is this one.

I meant via a different function rather than by adding overhead to the
existing commonly-used function.

2020-09-29 06:48:02

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [PATCH v2 for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs

2020년 9월 29일 (화) 오후 1:50, Andrew Morton <[email protected]>님이 작성:
>
> On Tue, 29 Sep 2020 10:28:05 +0900 Joonsoo Kim <[email protected]> wrote:
>
> > > What about manually emptying the pcplists beforehand?
> >
> > It also increases the probability. schedule() or interrupt after emptying but
> > before the allocation could invalidate the effect.
>
> Keep local interrupts disabled across the pcp drain and the allocation
> attempt.

As said before, it's an allocation context API and actual allocation
happens later.
Doing such things there is not an easy job.

> > > Or byassing the pcplists for this caller and calling __rmqueue() directly?
> >
> > What this patch does is this one.
>
> I meant via a different function rather than by adding overhead to the
> existing commonly-used function.

Got it. One idea could be disabling/enabling pcp list on these APIs but
it's overhead would not be appropriate on these APIs. I don't have
another idea that doesn't touch the allocation path.

Thanks.

2020-09-29 08:11:56

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH v2 for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs

On Mon 28-09-20 17:50:46, Joonsoo Kim wrote:
> From: Joonsoo Kim <[email protected]>
>
> memalloc_nocma_{save/restore} APIs can be used to skip page allocation
> on CMA area, but, there is a missing case and the page on CMA area could
> be allocated even if APIs are used. This patch handles this case to fix
> the potential issue.
>
> Missing case is an allocation from the pcplist. MIGRATE_MOVABLE pcplist
> could have the pages on CMA area so we need to skip it if ALLOC_CMA isn't
> specified.
>
> Fixes: 8510e69c8efe (mm/page_alloc: fix memalloc_nocma_{save/restore} APIs)
> Signed-off-by: Joonsoo Kim <[email protected]>
> ---
> mm/page_alloc.c | 13 ++++++++++---
> 1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index fab5e97..104d2e1 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3367,9 +3367,16 @@ struct page *rmqueue(struct zone *preferred_zone,
> struct page *page;
>
> if (likely(order == 0)) {
> - page = rmqueue_pcplist(preferred_zone, zone, gfp_flags,
> + /*
> + * MIGRATE_MOVABLE pcplist could have the pages on CMA area and
> + * we need to skip it when CMA area isn't allowed.
> + */
> + if (!IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA ||
> + migratetype != MIGRATE_MOVABLE) {
> + page = rmqueue_pcplist(preferred_zone, zone, gfp_flags,
> migratetype, alloc_flags);
> - goto out;
> + goto out;
> + }
> }

This approach looks definitely better than the previous version.

>
> /*
> @@ -3381,7 +3388,7 @@ struct page *rmqueue(struct zone *preferred_zone,
>
> do {
> page = NULL;
> - if (alloc_flags & ALLOC_HARDER) {
> + if (order > 0 && alloc_flags & ALLOC_HARDER) {
> page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
> if (page)
> trace_mm_page_alloc_zone_locked(page, order, migratetype);

But this condition is not clear to me. __rmqueue_smallest doesn't access
pcp lists. Maybe I have missed the point in the original discussion but
this deserves a comment at least.

> --
> 2.7.4

--
Michal Hocko
SUSE Labs

2020-09-29 08:17:13

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v2 for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs

On 9/28/20 10:50 AM, [email protected] wrote:
> From: Joonsoo Kim <[email protected]>
>
> memalloc_nocma_{save/restore} APIs can be used to skip page allocation
> on CMA area, but, there is a missing case and the page on CMA area could
> be allocated even if APIs are used. This patch handles this case to fix
> the potential issue.
>
> Missing case is an allocation from the pcplist. MIGRATE_MOVABLE pcplist
> could have the pages on CMA area so we need to skip it if ALLOC_CMA isn't
> specified.
>
> Fixes: 8510e69c8efe (mm/page_alloc: fix memalloc_nocma_{save/restore} APIs)
> Signed-off-by: Joonsoo Kim <[email protected]>

It's unfortunate, but hopefully we can still get the CMA in ZONE_MOVABLE one day
and get rid of all of this again? :)

For that reason I'd prefer simple solutions even if there's some potential
overhead. I think those tests should be in the noise, and avoided completely
with !CONFIG_CMA.

Acked-by: Vlastimil Babka <[email protected]>

> ---
> mm/page_alloc.c | 13 ++++++++++---
> 1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index fab5e97..104d2e1 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3367,9 +3367,16 @@ struct page *rmqueue(struct zone *preferred_zone,
> struct page *page;
>
> if (likely(order == 0)) {
> - page = rmqueue_pcplist(preferred_zone, zone, gfp_flags,
> + /*
> + * MIGRATE_MOVABLE pcplist could have the pages on CMA area and
> + * we need to skip it when CMA area isn't allowed.
> + */
> + if (!IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA ||
> + migratetype != MIGRATE_MOVABLE) {
> + page = rmqueue_pcplist(preferred_zone, zone, gfp_flags,
> migratetype, alloc_flags);
> - goto out;
> + goto out;
> + }
> }
>
> /*
> @@ -3381,7 +3388,7 @@ struct page *rmqueue(struct zone *preferred_zone,
>
> do {
> page = NULL;
> - if (alloc_flags & ALLOC_HARDER) {
> + if (order > 0 && alloc_flags & ALLOC_HARDER) {
> page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
> if (page)
> trace_mm_page_alloc_zone_locked(page, order, migratetype);
>

2020-09-29 08:42:11

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [PATCH v2 for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs

2020년 9월 29일 (화) 오후 5:08, Michal Hocko <[email protected]>님이 작성:
>
> On Mon 28-09-20 17:50:46, Joonsoo Kim wrote:
> > From: Joonsoo Kim <[email protected]>
> >
> > memalloc_nocma_{save/restore} APIs can be used to skip page allocation
> > on CMA area, but, there is a missing case and the page on CMA area could
> > be allocated even if APIs are used. This patch handles this case to fix
> > the potential issue.
> >
> > Missing case is an allocation from the pcplist. MIGRATE_MOVABLE pcplist
> > could have the pages on CMA area so we need to skip it if ALLOC_CMA isn't
> > specified.
> >
> > Fixes: 8510e69c8efe (mm/page_alloc: fix memalloc_nocma_{save/restore} APIs)
> > Signed-off-by: Joonsoo Kim <[email protected]>
> > ---
> > mm/page_alloc.c | 13 ++++++++++---
> > 1 file changed, 10 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index fab5e97..104d2e1 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3367,9 +3367,16 @@ struct page *rmqueue(struct zone *preferred_zone,
> > struct page *page;
> >
> > if (likely(order == 0)) {
> > - page = rmqueue_pcplist(preferred_zone, zone, gfp_flags,
> > + /*
> > + * MIGRATE_MOVABLE pcplist could have the pages on CMA area and
> > + * we need to skip it when CMA area isn't allowed.
> > + */
> > + if (!IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA ||
> > + migratetype != MIGRATE_MOVABLE) {
> > + page = rmqueue_pcplist(preferred_zone, zone, gfp_flags,
> > migratetype, alloc_flags);
> > - goto out;
> > + goto out;
> > + }
> > }
>
> This approach looks definitely better than the previous version.

Thanks!

> >
> > /*
> > @@ -3381,7 +3388,7 @@ struct page *rmqueue(struct zone *preferred_zone,
> >
> > do {
> > page = NULL;
> > - if (alloc_flags & ALLOC_HARDER) {
> > + if (order > 0 && alloc_flags & ALLOC_HARDER) {
> > page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
> > if (page)
> > trace_mm_page_alloc_zone_locked(page, order, migratetype);
>
> But this condition is not clear to me. __rmqueue_smallest doesn't access
> pcp lists. Maybe I have missed the point in the original discussion but
> this deserves a comment at least.

Before the pcplist skipping is applied, order-0 request can not reach here.
But, now, an order-0 request can reach here. Free memory on
MIGRATE_HIGHATOMIC is reserved for high-order atomic allocation
so an order-0 request should skip it.

I will add a code comment on the next version.

Thanks.

2020-09-29 09:56:03

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH v2 for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs

On Tue 29-09-20 17:38:43, Joonsoo Kim wrote:
> 2020년 9월 29일 (화) 오후 5:08, Michal Hocko <[email protected]>님이 작성:
> >
> > On Mon 28-09-20 17:50:46, Joonsoo Kim wrote:
> > > From: Joonsoo Kim <[email protected]>
> > >
> > > memalloc_nocma_{save/restore} APIs can be used to skip page allocation
> > > on CMA area, but, there is a missing case and the page on CMA area could
> > > be allocated even if APIs are used. This patch handles this case to fix
> > > the potential issue.
> > >
> > > Missing case is an allocation from the pcplist. MIGRATE_MOVABLE pcplist
> > > could have the pages on CMA area so we need to skip it if ALLOC_CMA isn't
> > > specified.
> > >
> > > Fixes: 8510e69c8efe (mm/page_alloc: fix memalloc_nocma_{save/restore} APIs)
> > > Signed-off-by: Joonsoo Kim <[email protected]>
> > > ---
> > > mm/page_alloc.c | 13 ++++++++++---
> > > 1 file changed, 10 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index fab5e97..104d2e1 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -3367,9 +3367,16 @@ struct page *rmqueue(struct zone *preferred_zone,
> > > struct page *page;
> > >
> > > if (likely(order == 0)) {
> > > - page = rmqueue_pcplist(preferred_zone, zone, gfp_flags,
> > > + /*
> > > + * MIGRATE_MOVABLE pcplist could have the pages on CMA area and
> > > + * we need to skip it when CMA area isn't allowed.
> > > + */
> > > + if (!IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA ||
> > > + migratetype != MIGRATE_MOVABLE) {
> > > + page = rmqueue_pcplist(preferred_zone, zone, gfp_flags,
> > > migratetype, alloc_flags);
> > > - goto out;
> > > + goto out;
> > > + }
> > > }
> >
> > This approach looks definitely better than the previous version.
>
> Thanks!
>
> > >
> > > /*
> > > @@ -3381,7 +3388,7 @@ struct page *rmqueue(struct zone *preferred_zone,
> > >
> > > do {
> > > page = NULL;
> > > - if (alloc_flags & ALLOC_HARDER) {
> > > + if (order > 0 && alloc_flags & ALLOC_HARDER) {
> > > page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
> > > if (page)
> > > trace_mm_page_alloc_zone_locked(page, order, migratetype);
> >
> > But this condition is not clear to me. __rmqueue_smallest doesn't access
> > pcp lists. Maybe I have missed the point in the original discussion but
> > this deserves a comment at least.
>
> Before the pcplist skipping is applied, order-0 request can not reach here.
> But, now, an order-0 request can reach here. Free memory on
> MIGRATE_HIGHATOMIC is reserved for high-order atomic allocation
> so an order-0 request should skip it.

OK, I see. Thanks for the clarification.

> I will add a code comment on the next version.

Thanks, that would be indeed helpful. With that, feel free to add
Acked-by: Michal Hocko <[email protected]>

--
Michal Hocko
SUSE Labs