2020-08-25 05:04:09

by Joonsoo Kim

[permalink] [raw]
Subject: [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs

From: Joonsoo Kim <[email protected]>

memalloc_nocma_{save/restore} APIs can be used to skip page allocation
on CMA area, but, there is a missing case and the page on CMA area could
be allocated even if APIs are used. This patch handles this case to fix
the potential issue.

Missing case is an allocation from the pcplist. MIGRATE_MOVABLE pcplist
could have the pages on CMA area so we need to skip it if ALLOC_CMA isn't
specified.

This patch implements this behaviour by checking allocated page from
the pcplist rather than skipping an allocation from the pcplist entirely.
Skipping the pcplist entirely would result in a mismatch between watermark
check and actual page allocation. And, it requires to break current code
layering that order-0 page is always handled by the pcplist. I'd prefer
to avoid it so this patch uses different way to skip CMA page allocation
from the pcplist.

Fixes: 8510e69c8efe (mm/page_alloc: fix memalloc_nocma_{save/restore} APIs)
Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/page_alloc.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0e2bab4..c4abf58 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3341,6 +3341,22 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
pcp = &this_cpu_ptr(zone->pageset)->pcp;
list = &pcp->lists[migratetype];
page = __rmqueue_pcplist(zone, migratetype, alloc_flags, pcp, list);
+#ifdef CONFIG_CMA
+ if (page) {
+ int mt = get_pcppage_migratetype(page);
+
+ /*
+ * pcp could have the pages on CMA area and we need to skip it
+ * when !ALLOC_CMA. Free all pcplist and retry allocation.
+ */
+ if (is_migrate_cma(mt) && !(alloc_flags & ALLOC_CMA)) {
+ list_add(&page->lru, &pcp->lists[migratetype]);
+ pcp->count++;
+ free_pcppages_bulk(zone, pcp->count, pcp);
+ page = __rmqueue_pcplist(zone, migratetype, alloc_flags, pcp, list);
+ }
+ }
+#endif
if (page) {
__count_zid_vm_events(PGALLOC, page_zonenum(page), 1);
zone_statistics(preferred_zone, zone);
--
2.7.4


2020-08-25 05:14:01

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs

On Tue, 25 Aug 2020 13:59:42 +0900 [email protected] wrote:

> From: Joonsoo Kim <[email protected]>
>
> memalloc_nocma_{save/restore} APIs can be used to skip page allocation
> on CMA area, but, there is a missing case and the page on CMA area could
> be allocated even if APIs are used. This patch handles this case to fix
> the potential issue.
>
> Missing case is an allocation from the pcplist. MIGRATE_MOVABLE pcplist
> could have the pages on CMA area so we need to skip it if ALLOC_CMA isn't
> specified.
>
> This patch implements this behaviour by checking allocated page from
> the pcplist rather than skipping an allocation from the pcplist entirely.
> Skipping the pcplist entirely would result in a mismatch between watermark
> check and actual page allocation. And, it requires to break current code
> layering that order-0 page is always handled by the pcplist. I'd prefer
> to avoid it so this patch uses different way to skip CMA page allocation
> from the pcplist.
>
> ...
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3341,6 +3341,22 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
> pcp = &this_cpu_ptr(zone->pageset)->pcp;
> list = &pcp->lists[migratetype];
> page = __rmqueue_pcplist(zone, migratetype, alloc_flags, pcp, list);
> +#ifdef CONFIG_CMA
> + if (page) {
> + int mt = get_pcppage_migratetype(page);
> +
> + /*
> + * pcp could have the pages on CMA area and we need to skip it
> + * when !ALLOC_CMA. Free all pcplist and retry allocation.
> + */
> + if (is_migrate_cma(mt) && !(alloc_flags & ALLOC_CMA)) {
> + list_add(&page->lru, &pcp->lists[migratetype]);
> + pcp->count++;
> + free_pcppages_bulk(zone, pcp->count, pcp);
> + page = __rmqueue_pcplist(zone, migratetype, alloc_flags, pcp, list);
> + }
> + }
> +#endif
> if (page) {
> __count_zid_vm_events(PGALLOC, page_zonenum(page), 1);
> zone_statistics(preferred_zone, zone);

That's a bunch more code on a very hot path to serve an obscure feature
which has a single obscure callsite.

Can we instead put the burden on that callsite rather than upon
everyone? For (dumb) example, teach __gup_longterm_locked() to put the
page back if it's CMA and go get another one?


2020-08-25 06:17:25

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs

2020년 8월 25일 (화) 오후 2:10, Andrew Morton <[email protected]>님이 작성:
>
> On Tue, 25 Aug 2020 13:59:42 +0900 [email protected] wrote:
>
> > From: Joonsoo Kim <[email protected]>
> >
> > memalloc_nocma_{save/restore} APIs can be used to skip page allocation
> > on CMA area, but, there is a missing case and the page on CMA area could
> > be allocated even if APIs are used. This patch handles this case to fix
> > the potential issue.
> >
> > Missing case is an allocation from the pcplist. MIGRATE_MOVABLE pcplist
> > could have the pages on CMA area so we need to skip it if ALLOC_CMA isn't
> > specified.
> >
> > This patch implements this behaviour by checking allocated page from
> > the pcplist rather than skipping an allocation from the pcplist entirely.
> > Skipping the pcplist entirely would result in a mismatch between watermark
> > check and actual page allocation. And, it requires to break current code
> > layering that order-0 page is always handled by the pcplist. I'd prefer
> > to avoid it so this patch uses different way to skip CMA page allocation
> > from the pcplist.
> >
> > ...
> >
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3341,6 +3341,22 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
> > pcp = &this_cpu_ptr(zone->pageset)->pcp;
> > list = &pcp->lists[migratetype];
> > page = __rmqueue_pcplist(zone, migratetype, alloc_flags, pcp, list);
> > +#ifdef CONFIG_CMA
> > + if (page) {
> > + int mt = get_pcppage_migratetype(page);
> > +
> > + /*
> > + * pcp could have the pages on CMA area and we need to skip it
> > + * when !ALLOC_CMA. Free all pcplist and retry allocation.
> > + */
> > + if (is_migrate_cma(mt) && !(alloc_flags & ALLOC_CMA)) {
> > + list_add(&page->lru, &pcp->lists[migratetype]);
> > + pcp->count++;
> > + free_pcppages_bulk(zone, pcp->count, pcp);
> > + page = __rmqueue_pcplist(zone, migratetype, alloc_flags, pcp, list);
> > + }
> > + }
> > +#endif
> > if (page) {
> > __count_zid_vm_events(PGALLOC, page_zonenum(page), 1);
> > zone_statistics(preferred_zone, zone);
>
> That's a bunch more code on a very hot path to serve an obscure feature
> which has a single obscure callsite.
>
> Can we instead put the burden on that callsite rather than upon
> everyone? For (dumb) example, teach __gup_longterm_locked() to put the
> page back if it's CMA and go get another one?

Hmm... Unfortunately, it cannot ensure that we eventually get the non-CMA page.
I think that the only way to ensure it is to implement the
functionality here. We can
use 'unlikely' or 'static branch' to reduce the overhead for a really
rare case but
for now I have no idea how to completely remove the overhead.

Thanks.

2020-08-25 09:45:33

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs


On 8/25/20 6:59 AM, [email protected] wrote:
> From: Joonsoo Kim <[email protected]>
>
> memalloc_nocma_{save/restore} APIs can be used to skip page allocation
> on CMA area, but, there is a missing case and the page on CMA area could
> be allocated even if APIs are used. This patch handles this case to fix
> the potential issue.
>
> Missing case is an allocation from the pcplist. MIGRATE_MOVABLE pcplist
> could have the pages on CMA area so we need to skip it if ALLOC_CMA isn't
> specified.
>
> This patch implements this behaviour by checking allocated page from
> the pcplist rather than skipping an allocation from the pcplist entirely.
> Skipping the pcplist entirely would result in a mismatch between watermark
> check and actual page allocation.

Are you sure? I think a mismatch exists already. Pages can be on the pcplist but
they are not considered as free in the watermark check. So passing watermark
check means there should be also pages on free lists. So skipping pcplists would
be safe, no?

> And, it requires to break current code
> layering that order-0 page is always handled by the pcplist. I'd prefer
> to avoid it so this patch uses different way to skip CMA page allocation
> from the pcplist.

Well it would be much simpler and won't affect most of allocations. Better than
flushing pcplists IMHO.

Something like this?
----8<----
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0e2bab486fea..15787ffb1708 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3361,7 +3361,10 @@ struct page *rmqueue(struct zone *preferred_zone,
unsigned long flags;
struct page *page;

- if (likely(order == 0)) {
+ if (likely(order == 0) &&
+ (!IS_ENABLED(CONFIG_CMA) ||
+ migratetype != MIGRATE_MOVABLE ||
+ alloc_flags & ALLOC_CMA)) {
page = rmqueue_pcplist(preferred_zone, zone, gfp_flags,
migratetype, alloc_flags);
goto out;

2020-08-26 00:44:26

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs

On Tue, 25 Aug 2020 14:34:32 +0900 Joonsoo Kim <[email protected]> wrote:

> >
> > That's a bunch more code on a very hot path to serve an obscure feature
> > which has a single obscure callsite.
> >
> > Can we instead put the burden on that callsite rather than upon
> > everyone? For (dumb) example, teach __gup_longterm_locked() to put the
> > page back if it's CMA and go get another one?
>
> Hmm... Unfortunately, it cannot ensure that we eventually get the non-CMA page.
> I think that the only way to ensure it is to implement the
> functionality here. We can
> use 'unlikely' or 'static branch' to reduce the overhead for a really
> rare case but
> for now I have no idea how to completely remove the overhead.

Gee, there must be something? Provide the gup code with a special
entry point which takes the page straight from __rmqueue() and bypasses
the pcp lists?

2020-08-26 05:16:00

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs

2020년 8월 25일 (화) 오후 6:43, Vlastimil Babka <[email protected]>님이 작성:
>
>
> On 8/25/20 6:59 AM, [email protected] wrote:
> > From: Joonsoo Kim <[email protected]>
> >
> > memalloc_nocma_{save/restore} APIs can be used to skip page allocation
> > on CMA area, but, there is a missing case and the page on CMA area could
> > be allocated even if APIs are used. This patch handles this case to fix
> > the potential issue.
> >
> > Missing case is an allocation from the pcplist. MIGRATE_MOVABLE pcplist
> > could have the pages on CMA area so we need to skip it if ALLOC_CMA isn't
> > specified.
> >
> > This patch implements this behaviour by checking allocated page from
> > the pcplist rather than skipping an allocation from the pcplist entirely.
> > Skipping the pcplist entirely would result in a mismatch between watermark
> > check and actual page allocation.
>
> Are you sure? I think a mismatch exists already. Pages can be on the pcplist but
> they are not considered as free in the watermark check. So passing watermark
> check means there should be also pages on free lists. So skipping pcplists would
> be safe, no?

You are right.

> > And, it requires to break current code
> > layering that order-0 page is always handled by the pcplist. I'd prefer
> > to avoid it so this patch uses different way to skip CMA page allocation
> > from the pcplist.
>
> Well it would be much simpler and won't affect most of allocations. Better than
> flushing pcplists IMHO.

Hmm...Still, I'd prefer my approach. There are two reasons. First,
layering problem
mentioned above. In rmqueue(), there is a code for MIGRATE_HIGHATOMIC.
As the name shows, it's for high order atomic allocation. But, after
skipping pcplist
allocation as you suggested, we could get there with order 0 request.
We can also
change this code, but, I'd hope to maintain current layering. Second,
a performance
reason. After the flag for nocma is up, a burst of nocma allocation
could come. After
flushing the pcplist one times, we can use the free page on the
pcplist as usual until
the context is changed.

How about my reasoning?

Thanks.

2020-08-26 05:24:37

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs

2020년 8월 26일 (수) 오전 9:42, Andrew Morton <[email protected]>님이 작성:
>
> On Tue, 25 Aug 2020 14:34:32 +0900 Joonsoo Kim <[email protected]> wrote:
>
> > >
> > > That's a bunch more code on a very hot path to serve an obscure feature
> > > which has a single obscure callsite.
> > >
> > > Can we instead put the burden on that callsite rather than upon
> > > everyone? For (dumb) example, teach __gup_longterm_locked() to put the
> > > page back if it's CMA and go get another one?
> >
> > Hmm... Unfortunately, it cannot ensure that we eventually get the non-CMA page.
> > I think that the only way to ensure it is to implement the
> > functionality here. We can
> > use 'unlikely' or 'static branch' to reduce the overhead for a really
> > rare case but
> > for now I have no idea how to completely remove the overhead.
>
> Gee, there must be something? Provide the gup code with a special
> entry point which takes the page straight from __rmqueue() and bypasses
> the pcp lists?

Hmm... it seems not possible. It's allocation context API and maybe actual
allocation happens in handle_mm_fault() or it's successor. We cannot use
a special entry point for allocation there since it's a general function.

And, IMHO, making a special allocation function that bypasses the pcp list
would not be a good practice.

Thanks.

2020-08-27 12:48:47

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs

On 8/26/20 7:12 AM, Joonsoo Kim wrote:
> 2020년 8월 25일 (화) 오후 6:43, Vlastimil Babka <[email protected]>님이 작성:
>>
>>
>> On 8/25/20 6:59 AM, [email protected] wrote:
>> > From: Joonsoo Kim <[email protected]>
>> >
>> > memalloc_nocma_{save/restore} APIs can be used to skip page allocation
>> > on CMA area, but, there is a missing case and the page on CMA area could
>> > be allocated even if APIs are used. This patch handles this case to fix
>> > the potential issue.
>> >
>> > Missing case is an allocation from the pcplist. MIGRATE_MOVABLE pcplist
>> > could have the pages on CMA area so we need to skip it if ALLOC_CMA isn't
>> > specified.
>> >
>> > This patch implements this behaviour by checking allocated page from
>> > the pcplist rather than skipping an allocation from the pcplist entirely.
>> > Skipping the pcplist entirely would result in a mismatch between watermark
>> > check and actual page allocation.
>>
>> Are you sure? I think a mismatch exists already. Pages can be on the pcplist but
>> they are not considered as free in the watermark check. So passing watermark
>> check means there should be also pages on free lists. So skipping pcplists would
>> be safe, no?
>
> You are right.
>
>> > And, it requires to break current code
>> > layering that order-0 page is always handled by the pcplist. I'd prefer
>> > to avoid it so this patch uses different way to skip CMA page allocation
>> > from the pcplist.
>>
>> Well it would be much simpler and won't affect most of allocations. Better than
>> flushing pcplists IMHO.
>
> Hmm...Still, I'd prefer my approach. There are two reasons. First,
> layering problem
> mentioned above. In rmqueue(), there is a code for MIGRATE_HIGHATOMIC.
> As the name shows, it's for high order atomic allocation. But, after
> skipping pcplist
> allocation as you suggested, we could get there with order 0 request.
> We can also
> change this code, but, I'd hope to maintain current layering. Second,
> a performance
> reason. After the flag for nocma is up, a burst of nocma allocation
> could come. After
> flushing the pcplist one times, we can use the free page on the
> pcplist as usual until
> the context is changed.

Both solutions are ugly and we should have CMA in ZONE_MOVABLE or get rid of it
completely. Let's CC Mel what he thinks.

> How about my reasoning?
>
> Thanks.
>

2020-08-27 14:02:15

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs

On Wed, Aug 26, 2020 at 02:12:44PM +0900, Joonsoo Kim wrote:
> > > And, it requires to break current code
> > > layering that order-0 page is always handled by the pcplist. I'd prefer
> > > to avoid it so this patch uses different way to skip CMA page allocation
> > > from the pcplist.
> >
> > Well it would be much simpler and won't affect most of allocations. Better than
> > flushing pcplists IMHO.
>
> Hmm...Still, I'd prefer my approach.

I prefer the pcp bypass approach. It's simpler and it does not incur a
pcp drain/refill penalty.

> There are two reasons. First,
> layering problem
> mentioned above. In rmqueue(), there is a code for MIGRATE_HIGHATOMIC.
> As the name shows, it's for high order atomic allocation. But, after
> skipping pcplist
> allocation as you suggested, we could get there with order 0 request.

I guess your concern is that under some circumstances that a request that
passes a watermark check could fail due to a highatomic reserve and to
an extent this is true. However, in that case the system is already low
on memory depending on the allocation context, the pcp lists may get
flushed anyway.

> We can also
> change this code, but, I'd hope to maintain current layering. Second,
> a performance
> reason. After the flag for nocma is up, a burst of nocma allocation
> could come. After
> flushing the pcplist one times, we can use the free page on the
> pcplist as usual until
> the context is changed.

It's not guaranteed because CMA pages could be freed between the nocma save
and restore triggering further drains due to a reschedule. Similarly,
a CMA allocation in parallel could refill with CMA pages on the per-cpu
list. While both cases are unlikely, it's more unpredictable than a
straight-forward pcp bypass.

I don't really see it as a layering violation of the API because all
order-0 pages go through the PCP lists. The fact that order-0 is serviced
from the pcp list is an internal implementation detail, the API doesn't
care.

--
Mel Gorman
SUSE Labs

2020-08-27 23:59:05

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs

2020년 8월 27일 (목) 오후 10:35, Mel Gorman <[email protected]>님이 작성:
>
> On Wed, Aug 26, 2020 at 02:12:44PM +0900, Joonsoo Kim wrote:
> > > > And, it requires to break current code
> > > > layering that order-0 page is always handled by the pcplist. I'd prefer
> > > > to avoid it so this patch uses different way to skip CMA page allocation
> > > > from the pcplist.
> > >
> > > Well it would be much simpler and won't affect most of allocations. Better than
> > > flushing pcplists IMHO.
> >
> > Hmm...Still, I'd prefer my approach.
>
> I prefer the pcp bypass approach. It's simpler and it does not incur a
> pcp drain/refill penalty.
>
> > There are two reasons. First,
> > layering problem
> > mentioned above. In rmqueue(), there is a code for MIGRATE_HIGHATOMIC.
> > As the name shows, it's for high order atomic allocation. But, after
> > skipping pcplist
> > allocation as you suggested, we could get there with order 0 request.
>
> I guess your concern is that under some circumstances that a request that
> passes a watermark check could fail due to a highatomic reserve and to
> an extent this is true. However, in that case the system is already low
> on memory depending on the allocation context, the pcp lists may get
> flushed anyway.

My concern is that non-highorder (order-0) allocation could pollute/use the
MIGRATE_HIGHATOMIC pageblock. It's reserved for highorder atomic
allocation so it's not good if an order-0 request could get there. It would
cause more fragmentation on that pageblock.

> > We can also
> > change this code, but, I'd hope to maintain current layering. Second,
> > a performance
> > reason. After the flag for nocma is up, a burst of nocma allocation
> > could come. After
> > flushing the pcplist one times, we can use the free page on the
> > pcplist as usual until
> > the context is changed.
>
> It's not guaranteed because CMA pages could be freed between the nocma save
> and restore triggering further drains due to a reschedule. Similarly,
> a CMA allocation in parallel could refill with CMA pages on the per-cpu
> list. While both cases are unlikely, it's more unpredictable than a
> straight-forward pcp bypass.

Agreed that it's unpredictable than the pcp bypass. But, as you said,
those cases
would be rare.

> I don't really see it as a layering violation of the API because all
> order-0 pages go through the PCP lists. The fact that order-0 is serviced
> from the pcp list is an internal implementation detail, the API doesn't
> care.

What I mean is an internal implementation layering violation. We could make
a rule even in internal implementation to make code simpler and maintainable.
I guess that order-0 is serviced from the pcp list is one of those.

Anyway, although I prefer my approach, I'm okay with using pcp bypass.

Thanks.

2020-09-25 05:01:59

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs

2020년 8월 28일 (금) 오전 8:54, Joonsoo Kim <[email protected]>님이 작성:
>
> 2020년 8월 27일 (목) 오후 10:35, Mel Gorman <[email protected]>님이 작성:
> >
> > On Wed, Aug 26, 2020 at 02:12:44PM +0900, Joonsoo Kim wrote:
> > > > > And, it requires to break current code
> > > > > layering that order-0 page is always handled by the pcplist. I'd prefer
> > > > > to avoid it so this patch uses different way to skip CMA page allocation
> > > > > from the pcplist.
> > > >
> > > > Well it would be much simpler and won't affect most of allocations. Better than
> > > > flushing pcplists IMHO.
> > >
> > > Hmm...Still, I'd prefer my approach.
> >
> > I prefer the pcp bypass approach. It's simpler and it does not incur a
> > pcp drain/refill penalty.
> >
> > > There are two reasons. First,
> > > layering problem
> > > mentioned above. In rmqueue(), there is a code for MIGRATE_HIGHATOMIC.
> > > As the name shows, it's for high order atomic allocation. But, after
> > > skipping pcplist
> > > allocation as you suggested, we could get there with order 0 request.
> >
> > I guess your concern is that under some circumstances that a request that
> > passes a watermark check could fail due to a highatomic reserve and to
> > an extent this is true. However, in that case the system is already low
> > on memory depending on the allocation context, the pcp lists may get
> > flushed anyway.
>
> My concern is that non-highorder (order-0) allocation could pollute/use the
> MIGRATE_HIGHATOMIC pageblock. It's reserved for highorder atomic
> allocation so it's not good if an order-0 request could get there. It would
> cause more fragmentation on that pageblock.
>
> > > We can also
> > > change this code, but, I'd hope to maintain current layering. Second,
> > > a performance
> > > reason. After the flag for nocma is up, a burst of nocma allocation
> > > could come. After
> > > flushing the pcplist one times, we can use the free page on the
> > > pcplist as usual until
> > > the context is changed.
> >
> > It's not guaranteed because CMA pages could be freed between the nocma save
> > and restore triggering further drains due to a reschedule. Similarly,
> > a CMA allocation in parallel could refill with CMA pages on the per-cpu
> > list. While both cases are unlikely, it's more unpredictable than a
> > straight-forward pcp bypass.
>
> Agreed that it's unpredictable than the pcp bypass. But, as you said,
> those cases
> would be rare.
>
> > I don't really see it as a layering violation of the API because all
> > order-0 pages go through the PCP lists. The fact that order-0 is serviced
> > from the pcp list is an internal implementation detail, the API doesn't
> > care.
>
> What I mean is an internal implementation layering violation. We could make
> a rule even in internal implementation to make code simpler and maintainable.
> I guess that order-0 is serviced from the pcp list is one of those.
>
> Anyway, although I prefer my approach, I'm okay with using pcp bypass.

Hello, Andrew and Vlastimil.

It's better to fix this possible bug introduced in v5.9-rc1 before
v5.9 is released.
Which approach do you prefer?
If it is determined, I will immediately send a patch as you suggested.

Thanks.

2020-09-25 08:57:01

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs

On 9/25/20 6:59 AM, Joonsoo Kim wrote:
> 2020년 8월 28일 (금) 오전 8:54, Joonsoo Kim <[email protected]>님이 작성:
>
> Hello, Andrew and Vlastimil.
>
> It's better to fix this possible bug introduced in v5.9-rc1 before
> v5.9 is released.
> Which approach do you prefer?
> If it is determined, I will immediately send a patch as you suggested.

Hmm both Mel and I preferred the bypass approach and nobody else weighted in, so
if you don't mind, you can use my suggestion. Hmm maybe alloc_flags & ALLOC_CMA
check should precede migratetype check in the if () to optimize for userspace
allocations?

Thanks,
Vlastimil

> Thanks.
>

2020-09-25 08:59:54

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs

2020년 9월 25일 (금) 오후 5:55, Vlastimil Babka <[email protected]>님이 작성:
>
> On 9/25/20 6:59 AM, Joonsoo Kim wrote:
> > 2020년 8월 28일 (금) 오전 8:54, Joonsoo Kim <[email protected]>님이 작성:
> >
> > Hello, Andrew and Vlastimil.
> >
> > It's better to fix this possible bug introduced in v5.9-rc1 before
> > v5.9 is released.
> > Which approach do you prefer?
> > If it is determined, I will immediately send a patch as you suggested.
>
> Hmm both Mel and I preferred the bypass approach and nobody else weighted in, so
> if you don't mind, you can use my suggestion. Hmm maybe alloc_flags & ALLOC_CMA
> check should precede migratetype check in the if () to optimize for userspace
> allocations?

Okay!
I will implement a bypass approach and send it early next week.

Thanks.