LinuxLists.cc - [PATCH] mm,page_alloc,cma: configurable CMA utilization

2023-01-31 07:13:18

Subject: [PATCH] mm,page_alloc,cma: configurable CMA utilization

Commit 16867664936e ("mm,page_alloc,cma: conditionally prefer cma pageblocks for movable allocations")
added support to use CMA pages when more than 50% of total free pages in
the zone are free CMA pages.

However, with multiplatform kernels a single binary is used across different
targets of varying memory sizes. A low memory target using one such kernel
would incur allocation failures even when sufficient memory is available in
the CMA region. On these targets we would want to utilize a higher percentage
of the CMA region and reduce the allocation failures, even if it means that a
subsequent cma_alloc() would take longer.

Make the percentage of CMA utilization a configurable parameter to allow
for such use cases.

Signed-off-by: Sukadev Bhattiprolu <[email protected]>
---
Note: It was mentioned earlier that making this percentage configurable
should be a last resort (https://lkml.org/lkml/2020/3/12/751). But
as explained above, multi-platform kernels for targets of varying memory
sizes would need this to be configurable.
---
include/linux/mm.h | 1 +
kernel/sysctl.c | 8 ++++++++
mm/page_alloc.c | 18 +++++++++++++++---
mm/util.c | 2 ++
4 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8f857163ac89..e4e5d508e9eb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -203,6 +203,7 @@ extern unsigned long sysctl_admin_reserve_kbytes;

extern int sysctl_overcommit_memory;
extern int sysctl_overcommit_ratio;
+extern int sysctl_cma_utilization_ratio;
extern unsigned long sysctl_overcommit_kbytes;

int overcommit_ratio_handler(struct ctl_table *, int, void *, size_t *,
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 137d4abe3eda..2dce6a908aa6 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2445,6 +2445,14 @@ static struct ctl_table vm_table[] = {
 		.extra2		= SYSCTL_ONE,
 	},
 #endif
+	{
+		.procname	= "cma_utilization_ratio",
+		.data		= &sysctl_cma_utilization_ratio,
+		.maxlen		= sizeof(sysctl_cma_utilization_ratio),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ONE,
+	},
 	{ }
 };

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0745aedebb37..b72db3824687 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3071,6 +3071,20 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,

}

+static __always_inline bool zone_can_use_cma_pages(struct zone *zone)
+{
+	unsigned long cma_free_pages;
+	unsigned long zone_free_pages;
+
+	cma_free_pages = zone_page_state(zone, NR_FREE_CMA_PAGES);
+	zone_free_pages = zone_page_state(zone, NR_FREE_PAGES);
+
+	if (cma_free_pages > zone_free_pages / sysctl_cma_utilization_ratio)
+		return true;
+
+	return false;
+}
+
 /*
  * Do the hard work of removing an element from the buddy allocator.
  * Call me with the zone->lock already held.
@@ -3087,9 +3101,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
 		 * allocating from CMA when over half of the zone's free memory
 		 * is in the CMA area.
 		 */
-		if (alloc_flags & ALLOC_CMA &&
-		    zone_page_state(zone, NR_FREE_CMA_PAGES) >
-		    zone_page_state(zone, NR_FREE_PAGES) / 2) {
+		if (alloc_flags & ALLOC_CMA && zone_can_use_cma_pages(zone)) {
 			page = __rmqueue_cma_fallback(zone, order);
 			if (page)
 				return page;
diff --git a/mm/util.c b/mm/util.c
index b56c92fb910f..4de81f04b249 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -781,6 +781,8 @@ void folio_copy(struct folio *dst, struct folio *src)
}

int sysctl_overcommit_memory __read_mostly = OVERCOMMIT_GUESS;
+
+int sysctl_cma_utilization_ratio __read_mostly = 2;
int sysctl_overcommit_ratio __read_mostly = 50;
unsigned long sysctl_overcommit_kbytes __read_mostly;
int sysctl_max_map_count __read_mostly = DEFAULT_MAX_MAP_COUNT;
--
2.17.1
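
For readers following the thread, the check added to mm/page_alloc.c can be modelled in user space. This is only a sketch, not kernel code: can_use_cma() and its plain integer arguments are hypothetical stand-ins for zone_can_use_cma_pages() and the zone_page_state() counters, and it assumes the ratio is at least 1, which the .extra1 = SYSCTL_ONE lower bound in the sysctl entry appears intended to guarantee.

```c
#include <stdbool.h>

/*
 * User-space model of the patch's check (hypothetical helper, not kernel
 * code). free_cma and free_total stand in for the NR_FREE_CMA_PAGES and
 * NR_FREE_PAGES counters; ratio stands in for sysctl_cma_utilization_ratio
 * and is assumed to be >= 1, so the division cannot trap.
 */
static bool can_use_cma(unsigned long free_cma, unsigned long free_total,
			int ratio)
{
	return free_cma > free_total / (unsigned long)ratio;
}
```

With 1000 free pages of which 300 are free CMA pages, the default ratio of 2 gives a threshold of 500, so CMA is not preferred; a ratio of 4 lowers the threshold to 250 and CMA is preferred. Assuming free CMA pages are counted within the zone's free pages, a ratio of 1 can never satisfy the check.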

2023-01-31 07:23:37

by Anshuman Khandual

Subject: Re: [PATCH] mm,page_alloc,cma: configurable CMA utilization

On 1/31/23 12:40, Sukadev Bhattiprolu wrote:
>
> Commit 16867664936e ("mm,page_alloc,cma: conditionally prefer cma pageblocks for movable allocations")
> added support to use CMA pages when more than 50% of total free pages in
> the zone are free CMA pages.
>
> However, with multiplatform kernels a single binary is used across different
> targets of varying memory sizes. A low memory target using one such kernel
> would incur allocation failures even when sufficient memory is available in
> the CMA region. On these targets we would want to utilize a higher percentage
> of the CMA region and reduce the allocation failures, even if it means that a
> subsequent cma_alloc() would take longer.
>
> Make the percentage of CMA utilization a configurable parameter to allow
> for such usecases.
>
> Signed-off-by: Sukadev Bhattiprolu <[email protected]>
> ---
> Note: There was a mention about it being the last resort to making this
> percentage configurable (https://lkml.org/lkml/2020/3/12/751). But
> as explained above, multi-platform kernels for varying memory size
> targets would need this to be configurable.
> ---
> include/linux/mm.h | 1 +
> kernel/sysctl.c | 8 ++++++++
> mm/page_alloc.c | 18 +++++++++++++++---
> mm/util.c | 2 ++
> 4 files changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 8f857163ac89..e4e5d508e9eb 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -203,6 +203,7 @@ extern unsigned long sysctl_admin_reserve_kbytes;
>
> extern int sysctl_overcommit_memory;
> extern int sysctl_overcommit_ratio;
> +extern int sysctl_cma_utilization_ratio;
> extern unsigned long sysctl_overcommit_kbytes;
>
> int overcommit_ratio_handler(struct ctl_table *, int, void *, size_t *,
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 137d4abe3eda..2dce6a908aa6 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -2445,6 +2445,14 @@ static struct ctl_table vm_table[] = {
> .extra2 = SYSCTL_ONE,
> },
> #endif
> + {
> + .procname = "cma_utilization_ratio",
> + .data = &sysctl_cma_utilization_ratio,
> + .maxlen = sizeof(sysctl_cma_utilization_ratio),
> + .mode = 0644,
> + .proc_handler = proc_dointvec_minmax,
> + .extra1 = SYSCTL_ONE,
> + },
> { }
> };
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0745aedebb37..b72db3824687 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3071,6 +3071,20 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
>
> }
>
> +static __always_inline bool zone_can_use_cma_pages(struct zone *zone)
> +{
> + unsigned long cma_free_pages;
> + unsigned long zone_free_pages;
> +
> + cma_free_pages = zone_page_state(zone, NR_FREE_CMA_PAGES);
> + zone_free_pages = zone_page_state(zone, NR_FREE_PAGES);
> +
> + if (cma_free_pages > zone_free_pages / sysctl_cma_utilization_ratio)
> + return true;
> +
> + return false;
> +}
> +
> /*
> * Do the hard work of removing an element from the buddy allocator.
> * Call me with the zone->lock already held.
> @@ -3087,9 +3101,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
> * allocating from CMA when over half of the zone's free memory
> * is in the CMA area.
> */
> - if (alloc_flags & ALLOC_CMA &&
> - zone_page_state(zone, NR_FREE_CMA_PAGES) >
> - zone_page_state(zone, NR_FREE_PAGES) / 2) {
> + if (alloc_flags & ALLOC_CMA && zone_can_use_cma_pages(zone)) {
> page = __rmqueue_cma_fallback(zone, order);
> if (page)
> return page;
> diff --git a/mm/util.c b/mm/util.c
> index b56c92fb910f..4de81f04b249 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -781,6 +781,8 @@ void folio_copy(struct folio *dst, struct folio *src)
> }
>
> int sysctl_overcommit_memory __read_mostly = OVERCOMMIT_GUESS;
> +
> +int sysctl_cma_utilization_ratio __read_mostly = 2;

Make '2' here a macro, e.g. CMA_UTILIZATION_DEFAULT? Also, this might be a good
opportunity to add a comment explaining why the default value is '2', i.e. 50%.

> int sysctl_overcommit_ratio __read_mostly = 50;
> unsigned long sysctl_overcommit_kbytes __read_mostly;
> int sysctl_max_map_count __read_mostly = DEFAULT_MAX_MAP_COUNT;
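
A minimal sketch of what this review suggestion could look like, written so that it also compiles outside the kernel. CMA_UTILIZATION_DEFAULT is the name proposed above, not an existing kernel macro, and the __read_mostly annotation from the patch is left out here because it is kernel-specific.

```c
/*
 * Hypothetical macro following the review suggestion. The sysctl value is
 * used as a divisor, so 2 keeps the existing behaviour: CMA is preferred
 * for movable allocations only when more than 1/2 (50%) of the zone's
 * free pages are free CMA pages.
 */
#define CMA_UTILIZATION_DEFAULT 2

/* In the kernel this definition would also carry __read_mostly. */
int sysctl_cma_utilization_ratio = CMA_UTILIZATION_DEFAULT;
```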

2023-01-31 14:26:46

by Georgi Djakov

Subject: Re: [PATCH] mm,page_alloc,cma: configurable CMA utilization

Hi Sukadev,

On 31.01.23 9:10, Sukadev Bhattiprolu wrote:
>
> Commit 16867664936e ("mm,page_alloc,cma: conditionally prefer cma pageblocks for movable allocations")
> added support to use CMA pages when more than 50% of total free pages in
> the zone are free CMA pages.
>
> However, with multiplatform kernels a single binary is used across different
> targets of varying memory sizes. A low memory target using one such kernel
> would incur allocation failures even when sufficient memory is available in
> the CMA region. On these targets we would want to utilize a higher percentage
> of the CMA region and reduce the allocation failures, even if it means that a
> subsequent cma_alloc() would take longer.
>
> Make the percentage of CMA utilization a configurable parameter to allow
> for such usecases.

The above makes sense to me. But it also needs to be documented like the other
sysctl files in Documentation/admin-guide/sysctl/vm.rst

Thanks,
Georgi

>
> Signed-off-by: Sukadev Bhattiprolu <[email protected]>
> ---
> Note: There was a mention about it being the last resort to making this
> percentage configurable (https://lkml.org/lkml/2020/3/12/751). But
> as explained above, multi-platform kernels for varying memory size
> targets would need this to be configurable.
> ---
> include/linux/mm.h | 1 +
> kernel/sysctl.c | 8 ++++++++
> mm/page_alloc.c | 18 +++++++++++++++---
> mm/util.c | 2 ++
> 4 files changed, 26 insertions(+), 3 deletions(-)
>

2023-01-31 18:19:38

by Roman Gushchin

Subject: Re: [PATCH] mm,page_alloc,cma: configurable CMA utilization

On Mon, Jan 30, 2023 at 11:10:52PM -0800, Sukadev Bhattiprolu wrote:
>
> Commit 16867664936e ("mm,page_alloc,cma: conditionally prefer cma pageblocks for movable allocations")
> added support to use CMA pages when more than 50% of total free pages in
> the zone are free CMA pages.
>
> However, with multiplatform kernels a single binary is used across different
> targets of varying memory sizes. A low memory target using one such kernel
> would incur allocation failures even when sufficient memory is available in
> the CMA region. On these targets we would want to utilize a higher percentage
> of the CMA region and reduce the allocation failures, even if it means that a
> subsequent cma_alloc() would take longer.
>
> Make the percentage of CMA utilization a configurable parameter to allow
> for such usecases.
>
> Signed-off-by: Sukadev Bhattiprolu <[email protected]>
> ---
> Note: There was a mention about it being the last resort to making this
> percentage configurable (https://lkml.org/lkml/2020/3/12/751). But
> as explained above, multi-platform kernels for varying memory size
> targets would need this to be configurable.

Hi Sukadev!

Could you please share a bit more detail about your setup? E.g. what are
the zone size, the CMA area size, and the value you want to set your sysctl to?

Roman

2023-01-31 20:10:09

by Sukadev Bhattiprolu

Subject: Re: [PATCH] mm,page_alloc,cma: configurable CMA utilization

On Tue, Jan 31, 2023 at 10:10:40AM -0800, Roman Gushchin wrote:
> Hi Sukadev!
>
> Can you, please, share a bit more details about your setup? E.g. what is
> the zone size, the cma area size and the value you want to set your sysctl to?

Hi Roman,

I currently have a device with 8GB of zone normal and 600MB of CMA. We have a
slightly different implementation and use up the entire available CMA region,
i.e. going forward, we intend to set the ratio to 100 or even higher.

We have other devices with 4GB or less memory, but I don't have the CMA
sizes at the moment. Let me know if you need more info.

Thanks!

Sukadev

2023-02-01 00:00:08

by Roman Gushchin

Subject: Re: [PATCH] mm,page_alloc,cma: configurable CMA utilization

On Tue, Jan 31, 2023 at 12:10:01PM -0800, Sukadev Bhattiprolu wrote:
> On Tue, Jan 31, 2023 at 10:10:40AM -0800, Roman Gushchin wrote:
> > Hi Sukadev!
> >
> > Can you, please, share a bit more details about your setup? E.g. what is
> > the zone size, the cma area size and the value you want to set your sysctl to?
>
> Hi Roman,
>
> I currently have a device with 8GB Zone normal and 600MB of CMA. We have a
> slightly different implementation and use up all the available CMA region.
> i.e. going forward, we intend to set the ratio to 100 or even higher.

Does that mean you want allocations to always be served from a CMA region first?
What's the point of it?

The idea behind the current formula is to keep CMA regions free if there is
plenty of other free memory, and otherwise to treat them on par with other memory.

To justify a new sysctl you really need a solid use case, one that is not limited
to your custom implementation.

Also, if we decide to go with a new sysctl, we probably want to define it differently,
e.g. as a [0-1000)/1000 of the zone size. But, honestly, I'm not sold yet.

Thanks!

2023-02-01 04:09:00

by Chris Goldsworthy

Subject: Re: [PATCH] mm,page_alloc,cma: configurable CMA utilization

On Tue, Jan 31, 2023 at 03:59:36PM -0800, Roman Gushchin wrote:
> On Tue, Jan 31, 2023 at 12:10:01PM -0800, Sukadev Bhattiprolu wrote:
> > On Tue, Jan 31, 2023 at 10:10:40AM -0800, Roman Gushchin wrote:
> > > Hi Sukadev!
> > >
> > > Can you, please, share a bit more details about your setup? E.g. what is
> > > the zone size, the cma area size and the value you want to set your sysctl to?
> >
> > Hi Roman,
> >
> > I currently have a device with 8GB Zone normal and 600MB of CMA. We have a
> > slightly different implementation and use up all the available CMA region.
> > i.e. going forward, we intend to set the ratio to 100 or even higher.

Hi Roman,

> It means you want allocations be always served from a cma region first?

Exactly.

> What's the point of it?

We're operating in a resource constrained environment, and we want to maximize
the amount of memory free / headroom for GFP_KERNEL allocations on our SoCs,
which are especially important for DMA allocations that use an IOMMU. We need a
large amount of CMA on our SoCs for various reasons (e.g. for devices not
upstream of an IOMMU), but whilst that CMA memory is not in use, we want to
route all GFP_MOVABLE allocations to the CMA regions, which will free up memory
for GFP_KERNEL allocations.

> The idea behind the current formula is to keep cma regions free if there is
> a plenty of other free memory, otherwise treat it on par with other memory.

With the current approach, if we have a large amount of movable memory allocated
that has not gone into the CMA regions yet, and a DMA use case starts that
causes the above condition to be met, we would head towards OOM conditions when
we otherwise could have delayed this with this change. Note that since we're
working on Android, there is a daemon built on top of PSI called LMKD that will
start killing things under memory pressure (before an OOM is actually reached)
in order to free up memory. This patch should then reduce kills accordingly for
a better user experience by keeping a larger set of background apps alive. When
a CMA allocation does occur and pages get migrated out, there is a similar
reduction in headroom (you probably already know this and know of the FB
equivalent made by Johannes Weiner).

Thanks,

Chris.

2023-02-01 19:00:43

by Roman Gushchin

Subject: Re: [PATCH] mm,page_alloc,cma: configurable CMA utilization

On Tue, Jan 31, 2023 at 08:06:28PM -0800, Chris Goldsworthy wrote:
> On Tue, Jan 31, 2023 at 03:59:36PM -0800, Roman Gushchin wrote:
> > On Tue, Jan 31, 2023 at 12:10:01PM -0800, Sukadev Bhattiprolu wrote:
> > > On Tue, Jan 31, 2023 at 10:10:40AM -0800, Roman Gushchin wrote:
> > > > Hi Sukadev!
> > > >
> > > > Can you, please, share a bit more details about your setup? E.g. what is
> > > > the zone size, the cma area size and the value you want to set your sysctl to?
> > >
> > > Hi Roman,
> > >
> > > I currently have a device with 8GB Zone normal and 600MB of CMA. We have a
> > > slightly different implementation and use up all the available CMA region.
> > > i.e. going forward, we intend to set the ratio to 100 or even higher.
>
>
> Hi Roman,
>
> > It means you want allocations be always served from a cma region first?
>
> Exactly.
>
> > What's the point of it?
>
> We're operating in a resource constrained environment, and we want to maximize
> the amount of memory free / headroom for GFP_KERNEL allocations on our SoCs,
> which are especially important for DMA allocations that use an IOMMU. We need a
> large amount of CMA on our SoCs for various reasons (e.g. for devices not
> upstream of an IOMMU), but whilst that CMA memory is not in use, we want to
> route all GFP_MOVABLE allocations to the CMA regions, which will free up memory
> for GFP_KERNEL allocations.
>
> > The idea behind the current formula is to keep cma regions free if there is
> > a plenty of other free memory, otherwise treat it on par with other memory.
>
> With the current approach, if we have a large amount of movable memory allocated
> that has not gone into the CMA regions yet, and a DMA use case starts that
> causes the above condition to be met, we would head towards OOM conditions when
> we otherwise could have delayed this with this change.
> Note that since we're
> working on Android, there is a daemon built on top of PSI called LMKD that will
> start killing things under memory pressure (before an OOM is actually reached)
> in order to free up memory. This patch should then reduce kills accordingly for
> a better user experience by keeping a larger set of background apps alive. When
> a CMA allocation does occur and pages get migrated out, there is a similar
> reduction in headroom (you probably already know this and know of the FB
> equivalent made by Johannes Weiner).

I see... Thank you for the explanation!

So the problem is that movable allocations are spread evenly between CMA and
non-CMA memory, so that non-movable allocations might fail. And the idea is to use
the CMA area more actively for movable allocations to keep headroom for
non-movable allocations. Is that correct?

Then _maybe_ a new knob is justified, at least I don't have better ideas.
Rik, do you have any input here?

Let's then define it in a more generic way and _maybe_ move to the cma
sysfs/debugfs (not 100% sure about this part, but probably worth exploring).

Thanks!

2023-02-01 23:48:06

by Minchan Kim

Subject: Re: [PATCH] mm,page_alloc,cma: configurable CMA utilization

Hi Chris,

On Tue, Jan 31, 2023 at 08:06:28PM -0800, Chris Goldsworthy wrote:
> On Tue, Jan 31, 2023 at 03:59:36PM -0800, Roman Gushchin wrote:
> > On Tue, Jan 31, 2023 at 12:10:01PM -0800, Sukadev Bhattiprolu wrote:
> > > On Tue, Jan 31, 2023 at 10:10:40AM -0800, Roman Gushchin wrote:
> > > > Hi Sukadev!
> > > >
> > > > Can you, please, share a bit more details about your setup? E.g. what is
> > > > the zone size, the cma area size and the value you want to set your sysctl to?
> > >
> > > Hi Roman,
> > >
> > > I currently have a device with 8GB Zone normal and 600MB of CMA. We have a
> > > slightly different implementation and use up all the available CMA region.
> > > i.e. going forward, we intend to set the ratio to 100 or even higher.
>
>
> Hi Roman,
>
> > It means you want allocations be always served from a cma region first?
>
> Exactly.
>
> > What's the point of it?
>
> We're operating in a resource constrained environment, and we want to maximize
> the amount of memory free / headroom for GFP_KERNEL allocations on our SoCs,
> which are especially important for DMA allocations that use an IOMMU. We need a
> large amount of CMA on our SoCs for various reasons (e.g. for devices not
> upstream of an IOMMU), but whilst that CMA memory is not in use, we want to
> route all GFP_MOVABLE allocations to the CMA regions, which will free up memory
> for GFP_KERNEL allocations.

I like this patch for a different reason, but for the specific problem you
mentioned, how about making the reclaimer/compaction aware of the problem?

IOW, when a GFP_KERNEL/DMA allocation happens but there is not enough memory
in the zones, let's migrate movable pages in those zones into the CMA
area/movable zone if there is plenty of free memory there.

I guess you have considered this, but did you observe any problems?

>
> > The idea behind the current formula is to keep cma regions free if there is
> > a plenty of other free memory, otherwise treat it on par with other memory.
>
> With the current approach, if we have a large amount of movable memory allocated
> that has not gone into the CMA regions yet, and a DMA use case starts that
> causes the above condition to be met, we would head towards OOM conditions when
> we otherwise could have delayed this with this change. Note that since we're
> working on Android, there is a daemon built on top of PSI called LMKD that will
> start killing things under memory pressure (before an OOM is actually reached)
> in order to free up memory. This patch should then reduce kills accordingly for
> a better user experience by keeping a larger set of background apps alive. When
> a CMA allocation does occur and pages get migrated out, there is a similar
> reduction in headroom (you probably already know this and know of the FB
> equivalent made by Johannes Weiner).
>
> Thanks,
>
> Chris.

2023-02-02 20:15:31

by Sukadev Bhattiprolu

Subject: Re: [PATCH] mm,page_alloc,cma: configurable CMA utilization

Hi Roman,

On Wed, Feb 01, 2023 at 11:00:25AM -0800, Roman Gushchin wrote:
> Then _maybe_ a new knob is justified, at least I don't have better ideas.
> Rik, do you have any input here?
>
> Let's then define it in a more generic way and _maybe_ move to the cma
> sysfs/debugfs (not 100% sure about this part, but probably worth exploring).

We should be able to use a sysfs parameter too. Will try that out. But could
you elaborate on what a more generic way would be?

Also, regarding the following in the earlier message:

> Also, if decide to go with a new sysctl, we probably want to define it differently,
> e.g. as a [0-1000)/1000 of the zone size. But, honestly, I'm not sold yet.

Are you saying that the ratio should be limited to a 1000th of the zone size?
Or of the free pages in the zone? If the zone is large, a 1000th of it would
still be quite big?

Sukadev

2023-02-04 00:05:09

by Roman Gushchin

Subject: Re: [PATCH] mm,page_alloc,cma: configurable CMA utilization

On Thu, Feb 02, 2023 at 12:13:02PM -0800, Sukadev Bhattiprolu wrote:
>
> Hi Roman,
>
> On Wed, Feb 01, 2023 at 11:00:25AM -0800, Roman Gushchin wrote:
> > Then _maybe_ a new knob is justified, at least I don't have better ideas.
> > Rik, do you have any input here?
> >
> > Let's then define it in a more generic way and _maybe_ move to the cma
> > sysfs/debugfs (not 100% sure about this part, but probably worth exploring).
>
> We should be able to use a sysfs parameter too. Will try that out.

Thanks. The reason I think it might be preferable is that new sysctls are hated
by everyone and this use case looks very niche.

> But could you elaborate on what is more generic way?

I mean that in the proposed patch it's only possible to shift the ratio towards CMA:
a larger sysctl value will make allocations from the CMA area happen earlier.
The value 1 will turn it off completely.

It's probably better to define the new knob in % or 1/1000ths, with a default value
of 50% or 500. 0 will mean never use CMA from this path, 999 means always.
Something along these lines.

Thanks!
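
One possible reading of the knob described above, sketched as a user-space model. Everything here is an assumption made for illustration: the names can_use_cma_permille(), CMA_KNOB_SCALE and CMA_KNOB_DEFAULT are hypothetical, and the comparison is just one way to map 0 to "never", 500 to roughly the current "more than half" rule, and 999 to "almost always".

```c
#include <stdbool.h>

#define CMA_KNOB_SCALE   1000u	/* hypothetical: knob is in 1/1000ths */
#define CMA_KNOB_DEFAULT 500u	/* roughly today's "more than half" rule */

/*
 * User-space model (not kernel code). free_cma and free_total stand in
 * for the zone's free CMA pages and total free pages; knob is assumed to
 * lie in [0, 1000). Multiplying instead of dividing avoids both a
 * division by zero and rounding loss.
 */
static bool can_use_cma_permille(unsigned long long free_cma,
				 unsigned long long free_total,
				 unsigned int knob)
{
	return free_cma * CMA_KNOB_SCALE >
	       free_total * (CMA_KNOB_SCALE - knob);
}
```

Under this mapping a knob of 0 requires free CMA pages to exceed all free pages, which cannot happen if CMA pages are counted in the total, while 999 prefers CMA as soon as it holds more than a 1000th of the free pages.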

2023-02-06 05:24:59

by Chris Goldsworthy

Subject: Re: [PATCH] mm,page_alloc,cma: configurable CMA utilization

On Wed, Feb 01, 2023 at 03:47:58PM -0800, Minchan Kim wrote:
> Hi Chris,
>
> On Tue, Jan 31, 2023 at 08:06:28PM -0800, Chris Goldsworthy wrote:
> > We're operating in a resource constrained environment, and we want to maximize
> > the amount of memory free / headroom for GFP_KERNEL allocations on our SoCs,
> > which are especially important for DMA allocations that use an IOMMU. We need a
> > large amount of CMA on our SoCs for various reasons (e.g. for devices not
> > upstream of an IOMMU), but whilst that CMA memory is not in use, we want to
> > route all GFP_MOVABLE allocations to the CMA regions, which will free up memory
> > for GFP_KERNEL allocations.
>
> I like this patch for different reason but for the specific problem you
> mentioned, How about making reclaimer/compaction aware of the problem:
>
> IOW, when the GFP_KERNEL/DMA allocation happens but not enough memory
> in the zones, let's migrates movable pages in those zones into CMA
> area/movable zone if they are plenty of free memory.
>
> I guess you considered but did you observe some problems?

Hi Minchan,

This is not an approach we've considered. If you have a high-level idea of the
key parts of vmscan.c you'd need to touch to implement this, could you point me
to them?

I guess one drawback with this approach is that as soon as kswapd starts,
psi_memstall_enter() is called, which can eventually lead to LMKD running in
user space, which we want to minimize. One aim of what we're doing is to
delay the calling of psi_memstall_enter().

It would be beneficial though on top of our change: if someone called
cma_alloc() and migrated out of the CMA regions, changing kswapd to behave like
this would move things back into the CMA regions after cma_release() is called
(instead of having to kill a user space process to have the CMA re-utilized upon
further user space actions).

Thanks,

Chris.

2023-02-08 22:00:58

by Minchan Kim

Subject: Re: [PATCH] mm,page_alloc,cma: configurable CMA utilization

On Sun, Feb 05, 2023 at 09:22:28PM -0800, Chris Goldsworthy wrote:
> On Wed, Feb 01, 2023 at 03:47:58PM -0800, Minchan Kim wrote:
> > Hi Chris,
> >
> > On Tue, Jan 31, 2023 at 08:06:28PM -0800, Chris Goldsworthy wrote:
> > > We're operating in a resource constrained environment, and we want to maximize
> > > the amount of memory free / headroom for GFP_KERNEL allocations on our SoCs,
> > > which are especially important for DMA allocations that use an IOMMU. We need a
> > > large amount of CMA on our SoCs for various reasons (e.g. for devices not
> > > upstream of an IOMMU), but whilst that CMA memory is not in use, we want to
> > > route all GFP_MOVABLE allocations to the CMA regions, which will free up memory
> > > for GFP_KERNEL allocations.
> >
> > I like this patch for different reason but for the specific problem you
> > mentioned, How about making reclaimer/compaction aware of the problem:
> >
> > IOW, when the GFP_KERNEL/DMA allocation happens but not enough memory
> > in the zones, let's migrates movable pages in those zones into CMA
> > area/movable zone if they are plenty of free memory.
> >
> > I guess you considered but did you observe some problems?
>
> Hi Minchan,
>
> This is not an approach we've considered. If you have a high-level idea of the
> key parts of vmscan.c you'd need to touch to implement this, could you point me
> to them?

I think the problem is not specific to CMA but also applies to the movable zone.
If movable pages are charged to non-movable zones, the problem will
happen. So what I suggested was that if reclaimers (e.g., background/direct
reclaimers) find that the request was GFP_KERNEL and there are not enough
free pages in the zone and lower zones, but there are movable pages in them,
they migrate those pages into the CMA area and/or movable zones to make room for
the GFP_KERNEL allocation before the final failure.

It needs to touch wakeup_kswapd/kcompactd to trigger the migration, and
reclaim/compaction needs to deal with the command. I can't say
where the good places to change are until I look at further details, but
I thought it's a more general solution.

>
> I guess one drawback with this approach is that as soon as kswapd starts,
> psi_memstall_enter() is called, which can eventually lead to LMKD running in
> user space, which we want to minimize. One aim of what we're doing this is to
> delay the calling of psi_memstall_enter().

LMKD running would not be a problem, I think, but are you worried about
LMKD deciding to kill apps due to a wrong signal? I think that's an orthogonal
issue. Actually, it's a long-standing problem for userspace memory managers,
since they don't know where the memory pressure comes from or under what
constraints. Here it is the GFP_KERNEL constraint, but LMKD can kill apps
which consume a lot of memory in movable zones or the CMA area, which cannot
help the memory pressure. Furthermore, LMKD has a bunch of knobs that
affect the decision to kill apps. PSI is just an event to wake up LMKD,
not a decision policy.

>
> It would be beneficial though on top of our change: if someone called
> cma_alloc() and migrated out of the CMA regions, changing kswapd to behave like
> this would move things back into the CMA regions after cma_release() is called
> (instead of having to kill a user space process to have the CMA re-utilized upon
> further user space actions).
>
> Thanks,
>
> Chris.

2024-01-05 23:48:08

by Sukadev Bhattiprolu

Subject: Re: [PATCH] mm,page_alloc,cma: configurable CMA utilization

On 2/1/2023 3:47 PM, Minchan Kim wrote:
>
> I like this patch for different reason but for the specific problem you
> mentioned, How about making reclaimer/compaction aware of the problem:
>
> IOW, when the GFP_KERNEL/DMA allocation happens but not enough memory
> in the zones, let's migrates movable pages in those zones into CMA
> area/movable zone if they are plenty of free memory.

Hi Minchan,

Coming back to this thread after a while.

If the CMA region is usually free, allocating pages first in the non-CMA
region and then moving them into the CMA region would be extra work since
it would happen most of the time. In such cases, wouldn't it be better to
allocate from the CMA region itself?

Sukadev

2024-01-06 00:06:11

by Roman Gushchin

Subject: Re: [PATCH] mm,page_alloc,cma: configurable CMA utilization

On Fri, Jan 05, 2024 at 03:46:55PM -0800, Sukadev Bhattiprolu wrote:
>
> On 2/1/2023 3:47 PM, Minchan Kim wrote:
> >
> > I like this patch for different reason but for the specific problem you
> > mentioned, How about making reclaimer/compaction aware of the problem:
> >
> > IOW, when the GFP_KERNEL/DMA allocation happens but not enough memory
> > in the zones, let's migrates movable pages in those zones into CMA
> > area/movable zone if they are plenty of free memory.
>
> Hi Minchan,
>
> Coming back to this thread after a while.
>
> If the CMA region is usually free, allocating pages first in the non-CMA
> region and then moving them into the CMA region would be extra work since
> it would happen most of the time. In such cases, wouldn't it be better to
> allocate from the CMA region itself?

I'm not sure there is a "one size fits all" solution here. There are two
distinct cases:
1) A relatively small cma area used for a specific purpose. This is how cma
was used until recently. And it was barely used by the kernel for non-cma
allocations.
2) A relatively large cma area which is used to allocate gigantic hugepages
and as an anti-fragmentation mechanism in general (basically as a movable
zone). In this case it might be preferable to use cma for movable
allocations, because the space for non-movable allocations might be limited.

I see two options here:
1) introduce per-cma area flags which will define the usage policy
2) redesign the page allocator to better take care of fragmentation at 1Gb scale

The latter is obviously not a small endeavour.
The fundamentally missing piece is a notion of an anti-fragmentation cost,
e.g. how much work it makes sense to put into page migration
before "polluting" a new large block of memory with an unmovable folio.

Thanks!

2024-01-08 20:15:37

by Sukadev Bhattiprolu

Subject: Re: [PATCH] mm,page_alloc,cma: configurable CMA utilization

On 1/5/2024 4:05 PM, Roman Gushchin wrote:
> I'm not sure there is a "one size fits all" solution here.
agree - that's why we are thinking a configurable cma utilization would be
useful.
> There are two distinctive cases:
> 1) A relatively small cma area used for a specific purpose. This is how cma
> was used until recently. And it was barely used by the kernel for non-cma
> allocations.
> 2) A relatively large cma area which is used to allocate gigantic hugepages
> and as an anti-fragmentation mechanism in general (basically as a movable
> zone). In this case it might be preferable to use cma for movable
> allocations, because the space for non-movable allocations might be limited.
>
> I see two options here:
> 1) introduce per-cma area flags which will define the usage policy
Could you please elaborate on this - how would we use the per-cma flags
when allocating pages?
> 2) redesign the page allocator to better take care of fragmentation at 1GB scale
>
> The latter is obviously not a small endeavour.
> The fundamentally missing piece is a notion of an anti-fragmentation cost.
> E.g. how much work does it make sense to put into page migration
> before "polluting" a new large block of memory with an unmovable folio?

Stepping back, we are trying to solve for a situation where the system:
        - has a lot of movable allocs in zone normal
        - has a lot of idle memory in the CMA region
        - but is low on memory for unmovable allocs, leading to oom-kills

On devices where the cma region is mostly idle, wouldn't allocating movable
pages from the cma region have less overhead?

IIUC, this redesign for smarter migration would be in addition to, or in
parallel with, the configurable CMA utilization, right?

Thanks,

Sukadev

>
> Thanks!
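
For readers following the thread, the threshold under discussion can be sketched in standalone C. This is only an illustration and not the patch's code: `struct zone_counters`, `prefer_cma_for_movable()` and `threshold_pct` are hypothetical names, and the exact semantics of the proposed sysctl are an assumption. With `threshold_pct` set to 50 the check mirrors the behaviour described in the commit message (use CMA pages when more than 50% of the zone's free pages are free CMA pages). A lower value makes movable allocations use the CMA region sooner.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a zone's free-page counters; not a kernel type. */
struct zone_counters {
	unsigned long free_pages;     /* all free pages in the zone, CMA included */
	unsigned long free_cma_pages; /* free pages that sit in CMA pageblocks */
};

/*
 * Return true when a movable allocation should try CMA pageblocks first.
 * threshold_pct is an assumed tunable in the range 0..100: CMA is preferred
 * once free CMA pages exceed threshold_pct percent of all free pages.
 */
static bool prefer_cma_for_movable(const struct zone_counters *z,
				   unsigned int threshold_pct)
{
	assert(threshold_pct <= 100);

	/* Compare cross-multiplied values in 64 bits to avoid rounding. */
	return (unsigned long long)z->free_cma_pages * 100 >
	       (unsigned long long)z->free_pages * threshold_pct;
}
```

On a low-memory target, a zone with 1000 free pages of which 400 are in CMA would not prefer CMA at a threshold of 50, but would at a threshold of 25. That is the kind of per-target tuning the patch argues for.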

2024-01-09 02:59:54

by Roman Gushchin

Subject: Re: [PATCH] mm,page_alloc,cma: configurable CMA utilization

On Mon, Jan 08, 2024 at 12:15:05PM -0800, Sukadev Bhattiprolu wrote:
>
> On 1/5/2024 4:05 PM, Roman Gushchin wrote:
> > I'm not sure there is a "one size fits all" solution here.
> agree - that's why we are thinking a configurable cma utilization would be
> useful.
> > There are two distinctive cases:
> > 1) A relatively small cma area used for a specific purpose. This is how cma
> > was used until recently. And it was barely used by the kernel for non-cma
> > allocations.
> > 2) A relatively large cma area which is used to allocate gigantic hugepages
> > and as an anti-fragmentation mechanism in general (basically as a movable
> > zone). In this case it might be preferable to use cma for movable
> > allocations, because the space for non-movable allocations might be limited.
> >
> > I see two options here:
> > 1) introduce per-cma area flags which will define the usage policy
> Could you please elaborate on this - how would we use the per-cma flags
> when allocating pages?

I mean potentially we can add some per-cma area configuration options which will
define the "priority" of using the memory from this cma area.

> > 2) redesign the page allocator to better take care of fragmentation at 1GB scale
> >
> > The latter is obviously not a small endeavour.
> > The fundamentally missing piece is a notion of an anti-fragmentation cost.
> > E.g. how much work does it make sense to put into page migration
> > before "polluting" a new large block of memory with an unmovable folio?
>
> Stepping back, we are trying to solve for a situation where the system:
>         - has a lot of movable allocs in zone normal
>         - has a lot of idle memory in the CMA region
>         - but is low on memory for unmovable allocs, leading to oom-kills
>
> On devices where the cma region is mostly idle, wouldn't allocating movable
> pages from the cma region have less overhead?

It's not that easy: imagine booting up a small system with a cma area reserved
for some hardware-related operations. This is pretty much what cma was initially
designed for. How do we avoid filling the cma area up with page cache?

Thanks!
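
One way to picture the per-cma-area "priority" idea from this message is the standalone C sketch below. Everything in it is hypothetical: the thread only proposes the concept, and neither `enum cma_use_policy`, `struct cma_area_policy` nor `cma_area_usable_for_movable()` exists in the kernel. The three policies are an assumption about how the two cases described earlier (a small special-purpose area versus a large area acting like a movable zone) might be told apart.

```c
#include <stdbool.h>

/* Hypothetical per-area usage policies; these names are illustrative only. */
enum cma_use_policy {
	CMA_USE_RESERVED,  /* small device-specific area: keep movable pages out */
	CMA_USE_FALLBACK,  /* use only when regular free memory is running low */
	CMA_USE_PREFERRED, /* large area acting like a movable zone: try it first */
};

/* Hypothetical per-area configuration record. */
struct cma_area_policy {
	const char *name;
	enum cma_use_policy policy;
};

/*
 * Decide whether a movable allocation (page cache, anonymous memory) may be
 * placed in this area. regular_memory_low is an assumed input that says the
 * non-CMA free memory in the zone is under pressure.
 */
static bool cma_area_usable_for_movable(const struct cma_area_policy *area,
					bool regular_memory_low)
{
	switch (area->policy) {
	case CMA_USE_RESERVED:
		return false;
	case CMA_USE_FALLBACK:
		return regular_memory_low;
	case CMA_USE_PREFERRED:
		return true;
	}
	return false;
}
```

Under this sketch, an area reserved for hardware-related operations would be marked `CMA_USE_RESERVED` so the page cache never fills it, while an area set aside for gigantic hugepages could be marked `CMA_USE_PREFERRED`.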