2017-09-06 04:35:38

by Joonsoo Kim

[permalink] [raw]
Subject: [PATCH] mm/page_alloc: don't reserve ZONE_HIGHMEM for ZONE_MOVABLE request

From: Joonsoo Kim <[email protected]>

Freepage on ZONE_HIGHMEM doesn't work for kernel memory so it's not that
important to reserve. When ZONE_MOVABLE is used, this problem would
theorectically cause to decrease usable memory for GFP_HIGHUSER_MOVABLE
allocation request which is mainly used for page cache and anon page
allocation. So, fix it by setting 0 to
sysctl_lowmem_reserve_ratio[ZONE_HIGHMEM].

And, defining sysctl_lowmem_reserve_ratio array by MAX_NR_ZONES - 1 size
makes code complex. For example, if there is highmem system, following
reserve ratio is activated for *NORMAL ZONE* which would be easyily
misleading people.

#ifdef CONFIG_HIGHMEM
32
#endif

This patch also fix this situation by defining sysctl_lowmem_reserve_ratio
array by MAX_NR_ZONES and place "#ifdef" to right place.

Reviewed-by: Aneesh Kumar K.V <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
---
Documentation/sysctl/vm.txt | 5 ++---
include/linux/mmzone.h | 2 +-
mm/page_alloc.c | 25 ++++++++++++++-----------
3 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 9baf66a..e9059d3 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -336,8 +336,6 @@ The lowmem_reserve_ratio is an array. You can see them by reading this file.
% cat /proc/sys/vm/lowmem_reserve_ratio
256 256 32
-
-Note: # of this elements is one fewer than number of zones. Because the highest
- zone's value is not necessary for following calculation.

But, these values are not used directly. The kernel calculates # of protection
pages for each zones from them. These are shown as array of protection pages
@@ -388,7 +386,8 @@ As above expression, they are reciprocal number of ratio.
pages of higher zones on the node.

If you would like to protect more pages, smaller values are effective.
-The minimum value is 1 (1/1 -> 100%).
+The minimum value is 1 (1/1 -> 100%). The value less than 1 completely
+disables protection of the pages.

==============================================================

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 356a814..d549c4e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -890,7 +890,7 @@ int min_free_kbytes_sysctl_handler(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
int watermark_scale_factor_sysctl_handler(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
-extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1];
+extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES];
int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *, int,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0f34356..2a7f7e9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -203,17 +203,18 @@ static void __free_pages_ok(struct page *page, unsigned int order);
* TBD: should special case ZONE_DMA32 machines here - in those we normally
* don't need any ZONE_NORMAL reservation
*/
-int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = {
+int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES] = {
#ifdef CONFIG_ZONE_DMA
- 256,
+ [ZONE_DMA] = 256,
#endif
#ifdef CONFIG_ZONE_DMA32
- 256,
+ [ZONE_DMA32] = 256,
#endif
+ [ZONE_NORMAL] = 32,
#ifdef CONFIG_HIGHMEM
- 32,
+ [ZONE_HIGHMEM] = 0,
#endif
- 32,
+ [ZONE_MOVABLE] = 0,
};

EXPORT_SYMBOL(totalram_pages);
@@ -6921,13 +6922,15 @@ static void setup_per_zone_lowmem_reserve(void)
struct zone *lower_zone;

idx--;
-
- if (sysctl_lowmem_reserve_ratio[idx] < 1)
- sysctl_lowmem_reserve_ratio[idx] = 1;
-
lower_zone = pgdat->node_zones + idx;
- lower_zone->lowmem_reserve[j] = managed_pages /
- sysctl_lowmem_reserve_ratio[idx];
+
+ if (sysctl_lowmem_reserve_ratio[idx] < 1) {
+ sysctl_lowmem_reserve_ratio[idx] = 0;
+ lower_zone->lowmem_reserve[j] = 0;
+ } else {
+ lower_zone->lowmem_reserve[j] =
+ managed_pages / sysctl_lowmem_reserve_ratio[idx];
+ }
managed_pages += lower_zone->managed_pages;
}
}
--
2.7.4


2017-09-06 07:55:54

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH] mm/page_alloc: don't reserve ZONE_HIGHMEM for ZONE_MOVABLE request

+CC linux-api

On 09/06/2017 06:35 AM, [email protected] wrote:
> From: Joonsoo Kim <[email protected]>
>
> Freepage on ZONE_HIGHMEM doesn't work for kernel memory so it's not that
> important to reserve. When ZONE_MOVABLE is used, this problem would
> theorectically cause to decrease usable memory for GFP_HIGHUSER_MOVABLE
> allocation request which is mainly used for page cache and anon page
> allocation. So, fix it by setting 0 to
> sysctl_lowmem_reserve_ratio[ZONE_HIGHMEM].
>
> And, defining sysctl_lowmem_reserve_ratio array by MAX_NR_ZONES - 1 size
> makes code complex. For example, if there is highmem system, following
> reserve ratio is activated for *NORMAL ZONE* which would be easyily
> misleading people.
>
> #ifdef CONFIG_HIGHMEM
> 32
> #endif
>
> This patch also fix this situation by defining sysctl_lowmem_reserve_ratio
> array by MAX_NR_ZONES and place "#ifdef" to right place.
>
> Reviewed-by: Aneesh Kumar K.V <[email protected]>
> Acked-by: Vlastimil Babka <[email protected]>
> Signed-off-by: Joonsoo Kim <[email protected]>
> ---
> Documentation/sysctl/vm.txt | 5 ++---
> include/linux/mmzone.h | 2 +-
> mm/page_alloc.c | 25 ++++++++++++++-----------
> 3 files changed, 17 insertions(+), 15 deletions(-)
>
> diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
> index 9baf66a..e9059d3 100644
> --- a/Documentation/sysctl/vm.txt
> +++ b/Documentation/sysctl/vm.txt
> @@ -336,8 +336,6 @@ The lowmem_reserve_ratio is an array. You can see them by reading this file.
> % cat /proc/sys/vm/lowmem_reserve_ratio
> 256 256 32
> -
> -Note: # of this elements is one fewer than number of zones. Because the highest
> - zone's value is not necessary for following calculation.
>
> But, these values are not used directly. The kernel calculates # of protection
> pages for each zones from them. These are shown as array of protection pages
> @@ -388,7 +386,8 @@ As above expression, they are reciprocal number of ratio.
> pages of higher zones on the node.
>
> If you would like to protect more pages, smaller values are effective.
> -The minimum value is 1 (1/1 -> 100%).
> +The minimum value is 1 (1/1 -> 100%). The value less than 1 completely
> +disables protection of the pages.
>
> ==============================================================
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 356a814..d549c4e 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -890,7 +890,7 @@ int min_free_kbytes_sysctl_handler(struct ctl_table *, int,
> void __user *, size_t *, loff_t *);
> int watermark_scale_factor_sysctl_handler(struct ctl_table *, int,
> void __user *, size_t *, loff_t *);
> -extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1];
> +extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES];
> int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *, int,
> void __user *, size_t *, loff_t *);
> int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *, int,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0f34356..2a7f7e9 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -203,17 +203,18 @@ static void __free_pages_ok(struct page *page, unsigned int order);
> * TBD: should special case ZONE_DMA32 machines here - in those we normally
> * don't need any ZONE_NORMAL reservation
> */
> -int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = {
> +int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES] = {
> #ifdef CONFIG_ZONE_DMA
> - 256,
> + [ZONE_DMA] = 256,
> #endif
> #ifdef CONFIG_ZONE_DMA32
> - 256,
> + [ZONE_DMA32] = 256,
> #endif
> + [ZONE_NORMAL] = 32,
> #ifdef CONFIG_HIGHMEM
> - 32,
> + [ZONE_HIGHMEM] = 0,
> #endif
> - 32,
> + [ZONE_MOVABLE] = 0,
> };
>
> EXPORT_SYMBOL(totalram_pages);
> @@ -6921,13 +6922,15 @@ static void setup_per_zone_lowmem_reserve(void)
> struct zone *lower_zone;
>
> idx--;
> -
> - if (sysctl_lowmem_reserve_ratio[idx] < 1)
> - sysctl_lowmem_reserve_ratio[idx] = 1;
> -
> lower_zone = pgdat->node_zones + idx;
> - lower_zone->lowmem_reserve[j] = managed_pages /
> - sysctl_lowmem_reserve_ratio[idx];
> +
> + if (sysctl_lowmem_reserve_ratio[idx] < 1) {
> + sysctl_lowmem_reserve_ratio[idx] = 0;
> + lower_zone->lowmem_reserve[j] = 0;
> + } else {
> + lower_zone->lowmem_reserve[j] =
> + managed_pages / sysctl_lowmem_reserve_ratio[idx];
> + }
> managed_pages += lower_zone->managed_pages;
> }
> }
>

2017-09-14 13:24:57

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] mm/page_alloc: don't reserve ZONE_HIGHMEM for ZONE_MOVABLE request

[Sorry for a later reply]

On Wed 06-09-17 13:35:25, Joonsoo Kim wrote:
> From: Joonsoo Kim <[email protected]>
>
> Freepage on ZONE_HIGHMEM doesn't work for kernel memory so it's not that
> important to reserve.

I am still not convinced this is a good idea. I do agree that reserving
memory in both HIGHMEM and MOVABLE is just wasting memory but removing
the reserve from the highmem as well will result that an oom victim will
allocate from lower zones and that might have unexpected side effects.

Can we simply leave HIGHMEM reserve and only remove it from the movable
zone if both are present?

> When ZONE_MOVABLE is used, this problem would
> theorectically cause to decrease usable memory for GFP_HIGHUSER_MOVABLE
> allocation request which is mainly used for page cache and anon page
> allocation. So, fix it by setting 0 to
> sysctl_lowmem_reserve_ratio[ZONE_HIGHMEM].
>
> And, defining sysctl_lowmem_reserve_ratio array by MAX_NR_ZONES - 1 size
> makes code complex. For example, if there is highmem system, following
> reserve ratio is activated for *NORMAL ZONE* which would be easyily

s@easyily@easily@

> misleading people.
>
> #ifdef CONFIG_HIGHMEM
> 32
> #endif
>
> This patch also fix this situation by defining sysctl_lowmem_reserve_ratio
> array by MAX_NR_ZONES and place "#ifdef" to right place.

I would probably split this patch into two but this is up to you

> Reviewed-by: Aneesh Kumar K.V <[email protected]>
> Acked-by: Vlastimil Babka <[email protected]>
> Signed-off-by: Joonsoo Kim <[email protected]>
> ---
> Documentation/sysctl/vm.txt | 5 ++---
> include/linux/mmzone.h | 2 +-
> mm/page_alloc.c | 25 ++++++++++++++-----------
> 3 files changed, 17 insertions(+), 15 deletions(-)
>
> diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
> index 9baf66a..e9059d3 100644
> --- a/Documentation/sysctl/vm.txt
> +++ b/Documentation/sysctl/vm.txt
> @@ -336,8 +336,6 @@ The lowmem_reserve_ratio is an array. You can see them by reading this file.
> % cat /proc/sys/vm/lowmem_reserve_ratio
> 256 256 32
> -
> -Note: # of this elements is one fewer than number of zones. Because the highest
> - zone's value is not necessary for following calculation.
>
> But, these values are not used directly. The kernel calculates # of protection
> pages for each zones from them. These are shown as array of protection pages
> @@ -388,7 +386,8 @@ As above expression, they are reciprocal number of ratio.
> pages of higher zones on the node.
>
> If you would like to protect more pages, smaller values are effective.
> -The minimum value is 1 (1/1 -> 100%).
> +The minimum value is 1 (1/1 -> 100%). The value less than 1 completely
> +disables protection of the pages.
>
> ==============================================================
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 356a814..d549c4e 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -890,7 +890,7 @@ int min_free_kbytes_sysctl_handler(struct ctl_table *, int,
> void __user *, size_t *, loff_t *);
> int watermark_scale_factor_sysctl_handler(struct ctl_table *, int,
> void __user *, size_t *, loff_t *);
> -extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1];
> +extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES];
> int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *, int,
> void __user *, size_t *, loff_t *);
> int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *, int,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0f34356..2a7f7e9 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -203,17 +203,18 @@ static void __free_pages_ok(struct page *page, unsigned int order);
> * TBD: should special case ZONE_DMA32 machines here - in those we normally
> * don't need any ZONE_NORMAL reservation
> */
> -int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = {
> +int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES] = {
> #ifdef CONFIG_ZONE_DMA
> - 256,
> + [ZONE_DMA] = 256,
> #endif
> #ifdef CONFIG_ZONE_DMA32
> - 256,
> + [ZONE_DMA32] = 256,
> #endif
> + [ZONE_NORMAL] = 32,
> #ifdef CONFIG_HIGHMEM
> - 32,
> + [ZONE_HIGHMEM] = 0,
> #endif
> - 32,
> + [ZONE_MOVABLE] = 0,
> };
>
> EXPORT_SYMBOL(totalram_pages);
> @@ -6921,13 +6922,15 @@ static void setup_per_zone_lowmem_reserve(void)
> struct zone *lower_zone;
>
> idx--;
> -
> - if (sysctl_lowmem_reserve_ratio[idx] < 1)
> - sysctl_lowmem_reserve_ratio[idx] = 1;
> -
> lower_zone = pgdat->node_zones + idx;
> - lower_zone->lowmem_reserve[j] = managed_pages /
> - sysctl_lowmem_reserve_ratio[idx];
> +
> + if (sysctl_lowmem_reserve_ratio[idx] < 1) {
> + sysctl_lowmem_reserve_ratio[idx] = 0;
> + lower_zone->lowmem_reserve[j] = 0;
> + } else {
> + lower_zone->lowmem_reserve[j] =
> + managed_pages / sysctl_lowmem_reserve_ratio[idx];
> + }
> managed_pages += lower_zone->managed_pages;
> }
> }
> --
> 2.7.4
>

--
Michal Hocko
SUSE Labs

2018-04-04 00:26:37

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [PATCH] mm/page_alloc: don't reserve ZONE_HIGHMEM for ZONE_MOVABLE request

Hello, Michal.

Sorry for a really long delay.

2017-09-14 22:24 GMT+09:00 Michal Hocko <[email protected]>:
> [Sorry for a later reply]
>
> On Wed 06-09-17 13:35:25, Joonsoo Kim wrote:
>> From: Joonsoo Kim <[email protected]>
>>
>> Freepage on ZONE_HIGHMEM doesn't work for kernel memory so it's not that
>> important to reserve.
>
> I am still not convinced this is a good idea. I do agree that reserving
> memory in both HIGHMEM and MOVABLE is just wasting memory but removing
> the reserve from the highmem as well will result that an oom victim will
> allocate from lower zones and that might have unexpected side effects.

Looks like you are confused.

This patch only affects the situation that ZONE_HIGHMEM and ZONE_MOVABLE is
used at the same time. In that case, before this patch, ZONE_HIGHMEM has
reserve for GFP_HIGHMEM | GFP_MOVABLE request, but, with this patch, no reserve
in ZONE_HIGHMEM for GFP_HIGHMEM | GFP_MOVABLE request. This perfectly
matchs with your hope. :)

> Can we simply leave HIGHMEM reserve and only remove it from the movable
> zone if both are present?

There is no higher zone than ZONE_MOVABLE so ZONE_MOVABLE has no reserve
with/without this patch. To save memory, we need to remove the reserve in
ZONE_HIGHMEM.

Thanks.

2018-04-12 12:05:19

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] mm/page_alloc: don't reserve ZONE_HIGHMEM for ZONE_MOVABLE request

On Wed 04-04-18 09:24:06, Joonsoo Kim wrote:
> 2017-09-14 22:24 GMT+09:00 Michal Hocko <[email protected]>:
> > [Sorry for a later reply]
> >
> > On Wed 06-09-17 13:35:25, Joonsoo Kim wrote:
> >> From: Joonsoo Kim <[email protected]>
> >>
> >> Freepage on ZONE_HIGHMEM doesn't work for kernel memory so it's not that
> >> important to reserve.
> >
> > I am still not convinced this is a good idea. I do agree that reserving
> > memory in both HIGHMEM and MOVABLE is just wasting memory but removing
> > the reserve from the highmem as well will result that an oom victim will
> > allocate from lower zones and that might have unexpected side effects.
>
> Looks like you are confused.
>
> This patch only affects the situation that ZONE_HIGHMEM and ZONE_MOVABLE is
> used at the same time. In that case, before this patch, ZONE_HIGHMEM has
> reserve for GFP_HIGHMEM | GFP_MOVABLE request, but, with this patch, no reserve
> in ZONE_HIGHMEM for GFP_HIGHMEM | GFP_MOVABLE request. This perfectly
> matchs with your hope. :)

I have forgot all the details but my vague recollection is that the
concern was that GFP_HIGHUSER_MOVABLE etc. wouldn't keep any reserve in
the highmem zone and so emergency allocations - e.g. those during OOM
will have to fallback to kernel zones and might lead to hard to predict
results. Am I still confused and this will not happen after the patch?
--
Michal Hocko
SUSE Labs