From: Joonsoo Kim <[email protected]>
Freepage on ZONE_HIGHMEM doesn't work for kernel memory so it's not that
important to reserve. When ZONE_MOVABLE is used, this problem would
theorectically cause to decrease usable memory for GFP_HIGHUSER_MOVABLE
allocation request which is mainly used for page cache and anon page
allocation. So, fix it.
And, defining sysctl_lowmem_reserve_ratio array by MAX_NR_ZONES - 1 size
makes code complex. For example, if there is highmem system, following
reserve ratio is activated for *NORMAL ZONE* which would be easyily
misleading people.
#ifdef CONFIG_HIGHMEM
32
#endif
This patch also fix this situation by defining sysctl_lowmem_reserve_ratio
array by MAX_NR_ZONES and place "#ifdef" to right place.
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
---
include/linux/mmzone.h | 2 +-
mm/page_alloc.c | 11 ++++++-----
2 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e7e92c8..e5f134b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -882,7 +882,7 @@ int min_free_kbytes_sysctl_handler(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
int watermark_scale_factor_sysctl_handler(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
-extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1];
+extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES];
int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *, int,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 90b1996..6faa53d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -202,17 +202,18 @@ static void __free_pages_ok(struct page *page, unsigned int order);
* TBD: should special case ZONE_DMA32 machines here - in those we normally
* don't need any ZONE_NORMAL reservation
*/
-int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = {
+int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES] = {
#ifdef CONFIG_ZONE_DMA
- 256,
+ [ZONE_DMA] = 256,
#endif
#ifdef CONFIG_ZONE_DMA32
- 256,
+ [ZONE_DMA32] = 256,
#endif
+ [ZONE_NORMAL] = 32,
#ifdef CONFIG_HIGHMEM
- 32,
+ [ZONE_HIGHMEM] = INT_MAX,
#endif
- 32,
+ [ZONE_MOVABLE] = INT_MAX,
};
EXPORT_SYMBOL(totalram_pages);
--
2.7.4
On Thu 24-08-17 14:45:46, Joonsoo Kim wrote:
> From: Joonsoo Kim <[email protected]>
>
> Freepage on ZONE_HIGHMEM doesn't work for kernel memory so it's not that
> important to reserve. When ZONE_MOVABLE is used, this problem would
> theorectically cause to decrease usable memory for GFP_HIGHUSER_MOVABLE
> allocation request which is mainly used for page cache and anon page
> allocation. So, fix it.
I do not really understand what is the problem you are trying to fix.
Yes the memory is reserved for a higher priority consumer and that is
deliberate AFAICT. Just consider that an OOM victim wants to make
further progress and rely on memory reserve while doing
GFP_HIGHUSER_MOVABLE request.
So what is the real problem you are trying to address here?
> And, defining sysctl_lowmem_reserve_ratio array by MAX_NR_ZONES - 1 size
> makes code complex. For example, if there is highmem system, following
> reserve ratio is activated for *NORMAL ZONE* which would be easyily
> misleading people.
>
> #ifdef CONFIG_HIGHMEM
> 32
> #endif
>
> This patch also fix this situation by defining sysctl_lowmem_reserve_ratio
> array by MAX_NR_ZONES and place "#ifdef" to right place.
>
> Reviewed-by: Aneesh Kumar K.V <[email protected]>
> Acked-by: Vlastimil Babka <[email protected]>
> Signed-off-by: Joonsoo Kim <[email protected]>
> ---
> include/linux/mmzone.h | 2 +-
> mm/page_alloc.c | 11 ++++++-----
> 2 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index e7e92c8..e5f134b 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -882,7 +882,7 @@ int min_free_kbytes_sysctl_handler(struct ctl_table *, int,
> void __user *, size_t *, loff_t *);
> int watermark_scale_factor_sysctl_handler(struct ctl_table *, int,
> void __user *, size_t *, loff_t *);
> -extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1];
> +extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES];
> int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *, int,
> void __user *, size_t *, loff_t *);
> int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *, int,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 90b1996..6faa53d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -202,17 +202,18 @@ static void __free_pages_ok(struct page *page, unsigned int order);
> * TBD: should special case ZONE_DMA32 machines here - in those we normally
> * don't need any ZONE_NORMAL reservation
> */
> -int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = {
> +int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES] = {
> #ifdef CONFIG_ZONE_DMA
> - 256,
> + [ZONE_DMA] = 256,
> #endif
> #ifdef CONFIG_ZONE_DMA32
> - 256,
> + [ZONE_DMA32] = 256,
> #endif
> + [ZONE_NORMAL] = 32,
> #ifdef CONFIG_HIGHMEM
> - 32,
> + [ZONE_HIGHMEM] = INT_MAX,
> #endif
> - 32,
> + [ZONE_MOVABLE] = INT_MAX,
> };
>
> EXPORT_SYMBOL(totalram_pages);
> --
> 2.7.4
>
--
Michal Hocko
SUSE Labs
On 08/24/2017 07:45 AM, [email protected] wrote:
> From: Joonsoo Kim <[email protected]>
>
> Freepage on ZONE_HIGHMEM doesn't work for kernel memory so it's not that
> important to reserve. When ZONE_MOVABLE is used, this problem would
> theorectically cause to decrease usable memory for GFP_HIGHUSER_MOVABLE
> allocation request which is mainly used for page cache and anon page
> allocation. So, fix it.
>
> And, defining sysctl_lowmem_reserve_ratio array by MAX_NR_ZONES - 1 size
> makes code complex. For example, if there is highmem system, following
> reserve ratio is activated for *NORMAL ZONE* which would be easyily
> misleading people.
>
> #ifdef CONFIG_HIGHMEM
> 32
> #endif
>
> This patch also fix this situation by defining sysctl_lowmem_reserve_ratio
> array by MAX_NR_ZONES and place "#ifdef" to right place.
>
> Reviewed-by: Aneesh Kumar K.V <[email protected]>
> Acked-by: Vlastimil Babka <[email protected]>
Looks like I did that almost year ago, so definitely had to refresh my
memory now :)
Anyway now I looked more thoroughly and noticed that this change leaks
into the reported sysctl. On a 64bit system with ZONE_MOVABLE:
before the patch:
vm.lowmem_reserve_ratio = 256 256 32
after the patch:
vm.lowmem_reserve_ratio = 256 256 32 2147483647
So if we indeed remove HIGHMEM from protection (c.f. Michal's mail), we
should do that differently than with the INT_MAX trick, IMHO.
> Signed-off-by: Joonsoo Kim <[email protected]>
> ---
> include/linux/mmzone.h | 2 +-
> mm/page_alloc.c | 11 ++++++-----
> 2 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index e7e92c8..e5f134b 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -882,7 +882,7 @@ int min_free_kbytes_sysctl_handler(struct ctl_table *, int,
> void __user *, size_t *, loff_t *);
> int watermark_scale_factor_sysctl_handler(struct ctl_table *, int,
> void __user *, size_t *, loff_t *);
> -extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1];
> +extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES];
> int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *, int,
> void __user *, size_t *, loff_t *);
> int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *, int,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 90b1996..6faa53d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -202,17 +202,18 @@ static void __free_pages_ok(struct page *page, unsigned int order);
> * TBD: should special case ZONE_DMA32 machines here - in those we normally
> * don't need any ZONE_NORMAL reservation
> */
> -int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = {
> +int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES] = {
> #ifdef CONFIG_ZONE_DMA
> - 256,
> + [ZONE_DMA] = 256,
> #endif
> #ifdef CONFIG_ZONE_DMA32
> - 256,
> + [ZONE_DMA32] = 256,
> #endif
> + [ZONE_NORMAL] = 32,
> #ifdef CONFIG_HIGHMEM
> - 32,
> + [ZONE_HIGHMEM] = INT_MAX,
> #endif
> - 32,
> + [ZONE_MOVABLE] = INT_MAX,
> };
>
> EXPORT_SYMBOL(totalram_pages);
>
On Thu, Aug 24, 2017 at 11:30:50AM +0200, Michal Hocko wrote:
> On Thu 24-08-17 14:45:46, Joonsoo Kim wrote:
> > From: Joonsoo Kim <[email protected]>
> >
> > Freepage on ZONE_HIGHMEM doesn't work for kernel memory so it's not that
> > important to reserve. When ZONE_MOVABLE is used, this problem would
> > theorectically cause to decrease usable memory for GFP_HIGHUSER_MOVABLE
> > allocation request which is mainly used for page cache and anon page
> > allocation. So, fix it.
>
> I do not really understand what is the problem you are trying to fix.
> Yes the memory is reserved for a higher priority consumer and that is
> deliberate AFAICT. Just consider that an OOM victim wants to make
> further progress and rely on memory reserve while doing
> GFP_HIGHUSER_MOVABLE request.
>
> So what is the real problem you are trying to address here?
If the system has the both, ZONE_HIGHMEM and ZONE_MOVABLE,
ZONE_HIGHMEM will reserve the memory for ZONE_MOVABLE request.
However, they are consumed by nearly equivalent priority consumer who
uses GFP_HIGHMEM + GFP_MOVABLE. In that case, reserved memory in
ZONE_HIGHMEM would not be used and it means just waste of the memory.
This patch try to fix it to nullify reserving memory in ZONE_HIGHMEM.
And, I think that all this problem is caused by the complex code in
lowmem reserve calculation. So, did some clean-up.
Thanks.
On Thu, Aug 24, 2017 at 11:41:58AM +0200, Vlastimil Babka wrote:
> On 08/24/2017 07:45 AM, [email protected] wrote:
> > From: Joonsoo Kim <[email protected]>
> >
> > Freepage on ZONE_HIGHMEM doesn't work for kernel memory so it's not that
> > important to reserve. When ZONE_MOVABLE is used, this problem would
> > theorectically cause to decrease usable memory for GFP_HIGHUSER_MOVABLE
> > allocation request which is mainly used for page cache and anon page
> > allocation. So, fix it.
> >
> > And, defining sysctl_lowmem_reserve_ratio array by MAX_NR_ZONES - 1 size
> > makes code complex. For example, if there is highmem system, following
> > reserve ratio is activated for *NORMAL ZONE* which would be easyily
> > misleading people.
> >
> > #ifdef CONFIG_HIGHMEM
> > 32
> > #endif
> >
> > This patch also fix this situation by defining sysctl_lowmem_reserve_ratio
> > array by MAX_NR_ZONES and place "#ifdef" to right place.
> >
> > Reviewed-by: Aneesh Kumar K.V <[email protected]>
> > Acked-by: Vlastimil Babka <[email protected]>
>
> Looks like I did that almost year ago, so definitely had to refresh my
> memory now :)
>
> Anyway now I looked more thoroughly and noticed that this change leaks
> into the reported sysctl. On a 64bit system with ZONE_MOVABLE:
>
> before the patch:
> vm.lowmem_reserve_ratio = 256 256 32
>
> after the patch:
> vm.lowmem_reserve_ratio = 256 256 32 2147483647
>
> So if we indeed remove HIGHMEM from protection (c.f. Michal's mail), we
> should do that differently than with the INT_MAX trick, IMHO.
Hmm, this is already pointed by Minchan and I have answered that.
lkml.kernel.org/r/<20170421013243.GA13966@js1304-desktop>
If you have a better idea, please let me know.
Thanks.
On Fri 25-08-17 09:15:43, Joonsoo Kim wrote:
> On Thu, Aug 24, 2017 at 11:30:50AM +0200, Michal Hocko wrote:
> > On Thu 24-08-17 14:45:46, Joonsoo Kim wrote:
> > > From: Joonsoo Kim <[email protected]>
> > >
> > > Freepage on ZONE_HIGHMEM doesn't work for kernel memory so it's not that
> > > important to reserve. When ZONE_MOVABLE is used, this problem would
> > > theorectically cause to decrease usable memory for GFP_HIGHUSER_MOVABLE
> > > allocation request which is mainly used for page cache and anon page
> > > allocation. So, fix it.
> >
> > I do not really understand what is the problem you are trying to fix.
> > Yes the memory is reserved for a higher priority consumer and that is
> > deliberate AFAICT. Just consider that an OOM victim wants to make
> > further progress and rely on memory reserve while doing
> > GFP_HIGHUSER_MOVABLE request.
> >
> > So what is the real problem you are trying to address here?
>
> If the system has the both, ZONE_HIGHMEM and ZONE_MOVABLE,
> ZONE_HIGHMEM will reserve the memory for ZONE_MOVABLE request.
Ohh, right. I forgot that __GFP_MOVABLE doesn't really enforce the
movable zone. It does so only if __GFP_HIGHMEM is specified as well when
ZONE_HIGHMEM is enabled. So indeed reserving memory in both is somehow
awkward. So why don't we simply remove reserves from the movable zone
when the highmem zone is enabled?
--
Michal Hocko
SUSE Labs
On Fri 25-08-17 09:20:31, Joonsoo Kim wrote:
> On Thu, Aug 24, 2017 at 11:41:58AM +0200, Vlastimil Babka wrote:
> > On 08/24/2017 07:45 AM, [email protected] wrote:
> > > From: Joonsoo Kim <[email protected]>
> > >
> > > Freepage on ZONE_HIGHMEM doesn't work for kernel memory so it's not that
> > > important to reserve. When ZONE_MOVABLE is used, this problem would
> > > theorectically cause to decrease usable memory for GFP_HIGHUSER_MOVABLE
> > > allocation request which is mainly used for page cache and anon page
> > > allocation. So, fix it.
> > >
> > > And, defining sysctl_lowmem_reserve_ratio array by MAX_NR_ZONES - 1 size
> > > makes code complex. For example, if there is highmem system, following
> > > reserve ratio is activated for *NORMAL ZONE* which would be easyily
> > > misleading people.
> > >
> > > #ifdef CONFIG_HIGHMEM
> > > 32
> > > #endif
> > >
> > > This patch also fix this situation by defining sysctl_lowmem_reserve_ratio
> > > array by MAX_NR_ZONES and place "#ifdef" to right place.
> > >
> > > Reviewed-by: Aneesh Kumar K.V <[email protected]>
> > > Acked-by: Vlastimil Babka <[email protected]>
> >
> > Looks like I did that almost year ago, so definitely had to refresh my
> > memory now :)
> >
> > Anyway now I looked more thoroughly and noticed that this change leaks
> > into the reported sysctl. On a 64bit system with ZONE_MOVABLE:
> >
> > before the patch:
> > vm.lowmem_reserve_ratio = 256 256 32
> >
> > after the patch:
> > vm.lowmem_reserve_ratio = 256 256 32 2147483647
> >
> > So if we indeed remove HIGHMEM from protection (c.f. Michal's mail), we
> > should do that differently than with the INT_MAX trick, IMHO.
>
> Hmm, this is already pointed by Minchan and I have answered that.
>
> lkml.kernel.org/r/<20170421013243.GA13966@js1304-desktop>
>
> If you have a better idea, please let me know.
Why don't we just use 0. In fact we are reserving 0 pages... Using
INT_MAX is just wrong.
--
Michal Hocko
SUSE Labs
On 08/25/2017 02:20 AM, Joonsoo Kim wrote:
> On Thu, Aug 24, 2017 at 11:41:58AM +0200, Vlastimil Babka wrote:
>
> Hmm, this is already pointed by Minchan and I have answered that.
>
> lkml.kernel.org/r/<20170421013243.GA13966@js1304-desktop>
>
> If you have a better idea, please let me know.
My idea is that size of sysctl_lowmem_reserve_ratio is ZONE_NORMAL+1 and
it has no entries for zones > NORMAL. The
setup_per_zone_lowmem_reserve() is adjusted to only set
lower_zone->lowmem_reserve[j] for idx <= ZONE_NORMAL.
I can't imagine somebody would want override the ratio for HIGHMEM or
MOVABLE
(where it has no effect anyway) so the simplest thing is not to expose
it at all.
> Thanks.
>
On Fri, Aug 25, 2017 at 09:38:42AM +0200, Michal Hocko wrote:
> On Fri 25-08-17 09:20:31, Joonsoo Kim wrote:
> > On Thu, Aug 24, 2017 at 11:41:58AM +0200, Vlastimil Babka wrote:
> > > On 08/24/2017 07:45 AM, [email protected] wrote:
> > > > From: Joonsoo Kim <[email protected]>
> > > >
> > > > Freepage on ZONE_HIGHMEM doesn't work for kernel memory so it's not that
> > > > important to reserve. When ZONE_MOVABLE is used, this problem would
> > > > theorectically cause to decrease usable memory for GFP_HIGHUSER_MOVABLE
> > > > allocation request which is mainly used for page cache and anon page
> > > > allocation. So, fix it.
> > > >
> > > > And, defining sysctl_lowmem_reserve_ratio array by MAX_NR_ZONES - 1 size
> > > > makes code complex. For example, if there is highmem system, following
> > > > reserve ratio is activated for *NORMAL ZONE* which would be easyily
> > > > misleading people.
> > > >
> > > > #ifdef CONFIG_HIGHMEM
> > > > 32
> > > > #endif
> > > >
> > > > This patch also fix this situation by defining sysctl_lowmem_reserve_ratio
> > > > array by MAX_NR_ZONES and place "#ifdef" to right place.
> > > >
> > > > Reviewed-by: Aneesh Kumar K.V <[email protected]>
> > > > Acked-by: Vlastimil Babka <[email protected]>
> > >
> > > Looks like I did that almost year ago, so definitely had to refresh my
> > > memory now :)
> > >
> > > Anyway now I looked more thoroughly and noticed that this change leaks
> > > into the reported sysctl. On a 64bit system with ZONE_MOVABLE:
> > >
> > > before the patch:
> > > vm.lowmem_reserve_ratio = 256 256 32
> > >
> > > after the patch:
> > > vm.lowmem_reserve_ratio = 256 256 32 2147483647
> > >
> > > So if we indeed remove HIGHMEM from protection (c.f. Michal's mail), we
> > > should do that differently than with the INT_MAX trick, IMHO.
> >
> > Hmm, this is already pointed by Minchan and I have answered that.
> >
> > lkml.kernel.org/r/<20170421013243.GA13966@js1304-desktop>
> >
> > If you have a better idea, please let me know.
>
> Why don't we just use 0. In fact we are reserving 0 pages... Using
> INT_MAX is just wrong.
The number of reserved pages is calculated by "managed_pages /
ratio". Using INT_MAX, net result would be 0.
There is a logic converting ratio 0 to ratio 1.
if (sysctl_lowmem_reserve_ratio[idx] < 1)
sysctl_lowmem_reserve_ratio[idx] = 1
If I use 0 to represent 0 reserved page, there would be a user
who is affected by this change. So, I don't use 0 for this patch.
Thanks.
On Fri, Aug 25, 2017 at 09:56:10AM +0200, Vlastimil Babka wrote:
> On 08/25/2017 02:20 AM, Joonsoo Kim wrote:
> > On Thu, Aug 24, 2017 at 11:41:58AM +0200, Vlastimil Babka wrote:
> >
> > Hmm, this is already pointed by Minchan and I have answered that.
> >
> > lkml.kernel.org/r/<20170421013243.GA13966@js1304-desktop>
> >
> > If you have a better idea, please let me know.
>
> My idea is that size of sysctl_lowmem_reserve_ratio is ZONE_NORMAL+1 and
> it has no entries for zones > NORMAL. The
> setup_per_zone_lowmem_reserve() is adjusted to only set
> lower_zone->lowmem_reserve[j] for idx <= ZONE_NORMAL.
>
> I can't imagine somebody would want override the ratio for HIGHMEM or
> MOVABLE
> (where it has no effect anyway) so the simplest thing is not to expose
> it at all.
Seems reasonable. However, if there is a user who checks
sysctl_lowmem_reserve_ratio entry for HIGHMEM and change it, suggested
interface will cause a problem since it doesn't expose ratio for
HIGHMEM. Am I missing something?
Thanks.
>
> > Thanks.
> >
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
+CC linux-api
On 08/28/2017 02:28 AM, Joonsoo Kim wrote:
> On Fri, Aug 25, 2017 at 09:56:10AM +0200, Vlastimil Babka wrote:
>> On 08/25/2017 02:20 AM, Joonsoo Kim wrote:
>>> On Thu, Aug 24, 2017 at 11:41:58AM +0200, Vlastimil Babka wrote:
>>>
>>> Hmm, this is already pointed by Minchan and I have answered that.
>>>
>>> lkml.kernel.org/r/<20170421013243.GA13966@js1304-desktop>
>>>
>>> If you have a better idea, please let me know.
>>
>> My idea is that size of sysctl_lowmem_reserve_ratio is ZONE_NORMAL+1 and
>> it has no entries for zones > NORMAL. The
>> setup_per_zone_lowmem_reserve() is adjusted to only set
>> lower_zone->lowmem_reserve[j] for idx <= ZONE_NORMAL.
>>
>> I can't imagine somebody would want override the ratio for HIGHMEM or
>> MOVABLE
>> (where it has no effect anyway) so the simplest thing is not to expose
>> it at all.
>
> Seems reasonable. However, if there is a user who checks
> sysctl_lowmem_reserve_ratio entry for HIGHMEM and change it, suggested
> interface will cause a problem since it doesn't expose ratio for
> HIGHMEM. Am I missing something?
As you explained, it makes little sense to change it for HIGHMEM which
only affects MOVABLE allocations. Also I doubt there are many systems
with both HIGHMEM (implies 32bit) *and* MOVABLE (implies NUMA, memory
hotplug...) zones. So I would just remove it, and if somebody will
really miss it, we can always add it back. In any case, please CC
linux-api on the next version.
> Thanks.
>
>
>>
>>> Thanks.
>>>
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to [email protected]. For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
On Mon 28-08-17 09:15:52, Joonsoo Kim wrote:
> On Fri, Aug 25, 2017 at 09:38:42AM +0200, Michal Hocko wrote:
> > On Fri 25-08-17 09:20:31, Joonsoo Kim wrote:
> > > On Thu, Aug 24, 2017 at 11:41:58AM +0200, Vlastimil Babka wrote:
> > > > On 08/24/2017 07:45 AM, [email protected] wrote:
> > > > > From: Joonsoo Kim <[email protected]>
> > > > >
> > > > > Freepage on ZONE_HIGHMEM doesn't work for kernel memory so it's not that
> > > > > important to reserve. When ZONE_MOVABLE is used, this problem would
> > > > > theorectically cause to decrease usable memory for GFP_HIGHUSER_MOVABLE
> > > > > allocation request which is mainly used for page cache and anon page
> > > > > allocation. So, fix it.
> > > > >
> > > > > And, defining sysctl_lowmem_reserve_ratio array by MAX_NR_ZONES - 1 size
> > > > > makes code complex. For example, if there is highmem system, following
> > > > > reserve ratio is activated for *NORMAL ZONE* which would be easyily
> > > > > misleading people.
> > > > >
> > > > > #ifdef CONFIG_HIGHMEM
> > > > > 32
> > > > > #endif
> > > > >
> > > > > This patch also fix this situation by defining sysctl_lowmem_reserve_ratio
> > > > > array by MAX_NR_ZONES and place "#ifdef" to right place.
> > > > >
> > > > > Reviewed-by: Aneesh Kumar K.V <[email protected]>
> > > > > Acked-by: Vlastimil Babka <[email protected]>
> > > >
> > > > Looks like I did that almost year ago, so definitely had to refresh my
> > > > memory now :)
> > > >
> > > > Anyway now I looked more thoroughly and noticed that this change leaks
> > > > into the reported sysctl. On a 64bit system with ZONE_MOVABLE:
> > > >
> > > > before the patch:
> > > > vm.lowmem_reserve_ratio = 256 256 32
> > > >
> > > > after the patch:
> > > > vm.lowmem_reserve_ratio = 256 256 32 2147483647
> > > >
> > > > So if we indeed remove HIGHMEM from protection (c.f. Michal's mail), we
> > > > should do that differently than with the INT_MAX trick, IMHO.
> > >
> > > Hmm, this is already pointed by Minchan and I have answered that.
> > >
> > > lkml.kernel.org/r/<20170421013243.GA13966@js1304-desktop>
> > >
> > > If you have a better idea, please let me know.
> >
> > Why don't we just use 0. In fact we are reserving 0 pages... Using
> > INT_MAX is just wrong.
>
> The number of reserved pages is calculated by "managed_pages /
> ratio". Using INT_MAX, net result would be 0.
Why cannot we simply special case 0?
> There is a logic converting ratio 0 to ratio 1.
>
> if (sysctl_lowmem_reserve_ratio[idx] < 1)
> sysctl_lowmem_reserve_ratio[idx] = 1
This code just tries to prevent from division by 0 but I am wondering
we should simply set lowmem_reserve to 0 in that case.
> If I use 0 to represent 0 reserved page, there would be a user
> who is affected by this change. So, I don't use 0 for this patch.
I am sorry but I do not understand? Could you be more specific please?
--
Michal Hocko
SUSE Labs
On Mon, Aug 28, 2017 at 08:45:07AM +0200, Vlastimil Babka wrote:
> +CC linux-api
>
> On 08/28/2017 02:28 AM, Joonsoo Kim wrote:
> > On Fri, Aug 25, 2017 at 09:56:10AM +0200, Vlastimil Babka wrote:
> >> On 08/25/2017 02:20 AM, Joonsoo Kim wrote:
> >>> On Thu, Aug 24, 2017 at 11:41:58AM +0200, Vlastimil Babka wrote:
> >>>
> >>> Hmm, this is already pointed by Minchan and I have answered that.
> >>>
> >>> lkml.kernel.org/r/<20170421013243.GA13966@js1304-desktop>
> >>>
> >>> If you have a better idea, please let me know.
> >>
> >> My idea is that size of sysctl_lowmem_reserve_ratio is ZONE_NORMAL+1 and
> >> it has no entries for zones > NORMAL. The
> >> setup_per_zone_lowmem_reserve() is adjusted to only set
> >> lower_zone->lowmem_reserve[j] for idx <= ZONE_NORMAL.
> >>
> >> I can't imagine somebody would want override the ratio for HIGHMEM or
> >> MOVABLE
> >> (where it has no effect anyway) so the simplest thing is not to expose
> >> it at all.
> >
> > Seems reasonable. However, if there is a user who checks
> > sysctl_lowmem_reserve_ratio entry for HIGHMEM and change it, suggested
> > interface will cause a problem since it doesn't expose ratio for
> > HIGHMEM. Am I missing something?
>
> As you explained, it makes little sense to change it for HIGHMEM which
> only affects MOVABLE allocations. Also I doubt there are many systems
> with both HIGHMEM (implies 32bit) *and* MOVABLE (implies NUMA, memory
> hotplug...) zones. So I would just remove it, and if somebody will
> really miss it, we can always add it back. In any case, please CC
> linux-api on the next version.
If we will accept a change that potentially breaks the user, I think
that making zero as a special value for sysctl_lowmem_reserve_ratio
is better solution. How about this way?
Thanks.
On Mon, Aug 28, 2017 at 11:56:16AM +0200, Michal Hocko wrote:
> On Mon 28-08-17 09:15:52, Joonsoo Kim wrote:
> > On Fri, Aug 25, 2017 at 09:38:42AM +0200, Michal Hocko wrote:
> > > On Fri 25-08-17 09:20:31, Joonsoo Kim wrote:
> > > > On Thu, Aug 24, 2017 at 11:41:58AM +0200, Vlastimil Babka wrote:
> > > > > On 08/24/2017 07:45 AM, [email protected] wrote:
> > > > > > From: Joonsoo Kim <[email protected]>
> > > > > >
> > > > > > Freepage on ZONE_HIGHMEM doesn't work for kernel memory so it's not that
> > > > > > important to reserve. When ZONE_MOVABLE is used, this problem would
> > > > > > theorectically cause to decrease usable memory for GFP_HIGHUSER_MOVABLE
> > > > > > allocation request which is mainly used for page cache and anon page
> > > > > > allocation. So, fix it.
> > > > > >
> > > > > > And, defining sysctl_lowmem_reserve_ratio array by MAX_NR_ZONES - 1 size
> > > > > > makes code complex. For example, if there is highmem system, following
> > > > > > reserve ratio is activated for *NORMAL ZONE* which would be easyily
> > > > > > misleading people.
> > > > > >
> > > > > > #ifdef CONFIG_HIGHMEM
> > > > > > 32
> > > > > > #endif
> > > > > >
> > > > > > This patch also fix this situation by defining sysctl_lowmem_reserve_ratio
> > > > > > array by MAX_NR_ZONES and place "#ifdef" to right place.
> > > > > >
> > > > > > Reviewed-by: Aneesh Kumar K.V <[email protected]>
> > > > > > Acked-by: Vlastimil Babka <[email protected]>
> > > > >
> > > > > Looks like I did that almost year ago, so definitely had to refresh my
> > > > > memory now :)
> > > > >
> > > > > Anyway now I looked more thoroughly and noticed that this change leaks
> > > > > into the reported sysctl. On a 64bit system with ZONE_MOVABLE:
> > > > >
> > > > > before the patch:
> > > > > vm.lowmem_reserve_ratio = 256 256 32
> > > > >
> > > > > after the patch:
> > > > > vm.lowmem_reserve_ratio = 256 256 32 2147483647
> > > > >
> > > > > So if we indeed remove HIGHMEM from protection (c.f. Michal's mail), we
> > > > > should do that differently than with the INT_MAX trick, IMHO.
> > > >
> > > > Hmm, this is already pointed by Minchan and I have answered that.
> > > >
> > > > lkml.kernel.org/r/<20170421013243.GA13966@js1304-desktop>
> > > >
> > > > If you have a better idea, please let me know.
> > >
> > > Why don't we just use 0. In fact we are reserving 0 pages... Using
> > > INT_MAX is just wrong.
> >
> > The number of reserved pages is calculated by "managed_pages /
> > ratio". Using INT_MAX, net result would be 0.
>
> Why cannot we simply special case 0?
>
> > There is a logic converting ratio 0 to ratio 1.
> >
> > if (sysctl_lowmem_reserve_ratio[idx] < 1)
> > sysctl_lowmem_reserve_ratio[idx] = 1
>
> This code just tries to prevent from division by 0 but I am wondering
> we should simply set lowmem_reserve to 0 in that case.
>
> > If I use 0 to represent 0 reserved page, there would be a user
> > who is affected by this change. So, I don't use 0 for this patch.
>
> I am sorry but I do not understand? Could you be more specific please?
If there is a user that manually set sysctl_lowmem_reserve_ratio and
he/she uses '0' to set ratio to '1', your suggestion making '0' as
a special value changes his/her system behaviour. I'm afraid this
case.
However, if you and Vlastimil agree with this making '0' as a special
value, I will go this way.
Thanks.
On 08/29/2017 02:36 AM, Joonsoo Kim wrote:
> On Mon, Aug 28, 2017 at 08:45:07AM +0200, Vlastimil Babka wrote:
>> +CC linux-api
>>
>> On 08/28/2017 02:28 AM, Joonsoo Kim wrote:
>>> On Fri, Aug 25, 2017 at 09:56:10AM +0200, Vlastimil Babka wrote:
>>>
>>> Seems reasonable. However, if there is a user who checks
>>> sysctl_lowmem_reserve_ratio entry for HIGHMEM and change it, suggested
>>> interface will cause a problem since it doesn't expose ratio for
>>> HIGHMEM. Am I missing something?
>>
>> As you explained, it makes little sense to change it for HIGHMEM which
>> only affects MOVABLE allocations. Also I doubt there are many systems
>> with both HIGHMEM (implies 32bit) *and* MOVABLE (implies NUMA, memory
>> hotplug...) zones. So I would just remove it, and if somebody will
>> really miss it, we can always add it back. In any case, please CC
>> linux-api on the next version.
>
> If we will accept a change that potentially breaks the user, I think
> that making zero as a special value for sysctl_lowmem_reserve_ratio
> is better solution. How about this way?
I'd prefer removal, but won't object to zero. Certainly much better than
UINT_MAX.
> Thanks.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-api" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
On Tue 29-08-17 09:45:47, Joonsoo Kim wrote:
> On Mon, Aug 28, 2017 at 11:56:16AM +0200, Michal Hocko wrote:
> > On Mon 28-08-17 09:15:52, Joonsoo Kim wrote:
> > > On Fri, Aug 25, 2017 at 09:38:42AM +0200, Michal Hocko wrote:
> > > > On Fri 25-08-17 09:20:31, Joonsoo Kim wrote:
> > > > > On Thu, Aug 24, 2017 at 11:41:58AM +0200, Vlastimil Babka wrote:
> > > > > > On 08/24/2017 07:45 AM, [email protected] wrote:
> > > > > > > From: Joonsoo Kim <[email protected]>
> > > > > > >
> > > > > > > Freepage on ZONE_HIGHMEM doesn't work for kernel memory so it's not that
> > > > > > > important to reserve. When ZONE_MOVABLE is used, this problem would
> > > > > > > theorectically cause to decrease usable memory for GFP_HIGHUSER_MOVABLE
> > > > > > > allocation request which is mainly used for page cache and anon page
> > > > > > > allocation. So, fix it.
> > > > > > >
> > > > > > > And, defining sysctl_lowmem_reserve_ratio array by MAX_NR_ZONES - 1 size
> > > > > > > makes code complex. For example, if there is highmem system, following
> > > > > > > reserve ratio is activated for *NORMAL ZONE* which would be easyily
> > > > > > > misleading people.
> > > > > > >
> > > > > > > #ifdef CONFIG_HIGHMEM
> > > > > > > 32
> > > > > > > #endif
> > > > > > >
> > > > > > > This patch also fix this situation by defining sysctl_lowmem_reserve_ratio
> > > > > > > array by MAX_NR_ZONES and place "#ifdef" to right place.
> > > > > > >
> > > > > > > Reviewed-by: Aneesh Kumar K.V <[email protected]>
> > > > > > > Acked-by: Vlastimil Babka <[email protected]>
> > > > > >
> > > > > > Looks like I did that almost year ago, so definitely had to refresh my
> > > > > > memory now :)
> > > > > >
> > > > > > Anyway now I looked more thoroughly and noticed that this change leaks
> > > > > > into the reported sysctl. On a 64bit system with ZONE_MOVABLE:
> > > > > >
> > > > > > before the patch:
> > > > > > vm.lowmem_reserve_ratio = 256 256 32
> > > > > >
> > > > > > after the patch:
> > > > > > vm.lowmem_reserve_ratio = 256 256 32 2147483647
> > > > > >
> > > > > > So if we indeed remove HIGHMEM from protection (c.f. Michal's mail), we
> > > > > > should do that differently than with the INT_MAX trick, IMHO.
> > > > >
> > > > > Hmm, this is already pointed by Minchan and I have answered that.
> > > > >
> > > > > lkml.kernel.org/r/<20170421013243.GA13966@js1304-desktop>
> > > > >
> > > > > If you have a better idea, please let me know.
> > > >
> > > > Why don't we just use 0. In fact we are reserving 0 pages... Using
> > > > INT_MAX is just wrong.
> > >
> > > The number of reserved pages is calculated by "managed_pages /
> > > ratio". Using INT_MAX, net result would be 0.
> >
> > Why cannot we simply special case 0?
> >
> > > There is a logic converting ratio 0 to ratio 1.
> > >
> > > if (sysctl_lowmem_reserve_ratio[idx] < 1)
> > > sysctl_lowmem_reserve_ratio[idx] = 1
> >
> > This code just tries to prevent from division by 0 but I am wondering
> > we should simply set lowmem_reserve to 0 in that case.
> >
> > > If I use 0 to represent 0 reserved page, there would be a user
> > > who is affected by this change. So, I don't use 0 for this patch.
> >
> > I am sorry but I do not understand? Could you be more specific please?
>
> If there is a user that manually set sysctl_lowmem_reserve_ratio and
> he/she uses '0' to set ratio to '1', your suggestion making '0' as
> a special value changes his/her system behaviour. I'm afraid this
> case.
Documentation (Documentation/sysctl/vm.txt) explicitly states that 1
is minimum. So I wouldn't afraid all that much. And you can actually
printk_once if 0 is set and explain that this disables memory reserve
for the particular zone altogether.
> However, if you and Vlastimil agree with this making '0' as a special
> value, I will go this way.
I do agree that INT_MAX is just too ugly.
--
Michal Hocko
SUSE Labs
On Tue, Aug 29, 2017 at 03:39:45PM +0200, Michal Hocko wrote:
> On Tue 29-08-17 09:45:47, Joonsoo Kim wrote:
> > On Mon, Aug 28, 2017 at 11:56:16AM +0200, Michal Hocko wrote:
> > > On Mon 28-08-17 09:15:52, Joonsoo Kim wrote:
> > > > On Fri, Aug 25, 2017 at 09:38:42AM +0200, Michal Hocko wrote:
> > > > > On Fri 25-08-17 09:20:31, Joonsoo Kim wrote:
> > > > > > On Thu, Aug 24, 2017 at 11:41:58AM +0200, Vlastimil Babka wrote:
> > > > > > > On 08/24/2017 07:45 AM, [email protected] wrote:
> > > > > > > > From: Joonsoo Kim <[email protected]>
> > > > > > > >
> > > > > > > > Freepage on ZONE_HIGHMEM doesn't work for kernel memory so it's not that
> > > > > > > > important to reserve. When ZONE_MOVABLE is used, this problem would
> > > > > > > > theorectically cause to decrease usable memory for GFP_HIGHUSER_MOVABLE
> > > > > > > > allocation request which is mainly used for page cache and anon page
> > > > > > > > allocation. So, fix it.
> > > > > > > >
> > > > > > > > And, defining sysctl_lowmem_reserve_ratio array by MAX_NR_ZONES - 1 size
> > > > > > > > makes code complex. For example, if there is highmem system, following
> > > > > > > > reserve ratio is activated for *NORMAL ZONE* which would be easyily
> > > > > > > > misleading people.
> > > > > > > >
> > > > > > > > #ifdef CONFIG_HIGHMEM
> > > > > > > > 32
> > > > > > > > #endif
> > > > > > > >
> > > > > > > > This patch also fix this situation by defining sysctl_lowmem_reserve_ratio
> > > > > > > > array by MAX_NR_ZONES and place "#ifdef" to right place.
> > > > > > > >
> > > > > > > > Reviewed-by: Aneesh Kumar K.V <[email protected]>
> > > > > > > > Acked-by: Vlastimil Babka <[email protected]>
> > > > > > >
> > > > > > > Looks like I did that almost year ago, so definitely had to refresh my
> > > > > > > memory now :)
> > > > > > >
> > > > > > > Anyway now I looked more thoroughly and noticed that this change leaks
> > > > > > > into the reported sysctl. On a 64bit system with ZONE_MOVABLE:
> > > > > > >
> > > > > > > before the patch:
> > > > > > > vm.lowmem_reserve_ratio = 256 256 32
> > > > > > >
> > > > > > > after the patch:
> > > > > > > vm.lowmem_reserve_ratio = 256 256 32 2147483647
> > > > > > >
> > > > > > > So if we indeed remove HIGHMEM from protection (c.f. Michal's mail), we
> > > > > > > should do that differently than with the INT_MAX trick, IMHO.
> > > > > >
> > > > > > Hmm, this is already pointed by Minchan and I have answered that.
> > > > > >
> > > > > > lkml.kernel.org/r/<20170421013243.GA13966@js1304-desktop>
> > > > > >
> > > > > > If you have a better idea, please let me know.
> > > > >
> > > > > Why don't we just use 0. In fact we are reserving 0 pages... Using
> > > > > INT_MAX is just wrong.
> > > >
> > > > The number of reserved pages is calculated by "managed_pages /
> > > > ratio". Using INT_MAX, net result would be 0.
> > >
> > > Why cannot we simply special case 0?
> > >
> > > > There is a logic converting ratio 0 to ratio 1.
> > > >
> > > > if (sysctl_lowmem_reserve_ratio[idx] < 1)
> > > > sysctl_lowmem_reserve_ratio[idx] = 1
> > >
> > > This code just tries to prevent from division by 0 but I am wondering
> > > we should simply set lowmem_reserve to 0 in that case.
> > >
> > > > If I use 0 to represent 0 reserved page, there would be a user
> > > > who is affected by this change. So, I don't use 0 for this patch.
> > >
> > > I am sorry but I do not understand? Could you be more specific please?
> >
> > If there is a user that manually set sysctl_lowmem_reserve_ratio and
> > he/she uses '0' to set ratio to '1', your suggestion making '0' as
> > a special value changes his/her system behaviour. I'm afraid this
> > case.
>
> Documentation (Documentation/sysctl/vm.txt) explicitly states that 1
> is minimum. So I wouldn't afraid all that much. And you can actually
> printk_once if 0 is set and explain that this disables memory reserve
> for the particular zone altogether.
Great! If documentation says that, we can freely use the value, zero.
I will do it as this way.
Thanks.