Usually the value of min_free_kbytes is a multiple of 4,
and in that case the right shift is fine.
But if it is not, the right shift discards the low 2 bits,
and the kernel reserves less memory than requested.
So it is necessary to round min_free_kbytes up to a multiple of 4.
For example, if min_free_kbytes is 64, 16 pages should be kept,
but if min_free_kbytes is 65 or 66, 17 pages should be kept.
Signed-off-by: ChenGang <[email protected]>
---
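For reference, the arithmetic in the changelog can be reproduced in user
space. This is only an illustrative sketch, not kernel code: the helper
names pages_truncated() and pages_rounded_up() are made up for this note,
and the page shift is passed as a parameter instead of using the kernel's
PAGE_SHIFT.

```c
#include <assert.h>

/* Hypothetical helper: mirrors the current expression,
 * min_free_kbytes >> (PAGE_SHIFT - 10), which rounds down. */
static unsigned long pages_truncated(unsigned long kbytes,
				     unsigned int page_shift)
{
	assert(page_shift >= 10);	/* assumes a page is at least 1 kB */
	return kbytes >> (page_shift - 10);
}

/* Hypothetical helper: rounds kbytes up to a whole page before shifting. */
static unsigned long pages_rounded_up(unsigned long kbytes,
				      unsigned int page_shift)
{
	unsigned long kb_per_page;

	assert(page_shift >= 10);
	kb_per_page = 1UL << (page_shift - 10);
	return (kbytes + kb_per_page - 1) >> (page_shift - 10);
}
```

With a page shift of 12, both helpers return 16 for 64 kB. For 65 or
66 kB, pages_truncated() still returns 16 while pages_rounded_up()
returns 17.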
mm/page_alloc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d66bc8a..1baeeba 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7611,7 +7611,8 @@ static void setup_per_zone_lowmem_reserve(void)
 
 static void __setup_per_zone_wmarks(void)
 {
-	unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);
+	unsigned long pages_min =
+		(PAGE_ALIGN(min_free_kbytes * 1024) / 1024) >> (PAGE_SHIFT - 10);
 	unsigned long lowmem_pages = 0;
 	struct zone *zone;
 	unsigned long flags;
--
1.8.5.6
On Sun, Jun 09, 2019 at 05:10:28PM +0800, ChenGang wrote:
>Usually the value of min_free_kbytes is a multiple of 4,
>and in that case the right shift is fine.
>But if it is not, the right shift discards the low 2 bits,
But PAGE_SHIFT is not always 12.
>and the kernel reserves less memory than requested.
>So it is necessary to round min_free_kbytes up to a multiple of 4.
>For example, if min_free_kbytes is 64, 16 pages should be kept,
>but if min_free_kbytes is 65 or 66, 17 pages should be kept.
>
>Signed-off-by: ChenGang <[email protected]>
>---
> mm/page_alloc.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>index d66bc8a..1baeeba 100644
>--- a/mm/page_alloc.c
>+++ b/mm/page_alloc.c
>@@ -7611,7 +7611,8 @@ static void setup_per_zone_lowmem_reserve(void)
>
> static void __setup_per_zone_wmarks(void)
> {
>- unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);
>+ unsigned long pages_min =
>+ (PAGE_ALIGN(min_free_kbytes * 1024) / 1024) >> (PAGE_SHIFT - 10);
In my mind, pages_min is an estimated value. Do we need to be so precise?
> unsigned long lowmem_pages = 0;
> struct zone *zone;
> unsigned long flags;
>--
>1.8.5.6
--
Wei Yang
Help you, Help me
On Sun 09-06-19 17:10:28, ChenGang wrote:
> Usually the value of min_free_kbytes is a multiple of 4,
> and in that case the right shift is fine.
> But if it is not, the right shift discards the low 2 bits,
> and the kernel reserves less memory than requested.
> So it is necessary to round min_free_kbytes up to a multiple of 4.
> For example, if min_free_kbytes is 64, 16 pages should be kept,
> but if min_free_kbytes is 65 or 66, 17 pages should be kept.
Could you describe the actual problem? Do we ever generate a
min_free_kbytes value that would lead to unexpected reserves, or is this
trying to compensate for values configured from userspace? If the
latter, why do we care at all?
Have you seen this to be an actual problem, or is this mostly motivated
by reading the code?
> Signed-off-by: ChenGang <[email protected]>
> ---
> mm/page_alloc.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d66bc8a..1baeeba 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7611,7 +7611,8 @@ static void setup_per_zone_lowmem_reserve(void)
>
> static void __setup_per_zone_wmarks(void)
> {
> - unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);
> + unsigned long pages_min =
> + (PAGE_ALIGN(min_free_kbytes * 1024) / 1024) >> (PAGE_SHIFT - 10);
> unsigned long lowmem_pages = 0;
> struct zone *zone;
> unsigned long flags;
> --
> 1.8.5.6
>
--
Michal Hocko
SUSE Labs
Hi Michal
>On Sun 09-06-19 17:10:28, ChenGang wrote:
>> Usually the value of min_free_kbytes is a multiple of 4, and in that
>> case the right shift is fine.
>> But if it is not, the right shift discards the low 2 bits, and the
>> kernel reserves less memory than requested.
>> So it is necessary to round min_free_kbytes up to a multiple of 4.
>> For example, if min_free_kbytes is 64, 16 pages should be kept, but
>> if min_free_kbytes is 65 or 66, 17 pages should be kept.
>Could you describe the actual problem? Do we ever generate a
>min_free_kbytes value that would lead to unexpected reserves, or is this
>trying to compensate for values configured from userspace? If the
>latter, why do we care at all?
>Have you seen this to be an actual problem, or is this mostly motivated
>by reading the code?
I haven't seen an actual problem; the patch is motivated by reading the
code. Users can configure this value through the
/proc/sys/vm/min_free_kbytes interface, so I think being a bit more
precise is better.
>> Signed-off-by: ChenGang <[email protected]>
>> ---
>> mm/page_alloc.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index d66bc8a..1baeeba 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -7611,7 +7611,8 @@ static void setup_per_zone_lowmem_reserve(void)
>>
>> static void __setup_per_zone_wmarks(void)
>> {
>> - unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);
>> + unsigned long pages_min =
>> + (PAGE_ALIGN(min_free_kbytes * 1024) / 1024) >> (PAGE_SHIFT - 10);
>> unsigned long lowmem_pages = 0;
>> struct zone *zone;
>> unsigned long flags;
>> --
>> 1.8.5.6
>>
>--
>Michal Hocko
>SUSE Labs
Hi Wei Yang
>On Sun, Jun 09, 2019 at 05:10:28PM +0800, ChenGang wrote:
>>Usually the value of min_free_kbytes is a multiple of 4, and in that
>>case the right shift is fine.
>>But if it is not, the right shift discards the low 2 bits,
>But PAGE_SHIFT is not always 12.
You are right, but that is not the key point; it is just an example.
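To make the dependence on the page size concrete: the shift discards the
low (PAGE_SHIFT - 10) bits, so 2 bits with 4 kB pages and 6 bits with
64 kB pages. A small user-space sketch follows; lost_kbytes() is a
hypothetical name used only here, and the page shift is a parameter
instead of the kernel's PAGE_SHIFT.

```c
#include <assert.h>

/* Hypothetical helper: kilobytes dropped by
 * kbytes >> (page_shift - 10), i.e. the low (page_shift - 10) bits. */
static unsigned long lost_kbytes(unsigned long kbytes,
				 unsigned int page_shift)
{
	unsigned long mask;

	assert(page_shift >= 10);	/* assumes a page is at least 1 kB */
	mask = (1UL << (page_shift - 10)) - 1;
	return kbytes & mask;
}
```

For example, a setting of 100 kB loses nothing with a page shift of 12,
but lost_kbytes(100, 16) is 36. In every case the loss is smaller than
one page.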
>>and the kernel reserves less memory than requested.
>>So it is necessary to round min_free_kbytes up to a multiple of 4.
>>For example, if min_free_kbytes is 64, 16 pages should be kept, but
>>if min_free_kbytes is 65 or 66, 17 pages should be kept.
>>
>>Signed-off-by: ChenGang <[email protected]>
>>---
>> mm/page_alloc.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>>diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>index d66bc8a..1baeeba 100644
>>--- a/mm/page_alloc.c
>>+++ b/mm/page_alloc.c
>>@@ -7611,7 +7611,8 @@ static void setup_per_zone_lowmem_reserve(void)
>>
>> static void __setup_per_zone_wmarks(void)
>> {
>>- unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);
>>+ unsigned long pages_min =
>>+ (PAGE_ALIGN(min_free_kbytes * 1024) / 1024) >> (PAGE_SHIFT - 10);
>In my mind, pages_min is an estimated value. Do we need to be so precise?
This is the key point: users can set this value through the
/proc/sys/vm/min_free_kbytes interface, so being a bit more precise is
better.
>> unsigned long lowmem_pages = 0;
>> struct zone *zone;
>> unsigned long flags;
>>--
>>1.8.5.6
>--
>Wei Yang
>Help you, Help me
On Tue 11-06-19 12:16:35, Chengang (L) wrote:
> Hi Michal
>
>
> >On Sun 09-06-19 17:10:28, ChenGang wrote:
> >> Usually the value of min_free_kbytes is a multiple of 4, and in that
> >> case the right shift is fine.
> >> But if it is not, the right shift discards the low 2 bits, and the
> >> kernel reserves less memory than requested.
> >> So it is necessary to round min_free_kbytes up to a multiple of 4.
> >> For example, if min_free_kbytes is 64, 16 pages should be kept, but
> >> if min_free_kbytes is 65 or 66, 17 pages should be kept.
>
> >Could you describe the actual problem? Do we ever generate a
> >min_free_kbytes value that would lead to unexpected reserves, or is this
> >trying to compensate for values configured from userspace? If the
> >latter, why do we care at all?
>
> >Have you seen this to be an actual problem, or is this mostly motivated
> >by reading the code?
>
> I haven't seen an actual problem; the patch is motivated by reading
> the code. Users can configure this value through the
> /proc/sys/vm/min_free_kbytes interface, so I think being a bit more
> precise is better.
The interface is intended for admins, and they had better know what
they are doing, right? Using an ad-hoc value is not a common use case.
That being said, your change makes the code slightly harder to read, and
the benefit is not entirely clear from the changelog (which btw. sounds
like there is a real problem that is not described in user-visible
terms). So if you really believe this change is worth it, then make sure
you justify it by explaining the negative consequences of a dubious
value set by an admin.
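On the readability point, one possible spelling is a round-up division
in kB units. The user-space sketch below compares it with the expression
from the patch. It rests on assumptions: pages_min_patch() and
pages_min_div() are hypothetical names, the page shift is a parameter
instead of PAGE_SHIFT, PAGE_ALIGN is written out by hand, and
DIV_ROUND_UP is redefined locally to match the kernel macro. Nothing in
this thread settles whether such a form would be preferred.

```c
#include <assert.h>

/* Local copy of the kernel's DIV_ROUND_UP definition for user space. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: the expression from the patch, with the page
 * shift as a parameter and PAGE_ALIGN spelled out by hand. */
static unsigned long pages_min_patch(unsigned long kbytes,
				     unsigned int page_shift)
{
	unsigned long page_size, bytes;

	assert(page_shift >= 10);	/* assumes a page is at least 1 kB */
	page_size = 1UL << page_shift;
	bytes = (kbytes * 1024 + page_size - 1) & ~(page_size - 1);
	return (bytes / 1024) >> (page_shift - 10);
}

/* Hypothetical alternative: round up in kB units, without the
 * intermediate multiplication by 1024. */
static unsigned long pages_min_div(unsigned long kbytes,
				   unsigned int page_shift)
{
	assert(page_shift >= 10);
	return DIV_ROUND_UP(kbytes, 1UL << (page_shift - 10));
}
```

Both helpers return 17 for 65 kB with a page shift of 12, and they agree
for every value tried here, so the difference is only in how the
rounding is written.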
> >> Signed-off-by: ChenGang <[email protected]>
> >> ---
> >> mm/page_alloc.c | 3 ++-
> >> 1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index d66bc8a..1baeeba 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -7611,7 +7611,8 @@ static void setup_per_zone_lowmem_reserve(void)
> >>
> >> static void __setup_per_zone_wmarks(void)
> >> {
> >> - unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);
> >> + unsigned long pages_min =
> >> + (PAGE_ALIGN(min_free_kbytes * 1024) / 1024) >> (PAGE_SHIFT - 10);
> >> unsigned long lowmem_pages = 0;
> >> struct zone *zone;
> >> unsigned long flags;
> >> --
> >> 1.8.5.6
> >>
>
> >--
> >Michal Hocko
> >SUSE Labs
--
Michal Hocko
SUSE Labs