2023-01-03 08:58:05

by Michal Hocko

Subject: Re: (2) [PATCH] page_alloc: avoid the negative free for meminfo available

On Tue 03-01-23 17:20:08, 김재원 wrote:
> >On Tue 03-01-23 16:28:07, Jaewon Kim wrote:
> >> The totalreserve_pages could be higher than the free because of
> >> watermark high or watermark boost. Handle this situation and fix it to 0
> >> free size.
> >
> >What is the actual problem you are trying to address by this change?
>
> Hello
>
> As described in the original commit,
> 34e431b0ae39 /proc/meminfo: provide estimated available memory
> mm is trying to provide the available memory to user space.
>
> But if free is negative, the available memory shown to userspace
> would be smaller than the actual available size. Userspace
> may then take unwanted memory-shrinking actions such as process kills.

Do you have any specific example? Have you seen this happening in
practice or is this based on the code inspection?

Also does this patch actually fix anything? Say the system is really
struggling and we are under min watermark. Shouldn't that lead to
Available to be reported as 0 without even looking at other counters?

> I think the logic should account for the positive size only.
>
> BR
>
> >
> >> Signed-off-by: Jaewon Kim <[email protected]>
> >> ---
> >>  mm/page_alloc.c | 2 ++
> >>  1 file changed, 2 insertions(+)
> >>
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index 218b28ee49ed..e510ae83d5f3 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -5948,6 +5948,8 @@ long si_mem_available(void)
> >>           * without causing swapping or OOM.
> >>           */
> >>          available = global_zone_page_state(NR_FREE_PAGES) - totalreserve_pages;
> >> +        if (available < 0)
> >> +                available = 0;
> >>
> >>          /*
> >>           * Not all the page cache can be freed, otherwise the system will
> >> --
> >> 2.17.1
> >
> >--
> >Michal Hocko
> >SUSE Labs

--
Michal Hocko
SUSE Labs


2023-01-03 10:07:18

by Jaewon Kim

Subject: RE: (2) [PATCH] page_alloc: avoid the negative free for meminfo available

>> >On Tue 03-01-23 16:28:07, Jaewon Kim wrote:
>> >> The totalreserve_pages could be higher than the free because of
>> >> watermark high or watermark boost. Handle this situation and fix it to 0
>> >> free size.
>> >
>> >What is the actual problem you are trying to address by this change?
>>
>> Hello
>>
>> As described in the original commit,
>> 34e431b0ae39 /proc/meminfo: provide estimated available memory
>> mm is trying to provide the available memory to user space.
>>
>> But if free is negative, the available memory shown to userspace
>> would be smaller than the actual available size. Userspace
>> may then take unwanted memory-shrinking actions such as process kills.
>
>Do you have any specific example? Have you seen this happening in
>practice or is this based on the code inspection?

I found this on a device using a v5.10-based kernel.
The log below was printed by user space, in its own format, after reading /proc/meminfo.

MemFree 38220 KB
MemAvailable 90008 KB
Active(file) 137116 KB
Inactive(file) 124128 KB
SReclaimable 100960 KB

Here's /proc/zoneinfo for wmark info.

------ ZONEINFO (/proc/zoneinfo) ------
Node 0, zone    DMA32
  pages free     17059
        min      862
        low      9790
        high     18718
        spanned  524288
        present  497920
        managed  413348
Node 0, zone   Normal
  pages free     12795
        min      1044
        low      11855
        high     22666
        spanned  8388608
        present  524288
        managed  500548

The pagecache term at this time seems to be 174,664 KB:
pagecache -= min(pagecache / 2, wmark_low)
The reclaimable part and the actual free memory also need to be added to it to get MemAvailable.

MemAvailable should be at least this 174,664 KB, but it was only 90,008 KB,
apparently because the large wmark high sum of 165,536 KB was subtracted.
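
The figures above can be reproduced from the posted /proc/meminfo and /proc/zoneinfo values. The following is only an illustrative userspace sketch, not kernel code: it assumes 4 KiB pages, and the helper names wmark_sum_kb and pagecache_available_kb are hypothetical.

```c
#include <assert.h>

/* Assumption: 4 KiB pages, so one page is 4 KB. */
#define PAGE_KB 4L

static long min_long(long a, long b)
{
	return a < b ? a : b;
}

/* Sum one watermark (low or high) over all zones and convert pages to KB. */
long wmark_sum_kb(const long *wmark_pages, int nr_zones)
{
	long pages = 0;

	assert(nr_zones > 0);
	for (int i = 0; i < nr_zones; i++)
		pages += wmark_pages[i];
	return pages * PAGE_KB;
}

/*
 * Part of the file LRU counted as available, following the formula
 * quoted in the mail: pagecache -= min(pagecache / 2, wmark_low).
 */
long pagecache_available_kb(long active_file_kb, long inactive_file_kb,
			    long wmark_low_kb)
{
	long pagecache = active_file_kb + inactive_file_kb;

	assert(wmark_low_kb >= 0);
	return pagecache - min_long(pagecache / 2, wmark_low_kb);
}
```

With the two zones above, the low watermarks sum to 86,580 KB and the high watermarks to 165,536 KB; feeding Active(file) 137116 KB and Inactive(file) 124128 KB into pagecache_available_kb with that low sum gives 174,664 KB.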

>
>Also does this patch actually fix anything? Say the system is really
>struggling and we are under min watermark. Shouldn't that lead to
>Available to be reported as 0 without even looking at other counters?
>

Sorry, but I did not understand. This miscalculation can happen
above the min watermark. Do you think wmark high should be subtracted
all the time, even if the resulting free value is negative?


>> I think the logic should account for the positive size only.
>>
>> BR
>>
>> >
>> >> Signed-off-by: Jaewon Kim <[email protected]>
>> >> ---
>> >> mm/page_alloc.c | 2 ++
>> >> 1 file changed, 2 insertions(+)
>> >>
>> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> >> index 218b28ee49ed..e510ae83d5f3 100644
>> >> --- a/mm/page_alloc.c
>> >> +++ b/mm/page_alloc.c
>> >> @@ -5948,6 +5948,8 @@ long si_mem_available(void)
>> >> * without causing swapping or OOM.
>> >> */
>> >> available = global_zone_page_state(NR_FREE_PAGES) - totalreserve_pages;
>> >> + if (available < 0)
>> >> + available = 0;
>> >>
>> >> /*
>> >> * Not all the page cache can be freed, otherwise the system will
>> >> --
>> >> 2.17.1
>> >
>> >--
>> >Michal Hocko
>> >SUSE Labs

2023-01-03 10:48:17

by Michal Hocko

Subject: Re: (2) [PATCH] page_alloc: avoid the negative free for meminfo available

On Tue 03-01-23 18:22:32, 김재원 wrote:
> >> >On Tue 03-01-23 16:28:07, Jaewon Kim wrote:
> >> >> The totalreserve_pages could be higher than the free because of
> >> >> watermark high or watermark boost. Handle this situation and fix it to 0
> >> >> free size.
> >> >
> >> >What is the actual problem you are trying to address by this change?
> >>
> >> Hello
> >>
> >> As described in the original commit,
> >> 34e431b0ae39 /proc/meminfo: provide estimated available memory
> >> mm is trying to provide the available memory to user space.
> >>
> >> But if free is negative, the available memory shown to userspace
> >> would be smaller than the actual available size. Userspace
> >> may then take unwanted memory-shrinking actions such as process kills.
> >
> >Do you have any specific example? Have you seen this happening in
> >practice or is this based on the code inspection?
>
> I found this on a device using a v5.10-based kernel.
> The log below was printed by user space, in its own format, after reading /proc/meminfo.
>
> MemFree 38220 KB
> MemAvailable 90008 KB
> Active(file) 137116 KB
> Inactive(file) 124128 KB
> SReclaimable 100960 KB
>
> Here's /proc/zoneinfo for wmark info.
>
> ------ ZONEINFO (/proc/zoneinfo) ------
> Node 0, zone DMA32
> pages free 17059
> min 862
> low 9790
> high 18718
> spanned 524288
> present 497920
> managed 413348
> Node 0, zone Normal
> pages free 12795
> min 1044
> low 11855
> high 22666
> spanned 8388608
> present 524288
> managed 500548
>
> The pagecache term at this time seems to be 174,664 KB:
> pagecache -= min(pagecache / 2, wmark_low)
> The reclaimable part and the actual free memory also need to be added to it to get MemAvailable.
>
> MemAvailable should be at least this 174,664 KB, but it was only 90,008 KB,
> apparently because the large wmark high sum of 165,536 KB was subtracted.

How have you concluded that? Are you saying that a userspace would be
behaving more sanely when considering more memory to be available?
Please see more on the semantics below.

> >Also does this patch actually fix anything? Say the system is really
> >struggling and we are under min watermark. Shouldn't that lead to
> >Available to be reported as 0 without even looking at other counters?
> >
>
> Sorry but I did not understand,

What I meant here is that the core of the high level definition says:
"An estimate of how much memory is available for starting new
applications, without swapping." If the system is close enough to watermarks
that NR_FREE_PAGES < reserves then it is likely that further memory
allocations will not do without reclaim and potentially swapout.

So the question really is whether just clamping the value to 0 is
actually making MemAvailable more "correct"? See my point?

The actual value is never going to be laser-cut precise. Close to
watermark behavior will vary wildly depending on the memory
reclaimability. Kswapd might easily keep up with memory demand but it
also could get stuck. MemAvailable should be considered a hint rather
than an exact value IMHO.
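
To make the question concrete, here is a minimal userspace sketch of the estimate with and without the proposed clamp. It is not the kernel's si_mem_available(). All names are hypothetical, values are in KB, the reserve is approximated by the 165,536 KB high-watermark sum from the report (the real totalreserve_pages may also include lowmem reserves), and SReclaimable stands in for all reclaimable kernel memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the counters involved, all in KB. */
struct meminfo_kb {
	long free;        /* MemFree */
	long reserve;     /* stand-in for totalreserve_pages */
	long file_lru;    /* Active(file) + Inactive(file) */
	long reclaimable; /* SReclaimable */
	long wmark_low;   /* sum of the zone low watermarks */
};

static long min_l(long a, long b)
{
	return a < b ? a : b;
}

/* Estimate MemAvailable; clamp_free applies the proposed two-line change. */
long mem_available_estimate_kb(const struct meminfo_kb *m, bool clamp_free)
{
	long available;

	assert(m);
	available = m->free - m->reserve;
	if (clamp_free && available < 0)
		available = 0;

	/* Only part of the page cache and reclaimable memory is counted. */
	available += m->file_lru - min_l(m->file_lru / 2, m->wmark_low);
	available += m->reclaimable - min_l(m->reclaimable / 2, m->wmark_low);

	return available < 0 ? 0 : available;
}
```

Under these assumptions the reported numbers give 97,828 KB without the clamp, in the same range as the 90,008 KB seen on the device, and 225,144 KB with it. The clamp therefore more than doubles the estimate on a system whose free memory is already below its reserves, which is exactly the case where further allocations are likely to need reclaim.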
--
Michal Hocko
SUSE Labs

2023-01-03 10:55:51

by Jaewon Kim

Subject: RE: (2) [PATCH] page_alloc: avoid the negative free for meminfo available

>> >> >On Tue 03-01-23 16:28:07, Jaewon Kim wrote:
>> >> >> The totalreserve_pages could be higher than the free because of
>> >> >> watermark high or watermark boost. Handle this situation and fix it to 0
>> >> >> free size.
>> >> >
>> >> >What is the actual problem you are trying to address by this change?
>> >>
>> >> Hello
>> >>
>> >> As described in the original commit,
>> >> 34e431b0ae39 /proc/meminfo: provide estimated available memory
>> >> mm is trying to provide the available memory to user space.
>> >>
>> >> But if free is negative, the available memory shown to userspace
>> >> would be smaller than the actual available size. Userspace
>> >> may then take unwanted memory-shrinking actions such as process kills.
>> >
>> >Do you have any specific example? Have you seen this happening in
>> >practice or is this based on the code inspection?
>>
>> I found this on a device using a v5.10-based kernel.
>> The log below was printed by user space, in its own format, after reading /proc/meminfo.
>>
>> MemFree 38220 KB
>> MemAvailable 90008 KB
>> Active(file) 137116 KB
>> Inactive(file) 124128 KB
>> SReclaimable 100960 KB
>>
>> Here's /proc/zoneinfo for wmark info.
>>
>> ------ ZONEINFO (/proc/zoneinfo) ------
>> Node 0, zone DMA32
>> pages free 17059
>> min 862
>> low 9790
>> high 18718
>> spanned 524288
>> present 497920
>> managed 413348
>> Node 0, zone Normal
>> pages free 12795
>> min 1044
>> low 11855
>> high 22666
>> spanned 8388608
>> present 524288
>> managed 500548
>>
>> The pagecache term at this time seems to be 174,664 KB:
>> pagecache -= min(pagecache / 2, wmark_low)
>> The reclaimable part and the actual free memory also need to be added to it to get MemAvailable.
>>
>> MemAvailable should be at least this 174,664 KB, but it was only 90,008 KB,
>> apparently because the large wmark high sum of 165,536 KB was subtracted.
>
>How have you concluded that? Are you saying that a userspace would be
>behaving more sanely when considering more memory to be available?
>Please see more on the semantics below.
>
>> >Also does this patch actually fix anything? Say the system is really
>> >struggling and we are under min watermark. Shouldn't that lead to
>> >Available to be reported as 0 without even looking at other counters?
>> >
>>
>> Sorry but I did not understand,
>
>What I meant here is that the core of the high level definition says:
>"An estimate of how much memory is available for starting new
>applications, without swapping." If the system is close enough to watermarks
>that NR_FREE_PAGES < reserves then it is likely that further memory
>allocations will not do without reclaim and potentially swapout.

Yes, reclaim would be needed in that case.

I think it is just a matter of perspective.
If I follow you, totalreserve_pages should be considered
a must-have free size.

>
>So the question really is whether just clamping the value to 0 is
>actually making MemAvailable more "correct"? See my point?
>
>The actual value is never going to be laser-cut precise. Close to
>watermark behavior will vary wildly depending on the memory
>reclaimability. Kswapd might easily keep up with memory demand but it
>also could get stuck. MemAvailable should be considered a hint rather
>than an exact value IMHO.

Yes, correct, it is not perfect.
I will drop my patch.
It was a nice discussion.
Thank you.