2014-06-16 09:26:41

by Xishi Qiu

Subject: [PATCH 0/8] mm: add page cache limit and reclaim feature

When a system (e.g. a smart phone) has been running for a long time, the page
cache often takes up a large amount of memory and free memory may drop below
50 MB. If an app then suddenly allocates high-order pages and memory reclaim
is too slow, OOM will happen.

Using "echo 3 > /proc/sys/vm/drop_caches" drops the whole cache, which hurts
performance, so it is only suitable for debugging.

SUSE has this feature; I tested it before, but it cannot actually limit the
page cache. So I rewrote the feature and added some parameters.

Christoph Lameter has written a patch, "Limit the size of the pagecache":
http://marc.info/?l=linux-mm&m=116959990228182&w=2
It changes the zone fallback, which is not a good approach.

The patchset is based on v3.15 and introduces two features: a page cache
limit and periodic page cache reclaim.

It adds four parameters in /proc/sys/vm:

1) cache_limit_mbytes
Limits the amount of page cache.
The unit is MB; the valid range is 0 to totalram_pages.
0 means the page cache is not limited.
Writing to this file also updates cache_limit_ratio.
The default value is 0.

2) cache_limit_ratio
Limits the amount of page cache.
The unit is percent; the valid range is 0 to 100.
0 means the page cache is not limited.
Writing to this file also updates cache_limit_mbytes.
The default value is 0.

3) cache_reclaim_s
Reclaims page cache periodically.
The unit is seconds; the minimum value is 0.
0 disables the feature.
The default value is 0.

4) cache_reclaim_weight
Speeds up page cache reclaim.
It only takes effect when cache_limit_mbytes/cache_limit_ratio or
cache_reclaim_s is enabled.
The valid range is 1 (slow) to 100 (fast).
The default value is 1.
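The cover letter does not show how cache_limit_mbytes and cache_limit_ratio are kept in sync. A minimal sketch, assuming 4 KiB pages and rounding down; the helper names mbytes_to_ratio(), ratio_to_mbytes() and page_cache_over_limit_sketch() are hypothetical and not taken from the patches:

```c
/* Hypothetical sketch, not code from the patchset.
 * Assumes 4 KiB pages, so one MB is 256 pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGES_PER_MB (1UL << (20 - SKETCH_PAGE_SHIFT))

/* cache_limit_mbytes -> cache_limit_ratio (percent of RAM, rounded down). */
static unsigned long mbytes_to_ratio(unsigned long mbytes,
                                     unsigned long totalram_pages)
{
    if (totalram_pages == 0)
        return 0;
    return mbytes * SKETCH_PAGES_PER_MB * 100 / totalram_pages;
}

/* cache_limit_ratio -> cache_limit_mbytes (rounded down). */
static unsigned long ratio_to_mbytes(unsigned long ratio,
                                     unsigned long totalram_pages)
{
    return totalram_pages * ratio / 100 / SKETCH_PAGES_PER_MB;
}

/* A limit of 0 means "do not limit the page cache". */
static int page_cache_over_limit_sketch(unsigned long cache_pages,
                                        unsigned long limit_mbytes)
{
    return limit_mbytes != 0 &&
           cache_pages > limit_mbytes * SKETCH_PAGES_PER_MB;
}
```

For example, on a machine with 262144 pages (1 GiB), writing 512 to cache_limit_mbytes would correspond to a ratio of 50 under these assumptions.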

I tested both features on my system (x86_64) and they seem to work correctly.
However, since the patches change the hot path add_to_page_cache_lru(), I don't
know how much they will affect performance, and there may be errors in the
patches too, hence RFC.


Xishi Qiu (8):
mm: introduce cache_limit_ratio and cache_limit_mbytes
mm: add shrink page cache core
mm: implement page cache limit feature
mm: introduce cache_reclaim_s
mm: implement page cache reclaim in circles
mm: introduce cache_reclaim_weight
mm: implement page cache reclaim speed
doc: update Documentation/sysctl/vm.txt

Documentation/sysctl/vm.txt | 43 +++++++++++++++++++
include/linux/swap.h | 17 ++++++++
kernel/sysctl.c | 35 +++++++++++++++
mm/filemap.c | 3 +
mm/hugetlb.c | 3 +
mm/page_alloc.c | 51 ++++++++++++++++++++++
mm/vmscan.c | 97 ++++++++++++++++++++++++++++++++++++++++++-
7 files changed, 248 insertions(+), 1 deletions(-)


2014-06-16 10:04:09

by Zhang Yanfei

Subject: Re: [PATCH 0/8] mm: add page cache limit and reclaim feature

Hi,

On 06/16/2014 05:24 PM, Xishi Qiu wrote:
> When system(e.g. smart phone) running for a long time, the cache often takes
> a large memory, maybe the free memory is less than 50M, then OOM will happen
> if APP allocate a large order pages suddenly and memory reclaim too slowly.

If there is really too much page cache and free memory is low, I think the
page allocator will enter the slowpath to free more memory for the allocation.
And in the slowpath there is indeed the direct reclaim operation, so is that
really not enough to reclaim page cache?

> I tested the two features on my system(x86_64), it seems to work right.
> However, as it changes the hot path "add_to_page_cache_lru()", I don't know
> how much it will the affect the performance,

Yeah, at a quick glance, every invocation of add_to_page_cache_lru() now
performs the newly added test:

if (vm_cache_limit_mbytes && page_cache_over_limit())

and if the test passes, shrink_page_cache()->do_try_to_free_pages() is called.
This is a synchronous operation. IMO, it would be better to make it asynchronous.
(You've implemented an asynchronous operation as well, but I doubt it is
suitable to put the synchronous one here.)
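The sync/async distinction can be illustrated with a small userspace model. It is not kernel code: struct cache_model, add_page_sync(), add_page_async() and background_worker() are hypothetical stand-ins, with background_worker() playing the role that kswapd or a workqueue might play in a real design. The point is who pays the reclaim cost: in the synchronous variant the task adding the page does; in the asynchronous one it only sets a flag.

```c
#include <stdbool.h>

/* Hypothetical model state; not kernel code. */
struct cache_model {
    unsigned long cache_pages;     /* current page cache size */
    unsigned long limit_pages;     /* 0 = unlimited */
    bool reclaim_pending;          /* set instead of reclaiming inline */
    unsigned long inline_reclaims; /* times the caller itself reclaimed */
};

static bool over_limit(const struct cache_model *m)
{
    return m->limit_pages && m->cache_pages > m->limit_pages;
}

/* Stand-in for shrink_page_cache(): drop pages back down to the limit. */
static void reclaim_to_limit(struct cache_model *m)
{
    if (over_limit(m))
        m->cache_pages = m->limit_pages;
}

/* Synchronous variant: the caller adding the page pays for reclaim. */
static void add_page_sync(struct cache_model *m)
{
    m->cache_pages++;
    if (over_limit(m)) {
        m->inline_reclaims++;
        reclaim_to_limit(m);
    }
}

/* Asynchronous variant: the caller only flags the need for reclaim. */
static void add_page_async(struct cache_model *m)
{
    m->cache_pages++;
    if (over_limit(m))
        m->reclaim_pending = true;
}

/* Background context that performs the deferred reclaim later. */
static void background_worker(struct cache_model *m)
{
    if (m->reclaim_pending) {
        reclaim_to_limit(m);
        m->reclaim_pending = false;
    }
}
```

In the asynchronous variant the cache can briefly overshoot the limit until the worker runs; that is the trade-off for keeping reclaim off the add_to_page_cache_lru() hot path.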

Thanks.



--
Thanks.
Zhang Yanfei

2014-06-16 10:45:12

by Xishi Qiu

Subject: Re: [PATCH 0/8] mm: add page cache limit and reclaim feature

On 2014/6/16 18:04, Zhang Yanfei wrote:

> Hi,
>
> On 06/16/2014 05:24 PM, Xishi Qiu wrote:
>> When system(e.g. smart phone) running for a long time, the cache often takes
>> a large memory, maybe the free memory is less than 50M, then OOM will happen
>> if APP allocate a large order pages suddenly and memory reclaim too slowly.
>
> If there is really too many page caches, and the free memory is low. I think
> the page allocator will enter the slowpath to free more memory for allocation.
> And it the slowpath, there is indeed the direct reclaim operation, so is that
> really not enough to reclaim pagecaches?
>

Hi Yanfei,

Do you mean this path?
__alloc_pages_slowpath()
  __alloc_pages_direct_reclaim()
    __perform_reclaim()
      try_to_free_pages()
There, nr_to_reclaim = SWAP_CLUSTER_MAX, which is only 32 pages.
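To put the 32-page figure in perspective, a small worked calculation, assuming 4 KiB pages: one direct reclaim pass targeting SWAP_CLUSTER_MAX pages aims at 128 KiB, and freeing pages does not by itself produce a contiguous high-order block. The helper names are hypothetical, and the model ignores that the allocator retries and that reclaim may free more or fewer pages than requested.

```c
/* Worked arithmetic, assuming 4 KiB pages. SWAP_CLUSTER_MAX is 32 in the
 * kernel; the SKETCH_ names below are local copies for this example. */
#define SKETCH_PAGE_SIZE        4096UL
#define SKETCH_SWAP_CLUSTER_MAX 32UL

/* Bytes targeted by one direct reclaim pass: 32 * 4096 = 131072 (128 KiB). */
static unsigned long bytes_per_pass(void)
{
    return SKETCH_SWAP_CLUSTER_MAX * SKETCH_PAGE_SIZE;
}

/* An order-n allocation needs 2^n physically contiguous pages. */
static unsigned long order_to_pages(unsigned int order)
{
    return 1UL << order;
}

/* Passes needed to free target_pages if every pass frees exactly
 * SWAP_CLUSTER_MAX pages (rounded up). */
static unsigned long passes_needed(unsigned long target_pages)
{
    return (target_pages + SKETCH_SWAP_CLUSTER_MAX - 1) /
           SKETCH_SWAP_CLUSTER_MAX;
}
```

Under these assumptions, freeing 50 MB (12800 pages) would take passes_needed(12800) = 400 passes, while a single order-4 request needs only 16 contiguous pages, so the issue is as much contiguity as volume.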

>> I tested the two features on my system(x86_64), it seems to work right.
>> However, as it changes the hot path "add_to_page_cache_lru()", I don't know
>> how much it will the affect the performance,
>
> Yeah, at a quick glance, for every invoke of add_to_page_cache_lru(), there is the
> newly added test:
>
> if (vm_cache_limit_mbytes && page_cache_over_limit())
>
> and if the test is passed, shrink_page_cache()->do_try_to_free_pages() is called.
> And this is a sync operation. IMO, it is better to make such an operation async.
> (You've implemented async operation but I doubt if it is suitable to put the sync operation
> here.)
>
> Thanks.
>

Sounds like a good idea. How about waking up kswapd instead?

Thanks,
Xishi Qiu



2014-06-16 11:14:26

by Michal Hocko

Subject: Re: [PATCH 0/8] mm: add page cache limit and reclaim feature

On Mon 16-06-14 17:24:38, Xishi Qiu wrote:
> When system(e.g. smart phone) running for a long time, the cache often takes
> a large memory, maybe the free memory is less than 50M, then OOM will happen
> if APP allocate a large order pages suddenly and memory reclaim too slowly.

Have you ever seen this happen? Page cache should be easy to reclaim, and
if there is too much dirty memory then you should be able to tune the
amount with the dirty_bytes/dirty_ratio knobs. If the page allocator falls
back to OOM while there is a lot of page cache, I would call that a bug. I
do not think that limiting the amount of page cache globally makes sense.
There are Unix systems which offer this feature, but I think it is a bad
interface which only papers over reclaim inefficiency or a lack of other
isolation between loads.
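For reference, the dirty_bytes/dirty_ratio knobs set the global dirty threshold roughly as follows in kernels of this era: a non-zero vm.dirty_bytes takes precedence, otherwise vm.dirty_ratio is applied as a percentage of dirtyable memory. A minimal sketch, assuming 4 KiB pages and ignoring the background threshold and per-task adjustments; dirty_thresh_pages() is a hypothetical name:

```c
/* Simplified sketch of the global dirty threshold, assuming 4 KiB pages.
 * Ignores dirty_background_*, real-time task bonuses and per-BDI limits. */
#define SKETCH_DIRTY_PAGE_SIZE 4096UL

static unsigned long dirty_thresh_pages(unsigned long dirty_bytes,
                                        unsigned int dirty_ratio,
                                        unsigned long dirtyable_pages)
{
    /* vm.dirty_bytes wins when non-zero; round up to whole pages. */
    if (dirty_bytes)
        return (dirty_bytes + SKETCH_DIRTY_PAGE_SIZE - 1) /
               SKETCH_DIRTY_PAGE_SIZE;
    /* Otherwise vm.dirty_ratio is a percentage of dirtyable memory. */
    return dirty_ratio * dirtyable_pages / 100;
}
```

With 100000 dirtyable pages and dirty_ratio = 20, the threshold is 20000 pages; setting dirty_bytes to 64 MiB instead gives 16384 pages regardless of the ratio.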

> Use "echo 3 > /proc/sys/vm/drop_caches" will drop the whole cache, this will
> affect the performance, so it is used for debugging only.
>
> suse has this feature, I tested it before, but it can not limit the page cache
> actually. So I rewrite the feature and add some parameters.

The feature is there for historical reasons and I _really_ think the
interface is not appropriate. If heavy page cache usage affects other
loads, then the memory cgroup controller can be used to isolate them
from the interference.

> I tested the two features on my system(x86_64), it seems to work right.
> However, as it changes the hot path "add_to_page_cache_lru()", I don't know
> how much it will the affect the performance, maybe there are some errors
> in the patches too, RFC.

I haven't looked at the patches yet, but you would need to explain much
better why the feature is needed and why the existing features are not
sufficient.
--
Michal Hocko
SUSE Labs

2014-06-16 12:51:25

by Rafael Aquini

Subject: Re: [PATCH 0/8] mm: add page cache limit and reclaim feature

On Mon, Jun 16, 2014 at 01:14:22PM +0200, Michal Hocko wrote:
> On Mon 16-06-14 17:24:38, Xishi Qiu wrote:
> > When system(e.g. smart phone) running for a long time, the cache often takes
> > a large memory, maybe the free memory is less than 50M, then OOM will happen
> > if APP allocate a large order pages suddenly and memory reclaim too slowly.
>
> Have you ever seen this to happen? Page cache should be easy to reclaim and
> if there is too mach dirty memory then you should be able to tune the
> amount by dirty_bytes/ratio knob. If the page allocator falls back to
> OOM and there is a lot of page cache then I would call it a bug. I do
> not think that limiting the amount of the page cache globally makes
> sense. There are Unix systems which offer this feature but I think it is
> a bad interface which only papers over the reclaim inefficiency or lack
> of other isolations between loads.
>
+1

It would be good if you could show some numbers as evidence for your
theory that "excessive" page cache triggers your observed OOMs. I'm
assuming from your 'e.g.' that you're running a swapless system, so I
would think your OOMs are due to the inability to reclaim anon memory
rather than page cache.


> > Use "echo 3 > /proc/sys/vm/drop_caches" will drop the whole cache, this will
> > affect the performance, so it is used for debugging only.
> >

If you are able to drop the whole page cache by issuing the command
above, then most of it is just unmapped cache pages, and those would
normally be reclaimed on demand by the PFRA. That is one more thing that
makes me wonder whether you're just seeing the effect of a leaky app
making the system unable to swap out anon pages.



2014-06-17 01:37:24

by Xishi Qiu

Subject: Re: [PATCH 0/8] mm: add page cache limit and reclaim feature

On 2014/6/16 20:50, Rafael Aquini wrote:

> It would be good if you could show some numbers that serve as evidence
> of your theory on "excessive" pagecache acting as a trigger to your
> observed OOMs. I'm assuming, by your 'e.g', you're running a swapless
> system, so I would think your system OOMs are due to inability to
> reclaim anon memory, instead of pagecache.
>

Thank you for your reply.
I'll try to find some examples at my company.

>
>>> Use "echo 3 > /proc/sys/vm/drop_caches" will drop the whole cache, this will
>>> affect the performance, so it is used for debugging only.
>>>
>
> If you are able to drop the whole pagecache by issuing the command
> above, than it means the majority of it is just unmapped cache pages,
> and those would be normally reclaimed upon demand by the PFRA. One more
> thing that makes me wonder you're just seeing the effect of a leaky app
> making the system unable to swap out anon pages.
>

I find that the page cache is only reclaimed when there is not enough
memory, and some smart phones have no swap device. So I added a parameter
to reclaim page cache periodically.
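The cover letter does not say how cache_reclaim_weight maps to a reclaim amount. The sketch below assumes, purely for illustration, that each cache_reclaim_s period reclaims SWAP_CLUSTER_MAX * weight pages. It shows that periodic reclaim only shrinks the cache when the per-period target exceeds the page cache growth between periods. pages_per_period() and simulate_periodic_reclaim() are hypothetical names.

```c
/* Pages reclaimed per period for a given cache_reclaim_weight (1..100).
 * The linear 32 * weight scaling is an assumption, not from the patches. */
static unsigned long pages_per_period(unsigned int weight)
{
    if (weight < 1)
        weight = 1;
    if (weight > 100)
        weight = 100;
    return 32UL * weight;
}

/* Simulate `periods` ticks of the periodic reclaim. Between ticks the
 * workload adds `growth` pages of cache. Time itself is not modelled:
 * reclaim_s is only checked for 0, which disables the feature. */
static unsigned long simulate_periodic_reclaim(unsigned long cache_pages,
                                               unsigned long growth,
                                               unsigned int reclaim_s,
                                               unsigned int weight,
                                               unsigned int periods)
{
    for (unsigned int i = 0; i < periods; i++) {
        cache_pages += growth;
        if (reclaim_s == 0)
            continue;
        unsigned long nr = pages_per_period(weight);
        cache_pages = cache_pages > nr ? cache_pages - nr : 0;
    }
    return cache_pages;
}
```

With weight 1 (32 pages per period) and 10 pages of growth per period, the cache shrinks by 22 pages each period; with 50 pages of growth it still grows by 18 pages each period.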

Thanks,
Xishi Qiu



2014-06-20 07:59:03

by Xishi Qiu

Subject: Re: [PATCH 0/8] mm: add page cache limit and reclaim feature

On 2014/6/17 9:35, Xishi Qiu wrote:

> On 2014/6/16 20:50, Rafael Aquini wrote:
>
>> It would be good if you could show some numbers that serve as evidence
>> of your theory on "excessive" pagecache acting as a trigger to your
>> observed OOMs. I'm assuming, by your 'e.g', you're running a swapless
>> system, so I would think your system OOMs are due to inability to
>> reclaim anon memory, instead of pagecache.
>>

I asked some colleagues: when the cache takes up a large amount of memory,
it does not trigger OOM, but it does cause a performance regression.

This is because the business processes do I/O at high frequency, which keeps
growing the page cache. When there is not enough memory, page cache must be
reclaimed first, then a new page is allocated and added to the page cache.
This often takes too much time and causes the performance regression.

In view of this, periodically reclaiming page cache may fix the problem.
What do you think?

Thanks,
Xishi Qiu

2014-06-20 15:32:16

by Michal Hocko

Subject: Re: [PATCH 0/8] mm: add page cache limit and reclaim feature

On Fri 20-06-14 15:56:56, Xishi Qiu wrote:
> I asked some colleagues, when the cache takes a large memory, it will not
> trigger OOM, but performance regression.
>
> It is because that business process do IO high frequency, and this will
> increase page cache. When there is not enough memory, page cache will
> be reclaimed first, then alloc a new page, and add it to page cache. This
> often takes too much time, and causes performance regression.

I cannot say I understand the problem you are describing. So the page
cache eats most of the memory, and that increases the allocation latency
for new page cache? Is it because of direct reclaim? Why doesn't kswapd
reclaim the clean page cache? Or is the memory dirty?

> In view of this situation, if we reclaim page cache in circles may be
> fix this problem. What do you think?

No, this looks more like either a system misconfiguration or a reclaim bug.
--
Michal Hocko
SUSE Labs

2014-06-23 02:06:49

by Xishi Qiu

Subject: Re: [PATCH 0/8] mm: add page cache limit and reclaim feature

On 2014/6/20 23:32, Michal Hocko wrote:

> On Fri 20-06-14 15:56:56, Xishi Qiu wrote:
>> I asked some colleagues, when the cache takes a large memory, it will not
>> trigger OOM, but performance regression.
>>
>> It is because that business process do IO high frequency, and this will
>> increase page cache. When there is not enough memory, page cache will
>> be reclaimed first, then alloc a new page, and add it to page cache. This
>> often takes too much time, and causes performance regression.
>
> I cannot say I would understand the problem you are describing. So the
> page cache eats the most of the memory and that increases allocation
> latency for new page cache? Is it because of the direct reclaim?

Yes, the allocation latency causes the performance regression.

A user process produces page cache continuously, so free memory runs short
after running for a long time. The slow path then takes much more time
because of direct reclaim. kswapd reclaims memory too, but not much, so
allocations keep hitting the slow path, which causes the performance
regression.
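The description above follows the general shape of the v3.15 watermark logic: the fast path needs free pages above the low watermark, falling below it sends the allocation to the slow path, which wakes kswapd, and only at or below the min watermark does the allocating task enter direct reclaim. The model below is a heavily simplified sketch of that logic (it ignores allocation order, lowmem_reserve, ALLOC_HARDER/ALLOC_HIGH and compaction); classify_alloc() and kswapd_balanced() are hypothetical names, not kernel functions.

```c
#include <stdbool.h>

/* Simplified outcomes of a single-page allocation attempt. */
enum alloc_outcome {
    OUTCOME_FAST_PATH,      /* free pages above the low watermark */
    OUTCOME_WAKE_KSWAPD,    /* below low: slow path wakes kswapd, min check passes */
    OUTCOME_DIRECT_RECLAIM  /* at or below min: the allocating task reclaims */
};

static enum alloc_outcome classify_alloc(unsigned long free_pages,
                                         unsigned long wmark_min,
                                         unsigned long wmark_low)
{
    if (free_pages > wmark_low)
        return OUTCOME_FAST_PATH;
    if (free_pages > wmark_min)
        return OUTCOME_WAKE_KSWAPD;
    return OUTCOME_DIRECT_RECLAIM;
}

/* kswapd keeps reclaiming a zone until free pages exceed the high
 * watermark. */
static bool kswapd_balanced(unsigned long free_pages, unsigned long wmark_high)
{
    return free_pages > wmark_high;
}
```

In this model, if page cache is produced faster than kswapd frees pages between wakeups, free pages keep reaching the min watermark and allocations land in the direct reclaim case, which is the latency being described.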

Thanks,
Xishi Qiu

> Why kswapd doesn't reclaim the clean pagecache? Or is the memory dirty?
>

>> In view of this situation, if we reclaim page cache in circles may be
>> fix this problem. What do you think?
>
> No, it seems more like either system misconfiguration or a reclaim bug.


2014-06-23 11:30:00

by Michal Hocko

Subject: Re: [PATCH 0/8] mm: add page cache limit and reclaim feature

On Mon 23-06-14 10:05:48, Xishi Qiu wrote:
> On 2014/6/20 23:32, Michal Hocko wrote:
>
> > On Fri 20-06-14 15:56:56, Xishi Qiu wrote:
> >> I asked some colleagues: when the cache takes a large amount of memory, it
> >> does not trigger OOM, but it does cause a performance regression.
> >>
> >> This is because the business process does I/O at a high frequency, which
> >> grows the page cache. When there is not enough memory, page cache is
> >> reclaimed first, then a new page is allocated and added to the page
> >> cache. This often takes too much time and causes the performance
> >> regression.
> >
> > I cannot say I understand the problem you are describing. So the page
> > cache eats most of the memory and that increases allocation latency for
> > new page cache? Is it because of direct reclaim?
>
> Yes, the allocation latency causes the performance regression.

This doesn't make much sense to me. So you have a problem with latency
caused by direct reclaim, and so you add a new way of direct page cache
reclaim.

> A user process produces page cache frequently, so free memory runs low
> after running for a long time. The slow path takes much more time because
> of direct reclaim. kswapd reclaims memory too, but not much, so the
> allocator always falls into the slow path. This causes the performance
> regression.

If I were you I would focus on why the reclaim doesn't catch up with the
page cache users. The mechanism you are proposing is unacceptable.
--
Michal Hocko
SUSE Labs

2014-06-24 02:26:28

by Xishi Qiu

Subject: Re: [PATCH 0/8] mm: add page cache limit and reclaim feature

On 2014/6/23 19:29, Michal Hocko wrote:

> On Mon 23-06-14 10:05:48, Xishi Qiu wrote:
>> On 2014/6/20 23:32, Michal Hocko wrote:
>>
>>> On Fri 20-06-14 15:56:56, Xishi Qiu wrote:
>>>> On 2014/6/17 9:35, Xishi Qiu wrote:
>>>>
>>>>> On 2014/6/16 20:50, Rafael Aquini wrote:
>>>>>
>>>>>> On Mon, Jun 16, 2014 at 01:14:22PM +0200, Michal Hocko wrote:
>>>>>>> On Mon 16-06-14 17:24:38, Xishi Qiu wrote:
>>>>>>>> When a system (e.g. a smartphone) runs for a long time, the cache often
>>>>>>>> takes a large amount of memory and free memory may drop below 50M. Then OOM
>>>>>>>> will happen if an app suddenly allocates large-order pages and memory
>>>>>>>> reclaim is too slow.
>>>>>>>
>>>>>>> Have you ever seen this happen? Page cache should be easy to reclaim, and
>>>>>>> if there is too much dirty memory then you should be able to tune the
>>>>>>> amount with the dirty_bytes/dirty_ratio knobs. If the page allocator falls
>>>>>>> back to OOM while there is a lot of page cache then I would call it a bug.
>>>>>>> I do not think that limiting the amount of page cache globally makes
>>>>>>> sense. There are Unix systems which offer this feature, but I think it is
>>>>>>> a bad interface which only papers over reclaim inefficiency or a lack of
>>>>>>> other isolation between workloads.
>>>>>>>
>>>>>> +1
>>>>>>
>>>>>> It would be good if you could show some numbers that serve as evidence
>>>>>> for your theory that "excessive" pagecache triggers your observed OOMs.
>>>>>> I'm assuming, from your 'e.g.', that you're running a swapless system, so
>>>>>> I would think your system OOMs are due to an inability to reclaim anon
>>>>>> memory, instead of pagecache.
>>>>>>
>>>>
>>>> I asked some colleagues: when the cache takes a large amount of memory, it
>>>> does not trigger OOM, but it does cause a performance regression.
>>>>
>>>> This is because the business process does I/O at a high frequency, which
>>>> grows the page cache. When there is not enough memory, page cache is
>>>> reclaimed first, then a new page is allocated and added to the page
>>>> cache. This often takes too much time and causes the performance
>>>> regression.
>>>
>>> I cannot say I understand the problem you are describing. So the page
>>> cache eats most of the memory and that increases allocation latency for
>>> new page cache? Is it because of direct reclaim?
>>
>> Yes, the allocation latency causes the performance regression.
>
> This doesn't make much sense to me. So you have a problem with latency
> caused by direct reclaim, and so you add a new way of direct page cache
> reclaim.
>
>> A user process produces page cache frequently, so free memory runs low
>> after running for a long time. The slow path takes much more time because
>> of direct reclaim. kswapd reclaims memory too, but not much, so the
>> allocator always falls into the slow path. This causes the performance
>> regression.
>
> If I were you I would focus on why the reclaim doesn't catch up with the
> page cache users. The mechanism you are proposing is unacceptable.

Hi Michal,

Do you mean: why is reclaim slower than page cache growth?

I think there are two reasons:
1. kswapd and direct reclaim are triggered only when there is not enough
   memory (e.g. in __alloc_pages_slowpath()). That means no reclaim happens
   while memory is sufficient (e.g. in get_page_from_freelist()).
2. __alloc_pages_direct_reclaim()
     try_to_free_pages()
       nr_to_reclaim = SWAP_CLUSTER_MAX
   Since "#define SWAP_CLUSTER_MAX 32UL", direct reclaim only expects to
   reclaim 32 pages. That is too few if we allocate 2^10 pages at once.

Thanks,
Xishi Qiu

2014-06-24 07:37:05

by Michal Hocko

Subject: Re: [PATCH 0/8] mm: add page cache limit and reclaim feature

On Tue 24-06-14 10:25:32, Xishi Qiu wrote:
> On 2014/6/23 19:29, Michal Hocko wrote:
[...]
> > This doesn't make much sense to me. So you have a problem with latency
> > caused by direct reclaim, and so you add a new way of direct page cache
> > reclaim.
> >
> >> A user process produces page cache frequently, so free memory runs low
> >> after running for a long time. The slow path takes much more time because
> >> of direct reclaim. kswapd reclaims memory too, but not much, so the
> >> allocator always falls into the slow path. This causes the performance
> >> regression.
> >
> > If I were you I would focus on why the reclaim doesn't catch up with the
> > page cache users. The mechanism you are proposing is unacceptable.
>
> Hi Michal,
>
> Do you mean: why is reclaim slower than page cache growth?
>
> I think there are two reasons:
> 1. kswapd and direct reclaim are triggered only when there is not enough
>    memory (e.g. in __alloc_pages_slowpath()). That means no reclaim happens
>    while memory is sufficient (e.g. in get_page_from_freelist()).

Yeah, and that is the whole point. If you want reclaim to start earlier
because you need a bigger cushion of free memory for sudden memory
pressure, then increase min_free_kbytes.

> 2. __alloc_pages_direct_reclaim()
>      try_to_free_pages()
>        nr_to_reclaim = SWAP_CLUSTER_MAX
>    Since "#define SWAP_CLUSTER_MAX 32UL", direct reclaim only expects to
>    reclaim 32 pages. That is too few if we allocate 2^10 pages at once.

Maybe _userspace_ allocates that much memory, but it is not faulted
in/allocated by the kernel in one shot. Besides that, by the time you enter
direct reclaim, kswapd should already be reclaiming memory to balance the
zones. So reclaiming SWAP_CLUSTER_MAX pages from direct reclaim shouldn't
matter that much. If it does, then show us some numbers to prove it.
SWAP_CLUSTER_MAX is kind of an arbitrary number, but AFAIR I haven't seen
any reclaim regression because of this value being too small.

--
Michal Hocko
SUSE Labs