mem_cgroup_force_empty() tries to free only 32 (SWAP_CLUSTER_MAX) pages
on each iteration. If a memory cgroup has a lot of page cache, it takes
many iterations to empty all of it, so increase the number of pages
reclaimed per iteration to speed it up. Do the same in
mem_cgroup_resize_limit().
A simple test shows:
$dd if=aaa of=bbb bs=1k count=3886080
$rm -f bbb
$time echo 100000000 >/cgroup/memory/test/memory.limit_in_bytes
Before: 0m0.252s ===> after: 0m0.178s
Signed-off-by: Li RongQing <[email protected]>
---
mm/memcontrol.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 670e99b68aa6..8910d9e8e908 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2480,7 +2480,7 @@ static int mem_cgroup_resize_limit(struct mem_cgroup *memcg,
if (!ret)
break;
- if (!try_to_free_mem_cgroup_pages(memcg, 1,
+ if (!try_to_free_mem_cgroup_pages(memcg, 1024,
GFP_KERNEL, !memsw)) {
ret = -EBUSY;
break;
@@ -2610,7 +2610,7 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg)
if (signal_pending(current))
return -EINTR;
- progress = try_to_free_mem_cgroup_pages(memcg, 1,
+ progress = try_to_free_mem_cgroup_pages(memcg, 1024,
GFP_KERNEL, true);
if (!progress) {
nr_retries--;
--
2.11.0
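For a rough sense of scale, the number of reclaim calls implied by the test in the patch description can be estimated as below. This is a back-of-the-envelope sketch, not kernel code. It assumes 4 KiB pages, that all page cache produced by the dd run stays charged to the test cgroup, and that every call frees exactly one full batch. The helper name reclaim_calls_needed() is made up for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): how many reclaim calls
 * are needed to bring usage down to the limit if each call frees
 * exactly `batch` pages.
 */
static uint64_t reclaim_calls_needed(uint64_t usage_pages,
				     uint64_t limit_pages,
				     uint64_t batch)
{
	uint64_t excess;

	/* Nothing to do if already under the limit (or batch is invalid). */
	if (batch == 0 || usage_pages <= limit_pages)
		return 0;

	excess = usage_pages - limit_pages;
	return (excess + batch - 1) / batch;	/* round up */
}
```

Under those assumptions the test reads 3886080 KiB, i.e. 971520 pages, and the new limit of 100000000 bytes is 24414 pages. reclaim_calls_needed(971520, 24414, 32) gives 29598 calls, while a batch of 1024 gives 925.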
On Mon 19-03-18 16:29:30, Li RongQing wrote:
> mem_cgroup_force_empty() tries to free only 32 (SWAP_CLUSTER_MAX) pages
> on each iteration, if a memory cgroup has lots of page cache, it will
> take many iterations to empty all page cache, so increase the reclaimed
> number per iteration to speed it up. same as in mem_cgroup_resize_limit()
>
> a simple test show:
>
> $dd if=aaa of=bbb bs=1k count=3886080
> $rm -f bbb
> $time echo 100000000 >/cgroup/memory/test/memory.limit_in_bytes
>
> Before: 0m0.252s ===> after: 0m0.178s
Andrey was proposing something similar [1]. My main objection was that
his approach might lead to over-reclaim. Your approach is more
conservative because it just increases the batch size. The size is still
rather arbitrary, though. So is SWAP_CLUSTER_MAX, but that one is a
commonly used unit of reclaim in the MM code.
I would be really curious about a more detailed explanation of why a
larger batch yields better performance, because we are doing
SWAP_CLUSTER_MAX batches at the lower reclaim level anyway.
[1] http://lkml.kernel.org/r/[email protected]
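As background for the question above: try_to_free_mem_cgroup_pages() raises its reclaim target to at least SWAP_CLUSTER_MAX, which is why the old argument of 1 means 32 pages in practice. The following is a userspace model of that clamp and of how many SWAP_CLUSTER_MAX-sized inner batches one call then has to cover. It is a sketch, not the kernel implementation, and the model_* names are invented here.

```c
#define SWAP_CLUSTER_MAX 32UL

/* Model of the target clamp: requests below SWAP_CLUSTER_MAX are raised to it. */
static unsigned long model_nr_to_reclaim(unsigned long nr_pages)
{
	return nr_pages > SWAP_CLUSTER_MAX ? nr_pages : SWAP_CLUSTER_MAX;
}

/*
 * Number of SWAP_CLUSTER_MAX-sized inner batches needed to satisfy the
 * target of a single call, rounded up.
 */
static unsigned long model_inner_batches(unsigned long nr_pages)
{
	unsigned long target = model_nr_to_reclaim(nr_pages);

	return (target + SWAP_CLUSTER_MAX - 1) / SWAP_CLUSTER_MAX;
}
```

In this model a call with 1024 covers the same 32 inner batches as 32 calls with 1, so the amount of low-level work is unchanged. Any speedup would have to come from fewer passes through the outer loop, which is the part that still needs explaining.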
>
> Signed-off-by: Li RongQing <[email protected]>
> ---
> mm/memcontrol.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 670e99b68aa6..8910d9e8e908 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2480,7 +2480,7 @@ static int mem_cgroup_resize_limit(struct mem_cgroup *memcg,
> if (!ret)
> break;
>
> - if (!try_to_free_mem_cgroup_pages(memcg, 1,
> + if (!try_to_free_mem_cgroup_pages(memcg, 1024,
> GFP_KERNEL, !memsw)) {
> ret = -EBUSY;
> break;
> @@ -2610,7 +2610,7 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg)
> if (signal_pending(current))
> return -EINTR;
>
> - progress = try_to_free_mem_cgroup_pages(memcg, 1,
> + progress = try_to_free_mem_cgroup_pages(memcg, 1024,
> GFP_KERNEL, true);
> if (!progress) {
> nr_retries--;
> --
> 2.11.0
--
Michal Hocko
SUSE Labs
On Mon 19-03-18 16:29:30, Li RongQing wrote:
> mem_cgroup_force_empty() tries to free only 32 (SWAP_CLUSTER_MAX) pages
> on each iteration, if a memory cgroup has lots of page cache, it will
> take many iterations to empty all page cache, so increase the reclaimed
> number per iteration to speed it up. same as in mem_cgroup_resize_limit()
>
> a simple test show:
>
> $dd if=aaa of=bbb bs=1k count=3886080
> $rm -f bbb
> $time echo 100000000 >/cgroup/memory/test/memory.limit_in_bytes
>
> Before: 0m0.252s ===> after: 0m0.178s
One more note. I have only now realized that increasing the batch size
might have another negative side effect. Memcg reclaim bails out early
when the required target has been reclaimed, so we might skip memcgs
in the hierarchy and could end up hammering one child in the hierarchy
much more than others. Our current code is not ideal and we work around
this by using a smaller target and caching the last reclaimed memcg, so
the imbalance is at least not so visible.
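The imbalance described here can be illustrated with a toy model. This is a deliberately simplified userspace sketch, not how the kernel walks the hierarchy: it assumes a flat set of four children, lets a single child satisfy the whole remaining target, and ignores scan priorities and LRU sizes. All names (model_hierarchy, model_reclaim, cursor) are hypothetical.

```c
#define NR_CHILDREN 4

struct model_hierarchy {
	unsigned long pages[NR_CHILDREN];	/* reclaimable pages left per child */
	unsigned long reclaimed[NR_CHILDREN];	/* pages taken from each child so far */
	int cursor;				/* cached position where the last walk stopped */
};

/*
 * One reclaim call: walk the children, starting either at the cached
 * cursor or always at child 0, and bail out as soon as the target is met.
 * Returns the number of pages reclaimed by this call.
 */
static unsigned long model_reclaim(struct model_hierarchy *h,
				   unsigned long target, int use_cursor)
{
	unsigned long done = 0;
	int start = use_cursor ? h->cursor : 0;
	int i;

	for (i = 0; i < NR_CHILDREN && done < target; i++) {
		int c = (start + i) % NR_CHILDREN;
		unsigned long take = target - done;

		if (take > h->pages[c])
			take = h->pages[c];
		h->pages[c] -= take;
		h->reclaimed[c] += take;
		done += take;
		h->cursor = (c + 1) % NR_CHILDREN;	/* resume after this child */
	}
	return done;
}
```

In this model, reclaiming 2048 pages from four children of 10000 pages each with a target of 32 and the cached cursor takes 512 pages from every child. With a target of 1024 the first two children lose 1024 pages each and the other two are never visited. Without the cursor, all 2048 pages come from the first child even with the small target.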
This is not something that couldn't be fixed, and maybe a 1M chunk would
be acceptable as well. I dunno. Let's focus on the main bottleneck first
before we start making these changes, though.
--
Michal Hocko
SUSE Labs