2 follow-up patches for "memcg: move charges to root cgroup if use_hierarchy=0",
developped/tested onto memcg-devel tree. Maybe no HUNK with -next and -mm....
-Kame
==
memcg: remove -EINTR at rmdir()
By commit "memcg: move charges to root cgroup if use_hierarchy=0",
no memory reclaiming will occur at removing memory cgroup.
So, we don't need to take care of user interrupt by signal. This
patch removes it.
(*) If -EINTR is returned here, cgroup will show WARNING.
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
---
mm/memcontrol.c | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0623300..cf8a0f6 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3890,9 +3890,6 @@ move_account:
ret = -EBUSY;
if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children))
goto out;
- ret = -EINTR;
- if (signal_pending(current))
- goto out;
/* This is for making all *used* pages to be on LRU. */
lru_add_drain_all();
drain_all_stock_sync(memcg);
--
1.7.4.1
By commit "memcg: move charges to root cgroup if use_hierarchy=0"
mem_cgroup_move_parent() only returns -EBUSY, -EINVAL.
So, we can remove -ENOMEM and -EINTR checks.
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
---
mm/memcontrol.c | 5 -----
1 files changed, 0 insertions(+), 5 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index cf8a0f6..726b7c6 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3847,8 +3847,6 @@ static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
pc = lookup_page_cgroup(page);
ret = mem_cgroup_move_parent(page, pc, memcg, GFP_KERNEL);
- if (ret == -ENOMEM || ret == -EINTR)
- break;
if (ret == -EBUSY || ret == -EINVAL) {
/* found lock contention or "pc" is obsolete. */
@@ -3910,9 +3908,6 @@ move_account:
}
mem_cgroup_end_move(memcg);
memcg_oom_recover(memcg);
- /* it seems parent cgroup doesn't have enough mem */
- if (ret == -ENOMEM)
- goto try_to_free;
cond_resched();
/* "ret" should also be checked to ensure all lists are empty. */
} while (res_counter_read_u64(&memcg->res, RES_USAGE) > 0 || ret);
--
1.7.4.1
On Mon 18-06-12 20:57:23, KAMEZAWA Hiroyuki wrote:
> 2 follow-up patches for "memcg: move charges to root cgroup if use_hierarchy=0",
> developped/tested onto memcg-devel tree. Maybe no HUNK with -next and -mm....
> -Kame
> ==
> memcg: remove -EINTR at rmdir()
>
> By commit "memcg: move charges to root cgroup if use_hierarchy=0",
> no memory reclaiming will occur at removing memory cgroup.
OK, so the there are only 2 reasons why move_parent could fail in this
path. 1) it races with somebody else who is uncharging or moving the
charge and 2) THP split.
1) works for us and 2) doens't seem to be serious enough to expect that
it would stall rmdir on the group for unbound amount of time so the
change is safe (can we make this into the changelog please?).
> So, we don't need to take care of user interrupt by signal. This
> patch removes it.
> (*) If -EINTR is returned here, cgroup will show WARNING.
>
> Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
Acked-by: Michal Hocko <[email protected]>
> ---
> mm/memcontrol.c | 3 ---
> 1 files changed, 0 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 0623300..cf8a0f6 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3890,9 +3890,6 @@ move_account:
> ret = -EBUSY;
> if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children))
> goto out;
> - ret = -EINTR;
> - if (signal_pending(current))
> - goto out;
> /* This is for making all *used* pages to be on LRU. */
> lru_add_drain_all();
> drain_all_stock_sync(memcg);
> --
> 1.7.4.1
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
On Mon 18-06-12 20:59:44, KAMEZAWA Hiroyuki wrote:
>
> By commit "memcg: move charges to root cgroup if use_hierarchy=0"
> mem_cgroup_move_parent() only returns -EBUSY, -EINVAL.
> So, we can remove -ENOMEM and -EINTR checks.
>
> Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
Acked-by: Michal Hocko <[email protected]>
> ---
> mm/memcontrol.c | 5 -----
> 1 files changed, 0 insertions(+), 5 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index cf8a0f6..726b7c6 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3847,8 +3847,6 @@ static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
> pc = lookup_page_cgroup(page);
>
> ret = mem_cgroup_move_parent(page, pc, memcg, GFP_KERNEL);
> - if (ret == -ENOMEM || ret == -EINTR)
> - break;
>
> if (ret == -EBUSY || ret == -EINVAL) {
> /* found lock contention or "pc" is obsolete. */
> @@ -3910,9 +3908,6 @@ move_account:
> }
> mem_cgroup_end_move(memcg);
> memcg_oom_recover(memcg);
> - /* it seems parent cgroup doesn't have enough mem */
> - if (ret == -ENOMEM)
> - goto try_to_free;
> cond_resched();
> /* "ret" should also be checked to ensure all lists are empty. */
> } while (res_counter_read_u64(&memcg->res, RES_USAGE) > 0 || ret);
> --
> 1.7.4.1
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
(2012/06/18 22:30), Michal Hocko wrote:
> On Mon 18-06-12 20:57:23, KAMEZAWA Hiroyuki wrote:
>> 2 follow-up patches for "memcg: move charges to root cgroup if use_hierarchy=0",
>> developped/tested onto memcg-devel tree. Maybe no HUNK with -next and -mm....
>> -Kame
>> ==
>> memcg: remove -EINTR at rmdir()
>>
>> By commit "memcg: move charges to root cgroup if use_hierarchy=0",
>> no memory reclaiming will occur at removing memory cgroup.
>
> OK, so the there are only 2 reasons why move_parent could fail in this
> path. 1) it races with somebody else who is uncharging or moving the
> charge and 2) THP split.
> 1) works for us and 2) doens't seem to be serious enough to expect that
> it would stall rmdir on the group for unbound amount of time so the
> change is safe (can we make this into the changelog please?).
>
Yes. But the failure of move_parent() (-EBUSY) will be retried.
Remaining problems are
- attaching task while pre_destroy() is called.
- creating child cgroup while pre_destroy() is called.
I think I need to make a patch for cgroup layer as I previously posted.
I'd like to try again.
Thanks,
-Kame
On Tue 19-06-12 09:09:47, KAMEZAWA Hiroyuki wrote:
> (2012/06/18 22:30), Michal Hocko wrote:
> > On Mon 18-06-12 20:57:23, KAMEZAWA Hiroyuki wrote:
> >> 2 follow-up patches for "memcg: move charges to root cgroup if use_hierarchy=0",
> >> developped/tested onto memcg-devel tree. Maybe no HUNK with -next and -mm....
> >> -Kame
> >> ==
> >> memcg: remove -EINTR at rmdir()
> >>
> >> By commit "memcg: move charges to root cgroup if use_hierarchy=0",
> >> no memory reclaiming will occur at removing memory cgroup.
> >
> > OK, so the there are only 2 reasons why move_parent could fail in this
> > path. 1) it races with somebody else who is uncharging or moving the
> > charge and 2) THP split.
> > 1) works for us and 2) doens't seem to be serious enough to expect that
> > it would stall rmdir on the group for unbound amount of time so the
> > change is safe (can we make this into the changelog please?).
> >
>
> Yes. But the failure of move_parent() (-EBUSY) will be retried.
>
> Remaining problems are
> - attaching task while pre_destroy() is called.
> - creating child cgroup while pre_destroy() is called.
I don't know why but I thought that tasks and subgroups are not alowed
when pre_destroy is called. If this is possible then we probably want to
check for pending signals or at least add cond_resched.
>
> I think I need to make a patch for cgroup layer as I previously posted.
> I'd like to try again.
>
> Thanks,
> -Kame
>
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
On Mon, 18 Jun 2012 20:59:44 +0900
Kamezawa Hiroyuki <[email protected]> wrote:
>
> By commit "memcg: move charges to root cgroup if use_hierarchy=0"
> mem_cgroup_move_parent() only returns -EBUSY, -EINVAL.
> So, we can remove -ENOMEM and -EINTR checks.
>
> Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
> ---
> mm/memcontrol.c | 5 -----
> 1 files changed, 0 insertions(+), 5 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index cf8a0f6..726b7c6 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3847,8 +3847,6 @@ static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
> pc = lookup_page_cgroup(page);
>
> ret = mem_cgroup_move_parent(page, pc, memcg, GFP_KERNEL);
> - if (ret == -ENOMEM || ret == -EINTR)
> - break;
>
> if (ret == -EBUSY || ret == -EINVAL) {
This looks a bit fragile - if mem_cgroup_move_parent() is later changed
(intentionally or otherwise!) to return -Esomethingelse then
mem_cgroup_force_empty_list() will subtly break. Why not just do
if (ret < 0)
here?
(2012/06/20 8:58), Andrew Morton wrote:
> On Mon, 18 Jun 2012 20:59:44 +0900
> Kamezawa Hiroyuki<[email protected]> wrote:
>
>>
>> By commit "memcg: move charges to root cgroup if use_hierarchy=0"
>> mem_cgroup_move_parent() only returns -EBUSY, -EINVAL.
>> So, we can remove -ENOMEM and -EINTR checks.
>>
>> Signed-off-by: KAMEZAWA Hiroyuki<[email protected]>
>> ---
>> mm/memcontrol.c | 5 -----
>> 1 files changed, 0 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index cf8a0f6..726b7c6 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -3847,8 +3847,6 @@ static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
>> pc = lookup_page_cgroup(page);
>>
>> ret = mem_cgroup_move_parent(page, pc, memcg, GFP_KERNEL);
>> - if (ret == -ENOMEM || ret == -EINTR)
>> - break;
>>
>> if (ret == -EBUSY || ret == -EINVAL) {
>
> This looks a bit fragile - if mem_cgroup_move_parent() is later changed
> (intentionally or otherwise!) to return -Esomethingelse then
> mem_cgroup_force_empty_list() will subtly break. Why not just do
>
> if (ret< 0)
>
You're right. I'm sorry I haven't done enough clean-ups.
I made 2 more patches...I'll repost/remake all paches if it's better.
one more patch will follow this email.
==
From eee5f31fc6378da19705de7187bb3f219ef6d7f6 Mon Sep 17 00:00:00 2001
From: KAMEZAWA Hiroyuki <[email protected]>
Date: Thu, 21 Jun 2012 17:25:04 +0900
Subject: [PATCH 1/2] mem_cgroup_move_parent() doesn't need gfp_mask.
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
---
mm/memcontrol.c | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 76d83a5..90a2ad4 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2662,8 +2662,7 @@ out:
static int mem_cgroup_move_parent(struct page *page,
struct page_cgroup *pc,
- struct mem_cgroup *child,
- gfp_t gfp_mask)
+ struct mem_cgroup *child)
{
struct mem_cgroup *parent;
unsigned int nr_pages;
@@ -3837,7 +3836,7 @@ static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
pc = lookup_page_cgroup(page);
- ret = mem_cgroup_move_parent(page, pc, memcg, GFP_KERNEL);
+ ret = mem_cgroup_move_parent(page, pc, memcg);
if (ret == -EBUSY || ret == -EINVAL) {
/* found lock contention or "pc" is obsolete. */
--
1.7.4.1
You're right and I think this will be much cleaner.
==
From 9b6224616282d74838b393485eb7c9215f546ec9 Mon Sep 17 00:00:00 2001
From: KAMEZAWA Hiroyuki <[email protected]>
Date: Thu, 21 Jun 2012 17:28:55 +0900
Subject: [PATCH 2/2] memcg: make mem_cgroup_force_empty_list() as boolean function
Now, mem_cgroup_force_empty_list() just returns 0 or -EBUSY and
-EBUSY is just indicating 'you need to retry.'.
This patch makes mem_cgroup_force_empty_list() as boolean function and
make the logic simpler.
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
---
mm/memcontrol.c | 13 +++----------
1 files changed, 3 insertions(+), 10 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 90a2ad4..767440c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3797,7 +3797,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
* This routine traverse page_cgroup in given list and drop them all.
* *And* this routine doesn't reclaim page itself, just removes page_cgroup.
*/
-static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
+static bool mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
int node, int zid, enum lru_list lru)
{
struct mem_cgroup_per_zone *mz;
@@ -3805,7 +3805,6 @@ static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
struct list_head *list;
struct page *busy;
struct zone *zone;
- int ret = 0;
zone = &NODE_DATA(node)->node_zones[zid];
mz = mem_cgroup_zoneinfo(memcg, node, zid);
@@ -3819,7 +3818,6 @@ static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
struct page_cgroup *pc;
struct page *page;
- ret = 0;
spin_lock_irqsave(&zone->lru_lock, flags);
if (list_empty(list)) {
spin_unlock_irqrestore(&zone->lru_lock, flags);
@@ -3836,19 +3834,14 @@ static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
pc = lookup_page_cgroup(page);
- ret = mem_cgroup_move_parent(page, pc, memcg);
-
- if (ret == -EBUSY || ret == -EINVAL) {
+ if (mem_cgroup_move_parent(page, pc, memcg)) {
/* found lock contention or "pc" is obsolete. */
busy = page;
cond_resched();
} else
busy = NULL;
}
-
- if (!ret && !list_empty(list))
- return -EBUSY;
- return ret;
+ return !list_empty(list);
}
/*
--
1.7.4.1
(2012/06/19 21:40), Michal Hocko wrote:
> On Tue 19-06-12 09:09:47, KAMEZAWA Hiroyuki wrote:
>> (2012/06/18 22:30), Michal Hocko wrote:
>>> On Mon 18-06-12 20:57:23, KAMEZAWA Hiroyuki wrote:
>>>> 2 follow-up patches for "memcg: move charges to root cgroup if use_hierarchy=0",
>>>> developped/tested onto memcg-devel tree. Maybe no HUNK with -next and -mm....
>>>> -Kame
>>>> ==
>>>> memcg: remove -EINTR at rmdir()
>>>>
>>>> By commit "memcg: move charges to root cgroup if use_hierarchy=0",
>>>> no memory reclaiming will occur at removing memory cgroup.
>>>
>>> OK, so the there are only 2 reasons why move_parent could fail in this
>>> path. 1) it races with somebody else who is uncharging or moving the
>>> charge and 2) THP split.
>>> 1) works for us and 2) doens't seem to be serious enough to expect that
>>> it would stall rmdir on the group for unbound amount of time so the
>>> change is safe (can we make this into the changelog please?).
>>>
>>
>> Yes. But the failure of move_parent() (-EBUSY) will be retried.
>>
>> Remaining problems are
>> - attaching task while pre_destroy() is called.
>> - creating child cgroup while pre_destroy() is called.
>
> I don't know why but I thought that tasks and subgroups are not alowed
> when pre_destroy is called. If this is possible then we probably want to
> check for pending signals or at least add cond_resched.
Now, pre_destroy() call is done as
lock_cgroup_mutex();
do some pre-check, no child, no tasks.
unlock_cgroup_mutex();
->pre_destroy()
lock_cgroup_mutex()
check css's refcnt....
What I take care of now is following case.
CPU A CPU-B
unlock_cgroup_mutex()
->pre_destroy()
<delay by something> attach new task
add new charge
detach the task
lock_cgroup_mutex()
check rss' refcnt
This will cause account leak even if I think this will not happen in the real world.
I'd like to disable attach task.
Now, our ->pre_destroy() is quite fast because we don't have no memory reclaim.
I believe we can call ->pre_destroy() without dropping cgroup_mutex.
lock_cgroup_mutex()
do pre-check
->pre_destroy()
check css's refcnt
I think this is straightforward. I'd like to post a patch.
Thanks,
-Kame
On Thu, 21 Jun 2012 17:17:01 +0900
Kamezawa Hiroyuki <[email protected]> wrote:
> Now, mem_cgroup_force_empty_list() just returns 0 or -EBUSY and
> -EBUSY is just indicating 'you need to retry.'.
> This patch makes mem_cgroup_force_empty_list() as boolean function and
> make the logic simpler.
>
For some reason I'm having trouble applying these patches - many
rejects, need to massage it in by hand.
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3797,7 +3797,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
> * This routine traverse page_cgroup in given list and drop them all.
> * *And* this routine doesn't reclaim page itself, just removes page_cgroup.
> */
> -static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
> +static bool mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
> int node, int zid, enum lru_list lru)
Let's document the return value. The mem_cgroup_force_empty_list()
comment is a mess so I tried to help it a bit. How does this look?
--- a/mm/memcontrol.c~memcg-make-mem_cgroup_force_empty_list-return-bool-fix
+++ a/mm/memcontrol.c
@@ -3609,8 +3609,10 @@ unsigned long mem_cgroup_soft_limit_recl
}
/*
- * This routine traverse page_cgroup in given list and drop them all.
- * *And* this routine doesn't reclaim page itself, just removes page_cgroup.
+ * Traverse a specified page_cgroup list and try to drop them all. This doesn't
+ * reclaim the pages page themselves - it just removes the page_cgroups.
+ * Returns true if some page_cgroups were not freed, indicating that the caller
+ * must retry this operation.
*/
static bool mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
int node, int zid, enum lru_list lru)
_
(2012/06/22 5:13), Andrew Morton wrote:
> On Thu, 21 Jun 2012 17:17:01 +0900
> Kamezawa Hiroyuki<[email protected]> wrote:
>
>> Now, mem_cgroup_force_empty_list() just returns 0 or -EBUSY and
>> -EBUSY is just indicating 'you need to retry.'.
>> This patch makes mem_cgroup_force_empty_list() as boolean function and
>> make the logic simpler.
>>
>
> For some reason I'm having trouble applying these patches - many
> rejects, need to massage it in by hand.
>
Oh, sorry. Is it space breakage or some ? (I had mailer troubles in the last week..)
Or, maybe, my tree/patch queue was too old.
I'll rebased to new -mm tree when I post new patch.
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -3797,7 +3797,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
>> * This routine traverse page_cgroup in given list and drop them all.
>> * *And* this routine doesn't reclaim page itself, just removes page_cgroup.
>> */
>> -static int mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
>> +static bool mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
>> int node, int zid, enum lru_list lru)
>
> Let's document the return value. The mem_cgroup_force_empty_list()
> comment is a mess so I tried to help it a bit. How does this look?
>
> --- a/mm/memcontrol.c~memcg-make-mem_cgroup_force_empty_list-return-bool-fix
> +++ a/mm/memcontrol.c
> @@ -3609,8 +3609,10 @@ unsigned long mem_cgroup_soft_limit_recl
> }
>
> /*
> - * This routine traverse page_cgroup in given list and drop them all.
> - * *And* this routine doesn't reclaim page itself, just removes page_cgroup.
> + * Traverse a specified page_cgroup list and try to drop them all. This doesn't
> + * reclaim the pages page themselves - it just removes the page_cgroups.
> + * Returns true if some page_cgroups were not freed, indicating that the caller
> + * must retry this operation.
> */
> static bool mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
> int node, int zid, enum lru_list lru)
> _
Seems nice! Thank you very much.
Regards,
-Kame