2009-01-13 10:33:51

by Daisuke Nishimura

[permalink] [raw]
Subject: [PATCH -rc] some fixes for memcg

Hi, Andrew.

These are some fixes for memcg that has been pushed into mainline.

They are based on 2.6.29-rc1.

[1/4] fix mem_cgroup_get_reclaim_stat_from_page

[2/4] fix error path of mem_cgroup_move_parent

[3/4] fix hierarchical reclaim

[4/4] make oom less frequently


Thanks,
Daisuke Nishimura.


2009-01-13 10:34:42

by Daisuke Nishimura

[permalink] [raw]
Subject: [PATCH 1/4] memcg: fix mem_cgroup_get_reclaim_stat_from_page

In case of swapin, a new page is added to lru before it is charged,
so page->pc->mem_cgroup points to NULL or last mem_cgroup the page
was charged before.

In the latter case, if the mem_cgroup has already freed by rmdir,
the area pointed to by page->pc->mem_cgroup may have invalid data.

Actually, I saw general protection fault.

general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu15/cache/index1/shared_cpu_map
CPU 4
Modules linked in: ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc dm_mirror dm_region_hash dm_log dm_multipath dm_mod rfkill input_polldev sbs sbshc battery ac lp sg ide_cd_mod cdrom button serio_raw acpi_memhotplug parport_pc e1000 rtc_cmos parport rtc_core rtc_lib i2c_i801 i2c_core shpchp pcspkr ata_piix libata megaraid_mbox megaraid_mm sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd [last unloaded: microcode]
Pid: 26038, comm: page01 Tainted: G W 2.6.28-rc9-mm1-mmotm-2008-12-22-16-14-f2ab3dea #1
RIP: 0010:[<ffffffff8028e710>] [<ffffffff8028e710>] update_page_reclaim_stat+0x2f/0x42
RSP: 0000:ffff8801ee457da8 EFLAGS: 00010002
RAX: 32353438312021c8 RBX: 0000000000000000 RCX: 32353438312021c8
RDX: 0000000000000000 RSI: ffff8800cb0b1000 RDI: ffff8801164d1d28
RBP: ffff880110002cb8 R08: ffff88010f2eae23 R09: 0000000000000001
R10: ffff8800bc514b00 R11: ffff880110002c00 R12: 0000000000000000
R13: ffff88000f484100 R14: 0000000000000003 R15: 00000000001200d2
FS: 00007f8a261726f0(0000) GS:ffff88010f2eaa80(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f8a25d22000 CR3: 00000001ef18c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process page01 (pid: 26038, threadinfo ffff8801ee456000, task ffff8800b585b960)
Stack:
ffffe200071ee568 ffff880110001f00 0000000000000000 ffffffff8028ea17
ffff88000f484100 0000000000000000 0000000000000020 00007f8a25d22000
ffff8800bc514b00 ffffffff8028ec34 0000000000000000 0000000000016fd8
Call Trace:
[<ffffffff8028ea17>] ? ____pagevec_lru_add+0xc1/0x13c
[<ffffffff8028ec34>] ? drain_cpu_pagevecs+0x36/0x89
[<ffffffff802a4f8c>] ? swapin_readahead+0x78/0x98
[<ffffffff8029a37a>] ? handle_mm_fault+0x3d9/0x741
[<ffffffff804da654>] ? do_page_fault+0x3ce/0x78c
[<ffffffff804d7a42>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff804d860f>] ? page_fault+0x1f/0x30
Code: cc 55 48 8d af b8 0d 00 00 48 89 f7 53 89 d3 e8 39 85 02 00 48 63 d3 48 ff 44 d5 10 45 85 e4 74 05 48 ff 44 d5 00 48 85 c0 74 0e <48> ff 44 d0 10 45 85 e4 74 04 48 ff 04 d0 5b 5d 41 5c c3 41 54
RIP [<ffffffff8028e710>] update_page_reclaim_stat+0x2f/0x42
RSP <ffff8801ee457da8>

ChangeLog: RFC->v1
- add comments to smp_rmb.

Signed-off-by: Daisuke Nishimura <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
---
mm/memcontrol.c | 17 ++++++++++++++++-
1 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e2996b8..b665127 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -358,6 +358,10 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
return;

pc = lookup_page_cgroup(page);
+ /*
+ * Used bit is set without atomic ops but after smp_wmb().
+ * For making pc->mem_cgroup visible, insert smp_rmb() here.
+ */
smp_rmb();
/* unused page is not rotated. */
if (!PageCgroupUsed(pc))
@@ -374,7 +378,10 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
if (mem_cgroup_disabled())
return;
pc = lookup_page_cgroup(page);
- /* barrier to sync with "charge" */
+ /*
+ * Used bit is set without atomic ops but after smp_wmb().
+ * For making pc->mem_cgroup visible, insert smp_rmb() here.
+ */
smp_rmb();
if (!PageCgroupUsed(pc))
return;
@@ -559,6 +566,14 @@ mem_cgroup_get_reclaim_stat_from_page(struct page *page)
return NULL;

pc = lookup_page_cgroup(page);
+ /*
+ * Used bit is set without atomic ops but after smp_wmb().
+ * For making pc->mem_cgroup visible, insert smp_rmb() here.
+ */
+ smp_rmb();
+ if (!PageCgroupUsed(pc))
+ return NULL;
+
mz = page_cgroup_zoneinfo(pc);
if (!mz)
return NULL;

2009-01-13 10:36:59

by Daisuke Nishimura

[permalink] [raw]
Subject: [PATCH 2/4] memcg: fix error path of mem_cgroup_move_parent

There is a bug in error path of mem_cgroup_move_parent.

Extra refcnt got from try_charge should be dropped, and usages incremented
by try_charge should be decremented in both error paths:

A: failure at get_page_unless_zero
B: failure at isolate_lru_page

This bug makes this parent directory unremovable.

In case of A, rmdir doesn't return, because res.usage doesn't go
down to 0 at mem_cgroup_force_empty even after all the pc in
lru are removed.
In case of B, rmdir fails and returns -EBUSY, because it has
extra ref counts even after res.usage goes down to 0.

Signed-off-by: Daisuke Nishimura <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Acked-by: Balbir Singh <[email protected]>
---
mm/memcontrol.c | 23 +++++++++++++++--------
1 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b665127..7be9b35 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -994,14 +994,15 @@ static int mem_cgroup_move_account(struct page_cgroup *pc,
if (pc->mem_cgroup != from)
goto out;

- css_put(&from->css);
res_counter_uncharge(&from->res, PAGE_SIZE);
mem_cgroup_charge_statistics(from, pc, false);
if (do_swap_account)
res_counter_uncharge(&from->memsw, PAGE_SIZE);
+ css_put(&from->css);
+
+ css_get(&to->css);
pc->mem_cgroup = to;
mem_cgroup_charge_statistics(to, pc, true);
- css_get(&to->css);
ret = 0;
out:
unlock_page_cgroup(pc);
@@ -1034,8 +1035,10 @@ static int mem_cgroup_move_parent(struct page_cgroup *pc,
if (ret || !parent)
return ret;

- if (!get_page_unless_zero(page))
- return -EBUSY;
+ if (!get_page_unless_zero(page)) {
+ ret = -EBUSY;
+ goto uncharge;
+ }

ret = isolate_lru_page(page);

@@ -1044,19 +1047,23 @@ static int mem_cgroup_move_parent(struct page_cgroup *pc,

ret = mem_cgroup_move_account(pc, child, parent);

- /* drop extra refcnt by try_charge() (move_account increment one) */
- css_put(&parent->css);
putback_lru_page(page);
if (!ret) {
put_page(page);
+ /* drop extra refcnt by try_charge() */
+ css_put(&parent->css);
return 0;
}
- /* uncharge if move fails */
+
cancel:
+ put_page(page);
+uncharge:
+ /* drop extra refcnt by try_charge() */
+ css_put(&parent->css);
+ /* uncharge if move fails */
res_counter_uncharge(&parent->res, PAGE_SIZE);
if (do_swap_account)
res_counter_uncharge(&parent->memsw, PAGE_SIZE);
- put_page(page);
return ret;
}

2009-01-13 10:44:20

by Daisuke Nishimura

[permalink] [raw]
Subject: [PATCH 3/4] memcg: fix hierarchical reclaim

If root_mem has no children, last_scaned_child is set to root_mem itself.
But after some children added to root_mem, mem_cgroup_get_next_node can
mem_cgroup_put the root_mem although root_mem has not been mem_cgroup_get.

This patch fixes this behavior by:
- Set last_scanned_child to NULL if root_mem has no children or DFS search
has returned to root_mem itself(root_mem is not a "child" of root_mem).
Make mem_cgroup_get_first_node return root_mem in this case.
There are no mem_cgroup_get/put for root_mem.
- Rename mem_cgroup_get_next_node to __mem_cgroup_get_next_node, and
mem_cgroup_get_first_node to mem_cgroup_get_next_node.
Make mem_cgroup_hierarchical_reclaim call only new mem_cgroup_get_next_node.

ChangeLog: RFC->v1
- add mem_cgroup_put of last_scanned_child at mem_cgroup_destroy.
ChangeLog: v1->v2
- cleanup. move all mem_cgroup_get/put to the end of mem_cgroup_get_next_node.

Signed-off-by: Daisuke Nishimura <[email protected]>
---
mm/memcontrol.c | 68 +++++++++++++++++++++++++++++-------------------------
1 files changed, 36 insertions(+), 32 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7be9b35..322625f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -633,7 +633,7 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
* called with hierarchy_mutex held
*/
static struct mem_cgroup *
-mem_cgroup_get_next_node(struct mem_cgroup *curr, struct mem_cgroup *root_mem)
+__mem_cgroup_get_next_node(struct mem_cgroup *curr, struct mem_cgroup *root_mem)
{
struct cgroup *cgroup, *curr_cgroup, *root_cgroup;

@@ -644,19 +644,16 @@ mem_cgroup_get_next_node(struct mem_cgroup *curr, struct mem_cgroup *root_mem)
/*
* Walk down to children
*/
- mem_cgroup_put(curr);
cgroup = list_entry(curr_cgroup->children.next,
struct cgroup, sibling);
curr = mem_cgroup_from_cont(cgroup);
- mem_cgroup_get(curr);
goto done;
}

visit_parent:
if (curr_cgroup == root_cgroup) {
- mem_cgroup_put(curr);
- curr = root_mem;
- mem_cgroup_get(curr);
+ /* caller handles NULL case */
+ curr = NULL;
goto done;
}

@@ -664,11 +661,9 @@ visit_parent:
* Goto next sibling
*/
if (curr_cgroup->sibling.next != &curr_cgroup->parent->children) {
- mem_cgroup_put(curr);
cgroup = list_entry(curr_cgroup->sibling.next, struct cgroup,
sibling);
curr = mem_cgroup_from_cont(cgroup);
- mem_cgroup_get(curr);
goto done;
}

@@ -679,7 +674,6 @@ visit_parent:
goto visit_parent;

done:
- root_mem->last_scanned_child = curr;
return curr;
}

@@ -689,40 +683,46 @@ done:
* that to reclaim free pages from.
*/
static struct mem_cgroup *
-mem_cgroup_get_first_node(struct mem_cgroup *root_mem)
+mem_cgroup_get_next_node(struct mem_cgroup *root_mem)
{
struct cgroup *cgroup;
- struct mem_cgroup *ret;
+ struct mem_cgroup *orig, *next;
bool obsolete;

- obsolete = mem_cgroup_is_obsolete(root_mem->last_scanned_child);
-
/*
* Scan all children under the mem_cgroup mem
*/
mutex_lock(&mem_cgroup_subsys.hierarchy_mutex);
+
+ orig = root_mem->last_scanned_child;
+ obsolete = mem_cgroup_is_obsolete(orig);
+
if (list_empty(&root_mem->css.cgroup->children)) {
- ret = root_mem;
+ /*
+ * root_mem might have children before and last_scanned_child
+ * may point to one of them. We put it later.
+ */
+ if (orig)
+ VM_BUG_ON(!obsolete);
+ next = NULL;
goto done;
}

- if (!root_mem->last_scanned_child || obsolete) {
-
- if (obsolete && root_mem->last_scanned_child)
- mem_cgroup_put(root_mem->last_scanned_child);
-
+ if (!orig || obsolete) {
cgroup = list_first_entry(&root_mem->css.cgroup->children,
struct cgroup, sibling);
- ret = mem_cgroup_from_cont(cgroup);
- mem_cgroup_get(ret);
+ next = mem_cgroup_from_cont(cgroup);
} else
- ret = mem_cgroup_get_next_node(root_mem->last_scanned_child,
- root_mem);
+ next = __mem_cgroup_get_next_node(orig, root_mem);

done:
- root_mem->last_scanned_child = ret;
+ if (next)
+ mem_cgroup_get(next);
+ root_mem->last_scanned_child = next;
+ if (orig)
+ mem_cgroup_put(orig);
mutex_unlock(&mem_cgroup_subsys.hierarchy_mutex);
- return ret;
+ return (next) ? next : root_mem;
}

static bool mem_cgroup_check_under_limit(struct mem_cgroup *mem)
@@ -780,21 +780,18 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
if (!root_mem->use_hierarchy)
return ret;

- next_mem = mem_cgroup_get_first_node(root_mem);
+ next_mem = mem_cgroup_get_next_node(root_mem);

while (next_mem != root_mem) {
if (mem_cgroup_is_obsolete(next_mem)) {
- mem_cgroup_put(next_mem);
- next_mem = mem_cgroup_get_first_node(root_mem);
+ next_mem = mem_cgroup_get_next_node(root_mem);
continue;
}
ret = try_to_free_mem_cgroup_pages(next_mem, gfp_mask, noswap,
get_swappiness(next_mem));
if (mem_cgroup_check_under_limit(root_mem))
return 0;
- mutex_lock(&mem_cgroup_subsys.hierarchy_mutex);
- next_mem = mem_cgroup_get_next_node(next_mem, root_mem);
- mutex_unlock(&mem_cgroup_subsys.hierarchy_mutex);
+ next_mem = mem_cgroup_get_next_node(root_mem);
}
return ret;
}
@@ -2254,7 +2251,14 @@ static void mem_cgroup_pre_destroy(struct cgroup_subsys *ss,
static void mem_cgroup_destroy(struct cgroup_subsys *ss,
struct cgroup *cont)
{
- mem_cgroup_put(mem_cgroup_from_cont(cont));
+ struct mem_cgroup *mem = mem_cgroup_from_cont(cont);
+ struct mem_cgroup *last_scanned_child = mem->last_scanned_child;
+
+ if (last_scanned_child) {
+ VM_BUG_ON(!mem_cgroup_is_obsolete(last_scanned_child));
+ mem_cgroup_put(last_scanned_child);
+ }
+ mem_cgroup_put(mem);
}

static int mem_cgroup_populate(struct cgroup_subsys *ss,

2009-01-13 10:45:00

by Daisuke Nishimura

[permalink] [raw]
Subject: [PATCH 4/4] memcg: make oom less frequently

In previous implementation, mem_cgroup_try_charge checked the return
value of mem_cgroup_try_to_free_pages, and just retried if some pages
had been reclaimed.
But now, try_charge(and mem_cgroup_hierarchical_reclaim called from it)
only checks whether the usage is less than the limit.

This patch tries to change the behavior as before to cause oom less frequently.

ChangeLog: RFC->v1
- removed RETRIES_MAX.

Signed-off-by: Daisuke Nishimura <[email protected]>
Acked-by: Balbir Singh <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
---
mm/memcontrol.c | 10 ++++++----
1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 322625f..fb62b43 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -773,10 +773,10 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
* but there might be left over accounting, even after children
* have left.
*/
- ret = try_to_free_mem_cgroup_pages(root_mem, gfp_mask, noswap,
+ ret += try_to_free_mem_cgroup_pages(root_mem, gfp_mask, noswap,
get_swappiness(root_mem));
if (mem_cgroup_check_under_limit(root_mem))
- return 0;
+ return 1; /* indicate reclaim has succeeded */
if (!root_mem->use_hierarchy)
return ret;

@@ -787,10 +787,10 @@ static int mem_cgroup_hierarchical_reclaim(struct mem_cgroup *root_mem,
next_mem = mem_cgroup_get_next_node(root_mem);
continue;
}
- ret = try_to_free_mem_cgroup_pages(next_mem, gfp_mask, noswap,
+ ret += try_to_free_mem_cgroup_pages(next_mem, gfp_mask, noswap,
get_swappiness(next_mem));
if (mem_cgroup_check_under_limit(root_mem))
- return 0;
+ return 1; /* indicate reclaim has succeeded */
next_mem = mem_cgroup_get_next_node(root_mem);
}
return ret;
@@ -875,6 +875,8 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,

ret = mem_cgroup_hierarchical_reclaim(mem_over_limit, gfp_mask,
noswap);
+ if (ret)
+ continue;

/*
* try_to_free_mem_cgroup_pages() might not give us a full

2009-01-14 08:58:20

by Daisuke Nishimura

[permalink] [raw]
Subject: [RFC][PATCH 5/4] memcg: don't call res_counter_uncharge when obsolete

This is a new one. Please review.

===
From: Daisuke Nishimura <[email protected]>

mem_cgroup_get ensures that the memcg that has been got can be accessed
even after the directory has been removed, but it doesn't ensure that parents
of it can be accessed: parents might have been freed already by rmdir.

This causes a bug in case of use_hierarchy==1, because res_counter_uncharge
climb up the tree.

Check if the memcg is obsolete, and don't call res_counter_uncharge when obsole.

Signed-off-by: Daisuke Nishimura <[email protected]>
---
mm/memcontrol.c | 9 ++++++---
1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fb62b43..4ee95a8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1182,7 +1182,8 @@ int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
/* avoid double counting */
mem = swap_cgroup_record(ent, NULL);
if (mem) {
- res_counter_uncharge(&mem->memsw, PAGE_SIZE);
+ if (!mem_cgroup_is_obsolete(mem))
+ res_counter_uncharge(&mem->memsw, PAGE_SIZE);
mem_cgroup_put(mem);
}
}
@@ -1252,7 +1253,8 @@ void mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr)
struct mem_cgroup *memcg;
memcg = swap_cgroup_record(ent, NULL);
if (memcg) {
- res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
+ if (!mem_cgroup_is_obsolete(memcg))
+ res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
mem_cgroup_put(memcg);
}

@@ -1397,7 +1399,8 @@ void mem_cgroup_uncharge_swap(swp_entry_t ent)

memcg = swap_cgroup_record(ent, NULL);
if (memcg) {
- res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
+ if (!mem_cgroup_is_obsolete(memcg))
+ res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
mem_cgroup_put(memcg);
}
}

2009-01-14 13:43:24

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [RFC][PATCH 5/4] memcg: don't call res_counter_uncharge when obsolete

Daisuke Nishimura さんは書きました:
> This is a new one. Please review.
>
> ===
> From: Daisuke Nishimura <[email protected]>
>
> mem_cgroup_get ensures that the memcg that has been got can be accessed
> even after the directory has been removed, but it doesn't ensure that
> parents
> of it can be accessed: parents might have been freed already by rmdir.
>
> This causes a bug in case of use_hierarchy==1, because
> res_counter_uncharge
> climb up the tree.
>
> Check if the memcg is obsolete, and don't call res_counter_uncharge when
> obsole.
>
Hmm, did you see panic ?
To handle the problem "parent may be obsolete",

call mem_cgroup_get(parent) at create()
call mem_cgroup_put(parent) at freeing memcg.
(regardless of use_hierarchy.)

is clearer way to go, I think.

I wonder whether there is mis-accounting problem or not..

So, adding css_tryget() around problematic code can be a fix.
--
mem = swap_cgroup_record();
if (css_tryget(&mem->css)) {
res_counter_uncharge(&mem->memsw, PAZE_SIZE);
css_put(&mem->css)
}
--
I like css_tryget() rather than mem_cgroup_obsolete().
To be honest, I'd like to remove memcg special stuff when I can.

Thanks,
-Kame

> Signed-off-by: Daisuke Nishimura <[email protected]>
> ---
> mm/memcontrol.c | 9 ++++++---
> 1 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index fb62b43..4ee95a8 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1182,7 +1182,8 @@ int mem_cgroup_cache_charge(struct page *page,
> struct mm_struct *mm,
> /* avoid double counting */
> mem = swap_cgroup_record(ent, NULL);
> if (mem) {
> - res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> + if (!mem_cgroup_is_obsolete(mem))
> + res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> mem_cgroup_put(mem);
> }
> }
> @@ -1252,7 +1253,8 @@ void mem_cgroup_commit_charge_swapin(struct page
> *page, struct mem_cgroup *ptr)
> struct mem_cgroup *memcg;
> memcg = swap_cgroup_record(ent, NULL);
> if (memcg) {
> - res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> + if (!mem_cgroup_is_obsolete(memcg))
> + res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> mem_cgroup_put(memcg);
> }
>
> @@ -1397,7 +1399,8 @@ void mem_cgroup_uncharge_swap(swp_entry_t ent)
>
> memcg = swap_cgroup_record(ent, NULL);
> if (memcg) {
> - res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> + if (!mem_cgroup_is_obsolete(memcg))
> + res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> mem_cgroup_put(memcg);
> }
> }
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
>

2009-01-14 13:55:54

by Balbir Singh

[permalink] [raw]
Subject: Re: [RFC][PATCH 5/4] memcg: don't call res_counter_uncharge when obsolete

* Daisuke Nishimura <[email protected]> [2009-01-14 17:51:21]:

> This is a new one. Please review.
>
> ===
> From: Daisuke Nishimura <[email protected]>
>
> mem_cgroup_get ensures that the memcg that has been got can be accessed
> even after the directory has been removed, but it doesn't ensure that parents
> of it can be accessed: parents might have been freed already by rmdir.
>
> This causes a bug in case of use_hierarchy==1, because res_counter_uncharge
> climb up the tree.
>
> Check if the memcg is obsolete, and don't call res_counter_uncharge when obsole.
>
> Signed-off-by: Daisuke Nishimura <[email protected]>

I liked the earlier, EBUSY approach that ensured that parents could
not go away if children exist. IMHO, the code has gotten too complex
and has too many corner cases. Time to revisit it.

--
Balbir

2009-01-15 01:19:23

by Daisuke Nishimura

[permalink] [raw]
Subject: Re: [RFC][PATCH 5/4] memcg: don't call res_counter_uncharge when obsolete

On Wed, 14 Jan 2009 22:43:05 +0900 (JST), "KAMEZAWA Hiroyuki" <[email protected]> wrote:
> Daisuke Nishimura さんは書きました:
> > This is a new one. Please review.
> >
> > ===
> > From: Daisuke Nishimura <[email protected]>
> >
> > mem_cgroup_get ensures that the memcg that has been got can be accessed
> > even after the directory has been removed, but it doesn't ensure that
> > parents
> > of it can be accessed: parents might have been freed already by rmdir.
> >
> > This causes a bug in case of use_hierarchy==1, because
> > res_counter_uncharge
> > climb up the tree.
> >
> > Check if the memcg is obsolete, and don't call res_counter_uncharge when
> > obsole.
> >
> Hmm, did you see panic ?
I saw 2 types of bugs, A: spinlock lockup and B: general protection fault.
(described below)

Those bugs happened in case of (use_hierarchy && do_swap_account),
and didn't happen (at leaset I haven't seen) in case of
(!use_hierarchy && do_swap_account) nor (use_hierarchy && !do_swap_account).
And, they didn't happen with this patch applied all through the last night.

A: spinlock lockup
===
BUG: spinlock lockup on CPU#1, mmapstress10/27706, ffff880
3a41ef0a0
Pid: 27706, comm: mmapstress10 Not tainted 2.6.28-git12-7c
99bf20 #1
Call Trace:
[<ffffffff803687ba>] _raw_spin_lock+0xfb/0x122
[<ffffffff804d83b7>] _spin_lock+0x4e/0x5f
[<ffffffff8026f999>] res_counter_uncharge+0x2a/0x70
[<ffffffff8026f999>] res_counter_uncharge+0x2a/0x70
[<ffffffff802a5ddc>] swap_info_get+0x6a/0xa3
[<ffffffff802b72a2>] mem_cgroup_uncharge_swap+0x2a/0x35
[<ffffffff802a6059>] swap_entry_free+0x8f/0x93
[<ffffffff802a6076>] swap_free+0x19/0x28
[<ffffffff802a572d>] delete_from_swap_cache+0x36/0x43
[<ffffffff802a6be9>] free_swap_and_cache+0xb1/0xeb
[<ffffffff80299e77>] unmap_vmas+0x57f/0x837
[<ffffffff8029e426>] exit_mmap+0xa5/0x11c
[<ffffffff80239f78>] mmput+0x41/0x9f
[<ffffffff8023ddeb>] exit_mm+0x102/0x10d
[<ffffffff8023f36a>] do_exit+0x1a2/0x73e
[<ffffffff80246317>] __dequeue_signal+0x15/0x11c
[<ffffffff8023f979>] do_group_exit+0x73/0xa5
[<ffffffff8024870a>] get_signal_to_deliver+0x34f/0x3a1
[<ffffffff8020b212>] do_notify_resume+0x8c/0x7a5
[<ffffffff80250f64>] lock_hrtimer_base+0x1b/0x3c
[<ffffffff8025474e>] getnstimeofday+0x57/0xb6
[<ffffffff80251314>] ktime_get_ts+0x22/0x4b
[<ffffffff802513bf>] ktime_get+0xc/0x41
[<ffffffff802511fa>] hrtimer_nanosleep+0xa5/0xf1
[<ffffffff80250d24>] hrtimer_wakeup+0x0/0x22
[<ffffffff8020bf58>] sysret_signal+0x7c/0xcb
===

This context has hold swap_lock already, so other contexts trying to hold
swap_lock also get spinlock lockup bug.

B: general protection fault
===
general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu15/cache/index1/shared_cpu_map
CPU 3
Modules linked in: ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp ipv6
autofs4 hidp rfcomm l2cap bluetooth sunrpc dm_mirror dm_region_hash dm_log dm_multipath dm
_mod sbs sbshc battery ac lp sg rtc_cmos rtc_core ide_cd_mod parport_pc rtc_lib parport se
rio_raw cdrom acpi_memhotplug button e1000 i2c_i801 i2c_core shpchp pcspkr ata_piix libata
megaraid_mbox megaraid_mm sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd [last unloa
ded: microcode]
Pid: 8051, comm: bash Not tainted 2.6.29-rc1-0ed85935 #1
RIP: 0010:[<ffffffff80368620>] [<ffffffff80368620>] _raw_spin_trylock+0x0/0x39
RSP: 0000:ffff8800bb995e00 EFLAGS: 00010092
RAX: ffff88010b54a620 RBX: 0097040900455377 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0097040900455377
RBP: 009704090045538f R08: 0000000000000002 R09: 0000000000000001
R10: ffffe2000c861640 R11: ffffffff8026f9b5 R12: 0000000000000296
R13: 0000000000001000 R14: ffff8801003cf080 R15: 00007fa79a932028
FS: 00007fa79a9316f0(0000) GS:ffff8803af7d7a80(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fa79a932028 CR3: 00000000cc8e0000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process bash (pid: 8051, threadinfo ffff8800bb994000, task ffff88010b54a620)
Stack:
ffffffff804d8f1e ffffffff8026f9b5 0097040900455377 0097040900455357
ffffffff8026f9b5 0000000000000282 ffff88010e1a2000 ffffe2000c861640
ffff8801094f9000 0000000000000000 ffffffff802b7371 00000001ed3e8025
Call Trace:
[<ffffffff804d8f1e>] ? _spin_lock+0x35/0x5f
[<ffffffff8026f9b5>] ? res_counter_uncharge+0x2a/0x70
[<ffffffff8026f9b5>] ? res_counter_uncharge+0x2a/0x70
[<ffffffff802b7371>] ? mem_cgroup_commit_charge_swapin+0x74/0x8a
[<ffffffff8029ad00>] ? handle_mm_fault+0x5e3/0x750
[<ffffffff804db70a>] ? do_page_fault+0x3b2/0x73e
[<ffffffff804d96ef>] ? page_fault+0x1f/0x30
Code: 31 c0 e8 5f 4a ed ff 48 c7 c7 da df 5c 80 31 c0 e8 51 4a ed ff c7 05 cc d5 33 00 01
00 00 00 c7 05 62 c7 de 00 00 00 00 00 58 c3 <0f> b7 07 38 e0 8d 88 00 01 00 00 75 05 f0 6
6 0f b1 0f 0f 94 c1
RIP [<ffffffff80368620>] _raw_spin_trylock+0x0/0x39
RSP <ffff8800bb995e00>
---[ end trace 1ecf768aff114688 ]---
===


> To handle the problem "parent may be obsolete",
>
> call mem_cgroup_get(parent) at create()
> call mem_cgroup_put(parent) at freeing memcg.
> (regardless of use_hierarchy.)
>
> is clearer way to go, I think.
>
> I wonder whether there is mis-accounting problem or not..
>
> So, adding css_tryget() around problematic code can be a fix.
> --
> mem = swap_cgroup_record();
> if (css_tryget(&mem->css)) {
> res_counter_uncharge(&mem->memsw, PAZE_SIZE);
> css_put(&mem->css)
> }
> --
> I like css_tryget() rather than mem_cgroup_obsolete().
I agree.
The updated version is attached.


Thanks,
Daisuke nishimura.

> To be honest, I'd like to remove memcg special stuff when I can.
>
> Thanks,
> -Kame
>
===
From: Daisuke Nishimura <[email protected]>

mem_cgroup_get ensures that the memcg that has been got can be accessed
even after the directory has been removed, but it doesn't ensure that parents
of it can be accessed: parents might have been freed already by rmdir.

This causes a bug in case of use_hierarchy==1, because res_counter_uncharge
climb up the tree.

Check if the memcg is obsolete by css_tryget, and don't call
res_counter_uncharge when obsole.

Signed-off-by: Daisuke Nishimura <[email protected]>
---
mm/memcontrol.c | 15 ++++++++++++---
1 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fb62b43..4e3b100 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1182,7 +1182,10 @@ int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
/* avoid double counting */
mem = swap_cgroup_record(ent, NULL);
if (mem) {
- res_counter_uncharge(&mem->memsw, PAGE_SIZE);
+ if (!css_tryget(&mem->css)) {
+ res_counter_uncharge(&mem->memsw, PAGE_SIZE);
+ css_put(&mem->css);
+ }
mem_cgroup_put(mem);
}
}
@@ -1252,7 +1255,10 @@ void mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr)
struct mem_cgroup *memcg;
memcg = swap_cgroup_record(ent, NULL);
if (memcg) {
- res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
+ if (!css_tryget(&memcg->css)) {
+ res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
+ css_put(&memcg->css);
+ }
mem_cgroup_put(memcg);
}

@@ -1397,7 +1403,10 @@ void mem_cgroup_uncharge_swap(swp_entry_t ent)

memcg = swap_cgroup_record(ent, NULL);
if (memcg) {
- res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
+ if (!css_tryget(&memcg->css)) {
+ res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
+ css_put(&memcg->css);
+ }
mem_cgroup_put(memcg);
}
}

2009-01-15 02:02:03

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [RFC][PATCH 5/4] memcg: don't call res_counter_uncharge when obsolete

On Thu, 15 Jan 2009 10:03:30 +0900
Daisuke Nishimura <[email protected]> wrote:

> On Wed, 14 Jan 2009 22:43:05 +0900 (JST), "KAMEZAWA Hiroyuki" <[email protected]> wrote:
> > Daisuke Nishimura さんは書きました:
> > > This is a new one. Please review.
> > >
> > > ===
> > > From: Daisuke Nishimura <[email protected]>
> > >
> > > mem_cgroup_get ensures that the memcg that has been got can be accessed
> > > even after the directory has been removed, but it doesn't ensure that
> > > parents
> > > of it can be accessed: parents might have been freed already by rmdir.
> > >
> > > This causes a bug in case of use_hierarchy==1, because
> > > res_counter_uncharge
> > > climb up the tree.
> > >
> > > Check if the memcg is obsolete, and don't call res_counter_uncharge when
> > > obsole.
> > >
> > Hmm, did you see panic ?
> I saw 2 types of bugs, A: spinlock lockup and B: general protection fault.
> (described below)
>
> Those bugs happened in case of (use_hierarchy && do_swap_account),
> and didn't happen (at leaset I haven't seen) in case of
> (!use_hierarchy && do_swap_account) nor (use_hierarchy && !do_swap_account).
> And, they didn't happen with this patch applied all through the last night.
>
> A: spinlock lockup
> ===
> BUG: spinlock lockup on CPU#1, mmapstress10/27706, ffff880
> 3a41ef0a0
> Pid: 27706, comm: mmapstress10 Not tainted 2.6.28-git12-7c
> 99bf20 #1
> Call Trace:
> [<ffffffff803687ba>] _raw_spin_lock+0xfb/0x122
> [<ffffffff804d83b7>] _spin_lock+0x4e/0x5f
> [<ffffffff8026f999>] res_counter_uncharge+0x2a/0x70
> [<ffffffff8026f999>] res_counter_uncharge+0x2a/0x70
> [<ffffffff802a5ddc>] swap_info_get+0x6a/0xa3
> [<ffffffff802b72a2>] mem_cgroup_uncharge_swap+0x2a/0x35
> [<ffffffff802a6059>] swap_entry_free+0x8f/0x93
> [<ffffffff802a6076>] swap_free+0x19/0x28
> [<ffffffff802a572d>] delete_from_swap_cache+0x36/0x43
> [<ffffffff802a6be9>] free_swap_and_cache+0xb1/0xeb
> [<ffffffff80299e77>] unmap_vmas+0x57f/0x837
> [<ffffffff8029e426>] exit_mmap+0xa5/0x11c
> [<ffffffff80239f78>] mmput+0x41/0x9f
> [<ffffffff8023ddeb>] exit_mm+0x102/0x10d
> [<ffffffff8023f36a>] do_exit+0x1a2/0x73e
> [<ffffffff80246317>] __dequeue_signal+0x15/0x11c
> [<ffffffff8023f979>] do_group_exit+0x73/0xa5
> [<ffffffff8024870a>] get_signal_to_deliver+0x34f/0x3a1
> [<ffffffff8020b212>] do_notify_resume+0x8c/0x7a5
> [<ffffffff80250f64>] lock_hrtimer_base+0x1b/0x3c
> [<ffffffff8025474e>] getnstimeofday+0x57/0xb6
> [<ffffffff80251314>] ktime_get_ts+0x22/0x4b
> [<ffffffff802513bf>] ktime_get+0xc/0x41
> [<ffffffff802511fa>] hrtimer_nanosleep+0xa5/0xf1
> [<ffffffff80250d24>] hrtimer_wakeup+0x0/0x22
> [<ffffffff8020bf58>] sysret_signal+0x7c/0xcb
> ===
>
> This context has hold swap_lock already, so other contexts trying to hold
> swap_lock also get spinlock lockup bug.
>
> B: general protection fault
> ===
> general protection fault: 0000 [#1] SMP
> last sysfs file: /sys/devices/system/cpu/cpu15/cache/index1/shared_cpu_map
> CPU 3
> Modules linked in: ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp ipv6
> autofs4 hidp rfcomm l2cap bluetooth sunrpc dm_mirror dm_region_hash dm_log dm_multipath dm
> _mod sbs sbshc battery ac lp sg rtc_cmos rtc_core ide_cd_mod parport_pc rtc_lib parport se
> rio_raw cdrom acpi_memhotplug button e1000 i2c_i801 i2c_core shpchp pcspkr ata_piix libata
> megaraid_mbox megaraid_mm sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd [last unloa
> ded: microcode]
> Pid: 8051, comm: bash Not tainted 2.6.29-rc1-0ed85935 #1
> RIP: 0010:[<ffffffff80368620>] [<ffffffff80368620>] _raw_spin_trylock+0x0/0x39
> RSP: 0000:ffff8800bb995e00 EFLAGS: 00010092
> RAX: ffff88010b54a620 RBX: 0097040900455377 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0097040900455377
> RBP: 009704090045538f R08: 0000000000000002 R09: 0000000000000001
> R10: ffffe2000c861640 R11: ffffffff8026f9b5 R12: 0000000000000296
> R13: 0000000000001000 R14: ffff8801003cf080 R15: 00007fa79a932028
> FS: 00007fa79a9316f0(0000) GS:ffff8803af7d7a80(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007fa79a932028 CR3: 00000000cc8e0000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process bash (pid: 8051, threadinfo ffff8800bb994000, task ffff88010b54a620)
> Stack:
> ffffffff804d8f1e ffffffff8026f9b5 0097040900455377 0097040900455357
> ffffffff8026f9b5 0000000000000282 ffff88010e1a2000 ffffe2000c861640
> ffff8801094f9000 0000000000000000 ffffffff802b7371 00000001ed3e8025
> Call Trace:
> [<ffffffff804d8f1e>] ? _spin_lock+0x35/0x5f
> [<ffffffff8026f9b5>] ? res_counter_uncharge+0x2a/0x70
> [<ffffffff8026f9b5>] ? res_counter_uncharge+0x2a/0x70
> [<ffffffff802b7371>] ? mem_cgroup_commit_charge_swapin+0x74/0x8a
> [<ffffffff8029ad00>] ? handle_mm_fault+0x5e3/0x750
> [<ffffffff804db70a>] ? do_page_fault+0x3b2/0x73e
> [<ffffffff804d96ef>] ? page_fault+0x1f/0x30
> Code: 31 c0 e8 5f 4a ed ff 48 c7 c7 da df 5c 80 31 c0 e8 51 4a ed ff c7 05 cc d5 33 00 01
> 00 00 00 c7 05 62 c7 de 00 00 00 00 00 58 c3 <0f> b7 07 38 e0 8d 88 00 01 00 00 75 05 f0 6
> 6 0f b1 0f 0f 94 c1
> RIP [<ffffffff80368620>] _raw_spin_trylock+0x0/0x39
> RSP <ffff8800bb995e00>
> ---[ end trace 1ecf768aff114688 ]---
> ===
>
>
> > To handle the problem "parent may be obsolete",
> >
> > call mem_cgroup_get(parent) at create()
> > call mem_cgroup_put(parent) at freeing memcg.
> > (regardless of use_hierarchy.)
> >
> > is clearer way to go, I think.
> >
> > I wonder whether there is mis-accounting problem or not..
> >
> > So, adding css_tryget() around problematic code can be a fix.
> > --
> > mem = swap_cgroup_record();
> > if (css_tryget(&mem->css)) {
> > res_counter_uncharge(&mem->memsw, PAZE_SIZE);
> > css_put(&mem->css)
> > }
> > --
> > I like css_tryget() rather than mem_cgroup_obsolete().
> I agree.
> The updated version is attached.
>
>
> Thanks,
> Daisuke nishimura.
>
> > To be honest, I'd like to remove memcg special stuff when I can.
> >
> > Thanks,
> > -Kame
> >
> ===
> From: Daisuke Nishimura <[email protected]>
>
> mem_cgroup_get ensures that the memcg that has been got can be accessed
> even after the directory has been removed, but it doesn't ensure that parents
> of it can be accessed: parents might have been freed already by rmdir.
>
> This causes a bug in case of use_hierarchy==1, because res_counter_uncharge
> climb up the tree.
>
> Check if the memcg is obsolete by css_tryget, and don't call
> res_counter_uncharge when obsole.
>
> Signed-off-by: Daisuke Nishimura <[email protected]>
seems nice loock.


> ---
> mm/memcontrol.c | 15 ++++++++++++---
> 1 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index fb62b43..4e3b100 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1182,7 +1182,10 @@ int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
> /* avoid double counting */
> mem = swap_cgroup_record(ent, NULL);
> if (mem) {
> - res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> + if (!css_tryget(&mem->css)) {
> + res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> + css_put(&mem->css);
> + }
> mem_cgroup_put(mem);
> }
> }

I think css_tryget() returns "ture" at success....

So,
==
if (mem && css_tryget(&mem->css))
res_counter....

is correct.

-Kame


> @@ -1252,7 +1255,10 @@ void mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr)
> struct mem_cgroup *memcg;
> memcg = swap_cgroup_record(ent, NULL);
> if (memcg) {
> - res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> + if (!css_tryget(&memcg->css)) {
> + res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> + css_put(&memcg->css);
> + }
> mem_cgroup_put(memcg);
> }
>
> @@ -1397,7 +1403,10 @@ void mem_cgroup_uncharge_swap(swp_entry_t ent)
>
> memcg = swap_cgroup_record(ent, NULL);
> if (memcg) {
> - res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> + if (!css_tryget(&memcg->css)) {
> + res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> + css_put(&memcg->css);
> + }
> mem_cgroup_put(memcg);
> }
> }
>

2009-01-15 02:30:53

by Daisuke Nishimura

[permalink] [raw]
Subject: Re: [RFC][PATCH 5/4] memcg: don't call res_counter_uncharge when obsolete

> > > To handle the problem "parent may be obsolete",
> > >
> > > call mem_cgroup_get(parent) at create()
> > > call mem_cgroup_put(parent) at freeing memcg.
> > > (regardless of use_hierarchy.)
> > >
> > > is clearer way to go, I think.
> > >
> > > I wonder whether there is mis-accounting problem or not..
> > >
> > > So, adding css_tryget() around problematic code can be a fix.
> > > --
> > > mem = swap_cgroup_record();
> > > if (css_tryget(&mem->css)) {
> > > res_counter_uncharge(&mem->memsw, PAZE_SIZE);
> > > css_put(&mem->css)
> > > }
> > > --
> > > I like css_tryget() rather than mem_cgroup_obsolete().
> > I agree.
> > The updated version is attached.
> >
> >
> > Thanks,
> > Daisuke nishimura.
> >
> > > To be honest, I'd like to remove memcg special stuff when I can.
> > >
> > > Thanks,
> > > -Kame
> > >
> > ===
> > From: Daisuke Nishimura <[email protected]>
> >
> > mem_cgroup_get ensures that the memcg that has been got can be accessed
> > even after the directory has been removed, but it doesn't ensure that parents
> > of it can be accessed: parents might have been freed already by rmdir.
> >
> > This causes a bug in case of use_hierarchy==1, because res_counter_uncharge
> > climb up the tree.
> >
> > Check if the memcg is obsolete by css_tryget, and don't call
> > res_counter_uncharge when obsole.
> >
> > Signed-off-by: Daisuke Nishimura <[email protected]>
> seems nice loock.
>
>
> > ---
> > mm/memcontrol.c | 15 ++++++++++++---
> > 1 files changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index fb62b43..4e3b100 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1182,7 +1182,10 @@ int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
> > /* avoid double counting */
> > mem = swap_cgroup_record(ent, NULL);
> > if (mem) {
> > - res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> > + if (!css_tryget(&mem->css)) {
> > + res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> > + css_put(&mem->css);
> > + }
> > mem_cgroup_put(mem);
> > }
> > }
>
> I think css_tryget() returns "ture" at success....
>
> So,
> ==
> if (mem && css_tryget(&mem->css))
> res_counter....
>
> is correct.
>
> -Kame
>
Ooops! you are right.
Sorry for my silly mistake..

"mem" is checked beforehand, so I think css_tryget would be enough.
I'm now testing the attached one.


Thanks,
Daisuke Nishimura.
===
From: Daisuke Nishimura <[email protected]>

mem_cgroup_get ensures that the memcg that has been got can be accessed
even after the directory has been removed, but it doesn't ensure that parents
of it can be accessed: parents might have been freed already by rmdir.

This causes a bug in case of use_hierarchy==1, because res_counter_uncharge
climb up the tree.

Check if the memcg is obsolete by css_tryget, and don't call
res_counter_uncharge when obsole.

Signed-off-by: Daisuke Nishimura <[email protected]>
---
mm/memcontrol.c | 15 ++++++++++++---
1 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fb62b43..b9d5271 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1182,7 +1182,10 @@ int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
/* avoid double counting */
mem = swap_cgroup_record(ent, NULL);
if (mem) {
- res_counter_uncharge(&mem->memsw, PAGE_SIZE);
+ if (css_tryget(&mem->css)) {
+ res_counter_uncharge(&mem->memsw, PAGE_SIZE);
+ css_put(&mem->css);
+ }
mem_cgroup_put(mem);
}
}
@@ -1252,7 +1255,10 @@ void mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr)
struct mem_cgroup *memcg;
memcg = swap_cgroup_record(ent, NULL);
if (memcg) {
- res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
+ if (css_tryget(&memcg->css)) {
+ res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
+ css_put(&memcg->css);
+ }
mem_cgroup_put(memcg);
}

@@ -1397,7 +1403,10 @@ void mem_cgroup_uncharge_swap(swp_entry_t ent)

memcg = swap_cgroup_record(ent, NULL);
if (memcg) {
- res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
+ if (css_tryget(&memcg->css)) {
+ res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
+ css_put(&memcg->css);
+ }
mem_cgroup_put(memcg);
}
}

2009-01-15 02:50:12

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [RFC][PATCH 5/4] memcg: don't call res_counter_uncharge when obsolete

On Wed, 14 Jan 2009 19:25:39 +0530
Balbir Singh <[email protected]> wrote:

> * Daisuke Nishimura <[email protected]> [2009-01-14 17:51:21]:
>
> > This is a new one. Please review.
> >
> > ===
> > From: Daisuke Nishimura <[email protected]>
> >
> > mem_cgroup_get ensures that the memcg that has been got can be accessed
> > even after the directory has been removed, but it doesn't ensure that parents
> > of it can be accessed: parents might have been freed already by rmdir.
> >
> > This causes a bug in case of use_hierarchy==1, because res_counter_uncharge
> > climb up the tree.
> >
> > Check if the memcg is obsolete, and don't call res_counter_uncharge when obsole.
> >
> > Signed-off-by: Daisuke Nishimura <[email protected]>
>
> I liked the earlier, EBUSY approach that ensured that parents could
> not go away if children exist. IMHO, the code has gotten too complex
> and has too many corner cases. Time to revisit it.
>
It's on my plan.
I'll fix this by CSS-ID patch.

When I recored memcg's CSS-ID to swap_cgroup (not pointer to memcg),
we never need memcg's refcnt.

Thanks,
-Kame

2009-01-15 03:09:45

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [RFC][PATCH 5/4] memcg: don't call res_counter_uncharge when obsolete

On Thu, 15 Jan 2009 11:48:46 +0900
KAMEZAWA Hiroyuki <[email protected]> wrote:

> On Wed, 14 Jan 2009 19:25:39 +0530
> Balbir Singh <[email protected]> wrote:
> > I liked the earlier, EBUSY approach that ensured that parents could
> > not go away if children exist. IMHO, the code has gotten too complex
> > and has too many corner cases. Time to revisit it.
> >
> It's on my plan.
> I'll fix this by CSS-ID patch.
>
> When I recored memcg's CSS-ID to swap_cgroup (not pointer to memcg),
> we never need memcg's refcnt.
>
Sorry, ignore this. refcnt in memcg is necessary anyway to prevent reuse of ID.

-Kame

2009-01-15 03:25:35

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [RFC][PATCH 5/4] memcg: don't call res_counter_uncharge when obsolete

On Wed, 14 Jan 2009 19:25:39 +0530
Balbir Singh <[email protected]> wrote:

> * Daisuke Nishimura <[email protected]> [2009-01-14 17:51:21]:
>
> > This is a new one. Please review.
> >
> > ===
> > From: Daisuke Nishimura <[email protected]>
> >
> > mem_cgroup_get ensures that the memcg that has been got can be accessed
> > even after the directory has been removed, but it doesn't ensure that parents
> > of it can be accessed: parents might have been freed already by rmdir.
> >
> > This causes a bug in case of use_hierarchy==1, because res_counter_uncharge
> > climb up the tree.
> >
> > Check if the memcg is obsolete, and don't call res_counter_uncharge when obsole.
> >
> > Signed-off-by: Daisuke Nishimura <[email protected]>
>
> I liked the earlier, EBUSY approach that ensured that parents could
> not go away if children exist. IMHO, the code has gotten too complex
> and has too many corner cases. Time to revisit it.
>

But I don't like -EBUSY ;)

When rmdir() returns -EBUSY even if there are no (visible) children and tasks,
our customer will take kdump and send it to me "please explain this kernel bug"

I'm sure it will happen ;)

-Kame



> --
> Balbir
>

2009-01-15 04:18:07

by Balbir Singh

[permalink] [raw]
Subject: Re: [RFC][PATCH 5/4] memcg: don't call res_counter_uncharge when obsolete

* KAMEZAWA Hiroyuki <[email protected]> [2009-01-15 12:24:16]:

> On Wed, 14 Jan 2009 19:25:39 +0530
> Balbir Singh <[email protected]> wrote:
>
> > * Daisuke Nishimura <[email protected]> [2009-01-14 17:51:21]:
> >
> > > This is a new one. Please review.
> > >
> > > ===
> > > From: Daisuke Nishimura <[email protected]>
> > >
> > > mem_cgroup_get ensures that the memcg that has been got can be accessed
> > > even after the directory has been removed, but it doesn't ensure that parents
> > > of it can be accessed: parents might have been freed already by rmdir.
> > >
> > > This causes a bug in case of use_hierarchy==1, because res_counter_uncharge
> > > climb up the tree.
> > >
> > > Check if the memcg is obsolete, and don't call res_counter_uncharge when obsole.
> > >
> > > Signed-off-by: Daisuke Nishimura <[email protected]>
> >
> > I liked the earlier, EBUSY approach that ensured that parents could
> > not go away if children exist. IMHO, the code has gotten too complex
> > and has too many corner cases. Time to revisit it.
> >
>
> But I don't like -EBUSY ;)
>
> When rmdir() returns -EBUSY even if there are no (visible) children and tasks,
> our customer will take kdump and send it to me "please explain this kernel bug"
>
> I'm sure it will happen ;)
>

OK, but memory.stat can show why the group is busy and with
move_to_parent() such issues should not occur right? I'll relook at
the code. Thanks for your input.

--
Balbir

2009-01-15 04:42:32

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [RFC][PATCH 5/4] memcg: don't call res_counter_uncharge when obsolete

On Thu, 15 Jan 2009 09:47:50 +0530
Balbir Singh <[email protected]> wrote:

> * KAMEZAWA Hiroyuki <[email protected]> [2009-01-15 12:24:16]:

> > But I don't like -EBUSY ;)
> >
> > When rmdir() returns -EBUSY even if there are no (visible) children and tasks,
> > our customer will take kdump and send it to me "please explain this kernel bug"
> >
> > I'm sure it will happen ;)
> >
>
> OK, but memory.stat can show why the group is busy and with
> move_to_parent() such issues should not occur right? I'll relook at
> the code. Thanks for your input.
>

There was a design choice at swap_cgroup.

At rmdir, there may be used swap entry in memcg. (mem->memsw.usage can be > 0)
1. update all records in swap cgroup
2. just ignore account from swap, we can treat then at swap-in.

I implemented "2" by refcnt.

To do "1", we have to scan all used swap_cgroup but I don't want to scan all
swap_cgroup entry at rmdir. It's heavy job.
(*) To reduce memory usage by swap_cgroup, swap_cgroup just have a pointer to memcg
(**) I implemented swap_cgroup as statically allocated array because I don't want
any dynamic memory allocation at swap-out and want to avoid unnecessary memory
usage.

Thanks,
-Kame

2009-01-15 04:46:18

by Balbir Singh

[permalink] [raw]
Subject: Re: [RFC][PATCH 5/4] memcg: don't call res_counter_uncharge when obsolete

* KAMEZAWA Hiroyuki <[email protected]> [2009-01-15 13:41:14]:

> On Thu, 15 Jan 2009 09:47:50 +0530
> Balbir Singh <[email protected]> wrote:
>
> > * KAMEZAWA Hiroyuki <[email protected]> [2009-01-15 12:24:16]:
>
> > > But I don't like -EBUSY ;)
> > >
> > > When rmdir() returns -EBUSY even if there are no (visible) children and tasks,
> > > our customer will take kdump and send it to me "please explain this kernel bug"
> > >
> > > I'm sure it will happen ;)
> > >
> >
> > OK, but memory.stat can show why the group is busy and with
> > move_to_parent() such issues should not occur right? I'll relook at
> > the code. Thanks for your input.
> >
>
> There was a design choice at swap_cgroup.
>
> At rmdir, there may be used swap entry in memcg. (mem->memsw.usage can be > 0)
> 1. update all records in swap cgroup
> 2. just ignore account from swap, we can treat then at swap-in.
>
> I implemented "2" by refcnt.
>
> To do "1", we have to scan all used swap_cgroup but I don't want to scan all
> swap_cgroup entry at rmdir. It's heavy job.
> (*) To reduce memory usage by swap_cgroup, swap_cgroup just have a pointer to memcg
> (**) I implemented swap_cgroup as statically allocated array because I don't want
> any dynamic memory allocation at swap-out and want to avoid unnecessary memory
> usage.

Fair enough, but I don't like that we don't have any checks for

If parent still has children, parent should not go away. The
problem that Daisuke-San is seeing.

--
Balbir

2009-01-15 04:53:40

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [RFC][PATCH 5/4] memcg: don't call res_counter_uncharge when obsolete

On Thu, 15 Jan 2009 09:47:50 +0530
Balbir Singh <[email protected]> wrote:

> * KAMEZAWA Hiroyuki <[email protected]> [2009-01-15 12:24:16]:
>
> > On Wed, 14 Jan 2009 19:25:39 +0530
> > Balbir Singh <[email protected]> wrote:
> >
> > > * Daisuke Nishimura <[email protected]> [2009-01-14 17:51:21]:
> > >
> > > > This is a new one. Please review.
> > > >
> > > > ===
> > > > From: Daisuke Nishimura <[email protected]>
> > > >
> > > > mem_cgroup_get ensures that the memcg that has been got can be accessed
> > > > even after the directory has been removed, but it doesn't ensure that parents
> > > > of it can be accessed: parents might have been freed already by rmdir.
> > > >
> > > > This causes a bug in case of use_hierarchy==1, because res_counter_uncharge
> > > > climb up the tree.
> > > >
> > > > Check if the memcg is obsolete, and don't call res_counter_uncharge when obsole.
> > > >
> > > > Signed-off-by: Daisuke Nishimura <[email protected]>
> > >
> > > I liked the earlier, EBUSY approach that ensured that parents could
> > > not go away if children exist. IMHO, the code has gotten too complex
> > > and has too many corner cases. Time to revisit it.
> > >
> >
> > But I don't like -EBUSY ;)
> >
> > When rmdir() returns -EBUSY even if there are no (visible) children and tasks,
> > our customer will take kdump and send it to me "please explain this kernel bug"
> >
> > I'm sure it will happen ;)
> >
>
> OK, but memory.stat can show why the group is busy and with
> move_to_parent() such issues should not occur right? I'll relook at
> the code. Thanks for your input.
>
Write a shell script as following ?
==
TASKS=`cat /xxx/xxx/xxx/tasks`
if [ -n $TASKS ]; then
echo "there is alive tasks in group /xxx/xxx/xxx/"
fi

rmdir /xxx/xxx/xxx/
CODE=$?
if [ $CODE = EBUSY ]; then
investigate why....
fi
==
I don't want.

I think rmdir() should succeed everywhen "there are no tasks and children".
And that can be done.

With Paul's suggestion, I'll add wait_queue for rmdir of cgroup.

-Kame

2009-01-15 04:55:31

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [RFC][PATCH 5/4] memcg: don't call res_counter_uncharge when obsolete

On Thu, 15 Jan 2009 10:15:57 +0530
Balbir Singh <[email protected]> wrote:

> * KAMEZAWA Hiroyuki <[email protected]> [2009-01-15 13:41:14]:
>
> > On Thu, 15 Jan 2009 09:47:50 +0530
> > Balbir Singh <[email protected]> wrote:
> >
> > > * KAMEZAWA Hiroyuki <[email protected]> [2009-01-15 12:24:16]:
> >
> > > > But I don't like -EBUSY ;)
> > > >
> > > > When rmdir() returns -EBUSY even if there are no (visible) children and tasks,
> > > > our customer will take kdump and send it to me "please explain this kernel bug"
> > > >
> > > > I'm sure it will happen ;)
> > > >
> > >
> > > OK, but memory.stat can show why the group is busy and with
> > > move_to_parent() such issues should not occur right? I'll relook at
> > > the code. Thanks for your input.
> > >
> >
> > There was a design choice at swap_cgroup.
> >
> > At rmdir, there may be used swap entry in memcg. (mem->memsw.usage can be > 0)
> > 1. update all records in swap cgroup
> > 2. just ignore account from swap, we can treat then at swap-in.
> >
> > I implemented "2" by refcnt.
> >
> > To do "1", we have to scan all used swap_cgroup but I don't want to scan all
> > swap_cgroup entry at rmdir. It's heavy job.
> > (*) To reduce memory usage by swap_cgroup, swap_cgroup just have a pointer to memcg
> > (**) I implemented swap_cgroup as statically allocated array because I don't want
> > any dynamic memory allocation at swap-out and want to avoid unnecessary memory
> > usage.
>
> Fair enough, but I don't like that we don't have any checks for
>
> If parent still has children, parent should not go away. The
> problem that Daisuke-San is seeing.
>
The parent has "no children" in cgroup layer. It just has children in
"memory subsys" layer.

It's better to add mem_cgroup_get()/put agaisnt the parent at create()/destroy().
I'm now preparing a patch.

-Kame


> --
> Balbir
>

2009-01-15 05:04:08

by Daisuke Nishimura

[permalink] [raw]
Subject: Re: [RFC][PATCH 5/4] memcg: don't call res_counter_uncharge when obsolete

On Thu, 15 Jan 2009 11:14:20 +0900, Daisuke Nishimura <[email protected]> wrote:
> > > > To handle the problem "parent may be obsolete",
> > > >
> > > > call mem_cgroup_get(parent) at create()
> > > > call mem_cgroup_put(parent) at freeing memcg.
> > > > (regardless of use_hierarchy.)
> > > >
> > > > is clearer way to go, I think.
> > > >
> > > > I wonder whether there is mis-accounting problem or not..
> > > >
hmm, after more consideration, although this patch can prevent the BUG,
it can leak memsw accounting of parents because memsw of parents, which
have been incremented by charge, does not decremented.

I'll try pet/put parent approach..
Or any other good ideas ?


Thanks,
Daisuke Nishimura.

> > > > So, adding css_tryget() around problematic code can be a fix.
> > > > --
> > > > mem = swap_cgroup_record();
> > > > if (css_tryget(&mem->css)) {
> > > > res_counter_uncharge(&mem->memsw, PAZE_SIZE);
> > > > css_put(&mem->css)
> > > > }
> > > > --
> > > > I like css_tryget() rather than mem_cgroup_obsolete().
> > > I agree.
> > > The updated version is attached.
> > >
> > >
> > > Thanks,
> > > Daisuke nishimura.
> > >
> > > > To be honest, I'd like to remove memcg special stuff when I can.
> > > >
> > > > Thanks,
> > > > -Kame
> > > >
> > > ===
> > > From: Daisuke Nishimura <[email protected]>
> > >
> > > mem_cgroup_get ensures that the memcg that has been got can be accessed
> > > even after the directory has been removed, but it doesn't ensure that parents
> > > of it can be accessed: parents might have been freed already by rmdir.
> > >
> > > This causes a bug in case of use_hierarchy==1, because res_counter_uncharge
> > > climb up the tree.
> > >
> > > Check if the memcg is obsolete by css_tryget, and don't call
> > > res_counter_uncharge when obsole.
> > >
> > > Signed-off-by: Daisuke Nishimura <[email protected]>
> > seems nice loock.
> >
> >
> > > ---
> > > mm/memcontrol.c | 15 ++++++++++++---
> > > 1 files changed, 12 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index fb62b43..4e3b100 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -1182,7 +1182,10 @@ int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
> > > /* avoid double counting */
> > > mem = swap_cgroup_record(ent, NULL);
> > > if (mem) {
> > > - res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> > > + if (!css_tryget(&mem->css)) {
> > > + res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> > > + css_put(&mem->css);
> > > + }
> > > mem_cgroup_put(mem);
> > > }
> > > }
> >
> > I think css_tryget() returns "ture" at success....
> >
> > So,
> > ==
> > if (mem && css_tryget(&mem->css))
> > res_counter....
> >
> > is correct.
> >
> > -Kame
> >
> Ooops! you are right.
> Sorry for my silly mistake..
>
> "mem" is checked beforehand, so I think css_tryget would be enough.
> I'm now testing the attached one.
>
>
> Thanks,
> Daisuke Nishimura.
> ===
> From: Daisuke Nishimura <[email protected]>
>
> mem_cgroup_get ensures that the memcg that has been got can be accessed
> even after the directory has been removed, but it doesn't ensure that parents
> of it can be accessed: parents might have been freed already by rmdir.
>
> This causes a bug in case of use_hierarchy==1, because res_counter_uncharge
> climb up the tree.
>
> Check if the memcg is obsolete by css_tryget, and don't call
> res_counter_uncharge when obsole.
>
> Signed-off-by: Daisuke Nishimura <[email protected]>
> ---
> mm/memcontrol.c | 15 ++++++++++++---
> 1 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index fb62b43..b9d5271 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1182,7 +1182,10 @@ int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
> /* avoid double counting */
> mem = swap_cgroup_record(ent, NULL);
> if (mem) {
> - res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> + if (css_tryget(&mem->css)) {
> + res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> + css_put(&mem->css);
> + }
> mem_cgroup_put(mem);
> }
> }
> @@ -1252,7 +1255,10 @@ void mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr)
> struct mem_cgroup *memcg;
> memcg = swap_cgroup_record(ent, NULL);
> if (memcg) {
> - res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> + if (css_tryget(&memcg->css)) {
> + res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> + css_put(&memcg->css);
> + }
> mem_cgroup_put(memcg);
> }
>
> @@ -1397,7 +1403,10 @@ void mem_cgroup_uncharge_swap(swp_entry_t ent)
>
> memcg = swap_cgroup_record(ent, NULL);
> if (memcg) {
> - res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> + if (css_tryget(&memcg->css)) {
> + res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> + css_put(&memcg->css);
> + }
> mem_cgroup_put(memcg);
> }
> }

2009-01-15 05:16:20

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [RFC][PATCH 5/4] memcg: don't call res_counter_uncharge when obsolete

On Thu, 15 Jan 2009 13:38:14 +0900
Daisuke Nishimura <[email protected]> wrote:

> On Thu, 15 Jan 2009 11:14:20 +0900, Daisuke Nishimura <[email protected]> wrote:
> > > > > To handle the problem "parent may be obsolete",
> > > > >
> > > > > call mem_cgroup_get(parent) at create()
> > > > > call mem_cgroup_put(parent) at freeing memcg.
> > > > > (regardless of use_hierarchy.)
> > > > >
> > > > > is clearer way to go, I think.
> > > > >
> > > > > I wonder whether there is mis-accounting problem or not..
> > > > >
> hmm, after more consideration, although this patch can prevent the BUG,
> it can leak memsw accounting of parents because memsw of parents, which
> have been incremented by charge, does not decremented.
>
> I'll try pet/put parent approach..
> Or any other good ideas ?
>
>
I believe get/put at create/destroy is enough now..
Let's try and see what happens.

Thanks,
-Kame


> Thanks,
> Daisuke Nishimura.
>
> > > > > So, adding css_tryget() around problematic code can be a fix.
> > > > > --
> > > > > mem = swap_cgroup_record();
> > > > > if (css_tryget(&mem->css)) {
> > > > > res_counter_uncharge(&mem->memsw, PAZE_SIZE);
> > > > > css_put(&mem->css)
> > > > > }
> > > > > --
> > > > > I like css_tryget() rather than mem_cgroup_obsolete().
> > > > I agree.
> > > > The updated version is attached.
> > > >
> > > >
> > > > Thanks,
> > > > Daisuke nishimura.
> > > >
> > > > > To be honest, I'd like to remove memcg special stuff when I can.
> > > > >
> > > > > Thanks,
> > > > > -Kame
> > > > >
> > > > ===
> > > > From: Daisuke Nishimura <[email protected]>
> > > >
> > > > mem_cgroup_get ensures that the memcg that has been got can be accessed
> > > > even after the directory has been removed, but it doesn't ensure that parents
> > > > of it can be accessed: parents might have been freed already by rmdir.
> > > >
> > > > This causes a bug in case of use_hierarchy==1, because res_counter_uncharge
> > > > climb up the tree.
> > > >
> > > > Check if the memcg is obsolete by css_tryget, and don't call
> > > > res_counter_uncharge when obsole.
> > > >
> > > > Signed-off-by: Daisuke Nishimura <[email protected]>
> > > seems nice loock.
> > >
> > >
> > > > ---
> > > > mm/memcontrol.c | 15 ++++++++++++---
> > > > 1 files changed, 12 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > index fb62b43..4e3b100 100644
> > > > --- a/mm/memcontrol.c
> > > > +++ b/mm/memcontrol.c
> > > > @@ -1182,7 +1182,10 @@ int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
> > > > /* avoid double counting */
> > > > mem = swap_cgroup_record(ent, NULL);
> > > > if (mem) {
> > > > - res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> > > > + if (!css_tryget(&mem->css)) {
> > > > + res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> > > > + css_put(&mem->css);
> > > > + }
> > > > mem_cgroup_put(mem);
> > > > }
> > > > }
> > >
> > > I think css_tryget() returns "ture" at success....
> > >
> > > So,
> > > ==
> > > if (mem && css_tryget(&mem->css))
> > > res_counter....
> > >
> > > is correct.
> > >
> > > -Kame
> > >
> > Ooops! you are right.
> > Sorry for my silly mistake..
> >
> > "mem" is checked beforehand, so I think css_tryget would be enough.
> > I'm now testing the attached one.
> >
> >
> > Thanks,
> > Daisuke Nishimura.
> > ===
> > From: Daisuke Nishimura <[email protected]>
> >
> > mem_cgroup_get ensures that the memcg that has been got can be accessed
> > even after the directory has been removed, but it doesn't ensure that parents
> > of it can be accessed: parents might have been freed already by rmdir.
> >
> > This causes a bug in case of use_hierarchy==1, because res_counter_uncharge
> > climb up the tree.
> >
> > Check if the memcg is obsolete by css_tryget, and don't call
> > res_counter_uncharge when obsole.
> >
> > Signed-off-by: Daisuke Nishimura <[email protected]>
> > ---
> > mm/memcontrol.c | 15 ++++++++++++---
> > 1 files changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index fb62b43..b9d5271 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1182,7 +1182,10 @@ int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
> > /* avoid double counting */
> > mem = swap_cgroup_record(ent, NULL);
> > if (mem) {
> > - res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> > + if (css_tryget(&mem->css)) {
> > + res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> > + css_put(&mem->css);
> > + }
> > mem_cgroup_put(mem);
> > }
> > }
> > @@ -1252,7 +1255,10 @@ void mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr)
> > struct mem_cgroup *memcg;
> > memcg = swap_cgroup_record(ent, NULL);
> > if (memcg) {
> > - res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> > + if (css_tryget(&memcg->css)) {
> > + res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> > + css_put(&memcg->css);
> > + }
> > mem_cgroup_put(memcg);
> > }
> >
> > @@ -1397,7 +1403,10 @@ void mem_cgroup_uncharge_swap(swp_entry_t ent)
> >
> > memcg = swap_cgroup_record(ent, NULL);
> > if (memcg) {
> > - res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> > + if (css_tryget(&memcg->css)) {
> > + res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> > + css_put(&memcg->css);
> > + }
> > mem_cgroup_put(memcg);
> > }
> > }
>

2009-01-15 05:25:36

by Balbir Singh

[permalink] [raw]
Subject: Re: [RFC][PATCH 5/4] memcg: don't call res_counter_uncharge when obsolete

* KAMEZAWA Hiroyuki <[email protected]> [2009-01-15 13:52:23]:

> On Thu, 15 Jan 2009 09:47:50 +0530
> Balbir Singh <[email protected]> wrote:
>
> > * KAMEZAWA Hiroyuki <[email protected]> [2009-01-15 12:24:16]:
> >
> > > On Wed, 14 Jan 2009 19:25:39 +0530
> > > Balbir Singh <[email protected]> wrote:
> > >
> > > > * Daisuke Nishimura <[email protected]> [2009-01-14 17:51:21]:
> > > >
> > > > > This is a new one. Please review.
> > > > >
> > > > > ===
> > > > > From: Daisuke Nishimura <[email protected]>
> > > > >
> > > > > mem_cgroup_get ensures that the memcg that has been got can be accessed
> > > > > even after the directory has been removed, but it doesn't ensure that parents
> > > > > of it can be accessed: parents might have been freed already by rmdir.
> > > > >
> > > > > This causes a bug in case of use_hierarchy==1, because res_counter_uncharge
> > > > > climb up the tree.
> > > > >
> > > > > Check if the memcg is obsolete, and don't call res_counter_uncharge when obsole.
> > > > >
> > > > > Signed-off-by: Daisuke Nishimura <[email protected]>
> > > >
> > > > I liked the earlier, EBUSY approach that ensured that parents could
> > > > not go away if children exist. IMHO, the code has gotten too complex
> > > > and has too many corner cases. Time to revisit it.
> > > >
> > >
> > > But I don't like -EBUSY ;)
> > >
> > > When rmdir() returns -EBUSY even if there are no (visible) children and tasks,
> > > our customer will take kdump and send it to me "please explain this kernel bug"
> > >
> > > I'm sure it will happen ;)
> > >
> >
> > OK, but memory.stat can show why the group is busy and with
> > move_to_parent() such issues should not occur right? I'll relook at
> > the code. Thanks for your input.
> >
> Write a shell script as following ?
> ==
> TASKS=`cat /xxx/xxx/xxx/tasks`
> if [ -n $TASKS ]; then
> echo "there is alive tasks in group /xxx/xxx/xxx/"
> fi
>
> rmdir /xxx/xxx/xxx/
> CODE=$?
> if [ $CODE = EBUSY ]; then
> investigate why....
> fi
> ==
> I don't want.
>

I agree with that.

> I think rmdir() should succeed everywhen "there are no tasks and children".
> And that can be done.
>

All I am saying is that let rmdir() fail if there are tasks or
children, which I suspect cgroup takes care of. The second thing to do would
be to add a mem_cgroup_get_hierarchical() and _put_hierarchical() API's so
that we can get references all the way up to the parents. My concern
is that not calling res_counter_uncharge() can lead to dangling
references and bad behaviour.

> With Paul's suggestion, I'll add wait_queue for rmdir of cgroup.
>

That might be a good idea and also a good idea to maintain the
hierarchy (since we will walk up when we do uncharge) until we know
that css reference count is down to 0.

--
Balbir

2009-01-15 05:28:43

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [RFC][PATCH 5/4] memcg: don't call res_counter_uncharge when obsolete

On Thu, 15 Jan 2009 10:47:17 +0530
Balbir Singh <[email protected]> wrote:

> * KAMEZAWA Hiroyuki <[email protected]> [2009-01-15 13:52:23]:

> > I think rmdir() should succeed everywhen "there are no tasks and children".
> > And that can be done.
> >
>
> All I am saying is that let rmdir() fail if there are tasks or
> children, which I suspect cgroup takes care of. The second thing to do would
> be to add a mem_cgroup_get_hierarchical() and _put_hierarchical() API's so
> that we can get references all the way up to the parents. My concern
> is that not calling res_counter_uncharge() can lead to dangling
> references and bad behaviour.
>
> > With Paul's suggestion, I'll add wait_queue for rmdir of cgroup.
> >
>
> That might be a good idea and also a good idea to maintain the
> hierarchy (since we will walk up when we do uncharge) until we know
> that css reference count is down to 0.
>
It seems Nishimura started his work in that direction. (see other mail.)
Let's wait a bit.

-Kame

2009-01-15 05:57:19

by Balbir Singh

[permalink] [raw]
Subject: Re: [RFC][PATCH 5/4] memcg: don't call res_counter_uncharge when obsolete

* KAMEZAWA Hiroyuki <[email protected]> [2009-01-15 14:14:58]:

> On Thu, 15 Jan 2009 13:38:14 +0900
> Daisuke Nishimura <[email protected]> wrote:
>
> > On Thu, 15 Jan 2009 11:14:20 +0900, Daisuke Nishimura <[email protected]> wrote:
> > > > > > To handle the problem "parent may be obsolete",
> > > > > >
> > > > > > call mem_cgroup_get(parent) at create()
> > > > > > call mem_cgroup_put(parent) at freeing memcg.
> > > > > > (regardless of use_hierarchy.)
> > > > > >
> > > > > > is clearer way to go, I think.
> > > > > >
> > > > > > I wonder whether there is mis-accounting problem or not..
> > > > > >
> > hmm, after more consideration, although this patch can prevent the BUG,
> > it can leak memsw accounting of parents because memsw of parents, which
> > have been incremented by charge, does not decremented.
> >
> > I'll try pet/put parent approach..
> > Or any other good ideas ?
> >
> >
> I believe get/put at create/destroy is enough now..
> Let's try and see what happens.
>

Other approach could be get_hierarchical/put_hierarchical, but that
can quickly get complex. Let us try and avoid it, unless nothing else
works.

--
Balbir

2009-01-15 06:40:49

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [RFC][PATCH 5/4] memcg: don't call res_counter_uncharge when obsolete

On Thu, 15 Jan 2009 13:38:14 +0900
Daisuke Nishimura <[email protected]> wrote:

> On Thu, 15 Jan 2009 11:14:20 +0900, Daisuke Nishimura <[email protected]> wrote:
> > > > > To handle the problem "parent may be obsolete",
> > > > >
> > > > > call mem_cgroup_get(parent) at create()
> > > > > call mem_cgroup_put(parent) at freeing memcg.
> > > > > (regardless of use_hierarchy.)
> > > > >
> > > > > is clearer way to go, I think.
> > > > >
> > > > > I wonder whether there is mis-accounting problem or not..
> > > > >
> hmm, after more consideration, although this patch can prevent the BUG,
> it can leak memsw accounting of parents because memsw of parents, which
> have been incremented by charge, does not decremented.
>
> I'll try pet/put parent approach..
> Or any other good ideas ?
>
"don't free parent's css_id until all chilredn are freed" is necessary for my
CSS ID. I'll rebase my patch on to yours.

thanks,
-Kame




>
> Thanks,
> Daisuke Nishimura.
>
> > > > > So, adding css_tryget() around problematic code can be a fix.
> > > > > --
> > > > > mem = swap_cgroup_record();
> > > > > if (css_tryget(&mem->css)) {
> > > > > res_counter_uncharge(&mem->memsw, PAZE_SIZE);
> > > > > css_put(&mem->css)
> > > > > }
> > > > > --
> > > > > I like css_tryget() rather than mem_cgroup_obsolete().
> > > > I agree.
> > > > The updated version is attached.
> > > >
> > > >
> > > > Thanks,
> > > > Daisuke nishimura.
> > > >
> > > > > To be honest, I'd like to remove memcg special stuff when I can.
> > > > >
> > > > > Thanks,
> > > > > -Kame
> > > > >
> > > > ===
> > > > From: Daisuke Nishimura <[email protected]>
> > > >
> > > > mem_cgroup_get ensures that the memcg that has been got can be accessed
> > > > even after the directory has been removed, but it doesn't ensure that parents
> > > > of it can be accessed: parents might have been freed already by rmdir.
> > > >
> > > > This causes a bug in case of use_hierarchy==1, because res_counter_uncharge
> > > > climb up the tree.
> > > >
> > > > Check if the memcg is obsolete by css_tryget, and don't call
> > > > res_counter_uncharge when obsole.
> > > >
> > > > Signed-off-by: Daisuke Nishimura <[email protected]>
> > > seems nice loock.
> > >
> > >
> > > > ---
> > > > mm/memcontrol.c | 15 ++++++++++++---
> > > > 1 files changed, 12 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > index fb62b43..4e3b100 100644
> > > > --- a/mm/memcontrol.c
> > > > +++ b/mm/memcontrol.c
> > > > @@ -1182,7 +1182,10 @@ int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
> > > > /* avoid double counting */
> > > > mem = swap_cgroup_record(ent, NULL);
> > > > if (mem) {
> > > > - res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> > > > + if (!css_tryget(&mem->css)) {
> > > > + res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> > > > + css_put(&mem->css);
> > > > + }
> > > > mem_cgroup_put(mem);
> > > > }
> > > > }
> > >
> > > I think css_tryget() returns "ture" at success....
> > >
> > > So,
> > > ==
> > > if (mem && css_tryget(&mem->css))
> > > res_counter....
> > >
> > > is correct.
> > >
> > > -Kame
> > >
> > Ooops! you are right.
> > Sorry for my silly mistake..
> >
> > "mem" is checked beforehand, so I think css_tryget would be enough.
> > I'm now testing the attached one.
> >
> >
> > Thanks,
> > Daisuke Nishimura.
> > ===
> > From: Daisuke Nishimura <[email protected]>
> >
> > mem_cgroup_get ensures that the memcg that has been got can be accessed
> > even after the directory has been removed, but it doesn't ensure that parents
> > of it can be accessed: parents might have been freed already by rmdir.
> >
> > This causes a bug in case of use_hierarchy==1, because res_counter_uncharge
> > climb up the tree.
> >
> > Check if the memcg is obsolete by css_tryget, and don't call
> > res_counter_uncharge when obsole.
> >
> > Signed-off-by: Daisuke Nishimura <[email protected]>
> > ---
> > mm/memcontrol.c | 15 ++++++++++++---
> > 1 files changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index fb62b43..b9d5271 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1182,7 +1182,10 @@ int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
> > /* avoid double counting */
> > mem = swap_cgroup_record(ent, NULL);
> > if (mem) {
> > - res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> > + if (css_tryget(&mem->css)) {
> > + res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> > + css_put(&mem->css);
> > + }
> > mem_cgroup_put(mem);
> > }
> > }
> > @@ -1252,7 +1255,10 @@ void mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr)
> > struct mem_cgroup *memcg;
> > memcg = swap_cgroup_record(ent, NULL);
> > if (memcg) {
> > - res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> > + if (css_tryget(&memcg->css)) {
> > + res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> > + css_put(&memcg->css);
> > + }
> > mem_cgroup_put(memcg);
> > }
> >
> > @@ -1397,7 +1403,10 @@ void mem_cgroup_uncharge_swap(swp_entry_t ent)
> >
> > memcg = swap_cgroup_record(ent, NULL);
> > if (memcg) {
> > - res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> > + if (css_tryget(&memcg->css)) {
> > + res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> > + css_put(&memcg->css);
> > + }
> > mem_cgroup_put(memcg);
> > }
> > }
>

2009-01-15 07:51:45

by Daisuke Nishimura

[permalink] [raw]
Subject: [RFC][PATCH] memcg: get/put parents at create/free

On Thu, 15 Jan 2009 13:38:14 +0900, Daisuke Nishimura <[email protected]> wrote:
> On Thu, 15 Jan 2009 11:14:20 +0900, Daisuke Nishimura <[email protected]> wrote:
> > > > > To handle the problem "parent may be obsolete",
> > > > >
> > > > > call mem_cgroup_get(parent) at create()
> > > > > call mem_cgroup_put(parent) at freeing memcg.
> > > > > (regardless of use_hierarchy.)
> > > > >
> > > > > is clearer way to go, I think.
> > > > >
> > > > > I wonder whether there is mis-accounting problem or not..
> > > > >
> hmm, after more consideration, although this patch can prevent the BUG,
> it can leak memsw accounting of parents because memsw of parents, which
> have been incremented by charge, does not decremented.
>
> I'll try pet/put parent approach..
> Or any other good ideas ?
>
I attach a tryial patch.

It has been working fine so far(for about 1 hour).

Thanks,
Daisuke Nishimura.
===
From: Daisuke Nishimura <[email protected]>

mem_cgroup_get ensures that the memcg that has been got can be accessed
even after the directory has been removed, but it doesn't ensure that parents
of it can be accessed: parents might have been freed already by rmdir.

This causes a bug in case of use_hierarchy==1, because res_counter_uncharge
climb up the tree.

This patch tries to fix this probrem by getting parents at create, and
putting them at freeing.

Signed-off-by: Daisuke Nishimura <[email protected]>
---
mm/memcontrol.c | 33 ++++++++++++++++++++++++++++++++-
1 files changed, 32 insertions(+), 1 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fb62b43..b4aed07 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -202,6 +202,8 @@ pcg_default_flags[NR_CHARGE_TYPE] = {

static void mem_cgroup_get(struct mem_cgroup *mem);
static void mem_cgroup_put(struct mem_cgroup *mem);
+static void mem_cgroup_get_parents(struct mem_cgroup *mem);
+static void mem_cgroup_put_parents(struct mem_cgroup *mem);

static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
struct page_cgroup *pc,
@@ -2185,10 +2187,38 @@ static void mem_cgroup_get(struct mem_cgroup *mem)

static void mem_cgroup_put(struct mem_cgroup *mem)
{
- if (atomic_dec_and_test(&mem->refcnt))
+ if (atomic_dec_and_test(&mem->refcnt)) {
+ mem_cgroup_put_parents(mem);
__mem_cgroup_free(mem);
+ }
+}
+
+static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem)
+{
+ if (!mem->res.parent)
+ return NULL;
+ return mem_cgroup_from_res_counter(mem->res.parent, res);
+}
+
+static void mem_cgroup_get_parents(struct mem_cgroup *mem)
+{
+ struct mem_cgroup *parent = parent_mem_cgroup(mem);
+
+ while (parent) {
+ mem_cgroup_get(parent);
+ parent = parent_mem_cgroup(parent);
+ }
}

+static void mem_cgroup_put_parents(struct mem_cgroup *mem)
+{
+ struct mem_cgroup *parent = parent_mem_cgroup(mem);
+
+ while (parent) {
+ mem_cgroup_put(parent);
+ parent = parent_mem_cgroup(parent);
+ }
+}

#ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
static void __init enable_swap_cgroup(void)
@@ -2237,6 +2267,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
if (parent)
mem->swappiness = get_swappiness(parent);
atomic_set(&mem->refcnt, 1);
+ mem_cgroup_get_parents(mem);
return &mem->css;
free_out:
__mem_cgroup_free(mem);

2009-01-15 07:56:19

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [RFC][PATCH] memcg: get/put parents at create/free

On Thu, 15 Jan 2009 16:45:37 +0900
Daisuke Nishimura <[email protected]> wrote:

> On Thu, 15 Jan 2009 13:38:14 +0900, Daisuke Nishimura <[email protected]> wrote:
> > On Thu, 15 Jan 2009 11:14:20 +0900, Daisuke Nishimura <[email protected]> wrote:
> > > > > > To handle the problem "parent may be obsolete",
> > > > > >
> > > > > > call mem_cgroup_get(parent) at create()
> > > > > > call mem_cgroup_put(parent) at freeing memcg.
> > > > > > (regardless of use_hierarchy.)
> > > > > >
> > > > > > is clearer way to go, I think.
> > > > > >
> > > > > > I wonder whether there is mis-accounting problem or not..
> > > > > >
> > hmm, after more consideration, although this patch can prevent the BUG,
> > it can leak memsw accounting of parents because memsw of parents, which
> > have been incremented by charge, does not decremented.
> >
> > I'll try pet/put parent approach..
> > Or any other good ideas ?
> >
> I attach a tryial patch.
>
> It has been working fine so far(for about 1 hour).
>
> Thanks,
> Daisuke Nishimura.
> ===
> From: Daisuke Nishimura <[email protected]>
>
> mem_cgroup_get ensures that the memcg that has been got can be accessed
> even after the directory has been removed, but it doesn't ensure that parents
> of it can be accessed: parents might have been freed already by rmdir.
>
> This causes a bug in case of use_hierarchy==1, because res_counter_uncharge
> climb up the tree.
>
> This patch tries to fix this probrem by getting parents at create, and
> putting them at freeing.
>
> Signed-off-by: Daisuke Nishimura <[email protected]>
> ---
> mm/memcontrol.c | 33 ++++++++++++++++++++++++++++++++-
> 1 files changed, 32 insertions(+), 1 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index fb62b43..b4aed07 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -202,6 +202,8 @@ pcg_default_flags[NR_CHARGE_TYPE] = {
>
> static void mem_cgroup_get(struct mem_cgroup *mem);
> static void mem_cgroup_put(struct mem_cgroup *mem);
> +static void mem_cgroup_get_parents(struct mem_cgroup *mem);
> +static void mem_cgroup_put_parents(struct mem_cgroup *mem);
>
> static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
> struct page_cgroup *pc,
> @@ -2185,10 +2187,38 @@ static void mem_cgroup_get(struct mem_cgroup *mem)
>
> static void mem_cgroup_put(struct mem_cgroup *mem)
> {
> - if (atomic_dec_and_test(&mem->refcnt))
> + if (atomic_dec_and_test(&mem->refcnt)) {
> + mem_cgroup_put_parents(mem);
> __mem_cgroup_free(mem);
> + }
> +}
> +
> +static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem)
> +{
> + if (!mem->res.parent)
> + return NULL;
> + return mem_cgroup_from_res_counter(mem->res.parent, res);
> +}
> +
> +static void mem_cgroup_get_parents(struct mem_cgroup *mem)
> +{
> + struct mem_cgroup *parent = parent_mem_cgroup(mem);
> +
> + while (parent) {
> + mem_cgroup_get(parent);
> + parent = parent_mem_cgroup(parent);
> + }
> }
>

does we have to add refcnt to all ancestors ?

Thanks,
-Kame

> +static void mem_cgroup_put_parents(struct mem_cgroup *mem)
> +{
> + struct mem_cgroup *parent = parent_mem_cgroup(mem);
> +
> + while (parent) {
> + mem_cgroup_put(parent);
> + parent = parent_mem_cgroup(parent);
> + }
> +}
>
> #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> static void __init enable_swap_cgroup(void)
> @@ -2237,6 +2267,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> if (parent)
> mem->swappiness = get_swappiness(parent);
> atomic_set(&mem->refcnt, 1);
> + mem_cgroup_get_parents(mem);
> return &mem->css;
> free_out:
> __mem_cgroup_free(mem);
>

2009-01-15 08:18:56

by Daisuke Nishimura

[permalink] [raw]
Subject: Re: [RFC][PATCH] memcg: get/put parents at create/free

On Thu, 15 Jan 2009 16:54:53 +0900, KAMEZAWA Hiroyuki <[email protected]> wrote:
> On Thu, 15 Jan 2009 16:45:37 +0900
> Daisuke Nishimura <[email protected]> wrote:
>
> > On Thu, 15 Jan 2009 13:38:14 +0900, Daisuke Nishimura <[email protected]> wrote:
> > > On Thu, 15 Jan 2009 11:14:20 +0900, Daisuke Nishimura <[email protected]> wrote:
> > > > > > > To handle the problem "parent may be obsolete",
> > > > > > >
> > > > > > > call mem_cgroup_get(parent) at create()
> > > > > > > call mem_cgroup_put(parent) at freeing memcg.
> > > > > > > (regardless of use_hierarchy.)
> > > > > > >
> > > > > > > is clearer way to go, I think.
> > > > > > >
> > > > > > > I wonder whether there is mis-accounting problem or not..
> > > > > > >
> > > hmm, after more consideration, although this patch can prevent the BUG,
> > > it can leak memsw accounting of parents because memsw of parents, which
> > > have been incremented by charge, does not decremented.
> > >
> > > I'll try pet/put parent approach..
> > > Or any other good ideas ?
> > >
> > I attach a tryial patch.
> >
> > It has been working fine so far(for about 1 hour).
> >
> > Thanks,
> > Daisuke Nishimura.
> > ===
> > From: Daisuke Nishimura <[email protected]>
> >
> > mem_cgroup_get ensures that the memcg that has been got can be accessed
> > even after the directory has been removed, but it doesn't ensure that parents
> > of it can be accessed: parents might have been freed already by rmdir.
> >
> > This causes a bug in case of use_hierarchy==1, because res_counter_uncharge
> > climb up the tree.
> >
> > This patch tries to fix this probrem by getting parents at create, and
> > putting them at freeing.
> >
> > Signed-off-by: Daisuke Nishimura <[email protected]>
> > ---
> > mm/memcontrol.c | 33 ++++++++++++++++++++++++++++++++-
> > 1 files changed, 32 insertions(+), 1 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index fb62b43..b4aed07 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -202,6 +202,8 @@ pcg_default_flags[NR_CHARGE_TYPE] = {
> >
> > static void mem_cgroup_get(struct mem_cgroup *mem);
> > static void mem_cgroup_put(struct mem_cgroup *mem);
> > +static void mem_cgroup_get_parents(struct mem_cgroup *mem);
> > +static void mem_cgroup_put_parents(struct mem_cgroup *mem);
> >
> > static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
> > struct page_cgroup *pc,
> > @@ -2185,10 +2187,38 @@ static void mem_cgroup_get(struct mem_cgroup *mem)
> >
> > static void mem_cgroup_put(struct mem_cgroup *mem)
> > {
> > - if (atomic_dec_and_test(&mem->refcnt))
> > + if (atomic_dec_and_test(&mem->refcnt)) {
> > + mem_cgroup_put_parents(mem);
> > __mem_cgroup_free(mem);
> > + }
> > +}
> > +
> > +static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem)
> > +{
> > + if (!mem->res.parent)
> > + return NULL;
> > + return mem_cgroup_from_res_counter(mem->res.parent, res);
> > +}
> > +
> > +static void mem_cgroup_get_parents(struct mem_cgroup *mem)
> > +{
> > + struct mem_cgroup *parent = parent_mem_cgroup(mem);
> > +
> > + while (parent) {
> > + mem_cgroup_get(parent);
> > + parent = parent_mem_cgroup(parent);
> > + }
> > }
> >
>
> does we have to add refcnt to all ancestors ?
>
Ah, no need.

Just ensureing "a parent memcg is not freed while it has child memcg unfreed"
would be enough, because a parent is a child of parent of the parent.

This is the updated version.

Thanks,
Daisuke Nishimura.
===
From: Daisuke Nishimura <[email protected]>

mem_cgroup_get ensures that the memcg that has been got can be accessed
even after the directory has been removed, but it doesn't ensure that parents
of it can be accessed: parents might have been freed already by rmdir.

This causes a bug in case of use_hierarchy==1, because res_counter_uncharge
climb up the tree.

This patch tries to fix this probrem by getting the parent at create, and
putting it at freeing.

Signed-off-by: Daisuke Nishimura <[email protected]>
---
mm/memcontrol.c | 29 ++++++++++++++++++++++++++++-
1 files changed, 28 insertions(+), 1 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fb62b43..a80ba68 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -202,6 +202,8 @@ pcg_default_flags[NR_CHARGE_TYPE] = {

static void mem_cgroup_get(struct mem_cgroup *mem);
static void mem_cgroup_put(struct mem_cgroup *mem);
+static void mem_cgroup_get_parent(struct mem_cgroup *mem);
+static void mem_cgroup_put_parent(struct mem_cgroup *mem);

static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
struct page_cgroup *pc,
@@ -2185,10 +2187,34 @@ static void mem_cgroup_get(struct mem_cgroup *mem)

static void mem_cgroup_put(struct mem_cgroup *mem)
{
- if (atomic_dec_and_test(&mem->refcnt))
+ if (atomic_dec_and_test(&mem->refcnt)) {
+ mem_cgroup_put_parent(mem);
__mem_cgroup_free(mem);
+ }
+}
+
+static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem)
+{
+ if (!mem->res.parent)
+ return NULL;
+ return mem_cgroup_from_res_counter(mem->res.parent, res);
+}
+
+static void mem_cgroup_get_parent(struct mem_cgroup *mem)
+{
+ struct mem_cgroup *parent = parent_mem_cgroup(mem);
+
+ if (parent)
+ mem_cgroup_get(parent);
}

+static void mem_cgroup_put_parent(struct mem_cgroup *mem)
+{
+ struct mem_cgroup *parent = parent_mem_cgroup(mem);
+
+ if (parent)
+ mem_cgroup_put(parent);
+}

#ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
static void __init enable_swap_cgroup(void)
@@ -2237,6 +2263,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
if (parent)
mem->swappiness = get_swappiness(parent);
atomic_set(&mem->refcnt, 1);
+ mem_cgroup_get_parent(mem);
return &mem->css;
free_out:
__mem_cgroup_free(mem);

2009-01-15 08:24:59

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [RFC][PATCH] memcg: get/put parents at create/free

On Thu, 15 Jan 2009 17:13:15 +0900
Daisuke Nishimura <[email protected]> wrote:

> From: Daisuke Nishimura <[email protected]>
>
> mem_cgroup_get ensures that the memcg that has been got can be accessed
> even after the directory has been removed, but it doesn't ensure that parents
> of it can be accessed: parents might have been freed already by rmdir.
>
> This causes a bug in case of use_hierarchy==1, because res_counter_uncharge
> climb up the tree.
>
> This patch tries to fix this probrem by getting the parent at create, and
> putting it at freeing.
>
> Signed-off-by: Daisuke Nishimura <[email protected]>

Seems very simple and promissive.
But one nitpick

> ---
> mm/memcontrol.c | 29 ++++++++++++++++++++++++++++-
> 1 files changed, 28 insertions(+), 1 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index fb62b43..a80ba68 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -202,6 +202,8 @@ pcg_default_flags[NR_CHARGE_TYPE] = {
>
> static void mem_cgroup_get(struct mem_cgroup *mem);
> static void mem_cgroup_put(struct mem_cgroup *mem);
> +static void mem_cgroup_get_parent(struct mem_cgroup *mem);
> +static void mem_cgroup_put_parent(struct mem_cgroup *mem);
>
> static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
> struct page_cgroup *pc,
> @@ -2185,10 +2187,34 @@ static void mem_cgroup_get(struct mem_cgroup *mem)
>
> static void mem_cgroup_put(struct mem_cgroup *mem)
> {
> - if (atomic_dec_and_test(&mem->refcnt))
> + if (atomic_dec_and_test(&mem->refcnt)) {
> + mem_cgroup_put_parent(mem);
> __mem_cgroup_free(mem);
> + }
> +}

Here, parent is freed before children is freed. Then,

==
if (atomic_dec_and_test(&mem->refcnt)) {
struct mem_cgroup *parent = parent_mem_cgroup(mem);
__mem_cgroup_free(mem);
mem_cgroup_put(parent);
}
==

Is maybe usual way.

-Kame




> +
> +static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem)
> +{
> + if (!mem->res.parent)
> + return NULL;
> + return mem_cgroup_from_res_counter(mem->res.parent, res);
> +}
> +
> +static void mem_cgroup_get_parent(struct mem_cgroup *mem)
> +{
> + struct mem_cgroup *parent = parent_mem_cgroup(mem);
> +
> + if (parent)
> + mem_cgroup_get(parent);
> }
>
> +static void mem_cgroup_put_parent(struct mem_cgroup *mem)
> +{
> + struct mem_cgroup *parent = parent_mem_cgroup(mem);
> +
> + if (parent)
> + mem_cgroup_put(parent);
> +}
>
> #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> static void __init enable_swap_cgroup(void)
> @@ -2237,6 +2263,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> if (parent)
> mem->swappiness = get_swappiness(parent);
> atomic_set(&mem->refcnt, 1);
> + mem_cgroup_get_parent(mem);
> return &mem->css;
> free_out:
> __mem_cgroup_free(mem);
>

2009-01-15 08:58:04

by Daisuke Nishimura

[permalink] [raw]
Subject: Re: [RFC][PATCH] memcg: get/put parents at create/free

On Thu, 15 Jan 2009 17:23:36 +0900, KAMEZAWA Hiroyuki <[email protected]> wrote:
> On Thu, 15 Jan 2009 17:13:15 +0900
> Daisuke Nishimura <[email protected]> wrote:
>
> > From: Daisuke Nishimura <[email protected]>
> >
> > mem_cgroup_get ensures that the memcg that has been got can be accessed
> > even after the directory has been removed, but it doesn't ensure that parents
> > of it can be accessed: parents might have been freed already by rmdir.
> >
> > This causes a bug in case of use_hierarchy==1, because res_counter_uncharge
> > climb up the tree.
> >
> > This patch tries to fix this probrem by getting the parent at create, and
> > putting it at freeing.
> >
> > Signed-off-by: Daisuke Nishimura <[email protected]>
>
> Seems very simple and promissive.
Thanks.

> But one nitpick
>
> > ---
> > mm/memcontrol.c | 29 ++++++++++++++++++++++++++++-
> > 1 files changed, 28 insertions(+), 1 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index fb62b43..a80ba68 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -202,6 +202,8 @@ pcg_default_flags[NR_CHARGE_TYPE] = {
> >
> > static void mem_cgroup_get(struct mem_cgroup *mem);
> > static void mem_cgroup_put(struct mem_cgroup *mem);
> > +static void mem_cgroup_get_parent(struct mem_cgroup *mem);
> > +static void mem_cgroup_put_parent(struct mem_cgroup *mem);
> >
> > static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
> > struct page_cgroup *pc,
> > @@ -2185,10 +2187,34 @@ static void mem_cgroup_get(struct mem_cgroup *mem)
> >
> > static void mem_cgroup_put(struct mem_cgroup *mem)
> > {
> > - if (atomic_dec_and_test(&mem->refcnt))
> > + if (atomic_dec_and_test(&mem->refcnt)) {
> > + mem_cgroup_put_parent(mem);
> > __mem_cgroup_free(mem);
> > + }
> > +}
>
> Here, parent is freed before children is freed. Then,
>
> ==
> if (atomic_dec_and_test(&mem->refcnt)) {
> struct mem_cgroup *parent = parent_mem_cgroup(mem);
> __mem_cgroup_free(mem);
> mem_cgroup_put(parent);
> }
> ==
>
> Is maybe usual way.
>
I see.

Thank you for your patient reviews.

Daisuke Nishimura.
===
From: Daisuke Nishimura <[email protected]>

mem_cgroup_get ensures that the memcg that has been got can be accessed
even after the directory has been removed, but it doesn't ensure that parents
of it can be accessed: parents might have been freed already by rmdir.

This causes a bug in case of use_hierarchy==1, because res_counter_uncharge
climb up the tree.

This patch tries to fix this probrem by getting the parent at create, and
putting it at freeing.

Signed-off-by: Daisuke Nishimura <[email protected]>
---
mm/memcontrol.c | 23 ++++++++++++++++++++++-
1 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fb62b43..45e1b51 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -202,6 +202,8 @@ pcg_default_flags[NR_CHARGE_TYPE] = {

static void mem_cgroup_get(struct mem_cgroup *mem);
static void mem_cgroup_put(struct mem_cgroup *mem);
+static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
+static void mem_cgroup_get_parent(struct mem_cgroup *mem);

static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
struct page_cgroup *pc,
@@ -2185,10 +2187,28 @@ static void mem_cgroup_get(struct mem_cgroup *mem)

static void mem_cgroup_put(struct mem_cgroup *mem)
{
- if (atomic_dec_and_test(&mem->refcnt))
+ if (atomic_dec_and_test(&mem->refcnt)) {
+ struct mem_cgroup *parent = parent_mem_cgroup(mem);
__mem_cgroup_free(mem);
+ if (parent)
+ mem_cgroup_put(parent);
+ }
+}
+
+static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem)
+{
+ if (!mem->res.parent)
+ return NULL;
+ return mem_cgroup_from_res_counter(mem->res.parent, res);
}

+static void mem_cgroup_get_parent(struct mem_cgroup *mem)
+{
+ struct mem_cgroup *parent = parent_mem_cgroup(mem);
+
+ if (parent)
+ mem_cgroup_get(parent);
+}

#ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
static void __init enable_swap_cgroup(void)
@@ -2237,6 +2257,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
if (parent)
mem->swappiness = get_swappiness(parent);
atomic_set(&mem->refcnt, 1);
+ mem_cgroup_get_parent(mem);
return &mem->css;
free_out:
__mem_cgroup_free(mem);

2009-01-15 09:12:55

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [RFC][PATCH] memcg: get/put parents at create/free

On Thu, 15 Jan 2009 17:51:31 +0900
Daisuke Nishimura <[email protected]> wrote:

> > But one nitpick
> >
> > > ---
> > > mm/memcontrol.c | 29 ++++++++++++++++++++++++++++-
> > > 1 files changed, 28 insertions(+), 1 deletions(-)
> > >
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index fb62b43..a80ba68 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -202,6 +202,8 @@ pcg_default_flags[NR_CHARGE_TYPE] = {
> > >
> > > static void mem_cgroup_get(struct mem_cgroup *mem);
> > > static void mem_cgroup_put(struct mem_cgroup *mem);
> > > +static void mem_cgroup_get_parent(struct mem_cgroup *mem);
> > > +static void mem_cgroup_put_parent(struct mem_cgroup *mem);
> > >
> > > static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
> > > struct page_cgroup *pc,
> > > @@ -2185,10 +2187,34 @@ static void mem_cgroup_get(struct mem_cgroup *mem)
> > >
> > > static void mem_cgroup_put(struct mem_cgroup *mem)
> > > {
> > > - if (atomic_dec_and_test(&mem->refcnt))
> > > + if (atomic_dec_and_test(&mem->refcnt)) {
> > > + mem_cgroup_put_parent(mem);
> > > __mem_cgroup_free(mem);
> > > + }
> > > +}
> >
> > Here, parent is freed before children is freed. Then,
> >
> > ==
> > if (atomic_dec_and_test(&mem->refcnt)) {
> > struct mem_cgroup *parent = parent_mem_cgroup(mem);
> > __mem_cgroup_free(mem);
> > mem_cgroup_put(parent);
> > }
> > ==
> >
> > Is maybe usual way.
> >
> I see.
>
> Thank you for your patient reviews.
>
> Daisuke Nishimura.
> ===
> From: Daisuke Nishimura <[email protected]>
>
> mem_cgroup_get ensures that the memcg that has been got can be accessed
> even after the directory has been removed, but it doesn't ensure that parents
> of it can be accessed: parents might have been freed already by rmdir.
>
> This causes a bug in case of use_hierarchy==1, because res_counter_uncharge
> climb up the tree.
>
> This patch tries to fix this probrem by getting the parent at create, and
> putting it at freeing.
>
> Signed-off-by: Daisuke Nishimura <[email protected]>

I think this will work well. I'll test.
For now,

Reviewed-by; KAMEZAWA Hiroyuki <[email protected]>


> ---
> mm/memcontrol.c | 23 ++++++++++++++++++++++-
> 1 files changed, 22 insertions(+), 1 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index fb62b43..45e1b51 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -202,6 +202,8 @@ pcg_default_flags[NR_CHARGE_TYPE] = {
>
> static void mem_cgroup_get(struct mem_cgroup *mem);
> static void mem_cgroup_put(struct mem_cgroup *mem);
> +static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
> +static void mem_cgroup_get_parent(struct mem_cgroup *mem);
>
> static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
> struct page_cgroup *pc,
> @@ -2185,10 +2187,28 @@ static void mem_cgroup_get(struct mem_cgroup *mem)
>
> static void mem_cgroup_put(struct mem_cgroup *mem)
> {
> - if (atomic_dec_and_test(&mem->refcnt))
> + if (atomic_dec_and_test(&mem->refcnt)) {
> + struct mem_cgroup *parent = parent_mem_cgroup(mem);
> __mem_cgroup_free(mem);
> + if (parent)
> + mem_cgroup_put(parent);
> + }
> +}
> +
> +static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem)
> +{
> + if (!mem->res.parent)
> + return NULL;
> + return mem_cgroup_from_res_counter(mem->res.parent, res);
> }
>
> +static void mem_cgroup_get_parent(struct mem_cgroup *mem)
> +{
> + struct mem_cgroup *parent = parent_mem_cgroup(mem);
> +
> + if (parent)
> + mem_cgroup_get(parent);
> +}
>
> #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> static void __init enable_swap_cgroup(void)
> @@ -2237,6 +2257,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> if (parent)
> mem->swappiness = get_swappiness(parent);
> atomic_set(&mem->refcnt, 1);
> + mem_cgroup_get_parent(mem);
> return &mem->css;
> free_out:
> __mem_cgroup_free(mem);
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
>

2009-01-16 02:01:46

by Daisuke Nishimura

[permalink] [raw]
Subject: [BUGFIX][PATCH] memcg: get/put parents at create/free

This version works well in my test.

Andrew, please pick up this one.

===
From: Daisuke Nishimura <[email protected]>

The lifetime of struct cgroup and struct mem_cgroup is different and
mem_cgroup has its own reference count for handling references from swap_cgroup.

This causes strange problem that the parent mem_cgroup dies while
child mem_cgroup alive, and this problem causes a bug in case of use_hierarchy==1
because res_counter_uncharge climbs up the tree.

This patch is for avoiding it by getting the parent at create, and
putting it at freeing.

Signed-off-by: Daisuke Nishimura <[email protected]>
Reviewed-by; KAMEZAWA Hiroyuki <[email protected]>
---
mm/memcontrol.c | 23 ++++++++++++++++++++++-
1 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fb62b43..45e1b51 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -202,6 +202,8 @@ pcg_default_flags[NR_CHARGE_TYPE] = {

static void mem_cgroup_get(struct mem_cgroup *mem);
static void mem_cgroup_put(struct mem_cgroup *mem);
+static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
+static void mem_cgroup_get_parent(struct mem_cgroup *mem);

static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
struct page_cgroup *pc,
@@ -2185,10 +2187,28 @@ static void mem_cgroup_get(struct mem_cgroup *mem)

static void mem_cgroup_put(struct mem_cgroup *mem)
{
- if (atomic_dec_and_test(&mem->refcnt))
+ if (atomic_dec_and_test(&mem->refcnt)) {
+ struct mem_cgroup *parent = parent_mem_cgroup(mem);
__mem_cgroup_free(mem);
+ if (parent)
+ mem_cgroup_put(parent);
+ }
+}
+
+static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem)
+{
+ if (!mem->res.parent)
+ return NULL;
+ return mem_cgroup_from_res_counter(mem->res.parent, res);
}

+static void mem_cgroup_get_parent(struct mem_cgroup *mem)
+{
+ struct mem_cgroup *parent = parent_mem_cgroup(mem);
+
+ if (parent)
+ mem_cgroup_get(parent);
+}

#ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
static void __init enable_swap_cgroup(void)
@@ -2237,6 +2257,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
if (parent)
mem->swappiness = get_swappiness(parent);
atomic_set(&mem->refcnt, 1);
+ mem_cgroup_get_parent(mem);
return &mem->css;
free_out:
__mem_cgroup_free(mem);

2009-01-16 02:13:21

by Andrew Morton

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH] memcg: get/put parents at create/free

On Fri, 16 Jan 2009 10:50:09 +0900 Daisuke Nishimura <[email protected]> wrote:

> This version works well in my test.
>
> Andrew, please pick up this one.
>
> ===
> From: Daisuke Nishimura <[email protected]>
>
> The lifetime of struct cgroup and struct mem_cgroup is different and
> mem_cgroup has its own reference count for handling references from swap_cgroup.
>
> This causes strange problem that the parent mem_cgroup dies while
> child mem_cgroup alive, and this problem causes a bug in case of use_hierarchy==1
> because res_counter_uncharge climbs up the tree.
>
> This patch is for avoiding it by getting the parent at create, and
> putting it at freeing.
>
> Signed-off-by: Daisuke Nishimura <[email protected]>
> Reviewed-by; KAMEZAWA Hiroyuki <[email protected]>
> ---
> mm/memcontrol.c | 23 ++++++++++++++++++++++-
> 1 files changed, 22 insertions(+), 1 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index fb62b43..45e1b51 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -202,6 +202,8 @@ pcg_default_flags[NR_CHARGE_TYPE] = {
>
> static void mem_cgroup_get(struct mem_cgroup *mem);
> static void mem_cgroup_put(struct mem_cgroup *mem);
> +static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
> +static void mem_cgroup_get_parent(struct mem_cgroup *mem);
>
> static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
> struct page_cgroup *pc,
> @@ -2185,10 +2187,28 @@ static void mem_cgroup_get(struct mem_cgroup *mem)
>
> static void mem_cgroup_put(struct mem_cgroup *mem)
> {
> - if (atomic_dec_and_test(&mem->refcnt))
> + if (atomic_dec_and_test(&mem->refcnt)) {
> + struct mem_cgroup *parent = parent_mem_cgroup(mem);
> __mem_cgroup_free(mem);
> + if (parent)
> + mem_cgroup_put(parent);
> + }
> +}
> +
> +static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem)
> +{
> + if (!mem->res.parent)
> + return NULL;
> + return mem_cgroup_from_res_counter(mem->res.parent, res);
> }
>
> +static void mem_cgroup_get_parent(struct mem_cgroup *mem)
> +{
> + struct mem_cgroup *parent = parent_mem_cgroup(mem);
> +
> + if (parent)
> + mem_cgroup_get(parent);
> +}
>
> #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> static void __init enable_swap_cgroup(void)
> @@ -2237,6 +2257,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> if (parent)
> mem->swappiness = get_swappiness(parent);
> atomic_set(&mem->refcnt, 1);
> + mem_cgroup_get_parent(mem);
> return &mem->css;
> free_out:
> __mem_cgroup_free(mem);

It seems strange that we add a little helper function for the get(),
but open-code the put()?

2009-01-16 02:18:21

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH] memcg: get/put parents at create/free

On Thu, 15 Jan 2009 18:12:43 -0800
Andrew Morton <[email protected]> wrote:

> On Fri, 16 Jan 2009 10:50:09 +0900 Daisuke Nishimura <[email protected]> wrote:
>
> > This version works well in my test.
> >
> > Andrew, please pick up this one.
> >
> > ===
> > From: Daisuke Nishimura <[email protected]>
> >
> > The lifetime of struct cgroup and struct mem_cgroup is different and
> > mem_cgroup has its own reference count for handling references from swap_cgroup.
> >
> > This causes strange problem that the parent mem_cgroup dies while
> > child mem_cgroup alive, and this problem causes a bug in case of use_hierarchy==1
> > because res_counter_uncharge climbs up the tree.
> >
> > This patch is for avoiding it by getting the parent at create, and
> > putting it at freeing.
> >
> > Signed-off-by: Daisuke Nishimura <[email protected]>
> > Reviewed-by; KAMEZAWA Hiroyuki <[email protected]>
> > ---
> > mm/memcontrol.c | 23 ++++++++++++++++++++++-
> > 1 files changed, 22 insertions(+), 1 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index fb62b43..45e1b51 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -202,6 +202,8 @@ pcg_default_flags[NR_CHARGE_TYPE] = {
> >
> > static void mem_cgroup_get(struct mem_cgroup *mem);
> > static void mem_cgroup_put(struct mem_cgroup *mem);
> > +static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
> > +static void mem_cgroup_get_parent(struct mem_cgroup *mem);
> >
> > static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
> > struct page_cgroup *pc,
> > @@ -2185,10 +2187,28 @@ static void mem_cgroup_get(struct mem_cgroup *mem)
> >
> > static void mem_cgroup_put(struct mem_cgroup *mem)
> > {
> > - if (atomic_dec_and_test(&mem->refcnt))
> > + if (atomic_dec_and_test(&mem->refcnt)) {
> > + struct mem_cgroup *parent = parent_mem_cgroup(mem);
> > __mem_cgroup_free(mem);
> > + if (parent)
> > + mem_cgroup_put(parent);
> > + }
> > +}
> > +
> > +static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem)
> > +{
> > + if (!mem->res.parent)
> > + return NULL;
> > + return mem_cgroup_from_res_counter(mem->res.parent, res);
> > }
> >
> > +static void mem_cgroup_get_parent(struct mem_cgroup *mem)
> > +{
> > + struct mem_cgroup *parent = parent_mem_cgroup(mem);
> > +
> > + if (parent)
> > + mem_cgroup_get(parent);
> > +}
> >
> > #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> > static void __init enable_swap_cgroup(void)
> > @@ -2237,6 +2257,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> > if (parent)
> > mem->swappiness = get_swappiness(parent);
> > atomic_set(&mem->refcnt, 1);
> > + mem_cgroup_get_parent(mem);
> > return &mem->css;
> > free_out:
> > __mem_cgroup_free(mem);
>
> It seems strange that we add a little helper function for the get(),
> but open-code the put()?
>
Maybe I don't feel this as strange because I saw update history of this patch ;(
As you pointed out, I like open-code rather than helper here. Nishimura-san,
could you update ?

Thanks,
-Kame



2009-01-16 02:30:22

by Daisuke Nishimura

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH] memcg: get/put parents at create/free

On Fri, 16 Jan 2009 11:17:02 +0900, KAMEZAWA Hiroyuki <[email protected]> wrote:
> On Thu, 15 Jan 2009 18:12:43 -0800
> Andrew Morton <[email protected]> wrote:
>
> > On Fri, 16 Jan 2009 10:50:09 +0900 Daisuke Nishimura <[email protected]> wrote:
> >
> > > This version works well in my test.
> > >
> > > Andrew, please pick up this one.
> > >
> > > ===
> > > From: Daisuke Nishimura <[email protected]>
> > >
> > > The lifetime of struct cgroup and struct mem_cgroup is different and
> > > mem_cgroup has its own reference count for handling references from swap_cgroup.
> > >
> > > This causes strange problem that the parent mem_cgroup dies while
> > > child mem_cgroup alive, and this problem causes a bug in case of use_hierarchy==1
> > > because res_counter_uncharge climbs up the tree.
> > >
> > > This patch is for avoiding it by getting the parent at create, and
> > > putting it at freeing.
> > >
> > > Signed-off-by: Daisuke Nishimura <[email protected]>
> > > Reviewed-by; KAMEZAWA Hiroyuki <[email protected]>
> > > ---
> > > mm/memcontrol.c | 23 ++++++++++++++++++++++-
> > > 1 files changed, 22 insertions(+), 1 deletions(-)
> > >
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index fb62b43..45e1b51 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -202,6 +202,8 @@ pcg_default_flags[NR_CHARGE_TYPE] = {
> > >
> > > static void mem_cgroup_get(struct mem_cgroup *mem);
> > > static void mem_cgroup_put(struct mem_cgroup *mem);
> > > +static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
> > > +static void mem_cgroup_get_parent(struct mem_cgroup *mem);
> > >
> > > static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
> > > struct page_cgroup *pc,
> > > @@ -2185,10 +2187,28 @@ static void mem_cgroup_get(struct mem_cgroup *mem)
> > >
> > > static void mem_cgroup_put(struct mem_cgroup *mem)
> > > {
> > > - if (atomic_dec_and_test(&mem->refcnt))
> > > + if (atomic_dec_and_test(&mem->refcnt)) {
> > > + struct mem_cgroup *parent = parent_mem_cgroup(mem);
> > > __mem_cgroup_free(mem);
> > > + if (parent)
> > > + mem_cgroup_put(parent);
> > > + }
> > > +}
> > > +
> > > +static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem)
> > > +{
> > > + if (!mem->res.parent)
> > > + return NULL;
> > > + return mem_cgroup_from_res_counter(mem->res.parent, res);
> > > }
> > >
> > > +static void mem_cgroup_get_parent(struct mem_cgroup *mem)
> > > +{
> > > + struct mem_cgroup *parent = parent_mem_cgroup(mem);
> > > +
> > > + if (parent)
> > > + mem_cgroup_get(parent);
> > > +}
> > >
> > > #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> > > static void __init enable_swap_cgroup(void)
> > > @@ -2237,6 +2257,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> > > if (parent)
> > > mem->swappiness = get_swappiness(parent);
> > > atomic_set(&mem->refcnt, 1);
> > > + mem_cgroup_get_parent(mem);
> > > return &mem->css;
> > > free_out:
> > > __mem_cgroup_free(mem);
> >
> > It seems strange that we add a little helper function for the get(),
> > but open-code the put()?
> >
> Maybe I don't feel this as strange because I saw update history of this patch ;(
> As you pointed out, I like open-code rather than helper here. Nishimura-san,
> could you update ?
>
Sure.

The patch has gone into mmotm already, so I'll send a fix patch.

please wait for a while


Thanks,
Daisuke Nishimura.

2009-01-16 04:26:21

by Daisuke Nishimura

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH] memcg: get/put parents at create/free

This is a fix for memcg-get-put-parents-at-create-free.patch.

===
From: Daisuke Nishimura <[email protected]>

Andrew suggested that it's strange to add a little helper function for get(),
while put() is open-code.

This patch also adds a few comments.

Signed-off-by: Daisuke Nishimura <[email protected]>
---
mm/memcontrol.c | 20 ++++++++++----------
1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 45e1b51..92790e4 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -203,7 +203,6 @@ pcg_default_flags[NR_CHARGE_TYPE] = {
static void mem_cgroup_get(struct mem_cgroup *mem);
static void mem_cgroup_put(struct mem_cgroup *mem);
static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
-static void mem_cgroup_get_parent(struct mem_cgroup *mem);

static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
struct page_cgroup *pc,
@@ -2195,6 +2194,9 @@ static void mem_cgroup_put(struct mem_cgroup *mem)
}
}

+/*
+ * Returns the parent mem_cgroup in memcgroup hierarchy with hierarchy enabled.
+ */
static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem)
{
if (!mem->res.parent)
@@ -2202,14 +2204,6 @@ static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem)
return mem_cgroup_from_res_counter(mem->res.parent, res);
}

-static void mem_cgroup_get_parent(struct mem_cgroup *mem)
-{
- struct mem_cgroup *parent = parent_mem_cgroup(mem);
-
- if (parent)
- mem_cgroup_get(parent);
-}
-
#ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
static void __init enable_swap_cgroup(void)
{
@@ -2247,6 +2241,13 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
if (parent && parent->use_hierarchy) {
res_counter_init(&mem->res, &parent->res);
res_counter_init(&mem->memsw, &parent->memsw);
+ /*
+ * We increment refcnt of the parent to ensure that we can
+ * safely access it on res_counter_charge/uncharge.
+ * This refcnt will be decremented when freeing this
+ * mem_cgroup(see mem_cgroup_put).
+ */
+ mem_cgroup_get(parent);
} else {
res_counter_init(&mem->res, NULL);
res_counter_init(&mem->memsw, NULL);
@@ -2257,7 +2258,6 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
if (parent)
mem->swappiness = get_swappiness(parent);
atomic_set(&mem->refcnt, 1);
- mem_cgroup_get_parent(mem);
return &mem->css;
free_out:
__mem_cgroup_free(mem);

2009-01-16 04:32:38

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH] memcg: get/put parents at create/free

On Fri, 16 Jan 2009 13:22:53 +0900
Daisuke Nishimura <[email protected]> wrote:

> This is a fix for memcg-get-put-parents-at-create-free.patch.
>

Thank you, I'll test.

Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>

> ===
> From: Daisuke Nishimura <[email protected]>
>
> Andrew suggested that it's strange to add a little helper function for get(),
> while put() is open-code.
>
> This patch also adds a few comments.
>
> Signed-off-by: Daisuke Nishimura <[email protected]>
> ---
> mm/memcontrol.c | 20 ++++++++++----------
> 1 files changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 45e1b51..92790e4 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -203,7 +203,6 @@ pcg_default_flags[NR_CHARGE_TYPE] = {
> static void mem_cgroup_get(struct mem_cgroup *mem);
> static void mem_cgroup_put(struct mem_cgroup *mem);
> static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
> -static void mem_cgroup_get_parent(struct mem_cgroup *mem);
>
> static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
> struct page_cgroup *pc,
> @@ -2195,6 +2194,9 @@ static void mem_cgroup_put(struct mem_cgroup *mem)
> }
> }
>
> +/*
> + * Returns the parent mem_cgroup in memcgroup hierarchy with hierarchy enabled.
> + */
> static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem)
> {
> if (!mem->res.parent)
> @@ -2202,14 +2204,6 @@ static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem)
> return mem_cgroup_from_res_counter(mem->res.parent, res);
> }
>
> -static void mem_cgroup_get_parent(struct mem_cgroup *mem)
> -{
> - struct mem_cgroup *parent = parent_mem_cgroup(mem);
> -
> - if (parent)
> - mem_cgroup_get(parent);
> -}
> -
> #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> static void __init enable_swap_cgroup(void)
> {
> @@ -2247,6 +2241,13 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> if (parent && parent->use_hierarchy) {
> res_counter_init(&mem->res, &parent->res);
> res_counter_init(&mem->memsw, &parent->memsw);
> + /*
> + * We increment refcnt of the parent to ensure that we can
> + * safely access it on res_counter_charge/uncharge.
> + * This refcnt will be decremented when freeing this
> + * mem_cgroup(see mem_cgroup_put).
> + */
> + mem_cgroup_get(parent);
> } else {
> res_counter_init(&mem->res, NULL);
> res_counter_init(&mem->memsw, NULL);
> @@ -2257,7 +2258,6 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> if (parent)
> mem->swappiness = get_swappiness(parent);
> atomic_set(&mem->refcnt, 1);
> - mem_cgroup_get_parent(mem);
> return &mem->css;
> free_out:
> __mem_cgroup_free(mem);
>