2010-06-09 07:04:10

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: [RFC][PATCH] memcg remove css_get/put per pages v2

Still RFC, added lkml to CC: list.
==
From: KAMEZAWA Hiroyuki <[email protected]>

Now, memory cgroup increments css(cgroup subsys state)'s reference
count per a charged page. And the reference count is kept until
the page is uncharged. But this has 2 bad effect.

1. Because css_get/put calls atoimic_inc()/dec, heavy call of them
on large smp will not scale well.
2. Because css's refcnt cannot be in a state as "ready-to-release",
cgroup's notify_on_release handler can't work with memcg.

This is a trial to remove css's refcnt per a page. Even if we remove
refcnt, pre_destroy() does enough synchronization.

After this patch, it seems css_get() is still called in try_charge().
But the logic is.

1. task_lock(mm->owner)
2. If a memcg of mm->owner is cached one, consume_stock() will work.
If it returns true, task_unlock and return immediately.
3. If a memcg doesn't hit cached one, css_get() and access res_counter
and do charge.
So, in the fast path, we don't call css_get() and can avoid access to
shared counter.
The point is that while there are used resource, memcg can't be freed.
So, if it's cached, we don't have to take css_get().

Brief test result on mmotm-2.6.32-Jul-06 + Nishimura-san's cleanup.
test with multi-threaded page fault program on 8cpu system.
(Each threads touch 2M of pages and release them by madvice().)

[Before]

39522457 page-faults
549387860 cache-misses (cache-miss/page-fault 13.90)

60.003715887 seconds time elapsed

25.75% multi-fault-all [kernel.kallsyms] [k] clear_page_c
8.59% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irqsave
8.22% multi-fault-all [kernel.kallsyms] [k] try_get_mem_cgroup_from_mm
8.01% multi-fault-all [kernel.kallsyms] [k] down_read_trylock
5.41% multi-fault-all [kernel.kallsyms] [k] __alloc_pages_nodemask
5.41% multi-fault-all [kernel.kallsyms] [k] __css_put
4.48% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irq
4.41% multi-fault-all [kernel.kallsyms] [k] up_read
3.60% multi-fault-all [kernel.kallsyms] [k] css_put
1.93% multi-fault-all [kernel.kallsyms] [k] handle_mm_fault
1.91% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock
1.89% multi-fault-all [kernel.kallsyms] [k] _cond_resched
1.78% multi-fault-all [kernel.kallsyms] [k] __rmqueue
1.69% multi-fault-all [kernel.kallsyms] [k] mem_cgroup_add_lru_list
1.63% multi-fault-all [kernel.kallsyms] [k] page_fault
1.45% multi-fault-all [kernel.kallsyms] [k] __mem_cgroup_commit_charge
1.34% multi-fault-all [kernel.kallsyms] [k] find_vma

[After]

43253862 page-faults
505750203 cache-misses (cache-miss/page-fault 11.69)

60.004123555 seconds time elapsed

27.98% multi-fault-all [kernel.kallsyms] [k] clear_page_c
9.89% multi-fault-all [kernel.kallsyms] [k] down_read_trylock
9.88% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irq
9.37% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irqsave
5.91% multi-fault-all [kernel.kallsyms] [k] __alloc_pages_nodemask
5.69% multi-fault-all [kernel.kallsyms] [k] up_read
2.94% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock
2.70% multi-fault-all [kernel.kallsyms] [k] handle_mm_fault
2.66% multi-fault-all [kernel.kallsyms] [k] mem_cgroup_add_lru_list
2.25% multi-fault-all [kernel.kallsyms] [k] __mem_cgroup_commit_charge
1.90% multi-fault-all [kernel.kallsyms] [k] __rmqueue
1.74% multi-fault-all [kernel.kallsyms] [k] page_fault
1.52% multi-fault-all [kernel.kallsyms] [k] find_vma

Then, overhead is reduced.

Note: Because this is an extreme, not-realistic, test, I'm not sure how
this change will be benefits for usual applications.


Changelog 20100609:
- clean up try_charge().
- fixed mem_cgroup_clear_mc
- removed unnecessary warnings.
- removed task_lock, added more comments.

Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
---
mm/memcontrol.c | 96 ++++++++++++++++++++++++++++++++++++--------------------
1 file changed, 63 insertions(+), 33 deletions(-)

Index: mmotm-2.6.34-Jun6/mm/memcontrol.c
===================================================================
--- mmotm-2.6.34-Jun6.orig/mm/memcontrol.c
+++ mmotm-2.6.34-Jun6/mm/memcontrol.c
@@ -1717,28 +1717,61 @@ static int __mem_cgroup_try_charge(struc
* thread group leader migrates. It's possible that mm is not
* set, if so charge the init_mm (happens for pagecache usage).
*/
- if (*memcg) {
+ if (!*memcg && !mm)
+ goto bypass;
+again:
+ if (*memcg) { /* css should be a valid one */
mem = *memcg;
+ VM_BUG_ON(css_is_removed(&mem->css));
+ if (mem_cgroup_is_root(mem))
+ goto done;
+ if (consume_stock(mem))
+ goto done;
css_get(&mem->css);
} else {
- mem = try_get_mem_cgroup_from_mm(mm);
- if (unlikely(!mem))
- return 0;
- *memcg = mem;
- }
+ struct task_struct *p;

- VM_BUG_ON(css_is_removed(&mem->css));
- if (mem_cgroup_is_root(mem))
- goto done;
+ rcu_read_lock();
+ p = rcu_dereference(mm->owner);
+ VM_BUG_ON(!p);
+ /*
+ * It seems there are some races when 'p' exits. But, at exit(),
+ * p->cgroups will be reset to init_css_set. So, we'll just
+ * find the root cgroup even if 'p' gets to be obsolete.
+ */
+ mem = mem_cgroup_from_task(p);
+ VM_BUG_ON(!mem);
+ if (mem_cgroup_is_root(mem)) {
+ rcu_read_unlock();
+ goto done;
+ }
+ if (consume_stock(mem)) {
+ /*
+ * It seems dagerous to access memcg without css_get().
+ * But considering how consume_stok works, it's not
+ * necessary. If consume_stock success, some charges
+ * from this memcg are cached on this cpu. So, we
+ * don't need to call css_get()/css_tryget() before
+ * calling consume_stock().
+ */
+ rcu_read_unlock();
+ goto done;
+ }
+ if (!css_tryget(&mem->css)) {
+ rcu_read_unlock();
+ goto again;
+ }
+ rcu_read_unlock();
+ }

do {
bool oom_check;

- if (consume_stock(mem))
- goto done; /* don't need to fill stock */
/* If killed, bypass charge */
- if (fatal_signal_pending(current))
+ if (fatal_signal_pending(current)) {
+ css_put(&mem->css);
goto bypass;
+ }

oom_check = false;
if (oom && !nr_oom_retries) {
@@ -1753,30 +1786,36 @@ static int __mem_cgroup_try_charge(struc
break;
case CHARGE_RETRY: /* not in OOM situation but retry */
csize = PAGE_SIZE;
- break;
+ css_put(&mem->css);
+ mem = NULL;
+ goto again;
case CHARGE_WOULDBLOCK: /* !__GFP_WAIT */
+ css_put(&mem->css);
goto nomem;
case CHARGE_NOMEM: /* OOM routine works */
- if (!oom)
+ if (!oom) {
+ css_put(&mem->css);
goto nomem;
+ }
/* If oom, we never return -ENOMEM */
nr_oom_retries--;
break;
case CHARGE_OOM_DIE: /* Killed by OOM Killer */
+ css_put(&mem->css);
goto bypass;
}
} while (ret != CHARGE_OK);

if (csize > PAGE_SIZE)
refill_stock(mem, csize - PAGE_SIZE);
+ css_put(&mem->css);
done:
+ *memcg = mem;
return 0;
nomem:
- css_put(&mem->css);
+ *memcg = NULL;
return -ENOMEM;
bypass:
- if (mem)
- css_put(&mem->css);
*memcg = NULL;
return 0;
}
@@ -1793,11 +1832,7 @@ static void __mem_cgroup_cancel_charge(s
res_counter_uncharge(&mem->res, PAGE_SIZE * count);
if (do_swap_account)
res_counter_uncharge(&mem->memsw, PAGE_SIZE * count);
- VM_BUG_ON(test_bit(CSS_ROOT, &mem->css.flags));
- WARN_ON_ONCE(count > INT_MAX);
- __css_put(&mem->css, (int)count);
}
- /* we don't need css_put for root */
}

static void mem_cgroup_cancel_charge(struct mem_cgroup *mem)
@@ -2158,7 +2193,6 @@ int mem_cgroup_try_charge_swapin(struct
goto charge_cur_mm;
*ptr = mem;
ret = __mem_cgroup_try_charge(NULL, mask, ptr, true);
- /* drop extra refcnt from tryget */
css_put(&mem->css);
return ret;
charge_cur_mm:
@@ -2345,9 +2379,6 @@ __mem_cgroup_uncharge_common(struct page
unlock_page_cgroup(pc);

memcg_check_events(mem, page);
- /* at swapout, this memcg will be accessed to record to swap */
- if (ctype != MEM_CGROUP_CHARGE_TYPE_SWAPOUT)
- css_put(&mem->css);

return mem;

@@ -2432,15 +2463,18 @@ mem_cgroup_uncharge_swapcache(struct pag
if (!swapout) /* this was a swap cache but the swap is unused ! */
ctype = MEM_CGROUP_CHARGE_TYPE_DROP;

- memcg = __mem_cgroup_uncharge_common(page, ctype);
+ memcg = try_get_mem_cgroup_from_page(page);
+ if (!memcg)
+ return;
+
+ __mem_cgroup_uncharge_common(page, ctype);

/* record memcg information */
- if (do_swap_account && swapout && memcg) {
+ if (do_swap_account && swapout) {
swap_cgroup_record(ent, css_id(&memcg->css));
mem_cgroup_get(memcg);
}
- if (swapout && memcg)
- css_put(&memcg->css);
+ css_put(&memcg->css);
}
#endif

@@ -2518,7 +2552,6 @@ static int mem_cgroup_move_swap_account(
*/
if (!mem_cgroup_is_root(to))
res_counter_uncharge(&to->res, PAGE_SIZE);
- css_put(&to->css);
}
return 0;
}
@@ -4219,7 +4252,6 @@ static int mem_cgroup_do_precharge(unsig
mc.precharge += count;
VM_BUG_ON(test_bit(CSS_ROOT, &mem->css.flags));
WARN_ON_ONCE(count > INT_MAX);
- __css_get(&mem->css, (int)count);
return ret;
}
one_by_one:
@@ -4469,8 +4501,6 @@ static void mem_cgroup_clear_mc(void)
*/
res_counter_uncharge(&mc.to->res,
PAGE_SIZE * mc.moved_swap);
- VM_BUG_ON(test_bit(CSS_ROOT, &mc.to->css.flags));
- __css_put(&mc.to->css, mc.moved_swap);
}
/* we've already done mem_cgroup_get(mc.to) */


2010-06-10 02:49:50

by Daisuke Nishimura

[permalink] [raw]
Subject: Re: [RFC][PATCH] memcg remove css_get/put per pages v2

I can't find any trivial bugs from my review at the moment.
I'll do some tests.

Some minor commens.

On Wed, 9 Jun 2010 15:59:40 +0900, KAMEZAWA Hiroyuki <[email protected]> wrote:
> Still RFC, added lkml to CC: list.
> ==
> From: KAMEZAWA Hiroyuki <[email protected]>
>
> Now, memory cgroup increments css(cgroup subsys state)'s reference
> count per a charged page. And the reference count is kept until
> the page is uncharged. But this has 2 bad effect.
>
> 1. Because css_get/put calls atoimic_inc()/dec, heavy call of them
> on large smp will not scale well.
> 2. Because css's refcnt cannot be in a state as "ready-to-release",
> cgroup's notify_on_release handler can't work with memcg.
>
> This is a trial to remove css's refcnt per a page. Even if we remove
> refcnt, pre_destroy() does enough synchronization.
>
> After this patch, it seems css_get() is still called in try_charge().
> But the logic is.
>
> 1. task_lock(mm->owner)
There is no task_lock() in this version :)

(snip)
> @@ -4219,7 +4252,6 @@ static int mem_cgroup_do_precharge(unsig
> mc.precharge += count;
> VM_BUG_ON(test_bit(CSS_ROOT, &mem->css.flags));
> WARN_ON_ONCE(count > INT_MAX);
> - __css_get(&mem->css, (int)count);
> return ret;
> }
> one_by_one:
You can remove VM_BUG_ON() and WARN_ON_ONCE() here, too.

> @@ -4469,8 +4501,6 @@ static void mem_cgroup_clear_mc(void)
> */
> res_counter_uncharge(&mc.to->res,
> PAGE_SIZE * mc.moved_swap);
> - VM_BUG_ON(test_bit(CSS_ROOT, &mc.to->css.flags));
> - __css_put(&mc.to->css, mc.moved_swap);
> }
> /* we've already done mem_cgroup_get(mc.to) */
>
>
And, you can remove "WARN_ON_ONCE(mc.moved_swap > INT_MAX)" at the beginning
of this block, too.


Thanks,
Daisuke Nishimura.

2010-06-10 02:53:53

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [RFC][PATCH] memcg remove css_get/put per pages v2

On Thu, 10 Jun 2010 11:34:24 +0900
Daisuke Nishimura <[email protected]> wrote:

> I can't find any trivial bugs from my review at the moment.
> I'll do some tests.
>

Thank you for review.

> Some minor commens.
>
> On Wed, 9 Jun 2010 15:59:40 +0900, KAMEZAWA Hiroyuki <[email protected]> wrote:
> > Still RFC, added lkml to CC: list.
> > ==
> > From: KAMEZAWA Hiroyuki <[email protected]>
> >
> > Now, memory cgroup increments css(cgroup subsys state)'s reference
> > count per a charged page. And the reference count is kept until
> > the page is uncharged. But this has 2 bad effect.
> >
> > 1. Because css_get/put calls atoimic_inc()/dec, heavy call of them
> > on large smp will not scale well.
> > 2. Because css's refcnt cannot be in a state as "ready-to-release",
> > cgroup's notify_on_release handler can't work with memcg.
> >
> > This is a trial to remove css's refcnt per a page. Even if we remove
> > refcnt, pre_destroy() does enough synchronization.
> >
> > After this patch, it seems css_get() is still called in try_charge().
> > But the logic is.
> >
> > 1. task_lock(mm->owner)
> There is no task_lock() in this version :)
>
yes.


> (snip)
> > @@ -4219,7 +4252,6 @@ static int mem_cgroup_do_precharge(unsig
> > mc.precharge += count;
> > VM_BUG_ON(test_bit(CSS_ROOT, &mem->css.flags));
> > WARN_ON_ONCE(count > INT_MAX);
> > - __css_get(&mem->css, (int)count);
> > return ret;
> > }
> > one_by_one:
> You can remove VM_BUG_ON() and WARN_ON_ONCE() here, too.
>
ok.


> > @@ -4469,8 +4501,6 @@ static void mem_cgroup_clear_mc(void)
> > */
> > res_counter_uncharge(&mc.to->res,
> > PAGE_SIZE * mc.moved_swap);
> > - VM_BUG_ON(test_bit(CSS_ROOT, &mc.to->css.flags));
> > - __css_put(&mc.to->css, mc.moved_swap);
> > }
> > /* we've already done mem_cgroup_get(mc.to) */
> >
> >
> And, you can remove "WARN_ON_ONCE(mc.moved_swap > INT_MAX)" at the beginning
> of this block, too.
>
Hmm. ok.
Ah...them, this patch fixes the limitation by "css's refcnt is int" problem.
sounds nice.

Thanks,
-Kmae

2010-06-11 04:50:05

by Daisuke Nishimura

[permalink] [raw]
Subject: Re: [RFC][PATCH] memcg remove css_get/put per pages v2

> @@ -2432,15 +2463,18 @@ mem_cgroup_uncharge_swapcache(struct pag
> if (!swapout) /* this was a swap cache but the swap is unused ! */
> ctype = MEM_CGROUP_CHARGE_TYPE_DROP;
>
> - memcg = __mem_cgroup_uncharge_common(page, ctype);
> + memcg = try_get_mem_cgroup_from_page(page);
> + if (!memcg)
> + return;
> +
> + __mem_cgroup_uncharge_common(page, ctype);
>
> /* record memcg information */
> - if (do_swap_account && swapout && memcg) {
> + if (do_swap_account && swapout) {
> swap_cgroup_record(ent, css_id(&memcg->css));
> mem_cgroup_get(memcg);
> }
> - if (swapout && memcg)
> - css_put(&memcg->css);
> + css_put(&memcg->css);
> }
> #endif
>
hmm, this change seems to cause a problem.
I can see under flow of mem->memsw and "swap" field in memory.stat.

I think doing swap_cgroup_record() against mem_cgroup which is not returned
by __mem_cgroup_uncharge_common() is a bad behavior.

How about doing like this ? We can safely access mem_cgroup while it has
memory.usage, iow, before we call res_counter_uncharge().
After this change, it seems to work well.

---
mm/memcontrol.c | 22 +++++++++-------------
1 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6e7c1c9..2fae26f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2362,10 +2362,6 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
break;
}

- if (!mem_cgroup_is_root(mem))
- __do_uncharge(mem, ctype);
- if (ctype == MEM_CGROUP_CHARGE_TYPE_SWAPOUT)
- mem_cgroup_swap_statistics(mem, true);
mem_cgroup_charge_statistics(mem, pc, false);

ClearPageCgroupUsed(pc);
@@ -2379,6 +2375,12 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
unlock_page_cgroup(pc);

memcg_check_events(mem, page);
+ if (ctype == MEM_CGROUP_CHARGE_TYPE_SWAPOUT) {
+ mem_cgroup_swap_statistics(mem, true);
+ mem_cgroup_get(mem);
+ }
+ if (!mem_cgroup_is_root(mem))
+ __do_uncharge(mem, ctype);

return mem;

@@ -2463,18 +2465,12 @@ mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout)
if (!swapout) /* this was a swap cache but the swap is unused ! */
ctype = MEM_CGROUP_CHARGE_TYPE_DROP;

- memcg = try_get_mem_cgroup_from_page(page);
- if (!memcg)
- return;
-
- __mem_cgroup_uncharge_common(page, ctype);
+ memcg = __mem_cgroup_uncharge_common(page, ctype);

/* record memcg information */
- if (do_swap_account && swapout) {
+ if (do_swap_account && swapout && memcg)
+ /* We've already done mem_cgroup_get() in uncharge_common(). */
swap_cgroup_record(ent, css_id(&memcg->css));
- mem_cgroup_get(memcg);
- }
- css_put(&memcg->css);
}
#endif

2010-06-11 04:56:44

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [RFC][PATCH] memcg remove css_get/put per pages v2

On Fri, 11 Jun 2010 13:37:44 +0900
Daisuke Nishimura <[email protected]> wrote:

> > @@ -2432,15 +2463,18 @@ mem_cgroup_uncharge_swapcache(struct pag
> > if (!swapout) /* this was a swap cache but the swap is unused ! */
> > ctype = MEM_CGROUP_CHARGE_TYPE_DROP;
> >
> > - memcg = __mem_cgroup_uncharge_common(page, ctype);
> > + memcg = try_get_mem_cgroup_from_page(page);
> > + if (!memcg)
> > + return;
> > +
> > + __mem_cgroup_uncharge_common(page, ctype);
> >
> > /* record memcg information */
> > - if (do_swap_account && swapout && memcg) {
> > + if (do_swap_account && swapout) {
> > swap_cgroup_record(ent, css_id(&memcg->css));
> > mem_cgroup_get(memcg);
> > }
> > - if (swapout && memcg)
> > - css_put(&memcg->css);
> > + css_put(&memcg->css);
> > }
> > #endif
> >
> hmm, this change seems to cause a problem.
> I can see under flow of mem->memsw and "swap" field in memory.stat.
>
> I think doing swap_cgroup_record() against mem_cgroup which is not returned
> by __mem_cgroup_uncharge_common() is a bad behavior.
>
> How about doing like this ? We can safely access mem_cgroup while it has
> memory.usage, iow, before we call res_counter_uncharge().
> After this change, it seems to work well.
>

Thank you!. seems to work. I'll merge your change.
Can I add your Signed-off-by ?

Thanks,
-Kame

> ---
> mm/memcontrol.c | 22 +++++++++-------------
> 1 files changed, 9 insertions(+), 13 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6e7c1c9..2fae26f 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2362,10 +2362,6 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
> break;
> }
>
> - if (!mem_cgroup_is_root(mem))
> - __do_uncharge(mem, ctype);
> - if (ctype == MEM_CGROUP_CHARGE_TYPE_SWAPOUT)
> - mem_cgroup_swap_statistics(mem, true);
> mem_cgroup_charge_statistics(mem, pc, false);
>
> ClearPageCgroupUsed(pc);
> @@ -2379,6 +2375,12 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
> unlock_page_cgroup(pc);
>
> memcg_check_events(mem, page);
> + if (ctype == MEM_CGROUP_CHARGE_TYPE_SWAPOUT) {
> + mem_cgroup_swap_statistics(mem, true);
> + mem_cgroup_get(mem);
> + }
> + if (!mem_cgroup_is_root(mem))
> + __do_uncharge(mem, ctype);
>
> return mem;
>
> @@ -2463,18 +2465,12 @@ mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout)
> if (!swapout) /* this was a swap cache but the swap is unused ! */
> ctype = MEM_CGROUP_CHARGE_TYPE_DROP;
>
> - memcg = try_get_mem_cgroup_from_page(page);
> - if (!memcg)
> - return;
> -
> - __mem_cgroup_uncharge_common(page, ctype);
> + memcg = __mem_cgroup_uncharge_common(page, ctype);
>
> /* record memcg information */
> - if (do_swap_account && swapout) {
> + if (do_swap_account && swapout && memcg)
> + /* We've already done mem_cgroup_get() in uncharge_common(). */
> swap_cgroup_record(ent, css_id(&memcg->css));
> - mem_cgroup_get(memcg);
> - }
> - css_put(&memcg->css);
> }
> #endif
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2010-06-11 05:05:35

by Daisuke Nishimura

[permalink] [raw]
Subject: Re: [RFC][PATCH] memcg remove css_get/put per pages v2

On Fri, 11 Jun 2010 13:52:02 +0900, KAMEZAWA Hiroyuki <[email protected]> wrote:
> On Fri, 11 Jun 2010 13:37:44 +0900
> Daisuke Nishimura <[email protected]> wrote:
>
> > > @@ -2432,15 +2463,18 @@ mem_cgroup_uncharge_swapcache(struct pag
> > > if (!swapout) /* this was a swap cache but the swap is unused ! */
> > > ctype = MEM_CGROUP_CHARGE_TYPE_DROP;
> > >
> > > - memcg = __mem_cgroup_uncharge_common(page, ctype);
> > > + memcg = try_get_mem_cgroup_from_page(page);
> > > + if (!memcg)
> > > + return;
> > > +
> > > + __mem_cgroup_uncharge_common(page, ctype);
> > >
> > > /* record memcg information */
> > > - if (do_swap_account && swapout && memcg) {
> > > + if (do_swap_account && swapout) {
> > > swap_cgroup_record(ent, css_id(&memcg->css));
> > > mem_cgroup_get(memcg);
> > > }
> > > - if (swapout && memcg)
> > > - css_put(&memcg->css);
> > > + css_put(&memcg->css);
> > > }
> > > #endif
> > >
> > hmm, this change seems to cause a problem.
> > I can see under flow of mem->memsw and "swap" field in memory.stat.
> >
> > I think doing swap_cgroup_record() against mem_cgroup which is not returned
> > by __mem_cgroup_uncharge_common() is a bad behavior.
> >
> > How about doing like this ? We can safely access mem_cgroup while it has
> > memory.usage, iow, before we call res_counter_uncharge().
> > After this change, it seems to work well.
> >
>
> Thank you!. seems to work. I'll merge your change.
> Can I add your Signed-off-by ?
>
Sure.

Signed-off-by: Daisuke Nishimura <[email protected]>

2010-06-11 06:12:26

by Balbir Singh

[permalink] [raw]
Subject: Re: [RFC][PATCH] memcg remove css_get/put per pages v2

* KAMEZAWA Hiroyuki <[email protected]> [2010-06-09 15:59:40]:

> + if (consume_stock(mem)) {
> + /*
> + * It seems dagerous to access memcg without css_get().
> + * But considering how consume_stok works, it's not
> + * necessary. If consume_stock success, some charges
> + * from this memcg are cached on this cpu. So, we
> + * don't need to call css_get()/css_tryget() before
> + * calling consume_stock().
> + */
> + rcu_read_unlock();
> + goto done;
> + }
> + if (!css_tryget(&mem->css)) {

If tryget fails, can one assume that this due to a race and the mem is
about to be freed?


--
Three Cheers,
Balbir

2010-06-11 06:26:11

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [RFC][PATCH] memcg remove css_get/put per pages v2

On Fri, 11 Jun 2010 11:41:02 +0530
Balbir Singh <[email protected]> wrote:

> * KAMEZAWA Hiroyuki <[email protected]> [2010-06-09 15:59:40]:
>
> > + if (consume_stock(mem)) {
> > + /*
> > + * It seems dagerous to access memcg without css_get().
> > + * But considering how consume_stok works, it's not
> > + * necessary. If consume_stock success, some charges
> > + * from this memcg are cached on this cpu. So, we
> > + * don't need to call css_get()/css_tryget() before
> > + * calling consume_stock().
> > + */
> > + rcu_read_unlock();
> > + goto done;
> > + }
> > + if (!css_tryget(&mem->css)) {
>
> If tryget fails, can one assume that this due to a race and the mem is
> about to be freed?
>
Yes. it's due to a race and "mem" will be no longer used.
This does the same thing which try_get_mem_cgrou_from_mm() does now.

Thanks,
-Kame