2008-12-12 08:30:35

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak


From: KAMEZAWA Hiroyuki <[email protected]>

Fix swap-page-fault charge leak of memcg.

Now, memcg has hooks to swap-out operation and checks SwapCache is really
unused or not. That check depends on contents of struct page.
I.e. If PageAnon(page) && page_mapped(page), the page is recoginized as
still-in-use.

Now, reuse_swap_page() calles delete_from_swap_cache() before establishment
of any rmap. Then, in followinig sequence

(Page fault with WRITE)
Assume the page is SwapCache "on memory (still charged)"
try_charge() (charge += PAGESIZE)
commit_charge()
=> (Check page_cgroup and found PCG_USED bit, charge-=PAGE_SIZE
because it seems already charged.)
reuse_swap_page()
-> delete_from_swapcache()
-> mem_cgroup_uncharge_swapcache() (charge -= PAGESIZE)
......

too much uncharge.....

To avoid this, move commit_charge() after page_mapcount() goes up to 1.
By this,
Assume the page is SwapCache "on memory"
try_charge() (charge += PAGESIZE)
reuse_swap_page() (may charge -= PAGESIZE if PCG_USED is set)
commit_charge() (Ony if page_cgroup is marked as PCG_USED,
charge -= PAGESIZE)
Accounting will be correct.

Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
---
mm/memory.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

Index: mmotm-2.6.28-Dec11/mm/memory.c
===================================================================
--- mmotm-2.6.28-Dec11.orig/mm/memory.c
+++ mmotm-2.6.28-Dec11/mm/memory.c
@@ -2433,17 +2433,17 @@ static int do_swap_page(struct mm_struct
* which may delete_from_swap_cache().
*/

- mem_cgroup_commit_charge_swapin(page, ptr);
inc_mm_counter(mm, anon_rss);
pte = mk_pte(page, vma->vm_page_prot);
if (write_access && reuse_swap_page(page)) {
pte = maybe_mkwrite(pte_mkdirty(pte), vma);
write_access = 0;
}
-
flush_icache_page(vma, page);
set_pte_at(mm, address, page_table, pte);
page_add_anon_rmap(page, vma, address);
+ /* It's better to call commit-charge after rmap is established */
+ mem_cgroup_commit_charge_swapin(page, ptr);

swap_free(entry);
if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))


2008-12-12 09:53:46

by Daisuke Nishimura

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak

(add CC: Hugh Dickins <[email protected]>)

On Fri, 12 Dec 2008 17:29:30 +0900, KAMEZAWA Hiroyuki <[email protected]> wrote:
>
> From: KAMEZAWA Hiroyuki <[email protected]>
>
> Fix swap-page-fault charge leak of memcg.
>
> Now, memcg has hooks to swap-out operation and checks SwapCache is really
> unused or not. That check depends on contents of struct page.
> I.e. If PageAnon(page) && page_mapped(page), the page is recoginized as
> still-in-use.
>
> Now, reuse_swap_page() calles delete_from_swap_cache() before establishment
> of any rmap. Then, in followinig sequence
>
> (Page fault with WRITE)
> Assume the page is SwapCache "on memory (still charged)"
> try_charge() (charge += PAGESIZE)
> commit_charge()
> => (Check page_cgroup and found PCG_USED bit, charge-=PAGE_SIZE
> because it seems already charged.)
> reuse_swap_page()
> -> delete_from_swapcache()
> -> mem_cgroup_uncharge_swapcache() (charge -= PAGESIZE)
> ......
>
> too much uncharge.....
>
> To avoid this, move commit_charge() after page_mapcount() goes up to 1.
> By this,
> Assume the page is SwapCache "on memory"
> try_charge() (charge += PAGESIZE)
> reuse_swap_page() (may charge -= PAGESIZE if PCG_USED is set)
> commit_charge() (Ony if page_cgroup is marked as PCG_USED,
> charge -= PAGESIZE)
> Accounting will be correct.
>
> Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>

I've confirmed that the problem you reported offline is fixed, but...

> ---
> mm/memory.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> Index: mmotm-2.6.28-Dec11/mm/memory.c
> ===================================================================
> --- mmotm-2.6.28-Dec11.orig/mm/memory.c
> +++ mmotm-2.6.28-Dec11/mm/memory.c
> @@ -2433,17 +2433,17 @@ static int do_swap_page(struct mm_struct
> * which may delete_from_swap_cache().
> */
>
The comment here says:

/*
* The page isn't present yet, go ahead with the fault.
*
* Be careful about the sequence of operations here.
* To get its accounting right, reuse_swap_page() must be called
* while the page is counted on swap but not yet in mapcount i.e.
* before page_add_anon_rmap() and swap_free(); try_to_free_swap()
* must be called after the swap_free(), or it will never succeed.
* And mem_cgroup_commit_charge_swapin(), which uses the swp_entry
* in page->private, must be called before reuse_swap_page(),
* which may delete_from_swap_cache().
*/

Hmm.. should we save page->private before calling reuse_swap_page and pass it
to mem_cgroup_commit_charge_swapin(I think it cannot be changed because the page
is locked)?


Thanks,
Daisuke Nishimura.

> - mem_cgroup_commit_charge_swapin(page, ptr);
> inc_mm_counter(mm, anon_rss);
> pte = mk_pte(page, vma->vm_page_prot);
> if (write_access && reuse_swap_page(page)) {
> pte = maybe_mkwrite(pte_mkdirty(pte), vma);
> write_access = 0;
> }
> -
> flush_icache_page(vma, page);
> set_pte_at(mm, address, page_table, pte);
> page_add_anon_rmap(page, vma, address);
> + /* It's better to call commit-charge after rmap is established */
> + mem_cgroup_commit_charge_swapin(page, ptr);
>
> swap_free(entry);
> if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
>

2008-12-12 11:16:18

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak

Daisuke Nishimura said:
>
> /*
> * The page isn't present yet, go ahead with the fault.
> *
> * Be careful about the sequence of operations here.
> * To get its accounting right, reuse_swap_page() must be called
> * while the page is counted on swap but not yet in mapcount i.e.
> * before page_add_anon_rmap() and swap_free(); try_to_free_swap()
> * must be called after the swap_free(), or it will never succeed.
> * And mem_cgroup_commit_charge_swapin(), which uses the swp_entry
> * in page->private, must be called before reuse_swap_page(),
> * which may delete_from_swap_cache().
> */
>
> Hmm.. should we save page->private before calling reuse_swap_page and pass
> it
> to mem_cgroup_commit_charge_swapin(I think it cannot be changed because
> the page
> is locked)?
>
seems not necessary (see below). I'll fix comment if I uses my pc tomorrow..

Considering 2 cases,
A. the SwapCache is already chareged before try_charge_swapin()
B. the SwapCache is very new and not charged before try_charge_swapin()

Case A.
0. We have charge of PAGE_SIZE to this page before reach here.
1. try_charge_swapin() is called and charge += PAGE_SIZE
2. reuse_swap_page() is called.
when delete_from_swap_cache() is called..
2-a. if already mapped, no change in charges.
2-b. if not mapped, charge-=PAGE_SIZE. PCG_USED bit is cleared.
and charge-record is written into swap_cgroup
not called.
2-c. no changes in charge.
3. commit_charge is called.
3-a. PCG_USED bit is set, so charge -= PAGE_SIZE.
3-b. PCG_USED bit is cleared and so we set PCG_USED bit and no
changes in charge.
3-c. no changes in charge.
4-b. swap_free() will clear record in swap_cgroup.

Then, finally we have PAGE_SIZE of charge to this page.

Case B.
0. We have no charges to this page.
1. try_charge_swapin() is called and charge += PAGE_SIZE.
2. reuse_swap_page() is called.
2-a if delete_from_swap_cache() is called.
the page is not mapped. but PCG_USED bit is not set.
so, no change in charges finally. (just recorded in swap_cgroup)
2-b. not called ... no changes in charge.
3. commit_charge() is called and set PCG_USED bit. no changes in charnge.
4. swap_free() is called and clear record in swap_cgroup.

Then, finally we have PAGE_SIZE of charge to this page.



Thanks,
-Kame


2008-12-13 07:04:17

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v2)

Updated explanation and fixed comment.
==

From: KAMEZAWA Hiroyuki <[email protected]>

Fix swapin charge operation of memcg.

Now, memcg has hooks to swap-out operation and checks SwapCache is really
unused or not. That check depends on contents of struct page.
I.e. If PageAnon(page) && page_mapped(page), the page is recoginized as
still-in-use.

Now, reuse_swap_page() calles delete_from_swap_cache() before establishment
of any rmap. Then, in followinig sequence

(Page fault with WRITE)
try_charge() (charge += PAGESIZE)
commit_charge() (Check page_cgroup is used or not..)
reuse_swap_page()
-> delete_from_swapcache()
-> mem_cgroup_uncharge_swapcache() (charge -= PAGESIZE)
......
New charge is uncharged soon....
To avoid this, move commit_charge() after page_mapcount() goes up to 1.
By this,

try_charge() (usage += PAGESIZE)
reuse_swap_page() (may usage -= PAGESIZE if PCG_USED is set)
commit_charge() (If page_cgroup is not marked as PCG_USED,
add new charge.)
Accounting will be correct.

Changelog (v1) -> (v2)
- fixed comment.

Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
---
Documentation/controllers/memcg_test.txt | 50 +++++++++++++++++++++++++++++--
mm/memory.c | 11 +++---
2 files changed, 54 insertions(+), 7 deletions(-)

Index: mmotm-2.6.28-Dec12/mm/memory.c
===================================================================
--- mmotm-2.6.28-Dec12.orig/mm/memory.c
+++ mmotm-2.6.28-Dec12/mm/memory.c
@@ -2428,22 +2428,23 @@ static int do_swap_page(struct mm_struct
* while the page is counted on swap but not yet in mapcount i.e.
* before page_add_anon_rmap() and swap_free(); try_to_free_swap()
* must be called after the swap_free(), or it will never succeed.
- * And mem_cgroup_commit_charge_swapin(), which uses the swp_entry
- * in page->private, must be called before reuse_swap_page(),
- * which may delete_from_swap_cache().
+ * Because delete_from_swap_page() may be called by reuse_swap_page(),
+ * mem_cgroup_commit_charge_swapin() may not be able to find swp_entry
+ * in page->private. In this case, a record in swap_cgroup is silently
+ * discarded at swap_free().
*/

- mem_cgroup_commit_charge_swapin(page, ptr);
inc_mm_counter(mm, anon_rss);
pte = mk_pte(page, vma->vm_page_prot);
if (write_access && reuse_swap_page(page)) {
pte = maybe_mkwrite(pte_mkdirty(pte), vma);
write_access = 0;
}
-
flush_icache_page(vma, page);
set_pte_at(mm, address, page_table, pte);
page_add_anon_rmap(page, vma, address);
+ /* It's better to call commit-charge after rmap is established */
+ mem_cgroup_commit_charge_swapin(page, ptr);

swap_free(entry);
if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
Index: mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt
===================================================================
--- mmotm-2.6.28-Dec12.orig/Documentation/controllers/memcg_test.txt
+++ mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt
@@ -1,6 +1,6 @@
Memory Resource Controller(Memcg) Implementation Memo.
-Last Updated: 2008/12/10
-Base Kernel Version: based on 2.6.28-rc7-mm.
+Last Updated: 2008/12/13
+Base Kernel Version: based on 2.6.28-rc8-mm.

Because VM is getting complex (one of reasons is memcg...), memcg's behavior
is complex. This is a document for memcg's internal behavior.
@@ -115,6 +115,52 @@ Under below explanation, we assume CONFI
(But racy state between (a) and (b) exists. We do check it.)
At charging, a charge recorded in swap_cgroup is moved to page_cgroup.

+ In case (a), reuse_swap_page() may call delete_from_swap_cache() if
+ the page can drop swp_entry and be reused for "WRITE".
+ Note: If the page may be accounted before (A), if it isn't kicked out
+ to disk before page fault.
+
+ case A) the page is not accounted as SwapCache and SwapCache is deleted
+ by reuse_swap_page().
+ 1. try_charge_swapin() is called and
+ - charge_for_memory +=1.
+ - charge_for_memsw +=1.
+ 2. reuse_swap_page -> delete_from_swap_cache() is called.
+ because the page is not accounted as SwapCache,
+ no changes in accounting.
+ 3. commit_charge_swapin() finds PCG_USED bit is not set and
+ set PCG_USED bit.
+ Because page->private is empty by 2. no changes in charge.
+ 4. swap_free(entry) is called.
+ - charge_for_memsw -= 1.
+
+ Finally, charge_for_memory +=1, charge_for_memsw = +-0.
+
+ case B) the page is accounted as SwapCache and SwapCache is deleted
+ by reuse_swap_page.
+ 1. try_charge_swapin() is called.
+ - charge_for_memory += 1.
+ - charge_for_memsw += 1.
+ 2. reuse_swap_page -> delete_from_swap_cache() is called.
+ PCG_USED bit is found and cleared.
+ - charge_for_memory -= 1. (swap_cgroup is recorded.)
+ 3. commit_charge_swapin() finds PCG_USED bit is not set.
+ 4. swap_free(entry) is called and
+ - charge_for_memsw -= 1.
+
+ Finally, charge_for_memory = +-0, charge_for_memsw = +-0.
+
+ case C) the page is not accounted as SwapCache and reuse_swap_page
+ doesn't call delete_from_swap_cache()
+ 1. try_charge_swapin() is called.
+ - charge_for_memory += 1.
+ - charge_for_memsw += 1.
+ 2. commit_charge_swapin() finds PCG_USED bit is not set
+ and finds swap_cgroup records this entry.
+ - charge_for_memsw -= 1.
+
+ Finally, charge_for_memory +=1, charge_for_memsw = +-0
+
4.2 Swap-out.
At swap-out, typical state transition is below.

2008-12-13 09:49:21

by Hugh Dickins

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v2)

On Sat, 13 Dec 2008, KAMEZAWA Hiroyuki wrote:
> --- mmotm-2.6.28-Dec12.orig/mm/memory.c
> +++ mmotm-2.6.28-Dec12/mm/memory.c
>
> - mem_cgroup_commit_charge_swapin(page, ptr);
> inc_mm_counter(mm, anon_rss);
> pte = mk_pte(page, vma->vm_page_prot);
> if (write_access && reuse_swap_page(page)) {
> pte = maybe_mkwrite(pte_mkdirty(pte), vma);
> write_access = 0;
> }
> -
> flush_icache_page(vma, page);
> set_pte_at(mm, address, page_table, pte);
> page_add_anon_rmap(page, vma, address);
> + /* It's better to call commit-charge after rmap is established */
> + mem_cgroup_commit_charge_swapin(page, ptr);
>
> swap_free(entry);
> if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))

That ordering is back to how it was before I adjusted it
for reuse_swap_page()'s delete_from_swap_cache(), isn't it?

So I don't understand how you've fixed the bug I hit (not an
accounting imbalance but an oops or BUG, I forget) with this
ordering, without making some other change elsewhere.

mem_cgroup_commit_charge_swapin calls swap_cgroup_record with
bogus swp_entry_t 0, which appears to belong to swp_offset 0 of
swp_type 0, but the ctrl->map for type 0 may have been freed
ages ago (we do always start from 0, but maybe we swapped on
type 1 and swapped off type 0 meanwhile). I'm guessing that
by looking at the code, not by retesting it, so I may have the
details wrong; but I didn't reorder your code just for fun.

Perhaps your restored ordering works if you check PageSwapCache
in mem_cgroup_commit_charge_swapin or check 0 in swap_cgroup_record,
but I don't see that in yesterday's mmotm, nor in this patch.

(And I should admit, I've not even attempted to follow your
accounting justification: I'll leave that to you memcg guys.)

An alternative could be not to clear page->private when deleting
from swap cache, that's only done for tidiness and to force notice
of races like this; but I'd want a much stronger reason to change that.

Or am I making this up? As I say, I've not tested it this time around.

Hugh

2008-12-13 10:27:59

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v2)

Hugh Dickins said:
> On Sat, 13 Dec 2008, KAMEZAWA Hiroyuki wrote:
>> --- mmotm-2.6.28-Dec12.orig/mm/memory.c
>> +++ mmotm-2.6.28-Dec12/mm/memory.c
>>
>> - mem_cgroup_commit_charge_swapin(page, ptr);
>> inc_mm_counter(mm, anon_rss);
>> pte = mk_pte(page, vma->vm_page_prot);
>> if (write_access && reuse_swap_page(page)) {
>> pte = maybe_mkwrite(pte_mkdirty(pte), vma);
>> write_access = 0;
>> }
>> -
>> flush_icache_page(vma, page);
>> set_pte_at(mm, address, page_table, pte);
>> page_add_anon_rmap(page, vma, address);
>> + /* It's better to call commit-charge after rmap is established */
>> + mem_cgroup_commit_charge_swapin(page, ptr);
>>
>> swap_free(entry);
>> if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) ||
>> PageMlocked(page))
>
> That ordering is back to how it was before I adjusted it
> for reuse_swap_page()'s delete_from_swap_cache(), isn't it?
>
> So I don't understand how you've fixed the bug I hit (not an
> accounting imbalance but an oops or BUG, I forget) with this
> ordering, without making some other change elsewhere.
>
Ah, this is for fixing the bug by this order of calls.
==


> mem_cgroup_commit_charge_swapin calls swap_cgroup_record with
> bogus swp_entry_t 0, which appears to belong to swp_offset 0 of
> swp_type 0, but the ctrl->map for type 0 may have been freed
> ages ago (we do always start from 0, but maybe we swapped on
> type 1 and swapped off type 0 meanwhile). I'm guessing that
> by looking at the code, not by retesting it, so I may have the
> details wrong; but I didn't reorder your code just for fun.
>
Ah, sorry. commit_charge_swapin() should chekc the page is still
SwapCache. Sorry. I'll update this in Monday.

> Perhaps your restored ordering works if you check PageSwapCache
> in mem_cgroup_commit_charge_swapin or check 0 in swap_cgroup_record,
> but I don't see that in yesterday's mmotm, nor in this patch.
>
yes. I'm wrong at that point.
I'll add PageSwapCache check to "commit" ops.

> (And I should admit, I've not even attempted to follow your
> accounting justification: I'll leave that to you memcg guys.)
>
> An alternative could be not to clear page->private when deleting
> from swap cache, that's only done for tidiness and to force notice
> of races like this; but I'd want a much stronger reason to change that.
>
Hmm, doesn't that change will add new unnecessary complex ?

> Or am I making this up? As I say, I've not tested it this time around.
>
I'll revisit this Monday and think of swp_entry==0 problem.

Thank you for pointing out.

-Kame

> Hugh
>

2008-12-13 10:38:47

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v2)

Hugh Dickins said:
> On Sat, 13 Dec 2008, KAMEZAWA Hiroyuki wrote:
>> --- mmotm-2.6.28-Dec12.orig/mm/memory.c
>> +++ mmotm-2.6.28-Dec12/mm/memory.c
>>
>> - mem_cgroup_commit_charge_swapin(page, ptr);
>> inc_mm_counter(mm, anon_rss);
>> pte = mk_pte(page, vma->vm_page_prot);
>> if (write_access && reuse_swap_page(page)) {
>> pte = maybe_mkwrite(pte_mkdirty(pte), vma);
>> write_access = 0;
>> }
>> -
>> flush_icache_page(vma, page);
>> set_pte_at(mm, address, page_table, pte);
>> page_add_anon_rmap(page, vma, address);
>> + /* It's better to call commit-charge after rmap is established */
>> + mem_cgroup_commit_charge_swapin(page, ptr);
>>
>> swap_free(entry);
>> if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) ||
>> PageMlocked(page))
>
> That ordering is back to how it was before I adjusted it
> for reuse_swap_page()'s delete_from_swap_cache(), isn't it?
>
> So I don't understand how you've fixed the bug I hit (not an
> accounting imbalance but an oops or BUG, I forget) with this
> ordering, without making some other change elsewhere.
>
Ah, this is a fix for the new bug by this order.
==
try_charge()
commit_charge()
reuse_swap_page()
-> delete_from_swapcache() -> uncharge_swapcache().
increase mapcount here.
==
Because ucharge_swapcache() assumes following
a. if mapcount==0, this swap cache is of no use and will be discarded.
b. if mapcount >0, this swap cache is in use.
A charge commited by commit_charge() is discarded by reuse_swap_page().

By delaying commit (means checking flag of page_cgroup).
==
try_charge()
reuse_swap_page()
commit_charge()
==
the leak of charge doesn't happen.
(reuse_swap_page() may drop page from swap-cache, but it's no probelm to
commit. But as you say, this has swp_entry==0 bug.)

> mem_cgroup_commit_charge_swapin calls swap_cgroup_record with
> bogus swp_entry_t 0, which appears to belong to swp_offset 0 of
> swp_type 0, but the ctrl->map for type 0 may have been freed
> ages ago (we do always start from 0, but maybe we swapped on
> type 1 and swapped off type 0 meanwhile). I'm guessing that
> by looking at the code, not by retesting it, so I may have the
> details wrong; but I didn't reorder your code just for fun.
>
> Perhaps your restored ordering works if you check PageSwapCache
> in mem_cgroup_commit_charge_swapin or check 0 in swap_cgroup_record,
> but I don't see that in yesterday's mmotm, nor in this patch.
>
Ahhhh, sorry. ok, swp_entry==0 is valid...Sigh...
I'll revisit this and check how commit_charge() works.
I think checking PageSwapCache() is enough but if not, do somehing other.
(Maybe Nishimura's suggestion to pass swp_entry directly to commit_charge()
is one way.)

> (And I should admit, I've not even attempted to follow your
> accounting justification: I'll leave that to you memcg guys.)
>
Sorry for complication ;(

> An alternative could be not to clear page->private when deleting
> from swap cache, that's only done for tidiness and to force notice
> of races like this; but I'd want a much stronger reason to change that.
>
It seems that it will add another complex or unexpected behavior..
I think I can do something workaround.

> Or am I making this up? As I say, I've not tested it this time around.
>
I'll ask you if I found I can't do anything ;(

Thank you for pointing out!
I'll revisit this on Monday.

-Kame


2008-12-15 07:09:03

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3)


From: KAMEZAWA Hiroyuki <[email protected]>

Fix swapin charge operation of memcg.

Now, memcg has hooks to swap-out operation and checks SwapCache is really
unused or not. That check depends on contents of struct page.
I.e. If PageAnon(page) && page_mapped(page), the page is recoginized as
still-in-use.

Now, reuse_swap_page() calles delete_from_swap_cache() before establishment
of any rmap. Then, in followinig sequence

(Page fault with WRITE)
try_charge() (charge += PAGESIZE)
commit_charge() (Check page_cgroup is used or not..)
reuse_swap_page()
-> delete_from_swapcache()
-> mem_cgroup_uncharge_swapcache() (charge -= PAGESIZE)
......
New charge is uncharged soon....
To avoid this, move commit_charge() after page_mapcount() goes up to 1.
By this,

try_charge() (usage += PAGESIZE)
reuse_swap_page() (may usage -= PAGESIZE if PCG_USED is set)
commit_charge() (If page_cgroup is not marked as PCG_USED,
add new charge.)
Accounting will be correct.

Changelog (v2) -> (v3)
- fixed invalid charge to swp_entry==0.
- updated documentation.
Changelog (v1) -> (v2)
- fixed comment.

Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
---
Documentation/controllers/memcg_test.txt | 41 +++++++++++++++++++++++++++----
mm/memcontrol.c | 7 +++--
mm/memory.c | 11 ++++----
3 files changed, 46 insertions(+), 13 deletions(-)

Index: mmotm-2.6.28-Dec12/mm/memory.c
===================================================================
--- mmotm-2.6.28-Dec12.orig/mm/memory.c
+++ mmotm-2.6.28-Dec12/mm/memory.c
@@ -2428,22 +2428,23 @@ static int do_swap_page(struct mm_struct
* while the page is counted on swap but not yet in mapcount i.e.
* before page_add_anon_rmap() and swap_free(); try_to_free_swap()
* must be called after the swap_free(), or it will never succeed.
- * And mem_cgroup_commit_charge_swapin(), which uses the swp_entry
- * in page->private, must be called before reuse_swap_page(),
- * which may delete_from_swap_cache().
+ * Because delete_from_swap_page() may be called by reuse_swap_page(),
+ * mem_cgroup_commit_charge_swapin() may not be able to find swp_entry
+ * in page->private. In this case, a record in swap_cgroup is silently
+ * discarded at swap_free().
*/

- mem_cgroup_commit_charge_swapin(page, ptr);
inc_mm_counter(mm, anon_rss);
pte = mk_pte(page, vma->vm_page_prot);
if (write_access && reuse_swap_page(page)) {
pte = maybe_mkwrite(pte_mkdirty(pte), vma);
write_access = 0;
}
-
flush_icache_page(vma, page);
set_pte_at(mm, address, page_table, pte);
page_add_anon_rmap(page, vma, address);
+ /* It's better to call commit-charge after rmap is established */
+ mem_cgroup_commit_charge_swapin(page, ptr);

swap_free(entry);
if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
Index: mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt
===================================================================
--- mmotm-2.6.28-Dec12.orig/Documentation/controllers/memcg_test.txt
+++ mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt
@@ -1,6 +1,6 @@
Memory Resource Controller(Memcg) Implementation Memo.
-Last Updated: 2008/12/10
-Base Kernel Version: based on 2.6.28-rc7-mm.
+Last Updated: 2008/12/15
+Base Kernel Version: based on 2.6.28-rc8-mm.

Because VM is getting complex (one of reasons is memcg...), memcg's behavior
is complex. This is a document for memcg's internal behavior.
@@ -111,9 +111,40 @@ Under below explanation, we assume CONFI
(b) If the SwapCache has been mapped by processes, it has been
charged already.

- In case (a), we charge it. In case (b), we don't charge it.
- (But racy state between (a) and (b) exists. We do check it.)
- At charging, a charge recorded in swap_cgroup is moved to page_cgroup.
+ This swap-in is one of the most complicated work. In do_swap_page(),
+ following events occur when pte is unchanged.
+
+ (1) the page (SwapCache) is looked up.
+ (2) lock_page()
+ (3) try_charge_swapin()
+ (4) reuse_swap_page() (may call delete_swap_cache())
+ (5) commit_charge_swapin()
+ (6) swap_free().
+
+ Considering following situation for example.
+
+ (A) The page has not been charged before (2) and reuse_swap_page()
+ doesn't call delete_from_swap_cache().
+ (B) The page has not been charged before (2) and reuse_swap_page()
+ calls delete_from_swap_cache().
+ (C) The page has been charged before (2) and reuse_swap_page() doesn't
+ call delete_from_swap_cache().
+ (D) The page has been charged before (2) and reuse_swap_page() calls
+ delete_from_swap_cache().
+
+ memory.usage/memsw.usage changes to this page/swp_entry will be
+ Case (A) (B) (C) (D)
+ Event
+ Before (2) 0/ 1 0/ 1 1/ 1 1/ 1
+ ===========================================
+ (3) +1/+1 +1/+1 +1/+1 +1/+1
+ (4) - 0/ 0 - -1/ 0
+ (5) 0/ 1 0/-1 -1/-1 0/ 0
+ (6) - - - 0/-1
+ ===========================================
+ Result 1/ 1 1/1 1/ 1 1/ 1
+
+ In any cases, charges to this page should be 1/ 1.

4.2 Swap-out.
At swap-out, typical state transition is below.
Index: mmotm-2.6.28-Dec12/mm/memcontrol.c
===================================================================
--- mmotm-2.6.28-Dec12.orig/mm/memcontrol.c
+++ mmotm-2.6.28-Dec12/mm/memcontrol.c
@@ -1139,10 +1139,11 @@ void mem_cgroup_commit_charge_swapin(str
/*
* Now swap is on-memory. This means this page may be
* counted both as mem and swap....double count.
- * Fix it by uncharging from memsw. This SwapCache is stable
- * because we're still under lock_page().
+ * Fix it by uncharging from memsw. Basically, this SwapCache is stable
+ * under lock_page(). But in do_swap_page()::memory.c, reuse_swap_page()
+ * may call delete_from_swap_cache() before reach here.
*/
- if (do_swap_account) {
+ if (do_swap_account && PageSwapCache(page)) {
swp_entry_t ent = {.val = page_private(page)};
struct mem_cgroup *memcg;
memcg = swap_cgroup_record(ent, NULL);

2008-12-15 08:37:42

by Balbir Singh

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3)

* KAMEZAWA Hiroyuki <[email protected]> [2008-12-15 16:07:51]:

>
> From: KAMEZAWA Hiroyuki <[email protected]>
>
> Fix swapin charge operation of memcg.
>
> Now, memcg has hooks to swap-out operation and checks SwapCache is really
> unused or not. That check depends on contents of struct page.
> I.e. If PageAnon(page) && page_mapped(page), the page is recoginized as
> still-in-use.
>
> Now, reuse_swap_page() calles delete_from_swap_cache() before establishment
> of any rmap. Then, in followinig sequence
>
> (Page fault with WRITE)
> try_charge() (charge += PAGESIZE)
> commit_charge() (Check page_cgroup is used or not..)
> reuse_swap_page()
> -> delete_from_swapcache()
> -> mem_cgroup_uncharge_swapcache() (charge -= PAGESIZE)
> ......
> New charge is uncharged soon....
> To avoid this, move commit_charge() after page_mapcount() goes up to 1.
> By this,
>
> try_charge() (usage += PAGESIZE)
> reuse_swap_page() (may usage -= PAGESIZE if PCG_USED is set)
> commit_charge() (If page_cgroup is not marked as PCG_USED,
> add new charge.)
> Accounting will be correct.
>
> Changelog (v2) -> (v3)
> - fixed invalid charge to swp_entry==0.
> - updated documentation.
> Changelog (v1) -> (v2)
> - fixed comment.
>
> Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
> ---
> Documentation/controllers/memcg_test.txt | 41 +++++++++++++++++++++++++++----
> mm/memcontrol.c | 7 +++--
> mm/memory.c | 11 ++++----
> 3 files changed, 46 insertions(+), 13 deletions(-)
>
> Index: mmotm-2.6.28-Dec12/mm/memory.c
> ===================================================================
> --- mmotm-2.6.28-Dec12.orig/mm/memory.c
> +++ mmotm-2.6.28-Dec12/mm/memory.c
> @@ -2428,22 +2428,23 @@ static int do_swap_page(struct mm_struct
> * while the page is counted on swap but not yet in mapcount i.e.
> * before page_add_anon_rmap() and swap_free(); try_to_free_swap()
> * must be called after the swap_free(), or it will never succeed.
> - * And mem_cgroup_commit_charge_swapin(), which uses the swp_entry
> - * in page->private, must be called before reuse_swap_page(),
> - * which may delete_from_swap_cache().
> + * Because delete_from_swap_page() may be called by reuse_swap_page(),
> + * mem_cgroup_commit_charge_swapin() may not be able to find swp_entry
> + * in page->private. In this case, a record in swap_cgroup is silently
> + * discarded at swap_free().
> */
>
> - mem_cgroup_commit_charge_swapin(page, ptr);
> inc_mm_counter(mm, anon_rss);
> pte = mk_pte(page, vma->vm_page_prot);
> if (write_access && reuse_swap_page(page)) {
> pte = maybe_mkwrite(pte_mkdirty(pte), vma);
> write_access = 0;
> }
> -

Removal of unassociated lines, not sure if that is a good practice.

> flush_icache_page(vma, page);
> set_pte_at(mm, address, page_table, pte);
> page_add_anon_rmap(page, vma, address);
> + /* It's better to call commit-charge after rmap is established */
> + mem_cgroup_commit_charge_swapin(page, ptr);
>

Yes, it does make sense

> swap_free(entry);
> if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
> Index: mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt
> ===================================================================
> --- mmotm-2.6.28-Dec12.orig/Documentation/controllers/memcg_test.txt
> +++ mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt
> @@ -1,6 +1,6 @@
> Memory Resource Controller(Memcg) Implementation Memo.
> -Last Updated: 2008/12/10
> -Base Kernel Version: based on 2.6.28-rc7-mm.
> +Last Updated: 2008/12/15
> +Base Kernel Version: based on 2.6.28-rc8-mm.
>
> Because VM is getting complex (one of reasons is memcg...), memcg's behavior
> is complex. This is a document for memcg's internal behavior.
> @@ -111,9 +111,40 @@ Under below explanation, we assume CONFI
> (b) If the SwapCache has been mapped by processes, it has been
> charged already.
>
> - In case (a), we charge it. In case (b), we don't charge it.
> - (But racy state between (a) and (b) exists. We do check it.)
> - At charging, a charge recorded in swap_cgroup is moved to page_cgroup.
> + This swap-in is one of the most complicated work. In do_swap_page(),
> + following events occur when pte is unchanged.
> +
> + (1) the page (SwapCache) is looked up.
> + (2) lock_page()
> + (3) try_charge_swapin()
> + (4) reuse_swap_page() (may call delete_swap_cache())
> + (5) commit_charge_swapin()
> + (6) swap_free().
> +
> + Considering following situation for example.
> +
> + (A) The page has not been charged before (2) and reuse_swap_page()
> + doesn't call delete_from_swap_cache().
> + (B) The page has not been charged before (2) and reuse_swap_page()
> + calls delete_from_swap_cache().
> + (C) The page has been charged before (2) and reuse_swap_page() doesn't
> + call delete_from_swap_cache().
> + (D) The page has been charged before (2) and reuse_swap_page() calls
> + delete_from_swap_cache().
> +
> + memory.usage/memsw.usage changes to this page/swp_entry will be
> + Case (A) (B) (C) (D)
> + Event
> + Before (2) 0/ 1 0/ 1 1/ 1 1/ 1
> + ===========================================
> + (3) +1/+1 +1/+1 +1/+1 +1/+1
> + (4) - 0/ 0 - -1/ 0
> + (5) 0/ 1 0/-1 -1/-1 0/ 0
> + (6) - - - 0/-1
> + ===========================================
> + Result 1/ 1 1/1 1/ 1 1/ 1
> +
> + In any cases, charges to this page should be 1/ 1.
>

The documentation patch failed to apply for me

> 4.2 Swap-out.
> At swap-out, typical state transition is below.
> Index: mmotm-2.6.28-Dec12/mm/memcontrol.c
> ===================================================================
> --- mmotm-2.6.28-Dec12.orig/mm/memcontrol.c
> +++ mmotm-2.6.28-Dec12/mm/memcontrol.c
> @@ -1139,10 +1139,11 @@ void mem_cgroup_commit_charge_swapin(str
> /*
> * Now swap is on-memory. This means this page may be
> * counted both as mem and swap....double count.
> - * Fix it by uncharging from memsw. This SwapCache is stable
> - * because we're still under lock_page().
> + * Fix it by uncharging from memsw. Basically, this SwapCache is stable
> + * under lock_page(). But in do_swap_page()::memory.c, reuse_swap_page()
> + * may call delete_from_swap_cache() before reach here.
> */
> - if (do_swap_account) {
> + if (do_swap_account && PageSwapCache(page)) {
> swp_entry_t ent = {.val = page_private(page)};
> struct mem_cgroup *memcg;
> memcg = swap_cgroup_record(ent, NULL);
>
>


Looks good to me

Acked-by: Balbir Singh <[email protected]>
Tested-by: Balbir Singh <[email protected]>

--
Balbir

2008-12-15 08:41:35

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3)

On Mon, 15 Dec 2008 14:07:26 +0530
Balbir Singh <[email protected]> wrote:

> * KAMEZAWA Hiroyuki <[email protected]> [2008-12-15 16:07:51]:
>
> >
> > From: KAMEZAWA Hiroyuki <[email protected]>
> >
> > Fix swapin charge operation of memcg.
> >
> > Now, memcg has hooks to swap-out operation and checks SwapCache is really
> > unused or not. That check depends on contents of struct page.
> > I.e. If PageAnon(page) && page_mapped(page), the page is recoginized as
> > still-in-use.
> >
> > Now, reuse_swap_page() calles delete_from_swap_cache() before establishment
> > of any rmap. Then, in followinig sequence
> >
> > (Page fault with WRITE)
> > try_charge() (charge += PAGESIZE)
> > commit_charge() (Check page_cgroup is used or not..)
> > reuse_swap_page()
> > -> delete_from_swapcache()
> > -> mem_cgroup_uncharge_swapcache() (charge -= PAGESIZE)
> > ......
> > New charge is uncharged soon....
> > To avoid this, move commit_charge() after page_mapcount() goes up to 1.
> > By this,
> >
> > try_charge() (usage += PAGESIZE)
> > reuse_swap_page() (may usage -= PAGESIZE if PCG_USED is set)
> > commit_charge() (If page_cgroup is not marked as PCG_USED,
> > add new charge.)
> > Accounting will be correct.
> >
> > Changelog (v2) -> (v3)
> > - fixed invalid charge to swp_entry==0.
> > - updated documentation.
> > Changelog (v1) -> (v2)
> > - fixed comment.
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
> > ---
> > Documentation/controllers/memcg_test.txt | 41 +++++++++++++++++++++++++++----
> > mm/memcontrol.c | 7 +++--
> > mm/memory.c | 11 ++++----
> > 3 files changed, 46 insertions(+), 13 deletions(-)
> >
> > Index: mmotm-2.6.28-Dec12/mm/memory.c
> > ===================================================================
> > --- mmotm-2.6.28-Dec12.orig/mm/memory.c
> > +++ mmotm-2.6.28-Dec12/mm/memory.c
> > @@ -2428,22 +2428,23 @@ static int do_swap_page(struct mm_struct
> > * while the page is counted on swap but not yet in mapcount i.e.
> > * before page_add_anon_rmap() and swap_free(); try_to_free_swap()
> > * must be called after the swap_free(), or it will never succeed.
> > - * And mem_cgroup_commit_charge_swapin(), which uses the swp_entry
> > - * in page->private, must be called before reuse_swap_page(),
> > - * which may delete_from_swap_cache().
> > + * Because delete_from_swap_page() may be called by reuse_swap_page(),
> > + * mem_cgroup_commit_charge_swapin() may not be able to find swp_entry
> > + * in page->private. In this case, a record in swap_cgroup is silently
> > + * discarded at swap_free().
> > */
> >
> > - mem_cgroup_commit_charge_swapin(page, ptr);
> > inc_mm_counter(mm, anon_rss);
> > pte = mk_pte(page, vma->vm_page_prot);
> > if (write_access && reuse_swap_page(page)) {
> > pte = maybe_mkwrite(pte_mkdirty(pte), vma);
> > write_access = 0;
> > }
> > -
>
> Removal of unassociated lines, not sure if that is a good practice.
>
my mistake...

> > flush_icache_page(vma, page);
> > set_pte_at(mm, address, page_table, pte);
> > page_add_anon_rmap(page, vma, address);
> > + /* It's better to call commit-charge after rmap is established */
> > + mem_cgroup_commit_charge_swapin(page, ptr);
> >
>
> Yes, it does make sense
>
> > swap_free(entry);
> > if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
> > Index: mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt
> > ===================================================================
> > --- mmotm-2.6.28-Dec12.orig/Documentation/controllers/memcg_test.txt
> > +++ mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt
> > @@ -1,6 +1,6 @@
> > Memory Resource Controller(Memcg) Implementation Memo.
> > -Last Updated: 2008/12/10
> > -Base Kernel Version: based on 2.6.28-rc7-mm.
> > +Last Updated: 2008/12/15
> > +Base Kernel Version: based on 2.6.28-rc8-mm.
> >
> > Because VM is getting complex (one of reasons is memcg...), memcg's behavior
> > is complex. This is a document for memcg's internal behavior.
> > @@ -111,9 +111,40 @@ Under below explanation, we assume CONFI
> > (b) If the SwapCache has been mapped by processes, it has been
> > charged already.
> >
> > - In case (a), we charge it. In case (b), we don't charge it.
> > - (But racy state between (a) and (b) exists. We do check it.)
> > - At charging, a charge recorded in swap_cgroup is moved to page_cgroup.
> > + This swap-in is one of the most complicated work. In do_swap_page(),
> > + following events occur when pte is unchanged.
> > +
> > + (1) the page (SwapCache) is looked up.
> > + (2) lock_page()
> > + (3) try_charge_swapin()
> > + (4) reuse_swap_page() (may call delete_swap_cache())
> > + (5) commit_charge_swapin()
> > + (6) swap_free().
> > +
> > + Considering following situation for example.
> > +
> > + (A) The page has not been charged before (2) and reuse_swap_page()
> > + doesn't call delete_from_swap_cache().
> > + (B) The page has not been charged before (2) and reuse_swap_page()
> > + calls delete_from_swap_cache().
> > + (C) The page has been charged before (2) and reuse_swap_page() doesn't
> > + call delete_from_swap_cache().
> > + (D) The page has been charged before (2) and reuse_swap_page() calls
> > + delete_from_swap_cache().
> > +
> > + memory.usage/memsw.usage changes to this page/swp_entry will be
> > + Case (A) (B) (C) (D)
> > + Event
> > + Before (2) 0/ 1 0/ 1 1/ 1 1/ 1
> > + ===========================================
> > + (3) +1/+1 +1/+1 +1/+1 +1/+1
> > + (4) - 0/ 0 - -1/ 0
> > + (5) 0/ 1 0/-1 -1/-1 0/ 0
> > + (6) - - - 0/-1
> > + ===========================================
> > + Result 1/ 1 1/1 1/ 1 1/ 1
> > +
> > + In any cases, charges to this page should be 1/ 1.
> >
>
> The documentation patch failed to apply for me
>
Hmm... I'll check my queue again.

Thanks,
-Kame

> > 4.2 Swap-out.
> > At swap-out, typical state transition is below.
> > Index: mmotm-2.6.28-Dec12/mm/memcontrol.c
> > ===================================================================
> > --- mmotm-2.6.28-Dec12.orig/mm/memcontrol.c
> > +++ mmotm-2.6.28-Dec12/mm/memcontrol.c
> > @@ -1139,10 +1139,11 @@ void mem_cgroup_commit_charge_swapin(str
> > /*
> > * Now swap is on-memory. This means this page may be
> > * counted both as mem and swap....double count.
> > - * Fix it by uncharging from memsw. This SwapCache is stable
> > - * because we're still under lock_page().
> > + * Fix it by uncharging from memsw. Basically, this SwapCache is stable
> > + * under lock_page(). But in do_swap_page()::memory.c, reuse_swap_page()
> > + * may call delete_from_swap_cache() before reach here.
> > */
> > - if (do_swap_account) {
> > + if (do_swap_account && PageSwapCache(page)) {
> > swp_entry_t ent = {.val = page_private(page)};
> > struct mem_cgroup *memcg;
> > memcg = swap_cgroup_record(ent, NULL);
> >
> >
>
>
> Looks good to me
>
> Acked-by: Balbir Singh <[email protected]>
> Tested-by: Balbir Singh <[email protected]>
>
> --
> Balbir
>

2008-12-15 10:33:56

by Hugh Dickins

[permalink] [raw]
Subject: Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3)

On Mon, 15 Dec 2008, KAMEZAWA Hiroyuki wrote:
>
> Fix swapin charge operation of memcg.
>
> @@ -1139,10 +1139,11 @@ void mem_cgroup_commit_charge_swapin(str
> /*
> * Now swap is on-memory. This means this page may be
> * counted both as mem and swap....double count.
> - * Fix it by uncharging from memsw. This SwapCache is stable
> - * because we're still under lock_page().
> + * Fix it by uncharging from memsw. Basically, this SwapCache is stable
> + * under lock_page(). But in do_swap_page()::memory.c, reuse_swap_page()
> + * may call delete_from_swap_cache() before reach here.
> */
> - if (do_swap_account) {
> + if (do_swap_account && PageSwapCache(page)) {
> swp_entry_t ent = {.val = page_private(page)};
> struct mem_cgroup *memcg;
> memcg = swap_cgroup_record(ent, NULL);

Yes, that addition looks good to me.

Hugh

2008-12-16 04:29:20

by Daisuke Nishimura

[permalink] [raw]
Subject: [PATCH mmotm] memcg: fix for documentation (Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3))

Sorry for late reply.

> @@ -111,9 +111,40 @@ Under below explanation, we assume CONFI
> (b) If the SwapCache has been mapped by processes, it has been
> charged already.
>
> - In case (a), we charge it. In case (b), we don't charge it.
> - (But racy state between (a) and (b) exists. We do check it.)
> - At charging, a charge recorded in swap_cgroup is moved to page_cgroup.
> + This swap-in is one of the most complicated work. In do_swap_page(),
> + following events occur when pte is unchanged.
> +
> + (1) the page (SwapCache) is looked up.
> + (2) lock_page()
> + (3) try_charge_swapin()
> + (4) reuse_swap_page() (may call delete_swap_cache())
> + (5) commit_charge_swapin()
> + (6) swap_free().
> +
> + Considering following situation for example.
> +
> + (A) The page has not been charged before (2) and reuse_swap_page()
> + doesn't call delete_from_swap_cache().
> + (B) The page has not been charged before (2) and reuse_swap_page()
> + calls delete_from_swap_cache().
> + (C) The page has been charged before (2) and reuse_swap_page() doesn't
> + call delete_from_swap_cache().
> + (D) The page has been charged before (2) and reuse_swap_page() calls
> + delete_from_swap_cache().
> +
> + memory.usage/memsw.usage changes to this page/swp_entry will be
> + Case (A) (B) (C) (D)
> + Event
> + Before (2) 0/ 1 0/ 1 1/ 1 1/ 1
> + ===========================================
> + (3) +1/+1 +1/+1 +1/+1 +1/+1
> + (4) - 0/ 0 - -1/ 0
> + (5) 0/ 1 0/-1 -1/-1 0/ 0
> + (6) - - - 0/-1
> + ===========================================
> + Result 1/ 1 1/1 1/ 1 1/ 1
> +
> + In any cases, charges to this page should be 1/ 1.
>
I've verified that charges will result in valid values by tracing source code
in all of these cases, but in case of (B) I don't think commit_charge_swapin
does memsw-- because PageSwapCache has been cleared already. swap_free does
memsw-- in this case.

I attached a fix patch.

Thanks,
Daisuke Nishimura.

===
From: Daisuke Nishimura <[email protected]>

fix for documentation.

Signed-off-by: Daisuke Nishimura <[email protected]>
---
Documentation/controllers/memcg_test.txt | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/controllers/memcg_test.txt b/Documentation/controllers/memcg_test.txt
index 3c1458a..08d4d3e 100644
--- a/Documentation/controllers/memcg_test.txt
+++ b/Documentation/controllers/memcg_test.txt
@@ -139,10 +139,10 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
===========================================
(3) +1/+1 +1/+1 +1/+1 +1/+1
(4) - 0/ 0 - -1/ 0
- (5) 0/ 1 0/-1 -1/-1 0/ 0
- (6) - - - 0/-1
+ (5) 0/-1 0/ 0 -1/-1 0/ 0
+ (6) - 0/-1 - 0/-1
===========================================
- Result 1/ 1 1/1 1/ 1 1/ 1
+ Result 1/ 1 1/ 1 1/ 1 1/ 1

In any cases, charges to this page should be 1/ 1.

2008-12-16 04:54:21

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCH mmotm] memcg: fix for documentation (Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3))

On Tue, 16 Dec 2008 13:02:30 +0900
Daisuke Nishimura <[email protected]> wrote:

> Sorry for late reply.
>
> > @@ -111,9 +111,40 @@ Under below explanation, we assume CONFI
> > (b) If the SwapCache has been mapped by processes, it has been
> > charged already.
> >
> > - In case (a), we charge it. In case (b), we don't charge it.
> > - (But racy state between (a) and (b) exists. We do check it.)
> > - At charging, a charge recorded in swap_cgroup is moved to page_cgroup.
> > + This swap-in is one of the most complicated work. In do_swap_page(),
> > + following events occur when pte is unchanged.
> > +
> > + (1) the page (SwapCache) is looked up.
> > + (2) lock_page()
> > + (3) try_charge_swapin()
> > + (4) reuse_swap_page() (may call delete_swap_cache())
> > + (5) commit_charge_swapin()
> > + (6) swap_free().
> > +
> > + Considering following situation for example.
> > +
> > + (A) The page has not been charged before (2) and reuse_swap_page()
> > + doesn't call delete_from_swap_cache().
> > + (B) The page has not been charged before (2) and reuse_swap_page()
> > + calls delete_from_swap_cache().
> > + (C) The page has been charged before (2) and reuse_swap_page() doesn't
> > + call delete_from_swap_cache().
> > + (D) The page has been charged before (2) and reuse_swap_page() calls
> > + delete_from_swap_cache().
> > +
> > + memory.usage/memsw.usage changes to this page/swp_entry will be
> > + Case (A) (B) (C) (D)
> > + Event
> > + Before (2) 0/ 1 0/ 1 1/ 1 1/ 1
> > + ===========================================
> > + (3) +1/+1 +1/+1 +1/+1 +1/+1
> > + (4) - 0/ 0 - -1/ 0
> > + (5) 0/ 1 0/-1 -1/-1 0/ 0
> > + (6) - - - 0/-1
> > + ===========================================
> > + Result 1/ 1 1/1 1/ 1 1/ 1
> > +
> > + In any cases, charges to this page should be 1/ 1.
> >
> I've verified that charges will result in valid values by tracing source code
> in all of these cases, but in case of (B) I don't think commit_charge_swapin
> does memsw-- because PageSwapCache has been cleared already. swap_free does
> memsw-- in this case.
>
> I attached a fix patch.
>
you're right. it comes from this version's new change....sorry.

Acked-by; KAMEZAWA Hiroyuki <[email protected]>


> Thanks,
> Daisuke Nishimura.
>
> ===
> From: Daisuke Nishimura <[email protected]>
>
> fix for documentation.
>
> Signed-off-by: Daisuke Nishimura <[email protected]>
> ---
> Documentation/controllers/memcg_test.txt | 6 +++---
> 1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/controllers/memcg_test.txt b/Documentation/controllers/memcg_test.txt
> index 3c1458a..08d4d3e 100644
> --- a/Documentation/controllers/memcg_test.txt
> +++ b/Documentation/controllers/memcg_test.txt
> @@ -139,10 +139,10 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
> ===========================================
> (3) +1/+1 +1/+1 +1/+1 +1/+1
> (4) - 0/ 0 - -1/ 0
> - (5) 0/ 1 0/-1 -1/-1 0/ 0
> - (6) - - - 0/-1
> + (5) 0/-1 0/ 0 -1/-1 0/ 0
> + (6) - 0/-1 - 0/-1
> ===========================================
> - Result 1/ 1 1/1 1/ 1 1/ 1
> + Result 1/ 1 1/ 1 1/ 1 1/ 1
>
> In any cases, charges to this page should be 1/ 1.
>
>