2010-04-13 04:50:44

by Daisuke Nishimura

Subject: [RFC][BUGFIX][PATCH] memcg: fix underflow of mapped_file stat

Hi.

When I was testing page migration, I found an underflow problem with the "mapped_file" field
in memory.stat. This is a fix for that problem.

This patch is based on mmotm-2010-04-05-16-09, and IIUC it conflicts with Mel's
compaction patches, so I'm sending it as an RFC for now. After the next mmotm, which
will include those patches, I'll update and resend this patch.
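
For reference, here is a tiny userspace model of the imbalance (illustrative
only, not kernel code; the globals below merely stand in for the FILE_MAPPED
counter and the newpage's PageCgroupUsed state):

#include <stdio.h>

static long mapped_file;      /* stands in for MEM_CGROUP_STAT_FILE_MAPPED */
static int newpage_charged;   /* stands in for PageCgroupUsed(newpage) */

static void add_rmap(void)    /* models the memcg hook in page_add_file_rmap() */
{
	if (newpage_charged)  /* the stat only moves for a charged page */
		mapped_file++;
}

static void remove_rmap(void) /* models the memcg hook in page_remove_rmap() */
{
	if (newpage_charged)
		mapped_file--;
}

int main(void)
{
	add_rmap();           /* remove_migration_ptes(): newpage not committed yet */
	newpage_charged = 1;  /* mem_cgroup_end_migration() commits afterwards */
	remove_rmap();        /* a later unmap decrements */
	printf("mapped_file = %ld\n", mapped_file); /* prints -1: the underflow */
	return 0;
}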

===
From: Daisuke Nishimura <[email protected]>

page_add_file_rmap(), which can be called from remove_migration_ptes(), is
expected to increment memcg's stat of mapped file. But on success of page
migration, the newpage (a mapped file) has not been charged yet, so the stat will
not be incremented. This behavior leads to an underflow of memcg's stat, because
when the newpage is unmapped afterwards, page_remove_rmap() decrements the stat.
This problem doesn't happen on the failure path of page migration, because the old
page (a mapped file) hasn't been uncharged yet at the point of remove_migration_ptes().
This patch fixes the problem by calling commit_charge (mem_cgroup_end_migration)
before remove_migration_ptes().

Signed-off-by: Daisuke Nishimura <[email protected]>
---
mm/migrate.c | 19 ++++++++++++++-----
1 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 5938db5..915c35e 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -485,7 +485,8 @@ static int fallback_migrate_page(struct address_space *mapping,
* < 0 - error code
* == 0 - success
*/
-static int move_to_new_page(struct page *newpage, struct page *page)
+static int move_to_new_page(struct page *newpage, struct page *page,
+ struct mem_cgroup *mem)
{
struct address_space *mapping;
int rc;
@@ -520,9 +521,16 @@ static int move_to_new_page(struct page *newpage, struct page *page)
else
rc = fallback_migrate_page(mapping, newpage, page);

- if (!rc)
+ if (!rc) {
+ /*
+ * On success of page migration, the newpage has not been
+ * charged yet, so we must call end_migration() before
+ * remove_migration_ptes() to update stats of mapped file
+ * properly.
+ */
+ mem_cgroup_end_migration(mem, page, newpage);
remove_migration_ptes(page, newpage);
- else
+ } else
newpage->mapping = NULL;

unlock_page(newpage);
@@ -633,7 +641,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,

skip_unmap:
if (!page_mapped(page))
- rc = move_to_new_page(newpage, page);
+ rc = move_to_new_page(newpage, page, mem);

if (rc)
remove_migration_ptes(page, page);
@@ -641,7 +649,8 @@ rcu_unlock:
if (rcu_locked)
rcu_read_unlock();
uncharge:
- if (!charge)
+ if (rc)
+ /* On success of page migration, we've already called it */
mem_cgroup_end_migration(mem, page, newpage);
unlock:
unlock_page(page);
--
1.6.4


2010-04-13 06:17:58

by Kamezawa Hiroyuki

Subject: Re: [RFC][BUGFIX][PATCH] memcg: fix underflow of mapped_file stat

On Tue, 13 Apr 2010 13:42:07 +0900
Daisuke Nishimura <[email protected]> wrote:

> Hi.
>
> When I was testing page migration, I found an underflow problem with the "mapped_file" field
> in memory.stat. This is a fix for that problem.
>
> This patch is based on mmotm-2010-04-05-16-09, and IIUC it conflicts with Mel's
> compaction patches, so I'm sending it as an RFC for now. After the next mmotm, which
> will include those patches, I'll update and resend this patch.
>
> ===
> From: Daisuke Nishimura <[email protected]>
>
> page_add_file_rmap(), which can be called from remove_migration_ptes(), is
> expected to increment memcg's stat of mapped file. But on success of page
> migration, the newpage (a mapped file) has not been charged yet, so the stat will
> not be incremented. This behavior leads to an underflow of memcg's stat, because
> when the newpage is unmapped afterwards, page_remove_rmap() decrements the stat.
> This problem doesn't happen on the failure path of page migration, because the old
> page (a mapped file) hasn't been uncharged yet at the point of remove_migration_ptes().
> This patch fixes the problem by calling commit_charge (mem_cgroup_end_migration)
> before remove_migration_ptes().
>
> Signed-off-by: Daisuke Nishimura <[email protected]>

Nice catch. But... I want to keep all these complicated things under
prepare/end migration. (And I want to avoid changes in migrate.c...)

Considering some racy conditions, I wonder whether memcg_update_file_mapped() itself
still needs fixes...

So, how about this? We already added the FILE_MAPPED flag, so let's make use of it.
==


At migrating a mapped file, events happen in the following sequence.

1. allocate a new page.
2. get the memcg of the old page.
3. charge against the new page, before migration. But at this point,
no changes to page_cgroup; no commit-charge.
4. page migration replaces the old page with the new page in the radix-tree.
5. page migration remaps the new page if the old page was mapped.
6. memcg commits the charge for the newpage.

Because the "commit" happens after the page-remap, we lose the file_mapped
accounting information at migration.

This patch fixes it by accounting the file_mapped information when
committing the charge.

Reported-by: Daisuke Nishimura <[email protected]>
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
---
mm/memcontrol.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)

Index: mmotm-temp/mm/memcontrol.c
===================================================================
--- mmotm-temp.orig/mm/memcontrol.c
+++ mmotm-temp/mm/memcontrol.c
@@ -1435,11 +1435,13 @@ void mem_cgroup_update_file_mapped(struc

/*
* Preemption is already disabled. We can use __this_cpu_xxx
+ * We have no per-page lock at inc/dec of the mapcount. We have to do
+ * the check ourselves under lock_page_cgroup().
*/
- if (val > 0) {
+ if (val > 0 && !PageCgroupFileMapped(pc)) {
__this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
SetPageCgroupFileMapped(pc);
- } else {
+ } else if (PageCgroupFileMapped(pc)) {
__this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
ClearPageCgroupFileMapped(pc);
}
@@ -2563,6 +2565,15 @@ void mem_cgroup_end_migration(struct mem
*/
if (ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED)
mem_cgroup_uncharge_page(target);
+ else {
+ /*
+ * When a migrated file cache is remapped, it's not charged.
+ * Verify that. Because we're under lock_page(), there is
+ * no race with uncharge.
+ */
+ if (page_mapped(target))
+ mem_cgroup_update_file_mapped(mem, target, 1);
+ }
/*
* At migration, we may charge account against cgroup which has no tasks
* So, rmdir()->pre_destroy() can be called while we do this charge.

2010-04-13 06:45:55

by Balbir Singh

Subject: Re: [RFC][BUGFIX][PATCH] memcg: fix underflow of mapped_file stat

* [email protected] <[email protected]> [2010-04-13 13:42:07]:

> Hi.
>
> When I was testing page migration, I found an underflow problem with the "mapped_file" field
> in memory.stat. This is a fix for that problem.
>
> This patch is based on mmotm-2010-04-05-16-09, and IIUC it conflicts with Mel's
> compaction patches, so I'm sending it as an RFC for now. After the next mmotm, which
> will include those patches, I'll update and resend this patch.
>
> ===
> From: Daisuke Nishimura <[email protected]>
>
> page_add_file_rmap(), which can be called from remove_migration_ptes(), is
> expected to increment memcg's stat of mapped file. But on success of page
> migration, the newpage (a mapped file) has not been charged yet, so the stat will
> not be incremented. This behavior leads to an underflow of memcg's stat, because
> when the newpage is unmapped afterwards, page_remove_rmap() decrements the stat.
> This problem doesn't happen on the failure path of page migration, because the old
> page (a mapped file) hasn't been uncharged yet at the point of remove_migration_ptes().
> This patch fixes the problem by calling commit_charge (mem_cgroup_end_migration)
> before remove_migration_ptes().
>
> Signed-off-by: Daisuke Nishimura <[email protected]>
> ---
> mm/migrate.c | 19 ++++++++++++++-----
> 1 files changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 5938db5..915c35e 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -485,7 +485,8 @@ static int fallback_migrate_page(struct address_space *mapping,
> * < 0 - error code
> * == 0 - success
> */
> -static int move_to_new_page(struct page *newpage, struct page *page)
> +static int move_to_new_page(struct page *newpage, struct page *page,
> + struct mem_cgroup *mem)
> {
> struct address_space *mapping;
> int rc;
> @@ -520,9 +521,16 @@ static int move_to_new_page(struct page *newpage, struct page *page)
> else
> rc = fallback_migrate_page(mapping, newpage, page);
>
> - if (!rc)
> + if (!rc) {
> + /*
> + * On success of page migration, the newpage has not been
> + * charged yet, so we must call end_migration() before
> + * remove_migration_ptes() to update stats of mapped file
> + * properly.
> + */
> + mem_cgroup_end_migration(mem, page, newpage);
> remove_migration_ptes(page, newpage);
> - else
> + } else
> newpage->mapping = NULL;
>
> unlock_page(newpage);
> @@ -633,7 +641,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
>
> skip_unmap:
> if (!page_mapped(page))
> - rc = move_to_new_page(newpage, page);
> + rc = move_to_new_page(newpage, page, mem);

Why do we need to pass mem? Won't try_get_mem_cgroup_from_page() help?
Or is it a cost-versus-space tradeoff?

>
> if (rc)
> remove_migration_ptes(page, page);
> @@ -641,7 +649,8 @@ rcu_unlock:
> if (rcu_locked)
> rcu_read_unlock();
> uncharge:
> - if (!charge)
> + if (rc)
> + /* On success of page migration, we've already called it */

Comment is not clear to me, but the code is :)

> mem_cgroup_end_migration(mem, page, newpage);
> unlock:
> unlock_page(page);
> --
> 1.6.4
>

--
Three Cheers,
Balbir

2010-04-14 01:00:55

by Daisuke Nishimura

Subject: Re: [RFC][BUGFIX][PATCH] memcg: fix underflow of mapped_file stat

On Tue, 13 Apr 2010 15:14:00 +0900, KAMEZAWA Hiroyuki <[email protected]> wrote:
> On Tue, 13 Apr 2010 13:42:07 +0900
> Daisuke Nishimura <[email protected]> wrote:
>
> > Hi.
> >
> > When I was testing page migration, I found an underflow problem with the "mapped_file" field
> > in memory.stat. This is a fix for that problem.
> >
> > This patch is based on mmotm-2010-04-05-16-09, and IIUC it conflicts with Mel's
> > compaction patches, so I'm sending it as an RFC for now. After the next mmotm, which
> > will include those patches, I'll update and resend this patch.
> >
> > ===
> > From: Daisuke Nishimura <[email protected]>
> >
> > page_add_file_rmap(), which can be called from remove_migration_ptes(), is
> > expected to increment memcg's stat of mapped file. But on success of page
> > migration, the newpage (a mapped file) has not been charged yet, so the stat will
> > not be incremented. This behavior leads to an underflow of memcg's stat, because
> > when the newpage is unmapped afterwards, page_remove_rmap() decrements the stat.
> > This problem doesn't happen on the failure path of page migration, because the old
> > page (a mapped file) hasn't been uncharged yet at the point of remove_migration_ptes().
> > This patch fixes the problem by calling commit_charge (mem_cgroup_end_migration)
> > before remove_migration_ptes().
> >
> > Signed-off-by: Daisuke Nishimura <[email protected]>
>
> Nice catch. But... I want to keep all these complicated things under
> prepare/end migration. (And I want to avoid changes in migrate.c...)
>
hmm, I want to call mem_cgroup_update_file_mapped() only where we update
NR_FILE_MAPPED, but okay, I see your concern.

> Considering some racy conditions, I wonder whether memcg_update_file_mapped() itself
> still needs fixes...
>
> So, how about this? We already added the FILE_MAPPED flag, so let's make use of it.
> ==
>
>
> At migrating a mapped file, events happen in the following sequence.
>
> 1. allocate a new page.
> 2. get the memcg of the old page.
> 3. charge against the new page, before migration. But at this point,
> no changes to page_cgroup; no commit-charge.
> 4. page migration replaces the old page with the new page in the radix-tree.
> 5. page migration remaps the new page if the old page was mapped.
> 6. memcg commits the charge for the newpage.
>
> Because the "commit" happens after the page-remap, we lose the file_mapped
> accounting information at migration.
>
> This patch fixes it by accounting the file_mapped information when
> committing the charge.
>
> Reported-by: Daisuke Nishimura <[email protected]>
> Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
> ---
> mm/memcontrol.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> Index: mmotm-temp/mm/memcontrol.c
> ===================================================================
> --- mmotm-temp.orig/mm/memcontrol.c
> +++ mmotm-temp/mm/memcontrol.c
> @@ -1435,11 +1435,13 @@ void mem_cgroup_update_file_mapped(struc
>
> /*
> * Preemption is already disabled. We can use __this_cpu_xxx
> + * We have no per-page lock at inc/dec of the mapcount. We have to do
> + * the check ourselves under lock_page_cgroup().
> */
> - if (val > 0) {
> + if (val > 0 && !PageCgroupFileMapped(pc)) {
> __this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
> SetPageCgroupFileMapped(pc);
> - } else {
> + } else if (PageCgroupFileMapped(pc)) {
> __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
> ClearPageCgroupFileMapped(pc);
> }
Would adding likely() be better? IIUC, these conditions are usually met except in
the page-migration case. And can you add a comment about it?
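
Just to illustrate, a compilable userspace sketch of the flag-guarded update
with such hints (PageCgroupFileMapped and the per-cpu stat are modeled with
stand-ins here, so this is only a sketch of the idea):

#include <stdio.h>
#include <stdbool.h>

#define likely(x)  __builtin_expect(!!(x), 1)

struct page_cgroup { bool file_mapped; };  /* models PCG_FILE_MAPPED */
static long file_mapped_stat;              /* models the per-cpu counter */

static void update_file_mapped(struct page_cgroup *pc, int val)
{
	/* The flag makes inc/dec idempotent, so a dec whose matching inc
	 * was skipped (the migration case) is simply absorbed. */
	if (val > 0 && likely(!pc->file_mapped)) {
		file_mapped_stat++;
		pc->file_mapped = true;
	} else if (likely(pc->file_mapped)) {
		file_mapped_stat--;
		pc->file_mapped = false;
	}
}

int main(void)
{
	struct page_cgroup pc = { false };

	update_file_mapped(&pc, -1); /* unmap whose inc was skipped: absorbed */
	update_file_mapped(&pc, 1);  /* normal map: inc, flag set */
	update_file_mapped(&pc, -1); /* normal unmap: dec, flag cleared */
	printf("FILE_MAPPED = %ld\n", file_mapped_stat); /* prints 0 */
	return 0;
}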

> @@ -2563,6 +2565,15 @@ void mem_cgroup_end_migration(struct mem
> */
> if (ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED)
> mem_cgroup_uncharge_page(target);
> + else {
> + /*
> + * When a migrated file cache is remapped, it's not charged.
> + * Verify that. Because we're under lock_page(), there is
> + * no race with uncharge.
> + */
> + if (page_mapped(target))
> + mem_cgroup_update_file_mapped(mem, target, 1);
> + }
We cannot rely on the page lock, because when page migration has succeeded,
"target" (= "newpage") has already been unlocked in move_to_new_page(). So the "target"
can theoretically be removed from the radix-tree (it's not related to this
underflow problem, though).
Shouldn't we call lock_page(target) and check "if (!target->mapping)" to handle
this case (maybe in another patch)?

Thanks,
Daisuke Nishimura.

> /*
> * At migration, we may charge account against cgroup which has no tasks
> * So, rmdir()->pre_destroy() can be called while we do this charge.
>

2010-04-14 01:07:10

by Kamezawa Hiroyuki

Subject: Re: [RFC][BUGFIX][PATCH] memcg: fix underflow of mapped_file stat

On Wed, 14 Apr 2010 09:54:08 +0900
Daisuke Nishimura <[email protected]> wrote:

> On Tue, 13 Apr 2010 15:14:00 +0900, KAMEZAWA Hiroyuki <[email protected]> wrote:
> > On Tue, 13 Apr 2010 13:42:07 +0900
> > Daisuke Nishimura <[email protected]> wrote:
> >
> > > Hi.
> > >
> > > When I was testing page migration, I found an underflow problem with the "mapped_file" field
> > > in memory.stat. This is a fix for that problem.
> > >
> > > This patch is based on mmotm-2010-04-05-16-09, and IIUC it conflicts with Mel's
> > > compaction patches, so I'm sending it as an RFC for now. After the next mmotm, which
> > > will include those patches, I'll update and resend this patch.
> > >
> > > ===
> > > From: Daisuke Nishimura <[email protected]>
> > >
> > > page_add_file_rmap(), which can be called from remove_migration_ptes(), is
> > > expected to increment memcg's stat of mapped file. But on success of page
> > > migration, the newpage (a mapped file) has not been charged yet, so the stat will
> > > not be incremented. This behavior leads to an underflow of memcg's stat, because
> > > when the newpage is unmapped afterwards, page_remove_rmap() decrements the stat.
> > > This problem doesn't happen on the failure path of page migration, because the old
> > > page (a mapped file) hasn't been uncharged yet at the point of remove_migration_ptes().
> > > This patch fixes the problem by calling commit_charge (mem_cgroup_end_migration)
> > > before remove_migration_ptes().
> > >
> > > Signed-off-by: Daisuke Nishimura <[email protected]>
> >
> > Nice catch. But... I want to keep all these complicated things under
> > prepare/end migration. (And I want to avoid changes in migrate.c...)
> >
> hmm, I want to call mem_cgroup_update_file_mapped() only where we update
> NR_FILE_MAPPED, but okay, I see your concern.
>

Thank you.


> > Considering some racy conditions, I wonder whether memcg_update_file_mapped() itself
> > still needs fixes...
> >
> > So, how about this? We already added the FILE_MAPPED flag, so let's make use of it.
> > ==
> >
> >
> > At migrating a mapped file, events happen in the following sequence.
> >
> > 1. allocate a new page.
> > 2. get the memcg of the old page.
> > 3. charge against the new page, before migration. But at this point,
> > no changes to page_cgroup; no commit-charge.
> > 4. page migration replaces the old page with the new page in the radix-tree.
> > 5. page migration remaps the new page if the old page was mapped.
> > 6. memcg commits the charge for the newpage.
> >
> > Because the "commit" happens after the page-remap, we lose the file_mapped
> > accounting information at migration.
> >
> > This patch fixes it by accounting the file_mapped information when
> > committing the charge.
> >
> > Reported-by: Daisuke Nishimura <[email protected]>
> > Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
> > ---
> > mm/memcontrol.c | 15 +++++++++++++--
> > 1 file changed, 13 insertions(+), 2 deletions(-)
> >
> > Index: mmotm-temp/mm/memcontrol.c
> > ===================================================================
> > --- mmotm-temp.orig/mm/memcontrol.c
> > +++ mmotm-temp/mm/memcontrol.c
> > @@ -1435,11 +1435,13 @@ void mem_cgroup_update_file_mapped(struc
> >
> > /*
> > * Preemption is already disabled. We can use __this_cpu_xxx
> > + * We have no per-page lock at inc/dec of the mapcount. We have to do
> > + * the check ourselves under lock_page_cgroup().
> > */
> > - if (val > 0) {
> > + if (val > 0 && !PageCgroupFileMapped(pc)) {
> > __this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
> > SetPageCgroupFileMapped(pc);
> > - } else {
> > + } else if (PageCgroupFileMapped(pc)) {
> > __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
> > ClearPageCgroupFileMapped(pc);
> > }
> Would adding likely() be better? IIUC, these conditions are usually met except in
> the page-migration case. And can you add a comment about it?
>
Sure.


> > @@ -2563,6 +2565,15 @@ void mem_cgroup_end_migration(struct mem
> > */
> > if (ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED)
> > mem_cgroup_uncharge_page(target);
> > + else {
> > + /*
> > + * When a migrated file cache is remapped, it's not charged.
> > + * Verify that. Because we're under lock_page(), there is
> > + * no race with uncharge.
> > + */
> > + if (page_mapped(target))
> > + mem_cgroup_update_file_mapped(mem, target, 1);
> > + }
> We cannot rely on the page lock, because when page migration has succeeded,
> "target" (= "newpage") has already been unlocked in move_to_new_page(). So the "target"
> can theoretically be removed from the radix-tree (it's not related to this
> underflow problem, though).
> Shouldn't we call lock_page(target) and check "if (!target->mapping)" to handle
> this case (maybe in another patch)?
>
Sounds reasonable. I'll think about that.

Thanks,
-Kame

2010-04-14 01:44:10

by Kamezawa Hiroyuki

Subject: Re: [RFC][BUGFIX][PATCH] memcg: fix underflow of mapped_file stat

On Wed, 14 Apr 2010 10:03:08 +0900
KAMEZAWA Hiroyuki <[email protected]> wrote:

> > > @@ -2563,6 +2565,15 @@ void mem_cgroup_end_migration(struct mem
> > > */
> > > if (ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED)
> > > mem_cgroup_uncharge_page(target);
> > > + else {
> > > + /*
> > > + * When a migrated file cache is remapped, it's not charged.
> > > + * Verify that. Because we're under lock_page(), there is
> > > + * no race with uncharge.
> > > + */
> > > + if (page_mapped(target))
> > > + mem_cgroup_update_file_mapped(mem, target, 1);
> > > + }
> > We cannot rely on the page lock, because when page migration has succeeded,
> > "target" (= "newpage") has already been unlocked in move_to_new_page(). So the "target"
> > can theoretically be removed from the radix-tree (it's not related to this
> > underflow problem, though).
> > Shouldn't we call lock_page(target) and check "if (!target->mapping)" to handle
> > this case (maybe in another patch)?
> >
> Sounds reasonable. I'll think about that.
>

Ah, PageCgroupUsed() is already checked under lock_page_cgroup(). It's enough.

Thanks,
-Kame

2010-04-14 02:00:11

by Kamezawa Hiroyuki

Subject: Re: [RFC][BUGFIX][PATCH] memcg: fix underflow of mapped_file stat

On Wed, 14 Apr 2010 10:40:10 +0900
KAMEZAWA Hiroyuki <[email protected]> wrote:

> On Wed, 14 Apr 2010 10:03:08 +0900
> KAMEZAWA Hiroyuki <[email protected]> wrote:
>
> > > > @@ -2563,6 +2565,15 @@ void mem_cgroup_end_migration(struct mem
> > > > */
> > > > if (ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED)
> > > > mem_cgroup_uncharge_page(target);
> > > > + else {
> > > > + /*
> > > > + * When a migrated file cache is remapped, it's not charged.
> > > > + * Verify that. Because we're under lock_page(), there is
> > > > + * no race with uncharge.
> > > > + */
> > > > + if (page_mapped(target))
> > > > + mem_cgroup_update_file_mapped(mem, target, 1);
> > > > + }
> > > We cannot rely on the page lock, because when page migration has succeeded,
> > > "target" (= "newpage") has already been unlocked in move_to_new_page(). So the "target"
> > > can theoretically be removed from the radix-tree (it's not related to this
> > > underflow problem, though).
> > > Shouldn't we call lock_page(target) and check "if (!target->mapping)" to handle
> > > this case (maybe in another patch)?
> > >
> > Sounds reasonable. I'll think about that.
> >
>

Thinking again... the new page is unlocked here. It means the new page may be
removed from the radix-tree before commit_charge().

Haha, it seems totally wrong. Please wait...

Thanks,
-Kame





2010-04-14 03:10:21

by Kamezawa Hiroyuki

Subject: Re: [RFC][BUGFIX][PATCH] memcg: fix underflow of mapped_file stat

On Wed, 14 Apr 2010 10:56:08 +0900
KAMEZAWA Hiroyuki <[email protected]> wrote:
> Thinking again... the new page is unlocked here. It means the new page may be
> removed from the radix-tree before commit_charge().
>
> Haha, it seems totally wrong. Please wait...
>

This is my answer. What do you think?
I think this logic is simple. (But yes, we should check the corner cases.)

==

At page migration, the new page is unlocked before end_migration() is called.
memcg's page-migration code misunderstands this, and FILE_MAPPED of a migrated
file cache is not properly updated now.

At migrating a mapped file, events happen in the following sequence.

1. allocate a new page.
2. get the memcg of the old page.
3. charge against the new page, before migration. But at this point,
no changes to page_cgroup; no commit-charge.
4. page migration replaces the old page with the new page in the radix-tree.
5. page migration remaps the new page if the old page was mapped.
6. memcg commits the charge for the newpage.

Because the "commit" happens after the page-remap, we lose the file_mapped
accounting information at migration.

To fix all of this, this patch changes prepare/end migration.
The new migration sequence with memcg is:

1. allocate a new page.
2. charge against the new page, onto the old page's memcg. (Here, the new page
is marked as PageCgroupUsed.)
3. page migration replaces the radix-tree entry, page tables, etc...
4. At remapping, FILE_MAPPED will be properly updated (because the newpage is
already marked as USED).
5. If an anonymous page is freed before remap, check it and fix up the accounting.
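
(A minimal userspace model of why this ordering repairs the stat; the names
are stand-ins for the kernel's hooks, not actual kernel code:)

#include <stdio.h>

static long mapped_file;      /* stands in for MEM_CGROUP_STAT_FILE_MAPPED */
static int newpage_charged;   /* stands in for PageCgroupUsed(newpage) */

static void add_rmap(void)    /* models the memcg hook in page_add_file_rmap() */
{
	if (newpage_charged)
		mapped_file++;
}

static void remove_rmap(void) /* models the memcg hook in page_remove_rmap() */
{
	if (newpage_charged)
		mapped_file--;
}

int main(void)
{
	newpage_charged = 1;  /* step 2: commit during prepare_migration() */
	add_rmap();           /* step 4: the remap now sees a used page */
	remove_rmap();        /* a later unmap is balanced */
	printf("mapped_file = %ld\n", mapped_file); /* prints 0: no underflow */
	return 0;
}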


Reported-by: Daisuke Nishimura <[email protected]>
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
---
include/linux/memcontrol.h | 6 +-
mm/memcontrol.c | 95 ++++++++++++++++++++++++---------------------
mm/migrate.c | 2
3 files changed, 56 insertions(+), 47 deletions(-)

Index: mmotm-temp/mm/memcontrol.c
===================================================================
--- mmotm-temp.orig/mm/memcontrol.c
+++ mmotm-temp/mm/memcontrol.c
@@ -2501,10 +2501,12 @@ static inline int mem_cgroup_move_swap_a
* Before starting migration, account PAGE_SIZE to mem_cgroup that the old
* page belongs to.
*/
-int mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr)
+int mem_cgroup_prepare_migration(struct page *page,
+ struct page *newpage, struct mem_cgroup **ptr)
{
struct page_cgroup *pc;
struct mem_cgroup *mem = NULL;
+ enum charge_type ctype;
int ret = 0;

if (mem_cgroup_disabled())
@@ -2517,65 +2519,70 @@ int mem_cgroup_prepare_migration(struct
css_get(&mem->css);
}
unlock_page_cgroup(pc);
-
- if (mem) {
- ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false);
- css_put(&mem->css);
- }
- *ptr = mem;
+ /*
+ * If the page is uncharged before migration (removed from radix-tree)
+ * we return here.
+ */
+ if (!mem)
+ return 0;
+ ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false);
+ css_put(&mem->css); /* drop extra refcnt */
+ if (ret)
+ return ret;
+ /*
+ * The old page is under lock_page().
+ * If the old page is uncharged and freed during migration, page migration
+ * will fail and the newpage will be properly uncharged by end_migration.
+ * And commit_charge against the newpage never fails.
+ */
+ pc = lookup_page_cgroup(newpage);
+ if (PageAnon(page))
+ ctype = MEM_CGROUP_CHARGE_TYPE_MAPPED;
+ else if (!PageSwapBacked(page))
+ ctype = MEM_CGROUP_CHARGE_TYPE_CACHE;
+ else
+ ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM;
+ __mem_cgroup_commit_charge(mem, pc, ctype);
+ /* FILE_MAPPED of this page will be updated at remap routine */
return ret;
}

/* remove redundant charge if migration failed*/
void mem_cgroup_end_migration(struct mem_cgroup *mem,
- struct page *oldpage, struct page *newpage)
+ struct page *oldpage, struct page *newpage)
{
- struct page *target, *unused;
- struct page_cgroup *pc;
- enum charge_type ctype;
+ struct page *used, *unused;

if (!mem)
return;
cgroup_exclude_rmdir(&mem->css);
+
+
/* at migration success, oldpage->mapping is NULL. */
if (oldpage->mapping) {
- target = oldpage;
- unused = NULL;
+ used = oldpage;
+ unused = newpage;
} else {
- target = newpage;
+ used = newpage;
unused = oldpage;
}
-
- if (PageAnon(target))
- ctype = MEM_CGROUP_CHARGE_TYPE_MAPPED;
- else if (page_is_file_cache(target))
- ctype = MEM_CGROUP_CHARGE_TYPE_CACHE;
- else
- ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM;
-
- /* unused page is not on radix-tree now. */
- if (unused)
- __mem_cgroup_uncharge_common(unused, ctype);
-
- pc = lookup_page_cgroup(target);
- /*
- * __mem_cgroup_commit_charge() check PCG_USED bit of page_cgroup.
- * So, double-counting is effectively avoided.
- */
- __mem_cgroup_commit_charge(mem, pc, ctype);
-
+ /* PageCgroupUsed() flag check will do all we want */
+ mem_cgroup_uncharge_page(unused);
/*
- * Both of oldpage and newpage are still under lock_page().
- * Then, we don't have to care about race in radix-tree.
- * But we have to be careful that this page is unmapped or not.
- *
- * There is a case for !page_mapped(). At the start of
- * migration, oldpage was mapped. But now, it's zapped.
- * But we know *target* page is not freed/reused under us.
- * mem_cgroup_uncharge_page() does all necessary checks.
- */
- if (ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED)
- mem_cgroup_uncharge_page(target);
+ * If the old page was file cache, and was removed from the radix-tree
+ * before lock_page(), prepare_migration doesn't charge it and we never
+ * reach here.
+ *
+ * Considering ANON pages, we can't depend on lock_page.
+ * If a page is unmapped before it's remapped, the new page's
+ * mapcount will not increase (the mapcount 0->1 case never occurs).
+ * PageCgroupUsed() and SwapCache checks will be done.
+ *
+ * Once the mapcount goes to 1, our hook in page_remove_rmap will do
+ * enough work.
+ */
+ if (PageAnon(used) && !page_mapped(used))
+ mem_cgroup_uncharge_page(used);
/*
* At migration, we may charge account against cgroup which has no tasks
* So, rmdir()->pre_destroy() can be called while we do this charge.
Index: mmotm-temp/mm/migrate.c
===================================================================
--- mmotm-temp.orig/mm/migrate.c
+++ mmotm-temp/mm/migrate.c
@@ -576,7 +576,7 @@ static int unmap_and_move(new_page_t get
}

/* charge against new page */
- charge = mem_cgroup_prepare_migration(page, &mem);
+ charge = mem_cgroup_prepare_migration(page, newpage, &mem);
if (charge == -ENOMEM) {
rc = -ENOMEM;
goto unlock;
Index: mmotm-temp/include/linux/memcontrol.h
===================================================================
--- mmotm-temp.orig/include/linux/memcontrol.h
+++ mmotm-temp/include/linux/memcontrol.h
@@ -89,7 +89,8 @@ int mm_match_cgroup(const struct mm_stru
extern struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *mem);

extern int
-mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr);
+mem_cgroup_prepare_migration(struct page *page,
+ struct page *newpage, struct mem_cgroup **ptr);
extern void mem_cgroup_end_migration(struct mem_cgroup *mem,
struct page *oldpage, struct page *newpage);

@@ -228,7 +229,8 @@ static inline struct cgroup_subsys_state
}

static inline int
-mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr)
+mem_cgroup_prepare_migration(struct page *page, struct page *newpage,
+ struct mem_cgroup **ptr)
{
return 0;
}

2010-04-14 05:41:12

by Daisuke Nishimura

Subject: Re: [RFC][BUGFIX][PATCH] memcg: fix underflow of mapped_file stat

On Wed, 14 Apr 2010 12:06:22 +0900, KAMEZAWA Hiroyuki <[email protected]> wrote:
> At page migration, the new page is unlocked before end_migration() is called.
> memcg's page-migration code misunderstands this, and FILE_MAPPED of a migrated
> file cache is not properly updated now.
>
> At migrating a mapped file, events happen in the following sequence.
>
> 1. allocate a new page.
> 2. get the memcg of the old page.
> 3. charge against the new page, before migration. But at this point,
> no changes to page_cgroup; no commit-charge.
> 4. page migration replaces the old page with the new page in the radix-tree.
> 5. page migration remaps the new page if the old page was mapped.
> 6. memcg commits the charge for the newpage.
>
> Because the "commit" happens after the page-remap, we lose the file_mapped
> accounting information at migration.
>
> To fix all of this, this patch changes prepare/end migration.
> The new migration sequence with memcg is:
>
> 1. allocate a new page.
> 2. charge against the new page, onto the old page's memcg. (Here, the new page
> is marked as PageCgroupUsed.)
> 3. page migration replaces the radix-tree entry, page tables, etc...
> 4. At remapping, FILE_MAPPED will be properly updated (because the newpage is
> already marked as USED).
> 5. If an anonymous page is freed before remap, check it and fix up the accounting.
>
>
Great! I like this change very much.

Some comments are inlined.

> Reported-by: Daisuke Nishimura <[email protected]>
> Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
> ---
> include/linux/memcontrol.h | 6 +-
> mm/memcontrol.c | 95 ++++++++++++++++++++++++---------------------
> mm/migrate.c | 2
> 3 files changed, 56 insertions(+), 47 deletions(-)
>
> Index: mmotm-temp/mm/memcontrol.c
> ===================================================================
> --- mmotm-temp.orig/mm/memcontrol.c
> +++ mmotm-temp/mm/memcontrol.c
> @@ -2501,10 +2501,12 @@ static inline int mem_cgroup_move_swap_a
> * Before starting migration, account PAGE_SIZE to mem_cgroup that the old
> * page belongs to.
> */
> -int mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr)
> +int mem_cgroup_prepare_migration(struct page *page,
> + struct page *newpage, struct mem_cgroup **ptr)
> {
> struct page_cgroup *pc;
> struct mem_cgroup *mem = NULL;
> + enum charge_type ctype;
> int ret = 0;
>
> if (mem_cgroup_disabled())
> @@ -2517,65 +2519,70 @@ int mem_cgroup_prepare_migration(struct
> css_get(&mem->css);
> }
> unlock_page_cgroup(pc);
> -
> - if (mem) {
> - ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false);
> - css_put(&mem->css);
> - }
> - *ptr = mem;
> + /*
> + * If the page is uncharged before migration (removed from radix-tree)
> + * we return here.
> + */
> + if (!mem)
> + return 0;
> + ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false);
> + css_put(&mem->css); /* drop extra refcnt */
it should be:

*ptr = mem;
ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, ptr, false);
css_put(&mem->css);

as Andrea has fixed already.

> + if (ret)
> + return ret;
> + /*
> + * The old page is under lock_page().
> + * If the old page is uncharged and freed during migration, page migration
> + * will fail and the newpage will be properly uncharged by end_migration.
> + * And commit_charge against the newpage never fails.
> + */
> + pc = lookup_page_cgroup(newpage);
> + if (PageAnon(page))
> + ctype = MEM_CGROUP_CHARGE_TYPE_MAPPED;
> + else if (!PageSwapBacked(page))
I think using page_is_file_cache() would be better.

> + ctype = MEM_CGROUP_CHARGE_TYPE_CACHE;
> + else
> + ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM;
> + __mem_cgroup_commit_charge(mem, pc, ctype);
> + /* FILE_MAPPED of this page will be updated at remap routine */
> return ret;
> }
>
> /* remove redundant charge if migration failed*/
> void mem_cgroup_end_migration(struct mem_cgroup *mem,
> - struct page *oldpage, struct page *newpage)
> + struct page *oldpage, struct page *newpage)
> {
> - struct page *target, *unused;
> - struct page_cgroup *pc;
> - enum charge_type ctype;
> + struct page *used, *unused;
>
> if (!mem)
> return;
> cgroup_exclude_rmdir(&mem->css);
> +
> +
unnecessary extra line :)

> /* at migration success, oldpage->mapping is NULL. */
> if (oldpage->mapping) {
> - target = oldpage;
> - unused = NULL;
> + used = oldpage;
> + unused = newpage;
> } else {
> - target = newpage;
> + used = newpage;
> unused = oldpage;
> }
> -
> - if (PageAnon(target))
> - ctype = MEM_CGROUP_CHARGE_TYPE_MAPPED;
> - else if (page_is_file_cache(target))
> - ctype = MEM_CGROUP_CHARGE_TYPE_CACHE;
> - else
> - ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM;
> -
> - /* unused page is not on radix-tree now. */
> - if (unused)
> - __mem_cgroup_uncharge_common(unused, ctype);
> -
> - pc = lookup_page_cgroup(target);
> - /*
> - * __mem_cgroup_commit_charge() check PCG_USED bit of page_cgroup.
> - * So, double-counting is effectively avoided.
> - */
> - __mem_cgroup_commit_charge(mem, pc, ctype);
> -
> + /* PageCgroupUsed() flag check will do all we want */
> + mem_cgroup_uncharge_page(unused);
hmm... using mem_cgroup_uncharge_page() would be enough, but I think it doesn't
show what we want: we must uncharge "unused" by all means in the PageCgroupUsed case,
and I feel it a bit strange to uncharge "unused" via mem_cgroup_uncharge_page()
if it *was* a cache page.
So I think __mem_cgroup_uncharge_common(unused, MEM_CGROUP_CHARGE_TYPE_FORCE)
would be better; otherwise we need more comments to explain why
mem_cgroup_uncharge_page() is enough.

> /*
> - * Both of oldpage and newpage are still under lock_page().
> - * Then, we don't have to care about race in radix-tree.
> - * But we have to be careful that this page is unmapped or not.
> - *
> - * There is a case for !page_mapped(). At the start of
> - * migration, oldpage was mapped. But now, it's zapped.
> - * But we know *target* page is not freed/reused under us.
> - * mem_cgroup_uncharge_page() does all necessary checks.
> - */
> - if (ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED)
> - mem_cgroup_uncharge_page(target);
> + * If the old page was file cache, and was removed from the radix-tree
> + * before lock_page(), prepare_migration doesn't charge it and we never
> + * reach here.
> + *
And if the newpage is removed from the radix-tree after unlock_page(),
the context that removed it from the radix-tree uncharges it properly, because
it was charged at prepare_migration.

Right?

> + * Considering ANON pages, we can't depend on lock_page.
> + * If a page is unmapped before it's remapped, the new page's
> + * mapcount will not increase (the mapcount 0->1 case never occurs).
> + * PageCgroupUsed() and SwapCache checks will be done.
> + *
> + * Once the mapcount goes to 1, our hook in page_remove_rmap will do
> + * enough work.
> + */
> + if (PageAnon(used) && !page_mapped(used))
> + mem_cgroup_uncharge_page(used);
mem_cgroup_uncharge_page() does the same check :)


Thanks,
Daisuke Nishimura.

> /*
> * At migration, we may charge account against cgroup which has no tasks
> * So, rmdir()->pre_destroy() can be called while we do this charge.
> Index: mmotm-temp/mm/migrate.c
> ===================================================================
> --- mmotm-temp.orig/mm/migrate.c
> +++ mmotm-temp/mm/migrate.c
> @@ -576,7 +576,7 @@ static int unmap_and_move(new_page_t get
> }
>
> /* charge against new page */
> - charge = mem_cgroup_prepare_migration(page, &mem);
> + charge = mem_cgroup_prepare_migration(page, newpage, &mem);
> if (charge == -ENOMEM) {
> rc = -ENOMEM;
> goto unlock;
> Index: mmotm-temp/include/linux/memcontrol.h
> ===================================================================
> --- mmotm-temp.orig/include/linux/memcontrol.h
> +++ mmotm-temp/include/linux/memcontrol.h
> @@ -89,7 +89,8 @@ int mm_match_cgroup(const struct mm_stru
> extern struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *mem);
>
> extern int
> -mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr);
> +mem_cgroup_prepare_migration(struct page *page,
> + struct page *newpage, struct mem_cgroup **ptr);
> extern void mem_cgroup_end_migration(struct mem_cgroup *mem,
> struct page *oldpage, struct page *newpage);
>
> @@ -228,7 +229,8 @@ static inline struct cgroup_subsys_state
> }
>
> static inline int
> -mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr)
> +mem_cgroup_prepare_migration(struct page *page, struct page *newpage,
> + struct mem_cgroup **ptr)
> {
> return 0;
> }
>

2010-04-14 05:44:14

by Kamezawa Hiroyuki

Subject: Re: [RFC][BUGFIX][PATCH] memcg: fix underflow of mapped_file stat

On Wed, 14 Apr 2010 14:31:32 +0900
Daisuke Nishimura <[email protected]> wrote:

> > Reported-by: Daisuke Nishimura <[email protected]>
> > Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
> > ---
> > include/linux/memcontrol.h | 6 +-
> > mm/memcontrol.c | 95 ++++++++++++++++++++++++---------------------
> > mm/migrate.c | 2
> > 3 files changed, 56 insertions(+), 47 deletions(-)
> >
> > Index: mmotm-temp/mm/memcontrol.c
> > ===================================================================
> > --- mmotm-temp.orig/mm/memcontrol.c
> > +++ mmotm-temp/mm/memcontrol.c
> > @@ -2501,10 +2501,12 @@ static inline int mem_cgroup_move_swap_a
> > * Before starting migration, account PAGE_SIZE to mem_cgroup that the old
> > * page belongs to.
> > */
> > -int mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr)
> > +int mem_cgroup_prepare_migration(struct page *page,
> > + struct page *newpage, struct mem_cgroup **ptr)
> > {
> > struct page_cgroup *pc;
> > struct mem_cgroup *mem = NULL;
> > + enum charge_type ctype;
> > int ret = 0;
> >
> > if (mem_cgroup_disabled())
> > @@ -2517,65 +2519,70 @@ int mem_cgroup_prepare_migration(struct
> > css_get(&mem->css);
> > }
> > unlock_page_cgroup(pc);
> > -
> > - if (mem) {
> > - ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false);
> > - css_put(&mem->css);
> > - }
> > - *ptr = mem;
> > + /*
> > + * If the page is uncharged before migration (removed from radix-tree)
> > + * we return here.
> > + */
> > + if (!mem)
> > + return 0;
> > + ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false);
> > + css_put(&mem->css); /* drop extra refcnt */
> it should be:
>
> *ptr = mem;
> ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, ptr, false);
> css_put(&mem->css);
>
> as Andrea has fixed already.
>
Ah, yes. I'll rebase this onto Andrea's fix.



> > + if (ret)
> > + return ret;
> > + /*
> > + * The old page is under lock_page().
> > + * If the old page is uncharged and freed during migration, page migration
> > + * will fail and the newpage will be properly uncharged by end_migration.
> > + * And commit_charge against the newpage never fails.
> > + */
> > + pc = lookup_page_cgroup(newpage);
> > + if (PageAnon(page))
> > + ctype = MEM_CGROUP_CHARGE_TYPE_MAPPED;
> > + else if (!PageSwapBacked(page))
> I think using page_is_file_cache() would be better.
>
Right.



> > + ctype = MEM_CGROUP_CHARGE_TYPE_CACHE;
> > + else
> > + ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM;
> > + __mem_cgroup_commit_charge(mem, pc, ctype);
> > + /* FILE_MAPPED of this page will be updated at remap routine */
> > return ret;
> > }
> >
> > /* remove redundant charge if migration failed*/
> > void mem_cgroup_end_migration(struct mem_cgroup *mem,
> > - struct page *oldpage, struct page *newpage)
> > + struct page *oldpage, struct page *newpage)
> > {
> > - struct page *target, *unused;
> > - struct page_cgroup *pc;
> > - enum charge_type ctype;
> > + struct page *used, *unused;
> >
> > if (!mem)
> > return;
> > cgroup_exclude_rmdir(&mem->css);
> > +
> > +
> unnecessary extra line :)
>
will remove.



> > /* at migration success, oldpage->mapping is NULL. */
> > if (oldpage->mapping) {
> > - target = oldpage;
> > - unused = NULL;
> > + used = oldpage;
> > + unused = newpage;
> > } else {
> > - target = newpage;
> > + used = newpage;
> > unused = oldpage;
> > }
> > -
> > - if (PageAnon(target))
> > - ctype = MEM_CGROUP_CHARGE_TYPE_MAPPED;
> > - else if (page_is_file_cache(target))
> > - ctype = MEM_CGROUP_CHARGE_TYPE_CACHE;
> > - else
> > - ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM;
> > -
> > - /* unused page is not on radix-tree now. */
> > - if (unused)
> > - __mem_cgroup_uncharge_common(unused, ctype);
> > -
> > - pc = lookup_page_cgroup(target);
> > - /*
> > - * __mem_cgroup_commit_charge() check PCG_USED bit of page_cgroup.
> > - * So, double-counting is effectively avoided.
> > - */
> > - __mem_cgroup_commit_charge(mem, pc, ctype);
> > -
> > + /* PageCgroupUsed() flag check will do all we want */
> > + mem_cgroup_uncharge_page(unused);
> hmm... using mem_cgroup_uncharge_page() would be enough, but I think it doesn't
> show what we want: we must uncharge "unused" by all means in the PageCgroupUsed case,
> and I feel it a bit strange to uncharge "unused" via mem_cgroup_uncharge_page()
> if it *was* a cache page.
> So I think __mem_cgroup_uncharge_common(unused, MEM_CGROUP_CHARGE_TYPE_FORCE)
> would be better; otherwise we need more comments to explain why
> mem_cgroup_uncharge_page() is enough.
>

Hmm. ok. consider this part again.



> > /*
> > - * Both of oldpage and newpage are still under lock_page().
> > - * Then, we don't have to care about race in radix-tree.
> > - * But we have to be careful that this page is unmapped or not.
> > - *
> > - * There is a case for !page_mapped(). At the start of
> > - * migration, oldpage was mapped. But now, it's zapped.
> > - * But we know *target* page is not freed/reused under us.
> > - * mem_cgroup_uncharge_page() does all necessary checks.
> > - */
> > - if (ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED)
> > - mem_cgroup_uncharge_page(target);
> > + * If the old page was file cache, and was removed from the radix-tree
> > + * before lock_page(), prepare_migration doesn't charge it and we never
> > + * reach here.
> > + *
> And if the newpage is removed from the radix-tree after unlock_page(),
> the context that removed it from the radix-tree uncharges it properly, because
> it was charged at prepare_migration.
>
> Right?
>
yes. I'll add more texts.




> > + * Considering ANON pages, we can't depend on lock_page.
> > + * If a page is unmapped before it's remapped, the new page's
> > + * mapcount will not increase (the mapcount 0->1 case never occurs).
> > + * PageCgroupUsed() and SwapCache checks will be done.
> > + *
> > + * Once the mapcount goes to 1, our hook in page_remove_rmap will do
> > + * enough work.
> > + */
> > + if (PageAnon(used) && !page_mapped(used))
> > + mem_cgroup_uncharge_page(used);
> mem_cgroup_uncharge_page() does the same check :)
>
Ok. I'll fix.

Thanks,
-Kame

2010-04-15 02:31:56

by Daisuke Nishimura

Subject: Re: [RFC][BUGFIX][PATCH] memcg: fix underflow of mapped_file stat

> > > @@ -2517,65 +2519,70 @@ int mem_cgroup_prepare_migration(struct
> > > css_get(&mem->css);
> > > }
> > > unlock_page_cgroup(pc);
> > > -
> > > - if (mem) {
> > > - ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false);
> > > - css_put(&mem->css);
> > > - }
> > > - *ptr = mem;
> > > + /*
> > > + * If the page is uncharged before migration (removed from radix-tree)
> > > + * we return here.
> > > + */
> > > + if (!mem)
> > > + return 0;
> > > + ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, &mem, false);
> > > + css_put(&mem->css); /* drop extra refcnt */
> > it should be:
> >
> > *ptr = mem;
> > ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, ptr, false);
> > css_put(&mem->css);
> >
> > as Andrea has fixed already.
> >
> Ah, yes. I'll rebase this onto Andrea's fix.
>
>
>
> > > + if (ret)
We should check "if (ret || !*ptr)" so as not to do the commit in the !*ptr case.

> > > + * Considering ANON pages, we can't depend on lock_page.
> > > + * If a page is unmapped before it's remapped, the new page's
> > > + * mapcount will not increase (the mapcount 0->1 case never occurs).
> > > + * PageCgroupUsed() and SwapCache checks will be done.
> > > + *
> > > + * Once the mapcount goes to 1, our hook in page_remove_rmap will do
> > > + * enough work.
> > > + */
> > > + if (PageAnon(used) && !page_mapped(used))
> > > + mem_cgroup_uncharge_page(used);
> > mem_cgroup_uncharge_page() does the same check :)
> >
> Ok. I'll fix.
>
Thinking about it more, we'd better check PageAnon() at least, so as not to call
mem_cgroup_uncharge_page() for a cache page.


Thanks,
Daisuke Nishimura.

2010-04-15 03:09:14

by Kamezawa Hiroyuki

Subject: [RFC][BUGFIX][PATCH 1/2] memcg: fix charge bypass route of migration

I'd like to wait until the next mmotm comes out (so, [RFC]); I'll rebase this then.
This patch is based on
mmotm-2010/04/05
+
mm-migration-take-a-reference-to-the-anon_vma-before-migrating.patch
mm-migration-do-not-try-to-migrate-unmapped-anonymous-pages.patch
mm-share-the-anon_vma-ref-counts-between-ksm-and-page-migration.patch
mm-migration-allow-the-migration-of-pageswapcache-pages.patch
memcg-fix-prepare-migration.patch

==
From: KAMEZAWA Hiroyuki <[email protected]>

This is an additional fix to memcg-fix-prepare-migration.patch

Now, try_charge() can bypass the charge if TIF_MEMDIE et al. are set on the caller.
In this case the charge is skipped, which corrupts the accounting.
(The PageCgroup will be marked as PCG_USED even though the charge was bypassed,
and css->refcnt can leak.)

This patch clears the passed "*memcg" in the bypass route.

Because the usual page allocator passes memcg=NULL, this patch only affects
some special cases:
- move account
- migration
- swapin.
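
(A userspace sketch of the *memcg-as-signal convention this aims at; the
types and the dying-task flag below are stand-ins, not the kernel's code:)

#include <stdio.h>
#include <stddef.h>

struct mem_cgroup { long usage; long refcnt; };

/* models __mem_cgroup_try_charge(): on a dying task the charge is bypassed,
 * and *memcg is cleared to tell the caller that nothing was charged */
static int try_charge(struct mem_cgroup **memcg, int task_dying)
{
	if (task_dying) {
		*memcg = NULL;   /* the bypass route */
		return 0;
	}
	(*memcg)->usage++;
	(*memcg)->refcnt++;
	return 0;
}

int main(void)
{
	struct mem_cgroup cg = { 0, 0 };
	struct mem_cgroup *mem = &cg;

	try_charge(&mem, 1);     /* TIF_MEMDIE-style bypass */
	if (mem == NULL)         /* caller must skip commit: no PCG_USED, no leak */
		printf("charge bypassed, nothing to commit\n");
	printf("usage=%ld refcnt=%ld\n", cg.usage, cg.refcnt); /* both 0 */
	return 0;
}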

Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
---
mm/memcontrol.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

Index: mmotm-temp/mm/memcontrol.c
===================================================================
--- mmotm-temp.orig/mm/memcontrol.c
+++ mmotm-temp/mm/memcontrol.c
@@ -1606,8 +1606,12 @@ static int __mem_cgroup_try_charge(struc
* MEMDIE process.
*/
if (unlikely(test_thread_flag(TIF_MEMDIE)
- || fatal_signal_pending(current)))
+ || fatal_signal_pending(current))) {
+ /* Showing we skipped charge */
+ if (memcg)
+ *memcg = NULL;
goto bypass;
+ }

/*
* We always charge the cgroup the mm_struct belongs to.
@@ -2523,7 +2527,6 @@ int mem_cgroup_prepare_migration(struct
ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, ptr, false);
css_put(&mem->css);
}
- *ptr = mem;
return ret;
}




2010-04-15 03:10:49

by Kamezawa Hiroyuki

Subject: [RFC][BUGFIX][PATCH 2/2] memcg: fix file mapped underflow at migration (v2)


At page migration, the new page is unlocked before end_migration() is called.
memcg's page-migration code misunderstands this, and FILE_MAPPED of a migrated
file cache is not properly updated now.

At migrating a mapped file, events happen in the following sequence.

1. allocate a new page.
2. get the memcg of the old page.
3. charge against the new page, before migration. But at this point,
no changes to page_cgroup; no commit-charge.
4. page migration replaces the old page with the new page in the radix-tree.
5. page migration remaps the new page if the old page was mapped.
6. memcg commits the charge for the newpage.

Because the "commit" happens after the page-remap, we lose the file_mapped
accounting information at migration.

To fix all of this, this patch changes prepare/end migration.
The new migration sequence with memcg is:

1. allocate a new page.
2. charge against the new page, onto the old page's memcg. (Here, the new page's pc
is marked as PageCgroupUsed.)
3. page migration replaces the radix-tree entry, page tables, etc...
4. At remapping, FILE_MAPPED will be properly updated.

Changelog: 2010/04/14
- updated onto the latest mmotm + page-compaction, etc...
- fixed __try_charge() bypass case.
- use CHARGE_TYPE_FORCE for uncharging an unused page.

Reported-by: Daisuke Nishimura <[email protected]>
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
---
include/linux/memcontrol.h | 6 +-
mm/memcontrol.c | 96 ++++++++++++++++++++++++---------------------
mm/migrate.c | 2
3 files changed, 58 insertions(+), 46 deletions(-)

Index: mmotm-temp/mm/memcontrol.c
===================================================================
--- mmotm-temp.orig/mm/memcontrol.c
+++ mmotm-temp/mm/memcontrol.c
@@ -2505,10 +2505,12 @@ static inline int mem_cgroup_move_swap_a
* Before starting migration, account PAGE_SIZE to mem_cgroup that the old
* page belongs to.
*/
-int mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr)
+int mem_cgroup_prepare_migration(struct page *page,
+ struct page *newpage, struct mem_cgroup **ptr)
{
struct page_cgroup *pc;
struct mem_cgroup *mem = NULL;
+ enum charge_type ctype;
int ret = 0;

if (mem_cgroup_disabled())
@@ -2521,65 +2523,73 @@ int mem_cgroup_prepare_migration(struct
css_get(&mem->css);
}
unlock_page_cgroup(pc);
-
+ /*
+ * If the page is uncharged before migration (removed from radix-tree)
+ * we return here.
+ */
+ if (!mem)
+ return 0;
*ptr = mem;
- if (mem) {
- ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, ptr, false);
- css_put(&mem->css);
- }
+ ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, ptr, false);
+ css_put(&mem->css); /* drop extra refcnt */
+ /*
+ * When we get *ptr == NULL, it means the current task is being killed and
+ * we skipped the charge. In this case, make migration fail.
+ */
+ if (ret || *ptr == NULL)
+ return -ENOMEM;
+ /*
+ * The old page is under lock_page().
+ * If the old page is uncharged and freed during migration, page
+ * migration will fail and the newpage will be properly uncharged.
+ * This commit_charge against the newpage never fails.
+ */
+ pc = lookup_page_cgroup(newpage);
+ if (PageAnon(page))
+ ctype = MEM_CGROUP_CHARGE_TYPE_MAPPED;
+ else if (page_is_file_cache(page))
+ ctype = MEM_CGROUP_CHARGE_TYPE_CACHE;
+ else
+ ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM;
+ __mem_cgroup_commit_charge(mem, pc, ctype);
+ /* FILE_MAPPED of this page will be updated at remap routine */
return ret;
}

/* remove redundant charge if migration failed*/
void mem_cgroup_end_migration(struct mem_cgroup *mem,
- struct page *oldpage, struct page *newpage)
+ struct page *oldpage, struct page *newpage)
{
- struct page *target, *unused;
- struct page_cgroup *pc;
- enum charge_type ctype;
+ struct page *used, *unused;

if (!mem)
return;
cgroup_exclude_rmdir(&mem->css);
/* at migration success, oldpage->mapping is NULL. */
if (oldpage->mapping) {
- target = oldpage;
- unused = NULL;
+ used = oldpage;
+ unused = newpage;
} else {
- target = newpage;
+ used = newpage;
unused = oldpage;
}
-
- if (PageAnon(target))
- ctype = MEM_CGROUP_CHARGE_TYPE_MAPPED;
- else if (page_is_file_cache(target))
- ctype = MEM_CGROUP_CHARGE_TYPE_CACHE;
- else
- ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM;
-
- /* unused page is not on radix-tree now. */
- if (unused)
- __mem_cgroup_uncharge_common(unused, ctype);
-
- pc = lookup_page_cgroup(target);
+ /* PageCgroupUsed() flag check will do all we want */
+ __mem_cgroup_uncharge_common(unused, MEM_CGROUP_CHARGE_TYPE_FORCE);
/*
- * __mem_cgroup_commit_charge() check PCG_USED bit of page_cgroup.
- * So, double-counting is effectively avoided.
- */
- __mem_cgroup_commit_charge(mem, pc, ctype);
-
- /*
- * Both of oldpage and newpage are still under lock_page().
- * Then, we don't have to care about race in radix-tree.
- * But we have to be careful that this page is unmapped or not.
- *
- * There is a case for !page_mapped(). At the start of
- * migration, oldpage was mapped. But now, it's zapped.
- * But we know *target* page is not freed/reused under us.
- * mem_cgroup_uncharge_page() does all necessary checks.
- */
- if (ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED)
- mem_cgroup_uncharge_page(target);
+ * If the old page was file cache, and was removed from the radix-tree
+ * before lock_page(), prepare_migration doesn't charge it and we never
+ * reach here.
+ *
+ * Considering ANON pages, we can't depend on lock_page.
+ * If a page is unmapped before it's remapped, the new page's
+ * mapcount doesn't increase (the mapcount 0->1 case never occurs)
+ * and we have no chance to uncharge it.
+ *
+ * Once the mapcount goes to 1, our hook in page_remove_rmap will do
+ * enough work.
+ */
+ if (PageAnon(used))
+ mem_cgroup_uncharge_page(used);
/*
* At migration, we may charge account against cgroup which has no tasks
* So, rmdir()->pre_destroy() can be called while we do this charge.
Index: mmotm-temp/mm/migrate.c
===================================================================
--- mmotm-temp.orig/mm/migrate.c
+++ mmotm-temp/mm/migrate.c
@@ -581,7 +581,7 @@ static int unmap_and_move(new_page_t get
}

/* charge against new page */
- charge = mem_cgroup_prepare_migration(page, &mem);
+ charge = mem_cgroup_prepare_migration(page, newpage, &mem);
if (charge == -ENOMEM) {
rc = -ENOMEM;
goto unlock;
Index: mmotm-temp/include/linux/memcontrol.h
===================================================================
--- mmotm-temp.orig/include/linux/memcontrol.h
+++ mmotm-temp/include/linux/memcontrol.h
@@ -89,7 +89,8 @@ int mm_match_cgroup(const struct mm_stru
extern struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *mem);

extern int
-mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr);
+mem_cgroup_prepare_migration(struct page *page,
+ struct page *newpage, struct mem_cgroup **ptr);
extern void mem_cgroup_end_migration(struct mem_cgroup *mem,
struct page *oldpage, struct page *newpage);

@@ -228,7 +229,8 @@ static inline struct cgroup_subsys_state
}

static inline int
-mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr)
+mem_cgroup_prepare_migration(struct page *page, struct page *newpage,
+ struct mem_cgroup **ptr)
{
return 0;
}

2010-04-15 06:49:21

by Daisuke Nishimura

Subject: Re: [RFC][BUGFIX][PATCH 1/2] memcg: fix charge bypass route of migration

On Thu, 15 Apr 2010 12:05:16 +0900, KAMEZAWA Hiroyuki <[email protected]> wrote:
> I'd like to wait until the next mmotm comes out (so, [RFC]); I'll rebase this then.
> This patch is based on
> mmotm-2010/04/05
> +
> mm-migration-take-a-reference-to-the-anon_vma-before-migrating.patch
> mm-migration-do-not-try-to-migrate-unmapped-anonymous-pages.patch
> mm-share-the-anon_vma-ref-counts-between-ksm-and-page-migration.patch
> mm-migration-allow-the-migration-of-pageswapcache-pages.patch
> memcg-fix-prepare-migration.patch
>
> ==
> From: KAMEZAWA Hiroyuki <[email protected]>
>
> This is an additional fix to memcg-fix-prepare-migration.patch
>
> Now, try_charge() can bypass the charge if TIF_MEMDIE et al. are set on the caller.
> In this case the charge is skipped, which corrupts the accounting.
> (The PageCgroup will be marked as PCG_USED even though the charge was bypassed,
> and css->refcnt can leak.)
>
> This patch clears the passed "*memcg" in the bypass route.
>
> Because the usual page allocator passes memcg=NULL, this patch only affects
> some special cases:
> - move account
> - migration
> - swapin.
>
> Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
> ---
> mm/memcontrol.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> Index: mmotm-temp/mm/memcontrol.c
> ===================================================================
> --- mmotm-temp.orig/mm/memcontrol.c
> +++ mmotm-temp/mm/memcontrol.c
> @@ -1606,8 +1606,12 @@ static int __mem_cgroup_try_charge(struc
> * MEMDIE process.
> */
> if (unlikely(test_thread_flag(TIF_MEMDIE)
> - || fatal_signal_pending(current)))
> + || fatal_signal_pending(current))) {
> + /* Showing we skipped charge */
> + if (memcg)
> + *memcg = NULL;
> goto bypass;
> + }
>
I'm sorry, I can't understand what this part fixes.
We already set *memcg to NULL in the "bypass" part:

1740 bypass:
1741 *memcg = NULL;
1742 return 0;

and __mem_cgroup_try_charge() is never called with @memcg == NULL, IIUC.

> /*
> * We always charge the cgroup the mm_struct belongs to.
> @@ -2523,7 +2527,6 @@ int mem_cgroup_prepare_migration(struct
> ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, ptr, false);
> css_put(&mem->css);
> }
> - *ptr = mem;
> return ret;
> }
>
I sent a patch to Andrew to fix this part yesterday :)


Thanks,
Daisuke Nishimura.

2010-04-15 07:00:13

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [RFC][BUGFIX][PATCH 1/2] memcg: fix charge bypass route of migration

On Thu, 15 Apr 2010 15:43:24 +0900
Daisuke Nishimura <[email protected]> wrote:

> On Thu, 15 Apr 2010 12:05:16 +0900, KAMEZAWA Hiroyuki <[email protected]> wrote:
> > [changelog snipped]
> > Index: mmotm-temp/mm/memcontrol.c
> > ===================================================================
> > --- mmotm-temp.orig/mm/memcontrol.c
> > +++ mmotm-temp/mm/memcontrol.c
> > @@ -1606,8 +1606,12 @@ static int __mem_cgroup_try_charge(struc
> > * MEMDIE process.
> > */
> > if (unlikely(test_thread_flag(TIF_MEMDIE)
> > - || fatal_signal_pending(current)))
> > + || fatal_signal_pending(current))) {
> > + /* Showing we skipped charge */
> > + if (memcg)
> > + *memcg = NULL;
> > goto bypass;
> > + }
> >
> I'm sorry, I can't understand what this part fixes.
> We already set *memcg to NULL in the "bypass" part:
>
> 1740 bypass:
> 1741 *memcg = NULL;
> 1742 return 0;
>
> and __mem_cgroup_try_charge() is never called with @memcg == NULL, IIUC.
>

I totally missed that..Sigh.


> > /*
> > * We always charge the cgroup the mm_struct belongs to.
> > @@ -2523,7 +2527,6 @@ int mem_cgroup_prepare_migration(struct
> > ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, ptr, false);
> > css_put(&mem->css);
> > }
> > - *ptr = mem;
> > return ret;
> > }
> >
> I sent a patch to Andrew to fix this part yesterday :)
>

Ok, ignore this patch.

-Kame

2010-04-15 08:18:26

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [RFC][BUGFIX][PATCH 1/2] memcg: fix charge bypass route of migration

On Thu, Apr 15, 2010 at 03:56:11PM +0900, KAMEZAWA Hiroyuki wrote:
> Ok, ignore this patch.

Ok so I'll stick to my original patch on aa.git:

http://git.kernel.org/?p=linux/kernel/git/andrea/aa.git;a=patch;h=f0a05fea58501298ab7b800ac8220f017c66f427

I also already merged Mel's move of two files from /proc to debugfs.
So now I have to:

1) finish the generic doc in Documentation/ (mostly taken from
transparent hugepage core changeset comments here:
http://git.kernel.org/?p=linux/kernel/git/andrea/aa.git;a=commit;h=b901f7e1ab412241d4299954ae28505f2206af1d
)

2) add alloc_pages_vma for numa awareness in the huge page faults

3) have the kernel stack 2m aligned and grow the vm_start down in 2m
chunks when enabled=always. I doubt it makes sense to decouple this
feature from enabled=always and to add a special sysfs control for
it, plus I don't like adding too many apis and it can always be
decoupled later.

4) I think I will not add a prctl to achieve Ingo's per-process enable
for now. I'm quite convinced in real life madvise is enough and
enabled=always|madvise|never is more than enough for the testing
without having to add a prctl. This is identical issue to KSM after
all, in the end also KSM is missing a prctl to enabled merging on a
per process basis and that's fine. prctl really looks very much
like libhugetlbfs to me so I'm not very attracted to it as I doubt
its usefulness strongly and if I add it, it becomes a
forever-existing API (actually even worse than the sysfs layout
from the kernel API point of view) so there has to be a strong
reason for it. And I don't think there's any point to add a
madvise(MADV_NO_HUGEPAGE) or a prctl to selectively _disable_
hugepages on mappings or processes when enabled=always. It makes no
sense to use enabled=always and then to disable hugepages in a few
apps. The opposite makes sense to save memory of course! I don't
want to add kernel APIs in prctl useful only for testing and
benchmarking. It can always be added later anyway...

5) Ulrich sent me a _three_ liner that will make glibc fully cooperate
and guarantee all anon ram goes in hugepages without using
khugepaged (just like libhugetlbfs would cooperate with
hugetlbfs). For the posix threads it won't work yet and for that we
may need to add a MAP_ALIGN flag to mmap (suggested by him) to be
optimal and not waste address space on 32bit archs. That's no big
deal, it's still orders of magnitude simpler than backing an
mmap(4k) with a 2M page and collecting the still-unmapped parts of
the 2M pages when the system is low on memory. Furthermore MAP_ALIGN
will involve the mmap paths with mmap_sem write mode, that aren't
really fast paths, while the mmap(4k) backed by 2M would slowdown
do_anonymous_pages and other core fast paths that are much more
performance critical than the mmap paths. So I think this is the
way to go. And if somebody doesn't want to risk wasting memory, the
default should be enabled=madvise and then add madvise where
needed. One either has to choose between performance and memory,
and I don't want intermediate terms like "a bit faster but not as
fast as it can be, but waste a little less memory" which also
complicates the code a lot and microslowdown the fast paths.

6) add a config option at kernel configuration time to select the
transparent hugepage default between always/madvise/never
(in-kernel set_recommended_min_free_kbytes late_initcall() will be
running only for always/madvise, as it already checks the built time
default and it won't run unless enabled=always|madvise).

2010-04-16 10:35:51

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: [RFC][BUGFIX][PATCH 2/2] memcg: fix file mapped underflow at migration (v3)

This is still RFC.
I hope this will finally fix the memcg vs. page-migration war ;(
But we have to be careful...
==

At page migration, the new page is unlocked before end_migration() is called.
memcg's page-migration code misunderstands this.
(But it wasn't a big problem..the real issue is mapcount handling.)

This leaves a small race that we need to fix. Because of it,
FILE_MAPPED of a migrated file cache is not properly updated now.

This patch is for fixing the race by changing algorithm.

When migrating a mapped file, events happen in the following sequence.

1. allocate a new page.
2. get memcg of an old page.
3. charge against a new page before migration. But at this point,
no changes to new page's page_cgroup, no commit-charge.
4. page migration replaces radix-tree, old-page and new-page.
5. page migration remaps the new page if the old page was mapped.
6. Here, the new page is unlocked.
7. memcg commits the charge for newpage, marking its page_cgroup as USED.

Because "commit" happens after page-remap, we cannot count FILE_MAPPED
at "5", we can lose file_mapped accounting information within this
small race. FILE_MAPPED is updated only when mapcount changes 0->1.
So, if we don't catch this 0->1 event, we can underflow FILE_MAPPED
at 1->0 event.

We may be able to avoid the underflow by some small technique or new hook, but
we should catch the mapcount 0->1 event fundamentally. To catch it, we have to
mark the new page's page_cgroup as "USED".

BTW, historically, the above implementation comes from the migration-failure
handling of anonymous pages. When we charge both the old page and the new page
with mapcount=0 before migration, we can't catch corner cases such as:
- the page is really freed before remap.
- migration fails but the page is freed before remap.

To fix all of this, this patch changes prepare/end migration to use a MIGRATION flag on
page_cgroup.

New migration sequence with memcg is:

1. allocate a new page.
2. mark PageCgroupMigration to the old page.
3. charge against a new page onto the old page's memcg. (here, new page's pc
is marked as PageCgroupUsed.)
4. mark PageCgroupMigration to the new page.

If a page_cgroup is marked as PageCgroupMigration, it is not uncharged until
the flag is cleared.

5. page migration replaces radix-tree, page table, etc...
6. At remapping, new page's page_cgroup is now marked as "USED"
We can catch 0->1 event and FILE_MAPPED will be properly updated.
7. Clear PageCgroupMigration of both of old page and new page.
8. uncharge unnecessary pages.

By this:
At migration success of Anon:
- The new page is properly charged. If not-mapped after remap,
uncharge() will be called.
- The file cache is properly charged. FILE_MAPPED event can be caught.

At migration failure of Anon:
- The old page stays as charged. If not mapped after remap,
uncharge() will be called after clearing PageCgroupMigration.
If mapped, or on radix-tree(swapcache), it's not uncharged.

I hope this change will make memcg's page migration much simpler.
Page migration has caused several troubles; it is worth adding a flag
for simplification.
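
For reference, here is a rough sketch of how the hooks line up around the
migration core under this scheme (illustrative pseudocode condensed from this
patch and the discussion, not the literal kernel code):

	/* in unmap_and_move(), simplified */
	mem_cgroup_prepare_migration(page, newpage, &mem);
		/* sets PCG_MIGRATION on the old page's page_cgroup,
		 * commits a charge for newpage so its page_cgroup is
		 * PCG_USED before any remapping, and sets PCG_MIGRATION
		 * on newpage as well */

	try_to_unmap(page, TTU_MIGRATION);
		/* mapcount falls to 0, but PCG_MIGRATION blocks the
		 * uncharge that would normally follow */

	move_to_new_page(newpage, page);
	remove_migration_ptes(page, newpage);
		/* the mapcount 0->1 event now sees PCG_USED, so
		 * FILE_MAPPED is incremented correctly */

	mem_cgroup_end_migration(mem, page, newpage);
		/* clears PCG_MIGRATION and uncharges whichever of the
		 * two pages ended up unused */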

Changelog: 2010/04/15
- updated against mm-of-the-moment snapshot 2010-04-15-14-42
+ Nishimura's fix. memcg-fix-prepare-migration-fix.patch
- fixed some typos.
- handle migration failure of anon page.

Changelog: 2010/04/14
- updated onto the latest mmotm + page-compaction, etc...
- fixed __try_charge() bypass case.
- use CHARGE_TYPE_FORCE for uncharging an unused page.

Reported-by: Daisuke Nishimura <[email protected]>
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
---
include/linux/memcontrol.h | 6 +-
include/linux/page_cgroup.h | 5 +
mm/memcontrol.c | 115 +++++++++++++++++++++++++++++---------------
mm/migrate.c | 2
4 files changed, 87 insertions(+), 41 deletions(-)

Index: mmotm-Apr16/mm/memcontrol.c
===================================================================
--- mmotm-Apr16.orig/mm/memcontrol.c
+++ mmotm-Apr16/mm/memcontrol.c
@@ -2275,6 +2275,9 @@ __mem_cgroup_uncharge_common(struct page
if (!PageCgroupUsed(pc))
goto unlock_out;

+ if (unlikely(PageAnon(page) && PageCgroupMigration(pc)))
+ goto unlock_out;
+
switch (ctype) {
case MEM_CGROUP_CHARGE_TYPE_MAPPED:
case MEM_CGROUP_CHARGE_TYPE_DROP:
@@ -2501,10 +2504,12 @@ static inline int mem_cgroup_move_swap_a
* Before starting migration, account PAGE_SIZE to mem_cgroup that the old
* page belongs to.
*/
-int mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr)
+int mem_cgroup_prepare_migration(struct page *page,
+ struct page *newpage, struct mem_cgroup **ptr)
{
struct page_cgroup *pc;
struct mem_cgroup *mem = NULL;
+ enum charge_type ctype;
int ret = 0;

if (mem_cgroup_disabled())
@@ -2515,69 +2520,103 @@ int mem_cgroup_prepare_migration(struct
if (PageCgroupUsed(pc)) {
mem = pc->mem_cgroup;
css_get(&mem->css);
+ /* disallow uncharge until the end of migration */
+ SetPageCgroupMigration(pc);
}
unlock_page_cgroup(pc);
+ /*
+ * If the page is uncharged before migration (removed from radix-tree)
+ * we return here.
+ */
+ if (!mem)
+ return 0;

*ptr = mem;
- if (mem) {
- ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, ptr, false);
- css_put(&mem->css);
+ ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, ptr, false);
+ css_put(&mem->css);/* drop extra refcnt */
+ if (ret || *ptr == NULL) {
+ lock_page_cgroup(pc);
+ ClearPageCgroupMigration(pc);
+ unlock_page_cgroup(pc);
+ /*
+ * The old page may be fully unmapped while we kept it.
+ * If file cache, we hold lock on this page and there
+ * is no race.
+ */
+ if (PageAnon(page))
+ mem_cgroup_uncharge_page(page);
+ return -ENOMEM;
+ }
+ /*
+ * The old page is under lock_page().
+ * If the old_page is uncharged and freed during migration, page
+ * migration will fail and newpage will be properly uncharged.
+ * Because we're the only referrer to this newpage, this commit_charge
+ * against newpage never fails.
+ */
+ pc = lookup_page_cgroup(newpage);
+ if (PageAnon(page))
+ ctype = MEM_CGROUP_CHARGE_TYPE_MAPPED;
+ else if (page_is_file_cache(page))
+ ctype = MEM_CGROUP_CHARGE_TYPE_CACHE;
+ else
+ ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM;
+ __mem_cgroup_commit_charge(mem, pc, ctype);
+ if (ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED) {
+ lock_page_cgroup(pc);
+ SetPageCgroupMigration(pc);
+ unlock_page_cgroup(pc);
}
return ret;
}

/* remove redundant charge if migration failed*/
void mem_cgroup_end_migration(struct mem_cgroup *mem,
- struct page *oldpage, struct page *newpage)
+ struct page *oldpage, struct page *newpage)
{
- struct page *target, *unused;
+ struct page *used, *unused;
struct page_cgroup *pc;
- enum charge_type ctype;

if (!mem)
return;
+ /* blocks rmdir() */
cgroup_exclude_rmdir(&mem->css);
/* at migration success, oldpage->mapping is NULL. */
if (oldpage->mapping) {
- target = oldpage;
- unused = NULL;
+ used = oldpage;
+ unused = newpage;
} else {
- target = newpage;
+ used = newpage;
unused = oldpage;
}
-
- if (PageAnon(target))
- ctype = MEM_CGROUP_CHARGE_TYPE_MAPPED;
- else if (page_is_file_cache(target))
- ctype = MEM_CGROUP_CHARGE_TYPE_CACHE;
- else
- ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM;
-
- /* unused page is not on radix-tree now. */
- if (unused)
- __mem_cgroup_uncharge_common(unused, ctype);
-
- pc = lookup_page_cgroup(target);
/*
- * __mem_cgroup_commit_charge() check PCG_USED bit of page_cgroup.
- * So, double-counting is effectively avoided.
+ * We disallowed uncharge of pages under migration because mapcount
+ * of the page goes down to zero, temporarily.
+ * Clear the flag and check whether the page should still be charged.
*/
- __mem_cgroup_commit_charge(mem, pc, ctype);
+ pc = lookup_page_cgroup(unused);
+ lock_page_cgroup(pc);
+ ClearPageCgroupMigration(pc);
+ unlock_page_cgroup(pc);
+ __mem_cgroup_uncharge_common(unused, MEM_CGROUP_CHARGE_TYPE_FORCE);

+ pc = lookup_page_cgroup(used);
+ lock_page_cgroup(pc);
+ ClearPageCgroupMigration(pc);
+ unlock_page_cgroup(pc);
/*
- * Both of oldpage and newpage are still under lock_page().
- * Then, we don't have to care about race in radix-tree.
- * But we have to be careful that this page is unmapped or not.
- *
- * There is a case for !page_mapped(). At the start of
- * migration, oldpage was mapped. But now, it's zapped.
- * But we know *target* page is not freed/reused under us.
- * mem_cgroup_uncharge_page() does all necessary checks.
- */
- if (ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED)
- mem_cgroup_uncharge_page(target);
+ * If the page is file cache, radix-tree replacement is very atomic
+ * and we can skip this check. When it comes to Anon pages, it's
+ * uncharged when mapcount goes down to 0. Because page migration
+ * has to make mapcount go down to 0, we may miss cases such as
+ * migration-failure or really-unmapped-while-migration.
+ * Check it out here.
+ */
+ if (PageAnon(used))
+ mem_cgroup_uncharge_page(used);
/*
- * At migration, we may charge account against cgroup which has no tasks
+ * At migration, we may charge account against cgroup which has no
+ * tasks.
* So, rmdir()->pre_destroy() can be called while we do this charge.
* In that case, we need to call pre_destroy() again. check it here.
*/
Index: mmotm-Apr16/mm/migrate.c
===================================================================
--- mmotm-Apr16.orig/mm/migrate.c
+++ mmotm-Apr16/mm/migrate.c
@@ -581,7 +581,7 @@ static int unmap_and_move(new_page_t get
}

/* charge against new page */
- charge = mem_cgroup_prepare_migration(page, &mem);
+ charge = mem_cgroup_prepare_migration(page, newpage, &mem);
if (charge == -ENOMEM) {
rc = -ENOMEM;
goto unlock;
Index: mmotm-Apr16/include/linux/memcontrol.h
===================================================================
--- mmotm-Apr16.orig/include/linux/memcontrol.h
+++ mmotm-Apr16/include/linux/memcontrol.h
@@ -89,7 +89,8 @@ int mm_match_cgroup(const struct mm_stru
extern struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *mem);

extern int
-mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr);
+mem_cgroup_prepare_migration(struct page *page,
+ struct page *newpage, struct mem_cgroup **ptr);
extern void mem_cgroup_end_migration(struct mem_cgroup *mem,
struct page *oldpage, struct page *newpage);

@@ -228,7 +229,8 @@ static inline struct cgroup_subsys_state
}

static inline int
-mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr)
+mem_cgroup_prepare_migration(struct page *page, struct page *newpage,
+ struct mem_cgroup **ptr)
{
return 0;
}
Index: mmotm-Apr16/include/linux/page_cgroup.h
===================================================================
--- mmotm-Apr16.orig/include/linux/page_cgroup.h
+++ mmotm-Apr16/include/linux/page_cgroup.h
@@ -40,6 +40,7 @@ enum {
PCG_USED, /* this object is in use. */
PCG_ACCT_LRU, /* page has been accounted for */
PCG_FILE_MAPPED, /* page is accounted as "mapped" */
+ PCG_MIGRATION, /* under page migration */
};

#define TESTPCGFLAG(uname, lname) \
@@ -79,6 +80,10 @@ SETPCGFLAG(FileMapped, FILE_MAPPED)
CLEARPCGFLAG(FileMapped, FILE_MAPPED)
TESTPCGFLAG(FileMapped, FILE_MAPPED)

+SETPCGFLAG(Migration, MIGRATION)
+CLEARPCGFLAG(Migration, MIGRATION)
+TESTPCGFLAG(Migration, MIGRATION)
+
static inline int page_cgroup_nid(struct page_cgroup *pc)
{
return page_to_nid(pc->page);
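
For readers unfamiliar with these flag-accessor generators: SETPCGFLAG(Migration,
MIGRATION) expands to roughly the following (a sketch of what the existing
page_cgroup.h macros produce, shown here only for clarity):

	static inline void SetPageCgroupMigration(struct page_cgroup *pc)
	{
		set_bit(PCG_MIGRATION, &pc->flags);	/* atomic bit set */
	}

ClearPageCgroupMigration() and PageCgroupMigration() are the analogous
clear_bit()/test_bit() wrappers.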

2010-04-16 16:14:35

by Christoph Lameter

[permalink] [raw]
Subject: Interleave policy on 2M pages (was Re: [RFC][BUGFIX][PATCH 1/2] memcg: fix charge bypass route of migration)

On Thu, 15 Apr 2010, Andrea Arcangeli wrote:

> 2) add alloc_pages_vma for numa awareness in the huge page faults

How do interleave policies work with alloc_pages_vma? So far the semantics
is to spread 4k pages over different nodes. With 2M pages this can no
longer work the way it was.

2010-04-16 17:52:09

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: Interleave policy on 2M pages (was Re: [RFC][BUGFIX][PATCH 1/2] memcg: fix charge bypass route of migration)

On Fri, Apr 16, 2010 at 11:13:10AM -0500, Christoph Lameter wrote:
> On Thu, 15 Apr 2010, Andrea Arcangeli wrote:
>
> > 2) add alloc_pages_vma for numa awareness in the huge page faults
>
> How do interleave policies work with alloc_pages_vma? So far the semantics
> is to spread 4k pages over different nodes. With 2M pages this can no
> longer work the way is was.

static struct page *alloc_page_interleave(gfp_t gfp, unsigned order,
unsigned nid)

See the order parameter, so I hope it's already solved. I assume the
idea would be to interleave 2M pages to avoid the CPU and memory
overhead of the pte layer and to decrease TLB misses, but still
max out the bandwidth of the system when multiple threads access
memory that is stored in different nodes with random access. It should
be ideal for hugetlbfs too, for the large shared memory pools of the
DB. Surely it'll be better than having all hugepages from the same
node despite MPOL_INTERLEAVE being set.

Said that, it'd also be possible to disable hugepages if the vma has
MPOL_INTERLEAVE set, but I doubt we want to do that by default. Maybe
we can add a sysfs control later for that which can be further tweaked
at boot time by per-arch quirks, dunno... It's really up to you, you
know numa better, but I've no doubt that MPOL_INTERLEAVE also can make
sense with hugepages (both hugetlbfs and transparent hugepage
support).
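
For illustration only (hypothetical helper, not actual kernel code), with the
order parameter the interleave step just advances once per huge page instead
of once per 4k page, something like:

	/* hypothetical sketch: round-robin nodes per 2M allocation;
	 * the real interleave state is kept per-task in the policy */
	static unsigned int next_nid;

	static struct page *interleave_huge_page(gfp_t gfp)
	{
		unsigned int nid = next_nid++ % num_online_nodes();

		return alloc_page_interleave(gfp, HPAGE_PMD_ORDER, nid);
	}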

Thanks,
Andrea

2010-04-19 03:45:25

by Daisuke Nishimura

[permalink] [raw]
Subject: Re: [RFC][BUGFIX][PATCH 2/2] memcg: fix file mapped underflow at migration (v3)

Hmm, before going further, will you explain why we need a new PCG_MIGRATION flag ?
What's the problem of v2 ?

Thanks,
Daisuke Nishimura.


2010-04-19 04:22:21

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [RFC][BUGFIX][PATCH 2/2] memcg: fix file mapped underflow at migration (v3)

On Mon, 19 Apr 2010 12:42:25 +0900
Daisuke Nishimura <[email protected]> wrote:

> Hmm, before going further, will you explain why we need a new PCG_MIGRATION flag ?
> What's the problem of v2 ?
>

v2 can't handle the migration-failure case of a freed swapcache, or the case where
the used page was swapped out, I think.

Every "page" in the following is ANON.


mem_cgroup_prepare_migration()
charge against new page.

try_to_unmap()
-> mapcount goes down to 0.
-> the old page is uncharged

move_to_new_page()
-> may fail. (in some case.) ----(*1)

remap the old page to pte.

mem_cgroup_end_migration()
(at success *1)
check charge for newpage is valid or not (*2)

(at fail *1)
uncharge new page.
What should we do for the old page? ---(*3)

At (*2) and (*3), there are several cases.

(*2) migration succeeded.
1. The new page was successfully remapped.
-> Nothing to do.
2. The new page was remapped but finally unmapped before (*3)
-> page_remove_rmap() will catch the event.
3. The new page was not remapped.
-> page_remove_rmap() can't catch the event. end_migration() has to
uncharge it.

(*3) migration failed.
1. The old page was successfully remapped.
-> We have to recharge against the old page. (But it may hit OOM.)
2. The old page wasn't remapped.
-> mapcount is 0. No new charge will happen.
3. The old page wasn't remapped but SwapCache.
-> mapcount is 0. We have to recharge against the old page (But it may hit OOM)

Maybe other sequences I couldn't write down exist......IMHO, "we have to recharge it because
it's uncharged.." is a bad idea. It seems hard to maintain..


When we use the MIGRATION flag, after migration:

1. Against the new page, we remove the MIGRATION flag and try to uncharge() it again.

2. Against the old page, we remove the MIGRATION flag and try to uncharge it again.

NOTE: I noticed my v3 patch is buggy in the page-is-swapped-out case. It seems
mem_cgroup_uncharge_swapcache() has to wait until migration ends, or needs some
other case handling. (Anyway, this race exists only after unlock_page(newpage).
So, spinning until MIGRATION ends will not be very bad.)


To me, this is much simpler than now, where we have to know what kind of magic is going on behind us...

Maybe I can think of other tricks for handling them...but using a FLAG and preventing uncharge
is the simplest, I think.
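
In code terms, the end of migration then boils down to something like this
(condensed from the v3 patch earlier in this thread, not the literal function):

	/* which page survived the migration attempt? */
	used   = oldpage->mapping ? oldpage : newpage;
	unused = oldpage->mapping ? newpage : oldpage;

	/* in the real code the flag is cleared under lock_page_cgroup() */
	ClearPageCgroupMigration(lookup_page_cgroup(unused));
	__mem_cgroup_uncharge_common(unused, MEM_CGROUP_CHARGE_TYPE_FORCE);

	ClearPageCgroupMigration(lookup_page_cgroup(used));
	if (PageAnon(used))
		/* drops the charge only if the page is really unmapped now */
		mem_cgroup_uncharge_page(used);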


Thanks,
-Kame




2010-04-19 08:13:18

by Daisuke Nishimura

[permalink] [raw]
Subject: Re: [RFC][BUGFIX][PATCH 2/2] memcg: fix file mapped underflow at migration (v3)

Thank you for explaining in detail.

On Mon, 19 Apr 2010 13:18:17 +0900, KAMEZAWA Hiroyuki <[email protected]> wrote:
> On Mon, 19 Apr 2010 12:42:25 +0900
> Daisuke Nishimura <[email protected]> wrote:
>
> > Hmm, before going further, will you explain why we need a new PCG_MIGRATION flag ?
> > What's the problem of v2 ?
> >
>
> v2 can't handle the migration-failure case of a freed swapcache, or the case where
> the used page was swapped out, I think.
>
> Every "page" in the following is ANON.
>
>
> mem_cgroup_prepare_migration()
> charge against new page.
>
> try_to_unmap()
> -> mapcount goes down to 0.
> -> the old page is uncharged
>
But the old page isn't uncharged iff PageSwapCache, is it?

> move_to_new_page()
> -> may fail. (in some case.) ----(*1)
>
> remap the old page to pte.
>
> mem_cgroup_end_migration()
> (at success *1)
> check charge for newpage is valid or not (*2)
>
> (at fail *1)
> uncharge new page.
> What should we do for the old page? ---(*3)
>
> At (*2) and (*3), there are several cases.
>
> (*2) migration succeeded.
> 1. The new page was successfully remapped.
> -> Nothing to do.
> 2. The new page was remapped but finally unmapped before (*3)
> -> page_remove_rmap() will catch the event.
> 3. The new page was not remapped.
> -> page_remove_rmap() can't catch the event. end_migration() has to
> uncharge it.
>
> (*3) migration failed.
> 1. The old page was successfully remapped.
> -> We have to recharge against the old page. (But it may hit OOM.)
> 2. The old page wasn't remapped.
> -> mapcount is 0. No new charge will happen.
> 3. The old page wasn't remapped but SwapCache.
> -> mapcount is 0. We have to recharge against the old page (But it may hit OOM)
>
hmm, we've done try_charge at this point, so how can we cause OOM here?

> Maybe other sequences I couldn't write down exist......IMHO, "we have to recharge it because
> it's uncharged.." is a bad idea. It seems hard to maintain..
>
>
> When we use the MIGRATION flag, after migration:
>
> 1. Against the new page, we remove the MIGRATION flag and try to uncharge() it again.
>
> 2. Against the old page, we remove the MIGRATION flag and try to uncharge it again.
>
> NOTE: I noticed my v3 patch is buggy in the page-is-swapped-out case. It seems
> mem_cgroup_uncharge_swapcache() has to wait until migration ends, or needs some
> other case handling. (Anyway, this race exists only after unlock_page(newpage).
> So, spinning until MIGRATION ends will not be very bad.)
>
>
> To me, this is much simpler than now, where we have to know what kind of magic is going on behind us...
>
> Maybe I can think of other tricks for handling them...but using a FLAG and preventing uncharge
> is the simplest, I think.
>
Anyway, I agree that the current implementation is complicated and there might be
some cases we are missing. The MIGRATION flag can make it simpler.

I have one concern for now. Reading the patch, the flag has influence
only on anonymous pages, so we'd better note that, and I feel it is strange to
set (and clear) the flag of the "old page" always (iow, even when !PageAnon)
in prepare_migration.


Thanks,
Daisuke Nishimura.

2010-04-19 08:30:30

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [RFC][BUGFIX][PATCH 2/2] memcg: fix file mapped underflow at migration (v3)

On Mon, 19 Apr 2010 17:07:01 +0900
Daisuke Nishimura <[email protected]> wrote:

> Thank you for explaining in detail.
>
> On Mon, 19 Apr 2010 13:18:17 +0900, KAMEZAWA Hiroyuki <[email protected]> wrote:
> > On Mon, 19 Apr 2010 12:42:25 +0900
> > Daisuke Nishimura <[email protected]> wrote:
> >
> > > Hmm, before going further, will you explain why we need a new PCG_MIGRATION flag ?
> > > What's the problem of v2 ?
> > >
> >
> > v2 can't handle the migration-failure case of a freed swapcache, or the case where
> > the used page was swapped out, I think.
> >
> > Every "page" in the following is ANON.
> >
> >
> > mem_cgroup_prepare_migration()
> > charge against new page.
> >
> > try_to_unmap()
> > -> mapcount goes down to 0.
> > -> the old page is uncharged
> >
> But the old page isn't uncharged iff PageSwapCache, is it?
>
yes.


> > move_to_new_page()
> > -> may fail. (in some case.) ----(*1)
> >
> > remap the old page to pte.
> >
> > mem_cgroup_end_migration()
> > (at success *1)
> > check charge for newpage is valid or not (*2)
> >
> > (at fail *1)
> > uncharge new page.
> > What should we do for the old page? ---(*3)
> >
> > At (*2) and (*3), there are several cases.
> >
> > (*2) migration succeeded.
> > 1. The new page was successfully remapped.
> > -> Nothing to do.
> > 2. The new page was remapped but finally unmapped before (*3)
> > -> page_remove_rmap() will catch the event.
> > 3. The new page was not remapped.
> > -> page_remove_rmap() can't catch the event. end_migration() has to
> > uncharge it.
> >
> > (*3) migration failed.
> > 1. The old page was successfully remapped.
> > -> We have to recharge against the old page. (But it may hit OOM.)
> > 2. The old page wasn't remapped.
> > -> mapcount is 0. No new charge will happen.
> > 3. The old page wasn't remapped but SwapCache.
> > -> mapcount is 0. We have to recharge against the old page (But it may hit OOM)
> >
> hmm, we've done try_charge at this point, so how can we cause OOM here?
>

v2 doesn't charge. That was the bug.
"may hit OOM" is an explanation of why the current implementation is used.
(current implementation == delayed commit charge.)


> > [snip]
> >
> > To me, this is much simpler than now, where we have to know what kind of magic is going on behind us...
> >
> > Maybe I can think of other tricks for handling them...but using a FLAG and preventing uncharge
> > is the simplest, I think.
> >
> Anyway, I agree that the current implementation is complicated and there might be
> some cases we are missing. The MIGRATION flag can make it simpler.
>
I think so.

> I have one concern for now. Reading the patch, the flag has influence
> only on anonymous pages, so we'd better note that, and I feel it is strange to
> set (and clear) the flag of the "old page" always (iow, even when !PageAnon)
> in prepare_migration.
>

Hmm...Checking "Only Anon" is simpler ? It will have no meaning for migrating
file caches, but it may have some value for easy debugging.
I think "mark it always but it's used only for anonymous page" is reasonable
(if it causes no bug.)


Thanks,
-Kame



2010-04-20 04:23:47

by Daisuke Nishimura

[permalink] [raw]
Subject: Re: [RFC][BUGFIX][PATCH 2/2] memcg: fix file mapped underflow at migration (v3)

> > I have one concern for now. Reading the patch, the flag has influence
> > only on anonymous pages, so we'd better note that, and I feel it is strange to
> > set (and clear) the flag of the "old page" always (iow, even when !PageAnon)
> > in prepare_migration.
> >
>
> Hmm...Checking "Only Anon" is simpler ?
I just thought it was inconsistent that we always set/clear the bit about "old page",
while we set the bit about "new page" only in PageAnon case.

> It will have no meaning for migrating
> file caches, but it may have some value for easy debugging.
> I think "mark it always but it's used only for anonymous page" is reasonable
> (if it causes no bug.)
>
Anyway, I don't have any strong objection.
It's all right for me as long as it is well documented or commented.


Thanks,
Daisuke Nishimura.

2010-04-20 04:30:18

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [RFC][BUGFIX][PATCH 2/2] memcg: fix file mapped underflow at migration (v3)

On Tue, 20 Apr 2010 13:20:50 +0900
Daisuke Nishimura <[email protected]> wrote:

> > > I have one concern for now. Reading the patch, the flag has influence
> > > only on anonymous pages, so we'd better note that, and I feel it is strange to
> > > set (and clear) the flag of the "old page" always (iow, even when !PageAnon)
> > > in prepare_migration.
> > >
> >
> > Hmm...Checking "Only Anon" is simpler ?
> I just thought it was inconsistent that we always set/clear the bit about "old page",
> while we set the bit about "new page" only in PageAnon case.
>
Ok, I'll look into it again.


> > It will have no meaning for migrating
> > file caches, but it may have some value for easy debugging.
> > I think "mark it always but it's used only for anonymous page" is reasonable
> > (if it causes no bug.)
> >
> Anyway, I don't have any strong objection.
> It's all right for me as long as it is well documented or commented.
>
Maybe I can post v4 today.

Thanks,
-Kame

2010-04-20 09:23:29

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [RFC][BUGFIX][PATCH 2/2] memcg: fix file mapped underflow at migration (v3)

On Tue, 20 Apr 2010 13:20:50 +0900
Daisuke Nishimura <[email protected]> wrote:

> > It will have no meaning for migrating
> > file caches, but it may have some value for easy debugging.
> > I think "mark it always, but it's used only for anonymous pages" is reasonable
> > (if it causes no bug).
> >
> Anyway, I don't have any strong objection.
> It's all right for me as long as it is well documented or commented.
>
Okay, before posting as v4, here is a draft version.

I'll check again.
==

At page migration, the new page is unlocked before end_migration() is called.
memcg's page-migration code doesn't take this into account (it worked, but...).
This leaves a small race window that we need to fix: because of it,
FILE_MAPPED of a migrated file cache is currently not updated properly.

This patch fixes the race by changing the algorithm.

When migrating a mapped file, events happen in the following sequence.

1. allocate a new page.
2. get the memcg of the old page.
3. charge against the new page before migration. But at this point,
no changes are made to the new page's page_cgroup; there is no commit-charge.
4. page migration replaces the old page with the new page in the radix-tree.
5. page migration remaps the new page if the old page was mapped.
6. Here, the new page is unlocked.
7. memcg commits the charge for the newpage, marking its page_cgroup as USED.

Because "commit" happens after page-remap, we can count FILE_MAPPED
at "5", we can lose file_mapped accounting information within this
small race. FILE_MAPPED is updated only when mapcount changes 0->1.
So, if we don't catch this 0->1 event, we can underflow FILE_MAPPED
at 1->0 event.
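
For reference, the accounting this refers to looks roughly like the
following (a condensed sketch of the rmap/memcg code of this era, not the
exact source; zone-stat updates and error checks are elided). Note the
USED-bit test in the updater: while the new page's page_cgroup is still
uncommitted, the increment at "5" is silently dropped.

/* FILE_MAPPED is counted only on the mapcount 0->1 transition... */
void page_add_file_rmap(struct page *page)
{
	/* _mapcount starts at -1, so this is true only for 0->1 */
	if (atomic_inc_and_test(&page->_mapcount))
		mem_cgroup_update_file_mapped(page, 1);
}

void mem_cgroup_update_file_mapped(struct page *page, int val)
{
	struct page_cgroup *pc = lookup_page_cgroup(page);

	lock_page_cgroup(pc);
	/* ...but an uncommitted (not yet USED) pc makes this a no-op,
	 * which is exactly how the new page's 0->1 event gets lost */
	if (pc->mem_cgroup && PageCgroupUsed(pc)) {
		/* adjust the memcg's FILE_MAPPED counter by val and
		 * set/clear PCG_FILE_MAPPED accordingly */
	}
	unlock_page_cgroup(pc);
}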

We might be able to avoid the underflow by some small trick, but we should
really catch the mapcount 0->1 event. To catch it, we have to mark the new
page's page_cgroup as "USED" before the remap.

BTW, historically, the above implementation comes from handling migration
failure of anonymous pages. Because we charge both the old page and the new
page with mapcount==0, we can't catch
- the page being really freed before remap,
- migration failing but the page being freed before remap,
or other corner cases.

To fix all of these, this patch changes prepare/end migration.

New migration sequence with memcg is:

1. allocate a new page.
2. mark PageCgroupMigration on the old page.

While a page_cgroup is marked as PageCgroupMigration, the page is not
uncharged until the flag is cleared.

3. charge against the new page onto the old page's memcg. (here, the new
page's pc is marked as PageCgroupUsed.)
4. page migration replaces the radix-tree entry, page tables, etc.
5. At remapping, the new page's page_cgroup is already marked as "USED".
We can catch the 0->1 event and FILE_MAPPED will be properly updated.
And the SWAPOUT event after unlock, or this page being freed by unmap(),
can be caught as well.

6. Clear PageCgroupMigration of the old page.
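
Put together, the resulting order in unmap_and_move() is roughly as
follows (a sketch only; locking details, rcu and error paths are elided,
see the mm/migrate.c hunk below for the real call):

lock_page(page);
/* steps 2-3: mark the old page's pc as MIGRATION (Anon only), charge
 * the new page against the old page's memcg and mark its pc USED */
charge = mem_cgroup_prepare_migration(page, newpage, &mem);
try_to_unmap(page, TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
rc = move_to_new_page(newpage, page);	/* steps 4-5: replace + remap */
if (rc)
	remove_migration_ptes(page, page);	/* failure: remap old page */
/* step 6: clear MIGRATION, uncharge whichever page ends up unused */
mem_cgroup_end_migration(mem, page, newpage);
unlock_page(page);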

By this:
At migration success:
- The new page is properly charged. If it is not mapped after the remap,
uncharge() will be called.
- The file cache is properly charged, and the FILE_MAPPED event can be caught.

At migration failure of Anon:
- The old page stays charged. If it is not mapped after the remap,
uncharge() will be called. The corner case is SWAPOUT, but the page is locked
while migration is in progress, so we have no race with it.

Then, what is the MIGRATION flag for?
Without it, at migration failure, we might have to charge the old page again
because it may have been fully unmapped. "Charge" means that we may have to
dive into memory reclaim or something similarly complicated, so it's better
to avoid charging it again. Before this patch, __commit_charge() worked on
both the old and the new page and fixed everything up, but that technique
had some racy conditions around FILE_MAPPED, SWAPOUT, etc.
Now, the kernel uses the MIGRATION flag and doesn't uncharge the old page
until the end of migration.


I hope this change will make memcg's page migration much simpler.
This page migration code has caused several troubles; it's worth adding
a flag for simplification.

Changelog: 2010/04/20
- fixed SWAPOUT case.
- added explanatory text.
- removed setting the MIGRATION flag on the new page.

Changelog: 2010/04/15
- updated against mm-of-the-moment snapshot 2010-04-15-14-42
+ Nishimura's fix. memcg-fix-prepare-migration-fix.patch
- fixed some typos.
- handle migration failure of anon page.

Changelog: 2010/04/14
- updated onto the latest mmotm + page-compaction, etc...
- fixed __try_charge() bypass case.
- use CHARGE_TYPE_FORCE for uncharging an unused page.

Reported-by: Daisuke Nishimura <[email protected]>
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
---
include/linux/memcontrol.h | 6 +
include/linux/page_cgroup.h | 5 +
mm/memcontrol.c | 145 +++++++++++++++++++++++++++++++-------------
mm/migrate.c | 2
4 files changed, 115 insertions(+), 43 deletions(-)

Index: linux-2.6.34-rc4-mm1/mm/memcontrol.c
===================================================================
--- linux-2.6.34-rc4-mm1.orig/mm/memcontrol.c
+++ linux-2.6.34-rc4-mm1/mm/memcontrol.c
@@ -2278,7 +2278,8 @@ __mem_cgroup_uncharge_common(struct page
switch (ctype) {
case MEM_CGROUP_CHARGE_TYPE_MAPPED:
case MEM_CGROUP_CHARGE_TYPE_DROP:
- if (page_mapped(page))
+ /* See mem_cgroup_prepare_migration() */
+ if (page_mapped(page) || PageCgroupMigration(pc))
goto unlock_out;
break;
case MEM_CGROUP_CHARGE_TYPE_SWAPOUT:
@@ -2501,10 +2502,12 @@ static inline int mem_cgroup_move_swap_a
* Before starting migration, account PAGE_SIZE to mem_cgroup that the old
* page belongs to.
*/
-int mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr)
+int mem_cgroup_prepare_migration(struct page *page,
+ struct page *newpage, struct mem_cgroup **ptr)
{
struct page_cgroup *pc;
struct mem_cgroup *mem = NULL;
+ enum charge_type ctype;
int ret = 0;

if (mem_cgroup_disabled())
@@ -2515,69 +2518,131 @@ int mem_cgroup_prepare_migration(struct
if (PageCgroupUsed(pc)) {
mem = pc->mem_cgroup;
css_get(&mem->css);
+ /*
+ * When migrating an anonymous page, its mapcount goes down
+ * to 0 and uncharge() will be called. But, even if it's fully
+ * unmapped, migration may fail and this page has to be
+ * charged again. We set the MIGRATION flag here and delay
+ * uncharge until end_migration() is called.
+ *
+ * Corner Case Thinking
+ * A)
+ * When the old page was mapped as Anon and it's fully unmapped
+ * and freed while migration is ongoing, it's not uncharged until
+ * end_migration(). Both the old and the new page will be uncharged
+ * at end_migration() because neither is mapped nor SwapCache.
+ *
+ * B)
+ * When the old page was mapped but migration fails, the kernel
+ * remaps it. The charge for it is kept by the MIGRATION flag even
+ * if the mapcount goes down to 0, so we can remap it successfully
+ * without charging it again.
+ * If the kernel doesn't remap it because it's fully unmapped,
+ * we catch that at end_migration(); no new charge happens
+ * there.
+ *
+ * C)
+ * The "old" page is under lock_page() until the end of
+ * migration, so the old page itself will not be swapped out.
+ * But the new page can be.
+ * etc...
+ */
+ if (PageAnon(page))
+ SetPageCgroupMigration(pc);
}
unlock_page_cgroup(pc);
+ /*
+ * If the page is not charged at this point,
+ * we return here.
+ */
+ if (!mem)
+ return 0;

*ptr = mem;
- if (mem) {
- ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, ptr, false);
- css_put(&mem->css);
+ ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, ptr, false);
+ css_put(&mem->css);/* drop extra refcnt */
+ if (ret || *ptr == NULL) {
+ if (PageAnon(page)) {
+ lock_page_cgroup(pc);
+ ClearPageCgroupMigration(pc);
+ unlock_page_cgroup(pc);
+ /*
+ * The old page may have been fully unmapped while we kept it charged.
+ */
+ mem_cgroup_uncharge_page(page);
+ }
+ return -ENOMEM;
}
+ /*
+ * We charge the new page before it's mapped. So, even if unlock_page()
+ * is called far before end_migration(), we can catch all events on
+ * this new page. In case the new page is migrated but not remapped,
+ * its mapcount will finally be 0 and we call uncharge in
+ * end_migration().
+ */
+ pc = lookup_page_cgroup(newpage);
+ if (PageAnon(page))
+ ctype = MEM_CGROUP_CHARGE_TYPE_MAPPED;
+ else if (page_is_file_cache(page))
+ ctype = MEM_CGROUP_CHARGE_TYPE_CACHE;
+ else
+ ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM;
+ __mem_cgroup_commit_charge(mem, pc, ctype);
return ret;
}

/* remove redundant charge if migration failed*/
void mem_cgroup_end_migration(struct mem_cgroup *mem,
- struct page *oldpage, struct page *newpage)
+ struct page *oldpage, struct page *newpage)
{
- struct page *target, *unused;
+ struct page *used, *unused;
struct page_cgroup *pc;
- enum charge_type ctype;

if (!mem)
return;
+ /* blocks rmdir() */
cgroup_exclude_rmdir(&mem->css);
/* at migration success, oldpage->mapping is NULL. */
if (oldpage->mapping) {
- target = oldpage;
- unused = NULL;
+ used = oldpage;
+ unused = newpage;
} else {
- target = newpage;
+ used = newpage;
unused = oldpage;
}
-
- if (PageAnon(target))
- ctype = MEM_CGROUP_CHARGE_TYPE_MAPPED;
- else if (page_is_file_cache(target))
- ctype = MEM_CGROUP_CHARGE_TYPE_CACHE;
- else
- ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM;
-
- /* unused page is not on radix-tree now. */
- if (unused)
- __mem_cgroup_uncharge_common(unused, ctype);
-
- pc = lookup_page_cgroup(target);
/*
- * __mem_cgroup_commit_charge() check PCG_USED bit of page_cgroup.
- * So, double-counting is effectively avoided.
+ * We disallowed uncharge of pages under migration because the
+ * mapcount of the page goes down to zero, temporarily.
+ * Clear the flag and check whether the page should still be charged.
*/
- __mem_cgroup_commit_charge(mem, pc, ctype);
-
+ pc = lookup_page_cgroup(unused);
+ /* This flag itself is not racy, so, check it before lock */
+ if (PageCgroupMigration(pc)) {
+ lock_page_cgroup(pc);
+ ClearPageCgroupMigration(pc);
+ unlock_page_cgroup(pc);
+ }
+ __mem_cgroup_uncharge_common(unused, MEM_CGROUP_CHARGE_TYPE_FORCE);
+
+ pc = lookup_page_cgroup(used);
+ if (PageCgroupMigration(pc)) {
+ lock_page_cgroup(pc);
+ ClearPageCgroupMigration(pc);
+ unlock_page_cgroup(pc);
+ }
/*
- * Both of oldpage and newpage are still under lock_page().
- * Then, we don't have to care about race in radix-tree.
- * But we have to be careful that this page is unmapped or not.
- *
- * There is a case for !page_mapped(). At the start of
- * migration, oldpage was mapped. But now, it's zapped.
- * But we know *target* page is not freed/reused under us.
- * mem_cgroup_uncharge_page() does all necessary checks.
- */
- if (ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED)
- mem_cgroup_uncharge_page(target);
+ * If a page is a file cache, the radix-tree replacement is atomic
+ * and we can skip this check. When it was an Anon page, its mapcount
+ * went down to 0, but because we added the MIGRATION flag, it's not
+ * uncharged yet. There are several cases, but the page->mapcount check
+ * and the USED bit check in mem_cgroup_uncharge_page() will do enough
+ * checking. (see mem_cgroup_prepare_migration() also)
+ */
+ if (PageAnon(used))
+ mem_cgroup_uncharge_page(used);
/*
- * At migration, we may charge account against cgroup which has no tasks
+ * At migration, we may charge account against cgroup which has no
+ * tasks.
* So, rmdir()->pre_destroy() can be called while we do this charge.
* In that case, we need to call pre_destroy() again. check it here.
*/
Index: linux-2.6.34-rc4-mm1/mm/migrate.c
===================================================================
--- linux-2.6.34-rc4-mm1.orig/mm/migrate.c
+++ linux-2.6.34-rc4-mm1/mm/migrate.c
@@ -581,7 +581,7 @@ static int unmap_and_move(new_page_t get
}

/* charge against new page */
- charge = mem_cgroup_prepare_migration(page, &mem);
+ charge = mem_cgroup_prepare_migration(page, newpage, &mem);
if (charge == -ENOMEM) {
rc = -ENOMEM;
goto unlock;
Index: linux-2.6.34-rc4-mm1/include/linux/memcontrol.h
===================================================================
--- linux-2.6.34-rc4-mm1.orig/include/linux/memcontrol.h
+++ linux-2.6.34-rc4-mm1/include/linux/memcontrol.h
@@ -89,7 +89,8 @@ int mm_match_cgroup(const struct mm_stru
extern struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *mem);

extern int
-mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr);
+mem_cgroup_prepare_migration(struct page *page,
+ struct page *newpage, struct mem_cgroup **ptr);
extern void mem_cgroup_end_migration(struct mem_cgroup *mem,
struct page *oldpage, struct page *newpage);

@@ -228,7 +229,8 @@ static inline struct cgroup_subsys_state
}

static inline int
-mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr)
+mem_cgroup_prepare_migration(struct page *page, struct page *newpage,
+ struct mem_cgroup **ptr)
{
return 0;
}
Index: linux-2.6.34-rc4-mm1/include/linux/page_cgroup.h
===================================================================
--- linux-2.6.34-rc4-mm1.orig/include/linux/page_cgroup.h
+++ linux-2.6.34-rc4-mm1/include/linux/page_cgroup.h
@@ -40,6 +40,7 @@ enum {
PCG_USED, /* this object is in use. */
PCG_ACCT_LRU, /* page has been accounted for */
PCG_FILE_MAPPED, /* page is accounted as "mapped" */
+ PCG_MIGRATION, /* under page migration */
};

#define TESTPCGFLAG(uname, lname) \
@@ -79,6 +80,10 @@ SETPCGFLAG(FileMapped, FILE_MAPPED)
CLEARPCGFLAG(FileMapped, FILE_MAPPED)
TESTPCGFLAG(FileMapped, FILE_MAPPED)

+SETPCGFLAG(Migration, MIGRATION)
+CLEARPCGFLAG(Migration, MIGRATION)
+TESTPCGFLAG(Migration, MIGRATION)
+
static inline int page_cgroup_nid(struct page_cgroup *pc)
{
return page_to_nid(pc->page);

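For readers unfamiliar with these macros: the SETPCGFLAG/CLEARPCGFLAG/
TESTPCGFLAG lines in the hunk above generate the accessors used in
memcontrol.c, roughly like this (a sketch of the expansion):

static inline void SetPageCgroupMigration(struct page_cgroup *pc)
{ set_bit(PCG_MIGRATION, &pc->flags); }

static inline void ClearPageCgroupMigration(struct page_cgroup *pc)
{ clear_bit(PCG_MIGRATION, &pc->flags); }

static inline int PageCgroupMigration(struct page_cgroup *pc)
{ return test_bit(PCG_MIGRATION, &pc->flags); }
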
2010-04-23 08:13:08

by Daisuke Nishimura

[permalink] [raw]
Subject: Re: [RFC][BUGFIX][PATCH 2/2] memcg: fix file mapped underflow at migration (v3)

I'm sorry for my late reply.

On Tue, 20 Apr 2010 18:19:25 +0900, KAMEZAWA Hiroyuki <[email protected]> wrote:
> On Tue, 20 Apr 2010 13:20:50 +0900
> Daisuke Nishimura <[email protected]> wrote:
>
> > > It will have no meaning for migrating
> > > file caches, but it may have some value for easy debugging.
> > > I think "mark it always, but it's used only for anonymous pages" is reasonable
> > > (if it causes no bug).
> > >
> > Anyway, I don't have any strong objection.
> > It's all right for me as long as it is well documented or commented.
> >
> Okay, before posting as v4, here is draft version.
>
Thank you for adding good comments about what this does and why we need it.
I like the direction of setting the MIGRATION flag only on the old page.
And this patch looks good to me, except that checkpatch warns about some
indentation problems :)

I have one question.

> /* remove redundant charge if migration failed*/
> void mem_cgroup_end_migration(struct mem_cgroup *mem,
> - struct page *oldpage, struct page *newpage)
> + struct page *oldpage, struct page *newpage)
> {
> - struct page *target, *unused;
> + struct page *used, *unused;
> struct page_cgroup *pc;
> - enum charge_type ctype;
>
> if (!mem)
> return;
> + /* blocks rmdir() */
> cgroup_exclude_rmdir(&mem->css);
> /* at migration success, oldpage->mapping is NULL. */
> if (oldpage->mapping) {
> - target = oldpage;
> - unused = NULL;
> + used = oldpage;
> + unused = newpage;
> } else {
> - target = newpage;
> + used = newpage;
> unused = oldpage;
> }
> -
> - if (PageAnon(target))
> - ctype = MEM_CGROUP_CHARGE_TYPE_MAPPED;
> - else if (page_is_file_cache(target))
> - ctype = MEM_CGROUP_CHARGE_TYPE_CACHE;
> - else
> - ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM;
> -
> - /* unused page is not on radix-tree now. */
> - if (unused)
> - __mem_cgroup_uncharge_common(unused, ctype);
> -
> - pc = lookup_page_cgroup(target);
> /*
> - * __mem_cgroup_commit_charge() check PCG_USED bit of page_cgroup.
> - * So, double-counting is effectively avoided.
> + * We disallowed uncharge of pages under migration because the
> + * mapcount of the page goes down to zero, temporarily.
> + * Clear the flag and check whether the page should still be charged.
> */
> - __mem_cgroup_commit_charge(mem, pc, ctype);
> -
> + pc = lookup_page_cgroup(unused);
> + /* This flag itself is not racy, so, check it before lock */
> + if (PageCgroupMigration(pc)) {
> + lock_page_cgroup(pc);
> + ClearPageCgroupMigration(pc);
> + unlock_page_cgroup(pc);
> + }
Is the reason why "this flag itself is not racy" that we update the flag only
while the page is isolated?
Then we don't need the page_cgroup lock, do we? The PCG_USED bit will avoid
double-uncharge.

Thanks,
Daisuke Nishimura.

2010-04-23 08:27:44

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [RFC][BUGFIX][PATCH 2/2] memcg: fix file mapped underflow at migration (v3)

On Fri, 23 Apr 2010 17:08:46 +0900
Daisuke Nishimura <[email protected]> wrote:

> I'm sorry for my late reply.
>
> On Tue, 20 Apr 2010 18:19:25 +0900, KAMEZAWA Hiroyuki <[email protected]> wrote:
> > On Tue, 20 Apr 2010 13:20:50 +0900
> > Daisuke Nishimura <[email protected]> wrote:
> >
> > > > It will have no meaning for migrating
> > > > file caches, but it may have some value for easy debugging.
> > > > I think "mark it always, but it's used only for anonymous pages" is reasonable
> > > > (if it causes no bug).
> > > >
> > > Anyway, I don't have any strong objection.
> > > It's all right for me as long as it is well documented or commented.
> > >
> > Okay, before posting as v4, here is draft version.
> >
> Thank you for adding good comments about what this does and why we need it.
> I like the direction of setting the MIGRATION flag only on the old page.
> And this patch looks good to me, except that checkpatch warns about some
> indentation problems :)
>
(--;

I'm sorry that this patch is delayed. I have to fix migration itself in order
to test this. I'd like to post it before the long holidays next week.

> I have one question.
>
> > /* remove redundant charge if migration failed*/
> > void mem_cgroup_end_migration(struct mem_cgroup *mem,
> > - struct page *oldpage, struct page *newpage)
> > + struct page *oldpage, struct page *newpage)
> > {
> > - struct page *target, *unused;
> > + struct page *used, *unused;
> > struct page_cgroup *pc;
> > - enum charge_type ctype;
> >
> > if (!mem)
> > return;
> > + /* blocks rmdir() */
> > cgroup_exclude_rmdir(&mem->css);
> > /* at migration success, oldpage->mapping is NULL. */
> > if (oldpage->mapping) {
> > - target = oldpage;
> > - unused = NULL;
> > + used = oldpage;
> > + unused = newpage;
> > } else {
> > - target = newpage;
> > + used = newpage;
> > unused = oldpage;
> > }
> > -
> > - if (PageAnon(target))
> > - ctype = MEM_CGROUP_CHARGE_TYPE_MAPPED;
> > - else if (page_is_file_cache(target))
> > - ctype = MEM_CGROUP_CHARGE_TYPE_CACHE;
> > - else
> > - ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM;
> > -
> > - /* unused page is not on radix-tree now. */
> > - if (unused)
> > - __mem_cgroup_uncharge_common(unused, ctype);
> > -
> > - pc = lookup_page_cgroup(target);
> > /*
> > - * __mem_cgroup_commit_charge() check PCG_USED bit of page_cgroup.
> > - * So, double-counting is effectively avoided.
> > + * We disallowed uncharge of pages under migration because the
> > + * mapcount of the page goes down to zero, temporarily.
> > + * Clear the flag and check whether the page should still be charged.
> > */
> > - __mem_cgroup_commit_charge(mem, pc, ctype);
> > -
> > + pc = lookup_page_cgroup(unused);
> > + /* This flag itself is not racy, so, check it before lock */
> > + if (PageCgroupMigration(pc)) {
> > + lock_page_cgroup(pc);
> > + ClearPageCgroupMigration(pc);
> > + unlock_page_cgroup(pc);
> > + }
> Is the reason why "this flag itself is not racy" that we update the flag only
> while the page is isolated?
Yes and no.
It's not racy because a page is handled by only one migration thread at a
time, never by several migration threads, and only the migration thread
marks it MIGRATION.

> Then we don't need the page_cgroup lock, do we? The PCG_USED bit will avoid
> double-uncharge.
>
No, there is a chance of racing with FILE_MAPPED updates etc., and other such
races too, I guess.
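
To make that concrete: with the hunk above applied, the uncharge decision
runs under lock_page_cgroup() (a sketch of __mem_cgroup_uncharge_common(),
heavily abbreviated):

lock_page_cgroup(pc);
if (!PageCgroupUsed(pc))
	goto unlock_out;
switch (ctype) {
case MEM_CGROUP_CHARGE_TYPE_MAPPED:
case MEM_CGROUP_CHARGE_TYPE_DROP:
	/* See mem_cgroup_prepare_migration() */
	if (page_mapped(page) || PageCgroupMigration(pc))
		goto unlock_out;	/* keep the charge */
	break;
/* other charge types elided */
}

Clearing PCG_MIGRATION under the same lock keeps this test, and paths like
the FILE_MAPPED updater that also run under lock_page_cgroup(), from seeing
a half-updated page_cgroup.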

Thanks,
-Kame