Hi.
It seems the current -mm has gradually stabilized,
but I have encountered another bad-page problem in my test(*1)
on 2.6.26-rc5-mm3 + patch collection(*2).
Compared to the previous problems fixed by the patch collection,
the frequency is low.
- 1 time in 1 hour of running (the first one was seen after 30 minutes)
- 3 times in 16 hours of running (the first one was seen after 4 hours)
- 10 times in 70 hours of running (the first one was seen after 8 hours)
All bad pages show a similar message, like the one below:
---
Bad page state in process 'switch.sh'
page:ffffe2000c8e59c0 flags:0x0200000000080018 mapping:0000000000000000 mapcount:0 count:0
cgroup:ffff81062a817050
Trying to fix it up, but a reboot is needed
Backtrace:
Pid: 14980, comm: switch.sh Not tainted 2.6.26-rc5-mm3-memfix #1
Jun 19 20:10:23 opteron kernel:
Call Trace:
[<ffffffff802747b0>] bad_page+0x97/0x131
[<ffffffff80275ae6>] free_hot_cold_page+0xd4/0x19c
[<ffffffff80275bcf>] __pagevec_free+0x21/0x2e
[<ffffffff80278d51>] release_pages+0x18d/0x19f
[<ffffffff80278e58>] ____pagevec_lru_add+0xf5/0x106
[<ffffffff8027a5ea>] putback_lru_page+0x52/0xe9
[<ffffffff8029baec>] migrate_pages+0x331/0x42a
[<ffffffff8029070f>] new_node_page+0x0/0x2f
[<ffffffff802915a9>] do_migrate_pages+0x19b/0x1e7
[<ffffffff8025c827>] cpuset_migrate_mm+0x58/0x8f
[<ffffffff8025d0fd>] cpuset_attach+0x8b/0x9e
[<ffffffff8025a3e1>] cgroup_attach_task+0x3a3/0x3f5
[<ffffffff8029db71>] __dentry_open+0x154/0x238
[<ffffffff8025af06>] cgroup_common_file_write+0x150/0x1dd
[<ffffffff8025aaf4>] cgroup_file_write+0x54/0x150
[<ffffffff8030a335>] selinux_file_permission+0x56/0x117
[<ffffffff8029f74d>] vfs_write+0xad/0x136
[<ffffffff8029fc8a>] sys_write+0x45/0x6e
[<ffffffff8020bef2>] tracesys+0xd5/0xda
Jun 19 20:10:23 opteron kernel:
Hexdump:
000: 28 00 08 00 00 00 00 02 01 00 00 00 00 00 00 00
010: 00 00 00 00 00 00 00 00 a1 f1 08 25 03 81 ff ff
020: 6e 06 90 f5 07 00 00 00 68 59 8e 0c 00 e2 ff ff
030: a8 a5 8c 0c 00 e2 ff ff 00 cf 11 25 03 81 ff ff
040: 18 00 08 00 00 00 00 02 00 00 00 00 ff ff ff ff
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
060: c0 08 00 00 00 00 00 00 00 01 10 00 00 c1 ff ff
070: 00 02 20 00 00 c1 ff ff 00 00 00 00 00 00 00 00
080: 08 00 04 00 00 00 00 02 00 00 00 00 ff ff ff ff
090: 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0a0: 7e 99 7a f6 07 00 00 00 28 9c 8d 0c 00 e2 ff ff
0b0: 28 16 86 0c 00 e2 ff ff 00 00 00 00 00 00 00 00
---
- page flags are 0x...80018, i.e. PG_uptodate/PG_dirty/PG_swapbacked,
and count/mapcount/mapping are all 0 (no problem).
- it contains a "cgroup:..." line; this is the cause of the bad page.
So, some pages that have not been uncharged by memcg
are being freed (I don't mount memcg, but I don't specify
"cgroup_disable=memory" either).
I have not yet found the path where this can happen,
and I'm digging further.
Thanks,
Daisuke Nishimura.
*1 http://lkml.org/lkml/2008/6/17/367
*2 http://lkml.org/lkml/2008/6/19/62
On Mon, 23 Jun 2008 14:53:41 +0900
Daisuke Nishimura <[email protected]> wrote:
> Hi.
>
> It seems the current -mm has gradually stabilized,
> but I have encountered another bad-page problem in my test(*1)
> on 2.6.26-rc5-mm3 + patch collection(*2).
>
> Compared to the previous problems fixed by the patch collection,
> the frequency is low.
>
> - 1 time in 1 hour of running (the first one was seen after 30 minutes)
> - 3 times in 16 hours of running (the first one was seen after 4 hours)
> - 10 times in 70 hours of running (the first one was seen after 8 hours)
>
> All bad pages show a similar message, like the one below:
>
Thank you. I'll dig this.
-Kame
> ---
> Bad page state in process 'switch.sh'
> page:ffffe2000c8e59c0 flags:0x0200000000080018 mapping:0000000000000000 mapcount:0 count:0
> cgroup:ffff81062a817050
> Trying to fix it up, but a reboot is needed
> Backtrace:
> Pid: 14980, comm: switch.sh Not tainted 2.6.26-rc5-mm3-memfix #1
> Jun 19 20:10:23 opteron kernel:
> Call Trace:
> [<ffffffff802747b0>] bad_page+0x97/0x131
> [<ffffffff80275ae6>] free_hot_cold_page+0xd4/0x19c
> [<ffffffff80275bcf>] __pagevec_free+0x21/0x2e
> [<ffffffff80278d51>] release_pages+0x18d/0x19f
> [<ffffffff80278e58>] ____pagevec_lru_add+0xf5/0x106
> [<ffffffff8027a5ea>] putback_lru_page+0x52/0xe9
> [<ffffffff8029baec>] migrate_pages+0x331/0x42a
> [<ffffffff8029070f>] new_node_page+0x0/0x2f
> [<ffffffff802915a9>] do_migrate_pages+0x19b/0x1e7
> [<ffffffff8025c827>] cpuset_migrate_mm+0x58/0x8f
> [<ffffffff8025d0fd>] cpuset_attach+0x8b/0x9e
> [<ffffffff8025a3e1>] cgroup_attach_task+0x3a3/0x3f5
> [<ffffffff8029db71>] __dentry_open+0x154/0x238
> [<ffffffff8025af06>] cgroup_common_file_write+0x150/0x1dd
> [<ffffffff8025aaf4>] cgroup_file_write+0x54/0x150
> [<ffffffff8030a335>] selinux_file_permission+0x56/0x117
> [<ffffffff8029f74d>] vfs_write+0xad/0x136
> [<ffffffff8029fc8a>] sys_write+0x45/0x6e
> [<ffffffff8020bef2>] tracesys+0xd5/0xda
> Jun 19 20:10:23 opteron kernel:
> Hexdump:
> 000: 28 00 08 00 00 00 00 02 01 00 00 00 00 00 00 00
> 010: 00 00 00 00 00 00 00 00 a1 f1 08 25 03 81 ff ff
> 020: 6e 06 90 f5 07 00 00 00 68 59 8e 0c 00 e2 ff ff
> 030: a8 a5 8c 0c 00 e2 ff ff 00 cf 11 25 03 81 ff ff
> 040: 18 00 08 00 00 00 00 02 00 00 00 00 ff ff ff ff
> 050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 060: c0 08 00 00 00 00 00 00 00 01 10 00 00 c1 ff ff
> 070: 00 02 20 00 00 c1 ff ff 00 00 00 00 00 00 00 00
> 080: 08 00 04 00 00 00 00 02 00 00 00 00 ff ff ff ff
> 090: 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0a0: 7e 99 7a f6 07 00 00 00 28 9c 8d 0c 00 e2 ff ff
> 0b0: 28 16 86 0c 00 e2 ff ff 00 00 00 00 00 00 00 00
> ---
>
> - page flags are 0x...80018, i.e. PG_uptodate/PG_dirty/PG_swapbacked,
> and count/mapcount/mapping are all 0 (no problem).
> - it contains a "cgroup:..." line; this is the cause of the bad page.
>
> So, some pages that have not been uncharged by memcg
> are being freed (I don't mount memcg, but I don't specify
> "cgroup_disable=memory" either).
> I have not yet found the path where this can happen,
> and I'm digging further.
>
>
> Thanks,
> Daisuke Nishimura.
>
> *1 http://lkml.org/lkml/2008/6/17/367
> *2 http://lkml.org/lkml/2008/6/19/62
>
On Mon, 23 Jun 2008 15:08:17 +0900
KAMEZAWA Hiroyuki <[email protected]> wrote:
> On Mon, 23 Jun 2008 14:53:41 +0900
> Daisuke Nishimura <[email protected]> wrote:
>
> > Hi.
> >
> > It seems the current -mm has gradually stabilized,
> > but I have encountered another bad-page problem in my test(*1)
> > on 2.6.26-rc5-mm3 + patch collection(*2).
> >
> > Compared to the previous problems fixed by the patch collection,
> > the frequency is low.
> >
> > - 1 time in 1 hour of running (the first one was seen after 30 minutes)
> > - 3 times in 16 hours of running (the first one was seen after 4 hours)
> > - 10 times in 70 hours of running (the first one was seen after 8 hours)
> >
> > All bad pages show a similar message, like the one below:
> >
> Thank you. I'll dig this.
>
>
Here is one possibility. But if your test doesn't migrate any shmem,
I'll have to dig more ;)
Anyway, I'll schedule this patch.
-Kame
=
mem_cgroup_uncharge() against the old page is done after the radix-tree
replacement, and there was special handling to ignore swap-cache pages. But
shmem can be a swap cache and a file cache at the same time. Checking
PageSwapCache() is not correct here. Check PageAnon() instead.
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
---
mm/migrate.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
Index: test2-2.6.26-rc5-mm3/mm/migrate.c
===================================================================
--- test2-2.6.26-rc5-mm3.orig/mm/migrate.c
+++ test2-2.6.26-rc5-mm3/mm/migrate.c
@@ -330,7 +330,13 @@ static int migrate_page_move_mapping(str
__inc_zone_page_state(newpage, NR_FILE_PAGES);
spin_unlock_irq(&mapping->tree_lock);
- if (!PageSwapCache(newpage))
+
+ /*
+ * The page is removed from radix-tree implicitly.
+ * We uncharge it here but swap cache of anonymous page should be
+ * uncharged by mem_cgroup_uncharge_page().
+ */
+ if (!PageAnon(newpage))
mem_cgroup_uncharge_cache_page(page);
return 0;
@@ -379,7 +385,8 @@ static void migrate_page_copy(struct pag
/*
* SwapCache is removed implicitly. Uncharge against swapcache
* should be called after ClearPageSwapCache() because
- * mem_cgroup_uncharge_page checks the flag.
+ * mem_cgroup_uncharge_page checks the flag. shmem's swap cache
+ * is uncharged before here.
*/
mem_cgroup_uncharge_page(page);
}
On Mon, 23 Jun 2008 20:21:11 +0900, KAMEZAWA Hiroyuki <[email protected]> wrote:
> On Mon, 23 Jun 2008 15:08:17 +0900
> KAMEZAWA Hiroyuki <[email protected]> wrote:
>
> > On Mon, 23 Jun 2008 14:53:41 +0900
> > Daisuke Nishimura <[email protected]> wrote:
> >
> > > Hi.
> > >
> > > It seems the current -mm has gradually stabilized,
> > > but I have encountered another bad-page problem in my test(*1)
> > > on 2.6.26-rc5-mm3 + patch collection(*2).
> > >
> > > Compared to the previous problems fixed by the patch collection,
> > > the frequency is low.
> > >
> > > - 1 time in 1 hour of running (the first one was seen after 30 minutes)
> > > - 3 times in 16 hours of running (the first one was seen after 4 hours)
> > > - 10 times in 70 hours of running (the first one was seen after 8 hours)
> > >
> > > All bad pages show a similar message, like the one below:
> > >
> > Thank you. I'll dig this.
> >
> >
> Here is one possibility. But if your test doesn't migrate any shmem,
> I'll have to dig more ;)
> Anyway, I'll schedule this patch.
>
Thank you for your investigation and a patch!
I don't use shmem explicitly, but I'll test this patch anyway
and report the result.
Considering the frequency of the problem, it will take a long time
to tell whether this patch fixes it, so please wait :)
Thanks,
Daisuke Nishimura.
> I don't use shmem explicitly, but I'll test this patch anyway
> and report the result.
>
Unfortunately, this patch doesn't solve my problem, hum...
I'll dig more, too.
In my test, I don't use a large amount of memory, so I think
no swap activity happens, perhaps.
Anyway, I agree that this patch itself is needed for shmem migration.
Thanks,
Daisuke Nishimura.
On Tue, 24 Jun 2008 10:37:09 +0900
Daisuke Nishimura <[email protected]> wrote:
> > I don't use shmem explicitly, but I'll test this patch anyway
> > and report the result.
> >
> Unfortunately, this patch doesn't solve my problem, hum...
> I'll dig more, too.
> In my test, I don't use a large amount of memory, so I think
> no swap activity happens, perhaps.
>
Sigh, one hint in the log is
==
Bad page state in process 'switch.sh'
page:ffffe2000c8e59c0 flags:0x0200000000080018 mapping:0000000000000000 mapcount:0 count:0
cgroup:ffff81062a817050
==
- the page was a mapped one.
- the page is swap-backed .... Anon or shmem/tmpfs.
- mapping is NULL
If it was a *source* page:
.. if it was Anon, page->mapping was cleared by migrate_page_copy()
.. if not, the replacement in the radix tree succeeded.
If it was a destination page:
.. page flags were copied when migrate_page_copy() was called.
.. newpage->mapping is cleared only on migration failure.
Hmm.. I think the troublesome page is the *source* page now.
Anyway, thanks.
-Kame
On Tue, 24 Jun 2008 12:22:57 +0900
KAMEZAWA Hiroyuki <[email protected]> wrote:
> On Tue, 24 Jun 2008 10:37:09 +0900
> Daisuke Nishimura <[email protected]> wrote:
>
> > > I don't use shmem explicitly, but I'll test this patch anyway
> > > and report the result.
> > >
> > Unfortunately, this patch doesn't solve my problem, hum...
> > I'll dig more, too.
> > In my test, I don't use a large amount of memory, so I think
> > no swap activity happens, perhaps.
> >
> Sigh, one hint in the log is
> ==
> Bad page state in process 'switch.sh'
> page:ffffe2000c8e59c0 flags:0x0200000000080018 mapping:0000000000000000 mapcount:0 count:0
> cgroup:ffff81062a817050
> ==
>
> - the page was a mapped one.
> - the page is swap-backed .... Anon or shmem/tmpfs.
> - mapping is NULL
>
ignore this... free_hot_cold_page() clears page->mapping. (--;
-Kame
Hi, Nishimura-san. thank you for all your help.
I think this one is......hopefully.
==
In general, mem_cgroup's charge on an ANON page is removed when page_remove_rmap()
is called.
At migration, the newpage is remapped again by remove_migration_ptes(). But the
pte may already have been changed (by task exit).
The page is charged at page allocation but has no chance to be uncharged in that
case because it is never added to the rmap.
Handle that corner case in mem_cgroup_end_migration().
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
---
mm/memcontrol.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
Index: test2-2.6.26-rc5-mm3/mm/memcontrol.c
===================================================================
--- test2-2.6.26-rc5-mm3.orig/mm/memcontrol.c
+++ test2-2.6.26-rc5-mm3/mm/memcontrol.c
@@ -747,10 +747,22 @@ int mem_cgroup_prepare_migration(struct
/* remove redundant charge if migration failed*/
void mem_cgroup_end_migration(struct page *newpage)
{
- /* At success, page->mapping is not NULL and nothing to do. */
+ /*
+ * At success, page->mapping is not NULL.
+ * special rollback care is necessary when
+ * 1. at migration failure. (newpage->mapping is cleared in this case)
+ * 2. the newpage was moved but not remapped again because the task
+ * exits and the newpage is obsolete. In this case, the new page
+ * may be a swapcache. So, we just call mem_cgroup_uncharge_page()
+ * always for avoiding mess. The page_cgroup will be removed if
+ * unnecessary. File cache pages is still on radix-tree. Don't
+ * care it.
+ */
if (!newpage->mapping)
__mem_cgroup_uncharge_common(newpage,
MEM_CGROUP_CHARGE_TYPE_FORCE);
+ else if (PageAnon(newpage))
+ mem_cgroup_uncharge_page(newpage);
}
/*
On Tue, 24 Jun 2008 14:51:27 +0900, KAMEZAWA Hiroyuki <[email protected]> wrote:
> Hi, Nishimura-san. thank you for all your help.
>
> I think this one is......hopefully.
>
I hope so too :)
I think the corner case that this patch fixes is likely what is
happening in my case (there may be other cases, though..).
I'm testing this one now.
> ==
>
> In general, mem_cgroup's charge on ANON page is removed when page_remove_rmap()
> is called.
>
> At migration, the newpage is remapped again by remove_migration_ptes(). But
> pte may be already changed (by task exits).
> It is charged at page allocation but have no chance to be uncharged in that
> case because it is never added to rmap.
>
I think "charged by mem_cgroup_prepare_migration()" is more precise.
> Handle that corner case in mem_cgroup_end_migration().
>
>
Thanks,
Daisuke Nishimura.
On Tue, 24 Jun 2008 16:19:03 +0900
Daisuke Nishimura <[email protected]> wrote:
> On Tue, 24 Jun 2008 14:51:27 +0900, KAMEZAWA Hiroyuki <[email protected]> wrote:
> > Hi, Nishimura-san. thank you for all your help.
> >
> > I think this one is......hopefully.
> >
> I hope so too :)
>
> I think the corner case that this patch fixes is likely what is
> happening in my case (there may be other cases, though..).
>
> I'm testing this one now.
>
> > ==
> >
> > In general, mem_cgroup's charge on ANON page is removed when page_remove_rmap()
> > is called.
> >
> > At migration, the newpage is remapped again by remove_migration_ptes(). But
> > pte may be already changed (by task exits).
> > It is charged at page allocation but have no chance to be uncharged in that
> > case because it is never added to rmap.
> >
> I think "charged by mem_cgroup_prepare_migration()" is more precise.
>
Thanks, will rewrite.
Regards,
-Kame
> > Handle that corner case in mem_cgroup_end_migration().
> >
> >
>
>
> Thanks,
> Daisuke Nishimura.
>
KAMEZAWA Hiroyuki wrote:
> Hi, Nishimura-san. thank you for all your help.
>
> I think this one is......hopefully.
>
> ==
>
> In general, mem_cgroup's charge on ANON page is removed when page_remove_rmap()
> is called.
>
> At migration, the newpage is remapped again by remove_migration_ptes(). But
> pte may be already changed (by task exits).
> It is charged at page allocation but have no chance to be uncharged in that
> case because it is never added to rmap.
>
> Handle that corner case in mem_cgroup_end_migration().
>
>
> Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
>
>
> ---
> mm/memcontrol.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> Index: test2-2.6.26-rc5-mm3/mm/memcontrol.c
> ===================================================================
> --- test2-2.6.26-rc5-mm3.orig/mm/memcontrol.c
> +++ test2-2.6.26-rc5-mm3/mm/memcontrol.c
> @@ -747,10 +747,22 @@ int mem_cgroup_prepare_migration(struct
> /* remove redundant charge if migration failed*/
> void mem_cgroup_end_migration(struct page *newpage)
> {
> - /* At success, page->mapping is not NULL and nothing to do. */
> + /*
> + * At success, page->mapping is not NULL.
> + * special rollback care is necessary when
> + * 1. at migration failure. (newpage->mapping is cleared in this case)
> + * 2. the newpage was moved but not remapped again because the task
> + * exits and the newpage is obsolete. In this case, the new page
> + * may be a swapcache. So, we just call mem_cgroup_uncharge_page()
> + * always for avoiding mess. The page_cgroup will be removed if
> + * unnecessary. File cache pages is still on radix-tree. Don't
> + * care it.
> + */
> if (!newpage->mapping)
> __mem_cgroup_uncharge_common(newpage,
> MEM_CGROUP_CHARGE_TYPE_FORCE);
> + else if (PageAnon(newpage))
> + mem_cgroup_uncharge_page(newpage);
> }
Definitely makes sense to me!
Acked-by: Balbir Singh <[email protected]>
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL