2011-02-09 06:17:12

by Kamezawa Hiroyuki

Subject: [mmotm] BUG: Bad page state in process khugepaged ?


In environment
- Host: kernel-2.6.32xxx (RHEL6)
- Guest: mm-of-the-moment snapshot 2011-02-04-15-15

I saw this when I ran make -j8 under a 200M memcg limit with 4 vcpus.
But it seems this isn't directly related to memcg or virtualization.

Anyway, the log is here. I'm sorry if this has already been fixed.
My .config is attached, and a brief note is below.


==
Feb 9 14:39:54 rhel6-test kernel: [ 4209.076861] BUG: Bad page state in process khugepaged pfn:1e9800
Feb 9 14:39:54 rhel6-test kernel: [ 4209.077601] page:ffffea0006b14000 count:0 mapcount:0 mapping: (null) index:0x2800
Feb 9 14:39:54 rhel6-test kernel: [ 4209.078674] page flags: 0x40000000004000(head)
Feb 9 14:39:54 rhel6-test kernel: [ 4209.079294] pc:ffff880214a30000 pc->flags:2146246697418756 pc->mem_cgroup:ffffc9000177a000
Feb 9 14:39:54 rhel6-test kernel: [ 4209.082177] (/A)
Feb 9 14:39:54 rhel6-test kernel: [ 4209.082500] Pid: 31, comm: khugepaged Not tainted 2.6.38-rc3-mm1 #1
Feb 9 14:39:54 rhel6-test kernel: [ 4209.083412] Call Trace:
Feb 9 14:39:54 rhel6-test kernel: [ 4209.083678] [<ffffffff810f4454>] ? bad_page+0xe4/0x140
Feb 9 14:39:54 rhel6-test kernel: [ 4209.084240] [<ffffffff810f53e6>] ? free_pages_prepare+0xd6/0x120
Feb 9 14:39:54 rhel6-test kernel: [ 4209.084837] [<ffffffff8155621d>] ? rwsem_down_failed_common+0xbd/0x150
Feb 9 14:39:54 rhel6-test kernel: [ 4209.085509] [<ffffffff810f5462>] ? __free_pages_ok+0x32/0xe0
Feb 9 14:39:54 rhel6-test kernel: [ 4209.086110] [<ffffffff810f552b>] ? free_compound_page+0x1b/0x20
Feb 9 14:39:54 rhel6-test kernel: [ 4209.086699] [<ffffffff810fad6c>] ? __put_compound_page+0x1c/0x30
Feb 9 14:39:54 rhel6-test kernel: [ 4209.087333] [<ffffffff810fae1d>] ? put_compound_page+0x4d/0x200
Feb 9 14:39:54 rhel6-test kernel: [ 4209.087935] [<ffffffff810fb015>] ? put_page+0x45/0x50
Feb 9 14:39:54 rhel6-test kernel: [ 4209.097361] [<ffffffff8113f779>] ? khugepaged+0x9e9/0x1430
Feb 9 14:39:54 rhel6-test kernel: [ 4209.098364] [<ffffffff8107c870>] ? autoremove_wake_function+0x0/0x40
Feb 9 14:39:54 rhel6-test kernel: [ 4209.099121] [<ffffffff8113ed90>] ? khugepaged+0x0/0x1430
Feb 9 14:39:54 rhel6-test kernel: [ 4209.099780] [<ffffffff8107c236>] ? kthread+0x96/0xa0
Feb 9 14:39:54 rhel6-test kernel: [ 4209.100452] [<ffffffff8100dda4>] ? kernel_thread_helper+0x4/0x10
Feb 9 14:39:54 rhel6-test kernel: [ 4209.101214] [<ffffffff8107c1a0>] ? kthread+0x0/0xa0
Feb 9 14:39:54 rhel6-test kernel: [ 4209.101842] [<ffffffff8100dda0>] ? kernel_thread_helper+0x0/0x10
Feb 9 14:39:54 rhel6-test kernel: [ 4209.102575] ------------[ cut here ]------------
Feb 9 14:39:54 rhel6-test kernel: [ 4209.103190] kernel BUG at include/linux/mm.h:420!
Feb 9 14:39:54 rhel6-test kernel: [ 4209.103803] invalid opcode: 0000 [#1] SMP
Feb 9 14:39:54 rhel6-test kernel: [ 4209.104309] last sysfs file: /sys/devices/system/cpu/online
Feb 9 14:39:54 rhel6-test kernel: [ 4209.104991] CPU 0
Feb 9 14:39:54 rhel6-test kernel: [ 4209.105244] Modules linked in: autofs4 sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipv6 virtio_balloon virtio_net virtio_blk virtio_pci virtio_ring virtio [last unloaded: scsi_wait_scan]
Feb 9 14:39:54 rhel6-test kernel: [ 4209.108135]
Feb 9 14:39:54 rhel6-test kernel: [ 4209.108303] Pid: 31, comm: khugepaged Not tainted 2.6.38-rc3-mm1 #1 /KVM
Feb 9 14:39:54 rhel6-test kernel: [ 4209.109098] RIP: 0010:[<ffffffff810f4486>] [<ffffffff810f4486>] bad_page+0x116/0x140
Feb 9 14:39:54 rhel6-test kernel: [ 4209.110086] RSP: 0000:ffff880211f99ca0 EFLAGS: 00010202
Feb 9 14:39:54 rhel6-test kernel: [ 4209.110747] RAX: 00000000ffffffff RBX: ffffea0006b14000 RCX: 0000000000000712
Feb 9 14:39:54 rhel6-test kernel: [ 4209.111601] RDX: 00000000ffffffff RSI: 0000000000000001 RDI: ffff880211f99f58
Feb 9 14:39:54 rhel6-test kernel: [ 4209.112491] RBP: ffff880211f99cb0 R08: ffffffff81b41b80 R09: 0000000000000200
Feb 9 14:39:54 rhel6-test kernel: [ 4209.119933] R10: 0000000000000006 R11: 0000000000000001 R12: 0000000000000200
Feb 9 14:39:54 rhel6-test kernel: [ 4209.120764] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Feb 9 14:39:54 rhel6-test kernel: [ 4209.121623] FS: 0000000000000000(0000) GS:ffff8800dfc00000(0000) knlGS:0000000000000000
Feb 9 14:39:54 rhel6-test kernel: [ 4209.122523] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Feb 9 14:39:54 rhel6-test kernel: [ 4209.123195] CR2: 0000000000b9dd90 CR3: 00000001ffeaf000 CR4: 00000000000006f0
Feb 9 14:39:54 rhel6-test kernel: [ 4209.123951] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 9 14:39:54 rhel6-test kernel: [ 4209.124765] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 9 14:39:54 rhel6-test kernel: [ 4209.125610] Process khugepaged (pid: 31, threadinfo ffff880211f98000, task ffff880211f34590)
Feb 9 14:39:54 rhel6-test kernel: [ 4209.126529] Stack:
Feb 9 14:39:54 rhel6-test kernel: [ 4209.126793] ffff880211f99cb0 ffffea0006b14000 ffff880211f99d00 ffffffff810f53e6
Feb 9 14:39:54 rhel6-test kernel: [ 4209.127725] ffff880211f99d20 ffffffff8155621d dead000000100100 ffffea0006b14000
Feb 9 14:39:54 rhel6-test kernel: [ 4209.130864] 0000000000000009 0000000000000019 0000000000000000 0000000000000447
Feb 9 14:39:54 rhel6-test kernel: [ 4209.131766] Call Trace:
Feb 9 14:39:54 rhel6-test kernel: [ 4209.132236] [<ffffffff810f53e6>] free_pages_prepare+0xd6/0x120
Feb 9 14:39:54 rhel6-test kernel: [ 4209.132891] [<ffffffff8155621d>] ? rwsem_down_failed_common+0xbd/0x150
Feb 9 14:39:54 rhel6-test kernel: [ 4209.133864] [<ffffffff810f5462>] __free_pages_ok+0x32/0xe0
Feb 9 14:39:54 rhel6-test kernel: [ 4209.134518] [<ffffffff810f552b>] free_compound_page+0x1b/0x20
Feb 9 14:39:54 rhel6-test kernel: [ 4209.135244] [<ffffffff810fad6c>] __put_compound_page+0x1c/0x30
Feb 9 14:39:54 rhel6-test kernel: [ 4209.135934] [<ffffffff810fae1d>] put_compound_page+0x4d/0x200
Feb 9 14:39:54 rhel6-test kernel: [ 4209.136659] [<ffffffff810fb015>] put_page+0x45/0x50
Feb 9 14:39:54 rhel6-test kernel: [ 4209.137232] [<ffffffff8113f779>] khugepaged+0x9e9/0x1430
Feb 9 14:39:54 rhel6-test kernel: [ 4209.137801] [<ffffffff8107c870>] ? autoremove_wake_function+0x0/0x40
Feb 9 14:39:54 rhel6-test kernel: [ 4209.138527] [<ffffffff8113ed90>] ? khugepaged+0x0/0x1430
Feb 9 14:39:54 rhel6-test kernel: [ 4209.139166] [<ffffffff8107c236>] kthread+0x96/0xa0
Feb 9 14:39:54 rhel6-test kernel: [ 4209.139707] [<ffffffff8100dda4>] kernel_thread_helper+0x4/0x10
Feb 9 14:39:54 rhel6-test kernel: [ 4209.140334] [<ffffffff8107c1a0>] ? kthread+0x0/0xa0
Feb 9 14:39:54 rhel6-test kernel: [ 4209.140901] [<ffffffff8100dda0>] ? kernel_thread_helper+0x0/0x10
Feb 9 14:39:54 rhel6-test kernel: [ 4209.141582] Code: 35 f0 19 c1 00 48 85 f6 75 25 48 c7 05 e8 19 c1 00 01 00 00 00 48 8b 05 49 54 a5 00 48 05 60 ea 00 00 48 89 05 dc 19 c1 00 eb 83 <0f> 0b eb fe 48 c7 c7 30 5a 7a 81 31 c0 e8 e0 f0 45 00 48 c7 05

==

The 2nd log, "kernel BUG at include/linux/mm.h:420!", is this one.
==
static inline void __ClearPageBuddy(struct page *page)
{
	VM_BUG_ON(!PageBuddy(page));
	atomic_set(&page->_mapcount, -1);
}
==
But this is just the tail of bad_page().
==
static void bad_page(struct page *page)
{
	static unsigned long resume;
	static unsigned long nr_shown;
	static unsigned long nr_unshown;
	...
	dump_stack();
out:
	/* Leave bad fields for debug, except PageBuddy could make trouble */
	__ClearPageBuddy(page);
	add_taint(TAINT_BAD_PAGE);
}
==
So, what is important is bad_page().

The bad page report says
==
BUG: Bad page state in process khugepaged pfn:1e9800
page:ffffea0006b14000 count:0 mapcount:0 mapping: (null) index:0x2800
page flags: 0x40000000004000(head)
pc:ffff880214a30000 pc->flags:2146246697418756 pc->mem_cgroup:ffffc9000177a000
==

Maybe page_mapcount(page) was > 0 and ->mapping was NULL.

BTW, I think pc->flags should be printed in hex... Nishimura-san, what do you think?


In hex, pc->flags was 0x7A00000000004, which means the PCG_USED bit is set.
This implies page_remove_rmap() may not have been called even though ->mapping is NULL. Hmm?
(0x7A is the encoding of the section number.)

make -j 8 under a 200M limit keeps the system very busy with swap.

Thanks,
-Kame


Attachments:
myconfig (78.08 kB)

2011-02-09 06:46:30

by Kamezawa Hiroyuki

Subject: Re: [mmotm] BUG: Bad page state in process khugepaged ?

On Wed, 9 Feb 2011 15:10:36 +0900
KAMEZAWA Hiroyuki <[email protected]> wrote:

> 2nd log, "kernel BUG at include/linux/mm.h:420!" is This one.
> ==
> static inline void __ClearPageBuddy(struct page *page)
> {
> VM_BUG_ON(!PageBuddy(page));
> atomic_set(&page->_mapcount, -1);
> }
> ==
> But this is just a tail of bad_page().
> ==
> static void bad_page(struct page *page)
> {
> static unsigned long resume;
> static unsigned long nr_shown;
> static unsigned long nr_unshown;
> ...
> dump_stack();
> out:
> /* Leave bad fields for debug, except PageBuddy could make trouble */
> __ClearPageBuddy(page);
> add_taint(TAINT_BAD_PAGE);
> }
> ==
> So, what important is bad_page().
>
> BAD page says
> ==
> BUG: Bad page state in process khugepaged pfn:1e9800
> page:ffffea0006b14000 count:0 mapcount:0 mapping: (null) index:0x2800
> page flags: 0x40000000004000(head)
> pc:ffff880214a30000 pc->flags:2146246697418756 pc->mem_cgroup:ffffc9000177a000
> ==
>
> Maybe page_mapcount(page) was > 0. and ->mapping was NULL.
Sorry, please ignore the above. bad_page() used page_mapcount().


Regards,
-Kame




2011-02-09 06:51:38

by Daisuke Nishimura

Subject: Re: [mmotm] BUG: Bad page state in process khugepaged ?

On Wed, 9 Feb 2011 15:10:36 +0900
KAMEZAWA Hiroyuki <[email protected]> wrote:

>
> In environment
> - Host: kernel-2.6.32xxx (RHEL6)
> - Guest: mm-of-the-moment snapshot 2011-02-04-15-15
>
> I saw this when I ran make -j8 under 200M limit of memcg with 4vcpu.
> But it seems this doesn't directly related to memcg or virtualization.
>
> Anyway, log is here. I'm sorry if this is a fixed one.
> My .config is attached. and a brief note is below
>
>
> ==
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.076861] BUG: Bad page state in process khugepaged pfn:1e9800
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.077601] page:ffffea0006b14000 count:0 mapcount:0 mapping: (null) index:0x2800
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.078674] page flags: 0x40000000004000(head)
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.079294] pc:ffff880214a30000 pc->flags:2146246697418756 pc->mem_cgroup:ffffc9000177a000
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.082177] (/A)
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.082500] Pid: 31, comm: khugepaged Not tainted 2.6.38-rc3-mm1 #1
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.083412] Call Trace:
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.083678] [<ffffffff810f4454>] ? bad_page+0xe4/0x140
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.084240] [<ffffffff810f53e6>] ? free_pages_prepare+0xd6/0x120
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.084837] [<ffffffff8155621d>] ? rwsem_down_failed_common+0xbd/0x150
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.085509] [<ffffffff810f5462>] ? __free_pages_ok+0x32/0xe0
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.086110] [<ffffffff810f552b>] ? free_compound_page+0x1b/0x20
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.086699] [<ffffffff810fad6c>] ? __put_compound_page+0x1c/0x30
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.087333] [<ffffffff810fae1d>] ? put_compound_page+0x4d/0x200
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.087935] [<ffffffff810fb015>] ? put_page+0x45/0x50
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.097361] [<ffffffff8113f779>] ? khugepaged+0x9e9/0x1430
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.098364] [<ffffffff8107c870>] ? autoremove_wake_function+0x0/0x40
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.099121] [<ffffffff8113ed90>] ? khugepaged+0x0/0x1430
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.099780] [<ffffffff8107c236>] ? kthread+0x96/0xa0
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.100452] [<ffffffff8100dda4>] ? kernel_thread_helper+0x4/0x10
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.101214] [<ffffffff8107c1a0>] ? kthread+0x0/0xa0
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.101842] [<ffffffff8100dda0>] ? kernel_thread_helper+0x0/0x10
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.102575] ------------[ cut here ]------------
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.103190] kernel BUG at include/linux/mm.h:420!
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.103803] invalid opcode: 0000 [#1] SMP
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.104309] last sysfs file: /sys/devices/system/cpu/online
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.104991] CPU 0
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.105244] Modules linked in: autofs4 sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipv6 virtio_balloon virtio_net virtio_blk virtio_pci virtio_ring virtio [last unloaded: scsi_wait_scan]
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.108135]
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.108303] Pid: 31, comm: khugepaged Not tainted 2.6.38-rc3-mm1 #1 /KVM
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.109098] RIP: 0010:[<ffffffff810f4486>] [<ffffffff810f4486>] bad_page+0x116/0x140
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.110086] RSP: 0000:ffff880211f99ca0 EFLAGS: 00010202
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.110747] RAX: 00000000ffffffff RBX: ffffea0006b14000 RCX: 0000000000000712
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.111601] RDX: 00000000ffffffff RSI: 0000000000000001 RDI: ffff880211f99f58
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.112491] RBP: ffff880211f99cb0 R08: ffffffff81b41b80 R09: 0000000000000200
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.119933] R10: 0000000000000006 R11: 0000000000000001 R12: 0000000000000200
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.120764] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.121623] FS: 0000000000000000(0000) GS:ffff8800dfc00000(0000) knlGS:0000000000000000
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.122523] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.123195] CR2: 0000000000b9dd90 CR3: 00000001ffeaf000 CR4: 00000000000006f0
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.123951] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.124765] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.125610] Process khugepaged (pid: 31, threadinfo ffff880211f98000, task ffff880211f34590)
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.126529] Stack:
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.126793] ffff880211f99cb0 ffffea0006b14000 ffff880211f99d00 ffffffff810f53e6
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.127725] ffff880211f99d20 ffffffff8155621d dead000000100100 ffffea0006b14000
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.130864] 0000000000000009 0000000000000019 0000000000000000 0000000000000447
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.131766] Call Trace:
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.132236] [<ffffffff810f53e6>] free_pages_prepare+0xd6/0x120
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.132891] [<ffffffff8155621d>] ? rwsem_down_failed_common+0xbd/0x150
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.133864] [<ffffffff810f5462>] __free_pages_ok+0x32/0xe0
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.134518] [<ffffffff810f552b>] free_compound_page+0x1b/0x20
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.135244] [<ffffffff810fad6c>] __put_compound_page+0x1c/0x30
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.135934] [<ffffffff810fae1d>] put_compound_page+0x4d/0x200
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.136659] [<ffffffff810fb015>] put_page+0x45/0x50
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.137232] [<ffffffff8113f779>] khugepaged+0x9e9/0x1430
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.137801] [<ffffffff8107c870>] ? autoremove_wake_function+0x0/0x40
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.138527] [<ffffffff8113ed90>] ? khugepaged+0x0/0x1430
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.139166] [<ffffffff8107c236>] kthread+0x96/0xa0
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.139707] [<ffffffff8100dda4>] kernel_thread_helper+0x4/0x10
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.140334] [<ffffffff8107c1a0>] ? kthread+0x0/0xa0
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.140901] [<ffffffff8100dda0>] ? kernel_thread_helper+0x0/0x10
> Feb 9 14:39:54 rhel6-test kernel: [ 4209.141582] Code: 35 f0 19 c1 00 48 85 f6 75 25 48 c7 05 e8 19 c1 00 01 00 00 00 48 8b 05 49 54 a5 00 48 05 60 ea 00 00 48 89 05 dc 19 c1 00 eb 83 <0f> 0b eb fe 48 c7 c7 30 5a 7a 81 31 c0 e8 e0 f0 45 00 48 c7 05
>
> ==
>
> 2nd log, "kernel BUG at include/linux/mm.h:420!" is This one.
> ==
> static inline void __ClearPageBuddy(struct page *page)
> {
> VM_BUG_ON(!PageBuddy(page));
> atomic_set(&page->_mapcount, -1);
> }
> ==
> But this is just a tail of bad_page().
> ==
> static void bad_page(struct page *page)
> {
> static unsigned long resume;
> static unsigned long nr_shown;
> static unsigned long nr_unshown;
> ...
> dump_stack();
> out:
> /* Leave bad fields for debug, except PageBuddy could make trouble */
> __ClearPageBuddy(page);
> add_taint(TAINT_BAD_PAGE);
> }
> ==
> So, what important is bad_page().
>
> BAD page says
> ==
> BUG: Bad page state in process khugepaged pfn:1e9800
> page:ffffea0006b14000 count:0 mapcount:0 mapping: (null) index:0x2800
> page flags: 0x40000000004000(head)
> pc:ffff880214a30000 pc->flags:2146246697418756 pc->mem_cgroup:ffffc9000177a000
> ==
>
> Maybe page_mapcount(page) was > 0. and ->mapping was NULL.
>
> BTW, I think pc->flags should be printed in hex...Nishimura-san, how do you think ?
>
Agreed.
I don't have enough time this week, so I'll prepare a patch next week if necessary.

>
> In hex, pc->flags was 7A00000000004 and this means PCG_USED bit is set.
> This implies page_remove_rmap() may not be called but ->mapping is NULL. Hmm?
> (7A is encoding of section number.)
>
Sigh... it seems to be another freed-but-not-uncharged problem.

> make -j 8 under 200M limit is highly busy with swap.
>
I'll try, and look at the issue too.

Thanks,
Daisuke Nishimura.

2011-02-09 06:58:59

by Kamezawa Hiroyuki

Subject: Re: [mmotm] BUG: Bad page state in process khugepaged ?

On Wed, 9 Feb 2011 15:50:01 +0900
Daisuke Nishimura <[email protected]> wrote:

> >
> > In hex, pc->flags was 7A00000000004 and this means PCG_USED bit is set.
> > This implies page_remove_rmap() may not be called but ->mapping is NULL. Hmm?
> > (7A is encoding of section number.)
> >
> Sigh.. it seems another freed-but-not-uncharged problem..
>

Ah, ok, maybe this is caused by the check below. I'm sorry that I missed it.
==
static inline int free_pages_check(struct page *page)
{
	if (unlikely(page_mapcount(page) |
		(page->mapping != NULL) |
		(atomic_read(&page->_count) != 0) |
		(page->flags & PAGE_FLAGS_CHECK_AT_FREE) |
		(mem_cgroup_bad_page_check(page)))) {	<==========(*)
		bad_page(page);
		return 1;
==

Then, ok, this is a memcgroup and hugepage issue.

I'll look into it.

Thanks,
-Kame

2011-02-09 07:29:35

by Kamezawa Hiroyuki

Subject: [PATCH][BUGFIX] memcg: fix leak of accounting at failure path of hugepage collapsing.

There was a big bug. Anyway, thank you for adding the new bad_page check for memcg.
==

mem_cgroup_uncharge_page() should be called in all failure cases
after mem_cgroup_newpage_charge() is called in
huge_memory.c::collapse_huge_page()

[ 4209.076861] BUG: Bad page state in process khugepaged pfn:1e9800
[ 4209.077601] page:ffffea0006b14000 count:0 mapcount:0 mapping: (null) index:0x2800
[ 4209.078674] page flags: 0x40000000004000(head)
[ 4209.079294] pc:ffff880214a30000 pc->flags:2146246697418756 pc->mem_cgroup:ffffc9000177a000
[ 4209.082177] (/A)
[ 4209.082500] Pid: 31, comm: khugepaged Not tainted 2.6.38-rc3-mm1 #1
[ 4209.083412] Call Trace:
[ 4209.083678] [<ffffffff810f4454>] ? bad_page+0xe4/0x140
[ 4209.084240] [<ffffffff810f53e6>] ? free_pages_prepare+0xd6/0x120
[ 4209.084837] [<ffffffff8155621d>] ? rwsem_down_failed_common+0xbd/0x150
[ 4209.085509] [<ffffffff810f5462>] ? __free_pages_ok+0x32/0xe0
[ 4209.086110] [<ffffffff810f552b>] ? free_compound_page+0x1b/0x20
[ 4209.086699] [<ffffffff810fad6c>] ? __put_compound_page+0x1c/0x30
[ 4209.087333] [<ffffffff810fae1d>] ? put_compound_page+0x4d/0x200
[ 4209.087935] [<ffffffff810fb015>] ? put_page+0x45/0x50
[ 4209.097361] [<ffffffff8113f779>] ? khugepaged+0x9e9/0x1430
[ 4209.098364] [<ffffffff8107c870>] ? autoremove_wake_function+0x0/0x40
[ 4209.099121] [<ffffffff8113ed90>] ? khugepaged+0x0/0x1430
[ 4209.099780] [<ffffffff8107c236>] ? kthread+0x96/0xa0
[ 4209.100452] [<ffffffff8100dda4>] ? kernel_thread_helper+0x4/0x10
[ 4209.101214] [<ffffffff8107c1a0>] ? kthread+0x0/0xa0
[ 4209.101842] [<ffffffff8100dda0>] ? kernel_thread_helper+0x0/0x10


Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
---
mm/huge_memory.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: mmotm-0204/mm/huge_memory.c
===================================================================
--- mmotm-0204.orig/mm/huge_memory.c
+++ mmotm-0204/mm/huge_memory.c
@@ -1852,7 +1852,6 @@ static void collapse_huge_page(struct mm
 		set_pmd_at(mm, address, pmd, _pmd);
 		spin_unlock(&mm->page_table_lock);
 		anon_vma_unlock(vma->anon_vma);
-		mem_cgroup_uncharge_page(new_page);
 		goto out;
 	}
 
@@ -1898,6 +1897,7 @@ out_up_write:
 	return;
 
 out:
+	mem_cgroup_uncharge_page(new_page);
 #ifdef CONFIG_NUMA
 	put_page(new_page);
 #endif

2011-02-09 07:53:08

by Daisuke Nishimura

Subject: Re: [PATCH][BUGFIX] memcg: fix leak of accounting at failure path of hugepage collapsing.

On Wed, 9 Feb 2011 16:23:24 +0900
KAMEZAWA Hiroyuki <[email protected]> wrote:

> There was a big bug. Anyway, thank you for adding new bad_page for memcg.
I hope this is the last time we'd see the bad_page for memcg ;)

> ==
>
> mem_cgroup_uncharge_page() should be called in all failure case
> after mem_cgroup_charge_newpage() is called in
> huge_memory.c::collapse_huge_page()
>
> [ 4209.076861] BUG: Bad page state in process khugepaged pfn:1e9800
> [ 4209.077601] page:ffffea0006b14000 count:0 mapcount:0 mapping: (null) index:0x2800
> [ 4209.078674] page flags: 0x40000000004000(head)
> [ 4209.079294] pc:ffff880214a30000 pc->flags:2146246697418756 pc->mem_cgroup:ffffc9000177a000
> [ 4209.082177] (/A)
> [ 4209.082500] Pid: 31, comm: khugepaged Not tainted 2.6.38-rc3-mm1 #1
> [ 4209.083412] Call Trace:
> [ 4209.083678] [<ffffffff810f4454>] ? bad_page+0xe4/0x140
> [ 4209.084240] [<ffffffff810f53e6>] ? free_pages_prepare+0xd6/0x120
> [ 4209.084837] [<ffffffff8155621d>] ? rwsem_down_failed_common+0xbd/0x150
> [ 4209.085509] [<ffffffff810f5462>] ? __free_pages_ok+0x32/0xe0
> [ 4209.086110] [<ffffffff810f552b>] ? free_compound_page+0x1b/0x20
> [ 4209.086699] [<ffffffff810fad6c>] ? __put_compound_page+0x1c/0x30
> [ 4209.087333] [<ffffffff810fae1d>] ? put_compound_page+0x4d/0x200
> [ 4209.087935] [<ffffffff810fb015>] ? put_page+0x45/0x50
> [ 4209.097361] [<ffffffff8113f779>] ? khugepaged+0x9e9/0x1430
> [ 4209.098364] [<ffffffff8107c870>] ? autoremove_wake_function+0x0/0x40
> [ 4209.099121] [<ffffffff8113ed90>] ? khugepaged+0x0/0x1430
> [ 4209.099780] [<ffffffff8107c236>] ? kthread+0x96/0xa0
> [ 4209.100452] [<ffffffff8100dda4>] ? kernel_thread_helper+0x4/0x10
> [ 4209.101214] [<ffffffff8107c1a0>] ? kthread+0x0/0xa0
> [ 4209.101842] [<ffffffff8100dda0>] ? kernel_thread_helper+0x0/0x10
>
>
> Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>

Acked-by: Daisuke Nishimura <[email protected]>

Thanks,
Daisuke Nishimura.

2011-02-09 09:51:32

by Johannes Weiner

Subject: Re: [PATCH][BUGFIX] memcg: fix leak of accounting at failure path of hugepage collapsing.

On Wed, Feb 09, 2011 at 04:23:24PM +0900, KAMEZAWA Hiroyuki wrote:
> There was a big bug. Anyway, thank you for adding new bad_page for memcg.

That check is really awesome :-)

> mem_cgroup_uncharge_page() should be called in all failure case
> after mem_cgroup_charge_newpage() is called in
> huge_memory.c::collapse_huge_page()
>
> [ 4209.076861] BUG: Bad page state in process khugepaged pfn:1e9800
> [ 4209.077601] page:ffffea0006b14000 count:0 mapcount:0 mapping: (null) index:0x2800
> [ 4209.078674] page flags: 0x40000000004000(head)
> [ 4209.079294] pc:ffff880214a30000 pc->flags:2146246697418756 pc->mem_cgroup:ffffc9000177a000
> [ 4209.082177] (/A)
> [ 4209.082500] Pid: 31, comm: khugepaged Not tainted 2.6.38-rc3-mm1 #1
> [ 4209.083412] Call Trace:
> [ 4209.083678] [<ffffffff810f4454>] ? bad_page+0xe4/0x140
> [ 4209.084240] [<ffffffff810f53e6>] ? free_pages_prepare+0xd6/0x120
> [ 4209.084837] [<ffffffff8155621d>] ? rwsem_down_failed_common+0xbd/0x150
> [ 4209.085509] [<ffffffff810f5462>] ? __free_pages_ok+0x32/0xe0
> [ 4209.086110] [<ffffffff810f552b>] ? free_compound_page+0x1b/0x20
> [ 4209.086699] [<ffffffff810fad6c>] ? __put_compound_page+0x1c/0x30
> [ 4209.087333] [<ffffffff810fae1d>] ? put_compound_page+0x4d/0x200
> [ 4209.087935] [<ffffffff810fb015>] ? put_page+0x45/0x50
> [ 4209.097361] [<ffffffff8113f779>] ? khugepaged+0x9e9/0x1430
> [ 4209.098364] [<ffffffff8107c870>] ? autoremove_wake_function+0x0/0x40
> [ 4209.099121] [<ffffffff8113ed90>] ? khugepaged+0x0/0x1430
> [ 4209.099780] [<ffffffff8107c236>] ? kthread+0x96/0xa0
> [ 4209.100452] [<ffffffff8100dda4>] ? kernel_thread_helper+0x4/0x10
> [ 4209.101214] [<ffffffff8107c1a0>] ? kthread+0x0/0xa0
> [ 4209.101842] [<ffffffff8100dda0>] ? kernel_thread_helper+0x0/0x10
>
>
> Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>

Reviewed-by: Johannes Weiner <[email protected]>

Thanks for debugging this.

2011-02-09 20:08:27

by Andrea Arcangeli

Subject: Re: [mmotm] BUG: Bad page state in process khugepaged ?

On Wed, Feb 09, 2011 at 03:52:46PM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 9 Feb 2011 15:50:01 +0900
> Daisuke Nishimura <[email protected]> wrote:
>
> > >
> > > In hex, pc->flags was 7A00000000004 and this means PCG_USED bit is set.
> > > This implies page_remove_rmap() may not be called but ->mapping is NULL. Hmm?
> > > (7A is encoding of section number.)
> > >
> > Sigh.. it seems another freed-but-not-uncharged problem..
> >
>
> Ah, ok, this is maybe caused by this. I'm sorry that I missed this.
> ==
> static inline int free_pages_check(struct page *page)
> {
> if (unlikely(page_mapcount(page) |
> (page->mapping != NULL) |
> (atomic_read(&page->_count) != 0) |
> (page->flags & PAGE_FLAGS_CHECK_AT_FREE) |
> (mem_cgroup_bad_page_check(page)))) { <==========(*)
> bad_page(page);
> return 1;
> ==
>
> Then, ok, this is a memcgroup and hugepage issue.
>
> I'll look into.

Yes, the rest of the info on the page looked ok and shouldn't have
triggered a bad_page call. Thanks so much for looking into it.

Andrea

2011-02-10 02:49:21

by Minchan Kim

Subject: Re: [PATCH][BUGFIX] memcg: fix leak of accounting at failure path of hugepage collapsing.

On Wed, Feb 9, 2011 at 4:23 PM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
> There was a big bug. Anyway, thank you for adding new bad_page for memcg.
> ==
>
> mem_cgroup_uncharge_page() should be called in all failure case
> after mem_cgroup_charge_newpage() is called in
> huge_memory.c::collapse_huge_page()
>
>  [ 4209.076861] BUG: Bad page state in process khugepaged  pfn:1e9800
>  [ 4209.077601] page:ffffea0006b14000 count:0 mapcount:0 mapping:          (null) index:0x2800
>  [ 4209.078674] page flags: 0x40000000004000(head)
>  [ 4209.079294] pc:ffff880214a30000 pc->flags:2146246697418756 pc->mem_cgroup:ffffc9000177a000
>  [ 4209.082177] (/A)
>  [ 4209.082500] Pid: 31, comm: khugepaged Not tainted 2.6.38-rc3-mm1 #1
>  [ 4209.083412] Call Trace:
>  [ 4209.083678]  [<ffffffff810f4454>] ? bad_page+0xe4/0x140
>  [ 4209.084240]  [<ffffffff810f53e6>] ? free_pages_prepare+0xd6/0x120
>  [ 4209.084837]  [<ffffffff8155621d>] ? rwsem_down_failed_common+0xbd/0x150
>  [ 4209.085509]  [<ffffffff810f5462>] ? __free_pages_ok+0x32/0xe0
>  [ 4209.086110]  [<ffffffff810f552b>] ? free_compound_page+0x1b/0x20
>  [ 4209.086699]  [<ffffffff810fad6c>] ? __put_compound_page+0x1c/0x30
>  [ 4209.087333]  [<ffffffff810fae1d>] ? put_compound_page+0x4d/0x200
>  [ 4209.087935]  [<ffffffff810fb015>] ? put_page+0x45/0x50
>  [ 4209.097361]  [<ffffffff8113f779>] ? khugepaged+0x9e9/0x1430
>  [ 4209.098364]  [<ffffffff8107c870>] ? autoremove_wake_function+0x0/0x40
>  [ 4209.099121]  [<ffffffff8113ed90>] ? khugepaged+0x0/0x1430
>  [ 4209.099780]  [<ffffffff8107c236>] ? kthread+0x96/0xa0
>  [ 4209.100452]  [<ffffffff8100dda4>] ? kernel_thread_helper+0x4/0x10
>  [ 4209.101214]  [<ffffffff8107c1a0>] ? kthread+0x0/0xa0
>  [ 4209.101842]  [<ffffffff8100dda0>] ? kernel_thread_helper+0x0/0x10
>
>
> Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>

--
Kind regards,
Minchan Kim

2011-02-11 07:03:05

by Hugh Dickins

Subject: Re: [mmotm] BUG: Bad page state in process khugepaged ?

On Wed, 9 Feb 2011, Andrea Arcangeli wrote:
> On Wed, Feb 09, 2011 at 03:52:46PM +0900, KAMEZAWA Hiroyuki wrote:
> > On Wed, 9 Feb 2011 15:50:01 +0900
> > Daisuke Nishimura <[email protected]> wrote:
> >
> > > >
> > > > In hex, pc->flags was 7A00000000004 and this means PCG_USED bit is set.
> > > > This implies page_remove_rmap() may not be called but ->mapping is NULL. Hmm?
> > > > (7A is encoding of section number.)
> > > >
> > > Sigh.. it seems another freed-but-not-uncharged problem..
> > >
> >
> > Ah, ok, this is maybe caused by this. I'm sorry that I missed this.
> > ==
> > static inline int free_pages_check(struct page *page)
> > {
> > if (unlikely(page_mapcount(page) |
> > (page->mapping != NULL) |
> > (atomic_read(&page->_count) != 0) |
> > (page->flags & PAGE_FLAGS_CHECK_AT_FREE) |
> > (mem_cgroup_bad_page_check(page)))) { <==========(*)
> > bad_page(page);
> > return 1;
> > ==
> >
> > Then, ok, this is a memcgroup and hugepage issue.
> >
> > I'll look into.
>
> Yes, the rest of the info on the page looked ok and shouldn't have
> triggered a bad_page call. Thanks so much for looking into it.

There is a separate little issue here, Andrea.

Although we went to some trouble for bad_page() to take the page out
of circulation yet let the system continue, your VM_BUG_ON(!PageBuddy)
inside __ClearPageBuddy(page), from two callsites in bad_page(), is
turning it into a fatal error when CONFIG_DEBUG_VM is enabled.

You could argue that only MM developers switch on CONFIG_DEBUG_VM=y, and
they would like bad_page() to be fatal; maybe, but if so we should do that
as an intentional patch, rather than as an unexpected side-effect ;)

I noticed this a few days ago, but hadn't quite decided whether just to
remove the VM_BUG_ON, or move it to __ClearPageBuddy's third callsite,
or... doesn't matter much.

I do also wonder if PageBuddy would better be _mapcount -something else:
if we've got a miscounted page (itself unlikely of course), there's a
chance that its _mapcount will be further decremented after it has been
freed: whereupon it will go from -1 to -2, PageBuddy at present. The
special avoidance of PageBuddy being that it can pull a whole block of
pages into misuse if it's mistaken.

Hugh

2011-02-11 10:49:20

by Andrea Arcangeli

Subject: Re: [mmotm] BUG: Bad page state in process khugepaged ?

On Thu, Feb 10, 2011 at 11:02:50PM -0800, Hugh Dickins wrote:
> There is a separate little issue here, Andrea.
>
> Although we went to some trouble for bad_page() to take the page out
> of circulation yet let the system continue, your VM_BUG_ON(!PageBuddy)
> inside __ClearPageBuddy(page), from two callsites in bad_page(), is
> turning it into a fatal error when CONFIG_DEBUG_VM.

I see what you mean. Of course it is only a problem after bad_page has
already triggered... but then it triggers a BUG_ON instead of only a
bad_page.

> You could that only MM developers switch CONFIG_DEBUG_VM=y, and they
> would like bad_page() to be fatal; maybe, but if so we should do that
> as an intentional patch, rather than as an unexpected side-effect ;)

Fedora kernels are built with CONFIG_DEBUG_VM, all my kernels run
with CONFIG_DEBUG_VM too, so we want it to be as "production" as
possible, and we don't want DEBUG_VM to decrease any reliability (only
to increase it of course).

> I noticed this a few days ago, but hadn't quite decided whether just to
> remove the VM_BUG_ON, or move it to __ClearPageBuddy's third callsite,
> or... doesn't matter much.
>
> I do also wonder if PageBuddy would better be _mapcount -something else:
> if we've got a miscounted page (itself unlikely of course), there's a
> chance that its _mapcount will be further decremented after it has been
> freed: whereupon it will go from -1 to -2, PageBuddy at present. The
> special reason to avoid a false PageBuddy is that it can pull a whole
> block of pages into misuse if it's mistaken.

Agreed. What about the below?

=====
Subject: mm: PageBuddy cleanups

From: Andrea Arcangeli <[email protected]>

bad_page could VM_BUG_ON(!PageBuddy(page)) inside __ClearPageBuddy(). I prefer
to keep the VM_BUG_ON for safety and to add an if to solve it.

Change the _mapcount value indicating PageBuddy from -2 to -1024 for more
robustness against page_mapcount() underflows.

Signed-off-by: Andrea Arcangeli <[email protected]>
Reported-by: Hugh Dickins <[email protected]>
---

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f6385fc..fa16ba0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -402,16 +402,22 @@ static inline void init_page_count(struct page *page)
 /*
  * PageBuddy() indicate that the page is free and in the buddy system
  * (see mm/page_alloc.c).
+ *
+ * PAGE_BUDDY_MAPCOUNT_VALUE must be <= -2 but better not too close to
+ * -2 so that an underflow of the page_mapcount() won't be mistaken
+ * for a genuine PAGE_BUDDY_MAPCOUNT_VALUE.
  */
+#define PAGE_BUDDY_MAPCOUNT_VALUE (-1024*1024)
+
 static inline int PageBuddy(struct page *page)
 {
-	return atomic_read(&page->_mapcount) == -2;
+	return atomic_read(&page->_mapcount) == PAGE_BUDDY_MAPCOUNT_VALUE;
 }

 static inline void __SetPageBuddy(struct page *page)
 {
 	VM_BUG_ON(atomic_read(&page->_mapcount) != -1);
-	atomic_set(&page->_mapcount, -2);
+	atomic_set(&page->_mapcount, PAGE_BUDDY_MAPCOUNT_VALUE);
 }

 static inline void __ClearPageBuddy(struct page *page)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a873e61..8aac134 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -286,7 +286,9 @@ static void bad_page(struct page *page)

 	/* Don't complain about poisoned pages */
 	if (PageHWPoison(page)) {
-		__ClearPageBuddy(page);
+		/* __ClearPageBuddy VM_BUG_ON(!PageBuddy(page)) */
+		if (PageBuddy(page))
+			__ClearPageBuddy(page);
 		return;
 	}

@@ -317,7 +319,8 @@ static void bad_page(struct page *page)
 	dump_stack();
 out:
 	/* Leave bad fields for debug, except PageBuddy could make trouble */
-	__ClearPageBuddy(page);
+	if (PageBuddy(page)) /* __ClearPageBuddy VM_BUG_ON(!PageBuddy(page)) */
+		__ClearPageBuddy(page);
 	add_taint(TAINT_BAD_PAGE);
 }
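
The effect of the two bad_page() hunks can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: struct model_page and the model_* functions are hypothetical, and a counter stands in for VM_BUG_ON firing (under CONFIG_DEBUG_VM the real check would stop the kernel rather than count). The model also assumes that clearing resets the count to -1.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_BUDDY_VALUE (-1024 * 1024)   /* mirrors the value in the patch */

struct model_page {
    int mapcount;
};

/* Counts how often the strict check would have fired (stand-in for VM_BUG_ON). */
static int model_bug_count;

static bool model_page_buddy(const struct model_page *p)
{
    return p->mapcount == MODEL_BUDDY_VALUE;
}

/* Strict clear: complains if the page was not marked as a buddy page. */
static void model_clear_page_buddy(struct model_page *p)
{
    if (!model_page_buddy(p))
        model_bug_count++;
    p->mapcount = -1;   /* assumption of this model: back to "unmapped" */
}

/* Error path before the patch: clears unconditionally. */
static void model_bad_page_unguarded(struct model_page *p)
{
    model_clear_page_buddy(p);
}

/* Error path after the patch: only clears pages that are marked buddy. */
static void model_bad_page_guarded(struct model_page *p)
{
    if (model_page_buddy(p))
        model_clear_page_buddy(p);
}
```

In this model the unguarded path trips the check for any page that is bad but not marked buddy, which is the escalation from a bad_page report to a fatal error that the patch avoids; the guarded path leaves such a page's fields alone for debugging.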

2011-02-11 19:59:13

by Hugh Dickins

Subject: Re: [mmotm] BUG: Bad page state in process khugepaged ?

On Fri, 11 Feb 2011, Andrea Arcangeli wrote:
> On Thu, Feb 10, 2011 at 11:02:50PM -0800, Hugh Dickins wrote:
> > There is a separate little issue here, Andrea.
> >
> > Although we went to some trouble for bad_page() to take the page out
> > of circulation yet let the system continue, your VM_BUG_ON(!PageBuddy)
> > inside __ClearPageBuddy(page), from two callsites in bad_page(), is
> > turning it into a fatal error when CONFIG_DEBUG_VM.
>
> I see what you mean. Of course it is only a problem after bad_page has
> already triggered... but then it triggers a BUG_ON instead of only a
> bad_page.
>
> > You could argue that only MM developers switch CONFIG_DEBUG_VM=y, and they
> > would like bad_page() to be fatal; maybe, but if so we should do that
> > as an intentional patch, rather than as an unexpected side-effect ;)
>
> Fedora kernels are built with CONFIG_DEBUG_VM, all my kernels run
> with CONFIG_DEBUG_VM too, so we want it to be as "production" as
> possible, and we don't want DEBUG_VM to decrease any reliability (only
> to increase it of course).

Oh, I hadn't realized Fedora use it. I wonder if that's wise, I thought
Nick introduced it partly for the more expensive checks, and there might
be one or two of those around - those bad_range()s in page_alloc.c?

>
> > I noticed this a few days ago, but hadn't quite decided whether just to
> > remove the VM_BUG_ON, or move it to __ClearPageBuddy's third callsite,
> > or... doesn't matter much.
> >
> > I do also wonder if PageBuddy would better be _mapcount -something else:
> > if we've got a miscounted page (itself unlikely of course), there's a
> > chance that its _mapcount will be further decremented after it has been
> > freed: whereupon it will go from -1 to -2, PageBuddy at present. The
> > special reason to avoid a false PageBuddy is that it can pull a whole
> > block of pages into misuse if it's mistaken.
>
> Agreed. What about the below?
>
> =====
> Subject: mm: PageBuddy cleanups
>
> From: Andrea Arcangeli <[email protected]>
>
> bad_page could VM_BUG_ON(!PageBuddy(page)) inside __ClearPageBuddy().
> I prefer to keep the VM_BUG_ON for safety and to add an if to solve it.

Too much iffery: I ended up preferring it in rmv_page_order() myself.

>
> Change the _mapcount value indicating PageBuddy from -2 to -1024 for more
> robustness against page_mapcount() underflows.

But the patch actually says -1024*1024: either would do.

>
> Signed-off-by: Andrea Arcangeli <[email protected]>
> Reported-by: Hugh Dickins <[email protected]>
> ---
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f6385fc..fa16ba0 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -402,16 +402,22 @@ static inline void init_page_count(struct page *page)
> /*
> * PageBuddy() indicate that the page is free and in the buddy system
> * (see mm/page_alloc.c).
> + *
> + * PAGE_BUDDY_MAPCOUNT_VALUE must be <= -2 but better not too close to
> + * -2 so that an underflow of the page_mapcount() won't be mistaken
> + * for a genuine PAGE_BUDDY_MAPCOUNT_VALUE.

Yes, good to comment that, thanks.

> */
> +#define PAGE_BUDDY_MAPCOUNT_VALUE (-1024*1024)
> +
> static inline int PageBuddy(struct page *page)
> {
> - return atomic_read(&page->_mapcount) == -2;
> + return atomic_read(&page->_mapcount) == PAGE_BUDDY_MAPCOUNT_VALUE;
> }
>
> static inline void __SetPageBuddy(struct page *page)
> {
> VM_BUG_ON(atomic_read(&page->_mapcount) != -1);
> - atomic_set(&page->_mapcount, -2);
> + atomic_set(&page->_mapcount, PAGE_BUDDY_MAPCOUNT_VALUE);
> }
>
> static inline void __ClearPageBuddy(struct page *page)

Yes, that's fine, 0xfff00000 looks unlikely enough (and my
imagination for "deadbeef"-like magic is too drowsy today).

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a873e61..8aac134 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -286,7 +286,9 @@ static void bad_page(struct page *page)
>
> /* Don't complain about poisoned pages */
> if (PageHWPoison(page)) {
> - __ClearPageBuddy(page);
> + /* __ClearPageBuddy VM_BUG_ON(!PageBuddy(page)) */
> + if (PageBuddy(page))
> + __ClearPageBuddy(page);
> return;
> }
>
> @@ -317,7 +319,8 @@ static void bad_page(struct page *page)
> dump_stack();
> out:
> /* Leave bad fields for debug, except PageBuddy could make trouble */
> - __ClearPageBuddy(page);
> + if (PageBuddy(page)) /* __ClearPageBuddy VM_BUG_ON(!PageBuddy(page)) */
> + __ClearPageBuddy(page);
> add_taint(TAINT_BAD_PAGE);
> }
>

Okay I suppose: it seems rather laboured to me, I think I'd have just
moved the VM_BUG_ON into rmv_page_order() if I'd done the patch; but
since I was too lazy to do it, I'd better be grateful for yours!

Hugh
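
For comparison, the alternative Hugh mentions (moving the check out of the clear helper and into rmv_page_order()) might look roughly like the userspace model below. It is a sketch only: the model_* names are hypothetical, a counter stands in for VM_BUG_ON, and the order field is an assumed stand-in for the page's private data that rmv_page_order() resets.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_BUDDY_VALUE (-1024 * 1024)

struct model_page {
    int mapcount;
    unsigned long order;   /* stand-in for the page's private "order" field */
};

/* Counts how often the check would have fired (stand-in for VM_BUG_ON). */
static int model_bug_count;

static bool model_page_buddy(const struct model_page *p)
{
    return p->mapcount == MODEL_BUDDY_VALUE;
}

/* Unchecked clear: callable from an error path on any page. */
static void model_clear_page_buddy(struct model_page *p)
{
    p->mapcount = -1;
}

/* Allocator path: a page leaving the free lists must really be a buddy page. */
static void model_rmv_page_order(struct model_page *p)
{
    if (!model_page_buddy(p))
        model_bug_count++;
    model_clear_page_buddy(p);
    p->order = 0;
}
```

The trade-off is the one discussed in the thread: the error path needs no extra if, but any future caller of the clear helper is no longer checked.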

2011-02-11 20:24:50

by Andrea Arcangeli

Subject: Re: [mmotm] BUG: Bad page state in process khugepaged ?

On Fri, Feb 11, 2011 at 11:58:58AM -0800, Hugh Dickins wrote:
> Oh, I hadn't realized Fedora use it. I wonder if that's wise, I thought
> Nick introduced it partly for the more expensive checks, and there might
> be one or two of those around - those bad_range()s in page_alloc.c?

I doubt the more expensive checks are very measurable... benchmarks
usually run on enterprise distros. I'm sure when they enabled it, they
were aware of having to run more expensive runtime checks.

> But the patch actually says -1024*1024: either would do.

I actually increased it to -1024*1024 after writing the email ;) sorry
for the confusion.

> Yes, that's fine, 0xfff00000 looks unlikely enough (and my
> imagination for "deadbeef"-like magic is too drowsy today).

I used a negative power of two even if I doubt the compiler can make
much use of it.

> Okay I suppose: it seems rather laboured to me, I think I'd have just
> moved the VM_BUG_ON into rmv_page_order() if I'd done the patch; but
> since I was too lazy to do it, I'd better be grateful for yours!

Ok the reason I didn't move the VM_BUG_ON is to be stricter in case
there are more usages of __ClearPageBuddy in the future. I guess it's
not so important, but when I initially implemented it, it wasn't
entirely obvious it would work safely with memory hotplug, compaction
and all other bits using PageBuddy, so...
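
The arithmetic behind the 0xfff00000 remark can be checked with a short userspace sketch. The helper names are illustrative assumptions; the only facts used are that -1024*1024 is -2^20 and that a 32-bit two's complement representation is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_BUDDY_VALUE (-1024 * 1024)   /* -2^20, as in the patch */

/* Bit pattern of the sentinel when stored in a 32-bit two's complement int. */
static uint32_t model_buddy_bits(void)
{
    return (uint32_t)(int32_t)MODEL_BUDDY_VALUE;
}

/* Number of stray decrements needed to walk a count from start to sentinel. */
static int64_t model_decrements_to_reach(int32_t start, int32_t sentinel)
{
    return (int64_t)start - (int64_t)sentinel;
}
```

Starting from an unmapped page's count of -1, the old sentinel -2 is a single decrement away, whereas the new one needs 1048575 of them.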

2011-02-14 22:24:44

by Johannes Weiner

Subject: Re: [mmotm] BUG: Bad page state in process khugepaged ?

On Fri, Feb 11, 2011 at 11:49:06AM +0100, Andrea Arcangeli wrote:
> On Thu, Feb 10, 2011 at 11:02:50PM -0800, Hugh Dickins wrote:
> > There is a separate little issue here, Andrea.
> >
> > Although we went to some trouble for bad_page() to take the page out
> > of circulation yet let the system continue, your VM_BUG_ON(!PageBuddy)
> > inside __ClearPageBuddy(page), from two callsites in bad_page(), is
> > turning it into a fatal error when CONFIG_DEBUG_VM.
>
> I see what you mean. Of course it is only a problem after bad_page has
> already triggered... but then it triggers a BUG_ON instead of only a
> bad_page.
>
> > You could argue that only MM developers switch CONFIG_DEBUG_VM=y, and they
> > would like bad_page() to be fatal; maybe, but if so we should do that
> > as an intentional patch, rather than as an unexpected side-effect ;)
>
> Fedora kernels are built with CONFIG_DEBUG_VM, all my kernels run
> with CONFIG_DEBUG_VM too, so we want it to be as "production" as
> possible, and we don't want DEBUG_VM to decrease any reliability (only
> to increase it of course).

Are you sure?

$ grep DEBUG_VM /boot/config-*
/boot/config-2.6.35.10-74.fc14.x86_64:# CONFIG_DEBUG_VM is not set
/boot/config-2.6.35.6-45.fc14.x86_64:# CONFIG_DEBUG_VM is not set

Only the one from the kernel-debug package has it set on this F14.

Hannes