2009-06-19 21:48:26

by Maxim Levitsky

[permalink] [raw]
Subject: Strange oopses in 2.6.30

I see lots of following oopses in 2.6.30 and latest -git

Many different applications shows up, not just reiserfsck
Something in MM I guess, it makes me worry. But system seems to work.

Is this known?

dmesg attached.


[ 34.544040] BUG: Bad page state in process reiserfsck pfn:37d86
[ 34.544044] page:c2a34f38 flags:3650000c count:0 mapcount:0
mapping:(null) index:bffeb
[ 34.544048] Pid: 2654, comm: reiserfsck Tainted: G B
2.6.30-git #4
[ 34.544051] Call Trace:
[ 34.544055] [<c04cd26a>] ? printk+0x18/0x1e
[ 34.544059] [<c018f065>] bad_page+0xd5/0x140
[ 34.544064] [<c0190097>] free_hot_cold_page+0x1e7/0x280
[ 34.544069] [<c0193682>] ? release_pages+0x92/0x1b0
[ 34.544074] [<c0190155>] __pagevec_free+0x25/0x30
[ 34.544078] [<c0193758>] release_pages+0x168/0x1b0
[ 34.544084] [<c0193cf3>] ? lru_add_drain+0x53/0xd0
[ 34.544088] [<c01ab7d4>] free_pages_and_swap_cache+0x84/0xa0
[ 34.544093] [<c019ff5d>] unmap_vmas+0x73d/0x760
[ 34.544099] [<c016480e>] ? lock_release_non_nested+0x15e/0x270
[ 34.544104] [<c01a43b5>] exit_mmap+0xb5/0x1b0
[ 34.544109] [<c0138666>] mmput+0x36/0xc0
[ 34.544113] [<c013c874>] exit_mm+0xe4/0x120
[ 34.544117] [<c0175539>] ? acct_collect+0x139/0x180
[ 34.544122] [<c013e889>] do_exit+0x6b9/0x720
[ 34.544142] [<c01bcac2>] ? vfs_write+0x122/0x180
[ 34.544146] [<c01bbda0>] ? do_sync_write+0x0/0x110
[ 34.544151] [<c013e920>] do_group_exit+0x30/0x90
[ 34.544156] [<c013e993>] sys_exit_group+0x13/0x20
[ 34.544161] [<c01039e8>] sysenter_do_call+0x12/0x3c
[ 34.544180] BUG: Bad page state in process reiserfsck pfn:37d91
[ 34.544184] page:c2a35174 flags:3650000c count:0 mapcount:0
mapping:(null) index:bfff6
[ 34.544188] Pid: 2654, comm: reiserfsck Tainted: G B
2.6.30-git #4


Attachments:
dmesg.log.gz (17.07 kB)

2009-06-20 14:13:41

by Maxim Levitsky

[permalink] [raw]
Subject: BUG] Strange oopses in 2.6.30

On Sat, 2009-06-20 at 00:48 +0300, Maxim Levitsky wrote:
> I see lots of following oopses in 2.6.30 and latest -git
>
> Many different applications shows up, not just reiserfsck
> Something in MM I guess, it makes me worry. But system seems to work.
>
> Is this known?
>
> dmesg attached.
>
>
> [ 34.544040] BUG: Bad page state in process reiserfsck pfn:37d86
> [ 34.544044] page:c2a34f38 flags:3650000c count:0 mapcount:0
> mapping:(null) index:bffeb
> [ 34.544048] Pid: 2654, comm: reiserfsck Tainted: G B
> 2.6.30-git #4
> [ 34.544051] Call Trace:
> [ 34.544055] [<c04cd26a>] ? printk+0x18/0x1e
> [ 34.544059] [<c018f065>] bad_page+0xd5/0x140
> [ 34.544064] [<c0190097>] free_hot_cold_page+0x1e7/0x280
> [ 34.544069] [<c0193682>] ? release_pages+0x92/0x1b0
> [ 34.544074] [<c0190155>] __pagevec_free+0x25/0x30
> [ 34.544078] [<c0193758>] release_pages+0x168/0x1b0
> [ 34.544084] [<c0193cf3>] ? lru_add_drain+0x53/0xd0
> [ 34.544088] [<c01ab7d4>] free_pages_and_swap_cache+0x84/0xa0
> [ 34.544093] [<c019ff5d>] unmap_vmas+0x73d/0x760
> [ 34.544099] [<c016480e>] ? lock_release_non_nested+0x15e/0x270
> [ 34.544104] [<c01a43b5>] exit_mmap+0xb5/0x1b0
> [ 34.544109] [<c0138666>] mmput+0x36/0xc0
> [ 34.544113] [<c013c874>] exit_mm+0xe4/0x120
> [ 34.544117] [<c0175539>] ? acct_collect+0x139/0x180
> [ 34.544122] [<c013e889>] do_exit+0x6b9/0x720
> [ 34.544142] [<c01bcac2>] ? vfs_write+0x122/0x180
> [ 34.544146] [<c01bbda0>] ? do_sync_write+0x0/0x110
> [ 34.544151] [<c013e920>] do_group_exit+0x30/0x90
> [ 34.544156] [<c013e993>] sys_exit_group+0x13/0x20
> [ 34.544161] [<c01039e8>] sysenter_do_call+0x12/0x3c
> [ 34.544180] BUG: Bad page state in process reiserfsck pfn:37d91
> [ 34.544184] page:c2a35174 flags:3650000c count:0 mapcount:0
> mapping:(null) index:bfff6
> [ 34.544188] Pid: 2654, comm: reiserfsck Tainted: G B
> 2.6.30-git #4
>

This really worries me

2009-06-20 15:28:11

by Jiri Slaby

[permalink] [raw]
Subject: BUG: Bad page state [was: Strange oopses in 2.6.30]

On 06/20/2009 04:08 PM, Maxim Levitsky wrote:
> On Sat, 2009-06-20 at 00:48 +0300, Maxim Levitsky wrote:
>> I see lots of following oopses in 2.6.30 and latest -git
>>
>> Many different applications shows up, not just reiserfsck
>> Something in MM I guess, it makes me worry. But system seems to work.
>>
>> Is this known?
>>
>> dmesg attached.
>>
>>
>> [ 34.544040] BUG: Bad page state in process reiserfsck pfn:37d86
>> [ 34.544044] page:c2a34f38 flags:3650000c count:0 mapcount:0
>> mapping:(null) index:bffeb
>> [ 34.544048] Pid: 2654, comm: reiserfsck Tainted: G B
>> 2.6.30-git #4
>> [ 34.544051] Call Trace:
>> [ 34.544055] [<c04cd26a>] ? printk+0x18/0x1e
>> [ 34.544059] [<c018f065>] bad_page+0xd5/0x140
>> [ 34.544064] [<c0190097>] free_hot_cold_page+0x1e7/0x280
>> [ 34.544069] [<c0193682>] ? release_pages+0x92/0x1b0
>> [ 34.544074] [<c0190155>] __pagevec_free+0x25/0x30
>> [ 34.544078] [<c0193758>] release_pages+0x168/0x1b0
>> [ 34.544084] [<c0193cf3>] ? lru_add_drain+0x53/0xd0
>> [ 34.544088] [<c01ab7d4>] free_pages_and_swap_cache+0x84/0xa0
>> [ 34.544093] [<c019ff5d>] unmap_vmas+0x73d/0x760
>> [ 34.544099] [<c016480e>] ? lock_release_non_nested+0x15e/0x270
>> [ 34.544104] [<c01a43b5>] exit_mmap+0xb5/0x1b0
>> [ 34.544109] [<c0138666>] mmput+0x36/0xc0
>> [ 34.544113] [<c013c874>] exit_mm+0xe4/0x120
>> [ 34.544117] [<c0175539>] ? acct_collect+0x139/0x180
>> [ 34.544122] [<c013e889>] do_exit+0x6b9/0x720
>> [ 34.544142] [<c01bcac2>] ? vfs_write+0x122/0x180
>> [ 34.544146] [<c01bbda0>] ? do_sync_write+0x0/0x110
>> [ 34.544151] [<c013e920>] do_group_exit+0x30/0x90
>> [ 34.544156] [<c013e993>] sys_exit_group+0x13/0x20
>> [ 34.544161] [<c01039e8>] sysenter_do_call+0x12/0x3c
>> [ 34.544180] BUG: Bad page state in process reiserfsck pfn:37d91
>> [ 34.544184] page:c2a35174 flags:3650000c count:0 mapcount:0
>> mapping:(null) index:bfff6
>> [ 34.544188] Pid: 2654, comm: reiserfsck Tainted: G B
>> 2.6.30-git #4

I got similar on 64-bit mmotm 2009-06-12-12-20. You seem not to use
2.6.30, but some git post 2.6.30.

Flags are:
0000000000400000 -- __PG_MLOCKED
800000000050000c -- my page flags
3650000c -- Maxim's page flags
0000000000693ce1 -- my PAGE_FLAGS_CHECK_AT_FREE

In my .config, there is
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_HAVE_MLOCKED_PAGE_BIT=y

The traces:
BUG: Bad page state in process gpg-agent pfn:1c83c9
page:ffffea00063cd3f8 flags:800000000050000c count:0 mapcount:0
mapping:(null) i
ndex:7feda9eae
Pid: 3859, comm: gpg-agent Not tainted 2.6.30-mm1_64 #641
Call Trace:
[<ffffffff8108fab2>] bad_page+0xd2/0x130
[<ffffffff81091a57>] free_hot_cold_page+0x47/0x200
[<ffffffff81091c48>] __pagevec_free+0x38/0x50
[<ffffffff81094eac>] release_pages+0x20c/0x240
[<ffffffff810afc6f>] free_pages_and_swap_cache+0xaf/0xd0
[<ffffffff8115d53a>] ? cpumask_any_but+0x2a/0x40
[<ffffffff810a8520>] unmap_region+0x150/0x170
[<ffffffff810a87b4>] do_munmap+0x274/0x370
[<ffffffff810a88fc>] sys_munmap+0x4c/0x70
[<ffffffff8100beab>] system_call_fastpath+0x16/0x1b
Disabling lock debugging due to kernel taint
BUG: Bad page state in process gpg-agent pfn:1c83c8
page:ffffea00063cd3c0 flags:800000000050000c count:0 mapcount:0
mapping:(null) i
ndex:7feda9ead
Pid: 3859, comm: gpg-agent Tainted: G B 2.6.30-mm1_64 #641
Call Trace:
[<ffffffff8108fab2>] bad_page+0xd2/0x130
[<ffffffff81091a57>] free_hot_cold_page+0x47/0x200
[<ffffffff81091c48>] __pagevec_free+0x38/0x50
[<ffffffff81094eac>] release_pages+0x20c/0x240
[<ffffffff810afc6f>] free_pages_and_swap_cache+0xaf/0xd0
[<ffffffff8115d53a>] ? cpumask_any_but+0x2a/0x40
[<ffffffff810a8520>] unmap_region+0x150/0x170
[<ffffffff810a87b4>] do_munmap+0x274/0x370
[<ffffffff810a88fc>] sys_munmap+0x4c/0x70
[<ffffffff8100beab>] system_call_fastpath+0x16/0x1b
BUG: Bad page state in process gpg-agent pfn:1c800f
page:ffffea00063c0348 flags:800000000050000c count:0 mapcount:0
mapping:(null) i
ndex:7feda9eac
Pid: 3859, comm: gpg-agent Tainted: G B 2.6.30-mm1_64 #641
Call Trace:
[<ffffffff8108fab2>] bad_page+0xd2/0x130
[<ffffffff81091a57>] free_hot_cold_page+0x47/0x200
[<ffffffff81091c48>] __pagevec_free+0x38/0x50
[<ffffffff81094eac>] release_pages+0x20c/0x240
[<ffffffff810afc6f>] free_pages_and_swap_cache+0xaf/0xd0
[<ffffffff8115d53a>] ? cpumask_any_but+0x2a/0x40
[<ffffffff810a8520>] unmap_region+0x150/0x170
[<ffffffff810a87b4>] do_munmap+0x274/0x370
[<ffffffff810a88fc>] sys_munmap+0x4c/0x70
[<ffffffff8100beab>] system_call_fastpath+0x16/0x1b

2009-06-20 15:41:20

by Hugh Dickins

[permalink] [raw]
Subject: Re: BUG] Strange oopses in 2.6.30

On Sat, 20 Jun 2009, Maxim Levitsky wrote:
> On Sat, 2009-06-20 at 00:48 +0300, Maxim Levitsky wrote:
> > I see lots of following oopses in 2.6.30 and latest -git
> >
> > Many different applications shows up, not just reiserfsck
> > Something in MM I guess, it makes me worry. But system seems to work.
> >
> > Is this known?

I think so...

> >
> > dmesg attached.
> >
> >
> > [ 34.544040] BUG: Bad page state in process reiserfsck pfn:37d86
> > [ 34.544044] page:c2a34f38 flags:3650000c count:0 mapcount:0
> > mapping:(null) index:bffeb
> > [ 34.544048] Pid: 2654, comm: reiserfsck Tainted: G B
> > 2.6.30-git #4
> > [ 34.544051] Call Trace:
> > [ 34.544055] [<c04cd26a>] ? printk+0x18/0x1e
> > [ 34.544059] [<c018f065>] bad_page+0xd5/0x140
> > [ 34.544064] [<c0190097>] free_hot_cold_page+0x1e7/0x280
> > [ 34.544069] [<c0193682>] ? release_pages+0x92/0x1b0
> > [ 34.544074] [<c0190155>] __pagevec_free+0x25/0x30
> > [ 34.544078] [<c0193758>] release_pages+0x168/0x1b0
> > [ 34.544084] [<c0193cf3>] ? lru_add_drain+0x53/0xd0
> > [ 34.544088] [<c01ab7d4>] free_pages_and_swap_cache+0x84/0xa0
> > [ 34.544093] [<c019ff5d>] unmap_vmas+0x73d/0x760
> > [ 34.544099] [<c016480e>] ? lock_release_non_nested+0x15e/0x270
> > [ 34.544104] [<c01a43b5>] exit_mmap+0xb5/0x1b0
> > [ 34.544109] [<c0138666>] mmput+0x36/0xc0
> > [ 34.544113] [<c013c874>] exit_mm+0xe4/0x120
> > [ 34.544117] [<c0175539>] ? acct_collect+0x139/0x180
> > [ 34.544122] [<c013e889>] do_exit+0x6b9/0x720
> > [ 34.544142] [<c01bcac2>] ? vfs_write+0x122/0x180
> > [ 34.544146] [<c01bbda0>] ? do_sync_write+0x0/0x110
> > [ 34.544151] [<c013e920>] do_group_exit+0x30/0x90
> > [ 34.544156] [<c013e993>] sys_exit_group+0x13/0x20
> > [ 34.544161] [<c01039e8>] sysenter_do_call+0x12/0x3c
> > [ 34.544180] BUG: Bad page state in process reiserfsck pfn:37d91
> > [ 34.544184] page:c2a35174 flags:3650000c count:0 mapcount:0
> > mapping:(null) index:bfff6
> > [ 34.544188] Pid: 2654, comm: reiserfsck Tainted: G B
> > 2.6.30-git #4
> >
>
> This really worries me

I hope it's fixed by this patch Hannes posted yesterday...

>From [email protected] Fri Jun 19 19:04:49 2009
Date: Fri, 19 Jun 2009 19:45:02 +0200
From: Johannes Weiner <[email protected]>
To: Peter Chubb <[email protected]>
Cc: <[email protected]>, <[email protected]>, <[email protected]>
Subject: Re: [BUG] Bad page flags when process using mlock()ed memory exits

On Fri, Jun 19, 2009 at 02:11:21PM +1000, Peter Chubb wrote:
>
> In recent kernels I've been seeing many mesages of the form:
>
> BUG: Bad page state in process reiserfsck pfn:79c58
> page:c3d03b00 flags:8050000c count:0 mapcount:0 mapping:(null) index:8095
> Pid: 3927, comm: reiserfsck Not tainted 2.6.30-test-05456-gda456f1 #60
> Call Trace:
> [<c134a67c>] ? printk+0xf/0x13
> [<c10774dc>] bad_page+0xc9/0xe2
> [<c1078041>] free_hot_cold_page+0x5c/0x204
> [<c1078206>] __pagevec_free+0x1d/0x25
> [<c107ac3e>] release_pages+0x14e/0x18e)
> [<c108ef8a>] free_pages_and_swap_cache+0x69/0x82
> [<c1089458>] exit_mmap+0xf6/0x11f
> [<c102afcd>] mmput+0x39/0xaf
> [<c102e534>] exit_mm+0xe5/0xed
> [<c102fa66>] do_exit+0x13f/0x578
> [<c102fefd>] do_group_exit+0x5e/0x85
> [<c102ff37>] sys_exit_group+0x13/0x17
> [<c10031ef>] sysenter_do_call+0x12/0x3c
> Disabling lock debugging due to kernel taint
>
> This appears to have been introduced by patch
> da456f14d2f2d7350f2b9440af79c85a34c7eed5
> page allocator: do not disable interrupts in free_page_mlock()
>
> That patch removed the free_page_mlock() from free_pages_check(), so
> if free_hot_cold_page() is called on an Mlocked page (e.g., if a
> process that used mlock() calls exit()) free_pages_check() will always
> barf, whereas before it would just unlock the page.

I prepared a fix, thanks for chasing it down.

Mel, to keep this simple I just used the atomic test-clear, but if I
am not mistaken we should not need any atomicity here, so we could
probably add a __TestClearPage version and use this instead...?

---
>From 493b74e8615db4e3323b5b169b0b8265dfd7c3f4 Mon Sep 17 00:00:00 2001
From: Johannes Weiner <[email protected]>
Date: Fri, 19 Jun 2009 19:30:56 +0200
Subject: [patch] mm: page_alloc: clear PG_locked before checking flags on free

da456f1 "page allocator: do not disable interrupts in free_page_mlock()" moved
the PG_mlocked clearing after the flag sanity checking which makes mlocked
pages always trigger 'bad page'. Fix this by clearing the bit up front.

Reported-by: Peter Chubb <[email protected]>
Debugged-by: Peter Chubb <[email protected]>
Signed-off-by: Johannes Weiner <[email protected]>
Cc: Mel Gorman <[email protected]>
---
mm/page_alloc.c | 9 ++++-----
1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6f0753f..30d5093 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -488,7 +488,6 @@ static inline void __free_one_page(struct page *page,
*/
static inline void free_page_mlock(struct page *page)
{
- __ClearPageMlocked(page);
__dec_zone_page_state(page, NR_MLOCK);
__count_vm_event(UNEVICTABLE_MLOCKFREED);
}
@@ -558,7 +557,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
unsigned long flags;
int i;
int bad = 0;
- int clearMlocked = PageMlocked(page);
+ int wasMlocked = TestClearPageMlocked(page);

kmemcheck_free_shadow(page, order);

@@ -576,7 +575,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
kernel_map_pages(page, 1 << order, 0);

local_irq_save(flags);
- if (unlikely(clearMlocked))
+ if (unlikely(wasMlocked))
free_page_mlock(page);
__count_vm_events(PGFREE, 1 << order);
free_one_page(page_zone(page), page, order,
@@ -1022,7 +1021,7 @@ static void free_hot_cold_page(struct page *page, int cold)
struct zone *zone = page_zone(page);
struct per_cpu_pages *pcp;
unsigned long flags;
- int clearMlocked = PageMlocked(page);
+ int wasMlocked = TestClearPageMlocked(page);

kmemcheck_free_shadow(page, 0);

@@ -1041,7 +1040,7 @@ static void free_hot_cold_page(struct page *page, int cold)
pcp = &zone_pcp(zone, get_cpu())->pcp;
set_page_private(page, get_pageblock_migratetype(page));
local_irq_save(flags);
- if (unlikely(clearMlocked))
+ if (unlikely(wasMlocked))
free_page_mlock(page);
__count_vm_event(PGFREE);

--
1.6.3

2009-06-20 15:45:10

by Maxim Levitsky

[permalink] [raw]
Subject: Re: BUG: Bad page state [was: Strange oopses in 2.6.30]

On Sat, 2009-06-20 at 17:27 +0200, Jiri Slaby wrote:
> On 06/20/2009 04:08 PM, Maxim Levitsky wrote:
> > On Sat, 2009-06-20 at 00:48 +0300, Maxim Levitsky wrote:
> >> I see lots of following oopses in 2.6.30 and latest -git
> >>
> >> Many different applications shows up, not just reiserfsck
> >> Something in MM I guess, it makes me worry. But system seems to work.
> >>
> >> Is this known?
> >>
> >> dmesg attached.
> >>
> >>
> >> [ 34.544040] BUG: Bad page state in process reiserfsck pfn:37d86
> >> [ 34.544044] page:c2a34f38 flags:3650000c count:0 mapcount:0
> >> mapping:(null) index:bffeb
> >> [ 34.544048] Pid: 2654, comm: reiserfsck Tainted: G B
> >> 2.6.30-git #4
> >> [ 34.544051] Call Trace:
> >> [ 34.544055] [<c04cd26a>] ? printk+0x18/0x1e
> >> [ 34.544059] [<c018f065>] bad_page+0xd5/0x140
> >> [ 34.544064] [<c0190097>] free_hot_cold_page+0x1e7/0x280
> >> [ 34.544069] [<c0193682>] ? release_pages+0x92/0x1b0
> >> [ 34.544074] [<c0190155>] __pagevec_free+0x25/0x30
> >> [ 34.544078] [<c0193758>] release_pages+0x168/0x1b0
> >> [ 34.544084] [<c0193cf3>] ? lru_add_drain+0x53/0xd0
> >> [ 34.544088] [<c01ab7d4>] free_pages_and_swap_cache+0x84/0xa0
> >> [ 34.544093] [<c019ff5d>] unmap_vmas+0x73d/0x760
> >> [ 34.544099] [<c016480e>] ? lock_release_non_nested+0x15e/0x270
> >> [ 34.544104] [<c01a43b5>] exit_mmap+0xb5/0x1b0
> >> [ 34.544109] [<c0138666>] mmput+0x36/0xc0
> >> [ 34.544113] [<c013c874>] exit_mm+0xe4/0x120
> >> [ 34.544117] [<c0175539>] ? acct_collect+0x139/0x180
> >> [ 34.544122] [<c013e889>] do_exit+0x6b9/0x720
> >> [ 34.544142] [<c01bcac2>] ? vfs_write+0x122/0x180
> >> [ 34.544146] [<c01bbda0>] ? do_sync_write+0x0/0x110
> >> [ 34.544151] [<c013e920>] do_group_exit+0x30/0x90
> >> [ 34.544156] [<c013e993>] sys_exit_group+0x13/0x20
> >> [ 34.544161] [<c01039e8>] sysenter_do_call+0x12/0x3c
> >> [ 34.544180] BUG: Bad page state in process reiserfsck pfn:37d91
> >> [ 34.544184] page:c2a35174 flags:3650000c count:0 mapcount:0
> >> mapping:(null) index:bfff6
> >> [ 34.544188] Pid: 2654, comm: reiserfsck Tainted: G B
> >> 2.6.30-git #4
>
> I got similar on 64-bit mmotm 2009-06-12-12-20. You seem not to use
> 2.6.30, but some git post 2.6.30.
Yes, yesterday git
I tried 2.6.30 too, but it gives same warnings.



>
> Flags are:
> 0000000000400000 -- __PG_MLOCKED
> 800000000050000c -- my page flags
> 3650000c -- Maxim's page flags
> 0000000000693ce1 -- my PAGE_FLAGS_CHECK_AT_FREE
>
> In my .config, there is
> CONFIG_PAGEFLAGS_EXTENDED=y
> CONFIG_HAVE_MLOCKED_PAGE_BIT=y
>
> The traces:
> BUG: Bad page state in process gpg-agent pfn:1c83c9
> page:ffffea00063cd3f8 flags:800000000050000c count:0 mapcount:0
> mapping:(null) i
> ndex:7feda9eae
> Pid: 3859, comm: gpg-agent Not tainted 2.6.30-mm1_64 #641
> Call Trace:
> [<ffffffff8108fab2>] bad_page+0xd2/0x130
> [<ffffffff81091a57>] free_hot_cold_page+0x47/0x200
> [<ffffffff81091c48>] __pagevec_free+0x38/0x50
> [<ffffffff81094eac>] release_pages+0x20c/0x240
> [<ffffffff810afc6f>] free_pages_and_swap_cache+0xaf/0xd0
> [<ffffffff8115d53a>] ? cpumask_any_but+0x2a/0x40
> [<ffffffff810a8520>] unmap_region+0x150/0x170
> [<ffffffff810a87b4>] do_munmap+0x274/0x370
> [<ffffffff810a88fc>] sys_munmap+0x4c/0x70
> [<ffffffff8100beab>] system_call_fastpath+0x16/0x1b
> Disabling lock debugging due to kernel taint
> BUG: Bad page state in process gpg-agent pfn:1c83c8
> page:ffffea00063cd3c0 flags:800000000050000c count:0 mapcount:0
> mapping:(null) i
> ndex:7feda9ead
> Pid: 3859, comm: gpg-agent Tainted: G B 2.6.30-mm1_64 #641
> Call Trace:
> [<ffffffff8108fab2>] bad_page+0xd2/0x130
> [<ffffffff81091a57>] free_hot_cold_page+0x47/0x200
> [<ffffffff81091c48>] __pagevec_free+0x38/0x50
> [<ffffffff81094eac>] release_pages+0x20c/0x240
> [<ffffffff810afc6f>] free_pages_and_swap_cache+0xaf/0xd0
> [<ffffffff8115d53a>] ? cpumask_any_but+0x2a/0x40
> [<ffffffff810a8520>] unmap_region+0x150/0x170
> [<ffffffff810a87b4>] do_munmap+0x274/0x370
> [<ffffffff810a88fc>] sys_munmap+0x4c/0x70
> [<ffffffff8100beab>] system_call_fastpath+0x16/0x1b
> BUG: Bad page state in process gpg-agent pfn:1c800f
> page:ffffea00063c0348 flags:800000000050000c count:0 mapcount:0
> mapping:(null) i
> ndex:7feda9eac
> Pid: 3859, comm: gpg-agent Tainted: G B 2.6.30-mm1_64 #641
> Call Trace:
> [<ffffffff8108fab2>] bad_page+0xd2/0x130
> [<ffffffff81091a57>] free_hot_cold_page+0x47/0x200
> [<ffffffff81091c48>] __pagevec_free+0x38/0x50
> [<ffffffff81094eac>] release_pages+0x20c/0x240
> [<ffffffff810afc6f>] free_pages_and_swap_cache+0xaf/0xd0
> [<ffffffff8115d53a>] ? cpumask_any_but+0x2a/0x40
> [<ffffffff810a8520>] unmap_region+0x150/0x170
> [<ffffffff810a87b4>] do_munmap+0x274/0x370
> [<ffffffff810a88fc>] sys_munmap+0x4c/0x70
> [<ffffffff8100beab>] system_call_fastpath+0x16/0x1b


Regards,
Maxim Levitsky

2009-06-20 19:26:58

by Mel Gorman

[permalink] [raw]
Subject: Re: BUG] Strange oopses in 2.6.30

On Sat, Jun 20, 2009 at 04:40:12PM +0100, Hugh Dickins wrote:
> On Sat, 20 Jun 2009, Maxim Levitsky wrote:
> > On Sat, 2009-06-20 at 00:48 +0300, Maxim Levitsky wrote:
> > > I see lots of following oopses in 2.6.30 and latest -git
> > >
> > > Many different applications shows up, not just reiserfsck
> > > Something in MM I guess, it makes me worry. But system seems to work.
> > >
> > > Is this known?
>
> I think so...
>

Thanks Hugh.

> > >
> > > dmesg attached.
> > >
> > >
> > > [ 34.544040] BUG: Bad page state in process reiserfsck pfn:37d86
> > > [ 34.544044] page:c2a34f38 flags:3650000c count:0 mapcount:0
> > > mapping:(null) index:bffeb
> > > [ 34.544048] Pid: 2654, comm: reiserfsck Tainted: G B
> > > 2.6.30-git #4
> > > [ 34.544051] Call Trace:
> > > [ 34.544055] [<c04cd26a>] ? printk+0x18/0x1e
> > > [ 34.544059] [<c018f065>] bad_page+0xd5/0x140
> > > [ 34.544064] [<c0190097>] free_hot_cold_page+0x1e7/0x280
> > > [ 34.544069] [<c0193682>] ? release_pages+0x92/0x1b0
> > > [ 34.544074] [<c0190155>] __pagevec_free+0x25/0x30
> > > [ 34.544078] [<c0193758>] release_pages+0x168/0x1b0
> > > [ 34.544084] [<c0193cf3>] ? lru_add_drain+0x53/0xd0
> > > [ 34.544088] [<c01ab7d4>] free_pages_and_swap_cache+0x84/0xa0
> > > [ 34.544093] [<c019ff5d>] unmap_vmas+0x73d/0x760
> > > [ 34.544099] [<c016480e>] ? lock_release_non_nested+0x15e/0x270
> > > [ 34.544104] [<c01a43b5>] exit_mmap+0xb5/0x1b0
> > > [ 34.544109] [<c0138666>] mmput+0x36/0xc0
> > > [ 34.544113] [<c013c874>] exit_mm+0xe4/0x120
> > > [ 34.544117] [<c0175539>] ? acct_collect+0x139/0x180
> > > [ 34.544122] [<c013e889>] do_exit+0x6b9/0x720
> > > [ 34.544142] [<c01bcac2>] ? vfs_write+0x122/0x180
> > > [ 34.544146] [<c01bbda0>] ? do_sync_write+0x0/0x110
> > > [ 34.544151] [<c013e920>] do_group_exit+0x30/0x90
> > > [ 34.544156] [<c013e993>] sys_exit_group+0x13/0x20
> > > [ 34.544161] [<c01039e8>] sysenter_do_call+0x12/0x3c
> > > [ 34.544180] BUG: Bad page state in process reiserfsck pfn:37d91
> > > [ 34.544184] page:c2a35174 flags:3650000c count:0 mapcount:0
> > > mapping:(null) index:bfff6
> > > [ 34.544188] Pid: 2654, comm: reiserfsck Tainted: G B
> > > 2.6.30-git #4
> > >
> >
> > This really worries me
>
> I hope it's fixed by this patch Hannes posted yesterday...
>

Does the patch fix the problem up?

> From [email protected] Fri Jun 19 19:04:49 2009
> Date: Fri, 19 Jun 2009 19:45:02 +0200
> From: Johannes Weiner <[email protected]>
> To: Peter Chubb <[email protected]>
> Cc: <[email protected]>, <[email protected]>, <[email protected]>
> Subject: Re: [BUG] Bad page flags when process using mlock()ed memory exits
>
> On Fri, Jun 19, 2009 at 02:11:21PM +1000, Peter Chubb wrote:
> >
> > In recent kernels I've been seeing many mesages of the form:
> >
> > BUG: Bad page state in process reiserfsck pfn:79c58
> > page:c3d03b00 flags:8050000c count:0 mapcount:0 mapping:(null) index:8095
> > Pid: 3927, comm: reiserfsck Not tainted 2.6.30-test-05456-gda456f1 #60
> > Call Trace:
> > [<c134a67c>] ? printk+0xf/0x13
> > [<c10774dc>] bad_page+0xc9/0xe2
> > [<c1078041>] free_hot_cold_page+0x5c/0x204
> > [<c1078206>] __pagevec_free+0x1d/0x25
> > [<c107ac3e>] release_pages+0x14e/0x18e)
> > [<c108ef8a>] free_pages_and_swap_cache+0x69/0x82
> > [<c1089458>] exit_mmap+0xf6/0x11f
> > [<c102afcd>] mmput+0x39/0xaf
> > [<c102e534>] exit_mm+0xe5/0xed
> > [<c102fa66>] do_exit+0x13f/0x578
> > [<c102fefd>] do_group_exit+0x5e/0x85
> > [<c102ff37>] sys_exit_group+0x13/0x17
> > [<c10031ef>] sysenter_do_call+0x12/0x3c
> > Disabling lock debugging due to kernel taint
> >
> > This appears to have been introduced by patch
> > da456f14d2f2d7350f2b9440af79c85a34c7eed5
> > page allocator: do not disable interrupts in free_page_mlock()
> >
> > That patch removed the free_page_mlock() from free_pages_check(), so
> > if free_hot_cold_page() is called on an Mlocked page (e.g., if a
> > process that used mlock() calls exit()) free_pages_check() will always
> > barf, whereas before it would just unlock the page.
>
> I prepared a fix, thanks for chasing it down.
>
> Mel, to keep this simple I just used the atomic test-clear, but if I
> am not mistaken we should not need any atomicity here, so we could
> probably add a __TestClearPage version and use this instead...?
>
> ---
> >From 493b74e8615db4e3323b5b169b0b8265dfd7c3f4 Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <[email protected]>
> Date: Fri, 19 Jun 2009 19:30:56 +0200
> Subject: [patch] mm: page_alloc: clear PG_locked before checking flags on free
>
> da456f1 "page allocator: do not disable interrupts in free_page_mlock()" moved
> the PG_mlocked clearing after the flag sanity checking which makes mlocked
> pages always trigger 'bad page'. Fix this by clearing the bit up front.
>
> Reported-by: Peter Chubb <[email protected]>
> Debugged-by: Peter Chubb <[email protected]>
> Signed-off-by: Johannes Weiner <[email protected]>
> Cc: Mel Gorman <[email protected]>
> ---
> mm/page_alloc.c | 9 ++++-----
> 1 files changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6f0753f..30d5093 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -488,7 +488,6 @@ static inline void __free_one_page(struct page *page,
> */
> static inline void free_page_mlock(struct page *page)
> {
> - __ClearPageMlocked(page);
> __dec_zone_page_state(page, NR_MLOCK);
> __count_vm_event(UNEVICTABLE_MLOCKFREED);
> }
> @@ -558,7 +557,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
> unsigned long flags;
> int i;
> int bad = 0;
> - int clearMlocked = PageMlocked(page);
> + int wasMlocked = TestClearPageMlocked(page);
>
> kmemcheck_free_shadow(page, order);
>
> @@ -576,7 +575,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
> kernel_map_pages(page, 1 << order, 0);
>
> local_irq_save(flags);
> - if (unlikely(clearMlocked))
> + if (unlikely(wasMlocked))
> free_page_mlock(page);
> __count_vm_events(PGFREE, 1 << order);
> free_one_page(page_zone(page), page, order,
> @@ -1022,7 +1021,7 @@ static void free_hot_cold_page(struct page *page, int cold)
> struct zone *zone = page_zone(page);
> struct per_cpu_pages *pcp;
> unsigned long flags;
> - int clearMlocked = PageMlocked(page);
> + int wasMlocked = TestClearPageMlocked(page);
>
> kmemcheck_free_shadow(page, 0);
>
> @@ -1041,7 +1040,7 @@ static void free_hot_cold_page(struct page *page, int cold)
> pcp = &zone_pcp(zone, get_cpu())->pcp;
> set_page_private(page, get_pageblock_migratetype(page));
> local_irq_save(flags);
> - if (unlikely(clearMlocked))
> + if (unlikely(wasMlocked))
> free_page_mlock(page);
> __count_vm_event(PGFREE);
>
> --
> 1.6.3
>

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2009-06-20 23:01:44

by Maxim Levitsky

[permalink] [raw]
Subject: Re: BUG] Strange oopses in 2.6.30

On Sat, 2009-06-20 at 16:40 +0100, Hugh Dickins wrote:
> On Sat, 20 Jun 2009, Maxim Levitsky wrote:
> > On Sat, 2009-06-20 at 00:48 +0300, Maxim Levitsky wrote:
> > > I see lots of following oopses in 2.6.30 and latest -git
> > >
> > > Many different applications shows up, not just reiserfsck
> > > Something in MM I guess, it makes me worry. But system seems to work.
> > >
> > > Is this known?
>
> I think so...
>
> > >
> > > dmesg attached.
> > >
> > >
> > > [ 34.544040] BUG: Bad page state in process reiserfsck pfn:37d86
> > > [ 34.544044] page:c2a34f38 flags:3650000c count:0 mapcount:0
> > > mapping:(null) index:bffeb
> > > [ 34.544048] Pid: 2654, comm: reiserfsck Tainted: G B
> > > 2.6.30-git #4
> > > [ 34.544051] Call Trace:
> > > [ 34.544055] [<c04cd26a>] ? printk+0x18/0x1e
> > > [ 34.544059] [<c018f065>] bad_page+0xd5/0x140
> > > [ 34.544064] [<c0190097>] free_hot_cold_page+0x1e7/0x280
> > > [ 34.544069] [<c0193682>] ? release_pages+0x92/0x1b0
> > > [ 34.544074] [<c0190155>] __pagevec_free+0x25/0x30
> > > [ 34.544078] [<c0193758>] release_pages+0x168/0x1b0
> > > [ 34.544084] [<c0193cf3>] ? lru_add_drain+0x53/0xd0
> > > [ 34.544088] [<c01ab7d4>] free_pages_and_swap_cache+0x84/0xa0
> > > [ 34.544093] [<c019ff5d>] unmap_vmas+0x73d/0x760
> > > [ 34.544099] [<c016480e>] ? lock_release_non_nested+0x15e/0x270
> > > [ 34.544104] [<c01a43b5>] exit_mmap+0xb5/0x1b0
> > > [ 34.544109] [<c0138666>] mmput+0x36/0xc0
> > > [ 34.544113] [<c013c874>] exit_mm+0xe4/0x120
> > > [ 34.544117] [<c0175539>] ? acct_collect+0x139/0x180
> > > [ 34.544122] [<c013e889>] do_exit+0x6b9/0x720
> > > [ 34.544142] [<c01bcac2>] ? vfs_write+0x122/0x180
> > > [ 34.544146] [<c01bbda0>] ? do_sync_write+0x0/0x110
> > > [ 34.544151] [<c013e920>] do_group_exit+0x30/0x90
> > > [ 34.544156] [<c013e993>] sys_exit_group+0x13/0x20
> > > [ 34.544161] [<c01039e8>] sysenter_do_call+0x12/0x3c
> > > [ 34.544180] BUG: Bad page state in process reiserfsck pfn:37d91
> > > [ 34.544184] page:c2a35174 flags:3650000c count:0 mapcount:0
> > > mapping:(null) index:bfff6
> > > [ 34.544188] Pid: 2654, comm: reiserfsck Tainted: G B
> > > 2.6.30-git #4
> > >
> >
> > This really worries me
>
> I hope it's fixed by this patch Hannes posted yesterday...

Indeed, this helps, thanks a lot

Best regards,

Maxim Levitsky

2009-06-22 02:41:22

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: BUG: Bad page state [was: Strange oopses in 2.6.30]

(cc to Mel and some reviewer)

> Flags are:
> 0000000000400000 -- __PG_MLOCKED
> 800000000050000c -- my page flags
> 3650000c -- Maxim's page flags
> 0000000000693ce1 -- my PAGE_FLAGS_CHECK_AT_FREE

I guess commit da456f14d (page allocator: do not disable interrupts in
free_page_mlock()) is a bit wrong.

current code is:
-------------------------------------------------------------
static void free_hot_cold_page(struct page *page, int cold)
{
(snip)
int clearMlocked = PageMlocked(page);
(snip)
if (free_pages_check(page))
return;
(snip)
local_irq_save(flags);
if (unlikely(clearMlocked))
free_page_mlock(page);
-------------------------------------------------------------

Oh well, we remove PG_Mlocked *after* free_pages_check().
Then, it makes false-positive warning.

Sorry, my review was also wrong. I think reverting this patch is better ;)




2009-06-22 07:42:19

by Pekka Enberg

[permalink] [raw]
Subject: Re: BUG: Bad page state [was: Strange oopses in 2.6.30]

On Mon, 2009-06-22 at 11:39 +0900, KOSAKI Motohiro wrote:
> (cc to Mel and some reviewer)
>
> > Flags are:
> > 0000000000400000 -- __PG_MLOCKED
> > 800000000050000c -- my page flags
> > 3650000c -- Maxim's page flags
> > 0000000000693ce1 -- my PAGE_FLAGS_CHECK_AT_FREE
>
> I guess commit da456f14d (page allocator: do not disable interrupts in
> free_page_mlock()) is a bit wrong.
>
> current code is:
> -------------------------------------------------------------
> static void free_hot_cold_page(struct page *page, int cold)
> {
> (snip)
> int clearMlocked = PageMlocked(page);
> (snip)
> if (free_pages_check(page))
> return;
> (snip)
> local_irq_save(flags);
> if (unlikely(clearMlocked))
> free_page_mlock(page);
> -------------------------------------------------------------
>
> Oh well, we remove PG_Mlocked *after* free_pages_check().
> Then, it makes false-positive warning.
>
> Sorry, my review was also wrong. I think reverting this patch is better ;)

Well, I am not sure we need to revert the patch. I'd argue it's simply a
bug in free_pages_check() that can be fixed with something like this.
Mel, what do you think?

Pekka

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index d6792f8..b002b65 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -385,7 +385,7 @@ static inline void __ClearPageTail(struct page *page)
* these flags set. It they are, there is a problem.
*/
#define PAGE_FLAGS_CHECK_AT_FREE \
- (1 << PG_lru | 1 << PG_locked | \
+ (1 << PG_lru | \
1 << PG_private | 1 << PG_private_2 | \
1 << PG_buddy | 1 << PG_writeback | 1 << PG_reserved | \
1 << PG_slab | 1 << PG_swapcache | 1 << PG_active | \
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a5f3c27..ff7c713 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -497,6 +497,11 @@ static void free_page_mlock(struct page *page) { }

static inline int free_pages_check(struct page *page)
{
+ /*
+ * Note: the page can have PG_mlock set here because we clear it
+ * lazily to avoid unnecessary disabling and enabling of interrupts in
+ * page free fastpath.
+ */
if (unlikely(page_mapcount(page) |
(page->mapping != NULL) |
(atomic_read(&page->_count) != 0) |

2009-06-22 09:16:45

by Mel Gorman

[permalink] [raw]
Subject: Re: BUG: Bad page state [was: Strange oopses in 2.6.30]

On Mon, Jun 22, 2009 at 11:39:53AM +0900, KOSAKI Motohiro wrote:
> (cc to Mel and some reviewer)
>
> > Flags are:
> > 0000000000400000 -- __PG_MLOCKED
> > 800000000050000c -- my page flags
> > 3650000c -- Maxim's page flags
> > 0000000000693ce1 -- my PAGE_FLAGS_CHECK_AT_FREE
>
> I guess commit da456f14d (page allocator: do not disable interrupts in
> free_page_mlock()) is a bit wrong.
>
> current code is:
> -------------------------------------------------------------
> static void free_hot_cold_page(struct page *page, int cold)
> {
> (snip)
> int clearMlocked = PageMlocked(page);
> (snip)
> if (free_pages_check(page))
> return;
> (snip)
> local_irq_save(flags);
> if (unlikely(clearMlocked))
> free_page_mlock(page);
> -------------------------------------------------------------
>
> Oh well, we remove PG_Mlocked *after* free_pages_check().
> Then, it makes false-positive warning.
>
> Sorry, my review was also wrong. I think reverting this patch is better ;)
>

I think a revert is way overkill. The intention of the patch is sound -
reducing the number of times interrupts are disabled. Having pages
with the PG_locked bit is now somewhat of an expected situation. I'd
prefer to go with either

1. Unconditionally clearing the bit with TestClearPageLocked as the
patch already posted does
2. Removing PG_locked from the free_pages_check()
3. Unlocking the pages as we go when an mlocked VMA is being torn town

The patch that addresses 1 seemed ok to me. What do you think?

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2009-06-22 16:02:44

by Lee Schermerhorn

[permalink] [raw]
Subject: Re: BUG: Bad page state [was: Strange oopses in 2.6.30]

On Mon, 2009-06-22 at 10:16 +0100, Mel Gorman wrote:
> On Mon, Jun 22, 2009 at 11:39:53AM +0900, KOSAKI Motohiro wrote:
> > (cc to Mel and some reviewer)

[added Rik so that he can get multiple copies, too. :)]

> >
> > > Flags are:
> > > 0000000000400000 -- __PG_MLOCKED
> > > 800000000050000c -- my page flags
> > > 3650000c -- Maxim's page flags
> > > 0000000000693ce1 -- my PAGE_FLAGS_CHECK_AT_FREE
> >
> > I guess commit da456f14d (page allocator: do not disable interrupts in
> > free_page_mlock()) is a bit wrong.
> >
> > current code is:
> > -------------------------------------------------------------
> > static void free_hot_cold_page(struct page *page, int cold)
> > {
> > (snip)
> > int clearMlocked = PageMlocked(page);
> > (snip)
> > if (free_pages_check(page))
> > return;
> > (snip)
> > local_irq_save(flags);
> > if (unlikely(clearMlocked))
> > free_page_mlock(page);
> > -------------------------------------------------------------
> >
> > Oh well, we remove PG_Mlocked *after* free_pages_check().
> > Then, it makes false-positive warning.
> >
> > Sorry, my review was also wrong. I think reverting this patch is better ;)
> >
>
> I think a revert is way overkill. The intention of the patch is sound -
> reducing the number of times interrupts are disabled. Having pages
> with the PG_locked bit is now somewhat of an expected situation. I'd
> prefer to go with either
>
> 1. Unconditionally clearing the bit with TestClearPageLocked as the
> patch already posted does
> 2. Removing PG_locked from the free_pages_check()
> 3. Unlocking the pages as we go when an mlocked VMA is being torn town

Mel,

#3 SHOULD be happening in all cases. The free_page_mlocked() function
counts when this is not happening. We tried to fix all cases that we
encountered before this feature was submitted, but left the vm_stat
there to report if more PG_mlocked leaks were introduced. We also,
inadvertently, left PG_mlocked in the flags to check at free. We didn't
hit this before your patch because free_page_mlock() did a test&clear on
the PG_mlocked before checking the flags. Since you moved the call, and
used PageMlocked() instead of TestClearPageMlocked(), any PG_locked page
will cause the bug.

So, we have another PG_mlocked flag leaking to free. I don't think this
is terribly serious in itself, and probably not deserving of a BUG_ON.
It probably doesn't deserve a vm_stat, either, I guess. However, it
could indicate a more serious logic error and should be examined. So it
would be nice to retain some indication that it's happening.

> The patch that addresses 1 seemed ok to me. What do you think?
>

Your alternative #2 sounds less expensive that test&clear.

Lee

2009-06-22 20:53:30

by Mel Gorman

[permalink] [raw]
Subject: Re: BUG: Bad page state [was: Strange oopses in 2.6.30]

On Mon, Jun 22, 2009 at 12:02:33PM -0400, Lee Schermerhorn wrote:
> On Mon, 2009-06-22 at 10:16 +0100, Mel Gorman wrote:
> > On Mon, Jun 22, 2009 at 11:39:53AM +0900, KOSAKI Motohiro wrote:
> > > (cc to Mel and some reviewer)
>
> [added Rik so that he can get multiple copies, too. :)]
>
> > >
> > > > Flags are:
> > > > 0000000000400000 -- __PG_MLOCKED
> > > > 800000000050000c -- my page flags
> > > > 3650000c -- Maxim's page flags
> > > > 0000000000693ce1 -- my PAGE_FLAGS_CHECK_AT_FREE
> > >
> > > I guess commit da456f14d (page allocator: do not disable interrupts in
> > > free_page_mlock()) is a bit wrong.
> > >
> > > current code is:
> > > -------------------------------------------------------------
> > > static void free_hot_cold_page(struct page *page, int cold)
> > > {
> > > (snip)
> > > int clearMlocked = PageMlocked(page);
> > > (snip)
> > > if (free_pages_check(page))
> > > return;
> > > (snip)
> > > local_irq_save(flags);
> > > if (unlikely(clearMlocked))
> > > free_page_mlock(page);
> > > -------------------------------------------------------------
> > >
> > > Oh well, we remove PG_Mlocked *after* free_pages_check().
> > > Then, it makes false-positive warning.
> > >
> > > Sorry, my review was also wrong. I think reverting this patch is better ;)
> > >
> >
> > I think a revert is way overkill. The intention of the patch is sound -
> > reducing the number of times interrupts are disabled. Having pages
> > with the PG_locked bit is now somewhat of an expected situation. I'd
> > prefer to go with either
> >
> > 1. Unconditionally clearing the bit with TestClearPageLocked as the
> > patch already posted does
> > 2. Removing PG_locked from the free_pages_check()
> > 3. Unlocking the pages as we go when an mlocked VMA is being torn town
>
> Mel,
>
> #3 SHOULD be happening in all cases. The free_page_mlocked() function
> counts when this is not happening. We tried to fix all cases that we
> encountered before this feature was submitted, but left the vm_stat
> there to report if more PG_mlocked leaks were introduced.

That makes sense. I was surprised at the thought that the pages were
apparently not getting freed properly and upon investigation I could not
trivially reproduce the problem. Can someone with this problem post their
.config please in case I'm missing something in there?

> We also,
> inadvertently, left PG_mlocked in the flags to check at free. We didn't
> hit this before your patch because free_page_mlock() did a test&clear on
> the PG_mlocked before checking the flags. Since you moved the call, and
> used PageMlocked() instead of TestClearPageMlocked(), any PG_locked page
> will cause the bug.
>
> So, we have another PG_mlocked flag leaking to free. I don't think this
> is terribly serious in itself, and probably not deserving of a BUG_ON.
> It probably doesn't deserve a vm_stat, either, I guess. However, it
> could indicate a more serious logic error and should be examined. So it
> would be nice to retain some indication that it's happening.
>
> > The patch that addresses 1 seemed ok to me. What do you think?
> >
>
> Your alternative #2 sounds less expensive that test&clear.
>

How about the following? The intention is to warn once when PG_mlocked
is set but continue to count the number of times the event occured.

==== CUT HERE ====
mm: Warn once when a page is freed with PG_mlocked set

When a page is freed with the PG_mlocked set, it is considered an unexpected
but recoverable situation. A counter records how often this event happens
but due to commit da456f14d [page allocator: do not disable interrupts in
free_page_mlock()], the page state is being treated as a bad page which is
considered a severe bug.

This bug drops the severity of the report in the event a page is freed
with PG_mlocked set. A warning is printed once and the subsequent events
counted.

Signed-off-by: Mel Gorman <[email protected]>
---
include/linux/page-flags.h | 10 +++++++++-
mm/page_alloc.c | 9 +++++++++
2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index d6792f8..81731cf 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -389,7 +389,15 @@ static inline void __ClearPageTail(struct page *page)
1 << PG_private | 1 << PG_private_2 | \
1 << PG_buddy | 1 << PG_writeback | 1 << PG_reserved | \
1 << PG_slab | 1 << PG_swapcache | 1 << PG_active | \
- 1 << PG_unevictable | __PG_MLOCKED)
+ 1 << PG_unevictable)
+
+/*
+ * Flags checked when a page is freed. Pages being freed should not have
+ * these set but the situation is easily resolved and should just be
+ * reported as a once-off warning.
+ */
+#define PAGE_FLAGS_WARN_AT_FREE \
+ (__PG_MLOCKED)

/*
* Flags checked when a page is prepped for return by the page allocator.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a5f3c27..c8c029e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -497,6 +497,15 @@ static void free_page_mlock(struct page *page) { }

static inline int free_pages_check(struct page *page)
{
+ if (unlikely(page->flags & PAGE_FLAGS_WARN_AT_FREE)) {
+ WARN_ONCE(1, KERN_WARNING
+ "Sloppy page flags set process %s at pfn:%05lx\n"
+ "page:%p flags:%p\n",
+ current->comm, page_to_pfn(page),
+ page, (void *)page->flags);
+ page->flags &= ~PAGE_FLAGS_WARN_AT_FREE;
+ }
+
if (unlikely(page_mapcount(page) |
(page->mapping != NULL) |
(atomic_read(&page->_count) != 0) |

2009-06-22 20:55:28

by Mel Gorman

[permalink] [raw]
Subject: Re: BUG: Bad page state [was: Strange oopses in 2.6.30]

On Mon, Jun 22, 2009 at 10:42:09AM +0300, Pekka Enberg wrote:
> On Mon, 2009-06-22 at 11:39 +0900, KOSAKI Motohiro wrote:
> > (cc to Mel and some reviewer)
> >
> > > Flags are:
> > > 0000000000400000 -- __PG_MLOCKED
> > > 800000000050000c -- my page flags
> > > 3650000c -- Maxim's page flags
> > > 0000000000693ce1 -- my PAGE_FLAGS_CHECK_AT_FREE
> >
> > I guess commit da456f14d (page allocator: do not disable interrupts in
> > free_page_mlock()) is a bit wrong.
> >
> > current code is:
> > -------------------------------------------------------------
> > static void free_hot_cold_page(struct page *page, int cold)
> > {
> > (snip)
> > int clearMlocked = PageMlocked(page);
> > (snip)
> > if (free_pages_check(page))
> > return;
> > (snip)
> > local_irq_save(flags);
> > if (unlikely(clearMlocked))
> > free_page_mlock(page);
> > -------------------------------------------------------------
> >
> > Oh well, we remove PG_Mlocked *after* free_pages_check().
> > Then, it makes false-positive warning.
> >
> > Sorry, my review was also wrong. I think reverting this patch is better ;)
>
> Well, I am not sure we need to revert the patch. I'd argue it's simply a
> bug in free_pages_check() that can be fixed with something like this.
> Mel, what do you think?
>

I think you removed the check for the wrong flag - PG_locked vs
PG_mlocked :).

That aside, I reckon your intention was not far off the mark. I posted a
separate patch to see about warning once when PG_mlocked is set and
counting the event. When the warning appears, it's not world ending but
chances are it's something that needs to be fixed up.

> Pekka
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index d6792f8..b002b65 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -385,7 +385,7 @@ static inline void __ClearPageTail(struct page *page)
> * these flags set. It they are, there is a problem.
> */
> #define PAGE_FLAGS_CHECK_AT_FREE \
> - (1 << PG_lru | 1 << PG_locked | \
> + (1 << PG_lru | \
> 1 << PG_private | 1 << PG_private_2 | \
> 1 << PG_buddy | 1 << PG_writeback | 1 << PG_reserved | \
> 1 << PG_slab | 1 << PG_swapcache | 1 << PG_active | \
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a5f3c27..ff7c713 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -497,6 +497,11 @@ static void free_page_mlock(struct page *page) { }
>
> static inline int free_pages_check(struct page *page)
> {
> + /*
> + * Note: the page can have PG_mlock set here because we clear it
> + * lazily to avoid unnecessary disabling and enabling of interrupts in
> + * page free fastpath.
> + */
> if (unlikely(page_mapcount(page) |
> (page->mapping != NULL) |
> (atomic_read(&page->_count) != 0) |
>
>

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2009-06-23 11:04:19

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: BUG: Bad page state [was: Strange oopses in 2.6.30]

> On Mon, Jun 22, 2009 at 11:39:53AM +0900, KOSAKI Motohiro wrote:
> > (cc to Mel and some reviewer)
> >
> > > Flags are:
> > > 0000000000400000 -- __PG_MLOCKED
> > > 800000000050000c -- my page flags
> > > 3650000c -- Maxim's page flags
> > > 0000000000693ce1 -- my PAGE_FLAGS_CHECK_AT_FREE
> >
> > I guess commit da456f14d (page allocator: do not disable interrupts in
> > free_page_mlock()) is a bit wrong.
> >
> > current code is:
> > -------------------------------------------------------------
> > static void free_hot_cold_page(struct page *page, int cold)
> > {
> > (snip)
> > int clearMlocked = PageMlocked(page);
> > (snip)
> > if (free_pages_check(page))
> > return;
> > (snip)
> > local_irq_save(flags);
> > if (unlikely(clearMlocked))
> > free_page_mlock(page);
> > -------------------------------------------------------------
> >
> > Oh well, we remove PG_Mlocked *after* free_pages_check().
> > Then, it makes false-positive warning.
> >
> > Sorry, my review was also wrong. I think reverting this patch is better ;)
> >
>
> I think a revert is way overkill. The intention of the patch is sound -
> reducing the number of times interrupts are disabled. Having pages
> with the PG_locked bit is now somewhat of an expected situation. I'd
> prefer to go with either
>
> 1. Unconditionally clearing the bit with TestClearPageLocked as the
> patch already posted does
> 2. Removing PG_locked from the free_pages_check()
> 3. Unlocking the pages as we go when an mlocked VMA is being torn town
>
> The patch that addresses 1 seemed ok to me. What do you think?

Yes, I've overlooked Hanns's patch. I think that is good patch.
Thansk folks.



2009-06-23 11:12:00

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: BUG: Bad page state [was: Strange oopses in 2.6.30]

> On Mon, Jun 22, 2009 at 12:02:33PM -0400, Lee Schermerhorn wrote:
> > On Mon, 2009-06-22 at 10:16 +0100, Mel Gorman wrote:
> > > On Mon, Jun 22, 2009 at 11:39:53AM +0900, KOSAKI Motohiro wrote:
> > > > (cc to Mel and some reviewer)
> >
> > [added Rik so that he can get multiple copies, too. :)]
> >
> > > >
> > > > > Flags are:
> > > > > 0000000000400000 -- __PG_MLOCKED
> > > > > 800000000050000c -- my page flags
> > > > > 3650000c -- Maxim's page flags
> > > > > 0000000000693ce1 -- my PAGE_FLAGS_CHECK_AT_FREE
> > > >
> > > > I guess commit da456f14d (page allocator: do not disable interrupts in
> > > > free_page_mlock()) is a bit wrong.
> > > >
> > > > current code is:
> > > > -------------------------------------------------------------
> > > > static void free_hot_cold_page(struct page *page, int cold)
> > > > {
> > > > (snip)
> > > > int clearMlocked = PageMlocked(page);
> > > > (snip)
> > > > if (free_pages_check(page))
> > > > return;
> > > > (snip)
> > > > local_irq_save(flags);
> > > > if (unlikely(clearMlocked))
> > > > free_page_mlock(page);
> > > > -------------------------------------------------------------
> > > >
> > > > Oh well, we remove PG_Mlocked *after* free_pages_check().
> > > > Then, it makes false-positive warning.
> > > >
> > > > Sorry, my review was also wrong. I think reverting this patch is better ;)
> > > >
> > >
> > > I think a revert is way overkill. The intention of the patch is sound -
> > > reducing the number of times interrupts are disabled. Having pages
> > > with the PG_locked bit is now somewhat of an expected situation. I'd
> > > prefer to go with either
> > >
> > > 1. Unconditionally clearing the bit with TestClearPageLocked as the
> > > patch already posted does
> > > 2. Removing PG_locked from the free_pages_check()
> > > 3. Unlocking the pages as we go when an mlocked VMA is being torn town
> >
> > Mel,
> >
> > #3 SHOULD be happening in all cases. The free_page_mlocked() function
> > counts when this is not happening. We tried to fix all cases that we
> > encountered before this feature was submitted, but left the vm_stat
> > there to report if more PG_mlocked leaks were introduced.
>
> That makes sense. I was surprised at the thought that the pages were
> apparently not getting freed properly and upon investigation I could not
> trivially reproduce the problem. Can someone with this problem post their
> .config please in case I'm missing something in there?
>
> > We also,
> > inadvertently, left PG_mlocked in the flags to check at free. We didn't
> > hit this before your patch because free_page_mlock() did a test&clear on
> > the PG_mlocked before checking the flags. Since you moved the call, and
> > used PageMlocked() instead of TestClearPageMlocked(), any PG_locked page
> > will cause the bug.
> >
> > So, we have another PG_mlocked flag leaking to free. I don't think this
> > is terribly serious in itself, and probably not deserving of a BUG_ON.
> > It probably doesn't deserve a vm_stat, either, I guess. However, it
> > could indicate a more serious logic error and should be examined. So it
> > would be nice to retain some indication that it's happening.
> >
> > > The patch that addresses 1 seemed ok to me. What do you think?
> > >
> >
> > Your alternative #2 sounds less expensive that test&clear.
> >
>
> How about the following? The intention is to warn once when PG_mlocked
> is set but continue to count the number of times the event occured.
>
> ==== CUT HERE ====
> mm: Warn once when a page is freed with PG_mlocked set
>
> When a page is freed with the PG_mlocked set, it is considered an unexpected
> but recoverable situation. A counter records how often this event happens
> but due to commit da456f14d [page allocator: do not disable interrupts in
> free_page_mlock()], the page state is being treated as a bad page which is
> considered a severe bug.
>
> This bug drops the severity of the report in the event a page is freed
> with PG_mlocked set. A warning is printed once and the subsequent events
> counted.
>
> Signed-off-by: Mel Gorman <[email protected]>
> ---
> include/linux/page-flags.h | 10 +++++++++-
> mm/page_alloc.c | 9 +++++++++
> 2 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index d6792f8..81731cf 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -389,7 +389,15 @@ static inline void __ClearPageTail(struct page *page)
> 1 << PG_private | 1 << PG_private_2 | \
> 1 << PG_buddy | 1 << PG_writeback | 1 << PG_reserved | \
> 1 << PG_slab | 1 << PG_swapcache | 1 << PG_active | \
> - 1 << PG_unevictable | __PG_MLOCKED)
> + 1 << PG_unevictable)
> +
> +/*
> + * Flags checked when a page is freed. Pages being freed should not have
> + * these set but the situation is easily resolved and should just be
> + * reported as a once-off warning.
> + */
> +#define PAGE_FLAGS_WARN_AT_FREE \
> + (__PG_MLOCKED)
>
> /*
> * Flags checked when a page is prepped for return by the page allocator.
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a5f3c27..c8c029e 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -497,6 +497,15 @@ static void free_page_mlock(struct page *page) { }
>
> static inline int free_pages_check(struct page *page)
> {
> + if (unlikely(page->flags & PAGE_FLAGS_WARN_AT_FREE)) {

this condition is always false. it's because caller clear PG_Mlocked flag
before calling free_pages_check().


> + WARN_ONCE(1, KERN_WARNING
> + "Sloppy page flags set process %s at pfn:%05lx\n"
> + "page:%p flags:%p\n",
> + current->comm, page_to_pfn(page),
> + page, (void *)page->flags);

hmm, mystery (void*) casting is here.


> + page->flags &= ~PAGE_FLAGS_WARN_AT_FREE;
> + }
> +
> if (unlikely(page_mapcount(page) |
> (page->mapping != NULL) |
> (atomic_read(&page->_count) != 0) |

Howerver, I like this patch concept. this warning is useful and meaningful IMHO.


2009-06-29 08:41:26

by Mel Gorman

[permalink] [raw]
Subject: Re: BUG: Bad page state [was: Strange oopses in 2.6.30]

On Tue, Jun 23, 2009 at 08:11:51PM +0900, KOSAKI Motohiro wrote:
> > On Mon, Jun 22, 2009 at 12:02:33PM -0400, Lee Schermerhorn wrote:
> > > On Mon, 2009-06-22 at 10:16 +0100, Mel Gorman wrote:
> > > > On Mon, Jun 22, 2009 at 11:39:53AM +0900, KOSAKI Motohiro wrote:
> > > > > (cc to Mel and some reviewer)
> > >
> > > [added Rik so that he can get multiple copies, too. :)]
> > >
> > > > >
> > > > > > Flags are:
> > > > > > 0000000000400000 -- __PG_MLOCKED
> > > > > > 800000000050000c -- my page flags
> > > > > > 3650000c -- Maxim's page flags
> > > > > > 0000000000693ce1 -- my PAGE_FLAGS_CHECK_AT_FREE
> > > > >
> > > > > I guess commit da456f14d (page allocator: do not disable interrupts in
> > > > > free_page_mlock()) is a bit wrong.
> > > > >
> > > > > current code is:
> > > > > -------------------------------------------------------------
> > > > > static void free_hot_cold_page(struct page *page, int cold)
> > > > > {
> > > > > (snip)
> > > > > int clearMlocked = PageMlocked(page);
> > > > > (snip)
> > > > > if (free_pages_check(page))
> > > > > return;
> > > > > (snip)
> > > > > local_irq_save(flags);
> > > > > if (unlikely(clearMlocked))
> > > > > free_page_mlock(page);
> > > > > -------------------------------------------------------------
> > > > >
> > > > > Oh well, we remove PG_Mlocked *after* free_pages_check().
> > > > > Then, it makes false-positive warning.
> > > > >
> > > > > Sorry, my review was also wrong. I think reverting this patch is better ;)
> > > > >
> > > >
> > > > I think a revert is way overkill. The intention of the patch is sound -
> > > > reducing the number of times interrupts are disabled. Having pages
> > > > with the PG_locked bit is now somewhat of an expected situation. I'd
> > > > prefer to go with either
> > > >
> > > > 1. Unconditionally clearing the bit with TestClearPageLocked as the
> > > > patch already posted does
> > > > 2. Removing PG_locked from the free_pages_check()
> > > > 3. Unlocking the pages as we go when an mlocked VMA is being torn town
> > >
> > > Mel,
> > >
> > > #3 SHOULD be happening in all cases. The free_page_mlocked() function
> > > counts when this is not happening. We tried to fix all cases that we
> > > encountered before this feature was submitted, but left the vm_stat
> > > there to report if more PG_mlocked leaks were introduced.
> >
> > That makes sense. I was surprised at the thought that the pages were
> > apparently not getting freed properly and upon investigation I could not
> > trivially reproduce the problem. Can someone with this problem post their
> > .config please in case I'm missing something in there?
> >
> > > We also,
> > > inadvertently, left PG_mlocked in the flags to check at free. We didn't
> > > hit this before your patch because free_page_mlock() did a test&clear on
> > > the PG_mlocked before checking the flags. Since you moved the call, and
> > > used PageMlocked() instead of TestClearPageMlocked(), any PG_locked page
> > > will cause the bug.
> > >
> > > So, we have another PG_mlocked flag leaking to free. I don't think this
> > > is terribly serious in itself, and probably not deserving of a BUG_ON.
> > > It probably doesn't deserve a vm_stat, either, I guess. However, it
> > > could indicate a more serious logic error and should be examined. So it
> > > would be nice to retain some indication that it's happening.
> > >
> > > > The patch that addresses 1 seemed ok to me. What do you think?
> > > >
> > >
> > > Your alternative #2 sounds less expensive that test&clear.
> > >
> >
> > How about the following? The intention is to warn once when PG_mlocked
> > is set but continue to count the number of times the event occured.
> >
> > ==== CUT HERE ====
> > mm: Warn once when a page is freed with PG_mlocked set
> >
> > When a page is freed with the PG_mlocked set, it is considered an unexpected
> > but recoverable situation. A counter records how often this event happens
> > but due to commit da456f14d [page allocator: do not disable interrupts in
> > free_page_mlock()], the page state is being treated as a bad page which is
> > considered a severe bug.
> >
> > This bug drops the severity of the report in the event a page is freed
> > with PG_mlocked set. A warning is printed once and the subsequent events
> > counted.
> >
> > Signed-off-by: Mel Gorman <[email protected]>
> > ---
> > include/linux/page-flags.h | 10 +++++++++-
> > mm/page_alloc.c | 9 +++++++++
> > 2 files changed, 18 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > index d6792f8..81731cf 100644
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -389,7 +389,15 @@ static inline void __ClearPageTail(struct page *page)
> > 1 << PG_private | 1 << PG_private_2 | \
> > 1 << PG_buddy | 1 << PG_writeback | 1 << PG_reserved | \
> > 1 << PG_slab | 1 << PG_swapcache | 1 << PG_active | \
> > - 1 << PG_unevictable | __PG_MLOCKED)
> > + 1 << PG_unevictable)
> > +
> > +/*
> > + * Flags checked when a page is freed. Pages being freed should not have
> > + * these set but the situation is easily resolved and should just be
> > + * reported as a once-off warning.
> > + */
> > +#define PAGE_FLAGS_WARN_AT_FREE \
> > + (__PG_MLOCKED)
> >
> > /*
> > * Flags checked when a page is prepped for return by the page allocator.
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index a5f3c27..c8c029e 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -497,6 +497,15 @@ static void free_page_mlock(struct page *page) { }
> >
> > static inline int free_pages_check(struct page *page)
> > {
> > + if (unlikely(page->flags & PAGE_FLAGS_WARN_AT_FREE)) {
>
> this condition is always false. it's because caller clear PG_Mlocked flag
> before calling free_pages_check().
>

This patch is intended as an alternative to the patch that replaces
PageMlocked() with TestClearPageMlocked() so I expect the flag to only be
set in the situation where PG_mlocked is not being cleared properly.

I see the unconditionoal clearing of the flag was merged since but even
that might be too heavy handed as we are making a locked bit operation
on every page free. That's unfortunate overhead to incur on every page
free to handle a situation that should not be occurring at all.

>
> > + WARN_ONCE(1, KERN_WARNING
> > + "Sloppy page flags set process %s at pfn:%05lx\n"
> > + "page:%p flags:%p\n",
> > + current->comm, page_to_pfn(page),
> > + page, (void *)page->flags);
>
> hmm, mystery (void*) casting is here.
>

Code was taken from bad_page(). I should have used 0x%lX here.

>
> > + page->flags &= ~PAGE_FLAGS_WARN_AT_FREE;
> > + }
> > +
> > if (unlikely(page_mapcount(page) |
> > (page->mapping != NULL) |
> > (atomic_read(&page->_count) != 0) |
>
> Howerver, I like this patch concept. this warning is useful and meaningful IMHO.
>

This is a version that is based on top of current mainline that just
displays the warning. However, I think we should consider changing
TestClearPageMlocked() back to PageMlocked() and only clearing the flags
when the unusual condition is encountered.

==== CUT HERE ====
mm: Warn once when a page is freed with PG_mlocked set

When a page is freed with the PG_mlocked set, it is considered an unexpected
but recoverable situation. A counter records how often this event happens
but it is easy to miss that this event has occured at all. This patch warns
once when PG_mlocked is set to prompt debuggers to check the counter to
see how often it is happening.

Signed-off-by: Mel Gorman <[email protected]>
---
mm/page_alloc.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5d714f8..519ea6e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -495,8 +495,16 @@ static inline void free_page_mlock(struct page *page)
static void free_page_mlock(struct page *page) { }
#endif

-static inline int free_pages_check(struct page *page)
-{
+static inline int free_pages_check(struct page *page, int wasMlocked)
+{
+ if (unlikely(wasMlocked)) {
+ WARN_ONCE(1, KERN_WARNING
+ "Page flag mlocked set for process %s at pfn:%05lx\n"
+ "page:%p flags:0x%lX\n",
+ current->comm, page_to_pfn(page),
+ page, page->flags|__PG_MLOCKED);
+ }
+
if (unlikely(page_mapcount(page) |
(page->mapping != NULL) |
(atomic_read(&page->_count) != 0) |
@@ -562,7 +570,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
kmemcheck_free_shadow(page, order);

for (i = 0 ; i < (1 << order) ; ++i)
- bad += free_pages_check(page + i);
+ bad += free_pages_check(page + i, wasMlocked);
if (bad)
return;

@@ -1027,7 +1035,7 @@ static void free_hot_cold_page(struct page *page, int cold)

if (PageAnon(page))
page->mapping = NULL;
- if (free_pages_check(page))
+ if (free_pages_check(page, wasMlocked))
return;

if (!PageHighMem(page)) {

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2009-06-29 10:22:30

by Johannes Weiner

[permalink] [raw]
Subject: Re: BUG: Bad page state [was: Strange oopses in 2.6.30]

On Mon, Jun 29, 2009 at 09:41:14AM +0100, Mel Gorman wrote:
> I see the unconditionoal clearing of the flag was merged since but even
> that might be too heavy handed as we are making a locked bit operation
> on every page free. That's unfortunate overhead to incur on every page
> free to handle a situation that should not be occurring at all.

Linus was probably quick to merge it as istr several people hitting
bad_page() triggering. We should get rid of the locked op, I was just
not 100% sure and chose the safer version.

> > > + WARN_ONCE(1, KERN_WARNING
> > > + "Sloppy page flags set process %s at pfn:%05lx\n"
> > > + "page:%p flags:%p\n",
> > > + current->comm, page_to_pfn(page),
> > > + page, (void *)page->flags);
[...]
> > > + page->flags &= ~PAGE_FLAGS_WARN_AT_FREE;
> > > + }
> > > +
> > > if (unlikely(page_mapcount(page) |
> > > (page->mapping != NULL) |
> > > (atomic_read(&page->_count) != 0) |
> >
> > Howerver, I like this patch concept. this warning is useful and meaningful IMHO.
> >
>
> This is a version that is based on top of current mainline that just
> displays the warning. However, I think we should consider changing
> TestClearPageMlocked() back to PageMlocked() and only clearing the flags
> when the unusual condition is encountered.

I have a diff at home that makes this an unlocked
__TestClearPageMlocked(), would you be okay with this?

2009-06-29 10:37:17

by Mel Gorman

[permalink] [raw]
Subject: Re: BUG: Bad page state [was: Strange oopses in 2.6.30]

On Mon, Jun 29, 2009 at 12:18:19PM +0200, Johannes Weiner wrote:
> On Mon, Jun 29, 2009 at 09:41:14AM +0100, Mel Gorman wrote:
> > I see the unconditionoal clearing of the flag was merged since but even
> > that might be too heavy handed as we are making a locked bit operation
> > on every page free. That's unfortunate overhead to incur on every page
> > free to handle a situation that should not be occurring at all.
>
> Linus was probably quick to merge it as istr several people hitting
> bad_page() triggering. We should get rid of the locked op, I was just
> not 100% sure and chose the safer version.
>

And I have no problem with the decision. Leaving it as it was would have
caused a storm of bug reports, all similar.

> > > > + WARN_ONCE(1, KERN_WARNING
> > > > + "Sloppy page flags set process %s at pfn:%05lx\n"
> > > > + "page:%p flags:%p\n",
> > > > + current->comm, page_to_pfn(page),
> > > > + page, (void *)page->flags);
> [...]
> > > > + page->flags &= ~PAGE_FLAGS_WARN_AT_FREE;
> > > > + }
> > > > +
> > > > if (unlikely(page_mapcount(page) |
> > > > (page->mapping != NULL) |
> > > > (atomic_read(&page->_count) != 0) |
> > >
> > > Howerver, I like this patch concept. this warning is useful and meaningful IMHO.
> > >
> >
> > This is a version that is based on top of current mainline that just
> > displays the warning. However, I think we should consider changing
> > TestClearPageMlocked() back to PageMlocked() and only clearing the flags
> > when the unusual condition is encountered.
>
> I have a diff at home that makes this an unlocked
> __TestClearPageMlocked(), would you be okay with this?
>

It'd be an improvement for sure. Post it and I'll take a look.

My preference is still to clear the flag only when found to be erroneously set
and print a warning once but that's because it was the patch I put together
so I'm biased :)

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2009-06-30 00:31:17

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: BUG: Bad page state [was: Strange oopses in 2.6.30]

Hi

Thank you new version.

> ==== CUT HERE ====
> mm: Warn once when a page is freed with PG_mlocked set
>
> When a page is freed with the PG_mlocked set, it is considered an unexpected
> but recoverable situation. A counter records how often this event happens
> but it is easy to miss that this event has occured at all. This patch warns
> once when PG_mlocked is set to prompt debuggers to check the counter to
> see how often it is happening.
>
> Signed-off-by: Mel Gorman <[email protected]>
> ---
> mm/page_alloc.c | 16 ++++++++++++----
> 1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5d714f8..519ea6e 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -495,8 +495,16 @@ static inline void free_page_mlock(struct page *page)
> static void free_page_mlock(struct page *page) { }
> #endif
>
> -static inline int free_pages_check(struct page *page)
> -{
> +static inline int free_pages_check(struct page *page, int wasMlocked)
> +{
> + if (unlikely(wasMlocked)) {
> + WARN_ONCE(1, KERN_WARNING
> + "Page flag mlocked set for process %s at pfn:%05lx\n"
> + "page:%p flags:0x%lX\n",

0x%lX is a bit redundunt.
%lX insert "0x" string by itself, I think.


> + current->comm, page_to_pfn(page),
> + page, page->flags|__PG_MLOCKED);
> + }
> +
> if (unlikely(page_mapcount(page) |
> (page->mapping != NULL) |
> (atomic_read(&page->_count) != 0) |
> @@ -562,7 +570,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
> kmemcheck_free_shadow(page, order);
>
> for (i = 0 ; i < (1 << order) ; ++i)
> - bad += free_pages_check(page + i);
> + bad += free_pages_check(page + i, wasMlocked);
> if (bad)
> return;
>
> @@ -1027,7 +1035,7 @@ static void free_hot_cold_page(struct page *page, int cold)
>
> if (PageAnon(page))
> page->mapping = NULL;
> - if (free_pages_check(page))
> + if (free_pages_check(page, wasMlocked))
> return;
>
> if (!PageHighMem(page)) {

Other part looks fine. thanks.


2009-06-30 15:11:22

by Mel Gorman

[permalink] [raw]
Subject: Re: BUG: Bad page state [was: Strange oopses in 2.6.30]

On Tue, Jun 30, 2009 at 09:31:04AM +0900, KOSAKI Motohiro wrote:
> Hi
>
> Thank you new version.
>
> > ==== CUT HERE ====
> > mm: Warn once when a page is freed with PG_mlocked set
> >
> > When a page is freed with the PG_mlocked set, it is considered an unexpected
> > but recoverable situation. A counter records how often this event happens
> > but it is easy to miss that this event has occured at all. This patch warns
> > once when PG_mlocked is set to prompt debuggers to check the counter to
> > see how often it is happening.
> >
> > Signed-off-by: Mel Gorman <[email protected]>
> > ---
> > mm/page_alloc.c | 16 ++++++++++++----
> > 1 file changed, 12 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 5d714f8..519ea6e 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -495,8 +495,16 @@ static inline void free_page_mlock(struct page *page)
> > static void free_page_mlock(struct page *page) { }
> > #endif
> >
> > -static inline int free_pages_check(struct page *page)
> > -{
> > +static inline int free_pages_check(struct page *page, int wasMlocked)
> > +{
> > + if (unlikely(wasMlocked)) {
> > + WARN_ONCE(1, KERN_WARNING
> > + "Page flag mlocked set for process %s at pfn:%05lx\n"
> > + "page:%p flags:0x%lX\n",
>
> 0x%lX is a bit redundunt.
> %lX insert "0x" string by itself, I think.
>

It does not automatically insert the 0x for me and I just did a quick
test there. Can you double check please?

>
> > + current->comm, page_to_pfn(page),
> > + page, page->flags|__PG_MLOCKED);
> > + }
> > +
> > if (unlikely(page_mapcount(page) |
> > (page->mapping != NULL) |
> > (atomic_read(&page->_count) != 0) |
> > @@ -562,7 +570,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
> > kmemcheck_free_shadow(page, order);
> >
> > for (i = 0 ; i < (1 << order) ; ++i)
> > - bad += free_pages_check(page + i);
> > + bad += free_pages_check(page + i, wasMlocked);
> > if (bad)
> > return;
> >
> > @@ -1027,7 +1035,7 @@ static void free_hot_cold_page(struct page *page, int cold)
> >
> > if (PageAnon(page))
> > page->mapping = NULL;
> > - if (free_pages_check(page))
> > + if (free_pages_check(page, wasMlocked))
> > return;
> >
> > if (!PageHighMem(page)) {
>
> Other part looks fine. thanks.
>
>
>

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2009-06-30 16:35:07

by Mel Gorman

[permalink] [raw]
Subject: Re: BUG: Bad page state [was: Strange oopses in 2.6.30]

On Tue, Jun 30, 2009 at 09:31:04AM +0900, KOSAKI Motohiro wrote:
> > -static inline int free_pages_check(struct page *page)
> > -{
> > +static inline int free_pages_check(struct page *page, int wasMlocked)
> > +{
> > + if (unlikely(wasMlocked)) {
> > + WARN_ONCE(1, KERN_WARNING
> > + "Page flag mlocked set for process %s at pfn:%05lx\n"
> > + "page:%p flags:0x%lX\n",
>
> 0x%lX is a bit redundunt.
> %lX insert "0x" string by itself, I think.
>

/me slaps self

As hnaz pointed out to me on IRC, %#lX would have done the job of
putting in the 0x automatically.

==== CUT HERE ====
mm: Warn once when a page is freed with PG_mlocked set

When a page is freed with the PG_mlocked set, it is considered an unexpected
but recoverable situation. A counter records how often this event happens
but it is easy to miss that this event has occured at all. This patch warns
once when PG_mlocked is set to prompt debuggers to check the counter to see
how often it is happening.

Signed-off-by: Mel Gorman <[email protected]>
---
mm/page_alloc.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5d714f8..519ea6e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -495,8 +495,16 @@ static inline void free_page_mlock(struct page *page)
static void free_page_mlock(struct page *page) { }
#endif

-static inline int free_pages_check(struct page *page)
-{
+static inline int free_pages_check(struct page *page, int wasMlocked)
+{
+ if (unlikely(wasMlocked)) {
+ WARN_ONCE(1, KERN_WARNING
+ "Page flag mlocked set for process %s at pfn:%05lx\n"
+ "page:%p flags:%#lX\n",
+ current->comm, page_to_pfn(page),
+ page, page->flags|__PG_MLOCKED);
+ }
+
if (unlikely(page_mapcount(page) |
(page->mapping != NULL) |
(atomic_read(&page->_count) != 0) |
@@ -562,7 +570,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
kmemcheck_free_shadow(page, order);

for (i = 0 ; i < (1 << order) ; ++i)
- bad += free_pages_check(page + i);
+ bad += free_pages_check(page + i, wasMlocked);
if (bad)
return;

@@ -1027,7 +1035,7 @@ static void free_hot_cold_page(struct page *page, int cold)

if (PageAnon(page))
page->mapping = NULL;
- if (free_pages_check(page))
+ if (free_pages_check(page, wasMlocked))
return;

if (!PageHighMem(page)) {

2009-06-30 23:45:59

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: BUG: Bad page state [was: Strange oopses in 2.6.30]

> On Tue, Jun 30, 2009 at 09:31:04AM +0900, KOSAKI Motohiro wrote:
> > > -static inline int free_pages_check(struct page *page)
> > > -{
> > > +static inline int free_pages_check(struct page *page, int wasMlocked)
> > > +{
> > > + if (unlikely(wasMlocked)) {
> > > + WARN_ONCE(1, KERN_WARNING
> > > + "Page flag mlocked set for process %s at pfn:%05lx\n"
> > > + "page:%p flags:0x%lX\n",
> >
> > 0x%lX is a bit redundunt.
> > %lX insert "0x" string by itself, I think.
> >
>
> /me slaps self
>
> As hnaz pointed out to me on IRC, %#lX would have done the job of
> putting in the 0x automatically.

No. it's my fault. my last mail has typo (i forgot to write '#'), sorry.

>
> ==== CUT HERE ====
> mm: Warn once when a page is freed with PG_mlocked set
>
> When a page is freed with the PG_mlocked set, it is considered an unexpected
> but recoverable situation. A counter records how often this event happens
> but it is easy to miss that this event has occured at all. This patch warns
> once when PG_mlocked is set to prompt debuggers to check the counter to see
> how often it is happening.
>
> Signed-off-by: Mel Gorman <[email protected]>
> ---
> mm/page_alloc.c | 16 ++++++++++++----
> 1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5d714f8..519ea6e 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -495,8 +495,16 @@ static inline void free_page_mlock(struct page *page)
> static void free_page_mlock(struct page *page) { }
> #endif
>
> -static inline int free_pages_check(struct page *page)
> -{
> +static inline int free_pages_check(struct page *page, int wasMlocked)
> +{
> + if (unlikely(wasMlocked)) {
> + WARN_ONCE(1, KERN_WARNING
> + "Page flag mlocked set for process %s at pfn:%05lx\n"
> + "page:%p flags:%#lX\n",
> + current->comm, page_to_pfn(page),
> + page, page->flags|__PG_MLOCKED);
> + }
> +
> if (unlikely(page_mapcount(page) |
> (page->mapping != NULL) |
> (atomic_read(&page->_count) != 0) |
> @@ -562,7 +570,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
> kmemcheck_free_shadow(page, order);
>
> for (i = 0 ; i < (1 << order) ; ++i)
> - bad += free_pages_check(page + i);
> + bad += free_pages_check(page + i, wasMlocked);
> if (bad)
> return;
>
> @@ -1027,7 +1035,7 @@ static void free_hot_cold_page(struct page *page, int cold)
>
> if (PageAnon(page))
> page->mapping = NULL;
> - if (free_pages_check(page))
> + if (free_pages_check(page, wasMlocked))
> return;
>
> if (!PageHighMem(page)) {

OK, looks fine.
Reviewed-by: KOSAKI Motohiro <[email protected]>