2005-01-08 09:38:53

by Chris Wright

[permalink] [raw]
Subject: panic on bootup due to __GFP_ZERO patch

I'm getting a panic during pidmap_init with a backtrace that looks
something like:

buffered_rmqueue
__alloc_pages
get_zeroed_page
pidmap_init
start_kernel

Reverting the __GFP_ZERO patch fixes the issue, haven't drilled down
any deeper yet to see what in the patch is causing the problem. This is
x86 w/out HIGHMEM (and no NUMA).

Any ideas?

thanks,
-chris


2005-01-08 20:01:36

by Dave Jones

[permalink] [raw]
Subject: Re: panic on bootup due to __GFP_ZERO patch

On Sat, Jan 08, 2005 at 01:06:30AM -0800, Chris Wright wrote:
> I'm getting a panic during pidmap_init with a backtrace that looks
> something like:
>
> buffered_rmqueue
> __alloc_pages
> get_zeroed_page
> pidmap_init
> start_kernel
>
> Reverting the __GFP_ZERO patch fixes the issue, haven't drilled down
> any deeper yet to see what in the patch is causing the problem. This is
> x86 w/out HIGHMEM (and no NUMA).

ACK, there has been a number of folks hit by this since I updated
the Fedora rawhide kernel to snapshots including this change.

https://bugzilla.redhat.com/beta/show_bug.cgi?id=144480

I've also hit in on my test box that has 256MB.
The pattern so far does seem to be 'no highmem', though
I've not actually tried a recent snapshot on my highmem box.

Dave

2005-01-09 09:45:40

by Andrew Morton

[permalink] [raw]
Subject: Re: panic on bootup due to __GFP_ZERO patch

Chris Wright <[email protected]> wrote:
>
> I'm getting a panic during pidmap_init with a backtrace that looks
> something like:
>
> buffered_rmqueue
> __alloc_pages
> get_zeroed_page
> pidmap_init
> start_kernel
>
> Reverting the __GFP_ZERO patch fixes the issue, haven't drilled down
> any deeper yet to see what in the patch is causing the problem. This is
> x86 w/out HIGHMEM (and no NUMA).
>

Well it's doing clear_highpage() before __alloc_pages() has called
kernel_map_pages(), so CONFIG_DEBUG_PAGEALLOC is quite kaput.

So the current __GFP_ZERO buglist is:

1: Breaks CONFIG_DEBUG_PAGEALLOC

2: Breaks the cache aliasing protection for anonymous pages

3: prep_zero_page() uses KM_USER0 so __GFP_ZERO from IRQ context will
cause rare memory corruption.

2005-01-09 15:56:09

by Zwane Mwaikambo

[permalink] [raw]
Subject: [PATCH] Fixes for prep_zero_page

On Sun, 9 Jan 2005, Andrew Morton wrote:

> Well it's doing clear_highpage() before __alloc_pages() has called
> kernel_map_pages(), so CONFIG_DEBUG_PAGEALLOC is quite kaput.
>
> So the current __GFP_ZERO buglist is:
>
> 1: Breaks CONFIG_DEBUG_PAGEALLOC
>
> 2: Breaks the cache aliasing protection for anonymous pages
>
> 3: prep_zero_page() uses KM_USER0 so __GFP_ZERO from IRQ context will
> cause rare memory corruption.

The following should take care of 1 and 3. I opted to unmap the pages
again after the clear page so that it remains isolated and we don't have
to make additional checks to see if we should unmap the pages.

Signed-off-by: Zwane Mwaikambo <[email protected]>

Index: linux-2.6.10-mm2/include/linux/highmem.h
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-mm2/include/linux/highmem.h,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 highmem.h
--- linux-2.6.10-mm2/include/linux/highmem.h 9 Jan 2005 04:51:52 -0000 1.1.1.1
+++ linux-2.6.10-mm2/include/linux/highmem.h 9 Jan 2005 15:32:17 -0000
@@ -50,6 +50,18 @@ static inline void clear_highpage(struct
kunmap_atomic(kaddr, KM_USER0);
}

+static inline void clear_irq_highpage(struct page *page)
+{
+ char *kaddr;
+ unsigned long flags;
+
+ local_irq_save(flags);
+ kaddr = kmap_atomic(page, KM_IRQ0);
+ clear_page(kaddr);
+ kunmap_atomic(kaddr, KM_IRQ0);
+ local_irq_restore(flags);
+}
+
/*
* Same but also flushes aliased cache contents to RAM.
*/
Index: linux-2.6.10-mm2/mm/page_alloc.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-mm2/mm/page_alloc.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 page_alloc.c
--- linux-2.6.10-mm2/mm/page_alloc.c 9 Jan 2005 04:52:40 -0000 1.1.1.1
+++ linux-2.6.10-mm2/mm/page_alloc.c 9 Jan 2005 15:46:00 -0000
@@ -691,10 +691,17 @@ perthread_pages_alloc(void)
*/
static inline void prep_zero_page(struct page *page, int order)
{
- int i;
+ int i, nr_pages = (1 << order);
+ int context = in_interrupt();

- for(i = 0; i < (1 << order); i++)
- clear_highpage(page + i);
+ kernel_map_pages(page, nr_pages, 1);
+ for(i = 0; i < nr_pages; i++) {
+ if (likely(!context))
+ clear_highpage(page + i);
+ else
+ clear_irq_highpage(page + i);
+ }
+ kernel_map_pages(page, nr_pages, 0);
}

static struct page *

2005-01-09 20:52:43

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Fixes for prep_zero_page

Zwane Mwaikambo <[email protected]> wrote:
>
> On Sun, 9 Jan 2005, Andrew Morton wrote:
>
> > Well it's doing clear_highpage() before __alloc_pages() has called
> > kernel_map_pages(), so CONFIG_DEBUG_PAGEALLOC is quite kaput.
> >
> > So the current __GFP_ZERO buglist is:
> >
> > 1: Breaks CONFIG_DEBUG_PAGEALLOC
> >
> > 2: Breaks the cache aliasing protection for anonymous pages
> >
> > 3: prep_zero_page() uses KM_USER0 so __GFP_ZERO from IRQ context will
> > cause rare memory corruption.
>
> The following should take care of 1 and 3. I opted to unmap the pages
> again after the clear page so that it remains isolated and we don't have
> to make additional checks to see if we should unmap the pages.
>
> Signed-off-by: Zwane Mwaikambo <[email protected]>
>
> Index: linux-2.6.10-mm2/include/linux/highmem.h
> ===================================================================
> RCS file: /home/cvsroot/linux-2.6.10-mm2/include/linux/highmem.h,v
> retrieving revision 1.1.1.1
> diff -u -p -B -r1.1.1.1 highmem.h
> --- linux-2.6.10-mm2/include/linux/highmem.h 9 Jan 2005 04:51:52 -0000 1.1.1.1
> +++ linux-2.6.10-mm2/include/linux/highmem.h 9 Jan 2005 15:32:17 -0000
> @@ -50,6 +50,18 @@ static inline void clear_highpage(struct
> kunmap_atomic(kaddr, KM_USER0);
> }
>
> +static inline void clear_irq_highpage(struct page *page)
> +{
> + char *kaddr;
> + unsigned long flags;
> +
> + local_irq_save(flags);
> + kaddr = kmap_atomic(page, KM_IRQ0);
> + clear_page(kaddr);
> + kunmap_atomic(kaddr, KM_IRQ0);
> + local_irq_restore(flags);
> +}
> +

This won't work right if someone tries to allocate memory while holding a
KM_IRQ0 atomic kmap.

It would be quite bizarre for anyone to be allocating highmem pages from
IRQ context anyway, but as a generic mechanism this really should work as
expected in all contexts. That means a new kmap slot.

> /*
> * Same but also flushes aliased cache contents to RAM.
> */
> Index: linux-2.6.10-mm2/mm/page_alloc.c
> ===================================================================
> RCS file: /home/cvsroot/linux-2.6.10-mm2/mm/page_alloc.c,v
> retrieving revision 1.1.1.1
> diff -u -p -B -r1.1.1.1 page_alloc.c
> --- linux-2.6.10-mm2/mm/page_alloc.c 9 Jan 2005 04:52:40 -0000 1.1.1.1
> +++ linux-2.6.10-mm2/mm/page_alloc.c 9 Jan 2005 15:46:00 -0000
> @@ -691,10 +691,17 @@ perthread_pages_alloc(void)
> */
> static inline void prep_zero_page(struct page *page, int order)
> {
> - int i;
> + int i, nr_pages = (1 << order);
> + int context = in_interrupt();
>
> - for(i = 0; i < (1 << order); i++)
> - clear_highpage(page + i);
> + kernel_map_pages(page, nr_pages, 1);
> + for(i = 0; i < nr_pages; i++) {
> + if (likely(!context))
> + clear_highpage(page + i);
> + else
> + clear_irq_highpage(page + i);
> + }
> + kernel_map_pages(page, nr_pages, 0);
> }

Can't we simply move the page zeroing to the very end of __alloc_pages()?

2005-01-09 21:31:51

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH] Fixes for prep_zero_page

On Sun, 9 Jan 2005, Andrew Morton wrote:

> This won't work right if someone tries to allocate memory while holding a
> KM_IRQ0 atomic kmap.
>
> It would be quite bizarre for anyone to be allocating highmem pages from
> IRQ context anyway, but as a generic mechanism this really should work as
> expected in all contexts. That means a new kmap slot.

I was trying to find users of nested KM_IRQ to verify against so i just
went with the first slot. The problem with sharing a slot with irq context
is that you have to exclude interrupts whilst doing the zeroing too,
unless we maybe create two slots.

> Can't we simply move the page zeroing to the very end of __alloc_pages()?

Ok, i've changed that bit to something like;

Index: linux-2.6.10-mm2/mm/page_alloc.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.10-mm2/mm/page_alloc.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 page_alloc.c
--- linux-2.6.10-mm2/mm/page_alloc.c 9 Jan 2005 04:52:40 -0000 1.1.1.1
+++ linux-2.6.10-mm2/mm/page_alloc.c 9 Jan 2005 21:23:31 -0000
@@ -732,9 +740,6 @@ buffered_rmqueue(struct zone *zone, int
mod_page_state_zone(zone, pgalloc, 1 << order);
prep_new_page(page, order);

- if (gfp_flags & __GFP_ZERO)
- prep_zero_page(page, order);
-
if (order && (gfp_flags & __GFP_COMP))
prep_compound_page(page, order);
}
@@ -935,6 +940,8 @@ nopage:
got_pg:
zone_statistics(zonelist, z);
kernel_map_pages(page, 1 << order, 1);
+ if (gfp_mask & __GFP_ZERO)
+ prep_zero_page(page, order);
return page;
}

2005-01-09 22:48:49

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH] Fixes for prep_zero_page

* Zwane Mwaikambo ([email protected]) wrote:
> On Sun, 9 Jan 2005, Andrew Morton wrote:
> > Can't we simply move the page zeroing to the very end of __alloc_pages()?
>
> Ok, i've changed that bit to something like;

I did it the other way around, and moved kernel_map_pages to prep_new_page
so it's called before zeroing to keep that with the other prep bits
in buffered_rmqueue. Made sense to me that kernel_map_pages is part of
prepping a new page, but this isn't my area, so I could be way off ;-)
It works for me with DEBUG_PAGEALLOC enabled.

===== mm/page_alloc.c 1.251 vs edited =====
--- 1.251/mm/page_alloc.c 2005-01-07 21:44:07 -08:00
+++ edited/mm/page_alloc.c 2005-01-09 14:36:38 -08:00
@@ -413,6 +413,7 @@ static void prep_new_page(struct page *p
1 << PG_checked | 1 << PG_mappedtodisk);
page->private = 0;
set_page_refs(page, order);
+ kernel_map_pages(page, 1 << order, 1);
}

/*
@@ -823,7 +824,6 @@ nopage:
return NULL;
got_pg:
zone_statistics(zonelist, z);
- kernel_map_pages(page, 1 << order, 1);
return page;
}

2005-01-10 04:18:36

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH] Fixes for prep_zero_page

On Sun, 9 Jan 2005, Chris Wright wrote:

> * Zwane Mwaikambo ([email protected]) wrote:
> > On Sun, 9 Jan 2005, Andrew Morton wrote:
> > > Can't we simply move the page zeroing to the very end of __alloc_pages()?
> >
> > Ok, i've changed that bit to something like;
>
> I did it the other way around, and moved kernel_map_pages to prep_new_page
> so it's called before zeroing to keep that with the other prep bits
> in buffered_rmqueue. Made sense to me that kernel_map_pages is part of
> prepping a new page, but this isn't my area, so I could be way off ;-)
> It works for me with DEBUG_PAGEALLOC enabled.

A lot more digestible than my offering ;)

2005-01-13 05:05:40

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH] Fixes for prep_zero_page

It looks like it's still not happy with CONFIG_DEBUG_PAGEALLOC under load.

Unable to handle kernel paging request at virtual address ec5d97f4
printing eip:
c014a882
*pde = 0083e067
Oops: 0000 [#1]
PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in:
CPU: 0
EIP: 0060:[<c014a882>] Not tainted VLI
EFLAGS: 00010002 (2.6.10-mm2)
EIP is at check_slabuse+0x52/0xf0
eax: ec5d97f4 ebx: c18dd180 ecx: ec5b685c edx: ec5d9000
esi: c18dd180 edi: ec5d9000 ebp: c19d1efc esp: c19d1ed4
ds: 007b es: 007b ss: 0068
Process events/0 (pid: 6, threadinfo=c19d0000 task=c19abac0)
Stack: c19d1efc c0149c64 c19d1f0c c0149b0e c19d1ef0 00000000 ec5b685c ec5b685c
c18dd180 c18dd1a0 c19d1f24 c014aa19 c18dd2a0 c19d1f24 c014acec 00000000
c18dd180 c18dd2a0 c18dd180 00000005 c19d1f48 c014af91 c18dd2c8 c18dd2a0
Call Trace:
[<c0103fda>] show_stack+0x7a/0x90
[<c0104166>] show_registers+0x156/0x1c0
[<c0104380>] die+0x100/0x190
[<c0117159>] do_page_fault+0x369/0x65f
[<c0103c6f>] error_code+0x2b/0x30
[<c014aa19>] check_redzone+0xf9/0x140
[<c014af91>] cache_reap+0x221/0x230
[<c0130b2b>] worker_thread+0x17b/0x210
[<c0135605>] kthread+0xa5/0xb0
[<c0101415>] kernel_thread_helper+0x5/0x10
Code: 00 00 8d bc 27 00 00 00 00 83 c4 1c 5b 5e 5f 5d c3 8b 43 38 8b 7d ec
8b 4d f0 0f af f8 8b 410c 01 c7 89 fa 89 d8 e8 ee d3 ff ff <8b> 30 89 fa
89 d8 e8 13 d4 ff ff 81 fe 71 f0 2c 5a 8b 00 74 7b
<3>Debug: sleeping function called from invalid context at
include/linux/rwsem.h:43
in_atomic():1, irqs_disabled():0
[<c0104007>] dump_stack+0x17/0x20
[<c011cb7c>] __might_sleep+0xac/0xc0
[<c0120753>] profile_task_exit+0x23/0x60
[<c012297c>] do_exit+0x1c/0x440
[<c0104408>] die+0x188/0x190
[<c0117159>] do_page_fault+0x369/0x65f
[<c0103c6f>] error_code+0x2b/0x30
[<c014aa19>] check_redzone+0xf9/0x140
[<c014af91>] cache_reap+0x221/0x230
[<c0130b2b>] worker_thread+0x17b/0x210
[<c0135605>] kthread+0xa5/0xb0
[<c0101415>] kernel_thread_helper+0x5/0x10
note: events/0[6] exited with preempt_count 1

2005-01-13 18:29:01

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH] Fixes for prep_zero_page

* Zwane Mwaikambo ([email protected]) wrote:
> It looks like it's still not happy with CONFIG_DEBUG_PAGEALLOC under load.
>
> Unable to handle kernel paging request at virtual address ec5d97f4

Is that in vmalloc space?

> printing eip:
> c014a882
> *pde = 0083e067
> Oops: 0000 [#1]
> PREEMPT SMP DEBUG_PAGEALLOC
> Modules linked in:
> CPU: 0
> EIP: 0060:[<c014a882>] Not tainted VLI
> EFLAGS: 00010002 (2.6.10-mm2)
> EIP is at check_slabuse+0x52/0xf0

Hmm, isn't that from Manfred's patch to periodically scan? Doesn't look
to me like it's related to the page prep fixup. What kind of load, etc?

thanks,
-chris

2005-01-14 00:56:43

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH] Fixes for prep_zero_page

On Thu, 13 Jan 2005, Chris Wright wrote:

> * Zwane Mwaikambo ([email protected]) wrote:
> > It looks like it's still not happy with CONFIG_DEBUG_PAGEALLOC under load.
> >
> > Unable to handle kernel paging request at virtual address ec5d97f4
>
> Is that in vmalloc space?

Hmm it looks like.

> > printing eip:
> > c014a882
> > *pde = 0083e067
> > Oops: 0000 [#1]
> > PREEMPT SMP DEBUG_PAGEALLOC
> > Modules linked in:
> > CPU: 0
> > EIP: 0060:[<c014a882>] Not tainted VLI
> > EFLAGS: 00010002 (2.6.10-mm2)
> > EIP is at check_slabuse+0x52/0xf0
>
> Hmm, isn't that from Manfred's patch to periodically scan? Doesn't look
> to me like it's related to the page prep fixup. What kind of load, etc?

I had an email exchange with Christopher and we came to the conclusion
that it's breakage elsewhere with CONFIG_DEBUG_PAGEALLOC.