2014-06-11 21:34:31

by Mark Salter

Subject: [PATCH] arm64: fix MAX_ORDER for 64K pagesize

With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE
I get this at early boot:

SMP: Total of 8 processors activated.
devtmpfs: initialized
Unable to handle kernel NULL pointer dereference at virtual address 00000008
pgd = fffffe0000050000
[00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
Internal error: Oops: 96000006 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
PC is at __list_add+0x10/0xd4
LR is at free_one_page+0x270/0x638
...
Call trace:
[<fffffe00003ee970>] __list_add+0x10/0xd4
[<fffffe000019c478>] free_one_page+0x26c/0x638
[<fffffe000019c8c8>] __free_pages_ok.part.52+0x84/0xbc
[<fffffe000019d5e8>] __free_pages+0x74/0xbc
[<fffffe0000c01350>] init_cma_reserved_pageblock+0xe8/0x104
[<fffffe0000c24de0>] cma_init_reserved_areas+0x190/0x1e4
[<fffffe0000090418>] do_one_initcall+0xc4/0x154
[<fffffe0000bf0a50>] kernel_init_freeable+0x204/0x2a8
[<fffffe00007520a0>] kernel_init+0xc/0xd4

This happens in this configuration because __free_one_page() is called
with an order greater than MAX_ORDER, accesses past zone->free_list[]
and passes a bogus list_head to list_add().

arch/arm64/Kconfig has:

config FORCE_MAX_ZONEORDER
int
default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
default "11"

So with THP turned off, MAX_ORDER == 11, but init_cma_reserved_pageblock()
passes __free_pages() an order of pageblock_order, which is based on
(HPAGE_SHIFT - PAGE_SHIFT) and comes to 13 for 64K pages. I worked around
this by removing the THP test so FORCE_MAX_ZONEORDER is always 14 for
ARM64_64K_PAGES.
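
A sketch of where those numbers come from, paraphrasing the arm64 and mm
headers of this era (the PMD_SHIFT value assumes the standard arm64
64K-page configuration):

/* arm64 with 64K pages: */
#define PAGE_SHIFT	16			/* 64K base pages */
#define HPAGE_SHIFT	PMD_SHIFT		/* PMD_SHIFT == 29 here */
#define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT)	/* 29 - 16 == 13 */
#define pageblock_order	HUGETLB_PAGE_ORDER

/* include/linux/mmzone.h; with THP off, MAX_ORDER == 11: */
struct free_area	free_area[MAX_ORDER];	/* valid orders 0..MAX_ORDER-1 */

Freeing at order 13 indexes free_area[13], past the end of the array,
which is where the bogus list_head handed to list_add() comes from.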

Signed-off-by: Mark Salter <[email protected]>
---
arch/arm64/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7295419..42a334e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -269,7 +269,7 @@ config XEN

config FORCE_MAX_ZONEORDER
int
- default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
+ default "14" if ARM64_64K_PAGES
default "11"

endmenu
--
1.9.0


2014-06-11 23:03:46

by David Rientjes

Subject: Re: [PATCH] arm64: fix MAX_ORDER for 64K pagesize

On Wed, 11 Jun 2014, Mark Salter wrote:

> With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE
> I get this at early boot:
>
> SMP: Total of 8 processors activated.
> devtmpfs: initialized
> Unable to handle kernel NULL pointer dereference at virtual address 00000008
> pgd = fffffe0000050000
> [00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
> Internal error: Oops: 96000006 [#1] SMP
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
> task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
> PC is at __list_add+0x10/0xd4
> LR is at free_one_page+0x270/0x638
> ...
> Call trace:
> [<fffffe00003ee970>] __list_add+0x10/0xd4
> [<fffffe000019c478>] free_one_page+0x26c/0x638
> [<fffffe000019c8c8>] __free_pages_ok.part.52+0x84/0xbc
> [<fffffe000019d5e8>] __free_pages+0x74/0xbc
> [<fffffe0000c01350>] init_cma_reserved_pageblock+0xe8/0x104
> [<fffffe0000c24de0>] cma_init_reserved_areas+0x190/0x1e4
> [<fffffe0000090418>] do_one_initcall+0xc4/0x154
> [<fffffe0000bf0a50>] kernel_init_freeable+0x204/0x2a8
> [<fffffe00007520a0>] kernel_init+0xc/0xd4
>
> This happens in this configuration because __free_one_page() is called
> with an order greater than MAX_ORDER, accesses past zone->free_list[]
> and passes a bogus list_head to list_add().
>
> arch/arm64/Kconfig has:
>
> config FORCE_MAX_ZONEORDER
> int
> default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
> default "11"
>
> So with THP turned off, MAX_ORDER == 11, but init_cma_reserved_pageblock()
> passes __free_pages() an order of pageblock_order, which is based on
> (HPAGE_SHIFT - PAGE_SHIFT) and comes to 13 for 64K pages. I worked around
> this by removing the THP test so FORCE_MAX_ZONEORDER is always 14 for
> ARM64_64K_PAGES.
>
> Signed-off-by: Mark Salter <[email protected]>
> ---
> arch/arm64/Kconfig | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 7295419..42a334e 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -269,7 +269,7 @@ config XEN
>
> config FORCE_MAX_ZONEORDER
> int
> - default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
> + default "14" if ARM64_64K_PAGES
> default "11"
>
> endmenu

Any reason to not switch this to

ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE && CMA

instead? If pageblock_order > MAX_ORDER because of
HPAGE_SHIFT > PAGE_SHIFT, then cma is always going to be passing a
too-large order to free_pages_prepare() via this path.

Adding Michal and Marek to the cc.

2014-06-11 23:04:33

by David Rientjes

Subject: Re: [PATCH] arm64: fix MAX_ORDER for 64K pagesize

On Wed, 11 Jun 2014, David Rientjes wrote:

> Any reason to not switch this to
>
> ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE && CMA
>

(ARM64_64K_PAGES) && (TRANSPARENT_HUGEPAGE || CMA)

> instead? If pageblock_order > MAX_ORDER because of
> HPAGE_SHIFT > PAGE_SHIFT, then cma is always going to be passing a
> too-large order to free_pages_prepare() via this path.
>
> Adding Michal and Marek to the cc.
>
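
Spelled out against the existing Kconfig entry, the suggested condition
would read (illustrative fragment only, not a posted patch):

config FORCE_MAX_ZONEORDER
	int
	default "14" if ARM64_64K_PAGES && (TRANSPARENT_HUGEPAGE || CMA)
	default "11"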

2014-06-12 13:58:12

by Mark Salter

Subject: Re: [PATCH] arm64: fix MAX_ORDER for 64K pagesize

On Wed, 2014-06-11 at 16:04 -0700, David Rientjes wrote:
> On Wed, 11 Jun 2014, David Rientjes wrote:
>
> > Any reason to not switch this to
> >
> > ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE && CMA
> >
>
> (ARM64_64K_PAGES) && (TRANSPARENT_HUGEPAGE || CMA)

And add HUGETLB to the list also? I'm not sure of all the trade-offs
here, so I kept it simple. I don't have a strong opinion one way or
the other.

2014-06-17 18:32:15

by Michal Nazarewicz

Subject: Re: [PATCH] arm64: fix MAX_ORDER for 64K pagesize

On Wed, Jun 11 2014, David Rientjes wrote:
> On Wed, 11 Jun 2014, Mark Salter wrote:
>
>> With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE
>> I get this at early boot:
>>
>> SMP: Total of 8 processors activated.
>> devtmpfs: initialized
>> Unable to handle kernel NULL pointer dereference at virtual address 00000008
>> pgd = fffffe0000050000
>> [00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
>> Internal error: Oops: 96000006 [#1] SMP
>> Modules linked in:
>> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
>> task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
>> PC is at __list_add+0x10/0xd4
>> LR is at free_one_page+0x270/0x638
>> ...
>> Call trace:
>> [<fffffe00003ee970>] __list_add+0x10/0xd4
>> [<fffffe000019c478>] free_one_page+0x26c/0x638
>> [<fffffe000019c8c8>] __free_pages_ok.part.52+0x84/0xbc
>> [<fffffe000019d5e8>] __free_pages+0x74/0xbc
>> [<fffffe0000c01350>] init_cma_reserved_pageblock+0xe8/0x104
>> [<fffffe0000c24de0>] cma_init_reserved_areas+0x190/0x1e4
>> [<fffffe0000090418>] do_one_initcall+0xc4/0x154
>> [<fffffe0000bf0a50>] kernel_init_freeable+0x204/0x2a8
>> [<fffffe00007520a0>] kernel_init+0xc/0xd4
>>
>> This happens in this configuration because __free_one_page() is called
>> with an order greater than MAX_ORDER, accesses past zone->free_list[]
>> and passes a bogus list_head to list_add().
>>
>> arch/arm64/Kconfig has:
>>
>> config FORCE_MAX_ZONEORDER
>> int
>> default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
>> default "11"
>>
>> So with THP turned off, MAX_ORDER == 11, but init_cma_reserved_pageblock()
>> passes __free_pages() an order of pageblock_order, which is based on
>> (HPAGE_SHIFT - PAGE_SHIFT) and comes to 13 for 64K pages. I worked around
>> this by removing the THP test so FORCE_MAX_ZONEORDER is always 14 for
>> ARM64_64K_PAGES.
>>
>> Signed-off-by: Mark Salter <[email protected]>
>> ---
>> arch/arm64/Kconfig | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 7295419..42a334e 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -269,7 +269,7 @@ config XEN
>>
>> config FORCE_MAX_ZONEORDER
>> int
>> - default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
>> + default "14" if ARM64_64K_PAGES
>> default "11"
>>
>> endmenu
>
> Any reason to not switch this to
>
> ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE && CMA
>
> instead? If pageblock_order > MAX_ORDER because of
> HPAGE_SHIFT > PAGE_SHIFT, then cma is always going to be passing a
> too-large order to free_pages_prepare() via this path.
>
> Adding Michal and Marek to the cc.

The correct fix would be to change init_cma_reserved_pageblock() such that
it checks whether pageblock_order > MAX_ORDER and, if so, frees each
MAX_ORDER page of the pageblock individually:

--------- >8 ---------------------------------------------------------
From: Michal Nazarewicz <[email protected]>
Subject: [PATCH] mm: cma: fix cases where pageblock is bigger than MAX_ORDER

With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE,
the following is triggered at early boot:

SMP: Total of 8 processors activated.
devtmpfs: initialized
Unable to handle kernel NULL pointer dereference at virtual address 00000008
pgd = fffffe0000050000
[00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
Internal error: Oops: 96000006 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
PC is at __list_add+0x10/0xd4
LR is at free_one_page+0x270/0x638
...
Call trace:
[<fffffe00003ee970>] __list_add+0x10/0xd4
[<fffffe000019c478>] free_one_page+0x26c/0x638
[<fffffe000019c8c8>] __free_pages_ok.part.52+0x84/0xbc
[<fffffe000019d5e8>] __free_pages+0x74/0xbc
[<fffffe0000c01350>] init_cma_reserved_pageblock+0xe8/0x104
[<fffffe0000c24de0>] cma_init_reserved_areas+0x190/0x1e4
[<fffffe0000090418>] do_one_initcall+0xc4/0x154
[<fffffe0000bf0a50>] kernel_init_freeable+0x204/0x2a8
[<fffffe00007520a0>] kernel_init+0xc/0xd4

This happens in this configuration because __free_one_page() is called
with an order greater than MAX_ORDER, accesses past zone->free_list[]
and passes a bogus list_head to list_add().

arch/arm64/Kconfig has:

config FORCE_MAX_ZONEORDER
int
default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
default "11"

So with THP turned off, MAX_ORDER == 11, but init_cma_reserved_pageblock()
passes __free_pages() an order of pageblock_order, which is based on
(HPAGE_SHIFT - PAGE_SHIFT) and comes to 13 for 64K pages.

Fix the problem by changing init_cma_reserved_pageblock() such that it
splits the pageblock into individual MAX_ORDER pages if the pageblock is
bigger than a MAX_ORDER page.

Signed-off-by: Michal Nazarewicz <[email protected]>
Reported-by: Mark Salter <[email protected]>
---
mm/page_alloc.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5dba293..6e657ce 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -801,7 +801,15 @@ void __init init_cma_reserved_pageblock(struct page *page)

set_page_refcounted(page);
set_pageblock_migratetype(page, MIGRATE_CMA);
- __free_pages(page, pageblock_order);
+ if (pageblock_order > MAX_ORDER) {
+ struct page *subpage = page;
+ unsigned count = 1 << (pageblock_order - MAX_ORDER);
+ do {
+ __free_pages(subpage, pageblock_order);
+ } while (subpage += MAX_ORDER_NR_PAGES, --count);
+ } else {
+ __free_pages(page, pageblock_order);
+ }
adjust_managed_page_count(page, pageblock_nr_pages);
}
#endif
--------- >8 ---------------------------------------------------------

Thoughts? This has not been tested and I think it may cause performance
degradation in some cases since pageblock_order is not always
a constant, so the comparison may end up not being stripped away even on
systems where it's always false.
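
For context, the constant vs. variable cases come from (paraphrasing
include/linux/pageblock-flags.h):

#ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE
/* set at boot, so the "pageblock_order > MAX_ORDER" test survives */
extern unsigned int pageblock_order;
#else
/* compile-time constant; an always-false comparison is optimised away */
#define pageblock_order	HUGETLB_PAGE_ORDER
#endif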

--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michał “mina86” Nazarewicz (o o)
ooo +--<[email protected]>--<xmpp:[email protected]>--ooO--(_)--Ooo--

2014-06-19 18:12:44

by Mark Salter

Subject: Re: [PATCH] arm64: fix MAX_ORDER for 64K pagesize

On Tue, 2014-06-17 at 20:32 +0200, Michal Nazarewicz wrote:
> On Wed, Jun 11 2014, David Rientjes wrote:
> > On Wed, 11 Jun 2014, Mark Salter wrote:
> >
> >> With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE
> >> I get this at early boot:
> >>
> >> SMP: Total of 8 processors activated.
> >> devtmpfs: initialized
> >> Unable to handle kernel NULL pointer dereference at virtual address 00000008
> >> pgd = fffffe0000050000
> >> [00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
> >> Internal error: Oops: 96000006 [#1] SMP
> >> Modules linked in:
> >> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
> >> task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
> >> PC is at __list_add+0x10/0xd4
> >> LR is at free_one_page+0x270/0x638
> >> ...
> >> Call trace:
> >> [<fffffe00003ee970>] __list_add+0x10/0xd4
> >> [<fffffe000019c478>] free_one_page+0x26c/0x638
> >> [<fffffe000019c8c8>] __free_pages_ok.part.52+0x84/0xbc
> >> [<fffffe000019d5e8>] __free_pages+0x74/0xbc
> >> [<fffffe0000c01350>] init_cma_reserved_pageblock+0xe8/0x104
> >> [<fffffe0000c24de0>] cma_init_reserved_areas+0x190/0x1e4
> >> [<fffffe0000090418>] do_one_initcall+0xc4/0x154
> >> [<fffffe0000bf0a50>] kernel_init_freeable+0x204/0x2a8
> >> [<fffffe00007520a0>] kernel_init+0xc/0xd4
> >>
> >> This happens in this configuration because __free_one_page() is called
> >> with an order greater than MAX_ORDER, accesses past zone->free_list[]
> >> and passes a bogus list_head to list_add().
> >>
> >> arch/arm64/Kconfig has:
> >>
> >> config FORCE_MAX_ZONEORDER
> >> int
> >> default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
> >> default "11"
> >>
> >> So with THP turned off, MAX_ORDER == 11, but init_cma_reserved_pageblock()
> >> passes __free_pages() an order of pageblock_order, which is based on
> >> (HPAGE_SHIFT - PAGE_SHIFT) and comes to 13 for 64K pages. I worked around
> >> this by removing the THP test so FORCE_MAX_ZONEORDER is always 14 for
> >> ARM64_64K_PAGES.
> >>
> >> Signed-off-by: Mark Salter <[email protected]>
> >> ---
> >> arch/arm64/Kconfig | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> >> index 7295419..42a334e 100644
> >> --- a/arch/arm64/Kconfig
> >> +++ b/arch/arm64/Kconfig
> >> @@ -269,7 +269,7 @@ config XEN
> >>
> >> config FORCE_MAX_ZONEORDER
> >> int
> >> - default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
> >> + default "14" if ARM64_64K_PAGES
> >> default "11"
> >>
> >> endmenu
> >
> > Any reason to not switch this to
> >
> > ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE && CMA
> >
> > instead? If pageblock_order > MAX_ORDER because of
> > HPAGE_SHIFT > PAGE_SHIFT, then cma is always going to be passing a
> > too-large order to free_pages_prepare() via this path.
> >
> > Adding Michal and Marek to the cc.
>
> The correct fix would be to change init_cma_reserved_pageblock() such that
> it checks whether pageblock_order > MAX_ORDER and, if so, frees each
> MAX_ORDER page of the pageblock individually:
>
> --------- >8 ---------------------------------------------------------
> From: Michal Nazarewicz <[email protected]>
> Subject: [PATCH] mm: cma: fix cases where pageblock is bigger than MAX_ORDER
>
> With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE,
> the following is triggered at early boot:
>
> SMP: Total of 8 processors activated.
> devtmpfs: initialized
> Unable to handle kernel NULL pointer dereference at virtual address 00000008
> pgd = fffffe0000050000
> [00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
> Internal error: Oops: 96000006 [#1] SMP
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
> task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
> PC is at __list_add+0x10/0xd4
> LR is at free_one_page+0x270/0x638
> ...
> Call trace:
> [<fffffe00003ee970>] __list_add+0x10/0xd4
> [<fffffe000019c478>] free_one_page+0x26c/0x638
> [<fffffe000019c8c8>] __free_pages_ok.part.52+0x84/0xbc
> [<fffffe000019d5e8>] __free_pages+0x74/0xbc
> [<fffffe0000c01350>] init_cma_reserved_pageblock+0xe8/0x104
> [<fffffe0000c24de0>] cma_init_reserved_areas+0x190/0x1e4
> [<fffffe0000090418>] do_one_initcall+0xc4/0x154
> [<fffffe0000bf0a50>] kernel_init_freeable+0x204/0x2a8
> [<fffffe00007520a0>] kernel_init+0xc/0xd4
>
> This happens in this configuration because __free_one_page() is called
> with an order greater than MAX_ORDER, accesses past zone->free_list[]
> and passes a bogus list_head to list_add().
>
> arch/arm64/Kconfig has:
>
> config FORCE_MAX_ZONEORDER
> int
> default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
> default "11"
>
> So with THP turned off, MAX_ORDER == 11, but init_cma_reserved_pageblock()
> passes __free_pages() an order of pageblock_order, which is based on
> (HPAGE_SHIFT - PAGE_SHIFT) and comes to 13 for 64K pages.
>
> Fix the problem by changing init_cma_reserved_pageblock() such that it
> splits the pageblock into individual MAX_ORDER pages if the pageblock is
> bigger than a MAX_ORDER page.
>
> Signed-off-by: Michal Nazarewicz <[email protected]>
> Reported-by: Mark Salter <[email protected]>
> ---
> mm/page_alloc.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5dba293..6e657ce 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -801,7 +801,15 @@ void __init init_cma_reserved_pageblock(struct page *page)
>
> set_page_refcounted(page);
> set_pageblock_migratetype(page, MIGRATE_CMA);
> - __free_pages(page, pageblock_order);
> + if (pageblock_order > MAX_ORDER) {
> + struct page *subpage = page;
> + unsigned count = 1 << (pageblock_order - MAX_ORDER);
> + do {
> + __free_pages(subpage, pageblock_order);
^^^^^^^
MAX_ORDER

> + } while (subpage += MAX_ORDER_NR_PAGES, --count);
> + } else {
> + __free_pages(page, pageblock_order);
> + }
> adjust_managed_page_count(page, pageblock_nr_pages);
> }
> #endif
> --------- >8 ---------------------------------------------------------
>
> Thoughts? This has not been tested and I think it may cause performance
> degradation in some cases since pageblock_order is not always
> a constant, so the comparison may end up not being stripped away even on
> systems where it's always false.
>

This works with the above tweak. So it fixes the problem here, but I'm
not sure if we'd get bitten elsewhere by pageblock_order > MAX_ORDER.
It will be slower, but it only gets called a few times at most, at
boot time, right?

2014-06-19 19:24:11

by Michal Nazarewicz

Subject: Re: [PATCH] arm64: fix MAX_ORDER for 64K pagesize

On Thu, Jun 19 2014, Mark Salter <[email protected]> wrote:
> On Tue, 2014-06-17 at 20:32 +0200, Michal Nazarewicz wrote:
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 5dba293..6e657ce 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -801,7 +801,15 @@ void __init init_cma_reserved_pageblock(struct page *page)
>>
>> set_page_refcounted(page);
>> set_pageblock_migratetype(page, MIGRATE_CMA);
>> - __free_pages(page, pageblock_order);
>> + if (pageblock_order > MAX_ORDER) {
>> + struct page *subpage = p;
>> + unsigned count = 1 << (pageblock_order - MAX_ORDER);
>> + do {
>> + __free_pages(subpage, pageblock_order);
> ^^^^^^^
> MAX_ORDER

D'oh! I'll send a revised patch.

>> + } while (subpage += MAX_ORDER_NR_PAGES, --count);
>> + } else {
>> + __free_pages(page, pageblock_order);
>> + }
>> adjust_managed_page_count(page, pageblock_nr_pages);
>> }
>> #endif
>> --------- >8 ---------------------------------------------------------
>>
>> Thoughts? This has not been tested and I think it may cause performance
>> degradation in some cases since pageblock_order is not always
>> a constant, so the comparison may end up not being stripped away even on
>> systems where it's always false.

> This works with the above tweak. So it fixes the problem here, but I'm
> not sure if we'd get bitten elsewhere by pageblock_order > MAX_ORDER.

This is always a possibility, but in such cases, it's a bug in CMA.
I've tried to keep in mind that pageblock_order may be greater than
MAX_ORDER when writing CMA, but I've never tested on such a system.

> It will be slower, but it only gets called a few times at most, at
> boot time, right?

Yes. The performance degradation should be negligible since
init_cma_reserved is hardly a critical path and is called at most
MAX_CMA_AREAS times which by default is 8. And I mean it will be slower
because it will have to perform a branch.
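
(For reference, the 8 comes from, roughly, the DMA contiguous allocator
header of this era:

#define MAX_CMA_AREAS	(1 + CONFIG_CMA_AREAS)

with CONFIG_CMA_AREAS defaulting to 7 in Kconfig.)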


--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michał “mina86” Nazarewicz (o o)
ooo +--<[email protected]>--<xmpp:[email protected]>--ooO--(_)--Ooo--

2014-06-19 19:53:40

by Michal Nazarewicz

Subject: [PATCHv2] mm: page_alloc: fix CMA area initialisation when pageblock > MAX_ORDER

With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE,
the following is triggered at early boot:

SMP: Total of 8 processors activated.
devtmpfs: initialized
Unable to handle kernel NULL pointer dereference at virtual address 00000008
pgd = fffffe0000050000
[00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
Internal error: Oops: 96000006 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
PC is at __list_add+0x10/0xd4
LR is at free_one_page+0x270/0x638
...
Call trace:
[<fffffe00003ee970>] __list_add+0x10/0xd4
[<fffffe000019c478>] free_one_page+0x26c/0x638
[<fffffe000019c8c8>] __free_pages_ok.part.52+0x84/0xbc
[<fffffe000019d5e8>] __free_pages+0x74/0xbc
[<fffffe0000c01350>] init_cma_reserved_pageblock+0xe8/0x104
[<fffffe0000c24de0>] cma_init_reserved_areas+0x190/0x1e4
[<fffffe0000090418>] do_one_initcall+0xc4/0x154
[<fffffe0000bf0a50>] kernel_init_freeable+0x204/0x2a8
[<fffffe00007520a0>] kernel_init+0xc/0xd4

This happens because init_cma_reserved_pageblock() calls
__free_one_page() with pageblock_order as page order but it is bigger
than MAX_ORDER. This in turn causes accesses past zone->free_list[].

Fix the problem by changing init_cma_reserved_pageblock() such that it
splits the pageblock into individual MAX_ORDER pages if the pageblock is
bigger than a MAX_ORDER page.

In cases where !CONFIG_HUGETLB_PAGE_SIZE_VARIABLE, which is all
architectures except for ia64, powerpc and tile at the moment, the
“pageblock_order > MAX_ORDER” condition will be optimised out since
both sides of the operator are constants. In cases where pageblock
size is variable, the performance degradation should not be
significant anyway since init_cma_reserved_pageblock() is called
only at boot time at most MAX_CMA_AREAS times which by default is
eight.

Signed-off-by: Michal Nazarewicz <[email protected]>
Reported-by: Mark Salter <[email protected]>
---
mm/page_alloc.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7f97767..fe114db 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -817,7 +817,18 @@ void __init init_cma_reserved_pageblock(struct page *page)

set_page_refcounted(page);
set_pageblock_migratetype(page, MIGRATE_CMA);
- __free_pages(page, pageblock_order);
+
+ if (pageblock_order > MAX_ORDER) {
+ i = pageblock_order - MAX_ORDER;
+ i = 1 << i;
+ p = page;
+ do {
+ __free_pages(p, MAX_ORDER);
+ } while (p += MAX_ORDER_NR_PAGES, --i);
+ } else {
+ __free_pages(page, pageblock_order);
+ }
+
adjust_managed_page_count(page, pageblock_nr_pages);
}
#endif
--
2.0.0.526.g5318336

2014-06-20 13:54:17

by Christopher Covington

Subject: Re: [PATCHv2] mm: page_alloc: fix CMA area initialisation when pageblock > MAX_ORDER

On 06/19/2014 03:53 PM, Michal Nazarewicz wrote:
> With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE,
> the following is triggered at early boot:
>
> SMP: Total of 8 processors activated.
> devtmpfs: initialized
> Unable to handle kernel NULL pointer dereference at virtual address 00000008
> pgd = fffffe0000050000
> [00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
> Internal error: Oops: 96000006 [#1] SMP
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
> task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
> PC is at __list_add+0x10/0xd4
> LR is at free_one_page+0x270/0x638
> ...
> Call trace:
> [<fffffe00003ee970>] __list_add+0x10/0xd4
> [<fffffe000019c478>] free_one_page+0x26c/0x638
> [<fffffe000019c8c8>] __free_pages_ok.part.52+0x84/0xbc
> [<fffffe000019d5e8>] __free_pages+0x74/0xbc
> [<fffffe0000c01350>] init_cma_reserved_pageblock+0xe8/0x104
> [<fffffe0000c24de0>] cma_init_reserved_areas+0x190/0x1e4
> [<fffffe0000090418>] do_one_initcall+0xc4/0x154
> [<fffffe0000bf0a50>] kernel_init_freeable+0x204/0x2a8
> [<fffffe00007520a0>] kernel_init+0xc/0xd4

I just ran into this. Thanks for the fix.

Tested-by: Christopher Covington <[email protected]>

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.

2014-06-20 15:48:43

by Mark Salter

Subject: Re: [PATCHv2] mm: page_alloc: fix CMA area initialisation when pageblock > MAX_ORDER

On Thu, 2014-06-19 at 21:53 +0200, Michal Nazarewicz wrote:
> With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE,
> the following is triggered at early boot:
>
> SMP: Total of 8 processors activated.
> devtmpfs: initialized
> Unable to handle kernel NULL pointer dereference at virtual address 00000008
> pgd = fffffe0000050000
> [00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
> Internal error: Oops: 96000006 [#1] SMP
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
> task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
> PC is at __list_add+0x10/0xd4
> LR is at free_one_page+0x270/0x638
> ...
> Call trace:
> [<fffffe00003ee970>] __list_add+0x10/0xd4
> [<fffffe000019c478>] free_one_page+0x26c/0x638
> [<fffffe000019c8c8>] __free_pages_ok.part.52+0x84/0xbc
> [<fffffe000019d5e8>] __free_pages+0x74/0xbc
> [<fffffe0000c01350>] init_cma_reserved_pageblock+0xe8/0x104
> [<fffffe0000c24de0>] cma_init_reserved_areas+0x190/0x1e4
> [<fffffe0000090418>] do_one_initcall+0xc4/0x154
> [<fffffe0000bf0a50>] kernel_init_freeable+0x204/0x2a8
> [<fffffe00007520a0>] kernel_init+0xc/0xd4
>
> This happens because init_cma_reserved_pageblock() calls
> __free_one_page() with pageblock_order as page order but it is bigger
> than MAX_ORDER. This in turn causes accesses past zone->free_list[].
>
> Fix the problem by changing init_cma_reserved_pageblock() such that it
> splits the pageblock into individual MAX_ORDER pages if the pageblock is
> bigger than a MAX_ORDER page.
>
> In cases where !CONFIG_HUGETLB_PAGE_SIZE_VARIABLE, which is all
> architectures except for ia64, powerpc and tile at the moment, the
> “pageblock_order > MAX_ORDER” condition will be optimised out since
> both sides of the operator are constants. In cases where pageblock
> size is variable, the performance degradation should not be
> significant anyway since init_cma_reserved_pageblock() is called
> only at boot time at most MAX_CMA_AREAS times which by default is
> eight.
>
> Signed-off-by: Michal Nazarewicz <[email protected]>
> Reported-by: Mark Salter <[email protected]>
> ---
> mm/page_alloc.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7f97767..fe114db 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -817,7 +817,18 @@ void __init init_cma_reserved_pageblock(struct page *page)
>
> set_page_refcounted(page);
> set_pageblock_migratetype(page, MIGRATE_CMA);
> - __free_pages(page, pageblock_order);
> +
> + if (pageblock_order > MAX_ORDER) {
> + i = pageblock_order - MAX_ORDER;
> + i = 1 << i;
> + p = page;
> + do {
> + __free_pages(p, MAX_ORDER);
> + } while (p += MAX_ORDER_NR_PAGES, --i);
> + } else {
> + __free_pages(page, pageblock_order);
> + }
> +
> adjust_managed_page_count(page, pageblock_nr_pages);
> }
> #endif

This still isn't quite right. __free_pages can only take up to
MAX_ORDER-1 (MAX_ORDER_NR_PAGES is 1 << (MAX_ORDER - 1)). But
I'm hitting a slightly different issue even with that fixed up.
Still looking...
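
For reference, the definitions in play, paraphrased from
include/linux/mmzone.h:

#define MAX_ORDER_NR_PAGES	(1 << (MAX_ORDER - 1))

struct zone {
	...
	struct free_area	free_area[MAX_ORDER];	/* orders 0..MAX_ORDER-1 */
	...
};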

2014-06-20 16:37:06

by Michal Nazarewicz

Subject: Re: [PATCHv2] mm: page_alloc: fix CMA area initialisation when pageblock > MAX_ORDER

On Fri, Jun 20 2014, Mark Salter <[email protected]> wrote:
> This still isn't quite right. __free_pages can only take up to
> MAX_ORDER-1 (MAX_ORDER_NR_PAGES is 1 << (MAX_ORDER - 1)).

Good catch. I'll send v3 in a few days then.

--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michał “mina86” Nazarewicz (o o)
ooo +--<[email protected]>--<xmpp:[email protected]>--ooO--(_)--Ooo--

2014-06-20 17:37:51

by Mark Salter

Subject: Re: [PATCH] arm64: fix MAX_ORDER for 64K pagesize

On Thu, 2014-06-19 at 21:24 +0200, Michal Nazarewicz wrote:
> On Thu, Jun 19 2014, Mark Salter <[email protected]> wrote:
> > On Tue, 2014-06-17 at 20:32 +0200, Michal Nazarewicz wrote:
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index 5dba293..6e657ce 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -801,7 +801,15 @@ void __init init_cma_reserved_pageblock(struct page *page)
> >>
> >> set_page_refcounted(page);
> >> set_pageblock_migratetype(page, MIGRATE_CMA);
> >> - __free_pages(page, pageblock_order);
> >> + if (pageblock_order > MAX_ORDER) {
> >> + struct page *subpage = page;
> >> + unsigned count = 1 << (pageblock_order - MAX_ORDER);
> >> + do {
> >> + __free_pages(subpage, pageblock_order);
> > ^^^^^^^
> > MAX_ORDER
>
> D'oh! I'll send a revised patch.
>
> >> + } while (subpage += MAX_ORDER_NR_PAGES, --count);
> >> + } else {
> >> + __free_pages(page, pageblock_order);
> >> + }
> >> adjust_managed_page_count(page, pageblock_nr_pages);
> >> }
> >> #endif
> >> --------- >8 ---------------------------------------------------------
> >>
> >> Thoughts? This has not been tested and I think it may cause performance
> >> degradation in some cases since pageblock_order is not always
> >> a constant, so the comparison may end up not being stripped away even on
> >> systems where it's always false.
>
> > This works with the above tweak. So it fixes the problem here, but I'm
> > not sure if we'd get bitten elsewhere by pageblock_order > MAX_ORDER.
>
> This is always a possibility, but in such cases, it's a bug in CMA.
> I've tried to keep in mind that pageblock_order may be greater than
> MAX_ORDER when writing CMA, but I've never tested on such a system.
>
> > It will be slower, but it only gets called a few times at most, at
> > boot time, right?
>
> Yes. The performance degradation should be negligible since
> init_cma_reserved is hardly a critical path and is called at most
> MAX_CMA_AREAS times which by default is 8. And I mean it will be slower
> because it will have to perform a branch.
>

I ended up needing this (on top of your patch) to get the system to
boot. Each MAX_ORDER-1 group needs the refcount and migratetype set so
that __free_pages does the right thing.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 02fb1ed..a7ca6cc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -799,17 +799,18 @@ void __init init_cma_reserved_pageblock(struct page *page)
set_page_count(p, 0);
} while (++p, --i);

- set_page_refcounted(page);
- set_pageblock_migratetype(page, MIGRATE_CMA);
-
- if (pageblock_order > MAX_ORDER) {
- i = pageblock_order - MAX_ORDER;
+ if (pageblock_order >= MAX_ORDER) {
+ i = pageblock_order - MAX_ORDER + 1;
i = 1 << i;
p = page;
do {
- __free_pages(p, MAX_ORDER);
+ set_page_refcounted(p);
+ set_pageblock_migratetype(p, MIGRATE_CMA);
+ __free_pages(p, MAX_ORDER - 1);
} while (p += MAX_ORDER_NR_PAGES, --i);
} else {
+ set_page_refcounted(page);
+ set_pageblock_migratetype(page, MIGRATE_CMA);
__free_pages(page, pageblock_order);
}


2014-06-23 19:40:56

by Michal Nazarewicz

Subject: [PATCHv3] mm: page_alloc: fix CMA area initialisation when pageblock > MAX_ORDER

With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE,
the following is triggered at early boot:

SMP: Total of 8 processors activated.
devtmpfs: initialized
Unable to handle kernel NULL pointer dereference at virtual address 00000008
pgd = fffffe0000050000
[00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
Internal error: Oops: 96000006 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
PC is at __list_add+0x10/0xd4
LR is at free_one_page+0x270/0x638
...
Call trace:
[<fffffe00003ee970>] __list_add+0x10/0xd4
[<fffffe000019c478>] free_one_page+0x26c/0x638
[<fffffe000019c8c8>] __free_pages_ok.part.52+0x84/0xbc
[<fffffe000019d5e8>] __free_pages+0x74/0xbc
[<fffffe0000c01350>] init_cma_reserved_pageblock+0xe8/0x104
[<fffffe0000c24de0>] cma_init_reserved_areas+0x190/0x1e4
[<fffffe0000090418>] do_one_initcall+0xc4/0x154
[<fffffe0000bf0a50>] kernel_init_freeable+0x204/0x2a8
[<fffffe00007520a0>] kernel_init+0xc/0xd4

This happens because init_cma_reserved_pageblock() calls
__free_one_page() with pageblock_order as page order but it is bigger
than MAX_ORDER. This in turn causes accesses past zone->free_list[].

Fix the problem by changing init_cma_reserved_pageblock() such that it
splits the pageblock into individual MAX_ORDER pages if the pageblock is
bigger than a MAX_ORDER page.

In cases where !CONFIG_HUGETLB_PAGE_SIZE_VARIABLE, which is all
architectures except for ia64, powerpc and tile at the moment, the
“pageblock_order > MAX_ORDER” condition will be optimised out since
both sides of the operator are constants. In cases where pageblock
size is variable, the performance degradation should not be
significant anyway since init_cma_reserved_pageblock() is called
only at boot time at most MAX_CMA_AREAS times which by default is
eight.

Cc: [email protected]
Signed-off-by: Michal Nazarewicz <[email protected]>
Reported-by: Mark Salter <[email protected]>
Tested-by: Christopher Covington <[email protected]>
---
mm/page_alloc.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)

Mark Salter wrote:
> I ended up needing this (on top of your patch) to get the system to
> boot. Each MAX_ORDER-1 group needs the refcount and migratetype set
> so that __free_pages does the right thing.
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 02fb1ed..a7ca6cc 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -799,17 +799,18 @@ void __init init_cma_reserved_pageblock(struct page *page)
> set_page_count(p, 0);
> } while (++p, --i);
>
> - set_page_refcounted(page);
> - set_pageblock_migratetype(page, MIGRATE_CMA);
> -
> - if (pageblock_order > MAX_ORDER) {
> - i = pageblock_order - MAX_ORDER;
> + if (pageblock_order >= MAX_ORDER) {
> + i = pageblock_order - MAX_ORDER + 1;
> i = 1 << i;
> p = page;
> do {
> - __free_pages(p, MAX_ORDER);
> + set_page_refcounted(p);
> + set_pageblock_migratetype(p, MIGRATE_CMA);
> + __free_pages(p, MAX_ORDER - 1);
> } while (p += MAX_ORDER_NR_PAGES, --i);
> } else {
> + set_page_refcounted(page);
> + set_pageblock_migratetype(page, MIGRATE_CMA);
> __free_pages(page, pageblock_order);
> }

This is kinda embarrassing, dunno how I missed that.

But each page actually does not need to have migratetype set, does it?
All of those pages are in a single pageblock so a single call
suffices. If you track set_pageblock_migratetype down to pfn_to_bitidx
there is:

return (pfn >> pageblock_order) * NR_PAGEBLOCK_BITS;

so for pfns inside the same pageblock, the low bits are truncated away
and they all map to the same bit index. Or did I miss yet another thing?
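
For example, with pageblock_order == 13, any two pfns in the same
8192-page pageblock give the same index:

(0x2000 >> 13) * NR_PAGEBLOCK_BITS == 1 * NR_PAGEBLOCK_BITS
(0x3fff >> 13) * NR_PAGEBLOCK_BITS == 1 * NR_PAGEBLOCK_BITS

so repeating set_pageblock_migratetype() on each MAX_ORDER chunk would
just rewrite the same bits.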

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ee92384..fef9614 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -816,9 +816,21 @@ void __init init_cma_reserved_pageblock(struct page *page)
set_page_count(p, 0);
} while (++p, --i);

- set_page_refcounted(page);
set_pageblock_migratetype(page, MIGRATE_CMA);
- __free_pages(page, pageblock_order);
+
+ if (pageblock_order >= MAX_ORDER) {
+ i = pageblock_nr_pages;
+ p = page;
+ do {
+ set_page_refcounted(p);
+ __free_pages(p, MAX_ORDER - 1);
+ p += MAX_ORDER_NR_PAGES;
+ } while (i -= MAX_ORDER_NR_PAGES);
+ } else {
+ set_page_refcounted(page);
+ __free_pages(page, pageblock_order);
+ }
+
adjust_managed_page_count(page, pageblock_nr_pages);
}
#endif
--
2.0.0.526.g5318336

2014-06-23 21:10:45

by Mark Salter

Subject: Re: [PATCHv3] mm: page_alloc: fix CMA area initialisation when pageblock > MAX_ORDER

On Mon, 2014-06-23 at 21:40 +0200, Michal Nazarewicz wrote:
> With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE,
> the following is triggered at early boot:
>
> SMP: Total of 8 processors activated.
> devtmpfs: initialized
> Unable to handle kernel NULL pointer dereference at virtual address 00000008
> pgd = fffffe0000050000
> [00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
> Internal error: Oops: 96000006 [#1] SMP
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
> task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
> PC is at __list_add+0x10/0xd4
> LR is at free_one_page+0x270/0x638
> ...
> Call trace:
> [<fffffe00003ee970>] __list_add+0x10/0xd4
> [<fffffe000019c478>] free_one_page+0x26c/0x638
> [<fffffe000019c8c8>] __free_pages_ok.part.52+0x84/0xbc
> [<fffffe000019d5e8>] __free_pages+0x74/0xbc
> [<fffffe0000c01350>] init_cma_reserved_pageblock+0xe8/0x104
> [<fffffe0000c24de0>] cma_init_reserved_areas+0x190/0x1e4
> [<fffffe0000090418>] do_one_initcall+0xc4/0x154
> [<fffffe0000bf0a50>] kernel_init_freeable+0x204/0x2a8
> [<fffffe00007520a0>] kernel_init+0xc/0xd4
>
> This happens because init_cma_reserved_pageblock() calls
> __free_one_page() with pageblock_order as page order but it is bigger
> than MAX_ORDER. This in turn causes accesses past zone->free_list[].
>
> Fix the problem by changing init_cma_reserved_pageblock() such that it
> splits the pageblock into individual MAX_ORDER pages if the pageblock is
> bigger than a MAX_ORDER page.
>
> In cases where !CONFIG_HUGETLB_PAGE_SIZE_VARIABLE, which is all
> architectures except for ia64, powerpc and tile at the moment, the
> “pageblock_order > MAX_ORDER” condition will be optimised out since
> both sides of the operator are constants. In cases where pageblock
> size is variable, the performance degradation should not be
> significant anyway since init_cma_reserved_pageblock() is called
> only at boot time at most MAX_CMA_AREAS times which by default is
> eight.
>
> Cc: [email protected]
> Signed-off-by: Michal Nazarewicz <[email protected]>
> Reported-by: Mark Salter <[email protected]>
> Tested-by: Christopher Covington <[email protected]>
> ---
> mm/page_alloc.c | 16 ++++++++++++++--
> 1 file changed, 14 insertions(+), 2 deletions(-)
>
> Mark Salter wrote:
> > I ended up needing this (on top of your patch) to get the system to
> > boot. Each MAX_ORDER-1 group needs the refcount and migratetype set
> > so that __free_pages does the right thing.
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 02fb1ed..a7ca6cc 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -799,17 +799,18 @@ void __init init_cma_reserved_pageblock(struct page *page)
> > set_page_count(p, 0);
> > } while (++p, --i);
> >
> > - set_page_refcounted(page);
> > - set_pageblock_migratetype(page, MIGRATE_CMA);
> > -
> > - if (pageblock_order > MAX_ORDER) {
> > - i = pageblock_order - MAX_ORDER;
> > + if (pageblock_order >= MAX_ORDER) {
> > + i = pageblock_order - MAX_ORDER + 1;
> > i = 1 << i;
> > p = page;
> > do {
> > - __free_pages(p, MAX_ORDER);
> > + set_page_refcounted(p);
> > + set_pageblock_migratetype(p, MIGRATE_CMA);
> > + __free_pages(p, MAX_ORDER - 1);
> > } while (p += MAX_ORDER_NR_PAGES, --i);
> > } else {
> > + set_page_refcounted(page);
> > + set_pageblock_migratetype(page, MIGRATE_CMA);
> > __free_pages(page, pageblock_order);
> > }
>
> This is kinda embarrassing, dunno how I missed that.
>
> But each page actually does not need to have migratetype set, does it?
> All of those pages are in a single pageblock so a single call
> suffices. If you track set_pageblock_migratetype down to pfn_to_bitidx
> there is:
>
> return (pfn >> pageblock_order) * NR_PAGEBLOCK_BITS;
>
> so for pfns inside the same pageblock, the low bits are truncated away
> and they all map to the same bit index. Or did I miss yet another thing?

Nope, my turn to miss something. You only need to set migrate type
once per pageblock.

>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ee92384..fef9614 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -816,9 +816,21 @@ void __init init_cma_reserved_pageblock(struct page *page)
> set_page_count(p, 0);
> } while (++p, --i);
>
> - set_page_refcounted(page);
> set_pageblock_migratetype(page, MIGRATE_CMA);
> - __free_pages(page, pageblock_order);
> +
> + if (pageblock_order >= MAX_ORDER) {
> + i = pageblock_nr_pages;
> + p = page;
> + do {
> + set_page_refcounted(p);
> + __free_pages(p, MAX_ORDER - 1);
> + p += MAX_ORDER_NR_PAGES;
> + } while (i -= MAX_ORDER_NR_PAGES);
> + } else {
> + set_page_refcounted(page);
> + __free_pages(page, pageblock_order);
> + }
> +
> adjust_managed_page_count(page, pageblock_nr_pages);
> }
> #endif

This version works for me. Thanks.