2003-11-16 10:15:40

by Russell King

[permalink] [raw]
Subject: Bootmem broke ARM

Andrew & others,

2.6 contains a change to init_bootmem_core() which now sorts the nodes
according to their start pfn. This change occurred in revision 1.20 of
bootmem.c. Unfortunately, this active sorting broke ARM discontig memory
support.

With previous kernels, the nodes are added to the list in reverse order,
so architecture code knew we had to add the highest PFN first and the
lowest PFN node last.

However, we now sort the nodes using node_start_pfn, which, at this point,
will be uninitialised - the responsibility for initialising this field
is with the generic code - in free_area_init_node() which occurs well
after bootmem has been initialised.

The result of this change is that we now add nodes to the tail of the
pgdat list, which is the opposite way to 2.4.

This causes problems for ARM because we need to use bootmem to initialise
the kernels page tables, and we can only allocate these from node 0 - none
of the other nodes are mapped into memory at this point.

I, therefore, believe this change is bogus. Can it be reverted please?

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core


2003-11-16 20:06:39

by Andrew Morton

[permalink] [raw]
Subject: Re: Bootmem broke ARM

Russell King <[email protected]> wrote:
>
> Andrew & others,
>
> 2.6 contains a change to init_bootmem_core() which now sorts the nodes
> according to their start pfn. This change occurred in revision 1.20 of
> bootmem.c. Unfortunately, this active sorting broke ARM discontig memory
> support.
>
> With previous kernels, the nodes are added to the list in reverse order,
> so architecture code knew we had to add the highest PFN first and the
> lowest PFN node last.
>
> However, we now sort the nodes using node_start_pfn, which, at this point,
> will be uninitialised - the responsibility for initialising this field
> is with the generic code - in free_area_init_node() which occurs well
> after bootmem has been initialised.
>
> The result of this change is that we now add nodes to the tail of the
> pgdat list, which is the opposite way to 2.4.
>
> This causes problems for ARM because we need to use bootmem to initialise
> the kernels page tables, and we can only allocate these from node 0 - none
> of the other nodes are mapped into memory at this point.
>
> I, therefore, believe this change is bogus. Can it be reverted please?
>

It looks to be bogus on ia64 as well, for which the patch was written.

Or maybe ia64 _does_ arrange for the node_start_pfn to be initialised
before init_bootmem_core(), but I cannot see where. So the attempt to sort
the pgdat list in there doesn't actually sort it at all - it simply
reverses it by accident.

Jesse, it looks like this needs to be revisited please.

2003-11-17 18:05:21

by Jesse Barnes

[permalink] [raw]
Subject: Re: Bootmem broke ARM

On Sun, Nov 16, 2003 at 12:11:31PM -0800, Andrew Morton wrote:
> Russell King <[email protected]> wrote:
> > With previous kernels, the nodes are added to the list in reverse order,
> > so architecture code knew we had to add the highest PFN first and the
> > lowest PFN node last.

You're right, I think the arch code should probably worry about this.

> It looks to be bogus on ia64 as well, for which the patch was written.

Yep, I think it is bogus. There's only one caller on ia64 that would be
affected--swiotlb_init(), and afaik multi-node systems won't be using
that code (except maybe NEC?), so even if the pgdat list is out of order
we should be ok. If not I'll fix the ia64 discontig code.

Thanks,
Jesse

===== mm/bootmem.c 1.22 vs edited =====
--- 1.22/mm/bootmem.c Sun Sep 28 10:11:27 2003
+++ edited/mm/bootmem.c Mon Nov 17 10:44:59 2003
@@ -48,24 +48,8 @@
bootmem_data_t *bdata = pgdat->bdata;
unsigned long mapsize = ((end - start)+7)/8;

-
- /*
- * sort pgdat_list so that the lowest one comes first,
- * which makes alloc_bootmem_low_pages work as desired.
- */
- if (!pgdat_list || pgdat_list->node_start_pfn > pgdat->node_start_pfn) {
- pgdat->pgdat_next = pgdat_list;
- pgdat_list = pgdat;
- } else {
- pg_data_t *tmp = pgdat_list;
- while (tmp->pgdat_next) {
- if (tmp->pgdat_next->node_start_pfn > pgdat->node_start_pfn)
- break;
- tmp = tmp->pgdat_next;
- }
- pgdat->pgdat_next = tmp->pgdat_next;
- tmp->pgdat_next = pgdat;
- }
+ pgdat->pgdat_next = pgdat_list;
+ pgdat_list = pgdat;

mapsize = (mapsize + (sizeof(long) - 1UL)) & ~(sizeof(long) - 1UL);
bdata->node_bootmem_map = phys_to_virt(mapstart << PAGE_SHIFT);

2003-11-18 09:42:12

by Russell King

[permalink] [raw]
Subject: Re: Bootmem broke ARM

On Mon, Nov 17, 2003 at 10:04:40AM -0800, Jesse Barnes wrote:
> On Sun, Nov 16, 2003 at 12:11:31PM -0800, Andrew Morton wrote:
> > Russell King <[email protected]> wrote:
> > > With previous kernels, the nodes are added to the list in reverse order,
> > > so architecture code knew we had to add the highest PFN first and the
> > > lowest PFN node last.
>
> You're right, I think the arch code should probably worry about this.
>
> > It looks to be bogus on ia64 as well, for which the patch was written.
>
> Yep, I think it is bogus. There's only one caller on ia64 that would be
> affected--swiotlb_init(), and afaik multi-node systems won't be using
> that code (except maybe NEC?), so even if the pgdat list is out of order
> we should be ok. If not I'll fix the ia64 discontig code.

Ok, I've received word from the original reporter on the ARM lists that
this patch does indeed solve their problem.

Linus - can this be merged for 2.6.0-test10 please, or do you consider
this too large a change?

===== mm/bootmem.c 1.22 vs edited =====
--- 1.22/mm/bootmem.c Sun Sep 28 10:11:27 2003
+++ edited/mm/bootmem.c Mon Nov 17 10:44:59 2003
@@ -48,24 +48,8 @@
bootmem_data_t *bdata = pgdat->bdata;
unsigned long mapsize = ((end - start)+7)/8;

-
- /*
- * sort pgdat_list so that the lowest one comes first,
- * which makes alloc_bootmem_low_pages work as desired.
- */
- if (!pgdat_list || pgdat_list->node_start_pfn > pgdat->node_start_pfn) {
- pgdat->pgdat_next = pgdat_list;
- pgdat_list = pgdat;
- } else {
- pg_data_t *tmp = pgdat_list;
- while (tmp->pgdat_next) {
- if (tmp->pgdat_next->node_start_pfn > pgdat->node_start_pfn)
- break;
- tmp = tmp->pgdat_next;
- }
- pgdat->pgdat_next = tmp->pgdat_next;
- tmp->pgdat_next = pgdat;
- }
+ pgdat->pgdat_next = pgdat_list;
+ pgdat_list = pgdat;

mapsize = (mapsize + (sizeof(long) - 1UL)) & ~(sizeof(long) - 1UL);
bdata->node_bootmem_map = phys_to_virt(mapstart << PAGE_SHIFT);

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core

2003-11-18 09:57:32

by Russell King

[permalink] [raw]
Subject: Re: Bootmem broke ARM

On Tue, Nov 18, 2003 at 09:42:07AM +0000, Russell King wrote:
> Linus - can this be merged for 2.6.0-test10 please, or do you consider
> this too large a change?

Ignore me - its already in. 8)

Thanks.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core