2008-01-29 19:43:05

by Yinghai Lu

Subject: [PATCH 4/4] x86_64: increase MAX_EARLY_RES for NODE_DATA and bootmap

[PATCH 4/4] x86_64: increase MAX_EARLY_RES for NODE_DATA and bootmap

otherwise early_node_mem() will use up these entries on an 8-node system

Signed-off-by: Yinghai Lu <[email protected]>

diff --git a/arch/x86/kernel/e820_64.c b/arch/x86/kernel/e820_64.c
index f8b7beb..e3d3815 100644
--- a/arch/x86/kernel/e820_64.c
+++ b/arch/x86/kernel/e820_64.c
@@ -50,7 +50,7 @@ static unsigned long __initdata end_user_pfn = MAXMEM>>PAGE_SHIFT;
/*
* Early reserved memory areas.
*/
-#define MAX_EARLY_RES 20
+#define MAX_EARLY_RES 30

struct early_res {
unsigned long start, end;
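
A rough slot count shows why 20 entries run out once NODE_DATA and the per-node
bootmap go through early_res (a back-of-the-envelope sketch; the set of fixed
reservations is taken from the patches later in this thread, and the numbers are
illustrative, not from a real machine):

/*
 * Fixed reservations seen in this thread: BIOS data page, SMP trampoline,
 * kernel text/data/bss, EBDA, ramdisk, early page tables, memnodemap.
 */
#define FIXED_EARLY_RES		7
#define PER_NODE_EARLY_RES	2	/* NODE_DATA + bootmap per node */

static int early_res_needed(int nodes)
{
	return FIXED_EARLY_RES + PER_NODE_EARLY_RES * nodes;
}

/* early_res_needed(8) == 23, already past the old limit of 20 */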


2008-01-30 02:59:48

by Andi Kleen

Subject: Re: [PATCH 4/4] x86_64: increase MAX_EARLY_RES for NODE_DATA and bootmap

On Tuesday 29 January 2008 20:16, Yinghai Lu wrote:
> [PATCH 4/4] x86_64: increase MAX_EARLY_RES for NODE_DATA and bootmap
>
> otherwise early_node_mem() will use up these entries on an 8-node system

Yes, this was the problem with my early_reserve node bootmem patch:
it effectively adds a limit on the number of nodes.

But even with the increase, the limit is far too small. It is probably best not
to use the patch. In theory it should not have been needed anyway, because
there is nothing that needs reserving here: there are no interfering users.

Whatever your problem is, it needs to be solved differently.

-Andi

2008-01-30 03:09:18

by Yinghai Lu

Subject: Re: [PATCH 4/4] x86_64: increase MAX_EARLY_RES for NODE_DATA and bootmap

On Tuesday 29 January 2008 06:57:54 pm Andi Kleen wrote:
> On Tuesday 29 January 2008 20:16, Yinghai Lu wrote:
> > [PATCH 4/4] x86_64: increase MAX_EARLY_RES for NODE_DATA and bootmap
> >
> > otherwise early_node_mem() will use up these entries on an 8-node system
>
> Yes, this was the problem with my early_reserve node bootmem patch:
> it effectively adds a limit on the number of nodes.
>
> But even with the increase, the limit is far too small. It is probably best not
> to use the patch. In theory it should not have been needed anyway, because
> there is nothing that needs reserving here: there are no interfering users.
>
> Whatever your problem is, it needs to be solved differently.

ok, discard patches 3 and 4.

how about patch 2 v2?

YH

2008-01-31 13:25:09

by Ingo Molnar

Subject: Re: [PATCH 4/4] x86_64: increase MAX_EARLY_RES for NODE_DATA and bootmap


* Yinghai Lu <[email protected]> wrote:

> ok, discard patches 3 and 4.
>
> how about patch 2 v2?

i'm leaning towards v4, but the more fundamental breakage is in the
early_node_mem() ad-hoc allocator that got butchered into this code a
year ago:

commit a8062231d80239cf3405982858c02aea21a6066a
Author: Andi Kleen <[email protected]>
Date: Fri Apr 7 19:49:21 2006 +0200

[PATCH] x86_64: Handle empty PXMs that only contain hotplug memory

...
+static void * __init
+early_node_mem(int nodeid, unsigned long start, unsigned long end,
+ unsigned long size)

and we are now suffering the side-effects of that hack.

what i suspect we need instead is a proper early-allocator that works in
the e820 space.

Ingo

2008-01-31 13:34:45

by Andi Kleen

Subject: Re: [PATCH 4/4] x86_64: increase MAX_EARLY_RES for NODE_DATA and bootmap

On Thursday 31 January 2008 14:24:38 Ingo Molnar wrote:
>
> * Yinghai Lu <[email protected]> wrote:
>
> > ok, discard patches 3 and 4.
> >
> > how about patch 2 v2?
>
> i'm leaning towards v4, but the more fundamental breakage is in the
> early_node_mem() ad-hoc allocator that got butchered into this code a
> year ago:

No, it has nothing to do with early_node_mem(), which is just a thin
wrapper around find_e820_area() anyway.

I think the problem is that the page alignment in bad_addr() and friends is not
always correct. E.g. the early reservation for the kernel in head64.c really needs
to be rounded up to a page boundary. I suspect that is the core of the problem,
though I am not 100% sure yet.
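
A small standalone illustration of that hazard, using the end-of-bss address that
later shows up in the bug report in this thread (everything else here is made up
for the example):

#include <stdio.h>

#define PAGE_SHIFT	12

int main(void)
{
	unsigned long end_of_bss = 0xd406f4;	/* _end, not page aligned */

	/* A reservation that stops at _end leaves the rest of this page
	 * looking free, so an allocator that does not round up can hand
	 * out an area starting right at _end ... */
	unsigned long alloc = end_of_bss;

	/* ... but a page-granular user such as the bootmem bitmap works
	 * from the page boundary, which still holds live .bss data. */
	unsigned long page_start = (alloc >> PAGE_SHIFT) << PAGE_SHIFT;

	printf("allocation at %#lx falls in page %#lx\n", alloc, page_start);
	return 0;
}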

Note this was broken even before the early reservation code went in; the only
difference is that it was all hard-coded in bad_addr() then.

There were various hacks around this in the past, but none fixed the problem
completely.

> commit a8062231d80239cf3405982858c02aea21a6066a
> Author: Andi Kleen <[email protected]>
> Date: Fri Apr 7 19:49:21 2006 +0200
>
> [PATCH] x86_64: Handle empty PXMs that only contain hotplug memory
>
> ...
> +static void * __init
> +early_node_mem(int nodeid, unsigned long start, unsigned long end,
> + unsigned long size)
>
> and we are now suffering the side-effects of that hack.
>
> what i suspect we need instead is a proper early-allocator that works in
> the e820 space.

That is find_e820_area(), or rather find_e820_area() plus reserve_early() now.

I had this implemented as a shrink-wrapped function earlier for lockdep too,
but dropped the patch because there was a nasty ordering issue with the e820
command line parsing that I could not easily resolve.

-Andi
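
A sketch of the kind of shrink-wrapped helper Andi describes above, simply
combining the two primitives that already exist at this point in the thread; the
helper name is made up, and the signatures are the pre-alignment-parameter ones:

static unsigned long __init early_alloc_e820(unsigned long start,
					     unsigned long end,
					     unsigned size)
{
	unsigned long addr = find_e820_area(start, end, size);

	if (addr == -1UL)
		return addr;		/* let the caller decide how to fail */

	/* keep later find_e820_area() searches away from this range */
	reserve_early(addr, addr + size);
	return addr;
}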

2008-01-31 20:37:42

by Yinghai Lu

Subject: [PATCH] x86_64: add debug name for early_res

[PATCH] x86_64: add debug name for early_res

give each early reservation a name, so that the overlapping-reservation panic
and the early_res-to-bootmem printout can say which range is which.

Signed-off-by: Yinghai Lu <[email protected]>

Index: linux-2.6/arch/x86/kernel/e820_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820_64.c
+++ linux-2.6/arch/x86/kernel/e820_64.c
@@ -54,30 +54,33 @@ static unsigned long __initdata end_user

struct early_res {
unsigned long start, end;
+ char name[16];
};
static struct early_res early_res[MAX_EARLY_RES] __initdata = {
- { 0, PAGE_SIZE }, /* BIOS data page */
+ { 0, PAGE_SIZE, "BIOS data page" }, /* BIOS data page */
#ifdef CONFIG_SMP
- { SMP_TRAMPOLINE_BASE, SMP_TRAMPOLINE_BASE + 2*PAGE_SIZE },
+ { SMP_TRAMPOLINE_BASE, SMP_TRAMPOLINE_BASE + 2*PAGE_SIZE, "SMP_TRAMPOLINE" },
#endif
{}
};

-void __init reserve_early(unsigned long start, unsigned long end)
+void __init reserve_early(unsigned long start, unsigned long end, char *name)
{
int i;
struct early_res *r;
for (i = 0; i < MAX_EARLY_RES && early_res[i].end; i++) {
r = &early_res[i];
if (end > r->start && start < r->end)
- panic("Overlapping early reservations %lx-%lx to %lx-%lx\n",
- start, end, r->start, r->end);
+ panic("Overlapping early reservations %lx-%lx %s to %lx-%lx %s\n",
+ start, end - 1, name?name:"", r->start, r->end - 1, r->name);
}
if (i >= MAX_EARLY_RES)
panic("Too many early reservations");
r = &early_res[i];
r->start = start;
r->end = end;
+ if (name)
+ strncpy(r->name, name, sizeof(r->name) - 1);
}

void __init early_res_to_bootmem(void)
@@ -85,6 +88,8 @@ void __init early_res_to_bootmem(void)
int i;
for (i = 0; i < MAX_EARLY_RES && early_res[i].end; i++) {
struct early_res *r = &early_res[i];
+ printk(KERN_INFO "early res: %d [%lx-%lx] %s\n", i,
+ r->start, r->end - 1, r->name);
reserve_bootmem_generic(r->start, r->end - r->start);
}
}
Index: linux-2.6/arch/x86/kernel/head64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/head64.c
+++ linux-2.6/arch/x86/kernel/head64.c
@@ -75,7 +75,7 @@ static __init void reserve_ebda(void)
if (ebda_size > 64*1024)
ebda_size = 64*1024;

- reserve_early(ebda_addr, ebda_addr + ebda_size);
+ reserve_early(ebda_addr, ebda_addr + ebda_size, "EBDA");
}

void __init x86_64_start_kernel(char * real_mode_data)
@@ -105,14 +105,14 @@ void __init x86_64_start_kernel(char * r
pda_init(0);
copy_bootdata(__va(real_mode_data));

- reserve_early(__pa_symbol(&_text), __pa_symbol(&_end));
+ reserve_early(__pa_symbol(&_text), __pa_symbol(&_end), "TEXT DATA BSS");

/* Reserve INITRD */
if (boot_params.hdr.type_of_loader && boot_params.hdr.ramdisk_image) {
unsigned long ramdisk_image = boot_params.hdr.ramdisk_image;
unsigned long ramdisk_size = boot_params.hdr.ramdisk_size;
unsigned long ramdisk_end = ramdisk_image + ramdisk_size;
- reserve_early(ramdisk_image, ramdisk_end);
+ reserve_early(ramdisk_image, ramdisk_end, "RAMDISK");
}

reserve_ebda();
Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -420,7 +420,7 @@ void __init_refok init_memory_mapping(un
mmu_cr4_features = read_cr4();
__flush_tlb_all();

- reserve_early(table_start << PAGE_SHIFT, table_end << PAGE_SHIFT);
+ reserve_early(table_start << PAGE_SHIFT, table_end << PAGE_SHIFT, "PGTABLE");
}

#ifndef CONFIG_NUMA
Index: linux-2.6/arch/x86/mm/numa_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/numa_64.c
+++ linux-2.6/arch/x86/mm/numa_64.c
@@ -103,7 +103,7 @@ static int __init allocate_cachealigned_
}
pad_addr = (nodemap_addr + pad) & ~pad;
memnodemap = phys_to_virt(pad_addr);
- reserve_early(nodemap_addr, nodemap_addr + nodemap_size);
+ reserve_early(nodemap_addr, nodemap_addr + nodemap_size, "MEMNODEMAP");

printk(KERN_DEBUG "NUMA: Allocated memnodemap from %lx - %lx\n",
nodemap_addr, nodemap_addr + nodemap_size);
Index: linux-2.6/include/asm-x86/e820_64.h
===================================================================
--- linux-2.6.orig/include/asm-x86/e820_64.h
+++ linux-2.6/include/asm-x86/e820_64.h
@@ -41,7 +41,7 @@ extern void finish_e820_parsing(void);
extern struct e820map e820;
extern void update_e820(void);

-extern void reserve_early(unsigned long start, unsigned long end);
+extern void reserve_early(unsigned long start, unsigned long end, char *name);
extern void early_res_to_bootmem(void);

#endif/*!__ASSEMBLY__*/
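
With the name field in place, the early_res_to_bootmem() loop prints one line per
reservation at boot; the format string comes from the patch, and the addresses
below are only illustrative (apart from the BIOS data page):

early res: 0 [0-fff] BIOS data page
early res: 1 [6000-7fff] SMP_TRAMPOLINE
early res: 2 [200000-d406f3] TEXT DATA BSS
early res: 3 [9fc00-9ffff] EBDA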

2008-01-31 20:38:11

by Yinghai Lu

Subject: [PATCH] x86_64: make bootmap_start page align v6

[PATCH] x86_64: make bootmap_start page align v6

needs to be applied on top of "x86_64: add debug name for early_res"

boot oops when the system has 64g or 128g installed

Calling initcall 0xffffffff80bc33b6: sctp_init+0x0/0x711()
BUG: unable to handle kernel NULL pointer dereference at 000000000000005f
IP: [<ffffffff802bfe55>] proc_register+0xe7/0x10f
PGD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #6
RIP: 0010:[<ffffffff802bfe55>] [<ffffffff802bfe55>] proc_register+0xe7/0x10f
RSP: 0000:ffff810824c57e60 EFLAGS: 00010246
RAX: 000000000000d7d7 RBX: ffff811024c5fa80 RCX: ffff810824c57e08
RDX: 0000000000000000 RSI: 0000000000000195 RDI: ffffffff80cc2460
RBP: ffffffffffffffff R08: 0000000000000000 R09: ffff811024c5fa80
R10: 0000000000000000 R11: 0000000000000002 R12: ffff810824c57e6c
R13: 0000000000000000 R14: ffff810824c57ee0 R15: 00000006abd25bee
FS: 0000000000000000(0000) GS:ffffffff80b4d000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000000005f CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff810824c56000, task ffff812024c52000)
Stack: ffffffff80a57348 0000019500000000 ffff811024c5fa80 0000000000000000
00000000ffffff97 ffffffff802bfef0 0000000000000000 ffffffffffffffff
0000000000000000 ffffffff80bc3b4b ffff810824c57ee0 ffffffff80bc34a5
Call Trace:
[<ffffffff802bfef0>] ? create_proc_entry+0x73/0x8a
[<ffffffff80bc3b4b>] ? sctp_snmp_proc_init+0x1c/0x34
[<ffffffff80bc34a5>] ? sctp_init+0xef/0x711
[<ffffffff80b976e3>] ? kernel_init+0x175/0x2e1
[<ffffffff8020ccf8>] ? child_rip+0xa/0x12
[<ffffffff80b9756e>] ? kernel_init+0x0/0x2e1
[<ffffffff8020ccee>] ? child_rip+0x0/0x12


Code: 1e 48 83 7b 38 00 75 08 48 c7 43 38 f0 e8 82 80 48 83 7b 30 00 75 08 48 c7 43 30 d0 e9 82 80 48 c7 c7 60 24 cc 80 e8 bd 5a 54 00 <48> 8b 45 60 48 89 6b 58 48 89 5d 60 48 89 43 50 fe 05 f5 25 a0
RIP [<ffffffff802bfe55>] proc_register+0xe7/0x10f
RSP <ffff810824c57e60>
CR2: 000000000000005f
---[ end trace 02c2d78def82877a ]---
Kernel panic - not syncing: Attempted to kill init!

it turns out that some variables near the end of .bss are already corrupted.

in System.map we have:
ffffffff80d40420 b rsi_table
ffffffff80d40620 B krb5_seq_lock
ffffffff80d40628 b i.20437
ffffffff80d40630 b xprt_rdma_inline_write_padding
ffffffff80d40638 b sunrpc_table_header
ffffffff80d40640 b zero
ffffffff80d40644 b min_memreg
ffffffff80d40648 b rpcrdma_tk_lock_g
ffffffff80d40650 B sctp_assocs_id_lock
ffffffff80d40658 B proc_net_sctp
ffffffff80d40660 B sctp_assocs_id
ffffffff80d40680 B sysctl_sctp_mem
ffffffff80d40690 B sysctl_sctp_rmem
ffffffff80d406a0 B sysctl_sctp_wmem
ffffffff80d406b0 b sctp_ctl_socket
ffffffff80d406b8 b sctp_pf_inet6_specific
ffffffff80d406c0 b sctp_pf_inet_specific
ffffffff80d406c8 b sctp_af_v4_specific
ffffffff80d406d0 b sctp_af_v6_specific
ffffffff80d406d8 b sctp_rand.33270
ffffffff80d406dc b sctp_memory_pressure
ffffffff80d406e0 b sctp_sockets_allocated
ffffffff80d406e4 b sctp_memory_allocated
ffffffff80d406e8 b sctp_sysctl_header
ffffffff80d406f0 b zero
ffffffff80d406f4 A __bss_stop
ffffffff80d406f4 A _end

and setup_node_bootmem() will then use that page, 0xd40000, for the bootmap:
Bootmem setup node 0 0000000000000000-0000000828000000
NODE_DATA [000000000008a485 - 0000000000091484]
bootmap [0000000000d406f4 - 0000000000e456f3] pages 105
Bootmem setup node 1 0000000828000000-0000001028000000
NODE_DATA [0000000828000000 - 0000000828006fff]
bootmap [0000000828007000 - 0000000828106fff] pages 100
Bootmem setup node 2 0000001028000000-0000001828000000
NODE_DATA [0000001028000000 - 0000001028006fff]
bootmap [0000001028007000 - 0000001028106fff] pages 100
Bootmem setup node 3 0000001828000000-0000002028000000
NODE_DATA [0000001828000000 - 0000001828006fff]
bootmap [0000001828007000 - 0000001828106fff] pages 100

actually, setup_node_bootmem() expects NODE_DATA to be cache-line aligned,
with the bootmap following it at a page-aligned address.

the patch updates find_e820_area() to take an explicit alignment argument,
so callers get back an address with the alignment they asked for.
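
The alignment step the patch adds to find_e820_area() is the usual power-of-two
round-up; a standalone check with the end-of-bss address from the report above
(PAGE_SIZE assumed to be 4096):

#include <stdio.h>

int main(void)
{
	unsigned long align = 4096;		/* PAGE_SIZE */
	unsigned long mask  = ~(align - 1);
	unsigned long addr  = 0xd406f4;		/* not page aligned */

	addr = (addr + align - 1) & mask;	/* same arithmetic as the patch */
	printf("%#lx\n", addr);			/* 0xd41000, clear of .bss */
	return 0;
}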

Signed-off-by: Yinghai Lu <[email protected]>

Index: linux-2.6/arch/x86/kernel/e820_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820_64.c
+++ linux-2.6/arch/x86/kernel/e820_64.c
@@ -171,12 +171,13 @@ int __init e820_all_mapped(unsigned long
}

/*
- * Find a free area in a specific range.
+ * Find a free area with specified alignment in a specific range.
*/
unsigned long __init find_e820_area(unsigned long start, unsigned long end,
- unsigned size)
+ unsigned size, unsigned long align)
{
int i;
+ unsigned long mask = ~(align - 1);

for (i = 0; i < e820.nr_map; i++) {
struct e820entry *ei = &e820.map[i];
@@ -190,7 +191,8 @@ unsigned long __init find_e820_area(unsi
continue;
while (bad_addr(&addr, size) && addr+size <= ei->addr+ei->size)
;
- last = PAGE_ALIGN(addr) + size;
+ addr = (addr + align - 1) & mask;
+ last = addr + size;
if (last > ei->addr + ei->size)
continue;
if (last > end)
Index: linux-2.6/arch/x86/kernel/setup_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup_64.c
+++ linux-2.6/arch/x86/kernel/setup_64.c
@@ -182,7 +182,8 @@ contig_initmem_init(unsigned long start_
unsigned long bootmap_size, bootmap;

bootmap_size = bootmem_bootmap_pages(end_pfn)<<PAGE_SHIFT;
- bootmap = find_e820_area(0, end_pfn<<PAGE_SHIFT, bootmap_size);
+ bootmap = find_e820_area(0, end_pfn<<PAGE_SHIFT, bootmap_size,
+ PAGE_SIZE);
if (bootmap == -1L)
panic("Cannot find bootmem map of size %ld\n", bootmap_size);
bootmap_size = init_bootmem(bootmap >> PAGE_SHIFT, end_pfn);
Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -354,17 +354,10 @@ static void __init find_early_table_spac
* need roughly 0.5KB per GB.
*/
start = 0x8000;
- table_start = find_e820_area(start, end, tables);
+ table_start = find_e820_area(start, end, tables, PAGE_SIZE);
if (table_start == -1UL)
panic("Cannot find space for the kernel page tables");

- /*
- * When you have a lot of RAM like 256GB, early_table will not fit
- * into 0x8000 range, find_e820_area() will find area after kernel
- * bss but the table_start is not page aligned, so need to round it
- * up to avoid overlap with bss:
- */
- table_start = round_up(table_start, PAGE_SIZE);
table_start >>= PAGE_SHIFT;
table_end = table_start;

@@ -420,7 +413,9 @@ void __init_refok init_memory_mapping(un
mmu_cr4_features = read_cr4();
__flush_tlb_all();

- reserve_early(table_start << PAGE_SHIFT, table_end << PAGE_SHIFT, "PGTABLE");
+ if (!after_bootmem)
+ reserve_early(table_start << PAGE_SHIFT,
+ table_end << PAGE_SHIFT, "PGTABLE");
}

#ifndef CONFIG_NUMA
Index: linux-2.6/arch/x86/mm/numa_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/numa_64.c
+++ linux-2.6/arch/x86/mm/numa_64.c
@@ -84,25 +84,23 @@ static int __init populate_memnodemap(co

static int __init allocate_cachealigned_memnodemap(void)
{
- unsigned long pad, pad_addr;
+ unsigned long addr;

memnodemap = memnode.embedded_map;
if (memnodemapsize <= ARRAY_SIZE(memnode.embedded_map))
return 0;

- pad = L1_CACHE_BYTES - 1;
- pad_addr = 0x8000;
- nodemap_size = pad + sizeof(s16) * memnodemapsize;
- nodemap_addr = find_e820_area(pad_addr, end_pfn<<PAGE_SHIFT,
- nodemap_size);
+ addr = 0x8000;
+ nodemap_size = round_up(sizeof(s16) * memnodemapsize, L1_CACHE_BYTES);
+ nodemap_addr = find_e820_area(addr, end_pfn<<PAGE_SHIFT,
+ nodemap_size, L1_CACHE_BYTES);
if (nodemap_addr == -1UL) {
printk(KERN_ERR
"NUMA: Unable to allocate Memory to Node hash map\n");
nodemap_addr = nodemap_size = 0;
return -1;
}
- pad_addr = (nodemap_addr + pad) & ~pad;
- memnodemap = phys_to_virt(pad_addr);
+ memnodemap = phys_to_virt(nodemap_addr);
reserve_early(nodemap_addr, nodemap_addr + nodemap_size, "MEMNODEMAP");

printk(KERN_DEBUG "NUMA: Allocated memnodemap from %lx - %lx\n",
@@ -164,15 +162,17 @@ int early_pfn_to_nid(unsigned long pfn)
}

static void * __init early_node_mem(int nodeid, unsigned long start,
- unsigned long end, unsigned long size)
+ unsigned long end, unsigned long size,
+ unsigned long align)
{
- unsigned long mem = find_e820_area(start, end, size);
+ unsigned long mem = find_e820_area(start, end, size, align);
void *ptr;

- if (mem != -1L)
+ if (mem != -1L) {
+ mem = round_up(mem, align);
return __va(mem);
- ptr = __alloc_bootmem_nopanic(size,
- SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS));
+ }
+ ptr = __alloc_bootmem_nopanic(size, align, __pa(MAX_DMA_ADDRESS));
if (ptr == NULL) {
printk(KERN_ERR "Cannot find %lu bytes in node %d\n",
size, nodeid);
@@ -198,7 +198,8 @@ void __init setup_node_bootmem(int nodei
start_pfn = start >> PAGE_SHIFT;
end_pfn = end >> PAGE_SHIFT;

- node_data[nodeid] = early_node_mem(nodeid, start, end, pgdat_size);
+ node_data[nodeid] = early_node_mem(nodeid, start, end, pgdat_size,
+ SMP_CACHE_BYTES);
if (node_data[nodeid] == NULL)
return;
nodedata_phys = __pa(node_data[nodeid]);
@@ -213,8 +214,12 @@ void __init setup_node_bootmem(int nodei
/* Find a place for the bootmem map */
bootmap_pages = bootmem_bootmap_pages(end_pfn - start_pfn);
bootmap_start = round_up(nodedata_phys + pgdat_size, PAGE_SIZE);
+ /*
+ * SMP_CACHE_BYTES could be enough, but init_bootmem_node() likes
+ * the bootmap to be PAGE_SIZE aligned
+ */
bootmap = early_node_mem(nodeid, bootmap_start, end,
- bootmap_pages<<PAGE_SHIFT);
+ bootmap_pages<<PAGE_SHIFT, PAGE_SIZE);
if (bootmap == NULL) {
if (nodedata_phys < start || nodedata_phys >= end)
free_bootmem((unsigned long)node_data[nodeid],
Index: linux-2.6/include/asm-x86/e820_64.h
===================================================================
--- linux-2.6.orig/include/asm-x86/e820_64.h
+++ linux-2.6/include/asm-x86/e820_64.h
@@ -15,7 +15,7 @@

#ifndef __ASSEMBLY__
extern unsigned long find_e820_area(unsigned long start, unsigned long end,
- unsigned size);
+ unsigned size, unsigned long align);
extern void add_memory_region(unsigned long start, unsigned long size,
int type);
extern void setup_memory_region(void);

2008-01-31 21:06:21

by Ingo Molnar

Subject: Re: [PATCH] x86_64: make bootmap_start page align v6


* Yinghai Lu <[email protected]> wrote:

> [PATCH] x86_64: make bootmap_start page align v6
>
> needs to be applied on top of "x86_64: add debug name for early_res"
>
> boot oops when the system has 64g or 128g installed

thanks - this v6 approach looks a _lot_ saner because it solves the core
problem: the fragility of the early allocator code. The patches are also
cleanups, besides being fixes. Applied.

does this solve all the boot problems you were seeing with 64 or 128 GB
of RAM?

Ingo

2008-01-31 21:13:28

by Yinghai Lu

Subject: Re: [PATCH] x86_64: make bootmap_start page align v6

On Thursday 31 January 2008 01:05:53 pm Ingo Molnar wrote:
>
> * Yinghai Lu <[email protected]> wrote:
>
> > [PATCH] x86_64: make bootmap_start page align v6
> >
> > needs to be applied on top of "x86_64: add debug name for early_res"
> >
> > boot oops when the system has 64g or 128g installed
>
> thanks - this v6 approach looks a _lot_ saner because it solves the core
> problem: the fragility of the early allocator code. They are also
> cleanups, besides being fixes. Applied.
>
> does this solve all the boot problems you were seeing with 64 or 128 GB
> of RAM?


yes.

YH

2008-01-31 22:48:56

by Yinghai Lu

Subject: [PATCH] x86_64: remove unneeded round_up

[PATCH] x86_64: remove unneeded round_up

Signed-off-by: Yinghai Lu <[email protected]>

diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index d585d27..5a02bf4 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -168,10 +168,9 @@ static void * __init early_node_mem(int nodeid, unsigned long start,
unsigned long mem = find_e820_area(start, end, size, align);
void *ptr;

- if (mem != -1L) {
- mem = round_up(mem, align);
+ if (mem != -1L)
return __va(mem);
- }
+
ptr = __alloc_bootmem_nopanic(size, align, __pa(MAX_DMA_ADDRESS));
if (ptr == NULL) {
printk(KERN_ERR "Cannot find %lu bytes in node %d\n",
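
The round_up() is redundant because, since the v6 change, find_e820_area()
already applies the alignment mask before returning, and rounding an
already-aligned value is a no-op (a standalone check; round_up() is written out
by hand here since the kernel macro itself is not shown in this thread):

#include <assert.h>

int main(void)
{
	unsigned long align = 4096;
	unsigned long mask  = ~(align - 1);
	/* what find_e820_area() now returns for a request at 0xd406f4 */
	unsigned long addr  = (0xd406f4UL + align - 1) & mask;

	/* round_up(x, a) for power-of-two a is (x + a - 1) & ~(a - 1) */
	unsigned long rounded = (addr + align - 1) & mask;

	assert(rounded == addr);	/* the extra round_up was a no-op */
	return 0;
}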

2008-01-31 22:53:49

by Ingo Molnar

Subject: Re: [PATCH] x86_64: remove unneeded round_up


* Yinghai Lu <[email protected]> wrote:

> - if (mem != -1L) {
> - mem = round_up(mem, align);
> + if (mem != -1L)
> return __va(mem);
> - }

thanks, applied.

It even reduces the size of the kernel a tiny bit:

   text   data    bss     dec    hex filename
   2963   4149   4352   11464   2cc8 numa_64.o.before
   2949   4149   4352   11450   2cba numa_64.o.after

and it's always a good sign for kernel quality when patches (that change
functionality) have that effect :)

Ingo