2008-01-29 19:42:10

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 2/4] x86_64: make early_node_mem return align address v2

[PATCH 2/4] x86_64: make early_node_mem return align address v2

boot oops when system get 64g or 128 installed

Calling initcall 0xffffffff80bc33b6: sctp_init+0x0/0x711()
BUG: unable to handle kernel NULL pointer dereference at 000000000000005f
IP: [<ffffffff802bfe55>] proc_register+0xe7/0x10f
PGD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #6
RIP: 0010:[<ffffffff802bfe55>] [<ffffffff802bfe55>] proc_register+0xe7/0x10f
RSP: 0000:ffff810824c57e60 EFLAGS: 00010246
RAX: 000000000000d7d7 RBX: ffff811024c5fa80 RCX: ffff810824c57e08
RDX: 0000000000000000 RSI: 0000000000000195 RDI: ffffffff80cc2460
RBP: ffffffffffffffff R08: 0000000000000000 R09: ffff811024c5fa80
R10: 0000000000000000 R11: 0000000000000002 R12: ffff810824c57e6c
R13: 0000000000000000 R14: ffff810824c57ee0 R15: 00000006abd25bee
FS: 0000000000000000(0000) GS:ffffffff80b4d000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000000005f CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff810824c56000, task ffff812024c52000)
Stack: ffffffff80a57348 0000019500000000 ffff811024c5fa80 0000000000000000
00000000ffffff97 ffffffff802bfef0 0000000000000000 ffffffffffffffff
0000000000000000 ffffffff80bc3b4b ffff810824c57ee0 ffffffff80bc34a5
Call Trace:
[<ffffffff802bfef0>] ? create_proc_entry+0x73/0x8a
[<ffffffff80bc3b4b>] ? sctp_snmp_proc_init+0x1c/0x34
[<ffffffff80bc34a5>] ? sctp_init+0xef/0x711
[<ffffffff80b976e3>] ? kernel_init+0x175/0x2e1
[<ffffffff8020ccf8>] ? child_rip+0xa/0x12
[<ffffffff80b9756e>] ? kernel_init+0x0/0x2e1
[<ffffffff8020ccee>] ? child_rip+0x0/0x12


Code: 1e 48 83 7b 38 00 75 08 48 c7 43 38 f0 e8 82 80 48 83 7b 30 00 75 08 48 c7 43 30 d0 e9 82 80 48 c7 c7 60 24 cc 80 e8 bd 5a 54 00 <48> 8b 45 60 48 89 6b 58 48 89 5d 60 48 89 43 50 fe 05 f5 25 a0
RIP [<ffffffff802bfe55>] proc_register+0xe7/0x10f
RSP <ffff810824c57e60>
CR2: 000000000000005f
---[ end trace 02c2d78def82877a ]---
Kernel panic - not syncing: Attempted to kill init!

it turns out some variables near end of bss is corrupted already.

in System.map we have
ffffffff80d40420 b rsi_table
ffffffff80d40620 B krb5_seq_lock
ffffffff80d40628 b i.20437
ffffffff80d40630 b xprt_rdma_inline_write_padding
ffffffff80d40638 b sunrpc_table_header
ffffffff80d40640 b zero
ffffffff80d40644 b min_memreg
ffffffff80d40648 b rpcrdma_tk_lock_g
ffffffff80d40650 B sctp_assocs_id_lock
ffffffff80d40658 B proc_net_sctp
ffffffff80d40660 B sctp_assocs_id
ffffffff80d40680 B sysctl_sctp_mem
ffffffff80d40690 B sysctl_sctp_rmem
ffffffff80d406a0 B sysctl_sctp_wmem
ffffffff80d406b0 b sctp_ctl_socket
ffffffff80d406b8 b sctp_pf_inet6_specific
ffffffff80d406c0 b sctp_pf_inet_specific
ffffffff80d406c8 b sctp_af_v4_specific
ffffffff80d406d0 b sctp_af_v6_specific
ffffffff80d406d8 b sctp_rand.33270
ffffffff80d406dc b sctp_memory_pressure
ffffffff80d406e0 b sctp_sockets_allocated
ffffffff80d406e4 b sctp_memory_allocated
ffffffff80d406e8 b sctp_sysctl_header
ffffffff80d406f0 b zero
ffffffff80d406f4 A __bss_stop
ffffffff80d406f4 A _end

and setup_node_bootmem() will use that page 0xd40000 for bootmap
Bootmem setup node 0 0000000000000000-0000000828000000
NODE_DATA [000000000008a485 - 0000000000091484]
bootmap [0000000000d406f4 - 0000000000e456f3] pages 105
Bootmem setup node 1 0000000828000000-0000001028000000
NODE_DATA [0000000828000000 - 0000000828006fff]
bootmap [0000000828007000 - 0000000828106fff] pages 100
Bootmem setup node 2 0000001028000000-0000001828000000
NODE_DATA [0000001028000000 - 0000001028006fff]
bootmap [0000001028007000 - 0000001028106fff] pages 100
Bootmem setup node 3 0000001828000000-0000002028000000
NODE_DATA [0000001828000000 - 0000001828006fff]
bootmap [0000001828007000 - 0000001828106fff] pages 100

the patch update early_node_mem to make sure we can extra range
for alignment.

Signed-off-by: Yinghai Lu <[email protected]>

Index: linux-2.6/arch/x86/mm/numa_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/numa_64.c
+++ linux-2.6/arch/x86/mm/numa_64.c
@@ -169,8 +169,14 @@ static void * __init early_node_mem(int
unsigned long mem = find_e820_area(start, end, size);
void *ptr;

- if (mem != -1L)
+ if (mem != -1L) {
+ /*
+ * make sure we NODE_DATA, or bootmap is not overlapped
+ * with bss section
+ */
+ mem = round_up(mem, PAGE_SIZE);
return __va(mem);
+ }
ptr = __alloc_bootmem_nopanic(size,
SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS));
if (ptr == NULL) {


2008-01-30 02:33:22

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH 2/4] x86_64: make early_node_mem return align address v2

On Tuesday 29 January 2008 11:14:48 am Yinghai Lu wrote:
> [PATCH 2/4] x86_64: make early_node_mem return align address v2
>
> boot oops when system get 64g or 128 installed
>

can you apply this updated version with others?

setup_node_mem should return with PAGE_ALIGN.

in setup_node_bootmem, it need bootmap_start to be PAGE_ALIGN, without this patch it will overlap with bss.

YH

2008-01-30 03:28:16

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH 2/2] x86_64: make bootmap_start page align v3

[PATCH 2/2] x86_64: make bootmap_start page align v3

boot oops when system get 64g or 128 installed

Calling initcall 0xffffffff80bc33b6: sctp_init+0x0/0x711()
BUG: unable to handle kernel NULL pointer dereference at 000000000000005f
IP: [<ffffffff802bfe55>] proc_register+0xe7/0x10f
PGD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #6
RIP: 0010:[<ffffffff802bfe55>] [<ffffffff802bfe55>] proc_register+0xe7/0x10f
RSP: 0000:ffff810824c57e60 EFLAGS: 00010246
RAX: 000000000000d7d7 RBX: ffff811024c5fa80 RCX: ffff810824c57e08
RDX: 0000000000000000 RSI: 0000000000000195 RDI: ffffffff80cc2460
RBP: ffffffffffffffff R08: 0000000000000000 R09: ffff811024c5fa80
R10: 0000000000000000 R11: 0000000000000002 R12: ffff810824c57e6c
R13: 0000000000000000 R14: ffff810824c57ee0 R15: 00000006abd25bee
FS: 0000000000000000(0000) GS:ffffffff80b4d000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000000005f CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff810824c56000, task ffff812024c52000)
Stack: ffffffff80a57348 0000019500000000 ffff811024c5fa80 0000000000000000
00000000ffffff97 ffffffff802bfef0 0000000000000000 ffffffffffffffff
0000000000000000 ffffffff80bc3b4b ffff810824c57ee0 ffffffff80bc34a5
Call Trace:
[<ffffffff802bfef0>] ? create_proc_entry+0x73/0x8a
[<ffffffff80bc3b4b>] ? sctp_snmp_proc_init+0x1c/0x34
[<ffffffff80bc34a5>] ? sctp_init+0xef/0x711
[<ffffffff80b976e3>] ? kernel_init+0x175/0x2e1
[<ffffffff8020ccf8>] ? child_rip+0xa/0x12
[<ffffffff80b9756e>] ? kernel_init+0x0/0x2e1
[<ffffffff8020ccee>] ? child_rip+0x0/0x12


Code: 1e 48 83 7b 38 00 75 08 48 c7 43 38 f0 e8 82 80 48 83 7b 30 00 75 08 48 c7 43 30 d0 e9 82 80 48 c7 c7 60 24 cc 80 e8 bd 5a 54 00 <48> 8b 45 60 48 89 6b 58 48 89 5d 60 48 89 43 50 fe 05 f5 25 a0
RIP [<ffffffff802bfe55>] proc_register+0xe7/0x10f
RSP <ffff810824c57e60>
CR2: 000000000000005f
---[ end trace 02c2d78def82877a ]---
Kernel panic - not syncing: Attempted to kill init!

it turns out some variables near end of bss is corrupted already.

in System.map we have
ffffffff80d40420 b rsi_table
ffffffff80d40620 B krb5_seq_lock
ffffffff80d40628 b i.20437
ffffffff80d40630 b xprt_rdma_inline_write_padding
ffffffff80d40638 b sunrpc_table_header
ffffffff80d40640 b zero
ffffffff80d40644 b min_memreg
ffffffff80d40648 b rpcrdma_tk_lock_g
ffffffff80d40650 B sctp_assocs_id_lock
ffffffff80d40658 B proc_net_sctp
ffffffff80d40660 B sctp_assocs_id
ffffffff80d40680 B sysctl_sctp_mem
ffffffff80d40690 B sysctl_sctp_rmem
ffffffff80d406a0 B sysctl_sctp_wmem
ffffffff80d406b0 b sctp_ctl_socket
ffffffff80d406b8 b sctp_pf_inet6_specific
ffffffff80d406c0 b sctp_pf_inet_specific
ffffffff80d406c8 b sctp_af_v4_specific
ffffffff80d406d0 b sctp_af_v6_specific
ffffffff80d406d8 b sctp_rand.33270
ffffffff80d406dc b sctp_memory_pressure
ffffffff80d406e0 b sctp_sockets_allocated
ffffffff80d406e4 b sctp_memory_allocated
ffffffff80d406e8 b sctp_sysctl_header
ffffffff80d406f0 b zero
ffffffff80d406f4 A __bss_stop
ffffffff80d406f4 A _end

and setup_node_bootmem() will use that page 0xd40000 for bootmap
Bootmem setup node 0 0000000000000000-0000000828000000
NODE_DATA [000000000008a485 - 0000000000091484]
bootmap [0000000000d406f4 - 0000000000e456f3] pages 105
Bootmem setup node 1 0000000828000000-0000001028000000
NODE_DATA [0000000828000000 - 0000000828006fff]
bootmap [0000000828007000 - 0000000828106fff] pages 100
Bootmem setup node 2 0000001028000000-0000001828000000
NODE_DATA [0000001028000000 - 0000001028006fff]
bootmap [0000001028007000 - 0000001028106fff] pages 100
Bootmem setup node 3 0000001828000000-0000002028000000
NODE_DATA [0000001828000000 - 0000001828006fff]
bootmap [0000001828007000 - 0000001828106fff] pages 100

the patch update bootmap_start to page_align to make sure we can extra range
for alignment.

Signed-off-by: Yinghai Lu <[email protected]>

Index: linux-2.6/arch/x86/mm/numa_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/numa_64.c
+++ linux-2.6/arch/x86/mm/numa_64.c
@@ -224,6 +224,9 @@ void __init setup_node_bootmem(int nodei
}
bootmap_start = __pa(bootmap);

+ /* make sure bootmap is not overlapped with bss section */
+ bootmap_start = round_up(bootmap_start, PAGE_SIZE);
+
bootmap_size = init_bootmem_node(NODE_DATA(nodeid),
bootmap_start >> PAGE_SHIFT,
start_pfn, end_pfn);

2008-01-30 18:44:42

by Yinghai Lu

[permalink] [raw]
Subject: PATCH] x86_64: make bootmap_start page align v4

[PATCH] x86_64: make bootmap_start page align v3

boot oops when system get 64g or 128 installed

Calling initcall 0xffffffff80bc33b6: sctp_init+0x0/0x711()
BUG: unable to handle kernel NULL pointer dereference at 000000000000005f
IP: [<ffffffff802bfe55>] proc_register+0xe7/0x10f
PGD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #6
RIP: 0010:[<ffffffff802bfe55>] [<ffffffff802bfe55>] proc_register+0xe7/0x10f
RSP: 0000:ffff810824c57e60 EFLAGS: 00010246
RAX: 000000000000d7d7 RBX: ffff811024c5fa80 RCX: ffff810824c57e08
RDX: 0000000000000000 RSI: 0000000000000195 RDI: ffffffff80cc2460
RBP: ffffffffffffffff R08: 0000000000000000 R09: ffff811024c5fa80
R10: 0000000000000000 R11: 0000000000000002 R12: ffff810824c57e6c
R13: 0000000000000000 R14: ffff810824c57ee0 R15: 00000006abd25bee
FS: 0000000000000000(0000) GS:ffffffff80b4d000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000000005f CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff810824c56000, task ffff812024c52000)
Stack: ffffffff80a57348 0000019500000000 ffff811024c5fa80 0000000000000000
00000000ffffff97 ffffffff802bfef0 0000000000000000 ffffffffffffffff
0000000000000000 ffffffff80bc3b4b ffff810824c57ee0 ffffffff80bc34a5
Call Trace:
[<ffffffff802bfef0>] ? create_proc_entry+0x73/0x8a
[<ffffffff80bc3b4b>] ? sctp_snmp_proc_init+0x1c/0x34
[<ffffffff80bc34a5>] ? sctp_init+0xef/0x711
[<ffffffff80b976e3>] ? kernel_init+0x175/0x2e1
[<ffffffff8020ccf8>] ? child_rip+0xa/0x12
[<ffffffff80b9756e>] ? kernel_init+0x0/0x2e1
[<ffffffff8020ccee>] ? child_rip+0x0/0x12


Code: 1e 48 83 7b 38 00 75 08 48 c7 43 38 f0 e8 82 80 48 83 7b 30 00 75 08 48 c7 43 30 d0 e9 82 80 48 c7 c7 60 24 cc 80 e8 bd 5a 54 00 <48> 8b 45 60 48 89 6b 58 48 89 5d 60 48 89 43 50 fe 05 f5 25 a0
RIP [<ffffffff802bfe55>] proc_register+0xe7/0x10f
RSP <ffff810824c57e60>
CR2: 000000000000005f
---[ end trace 02c2d78def82877a ]---
Kernel panic - not syncing: Attempted to kill init!

it turns out some variables near end of bss is corrupted already.

in System.map we have
ffffffff80d40420 b rsi_table
ffffffff80d40620 B krb5_seq_lock
ffffffff80d40628 b i.20437
ffffffff80d40630 b xprt_rdma_inline_write_padding
ffffffff80d40638 b sunrpc_table_header
ffffffff80d40640 b zero
ffffffff80d40644 b min_memreg
ffffffff80d40648 b rpcrdma_tk_lock_g
ffffffff80d40650 B sctp_assocs_id_lock
ffffffff80d40658 B proc_net_sctp
ffffffff80d40660 B sctp_assocs_id
ffffffff80d40680 B sysctl_sctp_mem
ffffffff80d40690 B sysctl_sctp_rmem
ffffffff80d406a0 B sysctl_sctp_wmem
ffffffff80d406b0 b sctp_ctl_socket
ffffffff80d406b8 b sctp_pf_inet6_specific
ffffffff80d406c0 b sctp_pf_inet_specific
ffffffff80d406c8 b sctp_af_v4_specific
ffffffff80d406d0 b sctp_af_v6_specific
ffffffff80d406d8 b sctp_rand.33270
ffffffff80d406dc b sctp_memory_pressure
ffffffff80d406e0 b sctp_sockets_allocated
ffffffff80d406e4 b sctp_memory_allocated
ffffffff80d406e8 b sctp_sysctl_header
ffffffff80d406f0 b zero
ffffffff80d406f4 A __bss_stop
ffffffff80d406f4 A _end

and setup_node_bootmem() will use that page 0xd40000 for bootmap
Bootmem setup node 0 0000000000000000-0000000828000000
NODE_DATA [000000000008a485 - 0000000000091484]
bootmap [0000000000d406f4 - 0000000000e456f3] pages 105
Bootmem setup node 1 0000000828000000-0000001028000000
NODE_DATA [0000000828000000 - 0000000828006fff]
bootmap [0000000828007000 - 0000000828106fff] pages 100
Bootmem setup node 2 0000001028000000-0000001828000000
NODE_DATA [0000001028000000 - 0000001028006fff]
bootmap [0000001028007000 - 0000001028106fff] pages 100
Bootmem setup node 3 0000001828000000-0000002028000000
NODE_DATA [0000001828000000 - 0000001828006fff]
bootmap [0000001828007000 - 0000001828106fff] pages 100

the patch update bootmap_start to page_align to make sure we can extra range
for alignment.

Signed-off-by: Yinghai Lu <[email protected]>

Index: linux-2.6/arch/x86/mm/numa_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/numa_64.c
+++ linux-2.6/arch/x86/mm/numa_64.c
@@ -224,6 +224,14 @@ void __init setup_node_bootmem(int nodei
}
bootmap_start = __pa(bootmap);

+ /*
+ * when you have 64g or 128g ram, bootmap will be pushed after bss
+ * section, the bootmap we get from early_node_mem via find_e820_area
+ * is not page aligned, we need to round it up to make sure bootmap
+ * is not overlapped with bss section
+ */
+ bootmap_start = round_up(bootmap_start, PAGE_SIZE);
+
bootmap_size = init_bootmem_node(NODE_DATA(nodeid),
bootmap_start >> PAGE_SHIFT,
start_pfn, end_pfn);

2008-01-30 19:08:11

by Andi Kleen

[permalink] [raw]
Subject: Re: PATCH] x86_64: make bootmap_start page align v4


> + /*
> + * when you have 64g or 128g ram, bootmap will be pushed after bss
> + * section, the bootmap we get from early_node_mem via find_e820_area
> + * is not page aligned, we need to round it up to make sure bootmap
> + * is not overlapped with bss section
> + */
> + bootmap_start = round_up(bootmap_start, PAGE_SIZE);

The better solution would be to PAGE_ALIGN() the addresses
in bad_addr(). Or better fix it that no such alignment is needed to not
get conflicts.

-Andi

2008-01-30 20:16:34

by Yinghai Lu

[permalink] [raw]
Subject: Re: PATCH] x86_64: make bootmap_start page align v4

On Wednesday 30 January 2008 11:08:32 am Andi Kleen wrote:
>
> > + /*
> > + * when you have 64g or 128g ram, bootmap will be pushed after bss
> > + * section, the bootmap we get from early_node_mem via find_e820_area
> > + * is not page aligned, we need to round it up to make sure bootmap
> > + * is not overlapped with bss section
> > + */
> > + bootmap_start = round_up(bootmap_start, PAGE_SIZE);
>
> The better solution would be to PAGE_ALIGN() the addresses
> in bad_addr(). Or better fix it that no such alignment is needed to not
> get conflicts.

I don't think so, that is setup_node_bootmem's problem.

it is supposed to round_up instead of just do PAGE_SHIFT...

other caller will just use the address instead change it page.

YH

2008-01-30 23:43:53

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH] x86_64: make bootmap_start page align v5

[PATCH] x86_64: make bootmap_start page align v5

boot oops when system get 64g or 128 installed

Calling initcall 0xffffffff80bc33b6: sctp_init+0x0/0x711()
BUG: unable to handle kernel NULL pointer dereference at 000000000000005f
IP: [<ffffffff802bfe55>] proc_register+0xe7/0x10f
PGD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #6
RIP: 0010:[<ffffffff802bfe55>] [<ffffffff802bfe55>] proc_register+0xe7/0x10f
RSP: 0000:ffff810824c57e60 EFLAGS: 00010246
RAX: 000000000000d7d7 RBX: ffff811024c5fa80 RCX: ffff810824c57e08
RDX: 0000000000000000 RSI: 0000000000000195 RDI: ffffffff80cc2460
RBP: ffffffffffffffff R08: 0000000000000000 R09: ffff811024c5fa80
R10: 0000000000000000 R11: 0000000000000002 R12: ffff810824c57e6c
R13: 0000000000000000 R14: ffff810824c57ee0 R15: 00000006abd25bee
FS: 0000000000000000(0000) GS:ffffffff80b4d000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000000005f CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff810824c56000, task ffff812024c52000)
Stack: ffffffff80a57348 0000019500000000 ffff811024c5fa80 0000000000000000
00000000ffffff97 ffffffff802bfef0 0000000000000000 ffffffffffffffff
0000000000000000 ffffffff80bc3b4b ffff810824c57ee0 ffffffff80bc34a5
Call Trace:
[<ffffffff802bfef0>] ? create_proc_entry+0x73/0x8a
[<ffffffff80bc3b4b>] ? sctp_snmp_proc_init+0x1c/0x34
[<ffffffff80bc34a5>] ? sctp_init+0xef/0x711
[<ffffffff80b976e3>] ? kernel_init+0x175/0x2e1
[<ffffffff8020ccf8>] ? child_rip+0xa/0x12
[<ffffffff80b9756e>] ? kernel_init+0x0/0x2e1
[<ffffffff8020ccee>] ? child_rip+0x0/0x12


Code: 1e 48 83 7b 38 00 75 08 48 c7 43 38 f0 e8 82 80 48 83 7b 30 00 75 08 48 c7 43 30 d0 e9 82 80 48 c7 c7 60 24 cc 80 e8 bd 5a 54 00 <48> 8b 45 60 48 89 6b 58 48 89 5d 60 48 89 43 50 fe 05 f5 25 a0
RIP [<ffffffff802bfe55>] proc_register+0xe7/0x10f
RSP <ffff810824c57e60>
CR2: 000000000000005f
---[ end trace 02c2d78def82877a ]---
Kernel panic - not syncing: Attempted to kill init!

it turns out some variables near end of bss is corrupted already.

in System.map we have
ffffffff80d40420 b rsi_table
ffffffff80d40620 B krb5_seq_lock
ffffffff80d40628 b i.20437
ffffffff80d40630 b xprt_rdma_inline_write_padding
ffffffff80d40638 b sunrpc_table_header
ffffffff80d40640 b zero
ffffffff80d40644 b min_memreg
ffffffff80d40648 b rpcrdma_tk_lock_g
ffffffff80d40650 B sctp_assocs_id_lock
ffffffff80d40658 B proc_net_sctp
ffffffff80d40660 B sctp_assocs_id
ffffffff80d40680 B sysctl_sctp_mem
ffffffff80d40690 B sysctl_sctp_rmem
ffffffff80d406a0 B sysctl_sctp_wmem
ffffffff80d406b0 b sctp_ctl_socket
ffffffff80d406b8 b sctp_pf_inet6_specific
ffffffff80d406c0 b sctp_pf_inet_specific
ffffffff80d406c8 b sctp_af_v4_specific
ffffffff80d406d0 b sctp_af_v6_specific
ffffffff80d406d8 b sctp_rand.33270
ffffffff80d406dc b sctp_memory_pressure
ffffffff80d406e0 b sctp_sockets_allocated
ffffffff80d406e4 b sctp_memory_allocated
ffffffff80d406e8 b sctp_sysctl_header
ffffffff80d406f0 b zero
ffffffff80d406f4 A __bss_stop
ffffffff80d406f4 A _end

and setup_node_bootmem() will use that page 0xd40000 for bootmap
Bootmem setup node 0 0000000000000000-0000000828000000
NODE_DATA [000000000008a485 - 0000000000091484]
bootmap [0000000000d406f4 - 0000000000e456f3] pages 105
Bootmem setup node 1 0000000828000000-0000001028000000
NODE_DATA [0000000828000000 - 0000000828006fff]
bootmap [0000000828007000 - 0000000828106fff] pages 100
Bootmem setup node 2 0000001028000000-0000001828000000
NODE_DATA [0000001028000000 - 0000001028006fff]
bootmap [0000001028007000 - 0000001028106fff] pages 100
Bootmem setup node 3 0000001828000000-0000002028000000
NODE_DATA [0000001828000000 - 0000001828006fff]
bootmap [0000001828007000 - 0000001828106fff] pages 100

need to round bootmap_start to page_align to make sure we can extra range
for alignment.

andi said it is better to fix in bad_addr, instead of early_node_mem or
setup_node_bootmem directly

Signed-off-by: Yinghai Lu <[email protected]>

Index: linux-2.6/arch/x86/kernel/e820_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820_64.c
+++ linux-2.6/arch/x86/kernel/e820_64.c
@@ -100,7 +100,7 @@ again:
for (i = 0; i < MAX_EARLY_RES && early_res[i].end; i++) {
struct early_res *r = &early_res[i];
if (last >= r->start && addr < r->end) {
- *addrp = addr = r->end;
+ *addrp = addr = PAGE_ALIGN(r->end);
changed = 1;
goto again;
}

2008-01-30 23:44:13

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH] x86_64: make bootmap_start page align v5

On Wednesday 30 January 2008 03:23:35 pm Yinghai Lu wrote:
> [PATCH] x86_64: make bootmap_start page align v5

Ingo,

Please consider to apply
v5:
or
v4: http://lkml.org/lkml/2008/1/30/377
it makes bootmap page alignment
or
v2: http://lkml.org/lkml/2008/1/29/349
it makes NODE_DATA and bootmap page alignement

just pick one.

YH

v5:
Bootmem setup node 0 0000000000000000-0000000828000000
NODE_DATA [000000000008b000 - 0000000000091fff]
bootmap [0000000000d9d000 - 0000000000ea1fff] pages 105
Bootmem setup node 1 0000000828000000-0000001028000000
NODE_DATA [0000000828000000 - 0000000828006fff]
bootmap [0000000828007000 - 0000000828106fff] pages 100
Bootmem setup node 2 0000001028000000-0000001828000000
NODE_DATA [0000001028000000 - 0000001028006fff]
bootmap [0000001028007000 - 0000001028106fff] pages 100
Bootmem setup node 3 0000001828000000-0000002028000000
NODE_DATA [0000001828000000 - 0000001828006fff]
bootmap [0000001828007000 - 0000001828106fff] pages 100
early res: 0 [0-1000) BIOS data page
early res: 1 [6000-8000) SMP_TRAMPOLINE
early res: 2 [200000-d9c274) TEXT DATA BSS
early res: 3 [7e6f4000-7fff3a26) RAMDISK
early res: 4 [99800-9d800) EBDA
early res: 5 [8000-8a000) PGTABLE
early res: 6 [8a000-8a88b) MEMNODEMAP

v4:
Bootmem setup node 0 0000000000000000-0000000828000000
NODE_DATA [000000000008a88b - 000000000009188a]
bootmap [0000000000d9d000 - 0000000000ea1fff] pages 105
Bootmem setup node 1 0000000828000000-0000001028000000
NODE_DATA [0000000828000000 - 0000000828006fff]
bootmap [0000000828007000 - 0000000828106fff] pages 100
Bootmem setup node 2 0000001028000000-0000001828000000
NODE_DATA [0000001028000000 - 0000001028006fff]
bootmap [0000001028007000 - 0000001028106fff] pages 100
Bootmem setup node 3 0000001828000000-0000002028000000
NODE_DATA [0000001828000000 - 0000001828006fff]
bootmap [0000001828007000 - 0000001828106fff] pages 100
early res: 0 [0-1000) BIOS data page
early res: 1 [6000-8000) SMP_TRAMPOLINE
early res: 2 [200000-d9c274) TEXT DATA BSS
early res: 3 [7e6f4000-7fff3a26) RAMDISK
early res: 4 [99800-9d800) EBDA
early res: 5 [8000-8a000) PGTABLE
early res: 6 [8a000-8a88b) MEMNODEMAP

v2:
Bootmem setup node 0 0000000000000000-0000000828000000
NODE_DATA [000000000008b000 - 0000000000091fff]
bootmap [0000000000d9d000 - 0000000000ea1fff] pages 105
Bootmem setup node 1 0000000828000000-0000001028000000
NODE_DATA [0000000828000000 - 0000000828006fff]
bootmap [0000000828007000 - 0000000828106fff] pages 100
Bootmem setup node 2 0000001028000000-0000001828000000
NODE_DATA [0000001028000000 - 0000001028006fff]
bootmap [0000001028007000 - 0000001028106fff] pages 100
Bootmem setup node 3 0000001828000000-0000002028000000
NODE_DATA [0000001828000000 - 0000001828006fff]
bootmap [0000001828007000 - 0000001828106fff] pages 100
early res: 0 [0-1000) BIOS data page
early res: 1 [6000-8000) SMP_TRAMPOLINE
early res: 2 [200000-d9c274) TEXT DATA BSS
early res: 3 [7e6f4000-7fff3a26) RAMDISK
early res: 4 [99800-9d800) EBDA
early res: 5 [8000-8a000) PGTABLE
early res: 6 [8a000-8a88b) MEMNODEMAP

2008-01-31 03:04:01

by Andi Kleen

[permalink] [raw]
Subject: Re: PATCH] x86_64: make bootmap_start page align v4


> I don't think so, that is setup_node_bootmem's problem.
>
> it is supposed to round_up instead of just do PAGE_SHIFT...
>
> other caller will just use the address instead change it page.

You're right. In this case it really is the best way to round up in the
caller. I retract my earlier objection.

That said there is not really any true requirement for the bootmem
map to be page aligned (AFAIK) so an alternative might be to
fix the bootmem interface to not take a PFN. But that would be a
much more intrusive patch because you would need to fix
other architectures too or add compat wrappers.

-Andi

2008-01-31 03:12:27

by Yinghai Lu

[permalink] [raw]
Subject: Re: PATCH] x86_64: make bootmap_start page align v4

On Wednesday 30 January 2008 07:02:14 pm Andi Kleen wrote:
>
> > I don't think so, that is setup_node_bootmem's problem.
> >
> > it is supposed to round_up instead of just do PAGE_SHIFT...
> >
> > other caller will just use the address instead change it page.
>
> You're right. In this case it really is the best way to round up in the
> caller. I retract my earlier objection.
>
> That said there is not really any true requirement for the bootmem
> map to be page aligned (AFAIK) so an alternative might be to
> fix the bootmem interface to not take a PFN. But that would be a
> much more intrusive patch because you would need to fix
> other architectures too or add compat wrappers.

thanks.

So we need to v4.

Ingo,

can you apply the v4?

YH

2008-01-31 13:07:58

by Ingo Molnar

[permalink] [raw]
Subject: Re: PATCH] x86_64: make bootmap_start page align v4


* Yinghai Lu <[email protected]> wrote:

> [PATCH] x86_64: make bootmap_start page align v3
>
> boot oops when system get 64g or 128 installed

> +++ linux-2.6/arch/x86/mm/numa_64.c
> @@ -224,6 +224,14 @@ void __init setup_node_bootmem(int nodei
> }
> bootmap_start = __pa(bootmap);
>
> + /*
> + * when you have 64g or 128g ram, bootmap will be pushed after bss
> + * section, the bootmap we get from early_node_mem via find_e820_area
> + * is not page aligned, we need to round it up to make sure bootmap
> + * is not overlapped with bss section
> + */
> + bootmap_start = round_up(bootmap_start, PAGE_SIZE);
> +

thanks, applied.

Ingo