2008-01-29 02:37:58

by Yinghai Lu

Subject: [PATCH] x86_64: fix overlap between pagetable with bss section

An early crash on an 8-node, 256 GB machine:

Command line: console=uart8250,io,0x3f8,115200n8 initrd=kernel.org/mydisk11_x86_64.gz rw root=/dev/ram0 debug initcall_debug apic=debug acpi.debug_level=0x0000000f pci=routeirq ip=dhcp load_ramdisk=1 ramdisk_size=131072 BOOT_IMAGE=kernel.org/bzImage_2.6.25_k8.1
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009bc00 (usable)
BIOS-e820: 000000000009bc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000dffe0000 (usable)
BIOS-e820: 00000000dffe0000 - 00000000dffee000 (ACPI data)
BIOS-e820: 00000000dffee000 - 00000000dffff050 (ACPI NVS)
BIOS-e820: 00000000dffff050 - 00000000e0000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000ff700000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000004020000000 (usable)
Early serial console at I/O port 0x3f8 (options '115200n8')
console [uart0] enabled
end_pfn_map = 67239936
Kernel panic - not syncing: Duplicated early reservation d40000-e42000

Pid: 0, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #3

Call Trace:
[<ffffffff80221545>] lapic_get_maxlvt+0x0/0x10
[<ffffffff80221657>] clear_local_APIC+0x5/0xcf
[<ffffffff80221726>] disable_local_APIC+0x5/0x17
[<ffffffff8021fe16>] smp_send_stop+0x46/0x4c
[<ffffffff80235293>] panic+0x94/0x13e
[<ffffffff80bc3b03>] sctp_eps_proc_init+0x12/0x34
[<ffffffff80b9f1c5>] reserve_early+0x30/0x6c
[<ffffffff80803925>] init_memory_mapping+0x2cd/0x2dc
[<ffffffff80b9dc01>] setup_arch+0x21f/0x44e
[<ffffffff80b978be>] start_kernel+0x6f/0x2c7
[<ffffffff80b971cc>] _sinittext+0x1cc/0x1d3

A later oops on another machine:

Calling initcall 0xffffffff80bc33ac: sctp_init+0x0/0x711()
BUG: unable to handle kernel NULL pointer dereference at 000000000000005f
IP: [<ffffffff802bfe55>] proc_register+0xe7/0x10f
PGD 0
Oops: 0000 [1] SMP
CPU 7
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #1
RIP: 0010:[<ffffffff802bfe55>] [<ffffffff802bfe55>] proc_register+0xe7/0x10f
RSP: 0000:ffff811074c55e60 EFLAGS: 00010246
RAX: 0000000000008d8d RBX: ffff811074d78d80 RCX: ffff811074c55e08
RDX: 0000000000000000 RSI: 0000000000000141 RDI: ffffffff80cc2460
RBP: ffffffffffffffff R08: 0000000000000000 R09: ffff811074d78d80
R10: 0000000000000000 R11: ffffffff80b78750 R12: ffff811074c55e6c
R13: 0000000000000000 R14: ffff811074c55ee0 R15: 00000006eb27426e
FS: 0000000000000000(0000) GS:ffff811074cc7f00(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000000005f CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff811074c54000, task ffff810874c54000)
Stack: ffffffff80a57340 0000014100000000 ffff811074d78d80 0000000000000000
00000000ffffff97 ffffffff802bfef0 0000000000000000 ffffffffffffffff
0000000000000000 ffffffff80bc3b41 ffff811074c55ee0 ffffffff80bc349b
Call Trace:
[<ffffffff802bfef0>] ? create_proc_entry+0x73/0x8a
[<ffffffff80bc3b41>] ? sctp_snmp_proc_init+0x1c/0x34
[<ffffffff80bc349b>] ? sctp_init+0xef/0x711
[<ffffffff80b976e3>] ? kernel_init+0x175/0x2e1
[<ffffffff8020ccf8>] ? child_rip+0xa/0x12
[<ffffffff80b9756e>] ? kernel_init+0x0/0x2e1
[<ffffffff8020ccee>] ? child_rip+0x0/0x12


Code: 1e 48 83 7b 38 00 75 08 48 c7 43 38 f0 e8 82 80 48 83 7b 30 00 75 08 48 c7 43 30 d0 e9 82 80 48 c7 c7 60 24 cc 80 e8 bd 5a 54 00 <48> 8b 45 60 48 89 6b 58 48 89 5d 60 48 89 43 50 fe 05 f5 25 a0
RIP [<ffffffff802bfe55>] proc_register+0xe7/0x10f
RSP <ffff811074c55e60>
CR2: 000000000000005f
---[ end trace c97bfb5810c69e0c ]---
Kernel panic - not syncing: Attempted to kill init!

It turns out that the page tables overlap with the bss section.

table_start needs to be rounded up to PAGE_SIZE.

Also make the panic message more informative.

Signed-off-by: Yinghai Lu <[email protected]>

diff --git a/arch/x86/kernel/e820_64.c b/arch/x86/kernel/e820_64.c
index f8b7beb..6f07bab 100644
--- a/arch/x86/kernel/e820_64.c
+++ b/arch/x86/kernel/e820_64.c
@@ -70,8 +70,8 @@ void __init reserve_early(unsigned long start, unsigned long end)
 	for (i = 0; i < MAX_EARLY_RES && early_res[i].end; i++) {
 		r = &early_res[i];
 		if (end > r->start && start < r->end)
-			panic("Duplicated early reservation %lx-%lx\n",
-			      start, end);
+			panic("Overlap early reservation %lx-%lx to %lx-%lx\n",
+			      start, end, r->start, r->end);
 	}
 	if (i >= MAX_EARLY_RES)
 		panic("Too many early reservations");
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index b09faf2..bf02f7e 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -358,6 +358,8 @@ static void __init find_early_table_space(unsigned long end)
 	if (table_start == -1UL)
 		panic("Cannot find space for the kernel page tables");

+	/* need to round it up to avoid an overlap of less than one page */
+	table_start = round_up(table_start, PAGE_SIZE);
 	table_start >>= PAGE_SHIFT;
 	table_end = table_start;
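
The overlap test that guards the reworded panic can be tried in isolation. The following is a minimal userspace sketch, not kernel code: `struct early_range`, `ranges_overlap()` and the three sample ranges are hypothetical names introduced here. The page table range is the one from the panic message; the kernel image start (0x200000) is an assumption, and its end (0xd406f4) is taken from the `_end` symbol quoted in the v2 posting of this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Half-open physical address range [start, end). Hypothetical type. */
struct early_range {
	unsigned long start;
	unsigned long end;
};

/*
 * Same condition as the check in the patch:
 *     end > r->start && start < r->end
 * Two half-open ranges overlap when each one starts before the other ends.
 */
bool ranges_overlap(struct early_range a, struct early_range b)
{
	assert(a.start <= a.end && b.start <= b.end);
	return a.end > b.start && a.start < b.end;
}

/* Kernel image: the start is an assumption, the end is _end from v2. */
const struct early_range kernel_image = { 0x200000UL, 0xd406f4UL };

/* Page table range reported by the panic message. */
const struct early_range tables_before = { 0xd40000UL, 0xe42000UL };

/* Same size, starting on the next page boundary (assumed result of the fix). */
const struct early_range tables_after = { 0xd41000UL, 0xe43000UL };
```

With these values, `ranges_overlap(tables_before, kernel_image)` is true and `ranges_overlap(tables_after, kernel_image)` is false. Because the ranges are half-open, two ranges that merely touch, such as one ending at 0x2000 and one starting at 0x2000, do not count as overlapping.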


2008-01-29 03:59:15

by Yinghai Lu

Subject: [PATCH] x86_64: fix overlap between pagetable with bss section v2

An early crash on an 8-node, 256 GB machine:

Command line: console=uart8250,io,0x3f8,115200n8 initrd=kernel.org/mydisk11_x86_64.gz rw root=/dev/ram0 debug initcall_debug apic=debug acpi.debug_level=0x0000000f pci=routeirq ip=dhcp load_ramdisk=1 ramdisk_size=131072 BOOT_IMAGE=kernel.org/bzImage_2.6.25_k8.1
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009bc00 (usable)
BIOS-e820: 000000000009bc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000dffe0000 (usable)
BIOS-e820: 00000000dffe0000 - 00000000dffee000 (ACPI data)
BIOS-e820: 00000000dffee000 - 00000000dffff050 (ACPI NVS)
BIOS-e820: 00000000dffff050 - 00000000e0000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000ff700000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000004020000000 (usable)
Early serial console at I/O port 0x3f8 (options '115200n8')
console [uart0] enabled
end_pfn_map = 67239936
Kernel panic - not syncing: Duplicated early reservation d40000-e42000

Pid: 0, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #3

Call Trace:
[<ffffffff80221545>] lapic_get_maxlvt+0x0/0x10
[<ffffffff80221657>] clear_local_APIC+0x5/0xcf
[<ffffffff80221726>] disable_local_APIC+0x5/0x17
[<ffffffff8021fe16>] smp_send_stop+0x46/0x4c
[<ffffffff80235293>] panic+0x94/0x13e
[<ffffffff80bc3b03>] sctp_eps_proc_init+0x12/0x34
[<ffffffff80b9f1c5>] reserve_early+0x30/0x6c
[<ffffffff80803925>] init_memory_mapping+0x2cd/0x2dc
[<ffffffff80b9dc01>] setup_arch+0x21f/0x44e
[<ffffffff80b978be>] start_kernel+0x6f/0x2c7
[<ffffffff80b971cc>] _sinittext+0x1cc/0x1d3

It turns out that the page tables overlap with the bss section.

In System.map we have:
ffffffff80d40420 b rsi_table
ffffffff80d40620 B krb5_seq_lock
ffffffff80d40628 b i.20437
ffffffff80d40630 b xprt_rdma_inline_write_padding
ffffffff80d40638 b sunrpc_table_header
ffffffff80d40640 b zero
ffffffff80d40644 b min_memreg
ffffffff80d40648 b rpcrdma_tk_lock_g
ffffffff80d40650 B sctp_assocs_id_lock
ffffffff80d40658 B proc_net_sctp
ffffffff80d40660 B sctp_assocs_id
ffffffff80d40680 B sysctl_sctp_mem
ffffffff80d40690 B sysctl_sctp_rmem
ffffffff80d406a0 B sysctl_sctp_wmem
ffffffff80d406b0 b sctp_ctl_socket
ffffffff80d406b8 b sctp_pf_inet6_specific
ffffffff80d406c0 b sctp_pf_inet_specific
ffffffff80d406c8 b sctp_af_v4_specific
ffffffff80d406d0 b sctp_af_v6_specific
ffffffff80d406d8 b sctp_rand.33270
ffffffff80d406dc b sctp_memory_pressure
ffffffff80d406e0 b sctp_sockets_allocated
ffffffff80d406e4 b sctp_memory_allocated
ffffffff80d406e8 b sctp_sysctl_header
ffffffff80d406f0 b zero
ffffffff80d406f4 A __bss_stop
ffffffff80d406f4 A _end

table_start needs to be rounded up to PAGE_SIZE.

Also make the panic message more informative.
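
The arithmetic behind the rounding can be sketched in userspace C. This is an illustration under stated assumptions, not the kernel code: `round_up_pow2()` is a stand-in for the kernel's `round_up()`, the two `first_table_pfn_*()` helpers are hypothetical names, and the sample input 0xd406f4 assumes that the search returned the first byte after `_end` (ffffffff80d406f4 in the listing above, taken as physical 0xd406f4).

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Userspace stand-in for round_up(); align must be a power of two. */
unsigned long round_up_pow2(unsigned long x, unsigned long align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Old behaviour: the shift alone drops the low 12 bits, i.e. rounds down. */
unsigned long first_table_pfn_old(unsigned long table_start)
{
	return table_start >> PAGE_SHIFT;
}

/* Patched behaviour: round up to a page boundary first, then shift. */
unsigned long first_table_pfn_new(unsigned long table_start)
{
	return round_up_pow2(table_start, PAGE_SIZE) >> PAGE_SHIFT;
}
```

For an input of 0xd406f4, the old calculation yields page frame 0xd40, whose page starts at 0xd40000 and therefore still contains the tail of bss (for example `proc_net_sctp` at offset 0x658 in that page, per the listing). The patched calculation yields 0xd41, the first page that lies entirely after `_end`. An input that is already page aligned is left unchanged.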

Signed-off-by: Yinghai Lu <[email protected]>

Index: linux-2.6/arch/x86/kernel/e820_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820_64.c
+++ linux-2.6/arch/x86/kernel/e820_64.c
@@ -70,8 +70,8 @@ void __init reserve_early(unsigned long
 	for (i = 0; i < MAX_EARLY_RES && early_res[i].end; i++) {
 		r = &early_res[i];
 		if (end > r->start && start < r->end)
-			panic("Duplicated early reservation %lx-%lx\n",
-			      start, end);
+			panic("Overlap early reservation %lx-%lx to %lx-%lx\n",
+			      start, end, r->start, r->end);
 	}
 	if (i >= MAX_EARLY_RES)
 		panic("Too many early reservations");
Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -358,6 +358,13 @@ static void __init find_early_table_spac
 	if (table_start == -1UL)
 		panic("Cannot find space for the kernel page tables");

+	/*
+	 * When you have a lot of RAM, like 256g, the early tables will not
+	 * fit into the 0x8000 range; find_e820_area() will find an area after
+	 * the kernel bss, but table_start is not page aligned, so it needs to
+	 * be rounded up to avoid overlap with bss.
+	 */
+	table_start = round_up(table_start, PAGE_SIZE);
 	table_start >>= PAGE_SHIFT;
 	table_end = table_start;
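
The comment's claim that the early tables do not fit in the 0x8000 range on a 256g machine can be checked with a rough calculation. The sketch below assumes a sizing formula that is not quoted in this thread: one 8-byte PUD entry per 1 GB and one 8-byte PMD entry per 2 MB of mapped memory, with each table rounded up to a whole page. `page_align()` and `early_table_bytes()` are hypothetical helper names.

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)
#define PMD_SHIFT  21		/* assumed: one PMD entry maps 2 MB */
#define PUD_SHIFT  30		/* assumed: one PUD entry maps 1 GB */
#define ENTRY_SIZE 8ULL		/* assumed: bytes per table entry */

/* Round x up to the next multiple of PAGE_SIZE. */
unsigned long long page_align(unsigned long long x)
{
	return (x + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
}

/* Bytes of PUD plus PMD tables needed to map physical [0, end). */
unsigned long long early_table_bytes(unsigned long long end)
{
	unsigned long long puds = (end + (1ULL << PUD_SHIFT) - 1) >> PUD_SHIFT;
	unsigned long long pmds = (end + (1ULL << PMD_SHIFT) - 1) >> PMD_SHIFT;

	assert(end > 0);
	return page_align(puds * ENTRY_SIZE) + page_align(pmds * ENTRY_SIZE);
}
```

With `end_pfn_map = 67239936` from the boot log, `end` is `67239936 << 12 = 0x4020000000`, and `early_table_bytes(end)` gives 0x102000 bytes, a little over 1 MB. That matches the size of the d40000-e42000 range in the panic message, which suggests the assumed formula is at least consistent with the log. The first usable e820 region ends at 0x9bc00, so only 0x93c00 bytes are available above 0x8000, which is too small, and the search has to move on to the area after the kernel image.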

2008-01-29 17:43:16

by Ingo Molnar

Subject: Re: [PATCH] x86_64: fix overlap between pagetable with bss section


* Yinghai Lu <[email protected]> wrote:

> [PATCH] x86_64: fix overlap between pagetable with bss section
>
> one early crash on one 8 node 256g machine

> +++ b/arch/x86/mm/init_64.c
> @@ -358,6 +358,8 @@ static void __init find_early_table_space(unsigned long end)
> if (table_start == -1UL)
> panic("Cannot find space for the kernel page tables");
>
> + /* need to round it up to avoid overlap less one page */
> + table_start = round_up(table_start, PAGE_SIZE);
> table_start >>= PAGE_SHIFT;
> table_end = table_start;

thanks, applied.

Ingo

2008-01-29 17:46:21

by Yinghai Lu

Subject: Re: [PATCH] x86_64: fix overlap between pagetable with bss section

On Tuesday 29 January 2008 09:42:43 am Ingo Molnar wrote:
>
> * Yinghai Lu <[email protected]> wrote:
>
> > [PATCH] x86_64: fix overlap between pagetable with bss section
> >
> > one early crash on one 8 node 256g machine
>
> > +++ b/arch/x86/mm/init_64.c
> > @@ -358,6 +358,8 @@ static void __init find_early_table_space(unsigned long end)
> > if (table_start == -1UL)
> > panic("Cannot find space for the kernel page tables");
> >
> > + /* need to round it up to avoid overlap less one page */
> > + table_start = round_up(table_start, PAGE_SIZE);
> > table_start >>= PAGE_SHIFT;
> > table_end = table_start;
>
> thanks, applied.
>
> Ingo
>

Can you use v2 instead? v2 has more comments.

Thanks

Yinghai Lu

2008-01-29 17:48:26

by Ingo Molnar

Subject: Re: [PATCH] x86_64: fix overlap between pagetable with bss section


* Yinghai Lu <[email protected]> wrote:

> > > + /* need to round it up to avoid overlap less one page */
> > > + table_start = round_up(table_start, PAGE_SIZE);
> > > table_start >>= PAGE_SHIFT;
> > > table_end = table_start;
> >
> > thanks, applied.
>
> Can you use v2 instead? v2 has more comments.

yes, i have v2.

Ingo