2005-11-29 03:31:05

by Matti Aarnio

Subject: x86-64 2.6.15-rc2-git5 fails to boot with 4GB memory

With 2 GB in place, the kernel boots just fine, but with
4 GB, it reports:

kernel direct mapping tables upto ffff 8101 5000 000 @ 8000-f000
PANIC: early exception rip ffff ffff 8016 f002 error 0 cr2 4230
PANIC: early exception rip ffff ffff 8011 d1fe error 0 cr2 ffff ffff f5ff d023

and some other lines, which I didn't jot down on paper...
These were copied from some Fedora Core development kernel version
after 2.6.15-rc1 (last working one) in a box with 4 GB memory.

Those hex values didn't have intermediate spaces in them, though.
That was me trying to understand 64 bit values.
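
To work with the values above as numbers, the spaces added during transcription have to be removed first. A minimal Python sketch, assuming the value is otherwise plain hex; the helper name `parse_spaced_hex` is hypothetical and not part of any kernel tooling:

```python
def parse_spaced_hex(text: str) -> int:
    """Parse a hex value that was written down with spaces between digit groups."""
    # str.split() with no arguments drops all whitespace, so the groups rejoin cleanly.
    return int("".join(text.split()), 16)


# First RIP value from the report above, as transcribed.
rip = parse_spaced_hex("ffff ffff 8016 f002")
print(f"{rip:#018x}")  # prints 0xffffffff8016f002
```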

The last working kernel with all 4 GB of memory in the box was 2.6.15-rc1.
Since then, the kernels have failed to boot at all unless the machine's
PHYSICAL memory is stripped down to 2 GB. Command-line options
(e.g. "mem=2G") don't help at all.

/Matti Aarnio


2005-11-29 09:33:05

by Andi Kleen

Subject: Re: x86-64 2.6.15-rc2-git5 fails to boot with 4GB memory

Matti Aarnio writes:

> With 2 GB in place, the kernel boots just fine, but with
> 4 GB, it reports:

Works for me on several machines.

I even have a fix for the Asus wrong MCFG problem now that
broke the IOMMU on these boards (workaround is pci=nommconf)

>
> kernel direct mapping tables upto ffff 8101 5000 000 @ 8000-f000
> PANIC: early exception rip ffff ffff 8016 f002 error 0 cr2 4230
> PANIC: early exception rip ffff ffff 8011 d1fe error 0 cr2 ffff ffff f5ff d023
>
> and some other lines, which I didn't jot down on paper...

Can you please look up the RIP values in your System.map?
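
A lookup of this kind can be sketched in Python, assuming the usual System.map layout of one `address type name` entry per line: find the symbol with the highest address that is still at or below the RIP. The function names and the two sample entries are made up for illustration and do not come from the reporter's System.map:

```python
import bisect


def load_system_map(lines):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def lookup(symbols, addr):
    """Return (name, offset) for the nearest symbol at or below addr, else None."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None  # addr lies below the first symbol
    base, name = symbols[i]
    return name, addr - base


# Illustrative entries only; a real run would read the lines from System.map.
sample_map = [
    "ffffffff80100000 T example_start",
    "ffffffff8016f000 T example_func",
]
symbols = load_system_map(sample_map)
name, offset = lookup(symbols, 0xFFFFFFFF8016F002)
print(f"{name}+{offset:#x}")  # prints example_func+0x2
```

With a real file, the list would come from `load_system_map(open("System.map"))`, using the System.map that matches the kernel that crashed.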

> These were copied from some Fedora Core development kernel version
> after 2.6.15-rc1 (last working one) in a box with 4 GB memory.

Please try vanilla 2.6.15-rc2 as a reference at least.

-Andi

2005-11-29 23:53:08

by Matti Aarnio

Subject: Re: x86-64 2.6.15-rc2-git5 fails to boot with 4GB memory

On Tue, Nov 29, 2005 at 07:01:12AM -0700, Andi Kleen wrote:
> Matti Aarnio writes:
>
> > With 2 GB in place, the kernel boots just fine, but with
> > 4 GB, it reports:
>
> Works for me on several machines.
>
> I even have a fix for the Asus wrong MCFG problem now that
> broke the IOMMU on these boards (workaround is pci=nommconf)
>
> >
> > kernel direct mapping tables upto ffff 8101 5000 000 @ 8000-f000
> > PANIC: early exception rip ffff ffff 8016 f002 error 0 cr2 4230
> > PANIC: early exception rip ffff ffff 8011 d1fe error 0 cr2 ffff ffff f5ff d023
> >
> > and some other lines, which I didn't jot down on paper...
>
> Can you please look up the RIP values in your System.map?
>
> > These were copied from some Fedora Core development kernel version
> > after 2.6.15-rc1 (last working one) in a box with 4 GB memory.
>
> Please try vanilla 2.6.15-rc2 as a reference at least.

Tried. It crashes with 4 GB of memory present in the box.
It boots and runs nicely with 2 GB of memory populated.

After adding -g to *CFLAGS in the top-level Makefile, I tried
to determine WHERE those PANICs happened in rc2:

(gdb) list *0xffffffff80163a43
0xffffffff80163a43 is in memmap_init_zone (mm/page_alloc.c:1687).
1682        for (pfn = start_pfn; pfn < end_pfn; pfn++, page++) {
1683                if (!early_pfn_valid(pfn))
1684                        continue;
1685                if (!early_pfn_in_nid(pfn, nid))
1686                        continue;
1687                page = pfn_to_page(pfn);
1688                set_page_links(page, zone, nid, pfn);
1689                set_page_count(page, 1);
1690                reset_page_mapcount(page);
1691                SetPageReserved(page);

(gdb) list *0xffffffff801196fa
0xffffffff801196fa is in safe_smp_processor_id (include/asm/smp.h:77).
72      #define raw_smp_processor_id() read_pda(cpunumber)
73
74      static inline int hard_smp_processor_id(void)
75      {
76              /* we don't want to mark this access volatile - bad code generation */
77              return GET_APIC_ID(*(unsigned int *)(APIC_BASE+APIC_ID));
78      }
79
80      extern int safe_smp_processor_id(void);
81      extern int __cpu_disable(void);


Not that those explain all that much...


> -Andi

/Matti Aarnio

2005-11-30 00:31:20

by Andi Kleen

Subject: Re: x86-64 2.6.15-rc2-git5 fails to boot with 4GB memory

> Not that those explain all that much...

Can you send me your .config? If you have SPARSEMEM enabled can you
disable it?
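
Checking the setting can be sketched in Python, assuming the standard .config syntax in which an enabled option appears as `CONFIG_NAME=value` and a disabled one as a `# CONFIG_NAME is not set` comment. The helper name `config_value` and the sample lines are illustrative, not taken from the reporter's .config:

```python
def config_value(lines, option):
    """Return the value assigned to a Kconfig option, or None if it is not set."""
    prefix = option + "="
    for line in lines:
        line = line.strip()
        # Disabled options appear only as comments, so they never match the prefix.
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


# Illustrative lines only; a real check would read them from the kernel's .config.
sample_config = [
    "# CONFIG_FLATMEM is not set",
    "CONFIG_SPARSEMEM=y",
]
print(config_value(sample_config, "CONFIG_SPARSEMEM"))  # prints y
print(config_value(sample_config, "CONFIG_FLATMEM"))    # prints None
```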

-Andi

2005-11-30 02:26:29

by Keith Mannthey

Subject: Re: x86-64 2.6.15-rc2-git5 fails to boot with 4GB memory

On 11/29/05, Andi Kleen wrote:
> > Not that those explain all that much...
>
> Can you send me your .config? If you have SPARSEMEM enabled can you
> disable it?

This looks just like the sparsemem troubles. There is a patch around
somewhere... I thought a patch was being pushed into mainline but I
guess not.

Thanks,
Keith

2005-11-30 02:56:32

by Andi Kleen

Subject: Re: x86-64 2.6.15-rc2-git5 fails to boot with 4GB memory

On Tue, Nov 29, 2005 at 06:26:28PM -0800, Keith Mannthey wrote:
> On 11/29/05, Andi Kleen wrote:
> > > Not that those explain all that much...
> >
> > Can you send me your .config? If you have SPARSEMEM enabled can you
> > disable it?
>
> This looks just like the sparsemem troubles. There is a patch around
> somewhere... I thought a patch was being pushed into mainline but I
> guess not.

It was, I think. But I still don't trust it.

-Andi