2008-12-30 21:19:30

by David Lang

Subject: early exception error

I'm trying to upgrade an older dual opteron box to a current 64 bit system,
but I'm getting early exception errors from several different kernels

I tried the ubuntu 8.04 and 8.10 disks, and then I decided to try hunting
it down myself, so I took my 2.6.25 32 bit config, copied it to a 64 bit
system, and did a make oldconfig with 2.6.28. I am getting the same type
of error (the address changes from kernel to kernel and from config to
config on the same kernel).

doing a grep through System.map for the address that appears in the error
returns nothing

attached are snapshots of the screen when the error happens and two
different configs with the same error (I tried disabling high-res timers,
tickless operation, and PAT, but got the same error), the 28 image is the
-2 config.

where do I go from here to track this down?

David Lang


Attachments:
IMG00028.jpg (160.59 kB)
IMG00027.jpg (164.78 kB)
kernel.config-2.6.28-64 (50.11 kB)
config.kernel-2.6.28-64-2 (50.10 kB)

2008-12-30 21:26:56

by Cyrill Gorcunov

Subject: Re: early exception error

[[email protected] - Tue, Dec 30, 2008 at 02:21:10PM -0800]
> I'm trying to upgrade a older dual opteron box to a current 64 bit
> system, but I'm getting early exception errors from several different
> kernels
>
> I tried the ubuntu 8.04 and 8.10 disks, and then I decided to try hunting
> it down myself, so I took my 2.6.25 32 bit config copied it to a 64 bit
> system and did a make oldconfig with 2.6.28 and am getting the same type
> of error (the address changes from kernel to kernel and config to config
> on the same kernel)
>
> doing a grep through System.map for the address that appears in the error
> returns nothing
>
> attached are snapshots of the screen when the error happens and two
> different configs with the same error (I tried disabling high-res timers,
> tickless operation, and PAT, but got the same error), the 28 image is the
> -2 config.
>
> where do I go from here to track this down?
>
> David Lang
...

Hi David,

not sure if it's the same, but I've seen similar errors from
early_params not being able to handle NULL passed args.
What is the cmdline? It seems to be framebuffer related.
But otoh I could just be wrong.

- Cyrill -

2008-12-30 21:29:36

by David Lang

Subject: Re: early exception error

On Wed, 31 Dec 2008, Cyrill Gorcunov wrote:

> [[email protected] - Tue, Dec 30, 2008 at 02:21:10PM -0800]
>> I'm trying to upgrade a older dual opteron box to a current 64 bit
>> system, but I'm getting early exception errors from several different
>> kernels
>>
>> I tried the ubuntu 8.04 and 8.10 disks, and then I decided to try hunting
>> it down myself, so I took my 2.6.25 32 bit config copied it to a 64 bit
>> system and did a make oldconfig with 2.6.28 and am getting the same type
>> of error (the address changes from kernel to kernel and config to config
>> on the same kernel)
>>
>> doing a grep through System.map for the address that appears in the error
>> returns nothing
>>
>> attached are snapshots of the screen when the error happens and two
>> different configs with the same error (I tried disabling high-res timers,
>> tickless operation, and PAT, but got the same error), the 28 image is the
>> -2 config.
>>
>> where do I go from here to track this down?
>>
>> David Lang
> ...
>
> Hi David,
>
> not sure if it's the same but I've got similar errors for
> early_params being not capable to handle null passed args.
> What is the cmdline? It seems to be framebuffer related.
> But otoh I could be just wrong.

very trivial

grep -v "^#" /etc/lilo.conf
boot = /dev/sda
message = /boot/boot_message.txt
prompt
compact
timeout = 1200
change-rules
reset
vga = ask
image = /boot/vmlinuz-2.6.28-64-2
root = /dev/sda2
label = 2.6.28-64

I trimmed the additional boot images from this.

David Lang

2008-12-30 21:41:58

by Cyrill Gorcunov

Subject: Re: early exception error

[[email protected] - Tue, Dec 30, 2008 at 02:31:28PM -0800]
> On Wed, 31 Dec 2008, Cyrill Gorcunov wrote:
>
>> [[email protected] - Tue, Dec 30, 2008 at 02:21:10PM -0800]
>>> I'm trying to upgrade a older dual opteron box to a current 64 bit
>>> system, but I'm getting early exception errors from several different
>>> kernels
>>>
>>> I tried the ubuntu 8.04 and 8.10 disks, and then I decided to try hunting
>>> it down myself, so I took my 2.6.25 32 bit config copied it to a 64 bit
>>> system and did a make oldconfig with 2.6.28 and am getting the same type
>>> of error (the address changes from kernel to kernel and config to config
>>> on the same kernel)
>>>
>>> doing a grep through System.map for the address that appears in the error
>>> returns nothing
>>>
>>> attached are snapshots of the screen when the error happens and two
>>> different configs with the same error (I tried disabling high-res timers,
>>> tickless operation, and PAT, but got the same error), the 28 image is the
>>> -2 config.
>>>
>>> where do I go from here to track this down?
>>>
>>> David Lang
>> ...
>>
>> Hi David,
>>
>> not sure if it's the same but I've got similar errors for
>> early_params being not capable to handle null passed args.
>> What is the cmdline? It seems to be framebuffer related.
>> But otoh I could be just wrong.
>
> very trivial
>
> grep -v "^#" /etc/lilo.conf
> boot = /dev/sda
> message = /boot/boot_message.txt
> prompt
> compact
> timeout = 1200
> change-rules
> reset
> vga = ask
> image = /boot/vmlinuz-2.6.28-64-2
> root = /dev/sda2
> label = 2.6.28-64
>
> I trimmed the additional boot images from this.
>
> David Lang
>

thanks David, will try to reproduce it

- Cyrill -

2008-12-30 21:46:35

by David Lang

Subject: Re: early exception error

On Wed, 31 Dec 2008, Cyrill Gorcunov wrote:

> [[email protected] - Tue, Dec 30, 2008 at 02:31:28PM -0800]
>> On Wed, 31 Dec 2008, Cyrill Gorcunov wrote:
>>
>>> [[email protected] - Tue, Dec 30, 2008 at 02:21:10PM -0800]
>>>> I'm trying to upgrade a older dual opteron box to a current 64 bit
>>>> system, but I'm getting early exception errors from several different
>>>> kernels
>>>>
>>>> I tried the ubuntu 8.04 and 8.10 disks, and then I decided to try hunting
>>>> it down myself, so I took my 2.6.25 32 bit config copied it to a 64 bit
>>>> system and did a make oldconfig with 2.6.28 and am getting the same type
>>>> of error (the address changes from kernel to kernel and config to config
>>>> on the same kernel)
>>>>
>>>> doing a grep through System.map for the address that appears in the error
>>>> returns nothing
>>>>
>>>> attached are snapshots of the screen when the error happens and two
>>>> different configs with the same error (I tried disabling high-res timers,
>>>> tickless operation, and PAT, but got the same error), the 28 image is the
>>>> -2 config.
>>>>
>>>> where do I go from here to track this down?
>>>>
>>>> David Lang
>>> ...
>>>
>>> Hi David,
>>>
>>> not sure if it's the same but I've got similar errors for
>>> early_params being not capable to handle null passed args.
>>> What is the cmdline? It seems to be framebuffer related.
>>> But otoh I could be just wrong.
>>
>> very trivial
>>
>> grep -v "^#" /etc/lilo.conf
>> boot = /dev/sda
>> message = /boot/boot_message.txt
>> prompt
>> compact
>> timeout = 1200
>> change-rules
>> reset
>> vga = ask
>> image = /boot/vmlinuz-2.6.28-64-2
>> root = /dev/sda2
>> label = 2.6.28-64
>>
>> I trimmed the additional boot images from this.
>>
>> David Lang
>>
>
> thanks David, will try to reproduce it

I disabled the framebuffer and still got the error

the system is a dual opteron 240 with an adaptec 3210S i2o raid card and
a Radeon video card

lspci returns

00:06.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8111 PCI (rev 07)
00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-8111 LPC (rev 05)
00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-8111 IDE (rev 03)
00:07.2 SMBus: Advanced Micro Devices [AMD] AMD-8111 SMBus 2.0 (rev 02)
00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-8111 ACPI (rev 05)
00:0a.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
00:0a.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
00:0b.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
00:0b.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:01.0 PCI bridge: PLX Technology, Inc.: Unknown device 8111 (rev 21)
01:03.0 PCI bridge: Adaptec (formerly DPT) PCI Bridge (rev 01)
01:03.1 I2O: Adaptec (formerly DPT) SmartRAID V Controller (rev 01)
03:00.0 VGA compatible controller: ATI Technologies Inc: Unknown device 7183
03:00.1 Display controller: ATI Technologies Inc: Unknown device 71a3
04:09.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
04:09.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
05:00.0 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
05:00.1 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
05:08.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 10)

2008-12-31 00:24:23

by Andi Kleen

Subject: Re: early exception error

[email protected] writes:
>
> doing a grep through System.map for the address that appears in the
> error returns nothing

This might be obvious, but you can't grep directly for these addresses
because System.map contains the starting addresses of functions only
and normally the reported address is somewhere in the middle of a
function. So you instead have to look for the highest address that is lower
than or equal to the address from the exception.

-Andi

--
[email protected]

2008-12-31 00:37:33

by David Lang

Subject: Re: early exception error

On Wed, 31 Dec 2008, Andi Kleen wrote:

> [email protected] writes:
>>
>> doing a grep through System.map for the address that appears in the
>> error returns nothing
>
> This might be obvious, but you can't grep directly for these addresses
> because System.map contains the starting addresses of functions only
> and normally the reported address is somewhere in the middle of a
> function. So you instead have to look for the highest number lower or equal
> the address from the exception.

thanks, this was not obvious to me

the -2 error maps to

ffffffff8099e4c1 T free_bootmem_node
ffffffff8099e4e5 t alloc_bootmem_core
ffffffff8099e774 t ___alloc_bootmem_nopanic


the first error maps to

ffffffff809c2de4 T free_bootmem_node
ffffffff809c2e08 t alloc_bootmem_core
ffffffff809c3097 t ___alloc_bootmem_nopanic


so it looks like this is in alloc_bootmem_core in both cases.
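
The lookup Andi describes (find the highest symbol address that is lower
than or equal to the faulting address) can be sketched in a few lines of
Python. This is only an illustration: the three System.map lines are the
ones from the -2 kernel above, while the address 0xffffffff8099e5a0 is a
made-up value, since the real one is only visible in the screenshots.

```python
import bisect

# Three lines copied from the -2 kernel's System.map as quoted in this thread.
SAMPLE_MAP = """\
ffffffff8099e4c1 T free_bootmem_node
ffffffff8099e4e5 t alloc_bootmem_core
ffffffff8099e774 t ___alloc_bootmem_nopanic
"""


def parse_system_map(text):
    """Return a sorted list of (address, name) pairs from System.map text."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        addr, _symtype, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def lookup_symbol(symbols, address):
    """Return (name, offset) for the symbol containing address, or None.

    The containing symbol is the one with the highest start address that is
    lower than or equal to the given address.
    """
    starts = [start for start, _name in symbols]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None  # address lies below the first symbol
    start, name = symbols[index]
    return name, address - start


symbols = parse_system_map(SAMPLE_MAP)
# Hypothetical faulting address, chosen only to demonstrate the lookup.
name, offset = lookup_symbol(symbols, 0xFFFFFFFF8099E5A0)
print(f"{name}+{offset:#x}")  # prints "alloc_bootmem_core+0xbb"
```

With a full System.map the same function works unchanged; only the sample
text would be replaced by the contents of the file.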

David Lang

2008-12-31 01:12:28

by Andi Kleen

Subject: Re: early exception error

On Tue, Dec 30, 2008 at 05:39:29PM -0800, [email protected] wrote:
> On Wed, 31 Dec 2008, Andi Kleen wrote:
>
> >[email protected] writes:
> >>
> >>doing a grep through System.map for the address that appears in the
> >>error returns nothing
> >
> >This might be obvious, but you can't grep directly for these addresses
> >because System.map contains the starting addresses of functions only
> >and normally the reported address is somewhere in the middle of a
> >function. So you instead have to look for the highest number lower or equal
> >the address from the exception.
>
> thanks, this was not obvious to me
>
> the -2 error maps to

You should also boot with earlyprintk=vga to get more context.

-Andi

2008-12-31 09:38:19

by Cyrill Gorcunov

Subject: Re: early exception error

[[email protected] - Tue, Dec 30, 2008 at 05:39:29PM -0800]
> On Wed, 31 Dec 2008, Andi Kleen wrote:
>
>> [email protected] writes:
>>>
>>> doing a grep through System.map for the address that appears in the
>>> error returns nothing
>>
>> This might be obvious, but you can't grep directly for these addresses
>> because System.map contains the starting addresses of functions only
>> and normally the reported address is somewhere in the middle of a
>> function. So you instead have to look for the highest number lower or equal
>> the address from the exception.
>
> thanks, this was not obvious to me
>
> the -2 error maps to
>
> ffffffff8099e4c1 T free_bootmem_node
> ffffffff8099e4e5 t alloc_bootmem_core
> ffffffff8099e774 t ___alloc_bootmem_nopanic
>
>
> the first error maps to
>
> ffffffff809c2de4 T free_bootmem_node
> ffffffff809c2e08 t alloc_bootmem_core
> ffffffff809c3097 t ___alloc_bootmem_nopanic
>
>
> so it looks like this is in alloc_bootmem_core in both cases.
>
> David Lang
>

Along with Andi's proposed earlyprintk=vga, I think the
bootmem_debug option could be useful here too.

- Cyrill -

2008-12-31 18:10:40

by David Lang

Subject: Re: early exception error

On Wed, 31 Dec 2008, Cyrill Gorcunov wrote:

> [[email protected] - Tue, Dec 30, 2008 at 05:39:29PM -0800]
>> On Wed, 31 Dec 2008, Andi Kleen wrote:
>>
>>> [email protected] writes:
>>>>
>>>> doing a grep through System.map for the address that appears in the
>>>> error returns nothing
>>>
>>> This might be obvious, but you can't grep directly for these addresses
>>> because System.map contains the starting addresses of functions only
>>> and normally the reported address is somewhere in the middle of a
>>> function. So you instead have to look for the highest number lower or equal
>>> the address from the exception.
>>
>> thanks, this was not obvious to me
>>
>> the -2 error maps to
>>
>> ffffffff8099e4c1 T free_bootmem_node
>> ffffffff8099e4e5 t alloc_bootmem_core
>> ffffffff8099e774 t ___alloc_bootmem_nopanic
>>
>>
>> the first error maps to
>>
>> ffffffff809c2de4 T free_bootmem_node
>> ffffffff809c2e08 t alloc_bootmem_core
>> ffffffff809c3097 t ___alloc_bootmem_nopanic
>>
>>
>> so it looks like this is in alloc_bootmem_core in both cases.
>>
>> David Lang
>>
>
> Along with Andi's proposed earlyprintk=vga I think
> bootmem_debug option could be usefull here too.

adding bootmem_debug creates so much additional output that the oops
scrolls off the screen (except the last 'paragraph' of it)

it looks like it's individual items being allocated (trying to scan it as
it scrolled by)

if that output is needed I will need to set up a serial console to gather
it (can this be done for the earlyprintk?)

David Lang

2008-12-31 18:30:52

by Cyrill Gorcunov

Subject: Re: early exception error

[[email protected] - Wed, Dec 31, 2008 at 11:12:12AM -0800]
> On Wed, 31 Dec 2008, Cyrill Gorcunov wrote:
>
>> [[email protected] - Tue, Dec 30, 2008 at 05:39:29PM -0800]
>>> On Wed, 31 Dec 2008, Andi Kleen wrote:
>>>
>>>> [email protected] writes:
>>>>>
>>>>> doing a grep through System.map for the address that appears in the
>>>>> error returns nothing
>>>>
>>>> This might be obvious, but you can't grep directly for these addresses
>>>> because System.map contains the starting addresses of functions only
>>>> and normally the reported address is somewhere in the middle of a
>>>> function. So you instead have to look for the highest number lower or equal
>>>> the address from the exception.
>>>
>>> thanks, this was not obvious to me
>>>
>>> the -2 error maps to
>>>
>>> ffffffff8099e4c1 T free_bootmem_node
>>> ffffffff8099e4e5 t alloc_bootmem_core
>>> ffffffff8099e774 t ___alloc_bootmem_nopanic
>>>
>>>
>>> the first error maps to
>>>
>>> ffffffff809c2de4 T free_bootmem_node
>>> ffffffff809c2e08 t alloc_bootmem_core
>>> ffffffff809c3097 t ___alloc_bootmem_nopanic
>>>
>>>
>>> so it looks like this is in alloc_bootmem_core in both cases.
>>>
>>> David Lang
>>>
>>
>> Along with Andi's proposed earlyprintk=vga I think
>> bootmem_debug option could be usefull here too.
>
> adding bootmem_debug creates so much additonal output that the oops
> scrolls off the screen (except the last 'paragraph' of it)
>
> it looks like it's individual items being allocated (trying to scan it as
> it scrolled by)

on the picture you sent me I noticed the message
"Your memory is not aligned you need to rebuild your
kernel with bigger NODEMAP SIZE shift=20" and then the
SRAT code complains about "No NUMA code hash function found"
which looks a bit scary. Btw, could you post this picture
on some public resource so NUMA people could check it?

>
> if that output is needed I will need to setup a serial console to gather
> it (can this be done for the earlyprintk?)

yep, earlyprintk=serial (at least the code says it supports it :)

>
> David Lang
>
- Cyrill -

2008-12-31 20:20:28

by Cyrill Gorcunov

Subject: Re: early exception error

[Andi Kleen - Wed, Dec 31, 2008 at 08:50:05PM +0100]
| > on the picture you sent me i noticed the message
| > "Your memory is not aligned you need to rebuild your
| > kernel with bigger NODEMAP SIZE shift=20" and then
| > srat code complains about "No NUMA code hash function found"
| > which looks a bit scary. Btw, could you post this picture
| > on some public resource so NUMA people could check it?
|
| This case used to be handled cleanly (NUMA disabled), but perhaps
| that has regressed. But still it sounds like something is going wrong,
| unless his machine really has a very weird memory map.
|
| -Andi
| --
| [email protected]
|

Andi, it seems I missed where on the photo NUMA is disabled.
At least on the picture 2 nodes are reported as present (the nodes
are shown with 10 and 20 bootmap pages).

- Cyrill -

2008-12-31 20:27:58

by Cyrill Gorcunov

Subject: Re: early exception error

[Cyrill Gorcunov - Wed, Dec 31, 2008 at 11:23:46PM +0300]
...
| >>
| >> also you could just pass numa=off and check if it help.
| >> (even if it help it would not mean that problem are gone
| >> but become hidden)
| >
| > with numa=off the system looks like it gets a bit further
| >
| > http://linux.lang.hm/linux/IMG00031.jpg
| >
| > this is with framebuffer disabled, earlyprintk=vga bootmem_debug numa=off
| >
| > David Lang
| >
|
| Thanks David, if I recognize correctly now it fails at
| vfs_caches_init. hmm...
|
| - Cyrill -

no surprises - vfs_caches_init uses SLAB_PANIC, and since
we seem to have had "memory related" problems earlier, now
we've been caught explicitly by the slab code.

- Cyrill -

2008-12-31 20:23:59

by Cyrill Gorcunov

Subject: Re: early exception error

[[email protected] - Wed, Dec 31, 2008 at 01:18:25PM -0800]
> On Wed, 31 Dec 2008, Cyrill Gorcunov wrote:
>
>> [[email protected] - Wed, Dec 31, 2008 at 12:07:33PM -0800]
>>> On Wed, 31 Dec 2008, Cyrill Gorcunov wrote:
>>>
>>>> [[email protected] - Wed, Dec 31, 2008 at 11:12:12AM -0800]
>>>>> On Wed, 31 Dec 2008, Cyrill Gorcunov wrote:
>>>>>
>>>>>> [[email protected] - Tue, Dec 30, 2008 at 05:39:29PM -0800]
>>>>>>>
>>>>>>> so it looks like this is in alloc_bootmem_core in both cases.
>>>>>>>
>>>>>>> David Lang
>>>>>>>
>>>>>>
>>>>>> Along with Andi's proposed earlyprintk=vga I think
>>>>>> bootmem_debug option could be usefull here too.
>>>>>
>>>>> adding bootmem_debug creates so much additonal output that the oops
>>>>> scrolls off the screen (except the last 'paragraph' of it)
>>>>>
>>>>> it looks like it's individual items being allocated (trying to scan it as
>>>>> it scrolled by)
>>>>
>>>> on the picture you sent me i noticed the message
>>>> "Your memory is not aligned you need to rebuild your
>>>> kernel with bigger NODEMAP SIZE shift=20" and then
>>>> srat code complains about "No NUMA code hash function found"
>>>> which looks a bit scary. Btw, could you post this picture
>>>> on some public resource so NUMA people could check it?
>>>
>>> http://linux.lang.hm/linux/IMG00030.jpg
>>>
>>> I'll try rebuilding with a bigger nodemap size and let you know
>>>
>>> David Lang
>>>
>>
>> also you could just pass numa=off and check if it help.
>> (even if it help it would not mean that problem are gone
>> but become hidden)
>
> with numa=off the system looks like it gets a bit further
>
> http://linux.lang.hm/linux/IMG00031.jpg
>
> this is with framebuffer disabled, earlyprintk=vga bootmem_debug numa=off
>
> David Lang
>

Thanks David, if I read it correctly, it now fails at
vfs_caches_init. hmm...

- Cyrill -

2008-12-31 19:57:18

by David Lang

Subject: Re: early exception error

On Wed, 31 Dec 2008, Andi Kleen wrote:

>> on the picture you sent me i noticed the message
>> "Your memory is not aligned you need to rebuild your
>> kernel with bigger NODEMAP SIZE shift=20" and then
>> srat code complains about "No NUMA code hash function found"
>> which looks a bit scary. Btw, could you post this picture
>> on some public resource so NUMA people could check it?
>
> This case used to be handled cleanly (NUMA disabled), but perhaps
> that has regressed. But still it sounds like something is going wrong,
> unless his machine really has a very weird memory map.

it shouldn't, it was one of the high-volume servers 4-5 years ago and only
has 4G of ram in it

here's the start of the boot with 2.6.25 (32 bit)

BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000004cff0000 (usable)
BIOS-e820: 000000004cff0000 - 000000004cfff000 (ACPI data)
BIOS-e820: 000000004cfff000 - 000000004d000000 (ACPI NVS)
BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000000180000000 (usable)
Warning only 4GB will be used.
Use a HIGHMEM64G enabled kernel.
3200MB HIGHMEM available.
896MB LOWMEM available.
Scan SMP from c0000000 for 1024 bytes.
Scan SMP from c009fc00 for 1024 bytes.
Scan SMP from c00f0000 for 65536 bytes.
found SMP MP-table at [c00ff780] 000ff780
Entering add_active_range(0, 0, 1048576) 0 entries of 256 used
Zone PFN ranges:
DMA 0 -> 4096
Normal 4096 -> 229376
HighMem 229376 -> 1048576
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
0: 0 -> 1048576
On node 0 totalpages: 1048576
DMA zone: 32 pages used for memmap
DMA zone: 0 pages reserved
DMA zone: 4064 pages, LIFO batch:0
Normal zone: 1760 pages used for memmap
Normal zone: 223520 pages, LIFO batch:31
HighMem zone: 6400 pages used for memmap
HighMem zone: 812800 pages, LIFO batch:31
Movable zone: 0 pages used for memmap
DMI 2.3 present.
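
As a cross-check of the numbers in this log, here is a small Python sketch
that sums the "usable" e820 ranges and converts the zone PFN ranges to
sizes. It is only an illustration: the helper names are made up for this
example, and the 4 KiB page size is the assumption that x86 normally uses.

```python
import re

# e820 lines copied from the 2.6.25 boot log above.
BOOT_LOG = """\
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000004cff0000 (usable)
BIOS-e820: 000000004cff0000 - 000000004cfff000 (ACPI data)
BIOS-e820: 000000004cfff000 - 000000004d000000 (ACPI NVS)
BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000000180000000 (usable)
"""

E820_LINE = re.compile(r"BIOS-e820:\s+([0-9a-f]+)\s+-\s+([0-9a-f]+)\s+\(([^)]+)\)")
PAGE_SIZE = 4096  # assumption: 4 KiB pages
MIB = 1024 * 1024


def parse_e820(log):
    """Return a list of (start, end, kind) tuples, end being exclusive."""
    return [
        (int(m.group(1), 16), int(m.group(2), 16), m.group(3))
        for m in E820_LINE.finditer(log)
    ]


def usable_bytes(entries):
    """Total size of all ranges marked usable."""
    return sum(end - start for start, end, kind in entries if kind == "usable")


def highest_usable_end(entries):
    """End address of the topmost usable range."""
    return max(end for _start, end, kind in entries if kind == "usable")


def pages_to_mib(pages):
    """Convert a page count to MiB using PAGE_SIZE."""
    return pages * PAGE_SIZE // MIB


entries = parse_e820(BOOT_LOG)
print("usable MiB:", usable_bytes(entries) // MIB)
print("top of usable RAM:", hex(highest_usable_end(entries)))
# Zone PFN ranges from the log: lowmem is PFN 0 -> 229376,
# highmem is PFN 229376 -> 1048576.
print("lowmem MiB:", pages_to_mib(229376))             # 896, as in the log
print("highmem MiB:", pages_to_mib(1048576 - 229376))  # 3200, as in the log
```

The last usable range ends at 0x180000000, above the 4 GiB mark, which
appears consistent with the "only 4GB will be used" warning printed by the
32 bit kernel.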

2008-12-31 20:16:29

by David Lang

Subject: Re: early exception error

On Wed, 31 Dec 2008, Cyrill Gorcunov wrote:

> [[email protected] - Wed, Dec 31, 2008 at 12:07:33PM -0800]
>> On Wed, 31 Dec 2008, Cyrill Gorcunov wrote:
>>
>>> [[email protected] - Wed, Dec 31, 2008 at 11:12:12AM -0800]
>>>> On Wed, 31 Dec 2008, Cyrill Gorcunov wrote:
>>>>
>>>>> [[email protected] - Tue, Dec 30, 2008 at 05:39:29PM -0800]
>>>>>>
>>>>>> so it looks like this is in alloc_bootmem_core in both cases.
>>>>>>
>>>>>> David Lang
>>>>>>
>>>>>
>>>>> Along with Andi's proposed earlyprintk=vga I think
>>>>> bootmem_debug option could be usefull here too.
>>>>
>>>> adding bootmem_debug creates so much additonal output that the oops
>>>> scrolls off the screen (except the last 'paragraph' of it)
>>>>
>>>> it looks like it's individual items being allocated (trying to scan it as
>>>> it scrolled by)
>>>
>>> on the picture you sent me i noticed the message
>>> "Your memory is not aligned you need to rebuild your
>>> kernel with bigger NODEMAP SIZE shift=20" and then
>>> srat code complains about "No NUMA code hash function found"
>>> which looks a bit scary. Btw, could you post this picture
>>> on some public resource so NUMA people could check it?
>>
>> http://linux.lang.hm/linux/IMG00030.jpg
>>
>> I'll try rebuilding with a bigger nodemap size and let you know
>>
>> David Lang
>>
>
> also you could just pass numa=off and check if it help.
> (even if it help it would not mean that problem are gone
> but become hidden)

with numa=off the system looks like it gets a bit further

http://linux.lang.hm/linux/IMG00031.jpg

this is with framebuffer disabled, earlyprintk=vga bootmem_debug numa=off

David Lang

2008-12-31 19:36:50

by Andi Kleen

Subject: Re: early exception error

> on the picture you sent me i noticed the message
> "Your memory is not aligned you need to rebuild your
> kernel with bigger NODEMAP SIZE shift=20" and then
> srat code complains about "No NUMA code hash function found"
> which looks a bit scary. Btw, could you post this picture
> on some public resource so NUMA people could check it?

This case used to be handled cleanly (NUMA disabled), but perhaps
that has regressed. But still it sounds like something is going wrong,
unless his machine really has a very weird memory map.

-Andi
--
[email protected]

2008-12-31 22:30:26

by Cyrill Gorcunov

Subject: Re: early exception error

[Cyrill Gorcunov - Wed, Dec 31, 2008 at 11:27:44PM +0300]
...
| | Thanks David, if I recognize correctly now it fails at
| | vfs_caches_init. hmm...
| |
| | - Cyrill -
|
| no suprises - vfs_caches_init uses SLAB_PANIC and since
| we seems to have "memory related" problems earlier now
| we've been catched explicitly by slab code.
|
| - Cyrill -

on the other hand, I think we would see a different form of
Oops here if it really were a slab panic... checking.

- Cyrill -

2008-12-31 19:05:35

by David Lang

Subject: Re: early exception error

On Wed, 31 Dec 2008, Cyrill Gorcunov wrote:

> [[email protected] - Wed, Dec 31, 2008 at 11:12:12AM -0800]
>> On Wed, 31 Dec 2008, Cyrill Gorcunov wrote:
>>
>>> [[email protected] - Tue, Dec 30, 2008 at 05:39:29PM -0800]
>>>> On Wed, 31 Dec 2008, Andi Kleen wrote:
>>>>
>>>>> [email protected] writes:
>>>>>>
>>>>>> doing a grep through System.map for the address that appears in the
>>>>>> error returns nothing
>>>>>
>>>>> This might be obvious, but you can't grep directly for these addresses
>>>>> because System.map contains the starting addresses of functions only
>>>>> and normally the reported address is somewhere in the middle of a
>>>>> function. So you instead have to look for the highest number lower or equal
>>>>> the address from the exception.
>>>>
>>>> thanks, this was not obvious to me
>>>>
>>>> the -2 error maps to
>>>>
>>>> ffffffff8099e4c1 T free_bootmem_node
>>>> ffffffff8099e4e5 t alloc_bootmem_core
>>>> ffffffff8099e774 t ___alloc_bootmem_nopanic
>>>>
>>>>
>>>> the first error maps to
>>>>
>>>> ffffffff809c2de4 T free_bootmem_node
>>>> ffffffff809c2e08 t alloc_bootmem_core
>>>> ffffffff809c3097 t ___alloc_bootmem_nopanic
>>>>
>>>>
>>>> so it looks like this is in alloc_bootmem_core in both cases.
>>>>
>>>> David Lang
>>>>
>>>
>>> Along with Andi's proposed earlyprintk=vga I think
>>> bootmem_debug option could be usefull here too.
>>
>> adding bootmem_debug creates so much additonal output that the oops
>> scrolls off the screen (except the last 'paragraph' of it)
>>
>> it looks like it's individual items being allocated (trying to scan it as
>> it scrolled by)
>
> on the picture you sent me i noticed the message
> "Your memory is not aligned you need to rebuild your
> kernel with bigger NODEMAP SIZE shift=20" and then
> srat code complains about "No NUMA code hash function found"
> which looks a bit scary. Btw, could you post this picture
> on some public resource so NUMA people could check it?

http://linux.lang.hm/linux/IMG00030.jpg

I'll try rebuilding with a bigger nodemap size and let you know

David Lang

2008-12-31 19:12:29

by Cyrill Gorcunov

Subject: Re: early exception error

[[email protected] - Wed, Dec 31, 2008 at 12:07:33PM -0800]
> On Wed, 31 Dec 2008, Cyrill Gorcunov wrote:
>
>> [[email protected] - Wed, Dec 31, 2008 at 11:12:12AM -0800]
>>> On Wed, 31 Dec 2008, Cyrill Gorcunov wrote:
>>>
>>>> [[email protected] - Tue, Dec 30, 2008 at 05:39:29PM -0800]
>>>>> On Wed, 31 Dec 2008, Andi Kleen wrote:
>>>>>
>>>>>> [email protected] writes:
>>>>>>>
>>>>>>> doing a grep through System.map for the address that appears in the
>>>>>>> error returns nothing
>>>>>>
>>>>>> This might be obvious, but you can't grep directly for these addresses
>>>>>> because System.map contains the starting addresses of functions only
>>>>>> and normally the reported address is somewhere in the middle of a
>>>>>> function. So you instead have to look for the highest number lower or equal
>>>>>> the address from the exception.
>>>>>
>>>>> thanks, this was not obvious to me
>>>>>
>>>>> the -2 error maps to
>>>>>
>>>>> ffffffff8099e4c1 T free_bootmem_node
>>>>> ffffffff8099e4e5 t alloc_bootmem_core
>>>>> ffffffff8099e774 t ___alloc_bootmem_nopanic
>>>>>
>>>>>
>>>>> the first error maps to
>>>>>
>>>>> ffffffff809c2de4 T free_bootmem_node
>>>>> ffffffff809c2e08 t alloc_bootmem_core
>>>>> ffffffff809c3097 t ___alloc_bootmem_nopanic
>>>>>
>>>>>
>>>>> so it looks like this is in alloc_bootmem_core in both cases.
>>>>>
>>>>> David Lang
>>>>>
>>>>
>>>> Along with Andi's proposed earlyprintk=vga I think
>>>> bootmem_debug option could be usefull here too.
>>>
>>> adding bootmem_debug creates so much additonal output that the oops
>>> scrolls off the screen (except the last 'paragraph' of it)
>>>
>>> it looks like it's individual items being allocated (trying to scan it as
>>> it scrolled by)
>>
>> on the picture you sent me i noticed the message
>> "Your memory is not aligned you need to rebuild your
>> kernel with bigger NODEMAP SIZE shift=20" and then
>> srat code complains about "No NUMA code hash function found"
>> which looks a bit scary. Btw, could you post this picture
>> on some public resource so NUMA people could check it?
>
> http://linux.lang.hm/linux/IMG00030.jpg
>
> I'll try rebuilding with a bigger nodemap size and let you know
>
> David Lang
>

also you could just pass numa=off and check if it helps.
(even if it helps, that would not mean the problem is gone,
only that it has become hidden)

- Cyrill -

2009-01-01 04:04:18

by Andi Kleen

Subject: Re: early exception error

On Wed, Dec 31, 2008 at 12:59:08PM -0800, [email protected] wrote:
> On Wed, 31 Dec 2008, Andi Kleen wrote:
>
> >>on the picture you sent me i noticed the message
> >>"Your memory is not aligned you need to rebuild your
> >>kernel with bigger NODEMAP SIZE shift=20" and then
> >>srat code complains about "No NUMA code hash function found"
> >>which looks a bit scary. Btw, could you post this picture
> >>on some public resource so NUMA people could check it?
> >
> >This case used to be handled cleanly (NUMA disabled), but perhaps
> >that has regressed. But still it sounds like something is going wrong,
> >unless his machine really has a very weird memory map.
>
> it shouldn't, it was one of the high-volume servers 4-5 years ago and only
> has 4G of ram in it

From looking at the screenshot Cyrill sent, you seem to have a funny
SRAT with overlapping areas that is rejected in the end. I suspect the
fallback code doesn't handle this properly.
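
To illustrate what "overlapping areas" means for a node memory table, here
is a minimal Python sketch that reports address ranges claimed by two
different nodes. It is not the kernel's SRAT parsing code, and the ranges
below are invented for the example; the real SRAT entries are only visible
in the screenshot.

```python
def find_overlaps(ranges):
    """Return pairs of ranges from different nodes that overlap.

    ranges is a list of (node, start, end) tuples, end being exclusive.
    """
    ordered = sorted(ranges, key=lambda r: r[1])  # sort by start address
    overlaps = []
    for i, first in enumerate(ordered):
        _node_a, _start_a, end_a = first
        for second in ordered[i + 1:]:
            _node_b, start_b, _end_b = second
            if start_b >= end_a:
                break  # later ranges start even higher, so none can overlap
            if first[0] != second[0]:
                overlaps.append((first, second))
    return overlaps


# Hypothetical layout: node 1 starts before node 0 ends.
EXAMPLE_RANGES = [
    (0, 0x00000000, 0x80000000),
    (1, 0x70000000, 0x180000000),
]

for a, b in find_overlaps(EXAMPLE_RANGES):
    print(f"node {a[0]} [{a[1]:#x}-{a[2]:#x}) overlaps "
          f"node {b[0]} [{b[1]:#x}-{b[2]:#x})")
```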

Does it work when you boot with numa=noacpi ?

-Andi

2009-01-01 05:15:17

by David Lang

Subject: Re: early exception error

On Thu, 1 Jan 2009, Andi Kleen wrote:

> On Wed, Dec 31, 2008 at 12:59:08PM -0800, [email protected] wrote:
>> On Wed, 31 Dec 2008, Andi Kleen wrote:
>>
>>>> on the picture you sent me i noticed the message
>>>> "Your memory is not aligned you need to rebuild your
>>>> kernel with bigger NODEMAP SIZE shift=20" and then
>>>> srat code complains about "No NUMA code hash function found"
>>>> which looks a bit scary. Btw, could you post this picture
>>>> on some public resource so NUMA people could check it?
>>>
>>> This case used to be handled cleanly (NUMA disabled), but perhaps
>>> that has regressed. But still it sounds like something is going wrong,
>>> unless his machine really has a very weird memory map.
>>
>> it shouldn't, it was one of the high-volume servers 4-5 years ago and only
>> has 4G of ram in it
>
> From looking at the screenshot Cyrill sent you seem to have a funny
> SRAT with overlapping areas that is rejected in the end. I suspect the
> fallback code doesn't handle this properly.
>
> Does it work when you boot with numa=noacpi ?

it gets past the point where the bootmemory_debug messages flow by, but I
get another oops (snapshot of the screen is at
http://linux.lang.hm/linux/IMG00031.jpg )

David Lang

2009-01-01 13:49:39

by Andi Kleen

Subject: Re: early exception error

On Wed, Dec 31, 2008 at 10:17:06PM -0800, [email protected] wrote:
> On Thu, 1 Jan 2009, Andi Kleen wrote:
>
> >On Wed, Dec 31, 2008 at 12:59:08PM -0800, [email protected] wrote:
> >>On Wed, 31 Dec 2008, Andi Kleen wrote:
> >>
> >>>>on the picture you sent me i noticed the message
> >>>>"Your memory is not aligned you need to rebuild your
> >>>>kernel with bigger NODEMAP SIZE shift=20" and then
> >>>>srat code complains about "No NUMA code hash function found"
> >>>>which looks a bit scary. Btw, could you post this picture
> >>>>on some public resource so NUMA people could check it?
> >>>
> >>>This case used to be handled cleanly (NUMA disabled), but perhaps
> >>>that has regressed. But still it sounds like something is going wrong,
> >>>unless his machine really has a very weird memory map.
> >>
> >>it shouldn't, it was one of the high-volume servers 4-5 years ago and only
> >>has 4G of ram in it
> >
> >From looking at the screenshot Cyrill sent you seem to have a funny
> >SRAT with overlapping areas that is rejected in the end. I suspect the
> >fallback code doesn't handle this properly.
> >
> >Does it work when you boot with numa=noacpi ?
>
> it gets past the point where the bootmemory_debug messages flow by, but I
> get another oops (snapshot of the screen is at
> http://linux.lang.hm/linux/IMG00031.jpg )

Node setup seems to be still broken. You'll likely need a full
serial log with earlyprintk=serial (and no numa=...)

-Andi

--
[email protected]

2009-01-02 17:19:53

by David Lang

Subject: Re: early exception error

On Wed, 31 Dec 2008, [email protected] wrote:

> On Thu, 1 Jan 2009, Andi Kleen wrote:
>
>> On Wed, Dec 31, 2008 at 12:59:08PM -0800, [email protected] wrote:
>>> On Wed, 31 Dec 2008, Andi Kleen wrote:
>>>
>>>>> on the picture you sent me i noticed the message
>>>>> "Your memory is not aligned you need to rebuild your
>>>>> kernel with bigger NODEMAP SIZE shift=20" and then
>>>>> srat code complains about "No NUMA code hash function found"
>>>>> which looks a bit scary. Btw, could you post this picture
>>>>> on some public resource so NUMA people could check it?
>>>>
>>>> This case used to be handled cleanly (NUMA disabled), but perhaps
>>>> that has regressed. But still it sounds like something is going wrong,
>>>> unless his machine really has a very weird memory map.
>>>
>>> it shouldn't, it was one of the high-volume servers 4-5 years ago and only
>>> has 4G of ram in it
>>
>> From looking at the screenshot Cyrill sent you seem to have a funny
>> SRAT with overlapping areas that is rejected in the end. I suspect the
>> fallback code doesn't handle this properly.
>>
>> Does it work when you boot with numa=noacpi ?
>
> it gets past the point where the bootmemory_debug messages flow by, but I get
> another oops (snapshot of the screen is at
> http://linux.lang.hm/linux/IMG00031.jpg )

oops, I misread your mail, IMG00031.jpg was with numa=off

I just posted IMG00033.jpg which is with numa=noacpi and earlyprintk=vga
but not bootmem_debug

David Lang

2009-01-02 17:42:04

by Cyrill Gorcunov

Subject: Re: early exception error

[[email protected] - Fri, Jan 02, 2009 at 10:21:52AM -0800]
> On Wed, 31 Dec 2008, [email protected] wrote:
>
>> On Thu, 1 Jan 2009, Andi Kleen wrote:
>>
>>> On Wed, Dec 31, 2008 at 12:59:08PM -0800, [email protected] wrote:
>>>> On Wed, 31 Dec 2008, Andi Kleen wrote:
>>>>
>>>>>> on the picture you sent me i noticed the message
>>>>>> "Your memory is not aligned you need to rebuild your
>>>>>> kernel with bigger NODEMAP SIZE shift=20" and then
>>>>>> srat code complains about "No NUMA code hash function found"
>>>>>> which looks a bit scary. Btw, could you post this picture
>>>>>> on some public resource so NUMA people could check it?
>>>>>
>>>>> This case used to be handled cleanly (NUMA disabled), but perhaps
>>>>> that has regressed. But still it sounds like something is going wrong,
>>>>> unless his machine really has a very weird memory map.
>>>>
>>>> it shouldn't, it was one of the high-volume servers 4-5 years ago and only
>>>> has 4G of ram in it
>>>
>>> From looking at the screenshot Cyrill sent you seem to have a funny
>>> SRAT with overlapping areas that is rejected in the end. I suspect the
>>> fallback code doesn't handle this properly.
>>>
>>> Does it work when you boot with numa=noacpi ?
>>
>> it gets past the point where the bootmemory_debug messages flow by, but
>> I get another oops (snapshot of the screen is at
>> http://linux.lang.hm/linux/IMG00031.jpg )
>
> oops, I misread your mail, IMG00031.jpg was with numa=off
>
> I just posted IMG00033.jpg which is with numa=noacpi and earlyprintk=vga
> but not bootmem_debug
>
> David Lang
>

Thanks, David! Trying to understand what is going on :)

Here is a new picture if someone would like to jump into
the bug handling

http://linux.lang.hm/linux/IMG00033.jpg

- Cyrill -

2009-01-02 20:57:46

by Robert Hancock

Subject: Re: early exception error

Cyrill Gorcunov wrote:
> [[email protected] - Fri, Jan 02, 2009 at 10:21:52AM -0800]
>> On Wed, 31 Dec 2008, [email protected] wrote:
>>
>>> On Thu, 1 Jan 2009, Andi Kleen wrote:
>>>
>>>> On Wed, Dec 31, 2008 at 12:59:08PM -0800, [email protected] wrote:
>>>>> On Wed, 31 Dec 2008, Andi Kleen wrote:
>>>>>
>>>>>>> on the picture you sent me i noticed the message
>>>>>>> "Your memory is not aligned you need to rebuild your
>>>>>>> kernel with bigger NODEMAP SIZE shift=20" and then
>>>>>>> srat code complains about "No NUMA code hash function found"
>>>>>>> which looks a bit scary. Btw, could you post this picture
>>>>>>> on some public resource so NUMA people could check it?
>>>>>> This case used to be handled cleanly (NUMA disabled), but perhaps
>>>>>> that has regressed. But still it sounds like something is going wrong,
>>>>>> unless his machine really has a very weird memory map.
>>>>> it shouldn't, it was one of the high-volume servers 4-5 years ago and only
>>>>> has 4G of ram in it
>>>> From looking at the screenshot Cyrill sent you seem to have a funny
>>>> SRAT with overlapping areas that is rejected in the end. I suspect the
>>>> fallback code doesn't handle this properly.
>>>>
>>>> Does it work when you boot with numa=noacpi ?
>>> it gets past the point where the bootmemory_debug messages flow by, but
>>> I get another oops (snapshot of the screen is at
>>> http://linux.lang.hm/linux/IMG00031.jpg )
>> oops, I misread your mail, IMG00031.jpg was with numa=off
>>
>> I just posted IMG00033.jpg which is with numa=noacpi and earlyprintk=vga
>> but not bootmem_debug
>>
>> David Lang
>>
>
> Thanks, David! Trying to understand what is going on :)
>
> Here is a new picture if someone would like to jump into
> the bug handling
>
> http://linux.lang.hm/linux/IMG00033.jpg

alloc_bootmem_core is a reasonably big function; it would be useful if
we could track down what line it's blowing up on. Can you try to find
out what line that fault address (ffffffff8096452a in this crash) is on
as described in Documentation/BUG-HUNTING, i.e. build with
CONFIG_DEBUG_INFO enabled, run gdb on vmlinux and do:

l *0xffffffff8096452a

2009-01-03 19:13:39

by Cyrill Gorcunov

Subject: Re: early exception error

(list restored)

[[email protected] - Sat, Jan 03, 2009 at 11:19:00AM -0800]
...
>>>
>>> two new screenshots at http://linux.lang.hm/linux
>>>
>>> 36 is a boot with just earlyprintk=vga
>>> 37 is a boot with numa=noacpi
>>> I also put the vmlinux file there, I'll put the System.map and config
>>> there later (I did enable kernel_debug on this build as well)
>>>
>>> David Lang
>>>
>>
>> David, I can't find vmlinux neither .config?
>> Maybe they have hidden attribute?
>
> oops, they are there now.
>
> David Lang
>

OK, according to the failing address we have a BUG_ON
triggered

---
(gdb) l *0xffffffff8096452a
0xffffffff8096452a is in alloc_bootmem_core (mm/bootmem.c:442).
437 unsigned long fallback = 0;
438 unsigned long min, max, start, sidx, midx, step;
439
440 BUG_ON(!size);
441 BUG_ON(align & (align - 1));
442 BUG_ON(limit && goal + size > limit);
443
444 if (!bdata->node_bootmem_map)
445 return NULL;
446
(gdb)
---

so we're attempting to overrun 'limit'.
Hmm...

- Cyrill -

2009-01-03 21:24:42

by Cyrill Gorcunov

Subject: Re: early exception error

[Cyrill Gorcunov - Sat, Jan 03, 2009 at 10:03:16PM +0300]
| (list restored)
|
| [[email protected] - Sat, Jan 03, 2009 at 11:19:00AM -0800]
| ...
| >>>
| >>> two new screenshots at http://linux.lang.hm/linux
| >>>
| >>> 36 is a boot with just earlyprintk=vga
| >>> 37 is a boot with numa=noacpi
| >>> I also put the vmlinux file there, I'll put the System.map and config
| >>> there later (I did enable kernel_debug on this build as well)
| >>>
| >>> David Lang
| >>>
| >>
| >> David, I can't find vmlinux neither .config?
| >> Maybe they have hidden attribute?
| >
| > oops, they are there now.
| >
| > David Lang
| >
|
| ok, according to failing address we've a BUG_ON
| triggered
|
| ---
| (gdb) l *0xffffffff8096452a
| 0xffffffff8096452a is in alloc_bootmem_core (mm/bootmem.c:442).
| 437 unsigned long fallback = 0;
| 438 unsigned long min, max, start, sidx, midx, step;
| 439
| 440 BUG_ON(!size);
| 441 BUG_ON(align & (align - 1));
| 442 BUG_ON(limit && goal + size > limit);
| 443
| 444 if (!bdata->node_bootmem_map)
| 445 return NULL;
| 446
| (gdb)
| ---
|
| so we're in attempt to overrun 'limit'.
| Hmm...
|
| - Cyrill -

It is hardly possible that we trigger a BUG here, since I don't
see "BUG:" in the photo. Investigating.

- Cyrill -

2009-01-04 00:24:20

by Jiri Slaby

Subject: Re: early exception error

On 01/03/2009 10:24 PM, Cyrill Gorcunov wrote:
> [Cyrill Gorcunov - Sat, Jan 03, 2009 at 10:03:16PM +0300]
> | (list restored)
> |
> | [[email protected] - Sat, Jan 03, 2009 at 11:19:00AM -0800]
> | ...
> | >>>
> | >>> two new screenshots at http://linux.lang.hm/linux
> | >>>
> | >>> 36 is a boot with just earlyprintk=vga
> | >>> 37 is a boot with numa=noacpi
> | >>> I also put the vmlinux file there, I'll put the System.map and config
> | >>> there later (I did enable kernel_debug on this build as well)
> | >>>
> | >>> David Lang
> | >>>
> | >>
> | >> David, I can't find vmlinux neither .config?
> | >> Maybe they have hidden attribute?
> |
> | ok, according to failing address we've a BUG_ON
> | triggered
> |
> | ---
> | (gdb) l *0xffffffff8096452a
> | 0xffffffff8096452a is in alloc_bootmem_core (mm/bootmem.c:442).
> | 437 unsigned long fallback = 0;
> | 438 unsigned long min, max, start, sidx, midx, step;
> | 439
> | 440 BUG_ON(!size);
> | 441 BUG_ON(align & (align - 1));
> | 442 BUG_ON(limit && goal + size > limit);
> | 443
> | 444 if (!bdata->node_bootmem_map)
> | 445 return NULL;
> | 446
> | (gdb)
> | ---
> |
> | so we're in attempt to overrun 'limit'.
> | Hmm...
> |
> | - Cyrill -
>
> Hardly possible that we trigger BUG here since I don't
> see BUG: on the photo. Investigating.

Hint: line 442 in 2.6.28 is
if (!bdata->node_bootmem_map)
;)

It's:
0xffffffff8096452a <alloc_bootmem_core+69>: cmpq $0x0,0x10(%rbp)
and hence cr2 is 10.

node_data[nid] is NULL... But both of them are set up. Maybe too high nid (and
pnum in sparse_init)?
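
Jiri's reading of the fault address can be illustrated with a small userspace sketch. The struct below is only a model of how `struct bootmem_data` is assumed to be laid out on x86-64 in 2.6.28 (field order written from memory, fixed-width integers standing in for `unsigned long` and pointers); `fault_address()` is a hypothetical helper, not kernel code. It shows why a NULL `bdata` combined with `cmpq $0x0,0x10(%rbp)` would leave 0x10 in cr2.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace model of struct bootmem_data on x86-64.
 * Assumption: the first three fields are in this order, as in 2.6.28;
 * uint64_t stands in for both unsigned long and pointers so the
 * layout is identical on any host.
 */
struct bootmem_data_model {
	uint64_t node_min_pfn;      /* offset 0x00 */
	uint64_t node_low_pfn;      /* offset 0x08 */
	uint64_t node_bootmem_map;  /* offset 0x10 */
	uint64_t last_end_off;
	uint64_t hint_idx;
};

/*
 * Address the CPU touches when it reads node_bootmem_map through the
 * given bdata pointer: base plus field offset. On a page fault this is
 * the value reported in cr2.
 */
static uint64_t fault_address(uint64_t bdata)
{
	return bdata + offsetof(struct bootmem_data_model, node_bootmem_map);
}
```

With `bdata` equal to 0, `fault_address(0)` yields 0x10, which matches the "cr2 is 10" observation under the layout assumed above.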

2009-01-04 00:45:57

by Andi Kleen

Subject: Re: early exception error

> Hint: line 442 in 2.6.28 is
> if (!bdata->node_bootmem_map)
> ;)
>
> It's:
> 0xffffffff8096452a <alloc_bootmem_core+69>: cmpq $0x0,0x10(%rbp)
> and hence cr2 is 10.
>
> node_data[nid] is NULL... But both of them are set up. Maybe too high nid (and
> pnum in sparse_init)?

I think it's because SRAT parsing failed and the fallback forgot
to clean up some state. Or at least I thought that until numa=noacpi
failed too (if it fails the same way, that theory is not correct).

-Andi
>

--
[email protected]

2009-01-04 10:32:53

by Cyrill Gorcunov

Subject: Re: early exception error

[Andi Kleen - Sun, Jan 04, 2009 at 01:59:04AM +0100]
| > Hint: line 442 in 2.6.28 is
| > if (!bdata->node_bootmem_map)
| > ;)
| >
| > It's:
| > 0xffffffff8096452a <alloc_bootmem_core+69>: cmpq $0x0,0x10(%rbp)
| > and hence cr2 is 10.
| >
| > node_data[nid] is NULL... But both of them are set up. Maybe too high nid (and
| > pnum in sparse_init)?
|
| I think it's because SRAT parsing failed and the fallback forget
| to clean some state. Or at least I thought that until numa=noacpi
| failed too (if it fails the same way that theory is not correct)
|
| -Andi
| >
|
| --
| [email protected]
|

According to the image, David's machine fails the same
way with numa=noacpi (unfortunately).

Actually, I found one bug in memory_present: if the
SLAB code is already active (which happens at a later
stage of booting, so it is not our case now),
sparse_index_init could fail with -ENOMEM and we would
then dereference NULL later on. I'm fixing it now, but
again, it's not the issue we have here.

To Jiri: good catch! :-)

- Cyrill -

2009-01-04 11:12:21

by Cyrill Gorcunov

Subject: Re: early exception error

[Jiri Slaby - Sun, Jan 04, 2009 at 01:24:03AM +0100]
| On 01/03/2009 10:24 PM, Cyrill Gorcunov wrote:
| > [Cyrill Gorcunov - Sat, Jan 03, 2009 at 10:03:16PM +0300]
| > | (list restored)
| > |
| > | [[email protected] - Sat, Jan 03, 2009 at 11:19:00AM -0800]
| > | ...
| > | >>>
| > | >>> two new screenshots at http://linux.lang.hm/linux
| > | >>>
| > | >>> 36 is a boot with just earlyprintk=vga
| > | >>> 37 is a boot with numa=noacpi
| > | >>> I also put the vmlinux file there, I'll put the System.map and config
| > | >>> there later (I did enable kernel_debug on this build as well)
| > | >>>
| > | >>> David Lang
| > | >>>
| > | >>
| > | >> David, I can't find vmlinux neither .config?
| > | >> Maybe they have hidden attribute?
| > |
| > | ok, according to failing address we've a BUG_ON
| > | triggered
| > |
| > | ---
| > | (gdb) l *0xffffffff8096452a
| > | 0xffffffff8096452a is in alloc_bootmem_core (mm/bootmem.c:442).
| > | 437 unsigned long fallback = 0;
| > | 438 unsigned long min, max, start, sidx, midx, step;
| > | 439
| > | 440 BUG_ON(!size);
| > | 441 BUG_ON(align & (align - 1));
| > | 442 BUG_ON(limit && goal + size > limit);
| > | 443
| > | 444 if (!bdata->node_bootmem_map)
| > | 445 return NULL;
| > | 446
| > | (gdb)
| > | ---
| > |
| > | so we're in attempt to overrun 'limit'.
| > | Hmm...
| > |
| > | - Cyrill -
| >
| > Hardly possible that we trigger BUG here since I don't
| > see BUG: on the photo. Investigating.
|
| Hint: line 442 in 2.6.28 is
| if (!bdata->node_bootmem_map)
| ;)
|
| It's:
| 0xffffffff8096452a <alloc_bootmem_core+69>: cmpq $0x0,0x10(%rbp)
| and hence cr2 is 10.
|
| node_data[nid] is NULL... But both of them are set up. Maybe too high nid (and
| pnum in sparse_init)?
|

It seems to be true!

What is worse, we have a number of __nr_to_section users which
don't check for a NULL return, and secondly, in

static inline struct mem_section *__nr_to_section(unsigned long nr)
{
	if (!mem_section[SECTION_NR_TO_ROOT(nr)])
		return NULL;
	return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK];
}

SECTION_NR_TO_ROOT is not a modulo operation, so we could run past the
end of mem_section[NR_SECTION_ROOTS].
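
The out-of-range lookup described above can be modelled in userspace. This is a sketch under stated assumptions: the table sizes are made-up small values (the kernel derives the real ones from the configuration), `struct mem_section_model` and `nr_to_section_checked()` are hypothetical names, and the sketch returns NULL for an out-of-range section number where a kernel debugging patch might prefer a BUG_ON.

```c
#include <stddef.h>

/* Hypothetical small sizes, chosen only to keep the example readable. */
#define SECTIONS_PER_ROOT      4UL
#define NR_SECTION_ROOTS       8UL
#define SECTION_ROOT_MASK      (SECTIONS_PER_ROOT - 1)
#define SECTION_NR_TO_ROOT(nr) ((nr) / SECTIONS_PER_ROOT)

struct mem_section_model {
	unsigned long section_mem_map;
};

/* Two-level table: roots are filled in lazily, so entries may be NULL. */
static struct mem_section_model *mem_section[NR_SECTION_ROOTS];

/*
 * Look up a section, refusing section numbers whose root index lies
 * outside the table. The root index comes from a division, so a large
 * nr produces a large index rather than wrapping around.
 */
static struct mem_section_model *nr_to_section_checked(unsigned long nr)
{
	unsigned long root = SECTION_NR_TO_ROOT(nr);

	if (root >= NR_SECTION_ROOTS)
		return NULL;            /* would read past mem_section[] */
	if (!mem_section[root])
		return NULL;            /* root not populated */
	return &mem_section[root][nr & SECTION_ROOT_MASK];
}
```

With these sizes, any `nr` of 32 or more maps to a root index of 8 or higher and is rejected before the array is touched.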

David, I'll cook up a testing patch shortly.
Many thanks to Jiri!

- Cyrill -

2009-01-04 11:29:32

by Cyrill Gorcunov

Subject: Re: early exception error

[Jiri Slaby - Sun, Jan 04, 2009 at 01:24:03AM +0100]
| On 01/03/2009 10:24 PM, Cyrill Gorcunov wrote:
| > [Cyrill Gorcunov - Sat, Jan 03, 2009 at 10:03:16PM +0300]
| > | (list restored)
| > |
| > | [[email protected] - Sat, Jan 03, 2009 at 11:19:00AM -0800]
| > | ...
| > | >>>
| > | >>> two new screenshots at http://linux.lang.hm/linux
| > | >>>
| > | >>> 36 is a boot with just earlyprintk=vga
| > | >>> 37 is a boot with numa=noacpi
| > | >>> I also put the vmlinux file there, I'll put the System.map and config
| > | >>> there later (I did enable kernel_debug on this build as well)
| > | >>>
| > | >>> David Lang
| > | >>>
| > | >>
| > | >> David, I can't find vmlinux neither .config?
| > | >> Maybe they have hidden attribute?
| > |
| > | ok, according to failing address we've a BUG_ON
| > | triggered
| > |
| > | ---
| > | (gdb) l *0xffffffff8096452a
| > | 0xffffffff8096452a is in alloc_bootmem_core (mm/bootmem.c:442).
| > | 437 unsigned long fallback = 0;
| > | 438 unsigned long min, max, start, sidx, midx, step;
| > | 439
| > | 440 BUG_ON(!size);
| > | 441 BUG_ON(align & (align - 1));
| > | 442 BUG_ON(limit && goal + size > limit);
| > | 443
| > | 444 if (!bdata->node_bootmem_map)
| > | 445 return NULL;
| > | 446
| > | (gdb)
| > | ---
| > |
| > | so we're in attempt to overrun 'limit'.
| > | Hmm...
| > |
| > | - Cyrill -
| >
| > Hardly possible that we trigger BUG here since I don't
| > see BUG: on the photo. Investigating.
|
| Hint: line 442 in 2.6.28 is
| if (!bdata->node_bootmem_map)
| ;)
|
| It's:
| 0xffffffff8096452a <alloc_bootmem_core+69>: cmpq $0x0,0x10(%rbp)
| and hence cr2 is 10.
|
| node_data[nid] is NULL... But both of them are set up. Maybe too high nid (and
| pnum in sparse_init)?
|

David, could you give it a try?

- Cyrill -
---
include/linux/mmzone.h | 1 +
1 file changed, 1 insertion(+)

Index: linux-2.6.git/include/linux/mmzone.h
===================================================================
--- linux-2.6.git.orig/include/linux/mmzone.h
+++ linux-2.6.git/include/linux/mmzone.h
@@ -980,6 +980,7 @@ extern struct mem_section mem_section[NR

 static inline struct mem_section *__nr_to_section(unsigned long nr)
 {
+	BUG_ON(SECTION_NR_TO_ROOT(nr) >= NR_SECTION_ROOTS);
 	if (!mem_section[SECTION_NR_TO_ROOT(nr)])
 		return NULL;
 	return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK];

2009-01-04 12:23:19

by Jiri Slaby

Subject: Re: early exception error

Cyrill Gorcunov wrote:
> static inline struct mem_section *__nr_to_section(unsigned long nr)
> {
> + BUG_ON(SECTION_NR_TO_ROOT(nr) >= NR_SECTION_ROOTS);

Side note: David, you should get early exception 6 (invalid opcode,
raised by the ud2 in BUG_ON) if this check fires.

2009-01-05 09:26:46

by Johannes Weiner

Subject: Re: early exception error

On Fri, Jan 02, 2009 at 02:57:17PM -0600, Robert Hancock wrote:
> Cyrill Gorcunov wrote:
> >
> >Here is a new picture if someone would like to jump into
> >the bug handling
> >
> > http://linux.lang.hm/linux/IMG00033.jpg
>
> alloc_bootmem_core is a reasonably big function, it would be useful if
> we could track down what line it's blowing up on.. Can you try to find
> out what line that fault address (ffffffff8096452a in this crash) is on
> as described in Documentation/BUG-HUNTING, i.e. build with
> CONFIG_DEBUG_INFO enabled, run gdb on vmlinux and do:
>
> l *0xffffffff8096452a

He has booted with bootmem debugging output. Given that the bdebug()
describing the request wasn't hit yet, it must be one of the BUG_ON()s
(or bdata is NULL).

If you can find out the line with gdb, this would be great.

Besides that it might be useful to move the bdebug() before the
BUG_ON()s. With the line info available, the expressions that trigger
a bug are pretty unambiguous, but since we would print the parameters
anyway we can as well do so before a possible panic to quickly deduce
what went wrong without decoding.

Hannes

---
Subject: bootmem: print request details before BUG_ON(them)

Moving the request details print-out before the sanity checks that
might panic() enables us to analyse invalid requests without having
access to the line information of the stack dump.

Signed-off-by: Johannes Weiner <[email protected]>
---

diff --git a/mm/bootmem.c b/mm/bootmem.c
index ac5a891..51a0ccf 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -435,6 +435,10 @@ static void * __init alloc_bootmem_core(struct bootmem_data *bdata,
 	unsigned long fallback = 0;
 	unsigned long min, max, start, sidx, midx, step;

+	bdebug("nid=%td size=%lx [%lu pages] align=%lx goal=%lx limit=%lx\n",
+		bdata - bootmem_node_data, size, PAGE_ALIGN(size) >> PAGE_SHIFT,
+		align, goal, limit);
+
 	BUG_ON(!size);
 	BUG_ON(align & (align - 1));
 	BUG_ON(limit && goal + size > limit);
@@ -442,10 +446,6 @@ static void * __init alloc_bootmem_core(struct bootmem_data *bdata,
 	if (!bdata->node_bootmem_map)
 		return NULL;

-	bdebug("nid=%td size=%lx [%lu pages] align=%lx goal=%lx limit=%lx\n",
-		bdata - bootmem_node_data, size, PAGE_ALIGN(size) >> PAGE_SHIFT,
-		align, goal, limit);
-
 	min = bdata->node_min_pfn;
 	max = bdata->node_low_pfn;

2009-01-05 12:54:56

by Andi Kleen

Subject: Re: early exception error

On Mon, Jan 05, 2009 at 10:26:19AM +0100, Johannes Weiner wrote:
> On Fri, Jan 02, 2009 at 02:57:17PM -0600, Robert Hancock wrote:
> > Cyrill Gorcunov wrote:
> > >
> > >Here is a new picture if someone would like to jump into
> > >the bug handling
> > >
> > > http://linux.lang.hm/linux/IMG00033.jpg
> >
> > alloc_bootmem_core is a reasonably big function, it would be useful if
> > we could track down what line it's blowing up on.. Can you try to find
> > out what line that fault address (ffffffff8096452a in this crash) is on
> > as described in Documentation/BUG-HUNTING, i.e. build with
> > CONFIG_DEBUG_INFO enabled, run gdb on vmlinux and do:
> >
> > l *0xffffffff8096452a
>
> He has booted with bootmem debugging output. Given that the bdebug()
> describing the request wasn't hit yet, it must be one of the BUG_ON()s
> (or bdata is NULL).

BUG_ONs with early exceptions are always a big annoyance.

I did an EARLY_BUG_ON() infrastructure some time ago, but ended up
not submitting it because the BUG_ONs I wanted it for originally
disappeared before submission.

It would probably be a good idea to convert the bootmem bugs over
to that.

-Andi

---


Add EARLY_BUG_ON infrastructure

EARLY_BUG_ON is larger than BUG_ON, but it works before traps_init
and always outputs the line number without having to decode
addresses from the early exception handler.

It always panics.

Shouldn't be used when multiple CPUs are active because it makes
no attempt to stop the others.

Signed-off-by: Andi Kleen <[email protected]>

---
arch/x86/include/asm/bug.h | 11 +++++++++++
arch/x86/kernel/early_printk.c | 7 +++++++
include/linux/bug.h | 5 +++++
3 files changed, 23 insertions(+)

Index: linux-2.6.28-test/arch/x86/include/asm/bug.h
===================================================================
--- linux-2.6.28-test.orig/arch/x86/include/asm/bug.h 2008-10-24 13:34:40.000000000 +0200
+++ linux-2.6.28-test/arch/x86/include/asm/bug.h 2009-01-05 13:47:02.000000000 +0100
@@ -33,6 +33,17 @@
} while (0)
#endif

+extern void early_bug(char *file, int line) __attribute__((noreturn));
+
+/* All BUG_ONs before console_init should be EARLY_BUG_ONs. */
+#define EARLY_BUG() early_bug(__FILE__, __LINE__)
+#define EARLY_BUG_ON(x) do { if (unlikely(!(x))) EARLY_BUG(); } while (0)
+
+#else
+
+#define EARLY_BUG() do {} while(0)
+#define EARLY_BUG_ON(x) do {} while(0)
+
#endif /* !CONFIG_BUG */

#include <asm-generic/bug.h>
Index: linux-2.6.28-test/arch/x86/kernel/early_printk.c
===================================================================
--- linux-2.6.28-test.orig/arch/x86/kernel/early_printk.c 2008-10-24 13:34:40.000000000 +0200
+++ linux-2.6.28-test/arch/x86/kernel/early_printk.c 2009-01-05 13:48:46.000000000 +0100
@@ -934,6 +934,13 @@
 	va_end(ap);
 }

+void early_bug(char *file, int line)
+{
+	early_printk("PANIC: Early BUG at %s:%d\n", file, line);
+	printk("PANIC: Early BUG at %s:%d\n", file, line);
+	for (;;)
+		cpu_relax();
+}

static int __init setup_early_printk(char *buf)
{
Index: linux-2.6.28-test/include/linux/bug.h
===================================================================
--- linux-2.6.28-test.orig/include/linux/bug.h 2008-07-05 14:11:02.000000000 +0200
+++ linux-2.6.28-test/include/linux/bug.h 2009-01-05 13:49:32.000000000 +0100
@@ -47,4 +47,9 @@
static inline void module_bug_cleanup(struct module *mod) {}

#endif /* CONFIG_GENERIC_BUG */
+
+#ifndef EARLY_BUG_ON
+#define EARLY_BUG_ON(x) BUG_ON(x)
+#endif
+
#endif /* _LINUX_BUG_H */


2009-01-05 14:53:25

by Jiri Slaby

Subject: Re: early exception error

On 01/05/2009 10:26 AM, Johannes Weiner wrote:
> (or bdata is NULL).

Confirmed, see http://lkml.org/lkml/2009/1/3/204

2009-01-05 14:53:57

by Jiri Slaby

Subject: Re: early exception error

On 01/05/2009 02:08 PM, Andi Kleen wrote:
> +#define EARLY_BUG_ON(x) do { if (unlikely(!(x))) EARLY_BUG(); } while (0)

I think unintentionally inverted logic.

> +#define EARLY_BUG_ON(x) do {} while(0)

Shouldn't x be referenced here?
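
The polarity problem noted in the first comment can be shown with a userspace sketch. Everything here is a stand-in: `early_bug_model()` counts and prints reports instead of halting, and the macro names `EARLY_BUG_ON_POSTED` and `EARLY_BUG_ON_FIXED` are hypothetical labels for the posted form and for a form that matches BUG_ON semantics. `__builtin_expect` is the GCC/Clang builtin.

```c
#include <stdio.h>

/* Userspace stand-in for the kernel's unlikely() annotation. */
#define unlikely(x) __builtin_expect(!!(x), 0)

static int early_bug_hits;   /* number of reports, instead of a panic */

static void early_bug_model(const char *file, int line)
{
	fprintf(stderr, "PANIC: Early BUG at %s:%d\n", file, line);
	early_bug_hits++;
}

/* Form as posted: reports when the condition is FALSE (inverted). */
#define EARLY_BUG_ON_POSTED(x) \
	do { if (unlikely(!(x))) early_bug_model(__FILE__, __LINE__); } while (0)

/* BUG_ON semantics: reports when the condition is TRUE. */
#define EARLY_BUG_ON_FIXED(x) \
	do { if (unlikely(x)) early_bug_model(__FILE__, __LINE__); } while (0)
```

With the posted form, a healthy call such as `EARLY_BUG_ON_POSTED(0)` would report a bug, while a genuinely bad condition would pass silently.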

2009-01-05 15:02:16

by Cyrill Gorcunov

Subject: Re: early exception error

[Jiri Slaby - Mon, Jan 05, 2009 at 03:51:26PM +0100]
| On 01/05/2009 02:08 PM, Andi Kleen wrote:
| > +#define EARLY_BUG_ON(x) do { if (unlikely(!(x))) EARLY_BUG(); } while (0)
|
| I think unintentionally inverted logic.

just the second ! is missing :)

|
| > +#define EARLY_BUG_ON(x) do {} while(0)
|
| Shouldn't be x referenced here?
|

but for what?

- Cyrill -

2009-01-05 15:14:26

by Jiri Slaby

Subject: Re: early exception error

On 01/05/2009 04:01 PM, Cyrill Gorcunov wrote:
> [Jiri Slaby - Mon, Jan 05, 2009 at 03:51:26PM +0100]
> | On 01/05/2009 02:08 PM, Andi Kleen wrote:
> | > +#define EARLY_BUG_ON(x) do { if (unlikely(!(x))) EARLY_BUG(); } while (0)
> |
> | I think unintentionally inverted logic.
>
> just second ! is missed :)

None is needed; the unlikely() macro itself adds two '!' when it passes
the parameter to the builtin.

> |
> | > +#define EARLY_BUG_ON(x) do {} while(0)
> |
> | Shouldn't be x referenced here?
> |
>
> but for what?

I know, core devs are sane, but e.g. for reasons such as

{'a' is used here already}
EARLY_BUG_ON(!(a = readl(...)))
{use 'a' again}

to stay on the safe side.

2009-01-05 15:30:38

by Cyrill Gorcunov

Subject: Re: early exception error

[Jiri Slaby - Mon, Jan 05, 2009 at 04:14:12PM +0100]
| On 01/05/2009 04:01 PM, Cyrill Gorcunov wrote:
| > [Jiri Slaby - Mon, Jan 05, 2009 at 03:51:26PM +0100]
| > | On 01/05/2009 02:08 PM, Andi Kleen wrote:
| > | > +#define EARLY_BUG_ON(x) do { if (unlikely(!(x))) EARLY_BUG(); } while (0)
| > |
| > | I think unintentionally inverted logic.
| >
| > just second ! is missed :)
|
| None is needed, two '!' are added in the macro itself while it passes the
| parameter to the builtin.

ah, yep :)

|
| > |
| > | > +#define EARLY_BUG_ON(x) do {} while(0)
| > |
| > | Shouldn't be x referenced here?
| > |
| >
| > but for what?
|
| I know, core devs are sane, but e.g. for reasons such as
|
| {'a' is used here already}
| EARLY_BUG_ON(!(a = readl(...)))
| {use 'a' again}
|
| to stay on the safe side.
|

I wouldn't encourage this style, Jiri. It becomes more complicated
than it should be, agreed?

- Cyrill -

2009-01-05 15:39:22

by Jiri Slaby

Subject: Re: early exception error

On 01/05/2009 04:30 PM, Cyrill Gorcunov wrote:
> [Jiri Slaby - Mon, Jan 05, 2009 at 04:14:12PM +0100]
> | On 01/05/2009 04:01 PM, Cyrill Gorcunov wrote:
> | > [Jiri Slaby - Mon, Jan 05, 2009 at 03:51:26PM +0100]
> | > | On 01/05/2009 02:08 PM, Andi Kleen wrote:
> | > | > +#define EARLY_BUG_ON(x) do {} while(0)
> | > |
> | > | Shouldn't be x referenced here?
> | > |
> | >
> | > but for what?
> |
> | I know, core devs are sane, but e.g. for reasons such as
> |
> | {'a' is used here already}
> | EARLY_BUG_ON(!(a = readl(...)))
> | {use 'a' again}
> |
> | to stay on the safe side.
>
> I wouldn't populate this style Jiri. It become more complicated
> as it should to be, agreed?

No, I tend to disagree. Macros should evaluate argument(s) the same no matter
what is in .config.
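
A userspace sketch of the point being made: if the disabled variant of the macro drops its argument, code that relies on a side effect inside the argument behaves differently depending on the configuration. The names `read_reg_model()`, `EARLY_BUG_ON_DROP` and `EARLY_BUG_ON_EVAL` are hypothetical and exist only for this illustration.

```c
/* Hypothetical stand-in for a register read that has a side effect. */
static int reads_done;

static unsigned int read_reg_model(void)
{
	reads_done++;
	return 0x1234;
}

/* Disabled variant as posted: the argument is never evaluated. */
#define EARLY_BUG_ON_DROP(x) do { } while (0)

/* Disabled variant that still evaluates the argument exactly once. */
#define EARLY_BUG_ON_EVAL(x) do { (void)(x); } while (0)
```

With `EARLY_BUG_ON_DROP(!(a = read_reg_model()))` the assignment to `a` never happens, so later uses of `a` see a stale value; with `EARLY_BUG_ON_EVAL` the read and the assignment take place regardless of whether the check itself is compiled in.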

2009-01-05 15:42:28

by Cyrill Gorcunov

Subject: Re: early exception error

[Jiri Slaby - Mon, Jan 05, 2009 at 04:39:07PM +0100]
...
| > | I know, core devs are sane, but e.g. for reasons such as
| > |
| > | {'a' is used here already}
| > | EARLY_BUG_ON(!(a = readl(...)))
| > | {use 'a' again}
| > |
| > | to stay on the safe side.
| >
| > I wouldn't populate this style Jiri. It become more complicated
| > as it should to be, agreed?
|
| No, I tend to disagree. Macros should evaluate argument(s) the same no matter
| what is in .config.
|

I see what you mean (now) -- and you're right!

- Cyrill -

2009-01-05 21:18:31

by David Lang

Subject: Re: early exception error

On Sun, 4 Jan 2009, Cyrill Gorcunov wrote:

I assume you want this instead of the prior patches. I will test this
shortly.

David Lang

> ---
> include/linux/mmzone.h | 1 +
> 1 file changed, 1 insertion(+)
>
> Index: linux-2.6.git/include/linux/mmzone.h
> ===================================================================
> --- linux-2.6.git.orig/include/linux/mmzone.h
> +++ linux-2.6.git/include/linux/mmzone.h
> @@ -980,6 +980,7 @@ extern struct mem_section mem_section[NR
>
> static inline struct mem_section *__nr_to_section(unsigned long nr)
> {
> + BUG_ON(SECTION_NR_TO_ROOT(nr) >= NR_SECTION_ROOTS);
> if (!mem_section[SECTION_NR_TO_ROOT(nr)])
> return NULL;
> return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK];
>

2009-01-05 21:25:29

by Cyrill Gorcunov

Subject: Re: early exception error

[[email protected] - Mon, Jan 05, 2009 at 02:20:42PM -0800]
> On Sun, 4 Jan 2009, Cyrill Gorcunov wrote:
>
> I assume you want this instead of the prior patches. I will test this
> shortly.
>
> David Lang
>
>> ---
>> include/linux/mmzone.h | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> Index: linux-2.6.git/include/linux/mmzone.h
>> ===================================================================
>> --- linux-2.6.git.orig/include/linux/mmzone.h
>> +++ linux-2.6.git/include/linux/mmzone.h
>> @@ -980,6 +980,7 @@ extern struct mem_section mem_section[NR
>>
>> static inline struct mem_section *__nr_to_section(unsigned long nr)
>> {
>> + BUG_ON(SECTION_NR_TO_ROOT(nr) >= NR_SECTION_ROOTS);
>> if (!mem_section[SECTION_NR_TO_ROOT(nr)])
>> return NULL;
>> return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK];
>>
>

Yes, you may even combine it with the patch Johannes proposed.
They should not interfere.

- Cyrill -

2009-01-05 21:55:25

by Yinghai Lu

Subject: Re: early exception error

Can you send a boot log from a working kernel older than 2.6.28?

I need to look at the e820 table.

YH

2009-01-05 22:07:52

by Yinghai Lu

Subject: Re: early exception error

Can you make sure X86_64_ACPI_NUMA is set?

# CONFIG_X86_64_ACPI_NUMA is not set

YH

2009-01-05 22:16:31

by David Lang

Subject: Re: early exception error

On Mon, 5 Jan 2009, Yinghai Lu wrote:

> Can you send out boot log with working kernel before 2.6.28?
>
> need to look at the e820 table.


here is the 32 bit kernel I'm running now (from dmesg )

Linux version 2.6.25.14 (root@dlang) (gcc version 3.3.6) #1 SMP PREEMPT
Mon Aug 4 18:22:50 PDT 2008
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000004cff0000 (usable)
BIOS-e820: 000000004cff0000 - 000000004cfff000 (ACPI data)
BIOS-e820: 000000004cfff000 - 000000004d000000 (ACPI NVS)
BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000000180000000 (usable)
Warning only 4GB will be used.
Use a HIGHMEM64G enabled kernel.
3200MB HIGHMEM available.
896MB LOWMEM available.
Scan SMP from c0000000 for 1024 bytes.
Scan SMP from c009fc00 for 1024 bytes.
Scan SMP from c00f0000 for 65536 bytes.
found SMP MP-table at [c00ff780] 000ff780
Entering add_active_range(0, 0, 1048576) 0 entries of 256 used
Zone PFN ranges:
DMA 0 -> 4096
Normal 4096 -> 229376
HighMem 229376 -> 1048576
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
0: 0 -> 1048576
On node 0 totalpages: 1048576
DMA zone: 32 pages used for memmap
DMA zone: 0 pages reserved
DMA zone: 4064 pages, LIFO batch:0
Normal zone: 1760 pages used for memmap
Normal zone: 223520 pages, LIFO batch:31
HighMem zone: 6400 pages used for memmap
HighMem zone: 812800 pages, LIFO batch:31
Movable zone: 0 pages used for memmap
DMI 2.3 present.
ACPI: RSDP 000F68E0, 0024 (r2 ACPIAM)
ACPI: XSDT 4CFF0100, 0054 (r1 A M I OEMXSDT 6000428 MSFT 97)
ACPI: FACP 4CFF0281, 00F4 (r1 A M I OEMFACP 6000428 MSFT 97)
ACPI: DSDT 4CFF0400, 30A7 (r1 0AAAA 0AAAA000 0 INTL 2002026)
ACPI: FACS 4CFFF000, 0040
ACPI: APIC 4CFF0380, 0074 (r1 A M I OEMAPIC 6000428 MSFT 97)
ACPI: OEMB 4CFFF040, 0041 (r1 A M I OEMBIOS 6000428 MSFT 97)
ACPI: SRAT 4CFF34B0, 00F0 (r1 A M I OEMSRAT 6000428 MSFT 97)
ACPI: HPET 4CFF35A0, 0038 (r1 A M I OEMHPET 6000428 MSFT 97)
ACPI: ASF! 4CFF35E0, 0086 (r1 AMIASF AMDSTRET 1 INTL 2002026)
ACPI: PM-Timer IO Port: 0x5008
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:5 APIC version 16
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:5 APIC version 16
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x03] address[0xff6ff000] gsi_base[24])
IOAPIC[1]: apic_id 3, version 17, address 0xff6ff000, GSI 24-27
ACPI: IOAPIC (id[0x04] address[0xff6fe000] gsi_base[28])
IOAPIC[2]: apic_id 4, version 17, address 0xff6fe000, GSI 28-31
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Enabling APIC mode: Flat. Using 3 I/O APICs
ACPI: HPET id: 0x102282a0 base: 0xfec01000
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 50000000 (gap: 4d000000:b2780000)
Built 1 zonelists in Zone order, mobility grouping on. Total pages:
1040384
Kernel command line: BOOT_IMAGE=2.6.25.14 ro root=802
mapped APIC to ffffb000 (fee00000)
mapped IOAPIC to ffffa000 (fec00000)
mapped IOAPIC to ffff9000 (ff6ff000)
mapped IOAPIC to ffff8000 (ff6fe000)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
Preemptible RCU implementation.
CPU 0 irqstacks, hard=c07e3000 soft=c07e1000
PID hash table entries: 4096 (order: 12, 16384 bytes)
Detected 1394.218 MHz processor.
Console: colour VGA+ 80x25
console [tty0] enabled
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 1216196k/4194304k available (4621k kernel code, 44152k reserved,
2074k data, 304k init, 344000k highmem)
virtual kernel memory layout:
fixmap : 0xfff9b000 - 0xfffff000 ( 400 kB)
pkmap : 0xff800000 - 0xffc00000 (4096 kB)
vmalloc : 0xf8800000 - 0xff7fe000 ( 111 MB)
lowmem : 0xc0000000 - 0xf8000000 ( 896 MB)
.init : 0xc0792000 - 0xc07de000 ( 304 kB)
.data : 0xc0583639 - 0xc078a1fc (2074 kB)
.text : 0xc0100000 - 0xc0583639 (4621 kB)
Checking if this processor honours the WP bit even in supervisor
mode...Ok.
CPA: page pool initialized 1 of 1 pages preallocated
hpet clockevent registered
Calibrating delay using timer specific routine.. 2791.98 BogoMIPS
(lpj=4651182)
Mount-cache hash table entries: 512
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Compat vDSO mapped to ffffe000.
Checking 'hlt' instruction... OK.
ACPI: Core revision 20070126
CPU0: AMD Opteron(tm) Processor 240 stepping 0a
Booting processor 1/1 ip 2000
CPU 1 irqstacks, hard=c07e4000 soft=c07e2000
Initializing CPU#1
Calibrating delay using timer specific routine.. 2789.41 BogoMIPS
(lpj=4647056)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: AMD Opteron(tm) Processor 240 stepping 0a
Total of 2 processors activated (5581.39 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=0 pin2=0
Brought up 2 CPUs
CPU0 attaching sched-domain:
domain 0: span 3
groups: 1 2
CPU1 attaching sched-domain:
domain 0: span 3
groups: 2 1
net_namespace: 448 bytes
xor: automatically using best checksumming function: pIII_sse
pIII_sse : 4393.200 MB/sec
xor: using function: pIII_sse (4393.200 MB/sec)
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: PCI BIOS revision 2.10 entry at 0xf0031, last bus=5
PCI: Using configuration type 1
Setting up standard PCI resources
ACPI: EC: Look up EC in DSDT
ACPI: Interpreter enabled
ACPI: (supports S0 S5)
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI1._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.GOLA._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.GOLB._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 *9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 *5 6 7 9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 9 *10 11 12 14 15)
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
ACPI: bus type pnp registered
pnp: PnP ACPI: found 16 devices
ACPI: ACPI bus type pnp unregistered
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a
report
Bluetooth: Core ver 2.11
NET: Registered protocol family 31
Bluetooth: HCI device and connection manager initialized
Bluetooth: HCI socket layer initialized
hpet0: at MMIO 0xfec01000, IRQs 2, 8, 0
hpet0: 3 32-bit timers, 14318180 Hz
ACPI: RTC can wake from S4
Switched to high resolution mode on CPU 0
Switched to high resolution mode on CPU 1
system 00:0b: ioport range 0x680-0x6ff has been reserved
system 00:0b: ioport range 0x295-0x296 has been reserved
system 00:0b: ioport range 0x778-0x77f has been reserved
system 00:0b: ioport range 0xb78-0xb7f has been reserved
system 00:0b: ioport range 0xf78-0xf7f has been reserved
system 00:0c: ioport range 0x4d0-0x4d1 has been reserved
system 00:0c: ioport range 0x5000-0x50bf has been reserved
system 00:0c: ioport range 0x50e0-0x50ff has been reserved
system 00:0c: ioport range 0x50c0-0x50df has been reserved
system 00:0c: ioport range 0xde00-0xde7f has been reserved
system 00:0c: ioport range 0xde80-0xdeff has been reserved
system 00:0e: iomem range 0xfec00000-0xfec00fff has been reserved
system 00:0e: iomem range 0xfee00000-0xfee00fff has been reserved
system 00:0e: iomem range 0xfff80000-0xffffffff could not be reserved
system 00:0e: iomem range 0xff780000-0xff7fffff could not be reserved
system 00:0f: iomem range 0x0-0x9ffff could not be reserved
system 00:0f: iomem range 0xc0000-0xdffff could not be reserved
system 00:0f: iomem range 0xe0000-0xfffff could not be reserved
system 00:0f: iomem range 0x100000-0x4cffffff could not be reserved
system 00:0f: iomem range 0x0-0x0 could not be reserved
PCI: Bridge: 0000:00:06.0
IO window: b000-bfff
MEM window: 0xff500000-0xff5fffff
PREFETCH window: disabled.
PCI: Bridge: 0000:00:0a.0
IO window: disabled.
MEM window: 0xff400000-0xff4fffff
PREFETCH window: 0x00000000fea00000-0x00000000feafffff
PCI: Bridge: 0000:01:01.0
IO window: a000-afff
MEM window: 0xff200000-0xff2fffff
PREFETCH window: 0x00000000ce900000-0x00000000ee9fffff
PCI: Bridge: 0000:01:03.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:0b.0
IO window: a000-afff
MEM window: 0xff200000-0xff3fffff
PREFETCH window: 0x00000000ce900000-0x00000000fe9fffff
ACPI: PCI Interrupt 0000:01:01.0[A] -> GSI 29 (level, low) -> IRQ 29
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
Machine check exception polling timer started.
highmem bounce pool size: 64 pages
Installing knfsd (copyright (C) 1996 [email protected]).
NTFS driver 2.1.29 [Flags: R/W].
fuse init (API version 7.9)
SGI XFS with no debug enabled
async_tx: api initialized (sync-only)
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
pci 0000:00:0b.0: AMD8131 rev 12 detected; disabling PCI-X MMRBC
pci 0000:00:0a.0: AMD8131 rev 12 detected; disabling PCI-X MMRBC
pci 0000:05:08.0: Firmware left e100 interrupts enabled; disabling
pci 0000:03:00.0: Boot video device
vga16fb: initializing
vga16fb: mapped to 0xc00a0000
Console: switching to colour frame buffer device 80x30
fb0: VGA16 VGA frame buffer device
lp: driver loaded but no devices found
Real Time Clock Driver v1.12ac
hpet_resources: 0xfec01000 is busy
Non-volatile memory driver v1.2
AMD768 RNG detected
ppdev: user-space parallel port driver
Linux agpgart interface v0.103
[drm] Initialized drm 1.1.0 20060810
ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 29 (level, low) -> IRQ 29
PCI: Setting latency timer of device 0000:03:00.0 to 64
[drm] Initialized radeon 1.28.0 20060524 on minor 0
ipmi message handler version 39.1
ipmi device interface
IPMI System Interface driver.
ipmi_si: Unable to find any System Interface(s)
Copyright (C) 2004 MontaVista Software - IPMI Powerdown via sys_reboot.
Hangcheck: starting hangcheck timer 0.9.0 (tick is 180 seconds, margin is
60 seconds).
Hangcheck: Using get_cycles().
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
00:07: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:08: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
parport_pc 00:0a: reported by Plug and Play ACPI
parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
lp0: using parport0 (interrupt-driven).
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
loop: module loaded
nbd: registered device at major 43
usbcore: registered new interface driver ub
Intel(R) PRO/1000 Network Driver - version 7.3.20-k2-NAPI
Copyright (c) 1999-2006 Intel Corporation.
e1000e: Intel(R) PRO/1000 Network Driver - 0.2.0
e1000e: Copyright (c) 1999-2007 Intel Corporation.
Ethernet Channel Bonding Driver: v3.2.5 (March 21, 2008)
bonding: Warning: either miimon or arp_interval and arp_ip_target module
parameters must be specified, otherwise
bonding will not detect link failures! see bonding.txt for details.
e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
e100: Copyright(c) 1999-2006 Intel Corporation
ACPI: PCI Interrupt 0000:05:08.0[A] -> GSI 18 (level, low) -> IRQ 18
e100: eth0: e100_probe: addr 0xff5fc000, irq 18, MAC addr
00:e0:81:2a:cc:b2
tun: Universal TUN/TAP device driver, 1.6
tun: (C) 1999-2004 Max Krasnyansky <[email protected]>
Uniform Multi-Platform E-IDE driver
ide: Assuming 33MHz system bus speed for PIO modes; override with
idebus=xx
AMD8111: 0000:00:07.1 (rev 03) UDMA133 controller
AMD8111: IDE controller (0x1022:0x7469 rev 0x03) at PCI slot 0000:00:07.1
AMD8111: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:PIO, hdb:PIO
ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:PIO
Probing IDE interface ide0...
Probing IDE interface ide1...
hdc: VOM-12E48X, ATAPI CD/DVD-ROM drive
hdc: host max PIO5 wanted PIO255(auto-tune) selected PIO4
hdc: UDMA/33 mode selected
ide1 at 0x170-0x177,0x376 on irq 15
hdc: ATAPI 1X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache
Uniform CD-ROM driver Revision: 3.20
Loading iSCSI transport class v2.0-869.
Loading Adaptec I2O RAID: Version 2.4 Build 5go
Detecting Adaptec I2O RAID controllers...
ACPI: PCI Interrupt 0000:01:03.1[A] -> GSI 28 (level, low) -> IRQ 28
Adaptec I2O RAID controller 0 at f8880000 size=100000 irq=28
dpti: If you have a lot of devices this could take a few minutes.
dpti0: Reading the hardware resource table.
TID 008 Vendor: ADAPTEC Device: AIC-7899 Rev: 00000001
TID 009 Vendor: ADAPTEC Device: AIC-7899 Rev: 00000001
TID 523 Vendor: ADAPTEC Device: RAID-1 Rev: 370F
TID 524 Vendor: ADAPTEC Device: RAID-1 Rev: 370F
scsi0 : Vendor: Adaptec Model: 3210S FW:370F
scsi 0:0:1:0: Direct-Access ADAPTEC RAID-1 370F PQ: 0 ANSI:
2
scsi 0:1:0:0: Direct-Access ADAPTEC RAID-1 370F PQ: 0 ANSI:
2
Adaptec aacraid driver 1.1-5[2455]-ms
Driver 'sd' needs updating - please use bus_type methods
sd 0:0:1:0: [sda] 71686144 512-byte hardware sectors (36703 MB)
sd 0:0:1:0: [sda] Write Protect is off
sd 0:0:1:0: [sda] Mode Sense: cf 00 10 08
sd 0:0:1:0: [sda] Write cache: disabled, read cache: enabled, supports DPO
and FUA
sd 0:0:1:0: [sda] 71686144 512-byte hardware sectors (36703 MB)
sd 0:0:1:0: [sda] Write Protect is off
sd 0:0:1:0: [sda] Mode Sense: cf 00 10 08
sd 0:0:1:0: [sda] Write cache: disabled, read cache: enabled, supports DPO
and FUA
sda: sda1 sda2 sda3
sd 0:0:1:0: [sda] Attached SCSI disk
sd 0:1:0:0: [sdb] 71686144 512-byte hardware sectors (36703 MB)
sd 0:1:0:0: [sdb] Write Protect is off
sd 0:1:0:0: [sdb] Mode Sense: ab 00 10 08
sd 0:1:0:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO
and FUA
sd 0:1:0:0: [sdb] 71686144 512-byte hardware sectors (36703 MB)
sd 0:1:0:0: [sdb] Write Protect is off
sd 0:1:0:0: [sdb] Mode Sense: ab 00 10 08
sd 0:1:0:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO
and FUA
sdb: sdb1 sdb2
sd 0:1:0:0: [sdb] Attached SCSI disk
Driver 'sr' needs updating - please use bus_type methods
sd 0:0:1:0: Attached scsi generic sg0 type 0
sd 0:1:0:0: Attached scsi generic sg1 type 0
SCSI Media Changer driver v0.25
Driver 'ch' needs updating - please use bus_type methods
I2O subsystem v1.325
i2o: max drivers = 8
i2o: Checking for PCI I2O controllers...
iop0: controller found (0000:01:03.1)
PCI: Unable to reserve mem region #1:8000000@f0000000 for device
0000:01:03.1
iop0: device already claimed
iop0: DMA / IO allocation for I2O controller failed
I2O Configuration OSM v1.323
I2O Bus Adapter OSM v1.317
I2O Block Device OSM v1.325
I2O SCSI Peripheral OSM v1.316
I2O ProcFS OSM v1.316
Fusion MPT base driver 3.04.06
Copyright (c) 1999-2007 LSI Corporation
Fusion MPT SPI Host driver 3.04.06
Fusion MPT misc device (ioctl) driver 3.04.06
mptctl: Registered with Fusion MPT base driver
mptctl: /dev/mptctl @ (major,minor=10,220)
ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller (OHCI) Driver
ACPI: PCI Interrupt 0000:05:00.0[D] -> GSI 19 (level, low) -> IRQ 19
ohci_hcd 0000:05:00.0: OHCI Host Controller
ohci_hcd 0000:05:00.0: new USB bus registered, assigned bus number 1
ohci_hcd 0000:05:00.0: irq 19, io mem 0xff5fd000
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 3 ports detected
usb usb1: New USB device found, idVendor=1d6b, idProduct=0001
usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb1: Product: OHCI Host Controller
usb usb1: Manufacturer: Linux 2.6.25.14 ohci_hcd
usb usb1: SerialNumber: 0000:05:00.0
ACPI: PCI Interrupt 0000:05:00.1[D] -> GSI 19 (level, low) -> IRQ 19
ohci_hcd 0000:05:00.1: OHCI Host Controller
ohci_hcd 0000:05:00.1: new USB bus registered, assigned bus number 2
ohci_hcd 0000:05:00.1: irq 19, io mem 0xff5fe000
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 3 ports detected
usb usb2: New USB device found, idVendor=1d6b, idProduct=0001
usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb2: Product: OHCI Host Controller
usb usb2: Manufacturer: Linux 2.6.25.14 ohci_hcd
usb usb2: SerialNumber: 0000:05:00.1
USB Universal Host Controller Interface driver v3.0
Initializing USB Mass Storage driver...
usbcore: registered new interface driver usb-storage
USB Mass Storage support registered.
usbcore: registered new interface driver libusual
usbcore: registered new interface driver berry_charge
PNP: PS/2 Controller [PNP0303:PS2K,PNP0f03:PS2M] at 0x60,0x64 irq 1,12
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
input: AT Translated Set 2 keyboard as /class/input/input0
input: PC Speaker as /class/input/input1
input: ImPS/2 Logitech Wheel Mouse as /class/input/input2
rtc_cmos: probe of 00:02 failed with error -16
i2c /dev entries driver
w83627hf: Found W83627HF chip at 0x290
i2c-adapter i2c-0: detect fail: address match, 0x2e
hdaps: supported laptop not found!
hdaps: driver init failed (ret=-19)!
pc87360: PC8736x not detected, module not inserted.
md: linear personality registered for level -1
md: raid0 personality registered for level 0
md: raid1 personality registered for level 1
md: raid10 personality registered for level 10
raid6: int32x1 582 MB/s
raid6: int32x2 834 MB/s
raid6: int32x4 498 MB/s
raid6: int32x8 312 MB/s
raid6: mmxx1 1176 MB/s
raid6: mmxx2 2177 MB/s
raid6: sse1x1 739 MB/s
raid6: sse1x2 1393 MB/s
raid6: sse2x1 1469 MB/s
raid6: sse2x2 2169 MB/s
raid6: using algorithm sse2x2 (2169 MB/s)
md: raid6 personality registered for level 6
md: raid5 personality registered for level 5
md: raid4 personality registered for level 4
device-mapper: ioctl: 4.13.0-ioctl (2007-10-18) initialised:
[email protected]
Bluetooth: HCI USB driver ver 2.9
usbcore: registered new interface driver hci_usb
Bluetooth: Virtual HCI driver ver 1.2
Bluetooth: HCI UART driver ver 2.2
Bluetooth: HCI H4 protocol initialized
Bluetooth: HCI BCSP protocol initialized
Bluetooth: HCILL protocol initialized
Bluetooth: Broadcom Blutonium firmware driver ver 1.1
usbcore: registered new interface driver bcm203x
Bluetooth: Digianswer Bluetooth USB driver ver 0.9
usbcore: registered new interface driver bpa10x
Bluetooth: BlueFRITZ! USB driver ver 1.1
usbcore: registered new interface driver bfusb
EDAC MC: Ver: 2.1.0 Aug 4 2008
cpuidle: using governor ladder
cpuidle: using governor menu
oprofile: using NMI interrupt.
pktgen v2.69: Packet Generator for packet performance testing.
nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
IPv4 over IPv4 tunneling driver
GRE over IPv4 tunneling driver
ip_tables: (C) 2000-2006 Netfilter Core Team
TCP cubic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
Bridge firewalling registered
Bluetooth: L2CAP ver 2.9
Bluetooth: L2CAP socket layer initialized
Bluetooth: SCO (Voice Link) ver 0.5
Bluetooth: SCO socket layer initialized
Bluetooth: RFCOMM socket layer initialized
Bluetooth: RFCOMM TTY layer initialized
Bluetooth: RFCOMM ver 1.8
Bluetooth: BNEP (Ethernet Emulation) ver 1.2
Bluetooth: BNEP filters: protocol multicast
Bluetooth: HIDP (Human Interface Emulation) ver 1.2
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
802.1Q VLAN Support v1.8 Ben Greear <[email protected]>
All bugs added by David S. Miller <[email protected]>
Using IPI No-Shortcut mode
drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
md: Autodetecting RAID arrays.
md: Scanned 0 and added 0 devices.
md: autorun ...
md: ... autorun DONE.
VFS: Mounted root (ext2 filesystem) readonly.
Freeing unused kernel memory: 304k freed

2009-01-05 22:17:59

by David Lang

Subject: Re: early exception error

On Tue, 6 Jan 2009, Cyrill Gorcunov wrote:

>
> Yes, you may even combine it with the patch Johannes proposed.
> They should not interfere.

I'm not spotting that patch.

David Lang

2009-01-05 22:18:45

by David Lang

Subject: Re: early exception error

On Mon, 5 Jan 2009, Yinghai Lu wrote:

> can you make sure X86_64_ACPI_NUMA is set?
>
> # CONFIG_X86_64_ACPI_NUMA is not set

will do, recompiling...

David Lang

2009-01-05 22:30:31

by Yinghai Lu

Subject: Re: early exception error

[email protected] wrote:
> On Mon, 5 Jan 2009, Yinghai Lu wrote:
>
>> Can you send out boot log with working kernel before 2.6.28?
>>
>> need to look at the e820 table.
>
>
> here is the 32 bit kernel I'm running now (from dmesg )
>
> Linux version 2.6.25.14 (root@dlang) (gcc version 3.3.6) #1 SMP PREEMPT
> Mon Aug 4 18:22:50 PDT 2008
> BIOS-provided physical RAM map:
> BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
> BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
> BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
> BIOS-e820: 0000000000100000 - 000000004cff0000 (usable)
> BIOS-e820: 000000004cff0000 - 000000004cfff000 (ACPI data)
> BIOS-e820: 000000004cfff000 - 000000004d000000 (ACPI NVS)
> BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
> BIOS-e820: 0000000100000000 - 0000000180000000 (usable)
> Warning only 4GB will be used.
> Use a HIGHMEM64G enabled kernel.
> 3200MB HIGHMEM available.
> 896MB LOWMEM available.
> Scan SMP from c0000000 for 1024 bytes.
> Scan SMP from c009fc00 for 1024 bytes.
> Scan SMP from c00f0000 for 65536 bytes.
> found SMP MP-table at [c00ff780] 000ff780
> Entering add_active_range(0, 0, 1048576) 0 entries of 256 used
> Zone PFN ranges:
> DMA 0 -> 4096
> Normal 4096 -> 229376
> HighMem 229376 -> 1048576
> Movable zone start PFN for each node
> early_node_map[1] active PFN ranges
> 0: 0 -> 1048576
> On node 0 totalpages: 1048576
> DMA zone: 32 pages used for memmap
> DMA zone: 0 pages reserved
> DMA zone: 4064 pages, LIFO batch:0
> Normal zone: 1760 pages used for memmap
> Normal zone: 223520 pages, LIFO batch:31
> HighMem zone: 6400 pages used for memmap
> HighMem zone: 812800 pages, LIFO batch:31
> Movable zone: 0 pages used for memmap


BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000004cff0000 (usable)
BIOS-e820: 000000004cff0000 - 000000004cfff000 (ACPI data)
BIOS-e820: 000000004cfff000 - 000000004d000000 (ACPI NVS)
BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000000180000000 (usable)

and for pci mem route: node0 [0, 2g), node1 is [4g, 6g)

and e820 said only 3g can be used... [0, 1g), and [4g, 6g)

please check if you can change HW memhole setup to 3g instead of 2g.

YH
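
The "only 3g can be used" figure can be checked by adding up the usable ranges of the e820 map above. A small Python sketch; the ranges are copied from the log, the end addresses are treated as exclusive, and the helper name usable_bytes is made up for illustration:

```python
MIB = 1 << 20
GIB = 1 << 30

# (start, end, type) as printed in the BIOS-e820 lines; end is exclusive.
E820 = [
    (0x0000000000000000, 0x000000000009fc00, "usable"),
    (0x000000000009fc00, 0x00000000000a0000, "reserved"),
    (0x00000000000e0000, 0x0000000000100000, "reserved"),
    (0x0000000000100000, 0x000000004cff0000, "usable"),
    (0x000000004cff0000, 0x000000004cfff000, "ACPI data"),
    (0x000000004cfff000, 0x000000004d000000, "ACPI NVS"),
    (0x00000000ff780000, 0x0000000100000000, "reserved"),
    (0x0000000100000000, 0x0000000180000000, "usable"),
]


def usable_bytes(entries, limit=None):
    """Sum the 'usable' ranges, optionally counting only bytes below limit."""
    total = 0
    for start, end, kind in entries:
        if kind != "usable":
            continue
        if limit is not None:
            end = min(end, limit)
        if end > start:
            total += end - start
    return total


if __name__ == "__main__":
    print("usable total: %d MiB" % (usable_bytes(E820) // MIB))
    print("usable below 4 GiB: %d MiB" % (usable_bytes(E820, 4 * GIB) // MIB))
```

With this map the total comes to about 3.2 GiB, of which roughly 1.2 GiB lies below 4 GiB and the remaining 2 GiB sits in the [4g, 6g) range.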

2009-01-05 22:32:33

by Yinghai Lu

Subject: Re: early exception error

Yinghai Lu wrote:
> [email protected] wrote:
>> On Mon, 5 Jan 2009, Yinghai Lu wrote:
>>
>>> Can you send out boot log with working kernel before 2.6.28?
>>>
>>> need to look at the e820 table.
>>
>> here is the 32 bit kernel I'm running now (from dmesg )
>>
>> Linux version 2.6.25.14 (root@dlang) (gcc version 3.3.6) #1 SMP PREEMPT
>> Mon Aug 4 18:22:50 PDT 2008
>> BIOS-provided physical RAM map:
>> BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
>> BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
>> BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
>> BIOS-e820: 0000000000100000 - 000000004cff0000 (usable)
>> BIOS-e820: 000000004cff0000 - 000000004cfff000 (ACPI data)
>> BIOS-e820: 000000004cfff000 - 000000004d000000 (ACPI NVS)
>> BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
>> BIOS-e820: 0000000100000000 - 0000000180000000 (usable)
>> Warning only 4GB will be used.
>> Use a HIGHMEM64G enabled kernel.
>> 3200MB HIGHMEM available.
>> 896MB LOWMEM available.
>> Scan SMP from c0000000 for 1024 bytes.
>> Scan SMP from c009fc00 for 1024 bytes.
>> Scan SMP from c00f0000 for 65536 bytes.
>> found SMP MP-table at [c00ff780] 000ff780
>> Entering add_active_range(0, 0, 1048576) 0 entries of 256 used
>> Zone PFN ranges:
>> DMA 0 -> 4096
>> Normal 4096 -> 229376
>> HighMem 229376 -> 1048576
>> Movable zone start PFN for each node
>> early_node_map[1] active PFN ranges
>> 0: 0 -> 1048576
>> On node 0 totalpages: 1048576
>> DMA zone: 32 pages used for memmap
>> DMA zone: 0 pages reserved
>> DMA zone: 4064 pages, LIFO batch:0
>> Normal zone: 1760 pages used for memmap
>> Normal zone: 223520 pages, LIFO batch:31
>> HighMem zone: 6400 pages used for memmap
>> HighMem zone: 812800 pages, LIFO batch:31
>> Movable zone: 0 pages used for memmap
>
>
> BIOS-provided physical RAM map:
> BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
> BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
> BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
> BIOS-e820: 0000000000100000 - 000000004cff0000 (usable)
> BIOS-e820: 000000004cff0000 - 000000004cfff000 (ACPI data)
> BIOS-e820: 000000004cfff000 - 000000004d000000 (ACPI NVS)
> BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
> BIOS-e820: 0000000100000000 - 0000000180000000 (usable)
>
> and for pci mem route: node0 [0, 2g), node1 is [4g, 6g)
>
> and e820 said only 3g can be used... [0, 1g), and [4g, 6g)
>
> please check if you can change HW memhole setup to 3g instead of 2g.
in BIOS, to get 1g back.

YH

2009-01-05 23:49:09

by David Lang

Subject: Re: early exception error

On Mon, 5 Jan 2009, [email protected] wrote:

> On Mon, 5 Jan 2009, Yinghai Lu wrote:
>
>> can you make sure X86_64_ACPI_NUMA is set?
>>
>> # CONFIG_X86_64_ACPI_NUMA is not set
>
> will do, recompiling...

new version up at http://linux.lang.hm/linux


the -6 builds are the latest ones, pictures are still arriving

38 earlyprintk=vga
38 earlyprintk=vga numa=noacpi
38 earlyprintk=vga numa=off
38 earlyprintk=vga numa-noacpi bootmem_debug

I forgot to try and move the memory hole, I'll reboot and try that.

David Lang

2009-01-05 23:52:14

by David Lang

Subject: Re: early exception error

On Mon, 5 Jan 2009, [email protected] wrote:

> On Mon, 5 Jan 2009, [email protected] wrote:
>
>> On Mon, 5 Jan 2009, Yinghai Lu wrote:
>>
>>> can you make sure X86_64_ACPI_NUMA is set?
>>>
>>> # CONFIG_X86_64_ACPI_NUMA is not set
>>
>> will do, recompiling...
>
> new version up at http://linux.lang.hm/linux
>
>
> the -6 builds are the latest ones, pictures are still arriving
>
> 38 earlyprintk=vga
> 38 earlyprintk=vga numa=noacpi
> 38 earlyprintk=vga numa=off
> 38 earlyprintk=vga numa-noacpi bootmem_debug
>
> I forgot to try and move the memory hole, I'll reboot and try that.

oops, cut-and-paste got me, those were 38-41

David Lang

2009-01-06 00:03:45

by Yinghai Lu

Subject: Re: early exception error

[email protected] wrote:
> On Mon, 5 Jan 2009, [email protected] wrote:
>
>> On Mon, 5 Jan 2009, [email protected] wrote:
>>
>>> On Mon, 5 Jan 2009, Yinghai Lu wrote:
>>>
>>>> can you make sure X86_64_ACPI_NUMA is set?
>>>>
>>>> # CONFIG_X86_64_ACPI_NUMA is not set
>>>
>>> will do, recompiling...
>>
>> new version up at http://linux.lang.hm/linux
>>
>>
>> the -6 builds are the latest ones, pictures are still arriving
>>
>> 38 earlyprintk=vga
>> 38 earlyprintk=vga numa=noacpi
>> 38 earlyprintk=vga numa=off
>> 38 earlyprintk=vga numa-noacpi bootmem_debug
>>
>> I forgot to try and move the memory hole, I'll reboot and try that.
>

any 64bit kernel before 2.6.28 works on that system?

YH

2009-01-06 00:20:08

by David Lang

Subject: Re: early exception error

On Mon, 5 Jan 2009, Yinghai Lu wrote:

> [email protected] wrote:
>> On Mon, 5 Jan 2009, [email protected] wrote:
>>
>>> On Mon, 5 Jan 2009, [email protected] wrote:
>>>
>>>> On Mon, 5 Jan 2009, Yinghai Lu wrote:
>>>>
>>>>> can you make sure X86_64_ACPI_NUMA is set?
>>>>>
>>>>> # CONFIG_X86_64_ACPI_NUMA is not set
>>>>
>>>> will do, recompiling...
>>>
>>> new version up at http://linux.lang.hm/linux
>>>
>>>
>>> the -6 builds are the latest ones, pictures are still arriving
>>>
>>> 38 earlyprintk=vga
>>> 38 earlyprintk=vga numa=noacpi
>>> 38 earlyprintk=vga numa=off
>>> 38 earlyprintk=vga numa-noacpi bootmem_debug
>>>
>>> I forgot to try and move the memory hole, I'll reboot and try that.
>>
>
> any 64bit kernel before 2.6.28 works on that system?

I'm pretty sure that I've done it on other systems from this buy, but on
this system, no I don't have any working 64 bit system. I tried ubuntu
8.10 and 8.04 before trying to use the config for my working 32 bit system
as the basis for a make oldconfig to try and get a working 64 bit kernel
to start using.

this motherboard is a Tyan Thunder K8S Pro S2882

I just uploaded some snapshots of the bios screens

David Lang

2009-01-06 00:25:56

by Yinghai Lu

Subject: Re: early exception error

[email protected] wrote:
> On Mon, 5 Jan 2009, Yinghai Lu wrote:
>
>> [email protected] wrote:
>>> On Mon, 5 Jan 2009, [email protected] wrote:
>>>
>>>> On Mon, 5 Jan 2009, [email protected] wrote:
>>>>
>>>>> On Mon, 5 Jan 2009, Yinghai Lu wrote:
>>>>>
>>>>>> can you make sure X86_64_ACPI_NUMA is set?
>>>>>>
>>>>>> # CONFIG_X86_64_ACPI_NUMA is not set
>>>>>
>>>>> will do, recompiling...
>>>>
>>>> new version up at http://linux.lang.hm/linux
>>>>
>>>>
>>>> the -6 builds are the latest ones, pictures are still arriving
>>>>
>>>> 38 earlyprintk=vga
>>>> 38 earlyprintk=vga numa=noacpi
>>>> 38 earlyprintk=vga numa=off
>>>> 38 earlyprintk=vga numa-noacpi bootmem_debug
>>>>
>>>> I forgot to try and move the memory hole, I'll reboot and try that.
>>>
>>
>> any 64bit kernel before 2.6.28 works on that system?
>
> I'm pretty sure that I've done it on other systems from this buy, but on
> this system, no I don't have any working 64 bit system. I tried ubuntu
> 8.10 and 8.04 before trying to use the config for my working 32 bit
> system as the basis for a make oldconfig to try and get a working 64 bit
> kernel to start using.
>
> this motherboard is a Tyan Thunder K8S Pro S2882
>
> I just uploaded some snapshots of the bios screens

please try to update the BIOS. the installed BIOS seems to have a problem...

http://tyan.com/support_download_bios.aspx?model=S.S2882

YH

2009-01-06 01:01:22

by David Lang

Subject: Re: early exception error

On Mon, 5 Jan 2009, Yinghai Lu wrote:

> [email protected] wrote:
>> On Mon, 5 Jan 2009, Yinghai Lu wrote:
>>
>>
>> this motherboard is a Tyan Thunder K8S Pro S2882
>>
>> I just uploaded some snapshots of the bios screens
>
> please try to update the BIOS. the installed BIOS seems to have a problem...
>
> http://tyan.com/support_download_bios.aspx?model=S.S2882

what problems are you seeing (i.e. what did I miss seeing that would have
pointed me in this direction without eating everyone's time)

David Lang

2009-01-06 01:06:51

by Yinghai Lu

Subject: Re: early exception error

[email protected] wrote:
> On Mon, 5 Jan 2009, Yinghai Lu wrote:
>
>> [email protected] wrote:
>>> On Mon, 5 Jan 2009, Yinghai Lu wrote:
>>>
>>>
>>> this motherboard is a Tyan Thunder K8S Pro S2882
>>>
>>> I just uploaded some snapshots of the bios screens
>>
>> please try to update the BIOS. the installed BIOS seems to have a problem...
>>
>> http://tyan.com/support_download_bios.aspx?model=S.S2882
>
> what problems are you seeing (i.e. what did I miss seeing that would
> have pointed me in this direction without eating everyone's time)
>

according to e820, you should get 4G of RAM instead of 3G.

the BIOS should set up RAM routing correctly, taking the IOMMU (GART), the memory hole, etc. into account.

YH
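
The e820 check described above can be sketched roughly as follows. This is a minimal illustration and was not posted in the thread: the sample map is made up (it is not this machine's e820 output), the `BIOS-e820: start - end (type)` line format is assumed to match what kernels of this era print at boot, and `usable_bytes` is a hypothetical helper name.

```python
import re

# Hypothetical e820 lines; the addresses are invented for illustration and
# are NOT taken from the poster's machine.
SAMPLE_E820 = """\
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000bfff0000 (usable)
 BIOS-e820: 00000000bfff0000 - 00000000c0000000 (ACPI data)
 BIOS-e820: 0000000100000000 - 0000000140000000 (usable)
"""

# Assumed line format: "BIOS-e820: <start hex> - <end hex> (<type>)"
E820_LINE = re.compile(r"BIOS-e820:\s+([0-9a-f]+)\s+-\s+([0-9a-f]+)\s+\((.+)\)")


def usable_bytes(log_text):
    """Sum the sizes of all ranges the firmware reports as usable RAM."""
    total = 0
    for match in E820_LINE.finditer(log_text):
        start = int(match.group(1), 16)
        end = int(match.group(2), 16)
        if match.group(3) == "usable":
            total += end - start
    return total


if __name__ == "__main__":
    gib = usable_bytes(SAMPLE_E820) / 2**30
    print(f"usable RAM reported by e820: {gib:.2f} GiB")
```

Comparing this total against the amount of memory the kernel actually ends up using is one way to notice a mismatch like the 4G versus 3G one mentioned here.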

2009-01-06 04:27:40

by David Lang

Subject: Re: early exception error

On Mon, 5 Jan 2009, Yinghai Lu wrote:

> [email protected] wrote:
>> On Mon, 5 Jan 2009, Yinghai Lu wrote:
>>
>>> [email protected] wrote:
>>>> On Mon, 5 Jan 2009, Yinghai Lu wrote:
>>>>
>>>>
>>>> this motherboard is a Tyan Thunder K8S Pro S2882
>>>>
>>>> I just uploaded some snapshots of the bios screens
>>>
>>> please try to update the BIOS. the installed BIOS seems to have a problem...
>>>
>>> http://tyan.com/support_download_bios.aspx?model=S.S2882
>>
>> what problems are you seeing (i.e. what did I miss seeing that would
>> have pointed me in this direction without eating everyone's time)
>>
>
> according to e820, you should get 4G of RAM instead of 3G.
>
> the BIOS should set up RAM routing correctly, taking the IOMMU (GART), the memory hole, etc. into account.

this seems to have solved the problem

I do have another of these systems if there is any desire to do any more
troubleshooting (it would be really nice if things died with a better
error message for example)

if not, thanks for the assistance

David Lang

2009-01-06 06:09:19

by Yinghai Lu

Subject: Re: early exception error

On Mon, Jan 5, 2009 at 9:29 PM, <[email protected]> wrote:
> On Mon, 5 Jan 2009, Yinghai Lu wrote:
>
>> [email protected] wrote:
>>>
>>> On Mon, 5 Jan 2009, Yinghai Lu wrote:
>>>
>>>> [email protected] wrote:
>>>>>
>>>>> On Mon, 5 Jan 2009, Yinghai Lu wrote:
>>>>>
>>>>>
>>>>> this motherboard is a Tyan Thunder K8S Pro S2882
>>>>>
>>>>> I just uploaded some snapshots of the bios screens
>>>>
>>>> please try to update the BIOS. the installed BIOS seems to have a problem...
>>>>
>>>> http://tyan.com/support_download_bios.aspx?model=S.S2882
>>>
>>> what problems are you seeing (i.e. what did I miss seeing that would
>>> have pointed me in this direction without eating everyone's time)
>>>
>>
>> according to e820, you should get 4G of RAM instead of 3G.
>>
>> the BIOS should set up RAM routing correctly, taking the IOMMU (GART),
>> the memory hole, etc. into account.
>
> this seems to have solved the problem

can you post the boot log and the output of lspci -vvxxx?

>
> I do have another of these systems if there is any desire to do any more
> troubleshooting (it would be really nice if things died with a better error
> message for example)

good,
0. set up a serial cable between your two systems. use minicom on your
first system to capture the serial messages from the second system
1. apply the attached patch
2. we need the full boot log of your second system. please boot with
"debug console=uart8250,io,0x3f8,115200n8 pci=earlydump"

YH
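
For reference, the early-console argument in step 2 packs several fields into one string. Below is a minimal sketch of how it breaks down, assuming the `uart8250,<io|mmio>,<addr>[,options]` form with options written as baud rate, parity letter, and data bits. `parse_uart8250` is a hypothetical helper written for this note, not a kernel or library function.

```python
import re


def parse_uart8250(value):
    """Split a 'uart8250,io,0x3f8,115200n8' style string into its fields."""
    fields = value.split(",")
    if len(fields) < 3 or fields[0] != "uart8250":
        raise ValueError("expected uart8250,<io|mmio>,<addr>[,options]")
    if fields[1] not in ("io", "mmio"):
        raise ValueError("unknown address type: " + fields[1])

    # int() with base 16 accepts the leading "0x" prefix.
    result = {"iotype": fields[1], "addr": int(fields[2], 16)}

    if len(fields) > 3:
        # Assumed option layout: <baud>[<parity>[<bits>]], e.g. 115200n8
        match = re.fullmatch(r"(\d+)([noe])?(\d)?", fields[3])
        if match is None:
            raise ValueError("unrecognized options: " + fields[3])
        result["baud"] = int(match.group(1))
        if match.group(2):
            result["parity"] = match.group(2)
        if match.group(3):
            result["bits"] = int(match.group(3))
    return result


if __name__ == "__main__":
    print(parse_uart8250("uart8250,io,0x3f8,115200n8"))
```

Here 0x3f8 is the conventional I/O port of the first PC serial port, and the capturing side (minicom on the first system) would need to be set to the same 115200 8N1 settings.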


Attachments:
(No filename) (1.34 kB)
mminit_loglevel_2.patch (1.04 kB)

2009-01-06 08:01:28

by Cyrill Gorcunov

Subject: Re: early exception error

[[email protected] - Mon, Jan 05, 2009 at 03:20:13PM -0800]
> On Tue, 6 Jan 2009, Cyrill Gorcunov wrote:
>
>>
>> Yes, you may even combine it with the patch Johannes proposed.
>> They should not interfere.
>
> I'm not spotting that patch.
>
> David Lang
>

It's here http://lkml.org/lkml/2009/1/5/51

- Cyrill -

2009-01-07 06:47:51

by David Lang

Subject: Re: early exception error

On Mon, 5 Jan 2009, Yinghai Lu wrote:

>> this seems to have solved the problem
>
> can you post the boot log and the output of lspci -vvxxx?

I will do this tomorrow

>>
>> I do have another of these systems if there is any desire to do any more
>> troubleshooting (it would be really nice if things died with a better error
>> message for example)
>
> good,
> 0. set up a serial cable between your two systems. use minicom on your
> first system to capture the serial messages from the second system
> 1. apply the attached patch
> 2. we need the full boot log of your second system. please boot with
> "debug console=uart8250,io,0x3f8,115200n8 pci=earlydump"

it will take me a little bit of time to do this. the one I've been working
with is now my desktop at work, and now that the holidays are over I need to
work with it instead of rebooting it. the other boxes are in a stack at
home; getting at them is easy compared to figuring out where to set them
up. I should have something set up within a couple of days.

David Lang


Attachments:
mminit_loglevel_2.patch (1.04 kB)