2003-03-10 22:42:00

by adrian.golumbovici

Subject: kernel 2.4.21-pre5 crash at boot with 1GB memory, highmem 4GB and vga=788 in lilo

Motherboard: Asus A7V8X
CPU: Athlon XP 2400+
Memory: 2 modules DDR PC333 of 512MB each. (3.5 hours continuous memtest86
showed no error in any of them).
Graphics Card: Atlantis Sapphire Radeon 9700 Pro

With just one module installed (either of them), everything works OK. As
soon as the second module is in, the machine crashes at kernel boot: the
lilo menu comes up fine, but after selecting a kernel it freezes with the
Caps Lock and Scroll Lock lights on and a black screen. Highmem is set to
4G, as the subject line says.

I tried every setting I could think of as an append boot parameter
(noapic, noacpi, acpi=off, etc.), and it booted only with vga=ask (I
picked two values from the menu in two tests, 0 and 6, and both worked).

I then put the original settings back in lilo.conf and experimented with
the mem setting at boot. The lowest value at which it still crashes is
mem=888M; the highest value at which it neither crashes nor gives any
segfault errors is 848M. Everything in between these values doesn't
crash, but gives a huge list of segfaults. It still boots, but most
modules fail to load because of the segfaults, and the system is highly
unstable.

Windows 2000 Pro, installed as a second OS in dual boot (I know... I
know... I hate M$ products, but from time to time I have to play some
games which don't work under Linux... :) ), boots OK. I started a DivX
encode with the RAM cache set to 1024MB (to force it to use all of main
RAM and see what happens), and it still hadn't crashed after 2 hours of
encoding, while free RAM went down to 2.5MB (it didn't go below that;
swap usage grew instead).

I compiled the 2.4.21-pre5 kernel with the kmsgdump patch and booted with
the original parameters (no mem parameter and vga=788, as is standard in
my distro, Mandrake Linux 9.1rc2). Attached you will find the
messages.txt produced by kmsgdump and also the result of running
ksymoops on it.

Please help before I go insane. :( I tried the last 4 available BIOSes
for my mainboard and also all the enterprise kernels (highmem enabled)
from my distro, including the development ones, and still no go (same
freeze). :( If you need more info to help pinpoint the problem, please
ask. I've been trying the whole weekend to get it to work or to isolate
the problem, and I'm at the point where I feel like going to sit in a
corner and cry. :(

Best regards,
Adrian Golumbovici


Attachments:
MESSAGES.TXT (16.00 kB)
oops.txt (5.37 kB)

2003-03-12 12:41:11

by Denis Vlasenko

Subject: Re: kernel 2.4.21-pre5 crash at boot with 1GB memory, highmem 4GB and vga=788 in lilo

On 11 March 2003 00:51, Adrian Golumbovici wrote:
> Motherboard: Asus A7V8X
> CPU: Athlon XP 2400+
> Memory: 2 modules DDR PC333 of 512MB each. (3.5 hours continuous
> memtest86 showed no error in any of them).

cpuburn sometimes uncovers memory failures too (especially burnMMX).
But read on, this is most likely not the case for you.

> Graphics Card: Atlantis Sapphire Radeon 9700 Pro
>
> With just one module (same thing with either of them) works ok. As
> soon as I have 2nd module in it crashes at kernel boot (lilo menu
> comes on OK but after selecting kernel it freezes with capslock and
> scrolllock lights on and black screen). Highmem is set to 4G as the
> subject line says. Tried all possible settings as append boot
> parameter (noapic, noacpi, acpi=off, etc) and it worked only when
> vga=ask (I picked 2 values out of them in 2 tests i.e. 0 and 6 and it
> worked ok).

So it works with standard 80x25 text mode? ok...

AFAIK vesa mode 788=0x314 - 800x600 65k colors (5:6:5 r:g:b bits),
am I remembering right?
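
A minimal sketch of the mode-number arithmetic, assuming the usual Linux
convention that a vga= value for a VESA framebuffer mode is the VESA mode
number plus 0x200. The table is only a small subset of the modes listed
in the kernel's vesafb documentation, and describe_vga_mode is a name
made up for this illustration.

```python
# Linux "vga=" values for VESA framebuffer modes are the VESA BIOS mode
# number plus 0x200 (assumption stated in the text above).
VESA_OFFSET = 0x200

# Small subset of well-known modes: value -> (width, height, bits per pixel).
KNOWN_MODES = {
    0x301: (640, 480, 8),
    0x303: (800, 600, 8),
    0x305: (1024, 768, 8),
    0x311: (640, 480, 16),
    0x314: (800, 600, 16),
    0x317: (1024, 768, 16),
}


def describe_vga_mode(value):
    """Describe a decimal vga= value; hypothetical helper for illustration."""
    if value not in KNOWN_MODES:
        return f"vga={value} (0x{value:x}): not in this small table"
    width, height, bpp = KNOWN_MODES[value]
    vesa = value - VESA_OFFSET
    return (f"vga={value} (0x{value:x}): VESA mode 0x{vesa:x}, "
            f"{width}x{height}, {bpp} bpp")


print(describe_vga_mode(788))  # the mode from the original report
print(describe_vga_mode(769))  # 640x480, 256 colors
```

So 788 is indeed 0x314, and 769 (0x301) would be the 640x480 256-color
mode to try next.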

Did you try other framebuffer modes like 640x480 256 colors?
I suspect they will fail similarly, but worth testing that.

> Put the original settings back in lilo.conf and tried to
> play with the mem setting at boot. Lowest value at which still
> crashes is mem=888M, highest value at which it doesn't crash or give
> any segfault errors is 848M.

This is interesting. Where does your linear framebuffer memory start?
Post dmesg and /proc/iomem. You may do this from a mem=848M boot first,
then from a crashing one.

I suspect that the kernel somehow overlapped RAM and video RAM ;)
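
Once /proc/iomem is available, a check for that kind of overlap could be
sketched roughly as below. This assumes the usual "start-end : name"
format with hex addresses and a framebuffer entry whose name contains
"vesafb"; the sample listing is invented for illustration and is not
taken from the reporter's machine.

```python
import re

# Matches lines such as "00100000-3ffeffff : System RAM" (leading spaces allowed).
LINE_RE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+)\s*:\s*(.+?)\s*$")

# Invented sample data, NOT from the machine in this thread.
SAMPLE = """\
00000000-0009fbff : System RAM
000a0000-000bffff : Video RAM area
00100000-3ffeffff : System RAM
  00100000-0029a3c1 : Kernel code
d0000000-d7ffffff : PCI Bus #01
  d0000000-d7ffffff : vesafb
"""


def parse_iomem(text):
    """Return a list of (start, end, name) tuples from /proc/iomem-style text."""
    regions = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            regions.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return regions


def find_overlaps(regions, needle="vesafb"):
    """List regions whose name contains `needle` and that intersect System RAM."""
    ram = [(s, e) for s, e, name in regions if name == "System RAM"]
    hits = []
    for start, end, name in regions:
        if needle not in name:
            continue
        for ram_start, ram_end in ram:
            # Two inclusive ranges intersect if each starts before the other ends.
            if start <= ram_end and ram_start <= end:
                hits.append((name, start, end, ram_start, ram_end))
    return hits


print(find_overlaps(parse_iomem(SAMPLE)))  # empty list: no overlap in the sample
```

On a real system the text would come from reading /proc/iomem; a
non-empty result would support the overlap theory.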

> Everything in between these values
> doesn't crash, but gives a huge list of segfaults. It still boots,
> but most modules are down (cannot be loaded due to the huge list of
> segfaults) and is highly unstable.

You can try "mem=exactmap mem=640K@0 mem=nnnM@1M" to make the kernel
avoid stomping into video memory.
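
A small sketch of how the nnn value could be filled in, assuming you want
usable RAM to end at a given address in MB (the second region starts at
1M, so its size is that address minus 1). The 848 below is only an
illustrative value borrowed from the highest mem= setting reported as
stable, and exactmap_args is a made-up helper name.

```python
def exactmap_args(top_mb):
    """Build the exactmap boot arguments for RAM ending at top_mb megabytes.

    Hypothetical helper: the layout (640K at 0, the rest starting at 1M)
    follows the suggestion in the text above.
    """
    if top_mb <= 1:
        raise ValueError("top_mb must be greater than 1")
    size_mb = top_mb - 1  # region starts at 1M, so it is 1M shorter than top_mb
    return f"mem=exactmap mem=640K@0 mem={size_mb}M@1M"


# Illustrative value only; pick the limit that matches your own tests.
print(f'append="{exactmap_args(848)}"')
```

The printed line has the form of a lilo.conf append= entry.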

It's a bug, we'll need to check why that happens.

> Windows 2000 Pro as second OS with
> dualboot (I know... I know... I hate M$ products but I have to play
> some games which don't work under linux from time to time... :) )
> boots ok and starting a divx encoding with a cache in RAM set to
> 1024MB (to force it use all main RAM and see what happens) still
> didn't crash after 2 hours of encoding, while main unused RAM went
> down to 2.5MB of RAM (didn't go past that, but instead increased the
> swap only).

Let's debug it, and Linux will not crash either.
--
vda