2002-08-20 13:21:04

by Justin Heesemann

Subject: shared graphic ram hangs kernel since 2.4.3-ac1

hi..
i finally got some more info about my problem with kernel booting:
after some tests i found that all kernels up to and including 2.4.3 boot
properly, whereas every later one (2.4.3-ac1 was the first i tested) hangs
right after "Ok, booting the kernel."

since i have 512 MB ram and the onboard gfx chip uses 8MB of it (can be
switched to 1MB in the bios) i decided to try limiting the memory to 504 MB
(512 - 8 = 504)

when i use mem=504M as boot parameter, 2.4.3-ac1 boots and all the other
kernels i tested up to 2.4.18 boot with mem=504M.
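
The arithmetic behind the mem= value can be written down as a minimal sketch. The helper name mem_param is hypothetical, and the sketch assumes the shared video RAM is carved out of the top of physical memory, which is what the 512 - 8 = 504 reasoning above relies on:

```python
def mem_param(total_mib: int, shared_video_mib: int) -> str:
    """Build a mem= boot parameter that stops short of the shared video RAM.

    Hypothetical helper for illustration only. It assumes the video RAM
    sits at the very top of physical memory.
    """
    if not 0 <= shared_video_mib < total_mib:
        raise ValueError("shared video RAM must be smaller than total RAM")
    return f"mem={total_mib - shared_video_mib}M"


if __name__ == "__main__":
    print(mem_param(512, 8))  # BIOS set to 8 MB shared: mem=504M
    print(mem_param(512, 1))  # BIOS set to 1 MB shared: mem=511M
```

With lilo the resulting string would be given on the boot prompt or through an append= line in lilo.conf.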

2.4.19 however doesn't, and neither does 2.4.20-pre4.. no mem=xxxM line helps,
they always hang at "Ok, booting the kernel."


Kernels prior to 2.4.3 didn't need a mem=xxxM line at all.


Here is some info about the system:
Epox 4G4A+, i845G, onboard HPT372, onboard Realtek RTL 8100B, onboard
Integrated Graphics Controller, ... 512 MB Double Sided DDR Ram 2.5 V, P4 2.0
GHz Northwood


/proc/cpuinfo

processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Pentium(R) 4 CPU 2.00GHz
stepping : 4
cpu MHz : 2019.980
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 4023.91


/proc/meminfo

total: used: free: shared: buffers: cached:
Mem: 526389248 13664256 512724992 0 598016 6148096
Swap: 509956096 0 509956096
MemTotal: 514052 kB
MemFree: 500708 kB
MemShared: 0 kB
Buffers: 584 kB
Cached: 6004 kB
SwapCached: 0 kB
Active: 3640 kB
Inactive: 4524 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 514052 kB
LowFree: 500708 kB
SwapTotal: 498004 kB
SwapFree: 498004 kB


lspci -vvv

00:00.0 Host bridge: Intel Corp.: Unknown device 2560 (rev 01)
Subsystem: Unknown device 1695:4002
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort+ >SERR- <PERR-
Latency: 0
Region 0: Memory at e8000000 (32-bit, prefetchable) [size=64M]
Capabilities: [e4] #09 [0105]

00:02.0 VGA compatible controller: Intel Corp.: Unknown device 2562 (rev 01)
(prog-if 00 [VGA])
Subsystem: Unknown device 1695:9002
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin A routed to IRQ 10
Region 0: Memory at e0000000 (32-bit, prefetchable) [size=128M]
Region 1: Memory at ee000000 (32-bit, non-prefetchable) [size=512K]
Capabilities: [d0] Power Management version 1
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:1e.0 PCI bridge: Intel Corp. 82820 820 (Camino 2) Chipset PCI (rev 81)
(prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR+ FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR+
Latency: 0
Bus: primary=00, secondary=01, subordinate=01, sec-latency=32
I/O behind bridge: 00009000-0000afff
Memory behind bridge: ec000000-edffffff
Prefetchable memory behind bridge: fff00000-000fffff
BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-

00:1f.0 ISA bridge: Intel Corp.: Unknown device 24c0 (rev 01)
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 0

00:1f.1 IDE interface: Intel Corp.: Unknown device 24cb (rev 01) (prog-if 8a
[Master SecP PriP])
Subsystem: Unknown device 1695:4002
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin A routed to IRQ 10
Region 0: I/O ports at <unassigned> [size=8]
Region 1: I/O ports at <unassigned> [size=4]
Region 2: I/O ports at <unassigned> [size=8]
Region 3: I/O ports at <unassigned> [size=4]
Region 4: I/O ports at f000 [size=16]
Region 5: Memory at 20000000 (32-bit, non-prefetchable) [size=1K]

00:1f.3 SMBus: Intel Corp.: Unknown device 24c3 (rev 01)
Subsystem: Unknown device 1695:4002
Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Interrupt: pin B routed to IRQ 11
Region 4: I/O ports at 0500 [size=32]

01:04.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139 (rev 10)
Subsystem: Unknown device 1695:9001
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 32 (8000ns min, 16000ns max)
Interrupt: pin A routed to IRQ 11
Region 0: I/O ports at a400 [size=256]
Region 1: Memory at ed000000 (32-bit, non-prefetchable) [size=256]
Expansion ROM at <unassigned> [disabled] [size=64K]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [60] Vital Product Data


--
Best Regards,
Justin Heesemann


2002-08-21 11:44:45

by Justin Heesemann

Subject: Re: shared graphic ram hangs kernel since 2.4.3-ac1

ok.. i finally managed to get the exact file that causes the problems.


2.4.19-pre6 works.
2.4.19-pre7 doesn't: it hangs right after "Ok, booting the kernel"
2.4.19-pre7 with the pre6 arch/i386/kernel/setup.c works !
as i don't have any highmem support configured and as i always have to provide
the option mem=511M (due to 1MB shared video ram) i suspect the memory setup
code in setup.c. but as i'm not a kernel hacker, any help would be appreciated.
note: any kernel prior to 2.4.3 was able to boot without the mem=511M option.

http://www.kernel.org/diff/diffview.cgi?css=%2Fdiff%2Fdiff.css;file=%2Fpub%2Flinux%2Fkernel%2Fv2.4%2Ftesting%2Fincr%2Fpatch-2.4.19-pre6-pre7.gz;z=54

shows the diff which causes my problems..
anyone ?

--
Best Regards,
Justin Heesemann

2002-08-21 13:11:46

by Alan

Subject: Re: shared graphic ram hangs kernel since 2.4.3-ac1

On Wed, 2002-08-21 at 12:52, Justin Heesemann wrote:
> 2.4.19-pre7 with pre6 arch/i386/kernel/setup.c works !
> as i dont have any highmem support configured and as i always have to provide
> the option mem=511M (due to 1MB shared video ram) i suspect that part of
> setup.c. but as i'm not a kernel hacker, any help would be appreciated.
> note: any kernel prior to 2.4.3 was able to boot without the mem=511M option.

Are you running a very old version of grub ?

2002-08-21 13:22:12

by Justin Heesemann

Subject: Re: shared graphic ram hangs kernel since 2.4.3-ac1

On Wednesday 21 August 2002 15:16, Alan Cox wrote:
> On Wed, 2002-08-21 at 12:52, Justin Heesemann wrote:
> > 2.4.19-pre7 with pre6 arch/i386/kernel/setup.c works !
> > as i dont have any highmem support configured and as i always have to
> > provide the option mem=511M (due to 1MB shared video ram) i suspect
> > that part of setup.c. but as i'm not a kernel hacker, any help would be
> > appreciated. note: any kernel prior to 2.4.3 was able to boot without the
> > mem=511M option.
>
> Are you running a very old version of grub ?

actually i am running lilo..
the one that comes with debian 3.0.
the problem also occurs with every bootable linux cd that i tried.. as long
as it's running kernel 2.4.19.
the debian bf24 kernel image (i think it's 2.4.16?) boots when i append
mem=511M, knoppix/gentoo with 2.4.19 don't.

would you suggest that i try grub ?

2002-08-21 14:03:34

by Alan

Subject: Re: shared graphic ram hangs kernel since 2.4.3-ac1

On Wed, 2002-08-21 at 14:29, Justin Heesemann wrote:
> > Are you running a very old version of grub ?
>
> actually i am running lilo..
> the one that comes with debian 3.0.
> the problem also occurs with every bootable linux cd, that i tried.. as long
> as it's running kernel 2.4.19.
> debian bf24 kernel image (i think its 2.4.16?) is booting when i append
> mem=511M, knoppix/gentoo with 2.4.19 doesnt.
>
> would you suggest that i try grub ?

It shouldn't make any difference. Very old grub always passed mem=, which
did break some things because at one point it overrode the reporting of
holes and the like.

Shared graphic ram shouldn't in theory ever be causing hangs. The BIOS
E820 memory reporting should be excluding any video reserved memory from
its reporting. For the i810/845 it's fractionally more complex once we go
into X11 (we allocate from the AGP pool ourselves) but not in console
mode.

2002-08-21 18:17:49

by Justin Heesemann

Subject: Re: shared graphic ram hangs kernel since 2.4.3-ac1

On Wednesday 21 August 2002 16:08, Alan Cox wrote:

> Shared graphic ram shouldnt in theory ever be causing hangs. The BIOS
> E820 memory reporting should be excluding any video reserved memory from
> its reporting. For the i810/845 its fractionally more complex once we go
> into X11 (we allocate from the AGP pool ourselves) but not in console
> mode.

Well.. however, the problem I have does exist, whether it's because of the
shared ram or not.
When i add the option mem=512M (which simply ignores the 1MB of shared
ram), i get this kernel panic with 2.4.19-pre6 (which boots fine
with mem=511M):

Linux version 2.4.19-pre6 (root@lux) (gcc version 2.95.3 20010315 (release))
#52
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
BIOS-e820: 00000000000a0000 - 000000001fef0000 (reserved)
BIOS-e820: 000000001fef0000 - 000000001fef3000 (ACPI NVS)
BIOS-e820: 000000001fef3000 - 000000001ff00000 (ACPI data)
On node 0 totalpages: 131072
zone(0): 4096 pages.
zone(1): 126976 pages.
zone(2): 0 pages.
Kernel command line: BOOT_IMAGE=test ro root=304 mem=512M console=ttyS0,9600
Initializing CPU#0
Detected 2019.980 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 4023.91 BogoMIPS
Memory: 516460k/524288k available (1045k kernel code, 7440k reserved, 326k
data)
Dentry-cache hash table entries: 65536 (order: 7, 524288 bytes)
Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
Mount-cache hash table entries: 8192 (order: 4, 65536 bytes)
Buffer-cache hash table entries: 32768 (order: 5, 131072 bytes)
Page-cache hash table entries: 131072 (order: 7, 524288 bytes)
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 512K
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: Intel(R) Pentium(R) 4 CPU 2.00GHz stepping 04
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
PCI: PCI BIOS revision 2.10 entry at 0xfae70, last bus=1
PCI: Using configuration type 1
PCI: Probing PCI hardware
Unknown bridge resource 2: assuming transparent
PCI: Using IRQ router PIIX [8086/24c0] at 00:1f.0
PCI: Found IRQ 10 for device 00:1f.1
PCI: Sharing IRQ 10 with 00:02.0
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
Unable to handle kernel NULL pointer dereference at virtual address 00000003
printing eip:
c013c543
*pde = 00000000
Oops: 0002
CPU: 0
EIP: 0010:[<c013c543>] Not tainted
EFLAGS: 00010246
eax: ffffffff ebx: c1588420 ecx: dffaa9d0 edx: c1588430
esi: 00000002 edi: dfee7060 ebp: dfefdf70 esp: dfefdf18
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 1, stackpage=dfefd000)
Stack: c1588420 c0142b08 c1588420 c1588420 dfee7060 c158be00 00000000 c0247d7c
00000002 0000000c dfefdf60 dfefdf5c 00000002 c15883a0 c1588420 c15884a0
c020faec 00000006 d66fc523 c020faf3 00000008 7ee58900 00000000 c013092f
Call Trace: [<c0142b08>] [<c013092f>] [<c01309e5>] [<c0130b59>] [<c010502f>]
[<c0106ef8>]

Code: 89 50 04 89 43 10 89 4a 04 89 11 5b c3 55 57 56 53 8b 6c 24
<0>Kernel panic: Attempted to kill init!
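
The BIOS-e820 lines in the log above can be compared against the mem= value with a short script. This is only a sketch: the names parse_e820, top_of_map and requested_limit_mib are made up for illustration, and the map is taken exactly as printed in this log, without guessing at entries that may be missing.

```python
import re

# The four map lines and the command line, copied from the boot log above.
LOG = """\
BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
BIOS-e820: 00000000000a0000 - 000000001fef0000 (reserved)
BIOS-e820: 000000001fef0000 - 000000001fef3000 (ACPI NVS)
BIOS-e820: 000000001fef3000 - 000000001ff00000 (ACPI data)
"""
CMDLINE = "BOOT_IMAGE=test ro root=304 mem=512M console=ttyS0,9600"

E820_LINE = re.compile(
    r"BIOS-e820:\s+([0-9a-f]{16})\s+-\s+([0-9a-f]{16})\s+\(([^)]+)\)"
)


def parse_e820(text):
    """Return (start, end, kind) tuples for every BIOS-e820 line in text."""
    return [
        (int(m.group(1), 16), int(m.group(2), 16), m.group(3))
        for m in E820_LINE.finditer(text)
    ]


def top_of_map(entries):
    """Highest end address (in bytes) that the BIOS map describes."""
    return max(end for _start, end, _kind in entries)


def requested_limit_mib(cmdline):
    """Value of a mem=<n>M option in MiB, or None if it is absent."""
    m = re.search(r"\bmem=(\d+)M\b", cmdline)
    return int(m.group(1)) if m else None


if __name__ == "__main__":
    top_mib = top_of_map(parse_e820(LOG)) >> 20
    limit = requested_limit_mib(CMDLINE)
    print(f"BIOS map ends at {top_mib} MiB, mem= asks for {limit} MiB")
    if limit is not None and limit > top_mib:
        print(f"mem= reaches {limit - top_mib} MiB past the end of the map")
```

The last map entry ends at 0x1ff00000, which is 511 MiB, so mem=512M reaches 1 MiB beyond what the BIOS reports. That matches the 1MB shared video setting, though it does not by itself prove the panic comes from that region.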




And since the problem definitely seems to be in
2.4.19-pre7/arch/i386/kernel/setup.c (the pre6 one works, well.. not with
mem=512M but with mem=511M), do you have any idea what I could try
next ? Could this be caused by bad hardware ? (Since everything seems to run
so fine with 2.4.19-pre6 I don't want to believe that.. but this would make
things a lot easier..contact the hardware vendor.. get a new one :)
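
For anyone reading the panic above, the "Oops: 0002" value and the faulting address can be decoded with a small sketch. The bit meanings are the standard i386 page-fault error code bits. The function names are hypothetical, and the address calculation assumes the Code: bytes start at the faulting EIP, so that "89 50 04" is a store of edx to eax+4.

```python
def decode_i386_fault_code(code: int) -> dict:
    """Decode the low three bits of an i386 page-fault error code."""
    return {
        "cause": "protection violation" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }


def effective_address(base: int, disp: int) -> int:
    """32-bit effective address of base + displacement, with wraparound."""
    return (base + disp) & 0xFFFFFFFF


if __name__ == "__main__":
    # "Oops: 0002" is printed in hex.
    print(decode_i386_fault_code(int("0002", 16)))
    # eax is ffffffff in the register dump; a store to eax+4 wraps to 3,
    # which agrees with "virtual address 00000003" in the oops.
    print(f"{effective_address(0xFFFFFFFF, 4):08x}")
```

Read this way, the oops is a kernel-mode write through a pointer holding ffffffff rather than a plain NULL, which may hint at a corrupted list entry. Running the oops through ksymoops would be needed to name the function at c013c543.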