2001-11-15 01:12:13

by Sven.Riedel

Subject: 2.4.14 Oops during boot (KT133A Problem?)

[Note: I already sent this Oops-Report to the list a few days ago, but I
didn't see it arrive. I apologize in case this mail reaches the list
twice...]

Hi,
I get the following kernel oops when booting 2.4.14 vanilla on an Athlon
1200, KT133A Chipset Motherboard, 768MB RAM. Kernel has been compiled
for Athlon CPUs, non-SMP. The next thing the kernel would have done
after the oops would have been to start kswapd.
Sidenote: booting the exact same kernel from floppy (debian rescue with
kernels exchanged and booting with 'linux root=/dev/hda6') gives me a rather
unstable system for a short while (uptime usually < 6 hours before oopsing).

Linux NET 4.0 for Linux 2.4
Based upon Swansea University Computer Society Net 3.039
Unable to handle kernel NULL pointer dereference at virtual address 00000003
printing eip:
c0112073
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c0112073>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: f7efe000 ebx: f7efe000 ecx: 00000000 edx: ffffffff
esi: c1e1a5a0 edi: fffffff5 ebp: 00000700 esp: c1e1bf50
ds: 0018 es: 0018 ss:0018
Process swapper (pid: 1, stackpage=c1e1b000)
Stack: c1e1a000 c1e1bfc4 c023d3c0 c1e1bf88 00000000 00000000 c0228f80 00000004
c0228f80 c01058be 00000700 00000078 c1e1bf90 00000000 0008e000 c0106c4b
00000700 00000078 c011ded0 c1e1bfc4 c023d3c0 0008e000 00000078 00000018
Call Trace: [<c01058be>] [<c0106c4b>] [<c011ded0>] [<c01054df>] [<c011e124>]
[<c011ded0>] [<c0105047>] [<c01054e8>]
Code: 8b 42 04 3b 83 18 02 00 00 0f 83 a7 05 00 00 ff 02 8b 83 e4

>>EIP; c0112072 <do_fork+62/640> <=====
Trace; c01058be <sys_clone+1e/30>
Trace; c0106c4a <system_call+32/38>
Trace; c011ded0 <context_thread+0/1a0>
Trace; c01054de <kernel_thread+1e/40>
Trace; c011e124 <start_context_thread+14/30>
Trace; c011ded0 <context_thread+0/1a0>
Trace; c0105046 <init+6/110>
Trace; c01054e8 <kernel_thread+28/40>
Code; c0112072 <do_fork+62/640>
00000000 <_EIP>:
Code; c0112072 <do_fork+62/640> <=====
0: 8b 42 04 mov 0x4(%edx),%eax <=====
Code; c0112074 <do_fork+64/640>
3: 3b 83 18 02 00 00 cmp 0x218(%ebx),%eax
Code; c011207a <do_fork+6a/640>
9: 0f 83 a7 05 00 00 jae 5b6 <_EIP+0x5b6> c0112628 <do_fork+618/640>
Code; c0112080 <do_fork+70/640>
f: ff 02 incl (%edx)
Code; c0112082 <do_fork+72/640>
11: 8b 83 e4 00 00 00 mov 0xe4(%ebx),%eax

<0>Kernel panic: Attempted to kill init!

Regs,
Sven
--
Sven Riedel [email protected]
Osteroeder Str. 6 / App. 13 [email protected]
38678 Clausthal "Call me bored, but don't call me boring."
- Larry Wall
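
A note on reading the oops above: the faulting instruction is
mov 0x4(%edx),%eax and the register dump shows edx: ffffffff, so the
reported address 00000003 is consistent with edx + 4 wrapping around at
32 bits. Despite the "NULL pointer" wording, the bad pointer value appears
to be 0xffffffff rather than 0. A minimal Python sketch of that arithmetic
(the helper names are made up for illustration, they are not kernel or
ksymoops functions):

```python
MASK32 = 0xFFFFFFFF  # i386 addresses are 32 bits wide


def effective_address(base, displacement):
    """Address of disp(%base) on i386, wrapping at 32 bits."""
    return (base + displacement) & MASK32


def as_signed32(value):
    """Interpret a 32-bit register value as a signed integer."""
    return value - (1 << 32) if value & 0x80000000 else value


edx = 0xFFFFFFFF                       # taken from the register dump
fault = effective_address(edx, 0x4)    # mov 0x4(%edx),%eax
print(f"{fault:08x}")                  # 00000003, as reported in the oops
print(as_signed32(edx))                # -1
```

This only explains where the address 00000003 comes from. It says nothing
about why edx held that value in do_fork.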


2001-11-15 02:41:18

by nakai

Subject: Re: 2.4.14 Oops during boot (KT133A Problem?)

I think you'd do better to compile the kernel for K6, not for K7. There is
something wrong with the KT133 chipset and Athlon/Duron.

[email protected] wrote:
> I get the following kernel oops when booting 2.4.14 vanilla on an Athlon
> 1200, KT133A Chipset Motherboard, 768MB RAM. Kernel has been compiled
> for Athlon CPUs, non-SMP. The next thing the kernel would have done
> after the oops would have been to start kswapd.
> Sidenote: booting the exact same kernel from floppy (debian rescue with
> kernels exchanged and booting with 'linux root=/dev/hda6') gives me a rather
> unstable system for a short while (uptime usually < 6 hours before oopsing).
>
> Linux NET 4.0 for Linux 2.4
> Based upon Swansea University Computer Society Net 3.039
> Unable to handle kernel NULL pointer dereference at virtual address
> 00000003 printing eip:
> c0112073
> *pde = 00000000
> Oops: 0000
> CPU: 0
> EIP: 0010:[<c0112073>] Not tainted
> Using defaults from ksymoops -t elf32-i386 -a i386
> EFLAGS: 00010286
> eax: f7efe000 ebx: f7efe000 ecx: 00000000 edx: ffffffff
> esi: c1e1a5a0 edi: fffffff5 ebp: 00000700 esp: c1e1bf50
> ds: 0018 es: 0018 ss:0018
> Process swapper (pid: 1, stackpage=c1e1b000)

--
-=-=-=-= SHINKO ELECTRIC INDUSTRIES CO., LTD. =-=-=-=-
=-=-=-=- Core Technology Research & Laboratory, -=-=-=-=
-=-=-=-= Information Technology Research Dept. =-=-=-=-
=-=-=-=- Name:Hisakazu Nakai TEL:026-283-2866 -=-=-=-=
-=-=-=-= Mail:[email protected] FAX:026-283-2820 =-=-=-=-

2001-11-15 03:48:23

by Sven.Riedel

Subject: Re: 2.4.14 Oops during boot (KT133A Problem?)

On Thu, Nov 15, 2001 at 11:40:54AM +0900, nakai wrote:
> I think you'd be better compile kernel for K6, not for K7. There is
> something wrong with KT133 chip set and Athlon/Duron.

I just tried it with K6 and plain old 386 CPU settings. In both cases
the same oops happened at the same place (+- a few bytes due to the
different opcodes generated by the compiler).
If more machine info is needed, I'd be glad to provide it.

Regs,
Sven
--
Sven Riedel [email protected]
Osteroeder Str. 6 / App. 13 [email protected]
38678 Clausthal "Call me bored, but don't call me boring."
- Larry Wall

2001-11-15 08:03:35

by nakai

Subject: Re: 2.4.14 Oops during boot (KT133A Problem?)

[email protected] wrote:
>
> On Thu, Nov 15, 2001 at 11:40:54AM +0900, nakai wrote:
> > I think you'd be better compile kernel for K6, not for K7. There is
> > something wrong with KT133 chip set and Athlon/Duron.
>
> I just tried it with K6 and plain old 386 CPU settings. In both cases
> the same oops happened at the same place (+- a few bytes due to the
> different opcodes generated by the compiler).

I have no idea. In my case, it went well.

> If more machine info is needed, I'd be glad to provide it.

Do you have any cards on the PCI bus?

--
-=-=-=-= SHINKO ELECTRIC INDUSTRIES CO., LTD. =-=-=-=-
=-=-=-=- Core Technology Research & Laboratory, -=-=-=-=
-=-=-=-= Information Technology Research Dept. =-=-=-=-
=-=-=-=- Name:Hisakazu Nakai TEL:026-283-2866 -=-=-=-=
-=-=-=-= Mail:[email protected] FAX:026-283-2820 =-=-=-=-

2001-11-15 10:22:41

by Sven.Riedel

Subject: Re: 2.4.14 Oops during boot (KT133A Problem?)

On Thu, Nov 15, 2001 at 05:03:08PM +0900, nakai wrote:
> > If more machine info is needed, I'd be glad to provide it.
> Do you have any cards on PCI bus ?

Yes. The following is my /proc/pci:
PCI devices found:
Bus 0, device 0, function 0:
Host bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133] (rev
3).
Master Capable. Latency=8.
Prefetchable 32 bit memory at 0xd0000000 [0xd3ffffff].
Bus 0, device 1, function 0:
PCI bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133 AGP]
(rev 0).
Master Capable. No bursts. Min Gnt=12.
Bus 0, device 7, function 0:
ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South]
(rev 64).
Bus 0, device 7, function 1:
IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 6).
Master Capable. Latency=32.
I/O at 0x9000 [0x900f].
Bus 0, device 7, function 2:
USB Controller: VIA Technologies, Inc. UHCI USB (rev 22).
IRQ 15.
Master Capable. Latency=32.
I/O at 0x9400 [0x941f].
Bus 0, device 7, function 4:
Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev
64).
Bus 0, device 7, function 5:
Multimedia audio controller: VIA Technologies, Inc. AC97 Audio
Controller (rev 80).
IRQ 12.
I/O at 0x9c00 [0x9cff].
I/O at 0xa000 [0xa003].
I/O at 0xa400 [0xa403].
Bus 0, device 9, function 0:
Ethernet controller: Davicom Semiconductor, Inc. Ethernet 100/10
MBit (rev 49).
IRQ 11.
Master Capable. Latency=32. Min Gnt=20.Max Lat=40.
I/O at 0xa800 [0xa8ff].
Non-prefetchable 32 bit memory at 0xda000000 [0xda0000ff].
Bus 0, device 11, function 0:
SCSI storage controller: Advanced Micro Devices [AMD] 53c974
[PCscsi] (rev 16).
IRQ 15.
Master Capable. Latency=32. Min Gnt=4.Max Lat=40.
I/O at 0xac00 [0xac7f].
Bus 0, device 13, function 0:
Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8029(AS)
(rev 0).
IRQ 15.
I/O at 0xb000 [0xb01f].
Bus 0, device 14, function 0:
Unknown mass storage controller: Triones Technologies, Inc. HPT366 /
HPT370 (rev 4).
IRQ 11.
Master Capable. Latency=64. Min Gnt=8.Max Lat=8.
I/O at 0xb400 [0xb407].
I/O at 0xb800 [0xb803].
I/O at 0xbc00 [0xbc07].
I/O at 0xc000 [0xc003].
I/O at 0xc400 [0xc4ff].
Bus 1, device 0, function 0:
VGA compatible controller: Matrox Graphics, Inc. MGA G400 AGP (rev
4).
IRQ 10.
Master Capable. Latency=64. Min Gnt=16.Max Lat=32.
Prefetchable 32 bit memory at 0xd4000000 [0xd5ffffff].
Non-prefetchable 32 bit memory at 0xd6000000 [0xd6003fff].
Non-prefetchable 32 bit memory at 0xd7000000 [0xd77fffff].

Regs,
Sven

--
Sven Riedel [email protected]
Osteroeder Str. 6 / App. 13 [email protected]
38678 Clausthal "Call me bored, but don't call me boring."
- Larry Wall
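
The /proc/pci listing above shows several devices on the same IRQ line
(for example IRQ 15 and IRQ 11). A short Python sketch, assuming the
2.4-style text layout shown in this thread, groups devices by IRQ so the
sharing is easier to see. The function name and the excerpt are
illustrative only. IRQ sharing is permitted on PCI, so this is a starting
point for investigation, not a diagnosis.

```python
import re
from collections import defaultdict

HEADER = re.compile(r"^Bus\s+(\d+), device\s+(\d+), function\s+(\d+):$")
IRQ = re.compile(r"^IRQ (\d+)\.$")


def devices_by_irq(text):
    """Map IRQ number -> list of ((bus, device, function), device class)."""
    groups = defaultdict(list)
    slot = None   # (bus, device, function) of the entry being read
    desc = None   # device class, e.g. "USB Controller"
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            slot = tuple(int(g) for g in match.groups())
            desc = None
            continue
        if slot is None or not line:
            continue
        if desc is None:
            # First line after the header: "<class>: <vendor and model>"
            desc = line.split(":", 1)[0]
            continue
        match = IRQ.match(line)
        if match:
            groups[int(match.group(1))].append((slot, desc))
    return dict(groups)


# Excerpt copied from the listing above.
SAMPLE = """\
Bus 0, device 7, function 2:
USB Controller: VIA Technologies, Inc. UHCI USB (rev 22).
IRQ 15.
Bus 0, device 9, function 0:
Ethernet controller: Davicom Semiconductor, Inc. Ethernet 100/10
MBit (rev 49).
IRQ 11.
Bus 0, device 11, function 0:
SCSI storage controller: Advanced Micro Devices [AMD] 53c974
[PCscsi] (rev 16).
IRQ 15.
Bus 0, device 13, function 0:
Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8029(AS)
(rev 0).
IRQ 15.
Bus 0, device 14, function 0:
Unknown mass storage controller: Triones Technologies, Inc. HPT366 /
HPT370 (rev 4).
IRQ 11.
"""

for irq, devices in sorted(devices_by_irq(SAMPLE).items()):
    print(irq, [name for _, name in devices])
```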

2001-11-15 23:59:34

by nakai

Subject: Re: 2.4.14 Oops during boot (KT133A Problem?)

[email protected] wrote:
>
> On Thu, Nov 15, 2001 at 05:03:08PM +0900, nakai wrote:
> > > If more machine info is needed, I'd be glad to provide it.
> > Do you have any cards on PCI bus ?
> Bus 0, device 14, function 0:
> Unknown mass storage controller: Triones Technologies, Inc. HPT366 /
> HPT370 (rev 4).
> IRQ 11.
> Master Capable. Latency=64. Min Gnt=8.Max Lat=8.
> I/O at 0xb400 [0xb407].
> I/O at 0xb800 [0xb803].
> I/O at 0xbc00 [0xbc07].
> I/O at 0xc000 [0xc003].
> I/O at 0xc400 [0xc4ff].
> Bus 1, device 0, function 0:

Can you remove or disable the HPT366? Or how about using the 2.4.2
kernel? I am using the 2.4.2 kernel because 2.4.10's ide-pci has
something wrong with Promise IDE cards. I did not check the
HPT366 chip, but I think the 2.4.2 kernel would be better.

--
-=-=-=-= SHINKO ELECTRIC INDUSTRIES CO., LTD. =-=-=-=-
=-=-=-=- Core Technology Research & Laboratory, -=-=-=-=
-=-=-=-= Information Technology Research Dept. =-=-=-=-
=-=-=-=- Name:Hisakazu Nakai TEL:026-283-2866 -=-=-=-=
-=-=-=-= Mail:[email protected] FAX:026-283-2820 =-=-=-=-

2001-11-19 16:11:50

by Bill Davidsen

Subject: Re: 2.4.14 Oops during boot (KT133A Problem?)

In article <[email protected]> [email protected] wrote:
>I think you'd be better compile kernel for K6, not for K7. There is
>something wrong with KT133 chip set and Athlon/Duron.

There is a patch for this chipset around, which AFAIK was never put in
the kernel because the exact function of the patch was not known WRT the
chipset internals. Nonetheless, without it my Athlon systems won't run
an Athlon compiled kernel, and while a P6 kernel will boot,
Athlon-optimized user software will hang the system. Since that's not
acceptable I run the patch. You should be able to find it on this list
in the archives.

--
bill davidsen <[email protected]>
His first management concern is not solving the problem, but covering
his ass. If he lived in the middle ages he'd wear his codpiece backward.

2001-11-19 22:50:27

by Sven.Riedel

Subject: Re: 2.4.14 Oops during boot (KT133A Problem?)

On Mon, Nov 19, 2001 at 11:11:20AM -0500, bill davidsen wrote:
> In article <[email protected]> [email protected] wrote:
> >I think you'd be better compile kernel for K6, not for K7. There is
> >something wrong with KT133 chip set and Athlon/Duron.
>
> There is a patch for this chipset around, which AFAIK was never put in
> the kernel because the exact function of the patch was not known WRT the
> chipset internals.
I guess you mean Kurt Garloff's patch that turned certain features on and
off, since the bit 55 disable patch is in 2.4.14. Unfortunately, I still
get the Oops after applying this patch (and the quirk-fixes were
executed before the oops).

Regs,
Sven

--
Sven Riedel [email protected]
Osteroeder Str. 6 / App. 13 [email protected]
38678 Clausthal "Call me bored, but don't call me boring."
- Larry Wall

2001-11-19 23:31:42

by nakai

Subject: Re: 2.4.14 Oops during boot (KT133A Problem?)

Excuse me, I was on holiday.

[email protected] wrote:
>
> On Fri, Nov 16, 2001 at 08:59:11AM +0900, nakai wrote:
> > I am using 2.4.2 kernel, because 2.4.10's ide-pci has
> > something wrong with promise IDE cards. I did not check about
> > HPT366 chip, but I think it would be better 2.4.2 kernel.
> Could the EIDE code cause problems that far up in the kernel booting
> procedure? The IDE drivers aren't loaded yet at that point...

Yes, you are right.

May I ask how you got /proc/pci?
Is there any kernel that runs?
Is it really a hardware problem, or a kernel problem?

--
-=-=-=-= SHINKO ELECTRIC INDUSTRIES CO., LTD. =-=-=-=-
=-=-=-=- Core Technology Research & Laboratory, -=-=-=-=
-=-=-=-= Information Technology Research Dept. =-=-=-=-
=-=-=-=- Name:Hisakazu Nakai TEL:026-283-2866 -=-=-=-=
-=-=-=-= Mail:[email protected] FAX:026-283-2820 =-=-=-=-

2001-11-22 15:47:46

by Sven.Riedel

Subject: Re: 2.4.14 Oops during boot (KT133A Problem?)

[Again, I did not see my mail make it to the linux-kernel mailing list.
Again my apologies if you receive this mail twice. Now, who is munching
the mail I'm sending to the list?]

On Tue, Nov 20, 2001 at 08:31:05AM +0900, nakai wrote:
> Excuse me, I had holidays.
We all need those :)

> May I ask you how to get /proc/pci ?
Well as I said in my original posting:
Sidenote: booting the exact same kernel from floppy (debian rescue with
kernels exchanged and booting with 'linux root=/dev/hda6') gives me a
rather unstable system for a short while (uptime usually < 6 hours before
oopsing).
Currently, the "unstable" attribute doesn't really seem to apply
anymore, I get uptimes >1 day at the moment. Maybe /dev/random likes me
better now...
Anyway, I boot from floppy, so I do get a somewhat useable system. It's
just very annoying, and I don't really trust things showing this sort of
behaviour (likes floppy, dislikes harddisk).

> Is there any kernel which can run ?
2.4.7 ran fine from floppy (rather stable, I had uptimes > 14 days),
2.4.14 seems to work semi-fine from floppy. Other kernels may or may not
boot from floppy, most oops before the bootscripts finish running. No
kernel likes to be booted from harddisk at the moment.

> Is it correctly hardware trouble or kernel trouble ?
Good question, since there is _no_ difference between the kernels that
boot from harddisk and those that boot from floppy (at least I did not
change anything). So what is different?
* Bootlocation/Bootsector/MBR
I doubt it, since linux booted fine off the very same harddisk before I
exchanged board and CPU, and neither lilo nor the kernel complains when
writing a new setup to the harddisk's MBR.

* Speed
The kernel will be loaded much slower from floppy than from harddisk.

* NOT Mainboard, CPU, RAM
Don't change when I boot from floppy ;). Maybe memory is used in a
different way when the kernel is loaded from floppy?
I severely doubt there is a problem with the RAM: I already ran
memtest86 for several hours and put the speed down to 100MHz in the
BIOS, and it is the same RAM I used in my previous hardware
configuration, where linux booted fine from harddisk.

* Interrupts Triggered
Maybe I should try writing a kernel to a bootable CDROM and try to
boot from that (my CDROM drive is attached to a SCSI adapter, so if
this causes the kernel to oops as well, it becomes less likely that
the IDE controller's interrupt requests are the cause).

* Lilo version
I doubt this is the cause either, since the kernel has already been
loaded and is executing.

I can't think of anything else that may be different between floppy and
harddisk boot. I guess it may be more of a kernel problem now than
hardware problem, since the KT133A patch made it into the kernel and
I've tested the other KT133A patch with no different results as well.
Although with the x86 architecture, who can be sure? :)

Regs,
Sven

--
Sven Riedel [email protected]
Osteroeder Str. 6 / App. 13 [email protected]
38678 Clausthal "Call me bored, but don't call me boring."
- Larry Wall
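
The statement above that the floppy and harddisk kernels do not differ
could be checked rather than assumed, since a copy can be silently
damaged on its way to disk. A minimal Python sketch that compares two
kernel image files by SHA-256 digest follows. The function names are
hypothetical, and the image paths would have to be filled in for the
actual system.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 16):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_image(path_a, path_b):
    """True if both files have identical contents (by SHA-256)."""
    return file_sha256(path_a) == file_sha256(path_b)
```

Calling same_image() on the kernel file copied to the floppy and the one
lilo loads from the harddisk would show whether the two really are
byte-for-byte identical. A matching digest does not rule out faults that
occur later, while the image is loaded into memory.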

2001-11-24 17:54:31

by Sven.Riedel

Subject: Re: 2.4.14 Oops during boot (KT133A Problem?)

Well, the problem got solved (although not in a way I'd consider
satisfactory). After my machine started randomly segfaulting the day
before yesterday, I memtest86'ed it again (the last check was a mere two
months ago), and lo - all three RAM chips were broken. Unfortunately, I
discovered this only after the broken RAM caused my /usr partition to go
fubar, so I spent yesterday on a nice little reinstall.
After the reinstall, 2.4.14 booted fine off the harddisk. No more
oopses.
As to the cause of the problem: I think I can rule out the possibility
of a bad kernel having been compiled due to the bad RAM, as I booted once
well below the problem zones with mem=32m, recompiled a kernel in that
setup, and tried to boot it - same symptoms.
Maybe lilo was broken or didn't like the MBR it was written to, or
something along those lines.
Thanks to all who tried to help me!

Regs,
Sven
--
Sven Riedel [email protected]
Osteroeder Str. 6 / App. 13 [email protected]
38678 Clausthal "Call me bored, but don't call me boring."
- Larry Wall

2001-11-26 00:40:57

by nakai

Subject: Re: 2.4.14 Oops during boot (KT133A Problem?)

Congratulations!

[email protected] wrote:
> Well, the problem got solved (although not in a way I'd consider
> satisfactory). After my machine started random segfaulting the day
> before yesterday, I memcheck86'ed it again (the last check is a mere two
> months ago), and lo - all three RAM chips were broken. Unfortunately, I
> discovered this, after this broken RAM caused my /usr partition to go
> fubar, resulting in me spending yesterday with a nice little reinstall.
> After the reinstall, 2.4.14 booted fine off the harddisk. No more
> oopses.

During the holidays I kept guessing, and I thought it was because of a
harddisk or PCI chip error. A memory error! Was it found by the BIOS
when the motherboard booted? I think the motherboard always checks
memory when booting.

--
-=-=-=-= SHINKO ELECTRIC INDUSTRIES CO., LTD. =-=-=-=-
=-=-=-=- Core Technology Research & Laboratory, -=-=-=-=
-=-=-=-= Information Technology Research Dept. =-=-=-=-
=-=-=-=- Name:Hisakazu Nakai TEL:026-283-2866 -=-=-=-=
-=-=-=-= Mail:[email protected] FAX:026-283-2820 =-=-=-=-

2001-11-28 10:51:00

by PVotruba

Subject: RE: 2.4.14 Oops during boot (KT133A Problem?)

The memory check in the BIOS is designed mostly for testing whether
(detected memory chips == present memory chips).
Safe, comprehensive checking can be done with memtest86, and a virtually
unsafe test is cyclic kernel compilation (if you notice strange
compile errors with the same kernel config but at different moments, you
can suspect that your hardware is rotten).
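
The cyclic-compilation idea can be sketched roughly as follows: run the
same build several times and flag the machine as suspect when identical
runs do not all end the same way. This is a minimal Python sketch under
assumptions. The helper names are invented, the build command is left to
the caller, and only exit statuses are compared, not the build output.

```python
import subprocess


def repeat_build(cmd, runs=5):
    """Run the same build command `runs` times; return the exit statuses."""
    statuses = []
    for _ in range(runs):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        statuses.append(proc.returncode)
    return statuses


def looks_flaky(statuses):
    """True if identical runs did not all finish with the same status."""
    return len(set(statuses)) > 1
```

For the check to mean anything, cmd has to perform a full rebuild on each
run with an unchanged configuration. A build that fails the same way
every time points at the source or config, not at the hardware.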

To: Sven
Are you sure that ALL your memory chips are bad? How about memory
latency and other BIOS settings, overclocking, etc.? Try to set up your box
to run at the slowest memory configuration possible. It's hard to believe
that all of your chips are gone.

Regards
Petr
