2001-07-29 18:34:20

by James A. Treacy

Subject: PROBLEM: Random (hard) lockups

At random times this brand-new machine locks up hard (so there is nothing
to report from SysRq). This happens with all recent 2.4.x kernels, as well
as 2.2.17 and 2.2.19. Some kernels seem to be worse than others: I can
often work for 30 minutes or more on 2.4.6 without a lockup, while 2.4.7
generally locks up within a few minutes. The only way I have been able to
force a lockup reliably is to use lots of memory (the machine doesn't have
to start swapping) and push the machine hard (say, with a kernel compile).
Compiling a kernel on an otherwise unloaded machine will not cause a lockup.
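For anyone trying to reproduce this, the "lots of memory plus heavy load" recipe can be approximated by a small program that allocates a large buffer and writes to every page, so the kernel has to back it with real RAM, and then holds it while a kernel compile runs. This is a hypothetical sketch, not something posted in this thread: the name `touch_memory` and the 4096-byte page size (correct for i386) are assumptions.

```c
#include <stddef.h>
#include <stdlib.h>

#define PAGE_BYTES 4096u  /* assumption: i386 page size */

/* Hypothetical helper: allocate `mib` MiB and write one byte into every page,
 * forcing the kernel to back the whole allocation with physical memory.
 * Returns the number of pages touched (0 on failure) and hands the buffer
 * back through *out so the caller can hold it, then free() it later. */
static size_t touch_memory(size_t mib, unsigned char **out)
{
    size_t bytes = mib * 1024u * 1024u;
    size_t pages = 0;
    unsigned char *buf;

    *out = NULL;
    if (bytes == 0)
        return 0;
    buf = malloc(bytes);
    if (buf == NULL)
        return 0;
    for (size_t off = 0; off < bytes; off += PAGE_BYTES) {
        buf[off] = (unsigned char)pages;  /* a single write per page faults it in */
        pages++;
    }
    *out = buf;
    return pages;
}
```

Calling something like `touch_memory(200, &buf)` on a 256M machine and then sleeping while a kernel build runs in another shell would roughly mimic the reported trigger; the 200 MiB figure is an arbitrary example, not a measured threshold.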

The memory has been tested by running memtest86 overnight.
The machine runs cpuburn (burnK7) without a problem.

The machine is a 1GHz Athlon (266) on an MSI K7T Turbo with 256M ram,
a GeForce2 MX-400 and a Maxtor 5T020H2 (Diamond MaxPlus 20G).
The only other devices are a DVD drive and 2 ethernet cards (tulip and
8139too), although I temporarily have an old QUANTUM SIROCCO1700A hard
drive in the machine.

As I'd like to make this my main machine, I am willing to put some
time into tracking down the problem, but need pointers on where to
begin.


bash$ sh scripts/ver_linux
Linux horta 2.4.6 #5 Mon Jul 23 16:47:46 EDT 2001 i686 unknown

Gnu C 2.95.4
Gnu make 3.79.1
binutils 2.11.90.0.24
util-linux 2.11g
mount 2.11g
modutils 2.4.6
e2fsprogs 1.22
Linux C Library 2.2.3
Dynamic linker (ldd) 2.2.3
Procps 2.0.7
Net-tools 1.60
Console-tools 0.2.3
Sh-utils 2.0.11
Modules Loaded parport_pc lp parport ppp_deflate bsd_comp ipt_MASQUERADE ppp_async ip_nat_ftp iptable_nat ip_conntrack iptable_filter ip_tables ppp_generic slhc

bash$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 4
model name : AMD Athlon(tm) Processor
stepping : 2
cpu MHz : 996.041
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr syscall mmxext 3dnowext 3dnow
bogomips : 1985.74

bash$ cat /proc/modules
parport_pc 13616 1 (autoclean)
lp 5328 0 (autoclean)
parport 13952 1 (autoclean) [parport_pc lp]
ppp_deflate 39648 0 (autoclean)
bsd_comp 4256 0 (autoclean)
ipt_MASQUERADE 1520 1
ppp_async 6480 1 (autoclean)
ip_nat_ftp 3360 0 (unused)
iptable_nat 15376 1 [ipt_MASQUERADE ip_nat_ftp]
ip_conntrack 14672 1 [ipt_MASQUERADE ip_nat_ftp iptable_nat]
iptable_filter 2080 0 (unused)
ip_tables 10656 5 [ipt_MASQUERADE iptable_nat iptable_filter]
ppp_generic 14336 3 (autoclean) [ppp_deflate bsd_comp ppp_async]
slhc 4832 1 (autoclean) [ppp_generic]

bash$ cat /proc/ioports
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
01f0-01f7 : ide0
02f8-02ff : serial(set)
0376-0376 : ide1
0378-037a : parport0
03c0-03df : vga+
03f6-03f6 : ide0
03f8-03ff : serial(set)
0cf8-0cff : PCI conf1
5000-500f : VIA Technologies, Inc. VT82C686 [Apollo Super ACPI]
6000-607f : VIA Technologies, Inc. VT82C686 [Apollo Super ACPI]
d000-d00f : VIA Technologies, Inc. Bus Master IDE
  d000-d007 : ide0
  d008-d00f : ide1
d400-d41f : VIA Technologies, Inc. UHCI USB
d800-d81f : VIA Technologies, Inc. UHCI USB (#2)
dc00-dcff : Macronix, Inc. [MXIC] MX987x5
  dc00-dcff : tulip
e000-e0ff : Realtek Semiconductor Co., Ltd. RTL-8139
  e000-e0ff : 8139too

bash$ cat /proc/iomem
00000000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000f0000-000fffff : System ROM
00100000-0fffffff : System RAM
  00100000-001c34c9 : Kernel code
  001c34ca-00206a1f : Kernel data
d0000000-d7ffffff : PCI Bus #01
  d0000000-d7ffffff : nVidia Corporation NV11
d8000000-dbffffff : VIA Technologies, Inc. VT8363/8365 [KT133/KM133]
dc000000-ddffffff : PCI Bus #01
  dc000000-dcffffff : nVidia Corporation NV11
df000000-df0000ff : Macronix, Inc. [MXIC] MX987x5
  df000000-df0000ff : tulip
df001000-df0010ff : Realtek Semiconductor Co., Ltd. RTL-8139
  df001000-df0010ff : 8139too
ffff0000-ffffffff : reserved

bash$ lspci -vvv
00:00.0 Host bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133] (rev 03)
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR+
	Latency: 8
	Region 0: Memory at d8000000 (32-bit, prefetchable) [size=64M]
	Capabilities: [a0] AGP version 2.0
		Status: RQ=31 SBA+ 64bit- FW- Rate=x1,x2
		Command: RQ=0 SBA- AGP- 64bit- FW- Rate=<none>
	Capabilities: [c0] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:01.0 PCI bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133 AGP] (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
	Latency: 0
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: 0000f000-00000fff
	Memory behind bridge: dc000000-ddffffff
	Prefetchable memory behind bridge: d0000000-d7ffffff
	BridgeCtl: Parity- SERR- NoISA+ VGA+ MAbort- >Reset- FastB2B-
	Capabilities: [80] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
	Subsystem: VIA Technologies, Inc. VT82C686/A PCI to ISA Bridge
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0
	Capabilities: [c0] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06) (prog-if 8a [Master SecP PriP])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32
	Region 4: I/O ports at d000 [size=16]
	Capabilities: [c0] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:07.2 USB Controller: VIA Technologies, Inc. UHCI USB (rev 16) (prog-if 00 [UHCI])
	Subsystem: Unknown device 0925:1234
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32, cache line size 08
	Interrupt: pin D routed to IRQ 5
	Region 4: I/O ports at d400 [size=32]
	Capabilities: [80] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:07.3 USB Controller: VIA Technologies, Inc. UHCI USB (rev 16) (prog-if 00 [UHCI])
	Subsystem: Unknown device 0925:1234
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32, cache line size 08
	Interrupt: pin D routed to IRQ 5
	Region 4: I/O ports at d800 [size=32]
	Capabilities: [80] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:07.4 Host bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Capabilities: [68] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:09.0 Ethernet controller: Macronix, Inc. [MXIC] MX987x5 (rev 25)
	Subsystem: Unknown device 2078:0540
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (2000ns min, 14000ns max), cache line size 08
	Interrupt: pin A routed to IRQ 10
	Region 0: I/O ports at dc00 [size=256]
	Region 1: Memory at df000000 (32-bit, non-prefetchable) [size=256]
	Expansion ROM at <unassigned> [disabled] [size=256K]
	Capabilities: [44] Power Management version 1
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
		Status: D0 PME-Enable+ DSel=0 DScale=0 PME-

00:0c.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139 (rev 10)
	Subsystem: Realtek Semiconductor Co., Ltd. RT8139
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (8000ns min, 16000ns max)
	Interrupt: pin A routed to IRQ 11
	Region 0: I/O ports at e000 [size=256]
	Region 1: Memory at df001000 (32-bit, non-prefetchable) [size=256]
	Expansion ROM at <unassigned> [disabled] [size=64K]
	Capabilities: [50] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
		Status: D0 PME-Enable+ DSel=0 DScale=0 PME-

01:00.0 VGA compatible controller: nVidia Corporation NV11 (rev b2) (prog-if 00 [VGA])
	Subsystem: Asustek Computer, Inc.: Unknown device 4031
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (1250ns min, 250ns max)
	Interrupt: pin A routed to IRQ 5
	Region 0: Memory at dc000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: Memory at d0000000 (32-bit, prefetchable) [size=128M]
	Expansion ROM at <unassigned> [disabled] [size=64K]
	Capabilities: [60] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [44] AGP version 2.0
		Status: RQ=31 SBA- 64bit- FW+ Rate=x1,x2
		Command: RQ=0 SBA- AGP- 64bit- FW- Rate=<none>

--
James (Jay) Treacy
[email protected]


2001-07-29 18:48:21

by Alan

Subject: Re: PROBLEM: Random (hard) lockups

> At random times this brand new machine locks up hard (so nothing to
> report from Sys Rq). This happens with all recent 2.4.x kernels, 2.2.17
> and 2.2.19. Some kernels seem to be worse than others as I can often work
>
> As I'd like to make this my main machine, I am willing to put some
> time into tracking down the problem, but need pointers on where to
> begin.

Given you see the same behaviour in very stable 2.2 kernels, I'd say begin
at the hardware.

2001-07-30 20:23:37

by Kurt Garloff

Subject: Re: PROBLEM: Random (hard) lockups

On Sun, Jul 29, 2001 at 02:34:01PM -0400, James A. Treacy wrote:
> The machine is a 1GHz Athlon (266) on an MSI K7T Turbo with 256M ram,

A 1.2GHz Athlon with the very same motherboard and the same amount of RAM
seems to be stable with 2.4.7 built with PPro or K6 optimizations, but crashes
during the init procedure if the kernel is optimized for K7.

It seems that the board is sensitive to high-memory-bandwidth operations.
This may be due to bad electrical design of the board or the chipset, or to
bad chipset settings (see the thread "VIA KT133A / athlon / MMX").

It may be a good idea to play with BIOS settings or slow the machine down a
bit.

Regards,
--
Kurt Garloff <[email protected]>            Eindhoven, NL
GPG key: See mail header, key servers       Linux kernel development
SuSE GmbH, Nuernberg, DE                    SCSI, Security


2001-07-30 20:47:27

by Dan Hollis

Subject: Re: PROBLEM: Random (hard) lockups

On Mon, 30 Jul 2001, Kurt Garloff wrote:
> On Sun, Jul 29, 2001 at 02:34:01PM -0400, James A. Treacy wrote:
> > The machine is a 1GHz Athlon (266) on an MSI K7T Turbo with 256M ram,
> A 1.2GHz Athlon with the very same motherboard and the same amount of RAM
> seems to be stable with 2.4.7 and PPro or K6 optimizations and crashes
> during the init procedure if the kernel is optimized for K7.

Perhaps someone can make a test case .c program which uses K7
optimizations to smash memory? It would be nice to be able to pin this
down. Obviously, the standard memory testers aren't catching it.
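One possible starting point for such a test case, offered as a portable sketch rather than a real K7 reproducer: fill a large buffer with alternating bit patterns, copy it, and verify the copy. In K7-optimized 2.4 kernels the page copy and clear routines in arch/i386/lib/mmx.c use MMX non-temporal stores; here plain `memcpy` only stands in for them, and the function name `copy_verify` is hypothetical. Swapping the `memcpy` call for an MMX/3DNow! copy loop would be the obvious next step.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical copy-and-verify stress test. Each pass fills `src` with one of
 * two complementary patterns (XORed with the word index, so misplaced words
 * are caught too), copies it with memcpy, and re-reads the copy.
 * Returns the number of mismatched words, or SIZE_MAX if allocation fails. */
static size_t copy_verify(size_t words, unsigned passes)
{
    uint64_t *src = malloc(words * sizeof *src);
    uint64_t *dst = malloc(words * sizeof *dst);
    size_t bad = 0;

    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return SIZE_MAX;
    }
    for (unsigned p = 0; p < passes; p++) {
        uint64_t pat = (p & 1u) ? 0x5555555555555555ULL : 0xAAAAAAAAAAAAAAAAULL;
        /* Volatile reads keep the compiler from optimizing the check away. */
        const volatile uint64_t *check = dst;

        for (size_t i = 0; i < words; i++)
            src[i] = pat ^ (uint64_t)i;
        memcpy(dst, src, words * sizeof *src);  /* stand-in for the K7 copy loop */
        for (size_t i = 0; i < words; i++)
            if (check[i] != (pat ^ (uint64_t)i))
                bad++;
    }
    free(src);
    free(dst);
    return bad;
}
```

On healthy hardware this should always return 0; running many passes over a buffer close to the size of RAM, alongside other load, is where a marginal board would be expected to show a nonzero count.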

Is this only happening on DDR systems?

-Dan

--
[-] Omae no subete no kichi wa ore no mono da. [-]

2001-07-30 21:18:07

by Manuel A. McLure

Subject: RE: PROBLEM: Random (hard) lockups

Dan Hollis wrote:
> On Mon, 30 Jul 2001, Kurt Garloff wrote:
> > On Sun, Jul 29, 2001 at 02:34:01PM -0400, James A. Treacy wrote:
> > > The machine is a 1GHz Athlon (266) on an MSI K7T Turbo with 256M ram,
> > A 1.2GHz Athlon with the very same motherboard and the same amount of RAM
> > seems to be stable with 2.4.7 and PPro or K6 optimizations and crashes
> > during the init procedure if the kernel is optimized for K7.
>
> Perhaps someone can make a test case .c program which uses K7
> optimizations to smash memory? It would be nice to be able to pin this
> down. Obviously, the standard memory testers aren't catching it.
>
> Is this only happening on DDR systems?
>
> -Dan


I am seeing something very similar on my K7T Turbo/Athlon 900/256M PC133
SDRAM (note to Dan, the K7T Turbo is an SDRAM mobo, not DDR) - 2.4.6 worked
fine with no hangs/oopses but 2.4.7 will suddenly hang after an uptime of a
day or two. Sysrq-b will reboot, but Sysrq-s and Sysrq-u seem to be
ineffective, and the machine won't respond to pings. Of course, I'm always
in X when this happens so I don't see any oops information (whatever
happened to the "write oops to floppy" patches?). Both 2.4.6 and 2.4.7 are
compiled with K7 optimizations turned on.

--
Manuel A. McLure - Unify Corp. Technical Support <[email protected]>
Zathras is used to being beast of burden to other people's needs. Very sad
life. Probably have very sad death, but at least there is symmetry.

2001-07-30 21:32:37

by Dan Hollis

Subject: RE: PROBLEM: Random (hard) lockups

On Mon, 30 Jul 2001, Manuel A. McLure wrote:
> I am seeing something very similar on my K7T Turbo/Athlon 900/256M PC133
> SDRAM (note to Dan, the K7T Turbo is an SDRAM mobo, not DDR)

Hmm. I have an MSI pro2-A which works fine with K7 optimizations...
tbird900, 256mb sdram...

May very well be a hardware design flaw. The pro2-a appears very carefully
engineered to keep most traces short and they went through a lot of effort
to keep traces the same length.

The only stability problems I've had were overloading the 300W PS with a
geforce2, all 6 PCI slots filled, 3 HD, 1 CDRW, and one DVDrom. Removing
one of the drives (CDRW or DVDrom) lowers the power consumption back into
the realm of 100% stability.

Next on my shopping list is a 450W PS :-)

Has anyone tried swapping buffered/unbuffered DIMMs to see if it made a
difference?

-Dan

--
[-] Omae no subete no kichi wa ore no mono da. [-]

2001-07-30 21:38:59

by Kurt Garloff

Subject: Re: PROBLEM: Random (hard) lockups

On Mon, Jul 30, 2001 at 01:46:49PM -0700, Dan Hollis wrote:
> Perhaps someone can make a test case .c program which uses K7
> optimizations to smash memory? It would be nice to be able to pin this
> down. Obviously, the standard memory testers aren't catching it.

Well, I posted a test program to LKML, but that one failed to show any
errors. Maybe we have to provoke certain access patterns of (physical)
addresses to trigger the bugs?
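Kurt's idea of provoking particular address patterns could be explored with an "own address" test: each word stores its own address, and the writes are issued in strided order so consecutive accesses land far apart. This is a hypothetical sketch, not Kurt's program, and `strided_own_address` is an illustrative name. Note that a user-space test only sees virtual addresses, so which physical pages and DRAM banks are actually hit is up to the kernel's page allocator.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical "own address" test: every 64-bit word stores its own
 * (virtual) address. Writes go in strided order (word 0, stride, 2*stride,
 * ..., then 1, 1+stride, ...), so each word is written exactly once but
 * consecutive writes land far apart. A linear read-back then checks every
 * word. Returns the number of mismatches, or SIZE_MAX on bad arguments or
 * allocation failure. */
static size_t strided_own_address(size_t words, size_t stride)
{
    uint64_t *buf;
    const volatile uint64_t *check;  /* volatile: force real reads */
    size_t bad = 0;

    if (words == 0 || stride == 0)
        return SIZE_MAX;
    buf = malloc(words * sizeof *buf);
    if (buf == NULL)
        return SIZE_MAX;
    check = buf;

    for (size_t start = 0; start < stride && start < words; start++)
        for (size_t i = start; i < words; i += stride)
            buf[i] = (uint64_t)(uintptr_t)&buf[i];

    for (size_t i = 0; i < words; i++)
        if (check[i] != (uint64_t)(uintptr_t)&buf[i])
            bad++;

    free(buf);
    return bad;
}
```

Sweeping `stride` over powers of two (512 words is one 4 KiB page) varies how the accesses spread across pages; which strides, if any, matter on these boards is unknown.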

Regards,
--
Kurt Garloff <[email protected]>            Eindhoven, NL
GPG key: See mail header, key servers       Linux kernel development
SuSE GmbH, Nuernberg, DE                    SCSI, Security


2001-07-31 00:50:20

by Harold Oga

Subject: Re: PROBLEM: Random (hard) lockups

On Mon, Jul 30, 2001 at 02:31:49PM -0700, Dan Hollis wrote:
>On Mon, 30 Jul 2001, Manuel A. McLure wrote:
>> I am seeing something very similar on my K7T Turbo/Athlon 900/256M PC133
>> SDRAM (note to Dan, the K7T Turbo is an SDRAM mobo, not DDR)
>
>Hmm. I have an MSI pro2-A which works fine with K7 optimizations...
>tbird900, 256mb sdram...
>
>May very well be a hardware design flaw. The pro2-a appears very carefully
>engineered to keep most traces short and they went through a lot of effort
>to keep traces the same length.
>
>The only stability problems I've had were overloading the 300W PS with a
>geforce2, all 6 PCI slots filled, 3 HD, 1 CDRW, and one DVDrom. Removing
>one of the drives (CDRW or DVDrom) lowers the power consumption back into
>the realm of 100% stability.
>
>Next on my shopping list is a 450W PS :-)
>
>Has anyone tried swapping buffered/unbuffered DIMMs to see if it made a
>difference?
>
>-Dan

Hi,
I also have a pro2a with no problems. However, the pro2a has a KT133,
and all the problems I've seen reported on the list seem to be with the
KT133A or KT266. Maybe it's only a problem with chipsets that support a
266 MHz FSB. Just tossing out ideas here.
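The FSB observation can be put into numbers. The Athlon's EV6 front-side bus is 64 bits (8 bytes) wide and transfers data on both clock edges, so its theoretical peak is clock x 2 x 8 bytes. The helper name `peak_mb_per_s` is hypothetical, and the results are theoretical peaks, not measurements.

```c
/* Peak theoretical transfer rate of a parallel bus, in MB/s (10^6 bytes/s):
 * clock in MHz x data transfers per clock x bus width in bytes.
 * For the EV6 bus: 8 bytes wide, 2 transfers per clock (both clock edges). */
static double peak_mb_per_s(double clock_mhz, unsigned transfers_per_clock,
                            unsigned width_bytes)
{
    return clock_mhz * transfers_per_clock * width_bytes;
}
```

`peak_mb_per_s(100.0, 2, 8)` gives 1600 MB/s for the 200 MHz FSB of the KT133, while `peak_mb_per_s(133.33, 2, 8)` gives about 2133 MB/s for the 266 MHz FSB of the KT133A/KT266. For comparison, PC133 SDRAM on a 64-bit memory bus peaks at about 1067 MB/s (`peak_mb_per_s(133.33, 1, 8)`).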

-Harold
--
"Life sucks, deal with it!"