2002-06-30 18:53:23

by Ken Witherow

[permalink] [raw]
Subject: Tyan 2460/Dual Athlon MP hangs

[1.] One line summary of the problem:
Random hangs occur on dual athlon system

[2.] Full description of the problem/report:
Sometimes the system hangs during boot, sometimes at the console and
other times while I'm in X. Happens whether or not I use the mem=nopentium
and noapic options. Frequently, I can't catch the OOPS because I'm in X
and everything freezes.

[3.] Keywords (i.e., modules, networking, kernel):
[4.] Kernel version (from /proc/version):
Linux version 2.4.19-rc1 ([email protected]) (gcc version 3.1) #18 SMP
Sat Jun 29 13:50:03 EDT 2002

[5.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/oops-tracing.txt)
here are the two most recent OOPS I've caught. The first was when the
system was recovering ext3 journals and the second after about 45 minutes
of normal system use. Happens regardless of NVidia's driver being loaded

invalid operand: 0000
CPU: 1
EIP: 0010:[<c033abf0>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: c033abe4 ebx: 0000002d ecx: c033abf0 edx: c003abf0
esi: c1597ec4 edi: 00000040 ebp: 00000001 esp: c1597ebc
ds: 0018 es: 0018 ss: 0018
stack: c0123741 c033abf0 c0339420 c0339520 00000001 c03d0d40 c012731f
c0339520
c0123664 c03d0d54 c0123503 00000001 00000001 c03a9240 fffffffe
00000001
c01232ce c03a9240 00000046 c03a2800 00000000 c0337650 00000000
c010acec
Call Trace: [<c0123741>] [<c012731f>] [<c0123664>] [<c0123503>]
[<c01232ce>]
[<c010acec>] [<c010d318>] [<c01c1940>] [<c01c17e0>]
[<c0106ec2>] [c011e04b>]
[<c011e2b7>]
Code: f0 ab c0 f0 ab 33 c0 01 00 00 00 d8 c8 59 c1 d8 cc 59 c1


>>EIP; c033abf0 <blocked_list+0/8> <=====

>>eax; c033abe4 <lease_break_time+0/4>
>>ecx; c033abf0 <blocked_list+0/8>
>>edx; c003abf0 Before first symbol
>>esi; c1597ec4 <_end+1189610/2047274c>
>>esp; c1597ebc <_end+1189608/2047274c>

Trace; c0123741 <__run_task_queue+61/70>
Trace; c012731f <tqueue_bh+1f/30>
Trace; c0123664 <bh_action+54/80>
Trace; c0123503 <tasklet_hi_action+63/a0>
Trace; c01232ce <do_softirq+ce/d0>
Trace; c010acec <do_IRQ+dc/e0>
Trace; c010d318 <call_do_IRQ+5/d>
Trace; c01c1940 <pr_power_idle+160/310>
Trace; c01c17e0 <pr_power_idle+0/310>
Trace; c0106ec2 <cpu_idle+42/60>
Trace; c011e2b7 <printk+137/180>

Code; c033abf0 <blocked_list+0/8>
00000000 <_EIP>:
Code; c033abf0 <blocked_list+0/8> <=====
0: f0 ab lock stos %eax,%es:(%edi) <=====
Code; c033abf2 <blocked_list+2/8>
2: c0 (bad)
Code; c033abf3 <blocked_list+3/8>
3: f0 ab lock stos %eax,%es:(%edi)
Code; c033abf5 <blocked_list+5/8>
5: 33 c0 xor %eax,%eax
Code; c033abf7 <blocked_list+7/8>
7: 01 00 add %eax,(%eax)
Code; c033abf9 <__compound_literal.0+1/4>
9: 00 00 add %al,(%eax)
Code; c033abfb <__compound_literal.0+3/4>
b: d8 c8 fmul %st(0),%st
Code; c033abfd <dentry_unused+1/8>
d: 59 pop %ecx
Code; c033abfe <dentry_unused+2/8>
e: c1 d8 cc rcr $0xcc,%eax
Code; c033ac01 <dentry_unused+5/8>
11: 59 pop %ecx
Code; c033ac02 <dentry_unused+6/8>
12: c1 00 00 roll $0x0,(%eax)


kernel BUG in header file at line 51
kernel BUG at panic.c: 141!
invalid operand: 0000
CPU: 1
EIP: 0010:[<c011dab7>] Tainted: P
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010282
eax: 00000025 ebx: 00000001 ecx: 00000086 edx: dfb0df7c
esi: d451d980 edi: c1596000 ebp: c1597f6c esp: c1597f48
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, stackpage=c1597000)
Stack: c02df320 00000033 c011b9bc 00000033 00000000 00000018 dadc8000
d4519800
c1596000 c1597fac c011a520 d451d980 d451d980 dadc8000 00000001
dadc803c
dadc8000 d451d980 00000002 00000026 00000001 c03a7a40 c01c17e0
c1596000
Call Trace: [<c011b9bc>] [<c011a520>] [<c01c17e0>] [<c0106ece>]
[<c011e04b>]
[<c011e2b7>]
Code: 0f 0b 8d 00 61 00 2e c0 90 eb fe 90 90 90 90 90 90 90 90 90


>>EIP; c011dab7 <__out_of_line_bug+17/60> <=====

>>edx; dfb0df7c <_end+1f6ff6c8/2047274c>
>>esi; d451d980 <_end+1410f0cc/2047274c>
>>edi; c1596000 <_end+118774c/2047274c>
>>ebp; c1597f6c <_end+11896b8/2047274c>
>>esp; c1597f48 <_end+1189694/2047274c>

Trace; c011b9bc <switch_mm+12c/130>
Trace; c011a520 <schedule+1f0/3e0>
Trace; c01c17e0 <pr_power_idle+0/310>
Trace; c0106ece <cpu_idle+4e/60>
Trace; c011e04b <call_console_drivers+5b/120>
Trace; c011e2b7 <printk+137/180>

Code; c011dab7 <__out_of_line_bug+17/60>
00000000 <_EIP>:
Code; c011dab7 <__out_of_line_bug+17/60> <=====
0: 0f 0b ud2a <=====
Code; c011dab9 <__out_of_line_bug+19/60>
2: 8d 00 lea (%eax),%eax
Code; c011dabb <__out_of_line_bug+1b/60>
4: 61 popa
Code; c011dabc <__out_of_line_bug+1c/60>
5: 00 2e add %ch,(%esi)
Code; c011dabe <__out_of_line_bug+1e/60>
7: c0 90 eb fe 90 90 90 rclb $0x90,0x9090feeb(%eax)
Code; c011dac5 <__out_of_line_bug+25/60>
e: 90 nop
Code; c011dac6 <__out_of_line_bug+26/60>
f: 90 nop
Code; c011dac7 <__out_of_line_bug+27/60>
10: 90 nop
Code; c011dac8 <__out_of_line_bug+28/60>
11: 90 nop
Code; c011dac9 <__out_of_line_bug+29/60>
12: 90 nop
Code; c011daca <__out_of_line_bug+2a/60>
13: 90 nop


[6.] A small shell script or example program which triggers the
problem (if possible)
[7.] Environment
[7.1.] Software (add the output of the ver_linux script here)
Linux death.krwtech.com 2.4.19-rc1 #18 SMP Sat Jun 29 13:50:03 EDT 2002
i686 unknown

Gnu C gcc (GCC) 3.1 Copyright (C) 2002 Free Software
Foundation, Inc. This is free software; see the source for copying
conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE.
Gnu make 3.79
util-linux 2.11r
mount 2.11r
modutils 2.4.16
e2fsprogs 1.22
Linux C Library 2.2.5
Dynamic linker (ldd) 2.2.5
Linux C++ Library 4.0.0
Procps 2.0.7
Net-tools 1.60
Kbd 1.06
Sh-utils 2.0
Modules Loaded NVdriver w83781d i2c-amd756

[7.2.] Processor information (from /proc/cpuinfo):
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 6
model name : AMD Athlon(tm) MP 1800+
stepping : 2
cpu MHz : 1533.393
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips : 3060.53

processor : 1
vendor_id : AuthenticAMD
cpu family : 6
model : 6
model name : AMD Athlon(tm) Processor
stepping : 2
cpu MHz : 1533.393
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips : 3060.53

[7.3.] Module information (from /proc/modules):
NVdriver 989408 10 (autoclean)
w83781d 19392 0
i2c-amd756 3236 0 (unused)

[7.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)
/proc/ioports
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0070-007f : rtc
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
02f8-02ff : serial(auto)
0378-037a : parport0
037b-037f : parport0
03c0-03df : vga+
03f8-03ff : serial(auto)
0cf8-0cff : PCI conf1
1000-10ff : D-Link System Inc RTL8139 Ethernet
1000-10ff : 8139too
1400-143f : Advanced System Products, Inc ABP940-U / ABP960-U
1400-140f : advansys
1440-145f : Creative Labs SB Live! EMU10k1
1440-145f : EMU10K1
1470-1473 : Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] System
Controller
1478-147f : Creative Labs SB Live! MIDI/Game Port
1478-147f : emu10k1-gp
80e0-80ef : amd756-smbus
f000-f00f : Advanced Micro Devices [AMD] AMD-766 [ViperPlus] IDE
f000-f007 : ide0
f008-f00f : ide1

/proc/iomem
00000000-0009f3ff : System RAM
0009f400-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000cb800-000ce7ff : Extension ROM
000dc000-000dcfff : Advanced Micro Devices [AMD] AMD-766 [ViperPlus] USB
000e0000-000effff : Extension ROM
000f0000-000fffff : System ROM
00100000-1ffeffff : System RAM
00100000-002cd878 : Kernel code
002cd879-00359dbf : Kernel data
1fff0000-1ffffbff : ACPI Tables
1ffffc00-1fffffff : ACPI Non-volatile Storage
e8001000-e80010ff : D-Link System Inc RTL8139 Ethernet
e8001000-e80010ff : 8139too
e8002000-e8002fff : Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P]
System Controller
e9000000-e9ffffff : PCI Bus #01
e9000000-e9ffffff : nVidia Corporation NV11 (GeForce2 MX)
ec000000-efffffff : Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P]
System Controller
f0000000-f7ffffff : PCI Bus #01
f0000000-f7ffffff : nVidia Corporation NV11 (GeForce2 MX)
fec00000-fec0ffff : reserved
fee00000-fee00fff : reserved
fff80000-ffffffff : reserved


[7.5.] PCI information ('lspci -vvv' as root)
PCI devices found:
Bus 0, device 0, function 0:
Host bridge: Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] System
Controller (rev 17).
Master Capable. Latency=64.
Prefetchable 32 bit memory at 0xec000000 [0xefffffff].
Prefetchable 32 bit memory at 0xe8002000 [0xe8002fff].
I/O at 0x1470 [0x1473].
Bus 0, device 1, function 0:
PCI bridge: Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] AGP
Bridge (rev 0).
Master Capable. Latency=99. Min Gnt=12.
Bus 0, device 7, function 0:
ISA bridge: Advanced Micro Devices [AMD] AMD-766 [ViperPlus] ISA (rev
2).
Bus 0, device 7, function 1:
IDE interface: Advanced Micro Devices [AMD] AMD-766 [ViperPlus] IDE
(rev 1).
Master Capable. Latency=64.
I/O at 0xf000 [0xf00f].
Bus 0, device 7, function 3:
Bridge: Advanced Micro Devices [AMD] AMD-766 [ViperPlus] ACPI (rev 1).
Master Capable. Latency=64.
Bus 0, device 7, function 4:
USB Controller: Advanced Micro Devices [AMD] AMD-766 [ViperPlus] USB
(rev 7).
IRQ 19.
Master Capable. Latency=16. Max Lat=80.
Non-prefetchable 32 bit memory at 0xdc000 [0xdcfff].
Bus 0, device 10, function 0:
SCSI storage controller: Advanced System Products, Inc ABP940-U /
ABP960-U (rev 2).
IRQ 18.
Master Capable. Latency=64. Min Gnt=4.Max Lat=4.
I/O at 0x1400 [0x143f].
Bus 0, device 11, function 0:
Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 10).
IRQ 19.
Master Capable. Latency=64. Min Gnt=2.Max Lat=20.
I/O at 0x1440 [0x145f].
Bus 0, device 11, function 1:
Input device controller: Creative Labs SB Live! MIDI/Game Port (rev
10).
Master Capable. Latency=64.
I/O at 0x1478 [0x147f].
Bus 0, device 13, function 0:
Ethernet controller: D-Link System Inc RTL8139 Ethernet (rev 16).
IRQ 17.
Master Capable. Latency=64. Min Gnt=32.Max Lat=64.
I/O at 0x1000 [0x10ff].
Non-prefetchable 32 bit memory at 0xe8001000 [0xe80010ff].
Bus 1, device 5, function 0:
VGA compatible controller: nVidia Corporation NV11 (GeForce2 MX) (rev
178).
IRQ 18.
Master Capable. Latency=248. Min Gnt=5.Max Lat=1.
Non-prefetchable 32 bit memory at 0xe9000000 [0xe9ffffff].
Prefetchable 32 bit memory at 0xf0000000 [0xf7ffffff].

[7.6.] SCSI information (from /proc/scsi/scsi)
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: IBM Model: DNES-309170 Rev: SAH0
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 01 Lun: 00
Vendor: WDIGTL Model: WD183 ULTRA2 Rev: 1.00
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 02 Lun: 00
Vendor: MATSHITA Model: CD-R CW-7502 Rev: 4.05
Type: CD-ROM ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 05 Lun: 00
Vendor: SEAGATE Model: SX410800N Rev: 7117
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 06 Lun: 00
Vendor: CyberDrv Model: CD-ROM TW240S Rev: 1.20
Type: CD-ROM ANSI SCSI revision: 02


[7.7.] Other information that might be relevant to the problem
(please look in /proc and include all information that you
think to be relevant):
[ken@death linux]$ cat /proc/interrupts
CPU0 CPU1
0: 88598 86200 IO-APIC-edge timer
1: 2798 2740 IO-APIC-edge keyboard
2: 0 0 XT-PIC cascade
8: 0 0 IO-APIC-edge rtc
9: 0 0 IO-APIC-edge acpi
12: 26342 29117 IO-APIC-edge PS/2 Mouse
17: 815 831 IO-APIC-level eth0
18: 58428 64043 IO-APIC-level advansys, nvidia
19: 1235 1182 IO-APIC-level EMU10K1
NMI: 0 0
LOC: 174708 174662
ERR: 0
MIS: 0


[X.] Other notes, patches, fixes, workarounds:
.config attached

--
Ken Witherow <phantoml AT rochester.rr.com> ICQ: 21840670
KRW Technologies http://www.krwtech.com
Linux 2.4.18 - because I'd like to get there today
to remain free, we must remain free of intrusive government



Attachments:
.config (18.57 kB)

2002-06-30 19:18:22

by Jakob Oestergaard

[permalink] [raw]
Subject: Re: Tyan 2460/Dual Athlon MP hangs

On Sun, Jun 30, 2002 at 02:55:18PM -0400, Ken Witherow wrote:
> [1.] One line summary of the problem:
> Random hangs occur on dual athlon system

The 2460 systems (and many other dual athlon systems) are very sensitive
to temperature.

I'm writing this mail from a 2460 with a 62 day uptime and a load that
frequently exceeds 10.

The system I have here has a 12 cm fan (40 ltr/sec) mounted on the front
of the case for better air-intake. This was a hardware hack we applied
after realizing the box would melt itself without some excessive
cooling.

>
> [2.] Full description of the problem/report:
> Sometimes the system hangs during boot, sometimes at the console and
> other times while I'm in X. Happens whether or not I use the mem=nopentium
> and noapic options. Frequently, I can't catch the OOPS because I'm in X
> and everything freezes.

Can it happen during boot, if you let the system cool down for an hour
before powering it on ?

Also be sure that the PSU is big enough.


--
................................................................
: [email protected] : And I see the elder races, :
:.........................: putrid forms of man :
: Jakob ?stergaard : See him rise and claim the earth, :
: OZ9ABN : his downfall is at hand. :
:.........................:............{Konkhra}...............:

2002-06-30 19:31:48

by Ken Witherow

[permalink] [raw]
Subject: Re: Tyan 2460/Dual Athlon MP hangs

On Sun, 30 Jun 2002, Jakob Oestergaard wrote:

> On Sun, Jun 30, 2002 at 02:55:18PM -0400, Ken Witherow wrote:
> > [1.] One line summary of the problem:
> > Random hangs occur on dual athlon system
>
> The 2460 systems (and many other dual athlon systems) are very sensitive
> to temperature.

seems to happen regardless of temperature. In fact, it most frequenly
happens under no load. After doing some compiling, the current
temperatures read CPU0: 131F/55C, CPU1: 136F/58C, MB: 122F/50C. At idle,
they're normally in the 114/45, 120/48, 110/43 range respectively.
Currenly at about 70 minutes of uptime with the longest being about 36
hours.

> > [2.] Full description of the problem/report:
> > Sometimes the system hangs during boot, sometimes at the console and
> > other times while I'm in X. Happens whether or not I use the mem=nopentium
> > and noapic options. Frequently, I can't catch the OOPS because I'm in X
> > and everything freezes.
>
> Can it happen during boot, if you let the system cool down for an hour
> before powering it on ?

I'll have to try letting it cool the next time it happens to see if that
is indeed the cause.

> Also be sure that the PSU is big enough.

PSU is an Antec PP-412X 400 watt.

--
Ken Witherow <phantoml AT rochester.rr.com> ICQ: 21840670
KRW Technologies http://www.krwtech.com
Linux 2.4.18 - because I'd like to get there today
to remain free, we must remain free of intrusive government




2002-06-30 22:04:51

by Steve Lee

[permalink] [raw]
Subject: RE: Tyan 2460/Dual Athlon MP hangs

Ken,
Try a different video card. My Dual Athlon system (A7M266-D)
works like a charm with my GeForce 2 MX-400 but if I swap it with my
GeForce 3 Ti500 the system will completely shutdown at some point
(guaranteed). This happens with the system idle or while under a load.

Steve

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Ken Witherow
Sent: Sunday, June 30, 2002 2:34 PM
To: Jakob Oestergaard
Cc: [email protected]
Subject: Re: Tyan 2460/Dual Athlon MP hangs

On Sun, 30 Jun 2002, Jakob Oestergaard wrote:

> On Sun, Jun 30, 2002 at 02:55:18PM -0400, Ken Witherow wrote:
> > [1.] One line summary of the problem:
> > Random hangs occur on dual athlon system
>
> The 2460 systems (and many other dual athlon systems) are very
sensitive
> to temperature.

seems to happen regardless of temperature. In fact, it most frequenly
happens under no load. After doing some compiling, the current
temperatures read CPU0: 131F/55C, CPU1: 136F/58C, MB: 122F/50C. At idle,
they're normally in the 114/45, 120/48, 110/43 range respectively.
Currenly at about 70 minutes of uptime with the longest being about 36
hours.

> > [2.] Full description of the problem/report:
> > Sometimes the system hangs during boot, sometimes at the console and
> > other times while I'm in X. Happens whether or not I use the
mem=nopentium
> > and noapic options. Frequently, I can't catch the OOPS because I'm
in X
> > and everything freezes.
>
> Can it happen during boot, if you let the system cool down for an hour
> before powering it on ?

I'll have to try letting it cool the next time it happens to see if that
is indeed the cause.

> Also be sure that the PSU is big enough.

PSU is an Antec PP-412X 400 watt.

--
Ken Witherow <phantoml AT rochester.rr.com> ICQ: 21840670
KRW Technologies http://www.krwtech.com
Linux 2.4.18 - because I'd like to get there today
to remain free, we must remain free of intrusive government




2002-06-30 23:11:20

by Philip Wyett

[permalink] [raw]
Subject: Re: Tyan 2460/Dual Athlon MP hangs

Hi,

This issue has all the hallmarks of PSU problems. I had the same issue running
dual Athlon MP's with a GeForce 4 Ti4600 until I switched the 400W PSU that
came with the case too an Enermax 550W beast. The basic Tyan approved PSU for
the 2460 is a M2463 (460W) unit when talking to my main vendor specifically
because of all the problems with lockups and failures to even POST (Power On
Self Test) etc.

Regards

Philip Wyett

On Sunday 30 Jun 2002 11:09 pm, Stephen Lee wrote:
> Ken,
> Try a different video card. My Dual Athlon system (A7M266-D)
> works like a charm with my GeForce 2 MX-400 but if I swap it with my
> GeForce 3 Ti500 the system will completely shutdown at some point
> (guaranteed). This happens with the system idle or while under a load.
>
> Steve
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Ken Witherow
> Sent: Sunday, June 30, 2002 2:34 PM
> To: Jakob Oestergaard
> Cc: [email protected]
> Subject: Re: Tyan 2460/Dual Athlon MP hangs
>
> On Sun, 30 Jun 2002, Jakob Oestergaard wrote:
> > On Sun, Jun 30, 2002 at 02:55:18PM -0400, Ken Witherow wrote:
> > > [1.] One line summary of the problem:
> > > Random hangs occur on dual athlon system
> >
> > The 2460 systems (and many other dual athlon systems) are very
>
> sensitive
>
> > to temperature.
>
> seems to happen regardless of temperature. In fact, it most frequenly
> happens under no load. After doing some compiling, the current
> temperatures read CPU0: 131F/55C, CPU1: 136F/58C, MB: 122F/50C. At idle,
> they're normally in the 114/45, 120/48, 110/43 range respectively.
> Currenly at about 70 minutes of uptime with the longest being about 36
> hours.
>
> > > [2.] Full description of the problem/report:
> > > Sometimes the system hangs during boot, sometimes at the console and
> > > other times while I'm in X. Happens whether or not I use the
>
> mem=nopentium
>
> > > and noapic options. Frequently, I can't catch the OOPS because I'm
>
> in X
>
> > > and everything freezes.
> >
> > Can it happen during boot, if you let the system cool down for an hour
> > before powering it on ?
>
> I'll have to try letting it cool the next time it happens to see if that
> is indeed the cause.
>
> > Also be sure that the PSU is big enough.
>
> PSU is an Antec PP-412X 400 watt.

2002-07-01 03:52:02

by Ken Witherow

[permalink] [raw]
Subject: Re: Tyan 2460/Dual Athlon MP hangs

On Mon, 1 Jul 2002, Philip Wyett wrote:

> Hi,
>
> This issue has all the hallmarks of PSU problems. I had the same issue running
> dual Athlon MP's with a GeForce 4 Ti4600 until I switched the 400W PSU that
> came with the case too an Enermax 550W beast. The basic Tyan approved PSU for
> the 2460 is a M2463 (460W) unit when talking to my main vendor specifically
> because of all the problems with lockups and failures to even POST (Power On
> Self Test) etc.

I have seen it fail to properly post after rebooting and I figured it
might be the ACPI problem I've seen people have before. I'll upgrade the
PSU and see if that doesn't take care of the problem. Thanks to everyone
for the advice :)

--
Ken Witherow <phantoml AT rochester.rr.com> ICQ: 21840670
KRW Technologies http://www.krwtech.com
Linux 2.4.18 - because I'd like to get there today
to remain free, we must remain free of intrusive government


2002-07-01 08:48:42

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Tyan 2460/Dual Athlon MP hangs

Ken Witherow <[email protected]>, <[email protected]> writes:

> On Mon, 1 Jul 2002, Philip Wyett wrote:
>
> > Hi,
> >
> > This issue has all the hallmarks of PSU problems. I had the same issue running
>
> > dual Athlon MP's with a GeForce 4 Ti4600 until I switched the 400W PSU that
> > came with the case too an Enermax 550W beast. The basic Tyan approved PSU for
> > the 2460 is a M2463 (460W) unit when talking to my main vendor specifically
> > because of all the problems with lockups and failures to even POST (Power On
> > Self Test) etc.
>
> I have seen it fail to properly post after rebooting and I figured it
> might be the ACPI problem I've seen people have before. I'll upgrade the
> PSU and see if that doesn't take care of the problem. Thanks to everyone
> for the advice :)

My memory says the 2460 doesn't supply enough voltage/power to both
cpus, so getting it to run stable is a challenge.

I have seen some systems where burn k7 would lock them in under a minute,
due to this.

Eric

2002-07-01 13:34:12

by Luigi Genoni

[permalink] [raw]
Subject: Re: Tyan 2460/Dual Athlon MP hangs


I had similar behaviour because of excessive heat on a DUAL Athlon
2000MP+, same MB; the case had a bad air flow.

This is easy to check, just try with the case open and see if the oops are
repeatable.

Luigi

On Sun, 30 Jun 2002, Ken Witherow wrote:

> Date: Sun, 30 Jun 2002 14:55:18 -0400 (EDT)
> From: Ken Witherow <[email protected]>
> Reply-To: Ken Witherow <[email protected]>
> To: [email protected]
> Subject: Tyan 2460/Dual Athlon MP hangs
>
> [1.] One line summary of the problem:
> Random hangs occur on dual athlon system
>
> [2.] Full description of the problem/report:
> Sometimes the system hangs during boot, sometimes at the console and
> other times while I'm in X. Happens whether or not I use the mem=nopentium
> and noapic options. Frequently, I can't catch the OOPS because I'm in X
> and everything freezes.
>
> [3.] Keywords (i.e., modules, networking, kernel):
> [4.] Kernel version (from /proc/version):
> Linux version 2.4.19-rc1 ([email protected]) (gcc version 3.1) #18 SMP
> Sat Jun 29 13:50:03 EDT 2002
>
> [5.] Output of Oops.. message (if applicable) with symbolic information
> resolved (see Documentation/oops-tracing.txt)
> here are the two most recent OOPS I've caught. The first was when the
> system was recovering ext3 journals and the second after about 45 minutes
> of normal system use. Happens regardless of NVidia's driver being loaded
>
> invalid operand: 0000
> CPU: 1
> EIP: 0010:[<c033abf0>] Not tainted
> Using defaults from ksymoops -t elf32-i386 -a i386
> EFLAGS: 00010286
> eax: c033abe4 ebx: 0000002d ecx: c033abf0 edx: c003abf0
> esi: c1597ec4 edi: 00000040 ebp: 00000001 esp: c1597ebc
> ds: 0018 es: 0018 ss: 0018
> stack: c0123741 c033abf0 c0339420 c0339520 00000001 c03d0d40 c012731f
> c0339520
> c0123664 c03d0d54 c0123503 00000001 00000001 c03a9240 fffffffe
> 00000001
> c01232ce c03a9240 00000046 c03a2800 00000000 c0337650 00000000
> c010acec
> Call Trace: [<c0123741>] [<c012731f>] [<c0123664>] [<c0123503>]
> [<c01232ce>]
> [<c010acec>] [<c010d318>] [<c01c1940>] [<c01c17e0>]
> [<c0106ec2>] [c011e04b>]
> [<c011e2b7>]
> Code: f0 ab c0 f0 ab 33 c0 01 00 00 00 d8 c8 59 c1 d8 cc 59 c1
>
>
> >>EIP; c033abf0 <blocked_list+0/8> <=====
>
> >>eax; c033abe4 <lease_break_time+0/4>
> >>ecx; c033abf0 <blocked_list+0/8>
> >>edx; c003abf0 Before first symbol
> >>esi; c1597ec4 <_end+1189610/2047274c>
> >>esp; c1597ebc <_end+1189608/2047274c>
>
> Trace; c0123741 <__run_task_queue+61/70>
> Trace; c012731f <tqueue_bh+1f/30>
> Trace; c0123664 <bh_action+54/80>
> Trace; c0123503 <tasklet_hi_action+63/a0>
> Trace; c01232ce <do_softirq+ce/d0>
> Trace; c010acec <do_IRQ+dc/e0>
> Trace; c010d318 <call_do_IRQ+5/d>
> Trace; c01c1940 <pr_power_idle+160/310>
> Trace; c01c17e0 <pr_power_idle+0/310>
> Trace; c0106ec2 <cpu_idle+42/60>
> Trace; c011e2b7 <printk+137/180>
>
> Code; c033abf0 <blocked_list+0/8>
> 00000000 <_EIP>:
> Code; c033abf0 <blocked_list+0/8> <=====
> 0: f0 ab lock stos %eax,%es:(%edi) <=====
> Code; c033abf2 <blocked_list+2/8>
> 2: c0 (bad)
> Code; c033abf3 <blocked_list+3/8>
> 3: f0 ab lock stos %eax,%es:(%edi)
> Code; c033abf5 <blocked_list+5/8>
> 5: 33 c0 xor %eax,%eax
> Code; c033abf7 <blocked_list+7/8>
> 7: 01 00 add %eax,(%eax)
> Code; c033abf9 <__compound_literal.0+1/4>
> 9: 00 00 add %al,(%eax)
> Code; c033abfb <__compound_literal.0+3/4>
> b: d8 c8 fmul %st(0),%st
> Code; c033abfd <dentry_unused+1/8>
> d: 59 pop %ecx
> Code; c033abfe <dentry_unused+2/8>
> e: c1 d8 cc rcr $0xcc,%eax
> Code; c033ac01 <dentry_unused+5/8>
> 11: 59 pop %ecx
> Code; c033ac02 <dentry_unused+6/8>
> 12: c1 00 00 roll $0x0,(%eax)
>
>
> kernel BUG in header file at line 51
> kernel BUG at panic.c: 141!
> invalid operand: 0000
> CPU: 1
> EIP: 0010:[<c011dab7>] Tainted: P
> Using defaults from ksymoops -t elf32-i386 -a i386
> EFLAGS: 00010282
> eax: 00000025 ebx: 00000001 ecx: 00000086 edx: dfb0df7c
> esi: d451d980 edi: c1596000 ebp: c1597f6c esp: c1597f48
> ds: 0018 es: 0018 ss: 0018
> Process swapper (pid: 0, stackpage=c1597000)
> Stack: c02df320 00000033 c011b9bc 00000033 00000000 00000018 dadc8000
> d4519800
> c1596000 c1597fac c011a520 d451d980 d451d980 dadc8000 00000001
> dadc803c
> dadc8000 d451d980 00000002 00000026 00000001 c03a7a40 c01c17e0
> c1596000
> Call Trace: [<c011b9bc>] [<c011a520>] [<c01c17e0>] [<c0106ece>]
> [<c011e04b>]
> [<c011e2b7>]
> Code: 0f 0b 8d 00 61 00 2e c0 90 eb fe 90 90 90 90 90 90 90 90 90
>
>
> >>EIP; c011dab7 <__out_of_line_bug+17/60> <=====
>
> >>edx; dfb0df7c <_end+1f6ff6c8/2047274c>
> >>esi; d451d980 <_end+1410f0cc/2047274c>
> >>edi; c1596000 <_end+118774c/2047274c>
> >>ebp; c1597f6c <_end+11896b8/2047274c>
> >>esp; c1597f48 <_end+1189694/2047274c>
>
> Trace; c011b9bc <switch_mm+12c/130>
> Trace; c011a520 <schedule+1f0/3e0>
> Trace; c01c17e0 <pr_power_idle+0/310>
> Trace; c0106ece <cpu_idle+4e/60>
> Trace; c011e04b <call_console_drivers+5b/120>
> Trace; c011e2b7 <printk+137/180>
>
> Code; c011dab7 <__out_of_line_bug+17/60>
> 00000000 <_EIP>:
> Code; c011dab7 <__out_of_line_bug+17/60> <=====
> 0: 0f 0b ud2a <=====
> Code; c011dab9 <__out_of_line_bug+19/60>
> 2: 8d 00 lea (%eax),%eax
> Code; c011dabb <__out_of_line_bug+1b/60>
> 4: 61 popa
> Code; c011dabc <__out_of_line_bug+1c/60>
> 5: 00 2e add %ch,(%esi)
> Code; c011dabe <__out_of_line_bug+1e/60>
> 7: c0 90 eb fe 90 90 90 rclb $0x90,0x9090feeb(%eax)
> Code; c011dac5 <__out_of_line_bug+25/60>
> e: 90 nop
> Code; c011dac6 <__out_of_line_bug+26/60>
> f: 90 nop
> Code; c011dac7 <__out_of_line_bug+27/60>
> 10: 90 nop
> Code; c011dac8 <__out_of_line_bug+28/60>
> 11: 90 nop
> Code; c011dac9 <__out_of_line_bug+29/60>
> 12: 90 nop
> Code; c011daca <__out_of_line_bug+2a/60>
> 13: 90 nop
>
>
> [6.] A small shell script or example program which triggers the
> problem (if possible)
> [7.] Environment
> [7.1.] Software (add the output of the ver_linux script here)
> Linux death.krwtech.com 2.4.19-rc1 #18 SMP Sat Jun 29 13:50:03 EDT 2002
> i686 unknown
>
> Gnu C gcc (GCC) 3.1 Copyright (C) 2002 Free Software
> Foundation, Inc. This is free software; see the source for copying
> conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS
> FOR A PARTICULAR PURPOSE.
> Gnu make 3.79
> util-linux 2.11r
> mount 2.11r
> modutils 2.4.16
> e2fsprogs 1.22
> Linux C Library 2.2.5
> Dynamic linker (ldd) 2.2.5
> Linux C++ Library 4.0.0
> Procps 2.0.7
> Net-tools 1.60
> Kbd 1.06
> Sh-utils 2.0
> Modules Loaded NVdriver w83781d i2c-amd756
>
> [7.2.] Processor information (from /proc/cpuinfo):
> processor : 0
> vendor_id : AuthenticAMD
> cpu family : 6
> model : 6
> model name : AMD Athlon(tm) MP 1800+
> stepping : 2
> cpu MHz : 1533.393
> cache size : 256 KB
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 1
> wp : yes
> flags : fpu vme de tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
> bogomips : 3060.53
>
> processor : 1
> vendor_id : AuthenticAMD
> cpu family : 6
> model : 6
> model name : AMD Athlon(tm) Processor
> stepping : 2
> cpu MHz : 1533.393
> cache size : 256 KB
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 1
> wp : yes
> flags : fpu vme de tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
> bogomips : 3060.53
>
> [7.3.] Module information (from /proc/modules):
> NVdriver 989408 10 (autoclean)
> w83781d 19392 0
> i2c-amd756 3236 0 (unused)
>
> [7.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)
> /proc/ioports
> 0000-001f : dma1
> 0020-003f : pic1
> 0040-005f : timer
> 0060-006f : keyboard
> 0070-007f : rtc
> 0080-008f : dma page reg
> 00a0-00bf : pic2
> 00c0-00df : dma2
> 00f0-00ff : fpu
> 02f8-02ff : serial(auto)
> 0378-037a : parport0
> 037b-037f : parport0
> 03c0-03df : vga+
> 03f8-03ff : serial(auto)
> 0cf8-0cff : PCI conf1
> 1000-10ff : D-Link System Inc RTL8139 Ethernet
> 1000-10ff : 8139too
> 1400-143f : Advanced System Products, Inc ABP940-U / ABP960-U
> 1400-140f : advansys
> 1440-145f : Creative Labs SB Live! EMU10k1
> 1440-145f : EMU10K1
> 1470-1473 : Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] System
> Controller
> 1478-147f : Creative Labs SB Live! MIDI/Game Port
> 1478-147f : emu10k1-gp
> 80e0-80ef : amd756-smbus
> f000-f00f : Advanced Micro Devices [AMD] AMD-766 [ViperPlus] IDE
> f000-f007 : ide0
> f008-f00f : ide1
>
> /proc/iomem
> 00000000-0009f3ff : System RAM
> 0009f400-0009ffff : reserved
> 000a0000-000bffff : Video RAM area
> 000c0000-000c7fff : Video ROM
> 000cb800-000ce7ff : Extension ROM
> 000dc000-000dcfff : Advanced Micro Devices [AMD] AMD-766 [ViperPlus] USB
> 000e0000-000effff : Extension ROM
> 000f0000-000fffff : System ROM
> 00100000-1ffeffff : System RAM
> 00100000-002cd878 : Kernel code
> 002cd879-00359dbf : Kernel data
> 1fff0000-1ffffbff : ACPI Tables
> 1ffffc00-1fffffff : ACPI Non-volatile Storage
> e8001000-e80010ff : D-Link System Inc RTL8139 Ethernet
> e8001000-e80010ff : 8139too
> e8002000-e8002fff : Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P]
> System Controller
> e9000000-e9ffffff : PCI Bus #01
> e9000000-e9ffffff : nVidia Corporation NV11 (GeForce2 MX)
> ec000000-efffffff : Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P]
> System Controller
> f0000000-f7ffffff : PCI Bus #01
> f0000000-f7ffffff : nVidia Corporation NV11 (GeForce2 MX)
> fec00000-fec0ffff : reserved
> fee00000-fee00fff : reserved
> fff80000-ffffffff : reserved
>
>
> [7.5.] PCI information ('lspci -vvv' as root)
> PCI devices found:
> Bus 0, device 0, function 0:
> Host bridge: Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] System
> Controller (rev 17).
> Master Capable. Latency=64.
> Prefetchable 32 bit memory at 0xec000000 [0xefffffff].
> Prefetchable 32 bit memory at 0xe8002000 [0xe8002fff].
> I/O at 0x1470 [0x1473].
> Bus 0, device 1, function 0:
> PCI bridge: Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] AGP
> Bridge (rev 0).
> Master Capable. Latency=99. Min Gnt=12.
> Bus 0, device 7, function 0:
> ISA bridge: Advanced Micro Devices [AMD] AMD-766 [ViperPlus] ISA (rev
> 2).
> Bus 0, device 7, function 1:
> IDE interface: Advanced Micro Devices [AMD] AMD-766 [ViperPlus] IDE
> (rev 1).
> Master Capable. Latency=64.
> I/O at 0xf000 [0xf00f].
> Bus 0, device 7, function 3:
> Bridge: Advanced Micro Devices [AMD] AMD-766 [ViperPlus] ACPI (rev 1).
> Master Capable. Latency=64.
> Bus 0, device 7, function 4:
> USB Controller: Advanced Micro Devices [AMD] AMD-766 [ViperPlus] USB
> (rev 7).
> IRQ 19.
> Master Capable. Latency=16. Max Lat=80.
> Non-prefetchable 32 bit memory at 0xdc000 [0xdcfff].
> Bus 0, device 10, function 0:
> SCSI storage controller: Advanced System Products, Inc ABP940-U /
> ABP960-U (rev 2).
> IRQ 18.
> Master Capable. Latency=64. Min Gnt=4.Max Lat=4.
> I/O at 0x1400 [0x143f].
> Bus 0, device 11, function 0:
> Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 10).
> IRQ 19.
> Master Capable. Latency=64. Min Gnt=2.Max Lat=20.
> I/O at 0x1440 [0x145f].
> Bus 0, device 11, function 1:
> Input device controller: Creative Labs SB Live! MIDI/Game Port (rev
> 10).
> Master Capable. Latency=64.
> I/O at 0x1478 [0x147f].
> Bus 0, device 13, function 0:
> Ethernet controller: D-Link System Inc RTL8139 Ethernet (rev 16).
> IRQ 17.
> Master Capable. Latency=64. Min Gnt=32.Max Lat=64.
> I/O at 0x1000 [0x10ff].
> Non-prefetchable 32 bit memory at 0xe8001000 [0xe80010ff].
> Bus 1, device 5, function 0:
> VGA compatible controller: nVidia Corporation NV11 (GeForce2 MX) (rev
> 178).
> IRQ 18.
> Master Capable. Latency=248. Min Gnt=5.Max Lat=1.
> Non-prefetchable 32 bit memory at 0xe9000000 [0xe9ffffff].
> Prefetchable 32 bit memory at 0xf0000000 [0xf7ffffff].
>
> [7.6.] SCSI information (from /proc/scsi/scsi)
> Attached devices:
> Host: scsi0 Channel: 00 Id: 00 Lun: 00
> Vendor: IBM Model: DNES-309170 Rev: SAH0
> Type: Direct-Access ANSI SCSI revision: 03
> Host: scsi0 Channel: 00 Id: 01 Lun: 00
> Vendor: WDIGTL Model: WD183 ULTRA2 Rev: 1.00
> Type: Direct-Access ANSI SCSI revision: 02
> Host: scsi0 Channel: 00 Id: 02 Lun: 00
> Vendor: MATSHITA Model: CD-R CW-7502 Rev: 4.05
> Type: CD-ROM ANSI SCSI revision: 02
> Host: scsi0 Channel: 00 Id: 05 Lun: 00
> Vendor: SEAGATE Model: SX410800N Rev: 7117
> Type: Direct-Access ANSI SCSI revision: 02
> Host: scsi0 Channel: 00 Id: 06 Lun: 00
> Vendor: CyberDrv Model: CD-ROM TW240S Rev: 1.20
> Type: CD-ROM ANSI SCSI revision: 02
>
>
> [7.7.] Other information that might be relevant to the problem
> (please look in /proc and include all information that you
> think to be relevant):
> [ken@death linux]$ cat /proc/interrupts
> CPU0 CPU1
> 0: 88598 86200 IO-APIC-edge timer
> 1: 2798 2740 IO-APIC-edge keyboard
> 2: 0 0 XT-PIC cascade
> 8: 0 0 IO-APIC-edge rtc
> 9: 0 0 IO-APIC-edge acpi
> 12: 26342 29117 IO-APIC-edge PS/2 Mouse
> 17: 815 831 IO-APIC-level eth0
> 18: 58428 64043 IO-APIC-level advansys, nvidia
> 19: 1235 1182 IO-APIC-level EMU10K1
> NMI: 0 0
> LOC: 174708 174662
> ERR: 0
> MIS: 0
>
>
> [X.] Other notes, patches, fixes, workarounds:
> .config attached
>
> --
> Ken Witherow <phantoml AT rochester.rr.com> ICQ: 21840670
> KRW Technologies http://www.krwtech.com
> Linux 2.4.18 - because I'd like to get there today
> to remain free, we must remain free of intrusive government
>
>
>

2002-07-01 13:33:15

by Daniel Gryniewicz

[permalink] [raw]
Subject: Re: Tyan 2460/Dual Athlon MP hangs

On Mon, 2002-07-01 at 04:40, Eric W. Biederman wrote:
> Ken Witherow <[email protected]>, <[email protected]> writes:
>
> My memory says the 2460 doesn't supply enough voltage/power to both
> cpus, so getting it to run stable is a challenge.
>
> I have seen some systems where burn k7 would lock them in under a minute,
> due to this.
>
> Eric

I have a 2460 that runs rock solid. I does not warm reboot, but then it
doesn't reboot from Windows either. I've heard that a bios upgrade
fixes the warm reboot issue. I loaded it pretty hard for days at a
time, and never a hitch, except for the warm boot problems. Are you
sure the ones you've had problems with had MPs and not XPs in them?

Daniel

--
Recursion n.:
See Recursion.
-- Random Shack Data Processing Dictionary


2002-07-01 16:08:18

by Christopher Swingley

[permalink] [raw]
Subject: Re: Tyan 2460/Dual Athlon MP hangs

* Ken Witherow <[email protected]> [2002-Jun-30 10:55 AKDT]:
> [1.] One line summary of the problem:
> Random hangs occur on dual athlon system

I'll agree with previous comments about cooling, video card issues,
and adequate power supplies. But you may also want to consider the
quality of the power feeding into your machine. I had random halts,
shutdowns and even catastrophic failuers of Tyan dual AMD boards until
I put the systems on a good line conditioner. I'm convinced that
the problems all centered around under voltages from the wall, and
now that I've got true conditioned power, the systems run perfectly.

Chris
--
Christopher S. Swingley phone: 907-474-2689
Computer Systems Manager email: [email protected]
IARC -- Frontier Program GPG and PGP keys at my web page:
University of Alaska Fairbanks http://www.frontier.iarc.uaf.edu/~cswingle

2002-07-01 16:54:28

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Tyan 2460/Dual Athlon MP hangs

Daniel Gryniewicz <[email protected]> writes:

> On Mon, 2002-07-01 at 04:40, Eric W. Biederman wrote:
> > Ken Witherow <[email protected]>, <[email protected]>
> writes:
>
> >
> > My memory says the 2460 doesn't supply enough voltage/power to both
> > cpus, so getting it to run stable is a challenge.
> >
> > I have seen some systems where burn k7 would lock them in under a minute,
> > due to this.
> >
> > Eric
>
> I have a 2460 that runs rock solid. I does not warm reboot, but then it
> doesn't reboot from Windows either. I've heard that a bios upgrade
> fixes the warm reboot issue. I loaded it pretty hard for days at a
> time, and never a hitch, except for the warm boot problems. Are you
> sure the ones you've had problems with had MPs and not XPs in them?

100% though they were high end MP's that require more power.

Eric