2006-10-22 18:48:59

by Martin Bligh

Subject: Strange errors from e1000 driver (2.6.18)

I'm getting a lot of these types of errors if I run 2.6.18. If
I run the standard Ubuntu Dapper kernel, I don't get them.
What do they indicate?

Oct 21 18:48:28 localhost kernel: buffer_info[next_to_clean]
Oct 21 18:48:28 localhost kernel: time_stamp <7b79d33>
Oct 21 18:48:28 localhost kernel: next_to_watch <3d>
Oct 21 18:48:28 localhost kernel: jiffies <7b7a0c1>
Oct 21 18:48:28 localhost kernel: next_to_watch.status <0>
Oct 21 18:48:30 localhost kernel: Tx Queue <0>
Oct 21 18:48:30 localhost kernel: TDH <3d>
Oct 21 18:48:30 localhost kernel: TDT <44>
Oct 21 18:48:30 localhost kernel: next_to_use <44>
Oct 21 18:48:30 localhost kernel: next_to_clean <39>
Oct 21 18:48:30 localhost kernel: buffer_info[next_to_clean]
Oct 21 18:48:30 localhost kernel: time_stamp <7b79d33>
Oct 21 18:48:30 localhost kernel: next_to_watch <3d>
Oct 21 18:48:30 localhost kernel: jiffies <7b7a2b5>
Oct 21 18:48:30 localhost kernel: next_to_watch.status <0>
Oct 21 18:48:32 localhost kernel: Tx Queue <0>
Oct 21 18:48:32 localhost kernel: TDH <3d>
Oct 21 18:48:32 localhost kernel: TDT <44>
Oct 21 18:48:32 localhost kernel: next_to_use <44>
Oct 21 18:48:32 localhost kernel: next_to_clean <39>
Oct 21 18:48:32 localhost kernel: buffer_info[next_to_clean]
Oct 21 18:48:32 localhost kernel: time_stamp <7b79d33>
Oct 21 18:48:32 localhost kernel: next_to_watch <3d>
Oct 21 18:48:32 localhost kernel: jiffies <7b7a4a9>
Oct 21 18:48:32 localhost kernel: next_to_watch.status <0>
Oct 21 18:48:34 localhost kernel: Tx Queue <0>
Oct 21 18:48:34 localhost kernel: TDH <3d>
Oct 21 18:48:34 localhost kernel: TDT <44>
Oct 21 18:48:34 localhost kernel: next_to_use <44>
Oct 21 18:48:34 localhost kernel: next_to_clean <39>
Oct 21 18:48:34 localhost kernel: buffer_info[next_to_clean]
Oct 21 18:48:34 localhost kernel: time_stamp <7b79d33>
Oct 21 18:48:34 localhost kernel: next_to_watch <3d>
Oct 21 18:48:34 localhost kernel: jiffies <7b7a69d>
Oct 21 18:48:34 localhost kernel: next_to_watch.status <0>
Oct 21 18:48:35 localhost kernel: NETDEV WATCHDOG: eth0: transmit timed out
Oct 21 18:48:36 localhost kernel: e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex


2006-10-22 19:07:12

by Martin Bligh

Subject: Re: Strange errors from e1000 driver (2.6.18)

Martin J. Bligh wrote:
> I'm getting a lot of these type of errors if I run 2.6.18. If
> I run the standard Ubuntu Dapper kernel, I don't get them.
> What do they indicate?
>
> Oct 21 18:48:28 localhost kernel: buffer_info[next_to_clean]
> Oct 21 18:48:28 localhost kernel: time_stamp <7b79d33>
> Oct 21 18:48:28 localhost kernel: next_to_watch <3d>
> Oct 21 18:48:28 localhost kernel: jiffies <7b7a0c1>
> Oct 21 18:48:28 localhost kernel: next_to_watch.status <0>
> Oct 21 18:48:30 localhost kernel: Tx Queue <0>
> Oct 21 18:48:30 localhost kernel: TDH <3d>
> Oct 21 18:48:30 localhost kernel: TDT <44>
> Oct 21 18:48:30 localhost kernel: next_to_use <44>
> Oct 21 18:48:30 localhost kernel: next_to_clean <39>
> Oct 21 18:48:30 localhost kernel: buffer_info[next_to_clean]
> Oct 21 18:48:30 localhost kernel: time_stamp <7b79d33>
> Oct 21 18:48:30 localhost kernel: next_to_watch <3d>
> Oct 21 18:48:30 localhost kernel: jiffies <7b7a2b5>
> Oct 21 18:48:30 localhost kernel: next_to_watch.status <0>
> Oct 21 18:48:32 localhost kernel: Tx Queue <0>
> Oct 21 18:48:32 localhost kernel: TDH <3d>
> Oct 21 18:48:32 localhost kernel: TDT <44>
> Oct 21 18:48:32 localhost kernel: next_to_use <44>
> Oct 21 18:48:32 localhost kernel: next_to_clean <39>
> Oct 21 18:48:32 localhost kernel: buffer_info[next_to_clean]
> Oct 21 18:48:32 localhost kernel: time_stamp <7b79d33>
> Oct 21 18:48:32 localhost kernel: next_to_watch <3d>
> Oct 21 18:48:32 localhost kernel: jiffies <7b7a4a9>
> Oct 21 18:48:32 localhost kernel: next_to_watch.status <0>
> Oct 21 18:48:34 localhost kernel: Tx Queue <0>
> Oct 21 18:48:34 localhost kernel: TDH <3d>
> Oct 21 18:48:34 localhost kernel: TDT <44>
> Oct 21 18:48:34 localhost kernel: next_to_use <44>
> Oct 21 18:48:34 localhost kernel: next_to_clean <39>
> Oct 21 18:48:34 localhost kernel: buffer_info[next_to_clean]
> Oct 21 18:48:34 localhost kernel: time_stamp <7b79d33>
> Oct 21 18:48:34 localhost kernel: next_to_watch <3d>
> Oct 21 18:48:34 localhost kernel: jiffies <7b7a69d>
> Oct 21 18:48:34 localhost kernel: next_to_watch.status <0>
> Oct 21 18:48:35 localhost kernel: NETDEV WATCHDOG: eth0: transmit timed out
> Oct 21 18:48:36 localhost kernel: e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex

Actually, maybe this set is more helpful:

e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Tx Queue <0>
TDH <6>
TDT <1f>
next_to_use <1f>
next_to_clean <2>
buffer_info[next_to_clean]
time_stamp <2de8b54>
next_to_watch <6>
jiffies <2de8db7>
next_to_watch.status <0>
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Tx Queue <0>
TDH <6>
TDT <1f>
next_to_use <1f>
next_to_clean <2>
buffer_info[next_to_clean]
time_stamp <2de8b54>
next_to_watch <6>
jiffies <2de8fab>
next_to_watch.status <0>
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Tx Queue <0>
TDH <6>
TDT <1f>
next_to_use <1f>
next_to_clean <2>
buffer_info[next_to_clean]
time_stamp <2de8b54>
next_to_watch <6>
jiffies <2de919f>
next_to_watch.status <0>
NETDEV WATCHDOG: eth0: transmit timed out
e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex


2006-10-22 20:21:43

by Jesse Brandeburg

Subject: Re: Strange errors from e1000 driver (2.6.18)

On 10/22/06, Martin J. Bligh wrote:
> Martin J. Bligh wrote:
> > I'm getting a lot of these type of errors if I run 2.6.18. If
> > I run the standard Ubuntu Dapper kernel, I don't get them.
> > What do they indicate?

Hi Martin, they indicate that you're getting transmit hangs. That means
your hardware is having trouble with some of the buffers it is being
handed. TDH is the head of the transmit ring, advanced by the hardware,
and TDT is the tail, advanced by the driver; because the two values
noted below are not equal, the hardware is hung processing buffers
that the driver gave to it.
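
As a rough illustration of the TDH/TDT comparison described above, the
following Python sketch parses one "Detected Tx Unit Hang" dump and
reports how many descriptors sit between head and tail, plus how many
jiffies have passed since the stalled buffer was time-stamped. The
function names and the ring size of 256 are assumptions made for this
example, not something stated in the thread; the ring size should be
set to the adapter's actual Tx ring size.

```python
import re

# Fields of interest in an e1000 hang dump; every value is printed as hex.
FIELD_RE = re.compile(
    r"(TDH|TDT|next_to_use|next_to_clean|time_stamp|jiffies)\s+<([0-9a-f]+)>"
)

# One dump copied from the report in this thread.
DUMP = """\
Tx Queue <0>
TDH <6>
TDT <1f>
next_to_use <1f>
next_to_clean <2>
buffer_info[next_to_clean]
time_stamp <2de8b54>
next_to_watch <6>
jiffies <2de8db7>
next_to_watch.status <0>
"""


def parse_hang_dump(text):
    """Return the first value seen for each known field, parsed as hex."""
    fields = {}
    for name, value in FIELD_RE.findall(text):
        fields.setdefault(name, int(value, 16))
    return fields


def pending_descriptors(tdh, tdt, ring_size=256):
    """Descriptors handed to the hardware but not yet fetched by it.

    ring_size=256 is an assumed value for this sketch.
    """
    return (tdt - tdh) % ring_size


fields = parse_hang_dump(DUMP)
pending = pending_descriptors(fields["TDH"], fields["TDT"])
age_in_jiffies = fields["jiffies"] - fields["time_stamp"]

print(f"TDH != TDT: {fields['TDH'] != fields['TDT']}")  # True for this dump
print(f"pending descriptors: {pending}")                # 25
print(f"buffer age in jiffies: {age_in_jiffies}")       # 611
```

A non-zero pending count that stays the same across successive dumps,
while jiffies keeps advancing, is the pattern visible in the report.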

We need the standard bug report particulars: lspci -vv, cat
/proc/interrupts, dmesg, ethtool -e eth0, and maybe the output of
dmidecode, etc. I'm pretty sure you know the drill.
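
For anyone gathering the same particulars, here is a minimal Python
sketch that runs the commands listed above and joins their output into
one report. The helper name collect_report, the section-header format
and the injectable run parameter are choices made for this example
only; the interface name eth0 is taken from the thread and may differ
on other systems, and several of these commands usually need root.

```python
import subprocess

# The commands requested in the message above, as argv lists.
REPORT_COMMANDS = [
    ["lspci", "-vv"],
    ["cat", "/proc/interrupts"],
    ["dmesg"],
    ["ethtool", "-e", "eth0"],
    ["dmidecode"],
]


def collect_report(commands=REPORT_COMMANDS, run=subprocess.run):
    """Run each command and return one text report with a header per command.

    A command that is missing from the system is noted in the report
    instead of aborting the whole collection.
    """
    sections = []
    for argv in commands:
        try:
            result = run(argv, capture_output=True, text=True, check=False)
            output = result.stdout + result.stderr
        except OSError as exc:
            output = f"(could not run: {exc})\n"
        sections.append(f"==== {' '.join(argv)} ====\n{output}")
    return "\n".join(sections)


# Typical use (as root):
#     with open("e1000-report.txt", "w") as fh:
#         fh.write(collect_report())
```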

> > Oct 21 18:48:28 localhost kernel: buffer_info[next_to_clean]
> > Oct 21 18:48:28 localhost kernel: time_stamp <7b79d33>
> > Oct 21 18:48:28 localhost kernel: next_to_watch <3d>
> > Oct 21 18:48:28 localhost kernel: jiffies <7b7a0c1>
> > Oct 21 18:48:28 localhost kernel: next_to_watch.status <0>
> > Oct 21 18:48:30 localhost kernel: Tx Queue <0>
> > Oct 21 18:48:30 localhost kernel: TDH <3d>
> > Oct 21 18:48:30 localhost kernel: TDT <44>
> > Oct 21 18:48:30 localhost kernel: next_to_use <44>
> > Oct 21 18:48:30 localhost kernel: next_to_clean <39>
<snip>

> Actually, maybe this set is more helpful:
>
> e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
> Tx Queue <0>
> TDH <6>
> TDT <1f>
> next_to_use <1f>
> next_to_clean <2>
> buffer_info[next_to_clean]
> time_stamp <2de8b54>
> next_to_watch <6>
> jiffies <2de8db7>
> next_to_watch.status <0>
<snip>
> NETDEV WATCHDOG: eth0: transmit timed out
> e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex

Only a little. There are so many different pieces of e1000 hardware
and so few specifics in this report that I'll be able to tell you a lot
more once you get us the info requested.

Jesse

2006-10-22 20:29:25

by Martin Bligh

Subject: Re: Strange errors from e1000 driver (2.6.18)

# dmidecode 2.7
SMBIOS 2.2 present.
37 structures occupying 959 bytes.
Table at 0x000F0800.

Handle 0x0000, DMI type 0, 19 bytes.
BIOS Information
Vendor: Phoenix Technologies, LTD
Version: 6.00 PG
Release Date: 09/13/2004
Address: 0xE0000
Runtime Size: 128 kB
ROM Size: 256 kB
Characteristics:
ISA is supported
PCI is supported
PNP is supported
APM is supported
BIOS is upgradeable
BIOS shadowing is allowed
ESCD support is available
Boot from CD is supported
Selectable boot is supported
BIOS ROM is socketed
EDD is supported
5.25"/360 KB floppy services are supported (int 13h)
5.25"/1.2 MB floppy services are supported (int 13h)
3.5"/720 KB floppy services are supported (int 13h)
3.5"/2.88 MB floppy services are supported (int 13h)
Print screen service is supported (int 5h)
8042 keyboard services are supported (int 9h)
Serial services are supported (int 14h)
Printer services are supported (int 17h)
CGA/mono video services are supported (int 10h)
ACPI is supported
USB legacy is supported
AGP is supported
LS-120 boot is supported
ATAPI Zip drive boot is supported

Handle 0x0001, DMI type 1, 25 bytes.
System Information
Manufacturer: VIA Technologies, Inc.
Product Name: KT600-8237
Version:
Serial Number:
UUID: Not Present
Wake-up Type: Power Switch

Handle 0x0002, DMI type 2, 8 bytes.
Base Board Information
Manufacturer:
Product Name: KT600-8237
Version:
Serial Number:

Handle 0x0003, DMI type 3, 13 bytes.
Chassis Information
Manufacturer:
Type: Desktop
Lock: Not Present
Version:
Serial Number:
Asset Tag:
Boot-up State: Unknown
Power Supply State: Unknown
Thermal State: Unknown
Security Status: Unknown

Handle 0x0004, DMI type 4, 32 bytes.
Processor Information
Socket Designation: Socket A
Type: Central Processor
Family: Athlon XP
Manufacturer: AMD
ID: A0 06 00 00 FF FB 83 03
Signature: Family 6, Model A, Stepping 0
Flags:
FPU (Floating-point unit on-chip)
VME (Virtual mode extension)
DE (Debugging extension)
PSE (Page size extension)
TSC (Time stamp counter)
MSR (Model specific registers)
PAE (Physical address extension)
MCE (Machine check exception)
CX8 (CMPXCHG8 instruction supported)
APIC (On-chip APIC hardware supported)
SEP (Fast system call)
MTRR (Memory type range registers)
PGE (Page global enable)
MCA (Machine check architecture)
CMOV (Conditional move instruction supported)
PAT (Page attribute table)
PSE-36 (36-bit page size extension)
MMX (MMX technology supported)
FXSR (Fast floating-point save and restore)
SSE (Streaming SIMD extensions)
Version: AMD Athlon(tm) XP
Voltage: 1.5 V
External Clock: 100 MHz
Max Speed: 3000 MHz
Current Speed: 1100 MHz
Status: Populated, Enabled
Upgrade: ZIF Socket
L1 Cache Handle: 0x0009
L2 Cache Handle: 0x000A
L3 Cache Handle: No L3 Cache

Handle 0x0005, DMI type 5, 22 bytes.
Memory Controller Information
Error Detecting Method: None
Error Correcting Capabilities:
None
Supported Interleave: One-way Interleave
Current Interleave: Four-way Interleave
Maximum Memory Module Size: 32 MB
Maximum Total Memory Size: 96 MB
Supported Speeds:
70 ns
60 ns
Supported Memory Types:
Standard
EDO
Memory Module Voltage: 5.0 V
Associated Memory Slots: 3
0x0006
0x0007
0x0008
Enabled Error Correcting Capabilities: None

Handle 0x0006, DMI type 6, 12 bytes.
Memory Module Information
Socket Designation: A0
Bank Connections: 0 1
Current Speed: 60 ns
Type: Other SDRAM
Installed Size: 512 MB (Double-bank Connection)
Enabled Size: 512 MB (Double-bank Connection)
Error Status: OK

Handle 0x0007, DMI type 6, 12 bytes.
Memory Module Information
Socket Designation: A1
Bank Connections: 2 3
Current Speed: 60 ns
Type: Other SDRAM
Installed Size: 512 MB (Double-bank Connection)
Enabled Size: 512 MB (Double-bank Connection)
Error Status: OK

Handle 0x0008, DMI type 6, 12 bytes.
Memory Module Information
Socket Designation: A2
Bank Connections: None
Current Speed: 60 ns
Type: Unknown
Installed Size: Not Installed
Enabled Size: Not Installed
Error Status: OK

Handle 0x0009, DMI type 7, 19 bytes.
Cache Information
Socket Designation: Internal Cache
Configuration: Enabled, Not Socketed, Level 1
Operational Mode: Write Back
Location: Internal
Installed Size: 128 KB
Maximum Size: 128 KB
Supported SRAM Types:
Synchronous
Installed SRAM Type: Synchronous
Speed: Unknown
Error Correction Type: Unknown
System Type: Unknown
Associativity: Unknown

Handle 0x000A, DMI type 7, 19 bytes.
Cache Information
Socket Designation: External Cache
Configuration: Enabled, Not Socketed, Level 2
Operational Mode: Write Back
Location: External
Installed Size: 512 KB
Maximum Size: 512 KB
Supported SRAM Types:
Synchronous
Installed SRAM Type: Synchronous
Speed: Unknown
Error Correction Type: Unknown
System Type: Unknown
Associativity: Unknown

Handle 0x000B, DMI type 8, 9 bytes.
Port Connector Information
Internal Reference Designator: PRIMARY IDE
Internal Connector Type: On Board IDE
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other

Handle 0x000C, DMI type 8, 9 bytes.
Port Connector Information
Internal Reference Designator: SECONDARY IDE
Internal Connector Type: On Board IDE
External Reference Designator: Not Specified
External Connector Type: None
Port Type: Other

Handle 0x000D, DMI type 8, 9 bytes.
Port Connector Information
Internal Reference Designator: FDD
Internal Connector Type: On Board Floppy
External Reference Designator: Not Specified
External Connector Type: None
Port Type: 8251 FIFO Compatible

Handle 0x000E, DMI type 8, 9 bytes.
Port Connector Information
Internal Reference Designator: COM1
Internal Connector Type: 9 Pin Dual Inline (pin 10 cut)
External Reference Designator:
External Connector Type: DB-9 male
Port Type: Serial Port 16450 Compatible

Handle 0x000F, DMI type 8, 9 bytes.
Port Connector Information
Internal Reference Designator: COM2
Internal Connector Type: 9 Pin Dual Inline (pin 10 cut)
External Reference Designator:
External Connector Type: DB-9 male
Port Type: Serial Port 16450 Compatible

Handle 0x0010, DMI type 8, 9 bytes.
Port Connector Information
Internal Reference Designator: LPT1
Internal Connector Type: DB-25 female
External Reference Designator:
External Connector Type: DB-25 female
Port Type: Parallel Port ECP/EPP

Handle 0x0011, DMI type 8, 9 bytes.
Port Connector Information
Internal Reference Designator: Keyboard
Internal Connector Type: PS/2
External Reference Designator:
External Connector Type: PS/2
Port Type: Keyboard Port

Handle 0x0012, DMI type 8, 9 bytes.
Port Connector Information
Internal Reference Designator: PS/2 Mouse
Internal Connector Type: PS/2
External Reference Designator:
External Connector Type: PS/2
Port Type: Mouse Port

Handle 0x0013, DMI type 8, 9 bytes.
Port Connector Information
Internal Reference Designator: Not Specified
Internal Connector Type: None
External Reference Designator: USB0
External Connector Type: Other
Port Type: USB

Handle 0x0014, DMI type 8, 9 bytes.
Port Connector Information
Internal Reference Designator: Not Specified
Internal Connector Type: None
External Reference Designator: AUDIO
External Connector Type: None
Port Type: Audio Port

Handle 0x0015, DMI type 9, 13 bytes.
System Slot Information
Designation: PCI0
Type: 32-bit PCI
Current Usage: In Use
Length: Long
ID: 1
Characteristics:
5.0 V is provided
PME signal is supported

Handle 0x0016, DMI type 9, 13 bytes.
System Slot Information
Designation: PCI1
Type: 32-bit PCI
Current Usage: In Use
Length: Long
ID: 2
Characteristics:
5.0 V is provided
PME signal is supported

Handle 0x0017, DMI type 9, 13 bytes.
System Slot Information
Designation: PCI2
Type: 32-bit PCI
Current Usage: In Use
Length: Long
ID: 3
Characteristics:
5.0 V is provided
PME signal is supported

Handle 0x0018, DMI type 9, 13 bytes.
System Slot Information
Designation: PCI3
Type: 32-bit PCI
Current Usage: In Use
Length: Long
ID: 4
Characteristics:
5.0 V is provided
PME signal is supported

Handle 0x0019, DMI type 9, 13 bytes.
System Slot Information
Designation: AGP
Type: 32-bit AGP
Current Usage: Available
Length: Long
ID: 8
Characteristics:
5.0 V is provided

Handle 0x001A, DMI type 13, 22 bytes.
BIOS Language Information
Installable Languages: 3
n|US|iso8859-1
n|US|iso8859-1
r|CA|iso8859-1
Currently Installed Language: n|US|iso8859-1

Handle 0x001B, DMI type 16, 15 bytes.
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: None
Maximum Capacity: 3 GB
Error Information Handle: Not Provided
Number Of Devices: 3

Handle 0x001C, DMI type 17, 21 bytes.
Memory Device
Array Handle: 0x001B
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: 512 MB
Form Factor: DIMM
Set: None
Locator: A0
Bank Locator: Bank0/1
Type: Unknown
Type Detail: None

Handle 0x001D, DMI type 17, 21 bytes.
Memory Device
Array Handle: 0x001B
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: 512 MB
Form Factor: DIMM
Set: None
Locator: A1
Bank Locator: Bank2/3
Type: Unknown
Type Detail: None

Handle 0x001E, DMI type 17, 21 bytes.
Memory Device
Array Handle: 0x001B
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: DIMM
Set: None
Locator: A2
Bank Locator: Bank4/5
Type: Unknown
Type Detail: None

Handle 0x001F, DMI type 19, 15 bytes.
Memory Array Mapped Address
Starting Address: 0x00000000000
Ending Address: 0x0003FFFFFFF
Range Size: 1 GB
Physical Array Handle: 0x001B
Partition Width: 0

Handle 0x0020, DMI type 20, 19 bytes.
Memory Device Mapped Address
Starting Address: 0x00000000000
Ending Address: 0x0001FFFFFFF
Range Size: 512 MB
Physical Device Handle: 0x001C
Memory Array Mapped Address Handle: 0x001F
Partition Row Position: 1

Handle 0x0021, DMI type 20, 19 bytes.
Memory Device Mapped Address
Starting Address: 0x00020000000
Ending Address: 0x0003FFFFFFF
Range Size: 512 MB
Physical Device Handle: 0x001D
Memory Array Mapped Address Handle: 0x001F
Partition Row Position: 1

Handle 0x0022, DMI type 20, 19 bytes.
Memory Device Mapped Address
Starting Address: 0x00000000000
Ending Address: 0x000000003FF
Range Size: 1 kB
Physical Device Handle: 0x001E
Memory Array Mapped Address Handle: 0x001F
Partition Row Position: 1

Handle 0x0023, DMI type 32, 11 bytes.
System Boot Information
Status: No errors detected

Handle 0x0024, DMI type 127, 4 bytes.
End Of Table


Attachments:
dmidecode (10.48 kB)

2006-10-22 22:15:30

by Jesse Brandeburg

Subject: Re: Strange errors from e1000 driver (2.6.18)

Analysis follows, but I wanted to ask you to bisect back, if you can, to
find the patch that apparently makes the difference. Basically, at this
point I'd say it's not likely to be an e1000 issue, but I'd like to
follow up and make sure.

On 10/22/06, Martin J. Bligh wrote:
> 0000:00:0a.0 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 01)
> Latency: 32 (63750ns min), Cache Line Size: 0x08 (32 bytes)
> Interrupt: pin A routed to IRQ 5
> Region 0: Memory at ef020000 (64-bit, non-prefetchable) [size=128K]
> Region 4: I/O ports at a000 [size=64]
> Capabilities: [dc] Power Management version 2
> Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> Status: D0 PME-Enable- DSel=0 DScale=1 PME-
> Capabilities: [e4] Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
> Address: 0000000000000000 Data: 0000
>
> 0000:00:0a.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 01)
> Subsystem: Intel Corporation PRO/1000 MT Dual Port Server Adapter
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
> Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
> Latency: 32 (63750ns min), Cache Line Size: 0x08 (32 bytes)
> Interrupt: pin B routed to IRQ 11
> Region 0: Memory at ef000000 (64-bit, non-prefetchable) [size=128K]
> Region 4: I/O ports at a400 [size=64]
> Capabilities: [dc] Power Management version 2
> Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> Status: D0 PME-Enable- DSel=0 DScale=1 PME-
> Capabilities: [e4] Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
> Address: 0000000000000000 Data: 0000

Nothing seems out of order, but the latency may be low. I'd be curious
what these looked like before with the old kernel. Another thing to
compare would be the lspci -vv output for your chipset with the old
and new kernels, in case the bridge/system configuration changed.
There are no known problems right now with this chipset (82546EB).
> > cat /proc/interrupts,
>
> CPU0
> 5: 1975991 XT-PIC ehci_hcd:usb2, VIA8237, eth0
> NMI: 0
> LOC: 146264664
> ERR: 52805

Shared interrupt, fine, but what's with the ERR: count?

> > dmesg
>
> Did that bit already.

Except you didn't include any of the e1000 load information or the
system's boot information as it came up.

> > ethtool -e eth0,
>
> root@titus:/usr/local/autotest/bin # ethtool -e eth0
> Offset Values
> ------ ------
> 0x0000 00 07 e9 09 0b 08 30 05 ff ff ff ff ff ff ff ff
> 0x0010 44 a9 03 98 0b 46 11 10 86 80 10 10 86 80 68 34
> 0x0020 0c 00 10 10 00 00 02 21 c8 18 ff ff ff ff ff ff
> 0x0030 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 0x0040 0c c3 61 78 04 50 02 21 c8 08 ff ff ff ff ff ff
> 0x0050 ff ff ff ff ff ff ff ff ff ff ff ff ff ff 02 06
> 0x0060 2c 00 00 40 07 11 00 00 2c 00 00 40 ff ff ff ff
Nothing out of order here that I can see immediately.
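
As a small aid for reading the ethtool -e dump quoted above, the
following Python sketch turns the hex rows back into bytes and formats
the first six as a MAC address. On e1000 adapters the station address
occupies the first three EEPROM words, and the result here matches the
address that e1000_probe prints for eth0 later in this thread. The
function names are made up for this example.

```python
# First two rows of the EEPROM dump quoted above.
SAMPLE = """\
Offset Values
------ ------
0x0000 00 07 e9 09 0b 08 30 05 ff ff ff ff ff ff ff ff
0x0010 44 a9 03 98 0b 46 11 10 86 80 10 10 86 80 68 34
"""


def parse_eeprom_dump(text):
    """Convert `ethtool -e` style rows ("0xOFFSET b0 b1 ...") into bytes.

    Leading "> " quote markers are tolerated; header and separator
    rows are skipped because they do not start with an 0x offset.
    """
    data = bytearray()
    for line in text.splitlines():
        parts = line.lstrip("> ").split()
        if not parts or not parts[0].startswith("0x"):
            continue
        data.extend(int(byte, 16) for byte in parts[1:])
    return bytes(data)


def mac_from_eeprom(data):
    """Format the first six EEPROM bytes as a colon-separated MAC address."""
    return ":".join(f"{byte:02x}" for byte in data[:6])


eeprom = parse_eeprom_dump(SAMPLE)
print(mac_from_eeprom(eeprom))  # 00:07:e9:09:0b:08
```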

> > and maybe output of
> > dmidecode, etc.
>
> Attached.
>
> > only a little. There are so many different pieces of e1000 hardware
> > and so few specifics in this report that I'll be able to tell you lots
> > more when you get us the info requested.
>
> Thanks. Not sure if the bug wasn't there in earlier kernels, or if we
> just weren't printing anything.

I think it may not have been in earlier kernels, but I also don't
think this is an e1000 problem, at least initially.

<snip>
> BIOS Information
> Vendor: Phoenix Technologies, LTD
> Version: 6.00 PG
> Release Date: 09/13/2004
> Address: 0xE0000
> Runtime Size: 128 kB
> ROM Size: 256 kB
<snip>

> Handle 0x0001, DMI type 1, 25 bytes.
> System Information
> Manufacturer: VIA Technologies, Inc.
> Product Name: KT600-8237
>
> Handle 0x0002, DMI type 2, 8 bytes.
> Base Board Information
> Manufacturer:
> Product Name: KT600-8237
> Version:
> Serial Number:

This chipset is one of the most common elements in problem reports
of TX hangs for e1000. My current theory (we've bought a bunch of
these systems and never reproduced the issue) is that something either
design-specific or BIOS-specific causes this chipset to interact very
badly with e1000 hardware. Some systems have the issue and some don't.
If you could bisect back to a working point, it would be interesting
to see where that pointed.
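
For readers unfamiliar with bisecting, the idea can be sketched in a
few lines of Python: given an ordered list of revisions whose first
entry is known good and whose last entry is known bad, repeatedly test
the midpoint until the first bad revision is isolated. This is only an
illustration of the search, not how any particular tool implements it;
the revision labels and the is_bad callback are hypothetical, and in
practice is_bad would mean building and booting that kernel and
watching for the Tx hang.

```python
def first_bad(revisions, is_bad):
    """Return the earliest bad revision.

    Assumes revisions are ordered oldest to newest, revisions[0] is
    known good, revisions[-1] is known bad, and that every revision
    after the first bad one is also bad.
    """
    lo, hi = 0, len(revisions) - 1  # lo: last known good, hi: first known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi]


# Hypothetical history of ten revisions in which r6 introduced the problem.
history = [f"r{i}" for i in range(10)]
tested = []


def is_bad(revision):
    tested.append(revision)
    return int(revision[1:]) >= 6


print(first_bad(history, is_bad))  # r6
print(tested)                      # ['r4', 'r6', 'r5']
```

The number of kernels to build and test grows only with the logarithm
of the number of revisions between the good and bad points.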

> Handle 0x0004, DMI type 4, 32 bytes.
> Processor Information
> Socket Designation: Socket A
> Type: Central Processor
> Family: Athlon XP
> Manufacturer: AMD
> ID: A0 06 00 00 FF FB 83 03
> Signature: Family 6, Model A, Stepping 0
> Version: AMD Athlon(tm) XP
> Voltage: 1.5 V
> External Clock: 100 MHz
> Max Speed: 3000 MHz
> Current Speed: 1100 MHz

Doesn't seem like you're overclocked. Good.

2006-10-22 22:35:12

by Martin Bligh

Subject: Re: Strange errors from e1000 driver (2.6.18)

Linux version 2.6.18 (mbligh@titus) (gcc version 3.4.6 (Ubuntu 3.4.6-1ubuntu2)) #2 Sun Oct 22 13:45:39 PDT 2006
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
BIOS-e820: 000000003fff0000 - 000000003fff3000 (ACPI NVS)
BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data)
BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
Warning only 896MB will be used.
Use a HIGHMEM enabled kernel.
896MB LOWMEM available.
found SMP MP-table at 000f52b0
On node 0 totalpages: 229376
DMA zone: 4096 pages, LIFO batch:0
Normal zone: 225280 pages, LIFO batch:31
DMI 2.2 present.
Intel MultiProcessor Specification v1.4
Virtual Wire compatibility mode.
OEM ID: OEM00000 Product ID: PROD00000000 APIC at: 0xFEE00000
Processor #0 6:10 APIC version 17
I/O APIC #2 Version 17 at 0xFEC00000.
Enabling APIC mode: Flat. Using 1 I/O APICs
Processors: 1
Allocating PCI resources starting at 50000000 (gap: 40000000:bec00000)
Detected 1098.980 MHz processor.
Built 1 zonelists. Total pages: 229376
Kernel command line: root=/dev/hda1 ro lapic profile=2
kernel profiling enabled (shift: 2)
mapped APIC to ffffd000 (fee00000)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
CPU 0 irqstacks, hard=c04e4000 soft=c04e3000
PID hash table entries: 4096 (order: 12, 16384 bytes)
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 902328k/917504k available (2647k kernel code, 14784k reserved, 1144k data, 160k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 2200.00 BogoMIPS (lpj=4400011)
Mount-cache hash table entries: 512
CPU: After generic identify, caps: 0383fbff c1c3fbff 00000000 00000000 00000000 00000000 00000000
CPU: After vendor identify, caps: 0383fbff c1c3fbff 00000000 00000000 00000000 00000000 00000000
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU: After all inits, caps: 0383fbff c1c3fbff 00000000 00000420 00000000 00000000 00000000
Compat vDSO mapped to ffffe000.
CPU: AMD Athlon(tm) stepping 00
Checking 'hlt' instruction... OK.
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xfb360, last bus=1
PCI: Using configuration type 1
Setting up standard PCI resources
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
Boot video device is 0000:00:09.0
PCI: Using IRQ router VIA [1106/3227] at 0000:00:11.0
spurious 8259A interrupt: IRQ7.
PCI: Bridge: 0000:00:01.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Setting latency timer of device 0000:00:01.0 to 64
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
TCP established hash table entries: 131072 (order: 7, 524288 bytes)
TCP bind hash table entries: 65536 (order: 6, 262144 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
apm: BIOS version 1.2 Flags 0x07 (Driver version 1.16ac)
Installing knfsd (copyright (C) 1996 [email protected]).
io scheduler noop registered
io scheduler cfq registered (default)
lp: driver loaded but no devices found
Linux agpgart interface v0.101 (c) Dave Jones
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
floppy0: no floppy controllers found
loop: loaded (max 8 devices)
Intel(R) PRO/1000 Network Driver - version 7.1.9-k4
Copyright (c) 1999-2006 Intel Corporation.
PCI: setting IRQ 7 as level-triggered
PCI: Found IRQ 7 for device 0000:00:0a.0
PCI: Sharing IRQ 7 with 0000:00:10.4
PCI: Sharing IRQ 7 with 0000:00:11.5
e1000: 0000:00:0a.0: e1000_probe: (PCI:33MHz:32-bit) 00:07:e9:09:0b:08
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
PCI: setting IRQ 5 as level-triggered
PCI: Found IRQ 5 for device 0000:00:0a.1
PCI: Sharing IRQ 5 with 0000:00:0b.0
e1000: 0000:00:0a.1: e1000_probe: (PCI:33MHz:32-bit) 00:07:e9:09:0b:09
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
via-rhine.c:v1.10-LK1.4.1 July-24-2006 Written by Donald Becker
PCI: setting IRQ 10 as level-triggered
PCI: Found IRQ 10 for device 0000:00:12.0
PCI: Sharing IRQ 10 with 0000:00:0b.1
PCI: Sharing IRQ 10 with 0000:00:10.0
PCI: Sharing IRQ 10 with 0000:00:10.1
PCI: Sharing IRQ 10 with 0000:00:0c.0
eth2: VIA Rhine II at 0x1e000, 00:11:5b:a4:70:4d, IRQ 10.
eth2: MII PHY found at address 1, status 0x7849 advertising 05e1 Link 0000.
Linux video capture interface: v2.00
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller at PCI slot 0000:00:0f.1
PCI: VIA IRQ fixup for 0000:00:0f.1, from 255 to 0
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
VP_IDE: VIA vt8237 (rev 00) IDE UDMA133 controller on pci0000:00:0f.1
ide0: BM-DMA at 0xc800-0xc807, BIOS settings: hda:DMA, hdb:DMA
ide1: BM-DMA at 0xc808-0xc80f, BIOS settings: hdc:DMA, hdd:DMA
Probing IDE interface ide0...
hda: ST3200822A, ATA DISK drive
hdb: ST3400832A, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: _NEC DVD_RW ND-3550A, ATAPI CD/DVD-ROM drive
hdd: DVD-ROM BDV316B, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 512KiB
hda: 390721968 sectors (200049 MB) w/8192KiB Cache, CHS=24321/255/63, UDMA(100)
hda: cache flushes supported
hda: hda1 hda2 hda3
hdb: max request size: 512KiB
hdb: 781422768 sectors (400088 MB) w/8192KiB Cache, CHS=48641/255/63, UDMA(100)
hdb: cache flushes supported
hdb: hdb1
hdc: ATAPI 48X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
hdd: ATAPI 48X DVD-ROM drive, 512kB Cache, UDMA(33)
ide-floppy driver 0.99.newide
usbmon: debugfs is not available
PCI: setting IRQ 11 as level-triggered
PCI: Found IRQ 11 for device 0000:00:0b.2
PCI: Sharing IRQ 11 with 0000:00:09.0
PCI: Sharing IRQ 11 with 0000:00:10.2
PCI: Sharing IRQ 11 with 0000:00:10.3
ehci_hcd 0000:00:0b.2: EHCI Host Controller
ehci_hcd 0000:00:0b.2: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:0b.2: irq 11, io mem 0xef040000
ehci_hcd 0000:00:0b.2: USB 2.0 started, EHCI 0.95, driver 10 Dec 2004
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 4 ports detected
PCI: Found IRQ 7 for device 0000:00:10.4
PCI: Sharing IRQ 7 with 0000:00:0a.0
PCI: Sharing IRQ 7 with 0000:00:11.5
ehci_hcd 0000:00:10.4: EHCI Host Controller
ehci_hcd 0000:00:10.4: new USB bus registered, assigned bus number 2
ehci_hcd 0000:00:10.4: irq 7, io mem 0xef041000
ehci_hcd 0000:00:10.4: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 8 ports detected
ohci_hcd: 2005 April 22 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
USB Universal Host Controller Interface driver v3.0
PCI: Found IRQ 5 for device 0000:00:0b.0
PCI: Sharing IRQ 5 with 0000:00:0a.1
uhci_hcd 0000:00:0b.0: UHCI Host Controller
uhci_hcd 0000:00:0b.0: new USB bus registered, assigned bus number 3
uhci_hcd 0000:00:0b.0: irq 5, io base 0x0000a800
usb usb3: configuration #1 chosen from 1 choice
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
PCI: Found IRQ 10 for device 0000:00:0b.1
PCI: Sharing IRQ 10 with 0000:00:10.0
PCI: Sharing IRQ 10 with 0000:00:10.1
PCI: Sharing IRQ 10 with 0000:00:0c.0
PCI: Sharing IRQ 10 with 0000:00:12.0
uhci_hcd 0000:00:0b.1: UHCI Host Controller
uhci_hcd 0000:00:0b.1: new USB bus registered, assigned bus number 4
uhci_hcd 0000:00:0b.1: irq 10, io base 0x0000ac00
usb usb4: configuration #1 chosen from 1 choice
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
PCI: Found IRQ 10 for device 0000:00:10.0
PCI: Sharing IRQ 10 with 0000:00:0b.1
PCI: Sharing IRQ 10 with 0000:00:10.1
PCI: Sharing IRQ 10 with 0000:00:0c.0
PCI: Sharing IRQ 10 with 0000:00:12.0
uhci_hcd 0000:00:10.0: UHCI Host Controller
uhci_hcd 0000:00:10.0: new USB bus registered, assigned bus number 5
uhci_hcd 0000:00:10.0: irq 10, io base 0x0000cc00
usb usb5: configuration #1 chosen from 1 choice
hub 5-0:1.0: USB hub found
hub 5-0:1.0: 2 ports detected
PCI: Found IRQ 10 for device 0000:00:10.1
PCI: Sharing IRQ 10 with 0000:00:0b.1
PCI: Sharing IRQ 10 with 0000:00:10.0
PCI: Sharing IRQ 10 with 0000:00:0c.0
PCI: Sharing IRQ 10 with 0000:00:12.0
uhci_hcd 0000:00:10.1: UHCI Host Controller
uhci_hcd 0000:00:10.1: new USB bus registered, assigned bus number 6
uhci_hcd 0000:00:10.1: irq 10, io base 0x0000d000
usb usb6: configuration #1 chosen from 1 choice
hub 6-0:1.0: USB hub found
hub 6-0:1.0: 2 ports detected
PCI: Found IRQ 11 for device 0000:00:10.2
PCI: Sharing IRQ 11 with 0000:00:09.0
PCI: Sharing IRQ 11 with 0000:00:10.3
PCI: Sharing IRQ 11 with 0000:00:0b.2
uhci_hcd 0000:00:10.2: UHCI Host Controller
uhci_hcd 0000:00:10.2: new USB bus registered, assigned bus number 7
uhci_hcd 0000:00:10.2: irq 11, io base 0x0000d400
usb usb7: configuration #1 chosen from 1 choice
hub 7-0:1.0: USB hub found
hub 7-0:1.0: 2 ports detected
PCI: Found IRQ 11 for device 0000:00:10.3
PCI: Sharing IRQ 11 with 0000:00:09.0
PCI: Sharing IRQ 11 with 0000:00:10.2
PCI: Sharing IRQ 11 with 0000:00:0b.2
uhci_hcd 0000:00:10.3: UHCI Host Controller
uhci_hcd 0000:00:10.3: new USB bus registered, assigned bus number 8
uhci_hcd 0000:00:10.3: irq 11, io base 0x0000d800
usb usb8: configuration #1 chosen from 1 choice
hub 8-0:1.0: USB hub found
hub 8-0:1.0: 2 ports detected
Initializing USB Mass Storage driver...
usbcore: registered new driver usb-storage
USB Mass Storage support registered.
usbcore: registered new driver usbserial
drivers/usb/serial/usb-serial.c: USB Serial support registered for generic
usbcore: registered new driver usbserial_generic
drivers/usb/serial/usb-serial.c: USB Serial Driver core
drivers/usb/serial/usb-serial.c: USB Serial support registered for Handspring Visor / Palm OS
drivers/usb/serial/usb-serial.c: USB Serial support registered for Sony Clie 3.5
drivers/usb/serial/usb-serial.c: USB Serial support registered for Sony Clie 5.0
usbcore: registered new driver visor
drivers/usb/serial/visor.c: USB HandSpring Visor / Palm OS driver
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
mice: PS/2 mouse device common for all mice
Advanced Linux Sound Architecture Driver Version 1.0.12rc1 (Thu Jun 22 13:55:50 2006 UTC).
PCI: Found IRQ 7 for device 0000:00:11.5
PCI: Sharing IRQ 7 with 0000:00:0a.0
PCI: Sharing IRQ 7 with 0000:00:10.4
PCI: Setting latency timer of device 0000:00:11.5 to 64
input: AT Translated Set 2 keyboard as /class/input/input0
logips2pp: Detected unknown logitech mouse model 11
ALSA device list:
#0: VIA 8237 with ALC655 at 0xdc00, irq 7
oprofile: using NMI interrupt.
ip_conntrack version 2.4 (7168 buckets, 57344 max) - 172 bytes per conntrack
TCP bic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
Using IPI Shortcut mode
Time: tsc clocksource has been installed.
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 160k freed
input: PS/2 Logitech Mouse as /class/input/input1
e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex
Unable to find swap-space signature
EXT3 FS on hda1, internal journal
kjournald starting. Commit interval 5 seconds
EXT3 FS on hda3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on hdb1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Unable to find swap-space signature
hdc: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
hdc: drive_cmd: error=0x04 { AbortedCommand }
ide: failed opcode was: 0xec
hdd: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
hdd: drive_cmd: error=0x04 { AbortedCommand }
ide: failed opcode was: 0xec
device eth0 entered promiscuous mode
device eth0 left promiscuous mode
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Tx Queue <0>
TDH <26>
TDT <26>
next_to_use <26>
next_to_clean <39>
buffer_info[next_to_clean]
time_stamp <77145>
next_to_watch <3b>
jiffies <7734f>
next_to_watch.status <0>
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Tx Queue <0>
TDH <26>
TDT <26>
next_to_use <26>
next_to_clean <39>
buffer_info[next_to_clean]
time_stamp <77145>
next_to_watch <3b>
jiffies <77543>
next_to_watch.status <0>
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Tx Queue <0>
TDH <26>
TDT <26>
next_to_use <26>
next_to_clean <39>
buffer_info[next_to_clean]
time_stamp <77145>
next_to_watch <3b>
jiffies <77737>
next_to_watch.status <0>
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Tx Queue <0>
TDH <26>
TDT <26>
next_to_use <26>
next_to_clean <39>
buffer_info[next_to_clean]
time_stamp <77145>
next_to_watch <3b>
jiffies <7792b>
next_to_watch.status <0>
NETDEV WATCHDOG: eth0: transmit timed out
e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex


Attachments:
dmesg (13.82 kB)

2006-10-22 23:03:15

by Dumitru Ciobarcianu

Subject: Re: Strange errors from e1000 driver (2.6.18)

On Sun, 2006-10-22 at 13:27 -0700, Martin J. Bligh wrote:
> Jesse Brandeburg wrote:
> > On 10/22/06, Martin J. Bligh wrote:
> >> Martin J. Bligh wrote:
> >> > I'm getting a lot of these type of errors if I run 2.6.18. If
> >> > I run the standard Ubuntu Dapper kernel, I don't get them.
> >> > What do they indicate?
> >
> > Hi Martin, they indicate that you're getting transmit hangs. Means
> > your hardware is having issues with some of the buffers it is being
> > handed. Because the TDH and TDT noted below are not equal, it means
> > the hardware is hung processing buffers that the driver gave to it.
> >
> > We need the standard bug report particulars,
>
> Sure.
>
> Handle 0x0001, DMI type 1, 25 bytes.
> System Information
> Manufacturer: VIA Technologies, Inc.
> Product Name: KT600-8237
> Version:
> Serial Number:
> UUID: Not Present
> Wake-up Type: Power Switch


If this matters: I got the same errors with the FC5 kernel sometime
around January, also on a VIA-based motherboard. The only fix I found
was changing the motherboard (it worked fine with an Intel-based
board).


--
Cioby


2006-10-22 23:29:11

by Jesse Brandeburg

Subject: Re: Strange errors from e1000 driver (2.6.18)

On 10/22/06, Martin J. Bligh wrote:
> Jesse Brandeburg wrote:
> > Analysis follows, but I wanted to ask you to bisect back if you can to
> > find the apparent patch to make the difference. Basically at this
> > point I'd say its not likely to be an e1000 issue, but I'd like to
> > follow up and make sure.
>
> That's going to be ugly, since I can't reproduce it at will. Maybe if
> I netperf it to the other box I can push it over.

try tbench with 100 sessions (from dbench package) and see if that hurts.

> > Nothing seems out of order, but the latency may be low, I'd be curious
> > what these looked like before with the old kernel. Some of the other
> > things to compare would have been the lspci -vv output from your
> > chipset with old/new kernel, in case the bridge/system configuration
> > changed. There are no known problems right now with this chipset
> > 82546EB
>
> OK. will try later when I have more time. For now I switched to the
> onboard via rhine controller.

ouch.

> > shared int, fine, but whats with the ERR: ?
>
> Hmm. Having rebooted they look rather lower. but might be a time thing.
>
> CPU0
> 0: 1405995 XT-PIC timer
> 1: 5910 XT-PIC i8042
> 2: 0 XT-PIC cascade
> 5: 0 XT-PIC uhci_hcd:usb3
> 7: 27135 XT-PIC ehci_hcd:usb2, VIA8237, eth0
> 10: 0 XT-PIC uhci_hcd:usb4, uhci_hcd:usb5,
> uhci_hcd:usb6
> 11: 0 XT-PIC ehci_hcd:usb1, uhci_hcd:usb7,
> uhci_hcd:usb8
> 12: 157547 XT-PIC i8042
> 14: 36296 XT-PIC ide0
> 15: 196690 XT-PIC ide1
> NMI: 0
> LOC: 1406006
> ERR: 26
>
> > except you didn't include any of the e1000 load information nor the
> > system's boot information as it came up.
>
> OK, it had gone since reboot, but I rebooted just now .... new info
> attached.
>
> > This chipset is one of the most frequent common elements in problem
> > reports of TX hangs for e1000. My current theory (we've bought a
> > bunch of these systems and never reproduced the issue) is that there
> > is something either design specific or BIOS specific that causes this
> > chipset to interact very badly with e1000 hardware. Some systems have
> > the issue and some don't. If you could bisect back to a working point
> > it would be interesting to see where that pointed.
>
> OK, is going to be hard to bisect, since the other one was an Ubuntu
> kernel, but I guess I can give 2.6.15 virgin a shot, at least.

thanks, I know how difficult and time consuming bisecting is.

> > doesn't seem you're overclocked. Good.
>
> Nah, I'm pretty conservative with hardware, get enough problems when
> it's all running within specs ;-)
>
> Thanks for looking at all this.

welcome, like to help when I can.

> Linux version 2.6.18 (mbligh@titus) (gcc version 3.4.6 (Ubuntu 3.4.6-1ubuntu2)) #2 Sun
> e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
> Tx Queue <0>
> TDH <26>
> TDT <26>
> next_to_use <26>
> next_to_clean <39>
> buffer_info[next_to_clean]
> time_stamp <77145>
> next_to_watch <3b>
> jiffies <7734f>
> next_to_watch.status <0>
> NETDEV WATCHDOG: eth0: transmit timed out
> e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex

Hey, this one is different. It is actually the common Tx hang
signature (TDH == TDT) for these kinds of systems. I've come up with a
workaround driver; the code is still in development.

you can try it if you would like.
http://sourceforge.net/tracker/download.php?group_id=42302&atid=447449&file_id=198849&aid=1463045
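
For anyone comparing dumps like the ones quoted in this thread, the TDH/TDT check can be sketched as a small script. This is only an illustration, not part of the e1000 driver or of the workaround linked above: the function names (parse_hang_dump, classify, stalled_jiffies) are made up for this example, and it assumes the dump fields are hexadecimal, as the letters in values such as <3b> suggest. It reports the stall in jiffies only, since converting to seconds depends on the kernel's HZ setting, which is not stated here.

```python
import re

# Field lines look like "TDH <26>". Values are assumed to be hexadecimal.
# "buffer_info[next_to_clean]" has no <...> value, so it never matches.
FIELD_RE = re.compile(
    r"(TDH|TDT|next_to_use|next_to_clean|time_stamp|jiffies)\s+<([0-9a-fA-F]+)>"
)

# One dump copied from the log in this thread.
DUMP = """\
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Tx Queue <0>
TDH <26>
TDT <26>
next_to_use <26>
next_to_clean <39>
buffer_info[next_to_clean]
time_stamp <77145>
next_to_watch <3b>
jiffies <7734f>
next_to_watch.status <0>
"""


def parse_hang_dump(text):
    """Return the hex fields of one Tx Unit Hang dump as integers."""
    fields = {}
    for name, value in FIELD_RE.findall(text):
        # Keep the first occurrence if the text holds several dumps.
        fields.setdefault(name, int(value, 16))
    return fields


def classify(fields):
    """Label a dump using the TDH/TDT rule of thumb from this thread."""
    if fields["TDH"] == fields["TDT"]:
        return "TDH == TDT (the common signature on these systems)"
    return "TDH != TDT (hardware hung processing buffers it was given)"


def stalled_jiffies(fields):
    """Jiffies elapsed between the buffer's time_stamp and the dump."""
    return fields["jiffies"] - fields["time_stamp"]


if __name__ == "__main__":
    info = parse_hang_dump(DUMP)
    print(classify(info))
    print("stalled for", stalled_jiffies(info), "jiffies")
```

Feeding it the dump from the first message of the thread (TDH <3d>, TDT <44>) gives the other label, which is the difference between the two reports being discussed.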

Thanks,
Jesse