2000-12-13 23:32:40

by Joseph Cheek

Subject: test12: eth0 trasmit timed out after one hour uptime

hi all,

after about an hour of uptime [and heavy HD usage] my ethernet just
died. couldn't ping a thing. syslog showed:

Dec 13 14:51:46 sanfrancisco kernel: NETDEV WATCHDOG: eth0: transmit timed out
Dec 13 14:51:46 sanfrancisco kernel: eth0: transmit timed out, tx_status 00 status e680.
Dec 13 14:51:46 sanfrancisco kernel: Flags; bus-master 1, full 1; dirty 3306(10) current 3322(10).
Dec 13 14:51:46 sanfrancisco kernel: Transmit list 00000000 vs. c7c732a0.
Dec 13 14:51:46 sanfrancisco kernel: 0: @c7c73200 length 8000002a status 0001002a
Dec 13 14:51:46 sanfrancisco kernel: 1: @c7c73210 length 8000002a status 0001002a
Dec 13 14:51:46 sanfrancisco kernel: 2: @c7c73220 length 8000002a status 0001002a
Dec 13 14:51:46 sanfrancisco kernel: 3: @c7c73230 length 8000002a status 0001002a
Dec 13 14:51:46 sanfrancisco kernel: 4: @c7c73240 length 8000002a status 0001002a
Dec 13 14:51:46 sanfrancisco kernel: 5: @c7c73250 length 8000002a status 0001002a
Dec 13 14:51:46 sanfrancisco kernel: 6: @c7c73260 length 8000002a status 0001002a
Dec 13 14:51:46 sanfrancisco kernel: 7: @c7c73270 length 8000002a status 0001002a
Dec 13 14:51:46 sanfrancisco kernel: 8: @c7c73280 length 8000002a status 8001002a
Dec 13 14:51:46 sanfrancisco kernel: 9: @c7c73290 length 8000002a status 8001002a
Dec 13 14:51:46 sanfrancisco kernel: 10: @c7c732a0 length 8000004b status 0001004b
Dec 13 14:51:46 sanfrancisco kernel: 11: @c7c732b0 length 8000004b status 0001004b
Dec 13 14:51:46 sanfrancisco kernel: 12: @c7c732c0 length 8000002a status 0001002a
Dec 13 14:51:46 sanfrancisco kernel: 13: @c7c732d0 length 8000002a status 0001002a
Dec 13 14:51:46 sanfrancisco kernel: 14: @c7c732e0 length 8000002a status 0001002a
Dec 13 14:51:46 sanfrancisco kernel: 15: @c7c732f0 length 8000002a status 0001002a

after reboot it works fine again [i'll give it an hour...] test12-pre8
and before worked fine. any ideas?

--
thanks!

joe

--
Joseph Cheek, Sr Linux Consultant, Linuxcare | http://www.linuxcare.com/
Linuxcare. Support for the Revolution. | [email protected]
CTO / Acting PM, Redmond Linux Project | [email protected]
425 990-1072 vox [1074 fax] 206 679-6838 pcs | [email protected]
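
A note for readers decoding the dump in the message above: the
"dirty N(i) current M(j)" pair reads naturally as two free-running
transmit-ring counters, each followed by its slot index in parentheses.
The short Python sketch below computes how many descriptors were
outstanding. It assumes a 16-entry ring (the dump lists descriptors 0
to 15), and it assumes that "dirty" is the oldest descriptor not yet
reclaimed and "current" is the next one to be filled. The function name
and the returned keys are made up for illustration.

```python
import re

# Assumption: the ring has 16 slots, since the dump lists descriptors 0..15.
TX_RING_SIZE = 16

# Matches e.g. "dirty 3306(10) current 3322(10)".
_RING_RE = re.compile(r"dirty\s+(\d+)\((\d+)\)\s+current\s+(\d+)\((\d+)\)")


def parse_ring_state(log_text):
    """Extract the dirty/current counters from a watchdog dump (hypothetical helper)."""
    match = _RING_RE.search(log_text)
    if match is None:
        raise ValueError("no 'dirty N(i) current M(j)' pair found")
    dirty, dirty_slot, current, current_slot = (int(g) for g in match.groups())
    outstanding = current - dirty  # descriptors queued but not yet reclaimed
    return {
        "dirty_slot": dirty_slot,
        "current_slot": current_slot,
        # Sanity check: the printed slot should equal counter modulo ring size.
        "slots_consistent": (dirty % TX_RING_SIZE == dirty_slot
                             and current % TX_RING_SIZE == current_slot),
        "outstanding": outstanding,
        "ring_full": outstanding >= TX_RING_SIZE,
    }


if __name__ == "__main__":
    sample = "Flags; bus-master 1, full 1; dirty 3306(10) current 3322(10)."
    print(parse_ring_state(sample))
```

On the values posted here it reports 16 outstanding descriptors, i.e.
every slot in use, which would be consistent with the "full 1" flag in
the same line.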




2000-12-13 23:42:42

by Mohammad A. Haque

Subject: Re: test12: eth0 trasmit timed out after one hour uptime

I just crossed the 1 day mark. What ethernet card do you have?

Joseph Cheek wrote:
>
> hi all,
>
> after about an hour of uptime [and heavy HD usage] my ethernet just
> died. couldn't ping a thing. syslog showed:

--

=====================================================================
Mohammad A. Haque                              http://www.haque.net/
                                               [email protected]

  "Alcohol and calculus don't mix.             Project Lead
   Don't drink and derive." --Unknown          http://wm.themes.org/
                                               [email protected]
=====================================================================

2000-12-13 23:45:02

by Joseph Cheek

Subject: Re: test12: eth0 trasmit timed out after one hour uptime

00:0e.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone]
(rev 30)

i've been doing a ton of compiling which has thrashed the [IDE] HD,
perhaps it is related. other than that, just normal web surfing...

"Mohammad A. Haque" wrote:

> I just crossed the 1 day mark. What ethernet card do you have?
>
> Joseph Cheek wrote:
> >
> > hi all,
> >
> > after about an hour of uptime [and heavy HD usage] my ethernet just
> > died. couldn't ping a thing. syslog showed:
>
> --
>
> =====================================================================
> Mohammad A. Haque                              http://www.haque.net/
>                                                [email protected]
>
>   "Alcohol and calculus don't mix.             Project Lead
>    Don't drink and derive." --Unknown          http://wm.themes.org/
>                                                [email protected]
> =====================================================================

--
thanks!

joe

--
Joseph Cheek, Sr Linux Consultant, Linuxcare | http://www.linuxcare.com/
Linuxcare. Support for the Revolution. | [email protected]
CTO / Acting PM, Redmond Linux Project | [email protected]
425 990-1072 vox [1074 fax] 206 679-6838 pcs | [email protected]



2000-12-13 23:56:14

by Mohammad A. Haque

Subject: Re: test12: eth0 trasmit timed out after one hour uptime

At first I thought my lockups were HD i/o related also, but then the last
lockup I had happened a while after I thrashed my disk, while I was
grabbing email (ppp link).

Joseph Cheek wrote:
>
> 00:0e.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone]
> (rev 30)
>
> i've been doing a ton of compiling which has thrashed the [IDE] HD,
> perhaps it is related. other than that, just normal web surfing...

--

=====================================================================
Mohammad A. Haque                              http://www.haque.net/
                                               [email protected]

  "Alcohol and calculus don't mix.             Project Lead
   Don't drink and derive." --Unknown          http://wm.themes.org/
                                               [email protected]
=====================================================================

2000-12-14 01:56:04

by Michael Peddemors

Subject: Re: test12: eth0 trasmit timed out after one hour uptime

I wasted time trying to track something similar down, replaced the card
instead :> My first clue was when smacking the box, it started working
again... (j/k)

You didna' mention the card type ...

On Wed, 13 Dec 2000, Joseph Cheek wrote:
> hi all,
>
> after about an hour of uptime [and heavy HD usage] my ethernet just
> died. couldn't ping a thing. syslog showed:
>
> Dec 13 14:51:46 sanfrancisco kernel: NETDEV WATCHDOG: eth0: transmit timed out
> Dec 13 14:51:46 sanfrancisco kernel: eth0: transmit timed out, tx_status 00 status e680.

--
--------------------------------------------------------
Michael Peddemors - Senior Consultant
Unix Administration - WebSite Hosting
Network Services - Programming
Wizard Internet Services http://www.wizard.ca
Linux Support Specialist - http://www.linuxmagic.com
--------------------------------------------------------
(604) 589-0037 Beautiful British Columbia, Canada
--------------------------------------------------------

2000-12-14 01:59:48

by Michael Peddemors

Subject: Re: test12: eth0 trasmit timed out after one hour uptime

On Wed, 13 Dec 2000, Joseph Cheek wrote:
> 00:0e.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone]
> (rev 30)

Hmm, maybe I threw it out too fast.. 3Com905B -TXNM XL PCI SN=6xb1b85caf

--
--------------------------------------------------------
Michael Peddemors - Senior Consultant
Unix Administration - WebSite Hosting
Network Services - Programming
Wizard Internet Services http://www.wizard.ca
Linux Support Specialist - http://www.linuxmagic.com
--------------------------------------------------------
(604) 589-0037 Beautiful British Columbia, Canada
--------------------------------------------------------

2000-12-14 02:12:24

by Joseph Cheek

Subject: Re: test12: eth0 trasmit timed out after one hour uptime

well, it seems to be working fine now, so i guess it was a fluke.

Michael Peddemors wrote:

> On Wed, 13 Dec 2000, Joseph Cheek wrote:
> > 00:0e.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone]
> > (rev 30)
>
> Hmm, maybe I threw it out too fast.. 3Com905B -TXNM XL PCI SN=6xb1b85caf

thanks!

joe

--
Joseph Cheek, Sr Linux Consultant, Linuxcare | http://www.linuxcare.com/
Linuxcare. Support for the Revolution. | [email protected]
CTO / Acting PM, Redmond Linux Project | [email protected]
425 990-1072 vox [1074 fax] 206 679-6838 pcs | [email protected]



2000-12-14 02:30:48

by James Stevenson

Subject: Re: test12: eth0 trasmit timed out after one hour uptime

Hi

i may have also had some problems with this
when i was connected to the net through ppp (most of the night).
so far, in about 6 hours, it has stopped transmitting 2 times, but it
still receives. it is fine after i disconnect and reconnect. i will try
to get it to stop working with heavy disk io.
BTW this is all under 2.2.18, and i never had any problem with the isp
over the past month or so.


In local.linux-kernel-list, you wrote:
>At first I thought my lockups were HD i/o related also, but then the last
>lockup I had happened a while after I thrashed my disk, while I was
>grabbing email (ppp link).
>
>Joseph Cheek wrote:
>>
>> 00:0e.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone]
>> (rev 30)
>>
>> i've been doing a ton of compiling which has thrashed the [IDE] HD,
>> perhaps it is related. other than that, just normal web surfing...
>
>--
>
>


--
---------------------------------------------
Check Out: http://stev.org
E-Mail: [email protected]
1:50am up 1 day, 11:05, 7 users, load average: 0.11, 0.09, 0.03

2000-12-15 14:50:53

by Ingo Oeser

Subject: Re: test12: eth0 trasmit timed out after one hour uptime

On Wed, Dec 13, 2000 at 03:01:29PM -0800, Joseph Cheek wrote:
> Dec 13 14:51:46 sanfrancisco kernel: NETDEV WATCHDOG: eth0: transmit timed out
> Dec 13 14:51:46 sanfrancisco kernel: eth0: transmit timed out, tx_status 00 status e680.
> Dec 13 14:51:46 sanfrancisco kernel: Flags; bus-master 1, full 1; dirty 3306(10) current 3322(10).
> Dec 13 14:51:46 sanfrancisco kernel: Transmit list 00000000 vs. c7c732a0.
> Dec 13 14:51:46 sanfrancisco kernel: 0: @c7c73200 length 8000002a status 0001002a
> Dec 13 14:51:46 sanfrancisco kernel: 1: @c7c73210 length 8000002a status 0001002a
> Dec 13 14:51:46 sanfrancisco kernel: 2: @c7c73220 length 8000002a status 0001002a
> Dec 13 14:51:46 sanfrancisco kernel: 3: @c7c73230 length 8000002a status 0001002a
> Dec 13 14:51:46 sanfrancisco kernel: 4: @c7c73240 length 8000002a status 0001002a
> Dec 13 14:51:46 sanfrancisco kernel: 5: @c7c73250 length 8000002a status 0001002a
> Dec 13 14:51:46 sanfrancisco kernel: 6: @c7c73260 length 8000002a status 0001002a
> Dec 13 14:51:46 sanfrancisco kernel: 7: @c7c73270 length 8000002a status 0001002a
> Dec 13 14:51:46 sanfrancisco kernel: 8: @c7c73280 length 8000002a status 8001002a
> Dec 13 14:51:46 sanfrancisco kernel: 9: @c7c73290 length 8000002a status 8001002a
> Dec 13 14:51:46 sanfrancisco kernel: 10: @c7c732a0 length 8000004b status 0001004b
> Dec 13 14:51:46 sanfrancisco kernel: 11: @c7c732b0 length 8000004b status 0001004b
> Dec 13 14:51:46 sanfrancisco kernel: 12: @c7c732c0 length 8000002a status 0001002a
> Dec 13 14:51:46 sanfrancisco kernel: 13: @c7c732d0 length 8000002a status 0001002a
> Dec 13 14:51:46 sanfrancisco kernel: 14: @c7c732e0 length 8000002a status 0001002a
> Dec 13 14:51:46 sanfrancisco kernel: 15: @c7c732f0 length 8000002a status 0001002a

I have had this too, ever since the testX kernels were released.

I use a "3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 24)"
(actually two of them ;-)).

> after reboot it works fine again [i'll give it an hour...] test12-pre8
> and before worked fine. any ideas?

This seems to be code to debug these timeouts.

It didn't cause any harm AFAICS, but I CC'ed the author of this
code anyway.

Regards

Ingo Oeser
--
10.+11.03.2001 - 3. Chemnitzer LinuxTag <http://www.tu-chemnitz.de/linux/tag>
<<<<<<<<<<<< come and join the fun >>>>>>>>>>>>

2000-12-15 15:18:30

by Andrew Morton

Subject: Re: test12: eth0 trasmit timed out after one hour uptime

Ingo Oeser wrote:
>
> On Wed, Dec 13, 2000 at 03:01:29PM -0800, Joseph Cheek wrote:
> > Dec 13 14:51:46 sanfrancisco kernel: NETDEV WATCHDOG: eth0: transmit timed out
> ...
> I have this too since testX-Kernels are released.
>
> I use a "3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 24)"
> (actually two of them ;-)).
>
> > after reboot it works fine again [i'll give it an hour...] test12-pre8
> > and before worked fine. any ideas?
>
> This seems to be code to debug these timeouts.
>
> It didn't cause any harm AFAICS, but I CC'ed the author of this
> code anyway.

Ingo,

Donald wrote just about all the Linux netdrivers, but he
now concentrates upon the drivers which he maintains at
http://www.scyld.com. Other people try to help out with the
drivers which come from kernel.org.

This particular problem does still occur occasionally.

It's way too infrequent to pin down. It can certainly
be caused by a very high collision rate on a hubbed LAN.
If that were the only cause I would take all the diagnostics
out, because that's simply ethernet.

Other possible causes are lost interrupts in the kernel
or hardware, cabling problems, power supply problems or,
indeed, a driver bug.

If you are able to reproduce this then I'd be very interested
in working with you on it. First step is to read the final
section of Documentation/networking/vortex.txt, then send
me a long email.

Thanks.
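
One way to check the collision-rate theory mentioned above is to compare
the collision counter with the transmitted-packet counter for the
interface. The Python sketch below is a minimal illustration, assuming
the usual /proc/net/dev layout of 8 receive columns followed by 8
transmit columns (packets in the 2nd transmit column, colls in the 6th).
The function name is hypothetical, and the column positions should be
checked against the header line on the kernel in question.

```python
def collision_ratio(proc_net_dev_text, iface):
    """Return collisions per transmitted packet for iface (hypothetical helper).

    Assumes 16 numeric fields after the "iface:" prefix: 8 receive
    counters followed by 8 transmit counters.
    """
    for line in proc_net_dev_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or name.strip() != iface:
            continue  # header lines have no colon; skip other interfaces
        fields = rest.split()
        if len(fields) < 16:
            raise ValueError("unexpected /proc/net/dev layout: %r" % line)
        tx_packets = int(fields[9])   # 2nd transmit column
        collisions = int(fields[13])  # 6th transmit column ("colls")
        return collisions / tx_packets if tx_packets else 0.0
    raise KeyError(iface)


if __name__ == "__main__":
    # Reads the live counters; eth0 is the interface named in this thread.
    with open("/proc/net/dev") as handle:
        print(collision_ratio(handle.read(), "eth0"))
```

Sampling this twice, some minutes apart, and comparing the change in
both counters says more than a single reading, since the counters
accumulate from boot.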