2000-11-28 04:53:19

by Toby Jaffey

Subject: test-10 tulip "eth0 timed out" (smp, heavy IDE use)

Using 2.4-test10 I got a series of timeout errors on my tulip network
card (Linksys LNE version 4, 00:0d.0 Ethernet controller: Bridgecom,
Inc: Unknown device 0985 (rev 11)). Networking then completely stopped
working. Restarting the interface with ifconfig fixed the problem.

I am using an SMP kernel, compiled with gcc 2.95.2 on an ABit BP6 (dual
celeron 500s, 128mb, PIIX4 using DMA).

[log extract]
Nov 28 04:04:52 twoey kernel: NETDEV WATCHDOG: eth0: transmit timed out
Nov 28 04:04:52 twoey kernel: eth0: Transmit timed out, status fc664010,
CSR12 00000000, resetting...
[end]

This has only happened once so far, and I have not been able to reproduce
the problem. At the time, I was simultaneously using two CD drives to rip
audio CDs with DMA turned on, and both processors were fully loaded.
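
Since the timeout has been seen only once, counting watchdog lines in the
system log may help show whether it recurs under similar load. The following
is a minimal sketch, not part of the original report: the function name is
hypothetical, and it assumes the log lines look like the extract above.

```python
import re

# Matches the watchdog line format shown in the log extract above.
# Assumption: other kernels or drivers may word this message differently.
WATCHDOG_RE = re.compile(r"NETDEV WATCHDOG: (\w+): transmit timed out")


def count_watchdog_timeouts(lines):
    """Return {interface: count} of NETDEV WATCHDOG timeouts in log lines."""
    counts = {}
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            iface = match.group(1)
            counts[iface] = counts.get(iface, 0) + 1
    return counts
```

It takes any iterable of lines, so an open log file can be passed directly;
where the kernel log lives (for example /var/log/messages) depends on the
syslog setup and is an assumption here.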

--
http://www.nott.ac.uk/~psystrj ..::::::::::::::::::::::::::::::::::::::::::::::::.
/\_./o__ ....:::::::::' Mescaline, the only way to fly. '::::
(/^/(_^~' ''::::: ::::
___.(_.)____ ':::::::::::::::::::::::::::::::::::::::::::::::::::


2000-11-28 05:09:38

by Andre Hedrick

Subject: Re: test-10 tulip "eth0 timed out" (smp, heavy IDE use)


Toby,

Nothing can be done without a full rewrite of the driver.
The global io_request_lock is slammed into play way too early and released
way too late. You will have to suffer with this flaw until the spin of
2.5. We are talking about a rewrite that is still at least 60 days from
being ready for me to test.

I still have to draft the recovery tools to mirror the disks against data
loss in the pre-alpha tests.

On Tue, 28 Nov 2000, Toby Jaffey wrote:

> Using 2.4-test10 I got a series of timeout errors on my tulip network
> card (Linksys LNE version 4, 00:0d.0 Ethernet controller: Bridgecom,
> Inc: Unknown device 0985 (rev 11)). Networking then completely stopped
> working. Restarting the interface with ifconfig fixed the problem.
>
> I am using an SMP kernel, compiled with gcc 2.95.2 on an ABit BP6 (dual
> celeron 500s, 128mb, PIIX4 using DMA).
>
> [log extract]
> Nov 28 04:04:52 twoey kernel: NETDEV WATCHDOG: eth0: transmit timed out
> Nov 28 04:04:52 twoey kernel: eth0: Transmit timed out, status fc664010,
> CSR12 00000000, resetting...
> [end]
>
> This has only happened once so far, and I have not been able to reproduce
> the problem. At the time, I was simultaneously using two CD drives to rip
> audio CDs with DMA turned on, and both processors were fully loaded.
>
> --
> http://www.nott.ac.uk/~psystrj ..::::::::::::::::::::::::::::::::::::::::::::::::.
> /\_./o__ ....:::::::::' Mescaline, the only way to fly. '::::
> (/^/(_^~' ''::::: ::::
> ___.(_.)____ ':::::::::::::::::::::::::::::::::::::::::::::::::::

Andre Hedrick
CTO Timpanogas Research Group
EVP Linux Development, TRG
Linux ATA Development

2000-11-28 06:18:52

by Jacob Luna Lundberg

Subject: Re: test-10 tulip "eth0 timed out" (smp, heavy IDE use)


> > Linksys LNE version 4, 00:0d.0 Ethernet controller: Bridgecom, Inc:
> > Unknown device 0985 (rev 11)
[...]
> > Nov 28 04:04:52 twoey kernel: NETDEV WATCHDOG: eth0: transmit timed out
> > Nov 28 04:04:52 twoey kernel: eth0: Transmit timed out, status fc664010,
> > CSR12 00000000, resetting...

I can replicate this message any day you want. It seems that this card is
perhaps a bit too sensitive to high interrupt latencies or something to
that effect. Dan Hollis worked on my box for several days and we found
that the problem tends to trigger (in my case) when NFS is in use. But I
still haven't had time to explore further. :(

Dan tells me the chip in question is a Centaur so I presume eventually
kernels will identify it correctly once somebody adds it to the list. :)

-Jacob

--

" ... mutant DEC .au files ... "

-http://ocean.hhardy.net/ftp/systems/linux/snd/Lsox/Sox/
[1999.09.22 - sorry, link is dead nowadays]

2000-11-28 10:23:25

by Lukasz Trabinski

Subject: Re: test-10 tulip "eth0 timed out" (smp, heavy IDE use)

In article <20001128042134.A1041@twoey> you wrote:

> Using 2.4-test10 I got a series of timeout errors on my tulip network
> card (Linksys LNE version 4, 00:0d.0 Ethernet controller: Bridgecom,
> Inc: Unknown device 0985 (rev 11)). Networking then completely stopped
> working. Restarting the interface with ifconfig fixed the problem.

> I am using an SMP kernel, compiled with gcc 2.95.2 on an ABit BP6 (dual
> celeron 500s, 128mb, PIIX4 using DMA).

> [log extract]
> Nov 28 04:04:52 twoey kernel: NETDEV WATCHDOG: eth0: transmit timed out
> Nov 28 04:04:52 twoey kernel: eth0: Transmit timed out, status fc664010,
> CSR12 00000000, resetting...

I have the same problem with SMSC EPIC/100 83c170 Ethernet controller:

NETDEV WATCHDOG: eth0: transmit timed out
eth0: Transmit timeout using MII device, Tx status 0003.
eth0: Restarting the EPIC chip, Rx 4568454/4568454 Tx 6262613/6262623.
eth0: epic_restart() done, cmd status 000a, ctl 0512 interrupt 240000.

kernel 2.4.0-test11
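
To compare restart messages across incidents, the counter pairs can be pulled
out of the "Restarting the EPIC chip" line. This is only a sketch: it assumes
each pair holds two ring counters whose difference is the number of entries
still outstanding, which nothing in this thread confirms, and the function
name is hypothetical.

```python
import re

# Matches the "Rx a/b Tx c/d" counters in the restart message quoted above.
RESTART_RE = re.compile(r"Rx (\d+)/(\d+) Tx (\d+)/(\d+)")


def ring_gaps(line):
    """Return the Rx and Tx counter differences, or None if the line does not match.

    Assumption: each pair is two ring counters, so the absolute difference
    approximates the number of outstanding entries.
    """
    match = RESTART_RE.search(line)
    if match is None:
        return None
    rx_a, rx_b, tx_a, tx_b = map(int, match.groups())
    return {"rx_gap": abs(rx_a - rx_b), "tx_gap": abs(tx_a - tx_b)}
```

For the message above, the Rx pair is equal and the Tx pair differs by 10,
which under that assumption would mean ten transmit entries were pending when
the watchdog fired.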

--
*[ Łukasz Trąbiński ]*
SysAdmin @wsisiz.edu.pl

2000-11-28 16:39:47

by Francois romieu

Subject: Re: test-10 tulip "eth0 timed out" (smp, heavy IDE use)

Hi,

On Tue, Nov 28, 2000 at 10:52:46AM +0100, Lukasz Trabinski wrote:
[...]
> I have the same problem with SMSC EPIC/100 83c170 Ethernet controller:
>
> NETDEV WATCHDOG: eth0: transmit timed out
> eth0: Transmit timeout using MII device, Tx status 0003.
> eth0: Restarting the EPIC chip, Rx 4568454/4568454 Tx 6262613/6262623.
> eth0: epic_restart() done, cmd status 000a, ctl 0512 interrupt 240000.

Could you describe a way to reproduce it or be more specific regarding
the hardware/load/whatever ?

--
Ueimor

2000-11-28 16:54:10

by Lukasz Trabinski

Subject: Re: test-10 tulip "eth0 timed out" (smp, heavy IDE use)

On Tue, 28 Nov 2000, Francois romieu wrote:

> > NETDEV WATCHDOG: eth0: transmit timed out
> > eth0: Transmit timeout using MII device, Tx status 0003.
> > eth0: Restarting the EPIC chip, Rx 4568454/4568454 Tx 6262613/6262623.
> > eth0: epic_restart() done, cmd status 000a, ctl 0512 interrupt 240000.
>
> Could you describe a way to reproduce it or be more specific regarding
> the hardware/load/whatever ?

Well, it happens under heavy network load, for example when transferring
big files (ISO images) over FTP or sending backup files with Amanda to
another machine on the same network.

Linux RedHat 7.0+upgrades (glibc 2.2)

[root@mask /root]# procinfo
Linux 2.4.0-test11 ([email protected]) (gcc 2.96 20000731 ) #1 1CPU
[mask]
Memory:      Total        Used        Free      Shared     Buffers      Cached
Mem:        255804      230416       25388           0        2180       96472
Swap:       311300           4      311296

Bootup: Sat Nov 25 23:20:16 2000    Load average: 0.15 0.32 0.34 1/112 17056

user : 5:13:02.51 7.9% page in : 12874041
nice : 0:02:59.64 0.1% page out: 2702219
system: 2:42:42.61 4.1% swap in : 2
idle : 2d 9:57:42.91 87.9% swap out: 1
uptime: 2d 17:56:27.67 context : 36996214

irq 0: 23738767 timer irq 7: 13183
irq 1: 2 keyboard irq 10: 11915007 eth0
irq 2: 0 cascade [4] irq 14: 1344730 ide0
irq 4: 106 serial irq 15: 406003 ide1
irq 6: 3


[root@mask /root]# cat /proc/pci
PCI devices found:
Bus 0, device 0, function 0:
Host bridge: Intel Corporation 440BX/ZX - 82443BX/ZX Host bridge (rev 3).
Master Capable. Latency=64.
Prefetchable 32 bit memory at 0xe0000000 [0xe7ffffff].
Bus 0, device 1, function 0:
PCI bridge: Intel Corporation 440BX/ZX - 82443BX/ZX AGP bridge (rev 3).
Master Capable. Latency=64. Min Gnt=137.
Bus 0, device 7, function 0:
ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 2).
Bus 0, device 7, function 1:
IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 1).
Master Capable. Latency=64.
I/O at 0xf000 [0xf00f].
USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 1).
IRQ 11.
Master Capable. Latency=64.
I/O at 0x6400 [0x641f].
Bus 0, device 7, function 3:
Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 2).
Bus 0, device 12, function 0:
Ethernet controller: Standard Microsystems Corp [SMC] 83C170QF (rev 6).
IRQ 10.
Master Capable. Latency=64. Min Gnt=8.Max Lat=28.
I/O at 0x6800 [0x68ff].
Non-prefetchable 32 bit memory at 0xe8000000 [0xe8000fff].
Bus 1, device 0, function 0:
VGA compatible controller: S3 Inc. ViRGE/GX2 (rev 6).
IRQ 9.
Master Capable. Latency=64. Min Gnt=4.Max Lat=255.
Non-prefetchable 32 bit memory at 0xd8000000 [0xdbffffff].

[root@mask /root]# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 5
model name : Pentium II (Deschutes)
stepping : 2
cpu MHz : 350.000800
cache size : 512 KB


I have just found another strange message on this machine:

Attempt to read inode for relocated directory

What does it mean?

--
*[ Łukasz Trąbiński ]*
SysAdmin @wsisiz.edu.pl