2000-12-22 17:53:04

by Stefan Hoffmeister

Subject: rtl8139 driver broken? (2.2.16)


[please CC replies; I am not on the list]

I have a 2.2.16 kernel on an HP Omnibook 800 CT with docking station. That
docking station contains an Allied Telesyn 2500TX NIC, identified by lspci
as "Realtek Semiconductor Co., Ltd. RTL-8139 (rev 10)". Versions 1.07
(RedHat 7.0) and 1.08 (SuSE 7.0) exhibit the same behaviour.

The network is set up correctly - I can ping 127.0.0.1 without problems,
but the connection to the external network simply "stops working" after a
while as soon as I do something more exciting.

Examples of failure when the rtl8139 driver is used:

ping 192.168.0.55
// (sometimes) works for ages

ping -s 5000 192.168.0.55
// makes network die almost instantaneously
// after that, all outbound traffic just does
// not get through

ftp 192.168.0.55
binary
get <largish file>
// makes network die almost instantaneously

The network is resurrected by /etc/rc.d/[init.d/]network restart. This
happens with the stock kernel + modules shipped with RedHat 7.0, with a
default SuSE 7.0 setup, and with a self-built kernel + modules based on
SuSE 7.0.

I do not see any kernel messages indicating any failure anywhere; only on
boot do I get "neighbour table overflow" (4x), but the NIC works
nevertheless.

When I insert a PCMCIA NIC and use the network over *that* card,
everything works fine forever ("ifconfig eth0 down", then "ifconfig eth1
up").

Fiddling with BIOS options (PnP, PCI bridge configuration) does not seem
to have any effect.

Questions:
* Is the rtl8139 driver broken?
* Is there some kind of problem with the docking station (bridge)?

FWIW, this combination ran perfectly fine with Windows NT4 SP3...


2000-12-22 19:03:07

by Alan

Subject: Re: rtl8139 driver broken? (2.2.16)

> Questions:
> * Is the rtl8139 driver broken?

Somewhat, especially in kernels that old

2.2.18 might help and also has an '8139too' driver rewrite which may work

2000-12-23 06:03:37

by Stefan Hoffmeister

Subject: Re: rtl8139 driver broken? (2.2.16)

: On Fri, 22 Dec 2000 18:34:46 +0000 (GMT), Alan Cox wrote:

>> Questions:
>> * Is the rtl8139 driver broken?
>
>Somewhat, especially in kernels that old

We do live on Internet time, I know, but has it gotten that fast that the
latest distributions ship with kernels "that old" ;-)

>2.2.18 might help and also has an '8139too' driver rewrite which may work

I am now running SuSE 7.0 and have replaced the supplied (2.2.16) kernel
with a stock 2.2.18 kernel. The situation has improved, but in the end my
problems persist.

The rtl8139 driver continues to exhibit the old problem: ping with large
packet sizes kills off the external connection. The connection can be
resurrected by "ifconfig eth0 down" followed by "ifconfig eth0 up".

The 8139too driver handles large packet sizes *much* more gracefully; I
now can "ping -s 5000 192.168.0.55" - which gets me 100% packet loss -,
but a "ping -s 500 192.168.0.55" works perfectly (0% packet loss) even
after doing that.

The major problem I have is that FTP'ing to the machine slows down to a
*major* crawl at random (?) places. Example:

192.168.0.77 = Realtek 8139 (8139too) 2.2.18, in docking station
192.168.0.55 = Intel eepro100 (eepro100, 2.2.16, SuSE 7.0)

From 192.168.0.55:

ftp 192.168.0.77
binary
put <file> // 3 MB

* Up to 40% transfer all seems to work smoothly (varies)
* Transfer stalled (for 30-35 seconds)
* Sudden burst of activity up to 65% (all smooth)
* Transfer stalled (for 3-4 minutes!)
* Sudden burst of activity up to 100% (all smooth) (varies)

Data transferred successfully; but 3390679 bytes sent in 05:12 (10.59
KB/s)? This cannot be right. I got 730 KB/s when transferring the pristine
2.2.18 kernel over the 10 MBit/s PCMCIA NIC.

All this is on a 100 MBit network over a hub; there is no network traffic
except the FTP.
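
To put the figure above in perspective, here is a minimal sketch (not part
of the original report) that recomputes the rate from the byte count and
the MM:SS time printed by the ftp client, and compares it with the raw
100 MBit/s line rate. It assumes 1 KB = 1024 bytes and ignores all
protocol overhead; the function names are made up for illustration. Since
the printed time is rounded to whole seconds, the result only
approximately matches what ftp reported.

```python
def parse_mmss(text):
    """Convert an ftp-style 'MM:SS' duration to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def rate_kb_per_s(num_bytes, mmss):
    """Average transfer rate in KB/s (1 KB = 1024 bytes, an assumption)."""
    return num_bytes / parse_mmss(mmss) / 1024


def link_utilisation(num_bytes, mmss, link_mbit=100):
    """Fraction of the raw link rate used, ignoring protocol overhead."""
    bits_per_s = num_bytes * 8 / parse_mmss(mmss)
    return bits_per_s / (link_mbit * 1_000_000)


if __name__ == "__main__":
    # Figures from the stalled transfer: 3390679 bytes in 05:12.
    print("%.2f KB/s" % rate_kb_per_s(3390679, "05:12"))
    print("%.3f%% of 100 MBit/s" % (100 * link_utilisation(3390679, "05:12")))
```

Under these assumptions the stalled transfer uses well below 1% of what
the wire can carry.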

I get a massive amount of log entries:

kernel: eth0: Abnormal interrupt, status 00000041
<repeated many times>

Interspersed are single occurrences of

kernel: eth0: Abnormal interrupt, status 00000040
and
kernel: eth0: Abnormal interrupt, status 00000045

AFAICT, these are directly related to the FTP transfer: I connect to the
machine, get the log entry, start FTP'ing, and get all this mess in
messages.
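
The status words can be split into individual bits. The sketch below is
not from the driver; the bit names are assumed to match the
IntrStatusBits enum in 8139too.c and should be checked against the
driver source actually in use.

```python
# Bit names ASSUMED from the IntrStatusBits enum in 8139too.c;
# verify against your driver source before relying on them.
INTR_STATUS_BITS = {
    0x8000: "PCIErr",
    0x4000: "PCSTimeout",
    0x0040: "RxFIFOOver",
    0x0020: "RxUnderrun",
    0x0010: "RxOverflow",
    0x0008: "TxErr",
    0x0004: "TxOK",
    0x0002: "RxErr",
    0x0001: "RxOK",
}


def decode_status(status):
    """Return the names of all known bits set in a status word, lowest bit first."""
    return [name for bit, name in sorted(INTR_STATUS_BITS.items())
            if status & bit]


if __name__ == "__main__":
    # The three values seen in the log above.
    for logged in ("00000041", "00000040", "00000045"):
        print(logged, decode_status(int(logged, 16)))
```

If that bit layout is right, all three logged values have the receive
FIFO overflow bit (0x40) set, which would suggest that incoming data is
not being moved out of the chip's FIFO quickly enough.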

Some more digging seems to indicate the following:

ping -s 4500 192.168.0.55

gets some major packet loss. For each packet lost, I get one of these
"Abnormal interrupt" in the log.

I notice that "Version 0.9.10" of 8139too ships with the 2.2.18 kernel;
there is a "Version 0.9.12 - November 23, 2000" on
http://sourceforge.net/projects/gkernel/
but that won't compile with the 2.2.18 kernel (as advertised in the
readme)

The changelog of that new version contains (amongst others) these entries

* Kill major Tx stop/wake queue race
* Replace timer with kernel thread for twister tuning state machine
and media checking. Fixes mdio_xxx locking, now mdio_xxx is always
protected by rtnl_lock semaphore.
* Sanity check Rx packet status and size (Tobias)
* When handling a Tx timeout, disable Tx ASAP if not already.
* Do not abort Rx processing on lack of memory, keep going
until the current Rx ring is completely handled. (Tobias)

I have no clue what all this means, and neither am I competent to backport
the diffs to 2.2.18, but I sure would be curious to learn whether any of
the above changes could make a difference to the problem I observe?
http://www.afthd.tu-darmstadt.de/~dg1kjd/stuff/ (Jens David, apparently
2.2 backport maintainer) does not have any updates.

Any ideas? Any leads on how to proceed?

TIA!

2000-12-23 18:21:01

by Stefan Hoffmeister

Subject: 8139too driver broken? (2.4-test12) - Was: Re: rtl8139 driver broken? (2.2.16)

: On Fri, 22 Dec 2000 18:34:46 +0000 (GMT), Alan Cox wrote:

>2.2.18 might help and also has an '8139too' driver rewrite which may work

Advancing further to a 2.4-test12 kernel (with the latest available
8139too driver - 0.9.12) improves the situation even further, but doesn't
solve it.

I still cannot "ping -s 5000 192.168.0.55" from the 8139too machine, but
at least I can now FTP the 2.4-test12 kernel (18 MB) from another machine
to the 8139 target without any stalls. The rather major problem that
remains is performance.

As before:
* 192.168.0.77 = 8139too (2.4-test12, SuSE 7.0);
HP Omnibook 800; P133; 48 MB

* 192.168.0.55 = eepro100 (2.2.16, SuSE 7.0);
Sony VAIO Z600NE, PIII 650, 256 MB

FTP from 192.168.0.55 (eepro100) to 192.168.0.77 (8139too):

put linux-2.4.0-test12.tar.bz2
18975167 bytes sent in 00:48 (385.71 KB/s)

get linux-2.4.0-test12.tar.bz2
18975167 bytes received in 00:05 (3.18 MB/s)

FTP from 192.168.0.77 (8139too) to 192.168.0.55 (eepro100):

get linux-2.4.0-test12.tar.bz2
18975167 bytes received in 00:34 (530.39 KB/s)

put linux-2.4.0-test12.tar.bz2
18975167 bytes sent in 00:05 (3.13 MB/s)


IOW, when the 8139too driver itself *delivers* data (put initiated from
192.168.0.77, get initiated from 192.168.0.55), I get 3 MB/s throughput,
pretty much what I would expect.

When the 8139too driver gets data pushed down its throat (get initiated
from 192.168.0.77, put initiated from 192.168.0.55), then performance
plainly does not exist.

Any ideas?

TIA!

2000-12-26 04:44:14

by Stefan Hoffmeister

Subject: Re: 8139too driver broken? (2.4-test12) - Was: Re: rtl8139 driver broken? (2.2.16)

: On Sat, 23 Dec 2000 18:50:53 +0100, Stefan Hoffmeister wrote:

>The rather major problem that
>remains is performance.

In case someone is interested...

Windows 2000 SP1 now has the Realtek 8139 (Celeron 433, 192 MB, pure
SCSI); drivers as shipped with W2K. Using a 40 MB test file over FTP, I
get

Realtek card sends with 3.5 MB/s
Realtek card receives with 5 MB/s

The system that previously contained the 8139 card now has a (10 MBit)
8029 card - transfer rates with that card are about 850 KB/s, compared to
the 400KB/s to 530 KB/s with the (100 MBit) 8139 card.

This makes me conclude that there is some pretty serious problem left in
the 8139too driver.

2000-12-29 02:53:56

by Stefan Hoffmeister

Subject: NIC + PCI busmaster problems? (2.2, 2.4) - Was: Re: 8139too driver broken? (2.4-test12)

: On Sat, 23 Dec 2000 18:50:53 +0100, Stefan Hoffmeister wrote:

>: On Fri, 22 Dec 2000 18:34:46 +0000 (GMT), Alan Cox wrote:
>
>>2.2.18 might help and also has an '8139too' driver rewrite which may work
>
>Advancing further to a 2.4-test12 kernel (with the latest available
>8139too driver - 0.9.12) improves the situation even further, but doesn't
>solve it.

Cool, I am talking to myself again.

Is it possible that problems in busmastering support cause these problems
(2.2 + 2.4)?

There has been a bit of "swap the nic" fun, some work with Manfred on the
Realtek, and it seems as if the Realtek 8139 drivers are not to blame,
because a 3com 905C-TX exhibits even worse problems in the same system,
while both the Realtek 8139 and the 3com 905C-TX perform fine (8 MB/s)
when dropped into a different system.

Due to that, I believe that the Realtek itself is not to blame, but the
system it is stuck in :-)

HP Omnibook 800, P133, 48 MB, in the docking station

00:00.0 Host bridge: VLSI Technology Inc 82C535 (rev 03)
00:01.0 PCI bridge: VLSI Technology Inc 82C534 (rev 03)
00:02.0 Class ff00: VLSI Technology Inc 82C532 (rev 02)

Complete lspci -vv at end of message.

In total, I have tried three different NICs (Realtek 8029(AS), Realtek
8139B, 3com 905C-TX). Of these, only the Realtek 8029(AS) performs as
expected:

* Realtek 8029(AS), ne2k-pci:
1100+-1 KB/s send; 1000+-30 KB/s receive (netperf)
"ping -s 65000" works

but

* Realtek 8139B, 8139too:
3500+-10 KB/s send; 1300 +-400(!) KB/s receive (netperf)
"ping -s 4433" (>3 packets) - 100% packet loss at the Realtek

* 3com 905C-TX, 3c59x:
3500+-10 KB/s send; 400 (!) +-300(!) KB/s receive (netperf)
"ping -s 2593" (>2 packets) - 100% packet loss at the 3com 905C-TX

* 3com 905C-TX, 3c90x:
3500+-10 KB/s send; 400 (!) +-300(!) KB/s receive (netperf)
"ping -s 3300" (>2.5 packets) - 100% packet loss at the 3com 905C-TX
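
The packet counts quoted for the ping sizes can be reproduced with a
little arithmetic. This is a minimal sketch, assuming a standard Ethernet
MTU of 1500 bytes, a 20-byte IPv4 header without options and an 8-byte
ICMP echo header; the function name is made up for illustration.

```python
import math

MTU = 1500          # assumed standard Ethernet MTU
IP_HEADER = 20      # IPv4 header without options (assumption)
ICMP_HEADER = 8     # ICMP echo request header


def fragments_for_ping(payload_size, mtu=MTU):
    """Number of IP fragments needed to carry 'ping -s payload_size'."""
    # Data carried by each fragment must be a multiple of 8 bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil((payload_size + ICMP_HEADER) / per_fragment)


if __name__ == "__main__":
    # Payload sizes mentioned in this thread.
    for size in (500, 2593, 3300, 4432, 4433, 5000, 65000):
        print("ping -s %5d -> %2d fragment(s)" % (size, fragments_for_ping(size)))
```

Under these assumptions 4433 bytes is the smallest payload that needs a
fourth fragment, which fits the ">3 packets" remark for the Realtek 8139B.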

I find it interesting that only the card that *doesn't* do busmastering
(Realtek 8029(AS), according to lspci -vv) performs in an acceptable
manner.

Could busmastering problems be responsible for this?

TIA!
Stefan

**********************************
00:00.0 Host bridge: VLSI Technology Inc 82C535 (rev 03)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0

00:01.0 PCI bridge: VLSI Technology Inc 82C534 (rev 03) (prog-if 00
[Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: 00004000-00007fff
Memory behind bridge: 20000000-2fffffff
Prefetchable memory behind bridge: 30000000-3fffffff
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-

00:02.0 Class ff00: VLSI Technology Inc 82C532 (rev 02)
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-

00:03.0 VGA compatible controller: Neomagic Corporation NM2070 [MagicGraph
NM2070] (rev 01) (prog-if 00 [VGA])
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin A routed to IRQ 0
Region 0: Memory at c0000000 (32-bit, prefetchable)

00:04.0 CardBus bridge: Texas Instruments PCI1130 (rev 04)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 64, cache line size 08
Interrupt: pin A routed to IRQ 0
Region 0: Memory at <ignored> (32-bit, non-prefetchable)
Bus: primary=00, secondary=20, subordinate=22, sec-latency=32
I/O window 0: 00000000-00000003
I/O window 1: 00000000-00000003
BridgeCtl: Parity- SERR- ISA- VGA- MAbort- >Reset+ 16bInt+ PostWrite-
16-bit legacy interface ports at 0001

00:04.1 CardBus bridge: Texas Instruments PCI1130 (rev 04)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 64, cache line size 08
Interrupt: pin B routed to IRQ 0
Region 0: Memory at <ignored> (32-bit, non-prefetchable)
Bus: primary=00, secondary=23, subordinate=25, sec-latency=32
I/O window 0: 00000000-00000003
I/O window 1: 00000000-00000003
BridgeCtl: Parity- SERR- ISA- VGA- MAbort- >Reset+ 16bInt+ PostWrite-
16-bit legacy interface ports at 0001

00:06.0 IRDA controller: VLSI Technology Inc 82C147 (rev 02)
Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin A routed to IRQ 0
Region 0: I/O ports at 1000

01:00.0 SCSI storage controller: Symbios Logic Inc. (formerly NCR) 53c810
(rev 11)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 128 (2000ns min, 16000ns max), cache line size 04
Interrupt: pin A routed to IRQ 10
Region 0: I/O ports at 4100
Region 1: Memory at 20000100 (32-bit, non-prefetchable)

01:05.0 ISA bridge: VLSI Technology Inc 82C538
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0

01:06.0 Ethernet controller: 3Com Corporation 3c905C-TX [Fast Etherlink]
(rev 74)
Subsystem: 3Com Corporation 3C905C-TX Fast Etherlink for PC Management
NIC
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 128 (2500ns min, 2500ns max), cache line size 04
Interrupt: pin A routed to IRQ 15
Region 0: I/O ports at 4000
Region 1: Memory at 20000000 (32-bit, non-prefetchable)
Expansion ROM at 21f00000
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0+,D1+,D2+,D3hot+,D3cold-)
Status: D0 PME-Enable+ DSel=0 DScale=2 PME-