2003-02-11 13:34:41

by Henrik Persson

Subject: via rhine bug? (timeouts and resets)

Hi there.

I know this has been up before, but I couldn't find a solution in the
archives that would solve my problems..

The problem is that my Via Rhine-NIC when transmitting a lot of data fast
(like.. ftp:ing large files over the network at 100mbit/s) gets an error
(frame dropped, transmit error, reset).. As a result of this the speed
drops to about 3-4MB/s and the rest of the communication through the
network isn't working very well..

Note that this ONLY happens when there's a lot of traffic (i.e. speeds at
~100mbit/s)..

Here is my dmesg (notice that the driver is inserted twice.. Once at
boot-time and once later on, loaded with debug=3)..

And ah, yes. It's an Acer Aspire 1300XV if that helps.. And yes, I've
tried the rhinefet.o-module from viarena..

Thanks. Hope anyone knows what to do..

Linux version 2.4.20 (root@vega) (gcc version 2.95.3 20010315 (release)) #5 Tue Feb 11 12:05:25 CET 2003
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000000eff0000 (usable)
BIOS-e820: 000000000eff0000 - 000000000effffc0 (ACPI data)
BIOS-e820: 000000000effffc0 - 000000000f000000 (ACPI NVS)
BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
239MB LOWMEM available.
On node 0 totalpages: 61424
zone(0): 4096 pages.
zone(1): 57328 pages.
zone(2): 0 pages.
Kernel command line: auto BOOT_IMAGE=Linux ro root=301 noapic
Initializing CPU#0
Detected 1200.078 MHz processor.
Console: colour dummy device 80x25
Calibrating delay loop... 2392.06 BogoMIPS
Memory: 240244k/245696k available (1486k kernel code, 5064k reserved, 418k data, 256k init, 0k highmem)
Dentry cache hash table entries: 32768 (order: 6, 262144 bytes)
Inode cache hash table entries: 16384 (order: 5, 131072 bytes)
Mount-cache hash table entries: 4096 (order: 3, 32768 bytes)
Buffer-cache hash table entries: 16384 (order: 4, 65536 bytes)
Page-cache hash table entries: 65536 (order: 6, 262144 bytes)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 256K (64 bytes/line)
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: After generic, caps: 0183f9ff c1cbf9ff 00000000 00000000
CPU: Common caps: 0183f9ff c1cbf9ff 00000000 00000000
CPU: AMD mobile AMD Athlon(tm) XP 1400+ stepping 00
Enabling fast FPU save and restore... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.40 (20010327) Richard Gooch ([email protected])
mtrr: detected mtrr type: Intel
PCI: PCI BIOS revision 2.10 entry at 0xe8a64, last bus=1
PCI: Using configuration type 1
PCI: Probing PCI hardware
Disabling VIA memory write queue (PCI ID 0305, rev 80): [55] 3c & 1f -> 1c
PCI: Using IRQ router default [1106/8231] at 00:11.0
PCI: Cannot allocate resource region 0 of device 00:0a.0
Applying VIA southbridge workaround.
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
Journalled Block Device driver loaded
Installing knfsd (copyright (C) 1996 [email protected]).
ACPI: Core Subsystem version [20011018]
ACPI: Subsystem enabled
EC: found, GPE 1
ACPI: System firmware supports S0 S3 S4 S5
Processor[0]: C0 C1 C2
ACPI: Battery socket found, battery present
ACPI: AC Adapter found
ACPI: Power Button (FF) found
ACPI: Multiple power buttons detected, ignoring fixed-feature
ACPI: Power Button (CM) found
ACPI: Lid Switch (CM) found
ACPI: Thermal Zone found
parport0: PC-style at 0x378 (0x778) [PCSPP(,...)]
parport0: irq 7 detected
vesafb: framebuffer at 0x90000000, mapped to 0xcf807000, size 15296k
vesafb: mode is 1024x768x8, linelength=1024, pages=18
vesafb: protected mode interface info at c000:7926
vesafb: scrolling: redraw
Console: switching to colour frame buffer device 128x48
fb0: VESA VGA frame buffer device
vga16fb: initializing
vga16fb: mapped to 0xc00a0000
fb1: VGA16 VGA frame buffer device
pty: 256 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
Real Time Clock Driver v1.10e
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller on PCI bus 00 dev 89
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: VIA vt8231 (rev 10) IDE UDMA100 controller on pci00:11.1
ide0: BM-DMA at 0x1100-0x1107, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0x1108-0x110f, BIOS settings: hdc:DMA, hdd:pio
hda: TOSHIBA MK2018GAP, ATA DISK drive
hdc: QSI DVD-ROM SDR-083, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
blk: queue c0348484, I/O limit 4095Mb (mask 0xffffffff)
hda: 39070080 sectors (20004 MB), CHS=2584/240/63, UDMA(100)
hdc: ATAPI 24X DVD-ROM drive, 512kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.12
Partition check:
hda: hda1 hda2 hda3
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
loop: loaded (max 8 devices)
Linux agpgart interface v0.99 (c) Jeff Hartmann
agpgart: Maximum main memory to use for agp memory: 189M
agpgart: Detected Via Apollo Pro KT133 chipset
agpgart: AGP aperture is 64M @ 0xa0000000
usb.c: registered new driver usbdevfs
usb.c: registered new driver hub
uhci.c: USB Universal Host Controller Interface driver v1.1
uhci.c: USB UHCI at I/O 0x1200, IRQ 11
usb.c: new USB bus registered, assigned bus number 1
hub.c: USB hub found
hub.c: 2 ports detected
usb.c: registered new driver hiddev
usb.c: registered new driver hid
hid-core.c: v1.8.1 Andreas Gal, Vojtech Pavlik <[email protected]>
hid-core.c: USB HID support drivers
mice: PS/2 mouse device common for all mice
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 2048 buckets, 16Kbytes
TCP: Hash tables configured (established 16384 bind 16384)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
IPv6 v0.8 for NET4.0
IPv6 over IPv4 tunneling driver
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 256k freed
hub.c: new USB device 00:11.2-1, assigned address 2
input0: USB HID v1.00 Mouse [Microsoft Microsoft IntelliMouse® Explorer] on usb1:2.0
hub.c: new USB device 00:11.2-2, assigned address 3
Adding Swap: 491392k swap-space (priority -1)
input1: USB HID v1.10 Keyboard [Logitech Logitech USB Keyboard] on usb1:3.0
input2: USB HID v1.10 Pointer [Logitech Logitech USB Keyboard] on usb1:3.1
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,1), internal journal
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,3), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
via-rhine.c:v1.10-LK1.1.14 May-3-2002 Written by Donald Becker
http://www.scyld.com/network/via-rhine.html
PCI: Enabling device 00:12.0 (0001 -> 0003)
eth0: VIA VT6102 Rhine-II at 0xf0000000, 00:c0:9f:0d:d1:dd, IRQ 11.
eth0: MII PHY found at address 1, status 0x782d advertising 01e1 Link 45e1.
Via 686a audio driver 1.9.1
ac97_codec: AC97 Modem codec, id: CXT41(Unknown)
via82cxxx: board #1 at 0xE000, IRQ 10
eth0: Setting full-duplex based on MII #1 link partner capability of 45e1.
eth0: no IPv6 routers present
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
via-rhine.c:v1.10-LK1.1.14 May-3-2002 Written by Donald Becker
http://www.scyld.com/network/via-rhine.html
via-rhine: reset finished after 5 microseconds.
eth%d: Set to forced full duplex, autonegotiation disabled.
eth0: VIA VT6102 Rhine-II at 0xf0000000, 00:c0:9f:0d:d1:dd, IRQ 11.
eth0: MII PHY found at address 1, status 0x782d advertising 01e1 Link 45e1.
eth0: via_rhine_open() irq 11.
eth0: reset finished after 5 microseconds.
eth0: Done via_rhine_open(), status 0c1a MII status: 782d.
eth0: no IPv6 routers present
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0209, frame dropped.
eth0: Something Wicked happened! 0209.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0209, frame dropped.
eth0: Something Wicked happened! 0209.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0208, frame dropped.
eth0: Something Wicked happened! 0208.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0208, frame dropped.
eth0: Something Wicked happened! 0208.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0209, frame dropped.
eth0: Something Wicked happened! 0209.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0209, frame dropped.
eth0: Something Wicked happened! 0209.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0209, frame dropped.
eth0: Something Wicked happened! 0209.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0208, frame dropped.
eth0: Something Wicked happened! 0208.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0208, frame dropped.
eth0: Something Wicked happened! 0208.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0209, frame dropped.
eth0: Something Wicked happened! 0209.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0208, frame dropped.
eth0: Something Wicked happened! 0208.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0209, frame dropped.
eth0: Something Wicked happened! 0209.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0209, frame dropped.
eth0: Something Wicked happened! 0209.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0208, frame dropped.
eth0: Something Wicked happened! 0208.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0208, frame dropped.
eth0: Something Wicked happened! 0208.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0208, frame dropped.
eth0: Something Wicked happened! 0208.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0208, frame dropped.
eth0: Something Wicked happened! 0208.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0208, frame dropped.
eth0: Something Wicked happened! 0208.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0209, frame dropped.
eth0: Something Wicked happened! 0209.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0209, frame dropped.
eth0: Something Wicked happened! 0209.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0208, frame dropped.
eth0: Something Wicked happened! 0208.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0208, frame dropped.
eth0: Something Wicked happened! 0208.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0209, frame dropped.
eth0: Something Wicked happened! 0209.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0209, frame dropped.
eth0: Something Wicked happened! 0209.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0208, frame dropped.
eth0: Something Wicked happened! 0208.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0209, frame dropped.
eth0: Something Wicked happened! 0209.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0208, frame dropped.
eth0: Something Wicked happened! 0208.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0209, frame dropped.
eth0: Something Wicked happened! 0209.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0209, frame dropped.
eth0: Something Wicked happened! 0209.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0209, frame dropped.
eth0: Something Wicked happened! 0209.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0208, frame dropped.
eth0: Something Wicked happened! 0208.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0209, frame dropped.
eth0: Something Wicked happened! 0209.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0209, frame dropped.
eth0: Something Wicked happened! 0209.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0208, frame dropped.
eth0: Something Wicked happened! 0208.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0208, frame dropped.
eth0: Something Wicked happened! 0208.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0208, frame dropped.
eth0: Something Wicked happened! 0208.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0208, frame dropped.
eth0: Something Wicked happened! 0208.
eth0: Transmit error, Tx status 00008800.
eth0: MII status changed: Autonegotiation advertising 01e1 partner 45e1.
eth0: Abort 0208, frame dropped.
eth0: Something Wicked happened! 0208.


--
Henrik Persson
e-mail: [email protected] WWW: http://nix.badanka.com
ICQ: 26019058 PGP/GPG: http://nix.badanka.com/pgp
PGP-Key-ID: 0x43B68116 PGP-Keyserver: pgp.mit.edu


2003-02-11 14:08:25

by Gianni Tedesco

Subject: Re: via rhine bug? (timeouts and resets)

On Tue, 2003-02-11 at 13:43, Henrik Persson wrote:
> The problem is that my Via Rhine-NIC when transmitting a lot of data fast
> (like.. ftp:ing large files over the network at 100mbit/s) gets an error
> (frame dropped, transmit error, reset).. As a result of this the speed
> drops to about 3-4MB/s and the rest of the communication through the
> network isn't working very well..
>
> Note that this ONLY happens when there's a lot of traffic (i.e. speeds at
> ~100mbit/s)..

Have you tried connecting directly to the other device with a crossover
cable? Do the problems still occur?

--
// Gianni Tedesco (gianni at scaramanga dot co dot uk)
lynx --source http://www.scaramanga.co.uk/gianni-at-ecsc.asc | gpg --import
8646BE7D: 6D9F 2287 870E A2C9 8F60 3A3C 91B5 7669 8646 BE7D


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2003-02-11 14:25:01

by Henrik Persson

Subject: Re: via rhine bug? (timeouts and resets)

On 11 Feb 2003 14:18:28 +0000
Gianni Tedesco <[email protected]> wrote:

GT> On Tue, 2003-02-11 at 13:43, Henrik Persson wrote:
GT> > The problem is that my Via Rhine-NIC when transmitting a lot of data fast
GT> > (like.. ftp:ing large files over the network at 100mbit/s) gets an error
GT> > (frame dropped, transmit error, reset).. As a result of this the speed
GT> > drops to about 3-4MB/s and the rest of the communication through the
GT> > network isn't working very well..
GT> >
GT> > Note that this ONLY happens when there's a lot of traffic (i.e. speeds at
GT> > ~100mbit/s)..
GT>
GT> Have you tried connecting directly to the other device with a crossover
GT> cable, do problems still occur?

Yes, exactly the same problems occur.

--
Henrik Persson
e-mail: [email protected] WWW: http://nix.badanka.com
ICQ: 26019058 PGP/GPG: http://nix.badanka.com/pgp
PGP-Key-ID: 0x43B68116 PGP-Keyserver: pgp.mit.edu


Attachments:
(No filename) (189.00 B)

2003-02-11 15:35:12

by Roger Luethi

Subject: Re: via rhine bug? (timeouts and resets)

On Tue, 11 Feb 2003 14:43:46 +0100, Henrik Persson wrote:
> I know this has been up before, but I couldn't find a solution in the
> archives that would solve my problems..

The patch attached below will definitely solve some of the problems
you're seeing (e.g. "excessive collisions" on a switch). Feedback
welcome. As I've explained in previous postings, the current event
handling is pretty broken.

I have nailed down a number of problems even this patch doesn't fix,
but it's kinda hard to build from there, since testing feedback has been
basically zero. Pretty amazing considering how common Rhine hardware
is. I guess I should write code for NUMA or ia64 instead, _they_ have
testers <g>.

You shouldn't need to force full duplex, btw.
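
To illustrate why forcing should be unnecessary here: the log reports
"advertising 01e1" and "partner 45e1", and the mainline driver already sets
full duplex from the link partner capability. A minimal sketch of that kind
of decision, assuming the standard MII ability bit layout (the function and
enum names are made up for this example and are not taken from via-rhine.c):

```c
#include <stdbool.h>
#include <stdint.h>

/* Ability bits shared by the MII advertisement and link-partner registers.
 * Values follow the standard MII register layout; names are illustrative. */
enum {
    ABILITY_10_HALF  = 0x0020,
    ABILITY_10_FULL  = 0x0040,
    ABILITY_100_HALF = 0x0080,
    ABILITY_100_FULL = 0x0100,
};

/* Media modes that both ends of the link claim to support. */
uint16_t common_abilities(uint16_t advertising, uint16_t partner)
{
    return advertising & partner &
           (ABILITY_10_HALF | ABILITY_10_FULL |
            ABILITY_100_HALF | ABILITY_100_FULL);
}

/* True if the best common mode is full duplex.
 * Priority used here: 100 full > 100 half > 10 full > 10 half
 * (100BASE-T4 is ignored in this sketch). */
bool negotiated_full_duplex(uint16_t advertising, uint16_t partner)
{
    uint16_t common = common_abilities(advertising, partner);

    if (common & ABILITY_100_FULL)
        return true;
    if (common & ABILITY_100_HALF)
        return false;
    return (common & ABILITY_10_FULL) != 0;
}
```

With the values from the log, negotiated_full_duplex(0x01e1, 0x45e1) comes
out true, i.e. autonegotiation alone should already yield 100 Mbit/s full
duplex.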

Roger


Attachments:
(No filename) (751.00 B)
via-rhine.c-1.15exp1.diff (8.76 kB)

2003-02-11 16:42:16

by Henrik Persson

Subject: Re: via rhine bug? (timeouts and resets)

On Tue, 11 Feb 2003 16:44:49 +0100
Roger Luethi <[email protected]> wrote:

RL> The patch attached below will definitely solve some of the problems
RL> you're seeing (e.g. "excessive collisions" on a switch). Feedback
RL> welcome. As I've explained in previous postings, the current event
RL> handling is pretty broken.
RL>
RL> I have nailed down a number of problems even this patch doesn't fix,
RL> but it's kinda hard to build from there, since testing feedback has
RL> been basically zero. Pretty amazing considering how common Rhine
RL> hardware is. I guess I should write code for NUMA or ia64 instead,
RL> _they_ have testers <g>.

Well.. It didn't solve my problems.. Still the same errors.. :/

Everyone I know who has a Rhine-II card has the same problem.. I can
be your personal tester, just make my card work, pleeeeeease? :PP

RL> You shouldn't need to force full duplex, btw.

Nah, that was "just in case".. ;)

--
Henrik Persson
e-mail: [email protected] WWW: http://nix.badanka.com
ICQ: 26019058 PGP/GPG: http://nix.badanka.com/pgp
PGP-Key-ID: 0x43B68116 PGP-Keyserver: pgp.mit.edu


Attachments:
(No filename) (189.00 B)

2003-02-11 17:07:52

by Roger Luethi

Subject: Re: via rhine bug? (timeouts and resets)

On Tue, 11 Feb 2003 17:51:09 +0100, Henrik Persson wrote:
> RL> The patch attached below will definitely solve some of the problems
> RL> you're seeing (e.g. "excessive collisions" on a switch). Feedback
>
> Well.. It didn't solve my problems.. Still the same errors.. :/

That I find hard to believe. You were seeing a combination of "MII status
changed" and "Abort 0208, frame dropped.". That's because the driver makes
two mistakes: It treats 0200 as a link change (first message), and it
thinks 0008 indicates excessive collisions (second message).

In fact, 0008 means "transmission error", and 0200 specifies a buffer
underrun. The patch fixes that (lines 204, 213). If you are seeing the
_same_ errors my guess is you're still running the old driver. Check the
log at debug=3.
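
Put as code, the reading described above looks roughly like this. It is only
a sketch: the macro and function names are invented for illustration, and
only the two bits explained in this mail are interpreted. Anything else (such
as the low bit in "Abort 0209") is passed through uninterpreted.

```c
#include <stdbool.h>
#include <stdint.h>

/* The two status bits discussed in this mail; names are hypothetical. */
#define INTR_TX_ERROR    0x0008u  /* "transmission error" */
#define INTR_TX_UNDERRUN 0x0200u  /* buffer underrun      */

/* True if the status word reports a transmit error caused by an underrun,
 * i.e. both bits are set (as in 0208 and 0209 from the log). */
bool is_tx_underrun_error(uint16_t status)
{
    uint16_t both = INTR_TX_ERROR | INTR_TX_UNDERRUN;
    return (status & both) == both;
}

/* Bits of the status word that this sketch does not interpret. */
uint16_t uninterpreted_bits(uint16_t status)
{
    return status & (uint16_t)~(INTR_TX_ERROR | INTR_TX_UNDERRUN);
}
```

Both 0208 and 0209 from the log classify as underrun errors under this
reading; neither is a link change or an excessive-collision event.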

> RL> You shouldn't need to force full duplex, btw.
>
> Nah, that was "just in case".. ;)

It's masking another bug that's waiting to hit you <g>.

Roger

2003-02-11 17:36:04

by Henrik Persson

Subject: Re: via rhine bug? (timeouts and resets)

On Tue, 11 Feb 2003 18:17:36 +0100
Roger Luethi <[email protected]> wrote:

RL> > Well.. It didn't solve my problems.. Still the same errors.. :/
RL>
RL> That I find hard to believe. You were seeing a combination of "MII
RL> status changed" and "Abort 0208, frame dropped.". That's because the
RL> driver makes two mistakes: It treats 0200 as a link change (first
RL> message), and it thinks 0008 indicates excessive collisions (second
RL> message).

RL> In fact, 0008 means "transmission error", and 0200 specifies a buffer
RL> underrun. The patch fixes that (lines 204, 213). If you are seeing the
RL> _same_ errors my guess is you're still running the old driver. Check
RL> the log at debug=3.


Darn. The same PROBLEMS, not the same errors. Indeed, the errors are not
there. But the behaviour is still the same, i.e. slow speeds after a
while.. :/

But it's not as bad as it got a few minutes ago when I tested the driver
from scyld.com.. It totally trashed my NIC.. A shame though, since it ran
perfectly until it totally died.. I want a combination of those drivers..
;)

RL> > Nah, that was "just in case".. ;)
RL>
RL> It's masking another bug that's waiting to hit you <g>.

Woohoo. Ehm. Nah. ;)

--
Henrik Persson
e-mail: [email protected] WWW: http://nix.badanka.com
ICQ: 26019058 PGP/GPG: http://nix.badanka.com/pgp
PGP-Key-ID: 0x43B68116 PGP-Keyserver: pgp.mit.edu


Attachments:
(No filename) (189.00 B)

2003-02-11 18:30:03

by Roger Luethi

Subject: Re: via rhine bug? (timeouts and resets)

On Tue, 11 Feb 2003 18:44:59 +0100, Henrik Persson wrote:
> RL> _same_ errors my guess is you're still running the old driver. Check
> RL> the log at debug=3.
>
> Darn. The same PROBLEMS, not the same errors. Indeed, the errors are not
> there. But the behaviour is still the same, i.e. slow speeds after a
> while.. :/

No errors at all? No "Transmitter underrun" (at debug>1)? I suspect you hit
two more bugs: If the driver resets the chip (e.g. watchdog timeout),
chances are the chip is programmed to go half-duplex -> performance breaks
down. No problem as long as we deal with errors properly, but the Rhine-II
can throw an error the mainline driver doesn't notice because the interrupt
status registers stay clean.

Can I see a complete log (at debug=3), starting with module insertion?
There's got to be some underrun and watchdog timeout.

> But it's not as bad as it got a few minutes ago when I tested the driver
> from scyld.com.. It totally trashed my NIC.. A shame though, since it ran

Define "trashed". How exactly did it misbehave, what did you have to do to
get it back working? Anything interesting in the log before it breaks?

FWIW, it is possible to get a Rhine into a state where physically removing
the PCI card from the computer and keeping both away from any power source
for an hour still results in the driver hanging on boot (after putting
everything back together, of course). I've gone through this twice so far.
Voodoo magic.

Roger

2003-02-11 18:46:12

by Henrik Persson

Subject: Re: via rhine bug? (timeouts and resets)

On Tue, 11 Feb 2003 19:39:43 +0100
Roger Luethi <[email protected]> wrote:

RL> No errors at all? No "Transmitter underrun" (at debug>1)? I suspect
RL> you hit two more bugs: If the driver resets the chip (e.g. watchdog
RL> timeout), chances are the chip is programmed to go half-duplex ->
RL> performance breaks down. No problem as long as we deal with errors
RL> properly, but the Rhine-II can throw an error the mainline driver
RL> doesn't notice because the interrupt status registers stay clean.

Something was strange.. Now I get the errors.. But the funny thing is:
when downloading the file there's no problem at all. Uploading the same
file results in the attached dmesg.. So, something is fishy in that code
;)

RL> Can I see a complete log (at debug=3), starting with module insertion?
RL> There's got to be some underrun and watchdog timeout.

Whoops. I included the module in the kernel ;) But the errors ought to be
the same.. Hm..

RL> > But it's not as bad as it got a few minutes ago when I tested the
RL> > driver from scyld.com.. It totally trashed my NIC.. A shame though,
RL> > since it ran
RL>
RL> Define "trashed". How exactly did it misbehave, what did you have to
RL> do to get it back working? Anything interesting in the log before it
RL> breaks?

Well.. Not much in the logs.. I rebooted with 2.4.20 ;)

RL> FWIW, it is possible to get a Rhine into a state where physically
RL> removing the PCI card from the computer and keeping both away from any
RL> power source for an hour still results in the driver hanging on boot
RL> (after putting everything back together, of course). I've gone through
RL> this twice so far. Voodoo magic.

Creepy..

--
Henrik Persson
e-mail: [email protected] WWW: http://nix.badanka.com
ICQ: 26019058 PGP/GPG: http://nix.badanka.com/pgp
PGP-Key-ID: 0x43B68116 PGP-Keyserver: pgp.mit.edu


Attachments:
dmesg (9.68 kB)

2003-02-11 19:21:41

by Roger Luethi

Subject: Re: via rhine bug? (timeouts and resets)

On Tue, 11 Feb 2003 19:55:07 +0100, Henrik Persson wrote:
> Something was strange.. Now I get the errors.. But the funny thing is:
> when downloading the file there's no problem at all. Uploading the same

It's the Rhine Tx engine that's been giving us headaches all along. There's
at least one bug in the Rx path, too, but it's masked by the Tx problems.

Try this, log again. This will show whether I'm suspecting the right bug.

@@ -1290,6 +1290,9 @@ static void via_rhine_interrupt(int irq,
 	while ((intr_status = readw(ioaddr + IntrStatus))) {
 		/* Acknowledge all of the current interrupt sources ASAP. */
 		writew(intr_status & 0xffff, ioaddr + IntrStatus);
+		if (readb(ioaddr+0x84) & 0x08)
+			printk(KERN_DEBUG "Gotcha: %#x %#x %#x\n", intr_status,
+			       readb(ioaddr+0x84), readb(ioaddr+0x86));

 		if (debug > 4)
 			printk(KERN_DEBUG "%s: Interrupt, status %4.4x.\n",

Roger

2003-02-11 20:22:04

by Henrik Persson

Subject: Re: via rhine bug? (timeouts and resets)

On Tue, 11 Feb 2003 20:31:27 +0100
Roger Luethi <[email protected]> wrote:

RL> It's the Rhine Tx engine that's been giving us headaches all along.
RL> There's at least one bug in the Rx path, too, but it's masked by the
RL> Tx problems.
RL>
RL> Try this, log again. This will show whether I'm suspecting the right
RL> bug.

And look what came up when I stressed the net a bit.. Worked fine at
first, though.. But I guess that depends on other things.. Sunset and
all.. Heh ;)

Well.. dmesg attached..

--
Henrik Persson
e-mail: [email protected] WWW: http://nix.badanka.com
ICQ: 26019058 PGP/GPG: http://nix.badanka.com/pgp
PGP-Key-ID: 0x43B68116 PGP-Keyserver: pgp.mit.edu


Attachments:
dmesg (11.34 kB)

2003-02-11 21:04:59

by Alan

Subject: Re: via rhine bug? (timeouts and resets)

On Tue, 2003-02-11 at 14:34, Henrik Persson wrote:
> On 11 Feb 2003 14:18:28 +0000
> Gianni Tedesco <[email protected]> wrote:
>
> GT> On Tue, 2003-02-11 at 13:43, Henrik Persson wrote:
> GT> > The problem is that my Via Rhine-NIC when transmitting a lot of data fast
> GT> > (like.. ftp:ing large files over the network at 100mbit/s) gets an error
> GT> > (frame dropped, transmit error, reset).. As a result of this the speed
> GT> > drops to about 3-4MB/s and the rest of the communication through the
> GT> > network isn't working very well..
> GT> >
> GT> > Note that this ONLY happens when there's a lot of traffic (i.e. speeds at
> GT> > ~100mbit/s)..
> GT>
> GT> Have you tried connecting directly to the other device with a crossover
> GT> cable, do problems still occur?

I have two EPIA-M boards; one does this and is really touchy about cables,
the other is quite reliable. If you use a different via-rhine, does it work
any better? I'm wondering if there are some dud PHYs around on the VIA stuff.

2003-02-11 21:05:55

by Roger Luethi

Subject: Re: via rhine bug? (timeouts and resets)

On Tue, 11 Feb 2003 21:31:10 +0100, Henrik Persson wrote:
> And look what came up when I stressed the net a bit.. Worked fine at
> first, though.. But I guess that depends on other things.. Sunset and
> all.. Heh ;)

It's pretty easy to trigger, actually. Just have some heavy traffic going
in _and_ out, e.g. netcat blowing iso images both ways. It will last a
couple of seconds at most.

> Well.. dmesg attached..

# eth0: Setting full-duplex based on MII #1 link partner capability of 45e1.
# eth0: Done via_rhine_open(), status 0c1a MII status: 782d.
# eth0: no IPv6 routers present
# eth0: Transmit error, Tx status 00008800.
# eth0: Transmitter underrun, Tx threshold now 40.
# eth0: Transmit error, Tx status 00008800.
# eth0: Transmitter underrun, Tx threshold now 60.
# eth0: Transmit error, Tx status 00008800.
# eth0: Transmitter underrun, Tx threshold now 80.
# Gotcha: 0x2 0x8 0x0
# Gotcha: 0x1 0x8 0x0
# Gotcha: 0x1 0x8 0x0
# NETDEV WATCHDOG: eth0: transmit timed out
# eth0: Transmit timed out, status 0000, PHY status 782d, resetting...
# eth0: Reset succeeded.

As expected. Now comes the punch line: I don't know how to fix this. I
locked my machine up solid a couple of times trying. It seems that
particular flag doesn't want to be cleared. Of course I could simply reset
the chip, but that's a) less than elegant and b) would make cleaning up the
force_media mess kind of urgent. And doing that properly is a rather
non-trivial change. Also, I need to investigate the implications for
Rhine-III and have somebody test Rhine-I.

Thanks for the logs, though; at least now I know that many more would hit
that problem if they weren't using a driver that breaks down way earlier.
If the problem bothers you I can send you a dirty hack. I need to whack
some registers before writing a proper fix, and I don't know when that will
happen.
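
For readers following the log above: the "Tx threshold now 40 / 60 / 80"
lines show the driver raising the transmit threshold one step per underrun.
A rough sketch of that back-off follows. The step size is inferred from the
log (reading the values as hex) and the ceiling is an assumption for
illustration, not a value confirmed in this thread.

```c
#include <stdint.h>

/* Step inferred from the 40 -> 60 -> 80 progression in the log (read as
 * hex); the ceiling is an assumed placeholder, not taken from the thread. */
#define TX_THRESH_STEP 0x20u
#define TX_THRESH_MAX  0xE0u

/* Returns the Tx threshold to program after one more underrun.
 * Once the ceiling is reached the value no longer changes. */
uint8_t next_tx_threshold(uint8_t current)
{
    if (current >= TX_THRESH_MAX)
        return current;
    return (uint8_t)(current + TX_THRESH_STEP);
}
```

Raising the threshold only makes underruns less likely; it does nothing
about the stuck flag that the "Gotcha" lines expose, which is why the
watchdog timeout still follows.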

Roger

2003-02-11 21:05:44

by Alan

Subject: Re: via rhine bug? (timeouts and resets)

On Tue, 2003-02-11 at 15:44, Roger Luethi wrote:
> I have nailed down a number of problems even this patch doesn't fix,
> but it's kinda hard to build from there, since testing feedback has been
> basically zero. Pretty amazing considering how common Rhine hardware
> is. I guess I should write code for NUMA or ia64 instead, _they_ have
> testers <g>.

I'd be happy to test via-rhine stuff, but my boxes don't generally like
2.5.x so I can only usefully test 2.4.x fixes

2003-02-12 12:42:44

by Roger Luethi

Subject: Re: via rhine bug? (timeouts and resets)

On Tue, 11 Feb 2003 17:44:53 +0000, Alan Cox wrote:
> I'd be happy to test via-rhine stuff, but my boxes don't generally like
> 2.5.x so I can only usefully test 2.4.x fixes

No prob. AFAIK the only significant difference 2.4/2.5 is the change you
made in 2.4.21pre4-ac1 (which, being short of IO-APIC hw, I can't test):

o Always set interrupt line with VIA northbridge (me)
| Should fix apic mode problems with USB/audio/net on VIA boards

Besides that, Rhine drivers are in sync and should fail (or work)
identically in both trees.

However, unlike previous patches, changes pending now are not "obviously
right", so they need regression testing. 1.1.15exp1 for example cut the
time allowed for chip reset by three orders of magnitude [1]. Upcoming
patches will likely reshuffle code logic in order to fix races and bugs in
the current code.

So, thanks for the offer. If you can give 1.1.15exp1 a spin that'd be a
good start. It does fix a few existing problems and should not introduce
any new ones.

Roger

[1] http://marc.theaimsgroup.com/?l=linux-kernel&m=104050723916958&w=2

2003-02-12 14:48:48

by Christian Guggenberger

Subject: Re: via rhine bug? (timeouts and resets)

>> I'd be happy to test via-rhine stuff, but my boxes don't generally like
>> 2.5.x so I can only usefully test 2.4.x fixes
>
> No prob. AFAIK the only significant difference 2.4/2.5 is the change you
> made in 2.4.21pre4-ac1 (which, being short of IO-APIC hw, I can't test):
>
> o Always set interrupt line with VIA northbridge (me)
> | Should fix apic mode problems with USB/audio/net on VIA boards
>
Can you please send a patch against 2.5.60, cause I would like to test these
IO APIC things on my via board. 2.4-ac is no choice for me, since patching xfs
into 2.4-ac is a little bit too painful for me;-)

Christian

2003-02-12 15:31:53

by Alan

Subject: Re: via rhine bug? (timeouts and resets)

On Wed, 2003-02-12 at 14:58, Christian Guggenberger wrote:
> > o Always set interrupt line with VIA northbridge (me)
> > | Should fix apic mode problems with USB/audio/net on VIA boards
> >
> Can you please send a patch against 2.5.60, cause I would like to test these
> IO APIC things on my via board. 2.4-ac is no choice for me, since patching xfs
> into 2.4-ac is a little bit too painful for me;-)

At the moment I can't even get 2.5.60 to boot so its a bit hard to do any work
on it. Just run via boxes with "noapic" and dont enable the apic stuff on single
cpu systems. Thats as good if not a better test

2003-02-12 16:56:46

by Christian Guggenberger

Subject: Re: via rhine bug? (timeouts and resets)

On 12.02.2003 17:41 Alan Cox wrote:
> On Wed, 2003-02-12 at 14:58, Christian Guggenberger wrote:
> > > o Always set interrupt line with VIA northbridge (me)
> > > | Should fix apic mode problems with USB/audio/net on VIA boards
> > >
> > Can you please send a patch against 2.5.60, cause I would like to test
> > these IO APIC things on my via board. 2.4-ac is no choice for me, since
> > patching xfs into 2.4-ac is a little bit too painful for me;-)
>
> At the moment I can't even get 2.5.60 to boot so its a bit hard to do any
> work on it.
Of course;-)

> Just run via boxes with "noapic" and dont enable the apic stuff on single
> cpu systems. Thats as good if not a better test
>
That's pretty much what I've been doing since I got this mobo. I have APICs
enabled in both kernel and BIOS, but IO-APICs disabled. 2.5.60 seems to work
for me. The only thing I'd like to get rid of is the interrupt errors in
/proc/interrupts (maybe they are harmless anyway):

           CPU0
  0:     941032          XT-PIC  timer
  1:       1927          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  5:          0          XT-PIC  VIA8233
  8:          4          XT-PIC  rtc
 10:      14946          XT-PIC  ide2, eth0
 12:      29425          XT-PIC  i8042
 14:       7525          XT-PIC  ide0
 15:         36          XT-PIC  ide1
NMI:          0
LOC:     940979
ERR:        914

They don't go away with noapic, either.

With IO-APICs enabled the ERR count would stay at 0 (but then most onboard
devices wouldn't work).
That's why I asked for that APIC patch in my previous mail.

Christian
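
To watch the ERR count across boots or kernel versions, a minimal sketch in Python like the following could pick the summary counters out of /proc/interrupts. The function name `read_counters` is made up for illustration, and it assumes a single-CPU layout like the listing above, with one count per row.

```python
def read_counters(text, names=("NMI", "LOC", "ERR")):
    """Pick the named summary counters out of /proc/interrupts-style text."""
    counters = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        label = label.strip()
        if sep and label in names:
            fields = rest.split()
            # Assumption: single-CPU layout, so the first field is the count.
            if fields and fields[0].isdigit():
                counters[label] = int(fields[0])
    return counters


# Excerpt of the listing from this message, used as sample input.
SAMPLE = """\
CPU0
0: 941032 XT-PIC timer
10: 14946 XT-PIC ide2, eth0
NMI: 0
LOC: 940979
ERR: 914
"""
```

On a live system the same function can be fed the contents of /proc/interrupts, for example `read_counters(open("/proc/interrupts").read())`. Comparing the ERR value before and after a large transfer would show whether the errors track network load.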