Subject: need contact of via-rhine developers


hi,

the via-rhine network driver was originally written by Donald Becker, who
seems to have stopped working on it in 2001. since then it has been modified
by several people, among them Jeff Garzik, Justin Guyett, Urban Widmark and
Dave Miller.

unfortunately, there are no email addresses listed for these people. does
anyone know how to contact the via-rhine developers? I am seeing problems
with the VT6103.

thanks,
h.rosmanith




2002-08-20 12:46:46

by Rob Myers

Subject: Re: need contact of via-rhine developers

this driver seems to work for the vt6103 in patch-2.4.20-pre2-ac3.

with 2.4.18 it would die after about 2 MB of data had been transferred and
not recover.

hth

rob.

On Tue, 2002-08-20 at 07:49, H.Rosmanith (Kernel Mailing List) wrote:
>
> hi,
>
> the via-rhine network driver was originally written by Donald Becker, who
> seems to have stopped working on it in 2001. since then it has been modified
> by several people, among them Jeff Garzik, Justin Guyett, Urban Widmark and
> Dave Miller.
>
> unfortunately, there are no email addresses listed for these people. does
> anyone know how to contact the via-rhine developers? I am seeing problems
> with the VT6103.
>
> thanks,
> h.rosmanith
>


Subject: Re: need contact of via-rhine developers

> this driver seems to work for the vt6103 in patch-2.4.20-pre2-ac3.

okay, it seems that some code that restarts a hung chip has been
added.

> with 2.4.18 it would die after about 2 MB of data had been transferred and
> not recover.

aha? I see stalls occurring frequently too, but earlier than 2 MB. the
card is then reset by "NETDEV WATCHDOG", which appears in dmesg,
and you'll also see a "transmit timeout eth0" (or something similar).
so, if the patch is for a chip that does not recover, I guess it
will not address this problem (because the netdev watchdog restarts
the chip, which *does* recover then).

my concern is that both vt6102 and vt6103 have been assigned the same
pci device id by via. I don't think the via-rhine driver distinguishes
between the two chips; it treats the 6103 as a 6102.

also, on ftp.via.com.tw/...NIC/VT6103, there are no chip specs, as
opposed to the 6102 and 6105 directories. I've just mailed
[email protected]; I hope they will put up the missing specs.

do you know of any other sources of specs besides ftp.via.com.tw?

regards,
h.rosmanith

Subject: Re: need contact of via-rhine developers

> here is the driver i'm using that works on my vt6103 (from
> 2.4.20-pre2-ac3). i have not done throughput benchmarks, but i have put
> many gb's through them with no ill effects so far.
>
> it looks like the patch adds the ability for the driver to restart the
> confused chip...

thanks! that fixed the transmit timeouts! they happened quite frequently:
the more traffic you'd send, the more timeouts. e.g., when viewing
pictures over the net (e.g. xv running on a different host), I'd see about
12 timeouts per minute (rough estimate).

any idea what's confusing the chip in the first place?

regards,
h.rosmanith

2002-08-20 17:51:44

by Roger Luethi

Subject: Re: need contact of via-rhine developers

On Tue, 20 Aug 2002 18:39:38 +0200, H.Rosmanith (Kernel Mailing List) wrote:
> thanks! that fixed the transmit timeouts! they happened quite frequently:
> the more traffic you'd send, the more timeouts. e.g., when viewing
> pictures over the net (e.g. xv running on a different host), I'd see about
> 12 timeouts per minute (rough estimate).
>
> any idea what's confusing the chip in the first place?

After a transmission error (e.g. excessive collisions) the chip stops to
let the driver handle it. The driver does its thing and restarts the
transmission engine. Problem is, the ring buffer pointer on the chip
has skidded too far, so the chip picks up work from the wrong entry.

If an error occurred on entry n, the chip continues on n+2. The driver stops
harvesting transmitted buffers because the next entry in the ring (n+1)
remains marked as owned by the driver. A few more packets may be sent after
the restart, then the card stalls. After a while the watchdog kicks in and
resets chip and buffers. Transmission continues.

You can verify this easily by dumping ring pointer information and the
status bits associated with the ring buffer.

The fix is to have the interrupt handler set the ring buffer pointer to
what the driver knows to be the current entry.
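The skid and the fix described above can be sketched as a small user-space model. This is not code from via-rhine.c: the ring size, the struct layout and every function name are made up for illustration, and the model only tracks whether an entry is still waiting for the chip. It queues four packets, lets the chip fail on entry 1 and skid to entry 3, and then counts how many packets stay stuck in the ring with and without resetting the chip's pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define RING_SIZE 16 /* hypothetical ring size for this model */

struct tx_ring {
    bool pending[RING_SIZE]; /* true: queued, not yet processed by the chip */
    unsigned int cur_tx;     /* next entry the driver will queue */
    unsigned int dirty_tx;   /* oldest entry the driver has not harvested */
    unsigned int chip_ptr;   /* entry the modelled chip works on next */
};

static void ring_init(struct tx_ring *r)
{
    memset(r, 0, sizeof(*r));
}

/* Driver side: queue one packet if the ring has room. */
static bool ring_queue(struct tx_ring *r)
{
    if (r->cur_tx - r->dirty_tx >= RING_SIZE)
        return false;
    r->pending[r->cur_tx % RING_SIZE] = true;
    r->cur_tx++;
    return true;
}

/* Chip side: process the entry at chip_ptr if it is waiting. */
static bool chip_transmit_one(struct tx_ring *r)
{
    if (!r->pending[r->chip_ptr % RING_SIZE])
        return false;
    r->pending[r->chip_ptr % RING_SIZE] = false;
    r->chip_ptr++;
    return true;
}

/* Chip side: error on entry n, after which the pointer sits at n+2. */
static void chip_abort_and_skid(struct tx_ring *r)
{
    r->pending[r->chip_ptr % RING_SIZE] = false; /* entry n handed back */
    r->chip_ptr += 2;                            /* entry n+1 is skipped */
}

/* Driver side: harvest finished entries in order, stop at the first
 * entry the chip has not processed yet. */
static unsigned int driver_harvest(struct tx_ring *r)
{
    unsigned int harvested = 0;

    while (r->dirty_tx != r->cur_tx &&
           !r->pending[r->dirty_tx % RING_SIZE]) {
        r->dirty_tx++;
        harvested++;
    }
    return harvested;
}

/* The fix: point the chip at the entry the driver considers current. */
static void driver_resync_chip(struct tx_ring *r)
{
    r->chip_ptr = r->dirty_tx;
}

/* Returns the number of packets left stuck in the ring. */
static unsigned int run_scenario(bool apply_fix)
{
    struct tx_ring r;
    int i;

    ring_init(&r);
    for (i = 0; i < 4; i++)
        ring_queue(&r);
    chip_transmit_one(&r);   /* entry 0 goes out normally */
    chip_abort_and_skid(&r); /* error on entry 1, chip now at entry 3 */
    driver_harvest(&r);      /* driver handles the error */
    if (apply_fix)
        driver_resync_chip(&r);
    while (chip_transmit_one(&r))
        ;                    /* chip runs until it finds nothing to do */
    driver_harvest(&r);
    return r.cur_tx - r.dirty_tx;
}
```

In this model run_scenario(false) leaves entries 2 and 3 unharvested, because the chip never visits entry 2 and the driver will not harvest past it. run_scenario(true) drains the ring completely.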

Btw: The stalling you've seen, was that at 10 or 100 Mbps? Hub or Switch?
With debug level 2 (and fixed driver), do you find Abort or Underrun errors
in your log in situations where stalling occurred with the old driver?

Roger

Subject: Re: need contact of via-rhine developers


> > thanks! that fixed the transmit-timeouts!

ouch, that was too soon. the new driver performs better, but it is not
100% free of timeouts.

>
> Btw: The stalling you've seen, was that at 10 or 100 Mbps? Hub or Switch?

a 10 Mbps hub, half duplex connection. I tried with a Surecom and an Asante
hub. I also tried connecting the two machines I've been testing with
using a crossover cable, but things got worse then.

> With debug level 2 (and fixed driver), do you find Abort or Underrun errors
> in your log in situations where stalling occurred with the old driver?

old driver, kernel 2.4.19, debug level 2:
: via-rhine.c:v1.10-LK1.1.13 Nov-17-2001 Written by Donald Becker
: http://www.scyld.com/network/via-rhine.html
: via-rhine: reset finished after 5 microseconds.
: eth0: VIA VT6102 Rhine-II at 0xe800, 00:40:63:c0:b4:8c, IRQ 11.
: eth0: MII PHY found at address 1, status 0x786d advertising 05e1 Link 0021.
: eth0: via_rhine_open() irq 11.
: eth0: reset finished after 5 microseconds.
: eth0: Transmit error, Tx status 00008100.
: NETDEV WATCHDOG: eth0: transmit timed out
: eth0: Transmit timed out, status 0000, PHY status 786d, resetting...
: eth0: reset finished after 5 microseconds.


new driver with 2.4.19, debug level 2:
: via-rhine.c:v1.10-LK1.1.14 May-3-2002 Written by Donald Becker
: http://www.scyld.com/network/via-rhine.html
: via-rhine: reset finished after 5 microseconds.
: eth0: VIA VT6102 Rhine-II at 0xe800, 00:40:63:c0:b4:8c, IRQ 11.
: eth0: MII PHY found at address 1, status 0x786d advertising 05e1 Link 0021.
: eth0: via_rhine_open() irq 11.
: eth0: reset finished after 5 microseconds.
: eth0: no IPv6 routers present
: eth0: Transmit error, Tx status 00008100.
: eth0: Abort 2008, frame dropped.
: eth0: Transmit error, Tx status 00008100.
: eth0: Abort 2008, frame dropped.
: NETDEV WATCHDOG: eth0: transmit timed out
: eth0: Transmit timed out, status 0000, PHY status 786d, resetting...
: eth0: reset finished after 5 microseconds.
: eth0: Transmit error, Tx status 00008100.
: eth0: Abort 2008, frame dropped.
: eth0: Transmit error, Tx status 00008100.
: eth0: Abort 2008, frame dropped.
: eth0: Transmit error, Tx status 00008100.
: eth0: Abort 2008, frame dropped.
: eth0: Transmit error, Tx status 00008100.
: eth0: Abort 2008, frame dropped.
: eth0: Transmit error, Tx status 00008100.
: eth0: Abort 2008, frame dropped.
: eth0: Transmit error, Tx status 00008100.
: eth0: Abort 2008, frame dropped.
: eth0: Transmit error, Tx status 00008100.
: eth0: Abort 2008, frame dropped.
: eth0: Transmit error, Tx status 00008100.
: eth0: Abort 2008, frame dropped.
: NETDEV WATCHDOG: eth0: transmit timed out
: eth0: Transmit timed out, status 0000, PHY status 786d, resetting...
: eth0: reset finished after 5 microseconds.
: eth0: Transmit error, Tx status 00008100.
: eth0: Abort 2008, frame dropped.
: NETDEV WATCHDOG: eth0: transmit timed out
: eth0: Transmit timed out, status 0000, PHY status 786d, resetting...
: eth0: reset finished after 5 microseconds.
: NETDEV WATCHDOG: eth0: transmit timed out
: eth0: Transmit timed out, status 0000, PHY status 786d, resetting...
: eth0: reset finished after 5 microseconds.

shall we continue this via private email?

regards,
h.rosmanith

Subject: Re: need contact of via-rhine developers


hi,

the following comment in the header of via-rhine.c made me suspicious:


: The send packet thread has partial control over the Tx ring. It locks
: the dev->priv->lock whenever it's queuing a Tx packet. If the next slot
: in the ring is not available it stops the transmit queue by calling
: netif_stop_queue.

okay so far. reading through net/sched/sch_generic.c shows that "NETDEV
WATCHDOG" will bark when (among other conditions) the netif queue is stopped
and the timer expires. so, what if the queue is stopped, but never started
again by the driver? the only call to netif_start_queue is in via_rhine_open.
shouldn't there be another netif_start_queue if netif_queue_stopped(dev) &&
a packet has left the queue? where's the proper place to insert that?
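The watchdog condition described above can be illustrated with a small user-space model. This is a sketch, not kernel code: struct model_dev and the three functions are hypothetical, and the fields are only named after the trans_start and watchdog_timeo members of the kernel's net_device. It models just the one condition mentioned here (queue stopped and timeout expired) and ignores the others.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parts of a net device the watchdog checks. */
struct model_dev {
    bool queue_stopped;           /* what netif_queue_stopped() would report */
    unsigned long trans_start;    /* time of the last transmit, in ticks */
    unsigned long watchdog_timeo; /* allowed silence, in ticks */
};

/* The watchdog barks only if the queue is stopped AND the timeout
 * since the last transmit has expired. */
static bool watchdog_would_fire(const struct model_dev *dev, unsigned long now)
{
    return dev->queue_stopped &&
           (now - dev->trans_start) > dev->watchdog_timeo;
}

/* Transmit path: note the time, stop the queue if the ring is now full. */
static void model_start_xmit(struct model_dev *dev, unsigned long now,
                             bool ring_full_after)
{
    dev->trans_start = now;
    if (ring_full_after)
        dev->queue_stopped = true;
}

/* Completion path: a packet has left the ring, so restart the queue. */
static void model_tx_complete(struct model_dev *dev)
{
    dev->queue_stopped = false;
}
```

With a timeout of 5 ticks, a transmit at tick 10 that fills the ring makes watchdog_would_fire() return false at tick 12 and true at tick 16. Once model_tx_complete() has run it returns false at any later time, which is why a completion path that never restarts the queue ends in a watchdog timeout.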

regards,
h.rosmanith


Subject: Re: need contact of via-rhine developers

>
> You will probably need more log information than the driver can provide.

e.g. like this:

: Aug 21 07:37:22 samiel kernel: > via_rhine_tx
: Aug 21 07:37:22 samiel kernel: Tx scavenge 11 status 80000000.
: Aug 21 07:37:22 samiel kernel: eth0: exiting interrupt, status=0000.
: Aug 21 07:37:22 samiel kernel: > via_rhine_start_tx
: Aug 21 07:37:22 samiel kernel: stopping netif_queue
: Aug 21 07:37:22 samiel kernel: eth0: Transmit frame #36 queued in slot 4.
: Aug 21 07:37:22 samiel kernel: eth0: Interrupt, status 0002.
: Aug 21 07:37:22 samiel kernel: > via_rhine_tx
: Aug 21 07:37:22 samiel kernel: Tx scavenge 11 status 80000000.
: Aug 21 07:37:22 samiel kernel: eth0: exiting interrupt, status=0000.
: Aug 21 07:37:26 samiel kernel: eth0: VIA Rhine monitor tick, status 0000.
: Aug 21 07:37:26 samiel kernel: NETDEV WATCHDOG: eth0: transmit timed out
: Aug 21 07:37:26 samiel kernel: > via_rhine_tx_timeout

this indicates that the netif_queue is stopped but never started again,
although the driver contains:

	if ((np->cur_tx - np->dirty_tx) < TX_QUEUE_LEN - 4)
		netif_wake_queue (dev);

maybe something is wrong with that one?
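To see what that condition needs in order to become true, here is the same comparison as a stand-alone function. This is only a sketch: should_wake_queue is a made-up name, the value 10 for TX_QUEUE_LEN is an assumption, and the counter values used below are guesses that would fit the log if the ring had 16 entries (frame #36 in slot 4, scavenge stuck at slot 11).

```c
#include <assert.h>
#include <stdbool.h>

#define TX_QUEUE_LEN 10 /* assumed value, check against the driver source */

/* Same comparison as the driver's wake test. The counters are unsigned
 * and free-running, so the subtraction still gives the number of
 * outstanding entries after cur_tx has wrapped around. */
static bool should_wake_queue(unsigned int cur_tx, unsigned int dirty_tx)
{
    return (cur_tx - dirty_tx) < TX_QUEUE_LEN - 4;
}
```

Under those assumptions should_wake_queue(37, 27) is false: ten entries are outstanding, which is also the point where the queue gets stopped. It only turns true once dirty_tx has advanced to at least 32. So the test itself may be fine. If the scavenge loop keeps finding the same entry with status 80000000, dirty_tx never moves and the wake can never happen.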

I'll add more debugs...

regards,
h.rosmanith