2002-03-20 07:27:44

by Ivan G.

Subject: Via-Rhine stalls - transmit errors

Hello,

I was unsure about the maintainer of the via-rhine
driver so I'm mailing the bug report to the kernel
list. Please cc to [email protected].

Problem:
My ethernet card

/proc/pci:
Ethernet controller: VIA Technologies, Inc.
VT86C100A [Rhine 10/100] (rev 6).
IRQ 11.

is stalling during a large scp transfer.
The card freezes for a long time, before continuing
the transfer. I receive transmit timeout messages
and "something wicked happened".
Connection is negotiated to 100BaseTx-FD.


System Information:
AMD Athlon XP 1600+, Matsonic MS8127C+ board,
VIA Apollo Kt133A/VT82C686B,
Ethernet Controller - listed above,
Kernel 2.4.19-pre3, using the via-rhine driver.


System on the opposite end:
Sony Vaio Laptop - Pentium III Coppermine,
ethernet card: Netgear FA410TX Pcmcia
Kernel: 2.4.19-pre3 using pcnet_cs driver

Appended are sections of the log of an scp transfer
using via-rhine debug lvl 7 - /var/log/messages.

----------------

Mar 19 23:08:53 cobra kernel: eth0: Setting
full-duplex based on MII #1 link partner capability of
41e1.

....


Mar 19 23:12:15 cobra kernel: eth0: Transmitter
underrun, increasing Tx threshold setting to 40.
Mar 19 23:12:15 cobra kernel: eth0: Something Wicked
happened! 001a.
Mar 19 23:12:15 cobra kernel: eth0: Transmitter
underrun, increasing Tx threshold setting to 60.
Mar 19 23:12:15 cobra kernel: eth0: Something Wicked
happened! 001a.
Mar 19 23:12:16 cobra kernel: eth0: Transmitter
underrun, increasing Tx threshold setting to 80.
Mar 19 23:12:16 cobra kernel: eth0: Something Wicked
happened! 001a.
Mar 19 23:12:21 cobra kernel: NETDEV WATCHDOG: eth0:
transmit timed out
Mar 19 23:12:21 cobra kernel: eth0: Transmit timed
out, status 0000, PHY status 782d, resetting...
Mar 19 23:12:58 cobra kernel: eth0: Transmitter
underrun, increasing Tx threshold setting to 40.
Mar 19 23:12:58 cobra kernel: eth0: Something Wicked
happened! 001a.
Mar 19 23:13:02 cobra kernel: eth0: Transmitter
underrun, increasing Tx threshold setting to 60.
Mar 19 23:13:02 cobra kernel: eth0: Something Wicked
happened! 001a.
Mar 19 23:13:04 cobra kernel: eth0: Transmitter
underrun, increasing Tx threshold setting to 80.
Mar 19 23:13:04 cobra kernel: eth0: Something Wicked
happened! 001a.
Mar 19 23:14:09 cobra kernel: NETDEV WATCHDOG: eth0:
transmit timed out
Mar 19 23:14:09 cobra kernel: eth0: Transmit timed
out, status 0000, PHY status 782d, resetting...

....

Mar 19 23:15:09 cobra kernel: eth0: Something Wicked
happened! 001a.
Mar 19 23:15:09 cobra kernel: eth0: Transmitter
underrun, increasing Tx threshold setting to a0.
Mar 19 23:15:09 cobra kernel: eth0: Something Wicked
happened! 001a.
Mar 19 23:15:12 cobra kernel: eth0: Transmitter
underrun, increasing Tx threshold setting to c0.
Mar 19 23:15:12 cobra kernel: eth0: Something Wicked
happened! 001a.
Mar 19 23:15:13 cobra kernel: eth0: Transmitter
underrun, increasing Tx threshold setting to e0.
Mar 19 23:15:13 cobra kernel: eth0: Something Wicked
happened! 001a.
Mar 19 23:15:13 cobra kernel: s 0002.
Mar 19 23:15:13 cobra kernel: eth0: Transmitter
underrun, increasing Tx threshold setting to e0.

_______________________________________
What could be the problem?
Thank you for your help in advance.







__________________________________________________
Do You Yahoo!?
Yahoo! Sports - live college hoops coverage
http://sports.yahoo.com/


2002-03-20 15:38:33

by Andy Carlson

Subject: Re: Via-Rhine stalls - transmit errors

On Tue, 19 Mar 2002, Ivan Gurdiev wrote:

> Hello,
>
> I was unsure about the maintainer of the via-rhine
> driver so I'm mailing the bug report to the kernel
> list. Please cc to [email protected].
>

Here is a patch that Urban Widmark originally came up with for 2.4.17,
and that I retrofitted to 2.4.18. I do not know if it will apply to
2.4.19-pre3:

--- linux-2.4.18-orig/drivers/net/via-rhine.c Mon Feb 25 13:38:00 2002
+++ linux-2.4.18-rhine/drivers/net/via-rhine.c Wed Feb 27 15:17:40 2002
@@ -91,7 +91,7 @@
/* A few user-configurable values.
These may be modified when a driver module is loaded. */

-static int debug = 1; /* 1 normal messages, 0 quiet .. 7 verbose. */
+static int debug = 3; /* 1 normal messages, 0 quiet .. 7 verbose. */
static int max_interrupt_work = 20;

/* Set the copy breakpoint for the copy-only-tiny-frames scheme.
@@ -329,7 +329,7 @@

enum chip_capability_flags {
CanHaveMII=1, HasESIPhy=2, HasDavicomPhy=4,
- ReqTxAlign=0x10, HasWOL=0x20, };
+ ReqTxAlign=0x10, HasWOL=0x20, HasIntTxUnderFlow=0x40, };

#ifdef USE_MEM
#define RHINE_IOTYPE (PCI_USES_MEM | PCI_USES_MASTER | PCI_ADDR1)
@@ -343,11 +343,13 @@
{ "VIA VT86C100A Rhine", RHINE_IOTYPE, 128,
CanHaveMII | ReqTxAlign },
{ "VIA VT6102 Rhine-II", RHINE_IOTYPE, 256,
- CanHaveMII | HasWOL },
+ CanHaveMII | HasWOL | HasIntTxUnderFlow },
{ "VIA VT3043 Rhine", RHINE_IOTYPE, 128,
CanHaveMII | ReqTxAlign }
};

+#define MII_DAVICOM_DM9101 0x0181b800
+
static struct pci_device_id via_rhine_pci_tbl[] __devinitdata =
{
{0x1106, 0x6100, PCI_ANY_ID, PCI_ANY_ID, 0, 0, VT86C100A},
@@ -384,27 +386,14 @@
IntrRxDone=0x0001, IntrRxErr=0x0004, IntrRxEmpty=0x0020,
IntrTxDone=0x0002, IntrTxAbort=0x0008, IntrTxUnderrun=0x0010,
IntrPCIErr=0x0040,
- IntrStatsMax=0x0080, IntrRxEarly=0x0100, IntrMIIChange=0x0200,
+ IntrStatsMax=0x0080, IntrRxEarly=0x0100,
+ IntrMIIChange=0x0200, IntrTxUnderflow=0x0200, /* see HasIntTxUnderFlow */
IntrRxOverflow=0x0400, IntrRxDropped=0x0800, IntrRxNoBuf=0x1000,
IntrTxAborted=0x2000, IntrLinkChange=0x4000,
IntrRxWakeUp=0x8000,
IntrNormalSummary=0x0003, IntrAbnormalSummary=0xC260,
};

-/* MII interface, status flags.
- Not to be confused with the MIIStatus register ... */
-enum mii_status_bits {
- MIICap100T4 = 0x8000,
- MIICap10100HdFd = 0x7800,
- MIIPreambleSupr = 0x0040,
- MIIAutoNegCompleted = 0x0020,
- MIIRemoteFault = 0x0010,
- MIICapAutoNeg = 0x0008,
- MIILink = 0x0004,
- MIIJabber = 0x0002,
- MIIExtended = 0x0001
-};
-
/* The Rx and Tx buffer descriptors. */
struct rx_desc {
s32 rx_status;
@@ -464,6 +453,7 @@

/* Frequently used values: keep some adjacent for cache effect. */
int chip_id, drv_flags;
+ u8 rev_id;
struct rx_desc *rx_head_desc;
unsigned int cur_rx, dirty_rx; /* Producer/consumer ring indices */
unsigned int cur_tx, dirty_tx;
@@ -480,14 +470,14 @@
u16 advertising; /* NWay media advertisement */
unsigned char phys[MAX_MII_CNT]; /* MII device addresses. */
unsigned int mii_cnt; /* number of MIIs found, but only the first one is used */
- u16 mii_status; /* last read MII status */
+ u32 mii;
struct mii_if_info mii_if;
};

static int mdio_read(struct net_device *dev, int phy_id, int location);
static void mdio_write(struct net_device *dev, int phy_id, int location, int value);
static int via_rhine_open(struct net_device *dev);
-static void via_rhine_check_duplex(struct net_device *dev);
+static int via_rhine_check_duplex(struct net_device *dev);
static void via_rhine_timer(unsigned long data);
static void via_rhine_tx_timeout(struct net_device *dev);
static int via_rhine_start_tx(struct sk_buff *skb, struct net_device *dev);
@@ -679,19 +669,32 @@
goto err_out_unmap;
}

+ /* "3065" specials ... */
if (chip_id == VT6102) {
+ u8 mode3_reg;
/*
* for 3065D, EEPROM reloaded will cause bit 0 in MAC_REG_CFGA
* turned on. it makes MAC receive magic packet
* automatically. So, we turn it off. (D-Link)
*/
- writeb(readb(ioaddr + ConfigA) & 0xFE, ioaddr + ConfigA);
+ writeb(readb(ioaddr + ConfigA) & 0xFE, ioaddr + ConfigA);
+
+ /*
+ * turn on bit2 in PCI configuration register 0x53 ("PCI_REG_MODE3")
+ * 0x04 == MODE3_MIION
+ * FIXME: what does this do?
+ */
+ pci_read_config_byte(pdev, 0x53, &mode3_reg);
+ pci_write_config_byte(pdev, 0x53, mode3_reg | 0x04);
+
}

dev->irq = pdev->irq;

np = dev->priv;
spin_lock_init (&np->lock);
+ pci_read_config_byte(pdev, PCI_REVISION_ID, &(np->rev_id));
np->chip_id = chip_id;
np->drv_flags = via_rhine_chip_info[chip_id].drv_flags;
np->pdev = pdev;
@@ -733,8 +736,9 @@
if (i)
goto err_out_unmap;

- printk(KERN_INFO "%s: %s at 0x%lx, ",
+ printk(KERN_INFO "%s: %s (rev 0x%02x) at 0x%lx, ",
dev->name, via_rhine_chip_info[chip_id].name,
+ np->rev_id,
(pci_flags & PCI_USES_IO) ? ioaddr : memaddr);

for (i = 0; i < 5; i++)
@@ -747,23 +751,30 @@
int phy, phy_idx = 0;
np->phys[0] = 1; /* Standard for this chip. */
for (phy = 1; phy < 32 && phy_idx < MAX_MII_CNT; phy++) {
- int mii_status = mdio_read(dev, phy, 1);
+ int mii_status = mdio_read(dev, phy, MII_BMSR);
if (mii_status != 0xffff && mii_status != 0x0000) {
np->phys[phy_idx++] = phy;
- np->mii_if.advertising = mdio_read(dev, phy, 4);
- printk(KERN_INFO "%s: MII PHY found at address %d, status "
+ np->advertising = mdio_read(dev, phy, MII_ADVERTISE);
+ np->mii = (mdio_read(dev, phy, MII_PHYSID1) << 16) +
+ mdio_read(dev, phy, MII_PHYSID2);
+ printk(KERN_INFO "%s: MII PHY %8.8x found at address %d, status "
+
"0x%4.4x advertising %4.4x Link %4.4x.\n",
- dev->name, phy, mii_status, np->mii_if.advertising,
- mdio_read(dev, phy, 5));
+ dev->name, np->mii, phy, mii_status, np->advertising,
+ mdio_read(dev, phy, MII_LPA));

- /* set IFF_RUNNING */
- if (mii_status & MIILink)
- netif_carrier_on(dev);
- else
- netif_carrier_off(dev);
+ if ((np->mii & ~0xf) == MII_DAVICOM_DM9101)
+ np->drv_flags |= HasDavicomPhy;
}
}
np->mii_cnt = phy_idx;
+ if (phy_idx == 0) {
+ printk(KERN_WARNING "%s: MII PHY not found -- this device may "
+ "not operate correctly.\n", dev->name);
+ }
+
np->mii_if.phy_id = np->phys[0];
}

@@ -773,15 +784,14 @@
np->mii_if.full_duplex = 1;
np->default_port = option & 0x3ff;
if (np->default_port & 0x330) {
- /* FIXME: shouldn't someone check this variable? */
- /* np->medialock = 1; */
printk(KERN_INFO " Forcing %dMbs %s-duplex operation.\n",
(option & 0x300 ? 100 : 10),
(option & 0x220 ? "full" : "half"));
if (np->mii_cnt)
- mdio_write(dev, np->phys[0], 0,
- ((option & 0x300) ? 0x2000 : 0) | /* 100mbps? */
- ((option & 0x220) ? 0x0100 : 0)); /* Full duplex? */
+ mdio_write(dev, np->phys[0], MII_BMCR,
+ ((option & 0x300) ? 0x2000 : 0) | /* 100mbps */
+ ((option & 0x220) ? 0x0100 : 0)); /* Full duplex */
+
}
}

@@ -962,10 +972,27 @@
for (i = 0; i < 6; i++)
writeb(dev->dev_addr[i], ioaddr + StationAddr + i);

- /* Initialize other registers. */
+ /* Turn on bit3 (OFSET) in TCR during MAC initialization. */
+ writeb(readb(ioaddr + TxConfig) | 0x08, ioaddr + TxConfig);
+
+ /* Turn off mauto */
+ writeb(readb(ioaddr + MIICmd) & 0x7f, ioaddr + MIICmd);
+ if (np->drv_flags & HasDavicomPhy) {
+ udelay(100);
+ } else {
+ int boguscnt = 0x3fff;
+ while (!(readb(ioaddr + MIIRegAddr) & 0x80) && --boguscnt > 0)
+ ;
+ }
+
writew(0x0006, ioaddr + PCIBusConfig); /* Tune configuration??? */
+
+ /* Clear the lower 4 bits in Config D. */
+ outb(inb(ioaddr + ConfigD) & 0xf0, ioaddr + ConfigD);
+
+
/* Configure the FIFO thresholds. */
- writeb(0x20, ioaddr + TxConfig); /* Initial threshold 32 bytes */
+ writeb(0x20, ioaddr + TxConfig); /* Initial threshold */
np->tx_thresh = 0x20;
np->rx_thresh = 0x60; /* Written in via_rhine_set_rx_mode(). */

@@ -988,13 +1015,33 @@
np->chip_cmd |= CmdFDuplex;
writew(np->chip_cmd, ioaddr + ChipCmd);

+ /* Restart MII auto-negotiation for davidcom phy bug */
+ if (np->drv_flags & HasDavicomPhy) {
+ mdio_write(dev,np->phys[0], MII_BMCR,
+ mdio_read(dev, np->phys[0], MII_BMCR) | 0x1200);
+ /* FIXME: shouldn't we wait for it to complete? Adding silly-delay. */
+ udelay(100);
+ }
+
+
via_rhine_check_duplex(dev);

/* The LED outputs of various MII xcvrs should be configured. */
+ /* For Davicom phys, turn on bit 5 in register 0x16. */
/* For NS or Mison phys, turn on bit 1 in register 0x17 */
/* For ESI phys, turn on bit 7 in register 0x17. */
- mdio_write(dev, np->phys[0], 0x17, mdio_read(dev, np->phys[0], 0x17) |
- (np->drv_flags & HasESIPhy) ? 0x0080 : 0x0001);
+ if (np->drv_flags & HasDavicomPhy) {
+ mdio_write(dev, np->phys[0], 0x16,
+ mdio_read(dev, np->phys[0], 0x16) | 0x20);
+ } else if (np->drv_flags & HasESIPhy) {
+ mdio_write(dev, np->phys[0], 0x17,
+ mdio_read(dev, np->phys[0], 0x17) | 0x80);
+ } else {
+ /* All unknowns get this. Could be a problem. */
+ mdio_write(dev, np->phys[0], 0x17,
+ mdio_read(dev, np->phys[0], 0x17) | 0x01);
+ }
+
}
/* Read and write over the MII Management Data I/O (MDIO) interface. */

@@ -1089,29 +1136,70 @@
return 0;
}

-static void via_rhine_check_duplex(struct net_device *dev)
+static int via_rhine_check_duplex(struct net_device *dev)
{
struct netdev_private *np = dev->priv;
long ioaddr = dev->base_addr;
- int mii_lpa = mdio_read(dev, np->phys[0], MII_LPA);
- int negotiated = mii_lpa & np->mii_if.advertising;
+ int mii_reg;
int duplex;
+ int changed = 0;
+
+ mii_reg = mdio_read(dev, np->phys[0], MII_BMSR);
+ if (mii_reg == 0xffff)
+ return changed;
+
+ if (debug > 3)
+ printk(KERN_DEBUG "%s: MII #%d reports: %4.4x, carrier is: %d\n",
+ dev->name, np->phys[0], mii_reg, netif_carrier_ok(dev));
+ if (!(mii_reg & BMSR_LSTATUS)) {
+ if (netif_carrier_ok(dev)) {
+ if (debug > 1)
+ printk(KERN_DEBUG "%s: MII #%d reports no link. Disabling watchdog.\n",
+ dev->name, np->phys[0]);
+ netif_carrier_off(dev);
+ }
+ } else if (!netif_carrier_ok(dev)) {
+ if (debug > 1)
+ printk(KERN_DEBUG "%s: MII #%d link is back. Enabling watchdog.\n",
+ dev->name, np->phys[0]);
+ netif_carrier_on(dev);
+ }
+
+ if (np->drv_flags & HasDavicomPhy) {
+ /* If the link partner doesn't support autonegotiation
+ * the MII detects it's abilities with the "parallel detection".
+ * Some MIIs update the LPA register to the result of the parallel
+ * detection, some don't.
+ * The Davicom PHY [at least 0181b800] doesn't.
+ * Instead bit 8 and 13 of the BMCR are updated to the result
+ * of the negotiation..
+ */
+ mii_reg = mdio_read(dev, np->phys[0], MII_BMCR);
+ duplex = mii_reg & 0x100;
+ } else {
+ int negotiated;
+ mii_reg = mdio_read(dev, np->phys[0], MII_LPA);
+ negotiated = mii_reg & np->advertising;
+
+ duplex = (negotiated & 0x0100) || ((negotiated & 0x02C0) == 0x0040);
+ }

- if (np->mii_if.duplex_lock || mii_lpa == 0xffff)
- return;
- duplex = (negotiated & 0x0100) || (negotiated & 0x01C0) == 0x0040;
+ if (np->mii_if.duplex_lock || mii_reg == 0xffff)
+ return changed;
if (np->mii_if.full_duplex != duplex) {
np->mii_if.full_duplex = duplex;
+ changed = 1;
if (debug)
printk(KERN_INFO "%s: Setting %s-duplex based on MII #%d link"
" partner capability of %4.4x.\n", dev->name,
- duplex ? "full" : "half", np->phys[0], mii_lpa);
+ duplex ? "full" : "half", np->phys[0], mii_reg);
if (duplex)
np->chip_cmd |= CmdFDuplex;
else
np->chip_cmd &= ~CmdFDuplex;
writew(np->chip_cmd, ioaddr + ChipCmd);
}
+ return changed;
}


@@ -1121,7 +1209,6 @@
struct netdev_private *np = dev->priv;
long ioaddr = dev->base_addr;
int next_tick = 10*HZ;
- int mii_status;

if (debug > 3) {
printk(KERN_DEBUG "%s: VIA Rhine monitor tick, status %4.4x.\n",
@@ -1129,19 +1216,8 @@
}

spin_lock_irq (&np->lock);
-
via_rhine_check_duplex(dev);

- /* make IFF_RUNNING follow the MII status bit "Link established" */
- mii_status = mdio_read(dev, np->phys[0], MII_BMSR);
- if ( (mii_status & MIILink) != (np->mii_status & MIILink) ) {
- if (mii_status & MIILink)
- netif_carrier_on(dev);
- else
- netif_carrier_off(dev);
- }
- np->mii_status = mii_status;
-
spin_unlock_irq (&np->lock);

np->timer.expires = jiffies + next_tick;
@@ -1259,6 +1335,8 @@
long ioaddr;
u32 intr_status;
int boguscnt = max_interrupt_work;
+ struct netdev_private *np = dev->priv;
+ int underflow = np->drv_flags & HasIntTxUnderFlow ? IntrTxUnderflow : 0;

ioaddr = dev->base_addr;

@@ -1275,7 +1353,7 @@
via_rhine_rx(dev);

if (intr_status & (IntrTxDone | IntrTxAbort | IntrTxUnderrun |
- IntrTxAborted))
+ IntrTxAborted | underflow))
via_rhine_tx(dev);

/* Abnormal error summary/uncommon events handlers. */
@@ -1323,7 +1401,13 @@
if (txstatus & 0x0100) np->stats.tx_aborted_errors++;
if (txstatus & 0x0080) np->stats.tx_heartbeat_errors++;
if (txstatus & 0x0002) np->stats.tx_fifo_errors++;
+
/* Transmitter restarted in 'abnormal' handler. */
+ if (txstatus & 0x0900) {
+ /* FIXME: also update TxRingPtr ??? */
+ np->tx_ring[entry].tx_status = cpu_to_le32(DescOwn);
+ }
+
} else {
np->stats.collisions += (txstatus >> 3) & 15;
np->stats.tx_bytes += np->tx_skbuff[entry]->len;
@@ -1465,30 +1549,46 @@
{
struct netdev_private *np = dev->priv;
long ioaddr = dev->base_addr;
+ int mii = np->drv_flags & HasIntTxUnderFlow ? 0 : IntrMIIChange;
+ int underflow = np->drv_flags & HasIntTxUnderFlow ? IntrTxUnderflow : 0;

spin_lock (&np->lock);

- if (intr_status & (IntrMIIChange | IntrLinkChange)) {
+ if (intr_status & (mii | IntrLinkChange)) {
+ int changed;
+ if (debug)
+ printk(KERN_DEBUG "%s: Link change, status=%4.4x\n",
+ dev->name, intr_status);
if (readb(ioaddr + MIIStatus) & 0x02) {
/* Link failed, restart autonegotiation. */
if (np->drv_flags & HasDavicomPhy)
mdio_write(dev, np->phys[0], MII_BMCR, 0x3300);
+ /* Will the new link status be reported late? */
+ changed = 0;
+
} else
- via_rhine_check_duplex(dev);
- if (debug)
+ changed = via_rhine_check_duplex(dev);
+ if (debug && changed)
+
printk(KERN_ERR "%s: MII status changed: Autonegotiation "
"advertising %4.4x partner %4.4x.\n", dev->name,
- mdio_read(dev, np->phys[0], MII_ADVERTISE),
- mdio_read(dev, np->phys[0], MII_LPA));
+ mdio_read(dev, np->phys[0], MII_ADVERTISE),
+ mdio_read(dev, np->phys[0], MII_LPA));
+
}
if (intr_status & IntrStatsMax) {
np->stats.rx_crc_errors += readw(ioaddr + RxCRCErrs);
np->stats.rx_missed_errors += readw(ioaddr + RxMissed);
clear_tally_counters(ioaddr);
}
- if (intr_status & IntrTxAbort) {
+ if (intr_status & (underflow | IntrTxAbort)) {
+
/* Stats counted in Tx-done handler, just restart Tx. */
writew(CmdTxDemand | np->chip_cmd, dev->base_addr + ChipCmd);
+ if (debug > 1)
+ printk(KERN_INFO "%s: Transmitter underflow?, status %4.4x.\n",
+ dev->name, intr_status);
+
}
if (intr_status & IntrTxUnderrun) {
if (np->tx_thresh < 0xE0)
@@ -1497,8 +1597,7 @@
printk(KERN_INFO "%s: Transmitter underrun, increasing Tx "
"threshold setting to %2.2x.\n", dev->name, np->tx_thresh);
}
- if ((intr_status & ~( IntrLinkChange | IntrStatsMax |
- IntrTxAbort | IntrTxAborted))) {
+ if ((intr_status & ~( IntrLinkChange | IntrStatsMax | IntrTxAborted))) {
if (debug > 1)
printk(KERN_ERR "%s: Something Wicked happened! %4.4x.\n",
dev->name, intr_status);

--
Andy Carlson |\ _,,,---,,_
[email protected] ZZZzz /,`.-'`' -. ;-;;,_
Cat Pics: http://andyc.dyndns.org/animal.html |,4- ) )-,_. ,\ ( `'-'
St. Louis, Missouri '---''(_/--' `-'\_)


2002-03-21 05:21:03

by Ivan G.

Subject: Re: Via-Rhine stalls - transmit errors

> Here is a patch the Urban Widmark originally came up
> with for 2.4.17,
> and I retrofitted to 2.4.18. I do not know if it
> will patch vs 2.4.19-pre3:

It does. However the problem persists.
I changed the default debug value to 7 again.

Here's more information:
Unlike the old driver this one repeatedly logs:
Mar 20 21:47:50 cobra kernel: eth0: Setting
full-duplex based on MII #1 link partner capability of
3100.
Mar 20 21:48:30 cobra last message repeated 4 times

...when inactive...

The opposite side repeatedly logs:
eth0: lost link beat
eth0: found link beat
eth0: autonegotiation complete: 100BaseT-FD selected

As for the scp transfer.... same problem except
now I get "Transmitter underflow?" errors
and status of 0008.

Here's a section of an example log:
--------------------------------------------------
Mar 20 21:51:51 cobra kernel: eth0: Something Wicked
happened! 001a.
Mar 20 21:51:51 cobra kernel: eth0: Transmitter
underflow?, status 001a.
Mar 20 21:51:51 cobra kernel: eth0: Transmitter
underrun, increasing Tx threshold setting to 60.
Mar 20 21:51:51 cobra kernel: eth0: Something Wicked
happened! 001a.
Mar 20 21:51:52 cobra kernel: eth0: Transmitter
underflow?, status 0008.
Mar 20 21:51:52 cobra kernel: eth0: Something Wicked
happened! 0008.
Mar 20 21:51:56 cobra kernel: NETDEV WATCHDOG: eth0:
transmit timed out
Mar 20 21:51:56 cobra kernel: eth0: Transmit timed
out, status 0000, PHY status 782d, resetting...
Mar 20 21:51:56 cobra kernel: eth0: reset finished
after 5 microseconds.
Mar 20 21:51:56 cobra kernel: eth0: Setting
full-duplex based on MII #1 link partner capability of
3100.
Mar 20 21:51:59 cobra kernel: eth0: Transmitter
underflow?, status 001a.
Mar 20 21:51:59 cobra kernel: eth0: Transmitter
underrun, increasing Tx threshold setting to 40.
Mar 20 21:51:59 cobra kernel: eth0: Something Wicked
happened! 001a.
Mar 20 21:51:59 cobra kernel: eth0: Transmitter
underflow?, status 000a.
Mar 20 21:51:59 cobra kernel: eth0: Something Wicked
happened! 000a.
Mar 20 21:51:59 cobra kernel: eth0: Transmitter
underflow?, status 0008.
Mar 20 21:51:59 cobra kernel: eth0: Something Wicked
happened! 0008.
Mar 20 21:51:59 cobra kernel: eth0: Transmitter
underflow?, status 001a.
Mar 20 21:51:59 cobra kernel: eth0: Transmitter
underrun, increasing Tx threshold setting to 60.
Mar 20 21:51:59 cobra kernel: eth0: Something Wicked
happened! 001a.
Mar 20 21:52:00 cobra kernel: eth0: Transmitter
underflow?, status 0008.
Mar 20 21:52:00 cobra kernel: eth0: Something Wicked
happened! 0008.
Mar 20 21:52:00 cobra kernel: eth0: Transmitter
underflow?, status 0008.
Mar 20 21:52:00 cobra kernel: eth0: Something Wicked
happened! 0008.
Mar 20 21:52:00 cobra kernel: eth0: Transmitter
underflow?, status 0008.
Mar 20 21:52:00 cobra kernel: eth0: Something Wicked
happened! 0008.
Mar 20 21:52:00 cobra kernel: eth0: Setting
full-duplex based on MII #1 link partner capability of
3100.

------------------------
Let me know how I can help.
Thanks for your assistance.




2002-03-21 20:50:13

by Ivan G.

Subject: Re: Via-Rhine stalls - transmit errors

if ((intr_status & ~( IntrLinkChange | IntrStatsMax |
                      IntrTxAborted ))) {
    if (debug > 1)
        printk(KERN_ERR "%s: Something Wicked happened! %4.4x.\n",
               dev->name, intr_status);
    /* Recovery for other fault sources not known. */
    writew(CmdTxDemand | np->chip_cmd, dev->base_addr + ChipCmd);
}

What's classified as "Something Wicked" ?

Mar 20 21:52:00 cobra kernel: eth0: Something Wicked
happened! 0008.

This is tx abort isn't it?

Mar 20 21:51:59 cobra kernel: eth0: Something Wicked
happened! 001a.

...and this should be : tx underrun, tx abort, tx done

are those supposed to be logged as "Wicked"?
Those interrupts are handled earlier aren't they?
if (intr_status & (underflow | IntrTxAbort))
...
if (intr_status & IntrTxUnderrun) {
...


I'm quite ignorant of all this, but I'm trying to
learn. I apologize if this is a stupid question.





2002-03-21 21:56:11

by Richard B. Johnson

Subject: Re: Via-Rhine stalls - transmit errors

On Thu, 21 Mar 2002, Ivan Gurdiev wrote:

> if ((intr_status & ~( IntrLinkChange | IntrStatsMax |
> IntrTxAborted ))) {
> if (debug > 1)
> printk(KERN_ERR "%s: Something Wicked happened!
> %4.4x.\n",dev->name, intr_status);
> /* Recovery for other fault sources not known. */
> writew(CmdTxDemand | np->chip_cmd, dev->base_addr +
> ChipCmd);
> }
>
> What's classified as "Something Wicked" ?
>
> Mar 20 21:52:00 cobra kernel: eth0: Something Wicked
> happened! 0008.
>
> This is tx abort isn't it?
>
> Mar 20 21:51:59 cobra kernel: eth0: Something Wicked
> happened! 001a.
>
> ...and this should be : tx underrun, tx abort, tx done
>
> are those supposed to be logged as "Wicked"?
> Those interrupts are handled earlier aren't they?
> if (intr_status & (underflow | IntrTxAbort))
> ...
> if (intr_status & IntrTxUnderrun) {
> ...
>
>
> I'm quite ignorant of all this, but I'm trying to
> learn. I apologize if this is a stupid question.
>

If there was a link-mode change (100-to-10-base, 1/2 to full duplex) OR
if the chip status overflowed (tx packets, rx, packets, errors, etc) OR
if the transmitter aborted (unplug when active, etc.) THEN
reset and reprogram the chip (after telling you something wicked
happened IF verbose debug is enabled).

You can turn OFF verbose debug (set debug to 1) and you won't have
the message.


Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

Windows-2000/Professional isn't.

2002-03-22 02:33:32

by Ivan G.

Subject: Re: Via-Rhine stalls - transmit errors

Okay...let's see.
I've been playing around with Urban's patch
and now I'll risk making a fool of myself
sharing some of my 'ideas' - heh.

ISSUE 1: Those "Something Wicked" messages
Present in original and patch too.

> if ((intr_status & ~( IntrLinkChange | IntrStatsMax |
> IntrTxAborted ))) {

Richard,
Isn't this bitwise AND with a complement...
Meaning negation?
The way I understood it:
If this is an interrupt that is NOT IntrLinkChange,
IntrStatsMax or IntrTxAborted, we don't know what's
going on but it can't be good so print error message
and reset the chip.....a block designed to trap
all other problems at the end of the error function.

I still don't see the point behind this.
The function via_rhine_error is called only once like
this:
if (intr_status & (IntrPCIErr | IntrLinkChange |
IntrMIIChange | IntrStatsMax | IntrTxAbort |
IntrTxUnderrun))
via_rhine_error(dev, intr_status);

so if none of those interrupts are present,
the error function won't even be called.
So why check for anything else?

Inside error function:
if (intr_status & (mii | IntrLinkChange)) {
takes care of IntrLinkChange and IntrMIIChange

if (intr_status & IntrStatsMax) {
takes care of IntrStatsMax

if (intr_status & (underflow | IntrTxAbort)) {
takes care of IntrTxUnderflow and IntrTxAbort

if (intr_status & IntrTxUnderrun) {
takes care of IntrTxUnderrun

only IntrPCIErr is missing....that could have
triggered this function call...
so why not just add:
if (intr_status & IntrPCIErr) {
-do error message
-reset chip
and get rid of the "Wicked" checks...
They print misleading error messages...
Maybe I'm missing something.


ISSUE 2: Repetitive negotiation of full duplex
Created by the Urban patch.

Andy,
Um...
I did some printouts and ended up with duplex being
256....probably because of: duplex = mii_reg & 0x100;
Also: np->mii_if.full_duplex = duplex; refused
to set full_duplex to 256 (?)

I am not sure but I believe, based on other code,
that duplex is supposed to be 0 or 1.

changed to
duplex = (mii_reg & 0x100)? 1:0;
and it's working fine now - negotiates only once.
full_duplex actually changes...

ISSUE 3: Well, my card is still stalling.
But I should probably leave this to somebody
who actually has a clue about those things.
The log looks a lot cleaner now, though:

.....
Mar 21 19:04:10 cobra kernel: eth0: Transmitter
underflow?, status 001a.
Mar 21 19:04:10 cobra kernel: eth0: Transmitter
underrun, increasing Tx threshold setting to 40.
Mar 21 19:04:15 cobra kernel: eth0: Transmitter
underflow?, status 0008.
Mar 21 19:04:15 cobra kernel: eth0: Transmitter
underflow?, status 0008.
Mar 21 19:04:19 cobra kernel: NETDEV WATCHDOG: eth0:
transmit timed out
Mar 21 19:04:19 cobra kernel: eth0: Transmit timed
out, status 0000, PHY status 782d, resetting...
Mar 21 19:04:19 cobra kernel: eth0: reset finished
after 5 microseconds.
Mar 21 19:04:24 cobra kernel: eth0: Transmitter
underflow?, status 001a.
Mar 21 19:04:24 cobra kernel: eth0: Transmitter
underrun, increasing Tx threshold setting to 40.
Mar 21 19:04:24 cobra kernel: eth0: Transmitter
underflow?, status 000a.
Mar 21 19:04:29 cobra kernel: NETDEV WATCHDOG: eth0:
transmit timed out
Mar 21 19:04:29 cobra kernel: eth0: Transmit timed
out, status 0000, PHY status 782d, resetting...
Mar 21 19:04:29 cobra kernel: eth0: reset finished
after 5 microseconds.
Mar 21 19:04:36 cobra kernel: eth0: Transmitter
underflow?, status 0008.
Mar 21 19:04:36 cobra last message repeated 2 times
Mar 21 19:04:39 cobra kernel: NETDEV WATCHDOG: eth0:
transmit timed out
Mar 21 19:04:39 cobra kernel: eth0: Transmit timed
out, status 0000, PHY status 782d, resetting...
Mar 21 19:04:39 cobra kernel: eth0: reset finished
after 5 microseconds.
...........

So?
Is any of the above correct?
Or am I really close to frying my ethernet controller?
:)
Either way, changing stuff in the kernel's been fun.
I'll investigate some more.

Also: an off-topic question...
How do I reply to a particular message..
So that my messages appear in thread format
with more than 1 level...
In-Reply To rather than Maybe In Reply-To
This thread is starting to grow and I'd like to know.



2002-03-24 12:41:19

by Urban Widmark

Subject: Re: Via-Rhine stalls - transmit errors

On Wed, 20 Mar 2002, Ivan Gurdiev wrote:

> > Here is a patch the Urban Widmark originally came up
> > with for 2.4.17,
> > and I retrofitted to 2.4.18. I do not know if it
> > will patch vs 2.4.19-pre3:
>
> It does. However the problem persists.
> I changed the default debug value to 7 again.

That patch was an attempt to fix failed transmission and it contains all
sorts of junk. It worked for Andy, but not for one of the others with that
problem (and that motherboard, a Dragon+ with on-board eth chip).

You probably shouldn't use that patch at all ...


VIA has drivers for VT86c100A, the VT6102 and the VT6105, available here:
http://www.viaarena.com/?PageID=71

They are all obviously derived from the Donald Becker and the in-kernel
driver and are as such GPL, even if there is no mention of any license.
(someone should probably talk to VIA about that ...)


What you could do is:
a) Try those drivers instead and see if any one works better.
The 6105 driver should work with the older chips too.

b) Start moving bits of code from those drivers to the kernel driver.
The drivers have lots of "if vt3065 do this" comments, and then they
flip some bit that you couldn't have guessed needed flipping ...

I think there is very little chance of you frying your controller by
doing this.

But watch out for the multiple entry points, they support 2.2 and 2.4
by having both old and new init code.


There is also the Donald Becker driver at http://www.scyld.com/

There is an explanation of common "something wicked" errors on
http://www.scyld.com/network/ethercard.html

001a would be
0010 "Transmit FIFO underrun" = Slow or busy PCI bus
0008 "Transmit error" = Transmit aborted because of excessive collisions

So if you trust the explanations of the errors it could just be something
with cable/hub/switch or a fun PCI error. I know some have solved their
001a's by switching slots for their cards.


> The opposite side repeatedly logs:
> eth0: lost link beat
> eth0: found link beat
> eth0: autonegotiation complete: 100BaseT-FD selected

You had a "VT86c100A"? I think this is a bug in that patch where it
misdetects link changes or something. There were ideas on changed meaning
of an interrupt bit (0x0200) and the "fix" for that is probably causing
this.

The other main idea for Andy's problem was that the initial tx threshold
should be increased. The 6105 driver has code that allows you to set the
default as a module parameter, which could be useful.

/Urban

2002-03-26 01:52:43

by Ivan G.

Subject: Re: Via-Rhine stalls - transmit errors


> That patch was an attempt to fix failed transmission
> and it contains all sorts of junk. It worked for
> Andy, but not for one of the others with that
> problem (and that motherboard, a Dragon+ with
> on-board eth chip).
>
> You probably shouldn't use that patch at all ...

Ok, I'll follow your advice and start from scratch...
or the kernel driver that is :) Interestingly, however,
I was able to somehow get consistent stall-free
transfers for a while using your patch and some other
fixes of my own...but then I changed things and was
unable to reproduce the same situation... This is
happening again with the kernel driver. Sometimes my
card works great, sometimes it works ok, and sometimes
it doesn't. I am losing faith in all hardware...
now I don't trust my Netgear card on the other side
either. The thing is, this Via-rhine card works fine
under Win 98 (I have a dual-boot) or under light load
(ping or ping flood). It only breaks under heavy
transmit.


> VIA has drivers for VT86c100A, the VT6102 and the
> VT6105, available here:
> http://www.viaarena.com/?PageID=71

tried that one...it seemed to fix transmit when
initiated from my desktop computer, but it freezes
everything when I initiate the transmit in the same
direction from the laptop.

....just like the time I decided to divide by 0
in the kernel :)

> There is also the Donald Becker driver at
> http://www.scyld.com/

This one won't compile. Lots of errors.
Entire include files are missing.



> There is an explanation of common "something wicked"
> errors on
> http://www.scyld.com/network/ethercard.html...

> So if you trust the explanations of the errors it

I do, the errors are correct. However, for some of
those errors you can't even get the "something wicked"
message the way that the code is written.
Other errors are handled elsewhere. The whole
thing is complicated and may cause redundancies
and problems. Error handling needs improvement.

> There were ideas on changed meaning
> of an interrupt bit (0x0200) and the "fix" for that
> is probably causing this.

Isn't this your patch?
It adds an MII/underflow interrupt switch scheme.

By the way,

/* Enable interrupts by setting the interrupt mask. */
writew(IntrRxDone | IntrRxErr | IntrRxEmpty | IntrRxOverflow |
       IntrRxDropped | IntrTxDone | IntrTxAbort | IntrTxUnderrun |
       IntrPCIErr | IntrStatsMax | IntrLinkChange | IntrMIIChange,
       ioaddr + IntrEnable);


Where's IntrRxEarly? IntrRxNoBuf? IntrRxWakeUp?
IntrTxAborted? .... I added those to my version.



2002-03-26 21:20:12

by Urban Widmark

Subject: Re: Via-Rhine stalls - transmit errors

On Mon, 25 Mar 2002, Ivan Gurdiev wrote:

> tried that one...it seemed to fix transmit when
> initiated from my desktop computer, but it freezes
> everything when I initiate the transmit in the same
> direction from the laptop.

Ok, that's bad but still not so bad. It could be a locking problem in the
via driver, but that is fixable: you just need to find out where it hangs
and then guess why :)

(Use sysrq to get stack traces and run them through ksymoops, or use a
kernel debugger or some deadlock detection thing. If no one gives you more
specific directions, search for ikd.)


> This one won't compile. Lots of errors.
> Entire include files are missing.

You need to follow the instructions on those pages, there are some
additional header files for backwards compatibility. I don't know if it
compiles under 2.4 but I think the idea is that it should.


> I do, the errors are correct. However, for some of
> those errors you can't even get the "something wicked"
> message the way that the code is written.
> Other errors are handled elsewhere. The whole
> thing is complicated and may cause redundancies
> and problems. Error handling needs improvement.

I believe the "something wicked" is for an error/uncommon event that isn't
handled and so the known events are filtered out. If you get an IntrTxAbort
by itself then a message isn't printed, but if you get an IntrTxAbort and
IntrTxDone you get some output (000a).
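
The filtering described here can be sketched in a few lines of C. This is a simplified illustration, not the driver's code: the bit values are the ones quoted in this thread, and the assumption (flagged in the comment) is that IntrTxAbort is the only "known" event. The real driver masks out more bits than this.

```c
/* Bit values as quoted in this thread. */
#define INTR_TX_DONE   0x0002u
#define INTR_TX_ABORT  0x0008u

/*
 * Assumption: only IntrTxAbort is treated as "known" here. The real driver
 * masks out more bits; this is just enough to reproduce the 0008 vs 000a case.
 */
#define KNOWN_EVENTS   INTR_TX_ABORT

/* Returns the bits left after filtering; 0 means no message is printed. */
static unsigned int wicked_bits(unsigned int intr_status)
{
    return intr_status & ~KNOWN_EVENTS;
}
```

With this mask, 0008 filters down to zero (silent), while 000a leaves the 0002 bit behind, so the message is printed with the whole status word. The 001a from the original report is not filtered to zero either.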

If you look at the via code or the parts I copied from it, then yes errors
are handled both in the error and the tx parts of the interrupt handler.
The reason for that is that some handling is done for error bits set in
the descriptor word and the others are based on the interrupt bits.

Perhaps those aren't the redundancies you mean.


> > There were ideas on changed meaning
> > of an interrupt bit (0x0200) and the "fix" for that
> > is probably causing this.
>
> Isn't this your patch?

Yes? I don't understand that question to that statement.

People around me (in a virtual sense) had ideas that the newer datasheets
described a different function for that bit. When fixing that it seems
like the older chips weren't too happy. But the goal of that patch was to
try and fix the "Dragon+" errors so it wasn't tested on older chips.


> Where's IntrRxEarly? IntrRxNoBuf? IntrRxWakeUp?
> IntrTxAborted? .... I added those to my version.

I don't know. Possibly those flags are never set without also setting
another flag, like IntrRxErr, and the driver doesn't want to do anything
special on a "IntrRxNoBuf" error anyway. I see no harm in adding them.

Maybe ask Donald Becker why he didn't when he wrote that code?

/Urban

2002-03-28 08:51:19

by Ivan G.

Subject: Re: Via-Rhine stalls - transmit errors

> (sysrq to get stacktraces and run through ksymoops, or kernel debugger or
> some deadlock detection thing, if no one gives you more specific
> directions search for ikd)

I don't have much experience with traces.
I'd rather mess with the mainstream driver
and try to merge code (read on..)

> You need to follow the instructions on those pages, there are some
> additional header files for backwards compatibility. I don't know if it
> compiles under 2.4 but I think the idea is that it should.

I don't need backwards compatibility.
I have kernel 2.4.19-pre3.
And it doesn't compile.

> I believe the "something wicked" is for an error/uncommon event that isn't
> handled and so the known events are filtered out.

if (intr_status & (IntrPCIErr | IntrLinkChange |
IntrMIIChange| IntrStatsMax | IntrTxAbort |
IntrTxUnderrun))
via_rhine_error(dev, intr_status);

Which ones aren't handled?
The only one I see is PCI Error....

> If you get an IntrTxAbort
> by itself then a message isn't printed, but if you get an IntrTxAbort and
> IntrTxDone you get some output (000a).

This is actually a great example of the
'redundancies' I was talking about.
You get both Abort and Done...
Abort handler sends command CmdTxDemand.
"Wicked" message handler excludes Abort
but not Done, so it will: (1) print message,
(2) send CmdTxDemand (once again)

> Possibly those flags are never set without also setting
> another flag, like IntrRxErr,

Why? Each interrupt should have its own bit
in the bitmask.

> the driver doesn't want to do anything
> special on a "IntrRxNoBuf" error anyway.

IntrRxEarly is not utilised in this driver
IntrRxWakeUp is used to call rx function
IntrRxNoBuf is used to call rx function
IntrTxAborted is used to call tx function
and NOT used to call error function(??)
while it is being used inside the error function
for exclusion from "Wicked" messages.

___________
Ok.. on to other major issues:

I tried merging some code from the linuxfet
driver, and I managed to solve the stall problem
and get some interesting speed results.

I know what I did, but I am not certain
exactly how it helps the problems. My lack
of knowledge regarding the operation
of the hardware is beginning to be frustrating.
Perhaps you could help locate the problem...

The summary:
Tx aborted and Tx Underrun
are handled differently in linuxfet.

The kernel driver
simply increases stats for aborts
and ignores underruns in via_rhine_tx,
and later sends CmdDemandTx for aborts,
and increases threshold for Underrun
in via_rhine_error.

The linuxfet driver uses the following
code to handle both aborts and underruns
inside the interrupt handler
(tx sequence...separated as a function
in kernel driver)
/*----------------------------------------------*/
np->tx_ring[entry].tx_status = cpu_to_le32(DescOwn);

writel(virt_to_bus(&np->tx_ring[entry]), ioaddr + TxRingPtr);
/* Turn on Tx On */
writew(CmdTxOn | np->chip_cmd, dev->base_addr + ChipCmd);
/* Stats counted in Tx-done handler, just restart Tx. */
writew(CmdTxDemand | np->chip_cmd, dev->base_addr + ChipCmd);
/*----------------------------------------------*/
I am particularly curious about the ownership bits
and the TxRingPtr save... no such thing
in kernel via_rhine_tx...I don't think.

Then the linuxfet driver doesn't do
anything for abort in the error handler
and increases threshold for underrun.
______________________________________
I moved the code above to the kernel driver
and made it handle aborts and underruns
the same way. I disabled the CmdTxDemand
for aborts in the error handler since it's now
done in the tx function.

This way, I fixed stalls on the Desktop.
However, my laptop was still slow and stalling.
(desktop->laptop transmit, laptop initiates.)

I commented the underrun code in via_rhine_tx
and it fixed all stalls, but speed decreased.

I enabled underrun code in via_rhine_tx
and commented underrun code in via_rhine_error-
same results - decreased speed, no stalls.

I wish I could explain all of this,
but I have little knowledge of hardware operation.
Apparently the handling of aborts is the cause
of stalling. Previously I had logged ownership
bits and stalls always occurred when
an ownership bit was set to 0, but transmit
stopped (result of abort, wrong handling?)
The underrun code has effect on speed
and on transmits initiated from my laptop..
but I am too confused to comment about it.
I am sure this is a horribly ignorant question,
but what exactly is an underrun :)

Thank you for all your help.




2002-04-04 22:11:07

by Urban Widmark

Subject: Re: Via-Rhine stalls - transmit errors

On Thu, 28 Mar 2002, Ivan Gurdiev wrote:

(A week is probably a personal record ... just hasn't been time to think
about funny rhine problems)

> > If you get an IntrTxAbort
> > by itself then a message isn't printed, but if you get an IntrTxAbort and
> > IntrTxDone you get some output (000a).
>
> This is actually a great example of the
> 'redundancies' I was talking about.
> You get both Abort and Done...
> Abort handler sends command CmdTxDemand.
> "Wicked" message handler excludes Abort
> but not Done, so it will: (1) print message,
> (2) send CmdTxDemand (once again)

Setting CmdTxDemand twice is probably (thought to be) harmless. Also
IntrTxAbort doesn't necessarily want the same fix as IntrTxAbort |
IntrTxDone even if it currently does the same.

> > Possibly those flags are never set without also setting
> > another flag, like IntrRxErr,
>
> Why? Each interrupt should have its own bit
> in the bitmask.

The bitmask is the hardware interrupts, so they do. I meant if the
hardware always sets those bits together (and no, I don't know why Donald
can make that assumption if that is what he did).


> /*----------------------------------------------*/
> np->tx_ring[entry].tx_status = cpu_to_le32(DescOwn);
>
> writel(virt_to_bus(&np->tx_ring[entry]), ioaddr + TxRingPtr);
> /* Turn on Tx On */
> writew(CmdTxOn | np->chip_cmd, dev->base_addr + ChipCmd);
> /* Stats counted in Tx-done handler, just restart Tx. */
> writew(CmdTxDemand | np->chip_cmd, dev->base_addr + ChipCmd);
> /*----------------------------------------------*/
> I am particularly curious about the ownership bits
> and the TxRingPtr save... no such thing
> in kernel via_rhine_tx...I don't think.

The DescOwn tells if the NIC owns the descriptor. TxRingPtr points to RAM
that holds the descriptor ring. Both exist in both drivers, but the kernel
driver looks a little different because it uses the 2.4 PCI DMA interface.

Oh you mean their error handling. No, the descriptor status is only used
to log errors in the kernel driver.

> Then the linuxfet driver doesn't do
> anything for abort in the error handler
> and increases threshold for underrun.

You have descriptor status "txstatus" that they do the above on but the
kernel driver only does a tx_aborted_errors++ on. Question is if that
txstatus is always accompanied by a IntrTxAbort or IntrTxAborted.
(Note that the name "abort" is used for different abort flags, not
necessarily having the same meaning)

> ______________________________________
> I moved the code above to the kernel driver
> and made it handle aborts and underruns
> the same way. I disabled the CmdTxDemand

If you just copied it the "virt_to_bus(&np->tx_ring[entry])" construct
isn't supposed to be used. I think. But I also think it ends up being the
same on x86 so it probably doesn't matter for you.

> This way, I fixed stalls on the Desktop.
> However, my laptop was still slow and stalling.
> (desktop->laptop transmit, laptop initiates.)
>
> I commented the underrun code in via_rhine_tx
> and it fixed all stalls, but speed decreased.

Could you send diffs for these changes? Incremental would be nice (first
vs the orig driver, then against the modified and so on). It's kind of
hard to follow what changes you make.

If speed decreases then maybe it is still aborting a lot, and if CmdTxOn
or something takes a little bit of time it would be noticeable.

It could also be that you have decreased the speed of the driver and that
it for some reason then doesn't trigger the event that stalls it.

Another thought is that aborted in txstatus happens a lot more than
IntrTxAbort and that you have increased the workload of the driver.
Perhaps the aborted error handling in the linuxfet driver should be done
in the IntrTxAbort handler instead.

> I wish I could explain all of this,
> but I have little knowledge of hardware operation.
> Apparently the handling of aborts is the cause
> of stalling. Previously I had logged ownership
> bits and stalls always occurred when
> an ownership bit was set to 0, but transmit
> stopped (result of abort, wrong handling?)

DescOwn 0 means that the chip no longer has any right to access that
descriptor, so it should stop. If the chip signals IntrTxDone |
IntrTxAbort, then it seems reasonable that it would no longer own the
descriptor because it says it is done with it.

Btw, this could be a reason why IntrTxDone | IntrTxAbort would be
different from IntrTxAbort (if the second ever happens) and that they
should be treated differently (the current code has a "Recovery for other
fault sources not known." in the wicked-part but maybe that should be
something else).

That it stalls could be that it for some reason doesn't continue sending
the new descriptors that are written when packets are moved to the driver.
You could examine the state of the tx descriptors when it stalls, and if
they still make up a ring.

If the TxRingPtr is readable then I think you can look at that to see
which descriptor the card thinks is current. If that pointer does not
match what the cur_tx ("entry") points to then something has made them
lose sync and I guess that could stop progress.
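
The two checks suggested above (do the descriptors still form a ring, and which slot does a descriptor pointer refer to) could look roughly like the following sketch. The struct is simplified for illustration, the helper names are made up, and the ring base address and descriptor spacing are parameters the caller has to supply. Reading the chip's pointer back is not shown, since whether TxRingPtr is readable is still an open question here.

```c
#include <stdint.h>

#define TX_RING_SIZE 16

/* Simplified descriptor: only the link field matters for this check. */
struct tx_desc {
    uint32_t tx_status;
    uint32_t addr;
    uint32_t next_desc;   /* bus address of the following descriptor */
};

/*
 * Returns 1 if every next_desc points at the following slot and the last
 * slot points back at the first, i.e. the descriptors still form a ring.
 * `ring_base` is the bus address of slot 0 and `desc_size` the spacing
 * between descriptors (must be non-zero).
 */
static int ring_is_closed(const struct tx_desc *ring, uint32_t ring_base,
                          uint32_t desc_size)
{
    unsigned int i;

    for (i = 0; i < TX_RING_SIZE; i++) {
        uint32_t expect = ring_base + ((i + 1) % TX_RING_SIZE) * desc_size;

        if (ring[i].next_desc != expect)
            return 0;
    }
    return 1;
}

/*
 * Returns the slot a descriptor bus address refers to, or -1 if the address
 * does not land on a descriptor boundary inside the ring.
 */
static int slot_of(uint32_t desc_addr, uint32_t ring_base, uint32_t desc_size)
{
    uint32_t off = desc_addr - ring_base;

    if (desc_addr < ring_base || off % desc_size != 0 ||
        off / desc_size >= TX_RING_SIZE)
        return -1;
    return (int)(off / desc_size);
}
```

If the slot computed from the chip's pointer differs from the slot the driver believes is current, the two have lost sync.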


> The underrun code has effect on speed
> and on transmits initiated from my laptop..
> but I am too confused to comment about it.
> I am sure this is a horribly ignorant question,
> but what exactly is an underrun :)

My understanding:
A buffer on the chip was empty when it needed to send more data to keep up
with the ethernet speed. Increasing the buffer means that it has more time
to get the rest of the data, and store&forward means that it reads the
full packet into the buffer before beginning to send.
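
To make that concrete, here is a toy model in C. It is not a description of how the Rhine actually works: it simply assumes the chip starts sending once a threshold number of bytes is buffered, that the bus fills the buffer at a constant rate and that the wire drains it at a constant rate. The function name and parameters are invented for this example.

```c
/*
 * Toy model, not a description of the Rhine hardware: the chip starts
 * sending once `thresh` bytes of a `len`-byte packet are in its buffer, the
 * bus keeps filling the buffer at `bus_rate` and the wire drains it at
 * `wire_rate` (same units for both, e.g. bytes per microsecond).
 *
 * Returns 1 if the wire would catch up with the bus before the whole packet
 * has arrived (an underrun), 0 otherwise.
 */
static int would_underrun(unsigned long len, unsigned long thresh,
                          unsigned long bus_rate, unsigned long wire_rate)
{
    if (thresh >= len)      /* store and forward: whole packet is buffered */
        return 0;
    /*
     * The rest of the packet needs (len - thresh) / bus_rate to arrive.
     * In that time the wire wants wire_rate * (len - thresh) / bus_rate
     * bytes; if that is more than len, the buffer ran dry along the way.
     * Multiplying through by bus_rate avoids the division.
     */
    return wire_rate * (len - thresh) > len * bus_rate;
}
```

In this model a 1500 byte packet with a 64 byte threshold underruns when the bus delivers at half the wire rate, does not underrun once the threshold is raised to 1000 bytes, and can never underrun in store and forward mode (threshold equal to the packet length).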

Have you tried simply increasing the tx_thresh and not change anything in
the IntrTxUnderrun handling?

/Urban

2002-04-05 05:52:49

by Ivan G.

Subject: Re: Via-Rhine stalls - transmit errors

> (A week is probably a personal record ... just hasn't been
> time to think about funny rhine problems)

Well, thanks for even bothering with my problems.
Appreciate it.

Inconsistent hardware tests are driving me insane.
I am beginning to think that I am wasting both my and your
time by doing such testing. My card is stalling again and
I have no idea why. I think I should start concentrating more
on software code and what it does, rather than judging by
the way hardware "seems" to work. For some reason
I never know what to expect when I reboot
that computer and try again.

Because of those problems, I think that I should list all
the things that I have added or consider adding to my
version of the driver and, with your help (if you have time
for this stuff), decide which ones to keep and integrate into
a patch, and which ones to abandon. This way, at least I would
know that my conclusions/confusions are based on the cleanest
code possible at the moment. Also, at least some improvement
will come out of this, while fixing my card transmit is an
unsure thing.

So:
Here we go...
Enumerated list of all issues I'm concerned about...
Please approve or disapprove changes if you have time :)

1) Type of chip

Attempting to fix any device clearly requires knowing what
kind it is. I obtained my ethernet card from a friend and was
unsure about the model - I know that it has a davicom 9101f
chip on it. So far, I thought my card was a VT86C100A, because
that's what /proc/pci says. However, I see now the driver
and via-diag identify the card as VT3043 Rhine. What could
cause the inconsistency and how can I check for sure?


Related code:
In the meantime, trying to check that
I ran into the following issue:
init one: chip_id: 2
wait for reset, chip_id: 0
wait for reset, chip_id: 2
(these are my own debug messages)

Cause of the problem: The first time wait_for_reset is called
(in init_one) np->chip_id is not initialized.
Effect of the problem: Delay code for 3043
and VT86C100A will also affect the Rhine-II
the first time wait_for_reset is run since chip_id
is always 0. Fix: Unsure of the best way to fix.

Related code:
I am also concerned about this:
In via_rhine_error there is code
related to link change that includes HasDavicomPhy.
HasDavicomPhy is not included in the chip_info
structure of any of the three chips supported.
According to the other drivers I've seen
it should be. Fix: Should I add that flag
to the 3043 and the VT86C100A? if those
are the right cards.


2) Full duplex.
So far I've been ignoring the fact that my transmit,
whenever it works, however it works, is very very slow.
I use two cards capable of full duplex over a crossover
cable. The driver and diagnostic programs of both
cards report full duplex speed. However, my transfer speed,
even if not stalling does not exceed 1.5mb/s.
I assumed transmit errors were responsible,
however: while I was changing the driver
I ran into a situation where my card was actually
transmitting at the speed it is supposed to - 7Mb/s+
I couldn't reproduce the situation. Do you think
this is related to the transmit errors or it is a duplex
negotiation issue?

3) The missing interrupts

I added those. There's 3 or 4 of them.

4) Queue debug message

printk(KERN_DEBUG "%s: Transmit frame #%d queued in slot %d.\n",
dev->name, np->cur_tx-1, entry);

Changed np->cur_tx to np->cur_tx-1.
I thought the frame was one off...

5) The abort handling in linuxfet vs. kernel driver
The underflow handling in linuxfet vs. kernel driver

You say that the descriptor status
is only used to log errors in the kernel.
Is this correct handling? Why does the linuxfet
driver set the ownership bit and send
CmdTxOn, CmdTxDemand in cases of abort and underflow?
How would the hardware react to such handling?
What's a good resource to check on those commands?
The datasheets are not very verbose.

I'll do tests on ownership bit
and descriptor status myself.
I would just like to clear those other
issues since they might interfere...

6) Rx Threshold

...defaults to 0x60 in kernel driver
and 0xE0 in another one I've seen -
either linuxfet or Becker. Which is correct?

7) Those "Wicked" messages

Ok. I understand why you could have a case
to handle any weird interrupt combinations
combining error interrupts and others
(such as Abort/Done).
However, how do you explain Becker's driver...

if ((intr_status & ~(IntrLinkChange | IntrMIIChange | IntrStatsMax |
IntrTxAbort|IntrTxAborted | IntrNormalSummary))

it excludes IntrNormalSummary from those messages.
TxAbort/TxDone would be excluded in this case.
Exactly what kind of error should this message trap?

8) Some other information:
cat /proc/net/dev
on desktop (the via card) shows transmit errors.
cat /proc/net/dev
on laptop (the netgear opp. card) shows
large amounts of packets under FRAME.
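
On point 7, the effect of that mask on the status words seen in this thread can be worked out with a small sketch. The values for tx done, tx abort and tx underrun are the ones quoted in this thread; the remaining values are recalled from the driver source and are assumptions that should be checked against the actual tree. The helper name is made up.

```c
/* Bit values quoted in this thread. */
#define INTR_RX_DONE     0x0001u
#define INTR_TX_DONE     0x0002u
#define INTR_TX_ABORT    0x0008u
#define INTR_TX_UNDERRUN 0x0010u
/* Assumed values, recalled from the driver source; check against your tree. */
#define INTR_STATS_MAX   0x0080u
#define INTR_MII_CHANGE  0x0200u
#define INTR_TX_ABORTED  0x2000u
#define INTR_LINK_CHANGE 0x4000u
#define INTR_NORMAL_SUMMARY (INTR_RX_DONE | INTR_TX_DONE)

/* The set of bits excluded by the condition quoted in point 7. */
#define EXCLUDED (INTR_LINK_CHANGE | INTR_MII_CHANGE | INTR_STATS_MAX | \
                  INTR_TX_ABORT | INTR_TX_ABORTED | INTR_NORMAL_SUMMARY)

/* Bits that survive the mask; non-zero means the message would be printed. */
static unsigned int leftover_bits(unsigned int intr_status)
{
    return intr_status & ~EXCLUDED;
}
```

Under these assumptions 000a (abort + done) is filtered to zero and stays silent, while 001a still leaves the underrun bit (0010), so the message would only trap bits outside the excluded set.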

-----

Ok, this is all for now.
I apologize for the long message
and any confusion I'm creating.
Just trying to help. I would really like to fix
this driver but I don't seem to be very good at this.
I will do additional testing with ownership bits
and try changing thresholds, etc.

2002-04-07 06:48:35

by Ivan G.

Subject: Re: Via-Rhine stalls - transmit errors

Regarding issue #6 (or whatever one that was)
Ownership bits, tx rings and other fun stuff:
Here's a bunch of logs I generated which clearly show a problem
with perhaps missed interrupts? mishandled ownership bits??
I do not know the cause but here's the evidence.

Info:
Logs are generated using a modified kernel driver.
Major changes in operation include abort handling from linuxfet driver.
However, you'll notice the problem I'm talking about does not occur
after either Abort or Aborted interrupt. In fact, I think I have previously
detected the same problem with the original driver.

More Info:
These are sections of a dmesg -c >> log of an scp transfer
between laptop and desktop. The desktop stubbornly refused to stall this time
(but it stalls other times!), however, the laptop stalled every once in a
while so it generated the timeout messages I was looking for. The transfer
has to be INITIATED from the laptop - didn't stall otherwise (but I'm not
sure about any hardware tests - I'd prefer to look at logs)

So, here are the logs with commentary:

At the beginning, a normal? log
//--------------------------------------------------------
Tx descriptor slot 0: tx_status: 00000000, addr: 13350000, next_desc: 1352a110
Tx descriptor slot 1: tx_status: 00000000, addr: 13350600, next_desc: 1352a120
Tx descriptor slot 2: tx_status: 00000000, addr: 13350c00, next_desc: 1352a130
Tx descriptor slot 3: tx_status: 00000000, addr: 13351200, next_desc: 1352a140
Tx descriptor slot 4: tx_status: 00000000, addr: 13351800, next_desc: 1352a150
Tx descriptor slot 5: tx_status: 00000000, addr: 13351e00, next_desc: 1352a160
Tx descriptor slot 6: tx_status: 00000000, addr: 13352400, next_desc: 1352a170
Tx descriptor slot 7: tx_status: 00000000, addr: 13352a00, next_desc: 1352a180
Tx descriptor slot 8: tx_status: 00000000, addr: 13353000, next_desc: 1352a190
Tx descriptor slot 9: tx_status: 00000000, addr: 13353600, next_desc: 1352a1a0
Tx descriptor slot 10: tx_status: 00000000, addr: 00000000, next_desc: 1352a1b0
Tx descriptor slot 11: tx_status: 00000000, addr: 00000000, next_desc: 1352a1c0
Tx descriptor slot 12: tx_status: 00000000, addr: 00000000, next_desc: 1352a1d0
Tx descriptor slot 13: tx_status: 00000000, addr: 00000000, next_desc: 1352a1e0
Tx descriptor slot 14: tx_status: 00000000, addr: 00000000, next_desc: 1352a1f0
Tx descriptor slot 15: tx_status: 00000000, addr: 00000000, next_desc: 1352a100
//-----------------------------------------------------//
frame number is evidence that my frame-1 fix is working.
this log seems normal, except
1) are the addresses supposed to be initialized ? rx addresses are ...
2) what exactly do addr and next_desc point to? how can i check those
addresses.

------------------------------------------------------
Anyway, here's the abnormal piece causing the problems:
Look at txstatus - notice one 0002 interrupt (tx done) removes 2 ownership
bits, after which another interrupt removes 0, transmit stops soon, and the
queue keeps going on until timeout. In another log, I recorded many
exit_status interrupts between the ownership lock
and the NETDEV timeout. After the timeout, addr fields are marked bad.
Here's the log:

Descriptor messages PRECEDE the interrupt message.
(Interrupt has occurred but you get the message after the ownership logs)

Notice the cur->tx and dirty->tx reported after timeout.

0002 is the transmit interrupt, 0001 is the receive one

Tx descriptor slot 0: tx_status: 00000000, addr: 13350000, next_desc: 1352a110
Tx descriptor slot 1: tx_status: 00000000, addr: 13350600, next_desc: 1352a120
Tx descriptor slot 2: tx_status: 00000000, addr: 13350c00, next_desc: 1352a130
Tx descriptor slot 3: tx_status: 00000000, addr: 13351200, next_desc: 1352a140
Tx descriptor slot 4: tx_status: 80000000, addr: 13351800, next_desc: 1352a150
Tx descriptor slot 5: tx_status: 80000000, addr: 13351e00, next_desc: 1352a160
Tx descriptor slot 6: tx_status: 80000000, addr: 13352400, next_desc: 1352a170
Tx descriptor slot 7: tx_status: 00000000, addr: 13352a00, next_desc: 1352a180
Tx descriptor slot 8: tx_status: 00000000, addr: 13353000, next_desc: 1352a190
Tx descriptor slot 9: tx_status: 00000000, addr: 13353600, next_desc: 1352a1a0
Tx descriptor slot 10: tx_status: 00000000, addr: 13353c00, next_desc: 1352a1b0
Tx descriptor slot 11: tx_status: 00000000, addr: 13354200, next_desc: 1352a1c0
Tx descriptor slot 12: tx_status: 00000000, addr: 13354800, next_desc: 1352a1d0
Tx descriptor slot 13: tx_status: 00000000, addr: 13354e00, next_desc: 1352a1e0
Tx descriptor slot 14: tx_status: 00000000, addr: 13355400, next_desc: 1352a1f0
Tx descriptor slot 15: tx_status: 00000000, addr: 13355a00, next_desc: 1352a100
eth0: Interrupt, status 0002.
eth0: exiting interrupt, status=0x0000.
Tx descriptor slot 0: tx_status: 00000000, addr: 13350000, next_desc: 1352a110
Tx descriptor slot 1: tx_status: 00000000, addr: 13350600, next_desc: 1352a120
Tx descriptor slot 2: tx_status: 00000000, addr: 13350c00, next_desc: 1352a130
Tx descriptor slot 3: tx_status: 00000000, addr: 13351200, next_desc: 1352a140
Tx descriptor slot 4: tx_status: 00000000, addr: 13351800, next_desc: 1352a150
Tx descriptor slot 5: tx_status: 00000000, addr: 13351e00, next_desc: 1352a160
Tx descriptor slot 6: tx_status: 80000000, addr: 13352400, next_desc: 1352a170
Tx descriptor slot 7: tx_status: 00000000, addr: 13352a00, next_desc: 1352a180
Tx descriptor slot 8: tx_status: 00000000, addr: 13353000, next_desc: 1352a190
Tx descriptor slot 9: tx_status: 00000000, addr: 13353600, next_desc: 1352a1a0
Tx descriptor slot 10: tx_status: 00000000, addr: 13353c00, next_desc: 1352a1b0
Tx descriptor slot 11: tx_status: 00000000, addr: 13354200, next_desc: 1352a1c0
Tx descriptor slot 12: tx_status: 00000000, addr: 13354800, next_desc: 1352a1d0
Tx descriptor slot 13: tx_status: 00000000, addr: 13354e00, next_desc: 1352a1e0
Tx descriptor slot 14: tx_status: 00000000, addr: 13355400, next_desc: 1352a1f0
Tx descriptor slot 15: tx_status: 00000000, addr: 13355a00, next_desc: 1352a100
eth0: Interrupt, status 0002.
eth0: exiting interrupt, status=0x0000.
Tx descriptor slot 0: tx_status: 00000000, addr: 13350000, next_desc: 1352a110
Tx descriptor slot 1: tx_status: 00000000, addr: 13350600, next_desc: 1352a120
Tx descriptor slot 2: tx_status: 00000000, addr: 13350c00, next_desc: 1352a130
Tx descriptor slot 3: tx_status: 00000000, addr: 13351200, next_desc: 1352a140
Tx descriptor slot 4: tx_status: 00000000, addr: 13351800, next_desc: 1352a150
Tx descriptor slot 5: tx_status: 00000000, addr: 13351e00, next_desc: 1352a160
Tx descriptor slot 6: tx_status: 80000000, addr: 13352400, next_desc: 1352a170
Tx descriptor slot 7: tx_status: 00000000, addr: 13352a00, next_desc: 1352a180
Tx descriptor slot 8: tx_status: 00000000, addr: 13353000, next_desc: 1352a190
Tx descriptor slot 9: tx_status: 00000000, addr: 13353600, next_desc: 1352a1a0
Tx descriptor slot 10: tx_status: 00000000, addr: 13353c00, next_desc: 1352a1b0
Tx descriptor slot 11: tx_status: 00000000, addr: 13354200, next_desc: 1352a1c0
Tx descriptor slot 12: tx_status: 00000000, addr: 13354800, next_desc: 1352a1d0
Tx descriptor slot 13: tx_status: 00000000, addr: 13354e00, next_desc: 1352a1e0
Tx descriptor slot 14: tx_status: 00000000, addr: 13355400, next_desc: 1352a1f0
Tx descriptor slot 15: tx_status: 00000000, addr: 13355a00, next_desc: 1352a100
eth0: Interrupt, status 0001.
In via_rhine_rx(), entry 14 status 00468f00.
via_rhine_rx() status is 00468f00.
eth0: exiting interrupt, status=0x0000.
eth0: Transmit frame #6807 queued in slot 7.
eth0: Transmit frame #6808 queued in slot 8.
eth0: Transmit frame #6809 queued in slot 9.
eth0: Transmit frame #6810 queued in slot 10.
eth0: Transmit frame #6811 queued in slot 11.
eth0: Transmit frame #6812 queued in slot 12.
eth0: Transmit frame #6813 queued in slot 13.
eth0: Transmit frame #6814 queued in slot 14.
eth0: Transmit frame #6815 queued in slot 15.
NETDEV WATCHDOG: eth0: transmit timed out
Tx descriptor slot 0: tx_status: 00000000, addr: 13350000, next_desc: 1352a110
Tx descriptor slot 1: tx_status: 00000000, addr: 13350600, next_desc: 1352a120
Tx descriptor slot 2: tx_status: 00000000, addr: 13350c00, next_desc: 1352a130
Tx descriptor slot 3: tx_status: 00000000, addr: 13351200, next_desc: 1352a140
Tx descriptor slot 4: tx_status: 00000000, addr: 13351800, next_desc: 1352a150
Tx descriptor slot 5: tx_status: 00000000, addr: 13351e00, next_desc: 1352a160
Tx descriptor slot 6: tx_status: 80000000, addr: 13352400, next_desc: 1352a170
Tx descriptor slot 7: tx_status: 80000000, addr: 13352a00, next_desc: 1352a180
Tx descriptor slot 8: tx_status: 80000000, addr: 13353000, next_desc: 1352a190
Tx descriptor slot 9: tx_status: 80000000, addr: 13353600, next_desc: 1352a1a0
Tx descriptor slot 10: tx_status: 80000000, addr: 13353c00, next_desc: 1352a1b0
Tx descriptor slot 11: tx_status: 80000000, addr: 13354200, next_desc: 1352a1c0
Tx descriptor slot 12: tx_status: 80000000, addr: 13354800, next_desc: 1352a1d0
Tx descriptor slot 13: tx_status: 80000000, addr: 13354e00, next_desc: 1352a1e0
Tx descriptor slot 14: tx_status: 80000000, addr: 13355400, next_desc: 1352a1f0
Tx descriptor slot 15: tx_status: 80000000, addr: 13355a00, next_desc: 1352a100
Cur Tx points to slot: 0
Dirty Tx points to slot: 6
eth0: Transmit timed out, status 0000, PHY status 782d, resetting...
wait for reset, chip_id: 2
eth0: reset finished after 5 microseconds.
eth0: Transmit frame #0 queued in slot 0.
Tx descriptor slot 0: tx_status: 00000000, addr: 13350000, next_desc: 1352a110
Tx descriptor slot 1: tx_status: 00000000, addr: badf00d0, next_desc: 1352a120


...and so on...

code used to generate logs: - see CHANGE tags for additions
this is in interrupt function, and I have more in the timeout function

/*CHANGE*/
int i;
struct netdev_private *np = dev->priv;


ioaddr = dev->base_addr;

while ((intr_status = readw(ioaddr + IntrStatus))) {
        /* Acknowledge all of the current interrupt sources ASAP. */
        writew(intr_status & 0xffff, ioaddr + IntrStatus);

        /*CHANGE*/
        /* Dump every Tx descriptor before the interrupt is handled. */
        for (i = 0; i < TX_RING_SIZE; i++) {
                printk(KERN_INFO "Tx descriptor slot %i: tx_status: %8.8x, "
                       "addr: %8.8x, next_desc: %8.8x\n", i,
                       le32_to_cpu(np->tx_ring[i].tx_status),
                       le32_to_cpu(np->tx_ring[i].addr),
                       le32_to_cpu(np->tx_ring[i].next_desc));
        }


2002-04-10 16:51:36

by Urban Widmark

Subject: Re: Via-Rhine stalls - transmit errors

On Sat, 6 Apr 2002, Ivan G. wrote:

> frame number is evidence that my frame-1 fix is working.

Which frame-1 fix?

> this log seems normal, except
> 1) are the addresses supposed to be initialized ? rx addresses are ...

They are written in start_tx when used. The buffer with data comes from
the higher levels so it can't be initialised here.

> 2) what exactly do addr and next_desc point to? how can i check those
> addresses.

The addr points to the data to transmit. The next_desc simply makes the
entries form a ring. I think you can assume that they are ok. But
otherwise check what is written in via_rhine_start_tx.


> Look at txstatus - notice one 0002 interrupt (tx done) removes 2 ownership
> bits, after which another interrupt removes 0, transmit stops soon, and the
> queue keeps going on until timeout. In another log, I recorded many
> exit_status interrupts between the ownership lock
> and the NETDEV timeout. After the timeout, addr fields are marked bad.

It is intentional that one interrupt can remove more than one used buffer.
via_rhine_tx has a loop that tries to clean up all "dirty" tx descriptors.
I think that one is ok.
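
As a rough illustration of that cleanup loop, here is a user-space
sketch under stated assumptions; it is not the via_rhine_tx code. The
struct tx_state and the function reclaim_tx are hypothetical names, and
the ownership bit value 0x80000000 is taken from the tx_status values
in the logs.

```c
#include <assert.h>
#include <stdint.h>

#define TX_RING_SIZE 16
#define DESC_OWN 0x80000000u  /* set while the chip still owns the slot */

/* Minimal model of the Tx bookkeeping (hypothetical struct). */
struct tx_state {
    uint32_t tx_status[TX_RING_SIZE];
    unsigned int cur_tx;    /* number of the next frame to queue */
    unsigned int dirty_tx;  /* number of the oldest unreclaimed frame */
};

/* Reclaim every descriptor the chip has finished with.
 * Returns how many slots were freed by this call. */
static int reclaim_tx(struct tx_state *st)
{
    int freed = 0;

    assert(st);
    while (st->dirty_tx != st->cur_tx) {
        unsigned int entry = st->dirty_tx % TX_RING_SIZE;

        if (st->tx_status[entry] & DESC_OWN)
            break;  /* not transmitted yet, stop here */
        st->dirty_tx++;
        freed++;
    }
    return freed;
}
```

With three frames queued and only the first two completed, one call
frees two slots and an immediate second call frees none, which is the
"removes 2, then removes 0" pattern a loop of this shape can produce.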

I wonder about the one that removes zero. Why that interrupt happened.
Maybe it just happened while the previous interrupt was being handled.


> Notice the cur->tx and dirty->tx reported after timeout.

You don't print cur_tx and dirty_tx, but the slots they point to are
strange. You should check what they point to after the tx_timeout routine
has completed, they should both be 0 by then.

/Urban

2002-04-10 22:51:56

by Ivan G.

Subject: Re: Via-Rhine stalls - transmit errors


> Which frame-1 fix?

This one: I subtracted one from the frame number so that the debug
message is correct. Not important - I just happened to mention it.

/*CHANGE*/
if (debug > 4) {
	printk(KERN_DEBUG "%s: Transmit frame #%d queued in slot %d.\n",
	       dev->name, np->cur_tx - 1, entry);
}

This was included in this message:
http://www.uwsg.iu.edu/hypermail/linux/kernel/0204.0/0722.html

You must not have read that one.
It contains a lot of detail about small changes in the code
and also link-related issues.


> The addr points to the data to transmit. The next_desc simply makes the
> entries form a ring. I think you can assume that they are ok. But
> otherwise check what is written in via_rhine_start_tx.

I'll assume those are fine - they seem to form a ring.

> It is intentional that one interrupt can remove more than one used buffer.
> via_rhine_tx has a loop that tries to clean up all "dirty" tx descriptors.
> I think that one is ok.
>
> I wonder about the one that removes zero. Why that interrupt happened.
> Maybe it just happened while the previous interrupt was being handled.

OK, this is actually my fault. I misinterpreted the logs because I made
them too complicated: each dump precedes the interrupt it belongs to
instead of following it. That means you were not seeing 2 bits removed,
then 0, but 1 bit removed (a normal interrupt), then 2 bits removed by
1 interrupt. So the second case is not an issue.
However, you say that 2 bits with 1 interrupt is fine...
The logs show that every timeout occurs after 1 interrupt clears 2
ownership bits; transmit then stops and the queue fills up. What could
be causing this?

> You don't print cur_tx and dirty_tx, but the slots they point to are
> strange. You should check what they point to after the tx_timeout routine
> has completed, they should both be 0 by then.

Strange? Why?
cur_tx points to the next free slot, without the ownership bit.
dirty_tx points to the first slot with the ownership bit set.
I checked both after the timeout; they point to 0.
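
For reference, a minimal sketch of how the counters relate to slots,
assuming cur_tx and dirty_tx are free-running frame counters that are
reduced modulo TX_RING_SIZE to get a slot (the "frame #" versus "slot"
values in the debug message suggest this, but it is an assumption
here). The helper names tx_slot and tx_pending are made up for the
example.

```c
#include <assert.h>

#define TX_RING_SIZE 16

/* Ring slot that a free-running frame counter refers to. */
static unsigned int tx_slot(unsigned int counter)
{
    return counter % TX_RING_SIZE;
}

/* Number of frames queued but not yet reclaimed. Unsigned subtraction
 * keeps the result correct even after the counters wrap around. */
static unsigned int tx_pending(unsigned int cur_tx, unsigned int dirty_tx)
{
    unsigned int pending = cur_tx - dirty_tx;

    assert(pending <= TX_RING_SIZE);  /* more would overrun the ring */
    return pending;
}
```

Under that assumption, "Cur Tx points to slot: 0" together with "Dirty
Tx points to slot: 6" would correspond to counters such as 16 and 6,
that is, 10 frames still waiting to be reclaimed at the timeout.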

/-----------------------/
I'll provide whatever other logs are necessary.
However, I am not sure what to look for.
Additionally, my version of the driver has some stuff that's not in the
kernel driver. That's why I listed it all in a previous message
(see link above): to see what to keep and what to get rid of,
and then be able to debug a driver identical to the kernel's.

Particularly the abort code from the linuxfet driver
seems to make my card stall a lot less or not at all
when transfer is initiated from the same computer.
The logs in my last message showed a transfer
initiated from the opposite end.