Hi All,
I've just installed Red Hat 7.3 on an Athlon 2000 system (kernel 2.4.20).
We have a 3c905c network card, and when I try copying files from an NFS
mount I get this in the logs:
Mar 12 13:15:11 ark kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar 12 13:15:11 ark kernel: eth0: transmit timed out, tx_status 00 status e000.
Mar 12 13:15:11 ark kernel: diagnostics: net 0cc6 media 8880 dma 000000a0.
Mar 12 13:15:11 ark kernel: Flags; bus-master 1, dirty 4048(0) current 4064(0)
Mar 12 13:15:11 ark kernel: Transmit list fffffff8 vs. f7ea9200.
Mar 12 13:15:11 ark kernel: 0: @f7ea9200 length 800000b6 status 000000b6
Mar 12 13:15:11 ark kernel: 1: @f7ea9240 length 800000b6 status 000000b6
Mar 12 13:15:11 ark kernel: 2: @f7ea9280 length 800000b6 status 000000b6
Mar 12 13:15:11 ark kernel: 3: @f7ea92c0 length 800000b6 status 000000b6
Mar 12 13:15:11 ark kernel: 4: @f7ea9300 length 800000b6 status 000000b6
Mar 12 13:15:11 ark kernel: 5: @f7ea9340 length 800000b6 status 000000b6
Mar 12 13:15:11 ark kernel: 6: @f7ea9380 length 800000b6 status 000000b6
Mar 12 13:15:11 ark kernel: 7: @f7ea93c0 length 800000b6 status 000000b6
Mar 12 13:15:11 ark kernel: 8: @f7ea9400 length 800000b6 status 000000b6
Mar 12 13:15:11 ark kernel: 9: @f7ea9440 length 800000b6 status 000000b6
Mar 12 13:15:11 ark kernel: 10: @f7ea9480 length 800000b6 status 000000b6
Mar 12 13:15:11 ark kernel: 11: @f7ea94c0 length 800000b6 status 000000b6
Mar 12 13:15:11 ark kernel: 12: @f7ea9500 length 800000b6 status 000000b6
Mar 12 13:15:11 ark kernel: 13: @f7ea9540 length 800000b6 status 000000b6
Mar 12 13:15:11 ark kernel: 14: @f7ea9580 length 800000b6 status 800000b6
Mar 12 13:15:11 ark kernel: 15: @f7ea95c0 length 800000b6 status 800000b6
Mar 12 13:15:11 ark kernel: eth0: Resetting the Tx ring pointer.
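A note on reading the "Flags;" line in the dump above: "dirty" and "current" appear to be free-running descriptor counters, with the ring slot in parentheses. The following is a minimal userland sketch of that arithmetic, not driver code. TX_RING_SIZE = 16 is an assumption inferred from the 16 descriptors (0..15) printed in the dump, and the helper names are hypothetical.

```c
#include <stdio.h>

/* Assumption: 16 ring slots, inferred from descriptors 0..15 in the dump. */
#define TX_RING_SIZE 16

/* Descriptors queued but not yet reclaimed. Unsigned subtraction stays
 * correct even after the free-running counters wrap around. */
unsigned int tx_in_flight(unsigned int cur_tx, unsigned int dirty_tx)
{
    return cur_tx - dirty_tx;
}

/* Ring slot that a free-running counter maps to. */
unsigned int tx_slot(unsigned int counter)
{
    return counter % TX_RING_SIZE;
}

/* Non-zero when every slot is occupied, i.e. nothing more can be queued. */
int tx_ring_full(unsigned int cur_tx, unsigned int dirty_tx)
{
    return tx_in_flight(cur_tx, dirty_tx) >= TX_RING_SIZE;
}

/* Hypothetical helper: print an interpretation of one "Flags;" line. */
void describe_ring(unsigned int dirty_tx, unsigned int cur_tx)
{
    printf("in flight %u of %d, dirty slot %u, current slot %u, ring full: %s\n",
           tx_in_flight(cur_tx, dirty_tx), TX_RING_SIZE,
           tx_slot(dirty_tx), tx_slot(cur_tx),
           tx_ring_full(cur_tx, dirty_tx) ? "yes" : "no");
}
```

Calling describe_ring(4048, 4064) with the values from the log reports 16 of 16 descriptors in flight, with both counters on slot 0. Under the assumptions above, the ring was completely full when the watchdog fired and nothing had been reclaimed.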
Usually the machine freezes up after this: I can't ping it, and when I
press Ctrl-Alt-Del I get 'INIT: cannot execute "echo"' and have to hit reset.
However, this last time the machine came back after about 3 secs of no
network activity.
I was looking at some info on Google, and people said to turn off IO-APIC
in the kernel, but that was for 2.4.5. Has this been fixed in 2.4.20?
Cheers
Dave
/-----------------------------------
David Shirley
System's Administrator
Computer Science - Curtin University
(08) 9266 2986
-----------------------------------/
On 12 March 2003 07:23, David Shirley wrote:
> Usually the machine freezes up after this, and I can't ping it,
Drivers tend to be less stable than the core kernel. Maybe there's
a bug in the 3c59x.c network card driver. You could read the code
and start putting some printks in there, especially in the error
paths.
Say, "Resetting the Tx ring pointer" comes from
vortex_tx_timeout(struct net_device *dev), and looking at that code
you may notice that resetting is done without disabling interrupts.
I know nil about low-level network driver stuff, but

        printk(KERN_DEBUG "%s: Resetting the Tx ring pointer.\n", dev->name);
        if (vp->cur_tx - vp->dirty_tx > 0 && inl(ioaddr + DownListPtr) == 0)
                outl(vp->tx_ring_dma + (vp->dirty_tx % TX_RING_SIZE) * sizeof(struct boom_tx_desc),
                     ioaddr + DownListPtr);
        if (vp->cur_tx - vp->dirty_tx < TX_RING_SIZE)
===>            netif_wake_queue (dev);
        if (vp->drv_flags & IS_BOOMERANG)
                outb(PKT_BUF_SZ>>8, ioaddr + TxFreeThreshold);
        outw(DownUnstall, ioaddr + EL3_CMD);

looks a bit strange. I'd move netif_wake_queue() below the out()s
and encase the out()s in an IRQ-disabled region.
Or some such.
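To make the proposed ordering concrete, here is a minimal userland mock, not driver code and not a tested fix. Every function in it is a hypothetical stand-in that only records the order of calls. A real driver would use the kernel's own primitives (for example spin_lock_irqsave() around the register writes), and TX_RING_SIZE = 16 is an assumption taken from the 16 descriptors in the log.

```c
/* Assumption: 16 ring slots, as suggested by descriptors 0..15 in the log. */
#define TX_RING_SIZE 16

/* Hypothetical event log: records which stand-in was called, in order. */
enum event {
    EV_IRQ_SAVE, EV_OUTL_DOWN_LIST_PTR, EV_OUTB_TX_THRESHOLD,
    EV_OUTW_DOWN_UNSTALL, EV_IRQ_RESTORE, EV_WAKE_QUEUE
};

enum { MAX_EVENTS = 8 };
enum event events[MAX_EVENTS];
int n_events;

void record(enum event ev)
{
    if (n_events < MAX_EVENTS)
        events[n_events++] = ev;
}

/* Reordered tail of the timeout handler: all register writes happen
 * inside the IRQ-disabled region, and the queue is woken only afterwards. */
void reset_tx_ring(unsigned int cur_tx, unsigned int dirty_tx,
                   int list_ptr_is_zero, int is_boomerang)
{
    record(EV_IRQ_SAVE);                    /* stand-in for disabling IRQs */
    if (cur_tx - dirty_tx > 0 && list_ptr_is_zero)
        record(EV_OUTL_DOWN_LIST_PTR);      /* stand-in for outl(...) */
    if (is_boomerang)
        record(EV_OUTB_TX_THRESHOLD);       /* stand-in for outb(...) */
    record(EV_OUTW_DOWN_UNSTALL);           /* stand-in for outw(...) */
    record(EV_IRQ_RESTORE);                 /* stand-in for restoring IRQs */

    if (cur_tx - dirty_tx < TX_RING_SIZE)
        record(EV_WAKE_QUEUE);              /* stand-in for netif_wake_queue() */
}
```

With the counters from the log (current 4064, dirty 4048) the difference equals TX_RING_SIZE, so under these assumptions the wake-up branch is not taken at all in either ordering. The reordering only matters when the ring has free slots at timeout.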
Whee, this module can talk a lot, see:
MODULE_PARM_DESC(debug, "3c59x debug level (0-6)");
;)
--
vda
It's strange: we have another machine with the exact same setup,
hardware and software, and it doesn't have any problems.
The other thing is that sometimes the machine freezes totally,
and other times it comes back after 30 secs.
FYI: it's not a modular kernel, but I'll try the printk thing.
Dave.
----- Original Message -----
From: "Denis Vlasenko" <[email protected]>
To: "David Shirley" <[email protected]>; <[email protected]>
Sent: Wednesday, March 12, 2003 9:37 PM
Subject: Re: Help, eth0: transmit timed out!
On 12 March 2003 15:54, David Shirley wrote:
> It's strange: we have another machine with the exact same setup,
> hardware and software, and it doesn't have any problems.
Did you try to swap some hardware? NIC would be the first
to swap.
> The other thing is that sometimes the machine freezes totally,
> and other times it comes back after 30 secs.
If it's a driver problem, anything is possible.
> FYI: it's not a modular kernel, but I'll try the printk thing.
A modular one will be far easier (faster) to play with.
You just unload the module, recompile and reload.
No reboot cycles.
--
vda
I tried a different NIC, another 3c905c.
Yeah, I know about the modules thing; I turned it off
in case that was the problem.
D
----- Original Message -----
From: "Denis Vlasenko" <[email protected]>
To: "David Shirley" <[email protected]>; <[email protected]>
Sent: Wednesday, March 12, 2003 9:50 PM
Subject: Re: Help, eth0: transmit timed out!
On Wednesday 12 March 2003 15:41, David Shirley wrote:
> Tried a different NIC, another 3c905c.
...and? I've been using this NIC family with this driver in all my diskless setups
with kernels from 2.0.* up to 2.4.20, and I have never experienced the problem
you describe, so I would check for a hardware, BIOS, chipset, cable, hub
or switch problem.
Out of about 30 NICs currently in production for 6 months up to 5 years, I have had
one failure (a 3c905b). I haven't seen Don's driver fail in ages ;-),
though the b versions gave me some headaches with etherbooting and the
newest 3c905cx-txm has a problem with software bootprom flashing :-(.
Pete
Sorry,
a different NIC didn't help.
Yeah, we have used about 300 3c905s over the last couple of years
(labs for a university dept) and never had a problem.
It must be something else: memory or motherboard, I reckon.
Will change it and let you all know.
Cheers
Dave
----- Original Message -----
From: "Hans-Peter Jansen" <[email protected]>
To: "David Shirley" <[email protected]>;
<[email protected]>; <[email protected]>
Sent: Thursday, March 13, 2003 1:52 AM
Subject: Re: Help, eth0: transmit timed out!