2002-01-22 19:16:10

by mirian

Subject: network hangs, NETDEV WATCHDOG messages, Dual AMD Duron, APIC

I've seen a few other reports of this; let me add mine to the mix in
hopes of working out a solution:

I have a dual processor AMD Duron, using the Tyan Tiger motherboard and
Intel APIC. On every kernel I've tried, from the 2.4.6 that comes with
Mandrake to 2.4.17 and 2.4.18-pre3 with ac-patches, I get the same
problem: when transferring files over my local network, the first 5K or
so arrive properly and then my network interface goes completely dead.
This happens predictably and reliably. The problem does not happen with
bursty net activity (ssh logins) on my local net, or at all with file
transfers across the internet ... just sustained network activity with
other machines on my local net.

Once the network device goes dead, it stays dead, and the NETDEV WATCHDOG
messages appear in the syslog, periodically, until (a) the net device is
ifconfig'ed down, or (b) the driver module (tulip.o) is unloaded. After
that, if the module is reloaded or the device ifconfig'ed back up, the
net is fine again, until the next such file transfer.

I've tried most of the advice that others have suggested (moving the
network card to another PCI slot, using the noapic boot option). I've
even applied both of the suggested patches for this problem (both of
which appear to attempt to address the problem by edge/level'ing the
relevant IRQ). I can verify that this code is triggered properly and
operates on the correct IRQ, but it does not unwedge the network device
(in fact, the code just keeps getting called over and over again,
along with the NETDEV WATCHDOG messages).

I'm fairly conversant with Linux kernel code, but I don't really
understand the inner magic of APIC. Can someone help me in getting to
the bottom of this problem?

--Mirian


2002-01-24 06:29:49

by Zwane Mwaikambo

Subject: Re: network hangs, NETDEV WATCHDOG messages, Dual AMD Duron, APIC

Have you tested with another NIC? I've witnessed a problem similar to
yours recently (specific machines could log in to a Samba server and I
could ssh out to/from the server, but file copies from Samba failed after
copying 20-30 files). Please test with another NIC (another Tulip maybe?)
so that we can determine whether it's a hardware or kernel issue.

Also consider that Duron SMP is not a supported configuration, and
therefore CPU/PIC based issues like APIC problems aren't going to get you
far with some of the kernel hackers. Incidentally, what are the cpuids of
your Durons?

Regards,
Zwane Mwaikambo

2002-01-26 13:38:57

by mirian

Subject: Re: network hangs, NETDEV WATCHDOG messages, Dual AMD Duron, APIC

On Thu, 24 Jan 2002 08:25:05 +0200 (SAST), Zwane Mwaikambo wrote:
>Have you tested with another NIC? I've witnessed a problem similar to
>yours recently (specific machines could log in to a Samba server and I
>could ssh out to/from the server, but file copies from Samba failed after
>copying 20-30 files). Please test with another NIC (another Tulip maybe?)
>so that we can determine whether it's a hardware or kernel issue.

The NIC I was using was a NetGear Lite-On PNIC. I swapped it out with
another one, and got exactly the same behavior. Then I swapped in a
3C905B, and the problem disappeared; the network works fine.

Just out of curiosity, I tried another NIC, a Kingston KNE100TX. That
card didn't work at all; in fact, it somehow zapped my CMOS. I went
back to the 3C905B.

After reading some more reports of problems with AMDs, I should also
mention that I'm using an AGP video card (an Asus GeForce3). The card
itself isn't giving me trouble, and I've experienced none of the hangs
or Oopses that others have, but APIC issues do seem to manifest in
bizarre ways.

>Also consider that Duron SMP is not a supported configuration, and
>therefore CPU/PIC based issues like APIC problems aren't going to get you
>far with some of the kernel hackers.

That's interesting. Well, in that case, I can report that my Duron SMP
works pretty well. The only problem I've encountered has been the
network problem with certain cards; everything else works a treat on the
2.4 series kernels.

Thanks for your help and advice,
--Mirian