From: "Robbert Kouprie"
Subject: RE: NIC lockup in 2.4.17 (SMP/APIC/Intel 82557)
Date: Wed, 30 Jan 2002 20:29:15 +0100
Message-ID: <001701c1a9c4$673dc4b0$020da8c0@nitemare>
X-Mailing-List: linux-kernel@vger.kernel.org

Not much new, but still:

Today I got the same problem again with 2.4.18-pre3-ac2. Network
connections stuck, NFS mounts stuck. Bringing the interface down and up
doesn't help. Seems like the NIC is really in trouble here. Only a
reboot would bring the NIC back in use.

Still no testcase though, and I have no idea how to investigate this :(
Can anyone give a hint as to where to look?

Regards,
- Robbert Kouprie

> -----Original Message-----
> From: James Bourne [mailto:jbourne@MtRoyal.AB.CA]
> Sent: Tuesday 22 January 2002 15:54
> To: Robbert Kouprie
> Cc: jussi.laako@kolumbus.fi; linux-kernel@vger.kernel.org
> Subject: Re: NIC lockup in 2.4.17 (SMP/APIC/Intel 82557)
>
> On Tue, 22 Jan 2002, Robbert Kouprie wrote:
>
> > Jussi Laako wrote:
> >
> > > Robbert Kouprie wrote:
> > > >
> > > > Thanks for the quick reply :) Just checked it, and it's in slot 2, so
> > > > that's not the problem. It doesn't share the HPT366 IRQ. This is my
> > > > /proc/interrupts:
> > >
> > > Driver is eepro100? I suspect there is something in eepro100 driver that
> > > should be protected by a spinlock but is not. I haven't got time to
> > > analyze it further, yet...
> > >
> > > - Jussi Laako
> >
> > Yes, eepro100.c. Let me know if I can test something, although I would
> > need a reproducible testcase also. Still doing some tests with high
> > network load, as this caused the similar lockup in the other thread.
> >
> > - Robbert
>
> Perhaps this will help. Yesterday we had a strange error on an eepro100
> NIC. System is 4-way Xeon, 4G RAM, 4 eepro100 NICs (2 in use), Dell PE6400.
> Kernel is 2.4.17, no additional patches. The system has not locked up
> though.
>
> The error was
> eth0: can't fill rx buffer (force 0)!
> eth0: Tx ring dump,  Tx queue 3013060 / 3013060:
> eth0:     0 200ca000.
> eth0:     1 000ca000.
> eth0:     2 000ca000.
> eth0:     3 400ca000.
> eth0:  *= 4 000ca000.
> eth0:     5 000ca000.
> [... all the same ...]
> eth0:    30 000ca000.
> eth0:    31 000ca000.
> eth0: Printing Rx ring (next to receive into 2522947, dirty index 2522946).
> eth0:     0 00000001.
> eth0: l   1 c0000001.
> eth0:  *  2 00000000.
> eth0:   = 3 00000001.
> eth0:     4 00000001.
> eth0:     5 00000001.
> [... all the same ...]
> eth0:    30 00000001.
> eth0:    31 00000001.
>
> System has only 4 days uptime, eth0 output is:
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:3049032 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:3566542 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:100
>           RX bytes:580129470 (553.2 Mb)  TX bytes:2775810991 (2647.2 Mb)
>           Interrupt:26
>
> /proc/interrupts
> loki:bash# cat /proc/interrupts
>            CPU0       CPU1       CPU2       CPU3
>   0:   10332192   10317829   10310969   10381268    IO-APIC-edge  timer
>   1:        292        307        250        273    IO-APIC-edge  keyboard
>   2:          0          0          0          0          XT-PIC  cascade
>   8:         26         32         30         27    IO-APIC-edge  rtc
>  17:          6          4          3          3   IO-APIC-level  aic7xxx
>  18:          2          5          2          7   IO-APIC-level  aic7xxx
>  22:     766030     765530     764892     765127   IO-APIC-level  eth1
>  23:     388900     388509     388022     388376   IO-APIC-level  megaraid
>  26:    1395108    1395240    1394961    1396555   IO-APIC-level  eth0
> NMI:          0          0          0          0
> LOC:   41347538   41347536   41347536   41347494
> ERR:          0
> MIS:          0
>
> lspci
> loki:bash# lspci
> 00:00.0 Host bridge: ServerWorks CNB20HE Host Bridge (rev 21)
> 00:00.1 Host bridge: ServerWorks CNB20HE Host Bridge (rev 01)
> 00:00.2 Host bridge: ServerWorks: Unknown device 0006
> 00:00.3 Host bridge: ServerWorks: Unknown device 0006
> 00:04.0 VGA compatible controller: ATI Technologies Inc 3D Rage IIC (rev 7a)
> 00:05.0 SCSI storage controller: Adaptec 7899P (rev 01)
> 00:05.1 SCSI storage controller: Adaptec 7899P (rev 01)
> 00:08.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
> 00:0f.0 ISA bridge: ServerWorks OSB4 South Bridge (rev 50)
> 00:0f.1 IDE interface: ServerWorks OSB4 IDE Controller
> 00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 04)
> 03:08.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
> 03:09.0 PCI bridge: Digital Equipment Corporation DECchip 21154 (rev 05)
> 03:0a.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
> 03:0b.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
> 04:00.0 PCI bridge: Digital Equipment Corporation DECchip 21154 (rev 05)
> 04:01.0 SCSI storage controller: Q Logic QLA12160 (rev 06)
> 05:00.0 RAID bus controller: American Megatrends Inc. MegaRAID (rev 20)
>
> Although I haven't had much time to track this yet (was planning later
> today) I thought it might be related to the above... If any other
> information would help, please let me know.
>
> Regards
> James Bourne
>
> --
> James Bourne, Supervisor Data Centre Operations
> Mount Royal College, Calgary, AB, CA
> www.mtroyal.ab.ca
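
One possible starting point for narrowing this down, offered as a rough
sketch and not a known fix: compare two snapshots of /proc/interrupts
taken a few seconds apart and check whether the IRQ count for the stuck
interface still advances. The working assumption (not confirmed anywhere
in this thread) is that a counter that stops moving while traffic should
be flowing points at interrupts no longer being delivered or serviced.
The function names irq_totals and irq_stalled are made up for this
example, and SAMPLE is an excerpt of the /proc/interrupts listing quoted
above.

```python
# Excerpt of the /proc/interrupts output quoted earlier in this thread.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
  0:   10332192   10317829   10310969   10381268    IO-APIC-edge  timer
 22:     766030     765530     764892     765127   IO-APIC-level  eth1
 26:    1395108    1395240    1394961    1396555   IO-APIC-level  eth0
NMI:          0          0          0          0
ERR:          0
"""


def irq_totals(text):
    """Map each interrupt's device name to the sum of its per-CPU counts.

    Hypothetical helper. It assumes the 2.4-style layout shown above:
    a CPU header line, then "label: count... controller device".
    """
    lines = text.strip().splitlines()
    ncpu = sum(1 for tok in lines[0].split() if tok.startswith("CPU"))
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not an interrupt line
        fields = rest.split()
        counts = []
        for tok in fields[:ncpu]:
            if not tok.isdigit():
                break
            counts.append(int(tok))
        # Lines such as NMI/LOC/ERR carry no device name; use the label.
        name = fields[-1] if len(fields) > len(counts) else label.strip()
        totals[name] = sum(counts)
    return totals


def irq_stalled(before, after, dev):
    """Return True if dev's total IRQ count did not grow between snapshots."""
    return irq_totals(after)[dev] <= irq_totals(before)[dev]
```

To use it on a live system, read /proc/interrupts twice with a short
pause in between and pass both strings to irq_stalled(before, after,
"eth0"). Note that the sketch keys on the last word of each line, so it
does not handle an IRQ shared by several devices.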