Date: Wed, 6 Jun 2012 09:58:42 +0800
Subject: Re: tg3: transmit timed out, resetting
From: ethan zhao
To: Matt Carlson
Cc: Christian Kujau, LKML
In-Reply-To: <20120606010255.GA9991@mcarlson.broadcom.com>
References: <20120606010255.GA9991@mcarlson.broadcom.com>
X-Mailing-List: linux-kernel@vger.kernel.org

A quick search turns up many similar bug reports. The root cause may be
related to the Broadcom tg3 firmware and the hardware revision, so I
think it will be hard to fix in the Linux driver. A better approach is
to use another NIC, or to disable some of its features as a workaround,
if we can find out which feature triggers the problem (TSO? SG?).

Some debugging messages from other reports:

[ 3538.223529] tg3 0000:01:08.0: eth1: transmit timed out, resetting
[ 3538.229698] tg3 0000:01:08.0: eth1: DEBUG: MAC_TX_STATUS[00000008] MAC_RX_STATUS[00000008]
[ 3538.236001] tg3 0000:01:08.0: eth1: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
[ 3538.343602] tg3 0000:01:08.0: tg3_stop_block timed out, ofs=1800 enable_bit=2
[ 3538.449609] tg3 0000:01:08.0: tg3_stop_block timed out, ofs=c00 enable_bit=2
[ 3538.555402] tg3 0000:01:08.0: tg3_stop_block timed out, ofs=4800 enable_bit=2
[ 3538.692079] tg3 0000:01:08.0: eth1: Link is down

We can see that tg3_reset_hw() --> tg3_stop_fw() --> tg3_stop_block()
times out, so the firmware is not responding correctly.

Just my 2 cents.
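For anyone comparing their own logs against the messages above, here is
a minimal sketch in Python that pulls the tg3_stop_block timeout lines
out of a dmesg capture. The function and type names are made up for
illustration, and the sketch assumes that ofs= and enable_bit= are
printed in hexadecimal; check that against the driver source before
relying on it.

```python
import re
from typing import List, NamedTuple


class StopBlockTimeout(NamedTuple):
    """One 'tg3_stop_block timed out' line (hypothetical helper type)."""
    timestamp: float  # seconds since boot, from the dmesg prefix
    device: str       # PCI address, e.g. "0000:01:08.0"
    offset: int       # value printed as ofs=..., assumed to be hex
    enable_bit: int   # value printed as enable_bit=..., assumed to be hex


# Matches lines shaped like the ones quoted in this thread.
_PATTERN = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+tg3\s+(?P<dev>\S+):\s+"
    r"tg3_stop_block timed out, ofs=(?P<ofs>[0-9a-fA-F]+)\s+"
    r"enable_bit=(?P<bit>[0-9a-fA-F]+)"
)


def find_stop_block_timeouts(log_text: str) -> List[StopBlockTimeout]:
    """Return every tg3_stop_block timeout found in log_text, in order."""
    return [
        StopBlockTimeout(
            timestamp=float(m.group("ts")),
            device=m.group("dev"),
            offset=int(m.group("ofs"), 16),
            enable_bit=int(m.group("bit"), 16),
        )
        for m in _PATTERN.finditer(log_text)
    ]


# Sample input copied from the debugging messages quoted above.
SAMPLE = """\
[ 3538.223529] tg3 0000:01:08.0: eth1: transmit timed out, resetting
[ 3538.343602] tg3 0000:01:08.0: tg3_stop_block timed out, ofs=1800 enable_bit=2
[ 3538.449609] tg3 0000:01:08.0: tg3_stop_block timed out, ofs=c00 enable_bit=2
[ 3538.555402] tg3 0000:01:08.0: tg3_stop_block timed out, ofs=4800 enable_bit=2
[ 3538.692079] tg3 0000:01:08.0: eth1: Link is down
"""

if __name__ == "__main__":
    for event in find_stop_block_timeouts(SAMPLE):
        print(f"{event.timestamp:.6f} {event.device} "
              f"ofs=0x{event.offset:x} enable_bit=0x{event.enable_bit:x}")
```

Feeding it the output of dmesg, rather than SAMPLE, would show whether
the same offsets time out on every reset.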
Ethan

On Wed, Jun 6, 2012 at 9:02 AM, Matt Carlson wrote:
> I'm attempting to reproduce this in our lab.  In the meantime,
> the latest revisions of the driver output a register dump and some
> additional information when transmit timeouts happen.  It would be
> useful to see that data.  Would it be possible to try the latest
> kernels and get this information?
>
> On Mon, Jun 04, 2012 at 04:14:30PM -0700, Christian Kujau wrote:
>> Hi,
>>
>> on this Ideapad S10 the onboard Broadcom BCM5906M prints the warning
>> below, once. From then on, the "transmit timed out, resetting" message
>> repeats, every now and then.
>>
>> This laptop is mounting 2 readonly NFS shares from a box in the same LAN
>> and when scanning lots of files on these NFS shares, the transmit timeouts
>> occur more often, I think. When there's sequential traffic (i.e. reading
>> larger files from the NFS shares), fewer warnings occur. But this is just
>> manual observation, I haven't been able to reproduce this reliably.
>> However, there's constant traffic on the device (maybe ~700KB/s both tx
>> and rx), so the messages occur pretty regularly.
>>
>> I have reported the error against the Fedora 17 kernel [0] but it happens
>> with a vanilla 3.4.0 too[1] - check out for full dmesg, .config and more.
>>
>> I had a similar issue a while ago[2] and almost forgot about them. The
>> laptop ran Ubuntu 10.04 (2.6.32) since then and the problem was gone, so
>> I'd say 2.6.32 fixed it. Now the same laptop switched to Fedora, kernel
>> 3.3.4 and the problem seems to be back again.
>>
>> I'll try running with sg=off, as Matt suggested in [3] and report back.
>>
>> Thanks,
>> Christian.
>>
>> [0] https://bugzilla.redhat.com/show_bug.cgi?id=825123
>> [1] http://nerdbynature.de/bits/3.4.0/tg3/
>> [2] http://lkml.indiana.edu/hypermail/linux/kernel/0906.1/00004.html
>> [3] http://lkml.indiana.edu/hypermail/linux/kernel/0906.1/00317.html
>>
>> ------------[ cut here ]------------
>> WARNING: at /opt/home/chrisk/dev/linux-2.6-git/net/sched/sch_generic.c:255
>> dev_watchdog+0x1cc/0x1e0()
>> Hardware name: Lenovo
>> NETDEV WATCHDOG: p2p1 (tg3): transmit queue 0 timed out
>> Modules linked in: acpi_cpufreq mperf freq_table nfs lockd sunrpc b43
>> mac80211 cfg80211 ssb coretemp hwmon usb_storage [last unloaded: scsi_wait_scan]
>> Pid: 685, comm: FahCore_78 Not tainted 3.4.0-10151-g4fc3acf #8
>> Call Trace:
>>  [] ? warn_slowpath_common+0x79/0xb0
>>  [] ? dev_watchdog+0x1cc/0x1e0
>>  [] ? dev_watchdog+0x1cc/0x1e0
>>  [] ? warn_slowpath_fmt+0x34/0x40
>>  [] ? dev_watchdog+0x1cc/0x1e0
>>  [] ? pfifo_fast_dequeue+0xe0/0xe0
>>  [] ? run_timer_softirq+0xd1/0x1d0
>>  [] ? __do_softirq+0x75/0x100
>>  [] ? remote_softirq_receive+0x20/0x20
>>    [] ? irq_exit+0x66/0x90
>>  [] ? smp_apic_timer_interrupt+0x59/0x90
>>  [] ? apic_timer_interrupt+0x31/0x38
>>  [] ? rt_mutex_trylock+0x70/0x70
>> ---[ end trace 9de668a859ee5d6c ]---
>> tg3 0000:02:00.0: p2p1: transmit timed out, resetting
>>
>>
>> --
>> BOFH excuse #438:
>>
>> sticky bit has come loose
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/