Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752544AbbBWQnu (ORCPT ); Mon, 23 Feb 2015 11:43:50 -0500
Received: from mga11.intel.com ([192.55.52.93]:5622 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752308AbbBWQnt convert rfc822-to-8bit (ORCPT ); Mon, 23 Feb 2015 11:43:49 -0500
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.09,631,1418112000"; d="scan'208";a="670265351"
From: "Tantilov, Emil S"
To: Justin Piszcz, "linux-kernel@vger.kernel.org"
Subject: RE: 3.19: ixgbe 0000:01:00.0 eth4: initiating reset due to tx timeout
Thread-Topic: 3.19: ixgbe 0000:01:00.0 eth4: initiating reset due to tx timeout
Thread-Index: AdBOlvMlmz0RWXcqThyaZ0dg57bRogA7ymMA
Date: Mon, 23 Feb 2015 16:42:52 +0000
Message-ID: <87618083B2453E4A8714035B62D67992502411C3@FMSMSX105.amr.corp.intel.com>
References: <000001d04e97$43b4b950$cb1e2bf0$@lucidpixels.com>
In-Reply-To: <000001d04e97$43b4b950$cb1e2bf0$@lucidpixels.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-originating-ip: [10.1.200.108]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 8BIT
MIME-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 4854
Lines: 86

>-----Original Message-----
>From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Justin Piszcz
>Sent: Sunday, February 22, 2015 4:01 AM
>To: linux-kernel@vger.kernel.org
>Subject: 3.19: ixgbe 0000:01:00.0 eth4: initiating reset due to tx timeout
>
>Hello,
>
>Kernel: 3.19.0
>Issue: When using robocopy to copy files (from Windows 8/8.1) to
>Linux/samba, the 10GbE NIC resets - dmesg [1] below. To get it back working
>again, I have to down/up the interface. Jumbo frames are being used (mtu of
>9014) on each side. The lspci output is listed below.
>Are there any other
>recommended workarounds for this issue as LRO is already off for me as shown
>below. When using Linux<->Linux with rsync or NFS, there are no errors with
>10GbE. When using Samba<->Windows 8 over 10GbE, this issue occurs
>persistently as shown below when a copy is running.
>
># ethtool -k eth4|grep large
>large-receive-offload: off [fixed]

The issue is a Tx timeout, so LRO is unlikely to have an effect. Is the
interface that hangs (eth4) mostly receiving or transmitting? Posting the
stats (ethtool -S eth4) would help here.

>There is/was a similar issue as reported here:
>https://communities.intel.com/message/207408
>
> [1] dmesg
>
> [538576.098186] ixgbe 0000:01:00.0 eth4: NIC Link is Up 10 Gbps, Flow Control: RX/TX
> [541013.223961] ------------[ cut here ]------------
> [541013.223970] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x227/0x230()
> [541013.223971] NETDEV WATCHDOG: eth4 (ixgbe): transmit queue 0 timed out
> [541013.223972] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.19.0 #2
> [541013.223973] Hardware name: Supermicro X9SRL-F/X9SRL-F, BIOS 3.0a 12/05/2013
> [541013.223974] ffffffff81d3a6ae ffff88107fc03da8 ffffffff819d07d7 ffffffff81e34d98
> [541013.223976] ffff88107fc03df8 ffff88107fc03de8 ffffffff810dbdab 0000000000000000
> [541013.223977] 0000000000000000 ffff881036304000 0000000000000000 0000000000000010
> [541013.223979] Call Trace:
> [541013.223979] [] dump_stack+0x45/0x57
> [541013.223985] [] warn_slowpath_common+0x7b/0xc0
> [541013.223987] [] warn_slowpath_fmt+0x41/0x50
> [541013.223990] [] ? __queue_work+0xfc/0x290
> [541013.223996] [] dev_watchdog+0x227/0x230
> [541013.223997] [] ? qdisc_rcu_free+0x40/0x40
> [541013.223998] [] ? qdisc_rcu_free+0x40/0x40
> [541013.224001] [] call_timer_fn.isra.29+0x17/0x80
> [541013.224002] [] run_timer_softirq+0x1c9/0x280
> [541013.224004] [] __do_softirq+0xff/0x200
> [541013.224005] [] irq_exit+0x76/0xa0
> [541013.224007] [] smp_apic_timer_interrupt+0x41/0x50
> [541013.224009] [] apic_timer_interrupt+0x6a/0x70
> [541013.224009] [] ? cpuidle_enter_state+0x48/0xc0
> [541013.224013] [] ? cpuidle_enter_state+0x3d/0xc0
> [541013.224014] [] cpuidle_enter+0x12/0x20
> [541013.224017] [] cpu_startup_entry+0x272/0x2f0
> [541013.224018] [] rest_init+0x6d/0x70
> [541013.224021] [] start_kernel+0x353/0x360
> [541013.224022] [] x86_64_start_reservations+0x2a/0x2c
> [541013.224023] [] x86_64_start_kernel+0xc8/0xcc
> [541013.224024] ---[ end trace 59877113cf8b7358 ]---
> [541013.224026] ixgbe 0000:01:00.0 eth4: initiating reset due to tx timeout
> [541013.224036] ixgbe 0000:01:00.0 eth4: Reset adapter
> [541020.099402] ixgbe 0000:01:00.0 eth4: NIC Link is Up 10 Gbps, Flow Control: RX/TX
>
> ( .. it continue but without the trace later .. )
>
> [567457.771728] ixgbe 0000:01:00.0 eth4: NIC Link is Down
> [567458.140112] ixgbe 0000:01:00.0 eth4: NIC Link is Up 10 Gbps, Flow Control: RX/TX
> [567561.611941] ixgbe 0000:01:00.0 eth4: NIC Link is Down
> [567568.188422] ixgbe 0000:01:00.0 eth4: NIC Link is Up 10 Gbps, Flow Control: RX/TX
> [570130.483823] ixgbe 0000:01:00.0 eth4: initiating reset due to tx timeout
> [570130.483924] ixgbe 0000:01:00.0 eth4: Reset adapter

The reset is a side effect of the Tx hang - the driver is trying to recover
from the hang by resetting the interface.

If you could open up a ticket at e1000.sf.net with details about your setup
and how you configure the interfaces that would help us get a better idea of
the issue. You can also upload the stats, kernel config and any other logs
that may be relevant.

Thanks,
Emil
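
The question raised in the reply, whether eth4 is mostly receiving or transmitting, can be answered from saved `ethtool -S eth4` output. The following is only a minimal sketch, not part of the original thread: it assumes the usual `name: value` layout of `ethtool -S` and assumes the driver exposes counters named `rx_bytes`, `tx_bytes` and `tx_timeout_count`. The function names and the sample numbers are hypothetical and are not taken from the reporter's system.

```python
import re


def parse_ethtool_stats(text):
    """Parse `ethtool -S` style output ("name: value" lines) into a dict of ints.

    Lines without a numeric value, such as the "NIC statistics:" header,
    are skipped.
    """
    stats = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([^:]+):\s*(-?\d+)\s*$", line)
        if match:
            stats[match.group(1).strip()] = int(match.group(2))
    return stats


def traffic_direction(stats):
    """Return "tx", "rx" or "balanced" by comparing the byte counters.

    Assumes counters named tx_bytes and rx_bytes; missing counters count as 0.
    """
    tx = stats.get("tx_bytes", 0)
    rx = stats.get("rx_bytes", 0)
    if tx == rx:
        return "balanced"
    return "tx" if tx > rx else "rx"


# Hypothetical sample for illustration only, NOT real data from eth4.
SAMPLE = """NIC statistics:
     rx_packets: 1200
     tx_packets: 300
     rx_bytes: 9000000
     tx_bytes: 40000
     tx_timeout_count: 2
"""

if __name__ == "__main__":
    stats = parse_ethtool_stats(SAMPLE)
    print("direction:", traffic_direction(stats))
    print("tx timeouts:", stats.get("tx_timeout_count", 0))
```

To use it on a real system, save the counters first (for example `ethtool -S eth4 > stats.txt`) and pass the file contents to `parse_ethtool_stats` in place of `SAMPLE`.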