In-Reply-To: <000001d04e97$43b4b950$cb1e2bf0$@lucidpixels.com>
References: <000001d04e97$43b4b950$cb1e2bf0$@lucidpixels.com>
Date: Mon, 23 Feb 2015 07:35:26 -0500
Subject: Re: 3.19: ixgbe 0000:01:00.0 eth4: initiating reset due to tx timeout
From: Justin Piszcz
To: open list, netdev@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Feb 22, 2015 at 7:01 AM, Justin Piszcz wrote:
>
> Hello,
>
> Kernel: 3.19.0
> Issue: When using robocopy to copy files (from Windows 8/8.1) to
> Linux/samba, the 10GbE NIC resets - dmesg [1] below. To get it back working
> again, I have to down/up the interface. Jumbo frames are being used (mtu of
> 9014) on each side. The lspci output is listed below. Are there any other
> recommended workarounds for this issue as LRO is already off for me as shown
> below. When using Linux<->Linux with rsync or NFS, there are no errors with
> 10GbE. When using Samba<->Windows 8 over 10GbE, this issue occurs
> persistently as shown below when a copy is running.
>
> # ethtool -k eth4|grep large
> large-receive-offload: off [fixed]
>
> There is/was a similar issue as reported here:
> https://communities.intel.com/message/207408
>
> [1] dmesg
>
> [538576.098186] ixgbe 0000:01:00.0 eth4: NIC Link is Up 10 Gbps, Flow
> Control: RX/TX
> [541013.223961] ------------[ cut here ]------------
> [541013.223970] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303
> dev_watchdog+0x227/0x230()
> [541013.223971] NETDEV WATCHDOG: eth4 (ixgbe): transmit queue 0 timed out
> [541013.223972] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.19.0 #2
> [541013.223973] Hardware name: Supermicro X9SRL-F/X9SRL-F, BIOS 3.0a
> 12/05/2013
> [541013.223974] ffffffff81d3a6ae ffff88107fc03da8 ffffffff819d07d7
> ffffffff81e34d98
> [541013.223976] ffff88107fc03df8 ffff88107fc03de8 ffffffff810dbdab
> 0000000000000000
> [541013.223977] 0000000000000000 ffff881036304000 0000000000000000
> 0000000000000010
> [541013.223979] Call Trace:
> [541013.223979] [] dump_stack+0x45/0x57
> [541013.223985] [] warn_slowpath_common+0x7b/0xc0
> [541013.223987] [] warn_slowpath_fmt+0x41/0x50
> [541013.223990] [] ? __queue_work+0xfc/0x290
> [541013.223996] [] dev_watchdog+0x227/0x230
> [541013.223997] [] ? qdisc_rcu_free+0x40/0x40
> [541013.223998] [] ? qdisc_rcu_free+0x40/0x40
> [541013.224001] [] call_timer_fn.isra.29+0x17/0x80
> [541013.224002] [] run_timer_softirq+0x1c9/0x280
> [541013.224004] [] __do_softirq+0xff/0x200
> [541013.224005] [] irq_exit+0x76/0xa0
> [541013.224007] [] smp_apic_timer_interrupt+0x41/0x50
> [541013.224009] [] apic_timer_interrupt+0x6a/0x70
> [541013.224009] [] ? cpuidle_enter_state+0x48/0xc0
> [541013.224013] [] ? cpuidle_enter_state+0x3d/0xc0
> [541013.224014] [] cpuidle_enter+0x12/0x20
> [541013.224017] [] cpu_startup_entry+0x272/0x2f0
> [541013.224018] [] rest_init+0x6d/0x70
> [541013.224021] [] start_kernel+0x353/0x360
> [541013.224022] [] x86_64_start_reservations+0x2a/0x2c
> [541013.224023] [] x86_64_start_kernel+0xc8/0xcc
> [541013.224024] ---[ end trace 59877113cf8b7358 ]---
> [541013.224026] ixgbe 0000:01:00.0 eth4: initiating reset due to tx timeout
> [541013.224036] ixgbe 0000:01:00.0 eth4: Reset adapter
> [541020.099402] ixgbe 0000:01:00.0 eth4: NIC Link is Up 10 Gbps, Flow
> Control: RX/TX
>
> ( .. it continue but without the trace later .. )
>
> [567457.771728] ixgbe 0000:01:00.0 eth4: NIC Link is Down
> [567458.140112] ixgbe 0000:01:00.0 eth4: NIC Link is Up 10 Gbps, Flow
> Control: RX/TX
> [567561.611941] ixgbe 0000:01:00.0 eth4: NIC Link is Down
> [567568.188422] ixgbe 0000:01:00.0 eth4: NIC Link is Up 10 Gbps, Flow
> Control: RX/TX
> [570130.483823] ixgbe 0000:01:00.0 eth4: initiating reset due to tx timeout
> [570130.483924] ixgbe 0000:01:00.0 eth4: Reset adapter
> [570137.252167] ixgbe 0000:01:00.0 eth4: NIC Link is Up 10 Gbps, Flow
> Control: RX/TX
> [572094.256452] ixgbe 0000:01:00.0 eth4: initiating reset due to tx timeout
> [572094.256538] ixgbe 0000:01:00.0 eth4: Reset adapter
> [572101.130915] ixgbe 0000:01:00.0 eth4: NIC Link is Up 10 Gbps, Flow
> Control: RX/TX
> [573967.946084] ixgbe 0000:01:00.0 eth4: initiating reset due to tx timeout
> [573967.946097] ixgbe 0000:01:00.0 eth4: Reset adapter
> [573974.676387] ixgbe 0000:01:00.0 eth4: NIC Link is Up 10 Gbps, Flow
> Control: RX/TX
> [575766.574731] ixgbe 0000:01:00.0 eth4: initiating reset due to tx timeout
> [575766.574753] ixgbe 0000:01:00.0 eth4: Reset adapter
> [575773.315067] ixgbe 0000:01:00.0 eth4: NIC Link is Up 10 Gbps, Flow
> Control: RX/TX
> [585476.513732] perf interrupt took too long (5003 > 5000), lowering
> kernel.perf_event_max_sample_rate to 25000
> [597267.959412] ixgbe 0000:01:00.0 eth4: initiating reset due to tx timeout
> [597267.959452] ixgbe 0000:01:00.0 eth4: Reset adapter
> [597274.709728] ixgbe 0000:01:00.0 eth4: NIC Link is Up 10 Gbps, Flow
> Control: RX/TX
>
> [2] lspci
>
> 01:00.0 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AT2 Server
> Adapter (rev 01)
> Subsystem: Intel Corporation 82598EB 10-Gigabit AT2 Server Adapter
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
> Latency: 0, Cache Line Size: 64 bytes
> Interrupt: pin A routed to IRQ 85
> Region 0: Memory at fbe40000 (32-bit, non-prefetchable) [size=128K]
> Region 1: Memory at fbe00000 (32-bit, non-prefetchable) [size=256K]
> Region 2: I/O ports at e000 [size=32]
> Region 3: Memory at fbe60000 (32-bit, non-prefetchable) [size=16K]
> Capabilities: [40] Power Management version 3
> Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
> PME(D0+,D1-,D2-,D3hot+,D3cold-)
> Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
> Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> Address: 0000000000000000 Data: 0000
> Capabilities: [60] MSI-X: Enable+ Count=18 Masked-
> Vector table: BAR=3 offset=00000000
> PBA: BAR=3 offset=00002000
> Capabilities: [a0] Express (v2) Endpoint, MSI 00
> DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
> MaxPayload 256 bytes, MaxReadReq 512 bytes
> DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
> LnkCap: Port #0, Speed 2.5GT/s, Width x8, ASPM L0s L1, Exit Latency L0s
> <4us, L1 <64us
> ClockPM- Surprise- LLActRep- BwNot-
> LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> LnkSta: Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive-
> BWMgmt- ABWMgmt-
> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not
> Supported
> DevCtl2: Completion Timeout: 16ms to 55ms, TimeoutDis-, LTR-, OBFF
> Disabled
> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> Transmit Margin: Normal Operating Range, EnterModifiedCompliance-
> ComplianceSOS-
> Compliance De-emphasis: -6dB
> LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-,
> EqualizationPhase1-
> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> Capabilities: [100 v1] Advanced Error Reporting
> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF-
> MalfTLP- ECRC- UnsupReq- ACSViol-
> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF-
> MalfTLP- ECRC- UnsupReq+ ACSViol-
> UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+
> MalfTLP+ ECRC- UnsupReq- ACSViol-
> CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
> Capabilities: [140 v1] Device Serial Number 00-1b-21-ff-ff-58-e6-aa
> Kernel driver in use: ixgbe
> 00: 86 80 0b 15 07 04 10 00 01 00 00 02 10 00 00 00
> 10: 00 00 e4 fb 00 00 e0 fb 01 e0 00 00 00 00 e6 fb
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 2c a1
> 30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 00 00
> 40: 01 50 23 48 00 20 00 fa 00 00 00 00 00 00 00 00
> 50: 05 60 80 00 00 00 00 00 00 00 00 00 00 00 00 00
> 60: 11 a0 11 80 03 00 00 00 03 20 00 00 00 00 00 00
> 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> a0: 10 00 02 00 c1 8c 00 00 2f 28 00 00 81 6c 03 00
> b0: 40 00 81 10 00 00 00 00 00 00 00 00 00 00 00 00
> c0: 00 00 00 00 1f 00 00 00 05 00 00 00 00 00 00 00
> d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 100: 01 00 01 14 00 00 00 00 00 00 10 00 11 20 06 00
> 110: 00 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00
> 120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 140: 03 00 01 00 aa e6 58 ff ff 21 1b 00 00 00 00 00
> 150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> (the rest are: XXX: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00)
>
> Justin.
>

+CC netdev@

I also tried the latest ixgbe (3.23.2) from Intel, but it does not compile
against 3.19. Is there a newer version I should be trying, or are there
module parameters or other tuning I could use to work around this issue?

https://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=14687

Thanks,

Justin.