Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752630AbcLDHgl (ORCPT ); Sun, 4 Dec 2016 02:36:41 -0500 Received: from mga06.intel.com ([134.134.136.31]:29399 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750778AbcLDHgi (ORCPT ); Sun, 4 Dec 2016 02:36:38 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.33,740,1477983600"; d="scan'208";a="1067695694" Subject: Re: [Intel-wired-lan] [PATCH v2] e1000e: free IRQ regardless of __E1000_DOWN To: "Baicar, Tyler" , jeffrey.t.kirsher@intel.com, intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, okaya@codeaurora.org, timur@codeaurora.org References: <1478727681-16505-1-git-send-email-tbaicar@codeaurora.org> <42e53bc2-0361-e7b7-1093-4e095aa56955@intel.com> <3914adae-91b2-7b74-02bc-579b3e32d143@intel.com> <20c191fe-e8fe-b2ca-0628-7d0188d1f28e@codeaurora.org> From: "Neftin, Sasha" Message-ID: Date: Sun, 4 Dec 2016 09:35:43 +0200 User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 10090 Lines: 230 On 12/2/2016 7:02 PM, Baicar, Tyler wrote: > Hello Sasha, > > Were you able to reproduce this issue? > > Do you have a patch fixing the close function inconsistencies that you > mentioned which I could try out? > > Thanks, > Tyler > > On 11/21/2016 1:40 PM, Baicar, Tyler wrote: >> On 11/17/2016 6:31 AM, Neftin, Sasha wrote: >>> On 11/13/2016 10:34 AM, Neftin, Sasha wrote: >>>> On 11/11/2016 12:35 AM, Baicar, Tyler wrote: >>>>> Hello Sasha, >>>>> >>>>> On 11/9/2016 11:19 PM, Neftin, Sasha wrote: >>>>>> On 11/9/2016 11:41 PM, Tyler Baicar wrote: >>>>>>> Move IRQ free code so that it will happen regardless of the >>>>>>> __E1000_DOWN bit. Currently the e1000e driver only releases its IRQ >>>>>>> if the __E1000_DOWN bit is cleared. This is not sufficient because >>>>>>> it is possible for __E1000_DOWN to be set without releasing the IRQ. >>>>>>> In such a situation, we will hit a kernel bug later in e1000_remove >>>>>>> because the IRQ still has action since it was never freed. A >>>>>>> secondary bus reset can cause this case to happen. >>>>>>> >>>>>>> Signed-off-by: Tyler Baicar >>>>>>> --- >>>>>>> drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++- >>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>>> >>>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c >>>>>>> b/drivers/net/ethernet/intel/e1000e/netdev.c >>>>>>> index 7017281..36cfcb0 100644 >>>>>>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c >>>>>>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c >>>>>>> @@ -4679,12 +4679,13 @@ int e1000e_close(struct net_device *netdev) >>>>>>> if (!test_bit(__E1000_DOWN, &adapter->state)) { >>>>>>> e1000e_down(adapter, true); >>>>>>> - e1000_free_irq(adapter); >>>>>>> /* Link status message must follow this format */ >>>>>>> pr_info("%s NIC Link is Down\n", adapter->netdev->name); >>>>>>> } >>>>>>> + e1000_free_irq(adapter); >>>>>>> + >>>>>>> napi_disable(&adapter->napi); >>>>>>> e1000e_free_tx_resources(adapter->tx_ring); >>>>>>> >>>>>> I would like not recommend insert this change. This change related >>>>>> driver state machine, we afraid from lot of synchronization >>>>>> problem and >>>>>> issues. >>>>>> We need keep e1000_free_irq in loop and check for 'test_bit' ready. >>>>> What do you mean here? There is no loop. If __E1000_DOWN is set >>>>> then we >>>>> will never free the IRQ. >>>>> >>>>>> Another point, does before execute secondary bus reset your SW >>>>>> back up >>>>>> pcie configuration space as properly? >>>>> After a secondary bus reset, the link needs to recover and go back >>>>> to a >>>>> working state after 1 second. >>>>> >>>>> From the callstack, the issue is happening while removing the >>>>> endpoint >>>>> from the system, before applying the secondary bus reset. >>>>> >>>>> The order of events is >>>>> 1. remove the drivers >>>>> 2. cause a secondary bus reset >>>>> 3. wait 1 second >>>> Actually, this is too much, usually link up in less than 100ms.You can >>>> check Data Link Layer indication. >>>>> 4. recover the link >>>>> >>>>> callstack: >>>>> free_msi_irqs+0x6c/0x1a8 >>>>> pci_disable_msi+0xb0/0x148 >>>>> e1000e_reset_interrupt_capability+0x60/0x78 >>>>> e1000_remove+0xc8/0x180 >>>>> pci_device_remove+0x48/0x118 >>>>> __device_release_driver+0x80/0x108 >>>>> device_release_driver+0x2c/0x40 >>>>> pci_stop_bus_device+0xa0/0xb0 >>>>> pci_stop_bus_device+0x3c/0xb0 >>>>> pci_stop_root_bus+0x54/0x80 >>>>> acpi_pci_root_remove+0x28/0x64 >>>>> acpi_bus_trim+0x6c/0xa4 >>>>> acpi_device_hotplug+0x19c/0x3f4 >>>>> acpi_hotplug_work_fn+0x28/0x3c >>>>> process_one_work+0x150/0x460 >>>>> worker_thread+0x50/0x4b8 >>>>> kthread+0xd4/0xe8 >>>>> ret_from_fork+0x10/0x50 >>>>> >>>>> Thanks, >>>>> Tyler >>>>> >>>> Hello Tyler, >>>> Okay, we need consult more about this suggestion. >>>> May I ask what is setup you run? Is there NIC or on board LAN? I would >>>> like try reproduce this issue in our lab's too. >>>> Also, is same issue observed with same scenario and others NIC's too? >>>> Sasha >>>> _______________________________________________ >>>> Intel-wired-lan mailing list >>>> Intel-wired-lan@lists.osuosl.org >>>> http://lists.osuosl.org/mailman/listinfo/intel-wired-lan >>>> >>> Hello Tyler, >>> I see some in consistent implementation of __*_close methods in our >>> drivers. Do you have any igb NIC to check if same problem persist there? >>> Thanks, >>> Sasha >> Hello Sasha, >> >> I couldn't find an igb NIC to test with, but I did find another e1000e >> card that does not cause the same issue. That card is: >> >> 0004:01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit >> Network Connection >> Subsystem: Intel Corporation Gigabit CT Desktop Adapter >> Physical Slot: 5-1 >> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- >> ParErr- Stepping- SERR+ FastB2B- DisINTx+ >> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- >> SERR- > Latency: 0, Cache Line Size: 64 bytes >> Interrupt: pin A routed to IRQ 299 >> Region 0: Memory at c01001c0000 (32-bit, non-prefetchable) >> [size=128K] >> Region 1: Memory at c0100100000 (32-bit, non-prefetchable) >> [size=512K] >> Region 2: I/O ports at 1000 [size=32] >> Region 3: Memory at c01001e0000 (32-bit, non-prefetchable) >> [size=16K] >> Expansion ROM at c0100180000 [disabled] [size=256K] >> Capabilities: [c8] Power Management version 2 >> Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA >> PME(D0+,D1-,D2-,D3hot+,D3cold+) >> Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME- >> Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+ >> Address: 00000000397f0040 Data: 0000 >> Capabilities: [e0] Express (v1) Endpoint, MSI 00 >> DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s >> <512ns, L1 <64us >> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- >> DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ >> Unsupported+ >> RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ >> MaxPayload 256 bytes, MaxReadReq 512 bytes >> DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- >> AuxPwr+ TransPend- >> LnkCap: Port #8, Speed 2.5GT/s, Width x1, ASPM L0s L1, >> Exit Latency L0s <128ns, L1 <64us >> ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp- >> LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+ >> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- >> LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- >> SlotClk+ DLActive- BWMgmt- ABWMgmt- >> Capabilities: [a0] MSI-X: Enable- Count=5 Masked- >> Vector table: BAR=3 offset=00000000 >> PBA: BAR=3 offset=00002000 >> Capabilities: [100 v1] Advanced Error Reporting >> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- >> UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- >> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- >> UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- >> UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- >> UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- >> CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- >> NonFatalErr- >> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- >> NonFatalErr+ >> AERCap: First Error Pointer: 00, GenCap- CGenEn- >> ChkCap- ChkEn- >> Capabilities: [140 v1] Device Serial Number >> 68-05-ca-ff-ff-29-47-34 >> Kernel driver in use: e1000e >> >> Here are the kernel logs from first running the test on this card and >> then running the test on the Intel 82571EB card that I originally >> found the issue with (you can see the issue doesn't happen on this >> card but does on the other): >> >> [ 44.146749] ACPI: \_SB_.PCI0: Device has suffered a power fault >> [ 44.155238] pcieport 0000:00:00.0: PCIe Bus Error: >> severity=Uncorrected (Non-Fatal), type=Transaction Layer, >> id=0000(Requester ID) >> [ 44.166111] pcieport 0000:00:00.0: device [17cb:0400] error >> status/mask=00004000/00400000 >> [ 44.174420] pcieport 0000:00:00.0: [14] Completion Timeout (First) >> [ 44.401943] e1000e 0000:01:00.0 eth0: Timesync Tx Control register >> not set as expected >> [ 82.445586] pcieport 0002:00:00.0: PCIe Bus Error: >> severity=Corrected, type=Physical Layer, id=0000(Receiver ID) >> [ 82.454851] pcieport 0002:00:00.0: device [17cb:0400] error >> status/mask=00000001/00006000 >> [ 82.463209] pcieport 0002:00:00.0: [ 0] Receiver Error >> [ 82.469355] pcieport 0002:00:00.0: PCIe Bus Error: >> severity=Uncorrected (Non-Fatal), type=Transaction Layer, >> id=0000(Requester ID) >> [ 82.481026] pcieport 0002:00:00.0: device [17cb:0400] error >> status/mask=00004000/00400000 >> [ 82.489343] pcieport 0002:00:00.0: [14] Completion Timeout (First) >> [ 82.504573] ACPI: \_SB_.PCI2: Device has suffered a power fault >> [ 84.528036] kernel BUG at drivers/pci/msi.c:369! >> >> I'm not sure why it reproduces on the 82571EB card and not the 82574L >> card. The only obvious difference is there is no Reciever Error on the >> 82574L card. >> >> If you have a patch fixing the inconsistencies you mentioned with the >> __*_close methods I would certainly be willing to test it out! >> >> Thanks, >> Tyler >> > Hello Tyler, We do not tried reproduce issues yet. As I know hit on BUG_ON happen on very old NIC. Our more new NIC do not experienced such problem. I would like recommend do not change kernel code in this moment. This is very sensitive for a driver state machine and we afraid from opening lot of issues. I will investigate more depth this problem later(hope have a time for it) and let you know if we have suggest fixes for this problem. Thanks, Sasha