Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752220AbbHYEDS (ORCPT );
        Tue, 25 Aug 2015 00:03:18 -0400
Received: from mail-yk0-f174.google.com ([209.85.160.174]:34144 "EHLO
        mail-yk0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751681AbbHYEDQ (ORCPT );
        Tue, 25 Aug 2015 00:03:16 -0400
MIME-Version: 1.0
In-Reply-To:
References: <1439108128-18441-1-git-send-email-jiang.liu@linux.intel.com>
        <55C94AA5.8090904@linux.intel.com>
Date: Tue, 25 Aug 2015 00:03:15 -0400
Message-ID:
Subject: Re: [Bugfix] x86, irq: Fix a regression caused by commit b5dc8e6c21e7
From: Alex Deucher
To: Jiang Liu
Cc: Thomas Gleixner, Alexander Holler, Mark Rustad, Ingo Molnar,
        "H. Peter Anvin", x86@kernel.org, Tony Luck, LKML
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 10740
Lines: 206

On Thu, Aug 13, 2015 at 6:13 PM, Alex Deucher wrote:
> On Thu, Aug 13, 2015 at 4:15 PM, Alex Deucher wrote:
>> On Thu, Aug 13, 2015 at 3:46 PM, Alex Deucher wrote:
>>> On Mon, Aug 10, 2015 at 9:06 PM, Jiang Liu wrote:
>>>> On 2015/8/10 23:00, Alex Deucher wrote:
>>>>> On Sun, Aug 9, 2015 at 4:15 AM, Jiang Liu wrote:
>>>>>> Alex Deucher, Mark Rustad and Alexander Holler reported a regression
>>>>>> with the latest v4.2-rc4 kernel, which breaks some SATA controllers.
>>>>>> With multi-MSI capable SATA controllers, only the first port works,
>>>>>> all other ports times out when executing SATA commands. This regression
>>>>>> bisects to 52f518a3a7c2 ("x86/MSI: Use hierarchical irqdomains to manage
>>>>>> MSI interrupts"), but it's not the root cause, it just triggers a bug
>>>>>> caused by b5dc8e6c21e7 ("x86/irq: Use hierarchical irqdomain to manage
>>>>>> CPU interrupt vectors").
>>>>>>
>>>>>> With this patch applied, the affected SATA controllers work as expected.
>>>>>
>>>>> Yes, this fixes the SATA regression:
>>>>> Tested-by: Alex Deucher
>>>>>
>>>>> I'm not sure if it's related to this patch or not (I haven't bisected
>>>>> it independently yet), but MSIs don't seem to work on GPUs. See the
>>>>> line for amdgpu. This is just after loading the driver.
>>>> Hi Alex,
>>>> This patch only affects multiple-MSI, and it seems that your
>>>> gpu only uses one MSI interrupt, so it may not be related to this patch.
>>>> And this seems like a sort of interrupt storm.
>>>>> 52: 16579895 16579562 16580988 16583443 IR-PCI-MSI
>>>>> 524288-edge amdgpu
>>>>
>>>> Does it make any change by disable interrupt remapping?
>>>
>>> Nope. Still going crazy:
>>> 46: 4769660 4769130 4775899 4784657 PCI-MSI
>>> 524288-edge amdgpu
>>>
>>>
>>>> Does it make any change by disable MSI?
>>>
>>> If I set pci=nomsi, the sata controllers time out. If I disable MSIs
>>> just for the gpu, I don't get any interrupts:
>>> 25: 0 0 0 0 IR-IO-APIC
>>> 0-fasteoi amdgpu
>>>
>>
>> Strangely, it only seems to affect certain boards. E.g., this card works fine:
>> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
>> [AMD/ATI] Bonaire XT [Radeon HD 7790/8770 / R9 260 OEM] (prog-if 00
>> [VGA controller])
>> Subsystem: Diamond Multimedia Systems Device 2329
>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>> SERR-
>> Latency: 0, Cache Line Size: 64 bytes
>> Interrupt: pin A routed to IRQ 52
>> Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M]
>> Region 2: Memory at d0000000 (64-bit, prefetchable) [size=8M]
>> Region 4: I/O ports at e000 [size=256]
>> Region 5: Memory at ff600000 (32-bit, non-prefetchable) [size=256K]
>> Expansion ROM at ff640000 [disabled] [size=128K]
>> Capabilities: [48] Vendor Specific Information: Len=08
>> Capabilities: [50] Power Management version 3
>> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
>> PME(D0-,D1+,D2+,D3hot+,D3cold-)
>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>> Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
>> DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s
>> <4us, L1 unlimited
>> ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>> DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
>> MaxPayload 256 bytes, MaxReadReq 512 bytes
>> DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
>> LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit
>> Latency L0s <64ns, L1 <1us
>> ClockPM- Surprise- LLActRep- BwNot-
>> LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
>> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>> LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+
>> DLActive- BWMgmt- ABWMgmt-
>> DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-,
>> OBFF Not Supported
>> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-,
>> OBFF Disabled
>> LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
>> Transmit Margin: Normal Operating Range,
>> EnterModifiedCompliance- ComplianceSOS-
>> Compliance De-emphasis: -6dB
>> LnkSta2: Current De-emphasis Level: -6dB,
>> EqualizationComplete+, EqualizationPhase1+
>> EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
>> Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>> Address: 00000000fee00000 Data: 0000
>> Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1
>> Len=010
>> Capabilities: [150 v2] Advanced Error Reporting
>> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>> CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>> AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
>> Capabilities: [270 v1] #19
>> Capabilities: [2b0 v1] Address Translation Service (ATS)
>> ATSCap: Invalidate Queue Depth: 00
>> ATSCtl: Enable+, Smallest Translation Unit: 00
>> Capabilities: [2c0 v1] #13
>> Capabilities: [2d0 v1] #1b
>> Kernel driver in use: amdgpu
>>
>> This one does not:
>> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
>> [AMD/ATI] Device 6939 (prog-if 00 [VGA controller])
>> Subsystem: Gigabyte Technology Co., Ltd Device 229d
>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>> SERR-
>> Latency: 0, Cache Line Size: 64 bytes
>> Interrupt: pin A routed to IRQ 52
>> Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M]
>> Region 2: Memory at d0000000 (64-bit, prefetchable) [size=2M]
>> Region 4: I/O ports at e000 [size=256]
>> Region 5: Memory at ff600000 (32-bit, non-prefetchable) [size=256K]
>> Expansion ROM at ff640000 [disabled] [size=128K]
>> Capabilities: [48] Vendor Specific Information: Len=08
>> Capabilities: [50] Power Management version 3
>> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
>> PME(D0-,D1+,D2+,D3hot+,D3cold+)
>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>> Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
>> DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s
>> <4us, L1 unlimited
>> ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>> DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
>> MaxPayload 256 bytes, MaxReadReq 512 bytes
>> DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
>> LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit
>> Latency L0s <64ns, L1 <1us
>> ClockPM- Surprise- LLActRep- BwNot-
>> LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
>> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>> LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+
>> DLActive- BWMgmt- ABWMgmt-
>> DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-,
>> OBFF Not Supported
>> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-,
>> OBFF Disabled
>> LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
>> Transmit Margin: Normal Operating Range,
>> EnterModifiedCompliance- ComplianceSOS-
>> Compliance De-emphasis: -6dB
>> LnkSta2: Current De-emphasis Level: -6dB,
>> EqualizationComplete+, EqualizationPhase1+
>> EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
>> Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>> Address: 00000000fee00000 Data: 0000
>> Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1
>> Len=010
>> Capabilities: [150 v2] Advanced Error Reporting
>> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>> CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>> AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
>> Capabilities: [200 v1] #15
>> Capabilities: [270 v1] #19
>> Capabilities: [2b0 v1] Address Translation Service (ATS)
>> ATSCap: Invalidate Queue Depth: 00
>> ATSCtl: Enable+, Smallest Translation Unit: 00
>> Capabilities: [2c0 v1] #13
>> Capabilities: [2d0 v1] #1b
>> Capabilities: [328 v1] Alternative Routing-ID Interpretation (ARI)
>> ARICap: MFVC- ACS-, Next Function: 1
>> ARICtl: MFVC- ACS-, Function Group: 0
>> Kernel driver in use: amdgpu
>>
>> Any ideas? I'll see if I can find the time to bisect this.
>
> I attempted to bisect this, however the regression happened prior to
> my driver being merged upstream:
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=099bfbfc7fbbe22356c02f0caf709ac32e1126ea
> So I can't easily bisect it further without backporting the driver to
> each commit before that. This may take a while...

Just a heads up, this ended up being an alignment issue in the driver
and was not a regression.

Alex

>
> Alex