Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754302AbbHMWNO (ORCPT ); Thu, 13 Aug 2015 18:13:14 -0400
Received: from mail-yk0-f169.google.com ([209.85.160.169]:35559 "EHLO mail-yk0-f169.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752135AbbHMWNL (ORCPT ); Thu, 13 Aug 2015 18:13:11 -0400
MIME-Version: 1.0
In-Reply-To:
References: <1439108128-18441-1-git-send-email-jiang.liu@linux.intel.com> <55C94AA5.8090904@linux.intel.com>
Date: Thu, 13 Aug 2015 18:13:10 -0400
Message-ID:
Subject: Re: [Bugfix] x86, irq: Fix a regression caused by commit b5dc8e6c21e7
From: Alex Deucher
To: Jiang Liu
Cc: Thomas Gleixner, Alexander Holler, Mark Rustad, Ingo Molnar, "H. Peter Anvin", x86@kernel.org, Tony Luck, LKML
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 10358
Lines: 199

On Thu, Aug 13, 2015 at 4:15 PM, Alex Deucher wrote:
> On Thu, Aug 13, 2015 at 3:46 PM, Alex Deucher wrote:
>> On Mon, Aug 10, 2015 at 9:06 PM, Jiang Liu wrote:
>>> On 2015/8/10 23:00, Alex Deucher wrote:
>>>> On Sun, Aug 9, 2015 at 4:15 AM, Jiang Liu wrote:
>>>>> Alex Deucher, Mark Rustad and Alexander Holler reported a regression
>>>>> with the latest v4.2-rc4 kernel, which breaks some SATA controllers.
>>>>> With multi-MSI capable SATA controllers, only the first port works,
>>>>> all other ports times out when executing SATA commands. This regression
>>>>> bisects to 52f518a3a7c2 ("x86/MSI: Use hierarchical irqdomains to manage
>>>>> MSI interrupts"), but it's not the root cause, it just triggers a bug
>>>>> caused by b5dc8e6c21e7 ("x86/irq: Use hierarchical irqdomain to manage
>>>>> CPU interrupt vectors").
>>>>>
>>>>> With this patch applied, the affected SATA controllers work as expected.
>>>>
>>>> Yes, this fixes the SATA regression:
>>>> Tested-by: Alex Deucher
>>>>
>>>> I'm not sure if it's related to this patch or not (I haven't bisected
>>>> it independently yet), but MSIs don't seem to work on GPUs. See the
>>>> line for amdgpu. This is just after loading the driver.
>>> Hi Alex,
>>> This patch only affects multiple-MSI, and it seems that your
>>> gpu only uses one MSI interrupt, so it may not be related to this patch.
>>> And this seems like a sort of interrupt storm.
>>>> 52: 16579895 16579562 16580988 16583443 IR-PCI-MSI
>>>> 524288-edge amdgpu
>>>
>>> Does it make any change by disable interrupt remapping?
>>
>> Nope. Still going crazy:
>> 46: 4769660 4769130 4775899 4784657 PCI-MSI
>> 524288-edge amdgpu
>>
>>
>>> Does it make any change by disable MSI?
>>
>> If I set pci=nomsi, the sata controllers time out. If I disable MSIs
>> just for the gpu, I don't get any interrupts:
>> 25: 0 0 0 0 IR-IO-APIC
>> 0-fasteoi amdgpu
>>
>
> Strangely, it only seems to affect certain boards. E.g., this card works fine:
> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
> [AMD/ATI] Bonaire XT [Radeon HD 7790/8770 / R9 260 OEM] (prog-if 00
> [VGA controller])
> Subsystem: Diamond Multimedia Systems Device 2329
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR- FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> SERR- Latency: 0, Cache Line Size: 64 bytes
> Interrupt: pin A routed to IRQ 52
> Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M]
> Region 2: Memory at d0000000 (64-bit, prefetchable) [size=8M]
> Region 4: I/O ports at e000 [size=256]
> Region 5: Memory at ff600000 (32-bit, non-prefetchable) [size=256K]
> Expansion ROM at ff640000 [disabled] [size=128K]
> Capabilities: [48] Vendor Specific Information: Len=08
> Capabilities: [50] Power Management version 3
> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
> PME(D0-,D1+,D2+,D3hot+,D3cold-)
> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
> DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s
> <4us, L1 unlimited
> ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
> MaxPayload 256 bytes, MaxReadReq 512 bytes
> DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
> LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit
> Latency L0s <64ns, L1 <1us
> ClockPM- Surprise- LLActRep- BwNot-
> LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+
> DLActive- BWMgmt- ABWMgmt-
> DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-,
> OBFF Not Supported
> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-,
> OBFF Disabled
> LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> Transmit Margin: Normal Operating Range,
> EnterModifiedCompliance- ComplianceSOS-
> Compliance De-emphasis: -6dB
> LnkSta2: Current De-emphasis Level: -6dB,
> EqualizationComplete+, EqualizationPhase1+
> EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
> Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> Address: 00000000fee00000 Data: 0000
> Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1
> Len=010
> Capabilities: [150 v2] Advanced Error Reporting
> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
> Capabilities: [270 v1] #19
> Capabilities: [2b0 v1] Address Translation Service (ATS)
> ATSCap: Invalidate Queue Depth: 00
> ATSCtl: Enable+, Smallest Translation Unit: 00
> Capabilities: [2c0 v1] #13
> Capabilities: [2d0 v1] #1b
> Kernel driver in use: amdgpu
>
> This one does not:
> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
> [AMD/ATI] Device 6939 (prog-if 00 [VGA controller])
> Subsystem: Gigabyte Technology Co., Ltd Device 229d
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR- FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> SERR- Latency: 0, Cache Line Size: 64 bytes
> Interrupt: pin A routed to IRQ 52
> Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M]
> Region 2: Memory at d0000000 (64-bit, prefetchable) [size=2M]
> Region 4: I/O ports at e000 [size=256]
> Region 5: Memory at ff600000 (32-bit, non-prefetchable) [size=256K]
> Expansion ROM at ff640000 [disabled] [size=128K]
> Capabilities: [48] Vendor Specific Information: Len=08
> Capabilities: [50] Power Management version 3
> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
> PME(D0-,D1+,D2+,D3hot+,D3cold+)
> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
> DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s
> <4us, L1 unlimited
> ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
> MaxPayload 256 bytes, MaxReadReq 512 bytes
> DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
> LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit
> Latency L0s <64ns, L1 <1us
> ClockPM- Surprise- LLActRep- BwNot-
> LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+
> DLActive- BWMgmt- ABWMgmt-
> DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-,
> OBFF Not Supported
> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-,
> OBFF Disabled
> LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> Transmit Margin: Normal Operating Range,
> EnterModifiedCompliance- ComplianceSOS-
> Compliance De-emphasis: -6dB
> LnkSta2: Current De-emphasis Level: -6dB,
> EqualizationComplete+, EqualizationPhase1+
> EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
> Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> Address: 00000000fee00000 Data: 0000
> Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1
> Len=010
> Capabilities: [150 v2] Advanced Error Reporting
> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
> Capabilities: [200 v1] #15
> Capabilities: [270 v1] #19
> Capabilities: [2b0 v1] Address Translation Service (ATS)
> ATSCap: Invalidate Queue Depth: 00
> ATSCtl: Enable+, Smallest Translation Unit: 00
> Capabilities: [2c0 v1] #13
> Capabilities: [2d0 v1] #1b
> Capabilities: [328 v1] Alternative Routing-ID Interpretation (ARI)
> ARICap: MFVC- ACS-, Next Function: 1
> ARICtl: MFVC- ACS-, Function Group: 0
> Kernel driver in use: amdgpu
>
> Any ideas? I'll see if I can find the time to bisect this.

I attempted to bisect this, but the regression was introduced before my
driver was merged upstream:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=099bfbfc7fbbe22356c02f0caf709ac32e1126ea
So I can't easily bisect it further without backporting the driver to
each commit before that. This may take a while...

Alex