2016-10-30 22:06:16

by Mike Krinkin

Subject: Commit "locking/drm: Kill mutex trickery" causes hangs

Hello,

I am seeing system hangs with recent linux-next versions. Bisecting points at
commit 3ab7c086d5ec72585ef0 ("locking/drm: Kill mutex trickery"); the bisect
log is below. The system hangs after a few minutes when I compile a kernel
with -j4 and watch a video at the same time.

Here is the bisect log:

# good: [596f144943812954113f028c915e0b6c08200429] staging: vt6656: Remove unnecessary parentheses.
git bisect start '44c67c23aa9a90bbfff9e3ce38314777bd80ec7a' '596f144943812954113f028c915e0b6c08200429'
# good: [9fba39fd558fedc7fb383228bbf41b15f1f4be3d] Merge remote-tracking branch 'net-next/master'
git bisect good 9fba39fd558fedc7fb383228bbf41b15f1f4be3d
# good: [30e0094cb5df3f716783c42fbfaf995ab63416bf] Merge remote-tracking branch 'kgdb/kgdb-next'
git bisect good 30e0094cb5df3f716783c42fbfaf995ab63416bf
# bad: [792538c0afdd4b1086283359f8f9ed78553ffb35] Merge remote-tracking branch 'leds/for-next'
git bisect bad 792538c0afdd4b1086283359f8f9ed78553ffb35
# good: [2b8ff98d9795952121f6106ad7121c83e66804c4] Merge branch 'sched/core'
git bisect good 2b8ff98d9795952121f6106ad7121c83e66804c4
# bad: [a1f03c7327d1060eabb4892df301101d9317685b] Merge remote-tracking branch 'tip/auto-latest'
git bisect bad a1f03c7327d1060eabb4892df301101d9317685b
# good: [537208ff20de2745286419d2ed63955028bb43d0] Merge remote-tracking branch 'trivial/for-next'
git bisect good 537208ff20de2745286419d2ed63955028bb43d0
# bad: [9576aaa21079789c84357605a7f15cb64b055561] Merge branch 'x86/urgent'
git bisect bad 9576aaa21079789c84357605a7f15cb64b055561
# bad: [8f142f9a607a5d30b9cdf1e9b1ee456194793928] Merge branch 'x86/urgent'
git bisect bad 8f142f9a607a5d30b9cdf1e9b1ee456194793928
# bad: [9d659ae14b545c4296e812c70493bfdc999b5c1c] locking/mutex: Add lock handoff to avoid starvation
git bisect bad 9d659ae14b545c4296e812c70493bfdc999b5c1c
# bad: [3ca0ff571b092ee4d807f1168caa428d95b0173b] locking/mutex: Rework mutex::owner
git bisect bad 3ca0ff571b092ee4d807f1168caa428d95b0173b
# bad: [3ab7c086d5ec72585ef0158dbc265cb03ddc682a] locking/drm: Kill mutex trickery
git bisect bad 3ab7c086d5ec72585ef0158dbc265cb03ddc682a
# first bad commit: [3ab7c086d5ec72585ef0158dbc265cb03ddc682a] locking/drm: Kill mutex trickery

Here is the lspci -vvv output as well:

00:00.0 Host bridge: Intel Corporation Sky Lake Host Bridge/DRAM Registers (rev 08)
Subsystem: Lenovo Skylake Host Bridge/DRAM Registers
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
Latency: 0
Capabilities: [e0] Vendor Specific Information: Len=10 <?>

00:02.0 VGA compatible controller: Intel Corporation Sky Lake Integrated Graphics (rev 07) (prog-if 00 [VGA controller])
Subsystem: Lenovo Skylake Integrated Graphics
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 277
Region 0: Memory at f0000000 (64-bit, non-prefetchable) [size=16M]
Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
Region 4: I/O ports at e000 [size=64]
[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express (v2) Root Complex Integrated Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0
ExtTag- RBE+
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
Address: fee00018 Data: 0000
Capabilities: [d0] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] #1b
Capabilities: [200 v1] Address Translation Service (ATS)
ATSCap: Invalidate Queue Depth: 00
ATSCtl: Enable+, Smallest Translation Unit: 00
Capabilities: [300 v1] #13
Kernel driver in use: i915
Kernel modules: i915

00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21) (prog-if 30 [XHCI])
Subsystem: Lenovo Sunrise Point-LP USB 3.0 xHCI Controller
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 124
Region 0: Memory at f1120000 (64-bit, non-prefetchable) [size=64K]
Capabilities: [70] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [80] MSI: Enable+ Count=1/8 Maskable- 64bit+
Address: 00000000fee00278 Data: 0000
Kernel driver in use: xhci_hcd

00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP Thermal subsystem (rev 21)
Subsystem: Lenovo Sunrise Point-LP Thermal subsystem
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin C routed to IRQ 18
Region 0: Memory at f114a000 (64-bit, non-prefetchable) [size=4K]
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
Address: 00000000 Data: 0000
Kernel driver in use: intel_pch_thermal
Kernel modules: intel_pch_thermal

00:15.0 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller (rev 21)
Subsystem: Lenovo Sunrise Point-LP Serial IO I2C Controller
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 16
Region 0: Memory at f114b000 (64-bit, non-prefetchable) [size=4K]
Capabilities: [80] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D3 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [90] Vendor Specific Information: Len=14 <?>
Kernel driver in use: intel-lpss
Kernel modules: intel_lpss_pci

00:15.1 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller (rev 21)
Subsystem: Lenovo Sunrise Point-LP Serial IO I2C Controller
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin B routed to IRQ 17
Region 0: Memory at f114c000 (64-bit, non-prefetchable) [size=4K]
Capabilities: [80] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D3 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [90] Vendor Specific Information: Len=14 <?>
Kernel driver in use: intel-lpss
Kernel modules: intel_lpss_pci

00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME HECI (rev 21)
Subsystem: Lenovo Sunrise Point-LP CSME HECI
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 278
Region 0: Memory at f114d000 (64-bit, non-prefetchable) [size=4K]
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [8c] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee002d8 Data: 0000
Kernel driver in use: mei_me
Kernel modules: mei_me

00:17.0 SATA controller: Intel Corporation Sunrise Point-LP SATA Controller [AHCI mode] (rev 21) (prog-if 01 [AHCI 1.0])
Subsystem: Lenovo Sunrise Point-LP SATA Controller [AHCI mode]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 16
Region 0: Memory at f1148000 (32-bit, non-prefetchable) [size=8K]
Region 1: Memory at f1150000 (32-bit, non-prefetchable) [size=256]
Region 2: I/O ports at e080 [size=8]
Region 3: I/O ports at e088 [size=4]
Region 4: I/O ports at e060 [size=32]
Region 5: Memory at f114e000 (32-bit, non-prefetchable) [size=2K]
Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
Address: 00000000 Data: 0000
Capabilities: [70] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [a8] SATA HBA v1.0 BAR4 Offset=00000004
Kernel driver in use: ahci
Kernel modules: ahci

00:1c.0 PCI bridge: Intel Corporation Device 9d10 (rev f1) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 122
Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
I/O behind bridge: 0000f000-00000fff
Memory behind bridge: fff00000-000fffff
Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0
ExtTag- RBE+
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #1, Speed 8GT/s, Width x1, ASPM L1, Exit Latency L0s unlimited, L1 <16us
ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
Slot #0, PowerLimit 10.000W; Interlock- NoCompl+
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
Changed: MRL- PresDet- LinkState-
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd+
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
Address: fee00218 Data: 0000
Capabilities: [90] Subsystem: Lenovo Device 5059
Capabilities: [a0] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D3 NoSoftRst- PME-Enable+ DSel=0 DScale=0 PME-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
Capabilities: [140 v1] Access Control Services
ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd- EgressCtrl- DirectTrans-
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
Capabilities: [200 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
Capabilities: [220 v1] #19
Kernel driver in use: pcieport

00:1c.3 PCI bridge: Intel Corporation Device 9d13 (rev f1) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin D routed to IRQ 123
Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
I/O behind bridge: 0000f000-00000fff
Memory behind bridge: f1000000-f10fffff
Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0
ExtTag- RBE+
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #4, Speed 8GT/s, Width x1, ASPM L1, Exit Latency L0s <1us, L1 <16us
ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
Slot #3, PowerLimit 10.000W; Interlock- NoCompl+
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
Changed: MRL- PresDet- LinkState+
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd+
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
Address: fee00258 Data: 0000
Capabilities: [90] Subsystem: Lenovo Device 5059
Capabilities: [a0] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
Capabilities: [140 v1] Access Control Services
ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd- EgressCtrl- DirectTrans-
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
Capabilities: [220 v1] #19
Kernel driver in use: pcieport

00:1f.0 ISA bridge: Intel Corporation Sunrise Point-LP LPC Controller (rev 21)
Subsystem: Lenovo Sunrise Point-LP LPC Controller
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0

00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21)
Subsystem: Lenovo Sunrise Point-LP PMC
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Region 0: Memory at f1144000 (32-bit, non-prefetchable) [disabled] [size=16K]

00:1f.3 Audio device: Intel Corporation Sunrise Point-LP HD Audio (rev 21)
Subsystem: Lenovo Sunrise Point-LP HD Audio
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 64
Interrupt: pin A routed to IRQ 280
Region 0: Memory at f1140000 (64-bit, non-prefetchable) [size=16K]
Region 4: Memory at f1130000 (64-bit, non-prefetchable) [size=64K]
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee00318 Data: 0000
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel

00:1f.4 SMBus: Intel Corporation Sunrise Point-LP SMBus (rev 21)
Subsystem: Lenovo Sunrise Point-LP SMBus
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 255
Region 0: Memory at f114f000 (64-bit, non-prefetchable) [disabled] [size=256]
Region 4: I/O ports at efa0 [disabled] [size=32]
Kernel modules: i2c_i801

00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection I219-V (rev 21)
Subsystem: Intel Corporation Ethernet Connection I219-V
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 255
Region 0: Memory at f1100000 (32-bit, non-prefetchable) [size=128K]
Capabilities: [c8] Power Management version 3
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [e0] PCI Advanced Features
AFCap: TP+ FLR+
AFCtrl: FLR-
AFStatus: TP-

03:00.0 Network controller: Intel Corporation Intel Dual Band Wireless-AC 3165 Plus Bluetooth (rev 99)
Subsystem: Intel Corporation Intel Dual Band Wireless-AC 3165 Plus Bluetooth
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 279
Region 0: Memory at f1000000 (64-bit, non-prefetchable) [size=8K]
Capabilities: [c8] Power Management version 3
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee002f8 Data: 0000
Capabilities: [40] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr+ NoSnoop+ FLReset-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L1, Exit Latency L0s <4us, L1 <32us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range B, TimeoutDis+, LTR+, OBFF Via WAKE#
DevCtl2: Completion Timeout: 16ms to 55ms, TimeoutDis-, LTR+, OBFF Disabled
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
Capabilities: [140 v1] Device Serial Number ac-2b-6e-ff-ff-29-07-98
Capabilities: [14c v1] Latency Tolerance Reporting
Max snoop latency: 3145728ns
Max no snoop latency: 3145728ns
Capabilities: [154 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=30us PortTPowerOnTime=60us
Kernel driver in use: iwlwifi
Kernel modules: iwlwifi


2016-10-31 00:09:51

by Hugh Dickins

Subject: Re: Commit "locking/drm: Kill mutex trickery" causes hangs

On Mon, 31 Oct 2016, Mike Krinkin wrote:
>
> I am seeing system hangs with recent linux-next versions. Bisecting points at
> commit 3ab7c086d5ec72585ef0 ("locking/drm: Kill mutex trickery"); the bisect
> log is below. The system hangs after a few minutes when I compile a kernel
> with -j4 and watch a video at the same time.
[...]
> Here is the lspci -vvv output as well:
[...]
> Kernel driver in use: i915
> Kernel modules: i915

Yes, that's hit me too, on mmotm on i915. i915_gem_shrinker_lock()
is broken: but copy the pattern from msm_gem_shrinker_lock() and it's
okay - patch below. Well, okay-ish: I'm reluctant to sign off on that
as more than a quick fix for i915 linux-next users, since the unlock
variable and those _gem_shrinker_lock() wrappers should just be deleted
(if the mutex trickery is indeed to be killed).

And I'm still left with a "sleeping function called from invalid context"
warning, which seems easier to live with: I've not looked to see whether
that's a consequence of the mutex trickery killage or something else.

[ 12.887922] BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:956
[ 12.887925] in_atomic(): 1, irqs_disabled(): 0, pid: 787, name: X
[ 12.887927] 1 lock held by X/787:
[ 12.887928] #0:
[ 12.887929] (
[ 12.887930] &dev->struct_mutex
[ 12.887931] ){+.+.+.}
[ 12.887932] , at:
[ 12.887937] [<ffffffff813e0ccb>] i915_mutex_lock_interruptible+0x23/0x26
[ 12.887939] Preemption disabled at:
[ 12.887943] [<ffffffff813d67c0>] i915_gem_execbuffer_relocate_entry+0x5fb/0x70f
[ 12.887947] CPU: 2 PID: 787 Comm: X Not tainted 4.9.0-rc2-mm1 #5
[ 12.887948] Hardware name: LENOVO 4174EH1/4174EH1, BIOS 8CET51WW (1.31 ) 11/29/2011
[ 12.887950] Call Trace:
[ 12.887955] dump_stack+0x67/0x90
[ 12.887958] ? i915_gem_execbuffer_relocate_entry+0x5fb/0x70f
[ 12.887961] ___might_sleep+0x223/0x23a
[ 12.887963] __might_sleep+0x6d/0x81
[ 12.887966] __pm_runtime_resume+0x35/0x7a
[ 12.887970] intel_runtime_pm_get+0x20/0x7f
[ 12.887973] aliasing_gtt_bind_vma+0x4d/0xb1
[ 12.887975] i915_vma_bind+0x67/0xbd
[ 12.887977] i915_gem_execbuffer_relocate_entry+0xc6/0x70f
[ 12.887981] ? _raw_spin_unlock_irq+0x27/0x45
[ 12.887984] i915_gem_execbuffer_relocate_vma+0x128/0x1dd
[ 12.887987] ? nommu_map_sg+0x9e/0xca
[ 12.887990] ? __i915_vma_do_pin+0x3da/0x421
[ 12.887994] ? i915_gem_execbuffer_reserve_vma.isra.34+0xbc/0x189
[ 12.887996] ? i915_gem_execbuffer_reserve.isra.35+0x32f/0x3da
[ 12.887999] i915_gem_do_execbuffer.isra.36+0x64c/0x10a9
[ 12.888002] i915_gem_execbuffer2+0x15d/0x203
[ 12.888005] drm_ioctl+0x25a/0x38b
[ 12.888007] ? i915_gem_execbuffer+0x2d3/0x2d3
[ 12.888011] vfs_ioctl+0x1c/0x33
[ 12.888014] do_vfs_ioctl+0x5c5/0x601
[ 12.888016] ? __fget+0x17e/0x18f
[ 12.888019] ? expand_files+0x23e/0x23e
[ 12.888021] SyS_ioctl+0x38/0x60
[ 12.888023] entry_SYSCALL_64_fastpath+0x18/0xad

--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -229,8 +229,9 @@ unsigned long i915_gem_shrink_all(struct
 static bool i915_gem_shrinker_lock(struct drm_device *dev, bool *unlock)
 {
 	if (!mutex_trylock(&dev->struct_mutex))
-		*unlock = false;
+		return false;
 
+	*unlock = true;
 	return true;
 }


2016-10-31 13:19:36

by Mike Krinkin

Subject: Re: Commit "locking/drm: Kill mutex trickery" causes hangs

On Sun, Oct 30, 2016 at 05:09:41PM -0700, Hugh Dickins wrote:
> On Mon, 31 Oct 2016, Mike Krinkin wrote:
> >
> > I am seeing system hangs with recent linux-next versions. Bisecting points at
> > commit 3ab7c086d5ec72585ef0 ("locking/drm: Kill mutex trickery"); the bisect
> > log is below. The system hangs after a few minutes when I compile a kernel
> > with -j4 and watch a video at the same time.
> [...]
> > Here is the lspci -vvv output as well:
> [...]
> > Kernel driver in use: i915
> > Kernel modules: i915
>
> Yes, that's hit me too, on mmotm on i915. i915_gem_shrinker_lock()
> is broken: but copy the pattern from msm_gem_shrinker_lock() and it's
> okay - patch below. Well, okay-ish: I'm reluctant to sign off on that
> as more than a quick fix for i915 linux-next users, since the unlock
> variable and those _gem_shrinker_lock() wrappers should just be deleted
> (if the mutex trickery is indeed to be killed).
>
> And I'm still left with a "sleeping function called from invalid context"
> warning, which seems easier to live with: I've not looked to see whether
> that's a consequence of the mutex trickery killage or something else.
>
> [ 12.887922] BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:956
> [ 12.887925] in_atomic(): 1, irqs_disabled(): 0, pid: 787, name: X
> [ 12.887927] 1 lock held by X/787:
> [ 12.887928] #0:
> [ 12.887929] (
> [ 12.887930] &dev->struct_mutex
> [ 12.887931] ){+.+.+.}
> [ 12.887932] , at:
> [ 12.887937] [<ffffffff813e0ccb>] i915_mutex_lock_interruptible+0x23/0x26
> [ 12.887939] Preemption disabled at:
> [ 12.887943] [<ffffffff813d67c0>] i915_gem_execbuffer_relocate_entry+0x5fb/0x70f
> [ 12.887947] CPU: 2 PID: 787 Comm: X Not tainted 4.9.0-rc2-mm1 #5
> [ 12.887948] Hardware name: LENOVO 4174EH1/4174EH1, BIOS 8CET51WW (1.31 ) 11/29/2011
> [ 12.887950] Call Trace:
> [ 12.887955] dump_stack+0x67/0x90
> [ 12.887958] ? i915_gem_execbuffer_relocate_entry+0x5fb/0x70f
> [ 12.887961] ___might_sleep+0x223/0x23a
> [ 12.887963] __might_sleep+0x6d/0x81
> [ 12.887966] __pm_runtime_resume+0x35/0x7a
> [ 12.887970] intel_runtime_pm_get+0x20/0x7f
> [ 12.887973] aliasing_gtt_bind_vma+0x4d/0xb1
> [ 12.887975] i915_vma_bind+0x67/0xbd
> [ 12.887977] i915_gem_execbuffer_relocate_entry+0xc6/0x70f
> [ 12.887981] ? _raw_spin_unlock_irq+0x27/0x45
> [ 12.887984] i915_gem_execbuffer_relocate_vma+0x128/0x1dd
> [ 12.887987] ? nommu_map_sg+0x9e/0xca
> [ 12.887990] ? __i915_vma_do_pin+0x3da/0x421
> [ 12.887994] ? i915_gem_execbuffer_reserve_vma.isra.34+0xbc/0x189
> [ 12.887996] ? i915_gem_execbuffer_reserve.isra.35+0x32f/0x3da
> [ 12.887999] i915_gem_do_execbuffer.isra.36+0x64c/0x10a9
> [ 12.888002] i915_gem_execbuffer2+0x15d/0x203
> [ 12.888005] drm_ioctl+0x25a/0x38b
> [ 12.888007] ? i915_gem_execbuffer+0x2d3/0x2d3
> [ 12.888011] vfs_ioctl+0x1c/0x33
> [ 12.888014] do_vfs_ioctl+0x5c5/0x601
> [ 12.888016] ? __fget+0x17e/0x18f
> [ 12.888019] ? expand_files+0x23e/0x23e
> [ 12.888021] SyS_ioctl+0x38/0x60
> [ 12.888023] entry_SYSCALL_64_fastpath+0x18/0xad
>
> --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> @@ -229,8 +229,9 @@ unsigned long i915_gem_shrink_all(struct
>  static bool i915_gem_shrinker_lock(struct drm_device *dev, bool *unlock)
>  {
>  	if (!mutex_trylock(&dev->struct_mutex))
> -		*unlock = false;
> +		return false;
>
> +	*unlock = true;
>  	return true;
>  }

Works for me, no warnings noted yet. Thank you.

>

Subject: [tip:locking/core] locking/drm: Fix i915_gem_shrinker_lock() locking

Commit-ID: c7faee2109f978f3ef826c48b7e60609061fda4f
Gitweb: http://git.kernel.org/tip/c7faee2109f978f3ef826c48b7e60609061fda4f
Author: Ingo Molnar <[email protected]>
AuthorDate: Thu, 3 Nov 2016 07:16:43 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 3 Nov 2016 07:21:12 +0100

locking/drm: Fix i915_gem_shrinker_lock() locking

Mike Krinkin reported hangs in the DRM code and bisected it to:

3ab7c086d5ec72585ef0 ("locking/drm: Kill mutex trickery")

Hugh Dickins observed:

"i915_gem_shrinker_lock() is broken: but copy the pattern from
msm_gem_shrinker_lock() and it's okay - patch below."

Pick up the fix in isolation to make sure the bug is fixed, cleanup
patch will follow up.

Originally-From: Hugh Dickins <[email protected]>
Reported-by: Hugh Dickins <[email protected]>
Reported-by: Mike Krinkin <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
drivers/gpu/drm/i915/i915_gem_shrinker.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index 3695375..e9bd2a8 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -228,8 +228,9 @@ unsigned long i915_gem_shrink_all(struct drm_i915_private *dev_priv)
 static bool i915_gem_shrinker_lock(struct drm_device *dev, bool *unlock)
 {
 	if (!mutex_trylock(&dev->struct_mutex))
-		*unlock = false;
+		return false;
 
+	*unlock = true;
 	return true;
 }
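
For readers who want to see why the two-line change matters, here is a rough user-space sketch of the helper as it appears on each side of the diff above. It is not kernel code: a pthread mutex stands in for dev->struct_mutex, and the names fake_dev, shrinker_lock_before_fix, shrinker_lock_after_fix and shrink_once are hypothetical, invented for illustration. The caller shape (initialise unlock to false, bail out when the helper returns false, unlock only if unlock is set) is likewise an assumption, not something taken from the i915 sources.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct drm_device; only the mutex matters here. */
struct fake_dev {
	pthread_mutex_t struct_mutex;
};

/*
 * Shape of the helper before the fix: a failed trylock only clears *unlock
 * and then still reports success, so the caller carries on without holding
 * the mutex. On a successful trylock *unlock is never written at all.
 * Note: pthread_mutex_trylock() returns 0 on success, whereas the kernel's
 * mutex_trylock() returns 1 on success.
 */
bool shrinker_lock_before_fix(struct fake_dev *dev, bool *unlock)
{
	assert(dev != NULL && unlock != NULL);
	if (pthread_mutex_trylock(&dev->struct_mutex) != 0)
		*unlock = false;

	return true;
}

/* Shape after the fix: fail on contention, record ownership on success. */
bool shrinker_lock_after_fix(struct fake_dev *dev, bool *unlock)
{
	assert(dev != NULL && unlock != NULL);
	if (pthread_mutex_trylock(&dev->struct_mutex) != 0)
		return false;

	*unlock = true;
	return true;
}

/*
 * Hypothetical caller. Returns true if the "work" ran and false if the
 * caller backed off because the mutex was contended.
 */
bool shrink_once(struct fake_dev *dev,
		 bool (*lock)(struct fake_dev *, bool *))
{
	bool unlock = false;	/* assumption: the caller starts from false */

	if (!lock(dev, &unlock))
		return false;	/* contended: skip this pass */

	/* ... work that relies on struct_mutex being held would go here ... */

	if (unlock)
		pthread_mutex_unlock(&dev->struct_mutex);
	return true;
}
```

Under these assumptions the pre-fix helper misbehaves in both directions: when the mutex is contended, shrink_once() still runs its work without the lock, and when the trylock succeeds, unlock stays false and the mutex is never released, so every later attempt to take it fails or blocks. Whether one of these, or something else, is what produced the reported hangs is not established in this thread. With the post-fix helper, shrink_once() backs off under contention and releases the mutex it took.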