2009-01-21 12:28:03

by Sami Kerola

Subject: PROBLEM: kernel BUG at drivers/gpu/drm/drm_fops.c:146!

Hi,

I compiled the Torvalds git kernel 2.6.29-rc2-00013 and got an oops.
The oops happens whenever X starts. Initially I booted into run
level 5 and the system hung. When I booted into run level 3 instead, the
operating system started just fine, but typing startx made it hang again.
Please let me know if you need more information besides the oops from
the messages file and the lspci output.


Jan 21 08:53:58 lelux kernel: ------------[ cut here ]------------
Jan 21 08:53:58 lelux kernel: kernel BUG at drivers/gpu/drm/drm_fops.c:146!
Jan 21 08:53:58 lelux kernel: invalid opcode: 0000 [#1] SMP
Jan 21 08:53:58 lelux kernel: last sysfs file:
/sys/devices/pci0000:00/0000:00:02.0/drm/card0/dri_library_name
Jan 21 08:53:58 lelux kernel: CPU 0
Jan 21 08:53:58 lelux kernel: Modules linked in: i915 drm i2c_algo_bit
ipv6 fuse acpi_cpufreq dm_multipath snd_hda_codec_idt arc4 ecb
snd_hda_intel snd_hda_codec iwlagn
snd_hwdep snd_seq_dummy snd_seq_oss iwlcore snd_seq_midi_event
snd_seq rfkill snd_seq_device lib80211 snd_pcm_oss mac80211
snd_mixer_oss snd_pcm firewire_ohci i2c_i801
snd_timer firewire_core ppdev snd sr_mod pcspkr joydev yenta_socket
i2c_core video sg iTCO_wdt tg3 rsrc_nonstatic cdrom crc_itu_t
iTCO_vendor_support parport_pc output
parport libphy soundcore cfg80211 wmi battery snd_page_alloc ac
pata_acpi ata_generic ata_piix libata sd_mod scsi_mod sha256_generic
cbc aes_x86_64 aes_generic dm_crypt
dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod ext3
jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
Jan 21 08:53:58 lelux kernel: Pid: 2902, comm: X Not tainted
2.6.29-rc2-00013-gf3b8436-dirty #1
Jan 21 08:53:58 lelux kernel: RIP: 0010:[<ffffffffa0486f7b>]
[<ffffffffa0486f7b>] drm_open+0x4b7/0x4f0 [drm]
Jan 21 08:53:58 lelux kernel: RSP: 0018:ffff88006ec01cd8 EFLAGS: 00010293
Jan 21 08:53:58 lelux kernel: RAX: ffff88006f8e2100 RBX:
ffff88006f47d060 RCX: 0000000000000000
Jan 21 08:53:58 lelux kernel: RDX: ffff88007a016b90 RSI:
ffff88006ec40000 RDI: ffff88006ec01c28
Jan 21 08:53:58 lelux kernel: RBP: ffff88006ec01d18 R08:
ffff88006ec40760 R09: 00000000000002e7
Jan 21 08:53:58 lelux kernel: R10: ffff88006ec40000 R11:
0000000000000006 R12: 0000000000000000
Jan 21 08:53:58 lelux kernel: R13: ffff88006f47d000 R14:
ffff88006f47d060 R15: ffff88006ec33918
Jan 21 08:53:58 lelux kernel: FS: 00007f4495429780(0000)
GS:ffffffff8192e000(0000) knlGS:0000000000000000
Jan 21 08:53:58 lelux kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 21 08:53:58 lelux kernel: CR2: 00007f4494de01a0 CR3:
0000000070460000 CR4: 00000000000006e0
Jan 21 08:53:58 lelux kernel: DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
Jan 21 08:53:58 lelux kernel: DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 0000000000000400
Jan 21 08:53:58 lelux kernel: Process X (pid: 2902, threadinfo
ffff88006ec00000, task ffff88006ec40000)
Jan 21 08:53:58 lelux kernel: Stack:
Jan 21 08:53:58 lelux kernel: ffff88006d071000 ffff88007a016b90
ffff88006d044000 00000000ffffffed
Jan 21 08:53:58 lelux kernel: ffffffffa04959d0 ffff88006d071000
ffff88007a016b90 ffff88007a016b90
Jan 21 08:53:58 lelux kernel: ffff88006ec01d48 ffffffffa0486a53
0000000000000000 ffff88007c9adc00
Jan 21 08:53:58 lelux kernel: Call Trace:
Jan 21 08:53:58 lelux kernel: [<ffffffffa0486a53>]
drm_stub_open+0xd2/0x143 [drm]
Jan 21 08:53:58 lelux kernel: [<ffffffff810df703>] chrdev_open+0x149/0x168
Jan 21 08:53:58 lelux kernel: [<ffffffff8114e8b9>] ?
selinux_dentry_open+0xeb/0xf4
Jan 21 08:53:58 lelux kernel: [<ffffffff810df5ba>] ? chrdev_open+0x0/0x168
Jan 21 08:53:58 lelux kernel: [<ffffffff810db03f>] __dentry_open+0x151/0x270
Jan 21 08:53:58 lelux kernel: [<ffffffff810db235>] nameidata_to_filp+0x46/0x57
Jan 21 08:53:58 lelux kernel: [<ffffffff810e84fb>] do_filp_open+0x44f/0x8a7
Jan 21 08:53:58 lelux kernel: [<ffffffff81065661>] ?
lock_release_holdtime+0x1c/0x173
Jan 21 08:53:58 lelux kernel: [<ffffffff8118a136>] ? _raw_spin_unlock+0x8e/0x94
Jan 21 08:53:58 lelux kernel: [<ffffffff810f14c5>] ? alloc_fd+0x122/0x133
Jan 21 08:53:58 lelux kernel: [<ffffffff810dae22>] do_sys_open+0x58/0xd8
Jan 21 08:53:58 lelux kernel: [<ffffffff810daed5>] sys_open+0x20/0x22
Jan 21 08:53:58 lelux kernel: [<ffffffff8100c42a>]
system_call_fastpath+0x16/0x1b
Jan 21 08:53:58 lelux kernel: Code: 48 89 df e8 55 e4 ea e0 48 8b 45
d0 83 78 04 01 75 2f 49 8b 85 00 06 00 00 48 85 c0 74 11 48 8b 55 c8
48 3b 82 30 02 00 00 74 16 <0f> 0b eb fe 48 8b 55 c8 48 8b 82 30 02 00
00 49 89 85 00 06 00
Jan 21 08:53:58 lelux kernel: RIP [<ffffffffa0486f7b>]
drm_open+0x4b7/0x4f0 [drm]
Jan 21 08:53:58 lelux kernel: RSP <ffff88006ec01cd8>
Jan 21 08:53:58 lelux kernel: ---[ end trace 9cfbed5f45e30ada ]---
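
For anyone triaging a trace like the one above: the RIP and Call Trace
entries follow the pattern symbol+0xOFFSET/0xSIZE, optionally followed by
the module name in brackets. The following is only a minimal sketch of
pulling those fields out of a line; the names SYMBOL_RE and
parse_oops_symbol are hypothetical and not part of any kernel tooling.

```python
import re

# Hypothetical helper for illustration only; not part of any kernel tool.
# Matches entries such as "drm_open+0x4b7/0x4f0 [drm]".
SYMBOL_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)"      # function name
    r"\+0x(?P<offset>[0-9a-f]+)"        # offset of the address inside it
    r"/0x(?P<size>[0-9a-f]+)"           # total size of the function
    r"(?:\s+\[(?P<module>[\w-]+)\])?"   # optional "[module]" suffix
)


def parse_oops_symbol(text):
    """Return symbol, offset, size and module from an oops entry, or None."""
    match = SYMBOL_RE.search(text)
    if match is None:
        return None
    return {
        "symbol": match.group("symbol"),
        "offset": int(match.group("offset"), 16),
        "size": int(match.group("size"), 16),
        "module": match.group("module"),  # None for built-in code
    }


rip = parse_oops_symbol("drm_open+0x4b7/0x4f0 [drm]")
frame = parse_oops_symbol("chrdev_open+0x149/0x168")
```

Assuming drm.ko was built with debug info, the symbol and offset could
then be given to gdb (for example, list *(drm_open+0x4b7)) to find the
corresponding source line.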


You might be interested in my hardware, so here is the lspci -vvv output
(taken while 2.6.27.9-73.fc9.x86_64 was running).


00:00.0 Host bridge: Intel Corporation Mobile PM965/GM965/GL960 Memory
Controller Hub (rev 0c)
Subsystem: Dell Unknown device 01f9
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort+ >SERR- <PERR- INTx-
Latency: 0
Capabilities: [e0] Vendor Specific Information <?>
Kernel driver in use: agpgart-intel

00:02.0 VGA compatible controller: Intel Corporation Mobile
GM965/GL960 Integrated Graphics Controller (rev 0c) (prog-if 00 [VGA
controller])
Subsystem: Dell Latitude D630
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 16
Region 0: Memory at fea00000 (64-bit, non-prefetchable) [size=1M]
Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
Region 4: I/O ports at efe8 [size=8]
Capabilities: [90] Message Signalled Interrupts: Mask- 64bit-
Queue=0/0 Enable-
Address: 00000000 Data: 0000
Capabilities: [d0] Power Management version 3
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Bridge: PM- B3+
Kernel modules: intelfb, i915

00:02.1 Display controller: Intel Corporation Mobile GM965/GL960
Integrated Graphics Controller (rev 0c)
Subsystem: Dell Unknown device 01f9
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Region 0: Memory at feb00000 (64-bit, non-prefetchable) [size=1M]
Capabilities: [d0] Power Management version 3
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Bridge: PM- B3+

00:1a.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB
UHCI Controller #4 (rev 02) (prog-if 00 [UHCI])
Subsystem: Dell Unknown device 01f9
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 20
Region 4: I/O ports at 6f20 [size=32]
Kernel driver in use: uhci_hcd
Kernel modules: uhci-hcd

00:1a.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB
UHCI Controller #5 (rev 02) (prog-if 00 [UHCI])
Subsystem: Dell Unknown device 01f9
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin B routed to IRQ 21
Region 4: I/O ports at 6f00 [size=32]
Kernel driver in use: uhci_hcd
Kernel modules: uhci-hcd

00:1a.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2
EHCI Controller #2 (rev 02) (prog-if 20 [EHCI])
Subsystem: Dell Unknown device 01f9
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin C routed to IRQ 22
Region 0: Memory at fed1c400 (32-bit, non-prefetchable) [size=1K]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Debug port: BAR=1 offset=00a0
Kernel driver in use: ehci_hcd
Kernel modules: ehci-hcd

00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio
Controller (rev 02)
Subsystem: Dell Dell Latitude D630
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 21
Region 0: Memory at fe9fc000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [60] Message Signalled Interrupts: Mask- 64bit+
Queue=0/0 Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [70] Express (v1) Root Complex Integrated Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
unlimited, L1 unlimited
ExtTag- RBE- FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq-
AuxPwr+ TransPend-
LnkCap: Port #0, Speed unknown, Width x0, ASPM
unknown, Latency L0 <64ns, L1 <1us
ClockPM- Suprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; Disabled- Retrain- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed unknown, Width x0, TrErr- Train-
SlotClk- DLActive- BWMgmt- ABWMgmt-
Capabilities: [100] Virtual Channel <?>
Capabilities: [130] Root Complex Link <?>
Kernel driver in use: HDA Intel
Kernel modules: snd-hda-intel

00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express
Port 1 (rev 02) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=00, secondary=0b, subordinate=0b, sec-latency=0
I/O behind bridge: 0000f000-00000fff
Memory behind bridge: fff00000-000fffff
Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Express (v1) Root Port (Slot+), MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
unlimited, L1 unlimited
ExtTag- RBE+ FLReset-
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+
Unsupported+
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq-
AuxPwr+ TransPend-
LnkCap: Port #1, Speed 2.5GT/s, Width x1, ASPM L0s L1,
Latency L0 <1us, L1 <4us
ClockPM- Suprise- LLActRep+ BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train-
SlotClk+ DLActive- BWMgmt- ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
HotPlug+ Surpise+
Slot # 2, PowerLimit 6.500000; Interlock- NoCompl-
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet+
CmdCplt- HPIrq- LinkChg-
Control: AttnInd Unknown, PwrInd Unknown,
Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
PresDet- Interlock-
Changed: MRL- PresDet- LinkState-
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
PMEIntEna- CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
Capabilities: [80] Message Signalled Interrupts: Mask- 64bit-
Queue=0/0 Enable-
Address: 00000000 Data: 0000
Capabilities: [90] Subsystem: Dell Unknown device 01f9
Capabilities: [a0] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100] Virtual Channel <?>
Capabilities: [180] Root Complex Link <?>
Kernel driver in use: pcieport-driver
Kernel modules: shpchp

00:1c.1 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express
Port 2 (rev 02) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=00, secondary=0c, subordinate=0c, sec-latency=0
I/O behind bridge: 0000f000-00000fff
Memory behind bridge: fe800000-fe8fffff
Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Express (v1) Root Port (Slot+), MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
unlimited, L1 unlimited
ExtTag- RBE+ FLReset-
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+
Unsupported+
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq-
AuxPwr+ TransPend-
LnkCap: Port #2, Speed 2.5GT/s, Width x1, ASPM L0s L1,
Latency L0 <256ns, L1 <4us
ClockPM- Suprise- LLActRep+ BwNot-
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled-
Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train-
SlotClk+ DLActive+ BWMgmt- ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
HotPlug+ Surpise+
Slot # 3, PowerLimit 6.500000; Interlock- NoCompl-
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet+
CmdCplt- HPIrq- LinkChg-
Control: AttnInd Unknown, PwrInd Unknown,
Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
PresDet+ Interlock-
Changed: MRL- PresDet- LinkState-
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
PMEIntEna- CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
Capabilities: [80] Message Signalled Interrupts: Mask- 64bit-
Queue=0/0 Enable-
Address: 00000000 Data: 0000
Capabilities: [90] Subsystem: Dell Unknown device 01f9
Capabilities: [a0] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100] Virtual Channel <?>
Capabilities: [180] Root Complex Link <?>
Kernel driver in use: pcieport-driver
Kernel modules: shpchp

00:1c.5 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express
Port 6 (rev 02) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=00, secondary=09, subordinate=09, sec-latency=0
I/O behind bridge: 0000f000-00000fff
Memory behind bridge: fe700000-fe7fffff
Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Express (v1) Root Port (Slot+), MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
unlimited, L1 unlimited
ExtTag- RBE+ FLReset-
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+
Unsupported+
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq-
AuxPwr+ TransPend-
LnkCap: Port #6, Speed 2.5GT/s, Width x1, ASPM L0s L1,
Latency L0 <256ns, L1 <4us
ClockPM- Suprise- LLActRep+ BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train-
SlotClk+ DLActive+ BWMgmt- ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
HotPlug+ Surpise+
Slot # 3, PowerLimit 6.500000; Interlock- NoCompl-
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet+
CmdCplt- HPIrq- LinkChg-
Control: AttnInd Unknown, PwrInd Unknown,
Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
PresDet+ Interlock-
Changed: MRL- PresDet- LinkState-
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
PMEIntEna- CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
Capabilities: [80] Message Signalled Interrupts: Mask- 64bit-
Queue=0/0 Enable-
Address: 00000000 Data: 0000
Capabilities: [90] Subsystem: Dell Unknown device 01f9
Capabilities: [a0] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100] Virtual Channel <?>
Capabilities: [180] Root Complex Link <?>
Kernel driver in use: pcieport-driver
Kernel modules: shpchp

00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB
UHCI Controller #1 (rev 02) (prog-if 00 [UHCI])
Subsystem: Dell Unknown device 01f9
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 20
Region 4: I/O ports at 6f80 [size=32]
Kernel driver in use: uhci_hcd
Kernel modules: uhci-hcd

00:1d.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB
UHCI Controller #2 (rev 02) (prog-if 00 [UHCI])
Subsystem: Dell Unknown device 01f9
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin B routed to IRQ 21
Region 4: I/O ports at 6f60 [size=32]
Kernel driver in use: uhci_hcd
Kernel modules: uhci-hcd

00:1d.2 USB Controller: Intel Corporation 82801H (ICH8 Family) USB
UHCI Controller #3 (rev 02) (prog-if 00 [UHCI])
Subsystem: Dell Unknown device 01f9
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin C routed to IRQ 22
Region 4: I/O ports at 6f40 [size=32]
Kernel driver in use: uhci_hcd
Kernel modules: uhci-hcd

00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2
EHCI Controller #1 (rev 02) (prog-if 20 [EHCI])
Subsystem: Dell Unknown device 01f9
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 20
Region 0: Memory at fed1c000 (32-bit, non-prefetchable) [size=1K]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Debug port: BAR=1 offset=00a0
Kernel driver in use: ehci_hcd
Kernel modules: ehci-hcd

00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev f2)
(prog-if 01 [Subtractive decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Bus: primary=00, secondary=03, subordinate=07, sec-latency=32
I/O behind bridge: 00002000-00002fff
Memory behind bridge: fe600000-fe6fffff
Prefetchable memory behind bridge: 0000000088000000-000000008bffffff
Secondary status: 66MHz- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort+ <SERR- <PERR+
BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [50] Subsystem: Dell Unknown device 01f9

00:1f.0 ISA bridge: Intel Corporation 82801HEM (ICH8M) LPC Interface
Controller (rev 02)
Subsystem: Dell Unknown device 01f9
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Capabilities: [e0] Vendor Specific Information <?>
Kernel modules: iTCO_wdt

00:1f.1 IDE interface: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E)
IDE Controller (rev 02) (prog-if 8a [Master SecP PriP])
Subsystem: Dell Unknown device 01f9
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 16
Region 0: I/O ports at 01f0 [size=8]
Region 1: I/O ports at 03f4 [size=1]
Region 2: I/O ports at 0170 [size=8]
Region 3: I/O ports at 0374 [size=1]
Region 4: I/O ports at 6fa0 [size=16]
Kernel driver in use: ata_piix
Kernel modules: pata_acpi, ata_generic, ata_piix

00:1f.2 IDE interface: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E)
SATA IDE Controller (rev 02) (prog-if 8f [Master SecP SecO PriP PriO])
Subsystem: Dell Unknown device 01f9
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin B routed to IRQ 17
Region 0: I/O ports at 6eb0 [size=8]
Region 1: I/O ports at 6eb8 [size=4]
Region 2: I/O ports at 6ec0 [size=8]
Region 3: I/O ports at 6ec8 [size=4]
Region 4: I/O ports at 6ee0 [size=16]
Region 5: I/O ports at eff0 [size=16]
Capabilities: [70] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot+,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Kernel driver in use: ata_piix
Kernel modules: pata_acpi, ata_generic, ata_piix

00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 02)
Subsystem: Dell Unknown device 01f9
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin B routed to IRQ 17
Region 0: Memory at fe9fbf00 (32-bit, non-prefetchable) [size=256]
Region 4: I/O ports at 10c0 [size=32]
Kernel driver in use: i801_smbus
Kernel modules: i2c-i801

03:01.0 CardBus bridge: O2 Micro, Inc. Cardbus bridge (rev 21)
Subsystem: Dell Unknown device 01f9
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping+ SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 168
Interrupt: pin A routed to IRQ 19
Region 0: Memory at fe600000 (32-bit, non-prefetchable) [size=4K]
Bus: primary=03, secondary=04, subordinate=07, sec-latency=176
Memory window 0: 88000000-8bfff000 (prefetchable)
Memory window 1: 8c000000-8ffff000
I/O window 0: 00002000-000020ff
I/O window 1: 00002400-000024ff
BridgeCtl: Parity- SERR- ISA- VGA- MAbort- >Reset+ 16bInt+ PostWrite+
16-bit legacy interface ports at 0001
Kernel driver in use: yenta_cardbus
Kernel modules: yenta_socket

03:01.4 FireWire (IEEE 1394): O2 Micro, Inc. Firewire (IEEE 1394) (rev
02) (prog-if 10 [OHCI])
Subsystem: Dell Unknown device 01f9
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+
Latency: 64, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 19
Region 0: Memory at fe6ff000 (32-bit, non-prefetchable) [size=4K]
Region 1: Memory at fe6fe800 (32-bit, non-prefetchable) [size=2K]
Capabilities: [60] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME+
Kernel driver in use: firewire_ohci
Kernel modules: firewire-ohci

09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5755M
Gigabit Ethernet PCI Express (rev 02)
Subsystem: Dell Unknown device 01f9
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 17
Region 0: Memory at fe7f0000 (64-bit, non-prefetchable) [size=64K]
Expansion ROM at <ignored> [disabled]
Capabilities: [48] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] Vital Product Data <?>
Capabilities: [58] Vendor Specific Information <?>
Capabilities: [e8] Message Signalled Interrupts: Mask- 64bit+
Queue=0/0 Enable-
Address: e767ffbbad8bdbfc Data: fffa
Capabilities: [d0] Express (v1) Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
<4us, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 4096 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq-
AuxPwr+ TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1,
Latency L0 <4us, L1 <64us
ClockPM+ Suprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train-
SlotClk+ DLActive- BWMgmt- ABWMgmt-
Capabilities: [100] Advanced Error Reporting <?>
Capabilities: [13c] Virtual Channel <?>
Capabilities: [160] Device Serial Number 39-9e-37-fe-ff-23-1c-00
Capabilities: [16c] Power Budgeting <?>
Kernel driver in use: tg3
Kernel modules: tg3

0c:00.0 Network controller: Intel Corporation PRO/Wireless 4965 AG or
AGN Network Connection (rev 61)
Subsystem: Intel Corporation Unknown device 1121
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 17
Region 0: Memory at fe8fe000 (64-bit, non-prefetchable) [size=8K]
Capabilities: [c8] Power Management version 3
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+
Queue=0/0 Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [e0] Express (v1) Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
<512ns, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq-
AuxPwr+ TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1,
Latency L0 <128ns, L1 <64us
ClockPM+ Suprise- LLActRep- BwNot-
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled-
Retrain- CommClk+
ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train-
SlotClk+ DLActive- BWMgmt- ABWMgmt-
Capabilities: [100] Advanced Error Reporting <?>
Capabilities: [140] Device Serial Number 43-97-1e-ff-ff-3b-1f-00
Kernel driver in use: iwlagn
Kernel modules: iwlagn

--
Sami Kerola
http://www.iki.fi/kerolasa/


2009-01-30 00:31:14

by Andrew Morton

Subject: Re: PROBLEM: kernel BUG at drivers/gpu/drm/drm_fops.c:146!

(cc's added)

On Wed, 21 Jan 2009 13:27:48 +0100
Sami Kerola <[email protected]> wrote:

> I compiled the Torvalds git kernel 2.6.29-rc2-00013 and got an oops.
> The oops happens whenever X starts. Initially I booted into run
> level 5 and the system hung. When I booted into run level 3 instead, the
> operating system started just fine, but typing startx made it hang again.
> Please let me know if you need more information besides the oops from
> the messages file and the lspci output.
>
>
> Jan 21 08:53:58 lelux kernel: ------------[ cut here ]------------
> Jan 21 08:53:58 lelux kernel: kernel BUG at drivers/gpu/drm/drm_fops.c:146!

I assume that 2.6.28 didn't do this?

Seems odd - nothing much has changed around there lately.

> Jan 21 08:53:58 lelux kernel: invalid opcode: 0000 [#1] SMP
> Jan 21 08:53:58 lelux kernel: last sysfs file:
> /sys/devices/pci0000:00/0000:00:02.0/drm/card0/dri_library_name
> Jan 21 08:53:58 lelux kernel: CPU 0
> Jan 21 08:53:58 lelux kernel: Modules linked in: i915 drm i2c_algo_bit
> ipv6 fuse acpi_cpufreq dm_multipath snd_hda_codec_idt arc4 ecb
> snd_hda_intel snd_hda_codec iwlagn
> snd_hwdep snd_seq_dummy snd_seq_oss iwlcore snd_seq_midi_event
> snd_seq rfkill snd_seq_device lib80211 snd_pcm_oss mac80211
> snd_mixer_oss snd_pcm firewire_ohci i2c_i801
> snd_timer firewire_core ppdev snd sr_mod pcspkr joydev yenta_socket
> i2c_core video sg iTCO_wdt tg3 rsrc_nonstatic cdrom crc_itu_t
> iTCO_vendor_support parport_pc output
> parport libphy soundcore cfg80211 wmi battery snd_page_alloc ac
> pata_acpi ata_generic ata_piix libata sd_mod scsi_mod sha256_generic
> cbc aes_x86_64 aes_generic dm_crypt
> dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod ext3
> jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
> Jan 21 08:53:58 lelux kernel: Pid: 2902, comm: X Not tainted
> 2.6.29-rc2-00013-gf3b8436-dirty #1
> Jan 21 08:53:58 lelux kernel: RIP: 0010:[<ffffffffa0486f7b>]
> [<ffffffffa0486f7b>] drm_open+0x4b7/0x4f0 [drm]
> Jan 21 08:53:58 lelux kernel: RSP: 0018:ffff88006ec01cd8 EFLAGS: 00010293
> Jan 21 08:53:58 lelux kernel: RAX: ffff88006f8e2100 RBX:
> ffff88006f47d060 RCX: 0000000000000000
> Jan 21 08:53:58 lelux kernel: RDX: ffff88007a016b90 RSI:
> ffff88006ec40000 RDI: ffff88006ec01c28
> Jan 21 08:53:58 lelux kernel: RBP: ffff88006ec01d18 R08:
> ffff88006ec40760 R09: 00000000000002e7
> Jan 21 08:53:58 lelux kernel: R10: ffff88006ec40000 R11:
> 0000000000000006 R12: 0000000000000000
> Jan 21 08:53:58 lelux kernel: R13: ffff88006f47d000 R14:
> ffff88006f47d060 R15: ffff88006ec33918
> Jan 21 08:53:58 lelux kernel: FS: 00007f4495429780(0000)
> GS:ffffffff8192e000(0000) knlGS:0000000000000000
> Jan 21 08:53:58 lelux kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jan 21 08:53:58 lelux kernel: CR2: 00007f4494de01a0 CR3:
> 0000000070460000 CR4: 00000000000006e0
> Jan 21 08:53:58 lelux kernel: DR0: 0000000000000000 DR1:
> 0000000000000000 DR2: 0000000000000000
> Jan 21 08:53:58 lelux kernel: DR3: 0000000000000000 DR6:
> 00000000ffff0ff0 DR7: 0000000000000400
> Jan 21 08:53:58 lelux kernel: Process X (pid: 2902, threadinfo
> ffff88006ec00000, task ffff88006ec40000)
> Jan 21 08:53:58 lelux kernel: Stack:
> Jan 21 08:53:58 lelux kernel: ffff88006d071000 ffff88007a016b90
> ffff88006d044000 00000000ffffffed
> Jan 21 08:53:58 lelux kernel: ffffffffa04959d0 ffff88006d071000
> ffff88007a016b90 ffff88007a016b90
> Jan 21 08:53:58 lelux kernel: ffff88006ec01d48 ffffffffa0486a53
> 0000000000000000 ffff88007c9adc00
> Jan 21 08:53:58 lelux kernel: Call Trace:
> Jan 21 08:53:58 lelux kernel: [<ffffffffa0486a53>]
> drm_stub_open+0xd2/0x143 [drm]
> Jan 21 08:53:58 lelux kernel: [<ffffffff810df703>] chrdev_open+0x149/0x168
> Jan 21 08:53:58 lelux kernel: [<ffffffff8114e8b9>] ?
> selinux_dentry_open+0xeb/0xf4
> Jan 21 08:53:58 lelux kernel: [<ffffffff810df5ba>] ? chrdev_open+0x0/0x168
> Jan 21 08:53:58 lelux kernel: [<ffffffff810db03f>] __dentry_open+0x151/0x270
> Jan 21 08:53:58 lelux kernel: [<ffffffff810db235>] nameidata_to_filp+0x46/0x57
> Jan 21 08:53:58 lelux kernel: [<ffffffff810e84fb>] do_filp_open+0x44f/0x8a7
> Jan 21 08:53:58 lelux kernel: [<ffffffff81065661>] ?
> lock_release_holdtime+0x1c/0x173
> Jan 21 08:53:58 lelux kernel: [<ffffffff8118a136>] ? _raw_spin_unlock+0x8e/0x94
> Jan 21 08:53:58 lelux kernel: [<ffffffff810f14c5>] ? alloc_fd+0x122/0x133
> Jan 21 08:53:58 lelux kernel: [<ffffffff810dae22>] do_sys_open+0x58/0xd8
> Jan 21 08:53:58 lelux kernel: [<ffffffff810daed5>] sys_open+0x20/0x22
> Jan 21 08:53:58 lelux kernel: [<ffffffff8100c42a>]
> system_call_fastpath+0x16/0x1b
> Jan 21 08:53:58 lelux kernel: Code: 48 89 df e8 55 e4 ea e0 48 8b 45
> d0 83 78 04 01 75 2f 49 8b 85 00 06 00 00 48 85 c0 74 11 48 8b 55 c8
> 48 3b 82 30 02 00 00 74 16 <0f> 0b eb fe 48 8b 55 c8 48 8b 82 30 02 00
> 00 49 89 85 00 06 00

2009-01-30 01:06:57

by Dave Airlie

Subject: Re: PROBLEM: kernel BUG at drivers/gpu/drm/drm_fops.c:146!

On Fri, Jan 30, 2009 at 10:30 AM, Andrew Morton
<[email protected]> wrote:
> (cc's added)
>
> On Wed, 21 Jan 2009 13:27:48 +0100
> Sami Kerola <[email protected]> wrote:
>
>> I compiled the Torvalds git kernel 2.6.29-rc2-00013 and I got an oops.
>> The oops happens when ever X starts. Initially I was booting with run
>> level 5 and it hung. I tried to use run level to 3 and an operating
>> system started just fine. When I type startx the hung happen again.
>> Please let me know if you need some more information besides oops from
>> messages file and lspci output.
>>
>>
>> Jan 21 08:53:58 lelux kernel: ------------[ cut here ]------------
>> Jan 21 08:53:58 lelux kernel: kernel BUG at drivers/gpu/drm/drm_fops.c:146!
>
> I assume that 2.6.28 didn't do this?

This is a userspace race between udev and libdrm. I'm not sure we can do
anything in the kernel other than BUG; maybe we should just WARN instead.

Basically, libdrm creates the device nodes, and the initial drm open gets
one of those. udev comes along when the module is loaded and re-creates
the device node. When AIGLX then opens the device, the kernel can't figure
out wtf just happened, as the inode->i_mapping we use to store the GEM
device mmap ranges is different.
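
The race described above can be sketched as a toy Python model. This is
not the kernel code: `ToyInode`, `ToyDrmDevice`, `ToyBug` and `toy_open`
are hypothetical names, and only `dev_mapping` and `i_mapping` are taken
from this thread. The sketch assumes the device remembers the mapping of
the first inode it was opened through and treats a later open through an
inode with a different mapping as fatal.

```python
class ToyInode:
    """Stand-in for a device node; every re-created node gets a fresh mapping."""

    def __init__(self, name):
        self.name = name
        self.i_mapping = object()  # unique per inode, like a separate address_space


class ToyDrmDevice:
    """Stand-in for the drm device; remembers the first mapping it saw."""

    def __init__(self):
        self.dev_mapping = None


class ToyBug(Exception):
    """Models the BUG firing (hypothetical, for illustration only)."""


def toy_open(dev, inode):
    # Assumed check: a second, different mapping for the same device is fatal.
    if dev.dev_mapping is not None and dev.dev_mapping is not inode.i_mapping:
        raise ToyBug("device opened through an inode with a different mapping")
    if dev.dev_mapping is None:
        dev.dev_mapping = inode.i_mapping


dev = ToyDrmDevice()
node_from_libdrm = ToyInode("card0 (created by libdrm)")
toy_open(dev, node_from_libdrm)  # first open records the mapping
toy_open(dev, node_from_libdrm)  # same inode again: fine

node_from_udev = ToyInode("card0 (re-created by udev)")
try:
    toy_open(dev, node_from_udev)  # same device, new inode, new mapping
    outcome = "ok"
except ToyBug:
    outcome = "bug"
print(outcome)
```

Turning the BUG into a WARN would correspond to logging instead of
raising in `toy_open`; the two mappings would still disagree.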

I think building libdrm with --enable-udev is the correct answer, and
maybe switching this to a WARN so it doesn't blow up.

Maybe we shouldn't be storing the inode mapping like this? Does anyone
have a better idea?

Dave.
>
> Seems odd - nothing much has changed around there lately.
>

2009-01-30 01:17:39

by Dan Nicholson

Subject: Re: PROBLEM: kernel BUG at drivers/gpu/drm/drm_fops.c:146!

On Thu, Jan 29, 2009 at 5:06 PM, Dave Airlie <[email protected]> wrote:
> On Fri, Jan 30, 2009 at 10:30 AM, Andrew Morton
> <[email protected]> wrote:
>> (cc's added)
>>
>> On Wed, 21 Jan 2009 13:27:48 +0100
>> Sami Kerola <[email protected]> wrote:
>>
>>> I compiled the Torvalds git kernel 2.6.29-rc2-00013 and I got an oops.
>>> The oops happens when ever X starts. Initially I was booting with run
>>> level 5 and it hung. I tried to use run level to 3 and an operating
>>> system started just fine. When I type startx the hung happen again.
>>> Please let me know if you need some more information besides oops from
>>> messages file and lspci output.
>>>
>>>
>>> Jan 21 08:53:58 lelux kernel: ------------[ cut here ]------------
>>> Jan 21 08:53:58 lelux kernel: kernel BUG at drivers/gpu/drm/drm_fops.c:146!
>>
>> I assume that 2.6.28 didn't do this?
>
> This is a userspace race between udev and libdrm, I'm not sure we can do
> anything in the kernel other than BUG, maybe we should just WARN instead.
>
> Basically, libdrm creates devices nodes, the initial drm opening gets that, udev
> comes along when the module is loaded and re-creates the device node,
> when AIGLX opens the device
> it can't figure out wtf just happened, as the inode->i_mapping we use
> to store the GEM device mmap ranges is different.
>
> I think building libdrm with --enable-udev is the correct answer, and
> maybe switching this to a WARN so it doesn't blow up.

Hi Dave,

I really think libdrm should have --enable-udev on by default for
Linux. People who don't use udev to manage their devices are surely
in the minority now. What do you think?

--
Dan

2009-01-30 01:20:38

by Andrew Morton

Subject: Re: PROBLEM: kernel BUG at drivers/gpu/drm/drm_fops.c:146!

On Fri, 30 Jan 2009 11:06:47 +1000 Dave Airlie <[email protected]> wrote:

> On Fri, Jan 30, 2009 at 10:30 AM, Andrew Morton
> <[email protected]> wrote:
> > (cc's added)
> >
> > On Wed, 21 Jan 2009 13:27:48 +0100
> > Sami Kerola <[email protected]> wrote:
> >
> >> I compiled the Torvalds git kernel 2.6.29-rc2-00013 and I got an oops.
> >> The oops happens when ever X starts. Initially I was booting with run
> >> level 5 and it hung. I tried to use run level to 3 and an operating
> >> system started just fine. When I type startx the hung happen again.
> >> Please let me know if you need some more information besides oops from
> >> messages file and lspci output.
> >>
> >>
> >> Jan 21 08:53:58 lelux kernel: ------------[ cut here ]------------
> >> Jan 21 08:53:58 lelux kernel: kernel BUG at drivers/gpu/drm/drm_fops.c:146!
> >
> > I assume that 2.6.28 didn't do this?
>
> This is a userspace race between udev and libdrm, I'm not sure we can do
> anything in the kernel other than BUG, maybe we should just WARN instead.
>
> Basically, libdrm creates devices nodes, the initial drm opening gets that, udev
> comes along when the module is loaded and re-creates the device node,
> when AIGLX opens the device
> it can't figure out wtf just happened, as the inode->i_mapping we use
> to store the GEM device mmap ranges is different.
>
> I think building libdrm with --enable-udev is the correct answer, and
> maybe switching this to a WARN so it doesn't blow up.
>
> maybe we shouldn't be storing the inode mapping like this? anyone any
> better idea?
>

hm, I'm a bit surprised to see the drm code using `struct
address_space' and read_mapping_page() and unmap_mapping_range() and
such. I thought those only worked with regular files and pagecache :)

Is it possible to briefly explain what's going on there?

What instance of address_space_operations does ->dev_mapping actually
point at?

2009-01-30 01:43:34

by Dave Airlie

Subject: Re: PROBLEM: kernel BUG at drivers/gpu/drm/drm_fops.c:146!

On Fri, Jan 30, 2009 at 11:20 AM, Andrew Morton
<[email protected]> wrote:
> On Fri, 30 Jan 2009 11:06:47 +1000 Dave Airlie <[email protected]> wrote:
>
>> On Fri, Jan 30, 2009 at 10:30 AM, Andrew Morton
>> <[email protected]> wrote:
>> > (cc's added)
>> >
>> > On Wed, 21 Jan 2009 13:27:48 +0100
>> > Sami Kerola <[email protected]> wrote:
>> >
>> >> I compiled the Torvalds git kernel 2.6.29-rc2-00013 and I got an oops.
>> >> The oops happens when ever X starts. Initially I was booting with run
>> >> level 5 and it hung. I tried to use run level to 3 and an operating
>> >> system started just fine. When I type startx the hung happen again.
>> >> Please let me know if you need some more information besides oops from
>> >> messages file and lspci output.
>> >>
>> >>
>> >> Jan 21 08:53:58 lelux kernel: ------------[ cut here ]------------
>> >> Jan 21 08:53:58 lelux kernel: kernel BUG at drivers/gpu/drm/drm_fops.c:146!
>> >
>> > I assume that 2.6.28 didn't do this?
>>
>> This is a userspace race between udev and libdrm, I'm not sure we can do
>> anything in the kernel other than BUG, maybe we should just WARN instead.
>>
>> Basically, libdrm creates devices nodes, the initial drm opening gets that, udev
>> comes along when the module is loaded and re-creates the device node,
>> when AIGLX opens the device
>> it can't figure out wtf just happened, as the inode->i_mapping we use
>> to store the GEM device mmap ranges is different.
>>
>> I think building libdrm with --enable-udev is the correct answer, and
>> maybe switching this to a WARN so it doesn't blow up.
>>
>> maybe we shouldn't be storing the inode mapping like this? anyone any
>> better idea?
>>
>
> hm, I'm a bit surprised to see the drm code using `struct
> address_space' and read_mapping_page() and unmap_mapping_range() and
> such. I thought those only worked with regular files and pagecache :)
>
> Is it possible to briefly explain what's going on there?
>
> What instance of address_space_operations does ->dev_mapping actually
> point at?

Okay, a bit tired and headache coming on, but I'll try; maybe jbarnes
can help out.

We need to provide mappings to userspace that are backed by memory
that can move around behind the mappings.

So userspace wants a mapping for a GEM object via the AGP/GTT aperture
instead of directly to the backing pages. As the GEM object is backed
by shmem, we can't tie the mapping to the shmem file descriptor we
have without hacking up the shmem mmap functionality, which seemed
like a bad plan.

So GEM uses the device inode to set up the mappings on. We just use a
simple linear allocator to split up the device inode's address space
and assign chunks to handles for different objects. The userspace app
then uses the handle via mmap to get access to the VMAs. Now when GEM
wants to move that object out of the GTT or to another area of the GTT
we need some way to invalidate it, so we use unmap_mapping_range,
which destroys all the mappings for the object in all the VMAs of all
the processes currently mapping it.
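
The scheme in the paragraph above can be sketched as a toy Python model.
It is only an illustration under stated assumptions, not GEM's
implementation: `ToyDeviceAddressSpace`, `alloc_offset`, `mmap` and
`unmap_range` are hypothetical names, a 4096-byte page size is assumed,
and a plain list stands in for the VMA tracking that the real mapping
helpers provide.

```python
PAGE_SIZE = 4096  # assumed page size for the sketch


class ToyDeviceAddressSpace:
    """Toy model of the device inode's address space (hypothetical)."""

    def __init__(self):
        self.next_offset = 0  # linear allocator: next free offset
        self.ranges = {}      # handle (offset) -> size in bytes
        self.mappings = []    # (process, offset, size) for each live mapping

    def alloc_offset(self, size):
        """Hand out the next page-aligned chunk; the offset acts as the handle."""
        size = -(-size // PAGE_SIZE) * PAGE_SIZE  # round up to whole pages
        offset = self.next_offset
        self.next_offset += size
        self.ranges[offset] = size
        return offset

    def mmap(self, process, offset):
        """Record that a process mapped the object behind this handle."""
        self.mappings.append((process, offset, self.ranges[offset]))

    def unmap_range(self, start, length):
        """Drop every mapping overlapping [start, start + length), in all processes."""
        end = start + length
        kept = [m for m in self.mappings if m[1] + m[2] <= start or m[1] >= end]
        dropped = len(self.mappings) - len(kept)
        self.mappings = kept
        return dropped


space = ToyDeviceAddressSpace()
a = space.alloc_offset(5000)  # rounded up to two pages
b = space.alloc_offset(4096)  # placed right after object a

space.mmap("X", a)
space.mmap("glxgears", a)
space.mmap("X", b)

# The object behind handle a is about to move: invalidate it everywhere.
dropped = space.unmap_range(a, space.ranges[a])
print(dropped, space.mappings)
```

Note that the invalidation is keyed on the offset range alone, so it hits
every process that mapped the object and leaves other objects untouched.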

GEM's read_mapping_page is distinct from this and is to do with the
shmem interfacing.

Not sure if this explains it or just makes it worse.

Dave.

2009-01-30 03:50:35

by Jesse Barnes

Subject: Re: PROBLEM: kernel BUG at drivers/gpu/drm/drm_fops.c:146!

On Thursday, January 29, 2009 5:43 pm Dave Airlie wrote:
> On Fri, Jan 30, 2009 at 11:20 AM, Andrew Morton
> > hm, I'm a bit surprised to see the drm code using `struct
> > address_space' and read_mapping_page() and unmap_mapping_range() and
> > such. I thought those only worked with regular files and pagecache :)
> >
> > Is it possible to briefly explain what's going on there?
> >
> > What instance of address_space_operations does ->dev_mapping actually
> > point at?
>
> Okay a bit tired and headache coming on but I'll try, maybe jbarnes
> can help out,
>
> We need to provide mappings to userspace that are backed by memory
> that can move around behind the mappings.
>
> So userspace wants a mapping for a GEM object via the AGP/GTT aperture
> instead of directly to the backing pages.
> Now as the GEM object is backed by shmem we can't use the shmem file
> descriptor we have to tie the mapping to without
> hacking up the shmem mmap functionality which seemed like a bad plan.
>
> So GEM uses the device inode to setup the mappings on. We just use a
> simple linear allocator to split up the device inodes address space
> and assign chunks to handles for different objects. The userspace app
> then uses the handle via mmap to get access to the VMAs. Now when GEM
> wants to move that object out of the GTT or to another area of the GTT
> we need some way to invalidate it, so we use unmap_mapping_range
> which destroys all the mappings for the object in all the VMA for all
> the processes mapping it currently
>
> GEM's read_mapping_page is distinct from this and is to do with the
> shmem interfaceing.
>
> Not sure if this explains it or just make it worse.

Sounds right to me. The offsets are just handles, not real file objects or
backing store addresses. We use them to take advantage of all the inode
address mapping helpers, since they track stuff for us.

That said, unmap_mapping_range may not be the best way to do this; basically
we need a way to invalidate a given process's mapping of a GTT range (which
in turn is backed by real RAM). If there's some other way we should be doing
this I'm all ears.

--
Jesse Barnes, Intel Open Source Technology Center

2009-01-30 04:44:58

by Andrew Morton

Subject: Re: PROBLEM: kernel BUG at drivers/gpu/drm/drm_fops.c:146!

On Thu, 29 Jan 2009 19:50:17 -0800 Jesse Barnes <[email protected]> wrote:

> On Thursday, January 29, 2009 5:43 pm Dave Airlie wrote:
> > On Fri, Jan 30, 2009 at 11:20 AM, Andrew Morton
> > > hm, I'm a bit surprised to see the drm code using `struct
> > > address_space' and read_mapping_page() and unmap_mapping_range() and
> > > such. I thought those only worked with regular files and pagecache :)
> > >
> > > Is it possible to briefly explain what's going on there?
> > >
> > > What instance of address_space_operations does ->dev_mapping actually
> > > point at?
> >
> > Okay a bit tired and headache coming on but I'll try, maybe jbarnes
> > can help out,
> >
> > We need to provide mappings to userspace that are backed by memory
> > that can move around behind the mappings.
> >
> > So userspace wants a mapping for a GEM object via the AGP/GTT aperture
> > instead of directly to the backing pages.
> > Now as the GEM object is backed by shmem we can't use the shmem file
> > descriptor we have to tie the mapping to without
> > hacking up the shmem mmap functionality which seemed like a bad plan.
> >
> > So GEM uses the device inode to setup the mappings on. We just use a
> > simple linear allocator to split up the device inodes address space
> > and assign chunks to handles for different objects. The userspace app
> > then uses the handle via mmap to get access to the VMAs. Now when GEM
> > wants to move that object out of the GTT or to another area of the GTT
> > we need some way to invalidate it, so we use unmap_mapping_range
> > which destroys all the mappings for the object in all the VMA for all
> > the processes mapping it currently
> >
> > GEM's read_mapping_page is distinct from this and is to do with the
> > shmem interfaceing.
> >
> > Not sure if this explains it or just make it worse.
>
> Sounds right to me. The offsets are just handles, not real file objects or
> backing store addresses. We use them to take advantage of all the inode
> address mapping helpers, since they track stuff for us.
>
> That said, unmap_mapping_range may not be the best way to do this; basically
> we need a way to invalidate a given processes' mapping of a GTT range (which
> in turn is backed by real RAM). If there's some other way we should be doing
> this I'm all ears.

Well, we'd need to call in the big guns on this one - I've already
stirred Hugh ;)

unmap_mapping_range() is basically a truncate thing - it shoots down
all mappings of a range of a *file*. Across all processes in the
machine which map that file.

If that isn't what you want to do (and it sounds that way) then you'd
want to use something which is mm_struct (or vma) centric, rather than
file-centric. zap_page_range(), methinks.

2009-01-30 08:43:30

by Sami Kerola

Subject: Re: PROBLEM: kernel BUG at drivers/gpu/drm/drm_fops.c:146!

On Fri, Jan 30, 2009 at 05:44, Andrew Morton <[email protected]> wrote:
> On Thu, 29 Jan 2009 19:50:17 -0800 Jesse Barnes <[email protected]> wrote:
>
>> On Thursday, January 29, 2009 5:43 pm Dave Airlie wrote:
>> > On Fri, Jan 30, 2009 at 11:20 AM, Andrew Morton
>> > > hm, I'm a bit surprised to see the drm code using `struct
>> > > address_space' and read_mapping_page() and unmap_mapping_range() and
>> > > such. I thought those only worked with regular files and pagecache :)
>> > >
>> > > Is it possible to briefly explain what's going on there?
>> > >
>> > > What instance of address_space_operations does ->dev_mapping actually
>> > > point at?
>> >
>> > Okay a bit tired and headache coming on but I'll try, maybe jbarnes
>> > can help out,
>> >
>> > We need to provide mappings to userspace that are backed by memory
>> > that can move around behind the mappings.
>> >
>> > So userspace wants a mapping for a GEM object via the AGP/GTT aperture
>> > instead of directly to the backing pages.
>> > Now as the GEM object is backed by shmem we can't use the shmem file
>> > descriptor we have to tie the mapping to without
>> > hacking up the shmem mmap functionality which seemed like a bad plan.
>> >
>> > So GEM uses the device inode to setup the mappings on. We just use a
>> > simple linear allocator to split up the device inodes address space
>> > and assign chunks to handles for different objects. The userspace app
>> > then uses the handle via mmap to get access to the VMAs. Now when GEM
>> > wants to move that object out of the GTT or to another area of the GTT
>> > we need some way to invalidate it, so we use unmap_mapping_range
>> > which destroys all the mappings for the object in all the VMA for all
>> > the processes mapping it currently
>> >
>> > GEM's read_mapping_page is distinct from this and is to do with the
>> > shmem interfaceing.
>> >
>> > Not sure if this explains it or just make it worse.
>>
>> Sounds right to me. The offsets are just handles, not real file objects or
>> backing store addresses. We use them to take advantage of all the inode
>> address mapping helpers, since they track stuff for us.
>>
>> That said, unmap_mapping_range may not be the best way to do this; basically
>> we need a way to invalidate a given processes' mapping of a GTT range (which
>> in turn is backed by real RAM). If there's some other way we should be doing
>> this I'm all ears.
>
> Well, we'd need to call in the big guns on this one - I've already
> stirred Hugh ;)
>
> unmap_mapping_range() is basically a truncate thing - it shoots down
> all mappings of a range of a *file*. Across all processes in the
> machine which map that file.
>
> If that isn't what you want to do (and it sounds that way) then you'd
> want to use something which is mm_struct (or vma) centric, rather than
> file-centric. zap_page_range(), methinks.

I have never seen this happen with 2.6.28. The last kernel that works
without freezing is 2.6.29-rc1-00224. The next one I compiled was
2.6.29-rc1-00534 and it did not work; I assumed PEBKAC. When the next
one I tried, 2.6.29-rc2-00013, did the same, I sent a bug report.

I don't know whether the following has anything to do with the issue,
but extra information has never hurt anyone.

1. Kernels which have the problem print 'acpiphp_ibm: ibm_acpiphp_init:
acpi_walknamespace failed' before LUKS is started.

2. One out of three reboots fails because LUKS is unable to find partitions.

--
Sami Kerola
http://www.iki.fi/kerolasa/

2009-01-30 09:22:40

by Andrew Morton

Subject: Re: PROBLEM: kernel BUG at drivers/gpu/drm/drm_fops.c:146!

On Fri, 30 Jan 2009 10:13:55 +0100 Thomas Hellström <[email protected]> wrote:

> >> Sounds right to me. The offsets are just handles, not real file objects or
> >> backing store addresses. We use them to take advantage of all the inode
> >> address mapping helpers, since they track stuff for us.
> >>
> >> That said, unmap_mapping_range may not be the best way to do this; basically
> >> we need a way to invalidate a given processes' mapping of a GTT range (which
> >> in turn is backed by real RAM). If there's some other way we should be doing
> >> this I'm all ears.
> >>
> >
> > Well, we'd need to call in the big guns on this one - I've already
> > stirred Hugh ;)
> >
> > unmap_mapping_range() is basically a truncate thing - it shoots down
> > all mappings of a range of a *file*. Across all processes in the
> > machine which map that file.
> >
> > If that isn't what you want to do (and it sounds that way) then you'd
> > want to use something which is mm_struct (or vma) centric, rather than
> > file-centric. zap_page_range(), methinks.
> >
> >
> I guess I was the one starting to use this function, so some explanation:
>
> When the drm device is used to provide address space for buffers,
> user-space actually see it as a file with a distinct offset where
> buffers are laid out in a linear fashion, To access a certain buffer you
> need to lseek() to the correct offset and then read() write() or, the
> more common use, mmap / munmap.
>
> When looking through its implementation, unmap_mapping_range() seemed to
> do exactly the thing I wanted, namely to kill all user-space mappings of
> all vmas of all processes mapping a part of the device address space.

That's different from what Jesse said. That _is_ a more appropriate
use of unmap_mapping_range(). Although all the futzing that function
does with truncate_count is now looking inappropriately-placed.

> And it saves us from storing a list of all vmas mapping the device
> within the drm device.
>
> What makes usage of unmap_mapping_range() on a device node with a well
> defined offset-to-data mapping different from using it on a file?

umm, nothing I guess, if the driver sufficiently imitates a regular
file. It's unexpected (by me). I don't think we wrote that code with
this application in mind ;)


2009-01-30 09:31:16

by Thomas Hellstrom

Subject: Re: PROBLEM: kernel BUG at drivers/gpu/drm/drm_fops.c:146!

Andrew Morton wrote:
> On Thu, 29 Jan 2009 19:50:17 -0800 Jesse Barnes <[email protected]> wrote:
>
>
>> On Thursday, January 29, 2009 5:43 pm Dave Airlie wrote:
>>
>>> On Fri, Jan 30, 2009 at 11:20 AM, Andrew Morton
>>>
>>>> hm, I'm a bit surprised to see the drm code using `struct
>>>> address_space' and read_mapping_page() and unmap_mapping_range() and
>>>> such. I thought those only worked with regular files and pagecache :)
>>>>
>>>> Is it possible to briefly explain what's going on there?
>>>>
>>>> What instance of address_space_operations does ->dev_mapping actually
>>>> point at?
>>>>
>>> Okay a bit tired and headache coming on but I'll try, maybe jbarnes
>>> can help out,
>>>
>>> We need to provide mappings to userspace that are backed by memory
>>> that can move around behind the mappings.
>>>
>>> So userspace wants a mapping for a GEM object via the AGP/GTT aperture
>>> instead of directly to the backing pages.
>>> Now as the GEM object is backed by shmem we can't use the shmem file
>>> descriptor we have to tie the mapping to without
>>> hacking up the shmem mmap functionality which seemed like a bad plan.
>>>
>>> So GEM uses the device inode to setup the mappings on. We just use a
>>> simple linear allocator to split up the device inodes address space
>>> and assign chunks to handles for different objects. The userspace app
>>> then uses the handle via mmap to get access to the VMAs. Now when GEM
>>> wants to move that object out of the GTT or to another area of the GTT
>>> we need some way to invalidate it, so we use unmap_mapping_range
>>> which destroys all the mappings for the object in all the VMA for all
>>> the processes mapping it currently
>>>
>>> GEM's read_mapping_page is distinct from this and is to do with the
>>> shmem interfaceing.
>>>
>>> Not sure if this explains it or just make it worse.
>>>
>> Sounds right to me. The offsets are just handles, not real file objects or
>> backing store addresses. We use them to take advantage of all the inode
>> address mapping helpers, since they track stuff for us.
>>
>> That said, unmap_mapping_range may not be the best way to do this; basically
>> we need a way to invalidate a given processes' mapping of a GTT range (which
>> in turn is backed by real RAM). If there's some other way we should be doing
>> this I'm all ears.
>>
>
> Well, we'd need to call in the big guns on this one - I've already
> stirred Hugh ;)
>
> unmap_mapping_range() is basically a truncate thing - it shoots down
> all mappings of a range of a *file*. Across all processes in the
> machine which map that file.
>
> If that isn't what you want to do (and it sounds that way) then you'd
> want to use something which is mm_struct (or vma) centric, rather than
> file-centric. zap_page_range(), methinks.
>
>
I guess I was the one starting to use this function, so some explanation:

When the drm device is used to provide address space for buffers,
user-space actually sees it as a file with a distinct offset where
buffers are laid out in a linear fashion. To access a certain buffer you
need to lseek() to the correct offset and then read() or write() or, in
the more common case, mmap() / munmap().
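
A small Python sketch of the "offset selects the buffer" idea follows. As
an assumption for illustration, an ordinary temporary file stands in for
the drm device node, and two granularity-sized regions filled with "A"
and "B" stand in for two buffers. With mmap the offset is passed directly
to the call rather than set with lseek().

```python
import mmap
import tempfile

# mmap offsets must be a multiple of this value.
GRAN = mmap.ALLOCATIONGRANULARITY

with tempfile.TemporaryFile() as f:
    # Two "buffers" laid out back to back in the file's offset space.
    f.write(b"A" * GRAN + b"B" * GRAN)
    f.flush()

    # Map only the second buffer by passing its offset to mmap.
    with mmap.mmap(f.fileno(), GRAN, access=mmap.ACCESS_READ, offset=GRAN) as m:
        first_byte = m[:1]
        mapped_len = len(m)

print(first_byte, mapped_len == GRAN)
```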

When looking through its implementation, unmap_mapping_range() seemed to
do exactly the thing I wanted, namely to kill all user-space mappings of
all vmas of all processes mapping a part of the device address space.
And it saves us from storing a list of all vmas mapping the device
within the drm device.

What makes usage of unmap_mapping_range() on a device node with a well
defined offset-to-data mapping different from using it on a file?

/Thomas

2009-01-30 10:44:19

by Thomas Hellstrom

Subject: Re: PROBLEM: kernel BUG at drivers/gpu/drm/drm_fops.c:146!

Andrew Morton wrote:
> On Fri, 30 Jan 2009 10:13:55 +0100 Thomas Hellström <[email protected]> wrote:
>
>
>>>> Sounds right to me. The offsets are just handles, not real file objects or
>>>> backing store addresses. We use them to take advantage of all the inode
>>>> address mapping helpers, since they track stuff for us.
>>>>
>>>> That said, unmap_mapping_range may not be the best way to do this; basically
>>>> we need a way to invalidate a given processes' mapping of a GTT range (which
>>>> in turn is backed by real RAM). If there's some other way we should be doing
>>>> this I'm all ears.
>>>>
>>>>
>>> Well, we'd need to call in the big guns on this one - I've already
>>> stirred Hugh ;)
>>>
>>> unmap_mapping_range() is basically a truncate thing - it shoots down
>>> all mappings of a range of a *file*. Across all processes in the
>>> machine which map that file.
>>>
>>> If that isn't what you want to do (and it sounds that way) then you'd
>>> want to use something which is mm_struct (or vma) centric, rather than
>>> file-centric. zap_page_range(), methinks.
>>>
>>>
>>>
>> I guess I was the one starting to use this function, so some explanation:
>>
>> When the drm device is used to provide address space for buffers,
>> user-space actually see it as a file with a distinct offset where
>> buffers are laid out in a linear fashion, To access a certain buffer you
>> need to lseek() to the correct offset and then read() write() or, the
>> more common use, mmap / munmap.
>>
>> When looking through its implementation, unmap_mapping_range() seemed to
>> do exactly the thing I wanted, namely to kill all user-space mappings of
>> all vmas of all processes mapping a part of the device address space.
>>
>
> That's different from what Jesse said. That _is_ a more appropriate
> use of unmap_mapping_range(). Although all the futzing that function
> does with truncate_count is now looking inappropriately-placed.
>
>
Hmm, yes. I guess that was to fix a race with old do_nopage()?
Since GEM and similar managers are using vm_insert_pfn, I guess that's
not really needed.
>> And it saves us from storing a list of all vmas mapping the device
>> within the drm device.
>>
>> What makes usage of unmap_mapping_range() on a device node with a well
>> defined offset-to-data mapping different from using it on a file?
>>
>
> umm, nothing I guess, if the driver sufficiently imitates a regular
> file. It's unexpected (by me). I don't think we wrote that code with
> this application in mind ;)
>
>
>
No. It's a little odd ;), but we had a thorough discussion at that time
(some two years ago) on LKML.

What strikes me now, though, is that if the struct address_space *
differs between device nodes pointing to the same underlying device, we
might be in trouble (is that what started this thread in the first
place?), and we might have to resort to keeping track of all VMAs anyway...

/Thomas
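
A rough userspace sketch of the "offset as buffer handle" idea described
in the message above, in case it helps readers follow the thread. It is
not DRM code: a temporary file stands in for the device node, and the
names make_fake_device(), map_buffer() and BUF_SIZE are hypothetical.
Each buffer is assumed to occupy one mmap-granularity-sized slot, so
buffer N lives at offset N * BUF_SIZE.

```python
import mmap
import os
import tempfile

# Assumption: one buffer per allocation-granularity-sized slot, because
# mmap offsets must be a multiple of mmap.ALLOCATIONGRANULARITY.
BUF_SIZE = mmap.ALLOCATIONGRANULARITY


def make_fake_device(num_buffers):
    """Create a temp file standing in for the device node (hypothetical).

    Buffer i is filled with a distinct byte ('A', 'B', ...) so that a
    mapping at a given offset can be told apart from its neighbours.
    """
    fd, path = tempfile.mkstemp()
    for i in range(num_buffers):
        os.write(fd, bytes([ord("A") + i]) * BUF_SIZE)
    return fd, path


def map_buffer(fd, index):
    """Map buffer `index`, using its file offset as the handle."""
    return mmap.mmap(fd, BUF_SIZE, offset=index * BUF_SIZE)
```

Calling map_buffer(fd, 1) on a file built by make_fake_device(3) yields a
mapping whose contents come only from the second slot. In the real driver
the offsets are handles rather than backing-store addresses, which is the
part this sketch does not model.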


2009-01-30 16:55:36

by Jesse Barnes

Subject: Re: PROBLEM: kernel BUG at drivers/gpu/drm/drm_fops.c:146!

On Friday, January 30, 2009 1:21 am Andrew Morton wrote:
> On Fri, 30 Jan 2009 10:13:55 +0100 Thomas Hellström <[email protected]> wrote:
> > >> Sounds right to me. The offsets are just handles, not real file
> > >> objects or backing store addresses. We use them to take advantage of
> > >> all the inode address mapping helpers, since they track stuff for us.
> > >>
> > >> That said, unmap_mapping_range may not be the best way to do this;
> > >> basically we need a way to invalidate a given process's mapping of a
> > >> GTT range (which in turn is backed by real RAM). If there's some
> > >> other way we should be doing this I'm all ears.
> > >
> > > Well, we'd need to call in the big guns on this one - I've already
> > > stirred Hugh ;)
> > >
> > > unmap_mapping_range() is basically a truncate thing - it shoots down
> > > all mappings of a range of a *file*. Across all processes in the
> > > machine which map that file.
> > >
> > > If that isn't what you want to do (and it sounds that way) then you'd
> > > want to use something which is mm_struct (or vma) centric, rather than
> > > file-centric. zap_page_range(), methinks.
> >
> > I guess I was the one starting to use this function, so some explanation:
> >
> > When the drm device is used to provide address space for buffers,
> > user-space actually sees it as a file with a distinct offset where
> > buffers are laid out in a linear fashion. To access a certain buffer you
> > need to lseek() to the correct offset and then read(), write() or, the
> > more common use, mmap / munmap.
> >
> > When looking through its implementation, unmap_mapping_range() seemed to
> > do exactly the thing I wanted, namely to kill all user-space mappings of
> > all vmas of all processes mapping a part of the device address space.
>
> That's different from what Jesse said. That _is_ a more appropriate
> use of unmap_mapping_range(). Although all the futzing that function
> does with truncate_count is now looking inappropriately-placed.

Yeah, I misspoke: we do need to blow away *all* the mappings, not just the ones
for a given process (since the backing GTT mapping is gone/moved). We could
probably use zap_page_range(), but might have to do a bit more work in the
driver if we did.

--
Jesse Barnes, Intel Open Source Technology Center
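
For readers following along, here is a toy userspace model of the
distinction drawn in this exchange: a file-centric shootdown (what
unmap_mapping_range() is described as doing) versus a process-centric
one (the zap_page_range() direction). It is only a sketch under stated
assumptions and does not use or imitate the kernel APIs: Mapping,
unmap_file_range() and zap_process_range() are hypothetical names, a
pid stands in for an mm_struct, a string stands in for an
address_space, and a mapping that overlaps the range is invalidated
whole rather than split.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    pid: int        # owning process (stands in for an mm_struct)
    file_id: str    # mapped object (stands in for an address_space)
    start: int      # first mapped offset within the file
    length: int     # mapped length in bytes
    valid: bool = True


def overlaps(m, start, length):
    """True if mapping m intersects [start, start + length)."""
    return m.start < start + length and start < m.start + m.length


def unmap_file_range(mappings, file_id, start, length):
    """File-centric: invalidate every process's mapping of the range."""
    hit = 0
    for m in mappings:
        if m.valid and m.file_id == file_id and overlaps(m, start, length):
            m.valid = False
            hit += 1
    return hit


def zap_process_range(mappings, pid, file_id, start, length):
    """Process-centric: invalidate only one process's mapping of the range."""
    hit = 0
    for m in mappings:
        if (m.valid and m.pid == pid and m.file_id == file_id
                and overlaps(m, start, length)):
            m.valid = False
            hit += 1
    return hit
```

With two processes mapping the same range, unmap_file_range() clears
both mappings in one call without the caller keeping a per-device list
of who mapped what, while zap_process_range() would have to be called
once per process. That bookkeeping is the "bit more work in the driver"
mentioned above.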