2011-02-27 09:11:11

by Paolo Ornati

Subject: [2.6.38-rc6] G965: i915 Hangcheck timer elapsed... GPU hung (not reproducible)

Today I got this while starting a video in SMPlayer (MPlayer) with
2.6.38-rc6-00113-g4662db4:

[ 830.880014] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 830.880736] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 174895 at 174857, next 174896)
[ 830.881093] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
[ 831.379079] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
[ 831.399099] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
...
[ 837.392012] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 837.392038] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 175016 at 174857, next 175022)
[ 837.392491] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
[ 837.537479] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
[ 837.543285] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
...
[ 839.040011] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 839.040034] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 175022 at 174857, next 175122)
[ 839.040364] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[ 839.040367] [drm:i915_reset] *ERROR* Failed to reset chip.

The screen was almost frozen: the cursor was stuck, but the machine was
alive and I was able to use SysRq to kill X and try to restart it (but
that didn't help).
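
A note on reading the excerpt above: errno -11 is EAGAIN on Linux, and the
sequence number after "at" stays at 174857 in all three wait-request lines
while the awaited number keeps growing, which is consistent with the render
ring making no progress between resets. The following is a minimal Python
sketch that pulls those numbers out of dmesg text. The regular expression and
the names parse_wait_requests and ring_is_stuck are illustrative assumptions,
not part of any kernel or i915 tooling.

```python
import re

# Matches i915_do_wait_request error lines as they appear in the dmesg
# excerpt above. The pattern is an assumption derived from those lines only.
WAIT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:i915_do_wait_request\].*"
    r"returns (?P<ret>-?\d+) "
    r"\(awaiting (?P<awaiting>\d+) at (?P<current>\d+), next (?P<next>\d+)\)"
)


def parse_wait_requests(lines):
    """Return one dict per matching line; non-matching lines are skipped."""
    records = []
    for line in lines:
        m = WAIT_RE.search(line)
        if m:
            records.append({
                "ts": float(m.group("ts")),
                "ret": int(m.group("ret")),
                "awaiting": int(m.group("awaiting")),
                "current": int(m.group("current")),
                "next": int(m.group("next")),
            })
    return records


def ring_is_stuck(records):
    """True if there are at least two records and 'current' never changes."""
    return len(records) >= 2 and len({r["current"] for r in records}) == 1


# The three wait-request lines from the report, copied verbatim.
SAMPLE = [
    "[ 830.880736] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 174895 at 174857, next 174896)",
    "[ 837.392038] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 175016 at 174857, next 175022)",
    "[ 839.040034] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 175022 at 174857, next 175122)",
]

records = parse_wait_requests(SAMPLE)
for r in records:
    # Gap between the awaited seqno and the last one reported as reached.
    print(r["ts"], r["ret"], r["awaiting"] - r["current"])
print("stuck:", ring_is_stuck(records))
```

Feeding it the full dmesg attachment instead of SAMPLE should work the same
way, as long as the message format matches the lines quoted here.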

I don't remember anything similar in recent kernels (<= 2.6.37) and got
this only once with 2.6.38-rcX.

Environment at the time of the GPU crash:
KDE4 (without "Desktop Effects")
Chromium
Claws Mail
Dolphin
ccached make -j3 on a just pulled linux-tree (so I/O bound)
SMPlayer/MPlayer (just launched)

Assorted logs attached.

Bye,

--
Paolo Ornati
Linux 2.6.38-rc6-00113-g4662db4 on x86_64


Attachments:
(No filename) (2.10 kB)
config (57.94 kB)
dmesg (124.29 kB)
lspci (2.03 kB)
normal-dmesg (44.98 kB)
Xorg.log (17.77 kB)

2011-03-02 23:57:30

by Andrew Morton

Subject: Re: [2.6.38-rc6] G965: i915 Hangcheck timer elapsed... GPU hung (not reproducible)


(cc dri-devel)

A post-2.6.37 regression.

On Sun, 27 Feb 2011 10:10:41 +0100
Paolo Ornati <[email protected]> wrote:

> Today I got this while starting a video in SMplayer (MPlayer) with
> 2.6.38-rc6-00113-g4662db4:
>
> [ 830.880014] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
> [ 830.880736] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 174895 at 174857, next 174896)
> [ 830.881093] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
> [ 831.379079] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
> [ 831.399099] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
> ...
> [ 837.392012] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
> [ 837.392038] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 175016 at 174857, next 175022)
> [ 837.392491] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
> [ 837.537479] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
> [ 837.543285] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
> ...
> [ 839.040011] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
> [ 839.040034] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 175022 at 174857, next 175122)
> [ 839.040364] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
> [ 839.040367] [drm:i915_reset] *ERROR* Failed to reset chip.
>
> The screen was almost frozen: the cursor was stuck, but the machine was
> alive and I was able to use SysRq to kill X and try to restart it (but
> that didn't help).
>
> I don't remember anything similar in recent kernels (<= 2.6.37) and got
> this only once with 2.6.38-rcX.
>
> Environment at the time of the GPU crash:
> KDE4 (without "Desktop Effects")
> Chromium
> Claws Mail
> Dolphin
> ccached make -j3 on a just pulled linux-tree (so I/O bound)
> SMPlayer/Mplayer (just launched)
>
> Assorted logs attached.
>

2011-03-03 00:43:41

by Nick Bowler

Subject: Re: [2.6.38-rc6] G965: i915 Hangcheck timer elapsed... GPU hung (not reproducible)

On Sun, 27 Feb 2011 10:10:41 +0100 Paolo Ornati <[email protected]> wrote:
> Today I got this while starting a video in SMplayer (MPlayer) with
> 2.6.38-rc6-00113-g4662db4:

> > [ 830.880014] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
> > [ 830.880736] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 174895 at 174857, next 174896)
> > [ 830.881093] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
> > [ 831.379079] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
> > [ 831.399099] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling

I was experiencing intermittent hangs when starting mplayer earlier in
this release cycle (on both a desktop with a G45 and a laptop with a
GM45), but I haven't encountered them in quite a while. I don't know if
they looked exactly like the above since all the hangs have been rotated
out of my logs :(. I ended up concluding that it was actually a
regression in xf86-video-intel rather than the kernel (but no real way
of testing this), since there was a lot of Xv related churn in the
driver around the time I was having the issues.

So you might want to try again with the latest git xf86-video-intel and
see if it still happens.

--
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)

2011-03-03 20:02:27

by Paolo Ornati

Subject: Re: [2.6.38-rc6] G965: i915 Hangcheck timer elapsed... GPU hung (not reproducible)

On Wed, 2 Mar 2011 19:43:26 -0500
Nick Bowler <[email protected]> wrote:

> So you might want to try again with the latest git xf86-video-intel and
> see if it still happens.

Whether or not it is an xf86-video-intel bug, the kernel should be able
to reset the GPU, I think... (I'm using KMS).

Anyway, I've been using 2.6.38-rcX on this PC for a month, and it
happened only once. This is the complete list of 2.6.38-based kernels
I've used and when (login time):

Sun Jan 30 09:00:04 CET 2011 -- Linux tux 2.6.38-rc2-00274-g1f0324c
Sun Jan 30 15:52:14 CET 2011 -- Linux tux 2.6.38-rc2-00274-g1f0324c
Mon Jan 31 20:01:29 CET 2011 -- Linux tux 2.6.38-rc2-00274-g1f0324c
Tue Feb 1 19:31:05 CET 2011 -- Linux tux 2.6.38-rc2-00274-g1f0324c
Wed Feb 2 19:39:28 CET 2011 -- Linux tux 2.6.38-rc3
Thu Feb 3 19:22:14 CET 2011 -- Linux tux 2.6.38-rc3
Fri Feb 4 20:14:07 CET 2011 -- Linux tux 2.6.38-rc3
Sat Feb 5 08:09:57 CET 2011 -- Linux tux 2.6.38-rc3
Sun Feb 6 09:28:15 CET 2011 -- Linux tux 2.6.38-rc3
Sun Feb 6 13:09:55 CET 2011 -- Linux tux 2.6.38-rc3
Mon Feb 7 19:31:43 CET 2011 -- Linux tux 2.6.38-rc3
Tue Feb 8 19:56:27 CET 2011 -- Linux tux 2.6.38-rc3
Wed Feb 9 20:09:37 CET 2011 -- Linux tux 2.6.38-rc4
Fri Feb 11 20:01:20 CET 2011 -- Linux tux 2.6.38-rc4
Sat Feb 12 09:16:32 CET 2011 -- Linux tux 2.6.38-rc4
Sat Feb 12 12:28:23 CET 2011 -- Linux tux 2.6.38-rc4
Sat Feb 12 14:39:15 CET 2011 -- Linux tux 2.6.38-rc4
Sat Feb 12 15:05:29 CET 2011 -- Linux tux 2.6.38-rc4
Sat Feb 12 15:20:37 CET 2011 -- Linux tux 2.6.38-rc4
Sat Feb 12 15:23:49 CET 2011 -- Linux tux 2.6.38-rc4
Sat Feb 12 20:44:18 CET 2011 -- Linux tux 2.6.38-rc4
Sun Feb 13 08:40:09 CET 2011 -- Linux tux 2.6.38-rc4
Sun Feb 13 13:11:28 CET 2011 -- Linux tux 2.6.38-rc4
Mon Feb 14 20:20:12 CET 2011 -- Linux tux 2.6.38-rc4
Tue Feb 15 20:28:22 CET 2011 -- Linux tux 2.6.38-rc4
Sat Feb 19 08:55:51 CET 2011 -- Linux tux 2.6.38-rc4
Sat Feb 19 09:56:02 CET 2011 -- Linux tux 2.6.38-rc5-00100-g0cc9d52
Sat Feb 19 13:50:37 CET 2011 -- Linux tux 2.6.38-rc5-00100-g0cc9d52
Sat Feb 19 18:05:28 CET 2011 -- Linux tux 2.6.38-rc5-00100-g0cc9d52
Sat Feb 19 20:12:05 CET 2011 -- Linux tux 2.6.38-rc5-00100-g0cc9d52
Sun Feb 20 08:57:19 CET 2011 -- Linux tux 2.6.38-rc5-00100-g0cc9d52
Sun Feb 20 18:05:43 CET 2011 -- Linux tux 2.6.38-rc5-00100-g0cc9d52
Mon Feb 21 20:22:51 CET 2011 -- Linux tux 2.6.38-rc5-00100-g0cc9d52
Tue Feb 22 20:02:13 CET 2011 -- Linux tux 2.6.38-rc5-00100-g0cc9d52
Wed Feb 23 20:06:45 CET 2011 -- Linux tux 2.6.38-rc6-00020-gd8204a3
Thu Feb 24 20:15:23 CET 2011 -- Linux tux 2.6.38-rc6-00020-gd8204a3
Fri Feb 25 20:04:37 CET 2011 -- Linux tux 2.6.38-rc6-00020-gd8204a3
Sat Feb 26 09:18:44 CET 2011 -- Linux tux 2.6.38-rc6-00113-g4662db4
Sat Feb 26 19:45:28 CET 2011 -- Linux tux 2.6.38-rc6-00113-g4662db4
Sun Feb 27 09:12:54 CET 2011 -- Linux tux 2.6.38-rc6-00113-g4662db4 <--- GPU hung ;)
Sun Feb 27 09:32:17 CET 2011 -- Linux tux 2.6.38-rc6-00113-g4662db4
Sun Feb 27 15:32:46 CET 2011 -- Linux tux 2.6.38-rc6-00113-g4662db4
Sun Feb 27 17:38:17 CET 2011 -- Linux tux 2.6.38-rc6-00113-g4662db4
Mon Feb 28 21:00:49 CET 2011 -- Linux tux 2.6.38-rc6-00113-g4662db4
Tue Mar 1 20:25:01 CET 2011 -- Linux tux 2.6.38-rc6-00113-g4662db4
Thu Mar 3 20:27:57 CET 2011 -- Linux tux 2.6.38-rc6-00212-g3e1f235

I'll see if it happens again...

--
Paolo Ornati
Linux 2.6.38-rc6-00212-g3e1f235 on x86_64

2011-03-14 19:21:25

by Paolo Ornati

Subject: Re: [2.6.38-rc6] G965: i915 Hangcheck timer elapsed... GPU hung (not reproducible)

On Thu, 3 Mar 2011 21:01:52 +0100
Paolo Ornati <[email protected]> wrote:

> I'll see if it happens again...

It happened again today (two weeks later), once more while starting a
video in SMPlayer/MPlayer.

kernel: 2.6.38-rc8

dmesg
-------
[ 2512.924027] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 2512.924719] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 261386 at 261385, next 261414)
[ 2512.924945] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
[ 2513.060468] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
...

Xorg.log
-----------
[ 2512.240] [mi] EQ overflowing. The server is probably stuck in an infinite loop.
[ 2512.240]
Backtrace:
[ 2512.254] 0: /usr/bin/X (xorg_backtrace+0x28) [0x4a1a78]
[ 2512.254] 1: /usr/bin/X (mieqEnqueue+0x1f4) [0x4a1414]
[ 2512.254] 2: /usr/bin/X (xf86PostMotionEventP+0xc4) [0x47e374]
[ 2512.254] 3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f0354270000+0x546c) [0x7f035427546c]
[ 2512.254] 4: /usr/bin/X (0x400000+0x6c407) [0x46c407]
[ 2512.254] 5: /usr/bin/X (0x400000+0x11a933) [0x51a933]
[ 2512.254] 6: /lib/libpthread.so.0 (0x7f0358890000+0xf120) [0x7f035889f120]
[ 2512.254] 7: /lib/libc.so.6 (ioctl+0x7) [0x7f03578c2107]
[ 2512.254] 8: /usr/lib/libdrm.so.2 (drmIoctl+0x28) [0x7f0356290bf8]
[ 2512.254] 9: /usr/lib/libdrm_intel.so.1 (drm_intel_gem_bo_map_gtt+0x7c) [0x7f0355a27f6c]
[ 2512.254] 10: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f0355c2d000+0x151c4) [0x7f0355c421c4]
[ 2512.254] 11: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f0355c2d000+0x159f9) [0x7f0355c429f9]
[ 2512.254] 12: /usr/bin/X (0x400000+0x12d000) [0x52d000]
[ 2512.254] 13: /usr/lib64/xorg/modules/extensions/libextmod.so (0x7f035690a000+0x10114) [0x7f035691a114]
[ 2512.254] 14: /usr/bin/X (0x400000+0x2f1b9) [0x42f1b9]
[ 2512.255] 15: /usr/bin/X (0x400000+0x247fb) [0x4247fb]
[ 2512.255] 16: /lib/libc.so.6 (__libc_start_main+0xfd) [0x7f0357816bbd]
[ 2512.255] 17: /usr/bin/X (0x400000+0x24389) [0x424389]
[ 2517.380] [mi] EQ overflowing. The server is probably stuck in an infinite loop.
[ 2517.380]
Backtrace:
[ 2517.380] 0: /usr/bin/X (xorg_backtrace+0x28) [0x4a1a78]
[ 2517.380] 1: /usr/bin/X (mieqEnqueue+0x1f4) [0x4a1414]
[ 2517.380] 2: /usr/bin/X (xf86PostMotionEventP+0xc4) [0x47e374]
[ 2517.380] 3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f0354270000+0x546c) [0x7f035427546c]
[ 2517.380] 4: /usr/bin/X (0x400000+0x6c407) [0x46c407]
[ 2517.380] 5: /usr/bin/X (0x400000+0x11a933) [0x51a933]
[ 2517.380] 6: /lib/libpthread.so.0 (0x7f0358890000+0xf120) [0x7f035889f120]
[ 2517.380] 7: /lib/libc.so.6 (ioctl+0x7) [0x7f03578c2107]
[ 2517.380] 8: /usr/lib/libdrm.so.2 (drmIoctl+0x28) [0x7f0356290bf8]
[ 2517.380] 9: /usr/lib/libdrm_intel.so.1 (drm_intel_gem_bo_map_gtt+0x7c) [0x7f0355a27f6c]
[ 2517.380] 10: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f0355c2d000+0x129b3) [0x7f0355c3f9b3]
[ 2517.380] 11: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f0355c2d000+0x2a102) [0x7f0355c57102]
[ 2517.380] 12: /usr/bin/X (0x400000+0x157cf0) [0x557cf0]
[ 2517.380] 13: /usr/bin/X (0x400000+0x30adb) [0x430adb]
[ 2517.380] 14: /usr/bin/X (0x400000+0x2f1b9) [0x42f1b9]
[ 2517.380] 15: /usr/bin/X (0x400000+0x247fb) [0x4247fb]
[ 2517.380] 16: /lib/libc.so.6 (__libc_start_main+0xfd) [0x7f0357816bbd]
[ 2517.380] 17: /usr/bin/X (0x400000+0x24389) [0x424389]
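
A note on reading the backtraces above: unresolved frames are printed as
(base+offset) followed by the absolute address in brackets, and the two agree
(for example 0x7f0354270000+0x546c = 0x7f035427546c). In both traces the X
server is inside an ioctl issued from drm_intel_gem_bo_map_gtt when the event
queue overflows. The following is a minimal Python sketch that splits frames
into those parts and checks the arithmetic. The regular expression and the
name parse_frame are illustrative assumptions based only on the lines shown
here.

```python
import re

# Matches one Xorg backtrace frame such as:
#   3: /path/to/module.so (0xBASE+0xOFFSET) [0xADDR]
#   0: /usr/bin/X (symbol+0xOFFSET) [0xADDR]
FRAME_RE = re.compile(
    r"(?P<idx>\d+): (?P<path>\S+) \((?P<loc>[^)]+)\) \[(?P<addr>0x[0-9a-f]+)\]"
)


def parse_frame(line):
    """Return a dict describing the frame, or None if the line is not a frame."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    left, offset = m.group("loc").rsplit("+", 1)
    frame = {
        "index": int(m.group("idx")),
        "path": m.group("path"),
        "addr": int(m.group("addr"), 16),
        "offset": int(offset, 16),
        "base": None,    # set when the frame is printed as base+offset
        "symbol": None,  # set when the frame is printed as symbol+offset
    }
    if left.startswith("0x"):
        frame["base"] = int(left, 16)
    else:
        frame["symbol"] = left
    return frame


# Frames copied verbatim from the first backtrace in the report.
SAMPLE = [
    "[ 2512.254] 0: /usr/bin/X (xorg_backtrace+0x28) [0x4a1a78]",
    "[ 2512.254] 3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f0354270000+0x546c) [0x7f035427546c]",
    "[ 2512.254] 4: /usr/bin/X (0x400000+0x6c407) [0x46c407]",
    "[ 2512.254] 9: /usr/lib/libdrm_intel.so.1 (drm_intel_gem_bo_map_gtt+0x7c) [0x7f0355a27f6c]",
]

frames = [parse_frame(line) for line in SAMPLE]
for f in frames:
    if f["base"] is not None:
        # For unresolved frames, base + offset should equal the address.
        print(f["index"], f["path"], hex(f["offset"]), f["base"] + f["offset"] == f["addr"])
    else:
        print(f["index"], f["path"], f["symbol"])
```

The per-module offsets are the values one would normally hand to a symbol
lookup tool together with a matching build of the module; that step is not
shown, since the exact binaries are not part of this report.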

--
Paolo Ornati
Linux 2.6.38-rc8 on x86_64


Attachments:
(No filename) (3.49 kB)
dmesg.txt (123.76 kB)
Xorg.0.log (28.87 kB)