Today I experienced two GPU hangs on a machine with the 82865G chipset
running the 2.6.38.2 kernel.
In /var/log/syslog I found the following errors:
Apr 15 22:50:51 wzab kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Apr 15 22:50:51 wzab kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 1352947 at 1352942, next 1352948)
Apr 15 22:50:51 wzab kernel: [drm:init_ring_common] *ERROR* failed to set render ring head to zero ctl 00000000 head 3280a54c tail 00000000 start 00000000
Apr 15 22:50:51 wzab kernel: [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f003 head 3280a54c tail 00000000 start 00000000
Apr 15 22:50:52 wzab kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Apr 15 22:50:52 wzab kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 1353049 at 1352942, next 1353050)

After reboot:

Apr 15 22:56:41 wzab kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Apr 15 22:56:41 wzab kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 29134 at 29129, next 29135)
--
Regards,
Wojtek
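The i915_do_wait_request lines can be compared mechanically. Below is a minimal Python sketch, assuming that "awaiting" is the request sequence number being waited for, "at" is the last one the GPU reported as completed and "next" is the next one to be issued (a reading of the message format, not something confirmed in this thread); -11 is -EAGAIN on Linux. Under that assumption, "at" is 1352942 in both errors before the reboot, i.e. the GPU made no visible progress between them while the gap grew. The regex and the WaitError/parse_wait_error names are illustrative only.

```python
import re
from typing import NamedTuple, Optional

# Matches the tail of an i915_do_wait_request error, e.g.
# "returns -11 (awaiting 1352947 at 1352942, next 1352948)"
WAIT_RE = re.compile(r"returns (-?\d+) \(awaiting (\d+) at (\d+), next (\d+)\)")


class WaitError(NamedTuple):
    ret: int         # return code; -11 is -EAGAIN on Linux
    awaiting: int    # seqno being waited for (assumed meaning)
    at: int          # last seqno completed by the GPU (assumed meaning)
    next_seqno: int  # next seqno to be issued (assumed meaning)

    @property
    def lag(self) -> int:
        """Difference between the awaited seqno and the last completed one."""
        return self.awaiting - self.at


def parse_wait_error(line: str) -> Optional[WaitError]:
    """Return the parsed fields, or None if the line is not such an error."""
    m = WAIT_RE.search(line)
    if m is None:
        return None
    return WaitError(*(int(g) for g in m.groups()))


if __name__ == "__main__":
    lines = [
        "i915_do_wait_request returns -11 (awaiting 1352947 at 1352942, next 1352948)",
        "i915_do_wait_request returns -11 (awaiting 1353049 at 1352942, next 1353050)",
    ]
    for line in lines:
        err = parse_wait_error(line)
        print(err.awaiting, err.at, err.lag)
```

Run against the two pre-reboot errors, the sketch reports lags of 5 and 107 with the same "at" value.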
(CC'ing Chris Wilson)
On Fri, Apr 15, 2011 at 11:07:51PM +0200, wzab wrote:
> Today I have experienced two "GPU hungs" on machine with 82865G chipset
> working with 2.6.38.2 kernel
> In the /var/log/syslog I have found the following errors:
>
> [...]
--
Sitsofe | http://sucs.org/~sits/
Hi Wojtek,
Your best chance to see the bugs fixed is to report a bug following the
guidelines at http://intellinuxgraphics.org/how_to_report_bug.html
For GPU hangs, collecting the error state from debugfs (while the GPU is
still hung) is very helpful.
Report the contents of the following file (as well as details of your
software stack and, if known, what kind of action triggered the hang):
/sys/kernel/debug/dri/0/i915_error_state
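Grabbing that file before rebooting can be scripted. Below is a minimal Python sketch, assuming debugfs is mounted at /sys/kernel/debug, the Intel GPU is DRM minor 0 (as in the path above) and the script runs as root; save_error_state and the timestamped output name are hypothetical.

```python
import shutil
import time
from pathlib import Path

# Path quoted in this thread; assumes debugfs is mounted at /sys/kernel/debug
# and that the Intel GPU is DRM minor 0.
ERROR_STATE = Path("/sys/kernel/debug/dri/0/i915_error_state")


def save_error_state(src: Path = ERROR_STATE, dest_dir: Path = Path(".")) -> Path:
    """Copy the i915 error state to a timestamped file and return its path.

    Run as root while the GPU is still hung; the state is lost on reboot.
    """
    if not src.exists():
        raise FileNotFoundError(f"{src} not found; is debugfs mounted?")
    dest = dest_dir / time.strftime("i915_error_state-%Y%m%d-%H%M%S.txt")
    # debugfs files usually report a size of 0, so stream until EOF
    # instead of trusting the size reported by stat().
    with src.open("rb") as fin, dest.open("wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest
```

Calling save_error_state() from an ssh session while the display is frozen would leave a copy in the current directory, ready to attach to the report.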
You should also CC [email protected] and Chris Wilson
<[email protected]> for all issues regarding Intel GPUs, as
that is where they have a better chance of being seen by someone who
can help. (intel-gfx is subscribers-only, but at least part of the
thread would reach it.)
Regards,
Bruno
On Fri, 15 April 2011 wzab <[email protected]> wrote:
> Today I have experienced two "GPU hungs" on machine with 82865G chipset
> working with 2.6.38.2 kernel
> In the /var/log/syslog I have found the following errors:
>
> [...]
On Fri, 15 Apr 2011 23:07:51 +0200, wzab <[email protected]> wrote:
> Today I have experienced two "GPU hungs" on machine with 82865G chipset
> working with 2.6.38.2 kernel
> In the /var/log/syslog I have found the following errors:
>
> Apr 15 22:50:51 wzab kernel: [drm:i915_hangcheck_elapsed] *ERROR*
> Hangcheck timer elapsed... GPU hung
As Bruno said, there is a /sys/kernel/debug/dri/0/i915_error_state file
that contains a GPU dump at the time of the error, which often contains the
vital clue as to what went wrong. If you can also think back to what was
happening on the machine at the time of the hang, that can also help
identify the trigger and the suspect code.
Thanks,
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
wzab wrote:
> Today I have experienced two "GPU hungs" on machine with 82865G chipset
> working with 2.6.38.2 kernel
> In the /var/log/syslog I have found the following errors:
>
> [...]
>
Hi, just for info: I seem to have hit the same error yesterday, but I also
failed to retrieve the information from debugfs. This was also on 2.6.38.2,
with an H55 Express chipset. It seems to happen after a long uptime. I will
be more careful to retrieve debug information next time.
Another sort-of-regression I have noticed since moving to 2.6.38 is a
30-second wait when resuming. However, I have no idea which driver might be
the cause.
Martin
---- from syslog ----
Apr 15 19:16:28 arnold kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Apr 15 19:16:28 arnold kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 50400651 at 50400598, next 50400652)
Apr 15 19:16:30 arnold kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Apr 15 19:16:30 arnold kernel: [drm:init_ring_common] *ERROR* failed to set render ring head to zero ctl 00000000 head 87c0f714 tail 00000000 start 00001000
Apr 15 19:16:30 arnold kernel: [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f003 head 87c0f714 tail 00000000 start 00001000
Apr 15 19:16:58 arnold kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Apr 15 19:16:58 arnold kernel: [drm:init_ring_common] *ERROR* failed to set render ring head to zero ctl 00000000 head 87c0f714 tail 00000000 start 00001000
Apr 15 19:16:58 arnold kernel: [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f003 head 87c0f714 tail 00000000 start 00001000
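The init_ring_common messages print raw ring-buffer register values. Below is a minimal Python sketch for decoding them, assuming the bit layout the i915 driver uses for these older GPUs (HEAD_ADDR = 0x001FFFFC, HEAD_WRAP_COUNT = 0xFFE00000, RING_NR_PAGES = 0x001FF000 holding the ring size minus one 4 KiB page, RING_VALID = bit 0); these masks are an assumption and should be checked against i915_reg.h for the running kernel. decode_ring_regs is a hypothetical helper.

```python
from typing import NamedTuple

# Ring-buffer register masks assumed from the i915 driver's i915_reg.h
# for older (pre-Sandybridge) GPUs; verify against your kernel source.
HEAD_WRAP_COUNT = 0xFFE00000  # bits 31:21, number of times head wrapped
HEAD_ADDR = 0x001FFFFC        # bits 20:2, byte offset of head in the ring
RING_NR_PAGES = 0x001FF000    # ring size minus one 4 KiB page
RING_VALID = 0x00000001       # ring enabled bit
PAGE_SIZE = 4096


class RingState(NamedTuple):
    head_offset: int
    head_wraps: int
    ring_size: int
    valid: bool


def decode_ring_regs(ctl: int, head: int) -> RingState:
    """Decode the ctl/head values printed by init_ring_common."""
    return RingState(
        head_offset=head & HEAD_ADDR,
        head_wraps=(head & HEAD_WRAP_COUNT) >> 21,
        ring_size=(ctl & RING_NR_PAGES) + PAGE_SIZE,
        valid=bool(ctl & RING_VALID),
    )


if __name__ == "__main__":
    # ctl/head pairs from the "render ring initialization failed" lines
    for host, ctl, head in [("wzab", 0x0001F003, 0x3280A54C),
                            ("arnold", 0x0001F003, 0x87C0F714)]:
        s = decode_ring_regs(ctl, head)
        print(host, hex(s.head_offset), s.head_wraps, s.ring_size, s.valid)
```

Under these assumed masks, both reports describe a valid 128 KiB ring (ring_size 131072) whose head still reads back at a non-zero offset (0xa54c on wzab, 0xf714 on arnold), which is consistent with the "failed to set render ring head to zero" message.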
Oops, it seems that my previous message with uncompressed logs was a
little too big.
I'm resending it with compressed logs, even though they will probably be
discarded in the archive as "unhandled content".
WZab
On 16.04.2011 08:07, Chris Wilson wrote:
> On Fri, 15 Apr 2011 23:07:51 +0200, wzab<[email protected]> wrote:
>> Today I have experienced two "GPU hungs" on machine with 82865G chipset
>> working with 2.6.38.2 kernel
>> In the /var/log/syslog I have found the following errors:
>>
>> Apr 15 22:50:51 wzab kernel: [drm:i915_hangcheck_elapsed] *ERROR*
>> Hangcheck timer elapsed... GPU hung
>
> As Bruno said there is a /sys/kernel/debug/dri/0/i915_error_state file
> that contains a GPU dump at the time of the error which often contains the
> vital clue as to what went wrong. If you can also think back to what was
> happening on the machine at the time of the hang, that can also help
> identify the trigger and the suspect code.
>
> Thanks,
> -Chris
>
Hi,
After I switched on debugging (booted with drm.debug=0x06 and mounted
debugfs with "sudo mount -t debugfs debugfs /sys/kernel/debug"), the error
seemed to occur less often.
However, after 2 hours of work it happened again.
Nothing specific was being done on the machine at the time.
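The drm.debug=0x06 boot parameter is a bitmask of message categories. Below is a small Python sketch for decoding it, assuming the three categories defined in 2.6.38-era drmP.h (DRM_UT_CORE = 0x01, DRM_UT_DRIVER = 0x02, DRM_UT_KMS = 0x04); later kernels added further bits. decode_drm_debug is a hypothetical helper.

```python
from typing import List

# drm.debug category bits as defined in 2.6.38-era include/drm/drmP.h
# (assumption; later kernels added more categories).
DRM_DEBUG_BITS = {
    0x01: "CORE",    # DRM_UT_CORE: generic DRM core messages
    0x02: "DRIVER",  # DRM_UT_DRIVER: driver-specific messages (i915 here)
    0x04: "KMS",     # DRM_UT_KMS: modesetting messages
}


def decode_drm_debug(mask: int) -> List[str]:
    """Return the category names enabled by a drm.debug bitmask."""
    names = [name for bit, name in DRM_DEBUG_BITS.items() if mask & bit]
    # Any bit outside the known categories is reported as-is
    unknown = mask & ~sum(DRM_DEBUG_BITS)
    if unknown:
        names.append(f"unknown(0x{unknown:x})")
    return names
```

With this table, decode_drm_debug(0x06) returns ["DRIVER", "KMS"], i.e. driver and modesetting messages without the DRM core ones.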
OK, this time I had the Iceweasel window in the background and was
running an application under the Wine emulator, but previously the same
problem occurred when I had only Iceweasel and two gnome-terminals open.
The problem may be associated with switching the active window in X or
the active tab in Iceweasel (i.e. with activity that causes a large change
in the displayed content).
I attach the i915_error_state output (i915_error_state.txt) and the X
server log.
Sorry for the big uncompressed files, but when I compressed them
previously, I saw that they were ignored by the archive website as
"unhandled content".
Below is the information requested at
http://intellinuxgraphics.org/how_to_report_bug.html
output of "uname -m": i686
output of "uname -a":
Linux wzab 2.6.38.2 #1 SMP PREEMPT Fri Apr 8 18:37:23 CEST 2011 i686 GNU/Linux
info about chipset (from lspci):
00:00.0 Host bridge: Intel Corporation 82865G/PE/P DRAM Controller/Host-Hub Interface (rev 02)
00:02.0 VGA compatible controller: Intel Corporation 82865G Integrated Graphics Controller (rev 02)
00:03.0 PCI bridge: Intel Corporation 82865G/PE/P PCI to CSA Bridge (rev 02)
00:06.0 System peripheral: Intel Corporation 82865G/PE/P Processor to I/O Memory Interface (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
00:1f.5 Multimedia audio controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) AC'97 Audio Controller (rev 02)
01:01.0 Ethernet controller: Intel Corporation 82547EI Gigabit Ethernet Controller
Version of libdrm2:
Package: libdrm2
Priority: optional
Section: libs
Installed-Size: 500
Maintainer: Debian X Strike Force <[email protected]>
Architecture: i386
Source: libdrm
Version: 2.4.23-3
Depends: libc6 (>= 2.7)
Filename: pool/main/libd/libdrm/libdrm2_2.4.23-3_i386.deb
Size: 421754
MD5sum: e175512785e1db00a09a4ed2063acbeb
SHA1: 1f010300dd200d4a70337f190ecc6848e653bece
SHA256: 2735ec5fbbcad7c48c34308702aa8867c8aa0ae26c5f1be52dadb431f1355c08
I was not able to send the glxinfo output: after logging in to the hung
machine via ssh and running "DISPLAY=:0 glxinfo", the command hung and
didn't display anything.
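When probing a machine that may already be hung, commands such as glxinfo can be wrapped in a timeout so the ssh session is not lost to a blocked command. Below is a minimal Python sketch built on subprocess.run and its timeout argument; run_with_timeout is a hypothetical helper, and the DISPLAY value :0 is taken from the command above.

```python
import os
import subprocess
import sys
from typing import Optional, Sequence


def run_with_timeout(cmd: Sequence[str], timeout: float,
                     display: Optional[str] = None) -> Optional[str]:
    """Run cmd and return its stdout, or None if it does not finish in time."""
    env = dict(os.environ)
    if display is not None:
        # Same effect as prefixing the command with DISPLAY=... in a shell
        env["DISPLAY"] = display
    try:
        result = subprocess.run(list(cmd), env=env, capture_output=True,
                                text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child (SIGKILL on POSIX) before re-raising
        print(f"{cmd[0]} did not finish within {timeout} s", file=sys.stderr)
        return None
    return result.stdout
```

For example, run_with_timeout(["glxinfo"], 10, display=":0") would return None instead of blocking indefinitely. If the process is stuck in an uninterruptible wait inside the kernel, subprocess.run still waits for it to exit after the kill, so the call may block anyway; the timeout is a convenience, not a guarantee.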
I have also attached the output of the "intel_gpu_dump" command.
--
HTH & Regards,
Wojtek Zabolotny