2011-02-14 17:31:46

by Chris Clayton

Subject: System lockup with 2.6.38-rc4+

Hi,

I'm not subscribed, so please cc me on any reply.

I've just had a complete system lock up with a kernel that I pulled, built and
installed yesterday morning. It was locked hard and I had to power off and on
to get it back.

The kernel log file has this snippet at the end:

Feb 14 16:29:21 upstairs kernel: pci 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
Feb 14 16:29:21 upstairs kernel: pci 0000:00:02.0: setting latency timer to 64
Feb 14 16:29:21 upstairs kernel: [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
Feb 14 16:29:21 upstairs kernel: [drm] No driver support for vblank timestamp query.
Feb 14 16:29:22 upstairs kernel: [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
Feb 14 16:44:27 upstairs kernel: render error detected, EIR: 0x00000010
Feb 14 16:44:27 upstairs kernel: IPEIR: 0x00000000
Feb 14 16:44:27 upstairs kernel: IPEHR: 0x01000000
Feb 14 16:44:27 upstairs kernel: INSTDONE: 0xfffffffe
Feb 14 16:44:27 upstairs kernel: INSTPS: 0x0001e000
Feb 14 16:44:27 upstairs kernel: INSTDONE1: 0xffffffff
Feb 14 16:44:27 upstairs kernel: ACTHD: 0x05204d80
Feb 14 16:44:27 upstairs kernel: page table error
Feb 14 16:44:27 upstairs kernel: PGTBL_ER: 0x00000002
Feb 14 16:44:27 upstairs kernel: [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking
Feb 14 16:45:09 upstairs kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Feb 14 16:45:09 upstairs kernel: render error detected, EIR: 0x00000010
Feb 14 16:45:09 upstairs kernel: IPEIR: 0x00000000
Feb 14 16:45:09 upstairs kernel: IPEHR: 0x54f00006
Feb 14 16:45:09 upstairs kernel: INSTDONE: 0xffffffff
Feb 14 16:45:09 upstairs kernel: INSTPS: 0x8001e02a
Feb 14 16:45:09 upstairs kernel: INSTDONE1: 0xbfbbffff
Feb 14 16:45:09 upstairs kernel: ACTHD: 0x0721207c
Feb 14 16:45:09 upstairs kernel: page table error
Feb 14 16:45:09 upstairs kernel: PGTBL_ER: 0x01000003
Feb 14 16:45:10 upstairs kernel: [drm:i915_reset] *ERROR* Failed to reset chip.

and the Xorg log file has:

[mi] EQ overflowing. The server is probably stuck in an infinite loop.

Backtrace:
0: /usr/X11/bin/X(xorg_backtrace+0x3b) [0x81352db]
1: /usr/X11/bin/X(mieqEnqueue+0x26c) [0x81144ec]
2: /usr/X11/bin/X(xf86PostMotionEventP+0xc2) [0x80c5a92]
3: /usr/X11/bin/X(xf86PostMotionEvent+0x87) [0x80c5c17]
4: /usr/X11/lib/xorg/modules/input//mouse_drv.so(+0x6e39) [0xb71e4e39]
5: /usr/X11/lib/xorg/modules/input//mouse_drv.so(+0x7a68) [0xb71e5a68]
6: /usr/X11/lib/xorg/modules/input//mouse_drv.so(+0x3f17) [0xb71e1f17]
7: /usr/X11/bin/X() [0x80ba1d7]
8: /usr/X11/bin/X() [0x80af49b]
9: [0xb77d4400]
10: /usr/X11R6/lib/libdrm.so.2(drmCommandWrite+0x3b) [0xb77b168b]
11: /usr/X11/lib/xorg/modules/drivers//intel_drv.so(I830Sync+0xc5) [0xb720b025]
12: /usr/X11/lib/xorg/modules/drivers//intel_drv.so(+0x49efa) [0xb723eefa]
13: /usr/X11/lib/xorg/modules//libexa.so(exaWaitSync+0x63) [0xb6f200a3]
14: /usr/X11/lib/xorg/modules//libexa.so(ExaDoPrepareAccess+0x7b) [0xb6f20ddb]
15: /usr/X11/lib/xorg/modules//libexa.so(ExaCheckPutImage+0xf8) [0xb6f28f08]
16: /usr/X11/lib/xorg/modules//libexa.so(+0x548c) [0xb6f2248c]
17: /usr/X11/bin/X() [0x817da72]
18: /usr/X11/bin/X(ProcPutImage+0x164) [0x8083864]
19: /usr/X11/bin/X(Dispatch+0x2bc) [0x808706c]
20: /usr/X11/bin/X(main+0x46a) [0x806be6a]
21: /lib/libc.so.6(__libc_start_main+0xe6) [0xb7331b86]
22: /usr/X11/bin/X() [0x806b2c1]

followed by many instances of these two lines:

[mi] mieqEnequeue: out-of-order valuator event; dropping.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.

The Xorg server is version 1.5.3 and the xorg intel driver is version 2.7.1.

Bzipped copies of the kernel and Xorg logs are attached, but please let me know
if any additional diagnostics are needed.
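
For anyone who wants to compare the two register dumps in the kernel log side
by side, here is a minimal sketch in Python. It assumes the syslog-style line
layout shown in the snippet above; the function names (parse_dumps,
changed_registers) and the regular expressions are illustrative only and are
not part of any kernel or Xorg tooling. It simply reports which registers
changed between the first and second "render error detected" blocks and does
not try to interpret the values.

```python
import re

# Matches syslog-style register lines such as
# "Feb 14 16:44:27 upstairs kernel: IPEHR: 0x01000000"
REG_LINE = re.compile(
    r"kernel:\s+(?P<name>[A-Z_0-9]+):\s+(?P<value>0x[0-9a-fA-F]+)\s*$"
)
# Matches the line that opens each register dump.
DUMP_START = re.compile(
    r"kernel:\s+render error detected, EIR:\s+(?P<value>0x[0-9a-fA-F]+)"
)


def parse_dumps(lines):
    """Group register lines into one dict per 'render error detected' block."""
    dumps = []
    for line in lines:
        start = DUMP_START.search(line)
        if start:
            dumps.append({"EIR": int(start.group("value"), 16)})
            continue
        reg = REG_LINE.search(line)
        if reg and dumps:
            dumps[-1][reg.group("name")] = int(reg.group("value"), 16)
    return dumps


def changed_registers(first, second):
    """Return {name: (old, new)} for registers whose value differs."""
    return {
        name: (first[name], second[name])
        for name in first
        if name in second and first[name] != second[name]
    }


# A shortened copy of the log lines quoted in this message.
SAMPLE = """\
Feb 14 16:44:27 upstairs kernel: render error detected, EIR: 0x00000010
Feb 14 16:44:27 upstairs kernel: IPEIR: 0x00000000
Feb 14 16:44:27 upstairs kernel: IPEHR: 0x01000000
Feb 14 16:44:27 upstairs kernel: PGTBL_ER: 0x00000002
Feb 14 16:45:09 upstairs kernel: render error detected, EIR: 0x00000010
Feb 14 16:45:09 upstairs kernel: IPEIR: 0x00000000
Feb 14 16:45:09 upstairs kernel: IPEHR: 0x54f00006
Feb 14 16:45:09 upstairs kernel: PGTBL_ER: 0x01000003
""".splitlines()

if __name__ == "__main__":
    first, second = parse_dumps(SAMPLE)
    for name, (old, new) in changed_registers(first, second).items():
        print(f"{name}: {old:#010x} -> {new:#010x}")
```

On the shortened sample, EIR and IPEIR are identical in both dumps, so only
IPEHR and PGTBL_ER are reported as changed.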

Thanks,

Chris
--
The more I see, the more I know. The more I know, the less I understand.
Changing Man - Paul Weller


Attachments:
(No filename) (3.82 kB)
lockup.kernel.log.bz2 (10.31 kB)
lockup.Xorg.0.log.bz2 (6.48 kB)

2011-02-14 18:09:47

by Ben Gamari

Subject: Re: System lockup with 2.6.38-rc4+

On Mon, 14 Feb 2011 17:31:29 +0000, Chris Clayton wrote:
> Hi,
>
> I'm not subscribed, so please cc me on any reply.
>
> I've just had a complete system lock up with a kernel that I pulled, built and
> installed yesterday morning. It was locked hard and I had to power off and on
> to get it back.
>
It looks like the GPU barfed. If you SSH'd in to the machine you'd find
that everything but the display and resources held by the hung X server
would be fine. Were you doing anything particular in your X session when
this happened? There's a good chance this isn't actually a kernel bug
but instead is a DRI client doing something dumb.

Cheers,

- Ben

P.S. The bzipped kernel image generally won't help in diagnosing the
problem. The most important thing to include in a bug report is the
kernel version and the dmesg output from the failure if available. Folks
will ask later if more is necessary.
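
As a rough illustration of trimming a log down to the part around the failure,
here is a minimal Python sketch. The marker strings are copied from the log
excerpt quoted in this thread and are an assumption; they would need adjusting
for other drivers or failure modes. The helper names (failure_excerpt,
read_log) are hypothetical and not part of any existing tool.

```python
import bz2
import re

# Markers copied from the log excerpt in this thread; this list is an
# assumption and will not cover every kind of GPU failure.
FAILURE_MARKERS = re.compile(r"render error detected|\*ERROR\*|Hangcheck")


def failure_excerpt(lines, before=2):
    """Return the lines from `before` lines ahead of the first marker to the end."""
    lines = list(lines)
    for index, line in enumerate(lines):
        if FAILURE_MARKERS.search(line):
            return lines[max(0, index - before):]
    return []


def read_log(path):
    """Read a plain or bzip2-compressed log file as a list of lines."""
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rt", errors="replace") as handle:
        return handle.read().splitlines()
```

With a saved log at hand, something like
print("\n".join(failure_excerpt(read_log("lockup.kernel.log.bz2")))) would
print the excerpt, assuming a file of that name exists in the current
directory.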

2011-02-14 19:21:29

by Chris Clayton

Subject: Re: System lockup with 2.6.38-rc4+

On Monday 14 February 2011, Ben Gamari wrote:
> On Mon, 14 Feb 2011 17:31:29 +0000, Chris Clayton wrote:
> > Hi,
> >
> > I'm not subscribed, so please cc me on any reply.
> >
> > I've just had a complete system lock up with a kernel that I pulled,
> > built and installed yesterday morning. It was locked hard and I had to
> > power off and on to get it back.
>
> It looks like the GPU barfed. If you SSH'd in to the machine you'd find
> that everything but the display and resources held by the hung X server
> would be fine. Were you doing anything particular in your X session when
> this happened? There's a good chance this isn't actually a kernel bug
> but instead is a DRI client doing something dumb.
>

I was panning around a Google map in Firefox. Other than that, I had no other
applications running. I've just done the same thing in 2.6.37 for 10 minutes
with no problems, unlike 2.6.38-rc4+. In fact, I don't recall having this sort
of problem in any previous kernel.

> Cheers,
>
> - Ben
>
> P.S. The bzipped kernel image generally won't help in diagnosing the
> problem. The most important thing to include in a bug report is the
> kernel version and the dmesg output from the failure if available. Folks
> will ask later if more is necessary.

Note that I attached the kernel and Xorg logs, plural, i.e. the logs from
/var/log/kernel and /var/Xorg.log, not a kernel image. The kernel version is
in the subject.

Thanks,

Chris

--
The more I see, the more I know. The more I know, the less I understand.
Changing Man - Paul Weller