2012-10-23 06:19:00

by Norbert Preining

Subject: drm i915 hangs on heavy io load

Hi everyone,

(please Cc)

I am running 3.7-rc2 and have recently been hit a few times (under rc1, too)
by the drm i915 driver hanging during large io operations.

The effect in the dmesg:
[13193.297751] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[13193.297758] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[13193.302728] [drm:init_ring_common] *ERROR* failed to set render ring head to zero ctl 00000000 head 85a05e3c tail 00000000 start 00003000
[13193.357584] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 85a05e3c tail 00000000 start 00003000
[13194.861769] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[13194.861838] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[13194.861840] [drm:i915_reset] *ERROR* Failed to reset chip.

I captured the i915_error_state and uploaded it here:
http://www.logic.at/people/preining/drm_i915_error_state.gz

The hangs have normally been triggered by svn up in a very big
repository, or git checkout on a very big repository, or similar.

Otherwise the system is Debian/unstable. The above output and error state are
from after a reboot without any suspends or other tricks in between,
uptime 3.5h.

Best wishes and thanks for any suggestions

Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TeX Live & Debian Developer
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
CORRIEMOILLIE (n.)
The dreadful sinking sensation in a long passageway encounter when
both protagonists immediately realise they have plumped for the
corriedoo (q.v.) much too early as they are still a good thirty yards
apart. They were embarrassed by the pretence of corriecravie (q.v.)
and decided to make use of the corriedoo because they felt silly. This
was a mistake as corrievorrie (q.v.) will make them seem far sillier.
--- Douglas Adams, The Meaning of Liff
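
A minimal sketch of how the hang reports quoted above could be counted in a kernel log and the error state saved for upload. The function names count_gpu_hangs and save_error_state are made up for illustration, and the default path assumes debugfs is mounted at /sys/kernel/debug (the kernel message abbreviates it as /debug); reading it normally requires root.

```shell
#!/bin/sh
# Count "GPU hung" hangcheck reports in kernel log text read from stdin.
count_gpu_hangs() {
    # grep -c prints 0 but exits non-zero when nothing matches; mask that.
    grep -c 'Hangcheck timer elapsed\.\.\. GPU hung' || true
}

# Save a gzip-compressed copy of the i915 error state.
# Assumption: debugfs is mounted at /sys/kernel/debug.
save_error_state() {
    src=${1:-/sys/kernel/debug/dri/0/i915_error_state}
    out=${2:-i915_error_state.gz}
    [ -r "$src" ] || { echo "cannot read $src" >&2; return 1; }
    gzip -c "$src" > "$out"
}
```

Typical use would be `dmesg | count_gpu_hangs` followed by `save_error_state` right after a hang, before rebooting.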


2012-10-23 06:56:49

by Dave Airlie

Subject: Re: drm i915 hangs on heavy io load

>
> (please Cc)
>
> I am running 3.7-rc2 and got recently hit a few times (under rc1, too)
> by hanging drm i915 while doing large io operations.

Does booting with i915.i915_enable_rc6=0 help?

(Daniel, looks like an ironlake).

Dave.
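
A small sketch for confirming that the running kernel was really booted with the suggested parameter. It only parses the kernel command line; the helper name cmdline_param is hypothetical, and the optional second argument exists purely so the function can be tried on a sample string.

```shell
#!/bin/sh
# Print the value of a kernel command-line parameter, or fail if it is absent.
# Usage: cmdline_param NAME [CMDLINE]   (CMDLINE defaults to /proc/cmdline)
cmdline_param() {
    name=$1
    cmdline=${2-$(cat /proc/cmdline)}
    # Rely on word splitting: the command line is a space-separated list.
    for word in $cmdline; do
        case $word in
            "$name"=*) printf '%s\n' "${word#*=}"; return 0 ;;
        esac
    done
    return 1
}
```

After rebooting with the option, `cmdline_param i915.i915_enable_rc6` should print 0; a non-zero exit status means the parameter never reached the kernel.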

2012-10-23 07:24:21

by Norbert Preining

Subject: Re: drm i915 hangs on heavy io load

Hi Dave,

(switched to freedesktop for dri-devel)

> Does booting with i915.i915_enable_rc6=0 help?

Will try immediately.

> (Daniel, looks like an ironlake).

Sorry, I forgot that one ... how stupid.

From Xorg.0.log:
...
[ 13535.841] (II) intel(0): Integrated Graphics Chipset: Intel(R) Arrandale
[ 13535.841] (--) intel(0): Chipset: "Arrandale"
...

00:02.0 0300: 8086:0046 (rev 02) (prog-if 00 [VGA controller])
Subsystem: 17aa:215a
Flags: bus master, fast devsel, latency 0, IRQ 42
Memory at f0000000 (64-bit, non-prefetchable) [size=4M]
Memory at d0000000 (64-bit, prefetchable) [size=256M]
I/O ports at 1800 [size=8]
Expansion ROM at <unassigned> [disabled]
Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [d0] Power Management version 2
Capabilities: [a4] PCI Advanced Features
Kernel driver in use: i915

00:02.0 VGA compatible controller: Intel Corporation Core Processor Integrated Graphics Controller (rev 02)

Does that make any difference?

Best wishes

Norbert

2012-10-23 09:18:19

by Chris Wilson

Subject: Re: drm i915 hangs on heavy io load

On Tue, 23 Oct 2012 14:38:30 +0900, Norbert Preining <[email protected]> wrote:
> Hi everyone,
>
> (please Cc)
>
> I am running 3.7-rc2 and got recently hit a few times (under rc1, too)
> by hanging drm i915 while doing large io operations.
[snip]
>
> I captured the i915_error_state and uploaded it here:
> http://www.logic.at/people/preining/drm_i915_error_state.gz
>
> The hangs have been normally initiated on svn up in a very big
> repository, or git checkout on a very big repository or so.
>
> Other system is Debian/unstable. The above output and error state is
> from after a reboot without any suspends or other tricks inbetween,
> uptime 3.5h.

Looks like fallout from a missing ILK rc6 workaround - it looks like the
write to the ring tail never landed and so the command streamer hung.

See https://bugs.freedesktop.org/show_bug.cgi?id=55984 and
http://cgit.freedesktop.org/~danvet/drm/log/?h=ilk-wa-pile of which I
think
http://cgit.freedesktop.org/~danvet/drm/commit/?h=ilk-wa-pile&id=0d5fed2de763b49bb1a90140758153481f043757
is the missing ingredient.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre

2012-10-24 00:37:06

by Norbert Preining

Subject: Re: drm i915 hangs on heavy io load

Hi Dave, hi Chris,

thanks for your answers.

On Di, 23 Okt 2012, Dave Airlie wrote:
> Does booting with i915.i915_enable_rc6=0 help?

No, booted with that, it happened again on a completely idle
system (well, I believe completely idle, I was doing the
dishes ;-)

[12437.995026] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[12437.995034] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[12438.000213] [drm:init_ring_common] *ERROR* failed to set render ring head to zero ctl 00000000 head 5ee06f14 tail 00000000 start 00003000
[12438.054894] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 5ee06f14 tail 00000000 start 00003000
[12439.583064] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[12439.583176] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[12439.583182] [drm:i915_reset] *ERROR* Failed to reset chip.

See the new output here:
http://www.logic.at/people/preining/i915_error_state.gz

> http://cgit.freedesktop.org/~danvet/drm/commit/?h=ilk-wa-pile&id=0d5fed2de763b49bb1a90140758153481f043757
> is the missing ingredient.

I am compiling a kernel with this patch based on current git now.
Should I still use the above kernel cmd argument (i915...rc6=0)
or try without it?

Best wishes

Norbert

2012-10-24 08:21:41

by Chris Wilson

Subject: Re: drm i915 hangs on heavy io load

On Wed, 24 Oct 2012 09:36:59 +0900, Norbert Preining <[email protected]> wrote:
> Hi Dave, hi Chris,
>
> thanks for your answers.
>
> On Di, 23 Okt 2012, Dave Airlie wrote:
> > Does booting with i915.i915_enable_rc6=0 help?
>
> No,booted with that, it happened again on a completely idle
> system (well, I believe completely idle, I was doing the
> dishes ;-)
>
> [12437.995026] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> [12437.995034] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
> [12438.000213] [drm:init_ring_common] *ERROR* failed to set render ring head to zero ctl 00000000 head 5ee06f14 tail 00000000 start 00003000
> [12438.054894] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 5ee06f14 tail 00000000 start 00003000
> [12439.583064] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> [12439.583176] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
> [12439.583182] [drm:i915_reset] *ERROR* Failed to reset chip.
>
> New output see here:
> http://www.logic.at/people/preining/i915_error_state.gz

That has a very similar look to it, so reasonable to assume that is the
same issue.

> > http://cgit.freedesktop.org/~danvet/drm/commit/?h=ilk-wa-pile&id=0d5fed2de763b49bb1a90140758153481f043757
> > is the missing ingredient.
>
> I am compiling a kernel with this patch based on current git now.
> Should I still use the above kernel cmd argument (i915...rc6=0)
> or try without it?

Without any rc6 parameter would be best. But if rc6=0 wasn't the
solution for you, then I may have identified the wrong w/a. Can I ask
you to try the patches in that branch until you find one (or more perhaps)
that stabilise your system?
-Chris

--
Chris Wilson, Intel Open Source Technology Centre

2012-10-28 02:48:00

by Norbert Preining

Subject: Re: drm i915 hangs on heavy io load

Hi Chris,

I haven't answered sooner because several reboots were necessary (sometimes
I have to work on Win***) with no effect, but ..

On Mi, 24 Okt 2012, Chris Wilson wrote:
> > > http://cgit.freedesktop.org/~danvet/drm/commit/?h=ilk-wa-pile&id=0d5fed2de763b49bb1a90140758153481f043757
> > > is the missing ingredient.
> >
> > I am compiling a kernel with this patch based on current git now.
> > Should I still use the above kernel cmd argument (i915...rc6=0)
> > or try without it?
>
> Without any rc6 parameter would be best. But if rc6=0 wasn't the
> solution for you, then I may have identified the wrong w/a. Can I ask
> you try the patches in that branch until you find one (or more perhaps)
> that stabilise your system?

I pulled the whole branch into my compile branch, and removed everything
from kernel cmd line regarding rc6, and got the
[drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[drm:i915_reset] *ERROR* Failed to reset chip.
new i915_error_state.gz at the same place.

So it seems that the patches in the ilk-wa-pile branch do not help.

All the best

Norbert

2012-10-28 11:11:07

by Chris Wilson

Subject: Re: drm i915 hangs on heavy io load

On Sun, 28 Oct 2012 11:47:53 +0900, Norbert Preining <[email protected]> wrote:
> I pulled the whole branch into my compile branch, and removed everything
> from kernel cmd line regarding rc6, and got the
> [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
> [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
> [drm:i915_reset] *ERROR* Failed to reset chip.
> new i915_error_state.gz at the same place.
>
> So it seems that the patches in the ilk-wa-pile branch do not help.

Yeah, looks like we have another issue to contend with, so can you
please file a bug on bugzilla.freedesktop.org (or bugzilla.kernel.org)
so that we don't lose track of it.

If you have the option, can you switch the ddx between using SNA and
UXA? They stress different paths through the driver, which may provide a
clue. Thanks,
-Chris

--
Chris Wilson, Intel Open Source Technology Centre

2012-10-28 12:31:58

by Norbert Preining

Subject: Re: drm i915 hangs on heavy io load

Hi Chris,


> so can you
> please file a bug on bugzilla.freedesktop.org (or bugzilla.kernel.org)
> so that we don't lose track of it.

Will do when I'm back from the mountains.

> If your have the option, can you switch the ddx between using SNA and
> UXA.

??? Is that a BIOS option? Or kernel?
I can try both.

Norbert

(on mobile)

2012-10-29 07:25:26

by Tino Keitel

Subject: Re: drm i915 hangs on heavy io load

On Sun, Oct 28, 2012 at 21:32:53 +0900, Norbert Preining wrote:
> Hi Chris,
>
>
> > so can you
> > please file a bug on bugzilla.freedesktop.org (or bugzilla.kernel.org)
> > so that we don't lose track of it.
>
> Will do when I'm back from the mountains.
>
> > If your have the option, can you switch the ddx between using SNA and
> > UXA.
>
> ??? Is that a BIOS option? Or kernel?
> I can try both.

It is an option in the Intel Xorg driver. What is actually used depends
on the options provided to the configure script at build time. You
can see the current state in your Xorg.0.log.

Here is an example of /etc/X11/xorg.conf which enforces SNA:

Section "Device"
    Option "AccelMethod" "SNA"
    Identifier "Card0"
    Driver "intel"
EndSection

Regards,
Tino
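
A rough sketch for checking which acceleration path the driver picked, by searching the Xorg log for the words SNA or UXA. The exact wording of the driver's log lines differs between driver versions, so the pattern and the helper name accel_lines are assumptions, as is the default log location.

```shell
#!/bin/sh
# Print the lines of an Xorg log that mention the SNA or UXA acceleration paths.
# Assumption: the log lives at /var/log/Xorg.0.log unless a path is given.
accel_lines() {
    log=${1:-/var/log/Xorg.0.log}
    # -w matches SNA/UXA as whole words only; exits non-zero if neither appears.
    grep -wE 'SNA|UXA' "$log"
}
```

To go back to the other path, the same Option line should accept "UXA" in place of "SNA", followed by a restart of the X server.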

2012-10-30 00:39:49

by Norbert Preining

Subject: Re: drm i915 hangs on heavy io load

Hi Chris,

On So, 28 Okt 2012, Chris Wilson wrote:
> > I pulled the whole branch into my compile branch, and removed everything
> > from kernel cmd line regarding rc6, and got the
> > [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> > [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
> > [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> > [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
> > [drm:i915_reset] *ERROR* Failed to reset chip.
> > new i915_error_state.gz at the same place.
> >
> > So it seems that the patches in the ilk-wa-pile branch do not help.
>
> Yeah, looks like we have another issue to contend with, so can you
> please file a bug on bugzilla.freedesktop.org (or bugzilla.kernel.org)
> so that we don't lose track of it.

I have seen this here:
https://bugs.freedesktop.org/show_bug.cgi?id=55984
does it make sense to start a new bug for that?

Best wishes

Norbert

2012-10-30 00:50:05

by Norbert Preining

Subject: Re: drm i915 hangs on heavy io load

On Mo, 29 Okt 2012, Tino Keitel wrote:
> Section "Device"
> Option "AccelMethod" "SNA"
> Identifier "Card0"
> Driver "intel"
> EndSection

Thanks, running now with SNA. Let us see what happens.

Best wishes

Norbert

2012-10-30 00:55:44

by Dave Airlie

Subject: Re: drm i915 hangs on heavy io load

On Tue, Oct 30, 2012 at 10:49 AM, Norbert Preining <[email protected]> wrote:
> On Mo, 29 Okt 2012, Tino Keitel wrote:
>> Section "Device"
>> Option "AccelMethod" "SNA"
>> Identifier "Card0"
>> Driver "intel"
>> EndSection
>
> Thanks, running now with SNA. Let us see what happens.

Please don't, we ain't going to find the bug any quicker changing
variables, if the only thing that changed on your system was the
kernel then we need to figure out which kernel changes caused it and
remove them. Changing userspace is only complicating things and making
it less likely we'll ever find the regressions. Once we find the
regression, changing userspace options to help understand it is more
reasonable.

How long does it take you to reproduce, and does it happen when in
actual use? On my laptop I've noticed I come back to it sometimes and
gnome-shell is dead. This never happened pre 3.7-rc's. But for me it's
a 3-4 day window so far for it to die, which makes bisecting it a bit
of a major problem, and I've just finished bisecting the last Ironlake
regression that took over a month.

I would suggest starting a bisect on drivers/gpu/drm/i915 from 3.6
final to 3.7-rc1 or maybe -rc2.

Dave.

2012-10-30 01:01:43

by Norbert Preining

Subject: Re: drm i915 hangs on heavy io load

Hi Dave,

On Di, 30 Okt 2012, Dave Airlie wrote:
> > Thanks, running now with SNA. Let us see what happens.
>
> Please don't, we ain't going to find the bug any quicker changing
> variables, if the only thing that changed on your system was the

Sorry, didn't know. I assumed from Chris's email that I should try
it "to stress different code paths" ... anyway, disabling it again.

> How long does it take you to reproduce, and does it happen when in

Very hard to say, most of the time it is on a scale of a few days.
Though it also happened after a few hours once.

> actual use. On my laptop I've noticed I come back to it sometimes and

Concerning actual use: I had instances on several occasions. Just 30min
ago it was while working with shotwell on my photo collection, tagging
photos. So there should not be much disk activity, but a lot
of screen redraws etc. when going through the photos.
At other times it was a locked screen without screen saver.

Concerning coming back: For me it never worked. I always have to reboot
to get a working state again. Ok, to be more specific: GNOME 3 is dead.
I can close the windows normally with kbd shortcuts and some mouse
interaction, but no new windows, no moving etc.

> gnome-shell is dead. This never happened pre 3.7-rc's. But for me its
> a 3-4 day window so far for it to die, which makes bisecting it a bit

That sounds pretty much like my case, but since I often don't use
the laptop for 2 days or so, it might be a bit longer.

> of a major problem. and I'm just finished bisecting the last Ironlake
> regression that took over a month.

Ouch ...

> I would suggest starting a bisect on drivers/gpu/drm/i915 from 3.6
> final to 3.7-rc1 or maybe -rc2.

Ok, thanks. I will try.

Best wishes

Norbert

2012-10-30 01:45:16

by Ben Widawsky

Subject: Re: drm i915 hangs on heavy io load

On Tue, 30 Oct 2012 10:01:38 +0900
Norbert Preining <[email protected]> wrote:

> Hi Dave,
>
> On Di, 30 Okt 2012, Dave Airlie wrote:
> > > Thanks, running now with SNA. Let us see what happens.
> >
> > Please don't, we ain't going to find the bug any quicker changing
> > variables, if the only thing that changed on your system was the
>
> Sorry, didn't know. I supposed from the email of Chris that I should
> try it "to stress different code path" ... anyway, disabling it again.
>
> > How long does it take you to reproduce, and does it happen when in
>
> Very hard to say, most of the times it is in a few days scale.
> Though it happened also after a few hours once.
>
> > actual use. On my laptop I've noticed I come back to it sometimes
> > and
>
> Concerning actual use: I had instances on several occassions. Just
> 30min ago it was while working with shotwell on my photo collection,
> tagging photos. So there should not be a big disk activity or so, but
> a lot of screen redraws etc when going through the photos.
> On other times it was locked screen without screen saver.
>
> Concerning coming back: For me it never worked. I always have to
> reboot to get a working state again. Ok, to be more specific. GNome3
> is dead. I can close the windows normally with kbd shortcuts and some
> mouse interaction, but no new windows, no moving etc.
>
> > gnome-shell is dead. This never happened pre 3.7-rc's. But for me
> > its a 3-4 day window so far for it to die, which makes bisecting it
> > a bit
>
> That sounds pretty much like my case, but since I often don't use
> the laptop for 2 days or so, it might be a bit longer.
>
> > of a major problem. and I'm just finished bisecting the last
> > Ironlake regression that took over a month.
>
> Ouch ...
>
> > I would suggest starting a bisect on drivers/gpu/drm/i915 from 3.6
> > final to 3.7-rc1 or maybe -rc2.
>
> Ok, thanks. I will try.
>
> Best wishes
>
> Norbert

Hi Norbert. In addition to the above, if this truly appears to be
related to i/o, can we try to decrease the time to failure with some
serious i/o tests? Off the top of my head I am not sure what's
available, but surely Google should be able to find something.

--
Ben Widawsky, Intel Open Source Technology Center

2012-10-30 03:13:28

by Norbert Preining

Subject: Re: drm i915 hangs on heavy io load

On Mo, 29 Okt 2012, Ben Widawsky wrote:
> Hi Norbert. In addition to the above, if this truly appears to be
> related to i/o, can we try to decrease the time to failure with some

I am *not* sure. As I said, the last thing was shotwell photo
editing. There might be some io while loading the photos, but
after that they are in the cache, and the only thing being done is
lots of displaying.

> serious i/o tests? Off the top of my head I am not sure what's

Anyway, that is my idea. I think I don't need google. A simple
svn up
on my 15 GB svn repository creates enough io. And doing some git pull
or so on similarly sized repositories in parallel brings the
laptop to its knees anyway (actually, badly to its knees).

Best wishes

Norbert

2012-10-30 10:03:07

by Chris Wilson

Subject: Re: drm i915 hangs on heavy io load

On Tue, 30 Oct 2012 09:39:43 +0900, Norbert Preining <[email protected]> wrote:
> Hi Chris,
>
> On So, 28 Okt 2012, Chris Wilson wrote:
> > Yeah, looks like we have another issue to contend with, so can you
> > please file a bug on bugzilla.freedesktop.org (or bugzilla.kernel.org)
> > so that we don't lose track of it.
>
> I have seen this here:
> https://bugs.freedesktop.org/show_bug.cgi?id=55984
> does it make sense to start a new bug for that?

I was fearing it was something different, but since Dave has now found
that rc6=0 was not sufficient in his case, it is probably the same. The
issue surrounding cpu-relocs was never explained and I suspect that we
are still being bitten by that root cause. Along those lines:

commit 86a1ee26bb60e1ab8984e92f0e9186c354670aed
Author: Chris Wilson <[email protected]>
Date: Sat Aug 11 15:41:04 2012 +0100

drm/i915: Only pwrite through the GTT if there is space in the aperture

is the most contentious patch in 3.7-rc.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre

2012-11-04 00:44:44

by Norbert Preining

Subject: Re: drm i915 hangs on heavy io load

Hi all,

On Di, 30 Okt 2012, Dave Airlie wrote:
> I would suggest starting a bisect on drivers/gpu/drm/i915 from 3.6
> final to 3.7-rc1 or maybe -rc2.

Sorry for my ignorance ... I did this on the master branch:
$ git checkout v3.7-rc1
...
$ git bisect start drivers/gpu/drm/i915
$ git bisect bad
$ git bisect good v3.6
Bisecting: 121 revisions left to test after this (roughly 7 steps)
[25c5b2665fe4cc5a93edd29b62e7c05c15dddd26] drm/i915: implement new set_mode code flow
$
after that I am back somewhere around
3.6.0-rc2
???

Am I doing something wrong? I thought I was bisecting between 3.6 and 3.7-rc2?
How can I end up back at 3.6.0-rc2?

Best wishes

Norbert

2012-11-04 06:08:50

by Dave Airlie

Subject: Re: drm i915 hangs on heavy io load

On Sun, Nov 4, 2012 at 10:44 AM, Norbert Preining <[email protected]> wrote:
> Hi all,
>
> On Di, 30 Okt 2012, Dave Airlie wrote:
>> I would suggest starting a bisect on drivers/gpu/drm/i915 from 3.6
>> final to 3.7-rc1 or maybe -rc2.
>
> Sorry for my ignorance ... I did on master branch
> $ git checkout v3.7-rc1
> ...
> $ git bisect start drivers/gpu/drm/i915
> $ git bisect bad
> $ git bisect good v3.6
> Bisecting: 121 revisions left to test after this (roughly 7 steps)
> [25c5b2665fe4cc5a93edd29b62e7c05c15dddd26] drm/i915: implement new set_mode code flow
> $
> after that I am back somewhere around
> 3.6.0-rc2
> ???
>
> Am I doing something wrong? I thought I am bisecting between 3.6 and 3.7.-rc2?
> How can I go back to 3.6.0-rc2?

Yeah that's fine, bisecting works by going to where commits were
originally committed, so drm-intel-next was based on 3.6.0-rc2 at some
point and was only merged into Linus's tree later.

Dave.
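
The "roughly 7 steps" in the bisect output and the multi-day reproduction window mentioned earlier in the thread combine into a rough time estimate. The sketch below is only a back-of-the-envelope halving count, not git's own estimator, and the function names bisect_steps and bisect_days are made up for illustration.

```shell
#!/bin/sh
# Number of halving steps needed to narrow N candidate revisions down to none.
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 0 ]; do
        n=$((n / 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

# Rough wall-clock estimate: steps multiplied by the days needed per test.
bisect_days() {
    echo $(( $(bisect_steps "$1") * $2 ))
}
```

With the 121 revisions reported above, `bisect_steps 121` prints 7, and at four days per test `bisect_days 121 4` prints 28, so a slow-to-reproduce hang makes even a short bisect take about a month.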

2012-11-05 00:33:11

by Norbert Preining

Subject: Re: drm i915 hangs on heavy io load

On So, 04 Nov 2012, Dave Airlie wrote:
> Yeah thats fine, bisecting works by going to where commits were
> originally committed, so drm-intel-next was 3.6.0-rc2 at some point
> was only merged into Linus later.

Ok, thanks, didn't know that. Have started the bisect game now,
coming back in about 1 year ;-)

Best wishes

Norbert

2012-11-05 20:30:17

by Peter Wu

Subject: [bisected] drm i915 hangs on heavy io load

On Sunday 04 November 2012 16:08:47 Dave Airlie wrote:
> On Sun, Nov 4, 2012 at 10:44 AM, Norbert Preining <[email protected]> wrote:

> > On Di, 30 Okt 2012, Dave Airlie wrote:
> >> I would suggest starting a bisect on drivers/gpu/drm/i915 from 3.6
> >> final to 3.7-rc1 or maybe -rc2.
> >
> > Sorry for my ignorance ... I did on master branch
> >
> > $ git checkout v3.7-rc1
> > ...
> > $ git bisect start drivers/gpu/drm/i915
> > $ git bisect bad
> > $ git bisect good v3.6
> > Bisecting: 121 revisions left to test after this (roughly 7 steps)
> > [25c5b2665fe4cc5a93edd29b62e7c05c15dddd26] drm/i915: implement new set_mode code flow
> > $
> >
> > after that I am back somewhere around
> >
> > 3.6.0-rc2
> >
> > ???
> >
> > Am I doing something wrong? I thought I am bisecting between 3.6 and
> > 3.7.-rc2? How can I go back to 3.6.0-rc2?
>
> Yeah thats fine, bisecting works by going to where commits were
> originally committed, so drm-intel-next was 3.6.0-rc2 at some point
> was only merged into Linus later.

As I mentioned on https://bugs.freedesktop.org/show_bug.cgi?id=55984, I also
hit this bug. The first time was on branch drm-intel-next-2012-09-20 of Daniel
Vetter's drm-intel git tree.

I guess it has something to do with low memory. To reproduce the bug on my
laptop with 8GB RAM and an i5-460M, I did:

1. Boot (I use KDE)
2. Start glxspheres (from http://virtualgl.org/, but glxgears might work too,
not tested)
3. Copy a 1.2 GiB Linux source tree to /dev/shm and /tmp (both tmpfs), 5
times. This uses 6GiB of RAM. I used this bash script:
#!/bin/bash
# Start five copies of the source tree in parallel, all onto tmpfs.
for i in /tmp/hang-l1 /tmp/hang-l2 /tmp/hang-l3 \
         /dev/shm/hang-l1 /dev/shm/hang-l2; do
    cp -ra ~/Linux-src/linux "$i" &
done
# Block until every background copy has finished.
wait
4. When the copy is almost done, watch the machine become sluggish and
eventually print the "[drm:i915_hangcheck_hung] *ERROR* Hangcheck timer
elapsed... GPU hung" message to the kernel log. Until the machine is rebooted,
all OpenGL applications will fail to load.

On kernels where it was working fine, there is no lag when the copy is almost
finished.
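
Since the guess above is that low memory is involved, it may help to note how much memory is free and how much sits in tmpfs while the copies run. This is a small sketch that reads fields from /proc/meminfo; the helper name meminfo_mib is hypothetical, and the optional file argument is only there so it can be tried on a saved copy.

```shell
#!/bin/sh
# Print one /proc/meminfo field in MiB (the file itself reports kB).
# Usage: meminfo_mib FIELD [FILE]   e.g. meminfo_mib MemFree
meminfo_mib() {
    field=$1
    file=${2:-/proc/meminfo}
    # Match the "Field:" label in column 1; exit non-zero if it is missing.
    awk -v f="$field:" '
        $1 == f { printf "%d\n", $2 / 1024; found = 1 }
        END { exit !found }
    ' "$file"
}
```

Running `meminfo_mib MemFree` and `meminfo_mib Shmem` in a loop during step 3 would show whether the hang coincides with free memory running out as the tmpfs copies fill up.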

504c7267a1e84b157cbd7e9c1b805e1bc0c2c846 is the first bad commit
commit 504c7267a1e84b157cbd7e9c1b805e1bc0c2c846
Author: Chris Wilson <[email protected]>
Date: Thu Aug 23 13:12:52 2012 +0100

drm/i915: Use cpu relocations if the object is in the GTT but not mappable

This prevents the case of unbinding the object in order to process the
relocations through the GTT and then rebinding it only to then proceed
to use cpu relocations as the object is now in the CPU write domain. By
choosing to use cpu relocations up front, we can therefore avoid the
rebind penalty.

Signed-off-by: Chris Wilson <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>

:040000 040000 090ed3d52b4f3210b988877f747b6ff86e123385
1d48be89ded4777a543b693db833de64877059c4 M drivers

Regards,
Peter