2010-11-12 00:32:00

by Zdenek Kabelac

Subject: Regressions in resume on intel graphics

Hi

I've noticed that after resume with the 2.6.36 kernel I need to switch
to a console and back to Xorg to get a usable X session again.
(my hw - T61, gma965, 4GB)

I've played the bisect game, and this is the first broken commit:

---
commit 8fd4bd22350784d5b2fe9274f6790ba353976415
Author: Jesse Barnes <[email protected]>
Date: Wed Jun 23 12:56:12 2010 -0700

vt/console: try harder to print output when panicing
---

Also, I'd like to draw attention to another suspend/resume killer (at
least on my T61):
https://bugzilla.kernel.org/show_bug.cgi?id=19052
https://bugzilla.redhat.com/show_bug.cgi?id=617809

Disabling the polling thread fixes the problem 100% of the time, but
it's probably not the best fix.


Zdenek


2010-11-12 09:17:11

by Zdenek Kabelac

Subject: Re: Regressions in resume on intel graphics

2010/11/12 Zdenek Kabelac <[email protected]>:
> Hi
>
> I've noticed that after resume with 2.6.36 kernel I need to switch
> between console and back to Xorg to get usable Xsession again.
> (my hw - T61, gma965, 4GB)
>
> I've played bisect game - and this is the first broken kernel:
>
> ---
> commit 8fd4bd22350784d5b2fe9274f6790ba353976415
> Author: Jesse Barnes <[email protected]>
> Date:   Wed Jun 23 12:56:12 2010 -0700
>
>    vt/console: try harder to print output when panicing
> ---

I've been able to boot and test with 2.6.37-rc1-00170-gf6614b7
- and this problem seems to be fixed in this version (vt.c file seems
to be gone?).

>
> Also I'd like to increase focus on another suspend/resume killer (at
> least on my T61)
> https://bugzilla.kernel.org/show_bug.cgi?id=19052
> https://bugzilla.redhat.com/show_bug.cgi?id=617809
>

With kernel 2.6.37-rc1-00170-gf6614b7 the message has changed (with
the polling thread enabled):
--
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[drm:i915_reset] *ERROR* Failed to reset chip.
X[1378]: segfault at 0 ip 00007f33409c727d sp 00007fff930e7120 error 4 in intel_drv.so[7f3340990000+4a000]
[drm:intel_panel_get_max_backlight] *ERROR* fixme: max PWM is zero.
--

But I can't be sure whether it's the same bug: the display seems to be
enabled for a second after resume, and then everything just crashes.
With the original EIR bug 19052 my display remains completely black.

I'm using Fedora Rawhide Xorg & driver:
xorg-x11-server-Xorg-1.9.1-3.fc15.x86_64
xorg-x11-drv-intel-2.12.0-8.fc15.x86_64

Zdenek

2010-11-19 21:03:57

by Jesse Barnes

Subject: Re: Regressions in resume on intel graphics

On Fri, 12 Nov 2010 10:17:07 +0100
Zdenek Kabelac <[email protected]> wrote:

> 2010/11/12 Zdenek Kabelac <[email protected]>:
> > Hi
> >
> > I've noticed that after resume  with 2.6.36 kernel I need to switch
> > between console and back to Xorg to get usable Xsession again.
> > (my hw - T61, gma965, 4GB)
> >
> > I've played bisect game -  and this is the first broken kernel:
> >
> > ---
> > commit 8fd4bd22350784d5b2fe9274f6790ba353976415
> > Author: Jesse Barnes <[email protected]>
> > Date:   Wed Jun 23 12:56:12 2010 -0700
> >
> >    vt/console: try harder to print output when panicing
> > ---
>
> I've been able to boot and test with 2.6.37-rc1-00170-gf6614b7
> - and this problem seems to be fixed in this version (vt.c file seems
> to be gone?).

This is really weird; there was one issue Dave tracked down related to
lockdep and the new oops code, but I think it's been fixed. And it
should manifest as something other than a GPU hang...

--
Jesse Barnes, Intel Open Source Technology Center

2010-11-19 21:35:23

by Zdenek Kabelac

Subject: Re: Regressions in resume on intel graphics

2010/11/19 Jesse Barnes <[email protected]>:
> On Fri, 12 Nov 2010 10:17:07 +0100
> Zdenek Kabelac <[email protected]> wrote:
>
>> 2010/11/12 Zdenek Kabelac <[email protected]>:
>> > Hi
>> >
>> > I've noticed that after resume with 2.6.36 kernel I need to switch
>> > between console and back to Xorg to get usable Xsession again.
>> > (my hw - T61, gma965, 4GB)
>> >
>> > I've played bisect game - and this is the first broken kernel:
>> >
>> > ---
>> > commit 8fd4bd22350784d5b2fe9274f6790ba353976415
>> > Author: Jesse Barnes <[email protected]>
>> > Date:   Wed Jun 23 12:56:12 2010 -0700
>> >
>> >    vt/console: try harder to print output when panicing
>> > ---
>>
>> I've been able to boot and test with 2.6.37-rc1-00170-gf6614b7
>> - and this problem seems to be fixed in this version (vt.c file seems
>> to be gone?).
>
> This is really weird; there was one issue Dave tracked down related to
> lockdep and the new oops code, but I think it's been fixed. And it
> should manifest as something other than a GPU hang...
>

Well, I'm unsure how this worked OK with 2.6.37-rc1-00170-gf6614b7. I
could try a few more times, though this kernel seems to crash for
various other reasons, so it took quite a few test runs before I was
able to test this in a very limited X environment. Maybe it was just
a lucky case, but I still keep that kernel on my disk.

Anyway, as an update: I'm now regularly using 2.6.37-rc2, and it has
exactly the same problems as in my original report, so the problem
has not magically disappeared. Is there anything I should try?

Basically, every time after resume from suspend I have to switch from X
to a console and back to get a usable screen. Without this I see only
a black screen with the mouse pointer over it. Is there some way to
revert this patch, or maybe just disable some part of it?


To have a reliable resume I still have to keep the drm_kms_help thread
disabled; otherwise I observe GPU errors.
Switching between consoles is 'relatively' easy to live with.

Another thing which might help: during suspend/resume I can usually
see a weird switch to the console, with some special text filling the
whole console screen:

[ ###.... ]
[ ###.... ]

where '###' is some changing number that seems to differ between
resumes.
I should also add that I'm using the 'no-console-suspend' kernel boot
option.

Zdenek

2010-11-19 22:10:31

by David Airlie

Subject: Re: Regressions in resume on intel graphics

On Fri, 2010-11-19 at 22:35 +0100, Zdenek Kabelac wrote:
> 2010/11/19 Jesse Barnes <[email protected]>:
> > On Fri, 12 Nov 2010 10:17:07 +0100
> > Zdenek Kabelac <[email protected]> wrote:
> >
> >> 2010/11/12 Zdenek Kabelac <[email protected]>:
> >> > Hi
> >> >
> >> > I've noticed that after resume with 2.6.36 kernel I need to switch
> >> > between console and back to Xorg to get usable Xsession again.
> >> > (my hw - T61, gma965, 4GB)
> >> >
> >> > I've played bisect game - and this is the first broken kernel:
> >> >
> >> > ---
> >> > commit 8fd4bd22350784d5b2fe9274f6790ba353976415
> >> > Author: Jesse Barnes <[email protected]>
> >> > Date: Wed Jun 23 12:56:12 2010 -0700
> >> >
> >> > vt/console: try harder to print output when panicing
> >> > ---
> >>
> >> I've been able to boot and test with 2.6.37-rc1-00170-gf6614b7
> >> - and this problem seems to be fixed in this version (vt.c file seems
> >> to be gone?).
> >
> > This is really weird; there was one issue Dave tracked down related to
> > lockdep and the new oops code, but I think it's been fixed. And it
> > should manifest as something other than a GPU hang...
> >
>
> Well - unsure - how this worked ok - for 2.6.37-rc1-00170-gf6614b7 (I
> could try few more times - though this kernel seems to be crashing for
> various other reasons - so I've had quite few testruns before I've
> been able to test this in some very limited X environment - so maybe
> it was just some lucky case - but I still keep it on my disk)
>
> Anyway - as an update - I'm now regularly using 2.6.37-rc2 - and it
> has exactly same problems - as in my original report - so the problem
> has not magically disappeared and it is still there. Is there
> anything I should try ?
>
> Basically every time after resume from suspend I have to switch from X
> to console and back to get usable screen back. Without this I seen
> only black screen with mouse over it - is there some way for reverting
> of this patch - or maybe just disabling some part of it ?
>
>
> To have reliable resume I still have to keep drm_kms_help thread
> disable - otherwise I observe GPU errors.
> Switch between consoles is 'relatively' easy to overcome.
>
> And another thing which might help - during suspend/resume I could
> usually see weird switch to console with some special text on the
> whole console screen -
>
> [ ###.... ]
> [ ###.... ]
>
> where '###' is some changing number - and it seems to be different
> between resumes.
> Also I should add I'm using 'no-console-suspend' kernel boot option.

Can you see if e0fdace10e75dac67d906213b780ff1b1a4cc360 reverts cleanly
and fixes it?

The problem, I think, is that you are getting a lockdep splat or RCU
issue before suspending, which sets oops_in_progress and never unsets
it. That means on resume the fb resume path kicks in to show you the
oops it thinks is happening, when there isn't actually anything to
show. I've gotten acks to have this reverted; I just need to send the
patch.

Dave
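
A toy model of the sequence described above may make the failure mode
easier to follow. It is a user-space sketch under stated assumptions,
not kernel code: only the flag name oops_in_progress comes from this
thread. The class ToyKernel, its methods and the "fbcon"/"X" labels are
hypothetical, and the real vt/fb paths are considerably more involved.

```python
# Toy user-space model; NOT kernel code. Only `oops_in_progress` is a name
# taken from the thread; every class and method below is hypothetical.


class ToyKernel:
    def __init__(self):
        self.oops_in_progress = False  # flag named in the thread
        self.display_owner = "X"       # who drives the screen right now

    def lockdep_splat(self, clear_flag_afterwards):
        """Report a debugging warning (the suspected trigger)."""
        self.oops_in_progress = True
        if clear_flag_afterwards:
            # What one would expect once the message has been printed.
            self.oops_in_progress = False

    def suspend_resume(self):
        """On resume, a stale flag lets the fb path take over the screen."""
        if self.oops_in_progress:
            # The fb path tries to show an oops that is not there.
            self.display_owner = "fbcon"

    def vt_switch_round_trip(self):
        """Assumed workaround: console and back hands the display to X."""
        self.display_owner = "X"


def screen_after_resume(clear_flag_afterwards):
    """Run splat -> suspend/resume and report who owns the display."""
    kernel = ToyKernel()
    kernel.lockdep_splat(clear_flag_afterwards)
    kernel.suspend_resume()
    return kernel.display_owner


if __name__ == "__main__":
    print("flag left set:", screen_after_resume(False))
    print("flag cleared: ", screen_after_resume(True))
```

In this model, leaving the flag set ends with the framebuffer console
owning the display, which would be consistent with the black X screen
that a VT switch clears; clearing the flag leaves the display with X.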

2010-11-19 23:05:45

by Zdenek Kabelac

Subject: Re: Regressions in resume on intel graphics

2010/11/19 Dave Airlie <[email protected]>:
> On Fri, 2010-11-19 at 22:35 +0100, Zdenek Kabelac wrote:
>> 2010/11/19 Jesse Barnes <[email protected]>:
>> > On Fri, 12 Nov 2010 10:17:07 +0100
>> > Zdenek Kabelac <[email protected]> wrote:
>> >
>> >> 2010/11/12 Zdenek Kabelac <[email protected]>:
>> >> > Hi
>> >> >
>> >> > I've noticed that after resume with 2.6.36 kernel I need to switch
>> >> > between console and back to Xorg to get usable Xsession again.
>> >> > (my hw - T61, gma965, 4GB)
>> >> >
>> >> > I've played bisect game - and this is the first broken kernel:
>> >> >
>> >> > ---
>> >> > commit 8fd4bd22350784d5b2fe9274f6790ba353976415
>> >> > Author: Jesse Barnes <[email protected]>
>> >> > Date:   Wed Jun 23 12:56:12 2010 -0700
>> >> >
>> >> >    vt/console: try harder to print output when panicing
>> >> > ---
>> >>
>> >> I've been able to boot and test with 2.6.37-rc1-00170-gf6614b7
>> >> - and this problem seems to be fixed in this version (vt.c file seems
>> >> to be gone?).
>> >
>> > This is really weird; there was one issue Dave tracked down related to
>> > lockdep and the new oops code, but I think it's been fixed. And it
>> > should manifest as something other than a GPU hang...
>> >
>>
>> Well - unsure - how this worked ok - for 2.6.37-rc1-00170-gf6614b7 (I
>> could try few more times - though this kernel seems to be crashing for
>> various other reasons - so I've had quite few testruns before I've
>> been able to test this in some very limited X environment - so maybe
>> it was just some lucky case - but I still keep it on my disk)
>>
>> Anyway - as an update - I'm now regularly using 2.6.37-rc2 - and it
>> has exactly same problems - as in my original report - so the problem
>> has not magically disappeared and it is still there. Is there
>> anything I should try ?
>>
>> Basically every time after resume from suspend I have to switch from X
>> to console and back to get usable screen back. Without this I seen
>> only black screen with mouse over it - is there some way for reverting
>> of this patch - or maybe just disabling some part of it ?
>>
>>
>> To have reliable resume I still have to keep drm_kms_help thread
>> disable - otherwise I observe GPU errors.
>> Switch between consoles is 'relatively' easy to overcome.
>>
>> And another thing which might help - during suspend/resume I could
>> usually see weird switch to console with some special text on the
>> whole console screen -
>>
>> [ ###.... ]
>> [ ###.... ]
>>
>> where '###' is some changing number - and it seems to be different
>> between resumes.
>> Also I should add I'm using 'no-console-suspend' kernel boot option.
>
> Can you see if e0fdace10e75dac67d906213b780ff1b1a4cc360 reverts cleanly?
> and fixes it?
>
> The problem is I think you are getting a lockdep splat or rcu issue
> before suspending, which sets oops_in_progress and never unsets it,
> which means on resume the fb resume path kicks in to show you the oops
> that is happening when there isn't actually anything to show. I've
> gotten acks to have this reverted I just need to send the patch.
>


OK, I tested the current vanilla kernel
(6656b3fc8aba2eb7ca00c06c7fe4917938b0b652) with commit e0fdace10e75da
reverted, and the Xorg screen seems to work properly again after
resume.
So this was quick. Now the remaining problem:
https://bugzilla.kernel.org/show_bug.cgi?id=19052

Zdenek