2008-02-19 12:47:20

by Soeren Sonnenburg

Subject: 2.6.25-rc2 regression - hang on suspend

Hi,

since 2.6.25-rc1 (first version I tried) and still in rc2 (and git), I
see a hang on s2ram already when trying to suspend.

This is on a MacBook Pro 1,1. Which steps should I take next to help
isolate the problem?

Soeren


2008-02-19 13:12:55

by Tino Keitel

Subject: Re: 2.6.25-rc2 regression - hang on suspend

On Tue, Feb 19, 2008 at 12:59:46 +0100, Soeren Sonnenburg wrote:
> Hi,
>
> since 2.6.25-rc1 (first version I tried) and still in rc2 (and git), I
> see a hang on s2ram already when trying to suspend.
>
> This is on a MacBook Pro 1,1. Which steps should I take next to help
> isolate the problem?

Could this be the same as http://lkml.org/lkml/2008/2/17/381?

Regards,
Tino

2008-02-19 21:08:12

by Rafael J. Wysocki

Subject: Re: 2.6.25-rc2 regression - hang on suspend

On Tuesday, 19 of February 2008, Soeren Sonnenburg wrote:
> Hi,

Hi,

> since 2.6.25-rc1 (first version I tried) and still in rc2 (and git), I
> see a hang on s2ram already when trying to suspend.

Does it work with 2.6.24?

Rafael

2008-02-19 22:00:26

by Soeren Sonnenburg

Subject: Re: 2.6.25-rc2 regression - hang on suspend

On Tue, 2008-02-19 at 22:06 +0100, Rafael J. Wysocki wrote:
> On Tuesday, 19 of February 2008, Soeren Sonnenburg wrote:
> > Hi,
>
> Hi,
>
> > since 2.6.25-rc1 (first version I tried) and still in rc2 (and git), I
> > see a hang on s2ram already when trying to suspend.
>
> Does it work with 2.6.24?

yes.

Soeren

2008-02-19 23:51:57

by Rafael J. Wysocki

Subject: Re: 2.6.25-rc2 regression - hang on suspend

On Tuesday, 19 of February 2008, Soeren Sonnenburg wrote:
> On Tue, 2008-02-19 at 22:06 +0100, Rafael J. Wysocki wrote:
> > On Tuesday, 19 of February 2008, Soeren Sonnenburg wrote:
> > > Hi,
> >
> > Hi,
> >
> > > since 2.6.25-rc1 (first version I tried) and still in rc2 (and git), I
> > > see a hang on s2ram already when trying to suspend.
> >
> > Does it work with 2.6.24?
>
> yes.

Please take the current mainline (there are a couple of nasty bugs fixed in
it), configure it with CONFIG_PM_DEBUG set, boot it with "no_console_suspend",
run

# echo 8 > /proc/sys/kernel/printk
# echo devices > /sys/power/pm_test
# echo mem > /sys/power/state

If it hangs, it should leave a stack trace before and I need that trace to see
what's going on. If it doesn't hang, I'll tell you what to do next.

Thanks,
Rafael
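
The three commands above can be wrapped in a small helper that first checks whether the running kernel offers the pm_test interface at all. This is only a sketch: the function name `pm_test_suspend` and the `PM_ROOT` variable are hypothetical (`PM_ROOT` is not a kernel feature, it just lets the function be dry-run against a scratch directory), and it has to run as root on a real system.

```shell
#!/bin/sh
# Sketch of the pm_test procedure quoted above.
# PM_ROOT is a hypothetical override used only for dry runs; leave it
# empty to act on the real /proc and /sys.
pm_test_suspend() {
    level="${1:-devices}"      # pm_test level, e.g. "devices" or "none"
    root="${PM_ROOT:-}"

    # Without this file the test below cannot run: the kernel is either
    # too old or was not built with CONFIG_PM_DEBUG.
    if [ ! -e "$root/sys/power/pm_test" ]; then
        echo "no /sys/power/pm_test: kernel too old or CONFIG_PM_DEBUG unset" >&2
        return 1
    fi

    echo 8 > "$root/proc/sys/kernel/printk" || return 1   # raise console log level
    echo "$level" > "$root/sys/power/pm_test" || return 1 # select the test level
    echo mem > "$root/sys/power/state"                    # start suspend-to-RAM
}
```

Calling `pm_test_suspend devices` mirrors the sequence in the message; a kernel without `/sys/power/pm_test` makes the function fail early instead of silently performing a normal suspend.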

2008-02-20 06:01:23

by Soeren Sonnenburg

Subject: Re: 2.6.25-rc2 regression - hang on suspend

On Wed, 2008-02-20 at 00:50 +0100, Rafael J. Wysocki wrote:
> On Tuesday, 19 of February 2008, Soeren Sonnenburg wrote:
> > On Tue, 2008-02-19 at 22:06 +0100, Rafael J. Wysocki wrote:
> > > On Tuesday, 19 of February 2008, Soeren Sonnenburg wrote:
> > > > Hi,
> > >
> > > Hi,
> > >
> > > > since 2.6.25-rc1 (first version I tried) and still in rc2 (and git), I
> > > > see a hang on s2ram already when trying to suspend.
> > >
> > > Does it work with 2.6.24?
> >
> > yes.
>
> Please take the current mainline (there are a couple of nasty bugs fixed in
> it), configure it with CONFIG_PM_DEBUG set, boot it with "no_console_suspend",
> run
>
> # echo 8 > /proc/sys/kernel/printk
> # echo devices > /sys/power/pm_test
> # echo mem > /sys/power/state
>
> If it hangs, it should leave a stack trace before and I need that trace to see
> what's going on. If it doesn't hang, I'll tell you what to do next.

I tried with 2.6.24.2 with CONFIG_PM_DEBUG set, following your steps and
yes it works flawlessly (though the display did not come back I could
suspend/resume multiple times without problems, and finally s2ram -f -p
brought the display back).

So what next?

Soeren

2008-02-21 00:33:21

by Rafael J. Wysocki

Subject: Re: 2.6.25-rc2 regression - hang on suspend

On Wednesday, 20 of February 2008, Soeren Sonnenburg wrote:
> On Wed, 2008-02-20 at 00:50 +0100, Rafael J. Wysocki wrote:
> > On Tuesday, 19 of February 2008, Soeren Sonnenburg wrote:
> > > On Tue, 2008-02-19 at 22:06 +0100, Rafael J. Wysocki wrote:
> > > > On Tuesday, 19 of February 2008, Soeren Sonnenburg wrote:
> > > > > Hi,
> > > >
> > > > Hi,
> > > >
> > > > > since 2.6.25-rc1 (first version I tried) and still in rc2 (and git), I
> > > > > see a hang on s2ram already when trying to suspend.
> > > >
> > > > Does it work with 2.6.24?
> > >
> > > yes.
> >
> > Please take the current mainline (there are a couple of nasty bugs fixed in
> > it), configure it with CONFIG_PM_DEBUG set, boot it with "no_console_suspend",
> > run
> >
> > # echo 8 > /proc/sys/kernel/printk
> > # echo devices > /sys/power/pm_test
> > # echo mem > /sys/power/state
> >
> > If it hangs, it should leave a stack trace before and I need that trace to see
> > what's going on. If it doesn't hang, I'll tell you what to do next.
>
> I tried with 2.6.24.2 with CONFIG_PM_DEBUG set, following your steps and
> yes it works flawlessly (though the display did not come back I could
> suspend/resume multiple times without problems, and finally s2ram -f -p
> brought the display back).

Hm, there's no /sys/power/pm_test in 2.6.24.2 (and "current mainline"
means the latest -git kernel available, i.e. the current top of Linus' tree),
so in fact you tested 2.6.24 again, which is known to work ...

Thanks,
Rafael

2008-02-21 11:06:19

by Soeren Sonnenburg

Subject: Re: 2.6.25-rc2 regression - hang on suspend

On Thu, 2008-02-21 at 01:31 +0100, Rafael J. Wysocki wrote:
> On Wednesday, 20 of February 2008, Soeren Sonnenburg wrote:
> > On Wed, 2008-02-20 at 00:50 +0100, Rafael J. Wysocki wrote:
> > > On Tuesday, 19 of February 2008, Soeren Sonnenburg wrote:
> > > > On Tue, 2008-02-19 at 22:06 +0100, Rafael J. Wysocki wrote:
> > > > > On Tuesday, 19 of February 2008, Soeren Sonnenburg wrote:
> > > > > > Hi,
> > > > >
> > > > > Hi,
> > > > >
> > > > > > > since 2.6.25-rc1 (first version I tried) and still in rc2 (and git), I
> > > > > > > see a hang on s2ram already when trying to suspend.
> > > > >
> > > > > Does it work with 2.6.24?
> > > >
> > > > yes.
> > >
> > > > Please take the current mainline (there are a couple of nasty bugs fixed in
> > > > it), configure it with CONFIG_PM_DEBUG set, boot it with "no_console_suspend",
> > > > run
> > >
> > > # echo 8 > /proc/sys/kernel/printk
> > > # echo devices > /sys/power/pm_test
> > > # echo mem > /sys/power/state
> > >
> > > If it hangs, it should leave a stack trace before and I need that trace to see
> > > what's going on. If it doesn't hang, I'll tell you what to do next.
> >
> > I tried with 2.6.24.2 with CONFIG_PM_DEBUG set, following your steps and
> > yes it works flawlessly (though the display did not come back I could
> > suspend/resume multiple times without problems, and finally s2ram -f -p
> > brought the display back).
>
> Hm, there's no /sys/power/pm_test in 2.6.24.2 (and "current mainline"
> means the latest -git kernel available, i.e. the current top of Linus' tree),
> so in fact you tested 2.6.24 again, which is known to work ...

Great :(

Anyway testing linus' git-current, I see that it does not hang. However
after the echo mem >/sys/power/state I am seeing:
[...]
PM: Finishing wakeup.
Restarting tasks ... done.
ata1: port is slow to respond, please be patient (Status 0x80)
ata1: device not ready (errno=-16), forcing hardreset


and this a couple of times. Using echo none >/sys/power/pm_test and then
echo mem >/sys/power/state I see it hang on ata1 errors again. Waiting
about 10-30 seconds it progresses further and finally arrives at

CPU0 attaching NULL sched-domain
CPU1 attaching NULL sched-domain

then hangs.

So what next?
Soeren

2008-02-21 23:08:21

by Rafael J. Wysocki

Subject: Re: 2.6.25-rc2 regression - hang on suspend

On Thursday, 21 of February 2008, Soeren Sonnenburg wrote:
> On Thu, 2008-02-21 at 01:31 +0100, Rafael J. Wysocki wrote:
> > On Wednesday, 20 of February 2008, Soeren Sonnenburg wrote:
> > > On Wed, 2008-02-20 at 00:50 +0100, Rafael J. Wysocki wrote:
> > > > On Tuesday, 19 of February 2008, Soeren Sonnenburg wrote:
> > > > > On Tue, 2008-02-19 at 22:06 +0100, Rafael J. Wysocki wrote:
> > > > > > On Tuesday, 19 of February 2008, Soeren Sonnenburg wrote:
> > > > > > > Hi,
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > > > since 2.6.25-rc1 (first version I tried) and still in rc2 (and git), I
> > > > > > > > see a hang on s2ram already when trying to suspend.
> > > > > >
> > > > > > Does it work with 2.6.24?
> > > > >
> > > > > yes.
> > > >
> > > > > Please take the current mainline (there are a couple of nasty bugs fixed in
> > > > > it), configure it with CONFIG_PM_DEBUG set, boot it with "no_console_suspend",
> > > > > run
> > > >
> > > > # echo 8 > /proc/sys/kernel/printk
> > > > # echo devices > /sys/power/pm_test
> > > > # echo mem > /sys/power/state
> > > >
> > > > If it hangs, it should leave a stack trace before and I need that trace to see
> > > > what's going on. If it doesn't hang, I'll tell you what to do next.
> > >
> > > I tried with 2.6.24.2 with CONFIG_PM_DEBUG set, following your steps and
> > > yes it works flawlessly (though the display did not come back I could
> > > suspend/resume multiple times without problems, and finally s2ram -f -p
> > > brought the display back).
> >
> > Hm, there's no /sys/power/pm_test in 2.6.24.2 (and "current mainline"
> > means the latest -git kernel available, i.e. the current top of Linus' tree),
> > so in fact you tested 2.6.24 again, which is known to work ...
>
> Great :(
>
> Anyway testing linus' git-current, I see that it does not hang. However
> after the echo mem >/sys/power/state I am seeing:
> [...]
> PM: Finishing wakeup.
> Restarting tasks ... done.
> ata1: port is slow to respond, please be patient (Status 0x80)
> ata1: device not ready (errno=-16), forcing hardreset
>
>
> and this a couple of times. Using echo none >/sys/power/pm_test and then
> echo mem >/sys/power/state I see it hang on ata1 errors again. Waiting
> about 10-30 seconds it progresses further and finally arrives at
>
> CPU0 attaching NULL sched-domain
> CPU1 attaching NULL sched-domain
>
> then hangs.

Please see if compiling the kernel with CONFIG_SMP unset makes suspend
work.

Also, if you could find the commit that broke things, it would probably help a
lot.

Thanks,
Rafael
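
Finding the commit that broke things is usually done with git bisect. A minimal sketch, assuming a clone of Linus' tree and taking v2.6.24 (reported working) and v2.6.25-rc2 (reported hanging) as the endpoints; the function name and the `GIT` override are hypothetical and exist only so the commands can be printed with `GIT=echo` instead of executed:

```shell
#!/bin/sh
# Start a bisection between the last known-good and first known-bad kernel.
# GIT is a hypothetical override for dry runs (GIT=echo prints the commands).
start_suspend_bisect() {
    good="${1:-v2.6.24}"       # last version reported to suspend correctly
    bad="${2:-v2.6.25-rc2}"    # version reported to hang on suspend
    git_cmd="${GIT:-git}"

    $git_cmd bisect start || return 1
    $git_cmd bisect bad "$bad" || return 1
    $git_cmd bisect good "$good"
}
# After each step: build and boot the checked-out kernel, try to suspend,
# then run "git bisect good" or "git bisect bad" until git reports the
# first bad commit. "git bisect reset" returns to the original branch.
```

Since the hang in this thread is not fully reproducible, it may be worth testing each step several times before marking it good.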

2008-02-22 07:00:43

by Soeren Sonnenburg

Subject: Re: 2.6.25-rc2 regression - hang on suspend

On Fri, 2008-02-22 at 00:06 +0100, Rafael J. Wysocki wrote:
> On Thursday, 21 of February 2008, Soeren Sonnenburg wrote:
> > On Thu, 2008-02-21 at 01:31 +0100, Rafael J. Wysocki wrote:
> > > On Wednesday, 20 of February 2008, Soeren Sonnenburg wrote:
> > > > On Wed, 2008-02-20 at 00:50 +0100, Rafael J. Wysocki wrote:
[...]
> > Using echo none >/sys/power/pm_test and then
> > echo mem >/sys/power/state I see it hang on ata1 errors again. Waiting
> > about 10-30 seconds it progresses further and finally arrives at
> >
> > CPU0 attaching NULL sched-domain
> > CPU1 attaching NULL sched-domain
> >
> > then hangs.
>
> Please see if compiling the kernel with CONFIG_SMP unset makes suspend
> work.

*Argh*, this bug is not behaving nicely :( Whatever happened,
git-current now suspends correctly with and without CONFIG_SMP and in all
my CONFIG_PREEMPT_RCU=y and CONFIG_CLASSIC_RCU=y attempts. Also no SATA
errors anymore.

However it is not reliably waking up (at least when all of the above
except CLASSIC_RCU is on). Sometimes the display remains black on the
console, but X still works and sometimes it hangs completely on resume.

Also when compiling these many kernels via make -j4 I noted that I could
hardly move the mouse / use the keyboard, but saw random jumps and
key-repetitions...

Soeren

2008-02-22 15:58:24

by Rafael J. Wysocki

Subject: Re: 2.6.25-rc2 regression - hang on suspend

On Friday, 22 of February 2008, Soeren Sonnenburg wrote:
> On Fri, 2008-02-22 at 00:06 +0100, Rafael J. Wysocki wrote:
> > On Thursday, 21 of February 2008, Soeren Sonnenburg wrote:
> > > On Thu, 2008-02-21 at 01:31 +0100, Rafael J. Wysocki wrote:
> > > > On Wednesday, 20 of February 2008, Soeren Sonnenburg wrote:
> > > > > On Wed, 2008-02-20 at 00:50 +0100, Rafael J. Wysocki wrote:
> [...]
> > > Using echo none >/sys/power/pm_test and then
> > > echo mem >/sys/power/state I see it hang on ata1 errors again. Waiting
> > > about 10-30 seconds it progresses further and finally arrives at
> > >
> > > CPU0 attaching NULL sched-domain
> > > CPU1 attaching NULL sched-domain
> > >
> > > then hangs.
> >
> > Please see if compiling the kernel with CONFIG_SMP unset makes suspend
> > work.
>
> *Argh*, this bug is not behaving nicely :( Whatever happened,
> git-current now suspends correctly with and without CONFIG_SMP and in all
> my CONFIG_PREEMPT_RCU=y and CONFIG_CLASSIC_RCU=y attempts. Also no SATA
> errors anymore.
>
> However it is not reliably waking up (at least when all of the above
> except CLASSIC_RCU is on). Sometimes the display remains black on the
> console, but X still works and sometimes it hangs completely on resume.

I have created the Bugzilla entry at http://bugzilla.kernel.org/show_bug.cgi?id=10065
for this issue. Please add yourself to the CC list in there and update it with
any new findings.

> Also when compiling these many kernels via make -j4 I noted that I could
> hardly move the mouse / use the keyboard, but saw random jumps and
> key-repetitions...

This last bit is most likely a scheduler issue. Do you have CONFIG_GROUP_SCHED
set by chance? If you do, please try to unset it and see if that helps.

Thanks,
Rafael
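
Whether an option such as CONFIG_GROUP_SCHED or CONFIG_SMP is set can be checked directly in the kernel configuration file. A small sketch, assuming the config is a plain-text file (for example `.config` in the build tree); the helper name `config_is_set` is hypothetical:

```shell
#!/bin/sh
# Return success if the given symbol is enabled (=y or =m) in a kernel
# config file. Disabled options appear as "# CONFIG_FOO is not set" and
# therefore do not match.
config_is_set() {
    symbol="$1"
    config="${2:-.config}"
    grep -q "^${symbol}=[ym]\$" "$config"
}

# Example use, run from the top of the kernel build tree:
#   config_is_set CONFIG_GROUP_SCHED && echo "group scheduling is enabled"
```

To turn the option off, disable it via `make menuconfig` (or edit `.config` and run `make oldconfig`) and rebuild.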

2008-02-23 07:38:54

by Soeren Sonnenburg

Subject: Re: 2.6.25-rc2 regression - hang on suspend

On Fri, 2008-02-22 at 16:56 +0100, Rafael J. Wysocki wrote:
> On Friday, 22 of February 2008, Soeren Sonnenburg wrote:
> > On Fri, 2008-02-22 at 00:06 +0100, Rafael J. Wysocki wrote:
> > > On Thursday, 21 of February 2008, Soeren Sonnenburg wrote:
> > > > On Thu, 2008-02-21 at 01:31 +0100, Rafael J. Wysocki wrote:
> > > > > On Wednesday, 20 of February 2008, Soeren Sonnenburg wrote:
> > > > > > On Wed, 2008-02-20 at 00:50 +0100, Rafael J. Wysocki wrote:
> > [...]
> > Also when compiling these many kernels via make -j4 I noted that I could
> > hardly move the mouse / use the keyboard, but saw random jumps and
> > key-repetitions...
>
> This last bit is most likely a scheduler issue. Do you have CONFIG_GROUP_SCHED
> set by chance? If you do, please try to unset it and see if that helps.

Yes I had. Disabling this helped a lot -- the kernel seems to behave
normally with this option unset.

Soeren