2008-01-31 21:41:18

by Michael Tokarev

Subject: swsusp on an AMD x2-64, 2.6.24: regression?

Since I upgraded from 2.6.23 to 2.6.24, suspend to
disk does not work anymore on this machine. I have been
trying to debug this for several hours now,
without much luck so far.

The machine is based on AMD X2-64 (BE-2400) CPU and
NVidia MCP51PV (GeForce 6150/NForce 430) chipset.

Up until 2.6.23 (both 32- and 64-bit kernels), suspend/
resume worked just fine without any glitch.

With 2.6.24, it tries to suspend, saves pages to disk,
then prints this:

..Saving pages... done.
Sl
Suspending console(s)
_

At this point, nothing more happens. It does not
react to keyboard or to any other external events,
only reset/poweroff helps. Note the "Sl" stuff -
I guess it's the beginning of some message, but I
don't know which one.

After a hard reboot after that, the system resumes
from the saved image.

This happens with both 32- and 64-bit kernels, when
using either good old `echo disk > /sys/power/state'
or when using s2disk from uswsusp (with the 32-bit kernel,
as this utility does not work with a 64-bit kernel when
compiled as a 32-bit application). Note that when using
s2disk with 2.6.24, a very similar picture is shown -
that same "Sl" line.

The only noticeable changes in my config (the x86-64 one)
compared with the one from 2.6.23 are that I enabled tickless
(dynticks) and high-res timers, CPU_IDLE and FAIR_CGROUP_SCHED.
I recompiled the kernel without those options (one by one);
the effect on suspend is exactly the same.

Is there anything I can do to further diagnose this?

Thanks!

/mjt


2008-01-31 23:12:18

by Pavel Machek

Subject: Re: swsusp on an AMD x2-64, 2.6.24: regression?

On Fri 2008-02-01 00:41:06, Michael Tokarev wrote:
> Since I upgraded from 2.6.23 to 2.6.24, suspend to
> disk does not work anymore on this machine. I'm
> trying to debug this now, for several hours already,
> without much luck so far.
>
> The machine is based on AMD X2-64 (BE-2400) CPU and
> NVidia MCP51PV (GeForce 6150/NForce 430) chipset.
>
> Up until 2.6.23 (both 32- and 64-bit kernels), suspend/
> resume worked just fine without any glitch.
>
> With 2.6.24, it tries to suspend, saves pages to disk,
> when prints this:
>
> ..Saving pages... done.
> Sl
> Suspending console(s)
> _
>
> At this point, nothing more happens. It does not
> react to keyboard or to any other external events,
> only reset/poweroff helps. Note the "Sl" stuff -
> I guess it's a beginning of some message, but I
> don't know which one.
>
> After a hard reboot after that, the system resumes
> from the saved image.
>
> This happens with both 32- and 64-bits kernels, when
> using either good'old `echo disk > /sys/power/state'
> or when using s2disk from uswsusp (with 32bits kernel,
> as this utility does not work with 64bits kernel when
> compiled as 32bits application). Note that when using
> s2disk with 2.6.24, very similar picture is shown -
> that same "Sl" line.
>
> The only noticeable changes in my config (x86-64 one)
> compared with the one from 2.6.23 is - I enabled tickless
> (dyntics) and high-res timers, CPU_IDLE and FAIR_CGROUP_SCHED.
> I recompiled the kernel without those options (on by one), --
> the effect on suspend is exactly the same.
>
> Is there anything I can do to further diagnose this?

no_console_suspend (sp?), nohz=off, highres=off, and try with minimum
config.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
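
[Editor's sketch, not part of the original message.] The three boot
parameters suggested above go on the kernel command line. A minimal way to
confirm they actually reached the running kernel might look like the
following. The function names has_param and report_params are hypothetical
helpers introduced only for this illustration; the parameter spellings are
the ones used in this thread.

```shell
# has_param CMDLINE PARAM
# Succeed if PARAM appears as a whole space-separated word in CMDLINE.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# report_params CMDLINE
# Print one "name: present" or "name: missing" line for each parameter
# suggested in the thread.
report_params() {
    for p in no_console_suspend nohz=off highres=off; do
        if has_param "$1" "$p"; then
            echo "$p: present"
        else
            echo "$p: missing"
        fi
    done
}

# Usage on a running system (reads the real command line):
#   report_params "$(cat /proc/cmdline)"
```

Matching on whole words avoids a false positive from a longer parameter
that merely contains one of these strings.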

2008-02-01 10:16:20

by Michael Tokarev

Subject: Re: swsusp on an AMD x2-64, 2.6.24: regression?

Pavel Machek wrote:
> On Fri 2008-02-01 00:41:06, Michael Tokarev wrote:
[]
>> With 2.6.24, it tries to suspend, saves pages to disk,
>> when prints this:
>>
>> ..Saving pages... done.
>> Sl

It's actually "S|", not "Sl".

>> Suspending console(s)
>> _
>>
>> At this point, nothing more happens. It does not
>> react to keyboard or to any other external events,

..because the keyboard is USB-connected, and it shuts
down all USB devices. I'll try with a PS/2 keyboard
(when I find the one I had somewhere... ;)

[]
> no_console_suspend (sp?), nohz=off, highres=off, and try with minimum
> config.

no_console_suspend it is. Tried that, the "S|" thing is still
there, but instead of "Suspending console(s)" it now shows
the progress of suspending other devices. The end result is
the same - finally it stops and sits there ad infinitum.

nohz and highres are useless now, as I recompiled the kernel
without support for those, and without CPU_IDLE and other
fancy stuff, and disabled cpufreq just in case.

What's a minimum config? Should I turn off SMP (it's a dual-core
CPU by the way)? Something else? (I already removed most
driver modules when trying suspend - only the ones which are
absolutely necessary are left).

I've read Documentation/power/tricks.txt. From that list,
I have the following:

o all drivers are unloaded except disk and usb (keyboard)
o preempt is disabled (was never enabled)
o APIC IS in use.
o modules are in use. Is it worth trying a module-less kernel?
o vga text console - not even "vga" per se, - no framebuffers
and such, not even as modules. No "video mode switching
support" is enabled.
o only a few processes left, in like single-user mode.

One other difference between 2.6.23 and 2.6.24 as I see here
is: 2.6.24 tells me about TSC instability (when I load cpufreq
stuff), while 2.6.23 did not. This is about 64-bit mode - with
32 bits, both switch from tsc to hpet, so in this regard,
2.6.24 (with 32 bits) is not different from 2.6.23 it seems
(I mean in relation to the suspend issues, since 32-bit .23
mentioned tsc instability yet it suspended fine).

So I'm.. stuck. :) Don't know where to go from here.

Thanks!

/mjt

2008-02-01 11:27:24

by Rafael J. Wysocki

Subject: Re: swsusp on an AMD x2-64, 2.6.24: regression?

On Friday, 1 of February 2008, Michael Tokarev wrote:
> Pavel Machek wrote:
> > On Fri 2008-02-01 00:41:06, Michael Tokarev wrote:
> []
> >> With 2.6.24, it tries to suspend, saves pages to disk,
> >> when prints this:
> >>
> >> ..Saving pages... done.
> >> Sl
>
> It's actually "S|", not "Sl".
>
> >> Suspending console(s)
> >> _
> >>
> >> At this point, nothing more happens. It does not
> >> react to keyboard or to any other external events,
>
> ..because the keyboard is USB-connected, and it shuts
> down all USB devices. I'll try with PS/2 keyboard
> (when I'll find one I had somewhere... ;)
>
> []
> > no_console_suspend (sp?), nohz=off, highres=off, and try with minimum
> > config.
>
> no_console_suspend it is. Tried that, the "S|" thing is still
> here, but instead of "Suspending console(s)" it now shows
> progress of suspending other devices. The end result is
> the same - finally it stops and sits here ad infinitum.

I guess it's a special variation of
http://bugzilla.kernel.org/show_bug.cgi?id=9528

Please try to hibernate in the shutdown mode (ie. echo
"shutdown" into /sys/power/disk before hibernation).

Thanks,
Rafael
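
[Editor's sketch, not part of the original message.] The "shutdown mode"
test described above amounts to two sysfs writes. The helper below is a
hypothetical wrapper (the name hibernate_shutdown and the optional
directory argument are assumptions made for illustration); the directory
argument exists only so the sequence can be rehearsed against a scratch
directory before running it as root against /sys/power.

```shell
# hibernate_shutdown [POWER_DIR]
# Select the "shutdown" hibernation mode, then start hibernation.
# POWER_DIR defaults to /sys/power; pass another directory to rehearse
# the two writes without touching the real sysfs files.
hibernate_shutdown() {
    power_dir=${1:-/sys/power}

    if [ ! -d "$power_dir" ]; then
        echo "no such directory: $power_dir" >&2
        return 1
    fi

    # Power the machine off after the image is written, instead of using
    # the platform (ACPI) method.
    echo shutdown > "$power_dir/disk" || return 1

    # Trigger suspend-to-disk.
    echo disk > "$power_dir/state"
}

# Usage (as root, on the real machine):
#   hibernate_shutdown
```

If hibernation completes in this mode but hangs in the default one, that
points at the platform power-off path rather than at image writing.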

2008-02-01 12:44:26

by Pavel Machek

Subject: Re: swsusp on an AMD x2-64, 2.6.24: regression?

On Fri 2008-02-01 13:16:08, Michael Tokarev wrote:
> Pavel Machek wrote:
> > On Fri 2008-02-01 00:41:06, Michael Tokarev wrote:
> []
> >> With 2.6.24, it tries to suspend, saves pages to disk,
> >> when prints this:
> >>
> >> ..Saving pages... done.
> >> Sl
>
> It's actually "S|", not "Sl".
>
> >> Suspending console(s)
> >> _
> >>
> >> At this point, nothing more happens. It does not
> >> react to keyboard or to any other external events,
>
> ..because the keyboard is USB-connected, and it shuts
> down all USB devices. I'll try with PS/2 keyboard
> (when I'll find one I had somewhere... ;)
>
> []
> > no_console_suspend (sp?), nohz=off, highres=off, and try with minimum
> > config.
>
> no_console_suspend it is. Tried that, the "S|" thing is still
> here, but instead of "Suspending console(s)" it now shows
> progress of suspending other devices. The end result is
> the same - finally it stops and sits here ad infinitum.
>
> nohz and highres are useless now, as I recompiled the kernel
> without support for those, and without CPU_IDLE and other
> fancy stuff, and disabled cpufreq just in case.
>
> What's minimum config? Should I turn off SMP (it's a dual-core
> CPU by the way)? Something else? (I already removed most
> driver modules when when trying suspend - only ones which are
> absolutely necessary are left).

Disabling smp can't hurt, agreed.

> I've read Documentation/power/tricks.txt. From that list,
> I have the following:
>
> o all drivers are unloaded except disk and usb (keyboard)
> o preempt is disabled (was never enabled)
> o APIC IS in use.
> o modules are in use. Is it worth to try module-less?
> o vga text console - not even "vga" per se, - no framebuffers
> and such, not even as modules. No "video mode switching
> support" is enabled.
> o only a few processes left, in like single-user mode.

Try init=/bin/bash, too.

> One other difference between 2.6.23 and 2.6.24 as I see here
> is: 2.6.24 tells me about TSC unstability (when I load cpufreq
> stuff), while 2.6.23 did not. This is about 64bit mode - with
> 32bits, both switches from tsc to hpet, so in this regard,
> 2.6.24 (with 32bits) is not different from 2.6.23 it seems
> (i mean in relation with suspend issues, since 32bits .23
> mentioned tsc instability yet it suspended fine).
>
> So I'm.. stuck. :) Don't know where to go from here.

git bisect is always an option.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-02-01 15:05:00

by Michael Tokarev

Subject: Re: swsusp on an AMD x2-64, 2.6.24: regression?

Rafael J. Wysocki wrote:
> On Friday, 1 of February 2008, Michael Tokarev wrote:
[]
>> no_console_suspend it is. Tried that, the "S|" thing is still
>> here, but instead of "Suspending console(s)" it now shows
>> progress of suspending other devices. The end result is
>> the same - finally it stops and sits here ad infinitum.
>
> I guess it's a special variation of
> http://bugzilla.kernel.org/show_bug.cgi?id=9528
>
> Please try to hibernate in the shutdown mode (ie. echo
> "shutdown" into /sys/power/disk before hibernation).

Hmm. A very obscure thing - that bug, that is.

Tried "shutdown" - it works - even with all the other
"fancy" stuff like highres timers, cpufreq et al. And
it resumes correctly as well.

After reading all the stuff attached to that bugreport,
I also tried removing ohci_hcd - it also works just fine
(had to do it in one line --
rmmod ohci-hcd; sleep 5; echo disk > /sys/power/state
-- because I don't have non-USB keyboard handy :)

What I also noticed is that at least twice while doing
all the experiments, I've seen a message similar to (from
memory):

ohci_hcd: unlink after non-IRQ - controller is probably using the wrong IRQ

This happens when no_console_suspend is enabled - during
the final stage of suspend, when the kernel prints messages
about disabling acpi devices. I can't reproduce it easily,
but it happened at least twice with the same kernel configuration
(I tried different options, many variations, recompiling and
reinstalling the kernel each time).

In any case, this is definitely progress, and that bug
seems to be the same as what I'm seeing here.

Now... I see there's a new BIOS for this mobo available
(it's an ASUS M2NPV-VM motherboard, Geforce6150/Nforce430(?)),
which is more recent than what I have here. Trying
it now (will try to reflash it without a floppy - it turns
out to be quite a.. challenging task ;)

Thanks!

/mjt
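
[Editor's sketch, not part of the original message.] The one-line
workaround above (unload ohci-hcd, wait, hibernate) can be wrapped so it
is easy to rehearse first. Everything beyond the three original commands
is an assumption made for illustration: the function name, the RUN
dry-run hook, and the final modprobe that reloads the driver after resume
so a USB keyboard works again.

```shell
# hibernate_without_ohci [DELAY_SECONDS]
# Unload the OHCI USB host driver, wait, hibernate, then reload the driver
# once the machine has resumed.
# RUN is a hypothetical dry-run hook: with RUN=echo the commands are
# printed instead of executed.
hibernate_without_ohci() {
    delay=${1:-5}

    ${RUN:-} rmmod ohci-hcd || return 1
    ${RUN:-} sleep "$delay"

    # The write blocks until the system has resumed from the image.
    ${RUN:-} sh -c 'echo disk > /sys/power/state' || return 1

    # Assumption: bring USB back after resume (not in the original one-liner).
    ${RUN:-} modprobe ohci-hcd
}

# Dry run:   RUN=echo; hibernate_without_ohci
# Real run:  unset RUN; hibernate_without_ohci    (as root)
```

Keeping all the steps in one function serves the same purpose as the
original single command line: nothing has to be typed after the USB
keyboard stops responding.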

2008-02-01 15:28:21

by Michael Tokarev

Subject: Re: swsusp on an AMD x2-64, 2.6.24: regression?

Michael Tokarev wrote:
> Rafael J. Wysocki wrote:
[]
>> I guess it's a special variation of
>> http://bugzilla.kernel.org/show_bug.cgi?id=9528
>>
>> Please try to hibernate in the shutdown mode (ie. echo
>> "shutdown" into /sys/power/disk before hibernation).

[yes it works with shutdown...]

> In any way, this is definitely progress, and that bug
> seems to be the same as I'm seeing here.
>
> Now... I see there's a new BIOS for this mobo available
> (it's ASUS M2NPV-VM motherboard, Geforce6150/Nforce430(?)),
> which is more recent compared with what I have here. Trying
> it now (will try to reflash it without a floppy - it turns
> out to be quite.. challenging task ;)

Ok, updated the BIOS (using a FreeDOS virtual boot floppy provided
by memdisk from syslinux), and... now it all works correctly!

I was definitely blaming Linux for the regression - obviously,
as the "before" kernel worked, while the "current" kernel does not
anymore. But the problem seems to be due to some BIOS buglet.
Oh well... ;)

> Thanks!
!

Now I can go on with my other.. question, namely the UPS thingie... :)

/mjt