2007-09-05 00:12:48

by Fernando Lopez-Lezcano

[permalink] [raw]
Subject: 2.6.22.6 + rt9: suspend/hibernate not working

Hi Ingo... I'm getting reports from some of my Planet CCRMA users (which
I confirmed) that the latest rt kernel I released has broken suspend
(tested on fc6 & fc7, stock Fedora kernel works fine - the rt
configuration files are virtual clones as far as possible of the
standard Fedora kernel config files).

I don't know where to start debugging this. When suspend is initiated it
freezes with a "Stopping tasks ... " message in the text console - a
hard power cycle is the only way to get the machine back to normal.

kernel/power/process.c seems to contain that string in the
freeze_processes function so it looks like the freezer is not freezing
tasks as no "done" message is ever printed.

What could we do to help?
-- Fernando



2007-09-05 00:27:40

by Daniel Walker

[permalink] [raw]
Subject: Re: 2.6.22.6 + rt9: suspend/hibernate not working

On Tue, 2007-09-04 at 17:12 -0700, Fernando Lopez-Lezcano wrote:
> Hi Ingo... I'm getting reports from some of my Planet CCRMA users (which
> I confirmed) that the latest rt kernel I released has broken suspend
> (tested on fc6 & fc7, stock Fedora kernel works fine - the rt
> configuration files are virtual clones as far as possible of the
> standard Fedora kernel config files).
>
> I don't know where to start debugging this. When suspend is initiated it
> freezes with a "Stopping tasks ... " message in the text console - a
> hard power cycle is the only way to get the machine back to normal.
>
> kernel/power/process.c seems to contain that string in the
> freeze_processes function so it looks like the freezer is not freezing
> tasks as no "done" message is ever printed.
>
> What could we do to help?

If you have high resolution timers enabled you could try disabling it,
and see if the problem persists .

Daniel

2007-09-06 10:42:50

by Pavel Machek

[permalink] [raw]
Subject: Re: 2.6.22.6 + rt9: suspend/hibernate not working

Hi!

> Hi Ingo... I'm getting reports from some of my Planet CCRMA users (which
> I confirmed) that the latest rt kernel I released has broken suspend
> (tested on fc6 & fc7, stock Fedora kernel works fine - the rt
> configuration files are virtual clones as far as possible of the
> standard Fedora kernel config files).
>
> I don't know where to start debugging this. When suspend is initiated it
> freezes with a "Stopping tasks ... " message in the text console - a
> hard power cycle is the only way to get the machine back to normal.

It should print what went wrong about ~20 seconds. If it does not,
verify that timers work properly for you.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2007-09-06 18:42:39

by Fernando Lopez-Lezcano

[permalink] [raw]
Subject: Re: 2.6.22.6 + rt9: suspend/hibernate not working

On Tue, 2007-09-04 at 17:15 -0700, Daniel Walker wrote:
> On Tue, 2007-09-04 at 17:12 -0700, Fernando Lopez-Lezcano wrote:
> > Hi Ingo... I'm getting reports from some of my Planet CCRMA users (which
> > I confirmed) that the latest rt kernel I released has broken suspend
> > (tested on fc6 & fc7, stock Fedora kernel works fine - the rt
> > configuration files are virtual clones as far as possible of the
> > standard Fedora kernel config files).
> >
> > I don't know where to start debugging this. When suspend is initiated it
> > freezes with a "Stopping tasks ... " message in the text console - a
> > hard power cycle is the only way to get the machine back to normal.
> >
> > kernel/power/process.c seems to contain that string in the
> > freeze_processes function so it looks like the freezer is not freezing
> > tasks as no "done" message is ever printed.
> >
> > What could we do to help?
>
> If you have high resolution timers enabled you could try disabling it,
> and see if the problem persists .

The problem is still there ("Stopping tasks ... " and nothing
afterwards).

I'm attaching the configuration with which the kernel was built.
-- Fernando


Attachments:
config-2.6.22.6-1.rt9.4.fc7.ccrmart (75.58 kB)

2007-09-06 19:55:46

by Fernando Lopez-Lezcano

[permalink] [raw]
Subject: Re: 2.6.22.6 + rt9: suspend/hibernate not working

On Thu, 2007-09-06 at 11:42 -0700, Fernando Lopez-Lezcano wrote:
> On Tue, 2007-09-04 at 17:15 -0700, Daniel Walker wrote:
> > On Tue, 2007-09-04 at 17:12 -0700, Fernando Lopez-Lezcano wrote:
> > > Hi Ingo... I'm getting reports from some of my Planet CCRMA users (which
> > > I confirmed) that the latest rt kernel I released has broken suspend
> > > (tested on fc6 & fc7, stock Fedora kernel works fine - the rt
> > > configuration files are virtual clones as far as possible of the
> > > standard Fedora kernel config files).
> > >
> > > I don't know where to start debugging this. When suspend is initiated it
> > > freezes with a "Stopping tasks ... " message in the text console - a
> > > hard power cycle is the only way to get the machine back to normal.
> > >
> > > kernel/power/process.c seems to contain that string in the
> > > freeze_processes function so it looks like the freezer is not freezing
> > > tasks as no "done" message is ever printed.
> > >
> > > What could we do to help?
> >
> > If you have high resolution timers enabled you could try disabling it,
> > and see if the problem persists .
>
> The problem is still there ("Stopping tasks ... " and nothing
> afterwards).

Looks like it was a known problem (sorry about the noise), see:
http://lkml.org/lkml/2007/8/25/117

It does fix the problem here as well.
Ingo: is this still the right fix for 2.6.22.6 + rt9?

-- Fernando


2007-09-06 23:30:23

by Fernando Lopez-Lezcano

[permalink] [raw]
Subject: Re: [PlanetCCRMA] Re: 2.6.22.6 + rt9: suspend/hibernate not working

On Thu, 2007-09-06 at 12:55 -0700, Fernando Lopez-Lezcano wrote:
> On Thu, 2007-09-06 at 11:42 -0700, Fernando Lopez-Lezcano wrote:
> > On Tue, 2007-09-04 at 17:15 -0700, Daniel Walker wrote:
> > > On Tue, 2007-09-04 at 17:12 -0700, Fernando Lopez-Lezcano wrote:
> > > > Hi Ingo... I'm getting reports from some of my Planet CCRMA users (which
> > > > I confirmed) that the latest rt kernel I released has broken suspend
> > > > (tested on fc6 & fc7, stock Fedora kernel works fine - the rt
> > > > configuration files are virtual clones as far as possible of the
> > > > standard Fedora kernel config files).
> > > >
> > > > I don't know where to start debugging this. When suspend is initiated it
> > > > freezes with a "Stopping tasks ... " message in the text console - a
> > > > hard power cycle is the only way to get the machine back to normal.
> > > >
> > > > kernel/power/process.c seems to contain that string in the
> > > > freeze_processes function so it looks like the freezer is not freezing
> > > > tasks as no "done" message is ever printed.
> > > >
> > > > What could we do to help?
> > >
> > > If you have high resolution timers enabled you could try disabling it,
> > > and see if the problem persists .
> >
> > The problem is still there ("Stopping tasks ... " and nothing
> > afterwards).
>
> Looks like it was a known problem (sorry about the noise), see:
> http://lkml.org/lkml/2007/8/25/117
>
> It does fix the problem here as well.
> Ingo: is this still the right fix for 2.6.22.6 + rt9?

I'm seeing this while going into suspend:

...
Disabling non-boot CPUs ...
Breaking affinity for irq 218
CPU 1 is now offline
SMP alternatives: switching to UP code
BUG: sleeping function called from invalid context pm-suspend(3676) at
kernel/rtmutex.c:636
in_atomic():0 [00000000], irqs_disabled():1
[<c061c13d>] __rt_spin_lock+0x21/0x3d
[<c0461b2c>] free_pages_bulk+0x28/0x188
[<c042653d>] migration_call+0x3a5/0x3be
[<c0461cd4>] __drain_pages+0x48/0x69
[<c0461d0a>] page_alloc_cpu_notify+0x15/0x2b
[<c061e083>] notifier_call_chain+0x2a/0x47
[<c0435fe8>] raw_notifier_call_chain+0x17/0x1a
[<c0446869>] _cpu_down+0x17a/0x238
[<c042b4e5>] printk+0x1f/0x92
[<c0446ad0>] disable_nonboot_cpus+0x4e/0xd2
[<c044b673>] enter_state+0x116/0x1d6
[<c044b831>] state_store+0xc9/0xe0
[<c044b768>] state_store+0x0/0xe0
[<c04b44cf>] subsys_attr_store+0x27/0x2b
[<c04b45db>] sysfs_write_file+0x9a/0xbd
[<c04b4541>] sysfs_write_file+0x0/0xbd
[<c047bc97>] vfs_write+0xa8/0x15a
[<c047c2c0>] sys_write+0x41/0x67
[<c0404ef6>] syscall_call+0x7/0xb
=======================
CPU1 is down
PM: Entering mem sleep
thinkpad_acpi thinkpad_acpi: LATE suspend
...

I'm attaching the whole compressed dmesg output to put it in context.
-- Fernando


Attachments:
dmesg.1.bz2 (10.82 kB)