2007-05-26 22:42:48

by Bill Davidsen

Subject: [2.6.21.1] resume doesn't run suspended kernel?

I was testing susp2disk in 2.6.21.1 under FC6, to support reliable
computing environment (RCE) needs. The idea is that if power fails,
after some short time on UPS the system does susp2disk with a time set,
and boots back every so often to see if power is stable.

No, I don't want susp2mem until I debug it: the console comes up in a useless
mode, and a console as kaleidoscope is not what I need.

Anyway, I pulled the plug on the UPS, and the system shut down. But when
it powered up, it booted the default kernel rather than the test kernel,
decided that it couldn't resume, and then did a cold boot.

I can bypass this by making the debug kernel the default, but WHY? Is
the kernel not saved such that any kernel can be rolled back into memory
and run? Actually, the answer is HELL NO, so I really ask if this is the
intended mode of operation, that only the default boot kernel will restore.

--
Bill Davidsen <[email protected]>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot


2007-05-27 02:44:48

by Robert Hancock

Subject: Re: [2.6.21.1] resume doesn't run suspended kernel?

Bill Davidsen wrote:
> I was testing susp2disk in 2.6.21.1 under FC6, to support reliable
> computing environment (RCE) needs. The idea is that if power fails,
> after some short time on UPS the system does susp2disk with a time set,
> and boots back every so often to see if power is stable.
>
> No, I don't want susp2mem until I debug it, console come up in useless
> mode, console as kalidescope is not what I need.
>
> Anyway, I pulled the plug on the UPS, and the system shut down. But when
> it powered up, it booted the default kernel rather than the test kernel,
> decided that it couldn't resume, and then did a cold boot.
>
> I can bypass this by making the debug kernel the default, but WHY? Is
> the kernel not saved such that any kernel can be rolled back into memory
> and run? Actually, the answer is HELL NO, so I really ask if this is the
> intended mode of operation, that only the default boot kernel will restore.

Fedora scripts for hibernation are supposed to tell GRUB to set the
default kernel on the next boot to be the current one before suspending
to disk, so that it comes up with the same version it was running and
the resume can succeed. If the way you're triggering the suspend
bypasses this mechanism, you'll see this problem.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2007-05-27 08:41:30

by David Greaves

Subject: Re: [2.6.21.1] resume doesn't run suspended kernel?

Bill Davidsen wrote:
> Anyway, I pulled the plug on the UPS, and the system shut down. But when
> it powered up, it booted the default kernel rather than the test kernel,
> decided that it couldn't resume, and then did a cold boot.

Booting the machine isn't the kernel's job, it's the bootloader's job.

> I can bypass this by making the debug kernel the default, but WHY? Is
> the kernel not saved such that any kernel can be rolled back into memory
> and run? Actually, the answer is HELL NO, so I really ask if this is the
> intended mode of operation, that only the default boot kernel will restore.

Yes.

It is very dangerous to attempt a resume with a different kernel than the one
that has gone to sleep.
Different kernels may be compiled with different options that affect where or
how in-memory structures are saved.

So you suspend with a kernel which holds your filesystem data/cache/inodes at
0x1234000 and restore with a kernel that expects to see your filesystem data at
0x1235000.

Ouch.

Personally I think the kernel suspend should write a signature - similar to a
hash of the bzImage - into the suspend image so it won't even attempt a resume
if there's a mismatch. (Yes, I made this mistake once whilst playing with suspend).

David


2007-05-27 13:08:09

by Bill Davidsen

Subject: Re: [2.6.21.1] resume doesn't run suspended kernel?

David Greaves wrote:
> Bill Davidsen wrote:
>> Anyway, I pulled the plug on the UPS, and the system shut down. But when
>> it powered up, it booted the default kernel rather than the test kernel,
>> decided that it couldn't resume, and then did a cold boot.
>
> Booting the machine isn't the kernel's job, it's the bootloader's job.
>
And resume is not the bootloader's job... if memory and registers
are restored, and a jump is made to the resume address, a resumed system
should result. Clearly some part of that didn't happen :-(

>> I can bypass this by making the debug kernel the default, but WHY? Is
>> the kernel not saved such that any kernel can be rolled back into memory
>> and run? Actually, the answer is HELL NO, so I really ask if this is the
>> intended mode of operation, that only the default boot kernel will restore.
>
> Yes.
>
> It is very dangerous to attempt a resume with a different kernel than the one
> that has gone to sleep.
> Different kernels may be compiled with different options that affect where or
> how in-memory structures are saved.
>
If the mainline resume is depending on that no wonder resume is so
fragile. User action can change order of module loads, kmalloc calls
move allocated structures, etc. Counting on anything to be locked in
place seems naive.

> So you suspend with a kernel which holds your filesystem data/cache/inodes at
> 0x1234000 and restore with a kernel that expects to see your filesystem data at
> 0x1235000.
>
> Ouch.
>
I would hope that the data used by the resumed kernel would be the same
data that was suspended, not something from another kernel.

> Personally I think the kernel suspend should write a signature - similar to a
> hash of the bzImage - into the suspend image so it won't even attempt a resume
> if there's a mismatch. (Yes, I made this mistake once whilst playing with suspend).
>
Someone else dropped a note saying the FC kernels use suspend2, and work
fine. I'm off to look at the FC source and see if that's the case. That
would explain why suspend works and resume doesn't, hopefully there's a
2.6.21 suspend2 patch in that case.

Thanks for the feedback in any case.

--
Bill Davidsen <[email protected]>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot

2007-05-27 15:27:16

by David Greaves

Subject: Re: [2.6.21.1] resume doesn't run suspended kernel?

Bill Davidsen wrote:
> David Greaves wrote:
>> Bill Davidsen wrote:
>>> Anyway, I pulled the plug on the UPS, and the system shut down. But when
>>> it powered up, it booted the default kernel rather than the test kernel,
>>> decided that it couldn't resume, and then did a cold boot.
>>
>> Booting the machine isn't the kernel's job, it's the bootloader's job.
>>
> And resume is not the the bootloader's job... if memory and registers
> are restored, and a jump is made to the resume address, a resumed system
> should result. clearly some part of that didn't happen :-(

Well, what if you wanted to boot a 2nd, dual-boot OS?
The bootloader needs to boot the kernel which may choose to resume.

Is there a misunderstanding here?

I read your OP as saying that you booted kernel B (configured to have suspend
support) and then hit suspend. When the machine rebooted, the "default kernel",
i.e. kernel A, not kernel B, was selected by the bootloader. Since the default
kernel didn't have suspend support or couldn't resume, it simply booted.
Just what I'd expect.

>> It is very dangerous to attempt a resume with a different kernel than
>> the one
>> that has gone to sleep.
>> Different kernels may be compiled with different options that affect
>> where or
>> how in-memory structures are saved.
>>
> If the mainline resume is depending on that no wonder resume is so
> fragile. User action can change order of module loads, kmalloc calls
> move allocated structures, etc. Counting on anything to be locked in
> place seems naive.

Err, no. It's a lot more sophisticated. However, it does ask that you not resume
with a different kernel than you suspended with - not unreasonable!!

>> So you suspend with a kernel which holds your filesystem
>> data/cache/inodes at
>> 0x1234000 and restore with a kernel that expects to see your
>> filesystem data at
>> 0x1235000.
>>
>> Ouch.
>>
> I would hope that the data used by the resumed kernel would be the same
> data that was suspended, not something from another kernel.

Linux based OSes provide enough rope to build a harness or a noose. Choose wisely :)

As you suggest you are about to, it may be best to get a distro-configured
system or do some more background research. Mainline doesn't provide scripts to
interact with bootloaders etc.

NB: I replied because I've just done some work configuring s2d and now have 3
desktop/server machines doing suspend2disk on 2.6.21 quite nicely - thanks all
around.

David

2007-05-27 21:14:55

by Pavel Machek

Subject: Re: [2.6.21.1] resume doesn't run suspended kernel?

On Sat 2007-05-26 18:42:37, Bill Davidsen wrote:
> I was testing susp2disk in 2.6.21.1 under FC6, to
> support reliable computing environment (RCE) needs. The
> idea is that if power fails, after some short time on
> UPS the system does susp2disk with a time set, and boots
> back every so often to see if power is stable.
>
> No, I don't want susp2mem until I debug it, console come
> up in useless mode, console as kalidescope is not what I
> need.
>
> Anyway, I pulled the plug on the UPS, and the system
> shut down. But when it powered up, it booted the default
> kernel rather than the test kernel, decided that it
> couldn't resume, and then did a cold boot.
>
> I can bypass this by making the debug kernel the
> default, but WHY?

HELL YES :-). We do not save kernel code into image.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2007-05-27 21:17:37

by Pavel Machek

Subject: Re: [2.6.21.1] resume doesn't run suspended kernel?

Hi!

> Personally I think the kernel suspend should write a signature - similar to a
> hash of the bzImage - into the suspend image so it won't even attempt a resume
> if there's a mismatch. (Yes, I made this mistake once whilst playing with suspend).

We have such 'hash' but it is not foolproof. Improvements welcome.
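
A check of this kind can be sketched as a field-by-field comparison between
an identity record stored with the image and the same record built by the
kernel that is booting. This is only an illustration: the struct, its fields
and the function name below are hypothetical, not the kernel's actual header
layout. Such a check only catches differences in the fields it records,
which is one way it can fail to be foolproof.

```c
#include <assert.h>
#include <string.h>

#define ID_LEN 65

/* Hypothetical identity record; NOT the kernel's real image header. */
struct kernel_id {
	char release[ID_LEN];    /* e.g. "2.6.21.1" */
	char version[ID_LEN];    /* build string */
	char machine[ID_LEN];    /* e.g. "i686" */
	unsigned long num_pages; /* physical memory size in pages */
};

/*
 * Compare the record saved in the image with the record of the kernel
 * that is booting. Returns NULL when every field matches, otherwise a
 * short name of the first field that differs.
 */
const char *check_image_id(const struct kernel_id *image,
			   const struct kernel_id *running)
{
	assert(image && running);

	if (strcmp(image->release, running->release))
		return "kernel release";
	if (strcmp(image->version, running->version))
		return "kernel version";
	if (strcmp(image->machine, running->machine))
		return "machine";
	if (image->num_pages != running->num_pages)
		return "memory size";
	return NULL;
}
```

A caller would refuse to resume whenever the result is not NULL and fall
back to a normal boot.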

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2007-05-27 21:20:25

by Pavel Machek

Subject: Re: [2.6.21.1] resume doesn't run suspended kernel?

Hi!

> >It is very dangerous to attempt a resume with a
> >different kernel than the one
> >that has gone to sleep.
> >Different kernels may be compiled with different
> >options that affect where or
> >how in-memory structures are saved.
> >
> If the mainline resume is depending on that no wonder
> resume is so fragile. User action can change order of
> module loads, kmalloc calls move allocated structures,
> etc. Counting on anything to be locked in place seems
> naive.

Look at code before spreading FUD. (suspend and suspend2 are same in
this matter).

Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2007-05-28 03:12:55

by Bill Davidsen

Subject: Re: [2.6.21.1] resume doesn't run suspended kernel?

Pavel Machek wrote:
> On Sat 2007-05-26 18:42:37, Bill Davidsen wrote:
>
>> I was testing susp2disk in 2.6.21.1 under FC6, to
>> support reliable computing environment (RCE) needs. The
>> idea is that if power fails, after some short time on
>> UPS the system does susp2disk with a time set, and boots
>> back every so often to see if power is stable.
>>
>> No, I don't want susp2mem until I debug it, console come
>> up in useless mode, console as kalidescope is not what I
>> need.
>>
>> Anyway, I pulled the plug on the UPS, and the system
>> shut down. But when it powered up, it booted the default
>> kernel rather than the test kernel, decided that it
>> couldn't resume, and then did a cold boot.
>>
>> I can bypass this by making the debug kernel the
>> default, but WHY?
>>
>
> HELL YES :-). We do not save kernel code into image.
>
>
That's clear, I'll have to use xen or kvm or similar which restores the
system as suspended. Thanks for the clarification of the limitations.

--
bill davidsen <[email protected]>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979

2007-05-28 13:19:30

by Bill Davidsen

Subject: Re: [2.6.21.1] resume doesn't run suspended kernel?

Bill Davidsen wrote:
> Pavel Machek wrote:
>> On Sat 2007-05-26 18:42:37, Bill Davidsen wrote:
>>
>>> I was testing susp2disk in 2.6.21.1 under FC6, to support reliable
>>> computing environment (RCE) needs. The idea is that if power fails,
>>> after some short time on UPS the system does susp2disk with a time
>>> set, and boots back every so often to see if power is stable.
>>>
>>> No, I don't want susp2mem until I debug it, console come up in
>>> useless mode, console as kalidescope is not what I need.
>>>
>>> Anyway, I pulled the plug on the UPS, and the system shut down. But
>>> when it powered up, it booted the default kernel rather than the
>>> test kernel, decided that it couldn't resume, and then did a cold boot.
>>>
>>> I can bypass this by making the debug kernel the default, but WHY?
>>
>> HELL YES :-). We do not save kernel code into image.
>>
>>
> That's clear, I'll have to use xen or kvm or similar which restores
> the system as suspended. Thanks for the clarification of the limitations.
>
Sorry, I wrote that late at night and quickly. I should have said
"design decision" rather than "limitation." For systems which don't do
multiple kernels it's not an issue.

I certainly would not have made the same decision, but I didn't write
the code. It seems more robust to save everything than to try to
identify what has and hasn't changed in a modular kernel.

--
Bill Davidsen <[email protected]>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot

2007-05-28 13:26:36

by Pavel Machek

Subject: Re: [2.6.21.1] resume doesn't run suspended kernel?

Hi!

> >That's clear, I'll have to use xen or kvm or similar which restores
> >the system as suspended. Thanks for the clarification of the limitations.
> >
> Sorry, I wrote that late at night and quickly. I should have said
> "design decision" rather than "limitation," For systems which don't do
> multiple kernels it's not an issue.
>
> I certainly would not have made the same decision, but I didn't write
> the code. It seems more robust to save everything than to try to
> identify what has and hasn't changed in a modular kernel.

We rely on atomic copy routine not moving inside the kernel. Yes, it
would be possible to copy it to "known good" address and gain ability
to resume different kernels. Actually it should not be _that_ hard.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2007-05-28 17:52:18

by Rafael J. Wysocki

Subject: Re: [2.6.21.1] resume doesn't run suspended kernel?

On Monday, 28 May 2007 15:26, Pavel Machek wrote:
> Hi!
>
> > >That's clear, I'll have to use xen or kvm or similar which restores
> > >the system as suspended. Thanks for the clarification of the limitations.
> > >
> > Sorry, I wrote that late at night and quickly. I should have said
> > "design decision" rather than "limitation," For systems which don't do
> > multiple kernels it's not an issue.
> >
> > I certainly would not have made the same decision, but I didn't write
> > the code. It seems more robust to save everything than to try to
> > identify what has and hasn't changed in a modular kernel.
>
> We rely on atomic copy routine not moving inside the kernel. Yes, it
> would be possible to copy it to "known good" address and gain ability
> to resume different kernels. Actually it should not be _that_ hard.

Yup. Don't we do something like this for the (ACPI-based) suspend to RAM
already?

Greetings,
Rafael

2007-05-28 22:48:16

by Nigel Cunningham

Subject: Re: [2.6.21.1] resume doesn't run suspended kernel?

Hi.

On Mon, 2007-05-28 at 19:57 +0200, Rafael J. Wysocki wrote:
> On Monday, 28 May 2007 15:26, Pavel Machek wrote:
> > Hi!
> >
> > > >That's clear, I'll have to use xen or kvm or similar which restores
> > > >the system as suspended. Thanks for the clarification of the limitations.
> > > >
> > > Sorry, I wrote that late at night and quickly. I should have said
> > > "design decision" rather than "limitation," For systems which don't do
> > > multiple kernels it's not an issue.
> > >
> > > I certainly would not have made the same decision, but I didn't write
> > > the code. It seems more robust to save everything than to try to
> > > identify what has and hasn't changed in a modular kernel.
> >
> > We rely on atomic copy routine not moving inside the kernel. Yes, it
> > would be possible to copy it to "known good" address and gain ability
> > to resume different kernels. Actually it should not be _that_ hard.
>
> Yup. Don't we do something like this for the (ACPI-based) suspend to RAM
> already?

Yeah, I was thinking about this overnight too. It should be doable. In
addition to what we already do, I think you'd want:

- to copy the assembly to do the copying to a safe page;
- to put the location of the cpu state that was saved in the image
header so that it can be used after the data is copied back;
- to copy the nosave data to a 'safe' page.
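
The notion of a "safe" page in the list above can be illustrated as follows,
assuming it means a page frame that the image will not overwrite while it is
being copied back. This is a minimal sketch with hypothetical names
(find_safe_frame, NO_SAFE_FRAME) and a plain linear search; it is not how the
kernel's allocator works.

```c
#include <assert.h>
#include <stddef.h>

#define NO_SAFE_FRAME ((unsigned long)-1)

/* Return 1 if `pfn` is one of the frames the image will be restored to. */
static int frame_in_image(const unsigned long *image_pfns, size_t count,
			  unsigned long pfn)
{
	size_t i;

	for (i = 0; i < count; i++)
		if (image_pfns[i] == pfn)
			return 1;
	return 0;
}

/*
 * Find the lowest page frame below `nr_frames` that the image does not
 * touch. Code or data parked in such a frame survives the copy-back.
 * Returns NO_SAFE_FRAME if every frame is claimed by the image.
 */
unsigned long find_safe_frame(const unsigned long *image_pfns, size_t count,
			      unsigned long nr_frames)
{
	unsigned long pfn;

	assert(image_pfns || count == 0);

	for (pfn = 0; pfn < nr_frames; pfn++)
		if (!frame_in_image(image_pfns, count, pfn))
			return pfn;
	return NO_SAFE_FRAME;
}
```

The copy routine, the saved CPU state location and the nosave data would each
need a frame chosen this way before the atomic copy starts.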

What else?

Regards,

Nigel



2007-05-29 11:30:08

by Pavel Machek

Subject: Re: [2.6.21.1] resume doesn't run suspended kernel?

Hi!

> > Yup. Don't we do something like this for the (ACPI-based) suspend to RAM
> > already?
>
> Yeah, I was thinking about this overnight too. It should be doable. In
> addition to what we already do, I think you'd want:
>
> - to copy the assembly to do the copying to a safe page;
> - to put the location of the cpu state that was saved in the image
> header so that it can be used after the data is copied back;

...alternatively, we can just rely on copy routine (and its data) not
changing frequently.

> - to copy the nosave data to a 'safe' page.
>
> What else?

page directories need to be on a safe place, too.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2007-05-29 11:57:30

by Rafael J. Wysocki

Subject: Re: [2.6.21.1] resume doesn't run suspended kernel?

On Tuesday, 29 May 2007 13:29, Pavel Machek wrote:
> Hi!
>
> > > Yup. Don't we do something like this for the (ACPI-based) suspend to RAM
> > > already?
> >
> > Yeah, I was thinking about this overnight too. It should be doable. In
> > addition to what we already do, I think you'd want:
> >
> > - to copy the assembly to do the copying to a safe page;
> > - to put the location of the cpu state that was saved in the image
> > header so that it can be used after the data is copied back;
>
> ...alternatively, we can just rely on copy routine (and its data) not
> changing frequently.
>
> > - to copy the nosave data to a 'safe' page.
> >
> > What else?
>
> page directories need to be on a safe place, too.

They are already.

Greetings,
Rafael

2007-05-29 12:24:20

by Nigel Cunningham

Subject: Re: [2.6.21.1] resume doesn't run suspended kernel?

Hi.

On Tue, 2007-05-29 at 14:03 +0200, Rafael J. Wysocki wrote:
> On Tuesday, 29 May 2007 13:29, Pavel Machek wrote:
> > Hi!
> >
> > > > Yup. Don't we do something like this for the (ACPI-based) suspend to RAM
> > > > already?
> > >
> > > Yeah, I was thinking about this overnight too. It should be doable. In
> > > addition to what we already do, I think you'd want:
> > >
> > > - to copy the assembly to do the copying to a safe page;
> > > - to put the location of the cpu state that was saved in the image
> > > header so that it can be used after the data is copied back;
> >
> > ...alternatively, we can just rely on copy routine (and its data) not
> > changing frequently.

I'd rather be sure. It will be extra code, but reliability is important.

> > > - to copy the nosave data to a 'safe' page.
> > >
> > > What else?
> >
> > page directories need to be on a safe place, too.
>
> They are already.

Yeah - that's why I ignored them.

Regards,

Nigel



2007-05-29 12:41:09

by Pavel Machek

Subject: Re: [2.6.21.1] resume doesn't run suspended kernel?

On Tue 2007-05-29 14:03:07, Rafael J. Wysocki wrote:
> On Tuesday, 29 May 2007 13:29, Pavel Machek wrote:
> > Hi!
> >
> > > > Yup. Don't we do something like this for the (ACPI-based) suspend to RAM
> > > > already?
> > >
> > > Yeah, I was thinking about this overnight too. It should be doable. In
> > > addition to what we already do, I think you'd want:
> > >
> > > - to copy the assembly to do the copying to a safe page;
> > > - to put the location of the cpu state that was saved in the image
> > > header so that it can be used after the data is copied back;
> >
> > ...alternatively, we can just rely on copy routine (and its data) not
> > changing frequently.
> >
> > > - to copy the nosave data to a 'safe' page.
> > >
> > > What else?
> >
> > page directories need to be on a safe place, too.
>
> They are already.

...but will that place still be safe when we use other version of
kernel?

Anyway, pagedirs are on the safe place, right? That means that
swsusp should no longer clash with page allocation debugging...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2007-05-29 13:14:15

by Nigel Cunningham

Subject: Re: [2.6.21.1] resume doesn't run suspended kernel?

Hi.

On Tue, 2007-05-29 at 14:40 +0200, Pavel Machek wrote:
> On Tue 2007-05-29 14:03:07, Rafael J. Wysocki wrote:
> > On Tuesday, 29 May 2007 13:29, Pavel Machek wrote:
> > > Hi!
> > >
> > > > > Yup. Don't we do something like this for the (ACPI-based) suspend to RAM
> > > > > already?
> > > >
> > > > Yeah, I was thinking about this overnight too. It should be doable. In
> > > > addition to what we already do, I think you'd want:
> > > >
> > > > - to copy the assembly to do the copying to a safe page;
> > > > - to put the location of the cpu state that was saved in the image
> > > > header so that it can be used after the data is copied back;
> > >
> > > ...alternatively, we can just rely on copy routine (and its data) not
> > > changing frequently.
> > >
> > > > - to copy the nosave data to a 'safe' page.
> > > >
> > > > What else?
> > >
> > > page directories need to be on a safe place, too.
> >
> > They are already.
>
> ...but will that place still be safe when we use other version of
> kernel?

They'll be in the image too, won't they? Failing that, the information
could be stored in the image header.

> Anyway, pagedirs are on the safe place, right? That means that we
> swsusp should no longer clash with page allocation debugging...

You mean DEBUG_PAGEALLOC? That can be overcome easily - I have code in
current Suspend2 that works with DEBUG_PAGEALLOC. I handle the page
fault, mapping the page and setting a flag in the fault handler to tell
the atomic copy code to unmap the page again once it has been copied.

Regards,

Nigel



2007-05-29 21:45:46

by Rafael J. Wysocki

Subject: Re: [2.6.21.1] resume doesn't run suspended kernel?

Hi,

On Tuesday, 29 May 2007 15:13, Nigel Cunningham wrote:
> Hi.
>
> On Tue, 2007-05-29 at 14:40 +0200, Pavel Machek wrote:
> > On Tue 2007-05-29 14:03:07, Rafael J. Wysocki wrote:
> > > On Tuesday, 29 May 2007 13:29, Pavel Machek wrote:
> > > > Hi!
> > > >
> > > > > > Yup. Don't we do something like this for the (ACPI-based) suspend to RAM
> > > > > > already?
> > > > >
> > > > > Yeah, I was thinking about this overnight too. It should be doable. In
> > > > > addition to what we already do, I think you'd want:
> > > > >
> > > > > - to copy the assembly to do the copying to a safe page;
> > > > > - to put the location of the cpu state that was saved in the image
> > > > > header so that it can be used after the data is copied back;
> > > >
> > > > ...alternatively, we can just rely on copy routine (and its data) not
> > > > changing frequently.
> > > >
> > > > > - to copy the nosave data to a 'safe' page.
> > > > >
> > > > > What else?
> > > >
> > > > page directories need to be on a safe place, too.
> > >
> > > They are already.
> >
> > ...but will that place still be safe when we use other version of
> > kernel?

Yes.

> They'll be in the image too, won't they? Failing that, the information
> could be stored in the image header.

In fact, for each page we have the number of the page frame that it should be
restored to. Page frame numbers don't change. :-)
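
The idea can be sketched in user space: if every saved page carries the
number of the frame it came from, copying it back needs nothing from the
booting kernel except the ability to write to that frame. The struct and
function below are hypothetical and use a flat byte array to stand in for
physical memory; the real image format is more compact than this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/* One saved page together with the frame it must be restored to. */
struct saved_page {
	unsigned long pfn;
	unsigned char data[SKETCH_PAGE_SIZE];
};

/*
 * Copy `count` saved pages back into `memory`, which models `nr_frames`
 * page frames. All frame numbers are validated before anything is
 * written. Returns the number of pages restored, or -1 if any frame
 * number is out of range.
 */
long restore_pages(unsigned char *memory, unsigned long nr_frames,
		   const struct saved_page *pages, size_t count)
{
	size_t i;

	assert(memory && pages);

	for (i = 0; i < count; i++)
		if (pages[i].pfn >= nr_frames)
			return -1;

	for (i = 0; i < count; i++)
		memcpy(memory + pages[i].pfn * SKETCH_PAGE_SIZE,
		       pages[i].data, SKETCH_PAGE_SIZE);

	return (long)count;
}
```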

> > Anyway, pagedirs are on the safe place, right? That means that we
> > swsusp should no longer clash with page allocation debugging...
>
> You mean DEBUG_PAGEALLOC? That can be overcome easily - I have code in
> current Suspend2 that works with DEBUG_PAGEALLOC. I handle the page
> fault, mapping the page and setting a flag in the fault handler to tell
> the atomic copy code to unmap the page again once it has been copied.

Well, I can't comment, I haven't looked at that yet.

Greetings,
Rafael

2007-06-04 11:02:22

by Pavel Machek

Subject: Re: [2.6.21.1] resume doesn't run suspended kernel?

Hi!

> > > They are already.
> >
> > ...but will that place still be safe when we use other version of
> > kernel?
>
> They'll be in the image too, won't they? Failing that, the information
> could be stored in the image header.
>
> > Anyway, pagedirs are on the safe place, right? That means that we
> > swsusp should no longer clash with page allocation debugging...
>
> You mean DEBUG_PAGEALLOC? That can be overcome easily - I have code in
> current Suspend2 that works with DEBUG_PAGEALLOC. I handle the page
> fault, mapping the page and setting a flag in the fault handler to tell
> the atomic copy code to unmap the page again once it has been copied.

I meant debug_pagealloc, but no, I do not think we want to make page
fault handler more complex. Switching to 1:1 mapping tables should be
enough.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2007-06-04 11:06:15

by Nigel Cunningham

Subject: Re: [2.6.21.1] resume doesn't run suspended kernel?

Hi.

On Mon, 2007-06-04 at 13:02 +0200, Pavel Machek wrote:
> Hi!
>
> > > > They are already.
> > >
> > > ...but will that place still be safe when we use other version of
> > > kernel?
> >
> > They'll be in the image too, won't they? Failing that, the information
> > could be stored in the image header.
> >
> > > Anyway, pagedirs are on the safe place, right? That means that we
> > > swsusp should no longer clash with page allocation debugging...
> >
> > You mean DEBUG_PAGEALLOC? That can be overcome easily - I have code in
> > current Suspend2 that works with DEBUG_PAGEALLOC. I handle the page
> > fault, mapping the page and setting a flag in the fault handler to tell
> > the atomic copy code to unmap the page again once it has been copied.
>
> I meant debug_pagealloc, but no, I do not think we want to make page
> fault handler more complex. Switching to 1:1 mapping tables should be
> enough.
> Pavel

@@ -311,6 +315,20 @@ fastcall void __kprobes do_page_fault(struct pt_regs *regs,

 	si_code = SEGV_MAPERR;

+	/* During a Suspend2 atomic copy, with DEBUG_SLAB, we will
+	 * get page faults where slab has been unmapped. Map them
+	 * temporarily and set the variable that tells Suspend2 to
+	 * unmap afterwards.
+	 */
+
+	if (unlikely(suspend2_running && !suspend2_faulted)) {
+		struct page *page = NULL;
+		suspend2_faulted = 1;
+		page = virt_to_page(address);
+		kernel_map_pages(page, 1, 1);
+		return;
+	}
+
 	/*
 	 * We fault-in kernel-space virtual memory on-demand. The
 	 * 'reference' page table is init_mm.pgd.

Regards,

Nigel



2007-06-05 09:35:17

by Stefan Seyfried

Subject: Re: [2.6.21.1] resume doesn't run suspended kernel?

Hi,

On Sat, May 26, 2007 at 06:42:37PM -0400, Bill Davidsen wrote:
> I was testing susp2disk in 2.6.21.1 under FC6, to support reliable computing
> environment (RCE) needs. The idea is that if power fails, after some short
> time on UPS the system does susp2disk with a time set, and boots back every
> so often to see if power is stable.

Interesting use case.

> No, I don't want susp2mem until I debug it, console come up in useless mode,
> console as kalidescope is not what I need.

You probably need to reset the video mode. Try the s2ram workaround,
specifically "-m".

> Anyway, I pulled the plug on the UPS, and the system shut down. But when it
> powered up, it booted the default kernel rather than the test kernel, decided
> that it couldn't resume, and then did a cold boot.
>
> I can bypass this by making the debug kernel the default, but WHY? Is the
> kernel not saved such that any kernel can be rolled back into memory and run?

The kernel does nothing to the bootloader during suspend. The kernel does not
even know that you are using a bootloader and how it might be configured.

Userland has to do this (and SUSE's pm-utils actually do. I thought the
Fedora pm-utils also did, but I cannot say for sure). "Just" find out which
entry in menu.lst corresponds to the currently running kernel, and preselect
it for the next boot. It is doable.
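
The lookup step might look roughly like this. It is a minimal sketch, not
what pm-utils does: the function name is hypothetical, the menu.lst contents
are passed in as a string, and an entry is taken to match when its "kernel"
line contains the release string (as printed by uname -r) as a substring.
That substring rule is an assumption and can mismatch when one release
string is a prefix of another.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the 0-based index of the first "title" stanza in a GRUB legacy
 * menu.lst whose "kernel" line mentions `release`, or -1 if none does.
 */
int find_menu_entry(const char *menu, const char *release)
{
	const char *line = menu;
	size_t rlen;
	int index = -1; /* index of the stanza currently being scanned */

	assert(menu && release);
	rlen = strlen(release);

	while (*line) {
		const char *end = strchr(line, '\n');
		size_t len = end ? (size_t)(end - line) : strlen(line);
		const char *p = line;
		size_t rest, i;

		/* Skip leading blanks; stanza bodies are usually indented. */
		while (p < line + len && (*p == ' ' || *p == '\t'))
			p++;
		rest = len - (size_t)(p - line);

		if (rest >= 5 && strncmp(p, "title", 5) == 0) {
			index++;
		} else if (index >= 0 && rest >= 6 &&
			   strncmp(p, "kernel", 6) == 0) {
			/* Search for the release within this line only. */
			for (i = 6; rlen && i + rlen <= rest; i++)
				if (strncmp(p + i, release, rlen) == 0)
					return index;
		}
		line = end ? end + 1 : line + len;
	}
	return -1;
}
```

The returned index is what a hibernate script would hand to the bootloader
as the entry to preselect for the next boot.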

So it's a problem of your distro's userland (and if you did not use
pm-hibernate to suspend, it is your very own problem).

You could of course simply go for GRUB's "default saved" and "savedefault"
feature, to always boot the last-booted kernel unless changed in the menu.
--
Stefan Seyfried
QA / R&D Team Mobile Devices | "Any ideas, John?"
SUSE LINUX Products GmbH, Nürnberg | "Well, surrounding them's out."

This footer brought to you by insane German lawmakers:
SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)

2007-06-05 14:07:25

by Bill Davidsen

Subject: Re: [2.6.21.1] resume doesn't run suspended kernel?

Stefan Seyfried wrote:
> Hi,
>
> On Sat, May 26, 2007 at 06:42:37PM -0400, Bill Davidsen wrote:
>
>> I was testing susp2disk in 2.6.21.1 under FC6, to support reliable computing
>> environment (RCE) needs. The idea is that if power fails, after some short
>> time on UPS the system does susp2disk with a time set, and boots back every
>> so often to see if power is stable.
>>
>
> Interesting use case.
>
>
>> No, I don't want susp2mem until I debug it, console come up in useless mode,
>> console as kalidescope is not what I need.
>>
>
> You probably need to reset the video mode. Try the s2ram workaround,
> specifically "-m".
>
>
>> Anyway, I pulled the plug on the UPS, and the system shut down. But when it
>> powered up, it booted the default kernel rather than the test kernel, decided
>> that it couldn't resume, and then did a cold boot.
>>
>> I can bypass this by making the debug kernel the default, but WHY? Is the
>> kernel not saved such that any kernel can be rolled back into memory and run?
>>
>
> The Kernel does nothing to the bootloader during suspend. The kernel does not
> even know that you are using a bootloader and how it might be configured.
>
>
What I really expected is that what I was running would be saved, and
resume would restore what I was running and then jump back to where that
suspended itself. Without having to address the issue of booting the
"right" kernel, but having any functional kernel which was booted then
restore what was originally suspended.

From discussion here, I conclude that "it could work that way but doesn't."

> Userland has to do this (and SUSE's pm-utils actually do. I thought the
> Fedora pm-utils also did, but i cannot say for sure). "Just" find out which
> entry in menu.lst corresponds to the currently running kernel, and preselect
> it for the next boot. It is doable.
>
> So it's a problem of your distro's userland (and if you did not use
> pm-hibernate to suspend, it is your very own problem).
>
> You could of course simply go for GRUB's "default saved" and "savedefault"
> feature, to always boot the last-booted kernel unless changed in the menu.
>
I'm being very careful to avoid changing the default boot kernel. If the
system suspends (i.e. deliberately) I want to resume in the running
kernel, but if it crashes I want the cold boot to bring up a known
stable kernel, even though that may be lacking in features, have an old
scheduler, etc.
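
That policy amounts to a one-shot override next to a fixed default. A minimal
sketch, with a hypothetical struct and function rather than any bootloader's
real interface: the suspend path sets the override, the next boot consumes
it, and every later boot (including one after a crash) gets the default.

```c
#include <assert.h>

/* Hypothetical boot selection state; -1 means "no one-shot entry set". */
struct boot_choice {
	int default_entry; /* known stable kernel, never modified here */
	int once_entry;    /* set just before a deliberate suspend */
};

/*
 * Pick the entry for the boot that is starting now. A pending one-shot
 * entry wins and is cleared, so it applies to exactly one boot.
 */
int next_boot_entry(struct boot_choice *c)
{
	assert(c);

	if (c->once_entry >= 0) {
		int entry = c->once_entry;

		c->once_entry = -1;
		return entry;
	}
	return c->default_entry;
}
```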

--
bill davidsen <[email protected]>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979