2005-03-18 16:33:32

by Erik Andrén

Subject: Suspend-to-disk woes

Hello, I experienced a pretty nasty problem a couple of days back:

I ran 2.6.11-ck1 and built 2.6.11-ck2. The last thing I did before
booting the new kernel was to suspend-to-disk the old kernel (something
I usually do as I'm working on this laptop).
I ran the new kernel a couple of days and decided to boot the old kernel
to do some performance tests. Imagine my dread as the old kernel instead
of detecting that the system has booted another kernel just reloads the
old suspend-to-disk image. The result is that after successfully
resuming, my hard drive goes bonkers and starts to work. After a couple
of minutes the whole kernel hangs. I reboot and try to boot the -ck2
kernel again only to find that the system complains as it finds missing
nodes. The reisertools try to rebuild the system unsuccessfully. The
--rebuild-tree parameter worked but a lot of files were still missing.
In the end I had to reinstall the whole system as it went so unstable.

My question is: Why isn't there a check before resuming a
suspend-to-disk image if the system has booted another kernel since the
suspend to prevent this kind of hassle?
//Regards Erik Andrén

Please cc me as I'm not on the lkml list yadda yadda


2005-03-18 20:55:37

by Stefan Seyfried

Subject: Re: Suspend-to-disk woes

Erik Andrén wrote:

> My question is: Why isn't there a check before resuming a
> suspend-to-disk image if the system has booted another kernel since the
> suspend to prevent this kind of hassle?

Just provide a patch which does this. Hint: this is highly nontrivial.
If you boot a kernel that does not know about swsusp (if it did, it
would have invalidated the suspend image in swap), or one that does not
have the necessary information (because of a missing resume= parameter),
that kernel cannot do much.

Stefan

2005-03-18 22:04:23

by Nigel Cunningham

Subject: Re: Suspend-to-disk woes

Hi.

The simplest solution is to mkswap your swap partitions during boot.

Nigel

On Sat, 2005-03-19 at 03:28, Erik Andrén wrote:
> Hello, I experienced a pretty nasty problem a couple of days back:
>
> I ran 2.6.11-ck1 and built 2.6.11-ck2. The last thing I did before
> booting the new kernel was to suspend-to-disk the old kernel (something
> I usually do as I'm working on this laptop).
> I ran the new kernel a couple of days and decided to boot the old kernel
> to do some performance tests. Imagine my dread as the old kernel instead
> of detecting that the system has booted another kernel just reloads the
> old suspend-to-disk image. The result is that after successfully
> resuming, my hard drive goes bonkers and starts to work. After a couple
> of minutes the whole kernel hangs. I reboot and try to boot the -ck2
> kernel again only to find that the system complains as it finds missing
> nodes. The reisertools try to rebuild the system unsuccessfully. The
> --rebuild-tree parameter worked but a lot of files were still missing.
> In the end I had to reinstall the whole system as it went so unstable.
>
> My question is: Why isn't there a check before resuming a
> suspend-to-disk image if the system has booted another kernel since the
> suspend to prevent this kind of hassle?
> //Regards Erik Andrén
>
> Please cc me as I'm not on the lkml list yadda yadda
--
Nigel Cunningham
Software Engineer, Canberra, Australia
http://www.cyclades.com
Bus: +61 (2) 6291 9554; Hme: +61 (2) 6292 8028; Mob: +61 (417) 100 574

Maintainer of Suspend2 Kernel Patches http://suspend2.net

2005-03-19 19:20:59

by Pavel Machek

Subject: Re: Suspend-to-disk woes

Hi!

> Hello, I experienced a pretty nasty problem a couple of days back:
>
> I ran 2.6.11-ck1 and built 2.6.11-ck2. The last thing I did before
> booting the new kernel was to suspend-to-disk the old kernel
> (something I usually do as I'm working on this laptop).
> I ran the new kernel a couple of days and decided to boot the old
> kernel to do some performance tests. Imagine my dread as the old
> kernel instead of detecting that the system has booted another kernel
> just reloads the old suspend-to-disk image. The result is that after
> successfully resuming, my hard drive goes bonkers and starts to work.
> After a couple of minutes the whole kernel hangs. I reboot and try to
> boot the -ck2 kernel again only to find that the system complains as
> it finds missing nodes. The reisertools try to rebuild the system
> unsuccessfully. The --rebuild-tree parameter worked but a lot of files
> were still missing. In the end I had to reinstall the whole system as
> it went so unstable.
>
> My question is: Why isn't there a check before resuming a
> suspend-to-disk image if the system has booted another kernel since
> the suspend to prevent this kind of hassle?

Checking that would be hard, but you might want to provide a patch to check
last-mounted dates of filesystems and panic if they changed.
Pavel
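
A minimal sketch of the last-mounted check, assuming an ext2/ext3
filesystem. That choice is made only because the superblock layout is
simple: it starts 1024 bytes into the device, with the mount time
(s_mtime) at offset 44 and the magic number 0xEF53 at offset 56. The
function names are hypothetical, and where the suspend-time value would
be kept in the image is left open.

```c
#include <stddef.h>
#include <stdint.h>

#define EXT2_SB_OFFSET 1024   /* superblock starts 1024 bytes into the device */
#define EXT2_MTIME_OFF 44     /* s_mtime: last mount time, little-endian u32 */
#define EXT2_MAGIC_OFF 56     /* s_magic: little-endian u16 */
#define EXT2_MAGIC     0xEF53

/* Decode little-endian integers byte by byte, independent of host order. */
static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * dev holds the first len bytes of the block device.
 * Returns 0 and stores the last mount time in *mtime, or -1 if the
 * buffer is too short or does not carry the ext2 magic number.
 */
int ext2_last_mount_time(const unsigned char *dev, size_t len, uint32_t *mtime)
{
    const unsigned char *sb;

    if (dev == NULL || mtime == NULL ||
        len < EXT2_SB_OFFSET + EXT2_MAGIC_OFF + 2)
        return -1;

    sb = dev + EXT2_SB_OFFSET;
    if (le16(sb + EXT2_MAGIC_OFF) != EXT2_MAGIC)
        return -1;

    *mtime = le32(sb + EXT2_MTIME_OFF);
    return 0;
}

/* Nonzero if the filesystem was mounted again after the image was written. */
int mounted_since_suspend(uint32_t mtime_at_suspend, uint32_t mtime_now)
{
    return mtime_at_suspend != mtime_now;
}
```

The resume path would refuse to load the image whenever
mounted_since_suspend() is nonzero for any filesystem that was mounted
at suspend time.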
--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

2005-03-19 20:19:23

by Russell Miller

Subject: Re: Suspend-to-disk woes

On Saturday 19 March 2005 05:26, Pavel Machek wrote:

> Checking that would be hard, but you might want to provide a patch to check
> last-mounted dates of filesystems and panic if they changed.
> Pavel

Then how would you fix it? There'd also have to be a way to reset it,
otherwise the kernel will never boot again. Perhaps an argument to the
kernel that allows for resetting of the mechanism?

--Russell

--

Russell Miller - [email protected] - Agoura, CA

2005-03-19 21:29:38

by Pavel Machek

Subject: Re: Suspend-to-disk woes

On Sat 19-03-05 12:20:35, Russell Miller wrote:
> On Saturday 19 March 2005 05:26, Pavel Machek wrote:
>
> > Checking that would be hard, but you might want to provide a patch to check
> > last-mounted dates of filesystems and panic if they changed.
> > Pavel
>
> Then how would you fix it? There'd also have to be a way to reset it,

boot with "noresume", then mkswap.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2005-03-19 21:42:49

by Russell Miller

Subject: Re: Suspend-to-disk woes

On Saturday 19 March 2005 13:29, Pavel Machek wrote:
> On Sat 19-03-05 12:20:35, Russell Miller wrote:
> > On Saturday 19 March 2005 05:26, Pavel Machek wrote:
> > > Checking that would be hard, but you might want to provide a patch to
> > > check last-mounted dates of filesystems and panic if they changed.
> > > Pavel
> >
> > Then how would you fix it? There'd also have to be a way to reset it,
>
> boot with "noresume", then mkswap.
> Pavel
Ah, makes sense.

I've never used the resume functionality, so my ignorance on that subject is
understandable... :-)

--Russell


2005-03-21 00:12:33

by Nigel Cunningham

Subject: Re: Suspend-to-disk woes

Hi.

On Sun, 2005-03-20 at 08:29, Pavel Machek wrote:
> On Sat 19-03-05 12:20:35, Russell Miller wrote:
> > On Saturday 19 March 2005 05:26, Pavel Machek wrote:
> >
> > > Checking that would be hard, but you might want to provide a patch to check
> > > last-mounted dates of filesystems and panic if they changed.
> > > Pavel
> >
> > Then how would you fix it? There'd also have to be a way to reset it,
>
> boot with "noresume", then mkswap.

Yuck! Why panic when you know what is needed? A better solution is to
tell the user they've messed up and give them the option to (1) reboot
and try another kernel or (2) have swsusp restore the original swap
signature and continue booting. This is what suspend2 does (with a
timeout for the prompt). It's not that hard.

Regards,

Nigel

2005-03-21 00:18:01

by Matthew Garrett

Subject: Re: Suspend-to-disk woes

Nigel Cunningham <[email protected]> wrote:

> Yuck! Why panic when you know what is needed? A better solution is to
> tell the user they've messed up and give them the option to (1) reboot
> and try another kernel or (2) have swsusp restore the original swap
> signature and continue booting. This is what suspend2 does (with a
> timeout for the prompt). It's not that hard.

It's trivial to do this in userspace - just have an app in initramfs
that checks for a swsusp signature, and then compare the kernel
versions. If they mismatch, prompt for what to do. Putting it in the
kernel is madness.
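
As a rough illustration of the signature check described above, the
following C sketch classifies the signature found in the last 10 bytes
of the first page of a swap device. It assumes the mainline swsusp
convention of overwriting the usual swap signature (SWAP-SPACE or
SWAPSPACE2) with S1SUSPEND; Suspend2 uses its own signatures, which are
not covered here. The names classify_swap_sig and enum swap_sig are made
up for this example. Comparing kernel versions would additionally
require parsing the image header, which this sketch does not attempt.

```c
#include <stddef.h>
#include <string.h>

/* What the signature at the end of the first swap page tells us. */
enum swap_sig {
    SIG_UNKNOWN,      /* neither a swap nor a known suspend signature */
    SIG_PLAIN_SWAP,   /* ordinary swap space, no image to worry about */
    SIG_SWSUSP_IMAGE  /* swsusp has marked the device as holding an image */
};

#define SIG_LEN 10

/*
 * Classify the signature stored in the final SIG_LEN bytes of page,
 * which must hold the first page of the device (page_size bytes).
 */
enum swap_sig classify_swap_sig(const unsigned char *page, size_t page_size)
{
    const unsigned char *sig;

    if (page == NULL || page_size < SIG_LEN)
        return SIG_UNKNOWN;

    sig = page + page_size - SIG_LEN;

    if (memcmp(sig, "SWAP-SPACE", SIG_LEN) == 0 ||
        memcmp(sig, "SWAPSPACE2", SIG_LEN) == 0)
        return SIG_PLAIN_SWAP;

    /* "S1SUSPEND" is 9 characters; the tenth byte is the terminating NUL. */
    if (memcmp(sig, "S1SUSPEND", SIG_LEN) == 0)
        return SIG_SWSUSP_IMAGE;

    return SIG_UNKNOWN;
}
```

A caller in the initramfs would read the first page of the device named
by resume= (page size from sysconf(_SC_PAGESIZE)) and pass the buffer
in; only SIG_SWSUSP_IMAGE calls for the version comparison and prompt.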

--
Matthew Garrett | [email protected]

2005-03-21 05:57:38

by Nigel Cunningham

Subject: Re: Suspend-to-disk woes

Hi.

On Mon, 2005-03-21 at 11:17, Matthew Garrett wrote:
> Nigel Cunningham <[email protected]> wrote:
>
> > Yuck! Why panic when you know what is needed? A better solution is to
> > tell the user they've messed up and give them the option to (1) reboot
> > and try another kernel or (2) have swsusp restore the original swap
> > signature and continue booting. This is what suspend2 does (with a
> > timeout for the prompt). It's not that hard.
>
> It's trivial to do this in userspace - just have an app in initramfs
> that checks for a swsusp signature, and then compare the kernel
> versions. If they mismatch, prompt for what to do. Putting it in the
> kernel is madness.

It's not that trivial.

- You need to know how to modify your initramfs to do it;
- You might have to (learn how to) set up an initramfs just for this;
- Your image might not be stored in a swap partition. For Suspend2, it
  can potentially be in a swap file or (soon) an ordinary file;
- Finding which partition to look in for the signature might be
  nontrivial (labels in fstab). You'd want to hard code it or (preferably)
  copy a config file from the root (or other) partition;
- Having addressed the above issues, you still need to add code to read
  the swap header, parse it to find the image header, read that header
  from the image, parse it and obtain the kernel version of the saved
  image.

If your image is not stored in a swap partition, you probably can't
mount the fs the image is stored on, because doing so will replay the
journal and make resuming unsafe. So this approach is even less trivial:
you need to know exactly which disk blocks and device IDs to use (and
access them with dd).

On top of these, we have two implementations, so you'll want to check
for the signatures of both.

That said, I am considering making something like what you're saying:
exposing methods of testing whether an image exists and an entry through
which you can get Suspend to erase an image via a proc (eventually
sysfs) entry. This will allow something like what you're saying to be
controlled from userspace.

Regards,

Nigel

2005-03-21 08:19:23

by Stefan Seyfried

Subject: Re: Suspend-to-disk woes

Nigel Cunningham wrote:
> Hi.
>
> On Sun, 2005-03-20 at 08:29, Pavel Machek wrote:

>> boot with "noresume", then mkswap.
>
> Yuck! Why panic when you know what is needed? A better solution is to

Ok, so let's

printk("You booted another kernel than you suspended with.\n");
printk("You have two options now:\n");
printk(" - boot the kernel you suspended with\n");
printk(" - pass 'noresume' at boot and mkswap your swap partition "
       "later\n");
printk("Try again, player 1!\n");
panic("swsusp: image was written by a different kernel");

> tell the user they've messed up and give them the option to (1) reboot
> and try another kernel or (2) have swsusp restore the original swap
> signature and continue booting. This is what suspend2 does (with a
> timeout for the prompt). It's not that hard.

yes, but you need user input etc. Not considered a good idea IIRC.

Anyway, the hard thing to do is to find out when to bail out and when
not. The part that handles the user interface is the easier one :-)

Regards,

Stefan

2005-03-21 11:21:17

by Nigel Cunningham

Subject: Re: Suspend-to-disk woes

Hi.

On Mon, 2005-03-21 at 18:38, Stefan Seyfried wrote:
> Nigel Cunningham wrote:
> > Hi.
> >
> > On Sun, 2005-03-20 at 08:29, Pavel Machek wrote:
>
> >> boot with "noresume", then mkswap.
> >
> > Yuck! Why panic when you know what is needed? A better solution is to
>
> Ok, so let's
>
> printk("You booted another kernel than you suspended with.\n");
> printk("You have two options now:\n");
> printk(" - boot the kernel you suspended with\n");
> printk(" - pass 'noresume' at boot and mkswap your swap partition "
>        "later\n");
> printk("Try again, player 1!\n");
> panic("swsusp: image was written by a different kernel");

Still in the yuck category, although the better information is
definitely an improvement :>

> > tell the user they've messed up and give them the option to (1) reboot
> > and try another kernel or (2) have swsusp restore the original swap
> > signature and continue booting. This is what suspend2 does (with a
> > timeout for the prompt). It's not that hard.
>
> yes, but you need user input etc. Not considered a good idea IIRC.

I understood that having it hang indefinitely was considered a bad idea.
Suspend2 already has code that does what I'm suggesting, and
incorporates a 30 second timeout.
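
The prompt-with-timeout behaviour can be sketched in userspace C as
follows. This is not Suspend2's code: the function name, the key
bindings ('1' to reboot, '2' to discard the image) and the choice of
fallback are assumptions made for illustration. Only the idea of a
timed prompt with a 30 second limit comes from the thread.

```c
#include <sys/select.h>
#include <unistd.h>

enum resume_choice {
    CHOICE_REBOOT = 1,        /* reboot and try another kernel */
    CHOICE_DISCARD_IMAGE = 2  /* restore the swap signature, boot normally */
};

/*
 * Wait up to timeout_secs for a single key on fd.
 * '1' and '2' select the matching choice. Any other key, a read error
 * or a timeout yields fallback, so the boot never hangs indefinitely.
 */
enum resume_choice prompt_with_timeout(int fd, int timeout_secs,
                                       enum resume_choice fallback)
{
    fd_set readable;
    struct timeval tv;
    char key;

    FD_ZERO(&readable);
    FD_SET(fd, &readable);
    tv.tv_sec = timeout_secs;
    tv.tv_usec = 0;

    /* select() returns 0 on timeout and -1 on error. */
    if (select(fd + 1, &readable, NULL, NULL, &tv) <= 0)
        return fallback;

    if (read(fd, &key, 1) != 1)
        return fallback;

    if (key == '1')
        return CHOICE_REBOOT;
    if (key == '2')
        return CHOICE_DISCARD_IMAGE;
    return fallback;
}
```

Called as prompt_with_timeout(STDIN_FILENO, 30, CHOICE_REBOOT), an
unattended machine falls through to the fallback after 30 seconds.
Which fallback is the safer default is a policy decision this sketch
leaves to the caller.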

> Anyway, the hard thing to do is to find out when to bail out and when
> not. The part that handles the user interface is the easier one :-)

Agreed. That's where Pavel's code might need a little hacking around.

Regards,

Nigel

2005-03-21 13:15:34

by Stefan Seyfried

Subject: Re: Suspend-to-disk woes

Hi,

Nigel Cunningham wrote:

> On Mon, 2005-03-21 at 11:17, Matthew Garrett wrote:

>> It's trivial to do this in userspace - just have an app in initramfs

> It's not that trivial.

> - Your image might not be stored in a swap partition. For Suspend2, it
> can potentially be in a swap file or (soon) an ordinary file;
> - Finding which partition to look in for the signature might be
> nontrivial (labels in fstab). You'd want to hard code it or (preferably)
> copy a config file from the root (or other) partition;
> - Having addressed the above issues, you still need to add code to read
> the swap header, parse it to find the header, read the header from the
> image, parse it and obtain the kernel version of the saved image.

Well, and you want to compile all this into the kernel? Just to hold the
hands of users who have not read the fine manual?
And you'd need to compile this into all kernels, especially those that
_don't_ support suspend to disk. Or you are back at the place where the
thread started.

> If your image is not stored in a swap partition, you probably can't
> mount the fs the image is stored on, because doing so will replay the
> journal and make resuming unsafe, so this approach is less trivial without
> knowing exactly which disk blocks and device IDs to use (and using dd to
> access them).

GRUB reads kernel and initramfs from a dirty reiserfs partition on
resume (although this is a bad idea if you want a fast resume, but
that's another problem). It is possible.

> On top of these, we have two implementations, so you'll want to check
> for the signatures of both.

This is the final argument for doing it in userspace :-).

> That said, I am considering making something like what you're saying:
> exposing methods of testing whether an image exists and an entry through
> which you can get Suspend to erase an image via a proc (eventually
> sysfs) entry. This will allow something like what you're saying to be
> controlled from userspace.

It does not help if the next kernel I boot is not suspend2 patched. This
work should rather go into a library that exports these functions to
userspace programs, for all known suspend implementations.

Regards,

Stefan

2005-03-21 21:44:42

by Nigel Cunningham

Subject: Re: Suspend-to-disk woes

Hi.

On Mon, 2005-03-21 at 20:33, Stefan Seyfried wrote:
> Hi,
>
> Nigel Cunningham wrote:
>
> > On Mon, 2005-03-21 at 11:17, Matthew Garrett wrote:
>
> >> It's trivial to do this in userspace - just have an app in initramfs
>
> > It's not that trivial.
>
> > - Your image might not be stored in a swap partition. For Suspend2, it
> > can potentially be in a swap file or (soon) an ordinary file;
> > - Finding which partition to look in for the signature might be
> > nontrivial (labels in fstab). You'd want to hard code it or (preferably)
> > copy a config file from the root (or other) partition;
> > - Having addressed the above issues, you still need to add code to read
> > the swap header, parse it to find the header, read the header from the
> > image, parse it and obtain the kernel version of the saved image.
>
> Well, and you want to compile all this into the kernel? Just to hold the
> hands of users who have not read the fine manual?

Most of it is in there anyway - the kernel code needs to check the image
exists and read the header irrespective of whether it does sanity
checking. In Suspend2, this code is also used for other error conditions
that can stop you being able to resume (failure to load the right
modules in an initrd, failure at accessing the device where the image
should be found etc).

> And you'd need to compile this into all kernels, especially those that
> _don't_ support suspend to disk. Or you are back at the place where the
> thread started.

Yes. The real solution is for all kernels on a system to either support
suspend to disk or not support it. Half measures are what cause the
problem.

> > If your image is not stored in a swap partition, you probably can't
> > mount the fs the image is stored on, because doing so will replay the
> > journal and make resuming unsafe, so this approach is less trivial without
> > knowing exactly which disk blocks and device IDs to use (and using dd to
> > access them).
>
> GRUB reads kernel and initramfs from a dirty reiserfs partition on
> resume (although this is a bad idea if you want a fast resume, but
> that's another problem). It is possible.

Mmm. I know it's all possible, but I'm pointing out the issues that make
it not "trivial", which was the original claim.

> > On top of these, we have two implementations, so you'll want to check
> > for the signatures of both.
>
> This is the final argument for doing it in userspace :-).

How so? You then have to maintain two codebases for doing all this
reading and parsing.

> > That said, I am considering making something like what you're saying:
> > exposing methods of testing whether an image exists and an entry through
> > which you can get Suspend to erase an image via a proc (eventually
> > sysfs) entry. This will allow something like what you're saying to be
> > controlled from userspace.
>
> It does not help if the next kernel I boot is not suspend2 patched. This
> work should rather go into a library that exports these functions to
> userspace programs, for all known suspend implementations.

So don't use kernels that aren't suspend2 patched :>

If someone said "I want to boot a kernel that doesn't have support for
ext3 but my rootfs is ext3", would we say "Well then, write a userspace
ext3 driver"? Not exactly the same, I know, but I think the point
stands. We'd say "Don't be silly. Put in the support you need."

The real solution to this mess is to get distros compiling in support
for suspend-to-disk by default. I realise that hasn't been attractive.
Hopefully it will change real-soon-now.

Regards,

Nigel