2007-05-07 13:53:25

by Pavel Machek

Subject: Re: Back to the future.

Hi!

> nobody is suggesting that you leave processes running
> while you do the snapshot, what is being proposed is
>
> 1. pause userspace (prevent scheduling)
> 2. make snapshot image of memory
> 3. make mounted filesystems read-only (possibly with
> snapshot/checkpoint)
> 4. unpause
> 5. save image (with full userspace available, including
> network)

Including the network? Your TCP peers will be really confused, then, if
you ACK packets and then claim you did not get them. No, you do not want
to start the network.
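To make the objection concrete, here is a toy Python model (not kernel code; the class and function names are hypothetical) of the receive side of one TCP connection when the network stays up between step 2 (snapshot) and resume. It ignores sequence-number wraparound, windows and timers, and tracks only which bytes have been ACKed.

```python
from dataclasses import dataclass


@dataclass
class Receiver:
    """Toy model of the hibernating host's receive side."""
    rcv_nxt: int  # next byte expected; everything below has been ACKed


@dataclass
class Sender:
    """Toy model of the remote peer's retransmit queue."""
    snd_una: int  # oldest unACKed byte; earlier data has been freed
    snd_nxt: int  # next byte the peer will send


def deliver(rx: Receiver, tx: Sender, nbytes: int) -> None:
    """Peer sends nbytes; the host receives and ACKs all of them."""
    tx.snd_nxt += nbytes
    rx.rcv_nxt = tx.snd_nxt
    tx.snd_una = rx.rcv_nxt  # ACKed data is dropped from the retransmit queue


def hibernate_with_network_after_snapshot(start: int, after_snapshot: int):
    rx = Receiver(rcv_nxt=start)
    tx = Sender(snd_una=start, snd_nxt=start)
    snapshot = Receiver(rcv_nxt=rx.rcv_nxt)  # step 2: memory image taken
    deliver(rx, tx, after_snapshot)          # step 5: network up, more data ACKed
    rx = snapshot                            # resume: state rolls back to the image
    # Bytes the host now expects but the peer can no longer retransmit.
    unrecoverable = tx.snd_una - rx.rcv_nxt
    return rx.rcv_nxt, tx.snd_una, unrecoverable
```

With `start=1000` and 500 bytes ACKed after the snapshot, the resumed host waits for byte 1000 while the peer has already discarded everything before 1500, so those 500 bytes can never be retransmitted.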

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


2007-05-07 02:52:52

by David Lang

Subject: Re: Back to the future.

On Thu, 3 May 2007, Pavel Machek wrote:

> Hi!
>
>> nobody is suggesting that you leave processes running
>> while you do the snapshot, what is being proposed is
>>
>> 1. pause userspace (prevent scheduling)
>> 2. make snapshot image of memory
>> 3. make mounted filesystems read-only (possibly with
>> snapshot/checkpoint)
>> 4. unpause
>> 5. save image (with full userspace available, including
>> network)
>
> Including network? Your tcp peers will be really confused, then, if
> you ACK packets then claim you did not get them. No, you do not want
> to start network.

anyone who does a hibernate or suspend and expects all the network
connections to be working afterwards is dreaming or smoking something.

this is just another way that the failure can show up.

in fact, I would say that it would probably be a nice thing to do for
intervening firewalls and external servers if a suspend closed all external TCP
connections rather than leaving them dangling (eating up resources until they
time out).
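A minimal userspace sketch of that idea, assuming a hypothetical pre-suspend helper that is handed the sockets to drop (the thread does not say which component would own them). An explicit `shutdown()` before `close()` signals end-of-file to the peer (a FIN for TCP) even if other descriptors still reference the socket, so the peer and any firewall in between can release their state instead of waiting for an idle timeout.

```python
import socket


def close_gracefully(sockets) -> int:
    """Hypothetical pre-suspend helper: shut down and close each socket.

    Returns the number of sockets closed.
    """
    closed = 0
    for s in sockets:
        try:
            s.shutdown(socket.SHUT_RDWR)  # peer sees EOF (FIN for TCP) now
        except OSError:
            pass  # already disconnected; still release the descriptor
        s.close()
        closed += 1
    return closed
```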

if your software can't tolerate the network connection going away, it will
have problems in normal operation anyway, let alone when you suspend/hibernate
your machine.

David Lang

2007-05-07 03:34:23

by Kyle Moffett

Subject: Re: Back to the future.

On May 06, 2007, at 22:13:51, David Lang wrote:
> anyone who is doing a hibernate or suspend who expect all the
> network connections to be working afterwards is dreaming or
> smoking something.
>
> this is just another way that the failure can show up.
>
> in fact, I would say that it would probably be a nice thing to do
> for intervening firewalls and external servers if a suspend closed
> all external TCP connections rather than leaving them dangling
> (eating up resources until they time out)
>
> if your software can't tolerate the network connection going away on
> you it will have problems in normal operation anyway, let alone
> when you suspend/hibernate your machine.

Yeah, for suspend-to-RAM+resume and for snapshot+restore you probably
want userspace to support some kind of initscript-like mechanism
which is triggered by the lid switch or something similar before calling into
the kernel. That way it can close network connections mostly nicely
and down network interfaces before suspending, then re-run DHCP/
802.11/whatever configuration after resume/restore. That might not
be a bad place to handle NFS mounts and such too.
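A sketch of the ordering described above, assuming a hypothetical userspace runner (`run_suspend_cycle` and the hook names are illustrative, not an existing tool). Pre-suspend steps run in order and their post-resume counterparts run in reverse, so, under the assumption that NFS is listed first, it is unmounted while the network still works and remounted only after DHCP/802.11 setup has been redone. If a pre-suspend step fails, the steps already done are undone and the sleep is skipped.

```python
from typing import Callable, List, Tuple

# (name, pre_suspend, post_resume); hook names are hypothetical placeholders
Hook = Tuple[str, Callable[[], None], Callable[[], None]]


def run_suspend_cycle(hooks: List[Hook], enter_sleep: Callable[[], None]) -> List[str]:
    """Run pre-suspend hooks in order, sleep, then run post-resume hooks
    in reverse order. Returns a log of the steps that ran."""
    log: List[str] = []
    completed: List[Tuple[str, Callable[[], None]]] = []
    try:
        for name, pre, post in hooks:
            pre()
            log.append(f"pre:{name}")
            completed.append((name, post))
        enter_sleep()  # e.g. writing "mem" or "disk" to /sys/power/state
        log.append("sleep")
    finally:
        # Unwind like a stack, even if a pre-suspend hook raised.
        for name, post in reversed(completed):
            post()
            log.append(f"post:{name}")
    return log


if __name__ == "__main__":
    def noop() -> None:
        pass

    hooks = [
        ("nfs", noop, noop),          # umount NFS while the network still works
        ("connections", noop, noop),  # close sockets "mostly nicely"
        ("interfaces", noop, noop),   # ifdown; post step re-runs DHCP/802.11 setup
    ]
    print(run_suspend_cycle(hooks, enter_sleep=noop))
```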

Cheers,
Kyle Moffett

2007-05-07 12:48:23

by Pavel Machek

Subject: Re: Back to the future.

Hi!

> >>nobody is suggesting that you leave processes running
> >>while you do the snapshot, what is being proposed is
> >>
> >>1. pause userspace (prevent scheduling)
> >>2. make snapshot image of memory
> >>3. make mounted filesystems read-only (possibly with
> >>snapshot/checkpoint)
> >>4. unpause
> >>5. save image (with full userspace available, including
> >>network)
> >
> >Including network? Your tcp peers will be really confused, then, if
> >you ACK packets then claim you did not get them. No, you do not want
> >to start network.
>
> anyone who is doing a hibernate or suspend who expect all the network
> connections to be working afterwards is dreaming or smoking
> something.

Really? It works today... if the suspend is short enough. And that's
how it should be.
Pavel

2007-05-07 12:52:30

by Oliver Neukum

Subject: Re: Back to the future.

On Monday, 7 May 2007 14:48, Pavel Machek wrote:
> > >Including network? Your tcp peers will be really confused, then, if
> > >you ACK packets then claim you did not get them. No, you do not want
> > >to start network.
> >
> > anyone who is doing a hibernate or suspend who expect all the network
> > connections to be working afterwards is dreaming or smoking
> > something.
>
> Really? It works today... if the suspend is short enough. And that's
> how it should be.

If we get very good at Wake-on-LAN, it should work for any length
of time.
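The thread does not say which Wake-on-LAN trigger is meant; one widely supported one is the "magic packet", which a NIC armed for it (on Linux typically via `ethtool -s <iface> wol g`) watches for while the machine sleeps. A sketch of building and broadcasting one in Python; the broadcast address and UDP port 9 are conventional choices rather than requirements, and the function names are hypothetical.

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Build a magic packet: 6 bytes of 0xFF followed by the target MAC
    address repeated 16 times (102 bytes total)."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the packet over UDP to wake the sleeping host."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))
```

Whether the NIC stays powered and armed depends on the hardware and firmware, which is one reason suspend-to-RAM and hibernation can behave differently here.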

Regards
Oliver

2007-05-07 14:40:39

by David Lang

Subject: Re: Back to the future.

On Mon, 7 May 2007, Oliver Neukum wrote:

> On Monday, 7 May 2007 14:48, Pavel Machek wrote:
>>>> Including network? Your tcp peers will be really confused, then, if
>>>> you ACK packets then claim you did not get them. No, you do not want
>>>> to start network.
>>>
>>> anyone who is doing a hibernate or suspend who expect all the network
>>> connections to be working afterwards is dreaming or smoking
>>> something.
>>
>> Really? It works today... if the suspend is short enough. And that's
>> how it should be.
>
> If we get very good at Wake-on-Lan it should work for any length
> of time.

for suspend-to-ram this would work, I stand corrected.

for hibernate this would almost certainly not work, and I don't think that
it's worth raising false hopes.

David Lang

2007-05-07 19:51:56

by Pavel Machek

Subject: Re: Back to the future.

Hi!

> >>Really? It works today... if the suspend is short
> >>enough. And that's how it should be.
> >
> >If we get very good at Wake-on-Lan it should work for
> >any length of time.
>
> for suspend-to-ram this would work, I stand corrected.
>
> for hibernate this would almost certainly not work, and I
> don't think that it's worth raising false hopes.

Check the facts. It used to work, and it should work today.


2007-05-07 19:59:10

by David Lang

Subject: Re: Back to the future.

On Mon, 7 May 2007, Pavel Machek wrote:

>>>> Really? It works today... if the suspend is short
>>>> enough. And that's how it should be.
>>>
>>> If we get very good at Wake-on-Lan it should work for
>>> any length of time.
>>
>> for suspend-to-ram this would work, I stand corrected.
>>
>> for hibernate this would almost certainly not work, and I
>> don't think that it's worth raising false hopes.
>
> Check the facts. It used to work, and it should work today.

I don't dispute that it sometimes works today.

what I dispute is that making it work should be a constraint on a cleaner
design that happens to cause TCP connections to fail on suspend-to-disk
(hibernate).

if you are doing suspend-to-disk for such a short period that TCP
connections are able to recover (typically <15 min for most firewalls, in
some cases <2 min for connections with keep-alive), is it really worth it?
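The arithmetic behind those limits can be sketched as follows. The 15-minute firewall figure comes from the message above; the keepalive values (60 s idle, 10 s interval, 6 probes) are an assumed example chosen to land at 2 minutes, and the helper names are hypothetical. A peer with TCP keepalive enabled gives up after roughly `idle + interval * probes` seconds of silence. The model ignores retransmission timeouts for data already in flight.

```python
import socket
from typing import Optional, Tuple


def keepalive_detection_seconds(idle: int, interval: int, probes: int) -> int:
    """Approximate time for a keepalive-enabled peer to drop a silent host:
    `idle` seconds of silence, then `probes` unanswered probes sent
    `interval` seconds apart."""
    return idle + interval * probes


def connection_survives(suspend_s: int, firewall_idle_s: int,
                        keepalive: Optional[Tuple[int, int, int]] = None) -> bool:
    """True if a suspend of `suspend_s` seconds ends before the firewall's
    idle timer or the peer's keepalive (idle, interval, probes) expires."""
    limit = firewall_idle_s
    if keepalive is not None:
        limit = min(limit, keepalive_detection_seconds(*keepalive))
    return suspend_s < limit


def enable_keepalive(sock: socket.socket, idle: int, interval: int, probes: int) -> None:
    """Enable keepalive on a TCP socket; the per-socket timers are set only
    where the platform exposes them (TCP_KEEPIDLE etc. are Linux names)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
```

With the assumed values the peer gives up after 120 seconds, so a 5-minute hibernation kills the connection even though the firewall would still hold it. With Linux's default sysctls (`tcp_keepalive_time=7200`, `tcp_keepalive_intvl=75`, `tcp_keepalive_probes=9`) the peer would wait 7875 seconds, and the firewall timeout would dominate instead.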

and once you pass the timeframes in which the connections are still alive,
it shouldn't matter; in fact, the server should gracefully close
the connections to be nice to other devices and servers on the network.

I dispute the idea that doing a suspend-to-disk and expecting that your
network connections will recover when you wake up is a sane expectation.

David Lang

2007-05-07 20:39:15

by Pavel Machek

Subject: Re: Back to the future.

Hi!

> I don't dispute that it sometimes works today.
>
> what I dispute is that making it work should be a constraint on a cleaner
> design that happens to cause tcp connections to fail on suspend-to-disk
> (hibernate).
>
> if you are doing suspend-to-disk for such a short period that TCP
> connections are able to recover (typically <15 min for most firewalls, in
> some cases <2 min for connections with keep-alive) is it really
> worth it?

People were using swsusp to move servers from one room to another.
Pavel

2007-05-08 17:36:24

by Disconnect

Subject: Re: Back to the future.

We used it (with great success) to replace bad UPSs on single-PSU
database servers under (light) load. No need for scheduled downtime,
etc.

The whole point of hibernation (or suspend to disk, or whatever you
call it) is that the system goes to a zero-power state and then can be
brought back to its original state. Closing in-progress network
connections has nothing to do with pausing a machine any more than
setting IM clients to 'away' would, or locking an X session. That sort
of side-effect needs to be handled outside the core of "put state out
to disk and read it back".

On 5/7/07, Pavel Machek wrote:
> Hi!
>
> > I don't dispute that it sometimes works today.
> >
> > what I dispute is that making it work should be a constraint on a cleaner
> > design that happens to cause tcp connections to fail on suspend-to-disk
> > (hibernate).
> >
> > if you are doing suspend-to-disk for such a short period that TCP
> > connections are able to recover (typically <15 min for most firewalls, in
> > some cases <2 min for connections with keep-alive) is it really
> > worth it?
>
> People were using swsusp to move server from one room to another.
> Pavel