2008-10-17 23:33:18

by Dave Hansen

Subject: Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

On Wed, 2008-09-03 at 14:57 +0400, Andrey Mirkin wrote:
> This patchset introduces kernel based checkpointing/restart as it is
> implemented in the OpenVZ project. This patchset has limited functionality and
> is able to checkpoint/restart only a single process. Recently Oren Laadan
> sent another kernel based implementation of checkpoint/restart. The main
> differences between this patchset and Oren's patchset are:

Hi Andrey,

I'm curious what you want to happen with this patch set. Is there
something specific in Oren's set that is deficient which you need
implemented? Are there some technical reasons you prefer this code?

-- Dave


2008-10-20 11:10:22

by Louis Rilling

Subject: Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

On Fri, Oct 17, 2008 at 04:33:03PM -0700, Dave Hansen wrote:
> On Wed, 2008-09-03 at 14:57 +0400, Andrey Mirkin wrote:
> > This patchset introduces kernel based checkpointing/restart as it is
> > implemented in OpenVZ project. This patchset has limited functionality and
> > are able to checkpoint/restart only single process. Recently Oren Laaden
> > sent another kernel based implementation of checkpoint/restart. The main
> > differences between this patchset and Oren's patchset are:
>
> Hi Andrey,
>
> I'm curious what you want to happen with this patch set. Is there
> something specific in Oren's set that deficient which you need
> implemented? Are there some technical reasons you prefer this code?

To be fair, and since (IIRC) the initial intent was to start with OpenVZ's
approach, shouldn't Oren answer the same questions with respect to Andrey's
patchset?

I'm afraid that we are forgetting to take the best from both approaches...

Louis

--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes


2008-10-20 12:15:28

by Andrey Mirkin

Subject: Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

On Saturday 18 October 2008 03:33 Dave Hansen wrote:
> On Wed, 2008-09-03 at 14:57 +0400, Andrey Mirkin wrote:
> > This patchset introduces kernel based checkpointing/restart as it is
> > implemented in OpenVZ project. This patchset has limited functionality
> > and are able to checkpoint/restart only single process. Recently Oren
> > Laaden sent another kernel based implementation of checkpoint/restart.
> > The main differences between this patchset and Oren's patchset are:
>
> Hi Andrey,
>
> I'm curious what you want to happen with this patch set. Is there
> something specific in Oren's set that deficient which you need
> implemented? Are there some technical reasons you prefer this code?

Hi Dave,

Right now my patchset (v2) provides the ability to checkpoint and restart a
group of processes. Checkpointing and restart can be initiated from an
external process (not from the process being checkpointed).
Also, I think that the entire restart job (including process forking) should be
done in the kernel, as in this case we will not depend on user space and will be
more secure. This is also implemented in my patchset.

Andrey

2008-10-20 13:28:22

by Daniel Lezcano

Subject: Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

Louis Rilling wrote:
> On Fri, Oct 17, 2008 at 04:33:03PM -0700, Dave Hansen wrote:
>> On Wed, 2008-09-03 at 14:57 +0400, Andrey Mirkin wrote:
>>> This patchset introduces kernel based checkpointing/restart as it is
>>> implemented in OpenVZ project. This patchset has limited functionality and
>>> are able to checkpoint/restart only single process. Recently Oren Laaden
>>> sent another kernel based implementation of checkpoint/restart. The main
>>> differences between this patchset and Oren's patchset are:
>> Hi Andrey,
>>
>> I'm curious what you want to happen with this patch set. Is there
>> something specific in Oren's set that deficient which you need
>> implemented? Are there some technical reasons you prefer this code?
>
> To be fair, and since (IIRC) the initial intent was to start with OpenVZ's
> approach, shouldn't Oren answer the same questions with respect to Andrey's
> patchset?
>
> I'm afraid that we are forgetting to take the best from both approaches...

I agree with Louis.

I played with Oren's patchset and tried to port it to x86_64. I was able
to sys_checkpoint/sys_restart, but if you remove the restoring of the
general registers, the restart still works. I am not an expert on asm,
but my hypothesis is that when we call sys_checkpoint the registers are
saved on the stack by the syscall, and when we restore the memory of the
process, we restore the stack, and the stacked registers are restored
when exiting sys_restart. That makes me feel there is an important
gap between external checkpoint and internal checkpoint.

Dmitry's patchset is nice too, but IMO, it goes too far from what we
decided to do at the container mini-summit. I think there are a lot of
design questions to be solved before going further.

IMHO we should look at Dmitry's patchset and merge the external checkpoint
code into Oren's patchset in order to checkpoint *one* process and have
the process restart itself. At this point, we can begin to talk about
the restart itself: shall we have the kernel fork the processes to be
restarted? Shall we fork from userspace and implement some mechanism to
have each process restart itself? etc...







2008-10-20 13:48:47

by Cédric Le Goater

Subject: Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

>> I'm afraid that we are forgetting to take the best from both
>> approaches...
>
> I agree with Louis.
>
> I played with Oren's patchset and tryed to port it on x86_64. I was able
> to sys_checkpoint/sys_restart but if you remove the restoring of the
> general registers, the restart still works. I am not an expert on asm,
> but my hypothesis is when we call sys_checkpoint the registers are saved
> on the stack by the syscall and when we restore the memory of the
> process, we restore the stack and the stacked registers are restored
> when exiting the sys_restart. That make me feel there is an important
> gap between external checkpoint and internal checkpoint.
>
> Dmitry's patchset is nice too, but IMO, it goes too far from what we

I think you are talking about Andrey.

C.

> decided to do at the container mini-summit. I think there are a lot of
> design questions to be solved before going further.
>
> IMHO we should look at Dmitry patchset and merge the external checkpoint
> code to Oren's patchset in order to checkpoint *one* process and have
> the process to restart itself. At this point, we can begin to talk about
> the restart itself, shall we have the kernel to fork the processes to be
> restarted ? shall we fork from userspace and implement some mechanism to
> have each processes to restart themselves ? etc...

2008-10-20 13:49:39

by Daniel Lezcano

Subject: Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

Cedric Le Goater wrote:
>>> I'm afraid that we are forgetting to take the best from both
>>> approaches...
>> I agree with Louis.
>>
>> I played with Oren's patchset and tryed to port it on x86_64. I was able
>> to sys_checkpoint/sys_restart but if you remove the restoring of the
>> general registers, the restart still works. I am not an expert on asm,
>> but my hypothesis is when we call sys_checkpoint the registers are saved
>> on the stack by the syscall and when we restore the memory of the
>> process, we restore the stack and the stacked registers are restored
>> when exiting the sys_restart. That make me feel there is an important
>> gap between external checkpoint and internal checkpoint.
>>
>> Dmitry's patchset is nice too, but IMO, it goes too far from what we
>
> I think you are talking about Andrey.

Yes :)

2008-10-20 15:55:25

by Dave Hansen

Subject: Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

On Mon, 2008-10-20 at 16:14 +0400, Andrey Mirkin wrote:
> Right now my patchset (v2) provides an ability to checkpoint and restart a
> group of processes. The process of checkpointing and restart can be initiated
> from external process (not from the process which should be checkpointed).

Absolutely. Oren's code does it this way to make for a smaller patch at
first. The syscall takes a pid argument so it is surely expected to be
expanded upon later.

> Also I think that all the restart job (including process forking) should be
> done in kernel, as in this case we will not depend on user space and will be
> more secure. This is also implemented in my patchset.

Do you think that this is an approach that Oren's patches are married
to, or is this a "feature" we can add on later?

I don't care which patch set we end up sticking in the kernel. I'm
trying to figure out which code we can more easily build upon in the
future. The fact that Oren's or yours can't do certain little things
right now does not bother me.

Honestly, I'm a little more confident that everyone can work with Oren
since he managed to get 7 revisions of his patch out and make some
pretty large changes while in the same time the OpenVZ patch was only
released twice. I'm not sure what has changed in the OpenVZ patch
between releases, either.

Are there any reasons that you absolutely can not use the code Oren
posted? Will it not fulfill your needs somehow? If so, could you
please elaborate on how?

-- Dave

2008-10-20 15:56:29

by Oren Laadan

Subject: Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart



Daniel Lezcano wrote:
> Louis Rilling wrote:
>> On Fri, Oct 17, 2008 at 04:33:03PM -0700, Dave Hansen wrote:
>>> On Wed, 2008-09-03 at 14:57 +0400, Andrey Mirkin wrote:
>>>> This patchset introduces kernel based checkpointing/restart as it is
>>>> implemented in OpenVZ project. This patchset has limited functionality and
>>>> are able to checkpoint/restart only single process. Recently Oren Laaden
>>>> sent another kernel based implementation of checkpoint/restart. The main
>>>> differences between this patchset and Oren's patchset are:
>>> Hi Andrey,
>>>
>>> I'm curious what you want to happen with this patch set. Is there
>>> something specific in Oren's set that deficient which you need
>>> implemented? Are there some technical reasons you prefer this code?
>> To be fair, and since (IIRC) the initial intent was to start with OpenVZ's
>> approach, shouldn't Oren answer the same questions with respect to Andrey's
>> patchset?
>>
>> I'm afraid that we are forgetting to take the best from both approaches...
>
> I agree with Louis.
>
> I played with Oren's patchset and tryed to port it on x86_64. I was able
> to sys_checkpoint/sys_restart but if you remove the restoring of the
> general registers, the restart still works. I am not an expert on asm,
> but my hypothesis is when we call sys_checkpoint the registers are saved
> on the stack by the syscall and when we restore the memory of the
> process, we restore the stack and the stacked registers are restored
> when exiting the sys_restart. That make me feel there is an important
> gap between external checkpoint and internal checkpoint.

This is a misconception: my patches are not "internal checkpoint". My
patches are basically "external checkpoint" by design, which *also*
accommodates self-checkpointing (aka internal). The same holds for the
restart. The implementation is demonstrated with "self-checkpoint" to
avoid complicating things at this early stage of proof-of-concept.

For multiple processes all that is needed is a container and a loop
on the checkpoint side, and a method to recreate processes on the
restart side. Andrey suggests doing it in kernel space; I still have
doubts.

While I held out the multi-process part of the patch so far because I
was explicitly asked to do it, it seems like this would be a good time
to push it out and get feedback.

>
> Dmitry's patchset is nice too, but IMO, it goes too far from what we
> decided to do at the container mini-summit. I think there are a lot of
> design questions to be solved before going further.
>
> IMHO we should look at Dmitry patchset and merge the external checkpoint
> code to Oren's patchset in order to checkpoint *one* process and have
> the process to restart itself. At this point, we can begin to talk about
> the restart itself, shall we have the kernel to fork the processes to be
> restarted ? shall we fork from userspace and implement some mechanism to
> have each processes to restart themselves ? etc...
>

In both approaches, processes restart themselves, in the sense that a
process to be restarted eventually calls "do_restart()" (or equivalent).

The only question is how processes are created. Andrey's patch creates
everything inside the kernel. I would like to still give it a try outside
the kernel. Everything is ready, except that we need a way to pre-select
a PID for the new child... we never agreed on that one, did we ?

If we go ahead with the kernel-based process creation, it's easy to merge
it to the current patch-set.

Oren.

2008-10-20 16:37:00

by Dave Hansen

Subject: Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

On Mon, 2008-10-20 at 13:10 +0200, Louis Rilling wrote:
> To be fair, and since (IIRC) the initial intent was to start with OpenVZ's
> approach, shouldn't Oren answer the same questions with respect to Andrey's
> patchset?

I'm only really "supporting" Oren's patch set because he got it out way
before the OpenVZ one showed up, and has kept integrating improvements
into it to keep me interested. The OpenVZ approach does a few things
that Oren's does not, and it *is* a bit more mature.

I'm quite willing to jump camps if there's something compelling in what
Andrey has posted, even though I've put quite a bit of time in to
reviewing and improving Oren's. I'm looking for that "something". :)

-- Dave

2008-10-20 16:39:52

by Daniel Lezcano

Subject: Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

Oren Laadan wrote:
>
> Daniel Lezcano wrote:
>> Louis Rilling wrote:
>>> On Fri, Oct 17, 2008 at 04:33:03PM -0700, Dave Hansen wrote:
>>>> On Wed, 2008-09-03 at 14:57 +0400, Andrey Mirkin wrote:
>>>>> This patchset introduces kernel based checkpointing/restart as it is
>>>>> implemented in OpenVZ project. This patchset has limited functionality and
>>>>> are able to checkpoint/restart only single process. Recently Oren Laaden
>>>>> sent another kernel based implementation of checkpoint/restart. The main
>>>>> differences between this patchset and Oren's patchset are:
>>>> Hi Andrey,
>>>>
>>>> I'm curious what you want to happen with this patch set. Is there
>>>> something specific in Oren's set that deficient which you need
>>>> implemented? Are there some technical reasons you prefer this code?
>>> To be fair, and since (IIRC) the initial intent was to start with OpenVZ's
>>> approach, shouldn't Oren answer the same questions with respect to Andrey's
>>> patchset?
>>>
>>> I'm afraid that we are forgetting to take the best from both approaches...
>> I agree with Louis.
>>
>> I played with Oren's patchset and tryed to port it on x86_64. I was able
>> to sys_checkpoint/sys_restart but if you remove the restoring of the
>> general registers, the restart still works. I am not an expert on asm,
>> but my hypothesis is when we call sys_checkpoint the registers are saved
>> on the stack by the syscall and when we restore the memory of the
>> process, we restore the stack and the stacked registers are restored
>> when exiting the sys_restart. That make me feel there is an important
>> gap between external checkpoint and internal checkpoint.
>
> This is a misconception: my patches are not "internal checkpoint". My
> patches are basically "external checkpoint" by design, which *also*
> accommodates self-checkpointing (aka internal). The same holds for the
> restart. The implementation is demonstrated with "self-checkpoint" to
> avoid complicating things at this early stage of proof-of-concept.

Yep, I read your patchset :)

I just want to clarify what we want to demonstrate with this patchset
for the proof-of-concept. A self CR does not show what the complicated
parts of the CR are; we are just showing we can dump the memory
from the kernel and do setcontext/getcontext.

We agreed at the container mini-summit on an approach:

1. Pre-dump
2. Freeze the container
3. Dump
4. Thaw/Kill the container
5. Post-dump

We already have the freezer, and we can forget for now pre-dump and
post-dump.

IMHO, for the proof-of-concept we should do a minimal CR (like you did)
that conforms to these 5 points, which means we have to do an
external checkpoint.
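
For illustration only, the sequence above could be driven from user space
roughly as follows. This is a minimal sketch, not code from either
patchset: it assumes the cgroup freezer exposes a freezer.state file that
accepts FROZEN and THAWED, and the names checkpoint_container, dump,
pre_dump and post_dump are hypothetical placeholders for whatever the real
interface ends up being.

```python
import os


def set_freezer_state(cgroup_dir, state):
    """Write FROZEN or THAWED to the cgroup's freezer.state file.

    Assumption: the container's tasks live in a freezer cgroup at
    cgroup_dir. A real caller would also poll freezer.state until it
    reads back FROZEN, since freezing is not instantaneous.
    """
    with open(os.path.join(cgroup_dir, "freezer.state"), "w") as f:
        f.write(state)


def checkpoint_container(cgroup_dir, dump, pre_dump=None, post_dump=None):
    """Run the five steps in order and return the steps actually taken.

    dump, pre_dump and post_dump are caller-supplied callables
    (hypothetical hooks); pre_dump and post_dump may be omitted.
    """
    steps = []
    if pre_dump is not None:
        pre_dump()
        steps.append("pre-dump")
    set_freezer_state(cgroup_dir, "FROZEN")
    steps.append("freeze")
    try:
        dump()
        steps.append("dump")
    finally:
        # Thaw even if the dump fails, so the container is not left frozen.
        set_freezer_state(cgroup_dir, "THAWED")
        steps.append("thaw")
    if post_dump is not None:
        post_dump()
        steps.append("post-dump")
    return steps
```

The point of the sketch is only that the dump runs while the container is
frozen and is initiated by a task outside it, which is what makes the
checkpoint "external".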

If the POC conforms with that, the patchset will be a little different,
and that will show what the difficult parts of restarting a process are,
especially restarting it in the frozen state :) and that will give a
10000-foot view of the big picture.

> For multiple processes all that is needed is a container and a loop
> on the checkpoint side, and a method to recreate processes on the
> restart side. Andrew suggests to do it in kernel space, I still have
> doubts.

A question for Andrey: do you, in OpenVZ, restart "externally", or is it
the first process of the pid namespace which calls sys_restart and then
populates the pid namespace?

> While I held out the multi-process part of the patch so far because I
> was explicitly asked to do it, it seems like this would be a good time
> to push it out and get feedback.

IMHO it is too soon...

2008-10-20 16:51:35

by Serge E. Hallyn

Subject: Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

Quoting Oren Laadan:
> This is a misconception: my patches are not "internal checkpoint". My
> patches are basically "external checkpoint" by design, which *also*
> accommodates self-checkpointing (aka internal). The same holds for the
> restart. The implementation is demonstrated with "self-checkpoint" to
> avoid complicating things at this early stage of proof-of-concept.
>
> For multiple processes all that is needed is a container and a loop
> on the checkpoint side, and a method to recreate processes on the
> restart side. Andrew suggests to do it in kernel space, I still have
> doubts.

Yes I still prefer in-kernel. Can you elaborate on advantages of doing
more work in userspace?

> While I held out the multi-process part of the patch so far because I

Yup, and I appreciate your restraint until now :) It made your patchset
much easier to review.

> was explicitly asked to do it, it seems like this would be a good time
> to push it out and get feedback.

Can you send that out as a patch(set) on top of your v7? I'd love to
see (and test) it.

thanks,
-serge

2008-10-20 17:24:16

by Oren Laadan

Subject: Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart



Andrey Mirkin wrote:
> On Saturday 18 October 2008 03:33 Dave Hansen wrote:
>> On Wed, 2008-09-03 at 14:57 +0400, Andrey Mirkin wrote:
>>> This patchset introduces kernel based checkpointing/restart as it is
>>> implemented in OpenVZ project. This patchset has limited functionality
>>> and are able to checkpoint/restart only single process. Recently Oren
>>> Laaden sent another kernel based implementation of checkpoint/restart.
>>> The main differences between this patchset and Oren's patchset are:
>> Hi Andrey,
>>
>> I'm curious what you want to happen with this patch set. Is there
>> something specific in Oren's set that deficient which you need
>> implemented? Are there some technical reasons you prefer this code?
>
> Hi Dave,
>
> Right now my patchset (v2) provides an ability to checkpoint and restart a
> group of processes. The process of checkpointing and restart can be initiated
> from external process (not from the process which should be checkpointed).

Both patchsets share the same design, namely to be able to checkpoint and
restart multiple processes, with the operation initiated by an external
process.

I deliberately left out the part that handles multiple processes to
keep things simple for initial review, and until we decide on the
question of kernel- or user-based process creation on restart.

> Also I think that all the restart job (including process forking) should be
> done in kernel, as in this case we will not depend on user space and will be
> more secure. This is also implemented in my patchset.

I'm not convinced that creating the processes in the kernel makes it
more secure. Can you elaborate? For the discussion, let's compare
these two basic scenarios:

1) container and processes are created in user space; each process
calls "sys_restart()" which eventually calls "do_restart()" that
does kernel-based restart.

2) container and processes are created in kernel space; each process
calls "do_restart()" to do kernel-based restart.

In fact, creating them in user space makes it easier to enforce the
capabilities and limits of the user. It also simplifies debugging
significantly, and allows us to delegate the entire issue of container and
namespace management back to user space, where it probably belongs.

On the other hand, doing it in kernel space is likely to produce simpler
code for the creation of the processes.
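
To make scenario 1 concrete, here is a minimal user-space sketch, assuming
a checkpoint image that records each task's saved pid and its children.
SAVED_TREE, recreate, rebuild_tree and restart_self are all hypothetical
names; restart_self only stands in for the per-task sys_restart() call and
is not a real binding to it.

```python
import os

# Hypothetical checkpoint image: saved pid -> saved pids of its children.
SAVED_TREE = {1: [2, 3], 2: [4], 3: [], 4: []}


def restart_self(saved_pid, report_fd):
    """Stand-in for the sys_restart() call each task would make.

    It only reports which new pid took over which saved pid.
    """
    os.write(report_fd, f"{saved_pid} {os.getpid()}\n".encode())


def recreate(saved_pid, tree, report_fd):
    """Fork the saved children of this task, then restart this task."""
    children = []
    for child in tree[saved_pid]:
        pid = os.fork()
        if pid == 0:
            # Child: rebuild its own subtree, then exit without
            # returning into the caller's code.
            try:
                recreate(child, tree, report_fd)
            finally:
                os._exit(0)
        children.append(pid)
    restart_self(saved_pid, report_fd)
    for pid in children:
        os.waitpid(pid, 0)


def rebuild_tree(tree, root):
    """Recreate the whole tree; return a map of saved pid -> new pid."""
    read_fd, write_fd = os.pipe()
    recreate(root, tree, write_fd)
    os.close(write_fd)
    with os.fdopen(read_fd) as reports:
        return {int(s): int(n) for s, n in (line.split() for line in reports)}
```

Calling rebuild_tree(SAVED_TREE, 1) makes the calling process play the
role of saved pid 1. The new pids it reports will generally differ from
the saved ones, which is exactly the open question of pre-selecting a PID
for the new child.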

Thanks,

Oren.

2008-10-20 17:25:18

by Serge E. Hallyn

Subject: Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

Quoting Daniel Lezcano:
> Oren Laadan wrote:
> >
> > Daniel Lezcano wrote:
> >> Louis Rilling wrote:
> >>> On Fri, Oct 17, 2008 at 04:33:03PM -0700, Dave Hansen wrote:
> >>>> On Wed, 2008-09-03 at 14:57 +0400, Andrey Mirkin wrote:
> >>>>> This patchset introduces kernel based checkpointing/restart as it is
> >>>>> implemented in OpenVZ project. This patchset has limited functionality and
> >>>>> are able to checkpoint/restart only single process. Recently Oren Laaden
> >>>>> sent another kernel based implementation of checkpoint/restart. The main
> >>>>> differences between this patchset and Oren's patchset are:
> >>>> Hi Andrey,
> >>>>
> >>>> I'm curious what you want to happen with this patch set. Is there
> >>>> something specific in Oren's set that deficient which you need
> >>>> implemented? Are there some technical reasons you prefer this code?
> >>> To be fair, and since (IIRC) the initial intent was to start with OpenVZ's
> >>> approach, shouldn't Oren answer the same questions with respect to Andrey's
> >>> patchset?
> >>>
> >>> I'm afraid that we are forgetting to take the best from both approaches...
> >> I agree with Louis.
> >>
> >> I played with Oren's patchset and tryed to port it on x86_64. I was able
> >> to sys_checkpoint/sys_restart but if you remove the restoring of the
> >> general registers, the restart still works. I am not an expert on asm,
> >> but my hypothesis is when we call sys_checkpoint the registers are saved
> >> on the stack by the syscall and when we restore the memory of the
> >> process, we restore the stack and the stacked registers are restored
> >> when exiting the sys_restart. That make me feel there is an important
> >> gap between external checkpoint and internal checkpoint.
> >
> > This is a misconception: my patches are not "internal checkpoint". My
> > patches are basically "external checkpoint" by design, which *also*
> > accommodates self-checkpointing (aka internal). The same holds for the
> > restart. The implementation is demonstrated with "self-checkpoint" to
> > avoid complicating things at this early stage of proof-of-concept.
>
> Yep, I read your patchset :)
>
> I just want to clarify what we want to demonstrate with this patchset
> for the proof-of-concept ? A self CR does not show what are the
> complicate parts of the CR, we are just showing we can dump the memory
> from the kernel and do setcontext/getcontext.
>
> We state at the container mini-summit on an approach:
>
> 1. Pre-dump
> 2. Freeze the container
> 3. Dump
> 4. Thaw/Kill the container
> 5. Post-dump
>
> We already have the freezer, and we can forget for now pre-dump and
> post-dump.
>
> IMHO, for the proof-of-concept we should do a minimal CR (like you did),
> but conforming with these 5 points, but that means we have to do an
> external checkpoint.

Right, Oren, iiuc you are insisting that 'external checkpoint' and
'multiple task checkpoint' are the same thing. But they aren't.
Rather, I think that what we say is 'multiple tasks c/r' is what you say
should be done from user-space :)

So particularly given that your patchset seems to be in good shape,
I'd like to see external checkpoint explicitly supported. Please
just call me a dunce if v7 already works for that.

thanks,
-serge

2008-10-21 00:18:50

by Oren Laadan

Subject: Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart



Serge E. Hallyn wrote:
> Quoting Daniel Lezcano:
>> Oren Laadan wrote:
>>> Daniel Lezcano wrote:
>>>> Louis Rilling wrote:
>>>>> On Fri, Oct 17, 2008 at 04:33:03PM -0700, Dave Hansen wrote:
>>>>>> On Wed, 2008-09-03 at 14:57 +0400, Andrey Mirkin wrote:
>>>>>>> This patchset introduces kernel based checkpointing/restart as it is
>>>>>>> implemented in OpenVZ project. This patchset has limited functionality and
>>>>>>> are able to checkpoint/restart only single process. Recently Oren Laaden
>>>>>>> sent another kernel based implementation of checkpoint/restart. The main
>>>>>>> differences between this patchset and Oren's patchset are:
>>>>>> Hi Andrey,
>>>>>>
>>>>>> I'm curious what you want to happen with this patch set. Is there
>>>>>> something specific in Oren's set that deficient which you need
>>>>>> implemented? Are there some technical reasons you prefer this code?
>>>>> To be fair, and since (IIRC) the initial intent was to start with OpenVZ's
>>>>> approach, shouldn't Oren answer the same questions with respect to Andrey's
>>>>> patchset?
>>>>>
>>>>> I'm afraid that we are forgetting to take the best from both approaches...
>>>> I agree with Louis.
>>>>
>>>> I played with Oren's patchset and tryed to port it on x86_64. I was able
>>>> to sys_checkpoint/sys_restart but if you remove the restoring of the
>>>> general registers, the restart still works. I am not an expert on asm,
>>>> but my hypothesis is when we call sys_checkpoint the registers are saved
>>>> on the stack by the syscall and when we restore the memory of the
>>>> process, we restore the stack and the stacked registers are restored
>>>> when exiting the sys_restart. That make me feel there is an important
>>>> gap between external checkpoint and internal checkpoint.
>>> This is a misconception: my patches are not "internal checkpoint". My
>>> patches are basically "external checkpoint" by design, which *also*
>>> accommodates self-checkpointing (aka internal). The same holds for the
>>> restart. The implementation is demonstrated with "self-checkpoint" to
>>> avoid complicating things at this early stage of proof-of-concept.
>> Yep, I read your patchset :)
>>
>> I just want to clarify what we want to demonstrate with this patchset
>> for the proof-of-concept ? A self CR does not show what are the
>> complicate parts of the CR, we are just showing we can dump the memory
>> from the kernel and do setcontext/getcontext.
>>
>> We state at the container mini-summit on an approach:
>>
>> 1. Pre-dump
>> 2. Freeze the container
>> 3. Dump
>> 4. Thaw/Kill the container
>> 5. Post-dump
>>
>> We already have the freezer, and we can forget for now pre-dump and
>> post-dump.
>>
>> IMHO, for the proof-of-concept we should do a minimal CR (like you did),
>> but conforming with these 5 points, but that means we have to do an
>> external checkpoint.
>
> Right, Oren, iiuc you are insisting that 'external checkpoint' and
> 'multiple task checkpoint' are the same thing. But they aren't.
> Rather, I think that what we say is 'multiple tasks c/r' is what you say
> should be done from user-space :)

Then I didn't explain myself clearly :)

The only thing I consider doing in user space is the creation of
the container, the namespaces and the processes.

I argue that "external checkpoint of a single process" is very few
lines of code away from "self checkpoint" that is in v7.

I'm not sure how you define "external restart"? Eventually, the
processes restart themselves. It is a question of how the processes
are created to begin with.

>
> So particularly given that your patchset seems to be in good shape,
> I'd like to see external checkpoint explicitly supported. Please
> just call me a dunce if v7 already works for that.
>

It seems like you want a single process to checkpoint a single (other)
process, and then a single process to start a single (other) process.

I tried to explicitly avoid dealing with the container (user space ?
kernel space ?) and with creating new processes (user space ? kernel
space ?).

Nevertheless, it's the _same_ code. All that is needed is to make the
checkpoint syscall refer to the other task instead of self, and the
restart should create a container and fork there, then call sys_restart.

I guess instead of repeating this argument over, I will go ahead and
post a patch on top of v7 to demonstrate this (without a container,
therefore without preserving the original pid).

Oren.

2008-10-21 00:58:38

by Serge E. Hallyn

Subject: Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

Quoting Oren Laadan:
>
>
> Serge E. Hallyn wrote:
> > Quoting Daniel Lezcano:
> >> Oren Laadan wrote:
> >>> Daniel Lezcano wrote:
> >>>> Louis Rilling wrote:
> >>>>> On Fri, Oct 17, 2008 at 04:33:03PM -0700, Dave Hansen wrote:
> >>>>>> On Wed, 2008-09-03 at 14:57 +0400, Andrey Mirkin wrote:
> >>>>>>> This patchset introduces kernel based checkpointing/restart as it is
> >>>>>>> implemented in OpenVZ project. This patchset has limited functionality and
> >>>>>>> are able to checkpoint/restart only single process. Recently Oren Laaden
> >>>>>>> sent another kernel based implementation of checkpoint/restart. The main
> >>>>>>> differences between this patchset and Oren's patchset are:
> >>>>>> Hi Andrey,
> >>>>>>
> >>>>>> I'm curious what you want to happen with this patch set. Is there
> >>>>>> something specific in Oren's set that deficient which you need
> >>>>>> implemented? Are there some technical reasons you prefer this code?
> >>>>> To be fair, and since (IIRC) the initial intent was to start with OpenVZ's
> >>>>> approach, shouldn't Oren answer the same questions with respect to Andrey's
> >>>>> patchset?
> >>>>>
> >>>>> I'm afraid that we are forgetting to take the best from both approaches...
> >>>> I agree with Louis.
> >>>>
> >>>> I played with Oren's patchset and tryed to port it on x86_64. I was able
> >>>> to sys_checkpoint/sys_restart but if you remove the restoring of the
> >>>> general registers, the restart still works. I am not an expert on asm,
> >>>> but my hypothesis is when we call sys_checkpoint the registers are saved
> >>>> on the stack by the syscall and when we restore the memory of the
> >>>> process, we restore the stack and the stacked registers are restored
> >>>> when exiting the sys_restart. That make me feel there is an important
> >>>> gap between external checkpoint and internal checkpoint.
> >>> This is a misconception: my patches are not "internal checkpoint". My
> >>> patches are basically "external checkpoint" by design, which *also*
> >>> accommodates self-checkpointing (aka internal). The same holds for the
> >>> restart. The implementation is demonstrated with "self-checkpoint" to
> >>> avoid complicating things at this early stage of proof-of-concept.
> >> Yep, I read your patchset :)
> >>
> >> I just want to clarify what we want to demonstrate with this patchset
> >> for the proof-of-concept ? A self CR does not show what are the
> >> complicate parts of the CR, we are just showing we can dump the memory
> >> from the kernel and do setcontext/getcontext.
> >>
> >> We state at the container mini-summit on an approach:
> >>
> >> 1. Pre-dump
> >> 2. Freeze the container
> >> 3. Dump
> >> 4. Thaw/Kill the container
> >> 5. Post-dump
> >>
> >> We already have the freezer, and we can forget for now pre-dump and
> >> post-dump.
> >>
> >> IMHO, for the proof-of-concept we should do a minimal CR (like you did),
> >> but conforming with these 5 points, but that means we have to do an
> >> external checkpoint.
> >
> > Right, Oren, iiuc you are insisting that 'external checkpoint' and
> > 'multiple task checkpoint' are the same thing. But they aren't.
> > Rather, I think that what we say is 'multiple tasks c/r' is what you say
> > should be done from user-space :)
>
> Then I don't explain myself clearly :)
>
> The only thing I consider doing in user space is the creation of
> the container, the namespaces and the processes.

That I understand.

> I argue that "external checkpoint of a single process" is very few
> lines of code away from "self checkpoint" that is in v7.
>
> I'm not sure how you define "external restart" ? eventually, the

If I ever said external restart, I actually meant external checkpoint.
I understand that a task should call sys_restart() itself.

> processes restart themselves. It is a question of how the processes
> are created to begin with.
>
> >
> > So particularly given that your patchset seems to be in good shape,
> > I'd like to see external checkpoint explicitly supported. Please
> > just call me a dunce if v7 already works for that.
> >
>
> It seems like you want a single process to checkpoint a single (other)
> process, and then a single process to start a single (other) process.

Yup.

> I tried to explicitly avoid dealing with the container (user space ?
> kernel space ?) and with creating new processes (user space ? kernel
> space ?).

And that's the right thing to do. But:

> Nevertheless, it's the _same_ code. All that is needed is to make the

I was under the impression that sys_checkpoint() on some other task's
pid and then restarting with that image would fail right now.

> checkpoint syscall refer to the other task instead of self, and the
> restart should create a container and fork there, then call sys_restart.
>
> I guess instead of repeating this argument over, I will go ahead and
> post a patch on top of v7 to demonstrate this (without a container,

Cool, thanks!

> therefore without preserving the original pid).

Yes, as I believe you said in another email earlier today, we have not
decided about how to restore the pid. Eric continues to argue for
playing games with /proc/sys/kernel/pid_max.

-serge

2008-10-21 09:36:28

by Cédric Le Goater

[permalink] [raw]
Subject: Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

>> IMHO we should look at Dmitry patchset and merge the external checkpoint
>> code to Oren's patchset in order to checkpoint *one* process and have
>> the process to restart itself. At this point, we can begin to talk about
>> the restart itself, shall we have the kernel to fork the processes to be
>> restarted ? shall we fork from userspace and implement some mechanism to
>> have each processes to restart themselves ? etc...
>>
>
> In both approaches, processes restart themselves, in the sense that a
> process to be restarted eventually calls "do_restart()" (or equivalent).
>
> The only question is how processes are created. Andrey's patch creates
> everything inside the kernel. I would like to still give it a try outside
> the kernel. Everything is ready, except that we need a way to pre-select
> a PID for the new child... we never agreed on that one, did we ?

what do you mean ? like a clone_with_pid() routine ?

> If we go ahead with the kernel-based process creation, it's easy to merge
> it to the current patch-set.

Both solutions are valid. Nevertheless, I would choose the solution
that shares existing code and is arch independent.

Now, we can start by duplicating code and see later how we could
share. But for mainline inclusion, I think that code reuse is a
faster path.

C.

2008-10-21 13:25:29

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

Oren Laadan wrote:
>
> Serge E. Hallyn wrote:
>> Quoting Daniel Lezcano ([email protected]):
>>> Oren Laadan wrote:
>>>> Daniel Lezcano wrote:
>>>>> Louis Rilling wrote:
>>>>>> On Fri, Oct 17, 2008 at 04:33:03PM -0700, Dave Hansen wrote:
>>>>>>> On Wed, 2008-09-03 at 14:57 +0400, Andrey Mirkin wrote:
>>>>>>>> This patchset introduces kernel based checkpointing/restart as it is
>>>>>>>> implemented in OpenVZ project. This patchset has limited functionality and
>>>>>>>> are able to checkpoint/restart only single process. Recently Oren Laadan
>>>>>>>> sent another kernel based implementation of checkpoint/restart. The main
>>>>>>>> differences between this patchset and Oren's patchset are:
>>>>>>> Hi Andrey,
>>>>>>>
>>>>>>> I'm curious what you want to happen with this patch set. Is there
>>>>>>> something specific in Oren's set that deficient which you need
>>>>>>> implemented? Are there some technical reasons you prefer this code?
>>>>>> To be fair, and since (IIRC) the initial intent was to start with OpenVZ's
>>>>>> approach, shouldn't Oren answer the same questions with respect to Andrey's
>>>>>> patchset?
>>>>>>
>>>>>> I'm afraid that we are forgetting to take the best from both approaches...
>>>>> I agree with Louis.
>>>>>
>>>>> I played with Oren's patchset and tried to port it on x86_64. I was able
>>>>> to sys_checkpoint/sys_restart but if you remove the restoring of the
>>>>> general registers, the restart still works. I am not an expert on asm,
>>>>> but my hypothesis is when we call sys_checkpoint the registers are saved
>>>>> on the stack by the syscall and when we restore the memory of the
>>>>> process, we restore the stack and the stacked registers are restored
>>>>> when exiting the sys_restart. That make me feel there is an important
>>>>> gap between external checkpoint and internal checkpoint.
>>>> This is a misconception: my patches are not "internal checkpoint". My
>>>> patches are basically "external checkpoint" by design, which *also*
>>>> accommodates self-checkpointing (aka internal). The same holds for the
>>>> restart. The implementation is demonstrated with "self-checkpoint" to
>>>> avoid complicating things at this early stage of proof-of-concept.
>>> Yep, I read your patchset :)
>>>
>>> I just want to clarify what we want to demonstrate with this patchset
>>> for the proof-of-concept ? A self CR does not show what are the
>>> complicate parts of the CR, we are just showing we can dump the memory
>>> from the kernel and do setcontext/getcontext.
>>>
>>> We state at the container mini-summit on an approach:
>>>
>>> 1. Pre-dump
>>> 2. Freeze the container
>>> 3. Dump
>>> 4. Thaw/Kill the container
>>> 5. Post-dump
>>>
>>> We already have the freezer, and we can forget for now pre-dump and
>>> post-dump.
>>>
>>> IMHO, for the proof-of-concept we should do a minimal CR (like you did),
>>> but conforming with these 5 points, but that means we have to do an
>>> external checkpoint.
>> Right, Oren, iiuc you are insisting that 'external checkpoint' and
>> 'multiple task checkpoint' are the same thing. But they aren't.
>> Rather, I think that what we say is 'multiple tasks c/r' is what you say
>> should be done from user-space :)
>
> Then I don't explain myself clearly :)
>
> The only thing I consider doing in user space is the creation of
> the container, the namespaces and the processes.
>
> I argue that "external checkpoint of a single process" is very few
> lines of code away from "self checkpoint" that is in v7.
>
> I'm not sure how you define "external restart" ? eventually, the
> processes restart themselves. It is a question of how the processes
> are created to begin with.
>
>> So particularly given that your patchset seems to be in good shape,
>> I'd like to see external checkpoint explicitly supported. Please
>> just call me a dunce if v7 already works for that.
>>
>
> It seems like you want a single process to checkpoint a single (other)
> process, and then a single process to start a single (other) process.
>
> I tried to explicitly avoid dealing with the container (user space ?
> kernel space ?) and with creating new processes (user space ? kernel
> space ?).
>
> Nevertheless, it's the _same_ code. All that is needed is to make the
> checkpoint syscall refer to the other task instead of self, and the
> restart should create a container and fork there, then call sys_restart.
>
> I guess instead of repeating this argument over, I will go ahead and
> post a patch on top of v7 to demonstrate this (without a container,
> therefore without preserving the original pid).

Cedric made a patch for the external checkpoint:

http://lxc.sourceforge.net/patches/2.6.27/2.6.27-rc8-lxc1/0035-enable-external-checkpoint.patch

The main difference is you will need to freeze the process because it
will not block itself via a syscall (there is the freezer patchset).

For the restart, perhaps you can just have a process call sys_restart,
so that we can delay the user-space vs. kernel fork discussion, no?
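
A minimal sketch of the freeze/dump/thaw steps for an external checkpoint,
assuming the cgroup freezer interface (a freezer.state file that accepts
FROZEN and THAWED). The path passed in, the dump_fn hook and
dump_placeholder() are hypothetical stand-ins and are not taken from either
patchset:

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write a state string ("FROZEN" or "THAWED") to a freezer.state file.
 * Returns 0 on success, -1 on error. */
int set_freezer_state(const char *state_path, const char *state)
{
    assert(state_path != NULL && state != NULL);

    int fd = open(state_path, O_WRONLY);
    if (fd < 0)
        return -1;

    size_t len = strlen(state);
    int rc = (write(fd, state, len) == (ssize_t)len) ? 0 : -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}

/* Hypothetical dump hook: stands in for the checkpoint of the frozen task. */
typedef int (*dump_fn)(void *arg);

/* Placeholder dump that does nothing and reports success. */
int dump_placeholder(void *arg)
{
    (void)arg;
    return 0;
}

/* Steps 2-4 of the agreed sequence: freeze, dump, thaw. */
int checkpoint_frozen(const char *state_path, dump_fn dump, void *arg)
{
    if (set_freezer_state(state_path, "FROZEN") < 0)
        return -1;
    /* Freezing is asynchronous; a real caller would re-read freezer.state
     * until it reports FROZEN before dumping. */
    int rc = dump(arg);
    /* Thaw even if the dump failed, so the task is not left frozen. */
    if (set_freezer_state(state_path, "THAWED") < 0)
        return -1;
    return rc;
}
```

With the freezer cgroup mounted, state_path would point at the freezer.state
file of the group that holds the task; the mount point differs between
systems.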

2008-10-27 14:06:58

by Andrey Mirkin

[permalink] [raw]
Subject: Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

On Monday 20 October 2008 19:55 Dave Hansen wrote:
> On Mon, 2008-10-20 at 16:14 +0400, Andrey Mirkin wrote:
> > Right now my patchset (v2) provides an ability to checkpoint and restart
> > a group of processes. The process of checkpointing and restart can be
> > initiated from external process (not from the process which should be
> > checkpointed).
>
> Absolutely. Oren's code does it this way to make for a smaller patch at
> first. The syscall takes a pid argument so it is surely expected to be
> expanded upon later.
>
> > Also I think that all the restart job (including process forking) should
> > be done in kernel, as in this case we will not depend on user space and
> > will be more secure. This is also implemented in my patchset.
>
> Do you think that this is an approach that Oren's patches are married
> to, or is this a "feature" we can add on later?

Well, AFAICS from Oren's patch set, his approach is oriented toward process
creation in user space. I think we should choose right now which approach will
be used for process creation.
We have two options here: fork processes in the kernel or fork them in user
space. If a process is forked in user space, there will be a gap during which
the process is in user space and can be killed by a received signal before
entering the kernel. We will also need functionality to create processes with
a predefined PID. I think it is not very good to provide such an ability to
user space. That is why we prefer in OpenVZ to do all the job in the kernel.
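
The gap can be illustrated from user space with plain POSIX calls. This is a
minimal sketch only: the child stands in for a task that has been forked for
restart but has not yet entered the kernel, the restart call itself is left
out because its interface is not settled, and kill_child_in_gap() is a
made-up name:

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that sits in user space (the "gap"), kill it there, and
 * return the number of the signal that terminated it, or -1 on error. */
int kill_child_in_gap(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: still ordinary user code, restart not entered yet. */
        for (;;)
            pause();
    }

    /* Parent only from here on. */
    assert(pid > 0);

    /* A signal delivered in the window terminates the would-be task. */
    if (kill(pid, SIGKILL) < 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

In this sketch the parent learns about the loss only because it waits for the
child explicitly.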

> I don't care which patch set we end up sticking in the kernel. I'm
> trying to figure out which code we can more easily build upon in the
> future. The fact that Oren's or yours can't do certain little things
> right now does not bother me.
>
> Honestly, I'm a little more confident that everyone can work with Oren
> since he managed to get 7 revisions of his patch out and make some
> pretty large changes while in the same time the OpenVZ patch was only
> released twice. I'm not sure what has changed in the OpenVZ patch
> between releases, either.

That is my fault. I am working right now on my Ph.D, that is why my activity
is not very high. But now I hope I will have more time for that.

> Are there any reasons that you absolutely can not use the code Oren
> posted? Will it not fulfill your needs somehow? If so, could you
> please elaborate on how?

We have one major difference with Oren's code - how processes are created
during restart.
Right now I'm trying to port kernel process creation on top of Oren's patches.
I agree that working in collaboration will speed up merging of checkpointing
to mainstream.

Andrey

P.S.: Sorry for late reply, my mailer attached your e-mail to wrong thread.

2008-10-27 14:38:48

by Andrey Mirkin

[permalink] [raw]
Subject: Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

On Monday 20 October 2008 21:17 Oren Laadan wrote:
> Andrey Mirkin wrote:
> > On Saturday 18 October 2008 03:33 Dave Hansen wrote:
> >> On Wed, 2008-09-03 at 14:57 +0400, Andrey Mirkin wrote:
> >>> This patchset introduces kernel based checkpointing/restart as it is
> >>> implemented in OpenVZ project. This patchset has limited functionality
> >>> and are able to checkpoint/restart only single process. Recently Oren
> >>> Laadan sent another kernel based implementation of checkpoint/restart.
> >>> The main differences between this patchset and Oren's patchset are:
> >>
> >> Hi Andrey,
> >>
> >> I'm curious what you want to happen with this patch set. Is there
> >> something specific in Oren's set that deficient which you need
> >> implemented? Are there some technical reasons you prefer this code?
> >
> > Hi Dave,
> >
> > Right now my patchset (v2) provides an ability to checkpoint and restart
> > a group of processes. The process of checkpointing and restart can be
> > initiated from external process (not from the process which should be
> > checkpointed).
>
> Both patchsets share the same design, namely be able to checkpoint and
> restart multiple processes, with the operation initiated by an external
> processes.
>
> I deliberately left out the part that handles multiple processes to
> keeps things simple for initial review, and until we decide on the
> question of kernel- or user- based process creation on restart.

I agree that multiple process handling is not needed for initial review. But I
believe that the question of process creation should be discussed right
now.

> > Also I think that all the restart job (including process forking) should
> > be done in kernel, as in this case we will not depend on user space and
> > will be more secure. This is also implemented in my patchset.
>
> I'm not convinced that creating the processes in the kernel makes it
> more secure. Can you elaborate ? for the discussion, let's compare
> these two basic scenarios:
>
> 1) container and processes are created in user space; each process
> calls "sys_restart()" which eventually calls "do_restart()" that
> does kernel-based restart.

Well, in this case there will be a gap after the process returns from fork but
before it enters the kernel. During that time the process can be killed by a
delivered signal. Another drawback of this approach is that we will need to
provide a way for the user to create processes with a predefined PID.

> 2) container and processes are created in kernel space; each process
> calls "do_restart()" to do kernel-based restart.
>
> In fact, creating in user based makes it easier to enforce capabilities
> and limits of the user. It also simplifies the debugging significantly,
> and allows us to delegate the entire issue of containers and namespace
> management back to user space, where it probably belongs.
>
> On the other hand, doing it in kernel space likely to produce simpler
> code for the creation of the processes.

You are right here. Both approaches have pros and cons, but I think that the
kernel approach has more advantages.

Andrey

2008-10-27 14:40:19

by Oren Laadan

[permalink] [raw]
Subject: Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart



Andrey Mirkin wrote:
> On Monday 20 October 2008 19:55 Dave Hansen wrote:
>> On Mon, 2008-10-20 at 16:14 +0400, Andrey Mirkin wrote:
>>> Right now my patchset (v2) provides an ability to checkpoint and restart
>>> a group of processes. The process of checkpointing and restart can be
>>> initiated from external process (not from the process which should be
>>> checkpointed).
>> Absolutely. Oren's code does it this way to make for a smaller patch at
>> first. The syscall takes a pid argument so it is surely expected to be
>> expanded upon later.
>>
>>> Also I think that all the restart job (including process forking) should
>>> be done in kernel, as in this case we will not depend on user space and
>>> will be more secure. This is also implemented in my patchset.
>> Do you think that this is an approach that Oren's patches are married
>> to, or is this a "feature" we can add on later?
>
> Well, AFAICS from Oren's patch set his approach is oriented on process
> creation in user space. I think we should choose right now what approach will
> be used for process creation.

This is inaccurate.

I intentionally did not address how processes will be created, by
simply allowing either way to be added to the patch.

I do agree that we probably want to decide how to do it. However,
there is also room to allow for both approaches, in a compatible
way, should we wish to explore both.

> We have two options here: fork processes in kernel or fork them in user space.
> If process will be forked in user space, then there will be a gap when process
> will be in user space and can be killed with received signal before entering

Why do we care about it ?
Why is there a difference if it is killed before or after entering
the kernel (e.g. user aborted restart, or kernel OOM kicked in) ?

> kernel. Also we will need a functionality to create processes with predefined
> PID. I think it is not very good to provide such ability to user space. That
> is why we prefer in OpenVZ to do all the job in kernel.

This is the weak side of creating the processes in user space -
that we need such an interface. Note, however, that we can
easily "hide" it inside the interface of the sys_restart() call,
and restrict how it may be used.

Oren.

>
>> I don't care which patch set we end up sticking in the kernel. I'm
>> trying to figure out which code we can more easily build upon in the
>> future. The fact that Oren's or yours can't do certain little things
>> right now does not bother me.
>>
>> Honestly, I'm a little more confident that everyone can work with Oren
>> since he managed to get 7 revisions of his patch out and make some
>> pretty large changes while in the same time the OpenVZ patch was only
>> released twice. I'm not sure what has changed in the OpenVZ patch
>> between releases, either.
>
> That is my fault. I am working right now on my Ph.D, that is why my activity
> is not very high. But now I hope I will have more time for that.
>
>> Are there any reasons that you absolutely can not use the code Oren
>> posted? Will it not fulfill your needs somehow? If so, could you
>> please elaborate on how?
>
> We have one major difference with Oren's code - how processes are created
> during restart.
> Right now I'm trying to port kernel process creation on top of Oren's patches.
> I agree that working in collaboration will speed up merging of checkpointing
> to mainstream.
>
> Andrey
>
> P.S.: Sorry for late reply, my mailer attached your e-mail to wrong thread.

2008-10-27 14:47:10

by Andrey Mirkin

[permalink] [raw]
Subject: Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

On Monday 20 October 2008 20:37 Daniel Lezcano wrote:
> Oren Laadan wrote:
> > Daniel Lezcano wrote:
> >> Louis Rilling wrote:
> >>> On Fri, Oct 17, 2008 at 04:33:03PM -0700, Dave Hansen wrote:
> >>>> On Wed, 2008-09-03 at 14:57 +0400, Andrey Mirkin wrote:
> >>>>> This patchset introduces kernel based checkpointing/restart as it is
> >>>>> implemented in OpenVZ project. This patchset has limited
> >>>>> functionality and are able to checkpoint/restart only single process.
> >>>>> Recently Oren Laadan sent another kernel based implementation of
> >>>>> checkpoint/restart. The main differences between this patchset and
> >>>>> Oren's patchset are:
> >>>>
> >>>> Hi Andrey,
> >>>>
> >>>> I'm curious what you want to happen with this patch set. Is there
> >>>> something specific in Oren's set that deficient which you need
> >>>> implemented? Are there some technical reasons you prefer this code?
> >>>
> >>> To be fair, and since (IIRC) the initial intent was to start with
> >>> OpenVZ's approach, shouldn't Oren answer the same questions with
> >>> respect to Andrey's patchset?
> >>>
> >>> I'm afraid that we are forgetting to take the best from both
> >>> approaches...
> >>
> >> I agree with Louis.
> >>
> >> I played with Oren's patchset and tried to port it on x86_64. I was able
> >> to sys_checkpoint/sys_restart but if you remove the restoring of the
> >> general registers, the restart still works. I am not an expert on asm,
> >> but my hypothesis is when we call sys_checkpoint the registers are saved
> >> on the stack by the syscall and when we restore the memory of the
> >> process, we restore the stack and the stacked registers are restored
> >> when exiting the sys_restart. That make me feel there is an important
> >> gap between external checkpoint and internal checkpoint.
> >
> > This is a misconception: my patches are not "internal checkpoint". My
> > patches are basically "external checkpoint" by design, which *also*
> > accommodates self-checkpointing (aka internal). The same holds for the
> > restart. The implementation is demonstrated with "self-checkpoint" to
> > avoid complicating things at this early stage of proof-of-concept.
>
> Yep, I read your patchset :)
>
> I just want to clarify what we want to demonstrate with this patchset
> for the proof-of-concept ? A self CR does not show what are the
> complicate parts of the CR, we are just showing we can dump the memory
> from the kernel and do setcontext/getcontext.
>
> We state at the container mini-summit on an approach:
>
> 1. Pre-dump
> 2. Freeze the container
> 3. Dump
> 4. Thaw/Kill the container
> 5. Post-dump
>
> We already have the freezer, and we can forget for now pre-dump and
> post-dump.
>
> IMHO, for the proof-of-concept we should do a minimal CR (like you did),
> but conforming with these 5 points, but that means we have to do an
> external checkpoint.
>
> If the POC conforms with that, the patchset will be a little different
> and that will show what are the difficult part for restarting a process,
> especially to restart it at the frozen state :) and that will give an
> idea from 10000 feets of the big picture.
>
> > For multiple processes all that is needed is a container and a loop
> > on the checkpoint side, and a method to recreate processes on the
> > restart side. Andrey suggests to do it in kernel space, I still have
> > doubts.
>
> A question to Andrey, do you, in OpenVZ, restart "externally" or it is
> the first process of the pid namespace which calls sys_restart and then
> populates the pid namespace ?

In OpenVZ we create the first task and the namespaces from sys_restart.

Andrey

>
> > While I held out the multi-process part of the patch so far because I
> > was explicitly asked to do it, it seems like this would be a good time
> > to push it out and get feedback.
>
> IMHO it is too soon...

2008-10-30 06:02:19

by Andrey Mirkin

[permalink] [raw]
Subject: Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

On Monday 27 October 2008 17:39 Oren Laadan wrote:
> Andrey Mirkin wrote:
> > On Monday 20 October 2008 19:55 Dave Hansen wrote:
> >> On Mon, 2008-10-20 at 16:14 +0400, Andrey Mirkin wrote:
> >>> Right now my patchset (v2) provides an ability to checkpoint and
> >>> restart a group of processes. The process of checkpointing and restart
> >>> can be initiated from external process (not from the process which
> >>> should be checkpointed).
> >>
> >> Absolutely. Oren's code does it this way to make for a smaller patch at
> >> first. The syscall takes a pid argument so it is surely expected to be
> >> expanded upon later.
> >>
> >>> Also I think that all the restart job (including process forking)
> >>> should be done in kernel, as in this case we will not depend on user
> >>> space and will be more secure. This is also implemented in my patchset.
> >>
> >> Do you think that this is an approach that Oren's patches are married
> >> to, or is this a "feature" we can add on later?
> >
> > Well, AFAICS from Oren's patch set his approach is oriented on process
> > creation in user space. I think we should choose right now what approach
> > will be used for process creation.
>
> This is inaccurate.
>
> I intentionally did not address how processes will be created, by
> simply allowing either way to be added to the patch.

Yes, you are right. Either way is possible with your patchset. But as I
understand it, in ZAP you are using user space process creation, no?
That is why I think that your design is more convenient for user space process
creation.

> I do agree that we probably want to decide how to do it. However,
> there is also room to allow for both approaches, in a compatible
> way, should we wish to explore both.

Yes, we can implement both approaches. Do you think we really need this?

> > We have two options here: fork processes in kernel or fork them in user
> > space. If process will be forked in user space, then there will be a gap
> > when process will be in user space and can be killed with received signal
> > before entering
>
> Why do we care about it ?
> Why is there a difference if it is killed before or after entering
> the kernel (e.g. user aborted restart, or kernel OOM kicked in) ?

If one process is killed during restart, you may not even notice it (if
processes are created from user space and then call sys_restart), and you
will not get the same state as before C/R.

> > kernel. Also we will need a functionality to create processes with
> > predefined PID. I think it is not very good to provide such ability to
> > user space. That is why we prefer in OpenVZ to do all the job in kernel.
>
> This is the weak side of creating the processes in user space -
> that we need such an interface. Note, however, that we can
> easily "hide" it inside the interface of the sys_restart() call,
> and restrict how it may be used.

Of course we can "hide" it somehow, but anyway we will have a hole and that is
not good.

Anyway we should ask everyone what they think about user- and kernel- based
process creation.
Dave, Serge, Cedric, Daniel, Louis what do you think about that?

Andrey

> >> I don't care which patch set we end up sticking in the kernel. I'm
> >> trying to figure out which code we can more easily build upon in the
> >> future. The fact that Oren's or yours can't do certain little things
> >> right now does not bother me.
> >>
> >> Honestly, I'm a little more confident that everyone can work with Oren
> >> since he managed to get 7 revisions of his patch out and make some
> >> pretty large changes while in the same time the OpenVZ patch was only
> >> released twice. I'm not sure what has changed in the OpenVZ patch
> >> between releases, either.
> >
> > That is my fault. I am working right now on my Ph.D, that is why my
> > activity is not very high. But now I hope I will have more time for that.
> >
> >> Are there any reasons that you absolutely can not use the code Oren
> >> posted? Will it not fulfill your needs somehow? If so, could you
> >> please elaborate on how?
> >
> > We have one major difference with Oren's code - how processes are created
> > during restart.
> > Right now I'm trying to port kernel process creation on top of Oren's
> > patches. I agree that working in collaboration will speed up merging of
> > checkpointing to mainstream.
> >
> > Andrey
> >
> > P.S.: Sorry for late reply, my mailer attached your e-mail to wrong
> > > thread.

2008-10-30 11:47:59

by Louis Rilling

[permalink] [raw]
Subject: Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

On Thu, Oct 30, 2008 at 10:02:44AM +0400, Andrey Mirkin wrote:
> > > kernel. Also we will need a functionality to create processes with
> > > predefined PID. I think it is not very good to provide such ability to
> > > user space. That is why we prefer in OpenVZ to do all the job in kernel.
> >
> > This is the weak side of creating the processes in user space -
> > that we need such an interface. Note, however, that we can
> > easily "hide" it inside the interface of the sys_restart() call,
> > and restrict how it may be used.
>
> Of course we can "hide" it somehow, but anyway we will have a hole and that is
> not good.
>
> Anyway we should ask everyone what they think about user- and kernel- based
> process creation.
> Dave, Serge, Cedric, Daniel, Louis what do you think about that?

Frankly, I'm not convinced (yet) that one approach is better than the other one.
I only *tend* to prefer kernel-based, for the reasons explained below. I know
that there are arguments in favor of userspace (I've at least seen
security-related ones), but I will let their authors detail them (again).

In Kerrighed this is kernel-based, and will remain kernel-based because we
checkpoint a distributed task tree, and want to restart it as much as possible
with the same distribution. The distributed protocol used for restart is
currently too fragile and complex to rely on customized user-space
implementations. That said, if someone brings very good arguments in favor of
userspace implementations, we might consider changing this.

Without taking distributed restart into account, I also tend to prefer
kernel-based, mainly for two (not so strong) reasons:
1) this prevents userspace from doing weird things, like changing the task tree
and let the kernel detect it and deal with the mess this creates (think about
two threads being restarted in separate processes that do not even share their
parents). But one can argue that userspace can change the checkpoint image as
well, so that the kernel must check for such weird things anyway.
2) restart will be more efficient with respect to shared objects.
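
The kind of check mentioned in 1) could look roughly like the sketch below.
It assumes a hypothetical flat list of task records read from the image;
struct task_rec and its fields are invented for illustration and do not come
from any of the patchsets. It rejects a thread whose group leader is missing
from the image or whose parent differs from its leader's:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-task record as it might appear in a checkpoint image. */
struct task_rec {
    int pid;   /* task id */
    int tgid;  /* thread group id, equal to pid for the group leader */
    int ppid;  /* parent id */
};

/* Return the record whose pid matches, or NULL if there is none. */
static const struct task_rec *find_task(const struct task_rec *tasks,
                                        size_t n, int pid)
{
    for (size_t i = 0; i < n; i++)
        if (tasks[i].pid == pid)
            return &tasks[i];
    return NULL;
}

/* Return 1 if every task has its thread group leader in the image and
 * shares that leader's parent, 0 otherwise. */
int task_tree_is_consistent(const struct task_rec *tasks, size_t n)
{
    assert(tasks != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        const struct task_rec *leader = find_task(tasks, n, tasks[i].tgid);

        /* The leader must exist and must lead its own group. */
        if (leader == NULL || leader->pid != leader->tgid)
            return 0;
        /* Threads of one process cannot have different parents. */
        if (leader->ppid != tasks[i].ppid)
            return 0;
    }
    return 1;
}
```

The same check would be needed whether the tree comes from userspace-created
processes or from a checkpoint image that userspace may have modified.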

Louis

--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes



2008-10-30 14:09:08

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

Quoting Andrey Mirkin ([email protected]):
> Anyway we should ask everyone what they think about user- and kernel- based
> process creation.
> Dave, Serge, Cedric, Daniel, Louis what do you think about that?

I prefer kernel. Pretty sure Dave prefers user-space.

I'd say the thing to do is push the core API that supports single-thread
c/r. Then try to push the patch to recreate processes from the
kernel, and let the arguments for and against that stand on their own.

-serge

2008-10-30 17:04:01

by Dave Hansen

[permalink] [raw]
Subject: Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

On Thu, 2008-10-30 at 10:02 +0400, Andrey Mirkin wrote:
> Anyway we should ask everyone what they think about user- and kernel- based
> process creation.
> Dave, Serge, Cedric, Daniel, Louis what do you think about that?

My worry is where a single sys_restart() plus in-kernel process creation
takes us.

In practice, what do we do? Do we single-thread the entire restore
process? Or, do we do in-kernel process creation and have multiple
kernel threads trying to read out of different points in the checkpoint
file, trying to restore all their own states in parallel? Does that
mean that we can't in practice restore from a fd like a pipe or a
network socket?
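
The pipe/socket concern can be made concrete from userspace. The sketch
below is only an illustration in Python, not code from either patch set;
`is_seekable` is a hypothetical helper name. It relies on lseek() failing
with ESPIPE on pipes and sockets, which is why several readers cannot each
jump to "their" offset in such a stream.

```python
import errno
import os

def is_seekable(fd):
    """Return True if `fd` supports lseek(), False for pipes and sockets."""
    try:
        # A no-op seek: ask for the current offset without moving it.
        os.lseek(fd, 0, os.SEEK_CUR)
        return True
    except OSError as exc:
        if exc.errno == errno.ESPIPE:
            return False
        raise
```

A restore path that only ever reads the image front to back would not need
this check at all; it matters only if tasks are expected to read from
different offsets in parallel.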

In the same way, if we *do* create the processes in userspace, how do we
do _that_? Do we just fork() and sleep() until the kernel comes along
and blows our state away? How does the kernel process doing the
restoring tell userspace how many things to fork? How do we match these
new userspace processes up with the ones coming out of the checkpoint
process?

To me, it's just way too early to talk about this stuff. Both
approaches have their issues, and I have yet to see the differences
manifested in code so I can really sink my teeth into them.

-- Dave

2008-10-30 17:09:00

by Dave Hansen

Subject: Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

On Thu, 2008-10-30 at 12:47 +0100, Louis Rilling wrote:
> 1) this prevents userspace from doing weird things, like changing the task tree
> and let the kernel detect it and deal with the mess this creates (think about
> two threads being restarted in separate processes that do not even share their
> parents). But one can argue that userspace can change the checkpoint image as
> well, so that the kernel must check for such weird things anyway.

To me, this is one of the strongest arguments out there for doing
restart as much as possible with existing user<->kernel APIs. Having
the kernel detect and clean up userspace's messes is not going to work.
We might as well just do things in the kernel rather than do that.

What we *should* do is leverage all of the existing APIs that we already
have instead of creating completely new code paths into which my butter
fingers can introduce new kernel bugs.

> 2) restart will be more efficient with respect to shared objects.

Can you quantify this? Which objects? How much more efficient?

-- Dave

2008-10-30 17:47:58

by Oren Laadan

Subject: Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart



Louis Rilling wrote:
> On Thu, Oct 30, 2008 at 10:02:44AM +0400, Andrey Mirkin wrote:
>>>> kernel. Also we will need a functionality to create processes with
>>>> predefined PID. I think it is not very good to provide such ability to
>>>> user space. That is why we prefer in OpenVZ to do all the job in kernel.
>>> This is the weak side of creating the processes in user space -
>>> that we need such an interface. Note, however, that we can
>>> easily "hide" it inside the interface of the sys_restart() call,
>>> and restrict how it may be used.
>> Of course we can "hide" it somehow, but anyway we will have a hole and that is
>> not good.
>>
>> Anyway we should ask everyone what they think about user- and kernel- based
>> process creation.
>> Dave, Serge, Cedric, Daniel, Louis what do you think about that?
>
> Frankly, I'm not convinced (yet) that one approach is better than the other one.
> I only *tend* to prefer kernel-based, for the reasons explained below. I know
> that there are arguments in favor of userspace (I've at least seen
> security-related ones), but I let their authors detail them (again).

I'm not convinced either. I think both implementations can eventually
work well.

>
> In Kerrighed this is kernel-based, and will remain kernel-based because we
> checkpoint a distributed task tree, and want to restart it as much as possible
> with the same distribution. The distributed protocol used for restart is
> currently too fragile and complex to rely on customized user-space
> implementations. That said, if someone brings very good arguments in favor of
> userspace implementations, we might consider changing this.

Zap also has distributed checkpoint which does not require strict
kernel-side ordering. Do you need that because you do SSI ?

>
> Without taking distributed restart into account, I also tend to prefer
> kernel-based, mainly for two (not so strong) reasons:
> 1) this prevents userspace from doing weird things, like changing the task tree
> and let the kernel detect it and deal with the mess this creates (think about
> two threads being restarted in separate processes that do not even share their
> parents). But one can argue that userspace can change the checkpoint image as
> well, so that the kernel must check for such weird things anyway.

I don't really buy this argument. First, as you say, user can change
the checkpoint image file. Second, you can verify in the kernel that
the real relationships of the processes match those specified (and
expected from) the image file. That's pretty straightforward.
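
A minimal sketch of such a check, written in Python rather than kernel C
and only meant to illustrate the idea. `ImageTask` and `verify_tree` are
hypothetical names, not part of any posted patch, and the live side is
modelled as a plain pid -> (ppid, tgid) mapping.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ImageTask:
    pid: int    # pid recorded in the checkpoint image
    ppid: int   # parent pid recorded in the image
    tgid: int   # thread group id recorded in the image

def verify_tree(image, live):
    """Return a list of mismatches between the image and the live tasks.

    `live` maps pid -> (ppid, tgid) for the tasks userspace created.
    """
    errors = []
    for t in image:
        got = live.get(t.pid)
        if got is None:
            errors.append(f"pid {t.pid}: missing")
        elif got != (t.ppid, t.tgid):
            errors.append(f"pid {t.pid}: expected {(t.ppid, t.tgid)}, got {got}")
    # Tasks that exist but were never described by the image are errors too.
    extra = set(live) - {t.pid for t in image}
    errors.extend(f"pid {p}: not in image" for p in sorted(extra))
    return errors
```

With two threads recorded as pids 2 and 3 in thread group 2 under parent 1,
restarting pid 3 as a separate process under pid 2 is reported as a
mismatch, which is the "weird thing" from the example above.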

> 2) restart will be more efficient with respect to shared objects.

Can you elaborate on this ? In what sense "more efficient" ?

Note that the topic in question is not whether to do the entire restart
from user space (and I argue that most work should be done in the kernel),
but rather whether process creation (and only that) should be done in
kernel or user space.

Quick thoughts of pros/cons of each approach are:

user space:

+ re-use existing api (fork)
+ easier to debug
+ will allow 'handmade' resources restart: it was mentioned before that
one may want to reattach stdout to a different place after restart; a
user based restart of processes can make this much easier (e.g. the
user process can create the alternative resources, give them to the
kernel and only then call sys_restart)
+ arch-independent code

- a bit slower than in kernel space
- requires a clone-with-specific-pid syscall or interface

kernel space:

+ a bit easier to control everything
+ a bit faster than user space
+ no need for user-visible interface for clone-with-...

- arch-dependent code
- needs special code to fight 'fork-bomb'

So, I'm not convinced, and I even think there may be room for both, for
the time being. I volunteer to support the user-space alternative while
we make up our minds.

Oren.

2008-10-30 18:01:47

by Louis Rilling

Subject: Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

On Thu, Oct 30, 2008 at 10:08:44AM -0700, Dave Hansen wrote:
> On Thu, 2008-10-30 at 12:47 +0100, Louis Rilling wrote:
> > 1) this prevents userspace from doing weird things, like changing the task tree
> > and let the kernel detect it and deal with the mess this creates (think about
> > two threads being restarted in separate processes that do not even share their
> > parents). But one can argue that userspace can change the checkpoint image as
> > well, so that the kernel must check for such weird things anyway.
>
> To me, this is one of the strongest arguments out there for doing
> restart as much as possible with existing user<->kernel APIs. Having
> the kernel detect and clean up userspace's messes is not going to work.
> We might as well just do things in the kernel rather than do that.
>
> What we *should* do is leverage all of the existing APIs that we already
> have instead of creating completely new code paths into which my butter
> fingers can introduce new kernel bugs.
>
> > 2) restart will be more efficient with respect to shared objects.
>
> Can you quantify this? Which objects? How much more efficient?

Quantify? No. I expect that investigating both approaches will show us numbers.
Unless Oren already has some?

Which objects? I think that two kinds will especially matter: objects usually
shared only inside a thread group (mm_struct, fs_struct, files_struct,
signal_struct and sighand_struct), and individual file descriptors. The point is
to avoid creating new structures before destroying them because the restarted
task shares them with a previously restarted one.

Concerning individual file descriptors, limiting the number of open files before
calling sys_restart() may avoid these useless creations/destructions (actually
the "useless" work mainly consists in managing ref counts since file descriptors
are shared after fork()).

Concerning thread-shared structures, it is probably easy for userspace to guess
which clone flags to use when restarting threads, but
1) kernel-space will have to check that the sharing is correct anyway, and
2) kernel-space will have to fix it anyway if structures are not shared in an
obvious manner between tasks (think about A creating B with shared files_struct,
B creating C with shared files_struct, B unsharing its files_struct, and then
checkpoint).

So, with a userspace implementation, useless structures will be created anyway,
and optimizing the common cases (regular threads) just duplicates kernel's work
of checking which shared structure to use for each task to restart.
With a kernel-space implementation, all useless creations can be avoided, and no
duplicate work is needed.
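
To make the A/B/C case concrete, here is a small Python model (a sketch,
not kernel code). It assumes that a fork-time decision can only produce
"same files_struct as the parent" or "a fresh copy", and it flags every
task whose recorded sharing cannot be obtained that way. The function name
and the tuple layout are hypothetical.

```python
def tasks_needing_fixup(tasks):
    """Return the names of tasks whose sharing needs repair after creation.

    `tasks` is a list of (name, parent_name_or_None, files_id) tuples,
    with every parent listed before its children.
    """
    files_of = {}     # name -> files_id chosen so far
    seen_ids = set()  # files_ids already held by an earlier task
    fixups = []
    for name, parent, files_id in tasks:
        shares_with_parent = parent is not None and files_of[parent] == files_id
        # Sharing with an earlier task that is not the parent cannot be
        # expressed at creation time in this model.
        if not shares_with_parent and files_id in seen_ids:
            fixups.append(name)
        files_of[name] = files_id
        seen_ids.add(files_id)
    return fixups
```

For A (id 1), B child of A (id 2 after unsharing) and C child of B (id 1),
only C is flagged; for regular threads that all share with their creator,
nothing is.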

That said, numbers may show us that useless creations are not so
time-consuming, but we won't know before seeing them...

Louis

--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes



2008-10-30 18:14:40

by Louis Rilling

Subject: Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

On Thu, Oct 30, 2008 at 01:45:25PM -0400, Oren Laadan wrote:
>
>
> Louis Rilling wrote:
> > In Kerrighed this is kernel-based, and will remain kernel-based because we
> > checkpoint a distributed task tree, and want to restart it as much as possible
> > with the same distribution. The distributed protocol used for restart is
> > currently too fragile and complex to rely on customized user-space
> > implementations. That said, if someone brings very good arguments in favor of
> > userspace implementations, we might consider changing this.
>
> Zap also has distributed checkpoint which does not require strict
> kernel-side ordering. Do you need that because you do SSI ?

Yes. Tasks from different nodes have parent-children, session leader, etc.
relationships, and the distributed management of struct pid lifecycle is a bit
touchy too. By the way, splitting the checkpoint image in one file for each task
helps us a lot to make restart parallel, because it is more efficient for the file
system to handle parallel reads of different files from different nodes than
parallel reads on a single file descriptor from different nodes.

>
> >
> > Without taking distributed restart into account, I also tend to prefer
> > kernel-based, mainly for two (not so strong) reasons:
> > 1) this prevents userspace from doing weird things, like changing the task tree
> > and let the kernel detect it and deal with the mess this creates (think about
> > two threads being restarted in separate processes that do not even share their
> > parents). But one can argue that userspace can change the checkpoint image as
> > well, so that the kernel must check for such weird things anyway.
>
> I don't really buy this argument. First, as you say, user can change
> the checkpoint image file. Second, you can verify in the kernel that
> the real relationships of the processes match those specified (and
> expected from) the image file. That's pretty straightforward.
>
> > 2) restart will be more efficient with respect to shared objects.
>
> Can you elaborate on this ? In what sense "more efficient" ?
>
> Note that the topic in question is not whether to do the entire restart
> from user space (and I argue that most work should be done in the kernel),
> but rather whether process creation (and only that) should be done in
> kernel or user space.

See my answer to Dave.

>
> Quick thoughts of pros/cons of each approach are:
>
> user space:
>
> + re-use existing api (fork)
> + easier to debug
> + will allow 'handmade' resources restart: it was mentioned before that
> one may want to reattach stdout to a different place after restart; a
> user based restart of processes can make this much easier: e.g. the
> user process can create the alternative resources, give them to the
> kernel and only then call sys_restart)
> + arch-independent code
>
> - a bit slower than in kernel space
> - requires a clone-with-specific-pid syscall or interface
>
> kernel space:
>
> + a bit easier to control everything
> + a bit faster than user space
> + no need for user-visible interface for clone-with-...
>
> - arch-dependent code
> - needs special code to fight 'fork-bomb'
>
> So, I'm not convinced, and I even think there may be room for both, for
> the time being. I volunteer to support the user-space alternative while
> we make up our minds.

Yes, I hope that investigating both approaches will give us stronger arguments.

Louis

--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes



2008-10-30 18:30:29

by Oren Laadan

Subject: Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart



Louis Rilling wrote:
> On Thu, Oct 30, 2008 at 10:08:44AM -0700, Dave Hansen wrote:
>> On Thu, 2008-10-30 at 12:47 +0100, Louis Rilling wrote:
>>> 1) this prevents userspace from doing weird things, like changing the task tree
>>> and let the kernel detect it and deal with the mess this creates (think about
>>> two threads being restarted in separate processes that do not even share their
>>> parents). But one can argue that userspace can change the checkpoint image as
>>> well, so that the kernel must check for such weird things anyway.
>> To me, this is one of the strongest arguments out there for doing
>> restart as much as possible with existing user<->kernel APIs. Having
>> the kernel detect and clean up userspace's messes is not going to work.
>> We might as well just do things in the kernel rather than do that.
>>
>> What we *should* do is leverage all of the existing APIs that we already
>> have instead of creating completely new code paths into which my butter
>> fingers can introduce new kernel bugs.
>>
>>> 2) restart will be more efficient with respect to shared objects.
>> Can you quantify this? Which objects? How much more efficient?
>
> Quantify? No. I expect that investigating both approaches will show us numbers.
> Unless Oren already has some?

I do have some. it's pretty quick :) see the usenix 2007 paper...
the new implementation will be faster, though.

>
> Which objects? I think that two kinds will especially matter: objects usually
> shared only inside a thread group (mm_struct, fs_struct, files_struct,
> signal_struct and sighand_struct), and individual file descriptors. The point is
> to avoid creating new structures before destroying them because the restarted
> task shares them with a previously restarted one.

all the forks in the user space will be done with CLONE_VM etc, to avoid
exactly that sort of overhead.

in any event, my experience is that this is not the dominant factor in the
restart time.

>
> Concerning individual file descriptors, limiting the number of open files before
> calling sys_restart() may avoid these useless creations/destructions (actually
> the "useless" work mainly consists in managing ref counts since file descriptors
> are shared after fork()).
>
> Concerning thread-shared structures, it is probably easy for userspace to guess
> which clone flags to use when restarting threads, but
> 1) kernel-space will have to check that the sharing is correct anyway, and

ok. that's not a lot of work :p
(see more below)

> 2) kernel-space will have to fix it anyway if structures are not shared in an
> obvious manner between tasks (think about A creating B with shared files_struct,
> B creating C with shared files_struct, B unsharing its files_struct, and then
> checkpoint).
>
> So, with a userspace implementation, useless structures will be created anyway,
> and optimizing the common cases (regular threads) just duplicates kernel's work
> of checking which shared structure to use for each task to restart.
> With a kernel-space implementation, all useless creations can be avoided, and no
> duplicate work is needed.

they can also be avoided in user space - you "optimistically" create everything
shared to begin with, and in the kernel (inside sys_restart) you "unshare" and
create the necessary resources on demand - just like you would do with kernel
based process creation.

in this case, the extra work is only ref-counting, and then sys_restart will
unconditionally attach the right shared resource to the restarting process
(the "right" shared resource will be found, of course, in the shared pool).

this way, you don't even need to check what the user gave you - you simply
ignore and overwrite it.
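
a rough model of that shared pool, in Python and only as a sketch (the
names `SharedPool` and `restart_tasks` are made up for illustration, and a
plain dict stands in for a files_struct):

```python
class SharedPool:
    """Maps object ids from the image to the objects rebuilt so far."""

    def __init__(self):
        self._objects = {}
        self.created = 0  # number of objects actually constructed

    def get_or_create(self, obj_id, factory):
        obj = self._objects.get(obj_id)
        if obj is None:
            # First task to need this object builds it, on demand.
            obj = factory()
            self._objects[obj_id] = obj
            self.created += 1
        return obj

def restart_tasks(image, pool):
    """`image` is a list of (task_name, files_id); returns task -> object."""
    attached = {}
    for task, files_id in image:
        # Whatever the task arrived with is ignored and overwritten.
        attached[task] = pool.get_or_create(files_id, dict)
    return attached
```

each distinct id is built once, and every later task with the same id only
takes a reference to the existing object.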

>
> That said, numbers may show us that useless creations are not so
> time-consuming, but we won't know before seeing them...

yes, odds are that you are right.

Oren

2008-10-30 18:33:30

by Oren Laadan

Subject: Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart



Louis Rilling wrote:
> On Thu, Oct 30, 2008 at 01:45:25PM -0400, Oren Laadan wrote:
>>
>> Louis Rilling wrote:
>>> In Kerrighed this is kernel-based, and will remain kernel-based because we
>>> checkpoint a distributed task tree, and want to restart it as much as possible
>>> with the same distribution. The distributed protocol used for restart is
>>> currently too fragile and complex to rely on customized user-space
>>> implementations. That said, if someone brings very good arguments in favor of
>>> userspace implementations, we might consider changing this.
>> Zap also has distributed checkpoint which does not require strict
>> kernel-side ordering. Do you need that because you do SSI ?
>
> Yes. Tasks from different nodes have parent-children, session leader, etc.
> relationships, and the distributed management of struct pid lifecycle is a bit
> touchy too. By the way, splitting the checkpoint image in one file for each task
> helps us a lot to make restart parallel, because it is more efficient for the file
> system to handle parallel reads of different files from different nodes than
> parallel reads on a single file descriptor from different nodes.

You can also make parallel restart work with the single stream, without
much effort. Particularly if you store everything on the file system.

In both cases, the limiting factor is shared resources - where one task
cannot proceed with checkpoint because it waits for another task to first
(re)create that resource.

[...]

Oren.

2008-10-31 10:37:35

by Louis Rilling

Subject: Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

On Thu, Oct 30, 2008 at 02:32:51PM -0400, Oren Laadan wrote:
>
>
> Louis Rilling wrote:
> > On Thu, Oct 30, 2008 at 01:45:25PM -0400, Oren Laadan wrote:
> >>
> >> Louis Rilling wrote:
> >>> In Kerrighed this is kernel-based, and will remain kernel-based because we
> >>> checkpoint a distributed task tree, and want to restart it as much as possible
> >>> with the same distribution. The distributed protocol used for restart is
> >>> currently too fragile and complex to rely on customized user-space
> >>> implementations. That said, if someone brings very good arguments in favor of
> >>> userspace implementations, we might consider changing this.
> >> Zap also has distributed checkpoint which does not require strict
> >> kernel-side ordering. Do you need that because you do SSI ?
> >
> > Yes. Tasks from different nodes have parent-children, session leader, etc.
> > relationships, and the distributed management of struct pid lifecycle is a bit
> > touchy too. By the way, splitting the checkpoint image in one file for each task
> > helps us a lot to make restart parallel, because it is more efficient for the file
> > system to handle parallel reads of different files from different nodes than
> > parallel reads on a single file descriptor from different nodes.
>
> You can also make parallel restart work with the single stream, without
> much effort. Particularly if you store everything on the file system.

Sure we can use a single stream, since we already share file descriptors across
nodes. But the distributed synchronization of the file pointer is costly
compared to having each node access different files. This way we push the
parallelization bottleneck down to the file system rather than in the
distributed VFS layer.

>
> In both cases, the limiting factor is shared resources - where one task
> cannot proceed with checkpoint because it waits for another task to first
> (re)create that resource.

We just try to avoid other bottlenecks :) And besides file descriptors, shared
resources are as common as multi-threaded programs, which are not the majority
of the workloads we can address.

Louis

--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes



2008-11-03 19:36:05

by Oren Laadan

Subject: Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart



Andrey Mirkin wrote:
> On Monday 20 October 2008 19:55 Dave Hansen wrote:
>> On Mon, 2008-10-20 at 16:14 +0400, Andrey Mirkin wrote:
>>> Right now my patchset (v2) provides an ability to checkpoint and restart
>>> a group of processes. The process of checkpointing and restart can be
>>> initiated from external process (not from the process which should be
>>> checkpointed).
>> Absolutely. Oren's code does it this way to make for a smaller patch at
>> first. The syscall takes a pid argument so it is surely expected to be
>> expanded upon later.
>>
>>> Also I think that all the restart job (including process forking) should
>>> be done in kernel, as in this case we will not depend on user space and
>>> will be more secure. This is also implemented in my patchset.
>> Do you think that this is an approach that Oren's patches are married
>> to, or is this a "feature" we can add on later?
>
> Well, AFAICS from Oren's patch set his approach is oriented on process
> creation in user space. I think we should choose right now what approach will
> be used for process creation.
> We have two options here: fork processes in kernel or fork them in user space.
> If process will be forked in user space, then there will be a gap when process
> will be in user space and can be killed with received signal before entering

> kernel. Also we will need a functionality to create processes with predefined
> PID. I think it is not very good to provide such ability to user space. That

Rethinking this -- if the user wishes she can construct a suitable
checkpoint image that would do exactly that. It takes more effort than
using a system call, but the result is similar.

What I had in mind for that special clone-with-pid is to restrict when
it can be used (e.g. when the container is in a "restarting" state or
something like that).
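
As a toy model of that restriction (Python, purely illustrative; the
`Container` class, its `state` field and `clone_with_pid` are hypothetical
and do not correspond to an existing interface):

```python
class Container:
    def __init__(self):
        self.state = "running"  # becomes "restarting" only during a restart
        self.pids = set()       # pids currently in use inside the container

def clone_with_pid(container, pid):
    """Honour a requested pid only while the container is restarting."""
    if container.state != "restarting":
        raise PermissionError("requested pids are only honoured during restart")
    if pid in container.pids:
        raise ValueError(f"pid {pid} already in use")
    container.pids.add(pid)
    return pid
```

Outside of the "restarting" state the request is simply refused, so the
interface gives a regular process nothing it could not already get by
crafting a checkpoint image.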

[...]

Oren.