2008-10-09 19:04:19

by Dave Hansen

[permalink] [raw]
Subject: [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work


Suggested by Ingo.

Checkpoint/restart is going to be a long effort to get things working.
We're going to have a lot of things that we know just don't work for
a long time. That doesn't mean that it will be useless, it just means
that there's some complicated features that we are going to have to
work incrementally to fix.

This patch introduces a new mechanism to help the checkpoint/restart
developers. A new function pair: task/process_deny_checkpoint() is
created. When called, these tell the kernel that we *know* that the
process has performed some activity that will keep it from being
properly checkpointed.

The 'flag' is an atomic_t for now so that we can have some level
of atomicity and make sure to only warn once.

For now, this is a one-way trip. Once a process is no longer
'may_checkpoint' capable, neither it nor its children ever will be.
This can, of course, be fixed up in the future. We might want to
reset the flag when a new pid namespace is created, for instance.


Signed-off-by: Dave Hansen <[email protected]>
---

linux-2.6.git-dave/include/linux/checkpoint.h | 31 +++++++++++++++++++++++++-
linux-2.6.git-dave/include/linux/sched.h | 3 ++
linux-2.6.git-dave/kernel/fork.c | 10 ++++++++
3 files changed, 43 insertions(+), 1 deletion(-)

diff -puN include/linux/checkpoint.h~flag include/linux/checkpoint.h
--- linux-2.6.git/include/linux/checkpoint.h~flag 2008-10-09 11:56:57.000000000 -0700
+++ linux-2.6.git-dave/include/linux/checkpoint.h 2008-10-09 11:56:57.000000000 -0700
@@ -10,8 +10,11 @@
* distribution for more details.
*/

-#include <linux/path.h>
#include <linux/fs.h>
+#include <linux/path.h>
+#include <linux/sched.h>
+
+#ifdef CONFIG_CHECKPOINT_RESTART

#define CR_VERSION 2

@@ -94,4 +97,30 @@ extern void file_pos_write(struct file *
#define cr_debug(fmt, args...) \
pr_debug("[CR:%s] " fmt, __func__, ## args)

+static inline void __task_deny_checkpointing(struct task_struct *task,
+ char *file, int line)
+{
+ if(!atomic_dec_and_test(&task->may_checkpoint))
+ return;
+ printk(KERN_INFO "process performed an action that can not be "
+ "checkpointed at: %s:%d\n", file, line);
+ WARN_ON(1);
+}
+#define process_deny_checkpointing(p) __task_deny_checkpointing(p, __FILE__, __LINE__)
+
+/*
+ * For now, we're not going to have a distinction between
+ * tasks and processes for the purpose of c/r. But, allow
+ * these two calls anyway to make new users at least think
+ * about it.
+ */
+#define task_deny_checkpointing(p) __task_deny_checkpointing(p, __FILE__, __LINE__)
+
+#else
+
+static inline void task_deny_checkpointing(struct task_struct *task) {}
+static inline void process_deny_checkpointing(struct task_struct *task) {}
+
+#endif
+
#endif /* _CHECKPOINT_CKPT_H_ */
diff -puN include/linux/sched.h~flag include/linux/sched.h
--- linux-2.6.git/include/linux/sched.h~flag 2008-10-09 11:56:57.000000000 -0700
+++ linux-2.6.git-dave/include/linux/sched.h 2008-10-09 11:56:57.000000000 -0700
@@ -1301,6 +1301,9 @@ struct task_struct {
int latency_record_count;
struct latency_record latency_record[LT_SAVECOUNT];
#endif
+#ifdef CONFIG_CHECKPOINT_RESTART
+ atomic_t may_checkpoint;
+#endif
};

/*
diff -puN kernel/fork.c~flag kernel/fork.c
--- linux-2.6.git/kernel/fork.c~flag 2008-10-09 11:56:57.000000000 -0700
+++ linux-2.6.git-dave/kernel/fork.c 2008-10-09 11:56:57.000000000 -0700
@@ -194,6 +194,13 @@ void __init fork_init(unsigned long memp
init_task.signal->rlim[RLIMIT_NPROC].rlim_max = max_threads/2;
init_task.signal->rlim[RLIMIT_SIGPENDING] =
init_task.signal->rlim[RLIMIT_NPROC];
+
+#ifdef CONFIG_CHECKPOINT_RESTART
+ /*
+ * This probably won't stay set for long...
+ */
+ atomic_set(&init_task.may_checkpoint, 1);
+#endif
}

int __attribute__((weak)) arch_dup_task_struct(struct task_struct *dst,
@@ -244,6 +251,9 @@ static struct task_struct *dup_task_stru
tsk->btrace_seq = 0;
#endif
tsk->splice_pipe = NULL;
+#ifdef CONFIG_CHECKPOINT_RESTART
+ atomic_set(&tsk->may_checkpoint, atomic_read(&orig->may_checkpoint));
+#endif
return tsk;

out:
_


2008-10-09 19:04:35

by Dave Hansen

[permalink] [raw]
Subject: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()


These are just a few simple examples of things we know we can't
checkpoint now. There are plenty more, but this should give
everyone an idea how this will look in practice.


Signed-off-by: Dave Hansen <[email protected]>
---

linux-2.6.git-dave/fs/aio.c | 7 +++++++
linux-2.6.git-dave/ipc/mqueue.c | 3 +++
linux-2.6.git-dave/net/socket.c | 5 +++++
3 files changed, 15 insertions(+)

diff -puN fs/aio.c~no-checkpointing-for-sockets fs/aio.c
--- linux-2.6.git/fs/aio.c~no-checkpointing-for-sockets 2008-10-09 11:56:58.000000000 -0700
+++ linux-2.6.git-dave/fs/aio.c 2008-10-09 11:56:58.000000000 -0700
@@ -19,6 +19,7 @@

#define DEBUG 0

+#include <linux/checkpoint.h>
#include <linux/sched.h>
#include <linux/fs.h>
#include <linux/file.h>
@@ -1663,6 +1664,8 @@ asmlinkage long sys_io_submit(aio_contex
if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(*iocbpp)))))
return -EFAULT;

+ process_deny_checkpointing(current);
+
ctx = lookup_ioctx(ctx_id);
if (unlikely(!ctx)) {
pr_debug("EINVAL: io_submit: invalid context id\n");
@@ -1742,6 +1745,8 @@ asmlinkage long sys_io_cancel(aio_contex
if (unlikely(!ctx))
return -EINVAL;

+ process_deny_checkpointing(current);
+
spin_lock_irq(&ctx->ctx_lock);
ret = -EAGAIN;
kiocb = lookup_kiocb(ctx, iocb, key);
@@ -1796,6 +1801,8 @@ asmlinkage long sys_io_getevents(aio_con
struct kioctx *ioctx = lookup_ioctx(ctx_id);
long ret = -EINVAL;

+ process_deny_checkpointing(current);
+
if (likely(ioctx)) {
if (likely(min_nr <= nr && min_nr >= 0 && nr >= 0))
ret = read_events(ioctx, min_nr, nr, events, timeout);
diff -puN ipc/mqueue.c~no-checkpointing-for-sockets ipc/mqueue.c
--- linux-2.6.git/ipc/mqueue.c~no-checkpointing-for-sockets 2008-10-09 11:56:58.000000000 -0700
+++ linux-2.6.git-dave/ipc/mqueue.c 2008-10-09 11:56:58.000000000 -0700
@@ -14,6 +14,7 @@
*/

#include <linux/capability.h>
+#include <linux/checkpoint.h>
#include <linux/init.h>
#include <linux/pagemap.h>
#include <linux/file.h>
@@ -655,6 +656,8 @@ asmlinkage long sys_mq_open(const char _
char *name;
int fd, error;

+ process_deny_checkpointing(current);
+
error = audit_mq_open(oflag, mode, u_attr);
if (error != 0)
return error;
diff -puN net/socket.c~no-checkpointing-for-sockets net/socket.c
--- linux-2.6.git/net/socket.c~no-checkpointing-for-sockets 2008-10-09 11:56:58.000000000 -0700
+++ linux-2.6.git-dave/net/socket.c 2008-10-09 11:56:58.000000000 -0700
@@ -87,6 +87,7 @@
#include <linux/audit.h>
#include <linux/wireless.h>
#include <linux/nsproxy.h>
+#include <linux/checkpoint.h>

#include <asm/uaccess.h>
#include <asm/unistd.h>
@@ -1236,6 +1237,8 @@ asmlinkage long sys_socket(int family, i
if (SOCK_NONBLOCK != O_NONBLOCK && (flags & SOCK_NONBLOCK))
flags = (flags & ~SOCK_NONBLOCK) | O_NONBLOCK;

+ process_deny_checkpointing(current);
+
retval = sock_create(family, type, protocol, &sock);
if (retval < 0)
goto out;
@@ -2130,6 +2133,8 @@ asmlinkage long sys_socketcall(int call,
a0 = a[0];
a1 = a[1];

+ process_deny_checkpointing(current);
+
switch (call) {
case SYS_SOCKET:
err = sys_socket(a0, a1, a[2]);
_

2008-10-09 19:44:09

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()

Quoting Dave Hansen ([email protected]):
>
> These are just a few simple examples of things we know we can't
> checkpoint now. There are plenty more, but this should give
> everyone an idea how this will look in practice.
>
>
> Signed-off-by: Dave Hansen <[email protected]>
> ---
>
> linux-2.6.git-dave/fs/aio.c | 7 +++++++
> linux-2.6.git-dave/ipc/mqueue.c | 3 +++
> linux-2.6.git-dave/net/socket.c | 5 +++++
> 3 files changed, 15 insertions(+)
>
> diff -puN fs/aio.c~no-checkpointing-for-sockets fs/aio.c
> --- linux-2.6.git/fs/aio.c~no-checkpointing-for-sockets 2008-10-09 11:56:58.000000000 -0700
> +++ linux-2.6.git-dave/fs/aio.c 2008-10-09 11:56:58.000000000 -0700
> @@ -19,6 +19,7 @@
>
> #define DEBUG 0
>
> +#include <linux/checkpoint.h>
> #include <linux/sched.h>
> #include <linux/fs.h>
> #include <linux/file.h>
> @@ -1663,6 +1664,8 @@ asmlinkage long sys_io_submit(aio_contex
> if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(*iocbpp)))))
> return -EFAULT;
>
> + process_deny_checkpointing(current);
> +
> ctx = lookup_ioctx(ctx_id);
> if (unlikely(!ctx)) {
> pr_debug("EINVAL: io_submit: invalid context id\n");
> @@ -1742,6 +1745,8 @@ asmlinkage long sys_io_cancel(aio_contex
> if (unlikely(!ctx))
> return -EINVAL;
>
> + process_deny_checkpointing(current);
> +
> spin_lock_irq(&ctx->ctx_lock);
> ret = -EAGAIN;
> kiocb = lookup_kiocb(ctx, iocb, key);
> @@ -1796,6 +1801,8 @@ asmlinkage long sys_io_getevents(aio_con
> struct kioctx *ioctx = lookup_ioctx(ctx_id);
> long ret = -EINVAL;
>
> + process_deny_checkpointing(current);

Hmm, I don't know too much about aio, but is it possible to succeed with
io_getevents if we didn't first do a submit? It looks like the contexts
are looked up out of current->mm, so I don't think we need this call
here.

Otherwise, this is neat.

-serge

> +
> if (likely(ioctx)) {
> if (likely(min_nr <= nr && min_nr >= 0 && nr >= 0))
> ret = read_events(ioctx, min_nr, nr, events, timeout);
> diff -puN ipc/mqueue.c~no-checkpointing-for-sockets ipc/mqueue.c
> --- linux-2.6.git/ipc/mqueue.c~no-checkpointing-for-sockets 2008-10-09 11:56:58.000000000 -0700
> +++ linux-2.6.git-dave/ipc/mqueue.c 2008-10-09 11:56:58.000000000 -0700
> @@ -14,6 +14,7 @@
> */
>
> #include <linux/capability.h>
> +#include <linux/checkpoint.h>
> #include <linux/init.h>
> #include <linux/pagemap.h>
> #include <linux/file.h>
> @@ -655,6 +656,8 @@ asmlinkage long sys_mq_open(const char _
> char *name;
> int fd, error;
>
> + process_deny_checkpointing(current);
> +
> error = audit_mq_open(oflag, mode, u_attr);
> if (error != 0)
> return error;
> diff -puN net/socket.c~no-checkpointing-for-sockets net/socket.c
> --- linux-2.6.git/net/socket.c~no-checkpointing-for-sockets 2008-10-09 11:56:58.000000000 -0700
> +++ linux-2.6.git-dave/net/socket.c 2008-10-09 11:56:58.000000000 -0700
> @@ -87,6 +87,7 @@
> #include <linux/audit.h>
> #include <linux/wireless.h>
> #include <linux/nsproxy.h>
> +#include <linux/checkpoint.h>
>
> #include <asm/uaccess.h>
> #include <asm/unistd.h>
> @@ -1236,6 +1237,8 @@ asmlinkage long sys_socket(int family, i
> if (SOCK_NONBLOCK != O_NONBLOCK && (flags & SOCK_NONBLOCK))
> flags = (flags & ~SOCK_NONBLOCK) | O_NONBLOCK;
>
> + process_deny_checkpointing(current);
> +
> retval = sock_create(family, type, protocol, &sock);
> if (retval < 0)
> goto out;
> @@ -2130,6 +2133,8 @@ asmlinkage long sys_socketcall(int call,
> a0 = a[0];
> a1 = a[1];
>
> + process_deny_checkpointing(current);
> +
> switch (call) {
> case SYS_SOCKET:
> err = sys_socket(a0, a1, a[2]);
> _
> _______________________________________________
> Containers mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/containers

2008-10-09 20:54:43

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()

On Thu, 2008-10-09 at 14:43 -0500, Serge E. Hallyn wrote:
> Hmm, I don't know too much about aio, but is it possible to succeed with
> io_getevents if we didn't first do a submit? It looks like the contexts
> are looked up out of current->mm, so I don't think we need this call
> here.
>
> Otherwise, this is neat.

Good question. I know nothing, either. :)

My thought was that any process *trying* to do aio stuff of any kind is
going to be really confused if it gets checkpointed. Or, it might try
to submit an aio right after it checks the list of them. I thought it
best to be cautious and say, if you screw with aio, no checkpointing for
you!

-- Dave

2008-10-10 08:21:05

by Greg Kurz

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work

On Thu, 2008-10-09 at 12:04 -0700, Dave Hansen wrote:
> Suggested by Ingo.
>
> Checkpoint/restart is going to be a long effort to get things working.
> We're going to have a lot of things that we know just don't work for
> a long time. That doesn't mean that it will be useless, it just means
> that there's some complicated features that we are going to have to
> work incrementally to fix.
>
> This patch introduces a new mechanism to help the checkpoint/restart
> developers. A new function pair: task/process_deny_checkpoint() is
> created. When called, these tell the kernel that we *know* that the
> process has performed some activity that will keep it from being
> properly checkpointed.
>
> The 'flag' is an atomic_t for now so that we can have some level
> of atomicity and make sure to only warn once.
>
> For now, this is a one-way trip. Once a process is no longer
> 'may_checkpoint' capable, neither it nor its children ever will be.
> This can, of course, be fixed up in the future. We might want to
> reset the flag when a new pid namespace is created, for instance.
>

Then this patch should be described as:

Track in-kernel when we expect checkpoint/restart to fail.

By the way, why don't you introduce the reverse operation ?

>
> Signed-off-by: Dave Hansen <[email protected]>
> ---
>
> linux-2.6.git-dave/include/linux/checkpoint.h | 31 +++++++++++++++++++++++++-
> linux-2.6.git-dave/include/linux/sched.h | 3 ++
> linux-2.6.git-dave/kernel/fork.c | 10 ++++++++
> 3 files changed, 43 insertions(+), 1 deletion(-)
>
> diff -puN include/linux/checkpoint.h~flag include/linux/checkpoint.h
> --- linux-2.6.git/include/linux/checkpoint.h~flag 2008-10-09 11:56:57.000000000 -0700
> +++ linux-2.6.git-dave/include/linux/checkpoint.h 2008-10-09 11:56:57.000000000 -0700
> @@ -10,8 +10,11 @@
> * distribution for more details.
> */
>
> -#include <linux/path.h>
> #include <linux/fs.h>
> +#include <linux/path.h>
> +#include <linux/sched.h>
> +
> +#ifdef CONFIG_CHECKPOINT_RESTART
>
> #define CR_VERSION 2
>
> @@ -94,4 +97,30 @@ extern void file_pos_write(struct file *
> #define cr_debug(fmt, args...) \
> pr_debug("[CR:%s] " fmt, __func__, ## args)
>
> +static inline void __task_deny_checkpointing(struct task_struct *task,
> + char *file, int line)
> +{
> + if(!atomic_dec_and_test(&task->may_checkpoint))
> + return;
> + printk(KERN_INFO "process performed an action that can not be "
> + "checkpointed at: %s:%d\n", file, line);
> + WARN_ON(1);
> +}
> +#define process_deny_checkpointing(p) __task_deny_checkpointing(p, __FILE__, __LINE__)
> +
> +/*
> + * For now, we're not going to have a distinction between
> + * tasks and processes for the purpose of c/r. But, allow
> + * these two calls anyway to make new users at least think
> + * about it.
> + */
> +#define task_deny_checkpointing(p) __task_deny_checkpointing(p, __FILE__, __LINE__)
> +
> +#else
> +
> +static inline void task_deny_checkpointing(struct task_struct *task) {}
> +static inline void process_deny_checkpointing(struct task_struct *task) {}
> +
> +#endif
> +
> #endif /* _CHECKPOINT_CKPT_H_ */
> diff -puN include/linux/sched.h~flag include/linux/sched.h
> --- linux-2.6.git/include/linux/sched.h~flag 2008-10-09 11:56:57.000000000 -0700
> +++ linux-2.6.git-dave/include/linux/sched.h 2008-10-09 11:56:57.000000000 -0700
> @@ -1301,6 +1301,9 @@ struct task_struct {
> int latency_record_count;
> struct latency_record latency_record[LT_SAVECOUNT];
> #endif
> +#ifdef CONFIG_CHECKPOINT_RESTART
> + atomic_t may_checkpoint;
> +#endif
> };
>
> /*
> diff -puN kernel/fork.c~flag kernel/fork.c
> --- linux-2.6.git/kernel/fork.c~flag 2008-10-09 11:56:57.000000000 -0700
> +++ linux-2.6.git-dave/kernel/fork.c 2008-10-09 11:56:57.000000000 -0700
> @@ -194,6 +194,13 @@ void __init fork_init(unsigned long memp
> init_task.signal->rlim[RLIMIT_NPROC].rlim_max = max_threads/2;
> init_task.signal->rlim[RLIMIT_SIGPENDING] =
> init_task.signal->rlim[RLIMIT_NPROC];
> +
> +#ifdef CONFIG_CHECKPOINT_RESTART
> + /*
> + * This probably won't stay set for long...
> + */
> + atomic_set(&init_task.may_checkpoint, 1);
> +#endif
> }
>
> int __attribute__((weak)) arch_dup_task_struct(struct task_struct *dst,
> @@ -244,6 +251,9 @@ static struct task_struct *dup_task_stru
> tsk->btrace_seq = 0;
> #endif
> tsk->splice_pipe = NULL;
> +#ifdef CONFIG_CHECKPOINT_RESTART
> + atomic_set(&tsk->may_checkpoint, atomic_read(&orig->may_checkpoint));
> +#endif
> return tsk;
>
> out:
> _
> _______________________________________________
> Containers mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/containers
--
Gregory Kurz [email protected]
Software Engineer @ IBM/Meiosys http://www.ibm.com
Tel +33 (0)534 638 479 Fax +33 (0)561 400 420

"Anarchy is about taking complete responsibility for yourself."
Alan Moore.

2008-10-10 08:39:40

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work

Greg Kurz wrote:
> On Thu, 2008-10-09 at 12:04 -0700, Dave Hansen wrote:
>> Suggested by Ingo.
>>
>> Checkpoint/restart is going to be a long effort to get things working.
>> We're going to have a lot of things that we know just don't work for
>> a long time. That doesn't mean that it will be useless, it just means
>> that there's some complicated features that we are going to have to
>> work incrementally to fix.
>>
>> This patch introduces a new mechanism to help the checkpoint/restart
>> developers. A new function pair: task/process_deny_checkpoint() is
>> created. When called, these tell the kernel that we *know* that the
>> process has performed some activity that will keep it from being
>> properly checkpointed.
>>
>> The 'flag' is an atomic_t for now so that we can have some level
>> of atomicity and make sure to only warn once.
>>
>> For now, this is a one-way trip. Once a process is no longer
>> 'may_checkpoint' capable, neither it nor its children ever will be.
>> This can, of course, be fixed up in the future. We might want to
>> reset the flag when a new pid namespace is created, for instance.
>>
>
> Then this patch should be described as:
>
> Track in-kernel when we expect checkpoint/restart to fail.
>
> By the way, why don't you introduce the reverse operation ?

I think implementing the reverse operation will be a nightmare, IMHO it
is safe to say we deny checkpointing for the process life-cycle either
if the created resource was destroyed before we initiate the checkpoint.

For example, you create a socket, the process becomes uncheckpointable,
you close (via sys_close) the socket, you have to track this close to be
related to the socket which made the process uncheckpointable in order
to make the operation reversible.

Let's imagine you implement this reverse operation anyway, you have a
process which creates a TCP connection, writes data and close the socket
(so you are again checkpointable), but in the namespace there is the
orphan socket which is not checkpointable yet and you missed this case.

2008-10-10 08:43:50

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()

Dave Hansen wrote:
> These are just a few simple examples of things we know we can't
> checkpoint now. There are plenty more, but this should give
> everyone an idea how this will look in practice.
>
>
> Signed-off-by: Dave Hansen <[email protected]>
> ---
>
> linux-2.6.git-dave/fs/aio.c | 7 +++++++
> linux-2.6.git-dave/ipc/mqueue.c | 3 +++
> linux-2.6.git-dave/net/socket.c | 5 +++++
> 3 files changed, 15 insertions(+)
>
> diff -puN fs/aio.c~no-checkpointing-for-sockets fs/aio.c
> --- linux-2.6.git/fs/aio.c~no-checkpointing-for-sockets 2008-10-09 11:56:58.000000000 -0700
> +++ linux-2.6.git-dave/fs/aio.c 2008-10-09 11:56:58.000000000 -0700
> @@ -19,6 +19,7 @@
>
> #define DEBUG 0
>
> +#include <linux/checkpoint.h>
> #include <linux/sched.h>
> #include <linux/fs.h>
> #include <linux/file.h>
> @@ -1663,6 +1664,8 @@ asmlinkage long sys_io_submit(aio_contex
> if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(*iocbpp)))))
> return -EFAULT;
>
> + process_deny_checkpointing(current);
> +
> ctx = lookup_ioctx(ctx_id);
> if (unlikely(!ctx)) {
> pr_debug("EINVAL: io_submit: invalid context id\n");
> @@ -1742,6 +1745,8 @@ asmlinkage long sys_io_cancel(aio_contex
> if (unlikely(!ctx))
> return -EINVAL;
>
> + process_deny_checkpointing(current);
> +
> spin_lock_irq(&ctx->ctx_lock);
> ret = -EAGAIN;
> kiocb = lookup_kiocb(ctx, iocb, key);
> @@ -1796,6 +1801,8 @@ asmlinkage long sys_io_getevents(aio_con
> struct kioctx *ioctx = lookup_ioctx(ctx_id);
> long ret = -EINVAL;
>
> + process_deny_checkpointing(current);
> +
> if (likely(ioctx)) {
> if (likely(min_nr <= nr && min_nr >= 0 && nr >= 0))
> ret = read_events(ioctx, min_nr, nr, events, timeout);
> diff -puN ipc/mqueue.c~no-checkpointing-for-sockets ipc/mqueue.c
> --- linux-2.6.git/ipc/mqueue.c~no-checkpointing-for-sockets 2008-10-09 11:56:58.000000000 -0700
> +++ linux-2.6.git-dave/ipc/mqueue.c 2008-10-09 11:56:58.000000000 -0700
> @@ -14,6 +14,7 @@
> */
>
> #include <linux/capability.h>
> +#include <linux/checkpoint.h>
> #include <linux/init.h>
> #include <linux/pagemap.h>
> #include <linux/file.h>
> @@ -655,6 +656,8 @@ asmlinkage long sys_mq_open(const char _
> char *name;
> int fd, error;
>
> + process_deny_checkpointing(current);
> +
> error = audit_mq_open(oflag, mode, u_attr);
> if (error != 0)
> return error;
> diff -puN net/socket.c~no-checkpointing-for-sockets net/socket.c
> --- linux-2.6.git/net/socket.c~no-checkpointing-for-sockets 2008-10-09 11:56:58.000000000 -0700
> +++ linux-2.6.git-dave/net/socket.c 2008-10-09 11:56:58.000000000 -0700
> @@ -87,6 +87,7 @@
> #include <linux/audit.h>
> #include <linux/wireless.h>
> #include <linux/nsproxy.h>
> +#include <linux/checkpoint.h>
>
> #include <asm/uaccess.h>
> #include <asm/unistd.h>
> @@ -1236,6 +1237,8 @@ asmlinkage long sys_socket(int family, i
> if (SOCK_NONBLOCK != O_NONBLOCK && (flags & SOCK_NONBLOCK))
> flags = (flags & ~SOCK_NONBLOCK) | O_NONBLOCK;
>
> + process_deny_checkpointing(current);
> +
> retval = sock_create(family, type, protocol, &sock);
> if (retval < 0)
> goto out;
> @@ -2130,6 +2133,8 @@ asmlinkage long sys_socketcall(int call,
> a0 = a[0];
> a1 = a[1];
>
> + process_deny_checkpointing(current);
> +
> switch (call) {
> case SYS_SOCKET:
> err = sys_socket(a0, a1, a[2]);

That seems to be a good idea.

There isn't a risk of a big propagation of this function all around the
kernel code ? Especially if there are partial support for a specific
resource ? I mean we are able to checkpoint ipv4/tcp sockets, (yeah I
like socket examples :) ) , but not other protocols so we have to remove
from sys_socket the process_deny_checkpoint and add it to all others
protocols.



2008-10-10 08:48:46

by Greg Kurz

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work

On Fri, 2008-10-10 at 10:37 +0200, Daniel Lezcano wrote:
> I think implementing the reverse operation will be a nightmare, IMHO it
> is safe to say we deny checkpointing for the process life-cycle either
> if the created resource was destroyed before we initiate the checkpoint.
>
> For example, you create a socket, the process becomes uncheckpointable,
> you close (via sys_close) the socket, you have to track this close to be
> related to the socket which made the process uncheckpointable in order
> to make the operation reversible.
>
> Let's imagine you implement this reverse operation anyway, you have a
> process which creates a TCP connection, writes data and close the socket
> (so you are again checkpointable), but in the namespace there is the
> orphan socket which is not checkpointable yet and you missed this case.

That's exactly what I wanted to read... Tracking only is inherently
flawed. The valid way IMHO implies checks at checkpoint time.

--
Gregory Kurz [email protected]
Software Engineer @ IBM/Meiosys http://www.ibm.com
Tel +33 (0)534 638 479 Fax +33 (0)561 400 420

"Anarchy is about taking complete responsibility for yourself."
Alan Moore.

2008-10-10 08:50:17

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()


* Dave Hansen <[email protected]> wrote:

> On Thu, 2008-10-09 at 14:43 -0500, Serge E. Hallyn wrote:
> > Hmm, I don't know too much about aio, but is it possible to succeed with
> > io_getevents if we didn't first do a submit? It looks like the contexts
> > are looked up out of current->mm, so I don't think we need this call
> > here.
> >
> > Otherwise, this is neat.
>
> Good question. I know nothing, either. :)
>
> My thought was that any process *trying* to do aio stuff of any kind
> is going to be really confused if it gets checkpointed. Or, it might
> try to submit an aio right after it checks the list of them. I
> thought it best to be cautious and say, if you screw with aio, no
> checkpointing for you!

as long as there's total transparency and the transition from CR-capable
to CR-disabled state is absolutely safe and race-free, that should be
fine.

I expect users to quickly cause enough pressure to reduce the NOCR areas
of the kernel significantly ;-)

In the long run, could we expect a (experimental) version of hibernation
that would just use this checkpointing facility to hibernate? That would
be way cool for users and for testing: we could do transparent kernel
upgrades/downgrades via this form of hibernation, between CR-compatible
kernels (!).

distros could mark certain kernels as 'safe fallback' kernels, and if
say a WARN_ON() or app breakage hits that is suspected to be kernel
related, the user could hit a 'switch back to safe kernel' button -
which would switch back _without losing any app state_.

People could even try new versions of the kernel which would just fall
back to the known-workin safe kernel if the bootup fails for example.

Pie in the sky for sure, but way cool: it could propel Linux kernel
testing to completely new areas - new kernels could be tried
non-intrusively. (as long as a new kernel does not corrupt the CR data
structures - so some good consistency and redundancy checking would be
nice in the format!)

Ingo

2008-10-10 10:12:52

by Oren Laadan

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work



Daniel Lezcano wrote:
> Greg Kurz wrote:
>> On Thu, 2008-10-09 at 12:04 -0700, Dave Hansen wrote:
>>> Suggested by Ingo.
>>>
>>> Checkpoint/restart is going to be a long effort to get things working.
>>> We're going to have a lot of things that we know just don't work for
>>> a long time. That doesn't mean that it will be useless, it just means
>>> that there's some complicated features that we are going to have to
>>> work incrementally to fix.
>>>
>>> This patch introduces a new mechanism to help the checkpoint/restart
>>> developers. A new function pair: task/process_deny_checkpoint() is
>>> created. When called, these tell the kernel that we *know* that the
>>> process has performed some activity that will keep it from being
>>> properly checkpointed.
>>>
>>> The 'flag' is an atomic_t for now so that we can have some level
>>> of atomicity and make sure to only warn once.
>>>
>>> For now, this is a one-way trip. Once a process is no longer
>>> 'may_checkpoint' capable, neither it nor its children ever will be.
>>> This can, of course, be fixed up in the future. We might want to
>>> reset the flag when a new pid namespace is created, for instance.
>>>
>> Then this patch should be described as:
>>
>> Track in-kernel when we expect checkpoint/restart to fail.
>>
>> By the way, why don't you introduce the reverse operation ?
>
> I think implementing the reverse operation will be a nightmare, IMHO it
> is safe to say we deny checkpointing for the process life-cycle either
> if the created resource was destroyed before we initiate the checkpoint.
>
> For example, you create a socket, the process becomes uncheckpointable,
> you close (via sys_close) the socket, you have to track this close to be
> related to the socket which made the process uncheckpointable in order
> to make the operation reversible.

I agree that it makes sense to only track transitions in one direction.
Therefore at any given point in time all we'll know is that the process
"may be non-checkpointable", instead of the clear-cut "uncheckpointable"
(webster anyone ?).

The distinction is important, because it may be that the process is,
after all, checkpointable, so users/developers could still try to
perform a checkpoint, should they wish too. The only thing is that
it is not guaranteed to succeed.

In fact, one way to transition back to the "checkpointable" state is
by doing a dry-checkpoint, where no data is saved (/dev/null ?). No
side effects will occur except for a short downtime due to the freeze
period. If the dry-checkpoint completes successfully - we can reset
the non-/un-/not-/a-/dis-checkpointable flag.

>
> Let's imagine you implement this reverse operation anyway, you have a
> process which creates a TCP connection, writes data and close the socket
> (so you are again checkpointable), but in the namespace there is the
> orphan socket which is not checkpointable yet and you missed this case.
> _______________________________________________
> Containers mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/containers

Oren.

2008-10-10 10:17:40

by Cédric Le Goater

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()

> diff -puN ipc/mqueue.c~no-checkpointing-for-sockets ipc/mqueue.c
> --- linux-2.6.git/ipc/mqueue.c~no-checkpointing-for-sockets 2008-10-09 11:56:58.000000000 -0700
> +++ linux-2.6.git-dave/ipc/mqueue.c 2008-10-09 11:56:58.000000000 -0700
> @@ -14,6 +14,7 @@
> */
>
> #include <linux/capability.h>
> +#include <linux/checkpoint.h>
> #include <linux/init.h>
> #include <linux/pagemap.h>
> #include <linux/file.h>
> @@ -655,6 +656,8 @@ asmlinkage long sys_mq_open(const char _
> char *name;
> int fd, error;
>
> + process_deny_checkpointing(current);
> +

mqueue being a file system, i would put the checks in the inode_operations.

Also, you can't always deny ! I would expect some allow in sys_mq_unlink().

C.

2008-10-10 10:28:21

by Cédric Le Goater

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()

Dave Hansen wrote:
> On Thu, 2008-10-09 at 14:43 -0500, Serge E. Hallyn wrote:
>> Hmm, I don't know too much about aio, but is it possible to succeed with
>> io_getevents if we didn't first do a submit? It looks like the contexts
>> are looked up out of current->mm, so I don't think we need this call
>> here.
>>
>> Otherwise, this is neat.
>
> Good question. I know nothing, either. :)
>
> My thought was that any process *trying* to do aio stuff of any kind is
> going to be really confused if it gets checkpointed. Or, it might try
> to submit an aio right after it checks the list of them. I thought it
> best to be cautious and say, if you screw with aio, no checkpointing for
> you!

IMO, the right path would be to deny checkpoint if you have aio requests
pending but if you have not, aio's are not an issue for checkpoint anymore.

but that's one easy one to track ...

C.

2008-10-10 13:13:19

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()

On Friday, 10 of October 2008, Ingo Molnar wrote:
>
> * Dave Hansen <[email protected]> wrote:
>
> > On Thu, 2008-10-09 at 14:43 -0500, Serge E. Hallyn wrote:
> > > Hmm, I don't know too much about aio, but is it possible to succeed with
> > > io_getevents if we didn't first do a submit? It looks like the contexts
> > > are looked up out of current->mm, so I don't think we need this call
> > > here.
> > >
> > > Otherwise, this is neat.
> >
> > Good question. I know nothing, either. :)
> >
> > My thought was that any process *trying* to do aio stuff of any kind
> > is going to be really confused if it gets checkpointed. Or, it might
> > try to submit an aio right after it checks the list of them. I
> > thought it best to be cautious and say, if you screw with aio, no
> > checkpointing for you!
>
> as long as there's total transparency and the transition from CR-capable
> to CR-disabled state is absolutely safe and race-free, that should be
> fine.
>
> I expect users to quickly cause enough pressure to reduce the NOCR areas
> of the kernel significantly ;-)
>
> In the long run, could we expect a (experimental) version of hibernation
> that would just use this checkpointing facility to hibernate?

Surely not ACPI-compliant.

Apart from this I don't see why not, but OTOH I'm not particularly interested
in implementing that.

Thanks,
Rafael

2008-10-10 14:04:47

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()

Quoting Cedric Le Goater ([email protected]):
> > diff -puN ipc/mqueue.c~no-checkpointing-for-sockets ipc/mqueue.c
> > --- linux-2.6.git/ipc/mqueue.c~no-checkpointing-for-sockets 2008-10-09 11:56:58.000000000 -0700
> > +++ linux-2.6.git-dave/ipc/mqueue.c 2008-10-09 11:56:58.000000000 -0700
> > @@ -14,6 +14,7 @@
> > */
> >
> > #include <linux/capability.h>
> > +#include <linux/checkpoint.h>
> > #include <linux/init.h>
> > #include <linux/pagemap.h>
> > #include <linux/file.h>
> > @@ -655,6 +656,8 @@ asmlinkage long sys_mq_open(const char _
> > char *name;
> > int fd, error;
> >
> > + process_deny_checkpointing(current);
> > +
>
> mqueue being a file system, i would put the checks in the inode_operations.
>
> Also, you can't always deny ! I would expect some allow in sys_mq_unlink().

Remember a part of Ingo's motivation is to push c/r developers to
address the lacking features that users use most, earlier. So the
warnings and subsequent email complaints are what we're after. Hence a
single 'checkpointable or not' flag.

Given the single flag, how do you know at sys_mq_unlink() whether the
process also has an opensocket?

Rather than make this tracking facility more complicated and intrusive,
if people complain that they couldn't checkpoint bc of a warning about
aio, then we implement aio c/r! We don't just try and reduce the amount
of time that you can't checkpoint bc of lack of aio c/r support :)

-serge

2008-10-10 14:55:16

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()


* Rafael J. Wysocki <[email protected]> wrote:

> > In the long run, could we expect a (experimental) version of
> > hibernation that would just use this checkpointing facility to
> > hibernate?
>
> Surely not ACPI-compliant.

what do you mean?

Ingo

2008-10-10 15:03:54

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work


* Daniel Lezcano <[email protected]> wrote:

>> By the way, why don't you introduce the reverse operation ?
>
> I think implementing the reverse operation will be a nightmare, IMHO
> it is safe to say we deny checkpointing for the process life-cycle
> either if the created resource was destroyed before we initiate the
> checkpoint.

it's also a not too interesting case. The end goal is to just be able to
checkpoint everything that matters - in the long run there simply wont
be many places that are marked 'cannot checkpoint'.

So the ability to deny a checkpoint is a transitional feature - a
flexible CR todo list in essence - but also needed for
applications/users that want to rely on CR being a dependable facility.

It would be bad for most of the practical usecases of checkpointing to
allow the checkpointing of an app, just to see it break on restore due
to lost context.

Ingo

2008-10-10 15:23:40

by Oren Laadan

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work



Ingo Molnar wrote:
> * Daniel Lezcano <[email protected]> wrote:
>
>>> By the way, why don't you introduce the reverse operation ?
>> I think implementing the reverse operation will be a nightmare, IMHO
>> it is safe to say we deny checkpointing for the process life-cycle
>> either if the created resource was destroyed before we initiate the
>> checkpoint.
>
> it's also a not too interesting case. The end goal is to just be able to
> checkpoint everything that matters - in the long run there simply wont
> be many places that are marked 'cannot checkpoint'.
>
> So the ability to deny a checkpoint is a transitional feature - a
> flexible CR todo list in essence - but also needed for
> applications/users that want to rely on CR being a dependable facility.
>
> It would be bad for most of the practical usecases of checkpointing to
> allow the checkpointing of an app, just to see it break on restore due
> to lost context.

Actually it need not wait for restore to fail - it can fail during the
checkpoint, as soon as the unsupported feature is encountered.

Adding that flag of what you suggest will help make it more vocal and
obvious that a feature isn't supported, even without the user actually
trying to take a checkpoint. I like that I idea.

Oren.

>
> Ingo
> _______________________________________________
> Containers mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/containers

2008-10-10 15:31:18

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work


* Oren Laadan <[email protected]> wrote:

>
>
> Ingo Molnar wrote:
> > * Daniel Lezcano <[email protected]> wrote:
> >
> >>> By the way, why don't you introduce the reverse operation ?
> >> I think implementing the reverse operation will be a nightmare, IMHO
> >> it is safe to say we deny checkpointing for the process life-cycle
> >> either if the created resource was destroyed before we initiate the
> >> checkpoint.
> >
> > it's also a not too interesting case. The end goal is to just be able to
> > checkpoint everything that matters - in the long run there simply wont
> > be many places that are marked 'cannot checkpoint'.
> >
> > So the ability to deny a checkpoint is a transitional feature - a
> > flexible CR todo list in essence - but also needed for
> > applications/users that want to rely on CR being a dependable facility.
> >
> > It would be bad for most of the practical usecases of checkpointing to
> > allow the checkpointing of an app, just to see it break on restore due
> > to lost context.
>
> Actually it need not wait for restore to fail - it can fail during the
> checkpoint, as soon as the unsupported feature is encountered.

correct, that is the idea.

Ingo

2008-10-10 16:35:04

by Greg Kurz

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work

On Fri, 2008-10-10 at 11:17 -0400, Oren Laadan wrote:
>
> Ingo Molnar wrote:
> > * Daniel Lezcano <[email protected]> wrote:
> >
> >>> By the way, why don't you introduce the reverse operation ?
> >> I think implementing the reverse operation will be a nightmare, IMHO
> >> it is safe to say we deny checkpointing for the process life-cycle
> >> either if the created resource was destroyed before we initiate the
> >> checkpoint.
> >
> > it's also a not too interesting case. The end goal is to just be able to
> > checkpoint everything that matters - in the long run there simply wont
> > be many places that are marked 'cannot checkpoint'.
> >
> > So the ability to deny a checkpoint is a transitional feature - a
> > flexible CR todo list in essence - but also needed for
> > applications/users that want to rely on CR being a dependable facility.
> >
> > It would be bad for most of the practical usecases of checkpointing to
> > allow the checkpointing of an app, just to see it break on restore due
> > to lost context.
>
> Actually it need not wait for restore to fail - it can fail during the
> checkpoint, as soon as the unsupported feature is encountered.
>

Of course, bad things must be spotted at checkpoint time ! :)

> Adding that flag of what you suggest will help make it more vocal and
> obvious that a feature isn't supported, even without the user actually
> trying to take a checkpoint. I like that I idea.
>

This flag is weak... testing it gives absolutly no hint whether the
checkpoint may succeed or not. As it is designed now, a user can only be
aware that checkpoint is *forever* denied. I agree that it's only useful
as a "flexible CR todo list".

In the long run, if there are still things that can prevent checkpoint
from being consistent, they will have to be checked at checkpoint time.

Greg.

2008-10-10 16:37:22

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work

On Fri, 2008-10-10 at 10:37 +0200, Daniel Lezcano wrote:
> For example, you create a socket, the process becomes uncheckpointable,
> you close (via sys_close) the socket, you have to track this close to be
> related to the socket which made the process uncheckpointable in order
> to make the operation reversible.

Challenging, yes. Not quite a nightmare, though.

It's basically the same problem we have with r/o bind mounts individual
users in the kernel need to check some global state. There are
temporary (like the duration of the syscall) and more chronic (like your
sockets) users. The hard part is making sure that everything gets
properly balanced with the chronic ones. But, we did manage to
accomplish that with the r/o bind mounts. I haven't seen a bug in
*hours*! ;)

In any case, we can worry about that later.

-- Dave

2008-10-10 16:37:40

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work

On Fri, 2008-10-10 at 18:34 +0200, Greg Kurz wrote:
> This flag is weak... testing it gives absolutly no hint whether the
> checkpoint may succeed or not. As it is designed now, a user can only be
> aware that checkpoint is *forever* denied. I agree that it's only useful
> as a "flexible CR todo list".

Cool, so everyone agrees the patch is useful!

-- Dave

2008-10-10 16:45:45

by Greg Kurz

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()

On Fri, 2008-10-10 at 09:04 -0500, Serge E. Hallyn wrote:
> Remember a part of Ingo's motivation is to push c/r developers to
> address the lacking features that users use most, earlier. So the
> warnings and subsequent email complaints are what we're after. Hence a
> single 'checkpointable or not' flag.
>
> Given the single flag, how do you know at sys_mq_unlink() whether the
> process also has an opensocket?
>
> Rather than make this tracking facility more complicated and intrusive,
> if people complain that they couldn't checkpoint bc of a warning about
> aio, then we implement aio c/r! We don't just try and reduce the amount
> of time that you can't checkpoint bc of lack of aio c/r support :)
>
> -serge

Serge,

It's exactly what I meant before, the tracking facility would be awfully
complicated. It cannot be done that way.
But there's also something awkward with the flag thing : can you provide
right now an exhaustive list of all the places where you must raise it ?

I'd rather do some heavy checking at checkpoint time.

Greg.

2008-10-10 17:14:15

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()

Quoting Greg Kurz ([email protected]):
> On Fri, 2008-10-10 at 09:04 -0500, Serge E. Hallyn wrote:
> > Remember a part of Ingo's motivation is to push c/r developers to
> > address the lacking features that users use most, earlier. So the
> > warnings and subsequent email complaints are what we're after. Hence a
> > single 'checkpointable or not' flag.
> >
> > Given the single flag, how do you know at sys_mq_unlink() whether the
> > process also has an opensocket?
> >
> > Rather than make this tracking facility more complicated and intrusive,
> > if people complain that they couldn't checkpoint bc of a warning about
> > aio, then we implement aio c/r! We don't just try and reduce the amount
> > of time that you can't checkpoint bc of lack of aio c/r support :)
> >
> > -serge
>
> Serge,
>
> It's exactly what I meant before, the tracking facility would be awfully
> complicated. It cannot be done that way.
> But there's also something awkward with the flag thing : can you provide
> right now an exhaustive list of all the places where you must raise it ?
>
> I'd rather do some heavy checking at checkpoint time.

Noone is saying that we are not going to do that.

-serge

2008-10-10 17:19:20

by Chris Friesen

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work

Greg Kurz wrote:

> This flag is weak... testing it gives absolutly no hint whether the
> checkpoint may succeed or not. As it is designed now, a user can only be
> aware that checkpoint is *forever* denied. I agree that it's only useful
> as a "flexible CR todo list".

I don't think it's true that it gives "absolutly no hint".

If the flag is not set, then checkpoint will succeed, right? Whereas if
the flag is set, then it's an indication that checkpoint could fail (but
may still succeed if whatever condition caused the flag to be set is no
longer true).

Chris

2008-10-10 17:28:42

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()

On Fri, 2008-10-10 at 18:45 +0200, Greg Kurz wrote:
> It's exactly what I meant before, the tracking facility would be awfully
> complicated. It cannot be done that way.
> But there's also something awkward with the flag thing : can you provide
> right now an exhaustive list of all the places where you must raise it ?

Greg, that's just pure FUD. We don't say that spinlocks are a bad thing
because we can't come up with an exhaustive list of places where we need
locking.

We'll do plenty of checks at checkpoint time.

We'll do plenty of checks at runtime.

Neither will work completely on its own, and neither will be exhaustive.
The way this will work is just as Serge said: in true Linux style, we'll
add more places users of process_deny_checkpoint() incrementally as we
find them and as people complain. We'll also be incrementally removing
them as we add functionality.

-- Dave

2008-10-10 19:49:39

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()

On Friday, 10 of October 2008, Ingo Molnar wrote:
>
> * Rafael J. Wysocki <[email protected]> wrote:
>
> > > In the long run, could we expect a (experimental) version of
> > > hibernation that would just use this checkpointing facility to
> > > hibernate?
> >
> > Surely not ACPI-compliant.
>
> what do you mean?

The ACPI spec says quite specifically what should be done while entering
hibernation and during resume from hibernation. We're not following that
in the current code, but we can (gradually) update the code to become
ACPI-compilant in that respect. However, if we go the checkpointing route, I
don't think that will be possible any more.

[In short, the problem is that ACPI regards the S4 state corresponding to
hibernation as a sleep state of the system which is therefore fundamentally
different from the soft power-off state and requires special handling.]

This may be a theory etc. (I don't want to start the entire discussion about
that once again), but clearly there's a choice to be made here. I'd prefer
hibernation to be ACPI-compliant, but if people don't want that, I won't fight
for it.

Thanks,
Rafael

2008-10-10 19:54:25

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()


* Rafael J. Wysocki <[email protected]> wrote:

> > > Surely not ACPI-compliant.
> >
> > what do you mean?
>
> The ACPI spec says quite specifically what should be done while
> entering hibernation and during resume from hibernation. We're not
> following that in the current code, but we can (gradually) update the
> code to become ACPI-compilant in that respect. However, if we go the
> checkpointing route, I don't think that will be possible any more.

ah, i see. I did not mean to utilize any ACPI paths but simple powerdown
or reboot.

If we checkpoint all apps to persistent disk areas (which the checkpoint
patches in this thread are about), then we can just reboot the kernel
and forget all its state.

That capability can be used to build a really robust hibernation
implementation IMO: we could "hibernate/kexec" over between different
kernel versions transparently. (only a small delay will be noticed by
the user - if we do it smartly with in-kernel modesetting then not even
the screen contents will be changed over this.)

Ingo

2008-10-10 20:57:57

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work

Dave Hansen wrote:
> On Fri, 2008-10-10 at 18:34 +0200, Greg Kurz wrote:
>> This flag is weak... testing it gives absolutly no hint whether the
>> checkpoint may succeed or not. As it is designed now, a user can only be
>> aware that checkpoint is *forever* denied. I agree that it's only useful
>> as a "flexible CR todo list".
>
> Cool, so everyone agrees the patch is useful!

Yes, definitively. It is a good idea.

2008-10-10 21:13:29

by Len Brown

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()



On Fri, 10 Oct 2008, Ingo Molnar wrote:

>
> * Rafael J. Wysocki <[email protected]> wrote:
>
> > > > Surely not ACPI-compliant.
> > >
> > > what do you mean?
> >
> > The ACPI spec says quite specifically what should be done while
> > entering hibernation and during resume from hibernation. We're not
> > following that in the current code, but we can (gradually) update the
> > code to become ACPI-compilant in that respect. However, if we go the
> > checkpointing route, I don't think that will be possible any more.
>
> ah, i see. I did not mean to utilize any ACPI paths but simple powerdown
> or reboot.

If we don't enter ACPI S4, and instead poweroff,
then we'll lose the capability to wake the system from
devices that are capable of waking S4, but incapable of waking S5.

ie. The power button will still work, but others may not.

cheers,
-Len

> If we checkpoint all apps to persistent disk areas (which the checkpoint
> patches in this thread are about), then we can just reboot the kernel
> and forget all its state.
>
> That capability can be used to build a really robust hibernation
> implementation IMO: we could "hibernate/kexec" over between different
> kernel versions transparently. (only a small delay will be noticed by
> the user - if we do it smartly with in-kernel modesetting then not even
> the screen contents will be changed over this.)

2008-10-10 22:53:15

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()

On Friday, 10 of October 2008, Ingo Molnar wrote:
>
> * Rafael J. Wysocki <[email protected]> wrote:
>
> > > > Surely not ACPI-compliant.
> > >
> > > what do you mean?
> >
> > The ACPI spec says quite specifically what should be done while
> > entering hibernation and during resume from hibernation. We're not
> > following that in the current code, but we can (gradually) update the
> > code to become ACPI-compilant in that respect. However, if we go the
> > checkpointing route, I don't think that will be possible any more.
>
> ah, i see. I did not mean to utilize any ACPI paths but simple powerdown
> or reboot.
>
> If we checkpoint all apps to persistent disk areas (which the checkpoint
> patches in this thread are about), then we can just reboot the kernel
> and forget all its state.
>
> That capability can be used to build a really robust hibernation
> implementation IMO: we could "hibernate/kexec" over between different
> kernel versions transparently. (only a small delay will be noticed by
> the user - if we do it smartly with in-kernel modesetting then not even
> the screen contents will be changed over this.)

That actually should be called a migration of VM IMO and would be a useful
functionality. Sure.

Hibernation, however, generally involves the restoration of the hardware and
most importantly _platform_ state which IMO is impossible without the ACPI
functionality, as well as wake-up, which may depend on ACPI too.

Thanks,
Rafael

2008-10-11 14:48:47

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()

Hi!

> > > Hmm, I don't know too much about aio, but is it possible to succeed with
> > > io_getevents if we didn't first do a submit? It looks like the contexts
> > > are looked up out of current->mm, so I don't think we need this call
> > > here.
> > >
> > > Otherwise, this is neat.
> >
> > Good question. I know nothing, either. :)
> >
> > My thought was that any process *trying* to do aio stuff of any kind
> > is going to be really confused if it gets checkpointed. Or, it might
> > try to submit an aio right after it checks the list of them. I
> > thought it best to be cautious and say, if you screw with aio, no
> > checkpointing for you!
>
> as long as there's total transparency and the transition from CR-capable
> to CR-disabled state is absolutely safe and race-free, that should be
> fine.
>
> I expect users to quickly cause enough pressure to reduce the NOCR areas
> of the kernel significantly ;-)
>
> In the long run, could we expect a (experimental) version of hibernation
> that would just use this checkpointing facility to hibernate? That would
> be way cool for users and for testing: we could do transparent kernel
> upgrades/downgrades via this form of hibernation, between CR-compatible
> kernels (!).

Well, if we could do that, I guess we could also use CR to 'hibernate'
your desktop then continue on your notebook. And yes that sounds cool.

> Pie in the sky for sure, but way cool: it could propel Linux kernel
> testing to completely new areas - new kernels could be tried
> non-intrusively. (as long as a new kernel does not corrupt the CR data
> structures - so some good consistency and redundancy checking would be
> nice in the format!)

Well, for simple apps, it should not be that hard...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-10-11 15:01:11

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()


* Pavel Machek <[email protected]> wrote:

> > In the long run, could we expect a (experimental) version of
> > hibernation that would just use this checkpointing facility to
> > hibernate? That would be way cool for users and for testing: we
> > could do transparent kernel upgrades/downgrades via this form of
> > hibernation, between CR-compatible kernels (!).
>
> Well, if we could do that, I guess we could also use CR to 'hibernate'
> your desktop then continue on your notebook. And yes that sounds cool.

yes.

> > Pie in the sky for sure, but way cool: it could propel Linux kernel
> > testing to completely new areas - new kernels could be tried
> > non-intrusively. (as long as a new kernel does not corrupt the CR
> > data structures - so some good consistency and redundancy checking
> > would be nice in the format!)
>
> Well, for simple apps, it should not be that hard...

Generally, if something works for simple apps already (in a robust,
compatible and supportable way) and users find it "very cool", then
support for more complex apps is not far in the future.

but if you want to support more complex apps straight away, it takes
forever and gets ugly.

Ingo

2008-10-13 08:19:33

by Greg Kurz

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work

On Fri, 2008-10-10 at 11:18 -0600, Chris Friesen wrote:
> Greg Kurz wrote:
>
> > This flag is weak... testing it gives absolutly no hint whether the
> > checkpoint may succeed or not. As it is designed now, a user can only be
> > aware that checkpoint is *forever* denied. I agree that it's only useful
> > as a "flexible CR todo list".
>
> I don't think it's true that it gives "absolutly no hint".
>
> If the flag is not set, then checkpoint will succeed, right? Whereas if

Wrong. Unless you test_and_checkpoint atomically, the flag doesn't help.

> the flag is set, then it's an indication that checkpoint could fail (but
> may still succeed if whatever condition caused the flag to be set is no
> longer true).
>
> Chris
>
--
Gregory Kurz [email protected]
Software Engineer @ IBM/Meiosys http://www.ibm.com
Tel +33 (0)534 638 479 Fax +33 (0)561 400 420

"Anarchy is about taking complete responsibility for yourself."
Alan Moore.

2008-10-13 08:21:31

by Greg Kurz

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()

On Fri, 2008-10-10 at 10:28 -0700, Dave Hansen wrote:
> On Fri, 2008-10-10 at 18:45 +0200, Greg Kurz wrote:
> > It's exactly what I meant before, the tracking facility would be awfully
> > complicated. It cannot be done that way.
> > But there's also something awkward with the flag thing : can you provide
> > right now an exhaustive list of all the places where you must raise it ?
>
> Greg, that's just pure FUD. We don't say that spinlocks are a bad thing
> because we can't come up with an exhaustive list of places where we need
> locking.
>
> We'll do plenty of checks at checkpoint time.
>
> We'll do plenty of checks at runtime.
>
> Neither will work completely on its own, and neither will be exhaustive.
> The way this will work is just as Serge said: in true Linux style, we'll
> add more places users of process_deny_checkpoint() incrementally as we
> find them and as people complain. We'll also be incrementally removing
> them as we add functionality.
>
> -- Dave
>

Well then I misunderstood the purpose your initial postings. Sorry. :)

--
Gregory Kurz [email protected]
Software Engineer @ IBM/Meiosys http://www.ibm.com
Tel +33 (0)534 638 479 Fax +33 (0)561 400 420

"Anarchy is about taking complete responsibility for yourself."
Alan Moore.

2008-10-13 16:46:18

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work

Quoting Greg Kurz ([email protected]):
> On Fri, 2008-10-10 at 11:18 -0600, Chris Friesen wrote:
> > Greg Kurz wrote:
> >
> > > This flag is weak... testing it gives absolutly no hint whether the
> > > checkpoint may succeed or not. As it is designed now, a user can only be
> > > aware that checkpoint is *forever* denied. I agree that it's only useful
> > > as a "flexible CR todo list".
> >
> > I don't think it's true that it gives "absolutly no hint".
> >
> > If the flag is not set, then checkpoint will succeed, right? Whereas if
>
> Wrong. Unless you test_and_checkpoint atomically, the flag doesn't help.

Atomically wrt what? Presumably you test and checkpoint while the
container is frozen...

> > the flag is set, then it's an indication that checkpoint could fail (but
> > may still succeed if whatever condition caused the flag to be set is no
> > longer true).
> >
> > Chris
> >
> --
> Gregory Kurz [email protected]
> Software Engineer @ IBM/Meiosys http://www.ibm.com
> Tel +33 (0)534 638 479 Fax +33 (0)561 400 420
>
> "Anarchy is about taking complete responsibility for yourself."
> Alan Moore.
>
> _______________________________________________
> Containers mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/containers