2010-04-29 00:22:31

by Christoffer Dall

Subject: [C/R ARM v2][PATCH 0/3] Linux Checkpoint-Restart - ARM port

This series contains two preparatory patches for an ARM port of the
checkpoint-restart code, followed by a third patch implementing the
architecture-specific parts of c/r.

The preparatory patches consist of a partial syscall trace implementation
for ARM and an eclone implementation for ARM. The syscall trace
implementation provides only the needed functionality for c/r.

There is a separate patch for the user space code, which supports
cross-compilation, extracting headers for ARM and an eclone implementation
for ARM.

The kernel patches presented here are based on the ckpt-v21-rc6 patch set.

---

CHANGELOG:

[2010-Apr-08] v2:
- Systrace implementation now inspects process state to get the
system call number thereby avoiding extra work on system calls.
- Removed __user attribute on long type in eclone implementation
- Better check for architecture versions across C/R
- Improved checking of user space ABI settings across C/R
- Code simplifications

[2010-Mar-22] v1:
- Initial version
- Systrace implementation modified the system call entry path to
store the system call number globally in memory.
- ARM implementation lightly tested


2010-04-29 00:08:53

by Roland McGrath

Subject: Re: [C/R ARM v2][PATCH 1/3] ARM: Rudimentary syscall interfaces

> + * syscalls.h - Linux syscall interfaces for ARM

s/syscalls/syscall/

> +static inline int get_swi_instruction(struct task_struct *task,
> + struct pt_regs *regs,
> + unsigned long *instr)
> +{

Why doesn't this just use access_process_vm?

> +/*
> + * This function essentially duplicates the logic from vector_swi in
> + * arch/arm/kernel/entry-common.S. However, that code is in the
> + * critical path for system calls and is hard to factor out without
> + * compromising performance.
> + */

No clue about the ARM details, not reviewing that. I think this is too big
to be an inline and should be in some arch/arm/kernel/*.c place instead.
Of course, if (config_aeabi && !config_oabi) is true at compile time, it's
not large at all. So perhaps just move the complex cases to a function
and leave the "Pure EABI" fork in the inline.


Thanks,
Roland

2010-04-29 00:25:44

by Christoffer Dall

Subject: [C/R ARM v2][PATCH 1/3] ARM: Rudimentary syscall interfaces

Introduces a few of the system call inspection functions for ARM. The
current motivation is checkpoint restart, but the general interface
requirements are met, making it possible for a debugger or tracer to
obtain information about the system call status of another process.

The patch is in part based on the following proposal from Roland McGrath:
https://patchwork.kernel.org/patch/32101/

Compared to other architectures, the code to implement syscall_get_nr is
somewhat extensive. This is because the system call number is not stored
in any globally accessible location, and because the ARM ABI exists in
multiple versions.
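
As a rough illustration of why several ABI cases are involved, here is a
minimal user-space sketch of the decision for ARM (non-Thumb) mode only.
It mirrors the branches of the patch but is not the kernel code: the
sketch_ names, the struct and the constant values are assumptions made
for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants; assumed to match asm/unistd.h on ARM. */
#define SKETCH_OABI_SYSCALL_BASE 0x900000u
#define SKETCH_SWI_IMM_MASK      0x00ffffffu

/* Hypothetical stand-in for the kernel configuration options. */
struct sketch_abi_config {
	bool aeabi;       /* kernel built with CONFIG_AEABI */
	bool oabi_compat; /* kernel built with CONFIG_OABI_COMPAT */
};

/*
 * Derive the system call number from the SWI instruction word and r7.
 * Only ARM mode is modelled; Thumb handling is left out of this sketch.
 */
static long sketch_decode_scno(struct sketch_abi_config cfg,
			       uint32_t swi_instr, uint32_t r7)
{
	uint32_t imm = swi_instr & SKETCH_SWI_IMM_MASK;

	if (cfg.aeabi && !cfg.oabi_compat)
		return (long)r7;	/* pure EABI: number is in r7 */

	if (cfg.oabi_compat) {
		if (imm == 0)
			return (long)r7;	/* EABI call: "swi 0" */
		return (long)(imm | SKETCH_OABI_SYSCALL_BASE);
	}

	return (long)imm;	/* legacy ABI: immediate holds the number */
}
```

With both ABIs enabled, an instruction word of 0xef000000 yields the
value of r7, while 0xef900004 yields 0x900004.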

Changelog[v2]:
- Get the system call number by inspecting the process instead of
storing the system call number globally on entry to each system
call.

Cc: Roland McGrath <[email protected]>
Signed-off-by: Christoffer Dall <[email protected]>
Acked-by: Oren Laadan <[email protected]>
---
arch/arm/include/asm/syscall.h | 133 ++++++++++++++++++++++++++++++++++++++++
1 files changed, 133 insertions(+), 0 deletions(-)
create mode 100644 arch/arm/include/asm/syscall.h

diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
new file mode 100644
index 0000000..49cb10e
--- /dev/null
+++ b/arch/arm/include/asm/syscall.h
@@ -0,0 +1,133 @@
+/*
+ * syscalls.h - Linux syscall interfaces for ARM
+ *
+ * Copyright (c) 2010 Christoffer Dall
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#ifndef _ASM_ARM_SYSCALLS_H
+#define _ASM_ARM_SYSCALLS_H
+
+#include <linux/highmem.h>
+#include <linux/pagemap.h>
+#include <linux/memory.h>
+#include <asm/unistd.h>
+
+static inline int get_swi_instruction(struct task_struct *task,
+ struct pt_regs *regs,
+ unsigned long *instr)
+{
+ struct page *page = NULL;
+ unsigned long instr_addr;
+ unsigned long *ptr;
+ int ret;
+
+ instr_addr = regs->ARM_pc - 4;
+
+ down_read(&task->mm->mmap_sem);
+ ret = get_user_pages(task, task->mm, instr_addr,
+ 1, 0, 0, &page, NULL);
+ up_read(&task->mm->mmap_sem);
+
+ if (ret < 0)
+ return ret;
+
+ ptr = (unsigned long *)kmap_atomic(page, KM_USER1);
+ memcpy(instr,
+ (char *)ptr + (instr_addr & ~PAGE_MASK),
+ sizeof(unsigned long));
+ kunmap_atomic(ptr, KM_USER1);
+
+ page_cache_release(page);
+
+ return 0;
+}
+
+/*
+ * This function essentially duplicates the logic from vector_swi in
+ * arch/arm/kernel/entry-common.S. However, that code is in the
+ * critical path for system calls and is hard to factor out without
+ * compromising performance.
+ */
+static inline int __syscall_get_nr(struct task_struct *task,
+ struct pt_regs *regs)
+{
+ int ret;
+ int scno;
+ unsigned long instr;
+ bool config_oabi = false;
+ bool config_aeabi = false;
+ bool config_arm_thumb = false;
+ bool config_cpu_endian_be8 = false;
+
+#ifdef CONFIG_OABI_COMPAT
+ config_oabi = true;
+#endif
+#ifdef CONFIG_AEABI
+ config_aeabi = true;
+#endif
+#ifdef CONFIG_ARM_THUMB
+ config_arm_thumb = true;
+#endif
+#ifdef CONFIG_CPU_ENDIAN_BE8
+ config_cpu_endian_be8 = true;
+#endif
+#ifdef CONFIG_CPU_ARM710
+ return -1;
+#endif
+
+ if (config_aeabi && !config_oabi) {
+ /* Pure EABI */
+ return regs->ARM_r7;
+ } else if (config_oabi) {
+ if (config_arm_thumb && (regs->ARM_cpsr & PSR_T_BIT))
+ return -1;
+
+ ret = get_swi_instruction(task, regs, &instr);
+ if (ret < 0)
+ return -1;
+
+ if (config_cpu_endian_be8)
+ asm ("rev %[out], %[in]": [out] "=r" (instr):
+ [in] "r" (instr));
+
+ if ((instr & 0x00ffffff) == 0)
+ return regs->ARM_r7; /* EABI call */
+ else
+ return (instr & 0x00ffffff) | __NR_OABI_SYSCALL_BASE;
+ } else {
+ /* Legacy ABI only */
+ if (config_arm_thumb && (regs->ARM_cpsr & PSR_T_BIT)) {
+ /* Thumb mode ABI */
+ scno = regs->ARM_r7 + __NR_SYSCALL_BASE;
+ } else {
+ ret = get_swi_instruction(task, regs, &instr);
+ if (ret < 0)
+ return -1;
+ scno = instr;
+ }
+ return scno & 0x00ffffff;
+ }
+}
+
+static inline int syscall_get_nr(struct task_struct *task,
+ struct pt_regs *regs)
+{
+ return __syscall_get_nr(task, regs);
+}
+
+static inline long syscall_get_return_value(struct task_struct *task,
+ struct pt_regs *regs)
+{
+ return regs->ARM_r0;
+}
+
+static inline long syscall_get_error(struct task_struct *task,
+ struct pt_regs *regs)
+{
+ return regs->ARM_r0;
+}
+
+#endif /* _ASM_ARM_SYSCALLS_H */
--
1.5.6.5
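
When get_swi_instruction() copies the instruction out of the mapped
page, the position inside that page has to be the byte offset of the
address within the page, which is not the same as the page number. A
minimal user-space sketch of the distinction, assuming 4 KiB pages; the
sketch_ names are hypothetical and do not appear in the patch.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)

/* Which page an address falls in. */
static uint32_t sketch_page_number(uint32_t addr)
{
	return addr >> SKETCH_PAGE_SHIFT;
}

/* Byte offset of an address inside its page. */
static uint32_t sketch_page_offset(uint32_t addr)
{
	return addr & (SKETCH_PAGE_SIZE - 1u);
}
```

For the address 0x00008124 the page number is 8 and the byte offset is
0x124; only the latter is a valid index into a single mapped page.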

2010-04-29 00:25:48

by Christoffer Dall

Subject: [C/R ARM v2][PATCH 2/3] ARM: Add the eclone system call

In addition to doing everything that the clone() system call does, the
eclone() system call:

- allows additional clone flags (31 of 32 bits in the flags
parameter to clone() are in use)

- allows user to specify a pid for the child process in its
active and ancestor pid namespaces.

Eclone is needed for restarting a process from a checkpoint. See more
in Documentation/eclone and refer to the original LKML posting:
http://lkml.org/lkml/2009/11/11/361
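
For reference, the wider flags word mentioned above could be assembled
as in the sketch below. This is only an illustration of the idea; the
ARM patch leaves it as a TODO and uses just the low 32 bits. The
function name is hypothetical.

```c
#include <stdint.h>

/*
 * Combine a high and a low 32-bit flags word into one 64-bit value.
 * The cast before the shift matters: shifting a 32-bit value by 32
 * would be undefined behaviour in C.
 */
static uint64_t sketch_combine_clone_flags(uint32_t flags_high,
					   uint32_t flags_low)
{
	return ((uint64_t)flags_high << 32) | (uint64_t)flags_low;
}
```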

The new system call for ARM has number 366.

Changelog[v2]:
- Removed __user attribute on long type

Cc: [email protected]
Cc: libc-ports <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Signed-off-by: Christoffer Dall <[email protected]>
Acked-by: Oren Laadan <[email protected]>
---
arch/arm/include/asm/unistd.h | 1 +
arch/arm/kernel/calls.S | 1 +
arch/arm/kernel/entry-common.S | 6 ++++++
arch/arm/kernel/sys_arm.c | 39 +++++++++++++++++++++++++++++++++++++++
4 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/unistd.h b/arch/arm/include/asm/unistd.h
index dd2bf53..8dcb42a 100644
--- a/arch/arm/include/asm/unistd.h
+++ b/arch/arm/include/asm/unistd.h
@@ -392,6 +392,7 @@
#define __NR_rt_tgsigqueueinfo (__NR_SYSCALL_BASE+363)
#define __NR_perf_event_open (__NR_SYSCALL_BASE+364)
#define __NR_recvmmsg (__NR_SYSCALL_BASE+365)
+#define __NR_eclone (__NR_SYSCALL_BASE+366)

/*
* The following SWIs are ARM private.
diff --git a/arch/arm/kernel/calls.S b/arch/arm/kernel/calls.S
index 37ae301..80047c8 100644
--- a/arch/arm/kernel/calls.S
+++ b/arch/arm/kernel/calls.S
@@ -375,6 +375,7 @@
CALL(sys_rt_tgsigqueueinfo)
CALL(sys_perf_event_open)
/* 365 */ CALL(sys_recvmmsg)
+ CALL(sys_eclone_wrapper)
#ifndef syscalls_counted
.equ syscalls_padding, ((NR_syscalls + 3) & ~3) - NR_syscalls
#define syscalls_counted
diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index 2c1db77..ba365dc 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -380,6 +380,12 @@ sys_clone_wrapper:
b sys_clone
ENDPROC(sys_clone_wrapper)

+sys_eclone_wrapper:
+ add ip, sp, #S_OFF
+ str ip, [sp, #0]
+ b sys_eclone
+ENDPROC(sys_eclone_wrapper)
+
sys_sigreturn_wrapper:
add r0, sp, #S_OFF
b sys_sigreturn
diff --git a/arch/arm/kernel/sys_arm.c b/arch/arm/kernel/sys_arm.c
index c235018..c23f133 100644
--- a/arch/arm/kernel/sys_arm.c
+++ b/arch/arm/kernel/sys_arm.c
@@ -54,6 +54,45 @@ asmlinkage int sys_clone(unsigned long clone_flags, unsigned long newsp,
return do_fork(clone_flags, newsp, regs, 0, parent_tidptr, child_tidptr);
}

+asmlinkage int sys_eclone(unsigned flags_low, struct clone_args __user *uca,
+ int args_size, pid_t __user *pids,
+ struct pt_regs *regs)
+{
+ int rc;
+ struct clone_args kca;
+ unsigned long flags;
+ int __user *parent_tidp;
+ int __user *child_tidp;
+ unsigned long child_stack;
+ unsigned long stack_size;
+
+ rc = fetch_clone_args_from_user(uca, args_size, &kca);
+ if (rc)
+ return rc;
+
+ /*
+ * TODO: Convert 'clone-flags' to 64-bits on all architectures.
+ * TODO: When ->clone_flags_high is non-zero, copy it in to the
+ * higher word(s) of 'flags':
+ *
+ * flags = (kca.clone_flags_high << 32) | flags_low;
+ */
+ flags = flags_low;
+ parent_tidp = (int *)(unsigned long)kca.parent_tid_ptr;
+ child_tidp = (int *)(unsigned long)kca.child_tid_ptr;
+
+ stack_size = (unsigned long)kca.child_stack_size;
+ if (stack_size)
+ return -EINVAL;
+
+ child_stack = (unsigned long)kca.child_stack;
+ if (!child_stack)
+ child_stack = regs->ARM_sp;
+
+ return do_fork_with_pids(flags, child_stack, regs, stack_size,
+ parent_tidp, child_tidp, kca.nr_pids, pids);
+}
+
asmlinkage int sys_vfork(struct pt_regs *regs)
{
return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, regs->ARM_sp, regs, 0, NULL, NULL);
--
1.5.6.5

2010-04-29 00:32:09

by Christoffer Dall

Subject: [C/R ARM v2][PATCH 3/3] c/r: ARM implementation of checkpoint/restart

Implements architecture-specific requirements for checkpoint/restart on
ARM. The changes are almost entirely confined to c/r-related code. Most of
the work is done in arch/arm/kernel/checkpoint.c, which implements
checkpointing of the CPU and the necessary fields of the thread_info struct.

The following restrictions are enforced:
----------------------------------------

The CPU architecture (given by cpu_architecture()) is checkpointed and
verified against the CPU architecture on restart. We require that the
restart architecture must be at least as new as the checkpoint
architecture.

We checkpoint whether the system is running with CONFIG_MMU or not and
require the same configuration for the system on which we restore the
process. As discussed in the original post of these patches, it should be
possible to checkpoint a non-mmu process and restart it on an mmu system.
However, the implementation and testing are left to someone with knowledge
of both configurations. (See
https://lists.linux-foundation.org/pipermail/containers/2010-March/023996.html)
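
The two restrictions above amount to a small predicate. The following
user-space sketch is illustrative only; the struct and function names
are hypothetical, and it assumes that a larger architecture number means
a newer architecture, as the comparison in the patch implies.

```c
#include <stdbool.h>

/* Hypothetical summary of what the arch header records. */
struct sketch_arch_hdr {
	unsigned int cpu_architecture;	/* ordinal: larger means newer */
	bool mmu;			/* built with CONFIG_MMU */
};

/* Return 0 if restart may proceed, -1 if it must be refused. */
static int sketch_check_arch(struct sketch_arch_hdr ckpt,
			     struct sketch_arch_hdr host)
{
	if (ckpt.cpu_architecture > host.cpu_architecture)
		return -1;	/* host is older than the checkpoint */
	if (ckpt.mmu != host.mmu)
		return -1;	/* MMU configuration differs */
	return 0;
}
```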

Obviously, processes using the old ARM ABI cannot be restarted on kernels
configured with CONFIG_AEABI and without CONFIG_OABI_COMPAT. The same goes
for restarting processes using AEABI on kernels configured without
CONFIG_AEABI. Unfortunately, if the kernel on which we checkpoint is
configured with CONFIG_OABI_COMPAT there is no way of knowing which ABI the
process actually uses. Therefore, we raise warnings on restart whenever in
doubt and continue with the restart process optimistically.
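
The ABI rules can be summarised as a three-way verdict. The sketch
below follows the branches of restore_read_header_arch() in this patch,
but the enum, struct and function names are hypothetical and exist only
for illustration.

```c
#include <stdbool.h>

enum sketch_abi_verdict {
	SKETCH_ABI_OK,		/* no known conflict */
	SKETCH_ABI_WARN,	/* cannot tell which ABI was used */
	SKETCH_ABI_REJECT	/* restart cannot work */
};

/* Hypothetical summary of a kernel's ABI configuration. */
struct sketch_abi {
	bool aeabi;		/* CONFIG_AEABI */
	bool oabi_compat;	/* CONFIG_OABI_COMPAT */
};

static enum sketch_abi_verdict sketch_check_abi(struct sketch_abi ckpt,
						struct sketch_abi host)
{
	if (host.aeabi && !host.oabi_compat) {
		/* Host supports only AEABI. */
		if (!ckpt.aeabi)
			return SKETCH_ABI_REJECT;
		if (ckpt.oabi_compat)
			return SKETCH_ABI_WARN;
	} else if (!host.aeabi) {
		/* Host supports only the old ABI. */
		if (ckpt.aeabi && !ckpt.oabi_compat)
			return SKETCH_ABI_REJECT;
		if (ckpt.oabi_compat)
			return SKETCH_ABI_WARN;
	}
	return SKETCH_ABI_OK;
}
```

A host with both CONFIG_AEABI and CONFIG_OABI_COMPAT accepts every
combination, since it can run processes of either ABI.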

Other:
------
Regarding ThumbEE, the thumbee_state field on the thread_info is stored
in checkpoints when CONFIG_ARM_THUMBEE is set, and 0 is stored otherwise.
If a value other than 0 is checkpointed and CONFIG_ARM_THUMBEE is not
set on the restore system, the restore is aborted. Feedback on this
implementation is very welcome.
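
The ThumbEE rule can be sketched as a pair of helpers, one for each
direction. This is a user-space illustration with hypothetical names;
the have_thumbee parameter stands in for CONFIG_ARM_THUMBEE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value written to the checkpoint image. */
static uint32_t sketch_thumbee_to_save(bool have_thumbee, uint32_t live)
{
	return have_thumbee ? live : 0u;
}

/*
 * Apply a checkpointed value on restart. Returns 0 on success and
 * stores the value to *out when ThumbEE is available; returns -1 when
 * the image carries ThumbEE state that the host cannot represent.
 */
static int sketch_thumbee_restore(bool have_thumbee, uint32_t saved,
				  uint32_t *out)
{
	if (have_thumbee) {
		*out = saved;
		return 0;
	}
	return saved != 0u ? -1 : 0;
}
```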

Added support for syscall sys_checkpoint and sys_restart for ARM:
__NR_checkpoint 367
__NR_restart 368

Changelog[v2]:
- Changed __LINUX_ARM_ARCH__ to cpu_architecture()
- Support restart on newer ISA versions
- More thorough checking of CONFIG_AEABI and CONFIG_OABI_COMPAT
between checkpoint and restart kernels.
- Simplified code by inlining small routines

Cc: [email protected]
Signed-off-by: Christoffer Dall <[email protected]>
Acked-by: Oren Laadan <[email protected]>
---
arch/arm/Kconfig | 4 +
arch/arm/include/asm/checkpoint_hdr.h | 72 ++++++++
arch/arm/include/asm/ptrace.h | 1 +
arch/arm/include/asm/unistd.h | 2 +
arch/arm/kernel/Makefile | 1 +
arch/arm/kernel/calls.S | 2 +
arch/arm/kernel/checkpoint.c | 302 +++++++++++++++++++++++++++++++++
arch/arm/kernel/signal.c | 5 +
arch/arm/kernel/sys_arm.c | 13 ++
include/linux/checkpoint_hdr.h | 2 +
10 files changed, 404 insertions(+), 0 deletions(-)
create mode 100644 arch/arm/include/asm/checkpoint_hdr.h
create mode 100644 arch/arm/kernel/checkpoint.c

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index c5408bf..14c7c84 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -100,6 +100,10 @@ config HAVE_LATENCYTOP_SUPPORT
depends on !SMP
default y

+config CHECKPOINT_SUPPORT
+ bool
+ default y
+
config LOCKDEP_SUPPORT
bool
default y
diff --git a/arch/arm/include/asm/checkpoint_hdr.h b/arch/arm/include/asm/checkpoint_hdr.h
new file mode 100644
index 0000000..38e8446
--- /dev/null
+++ b/arch/arm/include/asm/checkpoint_hdr.h
@@ -0,0 +1,72 @@
+#ifndef __ASM_ARM_CKPT_HDR_H
+#define __ASM_ARM_CKPT_HDR_H
+/*
+ * Checkpoint/restart - architecture specific headers ARM
+ *
+ * Copyright (C) 2008-2010 Oren Laadan
+ * Copyright (C) 2010 Christoffer Dall
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License. See the file COPYING in the main directory of the Linux
+ * distribution for more details.
+ */
+
+#ifndef _CHECKPOINT_CKPT_HDR_H_
+#error asm/checkpoint_hdr.h included directly
+#endif
+
+#include <linux/types.h>
+
+/* ARM structure seen from kernel/userspace */
+#ifdef __KERNEL__
+#include <asm/processor.h>
+#endif
+
+#define CKPT_ARCH_ID CKPT_ARCH_ARM
+
+/* arch dependent constants */
+#define CKPT_ARCH_NSIG 64
+#define CKPT_TTY_NCC 8
+
+#ifdef __KERNEL__
+
+#include <asm/signal.h>
+#if CKPT_ARCH_NSIG != _NSIG
+#error CKPT_ARCH_NSIG size is wrong per asm/signal.h and asm/checkpoint_hdr.h
+#endif
+
+#include <linux/tty.h>
+#if CKPT_TTY_NCC != NCC
+#error CKPT_TTY_NCC size is wrong per asm-generic/termios.h
+#endif
+
+#endif /* __KERNEL__ */
+
+
+struct ckpt_hdr_header_arch {
+ struct ckpt_hdr h;
+ __u32 cpu_architecture;
+ __u8 mmu; /* Checkpointed on mmu system */
+ __u8 aeabi; /* Checkpointed on AEABI kernel */
+ __u8 oabi_compat; /* Checkpointed on OABI compat. system */
+} __attribute__((aligned(8)));
+
+struct ckpt_hdr_thread {
+ struct ckpt_hdr h;
+ __u32 syscall;
+ __u32 tp_value;
+ __u32 thumbee_state;
+} __attribute__((aligned(8)));
+
+
+struct ckpt_hdr_cpu {
+ struct ckpt_hdr h;
+ __u32 uregs[18];
+} __attribute__((aligned(8)));
+
+struct ckpt_hdr_mm_context {
+ struct ckpt_hdr h;
+ __u32 end_brk;
+} __attribute__((aligned(8)));
+
+#endif /* __ASM_ARM_CKPT_HDR_H */
diff --git a/arch/arm/include/asm/ptrace.h b/arch/arm/include/asm/ptrace.h
index 9dcb11e..9999568 100644
--- a/arch/arm/include/asm/ptrace.h
+++ b/arch/arm/include/asm/ptrace.h
@@ -57,6 +57,7 @@
#define PSR_C_BIT 0x20000000
#define PSR_Z_BIT 0x40000000
#define PSR_N_BIT 0x80000000
+#define PSR_GE_BITS 0x000f0000

/*
* Groups of PSR bits
diff --git a/arch/arm/include/asm/unistd.h b/arch/arm/include/asm/unistd.h
index 8dcb42a..89484b4 100644
--- a/arch/arm/include/asm/unistd.h
+++ b/arch/arm/include/asm/unistd.h
@@ -393,6 +393,8 @@
#define __NR_perf_event_open (__NR_SYSCALL_BASE+364)
#define __NR_recvmmsg (__NR_SYSCALL_BASE+365)
#define __NR_eclone (__NR_SYSCALL_BASE+366)
+#define __NR_checkpoint (__NR_SYSCALL_BASE+367)
+#define __NR_restart (__NR_SYSCALL_BASE+368)

/*
* The following SWIs are ARM private.
diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
index 26d302c..bfe39d8 100644
--- a/arch/arm/kernel/Makefile
+++ b/arch/arm/kernel/Makefile
@@ -39,6 +39,7 @@ obj-$(CONFIG_ARM_THUMBEE) += thumbee.o
obj-$(CONFIG_KGDB) += kgdb.o
obj-$(CONFIG_ARM_UNWIND) += unwind.o
obj-$(CONFIG_HAVE_TCM) += tcm.o
+obj-$(CONFIG_CHECKPOINT) += checkpoint.o

obj-$(CONFIG_CRUNCH) += crunch.o crunch-bits.o
AFLAGS_crunch-bits.o := -Wa,-mcpu=ep9312
diff --git a/arch/arm/kernel/calls.S b/arch/arm/kernel/calls.S
index 80047c8..7126034 100644
--- a/arch/arm/kernel/calls.S
+++ b/arch/arm/kernel/calls.S
@@ -376,6 +376,8 @@
CALL(sys_perf_event_open)
/* 365 */ CALL(sys_recvmmsg)
CALL(sys_eclone_wrapper)
+ CALL(sys_checkpoint)
+ CALL(sys_restart)
#ifndef syscalls_counted
.equ syscalls_padding, ((NR_syscalls + 3) & ~3) - NR_syscalls
#define syscalls_counted
diff --git a/arch/arm/kernel/checkpoint.c b/arch/arm/kernel/checkpoint.c
new file mode 100644
index 0000000..14911f8
--- /dev/null
+++ b/arch/arm/kernel/checkpoint.c
@@ -0,0 +1,302 @@
+/*
+ * Checkpoint/restart - architecture specific support for ARM
+ *
+ * Copyright (C) 2008-2010 Oren Laadan
+ * Copyright (C) 2010 Christoffer Dall
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License. See the file COPYING in the main directory of the Linux
+ * distribution for more details.
+ */
+#include <linux/checkpoint.h>
+#include <linux/checkpoint_hdr.h>
+
+#include <asm/processor.h>
+
+
+#ifdef CONFIG_MMU
+ const u8 ckpt_mmu = 1;
+#else
+ const u8 ckpt_mmu = 0;
+#endif
+
+#ifdef CONFIG_OABI_COMPAT
+ const u8 ckpt_oabi_compat = 1;
+#else
+ const u8 ckpt_oabi_compat = 0;
+#endif
+
+#ifdef CONFIG_AEABI
+ const u8 ckpt_aeabi = 1;
+#else
+ const u8 ckpt_aeabi = 0;
+#endif
+
+
+/**************************************************************************
+ * Checkpoint
+ */
+
+/* dump the thread_struct of a given task */
+int checkpoint_thread(struct ckpt_ctx *ctx, struct task_struct *t)
+{
+ int ret;
+ struct ckpt_hdr_thread *h;
+ struct thread_info *ti = task_thread_info(t);
+
+ h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_THREAD);
+ if (!h)
+ return -ENOMEM;
+
+ /*
+ * Store the syscall information about the checkpointed process
+ * as we need to know if the process was doing a syscall (and which)
+ * during restart.
+ */
+ h->syscall = ti->syscall;
+
+ /*
+ * Store remaining thread-specific info.
+ */
+ h->tp_value = ti->tp_value;
+#ifdef CONFIG_ARM_THUMBEE
+ h->thumbee_state = ti->thumbee_state;
+#else
+ /*
+ * If restoring on a system with ThumbEE support,
+ * zero will set ThumbEE state to unused.
+ */
+ h->thumbee_state = 0;
+#endif
+
+ ret = ckpt_write_obj(ctx, &h->h);
+ ckpt_hdr_put(ctx, h);
+ return ret;
+}
+
+/* dump the cpu state and registers of a given task */
+int checkpoint_cpu(struct ckpt_ctx *ctx, struct task_struct *t)
+{
+ struct ckpt_hdr_cpu *h;
+ int ret;
+ struct pt_regs *regs = task_pt_regs(t);
+
+ h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_CPU);
+ if (!h)
+ return -ENOMEM;
+
+ memcpy(&h->uregs, regs, sizeof(h->uregs));
+
+ /*
+ * for checkpoint in process context (from within a container),
+ * the actual syscall is taking place at this very moment; so
+ * we (optimistically) substitute the future return value (0) of
+ * this syscall into r0, so that upon restart it will
+ * succeed (or it will endlessly retry checkpoint...)
+ */
+ if (t == current)
+ h->ARM_r0 = 0;
+
+ ret = ckpt_write_obj(ctx, &h->h);
+ ckpt_hdr_put(ctx, h);
+ return ret;
+}
+
+int checkpoint_write_header_arch(struct ckpt_ctx *ctx)
+{
+ struct ckpt_hdr_header_arch *arch_hdr;
+ unsigned int cpu_arch = cpu_architecture();
+ int ret;
+
+
+ arch_hdr = ckpt_hdr_get_type(ctx, sizeof(*arch_hdr),
+ CKPT_HDR_HEADER_ARCH);
+ if (!arch_hdr)
+ return -ENOMEM;
+
+ if (cpu_arch == CPU_ARCH_UNKNOWN)
+ ckpt_msg(ctx, "warning: cannot determine CPU architecture. "
+ "cannot validate compatibility on restore");
+ arch_hdr->cpu_architecture = cpu_arch;
+ arch_hdr->mmu = ckpt_mmu;
+ arch_hdr->oabi_compat = ckpt_oabi_compat;
+ arch_hdr->aeabi = ckpt_aeabi;
+
+ ret = ckpt_write_obj(ctx, &arch_hdr->h);
+ ckpt_hdr_put(ctx, arch_hdr);
+
+ return ret;
+}
+
+/* dump the mm->context state */
+int checkpoint_mm_context(struct ckpt_ctx *ctx, struct mm_struct *mm)
+{
+ struct ckpt_hdr_mm_context *h;
+ int ret;
+
+ h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_MM_CONTEXT);
+ if (!h)
+ return -ENOMEM;
+
+#ifdef CONFIG_MMU
+ /*
+ * We do not checkpoint kvm_seq as we do not know of any generally
+ * exported functionality which would associate an ioremapped VMA
+ * with a task. A driver might use this functionality, but should
+ * implement its own checkpoint functionality to deal with this.
+ */
+#else
+ h->end_brk = mm->context.end_brk;
+#endif
+
+ ret = ckpt_write_obj(ctx, &h->h);
+ ckpt_hdr_put(ctx, h);
+ return ret;
+}
+
+/**************************************************************************
+ * Restart
+ */
+
+/* read the thread_struct into the current task */
+int restore_thread(struct ckpt_ctx *ctx)
+{
+ struct ckpt_hdr_thread *h;
+ int ret = 0;
+ struct thread_info *ti = task_thread_info(current);
+
+ h = ckpt_read_obj_type(ctx, sizeof(*h), CKPT_HDR_THREAD);
+ if (IS_ERR(h))
+ return PTR_ERR(h);
+
+ ti->syscall = h->syscall;
+ ti->tp_value = h->tp_value;
+
+#ifdef CONFIG_ARM_THUMBEE
+ /*
+ * If the checkpoint system did not support ThumbEE, this field
+ * will be zero, equivalent to unused ThumbEE state.
+ */
+ ti->thumbee_state = h->thumbee_state;
+#else
+ if (h->thumbee_state != 0) {
+ ret = -EINVAL;
+ ckpt_err(ctx, ret, "Checkpoint had ThumbEE state but "
+ "ARM_THUMBEE not configured.");
+ }
+#endif
+
+ ckpt_hdr_put(ctx, h);
+ return ret;
+}
+
+/* read the cpu state and registers for the current task */
+int restore_cpu(struct ckpt_ctx *ctx)
+{
+ struct ckpt_hdr_cpu *h;
+ struct task_struct *t = current;
+ struct pt_regs *regs = task_pt_regs(t);
+ int i;
+
+ h = ckpt_read_obj_type(ctx, sizeof(*h), CKPT_HDR_CPU);
+ if (IS_ERR(h))
+ return PTR_ERR(h);
+
+ /*
+ * Restore user registers
+ */
+ memcpy(regs, &h->uregs, 16 * sizeof(__u32));
+
+ /*
+ * Restore only user-writable bits on the CPSR
+ */
+ regs->ARM_cpsr = regs->ARM_cpsr |
+ (h->ARM_cpsr & (PSR_N_BIT | PSR_Z_BIT |
+ PSR_C_BIT | PSR_V_BIT |
+ PSR_Q_BIT | PSR_E_BIT |
+ PSR_GE_BITS));
+ regs->ARM_ORIG_r0 = h->ARM_ORIG_r0;
+
+
+ ckpt_hdr_put(ctx, h);
+ return 0;
+}
+
+int restore_read_header_arch(struct ckpt_ctx *ctx)
+{
+ struct ckpt_hdr_header_arch *arch_hdr;
+ unsigned int cpu_arch = cpu_architecture();
+ int ret = -EINVAL;
+
+ arch_hdr = ckpt_read_obj_type(ctx, sizeof(*arch_hdr),
+ CKPT_HDR_HEADER_ARCH);
+ if (IS_ERR(arch_hdr))
+ return PTR_ERR(arch_hdr);
+
+ if (cpu_arch == CPU_ARCH_UNKNOWN)
+ ckpt_msg(ctx, "warning: cannot determine CPU architecture. "
+ "cannot validate compatibility.");
+
+ if (arch_hdr->cpu_architecture == CPU_ARCH_UNKNOWN)
+ ckpt_msg(ctx, "warning: unknown checkpoint CPU architecture. "
+ "cannot validate compatibility.");
+
+ if (arch_hdr->cpu_architecture > cpu_architecture()) {
+ ckpt_err(ctx, ret, "cannot restore on older ARM architecture");
+ goto out;
+ }
+
+ /* TODO: Maybe non-mmu to mmu checkpoint/restart is possible */
+ if (arch_hdr->mmu != ckpt_mmu) {
+ ckpt_err(ctx, ret, "checkpoint %s MMU, restore %s MMU",
+ arch_hdr->mmu ? "with" : "without",
+ ckpt_mmu ? "with" : "without");
+ goto out;
+ }
+
+ ret = 0;
+
+ if (!ckpt_oabi_compat && ckpt_aeabi) {
+ /* Only AEABI */
+ if (!arch_hdr->aeabi) {
+ ret = -EINVAL;
+ ckpt_err(ctx, ret, "process used OABI "
+ "and CONFIG_OABI_COMPAT not set.");
+ goto out;
+ } else if (arch_hdr->oabi_compat) {
+ ckpt_msg(ctx, "warning: process may have used OABI "
+ "and CONFIG_OABI_COMPAT not set.");
+ }
+ } else if (!ckpt_aeabi) {
+ /* Only old ABI */
+ if (arch_hdr->aeabi && !arch_hdr->oabi_compat) {
+ ret = -EINVAL;
+ ckpt_err(ctx, ret, "process used AEABI "
+ "and CONFIG_AEABI not set.");
+ goto out;
+ } else if (arch_hdr->oabi_compat) {
+ ckpt_msg(ctx, "warning: process may have used AEABI "
+ "and CONFIG_AEABI not set.");
+ }
+ }
+
+out:
+ ckpt_hdr_put(ctx, arch_hdr);
+ return ret;
+}
+
+int restore_mm_context(struct ckpt_ctx *ctx, struct mm_struct *mm)
+{
+ struct ckpt_hdr_mm_context *h;
+
+ h = ckpt_read_obj_type(ctx, sizeof(*h), CKPT_HDR_MM_CONTEXT);
+ if (IS_ERR(h))
+ return PTR_ERR(h);
+
+#ifndef CONFIG_MMU
+ mm->context.end_brk = h->end_brk;
+#endif
+
+ ckpt_hdr_put(ctx, h);
+ return 0;
+}
diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index 907d5a6..d37ef41 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -773,6 +773,11 @@ static void do_signal(struct pt_regs *regs, int syscall)
single_step_set(current);
}

+int task_has_saved_sigmask(struct task_struct *task)
+{
+ return !!(task_thread_info(task)->flags & _TIF_RESTORE_SIGMASK);
+}
+
asmlinkage void
do_notify_resume(struct pt_regs *regs, unsigned int thread_flags, int syscall)
{
diff --git a/arch/arm/kernel/sys_arm.c b/arch/arm/kernel/sys_arm.c
index c23f133..11e27a1 100644
--- a/arch/arm/kernel/sys_arm.c
+++ b/arch/arm/kernel/sys_arm.c
@@ -27,6 +27,7 @@
#include <linux/ipc.h>
#include <linux/uaccess.h>
#include <linux/slab.h>
+#include <linux/checkpoint.h>

/* Fork a new task - this creates a new program thread.
* This is called indirectly via a small wrapper
@@ -166,3 +167,15 @@ asmlinkage long sys_arm_fadvise64_64(int fd, int advice,
{
return sys_fadvise64_64(fd, offset, len, advice);
}
+
+asmlinkage long sys_checkpoint(unsigned long pid, unsigned long fd,
+ unsigned long flags, unsigned long logfd)
+{
+ return do_sys_checkpoint(pid, fd, flags, logfd);
+}
+
+asmlinkage long sys_restart(unsigned long pid, unsigned long fd,
+ unsigned long flags, unsigned long logfd)
+{
+ return do_sys_restart(pid, fd, flags, logfd);
+}
diff --git a/include/linux/checkpoint_hdr.h b/include/linux/checkpoint_hdr.h
index 36386ad..bf20b45 100644
--- a/include/linux/checkpoint_hdr.h
+++ b/include/linux/checkpoint_hdr.h
@@ -208,6 +208,8 @@ enum {
#define CKPT_ARCH_PPC32 CKPT_ARCH_PPC32
CKPT_ARCH_PPC64,
#define CKPT_ARCH_PPC64 CKPT_ARCH_PPC64
+ CKPT_ARCH_ARM,
+#define CKPT_ARCH_ARM CKPT_ARCH_ARM
};

/* shared objrects (objref) */
--
1.5.6.5
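
The comment in restore_cpu() says that only user-writable CPSR bits are
restored. One common way to express that is a masked merge, sketched
below in user space. Note that this is a variation and not what the
patch does: the patch ORs the saved bits into the current value, so it
can set bits but never clear them. The sketch_ names are hypothetical,
and the bit values are assumed to match asm/ptrace.h.

```c
#include <stdint.h>

/* Bit values assumed to match arch/arm/include/asm/ptrace.h. */
#define SKETCH_PSR_N  0x80000000u
#define SKETCH_PSR_Z  0x40000000u
#define SKETCH_PSR_C  0x20000000u
#define SKETCH_PSR_V  0x10000000u
#define SKETCH_PSR_Q  0x08000000u
#define SKETCH_PSR_E  0x00000200u
#define SKETCH_PSR_GE 0x000f0000u

#define SKETCH_PSR_USER_MASK (SKETCH_PSR_N | SKETCH_PSR_Z | SKETCH_PSR_C | \
			      SKETCH_PSR_V | SKETCH_PSR_Q | SKETCH_PSR_E | \
			      SKETCH_PSR_GE)

/*
 * Take the user-writable bits from the saved value and every other
 * bit (mode, interrupt masks, ...) from the current value.
 */
static uint32_t sketch_merge_cpsr(uint32_t current_cpsr, uint32_t saved_cpsr)
{
	return (current_cpsr & ~SKETCH_PSR_USER_MASK) |
	       (saved_cpsr & SKETCH_PSR_USER_MASK);
}
```

With a current value of 0x60000010 and a saved value of 0x80000013, the
merge gives 0x80000010: the condition flags come from the checkpoint,
while the mode bits of the running task are preserved.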

2010-04-29 00:46:21

by Christoffer Dall

Subject: Re: [C/R ARM v2][PATCH] ARM: Rudimentary syscall interfaces

Hi Roland.

Thanks for your feedback. The changed patch below should address your
concerns.

Best,
Christoffer

---
arch/arm/include/asm/syscall.h | 32 ++++++++++++++++++
arch/arm/kernel/ptrace.c | 69 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 101 insertions(+), 0 deletions(-)
create mode 100644 arch/arm/include/asm/syscall.h

diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
new file mode 100644
index 0000000..1a6ca68
--- /dev/null
+++ b/arch/arm/include/asm/syscall.h
@@ -0,0 +1,32 @@
+/*
+ * syscall.h - Linux syscall interfaces for ARM
+ *
+ * Copyright (c) 2010 Christoffer Dall
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#ifndef _ASM_ARM_SYSCALLS_H
+#define _ASM_ARM_SYSCALLS_H
+
+#include <linux/highmem.h>
+#include <linux/pagemap.h>
+#include <linux/memory.h>
+#include <asm/unistd.h>
+
+int syscall_get_nr(struct task_struct *task, struct pt_regs *regs);
+
+static inline long syscall_get_return_value(struct task_struct *task,
+ struct pt_regs *regs)
+{
+ return regs->ARM_r0;
+}
+
+static inline long syscall_get_error(struct task_struct *task,
+ struct pt_regs *regs)
+{
+ return regs->ARM_r0;
+}
+
+#endif /* _ASM_ARM_SYSCALLS_H */
diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index 3f562a7..acf9a39 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -23,6 +23,7 @@
#include <asm/pgtable.h>
#include <asm/system.h>
#include <asm/traps.h>
+#include <asm/syscall.h>

#include "ptrace.h"

@@ -863,3 +864,71 @@ asmlinkage int syscall_trace(int why, struct pt_regs *regs, int scno)

return current_thread_info()->syscall;
}
+
+/*
+ * This function essentially duplicates the logic from vector_swi in
+ * arch/arm/kernel/entry-common.S. However, that code is in the
+ * critical path for system calls and is hard to factor out without
+ * compromising performance.
+ */
+int syscall_get_nr(struct task_struct *task, struct pt_regs *regs)
+{
+ int ret;
+ int scno;
+ unsigned long instr;
+ bool config_oabi = false;
+ bool config_aeabi = false;
+ bool config_arm_thumb = false;
+ bool config_cpu_endian_be8 = false;
+
+#ifdef CONFIG_OABI_COMPAT
+ config_oabi = true;
+#endif
+#ifdef CONFIG_AEABI
+ config_aeabi = true;
+#endif
+#ifdef CONFIG_ARM_THUMB
+ config_arm_thumb = true;
+#endif
+#ifdef CONFIG_CPU_ENDIAN_BE8
+ config_cpu_endian_be8 = true;
+#endif
+#ifdef CONFIG_CPU_ARM710
+ return -1;
+#endif
+
+ if (config_aeabi && !config_oabi) {
+ /* Pure EABI */
+ return regs->ARM_r7;
+ } else if (config_oabi) {
+ if (config_arm_thumb && (regs->ARM_cpsr & PSR_T_BIT))
+ return -1;
+
+ ret = access_process_vm(task, regs->ARM_pc - 4, &instr,
+ sizeof(unsigned long), 0);
+ if (ret != sizeof(unsigned long))
+ return -1;
+
+ if (config_cpu_endian_be8)
+ asm ("rev %[out], %[in]": [out] "=r" (instr):
+ [in] "r" (instr));
+
+ if ((instr & 0x00ffffff) == 0)
+ return regs->ARM_r7; /* EABI call */
+ else
+ return (instr & 0x00ffffff) | __NR_OABI_SYSCALL_BASE;
+ } else {
+ /* Legacy ABI only */
+ if (config_arm_thumb && (regs->ARM_cpsr & PSR_T_BIT)) {
+ /* Thumb mode ABI */
+ scno = regs->ARM_r7 + __NR_SYSCALL_BASE;
+ } else {
+ ret = access_process_vm(task, regs->ARM_pc - 4, &instr,
+ sizeof(unsigned long), 0);
+ if (ret != sizeof(unsigned long))
+ return -1;
+ scno = instr;
+ }
+ return scno & 0x00ffffff;
+ }
+}
--
1.5.6.5

2010-06-20 22:02:23

by Oren Laadan

Subject: Re: [C/R ARM v2][PATCH 0/3] Linux Checkpoint-Restart - ARM port

Applied.

On 04/26/2010 05:43 PM, Christoffer Dall wrote:
> Following there will be two preparatory patches for an ARM port of the
> checkpoint-restart code and finally a third patch implementing the
> architecture-specific parts of c/r.
>
> The preparatory patches consist of a partial syscall trace implementation
> for ARM and an eclone implementation for ARM. The syscall trace
> implementation provides only the needed functionality for c/r.
>
> There is a separate patch for the user space code, which supports
> cross-compilation, extracting headers for ARM and an eclone implementation
> for ARM.
>
> The kernel patches presented here are based on the ckpt-v21-rc6 patch set.
>
> ---
>
> CHANGELOG:
>
> [2010-Apr-08] v2:
> - Systrace implementation now inspects process state to get the
> system call number thereby avoiding extra work on system calls.
> - Removed __user attribute on long type in eclone implementation
> - Better check for architecture versions across C/R
> - Improved checking of user space ABI settings across C/R
> - Code simplifications
>
> [2010-Mar-22] v1:
> - Initial version
> - Systrace implementation modified the system call entry path to
> store the system call number globally in memory.
> - ARM implementation lightly tested