2023-01-31 15:30:56

by Gregory Price

[permalink] [raw]
Subject: [PATCH v8 0/1] Checkpoint Support for Syscall User Dispatch

v8: fix include issue Reported-by: kernel test robot <[email protected]>
summary:
+++ b/kernel/entry/syscall_user_dispatch.c
+ include <linux/ptrace.h>

v7: drop ptrace suspend flag, not required
hanging unreferenced variable
whitespace

v6: drop fs/proc/array update, it's not needed
drop on_dispatch field exposure in config structure, it's not
checkpoint relevant.
(Thank you for the reviews Oleg and Andrei)

v5: automated test for !defined(GENERIC_ENTRY) failed, fix fs/proc
use ifdef for GENERIC_ENTRY || TIF_SYSCALL_USER_DISPATCH
note: syscall user dispatch is not presently supported for
non-generic entry, but could be implemented. question is
whether the TIF_ define should be carved out now or then

v4: Whitespace
s/CHECKPOINT_RESTART/CHECKPOINT_RESUME
check test_syscall_work(SYSCALL_USER_DISPATCH) to determine if it's
turned on or not in fs/proc/array and getter interface

v3: Kernel test robot static function fix
Whitespace nitpicks

v2: Implements the getter/setter interface in ptrace rather than prctl

Syscall user dispatch makes it possible to cleanly intercept system
calls from user-land. However, most transparent checkpoint software
presently leverages some combination of ptrace and system call
injection to place software in a ready-to-checkpoint state.

If Syscall User Dispatch is enabled at the time of being quiesced,
injected system calls will subsequently be interposed upon and
dispatched to the task's signal handler.

Patch summary:
- Implement a getter interface for Syscall User Dispatch config info.
To resume successfully, the checkpoint/resume software has to
save and restore this information. Presently this configuration
is write-only, with no way for C/R software to save it.

This was done in ptrace because syscall user dispatch is not part of
uapi. The syscall_user_dispatch_config structure was added to the
ptrace exports.

Gregory Price (1):
ptrace,syscall_user_dispatch: checkpoint/restore support for SUD

.../admin-guide/syscall-user-dispatch.rst | 5 ++-
include/linux/syscall_user_dispatch.h | 18 +++++++++
include/uapi/linux/ptrace.h | 9 +++++
kernel/entry/syscall_user_dispatch.c | 40 +++++++++++++++++++
kernel/ptrace.c | 9 +++++
5 files changed, 80 insertions(+), 1 deletion(-)

--
2.39.0



2023-01-31 15:31:00

by Gregory Price

[permalink] [raw]
Subject: [PATCH v8 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD

Implement ptrace getter/setter interface for syscall user dispatch.

These prctl settings are presently write-only, making it impossible to
implement transparent checkpoint/restore via software like CRIU.

'on_dispatch' field is not exposed because it is a kernel-internal
only field that cannot be 'true' when returning to userland.

Signed-off-by: Gregory Price <[email protected]>
---
.../admin-guide/syscall-user-dispatch.rst | 5 ++-
include/linux/syscall_user_dispatch.h | 18 +++++++++
include/uapi/linux/ptrace.h | 9 +++++
kernel/entry/syscall_user_dispatch.c | 40 +++++++++++++++++++
kernel/ptrace.c | 9 +++++
5 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/syscall-user-dispatch.rst b/Documentation/admin-guide/syscall-user-dispatch.rst
index 60314953c728..a23ae21a1d5b 100644
--- a/Documentation/admin-guide/syscall-user-dispatch.rst
+++ b/Documentation/admin-guide/syscall-user-dispatch.rst
@@ -43,7 +43,10 @@ doesn't rely on any of the syscall ABI to make the filtering. It uses
only the syscall dispatcher address and the userspace key.

As the ABI of these intercepted syscalls is unknown to Linux, these
-syscalls are not instrumentable via ptrace or the syscall tracepoints.
+syscalls are not instrumentable via ptrace or the syscall tracepoints,
+however an interfaces to suspend, checkpoint, and restore syscall user
+dispatch configuration has been added to ptrace to assist userland
+checkpoint/restart software.

Interface
---------
diff --git a/include/linux/syscall_user_dispatch.h b/include/linux/syscall_user_dispatch.h
index a0ae443fb7df..5de2d64ace19 100644
--- a/include/linux/syscall_user_dispatch.h
+++ b/include/linux/syscall_user_dispatch.h
@@ -22,6 +22,12 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
#define clear_syscall_work_syscall_user_dispatch(tsk) \
clear_task_syscall_work(tsk, SYSCALL_USER_DISPATCH)

+int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
+ void __user *data);
+
+int syscall_user_dispatch_set_config(struct task_struct *task, unsigned long size,
+ void __user *data);
+
#else
struct syscall_user_dispatch {};

@@ -35,6 +41,18 @@ static inline void clear_syscall_work_syscall_user_dispatch(struct task_struct *
{
}

+static inline int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
+ void __user *data)
+{
+ return -EINVAL;
+}
+
+static inline int syscall_user_dispatch_set_config(struct task_struct *task, unsigned long size,
+ void __user *data)
+{
+ return -EINVAL;
+}
+
#endif /* CONFIG_GENERIC_ENTRY */

#endif /* _SYSCALL_USER_DISPATCH_H */
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index 195ae64a8c87..6d2f3b86f932 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -112,6 +112,15 @@ struct ptrace_rseq_configuration {
__u32 pad;
};

+#define PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG 0x4210
+#define PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG 0x4211
+struct syscall_user_dispatch_config {
+ __u64 mode;
+ __s8 *selector;
+ __u64 offset;
+ __u64 len;
+};
+
/*
* These values are stored in task->ptrace_message
* by ptrace_stop to describe the current syscall-stop.
diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c
index 0b6379adff6b..05d4053a771e 100644
--- a/kernel/entry/syscall_user_dispatch.c
+++ b/kernel/entry/syscall_user_dispatch.c
@@ -4,6 +4,7 @@
*/
#include <linux/sched.h>
#include <linux/prctl.h>
+#include <linux/ptrace.h>
#include <linux/syscall_user_dispatch.h>
#include <linux/uaccess.h>
#include <linux/signal.h>
@@ -106,3 +107,42 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,

return 0;
}
+
+int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
+ void __user *data)
+{
+ struct syscall_user_dispatch *sd = &task->syscall_dispatch;
+ struct syscall_user_dispatch_config config;
+
+ if (size != sizeof(struct syscall_user_dispatch_config))
+ return -EINVAL;
+
+ if (test_syscall_work(SYSCALL_USER_DISPATCH))
+ config.mode = PR_SYS_DISPATCH_ON;
+ else
+ config.mode = PR_SYS_DISPATCH_OFF;
+
+ config.offset = sd->offset;
+ config.len = sd->len;
+ config.selector = sd->selector;
+
+ if (copy_to_user(data, &config, sizeof(config)))
+ return -EFAULT;
+
+ return 0;
+}
+
+int syscall_user_dispatch_set_config(struct task_struct *task, unsigned long size,
+ void __user *data)
+{
+ struct syscall_user_dispatch_config config;
+
+ if (size != sizeof(struct syscall_user_dispatch_config))
+ return -EINVAL;
+
+ if (copy_from_user(&config, data, sizeof(config)))
+ return -EFAULT;
+
+ return set_syscall_user_dispatch(config.mode, config.offset, config.len,
+ config.selector);
+}
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 54482193e1ed..d99376532b56 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -32,6 +32,7 @@
#include <linux/compat.h>
#include <linux/sched/signal.h>
#include <linux/minmax.h>
+#include <linux/syscall_user_dispatch.h>

#include <asm/syscall.h> /* for syscall_get_* */

@@ -1259,6 +1260,14 @@ int ptrace_request(struct task_struct *child, long request,
break;
#endif

+ case PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG:
+ ret = syscall_user_dispatch_set_config(child, addr, datavp);
+ break;
+
+ case PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG:
+ ret = syscall_user_dispatch_get_config(child, addr, datavp);
+ break;
+
default:
break;
}
--
2.39.0


2023-01-31 18:59:21

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v8 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD

On 01/31, Gregory Price wrote:
>
> Implement ptrace getter/setter interface for syscall user dispatch.
>
> These prctl settings are presently write-only, making it impossible to
> implement transparent checkpoint/restore via software like CRIU.
>
> 'on_dispatch' field is not exposed because it is a kernel-internal
> only field that cannot be 'true' when returning to userland.
>
> Signed-off-by: Gregory Price <[email protected]>

Reviewed-by: Oleg Nesterov <[email protected]>


2023-02-09 20:29:54

by Andrei Vagin

[permalink] [raw]
Subject: Re: [PATCH v8 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD

On Tue, Jan 31, 2023 at 7:30 AM Gregory Price <[email protected]> wrote:
>
> Implement ptrace getter/setter interface for syscall user dispatch.
>
> These prctl settings are presently write-only, making it impossible to
> implement transparent checkpoint/restore via software like CRIU.
>
> 'on_dispatch' field is not exposed because it is a kernel-internal
> only field that cannot be 'true' when returning to userland.
>

Acked-by: Andrei Vagin <[email protected]>

> Signed-off-by: Gregory Price <[email protected]>

<snip>

> diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
> index 195ae64a8c87..6d2f3b86f932 100644
> --- a/include/uapi/linux/ptrace.h
> +++ b/include/uapi/linux/ptrace.h
> @@ -112,6 +112,15 @@ struct ptrace_rseq_configuration {
> __u32 pad;
> };
>
> +#define PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG 0x4210
> +#define PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG 0x4211
> +struct syscall_user_dispatch_config {

nit: all structures in this header have the ptrace prefix. I think it
is better to add it to this one too.

> + __u64 mode;
> + __s8 *selector;
> + __u64 offset;
> + __u64 len;
> +};
> +
> /*
> * These values are stored in task->ptrace_message
> * by ptrace_stop to describe the current syscall-stop.

2023-02-09 23:45:32

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v8 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD

On Tue, Jan 31 2023 at 09:44, Gregory Price wrote:
>
> +static inline int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
> + void __user *data)

https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#line-breaks

All over the place.

> index 195ae64a8c87..6d2f3b86f932 100644
> --- a/include/uapi/linux/ptrace.h
> +++ b/include/uapi/linux/ptrace.h
> @@ -112,6 +112,15 @@ struct ptrace_rseq_configuration {
> __u32 pad;
> };
>
> +#define PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG 0x4210
> +#define PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG 0x4211
> +struct syscall_user_dispatch_config {

Can you please visibly separate the defines from the struct definition
by a newline? Glueing that stuff together is just horrible to read.

> + __u64 mode;
> + __s8 *selector;
> + __u64 offset;
> + __u64 len;

https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#struct-declarations-and-initializers

Please add proper documentation to this struct. It's user space ABI and
it's not the job of the man page maintainers to figure out what this
actually means.

> +int syscall_user_dispatch_set_config(struct task_struct *task, unsigned long size,
> + void __user *data)
> +{
> + struct syscall_user_dispatch_config config;
> +
> + if (size != sizeof(struct syscall_user_dispatch_config))
> + return -EINVAL;
> +
> + if (copy_from_user(&config, data, sizeof(config)))
> + return -EFAULT;
> +
> + return set_syscall_user_dispatch(config.mode, config.offset, config.len,
> + config.selector);

How is this supposed to work? This is called from the ptracer to set the
user dispatch mode on the ptracee, i.e. on @task.

But set_syscall_user_dispatch() operates on current, which is the
ptracer itself.

Clearly well tested with the non-existant selftest, which is part of
this submission.

So please fix the above issues, add a selftest and proper documentation.

I'm neither impressed by this patch nor by the reviews.

Sigh

tglx


2023-02-09 23:46:48

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v8 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD

On Fri, Feb 10 2023 at 00:45, Thomas Gleixner wrote:
> I'm neither impressed by this patch nor by the reviews.

I just noticed that this is V8. I'm truly impressed now.

2023-02-10 04:58:32

by Gregory Price

[permalink] [raw]
Subject: Re: [PATCH v8 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD

On Fri, Feb 10, 2023 at 12:46:41AM +0100, Thomas Gleixner wrote:
> On Fri, Feb 10 2023 at 00:45, Thomas Gleixner wrote:
> > I'm neither impressed by this patch nor by the reviews.
>
> I just noticed that this is V8. I'm truly impressed now.

I'll address your feedback, thank you for the review :]