2023-01-26 19:06:53

by Gregory Price

Subject: [PATCH v7 0/1] Checkpoint Support for Syscall User Dispatch

v7: drop ptrace suspend flag, not required
    hanging unreferenced variable
    whitespace

v6: drop fs/proc/array update, it's not needed
    drop on_dispatch field exposure in config structure, it's not
    checkpoint relevant.
    (Thank you for the reviews Oleg and Andrei)

v5: automated test for !defined(GENERIC_ENTRY) failed, fix fs/proc
    use ifdef for GENERIC_ENTRY || TIF_SYSCALL_USER_DISPATCH
    note: syscall user dispatch is not presently supported for
    non-generic entry, but could be implemented. question is
    whether the TIF_ define should be carved out now or then

v4: Whitespace
    s/CHECKPOINT_RESTART/CHECKPOINT_RESUME
    check test_syscall_work(SYSCALL_USER_DISPATCH) to determine if it's
    turned on or not in fs/proc/array and getter interface

v3: Kernel test robot static function fix
    Whitespace nitpicks

v2: Implements the getter/setter interface in ptrace rather than prctl

Syscall user dispatch makes it possible to cleanly intercept system
calls from user-land. However, most transparent checkpoint software
presently leverages some combination of ptrace and system call
injection to place software in a ready-to-checkpoint state.

If Syscall User Dispatch is enabled at the time the task is quiesced,
the injected system calls will be interposed upon and dispatched to
the task's signal handler.
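
For context, the state a checkpointer would have to capture is the set of arguments a task passes to prctl(PR_SET_SYSCALL_USER_DISPATCH). Below is a minimal userspace sketch of enabling and disabling SUD, assuming a kernel and headers with SUD support. The helper names sud_enable/sud_disable and the fallback defines are illustrative and not part of this patch.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Fallbacks for older userspace headers; values match linux/prctl.h. */
#ifndef PR_SET_SYSCALL_USER_DISPATCH
#define PR_SET_SYSCALL_USER_DISPATCH 59
#define PR_SYS_DISPATCH_OFF 0
#define PR_SYS_DISPATCH_ON 1
#endif
#ifndef SYSCALL_DISPATCH_FILTER_ALLOW
#define SYSCALL_DISPATCH_FILTER_ALLOW 0
#define SYSCALL_DISPATCH_FILTER_BLOCK 1
#endif

/* Selector byte that the kernel reads on each syscall entry. */
static volatile char sud_selector = SYSCALL_DISPATCH_FILTER_ALLOW;

/*
 * Enable SUD for the calling thread (hypothetical helper).
 * Syscalls issued from [offset, offset + len) always bypass dispatch;
 * all others raise SIGSYS while the selector holds
 * SYSCALL_DISPATCH_FILTER_BLOCK. The selector starts as ALLOW here,
 * so nothing is intercepted until the caller flips it.
 * Returns 0 on success or a negative errno.
 */
static int sud_enable(unsigned long offset, unsigned long len)
{
	sud_selector = SYSCALL_DISPATCH_FILTER_ALLOW;
	if (prctl(PR_SET_SYSCALL_USER_DISPATCH,
		  (unsigned long)PR_SYS_DISPATCH_ON,
		  offset, len, (unsigned long)&sud_selector))
		return -errno;
	return 0;
}

/* Disable SUD; offset, len and selector must all be zero for OFF. */
static int sud_disable(void)
{
	if (prctl(PR_SET_SYSCALL_USER_DISPATCH,
		  (unsigned long)PR_SYS_DISPATCH_OFF, 0UL, 0UL, 0UL))
		return -errno;
	return 0;
}
```

The four values passed here (mode, offset, len, and the selector address) correspond to the fields of the config structure that this series exposes.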

Patch summary:
- Implement a getter/setter interface for Syscall User Dispatch config
  info. To resume successfully, the checkpoint/resume software has to
  save and restore this information. Presently this configuration
  is write-only, with no way for C/R software to save it.

This was done in ptrace because syscall user dispatch is not part of
uapi. The syscall_user_dispatch_config structure was added to the
ptrace exports.
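
As a rough illustration of how C/R software might drive this interface from userspace, here is a sketch under stated assumptions. The request numbers and structure layout are copied from this v7 patch and may not match any released kernel header. The names sud_config_v7, sud_checkpoint and sud_restore are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Request numbers as proposed in this v7 patch (assumption: unreleased). */
#define SUD_V7_PTRACE_SET_CONFIG 0x4210
#define SUD_V7_PTRACE_GET_CONFIG 0x4211

/* Userspace mirror of struct syscall_user_dispatch_config from the patch. */
struct sud_config_v7 {
	uint64_t mode;
	int8_t *selector;
	uint64_t offset;
	uint64_t len;
};

/*
 * Read the tracee's SUD configuration into *cfg.
 * The tracee is expected to be attached and ptrace-stopped.
 * Returns 0 on success or a negative errno.
 */
static int sud_checkpoint(pid_t pid, struct sud_config_v7 *cfg)
{
	/* addr carries the structure size, data points at the buffer. */
	if (ptrace(SUD_V7_PTRACE_GET_CONFIG, pid,
		   (void *)sizeof(*cfg), cfg) == -1)
		return -errno;
	return 0;
}

/* Write a previously saved configuration back into the tracee. */
static int sud_restore(pid_t pid, const struct sud_config_v7 *cfg)
{
	if (ptrace(SUD_V7_PTRACE_SET_CONFIG, pid,
		   (void *)sizeof(*cfg), (void *)cfg) == -1)
		return -errno;
	return 0;
}
```

A checkpointer would call sud_checkpoint() after seizing and stopping the task, store the result alongside the rest of the task image, and call sud_restore() on the recreated task before letting it run.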

Gregory Price (1):
ptrace,syscall_user_dispatch: checkpoint/restore support for SUD

.../admin-guide/syscall-user-dispatch.rst | 5 ++-
include/linux/syscall_user_dispatch.h | 18 +++++++++
include/uapi/linux/ptrace.h | 9 +++++
kernel/entry/syscall_user_dispatch.c | 39 +++++++++++++++++++
kernel/ptrace.c | 9 +++++
5 files changed, 79 insertions(+), 1 deletion(-)

--
2.39.0



2023-01-26 19:06:55

by Gregory Price

Subject: [PATCH v7 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD

Implement ptrace getter/setter interface for syscall user dispatch.

These prctl settings are presently write-only, making it impossible to
implement transparent checkpoint/restore via software like CRIU.

'on_dispatch' field is not exposed because it is a kernel-internal
only field that cannot be 'true' when returning to userland.

Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
.../admin-guide/syscall-user-dispatch.rst | 5 ++-
include/linux/syscall_user_dispatch.h | 18 +++++++++
include/uapi/linux/ptrace.h | 9 +++++
kernel/entry/syscall_user_dispatch.c | 39 +++++++++++++++++++
kernel/ptrace.c | 9 +++++
5 files changed, 79 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/syscall-user-dispatch.rst b/Documentation/admin-guide/syscall-user-dispatch.rst
index 60314953c728..a23ae21a1d5b 100644
--- a/Documentation/admin-guide/syscall-user-dispatch.rst
+++ b/Documentation/admin-guide/syscall-user-dispatch.rst
@@ -43,7 +43,10 @@ doesn't rely on any of the syscall ABI to make the filtering. It uses
only the syscall dispatcher address and the userspace key.

As the ABI of these intercepted syscalls is unknown to Linux, these
-syscalls are not instrumentable via ptrace or the syscall tracepoints.
+syscalls are not instrumentable via ptrace or the syscall tracepoints,
+however an interface to suspend, checkpoint, and restore syscall user
+dispatch configuration has been added to ptrace to assist userland
+checkpoint/restart software.

Interface
---------
diff --git a/include/linux/syscall_user_dispatch.h b/include/linux/syscall_user_dispatch.h
index a0ae443fb7df..5de2d64ace19 100644
--- a/include/linux/syscall_user_dispatch.h
+++ b/include/linux/syscall_user_dispatch.h
@@ -22,6 +22,12 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
#define clear_syscall_work_syscall_user_dispatch(tsk) \
clear_task_syscall_work(tsk, SYSCALL_USER_DISPATCH)

+int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
+ void __user *data);
+
+int syscall_user_dispatch_set_config(struct task_struct *task, unsigned long size,
+ void __user *data);
+
#else
struct syscall_user_dispatch {};

@@ -35,6 +41,18 @@ static inline void clear_syscall_work_syscall_user_dispatch(struct task_struct *
{
}

+static inline int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
+ void __user *data)
+{
+ return -EINVAL;
+}
+
+static inline int syscall_user_dispatch_set_config(struct task_struct *task, unsigned long size,
+ void __user *data)
+{
+ return -EINVAL;
+}
+
#endif /* CONFIG_GENERIC_ENTRY */

#endif /* _SYSCALL_USER_DISPATCH_H */
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index 195ae64a8c87..6d2f3b86f932 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -112,6 +112,15 @@ struct ptrace_rseq_configuration {
__u32 pad;
};

+#define PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG 0x4210
+#define PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG 0x4211
+struct syscall_user_dispatch_config {
+ __u64 mode;
+ __s8 *selector;
+ __u64 offset;
+ __u64 len;
+};
+
/*
* These values are stored in task->ptrace_message
* by ptrace_stop to describe the current syscall-stop.
diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c
index 0b6379adff6b..26217fcc1c90 100644
--- a/kernel/entry/syscall_user_dispatch.c
+++ b/kernel/entry/syscall_user_dispatch.c
@@ -106,3 +106,42 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,

return 0;
}
+
+int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
+ void __user *data)
+{
+ struct syscall_user_dispatch *sd = &task->syscall_dispatch;
+ struct syscall_user_dispatch_config config;
+
+ if (size != sizeof(struct syscall_user_dispatch_config))
+ return -EINVAL;
+
+ if (test_syscall_work(SYSCALL_USER_DISPATCH))
+ config.mode = PR_SYS_DISPATCH_ON;
+ else
+ config.mode = PR_SYS_DISPATCH_OFF;
+
+ config.offset = sd->offset;
+ config.len = sd->len;
+ config.selector = sd->selector;
+
+ if (copy_to_user(data, &config, sizeof(config)))
+ return -EFAULT;
+
+ return 0;
+}
+
+int syscall_user_dispatch_set_config(struct task_struct *task, unsigned long size,
+ void __user *data)
+{
+ struct syscall_user_dispatch_config config;
+
+ if (size != sizeof(struct syscall_user_dispatch_config))
+ return -EINVAL;
+
+ if (copy_from_user(&config, data, sizeof(config)))
+ return -EFAULT;
+
+ return set_syscall_user_dispatch(config.mode, config.offset, config.len,
+ config.selector);
+}
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 54482193e1ed..d99376532b56 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -32,6 +32,7 @@
#include <linux/compat.h>
#include <linux/sched/signal.h>
#include <linux/minmax.h>
+#include <linux/syscall_user_dispatch.h>

#include <asm/syscall.h> /* for syscall_get_* */

@@ -1259,6 +1260,14 @@ int ptrace_request(struct task_struct *child, long request,
break;
#endif

+ case PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG:
+ ret = syscall_user_dispatch_set_config(child, addr, datavp);
+ break;
+
+ case PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG:
+ ret = syscall_user_dispatch_get_config(child, addr, datavp);
+ break;
+
default:
break;
}
--
2.39.0


2023-01-26 19:16:21

by Oleg Nesterov

Subject: Re: [PATCH v7 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD

On 01/26, Gregory Price wrote:
>
> Implement ptrace getter/setter interface for syscall user dispatch.
>
> These prctl settings are presently write-only, making it impossible to
> implement transparent checkpoint/restore via software like CRIU.
>
> 'on_dispatch' field is not exposed because it is a kernel-internal
> only field that cannot be 'true' when returning to userland.
>
> Signed-off-by: Gregory Price <gregory.price@memverge.com>
> ---
> .../admin-guide/syscall-user-dispatch.rst | 5 ++-
> include/linux/syscall_user_dispatch.h | 18 +++++++++
> include/uapi/linux/ptrace.h | 9 +++++
> kernel/entry/syscall_user_dispatch.c | 39 +++++++++++++++++++
> kernel/ptrace.c | 9 +++++
> 5 files changed, 79 insertions(+), 1 deletion(-)

Reviewed-by: Oleg Nesterov <[email protected]>


2023-01-28 12:03:45

by kernel test robot

Subject: Re: [PATCH v7 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD

Hi Gregory,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on tip/core/entry v6.2-rc5 next-20230127]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Gregory-Price/ptrace-syscall_user_dispatch-checkpoint-restore-support-for-SUD/20230128-145101
patch link: https://lore.kernel.org/r/20230126190645.18341-2-gregory.price%40memverge.com
patch subject: [PATCH v7 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD
config: x86_64-rhel-8.3-syz (https://download.01.org/0day-ci/archive/20230128/[email protected]/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
# https://github.com/intel-lab-lkp/linux/commit/bc68df21f98617e74a8c5368a901041f89bdb17f
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Gregory-Price/ptrace-syscall_user_dispatch-checkpoint-restore-support-for-SUD/20230128-145101
git checkout bc68df21f98617e74a8c5368a901041f89bdb17f
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=x86_64 olddefconfig
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash kernel/entry/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

kernel/entry/syscall_user_dispatch.c: In function 'syscall_user_dispatch_get_config':
kernel/entry/syscall_user_dispatch.c:114:45: error: storage size of 'config' isn't known
114 | struct syscall_user_dispatch_config config;
| ^~~~~~
kernel/entry/syscall_user_dispatch.c:116:28: error: invalid application of 'sizeof' to incomplete type 'struct syscall_user_dispatch_config'
116 | if (size != sizeof(struct syscall_user_dispatch_config))
| ^~~~~~
>> kernel/entry/syscall_user_dispatch.c:114:45: warning: unused variable 'config' [-Wunused-variable]
114 | struct syscall_user_dispatch_config config;
| ^~~~~~
kernel/entry/syscall_user_dispatch.c: In function 'syscall_user_dispatch_set_config':
kernel/entry/syscall_user_dispatch.c:137:45: error: storage size of 'config' isn't known
137 | struct syscall_user_dispatch_config config;
| ^~~~~~
kernel/entry/syscall_user_dispatch.c:139:28: error: invalid application of 'sizeof' to incomplete type 'struct syscall_user_dispatch_config'
139 | if (size != sizeof(struct syscall_user_dispatch_config))
| ^~~~~~
kernel/entry/syscall_user_dispatch.c:137:45: warning: unused variable 'config' [-Wunused-variable]
137 | struct syscall_user_dispatch_config config;
| ^~~~~~
kernel/entry/syscall_user_dispatch.c:147:1: error: control reaches end of non-void function [-Werror=return-type]
147 | }
| ^
cc1: some warnings being treated as errors


vim +/config +114 kernel/entry/syscall_user_dispatch.c

109
110 int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
111 void __user *data)
112 {
113 struct syscall_user_dispatch *sd = &task->syscall_dispatch;
> 114 struct syscall_user_dispatch_config config;
115
116 if (size != sizeof(struct syscall_user_dispatch_config))
117 return -EINVAL;
118
119 if (test_syscall_work(SYSCALL_USER_DISPATCH))
120 config.mode = PR_SYS_DISPATCH_ON;
121 else
122 config.mode = PR_SYS_DISPATCH_OFF;
123
124 config.offset = sd->offset;
125 config.len = sd->len;
126 config.selector = sd->selector;
127
128 if (copy_to_user(data, &config, sizeof(config)))
129 return -EFAULT;
130
131 return 0;
132 }
133

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

2023-01-28 16:11:49

by kernel test robot

Subject: Re: [PATCH v7 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD

Hi Gregory,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on tip/core/entry v6.2-rc5 next-20230127]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Gregory-Price/ptrace-syscall_user_dispatch-checkpoint-restore-support-for-SUD/20230128-145101
patch link: https://lore.kernel.org/r/20230126190645.18341-2-gregory.price%40memverge.com
patch subject: [PATCH v7 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD
config: x86_64-randconfig-a013-20230123 (https://download.01.org/0day-ci/archive/20230129/[email protected]/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/bc68df21f98617e74a8c5368a901041f89bdb17f
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Gregory-Price/ptrace-syscall_user_dispatch-checkpoint-restore-support-for-SUD/20230128-145101
git checkout bc68df21f98617e74a8c5368a901041f89bdb17f
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

>> kernel/entry/syscall_user_dispatch.c:114:38: error: variable has incomplete type 'struct syscall_user_dispatch_config'
struct syscall_user_dispatch_config config;
^
kernel/entry/syscall_user_dispatch.c:114:9: note: forward declaration of 'struct syscall_user_dispatch_config'
struct syscall_user_dispatch_config config;
^
>> kernel/entry/syscall_user_dispatch.c:116:14: error: invalid application of 'sizeof' to an incomplete type 'struct syscall_user_dispatch_config'
if (size != sizeof(struct syscall_user_dispatch_config))
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/entry/syscall_user_dispatch.c:114:9: note: forward declaration of 'struct syscall_user_dispatch_config'
struct syscall_user_dispatch_config config;
^
kernel/entry/syscall_user_dispatch.c:137:38: error: variable has incomplete type 'struct syscall_user_dispatch_config'
struct syscall_user_dispatch_config config;
^
kernel/entry/syscall_user_dispatch.c:137:9: note: forward declaration of 'struct syscall_user_dispatch_config'
struct syscall_user_dispatch_config config;
^
kernel/entry/syscall_user_dispatch.c:139:14: error: invalid application of 'sizeof' to an incomplete type 'struct syscall_user_dispatch_config'
if (size != sizeof(struct syscall_user_dispatch_config))
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/entry/syscall_user_dispatch.c:137:9: note: forward declaration of 'struct syscall_user_dispatch_config'
struct syscall_user_dispatch_config config;
^
4 errors generated.


vim +114 kernel/entry/syscall_user_dispatch.c

109
110 int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
111 void __user *data)
112 {
113 struct syscall_user_dispatch *sd = &task->syscall_dispatch;
> 114 struct syscall_user_dispatch_config config;
115
> 116 if (size != sizeof(struct syscall_user_dispatch_config))
117 return -EINVAL;
118
119 if (test_syscall_work(SYSCALL_USER_DISPATCH))
120 config.mode = PR_SYS_DISPATCH_ON;
121 else
122 config.mode = PR_SYS_DISPATCH_OFF;
123
124 config.offset = sd->offset;
125 config.len = sd->len;
126 config.selector = sd->selector;
127
128 if (copy_to_user(data, &config, sizeof(config)))
129 return -EFAULT;
130
131 return 0;
132 }
133

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

2023-01-28 19:49:59

by kernel test robot

Subject: Re: [PATCH v7 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD

Hi Gregory,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on tip/core/entry v6.2-rc5 next-20230127]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Gregory-Price/ptrace-syscall_user_dispatch-checkpoint-restore-support-for-SUD/20230128-145101
patch link: https://lore.kernel.org/r/20230126190645.18341-2-gregory.price%40memverge.com
patch subject: [PATCH v7 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD
config: x86_64-rhel-8.3-kselftests (https://download.01.org/0day-ci/archive/20230129/[email protected]/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
# https://github.com/intel-lab-lkp/linux/commit/bc68df21f98617e74a8c5368a901041f89bdb17f
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Gregory-Price/ptrace-syscall_user_dispatch-checkpoint-restore-support-for-SUD/20230128-145101
git checkout bc68df21f98617e74a8c5368a901041f89bdb17f
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=x86_64 olddefconfig
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

kernel/entry/syscall_user_dispatch.c: In function 'syscall_user_dispatch_get_config':
>> kernel/entry/syscall_user_dispatch.c:114:45: error: storage size of 'config' isn't known
114 | struct syscall_user_dispatch_config config;
| ^~~~~~
>> kernel/entry/syscall_user_dispatch.c:116:28: error: invalid application of 'sizeof' to incomplete type 'struct syscall_user_dispatch_config'
116 | if (size != sizeof(struct syscall_user_dispatch_config))
| ^~~~~~
kernel/entry/syscall_user_dispatch.c:114:45: warning: unused variable 'config' [-Wunused-variable]
114 | struct syscall_user_dispatch_config config;
| ^~~~~~
kernel/entry/syscall_user_dispatch.c: In function 'syscall_user_dispatch_set_config':
kernel/entry/syscall_user_dispatch.c:137:45: error: storage size of 'config' isn't known
137 | struct syscall_user_dispatch_config config;
| ^~~~~~
kernel/entry/syscall_user_dispatch.c:139:28: error: invalid application of 'sizeof' to incomplete type 'struct syscall_user_dispatch_config'
139 | if (size != sizeof(struct syscall_user_dispatch_config))
| ^~~~~~
kernel/entry/syscall_user_dispatch.c:137:45: warning: unused variable 'config' [-Wunused-variable]
137 | struct syscall_user_dispatch_config config;
| ^~~~~~
kernel/entry/syscall_user_dispatch.c:147:1: error: control reaches end of non-void function [-Werror=return-type]
147 | }
| ^
cc1: some warnings being treated as errors


vim +114 kernel/entry/syscall_user_dispatch.c

109
110 int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
111 void __user *data)
112 {
113 struct syscall_user_dispatch *sd = &task->syscall_dispatch;
> 114 struct syscall_user_dispatch_config config;
115
> 116 if (size != sizeof(struct syscall_user_dispatch_config))
117 return -EINVAL;
118
119 if (test_syscall_work(SYSCALL_USER_DISPATCH))
120 config.mode = PR_SYS_DISPATCH_ON;
121 else
122 config.mode = PR_SYS_DISPATCH_OFF;
123
124 config.offset = sd->offset;
125 config.len = sd->len;
126 config.selector = sd->selector;
127
128 if (copy_to_user(data, &config, sizeof(config)))
129 return -EFAULT;
130
131 return 0;
132 }
133

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests