2023-04-07 17:20:35

by Gregory Price

[permalink] [raw]
Subject: [PATCH v16 0/4] Checkpoint Support for Syscall User Dispatch

This appears to be in a good state now. Not sure which branch
I should seek to have this pulled through, or what else needs to
be done to push this forward.

v16: pick up review tags, mild comment change

v15: drop task_access_ok variant, instead prefer to just untag the
selector address when validating the user pointer.

v14: implement task_access_ok variant for cross-task pointer checks

v13: sizeof consistency and cosmetic changes in patch 2

v12: split test into its own patch
change from padding a u8 to using a u64
casting issues
checkpatch.pl

[truncating version history]

Syscall user dispatch makes it possible to cleanly intercept system
calls from user-land. However, most transparent checkpoint software
presently leverages some combination of ptrace and system call
injection to place software in a ready-to-checkpoint state.

If Syscall User Dispatch is enabled at the time of being quiesced,
injected system calls will subsequently be interposed upon and
dispatched to the task's signal handler.

Patch summary:

- Refactor configuration setting interface to operate on a task
rather than current, so the set and error paths can be consolidated

- Untag the selector address when being set in order to enable an
untagged tracer to set a tagged tracee's syscall dispatch selector.
Otherwise an untagged tracer will always fail to set a tagged address.

- Implement a getter interface for Syscall User Dispatch config info.
To resume successfully, the checkpoint/resume software has to
save and restore this information. Presently this configuration
is write-only, with no way for C/R software to save it.

This was done in ptrace because syscall user dispatch is not part of
uapi. The syscall_user_dispatch_config structure was added to the
ptrace exports.

- Selftest for the new feature


Gregory Price (4):
syscall_user_dispatch: helper function to operate on given task
syscall user dispatch: untag selector addresses before access_ok
ptrace,syscall_user_dispatch: checkpoint/restore support for SUD
selftest,ptrace: Add selftest for syscall user dispatch config api

.../admin-guide/syscall-user-dispatch.rst | 4 +
include/linux/syscall_user_dispatch.h | 18 +++++
include/uapi/linux/ptrace.h | 29 ++++++++
kernel/entry/syscall_user_dispatch.c | 74 ++++++++++++++++---
kernel/ptrace.c | 9 +++
tools/testing/selftests/ptrace/.gitignore | 1 +
tools/testing/selftests/ptrace/Makefile | 2 +-
tools/testing/selftests/ptrace/get_set_sud.c | 72 ++++++++++++++++++
8 files changed, 199 insertions(+), 10 deletions(-)
create mode 100644 tools/testing/selftests/ptrace/get_set_sud.c

--
2.39.1


2023-04-07 17:20:55

by Gregory Price

[permalink] [raw]
Subject: [PATCH v16 1/4] syscall_user_dispatch: helper function to operate on given task

Preparatory patch ahead of set/get interfaces which will allow a
ptrace to get/set the syscall user dispatch configuration of a task.

This will simplify the set interface and consolidates error paths.

Signed-off-by: Gregory Price <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
---
kernel/entry/syscall_user_dispatch.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c
index 0b6379adff6b..22396b234854 100644
--- a/kernel/entry/syscall_user_dispatch.c
+++ b/kernel/entry/syscall_user_dispatch.c
@@ -68,8 +68,9 @@ bool syscall_user_dispatch(struct pt_regs *regs)
return true;
}

-int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
- unsigned long len, char __user *selector)
+static int task_set_syscall_user_dispatch(struct task_struct *task, unsigned long mode,
+ unsigned long offset, unsigned long len,
+ char __user *selector)
{
switch (mode) {
case PR_SYS_DISPATCH_OFF:
@@ -94,15 +95,21 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
return -EINVAL;
}

- current->syscall_dispatch.selector = selector;
- current->syscall_dispatch.offset = offset;
- current->syscall_dispatch.len = len;
- current->syscall_dispatch.on_dispatch = false;
+ task->syscall_dispatch.selector = selector;
+ task->syscall_dispatch.offset = offset;
+ task->syscall_dispatch.len = len;
+ task->syscall_dispatch.on_dispatch = false;

if (mode == PR_SYS_DISPATCH_ON)
- set_syscall_work(SYSCALL_USER_DISPATCH);
+ set_task_syscall_work(task, SYSCALL_USER_DISPATCH);
else
- clear_syscall_work(SYSCALL_USER_DISPATCH);
+ clear_task_syscall_work(task, SYSCALL_USER_DISPATCH);

return 0;
}
+
+int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
+ unsigned long len, char __user *selector)
+{
+ return task_set_syscall_user_dispatch(current, mode, offset, len, selector);
+}
--
2.39.1

2023-04-07 17:21:09

by Gregory Price

[permalink] [raw]
Subject: [PATCH v16 2/4] syscall user dispatch: untag selector addresses before access_ok

This is a preparatory patch for enabling checkpoint/restart of tasks
utilizing syscall user dispatch via ptrace.

To support checkpoint/restart, ptrace must be able to set the selector
of the tracee. The selector is a user pointer that may be subject to
memory tagging extensions on some architectures (namely ARM MTE).

access_ok will clear memory tags for tagged addresses on tasks where
memory tagging is enabled. However, to allow ptrace to set a task's
selector when tracer and tracee are not both tagged or untagged,
the selector address must be untagged when calling access_ok.

Since access_ok utilizes current to determine whether or not to untag
an address, an untagged tracer will always fail to restore a tagged
address in a tagged tracee. This patch will resolve this issue.

The result of this is that a tagged tracer may be capable of setting
an invalid address, which will cause the tracee to SIGSEGV on next
syscall. This is equivalent to the tracee setting a bad selector
address (such as selector=0x1). This is preferable to the alternative
of creating a task_access_ok variant, and is consistent with other
operations which change tracee pointers via ptrace.

For more information, see:
https://lore.kernel.org/all/[email protected]/

Signed-off-by: Gregory Price <[email protected]>
Acked-by: Oleg Nesterov <[email protected]>
Suggested-by: Catalin Marinas <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
---
kernel/entry/syscall_user_dispatch.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c
index 22396b234854..424f24350f8b 100644
--- a/kernel/entry/syscall_user_dispatch.c
+++ b/kernel/entry/syscall_user_dispatch.c
@@ -87,7 +87,14 @@ static int task_set_syscall_user_dispatch(struct task_struct *task, unsigned lon
if (offset && offset + len <= offset)
return -EINVAL;

- if (selector && !access_ok(selector, sizeof(*selector)))
+ /*
+ * access_ok will clear memory tags for tagged addresses on tasks where
+ * memory tagging is enabled. To enable a tracer to set a tracee's
+ * selector not in the same tagging state, the selector address must be
+ * untagged for access_ok, otherwise an untagged tracer will always fail
+ * to set a tagged tracee's selector.
+ */
+ if (selector && !access_ok(untagged_addr(selector), sizeof(*selector)))
return -EFAULT;

break;
--
2.39.1

2023-04-07 17:21:36

by Gregory Price

[permalink] [raw]
Subject: [PATCH v16 4/4] selftest,ptrace: Add selftest for syscall user dispatch config api

Validate that the following new ptrace requests work as expected

* PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG
- returns the contents of task->syscall_dispatch if enabled

* PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG
- sets the contents of task->syscall_dispatch

Signed-off-by: Gregory Price <[email protected]>
---
tools/testing/selftests/ptrace/.gitignore | 1 +
tools/testing/selftests/ptrace/Makefile | 2 +-
tools/testing/selftests/ptrace/get_set_sud.c | 72 ++++++++++++++++++++
3 files changed, 74 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/ptrace/get_set_sud.c

diff --git a/tools/testing/selftests/ptrace/.gitignore b/tools/testing/selftests/ptrace/.gitignore
index 792318aaa30c..b7dde152e75a 100644
--- a/tools/testing/selftests/ptrace/.gitignore
+++ b/tools/testing/selftests/ptrace/.gitignore
@@ -1,4 +1,5 @@
# SPDX-License-Identifier: GPL-2.0-only
get_syscall_info
+get_set_sud
peeksiginfo
vmaccess
diff --git a/tools/testing/selftests/ptrace/Makefile b/tools/testing/selftests/ptrace/Makefile
index 2f1f532c39db..33a36b73bcb9 100644
--- a/tools/testing/selftests/ptrace/Makefile
+++ b/tools/testing/selftests/ptrace/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0-only
CFLAGS += -std=c99 -pthread -iquote../../../../include/uapi -Wall

-TEST_GEN_PROGS := get_syscall_info peeksiginfo vmaccess
+TEST_GEN_PROGS := get_syscall_info peeksiginfo vmaccess get_set_sud

include ../lib.mk
diff --git a/tools/testing/selftests/ptrace/get_set_sud.c b/tools/testing/selftests/ptrace/get_set_sud.c
new file mode 100644
index 000000000000..5297b10d25c3
--- /dev/null
+++ b/tools/testing/selftests/ptrace/get_set_sud.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include "../kselftest_harness.h"
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <sys/wait.h>
+#include <sys/syscall.h>
+#include <sys/prctl.h>
+
+#include "linux/ptrace.h"
+
+static int sys_ptrace(int request, pid_t pid, void *addr, void *data)
+{
+ return syscall(SYS_ptrace, request, pid, addr, data);
+}
+
+TEST(get_set_sud)
+{
+ struct ptrace_sud_config config;
+ pid_t child;
+ int ret = 0;
+ int status;
+
+ child = fork();
+ ASSERT_GE(child, 0);
+ if (child == 0) {
+ ASSERT_EQ(0, sys_ptrace(PTRACE_TRACEME, 0, 0, 0)) {
+ TH_LOG("PTRACE_TRACEME: %m");
+ }
+ kill(getpid(), SIGSTOP);
+ _exit(1);
+ }
+
+ waitpid(child, &status, 0);
+
+ memset(&config, 0xff, sizeof(config));
+ config.mode = PR_SYS_DISPATCH_ON;
+
+ ret = sys_ptrace(PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG, child,
+ (void *)sizeof(config), &config);
+
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(config.mode, PR_SYS_DISPATCH_OFF);
+ ASSERT_EQ(config.selector, 0);
+ ASSERT_EQ(config.offset, 0);
+ ASSERT_EQ(config.len, 0);
+
+ config.mode = PR_SYS_DISPATCH_ON;
+ config.selector = 0;
+ config.offset = 0x400000;
+ config.len = 0x1000;
+
+ ret = sys_ptrace(PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG, child,
+ (void *)sizeof(config), &config);
+
+ ASSERT_EQ(ret, 0);
+
+ memset(&config, 1, sizeof(config));
+ ret = sys_ptrace(PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG, child,
+ (void *)sizeof(config), &config);
+
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(config.mode, PR_SYS_DISPATCH_ON);
+ ASSERT_EQ(config.selector, 0);
+ ASSERT_EQ(config.offset, 0x400000);
+ ASSERT_EQ(config.len, 0x1000);
+
+ kill(child, SIGKILL);
+}
+
+TEST_HARNESS_MAIN
--
2.39.1

2023-04-07 17:22:16

by Gregory Price

[permalink] [raw]
Subject: [PATCH v16 3/4] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD

Implement ptrace getter/setter interface for syscall user dispatch.

These prctl settings are presently write-only, making it impossible to
implement transparent checkpoint/restore via software like CRIU.

'on_dispatch' field is not exposed because it is a kernel-internal
only field that cannot be 'true' when returning to userland.

Signed-off-by: Gregory Price <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
---
.../admin-guide/syscall-user-dispatch.rst | 4 ++
include/linux/syscall_user_dispatch.h | 18 ++++++++
include/uapi/linux/ptrace.h | 29 +++++++++++++
kernel/entry/syscall_user_dispatch.c | 42 +++++++++++++++++++
kernel/ptrace.c | 9 ++++
5 files changed, 102 insertions(+)

diff --git a/Documentation/admin-guide/syscall-user-dispatch.rst b/Documentation/admin-guide/syscall-user-dispatch.rst
index 60314953c728..f7648c08297e 100644
--- a/Documentation/admin-guide/syscall-user-dispatch.rst
+++ b/Documentation/admin-guide/syscall-user-dispatch.rst
@@ -73,6 +73,10 @@ thread-wide, without the need to invoke the kernel directly. selector
can be set to SYSCALL_DISPATCH_FILTER_ALLOW or SYSCALL_DISPATCH_FILTER_BLOCK.
Any other value should terminate the program with a SIGSYS.

+Additionally, a task's syscall user dispatch configuration can be peeked
+and poked via the PTRACE_(GET|SET)_SYSCALL_USER_DISPATCH_CONFIG ptrace
+requests. This is useful for checkpoint/restart software.
+
Security Notes
--------------

diff --git a/include/linux/syscall_user_dispatch.h b/include/linux/syscall_user_dispatch.h
index a0ae443fb7df..641ca8880995 100644
--- a/include/linux/syscall_user_dispatch.h
+++ b/include/linux/syscall_user_dispatch.h
@@ -22,6 +22,12 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
#define clear_syscall_work_syscall_user_dispatch(tsk) \
clear_task_syscall_work(tsk, SYSCALL_USER_DISPATCH)

+int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
+ void __user *data);
+
+int syscall_user_dispatch_set_config(struct task_struct *task, unsigned long size,
+ void __user *data);
+
#else
struct syscall_user_dispatch {};

@@ -35,6 +41,18 @@ static inline void clear_syscall_work_syscall_user_dispatch(struct task_struct *
{
}

+static inline int syscall_user_dispatch_get_config(struct task_struct *task,
+ unsigned long size, void __user *data)
+{
+ return -EINVAL;
+}
+
+static inline int syscall_user_dispatch_set_config(struct task_struct *task,
+ unsigned long size, void __user *data)
+{
+ return -EINVAL;
+}
+
#endif /* CONFIG_GENERIC_ENTRY */

#endif /* _SYSCALL_USER_DISPATCH_H */
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index 195ae64a8c87..1e77b02344c3 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -112,6 +112,35 @@ struct ptrace_rseq_configuration {
__u32 pad;
};

+#define PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG 0x4210
+#define PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG 0x4211
+
+/*
+ * struct ptrace_sud_config - Per-task configuration for SUD
+ * @mode: One of PR_SYS_DISPATCH_ON or PR_SYS_DISPATCH_OFF
+ * @selector: Tracee's user virtual address of SUD selector
+ * @offset: SUD exclusion area (virtual address)
+ * @len: Length of SUD exclusion area
+ *
+ * Used to get/set the syscall user dispatch configuration for tracee.
+ * process. Selector is optional (may be NULL), and if invalid will produce
+ * a SIGSEGV in the tracee upon first access.
+ *
+ * If mode is PR_SYS_DISPATCH_ON, syscall dispatch will be enabled. If
+ * PR_SYS_DISPATCH_OFF, syscall dispatch will be disabled and all other
+ * parameters must be 0. The value in *selector (if not null), also determines
+ * whether syscall dispatch will occur.
+ *
+ * The SUD Exclusion area described by offset/len is the virtual address space
+ * from which syscalls will not produce a user dispatch.
+ */
+struct ptrace_sud_config {
+ __u64 mode;
+ __u64 selector;
+ __u64 offset;
+ __u64 len;
+};
+
/*
* These values are stored in task->ptrace_message
* by ptrace_stop to describe the current syscall-stop.
diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c
index 424f24350f8b..3af4e73b62b4 100644
--- a/kernel/entry/syscall_user_dispatch.c
+++ b/kernel/entry/syscall_user_dispatch.c
@@ -4,6 +4,7 @@
*/
#include <linux/sched.h>
#include <linux/prctl.h>
+#include <linux/ptrace.h>
#include <linux/syscall_user_dispatch.h>
#include <linux/uaccess.h>
#include <linux/signal.h>
@@ -120,3 +121,44 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
{
return task_set_syscall_user_dispatch(current, mode, offset, len, selector);
}
+
+int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
+ void __user *data)
+{
+ struct syscall_user_dispatch *sd = &task->syscall_dispatch;
+ struct ptrace_sud_config cfg;
+
+ if (size != sizeof(cfg))
+ return -EINVAL;
+
+ if (test_task_syscall_work(task, SYSCALL_USER_DISPATCH))
+ cfg.mode = PR_SYS_DISPATCH_ON;
+ else
+ cfg.mode = PR_SYS_DISPATCH_OFF;
+
+ cfg.offset = sd->offset;
+ cfg.len = sd->len;
+ cfg.selector = (__u64)(uintptr_t)sd->selector;
+
+ if (copy_to_user(data, &cfg, sizeof(cfg)))
+ return -EFAULT;
+
+ return 0;
+}
+
+int syscall_user_dispatch_set_config(struct task_struct *task, unsigned long size,
+ void __user *data)
+{
+ int rc;
+ struct ptrace_sud_config cfg;
+
+ if (size != sizeof(cfg))
+ return -EINVAL;
+
+ if (copy_from_user(&cfg, data, sizeof(cfg)))
+ return -EFAULT;
+
+ rc = task_set_syscall_user_dispatch(task, cfg.mode, cfg.offset, cfg.len,
+ (char __user *)(uintptr_t)cfg.selector);
+ return rc;
+}
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 54482193e1ed..d99376532b56 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -32,6 +32,7 @@
#include <linux/compat.h>
#include <linux/sched/signal.h>
#include <linux/minmax.h>
+#include <linux/syscall_user_dispatch.h>

#include <asm/syscall.h> /* for syscall_get_* */

@@ -1259,6 +1260,14 @@ int ptrace_request(struct task_struct *child, long request,
break;
#endif

+ case PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG:
+ ret = syscall_user_dispatch_set_config(child, addr, datavp);
+ break;
+
+ case PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG:
+ ret = syscall_user_dispatch_get_config(child, addr, datavp);
+ break;
+
default:
break;
}
--
2.39.1

Subject: [tip: core/entry] selftest, ptrace: Add selftest for syscall user dispatch config api

The following commit has been merged into the core/entry branch of tip:

Commit-ID: 8c8fa605f7b8b6df3e6fb280a74cff8d7374a7b7
Gitweb: https://git.kernel.org/tip/8c8fa605f7b8b6df3e6fb280a74cff8d7374a7b7
Author: Gregory Price <[email protected]>
AuthorDate: Fri, 07 Apr 2023 13:18:34 -04:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Sun, 16 Apr 2023 14:23:08 +02:00

selftest, ptrace: Add selftest for syscall user dispatch config api

Validate that the following new ptrace requests work as expected

* PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG
returns the contents of task->syscall_dispatch

* PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG
sets the contents of task->syscall_dispatch

Signed-off-by: Gregory Price <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

---
tools/testing/selftests/ptrace/.gitignore | 1 +-
tools/testing/selftests/ptrace/Makefile | 2 +-
tools/testing/selftests/ptrace/get_set_sud.c | 72 +++++++++++++++++++-
3 files changed, 74 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/ptrace/get_set_sud.c

diff --git a/tools/testing/selftests/ptrace/.gitignore b/tools/testing/selftests/ptrace/.gitignore
index 792318a..b7dde15 100644
--- a/tools/testing/selftests/ptrace/.gitignore
+++ b/tools/testing/selftests/ptrace/.gitignore
@@ -1,4 +1,5 @@
# SPDX-License-Identifier: GPL-2.0-only
get_syscall_info
+get_set_sud
peeksiginfo
vmaccess
diff --git a/tools/testing/selftests/ptrace/Makefile b/tools/testing/selftests/ptrace/Makefile
index 96ffa94..1c63174 100644
--- a/tools/testing/selftests/ptrace/Makefile
+++ b/tools/testing/selftests/ptrace/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0-only
CFLAGS += -std=c99 -pthread -Wall $(KHDR_INCLUDES)

-TEST_GEN_PROGS := get_syscall_info peeksiginfo vmaccess
+TEST_GEN_PROGS := get_syscall_info peeksiginfo vmaccess get_set_sud

include ../lib.mk
diff --git a/tools/testing/selftests/ptrace/get_set_sud.c b/tools/testing/selftests/ptrace/get_set_sud.c
new file mode 100644
index 0000000..5297b10
--- /dev/null
+++ b/tools/testing/selftests/ptrace/get_set_sud.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include "../kselftest_harness.h"
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <sys/wait.h>
+#include <sys/syscall.h>
+#include <sys/prctl.h>
+
+#include "linux/ptrace.h"
+
+static int sys_ptrace(int request, pid_t pid, void *addr, void *data)
+{
+ return syscall(SYS_ptrace, request, pid, addr, data);
+}
+
+TEST(get_set_sud)
+{
+ struct ptrace_sud_config config;
+ pid_t child;
+ int ret = 0;
+ int status;
+
+ child = fork();
+ ASSERT_GE(child, 0);
+ if (child == 0) {
+ ASSERT_EQ(0, sys_ptrace(PTRACE_TRACEME, 0, 0, 0)) {
+ TH_LOG("PTRACE_TRACEME: %m");
+ }
+ kill(getpid(), SIGSTOP);
+ _exit(1);
+ }
+
+ waitpid(child, &status, 0);
+
+ memset(&config, 0xff, sizeof(config));
+ config.mode = PR_SYS_DISPATCH_ON;
+
+ ret = sys_ptrace(PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG, child,
+ (void *)sizeof(config), &config);
+
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(config.mode, PR_SYS_DISPATCH_OFF);
+ ASSERT_EQ(config.selector, 0);
+ ASSERT_EQ(config.offset, 0);
+ ASSERT_EQ(config.len, 0);
+
+ config.mode = PR_SYS_DISPATCH_ON;
+ config.selector = 0;
+ config.offset = 0x400000;
+ config.len = 0x1000;
+
+ ret = sys_ptrace(PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG, child,
+ (void *)sizeof(config), &config);
+
+ ASSERT_EQ(ret, 0);
+
+ memset(&config, 1, sizeof(config));
+ ret = sys_ptrace(PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG, child,
+ (void *)sizeof(config), &config);
+
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(config.mode, PR_SYS_DISPATCH_ON);
+ ASSERT_EQ(config.selector, 0);
+ ASSERT_EQ(config.offset, 0x400000);
+ ASSERT_EQ(config.len, 0x1000);
+
+ kill(child, SIGKILL);
+}
+
+TEST_HARNESS_MAIN

Subject: [tip: core/entry] ptrace: Provide set/get interface for syscall user dispatch

The following commit has been merged into the core/entry branch of tip:

Commit-ID: 3f67987cdc09778e75098f9f5168832f8f8e1f1c
Gitweb: https://git.kernel.org/tip/3f67987cdc09778e75098f9f5168832f8f8e1f1c
Author: Gregory Price <[email protected]>
AuthorDate: Fri, 07 Apr 2023 13:18:33 -04:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Sun, 16 Apr 2023 14:23:07 +02:00

ptrace: Provide set/get interface for syscall user dispatch

The syscall user dispatch configuration can only be set by the task itself,
but lacks a ptrace set/get interface which makes it impossible to implement
checkpoint/restore for it.

Add the required ptrace requests and the get/set functions in the syscall
user dispatch code to make that possible.

Signed-off-by: Gregory Price <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

---
Documentation/admin-guide/syscall-user-dispatch.rst | 4 +-
include/linux/syscall_user_dispatch.h | 18 +++++-
include/uapi/linux/ptrace.h | 30 +++++++++-
kernel/entry/syscall_user_dispatch.c | 40 ++++++++++++-
kernel/ptrace.c | 9 +++-
5 files changed, 101 insertions(+)

diff --git a/Documentation/admin-guide/syscall-user-dispatch.rst b/Documentation/admin-guide/syscall-user-dispatch.rst
index 6031495..e3cfffe 100644
--- a/Documentation/admin-guide/syscall-user-dispatch.rst
+++ b/Documentation/admin-guide/syscall-user-dispatch.rst
@@ -73,6 +73,10 @@ thread-wide, without the need to invoke the kernel directly. selector
can be set to SYSCALL_DISPATCH_FILTER_ALLOW or SYSCALL_DISPATCH_FILTER_BLOCK.
Any other value should terminate the program with a SIGSYS.

+Additionally, a tasks syscall user dispatch configuration can be peeked
+and poked via the PTRACE_(GET|SET)_SYSCALL_USER_DISPATCH_CONFIG ptrace
+requests. This is useful for checkpoint/restart software.
+
Security Notes
--------------

diff --git a/include/linux/syscall_user_dispatch.h b/include/linux/syscall_user_dispatch.h
index a0ae443..641ca88 100644
--- a/include/linux/syscall_user_dispatch.h
+++ b/include/linux/syscall_user_dispatch.h
@@ -22,6 +22,12 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
#define clear_syscall_work_syscall_user_dispatch(tsk) \
clear_task_syscall_work(tsk, SYSCALL_USER_DISPATCH)

+int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
+ void __user *data);
+
+int syscall_user_dispatch_set_config(struct task_struct *task, unsigned long size,
+ void __user *data);
+
#else
struct syscall_user_dispatch {};

@@ -35,6 +41,18 @@ static inline void clear_syscall_work_syscall_user_dispatch(struct task_struct *
{
}

+static inline int syscall_user_dispatch_get_config(struct task_struct *task,
+ unsigned long size, void __user *data)
+{
+ return -EINVAL;
+}
+
+static inline int syscall_user_dispatch_set_config(struct task_struct *task,
+ unsigned long size, void __user *data)
+{
+ return -EINVAL;
+}
+
#endif /* CONFIG_GENERIC_ENTRY */

#endif /* _SYSCALL_USER_DISPATCH_H */
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index 195ae64..72c038f 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -112,6 +112,36 @@ struct ptrace_rseq_configuration {
__u32 pad;
};

+#define PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG 0x4210
+#define PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG 0x4211
+
+/*
+ * struct ptrace_sud_config - Per-task configuration for Syscall User Dispatch
+ * @mode: One of PR_SYS_DISPATCH_ON or PR_SYS_DISPATCH_OFF
+ * @selector: Tracees user virtual address of SUD selector
+ * @offset: SUD exclusion area (virtual address)
+ * @len: Length of SUD exclusion area
+ *
+ * Used to get/set the syscall user dispatch configuration for a tracee.
+ * Selector is optional (may be NULL), and if invalid will produce
+ * a SIGSEGV in the tracee upon first access.
+ *
+ * If mode is PR_SYS_DISPATCH_ON, syscall dispatch will be enabled. If
+ * PR_SYS_DISPATCH_OFF, syscall dispatch will be disabled and all other
+ * parameters must be 0. The value in *selector (if not null), also determines
+ * whether syscall dispatch will occur.
+ *
+ * The Syscall User Dispatch Exclusion area described by offset/len is the
+ * virtual address space from which syscalls will not produce a user
+ * dispatch.
+ */
+struct ptrace_sud_config {
+ __u64 mode;
+ __u64 selector;
+ __u64 offset;
+ __u64 len;
+};
+
/*
* These values are stored in task->ptrace_message
* by ptrace_stop to describe the current syscall-stop.
diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c
index 7f2add4..5340c5a 100644
--- a/kernel/entry/syscall_user_dispatch.c
+++ b/kernel/entry/syscall_user_dispatch.c
@@ -4,6 +4,7 @@
*/
#include <linux/sched.h>
#include <linux/prctl.h>
+#include <linux/ptrace.h>
#include <linux/syscall_user_dispatch.h>
#include <linux/uaccess.h>
#include <linux/signal.h>
@@ -122,3 +123,42 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
{
return task_set_syscall_user_dispatch(current, mode, offset, len, selector);
}
+
+int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
+ void __user *data)
+{
+ struct syscall_user_dispatch *sd = &task->syscall_dispatch;
+ struct ptrace_sud_config cfg;
+
+ if (size != sizeof(cfg))
+ return -EINVAL;
+
+ if (test_task_syscall_work(task, SYSCALL_USER_DISPATCH))
+ cfg.mode = PR_SYS_DISPATCH_ON;
+ else
+ cfg.mode = PR_SYS_DISPATCH_OFF;
+
+ cfg.offset = sd->offset;
+ cfg.len = sd->len;
+ cfg.selector = (__u64)(uintptr_t)sd->selector;
+
+ if (copy_to_user(data, &cfg, sizeof(cfg)))
+ return -EFAULT;
+
+ return 0;
+}
+
+int syscall_user_dispatch_set_config(struct task_struct *task, unsigned long size,
+ void __user *data)
+{
+ struct ptrace_sud_config cfg;
+
+ if (size != sizeof(cfg))
+ return -EINVAL;
+
+ if (copy_from_user(&cfg, data, sizeof(cfg)))
+ return -EFAULT;
+
+ return task_set_syscall_user_dispatch(task, cfg.mode, cfg.offset, cfg.len,
+ (char __user *)(uintptr_t)cfg.selector);
+}
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 0786450..443057b 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -32,6 +32,7 @@
#include <linux/compat.h>
#include <linux/sched/signal.h>
#include <linux/minmax.h>
+#include <linux/syscall_user_dispatch.h>

#include <asm/syscall.h> /* for syscall_get_* */

@@ -1259,6 +1260,14 @@ int ptrace_request(struct task_struct *child, long request,
break;
#endif

+ case PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG:
+ ret = syscall_user_dispatch_set_config(child, addr, datavp);
+ break;
+
+ case PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG:
+ ret = syscall_user_dispatch_get_config(child, addr, datavp);
+ break;
+
default:
break;
}

Subject: [tip: core/entry] syscall_user_dispatch: Untag selector address before access_ok()

The following commit has been merged into the core/entry branch of tip:

Commit-ID: 463b7715e7ce367fce89769c5d85e31595715ee1
Gitweb: https://git.kernel.org/tip/463b7715e7ce367fce89769c5d85e31595715ee1
Author: Gregory Price <[email protected]>
AuthorDate: Fri, 07 Apr 2023 13:18:32 -04:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Sun, 16 Apr 2023 14:23:07 +02:00

syscall_user_dispatch: Untag selector address before access_ok()

To support checkpoint/restart, ptrace must be able to set the selector
of the tracee. The selector is a user pointer that may be subject to
memory tagging extensions on some architectures (namely ARM MTE).

access_ok() clears memory tags for tagged addresses if the current task has
memory tagging enabled.

This obviously fails when ptrace modifies the selector of a tracee when
tracer and tracee do not have the same memory tagging enabled state.

Solve this by untagging the selector address before handing it to
access_ok(), like other ptrace functions which modify tracee pointers do.

Obviously a tracer can set an invalid selector address for the tracee, but
that's independent of tagging and a general capability of the tracer.

Suggested-by: Catalin Marinas <[email protected]>
Signed-off-by: Gregory Price <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Acked-by: Oleg Nesterov <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/r/[email protected]

---
kernel/entry/syscall_user_dispatch.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c
index 22396b2..7f2add4 100644
--- a/kernel/entry/syscall_user_dispatch.c
+++ b/kernel/entry/syscall_user_dispatch.c
@@ -87,7 +87,16 @@ static int task_set_syscall_user_dispatch(struct task_struct *task, unsigned lon
if (offset && offset + len <= offset)
return -EINVAL;

- if (selector && !access_ok(selector, sizeof(*selector)))
+ /*
+ * access_ok() will clear memory tags for tagged addresses
+ * if current has memory tagging enabled.
+
+ * To enable a tracer to set a tracees selector the
+ * selector address must be untagged for access_ok(),
+ * otherwise an untagged tracer will always fail to set a
+ * tagged tracees selector.
+ */
+ if (selector && !access_ok(untagged_addr(selector), sizeof(*selector)))
return -EFAULT;

break;

Subject: [tip: core/entry] syscall_user_dispatch: Split up set_syscall_user_dispatch()

The following commit has been merged into the core/entry branch of tip:

Commit-ID: 43360686328b1f4b7d9dcc883ee9243ad7c16fae
Gitweb: https://git.kernel.org/tip/43360686328b1f4b7d9dcc883ee9243ad7c16fae
Author: Gregory Price <[email protected]>
AuthorDate: Fri, 07 Apr 2023 13:18:31 -04:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Sun, 16 Apr 2023 14:23:07 +02:00

syscall_user_dispatch: Split up set_syscall_user_dispatch()

syscall user dispatch configuration is not covered by checkpoint/restore.

To prepare for ptrace access to the syscall user dispatch configuration,
move the inner working of set_syscall_user_dispatch() into a helper
function. Make the helper function task pointer based and let
set_syscall_user_dispatch() invoke it with task=current.

No functional change.

Signed-off-by: Gregory Price <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

---
kernel/entry/syscall_user_dispatch.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c
index 0b6379a..22396b2 100644
--- a/kernel/entry/syscall_user_dispatch.c
+++ b/kernel/entry/syscall_user_dispatch.c
@@ -68,8 +68,9 @@ bool syscall_user_dispatch(struct pt_regs *regs)
return true;
}

-int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
- unsigned long len, char __user *selector)
+static int task_set_syscall_user_dispatch(struct task_struct *task, unsigned long mode,
+ unsigned long offset, unsigned long len,
+ char __user *selector)
{
switch (mode) {
case PR_SYS_DISPATCH_OFF:
@@ -94,15 +95,21 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
return -EINVAL;
}

- current->syscall_dispatch.selector = selector;
- current->syscall_dispatch.offset = offset;
- current->syscall_dispatch.len = len;
- current->syscall_dispatch.on_dispatch = false;
+ task->syscall_dispatch.selector = selector;
+ task->syscall_dispatch.offset = offset;
+ task->syscall_dispatch.len = len;
+ task->syscall_dispatch.on_dispatch = false;

if (mode == PR_SYS_DISPATCH_ON)
- set_syscall_work(SYSCALL_USER_DISPATCH);
+ set_task_syscall_work(task, SYSCALL_USER_DISPATCH);
else
- clear_syscall_work(SYSCALL_USER_DISPATCH);
+ clear_task_syscall_work(task, SYSCALL_USER_DISPATCH);

return 0;
}
+
+int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
+ unsigned long len, char __user *selector)
+{
+ return task_set_syscall_user_dispatch(current, mode, offset, len, selector);
+}