2021-03-22 22:54:55

by Andrei Vagin

[permalink] [raw]
Subject: [PATCH 0/4 v3] arm64/ptrace: allow to get all registers on syscall traps

Here are two known problems with registers when a tracee is stopped in
syscall traps.

The first problem is about the x7 register that is used to indicate
whether or not the stop has been signalled from syscall entry or syscall
exit. This means that:

- Any writes by the tracer to this register during the stop are
ignored/discarded.

- The actual value of the register is not available during the stop,
so the tracer cannot save it and restore it later.

For applications like the user-mode Linux or gVisor, it is critical to
have access to the full set of registers in any moment. For example,
they need to change values of all registers to emulate rt_sigreturn or
execve and they need to have the full set of registers to build a signal
frame.

The second problem is that orig_x0 isn't exposed to user-space. orig_x0
is recorded at the start of the syscall entry and then it is used for
resetting the argument back to its original value during syscall
restarts.

This series extends the user_pt_regs structure with orig_x0 and orig_x7.
This doesn't break the backward compatibility. There are two interfaces
how user-space can receive user_pt_regs from the kernel. The first one
is PTRACE_{GET,SET}REGSET. In this case, the user-space provides a
buffer and its size and the kernel fills only the part that fits the
size. The second interface is a core dump file where registers are
written in a separate note and the user-space can parse only the part
that it knows.

Cc: Catalin Marinas <[email protected]>
Cc: Dave Martin <[email protected]>
Cc: Keno Fischer <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Will Deacon <[email protected]>

Andrei Vagin (4):
arm64: expose orig_x0 in the user_pt_regs structure
arm64/ptrace: introduce orig_x7 in the user_pt_regs structure
selftest/arm64/ptrace: add a test for orig_x0
selftest/arm64/ptrace: add a test for orig_x7

v2: use the ptrace option instead of adding a new regset.
v3: append orig_x0 and orig_x7 to the user_pt_regs.

arch/arm64/include/asm/ptrace.h | 5 +
arch/arm64/kernel/ptrace.c | 130 +++++++++++-----
include/uapi/linux/elf.h | 1 +
tools/testing/selftests/arm64/Makefile | 2 +-
tools/testing/selftests/arm64/ptrace/Makefile | 6 +
.../arm64/ptrace/ptrace_syscall_regs_test.c | 142 ++++++++++++++++++
6 files changed, 246 insertions(+), 40 deletions(-)
create mode 100644 tools/testing/selftests/arm64/ptrace/Makefile
create mode 100644 tools/testing/selftests/arm64/ptrace/ptrace_syscall_regs_test.c

--
2.29.2


2021-03-22 22:54:55

by Andrei Vagin

[permalink] [raw]
Subject: [PATCH 2/4] arm64/ptrace: introduce orig_x7 in the user_pt_regs structure

We have some ABI weirdness in the way that we handle syscall exit stops
because we indicate whether or not the stop has been signalled from
syscall entry or syscall exit by clobbering a general purpose register
x7 in the tracee and restoring its old value after the stop.

This behavior was inherited from ARM and it isn't common for other
architectures. Now, we have PTRACE_GET_SYSCALL_INFO that gives all
required information about system calls, so the hack with clobbering
registers isn't needed anymore.

This change instroduces orig_x7 in the user_pt_regs structure that will
contains an origin value of the x7 register if the tracee is stopped in
a system call..

Signed-off-by: Andrei Vagin <[email protected]>
---
arch/arm64/include/asm/ptrace.h | 1 +
arch/arm64/include/uapi/asm/ptrace.h | 1 +
arch/arm64/kernel/ptrace.c | 18 ++++++++++++------
3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index d4cdf98ac003..1008f0fbc5ea 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -184,6 +184,7 @@ struct pt_regs {
u64 pc;
u64 pstate;
u64 orig_x0;
+ u64 orig_x7;
};
};
#ifdef __AARCH64EB__
diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h
index 3c118c5b0893..be7583ff5f4d 100644
--- a/arch/arm64/include/uapi/asm/ptrace.h
+++ b/arch/arm64/include/uapi/asm/ptrace.h
@@ -91,6 +91,7 @@ struct user_pt_regs {
__u64 pc;
__u64 pstate;
__u64 orig_x0;
+ __u64 orig_x7;
};

struct user_fpsimd_state {
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 170f42fd6101..1ed5b4aa986b 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -1750,7 +1750,7 @@ static void tracehook_report_syscall(struct pt_regs *regs,
enum ptrace_syscall_dir dir)
{
int regno;
- unsigned long saved_reg;
+ u64 _saved_reg, *saved_reg;

/*
* We have some ABI weirdness here in the way that we handle syscall
@@ -1768,19 +1768,25 @@ static void tracehook_report_syscall(struct pt_regs *regs,
* - Syscall stops behave differently to seccomp and pseudo-step traps
* (the latter do not nobble any registers).
*/
- regno = (is_compat_task() ? 12 : 7);
- saved_reg = regs->regs[regno];
+ if (is_compat_task()) {
+ regno = 12;
+ saved_reg = &_saved_reg;
+ } else {
+ regno = 7;
+ saved_reg = &regs->orig_x7;
+ }
+ *saved_reg = regs->regs[regno];
regs->regs[regno] = dir;

if (dir == PTRACE_SYSCALL_ENTER) {
if (tracehook_report_syscall_entry(regs))
forget_syscall(regs);
- regs->regs[regno] = saved_reg;
+ regs->regs[regno] = *saved_reg;
} else if (!test_thread_flag(TIF_SINGLESTEP)) {
tracehook_report_syscall_exit(regs, 0);
- regs->regs[regno] = saved_reg;
+ regs->regs[regno] = *saved_reg;
} else {
- regs->regs[regno] = saved_reg;
+ regs->regs[regno] = *saved_reg;

/*
* Signal a pseudo-step exception since we are stepping but
--
2.29.2

2021-03-22 22:55:15

by Andrei Vagin

[permalink] [raw]
Subject: [PATCH 4/4] selftest/arm64/ptrace: add a test for orig_x7

In system calls, x7 is used to indicate whether a tracee has been
stopped on syscall-enter or syscall-exit and the origin value of x7 is
saved in orig_x7.

Test output:
$ ./ptrace_syscall_test
1..4
ok 1 x7: 0
ok 2 x7: 1
ok 3 x7: 686920776f726c64
ok 4 The child exited with code 0.
# Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Andrei Vagin <[email protected]>
---
tools/testing/selftests/arm64/ptrace/Makefile | 2 +-
.../arm64/ptrace/ptrace_syscall_test.c | 158 ++++++++++++++++++
2 files changed, 159 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/arm64/ptrace/ptrace_syscall_test.c

diff --git a/tools/testing/selftests/arm64/ptrace/Makefile b/tools/testing/selftests/arm64/ptrace/Makefile
index 1bc10e2d2ac8..ea02c1a63806 100644
--- a/tools/testing/selftests/arm64/ptrace/Makefile
+++ b/tools/testing/selftests/arm64/ptrace/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0

CFLAGS += -g -I../../../../../usr/include/
-TEST_GEN_PROGS := ptrace_restart_syscall_test
+TEST_GEN_PROGS := ptrace_restart_syscall_test ptrace_syscall_test

include ../../lib.mk
diff --git a/tools/testing/selftests/arm64/ptrace/ptrace_syscall_test.c b/tools/testing/selftests/arm64/ptrace/ptrace_syscall_test.c
new file mode 100644
index 000000000000..ad55b44ae9f5
--- /dev/null
+++ b/tools/testing/selftests/arm64/ptrace/ptrace_syscall_test.c
@@ -0,0 +1,158 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <errno.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <sys/types.h>
+#include <sys/ptrace.h>
+#include <sys/user.h>
+#include <sys/wait.h>
+#include <sys/uio.h>
+#include <linux/elf.h>
+#include <linux/unistd.h>
+
+#include "../../kselftest.h"
+#include "lib.h"
+
+#define X7_TEST_VAL 0x686920776f726c64UL
+
+static long test_syscall(long *x7)
+{
+ register long x0 __asm__("x0") = 0;
+ register long *x1 __asm__("x1") = x7;
+ register long x8 __asm__("x8") = 0x555;
+
+ __asm__ (
+ "ldr x7, [x1, 0]\n"
+ "svc 0\n"
+ "str x7, [x1, 0]\n"
+ : "=r"(x0)
+ : "r"(x0), "r"(x1), "r"(x8)
+ :
+ );
+ return x0;
+}
+
+static int child(void)
+{
+ long val = X7_TEST_VAL;
+
+ if (test_syscall(&val)) {
+ ksft_print_msg("The test syscall failed\n");
+ return 1;
+ }
+ if (val != X7_TEST_VAL) {
+ ksft_print_msg("Unexpected x7: %lx\n", val);
+ return 1;
+ }
+
+ if (test_syscall(&val)) {
+ ksft_print_msg("The test syscall failed\n");
+ return 1;
+ }
+ if (val != ~X7_TEST_VAL) {
+ ksft_print_msg("Unexpected x7: %lx\n", val);
+ return 1;
+ }
+
+ return 0;
+}
+
+#ifndef PTRACE_SYSEMU
+#define PTRACE_SYSEMU 31
+#endif
+
+int main(int argc, void **argv)
+{
+ union {
+ struct user_regs_struct r;
+ struct {
+ char __regs[272];
+ unsigned long long orig_x0;
+ unsigned long long orig_x7;
+ };
+ } regs = {};
+ struct iovec iov = {
+ .iov_base = &regs,
+ .iov_len = sizeof(regs),
+ };
+ int status;
+ pid_t pid;
+
+ ksft_set_plan(4);
+
+ pid = fork();
+ if (pid == 0) {
+ kill(getpid(), SIGSTOP);
+ _exit(child());
+ }
+ if (pid < 0)
+ return 1;
+
+ if (ptrace_and_wait(pid, PTRACE_ATTACH, SIGSTOP))
+ return 1;
+ /* skip SIGSTOP */
+ if (ptrace_and_wait(pid, PTRACE_CONT, SIGSTOP))
+ return 1;
+
+ /* Resume the child to the next system call. */
+ if (ptrace_and_wait(pid, PTRACE_SYSCALL, SIGTRAP))
+ return 1;
+
+ /* Check that x7 is 0 on syscall-enter. */
+ if (ptrace(PTRACE_GETREGSET, pid, NT_PRSTATUS, &iov))
+ return pr_perror("Can't get child registers");
+ if (regs.orig_x7 != X7_TEST_VAL)
+ return pr_fail("Unexpected orig_x7: %lx", regs.orig_x7);
+ if (regs.r.regs[7] != 0)
+ return pr_fail("Unexpected x7: %lx", regs.r.regs[7]);
+ ksft_test_result_pass("x7: %llx\n", regs.r.regs[7]);
+
+ if (ptrace_and_wait(pid, PTRACE_SYSCALL, SIGTRAP))
+ return 1;
+
+ /* Check that x7 is 1 on syscall-exit. */
+ if (ptrace(PTRACE_GETREGSET, pid, NT_PRSTATUS, &iov))
+ return pr_perror("Can't get child registers");
+ if (regs.r.regs[7] != 1)
+ return pr_fail("Unexpected x7: %lx", regs.r.regs[7]);
+ ksft_test_result_pass("x7: %llx\n", regs.r.regs[7]);
+
+ /* Check that the child will not see a new value of x7. */
+ regs.r.regs[0] = 0;
+ regs.r.regs[7] = ~X7_TEST_VAL;
+ if (ptrace(PTRACE_SETREGSET, pid, NT_PRSTATUS, &iov))
+ return pr_perror("Can't set child registers");
+
+ /* Resume the child to the next system call. */
+ if (ptrace_and_wait(pid, PTRACE_SYSEMU, SIGTRAP))
+ return 1;
+
+ /* Check that orig_x7 contains the actual value of x7. */
+ if (ptrace(PTRACE_GETREGSET, pid, NT_PRSTATUS, &iov))
+ return pr_perror("Can't get child registers");
+ if (regs.orig_x7 != X7_TEST_VAL)
+ return pr_fail("Unexpected orig_x7: %lx", regs.orig_x7);
+ ksft_test_result_pass("x7: %llx\n", regs.orig_x7);
+
+ /* Check that the child will see a new value of x7. */
+ regs.r.regs[0] = 0;
+ regs.orig_x7 = ~X7_TEST_VAL;
+ if (ptrace(PTRACE_SETREGSET, pid, NT_PRSTATUS, &iov))
+ return pr_perror("Can't set child registers");
+
+ if (ptrace(PTRACE_CONT, pid, 0, 0))
+ return pr_perror("Can't resume the child %d", pid);
+ if (waitpid(pid, &status, 0) != pid)
+ return pr_perror("Can't wait for the child %d", pid);
+ if (status != 0)
+ return pr_fail("Child exited with code %d.", status);
+
+ ksft_test_result_pass("The child exited with code 0.\n");
+ ksft_exit_pass();
+ return 0;
+}
+
--
2.29.2

2021-03-22 22:57:20

by Andrei Vagin

[permalink] [raw]
Subject: [PATCH 3/4] selftest/arm64/ptrace: add a test for orig_x0

The test creates two processes where one traces another one. The tracee
executes a system call, the tracer traps it, changes orig_x0, triggers a
signal and checks that the syscall is restarted with the setted
argument.

Test output:
$ ./ptrace_restart_syscall_test
1..3
ok 1 orig_x0: 0x3
ok 2 x0: 0x5
ok 3 The child exited with code 0.
# Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Andrei Vagin <[email protected]>
---
tools/testing/selftests/arm64/ptrace/Makefile | 6 +
tools/testing/selftests/arm64/ptrace/lib.h | 36 ++++++
.../ptrace/ptrace_restart_syscall_test.c | 122 ++++++++++++++++++
3 files changed, 164 insertions(+)
create mode 100644 tools/testing/selftests/arm64/ptrace/Makefile
create mode 100644 tools/testing/selftests/arm64/ptrace/lib.h
create mode 100644 tools/testing/selftests/arm64/ptrace/ptrace_restart_syscall_test.c

diff --git a/tools/testing/selftests/arm64/ptrace/Makefile b/tools/testing/selftests/arm64/ptrace/Makefile
new file mode 100644
index 000000000000..1bc10e2d2ac8
--- /dev/null
+++ b/tools/testing/selftests/arm64/ptrace/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+
+CFLAGS += -g -I../../../../../usr/include/
+TEST_GEN_PROGS := ptrace_restart_syscall_test
+
+include ../../lib.mk
diff --git a/tools/testing/selftests/arm64/ptrace/lib.h b/tools/testing/selftests/arm64/ptrace/lib.h
new file mode 100644
index 000000000000..14f4737188a3
--- /dev/null
+++ b/tools/testing/selftests/arm64/ptrace/lib.h
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#ifndef __PTRACE_TEST_LOG_H
+#define __PTRACE_TEST_LOG_H
+
+#define pr_p(func, fmt, ...) func("%s:%d: " fmt ": %m", \
+ __func__, __LINE__, ##__VA_ARGS__)
+
+#define pr_err(fmt, ...) \
+ ({ \
+ ksft_test_result_error(fmt "\n", ##__VA_ARGS__); \
+ -1; \
+ })
+
+#define pr_fail(fmt, ...) \
+ ({ \
+ ksft_test_result_fail(fmt "\n", ##__VA_ARGS__); \
+ -1; \
+ })
+
+#define pr_perror(fmt, ...) pr_p(pr_err, fmt, ##__VA_ARGS__)
+
+static inline int ptrace_and_wait(pid_t pid, int cmd, int sig)
+{
+ int status;
+
+ /* Stop on syscall-exit. */
+ if (ptrace(cmd, pid, 0, 0))
+ return pr_perror("Can't resume the child %d", pid);
+ if (waitpid(pid, &status, 0) != pid)
+ return pr_perror("Can't wait for the child %d", pid);
+ if (!WIFSTOPPED(status) || WSTOPSIG(status) != sig)
+ return pr_err("Unexpected status: %x", status);
+ return 0;
+}
+
+#endif
diff --git a/tools/testing/selftests/arm64/ptrace/ptrace_restart_syscall_test.c b/tools/testing/selftests/arm64/ptrace/ptrace_restart_syscall_test.c
new file mode 100644
index 000000000000..ce59657f41be
--- /dev/null
+++ b/tools/testing/selftests/arm64/ptrace/ptrace_restart_syscall_test.c
@@ -0,0 +1,122 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <errno.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <fcntl.h>
+
+#include <sys/types.h>
+#include <sys/ptrace.h>
+#include <sys/stat.h>
+#include <sys/user.h>
+#include <sys/wait.h>
+#include <sys/uio.h>
+#include <linux/elf.h>
+#include <linux/unistd.h>
+
+#include "../../kselftest.h"
+#include "lib.h"
+
+static int child(int fd)
+{
+ char c;
+
+ if (read(fd, &c, 1) != 1)
+ return 1;
+
+ return 0;
+}
+
+int main(int argc, void **argv)
+{
+ union {
+ struct user_regs_struct r;
+ struct {
+ char __regs[272];
+ unsigned long long orig_x0;
+ unsigned long long orig_x7;
+ };
+ } regs = {};
+ struct iovec iov = {
+ .iov_base = &regs,
+ .iov_len = sizeof(regs),
+ };
+ int status;
+ pid_t pid;
+ int p[2], fdzero;
+
+ ksft_set_plan(3);
+
+ if (pipe(p))
+ return pr_perror("Can't create a pipe");
+ fdzero = open("/dev/zero", O_RDONLY);
+ if (fdzero < 0)
+ return pr_perror("Can't open /dev/zero");
+
+ pid = fork();
+ if (pid == 0) {
+ kill(getpid(), SIGSTOP);
+ return child(p[0]);
+ }
+ if (pid < 0)
+ return 1;
+
+ if (ptrace(PTRACE_ATTACH, pid, 0, 0))
+ return pr_perror("Can't attach to the child %d", pid);
+ if (waitpid(pid, &status, 0) != pid)
+ return pr_perror("Can't wait for the child %d", pid);
+ /* Skip SIGSTOP */
+ if (ptrace_and_wait(pid, PTRACE_CONT, SIGSTOP))
+ return 1;
+
+ /* Resume the child to the next system call. */
+ if (ptrace_and_wait(pid, PTRACE_SYSCALL, SIGTRAP))
+ return 1;
+
+ /* Send a signal to interrupt the system call. */
+ kill(pid, SIGUSR1);
+
+ /* Stop on syscall-exit. */
+ if (ptrace_and_wait(pid, PTRACE_SYSCALL, SIGTRAP))
+ return 1;
+
+ if (ptrace(PTRACE_GETREGSET, pid, NT_PRSTATUS, &iov))
+ return pr_perror("Can't get child registers");
+ if (regs.orig_x0 != p[0])
+ return pr_fail("Unexpected x0: 0x%lx", regs.r.regs[0]);
+ ksft_test_result_pass("orig_x0: 0x%llx\n", regs.orig_x0);
+
+ /* Change orig_x0 that will be x0 for the restarted system call. */
+ regs.orig_x0 = fdzero;
+ if (ptrace(PTRACE_SETREGSET, pid, NT_PRSTATUS, &iov))
+ return pr_perror("Can't get child registers");
+
+ /* Trap the signal and skip it. */
+ if (ptrace_and_wait(pid, PTRACE_SYSCALL, SIGUSR1))
+ return 1;
+
+ /* Trap the restarted system call. */
+ if (ptrace_and_wait(pid, PTRACE_SYSCALL, SIGTRAP))
+ return 1;
+
+ /* Check that the syscall is started with the right first argument. */
+ if (ptrace(PTRACE_GETREGSET, pid, NT_PRSTATUS, &iov))
+ return pr_perror("Can't get child registers");
+ if (regs.r.regs[0] != fdzero)
+ return pr_fail("unexpected x0: %lx", regs.r.regs[0]);
+ ksft_test_result_pass("x0: 0x%llx\n", regs.r.regs[0]);
+
+ if (ptrace(PTRACE_CONT, pid, 0, 0))
+ return pr_perror("Can't resume the child %d", pid);
+ if (waitpid(pid, &status, 0) != pid)
+ return pr_perror("Can't wait for the child %d", pid);
+ if (status != 0)
+ return pr_fail("Child exited with code %d.", status);
+
+ ksft_test_result_pass("The child exited with code 0.\n");
+ ksft_exit_pass();
+ return 0;
+}
+
--
2.29.2

2021-03-26 18:41:15

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH 2/4] arm64/ptrace: introduce orig_x7 in the user_pt_regs structure

On Mon, Mar 22, 2021 at 03:50:51PM -0700, Andrei Vagin wrote:
> diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
> index d4cdf98ac003..1008f0fbc5ea 100644
> --- a/arch/arm64/include/asm/ptrace.h
> +++ b/arch/arm64/include/asm/ptrace.h
> @@ -184,6 +184,7 @@ struct pt_regs {
> u64 pc;
> u64 pstate;
> u64 orig_x0;
> + u64 orig_x7;
> };
> };
> #ifdef __AARCH64EB__
> diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h
> index 3c118c5b0893..be7583ff5f4d 100644
> --- a/arch/arm64/include/uapi/asm/ptrace.h
> +++ b/arch/arm64/include/uapi/asm/ptrace.h
> @@ -91,6 +91,7 @@ struct user_pt_regs {
> __u64 pc;
> __u64 pstate;
> __u64 orig_x0;
> + __u64 orig_x7;
> };

Same here. So unless I miss something, we better have a separate
NT_ORIGREG (or some better name) regset to retrieve the additional
registers. Or, if you want to get all of them in one go, just add a
new one similar to NT_PRSTATUS but which restores x0 to orig_x0 and x7
to orig_x7.

Sorry if this was already discussed. I had a brief look at the previous
versions and couldn't see a user_pt_regs structure change, nor a
suggestion to do so.

--
Catalin