From: Hagen Paul Pfeifer
To: linux-kernel@vger.kernel.org
Cc: Florian Weimer, Al Viro, Hagen Paul Pfeifer, Christian Brauner, Thomas Gleixner, Ingo Molnar, Borislav Petkov, "H. Peter Anvin", Arnd Bergmann, Brian Gerst, Sami Tolvanen, David Howells, Aleksa Sarai, Andy Lutomirski, Oleg Nesterov, "Eric W. Biederman", Arnaldo Carvalho de Melo, Sargun Dhillon, linux-api@vger.kernel.org, linux-arch@vger.kernel.org
Subject: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall
Date: Sun, 26 Apr 2020 18:34:30 +0200
Message-Id: <20200426163430.22743-1-hagen@jauu.net>
In-Reply-To: <20200426130100.306246-1-hagen@jauu.net>
References: <20200426130100.306246-1-hagen@jauu.net>

While working on a safety-critical stress-testing tool that uses ptrace in a
rather uncommon way (stopping tasks, peeking memory, ...) on a number of
applications in an automated fashion, I realized that previously opened
processes had been restarted and their PIDs recycled. As a result, the wrong
processes were monitored and manipulated.

With the advent of pidfds we can hold one stable handle that identifies a
process exactly, which makes this race-free. Sending signals already works
like a charm; the next step is to extend the same functionality to ptrace.

API:

    long pidfd_ptrace(int pidfd, enum __ptrace_request request, void *addr, void *data, unsigned flags);

Compared to the original ptrace, the following API changes were made:

- The process identifier (pidfd) is moved to the first position. This is
  aligned with pidfd_send_signal(int pidfd, ...), because potential future
  pidfd_* syscalls will have one thing in common: the pid identifier. I think
  it is natural to have this argument up front.
- An additional flags argument is added. It is not used for now (it must be
  0), but you never know.

All other arguments are identical to ptrace; no other modifications were
made.

Some pieces are still missing; this is just an early proposal for a new
syscall.
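The stable-handle property described above can already be observed with the
existing pidfd_send_signal(2): a /proc/<pid> directory fd, opened the same way
as in the userspace example of this mail, keeps referring to the original
process. Once that process has been reaped, the call fails with ESRCH instead
of reaching whichever process later reuses the PID. A minimal sketch, assuming
Linux >= 5.1, a mounted /proc and the generic/x86 syscall number 424 as a
fallback; the helper names send_signal_via_fd() and proc_fd_outlives_pid() are
hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef __NR_pidfd_send_signal
#define __NR_pidfd_send_signal 424	/* assumption: generic/x86 numbering */
#endif

/* Hypothetical thin wrapper; older glibc versions provide none. */
static int send_signal_via_fd(int pidfd, int sig)
{
	return (int)syscall(__NR_pidfd_send_signal, pidfd, sig, NULL, 0);
}

/*
 * Hypothetical demo: returns 1 if the fd stayed bound to the original
 * (reaped) child, 0 if the environment lacks support, -1 otherwise.
 */
static int proc_fd_outlives_pid(void)
{
	char path[64];
	pid_t child = fork();

	if (child < 0)
		return -1;
	if (child == 0) {
		pause();	/* sleep until the parent kills us */
		_exit(0);
	}

	snprintf(path, sizeof(path), "/proc/%d", (int)child);
	int fd = open(path, O_DIRECTORY | O_CLOEXEC);
	/* Signal 0 only checks existence and permissions; nothing is delivered. */
	int alive = fd < 0 ? -1 : send_signal_via_fd(fd, 0);
	int alive_errno = errno;

	kill(child, SIGKILL);
	waitpid(child, NULL, 0);	/* reaped: the numeric PID may now be reused */

	if (fd < 0)
		return 0;
	if (alive != 0) {
		close(fd);
		return (alive_errno == ENOSYS || alive_errno == EPERM) ? 0 : -1;
	}

	/* The fd still names the dead child, never a newer process with that PID. */
	int after = send_signal_via_fd(fd, 0);
	int after_errno = errno;
	close(fd);
	return (after == -1 && after_errno == ESRCH) ? 1 : -1;
}
```

On a kernel with pidfd_send_signal support, proc_fd_outlives_pid() is expected
to return 1; the proposed pidfd_ptrace() aims to give ptrace requests the same
guarantee.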
Still missing:

- support for every architecture
- re-use shared functions and move them to a common place
- perf syscall registration
- selftests
- ...

Userspace Example:

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>

#ifndef __NR_pidfd_ptrace
#define __NR_pidfd_ptrace 439
#endif

static inline long do_pidfd_ptrace(int pidfd, int request, void *addr,
				   void *data, unsigned int flags)
{
#ifdef __NR_pidfd_ptrace
	return syscall(__NR_pidfd_ptrace, pidfd, request, addr, data, flags);
#else
	return -ENOSYS;
#endif
}

int main(int argc, char *argv[])
{
	int pid, pidfd, ret, sleep_time = 10;
	char pid_path[PATH_MAX];
	struct user_regs_struct regs;

	if (argc < 2) {
		fprintf(stderr, "Usage: %s <pid>\n", argv[0]);
		goto err;
	}

	pid = atoi(argv[1]);
	snprintf(pid_path, sizeof(pid_path), "/proc/%d", pid);

	/* A /proc/<pid> directory fd is accepted as a pidfd (see pidfd_to_pid()). */
	pidfd = open(pid_path, O_DIRECTORY | O_CLOEXEC);
	if (pidfd == -1) {
		fprintf(stderr, "failed to open %s\n", pid_path);
		goto err;
	}

	ret = do_pidfd_ptrace(pidfd, PTRACE_ATTACH, 0, 0, 0);
	if (ret < 0) {
		perror("do_pidfd_ptrace, PTRACE_ATTACH");
		goto err;
	}

	/* Wait until the tracee has entered the stopped state. */
	waitpid(pid, NULL, 0);

	ret = do_pidfd_ptrace(pidfd, PTRACE_GETREGS, NULL, &regs, 0);
	if (ret == -1) {
		perror("do_pidfd_ptrace, PTRACE_GETREGS");
		goto err;
	}

	printf("RIP: %llx\nRAX: %llx\nRCX: %llx\nRDX: %llx\nRSI: %llx\nRDI: %llx\n",
	       regs.rip, regs.rax, regs.rcx, regs.rdx, regs.rsi, regs.rdi);

	fprintf(stdout, "stopping task for %d seconds\n", sleep_time);
	sleep(sleep_time);

	ret = do_pidfd_ptrace(pidfd, PTRACE_DETACH, 0, 0, 0);
	if (ret == -1) {
		perror("do_pidfd_ptrace, PTRACE_DETACH");
		goto err;
	}

	exit(EXIT_SUCCESS);
err:
	exit(EXIT_FAILURE);
}

Cc: Christian Brauner
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: "H. Peter Anvin"
Cc: Arnd Bergmann
Cc: Brian Gerst
Cc: Sami Tolvanen
Cc: David Howells
Cc: Aleksa Sarai
Cc: Andy Lutomirski
Cc: Oleg Nesterov
Cc: Eric W. Biederman
Cc: Arnaldo Carvalho de Melo
Cc: Sargun Dhillon
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Signed-off-by: Hagen Paul Pfeifer
---
v2:
 - fixed an OOPS in __x64_sys_pidfd_ptrace+0x1bf/0x220 (call to __put_task_struct())
 - add userland example
---
 arch/x86/entry/syscalls/syscall_32.tbl |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 include/linux/syscalls.h               |   2 +
 include/uapi/asm-generic/unistd.h      |   4 +-
 kernel/ptrace.c                        | 126 ++++++++++++++++++++-----
 kernel/sys_ni.c                        |   1 +
 6 files changed, 113 insertions(+), 22 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 54581ac671b4..593f7fab90eb 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -442,3 +442,4 @@
 435	i386	clone3			sys_clone3
 437	i386	openat2			sys_openat2
 438	i386	pidfd_getfd		sys_pidfd_getfd
+439	i386	pidfd_ptrace		sys_pidfd_ptrace
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 37b844f839bc..cd76d8343510 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -359,6 +359,7 @@
 435	common	clone3			sys_clone3
 437	common	openat2			sys_openat2
 438	common	pidfd_getfd		sys_pidfd_getfd
+439	common	pidfd_ptrace		sys_pidfd_ptrace
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 1815065d52f3..254b071a5334 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1003,6 +1003,8 @@ asmlinkage long sys_pidfd_send_signal(int pidfd, int sig,
 				       siginfo_t __user *info,
 				       unsigned int flags);
 asmlinkage long sys_pidfd_getfd(int pidfd, int fd, unsigned int flags);
+asmlinkage long sys_pidfd_ptrace(int pidfd, long request, unsigned long addr,
+				 unsigned long data, unsigned int flags);
 
 /*
  * Architecture-specific system calls
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 3a3201e4618e..d62505742447 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -855,9 +855,11 @@ __SYSCALL(__NR_clone3, sys_clone3)
 __SYSCALL(__NR_openat2, sys_openat2)
 #define __NR_pidfd_getfd 438
 __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
+#define __NR_pidfd_ptrace 439
+__SYSCALL(__NR_pidfd_ptrace, sys_pidfd_ptrace)
 
 #undef __NR_syscalls
-#define __NR_syscalls 439
+#define __NR_syscalls 440
 
 /*
  * 32 bit systems traditionally used different
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 43d6179508d6..e9e7e3225b9a 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -29,6 +29,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -1239,10 +1240,39 @@ int ptrace_request(struct task_struct *child, long request,
 #define arch_ptrace_attach(child)	do { } while (0)
 #endif
 
+static inline long ptrace_call(struct task_struct *task, long request, unsigned long addr,
+			       unsigned long data)
+{
+	long ret;
+
+	if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
+		ret = ptrace_attach(task, request, addr, data);
+		/*
+		 * Some architectures need to do book-keeping after
+		 * a ptrace attach.
+		 */
+		if (!ret)
+			arch_ptrace_attach(task);
+		goto out;
+	}
+
+	ret = ptrace_check_attach(task, request == PTRACE_KILL ||
+				  request == PTRACE_INTERRUPT);
+	if (ret < 0)
+		goto out;
+
+	ret = arch_ptrace(task, request, addr, data);
+	if (ret || request != PTRACE_DETACH)
+		ptrace_unfreeze_traced(task);
+
+ out:
+	return ret;
+}
+
 SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 		unsigned long, data)
 {
-	struct task_struct *child;
+	struct task_struct *task;
 	long ret;
 
 	if (request == PTRACE_TRACEME) {
@@ -1252,35 +1282,89 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 		goto out;
 	}
 
-	child = find_get_task_by_vpid(pid);
-	if (!child) {
+	task = find_get_task_by_vpid(pid);
+	if (!task) {
 		ret = -ESRCH;
 		goto out;
 	}
 
-	if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
-		ret = ptrace_attach(child, request, addr, data);
-		/*
-		 * Some architectures need to do book-keeping after
-		 * a ptrace attach.
-		 */
+	ret = ptrace_call(task, request, addr, data);
+	put_task_struct(task);
+out:
+	return ret;
+}
+
+static struct pid *pidfd_to_pid(const struct file *file)
+{
+	struct pid *pid;
+
+	pid = pidfd_pid(file);
+	if (!IS_ERR(pid))
+		return pid;
+
+	return tgid_pidfd_to_pid(file);
+}
+
+static bool access_pidfd_pidns(struct pid *pid)
+{
+	struct pid_namespace *active = task_active_pid_ns(current);
+	struct pid_namespace *p = ns_of_pid(pid);
+
+	for (;;) {
+		if (!p)
+			return false;
+		if (p == active)
+			break;
+		p = p->parent;
+	}
+
+	return true;
+}
+
+SYSCALL_DEFINE5(pidfd_ptrace, int, pidfd, long, request, unsigned long, addr,
+		unsigned long, data, unsigned int, flags)
+{
+	long ret;
+	struct fd f;
+	struct pid *pid;
+	struct task_struct *task;
+
+	/* Enforce flags be set to 0 until we add an extension. */
+	if (flags)
+		return -EINVAL;
+
+	if (request == PTRACE_TRACEME) {
+		ret = ptrace_traceme();
 		if (!ret)
-			arch_ptrace_attach(child);
-		goto out_put_task_struct;
+			arch_ptrace_attach(current);
+		goto out;
 	}
 
-	ret = ptrace_check_attach(child, request == PTRACE_KILL ||
-				  request == PTRACE_INTERRUPT);
-	if (ret < 0)
-		goto out_put_task_struct;
+	f = fdget(pidfd);
+	if (!f.file)
+		return -EBADF;
 
-	ret = arch_ptrace(child, request, addr, data);
-	if (ret || request != PTRACE_DETACH)
-		ptrace_unfreeze_traced(child);
+	/* Is this a pidfd? */
+	pid = pidfd_to_pid(f.file);
+	if (IS_ERR(pid)) {
+		ret = PTR_ERR(pid);
+		goto err;
+	}
 
- out_put_task_struct:
-	put_task_struct(child);
- out:
+	ret = -EINVAL;
+	if (!access_pidfd_pidns(pid))
+		goto err;
+
+	task = pid_task(pid, PIDTYPE_PID);
+	if (!task) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	ret = ptrace_call(task, request, addr, data);
+err:
+	fdput(f);
+out:
 	return ret;
 }
 
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 3b69a560a7ac..f7795294b8c4 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -166,6 +166,7 @@ COND_SYSCALL(delete_module);
 COND_SYSCALL(syslog);
 
 /* kernel/ptrace.c */
+COND_SYSCALL_COMPAT(pidfd_ptrace);
 
 /* kernel/sched/core.c */
 
-- 
2.26.2
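access_pidfd_pidns() in this patch only lets the caller operate on a pid whose
namespace is the caller's active pid namespace or a descendant of it; it checks
this by walking parent links upward from the pid's namespace. A small
userspace model of that walk, with a hypothetical struct ns standing in for
struct pid_namespace (only the parent pointer is modeled) and a hypothetical
pid_visible_from() in place of the kernel helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pid_namespace: only the parent link is modeled. */
struct ns {
	const struct ns *parent;
};

/*
 * Same walk as access_pidfd_pidns(): start at the namespace the pid was
 * allocated in and climb towards the root. Success means the caller's
 * active namespace is that namespace or one of its ancestors.
 */
static bool pid_visible_from(const struct ns *pid_ns, const struct ns *active)
{
	for (const struct ns *p = pid_ns; p; p = p->parent) {
		if (p == active)
			return true;
	}
	return false;	/* ran past the root without meeting 'active' */
}
```

For a hierarchy root -> child -> grandchild, a pid in grandchild is visible
from root, while a pid in root is not visible from child. The kernel loop
returns false at exactly that point, when p becomes NULL.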