Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp3944804imu; Mon, 10 Dec 2018 10:18:00 -0800 (PST) X-Google-Smtp-Source: AFSGD/XWluZ8mFpnr+irObyUTZIj6+yf+RMEbCa6Mr8iFN74DqS3d10yXksIEVJTM7/oM+PhvXrW X-Received: by 2002:a17:902:b68d:: with SMTP id c13mr13015348pls.102.1544465880726; Mon, 10 Dec 2018 10:18:00 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1544465880; cv=none; d=google.com; s=arc-20160816; b=r7Sz/AQ2916kcwmiRkGu3xqKguhdvbu8dxM/HyHL8f9Y8ZIhkctSHATgUfojoWDxIT H14ZUluPiZIyLD19u7XsbahJNTgBfhdyfpOpAANexygznvYhSj+aALOB4I64HS/D5QKL zyh9AQZnC9bvwJUNMbBHxsFSMl8qUeSFJf3eTkYLoYrHAIOAXoWW/hkc1gXI5qrUzmM5 uCqBoALee7xWbHi7UCjHzqYnElUhK6FW8XMhN5vCargeT+/Zk3S5OqMq/HB1nXQOfOUE hEk9JPF3D3qeQ13DQH0rsNOJxGj4rUMavr3dQXX1wMCyY0heubc9UKBb0tRzkwlLJcri 1eaw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=ydSrZdf0tCMrWRKg5/zGSyre0upgFq9YW9bzOqRlNds=; b=k4Paa2ZuPigQLnipUH3WxrQrFuOSHNBcYbAkao0/zJfllZhzpIt+WIgqGSOCD/A1A+ EmvaExB/m8UVV3lp8+p7b78L7R5pFvZUok0BJjqvAOGpRBbIcTa+lI4BeNVDNYrO5iyP EPn4bwjg2Zz+RB90Td2dBINK4LPAJBEazT6a1HpKpiJ4fWj6FGrNdzXzQvJ3Mek4smp9 wTTz64qInYTmFJzxR9vRCIxOio7BZRyGSgC97knyvgnYLT0mzGOESEBrdDNznTL1RUAg RGOrNOZCffdnlNZ9ugXGu14yWBFxbnOufOohGb6EZWfE+rcFDaf0WFYbQJJds8vskZzK SanA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=RCV0wDWn; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 9si9130967pgm.112.2018.12.10.10.17.44; Mon, 10 Dec 2018 10:18:00 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=RCV0wDWn; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728965AbeLJRow (ORCPT + 99 others); Mon, 10 Dec 2018 12:44:52 -0500 Received: from mail-yw1-f68.google.com ([209.85.161.68]:39741 "EHLO mail-yw1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728151AbeLJRov (ORCPT ); Mon, 10 Dec 2018 12:44:51 -0500 Received: by mail-yw1-f68.google.com with SMTP id j6so4288453ywj.6 for ; Mon, 10 Dec 2018 09:44:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=ydSrZdf0tCMrWRKg5/zGSyre0upgFq9YW9bzOqRlNds=; b=RCV0wDWnfvJPqocIFiVhEhXXmUCBe81E3FB1vHqt1dVigePUTC/5rbOd0nm0MZ2+sT YW+34brn9JskWLxCcJBGzK/hxo/7OfxnHPK5yNsVBLe3rR0qv4bVtWDR1esZBd11B8p2 Bl4eGN2ynL0YTiuhTUl1ssOfWlTlEzw9PkBRk= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=ydSrZdf0tCMrWRKg5/zGSyre0upgFq9YW9bzOqRlNds=; b=Wpw9yE3U93h2aHfRxLYyYn9zIdlqlmVb3cTgslEx6PPRtpwvLTlKJQPxSuhPOUAqdC JcVxAZD+5wBrVODvWMPueEGgoNdx3X3BsSHbeMRnghV9xAyJUmmJkdV1WCTt4CauHhYZ 65vg0e4BOW+FDzrsIUPaDqeHFXXJvXWEkDqnvQTD+ir+RqNz3AgEu4okpfjh1dC0lC21 TFDnik0gj1hNGEmH5c3hi5x+qG6S/anBYG7V5KXvlReCuQ38fpSX1eiQEz/H6dUNjK2+ B/I3jp+hO+JKdxgCXFxr4dtAW20IOQL+O35z7MIx/tb1dms5B8SZ2QIOpiW4pGW2BBCs Oy7Q== X-Gm-Message-State: AA+aEWZjYijORV+O4GdLDC0ApIA7mMiKiWRhPVb+0zbc017qD8aMKj5c KaTreQOwPUFXwZZDv8E5U3E2UvnI8LA= X-Received: by 2002:a0d:db97:: with SMTP id d145mr13277529ywe.487.1544463889289; Mon, 10 Dec 2018 09:44:49 -0800 (PST) Received: from mail-yw1-f44.google.com (mail-yw1-f44.google.com. [209.85.161.44]) by smtp.gmail.com with ESMTPSA id i128sm4060695ywb.82.2018.12.10.09.44.47 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 10 Dec 2018 09:44:48 -0800 (PST) Received: by mail-yw1-f44.google.com with SMTP id t13so4288780ywe.13 for ; Mon, 10 Dec 2018 09:44:47 -0800 (PST) X-Received: by 2002:a81:2890:: with SMTP id o138mr13665414ywo.168.1544463887329; Mon, 10 Dec 2018 09:44:47 -0800 (PST) MIME-Version: 1.0 References: <20181210042352.GA6092@altlinux.org> <20181210043126.GX6131@altlinux.org> In-Reply-To: <20181210043126.GX6131@altlinux.org> From: Kees Cook Date: Mon, 10 Dec 2018 09:44:35 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH v5 24/25] ptrace: add PTRACE_GET_SYSCALL_INFO request To: "Dmitry V. Levin" Cc: Oleg Nesterov , Andy Lutomirski , Elvira Khabirova , Eugene Syromiatnikov , Jann Horn , Linux API , strace-devel@lists.strace.io, LKML Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Dec 9, 2018 at 8:31 PM Dmitry V. Levin wrote: > > From: Elvira Khabirova > > PTRACE_GET_SYSCALL_INFO is a generic ptrace API that lets ptracer obtain > details of the syscall the tracee is blocked in. > > There are two reasons for a special syscall-related ptrace request. > > Firstly, with the current ptrace API there are cases when ptracer cannot > retrieve necessary information about syscalls. Some examples include: > * The notorious int-0x80-from-64-bit-task issue. See [1] for details. > In short, if a 64-bit task performs a syscall through int 0x80, its tracer > has no reliable means to find out that the syscall was, in fact, > a compat syscall, and misidentifies it. > * Syscall-enter-stop and syscall-exit-stop look the same for the tracer. > Common practice is to keep track of the sequence of ptrace-stops in order > not to mix the two syscall-stops up. But it is not as simple as it looks; > for example, strace had a (just recently fixed) long-standing bug where > attaching strace to a tracee that is performing the execve system call > led to the tracer identifying the following syscall-exit-stop as > syscall-enter-stop, which messed up all the state tracking. > * Since the introduction of commit 84d77d3f06e7e8dea057d10e8ec77ad71f721be3 > ("ptrace: Don't allow accessing an undumpable mm"), both PTRACE_PEEKDATA > and process_vm_readv become unavailable when the process dumpable flag > is cleared. On such architectures as ia64 this results in all syscall > arguments being unavailable for the tracer. > > Secondly, ptracers also have to support a lot of arch-specific code for > obtaining information about the tracee. For some architectures, this > requires a ptrace(PTRACE_PEEKUSER, ...) invocation for every syscall > argument and return value. > > ptrace(2) man page: > > long ptrace(enum __ptrace_request request, pid_t pid, > void *addr, void *data); > ... > PTRACE_GET_SYSCALL_INFO > Retrieve information about the syscall that caused the stop. > The information is placed into the buffer pointed by "data" > argument, which should be a pointer to a buffer of type > "struct ptrace_syscall_info". > The "addr" argument contains the size of the buffer pointed to > by "data" argument (i.e., sizeof(struct ptrace_syscall_info)). > The return value contains the number of bytes available > to be written by the kernel. > If the size of data to be written by the kernel exceeds the size > specified by "addr" argument, the output is truncated. > > Co-authored-by: Dmitry V. Levin > Cc: Oleg Nesterov > Cc: Andy Lutomirski > Cc: Eugene Syromyatnikov > Cc: Kees Cook > Cc: Jann Horn > Cc: linux-api@vger.kernel.org > Cc: strace-devel@lists.strace.io > Signed-off-by: Elvira Khabirova > Signed-off-by: Dmitry V. Levin Reviewed-by: Kees Cook -Kees > --- > > Notes: > v5: > * Change PTRACE_EVENTMSG_SYSCALL_{ENTRY,EXIT} values as requested by Oleg. > * Change struct ptrace_syscall_info: generalize instruction_pointer, > stack_pointer, and frame_pointer fields by moving them from > ptrace_syscall_info.{entry,seccomp} substructures to ptrace_syscall_info > and initializing them for all stops. > * Add PTRACE_SYSCALL_INFO_NONE, set it when not in a syscall stop, > so e.g. "strace -i" could use PTRACE_SYSCALL_INFO_SECCOMP to obtain > instruction_pointer when the tracee is in a signal stop. > * Make available for all architectures: do not conditionalize on > CONFIG_HAVE_ARCH_TRACEHOOK since all syscall_get_* functions > are implemented on all architectures. > > v4: > * Do not introduce task_struct.ptrace_event, > use child->last_siginfo->si_code instead. > * Implement PTRACE_SYSCALL_INFO_SECCOMP and ptrace_syscall_info.seccomp > support along with PTRACE_SYSCALL_INFO_{ENTRY,EXIT} and > ptrace_syscall_info.{entry,exit}. > > v3: > * Change struct ptrace_syscall_info. > * Support PTRACE_EVENT_SECCOMP by adding ptrace_event to task_struct. > * Add proper defines for ptrace_syscall_info.op values. > * Rename PT_SYSCALL_IS_ENTERING and PT_SYSCALL_IS_EXITING to > PTRACE_EVENTMSG_SYSCALL_ENTRY and PTRACE_EVENTMSG_SYSCALL_EXIT > * and move them to uapi. > > v2: > * Do not use task->ptrace. > * Replace entry_info.is_compat with entry_info.arch, use syscall_get_arch(). > * Use addr argument of sys_ptrace to get expected size of the struct; > return full size of the struct. > > include/linux/tracehook.h | 9 ++-- > include/uapi/linux/ptrace.h | 39 +++++++++++++++ > kernel/ptrace.c | 99 ++++++++++++++++++++++++++++++++++++- > 3 files changed, 143 insertions(+), 4 deletions(-) > > diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h > index df20f8bdbfa3..6bc7a3d58e2f 100644 > --- a/include/linux/tracehook.h > +++ b/include/linux/tracehook.h > @@ -57,13 +57,15 @@ struct linux_binprm; > /* > * ptrace report for syscall entry and exit looks identical. > */ > -static inline int ptrace_report_syscall(struct pt_regs *regs) > +static inline int ptrace_report_syscall(struct pt_regs *regs, > + unsigned long message) > { > int ptrace = current->ptrace; > > if (!(ptrace & PT_PTRACED)) > return 0; > > + current->ptrace_message = message; > ptrace_notify(SIGTRAP | ((ptrace & PT_TRACESYSGOOD) ? 0x80 : 0)); > > /* > @@ -76,6 +78,7 @@ static inline int ptrace_report_syscall(struct pt_regs *regs) > current->exit_code = 0; > } > > + current->ptrace_message = 0; > return fatal_signal_pending(current); > } > > @@ -101,7 +104,7 @@ static inline int ptrace_report_syscall(struct pt_regs *regs) > static inline __must_check int tracehook_report_syscall_entry( > struct pt_regs *regs) > { > - return ptrace_report_syscall(regs); > + return ptrace_report_syscall(regs, PTRACE_EVENTMSG_SYSCALL_ENTRY); > } > > /** > @@ -126,7 +129,7 @@ static inline void tracehook_report_syscall_exit(struct pt_regs *regs, int step) > if (step) > user_single_step_report(regs); > else > - ptrace_report_syscall(regs); > + ptrace_report_syscall(regs, PTRACE_EVENTMSG_SYSCALL_EXIT); > } > > /** > diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h > index d5a1b8a492b9..f0af09fe4e17 100644 > --- a/include/uapi/linux/ptrace.h > +++ b/include/uapi/linux/ptrace.h > @@ -73,6 +73,45 @@ struct seccomp_metadata { > __u64 flags; /* Output: filter's flags */ > }; > > +#define PTRACE_GET_SYSCALL_INFO 0x420e > +#define PTRACE_SYSCALL_INFO_NONE 0 > +#define PTRACE_SYSCALL_INFO_ENTRY 1 > +#define PTRACE_SYSCALL_INFO_EXIT 2 > +#define PTRACE_SYSCALL_INFO_SECCOMP 3 > + > +struct ptrace_syscall_info { > + __u8 op; /* PTRACE_SYSCALL_INFO_* */ > + __u8 __pad0[3]; > + __u32 arch; > + __u64 instruction_pointer; > + __u64 stack_pointer; > + __u64 frame_pointer; > + union { > + struct { > + __u64 nr; > + __u64 args[6]; > + } entry; > + struct { > + __s64 rval; > + __u8 is_error; > + __u8 __pad1[7]; > + } exit; > + struct { > + __u64 nr; > + __u64 args[6]; > + __u32 ret_data; > + __u8 __pad2[4]; > + } seccomp; > + }; > +}; > + > +/* > + * These values are stored in task->ptrace_message > + * by tracehook_report_syscall_* to describe the current syscall-stop. > + */ > +#define PTRACE_EVENTMSG_SYSCALL_ENTRY 1 > +#define PTRACE_EVENTMSG_SYSCALL_EXIT 2 > + > /* Read signals from a shared (process wide) queue */ > #define PTRACE_PEEKSIGINFO_SHARED (1 << 0) > > diff --git a/kernel/ptrace.c b/kernel/ptrace.c > index c2cee9db5204..4562b2cb1087 100644 > --- a/kernel/ptrace.c > +++ b/kernel/ptrace.c > @@ -30,6 +30,8 @@ > #include > #include > > +#include /* For syscall_get_* */ > + > /* > * Access another process' address space via ptrace. > * Source/target buffer must be kernel space, > @@ -878,7 +880,98 @@ static int ptrace_regset(struct task_struct *task, int req, unsigned int type, > * to ensure no machine forgets it. > */ > EXPORT_SYMBOL_GPL(task_user_regset_view); > -#endif > +#endif /* CONFIG_HAVE_ARCH_TRACEHOOK */ > + > +static unsigned long > +ptrace_get_syscall_info_entry(struct task_struct *child, struct pt_regs *regs, > + struct ptrace_syscall_info *info) > +{ > + unsigned long args[ARRAY_SIZE(info->entry.args)]; > + int i; > + > + info->op = PTRACE_SYSCALL_INFO_ENTRY; > + info->entry.nr = syscall_get_nr(child, regs); > + syscall_get_arguments(child, regs, 0, ARRAY_SIZE(args), args); > + for (i = 0; i < ARRAY_SIZE(args); i++) > + info->entry.args[i] = args[i]; > + > + return offsetofend(struct ptrace_syscall_info, entry); > +} > + > +static unsigned long > +ptrace_get_syscall_info_seccomp(struct task_struct *child, struct pt_regs *regs, > + struct ptrace_syscall_info *info) > +{ > + /* > + * As struct ptrace_syscall_info.entry is currently a subset > + * of struct ptrace_syscall_info.seccomp, it makes sense to > + * initialize that subset using ptrace_get_syscall_info_entry(). > + * This can be reconsidered in the future if these structures > + * diverge significantly enough. > + */ > + ptrace_get_syscall_info_entry(child, regs, info); > + info->op = PTRACE_SYSCALL_INFO_SECCOMP; > + info->seccomp.ret_data = child->ptrace_message; > + > + return offsetofend(struct ptrace_syscall_info, seccomp); > +} > + > +static unsigned long > +ptrace_get_syscall_info_exit(struct task_struct *child, struct pt_regs *regs, > + struct ptrace_syscall_info *info) > +{ > + info->op = PTRACE_SYSCALL_INFO_EXIT; > + info->exit.rval = syscall_get_error(child, regs); > + info->exit.is_error = !!info->exit.rval; > + if (!info->exit.is_error) > + info->exit.rval = syscall_get_return_value(child, regs); > + > + return offsetofend(struct ptrace_syscall_info, exit); > +} > + > +static int > +ptrace_get_syscall_info(struct task_struct *child, unsigned long user_size, > + void __user *datavp) > +{ > + struct pt_regs *regs = task_pt_regs(child); > + struct ptrace_syscall_info info = { > + .op = PTRACE_SYSCALL_INFO_NONE, > + .arch = syscall_get_arch(child), > + .instruction_pointer = instruction_pointer(regs), > + .stack_pointer = user_stack_pointer(regs), > + .frame_pointer = frame_pointer(regs) > + }; > + unsigned long actual_size = offsetof(struct ptrace_syscall_info, entry); > + unsigned long write_size; > + > + /* > + * This does not need lock_task_sighand() to access > + * child->last_siginfo because ptrace_freeze_traced() > + * called earlier by ptrace_check_attach() ensures that > + * the tracee cannot go away and clear its last_siginfo. > + */ > + switch (child->last_siginfo ? child->last_siginfo->si_code : 0) { > + case SIGTRAP | 0x80: > + switch (child->ptrace_message) { > + case PTRACE_EVENTMSG_SYSCALL_ENTRY: > + actual_size = ptrace_get_syscall_info_entry(child, regs, > + &info); > + break; > + case PTRACE_EVENTMSG_SYSCALL_EXIT: > + actual_size = ptrace_get_syscall_info_exit(child, regs, > + &info); > + break; > + } > + break; > + case SIGTRAP | (PTRACE_EVENT_SECCOMP << 8): > + actual_size = ptrace_get_syscall_info_seccomp(child, regs, > + &info); > + break; > + } > + > + write_size = min(actual_size, user_size); > + return copy_to_user(datavp, &info, write_size) ? -EFAULT : actual_size; > +} > > int ptrace_request(struct task_struct *child, long request, > unsigned long addr, unsigned long data) > @@ -1095,6 +1188,10 @@ int ptrace_request(struct task_struct *child, long request, > ret = seccomp_get_metadata(child, addr, datavp); > break; > > + case PTRACE_GET_SYSCALL_INFO: > + ret = ptrace_get_syscall_info(child, addr, datavp); > + break; > + > default: > break; > } > -- > ldv -- Kees Cook