From: Jann Horn
Date: Thu, 27 Sep 2018 18:20:23 +0200
Subject: Re: [PATCH v7 3/6] seccomp: add a way to get a listener fd from ptrace
To: Tycho Andersen
Cc: Kees Cook, kernel list, containers@lists.linux-foundation.org, Linux API,
    Andy Lutomirski, Oleg Nesterov, "Eric W. Biederman", "Serge E. Hallyn",
    Christian Brauner, Tyler Hicks, suda.akihiro@lab.ntt.co.jp,
    linux-fsdevel@vger.kernel.org
In-Reply-To: <20180927151119.9989-4-tycho@tycho.ws>
References: <20180927151119.9989-1-tycho@tycho.ws> <20180927151119.9989-4-tycho@tycho.ws>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Sep 27, 2018 at 5:11 PM Tycho Andersen wrote:
> As an alternative to SECCOMP_FILTER_FLAG_GET_LISTENER, perhaps a ptrace()
> version which can acquire filters is useful. There are at least two reasons
> this is preferable, even though it uses ptrace:
>
> 1. You can control tasks that aren't cooperating with you
> 2. You can control tasks whose filters block sendmsg() and socket(); if the
>    task installs a filter which blocks these calls, there's no way with
>    SECCOMP_FILTER_FLAG_GET_LISTENER to get the fd out to the privileged task.
>
> v2: fix a bug where listener mode was not unset when an unused fd was not
>     available
> v3: fix refcounting bug (Oleg)
> v4: * change the listener's fd flags to be 0
>     * rename GET_LISTENER to NEW_LISTENER (Matthew)
> v5: * add capable(CAP_SYS_ADMIN) requirement
> v7: * point the new listener at the right filter (Jann)
>
> Signed-off-by: Tycho Andersen
> CC: Kees Cook
> CC: Andy Lutomirski
> CC: Oleg Nesterov
> CC: Eric W. Biederman
> CC: "Serge E. Hallyn"
> CC: Christian Brauner
> CC: Tyler Hicks
> CC: Akihiro Suda

If you address the two nits below, you can add:

Reviewed-by: Jann Horn

>  include/linux/seccomp.h                       |  7 ++
>  include/uapi/linux/ptrace.h                   |  2 +
>  kernel/ptrace.c                               |  4 ++
>  kernel/seccomp.c                              | 31 +++++++++
>  tools/testing/selftests/seccomp/seccomp_bpf.c | 68 +++++++++++++++++++
>  5 files changed, 112 insertions(+)
>
> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> index 017444b5efed..234c61b37405 100644
> --- a/include/linux/seccomp.h
> +++ b/include/linux/seccomp.h
> @@ -83,6 +83,8 @@ static inline int seccomp_mode(struct seccomp *s)
>  #ifdef CONFIG_SECCOMP_FILTER
>  extern void put_seccomp_filter(struct task_struct *tsk);
>  extern void get_seccomp_filter(struct task_struct *tsk);
> +extern long seccomp_new_listener(struct task_struct *task,
> +				 unsigned long filter_off);

Nit: Sorry, I only noticed this just now, but this should have return type
int, not long. ptrace_request() returns an int, and an fd is also normally
represented as an int, not a long.
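A minimal sketch of the first nit applied to the header: the same two declarations as in the patch, with the return type changed from long to int. Built as ordinary userspace C, CONFIG_SECCOMP_FILTER is undefined, so only the -EINVAL stub is compiled. handle_new_listener_request() is a hypothetical stand-in for the ptrace_request() call site (which, as noted above, returns an int); it is not kernel code.

```c
#include <assert.h>
#include <errno.h>

/* Opaque here: these declarations only ever pass pointers to it. */
struct task_struct;

#ifdef CONFIG_SECCOMP_FILTER
extern int seccomp_new_listener(struct task_struct *task,
				unsigned long filter_off);
#else /* CONFIG_SECCOMP_FILTER */
/* Stub as in the patch, but returning int: without filter support there
 * is never a listener to hand out. */
static inline int seccomp_new_listener(struct task_struct *task,
				       unsigned long filter_off)
{
	return -EINVAL;
}
#endif /* CONFIG_SECCOMP_FILTER */

/* Hypothetical stand-in for the ptrace_request() call site. Since both the
 * caller and the helper deal in int (an fd or a negative errno), the value
 * is passed through without any long-to-int narrowing. */
static int handle_new_listener_request(struct task_struct *child,
				       unsigned long addr)
{
	int ret = seccomp_new_listener(child, addr);

	return ret;
}
```

With the int return type, the fd-or-errno value has the same type from seccomp_new_listener() all the way out of ptrace_request().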
>  #else /* CONFIG_SECCOMP_FILTER */
>  static inline void put_seccomp_filter(struct task_struct *tsk)
>  {
> @@ -92,6 +94,11 @@ static inline void get_seccomp_filter(struct task_struct *tsk)
>  {
>  	return;
>  }
> +static inline long seccomp_new_listener(struct task_struct *task,
> +					unsigned long filter_off)
> +{
> +	return -EINVAL;
> +}
>  #endif /* CONFIG_SECCOMP_FILTER */
>
>  #if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_CHECKPOINT_RESTORE)
> diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
> index d5a1b8a492b9..e80ecb1bd427 100644
> --- a/include/uapi/linux/ptrace.h
> +++ b/include/uapi/linux/ptrace.h
> @@ -73,6 +73,8 @@ struct seccomp_metadata {
>  	__u64 flags; /* Output: filter's flags */
>  };
>
> +#define PTRACE_SECCOMP_NEW_LISTENER 0x420e
> +
>  /* Read signals from a shared (process wide) queue */
>  #define PTRACE_PEEKSIGINFO_SHARED (1 << 0)
>
> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index 21fec73d45d4..289960ac181b 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -1096,6 +1096,10 @@ int ptrace_request(struct task_struct *child, long request,
>  		ret = seccomp_get_metadata(child, addr, datavp);
>  		break;
>
> +	case PTRACE_SECCOMP_NEW_LISTENER:
> +		ret = seccomp_new_listener(child, addr);
> +		break;
> +
>  	default:
>  		break;
>  	}
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 44a31ac8373a..17685803a2af 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -1777,4 +1777,35 @@ static struct file *init_listener(struct task_struct *task,
>
>  	return ret;
>  }
> +
> +long seccomp_new_listener(struct task_struct *task,
> +			  unsigned long filter_off)
> +{
> +	struct seccomp_filter *filter;
> +	struct file *listener;
> +	int fd;
> +
> +	if (!capable(CAP_SYS_ADMIN))
> +		return -EACCES;
> +
> +	filter = get_nth_filter(task, filter_off);
> +	if (IS_ERR(filter))
> +		return PTR_ERR(filter);
> +
> +	fd = get_unused_fd_flags(0);

s/0/O_CLOEXEC/ ?
If userspace needs a non-cloexec fd, userspace can easily unset O_CLOEXEC;
but the reverse isn't true, because it'd be racy.

> +	if (fd < 0) {
> +		__put_seccomp_filter(filter);
> +		return fd;
> +	}
> +
> +	listener = init_listener(task, filter);
> +	__put_seccomp_filter(filter);
> +	if (IS_ERR(listener)) {
> +		put_unused_fd(fd);
> +		return PTR_ERR(listener);
> +	}
> +
> +	fd_install(fd, listener);
> +	return fd;
> +}
>  #endif
> diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
> index 5f4b836a6792..c6ba3ed5392e 100644
> --- a/tools/testing/selftests/seccomp/seccomp_bpf.c
> +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
> @@ -193,6 +193,10 @@ int seccomp(unsigned int op, unsigned int flags, void *args)
>  }
>  #endif
>
> +#ifndef PTRACE_SECCOMP_NEW_LISTENER
> +#define PTRACE_SECCOMP_NEW_LISTENER 0x420e
> +#endif
> +
>  #if __BYTE_ORDER == __LITTLE_ENDIAN
>  #define syscall_arg(_n) (offsetof(struct seccomp_data, args[_n]))
>  #elif __BYTE_ORDER == __BIG_ENDIAN
> @@ -3175,6 +3179,70 @@ TEST(get_user_notification_syscall)
>  	EXPECT_EQ(0, WEXITSTATUS(status));
>  }
>
> +TEST(get_user_notification_ptrace)
> +{
> +	pid_t pid;
> +	int status, listener;
> +	int sk_pair[2];
> +	char c;
> +	struct seccomp_notif req = {};
> +	struct seccomp_notif_resp resp = {};
> +
> +	ASSERT_EQ(socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, sk_pair), 0);
> +
> +	pid = fork();
> +	ASSERT_GE(pid, 0);
> +
> +	if (pid == 0) {
> +		EXPECT_EQ(user_trap_syscall(__NR_getpid, 0), 0);
> +
> +		/* Test that we get ENOSYS while not attached */
> +		EXPECT_EQ(syscall(__NR_getpid), -1);
> +		EXPECT_EQ(errno, ENOSYS);
> +
> +		/* Signal we're ready and have installed the filter. */
> +		EXPECT_EQ(write(sk_pair[1], "J", 1), 1);
> +
> +		EXPECT_EQ(read(sk_pair[1], &c, 1), 1);
> +		EXPECT_EQ(c, 'H');
> +
> +		exit(syscall(__NR_getpid) != USER_NOTIF_MAGIC);
> +	}
> +
> +	EXPECT_EQ(read(sk_pair[0], &c, 1), 1);
> +	EXPECT_EQ(c, 'J');
> +
> +	EXPECT_EQ(ptrace(PTRACE_ATTACH, pid), 0);
> +	EXPECT_EQ(waitpid(pid, NULL, 0), pid);
> +	listener = ptrace(PTRACE_SECCOMP_NEW_LISTENER, pid, 0);
> +	EXPECT_GE(listener, 0);
> +
> +	/* EBUSY for second listener */
> +	EXPECT_EQ(ptrace(PTRACE_SECCOMP_NEW_LISTENER, pid, 0), -1);
> +	EXPECT_EQ(errno, EBUSY);
> +
> +	EXPECT_EQ(ptrace(PTRACE_DETACH, pid, NULL, 0), 0);
> +
> +	/* Now signal we are done and respond with magic */
> +	EXPECT_EQ(write(sk_pair[0], "H", 1), 1);
> +
> +	req.len = sizeof(req);
> +	EXPECT_EQ(ioctl(listener, SECCOMP_NOTIF_RECV, &req), sizeof(req));
> +
> +	resp.len = sizeof(resp);
> +	resp.id = req.id;
> +	resp.error = 0;
> +	resp.val = USER_NOTIF_MAGIC;
> +
> +	EXPECT_EQ(ioctl(listener, SECCOMP_NOTIF_SEND, &resp), sizeof(resp));
> +
> +	EXPECT_EQ(waitpid(pid, &status, 0), pid);
> +	EXPECT_EQ(true, WIFEXITED(status));
> +	EXPECT_EQ(0, WEXITSTATUS(status));
> +
> +	close(listener);
> +}
> +
>  /*
>   * Check that a pid in a child namespace still shows up as valid in ours.
>   */
> --
> 2.17.1
>
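On the O_CLOEXEC nit: the asymmetry is that close-on-exec can be cleared safely after the fd exists, but not set safely. If the listener were handed out without O_CLOEXEC and a multithreaded tracer then set FD_CLOEXEC itself, another thread could fork() and execve() in between and leak the listener into the new program. Starting from O_CLOEXEC, a caller that does want the fd inherited simply opts out. A minimal userspace sketch of that opt-out with fcntl(); the helper names are hypothetical, and in the usage below /dev/null stands in for the listener fd.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Clear FD_CLOEXEC so that fd survives execve(). Returns 0 or -errno.
 * Doing this after creation is harmless: the worst case is that the fd is
 * closed on an exec that races with this call, which is what it started as. */
static int make_inheritable(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -errno;
	if (fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0)
		return -errno;
	return 0;
}

/* Returns 1 if FD_CLOEXEC is set on fd, 0 if it is not, -errno on error. */
static int is_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -errno;
	return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

Usage: open an fd with O_CLOEXEC (as the review suggests for the listener), check that is_cloexec() reports 1, then call make_inheritable() only in the callers that actually need the fd across execve().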