Received: by 2002:a25:c205:0:0:0:0:0 with SMTP id s5csp3046260ybf; Mon, 2 Mar 2020 21:49:51 -0800 (PST) X-Google-Smtp-Source: ADFU+vshHo7scnxvndJfJduTt0bcknS+H8/KxiwxVseGxzKmPBXGtup20zGy6wMNm70G5IxGdWb6 X-Received: by 2002:a9d:6255:: with SMTP id i21mr2240567otk.183.1583214591255; Mon, 02 Mar 2020 21:49:51 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1583214591; cv=none; d=google.com; s=arc-20160816; b=Tv/pcx7v+AifJFeKSyTRT95HnSz465xR23s8CVPGx2c0jkRf7mdtky2l43nWkiW7eC lCC3+peeA0NZRCc9Eo/OJV/5KuDWORopgcjoF5eWpPTZaTjIf4HUT/2dcS+ymMs3AICB djXG9CNJxaNssizNoHIT74rcpqsdygbVSC34m1rdECVHf/aigiT0pRO6pmBy8kFCZKK1 nnS/Hv+NMWQ7b9lZDnhfwgm+CFUDn890yI9Q/kCiFXv+aUsmJvw/Q285n3Rqo9CJb9Jt essQzylQKOEJ4LhV/McdbFfp0kIIn/6yByFWzp/xPcqM/z1Gs2ggbpC7yqijqlDmUqdO FD7g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :dkim-signature; bh=5j43Mr/bIrOBMk42mKrN2eImbOvbfPNmFOTHSb411kg=; b=Tu4wasjeM/vAnyzZwyfPOqUkPsLDYqZ62bBCg2Z39NEYL4s/r7e4kHPlkjVRTVqECT d1X5GHjXPwjwKCGaiTgUGT9ysUVfaqh0+FQY7mBqsil5skhpbZdCVsmQIwuXXaui6lAs nnsv435SUGmuHpJ7i3vLVFDWCqggPKhb/KdipM6yDmTGe99T6fF5oiq8POjP/Vmn4LFP FF7au+e8oBpuxUTbk5ZvgS1f2N5Yt9sFQJEjePgoyQiB1qtCU6RPE0f7MQ9cBtvQeyse 3dEVnjLXtNPZEiuMu2iJbIetCMmcGxrzNV7KpeTVdPl3LMWHV9T8hFcG11qljydr7ma3 AlQw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=OAD7nK7j; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id s129si8159811oig.177.2020.03.02.21.49.39; Mon, 02 Mar 2020 21:49:51 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=OAD7nK7j; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727407AbgCCFsh (ORCPT + 99 others); Tue, 3 Mar 2020 00:48:37 -0500 Received: from mail-pf1-f193.google.com ([209.85.210.193]:45860 "EHLO mail-pf1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725765AbgCCFsh (ORCPT ); Tue, 3 Mar 2020 00:48:37 -0500 Received: by mail-pf1-f193.google.com with SMTP id 2so878146pfg.12 for ; Mon, 02 Mar 2020 21:48:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=5j43Mr/bIrOBMk42mKrN2eImbOvbfPNmFOTHSb411kg=; b=OAD7nK7jEoAlZecHBF+pnbuP/UPbpiVptrt3hVUEk4WE3IiIyEzFqMVYyAfhlzvDYB tMbkY8MXm6lWVLO8RpQ3Wfsd1LJ6XnEYwgjQGp7KBUI503tqt0Kv3BhKjoVIgTPHSXfQ zu20aUo7oMZw5nJnLi+MJqJWi+OsA9bj6ZCmU= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=5j43Mr/bIrOBMk42mKrN2eImbOvbfPNmFOTHSb411kg=; b=LyJ6yZ9ArlMbTUZsSS6kiSzf1Ibb3hkQ61o690K6PVfdxlBszyRlMEvP/YGstEROyx Lczu4uzqt5ZOWdNGmNUFnsQe2rHI1KjfCrML2oi+3LLfWbUh9wQreVA+zkWR+7ioe3Uh Ot17dcAEJ/Nl+vbH/DFA44qmzputKur6wRnH4ZgnzLTX8QlTWbV9CKjsLIkvNXIJAW3K mfwsgYNx/zGHfhewfDF8EQ4hZO+/YPGtq/92LH3X+tbDBODZ11pGvk8Smd1lXGs1bjq8 RL7QwMEJbaqjC6SMvGM9nKU0zQVm7pRG/SvJ+mJysfvaz3ALwGUTI7TB4gd8ySORmesy xLEw== X-Gm-Message-State: ANhLgQ2xTRBBtT3NQ4MSoUNBhenPnu0AVXq/kgpn/ap48CtYDXBR9u1u Z0sIkXKNpPyAHrO9AJbhTu/+MhyXvuc= X-Received: by 2002:a63:e053:: with SMTP id n19mr1049656pgj.64.1583214516048; Mon, 02 Mar 2020 21:48:36 -0800 (PST) Received: from www.outflux.net (smtp.outflux.net. [198.145.64.163]) by smtp.gmail.com with ESMTPSA id y142sm14629115pfb.25.2020.03.02.21.48.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 02 Mar 2020 21:48:35 -0800 (PST) Date: Mon, 2 Mar 2020 21:48:34 -0800 From: Kees Cook To: Tycho Andersen Cc: linux-kernel@vger.kernel.org, Matthew Denton Subject: Re: [PATCH] seccomp: allow TSYNC and USER_NOTIF together Message-ID: <202003022137.9DAB55E6F0@keescook> References: <20200206165027.18415-1-tycho@tycho.ws> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200206165027.18415-1-tycho@tycho.ws> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Feb 06, 2020 at 09:50:27AM -0700, Tycho Andersen wrote: > The restriction introduced in 7a0df7fbc145 ("seccomp: Make NEW_LISTENER and > TSYNC flags exclusive") is mostly artificial: there is enough information > in a seccomp user notification to tell which thread triggered a > notification. The reason it was introduced is because TSYNC makes the > syscall return a thread-id on failure, and NEW_LISTENER returns an fd, and > there's no way to distinguish between these two cases (well, I suppose the > caller could check all fds it has, then do the syscall, and if the return > value was an fd that already existed, then it must be a thread id, but > bleh). > > Matthew would like to use these two flags together in the Chrome sandbox > which wants to use TSYNC for video drivers and NEW_LISTENER to proxy > syscalls. > > So, let's fix this ugliness by adding another flag, NO_TID_ON_TSYNC_ERR, > which tells the kernel to just return -EAGAIN on a TSYNC error. This way, > NEW_LISTENER (and any subsequent seccomp() commands that want to return > positive values) don't conflict with each other. > > Suggested-by: Matthew Denton > Signed-off-by: Tycho Andersen Thanks for this! (And thanks for waiting on my review!) Yeah, this makes things much more sensible. If we get a third thing that wants to be returned, we'll have to rev the userspace struct API to have an "output" area. :P Bike shedding below... > --- > include/linux/seccomp.h | 3 +- > include/uapi/linux/seccomp.h | 1 + > kernel/seccomp.c | 14 +++- > tools/testing/selftests/seccomp/seccomp_bpf.c | 74 ++++++++++++++++++- > 4 files changed, 86 insertions(+), 6 deletions(-) > > diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h > index 03583b6d1416..e0560a941ed1 100644 > --- a/include/linux/seccomp.h > +++ b/include/linux/seccomp.h > @@ -7,7 +7,8 @@ > #define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC | \ > SECCOMP_FILTER_FLAG_LOG | \ > SECCOMP_FILTER_FLAG_SPEC_ALLOW | \ > - SECCOMP_FILTER_FLAG_NEW_LISTENER) > + SECCOMP_FILTER_FLAG_NEW_LISTENER | \ > + SECCOMP_FILTER_FLAG_NO_TID_ON_TSYNC_ERR) > > #ifdef CONFIG_SECCOMP > > diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h > index be84d87f1f46..64678cc20e18 100644 > --- a/include/uapi/linux/seccomp.h > +++ b/include/uapi/linux/seccomp.h > @@ -22,6 +22,7 @@ > #define SECCOMP_FILTER_FLAG_LOG (1UL << 1) > #define SECCOMP_FILTER_FLAG_SPEC_ALLOW (1UL << 2) > #define SECCOMP_FILTER_FLAG_NEW_LISTENER (1UL << 3) > +#define SECCOMP_FILTER_FLAG_NO_TID_ON_TSYNC_ERR (1UL << 4) Bikeshed: what do you think about calling this SECCOMP_FILTER_FLAG_TSYNC_ESRCH to mean "I don't care _which_ thread, just fail" (See below about the ESRCH bit...) > > /* > * All BPF programs must return a 32-bit value. > diff --git a/kernel/seccomp.c b/kernel/seccomp.c > index b6ea3dcb57bf..fa01ec085d60 100644 > --- a/kernel/seccomp.c > +++ b/kernel/seccomp.c > @@ -528,8 +528,12 @@ static long seccomp_attach_filter(unsigned int flags, > int ret; > > ret = seccomp_can_sync_threads(); > - if (ret) > - return ret; > + if (ret) { > + if (flags & SECCOMP_FILTER_FLAG_NO_TID_ON_TSYNC_ERR) > + return -EAGAIN; Hm hm, I think EAGAIN is wrong here: this isn't likely to be a transient failure (unless the offending thread dies). The two ways TSYNC can fail are if a thread has seccomp mode 1 set, or if the thread's filter ancestry has already diverged. Trying again isn't really going to help (which is why the original motivation was to return thread details to help debug why TSYNC failed). In the case where the thread id can't be found (container visibility??), we fail with -ESRCH. That might be more sensible than -EAGAIN here. (Or maybe -EBUSY?) > + else > + return ret; > + } > } > > /* Set log flag, if present. */ > @@ -1288,10 +1292,12 @@ static long seccomp_set_mode_filter(unsigned int flags, > * In the successful case, NEW_LISTENER returns the new listener fd. > * But in the failure case, TSYNC returns the thread that died. If you > * combine these two flags, there's no way to tell whether something > - * succeeded or failed. So, let's disallow this combination. > + * succeeded or failed. So, let's disallow this combination if the user > + * has not explicitly requested no errors from TSYNC. > */ > if ((flags & SECCOMP_FILTER_FLAG_TSYNC) && > - (flags & SECCOMP_FILTER_FLAG_NEW_LISTENER)) > + (flags & SECCOMP_FILTER_FLAG_NEW_LISTENER) && > + ((flags & SECCOMP_FILTER_FLAG_NO_TID_ON_TSYNC_ERR) == 0)) > return -EINVAL; > > /* Prepare the new filter before holding any locks. */ > diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c > index ee1b727ede04..b7ec8655dd1c 100644 > --- a/tools/testing/selftests/seccomp/seccomp_bpf.c > +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c > @@ -212,6 +212,10 @@ struct seccomp_notif_sizes { > #define SECCOMP_USER_NOTIF_FLAG_CONTINUE 0x00000001 > #endif > > +#ifndef SECCOMP_FILTER_FLAG_NO_TID_ON_TSYNC_ERR > +#define SECCOMP_FILTER_FLAG_NO_TID_ON_TSYNC_ERR (1UL << 4) > +#endif > + > #ifndef seccomp > int seccomp(unsigned int op, unsigned int flags, void *args) > { > @@ -2187,7 +2191,8 @@ TEST(detect_seccomp_filter_flags) > unsigned int flags[] = { SECCOMP_FILTER_FLAG_TSYNC, > SECCOMP_FILTER_FLAG_LOG, > SECCOMP_FILTER_FLAG_SPEC_ALLOW, > - SECCOMP_FILTER_FLAG_NEW_LISTENER }; > + SECCOMP_FILTER_FLAG_NEW_LISTENER, > + SECCOMP_FILTER_FLAG_NO_TID_ON_TSYNC_ERR }; > unsigned int exclusive[] = { > SECCOMP_FILTER_FLAG_TSYNC, > SECCOMP_FILTER_FLAG_NEW_LISTENER }; > @@ -2645,6 +2650,55 @@ TEST_F(TSYNC, two_siblings_with_one_divergence) > EXPECT_EQ(SIBLING_EXIT_UNKILLED, (long)status); > } > > +TEST_F(TSYNC, two_siblings_with_one_divergence_no_tid_in_err) > +{ > + long ret, flags; > + void *status; > + > + ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) { > + TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!"); > + } > + > + ret = seccomp(SECCOMP_SET_MODE_FILTER, 0, &self->root_prog); > + ASSERT_NE(ENOSYS, errno) { > + TH_LOG("Kernel does not support seccomp syscall!"); > + } > + ASSERT_EQ(0, ret) { > + TH_LOG("Kernel does not support SECCOMP_SET_MODE_FILTER!"); > + } > + self->sibling[0].diverge = 1; > + tsync_start_sibling(&self->sibling[0]); > + tsync_start_sibling(&self->sibling[1]); > + > + while (self->sibling_count < TSYNC_SIBLINGS) { > + sem_wait(&self->started); > + self->sibling_count++; > + } > + > + flags = SECCOMP_FILTER_FLAG_TSYNC | \ > + SECCOMP_FILTER_FLAG_NO_TID_ON_TSYNC_ERR; > + ret = seccomp(SECCOMP_SET_MODE_FILTER, flags, &self->apply_prog); > + ASSERT_EQ(EAGAIN, errno) { > + TH_LOG("Did not return EAGAIN for diverged sibling."); > + } > + ASSERT_EQ(-1, ret) { > + TH_LOG("Did not fail on diverged sibling."); > + } > + > + /* Wake the threads */ > + pthread_mutex_lock(&self->mutex); > + ASSERT_EQ(0, pthread_cond_broadcast(&self->cond)) { > + TH_LOG("cond broadcast non-zero"); > + } > + pthread_mutex_unlock(&self->mutex); > + > + /* Ensure they are both unkilled. */ > + PTHREAD_JOIN(self->sibling[0].tid, &status); > + EXPECT_EQ(SIBLING_EXIT_UNKILLED, (long)status); > + PTHREAD_JOIN(self->sibling[1].tid, &status); > + EXPECT_EQ(SIBLING_EXIT_UNKILLED, (long)status); > +} > + > TEST_F(TSYNC, two_siblings_not_under_filter) > { > long ret, sib; > @@ -3196,6 +3250,24 @@ TEST(user_notification_basic) > EXPECT_EQ(0, WEXITSTATUS(status)); > } > > +TEST(user_notification_with_tsync) > +{ > + int ret; > + unsigned int flags; > + > + /* these were exclusive */ > + flags = SECCOMP_FILTER_FLAG_NEW_LISTENER | > + SECCOMP_FILTER_FLAG_TSYNC; > + ASSERT_EQ(-1, user_trap_syscall(__NR_getppid, flags)); > + ASSERT_EQ(EINVAL, errno); > + > + /* but now they're not */ > + flags |= SECCOMP_FILTER_FLAG_NO_TID_ON_TSYNC_ERR; > + ret = user_trap_syscall(__NR_getppid, flags); > + close(ret); > + ASSERT_LE(0, ret); > +} > + > TEST(user_notification_kill_in_middle) > { > pid_t pid; > -- > 2.20.1 > Otherwise, yes, let's do this. Much saner. :) -- Kees Cook