2017-08-09 19:02:49

by Kees Cook

[permalink] [raw]
Subject: [PATCH v3 0/4] seccomp: Add SECCOMP_FILTER_FLAG_KILL_PROCESS

This series is the result of Fabricio and I going around a few times
on possible solutions for finding a way to enhance RET_KILL to kill
the process group. There's a lot of ways this could be done, but I
wanted something that felt cleanest. As it happens, Tyler's recent
patch series for logging improvement also needs to know a litte bit
more during filter runs, and the solution for both is to pass back
the matched filter. This lets us examine it here for RET_KILL and
in the future for logging changes.

The filter passing is patch 1, the new flag for RET_KILL is patch 2.
Some test refactoring is in patch 3 for the RET_DATA ordering, and
patch 4 is the test for the new RET_KILL flag.

One thing missing is that CRIU will likely need to be updated, since
saving/restoring seccomp filter _rules_ will not include the filter
_flags_ for a process. This can be addressed separately.

Please take a look!

Thanks,

-Kees

v3:
- adjust seccomp_run_filters() to avoid later filters from masking
kill-process RET_KILL actions (drewry)
- add test for masked RET_KILL.

v2:
- moved kill_process bool into struct padding gap (tyhicks)
- improved comments/docs in various places for clarify (tyhicks)
- use ASSERT_TRUE() for WIFEXITED and WIFSIGNALLED (tyhicks)
- adding Reviewed-bys from tyhicks


2017-08-09 19:02:09

by Kees Cook

[permalink] [raw]
Subject: [PATCH v3 2/4] seccomp: Add SECCOMP_FILTER_FLAG_KILL_PROCESS

Right now, SECCOMP_RET_KILL kills the current thread. There have been
a few requests for RET_KILL to kill the entire process (the thread
group), but since seccomp's u32 return values are ABI, and ordered by
lowest value, with RET_KILL as 0, there isn't a trivial way to provide
an even smaller value that would mean the more restrictive action of
killing the thread group.

Changing the definition of SECCOMP_RET_KILL to mean "kill process" and
then adding a new higher definition of SECCOMP_RET_KILL_THREAD isn't
possible since there are already userspace programs depending on the
kill-thread behavior.

Instead, create a simulated SECCOMP_RET_KILL_PROCESS action which
has the semantics of such a return had it been possible to create
it naturally. This is done by adding a filter flag that indicates
that a RET_KILL from this filter must kill the process rather
than the thread. This can be set (and not cleared) via the new
SECCOMP_FILTER_FLAG_KILL_PROCESS flag.

Pros:
- the logic for the filter action is contained in the filter.
- userspace can detect support for the feature since earlier kernels
will reject the new flag.
- can be distinguished from "normal" RET_KILL in the same filter
chain (in case a program wants to choose to kill thread or process).
Cons:
- depends on adding an assignment to the seccomp_run_filters() loop
(previous patch).
- adds logic to the seccomp_run_filters() loop (though it is a single
unlikely test for zero, so the cycle count change isn't measurable).

Alternatives to this approach with pros/cons:

- Change userspace API to use an s32 for BPF return value, and use a
negative value to indicate SECCOMP_RET_KILL_PROCESS.
Pros:
- no change needed to seccomp_run_filters() loop.
- the logic for the filter action is contained in the filter.
- can be distinguished from "normal" RET_KILL in the same filter
chain (in case a program wants to choose to kill thread or process).
Cons:
- there isn't a trivial way for userspace to detect if the kernel
supports the feature (earlier kernels will silently ignore the
new value, only kill the thread).
- need to teach libseccomp about new details, extensive updates
to documentation, and hope there is not confusion about signedness
in existing userspace.

- Use a new test during seccomp_run_filters() that treats the RET_DATA
mask of a RET_KILL action as special. If a new bit is set in the data,
then treat the return value as -1 (lower than 0).
Pros:
- the logic for the filter action is contained in the filter.
- can be distinguished from "normal" RET_KILL in the same filter
chain (in case a program wants to choose to kill thread or process).
Cons:
- adds more than a single unlikely test to time-sensitive
seccomp_run_filters() loop.
- there isn't a trivial way for userspace to detect if the kernel
supports the feature (earlier kernels will silently ignore the
RET_DATA and only kill the thread).
- Violates the "most recent RET_DATA" return value used for all
other actions (since processing must continue to find the first
RET_KILL with the flag set).
- May create problems for existing programs that were using RET_DATA
to distinguish between various SECCOMP_RET_KILL locations.

- Have SECCOMP_FILTER_FLAG_KILL_PROCESS attach to the seccomp struct
rather than the filter.
Pros:
- no change needed to seccomp_run_filters() loop.
- userspace can detect support for the feature since earlier kernels
will reject the new flag.
Cons:
- does not provide a way for a set of filters to distinguish
between wanting to kill a thread or a process.
- the logic for the filter action isn't contained in the filter,
which may lead to synchronization bugs (e.g. vs TSYNC, vs CRIU, etc).

Signed-off-by: Kees Cook <[email protected]>
---
include/linux/seccomp.h | 3 ++-
include/uapi/linux/seccomp.h | 3 ++-
kernel/seccomp.c | 54 ++++++++++++++++++++++++++++++++++++++++----
3 files changed, 54 insertions(+), 6 deletions(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index ecc296c137cd..59d001ba655c 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -3,7 +3,8 @@

#include <uapi/linux/seccomp.h>

-#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC)
+#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC | \
+ SECCOMP_FILTER_FLAG_KILL_PROCESS)

#ifdef CONFIG_SECCOMP

diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 0f238a43ff1e..4b75d8c297b6 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -15,7 +15,8 @@
#define SECCOMP_SET_MODE_FILTER 1

/* Valid flags for SECCOMP_SET_MODE_FILTER */
-#define SECCOMP_FILTER_FLAG_TSYNC 1
+#define SECCOMP_FILTER_FLAG_TSYNC 1
+#define SECCOMP_FILTER_FLAG_KILL_PROCESS 2

/*
* All BPF programs must return a 32-bit value.
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 1f3347fc2605..6a196d1495e6 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -44,6 +44,7 @@
* is only needed for handling filters shared across tasks.
* @prev: points to a previously installed, or inherited, filter
* @prog: the BPF program to evaluate
+ * @kill_process: if true, RET_KILL will kill process rather than thread.
*
* seccomp_filter objects are organized in a tree linked via the @prev
* pointer. For any task, it appears to be a singly-linked list starting
@@ -57,6 +58,7 @@
*/
struct seccomp_filter {
refcount_t usage;
+ bool kill_process;
struct seccomp_filter *prev;
struct bpf_prog *prog;
};
@@ -201,8 +203,25 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
*/
for (; f; f = f->prev) {
u32 cur_ret = BPF_PROG_RUN(f->prog, sd);
+ u32 action = cur_ret & SECCOMP_RET_ACTION;

- if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION)) {
+ /*
+ * In order to distinguish between SECCOMP_RET_KILL and
+ * "higher priority" synthetic SECCOMP_RET_KILL_PROCESS
+ * identified by the kill_process filter flag, treat any
+ * case as immediately stopping filter processing. No
+ * higher priority action can exist, and we can't stop
+ * on the first RET_KILL (which may not have set
+ * f->kill_process) when a RET_KILL further up the filter
+ * list may have f->kill_process set which would go
+ * unnoticed.
+ */
+ if (unlikely(action == SECCOMP_RET_KILL && f->kill_process)) {
+ *match = f;
+ return cur_ret;
+ }
+
+ if (action < (ret & SECCOMP_RET_ACTION)) {
ret = cur_ret;
*match = f;
}
@@ -450,6 +469,10 @@ static long seccomp_attach_filter(unsigned int flags,
return ret;
}

+ /* Set process-killing flag, if present. */
+ if (flags & SECCOMP_FILTER_FLAG_KILL_PROCESS)
+ filter->kill_process = true;
+
/*
* If there is an existing filter, make it the prev and don't drop its
* task reference.
@@ -574,6 +597,7 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
u32 filter_ret, action;
struct seccomp_filter *match = NULL;
int data;
+ bool kill_process;

/*
* Make sure that any changes to mode from another thread have
@@ -655,8 +679,26 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
case SECCOMP_RET_KILL:
default:
audit_seccomp(this_syscall, SIGSYS, action);
- /* Dump core only if this is the last remaining thread. */
- if (get_nr_threads(current) == 1) {
+
+ /*
+ * The only way match can be NULL here is if something
+ * went very wrong in seccomp_run_filters() (e.g. a NULL
+ * filter list in struct seccomp) and the return action
+ * falls back to failing closed. In this case, take the
+ * strongest possible action.
+ *
+ * If we get here with match->kill_process set, we need
+ * to kill the entire thread group. Otherwise, kill only
+ * the offending thread.
+ */
+ kill_process = (!match || match->kill_process);
+
+ /*
+ * Dumping core kills the entire thread group, so only
+ * coredump if that has been requested or this was already
+ * the only thread running.
+ */
+ if (kill_process || get_nr_threads(current) == 1) {
siginfo_t info;

/* Show the original registers in the dump. */
@@ -665,7 +707,11 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
seccomp_init_siginfo(&info, this_syscall, data);
do_coredump(&info);
}
- do_exit(SIGSYS);
+
+ if (kill_process)
+ do_group_exit(SIGSYS);
+ else
+ do_exit(SIGSYS);
}

unreachable();
--
2.7.4

2017-08-09 19:02:25

by Kees Cook

[permalink] [raw]
Subject: [PATCH v3 4/4] selftests/seccomp: Test thread vs process killing

SECCOMP_RET_KILL is supposed to kill the current thread (and userspace
depends on this), so test for this, distinct from killing the entire
process. This also tests killing the entire process with the new
SECCOMP_FILTER_FLAG_KILL_PROCESS flag. (This also moves a bunch of
defines up earlier in the file to use them earlier.)

Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: Tyler Hicks <[email protected]>
---
tools/testing/selftests/seccomp/seccomp_bpf.c | 192 ++++++++++++++++++++------
1 file changed, 151 insertions(+), 41 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index ee78a53da5d1..92f2779b6309 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -87,6 +87,51 @@ struct seccomp_data {
};
#endif

+#ifndef __NR_seccomp
+# if defined(__i386__)
+# define __NR_seccomp 354
+# elif defined(__x86_64__)
+# define __NR_seccomp 317
+# elif defined(__arm__)
+# define __NR_seccomp 383
+# elif defined(__aarch64__)
+# define __NR_seccomp 277
+# elif defined(__hppa__)
+# define __NR_seccomp 338
+# elif defined(__powerpc__)
+# define __NR_seccomp 358
+# elif defined(__s390__)
+# define __NR_seccomp 348
+# else
+# warning "seccomp syscall number unknown for this architecture"
+# define __NR_seccomp 0xffff
+# endif
+#endif
+
+#ifndef SECCOMP_SET_MODE_STRICT
+#define SECCOMP_SET_MODE_STRICT 0
+#endif
+
+#ifndef SECCOMP_SET_MODE_FILTER
+#define SECCOMP_SET_MODE_FILTER 1
+#endif
+
+#ifndef SECCOMP_FILTER_FLAG_TSYNC
+#define SECCOMP_FILTER_FLAG_TSYNC 1
+#endif
+
+#ifndef SECCOMP_FILTER_FLAG_KILL_PROCESS
+#define SECCOMP_FILTER_FLAG_KILL_PROCESS 2
+#endif
+
+#ifndef seccomp
+int seccomp(unsigned int op, unsigned int flags, void *args)
+{
+ errno = 0;
+ return syscall(__NR_seccomp, op, flags, args);
+}
+#endif
+
#if __BYTE_ORDER == __LITTLE_ENDIAN
#define syscall_arg(_n) (offsetof(struct seccomp_data, args[_n]))
#elif __BYTE_ORDER == __BIG_ENDIAN
@@ -520,6 +565,112 @@ TEST_SIGNAL(KILL_one_arg_six, SIGSYS)
close(fd);
}

+/* This is a thread task to die via seccomp filter violation. */
+void *kill_thread(void *data)
+{
+ bool die = (bool)data;
+
+ if (die) {
+ prctl(PR_GET_SECCOMP, 0, 0, 0, 0);
+ return (void *)SIBLING_EXIT_FAILURE;
+ }
+
+ return (void *)SIBLING_EXIT_UNKILLED;
+}
+
+/* Prepare a thread that will kill itself or both of us. */
+void kill_thread_or_group(struct __test_metadata *_metadata, bool kill_process)
+{
+ pthread_t thread;
+ void *status;
+ unsigned int flags;
+ /* Kill only when calling __NR_prctl. */
+ struct sock_filter filter[] = {
+ BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
+ offsetof(struct seccomp_data, nr)),
+ BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_prctl, 0, 1),
+ BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_KILL),
+ BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
+ };
+ struct sock_fprog prog = {
+ .len = (unsigned short)ARRAY_SIZE(filter),
+ .filter = filter,
+ };
+
+ ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
+ TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
+ }
+
+ flags = kill_process ? SECCOMP_FILTER_FLAG_KILL_PROCESS : 0;
+ ASSERT_EQ(0, seccomp(SECCOMP_SET_MODE_FILTER, flags, &prog)) {
+ if (kill_process)
+ TH_LOG("Kernel does not support SECCOMP_FILTER_FLAG_KILL_PROCESS");
+ else
+ TH_LOG("Kernel does not support seccomp syscall");
+ }
+
+ /*
+ * Add the kill rule again to make sure that the KILL_PROCESS
+ * flag cannot be downgraded by a new filter.
+ */
+ ASSERT_EQ(0, seccomp(SECCOMP_SET_MODE_FILTER, 0, &prog));
+
+ /* Start a thread that will exit immediately. */
+ ASSERT_EQ(0, pthread_create(&thread, NULL, kill_thread, (void *)false));
+ ASSERT_EQ(0, pthread_join(thread, &status));
+ ASSERT_EQ(SIBLING_EXIT_UNKILLED, (unsigned long)status);
+
+ /* Start a thread that will die immediately. */
+ ASSERT_EQ(0, pthread_create(&thread, NULL, kill_thread, (void *)true));
+ ASSERT_EQ(0, pthread_join(thread, &status));
+ ASSERT_NE(SIBLING_EXIT_FAILURE, (unsigned long)status);
+
+ /*
+ * If we get here, only the spawned thread died. Let the parent know
+ * the whole process didn't die (i.e. this thread, the spawner,
+ * stayed running).
+ */
+ exit(42);
+}
+
+TEST(KILL_thread)
+{
+ int status;
+ pid_t child_pid;
+
+ child_pid = fork();
+ ASSERT_LE(0, child_pid);
+ if (child_pid == 0) {
+ kill_thread_or_group(_metadata, false);
+ _exit(38);
+ }
+
+ ASSERT_EQ(child_pid, waitpid(child_pid, &status, 0));
+
+ /* If only the thread was killed, we'll see exit 42. */
+ ASSERT_TRUE(WIFEXITED(status));
+ ASSERT_EQ(42, WEXITSTATUS(status));
+}
+
+TEST(KILL_process)
+{
+ int status;
+ pid_t child_pid;
+
+ child_pid = fork();
+ ASSERT_LE(0, child_pid);
+ if (child_pid == 0) {
+ kill_thread_or_group(_metadata, true);
+ _exit(38);
+ }
+
+ ASSERT_EQ(child_pid, waitpid(child_pid, &status, 0));
+
+ /* If the entire process was killed, we'll see SIGSYS. */
+ ASSERT_TRUE(WIFSIGNALED(status));
+ ASSERT_EQ(SIGSYS, WTERMSIG(status));
+}
+
/* TODO(wad) add 64-bit versus 32-bit arg tests. */
TEST(arg_out_of_range)
{
@@ -1675,47 +1826,6 @@ TEST_F_SIGNAL(TRACE_syscall, kill_after_ptrace, SIGSYS)
EXPECT_NE(self->mypid, syscall(__NR_getpid));
}

-#ifndef __NR_seccomp
-# if defined(__i386__)
-# define __NR_seccomp 354
-# elif defined(__x86_64__)
-# define __NR_seccomp 317
-# elif defined(__arm__)
-# define __NR_seccomp 383
-# elif defined(__aarch64__)
-# define __NR_seccomp 277
-# elif defined(__hppa__)
-# define __NR_seccomp 338
-# elif defined(__powerpc__)
-# define __NR_seccomp 358
-# elif defined(__s390__)
-# define __NR_seccomp 348
-# else
-# warning "seccomp syscall number unknown for this architecture"
-# define __NR_seccomp 0xffff
-# endif
-#endif
-
-#ifndef SECCOMP_SET_MODE_STRICT
-#define SECCOMP_SET_MODE_STRICT 0
-#endif
-
-#ifndef SECCOMP_SET_MODE_FILTER
-#define SECCOMP_SET_MODE_FILTER 1
-#endif
-
-#ifndef SECCOMP_FILTER_FLAG_TSYNC
-#define SECCOMP_FILTER_FLAG_TSYNC 1
-#endif
-
-#ifndef seccomp
-int seccomp(unsigned int op, unsigned int flags, void *args)
-{
- errno = 0;
- return syscall(__NR_seccomp, op, flags, args);
-}
-#endif
-
TEST(seccomp_syscall)
{
struct sock_filter filter[] = {
--
2.7.4

2017-08-09 19:02:47

by Kees Cook

[permalink] [raw]
Subject: [PATCH v3 3/4] selftests/seccomp: Refactor RET_ERRNO tests

This refactors the errno tests (since they all use the same pattern for
their filter) and adds a RET_DATA field ordering test.

Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: Tyler Hicks <[email protected]>
---
tools/testing/selftests/seccomp/seccomp_bpf.c | 95 ++++++++++++++++-----------
1 file changed, 58 insertions(+), 37 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 73f5ea6778ce..ee78a53da5d1 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -136,7 +136,7 @@ TEST(no_new_privs_support)
}
}

-/* Tests kernel support by checking for a copy_from_user() fault on * NULL. */
+/* Tests kernel support by checking for a copy_from_user() fault on NULL. */
TEST(mode_filter_support)
{
long ret;
@@ -541,26 +541,30 @@ TEST(arg_out_of_range)
EXPECT_EQ(EINVAL, errno);
}

+#define ERRNO_FILTER(name, errno) \
+ struct sock_filter _read_filter_##name[] = { \
+ BPF_STMT(BPF_LD|BPF_W|BPF_ABS, \
+ offsetof(struct seccomp_data, nr)), \
+ BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_read, 0, 1), \
+ BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ERRNO | errno), \
+ BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW), \
+ }; \
+ struct sock_fprog prog_##name = { \
+ .len = (unsigned short)ARRAY_SIZE(_read_filter_##name), \
+ .filter = _read_filter_##name, \
+ }
+
+/* Make sure basic errno values are correctly passed through a filter. */
TEST(ERRNO_valid)
{
- struct sock_filter filter[] = {
- BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
- offsetof(struct seccomp_data, nr)),
- BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_read, 0, 1),
- BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ERRNO | E2BIG),
- BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
- };
- struct sock_fprog prog = {
- .len = (unsigned short)ARRAY_SIZE(filter),
- .filter = filter,
- };
+ ERRNO_FILTER(valid, E2BIG);
long ret;
pid_t parent = getppid();

ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
ASSERT_EQ(0, ret);

- ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
+ ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog_valid);
ASSERT_EQ(0, ret);

EXPECT_EQ(parent, syscall(__NR_getppid));
@@ -568,26 +572,17 @@ TEST(ERRNO_valid)
EXPECT_EQ(E2BIG, errno);
}

+/* Make sure an errno of zero is correctly handled by the arch code. */
TEST(ERRNO_zero)
{
- struct sock_filter filter[] = {
- BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
- offsetof(struct seccomp_data, nr)),
- BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_read, 0, 1),
- BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ERRNO | 0),
- BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
- };
- struct sock_fprog prog = {
- .len = (unsigned short)ARRAY_SIZE(filter),
- .filter = filter,
- };
+ ERRNO_FILTER(zero, 0);
long ret;
pid_t parent = getppid();

ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
ASSERT_EQ(0, ret);

- ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
+ ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog_zero);
ASSERT_EQ(0, ret);

EXPECT_EQ(parent, syscall(__NR_getppid));
@@ -595,26 +590,21 @@ TEST(ERRNO_zero)
EXPECT_EQ(0, read(0, NULL, 0));
}

+/*
+ * The SECCOMP_RET_DATA mask is 16 bits wide, but errno is smaller.
+ * This tests that the errno value gets capped correctly, fixed by
+ * 580c57f10768 ("seccomp: cap SECCOMP_RET_ERRNO data to MAX_ERRNO").
+ */
TEST(ERRNO_capped)
{
- struct sock_filter filter[] = {
- BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
- offsetof(struct seccomp_data, nr)),
- BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_read, 0, 1),
- BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ERRNO | 4096),
- BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
- };
- struct sock_fprog prog = {
- .len = (unsigned short)ARRAY_SIZE(filter),
- .filter = filter,
- };
+ ERRNO_FILTER(capped, 4096);
long ret;
pid_t parent = getppid();

ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
ASSERT_EQ(0, ret);

- ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
+ ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog_capped);
ASSERT_EQ(0, ret);

EXPECT_EQ(parent, syscall(__NR_getppid));
@@ -622,6 +612,37 @@ TEST(ERRNO_capped)
EXPECT_EQ(4095, errno);
}

+/*
+ * Filters are processed in reverse order: last applied is executed first.
+ * Since only the SECCOMP_RET_ACTION mask is tested for return values, the
+ * SECCOMP_RET_DATA mask results will follow the most recently applied
+ * matching filter return (and not the lowest or highest value).
+ */
+TEST(ERRNO_order)
+{
+ ERRNO_FILTER(first, 11);
+ ERRNO_FILTER(second, 13);
+ ERRNO_FILTER(third, 12);
+ long ret;
+ pid_t parent = getppid();
+
+ ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+ ASSERT_EQ(0, ret);
+
+ ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog_first);
+ ASSERT_EQ(0, ret);
+
+ ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog_second);
+ ASSERT_EQ(0, ret);
+
+ ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog_third);
+ ASSERT_EQ(0, ret);
+
+ EXPECT_EQ(parent, syscall(__NR_getppid));
+ EXPECT_EQ(-1, read(0, NULL, 0));
+ EXPECT_EQ(12, errno);
+}
+
FIXTURE_DATA(TRAP) {
struct sock_fprog prog;
};
--
2.7.4

2017-08-09 19:02:45

by Kees Cook

[permalink] [raw]
Subject: [PATCH v3 1/4] seccomp: Provide matching filter for introspection

Both the upcoming logging improvements and changes to RET_KILL will need
to know which filter a given seccomp return value originated from. In
order to delay logic processing of result until after the seccomp loop,
this adds a single pointer assignment on matches. This will allow both
log and RET_KILL logic to work off the filter rather than doing more
expensive tests inside the time-critical run_filters loop.

Running tight cycles of getpid() with filters attached shows no measurable
difference in speed.

Suggested-by: Tyler Hicks <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: Tyler Hicks <[email protected]>
---
kernel/seccomp.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 98b59b5db90b..1f3347fc2605 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -171,10 +171,14 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
/**
* seccomp_run_filters - evaluates all seccomp filters against @sd
* @sd: optional seccomp data to be passed to filters
+ * @match: stores struct seccomp_filter that resulted in the return value,
+ * unless filter returned SECCOMP_RET_ALLOW, in which case it will
+ * be unchanged.
*
* Returns valid seccomp BPF response codes.
*/
-static u32 seccomp_run_filters(const struct seccomp_data *sd)
+static u32 seccomp_run_filters(const struct seccomp_data *sd,
+ struct seccomp_filter **match)
{
struct seccomp_data sd_local;
u32 ret = SECCOMP_RET_ALLOW;
@@ -198,8 +202,10 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd)
for (; f; f = f->prev) {
u32 cur_ret = BPF_PROG_RUN(f->prog, sd);

- if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
+ if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION)) {
ret = cur_ret;
+ *match = f;
+ }
}
return ret;
}
@@ -566,6 +572,7 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
const bool recheck_after_trace)
{
u32 filter_ret, action;
+ struct seccomp_filter *match = NULL;
int data;

/*
@@ -574,7 +581,7 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
*/
rmb();

- filter_ret = seccomp_run_filters(sd);
+ filter_ret = seccomp_run_filters(sd, &match);
data = filter_ret & SECCOMP_RET_DATA;
action = filter_ret & SECCOMP_RET_ACTION;

@@ -638,6 +645,11 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
return 0;

case SECCOMP_RET_ALLOW:
+ /*
+ * Note that the "match" filter will always be NULL for
+ * this action since SECCOMP_RET_ALLOW is the starting
+ * state in seccomp_run_filters().
+ */
return 0;

case SECCOMP_RET_KILL:
--
2.7.4

2017-08-09 20:22:35

by Tycho Andersen

[permalink] [raw]
Subject: Re: [PATCH v3 0/4] seccomp: Add SECCOMP_FILTER_FLAG_KILL_PROCESS

On Wed, Aug 09, 2017 at 12:01:53PM -0700, Kees Cook wrote:
> This series is the result of Fabricio and I going around a few times
> on possible solutions for finding a way to enhance RET_KILL to kill
> the process group. There's a lot of ways this could be done, but I
> wanted something that felt cleanest. As it happens, Tyler's recent
> patch series for logging improvement also needs to know a litte bit
> more during filter runs, and the solution for both is to pass back
> the matched filter. This lets us examine it here for RET_KILL and
> in the future for logging changes.
>
> The filter passing is patch 1, the new flag for RET_KILL is patch 2.
> Some test refactoring is in patch 3 for the RET_DATA ordering, and
> patch 4 is the test for the new RET_KILL flag.
>
> One thing missing is that CRIU will likely need to be updated, since
> saving/restoring seccomp filter _rules_ will not include the filter
> _flags_ for a process. This can be addressed separately.

Thanks for the heads up, I suppose PTRACE_SECCOMP_GET_FLAGS similar to
how PTRACE_SECCOMP_GET_FILTER works will be fine for this. One
question is: would we then also need to keep track of the TSYNC flag?
I don't think CRIU needs this to be correct, and we can grab the
KILL_PROCESS flag from filter->kill_process, so perhaps it's moot.

Anyway, happy to do this and the userspace part when this lands.

Cheers,

Tycho

> Please take a look!
>
> Thanks,
>
> -Kees
>
> v3:
> - adjust seccomp_run_filters() to avoid later filters from masking
> kill-process RET_KILL actions (drewry)
> - add test for masked RET_KILL.
>
> v2:
> - moved kill_process bool into struct padding gap (tyhicks)
> - improved comments/docs in various places for clarify (tyhicks)
> - use ASSERT_TRUE() for WIFEXITED and WIFSIGNALLED (tyhicks)
> - adding Reviewed-bys from tyhicks
>

2017-08-09 20:33:38

by Tyler Hicks

[permalink] [raw]
Subject: Re: [PATCH v3 0/4] seccomp: Add SECCOMP_FILTER_FLAG_KILL_PROCESS

Hey Tycho!

On 08/09/2017 03:22 PM, Tycho Andersen wrote:
> On Wed, Aug 09, 2017 at 12:01:53PM -0700, Kees Cook wrote:
>> This series is the result of Fabricio and I going around a few times
>> on possible solutions for finding a way to enhance RET_KILL to kill
>> the process group. There's a lot of ways this could be done, but I
>> wanted something that felt cleanest. As it happens, Tyler's recent
>> patch series for logging improvement also needs to know a litte bit
>> more during filter runs, and the solution for both is to pass back
>> the matched filter. This lets us examine it here for RET_KILL and
>> in the future for logging changes.
>>
>> The filter passing is patch 1, the new flag for RET_KILL is patch 2.
>> Some test refactoring is in patch 3 for the RET_DATA ordering, and
>> patch 4 is the test for the new RET_KILL flag.
>>
>> One thing missing is that CRIU will likely need to be updated, since
>> saving/restoring seccomp filter _rules_ will not include the filter
>> _flags_ for a process. This can be addressed separately.
>
> Thanks for the heads up, I suppose PTRACE_SECCOMP_GET_FLAGS similar to
> how PTRACE_SECCOMP_GET_FILTER works will be fine for this. One
> question is: would we then also need to keep track of the TSYNC flag?
> I don't think CRIU needs this to be correct, and we can grab the
> KILL_PROCESS flag from filter->kill_process, so perhaps it's moot.

Note that the logging changes that I'm working on also introduce a new
filter flag (as Kees mentioned above). My filter flag is a lot like the
KILL_PROCESS filter flag in that it is stored as a member of the
seccomp_filter struct.

I would think that you'd want to be able to do something like
PTRACE_SECCOMP_GET_FILTER to (hopefully) future proof CRIU against all
newly added filter flags.

I'll also mention that I have a libseccomp branch in the making that
allows libseccomp to query the kernel to see if it supports a given
filter flag. I haven't done a PR on that yet because I'm waiting to see
how my related kernel patches play out (they seem to be getting close to
being acceptable).

Tyler

>
> Anyway, happy to do this and the userspace part when this lands.
>
> Cheers,
>
> Tycho
>
>> Please take a look!
>>
>> Thanks,
>>
>> -Kees
>>
>> v3:
>> - adjust seccomp_run_filters() to avoid later filters from masking
>> kill-process RET_KILL actions (drewry)
>> - add test for masked RET_KILL.
>>
>> v2:
>> - moved kill_process bool into struct padding gap (tyhicks)
>> - improved comments/docs in various places for clarify (tyhicks)
>> - use ASSERT_TRUE() for WIFEXITED and WIFSIGNALLED (tyhicks)
>> - adding Reviewed-bys from tyhicks
>>



Attachments:
signature.asc (801.00 B)
OpenPGP digital signature

2017-08-09 20:49:01

by Tycho Andersen

[permalink] [raw]
Subject: Re: [PATCH v3 0/4] seccomp: Add SECCOMP_FILTER_FLAG_KILL_PROCESS

Hey Tyler :)

On Wed, Aug 09, 2017 at 03:33:28PM -0500, Tyler Hicks wrote:
> Hey Tycho!
>
> On 08/09/2017 03:22 PM, Tycho Andersen wrote:
> > On Wed, Aug 09, 2017 at 12:01:53PM -0700, Kees Cook wrote:
> >> This series is the result of Fabricio and I going around a few times
> >> on possible solutions for finding a way to enhance RET_KILL to kill
> >> the process group. There's a lot of ways this could be done, but I
> >> wanted something that felt cleanest. As it happens, Tyler's recent
> >> patch series for logging improvement also needs to know a litte bit
> >> more during filter runs, and the solution for both is to pass back
> >> the matched filter. This lets us examine it here for RET_KILL and
> >> in the future for logging changes.
> >>
> >> The filter passing is patch 1, the new flag for RET_KILL is patch 2.
> >> Some test refactoring is in patch 3 for the RET_DATA ordering, and
> >> patch 4 is the test for the new RET_KILL flag.
> >>
> >> One thing missing is that CRIU will likely need to be updated, since
> >> saving/restoring seccomp filter _rules_ will not include the filter
> >> _flags_ for a process. This can be addressed separately.
> >
> > Thanks for the heads up, I suppose PTRACE_SECCOMP_GET_FLAGS similar to
> > how PTRACE_SECCOMP_GET_FILTER works will be fine for this. One
> > question is: would we then also need to keep track of the TSYNC flag?
> > I don't think CRIU needs this to be correct, and we can grab the
> > KILL_PROCESS flag from filter->kill_process, so perhaps it's moot.
>
> Note that the logging changes that I'm working on also introduce a new
> filter flag (as Kees mentioned above). My filter flag is a lot like the
> KILL_PROCESS filter flag in that it is stored as a member of the
> seccomp_filter struct.
>
> I would think that you'd want to be able to do something like
> PTRACE_SECCOMP_GET_FILTER to (hopefully) future proof CRIU against all
> newly added filter flags.

Yep, the theoretical GET_FLAGS above would handle this, I think. What
I was wondering about is for TSYNC (or any future flags) which aren't
tracked in the struct seccomp_filter; would the existence of GET_FLAGS
mean we need to remember such flags as well somewhere? Not necessary
for CRIU's correctness right now at least, but...

Cheers,

Tycho

2017-08-09 20:54:24

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v3 0/4] seccomp: Add SECCOMP_FILTER_FLAG_KILL_PROCESS

On Wed, Aug 9, 2017 at 1:33 PM, Tyler Hicks <[email protected]> wrote:
> Hey Tycho!
>
> On 08/09/2017 03:22 PM, Tycho Andersen wrote:
>> On Wed, Aug 09, 2017 at 12:01:53PM -0700, Kees Cook wrote:
>>> This series is the result of Fabricio and I going around a few times
>>> on possible solutions for finding a way to enhance RET_KILL to kill
>>> the process group. There's a lot of ways this could be done, but I
>>> wanted something that felt cleanest. As it happens, Tyler's recent
>>> patch series for logging improvement also needs to know a litte bit
>>> more during filter runs, and the solution for both is to pass back
>>> the matched filter. This lets us examine it here for RET_KILL and
>>> in the future for logging changes.
>>>
>>> The filter passing is patch 1, the new flag for RET_KILL is patch 2.
>>> Some test refactoring is in patch 3 for the RET_DATA ordering, and
>>> patch 4 is the test for the new RET_KILL flag.
>>>
>>> One thing missing is that CRIU will likely need to be updated, since
>>> saving/restoring seccomp filter _rules_ will not include the filter
>>> _flags_ for a process. This can be addressed separately.
>>
>> Thanks for the heads up, I suppose PTRACE_SECCOMP_GET_FLAGS similar to
>> how PTRACE_SECCOMP_GET_FILTER works will be fine for this. One
>> question is: would we then also need to keep track of the TSYNC flag?
>> I don't think CRIU needs this to be correct, and we can grab the
>> KILL_PROCESS flag from filter->kill_process, so perhaps it's moot.

It does not need to track TSYNC (since the results of that flag are
represented in the filter tree itself).

> Note that the logging changes that I'm working on also introduce a new
> filter flag (as Kees mentioned above). My filter flag is a lot like the
> KILL_PROCESS filter flag in that it is stored as a member of the
> seccomp_filter struct.
>
> I would think that you'd want to be able to do something like
> PTRACE_SECCOMP_GET_FILTER to (hopefully) future proof CRIU against all
> newly added filter flags.

I didn't see an obvious way to extend the existing
PTRACE_SECCOMP_GET_FILTER to also return flags, so I think either a
new function for just flags or a new versioned function for rules and
flags will be needed.

>> Anyway, happy to do this and the userspace part when this lands.

Okay, great! Thanks!

-Kees

--
Kees Cook
Pixel Security

2017-08-11 16:59:14

by Tyler Hicks

[permalink] [raw]
Subject: Re: [PATCH v3 2/4] seccomp: Add SECCOMP_FILTER_FLAG_KILL_PROCESS

On 08/09/2017 02:01 PM, Kees Cook wrote:
> Right now, SECCOMP_RET_KILL kills the current thread. There have been
> a few requests for RET_KILL to kill the entire process (the thread
> group), but since seccomp's u32 return values are ABI, and ordered by
> lowest value, with RET_KILL as 0, there isn't a trivial way to provide
> an even smaller value that would mean the more restrictive action of
> killing the thread group.
>
> Changing the definition of SECCOMP_RET_KILL to mean "kill process" and
> then adding a new higher definition of SECCOMP_RET_KILL_THREAD isn't
> possible since there are already userspace programs depending on the
> kill-thread behavior.
>
> Instead, create a simulated SECCOMP_RET_KILL_PROCESS action which
> has the semantics of such a return had it been possible to create
> it naturally. This is done by adding a filter flag that indicates
> that a RET_KILL from this filter must kill the process rather
> than the thread. This can be set (and not cleared) via the new
> SECCOMP_FILTER_FLAG_KILL_PROCESS flag.

I think the name of the pkill utility has made it impossible for my
brain to type FILTER_FLAG_KILL_PROCESS. My fingers want to type
FILTER_FLAG_PROCESS_KILL every single time. :)

>
> Pros:
> - the logic for the filter action is contained in the filter.
> - userspace can detect support for the feature since earlier kernels
> will reject the new flag.
> - can be distinguished from "normal" RET_KILL in the same filter
> chain (in case a program wants to choose to kill thread or process).
> Cons:
> - depends on adding an assignment to the seccomp_run_filters() loop
> (previous patch).
> - adds logic to the seccomp_run_filters() loop (though it is a single
> unlikely test for zero, so the cycle count change isn't measurable).
>
> Alternatives to this approach with pros/cons:
>
> - Change userspace API to use an s32 for BPF return value, and use a
> negative value to indicate SECCOMP_RET_KILL_PROCESS.
> Pros:
> - no change needed to seccomp_run_filters() loop.
> - the logic for the filter action is contained in the filter.
> - can be distinguished from "normal" RET_KILL in the same filter
> chain (in case a program wants to choose to kill thread or process).
> Cons:
> - there isn't a trivial way for userspace to detect if the kernel
> supports the feature (earlier kernels will silently ignore the
> new value, only kill the thread).
> - need to teach libseccomp about new details, extensive updates
> to documentation, and hope there is not confusion about signedness
> in existing userspace.
>
> - Use a new test during seccomp_run_filters() that treats the RET_DATA
> mask of a RET_KILL action as special. If a new bit is set in the data,
> then treat the return value as -1 (lower than 0).
> Pros:
> - the logic for the filter action is contained in the filter.
> - can be distinguished from "normal" RET_KILL in the same filter
> chain (in case a program wants to choose to kill thread or process).
> Cons:
> - adds more than a single unlikely test to time-sensitive
> seccomp_run_filters() loop.
> - there isn't a trivial way for userspace to detect if the kernel
> supports the feature (earlier kernels will silently ignore the
> RET_DATA and only kill the thread).
> - Violates the "most recent RET_DATA" return value used for all
> other actions (since processing must continue to find the first
> RET_KILL with the flag set).
> - May create problems for existing programs that were using RET_DATA
> to distinguish between various SECCOMP_RET_KILL locations.
>
> - Have SECCOMP_FILTER_FLAG_KILL_PROCESS attach to the seccomp struct
> rather than the filter.
> Pros:
> - no change needed to seccomp_run_filters() loop.
> - userspace can detect support for the feature since earlier kernels
> will reject the new flag.
> Cons:
> - does not provide a way for a set of filters to distinguish
> between wanting to kill a thread or a process.
> - the logic for the filter action isn't contained in the filter,
> which may lead to synchronization bugs (e.g. vs TSYNC, vs CRIU, etc).
>
> Signed-off-by: Kees Cook <[email protected]>
> ---
> include/linux/seccomp.h | 3 ++-
> include/uapi/linux/seccomp.h | 3 ++-
> kernel/seccomp.c | 54 ++++++++++++++++++++++++++++++++++++++++----
> 3 files changed, 54 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> index ecc296c137cd..59d001ba655c 100644
> --- a/include/linux/seccomp.h
> +++ b/include/linux/seccomp.h
> @@ -3,7 +3,8 @@
>
> #include <uapi/linux/seccomp.h>
>
> -#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC)
> +#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC | \
> + SECCOMP_FILTER_FLAG_KILL_PROCESS)
>
> #ifdef CONFIG_SECCOMP
>
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> index 0f238a43ff1e..4b75d8c297b6 100644
> --- a/include/uapi/linux/seccomp.h
> +++ b/include/uapi/linux/seccomp.h
> @@ -15,7 +15,8 @@
> #define SECCOMP_SET_MODE_FILTER 1
>
> /* Valid flags for SECCOMP_SET_MODE_FILTER */
> -#define SECCOMP_FILTER_FLAG_TSYNC 1
> +#define SECCOMP_FILTER_FLAG_TSYNC 1
> +#define SECCOMP_FILTER_FLAG_KILL_PROCESS 2
>
> /*
> * All BPF programs must return a 32-bit value.
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 1f3347fc2605..6a196d1495e6 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -44,6 +44,7 @@
> * is only needed for handling filters shared across tasks.
> * @prev: points to a previously installed, or inherited, filter
> * @prog: the BPF program to evaluate
> + * @kill_process: if true, RET_KILL will kill process rather than thread.
> *
> * seccomp_filter objects are organized in a tree linked via the @prev
> * pointer. For any task, it appears to be a singly-linked list starting
> @@ -57,6 +58,7 @@
> */
> struct seccomp_filter {
> refcount_t usage;
> + bool kill_process;
> struct seccomp_filter *prev;
> struct bpf_prog *prog;
> };
> @@ -201,8 +203,25 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
> */
> for (; f; f = f->prev) {
> u32 cur_ret = BPF_PROG_RUN(f->prog, sd);
> + u32 action = cur_ret & SECCOMP_RET_ACTION;
>
> - if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION)) {
> + /*
> + * In order to distinguish between SECCOMP_RET_KILL and
> + * "higher priority" synthetic SECCOMP_RET_KILL_PROCESS
> + * identified by the kill_process filter flag, treat any
> + * case as immediately stopping filter processing. No
> + * higher priority action can exist, and we can't stop
> + * on the first RET_KILL (which may not have set
> + * f->kill_process) when a RET_KILL further up the filter
> + * list may have f->kill_process set which would go
> + * unnoticed.
> + */
> + if (unlikely(action == SECCOMP_RET_KILL && f->kill_process)) {
> + *match = f;
> + return cur_ret;
> + }

Why not let the application enforce this via the seccomp filter? In
other words, the first filter loaded with
SECCOMP_FILTER_FLAG_KILL_PROCESS set could have a rule in the filter
that only allows seccomp(2) to be called in the future with the
SECCOMP_FILTER_FLAG_KILL_PROCESS flag set.

I understand the reasoning for wanting to enforce this automatically at
the kernel level but I think mixing return action priorities with filter
flags could be confusing and inflexible in the long run since filters
are inherited and your parent's desire to kill the entire thread group
may not mix with your desire to only kill a single thread.

Another way that this doesn't mix perfectly with the existing design is
when the action is unknown. In that situation, we treat it as RET_KILL.
However, this patch hard-codes the comparison with RET_KILL so we get
into this situation where an unknown action is treated as RET_KILL
except when the filter has the FILTER_FLAG_KILL_PROCESS flag set and
then this short-circuit doesn't kick in. It is a corner case, for sure,
but worth mentioning.

Tyler

> +
> + if (action < (ret & SECCOMP_RET_ACTION)) {
> ret = cur_ret;
> *match = f;
> }
> @@ -450,6 +469,10 @@ static long seccomp_attach_filter(unsigned int flags,
> return ret;
> }
>
> + /* Set process-killing flag, if present. */
> + if (flags & SECCOMP_FILTER_FLAG_KILL_PROCESS)
> + filter->kill_process = true;
> +
> /*
> * If there is an existing filter, make it the prev and don't drop its
> * task reference.
> @@ -574,6 +597,7 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
> u32 filter_ret, action;
> struct seccomp_filter *match = NULL;
> int data;
> + bool kill_process;
>
> /*
> * Make sure that any changes to mode from another thread have
> @@ -655,8 +679,26 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
> case SECCOMP_RET_KILL:
> default:
> audit_seccomp(this_syscall, SIGSYS, action);
> - /* Dump core only if this is the last remaining thread. */
> - if (get_nr_threads(current) == 1) {
> +
> + /*
> + * The only way match can be NULL here is if something
> + * went very wrong in seccomp_run_filters() (e.g. a NULL
> + * filter list in struct seccomp) and the return action
> + * falls back to failing closed. In this case, take the
> + * strongest possible action.
> + *
> + * If we get here with match->kill_process set, we need
> + * to kill the entire thread group. Otherwise, kill only
> + * the offending thread.
> + */
> + kill_process = (!match || match->kill_process);
> +
> + /*
> + * Dumping core kills the entire thread group, so only
> + * coredump if that has been requested or this was already
> + * the only thread running.
> + */
> + if (kill_process || get_nr_threads(current) == 1) {
> siginfo_t info;
>
> /* Show the original registers in the dump. */
> @@ -665,7 +707,11 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
> seccomp_init_siginfo(&info, this_syscall, data);
> do_coredump(&info);
> }
> - do_exit(SIGSYS);
> +
> + if (kill_process)
> + do_group_exit(SIGSYS);
> + else
> + do_exit(SIGSYS);
> }
>
> unreachable();
>



Attachments:
signature.asc (801.00 B)
OpenPGP digital signature

2017-08-11 18:32:57

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v3 2/4] seccomp: Add SECCOMP_FILTER_FLAG_KILL_PROCESS

On Fri, Aug 11, 2017 at 9:58 AM, Tyler Hicks <[email protected]> wrote:
>> @@ -201,8 +203,25 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
>> */
>> for (; f; f = f->prev) {
>> u32 cur_ret = BPF_PROG_RUN(f->prog, sd);
>> + u32 action = cur_ret & SECCOMP_RET_ACTION;
>>
>> - if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION)) {
>> + /*
>> + * In order to distinguish between SECCOMP_RET_KILL and
>> + * "higher priority" synthetic SECCOMP_RET_KILL_PROCESS
>> + * identified by the kill_process filter flag, treat any
>> + * case as immediately stopping filter processing. No
>> + * higher priority action can exist, and we can't stop
>> + * on the first RET_KILL (which may not have set
>> + * f->kill_process) when a RET_KILL further up the filter
>> + * list may have f->kill_process set which would go
>> + * unnoticed.
>> + */
>> + if (unlikely(action == SECCOMP_RET_KILL && f->kill_process)) {
>> + *match = f;
>> + return cur_ret;
>> + }
>
> Why not let the application enforce this via the seccomp filter? In
> other words, the first filter loaded with
> SECCOMP_FILTER_FLAG_KILL_PROCESS set could have a rule in the filter
> that only allows seccomp(2) to be called in the future with the
> SECCOMP_FILTER_FLAG_KILL_PROCESS flag set.

I've been using the guide of "if SECCOMP_RET_KILL_PROCESS _did_ exist,
how would its semantics differ?"

In that magic world, it wouldn't be possible to create a seccomp
filter to screen out SECCOMP_RET_KILL_PROCESS. Also, being able to
distinguish between the two states (see below).

> I understand the reasoning for wanting to enforce this automatically at
> the kernel level but I think mixing return action priorities with filter
> flags could be confusing and inflexible in the long run since filters
> are inherited and your parent's desire to kill the entire thread group
> may not mix with your desire to only kill a single thread.

Blocking the use of SECCOMP_FILTER_FLAG_KILL_PROCESS just means a
child can never perform a KILL_PROCESS, which doesn't really make much
sense, IMO.

The trouble may be that KILL_PROCESS would be used sparingly by either
parent or child, in the sense that maybe "unknown syscall gets
KILL_PROCESS, but 'connect' should just do KILL_THREAD". Or the
reverse. There isn't a way to mix combinations of return values across
filter chains without treating it exactly like a "real"
SECCOMP_RET_KILL_PROCESS would have worked. That means I have to treat
it as "higher priority" in the seccomp_run_filters() loop (which is
luckily very very cheap, as the "unlikely(register == zero)" test is
correct branch-predicted for the non-zero case, and the test is cheap
(we've already done the assignment which we need for the "<" test
below it, so it's a single pipelined instruction for the zero flag).

I don't expect to adjust KILL_THREAD vs KILL_PROCESS ever again, so
I'm not too worried about inflexibility.

What I don't get in this version is a _single_ filter being able to
distinguish between KILL_THREAD and KILL_PROCESS. Userspace is forced
to split up a rule if it wants to have different results. Also, parent
_can_ stop a child from escalating their KILL_THREADs to KILL_PROCESS
via the filter you mentioned, which is weird.

I spent some time trying to use the high bit in the return, to make
this signed, and in the end it was much much more ugly, and I didn't
want to deal with the fallout to userspace which may suddenly have to
deal with unexpected bits in the BPF return:

basically s/u32/s32/ in __seccomp_filter() and seccomp_run_filters().
add #define SECCOMP_RET_ACTION_FULL 0xffff0000
add #define SECCOMP_RET_KILL_PROC 0x80000000

Then use SECCOMP_RET_ACTION_FULL to mask everything (after forcing a u32 cast).

But the more I stare at this, the more I just want a value that that
works correctly without totally crazy flags and things.

> Another way that this doesn't mix perfectly with the existing design is
> when the action is unknown. In that situation, we treat it as RET_KILL.
> However, this patch hard-codes the comparison with RET_KILL so we get
> into this situation where an unknown action is treated as RET_KILL
> except when the filter has the FILTER_FLAG_KILL_PROCESS flag set and
> then this short-circuit doesn't kick in. It is a corner case, for sure,
> but worth mentioning.

Hm, yeah, good point. This leaves unknown returns as KILL_THREAD, not
KILL_PROCESS.

Let me spent some more time looking at the high bit version of this...

-Kees

--
Kees Cook
Pixel Security