Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932387Ab2B1RSI (ORCPT ); Tue, 28 Feb 2012 12:18:08 -0500 Received: from mail-ey0-f174.google.com ([209.85.215.174]:53469 "EHLO mail-ey0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756767Ab2B1RSE convert rfc822-to-8bit (ORCPT ); Tue, 28 Feb 2012 12:18:04 -0500 Authentication-Results: mr.google.com; spf=pass (google.com: domain of wad@chromium.org designates 10.112.86.67 as permitted sender) smtp.mail=wad@chromium.org; dkim=pass header.i=wad@chromium.org MIME-Version: 1.0 In-Reply-To: References: <1330140111-17201-1-git-send-email-wad@chromium.org> <1330140111-17201-6-git-send-email-wad@chromium.org> Date: Tue, 28 Feb 2012 11:17:59 -0600 Message-ID: Subject: Re: [PATCH v11 06/12] seccomp: add system call filtering using BPF From: Will Drewry To: Indan Zupancic Cc: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, linux-doc@vger.kernel.org, kernel-hardening@lists.openwall.com, netdev@vger.kernel.org, x86@kernel.org, arnd@arndb.de, davem@davemloft.net, hpa@zytor.com, mingo@redhat.com, oleg@redhat.com, peterz@infradead.org, rdunlap@xenotime.net, mcgrathr@chromium.org, tglx@linutronix.de, luto@mit.edu, eparis@redhat.com, serge.hallyn@canonical.com, djm@mindrot.org, scarybeasts@gmail.com, pmoore@redhat.com, akpm@linux-foundation.org, corbet@lwn.net, eric.dumazet@gmail.com, markus@chromium.org, coreyb@linux.vnet.ibm.com, keescook@chromium.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8BIT Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5589 Lines: 158 On Tue, Feb 28, 2012 at 12:51 AM, Indan Zupancic wrote: > Hello, > > On Sat, February 25, 2012 04:21, Will Drewry wrote: >> @@ -169,6 +170,7 @@ void free_task(struct task_struct *tsk) >> ? ? ? free_thread_info(tsk->stack); >> ? ? ? rt_mutex_debug_task_free(tsk); >> ? ? ? ftrace_graph_exit_task(tsk); >> + ? ? put_seccomp_filter(tsk->seccomp.filter); >> ? ? ? free_task_struct(tsk); >> } >> EXPORT_SYMBOL(free_task); >> @@ -1113,6 +1115,7 @@ static struct task_struct *copy_process(unsigned long clone_flags, >> ? ? ? ? ? ? ? goto fork_out; >> >> ? ? ? ftrace_graph_init_task(p); >> + ? ? copy_seccomp(&p->seccomp, ¤t->seccomp); > > I agree it's more symmetrical when get_seccomp_filter() is used here > directly instead of copy_seccomp(). That should put Kees at ease. Makes sense to me too. Once I'd dropped the other bits, it seemed silly to keep copy_seccomp. >> +static void seccomp_filter_log_failure(int syscall) >> +{ >> + ? ? int compat = 0; >> +#ifdef CONFIG_COMPAT >> + ? ? compat = is_compat_task(); >> +#endif >> + ? ? pr_info("%s[%d]: %ssystem call %d blocked at 0x%lx\n", >> + ? ? ? ? ? ? current->comm, task_pid_nr(current), >> + ? ? ? ? ? ? (compat ? "compat " : ""), >> + ? ? ? ? ? ? syscall, KSTK_EIP(current)); >> +} > > This should be at least rate limited, but could be dropped altogether, > as it's mostly useful for debugging filters. There is no kernel message > when a process is killed because it exceeds a ulimit either. The death > by SIGSYS is hopefully clear enough for users, and filter writers can > return different errno values when debugging where it goes wrong. I'll pull Kees's patch into the series this next go-round. >> +/** >> + * bpf_load: checks and returns a pointer to the requested offset >> + * @nr: int syscall passed as a void * to bpf_run_filter >> + * @off: offset into struct seccomp_data to load from > > Must be aligned, that's worth mentioning. True - thanks! >> + * @size: number of bytes to load (must be 4) >> + * @buffer: temporary storage supplied by bpf_run_filter (4 bytes) > > '@buf'. Oops - fixed. >> +/** >> + * copy_seccomp: manages inheritance on fork >> + * @child: forkee's seccomp >> + * @parent: forker's seccomp >> + * >> + * Ensures that @child inherits filters if in use. >> + */ >> +void copy_seccomp(struct seccomp *child, const struct seccomp *parent) >> +{ >> + ? ? /* Other fields are handled by dup_task_struct. */ >> + ? ? child->filter = get_seccomp_filter(parent->filter); >> +} >> +#endif ? ? ? /* CONFIG_SECCOMP_FILTER */ > > Yeah, just get rid of this and use get_seccomp_filter directly, and make > it return void instead of a pointer. It'll be updated. >> >> /* >> ?* Secure computing mode 1 allows only read/write/exit/sigreturn. >> @@ -34,10 +293,11 @@ static int mode1_syscalls_32[] = { >> void __secure_computing(int this_syscall) >> { >> ? ? ? int mode = current->seccomp.mode; >> - ? ? int * syscall; >> + ? ? int exit_code = SIGKILL; >> + ? ? int *syscall; >> >> ? ? ? switch (mode) { >> - ? ? case 1: >> + ? ? case SECCOMP_MODE_STRICT: >> ? ? ? ? ? ? ? syscall = mode1_syscalls; >> #ifdef CONFIG_COMPAT >> ? ? ? ? ? ? ? if (is_compat_task()) >> @@ -48,6 +308,14 @@ void __secure_computing(int this_syscall) >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? return; >> ? ? ? ? ? ? ? } while (*++syscall); >> ? ? ? ? ? ? ? break; >> +#ifdef CONFIG_SECCOMP_FILTER >> + ? ? case SECCOMP_MODE_FILTER: >> + ? ? ? ? ? ? if (seccomp_run_filters(this_syscall) == SECCOMP_RET_ALLOW) >> + ? ? ? ? ? ? ? ? ? ? return; >> + ? ? ? ? ? ? seccomp_filter_log_failure(this_syscall); >> + ? ? ? ? ? ? exit_code = SIGSYS; > > Wouldn't it make more sense to always kill with SIGSYS, also for mode 1? > I suppose it's too late for that now. It would but I don't want to go and change existing ABI. >> +/** >> + * prctl_set_seccomp: configures current->seccomp.mode >> + * @seccomp_mode: requested mode to use >> + * @filter: optional struct sock_fprog for use with SECCOMP_MODE_FILTER >> + * >> + * This function may be called repeatedly with a @seccomp_mode of >> + * SECCOMP_MODE_FILTER to install additional filters. ?Every filter >> + * successfully installed will be evaluated (in reverse order) for each system >> + * call the task makes. >> + * >> + * It is not possible to transition change the current->seccomp.mode after >> + * one has been set on a task. > > That reads awkwardly, do you mean the mode can't be changed once it's set? Yup - I will fix that. It doesn't even make sense :) > --- > > Reviewed-by: Indan Zupancic Thanks! > All in all I'm quite happy with how the code is now, at least this bit. > I'm still unsure about the networking patch and the 32-bit BPF on 64-bit > archs mismatch. Yeah - that is really the most challenging set of compromises, but I think they are the right ones. > As for the unlimited filter count, I'm not sure how to fix that. > The problem is that filters are inherited and shared. Limiting the > list length (tree depth) helps a bit, then the total memory usage > is limited by max number of processes. It may make sense to limit > the total instruction count instead of the number of filters. I'll take a stab at tree path size for starters and hopefully we can encourage consumers of the API to check for errors on return. Thanks for the continued review and feedback! will -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/