Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932101AbbFATvl (ORCPT ); Mon, 1 Jun 2015 15:51:41 -0400 Received: from mail-la0-f42.google.com ([209.85.215.42]:35177 "EHLO mail-la0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752141AbbFATve (ORCPT ); Mon, 1 Jun 2015 15:51:34 -0400 MIME-Version: 1.0 In-Reply-To: <20150601194707.GA2818@hopstrocity> References: <1433186918-9626-1-git-send-email-tycho.andersen@canonical.com> <20150601194707.GA2818@hopstrocity> From: Andy Lutomirski Date: Mon, 1 Jun 2015 12:51:12 -0700 Message-ID: Subject: Re: [PATCH] seccomp: add ptrace commands for suspend/resume To: Tycho Andersen Cc: "linux-kernel@vger.kernel.org" , Linux API , Kees Cook , Will Drewry , Roland McGrath , Oleg Nesterov , Pavel Emelyanov , "Serge E. Hallyn" Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 7954 Lines: 225 On Mon, Jun 1, 2015 at 12:47 PM, Tycho Andersen wrote: > On Mon, Jun 01, 2015 at 12:38:57PM -0700, Andy Lutomirski wrote: >> On Mon, Jun 1, 2015 at 12:28 PM, Tycho Andersen >> wrote: >> > This patch is the first step in enabling checkpoint/restore of processes >> > with seccomp enabled. >> > >> > One of the things CRIU does while dumping tasks is inject code into them >> > via ptrace to collect information that is only available to the process >> > itself. However, if we are in a seccomp mode where these processes are >> > prohibited from making these syscalls, then what CRIU does kills the task. >> > >> > This patch adds a new ptrace command, PTRACE_SUSPEND_SECCOMP that enables a >> > task from the init user namespace which has CAP_SYS_ADMIN to disable (and >> > re-enable) seccomp filters for another task so that they can be >> > successfully dumped (and restored). >> > >> > Signed-off-by: Tycho Andersen >> > CC: Kees Cook >> > CC: Andy Lutomirski >> > CC: Will Drewry >> > CC: Roland McGrath >> > CC: Oleg Nesterov >> > CC: Pavel Emelyanov >> > CC: Serge E. Hallyn >> > --- >> > include/linux/seccomp.h | 8 ++++++ >> > include/uapi/linux/ptrace.h | 1 + >> > kernel/ptrace.c | 10 ++++++++ >> > kernel/seccomp.c | 62 ++++++++++++++++++++++++++++++++++++++++++++- >> > 4 files changed, 80 insertions(+), 1 deletion(-) >> > >> > diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h >> > index a19ddac..7cc870f 100644 >> > --- a/include/linux/seccomp.h >> > +++ b/include/linux/seccomp.h >> > @@ -25,6 +25,9 @@ struct seccomp_filter; >> > struct seccomp { >> > int mode; >> > struct seccomp_filter *filter; >> > +#ifdef CONFIG_CHECKPOINT_RESTORE >> > + bool suspended; >> > +#endif >> > }; >> > >> > #ifdef CONFIG_HAVE_ARCH_SECCOMP_FILTER >> > @@ -53,6 +56,11 @@ static inline int seccomp_mode(struct seccomp *s) >> > return s->mode; >> > } >> > >> > +#ifdef CONFIG_CHECKPOINT_RESTORE >> > +extern int suspend_seccomp(struct task_struct *); >> > +extern int resume_seccomp(struct task_struct *); >> > +#endif >> > + >> > #else /* CONFIG_SECCOMP */ >> > >> > #include >> > diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h >> > index cf1019e..8ba4e4f 100644 >> > --- a/include/uapi/linux/ptrace.h >> > +++ b/include/uapi/linux/ptrace.h >> > @@ -17,6 +17,7 @@ >> > #define PTRACE_CONT 7 >> > #define PTRACE_KILL 8 >> > #define PTRACE_SINGLESTEP 9 >> > +#define PTRACE_SUSPEND_SECCOMP 10 >> > >> > #define PTRACE_ATTACH 16 >> > #define PTRACE_DETACH 17 >> > diff --git a/kernel/ptrace.c b/kernel/ptrace.c >> > index c8e0e05..a6b6527 100644 >> > --- a/kernel/ptrace.c >> > +++ b/kernel/ptrace.c >> > @@ -15,6 +15,7 @@ >> > #include >> > #include >> > #include >> > +#include >> > #include >> > #include >> > #include >> > @@ -1003,6 +1004,15 @@ int ptrace_request(struct task_struct *child, long request, >> > break; >> > } >> > #endif >> > + >> > +#if defined(CONFIG_SECCOMP) && defined(CONFIG_CHECKPOINT_RESTORE) >> > + case PTRACE_SUSPEND_SECCOMP: >> > + if (data) >> > + return suspend_seccomp(child); >> > + else >> > + return resume_seccomp(child); >> > +#endif >> > + >> > default: >> > break; >> > } >> > diff --git a/kernel/seccomp.c b/kernel/seccomp.c >> > index 980fd26..a358a58 100644 >> > --- a/kernel/seccomp.c >> > +++ b/kernel/seccomp.c >> > @@ -569,6 +569,7 @@ static int mode1_syscalls_32[] = { >> > static void __secure_computing_strict(int this_syscall) >> > { >> > int *syscall_whitelist = mode1_syscalls; >> > + >> > #ifdef CONFIG_COMPAT >> > if (is_compat_task()) >> > syscall_whitelist = mode1_syscalls_32; >> > @@ -590,6 +591,11 @@ void secure_computing_strict(int this_syscall) >> > { >> > int mode = current->seccomp.mode; >> > >> > +#ifdef CONFIG_CHECKPOINT_RESTORE >> > + if (current->seccomp.suspended) >> > + return; >> > +#endif >> >> IMO it's unfortunate that this has any runtime overhead at all. Can >> it be suspend be multiplexed into mode to avoid this? >> >> > >> > +#ifdef CONFIG_CHECKPOINT_RESTORE >> > + if (unlikely(current->seccomp.suspended)) >> > + return SECCOMP_PHASE1_OK; >> > +#endif >> > + >> >> Ditto. >> >> > @@ -769,7 +780,8 @@ static long seccomp_set_mode_strict(void) >> > goto out; >> > >> > #ifdef TIF_NOTSC >> > - disable_TSC(); >> > + if (!current->seccomp.suspended) >> > + disable_TSC(); >> > #endif >> >> If you get here with seccomp suspended, you have already utterly >> failed. Please don't try to pretend that you're going to do something >> sensible if this happens. >> >> CRIU needs to freeze the target task reliably before suspending >> seccomp to avoid massive security holes. > > Sorry, I should probably have mentioned this. It actually useful on > restore: the problem is that CRIU maps its restorer code into memory, > does all of the necessary work to restore (including restore the > seccomp stuff), and then tries to unmap that blob and gets killed. > > So, the criu restore procedure is to suspend seccomp, restore the > seccomp state, and then let criu resume the seccomp state just before > the final task resume. > >> > seccomp_assign_mode(current, seccomp_mode); >> > ret = 0; >> > @@ -901,3 +913,51 @@ long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter) >> > /* prctl interface doesn't have flags, so they are always zero. */ >> > return do_seccomp(op, 0, uargs); >> > } >> > + >> > +#ifdef CONFIG_CHECKPOINT_RESTORE >> > +int suspend_seccomp(struct task_struct *task) >> > +{ >> > + int ret = -EACCES; >> > + >> > + spin_lock_irq(&task->sighand->siglock); >> > + >> > + if (!capable(CAP_SYS_ADMIN)) >> > + goto out; >> > + >> > + task->seccomp.suspended = true; >> > + >> > +#ifdef TIF_NOTSC >> > + if (task->seccomp.mode == SECCOMP_MODE_STRICT) >> > + clear_tsk_thread_flag(task, TIF_NOTSC); >> > +#endif >> >> And what if the task is running? >> >> > + >> > + ret = 0; >> > +out: >> > + spin_unlock_irq(&task->sighand->siglock); >> > + >> > + return ret; >> > +} >> > + >> > +int resume_seccomp(struct task_struct *task) >> > +{ >> > + int ret = -EACCES; >> > + >> > + spin_lock_irq(&task->sighand->siglock); >> > + >> > + if (!capable(CAP_SYS_ADMIN)) >> > + goto out; >> > + >> > + task->seccomp.suspended = false; >> > + >> > +#ifdef TIF_NOTSC >> > + if (task->seccomp.mode == SECCOMP_MODE_STRICT) >> > + set_tsk_thread_flag(task, TIF_NOTSC); >> > +#endif >> >> Ditto. Or can the task not be running here? > > It is stopped since ptrace requires it to be stopped; I don't know if > that's enough to guarantee correctness, though. Is there some > additional barrier that is needed? Dunno. Does ptrace actually guarantee that for new operations? In any event, I'm not sure I see why you need to manipulate NOTSC like this at all. What goes wrong if you let the NOTSC take effect immediately? --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/