This patch is the first step in enabling checkpoint/restore of processes
with seccomp enabled.
One of the things CRIU does while dumping tasks is inject code into them
via ptrace to collect information that is only available to the process
itself. However, if we are in a seccomp mode where these processes are
prohibited from making these syscalls, then what CRIU does kills the task.
This patch adds a new ptrace option, PTRACE_O_SUSPEND_SECCOMP, that enables
a task from the init user namespace which has CAP_SYS_ADMIN and no seccomp
filters to disable (and re-enable) seccomp filters for another task so that
they can be successfully dumped (and restored). We restrict the set of
processes that can disable seccomp through ptrace because although today
ptrace can be used to bypass seccomp, there is some discussion of closing
this loophole in the future and we would like this patch to not depend on
that behavior and be future proofed for when it is removed.
Note that seccomp can be suspended before any filters are actually
installed; this behavior is useful on criu restore, so that we can suspend
seccomp, restore the filters, unmap our restore code from the restored
process' address space, and then resume the task by detaching and have the
filters resumed as well.
v2 changes:
* require that the tracer have no seccomp filters installed
* drop TIF_NOTSC manipulation from the patch
* change from ptrace command to a ptrace option and use this ptrace option
as the flag to check. This means that as soon as the tracer
detaches/dies, seccomp is re-enabled and as a corrollary that one can not
disable seccomp across PTRACE_ATTACHs.
Signed-off-by: Tycho Andersen <[email protected]>
CC: Kees Cook <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: Will Drewry <[email protected]>
CC: Roland McGrath <[email protected]>
CC: Oleg Nesterov <[email protected]>
CC: Pavel Emelyanov <[email protected]>
CC: Serge E. Hallyn <[email protected]>
---
include/linux/ptrace.h | 1 +
include/linux/seccomp.h | 4 ++++
include/uapi/linux/ptrace.h | 6 ++++--
kernel/ptrace.c | 6 ++++++
kernel/seccomp.c | 23 +++++++++++++++++++++++
5 files changed, 38 insertions(+), 2 deletions(-)
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 987a73a..061265f 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -34,6 +34,7 @@
#define PT_TRACE_SECCOMP PT_EVENT_FLAG(PTRACE_EVENT_SECCOMP)
#define PT_EXITKILL (PTRACE_O_EXITKILL << PT_OPT_FLAG_SHIFT)
+#define PT_SUSPEND_SECCOMP (PTRACE_O_SUSPEND_SECCOMP << PT_OPT_FLAG_SHIFT)
/* single stepping state bits (used on ARM and PA-RISC) */
#define PT_SINGLESTEP_BIT 31
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index a19ddac..ae3ec52 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -53,6 +53,10 @@ static inline int seccomp_mode(struct seccomp *s)
return s->mode;
}
+#ifdef CONFIG_CHECKPOINT_RESTORE
+extern bool may_suspend_seccomp(void);
+#endif
+
#else /* CONFIG_SECCOMP */
#include <linux/errno.h>
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index cf1019e..a7a6979 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -89,9 +89,11 @@ struct ptrace_peeksiginfo_args {
#define PTRACE_O_TRACESECCOMP (1 << PTRACE_EVENT_SECCOMP)
/* eventless options */
-#define PTRACE_O_EXITKILL (1 << 20)
+#define PTRACE_O_EXITKILL (1 << 20)
+#define PTRACE_O_SUSPEND_SECCOMP (1 << 21)
-#define PTRACE_O_MASK (0x000000ff | PTRACE_O_EXITKILL)
+#define PTRACE_O_MASK (\
+ 0x000000ff | PTRACE_O_EXITKILL | PTRACE_O_SUSPEND_SECCOMP)
#include <asm/ptrace.h>
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index c8e0e05..e3e68a2 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -15,6 +15,7 @@
#include <linux/highmem.h>
#include <linux/pagemap.h>
#include <linux/ptrace.h>
+#include <linux/seccomp.h>
#include <linux/security.h>
#include <linux/signal.h>
#include <linux/uio.h>
@@ -556,6 +557,11 @@ static int ptrace_setoptions(struct task_struct *child, unsigned long data)
if (data & ~(unsigned long)PTRACE_O_MASK)
return -EINVAL;
+#ifdef CONFIG_CHECKPOINT_RESTORE
+ if (data & PTRACE_O_SUSPEND_SECCOMP && !may_suspend_seccomp())
+ return -EPERM;
+#endif
+
/* Avoid intermediate state when all opts are cleared */
flags = child->ptrace;
flags &= ~(PTRACE_O_MASK << PT_OPT_FLAG_SHIFT);
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 980fd26..2a1bd35 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -590,6 +590,11 @@ void secure_computing_strict(int this_syscall)
{
int mode = current->seccomp.mode;
+#ifdef CONFIG_CHECKPOINT_RESTORE
+ if (unlikely(current->ptrace & PT_SUSPEND_SECCOMP))
+ return;
+#endif
+
if (mode == 0)
return;
else if (mode == SECCOMP_MODE_STRICT)
@@ -691,6 +696,11 @@ u32 seccomp_phase1(struct seccomp_data *sd)
int this_syscall = sd ? sd->nr :
syscall_get_nr(current, task_pt_regs(current));
+#ifdef CONFIG_CHECKPOINT_RESTORE
+ if (unlikely(current->ptrace & PT_SUSPEND_SECCOMP))
+ return SECCOMP_PHASE1_OK;
+#endif
+
switch (mode) {
case SECCOMP_MODE_STRICT:
__secure_computing_strict(this_syscall); /* may call do_exit */
@@ -901,3 +911,16 @@ long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
/* prctl interface doesn't have flags, so they are always zero. */
return do_seccomp(op, 0, uargs);
}
+
+#ifdef CONFIG_CHECKPOINT_RESTORE
+bool may_suspend_seccomp(void)
+{
+ if (!capable(CAP_SYS_ADMIN))
+ return false;
+
+ if (current->seccomp.mode != SECCOMP_MODE_DISABLED)
+ return false;
+
+ return true;
+}
+#endif /* CONFIG_CHECKPOINT_RESTORE */
--
2.1.4
On Wed, Jun 3, 2015 at 3:09 PM, Tycho Andersen
<[email protected]> wrote:
> This patch is the first step in enabling checkpoint/restore of processes
> with seccomp enabled.
>
> One of the things CRIU does while dumping tasks is inject code into them
> via ptrace to collect information that is only available to the process
> itself. However, if we are in a seccomp mode where these processes are
> prohibited from making these syscalls, then what CRIU does kills the task.
>
> This patch adds a new ptrace option, PTRACE_O_SUSPEND_SECCOMP, that enables
> a task from the init user namespace which has CAP_SYS_ADMIN and no seccomp
> filters to disable (and re-enable) seccomp filters for another task so that
> they can be successfully dumped (and restored). We restrict the set of
> processes that can disable seccomp through ptrace because although today
> ptrace can be used to bypass seccomp, there is some discussion of closing
> this loophole in the future and we would like this patch to not depend on
> that behavior and be future proofed for when it is removed.
>
> Note that seccomp can be suspended before any filters are actually
> installed; this behavior is useful on criu restore, so that we can suspend
> seccomp, restore the filters, unmap our restore code from the restored
> process' address space, and then resume the task by detaching and have the
> filters resumed as well.
>
> v2 changes:
>
> * require that the tracer have no seccomp filters installed
> * drop TIF_NOTSC manipulation from the patch
> * change from ptrace command to a ptrace option and use this ptrace option
> as the flag to check. This means that as soon as the tracer
> detaches/dies, seccomp is re-enabled and as a corrollary that one can not
> disable seccomp across PTRACE_ATTACHs.
This feature gives me the creeps, but I think it's okay. Could it be
further restricted so that the process doing the suspension is already
ptracing the target?
> Signed-off-by: Tycho Andersen <[email protected]>
> CC: Kees Cook <[email protected]>
> CC: Andy Lutomirski <[email protected]>
> CC: Will Drewry <[email protected]>
> CC: Roland McGrath <[email protected]>
> CC: Oleg Nesterov <[email protected]>
> CC: Pavel Emelyanov <[email protected]>
> CC: Serge E. Hallyn <[email protected]>
> ---
> include/linux/ptrace.h | 1 +
> include/linux/seccomp.h | 4 ++++
> include/uapi/linux/ptrace.h | 6 ++++--
> kernel/ptrace.c | 6 ++++++
> kernel/seccomp.c | 23 +++++++++++++++++++++++
> 5 files changed, 38 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
> index 987a73a..061265f 100644
> --- a/include/linux/ptrace.h
> +++ b/include/linux/ptrace.h
> @@ -34,6 +34,7 @@
> #define PT_TRACE_SECCOMP PT_EVENT_FLAG(PTRACE_EVENT_SECCOMP)
>
> #define PT_EXITKILL (PTRACE_O_EXITKILL << PT_OPT_FLAG_SHIFT)
> +#define PT_SUSPEND_SECCOMP (PTRACE_O_SUSPEND_SECCOMP << PT_OPT_FLAG_SHIFT)
>
> /* single stepping state bits (used on ARM and PA-RISC) */
> #define PT_SINGLESTEP_BIT 31
> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> index a19ddac..ae3ec52 100644
> --- a/include/linux/seccomp.h
> +++ b/include/linux/seccomp.h
> @@ -53,6 +53,10 @@ static inline int seccomp_mode(struct seccomp *s)
> return s->mode;
> }
>
> +#ifdef CONFIG_CHECKPOINT_RESTORE
> +extern bool may_suspend_seccomp(void);
> +#endif
#else
static inline bool may_suspend_seccomp(void) { return false; }
#endif
> +
> #else /* CONFIG_SECCOMP */
>
> #include <linux/errno.h>
> diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
> index cf1019e..a7a6979 100644
> --- a/include/uapi/linux/ptrace.h
> +++ b/include/uapi/linux/ptrace.h
> @@ -89,9 +89,11 @@ struct ptrace_peeksiginfo_args {
> #define PTRACE_O_TRACESECCOMP (1 << PTRACE_EVENT_SECCOMP)
>
> /* eventless options */
> -#define PTRACE_O_EXITKILL (1 << 20)
> +#define PTRACE_O_EXITKILL (1 << 20)
> +#define PTRACE_O_SUSPEND_SECCOMP (1 << 21)
>
> -#define PTRACE_O_MASK (0x000000ff | PTRACE_O_EXITKILL)
> +#define PTRACE_O_MASK (\
> + 0x000000ff | PTRACE_O_EXITKILL | PTRACE_O_SUSPEND_SECCOMP)
>
> #include <asm/ptrace.h>
>
> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index c8e0e05..e3e68a2 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -15,6 +15,7 @@
> #include <linux/highmem.h>
> #include <linux/pagemap.h>
> #include <linux/ptrace.h>
> +#include <linux/seccomp.h>
> #include <linux/security.h>
> #include <linux/signal.h>
> #include <linux/uio.h>
> @@ -556,6 +557,11 @@ static int ptrace_setoptions(struct task_struct *child, unsigned long data)
> if (data & ~(unsigned long)PTRACE_O_MASK)
> return -EINVAL;
>
> +#ifdef CONFIG_CHECKPOINT_RESTORE
> + if (data & PTRACE_O_SUSPEND_SECCOMP && !may_suspend_seccomp())
> + return -EPERM;
> +#endif
I'd like to avoid seeing any #ifdefs added to the .c files. Using a
static inline for may_suspend_seccomp() should cause this statement to
be eliminated by the compiler.
> +
> /* Avoid intermediate state when all opts are cleared */
> flags = child->ptrace;
> flags &= ~(PTRACE_O_MASK << PT_OPT_FLAG_SHIFT);
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 980fd26..2a1bd35 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -590,6 +590,11 @@ void secure_computing_strict(int this_syscall)
> {
> int mode = current->seccomp.mode;
>
> +#ifdef CONFIG_CHECKPOINT_RESTORE
> + if (unlikely(current->ptrace & PT_SUSPEND_SECCOMP))
> + return;
> +#endif
Could PT_SUSPEND_SECCOMP be defined to "0" with not
CONFIG_CHECKPOINT_RESTORE? Then this wouldn't need ifdefs, and should
be similarly eliminated by the compiler.
> +
> if (mode == 0)
> return;
> else if (mode == SECCOMP_MODE_STRICT)
> @@ -691,6 +696,11 @@ u32 seccomp_phase1(struct seccomp_data *sd)
> int this_syscall = sd ? sd->nr :
> syscall_get_nr(current, task_pt_regs(current));
>
> +#ifdef CONFIG_CHECKPOINT_RESTORE
> + if (unlikely(current->ptrace & PT_SUSPEND_SECCOMP))
> + return SECCOMP_PHASE1_OK;
> +#endif
> +
> switch (mode) {
> case SECCOMP_MODE_STRICT:
> __secure_computing_strict(this_syscall); /* may call do_exit */
> @@ -901,3 +911,16 @@ long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
> /* prctl interface doesn't have flags, so they are always zero. */
> return do_seccomp(op, 0, uargs);
> }
> +
> +#ifdef CONFIG_CHECKPOINT_RESTORE
> +bool may_suspend_seccomp(void)
> +{
> + if (!capable(CAP_SYS_ADMIN))
> + return false;
> +
> + if (current->seccomp.mode != SECCOMP_MODE_DISABLED)
> + return false;
> +
> + return true;
> +}
> +#endif /* CONFIG_CHECKPOINT_RESTORE */
> --
> 2.1.4
>
Thanks for working on this!
-Kees
--
Kees Cook
Chrome OS Security
On Thu, Jun 04, 2015 at 09:44:36AM -0700, Kees Cook wrote:
> On Wed, Jun 3, 2015 at 3:09 PM, Tycho Andersen
> <[email protected]> wrote:
> > This patch is the first step in enabling checkpoint/restore of processes
> > with seccomp enabled.
> >
> > One of the things CRIU does while dumping tasks is inject code into them
> > via ptrace to collect information that is only available to the process
> > itself. However, if we are in a seccomp mode where these processes are
> > prohibited from making these syscalls, then what CRIU does kills the task.
> >
> > This patch adds a new ptrace option, PTRACE_O_SUSPEND_SECCOMP, that enables
> > a task from the init user namespace which has CAP_SYS_ADMIN and no seccomp
> > filters to disable (and re-enable) seccomp filters for another task so that
> > they can be successfully dumped (and restored). We restrict the set of
> > processes that can disable seccomp through ptrace because although today
> > ptrace can be used to bypass seccomp, there is some discussion of closing
> > this loophole in the future and we would like this patch to not depend on
> > that behavior and be future proofed for when it is removed.
> >
> > Note that seccomp can be suspended before any filters are actually
> > installed; this behavior is useful on criu restore, so that we can suspend
> > seccomp, restore the filters, unmap our restore code from the restored
> > process' address space, and then resume the task by detaching and have the
> > filters resumed as well.
> >
> > v2 changes:
> >
> > * require that the tracer have no seccomp filters installed
> > * drop TIF_NOTSC manipulation from the patch
> > * change from ptrace command to a ptrace option and use this ptrace option
> > as the flag to check. This means that as soon as the tracer
> > detaches/dies, seccomp is re-enabled and as a corrollary that one can not
> > disable seccomp across PTRACE_ATTACHs.
>
> This feature gives me the creeps, but I think it's okay.
:D
> Could it be
> further restricted so that the process doing the suspension is already
> ptracing the target?
As far as I understand it you do have to PTRACE_{ATTACH,SEIZE} to the
target before setting options in general. Is that not what you mean
here?
The rest of the changes sound good, I'll make those and resend.
>
> Thanks for working on this!
Thanks for the review.
Tycho
On 06/04, Kees Cook wrote:
>
> On Wed, Jun 3, 2015 at 3:09 PM, Tycho Andersen
> > @@ -556,6 +557,11 @@ static int ptrace_setoptions(struct task_struct *child, unsigned long data)
> > if (data & ~(unsigned long)PTRACE_O_MASK)
> > return -EINVAL;
> >
> > +#ifdef CONFIG_CHECKPOINT_RESTORE
> > + if (data & PTRACE_O_SUSPEND_SECCOMP && !may_suspend_seccomp())
> > + return -EPERM;
> > +#endif
>
> I'd like to avoid seeing any #ifdefs added to the .c files. Using a
> static inline for may_suspend_seccomp() should cause this statement to
> be eliminated by the compiler.
Agreed, me too, but see below.
> > @@ -590,6 +590,11 @@ void secure_computing_strict(int this_syscall)
> > {
> > int mode = current->seccomp.mode;
> >
> > +#ifdef CONFIG_CHECKPOINT_RESTORE
> > + if (unlikely(current->ptrace & PT_SUSPEND_SECCOMP))
> > + return;
> > +#endif
>
> Could PT_SUSPEND_SECCOMP be defined to "0" with not
> CONFIG_CHECKPOINT_RESTORE? Then this wouldn't need ifdefs, and should
> be similarly eliminated by the compiler.
Yes, but this way we add another ugly ifdef into .h, and if you read
this code it is not clear that this check should be eliminated by gcc.
I'd suggest
if (config_enabled(CONFIG_CHECKPOINT_RESTORE) &&
unlikely(current->ptrace & PT_SUSPEND_SECCOMP))
return;
Oleg.
On Thu, Jun 4, 2015 at 11:03 AM, Oleg Nesterov <[email protected]> wrote:
> On 06/04, Kees Cook wrote:
>>
>> On Wed, Jun 3, 2015 at 3:09 PM, Tycho Andersen
>> > @@ -556,6 +557,11 @@ static int ptrace_setoptions(struct task_struct *child, unsigned long data)
>> > if (data & ~(unsigned long)PTRACE_O_MASK)
>> > return -EINVAL;
>> >
>> > +#ifdef CONFIG_CHECKPOINT_RESTORE
>> > + if (data & PTRACE_O_SUSPEND_SECCOMP && !may_suspend_seccomp())
>> > + return -EPERM;
>> > +#endif
>>
>> I'd like to avoid seeing any #ifdefs added to the .c files. Using a
>> static inline for may_suspend_seccomp() should cause this statement to
>> be eliminated by the compiler.
>
> Agreed, me too, but see below.
>
>> > @@ -590,6 +590,11 @@ void secure_computing_strict(int this_syscall)
>> > {
>> > int mode = current->seccomp.mode;
>> >
>> > +#ifdef CONFIG_CHECKPOINT_RESTORE
>> > + if (unlikely(current->ptrace & PT_SUSPEND_SECCOMP))
>> > + return;
>> > +#endif
>>
>> Could PT_SUSPEND_SECCOMP be defined to "0" with not
>> CONFIG_CHECKPOINT_RESTORE? Then this wouldn't need ifdefs, and should
>> be similarly eliminated by the compiler.
>
> Yes, but this way we add another ugly ifdef into .h, and if you read
> this code it is not clear that this check should be eliminated by gcc.
>
> I'd suggest
>
> if (config_enabled(CONFIG_CHECKPOINT_RESTORE) &&
> unlikely(current->ptrace & PT_SUSPEND_SECCOMP))
> return;
Ah! Yes, that makes things nicer.
-Kees
--
Kees Cook
Chrome OS Security
On Thu, Jun 4, 2015 at 10:15 AM, Tycho Andersen
<[email protected]> wrote:
> On Thu, Jun 04, 2015 at 09:44:36AM -0700, Kees Cook wrote:
>> On Wed, Jun 3, 2015 at 3:09 PM, Tycho Andersen
>> <[email protected]> wrote:
>> > This patch is the first step in enabling checkpoint/restore of processes
>> > with seccomp enabled.
>> >
>> > One of the things CRIU does while dumping tasks is inject code into them
>> > via ptrace to collect information that is only available to the process
>> > itself. However, if we are in a seccomp mode where these processes are
>> > prohibited from making these syscalls, then what CRIU does kills the task.
>> >
>> > This patch adds a new ptrace option, PTRACE_O_SUSPEND_SECCOMP, that enables
>> > a task from the init user namespace which has CAP_SYS_ADMIN and no seccomp
>> > filters to disable (and re-enable) seccomp filters for another task so that
>> > they can be successfully dumped (and restored). We restrict the set of
>> > processes that can disable seccomp through ptrace because although today
>> > ptrace can be used to bypass seccomp, there is some discussion of closing
>> > this loophole in the future and we would like this patch to not depend on
>> > that behavior and be future proofed for when it is removed.
>> >
>> > Note that seccomp can be suspended before any filters are actually
>> > installed; this behavior is useful on criu restore, so that we can suspend
>> > seccomp, restore the filters, unmap our restore code from the restored
>> > process' address space, and then resume the task by detaching and have the
>> > filters resumed as well.
>> >
>> > v2 changes:
>> >
>> > * require that the tracer have no seccomp filters installed
>> > * drop TIF_NOTSC manipulation from the patch
>> > * change from ptrace command to a ptrace option and use this ptrace option
>> > as the flag to check. This means that as soon as the tracer
>> > detaches/dies, seccomp is re-enabled and as a corrollary that one can not
>> > disable seccomp across PTRACE_ATTACHs.
>>
>> This feature gives me the creeps, but I think it's okay.
>
> :D
>
>> Could it be
>> further restricted so that the process doing the suspension is already
>> ptracing the target?
>
> As far as I understand it you do have to PTRACE_{ATTACH,SEIZE} to the
> target before setting options in general. Is that not what you mean
> here?
Ah, true, yes. Okay, ignore me. I was thinking about the mechanism for
setting the flag wrong. :)
-Kees
>
> The rest of the changes sound good, I'll make those and resend.
>
>>
>> Thanks for working on this!
>
> Thanks for the review.
>
> Tycho
--
Kees Cook
Chrome OS Security
On 06/03, Tycho Andersen wrote:
>
> @@ -556,6 +557,11 @@ static int ptrace_setoptions(struct task_struct *child, unsigned long data)
> if (data & ~(unsigned long)PTRACE_O_MASK)
> return -EINVAL;
>
> +#ifdef CONFIG_CHECKPOINT_RESTORE
> + if (data & PTRACE_O_SUSPEND_SECCOMP && !may_suspend_seccomp())
> + return -EPERM;
> +#endif
> +
Well. This -EPERM doesn't look consistent...
if config_enabled(CONFIG_CHECKPOINT_RESTORE) == F, we return success
but PTRACE_O_SUSPEND_SECCOMP has no effect because of another ifdef in
seccomp.
OTOH, if CONFIG_SECCOMP=n, this option has no effect too but we return
-EPERM even.
Also. Suppose that the tracer sets SUSPEND_SECCOMP and then drops
CAP_SYS_ADMIN. After that it can't set or clear other ptrace options.
So if we really want the security checks (I still think we do not ;)
then we should probably check "flags & SUSPEND_SECCOMP" as well.
> +#ifdef CONFIG_CHECKPOINT_RESTORE
> +bool may_suspend_seccomp(void)
> +{
> + if (!capable(CAP_SYS_ADMIN))
> + return false;
> +
> + if (current->seccomp.mode != SECCOMP_MODE_DISABLED)
> + return false;
Heh. OK, I won't argue with the new check too ;)
Oleg.
Hi Oleg,
On Thu, Jun 04, 2015 at 08:31:49PM +0200, Oleg Nesterov wrote:
> On 06/03, Tycho Andersen wrote:
> >
> > @@ -556,6 +557,11 @@ static int ptrace_setoptions(struct task_struct *child, unsigned long data)
> > if (data & ~(unsigned long)PTRACE_O_MASK)
> > return -EINVAL;
> >
> > +#ifdef CONFIG_CHECKPOINT_RESTORE
> > + if (data & PTRACE_O_SUSPEND_SECCOMP && !may_suspend_seccomp())
> > + return -EPERM;
> > +#endif
> > +
>
> Well. This -EPERM doesn't look consistent...
>
> if config_enabled(CONFIG_CHECKPOINT_RESTORE) == F, we return success
> but PTRACE_O_SUSPEND_SECCOMP has no effect because of another ifdef in
> seccomp.
>
> OTOH, if CONFIG_SECCOMP=n, this option has no effect too but we return
> -EPERM even.
Yes, something like:
if (unlikely(data & PTRACE_O_SUSPEND_SECCOMP)) {
if (!config_enabled(CONFIG_CHECKPOINT_RESTORE) ||
!config_enabled(CONFIG_SECCOMP))
return -EINVAL;
if (!may_suspend_seccomp())
return -EPERM;
}
I guess.
> Also. Suppose that the tracer sets SUSPEND_SECCOMP and then drops
> CAP_SYS_ADMIN. After that it can't set or clear other ptrace options.
Is this a case we're concerned about? I think this should be ok (i.e.
"don't do that" :).
> So if we really want the security checks (I still think we do not ;)
> then we should probably check "flags & SUSPEND_SECCOMP" as well.
Good point.
> > +#ifdef CONFIG_CHECKPOINT_RESTORE
> > +bool may_suspend_seccomp(void)
> > +{
> > + if (!capable(CAP_SYS_ADMIN))
> > + return false;
> > +
> > + if (current->seccomp.mode != SECCOMP_MODE_DISABLED)
> > + return false;
>
> Heh. OK, I won't argue with the new check too ;)
Actually now that I think about it I agree with you, these checks
don't seem necessary. Even inside a user namespace, if you can ptrace
a process you can make it do whatever you want irrespective of
seccomp, as long as it has the necessary capabilities. Once the
seccomp checks are run after ptrace, they'll be enforced so you
couldn't have it call whatever you want in the first place.
Still, perhaps I'm missing something...
Tycho
Hi Tycho,
On 06/04, Tycho Andersen wrote:
>
> On Thu, Jun 04, 2015 at 08:31:49PM +0200, Oleg Nesterov wrote:
> > Also. Suppose that the tracer sets SUSPEND_SECCOMP and then drops
> > CAP_SYS_ADMIN. After that it can't set or clear other ptrace options.
>
> Is this a case we're concerned about? I think this should be ok (i.e.
> "don't do that" :).
Sure, I won't insist. Just this looks a bit confusing. I mean, if you
read this code it is not clear why may_suspend_seccomp() is called even
if the tracer changes other bits, and "data & PTRACE_O_SUSPEND" is true
only because the tracer does _not_ change this option.
IOW, imo the code will just look better if may_suspend_seccomp() is
called only when PTRACE_O_SUSPEND is set. But this is minor, feel free
to ignore.
> > > +#ifdef CONFIG_CHECKPOINT_RESTORE
> > > +bool may_suspend_seccomp(void)
> > > +{
> > > + if (!capable(CAP_SYS_ADMIN))
> > > + return false;
> > > +
> > > + if (current->seccomp.mode != SECCOMP_MODE_DISABLED)
> > > + return false;
> >
> > Heh. OK, I won't argue with the new check too ;)
>
> Actually now that I think about it I agree with you, these checks
> don't seem necessary. Even inside a user namespace, if you can ptrace
> a process you can make it do whatever you want irrespective of
> seccomp, as long as it has the necessary capabilities. Once the
> seccomp checks are run after ptrace, they'll be enforced so you
> couldn't have it call whatever you want in the first place.
Good ;)
> Still, perhaps I'm missing something...
Kees, Andy?
Oleg.
On Fri, Jun 05, 2015 at 11:16:50PM +0200, Oleg Nesterov wrote:
> Hi Tycho,
>
> On 06/04, Tycho Andersen wrote:
> >
> > On Thu, Jun 04, 2015 at 08:31:49PM +0200, Oleg Nesterov wrote:
> > > Also. Suppose that the tracer sets SUSPEND_SECCOMP and then drops
> > > CAP_SYS_ADMIN. After that it can't set or clear other ptrace options.
> >
> > Is this a case we're concerned about? I think this should be ok (i.e.
> > "don't do that" :).
>
> Sure, I won't insist. Just this looks a bit confusing. I mean, if you
> read this code it is not clear why may_suspend_seccomp() is called even
> if the tracer changes other bits, and "data & PTRACE_O_SUSPEND" is true
> only because the tracer does _not_ change this option.
>
> IOW, imo the code will just look better if may_suspend_seccomp() is
> called only when PTRACE_O_SUSPEND is set. But this is minor, feel free
> to ignore.
Oh, I understand now. I think this is fixed in v3 that I just sent,
but may go away in any case if we remove the checks...
> > > > +#ifdef CONFIG_CHECKPOINT_RESTORE
> > > > +bool may_suspend_seccomp(void)
> > > > +{
> > > > + if (!capable(CAP_SYS_ADMIN))
> > > > + return false;
> > > > +
> > > > + if (current->seccomp.mode != SECCOMP_MODE_DISABLED)
> > > > + return false;
> > >
> > > Heh. OK, I won't argue with the new check too ;)
> >
> > Actually now that I think about it I agree with you, these checks
> > don't seem necessary. Even inside a user namespace, if you can ptrace
> > a process you can make it do whatever you want irrespective of
> > seccomp, as long as it has the necessary capabilities. Once the
> > seccomp checks are run after ptrace, they'll be enforced so you
> > couldn't have it call whatever you want in the first place.
>
> Good ;)
>
> > Still, perhaps I'm missing something...
>
> Kees, Andy?
Doh, just sent v3. If you guys are convinced too, then I can send v4
with these checks removed.
Tycho
Hi Kees, Andy,
On Fri, Jun 05, 2015 at 11:16:50PM +0200, Oleg Nesterov wrote:
> Hi Tycho,
>
> On 06/04, Tycho Andersen wrote:
> > > > +#ifdef CONFIG_CHECKPOINT_RESTORE
> > > > +bool may_suspend_seccomp(void)
> > > > +{
> > > > + if (!capable(CAP_SYS_ADMIN))
> > > > + return false;
> > > > +
> > > > + if (current->seccomp.mode != SECCOMP_MODE_DISABLED)
> > > > + return false;
> > >
> > > Heh. OK, I won't argue with the new check too ;)
> >
> > Actually now that I think about it I agree with you, these checks
> > don't seem necessary. Even inside a user namespace, if you can ptrace
> > a process you can make it do whatever you want irrespective of
> > seccomp, as long as it has the necessary capabilities. Once the
> > seccomp checks are run after ptrace, they'll be enforced so you
> > couldn't have it call whatever you want in the first place.
>
> Good ;)
>
> > Still, perhaps I'm missing something...
>
> Kees, Andy?
Any thoughts on removing may_suspend_seccomp() all together?
I sent v3 with this still in it, but I can send v4 without it if we
are all in agreement.
Tycho
On Tue, Jun 9, 2015 at 2:22 PM, Tycho Andersen
<[email protected]> wrote:
> Hi Kees, Andy,
>
> On Fri, Jun 05, 2015 at 11:16:50PM +0200, Oleg Nesterov wrote:
>> Hi Tycho,
>>
>> On 06/04, Tycho Andersen wrote:
>> > > > +#ifdef CONFIG_CHECKPOINT_RESTORE
>> > > > +bool may_suspend_seccomp(void)
>> > > > +{
>> > > > + if (!capable(CAP_SYS_ADMIN))
>> > > > + return false;
>> > > > +
>> > > > + if (current->seccomp.mode != SECCOMP_MODE_DISABLED)
>> > > > + return false;
>> > >
>> > > Heh. OK, I won't argue with the new check too ;)
>> >
>> > Actually now that I think about it I agree with you, these checks
>> > don't seem necessary. Even inside a user namespace, if you can ptrace
>> > a process you can make it do whatever you want irrespective of
>> > seccomp, as long as it has the necessary capabilities. Once the
>> > seccomp checks are run after ptrace, they'll be enforced so you
>> > couldn't have it call whatever you want in the first place.
>>
>> Good ;)
>>
>> > Still, perhaps I'm missing something...
>>
>> Kees, Andy?
>
> Any thoughts on removing may_suspend_seccomp() all together?
As in, just open-code the check? That would be fine by me.
> I sent v3 with this still in it, but I can send v4 without it if we
> are all in agreement.
-Kees
--
Kees Cook
Chrome OS Security
On Tue, Jun 09, 2015 at 02:45:49PM -0700, Kees Cook wrote:
> On Tue, Jun 9, 2015 at 2:22 PM, Tycho Andersen
> <[email protected]> wrote:
> > Hi Kees, Andy,
> >
> > On Fri, Jun 05, 2015 at 11:16:50PM +0200, Oleg Nesterov wrote:
> >> Hi Tycho,
> >>
> >> On 06/04, Tycho Andersen wrote:
> >> > > > +#ifdef CONFIG_CHECKPOINT_RESTORE
> >> > > > +bool may_suspend_seccomp(void)
> >> > > > +{
> >> > > > + if (!capable(CAP_SYS_ADMIN))
> >> > > > + return false;
> >> > > > +
> >> > > > + if (current->seccomp.mode != SECCOMP_MODE_DISABLED)
> >> > > > + return false;
> >> > >
> >> > > Heh. OK, I won't argue with the new check too ;)
> >> >
> >> > Actually now that I think about it I agree with you, these checks
> >> > don't seem necessary. Even inside a user namespace, if you can ptrace
> >> > a process you can make it do whatever you want irrespective of
> >> > seccomp, as long as it has the necessary capabilities. Once the
> >> > seccomp checks are run after ptrace, they'll be enforced so you
> >> > couldn't have it call whatever you want in the first place.
> >>
> >> Good ;)
> >>
> >> > Still, perhaps I'm missing something...
> >>
> >> Kees, Andy?
> >
> > Any thoughts on removing may_suspend_seccomp() all together?
>
> As in, just open-code the check? That would be fine by me.
Sorry, I meant getting rid of any checks entirely. Using my argument
above I've managed to convince myself they don't add any value. You
guys know a lot more about this than I do, though.
Tycho
On Tue, Jun 9, 2015 at 2:52 PM, Tycho Andersen
<[email protected]> wrote:
> On Tue, Jun 09, 2015 at 02:45:49PM -0700, Kees Cook wrote:
>> On Tue, Jun 9, 2015 at 2:22 PM, Tycho Andersen
>> <[email protected]> wrote:
>> > Hi Kees, Andy,
>> >
>> > On Fri, Jun 05, 2015 at 11:16:50PM +0200, Oleg Nesterov wrote:
>> >> Hi Tycho,
>> >>
>> >> On 06/04, Tycho Andersen wrote:
>> >> > > > +#ifdef CONFIG_CHECKPOINT_RESTORE
>> >> > > > +bool may_suspend_seccomp(void)
>> >> > > > +{
>> >> > > > + if (!capable(CAP_SYS_ADMIN))
>> >> > > > + return false;
>> >> > > > +
>> >> > > > + if (current->seccomp.mode != SECCOMP_MODE_DISABLED)
>> >> > > > + return false;
>> >> > >
>> >> > > Heh. OK, I won't argue with the new check too ;)
>> >> >
>> >> > Actually now that I think about it I agree with you, these checks
>> >> > don't seem necessary. Even inside a user namespace, if you can ptrace
>> >> > a process you can make it do whatever you want irrespective of
>> >> > seccomp, as long as it has the necessary capabilities. Once the
>> >> > seccomp checks are run after ptrace, they'll be enforced so you
>> >> > couldn't have it call whatever you want in the first place.
>> >>
>> >> Good ;)
>> >>
>> >> > Still, perhaps I'm missing something...
>> >>
>> >> Kees, Andy?
>> >
>> > Any thoughts on removing may_suspend_seccomp() all together?
>>
>> As in, just open-code the check? That would be fine by me.
>
> Sorry, I meant getting rid of any checks entirely. Using my argument
> above I've managed to convince myself they don't add any value. You
> guys know a lot more about this than I do, though.
Well, as things stand currently, yes, that check would be redundant.
The fact that ptrace can be used to bypass seccomp is kind of an
accident, though. The design for ptrace-based seccomp managers was
that the manager would do the work, rather than rewriting the syscall
on behalf of the child. I don't think anything actually uses this
effect. It's something we've wanted to fix, though a clean solution
isn't obvious. As a result, I'm cautious to add this behavior in such
a wide open fashion. For now, I'd like to limit the scope of this to
CAP_SYS_ADMIN.
I do think dropping the seccomp.mode check is fine -- this would mean
you could set this flag before the child even added seccomp filters.
So, instead of the function call, maybe just add the capable() call?
-Kees
--
Kees Cook
Chrome OS Security
On Tue, Jun 09, 2015 at 03:06:07PM -0700, Kees Cook wrote:
> On Tue, Jun 9, 2015 at 2:52 PM, Tycho Andersen
> <[email protected]> wrote:
> > On Tue, Jun 09, 2015 at 02:45:49PM -0700, Kees Cook wrote:
> >> On Tue, Jun 9, 2015 at 2:22 PM, Tycho Andersen
> >> <[email protected]> wrote:
> >> > Hi Kees, Andy,
> >> >
> >> > On Fri, Jun 05, 2015 at 11:16:50PM +0200, Oleg Nesterov wrote:
> >> >> Hi Tycho,
> >> >>
> >> >> On 06/04, Tycho Andersen wrote:
> >> >> > > > +#ifdef CONFIG_CHECKPOINT_RESTORE
> >> >> > > > +bool may_suspend_seccomp(void)
> >> >> > > > +{
> >> >> > > > + if (!capable(CAP_SYS_ADMIN))
> >> >> > > > + return false;
> >> >> > > > +
> >> >> > > > + if (current->seccomp.mode != SECCOMP_MODE_DISABLED)
> >> >> > > > + return false;
> >> >> > >
> >> >> > > Heh. OK, I won't argue with the new check too ;)
> >> >> >
> >> >> > Actually now that I think about it I agree with you, these checks
> >> >> > don't seem necessary. Even inside a user namespace, if you can ptrace
> >> >> > a process you can make it do whatever you want irrespective of
> >> >> > seccomp, as long as it has the necessary capabilities. Once the
> >> >> > seccomp checks are run after ptrace, they'll be enforced so you
> >> >> > couldn't have it call whatever you want in the first place.
> >> >>
> >> >> Good ;)
> >> >>
> >> >> > Still, perhaps I'm missing something...
> >> >>
> >> >> Kees, Andy?
> >> >
> >> > Any thoughts on removing may_suspend_seccomp() all together?
> >>
> >> As in, just open-code the check? That would be fine by me.
> >
> > Sorry, I meant getting rid of any checks entirely. Using my argument
> > above I've managed to convince myself they don't add any value. You
> > guys know a lot more about this than I do, though.
>
> Well, as things stand currently, yes, that check would be redundant.
> The fact that ptrace can be used to bypass seccomp is kind of an
> accident, though. The design for ptrace-based seccomp managers was
> that the manager would do the work, rather than rewriting the syscall
> on behalf of the child. I don't think anything actually uses this
> effect. It's something we've wanted to fix, though a clean solution
> isn't obvious. As a result, I'm cautious to add this behavior in such
> a wide open fashion. For now, I'd like to limit the scope of this to
> CAP_SYS_ADMIN.
>
> I do think dropping the seccomp.mode check is fine -- this would mean
> you could set this flag before the child even added seccomp filters.
> So, instead of the function call, maybe just add the capable() call?
Ok, sounds good; I'll make the change and re-send.
Thanks!
Tycho