This patch is the first step in enabling checkpoint/restore of processes
with seccomp enabled.

One of the things CRIU does while dumping tasks is inject code into them
via ptrace to collect information that is only available to the process
itself. However, if the task is in a seccomp mode that prohibits the
syscalls the injected code makes, the injection kills the task.

This patch adds a new ptrace option, PTRACE_O_SUSPEND_SECCOMP, that enables
a task from the init user namespace which has CAP_SYS_ADMIN and no seccomp
filters to disable (and re-enable) seccomp filters for another task so that
it can be successfully dumped (and restored). We restrict the set of
processes that can disable seccomp through ptrace because, although ptrace
can be used to bypass seccomp today, there is some discussion of closing
this loophole in the future. We would like this patch not to depend on
that behavior, so that it keeps working once the loophole is removed.

Note that seccomp can be suspended before any filters are actually
installed; this behavior is useful on criu restore, so that we can suspend
seccomp, restore the filters, unmap our restore code from the restored
process' address space, and then resume the task by detaching, at which
point the filters take effect again.

v2 changes:

* require that the tracer have no seccomp filters installed
* drop TIF_NOTSC manipulation from the patch
* change from ptrace command to a ptrace option and use this ptrace option
  as the flag to check. This means that as soon as the tracer
  detaches/dies, seccomp is re-enabled and, as a corollary, that one cannot
  disable seccomp across PTRACE_ATTACHs.

v3 changes:

* get rid of various #ifdefs everywhere
* report more sensible errors when PTRACE_O_SUSPEND_SECCOMP is incorrectly
  used

v4 changes:

* get rid of may_suspend_seccomp() in favor of a capable() check in ptrace
  directly

v5 changes:

* check that seccomp is not enabled (or suspended) on the tracer

Signed-off-by: Tycho Andersen <[email protected]>
CC: Kees Cook <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: Will Drewry <[email protected]>
CC: Roland McGrath <[email protected]>
CC: Oleg Nesterov <[email protected]>
CC: Pavel Emelyanov <[email protected]>
CC: Serge E. Hallyn <[email protected]>
---
include/linux/ptrace.h | 1 +
include/uapi/linux/ptrace.h | 6 ++++--
kernel/ptrace.c | 13 +++++++++++++
kernel/seccomp.c | 8 ++++++++
4 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 987a73a..061265f 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -34,6 +34,7 @@
 #define PT_TRACE_SECCOMP PT_EVENT_FLAG(PTRACE_EVENT_SECCOMP)
 
 #define PT_EXITKILL (PTRACE_O_EXITKILL << PT_OPT_FLAG_SHIFT)
+#define PT_SUSPEND_SECCOMP (PTRACE_O_SUSPEND_SECCOMP << PT_OPT_FLAG_SHIFT)
 
 /* single stepping state bits (used on ARM and PA-RISC) */
 #define PT_SINGLESTEP_BIT 31
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index cf1019e..a7a6979 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -89,9 +89,11 @@ struct ptrace_peeksiginfo_args {
 #define PTRACE_O_TRACESECCOMP (1 << PTRACE_EVENT_SECCOMP)
 
 /* eventless options */
-#define PTRACE_O_EXITKILL (1 << 20)
+#define PTRACE_O_EXITKILL (1 << 20)
+#define PTRACE_O_SUSPEND_SECCOMP (1 << 21)
 
-#define PTRACE_O_MASK (0x000000ff | PTRACE_O_EXITKILL)
+#define PTRACE_O_MASK (\
+	0x000000ff | PTRACE_O_EXITKILL | PTRACE_O_SUSPEND_SECCOMP)
 
 #include <asm/ptrace.h>
 
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index c8e0e05..496028b 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -556,6 +556,19 @@ static int ptrace_setoptions(struct task_struct *child, unsigned long data)
 	if (data & ~(unsigned long)PTRACE_O_MASK)
 		return -EINVAL;
 
+	if (unlikely(data & PTRACE_O_SUSPEND_SECCOMP)) {
+		if (!config_enabled(CONFIG_CHECKPOINT_RESTORE) ||
+		    !config_enabled(CONFIG_SECCOMP))
+			return -EINVAL;
+
+		if (!capable(CAP_SYS_ADMIN))
+			return -EPERM;
+
+		if (current->seccomp.mode != SECCOMP_MODE_DISABLED ||
+		    current->ptrace & PT_SUSPEND_SECCOMP)
+			return -EPERM;
+	}
+
 	/* Avoid intermediate state when all opts are cleared */
 	flags = child->ptrace;
 	flags &= ~(PTRACE_O_MASK << PT_OPT_FLAG_SHIFT);
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 980fd26..645e42d 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -590,6 +590,10 @@ void secure_computing_strict(int this_syscall)
 {
 	int mode = current->seccomp.mode;
 
+	if (config_enabled(CONFIG_CHECKPOINT_RESTORE) &&
+	    unlikely(current->ptrace & PT_SUSPEND_SECCOMP))
+		return;
+
 	if (mode == 0)
 		return;
 	else if (mode == SECCOMP_MODE_STRICT)
@@ -691,6 +695,10 @@ u32 seccomp_phase1(struct seccomp_data *sd)
 	int this_syscall = sd ? sd->nr :
 		syscall_get_nr(current, task_pt_regs(current));
 
+	if (config_enabled(CONFIG_CHECKPOINT_RESTORE) &&
+	    unlikely(current->ptrace & PT_SUSPEND_SECCOMP))
+		return SECCOMP_PHASE1_OK;
+
 	switch (mode) {
 	case SECCOMP_MODE_STRICT:
 		__secure_computing_strict(this_syscall); /* may call do_exit */
--
2.1.4
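
As a reading aid for the flag definitions in the patch above, the following
standalone sketch models how the new option bit passes the PTRACE_O_MASK
check and where it lands in the task's ptrace flags. The option values are
taken from the patch; PT_OPT_FLAG_SHIFT == 3 is an assumption about
include/linux/ptrace.h at the time, and options_valid() and apply_options()
are hypothetical helpers that only imitate the corresponding steps of
ptrace_setoptions().

```c
#include <stdbool.h>

/* uapi option bits as defined by this patch. */
#define PTRACE_O_EXITKILL        (1UL << 20)
#define PTRACE_O_SUSPEND_SECCOMP (1UL << 21)
#define PTRACE_O_MASK \
	(0x000000ffUL | PTRACE_O_EXITKILL | PTRACE_O_SUSPEND_SECCOMP)

/* Assumption: the kernel header of this era sets PT_OPT_FLAG_SHIFT to 3. */
#define PT_OPT_FLAG_SHIFT  3
#define PT_SUSPEND_SECCOMP (PTRACE_O_SUSPEND_SECCOMP << PT_OPT_FLAG_SHIFT)
#define PT_SINGLESTEP_BIT  31

/* Under that assumption the in-kernel flag is bit 24, which stays clear
 * of the single-stepping state bit. */
_Static_assert(PT_SUSPEND_SECCOMP == (1UL << 24), "flag lands on bit 24");
_Static_assert(PT_SUSPEND_SECCOMP < (1UL << PT_SINGLESTEP_BIT),
	       "flag does not collide with PT_SINGLESTEP_BIT");

/* Same test ptrace_setoptions() applies before looking at single bits. */
static bool options_valid(unsigned long data)
{
	return (data & ~PTRACE_O_MASK) == 0;
}

/* Hypothetical model of how options are folded into task->ptrace:
 * clear every option bit, then set the requested ones. */
static unsigned long apply_options(unsigned long ptrace_flags,
				   unsigned long data)
{
	ptrace_flags &= ~(PTRACE_O_MASK << PT_OPT_FLAG_SHIFT);
	ptrace_flags |= data << PT_OPT_FLAG_SHIFT;
	return ptrace_flags;
}
```

Because every option bit is cleared before the requested ones are set, a
later PTRACE_SETOPTIONS call without PTRACE_O_SUSPEND_SECCOMP drops the
flag again, which matches the "disable (and re-enable)" wording in the
description.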
On 06/13, Tycho Andersen wrote:
>
> This patch is the first step in enabling checkpoint/restore of processes
> with seccomp enabled.
So just in case, I am fine with this version.
On 06/13/2015 06:02 PM, Tycho Andersen wrote:
> This patch is the first step in enabling checkpoint/restore of processes
> with seccomp enabled.
Acked-by: Pavel Emelyanov <[email protected]>
On Sat, Jun 13, 2015 at 4:52 PM, Oleg Nesterov <[email protected]> wrote:
> On 06/13, Tycho Andersen wrote:
>>
>> This patch is the first step in enabling checkpoint/restore of processes
>> with seccomp enabled.
>
> So just in case, I am fine with this version.
Should I add your Ack? Though I really like the idea of a
"Fine-with-this:" tag. ;)
Andy, if you're okay with this too, I'll add it to my seccomp tree.
Thanks again Tycho!
-Kees
--
Kees Cook
Chrome OS Security
On Mon, Jun 15, 2015 at 1:19 PM, Kees Cook <[email protected]> wrote:
> On Sat, Jun 13, 2015 at 4:52 PM, Oleg Nesterov <[email protected]> wrote:
>> On 06/13, Tycho Andersen wrote:
>>>
>>> This patch is the first step in enabling checkpoint/restore of processes
>>> with seccomp enabled.
>>
>> So just in case, I am fine with this version.
>
> Should I add your Ack? Though I really like the idea of a
> "Fine-with-this:" tag. ;)
>
> Andy, if you're okay with this too, I'll add it to my seccomp tree.
I'm fine with this. :)
Acked-by: Andy Lutomirski <[email protected]>
>
> Thanks again Tycho!
>
> -Kees
>
> --
> Kees Cook
> Chrome OS Security
--
Andy Lutomirski
AMA Capital Management, LLC
On 06/15, Kees Cook wrote:
>
> On Sat, Jun 13, 2015 at 4:52 PM, Oleg Nesterov <[email protected]> wrote:
> > On 06/13, Tycho Andersen wrote:
> >>
> >> This patch is the first step in enabling checkpoint/restore of processes
> >> with seccomp enabled.
> >
> > So just in case, I am fine with this version.
>
> Should I add your Ack? Though I really like the idea of a
> "Fine-with-this:" tag. ;)
Yes, please feel free to add ;)
Oleg.
On Mon, Jun 15, 2015 at 2:20 PM, Oleg Nesterov <[email protected]> wrote:
> On 06/15, Kees Cook wrote:
>>
>> On Sat, Jun 13, 2015 at 4:52 PM, Oleg Nesterov <[email protected]> wrote:
>> > On 06/13, Tycho Andersen wrote:
>> >>
>> >> This patch is the first step in enabling checkpoint/restore of processes
>> >> with seccomp enabled.
>> >
>> > So just in case, I am fine with this version.
>>
>> Should I add your Ack? Though I really like the idea of a
>> "Fine-with-this:" tag. ;)
>
> Yes, please feel free to add ;)
Thanks!
I've added this to the seccomp tree. It may be a bit delayed appearing
in -next, due to 4.2 opening soon.
-Kees
--
Kees Cook
Chrome OS Security
On Mon, Jun 15, 2015 at 3:04 PM, Kees Cook <[email protected]> wrote:
> On Mon, Jun 15, 2015 at 2:20 PM, Oleg Nesterov <[email protected]> wrote:
>> On 06/15, Kees Cook wrote:
>>>
>>> On Sat, Jun 13, 2015 at 4:52 PM, Oleg Nesterov <[email protected]> wrote:
>>> > On 06/13, Tycho Andersen wrote:
>>> >>
>>> >> This patch is the first step in enabling checkpoint/restore of processes
>>> >> with seccomp enabled.
>>> >
>>> > So just in case, I am fine with this version.
>>>
>>> Should I add your Ack? Though I really like the idea of a
>>> "Fine-with-this:" tag. ;)
>>
>> Yes, please feel free to add ;)
>
> Thanks!
>
> I've added this to the seccomp tree. It may be a bit delayed appearing
> in -next, due to 4.2 opening soon.
>
> -Kees
Tycho, would you be willing to send some man-page updates for this new
interface, so it's documented correctly for ptrace?
https://www.kernel.org/doc/man-pages/patches.html
Michael, see I thought about linux-api before it was actually in
Linus's tree! ;) If you see any issues here, please let us know.
Otherwise, hopefully Tycho will get you some updates to ptrace.2
and/or seccomp.2. :)
-Kees
--
Kees Cook
Chrome OS Security
On Mon, Jun 15, 2015 at 03:50:59PM -0700, Kees Cook wrote:
> On Mon, Jun 15, 2015 at 3:04 PM, Kees Cook <[email protected]> wrote:
> > On Mon, Jun 15, 2015 at 2:20 PM, Oleg Nesterov <[email protected]> wrote:
> >> On 06/15, Kees Cook wrote:
> >>>
> >>> On Sat, Jun 13, 2015 at 4:52 PM, Oleg Nesterov <[email protected]> wrote:
> >>> > On 06/13, Tycho Andersen wrote:
> >>> >>
> >>> >> This patch is the first step in enabling checkpoint/restore of processes
> >>> >> with seccomp enabled.
> >>> >
> >>> > So just in case, I am fine with this version.
> >>>
> >>> Should I add your Ack? Though I really like the idea of a
> >>> "Fine-with-this:" tag. ;)
> >>
> >> Yes, please feel free to add ;)
> >
> > Thanks!
> >
> > I've added this to the seccomp tree. It may be a bit delayed appearing
> > in -next, due to 4.2 opening soon.
> >
> > -Kees
>
> Tycho, would you be willing to send some man-page updates for this new
> interface, so it's documented correctly for ptrace?
> https://www.kernel.org/doc/man-pages/patches.html
Yep, sounds good. I'll send some ASAP.
Thanks,
Tycho