Commit 69f594a38967 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
introduced the ability to opt out of audit messages for accesses to
various proc files since they are not violations of policy.
While doing so it somehow switched the check from ns_capable() to
has_ns_capability{_noaudit}(). That means it switched from checking the
subjective credentials of the task to using the objective credentials. I
couldn't find the original lkml thread and so I don't know why this switch
was done. But it seems wrong since ptrace_has_cap() is currently only used
in ptrace_may_access(). And it's used to check whether the calling task
(subject) has the CAP_SYS_PTRACE capability in the provided user namespace
to operate on the target task (object). According to the cred.h comments
this would mean the subjective credentials of the calling task need to be
used.
This switches it to use security_capable() because we only call
ptrace_has_cap() in ptrace_may_access() and in there we already have a
stable reference to the calling tasks creds under cred_guard_mutex so
there's no need to go through another series of dereferences and rcu
locking done in ns_capable{_noaudit}().
Cc: Serge Hallyn <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Eric Paris <[email protected]>
Cc: [email protected]
Fixes: 69f594a38967 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
Signed-off-by: Christian Brauner <[email protected]>
---
kernel/ptrace.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index cb9ddcc08119..b2fe800cae9a 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -264,12 +264,14 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
return ret;
}
-static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode)
+static int ptrace_has_cap(const struct cred *cred, struct user_namespace *ns,
+ unsigned int mode)
{
if (mode & PTRACE_MODE_NOAUDIT)
- return has_ns_capability_noaudit(current, ns, CAP_SYS_PTRACE);
+ return security_capable(cred, ns, CAP_SYS_PTRACE, CAP_OPT_NONE);
else
- return has_ns_capability(current, ns, CAP_SYS_PTRACE);
+ return security_capable(cred, ns, CAP_SYS_PTRACE,
+ CAP_OPT_NOAUDIT);
}
/* Returns 0 on success, -errno on denial. */
@@ -321,7 +323,7 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
gid_eq(caller_gid, tcred->sgid) &&
gid_eq(caller_gid, tcred->gid))
goto ok;
- if (ptrace_has_cap(tcred->user_ns, mode))
+ if (ptrace_has_cap(cred, tcred->user_ns, mode))
goto ok;
rcu_read_unlock();
return -EPERM;
@@ -340,7 +342,7 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
mm = task->mm;
if (mm &&
((get_dumpable(mm) != SUID_DUMP_USER) &&
- !ptrace_has_cap(mm->user_ns, mode)))
+ !ptrace_has_cap(cred, mm->user_ns, mode)))
return -EPERM;
return security_ptrace_access_check(task, mode);
base-commit: b3a987b0264d3ddbb24293ebff10eddfc472f653
--
2.25.0
Commit 69f594a38967 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
introduced the ability to opt out of audit messages for accesses to
various proc files since they are not violations of policy.
While doing so it somehow switched the check from ns_capable() to
has_ns_capability{_noaudit}(). That means it switched from checking the
subjective credentials of the task to using the objective credentials. I
couldn't find the original lkml thread and so I don't know why this switch
was done. But it seems wrong since ptrace_has_cap() is currently only used
in ptrace_may_access(). And it's used to check whether the calling task
(subject) has the CAP_SYS_PTRACE capability in the provided user namespace
to operate on the target task (object). According to the cred.h comments
this would mean the subjective credentials of the calling task need to be
used.
This switches it to use security_capable() because we only call
ptrace_has_cap() in ptrace_may_access() and in there we already have a
stable reference to the calling tasks creds under cred_guard_mutex so
there's no need to go through another series of dereferences and rcu
locking done in ns_capable{_noaudit}().
Cc: Serge Hallyn <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Eric Paris <[email protected]>
Cc: [email protected]
Fixes: 69f594a38967 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
Signed-off-by: Christian Brauner <[email protected]>
---
kernel/ptrace.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index cb9ddcc08119..d146133e97f1 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -264,12 +264,13 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
return ret;
}
-static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode)
+static int ptrace_has_cap(const struct cred *cred, struct user_namespace *ns,
+ unsigned int mode)
{
if (mode & PTRACE_MODE_NOAUDIT)
- return has_ns_capability_noaudit(current, ns, CAP_SYS_PTRACE);
+ return security_capable(cred, ns, CAP_SYS_PTRACE, CAP_OPT_NOAUDIT);
else
- return has_ns_capability(current, ns, CAP_SYS_PTRACE);
+ return security_capable(cred, ns, CAP_SYS_PTRACE, CAP_OPT_NONE);
}
/* Returns 0 on success, -errno on denial. */
@@ -321,7 +322,7 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
gid_eq(caller_gid, tcred->sgid) &&
gid_eq(caller_gid, tcred->gid))
goto ok;
- if (ptrace_has_cap(tcred->user_ns, mode))
+ if (ptrace_has_cap(cred, tcred->user_ns, mode))
goto ok;
rcu_read_unlock();
return -EPERM;
@@ -340,7 +341,7 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
mm = task->mm;
if (mm &&
((get_dumpable(mm) != SUID_DUMP_USER) &&
- !ptrace_has_cap(mm->user_ns, mode)))
+ !ptrace_has_cap(cred, mm->user_ns, mode)))
return -EPERM;
return security_ptrace_access_check(task, mode);
base-commit: b3a987b0264d3ddbb24293ebff10eddfc472f653
--
2.25.0
On Wed, Jan 15, 2020 at 06:17:36PM +0100, Christian Brauner wrote:
> Commit 69f594a38967 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
> introduced the ability to opt out of audit messages for accesses to
> various proc files since they are not violations of policy.
> While doing so it somehow switched the check from ns_capable() to
> has_ns_capability{_noaudit}(). That means it switched from checking the
> subjective credentials of the task to using the objective credentials. I
> couldn't find the original lkml thread and so I don't know why this switch
> was done. But it seems wrong since ptrace_has_cap() is currently only used
> in ptrace_may_access(). And it's used to check whether the calling task
> (subject) has the CAP_SYS_PTRACE capability in the provided user namespace
> to operate on the target task (object). According to the cred.h comments
> this would mean the subjective credentials of the calling task need to be
> used.
> This switches it to use security_capable() because we only call
> ptrace_has_cap() in ptrace_may_access() and in there we already have a
> stable reference to the calling tasks creds under cred_guard_mutex so
> there's no need to go through another series of dereferences and rcu
> locking done in ns_capable{_noaudit}().
>
> Cc: Serge Hallyn <[email protected]>
> Cc: Jann Horn <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Cc: Eric Paris <[email protected]>
> Cc: [email protected]
> Fixes: 69f594a38967 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
> Signed-off-by: Christian Brauner <[email protected]>
> ---
> kernel/ptrace.c | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index cb9ddcc08119..b2fe800cae9a 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -264,12 +264,14 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
> return ret;
> }
>
> -static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode)
> +static int ptrace_has_cap(const struct cred *cred, struct user_namespace *ns,
> + unsigned int mode)
> {
> if (mode & PTRACE_MODE_NOAUDIT)
> - return has_ns_capability_noaudit(current, ns, CAP_SYS_PTRACE);
> + return security_capable(cred, ns, CAP_SYS_PTRACE, CAP_OPT_NONE);
> else
> - return has_ns_capability(current, ns, CAP_SYS_PTRACE);
> + return security_capable(cred, ns, CAP_SYS_PTRACE,
> + CAP_OPT_NOAUDIT);
Accidently switched those two lines. I sent a fixed version now!
Thanks!
Christian
On Wed, Jan 15, 2020 at 06:23:55PM +0100, Christian Brauner wrote:
> Commit 69f594a38967 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
> introduced the ability to opt out of audit messages for accesses to
> various proc files since they are not violations of policy.
> While doing so it somehow switched the check from ns_capable() to
> has_ns_capability{_noaudit}(). That means it switched from checking the
> subjective credentials of the task to using the objective credentials. I
> couldn't find the original lkml thread and so I don't know why this switch
> was done. But it seems wrong since ptrace_has_cap() is currently only used
> in ptrace_may_access(). And it's used to check whether the calling task
> (subject) has the CAP_SYS_PTRACE capability in the provided user namespace
> to operate on the target task (object). According to the cred.h comments
> this would mean the subjective credentials of the calling task need to be
> used.
> This switches it to use security_capable() because we only call
> ptrace_has_cap() in ptrace_may_access() and in there we already have a
> stable reference to the calling tasks creds under cred_guard_mutex so
> there's no need to go through another series of dereferences and rcu
> locking done in ns_capable{_noaudit}().
>
> Cc: Serge Hallyn <[email protected]>
Thanks.
Reviewed-by: Serge Hallyn <[email protected]>
> Cc: Jann Horn <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Cc: Eric Paris <[email protected]>
> Cc: [email protected]
> Fixes: 69f594a38967 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
> Signed-off-by: Christian Brauner <[email protected]>
> ---
> kernel/ptrace.c | 11 ++++++-----
> 1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index cb9ddcc08119..d146133e97f1 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -264,12 +264,13 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
> return ret;
> }
>
> -static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode)
> +static int ptrace_has_cap(const struct cred *cred, struct user_namespace *ns,
> + unsigned int mode)
> {
> if (mode & PTRACE_MODE_NOAUDIT)
> - return has_ns_capability_noaudit(current, ns, CAP_SYS_PTRACE);
> + return security_capable(cred, ns, CAP_SYS_PTRACE, CAP_OPT_NOAUDIT);
> else
> - return has_ns_capability(current, ns, CAP_SYS_PTRACE);
> + return security_capable(cred, ns, CAP_SYS_PTRACE, CAP_OPT_NONE);
> }
>
> /* Returns 0 on success, -errno on denial. */
> @@ -321,7 +322,7 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
> gid_eq(caller_gid, tcred->sgid) &&
> gid_eq(caller_gid, tcred->gid))
> goto ok;
> - if (ptrace_has_cap(tcred->user_ns, mode))
> + if (ptrace_has_cap(cred, tcred->user_ns, mode))
> goto ok;
> rcu_read_unlock();
> return -EPERM;
> @@ -340,7 +341,7 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
> mm = task->mm;
> if (mm &&
> ((get_dumpable(mm) != SUID_DUMP_USER) &&
> - !ptrace_has_cap(mm->user_ns, mode)))
> + !ptrace_has_cap(cred, mm->user_ns, mode)))
> return -EPERM;
>
> return security_ptrace_access_check(task, mode);
>
> base-commit: b3a987b0264d3ddbb24293ebff10eddfc472f653
> --
> 2.25.0
On Wed, Jan 15, 2020 at 6:24 PM Christian Brauner
<[email protected]> wrote:
> Commit 69f594a38967 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
> introduced the ability to opt out of audit messages for accesses to
> various proc files since they are not violations of policy.
> While doing so it somehow switched the check from ns_capable() to
> has_ns_capability{_noaudit}(). That means it switched from checking the
> subjective credentials of the task to using the objective credentials. I
> couldn't find the original lkml thread and so I don't know why this switch
> was done. But it seems wrong since ptrace_has_cap() is currently only used
> in ptrace_may_access(). And it's used to check whether the calling task
> (subject) has the CAP_SYS_PTRACE capability in the provided user namespace
> to operate on the target task (object). According to the cred.h comments
> this would mean the subjective credentials of the calling task need to be
> used.
> This switches it to use security_capable() because we only call
> ptrace_has_cap() in ptrace_may_access() and in there we already have a
> stable reference to the calling tasks creds under cred_guard_mutex so
> there's no need to go through another series of dereferences and rcu
> locking done in ns_capable{_noaudit}().
Reviewed-by: Jann Horn <[email protected]>
On Wed, Jan 15, 2020 at 9:18 AM Christian Brauner
<[email protected]> wrote:
>
> Commit 69f594a38967 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
> introduced the ability to opt out of audit messages for accesses to
> various proc files since they are not violations of policy.
> While doing so it somehow switched the check from ns_capable() to
> has_ns_capability{_noaudit}(). That means it switched from checking the
> subjective credentials of the task to using the objective credentials. I
> couldn't find the original lkml thread and so I don't know why this switch
> was done. But it seems wrong since ptrace_has_cap() is currently only used
> in ptrace_may_access(). And it's used to check whether the calling task
> (subject) has the CAP_SYS_PTRACE capability in the provided user namespace
> to operate on the target task (object). According to the cred.h comments
> this would mean the subjective credentials of the calling task need to be
> used.
> This switches it to use security_capable() because we only call
> ptrace_has_cap() in ptrace_may_access() and in there we already have a
> stable reference to the calling tasks creds under cred_guard_mutex so
> there's no need to go through another series of dereferences and rcu
> locking done in ns_capable{_noaudit}().
This patch breaks CRIU tests:
All CRIU tests fail because ptrace returns EPERM:
$ python test/zdtm.py run -t zdtm/static/env00 --sat
=== Run 1/1 ================ zdtm/static/env00
========================== Run zdtm/static/env00 in h ==========================
Start test
./env00 --pidfile=env00.pid --outfile=env00.out --envname=ENV_00_TEST
Run criu dump
=[strace]=> dump/zdtm/static/env00/44/1/dump.strace
=[log]=> dump/zdtm/static/env00/44/1/dump.log
------------------------ grep Error ------------------------
b'(00.014558) cg: `- [net_cls,net_prio] -> [/] [0]'
b'(00.014634) cg: `- [perf_event] -> [/] [0]'
b'(00.014713) cg: `- [pids] ->
[/user.slice/user-0.slice/session-1.scope] [0]'
b'(00.014818) cg: Set 1 is criu one'
b'(00.015123) Warn (compel/src/lib/infect.c:127): Unable to interrupt
task: 44 (Operation not permitted)'
b'(00.015302) Unlock network'
b'(00.015423) Unfreezing tasks into 1'
b'(00.015524) \tUnseizing 44 into 1'
b'(00.015701) Error (compel/src/lib/infect.c:346): Unable to detach
from 44: No such process'
b'(00.015864) Error (criu/cr-dump.c:1775): Dumping FAILED.'
------------------------ ERROR OVER ------------------------
################### Test zdtm/static/env00 FAIL at CRIU dump ###################
Send the 9 signal to 44
Wait for zdtm/static/env00(44) to die for 0.100000
##################################### FAIL #####################################
Here is a strace output for the criu dump process:
write(4, "(00.014482) cg: `- [name=zdt"..., 61) = 61 <0.000028>
write(4, "(00.014558) cg: `- [net_cls,"..., 53) = 53 <0.000028>
write(4, "(00.014634) cg: `- [perf_eve"..., 47) = 47 <0.000031>
write(4, "(00.014713) cg: `- [pids] ->"..., 80) = 80 <0.000034>
write(4, "(00.014818) cg: Set 1 is criu on"..., 34) = 34 <0.000028>
rt_sigaction(SIGALRM, {sa_handler=0x43de00, sa_mask=[ALRM],
sa_flags=SA_RESTORER, sa_restorer=0x7f962247c6b0}, NULL, 8) = 0
<0.000018>
alarm(10) = 0 <0.000025>
ptrace(PTRACE_SEIZE, 44, NULL, 0) = -1 EPERM (Operation not
permitted) <0.000022>
write(4, "(00.015123) Warn (compel/src/li"..., 104) = 104 <0.000029>
alarm(0) = 10 <0.000032>
write(4, "(00.015302) Unlock network\n", 27) = 27 <0.000043>
write(4, "(00.015423) Unfreezing tasks int"..., 36) = 36 <0.000036>
The criu process is started with all capabilities in the root user namespace.
I don't have time to investigate this issue right now, will provide
more details next Tuesday.
The issue has been detected by our travis-c job:
https://travis-ci.org/avagin/linux/jobs/638547093
Thanks,
Andrei
On Fri, Jan 17, 2020 at 05:08:14PM -0800, Andrei Vagin wrote:
> On Wed, Jan 15, 2020 at 9:18 AM Christian Brauner
> <[email protected]> wrote:
> >
> > Commit 69f594a38967 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
> > introduced the ability to opt out of audit messages for accesses to
> > various proc files since they are not violations of policy.
> > While doing so it somehow switched the check from ns_capable() to
> > has_ns_capability{_noaudit}(). That means it switched from checking the
> > subjective credentials of the task to using the objective credentials. I
> > couldn't find the original lkml thread and so I don't know why this switch
> > was done. But it seems wrong since ptrace_has_cap() is currently only used
> > in ptrace_may_access(). And it's used to check whether the calling task
> > (subject) has the CAP_SYS_PTRACE capability in the provided user namespace
> > to operate on the target task (object). According to the cred.h comments
> > this would mean the subjective credentials of the calling task need to be
> > used.
> > This switches it to use security_capable() because we only call
> > ptrace_has_cap() in ptrace_may_access() and in there we already have a
> > stable reference to the calling tasks creds under cred_guard_mutex so
> > there's no need to go through another series of dereferences and rcu
> > locking done in ns_capable{_noaudit}().
>
>
> The criu process is started with all capabilities in the root user namespace.
>
> I don't have time to investigate this issue right now, will provide
> more details next Tuesday.
Yeah, we've detected the issue. security_capable() indicates success by
returning 0 for whatever reason whereas has_ns_capability() returns 1.
So the logic was inverted. This is fixed in the new version. Sorry for
the noise!
Thanks!
Christian
On Sat, Jan 18, 2020 at 02:17:01AM +0100, Christian Brauner wrote:
> On Fri, Jan 17, 2020 at 05:08:14PM -0800, Andrei Vagin wrote:
> > On Wed, Jan 15, 2020 at 9:18 AM Christian Brauner
> > <[email protected]> wrote:
> > >
> > > Commit 69f594a38967 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
> > > introduced the ability to opt out of audit messages for accesses to
> > > various proc files since they are not violations of policy.
> > > While doing so it somehow switched the check from ns_capable() to
> > > has_ns_capability{_noaudit}(). That means it switched from checking the
> > > subjective credentials of the task to using the objective credentials. I
> > > couldn't find the original lkml thread and so I don't know why this switch
> > > was done. But it seems wrong since ptrace_has_cap() is currently only used
> > > in ptrace_may_access(). And it's used to check whether the calling task
> > > (subject) has the CAP_SYS_PTRACE capability in the provided user namespace
> > > to operate on the target task (object). According to the cred.h comments
> > > this would mean the subjective credentials of the calling task need to be
> > > used.
> > > This switches it to use security_capable() because we only call
> > > ptrace_has_cap() in ptrace_may_access() and in there we already have a
> > > stable reference to the calling tasks creds under cred_guard_mutex so
> > > there's no need to go through another series of dereferences and rcu
> > > locking done in ns_capable{_noaudit}().
> >
> >
> > The criu process is started with all capabilities in the root user namespace.
> >
> > I don't have time to investigate this issue right now, will provide
> > more details next Tuesday.
>
> Yeah, we've detected the issue. security_capable() indicates success by
> returning 0 for whatever reason whereas has_ns_capability() returns 1.
> So the logic was inverted. This is fixed in the new version. Sorry for
> the noise!
So, I just finished compiling criu and running the test suite on the
criu-dev branch. The test-suite passes fine after the security_capable()
braino in my original patch was corrected to security_capable() == 0:
################## ALL TEST(S) PASSED (TOTAL 178/SKIPPED 16) ###################
Thanks!
Christian
On Sat, Jan 18, 2020 at 4:47 AM Christian Brauner
<[email protected]> wrote:
> > > The criu process is started with all capabilities in the root user namespace.
> > >
> > > I don't have time to investigate this issue right now, will provide
> > > more details next Tuesday.
> >
> > Yeah, we've detected the issue. security_capable() indicates success by
> > returning 0 for whatever reason whereas has_ns_capability() returns 1.
> > So the logic was inverted. This is fixed in the new version. Sorry for
> > the noise!
>
> So, I just finished compiling criu and running the test suite on the
> criu-dev branch. The test-suite passes fine after the security_capable()
> braino in my original patch was corrected to security_capable() == 0:
>
> ################## ALL TEST(S) PASSED (TOTAL 178/SKIPPED 16) ###################
Thank you for doing this! Not all CRIU contributors can run all tests. You rock!
>
> Thanks!
> Christian