2012-05-17 20:00:19

by Kok, Auke

Subject: [PATCH] Trace event for capable().

Add a simple trace event for capable().

There's been a lot of discussion around capable(), and there
are plenty of tools to help reduce capabilities' usage from
userspace. A major gap however is that it's almost impossible
to see or verify which bits are requested from either userspace
or in the kernel.

This patch adds a minimal tracer that will print out which
CAPs are requested and whether the request was granted.

Signed-off-by: Auke Kok <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Serge Hallyn <[email protected]>
Cc: Eric Paris <[email protected]>
---
 include/trace/events/capabilities.h | 33 +++++++++++++++++++++++++++++++++
 kernel/capability.c                 |  5 +++++
 2 files changed, 38 insertions(+)
 create mode 100644 include/trace/events/capabilities.h

diff --git a/include/trace/events/capabilities.h b/include/trace/events/capabilities.h
new file mode 100644
index 0000000..97997fa
--- /dev/null
+++ b/include/trace/events/capabilities.h
@@ -0,0 +1,33 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM capabilities
+
+#if !defined(_TRACE_CAPABILITIES_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_CAPABILITIES_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(capable,
+
+	TP_PROTO(pid_t pid, int cap, bool rc),
+
+	TP_ARGS(pid, cap, rc),
+
+	TP_STRUCT__entry(
+		__field(pid_t, pid)
+		__field(int, cap)
+		__field(bool, rc)
+	),
+
+	TP_fast_assign(
+		__entry->pid = pid;
+		__entry->cap = cap;
+		__entry->rc = rc;
+	),
+
+	TP_printk("pid=%d cap=%d rc=%d", __entry->pid, __entry->cap, __entry->rc)
+);
+
+#endif /* _TRACE_CAPABILITIES_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/kernel/capability.c b/kernel/capability.c
index 3f1adb6..2941f37 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -17,6 +17,9 @@
 #include <linux/user_namespace.h>
 #include <asm/uaccess.h>
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/capabilities.h>
+
 /*
  * Leveraged for setting/resetting capabilities
  */
@@ -386,8 +389,10 @@ bool ns_capable(struct user_namespace *ns, int cap)
 
 	if (security_capable(current_cred(), ns, cap) == 0) {
 		current->flags |= PF_SUPERPRIV;
+		trace_capable(current->pid, cap, true);
 		return true;
 	}
+	trace_capable(current->pid, cap, false);
 	return false;
 }
 EXPORT_SYMBOL(ns_capable);
--
1.7.10


2012-05-18 22:25:15

by Serge Hallyn

Subject: Re: [PATCH] Trace event for capable().

Quoting Auke Kok ([email protected]):
> Add a simple trace event for capable().
>
> There's been a lot of discussion around capable(), and there
> are plenty of tools to help reduce capabilities' usage from
> userspace. A major gap however is that it's almost impossible
> to see or verify which bits are requested from either userspace
> or in the kernel.
>
> This patch adds a minimal tracer that will print out which
> CAPs are requested and whether the request was granted.
>
> Signed-off-by: Auke Kok <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: Serge Hallyn <[email protected]>

Hi,

Is there any measurable performance impact from this patch? (Have you
measured it?)

I'm not familiar enough with the tracing stuff, but if the tracing is
done so there's no impact when not tracing, then I have no problem with
this. It could be quite useful as you say.

Acked-by: Serge Hallyn <[email protected]>

thanks,
-serge

> Cc: Eric Paris <[email protected]>
> ---
> include/trace/events/capabilities.h | 33 +++++++++++++++++++++++++++++++++
> kernel/capability.c | 5 +++++
> 2 files changed, 38 insertions(+)
> create mode 100644 include/trace/events/capabilities.h
>
> diff --git a/include/trace/events/capabilities.h b/include/trace/events/capabilities.h
> new file mode 100644
> index 0000000..97997fa
> --- /dev/null
> +++ b/include/trace/events/capabilities.h
> @@ -0,0 +1,33 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM capabilities
> +
> +#if !defined(_TRACE_CAPABILITIES_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_CAPABILITIES_H
> +
> +#include <linux/tracepoint.h>
> +
> +TRACE_EVENT(capable,
> +
> + TP_PROTO(pid_t pid, int cap, bool rc),
> +
> + TP_ARGS(pid, cap, rc),
> +
> + TP_STRUCT__entry(
> + __field(pid_t, pid)
> + __field(int, cap)
> + __field(bool, rc)
> + ),
> +
> + TP_fast_assign(
> + __entry->pid = pid;
> + __entry->cap = cap;
> + __entry->rc = rc;
> + ),
> +
> + TP_printk("pid=%d cap=%d rc=%d", __entry->pid, __entry->cap, __entry->rc)
> +);
> +
> +#endif /* _TRACE_CAPABILITIES_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> diff --git a/kernel/capability.c b/kernel/capability.c
> index 3f1adb6..2941f37 100644
> --- a/kernel/capability.c
> +++ b/kernel/capability.c
> @@ -17,6 +17,9 @@
> #include <linux/user_namespace.h>
> #include <asm/uaccess.h>
>
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/capabilities.h>
> +
> /*
> * Leveraged for setting/resetting capabilities
> */
> @@ -386,8 +389,10 @@ bool ns_capable(struct user_namespace *ns, int cap)
>
> if (security_capable(current_cred(), ns, cap) == 0) {
> current->flags |= PF_SUPERPRIV;
> + trace_capable(current->pid, cap, true);
> return true;
> }
> + trace_capable(current->pid, cap, false);
> return false;
> }
> EXPORT_SYMBOL(ns_capable);
> --
> 1.7.10
>

2012-05-18 22:33:08

by Richard Weinberger

Subject: Re: [PATCH] Trace event for capable().

On Thu, May 17, 2012 at 9:50 PM, Auke Kok <[email protected]> wrote:
> Add a simple trace event for capable().
>
> There's been a lot of discussion around capable(), and there
> are plenty of tools to help reduce capabilities' usage from
> userspace. A major gap however is that it's almost impossible
> to see or verify which bits are requested from either userspace
> or in the kernel.
>
> This patch adds a minimal tracer that will print out which
> CAPs are requested and whether the request was granted.

Can we please have support for user namespaces?
At least indicate whether the current namespace is init_user_ns or not.

--
Thanks,
//richard

2012-05-18 23:09:23

by Kok, Auke

Subject: Re: [PATCH] Trace event for capable().

On Fri, May 18, 2012 at 3:33 PM, richard -rw- weinberger
<[email protected]> wrote:
>
> On Thu, May 17, 2012 at 9:50 PM, Auke Kok <[email protected]>
> wrote:
> > Add a simple trace event for capable().
> >
> > There's been a lot of discussion around capable(), and there
> > are plenty of tools to help reduce capabilities' usage from
> > userspace. A major gap however is that it's almost impossible
> > to see or verify which bits are requested from either userspace
> > or in the kernel.
> >
> > This patch adds a minimal tracer that will print out which
> > CAPs are requested and whether the request was granted.
>
> Can we please have support for user namespaces?
> At least indicate whether the current namespace is init_user_ns or not.

that was the main reason for sending this out already - that should be
trivial to add to the trace event, but I haven't looked at namespaces
yet myself. I'll check it out.

Auke

2012-05-18 23:11:14

by Kok, Auke

Subject: Re: [PATCH] Trace event for capable().

On Fri, May 18, 2012 at 3:25 PM, Serge Hallyn
<[email protected]> wrote:
> Quoting Auke Kok ([email protected]):
>> Add a simple trace event for capable().
>>
>> There's been a lot of discussion around capable(), and there
>> are plenty of tools to help reduce capabilities' usage from
>> userspace. A major gap however is that it's almost impossible
>> to see or verify which bits are requested from either userspace
>> or in the kernel.
>>
>> This patch adds a minimal tracer that will print out which
>> CAPs are requested and whether the request was granted.
>>
>> Signed-off-by: Auke Kok <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: Serge Hallyn <[email protected]>
>
> Hi,
>
> is there any measurable performance impact by this patch? (Have you
> measured it?)

I actually did a full OS boot test and didn't see a noticeable
difference: I booted with init=/bin/bash, mounted debugfs, started the
tracer and then ran exec /sbin/init. It's anecdotal, but given how this
tracer is meant to be used, that should be more than acceptable.

Of course, there is a small penalty (a nop, ~1 cycle) at the tracepoint
even if tracing is disabled, but this code path should never be a hot
path in any case, since any call to capable() ends up in LSM checks and
audit checks anyway.

> I'm not familiar enough with the tracing stuff, but if the tracing is
> done so there's no impact when not tracing, then I have no problem with
> this. It could be quite useful as you say.
>
> Acked-by: Serge Hallyn <[email protected]>

Thanks. I might send a new patch if I can add the namespace info
as per the other comment.

Auke

2012-05-18 23:19:53

by Serge E. Hallyn

Subject: Re: [PATCH] Trace event for capable().


----- Original message -----
> On Fri, May 18, 2012 at 3:33 PM, richard -rw- weinberger
> <[email protected]> wrote:
> >
> > On Thu, May 17, 2012 at 9:50 PM, Auke Kok <[email protected]>
> > wrote:
> > > Add a simple trace event for capable().
> > >
> > > There's been a lot of discussion around capable(), and there
> > > are plenty of tools to help reduce capabilities' usage from
> > > userspace. A major gap however is that it's almost impossible
> > > to see or verify which bits are requested from either userspace
> > > or in the kernel.
> > >
> > > This patch adds a minimal tracer that will print out which
> > > CAPs are requested and whether the request was granted.
> >
> > Can we please have support for user namespaces?
> > At least indicate whether the current namespace is init_user_ns or not.
>
> that was the main reason for sending this out already - that should be
> trivial to add to the trace event, but I haven't looked at namespaces
> yet myself. I'll check it out.
>

Right, it is trivial to add, but either go through linux-next or wait for
Eric's patchset to move from there to Linus' tree. Print
from_kuid(&init_user_ns, current_uid()), and if the task is not in
init_user_ns, also print the ns creator and the task's uid in its own ns.

I don't think you need to do that right now.

> Auke

2012-05-19 06:59:34

by Eric W. Biederman

Subject: Re: [PATCH] Trace event for capable().

Auke Kok <[email protected]> writes:

> Add a simple trace event for capable().
>
> There's been a lot of discussion around capable(), and there
> are plenty of tools to help reduce capabilities' usage from
> userspace. A major gap however is that it's almost impossible
> to see or verify which bits are requested from either userspace
> or in the kernel.
>
> This patch adds a minimal tracer that will print out which
> CAPs are requested and whether the request was granted.

A small comment aside from the other issues.

current->pid for anything going to userspace is broken,
and in fact current->pid should be killed one of these days.

Which pid namespace is your tracer running in?

> Signed-off-by: Auke Kok <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: Serge Hallyn <[email protected]>
> Cc: Eric Paris <[email protected]>
> ---
> include/trace/events/capabilities.h | 33 +++++++++++++++++++++++++++++++++
> kernel/capability.c | 5 +++++
> 2 files changed, 38 insertions(+)
> create mode 100644 include/trace/events/capabilities.h
>
> diff --git a/include/trace/events/capabilities.h b/include/trace/events/capabilities.h
> new file mode 100644
> index 0000000..97997fa
> --- /dev/null
> +++ b/include/trace/events/capabilities.h
> @@ -0,0 +1,33 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM capabilities
> +
> +#if !defined(_TRACE_CAPABILITIES_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_CAPABILITIES_H
> +
> +#include <linux/tracepoint.h>
> +
> +TRACE_EVENT(capable,
> +
> + TP_PROTO(pid_t pid, int cap, bool rc),
> +
> + TP_ARGS(pid, cap, rc),
> +
> + TP_STRUCT__entry(
> + __field(pid_t, pid)
> + __field(int, cap)
> + __field(bool, rc)
> + ),
> +
> + TP_fast_assign(
> + __entry->pid = pid;
> + __entry->cap = cap;
> + __entry->rc = rc;
> + ),
> +
> + TP_printk("pid=%d cap=%d rc=%d", __entry->pid, __entry->cap, __entry->rc)
> +);
> +
> +#endif /* _TRACE_CAPABILITIES_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> diff --git a/kernel/capability.c b/kernel/capability.c
> index 3f1adb6..2941f37 100644
> --- a/kernel/capability.c
> +++ b/kernel/capability.c
> @@ -17,6 +17,9 @@
> #include <linux/user_namespace.h>
> #include <asm/uaccess.h>
>
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/capabilities.h>
> +
> /*
> * Leveraged for setting/resetting capabilities
> */
> @@ -386,8 +389,10 @@ bool ns_capable(struct user_namespace *ns, int cap)
>
> if (security_capable(current_cred(), ns, cap) == 0) {
> current->flags |= PF_SUPERPRIV;
> + trace_capable(current->pid, cap, true);
> return true;
> }
> + trace_capable(current->pid, cap, false);
> return false;
> }
> EXPORT_SYMBOL(ns_capable);

2012-05-19 18:39:19

by Kok, Auke

Subject: Re: [PATCH] Trace event for capable().

On Fri, May 18, 2012 at 11:59 PM, Eric W. Biederman
<[email protected]> wrote:
> Auke Kok <[email protected]> writes:
>
> > Add a simple trace event for capable().
> >
> > There's been a lot of discussion around capable(), and there
> > are plenty of tools to help reduce capabilities' usage from
> > userspace. A major gap however is that it's almost impossible
> > to see or verify which bits are requested from either userspace
> > or in the kernel.
> >
> > This patch adds a minimal tracer that will print out which
> > CAPs are requested and whether the request was granted.
>
> A small comment aside from the other issues.
>
> current->pid for anything going to userspace is broken,
> and in fact current->pid should be killed one of these days.
>
> Which pid namespace is your tracer running in?

init - I currently have no need for namespaces myself at all, and, as I replied
to Serge and Eric already, I'll see if I can fix up the tracer with their
suggestions.

Auke

2012-05-20 13:07:48

by Serge E. Hallyn

Subject: Re: [PATCH] Trace event for capable().

Quoting Serge Hallyn ([email protected]):
>
> ----- Original message -----
> > On Fri, May 18, 2012 at 3:33 PM, richard -rw- weinberger
> > <[email protected]> wrote:
> > >
> > > On Thu, May 17, 2012 at 9:50 PM, Auke Kok <[email protected]>
> > > wrote:
> > > > Add a simple trace event for capable().
> > > >
> > > > There's been a lot of discussion around capable(), and there
> > > > are plenty of tools to help reduce capabilities' usage from
> > > > userspace. A major gap however is that it's almost impossible
> > > > to see or verify which bits are requested from either userspace
> > > > or in the kernel.
> > > >
> > > > This patch adds a minimal tracer that will print out which
> > > > CAPs are requested and whether the request was granted.
> > >
> > > Can we please have support for user namespaces?
> > > At least indicate whether the current namespace is init_user_ns or not.
> >
> > that was the main reason for sending this out already - that should be
> > trivial to add to the trace event, but I haven't looked at namespaces
> > yet myself. I'll check it out.
> >
>
> right, trivial to add, but either go through linux-next or wait for Eric's
> patchset to move from there to Linus' tree. Print the
> from_kuid(&init_user_ns, current_uid()), and if not in init_user_ns then also
> print the ns creator and task uid in his own ns.
>
> I don't think you need to do that right now.

Oh, you'll also need to add the uid (in init_user_ns) of the owner of
the target namespace ('ns'). Otherwise, the admin may freak out seeing
uid 500 got cap_sys_admin :)

-serge

2012-05-22 00:04:04

by Eric W. Biederman

Subject: Re: [PATCH] Trace event for capable().

"Kok, Auke-jan H" <[email protected]> writes:

> On Fri, May 18, 2012 at 11:59 PM, Eric W. Biederman
> <[email protected]> wrote:
>> Auke Kok <[email protected]> writes:
>>
>> > Add a simple trace event for capable().
>> >
>> > There's been a lot of discussion around capable(), and there
>> > are plenty of tools to help reduce capabilities' usage from
>> > userspace. A major gap however is that it's almost impossible
>> > to see or verify which bits are requested from either userspace
>> > or in the kernel.
>> >
>> > This patch adds a minimal tracer that will print out which
>> > CAPs are requested and whether the request was granted.
>>
>> A small comment aside from the other issues.
>>
>> current->pid for anything going to userspace is broken,
>> and in fact current->pid should be killed one of these days.
>>
>> Which pid namespace is your tracer running in?
>
> init - I currently have no need for namespaces myself at all, and, as I replied
> to Serge and Eric already - I'll see if I can fix up the tracer with their
> suggestions.

Thanks.

A quick read of perf_event_open shows that the syscall is allowed for
unprivileged users in any context if sysctl_perf_event_paranoid is
set properly. So it looks like the perf code needs to handle namespaces
properly if we are going to be reporting namespaced values like uids
and pids. Even if no one cares today, what about next week?

Eric

2012-05-22 02:17:11

by Kok, Auke

Subject: Re: [PATCH] Trace event for capable().

On Tue, May 22, 2012 at 12:03 AM, Eric W. Biederman
<[email protected]> wrote:
> "Kok, Auke-jan H" <[email protected]> writes:
>
>> On Fri, May 18, 2012 at 11:59 PM, Eric W. Biederman
>> <[email protected]> wrote:
>>> Auke Kok <[email protected]> writes:
>>>
>>> > Add a simple trace event for capable().
>>> >
>>> > There's been a lot of discussion around capable(), and there
>>> > are plenty of tools to help reduce capabilities' usage from
>>> > userspace. A major gap however is that it's almost impossible
>>> > to see or verify which bits are requested from either userspace
>>> > or in the kernel.
>>> >
>>> > This patch adds a minimal tracer that will print out which
>>> > CAPs are requested and whether the request was granted.
>>>
>>> A small comment aside from the other issues.
>>>
>>> current->pid for anything going to userspace is broken,
>>> and in fact current->pid should be killed one of these days.
>>>
>>> Which pid namespace is your tracer running in?
>>
>> init - I currently have no need for namespaces myself at all, and, as I replied
>> to Serge and Eric already - I'll see if I can fix up the tracer with their
>> suggestions.
>
> Thanks.
>
> A quick read of perf_event_open shows that the syscall is allowed by
> unprivileged users in any context if sysctl_perf_event_paranoid is
> set properly. So it looks like the perf code needs to handle namespaces
> properly if we are going to be reporting namespaced values like uid
> and pids. Even if no one cares today what about next week?

I thought about dropping current->pid from this patch entirely and referring
to the comm+pid combination that is already printed in the event log.

That avoids the issue entirely on one side, and defers the problem with
namespaces entirely to the perf code, where it should probably be solved
in a consistent way anyway.

Would that be an appropriate solution, or do you think it's preferable to
fix the trace event to properly annotate namespaces? I'm slanting towards
just dropping current->pid from this patch for now.

Auke

>
> Eric

2012-05-22 14:50:23

by Eric W. Biederman

Subject: Re: [PATCH] Trace event for capable().

"Kok, Auke-jan H" <[email protected]> writes:

> On Tue, May 22, 2012 at 12:03 AM, Eric W. Biederman
> <[email protected]> wrote:
>> "Kok, Auke-jan H" <[email protected]> writes:
>>
>>> On Fri, May 18, 2012 at 11:59 PM, Eric W. Biederman
>>> <[email protected]> wrote:
>>>> Auke Kok <[email protected]> writes:
>>>>
>>>> > Add a simple trace event for capable().
>>>> >
>>>> > There's been a lot of discussion around capable(), and there
>>>> > are plenty of tools to help reduce capabilities' usage from
>>>> > userspace. A major gap however is that it's almost impossible
>>>> > to see or verify which bits are requested from either userspace
>>>> > or in the kernel.
>>>> >
>>>> > This patch adds a minimal tracer that will print out which
>>>> > CAPs are requested and whether the request was granted.
>>>>
>>>> A small comment aside from the other issues.
>>>>
>>>> current->pid for anything going to userspace is broken,
>>>> and in fact current->pid should be killed one of these days.
>>>>
>>>> Which pid namespace is your tracer running in?
>>>
>>> init - I currently have no need for namespaces myself at all, and, as I replied
>>> to Serge and Eric already - I'll see if I can fix up the tracer with their
>>> suggestions.
>>
>> Thanks.
>>
>> A quick read of perf_event_open shows that the syscall is allowed by
>> unprivileged users in any context if sysctl_perf_event_paranoid is
>> set properly. So it looks like the perf code needs to handle namespaces
>> properly if we are going to be reporting namespaced values like uid
>> and pids. Even if no one cares today what about next week?
>
> I thought about dropping current->pid from this patch entirely and referring
> to the comm+pid combination that is already printed in the event log.
>
> That avoids the issue entirely on one side, and defers the problem with
> namespaces entirely to the perf code, where it should probably be solved
> in a consistent way anyway.

Yes. There is at least one other place where we need to solve this.

> Would that be an appropriate solution, or do you think it's preferable to
> fix the trace event to properly annotate namespaces? I'm slanting towards
> just dropping current->pid from this patch for now.

I'm not quite certain in the context of perf, both because it is high
speed and because I don't understand perf very well.

The usual idiom is to hold a struct pid * in the data structure and then
to translate when the value is read out to userspace. If we know who
the userspace reader is going to be at the time we log the message we
can translate when we log the information.

High speed probably conflicts with reference counts.

Eric