2020-01-20 11:20:07

by Alexey Budankov

Subject: [PATCH v5 0/10] Introduce CAP_PERFMON to secure system performance monitoring and observability


Currently access to perf_events, i915_perf and other performance monitoring and
observability subsystems of the kernel is open for a privileged process [1] with
CAP_SYS_ADMIN capability enabled in the process effective set [2].

This patch set introduces CAP_PERFMON capability designed to secure system
performance monitoring and observability operations so that CAP_PERFMON would
assist CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
and other performance monitoring and observability subsystems of the kernel.

CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to system
performance monitoring and observability operations and balance amount of
CAP_SYS_ADMIN credentials following the recommendations in the capabilities man
page [2] for CAP_SYS_ADMIN: "Note: this capability is overloaded; see Notes to
kernel developers, below."

CAP_PERFMON intends to harden system security and integrity during system
performance monitoring and observability operations by decreasing attack surface
that is available to a CAP_SYS_ADMIN privileged process [2]. Providing the access
to system performance monitoring and observability operations under CAP_PERFMON
capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances
to misuse the credentials and makes the operation more secure.

For backward compatibility reasons access to system performance monitoring and
observability subsystems of the kernel remains open for CAP_SYS_ADMIN privileged
processes but CAP_SYS_ADMIN capability usage for secure system performance
monitoring and observability operations is discouraged with respect to the
designed CAP_PERFMON capability.

CAP_PERFMON intends to meet the demand to secure system performance monitoring
and observability operations in security sensitive, restricted, multiuser production
environments (e.g. HPC clusters, cloud and virtual compute environments) where
root or CAP_SYS_ADMIN credentials are not available to mass users of a system
because of security considerations.

A possible alternative solution to this capability balancing and system security
hardening task could be to use the existing CAP_SYS_PTRACE capability to govern
system performance monitoring and observability operations. However, CAP_SYS_PTRACE
still provides users with more credentials than are required for secure
performance monitoring and observability operations, and the CAP_PERFMON
capability is designed to avoid this excess.

Although software running under CAP_PERFMON cannot ensure avoidance of
related hardware issues, the software can still mitigate those issues by following
the official embargoed hardware issues mitigation procedure [3]. Bugs in
the software itself can be fixed by following the standard kernel development
process [4] to maintain and harden the security of system performance monitoring
and observability operations. Finally, the patch set is structured in a way
that simplifies, as much as possible, the procedure for backtracking possible
issues and bugs [5].

The patch set is for tip perf/core repository:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip perf/core
sha1: 5738891229a25e9e678122a843cbf0466a456d0c

---
Changes in v5:
- renamed CAP_SYS_PERFMON to CAP_PERFMON
- extended perfmon_capable() with noaudit checks
Changes in v4:
- converted perfmon_capable() into an inline function
- made perf_events kprobes, uprobes, hw breakpoints and namespaces data available
to CAP_SYS_PERFMON privileged processes
- applied perfmon_capable() to drivers/perf and drivers/oprofile
- extended __cmd_ftrace() with support of CAP_SYS_PERFMON
Changes in v3:
- implemented perfmon_capable() macros aggregating required capabilities checks
Changes in v2:
- made perf_events trace points available to CAP_SYS_PERFMON privileged processes
- made perf_event_paranoid_check() treat CAP_SYS_PERFMON equally to CAP_SYS_ADMIN
- applied CAP_SYS_PERFMON to i915_perf, bpf_trace, powerpc and parisc system
performance monitoring and observability related subsystems

---
Alexey Budankov (10):
capabilities: introduce CAP_PERFMON to kernel and user space
perf/core: open access to the core for CAP_PERFMON privileged process
perf/core: open access to anon probes for CAP_PERFMON privileged process
perf tool: extend Perf tool with CAP_PERFMON capability support
drm/i915/perf: open access for CAP_PERFMON privileged process
trace/bpf_trace: open access for CAP_PERFMON privileged process
powerpc/perf: open access for CAP_PERFMON privileged process
parisc/perf: open access for CAP_PERFMON privileged process
drivers/perf: open access for CAP_PERFMON privileged process
drivers/oprofile: open access for CAP_PERFMON privileged process

arch/parisc/kernel/perf.c | 2 +-
arch/powerpc/perf/imc-pmu.c | 4 ++--
drivers/gpu/drm/i915/i915_perf.c | 13 ++++++-------
drivers/oprofile/event_buffer.c | 2 +-
drivers/perf/arm_spe_pmu.c | 4 ++--
include/linux/capability.h | 12 ++++++++++++
include/linux/perf_event.h | 6 +++---
include/uapi/linux/capability.h | 8 +++++++-
kernel/events/core.c | 6 +++---
kernel/trace/bpf_trace.c | 2 +-
security/selinux/include/classmap.h | 4 ++--
tools/perf/builtin-ftrace.c | 5 +++--
tools/perf/design.txt | 3 ++-
tools/perf/util/cap.h | 4 ++++
tools/perf/util/evsel.c | 10 +++++-----
tools/perf/util/util.c | 1 +
16 files changed, 55 insertions(+), 31 deletions(-)

---
Testing and validation (Intel Skylake, 8 cores, Fedora 29, 5.5.0-rc3+, x86_64):

The libcap library [6], [7], [8] and the Perf tool can be used to apply the
CAP_PERFMON capability for secure system performance monitoring and observability
beyond the scope permitted by the system wide perf_event_paranoid kernel
setting [9]. The steps for evaluation are below:

- patch, build and boot the kernel
- patch, build Perf tool e.g. to /home/user/perf
...
# git clone git://git.kernel.org/pub/scm/libs/libcap/libcap.git libcap
# pushd libcap
# patch libcap/include/uapi/linux/capability.h with [PATCH 1]
# make
# pushd progs
# ./setcap "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" /home/user/perf
# ./setcap -v "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" /home/user/perf
/home/user/perf: OK
# ./getcap /home/user/perf
/home/user/perf = cap_sys_ptrace,cap_syslog,cap_perfmon+ep
# echo 2 > /proc/sys/kernel/perf_event_paranoid
# cat /proc/sys/kernel/perf_event_paranoid
2
...
$ /home/user/perf top
... works as expected ...
$ cat /proc/`pidof perf`/status
Name: perf
Umask: 0002
State: S (sleeping)
Tgid: 2958
Ngid: 0
Pid: 2958
PPid: 9847
TracerPid: 0
Uid: 500 500 500 500
Gid: 500 500 500 500
FDSize: 256
...
CapInh: 0000000000000000
CapPrm: 0000004400080000
CapEff: 0000004400080000 => 01000100 00000000 00001000 00000000 00000000
cap_perfmon,cap_sys_ptrace,cap_syslog
CapBnd: 0000007fffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 0
Speculation_Store_Bypass: thread vulnerable
Cpus_allowed: ff
Cpus_allowed_list: 0-7
...

Usage of cap_perfmon effectively avoids unused credentials excess:

- with cap_sys_admin:
CapEff: 0000007fffffffff => 01111111 11111111 11111111 11111111 11111111

- with cap_perfmon:
CapEff: 0000004400080000 => 01000100 00000000 00001000 00000000 00000000
bits set: 38 (perfmon), 34 (syslog), 19 (sys_ptrace)
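
The CapEff decoding above can be checked with a small user-space sketch. The
helper names cap_mask_bits() and perf_caps_mask() are hypothetical and exist
only for this illustration. The capability numbers 19 and 34 are the existing
values and 38 is the value this patch set proposes for CAP_PERFMON.

```c
#include <stddef.h>

/* 19 and 34 are existing capability numbers; 38 is proposed by this series. */
#define CAP_SYS_PTRACE 19
#define CAP_SYSLOG     34
#define CAP_PERFMON    38

/*
 * Hypothetical helper: store the index of every set bit of a CapEff-style
 * mask into bits[] (lowest bit first) and return how many bits were stored.
 */
size_t cap_mask_bits(unsigned long long mask, int *bits, size_t max)
{
	size_t n = 0;

	for (int bit = 0; bit < 64 && n < max; bit++) {
		if (mask & (1ULL << bit))
			bits[n++] = bit;
	}
	return n;
}

/* Hypothetical helper: mask for the three capabilities granted via setcap. */
unsigned long long perf_caps_mask(void)
{
	return (1ULL << CAP_PERFMON) | (1ULL << CAP_SYSLOG) |
	       (1ULL << CAP_SYS_PTRACE);
}
```

Passing 0x0000004400080000 to cap_mask_bits() yields bits 19, 34 and 38, and
perf_caps_mask() returns the same value as the CapEff line.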

---
[1] https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
[2] http://man7.org/linux/man-pages/man7/capabilities.7.html
[3] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
[4] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
[5] https://www.kernel.org/doc/html/latest/process/management-style.html#decisions
[6] http://man7.org/linux/man-pages/man8/setcap.8.html
[7] https://git.kernel.org/pub/scm/libs/libcap/libcap.git
[8] https://sites.google.com/site/fullycapable/, posix_1003.1e-990310.pdf
[9] http://man7.org/linux/man-pages/man2/perf_event_open.2.html

--
2.20.1


2020-01-20 11:24:22

by Alexey Budankov

Subject: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space


Introduce CAP_PERFMON capability designed to secure system performance
monitoring and observability operations so that CAP_PERFMON would assist
CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
and other performance monitoring and observability subsystems.

CAP_PERFMON intends to harden system security and integrity during system
performance monitoring and observability operations by decreasing attack
surface that is available to a CAP_SYS_ADMIN privileged process [1].
Providing access to system performance monitoring and observability
operations under CAP_PERFMON capability singly, without the rest of
CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
makes operation more secure.

CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
system performance monitoring and observability operations and balance
amount of CAP_SYS_ADMIN credentials following the recommendations in the
capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
overloaded; see Notes to kernel developers, below."

Although the software running under CAP_PERFMON can not ensure avoidance
of related hardware issues, the software can still mitigate these issues
following the official embargoed hardware issues mitigation procedure [2].
The bugs in the software itself could be fixed following the standard
kernel development process [3] to maintain and harden security of system
performance monitoring and observability operations.

[1] http://man7.org/linux/man-pages/man7/capabilities.7.html
[2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
[3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html

Signed-off-by: Alexey Budankov <[email protected]>
---
include/linux/capability.h | 12 ++++++++++++
include/uapi/linux/capability.h | 8 +++++++-
security/selinux/include/classmap.h | 4 ++--
3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/include/linux/capability.h b/include/linux/capability.h
index ecce0f43c73a..8784969d91e1 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -251,6 +251,18 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
+static inline bool perfmon_capable(void)
+{
+ struct user_namespace *ns = &init_user_ns;
+
+ if (ns_capable_noaudit(ns, CAP_PERFMON))
+ return ns_capable(ns, CAP_PERFMON);
+
+ if (ns_capable_noaudit(ns, CAP_SYS_ADMIN))
+ return ns_capable(ns, CAP_SYS_ADMIN);
+
+ return false;
+}

/* audit system wants to get cap info from files as well */
extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps);
diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index 240fdb9a60f6..8b416e5f3afa 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -366,8 +366,14 @@ struct vfs_ns_cap_data {

#define CAP_AUDIT_READ 37

+/*
+ * Allow system performance and observability privileged operations
+ * using perf_events, i915_perf and other kernel subsystems
+ */
+
+#define CAP_PERFMON 38

-#define CAP_LAST_CAP CAP_AUDIT_READ
+#define CAP_LAST_CAP CAP_PERFMON

#define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)

diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 7db24855e12d..c599b0c2b0e7 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -27,9 +27,9 @@
"audit_control", "setfcap"

#define COMMON_CAP2_PERMS "mac_override", "mac_admin", "syslog", \
- "wake_alarm", "block_suspend", "audit_read"
+ "wake_alarm", "block_suspend", "audit_read", "perfmon"

-#if CAP_LAST_CAP > CAP_AUDIT_READ
+#if CAP_LAST_CAP > CAP_PERFMON
#error New capability defined, please update COMMON_CAP2_PERMS.
#endif

--
2.20.1


2020-01-20 11:26:24

by Alexey Budankov

Subject: [PATCH v5 02/10] perf/core: open access to the core for CAP_PERFMON privileged process


Open access to monitoring of kernel code, system, tracepoints and namespaces
data for a CAP_PERFMON privileged process. For backward compatibility
reasons access to perf_events subsystem remains open for CAP_SYS_ADMIN
privileged processes but CAP_SYS_ADMIN usage for secure perf_events
monitoring is discouraged with respect to CAP_PERFMON capability.
Providing the access under CAP_PERFMON capability singly, without the rest
of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials
and makes operation more secure.

Signed-off-by: Alexey Budankov <[email protected]>
---
include/linux/perf_event.h | 6 +++---
kernel/events/core.c | 2 +-
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 6d4c22aee384..730469babcc2 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1285,7 +1285,7 @@ static inline int perf_is_paranoid(void)

static inline int perf_allow_kernel(struct perf_event_attr *attr)
{
- if (sysctl_perf_event_paranoid > 1 && !capable(CAP_SYS_ADMIN))
+ if (sysctl_perf_event_paranoid > 1 && !perfmon_capable())
return -EACCES;

return security_perf_event_open(attr, PERF_SECURITY_KERNEL);
@@ -1293,7 +1293,7 @@ static inline int perf_allow_kernel(struct perf_event_attr *attr)

static inline int perf_allow_cpu(struct perf_event_attr *attr)
{
- if (sysctl_perf_event_paranoid > 0 && !capable(CAP_SYS_ADMIN))
+ if (sysctl_perf_event_paranoid > 0 && !perfmon_capable())
return -EACCES;

return security_perf_event_open(attr, PERF_SECURITY_CPU);
@@ -1301,7 +1301,7 @@ static inline int perf_allow_cpu(struct perf_event_attr *attr)

static inline int perf_allow_tracepoint(struct perf_event_attr *attr)
{
- if (sysctl_perf_event_paranoid > -1 && !capable(CAP_SYS_ADMIN))
+ if (sysctl_perf_event_paranoid > -1 && !perfmon_capable())
return -EPERM;

return security_perf_event_open(attr, PERF_SECURITY_TRACEPOINT);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index a1f8bde19b56..b1fcbbe24849 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11186,7 +11186,7 @@ SYSCALL_DEFINE5(perf_event_open,
}

if (attr.namespaces) {
- if (!capable(CAP_SYS_ADMIN))
+ if (!perfmon_capable())
return -EACCES;
}

--
2.20.1
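
The three perf_event_paranoid checks changed in include/linux/perf_event.h
above can be summarized in a user-space model. This is a sketch, not kernel
code: model_perf_allow() and enum perf_scope are hypothetical names, the
'privileged' argument stands in for the result of perfmon_capable(), and the
subsequent security_perf_event_open() LSM hook is left out.

```c
#include <stdbool.h>
#include <errno.h>

/* Hypothetical labels for the three checks touched by the patch. */
enum perf_scope { PERF_SCOPE_TRACEPOINT, PERF_SCOPE_CPU, PERF_SCOPE_KERNEL };

/*
 * Model of perf_allow_tracepoint(), perf_allow_cpu() and perf_allow_kernel()
 * after the patch. Returns 0 when the paranoid check passes, or the same
 * negative errno the corresponding kernel helper returns.
 */
int model_perf_allow(enum perf_scope scope, int paranoid, bool privileged)
{
	switch (scope) {
	case PERF_SCOPE_KERNEL:
		return (paranoid > 1 && !privileged) ? -EACCES : 0;
	case PERF_SCOPE_CPU:
		return (paranoid > 0 && !privileged) ? -EACCES : 0;
	case PERF_SCOPE_TRACEPOINT:
		return (paranoid > -1 && !privileged) ? -EPERM : 0;
	}
	return -EINVAL; /* unknown scope, only reachable with a bad argument */
}
```

With perf_event_paranoid set to 2, as in the cover letter's test setup, the
model denies kernel profiling to an unprivileged caller and permits it once
'privileged' is true, which corresponds to a process holding CAP_PERFMON or
CAP_SYS_ADMIN.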

2020-01-20 11:27:57

by Alexey Budankov

Subject: [PATCH v5 03/10] perf/core: open access to anon probes for CAP_PERFMON privileged process


Open access to anon kprobes, uprobes and eBPF tracing for CAP_PERFMON
privileged processes. For backward compatibility reasons access remains
open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for
secure monitoring is discouraged with respect to CAP_PERFMON capability.
Providing the access under CAP_PERFMON capability singly, without the
rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the
credentials and makes operations more secure.

Anon kprobes and uprobes are used by ftrace and eBPF. perf probe uses
ftrace to define new kprobe events, and those events are treated as
tracepoint events. eBPF defines new probes via perf_event_open syscall
and then the probes are used in eBPF tracing.

Signed-off-by: Alexey Budankov <[email protected]>
---
kernel/events/core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index b1fcbbe24849..8a6c0b08451d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9088,7 +9088,7 @@ static int perf_kprobe_event_init(struct perf_event *event)
if (event->attr.type != perf_kprobe.type)
return -ENOENT;

- if (!capable(CAP_SYS_ADMIN))
+ if (!perfmon_capable())
return -EACCES;

/*
@@ -9148,7 +9148,7 @@ static int perf_uprobe_event_init(struct perf_event *event)
if (event->attr.type != perf_uprobe.type)
return -ENOENT;

- if (!capable(CAP_SYS_ADMIN))
+ if (!perfmon_capable())
return -EACCES;

/*
--
2.20.1

2020-01-20 11:29:25

by Alexey Budankov

Subject: [PATCH v5 04/10] perf tool: extend Perf tool with CAP_PERFMON capability support


Extend error messages to mention CAP_PERFMON capability as an option
to substitute CAP_SYS_ADMIN capability for secure system performance
monitoring and observability operations. Make perf_event_paranoid_check()
and __cmd_ftrace() aware of the CAP_PERFMON capability.

Signed-off-by: Alexey Budankov <[email protected]>
---
tools/perf/builtin-ftrace.c | 5 +++--
tools/perf/design.txt | 3 ++-
tools/perf/util/cap.h | 4 ++++
tools/perf/util/evsel.c | 10 +++++-----
tools/perf/util/util.c | 1 +
5 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index d5adc417a4ca..55eda54240fb 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -284,10 +284,11 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)
.events = POLLIN,
};

- if (!perf_cap__capable(CAP_SYS_ADMIN)) {
+ if (!(perf_cap__capable(CAP_PERFMON) ||
+ perf_cap__capable(CAP_SYS_ADMIN))) {
pr_err("ftrace only works for %s!\n",
#ifdef HAVE_LIBCAP_SUPPORT
- "users with the SYS_ADMIN capability"
+ "users with the CAP_PERFMON or CAP_SYS_ADMIN capability"
#else
"root"
#endif
diff --git a/tools/perf/design.txt b/tools/perf/design.txt
index 0453ba26cdbd..a42fab308ff6 100644
--- a/tools/perf/design.txt
+++ b/tools/perf/design.txt
@@ -258,7 +258,8 @@ gets schedule to. Per task counters can be created by any user, for
their own tasks.

A 'pid == -1' and 'cpu == x' counter is a per CPU counter that counts
-all events on CPU-x. Per CPU counters need CAP_SYS_ADMIN privilege.
+all events on CPU-x. Per CPU counters need CAP_PERFMON or CAP_SYS_ADMIN
+privilege.

The 'flags' parameter is currently unused and must be zero.

diff --git a/tools/perf/util/cap.h b/tools/perf/util/cap.h
index 051dc590ceee..ae52878c0b2e 100644
--- a/tools/perf/util/cap.h
+++ b/tools/perf/util/cap.h
@@ -29,4 +29,8 @@ static inline bool perf_cap__capable(int cap __maybe_unused)
#define CAP_SYSLOG 34
#endif

+#ifndef CAP_PERFMON
+#define CAP_PERFMON 38
+#endif
+
#endif /* __PERF_CAP_H */
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index a69e64236120..a35f17723dd3 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2491,14 +2491,14 @@ int perf_evsel__open_strerror(struct evsel *evsel, struct target *target,
"You may not have permission to collect %sstats.\n\n"
"Consider tweaking /proc/sys/kernel/perf_event_paranoid,\n"
"which controls use of the performance events system by\n"
- "unprivileged users (without CAP_SYS_ADMIN).\n\n"
+ "unprivileged users (without CAP_PERFMON or CAP_SYS_ADMIN).\n\n"
"The current value is %d:\n\n"
" -1: Allow use of (almost) all events by all users\n"
" Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK\n"
- ">= 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN\n"
- " Disallow raw tracepoint access by users without CAP_SYS_ADMIN\n"
- ">= 1: Disallow CPU event access by users without CAP_SYS_ADMIN\n"
- ">= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN\n\n"
+ ">= 0: Disallow ftrace function tracepoint by users without CAP_PERFMON or CAP_SYS_ADMIN\n"
+ " Disallow raw tracepoint access by users without CAP_PERFMON or CAP_SYS_ADMIN\n"
+ ">= 1: Disallow CPU event access by users without CAP_PERFMON or CAP_SYS_ADMIN\n"
+ ">= 2: Disallow kernel profiling by users without CAP_PERFMON or CAP_SYS_ADMIN\n\n"
"To make this setting permanent, edit /etc/sysctl.conf too, e.g.:\n\n"
" kernel.perf_event_paranoid = -1\n" ,
target->system_wide ? "system-wide " : "",
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 969ae560dad9..51cf3071db74 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -272,6 +272,7 @@ int perf_event_paranoid(void)
bool perf_event_paranoid_check(int max_level)
{
return perf_cap__capable(CAP_SYS_ADMIN) ||
+ perf_cap__capable(CAP_PERFMON) ||
perf_event_paranoid() <= max_level;
}

--
2.20.1


2020-01-20 11:29:40

by Alexey Budankov

Subject: [PATCH v5 05/10] drm/i915/perf: open access for CAP_PERFMON privileged process


Open access to i915_perf monitoring for CAP_PERFMON privileged processes.
For backward compatibility reasons access to i915_perf subsystem remains
open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for
secure i915_perf monitoring is discouraged with respect to CAP_PERFMON
capability. Providing the access under CAP_PERFMON capability singly,
without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse
the credentials and makes operations more secure.

Signed-off-by: Alexey Budankov <[email protected]>
---
drivers/gpu/drm/i915/i915_perf.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 2ae14bc14931..d89347861b7d 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -3375,10 +3375,10 @@ i915_perf_open_ioctl_locked(struct i915_perf *perf,
/* Similar to perf's kernel.perf_paranoid_cpu sysctl option
* we check a dev.i915.perf_stream_paranoid sysctl option
* to determine if it's ok to access system wide OA counters
- * without CAP_SYS_ADMIN privileges.
+ * without CAP_PERFMON or CAP_SYS_ADMIN privileges.
*/
if (privileged_op &&
- i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
+ i915_perf_stream_paranoid && !perfmon_capable()) {
DRM_DEBUG("Insufficient privileges to open i915 perf stream\n");
ret = -EACCES;
goto err_ctx;
@@ -3571,9 +3571,8 @@ static int read_properties_unlocked(struct i915_perf *perf,
} else
oa_freq_hz = 0;

- if (oa_freq_hz > i915_oa_max_sample_rate &&
- !capable(CAP_SYS_ADMIN)) {
- DRM_DEBUG("OA exponent would exceed the max sampling frequency (sysctl dev.i915.oa_max_sample_rate) %uHz without root privileges\n",
+ if (oa_freq_hz > i915_oa_max_sample_rate && !perfmon_capable()) {
+ DRM_DEBUG("OA exponent would exceed the max sampling frequency (sysctl dev.i915.oa_max_sample_rate) %uHz without CAP_PERFMON or CAP_SYS_ADMIN privileges\n",
i915_oa_max_sample_rate);
return -EACCES;
}
@@ -3994,7 +3993,7 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data,
return -EINVAL;
}

- if (i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
+ if (i915_perf_stream_paranoid && !perfmon_capable()) {
DRM_DEBUG("Insufficient privileges to add i915 OA config\n");
return -EACCES;
}
@@ -4141,7 +4140,7 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data,
return -ENOTSUPP;
}

- if (i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
+ if (i915_perf_stream_paranoid && !perfmon_capable()) {
DRM_DEBUG("Insufficient privileges to remove i915 OA config\n");
return -EACCES;
}
--
2.20.1

2020-01-20 11:31:07

by Alexey Budankov

Subject: [PATCH v5 06/10] trace/bpf_trace: open access for CAP_PERFMON privileged process


Open access to bpf_trace monitoring for CAP_PERFMON privileged processes.
For backward compatibility reasons access to bpf_trace monitoring remains
open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for
secure bpf_trace monitoring is discouraged with respect to CAP_PERFMON
capability. Providing the access under CAP_PERFMON capability singly,
without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse
the credentials and makes operations more secure.

Signed-off-by: Alexey Budankov <[email protected]>
---
kernel/trace/bpf_trace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index e5ef4ae9edb5..334f1d71ebb1 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1395,7 +1395,7 @@ int perf_event_query_prog_array(struct perf_event *event, void __user *info)
u32 *ids, prog_cnt, ids_len;
int ret;

- if (!capable(CAP_SYS_ADMIN))
+ if (!perfmon_capable())
return -EPERM;
if (event->attr.type != PERF_TYPE_TRACEPOINT)
return -EINVAL;
--
2.20.1

2020-01-20 11:33:13

by Alexey Budankov

Subject: [PATCH v5 07/10] powerpc/perf: open access for CAP_PERFMON privileged process


Open access to monitoring for CAP_PERFMON privileged processes.
For backward compatibility reasons access to the monitoring remains
open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage
for secure monitoring is discouraged with respect to CAP_PERFMON
capability. Providing the access under CAP_PERFMON capability singly,
without the rest of CAP_SYS_ADMIN credentials, excludes chances to
misuse the credentials and makes the operations more secure.

Signed-off-by: Alexey Budankov <[email protected]>
---
arch/powerpc/perf/imc-pmu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index cb50a9e1fd2d..e837717492e4 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -898,7 +898,7 @@ static int thread_imc_event_init(struct perf_event *event)
if (event->attr.type != event->pmu->type)
return -ENOENT;

- if (!capable(CAP_SYS_ADMIN))
+ if (!perfmon_capable())
return -EACCES;

/* Sampling not supported */
@@ -1307,7 +1307,7 @@ static int trace_imc_event_init(struct perf_event *event)
if (event->attr.type != event->pmu->type)
return -ENOENT;

- if (!capable(CAP_SYS_ADMIN))
+ if (!perfmon_capable())
return -EACCES;

/* Return if this is a couting event */
--
2.20.1

2020-01-20 11:34:06

by Alexey Budankov

Subject: [PATCH v5 08/10] parisc/perf: open access for CAP_PERFMON privileged process


Open access to monitoring for CAP_PERFMON privileged processes.
For backward compatibility reasons access to the monitoring remains
open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage
for secure monitoring is discouraged with respect to CAP_PERFMON
capability. Providing the access under CAP_PERFMON capability singly,
without the rest of CAP_SYS_ADMIN credentials, excludes chances to
misuse the credentials and makes the operations more secure.

Signed-off-by: Alexey Budankov <[email protected]>
---
arch/parisc/kernel/perf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/parisc/kernel/perf.c b/arch/parisc/kernel/perf.c
index 676683641d00..c4208d027794 100644
--- a/arch/parisc/kernel/perf.c
+++ b/arch/parisc/kernel/perf.c
@@ -300,7 +300,7 @@ static ssize_t perf_write(struct file *file, const char __user *buf,
else
return -EFAULT;

- if (!capable(CAP_SYS_ADMIN))
+ if (!perfmon_capable())
return -EACCES;

if (count != sizeof(uint32_t))
--
2.20.1


2020-01-20 11:35:03

by Alexey Budankov

Subject: [PATCH v5 10/10] drivers/oprofile: open access for CAP_PERFMON privileged process


Open access to monitoring for CAP_PERFMON privileged processes.
For backward compatibility reasons access to the monitoring remains
open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage
for secure monitoring is discouraged with respect to CAP_PERFMON
capability. Providing the access under CAP_PERFMON capability singly,
without the rest of CAP_SYS_ADMIN credentials, excludes chances to
misuse the credentials and makes the operations more secure.

Signed-off-by: Alexey Budankov <[email protected]>
---
drivers/oprofile/event_buffer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/oprofile/event_buffer.c b/drivers/oprofile/event_buffer.c
index 12ea4a4ad607..6c9edc8bbc95 100644
--- a/drivers/oprofile/event_buffer.c
+++ b/drivers/oprofile/event_buffer.c
@@ -113,7 +113,7 @@ static int event_buffer_open(struct inode *inode, struct file *file)
{
int err = -EPERM;

- if (!capable(CAP_SYS_ADMIN))
+ if (!perfmon_capable())
return -EPERM;

if (test_and_set_bit_lock(0, &buffer_opened))
--
2.20.1

2020-01-20 11:35:03

by Alexey Budankov

Subject: [PATCH v5 09/10] drivers/perf: open access for CAP_PERFMON privileged process


Open access to monitoring for CAP_PERFMON privileged processes.
For backward compatibility reasons access to the monitoring remains
open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage
for secure monitoring is discouraged with respect to CAP_PERFMON
capability. Providing the access under CAP_PERFMON capability singly,
without the rest of CAP_SYS_ADMIN credentials, excludes chances to
misuse the credentials and makes the operations more secure.

Signed-off-by: Alexey Budankov <[email protected]>
---
drivers/perf/arm_spe_pmu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index 4e4984a55cd1..5dff81bc3324 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -274,7 +274,7 @@ static u64 arm_spe_event_to_pmscr(struct perf_event *event)
if (!attr->exclude_kernel)
reg |= BIT(SYS_PMSCR_EL1_E1SPE_SHIFT);

- if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && capable(CAP_SYS_ADMIN))
+ if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable())
reg |= BIT(SYS_PMSCR_EL1_CX_SHIFT);

return reg;
@@ -700,7 +700,7 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
return -EOPNOTSUPP;

reg = arm_spe_event_to_pmscr(event);
- if (!capable(CAP_SYS_ADMIN) &&
+ if (!perfmon_capable() &&
(reg & (BIT(SYS_PMSCR_EL1_PA_SHIFT) |
BIT(SYS_PMSCR_EL1_CX_SHIFT) |
BIT(SYS_PMSCR_EL1_PCT_SHIFT))))
--
2.20.1

2020-01-21 14:44:29

by Stephen Smalley

Subject: Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space

On 1/20/20 6:23 AM, Alexey Budankov wrote:
>
> Introduce CAP_PERFMON capability designed to secure system performance
> monitoring and observability operations so that CAP_PERFMON would assist
> CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
> and other performance monitoring and observability subsystems.
>
> CAP_PERFMON intends to harden system security and integrity during system
> performance monitoring and observability operations by decreasing attack
> surface that is available to a CAP_SYS_ADMIN privileged process [1].
> Providing access to system performance monitoring and observability
> operations under CAP_PERFMON capability singly, without the rest of
> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
> makes operation more secure.
>
> CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
> system performance monitoring and observability operations and balance
> amount of CAP_SYS_ADMIN credentials following the recommendations in the
> capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
> overloaded; see Notes to kernel developers, below."
>
> Although the software running under CAP_PERFMON can not ensure avoidance
> of related hardware issues, the software can still mitigate these issues
> following the official embargoed hardware issues mitigation procedure [2].
> The bugs in the software itself could be fixed following the standard
> kernel development process [3] to maintain and harden security of system
> performance monitoring and observability operations.
>
> [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
> [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
> [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
>
> Signed-off-by: Alexey Budankov <[email protected]>
> ---
> include/linux/capability.h | 12 ++++++++++++
> include/uapi/linux/capability.h | 8 +++++++-
> security/selinux/include/classmap.h | 4 ++--
> 3 files changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/capability.h b/include/linux/capability.h
> index ecce0f43c73a..8784969d91e1 100644
> --- a/include/linux/capability.h
> +++ b/include/linux/capability.h
> @@ -251,6 +251,18 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
> extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
> extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
> extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
> +static inline bool perfmon_capable(void)
> +{
> + struct user_namespace *ns = &init_user_ns;
> +
> + if (ns_capable_noaudit(ns, CAP_PERFMON))
> + return ns_capable(ns, CAP_PERFMON);
> +
> + if (ns_capable_noaudit(ns, CAP_SYS_ADMIN))
> + return ns_capable(ns, CAP_SYS_ADMIN);
> +
> + return false;
> +}

Why _noaudit()? Normally only used when a permission failure is
non-fatal to the operation. Otherwise, we want the audit message.
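
To make the concern concrete, here is a minimal user-space model of the v5 helper. It is a sketch, not kernel code: the mock_* names are hypothetical, the capability numbers are assumed to match include/uapi/linux/capability.h in mainline, and the model assumes an audited check leaves a record only when it fails (what a real system logs depends on the LSM and its policy). Under those assumptions, a caller holding neither capability is refused without any audit record:

```c
#include <stdbool.h>

/* Assumed values, as in mainline include/uapi/linux/capability.h. */
#define CAP_SYS_ADMIN 21
#define CAP_PERFMON   38

/* Hypothetical stand-in for a task's credentials plus its audit trail. */
struct mock_task {
	unsigned long long effective;	/* bitmask of effective capabilities */
	int denials;			/* denial records the model has "logged" */
};

static bool mock_has_cap(const struct mock_task *t, int cap)
{
	return (t->effective >> cap) & 1ULL;
}

/* Models ns_capable_noaudit(): answers without ever logging. */
static bool mock_capable_noaudit(struct mock_task *t, int cap)
{
	return mock_has_cap(t, cap);
}

/* Models ns_capable(): a failed check leaves one denial record. */
static bool mock_capable(struct mock_task *t, int cap)
{
	bool ok = mock_has_cap(t, cap);

	if (!ok)
		t->denials++;
	return ok;
}

/* Same control flow as the v5 perfmon_capable() quoted above. */
static bool perfmon_capable_v5(struct mock_task *t)
{
	if (mock_capable_noaudit(t, CAP_PERFMON))
		return mock_capable(t, CAP_PERFMON);

	if (mock_capable_noaudit(t, CAP_SYS_ADMIN))
		return mock_capable(t, CAP_SYS_ADMIN);

	return false;	/* refused, and nothing was logged */
}
```

In this model the audited calls are reached only after the silent probe has already succeeded, so they can never fail and the denial path stays silent.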

2020-01-21 17:33:33

by Alexey Budankov

Subject: Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space


On 21.01.2020 17:43, Stephen Smalley wrote:
> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>
>> Introduce CAP_PERFMON capability designed to secure system performance
>> monitoring and observability operations so that CAP_PERFMON would assist
>> CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
>> and other performance monitoring and observability subsystems.
>>
>> CAP_PERFMON intends to harden system security and integrity during system
>> performance monitoring and observability operations by decreasing attack
>> surface that is available to a CAP_SYS_ADMIN privileged process [1].
>> Providing access to system performance monitoring and observability
>> operations under CAP_PERFMON capability singly, without the rest of
>> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
>> makes operation more secure.
>>
>> CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
>> system performance monitoring and observability operations and balance
>> amount of CAP_SYS_ADMIN credentials following the recommendations in the
>> capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
>> overloaded; see Notes to kernel developers, below."
>>
>> Although the software running under CAP_PERFMON can not ensure avoidance
>> of related hardware issues, the software can still mitigate these issues
>> following the official embargoed hardware issues mitigation procedure [2].
>> The bugs in the software itself could be fixed following the standard
>> kernel development process [3] to maintain and harden security of system
>> performance monitoring and observability operations.
>>
>> [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
>> [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
>> [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
>>
>> Signed-off-by: Alexey Budankov <[email protected]>
>> ---
>>   include/linux/capability.h          | 12 ++++++++++++
>>   include/uapi/linux/capability.h     |  8 +++++++-
>>   security/selinux/include/classmap.h |  4 ++--
>>   3 files changed, 21 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/capability.h b/include/linux/capability.h
>> index ecce0f43c73a..8784969d91e1 100644
>> --- a/include/linux/capability.h
>> +++ b/include/linux/capability.h
>> @@ -251,6 +251,18 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
>>   extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
>>   extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
>>   extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
>> +static inline bool perfmon_capable(void)
>> +{
>> +    struct user_namespace *ns = &init_user_ns;
>> +
>> +    if (ns_capable_noaudit(ns, CAP_PERFMON))
>> +        return ns_capable(ns, CAP_PERFMON);
>> +
>> +    if (ns_capable_noaudit(ns, CAP_SYS_ADMIN))
>> +        return ns_capable(ns, CAP_SYS_ADMIN);
>> +
>> +    return false;
>> +}
>
> Why _noaudit()?  Normally only used when a permission failure is non-fatal to the operation.  Otherwise, we want the audit message.

It came from some of the ideas in the v4 review.
Well, on second thought, it definitely should be logged for CAP_SYS_ADMIN.
It is probably not so fatal for CAP_PERFMON, but personally
I would unconditionally log it for CAP_PERFMON as well.
Good catch, thank you.

~Alexey

2020-01-21 17:58:20

by Alexei Starovoitov

Subject: Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space

On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
<[email protected]> wrote:
>
>
> On 21.01.2020 17:43, Stephen Smalley wrote:
> > On 1/20/20 6:23 AM, Alexey Budankov wrote:
> >>
> >> Introduce CAP_PERFMON capability designed to secure system performance
> >> monitoring and observability operations so that CAP_PERFMON would assist
> >> CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
> >> and other performance monitoring and observability subsystems.
> >>
> >> CAP_PERFMON intends to harden system security and integrity during system
> >> performance monitoring and observability operations by decreasing attack
> >> surface that is available to a CAP_SYS_ADMIN privileged process [1].
> >> Providing access to system performance monitoring and observability
> >> operations under CAP_PERFMON capability singly, without the rest of
> >> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
> >> makes operation more secure.
> >>
> >> CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
> >> system performance monitoring and observability operations and balance
> >> amount of CAP_SYS_ADMIN credentials following the recommendations in the
> >> capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
> >> overloaded; see Notes to kernel developers, below."
> >>
> >> Although the software running under CAP_PERFMON can not ensure avoidance
> >> of related hardware issues, the software can still mitigate these issues
> >> following the official embargoed hardware issues mitigation procedure [2].
> >> The bugs in the software itself could be fixed following the standard
> >> kernel development process [3] to maintain and harden security of system
> >> performance monitoring and observability operations.
> >>
> >> [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
> >> [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
> >> [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
> >>
> >> Signed-off-by: Alexey Budankov <[email protected]>
> >> ---
> >> include/linux/capability.h | 12 ++++++++++++
> >> include/uapi/linux/capability.h | 8 +++++++-
> >> security/selinux/include/classmap.h | 4 ++--
> >> 3 files changed, 21 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/include/linux/capability.h b/include/linux/capability.h
> >> index ecce0f43c73a..8784969d91e1 100644
> >> --- a/include/linux/capability.h
> >> +++ b/include/linux/capability.h
> >> @@ -251,6 +251,18 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
> >> extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
> >> extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
> >> extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
> >> +static inline bool perfmon_capable(void)
> >> +{
> >> + struct user_namespace *ns = &init_user_ns;
> >> +
> >> + if (ns_capable_noaudit(ns, CAP_PERFMON))
> >> + return ns_capable(ns, CAP_PERFMON);
> >> +
> >> + if (ns_capable_noaudit(ns, CAP_SYS_ADMIN))
> >> + return ns_capable(ns, CAP_SYS_ADMIN);
> >> +
> >> + return false;
> >> +}
> >
> > Why _noaudit()? Normally only used when a permission failure is non-fatal to the operation. Otherwise, we want the audit message.
>
> Some of ideas from v4 review.

Well, in the changes requested for v4 I wrote:
	return capable(CAP_PERFMON);
instead of
	return false;

That's what Andy suggested earlier for CAP_BPF.
I think that should resolve Stephen's concern.
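
A minimal user-space model of that variant follows, reading the suggestion as "keep the two silent probes and change only the final return". It is a sketch, not kernel code: the mock_* names are hypothetical, the capability numbers are assumed to match mainline, and the model assumes that only a failed audited check leaves a record.

```c
#include <stdbool.h>

/* Assumed values, as in mainline include/uapi/linux/capability.h. */
#define CAP_SYS_ADMIN 21
#define CAP_PERFMON   38

/* Hypothetical stand-in for a task's credentials plus its audit trail. */
struct mock_task {
	unsigned long long effective;	/* bitmask of effective capabilities */
	int denials;			/* denial records the model has "logged" */
	int last_denied;		/* capability named in the latest record */
};

static bool mock_has_cap(const struct mock_task *t, int cap)
{
	return (t->effective >> cap) & 1ULL;
}

/* Models ns_capable_noaudit(): answers without ever logging. */
static bool mock_capable_noaudit(struct mock_task *t, int cap)
{
	return mock_has_cap(t, cap);
}

/* Models capable(): a failed check leaves one denial record. */
static bool mock_capable(struct mock_task *t, int cap)
{
	bool ok = mock_has_cap(t, cap);

	if (!ok) {
		t->denials++;
		t->last_denied = cap;
	}
	return ok;
}

/* v5 control flow with the final "return false" replaced. */
static bool perfmon_capable_variant(struct mock_task *t)
{
	if (mock_capable_noaudit(t, CAP_PERFMON))
		return mock_capable(t, CAP_PERFMON);

	if (mock_capable_noaudit(t, CAP_SYS_ADMIN))
		return mock_capable(t, CAP_SYS_ADMIN);

	/* Neither capability is held: fail through an audited check. */
	return mock_capable(t, CAP_PERFMON);
}
```

In this model a caller with neither capability gets exactly one denial record, naming CAP_PERFMON, while callers holding either capability produce none.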

2020-01-21 18:29:07

by Alexey Budankov

Subject: Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space


On 21.01.2020 20:55, Alexei Starovoitov wrote:
> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
> <[email protected]> wrote:
>>
>>
>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>
>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>> monitoring and observability operations so that CAP_PERFMON would assist
>>>> CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
>>>> and other performance monitoring and observability subsystems.
>>>>
>>>> CAP_PERFMON intends to harden system security and integrity during system
>>>> performance monitoring and observability operations by decreasing attack
>>>> surface that is available to a CAP_SYS_ADMIN privileged process [1].
>>>> Providing access to system performance monitoring and observability
>>>> operations under CAP_PERFMON capability singly, without the rest of
>>>> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
>>>> makes operation more secure.
>>>>
>>>> CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
>>>> system performance monitoring and observability operations and balance
>>>> amount of CAP_SYS_ADMIN credentials following the recommendations in the
>>>> capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
>>>> overloaded; see Notes to kernel developers, below."
>>>>
>>>> Although the software running under CAP_PERFMON can not ensure avoidance
>>>> of related hardware issues, the software can still mitigate these issues
>>>> following the official embargoed hardware issues mitigation procedure [2].
>>>> The bugs in the software itself could be fixed following the standard
>>>> kernel development process [3] to maintain and harden security of system
>>>> performance monitoring and observability operations.
>>>>
>>>> [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
>>>> [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
>>>> [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
>>>>
>>>> Signed-off-by: Alexey Budankov <[email protected]>
>>>> ---
>>>> include/linux/capability.h | 12 ++++++++++++
>>>> include/uapi/linux/capability.h | 8 +++++++-
>>>> security/selinux/include/classmap.h | 4 ++--
>>>> 3 files changed, 21 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/include/linux/capability.h b/include/linux/capability.h
>>>> index ecce0f43c73a..8784969d91e1 100644
>>>> --- a/include/linux/capability.h
>>>> +++ b/include/linux/capability.h
>>>> @@ -251,6 +251,18 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
>>>> extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
>>>> extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
>>>> extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
>>>> +static inline bool perfmon_capable(void)
>>>> +{
>>>> + struct user_namespace *ns = &init_user_ns;
>>>> +
>>>> + if (ns_capable_noaudit(ns, CAP_PERFMON))
>>>> + return ns_capable(ns, CAP_PERFMON);
>>>> +
>>>> + if (ns_capable_noaudit(ns, CAP_SYS_ADMIN))
>>>> + return ns_capable(ns, CAP_SYS_ADMIN);
>>>> +
>>>> + return false;
>>>> +}
>>>
>>> Why _noaudit()? Normally only used when a permission failure is non-fatal to the operation. Otherwise, we want the audit message.
>>
>> Some of ideas from v4 review.
>
> well, in the requested changes from v4 I wrote:
> return capable(CAP_PERFMON);
> instead of
> return false;

Ah, indeed. That was exactly my concern when updating the patch, and I
simply put false, missing the fact that capable() also logs.

I suppose the idea originally comes from here [1].
BTW, has a _more optimal_ implementation appeared since then?
Either way, the original or the optimized version could be reused for CAP_PERFMON.

~Alexey

[1] https://patchwork.ozlabs.org/patch/1159243/

>
> That's what Andy suggested earlier for CAP_BPF.
> I think that should resolve Stephen's concern.
>

2020-01-22 10:47:18

by Alexey Budankov

Subject: Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space


On 21.01.2020 21:27, Alexey Budankov wrote:
>
> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>> <[email protected]> wrote:
>>>
>>>
>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>
>>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>>> monitoring and observability operations so that CAP_PERFMON would assist
>>>>> CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
>>>>> and other performance monitoring and observability subsystems.
>>>>>
>>>>> CAP_PERFMON intends to harden system security and integrity during system
>>>>> performance monitoring and observability operations by decreasing attack
>>>>> surface that is available to a CAP_SYS_ADMIN privileged process [1].
>>>>> Providing access to system performance monitoring and observability
>>>>> operations under CAP_PERFMON capability singly, without the rest of
>>>>> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
>>>>> makes operation more secure.
>>>>>
>>>>> CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
>>>>> system performance monitoring and observability operations and balance
>>>>> amount of CAP_SYS_ADMIN credentials following the recommendations in the
>>>>> capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
>>>>> overloaded; see Notes to kernel developers, below."
>>>>>
>>>>> Although the software running under CAP_PERFMON can not ensure avoidance
>>>>> of related hardware issues, the software can still mitigate these issues
>>>>> following the official embargoed hardware issues mitigation procedure [2].
>>>>> The bugs in the software itself could be fixed following the standard
>>>>> kernel development process [3] to maintain and harden security of system
>>>>> performance monitoring and observability operations.
>>>>>
>>>>> [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
>>>>> [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
>>>>> [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
>>>>>
>>>>> Signed-off-by: Alexey Budankov <[email protected]>
>>>>> ---
>>>>> include/linux/capability.h | 12 ++++++++++++
>>>>> include/uapi/linux/capability.h | 8 +++++++-
>>>>> security/selinux/include/classmap.h | 4 ++--
>>>>> 3 files changed, 21 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/capability.h b/include/linux/capability.h
>>>>> index ecce0f43c73a..8784969d91e1 100644
>>>>> --- a/include/linux/capability.h
>>>>> +++ b/include/linux/capability.h
>>>>> @@ -251,6 +251,18 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
>>>>> extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
>>>>> extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
>>>>> extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
>>>>> +static inline bool perfmon_capable(void)
>>>>> +{
>>>>> + struct user_namespace *ns = &init_user_ns;
>>>>> +
>>>>> + if (ns_capable_noaudit(ns, CAP_PERFMON))
>>>>> + return ns_capable(ns, CAP_PERFMON);
>>>>> +
>>>>> + if (ns_capable_noaudit(ns, CAP_SYS_ADMIN))
>>>>> + return ns_capable(ns, CAP_SYS_ADMIN);
>>>>> +
>>>>> + return false;
>>>>> +}
>>>>
>>>> Why _noaudit()? Normally only used when a permission failure is non-fatal to the operation. Otherwise, we want the audit message.

So far so good. I suggest using the simplest version for v6:

static inline bool perfmon_capable(void)
{
	return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
}

It keeps the implementation simple and readable. It is also cheaper in terms
of API calls: a CAP_PERFMON privileged process needs a single capable() call.

Yes, it bloats the audit log for CAP_SYS_ADMIN privileged and unprivileged
processes, but the extra records also advertise and encourage the more secure
CAP_PERFMON based approach to using the perf_event_open system call.
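
The trade-off can be illustrated with a minimal user-space model of this version. It is a sketch, not kernel code: the mock_* names are hypothetical, the capability numbers are assumed to match mainline, and the model assumes that only a failed capable() check leaves an audit record (what is really logged depends on the LSM and its policy).

```c
#include <stdbool.h>

/* Assumed values, as in mainline include/uapi/linux/capability.h. */
#define CAP_SYS_ADMIN 21
#define CAP_PERFMON   38

/* Hypothetical stand-in for a task's credentials plus its audit trail. */
struct mock_task {
	unsigned long long effective;	/* bitmask of effective capabilities */
	int calls;			/* number of capable() calls made */
	int denials;			/* denial records the model has "logged" */
};

/* Models capable(): counts the call and logs a record on failure. */
static bool mock_capable(struct mock_task *t, int cap)
{
	bool ok = (t->effective >> cap) & 1ULL;

	t->calls++;
	if (!ok)
		t->denials++;
	return ok;
}

/* Same shape as the proposed v6 helper; || short-circuits on success. */
static bool perfmon_capable_simple(struct mock_task *t)
{
	return mock_capable(t, CAP_PERFMON) || mock_capable(t, CAP_SYS_ADMIN);
}
```

In this model a CAP_PERFMON task makes one call and logs nothing, a CAP_SYS_ADMIN-only task succeeds but leaves one CAP_PERFMON denial behind, and a task with neither capability leaves two denials.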

~Alexey

>>>
>>> Some of ideas from v4 review.
>>
>> well, in the requested changes from v4 I wrote:
>> return capable(CAP_PERFMON);
>> instead of
>> return false;
>
> Aww, indeed. I was concerning exactly about it when updating the patch
> and simply put false, missing the fact that capable() also logs.
>
> I suppose the idea is originally from here [1].
> BTW, Has it already seen any _more optimal_ implementation?
> Anyway, original or optimized version could be reused for CAP_PERFMON.
>
> ~Alexey
>
> [1] https://patchwork.ozlabs.org/patch/1159243/
>
>>
>> That's what Andy suggested earlier for CAP_BPF.
>> I think that should resolve Stephen's concern.
>>

2020-01-22 11:05:22

by Anju T Sudhakar

Subject: Re: [PATCH v5 07/10] powerpc/perf: open access for CAP_PERFMON privileged process


On 1/20/20 5:00 PM, Alexey Budankov wrote:
> Open access to monitoring for CAP_PERFMON privileged processes.
> For backward compatibility reasons access to the monitoring remains
> open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage
> for secure monitoring is discouraged with respect to CAP_PERFMON
> capability. Providing the access under CAP_PERFMON capability singly,
> without the rest of CAP_SYS_ADMIN credentials, excludes chances to
> misuse the credentials and makes the operations more secure.
>
> Signed-off-by: Alexey Budankov<[email protected]>
> ---

Acked-by: Anju T Sudhakar<[email protected]>

2020-01-22 14:09:06

by Stephen Smalley

Subject: Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space

On 1/22/20 5:45 AM, Alexey Budankov wrote:
>
> On 21.01.2020 21:27, Alexey Budankov wrote:
>>
>> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>>> <[email protected]> wrote:
>>>>
>>>>
>>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>>
>>>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>>>> monitoring and observability operations so that CAP_PERFMON would assist
>>>>>> CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
>>>>>> and other performance monitoring and observability subsystems.
>>>>>>
>>>>>> CAP_PERFMON intends to harden system security and integrity during system
>>>>>> performance monitoring and observability operations by decreasing attack
>>>>>> surface that is available to a CAP_SYS_ADMIN privileged process [1].
>>>>>> Providing access to system performance monitoring and observability
>>>>>> operations under CAP_PERFMON capability singly, without the rest of
>>>>>> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
>>>>>> makes operation more secure.
>>>>>>
>>>>>> CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
>>>>>> system performance monitoring and observability operations and balance
>>>>>> amount of CAP_SYS_ADMIN credentials following the recommendations in the
>>>>>> capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
>>>>>> overloaded; see Notes to kernel developers, below."
>>>>>>
>>>>>> Although the software running under CAP_PERFMON can not ensure avoidance
>>>>>> of related hardware issues, the software can still mitigate these issues
>>>>>> following the official embargoed hardware issues mitigation procedure [2].
>>>>>> The bugs in the software itself could be fixed following the standard
>>>>>> kernel development process [3] to maintain and harden security of system
>>>>>> performance monitoring and observability operations.
>>>>>>
>>>>>> [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
>>>>>> [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
>>>>>> [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
>>>>>>
>>>>>> Signed-off-by: Alexey Budankov <[email protected]>
>>>>>> ---
>>>>>> include/linux/capability.h | 12 ++++++++++++
>>>>>> include/uapi/linux/capability.h | 8 +++++++-
>>>>>> security/selinux/include/classmap.h | 4 ++--
>>>>>> 3 files changed, 21 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/capability.h b/include/linux/capability.h
>>>>>> index ecce0f43c73a..8784969d91e1 100644
>>>>>> --- a/include/linux/capability.h
>>>>>> +++ b/include/linux/capability.h
>>>>>> @@ -251,6 +251,18 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
>>>>>> extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
>>>>>> extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
>>>>>> extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
>>>>>> +static inline bool perfmon_capable(void)
>>>>>> +{
>>>>>> + struct user_namespace *ns = &init_user_ns;
>>>>>> +
>>>>>> + if (ns_capable_noaudit(ns, CAP_PERFMON))
>>>>>> + return ns_capable(ns, CAP_PERFMON);
>>>>>> +
>>>>>> + if (ns_capable_noaudit(ns, CAP_SYS_ADMIN))
>>>>>> + return ns_capable(ns, CAP_SYS_ADMIN);
>>>>>> +
>>>>>> + return false;
>>>>>> +}
>>>>>
>>>>> Why _noaudit()? Normally only used when a permission failure is non-fatal to the operation. Otherwise, we want the audit message.
>
> So far so good, I suggest using the simplest version for v6:
>
> static inline bool perfmon_capable(void)
> {
> return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
> }
>
> It keeps the implementation simple and readable. The implementation is more
> performant in the sense of calling the API - one capable() call for CAP_PERFMON
> privileged process.
>
> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
> but this bloating also advertises and leverages using more secure CAP_PERFMON
> based approach to use perf_event_open system call.

I can live with that. We just need to document that when you see both a
CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process, try only
allowing CAP_PERFMON first and see if that resolves the issue. We have
a similar issue with CAP_DAC_READ_SEARCH versus CAP_DAC_OVERRIDE.

2020-01-22 14:27:36

by Alexey Budankov

Subject: Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space


On 22.01.2020 17:07, Stephen Smalley wrote:
> On 1/22/20 5:45 AM, Alexey Budankov wrote:
>>
>> On 21.01.2020 21:27, Alexey Budankov wrote:
>>>
>>> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>>>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>>>> <[email protected]> wrote:
>>>>>
>>>>>
>>>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>>>
>>>>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>>>>> monitoring and observability operations so that CAP_PERFMON would assist
>>>>>>> CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
>>>>>>> and other performance monitoring and observability subsystems.
>>>>>>>
>>>>>>> CAP_PERFMON intends to harden system security and integrity during system
>>>>>>> performance monitoring and observability operations by decreasing attack
>>>>>>> surface that is available to a CAP_SYS_ADMIN privileged process [1].
>>>>>>> Providing access to system performance monitoring and observability
>>>>>>> operations under CAP_PERFMON capability singly, without the rest of
>>>>>>> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
>>>>>>> makes operation more secure.
>>>>>>>
>>>>>>> CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
>>>>>>> system performance monitoring and observability operations and balance
>>>>>>> amount of CAP_SYS_ADMIN credentials following the recommendations in the
>>>>>>> capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
>>>>>>> overloaded; see Notes to kernel developers, below."
>>>>>>>
>>>>>>> Although the software running under CAP_PERFMON can not ensure avoidance
>>>>>>> of related hardware issues, the software can still mitigate these issues
>>>>>>> following the official embargoed hardware issues mitigation procedure [2].
>>>>>>> The bugs in the software itself could be fixed following the standard
>>>>>>> kernel development process [3] to maintain and harden security of system
>>>>>>> performance monitoring and observability operations.
>>>>>>>
>>>>>>> [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
>>>>>>> [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
>>>>>>> [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
>>>>>>>
>>>>>>> Signed-off-by: Alexey Budankov <[email protected]>
>>>>>>> ---
>>>>>>>    include/linux/capability.h          | 12 ++++++++++++
>>>>>>>    include/uapi/linux/capability.h     |  8 +++++++-
>>>>>>>    security/selinux/include/classmap.h |  4 ++--
>>>>>>>    3 files changed, 21 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/include/linux/capability.h b/include/linux/capability.h
>>>>>>> index ecce0f43c73a..8784969d91e1 100644
>>>>>>> --- a/include/linux/capability.h
>>>>>>> +++ b/include/linux/capability.h
>>>>>>> @@ -251,6 +251,18 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
>>>>>>>    extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
>>>>>>>    extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
>>>>>>>    extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
>>>>>>> +static inline bool perfmon_capable(void)
>>>>>>> +{
>>>>>>> +    struct user_namespace *ns = &init_user_ns;
>>>>>>> +
>>>>>>> +    if (ns_capable_noaudit(ns, CAP_PERFMON))
>>>>>>> +        return ns_capable(ns, CAP_PERFMON);
>>>>>>> +
>>>>>>> +    if (ns_capable_noaudit(ns, CAP_SYS_ADMIN))
>>>>>>> +        return ns_capable(ns, CAP_SYS_ADMIN);
>>>>>>> +
>>>>>>> +    return false;
>>>>>>> +}
>>>>>>
>>>>>> Why _noaudit()?  Normally only used when a permission failure is non-fatal to the operation.  Otherwise, we want the audit message.
>>
>> So far so good, I suggest using the simplest version for v6:
>>
>> static inline bool perfmon_capable(void)
>> {
>>     return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
>> }
>>
>> It keeps the implementation simple and readable. The implementation is more
>> performant in the sense of calling the API - one capable() call for CAP_PERFMON
>> privileged process.
>>
>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
>> but this bloating also advertises and leverages using more secure CAP_PERFMON
>> based approach to use perf_event_open system call.
>
> I can live with that.  We just need to document that when you see both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process, try only allowing CAP_PERFMON first and see if that resolves the issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus CAP_DAC_OVERRIDE.

The perf security document [1] can be updated, at least, to document
these audit logging specifics.

~Alexey

[1] https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html

2020-02-06 18:04:53

by Alexey Budankov

Subject: Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space


On 22.01.2020 17:25, Alexey Budankov wrote:
>
> On 22.01.2020 17:07, Stephen Smalley wrote:
>> On 1/22/20 5:45 AM, Alexey Budankov wrote:
>>>
>>> On 21.01.2020 21:27, Alexey Budankov wrote:
>>>>
>>>> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>>>>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>>>>
>>>>>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>>>>>> monitoring and observability operations so that CAP_PERFMON would assist
>>>>>>>> CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
>>>>>>>> and other performance monitoring and observability subsystems.
>>>>>>>>
>>>>>>>> CAP_PERFMON intends to harden system security and integrity during system
>>>>>>>> performance monitoring and observability operations by decreasing attack
>>>>>>>> surface that is available to a CAP_SYS_ADMIN privileged process [1].
>>>>>>>> Providing access to system performance monitoring and observability
>>>>>>>> operations under CAP_PERFMON capability singly, without the rest of
>>>>>>>> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
>>>>>>>> makes operation more secure.
>>>>>>>>
>>>>>>>> CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
>>>>>>>> system performance monitoring and observability operations and balance
>>>>>>>> amount of CAP_SYS_ADMIN credentials following the recommendations in the
>>>>>>>> capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
>>>>>>>> overloaded; see Notes to kernel developers, below."
>>>>>>>>
>>>>>>>> Although the software running under CAP_PERFMON can not ensure avoidance
>>>>>>>> of related hardware issues, the software can still mitigate these issues
>>>>>>>> following the official embargoed hardware issues mitigation procedure [2].
>>>>>>>> The bugs in the software itself could be fixed following the standard
>>>>>>>> kernel development process [3] to maintain and harden security of system
>>>>>>>> performance monitoring and observability operations.
>>>>>>>>
>>>>>>>> [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
>>>>>>>> [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
>>>>>>>> [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
<SNIP>
>>>>>>>>
>>>>>>>> Signed-off-by: Alexey Budankov <[email protected]>
>>>>>>>
>>>>>>> Why _noaudit()?  Normally only used when a permission failure is non-fatal to the operation.  Otherwise, we want the audit message.
>>>
>>> So far so good, I suggest using the simplest version for v6:
>>>
>>> static inline bool perfmon_capable(void)
>>> {
>>>     return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
>>> }
>>>
>>> It keeps the implementation simple and readable. The implementation is more
>>> performant in the sense of calling the API - one capable() call for CAP_PERFMON
>>> privileged process.
>>>
>>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
>>> but this bloating also advertises and leverages using more secure CAP_PERFMON
>>> based approach to use perf_event_open system call.
>>
>> I can live with that.  We just need to document that when you see both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process, try only allowing CAP_PERFMON first and see if that resolves the issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus CAP_DAC_OVERRIDE.
>
> perf security [1] document can be updated, at least, to align and document
> this audit logging specifics.

And I plan to update the document right after this patch set is accepted.
Feel free to let me know of other places in the kernel docs that also
require an update w.r.t. the CAP_PERFMON extension.

~Alexey

>
> ~Alexey
>
> [1] https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
>

2020-02-07 11:40:12

by Thomas Gleixner

Subject: Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space

Alexey Budankov <[email protected]> writes:
> On 22.01.2020 17:25, Alexey Budankov wrote:
>> On 22.01.2020 17:07, Stephen Smalley wrote:
>>>> It keeps the implementation simple and readable. The implementation is more
>>>> performant in the sense of calling the API - one capable() call for CAP_PERFMON
>>>> privileged process.
>>>>
>>>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
>>>> but this bloating also advertises and leverages using more secure CAP_PERFMON
>>>> based approach to use perf_event_open system call.
>>>
>>> I can live with that.  We just need to document that when you see
>>> both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process,
>>> try only allowing CAP_PERFMON first and see if that resolves the
>>> issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus
>>> CAP_DAC_OVERRIDE.
>>
>> perf security [1] document can be updated, at least, to align and document
>> this audit logging specifics.
>
> And I plan to update the document right after this patch set is accepted.
> Feel free to let me know of the places in the kernel docs that also
> require update w.r.t CAP_PERFMON extension.

The documentation update wants to be part of the patch set and not planned
to be done _after_ the patch set is merged.

Thanks,

tglx

2020-02-07 13:40:27

by Alexey Budankov

Subject: Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space


On 07.02.2020 14:38, Thomas Gleixner wrote:
> Alexey Budankov <[email protected]> writes:
>> On 22.01.2020 17:25, Alexey Budankov wrote:
>>> On 22.01.2020 17:07, Stephen Smalley wrote:
>>>>> It keeps the implementation simple and readable. The implementation is more
>>>>> performant in the sense of calling the API - one capable() call for CAP_PERFMON
>>>>> privileged process.
>>>>>
>>>>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
>>>>> but this bloating also advertises and leverages using more secure CAP_PERFMON
>>>>> based approach to use perf_event_open system call.
>>>>
>>>> I can live with that.  We just need to document that when you see
>>>> both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process,
>>>> try only allowing CAP_PERFMON first and see if that resolves the
>>>> issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus
>>>> CAP_DAC_OVERRIDE.
>>>
>>> perf security [1] document can be updated, at least, to align and document
>>> this audit logging specifics.
>>
>> And I plan to update the document right after this patch set is accepted.
>> Feel free to let me know of the places in the kernel docs that also
>> require update w.r.t CAP_PERFMON extension.
>
> The documentation update wants be part of the patch set and not planned
> to be done _after_ the patch set is merged.

Well, accepted. The documentation updates are going to be patches #11 and beyond.

Thanks,
Alexey

>
> Thanks,
>
> tglx
>

2020-02-12 08:54:18

by Alexey Budankov

Subject: Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space

Hi Stephen,

On 22.01.2020 17:07, Stephen Smalley wrote:
> On 1/22/20 5:45 AM, Alexey Budankov wrote:
>>
>> On 21.01.2020 21:27, Alexey Budankov wrote:
>>>
>>> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>>>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>>>> <[email protected]> wrote:
>>>>>
>>>>>
>>>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>>>
<SNIP>
>>>>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>>>>
>>>>>> Why _noaudit()?  Normally only used when a permission failure is non-fatal to the operation.  Otherwise, we want the audit message.
>>
>> So far so good, I suggest using the simplest version for v6:
>>
>> static inline bool perfmon_capable(void)
>> {
>>     return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
>> }
>>
>> It keeps the implementation simple and readable. The implementation is more
>> performant in the sense of calling the API - one capable() call for CAP_PERFMON
>> privileged process.
>>
>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
>> but this bloating also advertises and leverages using more secure CAP_PERFMON
>> based approach to use perf_event_open system call.
>
> I can live with that.  We just need to document that when you see both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process, try only allowing CAP_PERFMON first and see if that resolves the issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus CAP_DAC_OVERRIDE.

I am trying to reproduce this double logging with CAP_PERFMON.
I am using a refpolicy version with the perf_event tclass enabled [1], in permissive mode.
When running perf stat -a I observe these AVC audit messages:

type=AVC msg=audit(1581496695.666:8691): avc: denied { open } for pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
type=AVC msg=audit(1581496695.666:8691): avc: denied { kernel } for pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
type=AVC msg=audit(1581496695.666:8691): avc: denied { cpu } for pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
type=AVC msg=audit(1581496695.666:8692): avc: denied { write } for pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1

However, there are no capability-related messages around them. I suppose my
refpolicy has to be modified somehow for capability-related AVCs to show up.

Could you please comment on or clarify how to enable capability-related AVCs
so that I can test the logging in question?

Thanks,
Alexey

---
[1] https://github.com/SELinuxProject/refpolicy.git

2020-02-12 13:32:11

by Stephen Smalley

Subject: Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space

On 2/12/20 3:53 AM, Alexey Budankov wrote:
> Hi Stephen,
>
> On 22.01.2020 17:07, Stephen Smalley wrote:
>> On 1/22/20 5:45 AM, Alexey Budankov wrote:
>>>
>>> On 21.01.2020 21:27, Alexey Budankov wrote:
>>>>
>>>> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>>>>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>>>>
> <SNIP>
>>>>>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>>>>>
>>>>>>> Why _noaudit()?  Normally only used when a permission failure is non-fatal to the operation.  Otherwise, we want the audit message.
>>>
>>> So far so good, I suggest using the simplest version for v6:
>>>
>>> static inline bool perfmon_capable(void)
>>> {
>>>     return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
>>> }
>>>
>>> It keeps the implementation simple and readable. The implementation is more
>>> performant in the sense of calling the API - one capable() call for CAP_PERFMON
>>> privileged process.
>>>
>>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
>>> but this bloating also advertises and leverages using more secure CAP_PERFMON
>>> based approach to use perf_event_open system call.
>>
>> I can live with that.  We just need to document that when you see both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process, try only allowing CAP_PERFMON first and see if that resolves the issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus CAP_DAC_OVERRIDE.
>
> I am trying to reproduce this double logging with CAP_PERFMON.
> I am using the refpolicy version with enabled perf_event tclass [1], in permissive mode.
> When running perf stat -a I am observing this AVC audit messages:
>
> type=AVC msg=audit(1581496695.666:8691): avc: denied { open } for pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
> type=AVC msg=audit(1581496695.666:8691): avc: denied { kernel } for pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
> type=AVC msg=audit(1581496695.666:8691): avc: denied { cpu } for pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
> type=AVC msg=audit(1581496695.666:8692): avc: denied { write } for pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>
> However there is no capability related messages around. I suppose my refpolicy should
> be modified somehow to observe capability related AVCs.
>
> Could you please comment or clarify on how to enable caps related AVCs in order
> to test the concerned logging.

The new perfmon permission has to be defined in your policy; otherwise
you'll have a message in dmesg about "Permission perfmon in class
capability2 not defined in policy." You can either add it to the common
cap2 definition in refpolicy/policy/flask/access_vectors and rebuild your
policy, or extract your base module as CIL, add it there, and insert the
updated module.


2020-02-12 13:54:19

by Alexey Budankov

Subject: Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space

On 12.02.2020 16:32, Stephen Smalley wrote:
> On 2/12/20 3:53 AM, Alexey Budankov wrote:
>> Hi Stephen,
>>
>> On 22.01.2020 17:07, Stephen Smalley wrote:
>>> On 1/22/20 5:45 AM, Alexey Budankov wrote:
>>>>
>>>> On 21.01.2020 21:27, Alexey Budankov wrote:
>>>>>
>>>>> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>>>>>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>>>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>>>>>
>> <SNIP>
>>>>>>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>>>>>>
>>>>>>>> Why _noaudit()?  Normally only used when a permission failure is non-fatal to the operation.  Otherwise, we want the audit message.
>>>>
>>>> So far so good, I suggest using the simplest version for v6:
>>>>
>>>> static inline bool perfmon_capable(void)
>>>> {
>>>>      return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
>>>> }
>>>>
>>>> It keeps the implementation simple and readable. The implementation is more
>>>> performant in the sense of calling the API - one capable() call for CAP_PERFMON
>>>> privileged process.
>>>>
>>>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
>>>> but this bloating also advertises and leverages using more secure CAP_PERFMON
>>>> based approach to use perf_event_open system call.
>>>
>>> I can live with that.  We just need to document that when you see both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process, try only allowing CAP_PERFMON first and see if that resolves the issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus CAP_DAC_OVERRIDE.
>>
>> I am trying to reproduce this double logging with CAP_PERFMON.
>> I am using the refpolicy version with enabled perf_event tclass [1], in permissive mode.
>> When running perf stat -a I am observing this AVC audit messages:
>>
>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { open } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { kernel } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { cpu } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>> type=AVC msg=audit(1581496695.666:8692): avc:  denied  { write } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>
>> However there is no capability related messages around. I suppose my refpolicy should
>> be modified somehow to observe capability related AVCs.
>>
>> Could you please comment or clarify on how to enable caps related AVCs in order
>> to test the concerned logging.
>
> The new perfmon permission has to be defined in your policy; you'll have a message in dmesg about "Permission perfmon in class capability2 not defined in policy.".  You can either add it to the common cap2 definition in refpolicy/policy/flask/access_vectors and rebuild your policy or extract your base module as CIL, add it there, and insert the updated module.

Yes, I already have it like this:
common cap2
{
	mac_override	# unused by SELinux
	mac_admin
	syslog
	wake_alarm
	block_suspend
	audit_read
	perfmon
}

dmesg stopped reporting perfmon as not defined, but audit.log still doesn't report CAP_PERFMON denials.
BTW, audit doesn't even report CAP_SYS_ADMIN denials, although perfmon_capable() does check for it.

~Alexey

>
>

2020-02-12 15:21:21

by Stephen Smalley

Subject: Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space

On 2/12/20 8:53 AM, Alexey Budankov wrote:
> On 12.02.2020 16:32, Stephen Smalley wrote:
>> On 2/12/20 3:53 AM, Alexey Budankov wrote:
>>> Hi Stephen,
>>>
>>> On 22.01.2020 17:07, Stephen Smalley wrote:
>>>> On 1/22/20 5:45 AM, Alexey Budankov wrote:
>>>>>
>>>>> On 21.01.2020 21:27, Alexey Budankov wrote:
>>>>>>
>>>>>> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>>>>>>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>>>>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>>>>>>
>>> <SNIP>
>>>>>>>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>>>>>>>
>>>>>>>>> Why _noaudit()?  Normally only used when a permission failure is non-fatal to the operation.  Otherwise, we want the audit message.
>>>>>
>>>>> So far so good, I suggest using the simplest version for v6:
>>>>>
>>>>> static inline bool perfmon_capable(void)
>>>>> {
>>>>>      return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
>>>>> }
>>>>>
>>>>> It keeps the implementation simple and readable. The implementation is more
>>>>> performant in the sense of calling the API - one capable() call for CAP_PERFMON
>>>>> privileged process.
>>>>>
>>>>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
>>>>> but this bloating also advertises and leverages using more secure CAP_PERFMON
>>>>> based approach to use perf_event_open system call.
>>>>
>>>> I can live with that.  We just need to document that when you see both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process, try only allowing CAP_PERFMON first and see if that resolves the issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus CAP_DAC_OVERRIDE.
>>>
>>> I am trying to reproduce this double logging with CAP_PERFMON.
>>> I am using the refpolicy version with enabled perf_event tclass [1], in permissive mode.
>>> When running perf stat -a I am observing this AVC audit messages:
>>>
>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { open } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { kernel } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { cpu } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>> type=AVC msg=audit(1581496695.666:8692): avc:  denied  { write } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>
>>> However there is no capability related messages around. I suppose my refpolicy should
>>> be modified somehow to observe capability related AVCs.
>>>
>>> Could you please comment or clarify on how to enable caps related AVCs in order
>>> to test the concerned logging.
>>
>> The new perfmon permission has to be defined in your policy; you'll have a message in dmesg about "Permission perfmon in class capability2 not defined in policy.".  You can either add it to the common cap2 definition in refpolicy/policy/flask/access_vectors and rebuild your policy or extract your base module as CIL, add it there, and insert the updated module.
>
> Yes, I already have it like this:
> common cap2
> {
> 	mac_override	# unused by SELinux
> 	mac_admin
> 	syslog
> 	wake_alarm
> 	block_suspend
> 	audit_read
> 	perfmon
> }
>
> dmesg stopped reporting perfmon as not defined but audit.log still doesn't report CAP_PERFMON denials.
> BTW, audit even doesn't report CAP_SYS_ADMIN denials, however perfmon_capable() does check for it.

Some denials may be silenced by dontaudit rules; semodule -DB will strip
those and semodule -B will restore them. Another possibility is that the
process doesn't have CAP_PERFMON in its effective set and therefore
never reaches SELinux at all; it is denied first by the capability module.
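
The second possibility can be checked from user space by looking at the
CapEff mask in /proc/self/status. What follows is only a minimal sketch and
not part of the patch set. It assumes CAP_PERFMON is capability number 38
(the value in mainline include/uapi/linux/capability.h); the helper names
cap_in_mask() and perfmon_effective() are made up for illustration.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: CAP_PERFMON is capability number 38, as in mainline
 * include/uapi/linux/capability.h. Older headers may not define it. */
#ifndef CAP_PERFMON
#define CAP_PERFMON 38
#endif

/* Return true if bit `cap` is set in a CapEff-style hexadecimal mask.
 * Leading whitespace and a trailing newline are tolerated. */
static bool cap_in_mask(const char *hex_mask, int cap)
{
    unsigned long long mask = strtoull(hex_mask, NULL, 16);

    return (mask >> cap) & 1ULL;
}

/* Look up the "CapEff:" line of /proc/self/status.
 * Returns 1 if CAP_PERFMON is in the effective set, 0 if it is not,
 * and -1 if the file or the line could not be read. */
static int perfmon_effective(void)
{
    char line[256];
    int result = -1;
    FILE *f = fopen("/proc/self/status", "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "CapEff:", 7) == 0) {
            result = cap_in_mask(line + 7, CAP_PERFMON) ? 1 : 0;
            break;
        }
    }
    fclose(f);
    return result;
}
```

If perfmon_effective() returns 0 inside the process that opens the perf
event, the capability module is expected to refuse the request before
SELinux is consulted, so no capability AVC would be logged for it.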



2020-02-12 15:45:55

by Stephen Smalley

Subject: Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space

On 2/12/20 10:21 AM, Stephen Smalley wrote:
> On 2/12/20 8:53 AM, Alexey Budankov wrote:
>> On 12.02.2020 16:32, Stephen Smalley wrote:
>>> On 2/12/20 3:53 AM, Alexey Budankov wrote:
>>>> Hi Stephen,
>>>>
>>>> On 22.01.2020 17:07, Stephen Smalley wrote:
>>>>> On 1/22/20 5:45 AM, Alexey Budankov wrote:
>>>>>>
>>>>>> On 21.01.2020 21:27, Alexey Budankov wrote:
>>>>>>>
>>>>>>> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>>>>>>>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>>>>>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>>>>>>>
>>>> <SNIP>
>>>>>>>>>>> Introduce CAP_PERFMON capability designed to secure system
>>>>>>>>>>> performance
>>>>>>>>>>
>>>>>>>>>> Why _noaudit()?  Normally only used when a permission failure
>>>>>>>>>> is non-fatal to the operation.  Otherwise, we want the audit
>>>>>>>>>> message.
>>>>>>
>>>>>> So far so good, I suggest using the simplest version for v6:
>>>>>>
>>>>>> static inline bool perfmon_capable(void)
>>>>>> {
>>>>>>       return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
>>>>>> }
>>>>>>
>>>>>> It keeps the implementation simple and readable. The
>>>>>> implementation is more
>>>>>> performant in the sense of calling the API - one capable() call
>>>>>> for CAP_PERFMON
>>>>>> privileged process.
>>>>>>
>>>>>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and
>>>>>> unprivileged processes,
>>>>>> but this bloating also advertises and leverages using more secure
>>>>>> CAP_PERFMON
>>>>>> based approach to use perf_event_open system call.
>>>>>
>>>>> I can live with that.  We just need to document that when you see
>>>>> both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process,
>>>>> try only allowing CAP_PERFMON first and see if that resolves the
>>>>> issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus
>>>>> CAP_DAC_OVERRIDE.
>>>>
>>>> I am trying to reproduce this double logging with CAP_PERFMON.
>>>> I am using the refpolicy version with enabled perf_event tclass [1],
>>>> in permissive mode.
>>>> When running perf stat -a I am observing this AVC audit messages:
>>>>
>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { open } for
>>>> pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t
>>>> tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { kernel }
>>>> for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t
>>>> tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { cpu } for
>>>> pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t
>>>> tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>> type=AVC msg=audit(1581496695.666:8692): avc:  denied  { write }
>>>> for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t
>>>> tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>
>>>> However there is no capability related messages around. I suppose my
>>>> refpolicy should
>>>> be modified somehow to observe capability related AVCs.
>>>>
>>>> Could you please comment or clarify on how to enable caps related
>>>> AVCs in order
>>>> to test the concerned logging.
>>>
>>> The new perfmon permission has to be defined in your policy; you'll
>>> have a message in dmesg about "Permission perfmon in class
>>> capability2 not defined in policy.".  You can either add it to the
>>> common cap2 definition in refpolicy/policy/flask/access_vectors and
>>> rebuild your policy or extract your base module as CIL, add it there,
>>> and insert the updated module.
>>
>> Yes, I already have it like this:
>> common cap2
>> {
>> 	mac_override	# unused by SELinux
>> 	mac_admin
>> 	syslog
>> 	wake_alarm
>> 	block_suspend
>> 	audit_read
>> 	perfmon
>> }
>>
>> dmesg stopped reporting perfmon as not defined but audit.log still
>> doesn't report CAP_PERFMON denials.
>> BTW, audit even doesn't report CAP_SYS_ADMIN denials, however
>> perfmon_capable() does check for it.
>
> Some denials may be silenced by dontaudit rules; semodule -DB will strip
> those and semodule -B will restore them.  Other possibility is that the
> process doesn't have CAP_PERFMON in its effective set and therefore
> never reaches SELinux at all; denied first by the capability module.

Also, the fact that your denials are showing up in user_systemd_t
suggests that something is off in your policy or userspace/distro; I
assume that is a domain type for the systemd --user instance, but your
shell and commands shouldn't be running in that domain (user_t would be
more appropriate for that).
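
A process can report its own domain by reading /proc/self/attr/current,
which holds the context in the user:role:type[:range] form seen in the AVC
records above. The sketch below is illustrative only; context_type() and
current_type() are hypothetical helper names, and on a system without
SELinux the read is assumed to fail or to yield a string without the
expected fields.

```c
#include <stdio.h>
#include <string.h>

/* Copy the type field (third colon-separated field) of an SELinux context
 * such as "user_u:user_r:user_t" into `out`.
 * Returns 0 on success, -1 if the field is missing or does not fit. */
static int context_type(const char *ctx, char *out, size_t outlen)
{
    const char *p = ctx;
    size_t n;

    /* Skip the user and role fields. */
    for (int i = 0; i < 2; i++) {
        p = strchr(p, ':');
        if (!p)
            return -1;
        p++;
    }
    /* The type ends at the next ':' (start of the range) or at end of line. */
    n = strcspn(p, ":\n");
    if (n == 0 || n >= outlen)
        return -1;
    memcpy(out, p, n);
    out[n] = '\0';
    return 0;
}

/* Read the calling process's context and extract its type. */
static int current_type(char *out, size_t outlen)
{
    char ctx[256] = { 0 };
    size_t n;
    FILE *f = fopen("/proc/self/attr/current", "r");

    if (!f)
        return -1;
    n = fread(ctx, 1, sizeof(ctx) - 1, f);
    fclose(f);
    if (n == 0)
        return -1;
    return context_type(ctx, out, outlen);
}
```

Calling current_type() from the program under test shows whether it runs
as user_t or user_systemd_t, independent of what the parent shell reports.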

2020-02-12 16:18:44

by Alexey Budankov

Subject: Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space

On 12.02.2020 18:21, Stephen Smalley wrote:
> On 2/12/20 8:53 AM, Alexey Budankov wrote:
>> On 12.02.2020 16:32, Stephen Smalley wrote:
>>> On 2/12/20 3:53 AM, Alexey Budankov wrote:
>>>> Hi Stephen,
>>>>
>>>> On 22.01.2020 17:07, Stephen Smalley wrote:
>>>>> On 1/22/20 5:45 AM, Alexey Budankov wrote:
>>>>>>
>>>>>> On 21.01.2020 21:27, Alexey Budankov wrote:
>>>>>>>
>>>>>>> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>>>>>>>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>>>>>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>>>>>>>
>>>> <SNIP>
>>>>>>>>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>>>>>>>>
>>>>>>>>>> Why _noaudit()?  Normally only used when a permission failure is non-fatal to the operation.  Otherwise, we want the audit message.
>>>>>>
>>>>>> So far so good, I suggest using the simplest version for v6:
>>>>>>
>>>>>> static inline bool perfmon_capable(void)
>>>>>> {
>>>>>>       return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
>>>>>> }
>>>>>>
>>>>>> It keeps the implementation simple and readable. The implementation is more
>>>>>> performant in the sense of calling the API - one capable() call for CAP_PERFMON
>>>>>> privileged process.
>>>>>>
>>>>>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
>>>>>> but this bloating also advertises and leverages using more secure CAP_PERFMON
>>>>>> based approach to use perf_event_open system call.
>>>>>
>>>>> I can live with that.  We just need to document that when you see both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process, try only allowing CAP_PERFMON first and see if that resolves the issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus CAP_DAC_OVERRIDE.
>>>>
>>>> I am trying to reproduce this double logging with CAP_PERFMON.
>>>> I am using the refpolicy version with enabled perf_event tclass [1], in permissive mode.
>>>> When running perf stat -a I am observing this AVC audit messages:
>>>>
>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { open } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { kernel } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { cpu } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>> type=AVC msg=audit(1581496695.666:8692): avc:  denied  { write } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>
>>>> However there is no capability related messages around. I suppose my refpolicy should
>>>> be modified somehow to observe capability related AVCs.
>>>>
>>>> Could you please comment or clarify on how to enable caps related AVCs in order
>>>> to test the concerned logging.
>>>
>>> The new perfmon permission has to be defined in your policy; you'll have a message in dmesg about "Permission perfmon in class capability2 not defined in policy.".  You can either add it to the common cap2 definition in refpolicy/policy/flask/access_vectors and rebuild your policy or extract your base module as CIL, add it there, and insert the updated module.
>>
>> Yes, I already have it like this:
>> common cap2
>> {
>> 	mac_override	# unused by SELinux
>> 	mac_admin
>> 	syslog
>> 	wake_alarm
>> 	block_suspend
>> 	audit_read
>> 	perfmon
>> }
>>
>> dmesg stopped reporting perfmon as not defined but audit.log still doesn't report CAP_PERFMON denials.
>> BTW, audit even doesn't report CAP_SYS_ADMIN denials, however perfmon_capable() does check for it.
>
> Some denials may be silenced by dontaudit rules; semodule -DB will strip those and semodule -B will restore them.  Other possibility is that the process doesn't have CAP_PERFMON in its effective set and therefore never reaches SELinux at all; denied first by the capability module.

Yes, that all makes sense.
selinux_capable() logs denials via avc_audit() but cap_capable() does no audit logging, so the order in which the two checks run matters.
I am doing debug tracing of the kernel code to find the exact reason.
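
To make the ordering argument concrete, here is a user-space toy model of
it. It is a sketch under stated assumptions, not kernel code: every name
ending in _model is invented, the capability module is assumed to run
before SELinux, only the SELinux step is assumed to produce audit records,
and permissive mode and dontaudit rules are ignored.

```c
#include <stdbool.h>

/* Capability numbers as in mainline include/uapi/linux/capability.h. */
enum { CAP_SYS_ADMIN = 21, CAP_PERFMON = 38 };

struct task_model {
    unsigned long long effective;      /* bits in the effective set */
    unsigned long long policy_allowed; /* bits the toy policy allows */
    int audit_records;                 /* denials logged so far */
    int capable_calls;                 /* capable_model() invocations */
};

/* Capability module: a plain bit test, no logging. */
static bool cap_capable_model(const struct task_model *t, int cap)
{
    return (t->effective >> cap) & 1ULL;
}

/* SELinux step: logs one record for every denial. */
static bool selinux_capable_model(struct task_model *t, int cap)
{
    if ((t->policy_allowed >> cap) & 1ULL)
        return true;
    t->audit_records++;
    return false;
}

/* capable(): SELinux is only reached if the capability module agrees. */
static bool capable_model(struct task_model *t, int cap)
{
    t->capable_calls++;
    return cap_capable_model(t, cap) && selinux_capable_model(t, cap);
}

/* Same shape as the perfmon_capable() proposed for v6. */
static bool perfmon_capable_model(struct task_model *t)
{
    return capable_model(t, CAP_PERFMON) || capable_model(t, CAP_SYS_ADMIN);
}
```

In this model a task with neither capability in its effective set makes
two capable_model() calls and logs nothing, a task holding both
capabilities but denied both by policy logs two records, and a task with
CAP_PERFMON held and allowed returns after a single call.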

~Alexey

2020-02-12 16:57:37

by Alexey Budankov

Subject: Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space



On 12.02.2020 18:45, Stephen Smalley wrote:
> On 2/12/20 10:21 AM, Stephen Smalley wrote:
>> On 2/12/20 8:53 AM, Alexey Budankov wrote:
>>> On 12.02.2020 16:32, Stephen Smalley wrote:
>>>> On 2/12/20 3:53 AM, Alexey Budankov wrote:
>>>>> Hi Stephen,
>>>>>
>>>>> On 22.01.2020 17:07, Stephen Smalley wrote:
>>>>>> On 1/22/20 5:45 AM, Alexey Budankov wrote:
>>>>>>>
>>>>>>> On 21.01.2020 21:27, Alexey Budankov wrote:
>>>>>>>>
>>>>>>>> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>>>>>>>>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>>>>>>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>>>>>>>>
>>>>> <SNIP>
>>>>>>>>>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>>>>>>>>>
>>>>>>>>>>> Why _noaudit()?  Normally only used when a permission failure is non-fatal to the operation.  Otherwise, we want the audit message.
>>>>>>>
>>>>>>> So far so good, I suggest using the simplest version for v6:
>>>>>>>
>>>>>>> static inline bool perfmon_capable(void)
>>>>>>> {
>>>>>>>       return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
>>>>>>> }
>>>>>>>
>>>>>>> It keeps the implementation simple and readable. The implementation is more
>>>>>>> performant in the sense of calling the API - one capable() call for CAP_PERFMON
>>>>>>> privileged process.
>>>>>>>
>>>>>>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
>>>>>>> but this bloating also advertises and leverages using more secure CAP_PERFMON
>>>>>>> based approach to use perf_event_open system call.
>>>>>>
>>>>>> I can live with that.  We just need to document that when you see both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process, try only allowing CAP_PERFMON first and see if that resolves the issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus CAP_DAC_OVERRIDE.
>>>>>
>>>>> I am trying to reproduce this double logging with CAP_PERFMON.
>>>>> I am using the refpolicy version with enabled perf_event tclass [1], in permissive mode.
>>>>> When running perf stat -a I am observing this AVC audit messages:
>>>>>
>>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { open } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { kernel } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { cpu } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>> type=AVC msg=audit(1581496695.666:8692): avc:  denied  { write } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>>
>>>>> However there is no capability related messages around. I suppose my refpolicy should
>>>>> be modified somehow to observe capability related AVCs.
>>>>>
>>>>> Could you please comment or clarify on how to enable caps related AVCs in order
>>>>> to test the concerned logging.
>>>>
>>>> The new perfmon permission has to be defined in your policy; you'll have a message in dmesg about "Permission perfmon in class capability2 not defined in policy.".  You can either add it to the common cap2 definition in refpolicy/policy/flask/access_vectors and rebuild your policy or extract your base module as CIL, add it there, and insert the updated module.
>>>
>>> Yes, I already have it like this:
>>> common cap2
>>> {
>>> 	mac_override	# unused by SELinux
>>> 	mac_admin
>>> 	syslog
>>> 	wake_alarm
>>> 	block_suspend
>>> 	audit_read
>>> 	perfmon
>>> }
>>>
>>> dmesg stopped reporting perfmon as not defined but audit.log still doesn't report CAP_PERFMON denials.
>>> BTW, audit even doesn't report CAP_SYS_ADMIN denials, however perfmon_capable() does check for it.
>>
>> Some denials may be silenced by dontaudit rules; semodule -DB will strip those and semodule -B will restore them.  Other possibility is that the process doesn't have CAP_PERFMON in its effective set and therefore never reaches SELinux at all; denied first by the capability module.
>
> Also, the fact that your denials are showing up in user_systemd_t suggests that something is off in your policy or userspace/distro; I assume that is a domain type for the systemd --user instance, but your shell and commands shouldn't be running in that domain (user_t would be more appropriate for that).

It is user_t for local terminal session:
ps -Z
LABEL                             PID TTY          TIME CMD
user_u:user_r:user_t            11317 pts/9    00:00:00 bash
user_u:user_r:user_t            11796 pts/9    00:00:00 ps

For local terminal root session:
ps -Z
LABEL                             PID TTY          TIME CMD
user_u:user_r:user_su_t          2926 pts/3    00:00:00 bash
user_u:user_r:user_su_t         10995 pts/3    00:00:00 ps

For remote ssh session:
ps -Z
LABEL                             PID TTY          TIME CMD
user_u:user_r:user_t             7540 pts/8    00:00:00 ps
user_u:user_r:user_systemd_t     8875 pts/8    00:00:00 bash

~Alexey

2020-02-12 17:09:41

by Stephen Smalley

Subject: Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space

On 2/12/20 11:56 AM, Alexey Budankov wrote:
>
>
> On 12.02.2020 18:45, Stephen Smalley wrote:
>> On 2/12/20 10:21 AM, Stephen Smalley wrote:
>>> On 2/12/20 8:53 AM, Alexey Budankov wrote:
>>>> On 12.02.2020 16:32, Stephen Smalley wrote:
>>>>> On 2/12/20 3:53 AM, Alexey Budankov wrote:
>>>>>> Hi Stephen,
>>>>>>
>>>>>> On 22.01.2020 17:07, Stephen Smalley wrote:
>>>>>>> On 1/22/20 5:45 AM, Alexey Budankov wrote:
>>>>>>>>
>>>>>>>> On 21.01.2020 21:27, Alexey Budankov wrote:
>>>>>>>>>
>>>>>>>>> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>>>>>>>>>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>>>>>>>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>>>>>>>>>
>>>>>> <SNIP>
>>>>>>>>>>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>>>>>>>>>>
>>>>>>>>>>>> Why _noaudit()?  Normally only used when a permission failure is non-fatal to the operation.  Otherwise, we want the audit message.
>>>>>>>>
>>>>>>>> So far so good, I suggest using the simplest version for v6:
>>>>>>>>
>>>>>>>> static inline bool perfmon_capable(void)
>>>>>>>> {
>>>>>>>>       return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
>>>>>>>> }
>>>>>>>>
>>>>>>>> It keeps the implementation simple and readable. The implementation is more
>>>>>>>> performant in the sense of calling the API - one capable() call for CAP_PERFMON
>>>>>>>> privileged process.
>>>>>>>>
>>>>>>>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
>>>>>>>> but this bloating also advertises and leverages using more secure CAP_PERFMON
>>>>>>>> based approach to use perf_event_open system call.
>>>>>>>
>>>>>>> I can live with that.  We just need to document that when you see both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process, try only allowing CAP_PERFMON first and see if that resolves the issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus CAP_DAC_OVERRIDE.
>>>>>>
>>>>>> I am trying to reproduce this double logging with CAP_PERFMON.
>>>>>> I am using the refpolicy version with enabled perf_event tclass [1], in permissive mode.
>>>>>> When running perf stat -a I am observing this AVC audit messages:
>>>>>>
>>>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { open } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { kernel } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { cpu } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>>> type=AVC msg=audit(1581496695.666:8692): avc:  denied  { write } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>>>
>>>>>> However there is no capability related messages around. I suppose my refpolicy should
>>>>>> be modified somehow to observe capability related AVCs.
>>>>>>
>>>>>> Could you please comment or clarify on how to enable caps related AVCs in order
>>>>>> to test the concerned logging.
>>>>>
>>>>> The new perfmon permission has to be defined in your policy; you'll have a message in dmesg about "Permission perfmon in class capability2 not defined in policy.".  You can either add it to the common cap2 definition in refpolicy/policy/flask/access_vectors and rebuild your policy or extract your base module as CIL, add it there, and insert the updated module.
>>>>
>>>> Yes, I already have it like this:
>>>> common cap2
>>>> {
>>>>     mac_override    # unused by SELinux
>>>>     mac_admin
>>>>     syslog
>>>>     wake_alarm
>>>>     block_suspend
>>>>     audit_read
>>>>     perfmon
>>>> }
>>>>
>>>> dmesg stopped reporting perfmon as not defined but audit.log still doesn't report CAP_PERFMON denials.
>>>> BTW, audit even doesn't report CAP_SYS_ADMIN denials, however perfmon_capable() does check for it.
>>>
>>> Some denials may be silenced by dontaudit rules; semodule -DB will strip those and semodule -B will restore them.  Other possibility is that the process doesn't have CAP_PERFMON in its effective set and therefore never reaches SELinux at all; denied first by the capability module.
>>
>> Also, the fact that your denials are showing up in user_systemd_t suggests that something is off in your policy or userspace/distro; I assume that is a domain type for the systemd --user instance, but your shell and commands shouldn't be running in that domain (user_t would be more appropriate for that).
>
> It is user_t for local terminal session:
> ps -Z
> LABEL                             PID TTY          TIME CMD
> user_u:user_r:user_t            11317 pts/9    00:00:00 bash
> user_u:user_r:user_t            11796 pts/9    00:00:00 ps
>
> For local terminal root session:
> ps -Z
> LABEL                             PID TTY          TIME CMD
> user_u:user_r:user_su_t          2926 pts/3    00:00:00 bash
> user_u:user_r:user_su_t         10995 pts/3    00:00:00 ps
>
> For remote ssh session:
> ps -Z
> LABEL                             PID TTY          TIME CMD
> user_u:user_r:user_t             7540 pts/8    00:00:00 ps
> user_u:user_r:user_systemd_t     8875 pts/8    00:00:00 bash

That's a bug in either your policy or your userspace/distro integration.
In any event, unless user_systemd_t is allowed all capability2
permissions by your policy, you should see the denials if CAP_PERFMON is
set in the effective capability set of the process.
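
One way to check the precondition above (whether CAP_PERFMON is really in the effective set of the process) is to read the CapEff field of /proc/<pid>/status and test bit 38, the capability number that appears as capability=38 in the AVC records. The following is only a minimal user-space sketch; the helper names cap_mask_has(), parse_capeff_line() and read_capeff() and the *_NR macros are made up for illustration and are not part of the patch set.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Capability numbers; 38 matches "capability=38" in the AVC records. */
#define CAP_SYS_ADMIN_NR 21
#define CAP_PERFMON_NR   38

/* Test one bit of a 64-bit capability mask. */
bool cap_mask_has(uint64_t mask, int cap)
{
    return cap >= 0 && cap < 64 && ((mask >> cap) & 1u);
}

/* Parse a "CapEff:\t<hex>" line; store the mask and return true on success. */
bool parse_capeff_line(const char *line, uint64_t *mask)
{
    unsigned long long value;

    if (strncmp(line, "CapEff:", 7) != 0)
        return false;
    if (sscanf(line + 7, "%llx", &value) != 1)
        return false;
    *mask = value;
    return true;
}

/* Read the effective capability mask from a status file such as
 * /proc/self/status or /proc/<pid>/status. */
bool read_capeff(const char *status_path, uint64_t *mask)
{
    char line[256];
    bool found = false;
    FILE *f = fopen(status_path, "r");

    if (!f)
        return false;
    while (!found && fgets(line, sizeof(line), f))
        found = parse_capeff_line(line, mask);
    fclose(f);
    return found;
}
```

Calling read_capeff("/proc/self/status", &mask) followed by cap_mask_has(mask, CAP_PERFMON_NR) from the process under test would tell whether a capability2 denial can be expected at all; if the bit is clear, the capability module refuses the request before SELinux is consulted.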

2020-02-13 09:05:57

by Alexey Budankov

Subject: Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space


On 12.02.2020 20:09, Stephen Smalley wrote:
> On 2/12/20 11:56 AM, Alexey Budankov wrote:
>>
>>
>> On 12.02.2020 18:45, Stephen Smalley wrote:
>>> On 2/12/20 10:21 AM, Stephen Smalley wrote:
>>>> On 2/12/20 8:53 AM, Alexey Budankov wrote:
>>>>> On 12.02.2020 16:32, Stephen Smalley wrote:
>>>>>> On 2/12/20 3:53 AM, Alexey Budankov wrote:
>>>>>>> Hi Stephen,
>>>>>>>
>>>>>>> On 22.01.2020 17:07, Stephen Smalley wrote:
>>>>>>>> On 1/22/20 5:45 AM, Alexey Budankov wrote:
>>>>>>>>>
>>>>>>>>> On 21.01.2020 21:27, Alexey Budankov wrote:
>>>>>>>>>>
>>>>>>>>>> On 21.01.2020 20:55, Alexei Starovoitov wrote:
>>>>>>>>>>> On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 21.01.2020 17:43, Stephen Smalley wrote:
>>>>>>>>>>>>> On 1/20/20 6:23 AM, Alexey Budankov wrote:
>>>>>>>>>>>>>>
>>>>>>> <SNIP>
>>>>>>>>>>>>>> Introduce CAP_PERFMON capability designed to secure system performance
>>>>>>>>>>>>>
>>>>>>>>>>>>> Why _noaudit()?  Normally only used when a permission failure is non-fatal to the operation.  Otherwise, we want the audit message.
>>>>>>>>>
>>>>>>>>> So far so good, I suggest using the simplest version for v6:
>>>>>>>>>
>>>>>>>>> static inline bool perfmon_capable(void)
>>>>>>>>> {
>>>>>>>>>        return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> It keeps the implementation simple and readable. The implementation is more
>>>>>>>>> performant in the sense of calling the API - one capable() call for CAP_PERFMON
>>>>>>>>> privileged process.
>>>>>>>>>
>>>>>>>>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
>>>>>>>>> but this bloating also advertises and leverages using more secure CAP_PERFMON
>>>>>>>>> based approach to use perf_event_open system call.
>>>>>>>>
>>>>>>>> I can live with that.  We just need to document that when you see both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process, try only allowing CAP_PERFMON first and see if that resolves the issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus CAP_DAC_OVERRIDE.
>>>>>>>
>>>>>>> I am trying to reproduce this double logging with CAP_PERFMON.
>>>>>>> I am using the refpolicy version with enabled perf_event tclass [1], in permissive mode.
>>>>>>> When running perf stat -a I am observing this AVC audit messages:
>>>>>>>
>>>>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { open } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { kernel } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>>>> type=AVC msg=audit(1581496695.666:8691): avc:  denied  { cpu } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>>>> type=AVC msg=audit(1581496695.666:8692): avc:  denied  { write } for  pid=2779 comm="perf" scontext=user_u:user_r:user_systemd_t tcontext=user_u:user_r:user_systemd_t tclass=perf_event permissive=1
>>>>>>>
>>>>>>> However there is no capability related messages around. I suppose my refpolicy should
>>>>>>> be modified somehow to observe capability related AVCs.
>>>>>>>
>>>>>>> Could you please comment or clarify on how to enable caps related AVCs in order
>>>>>>> to test the concerned logging.
>>>>>>
>>>>>> The new perfmon permission has to be defined in your policy; you'll have a message in dmesg about "Permission perfmon in class capability2 not defined in policy.".  You can either add it to the common cap2 definition in refpolicy/policy/flask/access_vectors and rebuild your policy or extract your base module as CIL, add it there, and insert the updated module.
>>>>>
>>>>> Yes, I already have it like this:
>>>>> common cap2
>>>>> {
>>>>>     mac_override    # unused by SELinux
>>>>>     mac_admin
>>>>>     syslog
>>>>>     wake_alarm
>>>>>     block_suspend
>>>>>     audit_read
>>>>>     perfmon
>>>>> }
>>>>>
>>>>> dmesg stopped reporting perfmon as not defined but audit.log still doesn't report CAP_PERFMON denials.
>>>>> BTW, audit even doesn't report CAP_SYS_ADMIN denials, however perfmon_capable() does check for it.
>>>>
>>>> Some denials may be silenced by dontaudit rules; semodule -DB will strip those and semodule -B will restore them.  Other possibility is that the process doesn't have CAP_PERFMON in its effective set and therefore never reaches SELinux at all; denied first by the capability module.
>>>
>>> Also, the fact that your denials are showing up in user_systemd_t suggests that something is off in your policy or userspace/distro; I assume that is a domain type for the systemd --user instance, but your shell and commands shouldn't be running in that domain (user_t would be more appropriate for that).
>>
>> It is user_t for local terminal session:
>> ps -Z
>> LABEL                             PID TTY          TIME CMD
>> user_u:user_r:user_t            11317 pts/9    00:00:00 bash
>> user_u:user_r:user_t            11796 pts/9    00:00:00 ps
>>
>> For local terminal root session:
>> ps -Z
>> LABEL                             PID TTY          TIME CMD
>> user_u:user_r:user_su_t          2926 pts/3    00:00:00 bash
>> user_u:user_r:user_su_t         10995 pts/3    00:00:00 ps
>>
>> For remote ssh session:
>> ps -Z
>> LABEL                             PID TTY          TIME CMD
>> user_u:user_r:user_t             7540 pts/8    00:00:00 ps
>> user_u:user_r:user_systemd_t     8875 pts/8    00:00:00 bash
>
> That's a bug in either your policy or your userspace/distro integration.  In any event, unless user_systemd_t is allowed all capability2 permissions by your policy, you should see the denials if CAP_PERFMON is set in the effective capability set of the process.
>

That all seems to be true. After instrumenting the kernel, rebuilding and rebooting, in the CAP_PERFMON case:

$ getcap perf
perf = cap_sys_ptrace,cap_syslog,cap_perfmon+ep

$ perf stat -a

type=AVC msg=audit(1581580399.165:784): avc: denied { open } for pid=8859 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1
type=AVC msg=audit(1581580399.165:785): avc: denied { perfmon } for pid=8859 comm="perf" capability=38 scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=capability2 permissive=1
type=AVC msg=audit(1581580399.165:786): avc: denied { kernel } for pid=8859 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1
type=AVC msg=audit(1581580399.165:787): avc: denied { cpu } for pid=8859 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1
type=AVC msg=audit(1581580399.165:788): avc: denied { write } for pid=8859 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1
type=AVC msg=audit(1581580408.078:791): avc: denied { read } for pid=8859 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1

dmesg:

[ 137.877713] security_capable(0000000071f7ee6e, 000000009dd7a5fc, CAP_PERFMON, 0) = ?
[ 137.877774] cread_has_capability(CAP_PERFMON) = 0
[ 137.877775] prior avc_audit(CAP_PERFMON)
[ 137.877779] security_capable(0000000071f7ee6e, 000000009dd7a5fc, CAP_PERFMON, 0) = 0

[ 137.877784] security_capable(0000000071f7ee6e, 000000009dd7a5fc, CAP_PERFMON, 0) = ?
[ 137.877785] cread_has_capability(CAP_PERFMON) = 0
[ 137.877786] security_capable(0000000071f7ee6e, 000000009dd7a5fc, CAP_PERFMON, 0) = 0

[ 137.877794] security_capable(0000000071f7ee6e, 000000009dd7a5fc, CAP_PERFMON, 0) = ?
[ 137.877795] cread_has_capability(CAP_PERFMON) = 0
[ 137.877796] security_capable(0000000071f7ee6e, 000000009dd7a5fc, CAP_PERFMON, 0) = 0

...

In the CAP_SYS_ADMIN case:

$ getcap perf
perf = cap_sys_ptrace,cap_sys_admin,cap_syslog+ep

$ perf stat -a

type=AVC msg=audit(1581580747.928:835): avc: denied { open } for pid=8927 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1
type=AVC msg=audit(1581580747.928:836): avc: denied { cpu } for pid=8927 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1
type=AVC msg=audit(1581580747.928:837): avc: denied { kernel } for pid=8927 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1
type=AVC msg=audit(1581580747.928:838): avc: denied { read } for pid=8927 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1
type=AVC msg=audit(1581580747.928:839): avc: denied { write } for pid=8927 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1
...

$ perf record -- ls
...
type=AVC msg=audit(1581580747.930:843): avc: denied { sys_ptrace } for pid=8927 comm="perf" capability=19 scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=capability permissive=1
...

dmesg:

[ 276.714266] security_capable(000000006b09ad8a, 000000009dd7a5fc, CAP_PERFMON, 0) = ?
[ 276.714268] security_capable(000000006b09ad8a, 000000009dd7a5fc, CAP_PERFMON, 0) = -1

[ 276.714269] security_capable(000000006b09ad8a, 000000009dd7a5fc, CAP_SYS_ADMIN, 0) = ?
[ 276.714270] cread_has_capability(CAP_SYS_ADMIN) = 0
[ 276.714270] security_capable(000000006b09ad8a, 000000009dd7a5fc, CAP_SYS_ADMIN, 0) = 0

[ 276.714287] security_capable(000000006b09ad8a, 000000009dd7a5fc, CAP_PERFMON, 0) = ?
[ 276.714287] security_capable(000000006b09ad8a, 000000009dd7a5fc, CAP_PERFMON, 0) = -1

[ 276.714288] security_capable(000000006b09ad8a, 000000009dd7a5fc, CAP_SYS_ADMIN, 0) = ?
[ 276.714288] cread_has_capability(CAP_SYS_ADMIN) = 0
[ 276.714289] security_capable(000000006b09ad8a, 000000009dd7a5fc, CAP_SYS_ADMIN, 0) = 0

[ 276.714294] security_capable(000000006b09ad8a, 000000009dd7a5fc, CAP_PERFMON, 0) = ?
[ 276.714295] security_capable(000000006b09ad8a, 000000009dd7a5fc, CAP_PERFMON, 0) = -1

[ 276.714295] security_capable(000000006b09ad8a, 000000009dd7a5fc, CAP_SYS_ADMIN, 0) = ?
[ 276.714296] cread_has_capability(CAP_SYS_ADMIN) = 0
[ 276.714296] security_capable(000000006b09ad8a, 000000009dd7a5fc, CAP_SYS_ADMIN, 0) = 0

...

In the unprivileged case:

$ getcap perf
perf =

$ perf stat -a; perf record -a

...

dmesg:

[ 947.275611] security_capable(00000000d3a75377, 000000009dd7a5fc, CAP_PERFMON, 0) = ?
[ 947.275613] security_capable(00000000d3a75377, 000000009dd7a5fc, CAP_PERFMON, 0) = -1

[ 947.275614] security_capable(00000000d3a75377, 000000009dd7a5fc, CAP_SYS_ADMIN, 0) = ?
[ 947.275615] security_capable(00000000d3a75377, 000000009dd7a5fc, CAP_SYS_ADMIN, 0) = -1

[ 947.275636] security_capable(00000000d3a75377, 000000009dd7a5fc, CAP_PERFMON, 0) = ?
[ 947.275637] security_capable(00000000d3a75377, 000000009dd7a5fc, CAP_PERFMON, 0) = -1

[ 947.275638] security_capable(00000000d3a75377, 000000009dd7a5fc, CAP_SYS_ADMIN, 0) = ?
[ 947.275638] security_capable(00000000d3a75377, 000000009dd7a5fc, CAP_SYS_ADMIN, 0) = -1

...

So it looks like CAP_PERFMON and CAP_SYS_ADMIN are never logged by AVC simultaneously
with the current LSM and perfmon_capable() implementations.

If perfmon is granted:
perfmon is not logged by capabilities, perfmon is logged by AVC,
and perfmon_capable() does not check sys_admin.

If perfmon is not granted but sys_admin is granted:
perfmon is not logged by capabilities, AVC logging is not called for perfmon,
sys_admin is not logged by capabilities, and sys_admin is not logged by AVC. Is that intended for some reason?

If no caps are granted:
AVC logging is not called for either perfmon or sys_admin.
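
The three cases above can be reproduced with a small user-space model of the check order. This is only a sketch under the assumptions visible in the dmesg traces: a capability missing from the effective set is refused before SELinux is consulted, and the || in perfmon_capable() short-circuits. The names model_capable(), model_perfmon_capable() and struct cap_trace are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define CAP_SYS_ADMIN_NR 21
#define CAP_PERFMON_NR   38
#define CAP_BIT(nr)      (UINT64_C(1) << (nr))

/* Which capabilities were asked for, and which of them got as far as SELinux. */
struct cap_trace {
    uint64_t checked;
    uint64_t reached_selinux;
};

/* Model of capable(): the capability module decides first. */
bool model_capable(uint64_t effective, int cap, struct cap_trace *t)
{
    t->checked |= CAP_BIT(cap);
    if (!(effective & CAP_BIT(cap)))
        return false;               /* refused before SELinux is consulted */
    t->reached_selinux |= CAP_BIT(cap);
    return true;                    /* permissive mode: access is allowed */
}

/* Same shape as the proposed perfmon_capable(): CAP_PERFMON first,
 * CAP_SYS_ADMIN only if the first check fails. */
bool model_perfmon_capable(uint64_t effective, struct cap_trace *t)
{
    return model_capable(effective, CAP_PERFMON_NR, t) ||
           model_capable(effective, CAP_SYS_ADMIN_NR, t);
}
```

In this model CAP_PERFMON reaches SELinux only when it is in the effective set, and in exactly that case the second check is skipped, so a single perfmon_capable() call never brings both capabilities to AVC.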

BTW, is there a way to drop the AV cache so that denials would appear in audit on the next AV access?

Well, I guess you initially had in mind a case similar to this (note that the audit ids are not the same but the pids are):

type=AVC msg=audit(1581580399.165:784): avc: denied { open } for pid=8859 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1
type=AVC msg=audit(1581580399.165:785): avc: denied { perfmon } for pid=8859 comm="perf" capability=38 scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=capability2 permissive=1
type=AVC msg=audit( . : ): avc: denied { sys_admin } for pid=8859 comm="perf" capability=21 scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=capability2 permissive=1
type=AVC msg=audit(1581580399.165:786): avc: denied { kernel } for pid=8859 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1
type=AVC msg=audit(1581580399.165:787): avc: denied { cpu } for pid=8859 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1
type=AVC msg=audit(1581580399.165:788): avc: denied { write } for pid=8859 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1
type=AVC msg=audit(1581580408.078:791): avc: denied { read } for pid=8859 comm="perf" scontext=user_u:user_r:user_t tcontext=user_u:user_r:user_t tclass=perf_event permissive=1

So the message could be like this:

"If audit logs for a process using perf_events related syscalls, i.e. perf_event_open(), read(), write(),
ioctl(), mmap(), contain denials for both CAP_PERFMON and CAP_SYS_ADMIN capabilities, then providing the
process with the CAP_PERFMON capability alone is the secure, preferred approach to resolving access denials
for performance monitoring and observability operations."
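
The condition described in that note could be detected mechanically. The sketch below scans AVC records of the form shown above and reports whether one pid has denials for both perfmon and sys_admin; it relies on plain substring matching rather than a real audit parser, and the function names avc_denies() and denied_both() are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if an AVC record is a denial of `perm` for process `pid`. */
bool avc_denies(const char *record, const char *perm, int pid)
{
    char needle[64];

    if (!strstr(record, "avc:") || !strstr(record, "denied"))
        return false;
    snprintf(needle, sizeof(needle), "{ %s }", perm);
    if (!strstr(record, needle))
        return false;
    /* Surrounding spaces keep pid=885 from matching pid=8859. */
    snprintf(needle, sizeof(needle), " pid=%d ", pid);
    return strstr(record, needle) != NULL;
}

/* True if the records hold both a perfmon and a sys_admin denial for `pid`. */
bool denied_both(const char *const records[], size_t count, int pid)
{
    bool perfmon = false, sys_admin = false;

    for (size_t i = 0; i < count; i++) {
        perfmon   = perfmon   || avc_denies(records[i], "perfmon", pid);
        sys_admin = sys_admin || avc_denies(records[i], "sys_admin", pid);
    }
    return perfmon && sys_admin;
}
```

When denied_both() returns true for a process, the note applies: try granting CAP_PERFMON alone first.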

~Alexey

2020-02-20 13:06:04

by Alexey Budankov

Subject: Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space


On 07.02.2020 16:39, Alexey Budankov wrote:
>
> On 07.02.2020 14:38, Thomas Gleixner wrote:
>> Alexey Budankov <[email protected]> writes:
>>> On 22.01.2020 17:25, Alexey Budankov wrote:
>>>> On 22.01.2020 17:07, Stephen Smalley wrote:
>>>>>> It keeps the implementation simple and readable. The implementation is more
>>>>>> performant in the sense of calling the API - one capable() call for CAP_PERFMON
>>>>>> privileged process.
>>>>>>
>>>>>> Yes, it bloats audit log for CAP_SYS_ADMIN privileged and unprivileged processes,
>>>>>> but this bloating also advertises and leverages using more secure CAP_PERFMON
>>>>>> based approach to use perf_event_open system call.
>>>>>
>>>>> I can live with that.  We just need to document that when you see
>>>>> both a CAP_PERFMON and a CAP_SYS_ADMIN audit message for a process,
>>>>> try only allowing CAP_PERFMON first and see if that resolves the
>>>>> issue.  We have a similar issue with CAP_DAC_READ_SEARCH versus
>>>>> CAP_DAC_OVERRIDE.
>>>>
>>>> perf security [1] document can be updated, at least, to align and document
>>>> this audit logging specifics.
>>>
>>> And I plan to update the document right after this patch set is accepted.
>>> Feel free to let me know of the places in the kernel docs that also
>>> require update w.r.t CAP_PERFMON extension.
>>
>> The documentation update wants be part of the patch set and not planned
>> to be done _after_ the patch set is merged.
>
> Well, accepted. It is going to make patches #11 and beyond.

Patches #11 and #12 of v7 [1] document the intent and usage of CAP_PERFMON.
The man-pages patch [2] extends the perf_event_open.2 documentation.

Thanks,
Alexey

---
[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://lore.kernel.org/lkml/[email protected]/