2018-09-19 12:29:41

by Tvrtko Ursulin

[permalink] [raw]
Subject: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

From: Tvrtko Ursulin <[email protected]>

For situations where sysadmins might want to allow different level of
access control for different PMUs, we start creating per-PMU
perf_event_paranoid controls in sysfs.

These work in equivalent fashion as the existing perf_event_paranoid
sysctl, which now becomes the parent control for each PMU.

On PMU registration the global/parent value will be inherited by each PMU,
as it will be propagated to all registered PMUs when the sysctl is
updated.

At any later point individual PMU access controls, located in
<sysfs>/device/<pmu-name>/perf_event_paranoid, can be adjusted to achieve
fine grained access control.

Discussion from previous posting:
https://lkml.org/lkml/2018/5/21/156

New in this version:

1. Hopefully fixed build issues in core-book3s.c. (Thanks kbuild test robot!)
2. Perf tool support for new controls, largely contributed by Jiri Olsa.

Tvrtko Ursulin (5):
perf: Move some access checks later in perf_event_open
perf: Pass pmu pointer to perf_paranoid_* helpers
perf: Allow per PMU access control
perf Documentation: Document the per PMU perf_event_paranoid interface
tools/perf: Add support for per-PMU access control

.../sysfs-bus-event_source-devices-events | 14 +++
arch/powerpc/perf/core-book3s.c | 31 ++++--
arch/x86/events/intel/bts.c | 2 +-
arch/x86/events/intel/core.c | 2 +-
arch/x86/events/intel/p4.c | 2 +-
include/linux/perf_event.h | 18 ++-
kernel/events/core.c | 104 +++++++++++++++---
kernel/sysctl.c | 4 +-
kernel/trace/trace_event_perf.c | 6 +-
tools/perf/arch/arm/util/cs-etm.c | 2 +-
tools/perf/arch/arm64/util/arm-spe.c | 2 +-
tools/perf/arch/x86/util/intel-bts.c | 2 +-
tools/perf/arch/x86/util/intel-pt.c | 2 +-
tools/perf/util/evsel.c | 41 ++++++-
tools/perf/util/pmu.c | 17 +++
tools/perf/util/pmu.h | 1 +
16 files changed, 204 insertions(+), 46 deletions(-)

--
2.17.1



2018-09-19 12:28:37

by Tvrtko Ursulin

[permalink] [raw]
Subject: [RFC 4/5] perf Documentation: Document the per PMU perf_event_paranoid interface

From: Tvrtko Ursulin <[email protected]>

Explain behaviour of the new control knob along side the existing perf
event documentation.

Signed-off-by: Tvrtko Ursulin <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Alexey Budankov <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
.../testing/sysfs-bus-event_source-devices-events | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-events b/Documentation/ABI/testing/sysfs-bus-event_source-devices-events
index 505f080d20a1..b3096ae9be6f 100644
--- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-events
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-events
@@ -92,3 +92,17 @@ Description: Perf event scaling factors

This is provided to avoid performing floating point arithmetic
in the kernel.
+
+What: /sys/bus/event_source/devices/<pmu>/perf_event_paranoid
+Date: 2018/06/26
+Contact: Linux kernel mailing list <[email protected]>
+Description: Per PMU access control file
+
+ This is the per PMU version of the perf_event_paranoid sysctl
+ which allows controlling the access control level for each
+ individual PMU.
+
+ Writes to the sysctl will propagate to all PMU providers.
+
+ For explanation of supported values please consult the sysctl
+ documentation.
--
2.17.1


2018-09-19 12:28:40

by Tvrtko Ursulin

[permalink] [raw]
Subject: [RFC 5/5] tools/perf: Add support for per-PMU access control

From: Tvrtko Ursulin <[email protected]>

Now that the perf core supports per-PMU paranoid settings we need to
extend the tool to support that.

We handle the per-PMU setting in the platform support code where
applicable and also notify the user of the new facility on failures to
open the event.

Thanks to Jiri Olsa for contributing most of the "meat" for this patch and
suggestion to improve the error banner as well.

Signed-off-by: Tvrtko Ursulin <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Alexey Budankov <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
tools/perf/arch/arm/util/cs-etm.c | 2 +-
tools/perf/arch/arm64/util/arm-spe.c | 2 +-
tools/perf/arch/x86/util/intel-bts.c | 2 +-
tools/perf/arch/x86/util/intel-pt.c | 2 +-
tools/perf/util/evsel.c | 41 ++++++++++++++++++++++++++--
tools/perf/util/pmu.c | 17 ++++++++++++
tools/perf/util/pmu.h | 1 +
7 files changed, 60 insertions(+), 7 deletions(-)

diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
index 2f595cd73da6..a5437f100ab9 100644
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@@ -69,7 +69,7 @@ static int cs_etm_recording_options(struct auxtrace_record *itr,
struct perf_pmu *cs_etm_pmu = ptr->cs_etm_pmu;
struct perf_evsel *evsel, *cs_etm_evsel = NULL;
const struct cpu_map *cpus = evlist->cpus;
- bool privileged = (geteuid() == 0 || perf_event_paranoid() < 0);
+ bool privileged = (geteuid() == 0 || cs_etm_pmu->paranoid < 0);

ptr->evlist = evlist;
ptr->snapshot_mode = opts->auxtrace_snapshot_mode;
diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
index 5ccfce87e693..e7e8154757fa 100644
--- a/tools/perf/arch/arm64/util/arm-spe.c
+++ b/tools/perf/arch/arm64/util/arm-spe.c
@@ -65,7 +65,7 @@ static int arm_spe_recording_options(struct auxtrace_record *itr,
container_of(itr, struct arm_spe_recording, itr);
struct perf_pmu *arm_spe_pmu = sper->arm_spe_pmu;
struct perf_evsel *evsel, *arm_spe_evsel = NULL;
- bool privileged = geteuid() == 0 || perf_event_paranoid() < 0;
+ bool privileged = geteuid() == 0 || arm_spe_pmu->paranoid < 0;
struct perf_evsel *tracking_evsel;
int err;

diff --git a/tools/perf/arch/x86/util/intel-bts.c b/tools/perf/arch/x86/util/intel-bts.c
index 781df40b2966..c97e6556c8e7 100644
--- a/tools/perf/arch/x86/util/intel-bts.c
+++ b/tools/perf/arch/x86/util/intel-bts.c
@@ -116,7 +116,7 @@ static int intel_bts_recording_options(struct auxtrace_record *itr,
struct perf_pmu *intel_bts_pmu = btsr->intel_bts_pmu;
struct perf_evsel *evsel, *intel_bts_evsel = NULL;
const struct cpu_map *cpus = evlist->cpus;
- bool privileged = geteuid() == 0 || perf_event_paranoid() < 0;
+ bool privileged = geteuid() == 0 || intel_bts_pmu->paranoid < 0;

btsr->evlist = evlist;
btsr->snapshot_mode = opts->auxtrace_snapshot_mode;
diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
index db0ba8caf5a2..ffbe5f7f1c57 100644
--- a/tools/perf/arch/x86/util/intel-pt.c
+++ b/tools/perf/arch/x86/util/intel-pt.c
@@ -555,7 +555,7 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
bool have_timing_info, need_immediate = false;
struct perf_evsel *evsel, *intel_pt_evsel = NULL;
const struct cpu_map *cpus = evlist->cpus;
- bool privileged = geteuid() == 0 || perf_event_paranoid() < 0;
+ bool privileged = geteuid() == 0 || intel_pt_pmu->paranoid < 0;
u64 tsc_bit;
int err;

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 1a61628a1c12..fbebce0593f4 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -9,6 +9,7 @@

#include <byteswap.h>
#include <errno.h>
+#include <fcntl.h>
#include <inttypes.h>
#include <linux/bitops.h>
#include <api/fs/fs.h>
@@ -20,6 +21,7 @@
#include <linux/err.h>
#include <sys/ioctl.h>
#include <sys/resource.h>
+#include <sys/stat.h>
#include <sys/types.h>
#include <dirent.h>
#include "asm/bug.h"
@@ -2843,10 +2845,37 @@ static bool find_process(const char *name)
return ret ? false : true;
}

+static int __pmu_paranoid_value(const char *name, char *buf, int bufsz)
+{
+ char path[PATH_MAX];
+ int fd, ret;
+
+ ret = snprintf(path, sizeof(path),
+ "%s/bus/event_source/devices/%s/perf_event_paranoid",
+ sysfs__mountpoint(), name);
+ if (ret < 0 || ret == sizeof(path))
+ return -1;
+
+ fd = open(path, O_RDONLY);
+ if (fd < 0)
+ return -1;
+
+ ret = read(fd, buf, bufsz - 1);
+ if (ret <= 0)
+ return -1;
+
+ if (buf[ret - 1] == '\n')
+ buf[ret - 1] = 0;
+ else
+ buf[ret] = 0;
+
+ return 0;
+}
+
int perf_evsel__open_strerror(struct perf_evsel *evsel, struct target *target,
int err, char *msg, size_t size)
{
- char sbuf[STRERR_BUFSIZE];
+ char sbuf[STRERR_BUFSIZE], buf[4];
int printed = 0;

switch (err) {
@@ -2870,9 +2899,15 @@ int perf_evsel__open_strerror(struct perf_evsel *evsel, struct target *target,
">= 1: Disallow CPU event access by users without CAP_SYS_ADMIN\n"
">= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN\n\n"
"To make this setting permanent, edit /etc/sysctl.conf too, e.g.:\n\n"
- " kernel.perf_event_paranoid = -1\n" ,
+ " kernel.perf_event_paranoid = -1\n\n"
+ "Alternatively an identical per PMU setting can be found and adjusted at\n"
+ "/sys/bus/event_source/devices/%s/perf_event_paranoid for fine-grained\n"
+ "access control. The current value is '%s'.\n",
target->system_wide ? "system-wide " : "",
- perf_event_paranoid());
+ perf_event_paranoid(),
+ evsel->pmu_name,
+ __pmu_paranoid_value(evsel->pmu_name, buf,
+ sizeof(buf)) ? "-" : buf);
case ENOENT:
return scnprintf(msg, size, "The %s event is not supported.",
perf_evsel__name(evsel));
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index afd68524ffa9..5ecead4969af 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -545,6 +545,22 @@ static int pmu_type(const char *name, __u32 *type)
return ret;
}

+static int pmu_paranoid(const char *name)
+{
+ char path[PATH_MAX];
+ int ret, paranoid;
+
+ ret = snprintf(path, sizeof(path),
+ EVENT_SOURCE_DEVICE_PATH "%s/perf_event_paranoid",
+ name);
+
+ if (ret > 0 && ret < (int)sizeof(path) &&
+ !sysfs__read_int(path, &paranoid))
+ return paranoid;
+
+ return perf_event_paranoid();
+}
+
/* Add all pmus in sysfs to pmu list: */
static void pmu_read_sysfs(void)
{
@@ -825,6 +841,7 @@ static struct perf_pmu *pmu_lookup(const char *name)
pmu->name = strdup(name);
pmu->type = type;
pmu->is_uncore = pmu_is_uncore(name);
+ pmu->paranoid = pmu_paranoid(name);
pmu_add_cpu_aliases(&aliases, pmu);

INIT_LIST_HEAD(&pmu->format);
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 76fecec7b3f9..1f00c8bbdb90 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -30,6 +30,7 @@ struct perf_pmu {
struct list_head aliases; /* HEAD struct perf_pmu_alias -> list */
struct list_head list; /* ELEM */
int (*set_drv_config) (struct perf_evsel_config_term *term);
+ int paranoid;
};

struct perf_pmu_info {
--
2.17.1


2018-09-19 12:29:04

by Tvrtko Ursulin

[permalink] [raw]
Subject: [RFC 2/5] perf: Pass pmu pointer to perf_paranoid_* helpers

From: Tvrtko Ursulin <[email protected]>

To enable per-PMU access controls in a following patch we need to start
passing in the PMU object pointer to perf_paranoid_* helpers.

This patch only changes the API across the code base without changing the
behaviour.

v2:
* Correct errors in core-book3s.c as reported by kbuild test robot.

Signed-off-by: Tvrtko Ursulin <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Alexey Budankov <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
arch/powerpc/perf/core-book3s.c | 31 ++++++++++++++++++++++---------
arch/x86/events/intel/bts.c | 2 +-
arch/x86/events/intel/core.c | 2 +-
arch/x86/events/intel/p4.c | 2 +-
include/linux/perf_event.h | 6 +++---
kernel/events/core.c | 15 ++++++++-------
kernel/trace/trace_event_perf.c | 6 ++++--
7 files changed, 40 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 81f8a0c838ae..1e8b1aed6e81 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -95,7 +95,13 @@ static inline unsigned long perf_ip_adjust(struct pt_regs *regs)
{
return 0;
}
-static inline void perf_get_data_addr(struct pt_regs *regs, u64 *addrp) { }
+
+static inline void
+perf_get_data_addr(struct pmu *pmu, struct pt_regs *regs, u64 *addrp)
+{
+
+}
+
static inline u32 perf_get_misc_flags(struct pt_regs *regs)
{
return 0;
@@ -126,7 +132,13 @@ static unsigned long ebb_switch_in(bool ebb, struct cpu_hw_events *cpuhw)
static inline void power_pmu_bhrb_enable(struct perf_event *event) {}
static inline void power_pmu_bhrb_disable(struct perf_event *event) {}
static void power_pmu_sched_task(struct perf_event_context *ctx, bool sched_in) {}
-static inline void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw) {}
+
+static inline void
+power_pmu_bhrb_read(struct pmu *pmu,struct cpu_hw_events *cpuhw)
+{
+
+}
+
static void pmao_restore_workaround(bool ebb) { }
#endif /* CONFIG_PPC32 */

@@ -170,7 +182,8 @@ static inline unsigned long perf_ip_adjust(struct pt_regs *regs)
* pointed to by SIAR; this is indicated by the [POWER6_]MMCRA_SDSYNC, the
* [POWER7P_]MMCRA_SDAR_VALID bit in MMCRA, or the SDAR_VALID bit in SIER.
*/
-static inline void perf_get_data_addr(struct pt_regs *regs, u64 *addrp)
+static inline void
+perf_get_data_addr(struct pmu *pmu, struct pt_regs *regs, u64 *addrp)
{
unsigned long mmcra = regs->dsisr;
bool sdar_valid;
@@ -195,7 +208,7 @@ static inline void perf_get_data_addr(struct pt_regs *regs, u64 *addrp)
if (!(mmcra & MMCRA_SAMPLE_ENABLE) || sdar_valid)
*addrp = mfspr(SPRN_SDAR);

- if (perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN) &&
+ if (perf_paranoid_kernel(pmu) && !capable(CAP_SYS_ADMIN) &&
is_kernel_addr(mfspr(SPRN_SDAR)))
*addrp = 0;
}
@@ -435,7 +448,7 @@ static __u64 power_pmu_bhrb_to(u64 addr)
}

/* Processing BHRB entries */
-static void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
+static void power_pmu_bhrb_read(struct pmu *pmu, struct cpu_hw_events *cpuhw)
{
u64 val;
u64 addr;
@@ -463,8 +476,8 @@ static void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
* exporting it to userspace (avoid exposure of regions
* where we could have speculative execution)
*/
- if (perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN) &&
- is_kernel_addr(addr))
+ if (perf_paranoid_kernel(pmu) &&
+ !capable(CAP_SYS_ADMIN) && is_kernel_addr(addr))
continue;

/* Branches are read most recent first (ie. mfbhrb 0 is
@@ -2066,12 +2079,12 @@ static void record_and_restart(struct perf_event *event, unsigned long val,

if (event->attr.sample_type &
(PERF_SAMPLE_ADDR | PERF_SAMPLE_PHYS_ADDR))
- perf_get_data_addr(regs, &data.addr);
+ perf_get_data_addr(event->pmu, regs, &data.addr);

if (event->attr.sample_type & PERF_SAMPLE_BRANCH_STACK) {
struct cpu_hw_events *cpuhw;
cpuhw = this_cpu_ptr(&cpu_hw_events);
- power_pmu_bhrb_read(cpuhw);
+ power_pmu_bhrb_read(event->pmu, cpuhw);
data.br_stack = &cpuhw->bhrb_stack;
}

diff --git a/arch/x86/events/intel/bts.c b/arch/x86/events/intel/bts.c
index 24ffa1e88cf9..e416c9e2400a 100644
--- a/arch/x86/events/intel/bts.c
+++ b/arch/x86/events/intel/bts.c
@@ -555,7 +555,7 @@ static int bts_event_init(struct perf_event *event)
* Note that the default paranoia setting permits unprivileged
* users to profile the kernel.
*/
- if (event->attr.exclude_kernel && perf_paranoid_kernel() &&
+ if (event->attr.exclude_kernel && perf_paranoid_kernel(event->pmu) &&
!capable(CAP_SYS_ADMIN))
return -EACCES;

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 035c37481f57..40ccb4dbbadf 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3033,7 +3033,7 @@ static int intel_pmu_hw_config(struct perf_event *event)
if (x86_pmu.version < 3)
return -EINVAL;

- if (perf_paranoid_cpu() && !capable(CAP_SYS_ADMIN))
+ if (perf_paranoid_cpu(event->pmu) && !capable(CAP_SYS_ADMIN))
return -EACCES;

event->hw.config |= ARCH_PERFMON_EVENTSEL_ANY;
diff --git a/arch/x86/events/intel/p4.c b/arch/x86/events/intel/p4.c
index d32c0eed38ca..878451ef1ace 100644
--- a/arch/x86/events/intel/p4.c
+++ b/arch/x86/events/intel/p4.c
@@ -776,7 +776,7 @@ static int p4_validate_raw_event(struct perf_event *event)
* the user needs special permissions to be able to use it
*/
if (p4_ht_active() && p4_event_bind_map[v].shared) {
- if (perf_paranoid_cpu() && !capable(CAP_SYS_ADMIN))
+ if (perf_paranoid_cpu(event->pmu) && !capable(CAP_SYS_ADMIN))
return -EACCES;
}

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 53c500f0ca79..22906bcc1bcd 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1179,17 +1179,17 @@ extern int perf_cpu_time_max_percent_handler(struct ctl_table *table, int write,
int perf_event_max_stack_handler(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos);

-static inline bool perf_paranoid_tracepoint_raw(void)
+static inline bool perf_paranoid_tracepoint_raw(const struct pmu *pmu)
{
return sysctl_perf_event_paranoid > -1;
}

-static inline bool perf_paranoid_cpu(void)
+static inline bool perf_paranoid_cpu(const struct pmu *pmu)
{
return sysctl_perf_event_paranoid > 0;
}

-static inline bool perf_paranoid_kernel(void)
+static inline bool perf_paranoid_kernel(const struct pmu *pmu)
{
return sysctl_perf_event_paranoid > 1;
}
diff --git a/kernel/events/core.c b/kernel/events/core.c
index adcd9eae13fb..f556144bc0c5 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4108,7 +4108,7 @@ find_get_context(struct pmu *pmu, struct task_struct *task,

if (!task) {
/* Must be root to operate on a CPU event: */
- if (perf_paranoid_cpu() && !capable(CAP_SYS_ADMIN))
+ if (perf_paranoid_cpu(pmu) && !capable(CAP_SYS_ADMIN))
return ERR_PTR(-EACCES);

cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
@@ -5676,7 +5676,7 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
lock_limit >>= PAGE_SHIFT;
locked = vma->vm_mm->pinned_vm + extra;

- if ((locked > lock_limit) && perf_paranoid_tracepoint_raw() &&
+ if ((locked > lock_limit) && perf_paranoid_tracepoint_raw(event->pmu) &&
!capable(CAP_IPC_LOCK)) {
ret = -EPERM;
goto unlock;
@@ -10492,8 +10492,10 @@ SYSCALL_DEFINE5(perf_event_open,
goto err_cred;
}

+ pmu = event->pmu;
+
if (!attr.exclude_kernel) {
- if (perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN)) {
+ if (perf_paranoid_kernel(pmu) && !capable(CAP_SYS_ADMIN)) {
err = -EACCES;
goto err_alloc;
}
@@ -10501,7 +10503,7 @@ SYSCALL_DEFINE5(perf_event_open,

/* Only privileged users can get physical addresses */
if ((attr.sample_type & PERF_SAMPLE_PHYS_ADDR) &&
- perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN)) {
+ perf_paranoid_kernel(pmu) && !capable(CAP_SYS_ADMIN)) {
err = -EACCES;
goto err_alloc;
}
@@ -10509,13 +10511,13 @@ SYSCALL_DEFINE5(perf_event_open,
/* privileged levels capture (kernel, hv): check permissions */
if ((attr.sample_type & PERF_SAMPLE_BRANCH_STACK) &&
(attr.branch_sample_type & PERF_SAMPLE_BRANCH_PERM_PLM) &&
- perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN)) {
+ perf_paranoid_kernel(pmu) && !capable(CAP_SYS_ADMIN)) {
err = -EACCES;
goto err_alloc;
}

if (is_sampling_event(event)) {
- if (event->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT) {
+ if (pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT) {
err = -EOPNOTSUPP;
goto err_alloc;
}
@@ -10525,7 +10527,6 @@ SYSCALL_DEFINE5(perf_event_open,
* Special case software events and allow them to be part of
* any hardware group.
*/
- pmu = event->pmu;

if (attr.use_clockid) {
err = perf_event_set_clock(event, attr.clockid);
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 69a3fe926e8c..04ea3afec5b2 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -46,7 +46,8 @@ static int perf_trace_event_perm(struct trace_event_call *tp_event,

/* The ftrace function trace is allowed only for root. */
if (ftrace_event_is_function(tp_event)) {
- if (perf_paranoid_tracepoint_raw() && !capable(CAP_SYS_ADMIN))
+ if (perf_paranoid_tracepoint_raw(p_event->pmu) &&
+ !capable(CAP_SYS_ADMIN))
return -EPERM;

if (!is_sampling_event(p_event))
@@ -82,7 +83,8 @@ static int perf_trace_event_perm(struct trace_event_call *tp_event,
* ...otherwise raw tracepoint data can be a severe data leak,
* only allow root to have these.
*/
- if (perf_paranoid_tracepoint_raw() && !capable(CAP_SYS_ADMIN))
+ if (perf_paranoid_tracepoint_raw(p_event->pmu) &&
+ !capable(CAP_SYS_ADMIN))
return -EPERM;

return 0;
--
2.17.1


2018-09-19 12:29:27

by Tvrtko Ursulin

[permalink] [raw]
Subject: [RFC 1/5] perf: Move some access checks later in perf_event_open

From: Tvrtko Ursulin <[email protected]>

To enable per-PMU access controls in a following patch first move all call
sites of perf_paranoid_kernel() to after the event has been created.

Signed-off-by: Tvrtko Ursulin <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Alexey Budankov <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
kernel/events/core.c | 36 ++++++++++++++++++++++--------------
1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index bccce538f51d..adcd9eae13fb 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10194,10 +10194,6 @@ static int perf_copy_attr(struct perf_event_attr __user *uattr,
*/
attr->branch_sample_type = mask;
}
- /* privileged levels capture (kernel, hv): check permissions */
- if ((mask & PERF_SAMPLE_BRANCH_PERM_PLM)
- && perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
- return -EACCES;
}

if (attr->sample_type & PERF_SAMPLE_REGS_USER) {
@@ -10414,11 +10410,6 @@ SYSCALL_DEFINE5(perf_event_open,
if (err)
return err;

- if (!attr.exclude_kernel) {
- if (perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
- return -EACCES;
- }
-
if (attr.namespaces) {
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
@@ -10432,11 +10423,6 @@ SYSCALL_DEFINE5(perf_event_open,
return -EINVAL;
}

- /* Only privileged users can get physical addresses */
- if ((attr.sample_type & PERF_SAMPLE_PHYS_ADDR) &&
- perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
- return -EACCES;
-
/*
* In cgroup mode, the pid argument is used to pass the fd
* opened to the cgroup directory in cgroupfs. The cpu argument
@@ -10506,6 +10492,28 @@ SYSCALL_DEFINE5(perf_event_open,
goto err_cred;
}

+ if (!attr.exclude_kernel) {
+ if (perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN)) {
+ err = -EACCES;
+ goto err_alloc;
+ }
+ }
+
+ /* Only privileged users can get physical addresses */
+ if ((attr.sample_type & PERF_SAMPLE_PHYS_ADDR) &&
+ perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN)) {
+ err = -EACCES;
+ goto err_alloc;
+ }
+
+ /* privileged levels capture (kernel, hv): check permissions */
+ if ((attr.sample_type & PERF_SAMPLE_BRANCH_STACK) &&
+ (attr.branch_sample_type & PERF_SAMPLE_BRANCH_PERM_PLM) &&
+ perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN)) {
+ err = -EACCES;
+ goto err_alloc;
+ }
+
if (is_sampling_event(event)) {
if (event->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT) {
err = -EOPNOTSUPP;
--
2.17.1


2018-09-19 12:29:37

by Tvrtko Ursulin

[permalink] [raw]
Subject: [RFC 3/5] perf: Allow per PMU access control

From: Tvrtko Ursulin <[email protected]>

For situations where sysadmins might want to allow different level of
access control for different PMUs, we start creating per-PMU
perf_event_paranoid controls in sysfs.

These work in equivalent fashion as the existing perf_event_paranoid
sysctl, which now becomes the parent control for each PMU.

On PMU registration the global/parent value will be inherited by each PMU,
as it will be propagated to all registered PMUs when the sysctl is
updated.

At any later point individual PMU access controls, located in
<sysfs>/device/<pmu-name>/perf_event_paranoid, can be adjusted to achieve
fine grained access control.

Signed-off-by: Tvrtko Ursulin <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Alexey Budankov <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
include/linux/perf_event.h | 12 ++++++--
kernel/events/core.c | 59 ++++++++++++++++++++++++++++++++++++++
kernel/sysctl.c | 4 ++-
3 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 22906bcc1bcd..bb82e47f5343 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -271,6 +271,9 @@ struct pmu {
/* number of address filters this PMU can do */
unsigned int nr_addr_filters;

+ /* per PMU access control */
+ int perf_event_paranoid;
+
/*
* Fully disable/enable this PMU, can be used to protect from the PMI
* as well as for lazy/batch writing of the MSRs.
@@ -1169,6 +1172,9 @@ extern int sysctl_perf_cpu_time_max_percent;

extern void perf_sample_event_took(u64 sample_len_ns);

+extern int perf_proc_paranoid_handler(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp,
+ loff_t *ppos);
extern int perf_proc_update_handler(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos);
@@ -1181,17 +1187,17 @@ int perf_event_max_stack_handler(struct ctl_table *table, int write,

static inline bool perf_paranoid_tracepoint_raw(const struct pmu *pmu)
{
- return sysctl_perf_event_paranoid > -1;
+ return pmu->perf_event_paranoid > -1;
}

static inline bool perf_paranoid_cpu(const struct pmu *pmu)
{
- return sysctl_perf_event_paranoid > 0;
+ return pmu->perf_event_paranoid > 0;
}

static inline bool perf_paranoid_kernel(const struct pmu *pmu)
{
- return sysctl_perf_event_paranoid > 1;
+ return pmu->perf_event_paranoid > 1;
}

extern void perf_event_init(void);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f556144bc0c5..35f122349508 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -432,6 +432,24 @@ static void update_perf_cpu_limits(void)

static bool perf_rotate_context(struct perf_cpu_context *cpuctx);

+int perf_proc_paranoid_handler(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp,
+ loff_t *ppos)
+{
+ int ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+ struct pmu *pmu;
+
+ if (ret || !write)
+ return ret;
+
+ mutex_lock(&pmus_lock);
+ list_for_each_entry(pmu, &pmus, entry)
+ pmu->perf_event_paranoid = sysctl_perf_event_paranoid;
+ mutex_unlock(&pmus_lock);
+
+ return 0;
+}
+
int perf_proc_update_handler(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos)
@@ -9430,6 +9448,41 @@ static void free_pmu_context(struct pmu *pmu)
mutex_unlock(&pmus_lock);
}

+/*
+ * Fine-grained access control:
+ */
+static ssize_t
+perf_event_paranoid_show(struct device *dev,
+ struct device_attribute *attr,
+ char *page)
+{
+ struct pmu *pmu = dev_get_drvdata(dev);
+
+ return snprintf(page, PAGE_SIZE - 1, "%d\n", pmu->perf_event_paranoid);
+}
+
+static ssize_t
+perf_event_paranoid_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct pmu *pmu = dev_get_drvdata(dev);
+ int ret, val;
+
+ ret = kstrtoint(buf, 0, &val);
+ if (ret)
+ return ret;
+
+ if (val < -1 || val > 2)
+ return -EINVAL;
+
+ pmu->perf_event_paranoid = val;
+
+ return count;
+}
+
+static DEVICE_ATTR_RW(perf_event_paranoid);
+
/*
* Let userspace know that this PMU supports address range filtering:
*/
@@ -9544,6 +9597,11 @@ static int pmu_dev_alloc(struct pmu *pmu)
if (ret)
goto free_dev;

+ /* Add fine-grained access control attribute. */
+ ret = device_create_file(pmu->dev, &dev_attr_perf_event_paranoid);
+ if (ret)
+ goto del_dev;
+
/* For PMUs with address filters, throw in an extra attribute: */
if (pmu->nr_addr_filters)
ret = device_create_file(pmu->dev, &dev_attr_nr_addr_filters);
@@ -9575,6 +9633,7 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
if (!pmu->pmu_disable_count)
goto unlock;

+ pmu->perf_event_paranoid = sysctl_perf_event_paranoid;
pmu->type = -1;
if (!name)
goto skip_type;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index cc02050fd0c4..83179c443c89 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1145,7 +1145,9 @@ static struct ctl_table kern_table[] = {
.data = &sysctl_perf_event_paranoid,
.maxlen = sizeof(sysctl_perf_event_paranoid),
.mode = 0644,
- .proc_handler = proc_dointvec,
+ .proc_handler = perf_proc_paranoid_handler,
+ .extra1 = &neg_one,
+ .extra2 = &two,
},
{
.procname = "perf_event_mlock_kb",
--
2.17.1


2018-09-27 20:18:22

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC 3/5] perf: Allow per PMU access control

> + mutex_lock(&pmus_lock);
> + list_for_each_entry(pmu, &pmus, entry)
> + pmu->perf_event_paranoid = sysctl_perf_event_paranoid;
> + mutex_unlock(&pmus_lock);

What happens to pmus that got added later?

The rest looks good.

Can you post a non RFC version?

-Andi

2018-09-28 09:00:18

by Tvrtko Ursulin

[permalink] [raw]
Subject: Re: [RFC 3/5] perf: Allow per PMU access control


On 27/09/2018 21:15, Andi Kleen wrote:
>> + mutex_lock(&pmus_lock);
>> + list_for_each_entry(pmu, &pmus, entry)
>> + pmu->perf_event_paranoid = sysctl_perf_event_paranoid;
>> + mutex_unlock(&pmus_lock);
>
> What happens to pmus that got added later?

There is a hunk a bit lower in the patch where in perf_pmu_register the
initial setting is assigned from the global sysctl.

> The rest looks good.
>
> Can you post a non RFC version?

Sure!

Regards,

Tvrtko

2018-09-28 10:27:53

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

Tvrtko,

On Wed, 19 Sep 2018, Tvrtko Ursulin wrote:

It would be very helpful if you cc all involved people on the cover letter
instead of just cc'ing your own pile of email addresses. CC'ed now.

> For situations where sysadmins might want to allow different level of
> access control for different PMUs, we start creating per-PMU
> perf_event_paranoid controls in sysfs.
>
> These work in equivalent fashion as the existing perf_event_paranoid
> sysctl, which now becomes the parent control for each PMU.
>
> On PMU registration the global/parent value will be inherited by each PMU,
> as it will be propagated to all registered PMUs when the sysctl is
> updated.
>
> At any later point individual PMU access controls, located in
> <sysfs>/device/<pmu-name>/perf_event_paranoid, can be adjusted to achieve
> fine grained access control.
>
> Discussion from previous posting:
> https://lkml.org/lkml/2018/5/21/156

This is really not helpful. The cover letter and the change logs should
contain a summary of that discussion and a proper justification of the
proposed change. Just saying 'sysadmins might want to allow' is not useful
at all, it's yet another 'I want a pony' thing.

I read through the previous thread and there was a clear request to involve
security people into this. Especially those who are deeply involved with
hardware side channels. I don't see anyone Cc'ed on the whole series.

For the record, I'm not buying the handwavy 'more noise' argument at
all. It wants a proper analysis and we need to come up with criteria which
PMUs can be exposed at all.

All of this want's a proper documentation clearly explaining the risks and
scope of these knobs per PMU. Just throwing magic knobs at sysadmins and
then saying 'its their problem to figure it out' is not acceptable.

Thanks,

tglx


2018-09-28 13:23:29

by Tvrtko Ursulin

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)


Hi,

On 28/09/2018 11:26, Thomas Gleixner wrote:
> Tvrtko,
>
> On Wed, 19 Sep 2018, Tvrtko Ursulin wrote:
>
> It would be very helpful if you cc all involved people on the cover letter
> instead of just cc'ing your own pile of email addresses. CC'ed now.

I accept it was by bad to miss adding Cc's on the cover letter, but my
own email addresses hopefully should not bother you. It is simply a
question of what I have in .gitconfig vs what I forgot to do manually.

>> For situations where sysadmins might want to allow different level of
>> access control for different PMUs, we start creating per-PMU
>> perf_event_paranoid controls in sysfs.
>>
>> These work in equivalent fashion as the existing perf_event_paranoid
>> sysctl, which now becomes the parent control for each PMU.
>>
>> On PMU registration the global/parent value will be inherited by each PMU,
>> as it will be propagated to all registered PMUs when the sysctl is
>> updated.
>>
>> At any later point individual PMU access controls, located in
>> <sysfs>/device/<pmu-name>/perf_event_paranoid, can be adjusted to achieve
>> fine grained access control.
>>
>> Discussion from previous posting:
>> https://lkml.org/lkml/2018/5/21/156
>
> This is really not helpful. The cover letter and the change logs should
> contain a summary of that discussion and a proper justification of the
> proposed change. Just saying 'sysadmins might want to allow' is not useful
> at all, it's yet another 'I want a pony' thing.

Okay, for the next round I will expand the cover letter with at least
one concrete example on how it is usable and summarize the discussion a bit.

> I read through the previous thread and there was a clear request to involve
> security people into this. Especially those who are deeply involved with
> hardware side channels. I don't see anyone Cc'ed on the whole series.

Who would you recommend I add? Because I really don't know..

> For the record, I'm not buying the handwavy 'more noise' argument at
> all. It wants a proper analysis and we need to come up with criteria which
> PMUs can be exposed at all.
>
> All of this want's a proper documentation clearly explaining the risks and
> scope of these knobs per PMU. Just throwing magic knobs at sysadmins and
> then saying 'its their problem to figure it out' is not acceptable.

Presumably you see adding fine grained control as diminishing the
overall security rather than raising it? Could you explain why? Because
incompetent sysadmin will turn it off for some PMU, while without having
the fine-grained control they wouldn't turn it off globally?

This feature was requested by the exact opposite concern, that in order
to access the i915 PMU, one has to compromise the security of the entire
system by allowing access to *all* PMU's.

Making this ability fine-grained sounds like a logical solution for
solving this weakening of security controls.

Concrete example was that on video transcoding farms users want to
monitor the utilization of GPU engines (like CPU cores) and they can do
that via the i915 PMU. But for that to work today they have to dial down
the global perf_event_paranoid setting. Obvious improvement was to allow
them to only dial down the i915.perf_event_paranoid setting. As such,
for this specific use case at least, the security is increased.

Regards,

Tvrtko

2018-09-28 14:03:01

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

Tvrtko,

On Fri, 28 Sep 2018, Tvrtko Ursulin wrote:
> On 28/09/2018 11:26, Thomas Gleixner wrote:
> > On Wed, 19 Sep 2018, Tvrtko Ursulin wrote:
> >
> > It would be very helpful if you cc all involved people on the cover letter
> > instead of just cc'ing your own pile of email addresses. CC'ed now.
>
> I accept it was by bad to miss adding Cc's on the cover letter, but my own
> email addresses hopefully should not bother you. It is simply a question of
> what I have in .gitconfig vs what I forgot to do manually.

The keyword in the above sentence is 'just'. You can add as many of yours
as you want as long as everybody else is cc'ed.

> > I read through the previous thread and there was a clear request to involve
> > security people into this. Especially those who are deeply involved with
> > hardware side channels. I don't see anyone Cc'ed on the whole series.
>
> Who would you recommend I add? Because I really don't know..

Sure, and because you don't know you didn't bother to ask around and
ignored the review request.

I already added Kees and Jann. Please look for the SECCOMP folks in
MAINTAINERS.

> > For the record, I'm not buying the handwavy 'more noise' argument at
> > all. It wants a proper analysis and we need to come up with criteria which
> > PMUs can be exposed at all.
> >
> > All of this want's a proper documentation clearly explaining the risks and
> > scope of these knobs per PMU. Just throwing magic knobs at sysadmins and
> > then saying 'its their problem to figure it out' is not acceptable.
>
> Presumably you see adding fine grained control as diminishing the overall
> security rather than raising it? Could you explain why? Because incompetent
> sysadmin will turn it off for some PMU, while without having the fine-grained
> control they wouldn't turn it off globally?

I did not say at all that this might be diminishing security. And the
argumentation with 'incompetent sysadmins' is just the wrong attitude.

What I was asking for is proper documentation and this proper documentation
is meant for _competent_ sysadmins.

That documentation has to clearly describe what kind of information is
accessible and what potential side effects security wise this might
have. You cannot expect that even competent sysadmins know offhand what
which PMU might expose. And telling them 'Use Google' is just not the right
thing to do.

If you can't explain and document it, then providing the knob is just
fulfilling somebodys 'I want a pony' request.

> This feature was requested by the exact opposite concern, that in order to
> access the i915 PMU, one has to compromise the security of the entire system
> by allowing access to *all* PMU's.
>
> Making this ability fine-grained sounds like a logical solution for solving
> this weakening of security controls.

Sure, and this wants to be documented in the cover letter and the
changelogs.

But this does also require a proper analysis and documentation why it is
not a security risk to expose the i915 PMU or what potential security
issues this can create, so that the competent sysadmin can make a
judgement.

And the same is required for all other PMUs which can be enabled in the
same way for unprivileged access. And we might as well come to the
conclusion via analysis that for some PMUs unpriviledged access is just not
a good idea and exclude them. I surely know a few which qualify for
exclusion, so the right approach is to provide this knob only when the risk
is analyzed and documented and the PMU has been flagged as candidate for
unpriviledged exposure. I.e. opt in and not opt out.

Thanks,

tglx

2018-09-28 14:57:32

by Tvrtko Ursulin

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)


On 28/09/2018 15:02, Thomas Gleixner wrote:
> Tvrtko,
>
> On Fri, 28 Sep 2018, Tvrtko Ursulin wrote:
>> On 28/09/2018 11:26, Thomas Gleixner wrote:
>>> On Wed, 19 Sep 2018, Tvrtko Ursulin wrote:
>>>
>>> It would be very helpful if you cc all involved people on the cover letter
>>> instead of just cc'ing your own pile of email addresses. CC'ed now.
>>
>> I accept it was by bad to miss adding Cc's on the cover letter, but my own
>> email addresses hopefully should not bother you. It is simply a question of
>> what I have in .gitconfig vs what I forgot to do manually.
>
> The keyword in the above sentence is 'just'. You can add as many of yours
> as you want as long as everybody else is cc'ed.

Sure, but you also used the word "pile" and I would argue that made the
rest of your sentence, after and including "instead", sound like it not
only bothers you I forgot to Cc people on the cover letter, but it also
bothers you I included a "pile" of my own addresses. If that wasn't your
intention in the slightest then I apologise for misreading it.

>>> I read through the previous thread and there was a clear request to involve
>>> security people into this. Especially those who are deeply involved with
>>> hardware side channels. I don't see anyone Cc'ed on the whole series.
>>
>> Who would you recommend I add? Because I really don't know..
>
> Sure, and because you don't know you didn't bother to ask around and
> ignored the review request.

No, not because of that. You are assuming my actions and motivations and
constructing a story.

"did not bother" = negative connotations
"ignored" = negative connotations

Note instead the time lapse between this and previous posting of the
series, and if you want to assume something, assume things can get
missed and forgotten without intent or malice.

> I already added Kees and Jann. Please look for the SECCOMP folks in
> MAINTAINERS.

Thanks!

>>> For the record, I'm not buying the handwavy 'more noise' argument at
>>> all. It wants a proper analysis and we need to come up with criteria which
>>> PMUs can be exposed at all.
>>>
>>> All of this want's a proper documentation clearly explaining the risks and
>>> scope of these knobs per PMU. Just throwing magic knobs at sysadmins and
>>> then saying 'its their problem to figure it out' is not acceptable.
>>
>> Presumably you see adding fine grained control as diminishing the overall
>> security rather than raising it? Could you explain why? Because incompetent
>> sysadmin will turn it off for some PMU, while without having the fine-grained
>> control they wouldn't turn it off globally?
>
> I did not say at all that this might be diminishing security. And the
> argumentation with 'incompetent sysadmins' is just the wrong attitude.

Wrong attitude what? I was trying to guess your reasoning (cues in
"presumably" and a lot of question marks) since it wasn't clear to me
why is your position what it is.

> What I was asking for is proper documentation and this proper documentation
> is meant for _competent_ sysadmins.
>
> That documentation has to clearly describe what kind of information is
> accessible and what potential side effects security wise this might
> have. You cannot expect that even competent sysadmins know offhand what
> which PMU might expose. And telling them 'Use Google' is just not the right
> thing to do.

I did not mention Google.

> If you can't explain and document it, then providing the knob is just
> fulfilling somebodys 'I want a pony' request.

Well it's not a pony, it is mechanism to avoid having to turn off all
security. We can hopefully discuss it without ponies.

>> This feature was requested by the exact opposite concern, that in order to
>> access the i915 PMU, one has to compromise the security of the entire system
>> by allowing access to *all* PMU's.
>>
>> Making this ability fine-grained sounds like a logical solution for solving
>> this weakening of security controls.
>
> Sure, and this wants to be documented in the cover letter and the
> changelogs.
>
> But this does also require a proper analysis and documentation why it is
> not a security risk to expose the i915 PMU or what potential security
> issues this can create, so that the competent sysadmin can make a
> judgement.
>
> And the same is required for all other PMUs which can be enabled in the
> same way for unprivileged access. And we might as well come to the
> conclusion via analysis that for some PMUs unpriviledged access is just not
> a good idea and exclude them. I surely know a few which qualify for
> exclusion, so the right approach is to provide this knob only when the risk
> is analyzed and documented and the PMU has been flagged as candidate for
> unpriviledged exposure. I.e. opt in and not opt out.

I am happy to work on the mechanics of achieving this once the security
guys and all PMU owners get involved. Even though I am not convinced the
bar to allow fine-grained control should be evaluating all possible
PMUs*, but if the security folks agree that is the case it is fine by me.

Regards,

Tvrtko

*) The part of my reply you did not quote explains how the fine-grained
control improves security in existing deployments. The documentation I
added refers to the existing perf_event_paranoid documentation for
explanation of security concerns involved. Which is not much in itself.
But essentially we both have a PMU and a knob already. I don't see why
adding the same knob per-PMU needs much more stringent criteria to be
accepted. But as said, that's for security people to decide.

2018-09-28 15:12:51

by Jann Horn

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

On Fri, Sep 28, 2018 at 3:22 PM Tvrtko Ursulin
<[email protected]> wrote:
> On 28/09/2018 11:26, Thomas Gleixner wrote:
> > On Wed, 19 Sep 2018, Tvrtko Ursulin wrote:
> >> For situations where sysadmins might want to allow different level of
> >> access control for different PMUs, we start creating per-PMU
> >> perf_event_paranoid controls in sysfs.
> >>
> >> These work in equivalent fashion as the existing perf_event_paranoid
> >> sysctl, which now becomes the parent control for each PMU.
> >>
> >> On PMU registration the global/parent value will be inherited by each PMU,
> >> as it will be propagated to all registered PMUs when the sysctl is
> >> updated.
> >>
> >> At any later point individual PMU access controls, located in
> >> <sysfs>/device/<pmu-name>/perf_event_paranoid, can be adjusted to achieve
> >> fine grained access control.
> >>
> >> Discussion from previous posting:
> >> https://lkml.org/lkml/2018/5/21/156
> >
> > This is really not helpful. The cover letter and the change logs should
> > contain a summary of that discussion and a proper justification of the
> > proposed change. Just saying 'sysadmins might want to allow' is not useful
> > at all, it's yet another 'I want a pony' thing.
>
> Okay, for the next round I will expand the cover letter with at least
> one concrete example on how it is usable and summarize the discussion a bit.
>
> > I read through the previous thread and there was a clear request to involve
> > security people into this. Especially those who are deeply involved with
> > hardware side channels. I don't see anyone Cc'ed on the whole series.
>
> Who would you recommend I add? Because I really don't know..
>
> > For the record, I'm not buying the handwavy 'more noise' argument at
> > all. It wants a proper analysis and we need to come up with criteria which
> > PMUs can be exposed at all.
> >
> > All of this want's a proper documentation clearly explaining the risks and
> > scope of these knobs per PMU. Just throwing magic knobs at sysadmins and
> > then saying 'its their problem to figure it out' is not acceptable.
>
> Presumably you see adding fine grained control as diminishing the
> overall security rather than raising it? Could you explain why? Because
> incompetent sysadmin will turn it off for some PMU, while without having
> the fine-grained control they wouldn't turn it off globally?
>
> This feature was requested by the exact opposite concern, that in order
> to access the i915 PMU, one has to compromise the security of the entire
> system by allowing access to *all* PMU's.
>
> Making this ability fine-grained sounds like a logical solution for
> solving this weakening of security controls.
>
> Concrete example was that on video transcoding farms users want to
> monitor the utilization of GPU engines (like CPU cores) and they can do
> that via the i915 PMU. But for that to work today they have to dial down
> the global perf_event_paranoid setting. Obvious improvement was to allow
> them to only dial down the i915.perf_event_paranoid setting. As such,
> for this specific use case at least, the security is increased.

Which paranoia level would be used for the i915.perf_event_paranoid
setting in such a case?

Perhaps also CC [email protected] on the next version.

2018-09-28 15:24:18

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

Tvrtko,

On Fri, 28 Sep 2018, Tvrtko Ursulin wrote:
> On 28/09/2018 15:02, Thomas Gleixner wrote:
> > > Presumably you see adding fine grained control as diminishing the overall
> > > security rather than raising it? Could you explain why? Because
> > > incompetent
> > > sysadmin will turn it off for some PMU, while without having the
> > > fine-grained
> > > control they wouldn't turn it off globally?
> >
> > I did not say at all that this might be diminishing security. And the
> > argumentation with 'incompetent sysadmins' is just the wrong attitude.
>
> Wrong attitude what? I was trying to guess your reasoning (cues in
> "presumably" and a lot of question marks) since it wasn't clear to me why is
> your position what it is.

Guessing my reasonings has nothing to do with you mentioning incompentent
sysadmins.

> > What I was asking for is proper documentation and this proper documentation
> > is meant for _competent_ sysadmins.
> >
> > That documentation has to clearly describe what kind of information is
> > accessible and what potential side effects security wise this might
> > have. You cannot expect that even competent sysadmins know offhand what
> > which PMU might expose. And telling them 'Use Google' is just not the right
> > thing to do.
>
> I did not mention Google.

I did not say that you mentioned google. But what is a sysadmin supposed to
do when there is no documentation aside of using google? And not having
documentation is basically the same thing as telling them to use google.

> > If you can't explain and document it, then providing the knob is just
> > fulfilling somebodys 'I want a pony' request.
>
> Well it's not a pony, it is mechanism to avoid having to turn off all
> security. We can hopefully discuss it without ponies.

If you want to make a pettifogger contest out of this discussion, then we
can stop right here. I explained it technically why just adding a knob
without further explanation and analysis is not acceptable.

> > And the same is required for all other PMUs which can be enabled in the
> > same way for unprivileged access. And we might as well come to the
> > conclusion via analysis that for some PMUs unpriviledged access is just not
> > a good idea and exclude them. I surely know a few which qualify for
> > exclusion, so the right approach is to provide this knob only when the risk
> > is analyzed and documented and the PMU has been flagged as candidate for
> > unpriviledged exposure. I.e. opt in and not opt out.
>
> I am happy to work on the mechanics of achieving this once the security guys
> and all PMU owners get involved. Even though I am not convinced the bar to
> allow fine-grained control should be evaluating all possible PMUs*, but if the
> security folks agree that is the case it is fine by me.

Making the knob opt in per PMU does not need all PMU owners to be
involved. It allows to add the opt in flag on a case by case basis.

> *) The part of my reply you did not quote explains how the fine-grained
> control improves security in existing deployments. The documentation I added
> refers to the existing perf_event_paranoid documentation for explanation of
> security concerns involved. Which is not much in itself. But essentially we
> both have a PMU and a knob already. I don't see why adding the same knob
> per-PMU needs much more stringent criteria to be accepted. But as said, that's
> for security people to decide.

The fact, that the existing knob is poorly documented does make an excuse
for adding more knobs without documentation. Quite the contrary, if we
notice that the existing knob lacks proper documentation, then we should
fix that first.

Thanks,

tglx

2018-09-28 15:46:08

by Alexey Budankov

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

Hello,

On 28.09.2018 17:02, Thomas Gleixner wrote:
> Tvrtko,
>
> On Fri, 28 Sep 2018, Tvrtko Ursulin wrote:
>> On 28/09/2018 11:26, Thomas Gleixner wrote:
>>> On Wed, 19 Sep 2018, Tvrtko Ursulin wrote:
>>>
>>> It would be very helpful if you cc all involved people on the cover letter
>>> instead of just cc'ing your own pile of email addresses. CC'ed now.
>>
>> I accept it was by bad to miss adding Cc's on the cover letter, but my own
>> email addresses hopefully should not bother you. It is simply a question of
>> what I have in .gitconfig vs what I forgot to do manually.
>
> The keyword in the above sentence is 'just'. You can add as many of yours
> as you want as long as everybody else is cc'ed.
>
>>> I read through the previous thread and there was a clear request to involve
>>> security people into this. Especially those who are deeply involved with
>>> hardware side channels. I don't see anyone Cc'ed on the whole series.
>>
>> Who would you recommend I add? Because I really don't know..
>
> Sure, and because you don't know you didn't bother to ask around and
> ignored the review request.
>
> I already added Kees and Jann. Please look for the SECCOMP folks in
> MAINTAINERS.

Thomas, thanks a lot for involving the folks!

>
>>> For the record, I'm not buying the handwavy 'more noise' argument at
>>> all. It wants a proper analysis and we need to come up with criteria which
>>> PMUs can be exposed at all.
>>>
>>> All of this want's a proper documentation clearly explaining the risks and
>>> scope of these knobs per PMU. Just throwing magic knobs at sysadmins and
>>> then saying 'its their problem to figure it out' is not acceptable.
>>
>> Presumably you see adding fine grained control as diminishing the overall
>> security rather than raising it? Could you explain why? Because incompetent
>> sysadmin will turn it off for some PMU, while without having the fine-grained
>> control they wouldn't turn it off globally?
>
> I did not say at all that this might be diminishing security. And the
> argumentation with 'incompetent sysadmins' is just the wrong attitude.
>
> What I was asking for is proper documentation and this proper documentation
> is meant for _competent_ sysadmins.
>
> That documentation has to clearly describe what kind of information is
> accessible and what potential side effects security wise this might
> have. You cannot expect that even competent sysadmins know offhand what
> which PMU might expose. And telling them 'Use Google' is just not the right
> thing to do.
>
> If you can't explain and document it, then providing the knob is just
> fulfilling somebodys 'I want a pony' request.>
>> This feature was requested by the exact opposite concern, that in order to
>> access the i915 PMU, one has to compromise the security of the entire system
>> by allowing access to *all* PMU's.
>>
>> Making this ability fine-grained sounds like a logical solution for solving
>> this weakening of security controls.
>
> Sure, and this wants to be documented in the cover letter and the
> changelogs.
>
> But this does also require a proper analysis and documentation why it is
> not a security risk to expose the i915 PMU or what potential security
> issues this can create, so that the competent sysadmin can make a
> judgement.
>
> And the same is required for all other PMUs which can be enabled in the
> same way for unprivileged access. And we might as well come to the
> conclusion via analysis that for some PMUs unpriviledged access is just not
> a good idea and exclude them. I surely know a few which qualify for
> exclusion, so the right approach is to provide this knob only when the risk
> is analyzed and documented and the PMU has been flagged as candidate for
> unpriviledged exposure. I.e. opt in and not opt out.

If you ask me then, IMHO, unprivileged access to CBOX pmu looks unsafe
and is now governed by traditional *core* perf_event_paranoid setting.

But *core* paranoid >= 1 (per-process mode) prevents simultaneous perf
record sampling and perf stat -I reading from IMC, UPI, PCIe and other
uncore counters.

This kind of monitoring could make process performance observability
thru Perf subsystem more flexible and better tailored for cloud and
cluster environments. However it requires fine-tuning control
capabilities in order to still keep system as secure as possible.

Could i915, IMC, UPI, PCIe pmus be safe enough to be governed by
a separate perf_event_paranoid settings?

Thanks!
Alexey

>
> Thanks,
>
> tglx
>

2018-09-28 16:43:30

by Mark Rutland

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

On Fri, Sep 28, 2018 at 12:26:52PM +0200, Thomas Gleixner wrote:
> On Wed, 19 Sep 2018, Tvrtko Ursulin wrote:
>
> It would be very helpful if you cc all involved people on the cover letter
> instead of just cc'ing your own pile of email addresses. CC'ed now.
>
> > For situations where sysadmins might want to allow different level of
> > access control for different PMUs, we start creating per-PMU
> > perf_event_paranoid controls in sysfs.
> >
> > These work in equivalent fashion as the existing perf_event_paranoid
> > sysctl, which now becomes the parent control for each PMU.
> >
> > On PMU registration the global/parent value will be inherited by each PMU,
> > as it will be propagated to all registered PMUs when the sysctl is
> > updated.
> >
> > At any later point individual PMU access controls, located in
> > <sysfs>/device/<pmu-name>/perf_event_paranoid, can be adjusted to achieve
> > fine grained access control.
> >
> > Discussion from previous posting:
> > https://lkml.org/lkml/2018/5/21/156
>
> This is really not helpful. The cover letter and the change logs should
> contain a summary of that discussion and a proper justification of the
> proposed change. Just saying 'sysadmins might want to allow' is not useful
> at all, it's yet another 'I want a pony' thing.
>
> I read through the previous thread and there was a clear request to involve
> security people into this. Especially those who are deeply involved with
> hardware side channels. I don't see anyone Cc'ed on the whole series.
>
> For the record, I'm not buying the handwavy 'more noise' argument at
> all. It wants a proper analysis and we need to come up with criteria which
> PMUs can be exposed at all.
>
> All of this want's a proper documentation clearly explaining the risks and
> scope of these knobs per PMU. Just throwing magic knobs at sysadmins and
> then saying 'its their problem to figure it out' is not acceptable.

There's also been prior discussion on these feature in other contexts
(e.g. android expoits resulting from out-of-tree drivers). It would be
nice to see those considered.

IIRC The conclusion from prior discussions (e.g. [1]) was that we wanted
finer granularity of control such that we could limit PMU access to
specific users -- e.g. disallow arbitrary android apps from poking *any*
PMU, while allowing some more trusted apps/users to uses *some* specific
PMUs.

e.g. we could add /sys/bus/event_source/devices/${PMU}/device, protect
this via the usual fs ACLs, and pass the fd to perf_event_open()
somehow. A valid fd would act as a capability, taking precedence over
perf_event_paranoid.

Thanks,
Mark.

[1] https://patchwork.kernel.org/patch/9249919/

2018-09-28 17:24:08

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

> There's also been prior discussion on these feature in other contexts
> (e.g. android expoits resulting from out-of-tree drivers). It would be
> nice to see those considered.
>
> IIRC The conclusion from prior discussions (e.g. [1]) was that we wanted
> finer granularity of control such that we could limit PMU access to
> specific users -- e.g. disallow arbitrary android apps from poking *any*
> PMU, while allowing some more trusted apps/users to uses *some* specific
> PMUs.
>
> e.g. we could add /sys/bus/event_source/devices/${PMU}/device, protect
> this via the usual fs ACLs, and pass the fd to perf_event_open()
> somehow. A valid fd would act as a capability, taking precedence over
> perf_event_paranoid.

That sounds like an orthogonal feature. I don't think the original
patchkit would need to be hold up for this. It would be something
in addition.

BTW can't you already do that with the syscall filter? I assume
the Android sandboxes already use that. Just forbid perf_event_open
for the apps.

-Andi


2018-09-28 17:42:32

by Mark Rutland

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

On Fri, Sep 28, 2018 at 10:23:40AM -0700, Andi Kleen wrote:
> > There's also been prior discussion on these feature in other contexts
> > (e.g. android expoits resulting from out-of-tree drivers). It would be
> > nice to see those considered.
> >
> > IIRC The conclusion from prior discussions (e.g. [1]) was that we wanted
> > finer granularity of control such that we could limit PMU access to
> > specific users -- e.g. disallow arbitrary android apps from poking *any*
> > PMU, while allowing some more trusted apps/users to uses *some* specific
> > PMUs.
> >
> > e.g. we could add /sys/bus/event_source/devices/${PMU}/device, protect
> > this via the usual fs ACLs, and pass the fd to perf_event_open()
> > somehow. A valid fd would act as a capability, taking precedence over
> > perf_event_paranoid.
>
> That sounds like an orthogonal feature. I don't think the original
> patchkit would need to be hold up for this. It would be something
> in addition.

I have to say that I disagree -- these controls will have to interact
somehow, and the fewer of them we have, the less complexity we'll have
to deal with longer-term.

> BTW can't you already do that with the syscall filter? I assume
> the Android sandboxes already use that. Just forbid perf_event_open
> for the apps.

Note that this was about providing access to *some* PMUs in some cases.

IIUC, if that can be done today via a syscall filter, the same is true
of per-pmu paranoid settings.

Thanks,
Mark.

2018-09-28 18:22:50

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

Alexey,

On Fri, 28 Sep 2018, Alexey Budankov wrote:
> On 28.09.2018 17:02, Thomas Gleixner wrote:
> > But this does also require a proper analysis and documentation why it is
> > not a security risk to expose the i915 PMU or what potential security
> > issues this can create, so that the competent sysadmin can make a
> > judgement.
> >
> > And the same is required for all other PMUs which can be enabled in the
> > same way for unprivileged access. And we might as well come to the
> > conclusion via analysis that for some PMUs unpriviledged access is just not
> > a good idea and exclude them. I surely know a few which qualify for
> > exclusion, so the right approach is to provide this knob only when the risk
> > is analyzed and documented and the PMU has been flagged as candidate for
> > unpriviledged exposure. I.e. opt in and not opt out.
>
> If you ask me then, IMHO, unprivileged access to CBOX pmu looks unsafe
> and is now governed by traditional *core* perf_event_paranoid setting.
>
> But *core* paranoid >= 1 (per-process mode) prevents simultaneous perf
> record sampling and perf stat -I reading from IMC, UPI, PCIe and other
> uncore counters.
>
> This kind of monitoring could make process performance observability
> thru Perf subsystem more flexible and better tailored for cloud and
> cluster environments. However it requires fine-tuning control
> capabilities in order to still keep system as secure as possible.
>
> Could i915, IMC, UPI, PCIe pmus be safe enough to be governed by
> a separate perf_event_paranoid settings?

Just to make it clear. I'm not against separate settings at all. But I'm
against adding knobs for every PMU the kernel supports wholesale without
analysis and documentation just because we can and somebody wants it.

Right now we have a single knob, which is poorly documented and that should
be fixed first. But some googling gives you the information that allowing
unprivilegded access is a security risk. So the security focussed sysadmin
will deny access to the PMUs no matter what.

Now we add more knobs without documentation what the exposure and risk of
each PMU is. The proposed patch set does this wholesale for every PMU
supported by the kernel. So the GPU user will ask the sysadmin to allow him
access. How can he make an informed decision? If he grants it then the next
user comes around and wants it for something else arguing that the other
got it for the GPU already. How can he make an informed decision about
that one?

We provide the knobs, so it's also our responsibility towards our users to
give them the information about their usage and scope. And every single PMU
knob has a different scope.

The documentation of the gazillion of knobs in /proc and /sysfs is not
really brilliant, but we should really not continue this bad practice
especially not when these knobs have potentially security relevant
implcations. Yes, I know, writing documentation is work, but it's valuable
and is appreciated by our users.

To make this doable and not blocked by requiring every PMU to be analyzed
and documented at once, I suggested to make this opt-in. Do analysis for a
given PMU, decide whether it should be exposed at all. If so, document it
proper and flip the bit. That way this can be done gradually as the need
arises and we can exclude the riskier ones completely.

I don't think this is an unreasonable request as it does not require the
i915 folks to look at PMUs they are not familiar with and does not get
blocked by waiting on every PMU wizard on the planet to have time.

Start with something like Documentation/admin-guide/perf-security.rst or
whatever name fits better and add a proper documentation for the existing
knob. With the infrastructure for fine grained access control add the
general explanation for fine grained access control. With each PMU which
opt's in for the knob, add a section with guidance about scope and risk for
this particular one.

Thanks,

tglx



2018-09-28 20:47:15

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

> Right now we have a single knob, which is poorly documented and that should
> be fixed first. But some googling gives you the information that allowing
> unprivilegded access is a security risk. So the security focussed sysadmin

Ah only if google could simply answer all our questions!

> will deny access to the PMUs no matter what.

It's not like there is or isn't a security risk and that you
can say that it is or it isn't in a global way.

Essentially these are channels of information. The channels always exist
in form of timing variances for any shared resource (like shared caches
or shared memory/IO/interconnect bandwidth) that can be measured.

Perfmon counters make the channels generally less noisy, but they do not cause
them.

To really close them completely you would need to avoid sharing
anything, or not allowing to measure time, neither of which is practical
short of an air gap.

There are reasonable assesments you can make either way and the answers
will be different based on your requirements. There isn't a single
answer that works for everyone.

There are cases where it isn't a problem at all.

If you don't have multiple users on the system your tolerance
should be extremely high.

For users who have multiple users there can be different tradeoffs.

So there isn't a single answer, and that is why it is important
that this if configurable.

-Andi


2018-09-28 20:49:54

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

On Fri, Sep 28, 2018 at 06:40:17PM +0100, Mark Rutland wrote:
> On Fri, Sep 28, 2018 at 10:23:40AM -0700, Andi Kleen wrote:
> > > There's also been prior discussion on these feature in other contexts
> > > (e.g. android expoits resulting from out-of-tree drivers). It would be
> > > nice to see those considered.
> > >
> > > IIRC The conclusion from prior discussions (e.g. [1]) was that we wanted
> > > finer granularity of control such that we could limit PMU access to
> > > specific users -- e.g. disallow arbitrary android apps from poking *any*
> > > PMU, while allowing some more trusted apps/users to uses *some* specific
> > > PMUs.
> > >
> > > e.g. we could add /sys/bus/event_source/devices/${PMU}/device, protect
> > > this via the usual fs ACLs, and pass the fd to perf_event_open()
> > > somehow. A valid fd would act as a capability, taking precedence over
> > > perf_event_paranoid.
> >
> > That sounds like an orthogonal feature. I don't think the original
> > patchkit would need to be hold up for this. It would be something
> > in addition.
>
> I have to say that I disagree -- these controls will have to interact
> somehow, and the fewer of them we have, the less complexity we'll have
> to deal with longer-term.

You're proposing to completely redesign perf_event_open.

This new file descriptor argument doesn't exist today so it would
need to create a new system call with more arguments

(and BTW it would be more than the normal 6 argument limit
we have, so actually it couldn't even be a standard sycall)

Obviously we would need to keep the old system call around
for compability, so you would need to worry about this
interaction in any case!

So tying it together doesn't make any sense, because
the problem has to be solved separately anyways.

>
> > BTW can't you already do that with the syscall filter? I assume
> > the Android sandboxes already use that. Just forbid perf_event_open
> > for the apps.
>
> Note that this was about providing access to *some* PMUs in some cases.
>
> IIUC, if that can be done today via a syscall filter, the same is true
> of per-pmu paranoid settings.

The difference is that the Android sandboxes likely already doing this
and have all the infrastructure, and it's just another rule.

Requiring syscall filters just to use the PMU on xn system
that otherwise doesn't need them would be very odd.

-Andi

2018-09-28 20:54:55

by Jann Horn

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

On Fri, Sep 28, 2018 at 10:49 PM Andi Kleen <[email protected]> wrote:
>
> On Fri, Sep 28, 2018 at 06:40:17PM +0100, Mark Rutland wrote:
> > On Fri, Sep 28, 2018 at 10:23:40AM -0700, Andi Kleen wrote:
> > > > There's also been prior discussion on these feature in other contexts
> > > > (e.g. android expoits resulting from out-of-tree drivers). It would be
> > > > nice to see those considered.
> > > >
> > > > IIRC The conclusion from prior discussions (e.g. [1]) was that we wanted
> > > > finer granularity of control such that we could limit PMU access to
> > > > specific users -- e.g. disallow arbitrary android apps from poking *any*
> > > > PMU, while allowing some more trusted apps/users to uses *some* specific
> > > > PMUs.
> > > >
> > > > e.g. we could add /sys/bus/event_source/devices/${PMU}/device, protect
> > > > this via the usual fs ACLs, and pass the fd to perf_event_open()
> > > > somehow. A valid fd would act as a capability, taking precedence over
> > > > perf_event_paranoid.
> > >
> > > That sounds like an orthogonal feature. I don't think the original
> > > patchkit would need to be hold up for this. It would be something
> > > in addition.
> >
> > I have to say that I disagree -- these controls will have to interact
> > somehow, and the fewer of them we have, the less complexity we'll have
> > to deal with longer-term.
>
> You're proposing to completely redesign perf_event_open.

And I think it would be a very good redesign. :) I love things that
use file descriptors to represent capabilities.

> This new file descriptor argument doesn't exist today so it would
> need to create a new system call with more arguments

Is that true? The first argument is a pointer to a struct that
contains its own size, so it can be expanded without an ABI break. I
don't see any reason why you couldn't cram more stuff in there.

> (and BTW it would be more than the normal 6 argument limit
> we have, so actually it couldn't even be a standard sycall)
>
> Obviously we would need to keep the old system call around
> for compability, so you would need to worry about this
> interaction in any case!
>
> So tying it together doesn't make any sense, because
> the problem has to be solved separately anyways.
>
> >
> > > BTW can't you already do that with the syscall filter? I assume
> > > the Android sandboxes already use that. Just forbid perf_event_open
> > > for the apps.
> >
> > Note that this was about providing access to *some* PMUs in some cases.
> >
> > IIUC, if that can be done today via a syscall filter, the same is true
> > of per-pmu paranoid settings.
>
> The difference is that the Android sandboxes likely already doing this
> and have all the infrastructure, and it's just another rule.
>
> Requiring syscall filters just to use the PMU on xn system
> that otherwise doesn't need them would be very odd.
>
> -Andi

2018-09-28 20:59:58

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

> > This new file descriptor argument doesn't exist today so it would
> > need to create a new system call with more arguments
>
> Is that true? The first argument is a pointer to a struct that
> contains its own size, so it can be expanded without an ABI break. I
> don't see any reason why you couldn't cram more stuff in there.

You're right we could put the fd into the perf_event, but the following is
still true:

> > Obviously we would need to keep the old system call around
> > for compability, so you would need to worry about this
> > interaction in any case!
> >
> > So tying it together doesn't make any sense, because
> > the problem has to be solved separately anyways.


-Andi

2018-09-28 21:24:43

by Jann Horn

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

On Fri, Sep 28, 2018 at 10:59 PM Andi Kleen <[email protected]> wrote:
> > > This new file descriptor argument doesn't exist today so it would
> > > need to create a new system call with more arguments
> >
> > Is that true? The first argument is a pointer to a struct that
> > contains its own size, so it can be expanded without an ABI break. I
> > don't see any reason why you couldn't cram more stuff in there.
>
> You're right we could put the fd into the perf_event, but the following is
> still true:
>
> > > Obviously we would need to keep the old system call around
> > > for compability, so you would need to worry about this
> > > interaction in any case!

<blasphemy>
Is that true? IIRC if you want to use the perf tools after a kernel
update, you have to install a new version of perf anyway, no? I think
after I run a kernel update, when I run "perf top", it just refuses to
start and tells me to go install a newer version. Would the users of
perf_event_open() that want to monitor this graphics stuff normally
keep working after a kernel version bump?
I realize that the kernel is very much against breaking userspace
interfaces, but if userspace has already decided to break itself after
every update, we might as well take advantage of that...
</blasphemy>

> > > So tying it together doesn't make any sense, because
> > > the problem has to be solved separately anyways.

2018-09-28 21:29:03

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

On Fri, Sep 28, 2018 at 11:22:37PM +0200, Jann Horn wrote:
> On Fri, Sep 28, 2018 at 10:59 PM Andi Kleen <[email protected]> wrote:
> > > > This new file descriptor argument doesn't exist today so it would
> > > > need to create a new system call with more arguments
> > >
> > > Is that true? The first argument is a pointer to a struct that
> > > contains its own size, so it can be expanded without an ABI break. I
> > > don't see any reason why you couldn't cram more stuff in there.
> >
> > You're right we could put the fd into the perf_event, but the following is
> > still true:
> >
> > > > Obviously we would need to keep the old system call around
> > > > for compability, so you would need to worry about this
> > > > interaction in any case!
>
> <blasphemy>
> Is that true? IIRC if you want to use the perf tools after a kernel
> update, you have to install a new version of perf anyway, no?

Not at all. perf is fully ABI compatible.

Yes Ubuntu/Debian make you do it, but there is no reason for it other
than their ignorance. Other sane distributions don't.

Usually the first step when I'm forced to use one of those machine is to
remove the useless wrapper and call the perf binary directly.

-Andi

2018-09-28 22:04:37

by Jann Horn

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

On Fri, Sep 28, 2018 at 5:12 PM Jann Horn <[email protected]> wrote:
>
> On Fri, Sep 28, 2018 at 3:22 PM Tvrtko Ursulin
> <[email protected]> wrote:
> > On 28/09/2018 11:26, Thomas Gleixner wrote:
> > > On Wed, 19 Sep 2018, Tvrtko Ursulin wrote:
> > >> For situations where sysadmins might want to allow different level of
> > >> access control for different PMUs, we start creating per-PMU
> > >> perf_event_paranoid controls in sysfs.
> > >>
> > >> These work in equivalent fashion as the existing perf_event_paranoid
> > >> sysctl, which now becomes the parent control for each PMU.
> > >>
> > >> On PMU registration the global/parent value will be inherited by each PMU,
> > >> as it will be propagated to all registered PMUs when the sysctl is
> > >> updated.
> > >>
> > >> At any later point individual PMU access controls, located in
> > >> <sysfs>/device/<pmu-name>/perf_event_paranoid, can be adjusted to achieve
> > >> fine grained access control.
> > >>
> > >> Discussion from previous posting:
> > >> https://lkml.org/lkml/2018/5/21/156
> > >
> > > This is really not helpful. The cover letter and the change logs should
> > > contain a summary of that discussion and a proper justification of the
> > > proposed change. Just saying 'sysadmins might want to allow' is not useful
> > > at all, it's yet another 'I want a pony' thing.
> >
> > Okay, for the next round I will expand the cover letter with at least
> > one concrete example on how it is usable and summarize the discussion a bit.
> >
> > > I read through the previous thread and there was a clear request to involve
> > > security people into this. Especially those who are deeply involved with
> > > hardware side channels. I don't see anyone Cc'ed on the whole series.
> >
> > Who would you recommend I add? Because I really don't know..
> >
> > > For the record, I'm not buying the handwavy 'more noise' argument at
> > > all. It wants a proper analysis and we need to come up with criteria which
> > > PMUs can be exposed at all.
> > >
> > > All of this want's a proper documentation clearly explaining the risks and
> > > scope of these knobs per PMU. Just throwing magic knobs at sysadmins and
> > > then saying 'its their problem to figure it out' is not acceptable.
> >
> > Presumably you see adding fine grained control as diminishing the
> > overall security rather than raising it? Could you explain why? Because
> > incompetent sysadmin will turn it off for some PMU, while without having
> > the fine-grained control they wouldn't turn it off globally?
> >
> > This feature was requested by the exact opposite concern, that in order
> > to access the i915 PMU, one has to compromise the security of the entire
> > system by allowing access to *all* PMU's.
> >
> > Making this ability fine-grained sounds like a logical solution for
> > solving this weakening of security controls.
> >
> > Concrete example was that on video transcoding farms users want to
> > monitor the utilization of GPU engines (like CPU cores) and they can do
> > that via the i915 PMU. But for that to work today they have to dial down
> > the global perf_event_paranoid setting. Obvious improvement was to allow
> > them to only dial down the i915.perf_event_paranoid setting. As such,
> > for this specific use case at least, the security is increased.
>
> Which paranoia level would be used for the i915.perf_event_paranoid
> setting in such a case?

Ah, I guess the answer is "0", since you want to see data about what
other users are doing.

Does the i915 PMU expose sampling events, counting events, or both?
The thing about sampling events is that they AFAIK always let the user
pick arbitrary data to collect - like register contents, or userspace
stack memory -, and independent of the performance counter being
monitored, this kind of access should not be permitted to other
contexts. (But it might be that I misunderstand how perf works - I'm
not super familiar with its API.)

2018-09-29 06:19:59

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

On Fri, 28 Sep 2018, Andi Kleen wrote:
> So there isn't a single answer, and that is why it is important
> that this if configurable.

I said clearly that I'm not opposed against making it configurable. But
because there is no single answer, it's even more important to have proper
documentation. And that's all I'm asking for aside of making it opt-in
instead of a wholesale expose everything approach.

Thanks,

tglx



2018-09-29 06:30:46

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

On Fri, 28 Sep 2018, Andi Kleen wrote:
> > > This new file descriptor argument doesn't exist today so it would
> > > need to create a new system call with more arguments
> >
> > Is that true? The first argument is a pointer to a struct that
> > contains its own size, so it can be expanded without an ABI break. I
> > don't see any reason why you couldn't cram more stuff in there.
>
> You're right we could put the fd into the perf_event, but the following is
> still true:
>
> > > Obviously we would need to keep the old system call around
> > > for compability, so you would need to worry about this
> > > interaction in any case!
> > >
> > > So tying it together doesn't make any sense, because
> > > the problem has to be solved separately anyways.

And why so? You can keep the original functionality around with the
existing restrictions without breaking any existing user space. That
existing functionality does not require new knobs. It stays as is.

So if you want to use the enhanced version with per PMU permissions based
on file descriptors you need a new version of perf. That's nothing new, if
the kernel adds new features to any syscall, then you need new tools, new
libraries etc. The only guarantee the kernel makes is not to break existing
user space, but there is no guarantee that you can utilize new features
with existing userspace.

Thanks,

tglx

2018-10-01 06:25:58

by Alexey Budankov

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

Hello,

On 28.09.2018 21:20, Thomas Gleixner wrote:
<SNIP>
> Start with something like Documentation/admin-guide/perf-security.rst or
> whatever name fits better and add a proper documentation for the existing
> knob. With the infrastructure for fine grained access control add the
> general explanation for fine grained access control. With each PMU which
> opt's in for the knob, add a section with guidance about scope and risk for
> this particular one.

Sounds like a plan. Thanks!

BR,
Alexey

>
> Thanks,
>
> tglx
>
>
>

2018-10-01 06:27:41

by Alexey Budankov

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

Hello Jann and Kees,

On 29.09.2018 1:02, Jann Horn wrote:
<SNIP>
> Ah, I guess the answer is "0", since you want to see data about what
> other users are doing.
>
> Does the i915 PMU expose sampling events, counting events, or both?
> The thing about sampling events is that they AFAIK always let the user
> pick arbitrary data to collect - like register contents, or userspace
> stack memory -, and independent of the performance counter being
> monitored, this kind of access should not be permitted to other
> contexts. (But it might be that I misunderstand how perf works - I'm
> not super familiar with its API.)
>

Currently *core* paranoid >= 1 (per-process mode) prevents simultaneous
sampling on CPU events (perf record) and reading of uncore HW counters
(perf stat -I), because uncore counters count system wide and that is
allowed only when *core* paranoid <= 0.

Uncore counts collected simultaneously with CPU event samples can be
correlated using timestamps taken from some common system clock e.g.
CLOCK_MONOTONIC_RAW.

Could it be secure enough to still allow reading of system wide uncore
HW counters when sampling of CPU events is limited to specific processes
by *core* paranoid >= 1?

Thanks,
Alexey

2018-10-01 06:27:50

by Alexey Budankov

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

Hello Jann,

> On Fri, Sep 28, 2018 at 11:22:37PM +0200, Jann Horn wrote:
<SNIP>
>>
>> <blasphemy>
>> Is that true? IIRC if you want to use the perf tools after a kernel
>> update, you have to install a new version of perf anyway, no?

There are usages in production where perf_event_open() syscall
accompanied with read(), mmap() etc. is embedded into application
on per-thread basis and is used for self monitoring and dynamic
execution tuning.

There are also other Perf tools around that, for example, are
statically linked and then used as on Linux as on Android.

Backward compatibility does matter in these cases.

Thanks,
Alexey

2018-10-01 16:12:50

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

Alexey,

On Mon, 1 Oct 2018, Alexey Budankov wrote:
> > On Fri, Sep 28, 2018 at 11:22:37PM +0200, Jann Horn wrote:
> >> <blasphemy>
> >> Is that true? IIRC if you want to use the perf tools after a kernel
> >> update, you have to install a new version of perf anyway, no?
>
> There are usages in production where perf_event_open() syscall
> accompanied with read(), mmap() etc. is embedded into application
> on per-thread basis and is used for self monitoring and dynamic
> execution tuning.
>
> There are also other Perf tools around that, for example, are
> statically linked and then used as on Linux as on Android.
>
> Backward compatibility does matter in these cases.

Well, it's nothing fundamentally new, that new features require changes to
applications, libraries etc. It's nice if it can be avoided of course.

From a design POV, Jann's idea to have a per PMU special file which you
need to open for getting access is way better than the extra knobs. It
allows to use all existing security mechanisms to be used.

Peter and I discussed that and we came up with the idea that the file
descriptor is not even required, i.e. you could make it backward
compatible.

perf_event_open() knows which PMU is associated with the event the caller
tries to open. So perf_event_open() can try to access/open the special per
PMU file on behalf of the caller. That should get the same security
treatment like a regular open() from user space. If that succeeds, access
is granted.

The magic file could still be writeable for root to give general
restrictions aside of the file based ones similar to what you are
proposing.

The analysis and documentation requirements still remain of course.

Thanks,

tglx



2018-10-01 16:16:05

by Jann Horn

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

On Mon, Oct 1, 2018 at 6:12 PM Thomas Gleixner <[email protected]> wrote:
> From a design POV, Jann's idea to have a per PMU special file which you
> need to open for getting access is way better than the extra knobs. It
> allows to use all existing security mechanisms to be used.

(That was Mark's idea, not mine, I just agree with his idea a lot.)

2018-10-01 20:54:24

by Alexey Budankov

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)


Hello,

On 01.10.2018 19:11, Thomas Gleixner wrote:

<SNIP>

> Peter and I discussed that and we came up with the idea that the file
> descriptor is not even required, i.e. you could make it backward
> compatible.
>
> perf_event_open() knows which PMU is associated with the event the caller
> tries to open. So perf_event_open() can try to access/open the special per
> PMU file on behalf of the caller. That should get the same security
> treatment like a regular open() from user space. If that succeeds, access
> is granted.
>
> The magic file could still be writeable for root to give general
> restrictions aside of the file based ones similar to what you are
> proposing.

Let me wrap up all the requirements and ideas that have been captured so far.

1. A file [1] is added so that it can belong to a group of users allowed to use ${PMU},
something like this:

ls -alh /sys/bus/event_source/devices/${PMU}/caps/
total 0
drwxr-xr-x 2 root root 0 Oct 1 20:36 .
drwxr-xr-x 6 root root 0 Oct 1 20:36 ..
-r--r--r-- 1 root root 4.0K Oct 1 20:36 branches
-r--r--r-- 1 root root 4.0K Oct 1 20:36 max_precise
-r--r--r-- 1 root root 4.0K Oct 1 20:36 pmu_name
-rw-r--r-- root ${PMU}_users paranoid <===

Modifications of file content are allowed to those who can
modify /proc/sys/kernel/perf_event_paranoid setting.

2. Semantics and content of the introduced paranoid file is
similar to /proc/sys/kernel/perf_even_paranoid [2]:

The perf_event_paranoid file can be set to restrict access
to the performance counters.

2 allow only user-space measurements (default since Linux 4.6).
1 allow both kernel and user measurements (default before Linux 4.6).
0 allow access to CPU-specific data but not raw trace‐point samples.
-1 no restrictions.

The existence of the perf_event_paranoid file is the official method
for determining if a kernel supports perf_event_open().

3. Every time an event for ${PMU} is created over perf_event_open():
a) the calling thread's euid is checked to belong to ${PMU}_users group
and if it does then the event's fd is allocated;
b) then traditional checks against perf_event_pranoid content are applied;
c) if the file doesn't exist the access is governed by global setting
at /proc/sys/kernel/perf_even_paranoid;

4. Documentation/admin-guide/perf-security.rst file is introduced that:
a) contains general explanation for fine grained access control;
b) contains a section with guidance about scope and risk for each PMU
which is enabled for fine grained access control;
c) file is extended when more PMUs are enabled for fine grain control;

>
> The analysis and documentation requirements still remain of course.

Security analysis for uncore IMC, QPI/UPI, PCIe PMUs is still required
to be enabled for fine grain control.

Thanks,
Alexey

[1] https://patchwork.kernel.org/patch/9249919/#19714087
[2] http://man7.org/linux/man-pages/man2/perf_event_open.2.html

2018-10-02 06:41:11

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

Alexey,

On Mon, 1 Oct 2018, Alexey Budankov wrote:
> > perf_event_open() knows which PMU is associated with the event the caller
> > tries to open. So perf_event_open() can try to access/open the special per
> > PMU file on behalf of the caller. That should get the same security
> > treatment like a regular open() from user space. If that succeeds, access
> > is granted.
> >
> > The magic file could still be writeable for root to give general
> > restrictions aside of the file based ones similar to what you are
> > proposing.
>
> Let me wrap up all the requirements and ideas that have been captured so far.
>
> 1. A file [1] is added so that it can belong to a group of users allowed to use ${PMU},
> something like this:
>
> ls -alh /sys/bus/event_source/devices/${PMU}/caps/
> total 0
> drwxr-xr-x 2 root root 0 Oct 1 20:36 .
> drwxr-xr-x 6 root root 0 Oct 1 20:36 ..
> -r--r--r-- 1 root root 4.0K Oct 1 20:36 branches
> -r--r--r-- 1 root root 4.0K Oct 1 20:36 max_precise
> -r--r--r-- 1 root root 4.0K Oct 1 20:36 pmu_name
> -rw-r--r-- root ${PMU}_users paranoid <===

Right, though I personaly prefer something like 'access_control' as file
name, but that's bike shed painting realm.

> Modifications of file content are allowed to those who can
> modify /proc/sys/kernel/perf_event_paranoid setting.
>
> 2. Semantics and content of the introduced paranoid file is
> similar to /proc/sys/kernel/perf_even_paranoid [2]:
>
> The perf_event_paranoid file can be set to restrict access
> to the performance counters.
>
> 2 allow only user-space measurements (default since Linux 4.6).
> 1 allow both kernel and user measurements (default before Linux 4.6).
> 0 allow access to CPU-specific data but not raw trace‐point samples.
> -1 no restrictions.
>
> The existence of the perf_event_paranoid file is the official method
> for determining if a kernel supports perf_event_open().
>
> 3. Every time an event for ${PMU} is created over perf_event_open():
> a) the calling thread's euid is checked to belong to ${PMU}_users group
> and if it does then the event's fd is allocated;

Not only the user group, it really should do the full security checks which
are done on open().

> b) then traditional checks against perf_event_pranoid content are applied;

Hmm, not sure about that because that might be conflicting.

> c) if the file doesn't exist the access is governed by global setting
> at /proc/sys/kernel/perf_even_paranoid;

Correct.

> 4. Documentation/admin-guide/perf-security.rst file is introduced that:

0) Better documentation of /proc/sys/kernel/perf_even_paranoid

> a) contains general explanation for fine grained access control;
> b) contains a section with guidance about scope and risk for each PMU
> which is enabled for fine grained access control;
> c) file is extended when more PMUs are enabled for fine grain control;

Thanks,

tglx

2018-10-02 11:47:11

by Alexey Budankov

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)


Hello,

On 02.10.2018 9:40, Thomas Gleixner wrote:

<SNIP>

>
> Not only the user group, it really should do the full security checks which
> are done on open().

I expect it is already implemented by some internal kernel API so that
it could be reused.

>
>> b) then traditional checks against perf_event_pranoid content are applied;
>
> Hmm, not sure about that because that might be conflicting.

Well, possible contradictions could be converged to some reasonable point
during technical review stage.

Current perf_event_paranoid semantics is still required for PMUs
that are governed by global setting at /proc/sys/kernel/perf_event_paranoid.

<SNIP>

>> 4. Documentation/admin-guide/perf-security.rst file is introduced that:
>
> 0) Better documentation of /proc/sys/kernel/perf_even_paranoid

Exactly. perf_event_open man7 [1] requires update as well, however
this is not a part of kernel source tree so these docs changes are
to be mailed TO: [email protected] and CC: [email protected].

Thanks,
Alexey

[1] http://man7.org/linux/man-pages/man2/perf_event_open.2.html

2018-10-03 17:02:04

by Jann Horn

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

On Mon, Oct 1, 2018 at 10:53 PM Alexey Budankov
<[email protected]> wrote:
> On 01.10.2018 19:11, Thomas Gleixner wrote:
> > Peter and I discussed that and we came up with the idea that the file
> > descriptor is not even required, i.e. you could make it backward
> > compatible.
> >
> > perf_event_open() knows which PMU is associated with the event the caller
> > tries to open. So perf_event_open() can try to access/open the special per
> > PMU file on behalf of the caller. That should get the same security
> > treatment like a regular open() from user space. If that succeeds, access
> > is granted.
> >
> > The magic file could still be writeable for root to give general
> > restrictions aside of the file based ones similar to what you are
> > proposing.
>
> Let me wrap up all the requirements and ideas that have been captured so far.
>
> 1. A file [1] is added so that it can belong to a group of users allowed to use ${PMU},
> something like this:
>
> ls -alh /sys/bus/event_source/devices/${PMU}/caps/
> total 0
> drwxr-xr-x 2 root root 0 Oct 1 20:36 .
> drwxr-xr-x 6 root root 0 Oct 1 20:36 ..
> -r--r--r-- 1 root root 4.0K Oct 1 20:36 branches
> -r--r--r-- 1 root root 4.0K Oct 1 20:36 max_precise
> -r--r--r-- 1 root root 4.0K Oct 1 20:36 pmu_name
> -rw-r--r-- root ${PMU}_users paranoid <===
>
> Modifications of file content are allowed to those who can
> modify /proc/sys/kernel/perf_event_paranoid setting.
>
> 2. Semantics and content of the introduced paranoid file is
> similar to /proc/sys/kernel/perf_even_paranoid [2]:
>
> The perf_event_paranoid file can be set to restrict access
> to the performance counters.
>
> 2 allow only user-space measurements (default since Linux 4.6).
> 1 allow both kernel and user measurements (default before Linux 4.6).
> 0 allow access to CPU-specific data but not raw trace‐point samples.
> -1 no restrictions.
>
> The existence of the perf_event_paranoid file is the official method
> for determining if a kernel supports perf_event_open().
>
> 3. Every time an event for ${PMU} is created over perf_event_open():
> a) the calling thread's euid is checked to belong to ${PMU}_users group
> and if it does then the event's fd is allocated;
> b) then traditional checks against perf_event_pranoid content are applied;
> c) if the file doesn't exist the access is governed by global setting
> at /proc/sys/kernel/perf_even_paranoid;

You'll also have to make sure that this thing in kernel/events/core.c
doesn't have any bad effect:

/*
* Special case software events and allow them to be part of
* any hardware group.
*/

As in, make sure that you can't smuggle in arbitrary software events
by attaching them to a whitelisted hardware event.

> 4. Documentation/admin-guide/perf-security.rst file is introduced that:
> a) contains general explanation for fine grained access control;
> b) contains a section with guidance about scope and risk for each PMU
> which is enabled for fine grained access control;
> c) file is extended when more PMUs are enabled for fine grain control;
>
> >
> > The analysis and documentation requirements still remain of course.
>
> Security analysis for uncore IMC, QPI/UPI, PCIe PMUs is still required
> to be enabled for fine grain control.

And you can't whitelist anything that permits using sampling events
with arbitrary sample_type.

2018-10-04 17:15:24

by Alexey Budankov

[permalink] [raw]
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

Hi,

On 03.10.2018 20:01, Jann Horn wrote:
> On Mon, Oct 1, 2018 at 10:53 PM Alexey Budankov
> <[email protected]> wrote:
<SNIP>
>> 3. Every time an event for ${PMU} is created over perf_event_open():
>> a) the calling thread's euid is checked to belong to ${PMU}_users group
>> and if it does then the event's fd is allocated;
>> b) then traditional checks against perf_event_pranoid content are applied;
>> c) if the file doesn't exist the access is governed by global setting
>> at /proc/sys/kernel/perf_even_paranoid;
>
> You'll also have to make sure that this thing in kernel/events/core.c
> doesn't have any bad effect:
>
> /*
> * Special case software events and allow them to be part of
> * any hardware group.
> */
>
> As in, make sure that you can't smuggle in arbitrary software events
> by attaching them to a whitelisted hardware event.

Yes, makes sense. Please see and comment below.

>
<SNIP>
>> Security analysis for uncore IMC, QPI/UPI, PCIe PMUs is still required
>> to be enabled for fine grain control.
>
> And you can't whitelist anything that permits using sampling events
> with arbitrary sample_type.
>

It appears that there is a dependency on the significance of data that PMUs captures
for later analysis. Currently there are following options for data being captured
(please correct or extend if something is missing from the list below):

1) Monitored process details:
- system information on a process as a container (of threads, memory data and
IDs (e.g. open fds) from process specific namespaces and etc.);
- system information on threads as containers (of execution context details);
2) Execution context details:
- memory addresses;
- memory data;
- calculation results;
- calculation state in HW;
3) Monitored process and execution context telemetry data, used for building
various performance metrics and can come from:
- user mode code and OS kernel;
- various parts of HW e.g. core, uncore, peripheral and etc.

Group 2) is the potential leakage source of sensitive process data so if a PMU,
at some mode, samples execution context details then the PMU, working in that mode,
is the subject for *access* and *scope* control.

On the other hand if captured data contain only the monitored process details
and/or associated execution telemetry, there is probably no sensitive data leakage
thru that captured data.

For example, if cpu PMU samples PC addresses overtime, e.g. for providing
hotspots-by-function profile, then this requires to be controlled as from access as
from scope perspective, because PC addresses is execution context details that
can contain sensitive data.

However, if cpu PMU does counting of some metric value, or if software PMU reads
value of thread active time from the OS, possibly overtime, for later building some
rating profile, or reading of some HW counter value without attribution to any
execution context details, that is probably not that risky as in the case of
PC address sampling.

Uncore PMUs e.g. memory controller (IMC), interconnect (QPI/UPI) and peripheral (PCIe)
currently only read counters values that are captured system wide by HW, and provide
no attribution to any specific execution context details, thus, sensitive process data.

Based on that,

A) paranoid knob is required for a PMU if it can capture data from group 2)
B) paranoid knob limits scope of capturing sensitive data:
-3 - *scope* is defined by some high level setting
-2 - disabled - no allowed *scope*
-1 - no restrictions - max *scope*
0 - system wide
1 - process user and kernel space
2 - process user space only
C) paranoid knob has to be checked every time the PMU is going to start
capturing sensitive data to avoid capturing beyond the allowed scope.

PMU *access* semantics is derived from fs ACLs and could look like this:

r - read PMU architectural and configuration details, read PMU *access* settings
w - modify PMU *access* settings
x - modify PMU configuration and collect data

So levels of *access* to PMU could look like this:

root=rwx, ${PMU}_users=r-x, other=r--.

Possible examples of *scope* control settings could look like this:

1) system wide user+kernel mode CPU sampling with context switches
and uncore counting:

/proc/sys/kernel/perf_event_paranoid (-2, 2): 0
SW.paranoid (-3, 2):(root=rwx, SW_users=r-x,other=r--): -3
CPU.paranoid (-3, 2):(root=rwx,CPU_users=r-x,other=r--): -3
IMC.paranoid (-3,-1):(root=rwx,IMC_users=r-x,other=r--): -3
UPI.paranoid (-3,-1):(root=rwx,UPI_users=r-x,other=r--): -3
PCI.paranoid (-3,-1):(root=rwx,PCI_users=r-x,other=r--): -3

2) per-process CPU sampling with context switches and uncore counting:

/proc/sys/kernel/perf_event_paranoid (-2, 2): 1|2
SW.paranoid (-3, 2):(root=rwx, SW_users=r-x,other=r--): -3
CPU.paranoid (-3, 2):(root=rwx,CPU_users=r-x,other=r--): -3
IMC.paranoid (-3,-1):(root=rwx,IMC_users=r-x,other=r--): -1
UPI.paranoid (-3,-1):(root=rwx,UPI_users=r-x,other=r--): -1
PCI.paranoid (-3,-1):(root=rwx,PCI_users=r-x,other=r--): -1

3) per-process user mode CPU sampling allowed to specific ${PMU}_groups only:

/proc/sys/kernel/perf_event_paranoid (-2, 2): -2
SW.paranoid (-3, 2):(root=rwx, SW_users=r-x,other=r--): 2
CPU.paranoid (-3, 2):(root=rwx,CPU_users=r-x,other=r--): 2
IMC.paranoid (-3,-1):(root=rwx,IMC_users=r-x,other=r--): -3
UPI.paranoid (-3,-1):(root=rwx,UPI_users=r-x,other=r--): -3
PCI.paranoid (-3,-1):(root=rwx,PCI_users=r-x,other=r--): -3

4) uncore HW counters monitoring, possibly overtime:

/proc/sys/kernel/perf_event_paranoid (-2, 2): -2
SW.paranoid (-3, 2):(root=rwx, SW_users=r-x,other=r--): -3
CPU.paranoid (-3, 2):(root=rwx,CPU_users=r-x,other=r--): -3
IMC.paranoid (-3,-1):(root=rwx,IMC_users=r-x,other=r--): -1
UPI.paranoid (-3,-1):(root=rwx,UPI_users=r-x,other=r--): -1
PCI.paranoid (-3,-1):(root=rwx,PCI_users=r-x,other=r--): -1

Please share more thought so that it eventually could go into
Documentation/admin-guide/perf-security.rst.

Thanks,
Alexey