2014-06-02 13:59:42

by Yann Droneaud

[permalink] [raw]
Subject: [PATCHv5] perf tools: enable close-on-exec flag on perf file descriptor

In commit a21b0b354d4a ('perf: Introduce a flag to enable
close-on-exec in perf_event_open()'), flag PERF_FLAG_FD_CLOEXEC
was added to perf_event_open(2) syscall to allows userspace
to atomically enable close-on-exec behavor when creating
the file descriptor.

This patch makes perf tools use the new flag if supported
by the kernel, so that the event file descriptors got
automatically closed if perf tool exec a sub-command.

Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Yann Droneaud <[email protected]>
---

Hi,

Quite the same patch from v4. I'm interested in some
feedback so that I could improve the patch if needed.

Regards.

Changes from v4 [1]:
- rebase on next-20140530 and update commit message.

Changes from v3 [2]:
- rebase on next-20140307 and resend asis.

Changes from v2 [3]:
- use safest perf attr configuration in probe function,
as used in perf_evsel__fallback().

Changes from v1 [4]:
- don't probe PERF_FLAG_FD_CLOEXEC for each call to
perf_event_open_cloexec_flag(): don't forget to set
'probed' variable once flag was probed.
- call perf_event_open_cloexec_flag() only once in
util/record.c:perf_do_probe_api().
- fixed minor coding style issue (unneeded braces)
in util/cloexec.c

Changes from v0 [5]:
- add probing for PERF_FLAG_FD_CLOEXEC flag allowing
perf to run on older kernel:
* use "missing feature" pattern in evsel to disable
PERF_FLAG_FD_CLOEXEC in perf_evsel__open() if not
supported by kernel;
* add a dedicated function to probe for
PERF_FLAG_FD_CLOEXEC support in the current kernel.
This function is to be used by other functions
calling sys_perf_event_open() directly.
- remove PERF_FLAG_FD_CLOEXEC from PowerPC selftest
as it's not related to perf tool: it should be part
of a separate patch.

[1] http://lkml.kernel.org/r/[email protected]
https://patchwork.kernel.org/patch/3809711/

[2] http://lkml.kernel.org/r/[email protected]
https://patchwork.kernel.org/patch/3540091/

[3] http://lkml.kernel.org/r/[email protected]
https://patchwork.kernel.org/patch/3474141/

[4] http://lkml.kernel.org/r/[email protected]
https://patchwork.kernel.org/patch/3469571/

[5] http://lkml.kernel.org/r/[email protected]
https://patchwork.kernel.org/patch/3437111/

tools/perf/Makefile.perf | 1 +
tools/perf/bench/mem-memcpy.c | 4 ++-
tools/perf/bench/mem-memset.c | 4 ++-
tools/perf/builtin-sched.c | 4 ++-
tools/perf/tests/bp_signal.c | 4 ++-
tools/perf/tests/bp_signal_overflow.c | 4 ++-
tools/perf/tests/rdpmc.c | 4 ++-
tools/perf/util/cloexec.c | 56 +++++++++++++++++++++++++++++++++++
tools/perf/util/cloexec.h | 6 ++++
tools/perf/util/evsel.c | 12 ++++++--
tools/perf/util/record.c | 9 ++++--
11 files changed, 96 insertions(+), 12 deletions(-)
create mode 100644 tools/perf/util/cloexec.c
create mode 100644 tools/perf/util/cloexec.h

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 02f0a4dd1a80..000d34087f77 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -373,6 +373,7 @@ LIB_OBJS += $(OUTPUT)util/stat.o
LIB_OBJS += $(OUTPUT)util/record.o
LIB_OBJS += $(OUTPUT)util/srcline.o
LIB_OBJS += $(OUTPUT)util/data.o
+LIB_OBJS += $(OUTPUT)util/cloexec.o

LIB_OBJS += $(OUTPUT)ui/setup.o
LIB_OBJS += $(OUTPUT)ui/helpline.o
diff --git a/tools/perf/bench/mem-memcpy.c b/tools/perf/bench/mem-memcpy.c
index 5ce71d3b72cf..bf5a21b848a9 100644
--- a/tools/perf/bench/mem-memcpy.c
+++ b/tools/perf/bench/mem-memcpy.c
@@ -10,6 +10,7 @@
#include "../util/util.h"
#include "../util/parse-options.h"
#include "../util/header.h"
+#include "../util/cloexec.h"
#include "bench.h"
#include "mem-memcpy-arch.h"

@@ -83,7 +84,8 @@ static struct perf_event_attr cycle_attr = {

static void init_cycle(void)
{
- cycle_fd = sys_perf_event_open(&cycle_attr, getpid(), -1, -1, 0);
+ cycle_fd = sys_perf_event_open(&cycle_attr, getpid(), -1, -1,
+ perf_event_open_cloexec_flag());

if (cycle_fd < 0 && errno == ENOSYS)
die("No CONFIG_PERF_EVENTS=y kernel support configured?\n");
diff --git a/tools/perf/bench/mem-memset.c b/tools/perf/bench/mem-memset.c
index 9af79d2b18e5..260747ea1e0e 100644
--- a/tools/perf/bench/mem-memset.c
+++ b/tools/perf/bench/mem-memset.c
@@ -10,6 +10,7 @@
#include "../util/util.h"
#include "../util/parse-options.h"
#include "../util/header.h"
+#include "../util/cloexec.h"
#include "bench.h"
#include "mem-memset-arch.h"

@@ -83,7 +84,8 @@ static struct perf_event_attr cycle_attr = {

static void init_cycle(void)
{
- cycle_fd = sys_perf_event_open(&cycle_attr, getpid(), -1, -1, 0);
+ cycle_fd = sys_perf_event_open(&cycle_attr, getpid(), -1, -1,
+ perf_event_open_cloexec_flag());

if (cycle_fd < 0 && errno == ENOSYS)
die("No CONFIG_PERF_EVENTS=y kernel support configured?\n");
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index d7176830b9b2..bb4afc9fae54 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -10,6 +10,7 @@
#include "util/header.h"
#include "util/session.h"
#include "util/tool.h"
+#include "util/cloexec.h"

#include "util/parse-options.h"
#include "util/trace-event.h"
@@ -434,7 +435,8 @@ static int self_open_counters(void)
attr.type = PERF_TYPE_SOFTWARE;
attr.config = PERF_COUNT_SW_TASK_CLOCK;

- fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
+ fd = sys_perf_event_open(&attr, 0, -1, -1,
+ perf_event_open_cloexec_flag());

if (fd < 0)
pr_err("Error: sys_perf_event_open() syscall returned "
diff --git a/tools/perf/tests/bp_signal.c b/tools/perf/tests/bp_signal.c
index aba095489193..fdc0d3e185f9 100644
--- a/tools/perf/tests/bp_signal.c
+++ b/tools/perf/tests/bp_signal.c
@@ -25,6 +25,7 @@
#include "tests.h"
#include "debug.h"
#include "perf.h"
+#include "../util/cloexec.h"

static int fd1;
static int fd2;
@@ -78,7 +79,8 @@ static int bp_event(void *fn, int setup_signal)
pe.exclude_kernel = 1;
pe.exclude_hv = 1;

- fd = sys_perf_event_open(&pe, 0, -1, -1, 0);
+ fd = sys_perf_event_open(&pe, 0, -1, -1,
+ perf_event_open_cloexec_flag());
if (fd < 0) {
pr_debug("failed opening event %llx\n", pe.config);
return TEST_FAIL;
diff --git a/tools/perf/tests/bp_signal_overflow.c b/tools/perf/tests/bp_signal_overflow.c
index 44ac82179708..b0b17415f18c 100644
--- a/tools/perf/tests/bp_signal_overflow.c
+++ b/tools/perf/tests/bp_signal_overflow.c
@@ -24,6 +24,7 @@
#include "tests.h"
#include "debug.h"
#include "perf.h"
+#include "../util/cloexec.h"

static int overflows;

@@ -91,7 +92,8 @@ int test__bp_signal_overflow(void)
pe.exclude_kernel = 1;
pe.exclude_hv = 1;

- fd = sys_perf_event_open(&pe, 0, -1, -1, 0);
+ fd = sys_perf_event_open(&pe, 0, -1, -1,
+ perf_event_open_cloexec_flag());
if (fd < 0) {
pr_debug("failed opening event %llx\n", pe.config);
return TEST_FAIL;
diff --git a/tools/perf/tests/rdpmc.c b/tools/perf/tests/rdpmc.c
index e59143fd9e71..8432bfcffed6 100644
--- a/tools/perf/tests/rdpmc.c
+++ b/tools/perf/tests/rdpmc.c
@@ -6,6 +6,7 @@
#include "perf.h"
#include "debug.h"
#include "tests.h"
+#include "../util/cloexec.h"

#if defined(__x86_64__) || defined(__i386__)

@@ -104,7 +105,8 @@ static int __test__rdpmc(void)
sa.sa_sigaction = segfault_handler;
sigaction(SIGSEGV, &sa, NULL);

- fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
+ fd = sys_perf_event_open(&attr, 0, -1, -1,
+ perf_event_open_cloexec_flag());
if (fd < 0) {
pr_err("Error: sys_perf_event_open() syscall returned "
"with %d (%s)\n", fd, strerror(errno));
diff --git a/tools/perf/util/cloexec.c b/tools/perf/util/cloexec.c
new file mode 100644
index 000000000000..771ed36fcc61
--- /dev/null
+++ b/tools/perf/util/cloexec.c
@@ -0,0 +1,56 @@
+#include "util.h"
+#include "../perf.h"
+#include "cloexec.h"
+
+static unsigned long flag = PERF_FLAG_FD_CLOEXEC;
+
+static int perf_flag_probe(void)
+{
+ /* use 'safest' configuration as used in perf_evsel__fallback() */
+ struct perf_event_attr attr = {
+ .type = PERF_COUNT_SW_CPU_CLOCK,
+ .config = PERF_COUNT_SW_CPU_CLOCK,
+ };
+ int fd;
+ int err;
+
+ /* check cloexec flag */
+ fd = sys_perf_event_open(&attr, 0, -1, -1,
+ PERF_FLAG_FD_CLOEXEC);
+ if (fd >= 0) {
+ close(fd);
+ return 1;
+ }
+
+ if (errno != EINVAL) {
+ err = errno;
+ pr_warning("sys_perf_event_open() syscall returned "
+ "%d (%d: %s)\n", fd, err, strerror(err));
+ }
+
+ /* not supported, confirm error related to PERF_FLAG_FD_CLOEXEC */
+ fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
+ if (fd >= 0) {
+ close(fd);
+ return 0;
+ }
+
+ err = errno;
+ die("sys_perf_event_open() syscall returned "
+ "%d (%d: %s)\n", fd, err, strerror(err));
+
+ return -1;
+}
+
+unsigned long perf_event_open_cloexec_flag(void)
+{
+ static bool probed;
+
+ if (!probed) {
+ if (perf_flag_probe() <= 0)
+ flag = 0;
+ probed = true;
+ }
+
+ return flag;
+}
diff --git a/tools/perf/util/cloexec.h b/tools/perf/util/cloexec.h
new file mode 100644
index 000000000000..94a5a7d829d5
--- /dev/null
+++ b/tools/perf/util/cloexec.h
@@ -0,0 +1,6 @@
+#ifndef __PERF_CLOEXEC_H
+#define __PERF_CLOEXEC_H
+
+unsigned long perf_event_open_cloexec_flag(void);
+
+#endif /* __PERF_CLOEXEC_H */
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 5c28d82b76c4..d32019461993 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -29,6 +29,7 @@ static struct {
bool sample_id_all;
bool exclude_guest;
bool mmap2;
+ bool cloexec;
} perf_missing_features;

#define FD(e, x, y) (*(int *)xyarray__entry(e->fd, x, y))
@@ -988,7 +989,7 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
struct thread_map *threads)
{
int cpu, thread;
- unsigned long flags = 0;
+ unsigned long flags = PERF_FLAG_FD_CLOEXEC;
int pid = -1, err;
enum { NO_CHANGE, SET_TO_MAX, INCREASED_MAX } set_rlimit = NO_CHANGE;

@@ -997,11 +998,13 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
return -ENOMEM;

if (evsel->cgrp) {
- flags = PERF_FLAG_PID_CGROUP;
+ flags |= PERF_FLAG_PID_CGROUP;
pid = evsel->cgrp->fd;
}

fallback_missing_features:
+ if (perf_missing_features.cloexec)
+ flags &= ~(unsigned long)PERF_FLAG_FD_CLOEXEC;
if (perf_missing_features.mmap2)
evsel->attr.mmap2 = 0;
if (perf_missing_features.exclude_guest)
@@ -1070,7 +1073,10 @@ try_fallback:
if (err != -EINVAL || cpu > 0 || thread > 0)
goto out_close;

- if (!perf_missing_features.mmap2 && evsel->attr.mmap2) {
+ if (!perf_missing_features.cloexec && (flags & PERF_FLAG_FD_CLOEXEC)) {
+ perf_missing_features.cloexec = true;
+ goto fallback_missing_features;
+ } else if (!perf_missing_features.mmap2 && evsel->attr.mmap2) {
perf_missing_features.mmap2 = true;
goto fallback_missing_features;
} else if (!perf_missing_features.exclude_guest &&
diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
index 049e0a09ccd3..9a8d622809f6 100644
--- a/tools/perf/util/record.c
+++ b/tools/perf/util/record.c
@@ -4,6 +4,7 @@
#include "parse-events.h"
#include <api/fs/fs.h>
#include "util.h"
+#include "cloexec.h"

typedef void (*setup_probe_fn_t)(struct perf_evsel *evsel);

@@ -11,6 +12,7 @@ static int perf_do_probe_api(setup_probe_fn_t fn, int cpu, const char *str)
{
struct perf_evlist *evlist;
struct perf_evsel *evsel;
+ unsigned long flags = perf_event_open_cloexec_flag();
int err = -EAGAIN, fd;

evlist = perf_evlist__new();
@@ -22,14 +24,14 @@ static int perf_do_probe_api(setup_probe_fn_t fn, int cpu, const char *str)

evsel = perf_evlist__first(evlist);

- fd = sys_perf_event_open(&evsel->attr, -1, cpu, -1, 0);
+ fd = sys_perf_event_open(&evsel->attr, -1, cpu, -1, flags);
if (fd < 0)
goto out_delete;
close(fd);

fn(evsel);

- fd = sys_perf_event_open(&evsel->attr, -1, cpu, -1, 0);
+ fd = sys_perf_event_open(&evsel->attr, -1, cpu, -1, flags);
if (fd < 0) {
if (errno == EINVAL)
err = -EINVAL;
@@ -203,7 +205,8 @@ bool perf_evlist__can_select_event(struct perf_evlist *evlist, const char *str)
cpu = evlist->cpus->map[0];
}

- fd = sys_perf_event_open(&evsel->attr, -1, cpu, -1, 0);
+ fd = sys_perf_event_open(&evsel->attr, -1, cpu, -1,
+ perf_event_open_cloexec_flag());
if (fd >= 0) {
close(fd);
ret = true;
--
1.9.3


2014-06-02 19:25:19

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCHv5] perf tools: enable close-on-exec flag on perf file descriptor

On Mon, Jun 02, 2014 at 12:56:34PM +0200, Yann Droneaud wrote:

SNIP

>
> Hi,
>
> Quite the same patch from v4. I'm interested in some
> feedback so that I could improve the patch if needed.
>
> Regards.
>
> Changes from v4 [1]:
> - rebase on next-20140530 and update commit message.

hi,
I wasn't following on this one before.. so please shut me up
if my comments were already discussed ;-)

SNIP

> index d7176830b9b2..bb4afc9fae54 100644
> --- a/tools/perf/builtin-sched.c
> +++ b/tools/perf/builtin-sched.c
> @@ -10,6 +10,7 @@
> #include "util/header.h"
> #include "util/session.h"
> #include "util/tool.h"
> +#include "util/cloexec.h"
>
> #include "util/parse-options.h"
> #include "util/trace-event.h"
> @@ -434,7 +435,8 @@ static int self_open_counters(void)
> attr.type = PERF_TYPE_SOFTWARE;
> attr.config = PERF_COUNT_SW_TASK_CLOCK;
>
> - fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
> + fd = sys_perf_event_open(&attr, 0, -1, -1,
> + perf_event_open_cloexec_flag());
>
> if (fd < 0)
> pr_err("Error: sys_perf_event_open() syscall returned "
> diff --git a/tools/perf/tests/bp_signal.c b/tools/perf/tests/bp_signal.c
> index aba095489193..fdc0d3e185f9 100644
> --- a/tools/perf/tests/bp_signal.c
> +++ b/tools/perf/tests/bp_signal.c
> @@ -25,6 +25,7 @@
> #include "tests.h"
> #include "debug.h"
> #include "perf.h"
> +#include "../util/cloexec.h"

#include "cloexec.h" should be enough

>
> static int fd1;
> static int fd2;
> @@ -78,7 +79,8 @@ static int bp_event(void *fn, int setup_signal)
> pe.exclude_kernel = 1;
> pe.exclude_hv = 1;
>
> - fd = sys_perf_event_open(&pe, 0, -1, -1, 0);
> + fd = sys_perf_event_open(&pe, 0, -1, -1,
> + perf_event_open_cloexec_flag());
> if (fd < 0) {
> pr_debug("failed opening event %llx\n", pe.config);
> return TEST_FAIL;
> diff --git a/tools/perf/tests/bp_signal_overflow.c b/tools/perf/tests/bp_signal_overflow.c
> index 44ac82179708..b0b17415f18c 100644
> --- a/tools/perf/tests/bp_signal_overflow.c
> +++ b/tools/perf/tests/bp_signal_overflow.c
> @@ -24,6 +24,7 @@
> #include "tests.h"
> #include "debug.h"
> #include "perf.h"
> +#include "../util/cloexec.h"

ditto

>
> static int overflows;
>
> @@ -91,7 +92,8 @@ int test__bp_signal_overflow(void)
> pe.exclude_kernel = 1;
> pe.exclude_hv = 1;
>
> - fd = sys_perf_event_open(&pe, 0, -1, -1, 0);
> + fd = sys_perf_event_open(&pe, 0, -1, -1,
> + perf_event_open_cloexec_flag());
> if (fd < 0) {
> pr_debug("failed opening event %llx\n", pe.config);
> return TEST_FAIL;
> diff --git a/tools/perf/tests/rdpmc.c b/tools/perf/tests/rdpmc.c
> index e59143fd9e71..8432bfcffed6 100644
> --- a/tools/perf/tests/rdpmc.c
> +++ b/tools/perf/tests/rdpmc.c
> @@ -6,6 +6,7 @@
> #include "perf.h"
> #include "debug.h"
> #include "tests.h"
> +#include "../util/cloexec.h"

ditto

SNIP

> +
> +static int perf_flag_probe(void)
> +{
> + /* use 'safest' configuration as used in perf_evsel__fallback() */
> + struct perf_event_attr attr = {
> + .type = PERF_COUNT_SW_CPU_CLOCK,
> + .config = PERF_COUNT_SW_CPU_CLOCK,
> + };
> + int fd;
> + int err;
> +
> + /* check cloexec flag */
> + fd = sys_perf_event_open(&attr, 0, -1, -1,
> + PERF_FLAG_FD_CLOEXEC);
> + if (fd >= 0) {
> + close(fd);
> + return 1;
> + }
> +
> + if (errno != EINVAL) {
> + err = errno;
> + pr_warning("sys_perf_event_open() syscall returned "
> + "%d (%d: %s)\n", fd, err, strerror(err));
> + }
> +
> + /* not supported, confirm error related to PERF_FLAG_FD_CLOEXEC */
> + fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
> + if (fd >= 0) {
> + close(fd);
> + return 0;
> + }
> +
> + err = errno;
> + die("sys_perf_event_open() syscall returned "
> + "%d (%d: %s)\n", fd, err, strerror(err));

I think we have a strategy of not using die in new code..
please just use WARN_ONCE.. and let the code fail in the
standard way

SNIP

>
> #define FD(e, x, y) (*(int *)xyarray__entry(e->fd, x, y))
> @@ -988,7 +989,7 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
> struct thread_map *threads)
> {
> int cpu, thread;
> - unsigned long flags = 0;
> + unsigned long flags = PERF_FLAG_FD_CLOEXEC;
> int pid = -1, err;
> enum { NO_CHANGE, SET_TO_MAX, INCREASED_MAX } set_rlimit = NO_CHANGE;
>
> @@ -997,11 +998,13 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
> return -ENOMEM;
>
> if (evsel->cgrp) {
> - flags = PERF_FLAG_PID_CGROUP;
> + flags |= PERF_FLAG_PID_CGROUP;
> pid = evsel->cgrp->fd;
> }
>
> fallback_missing_features:
> + if (perf_missing_features.cloexec)
> + flags &= ~(unsigned long)PERF_FLAG_FD_CLOEXEC;
> if (perf_missing_features.mmap2)
> evsel->attr.mmap2 = 0;
> if (perf_missing_features.exclude_guest)
> @@ -1070,7 +1073,10 @@ try_fallback:
> if (err != -EINVAL || cpu > 0 || thread > 0)
> goto out_close;
>
> - if (!perf_missing_features.mmap2 && evsel->attr.mmap2) {
> + if (!perf_missing_features.cloexec && (flags & PERF_FLAG_FD_CLOEXEC)) {
> + perf_missing_features.cloexec = true;
> + goto fallback_missing_features;

I think it does not fit in here, because we check latest perf_event_attr
bits and removing one after another, to see if miracle happens.
(also looks like we miss exclude_callchain_(kernel|user) bits)

Wouldn't it be better just to use perf_event_open_cloexec_flag() function
as you did in the rest of the code? I think we dont need 2 detection codes
for this.

> + } else if (!perf_missing_features.mmap2 && evsel->attr.mmap2) {
> perf_missing_features.mmap2 = true;
> goto fallback_missing_features;
> } else if (!perf_missing_features.exclude_guest &&
> diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
> index 049e0a09ccd3..9a8d622809f6 100644
> --- a/tools/perf/util/record.c
> +++ b/tools/perf/util/record.c
> @@ -4,6 +4,7 @@
> #include "parse-events.h"
> #include <api/fs/fs.h>
> #include "util.h"
> +#include "cloexec.h"
>
> typedef void (*setup_probe_fn_t)(struct perf_evsel *evsel);
>
> @@ -11,6 +12,7 @@ static int perf_do_probe_api(setup_probe_fn_t fn, int cpu, const char *str)
> {
> struct perf_evlist *evlist;
> struct perf_evsel *evsel;
> + unsigned long flags = perf_event_open_cloexec_flag();
> int err = -EAGAIN, fd;
>
> evlist = perf_evlist__new();
> @@ -22,14 +24,14 @@ static int perf_do_probe_api(setup_probe_fn_t fn, int cpu, const char *str)
>
> evsel = perf_evlist__first(evlist);
>
> - fd = sys_perf_event_open(&evsel->attr, -1, cpu, -1, 0);
> + fd = sys_perf_event_open(&evsel->attr, -1, cpu, -1, flags);
> if (fd < 0)
> goto out_delete;
> close(fd);
>
> fn(evsel);
>
> - fd = sys_perf_event_open(&evsel->attr, -1, cpu, -1, 0);
> + fd = sys_perf_event_open(&evsel->attr, -1, cpu, -1, flags);
> if (fd < 0) {
> if (errno == EINVAL)
> err = -EINVAL;
> @@ -203,7 +205,8 @@ bool perf_evlist__can_select_event(struct perf_evlist *evlist, const char *str)
> cpu = evlist->cpus->map[0];
> }

hum... no one calls this exported function ^^^ :-) I wonder why it's there.. cool

thanks,
jirka

2014-06-03 08:57:46

by Yann Droneaud

[permalink] [raw]
Subject: Re: [PATCHv5] perf tools: enable close-on-exec flag on perf file descriptor

Le lundi 02 juin 2014 à 21:23 +0200, Jiri Olsa a écrit :
> On Mon, Jun 02, 2014 at 12:56:34PM +0200, Yann Droneaud wrote:
>
> SNIP
>
> >
> > Hi,
> >
> > Quite the same patch from v4. I'm interested in some
> > feedback so that I could improve the patch if needed.
> >
> > Regards.
> >
> > Changes from v4 [1]:
> > - rebase on next-20140530 and update commit message.
>
> hi,
> I wasn't following on this one before.. so please shut me up
> if my comments were already discussed ;-)
>

You had a chance to review it before :)
See http://lkml.kernel.org/r/[email protected]

> SNIP
>
> > index d7176830b9b2..bb4afc9fae54 100644
> > --- a/tools/perf/builtin-sched.c
> > +++ b/tools/perf/builtin-sched.c
> > @@ -10,6 +10,7 @@
> > #include "util/header.h"
> > #include "util/session.h"
> > #include "util/tool.h"
> > +#include "util/cloexec.h"
> >
> > #include "util/parse-options.h"
> > #include "util/trace-event.h"
> > @@ -434,7 +435,8 @@ static int self_open_counters(void)
> > attr.type = PERF_TYPE_SOFTWARE;
> > attr.config = PERF_COUNT_SW_TASK_CLOCK;
> >
> > - fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
> > + fd = sys_perf_event_open(&attr, 0, -1, -1,
> > + perf_event_open_cloexec_flag());
> >
> > if (fd < 0)
> > pr_err("Error: sys_perf_event_open() syscall returned "
> > diff --git a/tools/perf/tests/bp_signal.c b/tools/perf/tests/bp_signal.c
> > index aba095489193..fdc0d3e185f9 100644
> > --- a/tools/perf/tests/bp_signal.c
> > +++ b/tools/perf/tests/bp_signal.c
> > @@ -25,6 +25,7 @@
> > #include "tests.h"
> > #include "debug.h"
> > #include "perf.h"
> > +#include "../util/cloexec.h"
>
> #include "cloexec.h" should be enough
>

OK

> >
> > static int fd1;
> > static int fd2;
> > @@ -78,7 +79,8 @@ static int bp_event(void *fn, int setup_signal)
> > pe.exclude_kernel = 1;
> > pe.exclude_hv = 1;
> >
> > - fd = sys_perf_event_open(&pe, 0, -1, -1, 0);
> > + fd = sys_perf_event_open(&pe, 0, -1, -1,
> > + perf_event_open_cloexec_flag());
> > if (fd < 0) {
> > pr_debug("failed opening event %llx\n", pe.config);
> > return TEST_FAIL;
> > diff --git a/tools/perf/tests/bp_signal_overflow.c b/tools/perf/tests/bp_signal_overflow.c
> > index 44ac82179708..b0b17415f18c 100644
> > --- a/tools/perf/tests/bp_signal_overflow.c
> > +++ b/tools/perf/tests/bp_signal_overflow.c
> > @@ -24,6 +24,7 @@
> > #include "tests.h"
> > #include "debug.h"
> > #include "perf.h"
> > +#include "../util/cloexec.h"
>
> ditto
>

OK

> >
> > static int overflows;
> >
> > @@ -91,7 +92,8 @@ int test__bp_signal_overflow(void)
> > pe.exclude_kernel = 1;
> > pe.exclude_hv = 1;
> >
> > - fd = sys_perf_event_open(&pe, 0, -1, -1, 0);
> > + fd = sys_perf_event_open(&pe, 0, -1, -1,
> > + perf_event_open_cloexec_flag());
> > if (fd < 0) {
> > pr_debug("failed opening event %llx\n", pe.config);
> > return TEST_FAIL;
> > diff --git a/tools/perf/tests/rdpmc.c b/tools/perf/tests/rdpmc.c
> > index e59143fd9e71..8432bfcffed6 100644
> > --- a/tools/perf/tests/rdpmc.c
> > +++ b/tools/perf/tests/rdpmc.c
> > @@ -6,6 +6,7 @@
> > #include "perf.h"
> > #include "debug.h"
> > #include "tests.h"
> > +#include "../util/cloexec.h"
>
> ditto
>

OK

> SNIP
>
> > +
> > +static int perf_flag_probe(void)
> > +{
> > + /* use 'safest' configuration as used in perf_evsel__fallback() */
> > + struct perf_event_attr attr = {
> > + .type = PERF_COUNT_SW_CPU_CLOCK,
> > + .config = PERF_COUNT_SW_CPU_CLOCK,
> > + };
> > + int fd;
> > + int err;
> > +
> > + /* check cloexec flag */
> > + fd = sys_perf_event_open(&attr, 0, -1, -1,
> > + PERF_FLAG_FD_CLOEXEC);
> > + if (fd >= 0) {
> > + close(fd);
> > + return 1;
> > + }
> > +
> > + if (errno != EINVAL) {
> > + err = errno;
> > + pr_warning("sys_perf_event_open() syscall returned "
> > + "%d (%d: %s)\n", fd, err, strerror(err));
> > + }
> > +
> > + /* not supported, confirm error related to PERF_FLAG_FD_CLOEXEC */
> > + fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
> > + if (fd >= 0) {
> > + close(fd);
> > + return 0;
> > + }
> > +
> > + err = errno;
> > + die("sys_perf_event_open() syscall returned "
> > + "%d (%d: %s)\n", fd, err, strerror(err));
>
> I think we have a strategy of not using die in new code..
> please just use WARN_ONCE.. and let the code fail in the
> standard way
>

OK


> SNIP
>
> >
> > #define FD(e, x, y) (*(int *)xyarray__entry(e->fd, x, y))
> > @@ -988,7 +989,7 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
> > struct thread_map *threads)
> > {
> > int cpu, thread;
> > - unsigned long flags = 0;
> > + unsigned long flags = PERF_FLAG_FD_CLOEXEC;
> > int pid = -1, err;
> > enum { NO_CHANGE, SET_TO_MAX, INCREASED_MAX } set_rlimit = NO_CHANGE;
> >
> > @@ -997,11 +998,13 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
> > return -ENOMEM;
> >
> > if (evsel->cgrp) {
> > - flags = PERF_FLAG_PID_CGROUP;
> > + flags |= PERF_FLAG_PID_CGROUP;
> > pid = evsel->cgrp->fd;
> > }
> >
> > fallback_missing_features:
> > + if (perf_missing_features.cloexec)
> > + flags &= ~(unsigned long)PERF_FLAG_FD_CLOEXEC;
> > if (perf_missing_features.mmap2)
> > evsel->attr.mmap2 = 0;
> > if (perf_missing_features.exclude_guest)
> > @@ -1070,7 +1073,10 @@ try_fallback:
> > if (err != -EINVAL || cpu > 0 || thread > 0)
> > goto out_close;
> >
> > - if (!perf_missing_features.mmap2 && evsel->attr.mmap2) {
> > + if (!perf_missing_features.cloexec && (flags & PERF_FLAG_FD_CLOEXEC)) {
> > + perf_missing_features.cloexec = true;
> > + goto fallback_missing_features;
>
> I think it does not fit in here, because we check latest perf_event_attr
> bits and removing one after another, to see if miracle happens.
> (also looks like we miss exclude_callchain_(kernel|user) bits)
>
> Wouldn't it be better just to use perf_event_open_cloexec_flag() function
> as you did in the rest of the code? I think we dont need 2 detection codes
> for this.
>

I was asked by ACME to use here the missing feature detection pattern
and to treat the flag just as a new feature like perf_event_attr new
attributes.

See http://lkml.kernel.org/r/[email protected]
http://lkml.kernel.org/r/[email protected]

> > + } else if (!perf_missing_features.mmap2 && evsel->attr.mmap2) {
> > perf_missing_features.mmap2 = true;
> > goto fallback_missing_features;
> > } else if (!perf_missing_features.exclude_guest &&
> > diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
> > index 049e0a09ccd3..9a8d622809f6 100644
> > --- a/tools/perf/util/record.c
> > +++ b/tools/perf/util/record.c
> > @@ -4,6 +4,7 @@
> > #include "parse-events.h"
> > #include <api/fs/fs.h>
> > #include "util.h"
> > +#include "cloexec.h"
> >
> > typedef void (*setup_probe_fn_t)(struct perf_evsel *evsel);
> >
> > @@ -11,6 +12,7 @@ static int perf_do_probe_api(setup_probe_fn_t fn, int cpu, const char *str)
> > {
> > struct perf_evlist *evlist;
> > struct perf_evsel *evsel;
> > + unsigned long flags = perf_event_open_cloexec_flag();
> > int err = -EAGAIN, fd;
> >
> > evlist = perf_evlist__new();
> > @@ -22,14 +24,14 @@ static int perf_do_probe_api(setup_probe_fn_t fn, int cpu, const char *str)
> >
> > evsel = perf_evlist__first(evlist);
> >
> > - fd = sys_perf_event_open(&evsel->attr, -1, cpu, -1, 0);
> > + fd = sys_perf_event_open(&evsel->attr, -1, cpu, -1, flags);
> > if (fd < 0)
> > goto out_delete;
> > close(fd);
> >
> > fn(evsel);
> >
> > - fd = sys_perf_event_open(&evsel->attr, -1, cpu, -1, 0);
> > + fd = sys_perf_event_open(&evsel->attr, -1, cpu, -1, flags);
> > if (fd < 0) {
> > if (errno == EINVAL)
> > err = -EINVAL;
> > @@ -203,7 +205,8 @@ bool perf_evlist__can_select_event(struct perf_evlist *evlist, const char *str)
> > cpu = evlist->cpus->map[0];
> > }
>
> hum... no one calls this exported function ^^^ :-) I wonder why it's there.. cool
>

Nice catch. This function was introduced by commit
c09ec622629eeb4b7877646a42852e7156363425 ('perf evlist: Add
can_select_event() method') by Adrian Hunter <[email protected]>
and it seems it was never used by perf tools.
Is it available to some outside tree perf tools ?


Thanks for the review.

Regards

--
Yann Droneaud
OPTEYA

2014-06-03 09:24:06

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCHv5] perf tools: enable close-on-exec flag on perf file descriptor

On 06/03/2014 11:57 AM, Yann Droneaud wrote:
> Nice catch. This function was introduced by commit
> c09ec622629eeb4b7877646a42852e7156363425 ('perf evlist: Add
> can_select_event() method') by Adrian Hunter <[email protected]>
> and it seems it was never used by perf tools.

It is part of preparation for Intel PT.

2014-06-03 11:52:25

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCHv5] perf tools: enable close-on-exec flag on perf file descriptor

On Tue, Jun 03, 2014 at 10:57:15AM +0200, Yann Droneaud wrote:
> Le lundi 02 juin 2014 ? 21:23 +0200, Jiri Olsa a ?crit :
> > On Mon, Jun 02, 2014 at 12:56:34PM +0200, Yann Droneaud wrote:
> >
> > SNIP
> >
> > >
> > > Hi,
> > >
> > > Quite the same patch from v4. I'm interested in some
> > > feedback so that I could improve the patch if needed.
> > >
> > > Regards.
> > >
> > > Changes from v4 [1]:
> > > - rebase on next-20140530 and update commit message.
> >
> > hi,
> > I wasn't following on this one before.. so please shut me up
> > if my comments were already discussed ;-)
> >
>
> You had a chance to review it before :)
> See http://lkml.kernel.org/r/[email protected]

yep, but then I drift away ;-)

SNIP

> > >
> > > #define FD(e, x, y) (*(int *)xyarray__entry(e->fd, x, y))
> > > @@ -988,7 +989,7 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
> > > struct thread_map *threads)
> > > {
> > > int cpu, thread;
> > > - unsigned long flags = 0;
> > > + unsigned long flags = PERF_FLAG_FD_CLOEXEC;
> > > int pid = -1, err;
> > > enum { NO_CHANGE, SET_TO_MAX, INCREASED_MAX } set_rlimit = NO_CHANGE;
> > >
> > > @@ -997,11 +998,13 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
> > > return -ENOMEM;
> > >
> > > if (evsel->cgrp) {
> > > - flags = PERF_FLAG_PID_CGROUP;
> > > + flags |= PERF_FLAG_PID_CGROUP;
> > > pid = evsel->cgrp->fd;
> > > }
> > >
> > > fallback_missing_features:
> > > + if (perf_missing_features.cloexec)
> > > + flags &= ~(unsigned long)PERF_FLAG_FD_CLOEXEC;
> > > if (perf_missing_features.mmap2)
> > > evsel->attr.mmap2 = 0;
> > > if (perf_missing_features.exclude_guest)
> > > @@ -1070,7 +1073,10 @@ try_fallback:
> > > if (err != -EINVAL || cpu > 0 || thread > 0)
> > > goto out_close;
> > >
> > > - if (!perf_missing_features.mmap2 && evsel->attr.mmap2) {
> > > + if (!perf_missing_features.cloexec && (flags & PERF_FLAG_FD_CLOEXEC)) {
> > > + perf_missing_features.cloexec = true;
> > > + goto fallback_missing_features;
> >
> > I think it does not fit in here, because we check latest perf_event_attr
> > bits and removing one after another, to see if miracle happens.
> > (also looks like we miss exclude_callchain_(kernel|user) bits)
> >
> > Wouldn't it be better just to use perf_event_open_cloexec_flag() function
> > as you did in the rest of the code? I think we dont need 2 detection codes
> > for this.
> >
>
> I was asked by ACME to use here the missing feature detection pattern
> and to treat the flag just as a new feature like perf_event_attr new
> attributes.
>
> See http://lkml.kernel.org/r/[email protected]
> http://lkml.kernel.org/r/[email protected]

ook, I probably missed that.. anyway I still feel like the perf_event_open_cloexec_flag
function should be enough.. I'll leave this one to Arnadlo

thanks,
jirka

2014-06-30 20:32:05

by Yann Droneaud

[permalink] [raw]
Subject: [PATCHv6] perf tools: enable close-on-exec flag on perf file descriptor

In commit a21b0b354d4a ('perf: Introduce a flag to enable
close-on-exec in perf_event_open()'), flag PERF_FLAG_FD_CLOEXEC
was added to perf_event_open(2) syscall to allows userspace
to atomically enable close-on-exec behavor when creating
the file descriptor.

This patch makes perf tools use the new flag if supported
by the kernel, so that the event file descriptors got
automatically closed if perf tool exec a sub-command.

Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Yann Droneaud <[email protected]>
---

Hi,

I've addressed the issues reported by Jiri Olsa,
please have a look at the update patch.

Regards.

Changes from v5 [1]
- replace "../util/cloexec.h" by "cloexec.h"
- replace die() by WARN_ONCE()
- replace pr_warning() by WARN_ONCE()
- rework warning messages

Changes from v4 [2]:
- rebase on next-20140530 and update commit message.

Changes from v3 [3]:
- rebase on next-20140307 and resend asis.

Changes from v2 [4]:
- use safest perf attr configuration in probe function,
as used in perf_evsel__fallback().

Changes from v1 [5]:
- don't probe PERF_FLAG_FD_CLOEXEC for each call to
perf_event_open_cloexec_flag(): don't forget to set
'probed' variable once flag was probed.
- call perf_event_open_cloexec_flag() only once in
util/record.c:perf_do_probe_api().
- fixed minor coding style issue (unneeded braces)
in util/cloexec.c

Changes from v0 [6]:
- add probing for PERF_FLAG_FD_CLOEXEC flag allowing
perf to run on older kernel:
* use "missing feature" pattern in evsel to disable
PERF_FLAG_FD_CLOEXEC in perf_evsel__open() if not
supported by kernel;
* add a dedicated function to probe for
PERF_FLAG_FD_CLOEXEC support in the current kernel.
This function is to be used by other functions
calling sys_perf_event_open() directly.
- remove PERF_FLAG_FD_CLOEXEC from PowerPC selftest
as it's not related to perf tool: it should be part
of a separate patch.

[1] http://lkml.kernel.org/r/[email protected]

[2] http://lkml.kernel.org/r/[email protected]
https://patchwork.kernel.org/patch/3809711/

[3] http://lkml.kernel.org/r/[email protected]
https://patchwork.kernel.org/patch/3540091/

[4] http://lkml.kernel.org/r/[email protected]
https://patchwork.kernel.org/patch/3474141/

[5] http://lkml.kernel.org/r/[email protected]
https://patchwork.kernel.org/patch/3469571/

[6] http://lkml.kernel.org/r/[email protected]
https://patchwork.kernel.org/patch/3437111/

tools/perf/Makefile.perf | 1 +
tools/perf/bench/mem-memcpy.c | 4 ++-
tools/perf/bench/mem-memset.c | 4 ++-
tools/perf/builtin-sched.c | 4 ++-
tools/perf/tests/bp_signal.c | 4 ++-
tools/perf/tests/bp_signal_overflow.c | 4 ++-
tools/perf/tests/rdpmc.c | 4 ++-
tools/perf/util/cloexec.c | 57 +++++++++++++++++++++++++++++++++++
tools/perf/util/cloexec.h | 6 ++++
tools/perf/util/evsel.c | 12 ++++++--
tools/perf/util/record.c | 9 ++++--
11 files changed, 97 insertions(+), 12 deletions(-)
create mode 100644 tools/perf/util/cloexec.c
create mode 100644 tools/perf/util/cloexec.h

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 9670a16fa577..66cde74ee851 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -373,6 +373,7 @@ LIB_OBJS += $(OUTPUT)util/stat.o
LIB_OBJS += $(OUTPUT)util/record.o
LIB_OBJS += $(OUTPUT)util/srcline.o
LIB_OBJS += $(OUTPUT)util/data.o
+LIB_OBJS += $(OUTPUT)util/cloexec.o

LIB_OBJS += $(OUTPUT)ui/setup.o
LIB_OBJS += $(OUTPUT)ui/helpline.o
diff --git a/tools/perf/bench/mem-memcpy.c b/tools/perf/bench/mem-memcpy.c
index 5ce71d3b72cf..bf5a21b848a9 100644
--- a/tools/perf/bench/mem-memcpy.c
+++ b/tools/perf/bench/mem-memcpy.c
@@ -10,6 +10,7 @@
#include "../util/util.h"
#include "../util/parse-options.h"
#include "../util/header.h"
+#include "../util/cloexec.h"
#include "bench.h"
#include "mem-memcpy-arch.h"

@@ -83,7 +84,8 @@ static struct perf_event_attr cycle_attr = {

static void init_cycle(void)
{
- cycle_fd = sys_perf_event_open(&cycle_attr, getpid(), -1, -1, 0);
+ cycle_fd = sys_perf_event_open(&cycle_attr, getpid(), -1, -1,
+ perf_event_open_cloexec_flag());

if (cycle_fd < 0 && errno == ENOSYS)
die("No CONFIG_PERF_EVENTS=y kernel support configured?\n");
diff --git a/tools/perf/bench/mem-memset.c b/tools/perf/bench/mem-memset.c
index 9af79d2b18e5..260747ea1e0e 100644
--- a/tools/perf/bench/mem-memset.c
+++ b/tools/perf/bench/mem-memset.c
@@ -10,6 +10,7 @@
#include "../util/util.h"
#include "../util/parse-options.h"
#include "../util/header.h"
+#include "../util/cloexec.h"
#include "bench.h"
#include "mem-memset-arch.h"

@@ -83,7 +84,8 @@ static struct perf_event_attr cycle_attr = {

static void init_cycle(void)
{
- cycle_fd = sys_perf_event_open(&cycle_attr, getpid(), -1, -1, 0);
+ cycle_fd = sys_perf_event_open(&cycle_attr, getpid(), -1, -1,
+ perf_event_open_cloexec_flag());

if (cycle_fd < 0 && errno == ENOSYS)
die("No CONFIG_PERF_EVENTS=y kernel support configured?\n");
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index c38d06c04775..9c686b342013 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -10,6 +10,7 @@
#include "util/header.h"
#include "util/session.h"
#include "util/tool.h"
+#include "util/cloexec.h"

#include "util/parse-options.h"
#include "util/trace-event.h"
@@ -434,7 +435,8 @@ static int self_open_counters(void)
attr.type = PERF_TYPE_SOFTWARE;
attr.config = PERF_COUNT_SW_TASK_CLOCK;

- fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
+ fd = sys_perf_event_open(&attr, 0, -1, -1,
+ perf_event_open_cloexec_flag());

if (fd < 0)
pr_err("Error: sys_perf_event_open() syscall returned "
diff --git a/tools/perf/tests/bp_signal.c b/tools/perf/tests/bp_signal.c
index aba095489193..a02b035fd5aa 100644
--- a/tools/perf/tests/bp_signal.c
+++ b/tools/perf/tests/bp_signal.c
@@ -25,6 +25,7 @@
#include "tests.h"
#include "debug.h"
#include "perf.h"
+#include "cloexec.h"

static int fd1;
static int fd2;
@@ -78,7 +79,8 @@ static int bp_event(void *fn, int setup_signal)
pe.exclude_kernel = 1;
pe.exclude_hv = 1;

- fd = sys_perf_event_open(&pe, 0, -1, -1, 0);
+ fd = sys_perf_event_open(&pe, 0, -1, -1,
+ perf_event_open_cloexec_flag());
if (fd < 0) {
pr_debug("failed opening event %llx\n", pe.config);
return TEST_FAIL;
diff --git a/tools/perf/tests/bp_signal_overflow.c b/tools/perf/tests/bp_signal_overflow.c
index 44ac82179708..e76537724491 100644
--- a/tools/perf/tests/bp_signal_overflow.c
+++ b/tools/perf/tests/bp_signal_overflow.c
@@ -24,6 +24,7 @@
#include "tests.h"
#include "debug.h"
#include "perf.h"
+#include "cloexec.h"

static int overflows;

@@ -91,7 +92,8 @@ int test__bp_signal_overflow(void)
pe.exclude_kernel = 1;
pe.exclude_hv = 1;

- fd = sys_perf_event_open(&pe, 0, -1, -1, 0);
+ fd = sys_perf_event_open(&pe, 0, -1, -1,
+ perf_event_open_cloexec_flag());
if (fd < 0) {
pr_debug("failed opening event %llx\n", pe.config);
return TEST_FAIL;
diff --git a/tools/perf/tests/rdpmc.c b/tools/perf/tests/rdpmc.c
index e59143fd9e71..c04d1f268576 100644
--- a/tools/perf/tests/rdpmc.c
+++ b/tools/perf/tests/rdpmc.c
@@ -6,6 +6,7 @@
#include "perf.h"
#include "debug.h"
#include "tests.h"
+#include "cloexec.h"

#if defined(__x86_64__) || defined(__i386__)

@@ -104,7 +105,8 @@ static int __test__rdpmc(void)
sa.sa_sigaction = segfault_handler;
sigaction(SIGSEGV, &sa, NULL);

- fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
+ fd = sys_perf_event_open(&attr, 0, -1, -1,
+ perf_event_open_cloexec_flag());
if (fd < 0) {
pr_err("Error: sys_perf_event_open() syscall returned "
"with %d (%s)\n", fd, strerror(errno));
diff --git a/tools/perf/util/cloexec.c b/tools/perf/util/cloexec.c
new file mode 100644
index 000000000000..c5d05ec17220
--- /dev/null
+++ b/tools/perf/util/cloexec.c
@@ -0,0 +1,57 @@
+#include "util.h"
+#include "../perf.h"
+#include "cloexec.h"
+#include "asm/bug.h"
+
+static unsigned long flag = PERF_FLAG_FD_CLOEXEC;
+
+static int perf_flag_probe(void)
+{
+ /* use 'safest' configuration as used in perf_evsel__fallback() */
+ struct perf_event_attr attr = {
+ .type = PERF_COUNT_SW_CPU_CLOCK,
+ .config = PERF_COUNT_SW_CPU_CLOCK,
+ };
+ int fd;
+ int err;
+
+ /* check cloexec flag */
+ fd = sys_perf_event_open(&attr, 0, -1, -1,
+ PERF_FLAG_FD_CLOEXEC);
+ err = errno;
+
+ if (fd >= 0) {
+ close(fd);
+ return 1;
+ }
+
+ WARN_ONCE(err != EINVAL,
+ "perf_event_open(..., PERF_FLAG_FD_CLOEXEC) failed with unexpected error %d (%s)\n",
+ err, strerror(err));
+
+ /* not supported, confirm error related to PERF_FLAG_FD_CLOEXEC */
+ fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
+ err = errno;
+
+ if (WARN_ONCE(fd < 0,
+ "perf_event_open(..., 0) failed unexpectedly with error %d (%s)\n",
+ err, strerror(err)))
+ return -1;
+
+ close(fd);
+
+ return 0;
+}
+
+unsigned long perf_event_open_cloexec_flag(void)
+{
+ static bool probed;
+
+ if (!probed) {
+ if (perf_flag_probe() <= 0)
+ flag = 0;
+ probed = true;
+ }
+
+ return flag;
+}
diff --git a/tools/perf/util/cloexec.h b/tools/perf/util/cloexec.h
new file mode 100644
index 000000000000..94a5a7d829d5
--- /dev/null
+++ b/tools/perf/util/cloexec.h
@@ -0,0 +1,6 @@
+#ifndef __PERF_CLOEXEC_H
+#define __PERF_CLOEXEC_H
+
+unsigned long perf_event_open_cloexec_flag(void);
+
+#endif /* __PERF_CLOEXEC_H */
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 8606175fe1e8..8623840a6e4a 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -29,6 +29,7 @@ static struct {
bool sample_id_all;
bool exclude_guest;
bool mmap2;
+ bool cloexec;
} perf_missing_features;

#define FD(e, x, y) (*(int *)xyarray__entry(e->fd, x, y))
@@ -989,7 +990,7 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
struct thread_map *threads)
{
int cpu, thread;
- unsigned long flags = 0;
+ unsigned long flags = PERF_FLAG_FD_CLOEXEC;
int pid = -1, err;
enum { NO_CHANGE, SET_TO_MAX, INCREASED_MAX } set_rlimit = NO_CHANGE;

@@ -998,11 +999,13 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
return -ENOMEM;

if (evsel->cgrp) {
- flags = PERF_FLAG_PID_CGROUP;
+ flags |= PERF_FLAG_PID_CGROUP;
pid = evsel->cgrp->fd;
}

fallback_missing_features:
+ if (perf_missing_features.cloexec)
+ flags &= ~(unsigned long)PERF_FLAG_FD_CLOEXEC;
if (perf_missing_features.mmap2)
evsel->attr.mmap2 = 0;
if (perf_missing_features.exclude_guest)
@@ -1071,7 +1074,10 @@ try_fallback:
if (err != -EINVAL || cpu > 0 || thread > 0)
goto out_close;

- if (!perf_missing_features.mmap2 && evsel->attr.mmap2) {
+ if (!perf_missing_features.cloexec && (flags & PERF_FLAG_FD_CLOEXEC)) {
+ perf_missing_features.cloexec = true;
+ goto fallback_missing_features;
+ } else if (!perf_missing_features.mmap2 && evsel->attr.mmap2) {
perf_missing_features.mmap2 = true;
goto fallback_missing_features;
} else if (!perf_missing_features.exclude_guest &&
diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
index 049e0a09ccd3..9a8d622809f6 100644
--- a/tools/perf/util/record.c
+++ b/tools/perf/util/record.c
@@ -4,6 +4,7 @@
#include "parse-events.h"
#include <api/fs/fs.h>
#include "util.h"
+#include "cloexec.h"

typedef void (*setup_probe_fn_t)(struct perf_evsel *evsel);

@@ -11,6 +12,7 @@ static int perf_do_probe_api(setup_probe_fn_t fn, int cpu, const char *str)
{
struct perf_evlist *evlist;
struct perf_evsel *evsel;
+ unsigned long flags = perf_event_open_cloexec_flag();
int err = -EAGAIN, fd;

evlist = perf_evlist__new();
@@ -22,14 +24,14 @@ static int perf_do_probe_api(setup_probe_fn_t fn, int cpu, const char *str)

evsel = perf_evlist__first(evlist);

- fd = sys_perf_event_open(&evsel->attr, -1, cpu, -1, 0);
+ fd = sys_perf_event_open(&evsel->attr, -1, cpu, -1, flags);
if (fd < 0)
goto out_delete;
close(fd);

fn(evsel);

- fd = sys_perf_event_open(&evsel->attr, -1, cpu, -1, 0);
+ fd = sys_perf_event_open(&evsel->attr, -1, cpu, -1, flags);
if (fd < 0) {
if (errno == EINVAL)
err = -EINVAL;
@@ -203,7 +205,8 @@ bool perf_evlist__can_select_event(struct perf_evlist *evlist, const char *str)
cpu = evlist->cpus->map[0];
}

- fd = sys_perf_event_open(&evsel->attr, -1, cpu, -1, 0);
+ fd = sys_perf_event_open(&evsel->attr, -1, cpu, -1,
+ perf_event_open_cloexec_flag());
if (fd >= 0) {
close(fd);
ret = true;
--
1.9.3

2014-07-12 23:30:29

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCHv6] perf tools: enable close-on-exec flag on perf file descriptor

On Mon, Jun 30, 2014 at 10:28:47PM +0200, Yann Droneaud wrote:
> In commit a21b0b354d4a ('perf: Introduce a flag to enable
> close-on-exec in perf_event_open()'), flag PERF_FLAG_FD_CLOEXEC
> was added to perf_event_open(2) syscall to allows userspace
> to atomically enable close-on-exec behavor when creating
> the file descriptor.
>
> This patch makes perf tools use the new flag if supported
> by the kernel, so that the event file descriptors got
> automatically closed if perf tool exec a sub-command.
>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Jiri Olsa <[email protected]>
> Cc: Andi Kleen <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Signed-off-by: Yann Droneaud <[email protected]>
> ---
>
> Hi,
>
> I've addressed the issues reported by Jiri Olsa,
> please have a look at the update patch.

hi,
I'll queue this one

thanks,
jirka