2012-10-20 14:36:14

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 00/11] perf tool: Add PERF_SAMPLE_READ sample read support

hi,
adding support to read sample values through the PERF_SAMPLE_READ
sample type. It's now possible to specify 'S' modifier for an event
and get its sample value by PERF_SAMPLE_READ.

For group the 'S' modifier will enable sampling only for the leader
and read all the group member by PERF_SAMPLE_READ smple type with
PERF_FORMAT_GROUP read format.

This patchset is based on group report patches by Namhyung Kim:
http://lwn.net/Articles/518569/

Example:
(making sample on cycles, reading both cycles and cache-misses
by PERF_SAMPLE_READ/PERF_FORMAT_GROUP)

# ./perf record -e '{cycles,cache-misses}:S' ls
...

# ./perf report --group --show-total-period --stdio
# ========
# captured on: Sat Oct 20 16:53:39 2012
...
# group: {cycles,cache-misses}
# ========
#
# Samples: 86 of event 'anon group { cycles, cache-misses }'
# Event count (approx.): 34863674
#
# Overhead Period Command Shared Object Symbol
# ................ ........................ ....... ................. ................................
#
16.56% 19.47% 5773450 475 ls [kernel.kallsyms] [k] native_sched_clock
10.87% 0.74% 3789088 18 ls [kernel.kallsyms] [k] rtl8169_interrupt
9.82% 15.86% 3423364 387 ls [kernel.kallsyms] [k] mark_lock
8.43% 17.75% 2938384 433 ls ld-2.14.90.so [.] do_lookup_x
6.79% 20.86% 2365622 509 ls ls [.] calculate_columns
6.36% 0.61% 2216808 15 ls [kernel.kallsyms] [k] lock_release
...


Attached patches:
01/11 perf: Add PERF_EVENT_IOC_ID ioctl to return event ID
02/11 perf: Do not get values from disabled counters in group format read
03/11 perf tool: Use PERF_EVENT_IOC_ID perf ioctl to read event id
04/11 perf tool: Add support for parsing PERF_SAMPLE_READ sample type
05/11 perf tool: Fix event ID retrieval for group format read case
06/11 perf tool: Add perf_evlist__id2sid function to get event ID related data
07/11 perf tool: Add PERF_SAMPLE_READ sample related processing
08/11 perf tool: Add 'S' event/group modifier to read sample value
09/11 perf test: Add parse events tests for leader sampling
10/11 perf tool: Display period values for all group members
11/11 perf record: Fix mmap error output condition


wbr,
jirka

Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Namhyung Kim <[email protected]>
---
include/linux/perf_event.h | 1 +
kernel/events/core.c | 27 ++++++++++++++++++---
tools/perf/Documentation/perf-list.txt | 1 +
tools/perf/builtin-record.c | 3 ++-
tools/perf/ui/hist.c | 46 +++++++++++++++++++++++++++++++----
tools/perf/util/event.h | 18 ++++++++++++++
tools/perf/util/evlist.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++------------
tools/perf/util/evlist.h | 4 ++++
tools/perf/util/evsel.c | 54 +++++++++++++++++++++++++++++++++++++++--
tools/perf/util/evsel.h | 4 ++++
tools/perf/util/parse-events-test.c | 117 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/parse-events.c | 6 +++++
tools/perf/util/parse-events.l | 2 +-
tools/perf/util/session.c | 110 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
14 files changed, 448 insertions(+), 27 deletions(-)


2012-10-20 14:33:45

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 03/11] perf tool: Use PERF_EVENT_IOC_ID perf ioctl to read event id

Changing the way we retrieve the event ID. Instead of parsing out
the ID out of the read data, using the PERF_EVENT_IOC_ID ioctl.

Keeping the old way in place to support kernels without
PERF_EVENT_IOC_ID ioctl support.

Signed-off-by: Jiri Olsa <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Namhyung Kim <[email protected]>
---
tools/perf/util/evlist.c | 34 ++++++++++++++++++++++++----------
1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 561bdfb..ef13aab 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -290,19 +290,33 @@ static int perf_evlist__id_add_fd(struct perf_evlist *evlist,
struct perf_evsel *evsel,
int cpu, int thread, int fd)
{
- u64 read_data[4] = { 0, };
- int id_idx = 1; /* The first entry is the counter value */
+ u64 id;
+ int ret;

- if (!(evsel->attr.read_format & PERF_FORMAT_ID) ||
- read(fd, &read_data, sizeof(read_data)) == -1)
- return -1;
+ ret = ioctl(fd, PERF_EVENT_IOC_ID, &id);
+ if (!ret)
+ goto add;
+
+ /* Legacy way to get event id.. All hail to old kernels! */
+ if (errno == ENOTTY) {
+ u64 read_data[4] = { 0, };
+ int id_idx = 1; /* The first entry is the counter value */
+
+ if (!(evsel->attr.read_format & PERF_FORMAT_ID) ||
+ read(fd, &read_data, sizeof(read_data)) == -1)
+ return -1;

- if (evsel->attr.read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
- ++id_idx;
- if (evsel->attr.read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
- ++id_idx;
+ if (evsel->attr.read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
+ ++id_idx;
+ if (evsel->attr.read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
+ ++id_idx;
+
+ id = read_data[id_idx];
+ } else if (errno)
+ return -1;

- perf_evlist__id_add(evlist, evsel, cpu, thread, read_data[id_idx]);
+ add:
+ perf_evlist__id_add(evlist, evsel, cpu, thread, id);
return 0;
}

--
1.7.11.4

2012-10-20 14:33:53

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 04/11] perf tool: Add support for parsing PERF_SAMPLE_READ sample type

Adding support to parse out the PERF_SAMPLE_READ sample bits.
The code contains both single and group format specification.

This code parse out and prepare prepare PERF_SAMPLE_READ data
into the perf_sample struct. It will be used for group leader
sampling feature comming in shortly.

Signed-off-by: Jiri Olsa <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Namhyung Kim <[email protected]>
---
tools/perf/util/event.h | 18 ++++++++++++++++++
tools/perf/util/evlist.c | 20 ++++++++++++++++++++
tools/perf/util/evlist.h | 2 ++
tools/perf/util/evsel.c | 29 +++++++++++++++++++++++++++--
tools/perf/util/session.c | 38 ++++++++++++++++++++++++++++++++++++++
5 files changed, 105 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index da97aff..5948d7a 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -79,6 +79,23 @@ struct stack_dump {
char *data;
};

+struct sample_read_value {
+ u64 value;
+ u64 id;
+};
+
+struct sample_read {
+ u64 time_enabled;
+ u64 time_running;
+ union {
+ struct {
+ u64 nr;
+ struct sample_read_value *values;
+ } group;
+ struct sample_read_value one;
+ };
+};
+
struct perf_sample {
u64 ip;
u32 pid, tid;
@@ -94,6 +111,7 @@ struct perf_sample {
struct branch_stack *branch_stack;
struct regs_dump user_regs;
struct stack_dump user_stack;
+ struct sample_read read;
};

#define BUILD_ID_SIZE 20
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index ef13aab..b1c8f35 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -668,6 +668,26 @@ u64 perf_evlist__sample_type(struct perf_evlist *evlist)
return first->attr.sample_type;
}

+bool perf_evlist__valid_read_format(const struct perf_evlist *evlist)
+{
+ struct perf_evsel *pos, *first;
+
+ pos = first = list_entry(evlist->entries.next, struct perf_evsel, node);
+
+ list_for_each_entry_continue(pos, &evlist->entries, node) {
+ if (first->attr.read_format != pos->attr.read_format)
+ return false;
+ }
+
+ return true;
+}
+
+u64 perf_evlist__read_format(struct perf_evlist *evlist)
+{
+ struct perf_evsel *first = perf_evlist__first(evlist);
+ return first->attr.read_format;
+}
+
u16 perf_evlist__id_hdr_size(struct perf_evlist *evlist)
{
struct perf_evsel *first = perf_evlist__first(evlist);
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index fa76bf9..a6c74bd 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -111,6 +111,7 @@ int perf_evlist__apply_filters(struct perf_evlist *evlist);
void __perf_evlist__set_leader(struct list_head *list);
void perf_evlist__set_leader(struct perf_evlist *evlist);

+u64 perf_evlist__read_format(struct perf_evlist *evlist);
u64 perf_evlist__sample_type(struct perf_evlist *evlist);
bool perf_evlist__sample_id_all(struct perf_evlist *evlist);
u16 perf_evlist__id_hdr_size(struct perf_evlist *evlist);
@@ -120,6 +121,7 @@ int perf_evlist__parse_sample(struct perf_evlist *evlist, union perf_event *even

bool perf_evlist__valid_sample_type(struct perf_evlist *evlist);
bool perf_evlist__valid_sample_id_all(struct perf_evlist *evlist);
+bool perf_evlist__valid_read_format(const struct perf_evlist *evlist);

void perf_evlist__splice_list_tail(struct perf_evlist *evlist,
struct list_head *list,
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 919b4d2..f37bec0 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -967,8 +967,33 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
}

if (type & PERF_SAMPLE_READ) {
- fprintf(stderr, "PERF_SAMPLE_READ is unsupported for now\n");
- return -1;
+ u64 read_format = evsel->attr.read_format;
+
+ if (read_format & PERF_FORMAT_GROUP)
+ data->read.group.nr = *array;
+ else
+ data->read.one.value = *array;
+
+ array++;
+
+ if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED) {
+ data->read.time_enabled = *array;
+ array++;
+ }
+
+ if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING) {
+ data->read.time_running = *array;
+ array++;
+ }
+
+ if (read_format & PERF_FORMAT_GROUP) {
+ data->read.group.values = (struct sample_read_value *) array;
+ array = (void *) array + data->read.group.nr *
+ sizeof(struct sample_read_value);
+ } else {
+ data->read.one.id = *array;
+ array++;
+ }
}

if (type & PERF_SAMPLE_CALLCHAIN) {
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 15abe40..ceb3fe2 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -74,6 +74,11 @@ static int perf_session__open(struct perf_session *self, bool force)
goto out_close;
}

+ if (!perf_evlist__valid_read_format(self->evlist)) {
+ pr_err("non matching read_format");
+ goto out_close;
+ }
+
self->size = input_stat.st_size;
return 0;

@@ -963,6 +968,36 @@ static void perf_session__print_tstamp(struct perf_session *session,
printf("%" PRIu64 " ", sample->time);
}

+static void sample_read__printf(struct perf_sample *sample, u64 read_format)
+{
+ printf("... sample_read:\n");
+
+ if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
+ printf("...... time enabled %016" PRIx64 "\n",
+ sample->read.time_enabled);
+
+ if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
+ printf("...... time running %016" PRIx64 "\n",
+ sample->read.time_running);
+
+ if (read_format & PERF_FORMAT_GROUP) {
+ u64 i;
+
+ printf(".... group nr %" PRIu64 "\n", sample->read.group.nr);
+
+ for (i = 0; i < sample->read.group.nr; i++) {
+ struct sample_read_value *value;
+
+ value = &sample->read.group.values[i];
+ printf("..... id %016" PRIx64
+ ", value %016" PRIx64 "\n",
+ value->id, value->value);
+ }
+ } else
+ printf("..... id %016" PRIx64 ", value %016" PRIx64 "\n",
+ sample->read.one.id, sample->read.one.value);
+}
+
static void dump_event(struct perf_session *session, union perf_event *event,
u64 file_offset, struct perf_sample *sample)
{
@@ -1006,6 +1041,9 @@ static void dump_sample(struct perf_evsel *evsel, union perf_event *event,

if (sample_type & PERF_SAMPLE_STACK_USER)
stack_user__printf(&sample->user_stack);
+
+ if (sample_type & PERF_SAMPLE_READ)
+ sample_read__printf(sample, evsel->attr.read_format);
}

static struct machine *
--
1.7.11.4

2012-10-20 14:34:00

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 09/11] perf test: Add parse events tests for leader sampling

Adding 2 more tests to the automated parse events suite
for following event config:

'{cycles,cache-misses,branch-misses}:S'
'{instructions,branch-misses}:Su'

Signed-off-by: Jiri Olsa <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Namhyung Kim <[email protected]>
---
tools/perf/util/parse-events-test.c | 117 ++++++++++++++++++++++++++++++++++++
1 file changed, 117 insertions(+)

diff --git a/tools/perf/util/parse-events-test.c b/tools/perf/util/parse-events-test.c
index d7244e5..903ba77 100644
--- a/tools/perf/util/parse-events-test.c
+++ b/tools/perf/util/parse-events-test.c
@@ -504,6 +504,7 @@ static int test__group1(struct perf_evlist *evlist)
TEST_ASSERT_VAL("wrong exclude host", !evsel->attr.exclude_host);
TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
TEST_ASSERT_VAL("wrong leader", evsel->leader == NULL);
+ TEST_ASSERT_VAL("wrong sample_read", !evsel->sample_read);

/* cycles:upp */
evsel = perf_evsel__next(evsel);
@@ -517,6 +518,7 @@ static int test__group1(struct perf_evlist *evlist)
TEST_ASSERT_VAL("wrong exclude host", !evsel->attr.exclude_host);
TEST_ASSERT_VAL("wrong precise_ip", evsel->attr.precise_ip == 2);
TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+ TEST_ASSERT_VAL("wrong sample_read", !evsel->sample_read);

return 0;
}
@@ -539,6 +541,7 @@ static int test__group2(struct perf_evlist *evlist)
TEST_ASSERT_VAL("wrong exclude host", !evsel->attr.exclude_host);
TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
TEST_ASSERT_VAL("wrong leader", evsel->leader == NULL);
+ TEST_ASSERT_VAL("wrong sample_read", !evsel->sample_read);

/* cache-references + :u modifier */
evsel = perf_evsel__next(evsel);
@@ -552,6 +555,7 @@ static int test__group2(struct perf_evlist *evlist)
TEST_ASSERT_VAL("wrong exclude host", !evsel->attr.exclude_host);
TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+ TEST_ASSERT_VAL("wrong sample_read", !evsel->sample_read);

/* cycles:k */
evsel = perf_evsel__next(evsel);
@@ -565,6 +569,7 @@ static int test__group2(struct perf_evlist *evlist)
TEST_ASSERT_VAL("wrong exclude host", !evsel->attr.exclude_host);
TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
TEST_ASSERT_VAL("wrong leader", evsel->leader == NULL);
+ TEST_ASSERT_VAL("wrong sample_read", !evsel->sample_read);

return 0;
}
@@ -590,6 +595,7 @@ static int test__group3(struct perf_evlist *evlist __maybe_unused)
TEST_ASSERT_VAL("wrong leader", evsel->leader == NULL);
TEST_ASSERT_VAL("wrong group name",
!strcmp(leader->group_name, "group1"));
+ TEST_ASSERT_VAL("wrong sample_read", !evsel->sample_read);

/* group1 cycles:kppp */
evsel = perf_evsel__next(evsel);
@@ -604,6 +610,7 @@ static int test__group3(struct perf_evlist *evlist __maybe_unused)
TEST_ASSERT_VAL("wrong precise_ip", evsel->attr.precise_ip == 3);
TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
TEST_ASSERT_VAL("wrong group name", !evsel->group_name);
+ TEST_ASSERT_VAL("wrong sample_read", !evsel->sample_read);

/* group2 cycles + G modifier */
evsel = leader = perf_evsel__next(evsel);
@@ -619,6 +626,7 @@ static int test__group3(struct perf_evlist *evlist __maybe_unused)
TEST_ASSERT_VAL("wrong leader", evsel->leader == NULL);
TEST_ASSERT_VAL("wrong group name",
!strcmp(leader->group_name, "group2"));
+ TEST_ASSERT_VAL("wrong sample_read", !evsel->sample_read);

/* group2 1:3 + G modifier */
evsel = perf_evsel__next(evsel);
@@ -631,6 +639,7 @@ static int test__group3(struct perf_evlist *evlist __maybe_unused)
TEST_ASSERT_VAL("wrong exclude host", evsel->attr.exclude_host);
TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+ TEST_ASSERT_VAL("wrong sample_read", !evsel->sample_read);

/* instructions:u */
evsel = perf_evsel__next(evsel);
@@ -644,6 +653,7 @@ static int test__group3(struct perf_evlist *evlist __maybe_unused)
TEST_ASSERT_VAL("wrong exclude host", !evsel->attr.exclude_host);
TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
TEST_ASSERT_VAL("wrong leader", evsel->leader == NULL);
+ TEST_ASSERT_VAL("wrong sample_read", !evsel->sample_read);

return 0;
}
@@ -667,6 +677,7 @@ static int test__group4(struct perf_evlist *evlist __maybe_unused)
TEST_ASSERT_VAL("wrong precise_ip", evsel->attr.precise_ip == 1);
TEST_ASSERT_VAL("wrong group name", !evsel->group_name);
TEST_ASSERT_VAL("wrong leader", evsel->leader == NULL);
+ TEST_ASSERT_VAL("wrong sample_read", !evsel->sample_read);

/* instructions:kp + p */
evsel = perf_evsel__next(evsel);
@@ -680,6 +691,7 @@ static int test__group4(struct perf_evlist *evlist __maybe_unused)
TEST_ASSERT_VAL("wrong exclude host", !evsel->attr.exclude_host);
TEST_ASSERT_VAL("wrong precise_ip", evsel->attr.precise_ip == 2);
TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+ TEST_ASSERT_VAL("wrong sample_read", !evsel->sample_read);

return 0;
}
@@ -703,6 +715,7 @@ static int test__group5(struct perf_evlist *evlist __maybe_unused)
TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
TEST_ASSERT_VAL("wrong group name", !evsel->group_name);
TEST_ASSERT_VAL("wrong leader", evsel->leader == NULL);
+ TEST_ASSERT_VAL("wrong sample_read", !evsel->sample_read);

/* instructions + G */
evsel = perf_evsel__next(evsel);
@@ -716,6 +729,7 @@ static int test__group5(struct perf_evlist *evlist __maybe_unused)
TEST_ASSERT_VAL("wrong exclude host", evsel->attr.exclude_host);
TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+ TEST_ASSERT_VAL("wrong sample_read", !evsel->sample_read);

/* cycles:G */
evsel = leader = perf_evsel__next(evsel);
@@ -730,6 +744,7 @@ static int test__group5(struct perf_evlist *evlist __maybe_unused)
TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
TEST_ASSERT_VAL("wrong group name", !evsel->group_name);
TEST_ASSERT_VAL("wrong leader", evsel->leader == NULL);
+ TEST_ASSERT_VAL("wrong sample_read", !evsel->sample_read);

/* instructions:G */
evsel = perf_evsel__next(evsel);
@@ -743,6 +758,7 @@ static int test__group5(struct perf_evlist *evlist __maybe_unused)
TEST_ASSERT_VAL("wrong exclude host", evsel->attr.exclude_host);
TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+ TEST_ASSERT_VAL("wrong sample_read", !evsel->sample_read);

/* cycles */
evsel = perf_evsel__next(evsel);
@@ -756,6 +772,99 @@ static int test__group5(struct perf_evlist *evlist __maybe_unused)
TEST_ASSERT_VAL("wrong exclude host", !evsel->attr.exclude_host);
TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
TEST_ASSERT_VAL("wrong leader", evsel->leader == NULL);
+ TEST_ASSERT_VAL("wrong sample_read", !evsel->sample_read);
+
+ return 0;
+}
+
+static int test__leader_sample1(struct perf_evlist *evlist)
+{
+ struct perf_evsel *evsel, *leader;
+
+ TEST_ASSERT_VAL("wrong number of entries", 3 == evlist->nr_entries);
+
+ /* cycles - sampling group leader */
+ evsel = leader = perf_evlist__first(evlist);
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->attr.type);
+ TEST_ASSERT_VAL("wrong config",
+ PERF_COUNT_HW_CPU_CYCLES == evsel->attr.config);
+ TEST_ASSERT_VAL("wrong exclude_user", !evsel->attr.exclude_user);
+ TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->attr.exclude_kernel);
+ TEST_ASSERT_VAL("wrong exclude_hv", !evsel->attr.exclude_hv);
+ TEST_ASSERT_VAL("wrong exclude guest", !evsel->attr.exclude_guest);
+ TEST_ASSERT_VAL("wrong exclude host", !evsel->attr.exclude_host);
+ TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
+ TEST_ASSERT_VAL("wrong group name", !evsel->group_name);
+ TEST_ASSERT_VAL("wrong leader", evsel->leader == NULL);
+ TEST_ASSERT_VAL("wrong sample_read", evsel->sample_read);
+
+ /* cache-misses - not sampling */
+ evsel = perf_evsel__next(evsel);
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->attr.type);
+ TEST_ASSERT_VAL("wrong config",
+ PERF_COUNT_HW_CACHE_MISSES == evsel->attr.config);
+ TEST_ASSERT_VAL("wrong exclude_user", !evsel->attr.exclude_user);
+ TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->attr.exclude_kernel);
+ TEST_ASSERT_VAL("wrong exclude_hv", !evsel->attr.exclude_hv);
+ TEST_ASSERT_VAL("wrong exclude guest", !evsel->attr.exclude_guest);
+ TEST_ASSERT_VAL("wrong exclude host", !evsel->attr.exclude_host);
+ TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
+ TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+ TEST_ASSERT_VAL("wrong sample_read", evsel->sample_read);
+
+ /* branch-misses - not sampling */
+ evsel = perf_evsel__next(evsel);
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->attr.type);
+ TEST_ASSERT_VAL("wrong config",
+ PERF_COUNT_HW_BRANCH_MISSES == evsel->attr.config);
+ TEST_ASSERT_VAL("wrong exclude_user", !evsel->attr.exclude_user);
+ TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->attr.exclude_kernel);
+ TEST_ASSERT_VAL("wrong exclude_hv", !evsel->attr.exclude_hv);
+ TEST_ASSERT_VAL("wrong exclude guest", !evsel->attr.exclude_guest);
+ TEST_ASSERT_VAL("wrong exclude host", !evsel->attr.exclude_host);
+ TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
+ TEST_ASSERT_VAL("wrong group name", !evsel->group_name);
+ TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+ TEST_ASSERT_VAL("wrong sample_read", evsel->sample_read);
+
+ return 0;
+}
+
+static int test__leader_sample2(struct perf_evlist *evlist __maybe_unused)
+{
+ struct perf_evsel *evsel, *leader;
+
+ TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->nr_entries);
+
+ /* cycles - sampling group leader */
+ evsel = leader = perf_evlist__first(evlist);
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->attr.type);
+ TEST_ASSERT_VAL("wrong config",
+ PERF_COUNT_HW_INSTRUCTIONS == evsel->attr.config);
+ TEST_ASSERT_VAL("wrong exclude_user", !evsel->attr.exclude_user);
+ TEST_ASSERT_VAL("wrong exclude_kernel", evsel->attr.exclude_kernel);
+ TEST_ASSERT_VAL("wrong exclude_hv", evsel->attr.exclude_hv);
+ TEST_ASSERT_VAL("wrong exclude guest", !evsel->attr.exclude_guest);
+ TEST_ASSERT_VAL("wrong exclude host", !evsel->attr.exclude_host);
+ TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
+ TEST_ASSERT_VAL("wrong group name", !evsel->group_name);
+ TEST_ASSERT_VAL("wrong leader", evsel->leader == NULL);
+ TEST_ASSERT_VAL("wrong sample_read", evsel->sample_read);
+
+ /* branch-misses - not sampling */
+ evsel = perf_evsel__next(evsel);
+ TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->attr.type);
+ TEST_ASSERT_VAL("wrong config",
+ PERF_COUNT_HW_BRANCH_MISSES == evsel->attr.config);
+ TEST_ASSERT_VAL("wrong exclude_user", !evsel->attr.exclude_user);
+ TEST_ASSERT_VAL("wrong exclude_kernel", evsel->attr.exclude_kernel);
+ TEST_ASSERT_VAL("wrong exclude_hv", evsel->attr.exclude_hv);
+ TEST_ASSERT_VAL("wrong exclude guest", !evsel->attr.exclude_guest);
+ TEST_ASSERT_VAL("wrong exclude host", !evsel->attr.exclude_host);
+ TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
+ TEST_ASSERT_VAL("wrong group name", !evsel->group_name);
+ TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+ TEST_ASSERT_VAL("wrong sample_read", evsel->sample_read);

return 0;
}
@@ -899,6 +1008,14 @@ static struct test__event_st test__events[] = {
.name = "{cycles,instructions}:G,{cycles:G,instructions:G},cycles",
.check = test__group5,
},
+ [33] = {
+ .name = "{cycles,cache-misses,branch-misses}:S",
+ .check = test__leader_sample1,
+ },
+ [34] = {
+ .name = "{instructions,branch-misses}:Su",
+ .check = test__leader_sample2,
+ },
};

static struct test__event_st test__events_pmu[] = {
--
1.7.11.4

2012-10-20 14:34:11

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 05/11] perf tool: Fix event ID retrieval for group format read case

We need to fail the event ID retrieval in case both following
conditions are true:
- we are on kernel with no PERF_EVENT_IOC_ID support
- PERF_FORMAT_GROUP read format is set

Signed-off-by: Jiri Olsa <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Namhyung Kim <[email protected]>
---
tools/perf/util/evlist.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index b1c8f35..b7c64c3 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -302,6 +302,13 @@ static int perf_evlist__id_add_fd(struct perf_evlist *evlist,
u64 read_data[4] = { 0, };
int id_idx = 1; /* The first entry is the counter value */

+ /*
+ * This way does not work with group format read, so bail
+ * out in that case.
+ */
+ if (perf_evlist__read_format(evlist) & PERF_FORMAT_GROUP)
+ return -1;
+
if (!(evsel->attr.read_format & PERF_FORMAT_ID) ||
read(fd, &read_data, sizeof(read_data)) == -1)
return -1;
--
1.7.11.4

2012-10-20 14:33:56

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 07/11] perf tool: Add PERF_SAMPLE_READ sample related processing

For sample with sample type PERF_SAMPLE_READ the period value is
stored in the 'struct sample_read'.

Moreover if the read format has PERF_FORMAT_GROUP, the 'struct
sample_read' contains period values for all events in the group
(for which the sample's event is a leader).

We deliver separated samples for all the values contained
within the 'struct sample_read'.

Signed-off-by: Jiri Olsa <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Namhyung Kim <[email protected]>
---
tools/perf/util/evsel.h | 3 ++
tools/perf/util/session.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 8d5937d..351b48e 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -36,6 +36,9 @@ struct perf_sample_id {
struct hlist_node node;
u64 id;
struct perf_evsel *evsel;
+
+ /* Holds total ID period value for PERF_SAMPLE_READ processing. */
+ u64 period;
};

/** struct perf_evsel - event selector
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index ceb3fe2..34e226c 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1068,6 +1068,75 @@ static struct machine *
return perf_session__find_host_machine(session);
}

+static int deliver_sample_value(struct perf_session *session,
+ struct perf_tool *tool,
+ union perf_event *event,
+ struct perf_sample *sample,
+ struct sample_read_value *v,
+ struct machine *machine)
+{
+ struct perf_sample_id *sid;
+
+ sid = perf_evlist__id2sid(session->evlist, v->id);
+ if (sid) {
+ sample->id = v->id;
+ sample->period = v->value - sid->period;
+ sid->period = v->value;
+ }
+
+ if (!sid || sid->evsel == NULL) {
+ ++session->hists.stats.nr_unknown_id;
+ return 0;
+ }
+
+ return tool->sample(tool, event, sample, sid->evsel, machine);
+}
+
+static int deliver_sample_group(struct perf_session *session,
+ struct perf_tool *tool,
+ union perf_event *event,
+ struct perf_sample *sample,
+ struct machine *machine)
+{
+ int ret = -EINVAL;
+ u64 i;
+
+ for (i = 0; i < sample->read.group.nr; i++) {
+ ret = deliver_sample_value(session, tool, event, sample,
+ &sample->read.group.values[i],
+ machine);
+ if (ret)
+ break;
+ }
+
+ return ret;
+}
+
+static int
+perf_session__deliver_sample(struct perf_session *session,
+ struct perf_tool *tool,
+ union perf_event *event,
+ struct perf_sample *sample,
+ struct perf_evsel *evsel,
+ struct machine *machine)
+{
+ /* We know evsel != NULL. */
+ u64 sample_type = evsel->attr.sample_type;
+ u64 read_format = evsel->attr.read_format;
+
+ /* Standard sample delievery. */
+ if (!(sample_type & PERF_SAMPLE_READ))
+ return tool->sample(tool, event, sample, evsel, machine);
+
+ /* For PERF_SAMPLE_READ we have either single or group mode. */
+ if (read_format & PERF_FORMAT_GROUP)
+ return deliver_sample_group(session, tool, event, sample,
+ machine);
+ else
+ return deliver_sample_value(session, tool, event, sample,
+ &sample->read.one, machine);
+}
+
static int perf_session_deliver_event(struct perf_session *session,
union perf_event *event,
struct perf_sample *sample,
@@ -1110,7 +1179,8 @@ static int perf_session_deliver_event(struct perf_session *session,
++session->hists.stats.nr_unprocessable_samples;
return 0;
}
- return tool->sample(tool, event, sample, evsel, machine);
+ return perf_session__deliver_sample(session, tool, event,
+ sample, evsel, machine);
case PERF_RECORD_MMAP:
return tool->mmap(tool, event, sample, machine);
case PERF_RECORD_COMM:
--
1.7.11.4

2012-10-20 14:34:06

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 11/11] perf record: Fix mmap error output condition

The mmap_pages default value is not power of 2 (UINT_MAX).

Together with perf_evlist__mmap function returning error value
different from EPERM, we get misleading error message:
"--mmap_pages/-m value must be a power of two."

Fixing this by adding extra check for UINT_MAX value for this
error condition.

Signed-off-by: Jiri Olsa <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Namhyung Kim <[email protected]>
---
tools/perf/builtin-record.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 3cf7fe2..93dc67e 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -358,7 +358,8 @@ try_again:
"or try again with a smaller value of -m/--mmap_pages.\n"
"(current value: %d)\n", opts->mmap_pages);
rc = -errno;
- } else if (!is_power_of_2(opts->mmap_pages)) {
+ } else if (!is_power_of_2(opts->mmap_pages) &&
+ (opts->mmap_pages != UINT_MAX)) {
pr_err("--mmap_pages/-m value must be a power of two.");
rc = -EINVAL;
} else {
--
1.7.11.4

2012-10-20 14:34:46

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 10/11] perf tool: Display period values for all group members

When in report group mode (perf report --group) with option
--show-total-period, display period values for all group members.

Currently only the group leader period counts are shown.

Example of new output:

$ perf report --group --show-total-period --stdio
...

# Samples: 76 of event 'anon group { cycles, cache-misses }'
# Event count (approx.): 32249096
#
# Overhead Period Command Shared Object Symbol
# ................ ........................ ....... ................. .............................
#
18.45% 11.67% 5949678 333 ls [kernel.kallsyms] [k] __lock_acquire
16.49% 11.39% 5317432 325 ls [kernel.kallsyms] [k] native_sched_clock
8.55% 15.42% 2757064 440 ls [kernel.kallsyms] [k] lock_acquire
8.31% 12.58% 2680708 359 ls [kernel.kallsyms] [k] clear_page_c

...

Signed-off-by: Jiri Olsa <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Namhyung Kim <[email protected]>
---
tools/perf/ui/hist.c | 46 ++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 42 insertions(+), 4 deletions(-)

diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index 51934a8..50e8d85 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -273,21 +273,59 @@ static int hpp__entry_samples(struct perf_hpp *hpp, struct hist_entry *he)

static int hpp__header_period(struct perf_hpp *hpp)
{
- const char *fmt = symbol_conf.field_sep ? "%s" : "%12s";
+ int len = 12;
+
+ if (symbol_conf.field_sep)
+ return scnprintf(hpp->buf, hpp->size, "%s", "Period");
+
+ if (symbol_conf.event_group) {
+ struct perf_evsel *evsel = hpp->ptr;
+
+ BUG_ON(!perf_evsel__is_group_leader(evsel));
+
+ len += evsel->nr_members * 12;
+ }

- return scnprintf(hpp->buf, hpp->size, fmt, "Period");
+ return scnprintf(hpp->buf, hpp->size, "%*s", len, "Period");
}

static int hpp__width_period(struct perf_hpp *hpp __maybe_unused)
{
- return 12;
+ int len = 12;
+
+ if (symbol_conf.event_group) {
+ struct perf_evsel *evsel = hpp->ptr;
+
+ len += evsel->nr_members * 12;
+ }
+
+ return len;
}

static int hpp__entry_period(struct perf_hpp *hpp, struct hist_entry *he)
{
const char *fmt = symbol_conf.field_sep ? "%" PRIu64 : "%12" PRIu64;
+ struct hists *hists = he->hists;
+ int ret;
+
+ ret = scnprintf(hpp->buf, hpp->size, fmt, he->stat.period);
+
+ if (symbol_conf.event_group) {
+ int i;
+ struct perf_evsel *evsel = hists_2_evsel(hists);
+
+ for (i = 0; i < evsel->nr_members; i++) {
+ u64 period = he->group_stats[i].period;

- return scnprintf(hpp->buf, hpp->size, fmt, he->stat.period);
+ if (symbol_conf.field_sep) {
+ ret += scnprintf(hpp->buf + ret,
+ hpp->size - ret, " ");
+ }
+ ret += scnprintf(hpp->buf + ret, hpp->size - ret,
+ fmt, period);
+ }
+ }
+ return ret;
}

static int hpp__header_period_baseline(struct perf_hpp *hpp)
--
1.7.11.4

2012-10-20 14:35:17

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 08/11] perf tool: Add 'S' event/group modifier to read sample value

Adding 'S' event/group modifier to specify that the event value/s
are read by PERF_SAMPLE_READ sample type processing, instead of the
period value offered by lower layers.

There's additional behaviour change for 'S' modifier being specified
on event group:

Currently all the events within a group makes samples. If user now
specifies 'S' within group modifier, only the leader will trigger
samples. The rest of events in the group will have sampling disabled.

And same as for single events, values of all events within the group
(including leader) are read by PERF_SAMPLE_READ sample type processing.

Following example will create event group with cycles and cache-misses
events, setting the cycles as group leader and the only event to actually
sample. Both cycles and cache-misses event period values are read by
PERF_SAMPLE_READ sample type processing with PERF_FORMAT_GROUP read format.

$ perf record -e '{cycles,cache-misses}:S' ls

Signed-off-by: Jiri Olsa <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Namhyung Kim <[email protected]>
---
tools/perf/Documentation/perf-list.txt | 1 +
tools/perf/util/evsel.c | 25 +++++++++++++++++++++++++
tools/perf/util/evsel.h | 1 +
tools/perf/util/parse-events.c | 6 ++++++
tools/perf/util/parse-events.l | 2 +-
5 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt
index d1e39dc..7ddef65 100644
--- a/tools/perf/Documentation/perf-list.txt
+++ b/tools/perf/Documentation/perf-list.txt
@@ -29,6 +29,7 @@ counted. The following modifiers exist:
G - guest counting (in KVM guests)
H - host counting (not in KVM guests)
p - precise level
+ S - read sample value (PERF_SAMPLE_READ)

The 'p' modifier can be used for specifying how precise the instruction
address should be. The 'p' modifier can be specified multiple times:
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index f37bec0..836a786 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -432,6 +432,7 @@ int perf_evsel__group_desc(struct perf_evsel *evsel, char *buf, size_t size)
void perf_evsel__config(struct perf_evsel *evsel, struct perf_record_opts *opts,
struct perf_evsel *first)
{
+ struct perf_evsel *leader = evsel->leader;
struct perf_event_attr *attr = &evsel->attr;
int track = !evsel->idx; /* only the first counter needs these */

@@ -442,6 +443,21 @@ void perf_evsel__config(struct perf_evsel *evsel, struct perf_record_opts *opts,
PERF_FORMAT_TOTAL_TIME_RUNNING |
PERF_FORMAT_ID;

+ if (evsel->sample_read) {
+ attr->sample_type |= PERF_SAMPLE_READ;
+
+ /*
+ * Apply group format only if:
+ * - we have a leader so there's a group and at
+ * least 2 events,
+ * - we are a leader with more than 1 event
+ */
+ if (evsel->leader || evsel->nr_members) {
+ attr->read_format |= PERF_FORMAT_GROUP;
+ attr->inherit = 0;
+ }
+ }
+
attr->sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID;

/*
@@ -459,6 +475,15 @@ void perf_evsel__config(struct perf_evsel *evsel, struct perf_record_opts *opts,
}
}

+ /*
+ * We are safe here, because we have a leader so there's at
+ * least 2 events in the group.
+ */
+ if (leader && leader->sample_read) {
+ attr->sample_freq = 0;
+ attr->sample_period = 0;
+ }
+
if (opts->no_samples)
attr->sample_freq = 0;

diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 351b48e..1e34060 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -75,6 +75,7 @@ struct perf_evsel {
bool needs_swap;
/* parse modifier helper */
int exclude_GH;
+ int sample_read;
struct perf_evsel *leader;
char *group_name;
union {
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 364e518..5a6ca8a 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -642,6 +642,7 @@ struct event_modifier {
int eG;
int precise;
int exclude_GH;
+ int sample_read;
};

static int get_event_modifier(struct event_modifier *mod, char *str,
@@ -653,6 +654,7 @@ static int get_event_modifier(struct event_modifier *mod, char *str,
int eH = evsel ? evsel->attr.exclude_host : 0;
int eG = evsel ? evsel->attr.exclude_guest : 0;
int precise = evsel ? evsel->attr.precise_ip : 0;
+ int sample_read = 0;

int exclude = eu | ek | eh;
int exclude_GH = evsel ? evsel->exclude_GH : 0;
@@ -690,6 +692,8 @@ static int get_event_modifier(struct event_modifier *mod, char *str,
eH = 0;
} else if (*str == 'p') {
precise++;
+ } else if (*str == 'S') {
+ sample_read = 1;
} else
break;

@@ -716,6 +720,7 @@ static int get_event_modifier(struct event_modifier *mod, char *str,
mod->eG = eG;
mod->precise = precise;
mod->exclude_GH = exclude_GH;
+ mod->sample_read = sample_read;
return 0;
}

@@ -742,6 +747,7 @@ int parse_events__modifier_event(struct list_head *list, char *str, bool add)
evsel->attr.exclude_host = mod.eH;
evsel->attr.exclude_guest = mod.eG;
evsel->exclude_GH = mod.exclude_GH;
+ evsel->sample_read = mod.sample_read;
}

return 0;
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index c87efc1..391aabb 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -81,7 +81,7 @@ num_dec [0-9]+
num_hex 0x[a-fA-F0-9]+
num_raw_hex [a-fA-F0-9]+
name [a-zA-Z_*?][a-zA-Z0-9_*?]*
-modifier_event [ukhpGH]{1,8}
+modifier_event [ukhpGHS]{1,9}
modifier_bp [rwx]{1,3}

%%
--
1.7.11.4

2012-10-20 14:33:52

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 06/11] perf tool: Add perf_evlist__id2sid function to get event ID related data

Adding perf_evlist__id2sid function to be able to get ID related
data. This will be helpful for PERF_FORMAT_GROUP samples where
we need to store ID related period value for each event.

Signed-off-by: Jiri Olsa <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Namhyung Kim <[email protected]>
---
tools/perf/util/evlist.c | 21 ++++++++++++++++-----
tools/perf/util/evlist.h | 2 ++
2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index b7c64c3..71064d0 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -327,22 +327,33 @@ static int perf_evlist__id_add_fd(struct perf_evlist *evlist,
return 0;
}

-struct perf_evsel *perf_evlist__id2evsel(struct perf_evlist *evlist, u64 id)
+struct perf_sample_id *perf_evlist__id2sid(struct perf_evlist *evlist, u64 id)
{
struct hlist_head *head;
struct hlist_node *pos;
struct perf_sample_id *sid;
int hash;

- if (evlist->nr_entries == 1)
- return perf_evlist__first(evlist);
-
hash = hash_64(id, PERF_EVLIST__HLIST_BITS);
head = &evlist->heads[hash];

hlist_for_each_entry(sid, pos, head, node)
if (sid->id == id)
- return sid->evsel;
+ return sid;
+
+ return NULL;
+}
+
+struct perf_evsel *perf_evlist__id2evsel(struct perf_evlist *evlist, u64 id)
+{
+ struct perf_sample_id *sid;
+
+ if (evlist->nr_entries == 1)
+ return perf_evlist__first(evlist);
+
+ sid = perf_evlist__id2sid(evlist, id);
+ if (sid)
+ return sid->evsel;

if (!perf_evlist__sample_id_all(evlist))
return perf_evlist__first(evlist);
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index a6c74bd..ae443ef 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -73,6 +73,8 @@ void perf_evlist__add_pollfd(struct perf_evlist *evlist, int fd);

struct perf_evsel *perf_evlist__id2evsel(struct perf_evlist *evlist, u64 id);

+struct perf_sample_id *perf_evlist__id2sid(struct perf_evlist *evlist, u64 id);
+
union perf_event *perf_evlist__mmap_read(struct perf_evlist *self, int idx);

int perf_evlist__open(struct perf_evlist *evlist);
--
1.7.11.4

2012-10-20 14:33:43

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 01/11] perf: Add PERF_EVENT_IOC_ID ioctl to return event ID

The only way to get the event ID is by reading the event fd,
followed by parsing the ID value out of the returned data.

While this is ok for current read format used by perf tool,
it is not ok when we use PERF_FORMAT_GROUP format.

With this format the data are returned for the whole group
and there's no way to find out what ID belongs to our fd
(if we are not group leader event).

Adding a simple ioctl that returns event primary ID for given fd.

Signed-off-by: Jiri Olsa <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Namhyung Kim <[email protected]>
---
include/linux/perf_event.h | 1 +
kernel/events/core.c | 9 +++++++++
2 files changed, 10 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 28f9cee..f11c967 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -314,6 +314,7 @@ struct perf_event_attr {
#define PERF_EVENT_IOC_PERIOD _IOW('$', 4, __u64)
#define PERF_EVENT_IOC_SET_OUTPUT _IO ('$', 5)
#define PERF_EVENT_IOC_SET_FILTER _IOW('$', 6, char *)
+#define PERF_EVENT_IOC_ID _IOR('$', 7, u64 *)

enum perf_event_ioc_flags {
PERF_IOC_FLAG_GROUP = 1U << 0,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2ba8904..32aec40 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3272,6 +3272,15 @@ static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
case PERF_EVENT_IOC_PERIOD:
return perf_event_period(event, (u64 __user *)arg);

+ case PERF_EVENT_IOC_ID:
+ {
+ u64 id = primary_event_id(event);
+
+ if (copy_to_user((void __user *)arg, &id, sizeof(id)))
+ return -EFAULT;
+ return 0;
+ }
+
case PERF_EVENT_IOC_SET_OUTPUT:
{
struct perf_event *output_event = NULL;
--
1.7.11.4

2012-10-20 14:36:12

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 02/11] perf: Do not get values from disabled counters in group format read

It's possible some of the counters in the group could be
disabled when sampling member of the event group is reading
the rest via PERF_SAMPLE_READ sample type processing. Disabled
counters could then produce wrong numbers.

Fixing that by reading only enabled counters for PERF_SAMPLE_READ
sample type processing.

Signed-off-by: Jiri Olsa <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Namhyung Kim <[email protected]>
---
kernel/events/core.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 32aec40..5220d01 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4012,12 +4012,24 @@ static void perf_output_read_group(struct perf_output_handle *handle,
__output_copy(handle, values, n * sizeof(u64));

list_for_each_entry(sub, &leader->sibling_list, group_entry) {
+ u64 value = 0;
n = 0;

- if (sub != event)
- sub->pmu->read(sub);
+ /*
+ * We are NOT interested in disabled counters,
+ * giving us strange values and keeping us from
+ * good night sleep!!!
+ */
+ if (sub->state > PERF_EVENT_STATE_OFF) {
+
+ if (sub != event)
+ sub->pmu->read(sub);
+
+ value = perf_event_count(sub);
+ }
+
+ values[n++] = value;

- values[n++] = perf_event_count(sub);
if (read_format & PERF_FORMAT_ID)
values[n++] = primary_event_id(sub);

--
1.7.11.4

2012-10-21 16:38:55

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 00/11] perf tool: Add PERF_SAMPLE_READ sample read support


* Jiri Olsa <[email protected]> wrote:

> hi,
> adding support to read sample values through the PERF_SAMPLE_READ
> sample type. It's now possible to specify 'S' modifier for an event
> and get its sample value by PERF_SAMPLE_READ.
>
> For group the 'S' modifier will enable sampling only for the leader
> and read all the group member by PERF_SAMPLE_READ smple type with
> PERF_FORMAT_GROUP read format.
>
> This patchset is based on group report patches by Namhyung Kim:
> http://lwn.net/Articles/518569/
>
> Example:
> (making sample on cycles, reading both cycles and cache-misses
> by PERF_SAMPLE_READ/PERF_FORMAT_GROUP)
>
> # ./perf record -e '{cycles,cache-misses}:S' ls
> ...
>
> # ./perf report --group --show-total-period --stdio
> # ========
> # captured on: Sat Oct 20 16:53:39 2012
> ...
> # group: {cycles,cache-misses}
> # ========
> #
> # Samples: 86 of event 'anon group { cycles, cache-misses }'
> # Event count (approx.): 34863674
> #
> # Overhead Period Command Shared Object Symbol
> # ................ ........................ ....... ................. ................................

Might make sense to consider this column enumeration:

#
# cycles
# | cache-misses
# | |
> # v v
> #
> 16.56% 19.47% 5773450 475 ls [kernel.kallsyms] [k] native_sched_clock
> 10.87% 0.74% 3789088 18 ls [kernel.kallsyms] [k] rtl8169_interrupt
> 9.82% 15.86% 3423364 387 ls [kernel.kallsyms] [k] mark_lock
> 8.43% 17.75% 2938384 433 ls ld-2.14.90.so [.] do_lookup_x
> 6.79% 20.86% 2365622 509 ls ls [.] calculate_columns
> 6.36% 0.61% 2216808 15 ls [kernel.kallsyms] [k] lock_release
> ...

/me wants this feature ASAP :-)

This should probably be the out of box default for perf record
and perf top as well - the cache miss rate is probably one of
the least appreciated aspects of overhead analysis.

Does it have sane output if the cache-misses event is not
supported? The cache-misses column should probably stay empty in
that case - basically falling back to today's default output.

Thanks,

Ingo

2012-10-22 07:32:15

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 00/11] perf tool: Add PERF_SAMPLE_READ sample read support

Hi Jiri,

It'd be better if you provide a git branch for this series.

Thanks,
Namhyung

2012-10-22 07:53:31

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 00/11] perf tool: Add PERF_SAMPLE_READ sample read support

On Mon, Oct 22, 2012 at 04:32:11PM +0900, Namhyung Kim wrote:
> Hi Jiri,
>
> It'd be better if you provide a git branch for this series.

aaah right, I forgot to mention that.. ;)

git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/linux.git
perf/group2

thanks,
jirka

2012-10-22 07:57:29

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 03/11] perf tool: Use PERF_EVENT_IOC_ID perf ioctl to read event id

Hi,

Just a minor nitpicking..


On Sat, 20 Oct 2012 16:33:11 +0200, Jiri Olsa wrote:
> Changing the way we retrieve the event ID. Instead of parsing out
> the ID out of the read data, using the PERF_EVENT_IOC_ID ioctl.
>
> Keeping the old way in place to support kernels without
> PERF_EVENT_IOC_ID ioctl support.
[snip]
> + } else if (errno)
> + return -1;

Is this check really needed? I think that returning non-zero from the
ioctl always sets the errno, no? How about this:

ret = ioctl(fd, PERF_EVENT_IOC_ID, &id);
if (!ret)
goto add;

if (errno != ENOTTY)
return -1;

...

I guess that it'll show you better diff stat. :)

Thanks,
Namhyung

>
> - perf_evlist__id_add(evlist, evsel, cpu, thread, read_data[id_idx]);
> + add:
> + perf_evlist__id_add(evlist, evsel, cpu, thread, id);
> return 0;
> }

2012-10-22 08:11:51

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 00/11] perf tool: Add PERF_SAMPLE_READ sample read support

On Sun, Oct 21, 2012 at 06:38:49PM +0200, Ingo Molnar wrote:
>
> * Jiri Olsa <[email protected]> wrote:
>
> > hi,
> > adding support to read sample values through the PERF_SAMPLE_READ
> > sample type. It's now possible to specify 'S' modifier for an event
> > and get its sample value by PERF_SAMPLE_READ.
> >
> > For group the 'S' modifier will enable sampling only for the leader
> > and read all the group member by PERF_SAMPLE_READ smple type with
> > PERF_FORMAT_GROUP read format.
> >
> > This patchset is based on group report patches by Namhyung Kim:
> > http://lwn.net/Articles/518569/
> >
> > Example:
> > (making sample on cycles, reading both cycles and cache-misses
> > by PERF_SAMPLE_READ/PERF_FORMAT_GROUP)
> >
> > # ./perf record -e '{cycles,cache-misses}:S' ls
> > ...
> >
> > # ./perf report --group --show-total-period --stdio
> > # ========
> > # captured on: Sat Oct 20 16:53:39 2012
> > ...
> > # group: {cycles,cache-misses}
> > # ========
> > #
> > # Samples: 86 of event 'anon group { cycles, cache-misses }'
> > # Event count (approx.): 34863674
> > #
> > # Overhead Period Command Shared Object Symbol
> > # ................ ........................ ....... ................. ................................
>
> Might make sense to consider this column enumeration:
>
> #
> # cycles
> # | cache-misses
> # | |
> > # v v
> > #
> > 16.56% 19.47% 5773450 475 ls [kernel.kallsyms] [k] native_sched_clock
> > 10.87% 0.74% 3789088 18 ls [kernel.kallsyms] [k] rtl8169_interrupt

no problem in '--stdio' mode I guess.. not sure in '--tui/--gtk', Namhyung?


> > 9.82% 15.86% 3423364 387 ls [kernel.kallsyms] [k] mark_lock
> > 8.43% 17.75% 2938384 433 ls ld-2.14.90.so [.] do_lookup_x
> > 6.79% 20.86% 2365622 509 ls ls [.] calculate_columns
> > 6.36% 0.61% 2216808 15 ls [kernel.kallsyms] [k] lock_release
> > ...
>
> /me wants this feature ASAP :-)
>
> This should probably be the out of box default for perf record
> and perf top as well - the cache miss rate is probably one of
> the least appreciated aspects of overhead analysis.
>
> Does it have sane output if the cache-misses event is not
> supported? The cache-misses column should probably stay empty in
> that case - basically falling back to today's default output.

unsupported counters fail before report:

# ./perf record -e '{cycles,stalled-cycles-backend}:S' ls
Error:
The stalled-cycles-backend event is not supported.
ls: Terminated

similar for top, so we'd need special treatment for default

thanks,
jirka

2012-10-22 08:40:40

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 03/11] perf tool: Use PERF_EVENT_IOC_ID perf ioctl to read event id

On Mon, Oct 22, 2012 at 04:57:24PM +0900, Namhyung Kim wrote:
> Hi,
>
> Just a minor nitpicking..
>
>
> On Sat, 20 Oct 2012 16:33:11 +0200, Jiri Olsa wrote:
> > Changing the way we retrieve the event ID. Instead of parsing out
> > the ID out of the read data, using the PERF_EVENT_IOC_ID ioctl.
> >
> > Keeping the old way in place to support kernels without
> > PERF_EVENT_IOC_ID ioctl support.
> [snip]
> > + } else if (errno)
> > + return -1;
>
> Is this check really needed? I think that returning non-zero from the
> ioctl always sets the errno, no? How about this:
>
> ret = ioctl(fd, PERF_EVENT_IOC_ID, &id);
> if (!ret)
> goto add;
>
> if (errno != ENOTTY)
> return -1;
>
> ...
>
> I guess that it'll show you better diff stat. :)

yep, looks better.. v2 candidate ;)

thanks,
jirka

2012-10-22 08:51:07

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 00/11] perf tool: Add PERF_SAMPLE_READ sample read support

On Mon, 22 Oct 2012 10:09:31 +0200, Jiri Olsa wrote:
> On Sun, Oct 21, 2012 at 06:38:49PM +0200, Ingo Molnar wrote:
>> > #
>> > # Samples: 86 of event 'anon group { cycles, cache-misses }'
>> > # Event count (approx.): 34863674
>> > #
>> > # Overhead Period Command Shared Object Symbol
>> > # ................ ........................ ....... ................. ................................
>>
>> Might make sense to consider this column enumeration:
>>
>> #
>> # cycles
>> # | cache-misses
>> # | |
>> > # v v
>> > #
>> > 16.56% 19.47% 5773450 475 ls [kernel.kallsyms] [k] native_sched_clock
>> > 10.87% 0.74% 3789088 18 ls [kernel.kallsyms] [k] rtl8169_interrupt
>
> no problem in '--stdio' mode I guess.. not sure in '--tui/--gtk', Namhyung?

I thought a similar way but met a problem. We have other output columns
than 'overhead' like 'period' in this example. What about others?
Adding all event names for each column will just increase the column
width so bothers user IMHO - especially for pmu format events.

So I ended up printing those events on the header:

# Samples: 86 of event 'anon group { cycles, cache-misses }'

Thanks,
Namhyung

2012-10-22 08:53:36

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 00/11] perf tool: Add PERF_SAMPLE_READ sample read support

On Mon, 22 Oct 2012 09:53:05 +0200, Jiri Olsa wrote:
> On Mon, Oct 22, 2012 at 04:32:11PM +0900, Namhyung Kim wrote:
>> It'd be better if you provide a git branch for this series.
>
> aaah right, I forgot to mention that.. ;)
>
> git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/linux.git
> perf/group2

I couldn't get the branch. Do you mind rechecking?

$ git fetch git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/linux.git perf/group2
fatal: Couldn't find remote ref perf/group2

Thanks,
Namhyung

2012-10-22 08:56:34

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 04/11] perf tool: Add support for parsing PERF_SAMPLE_READ sample type

On Sat, 20 Oct 2012 16:33:12 +0200, Jiri Olsa wrote:
> Adding support to parse out the PERF_SAMPLE_READ sample bits.
> The code contains both single and group format specification.
>
> This code parse out and prepare prepare PERF_SAMPLE_READ data

Doubly prepare? :)

Thanks,
Namhyung

2012-10-22 08:58:26

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 09/11] perf test: Add parse events tests for leader sampling

On Sat, 20 Oct 2012 16:33:17 +0200, Jiri Olsa wrote:
> Adding 2 more tests to the automated parse events suite
> for following event config:
>
> '{cycles,cache-misses,branch-misses}:S'
> '{instructions,branch-misses}:Su'
[snip]
> +static int test__leader_sample2(struct perf_evlist *evlist __maybe_unused)
> +{
> + struct perf_evsel *evsel, *leader;
> +
> + TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->nr_entries);
> +
> + /* cycles - sampling group leader */

s/cycles/instructions/ ?

Thanks,
Namhyung


> + [34] = {
> + .name = "{instructions,branch-misses}:Su",
> + .check = test__leader_sample2,
> + },
> };
>
> static struct test__event_st test__events_pmu[] = {

2012-10-22 09:07:11

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 00/11] perf tool: Add PERF_SAMPLE_READ sample read support

On Mon, Oct 22, 2012 at 05:53:32PM +0900, Namhyung Kim wrote:
> On Mon, 22 Oct 2012 09:53:05 +0200, Jiri Olsa wrote:
> > On Mon, Oct 22, 2012 at 04:32:11PM +0900, Namhyung Kim wrote:
> >> It'd be better if you provide a git branch for this series.
> >
> > aaah right, I forgot to mention that.. ;)
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/linux.git
> > perf/group2
>
> I couldn't get the branch. Do you mind rechecking?
>
> $ git fetch git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/linux.git perf/group2
> fatal: Couldn't find remote ref perf/group2

aaand it's pushed now ;)

jirka

2012-10-22 09:12:41

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 09/11] perf test: Add parse events tests for leader sampling

On Mon, Oct 22, 2012 at 05:58:22PM +0900, Namhyung Kim wrote:
> On Sat, 20 Oct 2012 16:33:17 +0200, Jiri Olsa wrote:
> > Adding 2 more tests to the automated parse events suite
> > for following event config:
> >
> > '{cycles,cache-misses,branch-misses}:S'
> > '{instructions,branch-misses}:Su'
> [snip]
> > +static int test__leader_sample2(struct perf_evlist *evlist __maybe_unused)
> > +{
> > + struct perf_evsel *evsel, *leader;
> > +
> > + TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->nr_entries);
> > +
> > + /* cycles - sampling group leader */
>
> s/cycles/instructions/ ?

right

thanks,
jirka

2012-10-22 09:16:16

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 00/11] perf tool: Add PERF_SAMPLE_READ sample read support

On Mon, Oct 22, 2012 at 05:51:01PM +0900, Namhyung Kim wrote:
> On Mon, 22 Oct 2012 10:09:31 +0200, Jiri Olsa wrote:
> > On Sun, Oct 21, 2012 at 06:38:49PM +0200, Ingo Molnar wrote:
> >> > #
> >> > # Samples: 86 of event 'anon group { cycles, cache-misses }'
> >> > # Event count (approx.): 34863674
> >> > #
> >> > # Overhead Period Command Shared Object Symbol
> >> > # ................ ........................ ....... ................. ................................
> >>
> >> Might make sense to consider this column enumeration:
> >>
> >> #
> >> # cycles
> >> # | cache-misses
> >> # | |
> >> > # v v
> >> > #
> >> > 16.56% 19.47% 5773450 475 ls [kernel.kallsyms] [k] native_sched_clock
> >> > 10.87% 0.74% 3789088 18 ls [kernel.kallsyms] [k] rtl8169_interrupt
> >
> > no problem in '--stdio' mode I guess.. not sure in '--tui/--gtk', Namhyung?
>
> I thought a similar way but met a problem. We have other output columns
> than 'overhead' like 'period' in this example. What about others?
> Adding all event names for each column will just increase the column
> width so bothers user IMHO - especially for pmu format events.

how about each event name is in special row as outlined above

should be no problem in stdio and could be optional (key toggled) in tui

jirka

2012-10-23 16:13:39

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 02/11] perf: Do not get values from disabled counters in group format read

On Sat, 2012-10-20 at 16:33 +0200, Jiri Olsa wrote:
> It's possible some of the counters in the group could be
> disabled when sampling member of the event group is reading
> the rest via PERF_SAMPLE_READ sample type processing. Disabled
> counters could then produce wrong numbers.
>
> Fixing that by reading only enabled counters for PERF_SAMPLE_READ
> sample type processing.
>

However did you run into this?

> ---
> kernel/events/core.c | 18 +++++++++++++++---
> 1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 32aec40..5220d01 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -4012,12 +4012,24 @@ static void perf_output_read_group(struct perf_output_handle *handle,
> __output_copy(handle, values, n * sizeof(u64));
>
> list_for_each_entry(sub, &leader->sibling_list, group_entry) {
> + u64 value = 0;
> n = 0;
>
> - if (sub != event)
> - sub->pmu->read(sub);
> + /*
> + * We are NOT interested in disabled counters,
> + * giving us strange values and keeping us from
> + * good night sleep!!!
> + */
> + if (sub->state > PERF_EVENT_STATE_OFF) {
> +

superfluous whitespace there... ;-)

> + if (sub != event)
> + sub->pmu->read(sub);
> +
> + value = perf_event_count(sub);
> + }
> +
> + values[n++] = value;
>
> - values[n++] = perf_event_count(sub);
> if (read_format & PERF_FORMAT_ID)
> values[n++] = primary_event_id(sub);
>

2012-10-23 16:51:28

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 02/11] perf: Do not get values from disabled counters in group format read

On Tue, Oct 23, 2012 at 06:13:09PM +0200, Peter Zijlstra wrote:
> On Sat, 2012-10-20 at 16:33 +0200, Jiri Olsa wrote:
> > It's possible some of the counters in the group could be
> > disabled when sampling member of the event group is reading
> > the rest via PERF_SAMPLE_READ sample type processing. Disabled
> > counters could then produce wrong numbers.
> >
> > Fixing that by reading only enabled counters for PERF_SAMPLE_READ
> > sample type processing.
> >
>
> However did you run into this?

yep, with perf record -a

hm, I just checked and we enable/disable event groups atomicaly..
I haven't checked that before because it seemed obvious :-/

So, I'm not sure now about the exact code path that triggered it
in my test.. however you could always disable child event from
group and hit this issue, but thats not what happened in perf.

might be some other bug... I'll check

>
> > ---
> > kernel/events/core.c | 18 +++++++++++++++---
> > 1 file changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 32aec40..5220d01 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -4012,12 +4012,24 @@ static void perf_output_read_group(struct perf_output_handle *handle,
> > __output_copy(handle, values, n * sizeof(u64));
> >
> > list_for_each_entry(sub, &leader->sibling_list, group_entry) {
> > + u64 value = 0;
> > n = 0;
> >
> > - if (sub != event)
> > - sub->pmu->read(sub);
> > + /*
> > + * We are NOT interested in disabled counters,
> > + * giving us strange values and keeping us from
> > + * good night sleep!!!
> > + */
> > + if (sub->state > PERF_EVENT_STATE_OFF) {
> > +
>
> superfluous whitespace there... ;-)

yea.. v2 ;)


thanks,
jirka

2012-10-24 12:01:57

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 02/11] perf: Do not get values from disabled counters in group format read

On Tue, 2012-10-23 at 18:50 +0200, Jiri Olsa wrote:
> On Tue, Oct 23, 2012 at 06:13:09PM +0200, Peter Zijlstra wrote:
> > On Sat, 2012-10-20 at 16:33 +0200, Jiri Olsa wrote:
> > > It's possible some of the counters in the group could be
> > > disabled when sampling member of the event group is reading
> > > the rest via PERF_SAMPLE_READ sample type processing. Disabled
> > > counters could then produce wrong numbers.
> > >
> > > Fixing that by reading only enabled counters for PERF_SAMPLE_READ
> > > sample type processing.
> > >
> >
> > However did you run into this?
>
> yep, with perf record -a
>
> hm, I just checked and we enable/disable event groups atomicaly..
> I haven't checked that before because it seemed obvious :-/
>
> So, I'm not sure now about the exact code path that triggered it
> in my test.. however you could always disable child event from
> group and hit this issue, but thats not what happened in perf.
>
> might be some other bug... I'll check

Right, so I don't object to the patch per-se, I was just curious how you
ran into it, because ISTR what you just said, we enable all this stuff
together.

Also, why would disabled counters give strange values? They'd simply
return the same old value time after time, right?

2012-10-24 12:14:53

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 02/11] perf: Do not get values from disabled counters in group format read

On Wed, Oct 24, 2012 at 02:01:18PM +0200, Peter Zijlstra wrote:
> On Tue, 2012-10-23 at 18:50 +0200, Jiri Olsa wrote:
> > On Tue, Oct 23, 2012 at 06:13:09PM +0200, Peter Zijlstra wrote:
> > > On Sat, 2012-10-20 at 16:33 +0200, Jiri Olsa wrote:
> > > > It's possible some of the counters in the group could be
> > > > disabled when sampling member of the event group is reading
> > > > the rest via PERF_SAMPLE_READ sample type processing. Disabled
> > > > counters could then produce wrong numbers.
> > > >
> > > > Fixing that by reading only enabled counters for PERF_SAMPLE_READ
> > > > sample type processing.
> > > >
> > >
> > > However did you run into this?
> >
> > yep, with perf record -a
> >
> > hm, I just checked and we enable/disable event groups atomicaly..
> > I haven't checked that before because it seemed obvious :-/
> >
> > So, I'm not sure now about the exact code path that triggered it
> > in my test.. however you could always disable child event from
> > group and hit this issue, but thats not what happened in perf.
> >
> > might be some other bug... I'll check
>
> Right, so I don't object to the patch per-se, I was just curious how you
> ran into it, because ISTR what you just said, we enable all this stuff
> together.
>
> Also, why would disabled counters give strange values? They'd simply
> return the same old value time after time, right?

well, x86_pmu_read calls x86_perf_event_update, which expects the event
is active.. if it's not it'll update the count from whatever left in
event.hw.idx counter.. could be uninitialized or used by others..

I can easily reproduce this one, so let's see.. ;)

jirka

2012-10-24 12:33:14

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 02/11] perf: Do not get values from disabled counters in group format read

On Wed, 2012-10-24 at 14:14 +0200, Jiri Olsa wrote:
>
> well, x86_pmu_read calls x86_perf_event_update, which expects the
> event
> is active.. if it's not it'll update the count from whatever left in
> event.hw.idx counter.. could be uninitialized or used by others..
>
Oh right, we shouldn't call ->read() unless ->state ==
PERF_EVENT_STATE_ACTIVE. Aside from that, perf_event_count() should
return the 'right' value regardless state (although excluding ERROR
might make sense).

2012-10-24 16:04:21

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 02/11] perf: Do not get values from disabled counters in group format read

On Wed, Oct 24, 2012 at 02:14:06PM +0200, Jiri Olsa wrote:
> On Wed, Oct 24, 2012 at 02:01:18PM +0200, Peter Zijlstra wrote:

SNIP

> > Right, so I don't object to the patch per-se, I was just curious how you
> > ran into it, because ISTR what you just said, we enable all this stuff
> > together.
> >
> > Also, why would disabled counters give strange values? They'd simply
> > return the same old value time after time, right?
>
> well, x86_pmu_read calls x86_perf_event_update, which expects the event
> is active.. if it's not it'll update the count from whatever left in
> event.hw.idx counter.. could be uninitialized or used by others..
>
> I can easily reproduce this one, so let's see.. ;)

ok, the problem code path is like this:

- running "perf record -e '{cycles,cache-misses}:S' -a sleep 1"
which creates group of counters, that are enabled by perf via ioctl

- within the __perf_event_enable function the __perf_event_mark_enabled only
change state for leader, so following group_sched_in will fail to schedule
group siblings, because of the state check in event_sched_in:

static int
event_sched_in(struct perf_event *event,
struct perf_cpu_context *cpuctx,
struct perf_event_context *ctx)
{
u64 tstamp = perf_event_time(event);

if (event->state <= PERF_EVENT_STATE_OFF)
return 0;

- ending up with only leader enabled
- all the other events in group are enabled by perf after the leader,
but meanwhile leader can hit sample.. and read group events.. ;)

attached patch fixies this for me and I was wondering we want
same behaviour for disable path as well (included below not tested)

I also think that we should keep that state check before calling
pmu->read() in the perf sample read

thanks,
jirka


---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index dabfc5d..119a57e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1253,6 +1253,16 @@ retry:
raw_spin_unlock_irq(&ctx->lock);
}

+static void __perf_event_mark_disabled(struct perf_event *event)
+{
+ struct perf_event *sub;
+
+ event->state = PERF_EVENT_STATE_OFF;
+
+ list_for_each_entry(sub, &event->sibling_list, group_entry)
+ sub->state = PERF_EVENT_STATE_OFF;
+}
+
/*
* Cross CPU call to disable a performance event
*/
@@ -1286,7 +1296,8 @@ int __perf_event_disable(void *info)
group_sched_out(event, cpuctx, ctx);
else
event_sched_out(event, cpuctx, ctx);
- event->state = PERF_EVENT_STATE_OFF;
+
+ __perf_event_mark_disabled(event);
}

raw_spin_unlock(&ctx->lock);
@@ -1685,8 +1696,8 @@ retry:
/*
* Put a event into inactive state and update time fields.
* Enabling the leader of a group effectively enables all
- * the group members that aren't explicitly disabled, so we
- * have to update their ->tstamp_enabled also.
+ * the group members, so we have to update their ->tstamp_enabled
+ * also.
* Note: this works for group members as well as group leaders
* since the non-leader members' sibling_lists will be empty.
*/
@@ -1697,9 +1708,10 @@ static void __perf_event_mark_enabled(struct perf_event *event)

event->state = PERF_EVENT_STATE_INACTIVE;
event->tstamp_enabled = tstamp - event->total_time_enabled;
+
list_for_each_entry(sub, &event->sibling_list, group_entry) {
- if (sub->state >= PERF_EVENT_STATE_INACTIVE)
- sub->tstamp_enabled = tstamp - sub->total_time_enabled;
+ sub->state = PERF_EVENT_STATE_INACTIVE;
+ sub->tstamp_enabled = tstamp - sub->total_time_enabled;
}
}

2012-10-26 01:29:12

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 00/11] perf tool: Add PERF_SAMPLE_READ sample read support

Hi Jiri,

On Sat, 20 Oct 2012 16:33:08 +0200, Jiri Olsa wrote:
> hi,
> adding support to read sample values through the PERF_SAMPLE_READ
> sample type. It's now possible to specify 'S' modifier for an event
> and get its sample value by PERF_SAMPLE_READ.

I have a question. What's an actual impact of specifying 'S' modifiere
to a non-group event or even only a (non-leader) member of a group? For
instance, 'cycles:S' or '{branches,branch-misses:S}'.

Thanks,
Namhyung

2012-10-26 09:15:09

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 00/11] perf tool: Add PERF_SAMPLE_READ sample read support

On Fri, 2012-10-26 at 10:29 +0900, Namhyung Kim wrote:
> Hi Jiri,
>
> On Sat, 20 Oct 2012 16:33:08 +0200, Jiri Olsa wrote:
> > hi,
> > adding support to read sample values through the PERF_SAMPLE_READ
> > sample type. It's now possible to specify 'S' modifier for an event
> > and get its sample value by PERF_SAMPLE_READ.
>
> I have a question. What's an actual impact of specifying 'S' modifiere
> to a non-group event or even only a (non-leader) member of a group? For
> instance, 'cycles:S' or '{branches,branch-misses:S}'.

I would hope a syntax error from the parser ;-)

I haven't actually looked at the implementation, but I understood it to
be a group modifier, not an event modifier.

2012-10-26 10:24:00

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 00/11] perf tool: Add PERF_SAMPLE_READ sample read support

On Fri, Oct 26, 2012 at 11:14:45AM +0200, Peter Zijlstra wrote:
> On Fri, 2012-10-26 at 10:29 +0900, Namhyung Kim wrote:
> > Hi Jiri,
> >
> > On Sat, 20 Oct 2012 16:33:08 +0200, Jiri Olsa wrote:
> > > hi,
> > > adding support to read sample values through the PERF_SAMPLE_READ
> > > sample type. It's now possible to specify 'S' modifier for an event
> > > and get its sample value by PERF_SAMPLE_READ.
> >
> > I have a question. What's an actual impact of specifying 'S' modifiere
> > to a non-group event or even only a (non-leader) member of a group? For
> > instance, 'cycles:S' or '{branches,branch-misses:S}'.
>
> I would hope a syntax error from the parser ;-)

yeaa.. no ;)

$ ./perf record -e '{cycles:S,cache-misses}' ls
...
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.011 MB perf.data (~476 samples) ]

$ ./perf report
non matching sample_type

>
> I haven't actually looked at the implementation, but I understood it to
> be a group modifier, not an event modifier.

we might want to be able to use PERF_SAMPLE_READ for single event
same as for groups

the difference between single event and group 'S' modifier:

$ ./perf record -e 'cycles:S' ls
- records 'cycles' samples and read period value via PERF_SAMPLE_READ

$ ./perf record -e '{cycles,cache-misses}:S'
- samples just on 'cycles' samples and read both period values
(cycles and cache-misses) via PERF_SAMPLE_READ group format

$ ./perf record -e '{cycles,cache-misses}:S,instructions' ls
$ ./perf record -e '{cycles:S,cache-misses},instructions' ls
$ ./perf record -e 'cycles:S,instructions' ls
- non matching sample_type

hm, thats the unique sample_type issue again ;) Once we set
PERF_SAMPLE_READ for event or group, we need to set it for
all other events in session, otherwise the report fails

sooo, it looks like:
- global record '-S' option instead, setting PERF_SAMPLE_READ sample type
globaly for all events
- and ':S' group modifier to enable sampling only on leader
with group format reading for the rest of the group
- ':S' group modifier alone would imply -S

How about that?

thanks,
jirka

2012-10-26 15:39:14

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 00/11] perf tool: Add PERF_SAMPLE_READ sample read support

2012-10-26 (금), 12:23 +0200, Jiri Olsa:
> On Fri, Oct 26, 2012 at 11:14:45AM +0200, Peter Zijlstra wrote:
> > I haven't actually looked at the implementation, but I understood it to
> > be a group modifier, not an event modifier.
>
> we might want to be able to use PERF_SAMPLE_READ for single event
> same as for groups

What is the merits of doing it?

>
> the difference between single event and group 'S' modifier:
>
> $ ./perf record -e 'cycles:S' ls
> - records 'cycles' samples and read period value via PERF_SAMPLE_READ
>
> $ ./perf record -e '{cycles,cache-misses}:S'
> - samples just on 'cycles' samples and read both period values
> (cycles and cache-misses) via PERF_SAMPLE_READ group format
>
> $ ./perf record -e '{cycles,cache-misses}:S,instructions' ls
> $ ./perf record -e '{cycles:S,cache-misses},instructions' ls
> $ ./perf record -e 'cycles:S,instructions' ls
> - non matching sample_type
>
> hm, thats the unique sample_type issue again ;) Once we set
> PERF_SAMPLE_READ for event or group, we need to set it for
> all other events in session, otherwise the report fails

Sorry, I don't understand why we need to set it for all events. Just
setting it for a group is not sufficient? Please shed some light on
this.

Thanks,
Namhyung

>
> sooo, it looks like:
> - global record '-S' option instead, setting PERF_SAMPLE_READ sample type
> globaly for all events
> - and ':S' group modifier to enable sampling only on leader
> with group format reading for the rest of the group
> - ':S' group modifier alone would imply -S
>
> How about that?
>
> thanks,
> jirka


2012-10-26 16:14:57

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH 00/11] perf tool: Add PERF_SAMPLE_READ sample read support

On 10/26/12 9:39 AM, Namhyung Kim wrote:
>> hm, thats the unique sample_type issue again ;) Once we set
>> PERF_SAMPLE_READ for event or group, we need to set it for
>> all other events in session, otherwise the report fails
>
> Sorry, I don't understand why we need to set it for all events. Just
> setting it for a group is not sufficient? Please shed some light on
> this.

perf does not support multiple sample_types -- all events in the session
need to use the same one.

David

2012-10-26 16:25:39

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 00/11] perf tool: Add PERF_SAMPLE_READ sample read support

2012-10-26 (금), 10:14 -0600, David Ahern:
> On 10/26/12 9:39 AM, Namhyung Kim wrote:
> >> hm, thats the unique sample_type issue again ;) Once we set
> >> PERF_SAMPLE_READ for event or group, we need to set it for
> >> all other events in session, otherwise the report fails
> >
> > Sorry, I don't understand why we need to set it for all events. Just
> > setting it for a group is not sufficient? Please shed some light on
> > this.
>
> perf does not support multiple sample_types -- all events in the session
> need to use the same one.

Can I ask why? :)

Thanks,
Namhyung

2012-10-26 16:48:05

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH 00/11] perf tool: Add PERF_SAMPLE_READ sample read support

On 10/26/12 10:25 AM, Namhyung Kim wrote:
> 2012-10-26 (금), 10:14 -0600, David Ahern:
>> On 10/26/12 9:39 AM, Namhyung Kim wrote:
>>>> hm, thats the unique sample_type issue again ;) Once we set
>>>> PERF_SAMPLE_READ for event or group, we need to set it for
>>>> all other events in session, otherwise the report fails
>>>
>>> Sorry, I don't understand why we need to set it for all events. Just
>>> setting it for a group is not sufficient? Please shed some light on
>>> this.
>>
>> perf does not support multiple sample_types -- all events in the session
>> need to use the same one.
>
> Can I ask why? :)

Of course:
https://lkml.org/lkml/2011/8/15/6
https://lkml.org/lkml/2011/9/29/502

David

2012-10-26 17:00:47

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH 00/11] perf tool: Add PERF_SAMPLE_READ sample read support

Em Sat, Oct 27, 2012 at 12:39:06AM +0900, Namhyung Kim escreveu:
> 2012-10-26 (금), 12:23 +0200, Jiri Olsa:
> > $ ./perf record -e '{cycles,cache-misses}:S,instructions' ls
> > $ ./perf record -e '{cycles:S,cache-misses},instructions' ls
> > $ ./perf record -e 'cycles:S,instructions' ls
> > - non matching sample_type
> >
> > hm, thats the unique sample_type issue again ;) Once we set
> > PERF_SAMPLE_READ for event or group, we need to set it for
> > all other events in session, otherwise the report fails
>
> Sorry, I don't understand why we need to set it for all events. Just
> setting it for a group is not sufficient? Please shed some light on
> this.

Artificial limitation on having multiple sample_type's on a single
perf.data?

My plan is to experiment with multiple evlists in 'trace' to get, say,
callchains on some events but not in the others.

- Arnaldo

2012-10-30 12:12:35

by Jiri Olsa

[permalink] [raw]
Subject: [tip:perf/core] perf record: Fix mmap error output condition

Commit-ID: 0089fa9831f3752869143c7928adb94ac6353085
Gitweb: http://git.kernel.org/tip/0089fa9831f3752869143c7928adb94ac6353085
Author: Jiri Olsa <[email protected]>
AuthorDate: Sat, 20 Oct 2012 16:33:19 +0200
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Mon, 29 Oct 2012 12:14:59 -0200

perf record: Fix mmap error output condition

The mmap_pages default value is not power of 2 (UINT_MAX).

Together with perf_evlist__mmap function returning error value different
from EPERM, we get misleading error message: "--mmap_pages/-m value must
be a power of two."

Fixing this by adding extra check for UINT_MAX value for this error
condition.

Signed-off-by: Jiri Olsa <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/builtin-record.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 53c9892..5783c32 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -363,7 +363,8 @@ try_again:
"or try again with a smaller value of -m/--mmap_pages.\n"
"(current value: %d)\n", opts->mmap_pages);
rc = -errno;
- } else if (!is_power_of_2(opts->mmap_pages)) {
+ } else if (!is_power_of_2(opts->mmap_pages) &&
+ (opts->mmap_pages != UINT_MAX)) {
pr_err("--mmap_pages/-m value must be a power of two.");
rc = -EINVAL;
} else {