2012-05-25 21:49:15

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: [GIT PULL 0/5 v2] perf/urgent fixes

Hi Ingo,

There was a problem in the previous series, the 4 patches that ended up
being posted were not the 4 on perf/urgent, but on my local perf/core branch,
so I'm reposting now and have resigned the perf-urgent-for-mingo signed tag.

Please consider pulling and sorry for the thinko,

- Arnaldo

The following changes since commit eaec12d7f526694f24d581a4ad23de6ce0315cd2:

tools lib traceevent: Fix signature of create_arg_item() (2012-05-24 11:36:05 -0300)

are available in the git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux tags/perf-urgent-for-mingo

for you to fetch changes up to a00dc319e98161949aa87f71a17db32e925c3257:

perf record: Fix branch_stack type in perf_record_opts (2012-05-25 18:32:44 -0300)

----------------------------------------------------------------
Fixes for perf/urgent.

. Fix build on newer distros where _FORTIFY_SOURCE needs optimization enable.

. Fix event name caching when fallbacking to cpu-clock

. Event name reconstruction from perf_event_attr now include modifiers.

. Elliminate leak on error path when parsing pid/tid list, From Franck Bui-Huu.

. Fix type problems resulting in endianness problems on PPC64, from Stephane Eranian.

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

----------------------------------------------------------------
Arnaldo Carvalho de Melo (3):
perf tools: Do not use _FORTIFY_SOURCE when DEBUG=1 is specified
perf top: Fix counter name fixup when fallbacking to cpu-clock
perf tools: Reconstruct event with modifiers from perf_event_attr

Franck Bui-Huu (1):
perf tools: fix thread_map__new_by_pid_str() memory leak in error path

Stephane Eranian (1):
perf record: Fix branch_stack type in perf_record_opts

tools/perf/Makefile | 4 +-
tools/perf/builtin-top.c | 2 +-
tools/perf/perf.h | 2 +-
tools/perf/util/evsel.c | 90 ++++++++++++++++++++++++++++++++++++++++
tools/perf/util/evsel.h | 3 ++
tools/perf/util/parse-events.c | 27 +++++-------
tools/perf/util/thread_map.c | 21 +++++-----
7 files changed, 118 insertions(+), 31 deletions(-)


2012-05-25 21:49:05

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: [PATCH 2/5 v2] perf tools: fix thread_map__new_by_pid_str() memory leak in error path

From: Franck Bui-Huu <[email protected]>

The namelist array (including its content) was not freed if we fail to
realloc a new 'threads' structure.

Signed-off-by: Franck Bui-Huu <[email protected]>
Cc: David Ahern <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/thread_map.c | 21 ++++++++++-----------
1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
index 84d9bd78..9b5f856 100644
--- a/tools/perf/util/thread_map.c
+++ b/tools/perf/util/thread_map.c
@@ -188,28 +188,27 @@ static struct thread_map *thread_map__new_by_pid_str(const char *pid_str)
nt = realloc(threads, (sizeof(*threads) +
sizeof(pid_t) * total_tasks));
if (nt == NULL)
- goto out_free_threads;
+ goto out_free_namelist;

threads = nt;

- if (threads) {
- for (i = 0; i < items; i++)
- threads->map[j++] = atoi(namelist[i]->d_name);
- threads->nr = total_tasks;
- }
-
- for (i = 0; i < items; i++)
+ for (i = 0; i < items; i++) {
+ threads->map[j++] = atoi(namelist[i]->d_name);
free(namelist[i]);
+ }
+ threads->nr = total_tasks;
free(namelist);
-
- if (!threads)
- break;
}

out:
strlist__delete(slist);
return threads;

+out_free_namelist:
+ for (i = 0; i < items; i++)
+ free(namelist[i]);
+ free(namelist);
+
out_free_threads:
free(threads);
threads = NULL;
--
1.7.9.2.358.g22243

2012-05-25 21:49:10

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: [PATCH 3/5 v2] perf top: Fix counter name fixup when fallbacking to cpu-clock

From: Arnaldo Carvalho de Melo <[email protected]>

In 40491eaa "perf top: Update event name when falling back to cpu-clock"
we freed counter->name but didn't reset it to NULL, then when setting it
to the result of event_name(), event_name() would use the cached value,
which by now was overwritten and thus we got garbage or a zero lenght
string.

Fix it by just freeing and setting counter->name to NULL, this way
event_name() when called afterwards, will find the right counter name
and cache it again.

Found while trying 'cycles:pp' on a machine were :pp couldn't be
honoured. Probably the best fallback here is to tell the user that that
level of precision is not available on the PMU and then go removing 'p',
levels of precision till we get to play 'cycles' and if even that fails,
_then_ get to 'cpu-clock'.

But that is the matter for another patch, this one just needs to fix the
caching issue, which in the end will show 'cpu-clock' when tools ask for
the event name being used, which clarifies things for the user, that
will see that 'cycles:pp' or whatever not support event is not being
used, some sort of fallback happened.

Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/builtin-top.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 6031dce..d4a5f9b 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -953,7 +953,7 @@ try_again:
attr->config = PERF_COUNT_SW_CPU_CLOCK;
if (counter->name) {
free(counter->name);
- counter->name = strdup(event_name(counter));
+ counter->name = NULL;
}
goto try_again;
}
--
1.7.9.2.358.g22243

2012-05-25 21:49:34

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: [PATCH 5/5 v2] perf record: Fix branch_stack type in perf_record_opts

From: Stephane Eranian <[email protected]>

The attr.branch_sample_type field is defined as u64 by the API. As
such, we need to ensure the variable holding the value of the branch
stack filters is also u64 otherwise we may lose bits in the future.

Note also that the bogus definition of the field in perf_record_opts
caused problems on big-endian PPC systems. Thanks to Anshuman Khandual
for tracking the problem on PPC.

Reported-by: Anshuman Khandual <[email protected]>
Signed-off-by: Stephane Eranian <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/20120525211344.GA7729@quad
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/perf.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 14f1034..f960ccb 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -227,7 +227,7 @@ struct perf_record_opts {
unsigned int freq;
unsigned int mmap_pages;
unsigned int user_freq;
- int branch_stack;
+ u64 branch_stack;
u64 default_interval;
u64 user_interval;
};
--
1.7.9.2.358.g22243

2012-05-25 21:49:08

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: [PATCH 1/5 v2] perf tools: Do not use _FORTIFY_SOURCE when DEBUG=1 is specified

From: Arnaldo Carvalho de Melo <[email protected]>

As:

make DEBUG=1 -C tools/perf

disables optimizations and _FORTIFY_SOURCE in recent distros requires
optimizations to be enabled, seen on a Fedora 17 system:

[acme@Fedora17 linux]$ make DEBUG=1 O=/home/acme/git/build/perf/ -C
tools/perf install
In file included from /usr/include/sys/types.h:26:0,
from /usr/include/libelf.h:53,
from /usr/include/gelf.h:53,
from /usr/include/elfutils/libdw.h:53,
from <stdin>:2:
/usr/include/features.h:314:4: error: #warning _FORTIFY_SOURCE requires
compiling with optimization (-O) [-Werror=cpp

Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/Makefile | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 1d3d513..0eee64c 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -80,7 +80,7 @@ ifeq ("$(origin DEBUG)", "command line")
PERF_DEBUG = $(DEBUG)
endif
ifndef PERF_DEBUG
- CFLAGS_OPTIMIZE = -O6
+ CFLAGS_OPTIMIZE = -O6 -D_FORTIFY_SOURCE=2
endif

ifdef PARSER_DEBUG
@@ -89,7 +89,7 @@ ifdef PARSER_DEBUG
PARSER_DEBUG_CFLAGS := -DPARSER_DEBUG
endif

-CFLAGS = -fno-omit-frame-pointer -ggdb3 -Wall -Wextra -std=gnu99 $(CFLAGS_WERROR) $(CFLAGS_OPTIMIZE) -D_FORTIFY_SOURCE=2 $(EXTRA_WARNINGS) $(EXTRA_CFLAGS) $(PARSER_DEBUG_CFLAGS)
+CFLAGS = -fno-omit-frame-pointer -ggdb3 -Wall -Wextra -std=gnu99 $(CFLAGS_WERROR) $(CFLAGS_OPTIMIZE) $(EXTRA_WARNINGS) $(EXTRA_CFLAGS) $(PARSER_DEBUG_CFLAGS)
EXTLIBS = -lpthread -lrt -lelf -lm
ALL_CFLAGS = $(CFLAGS) -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE
ALL_LDFLAGS = $(LDFLAGS)
--
1.7.9.2.358.g22243

2012-05-25 21:49:53

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: [PATCH 4/5 v2] perf tools: Reconstruct event with modifiers from perf_event_attr

From: Arnaldo Carvalho de Melo <[email protected]>

The modifiers:

k kernel space
u user space
h hypervisor
G guest
H host
p, pp, ppp precision level (PEBS)

that can be suffixed to an event were lost when tools used event_name()
to reconstruct them from the perf_event_attr entries in a perf.data
file.

Fix it by following the defaults used for these modifiers in the current
codebase, so:

$ perf record -e instructions:u usleep 1 2> /dev/null
$ perf evlist
instructions:u
$ perf record -e cycles:k usleep 1 2> /dev/null
$ perf evlist
cycles:k
$ perf record -e cycles:kh usleep 1 2> /dev/null
$ perf evlist
cycles:kh
$ perf record -e cache-misses:G usleep 1 2> /dev/null
$ perf evlist
cache-misses:G
$ perf record -e cycles:ppk usleep 1 2> /dev/null
$ perf evlist
cycles:kpp
$

Also works with 'top', 'report', etc.

More work needed to cover tracepoints and software events while not
dragging lots of baggage to the python binding, this is a minimal fix
for v3.5.

Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/evsel.c | 90 ++++++++++++++++++++++++++++++++++++++++
tools/perf/util/evsel.h | 3 ++
tools/perf/util/parse-events.c | 27 +++++-------
3 files changed, 104 insertions(+), 16 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 57e4ce5..91d1913 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -15,6 +15,7 @@
#include "cpumap.h"
#include "thread_map.h"
#include "target.h"
+#include "../../include/linux/perf_event.h"

#define FD(e, x, y) (*(int *)xyarray__entry(e->fd, x, y))
#define GROUP_FD(group_fd, cpu) (*(int *)xyarray__entry(group_fd, cpu, 0))
@@ -64,6 +65,95 @@ struct perf_evsel *perf_evsel__new(struct perf_event_attr *attr, int idx)
return evsel;
}

+static const char *perf_evsel__hw_names[PERF_COUNT_HW_MAX] = {
+ "cycles",
+ "instructions",
+ "cache-references",
+ "cache-misses",
+ "branches",
+ "branch-misses",
+ "bus-cycles",
+ "stalled-cycles-frontend",
+ "stalled-cycles-backend",
+ "ref-cycles",
+};
+
+const char *__perf_evsel__hw_name(u64 config)
+{
+ if (config < PERF_COUNT_HW_MAX && perf_evsel__hw_names[config])
+ return perf_evsel__hw_names[config];
+
+ return "unknown-hardware";
+}
+
+static int perf_evsel__hw_name(struct perf_evsel *evsel, char *bf, size_t size)
+{
+ int colon = 0;
+ struct perf_event_attr *attr = &evsel->attr;
+ int r = scnprintf(bf, size, "%s", __perf_evsel__hw_name(attr->config));
+ bool exclude_guest_default = false;
+
+#define MOD_PRINT(context, mod) do { \
+ if (!attr->exclude_##context) { \
+ if (!colon) colon = r++; \
+ r += scnprintf(bf + r, size - r, "%c", mod); \
+ } } while(0)
+
+ if (attr->exclude_kernel || attr->exclude_user || attr->exclude_hv) {
+ MOD_PRINT(kernel, 'k');
+ MOD_PRINT(user, 'u');
+ MOD_PRINT(hv, 'h');
+ exclude_guest_default = true;
+ }
+
+ if (attr->precise_ip) {
+ if (!colon)
+ colon = r++;
+ r += scnprintf(bf + r, size - r, "%.*s", attr->precise_ip, "ppp");
+ exclude_guest_default = true;
+ }
+
+ if (attr->exclude_host || attr->exclude_guest == exclude_guest_default) {
+ MOD_PRINT(host, 'H');
+ MOD_PRINT(guest, 'G');
+ }
+#undef MOD_PRINT
+ if (colon)
+ bf[colon] = ':';
+ return r;
+}
+
+int perf_evsel__name(struct perf_evsel *evsel, char *bf, size_t size)
+{
+ int ret;
+
+ switch (evsel->attr.type) {
+ case PERF_TYPE_RAW:
+ ret = scnprintf(bf, size, "raw 0x%" PRIx64, evsel->attr.config);
+ break;
+
+ case PERF_TYPE_HARDWARE:
+ ret = perf_evsel__hw_name(evsel, bf, size);
+ break;
+ default:
+ /*
+ * FIXME
+ *
+ * This is the minimal perf_evsel__name so that we can
+ * reconstruct event names taking into account event modifiers.
+ *
+ * The old event_name uses it now for raw anr hw events, so that
+ * we don't drag all the parsing stuff into the python binding.
+ *
+ * On the next devel cycle the rest of the event naming will be
+ * brought here.
+ */
+ return 0;
+ }
+
+ return ret;
+}
+
void perf_evsel__config(struct perf_evsel *evsel, struct perf_record_opts *opts,
struct perf_evsel *first)
{
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 3d6b3e4..4ba8b56 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -83,6 +83,9 @@ void perf_evsel__config(struct perf_evsel *evsel,
struct perf_record_opts *opts,
struct perf_evsel *first);

+const char* __perf_evsel__hw_name(u64 config);
+int perf_evsel__name(struct perf_evsel *evsel, char *bf, size_t size);
+
int perf_evsel__alloc_fd(struct perf_evsel *evsel, int ncpus, int nthreads);
int perf_evsel__alloc_id(struct perf_evsel *evsel, int ncpus, int nthreads);
int perf_evsel__alloc_counts(struct perf_evsel *evsel, int ncpus);
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index fac7d59..05dbc8b 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -62,19 +62,6 @@ static struct event_symbol event_symbols[] = {
#define PERF_EVENT_TYPE(config) __PERF_EVENT_FIELD(config, TYPE)
#define PERF_EVENT_ID(config) __PERF_EVENT_FIELD(config, EVENT)

-static const char *hw_event_names[PERF_COUNT_HW_MAX] = {
- "cycles",
- "instructions",
- "cache-references",
- "cache-misses",
- "branches",
- "branch-misses",
- "bus-cycles",
- "stalled-cycles-frontend",
- "stalled-cycles-backend",
- "ref-cycles",
-};
-
static const char *sw_event_names[PERF_COUNT_SW_MAX] = {
"cpu-clock",
"task-clock",
@@ -300,6 +287,16 @@ const char *event_name(struct perf_evsel *evsel)
u64 config = evsel->attr.config;
int type = evsel->attr.type;

+ if (type == PERF_TYPE_RAW || type == PERF_TYPE_HARDWARE) {
+ /*
+ * XXX minimal fix, see comment on perf_evsen__name, this static buffer
+ * will go away together with event_name in the next devel cycle.
+ */
+ static char bf[128];
+ perf_evsel__name(evsel, bf, sizeof(bf));
+ return bf;
+ }
+
if (evsel->name)
return evsel->name;

@@ -317,9 +314,7 @@ const char *__event_name(int type, u64 config)

switch (type) {
case PERF_TYPE_HARDWARE:
- if (config < PERF_COUNT_HW_MAX && hw_event_names[config])
- return hw_event_names[config];
- return "unknown-hardware";
+ return __perf_evsel__hw_name(config);

case PERF_TYPE_HW_CACHE: {
u8 cache_type, cache_op, cache_result;
--
1.7.9.2.358.g22243