2010-11-19 21:30:50

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: [GIT PULL 0/3] perf tools improvements

From: Arnaldo Carvalho de Melo <[email protected]>

Hi Ingo,

Please pull from:

git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 perf/core

Regards,

- Arnaldo

Arnaldo Carvalho de Melo (1):
perf tools: Change my maintainer address

Robert Morell (1):
perf tools: Remove hardcoded include paths for elfutils

Stephane Eranian (1):
perf stat: Add no-aggregation mode to -a

CREDITS | 2 -
MAINTAINERS | 2 +-
tools/perf/Documentation/perf-stat.txt | 5 +
tools/perf/Makefile | 4 +-
tools/perf/builtin-stat.c | 169 +++++++++++++++++++++++++++-----
tools/perf/feature-tests.mak | 4 +-
tools/perf/util/probe-finder.h | 6 +-
7 files changed, 157 insertions(+), 35 deletions(-)


2010-11-19 21:29:56

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: [PATCH 3/3] perf tools: Change my maintainer address

From: Arnaldo Carvalho de Melo <[email protected]>

Also remove old snail mail address from CREDITS, moved years ago.

LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
CREDITS | 2 --
MAINTAINERS | 2 +-
2 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/CREDITS b/CREDITS
index 41d8e63..494b6e4 100644
--- a/CREDITS
+++ b/CREDITS
@@ -2365,8 +2365,6 @@ E: [email protected]
W: http://oops.ghostprotocols.net:81/blog/
P: 1024D/9224DF01 D5DF E3BB E3C8 BCBB F8AD 841A B6AB 4681 9224 DF01
D: IPX, LLC, DCCP, cyc2x, wl3501_cs, net/ hacks
-S: R. Brasílio Itiberê, 4270/1010 - Água Verde
-S: 80240-060 - Curitiba - Paraná
S: Brazil

N: Karsten Merker
diff --git a/MAINTAINERS b/MAINTAINERS
index 8e6548d..f94cd37 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4602,7 +4602,7 @@ PERFORMANCE EVENTS SUBSYSTEM
M: Peter Zijlstra <[email protected]>
M: Paul Mackerras <[email protected]>
M: Ingo Molnar <[email protected]>
-M: Arnaldo Carvalho de Melo <[email protected]>
+M: Arnaldo Carvalho de Melo <[email protected]>
S: Supported
F: kernel/perf_event*.c
F: include/linux/perf_event.h
--
1.6.2.5

2010-11-19 21:29:58

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: [PATCH 2/4] perf scripting: Shut up 'perf record' final status

From: Arnaldo Carvalho de Melo <[email protected]>

We want just the script output, not internal details about the record phase.

Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/Documentation/perf-record.txt | 4 ++++
tools/perf/builtin-record.c | 4 ++++
tools/perf/builtin-trace.c | 9 +++++----
tools/perf/util/debug.c | 4 ++--
tools/perf/util/debug.h | 2 +-
5 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 3ee27dc..a91f9f9 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -83,6 +83,10 @@ OPTIONS
--call-graph::
Do call-graph (stack chain/backtrace) recording.

+-q::
+--quiet::
+ Don't print any message, useful for scripting.
+
-v::
--verbose::
Be more verbose (show counter open errors, etc).
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index b530bee..4e75583 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -761,6 +761,9 @@ static int __cmd_record(int argc, const char **argv)
}
}

+ if (quiet)
+ return 0;
+
fprintf(stderr, "[ perf record: Woken up %ld times to write data ]\n", waking);

/*
@@ -820,6 +823,7 @@ static const struct option options[] = {
"do call-graph (stack chain/backtrace) recording"),
OPT_INCR('v', "verbose", &verbose,
"be more verbose (show counter open errors, etc)"),
+ OPT_BOOLEAN('q', "quiet", &quiet, "don't print any message"),
OPT_BOOLEAN('s', "stat", &inherit_stat,
"per thread counts"),
OPT_BOOLEAN('d', "data", &sample_address,
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index deda1a9..2f8df45 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -625,12 +625,13 @@ int cmd_trace(int argc, const char **argv, const char *prefix __used)
dup2(live_pipe[1], 1);
close(live_pipe[0]);

- __argv = malloc(5 * sizeof(const char *));
+ __argv = malloc(6 * sizeof(const char *));
__argv[0] = "/bin/sh";
__argv[1] = record_script_path;
- __argv[2] = "-o";
- __argv[3] = "-";
- __argv[4] = NULL;
+ __argv[2] = "-q";
+ __argv[3] = "-o";
+ __argv[4] = "-";
+ __argv[5] = NULL;

execvp("/bin/sh", (char **)__argv);
exit(-1);
diff --git a/tools/perf/util/debug.c b/tools/perf/util/debug.c
index f9c7e3a..c8d81b0 100644
--- a/tools/perf/util/debug.c
+++ b/tools/perf/util/debug.c
@@ -12,8 +12,8 @@
#include "debug.h"
#include "util.h"

-int verbose = 0;
-bool dump_trace = false;
+int verbose;
+bool dump_trace = false, quiet = false;

int eprintf(int level, const char *fmt, ...)
{
diff --git a/tools/perf/util/debug.h b/tools/perf/util/debug.h
index 7a17ee0..7b51408 100644
--- a/tools/perf/util/debug.h
+++ b/tools/perf/util/debug.h
@@ -6,7 +6,7 @@
#include "event.h"

extern int verbose;
-extern bool dump_trace;
+extern bool quiet, dump_trace;

int dump_printf(const char *fmt, ...) __attribute__((format(printf, 1, 2)));
void trace_event(event_t *event);
--
1.6.2.5

2010-11-19 21:30:03

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: [PATCH 3/4] perf python scripting: Fixup cut'n'paste error in sctop script

From: Arnaldo Carvalho de Melo <[email protected]>

Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/scripts/python/sctop.py | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/perf/scripts/python/sctop.py b/tools/perf/scripts/python/sctop.py
index 547cbe9..7a6ec2c 100644
--- a/tools/perf/scripts/python/sctop.py
+++ b/tools/perf/scripts/python/sctop.py
@@ -17,7 +17,7 @@ from perf_trace_context import *
from Core import *
from Util import *

-usage = "perf trace -s syscall-counts.py [comm] [interval]\n";
+usage = "perf trace -s sctop.py [comm] [interval]\n";

for_comm = None
default_interval = 3
--
1.6.2.5

2010-11-19 21:30:17

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: [PATCH 1/4] perf record: Remove newline character from perror() argument

From: Matt Fleming <[email protected]>

If we include a newline character in the string argument to perror()
then the output will be split across two lines like so,

Unable to read perf file descriptor
: No space left on device

Deleting the newline character prints a much more readable error,

Unable to read perf file descriptor: No space left on device

Cc: Peter Zijlstra <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <89e77b54659bc3798b23a5596c2debb7f6f4cf27.1283010281.git.matt@console-pimps.org>
Signed-off-by: Matt Fleming <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/builtin-record.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index ff77b80..b530bee 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -353,7 +353,7 @@ try_again:
}

if (read(fd[nr_cpu][counter][thread_index], &read_data, sizeof(read_data)) == -1) {
- perror("Unable to read perf file descriptor\n");
+ perror("Unable to read perf file descriptor");
exit(-1);
}

@@ -626,7 +626,7 @@ static int __cmd_record(int argc, const char **argv)

nr_cpus = read_cpu_map(cpu_list);
if (nr_cpus < 1) {
- perror("failed to collect number of CPUs\n");
+ perror("failed to collect number of CPUs");
return -1;
}

--
1.6.2.5

2010-11-19 21:30:27

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: [PATCH 1/3] perf stat: Add no-aggregation mode to -a

From: Stephane Eranian <[email protected]>

This patch adds a new -A option to perf stat. If specified then perf stat does
not aggregate counts across all monitored CPUs in system-wide mode, i.e., when
using -a. This option is not supported in per-thread mode.

Being able to get a per-cpu breakdown is useful to detect imbalances between
CPUs when running a uniform workload than spans all monitored CPUs.

The second version corrects the missing cpumap[] support, so that it works when
the -C option is used.

The third version fixes a missing cpumap[] in print_counter() and removes a
stray patch in builtin-trace.c.

Examples on a 4-way system:

# perf stat -a -e cycles,instructions -- sleep 1
Performance counter stats for 'sleep 1':
9592808135 cycles
3490380006 instructions # 0.364 IPC
1.001584632 seconds time elapsed

# perf stat -a -A -e cycles,instructions -- sleep 1
Performance counter stats for 'sleep 1':
CPU0 2398163767 cycles
CPU1 2398180817 cycles
CPU2 2398217115 cycles
CPU3 2398247483 cycles
CPU0 872282046 instructions # 0.364 IPC
CPU1 873481776 instructions # 0.364 IPC
CPU2 872638127 instructions # 0.364 IPC
CPU3 872437789 instructions # 0.364 IPC
1.001556052 seconds time elapsed

Cc: David S. Miller <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robert Richter <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Stephane Eranian <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/Documentation/perf-stat.txt | 5 +
tools/perf/builtin-stat.c | 169 +++++++++++++++++++++++++++-----
2 files changed, 149 insertions(+), 25 deletions(-)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 4b3a2d4..c405bca 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -53,6 +53,11 @@ comma-sperated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2
In per-thread mode, this option is ignored. The -a option is still necessary
to activate system-wide monitoring. Default is to count on all CPUs.

+-A::
+--no-aggr::
+Do not aggregate counts across all monitored CPUs in system-wide mode (-a).
+This option is only valid in system-wide mode.
+
EXAMPLES
--------

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index a6b4d44..b3e568f 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -75,6 +75,7 @@ static int run_idx = 0;
static int run_count = 1;
static bool no_inherit = false;
static bool scale = true;
+static bool no_aggr = false;
static pid_t target_pid = -1;
static pid_t target_tid = -1;
static pid_t *all_tids = NULL;
@@ -89,6 +90,12 @@ static int *fd[MAX_NR_CPUS][MAX_COUNTERS];

static int event_scaled[MAX_COUNTERS];

+static struct {
+ u64 val;
+ u64 ena;
+ u64 run;
+} cpu_counts[MAX_NR_CPUS][MAX_COUNTERS];
+
static volatile int done = 0;

struct stats
@@ -136,10 +143,10 @@ static double stddev_stats(struct stats *stats)
}

struct stats event_res_stats[MAX_COUNTERS][3];
-struct stats runtime_nsecs_stats;
+struct stats runtime_nsecs_stats[MAX_NR_CPUS];
+struct stats runtime_cycles_stats[MAX_NR_CPUS];
+struct stats runtime_branches_stats[MAX_NR_CPUS];
struct stats walltime_nsecs_stats;
-struct stats runtime_cycles_stats;
-struct stats runtime_branches_stats;

#define MATCH_EVENT(t, c, counter) \
(attrs[counter].type == PERF_TYPE_##t && \
@@ -205,8 +212,9 @@ static inline int nsec_counter(int counter)

/*
* Read out the results of a single counter:
+ * aggregate counts across CPUs in system-wide mode
*/
-static void read_counter(int counter)
+static void read_counter_aggr(int counter)
{
u64 count[3], single_count[3];
int cpu;
@@ -264,11 +272,58 @@ static void read_counter(int counter)
* Save the full runtime - to allow normalization during printout:
*/
if (MATCH_EVENT(SOFTWARE, SW_TASK_CLOCK, counter))
- update_stats(&runtime_nsecs_stats, count[0]);
+ update_stats(&runtime_nsecs_stats[0], count[0]);
if (MATCH_EVENT(HARDWARE, HW_CPU_CYCLES, counter))
- update_stats(&runtime_cycles_stats, count[0]);
+ update_stats(&runtime_cycles_stats[0], count[0]);
if (MATCH_EVENT(HARDWARE, HW_BRANCH_INSTRUCTIONS, counter))
- update_stats(&runtime_branches_stats, count[0]);
+ update_stats(&runtime_branches_stats[0], count[0]);
+}
+
+/*
+ * Read out the results of a single counter:
+ * do not aggregate counts across CPUs in system-wide mode
+ */
+static void read_counter(int counter)
+{
+ u64 count[3];
+ int cpu;
+ size_t res, nv;
+
+ count[0] = count[1] = count[2] = 0;
+
+ nv = scale ? 3 : 1;
+
+ for (cpu = 0; cpu < nr_cpus; cpu++) {
+
+ if (fd[cpu][counter][0] < 0)
+ continue;
+
+ res = read(fd[cpu][counter][0], count, nv * sizeof(u64));
+
+ assert(res == nv * sizeof(u64));
+
+ close(fd[cpu][counter][0]);
+ fd[cpu][counter][0] = -1;
+
+ if (scale) {
+ if (count[2] == 0) {
+ count[0] = 0;
+ } else if (count[2] < count[1]) {
+ count[0] = (unsigned long long)
+ ((double)count[0] * count[1] / count[2] + 0.5);
+ }
+ }
+ cpu_counts[cpu][counter].val = count[0]; /* scaled count */
+ cpu_counts[cpu][counter].ena = count[1];
+ cpu_counts[cpu][counter].run = count[2];
+
+ if (MATCH_EVENT(SOFTWARE, SW_TASK_CLOCK, counter))
+ update_stats(&runtime_nsecs_stats[cpu], count[0]);
+ if (MATCH_EVENT(HARDWARE, HW_CPU_CYCLES, counter))
+ update_stats(&runtime_cycles_stats[cpu], count[0]);
+ if (MATCH_EVENT(HARDWARE, HW_BRANCH_INSTRUCTIONS, counter))
+ update_stats(&runtime_branches_stats[cpu], count[0]);
+ }
}

static int run_perf_stat(int argc __used, const char **argv)
@@ -362,9 +417,13 @@ static int run_perf_stat(int argc __used, const char **argv)

update_stats(&walltime_nsecs_stats, t1 - t0);

- for (counter = 0; counter < nr_counters; counter++)
- read_counter(counter);
-
+ if (no_aggr) {
+ for (counter = 0; counter < nr_counters; counter++)
+ read_counter(counter);
+ } else {
+ for (counter = 0; counter < nr_counters; counter++)
+ read_counter_aggr(counter);
+ }
return WEXITSTATUS(status);
}

@@ -377,11 +436,15 @@ static void print_noise(int counter, double avg)
100 * stddev_stats(&event_res_stats[counter][0]) / avg);
}

-static void nsec_printout(int counter, double avg)
+static void nsec_printout(int cpu, int counter, double avg)
{
double msecs = avg / 1e6;

- fprintf(stderr, " %18.6f %-24s", msecs, event_name(counter));
+ if (no_aggr)
+ fprintf(stderr, "CPU%-4d %18.6f %-24s",
+ cpumap[cpu], msecs, event_name(counter));
+ else
+ fprintf(stderr, " %18.6f %-24s", msecs, event_name(counter));

if (MATCH_EVENT(SOFTWARE, SW_TASK_CLOCK, counter)) {
fprintf(stderr, " # %10.3f CPUs ",
@@ -389,33 +452,41 @@ static void nsec_printout(int counter, double avg)
}
}

-static void abs_printout(int counter, double avg)
+static void abs_printout(int cpu, int counter, double avg)
{
double total, ratio = 0.0;
+ char cpustr[16] = { '\0', };
+
+ if (no_aggr)
+ sprintf(cpustr, "CPU%-4d", cpumap[cpu]);
+ else
+ cpu = 0;

if (big_num)
- fprintf(stderr, " %'18.0f %-24s", avg, event_name(counter));
+ fprintf(stderr, "%s %'18.0f %-24s",
+ cpustr, avg, event_name(counter));
else
- fprintf(stderr, " %18.0f %-24s", avg, event_name(counter));
+ fprintf(stderr, "%s %18.0f %-24s",
+ cpustr, avg, event_name(counter));

if (MATCH_EVENT(HARDWARE, HW_INSTRUCTIONS, counter)) {
- total = avg_stats(&runtime_cycles_stats);
+ total = avg_stats(&runtime_cycles_stats[cpu]);

if (total)
ratio = avg / total;

fprintf(stderr, " # %10.3f IPC ", ratio);
} else if (MATCH_EVENT(HARDWARE, HW_BRANCH_MISSES, counter) &&
- runtime_branches_stats.n != 0) {
- total = avg_stats(&runtime_branches_stats);
+ runtime_branches_stats[cpu].n != 0) {
+ total = avg_stats(&runtime_branches_stats[cpu]);

if (total)
ratio = avg * 100 / total;

fprintf(stderr, " # %10.3f %% ", ratio);

- } else if (runtime_nsecs_stats.n != 0) {
- total = avg_stats(&runtime_nsecs_stats);
+ } else if (runtime_nsecs_stats[cpu].n != 0) {
+ total = avg_stats(&runtime_nsecs_stats[cpu]);

if (total)
ratio = 1000.0 * avg / total;
@@ -426,8 +497,9 @@ static void abs_printout(int counter, double avg)

/*
* Print out the results of a single counter:
+ * aggregated counts in system-wide mode
*/
-static void print_counter(int counter)
+static void print_counter_aggr(int counter)
{
double avg = avg_stats(&event_res_stats[counter][0]);
int scaled = event_scaled[counter];
@@ -439,9 +511,9 @@ static void print_counter(int counter)
}

if (nsec_counter(counter))
- nsec_printout(counter, avg);
+ nsec_printout(-1, counter, avg);
else
- abs_printout(counter, avg);
+ abs_printout(-1, counter, avg);

print_noise(counter, avg);

@@ -458,6 +530,42 @@ static void print_counter(int counter)
fprintf(stderr, "\n");
}

+/*
+ * Print out the results of a single counter:
+ * does not use aggregated count in system-wide
+ */
+static void print_counter(int counter)
+{
+ u64 ena, run, val;
+ int cpu;
+
+ for (cpu = 0; cpu < nr_cpus; cpu++) {
+ val = cpu_counts[cpu][counter].val;
+ ena = cpu_counts[cpu][counter].ena;
+ run = cpu_counts[cpu][counter].run;
+ if (run == 0 || ena == 0) {
+ fprintf(stderr, "CPU%-4d %18s %-24s", cpumap[cpu],
+ "<not counted>", event_name(counter));
+
+ fprintf(stderr, "\n");
+ continue;
+ }
+
+ if (nsec_counter(counter))
+ nsec_printout(cpu, counter, val);
+ else
+ abs_printout(cpu, counter, val);
+
+ print_noise(counter, 1.0);
+
+ if (run != ena) {
+ fprintf(stderr, " (scaled from %.2f%%)",
+ 100.0 * run / ena);
+ }
+ fprintf(stderr, "\n");
+ }
+}
+
static void print_stat(int argc, const char **argv)
{
int i, counter;
@@ -480,8 +588,13 @@ static void print_stat(int argc, const char **argv)
fprintf(stderr, " (%d runs)", run_count);
fprintf(stderr, ":\n\n");

- for (counter = 0; counter < nr_counters; counter++)
- print_counter(counter);
+ if (no_aggr) {
+ for (counter = 0; counter < nr_counters; counter++)
+ print_counter(counter);
+ } else {
+ for (counter = 0; counter < nr_counters; counter++)
+ print_counter_aggr(counter);
+ }

fprintf(stderr, "\n");
fprintf(stderr, " %18.9f seconds time elapsed",
@@ -545,6 +658,8 @@ static const struct option options[] = {
"print large numbers with thousands\' separators"),
OPT_STRING('C', "cpu", &cpu_list, "cpu",
"list of cpus to monitor in system-wide"),
+ OPT_BOOLEAN('A', "no-aggr", &no_aggr,
+ "disable CPU count aggregation"),
OPT_END()
};

@@ -562,6 +677,10 @@ int cmd_stat(int argc, const char **argv, const char *prefix __used)
if (run_count <= 0)
usage_with_options(stat_usage, options);

+ /* no_aggr is for system-wide only */
+ if (no_aggr && !system_wide)
+ usage_with_options(stat_usage, options);
+
/* Set attrs and nr_counters if no event is selected and !null_run */
if (!null_run && !nr_counters) {
memcpy(attrs, default_attrs, sizeof(default_attrs));
--
1.6.2.5

2010-11-19 21:30:13

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: [PATCH 4/4] perf python scripting: Add futex-contention script

From: Arnaldo Carvalho de Melo <[email protected]>

The equivalent to this SystemTAP script:

http://sourceware.org/systemtap/wiki/WSFutexContention

[root@doppio ~]# perf trace futex-contention
Press control+C to stop and show the summary
^Cnpviewer.bin[15242] lock 7f0a8be19104 contended 29 times, 72806 avg ns
npviewer.bin[15242] lock 7f0a8be19130 contended 2 times, 1355 avg ns
synergyc[17245] lock f127f4 contended 1 times, 1830569 avg ns
firefox[15116] lock 7f2b7238af0c contended 168 times, 1230390 avg ns
synergyc[17245] lock f2fc20 contended 1 times, 33149 avg ns
npviewer.bin[15255] lock 7f0a8be19074 contended 155 times, 73047 avg ns
npviewer.bin[15255] lock 7f0a8be190a0 contended 127 times, 7088 avg ns
synergyc[17247] lock f12854 contended 1 times, 46741 avg ns
synergyc[17245] lock f12610 contended 1 times, 7358 avg ns
[root@doppio ~]#

Cc: Frederic Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
.../python/Perf-Trace-Util/lib/Perf/Trace/Util.py | 18 +++++++
.../scripts/python/bin/futex-contention-record | 2 +
.../scripts/python/bin/futex-contention-report | 4 ++
tools/perf/scripts/python/futex-contention.py | 50 ++++++++++++++++++++
4 files changed, 74 insertions(+), 0 deletions(-)
create mode 100644 tools/perf/scripts/python/bin/futex-contention-record
create mode 100644 tools/perf/scripts/python/bin/futex-contention-report
create mode 100644 tools/perf/scripts/python/futex-contention.py

diff --git a/tools/perf/scripts/python/Perf-Trace-Util/lib/Perf/Trace/Util.py b/tools/perf/scripts/python/Perf-Trace-Util/lib/Perf/Trace/Util.py
index 99ff1b7..13cc02b 100644
--- a/tools/perf/scripts/python/Perf-Trace-Util/lib/Perf/Trace/Util.py
+++ b/tools/perf/scripts/python/Perf-Trace-Util/lib/Perf/Trace/Util.py
@@ -8,6 +8,12 @@

import errno, os

+FUTEX_WAIT = 0
+FUTEX_WAKE = 1
+FUTEX_PRIVATE_FLAG = 128
+FUTEX_CLOCK_REALTIME = 256
+FUTEX_CMD_MASK = ~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME)
+
NSECS_PER_SEC = 1000000000

def avg(total, n):
@@ -26,6 +32,18 @@ def nsecs_str(nsecs):
str = "%5u.%09u" % (nsecs_secs(nsecs), nsecs_nsecs(nsecs)),
return str

+def add_stats(dict, key, value):
+ if not dict.has_key(key):
+ dict[key] = (value, value, value, 1)
+ else:
+ min, max, avg, count = dict[key]
+ if value < min:
+ min = value
+ if value > max:
+ max = value
+ avg = (avg + value) / 2
+ dict[key] = (min, max, avg, count + 1)
+
def clear_term():
print("\x1b[H\x1b[2J")

diff --git a/tools/perf/scripts/python/bin/futex-contention-record b/tools/perf/scripts/python/bin/futex-contention-record
new file mode 100644
index 0000000..5ecbb43
--- /dev/null
+++ b/tools/perf/scripts/python/bin/futex-contention-record
@@ -0,0 +1,2 @@
+#!/bin/bash
+perf record -a -e syscalls:sys_enter_futex -e syscalls:sys_exit_futex $@
diff --git a/tools/perf/scripts/python/bin/futex-contention-report b/tools/perf/scripts/python/bin/futex-contention-report
new file mode 100644
index 0000000..c826813
--- /dev/null
+++ b/tools/perf/scripts/python/bin/futex-contention-report
@@ -0,0 +1,4 @@
+#!/bin/bash
+# description: futext contention measurement
+
+perf trace $@ -s "$PERF_EXEC_PATH"/scripts/python/futex-contention.py
diff --git a/tools/perf/scripts/python/futex-contention.py b/tools/perf/scripts/python/futex-contention.py
new file mode 100644
index 0000000..11e70a3
--- /dev/null
+++ b/tools/perf/scripts/python/futex-contention.py
@@ -0,0 +1,50 @@
+# futex contention
+# (c) 2010, Arnaldo Carvalho de Melo <[email protected]>
+# Licensed under the terms of the GNU GPL License version 2
+#
+# Translation of:
+#
+# http://sourceware.org/systemtap/wiki/WSFutexContention
+#
+# to perf python scripting.
+#
+# Measures futex contention
+
+import os, sys
+sys.path.append(os.environ['PERF_EXEC_PATH'] + '/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
+from Util import *
+
+process_names = {}
+thread_thislock = {}
+thread_blocktime = {}
+
+lock_waits = {} # long-lived stats on (tid,lock) blockage elapsed time
+process_names = {} # long-lived pid-to-execname mapping
+
+def syscalls__sys_enter_futex(event, ctxt, cpu, s, ns, tid, comm,
+ nr, uaddr, op, val, utime, uaddr2, val3):
+ cmd = op & FUTEX_CMD_MASK
+ if cmd != FUTEX_WAIT:
+ return # we don't care about originators of WAKE events
+
+ process_names[tid] = comm
+ thread_thislock[tid] = uaddr
+ thread_blocktime[tid] = nsecs(s, ns)
+
+def syscalls__sys_exit_futex(event, ctxt, cpu, s, ns, tid, comm,
+ nr, ret):
+ if thread_blocktime.has_key(tid):
+ elapsed = nsecs(s, ns) - thread_blocktime[tid]
+ add_stats(lock_waits, (tid, thread_thislock[tid]), elapsed)
+ del thread_blocktime[tid]
+ del thread_thislock[tid]
+
+def trace_begin():
+ print "Press control+C to stop and show the summary"
+
+def trace_end():
+ for (tid, lock) in lock_waits:
+ min, max, avg, count = lock_waits[tid, lock]
+ print "%s[%d] lock %x contended %d times, %d avg ns" % \
+ (process_names[tid], tid, lock, count, avg)
+
--
1.6.2.5

2010-11-19 21:30:11

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: [PATCH 2/3] perf tools: Remove hardcoded include paths for elfutils

From: Robert Morell <[email protected]>

This change removes the use of hardcoded absolute "/usr/include/elfutils" paths
from the perf build. The problem with hardcoded paths is that it prevents them
from being overridden by $prefix or by -I in CFLAGS (e.g., for cross-compiling
purposes).

Instead, just include the "elfutils/" subdirectory as a relative path when
files are needed from that directory.

Tested by building perf:
- Cross-compiled for ARM on x86_64
- Built natively on x86_64
- Built on x86_64 with /usr/include/elfutils moved to another location
and manually included in CFLAGS

Acked-by: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Robert Morell <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/Makefile | 4 ++--
tools/perf/feature-tests.mak | 4 ++--
tools/perf/util/probe-finder.h | 6 +++---
3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index d1db0f6..74b684d 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -507,7 +507,7 @@ PERFLIBS = $(LIB_FILE)
-include config.mak

ifndef NO_DWARF
-FLAGS_DWARF=$(ALL_CFLAGS) -I/usr/include/elfutils -ldw -lelf $(ALL_LDFLAGS) $(EXTLIBS)
+FLAGS_DWARF=$(ALL_CFLAGS) -ldw -lelf $(ALL_LDFLAGS) $(EXTLIBS)
ifneq ($(call try-cc,$(SOURCE_DWARF),$(FLAGS_DWARF)),y)
msg := $(warning No libdw.h found or old libdw.h found or elfutils is older than 0.138, disables dwarf support. Please install new elfutils-devel/libdw-dev);
NO_DWARF := 1
@@ -554,7 +554,7 @@ ifndef NO_DWARF
ifeq ($(origin PERF_HAVE_DWARF_REGS), undefined)
msg := $(warning DWARF register mappings have not been defined for architecture $(ARCH), DWARF support disabled);
else
- BASIC_CFLAGS += -I/usr/include/elfutils -DDWARF_SUPPORT
+ BASIC_CFLAGS += -DDWARF_SUPPORT
EXTLIBS += -lelf -ldw
LIB_OBJS += $(OUTPUT)util/probe-finder.o
endif # PERF_HAVE_DWARF_REGS
diff --git a/tools/perf/feature-tests.mak b/tools/perf/feature-tests.mak
index b253db6..b041ca6 100644
--- a/tools/perf/feature-tests.mak
+++ b/tools/perf/feature-tests.mak
@@ -9,8 +9,8 @@ endef
ifndef NO_DWARF
define SOURCE_DWARF
#include <dwarf.h>
-#include <libdw.h>
-#include <version.h>
+#include <elfutils/libdw.h>
+#include <elfutils/version.h>
#ifndef _ELFUTILS_PREREQ
#error
#endif
diff --git a/tools/perf/util/probe-finder.h b/tools/perf/util/probe-finder.h
index bba69d4..beaefc3 100644
--- a/tools/perf/util/probe-finder.h
+++ b/tools/perf/util/probe-finder.h
@@ -34,9 +34,9 @@ extern int find_available_vars_at(int fd, struct perf_probe_event *pev,
bool externs);

#include <dwarf.h>
-#include <libdw.h>
-#include <libdwfl.h>
-#include <version.h>
+#include <elfutils/libdw.h>
+#include <elfutils/libdwfl.h>
+#include <elfutils/version.h>

struct probe_finder {
struct perf_probe_event *pev; /* Target probe event */
--
1.6.2.5

2010-11-21 13:21:59

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL 0/3] perf tools improvements


* Arnaldo Carvalho de Melo <[email protected]> wrote:

> From: Arnaldo Carvalho de Melo <[email protected]>
>
> Hi Ingo,
>
> Please pull from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 perf/core
>
> Regards,
>
> - Arnaldo
>
> Arnaldo Carvalho de Melo (1):
> perf tools: Change my maintainer address
>
> Robert Morell (1):
> perf tools: Remove hardcoded include paths for elfutils
>
> Stephane Eranian (1):
> perf stat: Add no-aggregation mode to -a
>
> CREDITS | 2 -
> MAINTAINERS | 2 +-
> tools/perf/Documentation/perf-stat.txt | 5 +
> tools/perf/Makefile | 4 +-
> tools/perf/builtin-stat.c | 169 +++++++++++++++++++++++++++-----
> tools/perf/feature-tests.mak | 4 +-
> tools/perf/util/probe-finder.h | 6 +-
> 7 files changed, 157 insertions(+), 35 deletions(-)

Pulled, thanks a lot Arnaldo!

Ingo