Subject: Re: [PATCH 2/2 -tip] perf_counter: parse-events.c introduce alias member in event_symbol
From: Jaswinder Singh Rajput
To: Ingo Molnar
Cc: Thomas Gleixner, Peter Zijlstra, LKML
In-Reply-To: <1245700551.6167.5.camel@localhost.localdomain>
References: <1245669194.17153.6.camel@localhost.localdomain>
	<1245669268.17153.8.camel@localhost.localdomain>
	<20090622113256.GA22479@elte.hu>
	<1245675657.7537.3.camel@localhost.localdomain>
	<20090622141009.GB6486@elte.hu>
	<1245700551.6167.5.camel@localhost.localdomain>
Date: Tue, 23 Jun 2009 01:37:59 +0530
Message-Id: <1245701279.6167.7.camel@localhost.localdomain>

On Tue, 2009-06-23 at 01:25 +0530, Jaswinder Singh Rajput wrote:
> On Mon, 2009-06-22 at 16:10 +0200, Ingo Molnar wrote:
> > yeah, something like that. I'd suggest to print out the actual
> > measured events:
> >
> >   cache-references 10123 events
> >   cache-misses 15 events
> >
> > and if something does not appear to be ticking then do something
> > like:
> >
> >   cache-misses
> >
> > I.e. 'perf test' could be a quick way both to users and to
> > developers to see all possible hw and sw events.
> >
> > Perhaps builtin-test.c should also do specific testcases for certain
> > counters - say intentionally migrate to a CPU and back to see the
> > CPU-migration count.
> >
> > Also, you seem to have copied builtin-stat.c, right? Try to
> > librarize as much of the functionality (into util/*) to make the
> > resulting linecount increase as small as possible.
> >
>
> perf test also need some command to execute otherwise it will also show
> long list of
>
> I think better I should support all events in perf stat so user can get
> better information from it and we can all add some other testing option
> to it.
>
> Anyway currently it looks like this :
>
> [RFC][PATCH] perf_counter tools: introduce perf test to test event for ticks

This version fixes some style issues:

[RFC][PATCH] perf_counter tools: introduce perf test to test event for ticks

Introduce 'perf test' to test performance counter events. Its output on an
AMD box:

 ./perf test -a -- ls -lR > /dev/null

 Performance counter stats for 'ls' -lR:

    cycles                                       1226819954
    instructions                                  283680441
    cache-references                              144893559
    cache-misses                                    3268438
    branches                                       37488241
    branch-misses                                   2464027
    bus-cycles
    cpu-clock-msecs                             17175506056
    task-clock-msecs                            17175086665
    page-faults                                         488
    minor-faults                                        488
    major-faults
    context-switches                                   7956
    CPU-migrations                                        7
    L1-data-Cache-Load-Referencees                398303881
    L1-data-Cache-Load-Misses                       3552374
    L1-data-Cache-Store-Referencees                  270178
    L1-data-Cache-Store-Misses
    L1-data-Cache-Prefetch-Referencees               611622
    L1-data-Cache-Prefetch-Misses                    399730
    L1-instruction-Cache-Load-Referencees         124696447
    L1-instruction-Cache-Load-Misses                2912802
    L1-instruction-Cache-Store-Referencees
    L1-instruction-Cache-Store-Misses
    L1-instruction-Cache-Prefetch-Referencees        156576
    L1-instruction-Cache-Prefetch-Misses
    L2-Cache-Load-Referencees                       4312353
    L2-Cache-Load-Misses                             470382
    L2-Cache-Store-Referencees                      4392945
    L2-Cache-Store-Misses
    L2-Cache-Prefetch-Referencees
    L2-Cache-Prefetch-Misses
    Data-TLB-Cache-Load-Referencees               127076487
    Data-TLB-Cache-Load-Misses                      1930048
    Data-TLB-Cache-Store-Referencees
    Data-TLB-Cache-Store-Misses
    Data-TLB-Cache-Prefetch-Referencees
    Data-TLB-Cache-Prefetch-Misses
    Instruction-TLB-Cache-Load-Referencees        132768077
    Instruction-TLB-Cache-Load-Misses                  6406
    Instruction-TLB-Cache-Store-Referencees
    Instruction-TLB-Cache-Store-Misses
    Instruction-TLB-Cache-Prefetch-Referencees
    Instruction-TLB-Cache-Prefetch-Misses
    Branch-Cache-Load-Referencees                  58030210
    Branch-Cache-Load-Misses                        3257804
    Branch-Cache-Store-Referencees
    Branch-Cache-Store-Misses
    Branch-Cache-Prefetch-Referencees
    Branch-Cache-Prefetch-Misses

    8.681671511 seconds time elapsed.

Signed-off-by: Jaswinder Singh Rajput
---
 tools/perf/Documentation/perf-test.txt |   44 ++++
 tools/perf/Makefile                    |    1 +
 tools/perf/builtin-test.c              |  436 ++++++++++++++++++++++++++++++++
 tools/perf/builtin.h                   |    1 +
 tools/perf/command-list.txt            |    1 +
 tools/perf/perf.c                      |    1 +
 6 files changed, 484 insertions(+), 0 deletions(-)
 create mode 100644 tools/perf/Documentation/perf-test.txt
 create mode 100644 tools/perf/builtin-test.c

diff --git a/tools/perf/Documentation/perf-test.txt b/tools/perf/Documentation/perf-test.txt
new file mode 100644
index 0000000..6233769
--- /dev/null
+++ b/tools/perf/Documentation/perf-test.txt
@@ -0,0 +1,44 @@
+perf-test(1)
+============
+
+NAME
+----
+perf-test - Run a command and gather performance counter event count if any
+
+SYNOPSIS
+--------
+[verse]
+'perf test' [-e | --event=EVENT] [-a]
+'perf test' [-e | --event=EVENT] [-a] -- []
+
+DESCRIPTION
+-----------
+This command runs a command and gathers performance counter event count
+from it.
+
+
+OPTIONS
+-------
+...::
+	Any command you can specify in a shell.
+
+
+-e::
+--event=::
+	Select the PMU event. Selection can be a symbolic event name
+	(use 'perf list' to list all events) or a raw PMU
+	event (eventsel+umask) in the form of rNNN where NNN is a
+	hexadecimal event descriptor.
+
+-a::
+	system-wide collection
+
+EXAMPLES
+--------
+
+$ perf test -- make -j
+
+
+SEE ALSO
+--------
+linkperf:perf-stat[1], perf-top[1], linkperf:perf-list[1]
diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 36d7eef..f5ac83f 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -335,6 +335,7 @@ BUILTIN_OBJS += builtin-list.o
 BUILTIN_OBJS += builtin-record.o
 BUILTIN_OBJS += builtin-report.o
 BUILTIN_OBJS += builtin-stat.o
+BUILTIN_OBJS += builtin-test.o
 BUILTIN_OBJS += builtin-top.o
 
 PERFLIBS = $(LIB_FILE)
diff --git a/tools/perf/builtin-test.c b/tools/perf/builtin-test.c
new file mode 100644
index 0000000..3b24b2d
--- /dev/null
+++ b/tools/perf/builtin-test.c
@@ -0,0 +1,436 @@
+/*
+ * builtin-test.c
+ *
+ * Builtin test command: Test performance counter events
+ *
+ * Sample output on AMD box:
+
+   $ perf test -a -- ls -lR > /dev/null
+
+   Performance counter stats for 'ls' -lR:
+
+    cycles                                       1226819954
+    instructions                                  283680441
+    cache-references                              144893559
+    cache-misses                                    3268438
+    branches                                       37488241
+    branch-misses                                   2464027
+    bus-cycles
+    cpu-clock-msecs                             17175506056
+    task-clock-msecs                            17175086665
+    page-faults                                         488
+    minor-faults                                        488
+    major-faults
+    context-switches                                   7956
+    CPU-migrations                                        7
+    L1-data-Cache-Load-Referencees                398303881
+    L1-data-Cache-Load-Misses                       3552374
+    L1-data-Cache-Store-Referencees                  270178
+    L1-data-Cache-Store-Misses
+    L1-data-Cache-Prefetch-Referencees               611622
+    L1-data-Cache-Prefetch-Misses                    399730
+    L1-instruction-Cache-Load-Referencees         124696447
+    L1-instruction-Cache-Load-Misses                2912802
+    L1-instruction-Cache-Store-Referencees
+    L1-instruction-Cache-Store-Misses
+    L1-instruction-Cache-Prefetch-Referencees        156576
+    L1-instruction-Cache-Prefetch-Misses
+    L2-Cache-Load-Referencees                       4312353
+    L2-Cache-Load-Misses                             470382
+    L2-Cache-Store-Referencees                      4392945
+    L2-Cache-Store-Misses
+    L2-Cache-Prefetch-Referencees
+    L2-Cache-Prefetch-Misses
+    Data-TLB-Cache-Load-Referencees               127076487
+    Data-TLB-Cache-Load-Misses                      1930048
+    Data-TLB-Cache-Store-Referencees
+    Data-TLB-Cache-Store-Misses
+    Data-TLB-Cache-Prefetch-Referencees
+    Data-TLB-Cache-Prefetch-Misses
+    Instruction-TLB-Cache-Load-Referencees        132768077
+    Instruction-TLB-Cache-Load-Misses                  6406
+    Instruction-TLB-Cache-Store-Referencees
+    Instruction-TLB-Cache-Store-Misses
+    Instruction-TLB-Cache-Prefetch-Referencees
+    Instruction-TLB-Cache-Prefetch-Misses
+    Branch-Cache-Load-Referencees                  58030210
+    Branch-Cache-Load-Misses                        3257804
+    Branch-Cache-Store-Referencees
+    Branch-Cache-Store-Misses
+    Branch-Cache-Prefetch-Referencees
+    Branch-Cache-Prefetch-Misses
+
+    8.681671511 seconds time elapsed.
+
+ * (based on builtin-stat.c)
+ *
+ * Copyright (C) 2008, Red Hat Inc, Ingo Molnar
+ * Copyright (C) 2009, Jaswinder Singh Rajput
+ *
+ * Released under the GPL v2. (and only v2, not any later version)
+ */
+
+#include "perf.h"
+#include "builtin.h"
+#include "util/util.h"
+#include "util/parse-options.h"
+#include "util/parse-events.h"
+
+#include
+#include
+
+#define CHW(x) .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_##x
+#define CSW(x) .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_##x
+#define CHCACHE(x, y, z)						\
+.type = PERF_TYPE_HW_CACHE,						\
+.config = (PERF_COUNT_HW_CACHE_##x | (PERF_COUNT_HW_CACHE_OP_##y << 8) |\
+	   (PERF_COUNT_HW_CACHE_RESULT_##z << 16))
+
+static struct perf_counter_attr default_attrs[] = {
+/* Generalized Hardware events */
+	{ CHW(CPU_CYCLES) },
+	{ CHW(INSTRUCTIONS) },
+	{ CHW(CACHE_REFERENCES) },
+	{ CHW(CACHE_MISSES) },
+	{ CHW(BRANCH_INSTRUCTIONS) },
+	{ CHW(BRANCH_MISSES) },
+	{ CHW(BUS_CYCLES) },
+
+/* Generalized Software events */
+	{ CSW(CPU_CLOCK) },
+	{ CSW(TASK_CLOCK) },
+	{ CSW(PAGE_FAULTS) },
+	{ CSW(PAGE_FAULTS_MIN) },
+	{ CSW(PAGE_FAULTS_MAJ) },
+	{ CSW(CONTEXT_SWITCHES) },
+	{ CSW(CPU_MIGRATIONS) },
+
+/* Generalized Hardware cache counters events */
+	{ CHCACHE(L1D, READ, ACCESS) },
+	{ CHCACHE(L1D, READ, MISS) },
+	{ CHCACHE(L1D, WRITE, ACCESS) },
+	{ CHCACHE(L1D, WRITE, MISS) },
+	{ CHCACHE(L1D, PREFETCH, ACCESS) },
+	{ CHCACHE(L1D, PREFETCH, MISS) },
+
+	{ CHCACHE(L1I, READ, ACCESS) },
+	{ CHCACHE(L1I, READ, MISS) },
+	{ CHCACHE(L1I, WRITE, ACCESS) },
+	{ CHCACHE(L1I, WRITE, MISS) },
+	{ CHCACHE(L1I, PREFETCH, ACCESS) },
+	{ CHCACHE(L1I, PREFETCH, MISS) },
+
+	{ CHCACHE(LL, READ, ACCESS) },
+	{ CHCACHE(LL, READ, MISS) },
+	{ CHCACHE(LL, WRITE, ACCESS) },
+	{ CHCACHE(LL, WRITE, MISS) },
+	{ CHCACHE(LL, PREFETCH, ACCESS) },
+	{ CHCACHE(LL, PREFETCH, MISS) },
+
+	{ CHCACHE(DTLB, READ, ACCESS) },
+	{ CHCACHE(DTLB, READ, MISS) },
+	{ CHCACHE(DTLB, WRITE, ACCESS) },
+	{ CHCACHE(DTLB, WRITE, MISS) },
+	{ CHCACHE(DTLB, PREFETCH, ACCESS) },
+	{ CHCACHE(DTLB, PREFETCH, MISS) },
+
+	{ CHCACHE(ITLB, READ, ACCESS) },
+	{ CHCACHE(ITLB, READ, MISS) },
+	{ CHCACHE(ITLB, WRITE, ACCESS) },
+	{ CHCACHE(ITLB, WRITE, MISS) },
+	{ CHCACHE(ITLB, PREFETCH, ACCESS) },
+	{ CHCACHE(ITLB, PREFETCH, MISS) },
+
+	{ CHCACHE(BPU, READ, ACCESS) },
+	{ CHCACHE(BPU, READ, MISS) },
+	{ CHCACHE(BPU, WRITE, ACCESS) },
+	{ CHCACHE(BPU, WRITE, MISS) },
+	{ CHCACHE(BPU, PREFETCH, ACCESS) },
+	{ CHCACHE(BPU, PREFETCH, MISS) },
+
+};
+
+#define MAX_RUN 100
+
+static int system_wide = 0;
+static int verbose = 0;
+
+static int nr_cpus = 0;
+
+static int run_count = 1;
+static int run_idx = 0;
+
+static unsigned int page_size;
+
+static int fd[MAX_NR_CPUS][MAX_COUNTERS];
+
+static u64 event_res[MAX_RUN][MAX_COUNTERS][3];
+
+static u64 walltime_nsecs[MAX_RUN];
+static u64 runtime_cycles[MAX_RUN];
+
+static u64 event_res_avg[MAX_COUNTERS][3];
+
+static u64 walltime_nsecs_avg;
+
+static u64 runtime_cycles_avg;
+
+static void create_perf_stat_counter(int counter)
+{
+	struct perf_counter_attr *attr = attrs + counter;
+
+	if (system_wide) {
+		int cpu;
+		for (cpu = 0; cpu < nr_cpus; cpu++) {
+			fd[cpu][counter] = sys_perf_counter_open(attr, -1, cpu, -1, 0);
+			if (fd[cpu][counter] < 0 && verbose) {
+				printf("Error: counter %d, sys_perf_counter_open() syscall returned with %d (%s)\n", counter, fd[cpu][counter], strerror(errno));
+			}
+		}
+	} else {
+		attr->disabled = 1;
+
+		fd[0][counter] = sys_perf_counter_open(attr, 0, -1, -1, 0);
+		if (fd[0][counter] < 0 && verbose) {
+			printf("Error: counter %d, sys_perf_counter_open() syscall returned with %d (%s)\n", counter, fd[0][counter], strerror(errno));
+		}
+	}
+}
+
+/*
+ * Read out the results of a single counter:
+ */
+static void read_counter(int counter)
+{
+	u64 *count, single_count[3];
+	ssize_t res;
+	int cpu, nv;
+
+	count = event_res[run_idx][counter];
+
+	count[0] = count[1] = count[2] = 0;
+
+	nv = 1;
+	for (cpu = 0; cpu < nr_cpus; cpu++) {
+		if (fd[cpu][counter] < 0)
+			continue;
+
+		res = read(fd[cpu][counter], single_count, nv * sizeof(u64));
+		assert(res == nv * sizeof(u64));
+		close(fd[cpu][counter]);
+		fd[cpu][counter] = -1;
+
+		count[0] += single_count[0];
+	}
+
+	/*
+	 * Save the full runtime - to allow normalization during printout:
+	 */
+	runtime_cycles[run_idx] = count[0];
+}
+
+static int run_perf_test(int argc, const char **argv)
+{
+	unsigned long long t0, t1;
+	int status = 0;
+	int counter;
+	int pid;
+
+	if (!system_wide)
+		nr_cpus = 1;
+
+	for (counter = 0; counter < nr_counters; counter++)
+		create_perf_stat_counter(counter);
+
+	/*
+	 * Enable counters and exec the command:
+	 */
+	t0 = rdclock();
+	prctl(PR_TASK_PERF_COUNTERS_ENABLE);
+
+	if ((pid = fork()) < 0)
+		perror("failed to fork");
+
+	if (!pid) {
+		if (execvp(argv[0], (char **)argv)) {
+			perror(argv[0]);
+			exit(-1);
+		}
+	}
+
+	wait(&status);
+
+	prctl(PR_TASK_PERF_COUNTERS_DISABLE);
+	t1 = rdclock();
+
+	walltime_nsecs[run_idx] = t1 - t0;
+
+	for (counter = 0; counter < nr_counters; counter++)
+		read_counter(counter);
+
+	return WEXITSTATUS(status);
+}
+
+static void test_printout(int counter, u64 *count)
+{
+	fprintf(stderr, " %-45s", event_name(counter));
+
+	if (count[0])
+		fprintf(stderr, " %14Ld", count[0]);
+	else
+		fprintf(stderr, " ");
+}
+
+/*
+ * Print out the results of a single counter:
+ */
+static void print_counter(int counter)
+{
+	u64 *count;
+
+	count = event_res_avg[counter];
+
+	test_printout(counter, count);
+
+	fprintf(stderr, "\n");
+}
+
+static void update_avg(const char *name, int idx, u64 *avg, u64 *val)
+{
+	*avg += *val;
+
+	if (verbose > 1)
+		fprintf(stderr, "debug: %20s[%d]: %Ld\n", name, idx, *val);
+}
+/*
+ * Calculate the averages:
+ */
+static void calc_avg(void)
+{
+	int i, j;
+
+	if (verbose > 1)
+		fprintf(stderr, "\n");
+
+	for (i = 0; i < run_count; i++) {
+		update_avg("walltime", 0, &walltime_nsecs_avg, walltime_nsecs + i);
+		update_avg("runtime_cycles", 0, &runtime_cycles_avg, runtime_cycles + i);
+		for (j = 0; j < nr_counters; j++) {
+			update_avg("counter/0", j,
+				   event_res_avg[j]+0, event_res[i][j]+0);
+			update_avg("counter/1", j,
+				   event_res_avg[j]+1, event_res[i][j]+1);
+			update_avg("counter/2", j,
+				   event_res_avg[j]+2, event_res[i][j]+2);
+		}
+	}
+	walltime_nsecs_avg /= run_count;
+	runtime_cycles_avg /= run_count;
+
+	for (j = 0; j < nr_counters; j++) {
+		event_res_avg[j][0] /= run_count;
+		event_res_avg[j][1] /= run_count;
+		event_res_avg[j][2] /= run_count;
+	}
+}
+
+static void print_test(int argc, const char **argv)
+{
+	int i, counter;
+
+	calc_avg();
+
+	fflush(stdout);
+
+	fprintf(stderr, "\n");
+	fprintf(stderr, " Performance counter stats for \'%s\'", argv[0]);
+
+	for (i = 1; i < argc; i++)
+		fprintf(stderr, " %s", argv[i]);
+
+	fprintf(stderr, ":\n\n");
+
+	for (counter = 0; counter < nr_counters; counter++)
+		print_counter(counter);
+
+	fprintf(stderr, "\n");
+	fprintf(stderr, " %14.9f seconds time elapsed.\n",
+			(double)walltime_nsecs_avg/1e9);
+	fprintf(stderr, "\n");
+}
+
+static volatile int signr = -1;
+
+static void skip_signal(int signo)
+{
+	signr = signo;
+}
+
+static const char * const test_usage[] = {
+	"perf test [] ",
+	NULL
+};
+
+static void sig_atexit(void)
+{
+	if (signr == -1)
+		return;
+
+	signal(signr, SIG_DFL);
+	kill(getpid(), signr);
+}
+
+static const struct option options[] = {
+	OPT_CALLBACK('e', "event", NULL, "event",
+		     "event selector. use 'perf list' to list available events",
+		     parse_events),
+	OPT_BOOLEAN('a', "all-cpus", &system_wide,
+		    "system-wide collection from all CPUs"),
+	OPT_BOOLEAN('v', "verbose", &verbose,
+		    "be more verbose (show counter open errors, etc)"),
+	OPT_END()
+};
+
+int cmd_test(int argc, const char **argv, const char *prefix)
+{
+	int status;
+
+	page_size = sysconf(_SC_PAGE_SIZE);
+
+	memcpy(attrs, default_attrs, sizeof(attrs));
+
+	argc = parse_options(argc, argv, options, test_usage, 0);
+	if (!argc)
+		usage_with_options(test_usage, options);
+	if (run_count <= 0 || run_count > MAX_RUN)
+		usage_with_options(test_usage, options);
+
+	if (!nr_counters)
+		nr_counters = ARRAY_SIZE(default_attrs);
+
+	nr_cpus = sysconf(_SC_NPROCESSORS_ONLN);
+	assert(nr_cpus <= MAX_NR_CPUS);
+	assert(nr_cpus >= 0);
+
+	/*
+	 * We don't want to block the signals - that would cause
+	 * child tasks to inherit that and Ctrl-C would not work.
+	 * What we want is for Ctrl-C to work in the exec()-ed
+	 * task, but being ignored by perf test itself:
+	 */
+	atexit(sig_atexit);
+	signal(SIGINT, skip_signal);
+	signal(SIGALRM, skip_signal);
+	signal(SIGABRT, skip_signal);
+
+	status = 0;
+	for (run_idx = 0; run_idx < run_count; run_idx++) {
+		if (run_count != 1 && verbose)
+			fprintf(stderr, "[ perf test: executing run #%d ... ]\n", run_idx+1);
+		status = run_perf_test(argc, argv);
+	}
+
+	print_test(argc, argv);
+
+	return status;
+}
diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h
index 51d1682..3ed0362 100644
--- a/tools/perf/builtin.h
+++ b/tools/perf/builtin.h
@@ -22,5 +22,6 @@ extern int cmd_stat(int argc, const char **argv, const char *prefix);
 extern int cmd_top(int argc, const char **argv, const char *prefix);
 extern int cmd_version(int argc, const char **argv, const char *prefix);
 extern int cmd_list(int argc, const char **argv, const char *prefix);
+extern int cmd_test(int argc, const char **argv, const char *prefix);
 
 #endif
diff --git a/tools/perf/command-list.txt b/tools/perf/command-list.txt
index eebce30..f53544c 100644
--- a/tools/perf/command-list.txt
+++ b/tools/perf/command-list.txt
@@ -7,4 +7,5 @@ perf-list mainporcelain common
 perf-record mainporcelain common
 perf-report mainporcelain common
 perf-stat mainporcelain common
+perf-test mainporcelain common
 perf-top mainporcelain common
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index 4eb7259..9f98f5e 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -262,6 +262,7 @@ static void handle_internal_command(int argc, const char **argv)
 		{ "record", cmd_record, 0 },
 		{ "report", cmd_report, 0 },
 		{ "stat", cmd_stat, 0 },
+		{ "test", cmd_test, 0 },
 		{ "top", cmd_top, 0 },
 		{ "annotate", cmd_annotate, 0 },
 		{ "version", cmd_version, 0 },
-- 
1.6.0.6