2015-08-18 16:41:30

by Liang, Kan

[permalink] [raw]
Subject: [PATCH RFC 00/10] stat read during perf sampling

From: Kan Liang <[email protected]>

The patch series intends to read counter statistics during sampling.
The instant benefit is that we can read memory bandwidth from uncore
event during cpu PMU event is sampling. Also, there are more and more
free running counter events (such as freq, power etc) we have supported
or plan to support on perf. So it could benefit more.

The patch series includs 10 patches.
- Patch 1: This patch fixes a potential bug, when evlist and evsel
have different CPU maps. It can be merged separately.
- Patch 2-3: These patches introduce a new sort option --socket.
The user can sort the perf result by socket. User also can get
the socket view of samples from perf report --stdio --socket.
This feature should be useful for per-socket event.
- Patch 4-10: Introduce 'N' event/group modifier. The event with
this modifier will do counting not sampling. If a group with this
modifier, only group leader do sampling. The counter statistics will
be wrote in new RECORD type PERF_RECORD_STAT_READ and stored in
perf.data. So perf report can present the counter statistics data
accordingly.
There may be an alternative way to get counter statistics during
sampling by running perf record and perf stat together by script.
But the script way have various issue and complex to parses the
output. For example, the sophisticated bandwidth analysis requires
fine granularity (10-20ms interval), while the perf stat interval
is 100ms. It's better to record all data in perf.data as the
patchset does.

Example:

#perf record -e 'cycles,uncore_imc_1/cas_count_read/N'
--stat-read-interval 10 -a ./tchain_edit
[ perf record: Woken up 520 times to write data ]
[ perf record: Captured and wrote 1.454 MB perf.data (21328 samples) ]

$perf report --stdio --socket

# Samples: 21K of event 'cycles'
# Event count (approx.): 12073951084
#

# Socket: 0
#
# Overhead Command Shared Object Symbol
# ........ ............ ................ .......................................
#
97.58% tchain_edit tchain_edit [.] f3
0.08% tchain_edit tchain_edit [.] f2
0.05% swapper [kernel.vmlinux] [k] run_timer_softirq
# Socket: 1
#
# Overhead Command Shared Object Symbol
# ........ ............ ................ .......................................
#
0.43% swapper [kernel.vmlinux] [k] acpi_idle_do_entry
0.24% kworker/22:1 [kernel.vmlinux] [k] delay_tsc
0.17% perf [kernel.vmlinux] [k] smp_call_function_single

# Socket: 0
#
# Performance counter stats:
# uncore_imc_1/cas_count_read/N 29004
#
# Socket: 1
#
# Performance counter stats:
# uncore_imc_1/cas_count_read/N 11350


$perf report -D

...
0x3e3a8 [0x20]: PERF_RECORD_STAT_READ: uncore_imc_0/cas_count_read/N CPU 0:
value 29 time: 78608435366512
0x3e3c8 [0x20]: PERF_RECORD_STAT_READ: uncore_imc_0/cas_count_read/N CPU 18:
value 15 time: 78608435429055
...
0x3e468 [0x20]: PERF_RECORD_STAT_READ: uncore_imc_0/cas_count_read/N CPU 0:
value 25 time: 78608445379258
0x3e488 [0x20]: PERF_RECORD_STAT_READ: uncore_imc_0/cas_count_read/N CPU 18:
value 12 time: 78608445423995
...

Kan Liang (10):
perf,tools: open event on evsel cpus and threads
perf,tools: Support new sort type --socket
perf,tools: support option --socket
perf,tools: Add 'N' event/group modifier
perf,tools: Enable statistic read for perf record
perf,tools: New RECORD type PERF_RECORD_STAT_READ
perf,tools: record counter statistics during sampling
perf,tools: option to set stat read interval
perf,tools: don't validate non-sample event
perf,tools: Show STAT_READ in perf report

tools/perf/Documentation/perf-list.txt | 5 ++
tools/perf/Documentation/perf-record.txt | 7 ++
tools/perf/Documentation/perf-report.txt | 6 +-
tools/perf/builtin-diff.c | 2 +-
tools/perf/builtin-record.c | 140 ++++++++++++++++++++++++++++++-
tools/perf/builtin-report.c | 108 +++++++++++++++++++++++-
tools/perf/builtin-top.c | 2 +-
tools/perf/ui/stdio/hist.c | 14 +++-
tools/perf/util/cpumap.c | 35 ++++++--
tools/perf/util/cpumap.h | 4 +
tools/perf/util/event.c | 1 +
tools/perf/util/event.h | 10 +++
tools/perf/util/evlist.c | 9 ++
tools/perf/util/evsel.h | 1 +
tools/perf/util/hist.h | 3 +-
tools/perf/util/parse-events.c | 8 +-
tools/perf/util/parse-events.l | 2 +-
tools/perf/util/session.c | 15 ++++
tools/perf/util/sort.c | 34 ++++++++
tools/perf/util/sort.h | 1 +
tools/perf/util/symbol.h | 1 +
tools/perf/util/tool.h | 1 +
22 files changed, 387 insertions(+), 22 deletions(-)

--
1.8.3.1


2015-08-18 16:41:28

by Liang, Kan

[permalink] [raw]
Subject: [PATCH RFC 01/10] perf,tools: open event on evsel cpus and threads

From: Kan Liang <[email protected]>

evsel may have different cpus and threads as evlist's.
Use it's own cpus and threads, when open evsel in perf record.

Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/builtin-record.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 25cf6b4..a0178bf 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -279,7 +279,7 @@ static int record__open(struct record *rec)

evlist__for_each(evlist, pos) {
try_again:
- if (perf_evsel__open(pos, evlist->cpus, evlist->threads) < 0) {
+ if (perf_evsel__open(pos, pos->cpus, pos->threads) < 0) {
if (perf_evsel__fallback(pos, errno, msg, sizeof(msg))) {
if (verbose)
ui__warning("%s\n", msg);
--
1.8.3.1

2015-08-18 16:41:24

by Liang, Kan

[permalink] [raw]
Subject: [PATCH RFC 02/10] perf,tools: Support new sort type --socket

From: Kan Liang <[email protected]>

This patch enable perf report to sort by socket

$ perf report --stdio --sort socket,comm,dso,symbol
# To display the perf.data header info, please use
--header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 686 of event 'cycles'
# Event count (approx.): 349215462
#
# Overhead SOCKET Command Shared Object Symbol
# ........ ...... ......... ................
.................................
#
97.05% 000 test test [.] plusB_c
0.98% 000 test test [.] plusA_c
0.93% 001 perf [kernel.vmlinux] [k]
smp_call_function_single
0.19% 001 perf [kernel.vmlinux] [k] page_fault
0.19% 001 swapper [kernel.vmlinux] [k] pm_qos_request
0.16% 000 test [kernel.vmlinux] [k] add_mm_counter_fast

Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/Documentation/perf-report.txt | 3 ++-
tools/perf/util/cpumap.c | 21 ++++++++++++++-------
tools/perf/util/cpumap.h | 1 +
tools/perf/util/hist.h | 1 +
tools/perf/util/sort.c | 28 ++++++++++++++++++++++++++++
tools/perf/util/sort.h | 1 +
6 files changed, 47 insertions(+), 8 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index a18ba75..ca76b0d 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -68,7 +68,7 @@ OPTIONS
--sort=::
Sort histogram entries by given key(s) - multiple keys can be specified
in CSV format. Following sort keys are available:
- pid, comm, dso, symbol, parent, cpu, srcline, weight, local_weight.
+ pid, comm, dso, symbol, parent, cpu, socket, srcline, weight, local_weight.

Each key has following meaning:

@@ -79,6 +79,7 @@ OPTIONS
- parent: name of function matched to the parent regex filter. Unmatched
entries are displayed as "[other]".
- cpu: cpu number the task ran at the time of sample
+ - socket: socket number the task ran at the time of sample
- srcline: filename and line number executed at the time of sample. The
DWARF debugging info must be provided.
- srcfile: file name of the source file of the same. Requires dwarf
diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c
index 3667e21..7f25e6c 100644
--- a/tools/perf/util/cpumap.c
+++ b/tools/perf/util/cpumap.c
@@ -225,17 +225,12 @@ void cpu_map__put(struct cpu_map *map)
cpu_map__delete(map);
}

-int cpu_map__get_socket(struct cpu_map *map, int idx)
+int cpu__get_socket(int cpu)
{
FILE *fp;
const char *mnt;
char path[PATH_MAX];
- int cpu, ret;
-
- if (idx > map->nr)
- return -1;
-
- cpu = map->map[idx];
+ int ret;

mnt = sysfs__mountpoint();
if (!mnt)
@@ -253,6 +248,18 @@ int cpu_map__get_socket(struct cpu_map *map, int idx)
return ret == 1 ? cpu : -1;
}

+int cpu_map__get_socket(struct cpu_map *map, int idx)
+{
+ int cpu;
+
+ if (idx > map->nr)
+ return -1;
+
+ cpu = map->map[idx];
+
+ return cpu__get_socket(cpu);
+}
+
static int cmp_ids(const void *a, const void *b)
{
return *(int *)a - *(int *)b;
diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h
index 0af9cec..effb56e 100644
--- a/tools/perf/util/cpumap.h
+++ b/tools/perf/util/cpumap.h
@@ -18,6 +18,7 @@ struct cpu_map *cpu_map__new(const char *cpu_list);
struct cpu_map *cpu_map__dummy_new(void);
struct cpu_map *cpu_map__read(FILE *file);
size_t cpu_map__fprintf(struct cpu_map *map, FILE *fp);
+int cpu__get_socket(int cpu);
int cpu_map__get_socket(struct cpu_map *map, int idx);
int cpu_map__get_core(struct cpu_map *map, int idx);
int cpu_map__build_socket_map(struct cpu_map *cpus, struct cpu_map **sockp);
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index bc528d5..af80fb5 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -29,6 +29,7 @@ enum hist_column {
HISTC_COMM,
HISTC_PARENT,
HISTC_CPU,
+ HISTC_SOCKET,
HISTC_SRCLINE,
HISTC_SRCFILE,
HISTC_MISPREDICT,
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 7e38716..245e254 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -421,6 +421,33 @@ struct sort_entry sort_cpu = {
.se_width_idx = HISTC_CPU,
};

+/* --sort socket */
+
+static int64_t
+sort__socket_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+ int r_socket, l_socket;
+
+ r_socket = cpu__get_socket(right->cpu);
+ l_socket = cpu__get_socket(left->cpu);
+ return r_socket - l_socket;
+}
+
+static int hist_entry__socket_snprintf(struct hist_entry *he, char *bf,
+ size_t size, unsigned int width)
+{
+ int socket = cpu__get_socket(he->cpu);
+
+ return repsep_snprintf(bf, size, "%*.*d", width, width-3, socket);
+}
+
+struct sort_entry sort_socket = {
+ .se_header = "SOCKET",
+ .se_cmp = sort__socket_cmp,
+ .se_snprintf = hist_entry__socket_snprintf,
+ .se_width_idx = HISTC_SOCKET,
+};
+
/* sort keys for branch stacks */

static int64_t
@@ -1248,6 +1275,7 @@ static struct sort_dimension common_sort_dimensions[] = {
DIM(SORT_SYM, "symbol", sort_sym),
DIM(SORT_PARENT, "parent", sort_parent),
DIM(SORT_CPU, "cpu", sort_cpu),
+ DIM(SORT_SOCKET, "socket", sort_socket),
DIM(SORT_SRCLINE, "srcline", sort_srcline),
DIM(SORT_SRCFILE, "srcfile", sort_srcfile),
DIM(SORT_LOCAL_WEIGHT, "local_weight", sort_local_weight),
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 3c2a399..9d86a1f 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -172,6 +172,7 @@ enum sort_type {
SORT_SYM,
SORT_PARENT,
SORT_CPU,
+ SORT_SOCKET,
SORT_SRCLINE,
SORT_SRCFILE,
SORT_LOCAL_WEIGHT,
--
1.8.3.1

2015-08-18 16:41:27

by Liang, Kan

[permalink] [raw]
Subject: [PATCH RFC 03/10] perf,tools: support option --socket

From: Kan Liang <[email protected]>

Introduce a new option for perf report to show the event information in
the same socket together.
When this option is set, perf report will force to sort by socket.

$ perf report --stdio --socket
# To display the perf.data header info, please use
--header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 686 of event 'cycles'
# Event count (approx.): 349215462
#
#
# Socket: 0
#
# Overhead Command Shared Object Symbol
# ........ ......... ................
.................................
#
97.05% test test [.] plusB_c
0.98% test test [.] plusA_c
0.16% test [kernel.vmlinux] [k] add_mm_counter_fast
0.15% swapper [kernel.vmlinux] [k] note_gp_changes
0.15% perf [kernel.vmlinux] [k] unmap_single_vma
0.06% perf [kernel.vmlinux] [k] run_timer_softirq
0.00% swapper [kernel.vmlinux] [k] native_write_msr
#
# Socket: 1
#
# Overhead Command Shared Object Symbol
# ........ ......... ................
.................................
#
0.93% perf [kernel.vmlinux] [k] smp_call_function_single
0.19% perf [kernel.vmlinux] [k] page_fault
0.19% swapper [kernel.vmlinux] [k] pm_qos_request
0.12% rcu_sched [kernel.vmlinux] [k]
dyntick_save_progress_counter
0.00% swapper [kernel.vmlinux] [k] wake_up_process
0.00% swapper [kernel.vmlinux] [k] __do_softirq
0.00% swapper [kernel.vmlinux] [k] run_timer_softirq
0.00% swapper [kernel.vmlinux] [k] native_write_msr
0.00% perf [kernel.vmlinux] [k] native_write_msr

Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/Documentation/perf-report.txt | 3 +++
tools/perf/builtin-diff.c | 2 +-
tools/perf/builtin-report.c | 17 ++++++++++++++++-
tools/perf/builtin-top.c | 2 +-
tools/perf/ui/stdio/hist.c | 14 ++++++++++----
tools/perf/util/cpumap.c | 14 ++++++++++++++
tools/perf/util/cpumap.h | 2 ++
tools/perf/util/hist.h | 2 +-
tools/perf/util/sort.c | 6 ++++++
tools/perf/util/symbol.h | 1 +
10 files changed, 55 insertions(+), 8 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index ca76b0d..49a42e4 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -293,6 +293,9 @@ OPTIONS
--group::
Show event group information together.

+--socket::
+ Show event information in the same socket together.
+
--demangle::
Demangle symbol names to human readable form. It's enabled by default,
disable with --no-demangle.
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 0b180a8..0dfd91f 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -667,7 +667,7 @@ static void hists__process(struct hists *hists)
hists__precompute(hists);
hists__output_resort(hists, NULL);

- hists__fprintf(hists, true, 0, 0, 0, stdout);
+ hists__fprintf(hists, true, 0, 0, 0, stdout, -1);
}

static void data__fprintf(void)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 62b285e..6fdf9f4 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -345,7 +345,17 @@ static int perf_evlist__tty_browse_hists(struct perf_evlist *evlist,
continue;

hists__fprintf_nr_sample_events(hists, rep, evname, stdout);
- hists__fprintf(hists, true, 0, 0, rep->min_percent, stdout);
+
+ if (symbol_conf.socket) {
+ int i;
+
+ for (i = 0; i < max_socket_num; i++) {
+ fprintf(stdout, "#\n# Socket: %d\n#\n", i);
+ hists__fprintf(hists, true, 0, 0, rep->min_percent, stdout, i);
+ }
+ } else
+ hists__fprintf(hists, true, 0, 0, rep->min_percent, stdout, -1);
+
fprintf(stdout, "\n\n");
}

@@ -724,6 +734,8 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
"Show a column with the sum of periods"),
OPT_BOOLEAN(0, "group", &symbol_conf.event_group,
"Show event group information together"),
+ OPT_BOOLEAN(0, "socket", &symbol_conf.socket,
+ "Show event information in the same socket together"),
OPT_CALLBACK_NOOPT('b', "branch-stack", &branch_mode, "",
"use branch records for per branch histogram filling",
parse_branch_mode),
@@ -909,6 +921,9 @@ repeat:

sort__setup_elide(stdout);

+ if (symbol_conf.socket)
+ set_max_socket_num();
+
ret = __cmd_report(&report);
if (ret == K_SWITCH_INPUT_DATA) {
perf_session__delete(session);
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index bfe24f1..deffe44 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -295,7 +295,7 @@ static void perf_top__print_sym_table(struct perf_top *top)
hists__output_recalc_col_len(hists, top->print_entries - printed);
putchar('\n');
hists__fprintf(hists, false, top->print_entries - printed, win_width,
- top->min_percent, stdout);
+ top->min_percent, stdout, -1);
}

static void prompt_integer(int *target, const char *msg)
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index dfcbc90..2f512b8 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -323,7 +323,8 @@ static int hist_entry__snprintf(struct hist_entry *he, struct perf_hpp *hpp)
return 0;

perf_hpp__for_each_format(fmt) {
- if (perf_hpp__should_skip(fmt))
+ if (perf_hpp__should_skip(fmt) ||
+ !strcmp(fmt->name, "SOCKET"))
continue;

/*
@@ -371,7 +372,7 @@ static int hist_entry__fprintf(struct hist_entry *he, size_t size,
}

size_t hists__fprintf(struct hists *hists, bool show_header, int max_rows,
- int max_cols, float min_pcnt, FILE *fp)
+ int max_cols, float min_pcnt, FILE *fp, int socket)
{
struct perf_hpp_fmt *fmt;
struct rb_node *nd;
@@ -402,7 +403,8 @@ size_t hists__fprintf(struct hists *hists, bool show_header, int max_rows,
fprintf(fp, "# ");

perf_hpp__for_each_format(fmt) {
- if (perf_hpp__should_skip(fmt))
+ if (perf_hpp__should_skip(fmt) ||
+ !strcmp(fmt->name, "SOCKET"))
continue;

if (!first)
@@ -428,7 +430,8 @@ size_t hists__fprintf(struct hists *hists, bool show_header, int max_rows,
perf_hpp__for_each_format(fmt) {
unsigned int i;

- if (perf_hpp__should_skip(fmt))
+ if (perf_hpp__should_skip(fmt) ||
+ !strcmp(fmt->name, "SOCKET"))
continue;

if (!first)
@@ -465,6 +468,9 @@ print_entries:
if (h->filtered)
continue;

+ if ((socket >= 0) && (cpu__get_socket(h->cpu) != socket))
+ continue;
+
percent = hist_entry__get_percent_limit(h);
if (percent < min_pcnt)
continue;
diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c
index 7f25e6c..ae03426 100644
--- a/tools/perf/util/cpumap.c
+++ b/tools/perf/util/cpumap.c
@@ -407,6 +407,20 @@ out:
pr_err("Failed to read max cpus, using default of %d\n", max_cpu_num);
}

+void set_max_socket_num(void)
+{
+ int cpu, socket;
+
+ set_max_cpu_num();
+
+ max_socket_num = 1;
+ for (cpu = 0; cpu < max_cpu_num; cpu++) {
+ socket = cpu__get_socket(cpu);
+ if (max_socket_num < (socket + 1))
+ max_socket_num = socket + 1;
+ }
+}
+
/* Determine highest possible node in the system for sparse allocation */
static void set_max_node_num(void)
{
diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h
index effb56e..094edd9 100644
--- a/tools/perf/util/cpumap.h
+++ b/tools/perf/util/cpumap.h
@@ -56,9 +56,11 @@ static inline bool cpu_map__empty(const struct cpu_map *map)

int max_cpu_num;
int max_node_num;
+int max_socket_num;
int *cpunode_map;

int cpu__setup_cpunode_map(void);
+void set_max_socket_num(void);

static inline int cpu__max_node(void)
{
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index af80fb5..b188a62 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -139,7 +139,7 @@ void events_stats__inc(struct events_stats *stats, u32 type);
size_t events_stats__fprintf(struct events_stats *stats, FILE *fp);

size_t hists__fprintf(struct hists *hists, bool show_header, int max_rows,
- int max_cols, float min_pcnt, FILE *fp);
+ int max_cols, float min_pcnt, FILE *fp, int socket);
size_t perf_evlist__fprintf_nr_events(struct perf_evlist *evlist, FILE *fp);

void hists__filter_by_dso(struct hists *hists);
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 245e254..70f0526 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -1930,6 +1930,12 @@ int setup_sorting(void)
return err;
}

+ if (symbol_conf.socket) {
+ err = sort_dimension__add("socket");
+ if (err < 0)
+ return err;
+ }
+
reset_dimensions();

/*
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index a4cde92..7c153c7 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -101,6 +101,7 @@ struct symbol_conf {
annotate_asm_raw,
annotate_src,
event_group,
+ socket,
demangle,
demangle_kernel,
filter_relative,
--
1.8.3.1

2015-08-18 16:44:21

by Liang, Kan

[permalink] [raw]
Subject: [PATCH RFC 04/10] perf,tools: Add 'N' event/group modifier

From: Kan Liang <[email protected]>

Add a new event/group modifier 'N' to mark the event which will be read
counter statistics during sampling.

Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/util/evsel.h | 1 +
tools/perf/util/parse-events.c | 8 +++++++-
tools/perf/util/parse-events.l | 2 +-
3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 93ac6b1..e5ced73 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -110,6 +110,7 @@ struct perf_evsel {
int exclude_GH;
int nr_members;
int sample_read;
+ int stat_read;
unsigned long *per_pkg_mask;
struct perf_evsel *leader;
char *group_name;
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index d826e6f..a322b60 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -821,6 +821,7 @@ struct event_modifier {
int precise;
int exclude_GH;
int sample_read;
+ int stat_read;
int pinned;
};

@@ -835,6 +836,7 @@ static int get_event_modifier(struct event_modifier *mod, char *str,
int eI = evsel ? evsel->attr.exclude_idle : 0;
int precise = evsel ? evsel->attr.precise_ip : 0;
int sample_read = 0;
+ int stat_read = 0;
int pinned = evsel ? evsel->attr.pinned : 0;

int exclude = eu | ek | eh;
@@ -872,6 +874,8 @@ static int get_event_modifier(struct event_modifier *mod, char *str,
eG = 1;
} else if (*str == 'S') {
sample_read = 1;
+ } else if (*str == 'N') {
+ stat_read = 1;
} else if (*str == 'D') {
pinned = 1;
} else
@@ -902,6 +906,7 @@ static int get_event_modifier(struct event_modifier *mod, char *str,
mod->precise = precise;
mod->exclude_GH = exclude_GH;
mod->sample_read = sample_read;
+ mod->stat_read = stat_read;
mod->pinned = pinned;

return 0;
@@ -916,7 +921,7 @@ static int check_modifier(char *str)
char *p = str;

/* The sizeof includes 0 byte as well. */
- if (strlen(str) > (sizeof("ukhGHpppSDI") - 1))
+ if (strlen(str) > (sizeof("ukhGHpppSNDI") - 1))
return -1;

while (*p) {
@@ -955,6 +960,7 @@ int parse_events__modifier_event(struct list_head *list, char *str, bool add)
evsel->attr.exclude_idle = mod.eI;
evsel->exclude_GH = mod.exclude_GH;
evsel->sample_read = mod.sample_read;
+ evsel->stat_read = mod.stat_read;

if (perf_evsel__is_group_leader(evsel))
evsel->attr.pinned = mod.pinned;
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 936d566..de2dd4b 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -122,7 +122,7 @@ num_raw_hex [a-fA-F0-9]+
name [a-zA-Z_*?][a-zA-Z0-9_*?.]*
name_minus [a-zA-Z_*?][a-zA-Z0-9\-_*?.]*
/* If you add a modifier you need to update check_modifier() */
-modifier_event [ukhpGHSDI]+
+modifier_event [ukhpGHSNDI]+
modifier_bp [rwx]{1,3}

%%
--
1.8.3.1

2015-08-18 16:43:35

by Liang, Kan

[permalink] [raw]
Subject: [PATCH RFC 05/10] perf,tools: Enable statistic read for perf record

From: Kan Liang <[email protected]>

Using 'N' event/group modifier to specify the event which want to read
counter statistics during sampling. For this event, the sampling will
be disabled unless it's group leader.
The 'N' modifier only be available for in system-wide/CPU mode. If a
group is marked as 'N' modifier, only group members read counter
statistics. Group leader always do sampling.
The other limit is that the first event cannot be stat read event, since
many tools special handle first event.

Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/Documentation/perf-list.txt | 5 +++++
tools/perf/builtin-record.c | 31 +++++++++++++++++++++++++++++++
tools/perf/util/evlist.c | 3 +++
3 files changed, 39 insertions(+)

diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt
index bada893..133dc2c 100644
--- a/tools/perf/Documentation/perf-list.txt
+++ b/tools/perf/Documentation/perf-list.txt
@@ -31,6 +31,11 @@ counted. The following modifiers exist:
H - host counting (not in KVM guests)
p - precise level
S - read sample value (PERF_SAMPLE_READ)
+ N - read counter statistics during sampling (Can be used to read
+ counter statistics of PMU_A event when PMU_B events are sampling.
+ For example, getting memory bandwidth by uncore events during the
+ CPU PMU events run time)
+ (Only available for group members or non-first event in system-wide/CPU mode)
D - pin the event to the PMU

The 'p' modifier can be used for specifying how precise the instruction
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index a0178bf..5b09318 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -273,11 +273,42 @@ static int record__open(struct record *rec)
struct perf_evlist *evlist = rec->evlist;
struct perf_session *session = rec->session;
struct record_opts *opts = &rec->opts;
+ struct perf_evsel *first = perf_evlist__first(evlist);
+ struct perf_event_attr *attr;
int rc = 0;

perf_evlist__config(evlist, opts);

evlist__for_each(evlist, pos) {
+
+ if (pos->stat_read) {
+ if (!target__has_cpu(&opts->target)) {
+ pos->stat_read = 0;
+ ui__warning("Statistics read only available "
+ "on system-wide/CPU mode. "
+ "Remove :N modifier for event %s\n",
+ pos->name);
+ goto out;
+ }
+ /* Don't do stat read for Group leader */
+ if ((pos->leader == pos) && (pos->leader->nr_members > 1))
+ pos->stat_read = 0;
+ else {
+ if (first == pos) {
+ pos->stat_read = 0;
+ ui__warning("The first event cannot "
+ "be stat read event\n");
+ goto out;
+ }
+ attr = &pos->attr;
+ attr->read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
+ PERF_FORMAT_TOTAL_TIME_RUNNING;
+ attr->sample_freq = 0;
+ attr->sample_period = 0;
+ attr->sample_type = 0;
+ pos->sample_size = 0;
+ }
+ }
try_again:
if (perf_evsel__open(pos, pos->cpus, pos->threads) < 0) {
if (perf_evsel__fallback(pos, errno, msg, sizeof(msg))) {
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 373f65b..ca7bf8d 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -854,6 +854,9 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
if (evsel->system_wide && thread)
continue;

+ if (evsel->stat_read)
+ continue;
+
fd = FD(evsel, cpu, thread);

if (*output == -1) {
--
1.8.3.1

2015-08-18 16:43:38

by Liang, Kan

[permalink] [raw]
Subject: [PATCH RFC 06/10] perf,tools: New RECORD type PERF_RECORD_STAT_READ

From: Kan Liang <[email protected]>

Introduce a new user RECORD type PERF_RECORD_STAT_READ to store the
stat event read result.

Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/util/event.c | 1 +
tools/perf/util/event.h | 10 ++++++++++
tools/perf/util/session.c | 15 +++++++++++++++
tools/perf/util/tool.h | 1 +
4 files changed, 27 insertions(+)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 7ff6127..601f4be 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -37,6 +37,7 @@ static const char *perf_event__names[] = {
[PERF_RECORD_AUXTRACE_INFO] = "AUXTRACE_INFO",
[PERF_RECORD_AUXTRACE] = "AUXTRACE",
[PERF_RECORD_AUXTRACE_ERROR] = "AUXTRACE_ERROR",
+ [PERF_RECORD_STAT_READ] = "STAT_READ",
};

const char *perf_event__name(unsigned int id)
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index f729df5..f6932cf 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -226,6 +226,7 @@ enum perf_user_event_type { /* above any possible kernel type */
PERF_RECORD_AUXTRACE_INFO = 70,
PERF_RECORD_AUXTRACE = 71,
PERF_RECORD_AUXTRACE_ERROR = 72,
+ PERF_RECORD_STAT_READ = 73,
PERF_RECORD_HEADER_MAX
};

@@ -355,6 +356,14 @@ struct context_switch_event {
u32 next_prev_tid;
};

+struct stat_read_event {
+ struct perf_event_header header;
+ u64 value; /* counter value delta */
+ u32 cpu;
+ u32 pos_id; /* event position in evlist */
+ u64 time; /* time stamp */
+};
+
union perf_event {
struct perf_event_header header;
struct mmap_event mmap;
@@ -377,6 +386,7 @@ union perf_event {
struct aux_event aux;
struct itrace_start_event itrace_start;
struct context_switch_event context_switch;
+ struct stat_read_event stat_read;
};

void perf_event__print_totals(void);
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 18722e7..1547970 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -337,6 +337,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
tool->context_switch = perf_event__process_switch;
if (tool->read == NULL)
tool->read = process_event_sample_stub;
+ if (tool->stat_read == NULL)
+ tool->stat_read = process_build_id_stub;
if (tool->throttle == NULL)
tool->throttle = process_event_stub;
if (tool->unthrottle == NULL)
@@ -440,6 +442,16 @@ static void perf_event__task_swap(union perf_event *event, bool sample_id_all)
swap_sample_id_all(event, &event->fork + 1);
}

+static void perf_event__stat_read_swap(union perf_event *event,
+ bool sample_id_all __maybe_unused)
+{
+
+ event->stat_read.value = bswap_64(event->stat_read.value);
+ event->stat_read.cpu = bswap_32(event->stat_read.cpu);
+ event->stat_read.pos_id = bswap_32(event->stat_read.pos_id);
+ event->stat_read.time = bswap_64(event->stat_read.time);
+}
+
static void perf_event__read_swap(union perf_event *event, bool sample_id_all)
{
event->read.pid = bswap_32(event->read.pid);
@@ -642,6 +654,7 @@ static perf_event__swap_op perf_event__swap_ops[] = {
[PERF_RECORD_EXIT] = perf_event__task_swap,
[PERF_RECORD_LOST] = perf_event__all64_swap,
[PERF_RECORD_READ] = perf_event__read_swap,
+ [PERF_RECORD_STAT_READ] = perf_event__stat_read_swap,
[PERF_RECORD_THROTTLE] = perf_event__throttle_swap,
[PERF_RECORD_UNTHROTTLE] = perf_event__throttle_swap,
[PERF_RECORD_SAMPLE] = perf_event__all64_swap,
@@ -1191,6 +1204,8 @@ static s64 perf_session__process_user_event(struct perf_session *session,
case PERF_RECORD_AUXTRACE_ERROR:
perf_session__auxtrace_error_inc(session, event);
return tool->auxtrace_error(tool, event, session);
+ case PERF_RECORD_STAT_READ:
+ return tool->stat_read(tool, event, session);
default:
return -EINVAL;
}
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index cab8cc2..3a002a2 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -57,6 +57,7 @@ struct perf_tool {
auxtrace_info,
auxtrace_error;
event_op3 auxtrace;
+ event_op2 stat_read;
bool ordered_events;
bool ordering_requires_timestamps;
};
--
1.8.3.1

2015-08-18 16:43:34

by Liang, Kan

[permalink] [raw]
Subject: [PATCH RFC 07/10] perf,tools: record counter statistics during sampling

From: Kan Liang <[email protected]>

Reading at the start and the end of sampling, and save the counter
statistics value to PERF_RECORD_STAT_READ.
The absolate value from counter will be stored in prev_raw_counts. The
detla value will be caculated and saved in PERF_RECORD_STAT_READ.

Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/builtin-record.c | 76 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 75 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 5b09318..0e5e4c0 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -29,6 +29,7 @@
#include "util/data.h"
#include "util/auxtrace.h"
#include "util/parse-branch-options.h"
+#include "util/stat.h"

#include <unistd.h>
#include <sched.h>
@@ -266,6 +267,30 @@ int auxtrace_record__snapshot_start(struct auxtrace_record *itr __maybe_unused)

#endif

+static void
+free_prev_stat_counts(struct perf_evlist *evlist)
+{
+ struct perf_evsel *evsel;
+
+ evlist__for_each(evlist, evsel) {
+ if (evsel->prev_raw_counts != NULL)
+ perf_evsel__free_prev_raw_counts(evsel);
+ }
+}
+
+static int perf_evlist__alloc_prev_stat(struct perf_evlist *evlist)
+{
+ struct perf_evsel *evsel;
+
+ evlist__for_each(evlist, evsel) {
+ if (evsel->stat_read &&
+ (perf_evsel__alloc_prev_raw_counts(evsel, cpu_map__nr(evsel->cpus), 1) < 0))
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
static int record__open(struct record *rec)
{
char msg[512];
@@ -352,6 +377,9 @@ try_again:
goto out;
}

+ if (perf_evlist__alloc_prev_stat(evlist))
+ goto out;
+
session->evlist = evlist;
perf_session__set_id_hdr_size(session);
out:
@@ -427,6 +455,50 @@ static struct perf_event_header finished_round_event = {
.type = PERF_RECORD_FINISHED_ROUND,
};

+static void process_stat_read_event(struct record *rec, int id,
+ struct perf_evsel *pos,
+ struct perf_counts_values *count,
+ u32 pos_id)
+{
+ union perf_event ev;
+ struct perf_counts_values tmp;
+
+ memset(&ev, 0, sizeof(ev));
+ ev.stat_read.header.type = PERF_RECORD_STAT_READ;
+ ev.stat_read.header.size = sizeof(ev.stat_read);
+ ev.stat_read.cpu = pos->cpus->map[id];
+ ev.stat_read.pos_id = pos_id;
+
+ BUG_ON(pos->prev_raw_counts == NULL);
+ tmp.val = perf_counts(pos->prev_raw_counts, id, 0)->val;
+ if (tmp.val)
+ ev.stat_read.value = count->val - tmp.val;
+ perf_counts(pos->prev_raw_counts, id, 0)->val = count->val;
+
+ ev.stat_read.time = rdclock();
+
+ record__write(rec, &ev, sizeof(ev.stat_read));
+}
+
+static void perf_read_counter(struct record *rec)
+{
+ struct perf_evlist *evlist = rec->evlist;
+ struct perf_counts_values count;
+ struct perf_evsel *pos;
+ int cpu;
+ u32 pos_id = 0;
+
+ evlist__for_each(evlist, pos) {
+ for (cpu = 0; cpu < cpu_map__nr(pos->cpus); cpu++) {
+ if (pos->stat_read &&
+ !perf_evsel__read(pos, cpu, 0, &count)) {
+ process_stat_read_event(rec, cpu, pos, &count, pos_id);
+ }
+ }
+ pos_id++;
+ }
+}
+
static int record__mmap_read_all(struct record *rec)
{
u64 bytes_written = rec->bytes_written;
@@ -664,6 +736,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
perf_evlist__enable(rec->evlist);
}

+ perf_read_counter(rec);
auxtrace_snapshot_enabled = 1;
for (;;) {
int hits = rec->samples;
@@ -708,6 +781,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
*/
if (done && !disabled && !target__none(&opts->target)) {
auxtrace_snapshot_enabled = 0;
+ perf_read_counter(rec);
perf_evlist__disable(rec->evlist);
disabled = true;
}
@@ -776,7 +850,7 @@ out_child:
perf_data_file__size(file) / 1024.0 / 1024.0,
file->path, samples);
}
-
+ free_prev_stat_counts(rec->evlist);
out_delete_session:
perf_session__delete(session);
return status;
--
1.8.3.1

2015-08-18 16:42:51

by Liang, Kan

[permalink] [raw]
Subject: [PATCH RFC 08/10] perf,tools: option to set stat read interval

From: Kan Liang <[email protected]>

Add a timer to read stat event regularly. Option --stat-read-interval
can be used to set the interval.
Only read stat event at the beginning and the end of the test is not
enough. Sometimes, we need fine granularity to do sophisticated
analysis. For example, 10-20ms is required to do sophisticated bandwidth
analysis.

Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/Documentation/perf-record.txt | 7 +++++++
tools/perf/builtin-record.c | 31 +++++++++++++++++++++++++++++++
2 files changed, 38 insertions(+)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 347a273..00dc000 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -304,6 +304,13 @@ This option sets the time out limit. The default value is 500 ms.
Record context switch events i.e. events of type PERF_RECORD_SWITCH or
PERF_RECORD_SWITCH_CPU_WIDE.

+--stat-read-interval::
+Sets the interval to do stat read regularly. This option is valid only with
+stat read event. This option is disabled by default. It means that counter
+only be read at the beginning and end of the test.
+But sometimes, we need fine granularity to do sophisticated analysis.
+For example, 10-20ms is required to do sophisticated bandwidth analysis.
+
SEE ALSO
--------
linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 0e5e4c0..ebf37e3 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -572,6 +572,14 @@ static void workload_exec_failed_signal(int signo __maybe_unused,

static void snapshot_sig_handler(int sig);

+static unsigned int interval;
+struct record *g_rec;
+
+static void perf_read_alarm(int sig __maybe_unused)
+{
+ perf_read_counter(g_rec);
+}
+
static int __cmd_record(struct record *rec, int argc, const char **argv)
{
int err;
@@ -584,6 +592,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
struct perf_data_file *file = &rec->file;
struct perf_session *session;
bool disabled = false, draining = false;
+ static timer_t timerid;
+ struct itimerspec timeout;
+ struct sigevent sigevent;
int fd;

rec->progname = argv[0];
@@ -597,6 +608,19 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
else
signal(SIGUSR2, SIG_IGN);

+ if (interval) {
+ signal(SIGALRM, perf_read_alarm);
+ g_rec = rec;
+
+ /* Create a timer. */
+ sigevent.sigev_notify = SIGEV_SIGNAL;
+ sigevent.sigev_signo = SIGALRM;
+ timer_create(CLOCK_REALTIME, &sigevent, &timerid);
+ memset(&timeout, 0, sizeof(timeout));
+ timeout.it_value.tv_nsec = interval * 1000000ULL;
+ timeout.it_interval.tv_nsec = interval * 1000000ULL;
+ }
+
session = perf_session__new(file, false, tool);
if (session == NULL) {
pr_err("Perf session creation failed.\n");
@@ -737,6 +761,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
}

perf_read_counter(rec);
+ if (interval)
+ timer_settime(timerid, 0, &timeout, NULL);
+
auxtrace_snapshot_enabled = 1;
for (;;) {
int hits = rec->samples;
@@ -781,6 +808,8 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
*/
if (done && !disabled && !target__none(&opts->target)) {
auxtrace_snapshot_enabled = 0;
+ if (interval)
+ timer_delete(timerid);
perf_read_counter(rec);
perf_evlist__disable(rec->evlist);
disabled = true;
@@ -1187,6 +1216,8 @@ struct option __record_options[] = {
"per thread proc mmap processing timeout in ms"),
OPT_BOOLEAN(0, "switch-events", &record.opts.record_switch_events,
"Record context switch events"),
+ OPT_UINTEGER(0, "stat-read-interval", &interval,
+ "Stat read counter at regular interval in ms (>= 10)"),
OPT_END()
};

--
1.8.3.1

2015-08-18 16:42:48

by Liang, Kan

[permalink] [raw]
Subject: [PATCH RFC 09/10] perf,tools: don't validate non-sample event

From: Kan Liang <[email protected]>

Stat event (:N) doesn't do sampling. So perf tool should not validate
its sample_type, read_format and sample_id_all.

Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/util/evlist.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index ca7bf8d..fc2da23 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1249,6 +1249,8 @@ bool perf_evlist__valid_sample_type(struct perf_evlist *evlist)
return false;

evlist__for_each(evlist, pos) {
+ if (!pos->attr.sample_type)
+ continue;
if (pos->id_pos != evlist->id_pos ||
pos->is_pos != evlist->is_pos)
return false;
@@ -1293,6 +1295,8 @@ bool perf_evlist__valid_read_format(struct perf_evlist *evlist)
u64 sample_type = first->attr.sample_type;

evlist__for_each(evlist, pos) {
+ if (!pos->attr.sample_type)
+ continue;
if (read_format != pos->attr.read_format)
return false;
}
@@ -1350,6 +1354,8 @@ bool perf_evlist__valid_sample_id_all(struct perf_evlist *evlist)
struct perf_evsel *first = perf_evlist__first(evlist), *pos = first;

evlist__for_each_continue(evlist, pos) {
+ if (!pos->attr.sample_type)
+ continue;
if (first->attr.sample_id_all != pos->attr.sample_id_all)
return false;
}
--
1.8.3.1

2015-08-18 16:41:26

by Liang, Kan

[permalink] [raw]
Subject: [PATCH RFC 10/10] perf,tools: Show STAT_READ in perf report

From: Kan Liang <[email protected]>

This patch parse the PERF_RECORD_STAT_READ in perf report.
With -D option, event name, CPU id and value will be dumped.
With --stdio option, the value will be shown in a new section
(Performance counter stats) between header and sample result.

Example 1:

# perf record -e 'cycles,uncore_imc_0/cas_count_read/N'
--stat-read-interval 10 -a sleep 5
[ perf record: Woken up 501 times to write data ]
[ perf record: Captured and wrote 0.376 MB perf.data (816 samples) ]
# perf report -D | tail
SAMPLE events: 816
MMAP2 events: 1787
FINISHED_ROUND events: 330
STAT_READ events: 1002
cycles stats:
TOTAL events: 816
SAMPLE events: 816
uncore_imc_0/cas_count_read/N stats:
TOTAL events: 1002
STAT_READ events: 1002

Example 2:
$ perf record -e '{cycles,instructions}:N' -a sleep 2
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.247 MB perf.data (209 samples) ]

$ perf report --stdio --group --socket
# To display the perf.data header info, please use
--header/--header-only options.
#
# Samples: 209 of event 'anon group { cycles, instructions }'
# Event count (approx.): 41787924
#
# Socket: 0
#
# Performance counter stats:
# instructions 2745005
#
# Overhead Command Shared Object Symbol
# ................ ............ .................
.................................
#
8.06% 0.00% swapper [kernel.vmlinux] [k] ixgbe_read_reg
2.03% 0.00% sleep [kernel.vmlinux] [k]
__rb_insert_augmented
1.98% 0.00% swapper [kernel.vmlinux] [k]
run_timer_softirq

# Socket: 1
#
# Performance counter stats:
# instructions 6386942
#
# Overhead Command Shared Object Symbol
# ................ ............ .................
.................................
#
27.09% 0.00% kworker/23:2 [kernel.vmlinux] [k] delay_tsc
17.53% 0.00% perf [kernel.vmlinux] [k]
smp_call_function_single
13.62% 0.00% kworker/23:2 [kernel.vmlinux] [k] ixgbe_read_reg

Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/builtin-report.c | 101 +++++++++++++++++++++++++++++++++++++++++---
tools/perf/util/cpumap.c | 4 +-
tools/perf/util/cpumap.h | 1 +
3 files changed, 97 insertions(+), 9 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 6fdf9f4..4794a83 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -182,6 +182,34 @@ out_put:
return ret;
}

+static int process_stat_read_event(struct perf_tool *tool __maybe_unused,
+ union perf_event *event,
+ struct perf_session *session)
+{
+ struct perf_evlist *evlist = session->evlist;
+ struct perf_evsel *evsel = perf_evlist__first(evlist);
+ int id = event->stat_read.pos_id;
+
+ while (id--) {
+ BUG_ON(evsel == NULL);
+ evsel = list_next_entry(evsel, node);
+ }
+
+ if ((evsel->counts == NULL) &&
+ (perf_evsel__alloc_counts(evsel, max_cpu_num, 1) < 0))
+ return -ENOMEM;
+
+ perf_counts(evsel->counts, event->stat_read.cpu, 0)->val += event->stat_read.value;
+ hists__inc_nr_events(evsel__hists(evsel), event->header.type);
+
+ dump_printf(": %s CPU %d: value %" PRIu64 " time: %" PRIu64 "\n",
+ evsel ? perf_evsel__name(evsel) : "FAIL",
+ event->stat_read.cpu,
+ event->stat_read.value,
+ event->stat_read.time);
+ return 0;
+}
+
static int process_read_event(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample __maybe_unused,
@@ -329,6 +357,47 @@ static size_t hists__fprintf_nr_sample_events(struct hists *hists, struct report
return ret + fprintf(fp, "\n#\n");
}

+static inline void _stat_fprintf(FILE *fp, struct perf_evsel *evsel,
+ int socket, bool print_head)
+{
+ int i;
+ u64 stat_count = 0;
+
+ if (print_head)
+ fprintf(fp, "# Performance counter stats:\n");
+
+ for (i = 0; i < max_cpu_num; i++) {
+ if ((socket >= 0) && (cpu__get_socket(i) != socket))
+ continue;
+ stat_count += perf_counts(evsel->counts, i, 0)->val;
+ }
+
+ fprintf(fp, "# %s\t %" PRIu64, evsel->name, stat_count);
+}
+
+static void stat_fprintf(FILE *fp, struct perf_evsel *evsel, int socket)
+{
+ struct perf_evsel *leader = evsel->leader;
+ struct perf_evsel *pos;
+ bool print_head = true;
+
+ if (!symbol_conf.event_group &&
+ (evsel->counts == NULL))
+ return;
+
+ if (symbol_conf.event_group) {
+ for_each_group_member(pos, leader) {
+ if (pos->counts == NULL)
+ continue;
+ _stat_fprintf(fp, pos, socket, print_head);
+ print_head = false;
+ }
+ } else
+ _stat_fprintf(fp, evsel, socket, print_head);
+
+ fprintf(fp, "\n#\n");
+}
+
static int perf_evlist__tty_browse_hists(struct perf_evlist *evlist,
struct report *rep,
const char *help)
@@ -344,17 +413,23 @@ static int perf_evlist__tty_browse_hists(struct perf_evlist *evlist,
!perf_evsel__is_group_leader(pos))
continue;

- hists__fprintf_nr_sample_events(hists, rep, evname, stdout);
+ if (pos->counts == NULL)
+ hists__fprintf_nr_sample_events(hists, rep, evname, stdout);

if (symbol_conf.socket) {
int i;

for (i = 0; i < max_socket_num; i++) {
- fprintf(stdout, "#\n# Socket: %d\n#\n", i);
- hists__fprintf(hists, true, 0, 0, rep->min_percent, stdout, i);
+ fprintf(stdout, "\n# Socket: %d\n#\n", i);
+ stat_fprintf(stdout, pos, i);
+ if (pos->counts == NULL)
+ hists__fprintf(hists, true, 0, 0, rep->min_percent, stdout, i);
}
- } else
- hists__fprintf(hists, true, 0, 0, rep->min_percent, stdout, -1);
+ } else {
+ stat_fprintf(stdout, pos, -1);
+ if (pos->counts == NULL)
+ hists__fprintf(hists, true, 0, 0, rep->min_percent, stdout, -1);
+ }

fprintf(stdout, "\n\n");
}
@@ -611,6 +686,17 @@ parse_percent_limit(const struct option *opt, const char *str,
return 0;
}

+static void
+free_stat_counts(struct perf_evlist *evlist)
+{
+ struct perf_evsel *evsel;
+
+ evlist__for_each(evlist, evsel) {
+ if (evsel->counts != NULL)
+ perf_evsel__free_counts(evsel);
+ }
+}
+
int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
{
struct perf_session *session;
@@ -634,6 +720,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
.fork = perf_event__process_fork,
.lost = perf_event__process_lost,
.read = process_read_event,
+ .stat_read = process_stat_read_event,
.attr = perf_event__process_attr,
.tracing_data = perf_event__process_tracing_data,
.build_id = perf_event__process_build_id,
@@ -920,18 +1007,20 @@ repeat:
}

sort__setup_elide(stdout);
-
+ set_max_cpu_num();
if (symbol_conf.socket)
set_max_socket_num();

ret = __cmd_report(&report);
if (ret == K_SWITCH_INPUT_DATA) {
+ free_stat_counts(session->evlist);
perf_session__delete(session);
goto repeat;
} else
ret = 0;

error:
+ free_stat_counts(session->evlist);
perf_session__delete(session);
return ret;
}
diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c
index ae03426..9ce0f2a 100644
--- a/tools/perf/util/cpumap.c
+++ b/tools/perf/util/cpumap.c
@@ -380,7 +380,7 @@ out:
}

/* Determine highest possible cpu in the system for sparse allocation */
-static void set_max_cpu_num(void)
+void set_max_cpu_num(void)
{
const char *mnt;
char path[PATH_MAX];
@@ -411,8 +411,6 @@ void set_max_socket_num(void)
{
int cpu, socket;

- set_max_cpu_num();
-
max_socket_num = 1;
for (cpu = 0; cpu < max_cpu_num; cpu++) {
socket = cpu__get_socket(cpu);
diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h
index 094edd9..ef2ca6e 100644
--- a/tools/perf/util/cpumap.h
+++ b/tools/perf/util/cpumap.h
@@ -61,6 +61,7 @@ int *cpunode_map;

int cpu__setup_cpunode_map(void);
void set_max_socket_num(void);
+void set_max_cpu_num(void);

static inline int cpu__max_node(void)
{
--
1.8.3.1

2015-08-18 17:36:45

by Stephane Eranian

[permalink] [raw]
Subject: Re: [PATCH RFC 03/10] perf,tools: support option --socket

On Tue, Aug 18, 2015 at 2:25 AM, <[email protected]> wrote:
> From: Kan Liang <[email protected]>
>
> Introduce a new option for perf report to show the event information in
> the same socket together.
> When this option is set, perf report will force to sort by socket.
>
I would say processor socket, otherwise it is a bit confusing. At
first, I thought
you were talking about network sockets.

> $ perf report --stdio --socket
> # To display the perf.data header info, please use
> --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 686 of event 'cycles'
> # Event count (approx.): 349215462
> #
> #
> # Socket: 0
> #
Processor Socket:

> # Overhead Command Shared Object Symbol
> # ........ ......... ................
> .................................
> #
> 97.05% test test [.] plusB_c
> 0.98% test test [.] plusA_c
> 0.16% test [kernel.vmlinux] [k] add_mm_counter_fast
> 0.15% swapper [kernel.vmlinux] [k] note_gp_changes
> 0.15% perf [kernel.vmlinux] [k] unmap_single_vma
> 0.06% perf [kernel.vmlinux] [k] run_timer_softirq
> 0.00% swapper [kernel.vmlinux] [k] native_write_msr
> #
> # Socket: 1
> #
> # Overhead Command Shared Object Symbol
> # ........ ......... ................
> .................................
> #
> 0.93% perf [kernel.vmlinux] [k] smp_call_function_single
> 0.19% perf [kernel.vmlinux] [k] page_fault
> 0.19% swapper [kernel.vmlinux] [k] pm_qos_request
> 0.12% rcu_sched [kernel.vmlinux] [k]
> dyntick_save_progress_counter
> 0.00% swapper [kernel.vmlinux] [k] wake_up_process
> 0.00% swapper [kernel.vmlinux] [k] __do_softirq
> 0.00% swapper [kernel.vmlinux] [k] run_timer_softirq
> 0.00% swapper [kernel.vmlinux] [k] native_write_msr
> 0.00% perf [kernel.vmlinux] [k] native_write_msr
>
> Signed-off-by: Kan Liang <[email protected]>
> ---
> tools/perf/Documentation/perf-report.txt | 3 +++
> tools/perf/builtin-diff.c | 2 +-
> tools/perf/builtin-report.c | 17 ++++++++++++++++-
> tools/perf/builtin-top.c | 2 +-
> tools/perf/ui/stdio/hist.c | 14 ++++++++++----
> tools/perf/util/cpumap.c | 14 ++++++++++++++
> tools/perf/util/cpumap.h | 2 ++
> tools/perf/util/hist.h | 2 +-
> tools/perf/util/sort.c | 6 ++++++
> tools/perf/util/symbol.h | 1 +
> 10 files changed, 55 insertions(+), 8 deletions(-)
>
> diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
> index ca76b0d..49a42e4 100644
> --- a/tools/perf/Documentation/perf-report.txt
> +++ b/tools/perf/Documentation/perf-report.txt
> @@ -293,6 +293,9 @@ OPTIONS
> --group::
> Show event group information together.
>
> +--socket::
> + Show event information in the same socket together.
> +
> --demangle::
> Demangle symbol names to human readable form. It's enabled by default,
> disable with --no-demangle.
> diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
> index 0b180a8..0dfd91f 100644
> --- a/tools/perf/builtin-diff.c
> +++ b/tools/perf/builtin-diff.c
> @@ -667,7 +667,7 @@ static void hists__process(struct hists *hists)
> hists__precompute(hists);
> hists__output_resort(hists, NULL);
>
> - hists__fprintf(hists, true, 0, 0, 0, stdout);
> + hists__fprintf(hists, true, 0, 0, 0, stdout, -1);
> }
>
> static void data__fprintf(void)
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index 62b285e..6fdf9f4 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -345,7 +345,17 @@ static int perf_evlist__tty_browse_hists(struct perf_evlist *evlist,
> continue;
>
> hists__fprintf_nr_sample_events(hists, rep, evname, stdout);
> - hists__fprintf(hists, true, 0, 0, rep->min_percent, stdout);
> +
> + if (symbol_conf.socket) {
> + int i;
> +
> + for (i = 0; i < max_socket_num; i++) {
> + fprintf(stdout, "#\n# Socket: %d\n#\n", i);
> + hists__fprintf(hists, true, 0, 0, rep->min_percent, stdout, i);
> + }
> + } else
> + hists__fprintf(hists, true, 0, 0, rep->min_percent, stdout, -1);
> +
> fprintf(stdout, "\n\n");
> }
>
> @@ -724,6 +734,8 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
> "Show a column with the sum of periods"),
> OPT_BOOLEAN(0, "group", &symbol_conf.event_group,
> "Show event group information together"),
> + OPT_BOOLEAN(0, "socket", &symbol_conf.socket,
> + "Show event information in the same socket together"),
> OPT_CALLBACK_NOOPT('b', "branch-stack", &branch_mode, "",
> "use branch records for per branch histogram filling",
> parse_branch_mode),
> @@ -909,6 +921,9 @@ repeat:
>
> sort__setup_elide(stdout);
>
> + if (symbol_conf.socket)
> + set_max_socket_num();
> +
> ret = __cmd_report(&report);
> if (ret == K_SWITCH_INPUT_DATA) {
> perf_session__delete(session);
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index bfe24f1..deffe44 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -295,7 +295,7 @@ static void perf_top__print_sym_table(struct perf_top *top)
> hists__output_recalc_col_len(hists, top->print_entries - printed);
> putchar('\n');
> hists__fprintf(hists, false, top->print_entries - printed, win_width,
> - top->min_percent, stdout);
> + top->min_percent, stdout, -1);
> }
>
> static void prompt_integer(int *target, const char *msg)
> diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
> index dfcbc90..2f512b8 100644
> --- a/tools/perf/ui/stdio/hist.c
> +++ b/tools/perf/ui/stdio/hist.c
> @@ -323,7 +323,8 @@ static int hist_entry__snprintf(struct hist_entry *he, struct perf_hpp *hpp)
> return 0;
>
> perf_hpp__for_each_format(fmt) {
> - if (perf_hpp__should_skip(fmt))
> + if (perf_hpp__should_skip(fmt) ||
> + !strcmp(fmt->name, "SOCKET"))
> continue;
>
> /*
> @@ -371,7 +372,7 @@ static int hist_entry__fprintf(struct hist_entry *he, size_t size,
> }
>
> size_t hists__fprintf(struct hists *hists, bool show_header, int max_rows,
> - int max_cols, float min_pcnt, FILE *fp)
> + int max_cols, float min_pcnt, FILE *fp, int socket)
> {
> struct perf_hpp_fmt *fmt;
> struct rb_node *nd;
> @@ -402,7 +403,8 @@ size_t hists__fprintf(struct hists *hists, bool show_header, int max_rows,
> fprintf(fp, "# ");
>
> perf_hpp__for_each_format(fmt) {
> - if (perf_hpp__should_skip(fmt))
> + if (perf_hpp__should_skip(fmt) ||
> + !strcmp(fmt->name, "SOCKET"))
> continue;
>
> if (!first)
> @@ -428,7 +430,8 @@ size_t hists__fprintf(struct hists *hists, bool show_header, int max_rows,
> perf_hpp__for_each_format(fmt) {
> unsigned int i;
>
> - if (perf_hpp__should_skip(fmt))
> + if (perf_hpp__should_skip(fmt) ||
> + !strcmp(fmt->name, "SOCKET"))
> continue;
>
> if (!first)
> @@ -465,6 +468,9 @@ print_entries:
> if (h->filtered)
> continue;
>
> + if ((socket >= 0) && (cpu__get_socket(h->cpu) != socket))
> + continue;
> +
> percent = hist_entry__get_percent_limit(h);
> if (percent < min_pcnt)
> continue;
> diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c
> index 7f25e6c..ae03426 100644
> --- a/tools/perf/util/cpumap.c
> +++ b/tools/perf/util/cpumap.c
> @@ -407,6 +407,20 @@ out:
> pr_err("Failed to read max cpus, using default of %d\n", max_cpu_num);
> }
>
> +void set_max_socket_num(void)
> +{
> + int cpu, socket;
> +
> + set_max_cpu_num();
> +
> + max_socket_num = 1;
> + for (cpu = 0; cpu < max_cpu_num; cpu++) {
> + socket = cpu__get_socket(cpu);
> + if (max_socket_num < (socket + 1))
> + max_socket_num = socket + 1;
> + }
> +}
> +
> /* Determine highest possible node in the system for sparse allocation */
> static void set_max_node_num(void)
> {
> diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h
> index effb56e..094edd9 100644
> --- a/tools/perf/util/cpumap.h
> +++ b/tools/perf/util/cpumap.h
> @@ -56,9 +56,11 @@ static inline bool cpu_map__empty(const struct cpu_map *map)
>
> int max_cpu_num;
> int max_node_num;
> +int max_socket_num;
> int *cpunode_map;
>
> int cpu__setup_cpunode_map(void);
> +void set_max_socket_num(void);
>
> static inline int cpu__max_node(void)
> {
> diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
> index af80fb5..b188a62 100644
> --- a/tools/perf/util/hist.h
> +++ b/tools/perf/util/hist.h
> @@ -139,7 +139,7 @@ void events_stats__inc(struct events_stats *stats, u32 type);
> size_t events_stats__fprintf(struct events_stats *stats, FILE *fp);
>
> size_t hists__fprintf(struct hists *hists, bool show_header, int max_rows,
> - int max_cols, float min_pcnt, FILE *fp);
> + int max_cols, float min_pcnt, FILE *fp, int socket);
> size_t perf_evlist__fprintf_nr_events(struct perf_evlist *evlist, FILE *fp);
>
> void hists__filter_by_dso(struct hists *hists);
> diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
> index 245e254..70f0526 100644
> --- a/tools/perf/util/sort.c
> +++ b/tools/perf/util/sort.c
> @@ -1930,6 +1930,12 @@ int setup_sorting(void)
> return err;
> }
>
> + if (symbol_conf.socket) {
> + err = sort_dimension__add("socket");
> + if (err < 0)
> + return err;
> + }
> +
> reset_dimensions();
>
> /*
> diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> index a4cde92..7c153c7 100644
> --- a/tools/perf/util/symbol.h
> +++ b/tools/perf/util/symbol.h
> @@ -101,6 +101,7 @@ struct symbol_conf {
> annotate_asm_raw,
> annotate_src,
> event_group,
> + socket,
> demangle,
> demangle_kernel,
> filter_relative,
> --
> 1.8.3.1
>

2015-08-20 08:57:40

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH RFC 01/10] perf,tools: open event on evsel cpus and threads

On Tue, Aug 18, 2015 at 05:25:37AM -0400, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> evsel may have different cpus and threads as evlist's.
> Use it's own cpus and threads, when open evsel in perf record.
>
> Signed-off-by: Kan Liang <[email protected]>
> ---
> tools/perf/builtin-record.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 25cf6b4..a0178bf 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -279,7 +279,7 @@ static int record__open(struct record *rec)
>
> evlist__for_each(evlist, pos) {
> try_again:
> - if (perf_evsel__open(pos, evlist->cpus, evlist->threads) < 0) {
> + if (perf_evsel__open(pos, pos->cpus, pos->threads) < 0) {
> if (perf_evsel__fallback(pos, errno, msg, sizeof(msg))) {
> if (verbose)
> ui__warning("%s\n", msg);
> --
> 1.8.3.1
>

dont we need then handle filters the same way?
like in attached change? totally untested..

jirka


---
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 7aa039bd379a..f5cdf678d504 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -244,12 +244,9 @@ static void handle_initial_delay(void)
struct perf_evsel *counter;

if (initial_delay) {
- const int ncpus = cpu_map__nr(evsel_list->cpus),
- nthreads = thread_map__nr(evsel_list->threads);
-
usleep(initial_delay * 1000);
evlist__for_each(evsel_list, counter)
- perf_evsel__enable(counter, ncpus, nthreads);
+ perf_evsel__enable(counter);
}
}

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 373f65b02545..0d8428458ae1 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1172,14 +1172,12 @@ int perf_evlist__apply_filters(struct perf_evlist *evlist, struct perf_evsel **e
{
struct perf_evsel *evsel;
int err = 0;
- const int ncpus = cpu_map__nr(evlist->cpus),
- nthreads = thread_map__nr(evlist->threads);

evlist__for_each(evlist, evsel) {
if (evsel->filter == NULL)
continue;

- err = perf_evsel__apply_filter(evsel, ncpus, nthreads, evsel->filter);
+ err = perf_evsel__apply_filter(evsel, evsel->filter);
if (err) {
*err_evsel = evsel;
break;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index b096ef7a240c..e1e70c55c68d 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -898,9 +898,10 @@ static int perf_evsel__alloc_fd(struct perf_evsel *evsel, int ncpus, int nthread
return evsel->fd != NULL ? 0 : -ENOMEM;
}

-static int perf_evsel__run_ioctl(struct perf_evsel *evsel, int ncpus, int nthreads,
- int ioc, void *arg)
+static int perf_evsel__run_ioctl(struct perf_evsel *evsel, int ioc, void *arg)
{
+ int ncpus = cpu_map__nr(evsel->cpus),
+ nthreads = thread_map__nr(evsel->threads);
int cpu, thread;

if (evsel->system_wide)
@@ -919,11 +920,9 @@ static int perf_evsel__run_ioctl(struct perf_evsel *evsel, int ncpus, int nthrea
return 0;
}

-int perf_evsel__apply_filter(struct perf_evsel *evsel, int ncpus, int nthreads,
- const char *filter)
+int perf_evsel__apply_filter(struct perf_evsel *evsel, const char *filter)
{
- return perf_evsel__run_ioctl(evsel, ncpus, nthreads,
- PERF_EVENT_IOC_SET_FILTER,
+ return perf_evsel__run_ioctl(evsel, PERF_EVENT_IOC_SET_FILTER,
(void *)filter);
}

@@ -957,11 +956,9 @@ int perf_evsel__append_filter(struct perf_evsel *evsel,
return -1;
}

-int perf_evsel__enable(struct perf_evsel *evsel, int ncpus, int nthreads)
+int perf_evsel__enable(struct perf_evsel *evsel)
{
- return perf_evsel__run_ioctl(evsel, ncpus, nthreads,
- PERF_EVENT_IOC_ENABLE,
- 0);
+ return perf_evsel__run_ioctl(evsel, PERF_EVENT_IOC_ENABLE, 0);
}

int perf_evsel__alloc_id(struct perf_evsel *evsel, int ncpus, int nthreads)
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 93ac6b128149..bb5b9c1d482b 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -214,9 +214,8 @@ void perf_evsel__set_sample_id(struct perf_evsel *evsel,
int perf_evsel__set_filter(struct perf_evsel *evsel, const char *filter);
int perf_evsel__append_filter(struct perf_evsel *evsel,
const char *op, const char *filter);
-int perf_evsel__apply_filter(struct perf_evsel *evsel, int ncpus, int nthreads,
- const char *filter);
-int perf_evsel__enable(struct perf_evsel *evsel, int ncpus, int nthreads);
+int perf_evsel__apply_filter(struct perf_evsel *evsel, const char *filter);
+int perf_evsel__enable(struct perf_evsel *evsel);

int perf_evsel__open_per_cpu(struct perf_evsel *evsel,
struct cpu_map *cpus);

2015-08-20 09:09:18

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH RFC 02/10] perf,tools: Support new sort type --socket

On Tue, Aug 18, 2015 at 05:25:38AM -0400, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> This patch enable perf report to sort by socket
>

SNIP

> diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
> index 7e38716..245e254 100644
> --- a/tools/perf/util/sort.c
> +++ b/tools/perf/util/sort.c
> @@ -421,6 +421,33 @@ struct sort_entry sort_cpu = {
> .se_width_idx = HISTC_CPU,
> };
>
> +/* --sort socket */
> +
> +static int64_t
> +sort__socket_cmp(struct hist_entry *left, struct hist_entry *right)
> +{
> + int r_socket, l_socket;
> +
> + r_socket = cpu__get_socket(right->cpu);
> + l_socket = cpu__get_socket(left->cpu);
> + return r_socket - l_socket;

we need global topology information in perf.data and use
the mapping from there, we can't use current server info

we currently store core_siblings_list and thread_siblings_list,
in topology FEATURE, which is probably not enough

I think we need new feature that stores topology info and
new interface that will provide all useful mappings:
idx -> cpu
cpu -> core
cpu -> socket
cpu -> node

in another patchset I used new CPUMAP event:
https://git.kernel.org/cgit/linux/kernel/git/jolsa/perf.git/commit/?h=perf/stat_script_3&id=37b7b8449aa23acdfe9dec5a7a371e91c5323da5

we might need both ways (new FEATURE and event) to support pipe reports

jirka

2015-08-20 09:30:33

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH RFC 03/10] perf,tools: support option --socket

On Tue, Aug 18, 2015 at 05:25:39AM -0400, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> Introduce a new option for perf report to show the event information in
> the same socket together.
> When this option is set, perf report will force to sort by socket.
>
> $ perf report --stdio --socket
> # To display the perf.data header info, please use
> --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 686 of event 'cycles'
> # Event count (approx.): 349215462
> #
> #
> # Socket: 0
> #
> # Overhead Command Shared Object Symbol
> # ........ ......... ................
> .................................
> #
> 97.05% test test [.] plusB_c
> 0.98% test test [.] plusA_c
> 0.16% test [kernel.vmlinux] [k] add_mm_counter_fast
> 0.15% swapper [kernel.vmlinux] [k] note_gp_changes
> 0.15% perf [kernel.vmlinux] [k] unmap_single_vma
> 0.06% perf [kernel.vmlinux] [k] run_timer_softirq
> 0.00% swapper [kernel.vmlinux] [k] native_write_msr
> #
> # Socket: 1
> #
> # Overhead Command Shared Object Symbol
> # ........ ......... ................
> .................................
> #
> 0.93% perf [kernel.vmlinux] [k] smp_call_function_single
> 0.19% perf [kernel.vmlinux] [k] page_fault
> 0.19% swapper [kernel.vmlinux] [k] pm_qos_request
> 0.12% rcu_sched [kernel.vmlinux] [k]
> dyntick_save_progress_counter
> 0.00% swapper [kernel.vmlinux] [k] wake_up_process
> 0.00% swapper [kernel.vmlinux] [k] __do_softirq
> 0.00% swapper [kernel.vmlinux] [k] run_timer_softirq
> 0.00% swapper [kernel.vmlinux] [k] native_write_msr
> 0.00% perf [kernel.vmlinux] [k] native_write_msr

nice, but should this be handled by HIST_FILTER__* stuff?

also having generic ability to 'zoom' in TUI for SOCKET/CPU
would be great.. it's already there for thread and dso

jirka

2015-08-20 09:40:04

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH RFC 05/10] perf,tools: Enable statistic read for perf record

On Tue, Aug 18, 2015 at 05:25:41AM -0400, [email protected] wrote:

SNIP

> evlist__for_each(evlist, pos) {
> +
> + if (pos->stat_read) {
> + if (!target__has_cpu(&opts->target)) {
> + pos->stat_read = 0;
> + ui__warning("Statistics read only available "
> + "on system-wide/CPU mode. "
> + "Remove :N modifier for event %s\n",
> + pos->name);
> + goto out;
> + }
> + /* Don't do stat read for Group leader */
> + if ((pos->leader == pos) && (pos->leader->nr_members > 1))
> + pos->stat_read = 0;
> + else {
> + if (first == pos) {
> + pos->stat_read = 0;
> + ui__warning("The first event cannot "
> + "be stat read event\n");
> + goto out;
> + }
> + attr = &pos->attr;
> + attr->read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
> + PERF_FORMAT_TOTAL_TIME_RUNNING;
> + attr->sample_freq = 0;
> + attr->sample_period = 0;
> + attr->sample_type = 0;
> + pos->sample_size = 0;
> + }
> + }

seems like this should go under perf_evsel__config

jirka

2015-08-20 09:41:26

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH RFC 05/10] perf,tools: Enable statistic read for perf record

On Tue, Aug 18, 2015 at 05:25:41AM -0400, [email protected] wrote:

SNIP

> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 373f65b..ca7bf8d 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -854,6 +854,9 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
> if (evsel->system_wide && thread)
> continue;
>
> + if (evsel->stat_read)
> + continue;
> +

how about filters (perf_evlist__apply_filters) ... should be disabled as well IIUIC

jirka

2015-08-20 09:50:04

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH RFC 06/10] perf,tools: New RECORD type PERF_RECORD_STAT_READ

On Tue, Aug 18, 2015 at 05:25:42AM -0400, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> Introduce a new user RECORD type PERF_RECORD_STAT_READ to store the
> stat event read result.
>
> Signed-off-by: Kan Liang <[email protected]>
> ---
> tools/perf/util/event.c | 1 +
> tools/perf/util/event.h | 10 ++++++++++
> tools/perf/util/session.c | 15 +++++++++++++++
> tools/perf/util/tool.h | 1 +
> 4 files changed, 27 insertions(+)
>
> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
> index 7ff6127..601f4be 100644
> --- a/tools/perf/util/event.c
> +++ b/tools/perf/util/event.c
> @@ -37,6 +37,7 @@ static const char *perf_event__names[] = {
> [PERF_RECORD_AUXTRACE_INFO] = "AUXTRACE_INFO",
> [PERF_RECORD_AUXTRACE] = "AUXTRACE",
> [PERF_RECORD_AUXTRACE_ERROR] = "AUXTRACE_ERROR",
> + [PERF_RECORD_STAT_READ] = "STAT_READ",
> };
>
> const char *perf_event__name(unsigned int id)
> diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
> index f729df5..f6932cf 100644
> --- a/tools/perf/util/event.h
> +++ b/tools/perf/util/event.h
> @@ -226,6 +226,7 @@ enum perf_user_event_type { /* above any possible kernel type */
> PERF_RECORD_AUXTRACE_INFO = 70,
> PERF_RECORD_AUXTRACE = 71,
> PERF_RECORD_AUXTRACE_ERROR = 72,
> + PERF_RECORD_STAT_READ = 73,
> PERF_RECORD_HEADER_MAX
> };
>
> @@ -355,6 +356,14 @@ struct context_switch_event {
> u32 next_prev_tid;
> };
>
> +struct stat_read_event {
> + struct perf_event_header header;
> + u64 value; /* counter value delta */
> + u32 cpu;
> + u32 pos_id; /* event position in evlist */
> + u64 time; /* time stamp */
> +};

could we make this more generic like:

struct stat_event {
struct perf_event_header header;

u64 id;
u32 cpu;
u32 thread;

union {
struct {
u64 val;
u64 ena;
u64 run;
};
u64 values[3];
};
};


I'd agreed with the time to be added, but I think it should
contains raw values, not deltas (and all 3 them: val, ena, run)

also I haven't got the value of the pos_id, so far seems
like this could be obtained in report time and does not
need to be part of the event


thanks,
jirka

2015-08-20 17:24:53

by Liang, Kan

[permalink] [raw]
Subject: RE: [PATCH RFC 01/10] perf,tools: open event on evsel cpus and threads

>
> On Tue, Aug 18, 2015 at 05:25:37AM -0400, [email protected] wrote:
> > From: Kan Liang <[email protected]>
> >
> > evsel may have different cpus and threads as evlist's.
> > Use it's own cpus and threads, when open evsel in perf record.
> >
> > Signed-off-by: Kan Liang <[email protected]>
> > ---
> > tools/perf/builtin-record.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > index 25cf6b4..a0178bf 100644
> > --- a/tools/perf/builtin-record.c
> > +++ b/tools/perf/builtin-record.c
> > @@ -279,7 +279,7 @@ static int record__open(struct record *rec)
> >
> > evlist__for_each(evlist, pos) {
> > try_again:
> > - if (perf_evsel__open(pos, evlist->cpus, evlist->threads) <
> 0) {
> > + if (perf_evsel__open(pos, pos->cpus, pos->threads) < 0) {
> > if (perf_evsel__fallback(pos, errno, msg,
> sizeof(msg))) {
> > if (verbose)
> > ui__warning("%s\n", msg);
> > --
> > 1.8.3.1
> >
>
> dont we need then handle filters the same way?
> like in attached change? totally untested..

Filters look only work for tracepoint event, which doesn't have cpu limit.
So evlist and evsel should always be same.
I think we don't need to change it.

>
> jirka
>
>
> ---
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 7aa039bd379a..f5cdf678d504 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -244,12 +244,9 @@ static void handle_initial_delay(void)
> struct perf_evsel *counter;
>
> if (initial_delay) {
> - const int ncpus = cpu_map__nr(evsel_list->cpus),
> - nthreads = thread_map__nr(evsel_list->threads);
> -
> usleep(initial_delay * 1000);
> evlist__for_each(evsel_list, counter)
> - perf_evsel__enable(counter, ncpus, nthreads);
> + perf_evsel__enable(counter);
> }
> }
>

Agree, we need to use evsel's cpu and threads here.
What about the code as below? It should be simpler.
+ perf_evsel__enable(counter, cpu_map__nr(counter->cpus), thread_map__nr(counter->threads));


Thanks,
Kan

2015-08-21 07:43:23

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH RFC 01/10] perf,tools: open event on evsel cpus and threads

On Thu, Aug 20, 2015 at 05:24:32PM +0000, Liang, Kan wrote:
> >
> > On Tue, Aug 18, 2015 at 05:25:37AM -0400, [email protected] wrote:
> > > From: Kan Liang <[email protected]>
> > >
> > > evsel may have different cpus and threads as evlist's.
> > > Use it's own cpus and threads, when open evsel in perf record.
> > >
> > > Signed-off-by: Kan Liang <[email protected]>
> > > ---
> > > tools/perf/builtin-record.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > > index 25cf6b4..a0178bf 100644
> > > --- a/tools/perf/builtin-record.c
> > > +++ b/tools/perf/builtin-record.c
> > > @@ -279,7 +279,7 @@ static int record__open(struct record *rec)
> > >
> > > evlist__for_each(evlist, pos) {
> > > try_again:
> > > - if (perf_evsel__open(pos, evlist->cpus, evlist->threads) <
> > 0) {
> > > + if (perf_evsel__open(pos, pos->cpus, pos->threads) < 0) {
> > > if (perf_evsel__fallback(pos, errno, msg,
> > sizeof(msg))) {
> > > if (verbose)
> > > ui__warning("%s\n", msg);
> > > --
> > > 1.8.3.1
> > >
> >
> > dont we need then handle filters the same way?
> > like in attached change? totally untested..
>
> Filters look only work for tracepoint event, which doesn't have cpu limit.
> So evlist and evsel should always be same.
> I think we don't need to change it.

right.. at least please make a comment about that

>
> >
> > jirka
> >
> >
> > ---
> > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> > index 7aa039bd379a..f5cdf678d504 100644
> > --- a/tools/perf/builtin-stat.c
> > +++ b/tools/perf/builtin-stat.c
> > @@ -244,12 +244,9 @@ static void handle_initial_delay(void)
> > struct perf_evsel *counter;
> >
> > if (initial_delay) {
> > - const int ncpus = cpu_map__nr(evsel_list->cpus),
> > - nthreads = thread_map__nr(evsel_list->threads);
> > -
> > usleep(initial_delay * 1000);
> > evlist__for_each(evsel_list, counter)
> > - perf_evsel__enable(counter, ncpus, nthreads);
> > + perf_evsel__enable(counter);
> > }
> > }
> >
>
> Agree, we need to use evsel's cpu and threads here.
> What about the code as below? It should be simpler.
> + perf_evsel__enable(counter, cpu_map__nr(counter->cpus), thread_map__nr(counter->threads));
>

ok, maybe I'll submit that patch as a cleanup,
it seems more sane to use evsel cpus and threads
now that we always have it there

jirka

2015-08-21 13:24:18

by Liang, Kan

[permalink] [raw]
Subject: RE: [PATCH RFC 01/10] perf,tools: open event on evsel cpus and threads

>
> On Thu, Aug 20, 2015 at 05:24:32PM +0000, Liang, Kan wrote:
> > >
> > > On Tue, Aug 18, 2015 at 05:25:37AM -0400, [email protected] wrote:
> > > > From: Kan Liang <[email protected]>
> > > >
> > > > evsel may have different cpus and threads as evlist's.
> > > > Use it's own cpus and threads, when open evsel in perf record.
> > > >
> > > > Signed-off-by: Kan Liang <[email protected]>
> > > > ---
> > > > tools/perf/builtin-record.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/tools/perf/builtin-record.c
> > > > b/tools/perf/builtin-record.c index 25cf6b4..a0178bf 100644
> > > > --- a/tools/perf/builtin-record.c
> > > > +++ b/tools/perf/builtin-record.c
> > > > @@ -279,7 +279,7 @@ static int record__open(struct record *rec)
> > > >
> > > > evlist__for_each(evlist, pos) {
> > > > try_again:
> > > > - if (perf_evsel__open(pos, evlist->cpus, evlist->threads) <
> > > 0) {
> > > > + if (perf_evsel__open(pos, pos->cpus, pos->threads) < 0) {
> > > > if (perf_evsel__fallback(pos, errno, msg,
> > > sizeof(msg))) {
> > > > if (verbose)
> > > > ui__warning("%s\n", msg);
> > > > --
> > > > 1.8.3.1
> > > >
> > >
> > > dont we need then handle filters the same way?
> > > like in attached change? totally untested..
> >
> > Filters look only work for tracepoint event, which doesn't have cpu limit.
> > So evlist and evsel should always be same.
> > I think we don't need to change it.
>
> right.. at least please make a comment about that
>
OK. I will do that.

> >
> > >
> > > jirka
> > >
> > >
> > > ---
> > > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> > > index 7aa039bd379a..f5cdf678d504 100644
> > > --- a/tools/perf/builtin-stat.c
> > > +++ b/tools/perf/builtin-stat.c
> > > @@ -244,12 +244,9 @@ static void handle_initial_delay(void)
> > > struct perf_evsel *counter;
> > >
> > > if (initial_delay) {
> > > - const int ncpus = cpu_map__nr(evsel_list->cpus),
> > > - nthreads = thread_map__nr(evsel_list->threads);
> > > -
> > > usleep(initial_delay * 1000);
> > > evlist__for_each(evsel_list, counter)
> > > - perf_evsel__enable(counter, ncpus, nthreads);
> > > + perf_evsel__enable(counter);
> > > }
> > > }
> > >
> >
> > Agree, we need to use evsel's cpu and threads here.
> > What about the code as below? It should be simpler.
> > + perf_evsel__enable(counter,
> cpu_map__nr(counter->cpus),
> > +thread_map__nr(counter->threads));
> >
>
> ok, maybe I'll submit that patch as a cleanup, it seems more sane to use
> evsel cpus and threads now that we always have it there
>

I guess I will send out the builtin-record change and comments as a patch.
So you can either merge the cleanup code to that patch, or send out the
cleanup patch separately.
Either is fine for me.

Thanks,
Kan

2015-08-21 20:25:31

by Liang, Kan

[permalink] [raw]
Subject: RE: [PATCH RFC 02/10] perf,tools: Support new sort type --socket



> On Tue, Aug 18, 2015 at 05:25:38AM -0400, [email protected] wrote:
> > From: Kan Liang <[email protected]>
> >
> > This patch enable perf report to sort by socket
> >
>
> SNIP
>
> > diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c index
> > 7e38716..245e254 100644
> > --- a/tools/perf/util/sort.c
> > +++ b/tools/perf/util/sort.c
> > @@ -421,6 +421,33 @@ struct sort_entry sort_cpu = {
> > .se_width_idx = HISTC_CPU,
> > };
> >
> > +/* --sort socket */
> > +
> > +static int64_t
> > +sort__socket_cmp(struct hist_entry *left, struct hist_entry *right) {
> > + int r_socket, l_socket;
> > +
> > + r_socket = cpu__get_socket(right->cpu);
> > + l_socket = cpu__get_socket(left->cpu);
> > + return r_socket - l_socket;
>
> we need global topology information in perf.data and use the mapping
> from there, we can't use current server info
>
> we currently store core_siblings_list and thread_siblings_list, in topology
> FEATURE, which is probably not enough
>

core_siblings_list includes the cpu list in the same socket.
thread_siblings_list includes the cpu list in the same core.
numa_nodes includes the cpu list for each node.

It looks we have enough data from topology FEATURE.

What do you think about the function as below?
It gets the socket id from env.

+int
+perf_env_get_socket(struct perf_session_env *env, int cpu)
+{
+ int socket_nr, cpu_nr, i, j;
+ struct cpu_map *socket_map = NULL;
+ char *str;
+
+ if (env == NULL)
+ return -1;
+
+ socket_nr = env->nr_sibling_cores;
+ str = env->sibling_cores;
+
+ for (i = 0; i < socket_nr; i++) {
+ socket_map = cpu_map__new(str);
+ str += strlen(str) + 1;
+ if (!socket_map)
+ continue;
+ cpu_nr = socket_map->nr;
+ for (j = 0; j < cpu_nr; j++) {
+ if (cpu == socket_map->map[j]) {
+ free(socket_map);
+ return i;
+ }
+ }
+ free(socket_map);
+ }
+
+ return -1;
+}

Thanks,
Kan

> I think we need new feature that stores topology info and new interface
> that will provide all useful mappings:
> idx -> cpu
> cpu -> core
> cpu -> socket
> cpu -> node
>
> in another patchset I used new CPUMAP event:
> https://git.kernel.org/cgit/linux/kernel/git/jolsa/perf.git/commit/?h=perf/
> stat_script_3&id=37b7b8449aa23acdfe9dec5a7a371e91c5323da5
>
> we might need both ways (new FEATURE and event) to support pipe
> reports
>
> jirka

2015-08-23 22:00:18

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH RFC 02/10] perf,tools: Support new sort type --socket

On Fri, Aug 21, 2015 at 08:25:24PM +0000, Liang, Kan wrote:

SNIP

> >
> > we need global topology information in perf.data and use the mapping
> > from there, we can't use current server info
> >
> > we currently store core_siblings_list and thread_siblings_list, in topology
> > FEATURE, which is probably not enough
> >
>
> core_siblings_list includes the cpu list in the same socket.
> thread_siblings_list includes the cpu list in the same core.
> numa_nodes includes the cpu list for each node.
>
> It looks we have enough data from topology FEATURE.

hum, haven't hecked deeply.. how will you get core id for cpu?

>
> What do you think about the function as below?
> It gets the socket id from env.

some sort of caching would be nice, I guess we could
store those cpumap objects within perf_session_env

jirka

>
> +int
> +perf_env_get_socket(struct perf_session_env *env, int cpu)
> +{
> + int socket_nr, cpu_nr, i, j;
> + struct cpu_map *socket_map = NULL;
> + char *str;
> +
> + if (env == NULL)
> + return -1;
> +
> + socket_nr = env->nr_sibling_cores;
> + str = env->sibling_cores;
> +
> + for (i = 0; i < socket_nr; i++) {
> + socket_map = cpu_map__new(str);
> + str += strlen(str) + 1;
> + if (!socket_map)
> + continue;
> + cpu_nr = socket_map->nr;
> + for (j = 0; j < cpu_nr; j++) {
> + if (cpu == socket_map->map[j]) {
> + free(socket_map);
> + return i;
> + }
> + }
> + free(socket_map);
> + }
> +
> + return -1;
> +}
>
> Thanks,
> Kan
>
> > I think we need new feature that stores topology info and new interface
> > that will provide all useful mappings:
> > idx -> cpu
> > cpu -> core
> > cpu -> socket
> > cpu -> node
> >
> > in another patchset I used new CPUMAP event:
> > https://git.kernel.org/cgit/linux/kernel/git/jolsa/perf.git/commit/?h=perf/
> > stat_script_3&id=37b7b8449aa23acdfe9dec5a7a371e91c5323da5
> >
> > we might need both ways (new FEATURE and event) to support pipe
> > reports
> >
> > jirka

2015-08-24 14:22:16

by Liang, Kan

[permalink] [raw]
Subject: RE: [PATCH RFC 02/10] perf,tools: Support new sort type --socket

>
> On Fri, Aug 21, 2015 at 08:25:24PM +0000, Liang, Kan wrote:
>
> SNIP
>
> > >
> > > we need global topology information in perf.data and use the mapping
> > > from there, we can't use current server info
> > >
> > > we currently store core_siblings_list and thread_siblings_list, in
> > > topology FEATURE, which is probably not enough
> > >
> >
> > core_siblings_list includes the cpu list in the same socket.
> > thread_siblings_list includes the cpu list in the same core.
> > numa_nodes includes the cpu list for each node.
> >
> > It looks we have enough data from topology FEATURE.
>
> hum, haven't hecked deeply.. how will you get core id for cpu?
>

from thread_siblings_list.
I just noticed that svg_build_topology_map did the similar thing to
get topology map for timechart from perf header.

> >
> > What do you think about the function as below?
> > It gets the socket id from env.
>
> some sort of caching would be nice, I guess we could store those cpumap
> objects within perf_session_env

Yes it will be stored in perf_session_env.

Thanks,
Kan

2015-08-24 14:28:00

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH RFC 02/10] perf,tools: Support new sort type --socket

On Mon, Aug 24, 2015 at 02:22:08PM +0000, Liang, Kan wrote:
> >
> > On Fri, Aug 21, 2015 at 08:25:24PM +0000, Liang, Kan wrote:
> >
> > SNIP
> >
> > > >
> > > > we need global topology information in perf.data and use the mapping
> > > > from there, we can't use current server info
> > > >
> > > > we currently store core_siblings_list and thread_siblings_list, in
> > > > topology FEATURE, which is probably not enough
> > > >
> > >
> > > core_siblings_list includes the cpu list in the same socket.
> > > thread_siblings_list includes the cpu list in the same core.
> > > numa_nodes includes the cpu list for each node.
> > >
> > > It looks we have enough data from topology FEATURE.
> >
> > hum, haven't hecked deeply.. how will you get core id for cpu?
> >
>
> from thread_siblings_list.
> I just noticed that svg_build_topology_map did the similar thing to
> get topology map for timechart from perf header.

could you please provide both functions then cpu -> core, cpu -> socket

thanks,
jirka

2015-08-24 16:47:36

by Liang, Kan

[permalink] [raw]
Subject: RE: [PATCH RFC 02/10] perf,tools: Support new sort type --socket



> On Mon, Aug 24, 2015 at 02:22:08PM +0000, Liang, Kan wrote:
> > >
> > > On Fri, Aug 21, 2015 at 08:25:24PM +0000, Liang, Kan wrote:
> > >
> > > SNIP
> > >
> > > > >
> > > > > we need global topology information in perf.data and use the
> > > > > mapping from there, we can't use current server info
> > > > >
> > > > > we currently store core_siblings_list and thread_siblings_list,
> > > > > in topology FEATURE, which is probably not enough
> > > > >
> > > >
> > > > core_siblings_list includes the cpu list in the same socket.
> > > > thread_siblings_list includes the cpu list in the same core.
> > > > numa_nodes includes the cpu list for each node.
> > > >
> > > > It looks we have enough data from topology FEATURE.
> > >
> > > hum, haven't hecked deeply.. how will you get core id for cpu?
> > >
> >
> > from thread_siblings_list.
> > I just noticed that svg_build_topology_map did the similar thing to
> > get topology map for timechart from perf header.
>
> could you please provide both functions then cpu -> core, cpu -> socket
>

Do you mean something like this?
Store cpu->socket and cpu->core in perf_session_env.

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 179b2bd..a01c603 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -1590,10 +1596,17 @@ static int process_cpu_topology(struct perf_file_section *section __maybe_unused
u32 nr, i;
char *str;
struct strbuf sb;
+ int cpu_nr = ph->env.nr_cpus_online;
+ struct cpu_map *map;
+ int j;
+
+ ph->env.cpu = calloc(cpu_nr, sizeof(*ph->env.cpu));
+ if (!ph->env.cpu)
+ return -1;

ret = readn(fd, &nr, sizeof(nr));
if (ret != sizeof(nr))
- return -1;
+ goto free_cpu;

if (ph->needs_swap)
nr = bswap_32(nr);
@@ -1608,6 +1621,14 @@ static int process_cpu_topology(struct perf_file_section *section __maybe_unused

/* include a NULL character at the end */
strbuf_add(&sb, str, strlen(str) + 1);
+
+ map = cpu_map__new(str);
+ if (!map)
+ goto error;
+ for (j = 0; j < map->nr; j++) {
+ ph->env.cpu[map->map[j]].socket_id = i;
+ }
+ cpu_map__put(map);
free(str);
}
ph->env.sibling_cores = strbuf_detach(&sb, NULL);
@@ -1628,6 +1649,14 @@ static int process_cpu_topology(struct perf_file_section *section __maybe_unused

/* include a NULL character at the end */
strbuf_add(&sb, str, strlen(str) + 1);
+
+ map = cpu_map__new(str);
+ if (!map)
+ goto error;
+ for (j = 0; j < map->nr; j++) {
+ ph->env.cpu[map->map[j]].core_id = i;
+ }
+ cpu_map__put(map);
free(str);
}
ph->env.sibling_threads = strbuf_detach(&sb, NULL);
@@ -1635,6 +1664,8 @@ static int process_cpu_topology(struct perf_file_section *section __maybe_unused

error:
strbuf_release(&sb);
+free_cpu:
+ free(ph->env.cpu);
return -1;
}

diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index 9b53b65..8b8c4fc 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -66,6 +66,11 @@ struct perf_header;
int perf_file_header__read(struct perf_file_header *header,
struct perf_header *ph, int fd);

+struct cpu_topology_map {
+ int socket_id;
+ int core_id;
+};
+
struct perf_session_env {
char *hostname;
char *os_release;
@@ -89,6 +94,7 @@ struct perf_session_env {
char *sibling_threads;
char *numa_nodes;
char *pmu_mappings;
+ struct cpu_topology_map *cpu;
};

struct perf_header {
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 18722e7..51b4d5a 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -185,6 +185,7 @@ static void perf_session_env__exit(struct perf_session_env *env)
zfree(&env->sibling_threads);
zfree(&env->numa_nodes);
zfree(&env->pmu_mappings);
+ zfree(&env->cpu);
}

void perf_session__delete(struct perf_session *session)

2015-08-24 19:30:29

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH RFC 02/10] perf,tools: Support new sort type --socket

On Mon, Aug 24, 2015 at 04:47:12PM +0000, Liang, Kan wrote:
>
>
> > On Mon, Aug 24, 2015 at 02:22:08PM +0000, Liang, Kan wrote:
> > > >
> > > > On Fri, Aug 21, 2015 at 08:25:24PM +0000, Liang, Kan wrote:
> > > >
> > > > SNIP
> > > >
> > > > > >
> > > > > > we need global topology information in perf.data and use the
> > > > > > mapping from there, we can't use current server info
> > > > > >
> > > > > > we currently store core_siblings_list and thread_siblings_list,
> > > > > > in topology FEATURE, which is probably not enough
> > > > > >
> > > > >
> > > > > core_siblings_list includes the cpu list in the same socket.
> > > > > thread_siblings_list includes the cpu list in the same core.
> > > > > numa_nodes includes the cpu list for each node.
> > > > >
> > > > > It looks we have enough data from topology FEATURE.
> > > >
> > > > hum, haven't hecked deeply.. how will you get core id for cpu?
> > > >
> > >
> > > from thread_siblings_list.
> > > I just noticed that svg_build_topology_map did the similar thing to
> > > get topology map for timechart from perf header.
> >
> > could you please provide both functions then cpu -> core, cpu -> socket
> >
>
> Do you mean something like this?
> Store cpu->socket and cpu->core in perf_session_env.

yep, seems ok

thanks,
jirka

>
> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index 179b2bd..a01c603 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
> @@ -1590,10 +1596,17 @@ static int process_cpu_topology(struct perf_file_section *section __maybe_unused
> u32 nr, i;
> char *str;
> struct strbuf sb;
> + int cpu_nr = ph->env.nr_cpus_online;
> + struct cpu_map *map;
> + int j;
> +
> + ph->env.cpu = calloc(cpu_nr, sizeof(*ph->env.cpu));
> + if (!ph->env.cpu)
> + return -1;
>
> ret = readn(fd, &nr, sizeof(nr));
> if (ret != sizeof(nr))
> - return -1;
> + goto free_cpu;
>
> if (ph->needs_swap)
> nr = bswap_32(nr);
> @@ -1608,6 +1621,14 @@ static int process_cpu_topology(struct perf_file_section *section __maybe_unused
>
> /* include a NULL character at the end */
> strbuf_add(&sb, str, strlen(str) + 1);
> +
> + map = cpu_map__new(str);
> + if (!map)
> + goto error;
> + for (j = 0; j < map->nr; j++) {
> + ph->env.cpu[map->map[j]].socket_id = i;
> + }
> + cpu_map__put(map);
> free(str);
> }
> ph->env.sibling_cores = strbuf_detach(&sb, NULL);
> @@ -1628,6 +1649,14 @@ static int process_cpu_topology(struct perf_file_section *section __maybe_unused
>
> /* include a NULL character at the end */
> strbuf_add(&sb, str, strlen(str) + 1);
> +
> + map = cpu_map__new(str);
> + if (!map)
> + goto error;
> + for (j = 0; j < map->nr; j++) {
> + ph->env.cpu[map->map[j]].core_id = i;
> + }
> + cpu_map__put(map);
> free(str);
> }
> ph->env.sibling_threads = strbuf_detach(&sb, NULL);
> @@ -1635,6 +1664,8 @@ static int process_cpu_topology(struct perf_file_section *section __maybe_unused
>
> error:
> strbuf_release(&sb);
> +free_cpu:
> + free(ph->env.cpu);
> return -1;
> }
>
> diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
> index 9b53b65..8b8c4fc 100644
> --- a/tools/perf/util/header.h
> +++ b/tools/perf/util/header.h
> @@ -66,6 +66,11 @@ struct perf_header;
> int perf_file_header__read(struct perf_file_header *header,
> struct perf_header *ph, int fd);
>
> +struct cpu_topology_map {
> + int socket_id;
> + int core_id;
> +};
> +
> struct perf_session_env {
> char *hostname;
> char *os_release;
> @@ -89,6 +94,7 @@ struct perf_session_env {
> char *sibling_threads;
> char *numa_nodes;
> char *pmu_mappings;
> + struct cpu_topology_map *cpu;
> };
>
> struct perf_header {
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 18722e7..51b4d5a 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -185,6 +185,7 @@ static void perf_session_env__exit(struct perf_session_env *env)
> zfree(&env->sibling_threads);
> zfree(&env->numa_nodes);
> zfree(&env->pmu_mappings);
> + zfree(&env->cpu);
> }
>
> void perf_session__delete(struct perf_session *session)
>

2015-08-27 18:24:32

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH RFC 02/10] perf,tools: Support new sort type --socket

On Mon, Aug 24, 2015 at 09:30:21PM +0200, Jiri Olsa wrote:
> On Mon, Aug 24, 2015 at 04:47:12PM +0000, Liang, Kan wrote:
> >
> >
> > > On Mon, Aug 24, 2015 at 02:22:08PM +0000, Liang, Kan wrote:
> > > > >
> > > > > On Fri, Aug 21, 2015 at 08:25:24PM +0000, Liang, Kan wrote:
> > > > >
> > > > > SNIP
> > > > >
> > > > > > >
> > > > > > > we need global topology information in perf.data and use the
> > > > > > > mapping from there, we can't use current server info
> > > > > > >
> > > > > > > we currently store core_siblings_list and thread_siblings_list,
> > > > > > > in topology FEATURE, which is probably not enough
> > > > > > >
> > > > > >
> > > > > > core_siblings_list includes the cpu list in the same socket.
> > > > > > thread_siblings_list includes the cpu list in the same core.
> > > > > > numa_nodes includes the cpu list for each node.
> > > > > >
> > > > > > It looks we have enough data from topology FEATURE.
> > > > >
> > > > > hum, haven't hecked deeply.. how will you get core id for cpu?
> > > > >
> > > >
> > > > from thread_siblings_list.
> > > > I just noticed that svg_build_topology_map did the similar thing to
> > > > get topology map for timechart from perf header.
> > >
> > > could you please provide both functions then cpu -> core, cpu -> socket
> > >
> >
> > Do you mean something like this?
> > Store cpu->socket and cpu->core in perf_session_env.
>
> yep, seems ok

any chance you could send out this one? I'd need to use it in my other stuff

thanks,
jirka

2015-08-27 18:29:55

by Liang, Kan

[permalink] [raw]
Subject: RE: [PATCH RFC 02/10] perf,tools: Support new sort type --socket


>
> On Mon, Aug 24, 2015 at 09:30:21PM +0200, Jiri Olsa wrote:
> > On Mon, Aug 24, 2015 at 04:47:12PM +0000, Liang, Kan wrote:
> > >
> > >
> > > > On Mon, Aug 24, 2015 at 02:22:08PM +0000, Liang, Kan wrote:
> > > > > >
> > > > > > On Fri, Aug 21, 2015 at 08:25:24PM +0000, Liang, Kan wrote:
> > > > > >
> > > > > > SNIP
> > > > > >
> > > > > > > >
> > > > > > > > we need global topology information in perf.data and use
> > > > > > > > the mapping from there, we can't use current server info
> > > > > > > >
> > > > > > > > we currently store core_siblings_list and
> > > > > > > > thread_siblings_list, in topology FEATURE, which is
> > > > > > > > probably not enough
> > > > > > > >
> > > > > > >
> > > > > > > core_siblings_list includes the cpu list in the same socket.
> > > > > > > thread_siblings_list includes the cpu list in the same core.
> > > > > > > numa_nodes includes the cpu list for each node.
> > > > > > >
> > > > > > > It looks we have enough data from topology FEATURE.
> > > > > >
> > > > > > hum, haven't hecked deeply.. how will you get core id for cpu?
> > > > > >
> > > > >
> > > > > from thread_siblings_list.
> > > > > I just noticed that svg_build_topology_map did the similar
> > > > > thing to get topology map for timechart from perf header.
> > > >
> > > > could you please provide both functions then cpu -> core, cpu ->
> > > > socket
> > > >
> > >
> > > Do you mean something like this?
> > > Store cpu->socket and cpu->core in perf_session_env.
> >
> > yep, seems ok
>
> any chance you could send out this one? I'd need to use it in my other stuff
>

Ok I will send out this patch as a separate patch first.