2018-06-06 22:25:33

by Jiri Olsa

[permalink] [raw]
Subject: [RFC 00/10] perf: Add cputime events/metrics

hi,
so.. I failed to make work reliably the exclude_idle bit for
cpu-clock event using the idle's process sum_exec_runtime as
Peter outlined in his patch [1]. The time jumped up and down
and I couldn't make it stable.

But I noticed we actually have IDLE stats (and many more)
available for each CPU (enum cpu_usage_stat), we just can't
reach them by perf yet.

So this patchset adds 'cputime' perf software PMU, that provides
CPUTIME_* stats via events that mirrors their names:

# perf list | grep cputime
cputime/guest/ [Kernel PMU event]
cputime/guest_nice/ [Kernel PMU event]
cputime/idle/ [Kernel PMU event]
cputime/iowait/ [Kernel PMU event]
cputime/irq/ [Kernel PMU event]
cputime/nice/ [Kernel PMU event]
cputime/softirq/ [Kernel PMU event]
cputime/steal/ [Kernel PMU event]
cputime/system/ [Kernel PMU event]
cputime/user/ [Kernel PMU event]

I had some issues with IDLE counter being miscounted due to stopping
of the idle tick. I tried to solve it in this patch (it's part of the
patchset):
perf/cputime: Don't stop idle tick if there's live cputime event

but I'm pretty sure it's wrong and there's better solution.


However most of the counts look ok so far and here's few
of my favorite commands I've been playing with:

# perf stat --top -I 1000
# time Idle System User Irq Softirq IO wait
1.001692690 100.0% 0.0% 0.0% 0.7% 0.2% 0.0%
2.002994039 98.9% 0.0% 0.0% 0.9% 0.2% 0.0%
3.004164038 98.5% 0.2% 0.2% 0.9% 0.2% 0.0%
4.005312773 98.9% 0.0% 0.0% 0.9% 0.2% 0.0%


# perf stat --top-full -I 1000
# time Idle System User Irq Softirq IO wait Guest Guest nice Nice Steal
1.001750803 100.0% 0.0% 0.0% 0.7% 0.2% 0.0% 0.0% 0.0% 0.0% 0.0%
2.003159490 99.0% 0.0% 0.0% 0.9% 0.2% 0.0% 0.0% 0.0% 0.0% 0.0%
3.004358366 99.0% 0.0% 0.0% 0.9% 0.2% 0.0% 0.0% 0.0% 0.0% 0.0%
4.005592436 98.9% 0.0% 0.0% 0.9% 0.2% 0.0% 0.0% 0.0% 0.0% 0.0%


# perf stat -e cpu-clock,cputime/system/,cputime/user/,cputime/idle/ -a sleep 10

Performance counter stats for 'system wide':

240070.828221 cpu-clock (msec) # 23.999 CPUs utilized
208,910,979,120 ns cputime/system/ # 87.0% System
20,589,603,359 ns cputime/user/ # 8.6% User
8,813,416,821 ns cputime/idle/ # 3.7% Idle

10.003261054 seconds time elapsed


# perf stat -e cpu-clock,cputime/system/,cputime/user/ yes > /dev/null
^Cyes: Interrupt

Performance counter stats for 'yes':

3483.824364 cpu-clock (msec) # 1.000 CPUs utilized
2,460,117,205 ns cputime/system/ # 70.6% System
1,018,360,669 ns cputime/user/ # 29.2% User

3.484554149 seconds time elapsed

1.018525000 seconds user
2.460515000 seconds sys

# perf stat --top -I 1000 --interval-clear
# perf stat --top -I 1000 --interval-clear --per-core
# perf stat --top -I 1000 --interval-clear --per-socket
# perf stat --top -I 1000 --interval-clear -A

It's also available in here:
git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
perf/fixes

My current plan is now to read those counters in perf top/record/report
to show (at least) the idle percentage for the current profile.

thoughts? ;-)

thanks,
jirka


[1] https://marc.info/?l=linux-kernel&m=152397251027433&w=2
---
Jiri Olsa (10):
perf tools: Uniquify the event name if there's no other matched event
perf tools: Fix error index for pmu event parser
perf stat: Add --interval-clear option
perf stat: Use only color_fprintf call in print_metric_only
perf stat: Fix metric column display
perf stat: Allow to specify specific metric column len
perf stat: Add event parsing error handling to add_default_attributes
perf/cputime: Add cputime pmu
perf/cputime: Don't stop idle tick if there's live cputime event
perf stat: Add cputime metric support

include/linux/perf_event.h | 3 ++
kernel/events/Makefile | 2 +-
kernel/events/core.c | 1 +
kernel/events/cputime.c | 211 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
kernel/time/tick-sched.c | 4 +++
tools/perf/Documentation/perf-stat.txt | 68 ++++++++++++++++++++++++++++++++++++
tools/perf/builtin-stat.c | 104 ++++++++++++++++++++++++++++++++++++++++++++-----------
tools/perf/util/parse-events.y | 5 +++
tools/perf/util/stat-shadow.c | 70 +++++++++++++++++++++++++++++++++++++
tools/perf/util/stat.c | 10 ++++++
tools/perf/util/stat.h | 10 ++++++
11 files changed, 467 insertions(+), 21 deletions(-)
create mode 100644 kernel/events/cputime.c


2018-06-06 22:17:09

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 09/10] perf/cputime: Don't stop idle tick if there's live cputime event

Disable stopping of the idle tick when having live cputime
event. When the tick is disabled, the idle counts are out
of date until next tick/update and perf cputime PMU provides
misleading counts.

Signed-off-by: Jiri Olsa <[email protected]>
---
include/linux/perf_event.h | 1 +
kernel/events/cputime.c | 13 +++++++++++++
kernel/time/tick-sched.c | 4 ++++
3 files changed, 18 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index aa9eaab370be..ba61d2f9602a 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1407,4 +1407,5 @@ int perf_event_exit_cpu(unsigned int cpu);
#define perf_event_exit_cpu NULL
#endif

+bool has_cputime_event(int cpu);
#endif /* _LINUX_PERF_EVENT_H */
diff --git a/kernel/events/cputime.c b/kernel/events/cputime.c
index efad24543f13..32d3cde0047e 100644
--- a/kernel/events/cputime.c
+++ b/kernel/events/cputime.c
@@ -1,6 +1,7 @@
#include <linux/kernel_stat.h>
#include <linux/sched.h>
#include <linux/perf_event.h>
+#include <linux/tick.h>

enum perf_cputime_id {
PERF_CPUTIME_USER,
@@ -16,6 +17,13 @@ enum perf_cputime_id {
PERF_CPUTIME_MAX,
};

+static DEFINE_PER_CPU(int, has_cputime);
+
+bool has_cputime_event(int cpu)
+{
+ return per_cpu(has_cputime, cpu) != 0;
+}
+
static enum cpu_usage_stat map[PERF_CPUTIME_MAX] = {
[PERF_CPUTIME_USER] = CPUTIME_USER,
[PERF_CPUTIME_NICE] = CPUTIME_NICE,
@@ -143,12 +151,17 @@ static int cputime_event_add(struct perf_event *event, int flags)
if (flags & PERF_EF_START)
cputime_event_start(event, flags);

+ if (event->hw.config == PERF_CPUTIME_IDLE)
+ tick_nohz_idle_restart_tick();
+
+ this_cpu_inc(has_cputime);
return 0;
}

static void cputime_event_del(struct perf_event *event, int flags)
{
cputime_event_stop(event, PERF_EF_UPDATE);
+ this_cpu_dec(has_cputime);
}

static void perf_cputime_read(struct perf_event *event)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index da9455a6b42b..1c105bc2a92e 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -28,6 +28,7 @@
#include <linux/posix-timers.h>
#include <linux/context_tracking.h>
#include <linux/mm.h>
+#include <linux/perf_event.h>

#include <asm/irq_regs.h>

@@ -912,6 +913,9 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
return false;
}

+ if (has_cputime_event(cpu))
+ return false;
+
return true;
}

--
2.13.6


2018-06-06 22:17:31

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 05/10] perf stat: Fix metric column display

Make the metric only display aligned.

Before:
# perf stat --topdown -I 1000
# time core cpus retiring bad speculation frontend bound backend bound
1.000394323 S0-C0 2 37.4% 12.0% 31.4% 19.2%
1.000394323 S0-C1 2 25.1% 9.2% 43.8% 21.9%
2.001521204 S0-C0 2 36.4% 11.4% 32.4% 19.8%
2.001521204 S0-C1 2 26.2% 9.4% 43.1% 21.3%
3.001930208 S0-C0 2 35.1% 10.7% 33.6% 20.6%
3.001930208 S0-C1 2 28.9% 10.0% 40.0% 21.1%

After:
# perf stat --topdown -I 1000
# time core cpus retiring bad speculation frontend bound backend bound
1.000303722 S0-C0 2 34.2% 7.6% 34.2% 24.0%
1.000303722 S0-C1 2 33.1% 6.4% 36.9% 23.6%
2.001281055 S0-C0 2 34.6% 6.7% 36.8% 21.8%
2.001281055 S0-C1 2 32.8% 7.1% 38.1% 22.0%
3.001546080 S0-C0 2 39.3% 5.5% 32.7% 22.5%
3.001546080 S0-C1 2 37.8% 6.0% 33.1% 23.1%

Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/builtin-stat.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 0b1ddd5ef05d..4e7bae4c98d2 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1001,19 +1001,20 @@ static void print_metric_only(void *ctx, const char *color, const char *fmt,
{
struct outstate *os = ctx;
FILE *out = os->fh;
- int n;
- char buf[1024];
+ char buf[1024], str[1024];
unsigned mlen = METRIC_ONLY_LEN;

if (!valid_only_metric(unit))
return;
unit = fixunit(buf, os->evsel, unit);
- n = color_fprintf(out, color ?: "", fmt, val);
- if (n > METRIC_ONLY_LEN)
- n = METRIC_ONLY_LEN;
if (mlen < strlen(unit))
mlen = strlen(unit) + 1;
- fprintf(out, "%*s", mlen - n, "");
+
+ if (color)
+ mlen += strlen(color) + sizeof(PERF_COLOR_RESET) - 1;
+
+ color_snprintf(str, sizeof(str), color ?: "", fmt, val);
+ fprintf(out, "%*s ", mlen, str);
}

static void print_metric_only_csv(void *ctx, const char *color __maybe_unused,
@@ -1053,7 +1054,7 @@ static void print_metric_header(void *ctx, const char *color __maybe_unused,
if (csv_output)
fprintf(os->fh, "%s%s", unit, csv_sep);
else
- fprintf(os->fh, "%-*s ", METRIC_ONLY_LEN, unit);
+ fprintf(os->fh, "%*s ", METRIC_ONLY_LEN, unit);
}

static void nsec_printout(int id, int nr, struct perf_evsel *evsel, double avg)
@@ -1730,7 +1731,7 @@ static void print_interval(char *prefix, struct timespec *ts)
fprintf(output, " counts %*s events\n", unit_width, "unit");
break;
case AGGR_NONE:
- fprintf(output, "# time CPU");
+ fprintf(output, "# time CPU ");
if (!metric_only)
fprintf(output, " counts %*s events\n", unit_width, "unit");
break;
--
2.13.6


2018-06-06 22:17:33

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 10/10] perf stat: Add cputime metric support

Adding --top/--top-full options to provide metrics based
on the cputime PMU events. Simply all the metrics are simple
ratios of events to STAT_NSECS time to get their % value.

The --top option provides basic subset of cputime metrics:

# perf stat --top -I 1000
# time Idle System User Irq Softirq IO wait
1.001692690 100.0% 0.0% 0.0% 0.7% 0.2% 0.0%
2.002994039 98.9% 0.0% 0.0% 0.9% 0.2% 0.0%
3.004164038 98.5% 0.2% 0.2% 0.9% 0.2% 0.0%
4.005312773 98.9% 0.0% 0.0% 0.9% 0.2% 0.0%

The --top-full option provides all cputime metrics:

# perf stat --top-full -I 1000
# time Idle System User Irq Softirq IO wait Guest Guest nice Nice Steal
1.001750803 100.0% 0.0% 0.0% 0.7% 0.2% 0.0% 0.0% 0.0% 0.0% 0.0%
2.003159490 99.0% 0.0% 0.0% 0.9% 0.2% 0.0% 0.0% 0.0% 0.0% 0.0%
3.004358366 99.0% 0.0% 0.0% 0.9% 0.2% 0.0% 0.0% 0.0% 0.0% 0.0%
4.005592436 98.9% 0.0% 0.0% 0.9% 0.2% 0.0% 0.0% 0.0% 0.0% 0.0%

Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/Documentation/perf-stat.txt | 65 +++++++++++++++++++++++++++++++
tools/perf/builtin-stat.c | 47 +++++++++++++++++++++++
tools/perf/util/stat-shadow.c | 70 ++++++++++++++++++++++++++++++++++
tools/perf/util/stat.c | 10 +++++
tools/perf/util/stat.h | 10 +++++
5 files changed, 202 insertions(+)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index b10a90b6a718..9330765b7225 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -310,6 +310,71 @@ The output is SMI cycles%, equals to (aperf - unhalted core cycles) / aperf

Users who wants to get the actual value can apply --no-metric-only.

+--top::
+--top-full:
+Measure cputime PMU events and display percentage of CPU utilization rates.
+
+The --top option displays rates for following events:
+ idle system user irq softirq iowait
+
+The --top-full option displays additional rates:
+ guest guest_nice nice steal
+
+Examples:
+ # perf stat --top
+ ^C
+ Performance counter stats for 'system wide':
+
+ Idle System User Irq Softirq IO wait
+ 1.3% 89.5% 7.4% 1.8% 0.1% 0.0%
+
+ 7.282332605 seconds time elapsed
+
+ # perf stat --top-full
+ ^C
+ Performance counter stats for 'system wide':
+
+ Idle System User Irq Softirq IO wait Guest Guest nice Nice Steal
+ 5.4% 85.4% 8.6% 0.5% 0.1% 0.0% 0.0% 0.0% 0.0% 0.0%
+
+ 7.618359683 seconds time elapsed
+
+ # perf stat --top -I 1000
+ # time Idle System User Irq Softirq IO wait
+ 1.000525839 5.4% 85.3% 8.8% 0.4% 0.1% 0.0%
+ 2.001032632 5.1% 85.7% 8.7% 0.4% 0.1% 0.0%
+ 3.001388414 5.2% 85.7% 8.6% 0.4% 0.1% 0.0%
+ 4.001758697 5.7% 85.2% 8.6% 0.5% 0.1% 0.0%
+
+ # perf stat --top -I 1000 -A
+ # time CPU Idle System User Irq Softirq IO wait
+ 1.000485174 CPU0 6.9% 84.0% 8.6% 0.5% 0.1% 0.0%
+ 1.000485174 CPU1 5.5% 84.8% 9.1% 0.5% 0.1% 0.0%
+ 1.000485174 CPU2 5.5% 86.6% 7.4% 0.5% 0.1% 0.0%
+ ...
+
+ # perf stat --top -I 1000 --per-core
+ # time core cpus Idle System User Irq Softirq IO wait
+ 1.000450719 S0-C0 2 4.6% 87.0% 7.9% 0.4% 0.1% 0.0%
+ 1.000450719 S0-C1 2 4.8% 86.3% 8.3% 0.4% 0.1% 0.0%
+ 1.000450719 S0-C2 2 5.3% 86.3% 7.8% 0.4% 0.1% 0.0%
+ 1.000450719 S0-C3 2 5.2% 85.5% 8.7% 0.4% 0.1% 0.0%
+ 1.000450719 S0-C4 2 4.5% 86.7% 8.3% 0.4% 0.1% 0.0%
+
+ # perf stat --top ./perf bench sched messaging -l 10000
+ ...
+ Total time: 7.089 [sec]
+
+ Performance counter stats for './perf bench sched messaging -l 10000':
+
+ Idle System User Irq Softirq IO wait
+ 0.0% 90.1% 8.9% 0.5% 0.1% 0.0%
+
+ 7.186366800 seconds time elapsed
+
+ 14.527066000 seconds user
+ 146.254278000 seconds sys
+
EXAMPLES
--------

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index cc3dd85d5a60..dfe5a0d926c0 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -135,6 +135,34 @@ static const char *smi_cost_attrs = {
"}"
};

+static const char *top_attrs = {
+ "{"
+ "cpu-clock,"
+ "cputime/idle/,"
+ "cputime/system/,"
+ "cputime/user/,"
+ "cputime/irq/,"
+ "cputime/softirq/,"
+ "cputime/iowait/"
+ "}"
+};
+
+static const char *top_full_attrs = {
+ "{"
+ "cpu-clock,"
+ "cputime/idle/,"
+ "cputime/system/,"
+ "cputime/user/,"
+ "cputime/irq/,"
+ "cputime/softirq/,"
+ "cputime/iowait/,"
+ "cputime/guest/,"
+ "cputime/guest_nice/,"
+ "cputime/nice/,"
+ "cputime/steal/"
+ "}"
+};
+
static struct perf_evlist *evsel_list;

static struct rblist metric_events;
@@ -154,6 +182,8 @@ static bool null_run = false;
static int detailed_run = 0;
static bool transaction_run;
static bool topdown_run = false;
+static bool top_run = false;
+static bool top_run_full = false;
static bool smi_cost = false;
static bool smi_reset = false;
static bool big_num = true;
@@ -2088,6 +2118,8 @@ static const struct option stat_options[] = {
"measure topdown level 1 statistics"),
OPT_BOOLEAN(0, "smi-cost", &smi_cost,
"measure SMI cost"),
+ OPT_BOOLEAN(0, "top", &top_run, "show CPU utilization"),
+ OPT_BOOLEAN(0, "top-full", &top_run_full, "show extended CPU utilization"),
OPT_CALLBACK('M', "metrics", &evsel_list, "metric/metric group list",
"monitor specified metrics or metric groups (separated by ,)",
parse_metric_groups),
@@ -2474,6 +2506,21 @@ static int add_default_attributes(void)
return 0;
}

+ if (top_run || top_run_full) {
+ const char *attrs = top_run ? top_attrs : top_full_attrs;
+
+ err = parse_events(evsel_list, attrs, &errinfo);
+ if (err) {
+ fprintf(stderr, "Cannot set up cputime events\n");
+ parse_events_print_error(&errinfo, attrs);
+ return -1;
+ }
+ if (!force_metric_only)
+ metric_only = true;
+ metric_only_len = 10;
+ return 0;
+ }
+
if (smi_cost) {
int smi;

diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 594d14a02b67..06365dba1753 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -750,6 +750,46 @@ static void generic_metric(const char *metric_expr,
print_metric(ctxp, NULL, NULL, "", 0);
}

+static void cputime_color_name(struct perf_evsel *evsel,
+ const char **color, const char **name,
+ double ratio)
+{
+ if (perf_stat_evsel__is(evsel, CPUTIME_IDLE)) {
+ if (ratio < 0.8)
+ *color = PERF_COLOR_GREEN;
+ if (ratio < 0.5)
+ *color = PERF_COLOR_RED;
+ *name = "Idle";
+ return;
+ }
+
+ if (ratio > (MIN_GREEN / 100))
+ *color = PERF_COLOR_GREEN;
+ if (ratio > (MIN_RED / 100))
+ *color = PERF_COLOR_RED;
+
+ if (perf_stat_evsel__is(evsel, CPUTIME_GUEST))
+ *name = "Guest";
+ else if (perf_stat_evsel__is(evsel, CPUTIME_GUEST_NICE))
+ *name = "Guest nice";
+ else if (perf_stat_evsel__is(evsel, CPUTIME_IOWAIT))
+ *name = "IO wait";
+ else if (perf_stat_evsel__is(evsel, CPUTIME_IRQ))
+ *name = "Irq";
+ else if (perf_stat_evsel__is(evsel, CPUTIME_NICE))
+ *name = "Nice";
+ else if (perf_stat_evsel__is(evsel, CPUTIME_SOFTIRQ))
+ *name = "Softirq";
+ else if (perf_stat_evsel__is(evsel, CPUTIME_STEAL))
+ *name = "Steal";
+ else if (perf_stat_evsel__is(evsel, CPUTIME_SYSTEM))
+ *name = "System";
+ else if (perf_stat_evsel__is(evsel, CPUTIME_USER))
+ *name = "User";
+ else
+ *name = "unknown";
+}
+
void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
double avg, int cpu,
struct perf_stat_output_ctx *out,
@@ -960,6 +1000,36 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
be_bound * 100.);
else
print_metric(ctxp, NULL, NULL, name, 0);
+ } else if (perf_stat_evsel__is(evsel, CPUTIME_GUEST) ||
+ perf_stat_evsel__is(evsel, CPUTIME_GUEST_NICE) ||
+ perf_stat_evsel__is(evsel, CPUTIME_IDLE) ||
+ perf_stat_evsel__is(evsel, CPUTIME_IOWAIT) ||
+ perf_stat_evsel__is(evsel, CPUTIME_IRQ) ||
+ perf_stat_evsel__is(evsel, CPUTIME_NICE) ||
+ perf_stat_evsel__is(evsel, CPUTIME_SOFTIRQ) ||
+ perf_stat_evsel__is(evsel, CPUTIME_STEAL) ||
+ perf_stat_evsel__is(evsel, CPUTIME_SYSTEM) ||
+ perf_stat_evsel__is(evsel, CPUTIME_USER)) {
+
+ const char *name = NULL;
+
+ total = runtime_stat_avg(st, STAT_NSECS, ctx, cpu);
+
+ if (total)
+ ratio = avg / total;
+
+ cputime_color_name(evsel, &color, &name, ratio);
+
+ /*
+ * The cputime meassures are tricky, we can easily get some noise
+ * over 100% ... so let's be proactive and don't confuse users ;-)
+ */
+ ratio = min(1., ratio);
+
+ if (total)
+ print_metric(ctxp, color, "%8.1f%%", name, ratio * 100.);
+ else
+ print_metric(ctxp, NULL, NULL, name, 0);
} else if (evsel->metric_expr) {
generic_metric(evsel->metric_expr, evsel->metric_events, evsel->name,
evsel->metric_name, avg, cpu, out, st);
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index a0061e0b0fad..aa78a7188029 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -89,6 +89,16 @@ static const char *id_str[PERF_STAT_EVSEL_ID__MAX] = {
ID(TOPDOWN_RECOVERY_BUBBLES, topdown-recovery-bubbles),
ID(SMI_NUM, msr/smi/),
ID(APERF, msr/aperf/),
+ ID(CPUTIME_GUEST, cputime/guest/),
+ ID(CPUTIME_GUEST_NICE, cputime/guest_nice/),
+ ID(CPUTIME_IDLE, cputime/idle/),
+ ID(CPUTIME_IOWAIT, cputime/iowait/),
+ ID(CPUTIME_IRQ, cputime/irq/),
+ ID(CPUTIME_NICE, cputime/nice/),
+ ID(CPUTIME_SOFTIRQ, cputime/softirq/),
+ ID(CPUTIME_STEAL, cputime/steal/),
+ ID(CPUTIME_SYSTEM, cputime/system/),
+ ID(CPUTIME_USER, cputime/user/),
};
#undef ID

diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 36efb986f7fc..24373873fc76 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -25,6 +25,16 @@ enum perf_stat_evsel_id {
PERF_STAT_EVSEL_ID__TOPDOWN_RECOVERY_BUBBLES,
PERF_STAT_EVSEL_ID__SMI_NUM,
PERF_STAT_EVSEL_ID__APERF,
+ PERF_STAT_EVSEL_ID__CPUTIME_GUEST,
+ PERF_STAT_EVSEL_ID__CPUTIME_GUEST_NICE,
+ PERF_STAT_EVSEL_ID__CPUTIME_IDLE,
+ PERF_STAT_EVSEL_ID__CPUTIME_IOWAIT,
+ PERF_STAT_EVSEL_ID__CPUTIME_IRQ,
+ PERF_STAT_EVSEL_ID__CPUTIME_NICE,
+ PERF_STAT_EVSEL_ID__CPUTIME_SOFTIRQ,
+ PERF_STAT_EVSEL_ID__CPUTIME_STEAL,
+ PERF_STAT_EVSEL_ID__CPUTIME_SYSTEM,
+ PERF_STAT_EVSEL_ID__CPUTIME_USER,
PERF_STAT_EVSEL_ID__MAX,
};

--
2.13.6


2018-06-06 22:18:19

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 03/10] perf stat: Add --interval-clear option

Adding --interval-clear option to clear the screen
before next interval.

Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/Documentation/perf-stat.txt | 3 +++
tools/perf/builtin-stat.c | 11 +++++++++--
2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 5dfe102fb5b5..b10a90b6a718 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -178,6 +178,9 @@ Print count deltas for fixed number of times.
This option should be used together with "-I" option.
example: 'perf stat -I 1000 --interval-count 2 -e cycles -a'

+--interval-clear::
+Clear the screen before next interval.
+
--timeout msecs::
Stop the 'perf stat' session and print count deltas after N milliseconds (minimum: 10 ms).
This option is not supported with the "-I" option.
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index fce46252f89c..067d8b5b2c83 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -65,6 +65,7 @@
#include "util/tool.h"
#include "util/string2.h"
#include "util/metricgroup.h"
+#include "util/top.h"
#include "asm/bug.h"

#include <linux/time64.h>
@@ -173,6 +174,7 @@ static struct cpu_map *aggr_map;
static aggr_get_id_t aggr_get_id;
static bool append_file;
static bool interval_count;
+static bool interval_clear;
static const char *output_name;
static int output_fd;
static int print_free_counters_hint;
@@ -1713,9 +1715,12 @@ static void print_interval(char *prefix, struct timespec *ts)
FILE *output = stat_config.output;
static int num_print_interval;

+ if (interval_clear)
+ puts(CONSOLE_CLEAR);
+
sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, csv_sep);

- if (num_print_interval == 0 && !csv_output) {
+ if ((num_print_interval == 0 && !csv_output) || interval_clear) {
switch (stat_config.aggr_mode) {
case AGGR_SOCKET:
fprintf(output, "# time socket cpus");
@@ -1747,7 +1752,7 @@ static void print_interval(char *prefix, struct timespec *ts)
}
}

- if (num_print_interval == 0 && metric_only)
+ if ((num_print_interval == 0 && metric_only) || interval_clear)
print_metric_headers(" ", true);
if (++num_print_interval == 25)
num_print_interval = 0;
@@ -2066,6 +2071,8 @@ static const struct option stat_options[] = {
"(overhead is possible for values <= 100ms)"),
OPT_INTEGER(0, "interval-count", &stat_config.times,
"print counts for fixed number of times"),
+ OPT_BOOLEAN(0, "interval-clear", &interval_clear,
+ "clear screen in between new interval"),
OPT_UINTEGER(0, "timeout", &stat_config.timeout,
"stop workload and print counts after a timeout period in ms (>= 10ms)"),
OPT_SET_UINT(0, "per-socket", &stat_config.aggr_mode,
--
2.13.6


2018-06-06 22:18:49

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 06/10] perf stat: Allow to specify specific metric column len

Following change will introduce new metrics, that don't need
such wide hard coded spacing. Switching METRIC_ONLY_LEN macro
usage with metric_only_len variable.

Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/builtin-stat.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 4e7bae4c98d2..a8f93885763a 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -145,6 +145,8 @@ static struct target target = {

typedef int (*aggr_get_id_t)(struct cpu_map *m, int cpu);

+#define METRIC_ONLY_LEN 20
+
static int run_count = 1;
static bool no_inherit = false;
static volatile pid_t child_pid = -1;
@@ -182,6 +184,7 @@ static int print_mixed_hw_group_error;
static u64 *walltime_run;
static bool ru_display = false;
static struct rusage ru_data;
+static unsigned int metric_only_len = METRIC_ONLY_LEN;

struct perf_stat {
bool record;
@@ -969,8 +972,6 @@ static void print_metric_csv(void *ctx,
fprintf(out, "%s%s%s%s", csv_sep, vals, csv_sep, unit);
}

-#define METRIC_ONLY_LEN 20
-
/* Filter out some columns that don't work well in metrics only mode */

static bool valid_only_metric(const char *unit)
@@ -1002,7 +1003,7 @@ static void print_metric_only(void *ctx, const char *color, const char *fmt,
struct outstate *os = ctx;
FILE *out = os->fh;
char buf[1024], str[1024];
- unsigned mlen = METRIC_ONLY_LEN;
+ unsigned mlen = metric_only_len;

if (!valid_only_metric(unit))
return;
@@ -1054,7 +1055,7 @@ static void print_metric_header(void *ctx, const char *color __maybe_unused,
if (csv_output)
fprintf(os->fh, "%s%s", unit, csv_sep);
else
- fprintf(os->fh, "%*s ", METRIC_ONLY_LEN, unit);
+ fprintf(os->fh, "%*s ", metric_only_len, unit);
}

static void nsec_printout(int id, int nr, struct perf_evsel *evsel, double avg)
--
2.13.6


2018-06-06 22:25:36

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 01/10] perf tools: Uniquify the event name if there's no other matched event

Currently by default we try to match the user specified PMU
name to all PMU units available and use them to aggregate
all matched PMUs event counts into one 'pattern' event.

While this is useful for uncore events, it screws up names
for other events, where this is not desirable, like:

Before:
# perf stat -e cp/cpu-cycles/ kill

Performance counter stats for 'kill':

1,573,757 cp/cpu-cycles/

Keeping the pattern matching logic, but making the name unique
in case there's no other match found. That fixes the example
above and hopefully does not screw up anything else.

After:
# perf stat -e cp/cpu-cycles/ kill

Performance counter stats for 'kill':

1,573,757 cpu/cpu-cycles/

Cc: Andi Kleen <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Agustin Vega-Frias <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/builtin-stat.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 096ccb25c11f..fce46252f89c 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1323,6 +1323,7 @@ static void collect_all_aliases(struct perf_evsel *counter,
void *data)
{
struct perf_evsel *alias;
+ int cnt = 0;

alias = list_prepare_entry(counter, &(evsel_list->entries), node);
list_for_each_entry_continue (alias, &evsel_list->entries, node) {
@@ -1334,7 +1335,15 @@ static void collect_all_aliases(struct perf_evsel *counter,
break;
alias->merged_stat = true;
cb(alias, data, false);
+ cnt++;
}
+
+ /*
+ * There's no matching event to aggregate
+ * counts with, fix the event name
+ */
+ if (!cnt)
+ uniquify_event_name(counter);
}

static bool collect_data(struct perf_evsel *counter,
--
2.13.6


2018-06-06 22:27:01

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 02/10] perf tools: Fix error index for pmu event parser

For events we provide specific error message we need to set
error column index, PMU parser is missing that, adding it.

Before:
$ perf stat -e cycles,krava/cycles/ kill
event syntax error: 'cycles,krava/cycles/'
\___ Cannot find PMU `krava'. Missing kernel support?

After:
$ perf stat -e cycles,krava/cycles/ kill
event syntax error: 'cycles,krava/cycles/'
\___ Cannot find PMU `krava'. Missing kernel support?

Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/parse-events.y | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index 155d2570274f..da8fe57691b8 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -227,11 +227,16 @@ event_def: event_pmu |
event_pmu:
PE_NAME opt_pmu_config
{
+ struct parse_events_state *parse_state = _parse_state;
+ struct parse_events_error *error = parse_state->error;
struct list_head *list, *orig_terms, *terms;

if (parse_events_copy_term_list($2, &orig_terms))
YYABORT;

+ if (error)
+ error->idx = @1.first_column;
+
ALLOC_LIST(list);
if (parse_events_add_pmu(_parse_state, list, $1, $2, false, false)) {
struct perf_pmu *pmu = NULL;
--
2.13.6


2018-06-06 22:29:14

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 04/10] perf stat: Use only color_fprintf call in print_metric_only

We can call color_fprintf also for non color case, it's
handled properly. This change simplifies following patch.

Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/builtin-stat.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 067d8b5b2c83..0b1ddd5ef05d 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1008,10 +1008,7 @@ static void print_metric_only(void *ctx, const char *color, const char *fmt,
if (!valid_only_metric(unit))
return;
unit = fixunit(buf, os->evsel, unit);
- if (color)
- n = color_fprintf(out, color, fmt, val);
- else
- n = fprintf(out, fmt, val);
+ n = color_fprintf(out, color ?: "", fmt, val);
if (n > METRIC_ONLY_LEN)
n = METRIC_ONLY_LEN;
if (mlen < strlen(unit))
--
2.13.6


2018-06-06 22:33:16

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 07/10] perf stat: Add event parsing error handling to add_default_attributes

Adding missing error handling for parse_events calls
in add_default_attributes functions. The error handler
displays error details, like for transactions (-T):

Before:
$ perf stat -T
Cannot set up transaction events

After:
$ perf stat -T
Cannot set up transaction events
event syntax error: '..cycles,cpu/cycles-t/,cpu/tx-start/,cpu/el-start/,cpu/cycles-ct/}'
\___ unknown term

Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/builtin-stat.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index a8f93885763a..cc3dd85d5a60 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -2451,14 +2451,13 @@ static int add_default_attributes(void)
(PERF_COUNT_HW_CACHE_OP_PREFETCH << 8) |
(PERF_COUNT_HW_CACHE_RESULT_MISS << 16) },
};
+ struct parse_events_error errinfo;

/* Set attrs if no event is selected and !null_run: */
if (null_run)
return 0;

if (transaction_run) {
- struct parse_events_error errinfo;
-
if (pmu_have_event("cpu", "cycles-ct") &&
pmu_have_event("cpu", "el-start"))
err = parse_events(evsel_list, transaction_attrs,
@@ -2469,6 +2468,7 @@ static int add_default_attributes(void)
&errinfo);
if (err) {
fprintf(stderr, "Cannot set up transaction events\n");
+ parse_events_print_error(&errinfo, transaction_attrs);
return -1;
}
return 0;
@@ -2494,10 +2494,11 @@ static int add_default_attributes(void)
pmu_have_event("msr", "smi")) {
if (!force_metric_only)
metric_only = true;
- err = parse_events(evsel_list, smi_cost_attrs, NULL);
+ err = parse_events(evsel_list, smi_cost_attrs, &errinfo);
} else {
fprintf(stderr, "To measure SMI cost, it needs "
"msr/aperf/, msr/smi/ and cpu/cycles/ support\n");
+ parse_events_print_error(&errinfo, smi_cost_attrs);
return -1;
}
if (err) {
@@ -2532,12 +2533,13 @@ static int add_default_attributes(void)
if (topdown_attrs[0] && str) {
if (warn)
arch_topdown_group_warn();
- err = parse_events(evsel_list, str, NULL);
+ err = parse_events(evsel_list, str, &errinfo);
if (err) {
fprintf(stderr,
"Cannot set up top down events %s: %d\n",
str, err);
free(str);
+ parse_events_print_error(&errinfo, str);
return -1;
}
} else {
--
2.13.6


2018-06-06 22:36:02

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 08/10] perf/cputime: Add cputime pmu

The CPUTIME_* counters account the time for CPU runtimes.
Adding 'cputime' PMU, that provides perf interface to those
counters.

The 'cputime' interface is standard software PMU, that provides
following events, meassuring their CPUTIME counterparts:

PERF_CPUTIME_USER - CPUTIME_USER
PERF_CPUTIME_NICE - CPUTIME_NICE
PERF_CPUTIME_SYSTEM - CPUTIME_SYSTEM
PERF_CPUTIME_SOFTIRQ - CPUTIME_SOFTIRQ
PERF_CPUTIME_IRQ - CPUTIME_IRQ
PERF_CPUTIME_IDLE - CPUTIME_IDLE
PERF_CPUTIME_IOWAIT - CPUTIME_IOWAIT
PERF_CPUTIME_STEAL - CPUTIME_STEAL
PERF_CPUTIME_GUEST - CPUTIME_GUEST
PERF_CPUTIME_GUEST_NICE - CPUTIME_GUEST_NICE

The 'cputime' PMU adds 'events' and 'format' directory,
with above events specifics.

It can be used via perf tool like:

# perf stat -e cputime/system/,cputime/user/ yes > /dev/null
^Cyes: Interrupt

Performance counter stats for 'yes':

2,177,550,368 ns cputime/system/
567,029,895 ns cputime/user/

2.749451438 seconds time elapsed

0.567127000 seconds user
2.177924000 seconds sys

Signed-off-by: Jiri Olsa <[email protected]>
---
include/linux/perf_event.h | 2 +
kernel/events/Makefile | 2 +-
kernel/events/core.c | 1 +
kernel/events/cputime.c | 198 +++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 202 insertions(+), 1 deletion(-)
create mode 100644 kernel/events/cputime.c

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index bea0b0cd4bf7..aa9eaab370be 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1160,6 +1160,8 @@ static inline int perf_callchain_store(struct perf_callchain_entry_ctx *ctx, u64
}
}

+extern int __init perf_cputime_register(void);
+
extern int sysctl_perf_event_paranoid;
extern int sysctl_perf_event_mlock;
extern int sysctl_perf_event_sample_rate;
diff --git a/kernel/events/Makefile b/kernel/events/Makefile
index 3c022e33c109..02271b8433a7 100644
--- a/kernel/events/Makefile
+++ b/kernel/events/Makefile
@@ -3,7 +3,7 @@ ifdef CONFIG_FUNCTION_TRACER
CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
endif

-obj-y := core.o ring_buffer.o callchain.o
+obj-y := core.o ring_buffer.o callchain.o cputime.o

obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
obj-$(CONFIG_UPROBES) += uprobes.o
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 08f5e1b42b43..27a8459ce576 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11678,6 +11678,7 @@ void __init perf_event_init(void)
perf_pmu_register(&perf_cpu_clock, NULL, -1);
perf_pmu_register(&perf_task_clock, NULL, -1);
perf_tp_register();
+ perf_cputime_register();
perf_event_init_cpu(smp_processor_id());
register_reboot_notifier(&perf_reboot_notifier);

diff --git a/kernel/events/cputime.c b/kernel/events/cputime.c
new file mode 100644
index 000000000000..efad24543f13
--- /dev/null
+++ b/kernel/events/cputime.c
@@ -0,0 +1,198 @@
+#include <linux/kernel_stat.h>
+#include <linux/sched.h>
+#include <linux/perf_event.h>
+
+enum perf_cputime_id {
+ PERF_CPUTIME_USER,
+ PERF_CPUTIME_NICE,
+ PERF_CPUTIME_SYSTEM,
+ PERF_CPUTIME_SOFTIRQ,
+ PERF_CPUTIME_IRQ,
+ PERF_CPUTIME_IDLE,
+ PERF_CPUTIME_IOWAIT,
+ PERF_CPUTIME_STEAL,
+ PERF_CPUTIME_GUEST,
+ PERF_CPUTIME_GUEST_NICE,
+ PERF_CPUTIME_MAX,
+};
+
+static enum cpu_usage_stat map[PERF_CPUTIME_MAX] = {
+ [PERF_CPUTIME_USER] = CPUTIME_USER,
+ [PERF_CPUTIME_NICE] = CPUTIME_NICE,
+ [PERF_CPUTIME_SYSTEM] = CPUTIME_SYSTEM,
+ [PERF_CPUTIME_SOFTIRQ] = CPUTIME_SOFTIRQ,
+ [PERF_CPUTIME_IRQ] = CPUTIME_IRQ,
+ [PERF_CPUTIME_IDLE] = CPUTIME_IDLE,
+ [PERF_CPUTIME_IOWAIT] = CPUTIME_IOWAIT,
+ [PERF_CPUTIME_STEAL] = CPUTIME_STEAL,
+ [PERF_CPUTIME_GUEST] = CPUTIME_GUEST,
+ [PERF_CPUTIME_GUEST_NICE] = CPUTIME_GUEST_NICE,
+};
+
+PMU_FORMAT_ATTR(event, "config:0-63");
+
+static struct attribute *cputime_format_attrs[] = {
+ &format_attr_event.attr,
+ NULL,
+};
+
+static struct attribute_group cputime_format_attr_group = {
+ .name = "format",
+ .attrs = cputime_format_attrs,
+};
+
+static ssize_t
+cputime_event_attr_show(struct device *dev, struct device_attribute *attr,
+ char *page)
+{
+ struct perf_pmu_events_attr *pmu_attr =
+ container_of(attr, struct perf_pmu_events_attr, attr);
+
+ return sprintf(page, "event=%llu\n", pmu_attr->id);
+}
+
+#define __A(__n, __e) \
+ PMU_EVENT_ATTR(__n, cputime_attr_##__n, \
+ __e, cputime_event_attr_show); \
+ PMU_EVENT_ATTR_STRING(__n.unit, \
+ cputime_attr_##__n##_unit, "ns");
+
+__A(user, PERF_CPUTIME_USER)
+__A(nice, PERF_CPUTIME_NICE)
+__A(system, PERF_CPUTIME_SYSTEM)
+__A(softirq, PERF_CPUTIME_SOFTIRQ)
+__A(irq, PERF_CPUTIME_IRQ)
+__A(idle, PERF_CPUTIME_IDLE)
+__A(iowait, PERF_CPUTIME_IOWAIT)
+__A(steal, PERF_CPUTIME_STEAL)
+__A(guest, PERF_CPUTIME_GUEST)
+__A(guest_nice, PERF_CPUTIME_GUEST_NICE)
+
+#undef __A
+
+static struct attribute *cputime_events_attrs[] = {
+#define __A(__n) \
+ &cputime_attr_##__n.attr.attr, \
+ &cputime_attr_##__n##_unit.attr.attr,
+
+ __A(user)
+ __A(nice)
+ __A(system)
+ __A(softirq)
+ __A(irq)
+ __A(idle)
+ __A(iowait)
+ __A(steal)
+ __A(guest)
+ __A(guest_nice)
+
+ NULL,
+
+#undef __A
+};
+
+static struct attribute_group cputime_events_attr_group = {
+ .name = "events",
+ .attrs = cputime_events_attrs,
+};
+
+static const struct attribute_group *cputime_attr_groups[] = {
+ &cputime_format_attr_group,
+ &cputime_events_attr_group,
+ NULL,
+};
+
+static u64 cputime_read_counter(struct perf_event *event)
+{
+ int cpu = event->oncpu;
+
+ return kcpustat_cpu(cpu).cpustat[event->hw.config];
+}
+
+static void perf_cputime_update(struct perf_event *event)
+{
+ u64 prev, now;
+ s64 delta;
+
+ /* Careful, an NMI might modify the previous event value: */
+again:
+ prev = local64_read(&event->hw.prev_count);
+ now = cputime_read_counter(event);
+
+ if (local64_cmpxchg(&event->hw.prev_count, prev, now) != prev)
+ goto again;
+
+ delta = now - prev;
+ local64_add(delta, &event->count);
+}
+
+static void cputime_event_start(struct perf_event *event, int flags)
+{
+ u64 now = cputime_read_counter(event);
+
+ local64_set(&event->hw.prev_count, now);
+}
+
+static void cputime_event_stop(struct perf_event *event, int flags)
+{
+ perf_cputime_update(event);
+}
+
+static int cputime_event_add(struct perf_event *event, int flags)
+{
+ if (flags & PERF_EF_START)
+ cputime_event_start(event, flags);
+
+ return 0;
+}
+
+static void cputime_event_del(struct perf_event *event, int flags)
+{
+ cputime_event_stop(event, PERF_EF_UPDATE);
+}
+
+static void perf_cputime_read(struct perf_event *event)
+{
+ perf_cputime_update(event);
+}
+
+static int cputime_event_init(struct perf_event *event)
+{
+ u64 cfg = event->attr.config;
+
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ /* unsupported modes and filters */
+ if (event->attr.exclude_user ||
+ event->attr.exclude_kernel ||
+ event->attr.exclude_hv ||
+ event->attr.exclude_idle ||
+ event->attr.exclude_host ||
+ event->attr.exclude_guest ||
+ event->attr.sample_period) /* no sampling */
+ return -EINVAL;
+
+ if (cfg >= PERF_CPUTIME_MAX)
+ return -EINVAL;
+
+ event->hw.config = map[cfg];
+ return 0;
+}
+
+static struct pmu perf_cputime = {
+ .task_ctx_nr = perf_sw_context,
+ .attr_groups = cputime_attr_groups,
+ .capabilities = PERF_PMU_CAP_NO_INTERRUPT,
+ .event_init = cputime_event_init,
+ .add = cputime_event_add,
+ .del = cputime_event_del,
+ .start = cputime_event_start,
+ .stop = cputime_event_stop,
+ .read = perf_cputime_read,
+};
+
+int __init perf_cputime_register(void)
+{
+ return perf_pmu_register(&perf_cputime, "cputime", -1);
+}
--
2.13.6


2018-06-06 23:32:06

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 01/10] perf tools: Uniquify the event name if there's no other matched event

On Thu, Jun 07, 2018 at 12:15:04AM +0200, Jiri Olsa wrote:
> Currently by default we try to match the user specified PMU
> name to all PMU units available and use them to aggregate
> all matched PMUs event counts into one 'pattern' event.
>
> While this is useful for uncore events, it screws up names
> for other events, where this is not desirable, like:
>
> Before:
> # perf stat -e cp/cpu-cycles/ kill

I assume you mean cpU/cpu-cycles/
>
> Performance counter stats for 'kill':
>
> 1,573,757 cp/cpu-cycles/
>
> Keeping the pattern matching logic, but making the name unique
> in case there's no other match found. That fixes the example
> above and hopefully does not screw up anything else.
>
> After:
> # perf stat -e cp/cpu-cycles/ kill
>
> Performance counter stats for 'kill':
>
> 1,573,757 cpu/cpu-cycles/


The output is 100% identical?

-Andi

2018-06-06 23:32:18

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC 00/10] perf: Add cputime events/metrics

> I had some issues with IDLE counter being miscounted due to stopping
> of the idle tick. I tried to solve it in this patch (it's part of the
> patchset):
> perf/cputime: Don't stop idle tick if there's live cputime event
>
> but I'm pretty sure it's wrong and there's better solution.

At least on intel we already have hardware counters for different idle
states. You just would need to add them and convert to the same
unit.

But of course it's still useful when this is not available.

> My current plan is now to read those counters in perf top/record/report
> to show (at least) the idle percentage for the current profile.

It's useful. Thanks for working on it. I was thinking about doing
something similar for some time.

-Andi

2018-06-07 06:34:33

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 01/10] perf tools: Uniquify the event name if there's no other matched event

On Wed, Jun 06, 2018 at 04:19:02PM -0700, Andi Kleen wrote:
> On Thu, Jun 07, 2018 at 12:15:04AM +0200, Jiri Olsa wrote:
> > Currently by default we try to match the user specified PMU
> > name to all PMU units available and use them to aggregate
> > all matched PMUs event counts into one 'pattern' event.
> >
> > While this is useful for uncore events, it screws up names
> > for other events, where this is not desirable, like:
> >
> > Before:
> > # perf stat -e cp/cpu-cycles/ kill
>
> I assume you mean cpU/cpu-cycles/
> >
> > Performance counter stats for 'kill':
> >
> > 1,573,757 cp/cpu-cycles/
> >
> > Keeping the pattern matching logic, but making the name unique
> > in case there's no other match found. That fixes the example
> > above and hopefully does not screw up anything else.
> >
> > After:
> > # perf stat -e cp/cpu-cycles/ kill
> >
> > Performance counter stats for 'kill':
> >
> > 1,573,757 cpu/cpu-cycles/
>
>
> The output is 100% identical?

nope, the U is actualy missing.. that's the thing, the patern
matching allows you to put 'cp' instead of 'cpu' and the final
output is screwed.. also the metrics won't match the proper event

jirka

2018-06-07 16:04:00

by Stephane Eranian

[permalink] [raw]
Subject: Re: [PATCH 09/10] perf/cputime: Don't stop idle tick if there's live cputime event

Hi Jiri,

On Thu, Jun 7, 2018 at 8:45 AM Andi Kleen <[email protected]> wrote:
>
> On Thu, Jun 07, 2018 at 12:15:12AM +0200, Jiri Olsa wrote:
> > Disable stopping of the idle tick when having live cputime
> > event. When the tick is disabled, the idle counts are out
> > of date until next tick/update and perf cputime PMU provides
> > misleading counts.
>
> I really don't like this. This can totally change performance
> (e.g. less Turbo due to less idle) and performance tools shouldn't
> change the performance profile drastically.
>
You do not want to change the behavior of the kernel just because you
are monitoring.
This may introduce side effects on other events which may not otherwise exist.

2018-06-07 16:10:47

by Stephane Eranian

[permalink] [raw]
Subject: Re: [PATCH 01/10] perf tools: Uniquify the event name if there's no other matched event

On Wed, Jun 6, 2018 at 11:22 PM Jiri Olsa <[email protected]> wrote:
>
> On Wed, Jun 06, 2018 at 04:19:02PM -0700, Andi Kleen wrote:
> > On Thu, Jun 07, 2018 at 12:15:04AM +0200, Jiri Olsa wrote:
> > > Currently by default we try to match the user specified PMU
> > > name to all PMU units available and use them to aggregate
> > > all matched PMUs event counts into one 'pattern' event.
> > >
> > > While this is useful for uncore events, it screws up names
> > > for other events, where this is not desirable, like:
> > >
> > > Before:
> > > # perf stat -e cp/cpu-cycles/ kill
> >
> > I assume you mean cpU/cpu-cycles/
> > >
> > > Performance counter stats for 'kill':
> > >
> > > 1,573,757 cp/cpu-cycles/
> > >
> > > Keeping the pattern matching logic, but making the name unique
> > > in case there's no other match found. That fixes the example
> > > above and hopefully does not screw up anything else.
> > >
And the problem I have with this approach, is that you do not really
know what you are measuring.
You have not way of knowing that the count comes from multiple PMU instances:

perf stat -a -e uncore_cb/clockticks/ kill
0 uncore_cb/clockticks/

I think you need to report that this is aggregated from uncore_cbox_0
and uncore_cbox_1.
Otherwise it is not clear what I am monitoring. And you hope that what
you match in the regex
is related.
In my example should say:
0 uncore_cbox[0-1]/clockticks/

then it is clear what happened.

> > > After:
> > > # perf stat -e cp/cpu-cycles/ kill
> > >
> > > Performance counter stats for 'kill':
> > >
> > > 1,573,757 cpu/cpu-cycles/
> >
> >
> > The output is 100% identical?
>
> nope, the U is actualy missing.. that's the thing, the patern
> matching allows you to put 'cp' instead of 'cpu' and the final
> output is screwed.. also the metrics won't match the proper event
>
> jirka

2018-06-07 18:55:34

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 09/10] perf/cputime: Don't stop idle tick if there's live cputime event

On Thu, Jun 07, 2018 at 12:15:12AM +0200, Jiri Olsa wrote:
> Disable stopping of the idle tick when having live cputime
> event. When the tick is disabled, the idle counts are out
> of date until next tick/update and perf cputime PMU provides
> misleading counts.

I really don't like this. This can totally change performance
(e.g. less Turbo due to less idle) and performance tools shouldn't
change the performance profile drastically.

-Andi

2018-06-07 19:11:44

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH 03/10] perf stat: Add --interval-clear option

Em Thu, Jun 07, 2018 at 12:15:06AM +0200, Jiri Olsa escreveu:
> Adding --interval-clear option to clear the screen
> before next interval.

Better than:

watch -n 0 perf stat -a sleep 1

:-)

Tested and applied,

- Arnaldo

> Link: http://lkml.kernel.org/n/[email protected]
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> tools/perf/Documentation/perf-stat.txt | 3 +++
> tools/perf/builtin-stat.c | 11 +++++++++--
> 2 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
> index 5dfe102fb5b5..b10a90b6a718 100644
> --- a/tools/perf/Documentation/perf-stat.txt
> +++ b/tools/perf/Documentation/perf-stat.txt
> @@ -178,6 +178,9 @@ Print count deltas for fixed number of times.
> This option should be used together with "-I" option.
> example: 'perf stat -I 1000 --interval-count 2 -e cycles -a'
>
> +--interval-clear::
> +Clear the screen before next interval.
> +
> --timeout msecs::
> Stop the 'perf stat' session and print count deltas after N milliseconds (minimum: 10 ms).
> This option is not supported with the "-I" option.
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index fce46252f89c..067d8b5b2c83 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -65,6 +65,7 @@
> #include "util/tool.h"
> #include "util/string2.h"
> #include "util/metricgroup.h"
> +#include "util/top.h"
> #include "asm/bug.h"
>
> #include <linux/time64.h>
> @@ -173,6 +174,7 @@ static struct cpu_map *aggr_map;
> static aggr_get_id_t aggr_get_id;
> static bool append_file;
> static bool interval_count;
> +static bool interval_clear;
> static const char *output_name;
> static int output_fd;
> static int print_free_counters_hint;
> @@ -1713,9 +1715,12 @@ static void print_interval(char *prefix, struct timespec *ts)
> FILE *output = stat_config.output;
> static int num_print_interval;
>
> + if (interval_clear)
> + puts(CONSOLE_CLEAR);
> +
> sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, csv_sep);
>
> - if (num_print_interval == 0 && !csv_output) {
> + if ((num_print_interval == 0 && !csv_output) || interval_clear) {
> switch (stat_config.aggr_mode) {
> case AGGR_SOCKET:
> fprintf(output, "# time socket cpus");
> @@ -1747,7 +1752,7 @@ static void print_interval(char *prefix, struct timespec *ts)
> }
> }
>
> - if (num_print_interval == 0 && metric_only)
> + if ((num_print_interval == 0 && metric_only) || interval_clear)
> print_metric_headers(" ", true);
> if (++num_print_interval == 25)
> num_print_interval = 0;
> @@ -2066,6 +2071,8 @@ static const struct option stat_options[] = {
> "(overhead is possible for values <= 100ms)"),
> OPT_INTEGER(0, "interval-count", &stat_config.times,
> "print counts for fixed number of times"),
> + OPT_BOOLEAN(0, "interval-clear", &interval_clear,
> + "clear screen in between new interval"),
> OPT_UINTEGER(0, "timeout", &stat_config.timeout,
> "stop workload and print counts after a timeout period in ms (>= 10ms)"),
> OPT_SET_UINT(0, "per-socket", &stat_config.aggr_mode,
> --
> 2.13.6

2018-06-07 19:11:52

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH 04/10] perf stat: Use only color_fprintf call in print_metric_only

Em Thu, Jun 07, 2018 at 12:15:07AM +0200, Jiri Olsa escreveu:
> We can call color_fprintf also for non color case, it's
> handled properly. This change simplifies following patch.

Thanks, applied.

- Arnaldo

2018-06-07 19:11:54

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH 05/10] perf stat: Fix metric column display

Em Thu, Jun 07, 2018 at 12:15:08AM +0200, Jiri Olsa escreveu:
> Make the metric only display aligned.
>
> Before:
> # perf stat --topdown -I 1000
> # time core cpus retiring bad speculation frontend bound backend bound
> 1.000394323 S0-C0 2 37.4% 12.0% 31.4% 19.2%
> 1.000394323 S0-C1 2 25.1% 9.2% 43.8% 21.9%
> 2.001521204 S0-C0 2 36.4% 11.4% 32.4% 19.8%
> 2.001521204 S0-C1 2 26.2% 9.4% 43.1% 21.3%
> 3.001930208 S0-C0 2 35.1% 10.7% 33.6% 20.6%
> 3.001930208 S0-C1 2 28.9% 10.0% 40.0% 21.1%
>
> After:
> # perf stat --topdown -I 1000
> # time core cpus retiring bad speculation frontend bound backend bound
> 1.000303722 S0-C0 2 34.2% 7.6% 34.2% 24.0%
> 1.000303722 S0-C1 2 33.1% 6.4% 36.9% 23.6%
> 2.001281055 S0-C0 2 34.6% 6.7% 36.8% 21.8%
> 2.001281055 S0-C1 2 32.8% 7.1% 38.1% 22.0%
> 3.001546080 S0-C0 2 39.3% 5.5% 32.7% 22.5%
> 3.001546080 S0-C1 2 37.8% 6.0% 33.1% 23.1%

Thanks, tested, applied.

- Arnaldo

2018-06-07 19:12:14

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH 07/10] perf stat: Add event parsing error handling to add_default_attributes

Em Thu, Jun 07, 2018 at 04:04:55PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Thu, Jun 07, 2018 at 12:15:10AM +0200, Jiri Olsa escreveu:
> > Adding missing error handling for parse_events calls
> > in add_default_attributes functions. The error handler
> > displays error details, like for transactions (-T):
>
> Applied up to here, that are trivial, waiting a bit more discussion
> about the really new stuff,

Except 1/10, that I also left for later as there is ongoing discussion.

- Arnaldo

2018-06-07 19:12:21

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH 02/10] perf tools: Fix error index for pmu event parser

Em Thu, Jun 07, 2018 at 12:15:05AM +0200, Jiri Olsa escreveu:
> For events we provide specific error message we need to set
> error column index, PMU parser is missing that, adding it.
>
> Before:
> $ perf stat -e cycles,krava/cycles/ kill
> event syntax error: 'cycles,krava/cycles/'
> \___ Cannot find PMU `krava'. Missing kernel support?
>
> After:
> $ perf stat -e cycles,krava/cycles/ kill
> event syntax error: 'cycles,krava/cycles/'
> \___ Cannot find PMU `krava'. Missing kernel support?

Thanks, tested and applied.

- Arnaldo

2018-06-07 19:13:17

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH 07/10] perf stat: Add event parsing error handling to add_default_attributes

Em Thu, Jun 07, 2018 at 12:15:10AM +0200, Jiri Olsa escreveu:
> Adding missing error handling for parse_events calls
> in add_default_attributes functions. The error handler
> displays error details, like for transactions (-T):

Applied up to here, that are trivial, waiting a bit more discussion
about the really new stuff,

Thanks,

- Arnaldo

> Before:
> $ perf stat -T
> Cannot set up transaction events
>
> After:
> $ perf stat -T
> Cannot set up transaction events
> event syntax error: '..cycles,cpu/cycles-t/,cpu/tx-start/,cpu/el-start/,cpu/cycles-ct/}'
> \___ unknown term
>
> Link: http://lkml.kernel.org/n/[email protected]
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> tools/perf/builtin-stat.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index a8f93885763a..cc3dd85d5a60 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -2451,14 +2451,13 @@ static int add_default_attributes(void)
> (PERF_COUNT_HW_CACHE_OP_PREFETCH << 8) |
> (PERF_COUNT_HW_CACHE_RESULT_MISS << 16) },
> };
> + struct parse_events_error errinfo;
>
> /* Set attrs if no event is selected and !null_run: */
> if (null_run)
> return 0;
>
> if (transaction_run) {
> - struct parse_events_error errinfo;
> -
> if (pmu_have_event("cpu", "cycles-ct") &&
> pmu_have_event("cpu", "el-start"))
> err = parse_events(evsel_list, transaction_attrs,
> @@ -2469,6 +2468,7 @@ static int add_default_attributes(void)
> &errinfo);
> if (err) {
> fprintf(stderr, "Cannot set up transaction events\n");
> + parse_events_print_error(&errinfo, transaction_attrs);
> return -1;
> }
> return 0;
> @@ -2494,10 +2494,11 @@ static int add_default_attributes(void)
> pmu_have_event("msr", "smi")) {
> if (!force_metric_only)
> metric_only = true;
> - err = parse_events(evsel_list, smi_cost_attrs, NULL);
> + err = parse_events(evsel_list, smi_cost_attrs, &errinfo);
> } else {
> fprintf(stderr, "To measure SMI cost, it needs "
> "msr/aperf/, msr/smi/ and cpu/cycles/ support\n");
> + parse_events_print_error(&errinfo, smi_cost_attrs);
> return -1;
> }
> if (err) {
> @@ -2532,12 +2533,13 @@ static int add_default_attributes(void)
> if (topdown_attrs[0] && str) {
> if (warn)
> arch_topdown_group_warn();
> - err = parse_events(evsel_list, str, NULL);
> + err = parse_events(evsel_list, str, &errinfo);
> if (err) {
> fprintf(stderr,
> "Cannot set up top down events %s: %d\n",
> str, err);
> free(str);
> + parse_events_print_error(&errinfo, str);
> return -1;
> }
> } else {
> --
> 2.13.6

2018-06-08 00:09:38

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 01/10] perf tools: Uniquify the event name if there's no other matched event

On Thu, Jun 07, 2018 at 09:09:10AM -0700, Stephane Eranian wrote:
> On Wed, Jun 6, 2018 at 11:22 PM Jiri Olsa <[email protected]> wrote:
> >
> > On Wed, Jun 06, 2018 at 04:19:02PM -0700, Andi Kleen wrote:
> > > On Thu, Jun 07, 2018 at 12:15:04AM +0200, Jiri Olsa wrote:
> > > > Currently by default we try to match the user specified PMU
> > > > name to all PMU units available and use them to aggregate
> > > > all matched PMUs event counts into one 'pattern' event.
> > > >
> > > > While this is useful for uncore events, it screws up names
> > > > for other events, where this is not desirable, like:
> > > >
> > > > Before:
> > > > # perf stat -e cp/cpu-cycles/ kill
> > >
> > > I assume you mean cpU/cpu-cycles/
> > > >
> > > > Performance counter stats for 'kill':
> > > >
> > > > 1,573,757 cp/cpu-cycles/
> > > >
> > > > Keeping the pattern matching logic, but making the name unique
> > > > in case there's no other match found. That fixes the example
> > > > above and hopefully does not screw up anything else.
> > > >
> And the problem I have with this approach, is that you do not really
> know what you are measuring.
> You have not way of knowing that the count comes from multiple PMU instances:
>
> perf stat -a -e uncore_cb/clockticks/ kill
> 0 uncore_cb/clockticks/
>
> I think you need to report that this is aggregated from uncore_cbox_0
> and uncore_cbox_1.
> Otherwise it is not clear what I am monitoring. And you hope that what
> you match in the regex
> is related.
> In my example should say:
> 0 uncore_cbox[0-1]/clockticks/

ok, makes sense.. also shouldnt be that hard to do.. will repost

thanks,
jirka

2018-06-08 00:12:47

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 09/10] perf/cputime: Don't stop idle tick if there's live cputime event

On Thu, Jun 07, 2018 at 09:01:30AM -0700, Stephane Eranian wrote:
> Hi Jiri,
>
> On Thu, Jun 7, 2018 at 8:45 AM Andi Kleen <[email protected]> wrote:
> >
> > On Thu, Jun 07, 2018 at 12:15:12AM +0200, Jiri Olsa wrote:
> > > Disable stopping of the idle tick when having live cputime
> > > event. When the tick is disabled, the idle counts are out
> > > of date until next tick/update and perf cputime PMU provides
> > > misleading counts.
> >
> > I really don't like this. This can totally change performance
> > (e.g. less Turbo due to less idle) and performance tools shouldn't
> > change the performance profile drastically.
> >
> You do not want to change the behavior of the kernel just because you
> are monitoring.
> This may introduce side effects on other events which may not otherwise exist.

right.. I guess we can survive few seconds of misleading idle counts
and perhaps we could detect nohz is enabled and warn about this

thanks,
jirka

Subject: [tip:perf/urgent] perf stat: Add --interval-clear option

Commit-ID: 9660e08ee8cbc94ac835f2c30576c6e51fbece8f
Gitweb: https://git.kernel.org/tip/9660e08ee8cbc94ac835f2c30576c6e51fbece8f
Author: Jiri Olsa <[email protected]>
AuthorDate: Thu, 7 Jun 2018 00:15:06 +0200
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Thu, 7 Jun 2018 15:53:36 -0300

perf stat: Add --interval-clear option

Adding --interval-clear option to clear the screen before next interval.

Committer testing:

# perf stat -I 1000 --interval-clear

And, as expected, it behaves almost like:

# watch -n 0 perf stat -a sleep 1

Signed-off-by: Jiri Olsa <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Milian Wolff <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/Documentation/perf-stat.txt | 3 +++
tools/perf/builtin-stat.c | 11 +++++++++--
2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 5dfe102fb5b5..b10a90b6a718 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -178,6 +178,9 @@ Print count deltas for fixed number of times.
This option should be used together with "-I" option.
example: 'perf stat -I 1000 --interval-count 2 -e cycles -a'

+--interval-clear::
+Clear the screen before next interval.
+
--timeout msecs::
Stop the 'perf stat' session and print count deltas after N milliseconds (minimum: 10 ms).
This option is not supported with the "-I" option.
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 096ccb25c11f..f1532e3ac7d7 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -65,6 +65,7 @@
#include "util/tool.h"
#include "util/string2.h"
#include "util/metricgroup.h"
+#include "util/top.h"
#include "asm/bug.h"

#include <linux/time64.h>
@@ -173,6 +174,7 @@ static struct cpu_map *aggr_map;
static aggr_get_id_t aggr_get_id;
static bool append_file;
static bool interval_count;
+static bool interval_clear;
static const char *output_name;
static int output_fd;
static int print_free_counters_hint;
@@ -1704,9 +1706,12 @@ static void print_interval(char *prefix, struct timespec *ts)
FILE *output = stat_config.output;
static int num_print_interval;

+ if (interval_clear)
+ puts(CONSOLE_CLEAR);
+
sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, csv_sep);

- if (num_print_interval == 0 && !csv_output) {
+ if ((num_print_interval == 0 && !csv_output) || interval_clear) {
switch (stat_config.aggr_mode) {
case AGGR_SOCKET:
fprintf(output, "# time socket cpus");
@@ -1738,7 +1743,7 @@ static void print_interval(char *prefix, struct timespec *ts)
}
}

- if (num_print_interval == 0 && metric_only)
+ if ((num_print_interval == 0 && metric_only) || interval_clear)
print_metric_headers(" ", true);
if (++num_print_interval == 25)
num_print_interval = 0;
@@ -2057,6 +2062,8 @@ static const struct option stat_options[] = {
"(overhead is possible for values <= 100ms)"),
OPT_INTEGER(0, "interval-count", &stat_config.times,
"print counts for fixed number of times"),
+ OPT_BOOLEAN(0, "interval-clear", &interval_clear,
+ "clear screen in between new interval"),
OPT_UINTEGER(0, "timeout", &stat_config.timeout,
"stop workload and print counts after a timeout period in ms (>= 10ms)"),
OPT_SET_UINT(0, "per-socket", &stat_config.aggr_mode,

Subject: [tip:perf/urgent] perf stat: Use only color_fprintf call in print_metric_only

Commit-ID: b37d33edbf41b532ddd156707c037c6f4784e40b
Gitweb: https://git.kernel.org/tip/b37d33edbf41b532ddd156707c037c6f4784e40b
Author: Jiri Olsa <[email protected]>
AuthorDate: Thu, 7 Jun 2018 00:15:07 +0200
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Thu, 7 Jun 2018 15:58:13 -0300

perf stat: Use only color_fprintf call in print_metric_only

We can call color_fprintf also for non color case, it's handled
properly. This change simplifies following patch.

Signed-off-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Milian Wolff <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/builtin-stat.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index f1532e3ac7d7..9e7b6f108956 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1008,10 +1008,7 @@ static void print_metric_only(void *ctx, const char *color, const char *fmt,
if (!valid_only_metric(unit))
return;
unit = fixunit(buf, os->evsel, unit);
- if (color)
- n = color_fprintf(out, color, fmt, val);
- else
- n = fprintf(out, fmt, val);
+ n = color_fprintf(out, color ?: "", fmt, val);
if (n > METRIC_ONLY_LEN)
n = METRIC_ONLY_LEN;
if (mlen < strlen(unit))

Subject: [tip:perf/urgent] perf stat: Allow to specify specific metric column len

Commit-ID: c1a1f5d9da800dc715d8c1d8a9692c63c70c2955
Gitweb: https://git.kernel.org/tip/c1a1f5d9da800dc715d8c1d8a9692c63c70c2955
Author: Jiri Olsa <[email protected]>
AuthorDate: Thu, 7 Jun 2018 00:15:09 +0200
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Thu, 7 Jun 2018 16:01:44 -0300

perf stat: Allow to specify specific metric column len

The following change will introduce new metrics, that doesn't need such
wide hard coded spacing. Switch METRIC_ONLY_LEN macro usage with
metric_only_len variable.

Signed-off-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Milian Wolff <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/builtin-stat.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 8f3fdc052728..3fc1f5286d50 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -145,6 +145,8 @@ static struct target target = {

typedef int (*aggr_get_id_t)(struct cpu_map *m, int cpu);

+#define METRIC_ONLY_LEN 20
+
static int run_count = 1;
static bool no_inherit = false;
static volatile pid_t child_pid = -1;
@@ -182,6 +184,7 @@ static int print_mixed_hw_group_error;
static u64 *walltime_run;
static bool ru_display = false;
static struct rusage ru_data;
+static unsigned int metric_only_len = METRIC_ONLY_LEN;

struct perf_stat {
bool record;
@@ -969,8 +972,6 @@ static void print_metric_csv(void *ctx,
fprintf(out, "%s%s%s%s", csv_sep, vals, csv_sep, unit);
}

-#define METRIC_ONLY_LEN 20
-
/* Filter out some columns that don't work well in metrics only mode */

static bool valid_only_metric(const char *unit)
@@ -1002,7 +1003,7 @@ static void print_metric_only(void *ctx, const char *color, const char *fmt,
struct outstate *os = ctx;
FILE *out = os->fh;
char buf[1024], str[1024];
- unsigned mlen = METRIC_ONLY_LEN;
+ unsigned mlen = metric_only_len;

if (!valid_only_metric(unit))
return;
@@ -1054,7 +1055,7 @@ static void print_metric_header(void *ctx, const char *color __maybe_unused,
if (csv_output)
fprintf(os->fh, "%s%s", unit, csv_sep);
else
- fprintf(os->fh, "%*s ", METRIC_ONLY_LEN, unit);
+ fprintf(os->fh, "%*s ", metric_only_len, unit);
}

static void nsec_printout(int id, int nr, struct perf_evsel *evsel, double avg)

Subject: [tip:perf/urgent] perf stat: Fix metric column header display alignment

Commit-ID: f515572734fb323aa0efe9ea2c546cd7fee327f7
Gitweb: https://git.kernel.org/tip/f515572734fb323aa0efe9ea2c546cd7fee327f7
Author: Jiri Olsa <[email protected]>
AuthorDate: Thu, 7 Jun 2018 00:15:08 +0200
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Thu, 7 Jun 2018 15:59:13 -0300

perf stat: Fix metric column header display alignment

Make the metric only display aligned.

Before:
# perf stat --topdown -I 1000
# time core cpus retiring bad speculation frontend bound backend bound
1.000394323 S0-C0 2 37.4% 12.0% 31.4% 19.2%
1.000394323 S0-C1 2 25.1% 9.2% 43.8% 21.9%
2.001521204 S0-C0 2 36.4% 11.4% 32.4% 19.8%
2.001521204 S0-C1 2 26.2% 9.4% 43.1% 21.3%
3.001930208 S0-C0 2 35.1% 10.7% 33.6% 20.6%
3.001930208 S0-C1 2 28.9% 10.0% 40.0% 21.1%

After:
# perf stat --topdown -I 1000
# time core cpus retiring bad speculation frontend bound backend bound
1.000303722 S0-C0 2 34.2% 7.6% 34.2% 24.0%
1.000303722 S0-C1 2 33.1% 6.4% 36.9% 23.6%
2.001281055 S0-C0 2 34.6% 6.7% 36.8% 21.8%
2.001281055 S0-C1 2 32.8% 7.1% 38.1% 22.0%
3.001546080 S0-C0 2 39.3% 5.5% 32.7% 22.5%
3.001546080 S0-C1 2 37.8% 6.0% 33.1% 23.1%

Signed-off-by: Jiri Olsa <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Milian Wolff <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/builtin-stat.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 9e7b6f108956..8f3fdc052728 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1001,19 +1001,20 @@ static void print_metric_only(void *ctx, const char *color, const char *fmt,
{
struct outstate *os = ctx;
FILE *out = os->fh;
- int n;
- char buf[1024];
+ char buf[1024], str[1024];
unsigned mlen = METRIC_ONLY_LEN;

if (!valid_only_metric(unit))
return;
unit = fixunit(buf, os->evsel, unit);
- n = color_fprintf(out, color ?: "", fmt, val);
- if (n > METRIC_ONLY_LEN)
- n = METRIC_ONLY_LEN;
if (mlen < strlen(unit))
mlen = strlen(unit) + 1;
- fprintf(out, "%*s", mlen - n, "");
+
+ if (color)
+ mlen += strlen(color) + sizeof(PERF_COLOR_RESET) - 1;
+
+ color_snprintf(str, sizeof(str), color ?: "", fmt, val);
+ fprintf(out, "%*s ", mlen, str);
}

static void print_metric_only_csv(void *ctx, const char *color __maybe_unused,
@@ -1053,7 +1054,7 @@ static void print_metric_header(void *ctx, const char *color __maybe_unused,
if (csv_output)
fprintf(os->fh, "%s%s", unit, csv_sep);
else
- fprintf(os->fh, "%-*s ", METRIC_ONLY_LEN, unit);
+ fprintf(os->fh, "%*s ", METRIC_ONLY_LEN, unit);
}

static void nsec_printout(int id, int nr, struct perf_evsel *evsel, double avg)
@@ -1721,7 +1722,7 @@ static void print_interval(char *prefix, struct timespec *ts)
fprintf(output, " counts %*s events\n", unit_width, "unit");
break;
case AGGR_NONE:
- fprintf(output, "# time CPU");
+ fprintf(output, "# time CPU ");
if (!metric_only)
fprintf(output, " counts %*s events\n", unit_width, "unit");
break;

Subject: [tip:perf/urgent] perf stat: Add event parsing error handling to add_default_attributes

Commit-ID: a5cfa6217c94a1f1cfad4481fc14f5fc399abde3
Gitweb: https://git.kernel.org/tip/a5cfa6217c94a1f1cfad4481fc14f5fc399abde3
Author: Jiri Olsa <[email protected]>
AuthorDate: Thu, 7 Jun 2018 00:15:10 +0200
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Thu, 7 Jun 2018 16:03:21 -0300

perf stat: Add event parsing error handling to add_default_attributes

Add missing error handling for parse_events calls in add_default_attributes
functions. The error handler displays error details, like for transactions (-T):

Before:
$ perf stat -T
Cannot set up transaction events

After:
$ perf stat -T
Cannot set up transaction events
event syntax error: '..cycles,cpu/cycles-t/,cpu/tx-start/,cpu/el-start/,cpu/cycles-ct/}'
\___ unknown term

Signed-off-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Milian Wolff <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/builtin-stat.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 3fc1f5286d50..22547a490e1f 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -2442,14 +2442,13 @@ static int add_default_attributes(void)
(PERF_COUNT_HW_CACHE_OP_PREFETCH << 8) |
(PERF_COUNT_HW_CACHE_RESULT_MISS << 16) },
};
+ struct parse_events_error errinfo;

/* Set attrs if no event is selected and !null_run: */
if (null_run)
return 0;

if (transaction_run) {
- struct parse_events_error errinfo;
-
if (pmu_have_event("cpu", "cycles-ct") &&
pmu_have_event("cpu", "el-start"))
err = parse_events(evsel_list, transaction_attrs,
@@ -2460,6 +2459,7 @@ static int add_default_attributes(void)
&errinfo);
if (err) {
fprintf(stderr, "Cannot set up transaction events\n");
+ parse_events_print_error(&errinfo, transaction_attrs);
return -1;
}
return 0;
@@ -2485,10 +2485,11 @@ static int add_default_attributes(void)
pmu_have_event("msr", "smi")) {
if (!force_metric_only)
metric_only = true;
- err = parse_events(evsel_list, smi_cost_attrs, NULL);
+ err = parse_events(evsel_list, smi_cost_attrs, &errinfo);
} else {
fprintf(stderr, "To measure SMI cost, it needs "
"msr/aperf/, msr/smi/ and cpu/cycles/ support\n");
+ parse_events_print_error(&errinfo, smi_cost_attrs);
return -1;
}
if (err) {
@@ -2523,12 +2524,13 @@ static int add_default_attributes(void)
if (topdown_attrs[0] && str) {
if (warn)
arch_topdown_group_warn();
- err = parse_events(evsel_list, str, NULL);
+ err = parse_events(evsel_list, str, &errinfo);
if (err) {
fprintf(stderr,
"Cannot set up top down events %s: %d\n",
str, err);
free(str);
+ parse_events_print_error(&errinfo, str);
return -1;
}
} else {

Subject: [tip:perf/urgent] perf tools: Fix error index for pmu event parser

Commit-ID: f7fa827f5f432a0b1f34e10fc49da93aeef9f817
Gitweb: https://git.kernel.org/tip/f7fa827f5f432a0b1f34e10fc49da93aeef9f817
Author: Jiri Olsa <[email protected]>
AuthorDate: Thu, 7 Jun 2018 00:15:05 +0200
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Thu, 7 Jun 2018 15:50:14 -0300

perf tools: Fix error index for pmu event parser

For events we provide specific error message we need to set error column
index, PMU parser is missing that, adding it.

Before:

$ perf stat -e cycles,krava/cycles/ kill
event syntax error: 'cycles,krava/cycles/'
\___ Cannot find PMU `krava'. Missing kernel support?

After:

$ perf stat -e cycles,krava/cycles/ kill
event syntax error: 'cycles,krava/cycles/'
\___ Cannot find PMU `krava'. Missing kernel support?

Signed-off-by: Jiri Olsa <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Milian Wolff <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/parse-events.y | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index 155d2570274f..da8fe57691b8 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -227,11 +227,16 @@ event_def: event_pmu |
event_pmu:
PE_NAME opt_pmu_config
{
+ struct parse_events_state *parse_state = _parse_state;
+ struct parse_events_error *error = parse_state->error;
struct list_head *list, *orig_terms, *terms;

if (parse_events_copy_term_list($2, &orig_terms))
YYABORT;

+ if (error)
+ error->idx = @1.first_column;
+
ALLOC_LIST(list);
if (parse_events_add_pmu(_parse_state, list, $1, $2, false, false)) {
struct perf_pmu *pmu = NULL;

2018-09-26 14:44:42

by Milian Wolff

[permalink] [raw]
Subject: Re: [RFC 00/10] perf: Add cputime events/metrics

On Thursday, June 7, 2018 1:10:18 AM CEST Andi Kleen wrote:
> > I had some issues with IDLE counter being miscounted due to stopping
> > of the idle tick. I tried to solve it in this patch (it's part of the
> >
> > patchset):
> > perf/cputime: Don't stop idle tick if there's live cputime event
> >
> > but I'm pretty sure it's wrong and there's better solution.
>
> At least on intel we already have hardware counters for different idle
> states. You just would need to add them and convert to the same
> unit.
>
> But of course it's still useful when this is not available.
>
> > My current plan is now to read those counters in perf top/record/report
> > to show (at least) the idle percentage for the current profile.
>
> It's useful. Thanks for working on it. I was thinking about doing
> something similar for some time.

Hey Jiri,

what happened to this patch series? I also believe it's super useful, even
when it's not yet perfect.

Thanks

--
Milian Wolff | [email protected] | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts


Attachments:
smime.p7s (3.74 kB)

2018-09-26 21:49:11

by Jiri Olsa

[permalink] [raw]
Subject: Re: [RFC 00/10] perf: Add cputime events/metrics

On Wed, Sep 26, 2018 at 04:44:08PM +0200, Milian Wolff wrote:
> On Thursday, June 7, 2018 1:10:18 AM CEST Andi Kleen wrote:
> > > I had some issues with IDLE counter being miscounted due to stopping
> > > of the idle tick. I tried to solve it in this patch (it's part of the
> > >
> > > patchset):
> > > perf/cputime: Don't stop idle tick if there's live cputime event
> > >
> > > but I'm pretty sure it's wrong and there's better solution.
> >
> > At least on intel we already have hardware counters for different idle
> > states. You just would need to add them and convert to the same
> > unit.
> >
> > But of course it's still useful when this is not available.
> >
> > > My current plan is now to read those counters in perf top/record/report
> > > to show (at least) the idle percentage for the current profile.
> >
> > It's useful. Thanks for working on it. I was thinking about doing
> > something similar for some time.
>
> Hey Jiri,
>
> what happened to this patch series? I also believe it's super useful, even
> when it's not yet perfect.

heya,
got sidetracked.. thanks for feedback, will refresh
and repost sane version

thanks,
jirka