2020-07-10 15:12:49

by Adrian Hunter

Subject: [PATCH V2 00/12] perf intel-pt: Add support for decoding FUP/TIP only

Hi

Here are some fixes and small improvements for Intel PT.

Changes in V2:
For d/e flags, use +/- alphabetic options instead of numbers
Update help text
Improve documentation


Adrian Hunter (12):
perf intel-pt: Fix FUP packet state
perf intel-pt: Fix duplicate branch after CBR
perf tools: Improve aux_output not supported error
perf auxtrace: Add missing itrace options to help text
perf auxtrace: Add optional error flags to the itrace 'e' option
perf intel-pt: Use itrace error flags to suppress some errors
perf auxtrace: Add optional log flags to the itrace 'd' option
perf intel-pt: Use itrace debug log flags to suppress some messages
perf intel-pt: Time filter logged perf events
perf auxtrace: Add itrace 'q' option for quicker, less detailed decoding
perf intel-pt: Add support for decoding FUP/TIP only
perf intel-pt: Add support for decoding PSB+ only

tools/perf/Documentation/itrace.txt | 14 ++
tools/perf/Documentation/perf-intel-pt.txt | 63 +++++-
tools/perf/util/auxtrace.c | 50 +++++
tools/perf/util/auxtrace.h | 31 ++-
tools/perf/util/evsel.c | 4 +
.../perf/util/intel-pt-decoder/intel-pt-decoder.c | 214 +++++++++++++++++++--
.../perf/util/intel-pt-decoder/intel-pt-decoder.h | 1 +
tools/perf/util/intel-pt.c | 45 ++++-
8 files changed, 389 insertions(+), 33 deletions(-)


Regards
Adrian


2020-07-10 15:13:20

by Adrian Hunter

Subject: [PATCH V2 01/12] perf intel-pt: Fix FUP packet state

While walking code towards a FUP ip, the packet state is
INTEL_PT_STATE_FUP or INTEL_PT_STATE_FUP_NO_TIP. That was mishandled,
resulting in the state becoming INTEL_PT_STATE_IN_SYNC prematurely.
The result was an occasional lost EXSTOP event.
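The fix amounts to a simple rule: the caller records FUP or FUP_NO_TIP before the walk, and only the walk itself drops back to IN_SYNC once it reaches the FUP ip. The toy model below is a sketch of that rule only. The enum, struct and function names are hypothetical and far simpler than the real struct intel_pt_decoder, and the step limit merely stands in for the walk stopping early to report something.

```c
#include <assert.h>

/* Hypothetical subset of the decoder's packet states. */
enum pkt_state {
	STATE_IN_SYNC,
	STATE_FUP,
	STATE_FUP_NO_TIP,
};

/* Hypothetical, heavily simplified decoder. */
struct decoder {
	enum pkt_state pkt_state;
	unsigned long ip;     /* current position of the code walk */
	unsigned long fup_ip; /* ip carried by the FUP packet */
};

/*
 * Walk at most max_steps instructions towards the FUP ip. Only reaching
 * it switches the state to IN_SYNC, which is where the patch now sets
 * INTEL_PT_STATE_IN_SYNC. Returns 1 if the FUP ip was reached, or 0 if
 * the walk stopped early, leaving FUP / FUP_NO_TIP in place.
 */
static int walk_fup(struct decoder *d, int max_steps)
{
	assert(d->pkt_state == STATE_FUP || d->pkt_state == STATE_FUP_NO_TIP);
	while (max_steps-- > 0) {
		if (d->ip == d->fup_ip) {
			d->pkt_state = STATE_IN_SYNC;
			return 1;
		}
		d->ip++; /* stand-in for decoding one instruction */
	}
	return 0;
}

/* Record the FUP state *before* walking and never overwrite it afterwards. */
static int start_fup(struct decoder *d, unsigned long fup_ip, int no_tip,
		     int max_steps)
{
	d->fup_ip = fup_ip;
	d->pkt_state = no_tip ? STATE_FUP_NO_TIP : STATE_FUP;
	return walk_fup(d, max_steps);
}
```

Because the caller no longer rewrites the state after walk_fup() returns, a walk that stops early keeps its FUP state and the next call resumes towards the same ip.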

Signed-off-by: Adrian Hunter <[email protected]>
Cc: [email protected]
---
.../util/intel-pt-decoder/intel-pt-decoder.c | 21 +++++++------------
1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
index f8ccfd6be0ee..75c4bd74d521 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
@@ -1164,6 +1164,7 @@ static int intel_pt_walk_fup(struct intel_pt_decoder *decoder)
return 0;
if (err == -EAGAIN ||
intel_pt_fup_with_nlip(decoder, &intel_pt_insn, ip, err)) {
+ decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
if (intel_pt_fup_event(decoder))
return 0;
return -EAGAIN;
@@ -1942,17 +1943,13 @@ static int intel_pt_walk_trace(struct intel_pt_decoder *decoder)
}
if (decoder->set_fup_mwait)
no_tip = true;
+ if (no_tip)
+ decoder->pkt_state = INTEL_PT_STATE_FUP_NO_TIP;
+ else
+ decoder->pkt_state = INTEL_PT_STATE_FUP;
err = intel_pt_walk_fup(decoder);
- if (err != -EAGAIN) {
- if (err)
- return err;
- if (no_tip)
- decoder->pkt_state =
- INTEL_PT_STATE_FUP_NO_TIP;
- else
- decoder->pkt_state = INTEL_PT_STATE_FUP;
- return 0;
- }
+ if (err != -EAGAIN)
+ return err;
if (no_tip) {
no_tip = false;
break;
@@ -2599,15 +2596,11 @@ const struct intel_pt_state *intel_pt_decode(struct intel_pt_decoder *decoder)
err = intel_pt_walk_tip(decoder);
break;
case INTEL_PT_STATE_FUP:
- decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
err = intel_pt_walk_fup(decoder);
if (err == -EAGAIN)
err = intel_pt_walk_fup_tip(decoder);
- else if (!err)
- decoder->pkt_state = INTEL_PT_STATE_FUP;
break;
case INTEL_PT_STATE_FUP_NO_TIP:
- decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
err = intel_pt_walk_fup(decoder);
if (err == -EAGAIN)
err = intel_pt_walk_trace(decoder);
--
2.17.1

2020-07-10 15:14:25

by Adrian Hunter

Subject: [PATCH V2 03/12] perf tools: Improve aux_output not supported error

For example:
Before:
$ perf record -e '{intel_pt/branch=0/,branch-loads/aux-output/ppp}' -- ls -l
Error:
branch-loads: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
After:
$ perf record -e '{intel_pt/branch=0/,branch-loads/aux-output/ppp}' -- ls -l
Error:
branch-loads: PMU Hardware doesn't support 'aux_output' feature
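The change only reorders the EOPNOTSUPP handling so that the aux_output attribute is checked before sample_period. A minimal sketch of that ordering, assuming a hypothetical attr_subset struct in place of perf_event_attr and covering only these two cases (the real evsel__open_strerror() goes on to further checks that are not modelled here):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the two perf_event_attr fields involved. */
struct attr_subset {
	unsigned int aux_output;
	unsigned long long sample_period;
};

static const char aux_output_fmt[] =
	"%s: PMU Hardware doesn't support 'aux_output' feature";
static const char sampling_fmt[] =
	"%s: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'";

/*
 * aux_output is tested first, so an event that both samples and asks for
 * aux-output gets the specific message. NULL means neither case applies.
 */
static const char *eopnotsupp_fmt(const struct attr_subset *attr)
{
	if (attr->aux_output)
		return aux_output_fmt;
	if (attr->sample_period != 0)
		return sampling_fmt;
	return NULL;
}

/* Format the chosen message; returns 0 if no message applies. */
static int eopnotsupp_msg(const struct attr_subset *attr, const char *name,
			  char *msg, size_t size)
{
	const char *fmt = eopnotsupp_fmt(attr);

	assert(msg != NULL && size > 0);
	if (!fmt)
		return 0;
	return snprintf(msg, size, fmt, name);
}
```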

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/perf/util/evsel.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 80a7f9862aec..6606c1e3b4fe 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2539,6 +2539,10 @@ int evsel__open_strerror(struct evsel *evsel, struct target *target,
"No such device - did you specify an out-of-range profile CPU?");
break;
case EOPNOTSUPP:
+ if (evsel->core.attr.aux_output)
+ return scnprintf(msg, size,
+ "%s: PMU Hardware doesn't support 'aux_output' feature",
+ evsel__name(evsel));
if (evsel->core.attr.sample_period != 0)
return scnprintf(msg, size,
"%s: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'",
--
2.17.1

2020-07-10 15:17:13

by Adrian Hunter

Subject: [PATCH V2 02/12] perf intel-pt: Fix duplicate branch after CBR

CBR events can result in a duplicate branch event, because the state type
defaults to a branch. Fix by clearing the state type.

Example: trace 'sleep' and hope for a frequency change

Before:

$ perf record -e intel_pt//u sleep 0.1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.034 MB perf.data ]
$ perf script --itrace=bpe > before.txt

After:

$ perf script --itrace=bpe > after.txt
$ diff -u before.txt after.txt
--- before.txt 2020-07-07 14:42:18.191508098 +0300
+++ after.txt 2020-07-07 14:42:36.587891753 +0300
@@ -29673,7 +29673,6 @@
sleep 93431 [007] 15411.619905: 1 branches:u: 0 [unknown] ([unknown]) => 7f0818abb2e0 clock_nanosleep@@GLIBC_2.17+0x0 (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
sleep 93431 [007] 15411.619905: 1 branches:u: 7f0818abb30c clock_nanosleep@@GLIBC_2.17+0x2c (/usr/lib/x86_64-linux-gnu/libc-2.31.so) => 0 [unknown] ([unknown])
sleep 93431 [007] 15411.720069: cbr: cbr: 15 freq: 1507 MHz ( 56%) 7f0818abb30c clock_nanosleep@@GLIBC_2.17+0x2c (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
- sleep 93431 [007] 15411.720069: 1 branches:u: 7f0818abb30c clock_nanosleep@@GLIBC_2.17+0x2c (/usr/lib/x86_64-linux-gnu/libc-2.31.so) => 0 [unknown] ([unknown])
sleep 93431 [007] 15411.720076: 1 branches:u: 0 [unknown] ([unknown]) => 7f0818abb30e clock_nanosleep@@GLIBC_2.17+0x2e (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
sleep 93431 [007] 15411.720077: 1 branches:u: 7f0818abb323 clock_nanosleep@@GLIBC_2.17+0x43 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) => 7f0818ac0eb7 __nanosleep+0x17 (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
sleep 93431 [007] 15411.720077: 1 branches:u: 7f0818ac0ebf __nanosleep+0x1f (/usr/lib/x86_64-linux-gnu/libc-2.31.so) => 55cb7e4c2827 rpl_nanosleep+0x97 (/usr/bin/sleep)

Signed-off-by: Adrian Hunter <[email protected]>
Fixes: 91de8684f1cff ("perf intel-pt: Cater for CBR change in PSB+")
Fixes: abe5a1d3e4bee ("perf intel-pt: Decoder to output CBR changes immediately")
Cc: [email protected]
---
tools/perf/util/intel-pt-decoder/intel-pt-decoder.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
index 75c4bd74d521..7ffcbd6fcd1a 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
@@ -1977,8 +1977,10 @@ static int intel_pt_walk_trace(struct intel_pt_decoder *decoder)
* possibility of another CBR change that gets caught up
* in the PSB+.
*/
- if (decoder->cbr != decoder->cbr_seen)
+ if (decoder->cbr != decoder->cbr_seen) {
+ decoder->state.type = 0;
return 0;
+ }
break;

case INTEL_PT_PIP:
@@ -2019,8 +2021,10 @@ static int intel_pt_walk_trace(struct intel_pt_decoder *decoder)

case INTEL_PT_CBR:
intel_pt_calc_cbr(decoder);
- if (decoder->cbr != decoder->cbr_seen)
+ if (decoder->cbr != decoder->cbr_seen) {
+ decoder->state.type = 0;
return 0;
+ }
break;

case INTEL_PT_MODE_EXEC:
--
2.17.1

2020-07-10 15:17:43

by Adrian Hunter

Subject: [PATCH V2 09/12] perf intel-pt: Time filter logged perf events

Change the debug logging (when used with the --time option) to time-filter
logged perf events, but allow that to be overridden by using "d+a" instead
of plain "d".

That can reduce the size of the log file.

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/perf/Documentation/perf-intel-pt.txt | 3 +++
tools/perf/util/intel-pt.c | 19 ++++++++++++++++---
2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/tools/perf/Documentation/perf-intel-pt.txt b/tools/perf/Documentation/perf-intel-pt.txt
index d22dead7bbe0..4666e4a83615 100644
--- a/tools/perf/Documentation/perf-intel-pt.txt
+++ b/tools/perf/Documentation/perf-intel-pt.txt
@@ -886,6 +886,9 @@ and that the resulting file may be very large. The "d" option may be followed
by flags which affect what debug messages will or will not be logged. Each flag
must be preceded by either '+' or '-'. The flags support by Intel PT are:
-a Suppress logging of perf events
+ +a Log all perf events
+By default, logged perf events are filtered by any specified time ranges, but
+flag +a overrides that.

In addition, the period of the "instructions" event can be specified. e.g.

diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 34caf24998dd..bddeb18648df 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -249,9 +249,22 @@ static void intel_pt_dump_sample(struct perf_session *session,
intel_pt_dump(pt, sample->aux_sample.data, sample->aux_sample.size);
}

-static bool intel_pt_log_events(struct intel_pt *pt)
+static bool intel_pt_log_events(struct intel_pt *pt, u64 tm)
{
- return !(pt->synth_opts.log_minus_flags & AUXTRACE_LOG_FLG_ALL_PERF_EVTS);
+ struct perf_time_interval *range = pt->synth_opts.ptime_range;
+ int n = pt->synth_opts.range_num;
+
+ if (pt->synth_opts.log_plus_flags & AUXTRACE_LOG_FLG_ALL_PERF_EVTS)
+ return true;
+
+ if (pt->synth_opts.log_minus_flags & AUXTRACE_LOG_FLG_ALL_PERF_EVTS)
+ return false;
+
+ /* perf_time__ranges_skip_sample does not work if time is zero */
+ if (!tm)
+ tm = 1;
+
+ return !n || !perf_time__ranges_skip_sample(range, n, tm);
}

static int intel_pt_do_fix_overlap(struct intel_pt *pt, struct auxtrace_buffer *a,
@@ -2746,7 +2759,7 @@ static int intel_pt_process_event(struct perf_session *session,
if (!err && event->header.type == PERF_RECORD_TEXT_POKE)
err = intel_pt_text_poke(pt, event);

- if (intel_pt_enable_logging && intel_pt_log_events(pt)) {
+ if (intel_pt_enable_logging && intel_pt_log_events(pt, sample->time)) {
intel_pt_log("event %u: cpu %d time %"PRIu64" tsc %#"PRIx64" ",
event->header.type, sample->cpu, sample->time, timestamp);
intel_pt_log_event(event);
--
2.17.1

2020-07-10 15:18:06

by Adrian Hunter

Subject: [PATCH V2 04/12] perf auxtrace: Add missing itrace options to help text

Add missing itrace options o, G and L, and show that i takes an optional
period.

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/perf/util/auxtrace.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 142ccf7d34df..e3ce5fb03ca0 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -604,13 +604,15 @@ bool auxtrace__evsel_is_auxtrace(struct perf_session *session,
struct evsel *evsel);

#define ITRACE_HELP \
-" i: synthesize instructions events\n" \
+" i[period]: synthesize instructions events\n" \
" b: synthesize branches events (branch misses for Arm SPE)\n" \
" c: synthesize branches events (calls only)\n" \
" r: synthesize branches events (returns only)\n" \
" x: synthesize transactions events\n" \
" w: synthesize ptwrite events\n" \
" p: synthesize power events\n" \
+" o: synthesize other events recorded due to the use\n" \
+" of aux-output (refer to perf record)\n" \
" e: synthesize error events\n" \
" d: create a debug log\n" \
" f: synthesize first level cache events\n" \
@@ -618,7 +620,9 @@ bool auxtrace__evsel_is_auxtrace(struct perf_session *session,
" t: synthesize TLB events\n" \
" a: synthesize remote access events\n" \
" g[len]: synthesize a call chain (use with i or x)\n" \
+" G[len]: synthesize a call chain on existing event records\n" \
" l[len]: synthesize last branch entries (use with i or x)\n" \
+" L[len]: synthesize last branch entries on existing event records\n" \
" sNUMBER: skip initial number of events\n" \
" PERIOD[ns|us|ms|i|t]: specify period to sample stream\n" \
" concatenate multiple options. Default is ibxwpe or cewp\n"
--
2.17.1

2020-07-10 15:18:08

by Adrian Hunter

Subject: [PATCH V2 07/12] perf auxtrace: Add optional log flags to the itrace 'd' option

Allow the 'd' option to be followed by flags which will affect what debug
messages will or will not be reported. Each flag must be preceded by either
'+' or '-'. The flags are:
a all perf events
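Each flag letter maps to one bit, (1 << (letter - 'a')), which is how AUXTRACE_LOG_FLG_ALL_PERF_EVTS and the error flags are defined in auxtrace.h. The parsing helper itself (get_flags()) is not part of this patch, so parse_flags() below is a hypothetical sketch of how a "+x-y" suffix can be turned into the plus and minus masks, not the perf implementation.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical parser for a "+x-y..." flag suffix. Each lowercase flag
 * letter sets bit (letter - 'a') in the plus or minus mask. Advances *pp
 * past the flags. Returns 0 on success, -1 on a malformed flag.
 */
static int parse_flags(const char **pp, unsigned int *plus, unsigned int *minus)
{
	const char *p;

	assert(pp && *pp && plus && minus);
	p = *pp;
	while (*p == '+' || *p == '-') {
		bool is_plus = *p++ == '+';

		if (*p < 'a' || *p > 'z')
			return -1; /* a sign must be followed by a flag letter */
		if (is_plus)
			*plus |= 1u << (*p - 'a');
		else
			*minus |= 1u << (*p - 'a');
		p++;
	}
	*pp = p; /* leave the caller at the next itrace option */
	return 0;
}
```

For example, "d+a" leaves bit 0 set in the plus mask, and "e-o-l" sets the 'o' and 'l' bits in the minus mask.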

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/perf/Documentation/itrace.txt | 5 +++++
tools/perf/util/auxtrace.c | 3 +++
tools/perf/util/auxtrace.h | 10 +++++++++-
3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/itrace.txt b/tools/perf/Documentation/itrace.txt
index 114d0544d7c7..9c0e8586ed47 100644
--- a/tools/perf/Documentation/itrace.txt
+++ b/tools/perf/Documentation/itrace.txt
@@ -53,3 +53,8 @@
The flags are:
o overflow
l trace data lost
+
+ If supported, the 'd' option may be followed by flags which affect what
+ debug messages will or will not be logged. Each flag must be preceded
+ by either '+' or '-'. The flags are:
+ a all perf events
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index f0b0758830ee..e028187c51fe 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -1483,6 +1483,9 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
break;
case 'd':
synth_opts->log = true;
+ if (get_flags(&p, &synth_opts->log_plus_flags,
+ &synth_opts->log_minus_flags))
+ goto out_err;
break;
case 'c':
synth_opts->branches = true;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index cfe6d00d8624..821ef5446a13 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -58,6 +58,8 @@ enum itrace_period_type {
#define AUXTRACE_ERR_FLG_OVERFLOW (1 << ('o' - 'a'))
#define AUXTRACE_ERR_FLG_DATA_LOST (1 << ('l' - 'a'))

+#define AUXTRACE_LOG_FLG_ALL_PERF_EVTS (1 << ('a' - 'a'))
+
/**
* struct itrace_synth_opts - AUX area tracing synthesis options.
* @set: indicates whether or not options have been set
@@ -96,6 +98,8 @@ enum itrace_period_type {
* @range_num: number of time intervals to trace
* @error_plus_flags: flags to affect what errors are reported
* @error_minus_flags: flags to affect what errors are reported
+ * @log_plus_flags: flags to affect what is logged
+ * @log_minus_flags: flags to affect what is logged
*/
struct itrace_synth_opts {
bool set;
@@ -131,6 +135,8 @@ struct itrace_synth_opts {
int range_num;
unsigned int error_plus_flags;
unsigned int error_minus_flags;
+ unsigned int log_plus_flags;
+ unsigned int log_minus_flags;
};

/**
@@ -624,7 +630,9 @@ bool auxtrace__evsel_is_auxtrace(struct perf_session *session,
" each flag must be preceded by + or -\n" \
" error flags are: o (overflow)\n" \
" l (data lost)\n" \
-" d: create a debug log\n" \
+" d[flags]: create a debug log\n" \
+" each flag must be preceded by + or -\n" \
+" log flags are: a (all perf events)\n" \
" f: synthesize first level cache events\n" \
" m: synthesize last level cache events\n" \
" t: synthesize TLB events\n" \
--
2.17.1

2020-07-10 15:18:40

by Adrian Hunter

Subject: [PATCH V2 08/12] perf intel-pt: Use itrace debug log flags to suppress some messages

The "d" option may be followed by flags which affect what debug messages
will or will not be logged. Each flag must be preceded by either '+' or
'-'. The flags supported by Intel PT are:
-a Suppress logging of perf events

Suppressing perf events is useful for decreasing the size of the log.

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/perf/Documentation/perf-intel-pt.txt | 5 ++++-
tools/perf/util/intel-pt.c | 17 ++++++++++-------
2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/tools/perf/Documentation/perf-intel-pt.txt b/tools/perf/Documentation/perf-intel-pt.txt
index 20ac592a2641..d22dead7bbe0 100644
--- a/tools/perf/Documentation/perf-intel-pt.txt
+++ b/tools/perf/Documentation/perf-intel-pt.txt
@@ -882,7 +882,10 @@ For example, for errors but not overflow or data lost errors:

The "d" option will cause the creation of a file "intel_pt.log" containing all
decoded packets and instructions. Note that this option slows down the decoder
-and that the resulting file may be very large.
+and that the resulting file may be very large. The "d" option may be followed
+by flags which affect what debug messages will or will not be logged. Each flag
+must be preceded by either '+' or '-'. The flags support by Intel PT are:
+ -a Suppress logging of perf events

In addition, the period of the "instructions" event can be specified. e.g.

diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index a1cb6a284a2b..34caf24998dd 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -249,6 +249,11 @@ static void intel_pt_dump_sample(struct perf_session *session,
intel_pt_dump(pt, sample->aux_sample.data, sample->aux_sample.size);
}

+static bool intel_pt_log_events(struct intel_pt *pt)
+{
+ return !(pt->synth_opts.log_minus_flags & AUXTRACE_LOG_FLG_ALL_PERF_EVTS);
+}
+
static int intel_pt_do_fix_overlap(struct intel_pt *pt, struct auxtrace_buffer *a,
struct auxtrace_buffer *b)
{
@@ -2585,10 +2590,6 @@ static int intel_pt_context_switch(struct intel_pt *pt, union perf_event *event,
return -EINVAL;
}

- intel_pt_log("context_switch: cpu %d pid %d tid %d time %"PRIu64" tsc %#"PRIx64"\n",
- cpu, pid, tid, sample->time, perf_time_to_tsc(sample->time,
- &pt->tc));
-
ret = intel_pt_sync_switch(pt, cpu, tid, sample->time);
if (ret <= 0)
return ret;
@@ -2745,9 +2746,11 @@ static int intel_pt_process_event(struct perf_session *session,
if (!err && event->header.type == PERF_RECORD_TEXT_POKE)
err = intel_pt_text_poke(pt, event);

- intel_pt_log("event %u: cpu %d time %"PRIu64" tsc %#"PRIx64" ",
- event->header.type, sample->cpu, sample->time, timestamp);
- intel_pt_log_event(event);
+ if (intel_pt_enable_logging && intel_pt_log_events(pt)) {
+ intel_pt_log("event %u: cpu %d time %"PRIu64" tsc %#"PRIx64" ",
+ event->header.type, sample->cpu, sample->time, timestamp);
+ intel_pt_log_event(event);
+ }

return err;
}
--
2.17.1

2020-07-10 15:18:47

by Adrian Hunter

Subject: [PATCH V2 06/12] perf intel-pt: Use itrace error flags to suppress some errors

The itrace "e" option may be followed by flags which affect what errors
will or will not be reported. Each flag must be preceded by either '+' or '-'.
The flags supported by Intel PT are:
-o Suppress overflow errors
-l Suppress trace data lost errors
For example, to report errors but not overflow or data lost errors:

--itrace=e-o-l

Suppressing those errors can be useful for testing and debugging
because they are not due to decoding.

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/perf/Documentation/perf-intel-pt.txt | 9 ++++++++-
tools/perf/util/intel-pt.c | 9 +++++++++
2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-intel-pt.txt b/tools/perf/Documentation/perf-intel-pt.txt
index f4cd49a7fcdb..20ac592a2641 100644
--- a/tools/perf/Documentation/perf-intel-pt.txt
+++ b/tools/perf/Documentation/perf-intel-pt.txt
@@ -871,7 +871,14 @@ Developer Manuals.

Error events show where the decoder lost the trace. Error events
are quite important. Users must know if what they are seeing is a complete
-picture or not.
+picture or not. The "e" option may be followed by flags which affect what errors
+will or will not be reported. Each flag must be preceded by either '+' or '-'.
+The flags supported by Intel PT are:
+ -o Suppress overflow errors
+ -l Suppress trace data lost errors
+For example, for errors but not overflow or data lost errors:
+
+ --itrace=e-o-l

The "d" option will cause the creation of a file "intel_pt.log" containing all
decoded packets and instructions. Note that this option slows down the decoder
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index f09d4cfcd0fd..a1cb6a284a2b 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -1862,6 +1862,15 @@ static int intel_pt_synth_error(struct intel_pt *pt, int code, int cpu,
char msg[MAX_AUXTRACE_ERROR_MSG];
int err;

+ if (pt->synth_opts.error_minus_flags) {
+ if (code == INTEL_PT_ERR_OVR &&
+ pt->synth_opts.error_minus_flags & AUXTRACE_ERR_FLG_OVERFLOW)
+ return 0;
+ if (code == INTEL_PT_ERR_LOST &&
+ pt->synth_opts.error_minus_flags & AUXTRACE_ERR_FLG_DATA_LOST)
+ return 0;
+ }
+
intel_pt__strerror(code, msg, MAX_AUXTRACE_ERROR_MSG);

auxtrace_synth_error(&event.auxtrace_error, PERF_AUXTRACE_ERROR_ITRACE,
--
2.17.1

2020-07-10 15:19:30

by Adrian Hunter

Subject: [PATCH V2 10/12] perf auxtrace: Add itrace 'q' option for quicker, less detailed decoding

The 'q' option is for modes of decoding that are quicker because they
skip or omit decoding some aspects of trace data.

If supported, the 'q' option may be repeated to increase the effect.

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/perf/Documentation/itrace.txt | 3 +++
tools/perf/util/auxtrace.c | 3 +++
tools/perf/util/auxtrace.h | 3 +++
3 files changed, 9 insertions(+)

diff --git a/tools/perf/Documentation/itrace.txt b/tools/perf/Documentation/itrace.txt
index 9c0e8586ed47..d3740c8f399b 100644
--- a/tools/perf/Documentation/itrace.txt
+++ b/tools/perf/Documentation/itrace.txt
@@ -18,6 +18,7 @@
l synthesize last branch entries (use with i or x)
L synthesize last branch entries on existing event records
s skip initial number of events
+ q quicker (less detailed) decoding

The default is all events i.e. the same as --itrace=ibxwpe,
except for perf script where it is --itrace=ce
@@ -58,3 +59,5 @@
debug messages will or will not be logged. Each flag must be preceded
by either '+' or '-'. The flags are:
a all perf events
+
+ If supported, the 'q' option may be repeated to increase the effect.
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index e028187c51fe..42a85c86421d 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -1554,6 +1554,9 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
case 'a':
synth_opts->remote_access = true;
break;
+ case 'q':
+ synth_opts->quick += 1;
+ break;
case ' ':
case ',':
break;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 821ef5446a13..951d2d14cf24 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -100,6 +100,7 @@ enum itrace_period_type {
* @error_minus_flags: flags to affect what errors are reported
* @log_plus_flags: flags to affect what is logged
* @log_minus_flags: flags to affect what is logged
+ * @quick: quicker (less detailed) decoding
*/
struct itrace_synth_opts {
bool set;
@@ -137,6 +138,7 @@ struct itrace_synth_opts {
unsigned int error_minus_flags;
unsigned int log_plus_flags;
unsigned int log_minus_flags;
+ unsigned int quick;
};

/**
@@ -642,6 +644,7 @@ bool auxtrace__evsel_is_auxtrace(struct perf_session *session,
" l[len]: synthesize last branch entries (use with i or x)\n" \
" L[len]: synthesize last branch entries on existing event records\n" \
" sNUMBER: skip initial number of events\n" \
+" q: quicker (less detailed) decoding\n" \
" PERIOD[ns|us|ms|i|t]: specify period to sample stream\n" \
" concatenate multiple options. Default is ibxwpe or cewp\n"

--
2.17.1

2020-07-10 15:20:24

by Adrian Hunter

Subject: [PATCH V2 11/12] perf intel-pt: Add support for decoding FUP/TIP only

Use the new itrace 'q' option to add support for a mode of decoding that
ignores TNT packets, does not walk object code, and instead gets the ip from
FUP and TIP packets.

Example:

$ perf record -e intel_pt//u grep -rI pudding drivers
[ perf record: Woken up 52 times to write data ]
[ perf record: Captured and wrote 57.870 MB perf.data ]
$ time perf script --itrace=bi | wc -l
58948289

real 1m23.863s
user 1m23.251s
sys 0m7.452s
$ time perf script --itrace=biq | wc -l
3385694

real 0m4.453s
user 0m4.455s
sys 0m0.328s
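The core of hop mode is a per-packet decision made before normal processing. The sketch below is a heavily simplified, hypothetical version of intel_pt_hop_trace(). It ignores TNT, turns an ip-carrying TIP, TIP.PGD or FUP directly into a sample, and passes everything else on. It assumes each packet already holds a full ip (the real decoder reconstructs compressed ips from last_ip) and omits the FUP event, PSB and FUP/TIP pairing cases.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of packet types. */
enum pkt_type { PKT_TNT, PKT_TIP, PKT_TIP_PGD, PKT_FUP, PKT_OTHER };

/* Actions mirroring the patch's HOP_PROCESS / HOP_IGNORE / HOP_RETURN. */
enum hop_action { HOP_PROCESS, HOP_IGNORE, HOP_RETURN };

struct pkt {
	enum pkt_type type;
	int count;   /* ip bytes present; 0 means the packet has no ip */
	uint64_t ip; /* assumed already decompressed */
};

static enum hop_action hop(const struct pkt *p, uint64_t *sample_ip)
{
	assert(p && sample_ip);
	switch (p->type) {
	case PKT_TNT:
		return HOP_IGNORE; /* no code walk, so taken/not-taken is unused */
	case PKT_TIP:
	case PKT_TIP_PGD:
	case PKT_FUP:
		if (!p->count)
			return HOP_IGNORE;
		*sample_ip = p->ip; /* sample straight from the packet ip */
		return HOP_RETURN;
	default:
		return HOP_PROCESS; /* timing, power, etc. as usual */
	}
}

/* Count the samples a packet stream produces in hop mode. */
static int hop_samples(const struct pkt *pkts, int n)
{
	int samples = 0;
	uint64_t ip;

	for (int i = 0; i < n; i++)
		if (hop(&pkts[i], &ip) == HOP_RETURN)
			samples++;
	return samples;
}
```

Work in this mode is bounded by the number of ip-carrying packets rather than the number of executed instructions, which is consistent with the shorter run time and smaller output in the example above.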

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/perf/Documentation/perf-intel-pt.txt | 31 ++++
.../util/intel-pt-decoder/intel-pt-decoder.c | 167 +++++++++++++++++-
.../util/intel-pt-decoder/intel-pt-decoder.h | 1 +
tools/perf/util/intel-pt.c | 6 +-
4 files changed, 200 insertions(+), 5 deletions(-)

diff --git a/tools/perf/Documentation/perf-intel-pt.txt b/tools/perf/Documentation/perf-intel-pt.txt
index 4666e4a83615..f9fe4a4040ba 100644
--- a/tools/perf/Documentation/perf-intel-pt.txt
+++ b/tools/perf/Documentation/perf-intel-pt.txt
@@ -825,6 +825,7 @@ The letters are:
l synthesize last branch entries (use with i or x)
L synthesize last branch entries on existing event records
s skip initial number of events
+ q quicker (less detailed) decoding

"Instructions" events look like they were recorded by "perf record -e
instructions".
@@ -969,6 +970,36 @@ at the beginning. This is useful to ignore initialization code.

skips the first million instructions.

+The q option changes the way the trace is decoded. The decoding is much faster
+but much less detailed. Specifically, with the q option, the decoder does not
+decode TNT packets, and does not walk object code, but gets the ip from FUP and
+TIP packets. The q option can be used with the b and i options but the period
+is not used. The q option decodes more quickly, but is useful only if the
+control flow of interest is represented or indicated by FUP, TIP, TIP.PGE, or
+TIP.PGD packets (refer below). However the q option could be used to find time
+ranges that could then be decoded fully using the --time option.
+
+What will *not* be decoded with the (single) q option:
+
+ - direct calls and jmps
+ - conditional branches
+ - non-branch instructions
+
+What *will* be decoded with the (single) q option:
+
+ - asynchronous branches such as interrupts
+ - indirect branches
+ - function return target address *if* the noretcomp config term (refer
+ config terms section) was used
+ - start of (control-flow) tracing
+ - end of (control-flow) tracing, if it is not out of context
+ - power events, ptwrite, transaction start and abort
+ - instruction pointer associated with PSB packets
+
+Note the q option does not specify what events will be synthesized e.g. the p
+option must be used also to show power events.
+
+
dump option
~~~~~~~~~~~

diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
index 7ffcbd6fcd1a..ccb204b1a050 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
@@ -55,6 +55,7 @@ enum intel_pt_pkt_state {
INTEL_PT_STATE_TIP_PGD,
INTEL_PT_STATE_FUP,
INTEL_PT_STATE_FUP_NO_TIP,
+ INTEL_PT_STATE_RESAMPLE,
};

static inline bool intel_pt_sample_time(enum intel_pt_pkt_state pkt_state)
@@ -65,6 +66,7 @@ static inline bool intel_pt_sample_time(enum intel_pt_pkt_state pkt_state)
case INTEL_PT_STATE_ERR_RESYNC:
case INTEL_PT_STATE_IN_SYNC:
case INTEL_PT_STATE_TNT_CONT:
+ case INTEL_PT_STATE_RESAMPLE:
return true;
case INTEL_PT_STATE_TNT:
case INTEL_PT_STATE_TIP:
@@ -109,6 +111,8 @@ struct intel_pt_decoder {
bool fixup_last_mtc;
bool have_last_ip;
bool in_psb;
+ bool hop;
+ bool hop_psb_fup;
enum intel_pt_param_flags flags;
uint64_t pos;
uint64_t last_ip;
@@ -235,6 +239,7 @@ struct intel_pt_decoder *intel_pt_decoder_new(struct intel_pt_params *params)
decoder->data = params->data;
decoder->return_compression = params->return_compression;
decoder->branch_enable = params->branch_enable;
+ decoder->hop = params->quick >= 1;

decoder->flags = params->flags;

@@ -275,6 +280,9 @@ struct intel_pt_decoder *intel_pt_decoder_new(struct intel_pt_params *params)
intel_pt_log("timestamp: tsc_ctc_mult %u\n", decoder->tsc_ctc_mult);
intel_pt_log("timestamp: tsc_slip %#x\n", decoder->tsc_slip);

+ if (decoder->hop)
+ intel_pt_log("Hop mode: decoding FUP and TIPs, but not TNT\n");
+
return decoder;
}

@@ -1730,8 +1738,14 @@ static int intel_pt_walk_psbend(struct intel_pt_decoder *decoder)

case INTEL_PT_FUP:
decoder->pge = true;
- if (decoder->packet.count)
+ if (decoder->packet.count) {
intel_pt_set_last_ip(decoder);
+ if (decoder->hop) {
+ /* Act on FUP at PSBEND */
+ decoder->ip = decoder->last_ip;
+ decoder->hop_psb_fup = true;
+ }
+ }
break;

case INTEL_PT_MODE_TSX:
@@ -1875,6 +1889,118 @@ static int intel_pt_walk_fup_tip(struct intel_pt_decoder *decoder)
}
}

+static int intel_pt_resample(struct intel_pt_decoder *decoder)
+{
+ decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+ decoder->state.type = INTEL_PT_INSTRUCTION;
+ decoder->state.from_ip = decoder->ip;
+ decoder->state.to_ip = 0;
+ return 0;
+}
+
+#define HOP_PROCESS 0
+#define HOP_IGNORE 1
+#define HOP_RETURN 2
+#define HOP_AGAIN 3
+
+/* Hop mode: Ignore TNT, do not walk code, but get ip from FUPs and TIPs */
+static int intel_pt_hop_trace(struct intel_pt_decoder *decoder, bool *no_tip, int *err)
+{
+ switch (decoder->packet.type) {
+ case INTEL_PT_TNT:
+ return HOP_IGNORE;
+
+ case INTEL_PT_TIP_PGD:
+ if (!decoder->packet.count)
+ return HOP_IGNORE;
+ intel_pt_set_ip(decoder);
+ decoder->state.type |= INTEL_PT_TRACE_END;
+ decoder->state.from_ip = 0;
+ decoder->state.to_ip = decoder->ip;
+ return HOP_RETURN;
+
+ case INTEL_PT_TIP:
+ if (!decoder->packet.count)
+ return HOP_IGNORE;
+ intel_pt_set_ip(decoder);
+ decoder->state.type = INTEL_PT_INSTRUCTION;
+ decoder->state.from_ip = decoder->ip;
+ decoder->state.to_ip = 0;
+ return HOP_RETURN;
+
+ case INTEL_PT_FUP:
+ if (!decoder->packet.count)
+ return HOP_IGNORE;
+ intel_pt_set_ip(decoder);
+ if (intel_pt_fup_event(decoder))
+ return HOP_RETURN;
+ if (!decoder->branch_enable)
+ *no_tip = true;
+ if (*no_tip) {
+ decoder->state.type = INTEL_PT_INSTRUCTION;
+ decoder->state.from_ip = decoder->ip;
+ decoder->state.to_ip = 0;
+ return HOP_RETURN;
+ }
+ *err = intel_pt_walk_fup_tip(decoder);
+ if (!*err)
+ decoder->pkt_state = INTEL_PT_STATE_RESAMPLE;
+ return HOP_RETURN;
+
+ case INTEL_PT_PSB:
+ decoder->last_ip = 0;
+ decoder->have_last_ip = true;
+ decoder->hop_psb_fup = false;
+ *err = intel_pt_walk_psbend(decoder);
+ if (*err == -EAGAIN)
+ return HOP_AGAIN;
+ if (*err)
+ return HOP_RETURN;
+ if (decoder->hop_psb_fup) {
+ decoder->hop_psb_fup = false;
+ decoder->state.type = INTEL_PT_INSTRUCTION;
+ decoder->state.from_ip = decoder->ip;
+ decoder->state.to_ip = 0;
+ return HOP_RETURN;
+ }
+ if (decoder->cbr != decoder->cbr_seen) {
+ decoder->state.type = 0;
+ return HOP_RETURN;
+ }
+ return HOP_IGNORE;
+
+ case INTEL_PT_BAD:
+ case INTEL_PT_PAD:
+ case INTEL_PT_TIP_PGE:
+ case INTEL_PT_TSC:
+ case INTEL_PT_TMA:
+ case INTEL_PT_MODE_EXEC:
+ case INTEL_PT_MODE_TSX:
+ case INTEL_PT_MTC:
+ case INTEL_PT_CYC:
+ case INTEL_PT_VMCS:
+ case INTEL_PT_PSBEND:
+ case INTEL_PT_CBR:
+ case INTEL_PT_TRACESTOP:
+ case INTEL_PT_PIP:
+ case INTEL_PT_OVF:
+ case INTEL_PT_MNT:
+ case INTEL_PT_PTWRITE:
+ case INTEL_PT_PTWRITE_IP:
+ case INTEL_PT_EXSTOP:
+ case INTEL_PT_EXSTOP_IP:
+ case INTEL_PT_MWAIT:
+ case INTEL_PT_PWRE:
+ case INTEL_PT_PWRX:
+ case INTEL_PT_BBP:
+ case INTEL_PT_BIP:
+ case INTEL_PT_BEP:
+ case INTEL_PT_BEP_IP:
+ default:
+ return HOP_PROCESS;
+ }
+}
+
static int intel_pt_walk_trace(struct intel_pt_decoder *decoder)
{
bool no_tip = false;
@@ -1885,6 +2011,19 @@ static int intel_pt_walk_trace(struct intel_pt_decoder *decoder)
if (err)
return err;
next:
+ if (decoder->hop) {
+ switch (intel_pt_hop_trace(decoder, &no_tip, &err)) {
+ case HOP_IGNORE:
+ continue;
+ case HOP_RETURN:
+ return err;
+ case HOP_AGAIN:
+ goto next;
+ default:
+ break;
+ }
+ }
+
switch (decoder->packet.type) {
case INTEL_PT_TNT:
if (!decoder->packet.count)
@@ -1914,6 +2053,12 @@ static int intel_pt_walk_trace(struct intel_pt_decoder *decoder)
decoder->state.from_ip = 0;
decoder->state.to_ip = decoder->ip;
decoder->state.type |= INTEL_PT_TRACE_BEGIN;
+ /*
+ * In hop mode, resample to get the to_ip as an
+ * "instruction" sample.
+ */
+ if (decoder->hop)
+ decoder->pkt_state = INTEL_PT_STATE_RESAMPLE;
return 0;
}

@@ -2033,7 +2178,7 @@ static int intel_pt_walk_trace(struct intel_pt_decoder *decoder)

case INTEL_PT_MODE_TSX:
/* MODE_TSX need not be followed by FUP */
- if (!decoder->pge) {
+ if (!decoder->pge || decoder->in_psb) {
intel_pt_update_in_tx(decoder);
break;
}
@@ -2424,7 +2569,11 @@ static int intel_pt_sync_ip(struct intel_pt_decoder *decoder)
if (err)
return err;

- decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+ /* In hop mode, resample to get the to_ip as an "instruction" sample */
+ if (decoder->hop)
+ decoder->pkt_state = INTEL_PT_STATE_RESAMPLE;
+ else
+ decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
decoder->overflow = false;

decoder->state.from_ip = 0;
@@ -2545,7 +2694,14 @@ static int intel_pt_sync(struct intel_pt_decoder *decoder)

if (decoder->ip) {
decoder->state.type = 0; /* Do not have a sample */
- decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+ /*
+ * In hop mode, resample to get the PSB FUP ip as an
+ * "instruction" sample.
+ */
+ if (decoder->hop)
+ decoder->pkt_state = INTEL_PT_STATE_RESAMPLE;
+ else
+ decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
} else {
return intel_pt_sync_ip(decoder);
}
@@ -2609,6 +2765,9 @@ const struct intel_pt_state *intel_pt_decode(struct intel_pt_decoder *decoder)
if (err == -EAGAIN)
err = intel_pt_walk_trace(decoder);
break;
+ case INTEL_PT_STATE_RESAMPLE:
+ err = intel_pt_resample(decoder);
+ break;
default:
err = intel_pt_bug(decoder);
break;
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
index e289e463d635..8645fc265481 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
@@ -250,6 +250,7 @@ struct intel_pt_params {
uint32_t tsc_ctc_ratio_n;
uint32_t tsc_ctc_ratio_d;
enum intel_pt_param_flags flags;
+ unsigned int quick;
};

struct intel_pt_decoder;
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index bddeb18648df..7cb3cf769d4d 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -1030,6 +1030,7 @@ static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
params.mtc_period = intel_pt_mtc_period(pt);
params.tsc_ctc_ratio_n = pt->tsc_ctc_ratio_n;
params.tsc_ctc_ratio_d = pt->tsc_ctc_ratio_d;
+ params.quick = pt->synth_opts.quick;

if (pt->filts.cnt > 0)
params.pgd_ip = intel_pt_pgd_ip;
@@ -1423,7 +1424,10 @@ static int intel_pt_synth_instruction_sample(struct intel_pt_queue *ptq)

sample.id = ptq->pt->instructions_id;
sample.stream_id = ptq->pt->instructions_id;
- sample.period = ptq->state->tot_insn_cnt - ptq->last_insn_cnt;
+ if (pt->synth_opts.quick)
+ sample.period = 1;
+ else
+ sample.period = ptq->state->tot_insn_cnt - ptq->last_insn_cnt;

sample.cyc_cnt = ptq->ipc_cyc_cnt - ptq->last_in_cyc_cnt;
if (sample.cyc_cnt) {
--
2.17.1

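The hop-mode dispatch in the patch above can be illustrated with a small, self-contained sketch. It assumes a hypothetical, simplified packet model (enum pkt_type, struct pkt, hop() and decode_quick() are illustrative names, not perf code) and mirrors only the core idea of intel_pt_hop_trace(): TNT packets are ignored because no code is walked, FUP and TIP packets supply the ip, and anything else falls through to normal processing. Details of the real decoder, such as IP compression, the packet.count checks, PSB+ handling and HOP_AGAIN, are deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified packet model (not perf's struct intel_pt_pkt). */
enum pkt_type { PKT_TNT, PKT_TIP, PKT_FUP, PKT_CBR, PKT_END };

struct pkt {
	enum pkt_type type;
	uint64_t ip;	/* ip carried by TIP/FUP; unused for other types */
};

/* Result codes modelled on the patch's HOP_* values (illustrative). */
#define HOP_PROCESS	0
#define HOP_IGNORE	1
#define HOP_RETURN	2

/* Hop mode: no code walk, so TNT is ignored; FUP/TIP provide the ip. */
static int hop(const struct pkt *p, uint64_t *ip)
{
	switch (p->type) {
	case PKT_TNT:
		return HOP_IGNORE;
	case PKT_TIP:
	case PKT_FUP:
		*ip = p->ip;
		return HOP_RETURN;
	default:
		return HOP_PROCESS;
	}
}

/* Record one "instruction" sample ip for every HOP_RETURN. */
static size_t decode_quick(const struct pkt *p, uint64_t *out, size_t max)
{
	size_t n = 0;

	for (; p->type != PKT_END; p++) {
		uint64_t ip = 0;

		switch (hop(p, &ip)) {
		case HOP_IGNORE:
			continue;
		case HOP_RETURN:
			if (n < max)
				out[n++] = ip;
			break;
		default:
			/* HOP_PROCESS: the real decoder handles the packet normally. */
			break;
		}
	}
	return n;
}
```

Each recorded ip stands for one "instruction" sample. In quick mode the patch sets sample.period to 1 instead of using tot_insn_cnt, presumably because instructions are not counted when code is not walked.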
2020-07-10 15:21:05

by Adrian Hunter

Subject: [PATCH V2 05/12] perf auxtrace: Add optional error flags to the itrace 'e' option

Allow the 'e' option to be followed by flags which will affect what errors
will or will not be reported. Each flag must be preceded by either '+' or
'-'. The flags are:
o overflow
l trace data lost

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/perf/Documentation/itrace.txt | 6 ++++
tools/perf/util/auxtrace.c | 44 +++++++++++++++++++++++++++++
tools/perf/util/auxtrace.h | 12 +++++++-
3 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/itrace.txt b/tools/perf/Documentation/itrace.txt
index e817179c5027..114d0544d7c7 100644
--- a/tools/perf/Documentation/itrace.txt
+++ b/tools/perf/Documentation/itrace.txt
@@ -47,3 +47,9 @@
--itrace=i0nss1000000

skips the first million instructions.
+
+ The 'e' option may be followed by flags which affect what errors will or
+ will not be reported. Each flag must be preceded by either '+' or '-'.
+ The flags are:
+ o overflow
+ l trace data lost
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 25c639ac4ad4..f0b0758830ee 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -1349,6 +1349,47 @@ void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts,
synth_opts->initial_skip = 0;
}

+static int get_flag(const char **ptr, unsigned int *flags)
+{
+ while (1) {
+ char c = **ptr;
+
+ if (c >= 'a' && c <= 'z') {
+ *flags |= 1 << (c - 'a');
+ ++*ptr;
+ return 0;
+ } else if (c == ' ') {
+ ++*ptr;
+ continue;
+ } else {
+ return -1;
+ }
+ }
+}
+
+static int get_flags(const char **ptr, unsigned int *plus_flags, unsigned int *minus_flags)
+{
+ while (1) {
+ switch (**ptr) {
+ case '+':
+ ++*ptr;
+ if (get_flag(ptr, plus_flags))
+ return -1;
+ break;
+ case '-':
+ ++*ptr;
+ if (get_flag(ptr, minus_flags))
+ return -1;
+ break;
+ case ' ':
+ ++*ptr;
+ break;
+ default:
+ return 0;
+ }
+ }
+}
+
/*
* Please check tools/perf/Documentation/perf-script.txt for information
* about the options parsed here, which is introduced after this cset,
@@ -1436,6 +1477,9 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
break;
case 'e':
synth_opts->errors = true;
+ if (get_flags(&p, &synth_opts->error_plus_flags,
+ &synth_opts->error_minus_flags))
+ goto out_err;
break;
case 'd':
synth_opts->log = true;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index e3ce5fb03ca0..cfe6d00d8624 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -55,6 +55,9 @@ enum itrace_period_type {
PERF_ITRACE_PERIOD_NANOSECS,
};

+#define AUXTRACE_ERR_FLG_OVERFLOW (1 << ('o' - 'a'))
+#define AUXTRACE_ERR_FLG_DATA_LOST (1 << ('l' - 'a'))
+
/**
* struct itrace_synth_opts - AUX area tracing synthesis options.
* @set: indicates whether or not options have been set
@@ -91,6 +94,8 @@ enum itrace_period_type {
* @cpu_bitmap: CPUs for which to synthesize events, or NULL for all
* @ptime_range: time intervals to trace or NULL
* @range_num: number of time intervals to trace
+ * @error_plus_flags: flags (each preceded by '+') to affect what errors are reported
+ * @error_minus_flags: flags (each preceded by '-') to affect what errors are reported
*/
struct itrace_synth_opts {
bool set;
@@ -124,6 +129,8 @@ struct itrace_synth_opts {
unsigned long *cpu_bitmap;
struct perf_time_interval *ptime_range;
int range_num;
+ unsigned int error_plus_flags;
+ unsigned int error_minus_flags;
};

/**
@@ -613,7 +620,10 @@ bool auxtrace__evsel_is_auxtrace(struct perf_session *session,
" p: synthesize power events\n" \
" o: synthesize other events recorded due to the use\n" \
" of aux-output (refer to perf record)\n" \
-" e: synthesize error events\n" \
+" e[flags]: synthesize error events\n" \
+" each flag must be preceded by + or -\n" \
+" error flags are: o (overflow)\n" \
+" l (data lost)\n" \
" d: create a debug log\n" \
" f: synthesize first level cache events\n" \
" m: synthesize last level cache events\n" \
--
2.17.1

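The +/- flag syntax can be exercised with the parser from the patch. The sketch below copies get_flag(), get_flags() and the two AUXTRACE_ERR_FLG_* definitions; report_error() is a hypothetical helper (not part of this patch) showing one plausible way a decoder could honour a '-' flag, for example --itrace=e-o to drop overflow errors.

```c
#include <stdbool.h>

/* Bit values as defined by the patch in auxtrace.h. */
#define AUXTRACE_ERR_FLG_OVERFLOW	(1 << ('o' - 'a'))
#define AUXTRACE_ERR_FLG_DATA_LOST	(1 << ('l' - 'a'))

/* One lowercase letter selects one bit; spaces are skipped. */
static int get_flag(const char **ptr, unsigned int *flags)
{
	while (1) {
		char c = **ptr;

		if (c >= 'a' && c <= 'z') {
			*flags |= 1 << (c - 'a');
			++*ptr;
			return 0;
		} else if (c == ' ') {
			++*ptr;
			continue;
		} else {
			return -1;	/* missing or invalid flag letter */
		}
	}
}

/* Parse a run of "+x" / "-x" flags; stop at the first other character. */
static int get_flags(const char **ptr, unsigned int *plus_flags,
		     unsigned int *minus_flags)
{
	while (1) {
		switch (**ptr) {
		case '+':
			++*ptr;
			if (get_flag(ptr, plus_flags))
				return -1;
			break;
		case '-':
			++*ptr;
			if (get_flag(ptr, minus_flags))
				return -1;
			break;
		case ' ':
			++*ptr;
			break;
		default:
			return 0;
		}
	}
}

/* Hypothetical: report an error class unless it was disabled with '-'. */
static bool report_error(unsigned int minus_flags, unsigned int err_flag)
{
	return !(minus_flags & err_flag);
}
```

Parsing stops at the first character that is not '+', '-' or a space, so in a string such as "e-od" the 'd' is left for the outer option loop in itrace_parse_synth_opts().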
2020-07-10 15:23:24

by Adrian Hunter

Subject: [PATCH V2 12/12] perf intel-pt: Add support for decoding PSB+ only

A single q option decodes ip from only FUP/TIP packets. Make it so that
repeating the q option (i.e. qq) decodes only PSB+, getting ip if there is
a FUP packet within PSB+ (i.e. between PSB and PSBEND).

Example:

$ perf record -e intel_pt//u grep -rI pudding drivers
[ perf record: Woken up 52 times to write data ]
[ perf record: Captured and wrote 57.870 MB perf.data ]
$ time perf script --itrace=bi | wc -l
58948289

real 1m23.863s
user 1m23.251s
sys 0m7.452s
$ time perf script --itrace=biq | wc -l
3385694

real 0m4.453s
user 0m4.455s
sys 0m0.328s
$ time perf script --itrace=biqq | wc -l
1883

real 0m0.047s
user 0m0.043s
sys 0m0.009s

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/perf/Documentation/perf-intel-pt.txt | 15 +++++++++++++++
.../util/intel-pt-decoder/intel-pt-decoder.c | 18 ++++++++++++++++++
2 files changed, 33 insertions(+)

diff --git a/tools/perf/Documentation/perf-intel-pt.txt b/tools/perf/Documentation/perf-intel-pt.txt
index f9fe4a4040ba..d5a266d7f15b 100644
--- a/tools/perf/Documentation/perf-intel-pt.txt
+++ b/tools/perf/Documentation/perf-intel-pt.txt
@@ -999,6 +999,21 @@ What *will* be decoded with the (single) q option:
Note the q option does not specify what events will be synthesized e.g. the p
option must be used also to show power events.

+Repeating the q option (double-q i.e. qq) results in even faster decoding and even
+less detail. The decoder decodes only extended PSB (PSB+) packets, getting the
+instruction pointer if there is a FUP packet within PSB+ (i.e. between PSB and
+PSBEND). Note PSB packets occur regularly in the trace based on the psb_period
+config term (refer to the config terms section). There will be a FUP packet if the
+PSB+ occurs while control flow is being traced.
+
+What will *not* be decoded with the qq option:
+
+ - everything except the instruction pointer associated with PSB packets
+
+What *will* be decoded with the qq option:
+
+ - instruction pointer associated with PSB packets
+

dump option
~~~~~~~~~~~
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
index ccb204b1a050..697513f35154 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
@@ -113,6 +113,7 @@ struct intel_pt_decoder {
bool in_psb;
bool hop;
bool hop_psb_fup;
+ bool leap;
enum intel_pt_param_flags flags;
uint64_t pos;
uint64_t last_ip;
@@ -240,6 +241,7 @@ struct intel_pt_decoder *intel_pt_decoder_new(struct intel_pt_params *params)
decoder->return_compression = params->return_compression;
decoder->branch_enable = params->branch_enable;
decoder->hop = params->quick >= 1;
+ decoder->leap = params->quick >= 2;

decoder->flags = params->flags;

@@ -1903,9 +1905,18 @@ static int intel_pt_resample(struct intel_pt_decoder *decoder)
#define HOP_RETURN 2
#define HOP_AGAIN 3

+static int intel_pt_scan_for_psb(struct intel_pt_decoder *decoder);
+
/* Hop mode: Ignore TNT, do not walk code, but get ip from FUPs and TIPs */
static int intel_pt_hop_trace(struct intel_pt_decoder *decoder, bool *no_tip, int *err)
{
+ /* Leap from PSB to PSB, getting ip from FUP within PSB+ */
+ if (decoder->leap && !decoder->in_psb && decoder->packet.type != INTEL_PT_PSB) {
+ *err = intel_pt_scan_for_psb(decoder);
+ if (*err)
+ return HOP_RETURN;
+ }
+
switch (decoder->packet.type) {
case INTEL_PT_TNT:
return HOP_IGNORE;
@@ -2681,6 +2692,7 @@ static int intel_pt_sync(struct intel_pt_decoder *decoder)
decoder->ip = 0;
intel_pt_clear_stack(&decoder->stack);

+leap:
err = intel_pt_scan_for_psb(decoder);
if (err)
return err;
@@ -2702,6 +2714,12 @@ static int intel_pt_sync(struct intel_pt_decoder *decoder)
decoder->pkt_state = INTEL_PT_STATE_RESAMPLE;
else
decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+ } else if (decoder->leap) {
+ /*
+ * In leap mode, only PSB+ is decoded, so keep leaping to the
+ * next PSB until there is an ip.
+ */
+ goto leap;
} else {
return intel_pt_sync_ip(decoder);
}
--
2.17.1

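Leap mode (qq) can be sketched in the same spirit. This assumes a hypothetical, already-decoded packet array (enum pkt_type, struct pkt and decode_leap() are illustrative names, not perf code). It emits an ip only for a FUP that appears inside PSB+, i.e. between PSB and PSBEND, and skips everything in between. The real decoder is faster than this loop suggests because intel_pt_scan_for_psb() searches the raw buffer for the next PSB instead of decoding each packet.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified packet model for illustration only. */
enum pkt_type { PKT_TNT, PKT_TIP, PKT_FUP, PKT_PSB, PKT_PSBEND, PKT_END };

struct pkt {
	enum pkt_type type;
	uint64_t ip;	/* ip carried by TIP/FUP; unused for other types */
};

/*
 * Leap mode sketch: go from PSB to PSB and emit an ip only when a FUP
 * appears inside PSB+ (between PSB and PSBEND).
 */
static size_t decode_leap(const struct pkt *p, uint64_t *out, size_t max)
{
	size_t n = 0;
	int in_psb = 0;

	for (; p->type != PKT_END; p++) {
		switch (p->type) {
		case PKT_PSB:
			in_psb = 1;
			break;
		case PKT_PSBEND:
			in_psb = 0;
			break;
		case PKT_FUP:
			if (in_psb && n < max)
				out[n++] = p->ip;
			break;
		default:
			/* TNT, TIP etc. outside PSB+ are leapt over. */
			break;
		}
	}
	return n;
}
```

A PSB+ without a FUP (control flow not being traced at that point) yields no sample, which matches the decoder leaping on to the next PSB until there is an ip.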
2020-07-20 22:27:34

by Andi Kleen

Subject: Re: [PATCH V2 10/12] perf auxtrace: Add itrace 'q' option for quicker, less detailed decoding

> +" q: quicker (less detailed) decoding\n" \

Perhaps add '(can be repeated)'

-Andi

2020-07-20 22:28:56

by Andi Kleen

Subject: Re: [PATCH V2 12/12] perf intel-pt: Add support for decoding PSB+ only

On Fri, Jul 10, 2020 at 06:11:04PM +0300, Adrian Hunter wrote:
> A single q option decodes ip from only FUP/TIP packets. Make it so that
> repeating the q option (i.e. qq) decodes only PSB+, getting ip if there is
> a FUP packet within PSB+ (i.e. between PSB and PSBEND).
>
> Example:
>
> $ perf record -e intel_pt//u grep -rI pudding drivers
> [ perf record: Woken up 52 times to write data ]
> [ perf record: Captured and wrote 57.870 MB perf.data ]
> $ time perf script --itrace=bi | wc -l
> 58948289
>
> real 1m23.863s
> user 1m23.251s
> sys 0m7.452s
> $ time perf script --itrace=biq | wc -l
> 3385694
>
> real 0m4.453s
> user 0m4.455s
> sys 0m0.328s
> $ time perf script --itrace=biqq | wc -l
> 1883
>
> real 0m0.047s
> user 0m0.043s
> sys 0m0.009s
>
> Signed-off-by: Adrian Hunter <[email protected]>
> ---
> tools/perf/Documentation/perf-intel-pt.txt | 15 +++++++++++++++
> .../util/intel-pt-decoder/intel-pt-decoder.c | 18 ++++++++++++++++++
> 2 files changed, 33 insertions(+)
>
> diff --git a/tools/perf/Documentation/perf-intel-pt.txt b/tools/perf/Documentation/perf-intel-pt.txt
> index f9fe4a4040ba..d5a266d7f15b 100644
> --- a/tools/perf/Documentation/perf-intel-pt.txt
> +++ b/tools/perf/Documentation/perf-intel-pt.txt
> @@ -999,6 +999,21 @@ What *will* be decoded with the (single) q option:
> Note the q option does not specify what events will be synthesized e.g. the p
> option must be used also to show power events.
>
> +Repeating the q option (double-q i.e. qq) results in even faster decoding and even
> +less detail. The decoder decodes only extended PSB (PSB+) packets, getting the
> +instruction pointer if there is a FUP packet within PSB+ (i.e. between PSB and
> +PSBEND). Note PSB packets occur regularly in the trace based on the psb_period
> +config term (refer config terms section). There will be a FUP packet if the
> +PSB+ occurs while control flow is being traced.

An estimate of how frequent that is would be good.

If we assume one bit per instruction, then with a 2K period it's roughly 16K instructions,
and with a 16K period roughly 128K instructions.

Could be added in a follow-on patch.

But overall the patches look good to me now.

(for the whole series)
Reviewed-by: Andi Kleen <[email protected]>

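Andi's estimate is simple arithmetic: a PSB period of P bytes holds 8P bits of trace, so at an assumed density of one bit per instruction roughly 8P instructions pass between PSB packets. The helper below (a hypothetical name) only encodes that rule of thumb; real trace density depends on the workload.

```c
#include <stdint.h>

/*
 * Rough number of instructions between PSB packets, assuming a fixed
 * number of trace bits per instruction (a rule of thumb, not a measurement).
 */
static uint64_t insns_between_psbs(uint64_t psb_period_bytes,
				   uint64_t bits_per_insn)
{
	/* 8 bits per byte; bits_per_insn must be nonzero. */
	return psb_period_bytes * 8 / bits_per_insn;
}
```

With one bit per instruction, a 2K period gives 16384 (about 16K) instructions and a 16K period gives 131072 (about 128K), matching the figures above.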
2020-08-04 13:36:21

by Adrian Hunter

Subject: Re: [PATCH V2 00/12] perf intel-pt: Add support for decoding FUP/TIP only

On 10/07/20 6:10 pm, Adrian Hunter wrote:
> Hi
>
> Here are some fixes and small improvements for Intel PT.

Andi added his "Reviewed-by" with 2 comments to tweak the
documentation.

The patches still apply, so do you want me to send a V3?

>
> Changes in V2:
> For d/e flags, use +/- alphabetic options instead of numbers
> Update help text
> Improve documentation
>
>
> Adrian Hunter (12):
> perf intel-pt: Fix FUP packet state
> perf intel-pt: Fix duplicate branch after CBR
> perf tools: Improve aux_output not supported error
> perf auxtrace: Add missing itrace options to help text
> perf auxtrace: Add optional error flags to the itrace 'e' option
> perf intel-pt: Use itrace error flags to suppress some errors
> perf auxtrace: Add optional log flags to the itrace 'd' option
> perf intel-pt: Use itrace debug log flags to suppress some messages
> perf intel-pt: Time filter logged perf events
> perf auxtrace: Add itrace 'q' option for quicker, less detailed decoding
> perf intel-pt: Add support for decoding FUP/TIP only
> perf intel-pt: Add support for decoding PSB+ only
>
> tools/perf/Documentation/itrace.txt | 14 ++
> tools/perf/Documentation/perf-intel-pt.txt | 63 +++++-
> tools/perf/util/auxtrace.c | 50 +++++
> tools/perf/util/auxtrace.h | 31 ++-
> tools/perf/util/evsel.c | 4 +
> .../perf/util/intel-pt-decoder/intel-pt-decoder.c | 214 +++++++++++++++++++--
> .../perf/util/intel-pt-decoder/intel-pt-decoder.h | 1 +
> tools/perf/util/intel-pt.c | 45 ++++-
> 8 files changed, 389 insertions(+), 33 deletions(-)
>
>
> Regards
> Adrian
>

2020-08-04 15:00:53

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH V2 00/12] perf intel-pt: Add support for decoding FUP/TIP only

On Tue, Aug 04, 2020 at 04:34:36PM +0300, Adrian Hunter wrote:
> On 10/07/20 6:10 pm, Adrian Hunter wrote:
> > Hi
> >
> > Here are some fixes and small improvements for Intel PT.
>
> Andi added his "Reviewed-by" with 2 comments to tweak the
> documentation.
>
> The patches still apply, so do you want me to send a V3?

Thanks for the reminder. I will apply them and do a test build.

- Arnaldo


2020-08-06 17:39:57

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH V2 12/12] perf intel-pt: Add support for decoding PSB+ only

On Mon, Jul 20, 2020 at 03:25:02PM -0700, Andi Kleen wrote:
> But overall the patches look good to me now.

> (for the whole series)
> Reviewed-by: Andi Kleen <[email protected]>

Thanks, applied.

- Arnaldo