2020-12-21 08:39:28

by Leo Yan

[permalink] [raw]
Subject: [PATCH v1 0/7] perf arm-spe: Enable timestamp

As we know, the timestamp is important for AUX trace; it's mainly used
to correlate between perf events and AUX trace, allows to generate
events with time ordered manner. There have several good examples of
enabling timestamp for AUX trace (like Intel-pt, Intel-bts, etc).

Since the conversion between TSC and kernel timestamp has been supported
on Arm64, TSC is a naming convention from x86, but perf now has reused
it to support Arm arch timer counter.

This patch set is to enable timestamp for Arm SPE trace. It reads out
TSC parameters from mmap page and stores into auxtrace info structure;
the TSC parameters are used for conversion between timer counter and
kernel time and which is applied for Arm SPE samples.

This patch set can be clearly applied on perf/core branch with the
latest commit 2e7f545096f9 ("perf mem: Factor out a function to generate
sort order"). And it was tested on Hisilicon D06 platform.

After:

# perf script -F comm,time,cpu,pid,dso,ip,sym

perf 2408 [032] 168.680297: ffffbd1253690a3c perf_event_exec ([kernel.kallsyms])
perf 2408 [032] 168.680297: ffffbd1253690a3c perf_event_exec ([kernel.kallsyms])
false_sharing.e 2408 [032] 168.680317: ffffbd1253683f50 perf_iterate_ctx.constprop.0 ([kernel.kallsyms])
false_sharing.e 2408 [032] 168.680317: ffffbd1253683f50 perf_iterate_ctx.constprop.0 ([kernel.kallsyms])
false_sharing.e 2408 [032] 168.680319: ffffbd1253683f70 perf_iterate_ctx.constprop.0 ([kernel.kallsyms])
false_sharing.e 2408 [032] 168.680319: ffffbd1253683f70 perf_iterate_ctx.constprop.0 ([kernel.kallsyms])
false_sharing.e 2408 [032] 168.680367: ffffbd12539b03ec __arch_clear_user ([kernel.kallsyms])
false_sharing.e 2408 [032] 168.680375: ffffbd1253721440 kmem_cache_alloc ([kernel.kallsyms])
false_sharing.e 2408 [032] 168.680375: ffffbd1253721440 kmem_cache_alloc ([kernel.kallsyms])
false_sharing.e 2408 [032] 168.680375: ffffbd1253721440 kmem_cache_alloc ([kernel.kallsyms])
false_sharing.e 2408 [032] 168.680375: ffffbd1253721440 kmem_cache_alloc ([kernel.kallsyms])
false_sharing.e 2408 [032] 168.680376: ffffbd1253683f70 perf_iterate_ctx.constprop.0 ([kernel.kallsyms])
false_sharing.e 2408 [032] 168.680376: ffffbd1253683f70 perf_iterate_ctx.constprop.0 ([kernel.kallsyms])
false_sharing.e 2408 [032] 168.680376: ffffbd1253683f70 perf_iterate_ctx.constprop.0 ([kernel.kallsyms])


Leo Yan (7):
perf arm-spe: Remove unused enum value ARM_SPE_PER_CPU_MMAPS
perf arm-spe: Store TSC parameters in auxtrace info
perf arm-spe: Dump TSC parameters
perf arm-spe: Convert event kernel time to counter value
perf arm-spe: Assign kernel time to synthesized event
perf arm-spe: Bail out if the trace is later than perf event
perf arm-spe: Don't wait for PERF_RECORD_EXIT event

tools/perf/arch/arm64/util/arm-spe.c | 23 ++++++++++
tools/perf/util/arm-spe.c | 67 +++++++++++++++++++++++-----
tools/perf/util/arm-spe.h | 7 ++-
3 files changed, 85 insertions(+), 12 deletions(-)

--
2.17.1


2020-12-21 08:39:32

by Leo Yan

[permalink] [raw]
Subject: [PATCH v1 5/7] perf arm-spe: Assign kernel time to synthesized event

In current code, it assigns the arch timer counter to the synthesized
samples Arm SPE trace, thus the samples don't contain the kernel time
but only contain the raw counter value.

To fix the issue, this patch converts the timer counter to kernel time
and assigns it to sample timestamp.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/util/arm-spe.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index bc512c3479f7..2b008b973387 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -232,7 +232,7 @@ static void arm_spe_prep_sample(struct arm_spe *spe,
struct arm_spe_record *record = &speq->decoder->record;

if (!spe->timeless_decoding)
- sample->time = speq->timestamp;
+ sample->time = tsc_to_perf_time(record->timestamp, &spe->tc);

sample->ip = record->from_ip;
sample->cpumode = arm_spe_cpumode(spe, sample->ip);
--
2.17.1

2020-12-21 08:39:53

by Leo Yan

[permalink] [raw]
Subject: [PATCH v1 4/7] perf arm-spe: Convert event kernel time to counter value

When handle a perf event, Arm SPE decoder needs to decide if this perf
event is earlier or later than the samples from Arm SPE trace data; to
do comparision, it needs to use the same unit for the time.

This patch converts the event kernel time to arch timer's counter value,
thus it can be used to compare with counter value contained in Arm SPE
Timestamp packet.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/util/arm-spe.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index a504ceec2de6..bc512c3479f7 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -588,7 +588,7 @@ static int arm_spe_process_event(struct perf_session *session,
}

if (sample->time && (sample->time != (u64) -1))
- timestamp = sample->time;
+ timestamp = perf_time_to_tsc(sample->time, &spe->tc);
else
timestamp = 0;

--
2.17.1

2020-12-21 08:40:59

by Leo Yan

[permalink] [raw]
Subject: [PATCH v1 7/7] perf arm-spe: Don't wait for PERF_RECORD_EXIT event

When decode Arm SPE trace, it waits for PERF_RECORD_EXIT event (the last
perf event) for processing trace data, which is needless and even might
cause logic error, e.g. it might fail to correlate perf events with Arm
SPE events correctly.

So this patch removes the condition checking for PERF_RECORD_EXIT event.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/util/arm-spe.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 6bf2e4ff5342..7b7812eea117 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -636,11 +636,7 @@ static int arm_spe_process_event(struct perf_session *session,
sample->time);
}
} else if (timestamp) {
- if (event->header.type == PERF_RECORD_EXIT) {
- err = arm_spe_process_queues(spe, timestamp);
- if (err)
- return err;
- }
+ err = arm_spe_process_queues(spe, timestamp);
}

return err;
--
2.17.1

2020-12-21 08:41:02

by Leo Yan

[permalink] [raw]
Subject: [PATCH v1 6/7] perf arm-spe: Bail out if the trace is later than perf event

It's possible that record in Arm SPE trace is later than perf event and
vice versa. This asks to correlate the perf events and Arm SPE
synthesized events to be processed in the manner of correct timing.

To achieve the time ordering, this patch reverses the flow, it firstly
calls arm_spe_sample() and then calls arm_spe_decode(). By comparing
the timestamp value and detect the perf event is coming earlier than Arm
SPE trace data, it bails out from the decoding loop, the last record is
pushed into auxtrace stack and is deferred to generate sample. To track
the timestamp, everytime it updates timestamp for the latest record.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/util/arm-spe.c | 37 ++++++++++++++++++++++++++++++++++---
1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 2b008b973387..6bf2e4ff5342 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -353,12 +353,36 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
static int arm_spe_run_decoder(struct arm_spe_queue *speq, u64 *timestamp)
{
struct arm_spe *spe = speq->spe;
+ struct arm_spe_record *record;
int ret;

if (!spe->kernel_start)
spe->kernel_start = machine__kernel_start(spe->machine);

while (1) {
+ /*
+ * The usual logic is firstly to decode the packets, and then
+ * based the record to synthesize sample; but here the flow is
+ * reversed: it calls arm_spe_sample() for synthesizing samples
+ * prior to arm_spe_decode().
+ *
+ * Two reasons for this code logic:
+ * 1. Firstly, when setup queue in arm_spe__setup_queue(), it
+ * has decoded trace data and generated a record, but the record
+ * is left to generate sample until run to here, so it's correct
+ * to synthesize sample for the left record.
+ * 2. After decoding trace data, it needs to compare the record
+ * timestamp with the coming perf event, if the record timestamp
+ * is later than the perf event, it needs bail out and pushs the
+ * record into auxtrace heap, thus the record can be deferred to
+ * synthesize sample until run to here at the next time; so this
+ * can correlate samples between Arm SPE trace data and other
+ * perf events with correct time ordering.
+ */
+ ret = arm_spe_sample(speq);
+ if (ret)
+ return ret;
+
ret = arm_spe_decode(speq->decoder);
if (!ret) {
pr_debug("No data or all data has been processed.\n");
@@ -372,10 +396,17 @@ static int arm_spe_run_decoder(struct arm_spe_queue *speq, u64 *timestamp)
if (ret < 0)
continue;

- ret = arm_spe_sample(speq);
- if (ret)
- return ret;
+ record = &speq->decoder->record;

+ /* Update timestamp for the last record */
+ if (record->timestamp > speq->timestamp)
+ speq->timestamp = record->timestamp;
+
+ /*
+ * If the timestamp of the queue is later than timestamp of the
+ * coming perf event, bail out so can allow the perf event to
+ * be processed ahead.
+ */
if (!spe->timeless_decoding && speq->timestamp >= *timestamp) {
*timestamp = speq->timestamp;
return 0;
--
2.17.1