2021-05-19 19:19:16

by Leo Yan

[permalink] [raw]
Subject: [PATCH v5 0/5] perf arm-spe: Enable timestamp

This patch set is to enable timestamp for Arm SPE trace. It reads out
TSC parameters from the TIME_CONV event, the parameters are used for
conversion between timer counter and kernel time, the calculated
timestamps are assigned to Arm SPE samples.

This patch set can be clearly applied on perf/core branch with:

commit 046b243a6afb ("perf x86 kvm-stat: Support to analyze kvm MSR")

The patches have been tested on Hisilicon D06 platform.

Changes from v4:
* Dropped the change "perf arm-spe: Remove unused enum value
ARM_SPE_PER_CPU_MMAPS" for format compatibility (James).

Changes from v3:
* Let to be backwards-compatible for TIME_CONV event (Adrian).

Changes from v2:
* Changed to use TIME_CONV event for extracting clock parameters (Al).

Changes from v1:
* Rebased patch series on the latest perf/core branch;
* Fixed the patch for dumping TSC parameters to support both the
older and new auxtrace info format.


Leo Yan (5):
perf arm-spe: Save clock parameters from TIME_CONV event
perf arm-spe: Convert event kernel time to counter value
perf arm-spe: Assign kernel time to synthesized event
perf arm-spe: Bail out if the trace is later than perf event
perf arm-spe: Don't wait for PERF_RECORD_EXIT event

tools/perf/util/arm-spe.c | 73 +++++++++++++++++++++++++++++++++------
1 file changed, 63 insertions(+), 10 deletions(-)

--
2.25.1



2021-05-19 19:19:21

by Leo Yan

[permalink] [raw]
Subject: [PATCH v5 2/5] perf arm-spe: Convert event kernel time to counter value

When handle a perf event, Arm SPE decoder needs to decide if this perf
event is earlier or later than the samples from Arm SPE trace data; to
do comparision, it needs to use the same unit for the time.

This patch converts the event kernel time to arch timer's counter value,
thus it can be used to compare with counter value contained in Arm SPE
Timestamp packet.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/util/arm-spe.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index d2ae5a5c13ee..ff8b52e6d475 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -669,7 +669,7 @@ static int arm_spe_process_event(struct perf_session *session,
}

if (sample->time && (sample->time != (u64) -1))
- timestamp = sample->time;
+ timestamp = perf_time_to_tsc(sample->time, &spe->tc);
else
timestamp = 0;

--
2.25.1


2021-05-19 19:19:55

by Leo Yan

[permalink] [raw]
Subject: [PATCH v5 3/5] perf arm-spe: Assign kernel time to synthesized event

In current code, it assigns the arch timer counter to the synthesized
samples Arm SPE trace, thus the samples don't contain the kernel time
but only contain the raw counter value.

To fix the issue, this patch converts the timer counter to kernel time
and assigns it to sample timestamp.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/util/arm-spe.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index ff8b52e6d475..da379328442c 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -234,7 +234,7 @@ static void arm_spe_prep_sample(struct arm_spe *spe,
struct arm_spe_record *record = &speq->decoder->record;

if (!spe->timeless_decoding)
- sample->time = speq->timestamp;
+ sample->time = tsc_to_perf_time(record->timestamp, &spe->tc);

sample->ip = record->from_ip;
sample->cpumode = arm_spe_cpumode(spe, sample->ip);
--
2.25.1


2021-05-19 19:20:06

by Leo Yan

[permalink] [raw]
Subject: [PATCH v5 5/5] perf arm-spe: Don't wait for PERF_RECORD_EXIT event

When decode Arm SPE trace, it waits for PERF_RECORD_EXIT event (the last
perf event) for processing trace data, which is needless and even might
cause logic error, e.g. it might fail to correlate perf events with Arm
SPE events correctly.

So this patch removes the condition checking for PERF_RECORD_EXIT event.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/util/arm-spe.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 5c5b438584c4..58b7069c5a5f 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -717,11 +717,7 @@ static int arm_spe_process_event(struct perf_session *session,
sample->time);
}
} else if (timestamp) {
- if (event->header.type == PERF_RECORD_EXIT) {
- err = arm_spe_process_queues(spe, timestamp);
- if (err)
- return err;
- }
+ err = arm_spe_process_queues(spe, timestamp);
}

return err;
--
2.25.1


2021-06-25 13:26:36

by James Clark

[permalink] [raw]
Subject: Re: [PATCH v5 5/5] perf arm-spe: Don't wait for PERF_RECORD_EXIT event



On 19/05/2021 08:19, Leo Yan wrote:
> When decode Arm SPE trace, it waits for PERF_RECORD_EXIT event (the last
> perf event) for processing trace data, which is needless and even might
> cause logic error, e.g. it might fail to correlate perf events with Arm
> SPE events correctly.
>
> So this patch removes the condition checking for PERF_RECORD_EXIT event.
>
> Signed-off-by: Leo Yan <[email protected]>
> ---
> tools/perf/util/arm-spe.c | 6 +-----
> 1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index 5c5b438584c4..58b7069c5a5f 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -717,11 +717,7 @@ static int arm_spe_process_event(struct perf_session *session,
> sample->time);
> }
> } else if (timestamp) {
> - if (event->header.type == PERF_RECORD_EXIT) {
> - err = arm_spe_process_queues(spe, timestamp);
> - if (err)
> - return err;
> - }
> + err = arm_spe_process_queues(spe, timestamp);
> }
>
> return err;
>

For the whole set:
Reviewed-by: James Clark <[email protected]>
Tested-by: James Clark <[email protected]>

I see a big improvement in decoding involving multiple processes because the timestamps are now
correlated with the comm and mmap events.

For example perf-exec samples are visible right before the exec is done, and on an
application that forks, samples are visible from all processes. For example:

perf record -e arm_spe// -- bash -c "stress -c 1"
perf script

perf-exec 4502 [003] 259755.050409: 1 l1d-access: ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms])
perf-exec 4502 [003] 259755.050409: 1 tlb-access: ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms])
perf-exec 4502 [003] 259755.050409: 1 memory: ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms])
perf-exec 4502 [003] 259755.050411: 1 tlb-access: ffff800010120fb8 __rcu_read_lock+0x0 ([kernel.kallsyms])
bash 4502 [003] 259755.050411: 1 branch-miss: ffff8000105b2a40 memcpy+0x80 ([kernel.kallsyms])
bash 4502 [003] 259755.050411: 1 tlb-access: 0 [unknown] ([unknown])
...
stress 4502 [003] 259755.051468: 1 l1d-access: ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms])
stress 4502 [003] 259755.051468: 1 tlb-access: ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms])
stress 4502 [003] 259755.051468: 1 memory: ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms])

Previously samples were only attributed to 'stress', which was obviously wrong.

James

2021-06-28 12:14:12

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH v5 5/5] perf arm-spe: Don't wait for PERF_RECORD_EXIT event

On Fri, Jun 25, 2021 at 02:25:15PM +0100, James Clark wrote:
>
>
> On 19/05/2021 08:19, Leo Yan wrote:
> > When decode Arm SPE trace, it waits for PERF_RECORD_EXIT event (the last
> > perf event) for processing trace data, which is needless and even might
> > cause logic error, e.g. it might fail to correlate perf events with Arm
> > SPE events correctly.
> >
> > So this patch removes the condition checking for PERF_RECORD_EXIT event.
> >
> > Signed-off-by: Leo Yan <[email protected]>
> > ---
> > tools/perf/util/arm-spe.c | 6 +-----
> > 1 file changed, 1 insertion(+), 5 deletions(-)
> >
> > diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> > index 5c5b438584c4..58b7069c5a5f 100644
> > --- a/tools/perf/util/arm-spe.c
> > +++ b/tools/perf/util/arm-spe.c
> > @@ -717,11 +717,7 @@ static int arm_spe_process_event(struct perf_session *session,
> > sample->time);
> > }
> > } else if (timestamp) {
> > - if (event->header.type == PERF_RECORD_EXIT) {
> > - err = arm_spe_process_queues(spe, timestamp);
> > - if (err)
> > - return err;
> > - }
> > + err = arm_spe_process_queues(spe, timestamp);
> > }
> >
> > return err;
> >
>
> For the whole set:
> Reviewed-by: James Clark <[email protected]>
> Tested-by: James Clark <[email protected]>

> I see a big improvement in decoding involving multiple processes because the timestamps are now
> correlated with the comm and mmap events.
>
> For example perf-exec samples are visible right before the exec is done, and on an
> application that forks, samples are visible from all processes. For example:
>
> perf record -e arm_spe// -- bash -c "stress -c 1"
> perf script
>
> perf-exec 4502 [003] 259755.050409: 1 l1d-access: ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms])
> perf-exec 4502 [003] 259755.050409: 1 tlb-access: ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms])
> perf-exec 4502 [003] 259755.050409: 1 memory: ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms])
> perf-exec 4502 [003] 259755.050411: 1 tlb-access: ffff800010120fb8 __rcu_read_lock+0x0 ([kernel.kallsyms])
> bash 4502 [003] 259755.050411: 1 branch-miss: ffff8000105b2a40 memcpy+0x80 ([kernel.kallsyms])
> bash 4502 [003] 259755.050411: 1 tlb-access: 0 [unknown] ([unknown])
> ...
> stress 4502 [003] 259755.051468: 1 l1d-access: ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms])
> stress 4502 [003] 259755.051468: 1 tlb-access: ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms])
> stress 4502 [003] 259755.051468: 1 memory: ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms])
>
> Previously samples were only attributed to 'stress', which was obviously wrong.

Thanks a lot for the review and testing, James!

Hi Arnaldo, I confirmed this patch set can be cleanly applied on
the latest acme/perf/core branch, so could you pick up this patch
set?

Thanks,
Leo

2021-07-01 17:46:53

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v5 5/5] perf arm-spe: Don't wait for PERF_RECORD_EXIT event

Em Mon, Jun 28, 2021 at 08:12:17PM +0800, Leo Yan escreveu:
> On Fri, Jun 25, 2021 at 02:25:15PM +0100, James Clark wrote:
> > For the whole set:
> > Reviewed-by: James Clark <[email protected]>
> > Tested-by: James Clark <[email protected]>

> > I see a big improvement in decoding involving multiple processes because the timestamps are now

> >
> > For example perf-exec samples are visible right before the exec is done, and on an
> > application that forks, samples are visible from all processes. For example:

> > perf record -e arm_spe// -- bash -c "stress -c 1"
> > perf script

> > perf-exec 4502 [003] 259755.050409: 1 l1d-access: ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms])
> > perf-exec 4502 [003] 259755.050409: 1 tlb-access: ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms])
> > perf-exec 4502 [003] 259755.050409: 1 memory: ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms])
> > perf-exec 4502 [003] 259755.050411: 1 tlb-access: ffff800010120fb8 __rcu_read_lock+0x0 ([kernel.kallsyms])
> > bash 4502 [003] 259755.050411: 1 branch-miss: ffff8000105b2a40 memcpy+0x80 ([kernel.kallsyms])
> > bash 4502 [003] 259755.050411: 1 tlb-access: 0 [unknown] ([unknown])
> > ...
> > stress 4502 [003] 259755.051468: 1 l1d-access: ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms])
> > stress 4502 [003] 259755.051468: 1 tlb-access: ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms])
> > stress 4502 [003] 259755.051468: 1 memory: ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms])

> > Previously samples were only attributed to 'stress', which was obviously wrong.
>
> Thanks a lot for the review and testing, James!
>
> Hi Arnaldo, I confirmed this patch set can be cleanly applied on
> the latest acme/perf/core branch, so could you pick up this patch
> set?

Applied, thanks, please let me know if there is still something
outstanding,

- Arnaldo

2021-07-02 01:33:07

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH v5 5/5] perf arm-spe: Don't wait for PERF_RECORD_EXIT event

On Thu, Jul 01, 2021 at 02:03:16PM -0300, Arnaldo Carvalho de Melo wrote:

[...]

> > Hi Arnaldo, I confirmed this patch set can be cleanly applied on
> > the latest acme/perf/core branch, so could you pick up this patch
> > set?
>
> Applied, thanks, please let me know if there is still something
> outstanding,

Thanks, Arnaldo! I confirmed you don't miss anything.