2021-04-12 09:47:04

by Leo Yan

Subject: [PATCH v4 0/6] perf arm-spe: Enable timestamp

This patch set enables timestamps for Arm SPE trace. It reads the TSC
parameters from the TIME_CONV event and uses them to convert between the
timer counter and kernel time; the conversion is then applied to Arm SPE
samples.

This version drops the change that added hardware clock parameters to the
auxtrace info; instead, it extracts the clock parameters from the TIME_CONV
event and uses them for timestamp calculation.

This patch set applies cleanly on the perf/core branch with:

commit 2c0cb9f56020 ("perf test: Add a shell test for 'perf stat --bpf-counters' new option")

This patch series has been tested on the Hisilicon D06 platform.

Changes from v3:
* Kept backwards compatibility for the TIME_CONV event (Adrian).

Changes from v2:
* Changed to use TIME_CONV event for extracting clock parameters (Al).

Changes from v1:
* Rebased patch series on the latest perf/core branch;
* Fixed the patch for dumping TSC parameters to support both the
older and new auxtrace info format.


Leo Yan (6):
perf arm-spe: Remove unused enum value ARM_SPE_PER_CPU_MMAPS
perf arm-spe: Save clock parameters from TIME_CONV event
perf arm-spe: Convert event kernel time to counter value
perf arm-spe: Assign kernel time to synthesized event
perf arm-spe: Bail out if the trace is later than perf event
perf arm-spe: Don't wait for PERF_RECORD_EXIT event

tools/perf/util/arm-spe.c | 74 +++++++++++++++++++++++++++++++++------
tools/perf/util/arm-spe.h | 1 -
2 files changed, 64 insertions(+), 11 deletions(-)

--
2.25.1


2021-04-12 09:47:04

by Leo Yan

Subject: [PATCH v4 3/6] perf arm-spe: Convert event kernel time to counter value

When handling a perf event, the Arm SPE decoder needs to decide whether
the event is earlier or later than the samples from the Arm SPE trace
data; to make the comparison, both times must use the same unit.

This patch converts the event kernel time to the arch timer's counter
value, so it can be compared with the counter value contained in the Arm
SPE Timestamp packet.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/util/arm-spe.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 7620dcc45940..23714cf0380e 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -669,7 +669,7 @@ static int arm_spe_process_event(struct perf_session *session,
}

if (sample->time && (sample->time != (u64) -1))
- timestamp = sample->time;
+ timestamp = perf_time_to_tsc(sample->time, &spe->tc);
else
timestamp = 0;

--
2.25.1
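
The conversion described in the commit message above can be sketched as follows. This is a minimal illustration, not the code from the perf tree: the struct `tsc_conv` and the function names `counter_to_ns()` and `ns_to_counter()` are hypothetical, and it assumes the usual perf convention that kernel time is `time_zero + ((cycles * time_mult) >> time_shift)`.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the clock parameters carried by the
 * TIME_CONV event; defined here only for illustration.
 */
struct tsc_conv {
	uint64_t time_zero;   /* kernel time (ns) at counter value 0 */
	uint32_t time_mult;   /* ns = time_zero + ((cyc * mult) >> shift) */
	uint16_t time_shift;
};

/* Counter value -> kernel time in ns, split to avoid overflow. */
static uint64_t counter_to_ns(uint64_t cyc, const struct tsc_conv *tc)
{
	uint64_t quot = cyc >> tc->time_shift;
	uint64_t rem  = cyc & (((uint64_t)1 << tc->time_shift) - 1);

	return tc->time_zero + quot * tc->time_mult +
	       ((rem * tc->time_mult) >> tc->time_shift);
}

/* Kernel time in ns -> counter value (the inverse, rounding down). */
static uint64_t ns_to_counter(uint64_t ns, const struct tsc_conv *tc)
{
	uint64_t t    = ns - tc->time_zero;
	uint64_t quot = t / tc->time_mult;
	uint64_t rem  = t % tc->time_mult;

	return (quot << tc->time_shift) +
	       (rem << tc->time_shift) / tc->time_mult;
}
```

With both sides expressed as counter values, the perf event time and the value from the SPE Timestamp packet can be compared directly. The block defines helpers only; a caller would fill in `struct tsc_conv` from the recorded parameters.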

2021-04-12 22:40:21

by Leo Yan

Subject: [PATCH v4 5/6] perf arm-spe: Bail out if the trace is later than perf event

A record in the Arm SPE trace may be later than a perf event, and vice
versa. The perf events and the Arm SPE synthesized events therefore need
to be correlated so that they are processed in correct time order.

To achieve the time ordering, this patch reverses the flow: it first
calls arm_spe_sample() and then calls arm_spe_decode(). When comparing
timestamps shows that the perf event comes earlier than the Arm SPE trace
data, it bails out of the decoding loop; the last record is pushed onto
the auxtrace heap and its sample generation is deferred. To track the
timestamp, the queue timestamp is updated with every newly decoded record.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/util/arm-spe.c | 37 ++++++++++++++++++++++++++++++++++---
1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index c13a89f06ab8..b37d1cacebe9 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -434,12 +434,36 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
static int arm_spe_run_decoder(struct arm_spe_queue *speq, u64 *timestamp)
{
struct arm_spe *spe = speq->spe;
+ struct arm_spe_record *record;
int ret;

if (!spe->kernel_start)
spe->kernel_start = machine__kernel_start(spe->machine);

while (1) {
+ /*
+ * The usual logic is to decode the packets first, and then
+ * synthesize samples based on the record; but here the flow is
+ * reversed: it calls arm_spe_sample() for synthesizing samples
+ * prior to arm_spe_decode().
+ *
+ * Two reasons for this code logic:
+ * 1. When setting up the queue in arm_spe__setup_queue(), it has
+ * decoded trace data and generated a record, but the sample for
+ * that record is not generated until running to here, so it's
+ * correct to synthesize a sample for the leftover record.
+ * 2. After decoding trace data, it needs to compare the record
+ * timestamp with the coming perf event; if the record timestamp
+ * is later than the perf event, it needs to bail out and push the
+ * record into the auxtrace heap, so the record's sample synthesis
+ * is deferred until running to here the next time; this
+ * correlates samples between Arm SPE trace data and other
+ * perf events with correct time ordering.
+ */
+ ret = arm_spe_sample(speq);
+ if (ret)
+ return ret;
+
ret = arm_spe_decode(speq->decoder);
if (!ret) {
pr_debug("No data or all data has been processed.\n");
@@ -453,10 +477,17 @@ static int arm_spe_run_decoder(struct arm_spe_queue *speq, u64 *timestamp)
if (ret < 0)
continue;

- ret = arm_spe_sample(speq);
- if (ret)
- return ret;
+ record = &speq->decoder->record;

+ /* Update timestamp for the last record */
+ if (record->timestamp > speq->timestamp)
+ speq->timestamp = record->timestamp;
+
+ /*
+ * If the timestamp of the queue is later than the timestamp of
+ * the coming perf event, bail out to allow the perf event to
+ * be processed first.
+ */
if (!spe->timeless_decoding && speq->timestamp >= *timestamp) {
*timestamp = speq->timestamp;
return 0;
--
2.25.1
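
The reversed "sample first, then decode" loop from the patch above can be modelled in isolation. The following is a simplified sketch under stated assumptions, not the perf implementation: `struct demo_queue` and `demo_run_decoder()` are hypothetical names, each record is reduced to a single timestamp, and sample synthesis is reduced to incrementing a counter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified queue: an array of record timestamps. */
struct demo_queue {
	const uint64_t *records;  /* record timestamps in trace order */
	size_t nr_records;
	size_t next;              /* index of the next record to decode */
	bool has_pending;         /* a decoded record awaits its sample */
	uint64_t timestamp;       /* latest record timestamp seen */
	size_t nr_samples;        /* samples synthesized so far */
};

/*
 * Returns 1 when all trace data has been consumed, or 0 when bailing
 * out because the queue has caught up with the perf event whose
 * timestamp is passed in *timestamp.
 */
static int demo_run_decoder(struct demo_queue *q, uint64_t *timestamp)
{
	for (;;) {
		/* Step 1: synthesize a sample for the pending record. */
		if (q->has_pending) {
			q->nr_samples++;
			q->has_pending = false;
		}

		/* Step 2: decode the next record, if there is one. */
		if (q->next == q->nr_records)
			return 1;
		uint64_t rec_ts = q->records[q->next++];
		q->has_pending = true;

		/* Track the timestamp of the latest record. */
		if (rec_ts > q->timestamp)
			q->timestamp = rec_ts;

		/* Step 3: bail out; the record stays pending for later. */
		if (q->timestamp >= *timestamp) {
			*timestamp = q->timestamp;
			return 0;
		}
	}
}
```

In this model, a record decoded just before the bail-out has its sample deferred until the next call, which is the behaviour the commit message describes. Unlike the real code, the model starts with no pending record instead of one decoded during queue setup.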

2021-04-12 22:44:09

by Leo Yan

Subject: [PATCH v4 1/6] perf arm-spe: Remove unused enum value ARM_SPE_PER_CPU_MMAPS

The enum value 'ARM_SPE_PER_CPU_MMAPS' is never used so remove it.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/util/arm-spe.h | 1 -
1 file changed, 1 deletion(-)

diff --git a/tools/perf/util/arm-spe.h b/tools/perf/util/arm-spe.h
index 98d3235781c3..105ce0ea0a01 100644
--- a/tools/perf/util/arm-spe.h
+++ b/tools/perf/util/arm-spe.h
@@ -11,7 +11,6 @@

enum {
ARM_SPE_PMU_TYPE,
- ARM_SPE_PER_CPU_MMAPS,
ARM_SPE_AUXTRACE_PRIV_MAX,
};

--
2.25.1

2021-04-15 14:44:50

by James Clark

Subject: Re: [PATCH v4 0/6] perf arm-spe: Enable timestamp

Hi Leo,

I was looking at testing this on N1SDP and thought I would try the round trip with perf inject and
then perf report, but saw that perf inject with SPE always results in an error (unrelated to your change):

-> ./perf report -i per-thread-spe-time.inject.data
0x1328 [0x8]: failed to process type: 9 [Bad address]
Error:
failed to process sample


Do you have any test suggestions other than looking at the raw data?

Thanks
James

On 12/04/2021 12:10, Leo Yan wrote:
> This patch set is to enable timestamp for Arm SPE trace. It reads out
> TSC parameters from the TIME_CONV event, the parameters are used for
> conversion between timer counter and kernel time and which is applied
> for Arm SPE samples.
>
> This version dropped the change for adding hardware clock parameters
> into auxtrace info, alternatively, it utilizes the TIME_CONV event to
> extract the clock parameters which is used for timestamp calculation.
>
> This patch set can be clearly applied on perf/core branch with:
>
> commit 2c0cb9f56020 ("perf test: Add a shell test for 'perf stat --bpf-counters' new option")
>
> Ths patch series has been tested on Hisilicon D06 platform.
>
> Changes from v3:
> * Let to be backwards-compatible for TIME_CONV event (Adrian).
>
> Changes from v2:
> * Changed to use TIME_CONV event for extracting clock parameters (Al).
>
> Changes from v1:
> * Rebased patch series on the latest perf/core branch;
> * Fixed the patch for dumping TSC parameters to support both the
> older and new auxtrace info format.
>
>
> Leo Yan (6):
> perf arm-spe: Remove unused enum value ARM_SPE_PER_CPU_MMAPS
> perf arm-spe: Save clock parameters from TIME_CONV event
> perf arm-spe: Convert event kernel time to counter value
> perf arm-spe: Assign kernel time to synthesized event
> perf arm-spe: Bail out if the trace is later than perf event
> perf arm-spe: Don't wait for PERF_RECORD_EXIT event
>
> tools/perf/util/arm-spe.c | 74 +++++++++++++++++++++++++++++++++------
> tools/perf/util/arm-spe.h | 1 -
> 2 files changed, 64 insertions(+), 11 deletions(-)
>

2021-04-15 14:51:30

by James Clark

Subject: Re: [PATCH v4 1/6] perf arm-spe: Remove unused enum value ARM_SPE_PER_CPU_MMAPS



On 15/04/2021 17:41, Leo Yan wrote:
> Hi James,
>
> On Thu, Apr 15, 2021 at 05:13:36PM +0300, James Clark wrote:
>> On 12/04/2021 12:10, Leo Yan wrote:
>>> The enum value 'ARM_SPE_PER_CPU_MMAPS' is never used so remove it.
>>
>> Hi Leo,
>>
>> I think this causes an error when attempting to open a newly recorded file
>> with an old version of perf. The value ARM_SPE_AUXTRACE_PRIV_MAX is used here:
>>
>> size_t min_sz = sizeof(u64) * ARM_SPE_AUXTRACE_PRIV_MAX;
>> struct perf_record_time_conv *tc = &session->time_conv;
>> struct arm_spe *spe;
>> int err;
>>
>> if (auxtrace_info->header.size < sizeof(struct perf_record_auxtrace_info) +
>> min_sz)
>> return -EINVAL;
>>
>> And removing ARM_SPE_PER_CPU_MMAPS changes the value of ARM_SPE_AUXTRACE_PRIV_MAX.
>>
>> At least I think that's what's causing the problem. I get this error:
>>
>> ./perf report -i per-thread-spe-time.data
>> 0x1c0 [0x18]: failed to process type: 70 [Invalid argument]
>> Error:
>> failed to process sample
>> # To display the perf.data header info, please use --header/--header-only options.
>> #
>
> Yes, when working on this patch I had this concern as well.
>
> My thinking was that the perf tool should be backwards-compatible,
> but there is no requirement for forwards-compatibility. This is the
> main reason why I kept this patch.
>
> If you or anyone else can confirm that forwards-compatibility is required,
> I'm fine with dropping this patch.
>

Personally, I can easily imagine sending a file to someone to open with an older version and it causing
friction that could easily be avoided. It even made testing a bit more difficult, because
I wanted to compare opening the same file with the patched and un-patched versions. But if there
is no hard requirement, I can't really push too hard against removing it.

> Thanks a lot for the reviewing and testing!
> Leo
>
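
The compatibility concern discussed in this message can be reproduced with a small model of the size check. This is a sketch under assumptions, not perf code: the `OLD_*`/`NEW_*` enums and `demo_*` helpers are hypothetical, the fixed part of the auxtrace info record is assumed to be 16 bytes, and the writer is assumed to size the private area from the same enum as the reader.

```c
#include <stddef.h>
#include <stdint.h>

/* Enum layout before patch 1/6 (hypothetical names). */
enum { OLD_PMU_TYPE, OLD_PER_CPU_MMAPS, OLD_PRIV_MAX };
/* Enum layout after patch 1/6 (hypothetical names). */
enum { NEW_PMU_TYPE, NEW_PRIV_MAX };

/* Assumed size of the fixed part of the auxtrace info record. */
#define DEMO_INFO_HDR_SIZE 16

/* Size of the record a writer emits for a given enum layout. */
static size_t demo_record_size(int priv_max)
{
	return DEMO_INFO_HDR_SIZE + sizeof(uint64_t) * (size_t)priv_max;
}

/*
 * Mirrors the reader-side minimum size check quoted above.
 * Returns 0 if the record is accepted, -1 where perf would
 * return -EINVAL.
 */
static int demo_reader_accepts(size_t record_size, int reader_priv_max)
{
	size_t min_sz = sizeof(uint64_t) * (size_t)reader_priv_max;

	if (record_size < DEMO_INFO_HDR_SIZE + min_sz)
		return -1;
	return 0;
}
```

Under these assumptions a file written with the new layout carries a 24-byte record, which a reader built with the old layout rejects because it expects at least 32 bytes, while a new reader still accepts an old 32-byte record. If the bracketed value in the quoted error output is the record size, 0x18 (24 bytes) would be consistent with this.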

2021-04-15 15:05:50

by Leo Yan

Subject: Re: [PATCH v4 0/6] perf arm-spe: Enable timestamp

On Thu, Apr 15, 2021 at 05:43:24PM +0300, James Clark wrote:
> Hi Leo,
>
> I was looking at testing this on N1SDP and I thought I would try the round trip with perf inject and
> then perf report but saw that perf inject with SPE always results in an error (unrelated to your change)
>
> -> ./perf report -i per-thread-spe-time.inject.data
> 0x1328 [0x8]: failed to process type: 9 [Bad address]
> Error:
> failed to process sample
>
>
> Do you have any test suggestions other than looking at the raw data?

Good catch! I haven't used inject mode for Arm SPE before (it's not
like Arm CoreSight for instruction samples; SPE's branch samples are
statistical, so we cannot generate branch samples based on an accurate
interval).

For debugging, it's good to use "git grep" to search for "Bad address"
to check where the error happens, and you can also use gdb. I personally
think it's worth going back to check the synthesis flow; simply put,
the problem might be in injecting samples rather than in the decoding
flow.

Thanks,
Leo