2022-10-05 14:21:47

by James Clark

Subject: [PATCH] perf test: Fix test_arm_coresight.sh failures on Juno

This test commonly fails on Arm Juno because the instruction interval
is large enough to miss generating any samples for Perf in system-wide
mode.

Fix this by lowering the interval until a comfortable number of Perf
instructions are generated. The test is still quick to run because only
a small amount of trace is gathered.

Before:

sudo ./perf test coresight -vvv
...
Recording trace with system wide mode
Looking at perf.data file for dumping branch samples:
Looking at perf.data file for reporting branch samples:
Looking at perf.data file for instruction samples:
CoreSight system wide testing: FAIL
...

After:

sudo ./perf test coresight -vvv
...
Recording trace with system wide mode
Looking at perf.data file for dumping branch samples:
Looking at perf.data file for reporting branch samples:
Looking at perf.data file for instruction samples:
CoreSight system wide testing: PASS
...

Signed-off-by: James Clark <[email protected]>
---
tools/perf/tests/shell/test_arm_coresight.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/tests/shell/test_arm_coresight.sh b/tools/perf/tests/shell/test_arm_coresight.sh
index e4cb4f1806ff..daad786cf48d 100755
--- a/tools/perf/tests/shell/test_arm_coresight.sh
+++ b/tools/perf/tests/shell/test_arm_coresight.sh
@@ -70,7 +70,7 @@ perf_report_instruction_samples() {
# 68.12% touch libc-2.27.so [.] _dl_addr
# 5.80% touch libc-2.27.so [.] getenv
# 4.35% touch ld-2.27.so [.] _dl_fixup
- perf report --itrace=i1000i --stdio -i ${perfdata} 2>&1 | \
+ perf report --itrace=i20i --stdio -i ${perfdata} 2>&1 | \
egrep " +[0-9]+\.[0-9]+% +$1" > /dev/null 2>&1
}

--
2.28.0
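
For readers following the patch, here is a minimal sketch of the check that perf_report_instruction_samples() performs: it succeeds only if at least one report line has the form "<percent>% <comm>". The sample_report and has_samples_for helpers are hypothetical stand-ins for "perf report --stdio"; their three lines are copied from the comment in the test script, not from a real run. grep -E is used here in place of egrep, which is equivalent.

```shell
#!/bin/sh
# Hypothetical stand-in for "perf report --stdio" output. The lines come
# from the comment in the test script, not from a real trace.
sample_report() {
	cat <<'EOF'
    68.12%  touch    libc-2.27.so   [.] _dl_addr
     5.80%  touch    libc-2.27.so   [.] getenv
     4.35%  touch    ld-2.27.so     [.] _dl_fixup
EOF
}

# Succeeds only if at least one line looks like "<percent>% <comm>",
# using the same pattern as the test script.
has_samples_for() {
	sample_report | grep -E " +[0-9]+\.[0-9]+% +$1" > /dev/null 2>&1
}

has_samples_for touch && echo "touch: samples found"
has_samples_for perf || echo "perf: no samples"
```

When the report contains no line for the requested command, as in the failing system-wide case, the grep exits non-zero and the test reports FAIL.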


2022-10-06 15:11:47

by Leo Yan

Subject: Re: [PATCH] perf test: Fix test_arm_coresight.sh failures on Juno

Hi James,

On Wed, Oct 05, 2022 at 03:05:08PM +0100, James Clark wrote:
> This test commonly fails on Arm Juno because the instruction interval
> is large enough to miss generating any samples for Perf in system-wide
> mode.
>
> Fix this by lowering the interval until a comfortable number of Perf
> instructions are generated. The test is still quick to run because only
> a small amount of trace is gathered.
>
> Before:
>
> sudo ./perf test coresight -vvv
> ...
> Recording trace with system wide mode
> Looking at perf.data file for dumping branch samples:
> Looking at perf.data file for reporting branch samples:
> Looking at perf.data file for instruction samples:
> CoreSight system wide testing: FAIL
> ...
>
> After:
>
> sudo ./perf test coresight -vvv
> ...
> Recording trace with system wide mode
> Looking at perf.data file for dumping branch samples:
> Looking at perf.data file for reporting branch samples:
> Looking at perf.data file for instruction samples:
> CoreSight system wide testing: PASS
> ...

Since the Arm Juno board has zero timestamps for CoreSight, I don't think
arm_cs_etm.sh can really work on it at the moment.

If we want the test to pass on the Juno board, we need to add the option
"--itrace=Zi1000i" to "perf report" and "perf script"; but it seems to
me that "--itrace=Z..." is not a general case for testing ...

> Signed-off-by: James Clark <[email protected]>
> ---
> tools/perf/tests/shell/test_arm_coresight.sh | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/perf/tests/shell/test_arm_coresight.sh b/tools/perf/tests/shell/test_arm_coresight.sh
> index e4cb4f1806ff..daad786cf48d 100755
> --- a/tools/perf/tests/shell/test_arm_coresight.sh
> +++ b/tools/perf/tests/shell/test_arm_coresight.sh
> @@ -70,7 +70,7 @@ perf_report_instruction_samples() {
> # 68.12% touch libc-2.27.so [.] _dl_addr
> # 5.80% touch libc-2.27.so [.] getenv
> # 4.35% touch ld-2.27.so [.] _dl_fixup
> - perf report --itrace=i1000i --stdio -i ${perfdata} 2>&1 | \
> + perf report --itrace=i20i --stdio -i ${perfdata} 2>&1 | \
> egrep " +[0-9]+\.[0-9]+% +$1" > /dev/null 2>&1

So here I suspect that changing to "--itrace=i20i" can allow the test
to pass on the Juno board. Could you confirm this?

Thanks,
Leo

2022-10-06 16:07:02

by James Clark

Subject: Re: [PATCH] perf test: Fix test_arm_coresight.sh failures on Juno



On 06/10/2022 15:48, Leo Yan wrote:
> Hi James,
>
> On Wed, Oct 05, 2022 at 03:05:08PM +0100, James Clark wrote:
>> This test commonly fails on Arm Juno because the instruction interval
>> is large enough to miss generating any samples for Perf in system-wide
>> mode.
>>
>> Fix this by lowering the interval until a comfortable number of Perf
>> instructions are generated. The test is still quick to run because only
>> a small amount of trace is gathered.
>>
>> Before:
>>
>> sudo ./perf test coresight -vvv
>> ...
>> Recording trace with system wide mode
>> Looking at perf.data file for dumping branch samples:
>> Looking at perf.data file for reporting branch samples:
>> Looking at perf.data file for instruction samples:
>> CoreSight system wide testing: FAIL
>> ...
>>
>> After:
>>
>> sudo ./perf test coresight -vvv
>> ...
>> Recording trace with system wide mode
>> Looking at perf.data file for dumping branch samples:
>> Looking at perf.data file for reporting branch samples:
>> Looking at perf.data file for instruction samples:
>> CoreSight system wide testing: PASS
>> ...
>
> Since Arm Juno board has zero timestamp for CoreSight, I don't think
> now arm_cs_etm.sh can really work on it.
>
> If we want to pass the test on Juno board, we need to add option
> "--itrace=Zi1000i" for "perf report" and "perf script"; but seems
> to me "--itrace=Z..." is not a general case for testing ...

Unfortunately I now think that adding the Z option didn't improve
anything in Coresight decoding other than removing the warning. I've
never seen the zero timestamp issue on Juno though. I thought that was
on some Qualcomm device? I'm not getting the warning on this test anyway.

The problem is that timeless mode assumes per thread mode, and in per
thread mode there is a separate buffer per thread, so the Coresight
channel IDs are ignored. In systemwide mode the channel ID is important
to know which CPU the trace came from. If this info is thrown away then
not much works correctly.

I plan to overhaul the whole decoder and remove all the assumptions
about per-thread and timeless mode. It would be better if they were
completely separate concepts.

>
>> Signed-off-by: James Clark <[email protected]>
>> ---
>> tools/perf/tests/shell/test_arm_coresight.sh | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/tests/shell/test_arm_coresight.sh b/tools/perf/tests/shell/test_arm_coresight.sh
>> index e4cb4f1806ff..daad786cf48d 100755
>> --- a/tools/perf/tests/shell/test_arm_coresight.sh
>> +++ b/tools/perf/tests/shell/test_arm_coresight.sh
>> @@ -70,7 +70,7 @@ perf_report_instruction_samples() {
>> # 68.12% touch libc-2.27.so [.] _dl_addr
>> # 5.80% touch libc-2.27.so [.] getenv
>> # 4.35% touch ld-2.27.so [.] _dl_fixup
>> - perf report --itrace=i1000i --stdio -i ${perfdata} 2>&1 | \
>> + perf report --itrace=i20i --stdio -i ${perfdata} 2>&1 | \
>> egrep " +[0-9]+\.[0-9]+% +$1" > /dev/null 2>&1
>
> So here I am suspect that changing to "--itrace=i20i" can allow the test
> to pass on Juno board. Could you confirm for this?

On Juno:

./perf record -e cs_etm// -a -- ls

With interval 20, 23 instruction samples are generated:

./perf report --stdio --itrace=i20i | egrep " +[0-9]+\.[0-9]+% +perf " | wc -l

23

With interval 1000, 0 are generated:

./perf report --stdio --itrace=i1000i | egrep " +[0-9]+\.[0-9]+% +perf " | wc -l

Error:
The perf.data data has no samples!
0

I think the issue is that ls is quite quick to run, so not much trace is
generated for Perf. And it just depends on the scheduling which is
slightly different on Juno. I don't think it's a bug. On N1SDP there are
only 134 samples generated with i1000i, so it could probably end up with
a random run generating 0 there too.
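
The effect of the interval can be illustrated with a rough model. It assumes that --itrace=iNi synthesizes about one sample per N decoded instructions. The instruction count and the expected_samples helper below are hypothetical, chosen only to show how a short trace can yield zero samples at i1000i; they are not measurements from Juno or N1SDP.

```shell
#!/bin/sh
# Rough model only: one synthesized sample per "interval" decoded
# instructions (integer division, so a short trace can round down to 0).
expected_samples() {
	echo $(( $1 / $2 ))
}

# Hypothetical number of instructions decoded for one command.
insns=500

echo "i20i:   $(expected_samples "$insns" 20)"
echo "i1000i: $(expected_samples "$insns" 1000)"
```

Under this model, any command with fewer than 1000 decoded instructions produces no samples at i1000i, while i20i still produces a usable number.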


>
> Thanks,
> Leo

2022-10-10 07:58:29

by Leo Yan

Subject: Re: [PATCH] perf test: Fix test_arm_coresight.sh failures on Juno

On Thu, Oct 06, 2022 at 04:11:05PM +0100, James Clark wrote:

[...]

> >> Before:
> >>
> >> sudo ./perf test coresight -vvv
> >> ...
> >> Recording trace with system wide mode
> >> Looking at perf.data file for dumping branch samples:
> >> Looking at perf.data file for reporting branch samples:
> >> Looking at perf.data file for instruction samples:
> >> CoreSight system wide testing: FAIL
> >> ...
> >>
> >> After:
> >>
> >> sudo ./perf test coresight -vvv
> >> ...
> >> Recording trace with system wide mode
> >> Looking at perf.data file for dumping branch samples:
> >> Looking at perf.data file for reporting branch samples:
> >> Looking at perf.data file for instruction samples:
> >> CoreSight system wide testing: PASS
> >> ...
> >
> > Since Arm Juno board has zero timestamp for CoreSight, I don't think
> > now arm_cs_etm.sh can really work on it.
> >
> > If we want to pass the test on Juno board, we need to add option
> > "--itrace=Zi1000i" for "perf report" and "perf script"; but seems
> > to me "--itrace=Z..." is not a general case for testing ...
>
> Unfortunately I now think that adding the Z option didn't improve
> anything in Coresight decoding other than removing the warning. I've
> never seen the zero timestamp issue on Juno though. I thought that was
> on some Qualcomm device? I'm not getting the warning on this test anyway.

No, on my Juno-r2 board I can observe that the timestamp is always zero
in the CoreSight trace data; this is why I must use "--itrace=Zi1000i"
every time I report results.

> The problem is that timeless mode assumes per thread mode, and in per
> thread mode there is a separate buffer per thread, so the Coresight
> channel IDs are ignored. In systemwide mode the channel ID is important
> to know which CPU the trace came from. If this info is thrown away then
> not much works correctly.
>
> I plan to overhaul the whole decoder and remove all the assumptions
> about per-thread and timeless mode. It would be better if they were
> completely separate concepts.

Okay, good to know this.

[...]

> > So here I am suspect that changing to "--itrace=i20i" can allow the test
> > to pass on Juno board. Could you confirm for this?
>
> On Juno:
>
> ./perf record -e cs_etm// -a -- ls
>
> With interval 20, 23 instruction samples are generated:
>
> ./perf report --stdio --itrace=i20i | egrep " +[0-9]+\.[0-9]+% +perf "
> | wc -l
>
> 23
>
> With interval 1000, 0 are generated:
>
> ./perf report --stdio --itrace=i1000i | egrep " +[0-9]+\.[0-9]+% +perf
> " | wc -l
>
> Error:
> The perf.data data has no samples!
> 0

Thanks for the confirmation. It's a bit weird that your Juno board doesn't
produce all zeros for timestamp packets.

> I think the issue is that ls is quite quick to run, so not much trace is
> generated for Perf. And it just depends on the scheduling which is
> slightly different on Juno. I don't think it's a bug. On N1SDP there are
> only 134 samples generated with i1000i, so it could probably end up with
> a random run generating 0 there too.

Agreed, changing to a smaller interval makes sense to me.

Reviewed-by: Leo Yan <[email protected]>

Thanks,
Leo

2022-10-10 09:25:21

by James Clark

Subject: Re: [PATCH] perf test: Fix test_arm_coresight.sh failures on Juno



On 10/10/2022 08:41, Leo Yan wrote:
> On Thu, Oct 06, 2022 at 04:11:05PM +0100, James Clark wrote:
>
> [...]
>
>>>> Before:
>>>>
>>>> sudo ./perf test coresight -vvv
>>>> ...
>>>> Recording trace with system wide mode
>>>> Looking at perf.data file for dumping branch samples:
>>>> Looking at perf.data file for reporting branch samples:
>>>> Looking at perf.data file for instruction samples:
>>>> CoreSight system wide testing: FAIL
>>>> ...
>>>>
>>>> After:
>>>>
>>>> sudo ./perf test coresight -vvv
>>>> ...
>>>> Recording trace with system wide mode
>>>> Looking at perf.data file for dumping branch samples:
>>>> Looking at perf.data file for reporting branch samples:
>>>> Looking at perf.data file for instruction samples:
>>>> CoreSight system wide testing: PASS
>>>> ...
>>>
>>> Since Arm Juno board has zero timestamp for CoreSight, I don't think
>>> now arm_cs_etm.sh can really work on it.
>>>
>>> If we want to pass the test on Juno board, we need to add option
>>> "--itrace=Zi1000i" for "perf report" and "perf script"; but seems
>>> to me "--itrace=Z..." is not a general case for testing ...
>>
>> Unfortunately I now think that adding the Z option didn't improve
>> anything in Coresight decoding other than removing the warning. I've
>> never seen the zero timestamp issue on Juno though. I thought that was
>> on some Qualcomm device? I'm not getting the warning on this test anyway.
>
> No, on my Juno-r2 board I can observe the timestamp is always zero
> from CoreSight trace data, this is why everytime I must use
> "--itrace=Zi1000i" for reporting results.

Ah I have r0 which could explain it. But it's good to know that r2 has
that issue. I still wouldn't expect you to have to use the option
though, because it should only make the warning go away.

>
>> The problem is that timeless mode assumes per thread mode, and in per
>> thread mode there is a separate buffer per thread, so the Coresight
>> channel IDs are ignored. In systemwide mode the channel ID is important
>> to know which CPU the trace came from. If this info is thrown away then
>> not much works correctly.
>>
>> I plan to overhaul the whole decoder and remove all the assumptions
>> about per-thread and timeless mode. It would be better if they were
>> completely separate concepts.
>
> Okay, good to know this.
>
> [...]
>
>>> So here I am suspect that changing to "--itrace=i20i" can allow the test
>>> to pass on Juno board. Could you confirm for this?
>>
>> On Juno:
>>
>> ./perf record -e cs_etm// -a -- ls
>>
>> With interval 20, 23 instruction samples are generated:
>>
>> ./perf report --stdio --itrace=i20i | egrep " +[0-9]+\.[0-9]+% +perf "
>> | wc -l
>>
>> 23
>>
>> With interval 1000, 0 are generated:
>>
>> ./perf report --stdio --itrace=i1000i | egrep " +[0-9]+\.[0-9]+% +perf
>> " | wc -l
>>
>> Error:
>> The perf.data data has no samples!
>> 0
>
> Thanks for confirmation. It's a bit weird that your Juno board doesn't
> produce all zeros for timestamp packets.
>
>> I think the issue is that ls is quite quick to run, so not much trace is
>> generated for Perf. And it just depends on the scheduling which is
>> slightly different on Juno. I don't think it's a bug. On N1SDP there are
>> only 134 samples generated with i1000i, so it could probably end up with
>> a random run generating 0 there too.
>
> Agreed, changing to smaller interval makes sense for me.
>
> Reviewed-by: Leo Yan <[email protected]>

Thanks for the review, Leo.

>
> Thanks,
> Leo

2022-10-13 14:58:56

by Christian Borntraeger

Subject: arm coresight txt triggers build warning: (was [PATCH] perf test: Fix test_arm_coresight.sh failures on Juno)

Probably not related to this patch set, but tools/perf/Documentation/perf-arm-coresight.txt

results in
cd tools/perf/Documentation/
make man
ASCIIDOC perf-arm-coresight.xml
asciidoc: ERROR: perf-arm-coresight.txt: line 2: malformed manpage title
asciidoc: ERROR: perf-arm-coresight.txt: line 3: name section expected
asciidoc: FAILED: perf-arm-coresight.txt: line 3: section title expected

in linux-next.
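
For context, asciidoc's manpage doctype expects the document title to have the form "name(section)", which is what the "malformed manpage title" error points at. The sketch below approximates that check in shell. The is_manpage_title helper and its pattern are assumptions made for illustration, not asciidoc's own implementation, and the rejected heading is a made-up example.

```shell
#!/bin/sh
# Succeeds if the given line looks like an asciidoc manpage title such as
# "perf-record(1)". The pattern is an approximation, not asciidoc's own.
is_manpage_title() {
	printf '%s\n' "$1" | grep -E '^[A-Za-z0-9_.-]+\([0-9][a-z]*\)$' > /dev/null
}

is_manpage_title "perf-arm-coresight(1)" && echo "ok: perf-arm-coresight(1)"
is_manpage_title "Some Free-Form Heading" || echo "rejected: Some Free-Form Heading"
```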

2022-10-13 15:33:45

by James Clark

Subject: Re: arm coresight txt triggers build warning: (was [PATCH] perf test: Fix test_arm_coresight.sh failures on Juno)



On 13/10/2022 14:45, Christian Borntraeger wrote:
> Probably not related to this patch set, but
> tools/perf/Documentation/perf-arm-coresight.txt
>
> results in
> cd tools/perf/Documentation/
> make man
>   ASCIIDOC perf-arm-coresight.xml
> asciidoc: ERROR: perf-arm-coresight.txt: line 2: malformed manpage title
> asciidoc: ERROR: perf-arm-coresight.txt: line 3: name section expected
> asciidoc: FAILED: perf-arm-coresight.txt: line 3: section title expected
>
> in linux-next.
>

I think this is the same as what has been reported here:
https://lore.kernel.org/linux-perf-users/[email protected]/T/#mf1345f086db25c9fb40eb916b0aa42f1960d7eb2

Carsten, are you able to take a look at this?

Thanks
James

2022-10-14 14:16:47

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH] perf test: Fix test_arm_coresight.sh failures on Juno

Em Mon, Oct 10, 2022 at 10:21:22AM +0100, James Clark escreveu:
> On 10/10/2022 08:41, Leo Yan wrote:
> > On Thu, Oct 06, 2022 at 04:11:05PM +0100, James Clark wrote:
> >> Error:
> >> The perf.data data has no samples!
> >> 0

> > Thanks for confirmation. It's a bit weird that your Juno board doesn't
> > produce all zeros for timestamp packets.

> >> I think the issue is that ls is quite quick to run, so not much trace is
> >> generated for Perf. And it just depends on the scheduling which is
> >> slightly different on Juno. I don't think it's a bug. On N1SDP there are
> >> only 134 samples generated with i1000i, so it could probably end up with
> >> a random run generating 0 there too.

> > Agreed, changing to smaller interval makes sense for me.

> > Reviewed-by: Leo Yan <[email protected]>

> Thanks for the review Leo

Thanks, applied.

- Arnaldo