LinuxLists.cc - [PATCH 3/3] perf tests: Skip "test data symbol" on Neoverse N1

2024-04-09 08:48:51

Subject: [PATCH 3/3] perf tests: Skip "test data symbol" on Neoverse N1

To prevent anyone from seeing a test failure appear as a regression and
thinking that it was caused by their code change, just skip the test on
N1.

It can be caused by any unrelated change that shifts the loop into an
unfortunate position in the Perf binary which is almost impossible to
debug as the root cause of the test failure. Ultimately it's caused by
the referenced errata.

Fixes: 60abedb8aa90 ("perf test: Introduce script for data symbol testing")
Signed-off-by: James Clark <[email protected]>
---
tools/perf/tests/shell/test_data_symbol.sh | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/tools/perf/tests/shell/test_data_symbol.sh b/tools/perf/tests/shell/test_data_symbol.sh
index 3dfa91832aa8..ffc641d00aa4 100755
--- a/tools/perf/tests/shell/test_data_symbol.sh
+++ b/tools/perf/tests/shell/test_data_symbol.sh
@@ -16,6 +16,12 @@ skip_if_no_mem_event() {
return 2
}

+# Skip on Arm N1 due to errata 1694299. Bias exists in SPE sampling
+# which can cause the load and store instructions to be skipped
+# entirely. This comes and goes randomly depending on the offset the
+# linker places the datasym loop at in the Perf binary.
+lscpu | grep -q "Neoverse-N1" && exit 2
+
skip_if_no_mem_event || exit 2

skip_test_missing_symbol buf1
--
2.34.1

2024-04-09 16:28:01

by Ian Rogers

[permalink] [raw]

Subject: Re: [PATCH 3/3] perf tests: Skip "test data symbol" on Neoverse N1

On Tue, Apr 9, 2024 at 1:48 AM James Clark <[email protected]> wrote:
>
> To prevent anyone from seeing a test failure appear as a regression and
> thinking that it was caused by their code change, just skip the test on
> N1.
>
> It can be caused by any unrelated change that shifts the loop into an
> unfortunate position in the Perf binary which is almost impossible to
> debug as the root cause of the test failure. Ultimately it's caused by
> the referenced errata.
>
> Fixes: 60abedb8aa90 ("perf test: Introduce script for data symbol testing")
> Signed-off-by: James Clark <[email protected]>

This change makes me sad :-( Is there no hope of aligning the loop? We
have little enough testing coverage for memory events and even precise
events on ARM that anything take away testing coverage feels like we
should try to do better.

Which models are we losing coverage for, presumably neoverse-n1 but
what about neoverse-v1 and neoverse-n2-v2?

If aligning the loop doesn't work, could we use objdump and check its
alignment skipping when it is off? Or run the test and turn fails to
skip on neoverse-n1 - so we get some coverage testing.

It would also be nice if the change didn't add a dependency on lscpu
for the sake of parsing /proc/cpuinfo, I see another arm test already
did this test_arm_callgraph_fp.sh - that case looks like it should be
using uname.

Thanks,
Ian

> ---
> tools/perf/tests/shell/test_data_symbol.sh | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/tools/perf/tests/shell/test_data_symbol.sh b/tools/perf/tests/shell/test_data_symbol.sh
> index 3dfa91832aa8..ffc641d00aa4 100755
> --- a/tools/perf/tests/shell/test_data_symbol.sh
> +++ b/tools/perf/tests/shell/test_data_symbol.sh
> @@ -16,6 +16,12 @@ skip_if_no_mem_event() {
> return 2
> }
>
> +# Skip on Arm N1 due to errata 1694299. Bias exists in SPE sampling
> +# which can cause the load and store instructions to be skipped
> +# entirely. This comes and goes randomly depending on the offset the
> +# linker places the datasym loop at in the Perf binary.
> +lscpu | grep -q "Neoverse-N1" && exit 2
> +
> skip_if_no_mem_event || exit 2
>
> skip_test_missing_symbol buf1
> --
> 2.34.1
>

2024-04-09 23:17:52

by Namhyung Kim

[permalink] [raw]

Subject: Re: [PATCH 3/3] perf tests: Skip "test data symbol" on Neoverse N1

Hi Ian and James,

On Tue, Apr 9, 2024 at 8:39 AM Ian Rogers <[email protected]> wrote:
>
> On Tue, Apr 9, 2024 at 1:48 AM James Clark <[email protected]> wrote:
> >
> > To prevent anyone from seeing a test failure appear as a regression and
> > thinking that it was caused by their code change, just skip the test on
> > N1.
> >
> > It can be caused by any unrelated change that shifts the loop into an
> > unfortunate position in the Perf binary which is almost impossible to
> > debug as the root cause of the test failure. Ultimately it's caused by
> > the referenced errata.
> >
> > Fixes: 60abedb8aa90 ("perf test: Introduce script for data symbol testing")
> > Signed-off-by: James Clark <[email protected]>
>
> This change makes me sad :-( Is there no hope of aligning the loop? We
> have little enough testing coverage for memory events and even precise
> events on ARM that anything take away testing coverage feels like we
> should try to do better.

Can we just add some noise in the loop?

Thanks,
Namhyung

diff --git a/tools/perf/tests/workloads/datasym.c
b/tools/perf/tests/workloads/datasym.c
index ddd40bc63448..e2514bf393cd 100644
--- a/tools/perf/tests/workloads/datasym.c
+++ b/tools/perf/tests/workloads/datasym.c
@@ -16,6 +16,8 @@ static int datasym(int argc __maybe_unused, const
char **argv __maybe_unused)
{
for (;;) {
buf1.data1++;
+ if ((buf1.data1 % 100129) == 0)
+ buf1.data1++;
buf1.data2 += buf1.data1;
}
return 0;

2024-04-10 09:07:45

by James Clark

[permalink] [raw]

Subject: Re: [PATCH 3/3] perf tests: Skip "test data symbol" on Neoverse N1

On 10/04/2024 00:17, Namhyung Kim wrote:
> Hi Ian and James,
>
> On Tue, Apr 9, 2024 at 8:39 AM Ian Rogers <[email protected]> wrote:
>>
>> On Tue, Apr 9, 2024 at 1:48 AM James Clark <[email protected]> wrote:
>>>
>>> To prevent anyone from seeing a test failure appear as a regression and
>>> thinking that it was caused by their code change, just skip the test on
>>> N1.
>>>
>>> It can be caused by any unrelated change that shifts the loop into an
>>> unfortunate position in the Perf binary which is almost impossible to
>>> debug as the root cause of the test failure. Ultimately it's caused by
>>> the referenced errata.
>>>
>>> Fixes: 60abedb8aa90 ("perf test: Introduce script for data symbol testing")
>>> Signed-off-by: James Clark <[email protected]>
>>
>> This change makes me sad :-( Is there no hope of aligning the loop? We
>> have little enough testing coverage for memory events and even precise
>> events on ARM that anything take away testing coverage feels like we
>> should try to do better.
>
> Can we just add some noise in the loop?
>
> Thanks,
> Namhyung
>

Yes that would probably work. I decided to skip rather than touch the
test because I didn't want the errata on one specific core to affect
testing everywhere else.

But if we don't think that it's too hacky to include that in the test
then I can add it. To be honest maybe it makes the test more "realistic"
because a very short infinite loop doesn't really represent a real workload.

>
> diff --git a/tools/perf/tests/workloads/datasym.c
> b/tools/perf/tests/workloads/datasym.c
> index ddd40bc63448..e2514bf393cd 100644
> --- a/tools/perf/tests/workloads/datasym.c
> +++ b/tools/perf/tests/workloads/datasym.c
> @@ -16,6 +16,8 @@ static int datasym(int argc __maybe_unused, const
> char **argv __maybe_unused)
> {
> for (;;) {
> buf1.data1++;
> + if ((buf1.data1 % 100129) == 0)
> + buf1.data1++;
> buf1.data2 += buf1.data1;
> }
> return 0;
>

2024-04-10 09:08:51

by James Clark

[permalink] [raw]

Subject: Re: [PATCH 3/3] perf tests: Skip "test data symbol" on Neoverse N1

On 09/04/2024 16:39, Ian Rogers wrote:
> On Tue, Apr 9, 2024 at 1:48 AM James Clark <[email protected]> wrote:
>>
>> To prevent anyone from seeing a test failure appear as a regression and
>> thinking that it was caused by their code change, just skip the test on
>> N1.
>>
>> It can be caused by any unrelated change that shifts the loop into an
>> unfortunate position in the Perf binary which is almost impossible to
>> debug as the root cause of the test failure. Ultimately it's caused by
>> the referenced errata.
>>
>> Fixes: 60abedb8aa90 ("perf test: Introduce script for data symbol testing")
>> Signed-off-by: James Clark <[email protected]>
>
> This change makes me sad :-( Is there no hope of aligning the loop? We
> have little enough testing coverage for memory events and even precise
> events on ARM that anything take away testing coverage feels like we
> should try to do better.
>
> Which models are we losing coverage for, presumably neoverse-n1 but
> what about neoverse-v1 and neoverse-n2-v2?
>
> If aligning the loop doesn't work, could we use objdump and check its
> alignment skipping when it is off? Or run the test and turn fails to
> skip on neoverse-n1 - so we get some coverage testing.
>
> It would also be nice if the change didn't add a dependency on lscpu
> for the sake of parsing /proc/cpuinfo, I see another arm test already
> did this test_arm_callgraph_fp.sh - that case looks like it should be
> using uname.
>

I'll make the change to add the noise to the loop, which will drop this
lscpu addition. And I'll fix up test_arm_callgraph_fp.sh while I'm at it.

> Thanks,
> Ian
>
>> ---
>> tools/perf/tests/shell/test_data_symbol.sh | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/tools/perf/tests/shell/test_data_symbol.sh b/tools/perf/tests/shell/test_data_symbol.sh
>> index 3dfa91832aa8..ffc641d00aa4 100755
>> --- a/tools/perf/tests/shell/test_data_symbol.sh
>> +++ b/tools/perf/tests/shell/test_data_symbol.sh
>> @@ -16,6 +16,12 @@ skip_if_no_mem_event() {
>> return 2
>> }
>>
>> +# Skip on Arm N1 due to errata 1694299. Bias exists in SPE sampling
>> +# which can cause the load and store instructions to be skipped
>> +# entirely. This comes and goes randomly depending on the offset the
>> +# linker places the datasym loop at in the Perf binary.
>> +lscpu | grep -q "Neoverse-N1" && exit 2
>> +
>> skip_if_no_mem_event || exit 2
>>
>> skip_test_missing_symbol buf1
>> --
>> 2.34.1
>>