2021-12-16 15:15:18

by Thomas Richter

[permalink] [raw]
Subject: [PATCH] perf test: Test 73 Sig_trap fails on s390

In Linux next kernel
Commit 5504f67944484 ("perf test sigtrap: Add basic stress test for sigtrap handling")
introduced the new test which uses breakpoint events.
These events are not supported on s390 and PowerPC and always fail:

# perf test -F 73
73: Sigtrap : FAILED!
#

Fix it the same way as in the breakpoint tests in file
tests/bp_account.c where these type of tests are skipped on
s390 and PowerPC platforms.

With this patch skip this test on both platforms.

Output after:
# ./perf test -F 73
73: Sigtrap

Fixes: 5504f67944484 ("perf test sigtrap: Add basic stress test for sigtrap handling")

Cc: Marco Elver <[email protected]>
Signed-off-by: Thomas Richter <[email protected]>
---
tools/perf/tests/sigtrap.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

diff --git a/tools/perf/tests/sigtrap.c b/tools/perf/tests/sigtrap.c
index 1004bf0e7cc9..1f147fe6595f 100644
--- a/tools/perf/tests/sigtrap.c
+++ b/tools/perf/tests/sigtrap.c
@@ -22,6 +22,19 @@
#include "tests.h"
#include "../perf-sys.h"

+/*
+ * PowerPC and S390 do not support creation of instruction breakpoints using the
+ * perf_event interface.
+ *
+ * Just disable the test for these architectures until these issues are
+ * resolved.
+ */
+#if defined(__powerpc__) || defined(__s390x__)
+#define BP_ACCOUNT_IS_SUPPORTED 0
+#else
+#define BP_ACCOUNT_IS_SUPPORTED 1
+#endif
+
#define NUM_THREADS 5

static struct {
@@ -122,6 +135,11 @@ static int test__sigtrap(struct test_suite *test __maybe_unused, int subtest __m
char sbuf[STRERR_BUFSIZE];
int i, fd, ret = TEST_FAIL;

+ if (!BP_ACCOUNT_IS_SUPPORTED) {
+ pr_debug("Test not supported on this architecture");
+ return TEST_SKIP;
+ }
+
pthread_barrier_init(&barrier, NULL, NUM_THREADS + 1);

action.sa_flags = SA_SIGINFO | SA_NODEFER;
--
2.33.1



2021-12-16 15:48:29

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH] perf test: Test 73 Sig_trap fails on s390

On Thu, 16 Dec 2021 at 16:15, Thomas Richter <[email protected]> wrote:
>
> In Linux next kernel
> Commit 5504f67944484 ("perf test sigtrap: Add basic stress test for sigtrap handling")
> introduced the new test which uses breakpoint events.
> These events are not supported on s390 and PowerPC and always fail:
>
> # perf test -F 73
> 73: Sigtrap : FAILED!
> #
>
> Fix it the same way as in the breakpoint tests in file
> tests/bp_account.c where these type of tests are skipped on
> s390 and PowerPC platforms.
>
> With this patch skip this test on both platforms.
>
> Output after:
> # ./perf test -F 73
> 73: Sigtrap
>
> Fixes: 5504f67944484 ("perf test sigtrap: Add basic stress test for sigtrap handling")
>
> Cc: Marco Elver <[email protected]>
> Signed-off-by: Thomas Richter <[email protected]>

Acked-by: Marco Elver <[email protected]>

Thanks, and sorry for missing this case!

> ---
> tools/perf/tests/sigtrap.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/tools/perf/tests/sigtrap.c b/tools/perf/tests/sigtrap.c
> index 1004bf0e7cc9..1f147fe6595f 100644
> --- a/tools/perf/tests/sigtrap.c
> +++ b/tools/perf/tests/sigtrap.c
> @@ -22,6 +22,19 @@
> #include "tests.h"
> #include "../perf-sys.h"
>
> +/*
> + * PowerPC and S390 do not support creation of instruction breakpoints using the
> + * perf_event interface.
> + *
> + * Just disable the test for these architectures until these issues are
> + * resolved.
> + */
> +#if defined(__powerpc__) || defined(__s390x__)
> +#define BP_ACCOUNT_IS_SUPPORTED 0
> +#else
> +#define BP_ACCOUNT_IS_SUPPORTED 1
> +#endif
> +
> #define NUM_THREADS 5
>
> static struct {
> @@ -122,6 +135,11 @@ static int test__sigtrap(struct test_suite *test __maybe_unused, int subtest __m
> char sbuf[STRERR_BUFSIZE];
> int i, fd, ret = TEST_FAIL;
>
> + if (!BP_ACCOUNT_IS_SUPPORTED) {
> + pr_debug("Test not supported on this architecture");
> + return TEST_SKIP;
> + }
> +
> pthread_barrier_init(&barrier, NULL, NUM_THREADS + 1);
>
> action.sa_flags = SA_SIGINFO | SA_NODEFER;
> --
> 2.33.1
>

2022-01-18 02:35:53

by John Garry

[permalink] [raw]
Subject: Re: [PATCH] perf test: Test 73 Sig_trap fails on s390

On 16/12/2021 15:48, Marco Elver wrote:

+

> On Thu, 16 Dec 2021 at 16:15, Thomas Richter<[email protected]> wrote:
>> In Linux next kernel
>> Commit 5504f67944484 ("perf test sigtrap: Add basic stress test for sigtrap handling")
>> introduced the new test which uses breakpoint events.
>> These events are not supported on s390 and PowerPC and always fail:
>>
>> # perf test -F 73
>> 73: Sigtrap : FAILED!
>> #
>>
>> Fix it the same way as in the breakpoint tests in file
>> tests/bp_account.c where these type of tests are skipped on
>> s390 and PowerPC platforms.
>>
>> With this patch skip this test on both platforms.
>>
>> Output after:
>> # ./perf test -F 73
>> 73: Sigtrap
>>
>> Fixes: 5504f67944484 ("perf test sigtrap: Add basic stress test for sigtrap handling")
>>
>> Cc: Marco Elver<[email protected]>
>> Signed-off-by: Thomas Richter<[email protected]>
> Acked-by: Marco Elver<[email protected]>
>
> Thanks, and sorry for missing this case!
>

I am finding that this test hangs on my arm64 machine:

john@debian:~/kernel-dev2/tools/perf$ sudo ./perf test -vvv 73
73: Sigtrap:
--- start ---
test child forked, pid 45193

And fails on my x86 broadwell machine:

john@localhost:~/kernel-dev2/tools/perf> sudo ./perf test -v 73
73: Sigtrap :
--- start ---
test child forked, pid 22255
FAILED sys_perf_event_open(): Argument list too long
test child finished with -1
---- end ----
Sigtrap: FAILED!
john@localhost:~/kernel-dev2/tools/perf>

Are there more architectures+platforms for which we need to skip this test?

Thanks,
John

2022-01-20 00:27:06

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH] perf test: Test 73 Sig_trap fails on s390

On Mon, Jan 17, 2022 at 03:39:10PM +0000, John Garry wrote:
> On 16/12/2021 15:48, Marco Elver wrote:
>
> +
>
> > On Thu, 16 Dec 2021 at 16:15, Thomas Richter<[email protected]> wrote:
> > > In Linux next kernel
> > > Commit 5504f67944484 ("perf test sigtrap: Add basic stress test for sigtrap handling")
> > > introduced the new test which uses breakpoint events.
> > > These events are not supported on s390 and PowerPC and always fail:
> > >
> > > # perf test -F 73
> > > 73: Sigtrap : FAILED!
> > > #
> > >
> > > Fix it the same way as in the breakpoint tests in file
> > > tests/bp_account.c where these type of tests are skipped on
> > > s390 and PowerPC platforms.
> > >
> > > With this patch skip this test on both platforms.
> > >
> > > Output after:
> > > # ./perf test -F 73
> > > 73: Sigtrap
> > >
> > > Fixes: 5504f67944484 ("perf test sigtrap: Add basic stress test for sigtrap handling")
> > >
> > > Cc: Marco Elver<[email protected]>
> > > Signed-off-by: Thomas Richter<[email protected]>
> > Acked-by: Marco Elver<[email protected]>
> >
> > Thanks, and sorry for missing this case!
> >
>
> I am finding that this test hangs on my arm64 machine:
>
> john@debian:~/kernel-dev2/tools/perf$ sudo ./perf test -vvv 73
> 73: Sigtrap:
> --- start ---
> test child forked, pid 45193

Both Arm and Arm64 platforms cannot support signal handler with
breakpoint, please see the details in [1]. So I think we need
something like below:

static int test__sigtrap(struct test_suite *test __maybe_unused, int subtest __maybe_unused)
{
...

if (!BP_SIGNAL_IS_SUPPORTED) {
pr_debug("Test not supported on this architecture");
return TEST_SKIP;
}

...
}

Since we have defined BP_SIGNAL_IS_SUPPORTED, I think we can reuse it at
here.

[1] https://lore.kernel.org/lkml/157169993406.29376.12473771029179755767.tip-bot2@tip-bot2/

> And fails on my x86 broadwell machine:
>
> john@localhost:~/kernel-dev2/tools/perf> sudo ./perf test -v 73
> 73: Sigtrap :
> --- start ---
> test child forked, pid 22255
> FAILED sys_perf_event_open(): Argument list too long
> test child finished with -1
> ---- end ----
> Sigtrap: FAILED!
> john@localhost:~/kernel-dev2/tools/perf>

It is a bit suprise for the failure on x86, as I remembered x86 platform
can support signal handler with hw breakpoint. And from the error
"Argument list too long", it should be a different issue from other
archs.

Thanks,
Leo

2022-01-20 06:15:08

by John Garry

[permalink] [raw]
Subject: Re: [PATCH] perf test: Test 73 Sig_trap fails on s390

Hi Leo,

>> test child forked, pid 45193
> Both Arm and Arm64 platforms cannot support signal handler with
> breakpoint, please see the details in [1].

Thanks for the info.

>So I think we need
> something like below:
>

ok

> static int test__sigtrap(struct test_suite *test __maybe_unused, int subtest __maybe_unused)
> {
> ...
>
> if (!BP_SIGNAL_IS_SUPPORTED) {
> pr_debug("Test not supported on this architecture");
> return TEST_SKIP;
> }
>
> ...
> }
>
> Since we have defined BP_SIGNAL_IS_SUPPORTED, I think we can reuse it at
> here.


Do you know any other architectures which would have this issue? Or a
generic way to check for support?

It's better to not have to add to this list arch-by-arch..

>
> [1]https://lore.kernel.org/lkml/157169993406.29376.12473771029179755767.tip-bot2@tip-bot2/
>
>> And fails on my x86 broadwell machine:
>>
>> john@localhost:~/kernel-dev2/tools/perf> sudo ./perf test -v 73
>> 73: Sigtrap :
>> --- start ---
>> test child forked, pid 22255
>> FAILED sys_perf_event_open(): Argument list too long
>> test child finished with -1
>> ---- end ----
>> Sigtrap: FAILED!
>> john@localhost:~/kernel-dev2/tools/perf>
> It is a bit suprise for the failure on x86, as I remembered x86 platform
> can support signal handler with hw breakpoint. And from the error
> "Argument list too long", it should be a different issue from other
> archs.

Yeah, I don't know what's going on here.

Thanks,
John

2022-01-20 07:42:02

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH] perf test: Test 73 Sig_trap fails on s390

Hi John,

On Tue, Jan 18, 2022 at 10:20:37AM +0000, John Garry wrote:

[...]

> > static int test__sigtrap(struct test_suite *test __maybe_unused, int subtest __maybe_unused)
> > {
> > ...
> >
> > if (!BP_SIGNAL_IS_SUPPORTED) {
> > pr_debug("Test not supported on this architecture");
> > return TEST_SKIP;
> > }
> >
> > ...
> > }
> >
> > Since we have defined BP_SIGNAL_IS_SUPPORTED, I think we can reuse it at
> > here.
>
>
> Do you know any other architectures which would have this issue? Or a
> generic way to check for support?
>
> It's better to not have to add to this list arch-by-arch..

Yeah, it's ugly to add archs one by one. But I don't find an ABI can
be used to make decision if an arch supports signal handler for
breakpoint. Usually, it's architecture specific operations for signal
handling, see the code [1]; simply to say, architecture needs to disable
single step when call signal handler and restore single step after
return from signal handler.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/signal.c#n830

> > > And fails on my x86 broadwell machine:
> > >
> > > john@localhost:~/kernel-dev2/tools/perf> sudo ./perf test -v 73
> > > 73: Sigtrap :
> > > --- start ---
> > > test child forked, pid 22255
> > > FAILED sys_perf_event_open(): Argument list too long
> > > test child finished with -1
> > > ---- end ----
> > > Sigtrap: FAILED!
> > > john@localhost:~/kernel-dev2/tools/perf>
> > It is a bit suprise for the failure on x86, as I remembered x86 platform
> > can support signal handler with hw breakpoint. And from the error
> > "Argument list too long", it should be a different issue from other
> > archs.
>
> Yeah, I don't know what's going on here.

Seems to there have incompatible issue.
Maybe you could cleanup with "make clean" and then rebuild perf.

Thanks,
Leo

2022-01-20 08:34:39

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH] perf test: Test 73 Sig_trap fails on s390

On Tue, 18 Jan 2022 at 10:18, Leo Yan <[email protected]> wrote:
>
> On Mon, Jan 17, 2022 at 03:39:10PM +0000, John Garry wrote:
> > On 16/12/2021 15:48, Marco Elver wrote:
> >
> > +
> >
> > > On Thu, 16 Dec 2021 at 16:15, Thomas Richter<[email protected]> wrote:
> > > > In Linux next kernel
> > > > Commit 5504f67944484 ("perf test sigtrap: Add basic stress test for sigtrap handling")
> > > > introduced the new test which uses breakpoint events.
> > > > These events are not supported on s390 and PowerPC and always fail:
> > > >
> > > > # perf test -F 73
> > > > 73: Sigtrap : FAILED!
> > > > #
> > > >
> > > > Fix it the same way as in the breakpoint tests in file
> > > > tests/bp_account.c where these type of tests are skipped on
> > > > s390 and PowerPC platforms.
> > > >
> > > > With this patch skip this test on both platforms.
> > > >
> > > > Output after:
> > > > # ./perf test -F 73
> > > > 73: Sigtrap
> > > >
> > > > Fixes: 5504f67944484 ("perf test sigtrap: Add basic stress test for sigtrap handling")
> > > >
> > > > Cc: Marco Elver<[email protected]>
> > > > Signed-off-by: Thomas Richter<[email protected]>
> > > Acked-by: Marco Elver<[email protected]>
> > >
> > > Thanks, and sorry for missing this case!
> > >
> >
> > I am finding that this test hangs on my arm64 machine:
> >
> > john@debian:~/kernel-dev2/tools/perf$ sudo ./perf test -vvv 73
> > 73: Sigtrap:
> > --- start ---
> > test child forked, pid 45193
>
> Both Arm and Arm64 platforms cannot support signal handler with
> breakpoint, please see the details in [1]. So I think we need
> something like below:
>
> static int test__sigtrap(struct test_suite *test __maybe_unused, int subtest __maybe_unused)
> {
> ...
>
> if (!BP_SIGNAL_IS_SUPPORTED) {
> pr_debug("Test not supported on this architecture");
> return TEST_SKIP;
> }
>
> ...
> }
>
> Since we have defined BP_SIGNAL_IS_SUPPORTED, I think we can reuse it at
> here.
>
> [1] https://lore.kernel.org/lkml/157169993406.29376.12473771029179755767.tip-bot2@tip-bot2/

Does this limitation also exist for address watchpoints? The sigtrap
test does not make use of instruction breakpoints, but instead just
sets up a watchpoint on access to a data address.

Thanks,
-- Marco

2022-01-20 09:10:22

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH] perf test: Test 73 Sig_trap fails on s390

On Tue, Jan 18, 2022 at 12:40:04PM +0100, Marco Elver wrote:

[...]

> > Both Arm and Arm64 platforms cannot support signal handler with
> > breakpoint, please see the details in [1]. So I think we need
> > something like below:
> >
> > static int test__sigtrap(struct test_suite *test __maybe_unused, int subtest __maybe_unused)
> > {
> > ...
> >
> > if (!BP_SIGNAL_IS_SUPPORTED) {
> > pr_debug("Test not supported on this architecture");
> > return TEST_SKIP;
> > }
> >
> > ...
> > }
> >
> > Since we have defined BP_SIGNAL_IS_SUPPORTED, I think we can reuse it at
> > here.
> >
> > [1] https://lore.kernel.org/lkml/157169993406.29376.12473771029179755767.tip-bot2@tip-bot2/
>
> Does this limitation also exist for address watchpoints? The sigtrap
> test does not make use of instruction breakpoints, but instead just
> sets up a watchpoint on access to a data address.

Yes, after reading the code, the flow for either instrution breakpoint
or watchpoint both use the single step [1], thus the signal handler will
take the single step execution and lead to the infinite loop.

I am not the best person to answer this question; @Will, could you
confirm for this? Thanks!

Leo

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/kernel/hw_breakpoint.c

2022-01-24 18:57:17

by John Garry

[permalink] [raw]
Subject: Test 73 Sig_trap fails on arm64 (was Re: [PATCH] perf test: Test 73 Sig_trap fails on s390)

On 18/01/2022 12:43, Leo Yan wrote:

Hi Will,

Can you kindly check below the question from Leo on this issue?

You were cc'ed earlier in this thread so should be able to find more
context, if needed.

Cheers,
John

> On Tue, Jan 18, 2022 at 12:40:04PM +0100, Marco Elver wrote:
>
> [...]
>
>>> Both Arm and Arm64 platforms cannot support signal handler with
>>> breakpoint, please see the details in [1]. So I think we need
>>> something like below:
>>>
>>> static int test__sigtrap(struct test_suite *test __maybe_unused, int subtest __maybe_unused)
>>> {
>>> ...
>>>
>>> if (!BP_SIGNAL_IS_SUPPORTED) {
>>> pr_debug("Test not supported on this architecture");
>>> return TEST_SKIP;
>>> }
>>>
>>> ...
>>> }
>>>
>>> Since we have defined BP_SIGNAL_IS_SUPPORTED, I think we can reuse it at
>>> here.
>>>
>>> [1]https://lore.kernel.org/lkml/157169993406.29376.12473771029179755767.tip-bot2@tip-bot2/
>> Does this limitation also exist for address watchpoints? The sigtrap
>> test does not make use of instruction breakpoints, but instead just
>> sets up a watchpoint on access to a data address.
> Yes, after reading the code, the flow for either instrution breakpoint
> or watchpoint both use the single step [1], thus the signal handler will
> take the single step execution and lead to the infinite loop.
>
> I am not the best person to answer this question; @Will, could you
> confirm for this? Thanks!
>
> Leo
>
> [1]https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/kernel/hw_breakpoint.c

2022-02-01 20:49:12

by Dmitry Vyukov

[permalink] [raw]
Subject: Test 73 Sig_trap fails on arm64

> On 18/01/2022 12:43, Leo Yan wrote:
>
> Hi Will,
>
> Can you kindly check below the question from Leo on this issue?
>
> You were cc'ed earlier in this thread so should be able to find more
> context, if needed.

Hi Will, John,

I wonder if PSTATE.D flag can be used to resolve this
(similar to x86's use of EFLAGS.RF)?
I naively tried to do:

void OnSigtrap(int sig, siginfo_t* info, void* uctx) {
auto& mctx = static_cast<ucontext_t*>(uctx)->uc_mcontext;
mctx.pstate |= PSR_D_BIT;
}

But then I got a SIGSEGV from kernel.
But I wasn't able to track yet what part of the kernel did
not like setting of D bit.


> Cheers,
> John
>
> > On Tue, Jan 18, 2022 at 12:40:04PM +0100, Marco Elver wrote:
> >
> > [...]
> >
> >>> Both Arm and Arm64 platforms cannot support signal handler with
> >>> breakpoint, please see the details in [1].  So I think we need
> >>> something like below:
> >>>
> >>> static int test__sigtrap(struct test_suite *test __maybe_unused, int subtest __maybe_unused)
> >>> {
> >>>          ...
> >>>
> >>>          if (!BP_SIGNAL_IS_SUPPORTED) {
> >>>                  pr_debug("Test not supported on this architecture");
> >>>                  return TEST_SKIP;
> >>>          }
> >>>
> >>>          ...
> >>> }
> >>>
> >>> Since we have defined BP_SIGNAL_IS_SUPPORTED, I think we can reuse it at
> >>> here.
> >>>
> >>> [1]https://lore.kernel.org/lkml/157169993406.29376.12473771029179755767.tip-bot2@tip-bot2/
> >> Does this limitation also exist for address watchpoints? The sigtrap
> >> test does not make use of instruction breakpoints, but instead just
> >> sets up a watchpoint on access to a data address.
> > Yes, after reading the code, the flow for either instrution breakpoint
> > or watchpoint both use the single step [1], thus the signal handler will
> > take the single step execution and lead to the infinite loop.
> >
> > I am not the best person to answer this question; @Will, could you
> > confirm for this?  Thanks!
> >
> > Leo
> >
> > [1]https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/kernel/hw_breakpoint.c

2022-02-15 13:30:34

by John Garry

[permalink] [raw]
Subject: Re: Test 73 Sig_trap fails on arm64 (was Re: [PATCH] perf test: Test 73 Sig_trap fails on s390)

On 24/01/2022 09:19, John Garry wrote:

Hi Will,

Have you had a chance to check this or the mail from Dmitry?

https://lore.kernel.org/linux-perf-users/CACT4Y+YVyJcqbR5j2fsSQ+C5hy78X+aobrUHaZKghFf0_NMv=A@mail.gmail.com/

If we can't make progress then we just need to skip the test on arm64
for now.

Thanks,
John

>
>> On Tue, Jan 18, 2022 at 12:40:04PM +0100, Marco Elver wrote:
>>
>> [...]
>>
>>>> Both Arm and Arm64 platforms cannot support signal handler with
>>>> breakpoint, please see the details in [1].  So I think we need
>>>> something like below:
>>>>
>>>> static int test__sigtrap(struct test_suite *test __maybe_unused, int
>>>> subtest __maybe_unused)
>>>> {
>>>>          ...
>>>>
>>>>          if (!BP_SIGNAL_IS_SUPPORTED) {
>>>>                  pr_debug("Test not supported on this architecture");
>>>>                  return TEST_SKIP;
>>>>          }
>>>>
>>>>          ...
>>>> }
>>>>
>>>> Since we have defined BP_SIGNAL_IS_SUPPORTED, I think we can reuse
>>>> it at
>>>> here.
>>>>
>>>> [1]https://lore.kernel.org/lkml/157169993406.29376.12473771029179755767.tip-bot2@tip-bot2/
>>>>
>>> Does this limitation also exist for address watchpoints? The sigtrap
>>> test does not make use of instruction breakpoints, but instead just
>>> sets up a watchpoint on access to a data address.
>> Yes, after reading the code, the flow for either instrution breakpoint
>> or watchpoint both use the single step [1], thus the signal handler will
>> take the single step execution and lead to the infinite loop.
>>
>> I am not the best person to answer this question; @Will, could you
>> confirm for this?  Thanks!
>>
>> Leo
>>
>> [1]https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/kernel/hw_breakpoint.c
>>
>
> .

2022-02-15 15:32:18

by Will Deacon

[permalink] [raw]
Subject: Re: Test 73 Sig_trap fails on arm64 (was Re: [PATCH] perf test: Test 73 Sig_trap fails on s390)

On Tue, Feb 15, 2022 at 11:16:16AM +0000, John Garry wrote:
> On 24/01/2022 09:19, John Garry wrote:
>
> Hi Will,
>
> Have you had a chance to check this or the mail from Dmitry?
>
> https://lore.kernel.org/linux-perf-users/CACT4Y+YVyJcqbR5j2fsSQ+C5hy78X+aobrUHaZKghFf0_NMv=A@mail.gmail.com/
>
> If we can't make progress then we just need to skip the test on arm64 for
> now.

Sorry, I haven't had time to look at this (or the thousands of other mails
in my inbox) lately.

I don't recall all of the details, but basically hw_breakpoint really
doesn't work well on arm/arm64 -- the sticking points are around handling
the stepping and whether to step into or over exceptions. Sadly, our ptrace
interface (which is what is used by GDB) is built on top of hw_breakpoint,
so we can't just rip it out and any significant changes are pretty risky.

What I would like to happen is that we rework our debug exception handling
as outlined by [1] so that kernel debug is better defined and the ptrace
interface can interact directly with the debug architecture instead of being
funnelled through hw_breakpoint. Once we have that, I think we could try to
improve hw_breakpoint much more comfortably (or at least defeature it
considerably without having to worry about breaking GDB). I started this a
couple of years ago, but I haven't found time to get back to it for ages.

Anyway, to this specific test...

When we hit a break/watchpoint the faulting PC points at the instruction
which faulted and the exception is reported before the instruction has had
any other side-effects (e.g. if a watchpoint triggers on a store, then
memory will not have been updated when the watchpoint handler runs), so if
we were to return as usual after reporting the exception to perf then we
would just hit the same break/watchpoint again and we'd get stuck. GDB
handles stepping over the faulting instruction, but for perf (and assumedly
these tests), the kernel is expected to handle the step. This handling
amounts to disabling the break/watchpoint which we think we hit and then
attempting a hardware single-step. During the step we could run into more
break/watchpoints on the same instruction, so we'll keep disabling things
until we eventually manage to complete the step, which is signalled by a
specific type of debug exception. At this point, we re-enable the
break/watchpoints and we're good.

Signals make this messy, as the step logic will step _into_ the signal
handler -- we have to do this, otherwise we would miss break/watchpoints
triggered by the signal handler if we had disabled them for the step.
However, it means that when we return back from the signal handler we will
run back into the break/watchpoint which we initially stepped over. When
perf uses SIGTRAP to notify userspace that we hit a break/watchpoint,
then we'll get stuck because we'll step into the handler every time.

Hopefully that clears things up a bit. Ideally, the kernel wouldn't
pretend to handle this stepping at all for arm64 as it adds a bunch of
complexity, overhead to our context-switch and I don't think the current
behaviour is particularly useful.

Will

[1] https://lore.kernel.org/all/20200626095551.GA9312@willie-the-truck/

2022-02-16 11:56:43

by John Garry

[permalink] [raw]
Subject: Re: Test 73 Sig_trap fails on arm64 (was Re: [PATCH] perf test: Test 73 Sig_trap fails on s390)

Hi Will,

> Sorry, I haven't had time to look at this (or the thousands of other mails
> in my inbox) lately.
>

Thanks

> I don't recall all of the details, but basically hw_breakpoint really
> doesn't work well on arm/arm64 -- the sticking points are around handling
> the stepping and whether to step into or over exceptions. Sadly, our ptrace
> interface (which is what is used by GDB) is built on top of hw_breakpoint,
> so we can't just rip it out and any significant changes are pretty risky.
>
> What I would like to happen is that we rework our debug exception handling
> as outlined by [1] so that kernel debug is better defined and the ptrace
> interface can interact directly with the debug architecture instead of being
> funnelled through hw_breakpoint. Once we have that, I think we could try to
> improve hw_breakpoint much more comfortably (or at least defeature it
> considerably without having to worry about breaking GDB). I started this a
> couple of years ago, but I haven't found time to get back to it for ages.
>
> Anyway, to this specific test...
>
> When we hit a break/watchpoint the faulting PC points at the instruction
> which faulted and the exception is reported before the instruction has had
> any other side-effects (e.g. if a watchpoint triggers on a store, then
> memory will not have been updated when the watchpoint handler runs), so if
> we were to return as usual after reporting the exception to perf then we
> would just hit the same break/watchpoint again and we'd get stuck. GDB
> handles stepping over the faulting instruction, but for perf (and assumedly
> these tests), the kernel is expected to handle the step. This handling
> amounts to disabling the break/watchpoint which we think we hit and then
> attempting a hardware single-step. During the step we could run into more
> break/watchpoints on the same instruction, so we'll keep disabling things
> until we eventually manage to complete the step, which is signalled by a
> specific type of debug exception. At this point, we re-enable the
> break/watchpoints and we're good.
>
> Signals make this messy, as the step logic will step_into_ the signal
> handler -- we have to do this, otherwise we would miss break/watchpoints
> triggered by the signal handler if we had disabled them for the step.
> However, it means that when we return back from the signal handler we will
> run back into the break/watchpoint which we initially stepped over. When
> perf uses SIGTRAP to notify userspace that we hit a break/watchpoint,
> then we'll get stuck because we'll step into the handler every time.
>
> Hopefully that clears things up a bit. Ideally, the kernel wouldn't
> pretend to handle this stepping at all for arm64 as it adds a bunch of
> complexity, overhead to our context-switch and I don't think the current
> behaviour is particularly useful.
>

Right, so what I am hearing altogether is that for now we should just
skip this test.

And since the kernel does not seem to advertise this capability we need
to disable for specific architectures.

Thanks,
John

> [1]https://lore.kernel.org/all/20200626095551.GA9312@willie-the-truck/
> .

2022-02-16 12:10:01

by Dmitry Vyukov

[permalink] [raw]
Subject: Re: Test 73 Sig_trap fails on arm64 (was Re: [PATCH] perf test: Test 73 Sig_trap fails on s390)

On Wed, 16 Feb 2022 at 12:47, John Garry <[email protected]> wrote:
>
> Hi Will,
>
> > Sorry, I haven't had time to look at this (or the thousands of other mails
> > in my inbox) lately.
> >
>
> Thanks
>
> > I don't recall all of the details, but basically hw_breakpoint really
> > doesn't work well on arm/arm64 -- the sticking points are around handling
> > the stepping and whether to step into or over exceptions. Sadly, our ptrace
> > interface (which is what is used by GDB) is built on top of hw_breakpoint,
> > so we can't just rip it out and any significant changes are pretty risky.
> >
> > What I would like to happen is that we rework our debug exception handling
> > as outlined by [1] so that kernel debug is better defined and the ptrace
> > interface can interact directly with the debug architecture instead of being
> > funnelled through hw_breakpoint. Once we have that, I think we could try to
> > improve hw_breakpoint much more comfortably (or at least defeature it
> > considerably without having to worry about breaking GDB). I started this a
> > couple of years ago, but I haven't found time to get back to it for ages.
> >
> > Anyway, to this specific test...
> >
> > When we hit a break/watchpoint the faulting PC points at the instruction
> > which faulted and the exception is reported before the instruction has had
> > any other side-effects (e.g. if a watchpoint triggers on a store, then
> > memory will not have been updated when the watchpoint handler runs), so if
> > we were to return as usual after reporting the exception to perf then we
> > would just hit the same break/watchpoint again and we'd get stuck. GDB
> > handles stepping over the faulting instruction, but for perf (and assumedly
> > these tests), the kernel is expected to handle the step. This handling
> > amounts to disabling the break/watchpoint which we think we hit and then
> > attempting a hardware single-step. During the step we could run into more
> > break/watchpoints on the same instruction, so we'll keep disabling things
> > until we eventually manage to complete the step, which is signalled by a
> > specific type of debug exception. At this point, we re-enable the
> > break/watchpoints and we're good.
> >
> > Signals make this messy, as the step logic will step_into_ the signal
> > handler -- we have to do this, otherwise we would miss break/watchpoints
> > triggered by the signal handler if we had disabled them for the step.
> > However, it means that when we return back from the signal handler we will
> > run back into the break/watchpoint which we initially stepped over. When
> > perf uses SIGTRAP to notify userspace that we hit a break/watchpoint,
> > then we'll get stuck because we'll step into the handler every time.
> >
> > Hopefully that clears things up a bit. Ideally, the kernel wouldn't
> > pretend to handle this stepping at all for arm64 as it adds a bunch of
> > complexity, overhead to our context-switch and I don't think the current
> > behaviour is particularly useful.
> >
>
> Right, so what I am hearing altogether is that for now we should just
> skip this test.
>
> And since the kernel does not seem to advertise this capability we need
> to disable for specific architectures.

It does and fwiw I am just trying to use it. Things work only on x86 so far.

2022-02-16 13:26:52

by Leo Yan

[permalink] [raw]
Subject: Re: Test 73 Sig_trap fails on arm64 (was Re: [PATCH] perf test: Test 73 Sig_trap fails on s390)

On Wed, Feb 16, 2022 at 12:54:16PM +0100, Dmitry Vyukov wrote:
> On Wed, 16 Feb 2022 at 12:47, John Garry <[email protected]> wrote:

[...]

> > > Signals make this messy, as the step logic will step_into_ the signal
> > > handler -- we have to do this, otherwise we would miss break/watchpoints
> > > triggered by the signal handler if we had disabled them for the step.
> > > However, it means that when we return back from the signal handler we will
> > > run back into the break/watchpoint which we initially stepped over. When
> > > perf uses SIGTRAP to notify userspace that we hit a break/watchpoint,
> > > then we'll get stuck because we'll step into the handler every time.
> > >
> > > Hopefully that clears things up a bit. Ideally, the kernel wouldn't
> > > pretend to handle this stepping at all for arm64 as it adds a bunch of
> > > complexity, overhead to our context-switch and I don't think the current
> > > behaviour is particularly useful.
> > >
> >
> > Right, so what I am hearing altogether is that for now we should just
> > skip this test.
> >
> > And since the kernel does not seem to advertise this capability we need
> > to disable for specific architectures.
>
> It does and fwiw I am just trying to use it. Things work only on x86 so far.

So here we have agreement to disable the cases for Arm64/Arm.

John, since you put much efforts to follow up the issue, I'd like to
leave decision to you if you want to proceed for patches? Otherwise,
I will send patches to disable cases in perf.

Thanks,
Leo

2022-02-17 11:12:09

by Leo Yan

[permalink] [raw]
Subject: Re: Test 73 Sig_trap fails on arm64 (was Re: [PATCH] perf test: Test 73 Sig_trap fails on s390)

On Thu, Feb 17, 2022 at 09:53:23AM +0000, John Garry wrote:
> On 16/02/2022 13:13, Leo Yan wrote:
> > > It does and fwiw I am just trying to use it. Things work only on x86 so far.
> > So here we have agreement to disable the cases for Arm64/Arm.
> >
> > John, since you put much efforts to follow up the issue, I'd like to
> > leave decision to you if you want to proceed for patches? Otherwise,
> > I will send patches to disable cases in perf.
>
> Hi Leo,
>
> I'll send later today.

Cool, thanks a lot!

Leo

2022-02-17 18:19:44

by John Garry

[permalink] [raw]
Subject: Re: Test 73 Sig_trap fails on arm64 (was Re: [PATCH] perf test: Test 73 Sig_trap fails on s390)

On 16/02/2022 13:13, Leo Yan wrote:
>> It does and fwiw I am just trying to use it. Things work only on x86 so far.
> So here we have agreement to disable the cases for Arm64/Arm.
>
> John, since you put much efforts to follow up the issue, I'd like to
> leave decision to you if you want to proceed for patches? Otherwise,
> I will send patches to disable cases in perf.

Hi Leo,

I'll send later today.

Thanks,
John