This patch-set reworks pending fixes from Wei's series [1] to make
single-step debugging via kgdb/kdb on arm64 work as expected. There was
a prior discussion on the ML [2] regarding whether we should keep interrupts
enabled during single-stepping. So patch #1 follows the suggestion from Will
[3] to not disable interrupts during single stepping but rather skip
single stepping within the interrupt handler.
[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/CAD=FV=Voyfq3Qz0T3RY+aYWYJ0utdH=P_AweB=13rcV8GDBeyQ@mail.gmail.com/
[3] https://lore.kernel.org/all/20200626095551.GA9312@willie-the-truck/
Changes in v5:
- Incorporated misc. comments from Mark.
Changes in v4:
- Rebased to the tip of mainline.
- Picked up Doug's Tested-by tag.
Changes in v3:
- Reword commit descriptions as per Daniel's suggestions.
Changes in v2:
- Replace patch #1 to rather follow Will's suggestion.
Sumit Garg (2):
arm64: entry: Skip single stepping into interrupt handlers
arm64: kgdb: Set PSTATE.SS to 1 to re-enable single-step
arch/arm64/include/asm/debug-monitors.h | 1 +
arch/arm64/kernel/debug-monitors.c | 5 +++++
arch/arm64/kernel/entry-common.c | 22 ++++++++++++++++++++--
arch/arm64/kernel/kgdb.c | 2 ++
4 files changed, 28 insertions(+), 2 deletions(-)
--
2.34.1
Currently, on systems where the timer interrupt (or any other
fast-at-human-scale periodic interrupt) is active, it is impossible
to step any code with interrupts unlocked because we will always end up
stepping into the timer interrupt instead of stepping the user code.
The common user's goal while single stepping is that, when they step,
the system will stop at PC+4, or at PC+I for a branch that gets taken,
relative to the instruction they are stepping. So, fix the broken single-step
implementation by skipping single stepping into interrupt handlers.
The methodology is: when we receive an interrupt from EL1, check if we
are single stepping (pstate.SS). If yes, save MDSCR_EL1.SS and
clear the register bit if it was set. Then unmask only D and leave I set.
On return from the interrupt, set D and restore MDSCR_EL1.SS. Along with
this, skip rescheduling if we were stepping.
Suggested-by: Will Deacon <[email protected]>
Signed-off-by: Sumit Garg <[email protected]>
Tested-by: Douglas Anderson <[email protected]>
---
arch/arm64/kernel/entry-common.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index cce1167199e3..688d1ef8e864 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -231,11 +231,15 @@ DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
 #define need_irq_preemption() (IS_ENABLED(CONFIG_PREEMPTION))
 #endif

-static void __sched arm64_preempt_schedule_irq(void)
+static void __sched arm64_preempt_schedule_irq(struct pt_regs *regs)
 {
 	if (!need_irq_preemption())
 		return;

+	/* Don't reschedule in case we are single stepping */
+	if (!(regs->pstate & DBG_SPSR_SS))
+		return;
+
 	/*
 	 * Note: thread_info::preempt_count includes both thread_info::count
 	 * and thread_info::need_resched, and is not equivalent to
@@ -471,19 +475,33 @@ static __always_inline void __el1_irq(struct pt_regs *regs,
 	do_interrupt_handler(regs, handler);
 	irq_exit_rcu();

-	arm64_preempt_schedule_irq();
+	arm64_preempt_schedule_irq(regs);

 	exit_to_kernel_mode(regs);
 }
+
 static void noinstr el1_interrupt(struct pt_regs *regs,
 				  void (*handler)(struct pt_regs *))
 {
+	unsigned long mdscr;
+
+	/* Disable single stepping within interrupt handler */
+	if (regs->pstate & DBG_SPSR_SS) {
+		mdscr = read_sysreg(mdscr_el1);
+		write_sysreg(mdscr & ~DBG_MDSCR_SS, mdscr_el1);
+	}
+
 	write_sysreg(DAIF_PROCCTX_NOIRQ, daif);
 	if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
 		__el1_pnmi(regs, handler);
 	else
 		__el1_irq(regs, handler);
+
+	if (regs->pstate & DBG_SPSR_SS) {
+		write_sysreg(DAIF_PROCCTX_NOIRQ | PSR_D_BIT, daif);
+		write_sysreg(mdscr, mdscr_el1);
+	}
 }
 asmlinkage void noinstr el1h_64_irq_handler(struct pt_regs *regs)
--
2.34.1
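For readers following the thread, the entry/exit sequence described in the
commit message (save and clear MDSCR_EL1.SS on entry when PSTATE.SS is set,
unmask D, then mask D and restore MDSCR_EL1.SS on return) can be modelled in
plain user-space C. This is only a sketch under assumptions: struct model_cpu,
model_handler() and model_el1_interrupt() are hypothetical names that do not
exist in the kernel, the MODEL_* bit values are illustrative, and the system
registers are replaced by ordinary struct fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values for this model only (assumption). */
#define MODEL_SPSR_SS  (UINT64_C(1) << 21) /* PSTATE.SS as saved on exception entry */
#define MODEL_MDSCR_SS (UINT64_C(1) << 0)  /* MDSCR_EL1.SS */

/* Hypothetical stand-in for the per-CPU debug state. */
struct model_cpu {
	uint64_t mdscr;        /* models MDSCR_EL1 */
	bool debug_masked;     /* models PSTATE.D */
	bool handler_stepped;  /* set if the handler ran with stepping armed */
};

/* Models the interrupt handler body: a step exception could only be
 * taken here if MDSCR.SS is set and debug exceptions are unmasked. */
static void model_handler(struct model_cpu *cpu)
{
	if ((cpu->mdscr & MODEL_MDSCR_SS) && !cpu->debug_masked)
		cpu->handler_stepped = true;
}

/* Models the sequence from the commit message. */
static void model_el1_interrupt(struct model_cpu *cpu, uint64_t saved_pstate)
{
	bool stepping = (saved_pstate & MODEL_SPSR_SS) != 0;
	uint64_t saved_mdscr = 0;

	if (stepping) {
		/* Save MDSCR.SS and clear it for the duration of the handler. */
		saved_mdscr = cpu->mdscr;
		cpu->mdscr &= ~MODEL_MDSCR_SS;
	}

	/* Unmask only D; I stays masked (not modelled here). */
	cpu->debug_masked = false;

	model_handler(cpu);

	if (stepping) {
		/* Mask D again, then restore the saved MDSCR value. */
		cpu->debug_masked = true;
		cpu->mdscr = saved_mdscr;
	}
}
```

In this model, calling model_el1_interrupt() with MODEL_SPSR_SS set in the
saved PSTATE leaves handler_stepped false and MDSCR.SS restored afterwards,
which is the behaviour the commit message is aiming for.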
Hi Will, Catalin,
On Mon, 19 Dec 2022 at 15:55, Sumit Garg <[email protected]> wrote:
>
> This patch-set reworks pending fixes from Wei's series [1] to make
> single-step debugging via kgdb/kdb on arm64 work as expected. There was
> a prior discussion on ML [2] regarding if we should keep the interrupts
> enabled during single-stepping. So patch #1 follows suggestion from Will
> [3] to not disable interrupts during single stepping but rather skip
> single stepping within interrupt handler.
>
> [1] https://lore.kernel.org/all/[email protected]/
> [2] https://lore.kernel.org/all/CAD=FV=Voyfq3Qz0T3RY+aYWYJ0utdH=P_AweB=13rcV8GDBeyQ@mail.gmail.com/
> [3] https://lore.kernel.org/all/20200626095551.GA9312@willie-the-truck/
>
> Changes in v5:
> - Incorporated misc. comments from Mark.
>
Since patch #1 has already been reviewed/acked by Mark and the
complete patchset has been tested by Doug, would it be fine for you to
pick up this patchset? It fixes a real single stepping problem for
kgdb on arm64.
-Sumit
> Changes in v4:
> - Rebased to the tip of mainline.
> - Picked up Doug's Tested-by tag.
>
> Changes in v3:
> - Reword commit descriptions as per Daniel's suggestions.
>
> Changes in v2:
> - Replace patch #1 to rather follow Will's suggestion.
>
> Sumit Garg (2):
> arm64: entry: Skip single stepping into interrupt handlers
> arm64: kgdb: Set PSTATE.SS to 1 to re-enable single-step
>
> arch/arm64/include/asm/debug-monitors.h | 1 +
> arch/arm64/kernel/debug-monitors.c | 5 +++++
> arch/arm64/kernel/entry-common.c | 22 ++++++++++++++++++++--
> arch/arm64/kernel/kgdb.c | 2 ++
> 4 files changed, 28 insertions(+), 2 deletions(-)
>
> --
> 2.34.1
>
On Thu, Jan 12, 2023 at 02:52:49PM +0530, Sumit Garg wrote:
> Hi Will, Catalin,
>
> On Mon, 19 Dec 2022 at 15:55, Sumit Garg <[email protected]> wrote:
> >
> > This patch-set reworks pending fixes from Wei's series [1] to make
> > single-step debugging via kgdb/kdb on arm64 work as expected. There was
> > a prior discussion on ML [2] regarding if we should keep the interrupts
> > enabled during single-stepping. So patch #1 follows suggestion from Will
> > [3] to not disable interrupts during single stepping but rather skip
> > single stepping within interrupt handler.
> >
> > [1] https://lore.kernel.org/all/[email protected]/
> > [2] https://lore.kernel.org/all/CAD=FV=Voyfq3Qz0T3RY+aYWYJ0utdH=P_AweB=13rcV8GDBeyQ@mail.gmail.com/
> > [3] https://lore.kernel.org/all/20200626095551.GA9312@willie-the-truck/
> >
> > Changes in v5:
> > - Incorporated misc. comments from Mark.
> >
>
> Since patch #1 has already been reviewed/acked by Mark and the
> complete patchset has been tested by Doug, would it be fine for you to
> pick up this patchset? It fixes a real single stepping problem for
> kgdb on arm64.
Sorry to be quiet for so long.
Testing this patch set has proven to be a little difficult.
It certainly fixes the single step tests in the kgdbtest suite.
That's a good start.
Unfortunately when testing using qemu/KVM (hosted on NXP
2k/Solidrun Honeycomb) the patch set is resulting in instability
running the built-in self tests (specifically this one:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/misc/kgdbts.c#n74 ). Running this test using the kgdbtest harness
results in the test failing roughly a third of the time.
The error reported is that the trap handler tried to unlock a spinlock
that isn't currently locked. To be honest I suspect this is a generic
problem that the new feature happens to tickle (this test has
historically been unreliable on x86 too... and x86 is noteworthy for
being the only other platform I test using KVM rather than pure qemu).
Of course the only way to prove that would be to find and fix the
problem in the trap handler (which probably involves rewriting it) and I
haven't managed to do that yet.
In short, I think the debugger is more useful with this patchset than
without so, although it is caveated by the above, I'd call this:
Acked-by: Daniel Thompson <[email protected]>
Tested-by: Daniel Thompson <[email protected]>
Daniel.
Hi,
Is this expected to change single-stepping operation in userspace for
debuggers (gdb/lldb)? If so, it would be nice to at least test it a little
to make sure it works.
On 1/24/23 18:04, Daniel Thompson wrote:
> On Thu, Jan 12, 2023 at 02:52:49PM +0530, Sumit Garg wrote:
>> Hi Will, Catalin,
>>
>> On Mon, 19 Dec 2022 at 15:55, Sumit Garg <[email protected]> wrote:
>>>
>>> This patch-set reworks pending fixes from Wei's series [1] to make
>>> single-step debugging via kgdb/kdb on arm64 work as expected. There was
>>> a prior discussion on ML [2] regarding if we should keep the interrupts
>>> enabled during single-stepping. So patch #1 follows suggestion from Will
>>> [3] to not disable interrupts during single stepping but rather skip
>>> single stepping within interrupt handler.
>>>
>>> [1] https://lore.kernel.org/all/[email protected]/
>>> [2] https://lore.kernel.org/all/CAD=FV=Voyfq3Qz0T3RY+aYWYJ0utdH=P_AweB=13rcV8GDBeyQ@mail.gmail.com/
>>> [3] https://lore.kernel.org/all/20200626095551.GA9312@willie-the-truck/
>>>
>>> Changes in v5:
>>> - Incorporated misc. comments from Mark.
>>>
>>
>> Since patch #1 has already been reviewed/acked by Mark and the
>> complete patchset has been tested by Doug, would it be fine for you to
>> pick up this patchset? It fixes a real single stepping problem for
>> kgdb on arm64.
>
> Sorry to be quiet for so long.
>
> Testing this patch set has proven to be a little difficult.
>
> It certainly fixes the single step tests in the kgdbtest suite.
> That's a good start.
>
> Unfortunately when testing using qemu/KVM (hosted on NXP
> 2k/Solidrun Honeycomb) the patch set is resulting in instability
> running the built-in self tests (specifically this one:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/misc/kgdbts.c#n74 ). Running this test using the kgdbtest harness
> results in the test failing roughly a third of the time.
>
> The error reported is that the trap handler tried to unlock a spinlock
> that isn't currently locked. To be honest I suspect this is a generic
> problem that the new feature happens to tickle (this test has
> historically been unreliable on x86 too... and x86 is noteworthy for
> being the only other platform I test using KVM rather than pure qemu).
> Of course the only way to prove that would be to find and fix the
> problem in the trap handler (which probably involves rewriting it) and I
> haven't managed to do that yet.
>
> In short, I think the debugger is more useful with this patchset than
> without so, although it is caveated by the above, I'd call this:
>
> Acked-by: Daniel Thompson <[email protected]>
> Tested-by: Daniel Thompson <[email protected]>
>
>
> Daniel.
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On Mon, Dec 19, 2022 at 03:54:51PM +0530, Sumit Garg wrote:
> Currently on systems where the timer interrupt (or any other
> fast-at-human-scale periodic interrupt) is active then it is impossible
> to step any code with interrupts unlocked because we will always end up
> stepping into the timer interrupt instead of stepping the user code.
>
> The common user's goal while single stepping is that when they step then
> the system will stop at PC+4 or PC+I for a branch that gets taken
> relative to the instruction they are stepping. So, fix broken single step
> implementation via skipping single stepping into interrupt handlers.
>
> The methodology is when we receive an interrupt from EL1, check if we
> are single stepping (pstate.SS). If yes then we save MDSCR_EL1.SS and
> clear the register bit if it was set. Then unmask only D and leave I set.
> On return from the interrupt, set D and restore MDSCR_EL1.SS. Along with
> this skip reschedule if we were stepping.
>
> Suggested-by: Will Deacon <[email protected]>
> Signed-off-by: Sumit Garg <[email protected]>
> Tested-by: Douglas Anderson <[email protected]>
> ---
> arch/arm64/kernel/entry-common.c | 22 ++++++++++++++++++++--
> 1 file changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index cce1167199e3..688d1ef8e864 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -231,11 +231,15 @@ DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
> #define need_irq_preemption() (IS_ENABLED(CONFIG_PREEMPTION))
> #endif
>
> -static void __sched arm64_preempt_schedule_irq(void)
> +static void __sched arm64_preempt_schedule_irq(struct pt_regs *regs)
> {
> if (!need_irq_preemption())
> return;
>
> + /* Don't reschedule in case we are single stepping */
> + if (!(regs->pstate & DBG_SPSR_SS))
> + return;
Hmm, isn't this the common case? PSTATE.SS will usually be clear, no?
> * Note: thread_info::preempt_count includes both thread_info::count
> * and thread_info::need_resched, and is not equivalent to
> @@ -471,19 +475,33 @@ static __always_inline void __el1_irq(struct pt_regs *regs,
> do_interrupt_handler(regs, handler);
> irq_exit_rcu();
>
> - arm64_preempt_schedule_irq();
> + arm64_preempt_schedule_irq(regs);
>
> exit_to_kernel_mode(regs);
> }
> +
> static void noinstr el1_interrupt(struct pt_regs *regs,
> void (*handler)(struct pt_regs *))
> {
> + unsigned long mdscr;
> +
> + /* Disable single stepping within interrupt handler */
> + if (regs->pstate & DBG_SPSR_SS) {
> + mdscr = read_sysreg(mdscr_el1);
> + write_sysreg(mdscr & ~DBG_MDSCR_SS, mdscr_el1);
> + }
I think this will break the implicit handling of kernel {break,watch}points.
Sadly, I think any attempts to workaround the issues here are likely just
to push the problems around. We really need to overhaul the debug exception
handling logic we have, which means I need to get back to writing up a
proposal.
Will
On Tue, 24 Jan 2023 at 23:34, Daniel Thompson
<[email protected]> wrote:
>
> On Thu, Jan 12, 2023 at 02:52:49PM +0530, Sumit Garg wrote:
> > Hi Will, Catalin,
> >
> > On Mon, 19 Dec 2022 at 15:55, Sumit Garg <[email protected]> wrote:
> > >
> > > This patch-set reworks pending fixes from Wei's series [1] to make
> > > single-step debugging via kgdb/kdb on arm64 work as expected. There was
> > > a prior discussion on ML [2] regarding if we should keep the interrupts
> > > enabled during single-stepping. So patch #1 follows suggestion from Will
> > > [3] to not disable interrupts during single stepping but rather skip
> > > single stepping within interrupt handler.
> > >
> > > [1] https://lore.kernel.org/all/[email protected]/
> > > [2] https://lore.kernel.org/all/CAD=FV=Voyfq3Qz0T3RY+aYWYJ0utdH=P_AweB=13rcV8GDBeyQ@mail.gmail.com/
> > > [3] https://lore.kernel.org/all/20200626095551.GA9312@willie-the-truck/
> > >
> > > Changes in v5:
> > > - Incorporated misc. comments from Mark.
> > >
> >
> > Since patch #1 has already been reviewed/acked by Mark and the
> > complete patchset has been tested by Doug, would it be fine for you to
> > pick up this patchset? It fixes a real single stepping problem for
> > kgdb on arm64.
>
> Sorry to be quiet for so long.
>
> Testing this patch set has proven to be a little difficult.
>
> It certainly fixes the single step tests in the kgdbtest suite.
> That's a good start.
>
> Unfortunately when testing using qemu/KVM (hosted on NXP
> 2k/Solidrun Honeycomb) the patch set is resulting in instability
> running the built-in self tests (specifically this one:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/misc/kgdbts.c#n74 ). Running this test using the kgdbtest harness
> results in the test failing roughly a third of the time.
>
> The error reported is that the trap handler tried to unlock a spinlock
> that isn't currently locked. To be honest I suspect this is a generic
> problem that the new feature happens to tickle (this test has
> historically been unreliable on x86 too... and x86 is noteworthy for
> being the only other platform I test using KVM rather than pure qemu).
> Of course the only way to prove that would be to find and fix the
> problem in the trap handler (which probably involves rewriting it) and I
> haven't managed to do that yet.
>
> In short, I think the debugger is more useful with this patchset than
> without so, although it is caveated by the above, I'd call this:
>
> Acked-by: Daniel Thompson <[email protected]>
> Tested-by: Daniel Thompson <[email protected]>
>
Thanks Daniel for the in-depth testing.
-Sumit
>
> Daniel.
Hi Luis,
On Wed, 25 Jan 2023 at 14:48, Luis Machado <[email protected]> wrote:
>
> Hi,
>
> Is this expected to change single-stepping operation in userspace for debuggers (gdb/lldb)?
No, it won't affect user-space debuggers, as we are only touching the
interrupt path in EL1 mode.
-Sumit
> If so, it would be nice to at least
> test it a little to make sure it works.
>
> On 1/24/23 18:04, Daniel Thompson wrote:
> > On Thu, Jan 12, 2023 at 02:52:49PM +0530, Sumit Garg wrote:
> >> Hi Will, Catalin,
> >>
> >> On Mon, 19 Dec 2022 at 15:55, Sumit Garg <[email protected]> wrote:
> >>>
> >>> This patch-set reworks pending fixes from Wei's series [1] to make
> >>> single-step debugging via kgdb/kdb on arm64 work as expected. There was
> >>> a prior discussion on ML [2] regarding if we should keep the interrupts
> >>> enabled during single-stepping. So patch #1 follows suggestion from Will
> >>> [3] to not disable interrupts during single stepping but rather skip
> >>> single stepping within interrupt handler.
> >>>
> >>> [1] https://lore.kernel.org/all/[email protected]/
> >>> [2] https://lore.kernel.org/all/CAD=FV=Voyfq3Qz0T3RY+aYWYJ0utdH=P_AweB=13rcV8GDBeyQ@mail.gmail.com/
> >>> [3] https://lore.kernel.org/all/20200626095551.GA9312@willie-the-truck/
> >>>
> >>> Changes in v5:
> >>> - Incorporated misc. comments from Mark.
> >>>
> >>
> >> Since patch #1 has already been reviewed/acked by Mark and the
> >> complete patchset has been tested by Doug, would it be fine for you to
> >> pick up this patchset? It fixes a real single stepping problem for
> >> kgdb on arm64.
> >
> > Sorry to be quiet for so long.
> >
> > Testing this patch set has proven to be a little difficult.
> >
> > It certainly fixes the single step tests in the kgdbtest suite.
> > That's a good start.
> >
> > Unfortunately when testing using qemu/KVM (hosted on NXP
> > 2k/Solidrun Honeycomb) the patch set is resulting in instability
> > running the built-in self tests (specifically this one:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/misc/kgdbts.c#n74 ). Running this test using the kgdbtest harness
> > results in the test failing roughly a third of the time.
> >
> > The error reported is that the trap handler tried to unlock a spinlock
> > that isn't currently locked. To be honest I suspect this is a generic
> > problem that the new feature happens to tickle (this test has
> > historically been unreliable on x86 too... and x86 is noteworthy for
> > being the only other platform I test using KVM rather than pure qemu).
> > Of course the only way to prove that would be to find and fix the
> > problem in the trap handler (which probably involves rewriting it) and I
> > haven't managed to do that yet.
> >
> > In short, I think the debugger is more useful with this patchset than
> > without so, although it is caveated by the above, I'd call this:
> >
> > Acked-by: Daniel Thompson <[email protected]>
> > Tested-by: Daniel Thompson <[email protected]>
> >
> >
> > Daniel.
> >
>
Hi Will,
Thanks for your review.
On Thu, 26 Jan 2023 at 19:09, Will Deacon <[email protected]> wrote:
>
> On Mon, Dec 19, 2022 at 03:54:51PM +0530, Sumit Garg wrote:
> > Currently on systems where the timer interrupt (or any other
> > fast-at-human-scale periodic interrupt) is active then it is impossible
> > to step any code with interrupts unlocked because we will always end up
> > stepping into the timer interrupt instead of stepping the user code.
> >
> > The common user's goal while single stepping is that when they step then
> > the system will stop at PC+4 or PC+I for a branch that gets taken
> > relative to the instruction they are stepping. So, fix broken single step
> > implementation via skipping single stepping into interrupt handlers.
> >
> > The methodology is when we receive an interrupt from EL1, check if we
> > are single stepping (pstate.SS). If yes then we save MDSCR_EL1.SS and
> > clear the register bit if it was set. Then unmask only D and leave I set.
> > On return from the interrupt, set D and restore MDSCR_EL1.SS. Along with
> > this skip reschedule if we were stepping.
> >
> > Suggested-by: Will Deacon <[email protected]>
> > Signed-off-by: Sumit Garg <[email protected]>
> > Tested-by: Douglas Anderson <[email protected]>
> > ---
> > arch/arm64/kernel/entry-common.c | 22 ++++++++++++++++++++--
> > 1 file changed, 20 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> > index cce1167199e3..688d1ef8e864 100644
> > --- a/arch/arm64/kernel/entry-common.c
> > +++ b/arch/arm64/kernel/entry-common.c
> > @@ -231,11 +231,15 @@ DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
> > #define need_irq_preemption() (IS_ENABLED(CONFIG_PREEMPTION))
> > #endif
> >
> > -static void __sched arm64_preempt_schedule_irq(void)
> > +static void __sched arm64_preempt_schedule_irq(struct pt_regs *regs)
> > {
> > if (!need_irq_preemption())
> > return;
> >
> > + /* Don't reschedule in case we are single stepping */
> > + if (!(regs->pstate & DBG_SPSR_SS))
> > + return;
>
> Hmm, isn't this the common case? PSTATE.SS will usually be clear, no?
>
Ah, I see. This looks like a copy-paste error from v4. The check should
instead be:
	/* Don't reschedule in case we are single stepping */
	if (regs->pstate & DBG_SPSR_SS)
		return;
Thanks for catching this, I will correct it in the next version.
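To make the polarity difference concrete, here is a small standalone sketch
(not kernel code) that compares the check as posted in v5 with the corrected
one. The helper names and the SKETCH_SPSR_SS value are made up for this
illustration; each helper returns true when the early return would be taken,
i.e. when rescheduling would be skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for DBG_SPSR_SS (assumption for this sketch). */
#define SKETCH_SPSR_SS (UINT64_C(1) << 21)

/* v5 as posted: the early return is taken when we are NOT stepping,
 * so ordinary (non-stepping) interrupts never reach the preemption code. */
static bool skips_resched_v5(uint64_t pstate)
{
	return !(pstate & SKETCH_SPSR_SS);
}

/* Corrected check: the early return is taken only while stepping. */
static bool skips_resched_fixed(uint64_t pstate)
{
	return (pstate & SKETCH_SPSR_SS) != 0;
}
```

With PSTATE.SS clear, which is the common case Will points out, the v5 helper
reports that rescheduling is skipped while the corrected one does not.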
> > * Note: thread_info::preempt_count includes both thread_info::count
> > * and thread_info::need_resched, and is not equivalent to
> > @@ -471,19 +475,33 @@ static __always_inline void __el1_irq(struct pt_regs *regs,
> > do_interrupt_handler(regs, handler);
> > irq_exit_rcu();
> >
> > - arm64_preempt_schedule_irq();
> > + arm64_preempt_schedule_irq(regs);
> >
> > exit_to_kernel_mode(regs);
> > }
> > +
> > static void noinstr el1_interrupt(struct pt_regs *regs,
> > void (*handler)(struct pt_regs *))
> > {
> > + unsigned long mdscr;
> > +
> > + /* Disable single stepping within interrupt handler */
> > + if (regs->pstate & DBG_SPSR_SS) {
> > + mdscr = read_sysreg(mdscr_el1);
> > + write_sysreg(mdscr & ~DBG_MDSCR_SS, mdscr_el1);
> > + }
>
> I think this will break the implicit handling of kernel {break,watch}points.
>
Can you please elaborate here? AFAICS, this change only excludes the
interrupt handler from single stepping.
> Sadly, I think any attempts to workaround the issues here are likely just
> to push the problems around. We really need to overhaul the debug exception
> handling logic we have, which means I need to get back to writing up a
> proposal.
>
I will be very happy to assist you if you can help me understand the
problem here.
BTW, patch #2 should be an independent fix from patch #1. Can you pull
that alone?
-Sumit
> Will
On Thu, 26 Jan 2023 at 14:40, Will Deacon <[email protected]> wrote:
>
> On Mon, Dec 19, 2022 at 03:54:51PM +0530, Sumit Garg wrote:
> > Currently on systems where the timer interrupt (or any other
> > fast-at-human-scale periodic interrupt) is active then it is impossible
> > to step any code with interrupts unlocked because we will always end up
> > stepping into the timer interrupt instead of stepping the user code.
> >
> > The common user's goal while single stepping is that when they step then
> > the system will stop at PC+4 or PC+I for a branch that gets taken
> > relative to the instruction they are stepping. So, fix broken single step
> > implementation via skipping single stepping into interrupt handlers.
> >
> > The methodology is when we receive an interrupt from EL1, check if we
> > are single stepping (pstate.SS). If yes then we save MDSCR_EL1.SS and
> > clear the register bit if it was set. Then unmask only D and leave I set.
> > On return from the interrupt, set D and restore MDSCR_EL1.SS. Along with
> > this skip reschedule if we were stepping.
> >
> > Suggested-by: Will Deacon <[email protected]>
> > Signed-off-by: Sumit Garg <[email protected]>
> > Tested-by: Douglas Anderson <[email protected]>
> > ---
> > arch/arm64/kernel/entry-common.c | 22 ++++++++++++++++++++--
> > 1 file changed, 20 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> > index cce1167199e3..688d1ef8e864 100644
> > --- a/arch/arm64/kernel/entry-common.c
> > +++ b/arch/arm64/kernel/entry-common.c
> > @@ -231,11 +231,15 @@ DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
> > #define need_irq_preemption() (IS_ENABLED(CONFIG_PREEMPTION))
> > #endif
> >
> > -static void __sched arm64_preempt_schedule_irq(void)
> > +static void __sched arm64_preempt_schedule_irq(struct pt_regs *regs)
> > {
> > if (!need_irq_preemption())
> > return;
> >
> > + /* Don't reschedule in case we are single stepping */
> > + if (!(regs->pstate & DBG_SPSR_SS))
> > + return;
>
> Hmm, isn't this the common case? PSTATE.SS will usually be clear, no?
>
> > * Note: thread_info::preempt_count includes both thread_info::count
> > * and thread_info::need_resched, and is not equivalent to
> > @@ -471,19 +475,33 @@ static __always_inline void __el1_irq(struct pt_regs *regs,
> > do_interrupt_handler(regs, handler);
> > irq_exit_rcu();
> >
> > - arm64_preempt_schedule_irq();
> > + arm64_preempt_schedule_irq(regs);
> >
> > exit_to_kernel_mode(regs);
> > }
> > +
> > static void noinstr el1_interrupt(struct pt_regs *regs,
> > void (*handler)(struct pt_regs *))
> > {
> > + unsigned long mdscr;
> > +
> > + /* Disable single stepping within interrupt handler */
> > + if (regs->pstate & DBG_SPSR_SS) {
> > + mdscr = read_sysreg(mdscr_el1);
> > + write_sysreg(mdscr & ~DBG_MDSCR_SS, mdscr_el1);
> > + }
>
> I think this will break the implicit handling of kernel {break,watch}points.
>
> Sadly, I think any attempts to workaround the issues here are likely just
> to push the problems around. We really need to overhaul the debug exception
> handling logic we have, which means I need to get back to writing up a
> proposal.
>
That would be much appreciated.
This patch makes single step debugging of VMs running under QEMU much
more useful (using QEMU gdbstub), for the same reason as with kdb, as
otherwise, there's a 50/50 chance (in my experience) that doing a
single step will take you to the IRQ handler instead of to the next
instruction in program order.
FWIW, I tested this patch with that scenario, and it seems to work
much better, but not 100%: I still end up in the IRQ handler
occasionally, but considerably less often.