2020-02-13 03:13:20

by Corey Minyard

[permalink] [raw]
Subject: [RFC PATCH 0/2] arm64 kgdb fixes for single stepping

I got a bug report about using kgdb on arm64, and it turns out it was
fairly broken. Patch 2 has a description of what was going on. I am
using a Marvell 8100 board.

The following patches fix the problem, but probably not in the
best way. They are what I hacked out to show the problems.

I am not quite sure how this will interact with kprobes and hardware
breakpoints which use the same code, but they would have been broken,
too, so this is not making them any worse.



2020-02-13 03:13:44

by Corey Minyard

[permalink] [raw]
Subject: [RFC PATCH 1/2] arm64: Pass registers to all single-step handling routines

From: Corey Minyard <[email protected]>

Get ready to set the SS bit in the MDSCR register in the kernel restore
handler.

Signed-off-by: Corey Minyard <[email protected]>
---
arch/arm64/include/asm/debug-monitors.h | 4 ++--
arch/arm64/kernel/debug-monitors.c | 4 ++--
arch/arm64/kernel/hw_breakpoint.c | 6 +++---
arch/arm64/kernel/kgdb.c | 6 +++---
arch/arm64/kernel/probes/kprobes.c | 4 ++--
5 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/debug-monitors.h b/arch/arm64/include/asm/debug-monitors.h
index 7619f473155f..73ce974bf754 100644
--- a/arch/arm64/include/asm/debug-monitors.h
+++ b/arch/arm64/include/asm/debug-monitors.h
@@ -111,8 +111,8 @@ void user_rewind_single_step(struct task_struct *task);
void user_fastforward_single_step(struct task_struct *task);

void kernel_enable_single_step(struct pt_regs *regs);
-void kernel_disable_single_step(void);
-int kernel_active_single_step(void);
+void kernel_disable_single_step(struct pt_regs *regs);
+int kernel_active_single_step(struct pt_regs *regs);

#ifdef CONFIG_HAVE_HW_BREAKPOINT
int reinstall_suspended_bps(struct pt_regs *regs);
diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
index 48222a4760c2..2a0dfd8b1265 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -414,7 +414,7 @@ void kernel_enable_single_step(struct pt_regs *regs)
}
NOKPROBE_SYMBOL(kernel_enable_single_step);

-void kernel_disable_single_step(void)
+void kernel_disable_single_step(struct pt_regs *regs)
{
WARN_ON(!irqs_disabled());
mdscr_write(mdscr_read() & ~DBG_MDSCR_SS);
@@ -422,7 +422,7 @@ void kernel_disable_single_step(void)
}
NOKPROBE_SYMBOL(kernel_disable_single_step);

-int kernel_active_single_step(void)
+int kernel_active_single_step(struct pt_regs *regs)
{
WARN_ON(!irqs_disabled());
return mdscr_read() & DBG_MDSCR_SS;
diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
index 0b727edf4104..785c9a5060a6 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -682,7 +682,7 @@ static int breakpoint_handler(unsigned long unused, unsigned int esr,
if (*kernel_step != ARM_KERNEL_STEP_NONE)
return 0;

- if (kernel_active_single_step()) {
+ if (kernel_active_single_step(regs)) {
*kernel_step = ARM_KERNEL_STEP_SUSPEND;
} else {
*kernel_step = ARM_KERNEL_STEP_ACTIVE;
@@ -825,7 +825,7 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr,
if (*kernel_step != ARM_KERNEL_STEP_NONE)
return 0;

- if (kernel_active_single_step()) {
+ if (kernel_active_single_step(regs)) {
*kernel_step = ARM_KERNEL_STEP_SUSPEND;
} else {
*kernel_step = ARM_KERNEL_STEP_ACTIVE;
@@ -882,7 +882,7 @@ int reinstall_suspended_bps(struct pt_regs *regs)
toggle_bp_registers(AARCH64_DBG_REG_WCR, DBG_ACTIVE_EL0, 1);

if (*kernel_step != ARM_KERNEL_STEP_SUSPEND) {
- kernel_disable_single_step();
+ kernel_disable_single_step(regs);
handled_exception = 1;
} else {
handled_exception = 0;
diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index 43119922341f..220fe8fd6ace 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -200,8 +200,8 @@ int kgdb_arch_handle_exception(int exception_vector, int signo,
/*
* Received continue command, disable single step
*/
- if (kernel_active_single_step())
- kernel_disable_single_step();
+ if (kernel_active_single_step(linux_regs))
+ kernel_disable_single_step(linux_regs);

err = 0;
break;
@@ -221,7 +221,7 @@ int kgdb_arch_handle_exception(int exception_vector, int signo,
/*
* Enable single step handling
*/
- if (!kernel_active_single_step())
+ if (!kernel_active_single_step(linux_regs))
kernel_enable_single_step(linux_regs);
err = 0;
break;
diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
index d1c95dcf1d78..3082dfc3cd99 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -308,7 +308,7 @@ int __kprobes kprobe_fault_handler(struct pt_regs *regs, unsigned int fsr)
if (!instruction_pointer(regs))
BUG();

- kernel_disable_single_step();
+ kernel_disable_single_step(regs);

if (kcb->kprobe_status == KPROBE_REENTER)
restore_previous_kprobe(kcb);
@@ -415,7 +415,7 @@ kprobe_single_step_handler(struct pt_regs *regs, unsigned int esr)

if (retval == DBG_HOOK_HANDLED) {
kprobes_restore_local_irqflag(kcb, regs);
- kernel_disable_single_step();
+ kernel_disable_single_step(regs);

post_kprobe_handler(kcb, regs);
}
--
2.17.1

2020-02-13 10:11:26

by Will Deacon

[permalink] [raw]
Subject: Re: [RFC PATCH 0/2] arm64 kgdb fixes for single stepping

On Wed, Feb 12, 2020 at 09:11:29PM -0600, [email protected] wrote:
> I got a bug report about using kgdb on arm64, and it turns out it was
> fairly broken. Patch 2 has a description of what was going on. I am
> using a Marvell 8100 board.
>
> The following patches fix the problem, but probably not in the
> best way. They are what I hacked out to show the problems.
>
> I am not quite sure how this will interact with kprobes and hardware
> breakpoints which use the same code, but they would have been broken,
> too, so this is not making them any worse.

This should all be handled by kgdb itself, not by changing the low-level
debug exception handling. For example, the '&kgdb_step_hook' can take
care of re-arming the step state machine and kgdb can also simply disable
interrupts during the step if it doesn't want to step into the handler.

Will

2020-02-13 15:58:45

by Corey Minyard

[permalink] [raw]
Subject: Re: [RFC PATCH 0/2] arm64 kgdb fixes for single stepping

On Thu, Feb 13, 2020 at 10:10:58AM +0000, Will Deacon wrote:
> On Wed, Feb 12, 2020 at 09:11:29PM -0600, [email protected] wrote:
> > I got a bug report about using kgdb on arm64, and it turns out it was
> > fairly broken. Patch 2 has a description of what was going on. I am
> > using a Marvell 8100 board.
> >
> > The following patches fix the problem, but probably not in the
> > best way. They are what I hacked out to show the problems.
> >
> > I am not quite sure how this will interact with kprobes and hardware
> > breakpoints which use the same code, but they would have been broken,
> > too, so this is not making them any worse.
>
> This should all be handled by kgdb itself, not by changing the low-level
> debug exception handling. For example, the '&kgdb_step_hook' can take
> care of re-arming the step state machine and kgdb can also simply disable
> interrupts during the step if it doesn't want to step into the handler.

How can kgdb disable the SS bit in MDSRC, or re-enable it on the right
CPU, without doing this in the exception handling?

I'm actually thinking that this may be a hardware bug. Looking at the
ARMv8 manual, it looks like PSTATE.SS should be set to 0 if the
processor takes an exception. That's definitely not happening; if I do
an instruction step from, say, sys_sync(), it gets the single-step trap
on the instruction after the PSTATE.D bit is disabled in el1_irq.

Even so, I think the migration issue is still a problem. If you do an
eret set up for single-step, and interrupts are on, and you get a timer
interrupt, it could migrate the task to a different CPU if
PREEMPT_ENABLE is set, right? If so, the MDSRC.SS bit will be set on
the wrong CPU and the single step trap won't happen. That will break
kprobes, too.

You mention turning off interrupts in kgdb when single-stepping, which
you could do and it would solve this problem. But it wouldn't solve the
problem of taking a paging exception, which you want to take in this
case. And you could still migrate on a paging exception. So I don't
think disabling interrupts is a good solution.

I don't see a solution besides clearing MDSCR.SS on an el1 exception
entry and conditionally setting it on an el1 exception return. It might
be better to have a thread flag to do this instead of depending on the
setting of that bit; I'm not sure how expensive accessing the MDSRC
register is.

Setting SPSR.SS on subsequent single steps is definitely an issue, but I
can split that out into a separate patch.

-corey

>
> Will