This patch set is to fix several issues of single-step debugging
in kgdb/kdb on arm64.
It seems that these issues have been shelved a very long time,
but i still hope to solve them, as the single-step debugging
is an useful feature.
Note:
Based on patch "arm64: cacheflush: Fix KGDB trap detection",
https://www.spinics.net/lists/arm-kernel/msg803741.html
Wei Li (4):
arm64: kgdb: Fix single-step exception handling oops
arm64: Extract kprobes_save_local_irqflag() and
kprobes_restore_local_irqflag()
arm64: kgdb: Fix single-stepping into the irq handler wrongly
arm64: kgdb: Set PSTATE.SS to 1 to reenable single-step
arch/arm64/include/asm/debug-monitors.h | 6 ++++++
arch/arm64/kernel/debug-monitors.c | 28 ++++++++++++++++++++++++-
arch/arm64/kernel/kgdb.c | 16 +++++++++++---
arch/arm64/kernel/probes/kprobes.c | 28 ++-----------------------
4 files changed, 48 insertions(+), 30 deletions(-)
--
2.17.1
PSTATE.I and PSTATE.D are very important for single-step working.
Without disabling interrupt on local CPU, there is a chance of
interrupt occurrence in the period of exception return and start of
out-of-line single-step, that result in wrongly single stepping
into the interrupt handler. And if D bit is set then, it results into
undefined exception and when it's handler enables dbg then single-step
exception is generated, not as expected.
As they are maintained well in kprobes_save_local_irqflag() and
kprobes_restore_local_irqflag() for kprobe module, extract them as
kernel_prepare_single_step() and kernel_cleanup_single_step() for
general use.
Signed-off-by: Wei Li <[email protected]>
---
arch/arm64/include/asm/debug-monitors.h | 4 ++++
arch/arm64/kernel/debug-monitors.c | 26 +++++++++++++++++++++++
arch/arm64/kernel/probes/kprobes.c | 28 ++-----------------------
3 files changed, 32 insertions(+), 26 deletions(-)
diff --git a/arch/arm64/include/asm/debug-monitors.h b/arch/arm64/include/asm/debug-monitors.h
index 7619f473155f..b62469f3475b 100644
--- a/arch/arm64/include/asm/debug-monitors.h
+++ b/arch/arm64/include/asm/debug-monitors.h
@@ -113,6 +113,10 @@ void user_fastforward_single_step(struct task_struct *task);
void kernel_enable_single_step(struct pt_regs *regs);
void kernel_disable_single_step(void);
int kernel_active_single_step(void);
+void kernel_prepare_single_step(unsigned long *flags,
+ struct pt_regs *regs);
+void kernel_cleanup_single_step(unsigned long flags,
+ struct pt_regs *regs);
#ifdef CONFIG_HAVE_HW_BREAKPOINT
int reinstall_suspended_bps(struct pt_regs *regs);
diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
index 48222a4760c2..25ce6b5a52d2 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -429,6 +429,32 @@ int kernel_active_single_step(void)
}
NOKPROBE_SYMBOL(kernel_active_single_step);
+/*
+ * Interrupts need to be disabled before single-step mode is set, and not
+ * reenabled until after single-step mode ends.
+ * Without disabling interrupt on local CPU, there is a chance of
+ * interrupt occurrence in the period of exception return and start of
+ * out-of-line single-step, that result in wrongly single stepping
+ * into the interrupt handler.
+ */
+void kernel_prepare_single_step(unsigned long *flags,
+ struct pt_regs *regs)
+{
+ *flags = regs->pstate & DAIF_MASK;
+ regs->pstate |= PSR_I_BIT;
+ /* Unmask PSTATE.D for enabling software step exceptions. */
+ regs->pstate &= ~PSR_D_BIT;
+}
+NOKPROBE_SYMBOL(kernel_prepare_single_step);
+
+void kernel_cleanup_single_step(unsigned long flags,
+ struct pt_regs *regs)
+{
+ regs->pstate &= ~DAIF_MASK;
+ regs->pstate |= flags;
+}
+NOKPROBE_SYMBOL(kernel_cleanup_single_step);
+
/* ptrace API */
void user_enable_single_step(struct task_struct *task)
{
diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
index d1c95dcf1d78..c655b6b543e3 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -168,30 +168,6 @@ static void __kprobes set_current_kprobe(struct kprobe *p)
__this_cpu_write(current_kprobe, p);
}
-/*
- * Interrupts need to be disabled before single-step mode is set, and not
- * reenabled until after single-step mode ends.
- * Without disabling interrupt on local CPU, there is a chance of
- * interrupt occurrence in the period of exception return and start of
- * out-of-line single-step, that result in wrongly single stepping
- * into the interrupt handler.
- */
-static void __kprobes kprobes_save_local_irqflag(struct kprobe_ctlblk *kcb,
- struct pt_regs *regs)
-{
- kcb->saved_irqflag = regs->pstate & DAIF_MASK;
- regs->pstate |= PSR_I_BIT;
- /* Unmask PSTATE.D for enabling software step exceptions. */
- regs->pstate &= ~PSR_D_BIT;
-}
-
-static void __kprobes kprobes_restore_local_irqflag(struct kprobe_ctlblk *kcb,
- struct pt_regs *regs)
-{
- regs->pstate &= ~DAIF_MASK;
- regs->pstate |= kcb->saved_irqflag;
-}
-
static void __kprobes
set_ss_context(struct kprobe_ctlblk *kcb, unsigned long addr)
{
@@ -227,7 +203,7 @@ static void __kprobes setup_singlestep(struct kprobe *p,
set_ss_context(kcb, slot); /* mark pending ss */
/* IRQs and single stepping do not mix well. */
- kprobes_save_local_irqflag(kcb, regs);
+ kernel_prepare_single_step(&kcb->saved_irqflag, regs);
kernel_enable_single_step(regs);
instruction_pointer_set(regs, slot);
} else {
@@ -414,7 +390,7 @@ kprobe_single_step_handler(struct pt_regs *regs, unsigned int esr)
retval = kprobe_ss_hit(kcb, instruction_pointer(regs));
if (retval == DBG_HOOK_HANDLED) {
- kprobes_restore_local_irqflag(kcb, regs);
+ kernel_cleanup_single_step(kcb->saved_irqflag, regs);
kernel_disable_single_step();
post_kprobe_handler(kcb, regs);
--
2.17.1
After entering kdb due to breakpoint, when we execute 'ss' or 'go' (will
delay installing breakpoints, do single-step first), it won't work
correctly, and it will enter kdb due to oops.
It's because the reason gotten in kdb_stub() is not as expected, and it
seems that the ex_vector for single-step should be 0, like what arch
powerpc/sh/parisc has implemented.
Before the patch:
Entering kdb (current=0xffff8000119e2dc0, pid 0) on processor 0 due to Keyboard Entry
[0]kdb> bp printk
Instruction(i) BP #0 at 0xffff8000101486cc (printk)
is enabled addr at ffff8000101486cc, hardtype=0 installed=0
[0]kdb> g
/ # echo h > /proc/sysrq-trigger
Entering kdb (current=0xffff0000fa878040, pid 266) on processor 3 due to Breakpoint @ 0xffff8000101486cc
[3]kdb> ss
Entering kdb (current=0xffff0000fa878040, pid 266) on processor 3 Oops: (null)
due to oops @ 0xffff800010082ab8
CPU: 3 PID: 266 Comm: sh Not tainted 5.7.0-rc4-13839-gf0e5ad491718 #6
Hardware name: linux,dummy-virt (DT)
pstate: 00000085 (nzcv daIf -PAN -UAO)
pc : el1_irq+0x78/0x180
lr : __handle_sysrq+0x80/0x190
sp : ffff800015003bf0
x29: ffff800015003d20 x28: ffff0000fa878040
x27: 0000000000000000 x26: ffff80001126b1f0
x25: ffff800011b6a0d8 x24: 0000000000000000
x23: 0000000080200005 x22: ffff8000101486cc
x21: ffff800015003d30 x20: 0000ffffffffffff
x19: ffff8000119f2000 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000
x15: 0000000000000000 x14: 0000000000000000
x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000
x9 : 0000000000000000 x8 : ffff800015003e50
x7 : 0000000000000002 x6 : 00000000380b9990
x5 : ffff8000106e99e8 x4 : ffff0000fadd83c0
x3 : 0000ffffffffffff x2 : ffff800011b6a0d8
x1 : ffff800011b6a000 x0 : ffff80001130c9d8
Call trace:
el1_irq+0x78/0x180
printk+0x0/0x84
write_sysrq_trigger+0xb0/0x118
proc_reg_write+0xb4/0xe0
__vfs_write+0x18/0x40
vfs_write+0xb0/0x1b8
ksys_write+0x64/0xf0
__arm64_sys_write+0x14/0x20
el0_svc_common.constprop.2+0xb0/0x168
do_el0_svc+0x20/0x98
el0_sync_handler+0xec/0x1a8
el0_sync+0x140/0x180
[3]kdb>
After the patch:
Entering kdb (current=0xffff8000119e2dc0, pid 0) on processor 0 due to Keyboard Entry
[0]kdb> bp printk
Instruction(i) BP #0 at 0xffff8000101486cc (printk)
is enabled addr at ffff8000101486cc, hardtype=0 installed=0
[0]kdb> g
/ # echo h > /proc/sysrq-trigger
Entering kdb (current=0xffff0000fa852bc0, pid 268) on processor 0 due to Breakpoint @ 0xffff8000101486cc
[0]kdb> g
Entering kdb (current=0xffff0000fa852bc0, pid 268) on processor 0 due to Breakpoint @ 0xffff8000101486cc
[0]kdb> ss
Entering kdb (current=0xffff0000fa852bc0, pid 268) on processor 0 due to SS trap @ 0xffff800010082ab8
[0]kdb>
Fixes: 44679a4f142b ("arm64: KGDB: Add step debugging support")
Signed-off-by: Wei Li <[email protected]>
---
arch/arm64/kernel/kgdb.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index 43119922341f..1a157ca33262 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -252,7 +252,7 @@ static int kgdb_step_brk_fn(struct pt_regs *regs, unsigned int esr)
if (!kgdb_single_step)
return DBG_HOOK_ERROR;
- kgdb_handle_exception(1, SIGTRAP, 0, regs);
+ kgdb_handle_exception(0, SIGTRAP, 0, regs);
return DBG_HOOK_HANDLED;
}
NOKPROBE_SYMBOL(kgdb_step_brk_fn);
--
2.17.1
After fixing wrongly single-stepping into the irq handler, when we execute
single-step in kdb/kgdb, we can see only the first step can work.
Refer to the ARM Architecture Reference Manual (ARM DDI 0487E.a) D2.12,
i think PSTATE.SS=1 should be set each step for transferring the PE to the
'Active-not-pending' state. The problem here is PSTATE.SS=1 is not set
since the second single-step.
After the first single-step, the PE transferes to the 'Inactive' state,
with PSTATE.SS=0 and MDSCR.SS=1, thus PSTATE.SS won't be set to 1 due to
kernel_active_single_step()=true. Then the PE transferes to the
'Active-pending' state when ERET and returns to the debugger by step
exception.
Before this patch:
* kdb:
Entering kdb (current=0xffff8000119d2dc0, pid 0) on processor 0 due to Keyboard Entry
[0]kdb> bp printk
Instruction(i) BP #0 at 0xffff80001014874c (printk)
is enabled addr at ffff80001014874c, hardtype=0 installed=0
[0]kdb> g
/ # echo h > /proc/sysrq-trigger
Entering kdb (current=0xffff0000fa6948c0, pid 265) on processor 3 due to Breakpoint @ 0xffff80001014874c
[3]kdb> ss
Entering kdb (current=0xffff0000fa6948c0, pid 265) on processor 3 due to SS trap @ 0xffff800010148750
[3]kdb> ss
Entering kdb (current=0xffff0000fa6948c0, pid 265) on processor 3 due to SS trap @ 0xffff800010148750
[3]kdb> ss
Entering kdb (current=0xffff0000fa6948c0, pid 265) on processor 3 due to SS trap @ 0xffff800010148750
[3]kdb>
* kgdb:
(gdb) target remote 127.1:23002
Remote debugging using 127.1:23002
arch_kgdb_breakpoint () at /home/liwei/main_code/linux/arch/arm64/include/asm/kgdb.h:21
21 asm ("brk %0" : : "I" (KGDB_COMPILED_DBG_BRK_IMM));
(gdb) b printk
Breakpoint 1 at 0xffff80001014874c: file /home/liwei/main_code/linux/kernel/printk/printk.c, line 2076.
(gdb) c
Continuing.
[New Thread 277]
[Switching to Thread 276]
Thread 171 hit Breakpoint 1, printk (fmt=0xffff8000112fc130 "\001\066sysrq: HELP : ")
at /home/liwei/main_code/linux/kernel/printk/printk.c:2076
2076 {
(gdb) stepi
0xffff800010148750 2076 {
(gdb) stepi
0xffff800010148750 2076 {
(gdb) stepi
0xffff800010148750 2076 {
(gdb)
After this patch:
* kdb:
Entering kdb (current=0xffff8000119d2dc0, pid 0) on processor 0 due to Keyboard Entry
[0]kdb> bp printk
Instruction(i) BP #0 at 0xffff80001014874c (printk)
is enabled addr at ffff80001014874c, hardtype=0 installed=0
[0]kdb> g
/ # echo h > /proc/sysrq-trigger
Entering kdb (current=0xffff0000fa800040, pid 264) on processor 2 due to Breakpoint @ 0xffff80001014874c
[2]kdb> ss
Entering kdb (current=0xffff0000fa800040, pid 264) on processor 2 due to SS trap @ 0xffff800010148750
[2]kdb> ss
Entering kdb (current=0xffff0000fa800040, pid 264) on processor 2 due to SS trap @ 0xffff800010148754
[2]kdb> ss
Entering kdb (current=0xffff0000fa800040, pid 264) on processor 2 due to SS trap @ 0xffff800010148758
[2]kdb>
* kgdb:
(gdb) target remote 127.1:23002
Remote debugging using 127.1:23002
arch_kgdb_breakpoint () at /home/liwei/main_code/linux/arch/arm64/include/asm/kgdb.h:21
21 asm ("brk %0" : : "I" (KGDB_COMPILED_DBG_BRK_IMM));
(gdb) b printk
Breakpoint 1 at 0xffff80001014874c: file /home/liwei/main_code/linux/kernel/printk/printk.c, line 2076.
(gdb) c
Continuing.
[New Thread 281]
[New Thread 280]
[Switching to Thread 281]
Thread 174 hit Breakpoint 1, printk (fmt=0xffff8000112fc138 "\001\066sysrq: HELP : ")
at /home/liwei/main_code/linux/kernel/printk/printk.c:2076
2076 {
(gdb) stepi
0xffff800010148750 2076 {
(gdb) stepi
2080 va_start(args, fmt);
(gdb) stepi
0xffff800010148758 2080 va_start(args, fmt);
(gdb)
Fixes: 44679a4f142b ("arm64: KGDB: Add step debugging support")
Signed-off-by: Wei Li <[email protected]>
---
arch/arm64/include/asm/debug-monitors.h | 2 ++
arch/arm64/kernel/debug-monitors.c | 2 +-
arch/arm64/kernel/kgdb.c | 3 ++-
3 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/debug-monitors.h b/arch/arm64/include/asm/debug-monitors.h
index b62469f3475b..a48b507c89ee 100644
--- a/arch/arm64/include/asm/debug-monitors.h
+++ b/arch/arm64/include/asm/debug-monitors.h
@@ -78,6 +78,8 @@ struct step_hook {
int (*fn)(struct pt_regs *regs, unsigned int esr);
};
+void set_regs_spsr_ss(struct pt_regs *regs);
+
void register_user_step_hook(struct step_hook *hook);
void unregister_user_step_hook(struct step_hook *hook);
diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
index 25ce6b5a52d2..7a58233677de 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -141,7 +141,7 @@ postcore_initcall(debug_monitors_init);
/*
* Single step API and exception handling.
*/
-static void set_regs_spsr_ss(struct pt_regs *regs)
+void set_regs_spsr_ss(struct pt_regs *regs)
{
regs->pstate |= DBG_SPSR_SS;
}
diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index 3910ac06c261..093ad9d2e5e6 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -230,7 +230,8 @@ int kgdb_arch_handle_exception(int exception_vector, int signo,
kernel_prepare_single_step(&per_cpu(kgdb_ss_flags,
raw_smp_processor_id()), linux_regs);
kernel_enable_single_step(linux_regs);
- }
+ } else
+ set_regs_spsr_ss(linux_regs);
err = 0;
break;
--
2.17.1
After the single-step exception handling oops is fixed, when we execute
single-step in kdb/kgdb, we may see it jumps to the irq_handler (where
PSTATE.D is cleared) instead of the next instruction.
Add the prepare and cleanup work for single-step when enabling and
disabling to maintain the PSTATE.I and PSTATE.D carefully.
Before this patch:
* kdb:
Entering kdb (current=0xffff8000119e2dc0, pid 0) on processor 0 due to Keyboard Entry
[0]kdb> bp printk
Instruction(i) BP #0 at 0xffff8000101486cc (printk)
is enabled addr at ffff8000101486cc, hardtype=0 installed=0
[0]kdb> g
/ # echo h > /proc/sysrq-trigger
Entering kdb (current=0xffff0000fada65c0, pid 267) on processor 0 due to Breakpoint @ 0xffff8000101486cc
[0]kdb> ss
Entering kdb (current=0xffff0000fada65c0, pid 267) on processor 0 due to SS trap @ 0xffff800010082ab8
[0]kdb> 0xffff800010082ab8
0xffff800010082ab8 = 0xffff800010082ab8 (el1_irq+0x78)
[0]kdb>
0xffff800010082ab0 <+112>: nop
0xffff800010082ab4 <+116>: msr daifclr, #0xd
0xffff800010082ab8 <+120>: adrp x1, 0xffff8000113a7000 <cpu_ops+1288>
0xffff800010082abc <+124>: ldr x1, [x1, #1408]
* kgdb:
(gdb) target remote 127.1:23002
Remote debugging using 127.1:23002
arch_kgdb_breakpoint () at /home/liwei/main_code/linux/arch/arm64/include/asm/kgdb.h:21
21 asm ("brk %0" : : "I" (KGDB_COMPILED_DBG_BRK_IMM));
(gdb) b printk
Breakpoint 1 at 0xffff8000101486cc: file /home/liwei/main_code/linux/kernel/printk/printk.c, line 2076.
(gdb) c
Continuing.
[New Thread 287]
[Switching to Thread 283]
Thread 177 hit Breakpoint 1, printk (fmt=0xffff80001130c9d8 "\001\066sysrq: HELP : ")
at /home/liwei/main_code/linux/kernel/printk/printk.c:2076
2076 {
(gdb) stepi
el1_irq () at /home/liwei/main_code/linux/arch/arm64/kernel/entry.S:608
608 irq_handler
(gdb)
After this patch:
* kdb:
Entering kdb (current=0xffff8000119d2dc0, pid 0) on processor 0 due to Keyboard Entry
[0]kdb> bp printk
Instruction(i) BP #0 at 0xffff80001014874c (printk)
is enabled addr at ffff80001014874c, hardtype=0 installed=0
[0]kdb> g
/ # echo h > /proc/sysrq-trigger
Entering kdb (current=0xffff0000fa6948c0, pid 265) on processor 0 due to Breakpoint @ 0xffff80001014874c
[0]kdb> ss
Entering kdb (current=0xffff0000fa6948c0, pid 265) on processor 0 due to SS trap @ 0xffff800010148750
[0]kdb>
* kgdb:
(gdb) target remote 127.1:23002
Remote debugging using 127.1:23002
arch_kgdb_breakpoint () at /home/liwei/main_code/linux/arch/arm64/include/asm/kgdb.h:21
21 asm ("brk %0" : : "I" (KGDB_COMPILED_DBG_BRK_IMM));
(gdb) b printk
Breakpoint 1 at 0xffff80001014874c: file /home/liwei/main_code/linux/kernel/printk/printk.c, line 2076.
(gdb) c
Continuing.
[New Thread 277]
[Switching to Thread 276]
Thread 171 hit Breakpoint 1, printk (fmt=0xffff8000112fc130 "\001\066sysrq: HELP : ")
at /home/liwei/main_code/linux/kernel/printk/printk.c:2076
2076 {
(gdb) stepi
0xffff800010148750 2076 {
(gdb)
Fixes: 44679a4f142b ("arm64: KGDB: Add step debugging support")
Signed-off-by: Wei Li <[email protected]>
---
arch/arm64/kernel/kgdb.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index 1a157ca33262..3910ac06c261 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -100,6 +100,8 @@ struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = {
{ "fpcr", 4, -1 },
};
+static DEFINE_PER_CPU(unsigned long, kgdb_ss_flags);
+
char *dbg_get_reg(int regno, void *mem, struct pt_regs *regs)
{
if (regno >= DBG_MAX_REG_NUM || regno < 0)
@@ -200,8 +202,11 @@ int kgdb_arch_handle_exception(int exception_vector, int signo,
/*
* Received continue command, disable single step
*/
- if (kernel_active_single_step())
+ if (kernel_active_single_step()) {
+ kernel_cleanup_single_step(per_cpu(kgdb_ss_flags,
+ raw_smp_processor_id()), linux_regs);
kernel_disable_single_step();
+ }
err = 0;
break;
@@ -221,8 +226,12 @@ int kgdb_arch_handle_exception(int exception_vector, int signo,
/*
* Enable single step handling
*/
- if (!kernel_active_single_step())
+ if (!kernel_active_single_step()) {
+ kernel_prepare_single_step(&per_cpu(kgdb_ss_flags,
+ raw_smp_processor_id()), linux_regs);
kernel_enable_single_step(linux_regs);
+ }
+
err = 0;
break;
default:
--
2.17.1
Hi Wei,
On Sun, 10 May 2020 05:41:57 +0800
Wei Li <[email protected]> wrote:
> PSTATE.I and PSTATE.D are very important for single-step working.
>
> Without disabling interrupt on local CPU, there is a chance of
> interrupt occurrence in the period of exception return and start of
> out-of-line single-step, that result in wrongly single stepping
> into the interrupt handler. And if D bit is set then, it results into
> undefined exception and when it's handler enables dbg then single-step
> exception is generated, not as expected.
>
> As they are maintained well in kprobes_save_local_irqflag() and
> kprobes_restore_local_irqflag() for kprobe module, extract them as
> kernel_prepare_single_step() and kernel_cleanup_single_step() for
> general use.
Agreed, these prepare/cleanup should be generic.
Reviewed-by: Masami Hiramatsu <[email protected]>
Thank you!
>
> Signed-off-by: Wei Li <[email protected]>
> ---
> arch/arm64/include/asm/debug-monitors.h | 4 ++++
> arch/arm64/kernel/debug-monitors.c | 26 +++++++++++++++++++++++
> arch/arm64/kernel/probes/kprobes.c | 28 ++-----------------------
> 3 files changed, 32 insertions(+), 26 deletions(-)
>
> diff --git a/arch/arm64/include/asm/debug-monitors.h b/arch/arm64/include/asm/debug-monitors.h
> index 7619f473155f..b62469f3475b 100644
> --- a/arch/arm64/include/asm/debug-monitors.h
> +++ b/arch/arm64/include/asm/debug-monitors.h
> @@ -113,6 +113,10 @@ void user_fastforward_single_step(struct task_struct *task);
> void kernel_enable_single_step(struct pt_regs *regs);
> void kernel_disable_single_step(void);
> int kernel_active_single_step(void);
> +void kernel_prepare_single_step(unsigned long *flags,
> + struct pt_regs *regs);
> +void kernel_cleanup_single_step(unsigned long flags,
> + struct pt_regs *regs);
>
> #ifdef CONFIG_HAVE_HW_BREAKPOINT
> int reinstall_suspended_bps(struct pt_regs *regs);
> diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
> index 48222a4760c2..25ce6b5a52d2 100644
> --- a/arch/arm64/kernel/debug-monitors.c
> +++ b/arch/arm64/kernel/debug-monitors.c
> @@ -429,6 +429,32 @@ int kernel_active_single_step(void)
> }
> NOKPROBE_SYMBOL(kernel_active_single_step);
>
> +/*
> + * Interrupts need to be disabled before single-step mode is set, and not
> + * reenabled until after single-step mode ends.
> + * Without disabling interrupt on local CPU, there is a chance of
> + * interrupt occurrence in the period of exception return and start of
> + * out-of-line single-step, that result in wrongly single stepping
> + * into the interrupt handler.
> + */
> +void kernel_prepare_single_step(unsigned long *flags,
> + struct pt_regs *regs)
> +{
> + *flags = regs->pstate & DAIF_MASK;
> + regs->pstate |= PSR_I_BIT;
> + /* Unmask PSTATE.D for enabling software step exceptions. */
> + regs->pstate &= ~PSR_D_BIT;
> +}
> +NOKPROBE_SYMBOL(kernel_prepare_single_step);
> +
> +void kernel_cleanup_single_step(unsigned long flags,
> + struct pt_regs *regs)
> +{
> + regs->pstate &= ~DAIF_MASK;
> + regs->pstate |= flags;
> +}
> +NOKPROBE_SYMBOL(kernel_cleanup_single_step);
> +
> /* ptrace API */
> void user_enable_single_step(struct task_struct *task)
> {
> diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
> index d1c95dcf1d78..c655b6b543e3 100644
> --- a/arch/arm64/kernel/probes/kprobes.c
> +++ b/arch/arm64/kernel/probes/kprobes.c
> @@ -168,30 +168,6 @@ static void __kprobes set_current_kprobe(struct kprobe *p)
> __this_cpu_write(current_kprobe, p);
> }
>
> -/*
> - * Interrupts need to be disabled before single-step mode is set, and not
> - * reenabled until after single-step mode ends.
> - * Without disabling interrupt on local CPU, there is a chance of
> - * interrupt occurrence in the period of exception return and start of
> - * out-of-line single-step, that result in wrongly single stepping
> - * into the interrupt handler.
> - */
> -static void __kprobes kprobes_save_local_irqflag(struct kprobe_ctlblk *kcb,
> - struct pt_regs *regs)
> -{
> - kcb->saved_irqflag = regs->pstate & DAIF_MASK;
> - regs->pstate |= PSR_I_BIT;
> - /* Unmask PSTATE.D for enabling software step exceptions. */
> - regs->pstate &= ~PSR_D_BIT;
> -}
> -
> -static void __kprobes kprobes_restore_local_irqflag(struct kprobe_ctlblk *kcb,
> - struct pt_regs *regs)
> -{
> - regs->pstate &= ~DAIF_MASK;
> - regs->pstate |= kcb->saved_irqflag;
> -}
> -
> static void __kprobes
> set_ss_context(struct kprobe_ctlblk *kcb, unsigned long addr)
> {
> @@ -227,7 +203,7 @@ static void __kprobes setup_singlestep(struct kprobe *p,
> set_ss_context(kcb, slot); /* mark pending ss */
>
> /* IRQs and single stepping do not mix well. */
> - kprobes_save_local_irqflag(kcb, regs);
> + kernel_prepare_single_step(&kcb->saved_irqflag, regs);
> kernel_enable_single_step(regs);
> instruction_pointer_set(regs, slot);
> } else {
> @@ -414,7 +390,7 @@ kprobe_single_step_handler(struct pt_regs *regs, unsigned int esr)
> retval = kprobe_ss_hit(kcb, instruction_pointer(regs));
>
> if (retval == DBG_HOOK_HANDLED) {
> - kprobes_restore_local_irqflag(kcb, regs);
> + kernel_cleanup_single_step(kcb->saved_irqflag, regs);
> kernel_disable_single_step();
>
> post_kprobe_handler(kcb, regs);
> --
> 2.17.1
>
--
Masami Hiramatsu <[email protected]>
Hi,
On Sat, May 9, 2020 at 6:49 AM Wei Li <[email protected]> wrote:
>
> PSTATE.I and PSTATE.D are very important for single-step working.
>
> Without disabling interrupt on local CPU, there is a chance of
> interrupt occurrence in the period of exception return and start of
> out-of-line single-step, that result in wrongly single stepping
> into the interrupt handler. And if D bit is set then, it results into
> undefined exception and when it's handler enables dbg then single-step
> exception is generated, not as expected.
>
> As they are maintained well in kprobes_save_local_irqflag() and
> kprobes_restore_local_irqflag() for kprobe module, extract them as
> kernel_prepare_single_step() and kernel_cleanup_single_step() for
> general use.
>
> Signed-off-by: Wei Li <[email protected]>
> ---
> arch/arm64/include/asm/debug-monitors.h | 4 ++++
> arch/arm64/kernel/debug-monitors.c | 26 +++++++++++++++++++++++
> arch/arm64/kernel/probes/kprobes.c | 28 ++-----------------------
> 3 files changed, 32 insertions(+), 26 deletions(-)
>
> diff --git a/arch/arm64/include/asm/debug-monitors.h b/arch/arm64/include/asm/debug-monitors.h
> index 7619f473155f..b62469f3475b 100644
> --- a/arch/arm64/include/asm/debug-monitors.h
> +++ b/arch/arm64/include/asm/debug-monitors.h
> @@ -113,6 +113,10 @@ void user_fastforward_single_step(struct task_struct *task);
> void kernel_enable_single_step(struct pt_regs *regs);
> void kernel_disable_single_step(void);
> int kernel_active_single_step(void);
> +void kernel_prepare_single_step(unsigned long *flags,
> + struct pt_regs *regs);
> +void kernel_cleanup_single_step(unsigned long flags,
> + struct pt_regs *regs);
>
> #ifdef CONFIG_HAVE_HW_BREAKPOINT
> int reinstall_suspended_bps(struct pt_regs *regs);
> diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
> index 48222a4760c2..25ce6b5a52d2 100644
> --- a/arch/arm64/kernel/debug-monitors.c
> +++ b/arch/arm64/kernel/debug-monitors.c
> @@ -429,6 +429,32 @@ int kernel_active_single_step(void)
> }
> NOKPROBE_SYMBOL(kernel_active_single_step);
>
> +/*
> + * Interrupts need to be disabled before single-step mode is set, and not
> + * reenabled until after single-step mode ends.
> + * Without disabling interrupt on local CPU, there is a chance of
> + * interrupt occurrence in the period of exception return and start of
> + * out-of-line single-step, that result in wrongly single stepping
> + * into the interrupt handler.
> + */
> +void kernel_prepare_single_step(unsigned long *flags,
> + struct pt_regs *regs)
> +{
> + *flags = regs->pstate & DAIF_MASK;
> + regs->pstate |= PSR_I_BIT;
> + /* Unmask PSTATE.D for enabling software step exceptions. */
> + regs->pstate &= ~PSR_D_BIT;
> +}
> +NOKPROBE_SYMBOL(kernel_prepare_single_step);
nit: why not just return unsigned long rather than passing by reference?
> +
> +void kernel_cleanup_single_step(unsigned long flags,
> + struct pt_regs *regs)
> +{
> + regs->pstate &= ~DAIF_MASK;
> + regs->pstate |= flags;
> +}
> +NOKPROBE_SYMBOL(kernel_cleanup_single_step);
> +
> /* ptrace API */
> void user_enable_single_step(struct task_struct *task)
> {
> diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
> index d1c95dcf1d78..c655b6b543e3 100644
> --- a/arch/arm64/kernel/probes/kprobes.c
> +++ b/arch/arm64/kernel/probes/kprobes.c
> @@ -168,30 +168,6 @@ static void __kprobes set_current_kprobe(struct kprobe *p)
> __this_cpu_write(current_kprobe, p);
> }
>
> -/*
> - * Interrupts need to be disabled before single-step mode is set, and not
> - * reenabled until after single-step mode ends.
> - * Without disabling interrupt on local CPU, there is a chance of
> - * interrupt occurrence in the period of exception return and start of
> - * out-of-line single-step, that result in wrongly single stepping
> - * into the interrupt handler.
> - */
> -static void __kprobes kprobes_save_local_irqflag(struct kprobe_ctlblk *kcb,
> - struct pt_regs *regs)
> -{
> - kcb->saved_irqflag = regs->pstate & DAIF_MASK;
> - regs->pstate |= PSR_I_BIT;
> - /* Unmask PSTATE.D for enabling software step exceptions. */
> - regs->pstate &= ~PSR_D_BIT;
> -}
> -
> -static void __kprobes kprobes_restore_local_irqflag(struct kprobe_ctlblk *kcb,
> - struct pt_regs *regs)
> -{
> - regs->pstate &= ~DAIF_MASK;
> - regs->pstate |= kcb->saved_irqflag;
> -}
> -
> static void __kprobes
> set_ss_context(struct kprobe_ctlblk *kcb, unsigned long addr)
> {
> @@ -227,7 +203,7 @@ static void __kprobes setup_singlestep(struct kprobe *p,
> set_ss_context(kcb, slot); /* mark pending ss */
>
> /* IRQs and single stepping do not mix well. */
> - kprobes_save_local_irqflag(kcb, regs);
> + kernel_prepare_single_step(&kcb->saved_irqflag, regs);
Is there some reason to have two functions? It seems like every time
you call kernel_enable_single_step() you'd want to call
kernel_prepare_single_step(). ...and every time you call
kernel_disable_single_step() you'd want to call
kernel_cleanup_single_step().
Maybe you can just add the flags parameter to
kernel_enable_single_step() / kernel_disable_single_step() and put the
code in there?
-Doug
Hi,
On Sat, May 9, 2020 at 6:49 AM Wei Li <[email protected]> wrote:
>
> After entering kdb due to breakpoint, when we execute 'ss' or 'go' (will
> delay installing breakpoints, do single-step first), it won't work
> correctly, and it will enter kdb due to oops.
>
> It's because the reason gotten in kdb_stub() is not as expected, and it
> seems that the ex_vector for single-step should be 0, like what arch
> powerpc/sh/parisc has implemented.
>
> Before the patch:
> Entering kdb (current=0xffff8000119e2dc0, pid 0) on processor 0 due to Keyboard Entry
> [0]kdb> bp printk
> Instruction(i) BP #0 at 0xffff8000101486cc (printk)
> is enabled addr at ffff8000101486cc, hardtype=0 installed=0
>
> [0]kdb> g
>
> / # echo h > /proc/sysrq-trigger
>
> Entering kdb (current=0xffff0000fa878040, pid 266) on processor 3 due to Breakpoint @ 0xffff8000101486cc
> [3]kdb> ss
>
> Entering kdb (current=0xffff0000fa878040, pid 266) on processor 3 Oops: (null)
> due to oops @ 0xffff800010082ab8
> CPU: 3 PID: 266 Comm: sh Not tainted 5.7.0-rc4-13839-gf0e5ad491718 #6
> Hardware name: linux,dummy-virt (DT)
> pstate: 00000085 (nzcv daIf -PAN -UAO)
> pc : el1_irq+0x78/0x180
> lr : __handle_sysrq+0x80/0x190
> sp : ffff800015003bf0
> x29: ffff800015003d20 x28: ffff0000fa878040
> x27: 0000000000000000 x26: ffff80001126b1f0
> x25: ffff800011b6a0d8 x24: 0000000000000000
> x23: 0000000080200005 x22: ffff8000101486cc
> x21: ffff800015003d30 x20: 0000ffffffffffff
> x19: ffff8000119f2000 x18: 0000000000000000
> x17: 0000000000000000 x16: 0000000000000000
> x15: 0000000000000000 x14: 0000000000000000
> x13: 0000000000000000 x12: 0000000000000000
> x11: 0000000000000000 x10: 0000000000000000
> x9 : 0000000000000000 x8 : ffff800015003e50
> x7 : 0000000000000002 x6 : 00000000380b9990
> x5 : ffff8000106e99e8 x4 : ffff0000fadd83c0
> x3 : 0000ffffffffffff x2 : ffff800011b6a0d8
> x1 : ffff800011b6a000 x0 : ffff80001130c9d8
> Call trace:
> el1_irq+0x78/0x180
> printk+0x0/0x84
> write_sysrq_trigger+0xb0/0x118
> proc_reg_write+0xb4/0xe0
> __vfs_write+0x18/0x40
> vfs_write+0xb0/0x1b8
> ksys_write+0x64/0xf0
> __arm64_sys_write+0x14/0x20
> el0_svc_common.constprop.2+0xb0/0x168
> do_el0_svc+0x20/0x98
> el0_sync_handler+0xec/0x1a8
> el0_sync+0x140/0x180
>
> [3]kdb>
>
> After the patch:
> Entering kdb (current=0xffff8000119e2dc0, pid 0) on processor 0 due to Keyboard Entry
> [0]kdb> bp printk
> Instruction(i) BP #0 at 0xffff8000101486cc (printk)
> is enabled addr at ffff8000101486cc, hardtype=0 installed=0
>
> [0]kdb> g
>
> / # echo h > /proc/sysrq-trigger
>
> Entering kdb (current=0xffff0000fa852bc0, pid 268) on processor 0 due to Breakpoint @ 0xffff8000101486cc
> [0]kdb> g
>
> Entering kdb (current=0xffff0000fa852bc0, pid 268) on processor 0 due to Breakpoint @ 0xffff8000101486cc
> [0]kdb> ss
>
> Entering kdb (current=0xffff0000fa852bc0, pid 268) on processor 0 due to SS trap @ 0xffff800010082ab8
> [0]kdb>
>
> Fixes: 44679a4f142b ("arm64: KGDB: Add step debugging support")
> Signed-off-by: Wei Li <[email protected]>
> ---
> arch/arm64/kernel/kgdb.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
As far as I can tell this looks right. Specifically in kdb_stub() I
see that it needs "ks->ex_vector == 0" to get "reason" as
KDB_REASON_SSTEP.
Reviewed-by: Douglas Anderson <[email protected]>
I did really basic testing of this series as a whole and things that
used to be broken for me around stepping / going past breakpoints are
now fixed. I'll keep using it more, but it definitely feels like it
makes things happier.
Tested-by: Douglas Anderson <[email protected]>
-Doug
Hi,
On Sat, May 9, 2020 at 6:49 AM Wei Li <[email protected]> wrote:
>
> After the single-step exception handling oops is fixed, when we execute
> single-step in kdb/kgdb, we may see it jumps to the irq_handler (where
> PSTATE.D is cleared) instead of the next instruction.
>
> Add the prepare and cleanup work for single-step when enabling and
> disabling to maintain the PSTATE.I and PSTATE.D carefully.
>
> Before this patch:
> * kdb:
> Entering kdb (current=0xffff8000119e2dc0, pid 0) on processor 0 due to Keyboard Entry
> [0]kdb> bp printk
> Instruction(i) BP #0 at 0xffff8000101486cc (printk)
> is enabled addr at ffff8000101486cc, hardtype=0 installed=0
>
> [0]kdb> g
>
> / # echo h > /proc/sysrq-trigger
>
> Entering kdb (current=0xffff0000fada65c0, pid 267) on processor 0 due to Breakpoint @ 0xffff8000101486cc
> [0]kdb> ss
>
> Entering kdb (current=0xffff0000fada65c0, pid 267) on processor 0 due to SS trap @ 0xffff800010082ab8
> [0]kdb> 0xffff800010082ab8
> 0xffff800010082ab8 = 0xffff800010082ab8 (el1_irq+0x78)
> [0]kdb>
>
> 0xffff800010082ab0 <+112>: nop
> 0xffff800010082ab4 <+116>: msr daifclr, #0xd
> 0xffff800010082ab8 <+120>: adrp x1, 0xffff8000113a7000 <cpu_ops+1288>
> 0xffff800010082abc <+124>: ldr x1, [x1, #1408]
>
> * kgdb:
> (gdb) target remote 127.1:23002
> Remote debugging using 127.1:23002
> arch_kgdb_breakpoint () at /home/liwei/main_code/linux/arch/arm64/include/asm/kgdb.h:21
> 21 asm ("brk %0" : : "I" (KGDB_COMPILED_DBG_BRK_IMM));
> (gdb) b printk
> Breakpoint 1 at 0xffff8000101486cc: file /home/liwei/main_code/linux/kernel/printk/printk.c, line 2076.
> (gdb) c
> Continuing.
> [New Thread 287]
> [Switching to Thread 283]
>
> Thread 177 hit Breakpoint 1, printk (fmt=0xffff80001130c9d8 "\001\066sysrq: HELP : ")
> at /home/liwei/main_code/linux/kernel/printk/printk.c:2076
> 2076 {
> (gdb) stepi
> el1_irq () at /home/liwei/main_code/linux/arch/arm64/kernel/entry.S:608
> 608 irq_handler
> (gdb)
>
> After this patch:
> * kdb:
> Entering kdb (current=0xffff8000119d2dc0, pid 0) on processor 0 due to Keyboard Entry
> [0]kdb> bp printk
> Instruction(i) BP #0 at 0xffff80001014874c (printk)
> is enabled addr at ffff80001014874c, hardtype=0 installed=0
>
> [0]kdb> g
>
> / # echo h > /proc/sysrq-trigger
>
> Entering kdb (current=0xffff0000fa6948c0, pid 265) on processor 0 due to Breakpoint @ 0xffff80001014874c
> [0]kdb> ss
>
> Entering kdb (current=0xffff0000fa6948c0, pid 265) on processor 0 due to SS trap @ 0xffff800010148750
> [0]kdb>
>
> * kgdb:
> (gdb) target remote 127.1:23002
> Remote debugging using 127.1:23002
> arch_kgdb_breakpoint () at /home/liwei/main_code/linux/arch/arm64/include/asm/kgdb.h:21
> 21 asm ("brk %0" : : "I" (KGDB_COMPILED_DBG_BRK_IMM));
> (gdb) b printk
> Breakpoint 1 at 0xffff80001014874c: file /home/liwei/main_code/linux/kernel/printk/printk.c, line 2076.
> (gdb) c
> Continuing.
> [New Thread 277]
> [Switching to Thread 276]
>
> Thread 171 hit Breakpoint 1, printk (fmt=0xffff8000112fc130 "\001\066sysrq: HELP : ")
> at /home/liwei/main_code/linux/kernel/printk/printk.c:2076
> 2076 {
> (gdb) stepi
> 0xffff800010148750 2076 {
> (gdb)
>
> Fixes: 44679a4f142b ("arm64: KGDB: Add step debugging support")
> Signed-off-by: Wei Li <[email protected]>
> ---
> arch/arm64/kernel/kgdb.c | 13 +++++++++++--
> 1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
> index 1a157ca33262..3910ac06c261 100644
> --- a/arch/arm64/kernel/kgdb.c
> +++ b/arch/arm64/kernel/kgdb.c
> @@ -100,6 +100,8 @@ struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = {
> { "fpcr", 4, -1 },
> };
>
> +static DEFINE_PER_CPU(unsigned long, kgdb_ss_flags);
> +
> char *dbg_get_reg(int regno, void *mem, struct pt_regs *regs)
> {
> if (regno >= DBG_MAX_REG_NUM || regno < 0)
> @@ -200,8 +202,11 @@ int kgdb_arch_handle_exception(int exception_vector, int signo,
> /*
> * Received continue command, disable single step
> */
> - if (kernel_active_single_step())
> + if (kernel_active_single_step()) {
> + kernel_cleanup_single_step(per_cpu(kgdb_ss_flags,
> + raw_smp_processor_id()), linux_regs);
I don't think you need the raw_ version of smp_processor_id(), do you?
-Doug
Hi,
On Sat, May 9, 2020 at 6:49 AM Wei Li <[email protected]> wrote:
>
> After fixing wrongly single-stepping into the irq handler, when we execute
> single-step in kdb/kgdb, we can see only the first step can work.
>
> Refer to the ARM Architecture Reference Manual (ARM DDI 0487E.a) D2.12,
> i think PSTATE.SS=1 should be set each step for transferring the PE to the
> 'Active-not-pending' state. The problem here is PSTATE.SS=1 is not set
> since the second single-step.
>
> After the first single-step, the PE transferes to the 'Inactive' state,
> with PSTATE.SS=0 and MDSCR.SS=1, thus PSTATE.SS won't be set to 1 due to
> kernel_active_single_step()=true. Then the PE transferes to the
> 'Active-pending' state when ERET and returns to the debugger by step
> exception.
>
> Before this patch:
> * kdb:
> Entering kdb (current=0xffff8000119d2dc0, pid 0) on processor 0 due to Keyboard Entry
> [0]kdb> bp printk
> Instruction(i) BP #0 at 0xffff80001014874c (printk)
> is enabled addr at ffff80001014874c, hardtype=0 installed=0
>
> [0]kdb> g
>
> / # echo h > /proc/sysrq-trigger
>
> Entering kdb (current=0xffff0000fa6948c0, pid 265) on processor 3 due to Breakpoint @ 0xffff80001014874c
> [3]kdb> ss
>
> Entering kdb (current=0xffff0000fa6948c0, pid 265) on processor 3 due to SS trap @ 0xffff800010148750
> [3]kdb> ss
>
> Entering kdb (current=0xffff0000fa6948c0, pid 265) on processor 3 due to SS trap @ 0xffff800010148750
> [3]kdb> ss
>
> Entering kdb (current=0xffff0000fa6948c0, pid 265) on processor 3 due to SS trap @ 0xffff800010148750
> [3]kdb>
>
> * kgdb:
> (gdb) target remote 127.1:23002
> Remote debugging using 127.1:23002
> arch_kgdb_breakpoint () at /home/liwei/main_code/linux/arch/arm64/include/asm/kgdb.h:21
> 21 asm ("brk %0" : : "I" (KGDB_COMPILED_DBG_BRK_IMM));
> (gdb) b printk
> Breakpoint 1 at 0xffff80001014874c: file /home/liwei/main_code/linux/kernel/printk/printk.c, line 2076.
> (gdb) c
> Continuing.
> [New Thread 277]
> [Switching to Thread 276]
>
> Thread 171 hit Breakpoint 1, printk (fmt=0xffff8000112fc130 "\001\066sysrq: HELP : ")
> at /home/liwei/main_code/linux/kernel/printk/printk.c:2076
> 2076 {
> (gdb) stepi
> 0xffff800010148750 2076 {
> (gdb) stepi
> 0xffff800010148750 2076 {
> (gdb) stepi
> 0xffff800010148750 2076 {
> (gdb)
>
> After this patch:
> * kdb:
> Entering kdb (current=0xffff8000119d2dc0, pid 0) on processor 0 due to Keyboard Entry
> [0]kdb> bp printk
> Instruction(i) BP #0 at 0xffff80001014874c (printk)
> is enabled addr at ffff80001014874c, hardtype=0 installed=0
>
> [0]kdb> g
>
> / # echo h > /proc/sysrq-trigger
>
> Entering kdb (current=0xffff0000fa800040, pid 264) on processor 2 due to Breakpoint @ 0xffff80001014874c
> [2]kdb> ss
>
> Entering kdb (current=0xffff0000fa800040, pid 264) on processor 2 due to SS trap @ 0xffff800010148750
> [2]kdb> ss
>
> Entering kdb (current=0xffff0000fa800040, pid 264) on processor 2 due to SS trap @ 0xffff800010148754
> [2]kdb> ss
>
> Entering kdb (current=0xffff0000fa800040, pid 264) on processor 2 due to SS trap @ 0xffff800010148758
> [2]kdb>
>
> * kgdb:
> (gdb) target remote 127.1:23002
> Remote debugging using 127.1:23002
> arch_kgdb_breakpoint () at /home/liwei/main_code/linux/arch/arm64/include/asm/kgdb.h:21
> 21 asm ("brk %0" : : "I" (KGDB_COMPILED_DBG_BRK_IMM));
> (gdb) b printk
> Breakpoint 1 at 0xffff80001014874c: file /home/liwei/main_code/linux/kernel/printk/printk.c, line 2076.
> (gdb) c
> Continuing.
> [New Thread 281]
> [New Thread 280]
> [Switching to Thread 281]
>
> Thread 174 hit Breakpoint 1, printk (fmt=0xffff8000112fc138 "\001\066sysrq: HELP : ")
> at /home/liwei/main_code/linux/kernel/printk/printk.c:2076
> 2076 {
> (gdb) stepi
> 0xffff800010148750 2076 {
> (gdb) stepi
> 2080 va_start(args, fmt);
> (gdb) stepi
> 0xffff800010148758 2080 va_start(args, fmt);
> (gdb)
>
> Fixes: 44679a4f142b ("arm64: KGDB: Add step debugging support")
> Signed-off-by: Wei Li <[email protected]>
> ---
> arch/arm64/include/asm/debug-monitors.h | 2 ++
> arch/arm64/kernel/debug-monitors.c | 2 +-
> arch/arm64/kernel/kgdb.c | 3 ++-
> 3 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/debug-monitors.h b/arch/arm64/include/asm/debug-monitors.h
> index b62469f3475b..a48b507c89ee 100644
> --- a/arch/arm64/include/asm/debug-monitors.h
> +++ b/arch/arm64/include/asm/debug-monitors.h
> @@ -78,6 +78,8 @@ struct step_hook {
> int (*fn)(struct pt_regs *regs, unsigned int esr);
> };
>
> +void set_regs_spsr_ss(struct pt_regs *regs);
> +
> void register_user_step_hook(struct step_hook *hook);
> void unregister_user_step_hook(struct step_hook *hook);
>
> diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
> index 25ce6b5a52d2..7a58233677de 100644
> --- a/arch/arm64/kernel/debug-monitors.c
> +++ b/arch/arm64/kernel/debug-monitors.c
> @@ -141,7 +141,7 @@ postcore_initcall(debug_monitors_init);
> /*
> * Single step API and exception handling.
> */
> -static void set_regs_spsr_ss(struct pt_regs *regs)
> +void set_regs_spsr_ss(struct pt_regs *regs)
> {
> regs->pstate |= DBG_SPSR_SS;
> }
> diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
> index 3910ac06c261..093ad9d2e5e6 100644
> --- a/arch/arm64/kernel/kgdb.c
> +++ b/arch/arm64/kernel/kgdb.c
> @@ -230,7 +230,8 @@ int kgdb_arch_handle_exception(int exception_vector, int signo,
> kernel_prepare_single_step(&per_cpu(kgdb_ss_flags,
> raw_smp_processor_id()), linux_regs);
> kernel_enable_single_step(linux_regs);
> - }
> + } else
> + set_regs_spsr_ss(linux_regs);
One slight nit is that my personal preference is that if one half of
an "if/else" needs braces then both halves should have braces. I
don't know what Catalin and Will's policies are, though.
Other than that, this seems right to me. I will leave it to the
Catalin and Will folks to say if they'd rather have this call made
from a different place or if they're happy with where you've put it.
Reviewed-by: Douglas Anderson <[email protected]>
Tested-by: Douglas Anderson <[email protected]>
Hi,
On Sat, May 9, 2020 at 6:49 AM Wei Li <[email protected]> wrote:
>
> This patch set is to fix several issues of single-step debugging
> in kgdb/kdb on arm64.
>
> It seems that these issues have been shelved a very long time,
> but i still hope to solve them, as the single-step debugging
> is an useful feature.
>
> Note:
> Based on patch "arm64: cacheflush: Fix KGDB trap detection",
> https://www.spinics.net/lists/arm-kernel/msg803741.html
>
> Wei Li (4):
> arm64: kgdb: Fix single-step exception handling oops
> arm64: Extract kprobes_save_local_irqflag() and
> kprobes_restore_local_irqflag()
> arm64: kgdb: Fix single-stepping into the irq handler wrongly
> arm64: kgdb: Set PSTATE.SS to 1 to reenable single-step
>
> arch/arm64/include/asm/debug-monitors.h | 6 ++++++
> arch/arm64/kernel/debug-monitors.c | 28 ++++++++++++++++++++++++-
> arch/arm64/kernel/kgdb.c | 16 +++++++++++---
> arch/arm64/kernel/probes/kprobes.c | 28 ++-----------------------
> 4 files changed, 48 insertions(+), 30 deletions(-)
Just an overall note that I'm very happy that you posted this patch
series! It's always been a thorn in my side that stepping and
breakpoints were so broken on arm64 and I'm really excited that you're
fixing them. Now I'll have to get in the habit of using kgdb for more
than just debugging crashes and maybe I can use it more for exploring
how functions work more. :-)
I'll also note that with your patch series I'm even seeing the "call"
feature of gdb working now. That has always been terribly broken for
me.
-Doug
Hi Douglas,
On 2020/5/14 8:34, Doug Anderson wrote:
> Hi,
>
> On Sat, May 9, 2020 at 6:49 AM Wei Li <[email protected]> wrote:
>>
>> This patch set is to fix several issues of single-step debugging
>> in kgdb/kdb on arm64.
>>
>> It seems that these issues have been shelved a very long time,
>> but i still hope to solve them, as the single-step debugging
>> is an useful feature.
>>
>> Note:
>> Based on patch "arm64: cacheflush: Fix KGDB trap detection",
>> https://www.spinics.net/lists/arm-kernel/msg803741.html
>>
>> Wei Li (4):
>> arm64: kgdb: Fix single-step exception handling oops
>> arm64: Extract kprobes_save_local_irqflag() and
>> kprobes_restore_local_irqflag()
>> arm64: kgdb: Fix single-stepping into the irq handler wrongly
>> arm64: kgdb: Set PSTATE.SS to 1 to reenable single-step
>>
>> arch/arm64/include/asm/debug-monitors.h | 6 ++++++
>> arch/arm64/kernel/debug-monitors.c | 28 ++++++++++++++++++++++++-
>> arch/arm64/kernel/kgdb.c | 16 +++++++++++---
>> arch/arm64/kernel/probes/kprobes.c | 28 ++-----------------------
>> 4 files changed, 48 insertions(+), 30 deletions(-)
>
> Just an overall note that I'm very happy that you posted this patch
> series! It's always been a thorn in my side that stepping and
> breakpoints were so broken on arm64 and I'm really excited that you're
> fixing them. Now I'll have to get in the habit of using kgdb for more
> than just debugging crashes and maybe I can use it more for exploring
> how functions work more. :-)
> > I'll also note that with your patch series I'm even seeing the "call"
> feature of gdb working now. That has always been terribly broken for
> me.
>
Thanks for reviewing and testing this series.
Enjoy exploring the kernel with kgdb/kdb! :-)
Thanks,
Wei
Hi Douglas,
On 2020/5/14 8:23, Doug Anderson wrote:
(SNIP)
>> diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
>> index 3910ac06c261..093ad9d2e5e6 100644
>> --- a/arch/arm64/kernel/kgdb.c
>> +++ b/arch/arm64/kernel/kgdb.c
>> @@ -230,7 +230,8 @@ int kgdb_arch_handle_exception(int exception_vector, int signo,
>> kernel_prepare_single_step(&per_cpu(kgdb_ss_flags,
>> raw_smp_processor_id()), linux_regs);
>> kernel_enable_single_step(linux_regs);
>> - }
>> + } else
>> + set_regs_spsr_ss(linux_regs);
>
> One slight nit is that my personal preference is that if one half of
> an "if/else" needs braces then both halves should have braces. I
Thanks for spotting it. Refer to Documentation/process/coding-style.rst,
i will fix it in the v2.
> don't know what Catalin and Will's policies are, though.
>
> Other than that, this seems right to me. I will leave it to the
> Catalin and Will folks to say if they'd rather have this call made
> from a different place or if they're happy with where you've put it.
>
> Reviewed-by: Douglas Anderson <[email protected]>
> Tested-by: Douglas Anderson <[email protected]>
>
Thanks,
Wei
Hi Douglas,
On 2020/5/14 8:21, Doug Anderson wrote:
(SNIP)
>> +/*
>> + * Interrupts need to be disabled before single-step mode is set, and not
>> + * reenabled until after single-step mode ends.
>> + * Without disabling interrupt on local CPU, there is a chance of
>> + * interrupt occurrence in the period of exception return and start of
>> + * out-of-line single-step, that result in wrongly single stepping
>> + * into the interrupt handler.
>> + */
>> +void kernel_prepare_single_step(unsigned long *flags,
>> + struct pt_regs *regs)
>> +{
>> + *flags = regs->pstate & DAIF_MASK;
>> + regs->pstate |= PSR_I_BIT;
>> + /* Unmask PSTATE.D for enabling software step exceptions. */
>> + regs->pstate &= ~PSR_D_BIT;
>> +}
>> +NOKPROBE_SYMBOL(kernel_prepare_single_step);
>
> nit: why not just return unsigned long rather than passing by reference?
Because i just extract this function from kprobes_save_local_irqflag(), i think
return unsigned long is fine.
>
>> +
>> +void kernel_cleanup_single_step(unsigned long flags,
>> + struct pt_regs *regs)
>> +{
>> + regs->pstate &= ~DAIF_MASK;
>> + regs->pstate |= flags;
>> +}
>> +NOKPROBE_SYMBOL(kernel_cleanup_single_step);
>> +
>> /* ptrace API */
>> void user_enable_single_step(struct task_struct *task)
>> {
>> diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
>> index d1c95dcf1d78..c655b6b543e3 100644
>> --- a/arch/arm64/kernel/probes/kprobes.c
>> +++ b/arch/arm64/kernel/probes/kprobes.c
>> @@ -168,30 +168,6 @@ static void __kprobes set_current_kprobe(struct kprobe *p)
>> __this_cpu_write(current_kprobe, p);
>> }
>>
>> -/*
>> - * Interrupts need to be disabled before single-step mode is set, and not
>> - * reenabled until after single-step mode ends.
>> - * Without disabling interrupt on local CPU, there is a chance of
>> - * interrupt occurrence in the period of exception return and start of
>> - * out-of-line single-step, that result in wrongly single stepping
>> - * into the interrupt handler.
>> - */
>> -static void __kprobes kprobes_save_local_irqflag(struct kprobe_ctlblk *kcb,
>> - struct pt_regs *regs)
>> -{
>> - kcb->saved_irqflag = regs->pstate & DAIF_MASK;
>> - regs->pstate |= PSR_I_BIT;
>> - /* Unmask PSTATE.D for enabling software step exceptions. */
>> - regs->pstate &= ~PSR_D_BIT;
>> -}
>> -
>> -static void __kprobes kprobes_restore_local_irqflag(struct kprobe_ctlblk *kcb,
>> - struct pt_regs *regs)
>> -{
>> - regs->pstate &= ~DAIF_MASK;
>> - regs->pstate |= kcb->saved_irqflag;
>> -}
>> -
>> static void __kprobes
>> set_ss_context(struct kprobe_ctlblk *kcb, unsigned long addr)
>> {
>> @@ -227,7 +203,7 @@ static void __kprobes setup_singlestep(struct kprobe *p,
>> set_ss_context(kcb, slot); /* mark pending ss */
>>
>> /* IRQs and single stepping do not mix well. */
>> - kprobes_save_local_irqflag(kcb, regs);
>> + kernel_prepare_single_step(&kcb->saved_irqflag, regs);
>
> Is there some reason to have two functions? It seems like every time
> you call kernel_enable_single_step() you'd want to call
> kernel_prepare_single_step(). ...and every time you call
> kernel_disable_single_step() you'd want to call
> kernel_cleanup_single_step().
>
> Maybe you can just add the flags parameter to
> kernel_enable_single_step() / kernel_disable_single_step() and put the
> code in there?
>
As kernel_enable_single_step() / kernel_disable_single_step() are also called in
breakpoint_handler() and watchpoint_handler(), i am not sure it's a right thing
to put the daif flag prepare/cleanup into them, especially we don't have a context
to save the flags.
Thanks,
Wei
Hi,
On Sat, May 16, 2020 at 1:47 AM liwei (GF) <[email protected]> wrote:
>
> >> - kprobes_save_local_irqflag(kcb, regs);
> >> + kernel_prepare_single_step(&kcb->saved_irqflag, regs);
> >
> > Is there some reason to have two functions? It seems like every time
> > you call kernel_enable_single_step() you'd want to call
> > kernel_prepare_single_step(). ...and every time you call
> > kernel_disable_single_step() you'd want to call
> > kernel_cleanup_single_step().
> >
> > Maybe you can just add the flags parameter to
> > kernel_enable_single_step() / kernel_disable_single_step() and put the
> > code in there?
> >
>
> As kernel_enable_single_step() / kernel_disable_single_step() are also called in
> breakpoint_handler() and watchpoint_handler(), i am not sure it's a right thing
> to put the daif flag prepare/cleanup into them, especially we don't have a context
> to save the flags.
I think you misunderstood what I was suggesting. Maybe better with
examples? I was suggesting doing this:
kcb->saved_irqflag = kernel_enable_single_step(regs);
...
kernel_disable_single_step(kcb->saved_irqflag, regs);
To me that seems better than what you have now:
kcb->saved_irqflag = kernel_prepare_single_step(regs);
kernel_enable_single_step(regs);
...
kernel_cleanup_single_step(kcb->saved_irqflag, regs);
kernel_disable_single_step();
...or am I confused?
-Doug
On Sat, 16 May 2020 09:17:21 -0700
Doug Anderson <[email protected]> wrote:
> Hi,
>
> On Sat, May 16, 2020 at 1:47 AM liwei (GF) <[email protected]> wrote:
> >
> > >> - kprobes_save_local_irqflag(kcb, regs);
> > >> + kernel_prepare_single_step(&kcb->saved_irqflag, regs);
> > >
> > > Is there some reason to have two functions? It seems like every time
> > > you call kernel_enable_single_step() you'd want to call
> > > kernel_prepare_single_step(). ...and every time you call
> > > kernel_disable_single_step() you'd want to call
> > > kernel_cleanup_single_step().
> > >
> > > Maybe you can just add the flags parameter to
> > > kernel_enable_single_step() / kernel_disable_single_step() and put the
> > > code in there?
> > >
> >
> > As kernel_enable_single_step() / kernel_disable_single_step() are also called in
> > breakpoint_handler() and watchpoint_handler(), i am not sure it's a right thing
> > to put the daif flag prepare/cleanup into them, especially we don't have a context
> > to save the flags.
>
> I think you misunderstood what I was suggesting. Maybe better with
> examples? I was suggesting doing this:
>
> kcb->saved_irqflag = kernel_enable_single_step(regs);
> ...
> kernel_disable_single_step(kcb->saved_irqflag, regs);
>
> To me that seems better than what you have now:
>
> kcb->saved_irqflag = kernel_prepare_single_step(regs);
> kernel_enable_single_step(regs);
> ...
> kernel_cleanup_single_step(kcb->saved_irqflag, regs);
> kernel_disable_single_step();
>
> ...or am I confused?
+1, this sounds good to me. Currently arch/arm64/kernel/probes/kprobes.c
has a code which sololy use kernel_disable_single_step() without regs
restoring, but it looks like a bug there. So maybe you need following patch.
Thank you,
-----
From: Masami Hiramatsu <[email protected]>
Date: Mon, 18 May 2020 23:08:28 +0900
Subject: [PATCH] arm64: kprobes: Restore saved interrupt flag before disabling
single step
Restore the saved interrupt flag in kprobe_ctlblk to regs->pstate
when a page fault happens on single-stepping instruction.
Without this fix, we will lose the flag if it happens because
kcb->saved_irqflag only knows the previous flag.
Fixes: 2dd0e8d2d2a1 ("arm64: Kprobes with single stepping support")
Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: [email protected]
---
arch/arm64/kernel/probes/kprobes.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
index d1c95dcf1d78..73fb99770f69 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -308,6 +308,7 @@ int __kprobes kprobe_fault_handler(struct pt_regs *regs, unsigned int fsr)
if (!instruction_pointer(regs))
BUG();
+ kprobes_restore_local_irqflag(kcb, regs);
kernel_disable_single_step();
if (kcb->kprobe_status == KPROBE_REENTER)
--
2.25.1
--
Masami Hiramatsu <[email protected]>
Wei,
On Sat, May 16, 2020 at 1:20 AM liwei (GF) <[email protected]> wrote:
>
> Hi Douglas,
>
> On 2020/5/14 8:34, Doug Anderson wrote:
> > Hi,
> >
> > On Sat, May 9, 2020 at 6:49 AM Wei Li <[email protected]> wrote:
> >>
> >> This patch set is to fix several issues of single-step debugging
> >> in kgdb/kdb on arm64.
> >>
> >> It seems that these issues have been shelved a very long time,
> >> but i still hope to solve them, as the single-step debugging
> >> is an useful feature.
> >>
> >> Note:
> >> Based on patch "arm64: cacheflush: Fix KGDB trap detection",
> >> https://www.spinics.net/lists/arm-kernel/msg803741.html
> >>
> >> Wei Li (4):
> >> arm64: kgdb: Fix single-step exception handling oops
> >> arm64: Extract kprobes_save_local_irqflag() and
> >> kprobes_restore_local_irqflag()
> >> arm64: kgdb: Fix single-stepping into the irq handler wrongly
> >> arm64: kgdb: Set PSTATE.SS to 1 to reenable single-step
> >>
> >> arch/arm64/include/asm/debug-monitors.h | 6 ++++++
> >> arch/arm64/kernel/debug-monitors.c | 28 ++++++++++++++++++++++++-
> >> arch/arm64/kernel/kgdb.c | 16 +++++++++++---
> >> arch/arm64/kernel/probes/kprobes.c | 28 ++-----------------------
> >> 4 files changed, 48 insertions(+), 30 deletions(-)
> >
> > Just an overall note that I'm very happy that you posted this patch
> > series! It's always been a thorn in my side that stepping and
> > breakpoints were so broken on arm64 and I'm really excited that you're
> > fixing them. Now I'll have to get in the habit of using kgdb for more
> > than just debugging crashes and maybe I can use it more for exploring
> > how functions work more. :-)
> > > I'll also note that with your patch series I'm even seeing the "call"
> > feature of gdb working now. That has always been terribly broken for
> > me.
> >
> Thanks for reviewing and testing this series.
> Enjoy exploring the kernel with kgdb/kdb! :-)
I'm curious to know if you plan another spin. The feedback you
received was fairly minor so it hopefully shouldn't be too hard. I'd
very much like to get your patches landed and I'd be happy to try to
address the feedback and spin them myself if you're no longer
available for it.
Thanks!
-Doug
On Mon, Jun 29, 2020 at 02:20:11PM -0700, Doug Anderson wrote:
> On Sat, May 16, 2020 at 1:20 AM liwei (GF) <[email protected]> wrote:
> > On 2020/5/14 8:34, Doug Anderson wrote:
> > > On Sat, May 9, 2020 at 6:49 AM Wei Li <[email protected]> wrote:
> > >>
> > >> This patch set is to fix several issues of single-step debugging
> > >> in kgdb/kdb on arm64.
> > >>
> > >> It seems that these issues have been shelved a very long time,
> > >> but i still hope to solve them, as the single-step debugging
> > >> is an useful feature.
> > >>
> > >> Note:
> > >> Based on patch "arm64: cacheflush: Fix KGDB trap detection",
> > >> https://www.spinics.net/lists/arm-kernel/msg803741.html
> > >>
> > >> Wei Li (4):
> > >> arm64: kgdb: Fix single-step exception handling oops
> > >> arm64: Extract kprobes_save_local_irqflag() and
> > >> kprobes_restore_local_irqflag()
> > >> arm64: kgdb: Fix single-stepping into the irq handler wrongly
> > >> arm64: kgdb: Set PSTATE.SS to 1 to reenable single-step
> > >>
> > >> arch/arm64/include/asm/debug-monitors.h | 6 ++++++
> > >> arch/arm64/kernel/debug-monitors.c | 28 ++++++++++++++++++++++++-
> > >> arch/arm64/kernel/kgdb.c | 16 +++++++++++---
> > >> arch/arm64/kernel/probes/kprobes.c | 28 ++-----------------------
> > >> 4 files changed, 48 insertions(+), 30 deletions(-)
> > >
> > > Just an overall note that I'm very happy that you posted this patch
> > > series! It's always been a thorn in my side that stepping and
> > > breakpoints were so broken on arm64 and I'm really excited that you're
> > > fixing them. Now I'll have to get in the habit of using kgdb for more
> > > than just debugging crashes and maybe I can use it more for exploring
> > > how functions work more. :-)
> > > > I'll also note that with your patch series I'm even seeing the "call"
> > > feature of gdb working now. That has always been terribly broken for
> > > me.
> > >
> > Thanks for reviewing and testing this series.
> > Enjoy exploring the kernel with kgdb/kdb! :-)
>
> I'm curious to know if you plan another spin. The feedback you
> received was fairly minor so it hopefully shouldn't be too hard. I'd
> very much like to get your patches landed and I'd be happy to try to
> address the feedback and spin them myself if you're no longer
> available for it.
I'd welcome some feedback on the proposal I sent out at the end of last
week:
https://lore.kernel.org/r/20200626095551.GA9312@willie-the-truck
which looks to solve some (all?) of these issues slightly differently,
because it turns out we need to perform some low-level surgery for
preempt-rt *anyway*...
I'm particularly keen to hear any thoughts concerning the reschedule on
return to EL1 after handling an interrupt that hit during a step.
Will
Hi,
On Tue, Jun 30, 2020 at 12:22 AM Will Deacon <[email protected]> wrote:
>
> On Mon, Jun 29, 2020 at 02:20:11PM -0700, Doug Anderson wrote:
> > On Sat, May 16, 2020 at 1:20 AM liwei (GF) <[email protected]> wrote:
> > > On 2020/5/14 8:34, Doug Anderson wrote:
> > > > On Sat, May 9, 2020 at 6:49 AM Wei Li <[email protected]> wrote:
> > > >>
> > > >> This patch set is to fix several issues of single-step debugging
> > > >> in kgdb/kdb on arm64.
> > > >>
> > > >> It seems that these issues have been shelved a very long time,
> > > >> but i still hope to solve them, as the single-step debugging
> > > >> is an useful feature.
> > > >>
> > > >> Note:
> > > >> Based on patch "arm64: cacheflush: Fix KGDB trap detection",
> > > >> https://www.spinics.net/lists/arm-kernel/msg803741.html
> > > >>
> > > >> Wei Li (4):
> > > >> arm64: kgdb: Fix single-step exception handling oops
> > > >> arm64: Extract kprobes_save_local_irqflag() and
> > > >> kprobes_restore_local_irqflag()
> > > >> arm64: kgdb: Fix single-stepping into the irq handler wrongly
> > > >> arm64: kgdb: Set PSTATE.SS to 1 to reenable single-step
> > > >>
> > > >> arch/arm64/include/asm/debug-monitors.h | 6 ++++++
> > > >> arch/arm64/kernel/debug-monitors.c | 28 ++++++++++++++++++++++++-
> > > >> arch/arm64/kernel/kgdb.c | 16 +++++++++++---
> > > >> arch/arm64/kernel/probes/kprobes.c | 28 ++-----------------------
> > > >> 4 files changed, 48 insertions(+), 30 deletions(-)
> > > >
> > > > Just an overall note that I'm very happy that you posted this patch
> > > > series! It's always been a thorn in my side that stepping and
> > > > breakpoints were so broken on arm64 and I'm really excited that you're
> > > > fixing them. Now I'll have to get in the habit of using kgdb for more
> > > > than just debugging crashes and maybe I can use it more for exploring
> > > > how functions work more. :-)
> > > > > I'll also note that with your patch series I'm even seeing the "call"
> > > > feature of gdb working now. That has always been terribly broken for
> > > > me.
> > > >
> > > Thanks for reviewing and testing this series.
> > > Enjoy exploring the kernel with kgdb/kdb! :-)
> >
> > I'm curious to know if you plan another spin. The feedback you
> > received was fairly minor so it hopefully shouldn't be too hard. I'd
> > very much like to get your patches landed and I'd be happy to try to
> > address the feedback and spin them myself if you're no longer
> > available for it.
>
> I'd welcome some feedback on the proposal I sent out at the end of last
> week:
>
> https://lore.kernel.org/r/20200626095551.GA9312@willie-the-truck
>
> which looks to solve some (all?) of these issues
Actually, I'm pretty sure that patch #1 of Wei Li's patch series would
still be needed. Would you object to landing it now just to get one
patch outta the way?
https://lkml.kernel.org/r/[email protected]
> slightly differently,
> because it turns out we need to perform some low-level surgery for
> preempt-rt *anyway*...
>
> I'm particularly keen to hear any thoughts concerning the reschedule on
> return to EL1 after handling an interrupt that hit during a step.
Thanks for the pointer! Unfortunately this is yet another area that
I'm keenly interested in it working but pretty much clueless about how
this whole area of the system works. :-|
My first question, though, is why would we want interrupts enabled
while we're single stepping? If you're trying to get the processor to
just step one instruction forward you don't really want a bunch of
IRQs going off in the middle of it, do you? While in kgdb and
debugging the kernel I would assume that a single step would run the
very least amount of code that it could (other than debugger code).
In general every time I exit kgdb (and re-start running the kernel
normally) I sorta assume that there's a chance that something in the
system will hit a timeout (maybe it's talking to external hardware
that will give up or something). If I'm stepping through code I just
want that little bit of code I'm running to move forward and the rest
of the kernel to stand still. If I want the rest of the kernel to
proceed I guess I'd set a breakpoint and then say "continue".
It seemed like with Wei Li's patch that we were closing holes and
being more consistent about keeping interrupts disabled when stepping
and I liked that. Maybe if we made it so that taking interrupts
didn't break everything then it would be _OK_ to let them fire, but if
I had a choice I'd rather they didn't.
...of course, I'm looking at all this from the point of view of kgdb.
I don't know nearly enough about how other parts of the kernel use
single step to comment much on what would be best for them.
Did what I said make sense at all, or was it gibberish? ...or not
gibberish but not what you were looking for?
-Doug
Hi Doug,
On 2020/6/30 5:20, Doug Anderson wrote:
> Wei,
>
> On Sat, May 16, 2020 at 1:20 AM liwei (GF) <[email protected]> wrote:
>>
>> Hi Douglas,
>>
>> On 2020/5/14 8:34, Doug Anderson wrote:
>>> Hi,
>>>
>>> On Sat, May 9, 2020 at 6:49 AM Wei Li <[email protected]> wrote:
>>>>
>>>> This patch set is to fix several issues of single-step debugging
>>>> in kgdb/kdb on arm64.
>>>>
>>>> It seems that these issues have been shelved a very long time,
>>>> but i still hope to solve them, as the single-step debugging
>>>> is an useful feature.
>>>>
>>>> Note:
>>>> Based on patch "arm64: cacheflush: Fix KGDB trap detection",
>>>> https://www.spinics.net/lists/arm-kernel/msg803741.html
>>>>
>>>> Wei Li (4):
>>>> arm64: kgdb: Fix single-step exception handling oops
>>>> arm64: Extract kprobes_save_local_irqflag() and
>>>> kprobes_restore_local_irqflag()
>>>> arm64: kgdb: Fix single-stepping into the irq handler wrongly
>>>> arm64: kgdb: Set PSTATE.SS to 1 to reenable single-step
>>>>
>>>> arch/arm64/include/asm/debug-monitors.h | 6 ++++++
>>>> arch/arm64/kernel/debug-monitors.c | 28 ++++++++++++++++++++++++-
>>>> arch/arm64/kernel/kgdb.c | 16 +++++++++++---
>>>> arch/arm64/kernel/probes/kprobes.c | 28 ++-----------------------
>>>> 4 files changed, 48 insertions(+), 30 deletions(-)
>>>
>>> Just an overall note that I'm very happy that you posted this patch
>>> series! It's always been a thorn in my side that stepping and
>>> breakpoints were so broken on arm64 and I'm really excited that you're
>>> fixing them. Now I'll have to get in the habit of using kgdb for more
>>> than just debugging crashes and maybe I can use it more for exploring
>>> how functions work more. :-)
>>>> I'll also note that with your patch series I'm even seeing the "call"
>>> feature of gdb working now. That has always been terribly broken for
>>> me.
>>>
>> Thanks for reviewing and testing this series.
>> Enjoy exploring the kernel with kgdb/kdb! :-)
>
> I'm curious to know if you plan another spin. The feedback you
> received was fairly minor so it hopefully shouldn't be too hard. I'd
> very much like to get your patches landed and I'd be happy to try to
> address the feedback and spin them myself if you're no longer
> available for it.
>
Sorry for the long delay. I was busy on a project and missed some mail.
I did receive some feedback about typo or coding style before, so i was expecting
more comment about the logic or people just don't care about these. After all,
this issue has lived a very long time.
It's a good news to hear that Will has plan to solve these issues, i will
follow that to do my bit.
Before that, i can send the next version in this week.
Thanks,
Wei
On Sun, 10 May 2020 05:41:55 +0800, Wei Li wrote:
> This patch set is to fix several issues of single-step debugging
> in kgdb/kdb on arm64.
>
> It seems that these issues have been shelved a very long time,
> but i still hope to solve them, as the single-step debugging
> is an useful feature.
>
> [...]
Applied to arm64 (for-next/fixes), thanks!
[1/1] arm64: kgdb: Fix single-step exception handling oops
https://git.kernel.org/arm64/c/8523c006264d
Cheers,
--
Will
https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
Doug,
On Mon, Jul 06, 2020 at 02:37:05PM -0700, Doug Anderson wrote:
> On Tue, Jun 30, 2020 at 12:22 AM Will Deacon <[email protected]> wrote:
> > On Mon, Jun 29, 2020 at 02:20:11PM -0700, Doug Anderson wrote:
> > > On Sat, May 16, 2020 at 1:20 AM liwei (GF) <[email protected]> wrote:
> > > > On 2020/5/14 8:34, Doug Anderson wrote:
> > > > > On Sat, May 9, 2020 at 6:49 AM Wei Li <[email protected]> wrote:
> > > > >>
> > > > >> This patch set is to fix several issues of single-step debugging
> > > > >> in kgdb/kdb on arm64.
> > > > >>
> > > > >> It seems that these issues have been shelved a very long time,
> > > > >> but i still hope to solve them, as the single-step debugging
> > > > >> is an useful feature.
> > > > >>
> > > > >> Note:
> > > > >> Based on patch "arm64: cacheflush: Fix KGDB trap detection",
> > > > >> https://www.spinics.net/lists/arm-kernel/msg803741.html
> > > > >>
> > > > >> Wei Li (4):
> > > > >> arm64: kgdb: Fix single-step exception handling oops
> > > > >> arm64: Extract kprobes_save_local_irqflag() and
> > > > >> kprobes_restore_local_irqflag()
> > > > >> arm64: kgdb: Fix single-stepping into the irq handler wrongly
> > > > >> arm64: kgdb: Set PSTATE.SS to 1 to reenable single-step
> > > > >>
> > > > >> arch/arm64/include/asm/debug-monitors.h | 6 ++++++
> > > > >> arch/arm64/kernel/debug-monitors.c | 28 ++++++++++++++++++++++++-
> > > > >> arch/arm64/kernel/kgdb.c | 16 +++++++++++---
> > > > >> arch/arm64/kernel/probes/kprobes.c | 28 ++-----------------------
> > > > >> 4 files changed, 48 insertions(+), 30 deletions(-)
> > > > >
> > > > > Just an overall note that I'm very happy that you posted this patch
> > > > > series! It's always been a thorn in my side that stepping and
> > > > > breakpoints were so broken on arm64 and I'm really excited that you're
> > > > > fixing them. Now I'll have to get in the habit of using kgdb for more
> > > > > than just debugging crashes and maybe I can use it more for exploring
> > > > > how functions work more. :-)
> > > > > > I'll also note that with your patch series I'm even seeing the "call"
> > > > > feature of gdb working now. That has always been terribly broken for
> > > > > me.
> > > > >
> > > > Thanks for reviewing and testing this series.
> > > > Enjoy exploring the kernel with kgdb/kdb! :-)
> > >
> > > I'm curious to know if you plan another spin. The feedback you
> > > received was fairly minor so it hopefully shouldn't be too hard. I'd
> > > very much like to get your patches landed and I'd be happy to try to
> > > address the feedback and spin them myself if you're no longer
> > > available for it.
> >
> > I'd welcome some feedback on the proposal I sent out at the end of last
> > week:
> >
> > https://lore.kernel.org/r/20200626095551.GA9312@willie-the-truck
> >
> > which looks to solve some (all?) of these issues
>
> Actually, I'm pretty sure that patch #1 of Wei Li's patch series would
> still be needed. Would you object to landing it now just to get one
> patch outta the way?
>
> https://lkml.kernel.org/r/[email protected]
I've grabbed patch 1, cheers.
> > slightly differently,
> > because it turns out we need to perform some low-level surgery for
> > preempt-rt *anyway*...
> >
> > I'm particularly keen to hear any thoughts concerning the reschedule on
> > return to EL1 after handling an interrupt that hit during a step.
>
> Thanks for the pointer! Unfortunately this is yet another area that
> I'm keenly interested in it working but pretty much clueless about how
> this whole area of the system works. :-|
I'm also keen to fix it up but, although I roughly know how it works, I
always fail to find the time to spend on it. :-|
> My first question, though, is why would we want interrupts enabled
> while we're single stepping? If you're trying to get the processor to
> just step one instruction forward you don't really want a bunch of
> IRQs going off in the middle of it, do you? While in kgdb and
> debugging the kernel I would assume that a single step would run the
> very least amount of code that it could (other than debugger code).
> In general every time I exit kgdb (and re-start running the kernel
> normally) I sorta assume that there's a chance that something in the
> system will hit a timeout (maybe it's talking to external hardware
> that will give up or something). If I'm stepping through code I just
> want that little bit of code I'm running to move forward and the rest
> of the kernel to stand still. If I want the rest of the kernel to
> proceed I guess I'd set a breakpoint and then say "continue".
I understand where you're coming from, but I also think it's a reasonably
narrow viewpoint. If you disable IRQs during a step, you can change the
behaviour of the instruction being stepped. For example, an MRS of DAIF
will now see the I bit set instead of clear, and so something like
irqs_disabled() could go wrong while being stepped. But I think the main
realisation is that instructions being stepped can generate exceptions for
other reasons too, for example if you try to step a BUG() or a get_user().
Not only does that mean that we have to deal with exceptions during a step,
but it also means that disabling interrupts is a pretty bad idea because
the exception context may rely on them being enabled (for example, if it has
to sleep while handling a page fault).
> It seemed like with Wei Li's patch that we were closing holes and
> being more consistent about keeping interrupts disabled when stepping
> and I liked that. Maybe if we made it so that taking interrupts
> didn't break everything then it would be _OK_ to let them fire, but if
> I had a choice I'd rather they didn't.
Although I appreciate somebody sending patches to improve this logic,
I really worry that it just moves the problem around and it won't be long
before somebody else sends another set of patches trying to deal with the
fallout. That ends up being a waste of everybody's time.
> ...of course, I'm looking at all this from the point of view of kgdb.
> I don't know nearly enough about how other parts of the kernel use
> single step to comment much on what would be best for them.
Sure, and I see you as Mr KGDB in this area anyway, but that's an important
user of this infrastructure (esp. single-step).
> Did what I said make sense at all, or was it gibberish? ...or not
> gibberish but not what you were looking for?
I'm fairly set on unmasking IRQs during step for the reasons I mention
above. The question is really whether or not to forcibly prevent preemption
during such an irq.
Will
Hi,
On Wed, Jul 8, 2020 at 3:06 PM Will Deacon <[email protected]> wrote:
>
> Doug,
>
> On Mon, Jul 06, 2020 at 02:37:05PM -0700, Doug Anderson wrote:
> > On Tue, Jun 30, 2020 at 12:22 AM Will Deacon <[email protected]> wrote:
> > > On Mon, Jun 29, 2020 at 02:20:11PM -0700, Doug Anderson wrote:
> > > > On Sat, May 16, 2020 at 1:20 AM liwei (GF) <[email protected]> wrote:
> > > > > On 2020/5/14 8:34, Doug Anderson wrote:
> > > > > > On Sat, May 9, 2020 at 6:49 AM Wei Li <[email protected]> wrote:
> > > > > >>
> > > > > >> This patch set is to fix several issues of single-step debugging
> > > > > >> in kgdb/kdb on arm64.
> > > > > >>
> > > > > >> It seems that these issues have been shelved a very long time,
> > > > > >> but i still hope to solve them, as the single-step debugging
> > > > > >> is an useful feature.
> > > > > >>
> > > > > >> Note:
> > > > > >> Based on patch "arm64: cacheflush: Fix KGDB trap detection",
> > > > > >> https://www.spinics.net/lists/arm-kernel/msg803741.html
> > > > > >>
> > > > > >> Wei Li (4):
> > > > > >> arm64: kgdb: Fix single-step exception handling oops
> > > > > >> arm64: Extract kprobes_save_local_irqflag() and
> > > > > >> kprobes_restore_local_irqflag()
> > > > > >> arm64: kgdb: Fix single-stepping into the irq handler wrongly
> > > > > >> arm64: kgdb: Set PSTATE.SS to 1 to reenable single-step
> > > > > >>
> > > > > >> arch/arm64/include/asm/debug-monitors.h | 6 ++++++
> > > > > >> arch/arm64/kernel/debug-monitors.c | 28 ++++++++++++++++++++++++-
> > > > > >> arch/arm64/kernel/kgdb.c | 16 +++++++++++---
> > > > > >> arch/arm64/kernel/probes/kprobes.c | 28 ++-----------------------
> > > > > >> 4 files changed, 48 insertions(+), 30 deletions(-)
> > > > > >
> > > > > > Just an overall note that I'm very happy that you posted this patch
> > > > > > series! It's always been a thorn in my side that stepping and
> > > > > > breakpoints were so broken on arm64 and I'm really excited that you're
> > > > > > fixing them. Now I'll have to get in the habit of using kgdb for more
> > > > > > than just debugging crashes and maybe I can use it more for exploring
> > > > > > how functions work more. :-)
> > > > > > > I'll also note that with your patch series I'm even seeing the "call"
> > > > > > feature of gdb working now. That has always been terribly broken for
> > > > > > me.
> > > > > >
> > > > > Thanks for reviewing and testing this series.
> > > > > Enjoy exploring the kernel with kgdb/kdb! :-)
> > > >
> > > > I'm curious to know if you plan another spin. The feedback you
> > > > received was fairly minor so it hopefully shouldn't be too hard. I'd
> > > > very much like to get your patches landed and I'd be happy to try to
> > > > address the feedback and spin them myself if you're no longer
> > > > available for it.
> > >
> > > I'd welcome some feedback on the proposal I sent out at the end of last
> > > week:
> > >
> > > https://lore.kernel.org/r/20200626095551.GA9312@willie-the-truck
> > >
> > > which looks to solve some (all?) of these issues
> >
> > Actually, I'm pretty sure that patch #1 of Wei Li's patch series would
> > still be needed. Would you object to landing it now just to get one
> > patch outta the way?
> >
> > https://lkml.kernel.org/r/[email protected]
>
> I've grabbed patch 1, cheers.
Thanks!
> > > slightly differently,
> > > because it turns out we need to perform some low-level surgery for
> > > preempt-rt *anyway*...
> > >
> > > I'm particularly keen to hear any thoughts concerning the reschedule on
> > > return to EL1 after handling an interrupt that hit during a step.
> >
> > Thanks for the pointer! Unfortunately this is yet another area that
> > I'm keenly interested in it working but pretty much clueless about how
> > this whole area of the system works. :-|
>
> I'm also keen to fix it up but, although I roughly know how it works, I
> always fail to find the time to spend on it. :-|
Yeah, I know how it is. I love working on kgdb but I always have a
hard time putting it ahead of other tasks and there are always other
tasks. Knowing that other people use it helps me, at least, and some
of my recent work on kgdb was because a whole pile of other people
were all discussing how to get kgdb working. ;-)
> > My first question, though, is why would we want interrupts enabled
> > while we're single stepping? If you're trying to get the processor to
> > just step one instruction forward you don't really want a bunch of
> > IRQs going off in the middle of it, do you? While in kgdb and
> > debugging the kernel I would assume that a single step would run the
> > very least amount of code that it could (other than debugger code).
> > In general every time I exit kgdb (and re-start running the kernel
> > normally) I sorta assume that there's a chance that something in the
> > system will hit a timeout (maybe it's talking to external hardware
> > that will give up or something). If I'm stepping through code I just
> > want that little bit of code I'm running to move forward and the rest
> > of the kernel to stand still. If I want the rest of the kernel to
> > proceed I guess I'd set a breakpoint and then say "continue".
>
> I understand where you're coming from, but I also think it's a reasonably
> narrow viewpoint. If you disable IRQs during a step, you can change the
> behaviour of the instruction being stepped. For example, an MRS of DAIF
> will now see the I bit set instead of clear, and so something like
> irqs_disabled() could go wrong while being stepped. But I think the main
> realisation is that instructions being stepped can generate exceptions for
> other reasons too, for example if you try to step a BUG() or a get_user().
> Not only does that mean that we have to deal with exceptions during a step,
> but it also means that disabling interrupts is a pretty bad idea because
> the exception context may rely on them being enabled (for example, if it has
> to sleep while handling a page fault).
OK, you make a fair argument.
> > It seemed like with Wei Li's patch that we were closing holes and
> > being more consistent about keeping interrupts disabled when stepping
> > and I liked that. Maybe if we made it so that taking interrupts
> > didn't break everything then it would be _OK_ to let them fire, but if
> > I had a choice I'd rather they didn't.
>
> Although I appreciate somebody sending patches to improve this logic,
> I really worry that it just moves the problem around and it won't be long
> before somebody else sends another set of patches trying to deal with the
> fallout. That ends up being a waste of everybody's time.
Sure, though I think right now all of the non-kgdb stepping logic
already disables interrupts, right? ...and the kgdb logic for single
step is (and has always been) very broken for arm64. Unless there is
hope in the short or medium term of the "right" solution getting done,
it does feel like Wei Li's patches improve the situation...
> > ...of course, I'm looking at all this from the point of view of kgdb.
> > I don't know nearly enough about how other parts of the kernel use
> > single step to comment much on what would be best for them.
>
> Sure, and I see you as Mr KGDB in this area anyway, but that's an important
> user of this infrastructure (esp. single-step).
Yeah, I think that's how I'll be most useful in this situation. While
it'd be really cool to understand all the inner workings of this code,
realistically I don't have time and it's probably better for me to
treat it as a black box and just be useful as a tester of the end
result.
> > Did what I said make sense at all, or was it gibberish? ...or not
> > gibberish but not what you were looking for?
>
> I'm fairly set on unmasking IRQs during step for the reasons I mention
> above. The question is really whether or not to forcibly prevent preemption
> during such an irq.
My first take would be the same argument I made for keeping IRQs off:
the fewer things that happen during single stepping the better.
-Doug