LinuxLists.cc - [PATCH v3 0/3] kgdb: Honour the kprobe blocklist when setting breakpoints

2020-09-14 13:09:39

Subject: [PATCH v3 0/3] kgdb: Honour the kprobe blocklist when setting breakpoints

kgdb has traditionally adopted a no safety rails approach to breakpoint
placement. If the debugger is commanded to place a breakpoint at an
address then it will do so even if that breakpoint results in kgdb
becoming inoperable.

A stop-the-world debugger with memory peek/poke intrinsically provides
its operator with the means to hose their system in all manner of
exciting ways (not least because stopping-the-world is already a DoS
attack ;-) ). Nevertheless the current no safety rail approach is
difficult to defend, especially given kprobes can provide us with plenty
of machinery to mark the parts of the kernel where breakpointing is
discouraged.

This patchset introduces some safety rails by using the existing kprobes
infrastructure and ensures this will be enabled by default on
architectures that implement kprobes. At present it does not cover
absolutely all locations where breakpoints can cause trouble but it will
block off several avenues, including the architecture specific parts
that are handled by arch_within_kprobe_blacklist().

v3:
* Dropped the single step blocklist checks. It is not proven that the
code was actually reachable without triggering the catastrophic
failure flag (which inhibits resume already).
* Update patch description for ("kgdb: Add NOKPROBE labels...") and
added symbols that are called during trap exit
* Added a new patch to push the breakpoint activation later in the
flow and ensure the I/O functions are not called with breakpoints
activated.

v2:
* Reworked after initial RFC to make honouring the blocklist require
CONFIG_KPROBES. It is not optional but the blocklist will be enabled
by default for architectures that CONFIG_HAVE_KPROBES

Daniel Thompson (3):
kgdb: Honour the kprobe blocklist when setting breakpoints
kgdb: Add NOKPROBE labels on the trap handler functions
kernel: debug: Centralize dbg_[de]activate_sw_breakpoints

include/linux/kgdb.h | 18 ++++++++++++++++++
kernel/debug/debug_core.c | 17 +++++++++++++++++
kernel/debug/gdbstub.c | 1 -
kernel/debug/kdb/kdb_bp.c | 9 +++++++++
kernel/debug/kdb/kdb_debugger.c | 2 --
lib/Kconfig.kgdb | 14 ++++++++++++++
6 files changed, 58 insertions(+), 3 deletions(-)

--
2.25.4

2020-09-14 13:10:03

by Daniel Thompson

[permalink] [raw]

Subject: [PATCH v3 1/3] kgdb: Honour the kprobe blocklist when setting breakpoints

Currently kgdb has absolutely no safety rails in place to discourage or
prevent a user from placing a breakpoint in dangerous places such as
the debugger's own trap entry/exit and other places where it is not safe
to take synchronous traps.

Introduce a new config symbol KGDB_HONOUR_BLOCKLIST and modify the
default implementation of kgdb_validate_break_address() so that we use
the kprobe blocklist to prohibit instrumentation of critical functions
if the config symbol is set. The config symbol dependencies are set to
ensure that the blocklist will be enabled by default if we enable KGDB
and are compiling for an architecture where we HAVE_KPROBES.

Suggested-by: Peter Zijlstra <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Signed-off-by: Daniel Thompson <[email protected]>
---
include/linux/kgdb.h | 18 ++++++++++++++++++
kernel/debug/debug_core.c | 4 ++++
kernel/debug/kdb/kdb_bp.c | 9 +++++++++
lib/Kconfig.kgdb | 14 ++++++++++++++
4 files changed, 45 insertions(+)

diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h
index 477b8b7c908f..0d6cf64c8bb1 100644
--- a/include/linux/kgdb.h
+++ b/include/linux/kgdb.h
@@ -16,6 +16,7 @@
#include <linux/linkage.h>
#include <linux/init.h>
#include <linux/atomic.h>
+#include <linux/kprobes.h>
#ifdef CONFIG_HAVE_ARCH_KGDB
#include <asm/kgdb.h>
#endif
@@ -335,6 +336,23 @@ extern int kgdb_nmicallin(int cpu, int trapnr, void *regs, int err_code,
atomic_t *snd_rdy);
extern void gdbstub_exit(int status);

+/*
+ * kgdb and kprobes both use the same (kprobe) blocklist (which makes sense
+ * given they are both typically hooked up to the same trap meaning on most
+ * architectures one cannot be used to debug the other)
+ *
+ * However on architectures where kprobes is not (yet) implemented we permit
+ * breakpoints everywhere rather than blocking everything by default.
+ */
+static inline bool kgdb_within_blocklist(unsigned long addr)
+{
+#ifdef CONFIG_KGDB_HONOUR_BLOCKLIST
+ return within_kprobe_blacklist(addr);
+#else
+ return false;
+#endif
+}
+
extern int kgdb_single_step;
extern atomic_t kgdb_active;
#define in_dbg_master() \
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index b16dbc1bf056..b1277728a835 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -188,6 +188,10 @@ int __weak kgdb_validate_break_address(unsigned long addr)
{
struct kgdb_bkpt tmp;
int err;
+
+ if (kgdb_within_blocklist(addr))
+ return -EINVAL;
+
/* Validate setting the breakpoint and then removing it. If the
* remove fails, the kernel needs to emit a bad message because we
* are deep trouble not being able to put things back the way we
diff --git a/kernel/debug/kdb/kdb_bp.c b/kernel/debug/kdb/kdb_bp.c
index d7ebb2c79cb8..ec4940146612 100644
--- a/kernel/debug/kdb/kdb_bp.c
+++ b/kernel/debug/kdb/kdb_bp.c
@@ -306,6 +306,15 @@ static int kdb_bp(int argc, const char **argv)
if (!template.bp_addr)
return KDB_BADINT;

+ /*
+ * This check is redundant (since the breakpoint machinery should
+ * be doing the same check during kdb_bp_install) but gives the
+ * user immediate feedback.
+ */
+ diag = kgdb_validate_break_address(template.bp_addr);
+ if (diag)
+ return diag;
+
/*
* Find an empty bp structure to allocate
*/
diff --git a/lib/Kconfig.kgdb b/lib/Kconfig.kgdb
index 256f2486f9bd..713c17fe789c 100644
--- a/lib/Kconfig.kgdb
+++ b/lib/Kconfig.kgdb
@@ -24,6 +24,20 @@ menuconfig KGDB

if KGDB

+config KGDB_HONOUR_BLOCKLIST
+ bool "KGDB: use kprobe blocklist to prohibit unsafe breakpoints"
+ depends on HAVE_KPROBES
+ select KPROBES
+ default y
+ help
+ If set to Y the debug core will use the kprobe blocklist to
+ identify symbols where it is unsafe to set breakpoints.
+ In particular this disallows instrumentation of functions
+ called during debug trap handling and thus makes it very
+ difficult to inadvertently provoke recursive trap handling.
+
+ If unsure, say Y.
+
config KGDB_SERIAL_CONSOLE
tristate "KGDB: use kgdb over the serial console"
select CONSOLE_POLL
--
2.25.4

2020-09-14 13:11:45

by Daniel Thompson

[permalink] [raw]

Subject: [PATCH v3 3/3] kernel: debug: Centralize dbg_[de]activate_sw_breakpoints

During debug trap execution we expect dbg_deactivate_sw_breakpoints()
to be paired with an dbg_activate_sw_breakpoint(). Currently although
the calls are paired correctly they are needlessly smeared across three
different functions. Worse this also results in code to drive polled I/O
being called with breakpoints activated which, in turn, needlessly
increases the set of functions that will recursively trap if breakpointed.

Fix this by moving the activation of breakpoints into the debug core.

Signed-off-by: Daniel Thompson <[email protected]>
---
kernel/debug/debug_core.c | 2 ++
kernel/debug/gdbstub.c | 1 -
kernel/debug/kdb/kdb_debugger.c | 2 --
3 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 9618c1e2faf6..74d42d3f3180 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -763,6 +763,8 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
}
}

+ dbg_activate_sw_breakpoints();
+
/* Call the I/O driver's post_exception routine */
if (dbg_io_ops->post_exception)
dbg_io_ops->post_exception();
diff --git a/kernel/debug/gdbstub.c b/kernel/debug/gdbstub.c
index cc3c43dfec44..e9a9c3097089 100644
--- a/kernel/debug/gdbstub.c
+++ b/kernel/debug/gdbstub.c
@@ -1061,7 +1061,6 @@ int gdb_serial_stub(struct kgdb_state *ks)
error_packet(remcom_out_buffer, -EINVAL);
break;
}
- dbg_activate_sw_breakpoints();
fallthrough; /* to default processing */
default:
default_handle:
diff --git a/kernel/debug/kdb/kdb_debugger.c b/kernel/debug/kdb/kdb_debugger.c
index 53a0df6e4d92..0220afda3200 100644
--- a/kernel/debug/kdb/kdb_debugger.c
+++ b/kernel/debug/kdb/kdb_debugger.c
@@ -147,7 +147,6 @@ int kdb_stub(struct kgdb_state *ks)
return DBG_PASS_EVENT;
}
kdb_bp_install(ks->linux_regs);
- dbg_activate_sw_breakpoints();
/* Set the exit state to a single step or a continue */
if (KDB_STATE(DOING_SS))
gdbstub_state(ks, "s");
@@ -167,7 +166,6 @@ int kdb_stub(struct kgdb_state *ks)
* differently vs the gdbstub
*/
kgdb_single_step = 0;
- dbg_deactivate_sw_breakpoints();
return DBG_SWITCH_CPU_EVENT;
}
return kgdb_info[ks->cpu].ret_state;
--
2.25.4

2020-09-14 13:12:08

by Daniel Thompson

[permalink] [raw]

Subject: [PATCH v3 2/3] kgdb: Add NOKPROBE labels on the trap handler functions

Currently kgdb honours the kprobe blocklist but doesn't place its own
trap handling code on the list. Add labels to discourage attempting to
use kgdb to debug itself.

Not every functions that executes from the trap handler needs to be
marked up: relatively early in the trap handler execution (just after
we bring the other CPUs to a halt) all breakpoints are replaced with
the original opcodes. This patch marks up code in the debug_core that
executes between trap entry and the breakpoints being deactivated
and, also, code that executes between breakpoint activation and trap
exit.

To be clear these changes are not sufficient to make recursive trapping
impossible since cover all the library calls made during kgdb's
entry/exit logic. However going much further whilst we are sharing the
kprobe blocklist risks reducing the capabilities of kprobe and this
would be a bad trade off (especially so given kgdb's users are currently
conditioned to avoid recursive traps).

Signed-off-by: Daniel Thompson <[email protected]>
---
kernel/debug/debug_core.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index b1277728a835..9618c1e2faf6 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -177,12 +177,14 @@ int __weak kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)
arch_kgdb_ops.gdb_bpt_instr, BREAK_INSTR_SIZE);
return err;
}
+NOKPROBE_SYMBOL(kgdb_arch_set_breakpoint);

int __weak kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
{
return copy_to_kernel_nofault((char *)bpt->bpt_addr,
(char *)bpt->saved_instr, BREAK_INSTR_SIZE);
}
+NOKPROBE_SYMBOL(kgdb_arch_remove_breakpoint);

int __weak kgdb_validate_break_address(unsigned long addr)
{
@@ -302,6 +304,7 @@ static void kgdb_flush_swbreak_addr(unsigned long addr)
/* Force flush instruction cache if it was outside the mm */
flush_icache_range(addr, addr + BREAK_INSTR_SIZE);
}
+NOKPROBE_SYMBOL(kgdb_flush_swbreak_addr);

/*
* SW breakpoint management:
@@ -329,6 +332,7 @@ int dbg_activate_sw_breakpoints(void)
}
return ret;
}
+NOKPROBE_SYMBOL(dbg_activate_sw_breakpoints);

int dbg_set_sw_break(unsigned long addr)
{
@@ -392,6 +396,7 @@ int dbg_deactivate_sw_breakpoints(void)
}
return ret;
}
+NOKPROBE_SYMBOL(dbg_deactivate_sw_breakpoints);

int dbg_remove_sw_break(unsigned long addr)
{
@@ -560,6 +565,7 @@ static int kgdb_reenter_check(struct kgdb_state *ks)

return 1;
}
+NOKPROBE_SYMBOL(kgdb_reenter_check);

static void dbg_touch_watchdogs(void)
{
@@ -567,6 +573,7 @@ static void dbg_touch_watchdogs(void)
clocksource_touch_watchdog();
rcu_cpu_stall_reset();
}
+NOKPROBE_SYMBOL(dbg_touch_watchdogs);

static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
int exception_state)
@@ -798,6 +805,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,

return kgdb_info[cpu].ret_state;
}
+NOKPROBE_SYMBOL(kgdb_cpu_enter);

/*
* kgdb_handle_exception() - main entry point from a kernel exception
@@ -842,6 +850,7 @@ kgdb_handle_exception(int evector, int signo, int ecode, struct pt_regs *regs)
arch_kgdb_ops.enable_nmi(1);
return ret;
}
+NOKPROBE_SYMBOL(kgdb_handle_exception);

/*
* GDB places a breakpoint at this function to know dynamically loaded objects.
@@ -876,6 +885,7 @@ int kgdb_nmicallback(int cpu, void *regs)
#endif
return 1;
}
+NOKPROBE_SYMBOL(kgdb_nmicallback);

int kgdb_nmicallin(int cpu, int trapnr, void *regs, int err_code,
atomic_t *send_ready)
@@ -901,6 +911,7 @@ int kgdb_nmicallin(int cpu, int trapnr, void *regs, int err_code,
#endif
return 1;
}
+NOKPROBE_SYMBOL(kgdb_nmicallin);

static void kgdb_console_write(struct console *co, const char *s,
unsigned count)
--
2.25.4

2020-09-15 00:15:51

by Doug Anderson

[permalink] [raw]

Subject: Re: [PATCH v3 3/3] kernel: debug: Centralize dbg_[de]activate_sw_breakpoints

Hi,

On Mon, Sep 14, 2020 at 6:02 AM Daniel Thompson
<[email protected]> wrote:
>
> During debug trap execution we expect dbg_deactivate_sw_breakpoints()
> to be paired with an dbg_activate_sw_breakpoint(). Currently although
> the calls are paired correctly they are needlessly smeared across three
> different functions. Worse this also results in code to drive polled I/O
> being called with breakpoints activated which, in turn, needlessly
> increases the set of functions that will recursively trap if breakpointed.
>
> Fix this by moving the activation of breakpoints into the debug core.
>
> Signed-off-by: Daniel Thompson <[email protected]>
> ---
> kernel/debug/debug_core.c | 2 ++
> kernel/debug/gdbstub.c | 1 -
> kernel/debug/kdb/kdb_debugger.c | 2 --
> 3 files changed, 2 insertions(+), 3 deletions(-)

I like the idea, but previously the kgdb_arch_handle_exception() was
always called after the SW breakpoints were activated. Are you sure
it's OK to swap those two orders across all architectures?

-Doug

2020-09-15 00:18:18

by Doug Anderson

[permalink] [raw]

Subject: Re: [PATCH v3 2/3] kgdb: Add NOKPROBE labels on the trap handler functions

Hi,

On Mon, Sep 14, 2020 at 6:02 AM Daniel Thompson
<[email protected]> wrote:
>
> Currently kgdb honours the kprobe blocklist but doesn't place its own
> trap handling code on the list. Add labels to discourage attempting to
> use kgdb to debug itself.
>
> Not every functions that executes from the trap handler needs to be
> marked up: relatively early in the trap handler execution (just after
> we bring the other CPUs to a halt) all breakpoints are replaced with
> the original opcodes. This patch marks up code in the debug_core that
> executes between trap entry and the breakpoints being deactivated
> and, also, code that executes between breakpoint activation and trap
> exit.

Other functions that seem to be missing from a quick skim:
* kgdb_io_ready()
* kgdb_roundup_cpus()
* kgdb_call_nmi_hook()

I'm not confident in my ability to spot every code path, though, so
I'm not sure at what point we stop looking. I only spent a few
minutes and, if important, I could dig more. Did you have any chance
to see if there was any way to have a magic linker script just add
this to everything under "kernel/debug" or something like that where
we just use a heavier hammer to whack a whole bunch?

In general any extra annotation here is better than no annotation, I
suppose. ...so if you just want to commit what you have (maybe with
the above 3 extra functions) then I suppose it'd be fine.

-Doug

-Doug

2020-09-15 00:59:28

by Masami Hiramatsu

[permalink] [raw]

Subject: Re: [PATCH v3 1/3] kgdb: Honour the kprobe blocklist when setting breakpoints

On Mon, 14 Sep 2020 14:01:41 +0100
Daniel Thompson <[email protected]> wrote:

> Currently kgdb has absolutely no safety rails in place to discourage or
> prevent a user from placing a breakpoint in dangerous places such as
> the debugger's own trap entry/exit and other places where it is not safe
> to take synchronous traps.
>
> Introduce a new config symbol KGDB_HONOUR_BLOCKLIST and modify the
> default implementation of kgdb_validate_break_address() so that we use
> the kprobe blocklist to prohibit instrumentation of critical functions
> if the config symbol is set. The config symbol dependencies are set to
> ensure that the blocklist will be enabled by default if we enable KGDB
> and are compiling for an architecture where we HAVE_KPROBES.

This looks good to me.

Reviewed-by: Masami Hiramatsu <[email protected]>

Thank you,

>
> Suggested-by: Peter Zijlstra <[email protected]>
> Reviewed-by: Douglas Anderson <[email protected]>
> Signed-off-by: Daniel Thompson <[email protected]>
> ---
> include/linux/kgdb.h | 18 ++++++++++++++++++
> kernel/debug/debug_core.c | 4 ++++
> kernel/debug/kdb/kdb_bp.c | 9 +++++++++
> lib/Kconfig.kgdb | 14 ++++++++++++++
> 4 files changed, 45 insertions(+)
>
> diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h
> index 477b8b7c908f..0d6cf64c8bb1 100644
> --- a/include/linux/kgdb.h
> +++ b/include/linux/kgdb.h
> @@ -16,6 +16,7 @@
> #include <linux/linkage.h>
> #include <linux/init.h>
> #include <linux/atomic.h>
> +#include <linux/kprobes.h>
> #ifdef CONFIG_HAVE_ARCH_KGDB
> #include <asm/kgdb.h>
> #endif
> @@ -335,6 +336,23 @@ extern int kgdb_nmicallin(int cpu, int trapnr, void *regs, int err_code,
> atomic_t *snd_rdy);
> extern void gdbstub_exit(int status);
>
> +/*
> + * kgdb and kprobes both use the same (kprobe) blocklist (which makes sense
> + * given they are both typically hooked up to the same trap meaning on most
> + * architectures one cannot be used to debug the other)
> + *
> + * However on architectures where kprobes is not (yet) implemented we permit
> + * breakpoints everywhere rather than blocking everything by default.
> + */
> +static inline bool kgdb_within_blocklist(unsigned long addr)
> +{
> +#ifdef CONFIG_KGDB_HONOUR_BLOCKLIST
> + return within_kprobe_blacklist(addr);
> +#else
> + return false;
> +#endif
> +}
> +
> extern int kgdb_single_step;
> extern atomic_t kgdb_active;
> #define in_dbg_master() \
> diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> index b16dbc1bf056..b1277728a835 100644
> --- a/kernel/debug/debug_core.c
> +++ b/kernel/debug/debug_core.c
> @@ -188,6 +188,10 @@ int __weak kgdb_validate_break_address(unsigned long addr)
> {
> struct kgdb_bkpt tmp;
> int err;
> +
> + if (kgdb_within_blocklist(addr))
> + return -EINVAL;
> +
> /* Validate setting the breakpoint and then removing it. If the
> * remove fails, the kernel needs to emit a bad message because we
> * are deep trouble not being able to put things back the way we
> diff --git a/kernel/debug/kdb/kdb_bp.c b/kernel/debug/kdb/kdb_bp.c
> index d7ebb2c79cb8..ec4940146612 100644
> --- a/kernel/debug/kdb/kdb_bp.c
> +++ b/kernel/debug/kdb/kdb_bp.c
> @@ -306,6 +306,15 @@ static int kdb_bp(int argc, const char **argv)
> if (!template.bp_addr)
> return KDB_BADINT;
>
> + /*
> + * This check is redundant (since the breakpoint machinery should
> + * be doing the same check during kdb_bp_install) but gives the
> + * user immediate feedback.
> + */
> + diag = kgdb_validate_break_address(template.bp_addr);
> + if (diag)
> + return diag;
> +
> /*
> * Find an empty bp structure to allocate
> */
> diff --git a/lib/Kconfig.kgdb b/lib/Kconfig.kgdb
> index 256f2486f9bd..713c17fe789c 100644
> --- a/lib/Kconfig.kgdb
> +++ b/lib/Kconfig.kgdb
> @@ -24,6 +24,20 @@ menuconfig KGDB
>
> if KGDB
>
> +config KGDB_HONOUR_BLOCKLIST
> + bool "KGDB: use kprobe blocklist to prohibit unsafe breakpoints"
> + depends on HAVE_KPROBES
> + select KPROBES
> + default y
> + help
> + If set to Y the debug core will use the kprobe blocklist to
> + identify symbols where it is unsafe to set breakpoints.
> + In particular this disallows instrumentation of functions
> + called during debug trap handling and thus makes it very
> + difficult to inadvertently provoke recursive trap handling.
> +
> + If unsure, say Y.
> +
> config KGDB_SERIAL_CONSOLE
> tristate "KGDB: use kgdb over the serial console"
> select CONSOLE_POLL
> --
> 2.25.4
>

--
Masami Hiramatsu <[email protected]>

2020-09-16 00:31:57

by Daniel Thompson

[permalink] [raw]

Subject: Re: [PATCH v3 3/3] kernel: debug: Centralize dbg_[de]activate_sw_breakpoints

On Mon, Sep 14, 2020 at 05:13:28PM -0700, Doug Anderson wrote:
> Hi,
>
> On Mon, Sep 14, 2020 at 6:02 AM Daniel Thompson
> <[email protected]> wrote:
> >
> > During debug trap execution we expect dbg_deactivate_sw_breakpoints()
> > to be paired with an dbg_activate_sw_breakpoint(). Currently although
> > the calls are paired correctly they are needlessly smeared across three
> > different functions. Worse this also results in code to drive polled I/O
> > being called with breakpoints activated which, in turn, needlessly
> > increases the set of functions that will recursively trap if breakpointed.
> >
> > Fix this by moving the activation of breakpoints into the debug core.
> >
> > Signed-off-by: Daniel Thompson <[email protected]>
> > ---
> > kernel/debug/debug_core.c | 2 ++
> > kernel/debug/gdbstub.c | 1 -
> > kernel/debug/kdb/kdb_debugger.c | 2 --
> > 3 files changed, 2 insertions(+), 3 deletions(-)
>
> I like the idea, but previously the kgdb_arch_handle_exception() was
> always called after the SW breakpoints were activated. Are you sure
> it's OK to swap those two orders across all architectures?

Pretty sure, yes.

However, given the poor attention to detail I demonstrated in patch 2/3,
I figure I'd better write out the full chain of reasoning if I want
you to trust me ;-) .

kgdb_arch_handle_exception() is already called frequently with
breakpoints disabled since it is basically a fallback that is called
whenever the architecture neutral parts of the gdb packet processing
cannot cope.

So your question becomes more specific: is it OK to swap orders when the
architecture code is handling a (c)ontinue or (s)tep packet[1]?

The reason the architecture neutral part cannot cope is because it
because *if* gdb has been instructed that the PC must be changed before
resuming then the architecture neutral code does not know how to effect
this. In other words the call to kgdb_arch_handle_exception() will
boil down to doing whatever the architecture equivalent of writing to
linux_regs->pc actually is (or return an error).

Updating the PC is safe whether or not breakpoints are deactivated
and activating the breakpoints if the architecture code actually
censors the resume is a bug (which we just fixed).

Daniel.

[1]
The fallthroughs aren't a whole lot of fun to read but
if gdb_cmd_exception_pass() provokes a fallthrough then it will
have rewritten the packet as a (c)ontinue.

2020-09-16 23:38:49

by Doug Anderson

[permalink] [raw]

Subject: Re: [PATCH v3 3/3] kernel: debug: Centralize dbg_[de]activate_sw_breakpoints

Hi,

On Tue, Sep 15, 2020 at 6:45 AM Daniel Thompson
<[email protected]> wrote:
>
> On Mon, Sep 14, 2020 at 05:13:28PM -0700, Doug Anderson wrote:
> > Hi,
> >
> > On Mon, Sep 14, 2020 at 6:02 AM Daniel Thompson
> > <[email protected]> wrote:
> > >
> > > During debug trap execution we expect dbg_deactivate_sw_breakpoints()
> > > to be paired with an dbg_activate_sw_breakpoint(). Currently although
> > > the calls are paired correctly they are needlessly smeared across three
> > > different functions. Worse this also results in code to drive polled I/O
> > > being called with breakpoints activated which, in turn, needlessly
> > > increases the set of functions that will recursively trap if breakpointed.
> > >
> > > Fix this by moving the activation of breakpoints into the debug core.
> > >
> > > Signed-off-by: Daniel Thompson <[email protected]>
> > > ---
> > > kernel/debug/debug_core.c | 2 ++
> > > kernel/debug/gdbstub.c | 1 -
> > > kernel/debug/kdb/kdb_debugger.c | 2 --
> > > 3 files changed, 2 insertions(+), 3 deletions(-)
> >
> > I like the idea, but previously the kgdb_arch_handle_exception() was
> > always called after the SW breakpoints were activated. Are you sure
> > it's OK to swap those two orders across all architectures?
>
> Pretty sure, yes.
>
> However, given the poor attention to detail I demonstrated in patch 2/3,
> I figure I'd better write out the full chain of reasoning if I want
> you to trust me ;-) .
>
> kgdb_arch_handle_exception() is already called frequently with
> breakpoints disabled since it is basically a fallback that is called
> whenever the architecture neutral parts of the gdb packet processing
> cannot cope.
>
> So your question becomes more specific: is it OK to swap orders when the
> architecture code is handling a (c)ontinue or (s)tep packet[1]?
>
> The reason the architecture neutral part cannot cope is because it
> because *if* gdb has been instructed that the PC must be changed before
> resuming then the architecture neutral code does not know how to effect
> this. In other words the call to kgdb_arch_handle_exception() will
> boil down to doing whatever the architecture equivalent of writing to
> linux_regs->pc actually is (or return an error).
>
> Updating the PC is safe whether or not breakpoints are deactivated
> and activating the breakpoints if the architecture code actually
> censors the resume is a bug (which we just fixed).

OK, fair enough. Without digging into all the arch code, I was
assuming that some arch somewhere could be playing tricks on us.
After all, the arch code is told about all the breakpoints
(kgdb_arch_set_breakpoint) so, in theory, it could be doing something
funky (not applying the breakpoint until it sees the "continue" or
something?).

I guess the one thing that seems the most plausible that an
architecture might be doing is doing something special if it is told
to "continue" or "single step" an address that had a breakpoint on it.
I guess I could imagine that on some architectures this might require
special handling (could it be somehow illegal to run the CPU in step
mode over a breakpoint instruction?). ...or maybe if it was using
hardware breakpoints those don't trigger properly if you're continuing
to the address of the breakpoint? I'm making stuff up here and
presumably none of this is true, but it's what I was worried about.

From a quick glance, I don't _think_ anyone is doing this. Presumably
today they'd either need a) a funky implementation for
kgdb_arch_set_breakpoint() or they'd need b) code somewhere which read
memory and looked for "gdb_bpt_instr".

a) Looking at kgdb_arch_set_breakpoint(), I don't see anything too
funky. They all look like nearly the same code of writing the
breakpoint to memory, perhaps taking into account locked .text space
by using a special patch routine.

b) Looking at users of "gdb_bpt_instr" I do see some funkiness in
"h8300" and "microblaze", but nothing that looks like it fits the bill
of what we're looking for.

In any case, even if someone was doing something funky, it would be
possible to adapt the code to the new way of the world. You'd just
check the PC when applying breakpoints rather than checking the
breakpoints when continuing. So I'm going to go ahead and say:

Reviewed-by: Douglas Anderson <[email protected]>

...and I guess if it really truly causes problems for someone we'll
figure it out.

Feel free to ignore the below. I wrote it up but realized it probably
wasn't useful but couldn't bear to delete it. :-P

One somewhat plausible example (I don't think this happens, so feel
free to let your eyes glaze over and ignore). Assume that the kernel
has a mix of ARM and Thumb code. When told to set a breakpoint at an
address the kernel wouldn't know whether the address refers to ARM or
Thumb code. In the past I've solved this type of problem by using an
instruction as a breakpoint that is an illegal instruction when
interpreted any which way (the instruction is illegal as an ARM
instruction and both halves illegal as a Thumb instruction). Right
when we're about to continue, though, we actually have extra
information about the PC we're going to continue to. If we know we're
about to continue in Thumb mode and the address we're going to
continue on is the 2nd half of a breakpoint instruction we suddenly
know that the breakpoint should have been a Thumb-mode instruction and
we could fix it up.

AKA:

1. kernel is asked to set a breakpoint at 0x12340000. We don't know
if this is arm or thumb so we set a 32-bit (ARM) breakpoint at
0x12340000

2. We're told to continue in thumb mode at address 0x12340002

3. We suddenly know that the breakpoint at 0x12340000 should have been
a 16-bit (Thumb) breakpoint, not a 32-bit (ARM) breakpoint, so we
could fix it up before continuing.

OK, that probably was just confusing, and, like I said, I don't think
kdb in Linux does that. ;-)

-Doug

2020-09-27 21:17:25

by Daniel Thompson

[permalink] [raw]

Subject: Re: [PATCH v3 2/3] kgdb: Add NOKPROBE labels on the trap handler functions

On Mon, Sep 14, 2020 at 05:14:22PM -0700, Doug Anderson wrote:
> Hi,
>
> On Mon, Sep 14, 2020 at 6:02 AM Daniel Thompson
> <[email protected]> wrote:
> >
> > Currently kgdb honours the kprobe blocklist but doesn't place its own
> > trap handling code on the list. Add labels to discourage attempting to
> > use kgdb to debug itself.
> >
> > Not every functions that executes from the trap handler needs to be
> > marked up: relatively early in the trap handler execution (just after
> > we bring the other CPUs to a halt) all breakpoints are replaced with
> > the original opcodes. This patch marks up code in the debug_core that
> > executes between trap entry and the breakpoints being deactivated
> > and, also, code that executes between breakpoint activation and trap
> > exit.
>
> Other functions that seem to be missing from a quick skim:
> * kgdb_io_ready()
> * kgdb_roundup_cpus()
> * kgdb_call_nmi_hook()

I've grabbed those (and a one or two extras after paying better attention
to the entry logic).

> I'm not confident in my ability to spot every code path, though, so
> I'm not sure at what point we stop looking. I only spent a few
> minutes and, if important, I could dig more. Did you have any chance
> to see if there was any way to have a magic linker script just add
> this to everything under "kernel/debug" or something like that where
> we just use a heavier hammer to whack a whole bunch?

I think one could play games with linker sections but it would involve
adding extra infrastructure for the kprobe blocklist. I'm not convinced
that is worth the effort whilst there are acknowledged (and bigger) gaps
elsewhere.

> In general any extra annotation here is better than no annotation, I
> suppose. ...so if you just want to commit what you have (maybe with
> the above 3 extra functions) then I suppose it'd be fine.

This wasn't quite confident enough for me to convert into an Acked-by:
but I plan to pull v4 into -next very shortly after posting it (since
everything else is agreed).

Daniel.