2013-09-23 21:26:02

by Mike Travis

[permalink] [raw]
Subject: [PATCH 5/7] KGDB/KDB: add support for external NMI handler to call KGDB/KDB.

This patch adds a kgdb_nmicallin() interface that can be used by
external NMI handlers to call the KGDB/KDB handler. The primary need
for this is for those types of NMI interrupts where all the CPUs
have already received the NMI signal. Therefore no send_IPI(NMI)
is required, and in fact it will cause a 2nd unhandled NMI to occur.
This generates the "Dazed and Confuzed" messages.

Since all the CPUs are getting the NMI at roughly the same time, it's not
guaranteed that the first CPU that hits the NMI handler will manage to
enter KGDB and set the dbg_master_lock before the slaves start entering.
The new argument "send_ready" was added for KGDB to signal the NMI handler
to release the slave CPUs for entry into KGDB.

Signed-off-by: Mike Travis <[email protected]>
Reviewed-by: Dimitri Sivanich <[email protected]>
Reviewed-by: Hedi Berriche <[email protected]>
---
include/linux/kdb.h | 1 +
include/linux/kgdb.h | 1 +
kernel/debug/debug_core.c | 30 +++++++++++++++++++++++++++++-
kernel/debug/debug_core.h | 1 +
kernel/debug/kdb/kdb_debugger.c | 5 ++++-
kernel/debug/kdb/kdb_main.c | 3 +++
6 files changed, 39 insertions(+), 2 deletions(-)

--- linux.orig/include/linux/kdb.h
+++ linux/include/linux/kdb.h
@@ -109,6 +109,7 @@ typedef enum {
KDB_REASON_RECURSE, /* Recursive entry to kdb;
* regs probably valid */
KDB_REASON_SSTEP, /* Single Step trap. - regs valid */
+ KDB_REASON_SYSTEM_NMI, /* In NMI due to SYSTEM cmd; regs valid */
} kdb_reason_t;

extern int kdb_trap_printk;
--- linux.orig/include/linux/kgdb.h
+++ linux/include/linux/kgdb.h
@@ -310,6 +310,7 @@ extern int
kgdb_handle_exception(int ex_vector, int signo, int err_code,
struct pt_regs *regs);
extern int kgdb_nmicallback(int cpu, void *regs);
+extern int kgdb_nmicallin(int cpu, int trapnr, void *regs, atomic_t *snd_rdy);
extern void gdbstub_exit(int status);

extern int kgdb_single_step;
--- linux.orig/kernel/debug/debug_core.c
+++ linux/kernel/debug/debug_core.c
@@ -575,8 +575,12 @@ return_normal:
raw_spin_lock(&dbg_slave_lock);

#ifdef CONFIG_SMP
+ /* If SYSTEM_NMI, slaves are already waiting */
+ if (ks->err_code == KDB_REASON_SYSTEM_NMI)
+ atomic_set(ks->send_ready, 1);
+
/* Signal the other CPUs to enter kgdb_wait() */
- if ((!kgdb_single_step) && kgdb_do_roundup)
+ else if ((!kgdb_single_step) && kgdb_do_roundup)
kgdb_roundup_cpus(flags);
#endif

@@ -729,6 +733,30 @@ int kgdb_nmicallback(int cpu, void *regs
return 0;
}
#endif
+ return 1;
+}
+
+int kgdb_nmicallin(int cpu, int trapnr, void *regs, atomic_t *send_ready)
+{
+#ifdef CONFIG_SMP
+ if (!kgdb_io_ready(0) || !send_ready)
+ return 1;
+
+ if (kgdb_info[cpu].enter_kgdb == 0) {
+ struct kgdb_state kgdb_var;
+ struct kgdb_state *ks = &kgdb_var;
+
+ memset(ks, 0, sizeof(struct kgdb_state));
+ ks->cpu = cpu;
+ ks->ex_vector = trapnr;
+ ks->signo = SIGTRAP;
+ ks->err_code = KDB_REASON_SYSTEM_NMI;
+ ks->linux_regs = regs;
+ ks->send_ready = send_ready;
+ kgdb_cpu_enter(ks, regs, DCPU_WANT_MASTER);
+ return 0;
+ }
+#endif
return 1;
}

--- linux.orig/kernel/debug/debug_core.h
+++ linux/kernel/debug/debug_core.h
@@ -26,6 +26,7 @@ struct kgdb_state {
unsigned long threadid;
long kgdb_usethreadid;
struct pt_regs *linux_regs;
+ atomic_t *send_ready;
};

/* Exception state values */
--- linux.orig/kernel/debug/kdb/kdb_debugger.c
+++ linux/kernel/debug/kdb/kdb_debugger.c
@@ -69,7 +69,10 @@ int kdb_stub(struct kgdb_state *ks)
if (atomic_read(&kgdb_setting_breakpoint))
reason = KDB_REASON_KEYBOARD;

- if (in_nmi())
+ if (ks->err_code == KDB_REASON_SYSTEM_NMI)
+ reason = KDB_REASON_SYSTEM_NMI;
+
+ else if (in_nmi())
reason = KDB_REASON_NMI;

for (i = 0, bp = kdb_breakpoints; i < KDB_MAXBPT; i++, bp++) {
--- linux.orig/kernel/debug/kdb/kdb_main.c
+++ linux/kernel/debug/kdb/kdb_main.c
@@ -1200,6 +1200,9 @@ static int kdb_local(kdb_reason_t reason
instruction_pointer(regs));
kdb_dumpregs(regs);
break;
+ case KDB_REASON_SYSTEM_NMI:
+ kdb_printf("due to System NonMaskable Interrupt\n");
+ break;
case KDB_REASON_NMI:
kdb_printf("due to NonMaskable Interrupt @ "
kdb_machreg_fmt "\n",

--


2013-09-24 08:15:51

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 5/7] KGDB/KDB: add support for external NMI handler to call KGDB/KDB.


* Mike Travis <[email protected]> wrote:

> This patch adds a kgdb_nmicallin() interface that can be used by
> external NMI handlers to call the KGDB/KDB handler. The primary need
> for this is for those types of NMI interrupts where all the CPUs
> have already received the NMI signal. Therefore no send_IPI(NMI)
> is required, and in fact it will cause a 2nd unhandled NMI to occur.
> This generates the "Dazed and Confuzed" messages.
>
> Since all the CPUs are getting the NMI at roughly the same time, it's not
> guaranteed that the first CPU that hits the NMI handler will manage to
> enter KGDB and set the dbg_master_lock before the slaves start entering.
> The new argument "send_ready" was added for KGDB to signal the NMI handler
> to release the slave CPUs for entry into KGDB.
>
> Signed-off-by: Mike Travis <[email protected]>
> Reviewed-by: Dimitri Sivanich <[email protected]>
> Reviewed-by: Hedi Berriche <[email protected]>
> ---
> include/linux/kdb.h | 1 +
> include/linux/kgdb.h | 1 +
> kernel/debug/debug_core.c | 30 +++++++++++++++++++++++++++++-
> kernel/debug/debug_core.h | 1 +
> kernel/debug/kdb/kdb_debugger.c | 5 ++++-
> kernel/debug/kdb/kdb_main.c | 3 +++
> 6 files changed, 39 insertions(+), 2 deletions(-)

Jason, any objections to form or substance of this callback?

Thanks,

Ingo

2013-09-26 20:46:26

by Jason Wessel

[permalink] [raw]
Subject: Re: [PATCH 5/7] KGDB/KDB: add support for external NMI handler to call KGDB/KDB.

On 09/23/2013 04:25 PM, Mike Travis wrote:
> This patch adds a kgdb_nmicallin() interface that can be used by
> external NMI handlers to call the KGDB/KDB handler. The primary need
> for this is for those types of NMI interrupts where all the CPUs
> have already received the NMI signal. Therefore no send_IPI(NMI)
> is required, and in fact it will cause a 2nd unhandled NMI to occur.
> This generates the "Dazed and Confuzed" messages.
>
> Since all the CPUs are getting the NMI at roughly the same time, it's not
> guaranteed that the first CPU that hits the NMI handler will manage to
> enter KGDB and set the dbg_master_lock before the slaves start entering.
> The new argument "send_ready" was added for KGDB to signal the NMI handler
> to release the slave CPUs for entry into KGDB.
>
> Signed-off-by: Mike Travis <[email protected]>
> Reviewed-by: Dimitri Sivanich <[email protected]>
> Reviewed-by: Hedi Berriche <[email protected]>


One problem that I pointed out before and then you can add

Acked-by: Jason Wessel <[email protected]>


> ---
> include/linux/kdb.h | 1 +
> include/linux/kgdb.h | 1 +
> kernel/debug/debug_core.c | 30 +++++++++++++++++++++++++++++-
> kernel/debug/debug_core.h | 1 +
> kernel/debug/kdb/kdb_debugger.c | 5 ++++-
> kernel/debug/kdb/kdb_main.c | 3 +++
> 6 files changed, 39 insertions(+), 2 deletions(-)
>
> --- linux.orig/include/linux/kdb.h
> +++ linux/include/linux/kdb.h
> @@ -109,6 +109,7 @@ typedef enum {
> KDB_REASON_RECURSE, /* Recursive entry to kdb;
> * regs probably valid */
> KDB_REASON_SSTEP, /* Single Step trap. - regs valid */
> + KDB_REASON_SYSTEM_NMI, /* In NMI due to SYSTEM cmd; regs valid */
> } kdb_reason_t;
>
> extern int kdb_trap_printk;
> --- linux.orig/include/linux/kgdb.h
> +++ linux/include/linux/kgdb.h
> @@ -310,6 +310,7 @@ extern int
> kgdb_handle_exception(int ex_vector, int signo, int err_code,
> struct pt_regs *regs);
> extern int kgdb_nmicallback(int cpu, void *regs);
> +extern int kgdb_nmicallin(int cpu, int trapnr, void *regs, atomic_t *snd_rdy);
> extern void gdbstub_exit(int status);
>
> extern int kgdb_single_step;
> --- linux.orig/kernel/debug/debug_core.c
> +++ linux/kernel/debug/debug_core.c
> @@ -575,8 +575,12 @@ return_normal:
> raw_spin_lock(&dbg_slave_lock);
>
> #ifdef CONFIG_SMP
> + /* If SYSTEM_NMI, slaves are already waiting */
> + if (ks->err_code == KDB_REASON_SYSTEM_NMI)
> + atomic_set(ks->send_ready, 1);
> +
> /* Signal the other CPUs to enter kgdb_wait() */
> - if ((!kgdb_single_step) && kgdb_do_roundup)
> + else if ((!kgdb_single_step) && kgdb_do_roundup)
> kgdb_roundup_cpus(flags);
> #endif
>
> @@ -729,6 +733,30 @@ int kgdb_nmicallback(int cpu, void *regs
> return 0;
> }
> #endif
> + return 1;
> +}
> +
> +int kgdb_nmicallin(int cpu, int trapnr, void *regs, atomic_t *send_ready)
> +{
> +#ifdef CONFIG_SMP
> + if (!kgdb_io_ready(0) || !send_ready)
> + return 1;
> +
> + if (kgdb_info[cpu].enter_kgdb == 0) {
> + struct kgdb_state kgdb_var;
> + struct kgdb_state *ks = &kgdb_var;
> +
> + memset(ks, 0, sizeof(struct kgdb_state));
> + ks->cpu = cpu;
> + ks->ex_vector = trapnr;
> + ks->signo = SIGTRAP;
> + ks->err_code = KDB_REASON_SYSTEM_NMI;
> + ks->linux_regs = regs;
> + ks->send_ready = send_ready;
> + kgdb_cpu_enter(ks, regs, DCPU_WANT_MASTER);
> + return 0;
> + }
> +#endif
> return 1;
> }
>
> --- linux.orig/kernel/debug/debug_core.h
> +++ linux/kernel/debug/debug_core.h
> @@ -26,6 +26,7 @@ struct kgdb_state {
> unsigned long threadid;
> long kgdb_usethreadid;
> struct pt_regs *linux_regs;
> + atomic_t *send_ready;
> };
>
> /* Exception state values */
> --- linux.orig/kernel/debug/kdb/kdb_debugger.c
> +++ linux/kernel/debug/kdb/kdb_debugger.c
> @@ -69,7 +69,10 @@ int kdb_stub(struct kgdb_state *ks)
> if (atomic_read(&kgdb_setting_breakpoint))
> reason = KDB_REASON_KEYBOARD;
>
> - if (in_nmi())
> + if (ks->err_code == KDB_REASON_SYSTEM_NMI)


This should be:

(ks->err == KDB_REASON_SYSNMI && ks->signo == SIGTRAP)

Other arch's in their arch specific handling code may set ks->err differently. The err code is signal specific. For example:

arch/hexagon/kernel/kgdb.c: if (kgdb_handle_exception(args->trapnr & 0xff, args->signr, args->err,

The error passed in will be different for other signals and we do not want it to collide.

Cheers,
Jason.


> + reason = KDB_REASON_SYSTEM_NMI;
> +
> + else if (in_nmi())
> reason = KDB_REASON_NMI;
>
> for (i = 0, bp = kdb_breakpoints; i < KDB_MAXBPT; i++, bp++) {
> --- linux.orig/kernel/debug/kdb/kdb_main.c
> +++ linux/kernel/debug/kdb/kdb_main.c
> @@ -1200,6 +1200,9 @@ static int kdb_local(kdb_reason_t reason
> instruction_pointer(regs));
> kdb_dumpregs(regs);
> break;
> + case KDB_REASON_SYSTEM_NMI:
> + kdb_printf("due to System NonMaskable Interrupt\n");
> + break;
> case KDB_REASON_NMI:
> kdb_printf("due to NonMaskable Interrupt @ "
> kdb_machreg_fmt "\n",
>
> --
>

2013-10-03 05:45:55

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 5/7] KGDB/KDB: add support for external NMI handler to call KGDB/KDB.


* Jason Wessel <[email protected]> wrote:

> On 09/23/2013 04:25 PM, Mike Travis wrote:
> > This patch adds a kgdb_nmicallin() interface that can be used by
> > external NMI handlers to call the KGDB/KDB handler. The primary need
> > for this is for those types of NMI interrupts where all the CPUs
> > have already received the NMI signal. Therefore no send_IPI(NMI)
> > is required, and in fact it will cause a 2nd unhandled NMI to occur.
> > This generates the "Dazed and Confuzed" messages.
> >
> > Since all the CPUs are getting the NMI at roughly the same time, it's not
> > guaranteed that the first CPU that hits the NMI handler will manage to
> > enter KGDB and set the dbg_master_lock before the slaves start entering.
> > The new argument "send_ready" was added for KGDB to signal the NMI handler
> > to release the slave CPUs for entry into KGDB.
> >
> > Signed-off-by: Mike Travis <[email protected]>
> > Reviewed-by: Dimitri Sivanich <[email protected]>
> > Reviewed-by: Hedi Berriche <[email protected]>
>
> One problem that I pointed out before and then you can add
>
> Acked-by: Jason Wessel <[email protected]>
>
> > ---
> > include/linux/kdb.h | 1 +
> > include/linux/kgdb.h | 1 +
> > kernel/debug/debug_core.c | 30 +++++++++++++++++++++++++++++-
> > kernel/debug/debug_core.h | 1 +
> > kernel/debug/kdb/kdb_debugger.c | 5 ++++-
> > kernel/debug/kdb/kdb_main.c | 3 +++
> > 6 files changed, 39 insertions(+), 2 deletions(-)

Jason, would you be fine with me merging this via tip:x86/uv with your
Acked-by? That is the tree where the actual usecase of this callback would
be merged upstream (v3.13-ish), and most of the other commits are x86/uv
specific.

Thanks,

Ingo