Hey all-
Theres a corner case in 32 bit x86 kdump at the moment. When the box
panics via nmi, we call bust_spinlocks(1) to disable sensitivity to the
console_sem (allowing us to print to the console in all cases), but we don't
call crash_kexec, until after we call bust_spinlocks(0), which re-enables
console_sem sensitivity. The result is that, if we get an nmi while the
console_sem is held and kdump is configured, and we try to print something to
the console during kdump shutdown (which we often do) we deadlock the box. The
fix is to simply do what 64 bit die_nmi does which is to not call
bust_spinlocks(0) until after we call crash_kexec. Patch below tested
successfully by me:
Regards
Neil
Signed-off-by: Neil Horman <[email protected]>
dumpstack_32.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 1a78180..b361475 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -405,7 +405,6 @@ die_nmi(char *str, struct pt_regs *regs, int do_panic)
panic("Non maskable interrupt");
console_silent();
spin_unlock(&nmi_print_lock);
- bust_spinlocks(0);
/*
* If we are in kernel we are probably nested up pretty bad
@@ -416,6 +415,7 @@ die_nmi(char *str, struct pt_regs *regs, int do_panic)
crash_kexec(regs);
}
+ bust_spinlocks(0);
do_exit(SIGSEGV);
}
--
/****************************************************
* Neil Horman <[email protected]>
* Software Engineer, Red Hat
****************************************************/
* Neil Horman <[email protected]> wrote:
> Hey all-
> Theres a corner case in 32 bit x86 kdump at the moment. When
> the box panics via nmi, we call bust_spinlocks(1) to disable
> sensitivity to the console_sem (allowing us to print to the console in
> all cases), but we don't call crash_kexec, until after we call
> bust_spinlocks(0), which re-enables console_sem sensitivity. The
> result is that, if we get an nmi while the console_sem is held and
> kdump is configured, and we try to print something to the console
> during kdump shutdown (which we often do) we deadlock the box. The
> fix is to simply do what 64 bit die_nmi does which is to not call
> bust_spinlocks(0) until after we call crash_kexec. Patch below tested
> successfully by me:
applied to tip/x86/urgent, thanks Neil!
> dumpstack_32.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
would be nice to unify this code some more - to create a new
arch/x86/kernel/dumpstack.c and fill it in with die_nmi() as a
beginning?
Ingo
On Mon, Oct 20, 2008 at 02:13:39PM +0200, Ingo Molnar wrote:
>
> * Neil Horman <[email protected]> wrote:
>
> > Hey all-
> > Theres a corner case in 32 bit x86 kdump at the moment. When
> > the box panics via nmi, we call bust_spinlocks(1) to disable
> > sensitivity to the console_sem (allowing us to print to the console in
> > all cases), but we don't call crash_kexec, until after we call
> > bust_spinlocks(0), which re-enables console_sem sensitivity. The
> > result is that, if we get an nmi while the console_sem is held and
> > kdump is configured, and we try to print something to the console
> > during kdump shutdown (which we often do) we deadlock the box. The
> > fix is to simply do what 64 bit die_nmi does which is to not call
> > bust_spinlocks(0) until after we call crash_kexec. Patch below tested
> > successfully by me:
>
> applied to tip/x86/urgent, thanks Neil!
>
Thanks, Ingo!
> > dumpstack_32.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
>
> would be nice to unify this code some more - to create a new
> arch/x86/kernel/dumpstack.c and fill it in with die_nmi() as a
> beginning?
>
Agreed, I'll try look into that as soon as I have some free time.
Regards
Neil
> Ingo
>
--
/****************************************************
* Neil Horman <[email protected]>
* Software Engineer, Red Hat
****************************************************/
Hi Ingo, Neil,
Doing more unification of dumpstack is/was on my
TODO list, but it got sidetracked with some other
hobby projects.
I did unify oops_begin, oops_end, die and die_nmi,
and it seems to work. It needs more careful testing,
I think, but if someone (Neil?) wants to take a
look and validate the result...
Greetings,
Alexander
Mostly use the x86_64 version of oops_begin() and oops_end() on
i386 too. Changes to the original x86_64 version:
- move add_taint(TAINT_DIE) into oops_end()
- add a conditional crash_kexec() into oops_end()
Signed-off-by: Alexander van Heukelum <[email protected]>
---
arch/x86/kernel/dumpstack_32.c | 36 +++++++++++++++++++++---------------
arch/x86/kernel/dumpstack_64.c | 4 +++-
2 files changed, 24 insertions(+), 16 deletions(-)
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index b361475..e45952b 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -289,35 +289,41 @@ static unsigned int die_nest_count;
unsigned __kprobes long oops_begin(void)
{
+ int cpu;
unsigned long flags;
oops_enter();
- if (die_owner != raw_smp_processor_id()) {
- console_verbose();
- raw_local_irq_save(flags);
- __raw_spin_lock(&die_lock);
- die_owner = smp_processor_id();
- die_nest_count = 0;
- bust_spinlocks(1);
- } else {
- raw_local_irq_save(flags);
+ /* racy, but better than risking deadlock. */
+ raw_local_irq_save(flags);
+ cpu = smp_processor_id();
+ if (!__raw_spin_trylock(&die_lock)) {
+ if (cpu == die_owner)
+ /* nested oops. should stop eventually */;
+ else
+ __raw_spin_lock(&die_lock);
}
die_nest_count++;
+ die_owner = cpu;
+ console_verbose();
+ bust_spinlocks(1);
return flags;
}
void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
{
- bust_spinlocks(0);
die_owner = -1;
+ bust_spinlocks(0);
+ die_nest_count--;
add_taint(TAINT_DIE);
- __raw_spin_unlock(&die_lock);
+ if (!die_nest_count)
+ /* Nest count reaches zero, release the lock. */
+ __raw_spin_unlock(&die_lock);
raw_local_irq_restore(flags);
-
- if (!regs)
+ if (!regs) {
+ oops_exit();
return;
-
+ }
if (kexec_should_crash(current))
crash_kexec(regs);
if (in_interrupt())
@@ -405,6 +411,7 @@ die_nmi(char *str, struct pt_regs *regs, int do_panic)
panic("Non maskable interrupt");
console_silent();
spin_unlock(&nmi_print_lock);
+ bust_spinlocks(0);
/*
* If we are in kernel we are probably nested up pretty bad
@@ -415,7 +422,6 @@ die_nmi(char *str, struct pt_regs *regs, int do_panic)
crash_kexec(regs);
}
- bust_spinlocks(0);
do_exit(SIGSEGV);
}
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 96a5db7..cd7b46b 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -461,6 +461,7 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
die_owner = -1;
bust_spinlocks(0);
die_nest_count--;
+ add_taint(TAINT_DIE);
if (!die_nest_count)
/* Nest count reaches zero, release the lock. */
__raw_spin_unlock(&die_lock);
@@ -469,6 +470,8 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
oops_exit();
return;
}
+ if (kexec_should_crash(current))
+ crash_kexec(regs);
if (in_interrupt())
panic("Fatal exception in interrupt");
if (panic_on_oops)
@@ -496,7 +499,6 @@ int __kprobes __die(const char *str, struct pt_regs *regs, long err)
return 1;
show_registers(regs);
- add_taint(TAINT_DIE);
/* Executive summary in case the oops scrolled away */
printk(KERN_ALERT "RIP ");
printk_address(regs->ip, 1);
--
1.5.4.3
Use the x86_64 version of die() and die_nmi() on i386 too.
Changes to the original x86_64-version have no influence
on the generated code:
- whitespace, comments
- use user_mode_vm() instead of user_mode()
Signed-off-by: Alexander van Heukelum <[email protected]>
---
arch/x86/kernel/dumpstack_32.c | 40 ++++++++++++++--------------------------
arch/x86/kernel/dumpstack_64.c | 9 +++++++--
2 files changed, 21 insertions(+), 28 deletions(-)
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index e45952b..6f00938 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -377,52 +377,40 @@ void die(const char *str, struct pt_regs *regs, long err)
{
unsigned long flags = oops_begin();
- if (die_nest_count < 3) {
+ if (!user_mode_vm(regs))
report_bug(regs->ip, regs);
- if (__die(str, regs, err))
- regs = NULL;
- } else {
- printk(KERN_EMERG "Recursive die() failure, output suppressed\n");
- }
+ if (__die(str, regs, err))
+ regs = NULL;
oops_end(flags, regs, SIGSEGV);
}
-static DEFINE_SPINLOCK(nmi_print_lock);
-
void notrace __kprobes
die_nmi(char *str, struct pt_regs *regs, int do_panic)
{
+ unsigned long flags;
+
if (notify_die(DIE_NMIWATCHDOG, str, regs, 0, 2, SIGINT) == NOTIFY_STOP)
return;
- spin_lock(&nmi_print_lock);
+ flags = oops_begin();
/*
* We are in trouble anyway, lets at least try
- * to get a message out:
+ * to get a message out.
*/
- bust_spinlocks(1);
printk(KERN_EMERG "%s", str);
printk(" on CPU%d, ip %08lx, registers:\n",
smp_processor_id(), regs->ip);
show_registers(regs);
- if (do_panic)
- panic("Non maskable interrupt");
- console_silent();
- spin_unlock(&nmi_print_lock);
- bust_spinlocks(0);
-
- /*
- * If we are in kernel we are probably nested up pretty bad
- * and might aswell get out now while we still can:
- */
- if (!user_mode_vm(regs)) {
- current->thread.trap_no = 2;
+ if (kexec_should_crash(current))
crash_kexec(regs);
- }
-
- do_exit(SIGSEGV);
+ if (do_panic || panic_on_oops)
+ panic("Non maskable interrupt");
+ oops_end(flags, NULL, SIGBUS);
+ nmi_exit();
+ local_irq_enable();
+ do_exit(SIGBUS);
}
static int __init oops_setup(char *s)
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index cd7b46b..dbcca05 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -508,19 +508,24 @@ int __kprobes __die(const char *str, struct pt_regs *regs, long err)
return 0;
}
+/*
+ * This is gone through when something in the kernel has done something bad
+ * and is about to be terminated:
+ */
void die(const char *str, struct pt_regs *regs, long err)
{
unsigned long flags = oops_begin();
- if (!user_mode(regs))
+ if (!user_mode_vm(regs))
report_bug(regs->ip, regs);
if (__die(str, regs, err))
regs = NULL;
+
oops_end(flags, regs, SIGSEGV);
}
-notrace __kprobes void
+void notrace __kprobes
die_nmi(char *str, struct pt_regs *regs, int do_panic)
{
unsigned long flags;
--
1.5.4.3
On Mon, Oct 20, 2008 at 05:07:31PM +0200, Alexander van Heukelum wrote:
> Hi Ingo, Neil,
>
> Doing more unification of dumpstack is/was on my
> TODO list, but it got sidetracked with some other
> hobby projects.
>
> I did unify oops_begin, oops_end, die and die_nmi,
> and it seems to work. It needs more careful testing,
> I think, but if someone (Neil?) wants to take a
> look and validate the result...
>
> Greetings,
> Alexander
>
Please send me the patch, If/when I get a few moments, I'd be happy to give it a
spin here.
Thanks!
Neil
> _______________________________________________
> kexec mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/kexec
--
/***************************************************
*Neil Horman
*Senior Software Engineer
*Red Hat, Inc.
*[email protected]
*gpg keyid: 1024D / 0x92A74FA1
*http://pgp.mit.edu
***************************************************/
On Mon, Oct 20, 2008 at 05:08:34PM +0200, Alexander van Heukelum wrote:
> Mostly use the x86_64 version of oops_begin() and oops_end() on
> i386 too. Changes to the original x86_64 version:
>
Hey, doing a sight review this am here. Didn't find anything major, but I did
find a few little nits. comments inlie
> - move add_taint(TAINT_DIE) into oops_end()
> - add a conditional crash_kexec() into oops_end()
>
> Signed-off-by: Alexander van Heukelum <[email protected]>
> ---
> arch/x86/kernel/dumpstack_32.c | 36 +++++++++++++++++++++---------------
> arch/x86/kernel/dumpstack_64.c | 4 +++-
> 2 files changed, 24 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
> index b361475..e45952b 100644
> --- a/arch/x86/kernel/dumpstack_32.c
> +++ b/arch/x86/kernel/dumpstack_32.c
> @@ -289,35 +289,41 @@ static unsigned int die_nest_count;
>
> unsigned __kprobes long oops_begin(void)
> {
> + int cpu;
> unsigned long flags;
>
> oops_enter();
>
> - if (die_owner != raw_smp_processor_id()) {
> - console_verbose();
> - raw_local_irq_save(flags);
> - __raw_spin_lock(&die_lock);
> - die_owner = smp_processor_id();
> - die_nest_count = 0;
> - bust_spinlocks(1);
> - } else {
> - raw_local_irq_save(flags);
> + /* racy, but better than risking deadlock. */
> + raw_local_irq_save(flags);
> + cpu = smp_processor_id();
> + if (!__raw_spin_trylock(&die_lock)) {
> + if (cpu == die_owner)
> + /* nested oops. should stop eventually */;
> + else
> + __raw_spin_lock(&die_lock);
> }
> die_nest_count++;
> + die_owner = cpu;
> + console_verbose();
> + bust_spinlocks(1);
> return flags;
> }
>
> void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
> {
> - bust_spinlocks(0);
> die_owner = -1;
> + bust_spinlocks(0);
> + die_nest_count--;
> add_taint(TAINT_DIE);
> - __raw_spin_unlock(&die_lock);
> + if (!die_nest_count)
> + /* Nest count reaches zero, release the lock. */
> + __raw_spin_unlock(&die_lock);
> raw_local_irq_restore(flags);
> -
> - if (!regs)
> + if (!regs) {
> + oops_exit();
> return;
> -
> + }
> if (kexec_should_crash(current))
> crash_kexec(regs);
Hmm. I think this creates the same case that I just fixed in my initial post.
If we start using oops_end with this here, it may be possible to call
crash_kexec with the console_sem held. If that happens, we deadlock. I think
you should be able to move this clause up above the bust_spinlocks(0) without
any issue, and that would take care of that
> if (in_interrupt())
> @@ -405,6 +411,7 @@ die_nmi(char *str, struct pt_regs *regs, int do_panic)
> panic("Non maskable interrupt");
> console_silent();
> spin_unlock(&nmi_print_lock);
> + bust_spinlocks(0);
>
> /*
> * If we are in kernel we are probably nested up pretty bad
> @@ -415,7 +422,6 @@ die_nmi(char *str, struct pt_regs *regs, int do_panic)
> crash_kexec(regs);
> }
>
> - bust_spinlocks(0);
> do_exit(SIGSEGV);
> }
>
This undoes my previous patch. I realize your second patch fixes it properly so
the ordering is correct when oops_begin and oops_end are used, but if you could
rediff so this isn't here, I'd appreciate it. If these patches are committed
separately, you'll avoid having the tree in a state where that deadlock can
reoccur (even if it is just for one commit)
> diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
> index 96a5db7..cd7b46b 100644
> --- a/arch/x86/kernel/dumpstack_64.c
> +++ b/arch/x86/kernel/dumpstack_64.c
> @@ -461,6 +461,7 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
> die_owner = -1;
> bust_spinlocks(0);
> die_nest_count--;
> + add_taint(TAINT_DIE);
> if (!die_nest_count)
> /* Nest count reaches zero, release the lock. */
> __raw_spin_unlock(&die_lock);
> @@ -469,6 +470,8 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
> oops_exit();
> return;
> }
> + if (kexec_should_crash(current))
> + crash_kexec(regs);
If you're going to add the crash_kexec here (which looking at the call sites,
makes sense to me), you should likely remove it from the critical section of die
and die_nmi, just to avoid the redundancy. Same issue as the 32 bit version
above applies, this needs to happen before you call bust_spinlocks(0).
Fix those issues, and the rest looks good to me.
Regards
Neil
--
/****************************************************
* Neil Horman <[email protected]>
* Software Engineer, Red Hat
****************************************************/
On Mon, Oct 20, 2008 at 05:11:06PM +0200, Alexander van Heukelum wrote:
> Use the x86_64 version of die() and die_nmi() on i386 too.
> Changes to the original x86_64-version have no influence
> on the generated code:
>
> - whitespace, comments
> - use user_mode_vm() instead of user_mode()
>
Hey, smore more comments inline
> Signed-off-by: Alexander van Heukelum <[email protected]>
> ---
> arch/x86/kernel/dumpstack_32.c | 40 ++++++++++++++--------------------------
> arch/x86/kernel/dumpstack_64.c | 9 +++++++--
> 2 files changed, 21 insertions(+), 28 deletions(-)
>
> diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
> index e45952b..6f00938 100644
> --- a/arch/x86/kernel/dumpstack_32.c
> +++ b/arch/x86/kernel/dumpstack_32.c
> @@ -377,52 +377,40 @@ void die(const char *str, struct pt_regs *regs, long err)
> {
> unsigned long flags = oops_begin();
>
> - if (die_nest_count < 3) {
> + if (!user_mode_vm(regs))
> report_bug(regs->ip, regs);
>
> - if (__die(str, regs, err))
> - regs = NULL;
> - } else {
> - printk(KERN_EMERG "Recursive die() failure, output suppressed\n");
> - }
> + if (__die(str, regs, err))
> + regs = NULL;
>
> oops_end(flags, regs, SIGSEGV);
> }
>
> -static DEFINE_SPINLOCK(nmi_print_lock);
> -
> void notrace __kprobes
> die_nmi(char *str, struct pt_regs *regs, int do_panic)
> {
> + unsigned long flags;
> +
> if (notify_die(DIE_NMIWATCHDOG, str, regs, 0, 2, SIGINT) == NOTIFY_STOP)
> return;
>
> - spin_lock(&nmi_print_lock);
> + flags = oops_begin();
> /*
> * We are in trouble anyway, lets at least try
> - * to get a message out:
> + * to get a message out.
> */
> - bust_spinlocks(1);
> printk(KERN_EMERG "%s", str);
> printk(" on CPU%d, ip %08lx, registers:\n",
> smp_processor_id(), regs->ip);
> show_registers(regs);
> - if (do_panic)
> - panic("Non maskable interrupt");
> - console_silent();
> - spin_unlock(&nmi_print_lock);
> - bust_spinlocks(0);
> -
> - /*
> - * If we are in kernel we are probably nested up pretty bad
> - * and might aswell get out now while we still can:
> - */
> - if (!user_mode_vm(regs)) {
> - current->thread.trap_no = 2;
> + if (kexec_should_crash(current))
> crash_kexec(regs);
As I mentioned in your other patch, this crash_kexec can go I think if you're
going to do it in oops_end (as long as you correct the deadlock condition there
And as before, if you could rediff so that we didn't reintroduce the deadlock
that we just fixed, that would be much appreciated.
Beyond that, I think this looks good. Fix that up and It'll have my ack.
Best
Neil
--
/****************************************************
* Neil Horman <[email protected]>
* Software Engineer, Red Hat
****************************************************/
crash_kexec should not be called with console_sem held. Move
the call before bust_spinlocks(0) in oops_end to avoid the
problem.
Signed-off-by: Alexander van Heukelum <[email protected]>
Cc: "Neil Horman" <[email protected]>
---
arch/x86/kernel/dumpstack_32.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index b361475..5493d31 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -309,6 +309,9 @@ unsigned __kprobes long oops_begin(void)
void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
{
+ if (regs && kexec_should_crash(current))
+ crash_kexec(regs);
+
bust_spinlocks(0);
die_owner = -1;
add_taint(TAINT_DIE);
@@ -318,8 +321,6 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
if (!regs)
return;
- if (kexec_should_crash(current))
- crash_kexec(regs);
if (in_interrupt())
panic("Fatal exception in interrupt");
if (panic_on_oops)
--
1.5.4.3
Hello Neil, Ingo, everyone,
This patch series unifies oops_begin, oops_end, die_nmi
and die in dumpstack_xx.c. The first patch might be
considered a bug-fix, because otherwise, as Neil Horman
points out, it might be possible to call crash_kexec with
the console_sem held; if that happens, we deadlock.
Greetings,
Alexander
oops_end is preceded by either a call to __die, or a conditional
call to crash_kexec. Move the conditional call to crash_kexec
from the end of __die to the start of oops_end and remove
the superfluous call to crash_kexec in die_nmi.
Signed-off-by: Alexander van Heukelum <[email protected]>
---
arch/x86/kernel/dumpstack_64.c | 9 ++++-----
1 files changed, 4 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index ffefea6..57ce11b 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -458,6 +458,9 @@ unsigned __kprobes long oops_begin(void)
void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
{
+ if (regs && kexec_should_crash(current))
+ crash_kexec(regs);
+
die_owner = -1;
bust_spinlocks(0);
die_nest_count--;
@@ -501,8 +504,6 @@ int __kprobes __die(const char *str, struct pt_regs *regs, long err)
printk(KERN_ALERT "RIP ");
printk_address(regs->ip, 1);
printk(" RSP <%016lx>\n", regs->sp);
- if (kexec_should_crash(current))
- crash_kexec(regs);
return 0;
}
@@ -536,11 +537,9 @@ die_nmi(char *str, struct pt_regs *regs, int do_panic)
printk(" on CPU%d, ip %08lx, registers:\n",
smp_processor_id(), regs->ip);
show_registers(regs);
- if (kexec_should_crash(current))
- crash_kexec(regs);
+ oops_end(flags, regs, 0);
if (do_panic || panic_on_oops)
panic("Non maskable interrupt");
- oops_end(flags, regs, 0);
nmi_exit();
local_irq_enable();
do_exit(SIGBUS);
--
1.5.4.3
Change oops_end such that signr=0 signals that do_exit
is not to be called.
Currently, each use of __die is soon followed by a call
to oops_end and 'regs' is set to NULL if oops_end is expected
not to call do_exit. Change all such pairs to set signr=0
instead. On x86_64 oops_end is used 'bare' in die_nmi; use
signr=0 instead of regs=NULL there, too.
Signed-off-by: Alexander van Heukelum <[email protected]>
---
arch/x86/kernel/dumpstack_32.c | 7 ++++---
arch/x86/kernel/dumpstack_64.c | 9 +++++----
arch/x86/mm/fault.c | 11 +++++++----
3 files changed, 16 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 5493d31..7c22f99 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -318,7 +318,7 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
__raw_spin_unlock(&die_lock);
raw_local_irq_restore(flags);
- if (!regs)
+ if (!signr)
return;
if (in_interrupt())
@@ -371,17 +371,18 @@ int __kprobes __die(const char *str, struct pt_regs *regs, long err)
void die(const char *str, struct pt_regs *regs, long err)
{
unsigned long flags = oops_begin();
+ int sig = SIGSEGV;
if (die_nest_count < 3) {
report_bug(regs->ip, regs);
if (__die(str, regs, err))
- regs = NULL;
+ sig = 0;
} else {
printk(KERN_EMERG "Recursive die() failure, output suppressed\n");
}
- oops_end(flags, regs, SIGSEGV);
+ oops_end(flags, regs, sig);
}
static DEFINE_SPINLOCK(nmi_print_lock);
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 96a5db7..ffefea6 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -465,7 +465,7 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
/* Nest count reaches zero, release the lock. */
__raw_spin_unlock(&die_lock);
raw_local_irq_restore(flags);
- if (!regs) {
+ if (!signr) {
oops_exit();
return;
}
@@ -509,13 +509,14 @@ int __kprobes __die(const char *str, struct pt_regs *regs, long err)
void die(const char *str, struct pt_regs *regs, long err)
{
unsigned long flags = oops_begin();
+ int sig = SIGSEGV;
if (!user_mode(regs))
report_bug(regs->ip, regs);
if (__die(str, regs, err))
- regs = NULL;
- oops_end(flags, regs, SIGSEGV);
+ sig = 0;
+ oops_end(flags, regs, sig);
}
notrace __kprobes void
@@ -539,7 +540,7 @@ die_nmi(char *str, struct pt_regs *regs, int do_panic)
crash_kexec(regs);
if (do_panic || panic_on_oops)
panic("Non maskable interrupt");
- oops_end(flags, NULL, SIGBUS);
+ oops_end(flags, regs, 0);
nmi_exit();
local_irq_enable();
do_exit(SIGBUS);
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 8e52e68..ed9ee30 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -415,6 +415,7 @@ static noinline void pgtable_bad(unsigned long address, struct pt_regs *regs,
unsigned long error_code)
{
unsigned long flags = oops_begin();
+ int sig = SIGKILL;
struct task_struct *tsk;
printk(KERN_ALERT "%s: Corrupted page table at address %lx\n",
@@ -425,8 +426,8 @@ static noinline void pgtable_bad(unsigned long address, struct pt_regs *regs,
tsk->thread.trap_no = 14;
tsk->thread.error_code = error_code;
if (__die("Bad pagetable", regs, error_code))
- regs = NULL;
- oops_end(flags, regs, SIGKILL);
+ sig = 0;
+ oops_end(flags, regs, sig);
}
#endif
@@ -594,6 +595,7 @@ void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
#ifdef CONFIG_X86_64
unsigned long flags;
+ int sig;
#endif
tsk = current;
@@ -868,11 +870,12 @@ no_context:
bust_spinlocks(0);
do_exit(SIGKILL);
#else
+ sig = SIGKILL;
if (__die("Oops", regs, error_code))
- regs = NULL;
+ sig = 0;
/* Executive summary in case the body of the oops scrolled away */
printk(KERN_EMERG "CR2: %016lx\n", address);
- oops_end(flags, regs, SIGKILL);
+ oops_end(flags, regs, sig);
#endif
/*
--
1.5.4.3
Make i386's die() equal to x86_64's version.
Whitespace-only changes on x86_64, to make it equal to i386's
version. (user_mode and user_mode_vm are equal on x86_64.)
Signed-off-by: Alexander van Heukelum <[email protected]>
---
arch/x86/kernel/dumpstack_32.c | 10 +++-------
arch/x86/kernel/dumpstack_64.c | 6 +++++-
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index e91ae34..f2046c5 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -378,15 +378,11 @@ void die(const char *str, struct pt_regs *regs, long err)
unsigned long flags = oops_begin();
int sig = SIGSEGV;
- if (die_nest_count < 3) {
+ if (!user_mode_vm(regs))
report_bug(regs->ip, regs);
- if (__die(str, regs, err))
- sig = 0;
- } else {
- printk(KERN_EMERG "Recursive die() failure, output suppressed\n");
- }
-
+ if (__die(str, regs, err))
+ sig = 0;
oops_end(flags, regs, sig);
}
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 831e1e1..28c67aa 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -506,12 +506,16 @@ int __kprobes __die(const char *str, struct pt_regs *regs, long err)
return 0;
}
+/*
+ * This is gone through when something in the kernel has done something bad
+ * and is about to be terminated:
+ */
void die(const char *str, struct pt_regs *regs, long err)
{
unsigned long flags = oops_begin();
int sig = SIGSEGV;
- if (!user_mode(regs))
+ if (!user_mode_vm(regs))
report_bug(regs->ip, regs);
if (__die(str, regs, err))
--
1.5.4.3
Use oops_begin and oops_end in die_nmi.
Whitespace-only changes on x86_64, to make it equal to i386's
version.
Signed-off-by: Alexander van Heukelum <[email protected]>
---
arch/x86/kernel/dumpstack_32.c | 33 +++++++++++----------------------
arch/x86/kernel/dumpstack_64.c | 4 ++--
2 files changed, 13 insertions(+), 24 deletions(-)
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 7c7d691..e91ae34 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -390,40 +390,29 @@ void die(const char *str, struct pt_regs *regs, long err)
oops_end(flags, regs, sig);
}
-static DEFINE_SPINLOCK(nmi_print_lock);
-
void notrace __kprobes
die_nmi(char *str, struct pt_regs *regs, int do_panic)
{
+ unsigned long flags;
+
if (notify_die(DIE_NMIWATCHDOG, str, regs, 0, 2, SIGINT) == NOTIFY_STOP)
return;
- spin_lock(&nmi_print_lock);
/*
- * We are in trouble anyway, lets at least try
- * to get a message out:
- */
- bust_spinlocks(1);
+ * We are in trouble anyway, lets at least try
+ * to get a message out.
+ */
+ flags = oops_begin();
printk(KERN_EMERG "%s", str);
printk(" on CPU%d, ip %08lx, registers:\n",
smp_processor_id(), regs->ip);
show_registers(regs);
- if (do_panic)
+ oops_end(flags, regs, 0);
+ if (do_panic || panic_on_oops)
panic("Non maskable interrupt");
- console_silent();
- spin_unlock(&nmi_print_lock);
-
- /*
- * If we are in kernel we are probably nested up pretty bad
- * and might aswell get out now while we still can:
- */
- if (!user_mode_vm(regs)) {
- current->thread.trap_no = 2;
- crash_kexec(regs);
- }
-
- bust_spinlocks(0);
- do_exit(SIGSEGV);
+ nmi_exit();
+ local_irq_enable();
+ do_exit(SIGBUS);
}
static int __init oops_setup(char *s)
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index dc6162b..831e1e1 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -519,7 +519,7 @@ void die(const char *str, struct pt_regs *regs, long err)
oops_end(flags, regs, sig);
}
-notrace __kprobes void
+void notrace __kprobes
die_nmi(char *str, struct pt_regs *regs, int do_panic)
{
unsigned long flags;
@@ -527,11 +527,11 @@ die_nmi(char *str, struct pt_regs *regs, int do_panic)
if (notify_die(DIE_NMIWATCHDOG, str, regs, 0, 2, SIGINT) == NOTIFY_STOP)
return;
- flags = oops_begin();
/*
* We are in trouble anyway, lets at least try
* to get a message out.
*/
+ flags = oops_begin();
printk(KERN_EMERG "%s", str);
printk(" on CPU%d, ip %08lx, registers:\n",
smp_processor_id(), regs->ip);
--
1.5.4.3
On Tue, 21 Oct 2008 10:45:05 -0400, "Neil Horman"
<[email protected]> said:
> On Mon, Oct 20, 2008 at 05:08:34PM +0200, Alexander van Heukelum wrote:
> > Mostly use the x86_64 version of oops_begin() and oops_end() on
> > i386 too. Changes to the original x86_64 version:
> >
> Hey, doing a sight review this am here. Didn't find anything major, but
> I did
> find a few little nits. comments inlie
Hi Neil,
Thanks for the review. I've sent a redone patch series just a moment
ago, based on your comments. There was also another problem with these
two patches: oops_end(flags, regs, signr) had special behaviour for
regs=NULL that I did not consider before. The series has grown due
to this issue...
>> [...]
> Hmm. I think this creates the same case that I just fixed in my initial
> post. If we start using oops_end with this here, it may be possible to call
> crash_kexec with the console_sem held. If that happens, we deadlock. I
> think you should be able to move this clause up above the bust_spinlocks(0)
> without any issue, and that would take care of that
Indeed. The new series does exactly that.
>> [...]
> This undoes my previous patch. I realize your second patch fixes it
> properly so the ordering is correct when oops_begin and oops_end are used, but if you
> could rediff so this isn't here, I'd appreciate it. If these patches are
> committed separately, you'll avoid having the tree in a state where that deadlock
> can reoccur (even if it is just for one commit)
Yeah, I quickly rediffed the patches I already had. The new series
leaves
it as is until die_nmi is replaced by the oops_begin/oops_end version.
>> [...]
> If you're going to add the crash_kexec here (which looking at the call
> sites, makes sense to me), you should likely remove it from the critical section
> of die and die_nmi, just to avoid the redundancy. Same issue as the 32 bit
> version above applies, this needs to happen before you call bust_spinlocks(0).
Indeed.
> Fix those issues, and the rest looks good to me.
I think I've done that ;).
Thanks,
Greetings,
Alexander
(I will probably not be able to respond to e-mail until after the
weekend)
--
Alexander van Heukelum
[email protected]
--
http://www.fastmail.fm - mmm... Fastmail...
oops_begin/oops_end should always be used in pairs. On x86_64
oops_begin increments die_nest_count, and oops_end decrements
die_nest_count. Doing this makes oops_begin and oops_end equal
to the x86_64 versions.
Signed-off-by: Alexander van Heukelum <[email protected]>
---
arch/x86/kernel/dumpstack_32.c | 29 +++++++++++++++++------------
1 files changed, 17 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index a29b88f..7c7d691 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -289,21 +289,24 @@ static unsigned int die_nest_count;
unsigned __kprobes long oops_begin(void)
{
+ int cpu;
unsigned long flags;
oops_enter();
- if (die_owner != raw_smp_processor_id()) {
- console_verbose();
- raw_local_irq_save(flags);
- __raw_spin_lock(&die_lock);
- die_owner = smp_processor_id();
- die_nest_count = 0;
- bust_spinlocks(1);
- } else {
- raw_local_irq_save(flags);
+ /* racy, but better than risking deadlock. */
+ raw_local_irq_save(flags);
+ cpu = smp_processor_id();
+ if (!__raw_spin_trylock(&die_lock)) {
+ if (cpu == die_owner)
+ /* nested oops. should stop eventually */;
+ else
+ __raw_spin_lock(&die_lock);
}
die_nest_count++;
+ die_owner = cpu;
+ console_verbose();
+ bust_spinlocks(1);
return flags;
}
@@ -315,13 +318,15 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
bust_spinlocks(0);
die_owner = -1;
add_taint(TAINT_DIE);
- __raw_spin_unlock(&die_lock);
+ die_nest_count--;
+ if (!die_nest_count)
+ /* Nest count reaches zero, release the lock. */
+ __raw_spin_unlock(&die_lock);
raw_local_irq_restore(flags);
-
oops_exit();
+
if (!signr)
return;
-
if (in_interrupt())
panic("Fatal exception in interrupt");
if (panic_on_oops)
--
1.5.4.3
Always call oops_exit from oops_end, even if signr==0.
Also, move add_taint(TAINT_DIE) from __die to oops_end
on x86_64 and interchange two lines to make oops_end
more similar to the i386-version.
Signed-off-by: Alexander van Heukelum <[email protected]>
---
arch/x86/kernel/dumpstack_32.c | 2 +-
arch/x86/kernel/dumpstack_64.c | 11 +++++------
2 files changed, 6 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 7c22f99..a29b88f 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -318,6 +318,7 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
__raw_spin_unlock(&die_lock);
raw_local_irq_restore(flags);
+ oops_exit();
if (!signr)
return;
@@ -325,7 +326,6 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
panic("Fatal exception in interrupt");
if (panic_on_oops)
panic("Fatal exception");
- oops_exit();
do_exit(signr);
}
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 57ce11b..dc6162b 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -461,22 +461,22 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
if (regs && kexec_should_crash(current))
crash_kexec(regs);
- die_owner = -1;
bust_spinlocks(0);
+ die_owner = -1;
+ add_taint(TAINT_DIE);
die_nest_count--;
if (!die_nest_count)
/* Nest count reaches zero, release the lock. */
__raw_spin_unlock(&die_lock);
raw_local_irq_restore(flags);
- if (!signr) {
- oops_exit();
+ oops_exit();
+
+ if (!signr)
return;
- }
if (in_interrupt())
panic("Fatal exception in interrupt");
if (panic_on_oops)
panic("Fatal exception");
- oops_exit();
do_exit(signr);
}
@@ -499,7 +499,6 @@ int __kprobes __die(const char *str, struct pt_regs *regs, long err)
return 1;
show_registers(regs);
- add_taint(TAINT_DIE);
/* Executive summary in case the oops scrolled away */
printk(KERN_ALERT "RIP ");
printk_address(regs->ip, 1);
--
1.5.4.3
On Wed, Oct 22, 2008 at 12:18:41PM +0200, Alexander van Heukelum wrote:
> On Tue, 21 Oct 2008 10:45:05 -0400, "Neil Horman"
> <[email protected]> said:
> > On Mon, Oct 20, 2008 at 05:08:34PM +0200, Alexander van Heukelum wrote:
> > > Mostly use the x86_64 version of oops_begin() and oops_end() on
> > > i386 too. Changes to the original x86_64 version:
> > >
> > Hey, doing a sight review this am here. Didn't find anything major, but
> > I did
> > find a few little nits. comments inlie
>
> Hi Neil,
>
> Thanks for the review. I've sent a redone patch series just a moment
> ago, based on your comments. There was also another problem with these
> two patches: oops_end(flags, regs, signr) had special behaviour for
> regs=NULL that I did not consider before. The series has grown due
> to this issue...
>
> >> [...]
> > Hmm. I think this creates the same case that I just fixed in my initial
> > post. If we start using oops_end with this here, it may be possible to call
> > crash_kexec with the console_sem held. If that happens, we deadlock. I
> > think you should be able to move this clause up above the bust_spinlocks(0)
> > without any issue, and that would take care of that
>
> Indeed. The new series does exactly that.
>
> >> [...]
> > This undoes my previous patch. I realize your second patch fixes it
> > properly so the ordering is correct when oops_begin and oops_end are used, but if you
> > could rediff so this isn't here, I'd appreciate it. If these patches are
> > committed separately, you'll avoid having the tree in a state where that deadlock
> > can reoccur (even if it is just for one commit)
>
> Yeah, I quickly rediffed the patches I already had. The new series
> leaves
> it as is until die_nmi is replaced by the oops_begin/oops_end version.
>
> >> [...]
> > If you're going to add the crash_kexec here (which looking at the call
> > sites, makes sense to me), you should likely remove it from the critical section
> > of die and die_nmi, just to avoid the redundancy. Same issue as the 32 bit
> > version above applies, this needs to happen before you call bust_spinlocks(0).
>
> Indeed.
>
> > Fix those issues, and the rest looks good to me.
>
> I think I've done that ;).
>
> Thanks,
> Greetings,
> Alexander
>
> (I will probably not be able to respond to e-mail until after the
> weekend)
Copy that. Thanks for the quick turn-around.
Best
Neil
> --
> Alexander van Heukelum
> [email protected]
>
> --
> http://www.fastmail.fm - mmm... Fastmail...
>
>
--
/****************************************************
* Neil Horman <[email protected]>
* Software Engineer, Red Hat
****************************************************/
On Wed, Oct 22, 2008 at 12:00:08PM +0200, Alexander van Heukelum wrote:
> crash_kexec should not be called with console_sem held. Move
> the call before bust_spinlocks(0) in oops_end to avoid the
> problem.
>
> Signed-off-by: Alexander van Heukelum <[email protected]>
> Cc: "Neil Horman" <[email protected]>
Acked-by: Neil Horman <[email protected]>
> ---
> arch/x86/kernel/dumpstack_32.c | 5 +++--
> 1 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
> index b361475..5493d31 100644
> --- a/arch/x86/kernel/dumpstack_32.c
> +++ b/arch/x86/kernel/dumpstack_32.c
> @@ -309,6 +309,9 @@ unsigned __kprobes long oops_begin(void)
>
> void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
> {
> + if (regs && kexec_should_crash(current))
> + crash_kexec(regs);
> +
> bust_spinlocks(0);
> die_owner = -1;
> add_taint(TAINT_DIE);
> @@ -318,8 +321,6 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
> if (!regs)
> return;
>
> - if (kexec_should_crash(current))
> - crash_kexec(regs);
> if (in_interrupt())
> panic("Fatal exception in interrupt");
> if (panic_on_oops)
> --
> 1.5.4.3
>
>
--
/****************************************************
* Neil Horman <[email protected]>
* Software Engineer, Red Hat
****************************************************/
On Wed, Oct 22, 2008 at 12:00:09PM +0200, Alexander van Heukelum wrote:
> Change oops_end such that signr=0 signals that do_exit
> is not to be called.
>
> Currently, each use of __die is soon followed by a call
> to oops_end and 'regs' is set to NULL if oops_end is expected
> not to call do_exit. Change all such pairs to set signr=0
> instead. On x86_64 oops_end is used 'bare' in die_nmi; use
> signr=0 instead of regs=NULL there, too.
>
> Signed-off-by: Alexander van Heukelum <[email protected]>
Acked-by: Neil Horman <[email protected]>
> ---
> arch/x86/kernel/dumpstack_32.c | 7 ++++---
> arch/x86/kernel/dumpstack_64.c | 9 +++++----
> arch/x86/mm/fault.c | 11 +++++++----
> 3 files changed, 16 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
> index 5493d31..7c22f99 100644
> --- a/arch/x86/kernel/dumpstack_32.c
> +++ b/arch/x86/kernel/dumpstack_32.c
> @@ -318,7 +318,7 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
> __raw_spin_unlock(&die_lock);
> raw_local_irq_restore(flags);
>
> - if (!regs)
> + if (!signr)
> return;
>
> if (in_interrupt())
> @@ -371,17 +371,18 @@ int __kprobes __die(const char *str, struct pt_regs *regs, long err)
> void die(const char *str, struct pt_regs *regs, long err)
> {
> unsigned long flags = oops_begin();
> + int sig = SIGSEGV;
>
> if (die_nest_count < 3) {
> report_bug(regs->ip, regs);
>
> if (__die(str, regs, err))
> - regs = NULL;
> + sig = 0;
> } else {
> printk(KERN_EMERG "Recursive die() failure, output suppressed\n");
> }
>
> - oops_end(flags, regs, SIGSEGV);
> + oops_end(flags, regs, sig);
> }
>
> static DEFINE_SPINLOCK(nmi_print_lock);
> diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
> index 96a5db7..ffefea6 100644
> --- a/arch/x86/kernel/dumpstack_64.c
> +++ b/arch/x86/kernel/dumpstack_64.c
> @@ -465,7 +465,7 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
> /* Nest count reaches zero, release the lock. */
> __raw_spin_unlock(&die_lock);
> raw_local_irq_restore(flags);
> - if (!regs) {
> + if (!signr) {
> oops_exit();
> return;
> }
> @@ -509,13 +509,14 @@ int __kprobes __die(const char *str, struct pt_regs *regs, long err)
> void die(const char *str, struct pt_regs *regs, long err)
> {
> unsigned long flags = oops_begin();
> + int sig = SIGSEGV;
>
> if (!user_mode(regs))
> report_bug(regs->ip, regs);
>
> if (__die(str, regs, err))
> - regs = NULL;
> - oops_end(flags, regs, SIGSEGV);
> + sig = 0;
> + oops_end(flags, regs, sig);
> }
>
> notrace __kprobes void
> @@ -539,7 +540,7 @@ die_nmi(char *str, struct pt_regs *regs, int do_panic)
> crash_kexec(regs);
> if (do_panic || panic_on_oops)
> panic("Non maskable interrupt");
> - oops_end(flags, NULL, SIGBUS);
> + oops_end(flags, regs, 0);
> nmi_exit();
> local_irq_enable();
> do_exit(SIGBUS);
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 8e52e68..ed9ee30 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -415,6 +415,7 @@ static noinline void pgtable_bad(unsigned long address, struct pt_regs *regs,
> unsigned long error_code)
> {
> unsigned long flags = oops_begin();
> + int sig = SIGKILL;
> struct task_struct *tsk;
>
> printk(KERN_ALERT "%s: Corrupted page table at address %lx\n",
> @@ -425,8 +426,8 @@ static noinline void pgtable_bad(unsigned long address, struct pt_regs *regs,
> tsk->thread.trap_no = 14;
> tsk->thread.error_code = error_code;
> if (__die("Bad pagetable", regs, error_code))
> - regs = NULL;
> - oops_end(flags, regs, SIGKILL);
> + sig = 0;
> + oops_end(flags, regs, sig);
> }
> #endif
>
> @@ -594,6 +595,7 @@ void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
>
> #ifdef CONFIG_X86_64
> unsigned long flags;
> + int sig;
> #endif
>
> tsk = current;
> @@ -868,11 +870,12 @@ no_context:
> bust_spinlocks(0);
> do_exit(SIGKILL);
> #else
> + sig = SIGKILL;
> if (__die("Oops", regs, error_code))
> - regs = NULL;
> + sig = 0;
> /* Executive summary in case the body of the oops scrolled away */
> printk(KERN_EMERG "CR2: %016lx\n", address);
> - oops_end(flags, regs, SIGKILL);
> + oops_end(flags, regs, sig);
> #endif
>
> /*
> --
> 1.5.4.3
>
>
--
/****************************************************
* Neil Horman <[email protected]>
* Software Engineer, Red Hat
****************************************************/
On Wed, Oct 22, 2008 at 12:00:10PM +0200, Alexander van Heukelum wrote:
> oops_end is preceded by either a call to __die, or a conditional
> call to crash_kexec. Move the conditional call to crash_kexec
> from the end of __die to the start of oops_end and remove
> the superfluous call to crash_kexec in die_nmi.
>
> Signed-off-by: Alexander van Heukelum <[email protected]>
Acked-by: Neil Horman <[email protected]>
> ---
> arch/x86/kernel/dumpstack_64.c | 9 ++++-----
> 1 files changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
> index ffefea6..57ce11b 100644
> --- a/arch/x86/kernel/dumpstack_64.c
> +++ b/arch/x86/kernel/dumpstack_64.c
> @@ -458,6 +458,9 @@ unsigned __kprobes long oops_begin(void)
>
> void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
> {
> + if (regs && kexec_should_crash(current))
> + crash_kexec(regs);
> +
> die_owner = -1;
> bust_spinlocks(0);
> die_nest_count--;
> @@ -501,8 +504,6 @@ int __kprobes __die(const char *str, struct pt_regs *regs, long err)
> printk(KERN_ALERT "RIP ");
> printk_address(regs->ip, 1);
> printk(" RSP <%016lx>\n", regs->sp);
> - if (kexec_should_crash(current))
> - crash_kexec(regs);
> return 0;
> }
>
> @@ -536,11 +537,9 @@ die_nmi(char *str, struct pt_regs *regs, int do_panic)
> printk(" on CPU%d, ip %08lx, registers:\n",
> smp_processor_id(), regs->ip);
> show_registers(regs);
> - if (kexec_should_crash(current))
> - crash_kexec(regs);
> + oops_end(flags, regs, 0);
> if (do_panic || panic_on_oops)
> panic("Non maskable interrupt");
> - oops_end(flags, regs, 0);
> nmi_exit();
> local_irq_enable();
> do_exit(SIGBUS);
> --
> 1.5.4.3
>
>
--
/****************************************************
* Neil Horman <[email protected]>
* Software Engineer, Red Hat
****************************************************/
On Wed, Oct 22, 2008 at 12:00:11PM +0200, Alexander van Heukelum wrote:
> Always call oops_exit from oops_end, even if signr==0.
>
> Also, move add_taint(TAINT_DIE) from __die to oops_end
> on x86_64 and interchange two lines to make oops_end
> more similar to the i386-version.
>
> Signed-off-by: Alexander van Heukelum <[email protected]>
Acked-by: Neil Horman <[email protected]>
> ---
> arch/x86/kernel/dumpstack_32.c | 2 +-
> arch/x86/kernel/dumpstack_64.c | 11 +++++------
> 2 files changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
> index 7c22f99..a29b88f 100644
> --- a/arch/x86/kernel/dumpstack_32.c
> +++ b/arch/x86/kernel/dumpstack_32.c
> @@ -318,6 +318,7 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
> __raw_spin_unlock(&die_lock);
> raw_local_irq_restore(flags);
>
> + oops_exit();
> if (!signr)
> return;
>
> @@ -325,7 +326,6 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
> panic("Fatal exception in interrupt");
> if (panic_on_oops)
> panic("Fatal exception");
> - oops_exit();
> do_exit(signr);
> }
>
> diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
> index 57ce11b..dc6162b 100644
> --- a/arch/x86/kernel/dumpstack_64.c
> +++ b/arch/x86/kernel/dumpstack_64.c
> @@ -461,22 +461,22 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
> if (regs && kexec_should_crash(current))
> crash_kexec(regs);
>
> - die_owner = -1;
> bust_spinlocks(0);
> + die_owner = -1;
> + add_taint(TAINT_DIE);
> die_nest_count--;
> if (!die_nest_count)
> /* Nest count reaches zero, release the lock. */
> __raw_spin_unlock(&die_lock);
> raw_local_irq_restore(flags);
> - if (!signr) {
> - oops_exit();
> + oops_exit();
> +
> + if (!signr)
> return;
> - }
> if (in_interrupt())
> panic("Fatal exception in interrupt");
> if (panic_on_oops)
> panic("Fatal exception");
> - oops_exit();
> do_exit(signr);
> }
>
> @@ -499,7 +499,6 @@ int __kprobes __die(const char *str, struct pt_regs *regs, long err)
> return 1;
>
> show_registers(regs);
> - add_taint(TAINT_DIE);
> /* Executive summary in case the oops scrolled away */
> printk(KERN_ALERT "RIP ");
> printk_address(regs->ip, 1);
> --
> 1.5.4.3
>
>
--
/****************************************************
* Neil Horman <[email protected]>
* Software Engineer, Red Hat
****************************************************/
On Wed, Oct 22, 2008 at 12:00:12PM +0200, Alexander van Heukelum wrote:
> oops_begin/oops_end should always be used in pairs. On x86_64
> oops_begin increments die_nest_count, and oops_end decrements
> die_nest_count. Doing this makes oops_begin and oops_end equal
> to the x86_64 versions.
>
> Signed-off-by: Alexander van Heukelum <[email protected]>
Acked-by: Neil Horman <[email protected]>
> ---
> arch/x86/kernel/dumpstack_32.c | 29 +++++++++++++++++------------
> 1 files changed, 17 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
> index a29b88f..7c7d691 100644
> --- a/arch/x86/kernel/dumpstack_32.c
> +++ b/arch/x86/kernel/dumpstack_32.c
> @@ -289,21 +289,24 @@ static unsigned int die_nest_count;
>
> unsigned __kprobes long oops_begin(void)
> {
> + int cpu;
> unsigned long flags;
>
> oops_enter();
>
> - if (die_owner != raw_smp_processor_id()) {
> - console_verbose();
> - raw_local_irq_save(flags);
> - __raw_spin_lock(&die_lock);
> - die_owner = smp_processor_id();
> - die_nest_count = 0;
> - bust_spinlocks(1);
> - } else {
> - raw_local_irq_save(flags);
> + /* racy, but better than risking deadlock. */
> + raw_local_irq_save(flags);
> + cpu = smp_processor_id();
> + if (!__raw_spin_trylock(&die_lock)) {
> + if (cpu == die_owner)
> + /* nested oops. should stop eventually */;
> + else
> + __raw_spin_lock(&die_lock);
> }
> die_nest_count++;
> + die_owner = cpu;
> + console_verbose();
> + bust_spinlocks(1);
> return flags;
> }
>
> @@ -315,13 +318,15 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
> bust_spinlocks(0);
> die_owner = -1;
> add_taint(TAINT_DIE);
> - __raw_spin_unlock(&die_lock);
> + die_nest_count--;
> + if (!die_nest_count)
> + /* Nest count reaches zero, release the lock. */
> + __raw_spin_unlock(&die_lock);
> raw_local_irq_restore(flags);
> -
> oops_exit();
> +
> if (!signr)
> return;
> -
> if (in_interrupt())
> panic("Fatal exception in interrupt");
> if (panic_on_oops)
> --
> 1.5.4.3
>
>
--
/****************************************************
* Neil Horman <[email protected]>
* Software Engineer, Red Hat
****************************************************/
* Neil Horman <[email protected]> wrote:
> > Thanks for the review. I've sent a redone patch series just a moment
> > ago, based on your comments. There was also another problem with
> > these two patches: oops_end(flags, regs, signr) had special
> > behaviour for regs=NULL that I did not consider before. The series
> > has grown due to this issue...
[...]
> Copy that. Thanks for the quick turn-around.
applied to tip/x86/dumpstack, thanks guys!
Could you please also create arch/x86/kernel/dumpstack.c and move all
the now-unified bits there - before they get out of sync again?
Ingo
On Wed, Oct 22, 2008 at 12:00:13PM +0200, Alexander van Heukelum wrote:
> Use oops_begin and oops_end in die_nmi.
>
> Whitespace-only changes on x86_64, to make it equal to i386's
> version.
>
> Signed-off-by: Alexander van Heukelum <[email protected]>
Acked-by: Neil Horman <[email protected]>
> ---
> arch/x86/kernel/dumpstack_32.c | 33 +++++++++++----------------------
> arch/x86/kernel/dumpstack_64.c | 4 ++--
> 2 files changed, 13 insertions(+), 24 deletions(-)
>
> diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
> index 7c7d691..e91ae34 100644
> --- a/arch/x86/kernel/dumpstack_32.c
> +++ b/arch/x86/kernel/dumpstack_32.c
> @@ -390,40 +390,29 @@ void die(const char *str, struct pt_regs *regs, long err)
> oops_end(flags, regs, sig);
> }
>
> -static DEFINE_SPINLOCK(nmi_print_lock);
> -
> void notrace __kprobes
> die_nmi(char *str, struct pt_regs *regs, int do_panic)
> {
> + unsigned long flags;
> +
> if (notify_die(DIE_NMIWATCHDOG, str, regs, 0, 2, SIGINT) == NOTIFY_STOP)
> return;
>
> - spin_lock(&nmi_print_lock);
> /*
> - * We are in trouble anyway, lets at least try
> - * to get a message out:
> - */
> - bust_spinlocks(1);
> + * We are in trouble anyway, lets at least try
> + * to get a message out.
> + */
> + flags = oops_begin();
> printk(KERN_EMERG "%s", str);
> printk(" on CPU%d, ip %08lx, registers:\n",
> smp_processor_id(), regs->ip);
> show_registers(regs);
> - if (do_panic)
> + oops_end(flags, regs, 0);
> + if (do_panic || panic_on_oops)
> panic("Non maskable interrupt");
> - console_silent();
> - spin_unlock(&nmi_print_lock);
> -
> - /*
> - * If we are in kernel we are probably nested up pretty bad
> - * and might aswell get out now while we still can:
> - */
> - if (!user_mode_vm(regs)) {
> - current->thread.trap_no = 2;
> - crash_kexec(regs);
> - }
> -
> - bust_spinlocks(0);
> - do_exit(SIGSEGV);
> + nmi_exit();
> + local_irq_enable();
> + do_exit(SIGBUS);
> }
>
> static int __init oops_setup(char *s)
> diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
> index dc6162b..831e1e1 100644
> --- a/arch/x86/kernel/dumpstack_64.c
> +++ b/arch/x86/kernel/dumpstack_64.c
> @@ -519,7 +519,7 @@ void die(const char *str, struct pt_regs *regs, long err)
> oops_end(flags, regs, sig);
> }
>
> -notrace __kprobes void
> +void notrace __kprobes
> die_nmi(char *str, struct pt_regs *regs, int do_panic)
> {
> unsigned long flags;
> @@ -527,11 +527,11 @@ die_nmi(char *str, struct pt_regs *regs, int do_panic)
> if (notify_die(DIE_NMIWATCHDOG, str, regs, 0, 2, SIGINT) == NOTIFY_STOP)
> return;
>
> - flags = oops_begin();
> /*
> * We are in trouble anyway, lets at least try
> * to get a message out.
> */
> + flags = oops_begin();
> printk(KERN_EMERG "%s", str);
> printk(" on CPU%d, ip %08lx, registers:\n",
> smp_processor_id(), regs->ip);
> --
> 1.5.4.3
>
>
--
/****************************************************
* Neil Horman <[email protected]>
* Software Engineer, Red Hat
****************************************************/
On Wed, Oct 22, 2008 at 12:00:14PM +0200, Alexander van Heukelum wrote:
> Make i386's die() equal to x86_64's version.
>
> Whitespace-only changes on x86_64, to make it equal to i386's
> version. (user_mode and user_mode_vm are equal on x86_64.)
>
> Signed-off-by: Alexander van Heukelum <[email protected]>
Acked-by: Neil Horman <[email protected]>
> ---
> arch/x86/kernel/dumpstack_32.c | 10 +++-------
> arch/x86/kernel/dumpstack_64.c | 6 +++++-
> 2 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
> index e91ae34..f2046c5 100644
> --- a/arch/x86/kernel/dumpstack_32.c
> +++ b/arch/x86/kernel/dumpstack_32.c
> @@ -378,15 +378,11 @@ void die(const char *str, struct pt_regs *regs, long err)
> unsigned long flags = oops_begin();
> int sig = SIGSEGV;
>
> - if (die_nest_count < 3) {
> + if (!user_mode_vm(regs))
> report_bug(regs->ip, regs);
>
> - if (__die(str, regs, err))
> - sig = 0;
> - } else {
> - printk(KERN_EMERG "Recursive die() failure, output suppressed\n");
> - }
> -
> + if (__die(str, regs, err))
> + sig = 0;
> oops_end(flags, regs, sig);
> }
>
> diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
> index 831e1e1..28c67aa 100644
> --- a/arch/x86/kernel/dumpstack_64.c
> +++ b/arch/x86/kernel/dumpstack_64.c
> @@ -506,12 +506,16 @@ int __kprobes __die(const char *str, struct pt_regs *regs, long err)
> return 0;
> }
>
> +/*
> + * This is gone through when something in the kernel has done something bad
> + * and is about to be terminated:
> + */
> void die(const char *str, struct pt_regs *regs, long err)
> {
> unsigned long flags = oops_begin();
> int sig = SIGSEGV;
>
> - if (!user_mode(regs))
> + if (!user_mode_vm(regs))
> report_bug(regs->ip, regs);
>
> if (__die(str, regs, err))
> --
> 1.5.4.3
>
>
--
/****************************************************
* Neil Horman <[email protected]>
* Software Engineer, Red Hat
****************************************************/
On Wed, Oct 22, 2008 at 02:02:22PM +0200, Ingo Molnar wrote:
>
> * Neil Horman <[email protected]> wrote:
>
> > > Thanks for the review. I've sent a redone patch series just a moment
> > > ago, based on your comments. There was also another problem with
> > > these two patches: oops_end(flags, regs, signr) had special
> > > behaviour for regs=NULL that I did not consider before. The series
> > > has grown due to this issue...
>
> [...]
> > Copy that. Thanks for the quick turn-around.
>
> applied to tip/x86/dumpstack, thanks guys!
>
> Could you please also create arch/x86/kernel/dumpstack.c and move all
> the now-unified bits there - before they get out of sync again?
>
I can do this in the next day or so, unless you have an urge to Alexander
Neil
> Ingo
>
--
/****************************************************
* Neil Horman <[email protected]>
* Software Engineer, Red Hat
****************************************************/
On Wed, 22 Oct 2008 15:19:48 -0400, "Neil Horman"
<[email protected]> said:
> On Wed, Oct 22, 2008 at 02:02:22PM +0200, Ingo Molnar wrote:
> >
> > * Neil Horman <[email protected]> wrote:
> >
> > > > Thanks for the review. I've sent a redone patch series just a moment
> > > > ago, based on your comments. There was also another problem with
> > > > these two patches: oops_end(flags, regs, signr) had special
> > > > behaviour for regs=NULL that I did not consider before. The series
> > > > has grown due to this issue...
> >
> > [...]
> > > Copy that. Thanks for the quick turn-around.
> >
> > applied to tip/x86/dumpstack, thanks guys!
> >
> > Could you please also create arch/x86/kernel/dumpstack.c and move all
> > the now-unified bits there - before they get out of sync again?
> >
> I can do this in the next day or so, unless you have an urge to Alexander
Hi Neil,
Many thanks for the review of the patches, and please go ahead
and finish the unification!
Greetings,
Alexander
> Neil
>
> > Ingo
> >
>
> --
> /****************************************************
> * Neil Horman <[email protected]>
> * Software Engineer, Red Hat
> ****************************************************/
--
Alexander van Heukelum
[email protected]
--
http://www.fastmail.fm - A fast, anti-spam email service.