2008-10-17 21:02:35

by Neil Horman

[permalink] [raw]
Subject: [PATCH] kexec: fix hang on i386 when panic occurs while console_sem is held

Hey all-
Theres a corner case in 32 bit x86 kdump at the moment. When the box
panics via nmi, we call bust_spinlocks(1) to disable sensitivity to the
console_sem (allowing us to print to the console in all cases), but we don't
call crash_kexec, until after we call bust_spinlocks(0), which re-enables
console_sem sensitivity. The result is that, if we get an nmi while the
console_sem is held and kdump is configured, and we try to print something to
the console during kdump shutdown (which we often do) we deadlock the box. The
fix is to simply do what 64 bit die_nmi does which is to not call
bust_spinlocks(0) until after we call crash_kexec. Patch below tested
successfully by me:

Regards
Neil

Signed-off-by: Neil Horman <[email protected]>


dumpstack_32.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 1a78180..b361475 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -405,7 +405,6 @@ die_nmi(char *str, struct pt_regs *regs, int do_panic)
panic("Non maskable interrupt");
console_silent();
spin_unlock(&nmi_print_lock);
- bust_spinlocks(0);

/*
* If we are in kernel we are probably nested up pretty bad
@@ -416,6 +415,7 @@ die_nmi(char *str, struct pt_regs *regs, int do_panic)
crash_kexec(regs);
}

+ bust_spinlocks(0);
do_exit(SIGSEGV);
}

--
/****************************************************
* Neil Horman <[email protected]>
* Software Engineer, Red Hat
****************************************************/


2008-10-20 12:14:53

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] kexec: fix hang on i386 when panic occurs while console_sem is held


* Neil Horman <[email protected]> wrote:

> Hey all-
> Theres a corner case in 32 bit x86 kdump at the moment. When
> the box panics via nmi, we call bust_spinlocks(1) to disable
> sensitivity to the console_sem (allowing us to print to the console in
> all cases), but we don't call crash_kexec, until after we call
> bust_spinlocks(0), which re-enables console_sem sensitivity. The
> result is that, if we get an nmi while the console_sem is held and
> kdump is configured, and we try to print something to the console
> during kdump shutdown (which we often do) we deadlock the box. The
> fix is to simply do what 64 bit die_nmi does which is to not call
> bust_spinlocks(0) until after we call crash_kexec. Patch below tested
> successfully by me:

applied to tip/x86/urgent, thanks Neil!

> dumpstack_32.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)

would be nice to unify this code some more - to create a new
arch/x86/kernel/dumpstack.c and fill it in with die_nmi() as a
beginning?

Ingo

2008-10-20 13:44:51

by Neil Horman

[permalink] [raw]
Subject: Re: [PATCH] kexec: fix hang on i386 when panic occurs while console_sem is held

On Mon, Oct 20, 2008 at 02:13:39PM +0200, Ingo Molnar wrote:
>
> * Neil Horman <[email protected]> wrote:
>
> > Hey all-
> > Theres a corner case in 32 bit x86 kdump at the moment. When
> > the box panics via nmi, we call bust_spinlocks(1) to disable
> > sensitivity to the console_sem (allowing us to print to the console in
> > all cases), but we don't call crash_kexec, until after we call
> > bust_spinlocks(0), which re-enables console_sem sensitivity. The
> > result is that, if we get an nmi while the console_sem is held and
> > kdump is configured, and we try to print something to the console
> > during kdump shutdown (which we often do) we deadlock the box. The
> > fix is to simply do what 64 bit die_nmi does which is to not call
> > bust_spinlocks(0) until after we call crash_kexec. Patch below tested
> > successfully by me:
>
> applied to tip/x86/urgent, thanks Neil!
>
Thanks, Ingo!

> > dumpstack_32.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
>
> would be nice to unify this code some more - to create a new
> arch/x86/kernel/dumpstack.c and fill it in with die_nmi() as a
> beginning?
>
Agreed, I'll try look into that as soon as I have some free time.

Regards
Neil

> Ingo
>

--
/****************************************************
* Neil Horman <[email protected]>
* Software Engineer, Red Hat
****************************************************/

Subject: [PATCH] x86, dumpstack: some more unification

Hi Ingo, Neil,

Doing more unification of dumpstack is/was on my
TODO list, but it got sidetracked with some other
hobby projects.

I did unify oops_begin, oops_end, die and die_nmi,
and it seems to work. It needs more careful testing,
I think, but if someone (Neil?) wants to take a
look and validate the result...

Greetings,
Alexander

Subject: [PATCH] x86: make oops_begin and oops_end equal

Mostly use the x86_64 version of oops_begin() and oops_end() on
i386 too. Changes to the original x86_64 version:

- move add_taint(TAINT_DIE) into oops_end()
- add a conditional crash_kexec() into oops_end()

Signed-off-by: Alexander van Heukelum <[email protected]>
---
arch/x86/kernel/dumpstack_32.c | 36 +++++++++++++++++++++---------------
arch/x86/kernel/dumpstack_64.c | 4 +++-
2 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index b361475..e45952b 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -289,35 +289,41 @@ static unsigned int die_nest_count;

unsigned __kprobes long oops_begin(void)
{
+ int cpu;
unsigned long flags;

oops_enter();

- if (die_owner != raw_smp_processor_id()) {
- console_verbose();
- raw_local_irq_save(flags);
- __raw_spin_lock(&die_lock);
- die_owner = smp_processor_id();
- die_nest_count = 0;
- bust_spinlocks(1);
- } else {
- raw_local_irq_save(flags);
+ /* racy, but better than risking deadlock. */
+ raw_local_irq_save(flags);
+ cpu = smp_processor_id();
+ if (!__raw_spin_trylock(&die_lock)) {
+ if (cpu == die_owner)
+ /* nested oops. should stop eventually */;
+ else
+ __raw_spin_lock(&die_lock);
}
die_nest_count++;
+ die_owner = cpu;
+ console_verbose();
+ bust_spinlocks(1);
return flags;
}

void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
{
- bust_spinlocks(0);
die_owner = -1;
+ bust_spinlocks(0);
+ die_nest_count--;
add_taint(TAINT_DIE);
- __raw_spin_unlock(&die_lock);
+ if (!die_nest_count)
+ /* Nest count reaches zero, release the lock. */
+ __raw_spin_unlock(&die_lock);
raw_local_irq_restore(flags);
-
- if (!regs)
+ if (!regs) {
+ oops_exit();
return;
-
+ }
if (kexec_should_crash(current))
crash_kexec(regs);
if (in_interrupt())
@@ -405,6 +411,7 @@ die_nmi(char *str, struct pt_regs *regs, int do_panic)
panic("Non maskable interrupt");
console_silent();
spin_unlock(&nmi_print_lock);
+ bust_spinlocks(0);

/*
* If we are in kernel we are probably nested up pretty bad
@@ -415,7 +422,6 @@ die_nmi(char *str, struct pt_regs *regs, int do_panic)
crash_kexec(regs);
}

- bust_spinlocks(0);
do_exit(SIGSEGV);
}

diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 96a5db7..cd7b46b 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -461,6 +461,7 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
die_owner = -1;
bust_spinlocks(0);
die_nest_count--;
+ add_taint(TAINT_DIE);
if (!die_nest_count)
/* Nest count reaches zero, release the lock. */
__raw_spin_unlock(&die_lock);
@@ -469,6 +470,8 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
oops_exit();
return;
}
+ if (kexec_should_crash(current))
+ crash_kexec(regs);
if (in_interrupt())
panic("Fatal exception in interrupt");
if (panic_on_oops)
@@ -496,7 +499,6 @@ int __kprobes __die(const char *str, struct pt_regs *regs, long err)
return 1;

show_registers(regs);
- add_taint(TAINT_DIE);
/* Executive summary in case the oops scrolled away */
printk(KERN_ALERT "RIP ");
printk_address(regs->ip, 1);
--
1.5.4.3

Subject: [PATCH] x86, dumpstack: make die and die_nmi equal

Use the x86_64 version of die() and die_nmi() on i386 too.
Changes to the original x86_64-version have no influence
on the generated code:

- whitespace, comments
- use user_mode_vm() instead of user_mode()

Signed-off-by: Alexander van Heukelum <[email protected]>
---
arch/x86/kernel/dumpstack_32.c | 40 ++++++++++++++--------------------------
arch/x86/kernel/dumpstack_64.c | 9 +++++++--
2 files changed, 21 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index e45952b..6f00938 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -377,52 +377,40 @@ void die(const char *str, struct pt_regs *regs, long err)
{
unsigned long flags = oops_begin();

- if (die_nest_count < 3) {
+ if (!user_mode_vm(regs))
report_bug(regs->ip, regs);

- if (__die(str, regs, err))
- regs = NULL;
- } else {
- printk(KERN_EMERG "Recursive die() failure, output suppressed\n");
- }
+ if (__die(str, regs, err))
+ regs = NULL;

oops_end(flags, regs, SIGSEGV);
}

-static DEFINE_SPINLOCK(nmi_print_lock);
-
void notrace __kprobes
die_nmi(char *str, struct pt_regs *regs, int do_panic)
{
+ unsigned long flags;
+
if (notify_die(DIE_NMIWATCHDOG, str, regs, 0, 2, SIGINT) == NOTIFY_STOP)
return;

- spin_lock(&nmi_print_lock);
+ flags = oops_begin();
/*
* We are in trouble anyway, lets at least try
- * to get a message out:
+ * to get a message out.
*/
- bust_spinlocks(1);
printk(KERN_EMERG "%s", str);
printk(" on CPU%d, ip %08lx, registers:\n",
smp_processor_id(), regs->ip);
show_registers(regs);
- if (do_panic)
- panic("Non maskable interrupt");
- console_silent();
- spin_unlock(&nmi_print_lock);
- bust_spinlocks(0);
-
- /*
- * If we are in kernel we are probably nested up pretty bad
- * and might aswell get out now while we still can:
- */
- if (!user_mode_vm(regs)) {
- current->thread.trap_no = 2;
+ if (kexec_should_crash(current))
crash_kexec(regs);
- }
-
- do_exit(SIGSEGV);
+ if (do_panic || panic_on_oops)
+ panic("Non maskable interrupt");
+ oops_end(flags, NULL, SIGBUS);
+ nmi_exit();
+ local_irq_enable();
+ do_exit(SIGBUS);
}

static int __init oops_setup(char *s)
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index cd7b46b..dbcca05 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -508,19 +508,24 @@ int __kprobes __die(const char *str, struct pt_regs *regs, long err)
return 0;
}

+/*
+ * This is gone through when something in the kernel has done something bad
+ * and is about to be terminated:
+ */
void die(const char *str, struct pt_regs *regs, long err)
{
unsigned long flags = oops_begin();

- if (!user_mode(regs))
+ if (!user_mode_vm(regs))
report_bug(regs->ip, regs);

if (__die(str, regs, err))
regs = NULL;
+
oops_end(flags, regs, SIGSEGV);
}

-notrace __kprobes void
+void notrace __kprobes
die_nmi(char *str, struct pt_regs *regs, int do_panic)
{
unsigned long flags;
--
1.5.4.3

2008-10-20 16:52:31

by Neil Horman

[permalink] [raw]
Subject: Re: [PATCH] x86, dumpstack: some more unification

On Mon, Oct 20, 2008 at 05:07:31PM +0200, Alexander van Heukelum wrote:
> Hi Ingo, Neil,
>
> Doing more unification of dumpstack is/was on my
> TODO list, but it got sidetracked with some other
> hobby projects.
>
> I did unify oops_begin, oops_end, die and die_nmi,
> and it seems to work. It needs more careful testing,
> I think, but if someone (Neil?) wants to take a
> look and validate the result...
>
> Greetings,
> Alexander
>
Please send me the patch, If/when I get a few moments, I'd be happy to give it a
spin here.

Thanks!
Neil

> _______________________________________________
> kexec mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/kexec

--
/***************************************************
*Neil Horman
*Senior Software Engineer
*Red Hat, Inc.
*[email protected]
*gpg keyid: 1024D / 0x92A74FA1
*http://pgp.mit.edu
***************************************************/

2008-10-21 14:47:35

by Neil Horman

[permalink] [raw]
Subject: Re: [PATCH] x86: make oops_begin and oops_end equal

On Mon, Oct 20, 2008 at 05:08:34PM +0200, Alexander van Heukelum wrote:
> Mostly use the x86_64 version of oops_begin() and oops_end() on
> i386 too. Changes to the original x86_64 version:
>
Hey, doing a sight review this am here. Didn't find anything major, but I did
find a few little nits. comments inlie

> - move add_taint(TAINT_DIE) into oops_end()
> - add a conditional crash_kexec() into oops_end()
>
> Signed-off-by: Alexander van Heukelum <[email protected]>
> ---
> arch/x86/kernel/dumpstack_32.c | 36 +++++++++++++++++++++---------------
> arch/x86/kernel/dumpstack_64.c | 4 +++-
> 2 files changed, 24 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
> index b361475..e45952b 100644
> --- a/arch/x86/kernel/dumpstack_32.c
> +++ b/arch/x86/kernel/dumpstack_32.c
> @@ -289,35 +289,41 @@ static unsigned int die_nest_count;
>
> unsigned __kprobes long oops_begin(void)
> {
> + int cpu;
> unsigned long flags;
>
> oops_enter();
>
> - if (die_owner != raw_smp_processor_id()) {
> - console_verbose();
> - raw_local_irq_save(flags);
> - __raw_spin_lock(&die_lock);
> - die_owner = smp_processor_id();
> - die_nest_count = 0;
> - bust_spinlocks(1);
> - } else {
> - raw_local_irq_save(flags);
> + /* racy, but better than risking deadlock. */
> + raw_local_irq_save(flags);
> + cpu = smp_processor_id();
> + if (!__raw_spin_trylock(&die_lock)) {
> + if (cpu == die_owner)
> + /* nested oops. should stop eventually */;
> + else
> + __raw_spin_lock(&die_lock);
> }
> die_nest_count++;
> + die_owner = cpu;
> + console_verbose();
> + bust_spinlocks(1);
> return flags;
> }
>
> void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
> {
> - bust_spinlocks(0);
> die_owner = -1;
> + bust_spinlocks(0);
> + die_nest_count--;
> add_taint(TAINT_DIE);
> - __raw_spin_unlock(&die_lock);
> + if (!die_nest_count)
> + /* Nest count reaches zero, release the lock. */
> + __raw_spin_unlock(&die_lock);
> raw_local_irq_restore(flags);
> -
> - if (!regs)
> + if (!regs) {
> + oops_exit();
> return;
> -
> + }
> if (kexec_should_crash(current))
> crash_kexec(regs);
Hmm. I think this creates the same case that I just fixed in my initial post.
If we start using oops_end with this here, it may be possible to call
crash_kexec with the console_sem held. If that happens, we deadlock. I think
you should be able to move this clause up above the bust_spinlocks(0) without
any issue, and that would take care of that

> if (in_interrupt())
> @@ -405,6 +411,7 @@ die_nmi(char *str, struct pt_regs *regs, int do_panic)
> panic("Non maskable interrupt");
> console_silent();
> spin_unlock(&nmi_print_lock);
> + bust_spinlocks(0);
>
> /*
> * If we are in kernel we are probably nested up pretty bad
> @@ -415,7 +422,6 @@ die_nmi(char *str, struct pt_regs *regs, int do_panic)
> crash_kexec(regs);
> }
>
> - bust_spinlocks(0);
> do_exit(SIGSEGV);
> }
>
This undoes my previous patch. I realize your second patch fixes it properly so
the ordering is correct when oops_begin and oops_end are used, but if you could
rediff so this isn't here, I'd appreciate it. If these patches are committed
separately, you'll avoid having the tree in a state where that deadlock can
reoccur (even if it is just for one commit)

> diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
> index 96a5db7..cd7b46b 100644
> --- a/arch/x86/kernel/dumpstack_64.c
> +++ b/arch/x86/kernel/dumpstack_64.c
> @@ -461,6 +461,7 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
> die_owner = -1;
> bust_spinlocks(0);
> die_nest_count--;
> + add_taint(TAINT_DIE);
> if (!die_nest_count)
> /* Nest count reaches zero, release the lock. */
> __raw_spin_unlock(&die_lock);
> @@ -469,6 +470,8 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
> oops_exit();
> return;
> }
> + if (kexec_should_crash(current))
> + crash_kexec(regs);
If you're going to add the crash_kexec here (which looking at the call sites,
makes sense to me), you should likely remove it from the critical section of die
and die_nmi, just to avoid the redundancy. Same issue as the 32 bit version
above applies, this needs to happen before you call bust_spinlocks(0).

Fix those issues, and the rest looks good to me.

Regards
Neil

--
/****************************************************
* Neil Horman <[email protected]>
* Software Engineer, Red Hat
****************************************************/

2008-10-21 15:02:21

by Neil Horman

[permalink] [raw]
Subject: Re: [PATCH] x86, dumpstack: make die and die_nmi equal

On Mon, Oct 20, 2008 at 05:11:06PM +0200, Alexander van Heukelum wrote:
> Use the x86_64 version of die() and die_nmi() on i386 too.
> Changes to the original x86_64-version have no influence
> on the generated code:
>
> - whitespace, comments
> - use user_mode_vm() instead of user_mode()
>
Hey, smore more comments inline

> Signed-off-by: Alexander van Heukelum <[email protected]>
> ---
> arch/x86/kernel/dumpstack_32.c | 40 ++++++++++++++--------------------------
> arch/x86/kernel/dumpstack_64.c | 9 +++++++--
> 2 files changed, 21 insertions(+), 28 deletions(-)
>
> diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
> index e45952b..6f00938 100644
> --- a/arch/x86/kernel/dumpstack_32.c
> +++ b/arch/x86/kernel/dumpstack_32.c
> @@ -377,52 +377,40 @@ void die(const char *str, struct pt_regs *regs, long err)
> {
> unsigned long flags = oops_begin();
>
> - if (die_nest_count < 3) {
> + if (!user_mode_vm(regs))
> report_bug(regs->ip, regs);
>
> - if (__die(str, regs, err))
> - regs = NULL;
> - } else {
> - printk(KERN_EMERG "Recursive die() failure, output suppressed\n");
> - }
> + if (__die(str, regs, err))
> + regs = NULL;
>
> oops_end(flags, regs, SIGSEGV);
> }
>
> -static DEFINE_SPINLOCK(nmi_print_lock);
> -
> void notrace __kprobes
> die_nmi(char *str, struct pt_regs *regs, int do_panic)
> {
> + unsigned long flags;
> +
> if (notify_die(DIE_NMIWATCHDOG, str, regs, 0, 2, SIGINT) == NOTIFY_STOP)
> return;
>
> - spin_lock(&nmi_print_lock);
> + flags = oops_begin();
> /*
> * We are in trouble anyway, lets at least try
> - * to get a message out:
> + * to get a message out.
> */
> - bust_spinlocks(1);
> printk(KERN_EMERG "%s", str);
> printk(" on CPU%d, ip %08lx, registers:\n",
> smp_processor_id(), regs->ip);
> show_registers(regs);
> - if (do_panic)
> - panic("Non maskable interrupt");
> - console_silent();
> - spin_unlock(&nmi_print_lock);
> - bust_spinlocks(0);
> -
> - /*
> - * If we are in kernel we are probably nested up pretty bad
> - * and might aswell get out now while we still can:
> - */
> - if (!user_mode_vm(regs)) {
> - current->thread.trap_no = 2;
> + if (kexec_should_crash(current))
> crash_kexec(regs);
As I mentioned in your other patch, this crash_kexec can go I think if you're
going to do it in oops_end (as long as you correct the deadlock condition there

And as before, if you could rediff so that we didn't reintroduce the deadlock
that we just fixed, that would be much appreciated.

Beyond that, I think this looks good. Fix that up and It'll have my ack.

Best
Neil

--
/****************************************************
* Neil Horman <[email protected]>
* Software Engineer, Red Hat
****************************************************/

2008-10-22 10:00:55

by Alexander van Heukelum

[permalink] [raw]
Subject: [PATCH 1/7] i386, dumpstack: Move crash_kexec before bust_spinlocks(0) in oops_end

crash_kexec should not be called with console_sem held. Move
the call before bust_spinlocks(0) in oops_end to avoid the
problem.

Signed-off-by: Alexander van Heukelum <[email protected]>
Cc: "Neil Horman" <[email protected]>
---
arch/x86/kernel/dumpstack_32.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index b361475..5493d31 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -309,6 +309,9 @@ unsigned __kprobes long oops_begin(void)

void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
{
+ if (regs && kexec_should_crash(current))
+ crash_kexec(regs);
+
bust_spinlocks(0);
die_owner = -1;
add_taint(TAINT_DIE);
@@ -318,8 +321,6 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
if (!regs)
return;

- if (kexec_should_crash(current))
- crash_kexec(regs);
if (in_interrupt())
panic("Fatal exception in interrupt");
if (panic_on_oops)
--
1.5.4.3

2008-10-22 10:01:18

by Alexander van Heukelum

[permalink] [raw]
Subject: [PATCH 0/7] x86, dumpstack: unify oops_begin, oops_end, die_nmi and die

Hello Neil, Ingo, everyone,

This patch series unifies oops_begin, oops_end, die_nmi
and die in dumpstack_xx.c. The first patch might be
considered a bug-fix, because otherwise, as Neil Horman
points out, it might be possible to call crash_kexec with
the console_sem held; if that happens, we deadlock.

Greetings,
Alexander

2008-10-22 10:09:39

by Alexander van Heukelum

[permalink] [raw]
Subject: [PATCH 3/7] x86_64, dumpstack: move kexec_crash from __die to oops_end

oops_end is preceded by either a call to __die, or a conditional
call to crash_kexec. Move the conditional call to crash_kexec
from the end of __die to the start of oops_end and remove
the superfluous call to crash_kexec in die_nmi.

Signed-off-by: Alexander van Heukelum <[email protected]>
---
arch/x86/kernel/dumpstack_64.c | 9 ++++-----
1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index ffefea6..57ce11b 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -458,6 +458,9 @@ unsigned __kprobes long oops_begin(void)

void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
{
+ if (regs && kexec_should_crash(current))
+ crash_kexec(regs);
+
die_owner = -1;
bust_spinlocks(0);
die_nest_count--;
@@ -501,8 +504,6 @@ int __kprobes __die(const char *str, struct pt_regs *regs, long err)
printk(KERN_ALERT "RIP ");
printk_address(regs->ip, 1);
printk(" RSP <%016lx>\n", regs->sp);
- if (kexec_should_crash(current))
- crash_kexec(regs);
return 0;
}

@@ -536,11 +537,9 @@ die_nmi(char *str, struct pt_regs *regs, int do_panic)
printk(" on CPU%d, ip %08lx, registers:\n",
smp_processor_id(), regs->ip);
show_registers(regs);
- if (kexec_should_crash(current))
- crash_kexec(regs);
+ oops_end(flags, regs, 0);
if (do_panic || panic_on_oops)
panic("Non maskable interrupt");
- oops_end(flags, regs, 0);
nmi_exit();
local_irq_enable();
do_exit(SIGBUS);
--
1.5.4.3

2008-10-22 10:09:55

by Alexander van Heukelum

[permalink] [raw]
Subject: [PATCH 2/7] x86, dumpstack: let signr=0 signal no do_exit

Change oops_end such that signr=0 signals that do_exit
is not to be called.

Currently, each use of __die is soon followed by a call
to oops_end and 'regs' is set to NULL if oops_end is expected
not to call do_exit. Change all such pairs to set signr=0
instead. On x86_64 oops_end is used 'bare' in die_nmi; use
signr=0 instead of regs=NULL there, too.

Signed-off-by: Alexander van Heukelum <[email protected]>
---
arch/x86/kernel/dumpstack_32.c | 7 ++++---
arch/x86/kernel/dumpstack_64.c | 9 +++++----
arch/x86/mm/fault.c | 11 +++++++----
3 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 5493d31..7c22f99 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -318,7 +318,7 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
__raw_spin_unlock(&die_lock);
raw_local_irq_restore(flags);

- if (!regs)
+ if (!signr)
return;

if (in_interrupt())
@@ -371,17 +371,18 @@ int __kprobes __die(const char *str, struct pt_regs *regs, long err)
void die(const char *str, struct pt_regs *regs, long err)
{
unsigned long flags = oops_begin();
+ int sig = SIGSEGV;

if (die_nest_count < 3) {
report_bug(regs->ip, regs);

if (__die(str, regs, err))
- regs = NULL;
+ sig = 0;
} else {
printk(KERN_EMERG "Recursive die() failure, output suppressed\n");
}

- oops_end(flags, regs, SIGSEGV);
+ oops_end(flags, regs, sig);
}

static DEFINE_SPINLOCK(nmi_print_lock);
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 96a5db7..ffefea6 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -465,7 +465,7 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
/* Nest count reaches zero, release the lock. */
__raw_spin_unlock(&die_lock);
raw_local_irq_restore(flags);
- if (!regs) {
+ if (!signr) {
oops_exit();
return;
}
@@ -509,13 +509,14 @@ int __kprobes __die(const char *str, struct pt_regs *regs, long err)
void die(const char *str, struct pt_regs *regs, long err)
{
unsigned long flags = oops_begin();
+ int sig = SIGSEGV;

if (!user_mode(regs))
report_bug(regs->ip, regs);

if (__die(str, regs, err))
- regs = NULL;
- oops_end(flags, regs, SIGSEGV);
+ sig = 0;
+ oops_end(flags, regs, sig);
}

notrace __kprobes void
@@ -539,7 +540,7 @@ die_nmi(char *str, struct pt_regs *regs, int do_panic)
crash_kexec(regs);
if (do_panic || panic_on_oops)
panic("Non maskable interrupt");
- oops_end(flags, NULL, SIGBUS);
+ oops_end(flags, regs, 0);
nmi_exit();
local_irq_enable();
do_exit(SIGBUS);
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 8e52e68..ed9ee30 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -415,6 +415,7 @@ static noinline void pgtable_bad(unsigned long address, struct pt_regs *regs,
unsigned long error_code)
{
unsigned long flags = oops_begin();
+ int sig = SIGKILL;
struct task_struct *tsk;

printk(KERN_ALERT "%s: Corrupted page table at address %lx\n",
@@ -425,8 +426,8 @@ static noinline void pgtable_bad(unsigned long address, struct pt_regs *regs,
tsk->thread.trap_no = 14;
tsk->thread.error_code = error_code;
if (__die("Bad pagetable", regs, error_code))
- regs = NULL;
- oops_end(flags, regs, SIGKILL);
+ sig = 0;
+ oops_end(flags, regs, sig);
}
#endif

@@ -594,6 +595,7 @@ void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)

#ifdef CONFIG_X86_64
unsigned long flags;
+ int sig;
#endif

tsk = current;
@@ -868,11 +870,12 @@ no_context:
bust_spinlocks(0);
do_exit(SIGKILL);
#else
+ sig = SIGKILL;
if (__die("Oops", regs, error_code))
- regs = NULL;
+ sig = 0;
/* Executive summary in case the body of the oops scrolled away */
printk(KERN_EMERG "CR2: %016lx\n", address);
- oops_end(flags, regs, SIGKILL);
+ oops_end(flags, regs, sig);
#endif

/*
--
1.5.4.3

2008-10-22 10:18:19

by Alexander van Heukelum

[permalink] [raw]
Subject: [PATCH 7/7] i386, dumpstack: unify die()

Make i386's die() equal to x86_64's version.

Whitespace-only changes on x86_64, to make it equal to i386's
version. (user_mode and user_mode_vm are equal on x86_64.)

Signed-off-by: Alexander van Heukelum <[email protected]>
---
arch/x86/kernel/dumpstack_32.c | 10 +++-------
arch/x86/kernel/dumpstack_64.c | 6 +++++-
2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index e91ae34..f2046c5 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -378,15 +378,11 @@ void die(const char *str, struct pt_regs *regs, long err)
unsigned long flags = oops_begin();
int sig = SIGSEGV;

- if (die_nest_count < 3) {
+ if (!user_mode_vm(regs))
report_bug(regs->ip, regs);

- if (__die(str, regs, err))
- sig = 0;
- } else {
- printk(KERN_EMERG "Recursive die() failure, output suppressed\n");
- }
-
+ if (__die(str, regs, err))
+ sig = 0;
oops_end(flags, regs, sig);
}

diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 831e1e1..28c67aa 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -506,12 +506,16 @@ int __kprobes __die(const char *str, struct pt_regs *regs, long err)
return 0;
}

+/*
+ * This is gone through when something in the kernel has done something bad
+ * and is about to be terminated:
+ */
void die(const char *str, struct pt_regs *regs, long err)
{
unsigned long flags = oops_begin();
int sig = SIGSEGV;

- if (!user_mode(regs))
+ if (!user_mode_vm(regs))
report_bug(regs->ip, regs);

if (__die(str, regs, err))
--
1.5.4.3

2008-10-22 10:18:33

by Alexander van Heukelum

[permalink] [raw]
Subject: [PATCH 6/7] i386, dumpstack: use oops_begin/oops_end in die_nmi

Use oops_begin and oops_end in die_nmi.

Whitespace-only changes on x86_64, to make it equal to i386's
version.

Signed-off-by: Alexander van Heukelum <[email protected]>
---
arch/x86/kernel/dumpstack_32.c | 33 +++++++++++----------------------
arch/x86/kernel/dumpstack_64.c | 4 ++--
2 files changed, 13 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 7c7d691..e91ae34 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -390,40 +390,29 @@ void die(const char *str, struct pt_regs *regs, long err)
oops_end(flags, regs, sig);
}

-static DEFINE_SPINLOCK(nmi_print_lock);
-
void notrace __kprobes
die_nmi(char *str, struct pt_regs *regs, int do_panic)
{
+ unsigned long flags;
+
if (notify_die(DIE_NMIWATCHDOG, str, regs, 0, 2, SIGINT) == NOTIFY_STOP)
return;

- spin_lock(&nmi_print_lock);
/*
- * We are in trouble anyway, lets at least try
- * to get a message out:
- */
- bust_spinlocks(1);
+ * We are in trouble anyway, lets at least try
+ * to get a message out.
+ */
+ flags = oops_begin();
printk(KERN_EMERG "%s", str);
printk(" on CPU%d, ip %08lx, registers:\n",
smp_processor_id(), regs->ip);
show_registers(regs);
- if (do_panic)
+ oops_end(flags, regs, 0);
+ if (do_panic || panic_on_oops)
panic("Non maskable interrupt");
- console_silent();
- spin_unlock(&nmi_print_lock);
-
- /*
- * If we are in kernel we are probably nested up pretty bad
- * and might aswell get out now while we still can:
- */
- if (!user_mode_vm(regs)) {
- current->thread.trap_no = 2;
- crash_kexec(regs);
- }
-
- bust_spinlocks(0);
- do_exit(SIGSEGV);
+ nmi_exit();
+ local_irq_enable();
+ do_exit(SIGBUS);
}

static int __init oops_setup(char *s)
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index dc6162b..831e1e1 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -519,7 +519,7 @@ void die(const char *str, struct pt_regs *regs, long err)
oops_end(flags, regs, sig);
}

-notrace __kprobes void
+void notrace __kprobes
die_nmi(char *str, struct pt_regs *regs, int do_panic)
{
unsigned long flags;
@@ -527,11 +527,11 @@ die_nmi(char *str, struct pt_regs *regs, int do_panic)
if (notify_die(DIE_NMIWATCHDOG, str, regs, 0, 2, SIGINT) == NOTIFY_STOP)
return;

- flags = oops_begin();
/*
* We are in trouble anyway, lets at least try
* to get a message out.
*/
+ flags = oops_begin();
printk(KERN_EMERG "%s", str);
printk(" on CPU%d, ip %08lx, registers:\n",
smp_processor_id(), regs->ip);
--
1.5.4.3

2008-10-22 10:18:55

by Alexander van Heukelum

[permalink] [raw]
Subject: Re: [PATCH] x86: make oops_begin and oops_end equal

On Tue, 21 Oct 2008 10:45:05 -0400, "Neil Horman"
<[email protected]> said:
> On Mon, Oct 20, 2008 at 05:08:34PM +0200, Alexander van Heukelum wrote:
> > Mostly use the x86_64 version of oops_begin() and oops_end() on
> > i386 too. Changes to the original x86_64 version:
> >
> Hey, doing a sight review this am here. Didn't find anything major, but
> I did
> find a few little nits. comments inlie

Hi Neil,

Thanks for the review. I've sent a redone patch series just a moment
ago, based on your comments. There was also another problem with these
two patches: oops_end(flags, regs, signr) had special behaviour for
regs=NULL that I did not consider before. The series has grown due
to this issue...

>> [...]
> Hmm. I think this creates the same case that I just fixed in my initial
> post. If we start using oops_end with this here, it may be possible to call
> crash_kexec with the console_sem held. If that happens, we deadlock. I
> think you should be able to move this clause up above the bust_spinlocks(0)
> without any issue, and that would take care of that

Indeed. The new series does exactly that.

>> [...]
> This undoes my previous patch. I realize your second patch fixes it
> properly so the ordering is correct when oops_begin and oops_end are used, but if you
> could rediff so this isn't here, I'd appreciate it. If these patches are
> committed separately, you'll avoid having the tree in a state where that deadlock
> can reoccur (even if it is just for one commit)

Yeah, I quickly rediffed the patches I already had. The new series
leaves
it as is until die_nmi is replaced by the oops_begin/oops_end version.

>> [...]
> If you're going to add the crash_kexec here (which looking at the call
> sites, makes sense to me), you should likely remove it from the critical section
> of die and die_nmi, just to avoid the redundancy. Same issue as the 32 bit
> version above applies, this needs to happen before you call bust_spinlocks(0).

Indeed.

> Fix those issues, and the rest looks good to me.

I think I've done that ;).

Thanks,
Greetings,
Alexander

(I will probably not be able to respond to e-mail until after the
weekend)
--
Alexander van Heukelum
[email protected]

--
http://www.fastmail.fm - mmm... Fastmail...

2008-10-22 10:37:55

by Alexander van Heukelum

[permalink] [raw]
Subject: [PATCH 5/7] i386, dumpstack: use x86_64's method to account die_nest_count

oops_begin/oops_end should always be used in pairs. On x86_64
oops_begin increments die_nest_count, and oops_end decrements
die_nest_count. Doing this makes oops_begin and oops_end equal
to the x86_64 versions.

Signed-off-by: Alexander van Heukelum <[email protected]>
---
arch/x86/kernel/dumpstack_32.c | 29 +++++++++++++++++------------
1 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index a29b88f..7c7d691 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -289,21 +289,24 @@ static unsigned int die_nest_count;

unsigned __kprobes long oops_begin(void)
{
+ int cpu;
unsigned long flags;

oops_enter();

- if (die_owner != raw_smp_processor_id()) {
- console_verbose();
- raw_local_irq_save(flags);
- __raw_spin_lock(&die_lock);
- die_owner = smp_processor_id();
- die_nest_count = 0;
- bust_spinlocks(1);
- } else {
- raw_local_irq_save(flags);
+ /* racy, but better than risking deadlock. */
+ raw_local_irq_save(flags);
+ cpu = smp_processor_id();
+ if (!__raw_spin_trylock(&die_lock)) {
+ if (cpu == die_owner)
+ /* nested oops. should stop eventually */;
+ else
+ __raw_spin_lock(&die_lock);
}
die_nest_count++;
+ die_owner = cpu;
+ console_verbose();
+ bust_spinlocks(1);
return flags;
}

@@ -315,13 +318,15 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
bust_spinlocks(0);
die_owner = -1;
add_taint(TAINT_DIE);
- __raw_spin_unlock(&die_lock);
+ die_nest_count--;
+ if (!die_nest_count)
+ /* Nest count reaches zero, release the lock. */
+ __raw_spin_unlock(&die_lock);
raw_local_irq_restore(flags);
-
oops_exit();
+
if (!signr)
return;
-
if (in_interrupt())
panic("Fatal exception in interrupt");
if (panic_on_oops)
--
1.5.4.3

2008-10-22 10:38:17

by Alexander van Heukelum

[permalink] [raw]
Subject: [PATCH 4/7] x86, dumpstack: always call oops_exit from oops_end

Always call oops_exit from oops_end, even if signr==0.

Also, move add_taint(TAINT_DIE) from __die to oops_end
on x86_64 and interchange two lines to make oops_end
more similar to the i386-version.

Signed-off-by: Alexander van Heukelum <[email protected]>
---
arch/x86/kernel/dumpstack_32.c | 2 +-
arch/x86/kernel/dumpstack_64.c | 11 +++++------
2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 7c22f99..a29b88f 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -318,6 +318,7 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
__raw_spin_unlock(&die_lock);
raw_local_irq_restore(flags);

+ oops_exit();
if (!signr)
return;

@@ -325,7 +326,6 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
panic("Fatal exception in interrupt");
if (panic_on_oops)
panic("Fatal exception");
- oops_exit();
do_exit(signr);
}

diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 57ce11b..dc6162b 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -461,22 +461,22 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
if (regs && kexec_should_crash(current))
crash_kexec(regs);

- die_owner = -1;
bust_spinlocks(0);
+ die_owner = -1;
+ add_taint(TAINT_DIE);
die_nest_count--;
if (!die_nest_count)
/* Nest count reaches zero, release the lock. */
__raw_spin_unlock(&die_lock);
raw_local_irq_restore(flags);
- if (!signr) {
- oops_exit();
+ oops_exit();
+
+ if (!signr)
return;
- }
if (in_interrupt())
panic("Fatal exception in interrupt");
if (panic_on_oops)
panic("Fatal exception");
- oops_exit();
do_exit(signr);
}

@@ -499,7 +499,6 @@ int __kprobes __die(const char *str, struct pt_regs *regs, long err)
return 1;

show_registers(regs);
- add_taint(TAINT_DIE);
/* Executive summary in case the oops scrolled away */
printk(KERN_ALERT "RIP ");
printk_address(regs->ip, 1);
--
1.5.4.3

2008-10-22 10:47:37

by Neil Horman

[permalink] [raw]
Subject: Re: [PATCH] x86: make oops_begin and oops_end equal

On Wed, Oct 22, 2008 at 12:18:41PM +0200, Alexander van Heukelum wrote:
> On Tue, 21 Oct 2008 10:45:05 -0400, "Neil Horman"
> <[email protected]> said:
> > On Mon, Oct 20, 2008 at 05:08:34PM +0200, Alexander van Heukelum wrote:
> > > Mostly use the x86_64 version of oops_begin() and oops_end() on
> > > i386 too. Changes to the original x86_64 version:
> > >
> > Hey, doing a sight review this am here. Didn't find anything major, but
> > I did
> > find a few little nits. comments inlie
>
> Hi Neil,
>
> Thanks for the review. I've sent a redone patch series just a moment
> ago, based on your comments. There was also another problem with these
> two patches: oops_end(flags, regs, signr) had special behaviour for
> regs=NULL that I did not consider before. The series has grown due
> to this issue...
>
> >> [...]
> > Hmm. I think this creates the same case that I just fixed in my initial
> > post. If we start using oops_end with this here, it may be possible to call
> > crash_kexec with the console_sem held. If that happens, we deadlock. I
> > think you should be able to move this clause up above the bust_spinlocks(0)
> > without any issue, and that would take care of that
>
> Indeed. The new series does exactly that.
>
> >> [...]
> > This undoes my previous patch. I realize your second patch fixes it
> > properly so the ordering is correct when oops_begin and oops_end are used, but if you
> > could rediff so this isn't here, I'd appreciate it. If these patches are
> > committed separately, you'll avoid having the tree in a state where that deadlock
> > can reoccur (even if it is just for one commit)
>
> Yeah, I quickly rediffed the patches I already had. The new series
> leaves
> it as is until die_nmi is replaced by the oops_begin/oops_end version.
>
> >> [...]
> > If you're going to add the crash_kexec here (which looking at the call
> > sites, makes sense to me), you should likely remove it from the critical section
> > of die and die_nmi, just to avoid the redundancy. Same issue as the 32 bit
> > version above applies, this needs to happen before you call bust_spinlocks(0).
>
> Indeed.
>
> > Fix those issues, and the rest looks good to me.
>
> I think I've done that ;).
>
> Thanks,
> Greetings,
> Alexander
>
> (I will probably not be able to respond to e-mail until after the
> weekend)
Copy that. Thanks for the quick turn-around.

Best
Neil

> --
> Alexander van Heukelum
> [email protected]
>
> --
> http://www.fastmail.fm - mmm... Fastmail...
>
>

--
/****************************************************
* Neil Horman <[email protected]>
* Software Engineer, Red Hat
****************************************************/

2008-10-22 10:51:24

by Neil Horman

[permalink] [raw]
Subject: Re: [PATCH 1/7] i386, dumpstack: Move crash_kexec before bust_spinlocks(0) in oops_end

On Wed, Oct 22, 2008 at 12:00:08PM +0200, Alexander van Heukelum wrote:
> crash_kexec should not be called with console_sem held. Move
> the call before bust_spinlocks(0) in oops_end to avoid the
> problem.
>
> Signed-off-by: Alexander van Heukelum <[email protected]>
> Cc: "Neil Horman" <[email protected]>
Acked-by: Neil Horman <[email protected]>

> ---
> arch/x86/kernel/dumpstack_32.c | 5 +++--
> 1 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
> index b361475..5493d31 100644
> --- a/arch/x86/kernel/dumpstack_32.c
> +++ b/arch/x86/kernel/dumpstack_32.c
> @@ -309,6 +309,9 @@ unsigned __kprobes long oops_begin(void)
>
> void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
> {
> + if (regs && kexec_should_crash(current))
> + crash_kexec(regs);
> +
> bust_spinlocks(0);
> die_owner = -1;
> add_taint(TAINT_DIE);
> @@ -318,8 +321,6 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
> if (!regs)
> return;
>
> - if (kexec_should_crash(current))
> - crash_kexec(regs);
> if (in_interrupt())
> panic("Fatal exception in interrupt");
> if (panic_on_oops)
> --
> 1.5.4.3
>
>

--
/****************************************************
* Neil Horman <[email protected]>
* Software Engineer, Red Hat
****************************************************/

2008-10-22 11:03:48

by Neil Horman

[permalink] [raw]
Subject: Re: [PATCH 2/7] x86, dumpstack: let signr=0 signal no do_exit

On Wed, Oct 22, 2008 at 12:00:09PM +0200, Alexander van Heukelum wrote:
> Change oops_end such that signr=0 signals that do_exit
> is not to be called.
>
> Currently, each use of __die is soon followed by a call
> to oops_end and 'regs' is set to NULL if oops_end is expected
> not to call do_exit. Change all such pairs to set signr=0
> instead. On x86_64 oops_end is used 'bare' in die_nmi; use
> signr=0 instead of regs=NULL there, too.
>
> Signed-off-by: Alexander van Heukelum <[email protected]>

Acked-by: Neil Horman <[email protected]>

> ---
> arch/x86/kernel/dumpstack_32.c | 7 ++++---
> arch/x86/kernel/dumpstack_64.c | 9 +++++----
> arch/x86/mm/fault.c | 11 +++++++----
> 3 files changed, 16 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
> index 5493d31..7c22f99 100644
> --- a/arch/x86/kernel/dumpstack_32.c
> +++ b/arch/x86/kernel/dumpstack_32.c
> @@ -318,7 +318,7 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
> __raw_spin_unlock(&die_lock);
> raw_local_irq_restore(flags);
>
> - if (!regs)
> + if (!signr)
> return;
>
> if (in_interrupt())
> @@ -371,17 +371,18 @@ int __kprobes __die(const char *str, struct pt_regs *regs, long err)
> void die(const char *str, struct pt_regs *regs, long err)
> {
> unsigned long flags = oops_begin();
> + int sig = SIGSEGV;
>
> if (die_nest_count < 3) {
> report_bug(regs->ip, regs);
>
> if (__die(str, regs, err))
> - regs = NULL;
> + sig = 0;
> } else {
> printk(KERN_EMERG "Recursive die() failure, output suppressed\n");
> }
>
> - oops_end(flags, regs, SIGSEGV);
> + oops_end(flags, regs, sig);
> }
>
> static DEFINE_SPINLOCK(nmi_print_lock);
> diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
> index 96a5db7..ffefea6 100644
> --- a/arch/x86/kernel/dumpstack_64.c
> +++ b/arch/x86/kernel/dumpstack_64.c
> @@ -465,7 +465,7 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
> /* Nest count reaches zero, release the lock. */
> __raw_spin_unlock(&die_lock);
> raw_local_irq_restore(flags);
> - if (!regs) {
> + if (!signr) {
> oops_exit();
> return;
> }
> @@ -509,13 +509,14 @@ int __kprobes __die(const char *str, struct pt_regs *regs, long err)
> void die(const char *str, struct pt_regs *regs, long err)
> {
> unsigned long flags = oops_begin();
> + int sig = SIGSEGV;
>
> if (!user_mode(regs))
> report_bug(regs->ip, regs);
>
> if (__die(str, regs, err))
> - regs = NULL;
> - oops_end(flags, regs, SIGSEGV);
> + sig = 0;
> + oops_end(flags, regs, sig);
> }
>
> notrace __kprobes void
> @@ -539,7 +540,7 @@ die_nmi(char *str, struct pt_regs *regs, int do_panic)
> crash_kexec(regs);
> if (do_panic || panic_on_oops)
> panic("Non maskable interrupt");
> - oops_end(flags, NULL, SIGBUS);
> + oops_end(flags, regs, 0);
> nmi_exit();
> local_irq_enable();
> do_exit(SIGBUS);
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 8e52e68..ed9ee30 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -415,6 +415,7 @@ static noinline void pgtable_bad(unsigned long address, struct pt_regs *regs,
> unsigned long error_code)
> {
> unsigned long flags = oops_begin();
> + int sig = SIGKILL;
> struct task_struct *tsk;
>
> printk(KERN_ALERT "%s: Corrupted page table at address %lx\n",
> @@ -425,8 +426,8 @@ static noinline void pgtable_bad(unsigned long address, struct pt_regs *regs,
> tsk->thread.trap_no = 14;
> tsk->thread.error_code = error_code;
> if (__die("Bad pagetable", regs, error_code))
> - regs = NULL;
> - oops_end(flags, regs, SIGKILL);
> + sig = 0;
> + oops_end(flags, regs, sig);
> }
> #endif
>
> @@ -594,6 +595,7 @@ void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
>
> #ifdef CONFIG_X86_64
> unsigned long flags;
> + int sig;
> #endif
>
> tsk = current;
> @@ -868,11 +870,12 @@ no_context:
> bust_spinlocks(0);
> do_exit(SIGKILL);
> #else
> + sig = SIGKILL;
> if (__die("Oops", regs, error_code))
> - regs = NULL;
> + sig = 0;
> /* Executive summary in case the body of the oops scrolled away */
> printk(KERN_EMERG "CR2: %016lx\n", address);
> - oops_end(flags, regs, SIGKILL);
> + oops_end(flags, regs, sig);
> #endif
>
> /*
> --
> 1.5.4.3
>
>

--
/****************************************************
* Neil Horman <[email protected]>
* Software Engineer, Red Hat
****************************************************/

2008-10-22 11:14:17

by Neil Horman

[permalink] [raw]
Subject: Re: [PATCH 3/7] x86_64, dumpstack: move kexec_crash from __die to oops_end

On Wed, Oct 22, 2008 at 12:00:10PM +0200, Alexander van Heukelum wrote:
> oops_end is preceded by either a call to __die, or a conditional
> call to crash_kexec. Move the conditional call to crash_kexec
> from the end of __die to the start of oops_end and remove
> the superfluous call to crash_kexec in die_nmi.
>
> Signed-off-by: Alexander van Heukelum <[email protected]>

Acked-by: Neil Horman <[email protected]>

> ---
> arch/x86/kernel/dumpstack_64.c | 9 ++++-----
> 1 files changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
> index ffefea6..57ce11b 100644
> --- a/arch/x86/kernel/dumpstack_64.c
> +++ b/arch/x86/kernel/dumpstack_64.c
> @@ -458,6 +458,9 @@ unsigned __kprobes long oops_begin(void)
>
> void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
> {
> + if (regs && kexec_should_crash(current))
> + crash_kexec(regs);
> +
> die_owner = -1;
> bust_spinlocks(0);
> die_nest_count--;
> @@ -501,8 +504,6 @@ int __kprobes __die(const char *str, struct pt_regs *regs, long err)
> printk(KERN_ALERT "RIP ");
> printk_address(regs->ip, 1);
> printk(" RSP <%016lx>\n", regs->sp);
> - if (kexec_should_crash(current))
> - crash_kexec(regs);
> return 0;
> }
>
> @@ -536,11 +537,9 @@ die_nmi(char *str, struct pt_regs *regs, int do_panic)
> printk(" on CPU%d, ip %08lx, registers:\n",
> smp_processor_id(), regs->ip);
> show_registers(regs);
> - if (kexec_should_crash(current))
> - crash_kexec(regs);
> + oops_end(flags, regs, 0);
> if (do_panic || panic_on_oops)
> panic("Non maskable interrupt");
> - oops_end(flags, regs, 0);
> nmi_exit();
> local_irq_enable();
> do_exit(SIGBUS);
> --
> 1.5.4.3
>
>

--
/****************************************************
* Neil Horman <[email protected]>
* Software Engineer, Red Hat
****************************************************/

2008-10-22 11:16:59

by Neil Horman

[permalink] [raw]
Subject: Re: [PATCH 4/7] x86, dumpstack: always call oops_exit from oops_end

On Wed, Oct 22, 2008 at 12:00:11PM +0200, Alexander van Heukelum wrote:
> Always call oops_exit from oops_end, even if signr==0.
>
> Also, move add_taint(TAINT_DIE) from __die to oops_end
> on x86_64 and interchange two lines to make oops_end
> more similar to the i386-version.
>
> Signed-off-by: Alexander van Heukelum <[email protected]>


Acked-by: Neil Horman <[email protected]>

> ---
> arch/x86/kernel/dumpstack_32.c | 2 +-
> arch/x86/kernel/dumpstack_64.c | 11 +++++------
> 2 files changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
> index 7c22f99..a29b88f 100644
> --- a/arch/x86/kernel/dumpstack_32.c
> +++ b/arch/x86/kernel/dumpstack_32.c
> @@ -318,6 +318,7 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
> __raw_spin_unlock(&die_lock);
> raw_local_irq_restore(flags);
>
> + oops_exit();
> if (!signr)
> return;
>
> @@ -325,7 +326,6 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
> panic("Fatal exception in interrupt");
> if (panic_on_oops)
> panic("Fatal exception");
> - oops_exit();
> do_exit(signr);
> }
>
> diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
> index 57ce11b..dc6162b 100644
> --- a/arch/x86/kernel/dumpstack_64.c
> +++ b/arch/x86/kernel/dumpstack_64.c
> @@ -461,22 +461,22 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
> if (regs && kexec_should_crash(current))
> crash_kexec(regs);
>
> - die_owner = -1;
> bust_spinlocks(0);
> + die_owner = -1;
> + add_taint(TAINT_DIE);
> die_nest_count--;
> if (!die_nest_count)
> /* Nest count reaches zero, release the lock. */
> __raw_spin_unlock(&die_lock);
> raw_local_irq_restore(flags);
> - if (!signr) {
> - oops_exit();
> + oops_exit();
> +
> + if (!signr)
> return;
> - }
> if (in_interrupt())
> panic("Fatal exception in interrupt");
> if (panic_on_oops)
> panic("Fatal exception");
> - oops_exit();
> do_exit(signr);
> }
>
> @@ -499,7 +499,6 @@ int __kprobes __die(const char *str, struct pt_regs *regs, long err)
> return 1;
>
> show_registers(regs);
> - add_taint(TAINT_DIE);
> /* Executive summary in case the oops scrolled away */
> printk(KERN_ALERT "RIP ");
> printk_address(regs->ip, 1);
> --
> 1.5.4.3
>
>

--
/****************************************************
* Neil Horman <[email protected]>
* Software Engineer, Red Hat
****************************************************/

2008-10-22 11:20:42

by Neil Horman

[permalink] [raw]
Subject: Re: [PATCH 5/7] i386, dumpstack: use x86_64's method to account die_nest_count

On Wed, Oct 22, 2008 at 12:00:12PM +0200, Alexander van Heukelum wrote:
> oops_begin/oops_end should always be used in pairs. On x86_64
> oops_begin increments die_nest_count, and oops_end decrements
> die_nest_count. Doing this makes oops_begin and oops_end equal
> to the x86_64 versions.
>
> Signed-off-by: Alexander van Heukelum <[email protected]>

Acked-by: Neil Horman <[email protected]>


> ---
> arch/x86/kernel/dumpstack_32.c | 29 +++++++++++++++++------------
> 1 files changed, 17 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
> index a29b88f..7c7d691 100644
> --- a/arch/x86/kernel/dumpstack_32.c
> +++ b/arch/x86/kernel/dumpstack_32.c
> @@ -289,21 +289,24 @@ static unsigned int die_nest_count;
>
> unsigned __kprobes long oops_begin(void)
> {
> + int cpu;
> unsigned long flags;
>
> oops_enter();
>
> - if (die_owner != raw_smp_processor_id()) {
> - console_verbose();
> - raw_local_irq_save(flags);
> - __raw_spin_lock(&die_lock);
> - die_owner = smp_processor_id();
> - die_nest_count = 0;
> - bust_spinlocks(1);
> - } else {
> - raw_local_irq_save(flags);
> + /* racy, but better than risking deadlock. */
> + raw_local_irq_save(flags);
> + cpu = smp_processor_id();
> + if (!__raw_spin_trylock(&die_lock)) {
> + if (cpu == die_owner)
> + /* nested oops. should stop eventually */;
> + else
> + __raw_spin_lock(&die_lock);
> }
> die_nest_count++;
> + die_owner = cpu;
> + console_verbose();
> + bust_spinlocks(1);
> return flags;
> }
>
> @@ -315,13 +318,15 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
> bust_spinlocks(0);
> die_owner = -1;
> add_taint(TAINT_DIE);
> - __raw_spin_unlock(&die_lock);
> + die_nest_count--;
> + if (!die_nest_count)
> + /* Nest count reaches zero, release the lock. */
> + __raw_spin_unlock(&die_lock);
> raw_local_irq_restore(flags);
> -
> oops_exit();
> +
> if (!signr)
> return;
> -
> if (in_interrupt())
> panic("Fatal exception in interrupt");
> if (panic_on_oops)
> --
> 1.5.4.3
>
>

--
/****************************************************
* Neil Horman <[email protected]>
* Software Engineer, Red Hat
****************************************************/

2008-10-22 12:03:18

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86: make oops_begin and oops_end equal


* Neil Horman <[email protected]> wrote:

> > Thanks for the review. I've sent a redone patch series just a moment
> > ago, based on your comments. There was also another problem with
> > these two patches: oops_end(flags, regs, signr) had special
> > behaviour for regs=NULL that I did not consider before. The series
> > has grown due to this issue...

[...]
> Copy that. Thanks for the quick turn-around.

applied to tip/x86/dumpstack, thanks guys!

Could you please also create arch/x86/kernel/dumpstack.c and move all
the now-unified bits there - before they get out of sync again?

Ingo

2008-10-22 12:38:43

by Neil Horman

[permalink] [raw]
Subject: Re: [PATCH 6/7] i386, dumpstack: use oops_begin/oops_end in die_nmi

On Wed, Oct 22, 2008 at 12:00:13PM +0200, Alexander van Heukelum wrote:
> Use oops_begin and oops_end in die_nmi.
>
> Whitespace-only changes on x86_64, to make it equal to i386's
> version.
>
> Signed-off-by: Alexander van Heukelum <[email protected]>

Acked-by: Neil Horman <[email protected]>

> ---
> arch/x86/kernel/dumpstack_32.c | 33 +++++++++++----------------------
> arch/x86/kernel/dumpstack_64.c | 4 ++--
> 2 files changed, 13 insertions(+), 24 deletions(-)
>
> diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
> index 7c7d691..e91ae34 100644
> --- a/arch/x86/kernel/dumpstack_32.c
> +++ b/arch/x86/kernel/dumpstack_32.c
> @@ -390,40 +390,29 @@ void die(const char *str, struct pt_regs *regs, long err)
> oops_end(flags, regs, sig);
> }
>
> -static DEFINE_SPINLOCK(nmi_print_lock);
> -
> void notrace __kprobes
> die_nmi(char *str, struct pt_regs *regs, int do_panic)
> {
> + unsigned long flags;
> +
> if (notify_die(DIE_NMIWATCHDOG, str, regs, 0, 2, SIGINT) == NOTIFY_STOP)
> return;
>
> - spin_lock(&nmi_print_lock);
> /*
> - * We are in trouble anyway, lets at least try
> - * to get a message out:
> - */
> - bust_spinlocks(1);
> + * We are in trouble anyway, lets at least try
> + * to get a message out.
> + */
> + flags = oops_begin();
> printk(KERN_EMERG "%s", str);
> printk(" on CPU%d, ip %08lx, registers:\n",
> smp_processor_id(), regs->ip);
> show_registers(regs);
> - if (do_panic)
> + oops_end(flags, regs, 0);
> + if (do_panic || panic_on_oops)
> panic("Non maskable interrupt");
> - console_silent();
> - spin_unlock(&nmi_print_lock);
> -
> - /*
> - * If we are in kernel we are probably nested up pretty bad
> - * and might aswell get out now while we still can:
> - */
> - if (!user_mode_vm(regs)) {
> - current->thread.trap_no = 2;
> - crash_kexec(regs);
> - }
> -
> - bust_spinlocks(0);
> - do_exit(SIGSEGV);
> + nmi_exit();
> + local_irq_enable();
> + do_exit(SIGBUS);
> }
>
> static int __init oops_setup(char *s)
> diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
> index dc6162b..831e1e1 100644
> --- a/arch/x86/kernel/dumpstack_64.c
> +++ b/arch/x86/kernel/dumpstack_64.c
> @@ -519,7 +519,7 @@ void die(const char *str, struct pt_regs *regs, long err)
> oops_end(flags, regs, sig);
> }
>
> -notrace __kprobes void
> +void notrace __kprobes
> die_nmi(char *str, struct pt_regs *regs, int do_panic)
> {
> unsigned long flags;
> @@ -527,11 +527,11 @@ die_nmi(char *str, struct pt_regs *regs, int do_panic)
> if (notify_die(DIE_NMIWATCHDOG, str, regs, 0, 2, SIGINT) == NOTIFY_STOP)
> return;
>
> - flags = oops_begin();
> /*
> * We are in trouble anyway, lets at least try
> * to get a message out.
> */
> + flags = oops_begin();
> printk(KERN_EMERG "%s", str);
> printk(" on CPU%d, ip %08lx, registers:\n",
> smp_processor_id(), regs->ip);
> --
> 1.5.4.3
>
>

--
/****************************************************
* Neil Horman <[email protected]>
* Software Engineer, Red Hat
****************************************************/

2008-10-22 13:39:41

by Neil Horman

[permalink] [raw]
Subject: Re: [PATCH 7/7] i386, dumpstack: unify die()

On Wed, Oct 22, 2008 at 12:00:14PM +0200, Alexander van Heukelum wrote:
> Make i386's die() equal to x86_64's version.
>
> Whitespace-only changes on x86_64, to make it equal to i386's
> version. (user_mode and user_mode_vm are equal on x86_64.)
>
> Signed-off-by: Alexander van Heukelum <[email protected]>

Acked-by: Neil Horman <[email protected]>

> ---
> arch/x86/kernel/dumpstack_32.c | 10 +++-------
> arch/x86/kernel/dumpstack_64.c | 6 +++++-
> 2 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
> index e91ae34..f2046c5 100644
> --- a/arch/x86/kernel/dumpstack_32.c
> +++ b/arch/x86/kernel/dumpstack_32.c
> @@ -378,15 +378,11 @@ void die(const char *str, struct pt_regs *regs, long err)
> unsigned long flags = oops_begin();
> int sig = SIGSEGV;
>
> - if (die_nest_count < 3) {
> + if (!user_mode_vm(regs))
> report_bug(regs->ip, regs);
>
> - if (__die(str, regs, err))
> - sig = 0;
> - } else {
> - printk(KERN_EMERG "Recursive die() failure, output suppressed\n");
> - }
> -
> + if (__die(str, regs, err))
> + sig = 0;
> oops_end(flags, regs, sig);
> }
>
> diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
> index 831e1e1..28c67aa 100644
> --- a/arch/x86/kernel/dumpstack_64.c
> +++ b/arch/x86/kernel/dumpstack_64.c
> @@ -506,12 +506,16 @@ int __kprobes __die(const char *str, struct pt_regs *regs, long err)
> return 0;
> }
>
> +/*
> + * This is gone through when something in the kernel has done something bad
> + * and is about to be terminated:
> + */
> void die(const char *str, struct pt_regs *regs, long err)
> {
> unsigned long flags = oops_begin();
> int sig = SIGSEGV;
>
> - if (!user_mode(regs))
> + if (!user_mode_vm(regs))
> report_bug(regs->ip, regs);
>
> if (__die(str, regs, err))
> --
> 1.5.4.3
>
>

--
/****************************************************
* Neil Horman <[email protected]>
* Software Engineer, Red Hat
****************************************************/

2008-10-22 19:22:19

by Neil Horman

[permalink] [raw]
Subject: Re: [PATCH] x86: make oops_begin and oops_end equal

On Wed, Oct 22, 2008 at 02:02:22PM +0200, Ingo Molnar wrote:
>
> * Neil Horman <[email protected]> wrote:
>
> > > Thanks for the review. I've sent a redone patch series just a moment
> > > ago, based on your comments. There was also another problem with
> > > these two patches: oops_end(flags, regs, signr) had special
> > > behaviour for regs=NULL that I did not consider before. The series
> > > has grown due to this issue...
>
> [...]
> > Copy that. Thanks for the quick turn-around.
>
> applied to tip/x86/dumpstack, thanks guys!
>
> Could you please also create arch/x86/kernel/dumpstack.c and move all
> the now-unified bits there - before they get out of sync again?
>
I can do this in the next day or so, unless you have an urge to Alexander
Neil

> Ingo
>

--
/****************************************************
* Neil Horman <[email protected]>
* Software Engineer, Red Hat
****************************************************/

2008-10-23 09:22:44

by Alexander van Heukelum

[permalink] [raw]
Subject: Re: [PATCH] x86: make oops_begin and oops_end equal

On Wed, 22 Oct 2008 15:19:48 -0400, "Neil Horman"
<[email protected]> said:
> On Wed, Oct 22, 2008 at 02:02:22PM +0200, Ingo Molnar wrote:
> >
> > * Neil Horman <[email protected]> wrote:
> >
> > > > Thanks for the review. I've sent a redone patch series just a moment
> > > > ago, based on your comments. There was also another problem with
> > > > these two patches: oops_end(flags, regs, signr) had special
> > > > behaviour for regs=NULL that I did not consider before. The series
> > > > has grown due to this issue...
> >
> > [...]
> > > Copy that. Thanks for the quick turn-around.
> >
> > applied to tip/x86/dumpstack, thanks guys!
> >
> > Could you please also create arch/x86/kernel/dumpstack.c and move all
> > the now-unified bits there - before they get out of sync again?
> >
> I can do this in the next day or so, unless you have an urge to Alexander

Hi Neil,

Many thanks for the review of the patches, and please go ahead
and finish the unification!

Greetings,
Alexander

> Neil
>
> > Ingo
> >
>
> --
> /****************************************************
> * Neil Horman <[email protected]>
> * Software Engineer, Red Hat
> ****************************************************/
--
Alexander van Heukelum
[email protected]

--
http://www.fastmail.fm - A fast, anti-spam email service.