2008-01-10 06:04:57

by Arjan van de Ven

Subject: Fix x86 32 bit FRAME_POINTER chasing code

This patch is simple; I don't know if it's a .24 candidate. The bug is pretty bad but not a recent regression,
and there is obviously some risk in touching this code.



Subject: Fix x86 32 bit FRAME_POINTER chasing code
From: Arjan van de Ven <[email protected]>

The current x86 32 bit FRAME_POINTER chasing code has a nasty bug in
that the EBP tracer doesn't actually update the value of EBP it is
tracing, so that the code doesn't actually switch to the irq stack properly.

The result is a truncated backtrace:

WARNING: at timeroops.c:8 kerneloops_regression_test() (Not tainted)
Pid: 0, comm: swapper Not tainted 2.6.24-0.77.rc4.git4.fc9 #1
[<c040649a>] show_trace_log_lvl+0x1a/0x2f
[<c0406d41>] show_trace+0x12/0x14
[<c0407061>] dump_stack+0x6c/0x72
[<e0258049>] kerneloops_regression_test+0x44/0x46 [timeroops]
[<c04371ac>] run_timer_softirq+0x127/0x18f
[<c0434685>] __do_softirq+0x78/0xff
[<c0407759>] do_softirq+0x74/0xf7
=======================

This patch fixes the code to update EBP properly, and to check the EIP
before printing (as the non-framepointer backtracer does) so that
the same test backtrace now looks like this:

WARNING: at timeroops.c:8 kerneloops_regression_test()
Pid: 0, comm: swapper Not tainted 2.6.24-rc7 #4
[<c0405d17>] show_trace_log_lvl+0x1a/0x2f
[<c0406681>] show_trace+0x12/0x14
[<c0406ef2>] dump_stack+0x6a/0x70
[<e01f6040>] kerneloops_regression_test+0x3b/0x3d [timeroops]
[<c0426f07>] run_timer_softirq+0x11b/0x17c
[<c04243ac>] __do_softirq+0x42/0x94
[<c040704c>] do_softirq+0x50/0xb6
[<c04242a9>] irq_exit+0x37/0x67
[<c040714c>] do_IRQ+0x9a/0xaf
[<c04057da>] common_interrupt+0x2e/0x34
[<c05807fe>] cpuidle_idle_call+0x52/0x78
[<c04034f3>] cpu_idle+0x46/0x60
[<c05fbbd3>] rest_init+0x43/0x45
[<c070aa3d>] start_kernel+0x279/0x27f
=======================

This shows that the backtrace goes all the way down to user context now.
This bug was found during the port to 64 bit of the frame pointer backtracer.

Signed-off-by: Arjan van de Ven <[email protected]>

---
arch/x86/kernel/traps_32.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6.24-rc7/arch/x86/kernel/traps_32.c
===================================================================
--- linux-2.6.24-rc7.orig/arch/x86/kernel/traps_32.c
+++ linux-2.6.24-rc7/arch/x86/kernel/traps_32.c
@@ -124,7 +124,8 @@ static inline unsigned long print_contex
 		unsigned long addr;

 		addr = frame->return_address;
-		ops->address(data, addr);
+		if (__kernel_text_address(addr))
+			ops->address(data, addr);
 		/*
 		 * break out of recursive entries (such as
 		 * end_of_stack_stop_unwind_function). Also,
@@ -132,6 +133,7 @@ static inline unsigned long print_contex
 		 * move downwards!
 		 */
 		next = frame->next_frame;
+		ebp = (unsigned long) next;
 		if (next <= frame)
 			break;
 		frame = next;

--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
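
The effect of the two changes in the patch above can be reproduced with a small user-space model. This is only a sketch, not the kernel code: it assumes that the caller of the frame walker resumes on the next stack from the frame pointer the walker returns, and the names is_text(), walk_one_stack(), demo_trace() and the update_fp switch are all hypothetical, made up for this illustration. is_text() stands in for __kernel_text_address(), and two slices of one array stand in for the irq stack and the process stack.

```c
#include <assert.h>
#include <stddef.h>

/* A saved-frame record: link to the caller's frame, then return address. */
struct stack_frame {
	struct stack_frame *next_frame;
	unsigned long return_address;
};

/* Half-open range [lo, hi) of frame slots that belong to one stack. */
struct region {
	const struct stack_frame *lo, *hi;
};

static int valid_frame(const struct region *r, const struct stack_frame *f)
{
	return f >= r->lo && f < r->hi;
}

/* Hypothetical stand-in for __kernel_text_address(). */
static int is_text(unsigned long addr)
{
	return addr >= 0x1000UL && addr < 0x2000UL;
}

/*
 * Walk the frames that live on one stack and return the frame pointer
 * at which the next stack should resume. update_fp == 0 models the old
 * behaviour, where the caller gets back the pointer it passed in.
 */
static struct stack_frame *walk_one_stack(const struct region *r,
					  struct stack_frame *fp, int update_fp,
					  unsigned long *out, size_t *n, size_t cap)
{
	struct stack_frame *frame = fp;

	assert(out != NULL && n != NULL);
	while (valid_frame(r, frame)) {
		struct stack_frame *next;
		unsigned long addr = frame->return_address;

		if (is_text(addr) && *n < cap)	/* record only text addresses */
			out[(*n)++] = addr;
		next = frame->next_frame;
		if (update_fp)
			fp = next;
		if (next <= frame)	/* self-link or downward move: stop */
			break;
		frame = next;
	}
	return fp;
}

/* Build a six-frame chain across two stacks and trace it. */
size_t demo_trace(int update_fp, unsigned long *out, size_t cap)
{
	static struct stack_frame mem[6];
	const struct region stacks[2] = {
		{ mem, mem + 3 },	/* model of the irq stack */
		{ mem + 3, mem + 6 },	/* model of the process stack */
	};
	struct stack_frame *fp = mem;
	size_t i, n = 0;

	for (i = 0; i < 6; i++) {
		/* each frame links to the next; the last links to itself */
		mem[i].next_frame = &mem[i < 5 ? i + 1 : i];
		mem[i].return_address = 0x1010UL + 0x10UL * i;
	}
	mem[4].return_address = 0x9999UL;	/* not a text address */

	for (i = 0; i < 2; i++)
		fp = walk_one_stack(&stacks[i], fp, update_fp, out, &n, cap);
	return n;
}
```

In this model, demo_trace(0, ...) records only the three entries from the first stack, because the second walk starts from a frame pointer that still points into the first stack and is rejected at once. demo_trace(1, ...) carries the pointer across and records five entries, skipping the one address that is_text() rejects.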


2008-01-10 06:55:07

by Ingo Molnar

Subject: Re: Fix x86 32 bit FRAME_POINTER chasing code


* Arjan van de Ven <[email protected]> wrote:

> +++ linux-2.6.24-rc7/arch/x86/kernel/traps_32.c
> @@ -124,7 +124,8 @@ static inline unsigned long print_contex
> unsigned long addr;
>
> addr = frame->return_address;
> - ops->address(data, addr);
> + if (__kernel_text_address(addr))
> + ops->address(data, addr);
> /*
> * break out of recursive entries (such as
> * end_of_stack_stop_unwind_function). Also,
> @@ -132,6 +133,7 @@ static inline unsigned long print_contex
> * move downwards!
> */
> next = frame->next_frame;
> + ebp = (unsigned long) next;
> if (next <= frame)

thanks, applied. Nice catch!

> This patch is simple; I don't know if it's .24 candidate; the bug is
> pretty bad but not a recent regression, and there is obviously some
> risk with touching this code.

it's a 2.6.24.1 candidate i believe. We trigger plenty of various
crashes during x86.git maintenance and others hit various crashes in
-mm, so by the time .1 is released we'll have it in .25 and can backport
it. Most folks/distros will update to 2.6.24.1 very quickly so there's
no risk of months loss of quality to kerneloops.org data either.

if there's more than 1-2 weeks to the v2.6.24 release we could merge it
right now as well:

Acked-by: Ingo Molnar <[email protected]>

because in a week we'll trigger plenty of crashes in -git based x86
trees and will know about any regressions and will be able to reasonably
trust it.

Ingo

2008-01-10 11:02:16

by Adrian Bunk

Subject: Re: Fix x86 32 bit FRAME_POINTER chasing code

On Thu, Jan 10, 2008 at 07:54:38AM +0100, Ingo Molnar wrote:
>...
> it's a 2.6.24.1 candidate i believe. We trigger plenty of various
> crashes during x86.git maintenance and others hit various crashes in
> -mm, so by the time .1 is released we'll have it in .25 and can backport
> it. Most folks/distros will update to 2.6.24.1 very quickly so there's
> no risk of months loss of quality to kerneloops.org data either.
>...

-stable should not introduce additional regressions, and my personal
impression is that it already introduces too many regressions.

Either this patch is considered both important and safe enough for
getting into Linus' tree now or it's simply too late for 2.6.24*

> Ingo

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2008-01-10 19:24:18

by Chuck Ebbert

Subject: Re: Fix x86 32 bit FRAME_POINTER chasing code

On 01/10/2008 01:54 AM, Ingo Molnar wrote:
> * Arjan van de Ven <[email protected]> wrote:
>
>> +++ linux-2.6.24-rc7/arch/x86/kernel/traps_32.c
>> @@ -124,7 +124,8 @@ static inline unsigned long print_contex
>> unsigned long addr;
>>
>> addr = frame->return_address;
>> - ops->address(data, addr);
>> + if (__kernel_text_address(addr))
>> + ops->address(data, addr);
>> /*
>> * break out of recursive entries (such as
>> * end_of_stack_stop_unwind_function). Also,
>> @@ -132,6 +133,7 @@ static inline unsigned long print_contex
>> * move downwards!
>> */
>> next = frame->next_frame;
>> + ebp = (unsigned long) next;
>> if (next <= frame)
>
> thanks, applied. Nice catch!
>
>> This patch is simple; I don't know if it's .24 candidate; the bug is
>> pretty bad but not a recent regression, and there is obviously some
>> risk with touching this code.
>
> it's a 2.6.24.1 candidate i believe. We trigger plenty of various
> crashes during x86.git maintenance and others hit various crashes in
> -mm, so by the time .1 is released we'll have it in .25 and can backport
> it. Most folks/distros will update to 2.6.24.1 very quickly so there's
> no risk of months loss of quality to kerneloops.org data either.
>

Using the same logic, why not put it in 2.6.24 and then remove it in 2.6.24.1
if it's broken?

2008-01-14 10:33:45

by Ingo Molnar

Subject: Re: Fix x86 32 bit FRAME_POINTER chasing code


* Chuck Ebbert <[email protected]> wrote:

> > it's a 2.6.24.1 candidate i believe. We trigger plenty of various
> > crashes during x86.git maintenance and others hit various crashes in
> > -mm, so by the time .1 is released we'll have it in .25 and can
> > backport it. Most folks/distros will update to 2.6.24.1 very quickly
> > so there's no risk of months loss of quality to kerneloops.org data
> > either.
>
> Using the same logic, why not put it in 2.6.24 and then remove it in
> 2.6.24.1 if it's broken?

hm, did you understand my .25 reference to mean 2.6.25-final? I meant
2.6.25-rc1. Or if there's a 2.6.24-rc8 planned then we could put it in
right now [with the bisectability fixes, i.e. not the original series].
I.e. IMO what we shouldn't do is to put in these fixes without having had
_some_ -rc release in between.

Ingo