2009-11-19 21:19:54

by H. Peter Anvin

Subject: Re: BUG: GCC-4.4.x changes the function frame on some functions

Hence a new unconstrained option...

"Jeff Law" <[email protected]> wrote:

>On 11/19/09 12:50, H. Peter Anvin wrote:
>>
>> Calling the profiler immediately at the entry point is clearly the more
>> sane option. It means the ABI is well-defined, stable, and independent
>> of what the actual function contents are. It means that ABI isn't the
>> normal C ABI (the __fentry__ function would have to preserve all
>> registers), but that's fine...
>>
>Note there are targets (even some old x86 variants) that required the
>profiling calls to occur after the prologue. Unfortunately, nobody
>documented *why* that was the case. Sigh.
>
>Jeff



2009-11-19 21:27:52

by Jeff Law

Subject: Re: BUG: GCC-4.4.x changes the function frame on some functions

On 11/19/09 14:14, H. Peter Anvin wrote:
> Hence a new unconstrained option...
>
Not arguing against it, just noting there are targets where after the
prologue mcount is mandated. There are certainly hooks in GCC to do it
both ways and if there's no clear need to use after-prologue on
x86-linux, then before-prologue seems reasonable to me.

It's also the case that aligning stacks on the x86 and the poor code
generated when used with profiling is an interaction I doubt anyone has
looked at until now. The result is definitely ugly and inefficient --
and there's something to be said for cleaning that up and at least
marginally reducing the overhead of profiling.

Having said all that, I don't expect to personally be looking at the
problem, given the list of other codegen issues that need to be looked
at (reload in particular); profiling/stack interactions would be around
87 millionth on my list.

jeff

2009-11-19 22:42:57

by Steven Rostedt

Subject: Re: BUG: GCC-4.4.x changes the function frame on some functions

On Thu, 2009-11-19 at 14:25 -0700, Jeff Law wrote:

> Having said all that, I don't expect to personally be looking at the
> problem, given the list of other codegen issues that need to be looked
> at (reload in particular), profiling/stack interactions would be around
> 87 millionth on my list.

Is there someone else that can look at it?

Or at the very least, could you point us to where that code is, and one
of us tracing folks could take a crack at switching hats to be a
compiler writer (with the obvious prerequisite of drinking a lot of beer
first, or is there a better drug to cope with the pain of writing gcc?).

-- Steve

2009-11-20 00:00:44

by Jeff Law

Subject: Re: BUG: GCC-4.4.x changes the function frame on some functions

On 11/19/09 15:43, Steven Rostedt wrote:
> On Thu, 2009-11-19 at 14:25 -0700, Jeff Law wrote:
>
>
>> Having said all that, I don't expect to personally be looking at the
>> problem, given the list of other codegen issues that need to be looked
>> at (reload in particular), profiling/stack interactions would be around
>> 87 millionth on my list.
>>
> Is there someone else that can look at it?
>
>
Unsure at the moment... Like everyone else, GCC developers are busy and
this probably isn't going to be a high priority item for anyone.


> Or at the very least, could you point us to where that code is, and one
> of us tracing folks could take a crack at switching hats to be a
> compiler writer (with the obvious prerequisite of drinking a lot of beer
> first, or is there a better drug to cope with the pain of writing gcc?).
>
It _might_ be as easy as defining PROFILE_BEFORE_PROLOGUE in
gcc-<someversion>/gcc/config/i386/linux.h & rebuilding GCC.

Based on comments elsewhere, the sun386i support may have used
PROFILE_BEFORE_PROLOGUE in the past and thus the x86 backend may not
need further adjustment. That is obviously the ideal case.
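
A rough sketch of what that experiment might look like, offered as an assumption rather than a tested patch: PROFILE_BEFORE_PROLOGUE is a target macro in GCC's backend configuration headers, and defining it asks GCC to emit the profiling call ahead of the function prologue. Which header is the right place, and whether the i386 backend needs further changes, are exactly the open questions above.

```c
/*
 * Hypothetical fragment for the i386 Linux target header mentioned above.
 * Untested; the x86 backend may need further adjustment before this has
 * the intended effect.
 */
#undef  PROFILE_BEFORE_PROLOGUE
#define PROFILE_BEFORE_PROLOGUE 1
```

GCC would have to be rebuilt after such a change, and the resulting prologues inspected, before drawing any conclusions.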

If that appears to work for your needs, I'll volunteer to test it more
thoroughly and assuming those tests look good shepherd it into the
source tree.

Jeff

2009-11-20 00:37:57

by Thomas Gleixner

Subject: Re: BUG: GCC-4.4.x changes the function frame on some functions

On Thu, 19 Nov 2009, Jeff Law wrote:
> On 11/19/09 15:43, Steven Rostedt wrote:
> > On Thu, 2009-11-19 at 14:25 -0700, Jeff Law wrote:
> >
> >
> > > Having said all that, I don't expect to personally be looking at the
> > > problem, given the list of other codegen issues that need to be looked
> > > at (reload in particular), profiling/stack interactions would be around
> > > 87 millionth on my list.
> > >
> > Is there someone else that can look at it?
> >
> >
> Unsure at the moment... Like everyone else, GCC developers are busy and this
> probably isn't going to be a high priority item for anyone.
>
>
> > Or at the very least, could you point us to where that code is, and one
> > of us tracing folks could take a crack at switching hats to be a
> > compiler writer (with the obvious prerequisite of drinking a lot of beer
> > first, or is there a better drug to cope with the pain of writing gcc?).
> >
> It _might_ be as easy as defining PROFILE_BEFORE_PROLOGUE in
> gcc-<someversion>/gcc/config/i386/linux.h & rebuilding GCC.
>
> Based on comments elsewhere, the sun386i support may have used
> PROFILE_BEFORE_PROLOGUE in the past and thus the x86 backend may not need
> further adjustment. That is obviously the ideal case.
>
> If that appears to work for your needs, I'll volunteer to test it more
> thoroughly and assuming those tests look good shepherd it into the source
> tree.

We definitely want to see that ASAP.

While testing various kernel configs we found out that the problem
comes and goes. Finally I started to compare the gcc command line
options and after some fiddling it turned out that the following
minimal deltas change the code generator behaviour:

Bad: -march=pentium-mmx -Wa,-mtune=generic32
Good: -march=i686 -mtune=generic -Wa,-mtune=generic32
Good: -march=pentium-mmx -mtune=generic -Wa,-mtune=generic32

I'm not supposed to understand the logic behind that, right ?

Thanks,

tglx

2009-11-20 01:01:28

by Linus Torvalds

Subject: Re: BUG: GCC-4.4.x changes the function frame on some functions



On Fri, 20 Nov 2009, Thomas Gleixner wrote:
>
> While testing various kernel configs we found out that the problem
> comes and goes. Finally I started to compare the gcc command line
> options and after some fiddling it turned out that the following
> minimal deltas change the code generator behaviour:
>
> Bad: -march=pentium-mmx -Wa,-mtune=generic32
> Good: -march=i686 -mtune=generic -Wa,-mtune=generic32
> Good: -march=pentium-mmx -mtune=generic -Wa,-mtune=generic32
>
> I'm not supposed to understand the logic behind that, right ?

Are you sure it's just the compiler flags?

There's another configuration portion: the size of the alignment itself.
That's dependent on L1_CACHE_SHIFT, which in turn is taken from the kernel
config CONFIG_X86_L1_CACHE_SHIFT.

Maybe that value matters too - for example maybe gcc will not try to align
the stack if it's big?
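
To make the alignment question concrete, here is a minimal sketch of how a cacheline-aligned type ends up demanding stack alignment when it is used as a local variable. The names `struct sample_entry` and `local_misalignment` are hypothetical, the shift value of 4 is an assumed configuration, and the `__aligned__` attribute is a GCC extension (also accepted by Clang). It does not claim to reproduce the bad prologue; it only shows where the alignment requirement comes from.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value for illustration; the kernel takes it from CONFIG_X86_L1_CACHE_SHIFT. */
#define L1_CACHE_SHIFT 4
#define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT)

/* Hypothetical structure standing in for any cacheline-aligned kernel type. */
struct sample_entry {
	unsigned long count;
	void *timer;
} __attribute__((__aligned__(L1_CACHE_BYTES)));

/*
 * Places an aligned structure on the stack and returns how far its address
 * is from an L1_CACHE_BYTES boundary. The compiler has to arrange the stack
 * frame so that this is always 0.
 */
uintptr_t local_misalignment(void)
{
	struct sample_entry input = { 0, 0 };

	/* The type is padded so that its size is a multiple of its alignment. */
	assert(sizeof(input) % L1_CACHE_BYTES == 0);

	return (uintptr_t)&input & (L1_CACHE_BYTES - 1);
}
```

Changing `L1_CACHE_SHIFT` changes the alignment the compiler must honour for the local, which is why the kernel config value could plausibly influence the generated prologue.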

[ Btw, looking at that, why are X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT
totally unrelated numbers? Very confusing. ]

The compiler flags we use are tied to some of the same choices that choose
the cache shift, so the correlation you found while debugging this would
still hold.

Linus

2009-11-20 01:29:45

by Thomas Gleixner

Subject: Re: BUG: GCC-4.4.x changes the function frame on some functions

On Thu, 19 Nov 2009, Linus Torvalds wrote:
> On Fri, 20 Nov 2009, Thomas Gleixner wrote:
> >
> > While testing various kernel configs we found out that the problem
> > comes and goes. Finally I started to compare the gcc command line
> > options and after some fiddling it turned out that the following
> > minimal deltas change the code generator behaviour:
> >
> > Bad: -march=pentium-mmx -Wa,-mtune=generic32
> > Good: -march=i686 -mtune=generic -Wa,-mtune=generic32
> > Good: -march=pentium-mmx -mtune=generic -Wa,-mtune=generic32
> >
> > I'm not supposed to understand the logic behind that, right ?
>
> Are you sure it's just the compiler flags?

I first captured the command line with V=1 and created a script of
it. Then I changed the -march -mtune options in that script and
compiled just that single file manually w/o changing .config or
invoking the kernel make magic.

The good ones produce:

 650:   55                      push   %ebp
 651:   89 e5                   mov    %esp,%ebp
 653:   83 e4 f0                and    $0xfffffff0,%esp

The bad one:

000005f0 <timer_stats_update_stats>:
 5f0:   57                      push   %edi
 5f1:   8d 7c 24 08             lea    0x8(%esp),%edi
 5f5:   83 e4 f0                and    $0xfffffff0,%esp
 5f8:   ff 77 fc                pushl  -0x4(%edi)
 5fb:   55                      push   %ebp
 5fc:   89 e5                   mov    %esp,%ebp

> There's another configuration portion: the size of the alignment itself.
> That's dependent on L1_CACHE_SHIFT, which in turn is taken from the kernel
> config CONFIG_X86_L1_CACHE_SHIFT.
>
> Maybe that value matters too - for example maybe gcc will not try to align
> the stack if it's big?

That does not change any of the compiler options, but yes it could
have some effect via the various include magics, but all I have seen
so far is linkage.h which should not affect the compiler. And the
manual compile did not change any of this.

> [ Btw, looking at that, why are X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT
> totally unrelated numbers? Very confusing. ]

Agreed.

> The compiler flags we use are tied to some of the same choices that choose
> the cache shift, so the correlation you found while debugging this would
> still hold.

Digging further tomorrow when my brain is more awake.

Thanks,

tglx

2009-11-20 01:34:36

by H. Peter Anvin

Subject: Re: BUG: GCC-4.4.x changes the function frame on some functions

On 11/19/2009 04:59 PM, Linus Torvalds wrote:
>
> [ Btw, looking at that, why are X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT
> totally unrelated numbers? Very confusing. ]
>

Yes, there is another thread to clean up that particular mess; it is
already in -tip:

http://git.kernel.org/tip/350f8f5631922c7848ec4b530c111cb8c2ff7caa

-hpa

2009-11-20 02:16:50

by Thomas Gleixner

Subject: Re: BUG: GCC-4.4.x changes the function frame on some functions

On Fri, 20 Nov 2009, Thomas Gleixner wrote:

> On Thu, 19 Nov 2009, Linus Torvalds wrote:
> > On Fri, 20 Nov 2009, Thomas Gleixner wrote:
> > >
> > > While testing various kernel configs we found out that the problem
> > > comes and goes. Finally I started to compare the gcc command line
> > > options and after some fiddling it turned out that the following
> > > minimal deltas change the code generator behaviour:
> > >
> > > Bad: -march=pentium-mmx -Wa,-mtune=generic32
> > > Good: -march=i686 -mtune=generic -Wa,-mtune=generic32
> > > Good: -march=pentium-mmx -mtune=generic -Wa,-mtune=generic32

Found some more:

Bad: -march=k6 -Wa,-mtune=generic32
Bad: -march=geode -Wa,-mtune=generic32
Bad: -march=c3 -Wa,-mtune=generic32

That seems to be everything which has MMX support but no SSE and is
somehow compatible with pentium-mmx.

Looks like the code generator optimization for those was done after
consuming the secret gcc-shrooms.

Thanks,

tglx

2009-11-20 05:40:00

by Ingo Molnar

Subject: Re: BUG: GCC-4.4.x changes the function frame on some functions


* Linus Torvalds <[email protected]> wrote:

> [ Btw, looking at that, why are X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT
> totally unrelated numbers? Very confusing. ]

incidentally (or maybe not so incidentally) that got fixed yesterday in
-tip - at around the time i triggered that crash:

350f8f5: x86: Eliminate redundant/contradicting cache line size config options

See the full commit below. The config that triggered the crash for me
has:

CONFIG_X86_L1_CACHE_SHIFT=4

so it's 16 bytes - and it's consistent now, which is a new angle. So i
think this explains why it stayed dormant for such a long time - it was
hidden by the cacheline-size config value inconsistencies.
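
For reference, the shift-to-bytes relationship that the commit below relies on can be sketched in a few lines of C. The helper name `cache_bytes_from_shift` is made up for this illustration; the kernel itself expresses the same thing with preprocessor macros.

```c
#include <assert.h>

/* Cache line size in bytes for a given shift value: bytes = 1 << shift. */
unsigned int cache_bytes_from_shift(unsigned int shift)
{
	assert(shift < 32);	/* keep the shift well defined for a 32-bit value */
	return 1u << shift;
}
```

With this, a shift of 4 gives the 16 bytes mentioned above, while the internode defaults of 7 and 12 in the patch correspond to 128 and 4096 bytes.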

Ingo

----------------->
From 350f8f5631922c7848ec4b530c111cb8c2ff7caa Mon Sep 17 00:00:00 2001
From: Jan Beulich <[email protected]>
Date: Fri, 13 Nov 2009 11:54:40 +0000
Subject: [PATCH] x86: Eliminate redundant/contradicting cache line size config options

Rather than having X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT
(with inconsistent defaults), just having the latter suffices as
the former can be easily calculated from it.

To be consistent, also change X86_INTERNODE_CACHE_BYTES to
X86_INTERNODE_CACHE_SHIFT, and set it to 7 (128 bytes) for NUMA
to account for last level cache line size (which here matters
more than L1 cache line size).

Finally, make sure the default value for X86_L1_CACHE_SHIFT,
when X86_GENERIC is selected, is being seen before that for the
individual CPU model options (other than on x86-64, where
GENERIC_CPU is part of the choice construct, X86_GENERIC is a
separate option on ix86).

Signed-off-by: Jan Beulich <[email protected]>
Acked-by: Ravikiran Thirumalai <[email protected]>
Acked-by: Nick Piggin <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/Kconfig.cpu | 14 +++++---------
arch/x86/boot/compressed/vmlinux.lds.S | 3 ++-
arch/x86/include/asm/cache.h | 7 ++++---
arch/x86/kernel/vmlinux.lds.S | 10 +++++-----
arch/x86/mm/tlb.c | 3 ++-
5 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index f2824fb..621f2bd 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -301,15 +301,11 @@ config X86_CPU

#
# Define implied options from the CPU selection here
-config X86_L1_CACHE_BYTES
+config X86_INTERNODE_CACHE_SHIFT
int
- default "128" if MPSC
- default "64" if GENERIC_CPU || MK8 || MCORE2 || MATOM || X86_32
-
-config X86_INTERNODE_CACHE_BYTES
- int
- default "4096" if X86_VSMP
- default X86_L1_CACHE_BYTES if !X86_VSMP
+ default "12" if X86_VSMP
+ default "7" if NUMA
+ default X86_L1_CACHE_SHIFT

config X86_CMPXCHG
def_bool X86_64 || (X86_32 && !M386)
@@ -317,9 +313,9 @@ config X86_CMPXCHG
config X86_L1_CACHE_SHIFT
int
default "7" if MPENTIUM4 || MPSC
+ default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MATOM || MVIAC7 || X86_GENERIC || GENERIC_CPU
default "4" if X86_ELAN || M486 || M386 || MGEODEGX1
default "5" if MWINCHIP3D || MWINCHIPC6 || MCRUSOE || MEFFICEON || MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || M586 || MVIAC3_2 || MGEODE_LX
- default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MATOM || MVIAC7 || X86_GENERIC || GENERIC_CPU

config X86_XADD
def_bool y
diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index f4193bb..a6f1a59 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -4,6 +4,7 @@ OUTPUT_FORMAT(CONFIG_OUTPUT_FORMAT, CONFIG_OUTPUT_FORMAT, CONFIG_OUTPUT_FORMAT)

#undef i386

+#include <asm/cache.h>
#include <asm/page_types.h>

#ifdef CONFIG_X86_64
@@ -46,7 +47,7 @@ SECTIONS
*(.data.*)
_edata = . ;
}
- . = ALIGN(CONFIG_X86_L1_CACHE_BYTES);
+ . = ALIGN(L1_CACHE_BYTES);
.bss : {
_bss = . ;
*(.bss)
diff --git a/arch/x86/include/asm/cache.h b/arch/x86/include/asm/cache.h
index 549860d..2f9047c 100644
--- a/arch/x86/include/asm/cache.h
+++ b/arch/x86/include/asm/cache.h
@@ -9,12 +9,13 @@

#define __read_mostly __attribute__((__section__(".data.read_mostly")))

+#define INTERNODE_CACHE_SHIFT CONFIG_X86_INTERNODE_CACHE_SHIFT
+#define INTERNODE_CACHE_BYTES (1 << INTERNODE_CACHE_SHIFT)
+
#ifdef CONFIG_X86_VSMP
-/* vSMP Internode cacheline shift */
-#define INTERNODE_CACHE_SHIFT (12)
#ifdef CONFIG_SMP
#define __cacheline_aligned_in_smp \
- __attribute__((__aligned__(1 << (INTERNODE_CACHE_SHIFT)))) \
+ __attribute__((__aligned__(INTERNODE_CACHE_BYTES))) \
__page_aligned_data
#endif
#endif
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index fd2dabe..eeb4f5f 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -135,13 +135,13 @@ SECTIONS

PAGE_ALIGNED_DATA(PAGE_SIZE)

- CACHELINE_ALIGNED_DATA(CONFIG_X86_L1_CACHE_BYTES)
+ CACHELINE_ALIGNED_DATA(L1_CACHE_BYTES)

DATA_DATA
CONSTRUCTORS

/* rarely changed data like cpu maps */
- READ_MOSTLY_DATA(CONFIG_X86_INTERNODE_CACHE_BYTES)
+ READ_MOSTLY_DATA(INTERNODE_CACHE_BYTES)

/* End of data section */
_edata = .;
@@ -165,12 +165,12 @@ SECTIONS
*(.vsyscall_0)
} :user

- . = ALIGN(CONFIG_X86_L1_CACHE_BYTES);
+ . = ALIGN(L1_CACHE_BYTES);
.vsyscall_fn : AT(VLOAD(.vsyscall_fn)) {
*(.vsyscall_fn)
}

- . = ALIGN(CONFIG_X86_L1_CACHE_BYTES);
+ . = ALIGN(L1_CACHE_BYTES);
.vsyscall_gtod_data : AT(VLOAD(.vsyscall_gtod_data)) {
*(.vsyscall_gtod_data)
}
@@ -194,7 +194,7 @@ SECTIONS
}
vgetcpu_mode = VVIRT(.vgetcpu_mode);

- . = ALIGN(CONFIG_X86_L1_CACHE_BYTES);
+ . = ALIGN(L1_CACHE_BYTES);
.jiffies : AT(VLOAD(.jiffies)) {
*(.jiffies)
}
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 36fe08e..65b58e4 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -8,6 +8,7 @@

#include <asm/tlbflush.h>
#include <asm/mmu_context.h>
+#include <asm/cache.h>
#include <asm/apic.h>
#include <asm/uv/uv.h>

@@ -43,7 +44,7 @@ union smp_flush_state {
spinlock_t tlbstate_lock;
DECLARE_BITMAP(flush_cpumask, NR_CPUS);
};
- char pad[CONFIG_X86_INTERNODE_CACHE_BYTES];
+ char pad[INTERNODE_CACHE_BYTES];
} ____cacheline_internodealigned_in_smp;

/* State is put into the per CPU data section, but padded

2009-11-20 12:06:31

by Andrew Haley

Subject: Re: BUG: GCC-4.4.x changes the function frame on some functions

Thomas Gleixner wrote:

> While testing various kernel configs we found out that the problem
> comes and goes. Finally I started to compare the gcc command line
> options and after some fiddling it turned out that the following
> minimal deltas change the code generator behaviour:
>
> Bad: -march=pentium-mmx -Wa,-mtune=generic32
> Good: -march=i686 -mtune=generic -Wa,-mtune=generic32
> Good: -march=pentium-mmx -mtune=generic -Wa,-mtune=generic32
>
> I'm not supposed to understand the logic behind that, right ?

I don't either. I'm seeing:

timer_stats_update_stats:               timer_stats_update_stats:
        pushl   %edi                  <
        leal    8(%esp), %edi         <
        andl    $-16, %esp            <
        pushl   -4(%edi)              <
        pushl   %ebp                            pushl   %ebp
        movl    %esp, %ebp                      movl    %esp, %ebp
        pushl   %edi                  |         andl    $-16, %esp
        pushl   %esi                  |         subl    $112, %esp
        pushl   %ebx                  |         movl    %ebx, 100(%esp)
        subl    $108, %esp            |         movl    %esi, 104(%esp)
                                      >         movl    %edi, 108(%esp)
        call    mcount                          call    mcount
where the only difference is -mtune=generic. I'm investigating.

Andrew.

2009-11-20 12:24:45

by Andrew Haley

Subject: Re: BUG: GCC-4.4.x changes the function frame on some functions

Andrew Haley wrote:
> Thomas Gleixner wrote:
>
>> While testing various kernel configs we found out that the problem
>> comes and goes. Finally I started to compare the gcc command line
>> options and after some fiddling it turned out that the following
>> minimal deltas change the code generator behaviour:
>>
>> Bad: -march=pentium-mmx -Wa,-mtune=generic32
>> Good: -march=i686 -mtune=generic -Wa,-mtune=generic32
>> Good: -march=pentium-mmx -mtune=generic -Wa,-mtune=generic32
>>
>> I'm not supposed to understand the logic behind that, right ?
>
> I don't either. I'm seeing:
>
> timer_stats_update_stats:               timer_stats_update_stats:
>         pushl   %edi                  <
>         leal    8(%esp), %edi         <
>         andl    $-16, %esp            <
>         pushl   -4(%edi)              <
>         pushl   %ebp                            pushl   %ebp
>         movl    %esp, %ebp                      movl    %esp, %ebp
>         pushl   %edi                  |         andl    $-16, %esp
>         pushl   %esi                  |         subl    $112, %esp
>         pushl   %ebx                  |         movl    %ebx, 100(%esp)
>         subl    $108, %esp            |         movl    %esi, 104(%esp)
>                                       >         movl    %edi, 108(%esp)
>         call    mcount                          call    mcount
>
> where the only difference is -mtune=generic. I'm investigating.

Forget that, I see from the gcc-bugs list that hj has tracked it down to
the use of DRAP, and for some reason the mtune options affect that. He's
the best person to fix this.

Andrew.

2009-11-20 13:10:30

by Thomas Gleixner

Subject: [tip:x86/urgent] x86: Prevent GCC 4.4.x (pentium-mmx et al) function prologue wreckage

Commit-ID: 746357d6a526d6da9d89a2ec645b28406e959c2e
Gitweb: http://git.kernel.org/tip/746357d6a526d6da9d89a2ec645b28406e959c2e
Author: Thomas Gleixner <[email protected]>
AuthorDate: Fri, 20 Nov 2009 12:01:43 +0100
Committer: Thomas Gleixner <[email protected]>
CommitDate: Fri, 20 Nov 2009 14:06:46 +0100

x86: Prevent GCC 4.4.x (pentium-mmx et al) function prologue wreckage

When the kernel is compiled with -pg for tracing, GCC 4.4.x inserts
the stack alignment of a function _before_ the mcount prologue if
-march=pentium-mmx is set and -mtune=generic is not set. This breaks
the assumption of the function graph tracer, which expects that the
mcount prologue

     push   %ebp
     mov    %esp,%ebp

is the first stack operation in a function because it needs to modify
the function return address on the stack to trap into the tracer
before returning to the real caller.

The generated code is:

     push   %edi
     lea    0x8(%esp),%edi
     and    $0xfffffff0,%esp
     pushl  -0x4(%edi)
     push   %ebp
     mov    %esp,%ebp

so the tracer modifies the copy of the return address which is stored
after the stack alignment and therefore does not trap the return, which
in turn breaks the call chain logic of the tracer and leads to a
kernel panic.

Aside from the fact that the generated code is horrible for no good
reason, other -march -mtune options generate the expected:

     push   %ebp
     mov    %esp,%ebp
     and    $0xfffffff0,%esp

which does the same and keeps everything intact.
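
The difference between the two prologues can be illustrated with a toy stack model in C. This is only a sketch under stated assumptions: the `struct cpu` model, the made-up addresses `RETURN_ADDR` and `TRACER_HOOK`, and all function names are hypothetical, and the tracer is assumed to locate the return address at 4(%ebp). It models the instruction sequences quoted above, not the kernel's actual mcount code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 256u
#define RETURN_ADDR 0x1000u	/* made-up address in the caller */
#define TRACER_HOOK 0x2000u	/* made-up stand-in for the tracer's return hook */

/* Toy model: a small stack plus the three registers the prologues touch. */
struct cpu {
	uint32_t mem[STACK_BYTES / 4];
	uint32_t esp, ebp, edi;
};

static void push(struct cpu *c, uint32_t val)
{
	assert(c->esp >= 4);
	c->esp -= 4;
	c->mem[c->esp / 4] = val;
}

/* Model "call func": push the return address and report the slot holding it. */
static uint32_t call_entry(struct cpu *c)
{
	memset(c, 0, sizeof(*c));
	c->esp = 212;
	push(c, RETURN_ADDR);
	return c->esp;
}

/* Expected prologue: frame setup first, alignment afterwards. */
static void prologue_expected(struct cpu *c)
{
	push(c, c->ebp);			/* push %ebp */
	c->ebp = c->esp;			/* mov  %esp,%ebp */
	c->esp &= 0xfffffff0u;			/* and  $0xfffffff0,%esp */
}

/* Prologue from the report: alignment first, then a copied return address. */
static void prologue_realigned(struct cpu *c)
{
	push(c, c->edi);			/* push  %edi */
	c->edi = c->esp + 8;			/* lea   0x8(%esp),%edi */
	c->esp &= 0xfffffff0u;			/* and   $0xfffffff0,%esp */
	push(c, c->mem[(c->edi - 4) / 4]);	/* pushl -0x4(%edi) */
	push(c, c->ebp);			/* push  %ebp */
	c->ebp = c->esp;			/* mov   %esp,%ebp */
}

/* Assumed tracer behaviour: overwrite the word found at 4(%ebp). */
static void tracer_patch(struct cpu *c)
{
	c->mem[(c->ebp + 4) / 4] = TRACER_HOOK;
}

/* Returns 1 if the slot the function really returns through was hooked. */
int return_is_hooked(int use_realigned_prologue)
{
	struct cpu c;
	uint32_t real_slot = call_entry(&c);

	if (use_realigned_prologue)
		prologue_realigned(&c);
	else
		prologue_expected(&c);
	tracer_patch(&c);
	return c.mem[real_slot / 4] == TRACER_HOOK;
}
```

In this model `return_is_hooked(0)` reports that the real return slot was patched, while `return_is_hooked(1)` reports that only the copy was patched and the real return address was left untouched, which matches the failure described above.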

After some experimenting we found out that this problem is restricted
to gcc4.4.x and to the following -march settings:

i586, pentium, pentium-mmx, k6, k6-2, k6-3, winchip-c6, winchip2, c3,
geode

With -mtune=generic added, the code generator always produces the
expected code.

So forcing -mtune=generic when CONFIG_FUNCTION_GRAPH_TRACER=y is not
pretty, but at the moment it is the only way to prevent the kernel
from tripping over gcc-shrooms induced code madness.

Most distro kernels have CONFIG_X86_GENERIC=y anyway which forces
-mtune=generic as well so it will not impact those.

References: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109
http://lkml.org/lkml/2009/11/19/17

Signed-off-by: Thomas Gleixner <[email protected]>
LKML-Reference: <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jeff Law <[email protected]>
Cc: [email protected]
Cc: David Daney <[email protected]>
Cc: Andrew Haley <[email protected]>
Cc: Richard Guenther <[email protected]>
Cc: [email protected]
---
arch/x86/Makefile_32.cpu | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/x86/Makefile_32.cpu b/arch/x86/Makefile_32.cpu
index 30e9a26..df7fdf8 100644
--- a/arch/x86/Makefile_32.cpu
+++ b/arch/x86/Makefile_32.cpu
@@ -46,6 +46,12 @@ cflags-$(CONFIG_MGEODEGX1) += -march=pentium-mmx
# cpu entries
cflags-$(CONFIG_X86_GENERIC) += $(call tune,generic,$(call tune,i686))

+# Work around the pentium-mmx code generator madness of gcc4.4.x which
+# does stack alignment by generating horrible code _before_ the mcount
+# prologue (push %ebp, mov %esp, %ebp) which breaks the function graph
+# tracer assumptions
+cflags-$(CONFIG_FUNCTION_GRAPH_TRACER) += $(call cc-option,-mtune=generic)
+
# Bug fix for binutils: this option is required in order to keep
# binutils from generating NOPL instructions against our will.
ifneq ($(CONFIG_X86_P6_NOP),y)