Right... [adding bunch of people to CC],
On Wed, 2011-05-25 at 15:50 +0100, Catalin Marinas wrote:
> 2011/5/25 Måns Rullgård <[email protected]>:
> > Dave Martin <[email protected]> writes:
> >
> >> On Wed, May 25, 2011 at 12:14:08PM +0100, Catalin Marinas wrote:
> >>> On Tue, May 24, 2011 at 06:13:31PM +0100, Dave Martin wrote:
> >>> > On Tue, May 24, 2011 at 04:26:35PM +0100, Catalin Marinas wrote:
> >>> > > BTW, are we sure that the code generated with unaligned accesses is
> >>> > > better? AFAIK, while processors support unaligned accesses, they may
> >>> > > not always be optimal.
> >>> >
> >>> > The code gcc generates to synthesise an unaligned access using aligned
> >>> > accesses is pretty simplistic:
> >>> ...
> >>> > For code which natively needs to read unaligned fields from data structures,
> >>> > I sincerely doubt that the CPU will not beat the above code for efficiency...
> >>> >
> >>> > So if there's code doing unaligned access to data structures for a good
> >>> > reason, building with unaligned access support turned on in the compiler
> >>> > seems a good idea, if that code might performance-critical for anything.
> >>>
> >>> gcc generates unaligned accesses in the the pcpu_dump_alloc_info()
> >>> function. We have a local variable like below (9 bytes):
> >>>
> >>> char empty_str[] = "--------";
> >>>
> >>> and it looks like other stack accesses are unaligned:
> >>>
> >>> c0082ba0 <pcpu_dump_alloc_info>:
> >>> c0082ba0: e92d4ff0 push {r4, r5, r6, r7, r8, r9, sl, fp, lr}
> >>> c0082ba4: e3074118 movw r4, #28952 ; 0x7118
> >>> c0082ba8: e24dd04c sub sp, sp, #76 ; 0x4c
> >>> c0082bac: e34c402a movt r4, #49194 ; 0xc02a
> >>> c0082bb0: e58d1014 str r1, [sp, #20]
> >>> c0082bb4: e58d0020 str r0, [sp, #32]
> >>> c0082bb8: e8b40003 ldm r4!, {r0, r1}
> >>> c0082bbc: e58d003f str r0, [sp, #63] <----------- !!!!!
> >>> c0082bc0: e59d0014 ldr r0, [sp, #20]
> >>> c0082bc4: e5d43000 ldrb r3, [r4]
> >>>
> >>> I haven't tried with -mno-unaligned-access, I suspect the variables on
> >>> the stack would be aligned.
> >>
> >> So, it looks like empty_str may be misaligned on the stack, and the compiler
> >> is doing a misaligned store when initialising it.
> >
> > empty_str has type char[] so there are no alignment requirements.
>
> I think the local variables after char empty_str[] are unaligned (int
> alloc). Changing the array size to 16 solves the issue.
>
> The gcc guys here in ARM will have a look and I'll get back to you.
This issue seems to be caused by passing -fconserve-stack to GCC. This
was added in 8f7f5c9f ("kbuild: set -fconserve-stack option for gcc
4.5") and as you can see from the archive:
http://lkml.org/lkml/2009/9/20/39
it was thought to only have an impact on inlining decisions. Looking at
the documentation for GCC 4.6:
-fconserve-stack
Attempt to minimize stack usage. The compiler will attempt to
use less stack space, even if that makes the program slower. This option
implies setting the ‘large-stack-frame’ parameter to 100 and the
‘large-stack-frame-growth’ parameter to 400.
So it sounds like we might not want to enable this blindly across all
architectures. Indeed, on ARM, it encourages the compiler to pack
variables on the stack which leads to the weird and wonderful alignment
situation that has been encountered in this thread.
Can we remove -fconserve-stack from the top-level Makefile (or at least
make it conditional by architecture)?
Will
Will Deacon <[email protected]> writes:
> This issue seems to be caused by passing -fconserve-stack to GCC. This
> was added in 8f7f5c9f ("kbuild: set -fconserve-stack option for gcc
> 4.5") and as you can see from the archive:
>
> http://lkml.org/lkml/2009/9/20/39
>
> it was thought to only have an impact on inlining decisions. Looking at
> the documentation for GCC 4.6:
>
> -fconserve-stack
> Attempt to minimize stack usage. The compiler will attempt to
> use less stack space, even if that makes the program slower. This option
> implies setting the ‘large-stack-frame’ parameter to 100 and the
> ‘large-stack-frame-growth’ parameter to 400.
>
> So it sounds like we might not want to enable this blindly across all
> architectures. Indeed, on ARM, it encourages the compiler to pack
> variables on the stack which leads to the weird and wonderful alignment
> situation that has been encountered in this thread.
>
> Can we remove -fconserve-stack from the top-level Makefile (or at least
> make it conditional by architecture)?
Sounds like a good idea to me.
--
Måns Rullgård
[email protected]
> So it sounds like we might not want to enable this blindly across all
> architectures. Indeed, on ARM, it encourages the compiler to pack
> variables on the stack which leads to the weird and wonderful alignment
> situation that has been encountered in this thread.
Interesting. I'm pretty sure it didn't do that when I added the flag.
Anyways making it a CONFIG is fine for me. Just don't set it on ARM. It
should be set on x86
at least.
-Andi
On Thu, 26 May 2011, Will Deacon wrote:
> This issue seems to be caused by passing -fconserve-stack to GCC. This
> was added in 8f7f5c9f ("kbuild: set -fconserve-stack option for gcc
> 4.5") and as you can see from the archive:
>
> http://lkml.org/lkml/2009/9/20/39
>
> it was thought to only have an impact on inlining decisions. Looking at
> the documentation for GCC 4.6:
>
> -fconserve-stack
> Attempt to minimize stack usage. The compiler will attempt to
> use less stack space, even if that makes the program slower. This option
> implies setting the ‘large-stack-frame’ parameter to 100 and the
> ‘large-stack-frame-growth’ parameter to 400.
>
> So it sounds like we might not want to enable this blindly across all
> architectures. Indeed, on ARM, it encourages the compiler to pack
> variables on the stack which leads to the weird and wonderful alignment
> situation that has been encountered in this thread.
>
> Can we remove -fconserve-stack from the top-level Makefile (or at least
> make it conditional by architecture)?
I think this is an orthogonal issue.
My opinion is that we should use -mno-unaligned-access by default on
ARM. The reason is that we've been expecting the compiler not to cause
unaligned accesses for ages, and letting the compiler, for whatever
reasons including things like -fconserve-stack, produce unaligned
accesses behind our back is a change in behavior we might not always be
prepared for. Unaligned accesses in the kernel should be rare anyway,
and allowing the compiler to generate them can be allowed for selected
files when proven beneficial.
It is possible that -fconserve-stack is still valuable on ARM given that
it is also used with -mno-unaligned-access for other things than
structure packing on the stack, and therefore its merits can be debated
independently from the alignment issue at hand.
Nicolas
> It is possible that -fconserve-stack is still valuable on ARM given that
> it is also used with -mno-unaligned-access for other things than
> structure packing on the stack, and therefore its merits can be debated
> independently from the alignment issue at hand.
The big advantage of -fconserve-stack is that it throttles the inliner
if the inlining
would cause too much stack growth. This is something you likely want
on ARM too, especially as code gets more and more complex.
-Andi
Andi Kleen <[email protected]> writes:
>> It is possible that -fconserve-stack is still valuable on ARM given that
>> it is also used with -mno-unaligned-access for other things than
>> structure packing on the stack, and therefore its merits can be debated
>> independently from the alignment issue at hand.
>
> The big advantage of -fconserve-stack is that it throttles the inliner
> if the inlining would cause too much stack growth. This is something
> you likely want on ARM too, especially as code gets more and more
> complex.
Is there no way to get that effect without also activating the
aggressive packing?
--
M?ns Rullg?rd
[email protected]
On Thu, May 26, 2011 at 05:03:39PM -0400, Nicolas Pitre wrote:
> It is possible that -fconserve-stack is still valuable on ARM given that
> it is also used with -mno-unaligned-access for other things than
> structure packing on the stack, and therefore its merits can be debated
> independently from the alignment issue at hand.
Catalin said in his mail "I haven't tried with -mno-unaligned-access, I
suspect the variables on the stack would be aligned.". So I don't think
we know enough to say whether -mno-unaligned-access avoids the stack
packing.
> Catalin said in his mail "I haven't tried with -mno-unaligned-access, I
> suspect the variables on the stack would be aligned.". So I don't think
> we know enough to say whether -mno-unaligned-access avoids the stack
> packing.
It won't, the arm gcc code just checks flag_conserve_stack.
IMHO it's just a gcc bug. You should report it to
http://gcc.gnu.org/bugzilla
As a temporary workaround you can disable it in the kernel too, but as soon
as the compiler it's fixed I would recommend considering to reenable it.
-Andi
On Thu, May 26, 2011 at 10:51:01PM +0100, Russell King - ARM Linux wrote:
> On Thu, May 26, 2011 at 05:03:39PM -0400, Nicolas Pitre wrote:
> > It is possible that -fconserve-stack is still valuable on ARM given that
> > it is also used with -mno-unaligned-access for other things than
> > structure packing on the stack, and therefore its merits can be debated
> > independently from the alignment issue at hand.
>
> Catalin said in his mail "I haven't tried with -mno-unaligned-access, I
> suspect the variables on the stack would be aligned.". So I don't think
> we know enough to say whether -mno-unaligned-access avoids the stack
> packing.
OK, I tried this now:
-fconserve-stack: we get unaligned accesses on the stack because the
newer versions of gcc turned unaligned accesses on by default.
-fconserve-stack -mno-unaligned-access: the stack variables are aligned.
We probably get the benefit of -fconserve-stack as well.
So as per the initial post in this thread, we could have
-mno-unaligned-access on ARM always on (when CONFIG_ALIGNMENT_TRAP). As
Nicolas suggested, we could compile some files with -munaligned-access
(and maybe -fno-conserve-stack).
I raised this with the gcc guys so they are looking into it. But it
really doesn't look like a gcc bug as long as -mno-unaligned-access is
taken into account.
--
Catalin
On Fri, May 27, 2011 at 09:38:08AM +0100, Catalin Marinas wrote:
> OK, I tried this now:
>
> -fconserve-stack: we get unaligned accesses on the stack because the
> newer versions of gcc turned unaligned accesses on by default.
>
> -fconserve-stack -mno-unaligned-access: the stack variables are aligned.
> We probably get the benefit of -fconserve-stack as well.
>
> So as per the initial post in this thread, we could have
> -mno-unaligned-access on ARM always on (when CONFIG_ALIGNMENT_TRAP). As
> Nicolas suggested, we could compile some files with -munaligned-access
> (and maybe -fno-conserve-stack).
>
> I raised this with the gcc guys so they are looking into it. But it
> really doesn't look like a gcc bug as long as -mno-unaligned-access is
> taken into account.
Ok, we need to check one last thing, and that's what the behaviour is
with -mno-unaligned-access and packed structures (such as the ethernet
header). If it makes no difference, then I suggest we always build
with -mno-unaligned-access.
On Fri, May 27, 2011 at 09:54:14AM +0100, Russell King - ARM Linux wrote:
> On Fri, May 27, 2011 at 09:38:08AM +0100, Catalin Marinas wrote:
> > OK, I tried this now:
> >
> > -fconserve-stack: we get unaligned accesses on the stack because the
> > newer versions of gcc turned unaligned accesses on by default.
> >
> > -fconserve-stack -mno-unaligned-access: the stack variables are aligned.
> > We probably get the benefit of -fconserve-stack as well.
> >
> > So as per the initial post in this thread, we could have
> > -mno-unaligned-access on ARM always on (when CONFIG_ALIGNMENT_TRAP). As
> > Nicolas suggested, we could compile some files with -munaligned-access
> > (and maybe -fno-conserve-stack).
> >
> > I raised this with the gcc guys so they are looking into it. But it
> > really doesn't look like a gcc bug as long as -mno-unaligned-access is
> > taken into account.
>
> Ok, we need to check one last thing, and that's what the behaviour is
> with -mno-unaligned-access and packed structures (such as the ethernet
> header). If it makes no difference, then I suggest we always build
> with -mno-unaligned-access.
I tried some simple code below:
struct test {
unsigned char a[6];
unsigned long b;
} __attribute__((packed));
void set(struct test *t, unsigned long v)
{
t->b = v;
}
int main(void)
{
struct test t;
set(&t, 10);
return 0;
}
With -mno-unaligned-access in newer toolchains, the set() function looks
like this (compiled with -march=armv7):
00000000 <set>:
0: e7e7c451 ubfx ip, r1, #8, #8
4: e7e72851 ubfx r2, r1, #16, #8
8: e1a03c21 lsr r3, r1, #24
c: e5c01006 strb r1, [r0, #6]
10: e5c0c007 strb ip, [r0, #7]
14: e5c02008 strb r2, [r0, #8]
18: e5c03009 strb r3, [r0, #9]
1c: e12fff1e bx lr
If I don't pass -mno-unaligned-access later toolchains use unaligned
accesses by default and the set() function is more efficient:
00000000 <set>:
0: e5801006 str r1, [r0, #6]
4: e12fff1e bx lr
The problem is that in addition to that we also get unaligned stack
variables which are not really efficient. Either way we have a drawback
somewhere. We could argue that -fconserve-stack is badly implemented on
ARM.
--
Catalin
On Fri, 2011-05-27 at 10:51 +0100, Catalin Marinas wrote:
> On Fri, May 27, 2011 at 09:54:14AM +0100, Russell King - ARM Linux wrote:
> > On Fri, May 27, 2011 at 09:38:08AM +0100, Catalin Marinas wrote:
> > > OK, I tried this now:
> > >
> > > -fconserve-stack: we get unaligned accesses on the stack because the
> > > newer versions of gcc turned unaligned accesses on by default.
> > >
> > > -fconserve-stack -mno-unaligned-access: the stack variables are aligned.
> > > We probably get the benefit of -fconserve-stack as well.
> > >
> > > So as per the initial post in this thread, we could have
> > > -mno-unaligned-access on ARM always on (when CONFIG_ALIGNMENT_TRAP). As
> > > Nicolas suggested, we could compile some files with -munaligned-access
> > > (and maybe -fno-conserve-stack).
> > >
> > > I raised this with the gcc guys so they are looking into it. But it
> > > really doesn't look like a gcc bug as long as -mno-unaligned-access is
> > > taken into account.
> >
> > Ok, we need to check one last thing, and that's what the behaviour is
> > with -mno-unaligned-access and packed structures (such as the ethernet
> > header). If it makes no difference, then I suggest we always build
> > with -mno-unaligned-access.
>
> I tried some simple code below:
>
> struct test {
> unsigned char a[6];
> unsigned long b;
> } __attribute__((packed));
>
> void set(struct test *t, unsigned long v)
> {
> t->b = v;
> }
>
> int main(void)
> {
> struct test t;
>
> set(&t, 10);
>
> return 0;
> }
>
> With -mno-unaligned-access in newer toolchains, the set() function looks
> like this (compiled with -march=armv7):
>
> 00000000 <set>:
> 0: e7e7c451 ubfx ip, r1, #8, #8
> 4: e7e72851 ubfx r2, r1, #16, #8
> 8: e1a03c21 lsr r3, r1, #24
> c: e5c01006 strb r1, [r0, #6]
> 10: e5c0c007 strb ip, [r0, #7]
> 14: e5c02008 strb r2, [r0, #8]
> 18: e5c03009 strb r3, [r0, #9]
> 1c: e12fff1e bx lr
>
> If I don't pass -mno-unaligned-access later toolchains use unaligned
> accesses by default and the set() function is more efficient:
>
> 00000000 <set>:
> 0: e5801006 str r1, [r0, #6]
> 4: e12fff1e bx lr
For completeness, I tried with "unsigned short b" in the structure above
hoping that the compiler would notice that it is 16-bit aligned.
Unfortunately, it doesn't. Code below with -mno-unaligned-access:
00000000 <set>:
0: e1a03421 lsr r3, r1, #8
4: e5c01006 strb r1, [r0, #6]
8: e5c03007 strb r3, [r0, #7]
c: e12fff1e bx lr
--
Catalin
Hi Andi,
On Thu, 2011-05-26 at 22:10 +0100, Andi Kleen wrote:
>
> > It is possible that -fconserve-stack is still valuable on ARM given that
> > it is also used with -mno-unaligned-access for other things than
> > structure packing on the stack, and therefore its merits can be debated
> > independently from the alignment issue at hand.
>
> The big advantage of -fconserve-stack is that it throttles the inliner
> if the inlining
> would cause too much stack growth. This is something you likely want
> on ARM too, especially as code gets more and more complex.
Do you have any concrete examples of -fconserve-stack giving an overall
win that isn't in the noise? The fact that the GCC documentation
explicitly states that enabling the option can lead to `making the
program slower' does make me question why we're enabling it in the first
place.
>From private conversation, the GCC guys don't seem to think this is a
bug so I'm reluctant to open a bugzilla ticket.
Will
Catalin Marinas <[email protected]> writes:
> On Fri, May 27, 2011 at 09:54:14AM +0100, Russell King - ARM Linux wrote:
>>
>> Ok, we need to check one last thing, and that's what the behaviour is
>> with -mno-unaligned-access and packed structures (such as the ethernet
>> header). If it makes no difference, then I suggest we always build
>> with -mno-unaligned-access.
>
> I tried some simple code below:
>
> struct test {
> unsigned char a[6];
> unsigned long b;
> } __attribute__((packed));
>
> void set(struct test *t, unsigned long v)
> {
> t->b = v;
> }
>
> int main(void)
> {
> struct test t;
>
> set(&t, 10);
>
> return 0;
> }
>
> With -mno-unaligned-access in newer toolchains, the set() function looks
> like this (compiled with -march=armv7):
>
> 00000000 <set>:
> 0: e7e7c451 ubfx ip, r1, #8, #8
> 4: e7e72851 ubfx r2, r1, #16, #8
> 8: e1a03c21 lsr r3, r1, #24
> c: e5c01006 strb r1, [r0, #6]
> 10: e5c0c007 strb ip, [r0, #7]
> 14: e5c02008 strb r2, [r0, #8]
> 18: e5c03009 strb r3, [r0, #9]
> 1c: e12fff1e bx lr
>
> If I don't pass -mno-unaligned-access later toolchains use unaligned
> accesses by default and the set() function is more efficient:
>
> 00000000 <set>:
> 0: e5801006 str r1, [r0, #6]
> 4: e12fff1e bx lr
This is certainly something we should want. Although some people
expressed concerns over introducing unaligned accesses where there were
previously none, I don't see how this could pose a problem as long as we
make sure strict alignment checking is off. Some basic testing of code
paths known to use unaligned accesses should suffice IMO.
> The problem is that in addition to that we also get unaligned stack
> variables which are not really efficient. Either way we have a drawback
> somewhere. We could argue that -fconserve-stack is badly implemented on
> ARM.
Unless someone can demonstrate a clear win from -fconserve-stack, I
think it's pretty obvious that this flag does more harm than good on
ARM, especially in conjunction with unaligned accesses being allowed.
If the stack packing could be disabled while retaining the other
(presumably beneficial) effects of -fconserve-stack, it might be
reconsidered.
--
M?ns Rullg?rd
[email protected]
> Do you have any concrete examples of -fconserve-stack giving an overall
> win that isn't in the noise? The fact that the GCC documentation
> explicitly states that enabling the option can lead to `making the
> program slower' does make me question why we're enabling it in the first
> place.
Because the kernel has a limited stack. We had a few cases in the past where
inlining blew it, especially in large ioctl switch() functions which inlined
lots of others.
On modern gccs it's better because it is smarter about sharing stack
slots in large
functions. This was also worked around with manual noinlines.
But it's still far safer to tell gcc to conserve stack.
I consider the ARM gcc behaviour just a bug. The thing was really only
intended
for the inliner (I asked for it originally)
-Andi
Here's a untested but straight forward patch to disable it.
-Andi
---
Disable -fconserve-stack on ARM
There are reports that -fconserve-stack misaligns variables on the stack.
Disable it for ARM to work around this gcc bug.
Signed-off-by: Andi Kleen <[email protected]>
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 25750bc..0883cff 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -41,6 +41,10 @@ ifeq ($(CONFIG_CC_STACKPROTECTOR),y)
KBUILD_CFLAGS +=-fstack-protector
endif
+# ARM gcc developers unfortunately broke -fconserve-stack. It misaligns
+# variables on the stack
+KBUILD_CFLAGS += $(call cc-option,-fno-conserve-stack)
+
ifeq ($(CONFIG_CPU_BIG_ENDIAN),y)
KBUILD_CPPFLAGS += -mbig-endian
AS += -EB
On Sat, May 28, 2011 at 04:34:17PM +0100, Andi Kleen wrote:
> Disable -fconserve-stack on ARM
>
> There are reports that -fconserve-stack misaligns variables on the stack.
> Disable it for ARM to work around this gcc bug.
>
> Signed-off-by: Andi Kleen <[email protected]>
>
> diff --git a/arch/arm/Makefile b/arch/arm/Makefile
> index 25750bc..0883cff 100644
> --- a/arch/arm/Makefile
> +++ b/arch/arm/Makefile
> @@ -41,6 +41,10 @@ ifeq ($(CONFIG_CC_STACKPROTECTOR),y)
> KBUILD_CFLAGS +=-fstack-protector
> endif
>
> +# ARM gcc developers unfortunately broke -fconserve-stack. It misaligns
> +# variables on the stack
> +KBUILD_CFLAGS += $(call cc-option,-fno-conserve-stack)
Didn't work, on the gcc command line we get -fno-conserve-stack followed
by -fconserve-stack from the top makefile and it seems that the latter
takes priority.
--
Catalin
> Didn't work, on the gcc command line we get -fno-conserve-stack followed
> by -fconserve-stack from the top makefile and it seems that the latter
> takes priority.
Ah ok need to move it up then too in the top level Makefile.
Like this.
-Andi
---
Disable -fconserve-stack on ARM v2
There are reports that -fconserve-stack misaligns variables on the stack.
Disable it for ARM to work around this gcc bug.
v2: Move top level flags definition up
Signed-off-by: Andi Kleen <[email protected]>
diff --git a/Makefile b/Makefile
index 529d93f..08d848f 100644
--- a/Makefile
+++ b/Makefile
@@ -564,6 +564,10 @@ else
KBUILD_CFLAGS += -O2
endif
+# conserve stack if available
+# do this early so that an architecture can override it.
+KBUILD_CFLAGS += $(call cc-option,-fconserve-stack)
+
include $(srctree)/arch/$(SRCARCH)/Makefile
ifneq ($(CONFIG_FRAME_WARN),0)
@@ -629,9 +633,6 @@ KBUILD_CFLAGS += $(call cc-disable-warning, pointer-sign)
# disable invalid "can't wrap" optimizations for signed / pointers
KBUILD_CFLAGS += $(call cc-option,-fno-strict-overflow)
-# conserve stack if available
-KBUILD_CFLAGS += $(call cc-option,-fconserve-stack)
-
# use the deterministic mode of AR if available
KBUILD_ARFLAGS := $(call ar-option,D)
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index f5b2b39..902fec9 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -41,6 +41,10 @@ ifeq ($(CONFIG_CC_STACKPROTECTOR),y)
KBUILD_CFLAGS +=-fstack-protector
endif
+# ARM gcc developers unfortunately broke -fconserve-stack. It misaligns
+# variables on the stack
+KBUILD_CFLAGS += $(call cc-option,-fno-conserve-stack)
+
ifeq ($(CONFIG_CPU_BIG_ENDIAN),y)
KBUILD_CPPFLAGS += -mbig-endian
AS += -EB
On Tue, May 31, 2011 at 07:01:07PM +0100, Andi Kleen wrote:
> Disable -fconserve-stack on ARM v2
>
> There are reports that -fconserve-stack misaligns variables on the stack.
> Disable it for ARM to work around this gcc bug.
>
> v2: Move top level flags definition up
> Signed-off-by: Andi Kleen <[email protected]>
It seems to disable the -fconserve-stack on ARM.
Tested-by: Catalin Marinas <[email protected]>