2008-07-21 17:41:48

by Hugh Dickins

[permalink] [raw]
Subject: [PATCH] x86: BUILD_IRQ say .text to avoid .data.percpu

When I edit the x86_64 Makefile to -fno-unit-at-a-time, bootup panics
on 0xCCs in IRQ0x3e_interrupt(): IRQ0x20_interrupt etc. have got linked
into .data.percpu. Perhaps there are other ways of triggering that:
specify ".text" in the BUILD_IRQ() macro for safety.

Signed-off-by: Hugh Dickins <[email protected]>
---
I've been using -fno-unit-at-a-time (to lessen inlining, for easier
debugging) for a long time, but never saw this until Mike's percpu mods
came in: I mention this so you're on the lookout, just in case other
things are more likely to go into the wrong section now. (I did give
Mike a private headsup on this a couple of weeks ago, in case it helped
with problems he was having with percpu, but in fact it didn't help.)

arch/x86/kernel/irqinit_64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- 2.6.26-git/arch/x86/kernel/irqinit_64.c 2008-07-18 11:33:31.000000000 +0100
+++ linux/arch/x86/kernel/irqinit_64.c 2008-07-18 16:07:00.000000000 +0100
@@ -43,7 +43,7 @@

#define BUILD_IRQ(nr) \
asmlinkage void IRQ_NAME(nr); \
- asm("\n.p2align\n" \
+ asm("\n.text\n.p2align\n" \
"IRQ" #nr "_interrupt:\n\t" \
"push $~(" #nr ") ; " \
"jmp common_interrupt");


2008-07-21 23:25:49

by Mike Travis

[permalink] [raw]
Subject: Re: [PATCH] x86: BUILD_IRQ say .text to avoid .data.percpu

Hugh Dickins wrote:
> When I edit the x86_64 Makefile to -fno-unit-at-a-time, bootup panics
> on 0xCCs in IRQ0x3e_interrupt(): IRQ0x20_interrupt etc. have got linked
> into .data.percpu. Perhaps there are other ways of triggering that:
> specify ".text" in the BUILD_IRQ() macro for safety.
>
> Signed-off-by: Hugh Dickins <[email protected]>
> ---
> I've been using -fno-unit-at-a-time (to lessen inlining, for easier
> debugging) for a long time, but never saw this until Mike's percpu mods
> came in: I mention this so you're on the lookout, just in case other
> things are more likely to go into the wrong section now. (I did give
> Mike a private headsup on this a couple of weeks ago, in case it helped
> with problems he was having with percpu, but in fact it didn't help.)
>
> arch/x86/kernel/irqinit_64.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- 2.6.26-git/arch/x86/kernel/irqinit_64.c 2008-07-18 11:33:31.000000000 +0100
> +++ linux/arch/x86/kernel/irqinit_64.c 2008-07-18 16:07:00.000000000 +0100
> @@ -43,7 +43,7 @@
>
> #define BUILD_IRQ(nr) \
> asmlinkage void IRQ_NAME(nr); \
> - asm("\n.p2align\n" \
> + asm("\n.text\n.p2align\n" \
> "IRQ" #nr "_interrupt:\n\t" \
> "push $~(" #nr ") ; " \
> "jmp common_interrupt");


Thanks Hugh. Btw, which GCC are you using?

-Mike

2008-07-22 03:49:39

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH] x86: BUILD_IRQ say .text to avoid .data.percpu

On Mon, 21 Jul 2008, Mike Travis wrote:
> Hugh Dickins wrote:
> > When I edit the x86_64 Makefile to -fno-unit-at-a-time, bootup panics
> > on 0xCCs in IRQ0x3e_interrupt(): IRQ0x20_interrupt etc. have got linked
> > into .data.percpu. Perhaps there are other ways of triggering that:
> > specify ".text" in the BUILD_IRQ() macro for safety.
>
> Thanks Hugh. Btw, which GCC are you using?

openSUSE 10.3's gcc (GCC) 4.2.1 (SUSE Linux)
along with their binutils-2.17.50.20070726-14

Hugh

2008-07-24 10:45:31

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86: BUILD_IRQ say .text to avoid .data.percpu


* Hugh Dickins <[email protected]> wrote:

> When I edit the x86_64 Makefile to -fno-unit-at-a-time, bootup panics
> on 0xCCs in IRQ0x3e_interrupt(): IRQ0x20_interrupt etc. have got linked
> into .data.percpu. Perhaps there are other ways of triggering that:
> specify ".text" in the BUILD_IRQ() macro for safety.
>
> Signed-off-by: Hugh Dickins <[email protected]>

applied to tip/x86/urgent, thanks Hugh.

> I've been using -fno-unit-at-a-time (to lessen inlining, for easier
> debugging) for a long time, but never saw this until Mike's percpu
> mods came in: I mention this so you're on the lookout, just in case
> other things are more likely to go into the wrong section now. (I did
> give Mike a private headsup on this a couple of weeks ago, in case it
> helped with problems he was having with percpu, but in fact it didn't
> help.)

Should we perhaps enable this automatically on CONFIG_FRAME_POINTER=y
builds? Although a separate, default-off config option might be better,
i'd not be surprised if there were more regressions in this area, it's a
seldom used build vector.

Ingo

2008-07-25 18:45:43

by Hugh Dickins

[permalink] [raw]
Subject: Re: CONFIG_FRAME_POINTER [was [PATCH] x86: BUILD_IRQ say .text]

On Thu, 24 Jul 2008, Ingo Molnar wrote:
> * Hugh Dickins <[email protected]> wrote:
>
> > I've been using -fno-unit-at-a-time (to lessen inlining, for easier
> > debugging) for a long time
>
> Should we perhaps enable this automatically on CONFIG_FRAME_POINTER=y
> builds? Although a separate, default-off config option might be better,

I'm content to go on patching the Makefile to suit my own private debug
proclivities, somewhat reluctant to foist them upon others. But I must
have found no-unit-at-a-time made things clearer at some point,
and assume it is still being useful.

Whether it deserves its own config option, hmm, not sure of that.
But agree it shouldn't really go into CONFIG_FRAME_POINTER=y
(though notice that already bundles no-optimize-sibling-calls).

I rather think CONFIG_FRAME_POINTER shouldn't exist at all (or be
a private, config-user-invisible, specific-to-a-few-arches thing):
what one wants to configure is how far to sacrifice cpu performance
and kernel smallness to getting a good stacktrace. Frame pointer
is just an implementation detail on that, appropriate to some arches.
Perhaps three settings: no stacktrace, fair stacktrace, best stacktrace.

I've Cc'ed Ben and linuxppc-dev because I wonder if they're aware
that several options (I got it from LATENCYTOP, but I think LOCKDEP
and FTRACE and some others) are doing a "select FRAME_POINTER",
which forces CONFIG_FRAME_POINTER=y on PowerPC, even though
FRAME_POINTER is not an option offered on PowerPC. The
resulting kernels appear to run okay, but I was surprised.

> i'd not be surprised if there were more regressions in this area, it's a
> seldom used build vector.

There'd probably be a lot more if we weren't (recently?) supporting
compilers which only manage no-unit-at-a-time. The only issue I get
in my kernel builds is with sched_clock() - but I've not been trying
allyesconfig or randconfig. Here's a patch: but whereas the BUILD_IRQ
one seemed worth sending because the problem might crop up in other
ways and waste someone's time, this one just depends on your taste.


[PATCH] sched: move sched_clock before first use

Move sched_clock() up to stop warning: weak declaration of `sched_clock'
after first use results in unspecified behavior (if -fno-unit-at-a-time).

Signed-off-by: Hugh Dickins <[email protected]>
---

kernel/sched_clock.c | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)

--- 2.6.26-git/kernel/sched_clock.c 2008-07-18 11:33:34.000000000 +0100
+++ linux/kernel/sched_clock.c 2008-07-18 11:57:46.000000000 +0100
@@ -32,6 +32,15 @@
#include <linux/ktime.h>
#include <linux/module.h>

+/*
+ * Scheduler clock - returns current time in nanosec units.
+ * This is default implementation.
+ * Architectures and sub-architectures can override this.
+ */
+unsigned long long __attribute__((weak)) sched_clock(void)
+{
+ return (unsigned long long)jiffies * (NSEC_PER_SEC / HZ);
+}

#ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK

@@ -321,16 +330,6 @@ EXPORT_SYMBOL_GPL(sched_clock_idle_wakeu

#endif

-/*
- * Scheduler clock - returns current time in nanosec units.
- * This is default implementation.
- * Architectures and sub-architectures can override this.
- */
-unsigned long long __attribute__((weak)) sched_clock(void)
-{
- return (unsigned long long)jiffies * (NSEC_PER_SEC / HZ);
-}
-
unsigned long long cpu_clock(int cpu)
{
unsigned long long clock;

2008-07-25 21:46:54

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: CONFIG_FRAME_POINTER [was [PATCH] x86: BUILD_IRQ say .text]

On Fri, 2008-07-25 at 19:45 +0100, Hugh Dickins wrote:
>
> I've Cc'ed Ben and linuxppc-dev because I wonder if they're aware
> that several options (I got it from LATENCYTOP, but I think LOCKDEP
> and FTRACE and some others) are doing a "select FRAME_POINTER",
> which forces CONFIG_FRAME_POINTER=y on PowerPC, even though
> FRAME_POINTER is not an option offered on PowerPC. The
> resulting kernels appear to run okay, but I was surprised.

Because the option just does nothing for us ? :-) We always have frame
pointers on powerpc except in some case for leaf functions. I don't know
if the option has any actual effect on the later, but I don't think we
have a case where doing either way would break things.

Cheers,
Ben.

2008-07-26 11:02:48

by Hugh Dickins

[permalink] [raw]
Subject: Re: CONFIG_FRAME_POINTER [was [PATCH] x86: BUILD_IRQ say .text]

On Sat, 26 Jul 2008, Benjamin Herrenschmidt wrote:
> On Fri, 2008-07-25 at 19:45 +0100, Hugh Dickins wrote:
> >
> > I've Cc'ed Ben and linuxppc-dev because I wonder if they're aware
> > that several options (I got it from LATENCYTOP, but I think LOCKDEP
> > and FTRACE and some others) are doing a "select FRAME_POINTER",
> > which forces CONFIG_FRAME_POINTER=y on PowerPC, even though
> > FRAME_POINTER is not an option offered on PowerPC. The
> > resulting kernels appear to run okay, but I was surprised.
>
> Because the option just does nothing for us ? :-) We always have frame
> pointers on powerpc except in some case for leaf functions. I don't know
> if the option has any actual effect on the later, but I don't think we
> have a case where doing either way would break things.

Thanks, that's reassuring.

I raised the question partly because I'd noticed CONFIG_FRAME_POINTER=y
does increase the size of powerpc kernels: part of that will have been
because of the -fno-optimize-sibling-calls bundled in, but when I edit
that out of the Makefile I'm left with

text data bss dec hex filename
4773061 856632 232052 5861745 597171 FPN/vmlinux
4943653 856632 232052 6032337 5c0bd1 FPY/vmlinux

Going to the first divergence between them,
the 2.6.26-git6 vmlinux built without CONFIG_FRAME_POINTER has

c000000000008024 <.run_init_process>:
c000000000008024: 7c 08 02 a6 mflr r0
c000000000008028: fb c1 ff f0 std r30,-16(r1)
c00000000000802c: eb c2 80 48 ld r30,-32696(r2)
c000000000008030: f8 01 00 10 std r0,16(r1)
c000000000008034: f8 21 ff 81 stdu r1,-128(r1)
c000000000008038: e9 3e 80 10 ld r9,-32752(r30)
c00000000000803c: f8 69 00 00 std r3,0(r9)
c000000000008040: 7d 24 4b 78 mr r4,r9
c000000000008044: 38 a9 01 10 addi r5,r9,272
c000000000008048: 48 01 70 4d bl c00000000001f094
<.kernel_execve>
c00000000000804c: 60 00 00 00 nop
c000000000008050: 38 21 00 80 addi r1,r1,128
c000000000008054: e8 01 00 10 ld r0,16(r1)
c000000000008058: eb c1 ff f0 ld r30,-16(r1)
c00000000000805c: 7c 08 03 a6 mtlr r0
c000000000008060: 4e 80 00 20 blr

Whereas the vmlinux built with -fno-omit-frame_pointer has

c000000000008024 <.run_init_process>:
c000000000008024: 7c 08 02 a6 mflr r0
c000000000008028: fb c1 ff f0 std r30,-16(r1)
c00000000000802c: eb c2 80 48 ld r30,-32696(r2)
c000000000008030: fb e1 ff f8 std r31,-8(r1)
c000000000008034: f8 01 00 10 std r0,16(r1)
c000000000008038: f8 21 ff 81 stdu r1,-128(r1)
c00000000000803c: e9 3e 80 10 ld r9,-32752(r30)
c000000000008040: f8 69 00 00 std r3,0(r9)
c000000000008044: 7d 24 4b 78 mr r4,r9
c000000000008048: 38 a9 01 10 addi r5,r9,272
c00000000000804c: 7c 3f 0b 78 mr r31,r1
c000000000008050: 48 01 8c 91 bl c000000000020ce0
<.kernel_execve>
c000000000008054: 60 00 00 00 nop
c000000000008058: e8 21 00 00 ld r1,0(r1)
c00000000000805c: e8 01 00 10 ld r0,16(r1)
c000000000008060: eb c1 ff f0 ld r30,-16(r1)
c000000000008064: eb e1 ff f8 ld r31,-8(r1)
c000000000008068: 7c 08 03 a6 mtlr r0
c00000000000806c: 4e 80 00 20 blr

That's for

static void run_init_process(char *init_filename)
{
argv_init[0] = init_filename;
kernel_execve(init_filename, argv_init, envp_init);
}

Hmm, perhaps it is doing sibling calls differently even without the
explicit -fno-optimize-sibling-calls (but when I add that option,
the vmlinux size does go up another 4400).

Sorry, I'm most probably fussing over nothing,
and wasting your time with my ignorance.

Hugh

2008-07-26 12:37:26

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: CONFIG_FRAME_POINTER [was [PATCH] x86: BUILD_IRQ say .text]

On Sat, 2008-07-26 at 12:02 +0100, Hugh Dickins wrote:
>
> Hmm, perhaps it is doing sibling calls differently even without the
> explicit -fno-optimize-sibling-calls (but when I add that option,
> the vmlinux size does go up another 4400).
>
> Sorry, I'm most probably fussing over nothing,
> and wasting your time with my ignorance.

No you aren't, there is indeed something happening. It looks like gcc is
keeping a copy of each stack frame in r31, thus forcing to save/restore
that register, along function calls, possibly to help get reliable
frames for leaf functions. I don't think we use that "feature" in our
backtrace code though... so it won't harm in the sense that it won't
break things, but it will indeed bloat the code a little bit.

Maybe we should totally disable -fno-omit-frame-pointers on powerpc ...
either that or see about actually using that r31 linkage, though I'm not
sure it would be that useful.

I'll have to talk to our toolchain folks to figure out exactly what's
going on there.

Cheers,
Ben.

2008-07-28 14:42:18

by Ingo Molnar

[permalink] [raw]
Subject: Re: CONFIG_FRAME_POINTER [was [PATCH] x86: BUILD_IRQ say .text]


* Hugh Dickins <[email protected]> wrote:

> [PATCH] sched: move sched_clock before first use
>
> Move sched_clock() up to stop warning: weak declaration of
> `sched_clock' after first use results in unspecified behavior (if
> -fno-unit-at-a-time).
>
> Signed-off-by: Hugh Dickins <[email protected]>

applied to tip/sched/urgent - thanks Hugh.

> I rather think CONFIG_FRAME_POINTER shouldn't exist at all (or be a
> private, config-user-invisible, specific-to-a-few-arches thing): what
> one wants to configure is how far to sacrifice cpu performance and
> kernel smallness to getting a good stacktrace. Frame pointer is just
> an implementation detail on that, appropriate to some arches. Perhaps
> three settings: no stacktrace, fair stacktrace, best stacktrace.

actually, we consciously use and rely on frame pointers on x86. The
runtime cost on 64-bit is miniscule and the improved backtrace output in
recent kernels makes backtraces _much_ easier to interpret:

Call Trace:
<NMI> [<ffffffff80480779>] _raw_spin_trylock+0x19/0x50
[<ffffffff808fb2e9>] _spin_lock_irqsave+0x59/0x90
[<ffffffff80261ab4>] atomic_notifier_chain_register+0x24/0x60
[<ffffffff80262f38>] ? __profile_tick+0x58/0x90
[<ffffffff808fd1a9>] nmi_watchdog_tick+0x59/0x1e0
[<ffffffff808fc79a>] default_do_nmi+0x6a/0x220
[<ffffffff808fc9b4>] do_nmi+0x64/0xb0
[<ffffffff808fc032>] nmi+0xa2/0xc2
[<ffffffff80285bd1>] ? stopmachine+0x61/0xd0
<<EOE>> [<ffffffff8020dca9>] child_rip+0xa/0x11
[<ffffffff8020cf3e>] ? restore_args+0x0/0x30
[<ffffffff80285b70>] ? stopmachine+0x0/0xd0
[<ffffffff8020dc9f>] ? child_rip+0x0/0x11

we experimented with using dwarf2 data in the past but it proved to be
very fragile in practice - we depended too much on the whims of
gcc/binutils being absolutely correct, etc.

Something as fundamental to the kernel's general health as backtraces
must not be fragile. So the EBP based backtracing code was ported to
64-bit as well and it was improved further upon.

kudos to Arjan for that.

Ingo

2008-07-28 14:57:14

by Gabriel Paubert

[permalink] [raw]
Subject: Re: CONFIG_FRAME_POINTER [was [PATCH] x86: BUILD_IRQ say .text]

On Sat, Jul 26, 2008 at 10:36:42PM +1000, Benjamin Herrenschmidt wrote:
> On Sat, 2008-07-26 at 12:02 +0100, Hugh Dickins wrote:
> >
> > Hmm, perhaps it is doing sibling calls differently even without the
> > explicit -fno-optimize-sibling-calls (but when I add that option,
> > the vmlinux size does go up another 4400).
> >
> > Sorry, I'm most probably fussing over nothing,
> > and wasting your time with my ignorance.
>
> No you aren't, there is indeed something happening. It looks like gcc is
> keeping a copy of each stack frame in r31, thus forcing to save/restore
> that register, along function calls, possibly to help get reliable
> frames for leaf functions. I don't think we use that "feature" in our
> backtrace code though... so it won't harm in the sense that it won't
> break things, but it will indeed bloat the code a little bit.
>
> Maybe we should totally disable -fno-omit-frame-pointers on powerpc ...

Yes.
> either that or see about actually using that r31 linkage, though I'm not
> sure it would be that useful.

On PPC you can get reliable backtraces (modulo leaf functions, but AFAIR
the frame pointer does not help, only the CFI data) without a frame pointer
since the ABI makes the stack pointer chain easy to follow. The frame pointer
(r31) is only necessary when there are variable size stack allocations,
alloca() for example, but are they even allowed in the kernel?

Regards,
Gabriel

2008-07-28 14:57:48

by Hugh Dickins

[permalink] [raw]
Subject: Re: CONFIG_FRAME_POINTER [was [PATCH] x86: BUILD_IRQ say .text]

On Mon, 28 Jul 2008, Ingo Molnar wrote:
> * Hugh Dickins <[email protected]> wrote:
>
> > I rather think CONFIG_FRAME_POINTER shouldn't exist at all (or be a
> > private, config-user-invisible, specific-to-a-few-arches thing): what
> > one wants to configure is how far to sacrifice cpu performance and
> > kernel smallness to getting a good stacktrace. Frame pointer is just
> > an implementation detail on that, appropriate to some arches. Perhaps
> > three settings: no stacktrace, fair stacktrace, best stacktrace.
>
> actually, we consciously use and rely on frame pointers on x86. The
> runtime cost on 64-bit is miniscule and the improved backtrace output in
> recent kernels makes backtraces _much_ easier to interpret:

Just to clarify, no way was I criticizing the use of frame pointers
on x86. What I don't care for is that CONFIG_FRAME_POINTER is used
by common code (e.g. top level Makefile, and various debug Kconfigs),
when I see it as an arch-specific technique for getting best stacktrace.

Hugh