2012-08-19 03:01:53

by Andi Kleen

Subject: RFC: Link Time Optimization support for the kernel

This rather large patchkit enables gcc Link Time Optimization (LTO)
support for the kernel.

With LTO, gcc does whole-program optimization over the whole kernel
and over each module. This increases compile time, but can generate
faster code.

LTO allows gcc to inline functions between different files and to
perform various other optimizations across the whole binary.

It might also trigger bugs due to the more aggressive optimization.
In addition, it allows gcc to drop unused code and to check types
over the whole program.

The build slowdown is currently between 2-4x (with larger binaries
taking longer). Typical configs with a reasonably sized vmlinux
compile with less than 4GB of memory, but very large setups (like
allyes) need up to 9GB.

You probably wouldn't use it for development, but it may become
a useful option in the future for release builds.

We see speedups in various benchmarks, but also still a few minor
regressions. There is still some outstanding tuning work, both to
reduce compile time and to let gcc optimize even better. The kernel
also currently triggers some slow behavior in gcc, which will
hopefully improve in future gcc versions, allowing faster LTO builds.

The kit contains workarounds for various toolchain problems with gcc 4.7.
Some of those will hopefully be removed once upcoming changes land.

Currently a special toolchain setup is needed for LTO: gcc 4.7 and
HJ Lu's Linux binutils. Please see Documentation/lto-build
for details on how to install the right versions with the right setup.
The LTO code disables itself if it doesn't find the right toolchain
(however, it may not be able to detect all misconfigurations).

This is at the RFC stage at this point. I have only tested it on 32-bit
and 64-bit x86. Other architectures will undoubtedly need more
changes. I would be interested in any testing, benchmarking, and
review.

Some options are currently disabled with LTO. I plan to fix
MODVERSIONS. Some others, like FUNCTION_TRACER (which relies on
different compiler options for specific files), may need compiler changes.

This patchkit relies on the separately posted const-sections patchkit;
with LTO, gcc insists on correct section attributes.

Available from

git://github.com/andikleen/linux-misc lto-3.6 (or -3.5 and -3.7 in the future)

Note the tree is frequently rebased.

Thanks to HJ Lu, Joe Mario, Honza Hubicka, Richard Guenther,
Don Zickus, and Changlong Xie, who helped with this project
(and probably some more whom I forgot, sorry).

-Andi


2012-08-19 02:57:28

by Andi Kleen

Subject: [PATCH 04/74] sections: Add __visible to ia64 sections

From: Andi Kleen <[email protected]>

Signed-off-by: Andi Kleen <[email protected]>
---
arch/ia64/include/asm/sections.h | 26 +++++++++++++-------------
1 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/ia64/include/asm/sections.h b/arch/ia64/include/asm/sections.h
index 1a873b3..68b8f3a 100644
--- a/arch/ia64/include/asm/sections.h
+++ b/arch/ia64/include/asm/sections.h
@@ -10,21 +10,21 @@
#include <linux/uaccess.h>
#include <asm-generic/sections.h>

-extern char __per_cpu_start[], __per_cpu_end[], __phys_per_cpu_start[];
+extern __visible char __per_cpu_start[], __per_cpu_end[], __phys_per_cpu_start[];
#ifdef CONFIG_SMP
-extern char __cpu0_per_cpu[];
+extern __visible char __cpu0_per_cpu[];
#endif
-extern char __start___vtop_patchlist[], __end___vtop_patchlist[];
-extern char __start___rse_patchlist[], __end___rse_patchlist[];
-extern char __start___mckinley_e9_bundles[], __end___mckinley_e9_bundles[];
-extern char __start___phys_stack_reg_patchlist[], __end___phys_stack_reg_patchlist[];
-extern char __start_gate_section[];
-extern char __start_gate_mckinley_e9_patchlist[], __end_gate_mckinley_e9_patchlist[];
-extern char __start_gate_vtop_patchlist[], __end_gate_vtop_patchlist[];
-extern char __start_gate_fsyscall_patchlist[], __end_gate_fsyscall_patchlist[];
-extern char __start_gate_brl_fsys_bubble_down_patchlist[], __end_gate_brl_fsys_bubble_down_patchlist[];
-extern char __start_unwind[], __end_unwind[];
-extern char __start_ivt_text[], __end_ivt_text[];
+extern __visible char __start___vtop_patchlist[], __end___vtop_patchlist[];
+extern __visible char __start___rse_patchlist[], __end___rse_patchlist[];
+extern __visible char __start___mckinley_e9_bundles[], __end___mckinley_e9_bundles[];
+extern __visible char __start___phys_stack_reg_patchlist[], __end___phys_stack_reg_patchlist[];
+extern __visible char __start_gate_section[];
+extern __visible char __start_gate_mckinley_e9_patchlist[], __end_gate_mckinley_e9_patchlist[];
+extern __visible char __start_gate_vtop_patchlist[], __end_gate_vtop_patchlist[];
+extern __visible char __start_gate_fsyscall_patchlist[], __end_gate_fsyscall_patchlist[];
+extern __visible char __start_gate_brl_fsys_bubble_down_patchlist[], __end_gate_brl_fsys_bubble_down_patchlist[];
+extern __visible char __start_unwind[], __end_unwind[];
+extern __visible char __start_ivt_text[], __end_ivt_text[];

#undef dereference_function_descriptor
static inline void *dereference_function_descriptor(void *ptr)
--
1.7.7.6

2012-08-19 02:58:14

by Andi Kleen

Subject: [PATCH 63/74] Kbuild, lto: Print correct info messages for vmlinux link

From: Andi Kleen <[email protected]>

With LTO the tmp_vmlinux links can take very long. Print dedicated
messages so that it's clear what's going on. Previously the vmlinux
link would really happen during "LD init/built-in.o".

Also print separate messages for each of the vmlinux link steps.

Signed-off-by: Andi Kleen <[email protected]>
---
scripts/link-vmlinux.sh | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 4629038..a05c49c 100644
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -176,10 +176,12 @@ if [ -n "${CONFIG_KALLSYMS}" ]; then
kallsyms_vmlinux=.tmp_vmlinux2

# step 1
+ info LDFINAL .tmp_vmlinux1
vmlinux_link "" .tmp_vmlinux1
kallsyms .tmp_vmlinux1 .tmp_kallsyms1.o

# step 2
+ info LDFINAL .tmp_vmlinux2
vmlinux_link .tmp_kallsyms1.o .tmp_vmlinux2
kallsyms .tmp_vmlinux2 .tmp_kallsyms2.o

--
1.7.7.6

2012-08-19 02:58:56

by Andi Kleen

Subject: [PATCH 19/74] x86, lto: Add missing asmlinkages and __visible

From: Andi Kleen <[email protected]>

This is an arch/x86 sweep to add asmlinkage or __visible
to all functions accessed by assembler code. A lot of
functions already had it, but not all.

This is needed for the LTO kernel so that these functions are
not optimized away.

I used asmlinkage for functions without arguments, and __visible
for functions with arguments. This prevents any problems on x86-32 with
asmlinkage changing the calling convention to regparm(0).

I kept it all in a single patch for now. Please let me know
if you want it split up.

Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/include/asm/hw_irq.h | 118 +++++++++++++++----------------
arch/x86/include/asm/irq.h | 2 +-
arch/x86/include/asm/kprobes.h | 2 +-
arch/x86/include/asm/setup.h | 8 ++-
arch/x86/include/asm/signal.h | 2 +-
arch/x86/include/asm/switch_to.h | 4 +-
arch/x86/include/asm/syscalls.h | 22 +++---
arch/x86/kernel/apic/apic.c | 6 +-
arch/x86/kernel/cpu/mcheck/mce-inject.c | 1 +
arch/x86/kernel/head32.c | 2 +-
arch/x86/kernel/head64.c | 2 +-
arch/x86/kernel/ioport.c | 2 +-
arch/x86/kernel/irq.c | 4 +-
arch/x86/kernel/irq_work.c | 2 +-
arch/x86/kernel/kprobes.c | 2 +-
arch/x86/kernel/machine_kexec_32.c | 2 +-
arch/x86/kernel/process.c | 10 ++--
arch/x86/kernel/process_32.c | 2 +-
arch/x86/kernel/process_64.c | 2 +-
arch/x86/kernel/signal.c | 8 +-
arch/x86/kernel/smp.c | 6 +-
arch/x86/kernel/syscall_64.c | 2 +-
arch/x86/kernel/vm86_32.c | 3 +-
arch/x86/lib/usercopy_64.c | 2 +-
24 files changed, 108 insertions(+), 108 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index eb92a6e..a7cd10e 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -26,55 +26,55 @@
#include <asm/sections.h>

/* Interrupt handlers registered during init_IRQ */
-extern void apic_timer_interrupt(void);
-extern void x86_platform_ipi(void);
-extern void error_interrupt(void);
-extern void irq_work_interrupt(void);
-
-extern void spurious_interrupt(void);
-extern void thermal_interrupt(void);
-extern void reschedule_interrupt(void);
-
-extern void invalidate_interrupt(void);
-extern void invalidate_interrupt0(void);
-extern void invalidate_interrupt1(void);
-extern void invalidate_interrupt2(void);
-extern void invalidate_interrupt3(void);
-extern void invalidate_interrupt4(void);
-extern void invalidate_interrupt5(void);
-extern void invalidate_interrupt6(void);
-extern void invalidate_interrupt7(void);
-extern void invalidate_interrupt8(void);
-extern void invalidate_interrupt9(void);
-extern void invalidate_interrupt10(void);
-extern void invalidate_interrupt11(void);
-extern void invalidate_interrupt12(void);
-extern void invalidate_interrupt13(void);
-extern void invalidate_interrupt14(void);
-extern void invalidate_interrupt15(void);
-extern void invalidate_interrupt16(void);
-extern void invalidate_interrupt17(void);
-extern void invalidate_interrupt18(void);
-extern void invalidate_interrupt19(void);
-extern void invalidate_interrupt20(void);
-extern void invalidate_interrupt21(void);
-extern void invalidate_interrupt22(void);
-extern void invalidate_interrupt23(void);
-extern void invalidate_interrupt24(void);
-extern void invalidate_interrupt25(void);
-extern void invalidate_interrupt26(void);
-extern void invalidate_interrupt27(void);
-extern void invalidate_interrupt28(void);
-extern void invalidate_interrupt29(void);
-extern void invalidate_interrupt30(void);
-extern void invalidate_interrupt31(void);
-
-extern void irq_move_cleanup_interrupt(void);
-extern void reboot_interrupt(void);
-extern void threshold_interrupt(void);
-
-extern void call_function_interrupt(void);
-extern void call_function_single_interrupt(void);
+extern asmlinkage void apic_timer_interrupt(void);
+extern asmlinkage void x86_platform_ipi(void);
+extern asmlinkage void error_interrupt(void);
+extern asmlinkage void irq_work_interrupt(void);
+
+extern asmlinkage void spurious_interrupt(void);
+extern asmlinkage void thermal_interrupt(void);
+extern asmlinkage void reschedule_interrupt(void);
+
+extern asmlinkage void invalidate_interrupt(void);
+extern asmlinkage void invalidate_interrupt0(void);
+extern asmlinkage void invalidate_interrupt1(void);
+extern asmlinkage void invalidate_interrupt2(void);
+extern asmlinkage void invalidate_interrupt3(void);
+extern asmlinkage void invalidate_interrupt4(void);
+extern asmlinkage void invalidate_interrupt5(void);
+extern asmlinkage void invalidate_interrupt6(void);
+extern asmlinkage void invalidate_interrupt7(void);
+extern asmlinkage void invalidate_interrupt8(void);
+extern asmlinkage void invalidate_interrupt9(void);
+extern asmlinkage void invalidate_interrupt10(void);
+extern asmlinkage void invalidate_interrupt11(void);
+extern asmlinkage void invalidate_interrupt12(void);
+extern asmlinkage void invalidate_interrupt13(void);
+extern asmlinkage void invalidate_interrupt14(void);
+extern asmlinkage void invalidate_interrupt15(void);
+extern asmlinkage void invalidate_interrupt16(void);
+extern asmlinkage void invalidate_interrupt17(void);
+extern asmlinkage void invalidate_interrupt18(void);
+extern asmlinkage void invalidate_interrupt19(void);
+extern asmlinkage void invalidate_interrupt20(void);
+extern asmlinkage void invalidate_interrupt21(void);
+extern asmlinkage void invalidate_interrupt22(void);
+extern asmlinkage void invalidate_interrupt23(void);
+extern asmlinkage void invalidate_interrupt24(void);
+extern asmlinkage void invalidate_interrupt25(void);
+extern asmlinkage void invalidate_interrupt26(void);
+extern asmlinkage void invalidate_interrupt27(void);
+extern asmlinkage void invalidate_interrupt28(void);
+extern asmlinkage void invalidate_interrupt29(void);
+extern asmlinkage void invalidate_interrupt30(void);
+extern asmlinkage void invalidate_interrupt31(void);
+
+extern asmlinkage void irq_move_cleanup_interrupt(void);
+extern asmlinkage void reboot_interrupt(void);
+extern asmlinkage void threshold_interrupt(void);
+
+extern asmlinkage void call_function_interrupt(void);
+extern asmlinkage void call_function_single_interrupt(void);

/* IOAPIC */
#define IO_APIC_IRQ(x) (((x) >= NR_IRQS_LEGACY) || ((1<<(x)) & io_apic_irqs))
@@ -143,22 +143,18 @@ extern atomic_t irq_mis_count;
extern void eisa_set_level_irq(unsigned int irq);

/* SMP */
-extern void smp_apic_timer_interrupt(struct pt_regs *);
-extern void smp_spurious_interrupt(struct pt_regs *);
-extern void smp_x86_platform_ipi(struct pt_regs *);
-extern void smp_error_interrupt(struct pt_regs *);
+extern __visible void smp_apic_timer_interrupt(struct pt_regs *);
+extern __visible void smp_spurious_interrupt(struct pt_regs *);
+extern __visible void smp_x86_platform_ipi(struct pt_regs *);
+extern __visible void smp_error_interrupt(struct pt_regs *);
#ifdef CONFIG_X86_IO_APIC
extern asmlinkage void smp_irq_move_cleanup_interrupt(void);
#endif
#ifdef CONFIG_SMP
-extern void smp_reschedule_interrupt(struct pt_regs *);
-extern void smp_call_function_interrupt(struct pt_regs *);
-extern void smp_call_function_single_interrupt(struct pt_regs *);
-#ifdef CONFIG_X86_32
-extern void smp_invalidate_interrupt(struct pt_regs *);
-#else
-extern asmlinkage void smp_invalidate_interrupt(struct pt_regs *);
-#endif
+extern __visible void smp_reschedule_interrupt(struct pt_regs *);
+extern __visible void smp_call_function_interrupt(struct pt_regs *);
+extern __visible void smp_call_function_single_interrupt(struct pt_regs *);
+extern __visible void smp_invalidate_interrupt(struct pt_regs *);
#endif

extern void (*__initconst interrupt[NR_VECTORS-FIRST_EXTERNAL_VECTOR])(void);
diff --git a/arch/x86/include/asm/irq.h b/arch/x86/include/asm/irq.h
index ba870bb..c996357 100644
--- a/arch/x86/include/asm/irq.h
+++ b/arch/x86/include/asm/irq.h
@@ -33,7 +33,7 @@ extern void (*x86_platform_ipi_callback)(void);
extern void native_init_IRQ(void);
extern bool handle_irq(unsigned irq, struct pt_regs *regs);

-extern unsigned int do_IRQ(struct pt_regs *regs);
+extern __visible unsigned int do_IRQ(struct pt_regs *regs);

/* Interrupt vector management */
extern DECLARE_BITMAP(used_vectors, NR_VECTORS);
diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
index 5478825..a967173 100644
--- a/arch/x86/include/asm/kprobes.h
+++ b/arch/x86/include/asm/kprobes.h
@@ -61,7 +61,7 @@ extern kprobe_opcode_t optprobe_template_end;
extern const int kretprobe_blacklist_size;

void arch_remove_kprobe(struct kprobe *p);
-void kretprobe_trampoline(void);
+asmlinkage void kretprobe_trampoline(void);

/* Architecture specific copy of original instruction*/
struct arch_specific_insn {
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index d0f19f9..4575ffb 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -5,6 +5,8 @@

#define COMMAND_LINE_SIZE 2048

+#include <linux/linkage.h>
+
#ifdef __i386__

#include <linux/pfn.h>
@@ -107,11 +109,11 @@ void *extend_brk(size_t size, size_t align);
extern void probe_roms(void);
#ifdef __i386__

-void __init i386_start_kernel(void);
+asmlinkage void __init i386_start_kernel(void);

#else
-void __init x86_64_start_kernel(char *real_mode);
-void __init x86_64_start_reservations(char *real_mode_data);
+asmlinkage void __init x86_64_start_kernel(char *real_mode);
+asmlinkage void __init x86_64_start_reservations(char *real_mode_data);

#endif /* __i386__ */
#endif /* _SETUP */
diff --git a/arch/x86/include/asm/signal.h b/arch/x86/include/asm/signal.h
index 598457c..4937d72 100644
--- a/arch/x86/include/asm/signal.h
+++ b/arch/x86/include/asm/signal.h
@@ -122,7 +122,7 @@ typedef unsigned long sigset_t;
#ifndef __ASSEMBLY__

# ifdef __KERNEL__
-extern void do_notify_resume(struct pt_regs *, void *, __u32);
+extern __visible void do_notify_resume(struct pt_regs *, void *, __u32);
# endif /* __KERNEL__ */

#ifdef __i386__
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 4ec45b3..a7ad95a 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -2,8 +2,8 @@
#define _ASM_X86_SWITCH_TO_H

struct task_struct; /* one of the stranger aspects of C forward declarations */
-struct task_struct *__switch_to(struct task_struct *prev,
- struct task_struct *next);
+__visible struct task_struct *__switch_to(struct task_struct *prev,
+ struct task_struct *next);
struct tss_struct;
void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
struct tss_struct *tss);
diff --git a/arch/x86/include/asm/syscalls.h b/arch/x86/include/asm/syscalls.h
index f1d8b44..de299c8 100644
--- a/arch/x86/include/asm/syscalls.h
+++ b/arch/x86/include/asm/syscalls.h
@@ -18,23 +18,23 @@
/* Common in X86_32 and X86_64 */
/* kernel/ioport.c */
asmlinkage long sys_ioperm(unsigned long, unsigned long, int);
-long sys_iopl(unsigned int, struct pt_regs *);
+__visible long sys_iopl(unsigned int, struct pt_regs *);

/* kernel/process.c */
-int sys_fork(struct pt_regs *);
-int sys_vfork(struct pt_regs *);
-long sys_execve(const char __user *,
+__visible int sys_fork(struct pt_regs *);
+__visible int sys_vfork(struct pt_regs *);
+__visible long sys_execve(const char __user *,
const char __user *const __user *,
const char __user *const __user *, struct pt_regs *);
-long sys_clone(unsigned long, unsigned long, void __user *,
+__visible long sys_clone(unsigned long, unsigned long, void __user *,
void __user *, struct pt_regs *);

/* kernel/ldt.c */
asmlinkage int sys_modify_ldt(int, void __user *, unsigned long);

/* kernel/signal.c */
-long sys_rt_sigreturn(struct pt_regs *);
-long sys_sigaltstack(const stack_t __user *, stack_t __user *,
+__visible long sys_rt_sigreturn(struct pt_regs *);
+__visible long sys_sigaltstack(const stack_t __user *, stack_t __user *,
struct pt_regs *);


@@ -49,17 +49,17 @@ asmlinkage int sys_get_thread_area(struct user_desc __user *);
asmlinkage int sys_sigsuspend(int, int, old_sigset_t);
asmlinkage int sys_sigaction(int, const struct old_sigaction __user *,
struct old_sigaction __user *);
-unsigned long sys_sigreturn(struct pt_regs *);
+__visible unsigned long sys_sigreturn(struct pt_regs *);

/* kernel/vm86_32.c */
-int sys_vm86old(struct vm86_struct __user *, struct pt_regs *);
-int sys_vm86(unsigned long, unsigned long, struct pt_regs *);
+__visible int sys_vm86old(struct vm86_struct __user *, struct pt_regs *);
+__visible int sys_vm86(unsigned long, unsigned long, struct pt_regs *);

#else /* CONFIG_X86_32 */

/* X86_64 only */
/* kernel/process_64.c */
-long sys_arch_prctl(int, unsigned long);
+asmlinkage long sys_arch_prctl(int, unsigned long);

/* kernel/sys_x86_64.c */
asmlinkage long sys_mmap(unsigned long, unsigned long, unsigned long,
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 24deb30..71f1284 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -879,7 +879,7 @@ static void local_apic_timer_interrupt(void)
* [ if a single-CPU system runs an SMP kernel then we call the local
* interrupt as well. Thus we cannot inline the local irq ... ]
*/
-void __irq_entry smp_apic_timer_interrupt(struct pt_regs *regs)
+__visible void __irq_entry smp_apic_timer_interrupt(struct pt_regs *regs)
{
struct pt_regs *old_regs = set_irq_regs(regs);

@@ -1875,7 +1875,7 @@ int __init APIC_init_uniprocessor(void)
/*
* This interrupt should _never_ happen with our APIC/SMP architecture
*/
-void smp_spurious_interrupt(struct pt_regs *regs)
+__visible void smp_spurious_interrupt(struct pt_regs *regs)
{
u32 v;

@@ -1901,7 +1901,7 @@ void smp_spurious_interrupt(struct pt_regs *regs)
/*
* This interrupt should never happen with our APIC/SMP architecture
*/
-void smp_error_interrupt(struct pt_regs *regs)
+__visible void smp_error_interrupt(struct pt_regs *regs)
{
u32 v0, v1;
u32 i = 0;
diff --git a/arch/x86/kernel/cpu/mcheck/mce-inject.c b/arch/x86/kernel/cpu/mcheck/mce-inject.c
index fc4beb3..ccfcf1c 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-inject.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-inject.c
@@ -27,6 +27,7 @@
#include <asm/mce.h>
#include <asm/apic.h>
#include <asm/nmi.h>
+#include <asm/traps.h>

/* Update fake mce registers on current CPU. */
static void inject_mce(struct mce *m)
diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index c18f59d..1b04fb8 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -28,7 +28,7 @@ static void __init i386_default_early_setup(void)
reserve_ebda_region();
}

-void __init i386_start_kernel(void)
+asmlinkage void __init i386_start_kernel(void)
{
memblock_reserve(__pa_symbol(&_text),
__pa_symbol(&__bss_stop) - __pa_symbol(&_text));
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 037df57..ded14c3 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -52,7 +52,7 @@ static void __init copy_bootdata(char *real_mode_data)
}
}

-void __init x86_64_start_kernel(char * real_mode_data)
+asmlinkage void __init x86_64_start_kernel(char * real_mode_data)
{
int i;

diff --git a/arch/x86/kernel/ioport.c b/arch/x86/kernel/ioport.c
index 8c96897..e828193 100644
--- a/arch/x86/kernel/ioport.c
+++ b/arch/x86/kernel/ioport.c
@@ -93,7 +93,7 @@ asmlinkage long sys_ioperm(unsigned long from, unsigned long num, int turn_on)
* on system-call entry - see also fork() and the signal handling
* code.
*/
-long sys_iopl(unsigned int level, struct pt_regs *regs)
+__visible long sys_iopl(unsigned int level, struct pt_regs *regs)
{
unsigned int old = (regs->flags >> 12) & 3;
struct thread_struct *t = &current->thread;
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 7ad683d..69c4ace 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -178,7 +178,7 @@ u64 arch_irq_stat(void)
* SMP cross-CPU interrupts have their own specific
* handlers).
*/
-unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
+__visible unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
{
struct pt_regs *old_regs = set_irq_regs(regs);

@@ -208,7 +208,7 @@ unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
/*
* Handler for X86_PLATFORM_IPI_VECTOR.
*/
-void smp_x86_platform_ipi(struct pt_regs *regs)
+__visible void smp_x86_platform_ipi(struct pt_regs *regs)
{
struct pt_regs *old_regs = set_irq_regs(regs);

diff --git a/arch/x86/kernel/irq_work.c b/arch/x86/kernel/irq_work.c
index ca8f703..f95694b 100644
--- a/arch/x86/kernel/irq_work.c
+++ b/arch/x86/kernel/irq_work.c
@@ -9,7 +9,7 @@
#include <linux/hardirq.h>
#include <asm/apic.h>

-void smp_irq_work_interrupt(struct pt_regs *regs)
+__visible void smp_irq_work_interrupt(struct pt_regs *regs)
{
irq_enter();
ack_APIC_irq();
diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index e2f751e..86d828b 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -647,7 +647,7 @@ static void __used __kprobes kretprobe_trampoline_holder(void)
/*
* Called from kretprobe_trampoline
*/
-static __used __kprobes void *trampoline_handler(struct pt_regs *regs)
+__visible __used __kprobes void *trampoline_handler(struct pt_regs *regs)
{
struct kretprobe_instance *ri = NULL;
struct hlist_head *head, empty_rp;
diff --git a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c
index 5b19e4d..7048960 100644
--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -184,7 +184,7 @@ void machine_kexec(struct kimage *image)
unsigned long page_list[PAGES_NR];
void *control_page;
int save_ftrace_enabled;
- asmlinkage unsigned long
+ unsigned long
(*relocate_kernel_ptr)(unsigned long indirection_page,
unsigned long control_page,
unsigned long start_address,
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index ef6a845..0d099ca3 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -268,7 +268,7 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
propagate_user_return_notify(prev_p, next_p);
}

-int sys_fork(struct pt_regs *regs)
+__visible int sys_fork(struct pt_regs *regs)
{
return do_fork(SIGCHLD, regs->sp, regs, 0, NULL, NULL);
}
@@ -283,13 +283,13 @@ int sys_fork(struct pt_regs *regs)
* do not have enough call-clobbered registers to hold all
* the information you need.
*/
-int sys_vfork(struct pt_regs *regs)
+__visible int sys_vfork(struct pt_regs *regs)
{
return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, regs->sp, regs, 0,
NULL, NULL);
}

-long
+__visible long
sys_clone(unsigned long clone_flags, unsigned long newsp,
void __user *parent_tid, void __user *child_tid, struct pt_regs *regs)
{
@@ -303,7 +303,7 @@ sys_clone(unsigned long clone_flags, unsigned long newsp,
* function to call, and %di containing
* the "args".
*/
-extern void kernel_thread_helper(void);
+__visible extern void kernel_thread_helper(void);

/*
* Create a kernel thread
@@ -339,7 +339,7 @@ EXPORT_SYMBOL(kernel_thread);
/*
* sys_execve() executes a new program.
*/
-long sys_execve(const char __user *name,
+__visible long sys_execve(const char __user *name,
const char __user *const __user *argv,
const char __user *const __user *envp, struct pt_regs *regs)
{
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 516fa18..4826bd1 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -225,7 +225,7 @@ EXPORT_SYMBOL_GPL(start_thread);
* the task-switch, and shows up in ret_from_fork in entry.S,
* for example.
*/
-__notrace_funcgraph struct task_struct *
+__visible __notrace_funcgraph struct task_struct *
__switch_to(struct task_struct *prev_p, struct task_struct *next_p)
{
struct thread_struct *prev = &prev_p->thread,
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 0a980c9..a5720ed 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -265,7 +265,7 @@ void start_thread_ia32(struct pt_regs *regs, u32 new_ip, u32 new_sp)
* Kprobes not supported here. Set the probe on schedule instead.
* Function graph tracer not supported too.
*/
-__notrace_funcgraph struct task_struct *
+__visible __notrace_funcgraph struct task_struct *
__switch_to(struct task_struct *prev_p, struct task_struct *next_p)
{
struct thread_struct *prev = &prev_p->thread;
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index b280908..b3c241d 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -532,7 +532,7 @@ sys_sigaction(int sig, const struct old_sigaction __user *act,
}
#endif /* CONFIG_X86_32 */

-long
+__visible long
sys_sigaltstack(const stack_t __user *uss, stack_t __user *uoss,
struct pt_regs *regs)
{
@@ -543,7 +543,7 @@ sys_sigaltstack(const stack_t __user *uss, stack_t __user *uoss,
* Do a signal return; undo the signal stack.
*/
#ifdef CONFIG_X86_32
-unsigned long sys_sigreturn(struct pt_regs *regs)
+__visible unsigned long sys_sigreturn(struct pt_regs *regs)
{
struct sigframe __user *frame;
unsigned long ax;
@@ -571,7 +571,7 @@ badframe:
}
#endif /* CONFIG_X86_32 */

-long sys_rt_sigreturn(struct pt_regs *regs)
+__visible long sys_rt_sigreturn(struct pt_regs *regs)
{
struct rt_sigframe __user *frame;
unsigned long ax;
@@ -776,7 +776,7 @@ static void do_signal(struct pt_regs *regs)
* notification of userspace execution resumption
* - triggered by the TIF_WORK_MASK flags
*/
-void
+__visible void
do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
{
#ifdef CONFIG_X86_MCE
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 48d2b7d..d6bfa11 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -249,7 +249,7 @@ finish:
/*
* Reschedule call back.
*/
-void smp_reschedule_interrupt(struct pt_regs *regs)
+__visible void smp_reschedule_interrupt(struct pt_regs *regs)
{
ack_APIC_irq();
inc_irq_stat(irq_resched_count);
@@ -259,7 +259,7 @@ void smp_reschedule_interrupt(struct pt_regs *regs)
*/
}

-void smp_call_function_interrupt(struct pt_regs *regs)
+__visible void smp_call_function_interrupt(struct pt_regs *regs)
{
ack_APIC_irq();
irq_enter();
@@ -268,7 +268,7 @@ void smp_call_function_interrupt(struct pt_regs *regs)
irq_exit();
}

-void smp_call_function_single_interrupt(struct pt_regs *regs)
+__visible void smp_call_function_single_interrupt(struct pt_regs *regs)
{
ack_APIC_irq();
irq_enter();
diff --git a/arch/x86/kernel/syscall_64.c b/arch/x86/kernel/syscall_64.c
index 5c7f8c2..3967318 100644
--- a/arch/x86/kernel/syscall_64.c
+++ b/arch/x86/kernel/syscall_64.c
@@ -23,7 +23,7 @@ typedef void (*sys_call_ptr_t)(void);

extern void sys_ni_syscall(void);

-const sys_call_ptr_t sys_call_table[__NR_syscall_max+1] = {
+asmlinkage const sys_call_ptr_t sys_call_table[__NR_syscall_max+1] = {
/*
* Smells like a compiler bug -- it doesn't work
* when the & below is removed.
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index 54abcc0..5c1a188 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -202,7 +202,7 @@ out:
static int do_vm86_irq_handling(int subfunction, int irqnumber);
static void do_sys_vm86(struct kernel_vm86_struct *info, struct task_struct *tsk);

-int sys_vm86old(struct vm86_struct __user *v86, struct pt_regs *regs)
+__visible int sys_vm86old(struct vm86_struct __user *v86, struct pt_regs *regs)
{
struct kernel_vm86_struct info; /* declare this _on top_,
* this avoids wasting of stack space.
@@ -231,6 +231,7 @@ out:
}


+__visible
int sys_vm86(unsigned long cmd, unsigned long arg, struct pt_regs *regs)
{
struct kernel_vm86_struct info; /* declare this _on top_,
diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
index e5b130b..a28bddf 100644
--- a/arch/x86/lib/usercopy_64.c
+++ b/arch/x86/lib/usercopy_64.c
@@ -66,7 +66,7 @@ EXPORT_SYMBOL(copy_in_user);
* Since protection fault in copy_from/to_user is not a normal situation,
* it is not necessary to optimize tail handling.
*/
-unsigned long
+__visible unsigned long
copy_user_handle_tail(char *to, char *from, unsigned len, unsigned zerorest)
{
char c;
--
1.7.7.6

2012-08-19 02:59:14

by Andi Kleen

Subject: [PATCH 72/74] lto: Mark spinlocks noinline when inline spinlocks are disabled

From: Andi Kleen <[email protected]>

Otherwise LTO will inline them anyway.

Signed-off-by: Andi Kleen <[email protected]>
---
kernel/spinlock.c | 56 ++++++++++++++++++++++++++--------------------------
1 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/kernel/spinlock.c b/kernel/spinlock.c
index 75439be..e68917a 100644
--- a/kernel/spinlock.c
+++ b/kernel/spinlock.c
@@ -116,7 +116,7 @@ BUILD_LOCK_OPS(write, rwlock);
#endif

#ifndef CONFIG_INLINE_SPIN_TRYLOCK
-int __lockfunc _raw_spin_trylock(raw_spinlock_t *lock)
+noinline int __lockfunc _raw_spin_trylock(raw_spinlock_t *lock)
{
return __raw_spin_trylock(lock);
}
@@ -124,7 +124,7 @@ EXPORT_SYMBOL(_raw_spin_trylock);
#endif

#ifndef CONFIG_INLINE_SPIN_TRYLOCK_BH
-int __lockfunc _raw_spin_trylock_bh(raw_spinlock_t *lock)
+noinline int __lockfunc _raw_spin_trylock_bh(raw_spinlock_t *lock)
{
return __raw_spin_trylock_bh(lock);
}
@@ -132,7 +132,7 @@ EXPORT_SYMBOL(_raw_spin_trylock_bh);
#endif

#ifndef CONFIG_INLINE_SPIN_LOCK
-void __lockfunc _raw_spin_lock(raw_spinlock_t *lock)
+noinline void __lockfunc _raw_spin_lock(raw_spinlock_t *lock)
{
__raw_spin_lock(lock);
}
@@ -140,7 +140,7 @@ EXPORT_SYMBOL(_raw_spin_lock);
#endif

#ifndef CONFIG_INLINE_SPIN_LOCK_IRQSAVE
-unsigned long __lockfunc _raw_spin_lock_irqsave(raw_spinlock_t *lock)
+noinline unsigned long __lockfunc _raw_spin_lock_irqsave(raw_spinlock_t *lock)
{
return __raw_spin_lock_irqsave(lock);
}
@@ -148,7 +148,7 @@ EXPORT_SYMBOL(_raw_spin_lock_irqsave);
#endif

#ifndef CONFIG_INLINE_SPIN_LOCK_IRQ
-void __lockfunc _raw_spin_lock_irq(raw_spinlock_t *lock)
+noinline void __lockfunc _raw_spin_lock_irq(raw_spinlock_t *lock)
{
__raw_spin_lock_irq(lock);
}
@@ -156,7 +156,7 @@ EXPORT_SYMBOL(_raw_spin_lock_irq);
#endif

#ifndef CONFIG_INLINE_SPIN_LOCK_BH
-void __lockfunc _raw_spin_lock_bh(raw_spinlock_t *lock)
+noinline void __lockfunc _raw_spin_lock_bh(raw_spinlock_t *lock)
{
__raw_spin_lock_bh(lock);
}
@@ -164,7 +164,7 @@ EXPORT_SYMBOL(_raw_spin_lock_bh);
#endif

#ifdef CONFIG_UNINLINE_SPIN_UNLOCK
-void __lockfunc _raw_spin_unlock(raw_spinlock_t *lock)
+noinline void __lockfunc _raw_spin_unlock(raw_spinlock_t *lock)
{
__raw_spin_unlock(lock);
}
@@ -172,7 +172,7 @@ EXPORT_SYMBOL(_raw_spin_unlock);
#endif

#ifndef CONFIG_INLINE_SPIN_UNLOCK_IRQRESTORE
-void __lockfunc _raw_spin_unlock_irqrestore(raw_spinlock_t *lock, unsigned long flags)
+noinline void __lockfunc _raw_spin_unlock_irqrestore(raw_spinlock_t *lock, unsigned long flags)
{
__raw_spin_unlock_irqrestore(lock, flags);
}
@@ -180,7 +180,7 @@ EXPORT_SYMBOL(_raw_spin_unlock_irqrestore);
#endif

#ifndef CONFIG_INLINE_SPIN_UNLOCK_IRQ
-void __lockfunc _raw_spin_unlock_irq(raw_spinlock_t *lock)
+noinline void __lockfunc _raw_spin_unlock_irq(raw_spinlock_t *lock)
{
__raw_spin_unlock_irq(lock);
}
@@ -188,7 +188,7 @@ EXPORT_SYMBOL(_raw_spin_unlock_irq);
#endif

#ifndef CONFIG_INLINE_SPIN_UNLOCK_BH
-void __lockfunc _raw_spin_unlock_bh(raw_spinlock_t *lock)
+noinline void __lockfunc _raw_spin_unlock_bh(raw_spinlock_t *lock)
{
__raw_spin_unlock_bh(lock);
}
@@ -196,7 +196,7 @@ EXPORT_SYMBOL(_raw_spin_unlock_bh);
#endif

#ifndef CONFIG_INLINE_READ_TRYLOCK
-int __lockfunc _raw_read_trylock(rwlock_t *lock)
+noinline int __lockfunc _raw_read_trylock(rwlock_t *lock)
{
return __raw_read_trylock(lock);
}
@@ -204,7 +204,7 @@ EXPORT_SYMBOL(_raw_read_trylock);
#endif

#ifndef CONFIG_INLINE_READ_LOCK
-void __lockfunc _raw_read_lock(rwlock_t *lock)
+noinline void __lockfunc _raw_read_lock(rwlock_t *lock)
{
__raw_read_lock(lock);
}
@@ -212,7 +212,7 @@ EXPORT_SYMBOL(_raw_read_lock);
#endif

#ifndef CONFIG_INLINE_READ_LOCK_IRQSAVE
-unsigned long __lockfunc _raw_read_lock_irqsave(rwlock_t *lock)
+noinline unsigned long __lockfunc _raw_read_lock_irqsave(rwlock_t *lock)
{
return __raw_read_lock_irqsave(lock);
}
@@ -220,7 +220,7 @@ EXPORT_SYMBOL(_raw_read_lock_irqsave);
#endif

#ifndef CONFIG_INLINE_READ_LOCK_IRQ
-void __lockfunc _raw_read_lock_irq(rwlock_t *lock)
+noinline void __lockfunc _raw_read_lock_irq(rwlock_t *lock)
{
__raw_read_lock_irq(lock);
}
@@ -228,7 +228,7 @@ EXPORT_SYMBOL(_raw_read_lock_irq);
#endif

#ifndef CONFIG_INLINE_READ_LOCK_BH
-void __lockfunc _raw_read_lock_bh(rwlock_t *lock)
+noinline void __lockfunc _raw_read_lock_bh(rwlock_t *lock)
{
__raw_read_lock_bh(lock);
}
@@ -236,7 +236,7 @@ EXPORT_SYMBOL(_raw_read_lock_bh);
#endif

#ifndef CONFIG_INLINE_READ_UNLOCK
-void __lockfunc _raw_read_unlock(rwlock_t *lock)
+noinline void __lockfunc _raw_read_unlock(rwlock_t *lock)
{
__raw_read_unlock(lock);
}
@@ -244,7 +244,7 @@ EXPORT_SYMBOL(_raw_read_unlock);
#endif

#ifndef CONFIG_INLINE_READ_UNLOCK_IRQRESTORE
-void __lockfunc _raw_read_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
+noinline void __lockfunc _raw_read_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
{
__raw_read_unlock_irqrestore(lock, flags);
}
@@ -252,7 +252,7 @@ EXPORT_SYMBOL(_raw_read_unlock_irqrestore);
#endif

#ifndef CONFIG_INLINE_READ_UNLOCK_IRQ
-void __lockfunc _raw_read_unlock_irq(rwlock_t *lock)
+noinline void __lockfunc _raw_read_unlock_irq(rwlock_t *lock)
{
__raw_read_unlock_irq(lock);
}
@@ -260,7 +260,7 @@ EXPORT_SYMBOL(_raw_read_unlock_irq);
#endif

#ifndef CONFIG_INLINE_READ_UNLOCK_BH
-void __lockfunc _raw_read_unlock_bh(rwlock_t *lock)
+noinline void __lockfunc _raw_read_unlock_bh(rwlock_t *lock)
{
__raw_read_unlock_bh(lock);
}
@@ -268,7 +268,7 @@ EXPORT_SYMBOL(_raw_read_unlock_bh);
#endif

#ifndef CONFIG_INLINE_WRITE_TRYLOCK
-int __lockfunc _raw_write_trylock(rwlock_t *lock)
+noinline int __lockfunc _raw_write_trylock(rwlock_t *lock)
{
return __raw_write_trylock(lock);
}
@@ -276,7 +276,7 @@ EXPORT_SYMBOL(_raw_write_trylock);
#endif

#ifndef CONFIG_INLINE_WRITE_LOCK
-void __lockfunc _raw_write_lock(rwlock_t *lock)
+noinline void __lockfunc _raw_write_lock(rwlock_t *lock)
{
__raw_write_lock(lock);
}
@@ -284,7 +284,7 @@ EXPORT_SYMBOL(_raw_write_lock);
#endif

#ifndef CONFIG_INLINE_WRITE_LOCK_IRQSAVE
-unsigned long __lockfunc _raw_write_lock_irqsave(rwlock_t *lock)
+noinline unsigned long __lockfunc _raw_write_lock_irqsave(rwlock_t *lock)
{
return __raw_write_lock_irqsave(lock);
}
@@ -292,7 +292,7 @@ EXPORT_SYMBOL(_raw_write_lock_irqsave);
#endif

#ifndef CONFIG_INLINE_WRITE_LOCK_IRQ
-void __lockfunc _raw_write_lock_irq(rwlock_t *lock)
+noinline void __lockfunc _raw_write_lock_irq(rwlock_t *lock)
{
__raw_write_lock_irq(lock);
}
@@ -300,7 +300,7 @@ EXPORT_SYMBOL(_raw_write_lock_irq);
#endif

#ifndef CONFIG_INLINE_WRITE_LOCK_BH
-void __lockfunc _raw_write_lock_bh(rwlock_t *lock)
+noinline void __lockfunc _raw_write_lock_bh(rwlock_t *lock)
{
__raw_write_lock_bh(lock);
}
@@ -308,7 +308,7 @@ EXPORT_SYMBOL(_raw_write_lock_bh);
#endif

#ifndef CONFIG_INLINE_WRITE_UNLOCK
-void __lockfunc _raw_write_unlock(rwlock_t *lock)
+noinline void __lockfunc _raw_write_unlock(rwlock_t *lock)
{
__raw_write_unlock(lock);
}
@@ -316,7 +316,7 @@ EXPORT_SYMBOL(_raw_write_unlock);
#endif

#ifndef CONFIG_INLINE_WRITE_UNLOCK_IRQRESTORE
-void __lockfunc _raw_write_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
+noinline void __lockfunc _raw_write_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
{
__raw_write_unlock_irqrestore(lock, flags);
}
@@ -324,7 +324,7 @@ EXPORT_SYMBOL(_raw_write_unlock_irqrestore);
#endif

#ifndef CONFIG_INLINE_WRITE_UNLOCK_IRQ
-void __lockfunc _raw_write_unlock_irq(rwlock_t *lock)
+noinline void __lockfunc _raw_write_unlock_irq(rwlock_t *lock)
{
__raw_write_unlock_irq(lock);
}
@@ -332,7 +332,7 @@ EXPORT_SYMBOL(_raw_write_unlock_irq);
#endif

#ifndef CONFIG_INLINE_WRITE_UNLOCK_BH
-void __lockfunc _raw_write_unlock_bh(rwlock_t *lock)
+noinline void __lockfunc _raw_write_unlock_bh(rwlock_t *lock)
{
__raw_write_unlock_bh(lock);
}
--
1.7.7.6

2012-08-19 02:59:17

by Andi Kleen

Subject: [PATCH 01/74] Add __visible

From: Andi Kleen <[email protected]>

gcc 4.6+ supports an externally_visible attribute that prevents
the optimizer from optimizing unused symbols away. Add a __visible macro
to use it with that compiler version or later.

Signed-off-by: Andi Kleen <[email protected]>
---
include/linux/compiler-gcc4.h | 7 +++++++
include/linux/compiler.h | 4 ++++
2 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index 2f40791..934bc34 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -49,6 +49,13 @@
#endif
#endif

+#if __GNUC_MINOR__ >= 6
+/*
+ * Tell the optimizer that something else uses this function or variable.
+ */
+#define __visible __attribute__((externally_visible))
+#endif
+
#if __GNUC_MINOR__ > 0
#define __compiletime_object_size(obj) __builtin_object_size(obj, 0)
#endif
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 923d093..f430e41 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -278,6 +278,10 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
# define __section(S) __attribute__ ((__section__(#S)))
#endif

+#ifndef __visible
+#define __visible
+#endif
+
/* Are two types/vars the same type (ignoring qualifiers)? */
#ifndef __same_type
# define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
--
1.7.7.6

2012-08-19 02:59:22

by Andi Kleen

Subject: [PATCH 61/74] Kbuild, lto: Drop .number postfixes in modpost

From: Andi Kleen <[email protected]>

LTO effectively turns all global symbols into statics. As a side
effect they all get a .NUMBER postfix to make them unique. Drop this
postfix in modpost because it confuses it.

Signed-off-by: Andi Kleen <[email protected]>
---
scripts/mod/modpost.c | 15 ++++++++++++++-
scripts/mod/modpost.h | 2 +-
2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index c797e95..ccd34ff 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -1696,6 +1696,19 @@ static void check_sec_ref(struct module *mod, const char *modname,
}
}

+static char *remove_dot(char *s)
+{
+ char *end;
+ int n = strcspn(s, ".");
+
+ if (n > 0 && s[n] != 0) {
+ strtoul(s + n + 1, &end, 10);
+ if (end > s + n + 1 && (*end == '.' || *end == 0))
+ s[n] = 0;
+ }
+ return s;
+}
+
static void read_symbols(char *modname)
{
const char *symname;
@@ -1734,7 +1747,7 @@ static void read_symbols(char *modname)
}

for (sym = info.symtab_start; sym < info.symtab_stop; sym++) {
- symname = info.strtab + sym->st_name;
+ symname = remove_dot(info.strtab + sym->st_name);

handle_modversions(mod, &info, sym, symname);
handle_moddevtable(mod, &info, sym, symname);
diff --git a/scripts/mod/modpost.h b/scripts/mod/modpost.h
index 51207e4..168b43d 100644
--- a/scripts/mod/modpost.h
+++ b/scripts/mod/modpost.h
@@ -127,7 +127,7 @@ struct elf_info {
Elf_Section export_gpl_sec;
Elf_Section export_unused_gpl_sec;
Elf_Section export_gpl_future_sec;
- const char *strtab;
+ char *strtab;
char *modinfo;
unsigned int modinfo_len;

--
1.7.7.6

2012-08-19 03:00:15

by Andi Kleen

Subject: [PATCH 38/74] lto, watchdog/hpwdt.c: Make assembler label global

From: Andi Kleen <[email protected]>

We cannot assume that inline assembler code always ends up
in the same file as the original C file. So make global any assembler
labels that C code references with "extern".

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
drivers/watchdog/hpwdt.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index 1eff743..68bda60 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -161,7 +161,8 @@ extern asmlinkage void asminline_call(struct cmn_registers *pi86Regs,
#define HPWDT_ARCH 32

asm(".text \n\t"
- ".align 4 \n"
+ ".align 4 \n\t"
+ ".globl asminline_call \n"
"asminline_call: \n\t"
"pushl %ebp \n\t"
"movl %esp, %ebp \n\t"
@@ -351,7 +352,8 @@ static int __devinit detect_cru_service(void)
#define HPWDT_ARCH 64

asm(".text \n\t"
- ".align 4 \n"
+ ".align 4 \n\t"
+ ".globl asminline_call \n"
"asminline_call: \n\t"
"pushq %rbp \n\t"
"movq %rsp, %rbp \n\t"
--
1.7.7.6

2012-08-19 03:00:22

by Andi Kleen

Subject: [PATCH 14/74] sections: Add __visible to kernel/* sections

From: Andi Kleen <[email protected]>

Signed-off-by: Andi Kleen <[email protected]>
---
kernel/extable.c | 4 ++--
kernel/ksysfs.c | 4 ++--
kernel/module.c | 30 +++++++++++++++---------------
kernel/params.c | 6 +++---
kernel/spinlock.c | 2 +-
5 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/kernel/extable.c b/kernel/extable.c
index fe35a63..2328718 100644
--- a/kernel/extable.c
+++ b/kernel/extable.c
@@ -32,8 +32,8 @@
*/
DEFINE_MUTEX(text_mutex);

-extern struct exception_table_entry __start___ex_table[];
-extern struct exception_table_entry __stop___ex_table[];
+extern __visible struct exception_table_entry __start___ex_table[];
+extern __visible struct exception_table_entry __stop___ex_table[];

/* Cleared by build time tools if the table is already sorted. */
u32 __initdata main_extable_sort_needed = 1;
diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
index 4e316e1..627314f 100644
--- a/kernel/ksysfs.c
+++ b/kernel/ksysfs.c
@@ -144,8 +144,8 @@ KERNEL_ATTR_RO(fscaps);
/*
* Make /sys/kernel/notes give the raw contents of our kernel .notes section.
*/
-extern const void __start_notes __attribute__((weak));
-extern const void __stop_notes __attribute__((weak));
+extern __visible const void __start_notes __attribute__((weak));
+extern __visible const void __stop_notes __attribute__((weak));
#define notes_size (&__stop_notes - &__start_notes)

static ssize_t notes_read(struct file *filp, struct kobject *kobj,
diff --git a/kernel/module.c b/kernel/module.c
index 4edbd9c..c00565a 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -206,22 +206,22 @@ static void *section_objs(const struct load_info *info,
}

/* Provided by the linker */
-extern const struct kernel_symbol __start___ksymtab[];
-extern const struct kernel_symbol __stop___ksymtab[];
-extern const struct kernel_symbol __start___ksymtab_gpl[];
-extern const struct kernel_symbol __stop___ksymtab_gpl[];
-extern const struct kernel_symbol __start___ksymtab_gpl_future[];
-extern const struct kernel_symbol __stop___ksymtab_gpl_future[];
-extern const unsigned long __start___kcrctab[];
-extern const unsigned long __start___kcrctab_gpl[];
-extern const unsigned long __start___kcrctab_gpl_future[];
+extern __visible const struct kernel_symbol __start___ksymtab[];
+extern __visible const struct kernel_symbol __stop___ksymtab[];
+extern __visible const struct kernel_symbol __start___ksymtab_gpl[];
+extern __visible const struct kernel_symbol __stop___ksymtab_gpl[];
+extern __visible const struct kernel_symbol __start___ksymtab_gpl_future[];
+extern __visible const struct kernel_symbol __stop___ksymtab_gpl_future[];
+extern __visible const unsigned long __start___kcrctab[];
+extern __visible const unsigned long __start___kcrctab_gpl[];
+extern __visible const unsigned long __start___kcrctab_gpl_future[];
#ifdef CONFIG_UNUSED_SYMBOLS
-extern const struct kernel_symbol __start___ksymtab_unused[];
-extern const struct kernel_symbol __stop___ksymtab_unused[];
-extern const struct kernel_symbol __start___ksymtab_unused_gpl[];
-extern const struct kernel_symbol __stop___ksymtab_unused_gpl[];
-extern const unsigned long __start___kcrctab_unused[];
-extern const unsigned long __start___kcrctab_unused_gpl[];
+extern __visible const struct kernel_symbol __start___ksymtab_unused[];
+extern __visible const struct kernel_symbol __stop___ksymtab_unused[];
+extern __visible const struct kernel_symbol __start___ksymtab_unused_gpl[];
+extern __visible const struct kernel_symbol __stop___ksymtab_unused_gpl[];
+extern __visible const unsigned long __start___kcrctab_unused[];
+extern __visible const unsigned long __start___kcrctab_unused_gpl[];
#endif

#ifndef CONFIG_MODVERSIONS
diff --git a/kernel/params.c b/kernel/params.c
index ed35345..c3e61dc 100644
--- a/kernel/params.c
+++ b/kernel/params.c
@@ -503,7 +503,7 @@ EXPORT_SYMBOL(param_ops_string);
#define to_module_attr(n) container_of(n, struct module_attribute, attr)
#define to_module_kobject(n) container_of(n, struct module_kobject, kobj)

-extern struct kernel_param __start___param[], __stop___param[];
+extern __visible struct kernel_param __start___param[], __stop___param[];

struct param_attribute
{
@@ -827,8 +827,8 @@ ssize_t __modver_version_show(struct module_attribute *mattr,
return sprintf(buf, "%s\n", vattr->version);
}

-extern const struct module_version_attribute *__start___modver[];
-extern const struct module_version_attribute *__stop___modver[];
+extern __visible const struct module_version_attribute *__start___modver[];
+extern __visible const struct module_version_attribute *__stop___modver[];

static void __init version_sysfs_builtin(void)
{
diff --git a/kernel/spinlock.c b/kernel/spinlock.c
index 5cdd806..75439be 100644
--- a/kernel/spinlock.c
+++ b/kernel/spinlock.c
@@ -377,7 +377,7 @@ EXPORT_SYMBOL(_raw_spin_lock_nest_lock);
notrace int in_lock_functions(unsigned long addr)
{
/* Linker adds these: start and end of __lockfunc functions */
- extern char __lock_text_start[], __lock_text_end[];
+ extern __visible char __lock_text_start[], __lock_text_end[];

return addr >= (unsigned long)__lock_text_start
&& addr < (unsigned long)__lock_text_end;
--
1.7.7.6

2012-08-19 03:00:25

by Andi Kleen

Subject: [PATCH 05/74] sections: Add __visible to m68k sections

From: Andi Kleen <[email protected]>

Signed-off-by: Andi Kleen <[email protected]>
---
arch/m68k/include/asm/module.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/m68k/include/asm/module.h b/arch/m68k/include/asm/module.h
index edffe66..15cb110 100644
--- a/arch/m68k/include/asm/module.h
+++ b/arch/m68k/include/asm/module.h
@@ -30,7 +30,7 @@ struct mod_arch_specific {

#endif /* CONFIG_MMU */

-extern struct m68k_fixup_info __start_fixup[], __stop_fixup[];
+extern __visible struct m68k_fixup_info __start_fixup[], __stop_fixup[];

struct module;
extern void module_fixup(struct module *mod, struct m68k_fixup_info *start,
--
1.7.7.6

2012-08-19 03:00:35

by Andi Kleen

Subject: [PATCH 12/74] sections: Add __visible to jump_label sections

From: Andi Kleen <[email protected]>

Signed-off-by: Andi Kleen <[email protected]>
---
include/linux/jump_label.h | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 0976fc4..a39e8e3 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -106,8 +106,8 @@ static __always_inline bool static_key_true(struct static_key *key)
return !static_key_false(key);
}

-extern struct jump_entry __start___jump_table[];
-extern struct jump_entry __stop___jump_table[];
+extern __visible struct jump_entry __start___jump_table[];
+extern __visible struct jump_entry __stop___jump_table[];

extern void jump_label_init(void);
extern void jump_label_lock(void);
--
1.7.7.6

2012-08-19 03:00:32

by Andi Kleen

Subject: [PATCH 47/74] x86, lto: Fix kprobes for LTO

From: Andi Kleen <[email protected]>

- Make all the external assembler template symbols __visible
- Move the templates' inline assembler code into a top-level
assembler statement, not inside a function. This avoids it being
optimized away or cloned.

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/include/asm/kprobes.h | 8 ++++----
arch/x86/kernel/kprobes-opt.c | 5 +----
2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
index a967173..fa2d12e 100644
--- a/arch/x86/include/asm/kprobes.h
+++ b/arch/x86/include/asm/kprobes.h
@@ -48,10 +48,10 @@ typedef u8 kprobe_opcode_t;
#define flush_insn_slot(p) do { } while (0)

/* optinsn template addresses */
-extern kprobe_opcode_t optprobe_template_entry;
-extern kprobe_opcode_t optprobe_template_val;
-extern kprobe_opcode_t optprobe_template_call;
-extern kprobe_opcode_t optprobe_template_end;
+extern __visible kprobe_opcode_t optprobe_template_entry;
+extern __visible kprobe_opcode_t optprobe_template_val;
+extern __visible kprobe_opcode_t optprobe_template_call;
+extern __visible kprobe_opcode_t optprobe_template_end;
#define MAX_OPTIMIZED_LENGTH (MAX_INSN_SIZE + RELATIVE_ADDR_SIZE)
#define MAX_OPTINSN_SIZE \
(((unsigned long)&optprobe_template_end - \
diff --git a/arch/x86/kernel/kprobes-opt.c b/arch/x86/kernel/kprobes-opt.c
index c5e410e..43c34e8 100644
--- a/arch/x86/kernel/kprobes-opt.c
+++ b/arch/x86/kernel/kprobes-opt.c
@@ -88,9 +88,7 @@ static void __kprobes synthesize_set_arg1(kprobe_opcode_t *addr, unsigned long v
*(unsigned long *)addr = val;
}

-static void __used __kprobes kprobes_optinsn_template_holder(void)
-{
- asm volatile (
+asm (
".global optprobe_template_entry\n"
"optprobe_template_entry:\n"
#ifdef CONFIG_X86_64
@@ -129,7 +127,6 @@ static void __used __kprobes kprobes_optinsn_template_holder(void)
#endif
".global optprobe_template_end\n"
"optprobe_template_end:\n");
-}

#define TMPL_MOVE_IDX \
((long)&optprobe_template_val - (long)&optprobe_template_entry)
--
1.7.7.6

2012-08-19 03:00:42

by Andi Kleen

Subject: [PATCH 08/74] sections: Add __visible to tile sections

From: Andi Kleen <[email protected]>

Signed-off-by: Andi Kleen <[email protected]>
---
arch/tile/include/asm/sections.h | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/tile/include/asm/sections.h b/arch/tile/include/asm/sections.h
index d062d46..edee549 100644
--- a/arch/tile/include/asm/sections.h
+++ b/arch/tile/include/asm/sections.h
@@ -27,11 +27,11 @@ extern char __w1data_begin[], __w1data_end[];


/* Not exactly sections, but PC comparison points in the code. */
-extern char __rt_sigreturn[], __rt_sigreturn_end[];
+extern __visible char __rt_sigreturn[], __rt_sigreturn_end[];
#ifndef __tilegx__
-extern char sys_cmpxchg[], __sys_cmpxchg_end[];
-extern char __sys_cmpxchg_grab_lock[];
-extern char __start_atomic_asm_code[], __end_atomic_asm_code[];
+extern __visible char sys_cmpxchg[], __sys_cmpxchg_end[];
+extern __visible char __sys_cmpxchg_grab_lock[];
+extern __visible char __start_atomic_asm_code[], __end_atomic_asm_code[];
#endif

/* Handle the discontiguity between _sdata and _stext. */
--
1.7.7.6

2012-08-19 03:00:39

by Andi Kleen

Subject: [PATCH 17/74] lto: Make asmlinkage __visible

From: Andi Kleen <[email protected]>

Signed-off-by: Andi Kleen <[email protected]>
---
include/linux/linkage.h | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/linkage.h b/include/linux/linkage.h
index 807f1e5..8d6ff27 100644
--- a/include/linux/linkage.h
+++ b/include/linux/linkage.h
@@ -5,9 +5,9 @@
#include <asm/linkage.h>

#ifdef __cplusplus
-#define CPP_ASMLINKAGE extern "C"
+#define CPP_ASMLINKAGE extern "C" __visible
#else
-#define CPP_ASMLINKAGE
+#define CPP_ASMLINKAGE __visible
#endif

#ifndef asmlinkage
--
1.7.7.6

2012-08-19 03:01:05

by Andi Kleen

Subject: [PATCH 26/74] lto, sound: Fix export symbols for !CONFIG_MODULES

From: Andi Kleen <[email protected]>

The new LTO EXPORT_SYMBOL references symbols even without CONFIG_MODULES.
Since these functions are macros in that case, this doesn't work.
Add an ifdef to fix the build.

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
sound/core/seq/seq_device.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/sound/core/seq/seq_device.c b/sound/core/seq/seq_device.c
index 5cf8d65..60e8fc1 100644
--- a/sound/core/seq/seq_device.c
+++ b/sound/core/seq/seq_device.c
@@ -569,5 +569,7 @@ EXPORT_SYMBOL(snd_seq_device_load_drivers);
EXPORT_SYMBOL(snd_seq_device_new);
EXPORT_SYMBOL(snd_seq_device_register_driver);
EXPORT_SYMBOL(snd_seq_device_unregister_driver);
+#ifdef CONFIG_MODULES
EXPORT_SYMBOL(snd_seq_autoload_lock);
EXPORT_SYMBOL(snd_seq_autoload_unlock);
+#endif
--
1.7.7.6

2012-08-19 03:01:15

by Andi Kleen

Subject: [PATCH 32/74] lto, PNP: Fix the inline assembler to use asmlinkage symbols

From: Andi Kleen <[email protected]>

Signed-off-by: Andi Kleen <[email protected]>
---
drivers/pnp/pnpbios/bioscalls.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/pnp/pnpbios/bioscalls.c b/drivers/pnp/pnpbios/bioscalls.c
index 769d265..53a69a3 100644
--- a/drivers/pnp/pnpbios/bioscalls.c
+++ b/drivers/pnp/pnpbios/bioscalls.c
@@ -21,7 +21,7 @@

#include "pnpbios.h"

-static struct {
+__visible struct {
u16 offset;
u16 segment;
} pnp_bios_callpoint;
@@ -41,6 +41,7 @@ asmlinkage void pnp_bios_callfunc(void);

__asm__(".text \n"
__ALIGN_STR "\n"
+ ".globl pnp_bios_callfunc\n"
"pnp_bios_callfunc:\n"
" pushl %edx \n"
" pushl %ecx \n"
--
1.7.7.6

2012-08-19 03:01:22

by Andi Kleen

Subject: [PATCH 24/74] lto: Mark do_exit asmlinkage

From: Andi Kleen <[email protected]>

... since it can be called from assembler code.

Signed-off-by: Andi Kleen <[email protected]>
---
include/linux/kernel.h | 2 +-
kernel/exit.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 6043821..73a1a54 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -209,7 +209,7 @@ extern void oops_enter(void);
extern void oops_exit(void);
void print_oops_end_marker(void);
extern int oops_may_print(void);
-void do_exit(long error_code)
+asmlinkage void do_exit(long error_code)
__noreturn;
void complete_and_exit(struct completion *, long)
__noreturn;
diff --git a/kernel/exit.c b/kernel/exit.c
index f65345f..7de9a3e 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -902,7 +902,7 @@ static void check_stack_usage(void)
static inline void check_stack_usage(void) {}
#endif

-void do_exit(long code)
+asmlinkage void do_exit(long code)
{
struct task_struct *tsk = current;
int group_dead;
--
1.7.7.6

2012-08-19 03:01:29

by Andi Kleen

Subject: [PATCH 69/74] lto: Increase kallsyms max symbol length

From: Joe Mario <[email protected]>

With the postfixes that LTO adds for local symbols, the longest
name in the kernel overflows the namebuf[KSYM_NAME_LEN] array by
two bytes. That name is:
__pci_fixup_resumePCI_VENDOR_ID_SERVERWORKSPCI_DEVICE_ID_SERVERWORKS_HT1000SBquirk_disable_broadcom_boot_interrupt.1488004.672802

Double the max symbol name length.

Signed-off-by: Andi Kleen <[email protected]>
---
include/linux/kallsyms.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index 6883e19..711a50f 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -9,7 +9,7 @@
#include <linux/kernel.h>
#include <linux/stddef.h>

-#define KSYM_NAME_LEN 128
+#define KSYM_NAME_LEN 256
#define KSYM_SYMBOL_LEN (sizeof("%s+%#lx/%#lx [%s]") + (KSYM_NAME_LEN - 1) + \
2*(BITS_PER_LONG*3/10) + (MODULE_NAME_LEN - 1) + 1)

--
1.7.7.6

2012-08-19 03:01:43

by Andi Kleen

Subject: [PATCH 31/74] x86, lto: Make various variables used by assembler code __visible

From: Andi Kleen <[email protected]>

Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/kernel/cpu/common.c | 4 ++--
arch/x86/kernel/process_64.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 46d8786..8f12e8c 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1075,7 +1075,7 @@ EXPORT_PER_CPU_SYMBOL(kernel_stack);
DEFINE_PER_CPU(char *, irq_stack_ptr) =
init_per_cpu_var(irq_stack_union.irq_stack) + IRQ_STACK_SIZE - 64;

-DEFINE_PER_CPU(unsigned int, irq_count) = -1;
+DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;

DEFINE_PER_CPU(struct task_struct *, fpu_owner_task);

@@ -1114,7 +1114,7 @@ void syscall_init(void)
X86_EFLAGS_TF|X86_EFLAGS_DF|X86_EFLAGS_IF|X86_EFLAGS_IOPL);
}

-unsigned long kernel_eflags;
+unsigned long kernel_eflags __visible;

/*
* Copies of the original ist values from the tss are only accessed during
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index a5720ed..34435e2 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -52,7 +52,7 @@

asmlinkage extern void ret_from_fork(void);

-DEFINE_PER_CPU(unsigned long, old_rsp);
+asmlinkage DEFINE_PER_CPU(unsigned long, old_rsp);

/* Prints also some state that isn't saved in the pt_regs */
void __show_regs(struct pt_regs *regs, int all)
--
1.7.7.6

2012-08-19 03:01:33

by Andi Kleen

Subject: [PATCH 66/74] Kbuild, lto: Handle basic LTO in modpost

From: Andi Kleen <[email protected]>

- Don't warn for __gnu_lto_* COMMON symbols
- Don't complain about .gnu.lto* sections

Signed-off-by: Andi Kleen <[email protected]>
---
scripts/mod/modpost.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index ccd34ff..11fc5a6 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -600,7 +600,8 @@ static void handle_modversions(struct module *mod, struct elf_info *info,

switch (sym->st_shndx) {
case SHN_COMMON:
- warn("\"%s\" [%s] is COMMON symbol\n", symname, mod->name);
+ if (strncmp(symname, "__gnu_lto_", sizeof("__gnu_lto_")-1))
+ warn("\"%s\" [%s] is COMMON symbol\n", symname, mod->name);
break;
case SHN_ABS:
/* CRC'd symbol */
@@ -827,6 +828,7 @@ static const char *section_white_list[] =
".note*",
".got*",
".toc*",
+ ".gnu.lto*",
NULL
};

--
1.7.7.6

2012-08-19 03:01:46

by Andi Kleen

Subject: [PATCH 22/74] lto: Change kernel_execve to asmlinkage for all architectures

From: Andi Kleen <[email protected]>

The x86 kernel_execve has to be asmlinkage because it's called from
assembler code. To keep it consistent I also have to change the
prototype in linux/syscalls.h.

This in turn requires adding asmlinkage to all architectures.

Do this here in a tree sweep.

Signed-off-by: Andi Kleen <[email protected]>
---
arch/arm/kernel/sys_arm.c | 1 +
arch/avr32/kernel/sys_avr32.c | 1 +
arch/m32r/kernel/sys_m32r.c | 1 +
arch/microblaze/kernel/sys_microblaze.c | 1 +
arch/mips/kernel/syscall.c | 1 +
arch/parisc/kernel/process.c | 1 +
arch/sh/kernel/sys_sh32.c | 1 +
arch/sh/kernel/sys_sh64.c | 1 +
arch/sparc/kernel/sys_sparc_32.c | 1 +
arch/sparc/kernel/sys_sparc_64.c | 1 +
arch/um/kernel/syscall.c | 1 +
arch/unicore32/kernel/sys.c | 1 +
arch/x86/kernel/sys_i386_32.c | 2 +-
include/linux/syscalls.h | 1 +
14 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/arch/arm/kernel/sys_arm.c b/arch/arm/kernel/sys_arm.c
index 76cbb05..50b1bf1 100644
--- a/arch/arm/kernel/sys_arm.c
+++ b/arch/arm/kernel/sys_arm.c
@@ -79,6 +79,7 @@ out:
return error;
}

+asmlinkage
int kernel_execve(const char *filename,
const char *const argv[],
const char *const envp[])
diff --git a/arch/avr32/kernel/sys_avr32.c b/arch/avr32/kernel/sys_avr32.c
index 62635a0..94d7121 100644
--- a/arch/avr32/kernel/sys_avr32.c
+++ b/arch/avr32/kernel/sys_avr32.c
@@ -7,6 +7,7 @@
*/
#include <linux/unistd.h>

+asmlinkage
int kernel_execve(const char *file,
const char *const *argv,
const char *const *envp)
diff --git a/arch/m32r/kernel/sys_m32r.c b/arch/m32r/kernel/sys_m32r.c
index d841fb6..b7b2581 100644
--- a/arch/m32r/kernel/sys_m32r.c
+++ b/arch/m32r/kernel/sys_m32r.c
@@ -93,6 +93,7 @@ asmlinkage int sys_cachectl(char *addr, int nbytes, int op)
* Do a system call from kernel instead of calling sys_execve so we
* end up with proper pt_regs.
*/
+asmlinkage
int kernel_execve(const char *filename,
const char *const argv[],
const char *const envp[])
diff --git a/arch/microblaze/kernel/sys_microblaze.c b/arch/microblaze/kernel/sys_microblaze.c
index e5b154f..396b157 100644
--- a/arch/microblaze/kernel/sys_microblaze.c
+++ b/arch/microblaze/kernel/sys_microblaze.c
@@ -80,6 +80,7 @@ asmlinkage long sys_mmap(unsigned long addr, unsigned long len,
* Do a system call from kernel instead of calling sys_execve so we
* end up with proper pt_regs.
*/
+asmlinkage
int kernel_execve(const char *filename,
const char *const argv[],
const char *const envp[])
diff --git a/arch/mips/kernel/syscall.c b/arch/mips/kernel/syscall.c
index b08220c..6282a7a 100644
--- a/arch/mips/kernel/syscall.c
+++ b/arch/mips/kernel/syscall.c
@@ -318,6 +318,7 @@ asmlinkage void bad_stack(void)
* Do a system call from kernel instead of calling sys_execve so we
* end up with proper pt_regs.
*/
+asmlinkage
int kernel_execve(const char *filename,
const char *const argv[],
const char *const envp[])
diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
index d4b94b3..82f425e 100644
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -358,6 +358,7 @@ out:
extern int __execve(const char *filename,
const char *const argv[],
const char *const envp[], struct task_struct *task);
+asmlinkage
int kernel_execve(const char *filename,
const char *const argv[],
const char *const envp[])
diff --git a/arch/sh/kernel/sys_sh32.c b/arch/sh/kernel/sys_sh32.c
index f56b6fe5..db3f53d 100644
--- a/arch/sh/kernel/sys_sh32.c
+++ b/arch/sh/kernel/sys_sh32.c
@@ -71,6 +71,7 @@ asmlinkage int sys_fadvise64_64_wrapper(int fd, u32 offset0, u32 offset1,
* Do a system call from kernel instead of calling sys_execve so we
* end up with proper pt_regs.
*/
+asmlinkage
int kernel_execve(const char *filename,
const char *const argv[],
const char *const envp[])
diff --git a/arch/sh/kernel/sys_sh64.c b/arch/sh/kernel/sys_sh64.c
index c5a38c4..cad6faa 100644
--- a/arch/sh/kernel/sys_sh64.c
+++ b/arch/sh/kernel/sys_sh64.c
@@ -33,6 +33,7 @@
* Do a system call from kernel instead of calling sys_execve so we
* end up with proper pt_regs.
*/
+asmlinkage
int kernel_execve(const char *filename,
const char *const argv[],
const char *const envp[])
diff --git a/arch/sparc/kernel/sys_sparc_32.c b/arch/sparc/kernel/sys_sparc_32.c
index 0c9b31b..37e75dd 100644
--- a/arch/sparc/kernel/sys_sparc_32.c
+++ b/arch/sparc/kernel/sys_sparc_32.c
@@ -263,6 +263,7 @@ out:
* Do a system call from kernel instead of calling sys_execve so we
* end up with proper pt_regs.
*/
+asmlinkage
int kernel_execve(const char *filename,
const char *const argv[],
const char *const envp[])
diff --git a/arch/sparc/kernel/sys_sparc_64.c b/arch/sparc/kernel/sys_sparc_64.c
index 11c6c96..04e57f9 100644
--- a/arch/sparc/kernel/sys_sparc_64.c
+++ b/arch/sparc/kernel/sys_sparc_64.c
@@ -734,6 +734,7 @@ SYSCALL_DEFINE5(rt_sigaction, int, sig, const struct sigaction __user *, act,
* Do a system call from kernel instead of calling sys_execve so we
* end up with proper pt_regs.
*/
+asmlinkage
int kernel_execve(const char *filename,
const char *const argv[],
const char *const envp[])
diff --git a/arch/um/kernel/syscall.c b/arch/um/kernel/syscall.c
index f958cb8..b78e579 100644
--- a/arch/um/kernel/syscall.c
+++ b/arch/um/kernel/syscall.c
@@ -51,6 +51,7 @@ long old_mmap(unsigned long addr, unsigned long len,
return err;
}

+asmlinkage
int kernel_execve(const char *filename,
const char *const argv[],
const char *const envp[])
diff --git a/arch/unicore32/kernel/sys.c b/arch/unicore32/kernel/sys.c
index 3afe60a..12fc324 100644
--- a/arch/unicore32/kernel/sys.c
+++ b/arch/unicore32/kernel/sys.c
@@ -63,6 +63,7 @@ out:
return error;
}

+asmlinkage
int kernel_execve(const char *filename,
const char *const argv[],
const char *const envp[])
diff --git a/arch/x86/kernel/sys_i386_32.c b/arch/x86/kernel/sys_i386_32.c
index 0b0cb5f..30ad2cd 100644
--- a/arch/x86/kernel/sys_i386_32.c
+++ b/arch/x86/kernel/sys_i386_32.c
@@ -28,7 +28,7 @@
* Do a system call from kernel instead of calling sys_execve so we
* end up with proper pt_regs.
*/
-int kernel_execve(const char *filename,
+asmlinkage int kernel_execve(const char *filename,
const char *const argv[],
const char *const envp[])
{
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 19439c7..5cf40e3 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -827,6 +827,7 @@ asmlinkage long sys_fanotify_mark(int fanotify_fd, unsigned int flags,
const char __user *pathname);
asmlinkage long sys_syncfs(int fd);

+asmlinkage
int kernel_execve(const char *filename, const char *const argv[], const char *const envp[]);


--
1.7.7.6

2012-08-19 03:01:57

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 23/74] lto, mutex: Mark __visible

From: Andi Kleen <[email protected]>

Various kernel/mutex.c functions can be called from
inline assembler, so they should all be global and
marked __visible.
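As a sketch of what __visible buys here: under -flto -fwhole-program, gcc may discard or rename a function whose only callers are hidden inside inline assembly, and the externally_visible attribute forces gcc to keep the symbol with its real name. The macro and function names below are illustrative stand-ins, not the kernel's actual definitions:

```c
#include <assert.h>

/* stand-in for the kernel's __visible annotation */
#define __visible __attribute__((externally_visible))

__visible int slowpath_calls;

/* Under -flto -fwhole-program a function referenced only from inline
 * asm looks unused to the optimizer; externally_visible keeps it
 * around with its real name so the asm reference still resolves. */
__visible void mutex_lock_slowpath_sketch(void)
{
	slowpath_calls++;
}
```

A fastpath would then reach it with something like asm("call mutex_lock_slowpath_sketch"), which the LTO optimizer cannot see.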

Signed-off-by: Andi Kleen <[email protected]>
---
kernel/mutex.c | 9 ++++-----
1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/kernel/mutex.c b/kernel/mutex.c
index a307cc9..fef1585 100644
--- a/kernel/mutex.c
+++ b/kernel/mutex.c
@@ -56,8 +56,7 @@ EXPORT_SYMBOL(__mutex_init);
* We also put the fastpath first in the kernel image, to make sure the
* branch is predicted by the CPU as default-untaken.
*/
-static __used noinline void __sched
-__mutex_lock_slowpath(atomic_t *lock_count);
+__visible void __sched __mutex_lock_slowpath(atomic_t *lock_count);

/**
* mutex_lock - acquire the mutex
@@ -94,7 +93,7 @@ void __sched mutex_lock(struct mutex *lock)
EXPORT_SYMBOL(mutex_lock);
#endif

-static __used noinline void __sched __mutex_unlock_slowpath(atomic_t *lock_count);
+__visible void __sched __mutex_unlock_slowpath(atomic_t *lock_count);

/**
* mutex_unlock - release the mutex
@@ -338,7 +337,7 @@ __mutex_unlock_common_slowpath(atomic_t *lock_count, int nested)
/*
* Release the lock, slowpath:
*/
-static __used noinline void
+__visible void
__mutex_unlock_slowpath(atomic_t *lock_count)
{
__mutex_unlock_common_slowpath(lock_count, 1);
@@ -395,7 +394,7 @@ int __sched mutex_lock_killable(struct mutex *lock)
}
EXPORT_SYMBOL(mutex_lock_killable);

-static __used noinline void __sched
+__visible void __sched
__mutex_lock_slowpath(atomic_t *lock_count)
{
struct mutex *lock = container_of(lock_count, struct mutex, count);
--
1.7.7.6

2012-08-19 03:01:40

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 40/74] lto, powerpc: Disable LTO for the powerpc VDSO

From: Andi Kleen <[email protected]>

The VDSO does not play well with LTO, so just disable LTO for it.

(note that powerpc will likely need more changes for LTO, this was
just from grep)

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
arch/powerpc/kernel/vdso32/Makefile | 2 +-
arch/powerpc/kernel/vdso64/Makefile | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
index 53e6c9b..8cc88bf 100644
--- a/arch/powerpc/kernel/vdso32/Makefile
+++ b/arch/powerpc/kernel/vdso32/Makefile
@@ -16,7 +16,7 @@ obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32))

GCOV_PROFILE := n

-ccflags-y := -shared -fno-common -fno-builtin
+ccflags-y := -shared -fno-common -fno-builtin $(DISABLE_LTO)
ccflags-y += -nostdlib -Wl,-soname=linux-vdso32.so.1 \
$(call cc-ldoption, -Wl$(comma)--hash-style=sysv)
asflags-y := -D__VDSO32__ -s
diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile
index effca94..5bca644 100644
--- a/arch/powerpc/kernel/vdso64/Makefile
+++ b/arch/powerpc/kernel/vdso64/Makefile
@@ -9,7 +9,7 @@ obj-vdso64 := $(addprefix $(obj)/, $(obj-vdso64))

GCOV_PROFILE := n

-ccflags-y := -shared -fno-common -fno-builtin
+ccflags-y := -shared -fno-common -fno-builtin $(DISABLE_LTO)
ccflags-y += -nostdlib -Wl,-soname=linux-vdso64.so.1 \
$(call cc-ldoption, -Wl$(comma)--hash-style=sysv)
asflags-y := -D__VDSO64__ -s
--
1.7.7.6

2012-08-19 03:02:00

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 71/74] lto, kprobes: Use KSYM_NAME_LEN to size identifier buffers

From: Joe Mario <[email protected]>

Use KSYM_NAME_LEN to size identifier buffers, so that the
limit can be increased more easily.

Cc: [email protected]
Signed-off-by: Joe Mario <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
---
kernel/kprobes.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index c62b854..b9bd2a8 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1955,7 +1955,7 @@ static int __init init_kprobes(void)
{
int i, err = 0;
unsigned long offset = 0, size = 0;
- char *modname, namebuf[128];
+ char *modname, namebuf[KSYM_NAME_LEN];
const char *symbol_name;
void *addr;
struct kprobe_blackpoint *kb;
@@ -2081,7 +2081,7 @@ static int __kprobes show_kprobe_addr(struct seq_file *pi, void *v)
const char *sym = NULL;
unsigned int i = *(loff_t *) v;
unsigned long offset = 0;
- char *modname, namebuf[128];
+ char *modname, namebuf[KSYM_NAME_LEN];

head = &kprobe_table[i];
preempt_disable();
--
1.7.7.6

2012-08-19 03:02:18

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 41/74] x86, lto: Disable LTO for the x86 VDSO

From: Andi Kleen <[email protected]>

The VDSO does not play well with LTO, so just disable LTO for it.
Also pass a 32bit linker flag for the 32bit version.

Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/vdso/Makefile | 10 +++++++---
1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/vdso/Makefile b/arch/x86/vdso/Makefile
index fd14be1..b2e15f2 100644
--- a/arch/x86/vdso/Makefile
+++ b/arch/x86/vdso/Makefile
@@ -2,6 +2,8 @@
# Building vDSO images for x86.
#

+KBUILD_CFLAGS += ${DISABLE_LTO}
+
VDSO64-$(CONFIG_X86_64) := y
VDSOX32-$(CONFIG_X86_X32_ABI) := y
VDSO32-$(CONFIG_X86_32) := y
@@ -35,7 +37,8 @@ export CPPFLAGS_vdso.lds += -P -C

VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
-Wl,--no-undefined \
- -Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096
+ -Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096 \
+ $(DISABLE_LTO)

$(obj)/vdso.o: $(src)/vdso.S $(obj)/vdso.so

@@ -127,7 +130,7 @@ vdso32.so-$(VDSO32-y) += sysenter
vdso32-images = $(vdso32.so-y:%=vdso32-%.so)

CPPFLAGS_vdso32.lds = $(CPPFLAGS_vdso.lds)
-VDSO_LDFLAGS_vdso32.lds = -m32 -Wl,-soname=linux-gate.so.1
+VDSO_LDFLAGS_vdso32.lds = -m32 -Wl,-m,elf_i386 -Wl,-soname=linux-gate.so.1

# This makes sure the $(obj) subdirectory exists even though vdso32/
# is not a kbuild sub-make subdirectory.
@@ -181,7 +184,8 @@ quiet_cmd_vdso = VDSO $@
-Wl,-T,$(filter %.lds,$^) $(filter %.o,$^) && \
sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@'

-VDSO_LDFLAGS = -fPIC -shared $(call cc-ldoption, -Wl$(comma)--hash-style=sysv)
+VDSO_LDFLAGS = -fPIC -shared $(call cc-ldoption, -Wl$(comma)--hash-style=sysv) \
+ ${LTO_CFLAGS}
GCOV_PROFILE := n

#
--
1.7.7.6

2012-08-19 03:02:26

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 45/74] lto: Mark rwsem functions that can be called from assembler asmlinkage

From: Andi Kleen <[email protected]>

Signed-off-by: Andi Kleen <[email protected]>
---
lib/rwsem.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/lib/rwsem.c b/lib/rwsem.c
index 8337e1b..4a33b58 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -222,6 +222,7 @@ rwsem_down_failed_common(struct rw_semaphore *sem,
/*
* wait for the read lock to be granted
*/
+__visible
struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
{
return rwsem_down_failed_common(sem, RWSEM_WAITING_FOR_READ,
@@ -231,6 +232,7 @@ struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
/*
* wait for the write lock to be granted
*/
+__visible
struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
{
return rwsem_down_failed_common(sem, RWSEM_WAITING_FOR_WRITE,
@@ -241,6 +243,7 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
* handle waking up a waiter on the semaphore
* - up_read/up_write has decremented the active part of count if we come here
*/
+__visible
struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
{
unsigned long flags;
@@ -261,6 +264,7 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
* - caller incremented waiting part of count and discovered it still negative
* - just wake up any readers at the front of the queue
*/
+__visible
struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem)
{
unsigned long flags;
--
1.7.7.6

2012-08-19 03:02:13

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 49/74] x86, lto, paravirt: Add __visible/asmlinkage to xen paravirt ops

From: Andi Kleen <[email protected]>

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/include/asm/paravirt_types.h | 3 ++-
arch/x86/xen/xen-ops.h | 16 ++++++++--------
2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 142236e..4f262bc 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -386,7 +386,8 @@ extern struct pv_lock_ops pv_lock_ops;

/* Simple instruction patching code. */
#define DEF_NATIVE(ops, name, code) \
- extern const char start_##ops##_##name[], end_##ops##_##name[]; \
+ extern const char start_##ops##_##name[] __visible, \
+ end_##ops##_##name[] __visible; \
asm("start_" #ops "_" #name ": " code "; end_" #ops "_" #name ":")

unsigned paravirt_patch_nop(void);
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 1e4329e..1c4c94e 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -105,9 +105,9 @@ static inline void __init xen_init_apic(void)
/* Declare an asm function, along with symbols needed to make it
inlineable */
#define DECL_ASM(ret, name, ...) \
- ret name(__VA_ARGS__); \
- extern char name##_end[]; \
- extern char name##_reloc[] \
+ asmlinkage ret name(__VA_ARGS__); \
+ extern char name##_end[] __visible; \
+ extern char name##_reloc[] __visible

DECL_ASM(void, xen_irq_enable_direct, void);
DECL_ASM(void, xen_irq_disable_direct, void);
@@ -115,11 +115,11 @@ DECL_ASM(unsigned long, xen_save_fl_direct, void);
DECL_ASM(void, xen_restore_fl_direct, unsigned long);

/* These are not functions, and cannot be called normally */
-void xen_iret(void);
-void xen_sysexit(void);
-void xen_sysret32(void);
-void xen_sysret64(void);
-void xen_adjust_exception_frame(void);
+asmlinkage void xen_iret(void);
+asmlinkage void xen_sysexit(void);
+asmlinkage void xen_sysret32(void);
+asmlinkage void xen_sysret64(void);
+asmlinkage void xen_adjust_exception_frame(void);

extern int xen_panic_handler_init(void);

--
1.7.7.6

2012-08-19 03:02:42

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 43/74] lto, workaround: Disable LTO for sys_ni to work around alias bugs

From: Andi Kleen <[email protected]>

LTO gcc has trouble with the weak alias definitions in sys_ni.
This leads to missing symbols. Just disable LTO for this file.

Signed-off-by: Andi Kleen <[email protected]>
---
kernel/Makefile | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/Makefile b/kernel/Makefile
index c0cc67a..4b37677 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -22,6 +22,8 @@ CFLAGS_REMOVE_cgroup-debug.o = -pg
CFLAGS_REMOVE_irq_work.o = -pg
endif

+CFLAGS_sys_ni.o = $(DISABLE_LTO)
+
obj-y += sched/
obj-y += power/

--
1.7.7.6

2012-08-19 03:02:58

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 74/74] lto, workaround: Mark do_futex noinline to prevent clobbering ebp

From: Andi Kleen <[email protected]>

On a 32bit build, gcc 4.7 with LTO decides to clobber the 6th argument on the
stack. Unfortunately this corrupts the user EBP and leads to later crashes.
For now mark do_futex noinline to prevent this.

I wish there was a generic way to handle this. Seems like a ticking time
bomb problem.
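A hedged sketch of the workaround (the function below is hypothetical; only the noinline attribute mirrors the patch): keeping the out-of-line call boundary intact stops LTO from folding the function into its caller and rewriting how the stack-resident trailing arguments are passed.

```c
#include <assert.h>

#define noinline __attribute__((noinline))

/* With six arguments on 32-bit x86 the last ones are passed on the
 * stack; noinline preserves the call boundary so LTO cannot inline
 * the body and reallocate those caller-owned stack slots. */
noinline long do_futex_sketch(long uaddr, long op, long val,
			      long timeout, long uaddr2, long val3)
{
	return uaddr + op + val + timeout + uaddr2 + val3;
}
```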

Signed-off-by: Andi Kleen <[email protected]>
---
kernel/futex.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 3717e7b..48b5a07 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2620,7 +2620,7 @@ void exit_robust_list(struct task_struct *curr)
curr, pip);
}

-long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
+noinline long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
u32 __user *uaddr2, u32 val2, u32 val3)
{
int cmd = op & FUTEX_CMD_MASK;
--
1.7.7.6

2012-08-19 03:02:56

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 57/74] lto, workaround: Add workaround for LTO build problem in pvrusb2-audio

From: Andi Kleen <[email protected]>

Making this visible fixes some missing symbols with gcc 4.7 LTO.
This is a workaround for a compiler problem.
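The shape of the workaround, sketched with hypothetical data (the array name follows the patch; the contents and macros here are stand-ins): the table loses its static linkage and gains a unique, externally visible name so gcc 4.7's LTO symbol resolution cannot mis-handle it.

```c
#include <assert.h>

#define __visible __attribute__((externally_visible))
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* formerly "static const int routing_scheme0[]"; the unique global
 * name avoids clashes now that the symbol is visible program-wide */
__visible const int pvrusb2_routing_scheme0_sketch[] = { 10, 20, 30, 40 };
```

Renaming is required because dropping static makes the symbol global, and a generic name like routing_scheme0 could collide with another file's.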

Signed-off-by: Andi Kleen <[email protected]>
---
drivers/media/video/pvrusb2/pvrusb2-audio.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/media/video/pvrusb2/pvrusb2-audio.c b/drivers/media/video/pvrusb2/pvrusb2-audio.c
index cc06d5e..aaa6420 100644
--- a/drivers/media/video/pvrusb2/pvrusb2-audio.c
+++ b/drivers/media/video/pvrusb2/pvrusb2-audio.c
@@ -32,7 +32,7 @@ struct routing_scheme {
unsigned int cnt;
};

-static const int routing_scheme0[] = {
+__visible const int pvrusb2_routing_scheme0[] = {
[PVR2_CVAL_INPUT_TV] = MSP_INPUT_DEFAULT,
[PVR2_CVAL_INPUT_RADIO] = MSP_INPUT(MSP_IN_SCART2,
MSP_IN_TUNER1,
@@ -49,8 +49,8 @@ static const int routing_scheme0[] = {
};

static const struct routing_scheme routing_def0 = {
- .def = routing_scheme0,
- .cnt = ARRAY_SIZE(routing_scheme0),
+ .def = pvrusb2_routing_scheme0,
+ .cnt = ARRAY_SIZE(pvrusb2_routing_scheme0),
};

static const struct routing_scheme *routing_schemes[] = {
--
1.7.7.6

2012-08-19 03:03:11

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 70/74] Kbuild, lto: Handle longer symbols in kallsyms.c

From: Andi Kleen <[email protected]>

Also warn about symbols that are too long.
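A standalone version of the added check (the wrapper function is hypothetical; the limit and the message follow the patch):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define KSYM_NAME_LEN 256	/* raised from 128 by the patch */

/* mirrors the length guard added to read_symbol() in scripts/kallsyms.c */
static int check_symbol_len(const char *str)
{
	if (strlen(str) > KSYM_NAME_LEN) {
		fprintf(stderr, "Symbol %s too long for kallsyms.\n"
			"Please increase KSYM_NAME_LEN both in kernel and kallsyms.c\n",
			str);
		return -1;
	}
	return 0;
}
```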

Signed-off-by: Andi Kleen <[email protected]>
---
scripts/kallsyms.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 487ac6f..acde016 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -27,7 +27,7 @@
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof(arr[0]))
#endif

-#define KSYM_NAME_LEN 128
+#define KSYM_NAME_LEN 256

struct sym_entry {
unsigned long long addr;
@@ -111,6 +111,12 @@ static int read_symbol(FILE *in, struct sym_entry *s)
fprintf(stderr, "Read error or end of file.\n");
return -1;
}
+ if (strlen(str) > KSYM_NAME_LEN) {
+ fprintf(stderr, "Symbol %s too long for kallsyms.\n"
+ "Please increase KSYM_NAME_LEN both in kernel and kallsyms.c",
+ str);
+ return -1;
+ }

sym = str;
/* skip prefix char */
--
1.7.7.6

2012-08-19 03:03:20

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 67/74] Kbuild, lto: Add Link Time Optimization support

From: Andi Kleen <[email protected]>

With LTO gcc will do whole program optimizations for
the whole kernel and each module. This increases compile time,
but can generate faster code.

LTO allows gcc to inline functions between different files and
do various other optimization across the whole binary.

It might also trigger bugs due to more aggressive optimization.
It allows gcc to drop unused code. It also allows it to check
types over the whole program.

This adds the basic Kbuild plumbing for LTO:

- In Kbuild add a new scripts/Makefile.lto that checks
the tool chain (note the checks may not be fully bulletproof)
and when the tests pass sets the LTO options
Currently LTO is very finicky about the tool chain.
- Add a new LDFINAL variable that controls the final link
for vmlinux or module. In this case we call gcc-ld instead
of ld, to run the LTO step.
- For slim LTO builds (object files containing no backup
executable) force AR to gcc-ar
- Theoretically LTO should pass through compiler options from
the compiler to the link step, but this doesn't work for all options.
So the Makefile sets most of these options manually.
- Kconfigs:
Since LTO with allyesconfig needs more than 4G of memory (~8G)
and has the potential to make people's systems swap to death,
I used a nested config that ensures that a simple
allyesconfig disables LTO. It has to be explicitly
enabled.
- Some dependencies on other Kconfigs:
MODVERSIONS, GCOV, FUNCTION_TRACER, single chain WCHAN are
incompatible with LTO currently. MODVERSIONS should be fixable,
but the others require setting special compiler options
for specific files, which LTO currently doesn't support.
I also disable strict copy user checks because they trigger
errors with LTO.

For more information see Documentation/lto-build

Thanks to HJ Lu, Joe Mario, Honza Hubicka, Richard Guenther,
Don Zickus, Changlong Xie who helped with this project
(and probably some more who I forgot, sorry)

Signed-off-by: Andi Kleen <[email protected]>
---
Makefile | 9 +++++-
arch/x86/Kconfig | 2 +-
arch/x86/Kconfig.debug | 2 +-
init/Kconfig | 58 ++++++++++++++++++++++++++++++++++++++
kernel/gcov/Kconfig | 2 +-
scripts/Makefile.lto | 69 ++++++++++++++++++++++++++++++++++++++++++++++
scripts/Makefile.modpost | 2 +-
scripts/link-vmlinux.sh | 4 +-
8 files changed, 141 insertions(+), 7 deletions(-)
create mode 100644 scripts/Makefile.lto

diff --git a/Makefile b/Makefile
index 9cc77ac..b80c080 100644
--- a/Makefile
+++ b/Makefile
@@ -326,9 +326,14 @@ include $(srctree)/scripts/Kbuild.include

AS = $(CROSS_COMPILE)as
LD = $(CROSS_COMPILE)ld
+LDFINAL = $(LD)
CC = $(CROSS_COMPILE)gcc
CPP = $(CC) -E
+ifdef CONFIG_LTO_SLIM
+AR = $(CROSS_COMPILE)gcc-ar
+else
AR = $(CROSS_COMPILE)ar
+endif
NM = $(CROSS_COMPILE)nm
STRIP = $(CROSS_COMPILE)strip
OBJCOPY = $(CROSS_COMPILE)objcopy
@@ -377,7 +382,7 @@ KERNELVERSION = $(VERSION)$(if $(PATCHLEVEL),.$(PATCHLEVEL)$(if $(SUBLEVEL),.$(S

export VERSION PATCHLEVEL SUBLEVEL KERNELRELEASE KERNELVERSION
export ARCH SRCARCH CONFIG_SHELL HOSTCC HOSTCFLAGS CROSS_COMPILE AS LD CC
-export CPP AR NM STRIP OBJCOPY OBJDUMP
+export CPP AR NM STRIP OBJCOPY OBJDUMP LDFINAL
export MAKE AWK GENKSYMS INSTALLKERNEL PERL UTS_MACHINE
export HOSTCXX HOSTCXXFLAGS LDFLAGS_MODULE CHECK CHECKFLAGS

@@ -647,6 +652,8 @@ ifeq ($(shell $(CONFIG_SHELL) $(srctree)/scripts/gcc-goto.sh $(CC)), y)
KBUILD_CFLAGS += -DCC_HAVE_ASM_GOTO
endif

+include ${srctree}/scripts/Makefile.lto
+
# Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
# But warn user when we do so
warn-assign = \
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 9382b09..2e2974f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -540,7 +540,7 @@ config X86_32_IRIS

config SCHED_OMIT_FRAME_POINTER
def_bool y
- prompt "Single-depth WCHAN output"
+ prompt "Single-depth WCHAN output" if !LTO && !FRAME_POINTER
depends on X86
---help---
Calculate simpler /proc/<PID>/wchan values. If this option
diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index b322f12..7961491 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -294,7 +294,7 @@ config OPTIMIZE_INLINING

config DEBUG_STRICT_USER_COPY_CHECKS
bool "Strict copy size checks"
- depends on DEBUG_KERNEL && !TRACE_BRANCH_PROFILING
+ depends on DEBUG_KERNEL && !TRACE_BRANCH_PROFILING && !LTO
---help---
Enabling this option turns a certain set of sanity checks for user
copy operations into compile time failures.
diff --git a/init/Kconfig b/init/Kconfig
index a8785db..0b972ab 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1135,6 +1135,63 @@ config CC_OPTIMIZE_FOR_SIZE

If unsure, say Y.

+config LTO_MENU
+ bool "Enable gcc link time optimizations"
+ # Only tested on X86 for now. For other architectures you likely
+ # have to fix some things first, like adding asmlinkages etc.
+ depends on EXPERIMENTAL && X86
+ # lto does not support excluding flags for specific files
+ # right now. Can be removed if that is fixed.
+ depends on !FUNCTION_TRACER
+ help
+ With this option gcc will do whole program optimizations for
+ the whole kernel and module. This increases compile time, but can
+ lead to better code. It allows gcc to inline functions between
+ different files. It might also trigger bugs due to more
+ aggressive optimization. It allows gcc to drop unused code.
+ With this option gcc will also do some global checking over
+ different source files.
+
+ This requires a gcc 4.7 or later compiler and
+ Linux binutils 2.21.51.0.3 or later. It does not currently
+ work with a FSF release of binutils or with gold.
+
+ On larger configurations this may need more than 4GB of RAM.
+ It will likely not work on those with a 32bit compiler. Also
+ /tmp in tmpfs may lead to faster running out of RAM
+ (in this case set the TMPDIR environment variable to a different
+ directory directly on disk)
+
+ When the toolchain support is not available this will (hopefully)
+ be automatically disabled.
+
+ For more information see Documentation/lto-build
+
+config LTO_DISABLE
+ bool "Disable LTO again"
+ depends on LTO_MENU
+ default n
+ help
+ This option is merely here so that allyesconfig or allmodconfig does
+ not enable LTO. If you want to actually use LTO, do not enable this option.
+
+config LTO
+ bool
+ default y
+ depends on LTO_MENU && !LTO_DISABLE
+
+config LTO_DEBUG
+ bool "Enable LTO compile time debugging"
+ depends on LTO
+
+config LTO_SLIM
+ bool "Use slim lto"
+ # need to fix modpost for it
+ depends on LTO && BROKEN
+ help
+ Do not generate all code twice. The object files will only contain
+ LTO information. This lowers build time.
+
config SYSCTL
bool

@@ -1566,6 +1623,7 @@ config MODULE_FORCE_UNLOAD

config MODVERSIONS
bool "Module versioning support"
+ depends on !LTO
help
Usually, you have to use modules compiled with your kernel.
Saying Y here makes it sometimes possible to use modules
diff --git a/kernel/gcov/Kconfig b/kernel/gcov/Kconfig
index a920281..b9f6381 100644
--- a/kernel/gcov/Kconfig
+++ b/kernel/gcov/Kconfig
@@ -2,7 +2,7 @@ menu "GCOV-based kernel profiling"

config GCOV_KERNEL
bool "Enable gcov-based kernel profiling"
- depends on DEBUG_FS
+ depends on DEBUG_FS && !LTO
select CONSTRUCTORS if !UML
default n
---help---
diff --git a/scripts/Makefile.lto b/scripts/Makefile.lto
new file mode 100644
index 0000000..1321220
--- /dev/null
+++ b/scripts/Makefile.lto
@@ -0,0 +1,69 @@
+#
+# Support for gcc link time optimization
+#
+
+DISABLE_LTO :=
+LTO_CFLAGS :=
+
+export DISABLE_LTO
+export LTO_CFLAGS
+
+ifdef CONFIG_LTO
+ifeq ($(call cc-ifversion, -ge, 0407,y),y)
+ifneq ($(call cc-option,${LTO_CFLAGS},n),n)
+# We need HJ Lu's Linux binutils because mainline binutils does not
+# support mixing assembler and LTO code in the same ld -r object.
+# XXX check if the gcc plugin ld is the expected one too
+ifeq ($(call ld-ifversion,-ge,22710001,y),y)
+# should use -flto=jobserver, but we need a fix for http://gcc.gnu.org/PR50639
+ LTO_CFLAGS := -flto -fno-toplevel-reorder
+ LTO_FINAL_CFLAGS := -fuse-linker-plugin -flto=$(shell getconf _NPROCESSORS_ONLN) -fno-toplevel-reorder
+ifdef CONFIG_LTO_SLIM
+ # requires plugin ar passed and very recent HJ binutils
+ LTO_CFLAGS += -fno-fat-lto-objects
+endif
+ DISABLE_LTO := -fno-lto
+
+ LTO_FINAL_CFLAGS += ${LTO_CFLAGS} -fwhole-program
+
+ # workaround for http://gcc.gnu.org/PR50602
+ LTO_FINAL_CFLAGS += $(filter -freg-struct-return,${KBUILD_CFLAGS})
+
+ifdef CONFIG_LTO_DEBUG
+ LTO_FINAL_CFLAGS += -dH -fdump-ipa-cgraph -fdump-ipa-inline-details # -Wl,-plugin-save-temps -save-temps
+ LTO_CFLAGS +=
+endif
+
+ # In principle gcc should pass through options in the object files,
+ # but it doesn't always work. So do it here manually
+ LTO_FINAL_CFLAGS += $(filter -g%,${KBUILD_CFLAGS})
+ LTO_FINAL_CFLAGS += $(filter -O%,${KBUILD_CFLAGS})
+ LTO_FINAL_CFLAGS += $(filter -f%,${KBUILD_CFLAGS})
+ #LTO_FINAL_CFLAGS += $(filter -fno-omit-frame-pointer, ${KBUILD_CFLAGS})
+ #LTO_FINAL_CFLAGS += $(filter -fno-strict-aliasing, ${KBUILD_CFLAGS})
+ #LTO_FINAL_CFLAGS += $(filter -fno-delete-null-pointer-checks, ${KBUILD_CFLAGS})
+ #LTO_FINAL_CFLAGS += $(filter -fno-strict-overflow, ${KBUILD_CFLAGS})
+ LTO_FINAL_CFLAGS += $(filter -m%,${KBUILD_CFLAGS})
+ LTO_FINAL_CFLAGS += $(filter -W%,${KBUILD_CFLAGS})
+
+ KBUILD_CFLAGS += ${LTO_CFLAGS}
+
+ #
+ # Don't pass all flags to the optimization stage
+ # We assume the compiler remembers those in the object files.
+ # Currently gcc is a little dumb in this and uses the flags
+ # from the first file, which implies that setting special
+ # flags on files does not work.
+ LDFINAL := ${CONFIG_SHELL} ${srctree}/scripts/gcc-ld \
+ ${LTO_FINAL_CFLAGS}
+
+else
+ $(warning "WARNING: Too old linker version $(call ld-version) for kernel LTO. You need Linux binutils. CONFIG_LTO disabled.")
+endif
+else
+ $(warning "WARNING: Compiler/Linker does not support LTO/WHOPR with linker plugin. CONFIG_LTO disabled.")
+endif
+else
+ $(warning "WARNING: GCC $(call cc-version) too old for LTO/WHOPR. CONFIG_LTO disabled")
+endif
+endif
diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost
index 08dce14..9d66a22 100644
--- a/scripts/Makefile.modpost
+++ b/scripts/Makefile.modpost
@@ -117,7 +117,7 @@ targets += $(modules:.ko=.mod.o)

# Step 6), final link of the modules
quiet_cmd_ld_ko_o = LD [M] $@
- cmd_ld_ko_o = $(LD) -r $(LDFLAGS) \
+ cmd_ld_ko_o = $(LDFINAL) -r $(LDFLAGS) \
$(KBUILD_LDFLAGS_MODULE) $(LDFLAGS_MODULE) \
-o $@ $(filter-out FORCE,$^)

diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index a05c49c..be65534 100644
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -53,7 +53,7 @@ vmlinux_link()
local lds="${objtree}/${KBUILD_LDS}"

if [ "${SRCARCH}" != "um" ]; then
- ${LD} ${LDFLAGS} ${LDFLAGS_vmlinux} -o ${2} \
+ ${LDFINAL} ${LDFLAGS} ${LDFLAGS_vmlinux} -o ${2} \
-T ${lds} ${KBUILD_VMLINUX_INIT} \
--start-group ${KBUILD_VMLINUX_MAIN} --end-group ${1}
else
@@ -196,7 +196,7 @@ if [ -n "${CONFIG_KALLSYMS}" ]; then
fi
fi

-info LD vmlinux
+info LDFINAL vmlinux
vmlinux_link "${kallsymso}" vmlinux

if [ -n "${CONFIG_BUILDTIME_EXTABLE_SORT}" ]; then
--
1.7.7.6

2012-08-19 03:03:28

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 46/74] x86, lto: Disable fancy hweight optimizations for LTO

From: Andi Kleen <[email protected]>

The fancy x86 hweight uses different compiler options for the
hweight file. This does not work with LTO. Just disable the optimization
with LTO

Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/Kconfig | 5 +++--
arch/x86/include/asm/arch_hweight.h | 9 +++++++++
2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8ec3a1a..9382b09 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -224,8 +224,9 @@ config X86_32_LAZY_GS

config ARCH_HWEIGHT_CFLAGS
string
- default "-fcall-saved-ecx -fcall-saved-edx" if X86_32
- default "-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11" if X86_64
+ default "-fcall-saved-ecx -fcall-saved-edx" if X86_32 && !LTO
+ default "-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11" if X86_64 && !LTO
+ default "" if LTO

config ARCH_CPU_PROBE_RELEASE
def_bool y
diff --git a/arch/x86/include/asm/arch_hweight.h b/arch/x86/include/asm/arch_hweight.h
index 9686c3d..ca80549 100644
--- a/arch/x86/include/asm/arch_hweight.h
+++ b/arch/x86/include/asm/arch_hweight.h
@@ -25,9 +25,14 @@ static inline unsigned int __arch_hweight32(unsigned int w)
{
unsigned int res = 0;

+#ifdef CONFIG_LTO
+ res = __sw_hweight32(w);
+#else
+
asm (ALTERNATIVE("call __sw_hweight32", POPCNT32, X86_FEATURE_POPCNT)
: "="REG_OUT (res)
: REG_IN (w));
+#endif

return res;
}
@@ -46,6 +51,9 @@ static inline unsigned long __arch_hweight64(__u64 w)
{
unsigned long res = 0;

+#ifdef CONFIG_LTO
+ res = __sw_hweight64(w);
+#else
#ifdef CONFIG_X86_32
return __arch_hweight32((u32)w) +
__arch_hweight32((u32)(w >> 32));
@@ -54,6 +62,7 @@ static inline unsigned long __arch_hweight64(__u64 w)
: "="REG_OUT (res)
: REG_IN (w));
#endif /* CONFIG_X86_32 */
+#endif

return res;
}
--
1.7.7.6

2012-08-19 03:03:26

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 68/74] Kbuild, lto: Add LTO build Documentation

From: Andi Kleen <[email protected]>

Add build documentation for LTO.

Signed-off-by: Andi Kleen <[email protected]>
---
Documentation/lto-build | 115 +++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 115 insertions(+), 0 deletions(-)
create mode 100644 Documentation/lto-build

diff --git a/Documentation/lto-build b/Documentation/lto-build
new file mode 100644
index 0000000..5da427a
--- /dev/null
+++ b/Documentation/lto-build
@@ -0,0 +1,115 @@
+Link time optimization (LTO) for the Linux kernel
+
+This is an experimental feature which still has various problems.
+
+Link Time Optimization allows the compiler to optimize the complete program
+instead of just each file. Link Time Optimization was a new feature in gcc 4.6,
+but only really works with gcc 4.7. The kernel LTO build also requires
+the Linux binutils (the normal FSF releases do not work at the moment)
+
+The compiler can inline functions between files and do some other global
+optimizations. It will also drop unused functions which can make the kernel
+image smaller in some circumstances. The binary gets somewhat larger.
+In return the resulting kernels (usually) have better performance.
+
+Build time and memory consumption at build time will increase.
+The build time penalty depends on the size of the vmlinux. Reasonably
+sized vmlinux builds take about twice as long; much larger monolithic kernels
+like allyesconfig take ~4x as long. Modular kernels are less affected.
+
+Normal "reasonable" builds work with less than 4GB of RAM, but very large
+configurations like allyesconfig can need up to 9GB.
+
+Issues:
+- Various workarounds in the kernel are needed for toolchain problems.
+- A few kernel features are currently incompatible with LTO, in particular
+function tracing, because they require special compiler flags for
+specific files, which is not supported in LTO right now.
+- The build is faster with LTO_SLIM enabled, but this still triggers
+problems in some circumstances (it is currently disabled).
+- Jobserver control for -j does not work correctly for the final
+LTO phase. The makefile hardcodes -j<number of online CPUs>.
+
+Configuration:
+- Enable CONFIG_LTO_MENU and then disable CONFIG_LTO_DISABLE.
+This is mainly so that allyesconfig does not default to LTO.
+- FUNCTION_TRACER, STACK_TRACER, FUNCTION_GRAPH_TRACER have to be disabled
+because they are currently incompatible with LTO.
+- MODVERSIONS has to be disabled because it does not work with LTO
+yet.
+
+Requirements:
+- Enough memory: 4GB for a standard build, ~8GB for allyesconfig.
+If you are tight on memory and use tmpfs as /tmp, define TMPDIR and
+point it to a directory on disk. The peak memory usage
+happens single-threaded (when lto-wpa merges types), so dialing
+back -j options will not help much.
+
+A 32bit compiler is unlikely to work due to the memory requirements.
+You can, however, build a kernel targeted at 32bit on a 64bit host.
+
+- Get the Linux binutils from
+http://www.kernel.org/pub/linux/devel/binutils/
+Sorry, standard binutils releases don't work.
+The kernel build has to use this linker, so if it is installed
+in a non-standard location, use LD=... on the make line.
+
+- gcc 4.7 built with plugin ld (--with-plugin-ld) pointing to the
+linker from the Linux binutils, and with LTO enabled.
+
+Example build procedure for the tool chain and kernel. This does not
+overwrite the standard compiler toolchain on the system. If you already
+have a suitable gcc 4.7+ compiler and linker, the toolchain build can
+be skipped (note that a distribution gcc 4.7 is not necessarily
+correctly configured for LTO).
+
+Get the Linux binutils from http://www.kernel.org/pub/linux/devel/binutils/
+The standard binutils do not work at this point!
+
+Unpack binutils
+
+cd binutils-VERSION (or plain binutils in some versions)
+./configure --prefix=/opt/binutils-VERSION --enable-plugins
+nice -n20 make -j$(getconf _NPROCESSORS_ONLN)
+sudo make install
+sudo ln -sf /opt/binutils-VERSION/bin/ld /usr/local/bin/ld-plugin
+
+Unpack gcc-4.7
+
+mkdir obj-gcc
+# please don't skip this cd. the build will not work correctly in the
+# source dir, you have to use the separate object dir
+cd obj-gcc
+# make sure to install gmp-devel and mpfr-devel
+# and the 32bit glibc package if you have a multilib system
+# if mpc-devel is not there get it from
+# http://www.multiprecision.org/mpc/download/mpc-0.8.2.tar.gz
+# and install in gcc-4.7*/mpc
+../gcc-4.7*/configure --prefix=/opt/gcc-4.7 --enable-lto \
+--with-plugin-ld=/usr/local/bin/ld-plugin \
+--disable-nls --enable-languages=c,c++ \
+--disable-libstdcxx-pch
+nice -n20 make -j$(getconf _NPROCESSORS_ONLN)
+sudo make install-no-fixedincludes
+sudo ln -sf /opt/gcc-4.7/bin/gcc /usr/local/bin/gcc47
+sudo ln -sf /opt/gcc-4.7/bin/gcc-ar /usr/local/bin/gcc-ar47
+
+# get lto tree in linux-lto
+
+mkdir obj-lto
+cd obj-lto
+# copy a suitable kernel config file into .config
+make -C ../linux-lto O=$(pwd) oldconfig
+./source/scripts/config --disable function_tracer --disable function_graph_tracer \
+ --disable stack_tracer --enable lto_menu \
+ --disable lto_disable --disable lto_debug --disable lto_slim
+export TMPDIR=$(pwd)
+# this lowers memory usage with /tmp=tmpfs
+# note the special ar is only needed if CONFIG_LTO_SLIM is enabled
+# The PATH is so that gcc-ar finds a plugin-aware ar if your standard
+# binutils doesn't provide one. If the standard ar supports --plugin,
+# it is not needed.
+PATH=/opt/binutils-VERSION:$PATH nice -n20 make CC=gcc47 LD=ld-plugin AR=gcc-ar47 \
+-j $(getconf _NPROCESSORS_ONLN)
+
+Andi Kleen
--
1.7.7.6

2012-08-19 03:03:16

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 64/74] Kbuild, lto: Add a gcc-ld script to let run gcc as ld

From: Andi Kleen <[email protected]>

For LTO we need to run the link step with gcc, not ld.
Since there are a lot of linker options passed to it, add a gcc-ld wrapper
that wraps them as -Wl, options.

Signed-off-by: Andi Kleen <[email protected]>
---
scripts/gcc-ld | 29 +++++++++++++++++++++++++++++
1 files changed, 29 insertions(+), 0 deletions(-)
create mode 100644 scripts/gcc-ld

diff --git a/scripts/gcc-ld b/scripts/gcc-ld
new file mode 100644
index 0000000..cadab9a
--- /dev/null
+++ b/scripts/gcc-ld
@@ -0,0 +1,29 @@
+#!/bin/sh
+# run gcc with ld options
+# used as a wrapper to execute link time optimizations
+# yes virginia, this is not pretty
+
+ARGS="-nostdlib"
+
+while [ "$1" != "" ] ; do
+ case "$1" in
+ -save-temps|-m32|-m64) N="$1" ;;
+ -r) N="$1" ;;
+ -[Wg]*) N="$1" ;;
+ -[olv]|-[Ofd]*|-nostdlib) N="$1" ;;
+ --end-group|--start-group)
+ N="-Wl,$1" ;;
+ -[RTFGhIezcbyYu]*|\
+--script|--defsym|-init|-Map|--oformat|-rpath|\
+-rpath-link|--sort-section|--section-start|-Tbss|-Tdata|-Ttext|\
+--version-script|--dynamic-list|--version-exports-symbol|--wrap|-m)
+ A="$1" ; shift ; N="-Wl,$A,$1" ;;
+ -[m]*) N="$1" ;;
+ -*) N="-Wl,$1" ;;
+ *) N="$1" ;;
+ esac
+ ARGS="$ARGS $N"
+ shift
+done
+
+exec $CC $ARGS
--
1.7.7.6

2012-08-19 03:03:07

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 56/74] lto, workaround: Add workaround for missing LTO symbols in igb

From: Andi Kleen <[email protected]>

gcc 4.7 LTO with -fno-toplevel-reorder sometimes drops data
variables. These show up as undefined symbols at link time.
As a workaround, make the few variables where this happened visible for now.

There is nothing wrong with this driver; it is just a toolchain problem.

Signed-off-by: Andi Kleen <[email protected]>
---
drivers/net/ethernet/intel/igb/e1000_82575.c | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/e1000_82575.c b/drivers/net/ethernet/intel/igb/e1000_82575.c
index ba994fb..86b985a 100644
--- a/drivers/net/ethernet/intel/igb/e1000_82575.c
+++ b/drivers/net/ethernet/intel/igb/e1000_82575.c
@@ -2254,7 +2254,8 @@ out:
return ret_val;
}

-static struct e1000_mac_operations e1000_mac_ops_82575 = {
+/* Workaround for LTO bug */
+__visible struct e1000_mac_operations e1000_mac_ops_82575 = {
.init_hw = igb_init_hw_82575,
.check_for_link = igb_check_for_link_82575,
.rar_set = igb_rar_set,
@@ -2262,13 +2263,13 @@ static struct e1000_mac_operations e1000_mac_ops_82575 = {
.get_speed_and_duplex = igb_get_speed_and_duplex_copper,
};

-static struct e1000_phy_operations e1000_phy_ops_82575 = {
+__visible struct e1000_phy_operations e1000_phy_ops_82575 = {
.acquire = igb_acquire_phy_82575,
.get_cfg_done = igb_get_cfg_done_82575,
.release = igb_release_phy_82575,
};

-static struct e1000_nvm_operations e1000_nvm_ops_82575 = {
+__visible struct e1000_nvm_operations e1000_nvm_ops_82575 = {
.acquire = igb_acquire_nvm_82575,
.read = igb_read_nvm_eerd,
.release = igb_release_nvm_82575,
--
1.7.7.6
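In plain gcc terms, __visible corresponds to the externally_visible attribute, which tells the LTO whole-program optimizer that a symbol may be referenced from outside the LTO-visible code and must not be dropped or localized. A minimal standalone sketch (the feature guard and the table name are illustrative, not the kernel's definitions):

```c
#include <assert.h>

/* Sketch: on gcc, the kernel's __visible expands to
 * externally_visible, which keeps the LTO whole-program optimizer
 * from dropping or localizing a symbol that is only referenced
 * from outside the LTO-visible code. The feature guard is only
 * here so this sketch also compiles on other compilers. */
#if defined(__has_attribute)
# if __has_attribute(externally_visible)
#  define __visible __attribute__((externally_visible))
# else
#  define __visible
# endif
#else
# define __visible __attribute__((externally_visible))
#endif

/* With -flto, this table survives even with no C-level reference. */
__visible int demo_ops_table[4] = { 1, 2, 3, 4 };
```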

2012-08-19 03:03:05

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 54/74] x86, lto, vdso: Don't duplicate vvar address variables

From: Andi Kleen <[email protected]>

Every includer of vvar.h currently gets its own static variables
for all the vvar addresses. Generate just one set each for the
main kernel and for the vdso. This saves some data space.

Cc: Andy Lutomirski <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/include/asm/vvar.h | 27 +++++++++++++++++----------
arch/x86/vdso/vclock_gettime.c | 1 +
arch/x86/vdso/vma.c | 1 +
3 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/vvar.h b/arch/x86/include/asm/vvar.h
index d76ac40..1fd06a8 100644
--- a/arch/x86/include/asm/vvar.h
+++ b/arch/x86/include/asm/vvar.h
@@ -24,27 +24,34 @@
/* The kernel linker script defines its own magic to put vvars in the
* right place.
*/
-#define DECLARE_VVAR(offset, type, name) \
- EMIT_VVAR(name, offset)
+#define DECLARE_VVAR(type, name) \
+ EMIT_VVAR(name, VVAR_OFFSET_ ## name)
+
+#elif defined(__VVAR_ADDR)
+
+#define DECLARE_VVAR(type, name) \
+ type const * const vvaraddr_ ## name = \
+ (void *)(VVAR_ADDRESS + (VVAR_OFFSET_ ## name));

#else

-#define DECLARE_VVAR(offset, type, name) \
- static type const * const vvaraddr_ ## name = \
- (void *)(VVAR_ADDRESS + (offset));
+#define DECLARE_VVAR(type, name) \
+ extern type const * const vvaraddr_ ## name;

#define DEFINE_VVAR(type, name) \
type name \
__attribute__((section(".vvar_" #name), aligned(16))) __visible
+#endif

#define VVAR(name) (*vvaraddr_ ## name)

-#endif
-
/* DECLARE_VVAR(offset, type, name) */

-DECLARE_VVAR(0, volatile unsigned long, jiffies)
-DECLARE_VVAR(16, int, vgetcpu_mode)
-DECLARE_VVAR(128, struct vsyscall_gtod_data, vsyscall_gtod_data)
+#define VVAR_OFFSET_jiffies 0
+DECLARE_VVAR(volatile unsigned long, jiffies)
+#define VVAR_OFFSET_vgetcpu_mode 16
+DECLARE_VVAR(int, vgetcpu_mode)
+#define VVAR_OFFSET_vsyscall_gtod_data 128
+DECLARE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data)

#undef DECLARE_VVAR
diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index 885eff4..007eac4 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -10,6 +10,7 @@

/* Disable profiling for userspace code: */
#define DISABLE_BRANCH_PROFILING
+#define __VVAR_ADDR 1

#include <linux/kernel.h>
#include <linux/posix-timers.h>
diff --git a/arch/x86/vdso/vma.c b/arch/x86/vdso/vma.c
index fe08e2b..4432cfc 100644
--- a/arch/x86/vdso/vma.c
+++ b/arch/x86/vdso/vma.c
@@ -3,6 +3,7 @@
* Copyright 2007 Andi Kleen, SUSE Labs.
* Subject to the GPL, v.2
*/
+#define __VVAR_ADDR 1
#include <linux/mm.h>
#include <linux/err.h>
#include <linux/sched.h>
--
1.7.7.6
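The pattern above can be sketched outside the kernel: the single translation unit that defines the address macro before including the header emits the one pointer definition, while every other includer would only see an extern declaration. All names and the 0x1000 base address below are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the one-definition pattern from this patch. This
 * translation unit plays the defining role (DEMO_VVAR_ADDR set);
 * any other includer would take the extern branch instead. */
#define DEMO_VVAR_ADDR 1

#ifdef DEMO_VVAR_ADDR
#define DECLARE_DEMO_VVAR(type, name, offset) \
	type const *const vvaraddr_##name = \
		(type const *)(uintptr_t)(0x1000 + (offset));
#else
#define DECLARE_DEMO_VVAR(type, name, offset) \
	extern type const *const vvaraddr_##name;
#endif

DECLARE_DEMO_VVAR(unsigned long, jiffies_demo, 0)
DECLARE_DEMO_VVAR(int, vgetcpu_mode_demo, 16)
```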

2012-08-19 03:03:01

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 65/74] Kbuild, lto: Disable LTO for asm-offsets.c

From: Andi Kleen <[email protected]>

The asm-offsets.c technique of fishing data out of the generated assembler
file does not work with LTO. Just disable LTO for the asm-offsets.c build.

Signed-off-by: Andi Kleen <[email protected]>
---
scripts/Makefile.build | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index ff1720d..11fa091 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -198,7 +198,7 @@ $(multi-objs-y:.o=.s) : modname = $(modname-multi)
$(multi-objs-y:.o=.lst) : modname = $(modname-multi)

quiet_cmd_cc_s_c = CC $(quiet_modtag) $@
-cmd_cc_s_c = $(CC) $(c_flags) -fverbose-asm -S -o $@ $<
+cmd_cc_s_c = $(CC) $(c_flags) $(DISABLE_LTO) -fverbose-asm -S -o $@ $<

$(obj)/%.s: $(src)/%.c FORCE
$(call if_changed_dep,cc_s_c)
--
1.7.7.6

2012-08-19 03:02:52

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 62/74] Kbuild, lto: add ld-version and ld-ifversion macros

From: Andi Kleen <[email protected]>

To check the linker version. Used by the LTO makefile.

Signed-off-by: Andi Kleen <[email protected]>
---
scripts/Kbuild.include | 9 +++++++++
scripts/ld-version.sh | 8 ++++++++
2 files changed, 17 insertions(+), 0 deletions(-)
create mode 100755 scripts/ld-version.sh

diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index 6a3ee98..bda5b20 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -155,6 +155,15 @@ ld-option = $(call try-run,\
# Important: no spaces around options
ar-option = $(call try-run, $(AR) rc$(1) "$$TMP",$(1),$(2))

+# ld-version
+# Usage: $(call ld-version)
+# Note this is mainly for HJ Lu's 3 number binutils versions
+ld-version = $(shell $(LD) --version | $(srctree)/scripts/ld-version.sh)
+
+# ld-ifversion
+# Usage: $(call ld-ifversion, -ge, 22252, y)
+ld-ifversion = $(shell [ $(call ld-version) $(1) $(2) ] && echo $(3))
+
######

###
diff --git a/scripts/ld-version.sh b/scripts/ld-version.sh
new file mode 100755
index 0000000..7eb0b76
--- /dev/null
+++ b/scripts/ld-version.sh
@@ -0,0 +1,8 @@
+#!/usr/bin/awk -f
+# extract linker version number from stdin and turn it into a single number
+ {
+ gsub(".*)", "");
+ split($1,a, ".");
+ print a[1]*10000000 + a[2]*100000 + a[3]*10000 + a[4]*100 + a[5];
+ exit
+ }
--
1.7.7.6
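The awk script weights the dotted version fields into one comparable integer so the makefile can do numeric version comparisons. The same encoding, transcribed to C for illustration (the weights match the script; the function name is made up):

```c
#include <assert.h>
#include <stdlib.h>

/* Same weighting as scripts/ld-version.sh:
 * a[1]*10000000 + a[2]*100000 + a[3]*10000 + a[4]*100 + a[5].
 * Missing trailing fields simply contribute zero. */
static long ld_version_number(const char *s)
{
	static const long weight[5] = { 10000000, 100000, 10000, 100, 1 };
	long n = 0;
	char *end;
	int i;

	for (i = 0; i < 5 && *s; i++) {
		n += strtol(s, &end, 10) * weight[i];
		s = (*end == '.') ? end + 1 : end;
	}
	return n;
}
```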

2012-08-19 03:02:47

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 55/74] lto, workaround: Add workaround for initcall reordering

From: Andi Kleen <[email protected]>

Work around an LTO gcc problem: when there is no reference to a variable
in a module it will be moved to the end of the program. This causes
reordering of initcalls which the kernel does not like.
Add a dummy reference function to avoid this. The function is
deleted by the linker.

This replaces a previous much slower workaround.

Thanks to Honza Hubicka for suggesting this technique.

Signed-off-by: Andi Kleen <[email protected]>
---
include/linux/init.h | 20 +++++++++++++++++++-
1 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/include/linux/init.h b/include/linux/init.h
index c2f06b3..e425800 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -176,6 +176,23 @@ extern bool initcall_debug;

#ifndef __ASSEMBLY__

+#ifdef CONFIG_LTO
+/* Work around an LTO gcc problem: when there is no reference to a variable
+ * in a module it will be moved to the end of the program. This causes
+ * reordering of initcalls which the kernel does not like.
+ * Add a dummy reference function to avoid this. The function is
+ * deleted by the linker.
+ */
+#define LTO_REFERENCE_INITCALL(x) \
+ ; /* yes this is needed */ \
+ static __used __exit void *reference_##x(void) \
+ { \
+ return &x; \
+ }
+#else
+#define LTO_REFERENCE_INITCALL(x)
+#endif
+
/* initcalls are now grouped by functionality into separate
* subsections. Ordering inside the subsections is determined
* by link order.
@@ -188,7 +205,8 @@ extern bool initcall_debug;

#define __define_initcall(level,fn,id) \
static initcall_t __initcall_##fn##id __used \
- __attribute__((__section__(".initcall" level ".init"))) = fn
+ __attribute__((__section__(".initcall" level ".init"))) = fn \
+ LTO_REFERENCE_INITCALL(__initcall_##fn##id)

/*
* Early initcalls run before initializing SMP.
--
1.7.7.6
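The trick generalizes: any __used function that takes the address of a variable creates an artificial reference that pins the variable in its object file, so the optimizer cannot move or drop it. A standalone sketch with illustrative names (not the kernel's LTO_REFERENCE_INITCALL macro itself):

```c
#include <assert.h>

/* Sketch of the dummy-reference technique: the __used function
 * below is never called by C code, but taking &demo_initcall_slot
 * keeps the variable anchored in this object file under LTO. */
static int demo_initcall_slot = 42;

static __attribute__((used)) void *reference_demo_initcall_slot(void)
{
	return &demo_initcall_slot;
}
```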

2012-08-19 03:02:36

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 11/74] sections: Add __visible to rapidio sections

From: Andi Kleen <[email protected]>

Signed-off-by: Andi Kleen <[email protected]>
---
drivers/rapidio/rio.h | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/rapidio/rio.h b/drivers/rapidio/rio.h
index b1af414..1fd0138 100644
--- a/drivers/rapidio/rio.h
+++ b/drivers/rapidio/rio.h
@@ -44,8 +44,8 @@ extern struct rio_dev *rio_get_comptag(u32 comp_tag, struct rio_dev *from);
extern struct device_attribute rio_dev_attrs[];
extern spinlock_t rio_global_list_lock;

-extern struct rio_switch_ops __start_rio_switch_ops[];
-extern struct rio_switch_ops __end_rio_switch_ops[];
+extern __visible struct rio_switch_ops __start_rio_switch_ops[];
+extern __visible struct rio_switch_ops __end_rio_switch_ops[];

/* Helpers internal to the RIO core code */
#define DECLARE_RIO_SWITCH_SECTION(section, name, vid, did, init_hook) \
--
1.7.7.6

2012-08-19 03:02:32

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 33/74] x86, lto, apm: Make APM data structure used from assembler visible

From: Andi Kleen <[email protected]>

Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/kernel/apm_32.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/apm_32.c b/arch/x86/kernel/apm_32.c
index d65464e..61c9aa7 100644
--- a/arch/x86/kernel/apm_32.c
+++ b/arch/x86/kernel/apm_32.c
@@ -370,7 +370,7 @@ struct apm_user {
/*
* Local variables
*/
-static struct {
+__visible struct {
unsigned long offset;
unsigned short segment;
} apm_bios_entry;
--
1.7.7.6

2012-08-19 03:02:10

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 51/74] x86, lto, efi: Mark the efi variable used from assembler __visible

From: Andi Kleen <[email protected]>

Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/platform/efi/efi.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 2dc29f5..02bc41a 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -53,7 +53,7 @@
int efi_enabled;
EXPORT_SYMBOL(efi_enabled);

-struct efi __read_mostly efi = {
+struct efi __visible __read_mostly efi = {
.mps = EFI_INVALID_TABLE_ADDR,
.acpi = EFI_INVALID_TABLE_ADDR,
.acpi20 = EFI_INVALID_TABLE_ADDR,
--
1.7.7.6

2012-08-19 03:01:37

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 60/74] lto, Kbuild, bloat-o-meter: fix static detection

From: Andi Kleen <[email protected]>

Disable static detection: the "static." prefixing currently drops a lot
of useful information, including clones generated by gcc. Statics will
now appear without the "static." prefix.

Also remove the ugly .NUMBER suffixes that LTO adds.

Signed-off-by: Andi Kleen <[email protected]>
---
scripts/bloat-o-meter | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/bloat-o-meter b/scripts/bloat-o-meter
index 6129020..720cbd2 100755
--- a/scripts/bloat-o-meter
+++ b/scripts/bloat-o-meter
@@ -20,8 +20,8 @@ def getsizes(file):
if type in "tTdDbBrR":
# strip generated symbols
if name[:6] == "__mod_": continue
- # function names begin with '.' on 64-bit powerpc
- if "." in name[1:]: name = "static." + name.split(".")[0]
+ # statics and some other optimizations adds random .NUMBER
+ name = re.sub(r'\.[0-9]+', '', name)
sym[name] = sym.get(name, 0) + int(size, 16)
return sym

--
1.7.7.6

2012-08-19 03:01:26

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 29/74] x86, lto: Make amd.c vide visible

From: Andi Kleen <[email protected]>

A label defined in inline assembler has to be made global and visible for LTO.

Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/kernel/cpu/amd.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 9d92e19..2c02d92 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -67,8 +67,8 @@ static inline int wrmsrl_amd_safe(unsigned msr, unsigned long long val)
* performance at the same time..
*/

-extern void vide(void);
-__asm__(".align 4\nvide: ret");
+extern __visible void vide(void);
+__asm__(".globl vide\n\t.align 4\nvide: ret");

static void __cpuinit init_amd_k5(struct cpuinfo_x86 *c)
{
--
1.7.7.6

2012-08-19 03:01:19

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 16/74] sections: Add __visible to lib/* sections

From: Andi Kleen <[email protected]>

Signed-off-by: Andi Kleen <[email protected]>
---
lib/bug.c | 2 +-
lib/dynamic_debug.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/bug.c b/lib/bug.c
index a28c141..f81e1a6 100644
--- a/lib/bug.c
+++ b/lib/bug.c
@@ -43,7 +43,7 @@
#include <linux/bug.h>
#include <linux/sched.h>

-extern const struct bug_entry __start___bug_table[], __stop___bug_table[];
+extern __visible const struct bug_entry __start___bug_table[], __stop___bug_table[];

static inline unsigned long bug_addr(const struct bug_entry *bug)
{
diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index 7ca29a0..1760a71 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -34,8 +34,8 @@
#include <linux/device.h>
#include <linux/netdevice.h>

-extern struct _ddebug __start___verbose[];
-extern struct _ddebug __stop___verbose[];
+extern __visible struct _ddebug __start___verbose[];
+extern __visible struct _ddebug __stop___verbose[];

struct ddebug_table {
struct list_head link;
--
1.7.7.6

2012-08-19 03:01:12

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 52/74] x86, lto, paravirt: Don't rely on local assembler labels

From: Andi Kleen <[email protected]>

The paravirt patching code assumes that it can reference a
local assembler label between two different top level assembler
statements. This does not work with some experimental gcc builds,
where the assembler code may end up in different assembler files.

Replace it with extern / global asm linkage labels.

This also removes one redundant copy of the macro.

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/include/asm/paravirt_types.h | 9 +++++----
arch/x86/kernel/paravirt.c | 5 -----
2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 4f262bc..6a464ba 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -385,10 +385,11 @@ extern struct pv_lock_ops pv_lock_ops;
_paravirt_alt(insn_string, "%c[paravirt_typenum]", "%c[paravirt_clobber]")

/* Simple instruction patching code. */
-#define DEF_NATIVE(ops, name, code) \
- extern const char start_##ops##_##name[] __visible, \
- end_##ops##_##name[] __visible; \
- asm("start_" #ops "_" #name ": " code "; end_" #ops "_" #name ":")
+#define NATIVE_LABEL(a,x,b) "\n\t.globl " a #x "_" #b "\n" a #x "_" #b ":\n\t"
+
+#define DEF_NATIVE(ops, name, code) \
+ __visible extern const char start_##ops##_##name[], end_##ops##_##name[]; \
+ asm(NATIVE_LABEL("start_", ops, name) code NATIVE_LABEL("end_", ops, name))

unsigned paravirt_patch_nop(void);
unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 17fff18..947255e 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -62,11 +62,6 @@ void __init default_banner(void)
pv_info.name);
}

-/* Simple instruction patching code. */
-#define DEF_NATIVE(ops, name, code) \
- extern const char start_##ops##_##name[], end_##ops##_##name[]; \
- asm("start_" #ops "_" #name ": " code "; end_" #ops "_" #name ":")
-
/* Undefined instruction for dealing with missing ops pointers. */
static const unsigned char ud2a[] = { 0x0f, 0x0b };

--
1.7.7.6

2012-08-19 03:01:08

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 53/74] x86, lto, paravirt: Make paravirt thunks global

From: Andi Kleen <[email protected]>

The paravirt thunks use the hack of a static __used reference to a static
function, in order to reference that function from a top level asm statement.

This assumes that gcc always generates static function names in a specific
format, which is not necessarily true.

Simply make these functions global and asmlinkage. This way the
static __used variables are not needed and everything works.

Changed in paravirt and in all users (Xen and vsmp).

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/include/asm/paravirt.h | 2 +-
arch/x86/kernel/vsmp_64.c | 8 ++++----
arch/x86/xen/irq.c | 8 ++++----
arch/x86/xen/mmu.c | 16 ++++++++--------
4 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index a0facf3..cc733a6 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -804,9 +804,9 @@ static __always_inline void arch_spin_unlock(struct arch_spinlock *lock)
*/
#define PV_CALLEE_SAVE_REGS_THUNK(func) \
extern typeof(func) __raw_callee_save_##func; \
- static void *__##func##__ __used = func; \
\
asm(".pushsection .text;" \
+ ".globl __raw_callee_save_" #func " ; " \
"__raw_callee_save_" #func ": " \
PV_SAVE_ALL_CALLER_REGS \
"call " #func ";" \
diff --git a/arch/x86/kernel/vsmp_64.c b/arch/x86/kernel/vsmp_64.c
index 992f890..f393d6d 100644
--- a/arch/x86/kernel/vsmp_64.c
+++ b/arch/x86/kernel/vsmp_64.c
@@ -33,7 +33,7 @@
* and vice versa.
*/

-static unsigned long vsmp_save_fl(void)
+asmlinkage unsigned long vsmp_save_fl(void)
{
unsigned long flags = native_save_fl();

@@ -43,7 +43,7 @@ static unsigned long vsmp_save_fl(void)
}
PV_CALLEE_SAVE_REGS_THUNK(vsmp_save_fl);

-static void vsmp_restore_fl(unsigned long flags)
+asmlinkage void vsmp_restore_fl(unsigned long flags)
{
if (flags & X86_EFLAGS_IF)
flags &= ~X86_EFLAGS_AC;
@@ -53,7 +53,7 @@ static void vsmp_restore_fl(unsigned long flags)
}
PV_CALLEE_SAVE_REGS_THUNK(vsmp_restore_fl);

-static void vsmp_irq_disable(void)
+asmlinkage void vsmp_irq_disable(void)
{
unsigned long flags = native_save_fl();

@@ -61,7 +61,7 @@ static void vsmp_irq_disable(void)
}
PV_CALLEE_SAVE_REGS_THUNK(vsmp_irq_disable);

-static void vsmp_irq_enable(void)
+asmlinkage void vsmp_irq_enable(void)
{
unsigned long flags = native_save_fl();

diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
index 1573376..3dd8831 100644
--- a/arch/x86/xen/irq.c
+++ b/arch/x86/xen/irq.c
@@ -21,7 +21,7 @@ void xen_force_evtchn_callback(void)
(void)HYPERVISOR_xen_version(0, NULL);
}

-static unsigned long xen_save_fl(void)
+asmlinkage unsigned long xen_save_fl(void)
{
struct vcpu_info *vcpu;
unsigned long flags;
@@ -39,7 +39,7 @@ static unsigned long xen_save_fl(void)
}
PV_CALLEE_SAVE_REGS_THUNK(xen_save_fl);

-static void xen_restore_fl(unsigned long flags)
+asmlinkage void xen_restore_fl(unsigned long flags)
{
struct vcpu_info *vcpu;

@@ -66,7 +66,7 @@ static void xen_restore_fl(unsigned long flags)
}
PV_CALLEE_SAVE_REGS_THUNK(xen_restore_fl);

-static void xen_irq_disable(void)
+asmlinkage void xen_irq_disable(void)
{
/* There's a one instruction preempt window here. We need to
make sure we're don't switch CPUs between getting the vcpu
@@ -77,7 +77,7 @@ static void xen_irq_disable(void)
}
PV_CALLEE_SAVE_REGS_THUNK(xen_irq_disable);

-static void xen_irq_enable(void)
+asmlinkage void xen_irq_enable(void)
{
struct vcpu_info *vcpu;

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index b65a761..9f82443 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -429,7 +429,7 @@ static pteval_t iomap_pte(pteval_t val)
return val;
}

-static pteval_t xen_pte_val(pte_t pte)
+asmlinkage pteval_t xen_pte_val(pte_t pte)
{
pteval_t pteval = pte.pte;
#if 0
@@ -446,7 +446,7 @@ static pteval_t xen_pte_val(pte_t pte)
}
PV_CALLEE_SAVE_REGS_THUNK(xen_pte_val);

-static pgdval_t xen_pgd_val(pgd_t pgd)
+asmlinkage pgdval_t xen_pgd_val(pgd_t pgd)
{
return pte_mfn_to_pfn(pgd.pgd);
}
@@ -477,7 +477,7 @@ void xen_set_pat(u64 pat)
WARN_ON(pat != 0x0007010600070106ull);
}

-static pte_t xen_make_pte(pteval_t pte)
+asmlinkage pte_t xen_make_pte(pteval_t pte)
{
phys_addr_t addr = (pte & PTE_PFN_MASK);
#if 0
@@ -512,14 +512,14 @@ static pte_t xen_make_pte(pteval_t pte)
}
PV_CALLEE_SAVE_REGS_THUNK(xen_make_pte);

-static pgd_t xen_make_pgd(pgdval_t pgd)
+asmlinkage pgd_t xen_make_pgd(pgdval_t pgd)
{
pgd = pte_pfn_to_mfn(pgd);
return native_make_pgd(pgd);
}
PV_CALLEE_SAVE_REGS_THUNK(xen_make_pgd);

-static pmdval_t xen_pmd_val(pmd_t pmd)
+asmlinkage pmdval_t xen_pmd_val(pmd_t pmd)
{
return pte_mfn_to_pfn(pmd.pmd);
}
@@ -578,7 +578,7 @@ static void xen_pmd_clear(pmd_t *pmdp)
}
#endif /* CONFIG_X86_PAE */

-static pmd_t xen_make_pmd(pmdval_t pmd)
+asmlinkage pmd_t xen_make_pmd(pmdval_t pmd)
{
pmd = pte_pfn_to_mfn(pmd);
return native_make_pmd(pmd);
@@ -586,13 +586,13 @@ static pmd_t xen_make_pmd(pmdval_t pmd)
PV_CALLEE_SAVE_REGS_THUNK(xen_make_pmd);

#if PAGETABLE_LEVELS == 4
-static pudval_t xen_pud_val(pud_t pud)
+asmlinkage pudval_t xen_pud_val(pud_t pud)
{
return pte_mfn_to_pfn(pud.pud);
}
PV_CALLEE_SAVE_REGS_THUNK(xen_pud_val);

-static pud_t xen_make_pud(pudval_t pud)
+asmlinkage pud_t xen_make_pud(pudval_t pud)
{
pud = pte_pfn_to_mfn(pud);

--
1.7.7.6

2012-08-19 03:01:01

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 06/74] sections: Add __visible to powerpc sections

From: Andi Kleen <[email protected]>

Signed-off-by: Andi Kleen <[email protected]>
---
arch/powerpc/include/asm/cputable.h | 2 +-
arch/powerpc/include/asm/firmware.h | 2 +-
arch/powerpc/include/asm/mmu.h | 2 +-
arch/powerpc/include/asm/synch.h | 2 +-
4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index 50d82c8..048f876 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -125,7 +125,7 @@ struct cpu_spec {

extern struct cpu_spec *cur_cpu_spec;

-extern unsigned int __start___ftr_fixup, __stop___ftr_fixup;
+extern __visible unsigned int __start___ftr_fixup, __stop___ftr_fixup;

extern struct cpu_spec *identify_cpu(unsigned long offset, unsigned int pvr);
extern void do_feature_fixups(unsigned long value, void *fixup_start,
diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h
index ad0b751..fbf4834 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -128,7 +128,7 @@ extern void machine_check_fwnmi(void);
/* This is true if we are using the firmware NMI handler (typically LPAR) */
extern int fwnmi_active;

-extern unsigned int __start___fw_ftr_fixup, __stop___fw_ftr_fixup;
+extern __visible unsigned int __start___fw_ftr_fixup, __stop___fw_ftr_fixup;

#endif /* __ASSEMBLY__ */
#endif /* __KERNEL__ */
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index e8a26db..cb0a276 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -130,7 +130,7 @@ static inline void mmu_clear_feature(unsigned long feature)
cur_cpu_spec->mmu_features &= ~feature;
}

-extern unsigned int __start___mmu_ftr_fixup, __stop___mmu_ftr_fixup;
+extern __visible unsigned int __start___mmu_ftr_fixup, __stop___mmu_ftr_fixup;

/* MMU initialization */
extern void early_init_mmu(void);
diff --git a/arch/powerpc/include/asm/synch.h b/arch/powerpc/include/asm/synch.h
index e682a71..b13b5f7 100644
--- a/arch/powerpc/include/asm/synch.h
+++ b/arch/powerpc/include/asm/synch.h
@@ -10,7 +10,7 @@
#endif

#ifndef __ASSEMBLY__
-extern unsigned int __start___lwsync_fixup, __stop___lwsync_fixup;
+extern __visible unsigned int __start___lwsync_fixup, __stop___lwsync_fixup;
extern void do_lwsync_fixups(unsigned long value, void *fixup_start,
void *fixup_end);
extern void do_final_fixups(void);
--
1.7.7.6

2012-08-19 03:00:57

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 21/74] lto: Make lockdep_sys_exit asmlinkage

From: Andi Kleen <[email protected]>

lockdep_sys_exit can be called from assembler code, so make it
asmlinkage.

Signed-off-by: Andi Kleen <[email protected]>
---
include/linux/lockdep.h | 2 +-
kernel/lockdep.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 00e4637..f37a847 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -265,7 +265,7 @@ extern void lockdep_info(void);
extern void lockdep_reset(void);
extern void lockdep_reset_lock(struct lockdep_map *lock);
extern void lockdep_free_key_range(void *start, unsigned long size);
-extern void lockdep_sys_exit(void);
+extern asmlinkage void lockdep_sys_exit(void);

extern void lockdep_off(void);
extern void lockdep_on(void);
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index ea9ee45..ef76308 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -4142,7 +4142,7 @@ void debug_show_held_locks(struct task_struct *task)
}
EXPORT_SYMBOL_GPL(debug_show_held_locks);

-void lockdep_sys_exit(void)
+asmlinkage void lockdep_sys_exit(void)
{
struct task_struct *curr = current;

--
1.7.7.6

2012-08-19 03:00:54

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 28/74] lto: Make ksymtab and kcrctab symbols and __this_module __visible

From: Andi Kleen <[email protected]>

Make the ksymtab symbols for EXPORT_SYMBOL visible.
This prevents the LTO compiler from adding a .NUMBER suffix,
which avoids various problems in later export processing.

Signed-off-by: Andi Kleen <[email protected]>
---
include/linux/export.h | 4 ++--
scripts/mod/modpost.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/export.h b/include/linux/export.h
index 784617e..77a3e8c 100644
--- a/include/linux/export.h
+++ b/include/linux/export.h
@@ -36,7 +36,7 @@ extern struct module __this_module;
/* Mark the CRC weak since genksyms apparently decides not to
* generate a checksums for some symbols */
#define __CRC_SYMBOL(sym, sec) \
- extern void *__crc_##sym __attribute__((weak)); \
+ extern __visible void *__crc_##sym __attribute__((weak)); \
static const unsigned long __kcrctab_##sym \
__used \
__attribute__((section("___kcrctab" sec "+" #sym), unused)) \
@@ -52,7 +52,7 @@ extern struct module __this_module;
static const char __kstrtab_##sym[] \
__attribute__((section("__ksymtab_strings"), aligned(1))) \
= MODULE_SYMBOL_PREFIX #sym; \
- static const struct kernel_symbol __ksymtab_##sym \
+ __visible const struct kernel_symbol __ksymtab_##sym \
__used \
__attribute__((section("___ksymtab" sec "+" #sym), unused)) \
= { (unsigned long)&sym, __kstrtab_##sym }
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 68e9f5e..c797e95 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -1862,7 +1862,7 @@ static void add_header(struct buffer *b, struct module *mod)
buf_printf(b, "\n");
buf_printf(b, "MODULE_INFO(vermagic, VERMAGIC_STRING);\n");
buf_printf(b, "\n");
- buf_printf(b, "struct module __this_module\n");
+ buf_printf(b, "__visible struct module __this_module\n");
buf_printf(b, "__attribute__((section(\".gnu.linkonce.this_module\"))) = {\n");
buf_printf(b, "\t.name = KBUILD_MODNAME,\n");
if (mod->has_init)
--
1.7.7.6

2012-08-19 03:00:50

by Andi Kleen

Subject: [PATCH 37/74] lto, KVM: Don't assume asm statements end up in the same assembler file

From: Andi Kleen <[email protected]>

The VMX code references a local assembler label between two inline
assembler statements. This assumes both statements end up in the same
assembler file. In some experimental builds of gcc this is not
necessarily true, causing linker failures.

Replace the local label reference with a more traditional asmlinkage
extern.

This also eliminates one assembler statement and generates
slightly better code on 64bit: the compiler can use a
RIP-relative LEA instead of a movabs, saving a few bytes.

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/kvm/vmx.c | 10 ++++++----
1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c00f03d..2fe1de3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3718,6 +3718,8 @@ static void vmx_disable_intercept_for_msr(u32 msr, bool longmode_only)
__vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode, msr);
}

+extern __visible unsigned long kvm_vmx_return;
+
/*
* Set up the vmcs's constant host-state fields, i.e., host-state fields that
* will not change in the lifetime of the guest.
@@ -3753,8 +3755,7 @@ static void vmx_set_constant_host_state(void)
native_store_idt(&dt);
vmcs_writel(HOST_IDTR_BASE, dt.address); /* 22.2.4 */

- asm("mov $.Lkvm_vmx_return, %0" : "=r"(tmpl));
- vmcs_writel(HOST_RIP, tmpl); /* 22.2.5 */
+ vmcs_writel(HOST_RIP, (unsigned long)&kvm_vmx_return); /* 22.2.5 */

rdmsr(MSR_IA32_SYSENTER_CS, low32, high32);
vmcs_write32(HOST_IA32_SYSENTER_CS, low32);
@@ -6305,9 +6306,10 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
/* Enter guest mode */
"jne .Llaunched \n\t"
__ex(ASM_VMX_VMLAUNCH) "\n\t"
- "jmp .Lkvm_vmx_return \n\t"
+ "jmp kvm_vmx_return \n\t"
".Llaunched: " __ex(ASM_VMX_VMRESUME) "\n\t"
- ".Lkvm_vmx_return: "
+ ".globl kvm_vmx_return\n"
+ "kvm_vmx_return: "
/* Save guest registers, load host registers, keep flags */
"mov %0, %c[wordsize](%%"R"sp) \n\t"
"pop %0 \n\t"
--
1.7.7.6

2012-08-19 03:00:48

by Andi Kleen

Subject: [PATCH 15/74] sections: Add __visible to kernel/trace/* sections

From: Andi Kleen <[email protected]>

Signed-off-by: Andi Kleen <[email protected]>
---
kernel/trace/ftrace.c | 4 ++--
kernel/trace/trace.h | 4 ++--
kernel/trace/trace_branch.c | 8 ++++----
kernel/trace/trace_events.c | 4 ++--
kernel/trace/trace_syscalls.c | 4 ++--
kernel/tracepoint.c | 4 ++--
6 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index b4f20fb..5028bd3 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3866,8 +3866,8 @@ struct notifier_block ftrace_module_nb = {
.priority = 0,
};

-extern unsigned long __start_mcount_loc[];
-extern unsigned long __stop_mcount_loc[];
+extern __visible unsigned long __start_mcount_loc[];
+extern __visible unsigned long __stop_mcount_loc[];

void __init ftrace_init(void)
{
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 55e1f7f..8c063e7 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -836,8 +836,8 @@ extern void trace_event_enable_cmd_record(bool enable);
extern struct mutex event_mutex;
extern struct list_head ftrace_events;

-extern const char *__start___trace_bprintk_fmt[];
-extern const char *__stop___trace_bprintk_fmt[];
+extern __visible const char *__start___trace_bprintk_fmt[];
+extern __visible const char *__stop___trace_bprintk_fmt[];

void trace_printk_init_buffers(void);

diff --git a/kernel/trace/trace_branch.c b/kernel/trace/trace_branch.c
index 8d3538b..5be6217 100644
--- a/kernel/trace/trace_branch.c
+++ b/kernel/trace/trace_branch.c
@@ -226,8 +226,8 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect)
}
EXPORT_SYMBOL(ftrace_likely_update);

-extern unsigned long __start_annotated_branch_profile[];
-extern unsigned long __stop_annotated_branch_profile[];
+extern __visible unsigned long __start_annotated_branch_profile[];
+extern __visible unsigned long __stop_annotated_branch_profile[];

static int annotated_branch_stat_headers(struct seq_file *m)
{
@@ -355,8 +355,8 @@ fs_initcall(init_annotated_branch_stats);

#ifdef CONFIG_PROFILE_ALL_BRANCHES

-extern unsigned long __start_branch_profile[];
-extern unsigned long __stop_branch_profile[];
+extern __visible unsigned long __start_branch_profile[];
+extern __visible unsigned long __stop_branch_profile[];

static int all_branch_stat_headers(struct seq_file *m)
{
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 29111da..325c9f0 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1435,8 +1435,8 @@ static struct notifier_block trace_module_nb = {
.priority = 0,
};

-extern struct ftrace_event_call *__start_ftrace_events[];
-extern struct ftrace_event_call *__stop_ftrace_events[];
+extern __visible struct ftrace_event_call *__start_ftrace_events[];
+extern __visible struct ftrace_event_call *__stop_ftrace_events[];

static char bootup_event_buf[COMMAND_LINE_SIZE] __initdata;

diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 60e4d78..52f3e15 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -56,8 +56,8 @@ struct ftrace_event_class event_class_syscall_exit = {
.raw_init = init_syscall_trace,
};

-extern struct syscall_metadata *__start_syscalls_metadata[];
-extern struct syscall_metadata *__stop_syscalls_metadata[];
+extern __visible struct syscall_metadata *__start_syscalls_metadata[];
+extern __visible struct syscall_metadata *__stop_syscalls_metadata[];

static struct syscall_metadata **syscalls_metadata;

diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index d96ba22..ddae1de 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -27,8 +27,8 @@
#include <linux/sched.h>
#include <linux/static_key.h>

-extern struct tracepoint * const __start___tracepoints_ptrs[];
-extern struct tracepoint * const __stop___tracepoints_ptrs[];
+extern __visible struct tracepoint * const __start___tracepoints_ptrs[];
+extern __visible struct tracepoint * const __stop___tracepoints_ptrs[];

/* Set to 1 to enable tracepoint debug output */
static const int tracepoint_debug;
--
1.7.7.6

2012-08-19 03:00:19

by Andi Kleen

Subject: [PATCH 58/74] lto, workaround: Work around LTO compiler problem in atheros driver

From: Andi Kleen <[email protected]>

Making these symbols visible works around a gcc 4.7 LTO compiler
problem with missing symbols.

Signed-off-by: Andi Kleen <[email protected]>
---
.../net/wireless/ath/ath9k/dfs_pattern_detector.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/dfs_pattern_detector.c b/drivers/net/wireless/ath/ath9k/dfs_pattern_detector.c
index ea2a6cf..83c579a 100644
--- a/drivers/net/wireless/ath/ath9k/dfs_pattern_detector.c
+++ b/drivers/net/wireless/ath/ath9k/dfs_pattern_detector.c
@@ -51,7 +51,7 @@ struct radar_types {
}

/* radar types as defined by ETSI EN-301-893 v1.5.1 */
-static const struct radar_detector_specs etsi_radar_ref_types_v15[] = {
+__visible const struct radar_detector_specs etsi_radar_ref_types_v15[] = {
ETSI_PATTERN(0, 0, 1, 700, 700, 1, 18),
ETSI_PATTERN(1, 0, 5, 200, 1000, 1, 10),
ETSI_PATTERN(2, 0, 15, 200, 1600, 1, 15),
--
1.7.7.6

2012-08-19 03:00:09

by Andi Kleen

Subject: [PATCH 10/74] sections: Add __visible to drivers/{base,pci} sections

From: Andi Kleen <[email protected]>

Signed-off-by: Andi Kleen <[email protected]>
---
drivers/base/firmware_class.c | 4 ++--
drivers/base/power/trace.c | 2 +-
drivers/pci/quirks.c | 28 ++++++++++++++--------------
3 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 803cfc1..618ca735 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -30,8 +30,8 @@ MODULE_LICENSE("GPL");

#ifdef CONFIG_FW_LOADER

-extern struct builtin_fw __start_builtin_fw[];
-extern struct builtin_fw __end_builtin_fw[];
+extern __visible struct builtin_fw __start_builtin_fw[];
+extern __visible struct builtin_fw __end_builtin_fw[];

static bool fw_get_builtin_firmware(struct firmware *fw, const char *name)
{
diff --git a/drivers/base/power/trace.c b/drivers/base/power/trace.c
index d94a1f5..3048afa 100644
--- a/drivers/base/power/trace.c
+++ b/drivers/base/power/trace.c
@@ -166,7 +166,7 @@ void generate_resume_trace(const void *tracedata, unsigned int user)
}
EXPORT_SYMBOL(generate_resume_trace);

-extern char __tracedata_start, __tracedata_end;
+extern __visible char __tracedata_start, __tracedata_end;
static int show_file_hash(unsigned int value)
{
int match;
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 5155317..d18ea93 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2941,20 +2941,20 @@ static void pci_do_fixups(struct pci_dev *dev, struct pci_fixup *f,
}
}

-extern struct pci_fixup __start_pci_fixups_early[];
-extern struct pci_fixup __end_pci_fixups_early[];
-extern struct pci_fixup __start_pci_fixups_header[];
-extern struct pci_fixup __end_pci_fixups_header[];
-extern struct pci_fixup __start_pci_fixups_final[];
-extern struct pci_fixup __end_pci_fixups_final[];
-extern struct pci_fixup __start_pci_fixups_enable[];
-extern struct pci_fixup __end_pci_fixups_enable[];
-extern struct pci_fixup __start_pci_fixups_resume[];
-extern struct pci_fixup __end_pci_fixups_resume[];
-extern struct pci_fixup __start_pci_fixups_resume_early[];
-extern struct pci_fixup __end_pci_fixups_resume_early[];
-extern struct pci_fixup __start_pci_fixups_suspend[];
-extern struct pci_fixup __end_pci_fixups_suspend[];
+extern __visible struct pci_fixup __start_pci_fixups_early[];
+extern __visible struct pci_fixup __end_pci_fixups_early[];
+extern __visible struct pci_fixup __start_pci_fixups_header[];
+extern __visible struct pci_fixup __end_pci_fixups_header[];
+extern __visible struct pci_fixup __start_pci_fixups_final[];
+extern __visible struct pci_fixup __end_pci_fixups_final[];
+extern __visible struct pci_fixup __start_pci_fixups_enable[];
+extern __visible struct pci_fixup __end_pci_fixups_enable[];
+extern __visible struct pci_fixup __start_pci_fixups_resume[];
+extern __visible struct pci_fixup __end_pci_fixups_resume[];
+extern __visible struct pci_fixup __start_pci_fixups_resume_early[];
+extern __visible struct pci_fixup __end_pci_fixups_resume_early[];
+extern __visible struct pci_fixup __start_pci_fixups_suspend[];
+extern __visible struct pci_fixup __end_pci_fixups_suspend[];

static bool pci_apply_fixup_final_quirks;

--
1.7.7.6

2012-08-19 03:00:01

by Andi Kleen

Subject: [PATCH 42/74] lto, raid: disable LTO for the Altivec RAID code

From: Andi Kleen <[email protected]>

The Altivec RAID code needs special compiler options for one file, which LTO does not support.

XXX: may need some more __visibles

Cc: [email protected]
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
lib/raid6/Makefile | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/lib/raid6/Makefile b/lib/raid6/Makefile
index de06dfe..55fbafb 100644
--- a/lib/raid6/Makefile
+++ b/lib/raid6/Makefile
@@ -10,7 +10,7 @@ quiet_cmd_unroll = UNROLL $@
< $< > $@ || ( rm -f $@ && exit 1 )

ifeq ($(CONFIG_ALTIVEC),y)
-altivec_flags := -maltivec -mabi=altivec
+altivec_flags := -maltivec -mabi=altivec ${DISABLE_LTO}
endif

targets += int1.c
--
1.7.7.6

2012-08-19 02:59:58

by Andi Kleen

Subject: [PATCH 35/74] lto, crypto, aes: mark AES tables __visible

From: Andi Kleen <[email protected]>

Various tables in aes_generic are accessed by assembler code.
Mark them __visible for LTO.

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
crypto/aes_generic.c | 8 ++++----
include/crypto/aes.h | 8 ++++----
2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/crypto/aes_generic.c b/crypto/aes_generic.c
index a68c73d..1cf1f89 100644
--- a/crypto/aes_generic.c
+++ b/crypto/aes_generic.c
@@ -62,7 +62,7 @@ static inline u8 byte(const u32 x, const unsigned n)

static const u32 rco_tab[10] = { 1, 2, 4, 8, 16, 32, 64, 128, 27, 54 };

-const u32 crypto_ft_tab[4][256] = {
+__visible const u32 crypto_ft_tab[4][256] = {
{
0xa56363c6, 0x847c7cf8, 0x997777ee, 0x8d7b7bf6,
0x0df2f2ff, 0xbd6b6bd6, 0xb16f6fde, 0x54c5c591,
@@ -326,7 +326,7 @@ const u32 crypto_ft_tab[4][256] = {
}
};

-const u32 crypto_fl_tab[4][256] = {
+__visible const u32 crypto_fl_tab[4][256] = {
{
0x00000063, 0x0000007c, 0x00000077, 0x0000007b,
0x000000f2, 0x0000006b, 0x0000006f, 0x000000c5,
@@ -590,7 +590,7 @@ const u32 crypto_fl_tab[4][256] = {
}
};

-const u32 crypto_it_tab[4][256] = {
+__visible const u32 crypto_it_tab[4][256] = {
{
0x50a7f451, 0x5365417e, 0xc3a4171a, 0x965e273a,
0xcb6bab3b, 0xf1459d1f, 0xab58faac, 0x9303e34b,
@@ -854,7 +854,7 @@ const u32 crypto_it_tab[4][256] = {
}
};

-const u32 crypto_il_tab[4][256] = {
+__visible const u32 crypto_il_tab[4][256] = {
{
0x00000052, 0x00000009, 0x0000006a, 0x000000d5,
0x00000030, 0x00000036, 0x000000a5, 0x00000038,
diff --git a/include/crypto/aes.h b/include/crypto/aes.h
index 7524ba3..f30d38d 100644
--- a/include/crypto/aes.h
+++ b/include/crypto/aes.h
@@ -27,10 +27,10 @@ struct crypto_aes_ctx {
u32 key_length;
};

-extern const u32 crypto_ft_tab[4][256];
-extern const u32 crypto_fl_tab[4][256];
-extern const u32 crypto_it_tab[4][256];
-extern const u32 crypto_il_tab[4][256];
+extern __visible const u32 crypto_ft_tab[4][256];
+extern __visible const u32 crypto_fl_tab[4][256];
+extern __visible const u32 crypto_it_tab[4][256];
+extern __visible const u32 crypto_il_tab[4][256];

int crypto_aes_set_key(struct crypto_tfm *tfm, const u8 *in_key,
unsigned int key_len);
--
1.7.7.6

2012-08-19 02:59:51

by Andi Kleen

Subject: [PATCH 13/74] sections: Add __visible to init/* sections

From: Andi Kleen <[email protected]>

Signed-off-by: Andi Kleen <[email protected]>
---
init/do_mounts_initrd.c | 2 +-
init/initramfs.c | 4 ++--
init/main.c | 2 +-
3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
index 135959a2..71a625e 100644
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -36,7 +36,7 @@ __setup("noinitrd", no_initrd);
static int __init do_linuxrc(void *_shell)
{
static const char *argv[] = { "linuxrc", NULL, };
- extern const char *envp_init[];
+ extern __visible const char *envp_init[];
const char *shell = _shell;

sys_close(old_fd);sys_close(root_fd);
diff --git a/init/initramfs.c b/init/initramfs.c
index 84c6bf1..8a1fd07 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -493,8 +493,8 @@ static int __init retain_initrd_param(char *str)
}
__setup("retain_initrd", retain_initrd_param);

-extern char __initramfs_start[];
-extern unsigned long __initramfs_size;
+extern __visible char __initramfs_start[];
+extern __visible unsigned long __initramfs_size;
#include <linux/initrd.h>
#include <linux/kexec.h>

diff --git a/init/main.c b/init/main.c
index e60679d..6438ffd 100644
--- a/init/main.c
+++ b/init/main.c
@@ -470,7 +470,7 @@ static void __init mm_init(void)
asmlinkage void __init start_kernel(void)
{
char * command_line;
- extern const struct kernel_param __start___param[], __stop___param[];
+ extern __visible const struct kernel_param __start___param[], __stop___param[];

/*
* Need to run as early as possible, to initialize the
--
1.7.7.6

2012-08-19 02:59:49

by Andi Kleen

Subject: [PATCH 03/74] sections: Make external kallsyms tables __visible

From: Andi Kleen <[email protected]>

Signed-off-by: Andi Kleen <[email protected]>
---
kernel/kallsyms.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 2169fee..1b40cb7 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -36,20 +36,20 @@
* These will be re-linked against their real values
* during the second link stage.
*/
-extern const unsigned long kallsyms_addresses[] __attribute__((weak));
-extern const u8 kallsyms_names[] __attribute__((weak));
+extern __visible const unsigned long kallsyms_addresses[] __attribute__((weak));
+extern __visible const u8 kallsyms_names[] __attribute__((weak));

/*
* Tell the compiler that the count isn't in the small data section if the arch
* has one (eg: FRV).
*/
-extern const unsigned long kallsyms_num_syms
+extern __visible const unsigned long kallsyms_num_syms
__attribute__((weak, section(".rodata")));

-extern const u8 kallsyms_token_table[] __attribute__((weak));
-extern const u16 kallsyms_token_index[] __attribute__((weak));
+extern __visible const u8 kallsyms_token_table[] __attribute__((weak));
+extern __visible const u16 kallsyms_token_index[] __attribute__((weak));

-extern const unsigned long kallsyms_markers[] __attribute__((weak));
+extern __visible const unsigned long kallsyms_markers[] __attribute__((weak));

static inline int is_kernel_inittext(unsigned long addr)
{
--
1.7.7.6

2012-08-19 02:59:46

by Andi Kleen

Subject: [PATCH 02/74] sections: Make all standard section identifiers __visible

From: Andi Kleen <[email protected]>

Signed-off-by: Andi Kleen <[email protected]>
---
include/asm-generic/sections.h | 24 ++++++++++++------------
1 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index c1a1216..eab95aa 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -3,20 +3,20 @@

/* References to section boundaries */

-extern char _text[], _stext[], _etext[];
-extern char _data[], _sdata[], _edata[];
-extern char __bss_start[], __bss_stop[];
-extern char __init_begin[], __init_end[];
-extern char _sinittext[], _einittext[];
-extern char _end[];
-extern char __per_cpu_load[], __per_cpu_start[], __per_cpu_end[];
-extern char __kprobes_text_start[], __kprobes_text_end[];
-extern char __entry_text_start[], __entry_text_end[];
-extern char __initdata_begin[], __initdata_end[];
-extern char __start_rodata[], __end_rodata[];
+extern __visible char _text[], _stext[], _etext[];
+extern __visible char _data[], _sdata[], _edata[];
+extern __visible char __bss_start[], __bss_stop[];
+extern __visible char __init_begin[], __init_end[];
+extern __visible char _sinittext[], _einittext[];
+extern __visible char _end[];
+extern __visible char __per_cpu_load[], __per_cpu_start[], __per_cpu_end[];
+extern __visible char __kprobes_text_start[], __kprobes_text_end[];
+extern __visible char __entry_text_start[], __entry_text_end[];
+extern __visible char __initdata_begin[], __initdata_end[];
+extern __visible char __start_rodata[], __end_rodata[];

/* Start and end of .ctors section - used for constructor calls. */
-extern char __ctors_start[], __ctors_end[];
+extern __visible char __ctors_start[], __ctors_end[];

/* function descriptor handling (if any). Override
* in asm/sections.h */
--
1.7.7.6

2012-08-19 02:59:43

by Andi Kleen

Subject: [PATCH 25/74] x86, lto: Fix sys_call_table type in asm/syscall.h v2

From: Andi Kleen <[email protected]>

Make the sys_call_table type defined in asm/syscall.h match
the definition in syscall_64.c

v2: Include asm/syscall.h in syscall_64.c too. I left uml alone
because it doesn't have a syscall.h of its own, and including
the native one leads to other errors.

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/include/asm/syscall.h | 3 ++-
arch/x86/kernel/syscall_64.c | 3 +--
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index 1ace47b..c36962d 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -20,7 +20,8 @@
#include <asm/thread_info.h> /* for TS_COMPAT */
#include <asm/unistd.h>

-extern const unsigned long sys_call_table[];
+typedef void (*sys_call_ptr_t)(void);
+extern const sys_call_ptr_t sys_call_table[];

/*
* Only the low 32 bits of orig_ax are meaningful, so we return int.
diff --git a/arch/x86/kernel/syscall_64.c b/arch/x86/kernel/syscall_64.c
index 3967318..4ac730b 100644
--- a/arch/x86/kernel/syscall_64.c
+++ b/arch/x86/kernel/syscall_64.c
@@ -4,6 +4,7 @@
#include <linux/sys.h>
#include <linux/cache.h>
#include <asm/asm-offsets.h>
+#include <asm/syscall.h>

#define __SYSCALL_COMMON(nr, sym, compat) __SYSCALL_64(nr, sym, compat)

@@ -19,8 +20,6 @@

#define __SYSCALL_64(nr, sym, compat) [nr] = sym,

-typedef void (*sys_call_ptr_t)(void);
-
extern void sys_ni_syscall(void);

asmlinkage const sys_call_ptr_t sys_call_table[__NR_syscall_max+1] = {
--
1.7.7.6

2012-08-19 02:59:39

by Andi Kleen

Subject: [PATCH 34/74] x86, lto, lguest: Fix C functions used by inline assembler

From: Andi Kleen <[email protected]>

- Make the C code used by the paravirt stubs visible
- Since they have to be global now, give them more unique
names.

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/lguest/boot.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
index 642d880..dd167d2 100644
--- a/arch/x86/lguest/boot.c
+++ b/arch/x86/lguest/boot.c
@@ -234,13 +234,13 @@ static void lguest_end_context_switch(struct task_struct *next)
* flags word contains all kind of stuff, but in practice Linux only cares
* about the interrupt flag. Our "save_flags()" just returns that.
*/
-static unsigned long save_fl(void)
+asmlinkage unsigned long lguest_save_fl(void)
{
return lguest_data.irq_enabled;
}

/* Interrupts go off... */
-static void irq_disable(void)
+asmlinkage void lguest_irq_disable(void)
{
lguest_data.irq_enabled = 0;
}
@@ -254,8 +254,8 @@ static void irq_disable(void)
* PV_CALLEE_SAVE_REGS_THUNK(), which pushes %eax onto the stack, calls the
* C function, then restores it.
*/
-PV_CALLEE_SAVE_REGS_THUNK(save_fl);
-PV_CALLEE_SAVE_REGS_THUNK(irq_disable);
+PV_CALLEE_SAVE_REGS_THUNK(lguest_save_fl);
+PV_CALLEE_SAVE_REGS_THUNK(lguest_irq_disable);
/*:*/

/* These are in i386_head.S */
@@ -1285,9 +1285,9 @@ __init void lguest_init(void)
*/

/* Interrupt-related operations */
- pv_irq_ops.save_fl = PV_CALLEE_SAVE(save_fl);
+ pv_irq_ops.save_fl = PV_CALLEE_SAVE(lguest_save_fl);
pv_irq_ops.restore_fl = __PV_IS_CALLEE_SAVE(lg_restore_fl);
- pv_irq_ops.irq_disable = PV_CALLEE_SAVE(irq_disable);
+ pv_irq_ops.irq_disable = PV_CALLEE_SAVE(lguest_irq_disable);
pv_irq_ops.irq_enable = __PV_IS_CALLEE_SAVE(lg_irq_enable);
pv_irq_ops.safe_halt = lguest_safe_halt;

--
1.7.7.6

2012-08-19 02:59:36

by Andi Kleen

Subject: [PATCH 73/74] lto, module: Warn about modules that are not fully LTOed

From: Andi Kleen <[email protected]>

When a __gnu_lto_* symbol is present, the module has not been
link time optimized yet.

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
kernel/module.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 2cbbae3..a8a29c4 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1905,8 +1905,11 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
switch (sym[i].st_shndx) {
case SHN_COMMON:
/* Ignore common symbols */
- if (!strncmp(name, "__gnu_lto", 9))
+ if (!strncmp(name, "__gnu_lto", 9)) {
+ printk("%s: module not link time optimized\n",
+ mod->name);
break;
+ }

/* We compiled with -fno-common. These are not
supposed to happen. */
--
1.7.7.6

2012-08-19 02:59:31

by Andi Kleen

Subject: [PATCH 30/74] x86, lto: Fix AMD K6 indirect call check

From: Andi Kleen <[email protected]>

The AMD K6 errata check relies on timing an indirect call.
But the way it was written it could be optimized into a direct call.
Force gcc to actually emit an indirect call instead of
constant-resolving the target address.

Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/kernel/cpu/amd.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 2c02d92..cb72014 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -116,7 +116,7 @@ static void __cpuinit init_amd_k6(struct cpuinfo_x86 *c)
*/

n = K6_BUG_LOOP;
- f_vide = vide;
+ asm("" : "=r" (f_vide) : "0" (vide));
rdtscl(d);
while (n--)
f_vide();
--
1.7.7.6

2012-08-19 02:59:28

by Andi Kleen

Subject: [PATCH 07/74] sections: Add __visible to sh sections

From: Andi Kleen <[email protected]>

Signed-off-by: Andi Kleen <[email protected]>
---
arch/sh/include/asm/sections.h | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/sh/include/asm/sections.h b/arch/sh/include/asm/sections.h
index 1b61997..e18e132 100644
--- a/arch/sh/include/asm/sections.h
+++ b/arch/sh/include/asm/sections.h
@@ -3,10 +3,10 @@

#include <asm-generic/sections.h>

-extern long __nosave_begin, __nosave_end;
-extern long __machvec_start, __machvec_end;
-extern char __uncached_start, __uncached_end;
-extern char __start_eh_frame[], __stop_eh_frame[];
+extern __visible long __nosave_begin, __nosave_end;
+extern __visible long __machvec_start, __machvec_end;
+extern __visible char __uncached_start, __uncached_end;
+extern __visible char __start_eh_frame[], __stop_eh_frame[];

#endif /* __ASM_SH_SECTIONS_H */

--
1.7.7.6

2012-08-19 02:59:11

by Andi Kleen

Subject: [PATCH 48/74] x86, lto: Use inline assembler instead of global register variable to get sp

From: Andi Kleen <[email protected]>

LTO in gcc 4.6/4.7 has trouble with global register variables. They were used
to read the stack pointer. Use a simple inline assembler statement instead.

I verified this generates the same binary (on 64bit) as the original
register variable.

Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/include/asm/thread_info.h | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 89f794f..d9fbfa1 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -167,9 +167,11 @@ struct thread_info {
*/
#ifndef __ASSEMBLY__

-
-/* how to get the current stack pointer from C */
-register unsigned long current_stack_pointer asm("esp") __used;
+#define current_stack_pointer ({ \
+ unsigned long sp; \
+ asm("mov %%esp,%0" : "=r" (sp)); \
+ sp; \
+})

/* how to get the thread information struct from C */
static inline struct thread_info *current_thread_info(void)
--
1.7.7.6

2012-08-19 02:59:07

by Andi Kleen

Subject: [PATCH 36/74] lto, crypto, camellia: Make camellia tables used by assembler __visible

From: Andi Kleen <[email protected]>

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/crypto/camellia_glue.c | 16 ++++++++--------
1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/crypto/camellia_glue.c b/arch/x86/crypto/camellia_glue.c
index eeb2b3b..f290db7 100644
--- a/arch/x86/crypto/camellia_glue.c
+++ b/arch/x86/crypto/camellia_glue.c
@@ -91,7 +91,7 @@ static void camellia_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
}

/* camellia sboxes */
-const u64 camellia_sp10011110[256] = {
+__visible const u64 camellia_sp10011110[256] = {
0x7000007070707000, 0x8200008282828200, 0x2c00002c2c2c2c00,
0xec0000ecececec00, 0xb30000b3b3b3b300, 0x2700002727272700,
0xc00000c0c0c0c000, 0xe50000e5e5e5e500, 0xe40000e4e4e4e400,
@@ -180,7 +180,7 @@ const u64 camellia_sp10011110[256] = {
0x9e00009e9e9e9e00,
};

-const u64 camellia_sp22000222[256] = {
+__visible const u64 camellia_sp22000222[256] = {
0xe0e0000000e0e0e0, 0x0505000000050505, 0x5858000000585858,
0xd9d9000000d9d9d9, 0x6767000000676767, 0x4e4e0000004e4e4e,
0x8181000000818181, 0xcbcb000000cbcbcb, 0xc9c9000000c9c9c9,
@@ -269,7 +269,7 @@ const u64 camellia_sp22000222[256] = {
0x3d3d0000003d3d3d,
};

-const u64 camellia_sp03303033[256] = {
+__visible const u64 camellia_sp03303033[256] = {
0x0038380038003838, 0x0041410041004141, 0x0016160016001616,
0x0076760076007676, 0x00d9d900d900d9d9, 0x0093930093009393,
0x0060600060006060, 0x00f2f200f200f2f2, 0x0072720072007272,
@@ -358,7 +358,7 @@ const u64 camellia_sp03303033[256] = {
0x004f4f004f004f4f,
};

-const u64 camellia_sp00444404[256] = {
+__visible const u64 camellia_sp00444404[256] = {
0x0000707070700070, 0x00002c2c2c2c002c, 0x0000b3b3b3b300b3,
0x0000c0c0c0c000c0, 0x0000e4e4e4e400e4, 0x0000575757570057,
0x0000eaeaeaea00ea, 0x0000aeaeaeae00ae, 0x0000232323230023,
@@ -447,7 +447,7 @@ const u64 camellia_sp00444404[256] = {
0x00009e9e9e9e009e,
};

-const u64 camellia_sp02220222[256] = {
+__visible const u64 camellia_sp02220222[256] = {
0x00e0e0e000e0e0e0, 0x0005050500050505, 0x0058585800585858,
0x00d9d9d900d9d9d9, 0x0067676700676767, 0x004e4e4e004e4e4e,
0x0081818100818181, 0x00cbcbcb00cbcbcb, 0x00c9c9c900c9c9c9,
@@ -536,7 +536,7 @@ const u64 camellia_sp02220222[256] = {
0x003d3d3d003d3d3d,
};

-const u64 camellia_sp30333033[256] = {
+__visible const u64 camellia_sp30333033[256] = {
0x3800383838003838, 0x4100414141004141, 0x1600161616001616,
0x7600767676007676, 0xd900d9d9d900d9d9, 0x9300939393009393,
0x6000606060006060, 0xf200f2f2f200f2f2, 0x7200727272007272,
@@ -625,7 +625,7 @@ const u64 camellia_sp30333033[256] = {
0x4f004f4f4f004f4f,
};

-const u64 camellia_sp44044404[256] = {
+__visible const u64 camellia_sp44044404[256] = {
0x7070007070700070, 0x2c2c002c2c2c002c, 0xb3b300b3b3b300b3,
0xc0c000c0c0c000c0, 0xe4e400e4e4e400e4, 0x5757005757570057,
0xeaea00eaeaea00ea, 0xaeae00aeaeae00ae, 0x2323002323230023,
@@ -714,7 +714,7 @@ const u64 camellia_sp44044404[256] = {
0x9e9e009e9e9e009e,
};

-const u64 camellia_sp11101110[256] = {
+__visible const u64 camellia_sp11101110[256] = {
0x7070700070707000, 0x8282820082828200, 0x2c2c2c002c2c2c00,
0xececec00ececec00, 0xb3b3b300b3b3b300, 0x2727270027272700,
0xc0c0c000c0c0c000, 0xe5e5e500e5e5e500, 0xe4e4e400e4e4e400,
--
1.7.7.6

2012-08-19 02:59:01

by Andi Kleen

Subject: [PATCH 18/74] lto, wan/sbni: Make inline assembler symbols visible and assembler global

From: Andi Kleen <[email protected]>

- Inline assembler defining C-callable code has to be global
- The function has to be visible

Do this in wan/sbni.

Signed-off-by: Andi Kleen <[email protected]>
---
drivers/net/wan/sbni.c | 6 +++---
drivers/net/wan/sbni.h | 2 +-
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wan/sbni.c b/drivers/net/wan/sbni.c
index d43f4ef..b914ab3 100644
--- a/drivers/net/wan/sbni.c
+++ b/drivers/net/wan/sbni.c
@@ -160,7 +160,7 @@ static int scandone __initdata = 0;
static int num __initdata = 0;

static unsigned char rxl_tab[];
-static u32 crc32tab[];
+__visible u32 sbni_crc32tab[];

/* A list of all installed devices, for removing the driver module. */
static struct net_device *sbni_cards[ SBNI_MAX_NUM_CARDS ];
@@ -1563,7 +1563,7 @@ calc_crc32( u32 crc, u8 *p, u32 len )
"xorl %%ebx, %%ebx\n"
"movl %2, %%esi\n"
"movl %3, %%ecx\n"
- "movl $crc32tab, %%edi\n"
+ "movl $sbni_crc32tab, %%edi\n"
"shrl $2, %%ecx\n"
"jz 1f\n"

@@ -1645,7 +1645,7 @@ calc_crc32( u32 crc, u8 *p, u32 len )
#endif /* ASM_CRC */


-static u32 crc32tab[] __attribute__ ((aligned(8))) = {
+__visible u32 sbni_crc32tab[] __attribute__ ((aligned(8))) = {
0xD202EF8D, 0xA505DF1B, 0x3C0C8EA1, 0x4B0BBE37,
0xD56F2B94, 0xA2681B02, 0x3B614AB8, 0x4C667A2E,
0xDCD967BF, 0xABDE5729, 0x32D70693, 0x45D03605,
diff --git a/drivers/net/wan/sbni.h b/drivers/net/wan/sbni.h
index 8426451..7e6d980 100644
--- a/drivers/net/wan/sbni.h
+++ b/drivers/net/wan/sbni.h
@@ -132,7 +132,7 @@ struct sbni_flags {
/*
* CRC-32 stuff
*/
-#define CRC32(c,crc) (crc32tab[((size_t)(crc) ^ (c)) & 0xff] ^ (((crc) >> 8) & 0x00FFFFFF))
+#define CRC32(c,crc) (sbni_crc32tab[((size_t)(crc) ^ (c)) & 0xff] ^ (((crc) >> 8) & 0x00FFFFFF))
/* CRC generator 0xEDB88320 */
/* CRC remainder 0x2144DF1C */
/* CRC initial value 0x00000000 */
--
1.7.7.6

2012-08-19 02:58:51

by Andi Kleen

Subject: [PATCH 20/74] x86, lto: Change dotraplinkage into __visible on 32bit

From: Andi Kleen <[email protected]>

Mark 32bit dotraplinkage functions as __visible for LTO.
64bit already uses asmlinkage, which includes it.

Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/include/asm/traps.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 88eae2a..cc67b3f 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -7,7 +7,7 @@
#include <asm/siginfo.h> /* TRAP_TRACE, ... */

#ifdef CONFIG_X86_32
-#define dotraplinkage
+#define dotraplinkage __visible
#else
#define dotraplinkage asmlinkage
#endif
--
1.7.7.6

2012-08-19 02:58:48

by Andi Kleen

Subject: [PATCH 27/74] lto: Mark EXPORT_SYMBOL symbols __visible

From: Andi Kleen <[email protected]>

Signed-off-by: Andi Kleen <[email protected]>
---
include/linux/export.h | 14 ++++++++------
1 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/include/linux/export.h b/include/linux/export.h
index 696c0f4..784617e 100644
--- a/include/linux/export.h
+++ b/include/linux/export.h
@@ -47,7 +47,7 @@ extern struct module __this_module;

/* For every exported symbol, place a struct in the __ksymtab section */
#define __EXPORT_SYMBOL(sym, sec) \
- extern typeof(sym) sym; \
+ extern typeof(sym) sym __visible; \
__CRC_SYMBOL(sym, sec) \
static const char __kstrtab_##sym[] \
__attribute__((section("__ksymtab_strings"), aligned(1))) \
@@ -78,11 +78,13 @@ extern struct module __this_module;

#else /* !CONFIG_MODULES... */

-#define EXPORT_SYMBOL(sym)
-#define EXPORT_SYMBOL_GPL(sym)
-#define EXPORT_SYMBOL_GPL_FUTURE(sym)
-#define EXPORT_UNUSED_SYMBOL(sym)
-#define EXPORT_UNUSED_SYMBOL_GPL(sym)
+/* Even without modules keep the __visible side effect */
+
+#define EXPORT_SYMBOL(sym) extern typeof(sym) sym __visible
+#define EXPORT_SYMBOL_GPL(sym) extern typeof(sym) sym __visible
+#define EXPORT_SYMBOL_GPL_FUTURE(sym) extern typeof(sym) sym __visible
+#define EXPORT_UNUSED_SYMBOL(sym) extern typeof(sym) sym __visible
+#define EXPORT_UNUSED_SYMBOL_GPL(sym) extern typeof(sym) sym __visible

#endif /* CONFIG_MODULES */

--
1.7.7.6

2012-08-19 02:58:44

by Andi Kleen

Subject: [PATCH 09/74] sections: Add __visible to x86 sections

From: Andi Kleen <[email protected]>

Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/kernel/alternative.c | 4 ++--
arch/x86/kernel/vsyscall_64.c | 4 ++--
arch/x86/power/hibernate_32.c | 2 +-
arch/x86/um/vdso/vma.c | 2 +-
arch/x86/vdso/vma.c | 10 +++++-----
5 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index afb7ff7..27ae345 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -252,8 +252,8 @@ static void __init_or_module add_nops(void *insns, unsigned int len)
}
}

-extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
-extern s32 __smp_locks[], __smp_locks_end[];
+extern __visible struct alt_instr __alt_instructions[], __alt_instructions_end[];
+extern __visible s32 __smp_locks[], __smp_locks_end[];
void *text_poke_early(void *addr, const void *opcode, size_t len);

/* Replace instructions with better alternatives for this CPU type.
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 8d141b3..70f25f2 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -355,9 +355,9 @@ cpu_vsyscall_notifier(struct notifier_block *n, unsigned long action, void *arg)

void __init map_vsyscall(void)
{
- extern char __vsyscall_page;
+ extern __visible char __vsyscall_page;
unsigned long physaddr_vsyscall = __pa_symbol(&__vsyscall_page);
- extern char __vvar_page;
+ extern __visible char __vvar_page;
unsigned long physaddr_vvar_page = __pa_symbol(&__vvar_page);

__set_fixmap(VSYSCALL_FIRST_PAGE, physaddr_vsyscall,
diff --git a/arch/x86/power/hibernate_32.c b/arch/x86/power/hibernate_32.c
index 74202c1..7b8d7df 100644
--- a/arch/x86/power/hibernate_32.c
+++ b/arch/x86/power/hibernate_32.c
@@ -18,7 +18,7 @@
extern int restore_image(void);

/* References to section boundaries */
-extern const void __nosave_begin, __nosave_end;
+extern __visible const void __nosave_begin, __nosave_end;

/* Pointer to the temporary resume page tables */
pgd_t *resume_pg_dir;
diff --git a/arch/x86/um/vdso/vma.c b/arch/x86/um/vdso/vma.c
index af91901..a09f903 100644
--- a/arch/x86/um/vdso/vma.c
+++ b/arch/x86/um/vdso/vma.c
@@ -16,7 +16,7 @@ unsigned int __read_mostly vdso_enabled = 1;
unsigned long um_vdso_addr;

extern unsigned long task_size;
-extern char vdso_start[], vdso_end[];
+extern __visible char vdso_start[], vdso_end[];

static struct page **vdsop;

diff --git a/arch/x86/vdso/vma.c b/arch/x86/vdso/vma.c
index 00aaf04..fe08e2b 100644
--- a/arch/x86/vdso/vma.c
+++ b/arch/x86/vdso/vma.c
@@ -18,15 +18,15 @@

unsigned int __read_mostly vdso_enabled = 1;

-extern char vdso_start[], vdso_end[];
-extern unsigned short vdso_sync_cpuid;
+extern __visible char vdso_start[], vdso_end[];
+extern __visible unsigned short vdso_sync_cpuid;

-extern struct page *vdso_pages[];
+extern __visible struct page *vdso_pages[];
static unsigned vdso_size;

#ifdef CONFIG_X86_X32_ABI
-extern char vdsox32_start[], vdsox32_end[];
-extern struct page *vdsox32_pages[];
+extern __visible char vdsox32_start[], vdsox32_end[];
+extern __visible struct page *vdsox32_pages[];
static unsigned vdsox32_size;

static void __init patch_vdsox32(void *vdso, size_t len)
--
1.7.7.6

2012-08-19 02:58:42

by Andi Kleen

Subject: [PATCH 44/74] lto: Mark functions used by the vsyscall init code visible

From: Andi Kleen <[email protected]>

The vsyscall code is compiled without LTO. This also includes
its init function. The functions called by it have to be
visible; otherwise they could be optimized away.

Signed-off-by: Andi Kleen <[email protected]>
---
drivers/char/random.c | 2 +-
include/linux/mm.h | 1 +
mm/vmalloc.c | 2 +-
3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index b86eae9..33c8fe5 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1450,7 +1450,7 @@ late_initcall(random_int_secret_init);
* depleting entropy is too high
*/
static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash);
-unsigned int get_random_int(void)
+__visible unsigned int get_random_int(void)
{
__u32 *hash;
unsigned int ret;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 311be90..d3ca155 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1413,6 +1413,7 @@ extern void set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file);
extern struct file *get_mm_exe_file(struct mm_struct *mm);

extern int may_expand_vm(struct mm_struct *mm, unsigned long npages);
+__visible
extern int install_special_mapping(struct mm_struct *mm,
unsigned long addr, unsigned long len,
unsigned long flags, struct page **pages);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 2bb90b1..286ea0c 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1558,7 +1558,7 @@ EXPORT_SYMBOL(vunmap);
* Maps @count pages from @pages into contiguous kernel virtual
* space.
*/
-void *vmap(struct page **pages, unsigned int count,
+__visible void *vmap(struct page **pages, unsigned int count,
unsigned long flags, pgprot_t prot)
{
struct vm_struct *area;
--
1.7.7.6

2012-08-19 02:58:39

by Andi Kleen

Subject: [PATCH 59/74] lto: Handle LTO common symbols in module loader

From: Joe Mario <[email protected]>

Here is the workaround I made for having the kernel not reject modules
built with -flto. The clean solution would be to get the compiler to not
emit the symbol. Or if it has to emit the symbol, then emit it as
initialized data but put it into a comdat/linkonce section.

Minor tweaks by AK over Joe's patch.

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
kernel/module.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index c00565a..2cbbae3 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1904,6 +1904,10 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)

switch (sym[i].st_shndx) {
case SHN_COMMON:
+ /* Ignore common symbols */
+ if (!strncmp(name, "__gnu_lto", 9))
+ break;
+
/* We compiled with -fno-common. These are not
supposed to happen. */
pr_debug("Common symbol: %s\n", name);
--
1.7.7.6

2012-08-19 02:58:33

by Andi Kleen

Subject: [PATCH 39/74] x86, lto: Mark vdso variables __visible

From: Andi Kleen <[email protected]>

Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/include/asm/vvar.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/vvar.h b/arch/x86/include/asm/vvar.h
index de656ac..d76ac40 100644
--- a/arch/x86/include/asm/vvar.h
+++ b/arch/x86/include/asm/vvar.h
@@ -35,7 +35,7 @@

#define DEFINE_VVAR(type, name) \
type name \
- __attribute__((section(".vvar_" #name), aligned(16)))
+ __attribute__((section(".vvar_" #name), aligned(16))) __visible

#define VVAR(name) (*vvaraddr_ ## name)

--
1.7.7.6

2012-08-19 02:58:30

by Andi Kleen

Subject: [PATCH 50/74] x86, lto: Make empty_zero_page __visible for LTO

From: Andi Kleen <[email protected]>

Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/include/asm/pgtable.h | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 49afb3f..72b24ab 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -22,7 +22,8 @@
* ZERO_PAGE is a global shared page that is always zero: used
* for zero-mapped memory areas etc..
*/
-extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
+extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)]
+ __visible;
#define ZERO_PAGE(vaddr) (virt_to_page(empty_zero_page))

extern spinlock_t pgd_lock;
--
1.7.7.6

2012-08-19 07:53:11

by Jan Beulich

Subject: Re: [PATCH 03/74] sections: Make external kallsyms tables __visible

>>> Andi Kleen <[email protected]> 08/19/12 5:02 AM >>>
>-extern const unsigned long kallsyms_addresses[] __attribute__((weak));
>-extern const u8 kallsyms_names[] __attribute__((weak));
>+extern __visible const unsigned long kallsyms_addresses[] __attribute__((weak));
>+extern __visible const u8 kallsyms_names[] __attribute__((weak));

Shouldn't we minimally aim at consistency here:
- all attributes in one place (I personally prefer the placement between type
and name, for compatibility with other compilers, but there are rare cases -
iirc not on declarations though - where gcc doesn't allow this)
- not using open coded __attribute__(()) when a definition (here: __weak) is
available, or alternatively open coding all of them (__attribute__((weak, ...)))?

Jan

2012-08-19 08:26:11

by Jeremy Fitzhardinge

Subject: Re: [PATCH 52/74] x86, lto, paravirt: Don't rely on local assembler labels

On 08/18/2012 07:56 PM, Andi Kleen wrote:
> From: Andi Kleen <[email protected]>
>
> The paravirt patching code assumes that it can reference a
> local assembler label between two different top level assembler
> statements. This does not work with some experimental gcc builds,
> where the assembler code may end up in different assembler files.

Egad, what are those zany gcc chaps up to now?

J

>
> Replace it with extern / global /asm linkage labels.
>
> This also removes one redundant copy of the macro.
>
> Cc: [email protected]
> Signed-off-by: Andi Kleen <[email protected]>
> ---
> arch/x86/include/asm/paravirt_types.h | 9 +++++----
> arch/x86/kernel/paravirt.c | 5 -----
> 2 files changed, 5 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
> index 4f262bc..6a464ba 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -385,10 +385,11 @@ extern struct pv_lock_ops pv_lock_ops;
> _paravirt_alt(insn_string, "%c[paravirt_typenum]", "%c[paravirt_clobber]")
>
> /* Simple instruction patching code. */
> -#define DEF_NATIVE(ops, name, code) \
> - extern const char start_##ops##_##name[] __visible, \
> - end_##ops##_##name[] __visible; \
> - asm("start_" #ops "_" #name ": " code "; end_" #ops "_" #name ":")
> +#define NATIVE_LABEL(a,x,b) "\n\t.globl " a #x "_" #b "\n" a #x "_" #b ":\n\t"
> +
> +#define DEF_NATIVE(ops, name, code) \
> + __visible extern const char start_##ops##_##name[], end_##ops##_##name[]; \
> + asm(NATIVE_LABEL("start_", ops, name) code NATIVE_LABEL("end_", ops, name))
>
> unsigned paravirt_patch_nop(void);
> unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len);
> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> index 17fff18..947255e 100644
> --- a/arch/x86/kernel/paravirt.c
> +++ b/arch/x86/kernel/paravirt.c
> @@ -62,11 +62,6 @@ void __init default_banner(void)
> pv_info.name);
> }
>
> -/* Simple instruction patching code. */
> -#define DEF_NATIVE(ops, name, code) \
> - extern const char start_##ops##_##name[], end_##ops##_##name[]; \
> - asm("start_" #ops "_" #name ": " code "; end_" #ops "_" #name ":")
> -
> /* Undefined instruction for dealing with missing ops pointers. */
> static const unsigned char ud2a[] = { 0x0f, 0x0b };
>

2012-08-19 08:27:08

by Jeremy Fitzhardinge

Subject: Re: [PATCH 53/74] x86, lto, paravirt: Make paravirt thunks global

On 08/18/2012 07:56 PM, Andi Kleen wrote:
> From: Andi Kleen <[email protected]>
>
> The paravirt thunks use a hack of using a static reference to a static
> function to reference that function from the top level statement.
>
> This assumes that gcc always generates static function names in a specific
> format, which is not necessarily true.
>
> Simply make these functions global and asmlinkage. This way the
> static __used variables are not needed and everything works.

I'm not a huge fan of unstaticing all this stuff, but it doesn't
surprise me that the current code is brittle in the face of gcc changes.

J

>
> Changed in paravirt and in all users (Xen and vsmp)
>
> Cc: [email protected]
> Signed-off-by: Andi Kleen <[email protected]>
> ---
> arch/x86/include/asm/paravirt.h | 2 +-
> arch/x86/kernel/vsmp_64.c | 8 ++++----
> arch/x86/xen/irq.c | 8 ++++----
> arch/x86/xen/mmu.c | 16 ++++++++--------
> 4 files changed, 17 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index a0facf3..cc733a6 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -804,9 +804,9 @@ static __always_inline void arch_spin_unlock(struct arch_spinlock *lock)
> */
> #define PV_CALLEE_SAVE_REGS_THUNK(func) \
> extern typeof(func) __raw_callee_save_##func; \
> - static void *__##func##__ __used = func; \
> \
> asm(".pushsection .text;" \
> + ".globl __raw_callee_save_" #func " ; " \
> "__raw_callee_save_" #func ": " \
> PV_SAVE_ALL_CALLER_REGS \
> "call " #func ";" \
> diff --git a/arch/x86/kernel/vsmp_64.c b/arch/x86/kernel/vsmp_64.c
> index 992f890..f393d6d 100644
> --- a/arch/x86/kernel/vsmp_64.c
> +++ b/arch/x86/kernel/vsmp_64.c
> @@ -33,7 +33,7 @@
> * and vice versa.
> */
>
> -static unsigned long vsmp_save_fl(void)
> +asmlinkage unsigned long vsmp_save_fl(void)
> {
> unsigned long flags = native_save_fl();
>
> @@ -43,7 +43,7 @@ static unsigned long vsmp_save_fl(void)
> }
> PV_CALLEE_SAVE_REGS_THUNK(vsmp_save_fl);
>
> -static void vsmp_restore_fl(unsigned long flags)
> +asmlinkage void vsmp_restore_fl(unsigned long flags)
> {
> if (flags & X86_EFLAGS_IF)
> flags &= ~X86_EFLAGS_AC;
> @@ -53,7 +53,7 @@ static void vsmp_restore_fl(unsigned long flags)
> }
> PV_CALLEE_SAVE_REGS_THUNK(vsmp_restore_fl);
>
> -static void vsmp_irq_disable(void)
> +asmlinkage void vsmp_irq_disable(void)
> {
> unsigned long flags = native_save_fl();
>
> @@ -61,7 +61,7 @@ static void vsmp_irq_disable(void)
> }
> PV_CALLEE_SAVE_REGS_THUNK(vsmp_irq_disable);
>
> -static void vsmp_irq_enable(void)
> +asmlinkage void vsmp_irq_enable(void)
> {
> unsigned long flags = native_save_fl();
>
> diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
> index 1573376..3dd8831 100644
> --- a/arch/x86/xen/irq.c
> +++ b/arch/x86/xen/irq.c
> @@ -21,7 +21,7 @@ void xen_force_evtchn_callback(void)
> (void)HYPERVISOR_xen_version(0, NULL);
> }
>
> -static unsigned long xen_save_fl(void)
> +asmlinkage unsigned long xen_save_fl(void)
> {
> struct vcpu_info *vcpu;
> unsigned long flags;
> @@ -39,7 +39,7 @@ static unsigned long xen_save_fl(void)
> }
> PV_CALLEE_SAVE_REGS_THUNK(xen_save_fl);
>
> -static void xen_restore_fl(unsigned long flags)
> +asmlinkage void xen_restore_fl(unsigned long flags)
> {
> struct vcpu_info *vcpu;
>
> @@ -66,7 +66,7 @@ static void xen_restore_fl(unsigned long flags)
> }
> PV_CALLEE_SAVE_REGS_THUNK(xen_restore_fl);
>
> -static void xen_irq_disable(void)
> +asmlinkage void xen_irq_disable(void)
> {
> /* There's a one instruction preempt window here. We need to
> make sure we're don't switch CPUs between getting the vcpu
> @@ -77,7 +77,7 @@ static void xen_irq_disable(void)
> }
> PV_CALLEE_SAVE_REGS_THUNK(xen_irq_disable);
>
> -static void xen_irq_enable(void)
> +asmlinkage void xen_irq_enable(void)
> {
> struct vcpu_info *vcpu;
>
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index b65a761..9f82443 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -429,7 +429,7 @@ static pteval_t iomap_pte(pteval_t val)
> return val;
> }
>
> -static pteval_t xen_pte_val(pte_t pte)
> +asmlinkage pteval_t xen_pte_val(pte_t pte)
> {
> pteval_t pteval = pte.pte;
> #if 0
> @@ -446,7 +446,7 @@ static pteval_t xen_pte_val(pte_t pte)
> }
> PV_CALLEE_SAVE_REGS_THUNK(xen_pte_val);
>
> -static pgdval_t xen_pgd_val(pgd_t pgd)
> +asmlinkage pgdval_t xen_pgd_val(pgd_t pgd)
> {
> return pte_mfn_to_pfn(pgd.pgd);
> }
> @@ -477,7 +477,7 @@ void xen_set_pat(u64 pat)
> WARN_ON(pat != 0x0007010600070106ull);
> }
>
> -static pte_t xen_make_pte(pteval_t pte)
> +asmlinkage pte_t xen_make_pte(pteval_t pte)
> {
> phys_addr_t addr = (pte & PTE_PFN_MASK);
> #if 0
> @@ -512,14 +512,14 @@ static pte_t xen_make_pte(pteval_t pte)
> }
> PV_CALLEE_SAVE_REGS_THUNK(xen_make_pte);
>
> -static pgd_t xen_make_pgd(pgdval_t pgd)
> +asmlinkage pgd_t xen_make_pgd(pgdval_t pgd)
> {
> pgd = pte_pfn_to_mfn(pgd);
> return native_make_pgd(pgd);
> }
> PV_CALLEE_SAVE_REGS_THUNK(xen_make_pgd);
>
> -static pmdval_t xen_pmd_val(pmd_t pmd)
> +asmlinkage pmdval_t xen_pmd_val(pmd_t pmd)
> {
> return pte_mfn_to_pfn(pmd.pmd);
> }
> @@ -578,7 +578,7 @@ static void xen_pmd_clear(pmd_t *pmdp)
> }
> #endif /* CONFIG_X86_PAE */
>
> -static pmd_t xen_make_pmd(pmdval_t pmd)
> +asmlinkage pmd_t xen_make_pmd(pmdval_t pmd)
> {
> pmd = pte_pfn_to_mfn(pmd);
> return native_make_pmd(pmd);
> @@ -586,13 +586,13 @@ static pmd_t xen_make_pmd(pmdval_t pmd)
> PV_CALLEE_SAVE_REGS_THUNK(xen_make_pmd);
>
> #if PAGETABLE_LEVELS == 4
> -static pudval_t xen_pud_val(pud_t pud)
> +asmlinkage pudval_t xen_pud_val(pud_t pud)
> {
> return pte_mfn_to_pfn(pud.pud);
> }
> PV_CALLEE_SAVE_REGS_THUNK(xen_pud_val);
>
> -static pud_t xen_make_pud(pudval_t pud)
> +asmlinkage pud_t xen_make_pud(pudval_t pud)
> {
> pud = pte_pfn_to_mfn(pud);
>

2012-08-19 08:28:13

by Jan Beulich

Subject: Re: [PATCH 46/74] x86, lto: Disable fancy hweight optimizations for LTO

>>> Andi Kleen <[email protected]> 08/19/12 4:58 AM >>>
>--- a/arch/x86/Kconfig
>+++ b/arch/x86/Kconfig
>@@ -224,8 +224,9 @@ config X86_32_LAZY_GS
>
>config ARCH_HWEIGHT_CFLAGS
> string
>- default "-fcall-saved-ecx -fcall-saved-edx" if X86_32
>- default "-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11" if X86_64
>+ default "-fcall-saved-ecx -fcall-saved-edx" if X86_32 && !LTO
>+ default "-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11" if X86_64 && !LTO
>+ default "" if LTO

By moving this last line first you can avoid modifying the other two lines.

>--- a/arch/x86/include/asm/arch_hweight.h
>+++ b/arch/x86/include/asm/arch_hweight.h
>@@ -25,9 +25,14 @@ static inline unsigned int __arch_hweight32(unsigned int w)
>{
> unsigned int res = 0;
>
>+#ifdef CONFIG_LTO
>+ res = __sw_hweight32(w);
>+#else
>+
> asm (ALTERNATIVE("call __sw_hweight32", POPCNT32, X86_FEATURE_POPCNT)
> : "="REG_OUT (res)
> : REG_IN (w));
>+#endif

Isn't this a little too harsh? Rather than not using popcnt at all, why don't you just add
the necessary clobbers to the asm() in the LTO case?

Jan

2012-08-19 08:37:38

by Jan Beulich

Subject: Re: [PATCH 48/74] x86, lto: Use inline assembler instead of global register variable to get sp

>>> Andi Kleen <[email protected]> 08/19/12 4:59 AM >>>
>I verified this generates the same binary (on 64bit) as the original
>register variable.

This isn't very surprising given that the modified code is inside a
CONFIG_X86_32 conditional (as ought to be obvious from the code using
%%esp). Given that it's being used as operand to a binary &, the resulting
code - if the compiler handles this only half way sensibly - can hardly be
expected to be identical.

>-register unsigned long current_stack_pointer asm("esp") __used;
>+#define current_stack_pointer ({ \
>+ unsigned long sp; \
>+ asm("mov %%esp,%0" : "=r" (sp)); \
>+ sp; \
>+})

It would get closer to the original if you used "=g" (I noticed in a few
earlier patches already that you like to use "=r" in places where a register
is not strictly required, thus reducing the flexibility the compiler has).

Also, given that this is more a workaround for a compiler deficiency,
shouldn't this be conditional upon use of LTO?

Jan

2012-08-19 08:46:17

by Jan Beulich

Subject: Re: [PATCH 55/74] lto, workaround: Add workaround for initcall reordering

>>> Andi Kleen <[email protected]> 08/19/12 5:05 AM >>>
>Work around a LTO gcc problem: when there is no reference to a variable
>in a module it will be moved to the end of the program. This causes
>reordering of initcalls which the kernel does not like.
>Add a dummy reference function to avoid this. The function is
>deleted by the linker.

This is not even true on x86, not to speak of generally.

>+#ifdef CONFIG_LTO
>+/* Work around a LTO gcc problem: when there is no reference to a variable
>+ * in a module it will be moved to the end of the program. This causes
>+ * reordering of initcalls which the kernel does not like.
>+ * Add a dummy reference function to avoid this. The function is
>+ * deleted by the linker.
>+ */
>+#define LTO_REFERENCE_INITCALL(x) \
>+ ; /* yes this is needed */ \
>+ static __used __exit void *reference_##x(void) \

Why not put it into e.g. section .discard.text? That could be expected to be
discarded by the linker without being arch dependent, as long as all arches
use DISCARDS in their linker script.

Jan

2012-08-19 08:53:10

by Jan Beulich

Subject: Re: [PATCH 59/74] lto: Handle LTO common symbols in module loader

>>> Andi Kleen <[email protected]> 08/19/12 4:59 AM >>>
>@@ -1904,6 +1904,10 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
>
> switch (sym[i].st_shndx) {
> case SHN_COMMON:
>+ /* Ignore common symbols */
>+ if (!strncmp(name, "__gnu_lto", 9))
>+ break;
>+
> /* We compiled with -fno-common. These are not
> supposed to happen. */
> pr_debug("Common symbol: %s\n", name);

I think it is dangerous to just match the start of the symbol name here -
this may well lead, in the future, to ignoring symbols we shouldn't be
ignoring.

Also I would think the added comment ought to say "Ignore LTO symbols."
Otherwise it's sort of contradicting the purpose of the case being handled
here.

Jan

2012-08-19 09:01:21

by Avi Kivity

Subject: Re: [PATCH 37/74] lto, KVM: Don't assume asm statements end up in the same assembler file

On 08/19/2012 05:56 AM, Andi Kleen wrote:
> From: Andi Kleen <[email protected]>
>
> The VMX code references a local assembler label between two inline
> assembler statements. This assumes they both end up in the same
> assembler files. In some experimental builds of gcc this is not
> necessarily true, causing linker failures.
>
> Replace the local label reference with a more traditional asmlinkage
> extern.
>
> This also eliminates one assembler statement and
> generates a bit better code on 64bit: the compiler can
> use a RIP relative LEA instead of a movabs, saving
> a few bytes.

I'm happy to see work on lto-enabling the kernel.

>
> +extern __visible unsigned long kvm_vmx_return;
> +
> /*
> * Set up the vmcs's constant host-state fields, i.e., host-state fields that
> * will not change in the lifetime of the guest.
> @@ -3753,8 +3755,7 @@ static void vmx_set_constant_host_state(void)
> native_store_idt(&dt);
> vmcs_writel(HOST_IDTR_BASE, dt.address); /* 22.2.4 */
>
> - asm("mov $.Lkvm_vmx_return, %0" : "=r"(tmpl));
> - vmcs_writel(HOST_RIP, tmpl); /* 22.2.5 */
> + vmcs_writel(HOST_RIP, (unsigned long)&kvm_vmx_return); /* 22.2.5 */
>
> rdmsr(MSR_IA32_SYSENTER_CS, low32, high32);
> vmcs_write32(HOST_IA32_SYSENTER_CS, low32);
> @@ -6305,9 +6306,10 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
> /* Enter guest mode */
> "jne .Llaunched \n\t"
> __ex(ASM_VMX_VMLAUNCH) "\n\t"
> - "jmp .Lkvm_vmx_return \n\t"
> + "jmp kvm_vmx_return \n\t"
> ".Llaunched: " __ex(ASM_VMX_VMRESUME) "\n\t"
> - ".Lkvm_vmx_return: "
> + ".globl kvm_vmx_return\n"
> + "kvm_vmx_return: "
> /* Save guest registers, load host registers, keep flags */
> "mov %0, %c[wordsize](%%"R"sp) \n\t"
> "pop %0 \n\t"
>

The reason we use a local label is so that the function isn't split
into two from the profiler's point of view. See cd2276a795b013d1.

One way to fix this is to have a .data variable initialized to point to
.Lkvm_vmx_return (this can be done from the same asm statement in
vmx_vcpu_run), and reference that variable in
vmx_set_constant_host_state(). If no one comes up with a better idea,
I'll write a patch doing this.

--
error compiling committee.c: too many arguments to function

2012-08-19 15:01:50

by Andi Kleen

Subject: Re: [PATCH 55/74] lto, workaround: Add workaround for initcall reordering

On Sun, Aug 19, 2012 at 09:46:04AM +0100, Jan Beulich wrote:
> >>> Andi Kleen <[email protected]> 08/19/12 5:05 AM >>>
> >Work around a LTO gcc problem: when there is no reference to a variable
> >in a module it will be moved to the end of the program. This causes
> >reordering of initcalls which the kernel does not like.
> >Add a dummy reference function to avoid this. The function is
> >deleted by the linker.
>
> This is not even true on x86, not to speak of generally.

Why is it not true?

__initcall is only defined for !MODULE and there __exit discards.

>
> >+#ifdef CONFIG_LTO
> >+/* Work around a LTO gcc problem: when there is no reference to a variable
> >+ * in a module it will be moved to the end of the program. This causes
> >+ * reordering of initcalls which the kernel does not like.
> >+ * Add a dummy reference function to avoid this. The function is
> >+ * deleted by the linker.
> >+ */
> >+#define LTO_REFERENCE_INITCALL(x) \
> >+ ; /* yes this is needed */ \
> >+ static __used __exit void *reference_##x(void) \
>
> Why not put it into e.g. section .discard.text? That could be expected to be
> discarded by the linker without being arch dependent, as long as all arches
> use DISCARDS in their linker script.


That's what __exit does, doesn't it?

-Andi

--
[email protected] -- Speaking for myself only.

2012-08-19 15:09:56

by Andi Kleen

Subject: Re: [PATCH 37/74] lto, KVM: Don't assume asm statements end up in the same assembler file

> The reason we use a local label is so that the function isn't split
> into two from the profiler's point of view. See cd2276a795b013d1.

Hmm that commit message is not very enlightening.

The goal was to force a compiler error?

With LTO there is no way to force two functions to be in the same assembler
file. The partitioner is always allowed to split.

>
> One way to fix this is to have a .data variable initialized to point to
> .Lkvm_vmx_return (this can be done from the same asm statement in
> vmx_vcpu_run), and reference that variable in
> vmx_set_constant_host_state(). If no one comes up with a better idea,
> I'll write a patch doing this.

I'm not clear how that is better than my patch.

-andi

--
[email protected] -- Speaking for myself only.

2012-08-19 15:13:51

by Avi Kivity

Subject: Re: [PATCH 37/74] lto, KVM: Don't assume asm statements end up in the same assembler file

On 08/19/2012 06:09 PM, Andi Kleen wrote:
>> The reason we use a local label is so that the function isn't split
>> into two from the profiler's point of view. See cd2276a795b013d1.
>
> Hmm that commit message is not very enlightening.
>
> The goal was to force a compiler error?

No, the goal was to avoid a global label in the middle of a function.
The profiler interprets it as a new function. After your patch,
profiles will show a function named kvm_vmx_return taking a few percent
cpu, although there is no such function.

>
> With LTO there is no way to force two functions to be in the same assembler
> file. The partitioner is always allowed to split.

I'm not trying to force two functions to be in the same assembler file.

>
>>
>> One way to fix this is to have a .data variable initialized to point to
>> .Lkvm_vmx_return (this can be done from the same asm statement in
>> vmx_vcpu_run), and reference that variable in
>> vmx_set_constant_host_state(). If no one comes up with a better idea,
>> I'll write a patch doing this.
>
> I'm not clear how that is better than my patch.

My patch will not generate the artifact with kvm_vmx_return.

--
error compiling committee.c: too many arguments to function

2012-08-19 15:15:21

by Andi Kleen

Subject: Re: [PATCH 46/74] x86, lto: Disable fancy hweight optimizations for LTO

> By moving this last line first you can avoid modifying the other two lines.

Ok.

>
> >--- a/arch/x86/include/asm/arch_hweight.h
> >+++ b/arch/x86/include/asm/arch_hweight.h
> >@@ -25,9 +25,14 @@ static inline unsigned int __arch_hweight32(unsigned int w)
> >{
> > unsigned int res = 0;
> >
> >+#ifdef CONFIG_LTO
> >+ res = __sw_hweight32(w);
> >+#else
> >+
> > asm (ALTERNATIVE("call __sw_hweight32", POPCNT32, X86_FEATURE_POPCNT)
> > : "="REG_OUT (res)
> > : REG_IN (w));
> >+#endif
>
> Isn't this a little too harsh? Rather than not using popcnt at all, why don't you just add
> the necessary clobbers to the asm() in the LTO case?

gcc currently lacks the means to declare that an asm uses an external
symbol. Ok we could make it visible. But there's no way to make the
special calling convention work anyways, at least not without someone
changing gcc to allow declaring this per function.

I'm not sure the optimization is really worth it anyways, hweight should
be uncommon.

-Andi

--
[email protected] -- Speaking for myself only.

2012-08-19 15:18:28

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 48/74] x86, lto: Use inline assembler instead of global register variable to get sp

On Sun, Aug 19, 2012 at 09:37:27AM +0100, Jan Beulich wrote:
> >>> Andi Kleen <[email protected]> 08/19/12 4:59 AM >>>
> >I verified this generates the same binary (on 64bit) as the original
> >register variable.
>
> This isn't very surprising given that the modified code is inside a
> CONFIG_X86_32 conditional (as ought to be obvious from the code using
> %%esp). Given that it's being used as an operand to a binary &, the resulting
> code - if the compiler handles this only halfway sensibly - can hardly be
> expected to be identical.

Doh! Thanks. I'll double check.

You're right, it'll likely change code. But it shouldn't be common.

>
> >-register unsigned long current_stack_pointer asm("esp") __used;
> >+#define current_stack_pointer ({ \
> >+ unsigned long sp; \
> >+ asm("mov %%esp,%0" : "=r" (sp)); \
> >+ sp; \
> >+})
>
> It would get closer to the original if you used "=g" (I noticed in a few
> earlier patches already that you like to use "=r" in places where a register
> is not strictly required, thus reducing the flexibility the compiler has).

My fingers have =r hardcoded. Will fix.

>
> Also, given that this is more a workaround for a compiler deficiency,
> shouldn't this be conditional upon use of LTO?

I think it's cleaner than the global reg var, so unconditional should
be fine. It wouldn't surprise me if the global reg causes trouble even
without LTO; I probably just triggered some latent bug.

-Andi

--
[email protected] -- Speaking for myself only.

2012-08-19 15:21:00

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 37/74] lto, KVM: Don't assume asm statements end up in the same assembler file

On Sun, Aug 19, 2012 at 06:12:57PM +0300, Avi Kivity wrote:
> On 08/19/2012 06:09 PM, Andi Kleen wrote:
> >> The reason we use a local label is so that the function isn't split
> >> into two from the profiler's point of view. See cd2276a795b013d1.
> >
> > Hmm that commit message is not very enlightening.
> >
> > The goal was to force a compiler error?
>
> No, the goal was to avoid a global label in the middle of a function.
> The profiler interprets it as a new function. After your patch,

Ah got it now. I always used to have the same problem with sys_call_return.

I wonder if there shouldn't be a way to tell perf to ignore a symbol.

> >>
> >> One way to fix this is to have a .data variable initialized to point to
> >> .Lkvm_vmx_return (this can be done from the same asm statement in
> >> vmx_vcpu_run), and reference that variable in
> >> vmx_set_constant_host_state(). If no one comes up with a better idea,
> >> I'll write a patch doing this.
> >
> > I'm not clear how that is better than my patch.
>
> My patch will not generate the artifact with kvm_vmx_return.

Ok fine for me. I'll keep this patch for now, until you have
something better.

-Andi


--
[email protected] -- Speaking for myself only.

2012-08-19 15:23:51

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 59/74] lto: Handle LTO common symbols in module loader

On Sun, Aug 19, 2012 at 09:53:02AM +0100, Jan Beulich wrote:
> >>> Andi Kleen <[email protected]> 08/19/12 4:59 AM >>>
> >@@ -1904,6 +1904,10 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
> >
> > switch (sym[i].st_shndx) {
> > case SHN_COMMON:
> >+ /* Ignore common symbols */
> >+ if (!strncmp(name, "__gnu_lto", 9))
> >+ break;
> >+
> > /* We compiled with -fno-common. These are not
> > supposed to happen. */
> > pr_debug("Common symbol: %s\n", name);
>
> I think it is dangerous to just match the start of the symbol name here -
> this may in the future well lead to ignoring symbols we shouldn't be
> ignoring.
>
> Also I would think the added comment ought to say "Ignore LTO symbols."
> Otherwise its sort of contradicting the purpose of the case being handled
> here.

Ok, maybe it should error out. This case only happens with fat LTO when
the LTO step is not actually run.

It used to happen because old versions of this patchkit
didn't correctly LTO modules.

I'll change it to error out. The reason for the prefix was that
there is a __gnu_lto_vXXX and the version number could change.

Thanks for the reviews.

-Andi


--
[email protected] -- Speaking for myself only.

2012-08-19 15:25:57

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 53/74] x86, lto, paravirt: Make paravirt thunks global

On Sun, Aug 19, 2012 at 01:27:00AM -0700, Jeremy Fitzhardinge wrote:
> On 08/18/2012 07:56 PM, Andi Kleen wrote:
> > From: Andi Kleen <[email protected]>
> >
> > The paravirt thunks use a hack of using a static reference to a static
> > function to reference that function from the top level statement.
> >
> > This assumes that gcc always generates static function names in a specific
> > format, which is not necessarily true.
> >
> > Simply make these functions global and asmlinkage. This way the
> > static __used variables are not needed and everything works.
>
> I'm not a huge fan of unstaticing all this stuff, but it doesn't
> surprise me that the current code is brittle in the face of gcc changes.

Hmm actually reading my own patch again it may be wrong. You need
regparm(3) here right? asmlinkage forces it to (0). I'll change it to
__visible. I think I did that earlier for all the 32bit code, but missed
this one.

-Andi

2012-08-19 15:29:21

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 03/74] sections: Make external kallsyms tables __visible

On Sun, Aug 19, 2012 at 08:53:03AM +0100, Jan Beulich wrote:
> >>> Andi Kleen <[email protected]> 08/19/12 5:02 AM >>>
> >-extern const unsigned long kallsyms_addresses[] __attribute__((weak));
> >-extern const u8 kallsyms_names[] __attribute__((weak));
> >+extern __visible const unsigned long kallsyms_addresses[] __attribute__((weak));
> >+extern __visible const u8 kallsyms_names[] __attribute__((weak));
>
> Shouldn't we minimally aim at consistency here:
> - all attributes in a one place (I personally prefer the placement between type
> and name, for compatibility with other compilers, but there are rare cases -
> iirc not on declarations though - where gcc doesn't allow this)

Ok.

> - not using open coded __attribute__(()) when a definition (here: __weak) is
> available, or alternatively open coding all of them (__attribute__((weak, ...)))?

I just kept the original code. But yes it should be using __weak.
I can change that.

-Andi

--
[email protected] -- Speaking for myself only.

2012-08-20 07:01:48

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH 27/74] lto: Mark EXPORT_SYMBOL symbols __visible

On Sat, 18 Aug 2012 19:56:23 -0700, Andi Kleen <[email protected]> wrote:
> @@ -78,11 +78,13 @@ extern struct module __this_module;
>
> #else /* !CONFIG_MODULES... */
>
> -#define EXPORT_SYMBOL(sym)
> -#define EXPORT_SYMBOL_GPL(sym)
> -#define EXPORT_SYMBOL_GPL_FUTURE(sym)
> -#define EXPORT_UNUSED_SYMBOL(sym)
> -#define EXPORT_UNUSED_SYMBOL_GPL(sym)
> +/* Even without modules keep the __visible side effect */
> +
> +#define EXPORT_SYMBOL(sym) extern typeof(sym) sym __visible
> +#define EXPORT_SYMBOL_GPL(sym) extern typeof(sym) sym __visible
> +#define EXPORT_SYMBOL_GPL_FUTURE(sym) extern typeof(sym) sym __visible
> +#define EXPORT_UNUSED_SYMBOL(sym) extern typeof(sym) sym __visible
> +#define EXPORT_UNUSED_SYMBOL_GPL(sym) extern typeof(sym) sym __visible
>
> #endif /* CONFIG_MODULES */

Really, why? Seems like a win to have them eliminated if unused.

Naively, I would think many cases of __visible should be #ifdef
CONFIG_MODULES. What am I missing?

Thanks,
Rusty.

2012-08-20 07:48:44

by Ingo Molnar

[permalink] [raw]
Subject: Re: RFC: Link Time Optimization support for the kernel


* Andi Kleen <[email protected]> wrote:

> This rather large patchkit enables gcc Link Time Optimization (LTO)
> support for the kernel.
>
> With LTO gcc will do whole program optimizations for
> the whole kernel and each module. This increases compile time,
> but can generate faster code.

By how much does it increase compile time?

How much faster does kernel code get?

Last time I checked LTO optimizations (half a year ago) it
resulted in significantly slower build times.

I tried out and measured the LTO speedups and was less than
impressed by them - a lot of build time increase for not much
increase in performance. There was also visible, ongoing
maintenance cost.

The combination of these seemed like a show-stopper.

It's obviously an optimization feature we should consider, but
we really need hard numbers to make a cost/benefit analysis.

Thanks,

Ingo

2012-08-20 08:21:30

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 35/74] lto, crypto, aes: mark AES tables __visible

On Sat, Aug 18, 2012 at 07:56:31PM -0700, Andi Kleen wrote:
> From: Andi Kleen <[email protected]>
>
> Various tables in aes_generic are accessed by assembler code.
> Mark them __visible for LTO
>
> Cc: [email protected]
> Signed-off-by: Andi Kleen <[email protected]>

Acked-by: Herbert Xu <[email protected]>
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2012-08-20 08:21:38

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 36/74] lto, crypto, camelia: Make camelia tables used by assembler __visible

On Sat, Aug 18, 2012 at 07:56:32PM -0700, Andi Kleen wrote:
> From: Andi Kleen <[email protected]>
>
> Cc: [email protected]
> Signed-off-by: Andi Kleen <[email protected]>

Acked-by: Herbert Xu <[email protected]>
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2012-08-20 08:30:44

by Takashi Iwai

[permalink] [raw]
Subject: Re: [PATCH 26/74] lto, sound: Fix export symbols for !CONFIG_MODULES

At Sat, 18 Aug 2012 19:56:22 -0700,
Andi Kleen wrote:
>
> From: Andi Kleen <[email protected]>
>
> The new LTO EXPORT_SYMBOL references symbols even without CONFIG_MODULES.
> Since these functions are macros in this case this doesn't work.
> Add an ifdef to fix the build.
>
> Cc: [email protected]
> Signed-off-by: Andi Kleen <[email protected]>

Reviewed-by: Takashi Iwai <[email protected]>

I haven't seen the background, so let me ask a dumb question:
is it a 3.6 fix or for 3.7?

And shall I apply this one to sound git tree, or would you like to
apply all in a single tree?


thanks,

Takashi

> ---
> sound/core/seq/seq_device.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/sound/core/seq/seq_device.c b/sound/core/seq/seq_device.c
> index 5cf8d65..60e8fc1 100644
> --- a/sound/core/seq/seq_device.c
> +++ b/sound/core/seq/seq_device.c
> @@ -569,5 +569,7 @@ EXPORT_SYMBOL(snd_seq_device_load_drivers);
> EXPORT_SYMBOL(snd_seq_device_new);
> EXPORT_SYMBOL(snd_seq_device_register_driver);
> EXPORT_SYMBOL(snd_seq_device_unregister_driver);
> +#ifdef CONFIG_MODULES
> EXPORT_SYMBOL(snd_seq_autoload_lock);
> EXPORT_SYMBOL(snd_seq_autoload_unlock);
> +#endif
> --
> 1.7.7.6
>

2012-08-20 09:15:37

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 46/74] x86, lto: Disable fancy hweight optimizations for LTO

On 08/19/2012 05:56 AM, Andi Kleen wrote:
> From: Andi Kleen <[email protected]>
>
> The fancy x86 hweight uses different compiler options for the
> hweight file. This does not work with LTO. Just disable the optimization
> with LTO.
>
> Signed-off-by: Andi Kleen <[email protected]>
> ---
> arch/x86/Kconfig | 5 +++--
> arch/x86/include/asm/arch_hweight.h | 9 +++++++++
> 2 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 8ec3a1a..9382b09 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -224,8 +224,9 @@ config X86_32_LAZY_GS
>
> config ARCH_HWEIGHT_CFLAGS
> string
> - default "-fcall-saved-ecx -fcall-saved-edx" if X86_32
> - default "-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11" if X86_64
> + default "-fcall-saved-ecx -fcall-saved-edx" if X86_32 && !LTO
> + default "-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11" if X86_64 && !LTO
> + default "" if LTO
>

Seems heavy handed. How about using __attribute__((optimize(...))) instead?


--
error compiling committee.c: too many arguments to function

2012-08-20 09:42:38

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 46/74] x86, lto: Disable fancy hweight optimizations for LTO

> > config ARCH_HWEIGHT_CFLAGS
> > string
> > - default "-fcall-saved-ecx -fcall-saved-edx" if X86_32
> > - default "-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11" if X86_64
> > + default "-fcall-saved-ecx -fcall-saved-edx" if X86_32 && !LTO
> > + default "-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11" if X86_64 && !LTO
> > + default "" if LTO
> >
>
> Seems heavy handed. How about using __attribute__((optimize(...))) instead?

Doesn't work for this. In fact according to the gcc developers that
attribute is mostly broken.

-Andi

--
[email protected] -- Speaking for myself only.

2012-08-20 09:45:51

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 26/74] lto, sound: Fix export symbols for !CONFIG_MODULES

On Mon, Aug 20, 2012 at 10:30:29AM +0200, Takashi Iwai wrote:
> At Sat, 18 Aug 2012 19:56:22 -0700,
> Andi Kleen wrote:
> >
> > From: Andi Kleen <[email protected]>
> >
> > The new LTO EXPORT_SYMBOL references symbols even without CONFIG_MODULES.
> > Since these functions are macros in this case this doesn't work.
> > Add an ifdef to fix the build.
> >
> > Cc: [email protected]
> > Signed-off-by: Andi Kleen <[email protected]>
>
> Reviewed-by: Takashi Iwai <[email protected]>
>
> I haven't seen the background, so let me ask a dumb question:
> is it a 3.6 fix or for 3.7?

I don't strictly need it for 3.6, 3.7 is ok.

>
> And shall I apply this one to sound git tree, or would you like to
> apply all in a single tree?

Please apply it in yours.

-Andi

2012-08-20 09:49:41

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 27/74] lto: Mark EXPORT_SYMBOL symbols __visible

> Really, why? Seems like a win to have them eliminated if unused.
>
> Naively, I would think many cases of __visible should be #ifdef
> CONFIG_MODULES. What am I missing?

It worked around some problem I forgot now :)

You're right it shouldn't be needed in theory for !MODULES. I'll double
check.

-Andi

--
[email protected] -- Speaking for myself only.

2012-08-20 09:53:52

by Takashi Iwai

[permalink] [raw]
Subject: Re: [PATCH 26/74] lto, sound: Fix export symbols for !CONFIG_MODULES

At Mon, 20 Aug 2012 11:45:45 +0200,
Andi Kleen wrote:
>
> On Mon, Aug 20, 2012 at 10:30:29AM +0200, Takashi Iwai wrote:
> > At Sat, 18 Aug 2012 19:56:22 -0700,
> > Andi Kleen wrote:
> > >
> > > From: Andi Kleen <[email protected]>
> > >
> > > The new LTO EXPORT_SYMBOL references symbols even without CONFIG_MODULES.
> > > Since these functions are macros in this case this doesn't work.
> > > > Add an ifdef to fix the build.
> > >
> > > Cc: [email protected]
> > > Signed-off-by: Andi Kleen <[email protected]>
> >
> > Reviewed-by: Takashi Iwai <[email protected]>
> >
> > I haven't seen the background, so let me ask a dumb question:
> > is it a 3.6 fix or for 3.7?
>
> I don't strictly need it for 3.6, 3.7 is ok.
>
> >
> > And shall I apply this one to sound git tree, or would you like to
> > apply all in a single tree?
>
> Please apply it in yours.

OK, applied now. Thanks.


Takashi

2012-08-20 10:10:47

by Andi Kleen

[permalink] [raw]
Subject: Re: RFC: Link Time Optimization support for the kernel

On Mon, Aug 20, 2012 at 09:48:35AM +0200, Ingo Molnar wrote:
>
> * Andi Kleen <[email protected]> wrote:
>
> > This rather large patchkit enables gcc Link Time Optimization (LTO)
> > support for the kernel.
> >
> > With LTO gcc will do whole program optimizations for
> > the whole kernel and each module. This increases compile time,
> > but can generate faster code.
>
> By how much does it increase compile time?

All numbers are preliminary at this point. Both code quality and
compile time are still missing some improvements LTO could deliver,
due to workarounds for issues that are fixable.

Compile time:

Compilation slowdown depends on the largest binary size. I see between
50% and 4x. The 4x case is mainly for allyes (so unlikely); a normal
distro build, which is mostly modular, or a defconfig-like build is more
towards the 50%.

Currently I have to disable slim LTO, which essentially means everything
is compiled twice. Once that's fixed it should compile faster for
the normal case too (although it will still be slower than non-LTO).

A lot of the overhead on the larger builds is also some specific
gcc code that I'm working with the gcc developers on to improve.
So the 4x extreme case will hopefully go down.

The large builds also currently suffer from too much memory
consumption. That will hopefully improve too, as gcc improves.

I wouldn't expect anyone using it for day to day kernel hacking
(I understand that 50% are annoying for that). It's more like a
"release build" mode.

The performance is currently also missing some improvements due
to workarounds.

Performance:

Hackbench goes about 5% faster, so the scheduler benefits. Kbuild
is not changing much. Various network benchmarks over loopback
go faster too (best case seen 18%+), so the network stack seems to
benefit. A lot of micro benchmarks go faster, sometimes larger numbers.
There are some minor regressions.

A lot of benchmarking on larger workloads is still outstanding.
But the existing numbers are promising I believe. Things will still
change, it's still early.

I would welcome any benchmarking from other people.

I also expect gcc to do more LTO optimizations in the future, so we'll
hopefully see more gains over time. Essentially it gives more
power to the compiler.

Long term it would also help the kernel source organization. For example
there's no reason with LTO to have gigantic includes with large inlines,
because cross-file inlining works in an efficient way without reparsing.

In theory (but that's not realized today) the automatic repartitioning of
compilation units could improve compile time with lots of small files.

-Andi

2012-08-20 10:57:20

by Jan Beulich

[permalink] [raw]
Subject: Re: [PATCH 46/74] x86, lto: Disable fancy hweight optimizations for LTO

>>> On 19.08.12 at 17:15, Andi Kleen <[email protected]> wrote:
>> >--- a/arch/x86/include/asm/arch_hweight.h
>> >+++ b/arch/x86/include/asm/arch_hweight.h
>> >@@ -25,9 +25,14 @@ static inline unsigned int __arch_hweight32(unsigned int w)
>> >{
>> > unsigned int res = 0;
>> >
>> >+#ifdef CONFIG_LTO
>> >+ res = __sw_hweight32(w);
>> >+#else
>> >+
>> > asm (ALTERNATIVE("call __sw_hweight32", POPCNT32, X86_FEATURE_POPCNT)
>> > : "="REG_OUT (res)
>> > : REG_IN (w));
>> >+#endif
>>
>> Isn't this a little too harsh? Rather than not using popcnt at all, why don't
>> you just add the necessary clobbers to the asm() in the LTO case?
>
> gcc currently lacks the means to declare that an asm uses an external
> symbol. Ok we could make it visible. But there's no way to make the
> special calling convention work anyways, at least not without someone
> changing gcc to allow declaring this per function.

That's not the point: The point really is that you could allow the
alternative regardless of LTO, and just penalize the LTO case
by having even the asm clobber the registers that a function call
would not preserve.

> I'm not sure the optimization is really worth it anyways, hweight should
> be uncommon.

That's a separate question (but I sort of agree - not sure whether
CPU mask weights ever get calculated on hot paths).

Jan

2012-08-20 11:00:53

by Jan Beulich

[permalink] [raw]
Subject: Re: [PATCH 55/74] lto, workaround: Add workaround for initcall reordering

>>> On 19.08.12 at 17:01, Andi Kleen <[email protected]> wrote:
> On Sun, Aug 19, 2012 at 09:46:04AM +0100, Jan Beulich wrote:
>> >>> Andi Kleen <[email protected]> 08/19/12 5:05 AM >>>
>> >Work around a LTO gcc problem: when there is no reference to a variable
>> >in a module it will be moved to the end of the program. This causes
>> >reordering of initcalls which the kernel does not like.
>> >Add a dummy reference function to avoid this. The function is
>> >deleted by the linker.
>>
>> This is not even true on x86, not to speak of generally.
>
> Why is it not true ?
>
> __initcall is only defined for !MODULE and there __exit discards.

__exit, on x86 and perhaps other arches, causes the code
to be discarded at runtime only.

>> >+#ifdef CONFIG_LTO
>> >+/* Work around a LTO gcc problem: when there is no reference to a variable
>> >+ * in a module it will be moved to the end of the program. This causes
>> >+ * reordering of initcalls which the kernel does not like.
>> >+ * Add a dummy reference function to avoid this. The function is
>> >+ * deleted by the linker.
>> >+ */
>> >+#define LTO_REFERENCE_INITCALL(x) \
>> >+ ; /* yes this is needed */ \
>> >+ static __used __exit void *reference_##x(void) \
>>
>> Why not put it into e.g. section .discard.text? That could be expected to be
>> discarded by the linker without being arch dependent, as long as all arches
>> use DISCARDS in their linker script.
>
>
> That's what __exit does, doesn't it?

No - see above. Using .discard.* enforces the discarding at link
time.
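For context, the DISCARDS macro Jan refers to lives in include/asm-generic/vmlinux.lds.h and expands to a /DISCARD/ output section of roughly this shape (sketched from memory, not quoted exactly):

```ld
/DISCARD/ : {
	EXIT_TEXT
	EXIT_DATA
	EXIT_CALL
	*(.discard)
	*(.discard.*)
}
```

Input sections placed in `.discard.*` are therefore dropped by the linker on any architecture whose script uses DISCARDS, which is what makes it a more portable home for the dummy reference functions than `__exit`.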

Jan

2012-08-20 11:18:41

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 46/74] x86, lto: Disable fancy hweight optimizations for LTO

> That's not the point: The point really is that you could allow the
> alternative regardless of LTO, and just penalize the LTO case
> by having even the asm clobber the registers that a function call
> would not preserve.

That's just what a normal call does, right?

-Andi
--
[email protected] -- Speaking for myself only

2012-08-20 12:38:27

by Jan Beulich

[permalink] [raw]
Subject: Re: [PATCH 46/74] x86, lto: Disable fancy hweight optimizations for LTO

>>> On 20.08.12 at 13:18, Andi Kleen <[email protected]> wrote:
>> That's not the point: The point really is that you could allow the
>> alternative regardless of LTO, and just penalize the LTO case
>> by having even the asm clobber the registers that a function call
>> would not preserve.
>
> That's just what a normal call does, right?

Exactly.

Jan

2012-08-20 17:48:16

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: [PATCH 54/74] x86, lto, vdso: Don't duplicate vvar address variables

On Sat, Aug 18, 2012 at 7:56 PM, Andi Kleen <[email protected]> wrote:
> From: Andi Kleen <[email protected]>
>
> Every includer of vvar.h currently gets own static variables
> for all the vvar addresses. Generate just one set each for the
> main kernel and for the vdso. This saves some data space.
>
> Cc: Andy Lutomirski <[email protected]>
> Signed-off-by: Andi Kleen <[email protected]>

[This doesn't apply to -linus or to 3.5, so I haven't actually tested it.]

NACK, without significant further evidence that this is a good idea.

On input like this:

static const int * const vvaraddr_test = 0xffffffffff601000;

int func(void)
{
return *vvaraddr_test;
}

gcc -O2 generates:

.file "constptr.c"
.text
.p2align 4,,15
.globl func
.type func, @function
func:
.LFB0:
.cfi_startproc
movl -10481664, %eax
ret
.cfi_endproc
.LFE0:
.size func, .-func
.ident "GCC: (GNU) 4.6.3 20120306 (Red Hat 4.6.3-2)"
.section .note.GNU-stack,"",@progbits

Note, in particular, that (a) the load from the vvar uses an immediate
memory operand (this avoids a cacheline access, which is a measurable
speedup) and (b) vvaraddr_test was not emitted as data at all.

Your code will force each vvar address to be emitted as data and will
cause each reference to reference it as data. Barring cleverness (and
I don't remember whether the vdso build is currently clever), this
could result in double-indirect access via the GOT from the vdso.

This kind of change IMO needs actual size measurements, benchmarks,
and some evidence that duplicate .data/.rodata things were emitted.

--Andy

> ---
> arch/x86/include/asm/vvar.h | 27 +++++++++++++++++----------
> arch/x86/vdso/vclock_gettime.c | 1 +
> arch/x86/vdso/vma.c | 1 +
> 3 files changed, 19 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/include/asm/vvar.h b/arch/x86/include/asm/vvar.h
> index d76ac40..1fd06a8 100644
> --- a/arch/x86/include/asm/vvar.h
> +++ b/arch/x86/include/asm/vvar.h
> @@ -24,27 +24,34 @@
> /* The kernel linker script defines its own magic to put vvars in the
> * right place.
> */
> -#define DECLARE_VVAR(offset, type, name) \
> - EMIT_VVAR(name, offset)
> +#define DECLARE_VVAR(type, name) \
> + EMIT_VVAR(name, VVAR_OFFSET_ ## name)
> +
> +#elif defined(__VVAR_ADDR)
> +
> +#define DECLARE_VVAR(type, name) \
> + type const * const vvaraddr_ ## name = \
> + (void *)(VVAR_ADDRESS + (VVAR_OFFSET_ ## name));
>
> #else
>
> -#define DECLARE_VVAR(offset, type, name) \
> - static type const * const vvaraddr_ ## name = \
> - (void *)(VVAR_ADDRESS + (offset));
> +#define DECLARE_VVAR(type, name) \
> + extern type const * const vvaraddr_ ## name;
>
> #define DEFINE_VVAR(type, name) \
> type name \
> __attribute__((section(".vvar_" #name), aligned(16))) __visible
> +#endif
>
> #define VVAR(name) (*vvaraddr_ ## name)
>
> -#endif
> -
> /* DECLARE_VVAR(offset, type, name) */
>
> -DECLARE_VVAR(0, volatile unsigned long, jiffies)
> -DECLARE_VVAR(16, int, vgetcpu_mode)
> -DECLARE_VVAR(128, struct vsyscall_gtod_data, vsyscall_gtod_data)
> +#define VVAR_OFFSET_jiffies 0
> +DECLARE_VVAR(volatile unsigned long, jiffies)
> +#define VVAR_OFFSET_vgetcpu_mode 16
> +DECLARE_VVAR(int, vgetcpu_mode)
> +#define VVAR_OFFSET_vsyscall_gtod_data 128
> +DECLARE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data)
>
> #undef DECLARE_VVAR
> diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
> index 885eff4..007eac4 100644
> --- a/arch/x86/vdso/vclock_gettime.c
> +++ b/arch/x86/vdso/vclock_gettime.c
> @@ -10,6 +10,7 @@
>
> /* Disable profiling for userspace code: */
> #define DISABLE_BRANCH_PROFILING
> +#define __VVAR_ADDR 1
>
> #include <linux/kernel.h>
> #include <linux/posix-timers.h>
> diff --git a/arch/x86/vdso/vma.c b/arch/x86/vdso/vma.c
> index fe08e2b..4432cfc 100644
> --- a/arch/x86/vdso/vma.c
> +++ b/arch/x86/vdso/vma.c
> @@ -3,6 +3,7 @@
> * Copyright 2007 Andi Kleen, SUSE Labs.
> * Subject to the GPL, v.2
> */
> +#define __VVAR_ADDR 1
> #include <linux/mm.h>
> #include <linux/err.h>
> #include <linux/sched.h>
> --
> 1.7.7.6
>

Subject: Re: [PATCH 71/74] lto, kprobes: Use KSYM_NAME_LEN to size identifier buffers

On Sat, Aug 18, 2012 at 07:57:07PM -0700, Andi Kleen wrote:
> From: Joe Mario <[email protected]>
>
> Use KSYM_NAME_LEN to size identifier buffers, so that it can
> be easier increased.
>
> Cc: [email protected]
> Signed-off-by: Joe Mario <[email protected]>
> Signed-off-by: Andi Kleen <[email protected]>

Acked-by: Ananth N Mavinakayanahalli <[email protected]>

> ---
> kernel/kprobes.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index c62b854..b9bd2a8 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1955,7 +1955,7 @@ static int __init init_kprobes(void)
> {
> int i, err = 0;
> unsigned long offset = 0, size = 0;
> - char *modname, namebuf[128];
> + char *modname, namebuf[KSYM_NAME_LEN];
> const char *symbol_name;
> void *addr;
> struct kprobe_blackpoint *kb;
> @@ -2081,7 +2081,7 @@ static int __kprobes show_kprobe_addr(struct seq_file *pi, void *v)
> const char *sym = NULL;
> unsigned int i = *(loff_t *) v;
> unsigned long offset = 0;
> - char *modname, namebuf[128];
> + char *modname, namebuf[KSYM_NAME_LEN];
>
> head = &kprobe_table[i];
> preempt_disable();
> --
> 1.7.7.6

2012-08-21 07:49:29

by Ingo Molnar

[permalink] [raw]
Subject: Re: RFC: Link Time Optimization support for the kernel


* Andi Kleen <[email protected]> wrote:

> On Mon, Aug 20, 2012 at 09:48:35AM +0200, Ingo Molnar wrote:
> >
> > * Andi Kleen <[email protected]> wrote:
> >
> > > This rather large patchkit enables gcc Link Time Optimization (LTO)
> > > support for the kernel.
> > >
> > > With LTO gcc will do whole program optimizations for
> > > the whole kernel and each module. This increases compile time,
> > > but can generate faster code.
> >
> > By how much does it increase compile time?
>
> All numbers are preliminary at this point. Both code quality
> and compile time are still missing some improvements LTO could
> deliver, due to workarounds for issues that are fixable.
>
> Compile time:
>
> Compilation slowdown depends on the largest binary size. I
> see between 50% and 4x. The 4x case is mainly for allyes (so
> unlikely); a normal distro build, which is mostly modular, or
> a defconfig-like build is more towards the 50%.
>
> Currently I have to disable slim LTO, which essentially means
> everything is compiled twice. Once that's fixed it should
> compile faster for the normal case too (although it will
> still be slower than non-LTO).

The other hope would be that if LTO is used by a high-profile
project like the Linux kernel then the compiler folks might look
at it and improve it.

> A lot of the overhead on the larger builds is also some
> specific gcc code that I'm working with the gcc developers on
> to improve. So the 4x extreme case will hopefully go down.
>
> The large builds also currently suffer from too much memory
> consumption. That will hopefully improve too, as gcc improves.

Are there any LTO build files left around, blowing up the size
of the build tree?

> I wouldn't expect anyone using it for day to day kernel hacking
> (I understand that 50% are annoying for that). It's more like a
> "release build" mode.
>
> The performance is currently also missing some improvements
> due to workarounds.
>
> Performance:
>
> Hackbench goes about 5% faster, so the scheduler benefits.
> Kbuild is not changing much. Various network benchmarks over
> loopback go faster too (best case seen 18%+), so the network
> stack seems to benefit. A lot of micro benchmarks go faster,
> sometimes larger numbers. There are some minor regressions.
>
> A lot of benchmarking on larger workloads is still
> outstanding. But the existing numbers are promising I believe.
> Things will still change, it's still early.
>
> I would welcome any benchmarking from other people.
>
> I also expect gcc to do more LTO optimizations in the future,
> so we'll hopefully see more gains over time. Essentially it
> gives more power to the compiler.
>
> Long term it would also help the kernel source organization.
> For example there's no reason with LTO to have gigantic
> includes with large inlines, because cross-file inlining works
> in an efficient way without reparsing.

Can the current implementation of LTO optimize to the level of
inlining? A lot of our include file hell situation results from
the desire to declare structures publicly so that inlined
functions can use them directly.

If data structures could be encapsulated/internalized to
subsystems and only global functions are exposed to other
subsystems [which are then LTO optimized] then our include
file dependencies could become a *lot* simpler.
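As a sketch of what Ingo describes (all names here are hypothetical, not from the kernel): the struct definition stays private to the subsystem, only accessor functions cross the boundary, and LTO can then inline the accessors into callers in other files as if the struct had been public:

```c
/* "runqueue.h" -- the public interface: the struct stays opaque */
struct rq;                               /* forward declaration only */
unsigned int rq_nr_running(const struct rq *rq);

/* "runqueue.c" -- the private implementation, never included elsewhere */
struct rq {
    unsigned int nr_running;
    /* ...fields invisible to the rest of the kernel... */
};

unsigned int rq_nr_running(const struct rq *rq)
{
    /* With LTO this trivial accessor can be inlined into callers in
     * other files, so hiding the struct costs no call overhead. */
    return rq->nr_running;
}
```

Without LTO, hiding the struct like this forces a real function call per field access, which is why the definitions ended up in headers in the first place.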

Thanks,

Ingo

2012-08-21 14:05:39

by Don Zickus

[permalink] [raw]
Subject: Re: RFC: Link Time Optimization support for the kernel

On Tue, Aug 21, 2012 at 09:49:21AM +0200, Ingo Molnar wrote:
> > A lot of the overhead on the larger builds is also some
> > specific gcc code that I'm working with the gcc developers on
> > to improve. So the 4x extreme case will hopefully go down.
> >
> > The large builds also currently suffer from too much memory
> > consumption. That will hopefully improve too, as gcc improves.
>
> Are there any LTO build files left around, blowing up the size
> of the build tree?

Hi Ingo,

Joe Mario from Red Hat has been assisting Andi with his LTO work. One of
the ideas he had which may help here is to push the LTO granularity down
to the directory level. This would allow subsystem maintainers to opt in
and keep the compile overhead consistent across randconfigs, as the linker
would have a smaller pool of files to deal with.

Joe was wondering, if he hacked something up for the scheduler directory
only, whether there are any preferred benchmark tools he could run to
verify a performance increase.

Cheers,
Don

2012-08-21 14:26:32

by Avi Kivity

[permalink] [raw]
Subject: Re: RFC: Link Time Optimization support for the kernel

On 08/21/2012 10:49 AM, Ingo Molnar wrote:
>
> Can the current implementation of LTO optimize to the level of
> inlining? A lot of our include file hell situation results from
> the desire to declare structures publicly so that inlined
> functions can use them directly.
>
> If data structures could be encapsulated/internalized to
> subsystems and only global functions are exposed to other
> subsystems [which are then LTO optimized] then our include
> file dependencies could become a *lot* simpler.

I think modules break this (if I understand what you mean correctly).
If the main kernel exposes symbol x as a global function, then LTO will
not inline it into a module.

--
error compiling committee.c: too many arguments to function

2012-08-21 17:02:29

by Andi Kleen

[permalink] [raw]
Subject: Re: RFC: Link Time Optimization support for the kernel

> The other hope would be that if LTO is used by a high-profile
> project like the Linux kernel then the compiler folks might look
> at it and improve it.

Yes definitely. I already got a lot of help from toolchain people.

>
> > A lot of the overhead on the larger builds is also some
> > specific gcc code that I'm working with the gcc developers on
> > to improve. So the 4x extreme case will hopefully go down.
> >
> > The large builds also currently suffer from too much memory
> > consumption. That will hopefully improve too, as gcc improves.
>
> Are there any LTO build files left around, blowing up the size
> of the build tree?

The objdir size increases from the intermediate information stored in the
objects, even though it's compressed. A typical LTO objdir is about 2.5x
as big as a non-LTO one.

[this will go down a bit with slim LTO; right now there is an unnecessary
copy of the non-LTO code too, but I expect it will still be
significantly larger]

There's also the TMPDIR problem. If you put /tmp on tmpfs and gcc
defaults to putting the intermediate files for the final link into
/tmp, memory fills up even faster, because tmpfs is competing
with anonymous memory.

4.7 improved a lot over 4.6 here with better partitioning; with 4.6 I
had some spectacular OOMs. 4.6 is no longer supported for LTO;
with 4.7 it became much better.

I also hope tmpfs will get better algorithms eventually that make
this less likely.

Anyway, this can be overridden by setting TMPDIR to the object directory.
With TMPDIR set and a not too aggressive -j*, you should be OK with 4GB
of memory for most kernels. Just allyes still suffers.

This was one of the reasons why I made it not default for allyesconfig.


> > so we'll hopefully see more gains over time. Essentially it
> > gives more power to the compiler.
> >
> > Long term it would also help the kernel source organization.
> > For example there's no reason with LTO to have gigantic
> > includes with large inlines, because cross file inlining works
> > in a efficient way without reparsing.
>
> Can the current implementation of LTO optimize to the level of
> inlining? A lot of our include file hell situation results from

Yes, it does cross-file inlining. Maybe a bit too much, even
(currently there are about 40% fewer static calls when LTOed).
In fact some of the current workarounds limit it, so there may be
even more in the future.

One side effect is that backtraces are harder to read. You'll
need to rely more on addr2line than before (or we may need
to make kallsyms smarter).

It only inlines inside a final binary though, as Avi mentioned,
so it's more useful inside a subsystem for modular kernels.


> If data structures could be encapsulated/internalized to
> subsystems and only global functions are exposed to other
> subsystems [which are then LTO optimized] then our include
> file dependencies could become a *lot* simpler.

Yes, long term we could have these benefits.

BTW I should add LTO does more than just inlining:
- Drop unused global functions and variables
  (so may cut down on ifdefs)
- Detect type inconsistencies between files
- Partial inlining (inline only parts of a function, like a test
  at the beginning)
- Detect pure and const functions without side effects that can be more
  aggressively optimized in the caller.
- Detect global clobbers globally. Normally any global call has to
  assume all global variables could be changed. With LTO information some
  of them can be cached in registers over calls.
- Detect read-only variables and optimize them.
- Optimize arguments to global functions (drop unnecessary arguments,
  optimize input/output, etc.)
- Replace indirect calls with direct calls, enabling other
  optimizations.
- Do constant propagation and specialization for functions. So if a
  function is commonly called with a constant, it can generate a special
  variant of that function optimized for it. This still needs more tuning
  (and currently the code size impact is on the largish side), but I hope
  to eventually have e.g. a special kmalloc optimized for GFP_KERNEL.
  It can also in principle inline callbacks.
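The specialization point can be illustrated with a toy, entirely hypothetical allocator (the real kmalloc works nothing like this): when every caller of a wrapper passes the same constant flag, IPA constant propagation can clone the callee with the flag folded in and the dead branch removed:

```c
#include <stddef.h>

#define GFP_KERNEL 0x01u    /* toy flag values, for illustration only */
#define GFP_ATOMIC 0x02u

/* Toy allocation path: the branch on flags is what IPA constant
 * propagation can fold away in a specialized clone. */
static int alloc_path(size_t size, unsigned int flags)
{
    (void)size;
    if (flags & GFP_ATOMIC)
        return 1;           /* no-sleep path */
    return 0;               /* may-sleep path: the only one left once
                               flags is known to be GFP_KERNEL */
}

int kmalloc_kernel(size_t size)
{
    /* Every call through this wrapper pins flags to a constant, so
     * with LTO the compiler may emit an alloc_path clone with the
     * GFP_ATOMIC test eliminated. */
    return alloc_path(size, GFP_KERNEL);
}
```

In gcc output such clones show up with names like `alloc_path.constprop.0`, which is also one reason LTO backtraces take more deciphering.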

-Andi
--
[email protected] -- Speaking for myself only.

2012-08-22 08:44:49

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 56/74] lto, workaround: Add workaround for missing LTO symbols in igb

On Sunday 19 August 2012, Andi Kleen wrote:
> -static struct e1000_mac_operations e1000_mac_ops_82575 = {
> +/* Workaround for LTO bug */
> +__visible struct e1000_mac_operations e1000_mac_ops_82575 = {

The comment is not very clear outside the context of this patch.
Maybe change it to /* __visible added to work around an LTO bug */.

Arnd

2012-08-22 08:58:23

by Arnd Bergmann

[permalink] [raw]
Subject: Re: RFC: Link Time Optimization support for the kernel

On Sunday 19 August 2012, Andi Kleen wrote:
>
> This rather large patchkit enables gcc Link Time Optimization (LTO)
> support for the kernel.
>
> With LTO gcc will do whole program optimizations for
> the whole kernel and each module. This increases compile time,
> but can generate faster code.
>
> LTO allows gcc to inline functions between different files and
> do various other optimization across the whole binary.

This looks quite nice overall. Have you seen other disadvantages
besides bugs and compile time? There are two possible issues that
I can see happening:

* Debuggability: When we get more aggressive optimizations, it
often becomes harder to trace back object code to a specific source
line, which may be a reason for distros not to enable it for their
product kernels in the end because it can make the work of their
support teams harder.

* Stack consumption: If you do more inlining, the total stack usage
of large functions can become higher than what the deepest path through
the same code in the non-inlined version would be. This bites us
more in the kernel than in user applications, which have much more
stack space available.

Have you noticed problems with either of these so far? Do you think
they are realistic concerns or is the LTO implementation good enough
that they would rarely become an issue?

Arnd

2012-08-22 12:35:17

by Andi Kleen

[permalink] [raw]
Subject: Re: RFC: Link Time Optimization support for the kernel

On Wed, Aug 22, 2012 at 08:58:02AM +0000, Arnd Bergmann wrote:
> * Debuggability: When we get more aggressive optimizations, it
> often becomes harder to trace back object code to a specific source
> line, which may be a reason for distros not to enable it for their
> product kernels in the end because it can make the work of their
> support teams harder.

Yes, that's a potential issue with the larger functions. People looking
at oopses may need to rely more on addr2line with debug info. It's probably
less of an issue for distributions (which should have debug info for their
kernels and may even use crash instead of only oops logs), and more for
random reports on linux-kernel.

That said, for the few LTO crashes I looked at it wasn't that big an issue.
Usually the inline chains are still broken up by indirect calls, and
a lot of kernel paths have those, so I could make sense of all the
backtraces without debug info.

> * Stack consumption: If you do more inlining, the total stack usage
> of large functions can become higher than what the deepest path through
> the same code in the non-inlined version would be. This bites us
> more in the kernel than in user applications, which have much more
> stack space available.

Newer gcc has a heuristic to not inline when the stack frame gets too
large. We set that option. Also there's a warning for too large
stack frames. With these two together we should be pretty safe.

IIRC the warning mostly showed up in some staging drivers whose frames were
likely already too large on their own. I haven't hunted for it explicitly,
but I don't remember seeing it much in other places. Also it was always
still in a range that does not necessarily crash.
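The stack concern can be shown with a minimal hypothetical example: two helpers that each need a ~256-byte frame use at most ~256 bytes of stack when called in sequence, but if both are inlined into the caller and the compiler does not share the slots, the caller's frame approaches ~512 bytes. gcc's large-stack-frame inlining heuristic and the -Wframe-larger-than= warning mentioned above guard against exactly this:

```c
#include <string.h>

/* Each helper needs its own 256-byte buffer. Called one after the
 * other, peak stack use is one frame; if the compiler inlines both
 * into run_both() without sharing the slots, the combined frame
 * roughly doubles. */
static int helper_a(const char *s)
{
    char buf[256];
    strncpy(buf, s, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    return buf[0];
}

static int helper_b(const char *s)
{
    char buf[256];
    strncpy(buf, s, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    return buf[0] + 1;
}

int run_both(const char *s)
{
    /* Candidate for inlining both helpers -- and for frame growth. */
    return helper_a(s) + helper_b(s);
}
```

In userspace this is harmless; on a kernel stack of a few KB, a handful of such merged frames along a deep call path is what makes the heuristic necessary.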

> Have you noticed problems with either of these so far? Do you think
> they are realistic concerns or is the LTO implementation good enough
> that they would rarely become an issue?

I think the first is a realistic concern, but I personally haven't
had much trouble with it so far.

-Andi

--
[email protected] -- Speaking for myself only.

2012-08-22 12:36:59

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 56/74] lto, workaround: Add workaround for missing LTO symbols in igb

On Wed, Aug 22, 2012 at 08:43:35AM +0000, Arnd Bergmann wrote:
> On Sunday 19 August 2012, Andi Kleen wrote:
> > -static struct e1000_mac_operations e1000_mac_ops_82575 = {
> > +/* Workaround for LTO bug */
> > +__visible struct e1000_mac_operations e1000_mac_ops_82575 = {
>
> The comment is not very clear outside the context of this patch.
> Maybe change it to /* __visible added to work around an LTO bug */.

I hope to remove this soon, just needs another fix for initcalls
first.

-Andi
--
[email protected] -- Speaking for myself only.

2012-08-22 19:25:35

by Wim Van Sebroeck

[permalink] [raw]
Subject: Re: [PATCH 38/74] lto, watchdog/hpwdt.c: Make assembler label global

Hi andi,

> From: Andi Kleen <[email protected]>
>
> We cannot assume that the inline assembler code always ends up
> in the same file as the original C file. So make any assembler labels
> that are referenced with "extern" from C global.
>
> Cc: [email protected]
> Signed-off-by: Andi Kleen <[email protected]>

You have my signed-off-by, but I'm Cc-ing also the author of the driver
(Tom Mingarelli) so that he is also aware of the proposed change.

Kind regards,
Wim.

2012-08-22 20:12:59

by Tom Mingarelli

[permalink] [raw]
Subject: RE: [PATCH 38/74] lto, watchdog/hpwdt.c: Make assembler label global

I am OK with the changes. We have a few more coming soon to improve the kdump process when hpwdt is running. Just a heads up.


Thanks,
Tom


2012-08-23 00:18:16

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 74/74] lto, workaround: Mark do_futex noinline to prevent clobbering ebp

On 08/18/2012 07:57 PM, Andi Kleen wrote:
> From: Andi Kleen <[email protected]>
>
> On a 32bit build gcc 4.7 with LTO decides to clobber the 6th argument on the
> stack. Unfortunately this corrupts the user EBP and leads to later crashes.
> For now mark do_futex noinline to prevent this.
>
> I wish there was a generic way to handle this. Seems like a ticking time
> bomb problem.
>

There is a generic way to handle this. This is actually a bug in Linux
that has been known for at least 15 years and which we keep hacking around.

The right thing to do is to change head_32.S to not violate the i386
ABI. Arguments pushed (by value) on the stack are property of the
callee, that is, they are volatile, so the hack of making them do double
duty as both being saved and passed as arguments is just plain bogus.
The problem is that it works "just well enough" that people (including
myself) keep hacking around it with hacks like this, with assembly
macros, and whatnot instead of fixing the root cause.
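The ABI point is visible even in plain C (a hypothetical sketch, not kernel code): a callee owns its by-value arguments and may scribble on them, so nothing may assume an argument's stack slot survives the call — which is exactly what the entry code's double-duty save slots assumed:

```c
/* Per the i386 ABI, the stack slot a by-value argument arrives in is
 * callee-owned: the callee may clobber it at will. */
static long scribble(long a)
{
    a = 0;              /* legal: only the callee's own slot is written */
    return a;
}

long caller(void)
{
    long saved = 42;    /* stands in for a saved user register */
    scribble(saved);
    /* In C the caller's variable is unaffected, because it occupies its
     * own storage. The entry code's hack used the *same stack slot* as
     * both the saved register and the argument, so a clobber inside the
     * callee corrupted the saved value -- the EBP corruption above. */
    return saved;
}
```

The C example stays correct precisely because the language gives the caller and callee separate copies; head_32.S collapsed the two into one.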

-hpa

2012-08-23 02:14:56

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 74/74] lto, workaround: Mark do_futex noinline to prevent clobbering ebp

On 08/22/2012 05:17 PM, H. Peter Anvin wrote:
> On 08/18/2012 07:57 PM, Andi Kleen wrote:
>> From: Andi Kleen <[email protected]>
>>
>> On a 32bit build gcc 4.7 with LTO decides to clobber the 6th argument on the
>> stack. Unfortunately this corrupts the user EBP and leads to later crashes.
>> For now mark do_futex noinline to prevent this.
>>
>> I wish there was a generic way to handle this. Seems like a ticking time
>> bomb problem.
>>
>
> There is a generic way to handle this. This is actually a bug in Linux
> that has been known for at least 15 years and which we keep hacking around.
>
> The right thing to do is to change head_32.S to not violate the i386
> ABI. Arguments pushed (by value) on the stack are property of the
> callee, that is, they are volatile, so the hack of making them do double
> duty as both being saved and passed as arguments is just plain bogus.
> The problem is that it works "just well enough" that people (including
> myself) keep hacking around it with hacks like this, with assembly
> macros, and whatnot instead of fixing the root cause.
>
> -hpa
>

Just a clarification (Andi knows this, I'm sure, but others might not):
this wasn't done the way it is for no reason; back when Linus originally
wrote the code, i386 passed *all* arguments on the stack, and we still
do that for "asmlinkage" functions on i386. Since gcc back then rarely
if ever mucked with the stack arguments, it made sense to make them
"double duty." Fixing this really should entail changing the invocation
of system calls on i386 to use the regparm convention, which means we
only need to push three arguments twice, rather than six.

-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2012-08-23 02:29:15

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 74/74] lto, workaround: Mark do_futex noinline to prevent clobbering ebp

> The right thing to do is to change head_32.S to not violate the i386
> ABI. Arguments pushed (by value) on the stack are property of the
> callee, that is, they are volatile, so the hack of making them do double
> duty as both being saved and passed as arguments is just plain bogus.
> The problem is that it works "just well enough" that people (including
> myself) keep hacking around it with hacks like this, with assembly
> macros, and whatnot instead of fixing the root cause.

How about just using register arguments for the first three arguments?
This should work for the syscalls at least (it may be too risky for all
the other asm entry points).

And for syscalls with more than three, generate a stub that saves them on
the stack explicitly. This could be done using the new fancy SYSCALL
definition macros (except that arch/x86 would need to start using them in
its own code too).

Or is there some subtle reason with syscall restart and updated args
that prevents it?

Perhaps newer gcc can do regparm(X) with X > 3 too; may be worth trying.

Don't have time to look into this currently though.

-Andi

--
[email protected] -- Speaking for myself only

2012-08-23 03:14:18

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 74/74] lto, workaround: Mark do_futex noinline to prevent clobbering ebp

On 08/22/2012 07:29 PM, Andi Kleen wrote:
> How about just use register arguments for the first three arguments.
> This should work for the syscalls at least (may be too risky for all
> other asm entry points)

Well, it's just an effort to convert each one in turn...

> And for syscalls with more than three generate a stub that saves them on
> the stack explicitly. This could be done using the new fancy SYSCALL
> definition macros (except that arch/x86 would need to start using them too)

I don't think there is any point. Just push the six potential arguments
to the stack and be done with it.

> Or is there some subtle reason with syscall restart and updated args
> that prevents it?
>
> Perhaps newer gcc can do regparm(X), X > 3 too, may be worth trying.

No, there is no such ABI defined.

> Don't have time to look into this currently though.

Always the problem.

-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2012-08-23 15:13:10

by Jan Hubicka

[permalink] [raw]
Subject: Re: RFC: Link Time Optimization support for the kernel

> > If data structures could be encapsulated/internalized to
> > subsystems and only global functions are exposed to other
> > subsystems [which are then LTO optimized] then our include
> > file dependencies could become a *lot* simpler.
>
> Yes, long term we could have these benefits.

Yes, in the long term LTO should make developers' lives easier; it is not
just a tool for getting a few extra % of performance.
There is a lot to do.
>
> BTW I should add LTO does more than just inlining:
> - Drop unused global functions and variables
> (so may cut down on ifdefs)
> - Detect type inconsistencies between files
> - Partial inlining (inline only parts of a function like a test
> at the beginning)
> - Detect pure and const functions without side effects that can be more
> aggressively optimized in the caller.
Also noreturn and nothrow are autodetected (the second is probably not a
big deal for the kernel, but it makes some C++ codebases a lot smaller by
eliminating EH and cleanups). We plan to add more in the near future.
> - Detect global clobbers globally. Normally any global call has to
> assume all global variables could be changed. With LTO information some
> of them can be cached in registers over calls.
> - Detect read only variables and optimize them
> - Optimize arguments to global functions (drop unnecessary arguments,
> optimize input/output etc.)

At this moment this really happens within compilation units only.
It is one of the harder optimizations to get working over a whole program;
we are slowly getting the infrastructure to make this possible.

> - Replace indirect calls with direct calls, enabling other
> optimizations.
> - Do constant propagation and specialization for functions. So if a
> function is called commonly with a constant it can generate a special
> variant of this function optimized for that. This still needs more tuning (and
> currently the code size impact is on the largish side), but I hope
> to eventually have e.g. a special kmalloc optimized for GFP_KERNEL.
> It can also in principle inline callbacks.

Also profile propagation is done: when a function is called only on cold
paths, it becomes cold.

Thanks for all the hard work on LTO kernel, Andi!
Honza
>
> -Andi
> --
> [email protected] -- Speaking for myself only.