2015-11-15 00:27:10

by Stephen Rothwell

Subject: linux-next: clean up the kbuild tree?

Hi Michal,

I notice that the kbuild tree (relative to Linus' tree) only contains
lots of merges and these 2 commits from April 2014:

commit 19a3cc83353e3bb4bc28769f8606139a3d350d2d
Author: Andi Kleen <[email protected]>
Date: Wed Apr 2 21:49:27 2014 +0200

Kbuild, lto: Add Link Time Optimization support v3

With LTO gcc will do whole program optimizations for
the whole kernel and each module. This increases compile time,
but can generate faster and smaller code and allows
the compiler to do global checking. For example the compiler
can complain now about type mismatches for symbols between
different files.

LTO allows gcc to inline functions between different files and
do various other optimization across the whole binary.

It might also trigger bugs due to more aggressive optimizations.
It allows gcc to drop unused code. It also allows it to check
types over the whole program.

The compile time is definitely slower. For gcc 4.8 on a
typical monolithic config it is about 58% slower. 4.9
drastically improved performance, with slowdown being
38% or so. Also incremental rebuilds are somewhat
slower, as the whole kernel always needs to be reoptimized.
Very modular kernels have less build time slow down, as
the LTO will run for each module individually.

This adds the basic Kbuild plumbing for LTO:

- In Kbuild add a new scripts/Makefile.lto that checks
the tool chain (note the checks may not be fully bulletproof)
and when the tests pass sets the LTO options.
Currently LTO is very finicky about the tool chain.
- Add a new LDFINAL variable that controls the final link
for vmlinux or module. In this case we call gcc-ld instead
of ld, to run the LTO step.
- For slim LTO builds (object files containing no backup
executable) force AR to gcc-ar
- Theoretically LTO should pass through compiler options from
the compiler to the link step, but this doesn't work for all options.
So the Makefile sets most of these options manually.
- Kconfigs:
Since LTO with allyesconfig needs more than 4G of memory (~8G)
and has the potential to make people's systems swap to death,
I used a nested config that ensures that a simple
allyesconfig disables LTO. It has to be explicitly
enabled.
- Some dependencies on other Kconfigs:
MODVERSIONS, GCOV, FUNCTION_TRACER, KALLSYMS_ALL, single chain WCHAN are
incompatible with LTO currently, mostly because
they require setting special compiler options
for specific files, which LTO currently doesn't support.
MODVERSIONS should in principle work with gcc 4.9, but is still disabled.
FUNCTION_TRACER/GCOV can be fixed with an unmerged gcc patch.
- Also disable strict copy user checks because they trigger
errors with LTO.
- modpost symbol checking is downgraded to a warning,
as in some cases modpost runs before the final link
and it cannot resolve LTO symbols at this point.

For more information see Documentation/lto-build

Thanks to HJ Lu, Joe Mario, Honza Hubicka, Richard Guenther,
Don Zickus, Changlong Xie who helped with this project
(and probably some more who I forgot, sorry)

v2:
Merge documentation file into this patch
Improve documentation and Kconfig, fix a lot of obsolete comments.
Exclude READABLE_ASM
Some random fixes
v3:
Remove CONFIG_LTO_SLIM, is on by default.
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Michal Marek <[email protected]>

commit 810361b9f65daa6144922ac88087a8426eeae817
Author: Andi Kleen <[email protected]>
Date: Wed Apr 2 21:49:26 2014 +0200

Kbuild, lto: Set TMPDIR for LTO v4

LTO gcc puts a lot of data into $TMPDIR, essentially another copy
of the object directory to pass the repartitioned object files
to the code generation processes.

TMPDIR defaults to /tmp. With /tmp as tmpfs it's easy to drive systems
out of memory, because the temporary files will compete with the already high
anonymous memory consumption of the wpa LTO pass.

When LTO is set always set TMPDIR to the object directory. This could
be slightly slower, but is far safer and eliminates another parameter
the LTO user would need to set manually.

I made it conditional on LTO for now.

v2: Allow user to override (H. Peter Anvin)
v3: Use standard kernel variable style
v4: Print message for redirection (M.Marek)
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Michal Marek <[email protected]>

Documentation/lto-build | 173 ++++++++++++++++++++
Makefile | 23 ++-
arch/x86/Kconfig | 2 +-
init/Kconfig | 73 +++++++++
kernel/gcov/Kconfig | 2 +-
lib/Kconfig.debug | 2 +-
scripts/Makefile.lto | 84 ++++++++++
scripts/Makefile.modpost | 7 +-
scripts/coccicheck | 2 +-
scripts/coccinelle/free/ifnullfree.cocci | 26 +--
.../iterators/device_node_continue.cocci | 100 ++++++++++++
scripts/coccinelle/misc/compare_const_fl.cocci | 171 ++++++++++++++++++++
scripts/coccinelle/misc/of_table.cocci | 33 +++-
scripts/coccinelle/misc/simple_return.cocci | 180 ---------------------
scripts/coccinelle/null/deref_null.cocci | 4 +-
scripts/coccinelle/tests/odd_ptr_err.cocci | 120 ++++++++++----
scripts/link-vmlinux.sh | 2 +-
scripts/package/builddeb | 11 +-
scripts/tags.sh | 2 +
19 files changed, 775 insertions(+), 242 deletions(-)

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/


2015-11-15 17:58:52

by Andi Kleen

Subject: Re: linux-next: clean up the kbuild tree?

On Sun, Nov 15, 2015 at 11:27:05AM +1100, Stephen Rothwell wrote:
> Hi Michal,
>
> I notice that the kbuild tree (relative to Linus' tree) only contains
> lots of merges and these 2 commits from April 2014:

Really should get in that patch officially. I have a variety of users.
And it clearly has been tested long enough in linux-next :)
Michal, enough to just repost it?

-Andi

2015-11-16 13:01:51

by Michal Marek

Subject: Re: linux-next: clean up the kbuild tree?

Dne 15.11.2015 v 18:58 Andi Kleen napsal(a):
> On Sun, Nov 15, 2015 at 11:27:05AM +1100, Stephen Rothwell wrote:
>> Hi Michal,
>>
>> I notice that the kbuild tree (relative to Linus' tree) only contains
>> lots of merges and these 2 commits from April 2014:
>
> Really should get in that patch officially. I have a variety of users.
> And it clearly has been tested long enough in linux-next :)
> Michal, enough to just repost it?

So the commit in kbuild.git tree is identical to what is being tested
out of tree? Could you nevertheless provide an updated changelog? One
(and actually only) of Linus' objections was that it was not clear at
all what the actual benefits for the kernel itself are. Do you have some
benchmarks perhaps, where LTO achieves a performance gain? Also, did the
compile time impact change with gcc 5.x?

Thanks,
Michal

2015-11-21 01:00:35

by Andi Kleen

Subject: Re: linux-next: clean up the kbuild tree?

Sorry for the delay.

On Mon, Nov 16, 2015 at 02:01:45PM +0100, Michal Marek wrote:
> Dne 15.11.2015 v 18:58 Andi Kleen napsal(a):
> > On Sun, Nov 15, 2015 at 11:27:05AM +1100, Stephen Rothwell wrote:
> >> Hi Michal,
> >>
> >> I notice that the kbuild tree (relative to Linus' tree) only contains
> >> lots of merges and these 2 commits from April 2014:
> >
> > Really should get in that patch officially. I have a variety of users.
> > And it clearly has been tested long enough in linux-next :)
> > Michal, enough to just repost it?
>
> So the commit in kbuild.git tree is identical to what is being tested
> out of tree? Could you nevertheless provide an updated changelog? One

Yes. I'll provide a new ChangeLog.

> (and actually only) of Linus' objections was that it was not clear at
> all what the actual benefits for the kernel itself are. Do you have some
> benchmarks perhaps, where LTO achieves a performance gain?

The main users use it to shrink the kernel. I'll run some new benchmarks.

> Also, did the
> compile time impact change with gcc 5.x?

5.x is better than 4.x, but it's still slower. It's also not incremental.

-Andi

--
[email protected] -- Speaking for myself only

2015-11-21 10:55:45

by Takashi Iwai

Subject: Re: linux-next: clean up the kbuild tree?

On Sat, 21 Nov 2015 02:00:33 +0100,
Andi Kleen wrote:
>
> Sorry for the delay.
>
> On Mon, Nov 16, 2015 at 02:01:45PM +0100, Michal Marek wrote:
> > Dne 15.11.2015 v 18:58 Andi Kleen napsal(a):
> > > On Sun, Nov 15, 2015 at 11:27:05AM +1100, Stephen Rothwell wrote:
> > >> Hi Michal,
> > >>
> > >> I notice that the kbuild tree (relative to Linus' tree) only contains
> > >> lots of merges and these 2 commits from April 2014:
> > >
> > > Really should get in that patch officially. I have a variety of users.
> > > And it clearly has been tested long enough in linux-next :)
> > > Michal, enough to just repost it?
> >
> > So the commit in kbuild.git tree is identical to what is being tested
> > out of tree? Could you nevertheless provide an updated changelog? One
>
> Yes. I'll provide a new ChangeLog.
>
> > (and actually only) of Linus' objections was that it was not clear at
> > all what the actual benefits for the kernel itself are. Do you have some
> > benchmarks perhaps, where LTO achieves a performance gain?
>
> The main users use it to shrink the kernel. I'll run some new benchmarks.

Yeah, people (especially Intel) seem eager to shave every bit off the
kernel for IoT devices, and LTO would help a lot in this regard.
Many drivers share common helper functions, and many of those are unused
when only a single driver is built. They can be dropped easily with LTO.
Otherwise we'd end up having too many unmanageable Kconfigs.

> > Also, did the
> > compile time impact change with gcc 5.x?
>
> 5.x is better than 4.x, but it's still slower. It's also not incremental.

The last time I tested with the latest 5.x and stock binutils on
openSUSE Tumbleweed, the build failed, unfortunately. Partly the
detection of the gcc version doesn't work for 5.x, and partly something is
missing on the binutils side, although it's already built with plugin support.
I stopped at this point and didn't track it further.

Hopefully the requirements will become easier to manage in the future if
we merge this...


thanks,

Takashi

2015-11-24 02:12:34

by Andi Kleen

Subject: Re: linux-next: clean up the kbuild tree?

> > 5.x is better than 4.x, but it's still slower. It's also not incremental.
>
> The last time I tested with the latest 5.x and stock binutils on
> openSUSE Tumbleweed, the build failed, unfortunately. Partly the
> detection of the gcc version doesn't work for 5.x, and partly something is

Really? It works for me with gcc 5.

> missing on the binutils side, although it's already built with plugin support.

Yes it needs HJ Lu's Linux binutils, not the standard FSF binutils.
The patch to fix LTO with ld -r was submitted to standard binutils, but they
didn't want to fix the issue.


-Andi
--
[email protected] -- Speaking for myself only

2015-11-24 16:33:41

by Takashi Iwai

Subject: LTO build errors (Re: linux-next: clean up the kbuild tree?)

On Tue, 24 Nov 2015 03:12:31 +0100,
Andi Kleen wrote:
>
> > > 5.x is better than 4.x, but it's still slower. It's also not incremental.
> >
> > The last time I tested with the latest 5.x and stock binutils on
> > openSUSE Tumbleweed, the build failed, unfortunately. Partly the
> > detection of the gcc version doesn't work for 5.x, and partly something is
>
> Really? It works for me with gcc 5.

I retested now, and indeed it seems to be only the binutils, not the gcc
version. Sorry for the confusion.

> > missing on the binutils side, although it's already built with plugin support.
>
> Yes it needs HJ Lu's Linux binutils, not the standard FSF binutils.
> The patch to fix LTO with ld -r was submitted to standard binutils, but they
> didn't want to fix the issue.

I did "make allnoconfig", disabled tracers, gcov, etc., and then
enabled LTO. With the hjl version (HJ Lu's binutils), the build gets almost
to the end before hitting this:

LD vmlinux
arch/x86/kernel/cpu/perf_event_intel_rapl.c:66:20: error: rapl_domain_names causes a section type conflict with __setup_str_set_reset_devices
static const char *rapl_domain_names[NR_RAPL_DOMAINS] __initconst = {
^
init/main.c:159:19: note: ‘__setup_str_set_reset_devices’ was declared here
__setup("reset_devices", set_reset_devices);

Hmm... I see no direct relation, but OK, let's try to get rid of
__initconst. Now it hits lots of other errors like:

`__sw_hweight32' referenced in section `.text' of /tmp/ccUCMU7n.ltrans13.ltrans.o: defined in discarded section `.text' of lib/built-in.o (symbol from plugin)
`__sw_hweight32' referenced in section `.text' of /tmp/ccUCMU7n.ltrans13.ltrans.o: defined in discarded section `.text' of lib/built-in.o (symbol from plugin)
`__sw_hweight32' referenced in section `.text' of /tmp/ccUCMU7n.ltrans13.ltrans.o: defined in discarded section `.text' of lib/built-in.o (symbol from plugin)
`__sw_hweight32' referenced in section `.text' of /tmp/ccUCMU7n.ltrans13.ltrans.o: defined in discarded section `.text' of lib/built-in.o (symbol from plugin)
/tmp/ccUCMU7n.ltrans21.ltrans.o: In function `do_exit':
<artificial>:(.text+0xfc0): undefined reference to `sys_futex'
/tmp/ccUCMU7n.ltrans22.ltrans.o: In function `_do_fork':
<artificial>:(.text+0x39f7): undefined reference to `ret_from_fork'
<artificial>:(.text+0x4428): undefined reference to `ret_from_kernel_thread'
....


Any hints to solve these?


thanks,

Takashi

2015-11-25 04:33:46

by Andi Kleen

Subject: Re: LTO build errors (Re: linux-next: clean up the kbuild tree?)


Hi Takashi,

On Tue, Nov 24, 2015 at 05:33:36PM +0100, Takashi Iwai wrote:
> LD vmlinux
> arch/x86/kernel/cpu/perf_event_intel_rapl.c:66:20: error: rapl_domain_names causes a section type conflict with __setup_str_set_reset_devices
> static const char *rapl_domain_names[NR_RAPL_DOMAINS] __initconst = {
> ^
> init/main.c:159:19: note: ‘__setup_str_set_reset_devices’ was declared here
> __setup("reset_devices", set_reset_devices);
>
> Hmm... I see no direct relation, but OK, let's try to get rid of
> __initconst. Now it hits lots of other errors like:

I hit the same issue, will send a patch. The other symbol is typically some
random correct symbol because gcc detects the conflict on a pair of symbols.

The problem is that placing const correctly is too difficult, the correct line
would be

static const char *const rapl_domain_names[NR_RAPL_DOMAINS] __initconst = {
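
A minimal user-space sketch of why the extra const matters, assuming a C11
compiler. The array names and helper functions below are hypothetical and are
not the kernel's. In "const char *x[]" only the strings are const; the array
of pointers itself is still writable data, so it does not fit into a
read-only section next to genuinely const objects. The second const makes
the array object itself read-only.

```c
#include <assert.h>

/* Hypothetical names, for illustration only (not the kernel's arrays). */

/* Writable array of pointers to const char: the strings cannot be
 * modified through these pointers, but the array slots can be changed. */
static const char *writable_names[] = { "pkg", "core" };

/* Const array of pointers to const char: the whole object is read-only. */
static const char *const readonly_names[] = { "pkg", "core" };

/* Evaluates to 1 when the array elements are themselves const, else 0. */
#define ELEMS_ARE_CONST(arr) \
	_Generic(&(arr)[0], \
		 const char *const *: 1, \
		 const char **: 0)

/* Compile-time check: an object meant for read-only data must be const. */
static_assert(ELEMS_ARE_CONST(readonly_names) == 1,
	      "readonly_names must be a const array");

int writable_elems_const(void) { return ELEMS_ARE_CONST(writable_names); }
int readonly_elems_const(void) { return ELEMS_ARE_CONST(readonly_names); }

/* Legal only for the first array: its slots are not const. */
const char *retarget_first(const char *s)
{
	writable_names[0] = s;
	return writable_names[0];
}

/* Reading from the const array is fine; writing to it would not compile. */
const char *first_readonly(void)
{
	return readonly_names[0];
}
```

Attempting "readonly_names[0] = s;" is rejected at compile time, which is
the property a read-only section placement relies on.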

>
> `__sw_hweight32' referenced in section `.text' of /tmp/ccUCMU7n.ltrans13.ltrans.o: defined in discarded section `.text' of lib/built-in.o (symbol from plugin)
> `__sw_hweight32' referenced in section `.text' of /tmp/ccUCMU7n.ltrans13.ltrans.o: defined in discarded section `.text' of lib/built-in.o (symbol from plugin)
> `__sw_hweight32' referenced in section `.text' of /tmp/ccUCMU7n.ltrans13.ltrans.o: defined in discarded section `.text' of lib/built-in.o (symbol from plugin)
> `__sw_hweight32' referenced in section `.text' of /tmp/ccUCMU7n.ltrans13.ltrans.o: defined in discarded section `.text' of lib/built-in.o (symbol from plugin)

This needs

https://git.kernel.org/cgit/linux/kernel/git/ak/linux-misc.git/commit/?h=lto-4.0&id=d826425f7a9d935d521989bd0a871b76fb4c59e2


> /tmp/ccUCMU7n.ltrans21.ltrans.o: In function `do_exit':
> <artificial>:(.text+0xfc0): undefined reference to `sys_futex'
> /tmp/ccUCMU7n.ltrans22.ltrans.o: In function `_do_fork':
> <artificial>:(.text+0x39f7): undefined reference to `ret_from_fork'
> <artificial>:(.text+0x4428): undefined reference to `ret_from_kernel_thread'


That's new, but can be fixed by adding __visible or asmlinkage to these symbols.
I guess it's from the recent entry* restructuring.

I'll do an updated tree later.

Everything in C that's called from assembler needs to be marked like this. It's
fairly mechanical.
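
A rough user-space sketch of what such a marking looks like, assuming that
__visible corresponds to GCC's externally_visible attribute. MY_VISIBLE and
my_entry_helper are hypothetical names, not kernel symbols. The attribute
tells the whole-program optimizer that a symbol is referenced from outside
the code it can see (for example from a .S file), so it must not be
localized or dropped.

```c
#include <assert.h>

/* Hypothetical stand-in for a __visible-style macro. The attribute is
 * GCC-specific, so other compilers get an empty definition here. */
#if defined(__GNUC__) && !defined(__clang__)
#define MY_VISIBLE __attribute__((externally_visible))
#else
#define MY_VISIBLE
#endif

/* Imagine the only caller of this function lives in an assembler file.
 * Without the marking, an LTO build could conclude it is unused. */
MY_VISIBLE long my_entry_helper(long nr)
{
	return nr + 1;
}

/* C-side sanity check; the real caller would be assembler code. */
void self_check(void)
{
	assert(my_entry_helper(41) == 42);
}
```

The marking does not change the generated code for the function itself; it
only constrains what the link-time optimizer may do with the symbol.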

-andi

2015-11-25 06:58:23

by Takashi Iwai

Subject: Re: LTO build errors (Re: linux-next: clean up the kbuild tree?)

On Wed, 25 Nov 2015 05:33:44 +0100,
Andi Kleen wrote:
>
>
> Hi Takashi,
>
> On Tue, Nov 24, 2015 at 05:33:36PM +0100, Takashi Iwai wrote:
> > LD vmlinux
> > arch/x86/kernel/cpu/perf_event_intel_rapl.c:66:20: error: rapl_domain_names causes a section type conflict with __setup_str_set_reset_devices
> > static const char *rapl_domain_names[NR_RAPL_DOMAINS] __initconst = {
> > ^
> > init/main.c:159:19: note: ‘__setup_str_set_reset_devices’ was declared here
> > __setup("reset_devices", set_reset_devices);
> >
> > Hmm... I see no direct relation, but OK, let's try to get rid of
> > __initconst. Now it hits lots of other errors like:
>
> I hit the same issue, will send a patch. The other symbol is typically some
> random correct symbol because gcc detects the conflict on a pair of symbols.
>
> The problem is that placing const correctly is too difficult, the correct line
> would be
>
> static const char *const rapl_domain_names[NR_RAPL_DOMAINS] __initconst = {

Ah, so I should have ignored the relation with __setup_* and
concentrated on the line. LTO goes much deeper than a human being can
look through :)

Yes, such an error is often overlooked. A quick grep shows the
following:

arch/arc/plat-axs10x/axs10x.c:463:static const char *axs101_compat[] __initconst = {
arch/arc/plat-axs10x/axs10x.c:477:static const char *axs103_compat[] __initconst = {
arch/arc/plat-sim/platform.c:22:static const char *simulation_compat[] __initconst = {
arch/arm/mach-imx/mach-imx6ul.c:87:static const char *imx6ul_dt_compat[] __initconst = {
arch/arm/mach-shmobile/setup-r8a7793.c:22:static const char *r8a7793_boards_compat_dt[] __initconst = {
arch/x86/kernel/cpu/perf_event_intel_rapl.c:66:static const char *rapl_domain_names[NR_RAPL_DOMAINS] __initconst = {
drivers/clk/pistachio/clk.h:40:#define PNAME(x) static const char *x[] __initconst

> > `__sw_hweight32' referenced in section `.text' of /tmp/ccUCMU7n.ltrans13.ltrans.o: defined in discarded section `.text' of lib/built-in.o (symbol from plugin)
> > `__sw_hweight32' referenced in section `.text' of /tmp/ccUCMU7n.ltrans13.ltrans.o: defined in discarded section `.text' of lib/built-in.o (symbol from plugin)
> > `__sw_hweight32' referenced in section `.text' of /tmp/ccUCMU7n.ltrans13.ltrans.o: defined in discarded section `.text' of lib/built-in.o (symbol from plugin)
> > `__sw_hweight32' referenced in section `.text' of /tmp/ccUCMU7n.ltrans13.ltrans.o: defined in discarded section `.text' of lib/built-in.o (symbol from plugin)
>
> This needs
>
> https://git.kernel.org/cgit/linux/kernel/git/ak/linux-misc.git/commit/?h=lto-4.0&id=d826425f7a9d935d521989bd0a871b76fb4c59e2

OK, noted.


> > /tmp/ccUCMU7n.ltrans21.ltrans.o: In function `do_exit':
> > <artificial>:(.text+0xfc0): undefined reference to `sys_futex'
> > /tmp/ccUCMU7n.ltrans22.ltrans.o: In function `_do_fork':
> > <artificial>:(.text+0x39f7): undefined reference to `ret_from_fork'
> > <artificial>:(.text+0x4428): undefined reference to `ret_from_kernel_thread'
>
>
> That's new, but can be fixed by adding __visible or asmlinkage to these symbols.
> I guess it's from the recent entry* restructuring.
>
> I'll do an updated tree later.
>
> Everything in C that's called from assembler needs to be marked like this. It's
> fairly mechanical.

OK, thanks for the information!


Takashi

2015-11-30 17:48:46

by Andi Kleen

Subject: Re: LTO build errors (Re: linux-next: clean up the kbuild tree?)

> > > /tmp/ccUCMU7n.ltrans21.ltrans.o: In function `do_exit':
> > > <artificial>:(.text+0xfc0): undefined reference to `sys_futex'
> > > /tmp/ccUCMU7n.ltrans22.ltrans.o: In function `_do_fork':
> > > <artificial>:(.text+0x39f7): undefined reference to `ret_from_fork'
> > > <artificial>:(.text+0x4428): undefined reference to `ret_from_kernel_thread'
> >
> >
> > That's new, but can be fixed by adding __visible or asmlinkage to these symbols.
> > I guess it's from the recent entry* restructuring.
> >
> > I'll do an updated tree later.
> >
> > Everything in C that's called from assembler needs to be marked like this. It's
> > fairly mechanical.
>
> OK, thanks for the information!

I uploaded lto-4.1/4.2/4.3 trees to my git tree. It only needed some minor changes.

However, I think the problems with the assembler labels you had are due to toolchain
misconfiguration. I had the same issue when the compiler was using the non-Linux binutils
as plugin ld. Unfortunately there is no way to ask the compiler to use a different
plugin ld, other than:
- rebuilding and specifying it at build time (see "Documentation/lto-build")
- or replacing the linker binary (/usr/bin/ld in a standard hosted compiler)

-Andi

--
[email protected] -- Speaking for myself only