2020-07-02 23:42:43

by Danny Lin

[permalink] [raw]
Subject: [PATCH] kbuild: Allow Clang global merging if !MODULES

The old reasoning for disabling Clang's global merging optimization is
that it breaks modpost by coalescing many symbols into _MergedGlobals.
However, modpost is only used in builds with dynamic modules;
vmlinux.symvers is still created during standalone builds, but it's
effectively just an empty dummy file.

Enabling the optimization whenever possible allows us to reap the
benefits of reduced register pressure when many global variables are
used in the same function.

An x86 defconfig kernel built with this optimization boots fine in qemu,
and a downstream 4.14 kernel has been used on arm64 for nearly a year
without any issues caused by this optimization.

Signed-off-by: Danny Lin <[email protected]>
---
Makefile | 3 +++
1 file changed, 3 insertions(+)

diff --git a/Makefile b/Makefile
index a60c98519c37..f04c3639cf61 100644
--- a/Makefile
+++ b/Makefile
@@ -772,10 +772,13 @@ ifdef CONFIG_CC_IS_CLANG
KBUILD_CPPFLAGS += -Qunused-arguments
KBUILD_CFLAGS += -Wno-format-invalid-specifier
KBUILD_CFLAGS += -Wno-gnu
+
+ifdef CONFIG_MODULES
# CLANG uses a _MergedGlobals as optimization, but this breaks modpost, as the
# source of a reference will be _MergedGlobals and not on of the whitelisted names.
# See modpost pattern 2
KBUILD_CFLAGS += -mno-global-merge
+endif
else

# These warnings generated too much noise in a regular build.
--
2.27.0


2020-07-05 02:43:55

by Nathan Chancellor

[permalink] [raw]
Subject: Re: [PATCH] kbuild: Allow Clang global merging if !MODULES

Hi Danny,

On Thu, Jul 02, 2020 at 04:39:29PM -0700, Danny Lin wrote:
> The old reasoning for disabling Clang's global merging optimization is
> that it breaks modpost by coalescing many symbols into _MergedGlobals.
> However, modpost is only used in builds with dynamic modules;
> vmlinux.symvers is still created during standalone builds, but it's
> effectively just an empty dummy file.
>
> Enabling the optimization whenever possible allows us to reap the
> benefits of reduced register pressure when many global variables are
> used in the same function.

Have you run into any place within the kernel that this is the case or
this more of a "could help if this ever happens" type of deal?

> An x86 defconfig kernel built with this optimization boots fine in qemu,
> and a downstream 4.14 kernel has been used on arm64 for nearly a year
> without any issues caused by this optimization.

If I am reading LLVM's source correctly, this option only seems relevant
for ARM and AArch64?

$ rg --no-heading createGlobalMergePass
llvm/lib/CodeGen/GlobalMerge.cpp:679:Pass *llvm::createGlobalMergePass(const TargetMachine *TM, unsigned Offset,
llvm/lib/Target/AArch64/AArch64TargetMachine.cpp:524: addPass(createGlobalMergePass(TM, 4095, OnlyOptimizeForSize,
llvm/lib/Target/ARM/ARMTargetMachine.cpp:456: addPass(createGlobalMergePass(TM, 127, OnlyOptimizeForSize,
llvm/include/llvm/CodeGen/Passes.h:419: Pass *createGlobalMergePass(const TargetMachine *TM, unsigned MaximalOffset,

Otherwise, I think this is probably okay. According to [1], when the
optimization level is less than -O3, we get a less aggressive version of
this optimization level, which is good for code size:

https://github.com/llvm/llvm-project/commit/8207641251706ea808df6d2a1ea8f87b8ee04c6d

However, we do potentially get merging of extern globals if we do not
specify -mglobal-merge (if I am reading the source correctly), which
this commit claims might hurt performance? Not sure if there is any way
to test or verify that?

https://github.com/llvm/llvm-project/commit/de73404b8c4332190750537eb93ce0d5b6451300

> Signed-off-by: Danny Lin <[email protected]>
> ---
> Makefile | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/Makefile b/Makefile
> index a60c98519c37..f04c3639cf61 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -772,10 +772,13 @@ ifdef CONFIG_CC_IS_CLANG
> KBUILD_CPPFLAGS += -Qunused-arguments
> KBUILD_CFLAGS += -Wno-format-invalid-specifier
> KBUILD_CFLAGS += -Wno-gnu
> +
> +ifdef CONFIG_MODULES
> # CLANG uses a _MergedGlobals as optimization, but this breaks modpost, as the
> # source of a reference will be _MergedGlobals and not on of the whitelisted names.
> # See modpost pattern 2
> KBUILD_CFLAGS += -mno-global-merge
> +endif
> else
>
> # These warnings generated too much noise in a regular build.
> --
> 2.27.0
>

Cheers,
Nathan

2020-08-06 22:01:37

by Masahiro Yamada

[permalink] [raw]
Subject: Re: [PATCH] kbuild: Allow Clang global merging if !MODULES

On Sun, Jul 5, 2020 at 11:43 AM Nathan Chancellor
<[email protected]> wrote:
>
> Hi Danny,
>
> On Thu, Jul 02, 2020 at 04:39:29PM -0700, Danny Lin wrote:
> > The old reasoning for disabling Clang's global merging optimization is
> > that it breaks modpost by coalescing many symbols into _MergedGlobals.
> > However, modpost is only used in builds with dynamic modules;
> > vmlinux.symvers is still created during standalone builds, but it's
> > effectively just an empty dummy file.

I am not convinced with this statement.

modpost is used even if CONFIG_MODULES=n

You can see the following log for allnoconfig.

MODPOST vmlinux.symvers


Yes, vmlinux.symver is empty.
This is because EXPORT_SYMBOL is no-op when CONFIG_MODULES=n.



> >
> > Enabling the optimization whenever possible allows us to reap the
> > benefits of reduced register pressure when many global variables are
> > used in the same function.
>
> Have you run into any place within the kernel that this is the case or
> this more of a "could help if this ever happens" type of deal?
>
> > An x86 defconfig kernel built with this optimization boots fine in qemu,
> > and a downstream 4.14 kernel has been used on arm64 for nearly a year
> > without any issues caused by this optimization.
>
> If I am reading LLVM's source correctly, this option only seems relevant
> for ARM and AArch64?
>
> $ rg --no-heading createGlobalMergePass
> llvm/lib/CodeGen/GlobalMerge.cpp:679:Pass *llvm::createGlobalMergePass(const TargetMachine *TM, unsigned Offset,
> llvm/lib/Target/AArch64/AArch64TargetMachine.cpp:524: addPass(createGlobalMergePass(TM, 4095, OnlyOptimizeForSize,
> llvm/lib/Target/ARM/ARMTargetMachine.cpp:456: addPass(createGlobalMergePass(TM, 127, OnlyOptimizeForSize,
> llvm/include/llvm/CodeGen/Passes.h:419: Pass *createGlobalMergePass(const TargetMachine *TM, unsigned MaximalOffset,
>
> Otherwise, I think this is probably okay. According to [1], when the
> optimization level is less than -O3, we get a less aggressive version of
> this optimization level, which is good for code size:
>
> https://github.com/llvm/llvm-project/commit/8207641251706ea808df6d2a1ea8f87b8ee04c6d


I do not understand how -mglobal-merge works.
Do you have test code to generate _MergedGlobals ?


Thanks.





> However, we do potentially get merging of extern globals if we do not
> specify -mglobal-merge (if I am reading the source correctly), which
> this commit claims might hurt performance? Not sure if there is any way
> to test or verify that?
>
> https://github.com/llvm/llvm-project/commit/de73404b8c4332190750537eb93ce0d5b6451300
>
> > Signed-off-by: Danny Lin <[email protected]>
> > ---
> > Makefile | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/Makefile b/Makefile
> > index a60c98519c37..f04c3639cf61 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -772,10 +772,13 @@ ifdef CONFIG_CC_IS_CLANG
> > KBUILD_CPPFLAGS += -Qunused-arguments
> > KBUILD_CFLAGS += -Wno-format-invalid-specifier
> > KBUILD_CFLAGS += -Wno-gnu
> > +
> > +ifdef CONFIG_MODULES
> > # CLANG uses a _MergedGlobals as optimization, but this breaks modpost, as the
> > # source of a reference will be _MergedGlobals and not on of the whitelisted names.
> > # See modpost pattern 2
> > KBUILD_CFLAGS += -mno-global-merge
> > +endif
> > else
> >
> > # These warnings generated too much noise in a regular build.
> > --
> > 2.27.0
> >
>
> Cheers,
> Nathan



--
Best Regards
Masahiro Yamada