2022-09-02 14:35:47

by Peter Zijlstra

[permalink] [raw]
Subject: [PATCH v2 08/59] x86/build: Ensure proper function alignment

From: Thomas Gleixner <[email protected]>

The Intel Architectures Optimization Reference Manual explains that
functions should be aligned at 16 bytes because for a lot of (Intel)
uarchs the I-fetch width is 16 bytes. The AMD Software Optimization
Guide (for recent chips) mentions a 32 byte I-fetch window but a 16
byte decode window.

Follow this advice and align functions to 16 bytes to optimize
instruction delivery to decode and reduce front-end bottlenecks.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
arch/x86/Kconfig.cpu | 6 ++++++
arch/x86/Makefile | 4 ++++
arch/x86/include/asm/linkage.h | 7 ++++---
include/asm-generic/vmlinux.lds.h | 7 ++++++-
4 files changed, 20 insertions(+), 4 deletions(-)

--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -517,3 +517,9 @@ config CPU_SUP_VORTEX_32
makes the kernel a tiny bit smaller.

If unsure, say N.
+
+# Defined here so it is defined for UM too
+config FUNCTION_ALIGNMENT
+ int
+ default 16 if X86_64 || X86_ALIGNMENT_16
+ default 8
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -84,6 +84,10 @@ else
KBUILD_CFLAGS += $(call cc-option,-fcf-protection=none)
endif

+ifneq ($(CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B),y)
+KBUILD_CFLAGS += -falign-functions=$(CONFIG_FUNCTION_ALIGNMENT)
+endif
+
ifeq ($(CONFIG_X86_32),y)
BITS := 32
UTS_MACHINE := i386
--- a/arch/x86/include/asm/linkage.h
+++ b/arch/x86/include/asm/linkage.h
@@ -14,9 +14,10 @@

#ifdef __ASSEMBLY__

-#if defined(CONFIG_X86_64) || defined(CONFIG_X86_ALIGNMENT_16)
-#define __ALIGN .p2align 4, 0x90
-#define __ALIGN_STR __stringify(__ALIGN)
+#if CONFIG_FUNCTION_ALIGNMENT == 16
+#define __ALIGN .p2align 4, 0x90
+#define __ALIGN_STR __stringify(__ALIGN)
+#define FUNCTION_ALIGNMENT 16
#endif

#if defined(CONFIG_RETHUNK) && !defined(__DISABLE_EXPORTS) && !defined(BUILD_VDSO)
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -82,7 +82,12 @@
#endif

/* Align . to a 8 byte boundary equals to maximum function alignment. */
-#define ALIGN_FUNCTION() . = ALIGN(8)
+#ifndef CONFIG_FUNCTION_ALIGNMENT
+#define __FUNCTION_ALIGNMENT 8
+#else
+#define __FUNCTION_ALIGNMENT CONFIG_FUNCTION_ALIGNMENT
+#endif
+#define ALIGN_FUNCTION() . = ALIGN(__FUNCTION_ALIGNMENT)

/*
* LD_DEAD_CODE_DATA_ELIMINATION option enables -fdata-sections, which



2022-09-02 16:56:04

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v2 08/59] x86/build: Ensure proper function alignment

On Fri, Sep 2, 2022 at 6:55 AM Peter Zijlstra <[email protected]> wrote:
>
> --- a/arch/x86/include/asm/linkage.h
> +++ b/arch/x86/include/asm/linkage.h
> @@ -14,9 +14,10 @@
>
> #ifdef __ASSEMBLY__
>
> -#if defined(CONFIG_X86_64) || defined(CONFIG_X86_ALIGNMENT_16)
> -#define __ALIGN .p2align 4, 0x90
> -#define __ALIGN_STR __stringify(__ALIGN)
> +#if CONFIG_FUNCTION_ALIGNMENT == 16
> +#define __ALIGN .p2align 4, 0x90
> +#define __ALIGN_STR __stringify(__ALIGN)
> +#define FUNCTION_ALIGNMENT 16
> #endif

Ugh.

Why is this conditional on that alignment being 16?

Is it purely because the CONFIG variable was mis-designed, and is the
number of bytes, instead of being the shift? If so, just fix that, and
then do an unconditional

#define __ALIGN .p2align
CONFIG_FUNCTION_ALIGNMENT_SHIFT, 0x90
#define __ALIGN_STR __stringify(__ALIGN)

(leave the asm-generic one conditional, since then the condition makes
sense - it's arch-specific).

Linus

2022-09-02 17:50:26

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 08/59] x86/build: Ensure proper function alignment

On Fri, Sep 02, 2022 at 09:51:17AM -0700, Linus Torvalds wrote:
> On Fri, Sep 2, 2022 at 6:55 AM Peter Zijlstra <[email protected]> wrote:
> >
> > --- a/arch/x86/include/asm/linkage.h
> > +++ b/arch/x86/include/asm/linkage.h
> > @@ -14,9 +14,10 @@
> >
> > #ifdef __ASSEMBLY__
> >
> > -#if defined(CONFIG_X86_64) || defined(CONFIG_X86_ALIGNMENT_16)
> > -#define __ALIGN .p2align 4, 0x90
> > -#define __ALIGN_STR __stringify(__ALIGN)
> > +#if CONFIG_FUNCTION_ALIGNMENT == 16
> > +#define __ALIGN .p2align 4, 0x90
> > +#define __ALIGN_STR __stringify(__ALIGN)
> > +#define FUNCTION_ALIGNMENT 16
> > #endif
>
> Ugh.
>
> Why is this conditional on that alignment being 16?

There is a DEBUG case that increases the thing to 32.

2022-09-02 18:14:22

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v2 08/59] x86/build: Ensure proper function alignment

On Fri, Sep 2, 2022 at 10:32 AM Peter Zijlstra <[email protected]> wrote:
>
> There is a DEBUG case that increases the thing to 32.

Well, but that should be part of the Kconfig rules too.

In fact, I think that argues for moving that FUNCTION_ALIGNMENT into
the generic Kconfig, since we already have that much hackier debug
thing there.

That would then get rid of the conditional in asm-generic too, and get
rid of the horrid hack in the main Makefile as well.

I love how commit cf536e185869 ("Makefile: extend 32B aligned debug
option to 64B aligned") took that previous random debug entry and just
made it 64B instead of 32B. What a crock that all is.

Let's just do this right.

Linus


Linus

2022-09-05 02:46:11

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v2 08/59] x86/build: Ensure proper function alignment

From: Peter Zijlstra
> Sent: 02 September 2022 14:07
>
> From: Thomas Gleixner <[email protected]>
>
> The Intel Architectures Optimization Reference Manual explains that
> functions should be aligned at 16 bytes because for a lot of (Intel)
> uarchs the I-fetch width is 16 bytes. The AMD Software Optimization
> Guide (for recent chips) mentions a 32 byte I-fetch window but a 16
> byte decode window.
>
> Follow this advice and align functions to 16 bytes to optimize
> instruction delivery to decode and reduce front-end bottlenecks.

Performance figures?

IIRC the same document will suggest aligning all jump labels.
That is pretty much known to be harmful because of the bloat
it generates.

Also things like CFI and ENDBRA have a habit of making the
entry point unaligned unless you can pad to 16n+x values.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2022-09-05 10:09:55

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 08/59] x86/build: Ensure proper function alignment

On Fri, Sep 02, 2022 at 11:08:54AM -0700, Linus Torvalds wrote:

> Let's just do this right.

Something like so then?

---
--- a/Makefile
+++ b/Makefile
@@ -940,8 +940,8 @@ KBUILD_CFLAGS += $(CC_FLAGS_CFI)
export CC_FLAGS_CFI
endif

-ifdef CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B
-KBUILD_CFLAGS += -falign-functions=64
+ifneq ($(CONFIG_FUNCTION_ALIGNMENT),0)
+KBUILD_CFLAGS += -falign-functions=$(CONFIG_FUNCTION_ALIGNMENT)
endif

# arch Makefile may override CC so keep this after arch Makefile is included
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1419,4 +1419,24 @@ source "kernel/gcov/Kconfig"

source "scripts/gcc-plugins/Kconfig"

+config FUNCTION_ALIGNMENT_8B
+ bool
+
+config FUNCTION_ALIGNMENT_16B
+ bool
+
+config FUNCTION_ALIGNMENT_32B
+ bool
+
+config FUNCTION_ALIGNMENT_64B
+ bool
+
+config FUNCTION_ALIGNMENT
+ int
+ default 64 if FUNCTION_ALIGNMENT_64B
+ default 32 if FUNCTION_ALIGNMENT_32B
+ default 16 if FUNCTION_ALIGNMENT_16B
+ default 8 if FUNCTION_ALIGNMENT_8B
+ default 0
+
endmenu
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -63,6 +63,7 @@ config IA64
select NUMA if !FLATMEM
select PCI_MSI_ARCH_FALLBACKS if PCI_MSI
select ZONE_DMA32
+ select FUNCTION_ALIGNMENT_32B
default y
help
The Itanium Processor Family is Intel's 64-bit successor to
--- a/arch/ia64/Makefile
+++ b/arch/ia64/Makefile
@@ -23,7 +23,7 @@ KBUILD_AFLAGS_KERNEL := -mconstant-gp
EXTRA :=

cflags-y := -pipe $(EXTRA) -ffixed-r13 -mfixed-range=f12-f15,f32-f127 \
- -falign-functions=32 -frename-registers -fno-optimize-sibling-calls
+ -frename-registers -fno-optimize-sibling-calls
KBUILD_CFLAGS_KERNEL := -mconstant-gp

GAS_STATUS = $(shell $(srctree)/arch/ia64/scripts/check-gas "$(CC)" "$(OBJDUMP)")
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -283,6 +283,8 @@ config X86
select X86_FEATURE_NAMES if PROC_FS
select PROC_PID_ARCH_STATUS if PROC_FS
select HAVE_ARCH_NODE_DEV_GROUP if X86_SGX
+ select FUNCTION_ALIGNMENT_16B if X86_64 || X86_ALIGNMENT_16
+ select FUNCTION_ALIGNMENT_8B
imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI

config INSTRUCTION_DECODER
@@ -2442,6 +2444,7 @@ config HAVE_CALL_THUNKS
depends on CC_HAS_ENTRY_PADDING && RETHUNK && OBJTOOL

config CALL_THUNKS
+ select FUNCTION_ALIGNMENT_16B
def_bool n

menuconfig SPECULATION_MITIGATIONS
@@ -2515,6 +2518,7 @@ config CALL_DEPTH_TRACKING

config CALL_THUNKS_DEBUG
bool "Enable call thunks and call depth tracking debugging"
+ select FUNCTION_ALIGNMENT_32B
depends on CALL_DEPTH_TRACKING
default n
help
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -517,9 +517,3 @@ config CPU_SUP_VORTEX_32
makes the kernel a tiny bit smaller.

If unsure, say N.
-
-# Defined here so it is defined for UM too
-config FUNCTION_ALIGNMENT
- int
- default 16 if X86_64 || X86_ALIGNMENT_16
- default 8
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -84,10 +84,6 @@ else
KBUILD_CFLAGS += $(call cc-option,-fcf-protection=none)
endif

-ifneq ($(CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B),y)
-KBUILD_CFLAGS += -falign-functions=$(CONFIG_FUNCTION_ALIGNMENT)
-endif
-
ifeq ($(CONFIG_X86_32),y)
BITS := 32
UTS_MACHINE := i386
--- a/arch/x86/include/asm/linkage.h
+++ b/arch/x86/include/asm/linkage.h
@@ -12,16 +12,7 @@
#define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))
#endif /* CONFIG_X86_32 */

-#if CONFIG_FUNCTION_ALIGNMENT == 8
-# define __ALIGN .p2align 3, 0x90;
-#elif CONFIG_FUNCTION_ALIGNMENT == 16
-# define __ALIGN .p2align 4, 0x90;
-#elif CONFIG_FUNCTION_ALIGNMENT == 32
-# define __ALIGN .p2align 5, 0x90
-#else
-# error Unsupported function alignment
-#endif
-
+#define __ALIGN .balign CONFIG_FUNCTION_ALIGNMENT, 0x90
#define __ALIGN_STR __stringify(__ALIGN)

#ifdef CONFIG_CFI_CLANG
--- a/include/linux/linkage.h
+++ b/include/linux/linkage.h
@@ -69,8 +69,8 @@
#endif

#ifndef __ALIGN
-#define __ALIGN .align 4,0x90
-#define __ALIGN_STR ".align 4,0x90"
+#define __ALIGN .balign CONFIG_FUNCTION_ALIGNMENT
+#define __ALIGN_STR __stringify(__ALIGN)
#endif

#ifdef __ASSEMBLY__
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -459,6 +459,7 @@ config SECTION_MISMATCH_WARN_ONLY
config DEBUG_FORCE_FUNCTION_ALIGN_64B
bool "Force all function address 64B aligned"
depends on EXPERT && (X86_64 || ARM64 || PPC32 || PPC64 || ARC)
+ select FUNCTION_ALIGNMENT_64B
help
There are cases that a commit from one domain changes the function
address alignment of other domains, and cause magic performance

2022-09-12 14:44:05

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v2 08/59] x86/build: Ensure proper function alignment

On Mon, Sep 5, 2022 at 6:07 AM Peter Zijlstra <[email protected]> wrote:
>
> On Fri, Sep 02, 2022 at 11:08:54AM -0700, Linus Torvalds wrote:
>
> > Let's just do this right.
>
> Something like so then?

Sorry, I dropped this due to travel.

The patch looks sane, the only thing I worry a bit about is

> +config FUNCTION_ALIGNMENT
> + int
> + default 64 if FUNCTION_ALIGNMENT_64B
..
> + default 0

Is '0' even a valid value then for things like

> +#define __ALIGN .balign CONFIG_FUNCTION_ALIGNMENT
> +#define __ALIGN_STR __stringify(__ALIGN)

because it doesn't really seem like a sensible byte alignment.

Maybe "default 4" would be a safer choice?

Linus

2022-09-12 19:50:50

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 08/59] x86/build: Ensure proper function alignment

On Mon, Sep 12, 2022 at 10:09:38AM -0400, Linus Torvalds wrote:

> The patch looks sane, the only thing I worry a bit about is
>
> > +config FUNCTION_ALIGNMENT
> > + int
> > + default 64 if FUNCTION_ALIGNMENT_64B
> ..
> > + default 0
>
> Is '0' even a valid value then for things like

At the time I thought to have read that 0 alignment effectively no-ops
the statement, but now I can't find it in a hurry, happy to make it
default to 4.

2022-09-13 08:18:37

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 08/59] x86/build: Ensure proper function alignment

On Mon, Sep 12, 2022 at 09:44:05PM +0200, Peter Zijlstra wrote:
> On Mon, Sep 12, 2022 at 10:09:38AM -0400, Linus Torvalds wrote:
>
> > The patch looks sane, the only thing I worry a bit about is
> >
> > > +config FUNCTION_ALIGNMENT
> > > + int
> > > + default 64 if FUNCTION_ALIGNMENT_64B
> > ..
> > > + default 0
> >
> > Is '0' even a valid value then for things like
>
> At the time I thought to have read that 0 alignment effectively no-ops
> the statement, but now I can't find it in a hurry, happy to make it
> default to 4.

Found it: https://sourceware.org/binutils/docs-2.39/as/Balign.html

7.8 .balign[wl] [abs-expr[, abs-expr[, abs-expr]]]

Pad the location counter (in the current subsection) to a particular
storage boundary. The first expression (which must be absolute) is the
alignment request in bytes. ... If the expression is omitted then a
default value of 0 is used, effectively disabling alignment
requirements.

(for some raisin google served me a very old binutils document last
night that doesn't have mention the 0 thing)

2022-09-13 15:22:50

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v2 08/59] x86/build: Ensure proper function alignment

On Tue, Sep 13, 2022 at 4:09 AM Peter Zijlstra <[email protected]> wrote:
>
> Found it: https://sourceware.org/binutils/docs-2.39/as/Balign.html
>
> 7.8 .balign[wl] [abs-expr[, abs-expr[, abs-expr]]]
>
> Pad the location counter [...]

Very good. All looks sane to me then.

Linus