2021-01-11 08:22:12

by Bill Wendling

Subject: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure

From: Sami Tolvanen <[email protected]>

Enable the use of clang's Profile-Guided Optimization[1]. To generate a
profile, the kernel is instrumented with PGO counters, a representative
workload is run, and the raw profile data is collected from
/sys/kernel/debug/pgo/profraw.

The raw profile data must be processed by clang's "llvm-profdata" tool before
it can be used during recompilation:

$ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
$ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw

Multiple raw profiles may be merged during this step.
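
The collection and merge steps above can be wrapped in a small script. The
following is only a sketch: the PGO_DIR and LLVM_PROFDATA variables and both
function names are illustrative and not part of this patch. It assumes the
reset and profraw files described in the documentation added below.

```shell
#!/bin/sh
# Sketch only: PGO_DIR, LLVM_PROFDATA and both helpers are illustrative
# names, not something this patch provides.
PGO_DIR="${PGO_DIR:-/sys/kernel/debug/pgo}"
LLVM_PROFDATA="${LLVM_PROFDATA:-llvm-profdata}"

# usage: collect_pgo_profile <raw-output> <workload command...>
collect_pgo_profile() {
	raw="$1"
	shift

	# Zero the counters so only the workload below is measured.
	echo 1 > "$PGO_DIR/reset" || return 1

	# Run the representative workload.
	"$@" || return 1

	# Snapshot the raw profile exported by the instrumented kernel.
	cp "$PGO_DIR/profraw" "$raw"
}

# usage: merge_pgo_profiles <output.profdata> <raw-profile...>
merge_pgo_profiles() {
	out="$1"
	shift

	# One or more raw profiles are merged into a single indexed profile.
	"$LLVM_PROFDATA" merge --output="$out" "$@"
}
```

For example, "collect_pgo_profile run1.profraw ./my-benchmark" followed by
"merge_pgo_profiles vmlinux.profdata run1.profraw run2.profraw", where
./my-benchmark stands in for whatever workload is representative.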

The data can be used either by the compiler if LTO isn't enabled:

... -fprofile-use=vmlinux.profdata ...

or by LLD if LTO is enabled:

... -lto-cs-profile-file=vmlinux.profdata ...
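
A build script could choose between the two flags along these lines. This is
a sketch under assumptions: pgo_use_flag is a hypothetical helper, and
checking CONFIG_LTO_CLANG in the kernel's .config is only one possible way to
detect an LTO build.

```shell
#!/bin/sh
# Sketch only: pgo_use_flag is an illustrative helper, and testing
# CONFIG_LTO_CLANG in .config is an assumption about how LTO is enabled.
# usage: pgo_use_flag <kernel .config> <profile file>
pgo_use_flag() {
	config="$1"
	profdata="$2"

	if grep -q '^CONFIG_LTO_CLANG=y$' "$config"; then
		# LTO build: the profile is handed to LLD.
		printf '%s\n' "-lto-cs-profile-file=$profdata"
	else
		# Non-LTO build: the profile is handed to the compiler.
		printf '%s\n' "-fprofile-use=$profdata"
	fi
}
```

The helper only prints the flag; how it is passed to the build (for instance
through KCFLAGS in the non-LTO case) is left to the caller.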

This initial submission is restricted to x86, as that's the platform we know
works. This restriction can be lifted once other platforms have been verified
to work with PGO.

Note that this method of profiling the kernel is clang-native and isn't
compatible with clang's gcov support in kernel/gcov.

[1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization

Signed-off-by: Sami Tolvanen <[email protected]>
Co-developed-by: Bill Wendling <[email protected]>
Signed-off-by: Bill Wendling <[email protected]>
---
Documentation/dev-tools/index.rst | 1 +
Documentation/dev-tools/pgo.rst | 127 +++++++++
MAINTAINERS | 9 +
Makefile | 3 +
arch/Kconfig | 1 +
arch/arm/boot/bootp/Makefile | 1 +
arch/arm/boot/compressed/Makefile | 1 +
arch/arm/vdso/Makefile | 3 +-
arch/arm64/kernel/vdso/Makefile | 3 +-
arch/arm64/kvm/hyp/nvhe/Makefile | 1 +
arch/mips/boot/compressed/Makefile | 1 +
arch/mips/vdso/Makefile | 1 +
arch/nds32/kernel/vdso/Makefile | 4 +-
arch/parisc/boot/compressed/Makefile | 1 +
arch/powerpc/kernel/Makefile | 6 +-
arch/powerpc/kernel/trace/Makefile | 3 +-
arch/powerpc/kernel/vdso32/Makefile | 1 +
arch/powerpc/kernel/vdso64/Makefile | 1 +
arch/powerpc/kexec/Makefile | 3 +-
arch/powerpc/xmon/Makefile | 1 +
arch/riscv/kernel/vdso/Makefile | 3 +-
arch/s390/boot/Makefile | 1 +
arch/s390/boot/compressed/Makefile | 1 +
arch/s390/kernel/Makefile | 1 +
arch/s390/kernel/vdso64/Makefile | 3 +-
arch/s390/purgatory/Makefile | 1 +
arch/sh/boot/compressed/Makefile | 1 +
arch/sh/mm/Makefile | 1 +
arch/sparc/vdso/Makefile | 1 +
arch/x86/Kconfig | 1 +
arch/x86/boot/Makefile | 1 +
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/entry/vdso/Makefile | 1 +
arch/x86/kernel/vmlinux.lds.S | 2 +
arch/x86/platform/efi/Makefile | 1 +
arch/x86/purgatory/Makefile | 1 +
arch/x86/realmode/rm/Makefile | 1 +
arch/x86/um/vdso/Makefile | 1 +
drivers/firmware/efi/libstub/Makefile | 1 +
drivers/s390/char/Makefile | 1 +
include/asm-generic/vmlinux.lds.h | 44 +++
kernel/Makefile | 1 +
kernel/pgo/Kconfig | 34 +++
kernel/pgo/Makefile | 5 +
kernel/pgo/fs.c | 382 ++++++++++++++++++++++++++
kernel/pgo/instrument.c | 147 ++++++++++
kernel/pgo/pgo.h | 206 ++++++++++++++
scripts/Makefile.lib | 10 +
48 files changed, 1017 insertions(+), 9 deletions(-)
create mode 100644 Documentation/dev-tools/pgo.rst
create mode 100644 kernel/pgo/Kconfig
create mode 100644 kernel/pgo/Makefile
create mode 100644 kernel/pgo/fs.c
create mode 100644 kernel/pgo/instrument.c
create mode 100644 kernel/pgo/pgo.h

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index f7809c7b1ba9e..8d6418e858062 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -26,6 +26,7 @@ whole; patches welcome!
kgdb
kselftest
kunit/index
+ pgo


.. only:: subproject and html
diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
new file mode 100644
index 0000000000000..2ed7f549b20ef
--- /dev/null
+++ b/Documentation/dev-tools/pgo.rst
@@ -0,0 +1,127 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+Using PGO with the Linux kernel
+===============================
+
+Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
+when building with Clang. The profiling data is exported via the ``pgo``
+debugfs directory.
+
+.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
+
+
+Preparation
+===========
+
+Configure the kernel with:
+
+.. code-block:: make
+
+ CONFIG_DEBUG_FS=y
+ CONFIG_PGO_CLANG=y
+
+Note that kernels compiled with profiling flags will be significantly larger
+and run slower.
+
+Profiling data will only become accessible once debugfs has been mounted:
+
+.. code-block:: sh
+
+ mount -t debugfs none /sys/kernel/debug
+
+
+Customization
+=============
+
+You can enable or disable profiling for individual files and directories by
+adding a line similar to the following to the respective kernel Makefile:
+
+- For a single file (e.g. main.o)
+
+ .. code-block:: make
+
+ PGO_PROFILE_main.o := y
+
+- For all files in one directory
+
+ .. code-block:: make
+
+ PGO_PROFILE := y
+
+To exclude files from being profiled use
+
+ .. code-block:: make
+
+ PGO_PROFILE_main.o := n
+
+and
+
+ .. code-block:: make
+
+ PGO_PROFILE := n
+
+Only files which are linked to the main kernel image or are compiled as kernel
+modules are supported by this mechanism.
+
+
+Files
+=====
+
+The PGO kernel support creates the following files in debugfs:
+
+``/sys/kernel/debug/pgo``
+ Parent directory for all PGO-related files.
+
+``/sys/kernel/debug/pgo/reset``
+ Global reset file: resets all profiling data to zero when written to.
+
+``/sys/kernel/debug/pgo/profraw``
+ The raw PGO data that must be processed with ``llvm-profdata``.
+
+
+Workflow
+========
+
+The PGO kernel can be run on the host or test machines. The data, though,
+should be analyzed with the tools from the same Clang version that compiled
+the kernel. Clang is tolerant of version skew, but using the same version is
+easier.
+
+The profiling data is useful for optimizing the kernel, analyzing coverage,
+etc. Clang offers tools to perform these tasks.
+
+Here is an example workflow for profiling an instrumented kernel with PGO and
+using the result to optimize the kernel:
+
+1) Install the kernel on the TEST machine.
+
+2) Reset the data counters right before running the load tests
+
+ .. code-block:: sh
+
+ echo 1 > /sys/kernel/debug/pgo/reset
+
+3) Run the load tests.
+
+4) Collect the raw profile data
+
+ .. code-block:: sh
+
+ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
+
+5) (Optional) Download the raw profile data to the HOST machine.
+
+6) Process the raw profile data
+
+ .. code-block:: sh
+
+ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
+
+ Note that multiple raw profile data files can be merged during this step.
+
+7) Rebuild the kernel using the profile data (PGO disabled)
+
+ .. code-block:: sh
+
+ make ... KCFLAGS=-fprofile-use=vmlinux.profdata
diff --git a/MAINTAINERS b/MAINTAINERS
index 6390491b07e51..7a98bdaab9861 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13955,6 +13955,15 @@ S: Maintained
F: include/linux/personality.h
F: include/uapi/linux/personality.h

+PGO BASED KERNEL PROFILING
+M: Sami Tolvanen <[email protected]>
+M: Bill Wendling <[email protected]>
+R: Nathan Chancellor <[email protected]>
+R: Nick Desaulniers <[email protected]>
+S: Supported
+F: Documentation/dev-tools/pgo.rst
+F: kernel/pgo
+
PHOENIX RC FLIGHT CONTROLLER ADAPTER
M: Marcus Folkesson <[email protected]>
L: [email protected]
diff --git a/Makefile b/Makefile
index 8b2c3f88ee5ea..4f42957c78134 100644
--- a/Makefile
+++ b/Makefile
@@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
# Defaults to vmlinux, but the arch makefile usually adds further targets
all: vmlinux

+CFLAGS_PGO_CLANG := -fprofile-generate
+export CFLAGS_PGO_CLANG
+
CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
$(call cc-option,-fno-tree-loop-im) \
$(call cc-disable-warning,maybe-uninitialized,)
diff --git a/arch/Kconfig b/arch/Kconfig
index 78c6f05b10f91..a7a6ab7d204dc 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1106,6 +1106,7 @@ config ARCH_SUPPORTS_DEBUG_PAGEALLOC
bool

source "kernel/gcov/Kconfig"
+source "kernel/pgo/Kconfig"

source "scripts/gcc-plugins/Kconfig"

diff --git a/arch/arm/boot/bootp/Makefile b/arch/arm/boot/bootp/Makefile
index 981a8d03f064c..523bd58df0a4b 100644
--- a/arch/arm/boot/bootp/Makefile
+++ b/arch/arm/boot/bootp/Makefile
@@ -7,6 +7,7 @@
#

GCOV_PROFILE := n
+PGO_PROFILE := n

LDFLAGS_bootp := --no-undefined -X \
--defsym initrd_phys=$(INITRD_PHYS) \
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index fb521efcc6c20..5fd0fd85fc0e5 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -24,6 +24,7 @@ OBJS += hyp-stub.o
endif

GCOV_PROFILE := n
+PGO_PROFILE := n
KASAN_SANITIZE := n

# Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
diff --git a/arch/arm/vdso/Makefile b/arch/arm/vdso/Makefile
index b558bee0e1f6b..11f6ce4b48b56 100644
--- a/arch/arm/vdso/Makefile
+++ b/arch/arm/vdso/Makefile
@@ -36,8 +36,9 @@ else
CFLAGS_vgettimeofday.o = -O2 -include $(c-gettimeofday-y)
endif

-# Disable gcov profiling for VDSO code
+# Disable gcov and PGO profiling for VDSO code
GCOV_PROFILE := n
+PGO_PROFILE := n

# Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
KCOV_INSTRUMENT := n
diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
index cd9c3fa25902f..d48fc0df07020 100644
--- a/arch/arm64/kernel/vdso/Makefile
+++ b/arch/arm64/kernel/vdso/Makefile
@@ -41,8 +41,9 @@ ifneq ($(c-gettimeofday-y),)
CFLAGS_vgettimeofday.o += -include $(c-gettimeofday-y)
endif

-# Disable gcov profiling for VDSO code
+# Disable gcov and PGO profiling for VDSO code
GCOV_PROFILE := n
+PGO_PROFILE := n

obj-y += vdso.o
targets += vdso.lds
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 1f1e351c5fe2b..ad128ecdbfbdf 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -60,6 +60,7 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS), $(KBUILD_CFLAG
# compiler instrumentation that inserts callbacks or checks into the code may
# cause crashes. Just disable it.
GCOV_PROFILE := n
+PGO_PROFILE := n
KASAN_SANITIZE := n
UBSAN_SANITIZE := n
KCOV_INSTRUMENT := n
diff --git a/arch/mips/boot/compressed/Makefile b/arch/mips/boot/compressed/Makefile
index 47cd9dc7454af..0855ea12f2c7f 100644
--- a/arch/mips/boot/compressed/Makefile
+++ b/arch/mips/boot/compressed/Makefile
@@ -37,6 +37,7 @@ KBUILD_AFLAGS := $(KBUILD_AFLAGS) -D__ASSEMBLY__ \
# Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
KCOV_INSTRUMENT := n
GCOV_PROFILE := n
+PGO_PROFILE := n

# decompressor objects (linked with vmlinuz)
vmlinuzobjs-y := $(obj)/head.o $(obj)/decompress.o $(obj)/string.o
diff --git a/arch/mips/vdso/Makefile b/arch/mips/vdso/Makefile
index 5810cc12bc1d9..d7eb64de35eae 100644
--- a/arch/mips/vdso/Makefile
+++ b/arch/mips/vdso/Makefile
@@ -66,6 +66,7 @@ ldflags-y := -Bsymbolic --no-undefined -soname=linux-vdso.so.1 \
CFLAGS_REMOVE_vdso.o = -pg

GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n
KCOV_INSTRUMENT := n

diff --git a/arch/nds32/kernel/vdso/Makefile b/arch/nds32/kernel/vdso/Makefile
index 55df25ef00578..f2b53ee2124b7 100644
--- a/arch/nds32/kernel/vdso/Makefile
+++ b/arch/nds32/kernel/vdso/Makefile
@@ -15,9 +15,9 @@ obj-vdso := $(addprefix $(obj)/, $(obj-vdso))
ccflags-y := -shared -fno-common -fno-builtin -nostdlib -fPIC -Wl,-shared -g \
-Wl,-soname=linux-vdso.so.1 -Wl,--hash-style=sysv

-# Disable gcov profiling for VDSO code
+# Disable gcov and PGO profiling for VDSO code
GCOV_PROFILE := n
-
+PGO_PROFILE := n

obj-y += vdso.o
targets += vdso.lds
diff --git a/arch/parisc/boot/compressed/Makefile b/arch/parisc/boot/compressed/Makefile
index dff4536875305..5cf93a67f7da7 100644
--- a/arch/parisc/boot/compressed/Makefile
+++ b/arch/parisc/boot/compressed/Makefile
@@ -7,6 +7,7 @@

KCOV_INSTRUMENT := n
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n

targets := vmlinux.lds vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index fe2ef598e2ead..c642c046660d7 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -153,17 +153,21 @@ endif
obj-$(CONFIG_PPC_SECURE_BOOT) += secure_boot.o ima_arch.o secvar-ops.o
obj-$(CONFIG_PPC_SECVAR_SYSFS) += secvar-sysfs.o

-# Disable GCOV, KCOV & sanitizers in odd or sensitive code
+# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
GCOV_PROFILE_prom_init.o := n
+PGO_PROFILE_prom_init.o := n
KCOV_INSTRUMENT_prom_init.o := n
UBSAN_SANITIZE_prom_init.o := n
GCOV_PROFILE_kprobes.o := n
+PGO_PROFILE_kprobes.o := n
KCOV_INSTRUMENT_kprobes.o := n
UBSAN_SANITIZE_kprobes.o := n
GCOV_PROFILE_kprobes-ftrace.o := n
+PGO_PROFILE_kprobes-ftrace.o := n
KCOV_INSTRUMENT_kprobes-ftrace.o := n
UBSAN_SANITIZE_kprobes-ftrace.o := n
GCOV_PROFILE_syscall_64.o := n
+PGO_PROFILE_syscall_64.o := n
KCOV_INSTRUMENT_syscall_64.o := n
UBSAN_SANITIZE_syscall_64.o := n
UBSAN_SANITIZE_vdso.o := n
diff --git a/arch/powerpc/kernel/trace/Makefile b/arch/powerpc/kernel/trace/Makefile
index 858503775c583..7d72ae7d4f8c6 100644
--- a/arch/powerpc/kernel/trace/Makefile
+++ b/arch/powerpc/kernel/trace/Makefile
@@ -23,7 +23,8 @@ obj-$(CONFIG_TRACING) += trace_clock.o
obj-$(CONFIG_PPC64) += $(obj64-y)
obj-$(CONFIG_PPC32) += $(obj32-y)

-# Disable GCOV, KCOV & sanitizers in odd or sensitive code
+# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
GCOV_PROFILE_ftrace.o := n
+PGO_PROFILE_ftrace.o := n
KCOV_INSTRUMENT_ftrace.o := n
UBSAN_SANITIZE_ftrace.o := n
diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
index 9cb6f524854b9..655e159975a04 100644
--- a/arch/powerpc/kernel/vdso32/Makefile
+++ b/arch/powerpc/kernel/vdso32/Makefile
@@ -34,6 +34,7 @@ targets := $(obj-vdso32) vdso32.so.dbg
obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32))

GCOV_PROFILE := n
+PGO_PROFILE := n
KCOV_INSTRUMENT := n
UBSAN_SANITIZE := n
KASAN_SANITIZE := n
diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile
index bf363ff371521..12c286f5afc16 100644
--- a/arch/powerpc/kernel/vdso64/Makefile
+++ b/arch/powerpc/kernel/vdso64/Makefile
@@ -21,6 +21,7 @@ targets := $(obj-vdso64) vdso64.so.dbg
obj-vdso64 := $(addprefix $(obj)/, $(obj-vdso64))

GCOV_PROFILE := n
+PGO_PROFILE := n
KCOV_INSTRUMENT := n
UBSAN_SANITIZE := n
KASAN_SANITIZE := n
diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile
index 4aff6846c7726..1c7f65e3cb969 100644
--- a/arch/powerpc/kexec/Makefile
+++ b/arch/powerpc/kexec/Makefile
@@ -16,7 +16,8 @@ endif
endif


-# Disable GCOV, KCOV & sanitizers in odd or sensitive code
+# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
GCOV_PROFILE_core_$(BITS).o := n
+PGO_PROFILE_core_$(BITS).o := n
KCOV_INSTRUMENT_core_$(BITS).o := n
UBSAN_SANITIZE_core_$(BITS).o := n
diff --git a/arch/powerpc/xmon/Makefile b/arch/powerpc/xmon/Makefile
index eb25d7554ffd1..7aff80d18b44b 100644
--- a/arch/powerpc/xmon/Makefile
+++ b/arch/powerpc/xmon/Makefile
@@ -2,6 +2,7 @@
# Makefile for xmon

GCOV_PROFILE := n
+PGO_PROFILE := n
KCOV_INSTRUMENT := n
UBSAN_SANITIZE := n
KASAN_SANITIZE := n
diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
index 0cfd6da784f84..882340dc3c647 100644
--- a/arch/riscv/kernel/vdso/Makefile
+++ b/arch/riscv/kernel/vdso/Makefile
@@ -32,8 +32,9 @@ CPPFLAGS_vdso.lds += -P -C -U$(ARCH)
# Disable -pg to prevent insert call site
CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os

-# Disable gcov profiling for VDSO code
+# Disable gcov and PGO profiling for VDSO code
GCOV_PROFILE := n
+PGO_PROFILE := n
KCOV_INSTRUMENT := n

# Force dependency
diff --git a/arch/s390/boot/Makefile b/arch/s390/boot/Makefile
index 41a64b8dce252..bee4a32040e79 100644
--- a/arch/s390/boot/Makefile
+++ b/arch/s390/boot/Makefile
@@ -5,6 +5,7 @@

KCOV_INSTRUMENT := n
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n
KASAN_SANITIZE := n

diff --git a/arch/s390/boot/compressed/Makefile b/arch/s390/boot/compressed/Makefile
index de18dab518bb6..c3ab883e8425a 100644
--- a/arch/s390/boot/compressed/Makefile
+++ b/arch/s390/boot/compressed/Makefile
@@ -7,6 +7,7 @@

KCOV_INSTRUMENT := n
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n
KASAN_SANITIZE := n

diff --git a/arch/s390/kernel/Makefile b/arch/s390/kernel/Makefile
index dd73b7f074237..bd857aacad794 100644
--- a/arch/s390/kernel/Makefile
+++ b/arch/s390/kernel/Makefile
@@ -14,6 +14,7 @@ CFLAGS_REMOVE_early.o = $(CC_FLAGS_FTRACE)
endif

GCOV_PROFILE_early.o := n
+PGO_PROFILE_early.o := n
KCOV_INSTRUMENT_early.o := n
UBSAN_SANITIZE_early.o := n
KASAN_SANITIZE_ipl.o := n
diff --git a/arch/s390/kernel/vdso64/Makefile b/arch/s390/kernel/vdso64/Makefile
index a6e0fb6b91d6c..d7c43b7c1db96 100644
--- a/arch/s390/kernel/vdso64/Makefile
+++ b/arch/s390/kernel/vdso64/Makefile
@@ -35,8 +35,9 @@ obj-y += vdso64_wrapper.o
targets += vdso64.lds
CPPFLAGS_vdso64.lds += -P -C -U$(ARCH)

-# Disable gcov profiling, ubsan and kasan for VDSO code
+# Disable gcov and PGO profiling, ubsan and kasan for VDSO code
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n
KASAN_SANITIZE := n

diff --git a/arch/s390/purgatory/Makefile b/arch/s390/purgatory/Makefile
index c57f8c40e9926..9aef584e98466 100644
--- a/arch/s390/purgatory/Makefile
+++ b/arch/s390/purgatory/Makefile
@@ -17,6 +17,7 @@ $(obj)/mem.o: $(srctree)/arch/s390/lib/mem.S FORCE

KCOV_INSTRUMENT := n
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n
KASAN_SANITIZE := n

diff --git a/arch/sh/boot/compressed/Makefile b/arch/sh/boot/compressed/Makefile
index 589d2d8a573db..ae19aeeb3964c 100644
--- a/arch/sh/boot/compressed/Makefile
+++ b/arch/sh/boot/compressed/Makefile
@@ -13,6 +13,7 @@ targets := vmlinux vmlinux.bin vmlinux.bin.gz \
OBJECTS = $(obj)/head_32.o $(obj)/misc.o $(obj)/cache.o

GCOV_PROFILE := n
+PGO_PROFILE := n

#
# IMAGE_OFFSET is the load offset of the compression loader
diff --git a/arch/sh/mm/Makefile b/arch/sh/mm/Makefile
index f69ddc70b1465..ea2782c631f43 100644
--- a/arch/sh/mm/Makefile
+++ b/arch/sh/mm/Makefile
@@ -43,3 +43,4 @@ obj-$(CONFIG_UNCACHED_MAPPING) += uncached.o
obj-$(CONFIG_HAVE_SRAM_POOL) += sram.o

GCOV_PROFILE_pmb.o := n
+PGO_PROFILE_pmb.o := n
diff --git a/arch/sparc/vdso/Makefile b/arch/sparc/vdso/Makefile
index c5e1545bc5cf9..ab5f3783fe199 100644
--- a/arch/sparc/vdso/Makefile
+++ b/arch/sparc/vdso/Makefile
@@ -115,6 +115,7 @@ quiet_cmd_vdso = VDSO $@

VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 -Bsymbolic
GCOV_PROFILE := n
+PGO_PROFILE := n

#
# Install the unstripped copies of vdso*.so. If our toolchain supports
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7b6dd10b162ac..a751b4f8f6645 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -95,6 +95,7 @@ config X86
select ARCH_SUPPORTS_DEBUG_PAGEALLOC
select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
+ select ARCH_SUPPORTS_PGO_CLANG if X86_64
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index fe605205b4ce2..383853e32f673 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n

$(obj)/bzImage: asflags-y := $(SVGA_MODE)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index e0bc3988c3faa..ed12ab65f6065 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/

KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE :=n

KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 02e3e42f380bd..26e2b3af0145c 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO $@
VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
$(call ld-option, --eh-frame-hdr) -Bsymbolic
GCOV_PROFILE := n
+PGO_PROFILE := n

quiet_cmd_vdso_and_check = VDSO $@
cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index efd9e9ea17f25..f6cab2316c46a 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -184,6 +184,8 @@ SECTIONS

BUG_TABLE

+ PGO_CLANG_DATA
+
ORC_UNWIND_TABLE

. = ALIGN(PAGE_SIZE);
diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
index 84b09c230cbd5..5f22b31446ad4 100644
--- a/arch/x86/platform/efi/Makefile
+++ b/arch/x86/platform/efi/Makefile
@@ -2,6 +2,7 @@
OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
KASAN_SANITIZE := n
GCOV_PROFILE := n
+PGO_PROFILE := n

obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 95ea17a9d20cb..36f20e99da0bc 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk

# Sanitizer, etc. runtimes are unavailable and cannot be linked here.
GCOV_PROFILE := n
+PGO_PROFILE := n
KASAN_SANITIZE := n
UBSAN_SANITIZE := n
KCSAN_SANITIZE := n
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index 83f1b6a56449f..21797192f958f 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n
diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
index 5943387e3f357..54f5768f58530 100644
--- a/arch/x86/um/vdso/Makefile
+++ b/arch/x86/um/vdso/Makefile
@@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@

VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
GCOV_PROFILE := n
+PGO_PROFILE := n

#
# Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 8a94388e38b33..2d81623b33f29 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -40,6 +40,7 @@ KBUILD_CFLAGS := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))

GCOV_PROFILE := n
+PGO_PROFILE := n
# Sanitizer runtimes are unavailable and cannot be linked here.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
diff --git a/drivers/s390/char/Makefile b/drivers/s390/char/Makefile
index c6fdb81a068a6..bf6c5db5da1fc 100644
--- a/drivers/s390/char/Makefile
+++ b/drivers/s390/char/Makefile
@@ -9,6 +9,7 @@ CFLAGS_REMOVE_sclp_early_core.o = $(CC_FLAGS_FTRACE)
endif

GCOV_PROFILE_sclp_early_core.o := n
+PGO_PROFILE_sclp_early_core.o := n
KCOV_INSTRUMENT_sclp_early_core.o := n
UBSAN_SANITIZE_sclp_early_core.o := n
KASAN_SANITIZE_sclp_early_core.o := n
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index b2b3d81b1535a..3a591bb18c5fb 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -316,6 +316,49 @@
#define THERMAL_TABLE(name)
#endif

+#ifdef CONFIG_PGO_CLANG
+#define PGO_CLANG_DATA \
+ __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
+ . = ALIGN(8); \
+ __llvm_prf_start = .; \
+ __llvm_prf_data_start = .; \
+ KEEP(*(__llvm_prf_data)) \
+ . = ALIGN(8); \
+ __llvm_prf_data_end = .; \
+ } \
+ __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
+ . = ALIGN(8); \
+ __llvm_prf_cnts_start = .; \
+ KEEP(*(__llvm_prf_cnts)) \
+ . = ALIGN(8); \
+ __llvm_prf_cnts_end = .; \
+ } \
+ __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
+ . = ALIGN(8); \
+ __llvm_prf_names_start = .; \
+ KEEP(*(__llvm_prf_names)) \
+ . = ALIGN(8); \
+ __llvm_prf_names_end = .; \
+ . = ALIGN(8); \
+ } \
+ __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
+ __llvm_prf_vals_start = .; \
+ KEEP(*(__llvm_prf_vals)) \
+ . = ALIGN(8); \
+ __llvm_prf_vals_end = .; \
+ . = ALIGN(8); \
+ } \
+ __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
+ __llvm_prf_vnds_start = .; \
+ KEEP(*(__llvm_prf_vnds)) \
+ . = ALIGN(8); \
+ __llvm_prf_vnds_end = .; \
+ __llvm_prf_end = .; \
+ }
+#else
+#define PGO_CLANG_DATA
+#endif
+
#define KERNEL_DTB() \
STRUCT_ALIGN(); \
__dtb_start = .; \
@@ -1125,6 +1168,7 @@
CONSTRUCTORS \
} \
BUG_TABLE \
+ PGO_CLANG_DATA

#define INIT_TEXT_SECTION(inittext_align) \
. = ALIGN(inittext_align); \
diff --git a/kernel/Makefile b/kernel/Makefile
index aa7368c7eabf3..0b34ca228ba46 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
obj-$(CONFIG_KCSAN) += kcsan/
obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
+obj-$(CONFIG_PGO_CLANG) += pgo/

obj-$(CONFIG_PERF_EVENTS) += events/

diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
new file mode 100644
index 0000000000000..318d36bb3d106
--- /dev/null
+++ b/kernel/pgo/Kconfig
@@ -0,0 +1,34 @@
+# SPDX-License-Identifier: GPL-2.0-only
+menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
+
+config ARCH_SUPPORTS_PGO_CLANG
+ bool
+
+config PGO_CLANG
+ bool "Enable clang's PGO-based kernel profiling"
+ depends on DEBUG_FS
+ depends on ARCH_SUPPORTS_PGO_CLANG
+ help
+ This option enables clang's PGO (Profile Guided Optimization) based
+ code profiling to better optimize the kernel.
+
+ If unsure, say N.
+
+ Run a representative workload for your application on a kernel
+ compiled with this option and download the raw profile file from
+ /sys/kernel/debug/pgo/profraw. This file needs to be processed with
+ llvm-profdata. It may be merged with other collected raw profiles.
+
+ Copy the resulting profile file into vmlinux.profdata, and enable
+ KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
+ kernel.
+
+ Note that a kernel compiled with profiling flags will be
+ significantly larger and run slower. Also be sure to exclude files
+ from profiling which are not linked to the kernel image to prevent
+ linker errors.
+
+ Note that the debugfs filesystem has to be mounted to access
+ profiling data.
+
+endmenu
diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
new file mode 100644
index 0000000000000..41e27cefd9a47
--- /dev/null
+++ b/kernel/pgo/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+GCOV_PROFILE := n
+PGO_PROFILE := n
+
+obj-y += fs.o instrument.o
diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
new file mode 100644
index 0000000000000..790a8df037bfc
--- /dev/null
+++ b/kernel/pgo/fs.c
@@ -0,0 +1,382 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ * Sami Tolvanen <[email protected]>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt) "pgo: " fmt
+
+#include <linux/kernel.h>
+#include <linux/debugfs.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include "pgo.h"
+
+static struct dentry *directory;
+
+struct prf_private_data {
+ void *buffer;
+ unsigned long size;
+};
+
+/*
+ * Raw profile data format:
+ *
+ * - llvm_prf_header
+ * - __llvm_prf_data
+ * - __llvm_prf_cnts
+ * - __llvm_prf_names
+ * - zero padding to 8 bytes
+ * - for each llvm_prf_data in __llvm_prf_data:
+ * - llvm_prf_value_data
+ * - llvm_prf_value_record + site count array
+ * - llvm_prf_value_node_data
+ * ...
+ * ...
+ * ...
+ */
+
+static void prf_fill_header(void **buffer)
+{
+ struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
+
+ header->magic = LLVM_PRF_MAGIC;
+ header->version = LLVM_PRF_VARIANT_MASK_IR | LLVM_PRF_VERSION;
+ header->data_size = prf_data_count();
+ header->padding_bytes_before_counters = 0;
+ header->counters_size = prf_cnts_count();
+ header->padding_bytes_after_counters = 0;
+ header->names_size = prf_names_count();
+ header->counters_delta = (u64)__llvm_prf_cnts_start;
+ header->names_delta = (u64)__llvm_prf_names_start;
+ header->value_kind_last = LLVM_PRF_IPVK_LAST;
+
+ *buffer += sizeof(*header);
+}
+
+/*
+ * Copy the source into the buffer, incrementing the pointer into buffer in the
+ * process.
+ */
+static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
+{
+ memcpy(*buffer, src, size);
+ *buffer += size;
+}
+
+static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
+{
+ struct llvm_prf_value_node **nodes =
+ (struct llvm_prf_value_node **)p->values;
+ u32 kinds = 0;
+ u32 size = 0;
+ unsigned int kind;
+ unsigned int n;
+ unsigned int s = 0;
+
+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+ unsigned int sites = p->num_value_sites[kind];
+
+ if (!sites)
+ continue;
+
+ /* Record + site count array */
+ size += prf_get_value_record_size(sites);
+ kinds++;
+
+ if (!nodes)
+ continue;
+
+ for (n = 0; n < sites; n++) {
+ u32 count = 0;
+ struct llvm_prf_value_node *site = nodes[s + n];
+
+ while (site && ++count <= U8_MAX)
+ site = site->next;
+
+ size += count *
+ sizeof(struct llvm_prf_value_node_data);
+ }
+
+ s += sites;
+ }
+
+ if (size)
+ size += sizeof(struct llvm_prf_value_data);
+
+ if (value_kinds)
+ *value_kinds = kinds;
+
+ return size;
+}
+
+static u32 prf_get_value_size(void)
+{
+ u32 size = 0;
+ struct llvm_prf_data *p;
+
+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+ size += __prf_get_value_size(p, NULL);
+
+ return size;
+}
+
+/* Serialize the value profiling data of one llvm_prf_data record. */
+static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
+{
+ struct llvm_prf_value_data header;
+ struct llvm_prf_value_node **nodes =
+ (struct llvm_prf_value_node **)p->values;
+ unsigned int kind;
+ unsigned int n;
+ unsigned int s = 0;
+
+ header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
+
+ if (!header.num_value_kinds)
+ /* Nothing to write. */
+ return;
+
+ prf_copy_to_buffer(buffer, &header, sizeof(header));
+
+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+ struct llvm_prf_value_record *record;
+ u8 *counts;
+ unsigned int sites = p->num_value_sites[kind];
+
+ if (!sites)
+ continue;
+
+ /* Profiling value record. */
+ record = *(struct llvm_prf_value_record **)buffer;
+ *buffer += prf_get_value_record_header_size();
+
+ record->kind = kind;
+ record->num_value_sites = sites;
+
+ /* Site count array. */
+ counts = *(u8 **)buffer;
+ *buffer += prf_get_value_record_site_count_size(sites);
+
+ /*
+ * If we don't have nodes, we can skip updating the site count
+ * array, because the buffer is zero filled.
+ */
+ if (!nodes)
+ continue;
+
+ for (n = 0; n < sites; n++) {
+ u32 count = 0;
+ struct llvm_prf_value_node *site = nodes[s + n];
+
+ while (site && ++count <= U8_MAX) {
+ prf_copy_to_buffer(buffer, site,
+ sizeof(struct llvm_prf_value_node_data));
+ site = site->next;
+ }
+
+ counts[n] = (u8)count;
+ }
+
+ s += sites;
+ }
+}
+
+static void prf_serialize_values(void **buffer)
+{
+ struct llvm_prf_data *p;
+
+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+ prf_serialize_value(p, buffer);
+}
+
+static inline unsigned long prf_get_padding(unsigned long size)
+{
+ return (8 - (size % 8)) % 8;
+}
+
+static unsigned long prf_buffer_size(void)
+{
+ return sizeof(struct llvm_prf_header) +
+ prf_data_size() +
+ prf_cnts_size() +
+ prf_names_size() +
+ prf_get_padding(prf_names_size()) +
+ prf_get_value_size();
+}
+
+/* Serialize the profiling data into a format LLVM's tools can understand. */
+static int prf_serialize(struct prf_private_data *p)
+{
+ int err = 0;
+ void *buffer;
+
+ p->size = prf_buffer_size();
+ p->buffer = vzalloc(p->size);
+
+ if (!p->buffer) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ buffer = p->buffer;
+
+ prf_fill_header(&buffer);
+ prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
+ prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
+ prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
+ buffer += prf_get_padding(prf_names_size());
+
+ prf_serialize_values(&buffer);
+
+out:
+ return err;
+}
+
+/* open() implementation for PGO. Creates a copy of the profiling data set. */
+static int prf_open(struct inode *inode, struct file *file)
+{
+ struct prf_private_data *data;
+ unsigned long flags;
+ int err;
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL);
+ if (!data) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ flags = prf_lock();
+
+ err = prf_serialize(data);
+ if (err) {
+ kfree(data);
+ goto out_unlock;
+ }
+
+ file->private_data = data;
+
+out_unlock:
+ prf_unlock(flags);
+out:
+ return err;
+}
+
+/* read() implementation for PGO. */
+static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct prf_private_data *data = file->private_data;
+
+ BUG_ON(!data);
+
+ return simple_read_from_buffer(buf, count, ppos, data->buffer,
+ data->size);
+}
+
+/* release() implementation for PGO. Release resources allocated by open(). */
+static int prf_release(struct inode *inode, struct file *file)
+{
+ struct prf_private_data *data = file->private_data;
+
+ if (data) {
+ vfree(data->buffer);
+ kfree(data);
+ }
+
+ return 0;
+}
+
+static const struct file_operations prf_fops = {
+ .owner = THIS_MODULE,
+ .open = prf_open,
+ .read = prf_read,
+ .llseek = default_llseek,
+ .release = prf_release
+};
+
+/* write() implementation for resetting PGO's profile data. */
+static ssize_t reset_write(struct file *file, const char __user *addr,
+ size_t len, loff_t *pos)
+{
+ struct llvm_prf_data *data;
+
+ memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
+
+ for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; ++data) {
+ struct llvm_prf_value_node **vnodes;
+ u64 current_vsite_count;
+ u32 i;
+
+ if (!data->values)
+ continue;
+
+ current_vsite_count = 0;
+ vnodes = (struct llvm_prf_value_node **)data->values;
+
+ for (i = LLVM_PRF_IPVK_FIRST; i <= LLVM_PRF_IPVK_LAST; ++i)
+ current_vsite_count += data->num_value_sites[i];
+
+ for (i = 0; i < current_vsite_count; ++i) {
+ struct llvm_prf_value_node *current_vnode = vnodes[i];
+
+ while (current_vnode) {
+ current_vnode->count = 0;
+ current_vnode = current_vnode->next;
+ }
+ }
+ }
+
+ return len;
+}
+
+static const struct file_operations prf_reset_fops = {
+ .owner = THIS_MODULE,
+ .write = reset_write,
+ .llseek = noop_llseek,
+};
+
+/* Create debugfs entries. */
+static int __init pgo_init(void)
+{
+ directory = debugfs_create_dir("pgo", NULL);
+ if (!directory)
+ goto err_remove;
+
+ if (!debugfs_create_file("profraw", 0600, directory, NULL,
+ &prf_fops))
+ goto err_remove;
+
+ if (!debugfs_create_file("reset", 0200, directory, NULL,
+ &prf_reset_fops))
+ goto err_remove;
+
+ return 0;
+
+err_remove:
+ pr_err("initialization failed\n");
+ return -EIO;
+}
+
+/* Remove debugfs entries. */
+static void __exit pgo_exit(void)
+{
+ debugfs_remove_recursive(directory);
+}
+
+module_init(pgo_init);
+module_exit(pgo_exit);
diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
new file mode 100644
index 0000000000000..d96b61a1cf712
--- /dev/null
+++ b/kernel/pgo/instrument.c
@@ -0,0 +1,147 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ * Sami Tolvanen <[email protected]>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt) "pgo: " fmt
+
+#include <linux/kernel.h>
+#include <linux/export.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+#include "pgo.h"
+
+/* Lock guarding value node access and serialization. */
+static DEFINE_SPINLOCK(pgo_lock);
+static int current_node;
+
+unsigned long prf_lock(void)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&pgo_lock, flags);
+
+ return flags;
+}
+
+void prf_unlock(unsigned long flags)
+{
+ spin_unlock_irqrestore(&pgo_lock, flags);
+}
+
+/*
+ * Return a newly allocated profiling value node which contains the value
+ * tracked by the value profiler.
+ * Note: caller *must* hold pgo_lock.
+ */
+static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
+ u32 index, u64 value)
+{
+ if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
+ return NULL; /* Out of nodes */
+
+ current_node++;
+
+ /* Make sure the node is entirely within the section */
+ if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
+ &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
+ return NULL;
+
+ return &__llvm_prf_vnds_start[current_node];
+}
+
+/*
+ * Counts the number of times a target value is seen.
+ *
+ * Records the target value for the CounterIndex if not seen before. Otherwise,
+ * increments the counter associated w/ the target value.
+ */
+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
+{
+ struct llvm_prf_data *p = (struct llvm_prf_data *)data;
+ struct llvm_prf_value_node **counters;
+ struct llvm_prf_value_node *curr;
+ struct llvm_prf_value_node *min = NULL;
+ struct llvm_prf_value_node *prev = NULL;
+ u64 min_count = U64_MAX;
+ u8 values = 0;
+ unsigned long flags;
+
+ if (!p || !p->values)
+ return;
+
+ counters = (struct llvm_prf_value_node **)p->values;
+ curr = counters[index];
+
+ while (curr) {
+ if (target_value == curr->value) {
+ curr->count++;
+ return;
+ }
+
+ if (curr->count < min_count) {
+ min_count = curr->count;
+ min = curr;
+ }
+
+ prev = curr;
+ curr = curr->next;
+ values++;
+ }
+
+ if (values >= LLVM_PRF_MAX_NUM_VALS_PER_SITE) {
+ if (!min->count || !(--min->count)) {
+ curr = min;
+ curr->value = target_value;
+ curr->count++;
+ }
+ return;
+ }
+
+ /* Lock when updating the value node structure. */
+ flags = prf_lock();
+
+ curr = allocate_node(p, index, target_value);
+ if (!curr)
+ goto out;
+
+ curr->value = target_value;
+ curr->count++;
+
+ if (!counters[index])
+ counters[index] = curr;
+ else if (prev && !prev->next)
+ prev->next = curr;
+
+out:
+ prf_unlock(flags);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_target);
+
+/* Counts the number of times a range of target values is seen. */
+void __llvm_profile_instrument_range(u64 target_value, void *data,
+ u32 index, s64 precise_start,
+ s64 precise_last, s64 large_value)
+{
+ if (large_value != S64_MIN && (s64)target_value >= large_value)
+ target_value = large_value;
+ else if ((s64)target_value < precise_start ||
+ (s64)target_value > precise_last)
+ target_value = precise_last + 1;
+
+ __llvm_profile_instrument_target(target_value, data, index);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_range);
diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
new file mode 100644
index 0000000000000..df0aa278f28bd
--- /dev/null
+++ b/kernel/pgo/pgo.h
@@ -0,0 +1,206 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ * Sami Tolvanen <[email protected]>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _PGO_H
+#define _PGO_H
+
+/*
+ * Note: These internal LLVM definitions must match the compiler version.
+ * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
+ */
+
+#ifdef CONFIG_64BIT
+ #define LLVM_PRF_MAGIC \
+ ((u64)255 << 56 | \
+ (u64)'l' << 48 | \
+ (u64)'p' << 40 | \
+ (u64)'r' << 32 | \
+ (u64)'o' << 24 | \
+ (u64)'f' << 16 | \
+ (u64)'r' << 8 | \
+ (u64)129)
+#else
+ #define LLVM_PRF_MAGIC \
+ ((u64)255 << 56 | \
+ (u64)'l' << 48 | \
+ (u64)'p' << 40 | \
+ (u64)'r' << 32 | \
+ (u64)'o' << 24 | \
+ (u64)'f' << 16 | \
+ (u64)'R' << 8 | \
+ (u64)129)
+#endif
+
+#define LLVM_PRF_VERSION 5
+#define LLVM_PRF_DATA_ALIGN 8
+#define LLVM_PRF_IPVK_FIRST 0
+#define LLVM_PRF_IPVK_LAST 1
+#define LLVM_PRF_MAX_NUM_VALS_PER_SITE 16
+
+#define LLVM_PRF_VARIANT_MASK_IR (0x1ull << 56)
+#define LLVM_PRF_VARIANT_MASK_CSIR (0x1ull << 57)
+
+/**
+ * struct llvm_prf_header - represents the raw profile header data structure.
+ * @magic: the magic token for the file format.
+ * @version: the version of the file format.
+ * @data_size: the number of entries in the profile data section.
+ * @padding_bytes_before_counters: the number of padding bytes before the
+ * counters.
+ * @counters_size: the size in bytes of the LLVM profile section containing the
+ * counters.
+ * @padding_bytes_after_counters: the number of padding bytes after the
+ * counters.
+ * @names_size: the size in bytes of the LLVM profile section containing the
+ * counters' names.
+ * @counters_delta: the beginning of the LLVM profile counters section.
+ * @names_delta: the beginning of the LLVM profile names section.
+ * @value_kind_last: the last profile value kind.
+ */
+struct llvm_prf_header {
+ u64 magic;
+ u64 version;
+ u64 data_size;
+ u64 padding_bytes_before_counters;
+ u64 counters_size;
+ u64 padding_bytes_after_counters;
+ u64 names_size;
+ u64 counters_delta;
+ u64 names_delta;
+ u64 value_kind_last;
+};
+
+/**
+ * struct llvm_prf_data - represents the per-function control structure.
+ * @name_ref: the reference to the function's name.
+ * @func_hash: the hash value of the function.
+ * @counter_ptr: a pointer to the profile counter.
+ * @function_ptr: a pointer to the function.
+ * @values: the profiling values associated with this function.
+ * @num_counters: the number of counters in the function.
+ * @num_value_sites: the number of value profile sites.
+ */
+struct llvm_prf_data {
+ const u64 name_ref;
+ const u64 func_hash;
+ const void *counter_ptr;
+ const void *function_ptr;
+ void *values;
+ const u32 num_counters;
+ const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
+} __aligned(LLVM_PRF_DATA_ALIGN);
+
+/**
+ * struct llvm_prf_value_node_data - represents the data part of the struct
+ * llvm_prf_value_node data structure.
+ * @value: the value counters.
+ * @count: the counters' count.
+ */
+struct llvm_prf_value_node_data {
+ u64 value;
+ u64 count;
+};
+
+/**
+ * struct llvm_prf_value_node - represents an internal data structure used by
+ * the value profiler.
+ * @value: the value counters.
+ * @count: the counters' count.
+ * @next: the next value node.
+ */
+struct llvm_prf_value_node {
+ u64 value;
+ u64 count;
+ struct llvm_prf_value_node *next;
+};
+
+/**
+ * struct llvm_prf_value_data - represents the value profiling data in indexed
+ * format.
+ * @total_size: the total size in bytes including this field.
+ * @num_value_kinds: the number of value profile kinds that have value profile
+ * data.
+ */
+struct llvm_prf_value_data {
+ u32 total_size;
+ u32 num_value_kinds;
+};
+
+/**
+ * struct llvm_prf_value_record - represents the on-disk layout of the value
+ * profile data of a particular kind for one function.
+ * @kind: the kind of the value profile record.
+ * @num_value_sites: the number of value profile sites.
+ * @site_count_array: the first element of the array that stores the number
+ * of profiled values for each value site.
+ */
+struct llvm_prf_value_record {
+ u32 kind;
+ u32 num_value_sites;
+ u8 site_count_array[];
+};
+
+#define prf_get_value_record_header_size() \
+ offsetof(struct llvm_prf_value_record, site_count_array)
+#define prf_get_value_record_site_count_size(sites) \
+ roundup((sites), 8)
+#define prf_get_value_record_size(sites) \
+ (prf_get_value_record_header_size() + \
+ prf_get_value_record_site_count_size((sites)))
+
+/* Data sections */
+extern struct llvm_prf_data __llvm_prf_data_start[];
+extern struct llvm_prf_data __llvm_prf_data_end[];
+
+extern u64 __llvm_prf_cnts_start[];
+extern u64 __llvm_prf_cnts_end[];
+
+extern char __llvm_prf_names_start[];
+extern char __llvm_prf_names_end[];
+
+extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
+extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
+
+/* Locking for vnodes */
+extern unsigned long prf_lock(void);
+extern void prf_unlock(unsigned long flags);
+
+#define __DEFINE_PRF_SIZE(s) \
+ static inline unsigned long prf_ ## s ## _size(void) \
+ { \
+ unsigned long start = \
+ (unsigned long)__llvm_prf_ ## s ## _start; \
+ unsigned long end = \
+ (unsigned long)__llvm_prf_ ## s ## _end; \
+ return roundup(end - start, \
+ sizeof(__llvm_prf_ ## s ## _start[0])); \
+ } \
+ static inline unsigned long prf_ ## s ## _count(void) \
+ { \
+ return prf_ ## s ## _size() / \
+ sizeof(__llvm_prf_ ## s ## _start[0]); \
+ }
+
+__DEFINE_PRF_SIZE(data);
+__DEFINE_PRF_SIZE(cnts);
+__DEFINE_PRF_SIZE(names);
+__DEFINE_PRF_SIZE(vnds);
+
+#undef __DEFINE_PRF_SIZE
+
+#endif /* _PGO_H */
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 213677a5ed33e..9b218afb5cb87 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
$(CFLAGS_GCOV))
endif

+#
+# Enable clang's PGO profiling flags for a file or directory depending on
+# variables PGO_PROFILE_obj.o and PGO_PROFILE.
+#
+ifeq ($(CONFIG_PGO_CLANG),y)
+_c_flags += $(if $(patsubst n%,, \
+ $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
+ $(CFLAGS_PGO_CLANG))
+endif
+
#
# Enable address sanitizer flags for kernel except some files or directories
# we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
--
2.30.0.284.gd98b1dd5eaa7-goog
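
A note on the layout produced by prf_serialize() above: the names section is
followed by padding so that the value data starts on an 8-byte boundary. The
following is a minimal userspace sketch of that rounding, not code from the
patch. pad_to_8() and padded_size() are hypothetical names, and the sketch
assumes the convention that a size which is already a multiple of 8 needs no
padding at all.

```c
#include <stddef.h>

/*
 * Number of bytes needed to round "size" up to the next multiple of 8.
 * The trailing "% 8" makes an already aligned size yield 0 instead of 8.
 * (Hypothetical helper, for illustration only.)
 */
static unsigned long pad_to_8(unsigned long size)
{
	return (8 - (size % 8)) % 8;
}

/* Size of a section once its trailing padding has been added. */
static unsigned long padded_size(unsigned long size)
{
	return size + pad_to_8(size);
}
```

Under that assumption, a 13-byte names section would be followed by 3 bytes of
padding, and a 16-byte names section by none.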

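The per-site update policy in __llvm_profile_instrument_target() can also be
illustrated outside the kernel. The sketch below is a simplified,
single-threaded userspace model and not the patch's code: struct site,
record_value(), alloc_node() and the 64-entry pool are hypothetical, and the
locking is left out. It keeps at most 16 values per site. Once the list is
full, an unseen value decrements the least frequent entry and takes over its
slot only when that count has dropped to zero.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_VALS_PER_SITE 16	/* mirrors LLVM_PRF_MAX_NUM_VALS_PER_SITE */

struct value_node {
	uint64_t value;
	uint64_t count;
	struct value_node *next;
};

/* One value site: a singly linked list of observed values. */
struct site {
	struct value_node *head;
};

/* Fixed pool standing in for the preallocated value node section. */
static struct value_node pool[64];
static size_t pool_used;

static struct value_node *alloc_node(void)
{
	if (pool_used >= sizeof(pool) / sizeof(pool[0]))
		return NULL;	/* out of nodes */
	return &pool[pool_used++];
}

static void record_value(struct site *s, uint64_t target)
{
	struct value_node *curr, *prev = NULL, *min = NULL;
	uint64_t min_count = UINT64_MAX;
	unsigned int values = 0;

	for (curr = s->head; curr; prev = curr, curr = curr->next, values++) {
		if (curr->value == target) {
			curr->count++;	/* already tracked: just count it */
			return;
		}
		if (curr->count < min_count) {
			min_count = curr->count;
			min = curr;
		}
	}

	if (values >= MAX_VALS_PER_SITE) {
		/* List is full: age the least frequent entry, reuse it at zero. */
		if (min && (!min->count || !(--min->count))) {
			min->value = target;
			min->count++;
		}
		return;
	}

	curr = alloc_node();
	if (!curr)
		return;

	curr->value = target;
	curr->count = 1;
	curr->next = NULL;

	if (prev)
		prev->next = curr;
	else
		s->head = curr;
}
```

A caller would keep one struct site per instrumented value site and call
record_value() for every observed target. The aging step means that a value
seen once can be displaced by a newcomer, while a frequently seen value needs
many misses before it loses its slot.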

2021-01-11 08:44:46

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jan 11, 2021 at 9:18 AM 'Bill Wendling' via Clang Built Linux
<[email protected]> wrote:
>
> From: Sami Tolvanen <[email protected]>
>
> Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> profile, the kernel is instrumented with PGO counters, a representative
> workload is run, and the raw profile data is collected from
> /sys/kernel/debug/pgo/profraw.
>
> The raw profile data must be processed by clang's "llvm-profdata" tool before
> it can be used during recompilation:
>
> $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
> Multiple raw profiles may be merged during this step.
>
> The data can be used either by the compiler if LTO isn't enabled:
>
> ... -fprofile-use=vmlinux.profdata ...
>
> or by LLD if LTO is enabled:
>
> ... -lto-cs-profile-file=vmlinux.profdata ...
>
> This initial submission is restricted to x86, as that's the platform we know
> works. This restriction can be lifted once other platforms have been verified
> to work with PGO.
>
> Note that this method of profiling the kernel is clang-native and isn't
> compatible with clang's gcov support in kernel/gcov.
>
> [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>

Hi Bill and Sami,

I have seen the pull-request in the CBL issue tracker and had some
questions in mind.

Good that you sent this.

First of all, I like to fetch development work easily from a Git
repository.
Can you offer one, please?
What is the base for your work?
I hope it is the freshly released Linux v5.11-rc3.

I have some experience with a PGO + ThinLTO optimized LLVM
toolchain built with the help of tc-build.
It takes a very long time to build here.

This means I have some profile-data archived.
Can I use it?

Is a self-built PGO + ThinLTO optimized LLVM toolchain a prerequisite for
this or not?
That is one of my important questions.

Thanks for your work.

Regards,
- Sedat -

> Signed-off-by: Sami Tolvanen <[email protected]>
> Co-developed-by: Bill Wendling <[email protected]>
> Signed-off-by: Bill Wendling <[email protected]>
> ---
> Documentation/dev-tools/index.rst | 1 +
> Documentation/dev-tools/pgo.rst | 127 +++++++++
> MAINTAINERS | 9 +
> Makefile | 3 +
> arch/Kconfig | 1 +
> arch/arm/boot/bootp/Makefile | 1 +
> arch/arm/boot/compressed/Makefile | 1 +
> arch/arm/vdso/Makefile | 3 +-
> arch/arm64/kernel/vdso/Makefile | 3 +-
> arch/arm64/kvm/hyp/nvhe/Makefile | 1 +
> arch/mips/boot/compressed/Makefile | 1 +
> arch/mips/vdso/Makefile | 1 +
> arch/nds32/kernel/vdso/Makefile | 4 +-
> arch/parisc/boot/compressed/Makefile | 1 +
> arch/powerpc/kernel/Makefile | 6 +-
> arch/powerpc/kernel/trace/Makefile | 3 +-
> arch/powerpc/kernel/vdso32/Makefile | 1 +
> arch/powerpc/kernel/vdso64/Makefile | 1 +
> arch/powerpc/kexec/Makefile | 3 +-
> arch/powerpc/xmon/Makefile | 1 +
> arch/riscv/kernel/vdso/Makefile | 3 +-
> arch/s390/boot/Makefile | 1 +
> arch/s390/boot/compressed/Makefile | 1 +
> arch/s390/kernel/Makefile | 1 +
> arch/s390/kernel/vdso64/Makefile | 3 +-
> arch/s390/purgatory/Makefile | 1 +
> arch/sh/boot/compressed/Makefile | 1 +
> arch/sh/mm/Makefile | 1 +
> arch/sparc/vdso/Makefile | 1 +
> arch/x86/Kconfig | 1 +
> arch/x86/boot/Makefile | 1 +
> arch/x86/boot/compressed/Makefile | 1 +
> arch/x86/entry/vdso/Makefile | 1 +
> arch/x86/kernel/vmlinux.lds.S | 2 +
> arch/x86/platform/efi/Makefile | 1 +
> arch/x86/purgatory/Makefile | 1 +
> arch/x86/realmode/rm/Makefile | 1 +
> arch/x86/um/vdso/Makefile | 1 +
> drivers/firmware/efi/libstub/Makefile | 1 +
> drivers/s390/char/Makefile | 1 +
> include/asm-generic/vmlinux.lds.h | 44 +++
> kernel/Makefile | 1 +
> kernel/pgo/Kconfig | 34 +++
> kernel/pgo/Makefile | 5 +
> kernel/pgo/fs.c | 382 ++++++++++++++++++++++++++
> kernel/pgo/instrument.c | 147 ++++++++++
> kernel/pgo/pgo.h | 206 ++++++++++++++
> scripts/Makefile.lib | 10 +
> 48 files changed, 1017 insertions(+), 9 deletions(-)
> create mode 100644 Documentation/dev-tools/pgo.rst
> create mode 100644 kernel/pgo/Kconfig
> create mode 100644 kernel/pgo/Makefile
> create mode 100644 kernel/pgo/fs.c
> create mode 100644 kernel/pgo/instrument.c
> create mode 100644 kernel/pgo/pgo.h
>
> diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> index f7809c7b1ba9e..8d6418e858062 100644
> --- a/Documentation/dev-tools/index.rst
> +++ b/Documentation/dev-tools/index.rst
> @@ -26,6 +26,7 @@ whole; patches welcome!
> kgdb
> kselftest
> kunit/index
> + pgo
>
>
> .. only:: subproject and html
> diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> new file mode 100644
> index 0000000000000..2ed7f549b20ef
> --- /dev/null
> +++ b/Documentation/dev-tools/pgo.rst
> @@ -0,0 +1,127 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===============================
> +Using PGO with the Linux kernel
> +===============================
> +
> +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> +when building with Clang. The profiling data is exported via the ``pgo``
> +debugfs directory.
> +
> +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> +
> +
> +Preparation
> +===========
> +
> +Configure the kernel with:
> +
> +.. code-block:: make
> +
> + CONFIG_DEBUG_FS=y
> + CONFIG_PGO_CLANG=y
> +
> +Note that kernels compiled with profiling flags will be significantly larger
> +and run slower.
> +
> +Profiling data will only become accessible once debugfs has been mounted:
> +
> +.. code-block:: sh
> +
> + mount -t debugfs none /sys/kernel/debug
> +
> +
> +Customization
> +=============
> +
> +You can enable or disable profiling for individual files and directories by
> +adding a line similar to the following to the respective kernel Makefile:
> +
> +- For a single file (e.g. main.o)
> +
> + .. code-block:: make
> +
> + PGO_PROFILE_main.o := y
> +
> +- For all files in one directory
> +
> + .. code-block:: make
> +
> + PGO_PROFILE := y
> +
> +To exclude files from being profiled use
> +
> + .. code-block:: make
> +
> + PGO_PROFILE_main.o := n
> +
> +and
> +
> + .. code-block:: make
> +
> + PGO_PROFILE := n
> +
> +Only files which are linked to the main kernel image or are compiled as kernel
> +modules are supported by this mechanism.
> +
> +
> +Files
> +=====
> +
> +The PGO kernel support creates the following files in debugfs:
> +
> +``/sys/kernel/debug/pgo``
> + Parent directory for all PGO-related files.
> +
> +``/sys/kernel/debug/pgo/reset``
> + Global reset file: resets all profile data to zero when written to.
> +
> +``/sys/kernel/debug/pgo/profraw``
> + The raw PGO data that must be processed with ``llvm-profdata``.
> +
> +
> +Workflow
> +========
> +
> +The PGO kernel can be run on the host or test machines. The data, though, should
> +be analyzed with Clang's tools from the same Clang version that the kernel was
> +compiled with. Clang is tolerant of version skew, but it is easier to use the
> +same Clang version.
> +
> +The profiling data is useful for optimizing the kernel, analyzing coverage,
> +etc. Clang offers tools to perform these tasks.
> +
> +Here is an example workflow for profiling an instrumented kernel with PGO and
> +using the result to optimize the kernel:
> +
> +1) Install the kernel on the TEST machine.
> +
> +2) Reset the data counters right before running the load tests
> +
> + .. code-block:: sh
> +
> + echo 1 > /sys/kernel/debug/pgo/reset
> +
> +3) Run the load tests.
> +
> +4) Collect the raw profile data
> +
> + .. code-block:: sh
> +
> + cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> +
> +5) (Optional) Download the raw profile data to the HOST machine.
> +
> +6) Process the raw profile data
> +
> + .. code-block:: sh
> +
> + llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> +
> + Note that multiple raw profile data files can be merged during this step.
> +
> +7) Rebuild the kernel using the profile data (PGO disabled)
> +
> + .. code-block:: sh
> +
> + make ... KCFLAGS=-fprofile-use=vmlinux.profdata
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 6390491b07e51..7a98bdaab9861 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -13955,6 +13955,15 @@ S: Maintained
> F: include/linux/personality.h
> F: include/uapi/linux/personality.h
>
> +PGO BASED KERNEL PROFILING
> +M: Sami Tolvanen <[email protected]>
> +M: Bill Wendling <[email protected]>
> +R: Nathan Chancellor <[email protected]>
> +R: Nick Desaulniers <[email protected]>
> +S: Supported
> +F: Documentation/dev-tools/pgo.rst
> +F: kernel/pgo
> +
> PHOENIX RC FLIGHT CONTROLLER ADAPTER
> M: Marcus Folkesson <[email protected]>
> L: [email protected]
> diff --git a/Makefile b/Makefile
> index 8b2c3f88ee5ea..4f42957c78134 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
> # Defaults to vmlinux, but the arch makefile usually adds further targets
> all: vmlinux
>
> +CFLAGS_PGO_CLANG := -fprofile-generate
> +export CFLAGS_PGO_CLANG
> +
> CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
> $(call cc-option,-fno-tree-loop-im) \
> $(call cc-disable-warning,maybe-uninitialized,)
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 78c6f05b10f91..a7a6ab7d204dc 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1106,6 +1106,7 @@ config ARCH_SUPPORTS_DEBUG_PAGEALLOC
> bool
>
> source "kernel/gcov/Kconfig"
> +source "kernel/pgo/Kconfig"
>
> source "scripts/gcc-plugins/Kconfig"
>
> diff --git a/arch/arm/boot/bootp/Makefile b/arch/arm/boot/bootp/Makefile
> index 981a8d03f064c..523bd58df0a4b 100644
> --- a/arch/arm/boot/bootp/Makefile
> +++ b/arch/arm/boot/bootp/Makefile
> @@ -7,6 +7,7 @@
> #
>
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> LDFLAGS_bootp := --no-undefined -X \
> --defsym initrd_phys=$(INITRD_PHYS) \
> diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
> index fb521efcc6c20..5fd0fd85fc0e5 100644
> --- a/arch/arm/boot/compressed/Makefile
> +++ b/arch/arm/boot/compressed/Makefile
> @@ -24,6 +24,7 @@ OBJS += hyp-stub.o
> endif
>
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> KASAN_SANITIZE := n
>
> # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
> diff --git a/arch/arm/vdso/Makefile b/arch/arm/vdso/Makefile
> index b558bee0e1f6b..11f6ce4b48b56 100644
> --- a/arch/arm/vdso/Makefile
> +++ b/arch/arm/vdso/Makefile
> @@ -36,8 +36,9 @@ else
> CFLAGS_vgettimeofday.o = -O2 -include $(c-gettimeofday-y)
> endif
>
> -# Disable gcov profiling for VDSO code
> +# Disable gcov and PGO profiling for VDSO code
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
> KCOV_INSTRUMENT := n
> diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
> index cd9c3fa25902f..d48fc0df07020 100644
> --- a/arch/arm64/kernel/vdso/Makefile
> +++ b/arch/arm64/kernel/vdso/Makefile
> @@ -41,8 +41,9 @@ ifneq ($(c-gettimeofday-y),)
> CFLAGS_vgettimeofday.o += -include $(c-gettimeofday-y)
> endif
>
> -# Disable gcov profiling for VDSO code
> +# Disable gcov and PGO profiling for VDSO code
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> obj-y += vdso.o
> targets += vdso.lds
> diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
> index 1f1e351c5fe2b..ad128ecdbfbdf 100644
> --- a/arch/arm64/kvm/hyp/nvhe/Makefile
> +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
> @@ -60,6 +60,7 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS), $(KBUILD_CFLAG
> # compiler instrumentation that inserts callbacks or checks into the code may
> # cause crashes. Just disable it.
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> KASAN_SANITIZE := n
> UBSAN_SANITIZE := n
> KCOV_INSTRUMENT := n
> diff --git a/arch/mips/boot/compressed/Makefile b/arch/mips/boot/compressed/Makefile
> index 47cd9dc7454af..0855ea12f2c7f 100644
> --- a/arch/mips/boot/compressed/Makefile
> +++ b/arch/mips/boot/compressed/Makefile
> @@ -37,6 +37,7 @@ KBUILD_AFLAGS := $(KBUILD_AFLAGS) -D__ASSEMBLY__ \
> # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
> KCOV_INSTRUMENT := n
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> # decompressor objects (linked with vmlinuz)
> vmlinuzobjs-y := $(obj)/head.o $(obj)/decompress.o $(obj)/string.o
> diff --git a/arch/mips/vdso/Makefile b/arch/mips/vdso/Makefile
> index 5810cc12bc1d9..d7eb64de35eae 100644
> --- a/arch/mips/vdso/Makefile
> +++ b/arch/mips/vdso/Makefile
> @@ -66,6 +66,7 @@ ldflags-y := -Bsymbolic --no-undefined -soname=linux-vdso.so.1 \
> CFLAGS_REMOVE_vdso.o = -pg
>
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
> KCOV_INSTRUMENT := n
>
> diff --git a/arch/nds32/kernel/vdso/Makefile b/arch/nds32/kernel/vdso/Makefile
> index 55df25ef00578..f2b53ee2124b7 100644
> --- a/arch/nds32/kernel/vdso/Makefile
> +++ b/arch/nds32/kernel/vdso/Makefile
> @@ -15,9 +15,9 @@ obj-vdso := $(addprefix $(obj)/, $(obj-vdso))
> ccflags-y := -shared -fno-common -fno-builtin -nostdlib -fPIC -Wl,-shared -g \
> -Wl,-soname=linux-vdso.so.1 -Wl,--hash-style=sysv
>
> -# Disable gcov profiling for VDSO code
> +# Disable gcov and PGO profiling for VDSO code
> GCOV_PROFILE := n
> -
> +PGO_PROFILE := n
>
> obj-y += vdso.o
> targets += vdso.lds
> diff --git a/arch/parisc/boot/compressed/Makefile b/arch/parisc/boot/compressed/Makefile
> index dff4536875305..5cf93a67f7da7 100644
> --- a/arch/parisc/boot/compressed/Makefile
> +++ b/arch/parisc/boot/compressed/Makefile
> @@ -7,6 +7,7 @@
>
> KCOV_INSTRUMENT := n
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
>
> targets := vmlinux.lds vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index fe2ef598e2ead..c642c046660d7 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -153,17 +153,21 @@ endif
> obj-$(CONFIG_PPC_SECURE_BOOT) += secure_boot.o ima_arch.o secvar-ops.o
> obj-$(CONFIG_PPC_SECVAR_SYSFS) += secvar-sysfs.o
>
> -# Disable GCOV, KCOV & sanitizers in odd or sensitive code
> +# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
> GCOV_PROFILE_prom_init.o := n
> +PGO_PROFILE_prom_init.o := n
> KCOV_INSTRUMENT_prom_init.o := n
> UBSAN_SANITIZE_prom_init.o := n
> GCOV_PROFILE_kprobes.o := n
> +PGO_PROFILE_kprobes.o := n
> KCOV_INSTRUMENT_kprobes.o := n
> UBSAN_SANITIZE_kprobes.o := n
> GCOV_PROFILE_kprobes-ftrace.o := n
> +PGO_PROFILE_kprobes-ftrace.o := n
> KCOV_INSTRUMENT_kprobes-ftrace.o := n
> UBSAN_SANITIZE_kprobes-ftrace.o := n
> GCOV_PROFILE_syscall_64.o := n
> +PGO_PROFILE_syscall_64.o := n
> KCOV_INSTRUMENT_syscall_64.o := n
> UBSAN_SANITIZE_syscall_64.o := n
> UBSAN_SANITIZE_vdso.o := n
> diff --git a/arch/powerpc/kernel/trace/Makefile b/arch/powerpc/kernel/trace/Makefile
> index 858503775c583..7d72ae7d4f8c6 100644
> --- a/arch/powerpc/kernel/trace/Makefile
> +++ b/arch/powerpc/kernel/trace/Makefile
> @@ -23,7 +23,8 @@ obj-$(CONFIG_TRACING) += trace_clock.o
> obj-$(CONFIG_PPC64) += $(obj64-y)
> obj-$(CONFIG_PPC32) += $(obj32-y)
>
> -# Disable GCOV, KCOV & sanitizers in odd or sensitive code
> +# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
> GCOV_PROFILE_ftrace.o := n
> +PGO_PROFILE_ftrace.o := n
> KCOV_INSTRUMENT_ftrace.o := n
> UBSAN_SANITIZE_ftrace.o := n
> diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
> index 9cb6f524854b9..655e159975a04 100644
> --- a/arch/powerpc/kernel/vdso32/Makefile
> +++ b/arch/powerpc/kernel/vdso32/Makefile
> @@ -34,6 +34,7 @@ targets := $(obj-vdso32) vdso32.so.dbg
> obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32))
>
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> KCOV_INSTRUMENT := n
> UBSAN_SANITIZE := n
> KASAN_SANITIZE := n
> diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile
> index bf363ff371521..12c286f5afc16 100644
> --- a/arch/powerpc/kernel/vdso64/Makefile
> +++ b/arch/powerpc/kernel/vdso64/Makefile
> @@ -21,6 +21,7 @@ targets := $(obj-vdso64) vdso64.so.dbg
> obj-vdso64 := $(addprefix $(obj)/, $(obj-vdso64))
>
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> KCOV_INSTRUMENT := n
> UBSAN_SANITIZE := n
> KASAN_SANITIZE := n
> diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile
> index 4aff6846c7726..1c7f65e3cb969 100644
> --- a/arch/powerpc/kexec/Makefile
> +++ b/arch/powerpc/kexec/Makefile
> @@ -16,7 +16,8 @@ endif
> endif
>
>
> -# Disable GCOV, KCOV & sanitizers in odd or sensitive code
> +# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
> GCOV_PROFILE_core_$(BITS).o := n
> +PGO_PROFILE_core_$(BITS).o := n
> KCOV_INSTRUMENT_core_$(BITS).o := n
> UBSAN_SANITIZE_core_$(BITS).o := n
> diff --git a/arch/powerpc/xmon/Makefile b/arch/powerpc/xmon/Makefile
> index eb25d7554ffd1..7aff80d18b44b 100644
> --- a/arch/powerpc/xmon/Makefile
> +++ b/arch/powerpc/xmon/Makefile
> @@ -2,6 +2,7 @@
> # Makefile for xmon
>
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> KCOV_INSTRUMENT := n
> UBSAN_SANITIZE := n
> KASAN_SANITIZE := n
> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
> index 0cfd6da784f84..882340dc3c647 100644
> --- a/arch/riscv/kernel/vdso/Makefile
> +++ b/arch/riscv/kernel/vdso/Makefile
> @@ -32,8 +32,9 @@ CPPFLAGS_vdso.lds += -P -C -U$(ARCH)
> # Disable -pg to prevent insert call site
> CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
>
> -# Disable gcov profiling for VDSO code
> +# Disable gcov and PGO profiling for VDSO code
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> KCOV_INSTRUMENT := n
>
> # Force dependency
> diff --git a/arch/s390/boot/Makefile b/arch/s390/boot/Makefile
> index 41a64b8dce252..bee4a32040e79 100644
> --- a/arch/s390/boot/Makefile
> +++ b/arch/s390/boot/Makefile
> @@ -5,6 +5,7 @@
>
> KCOV_INSTRUMENT := n
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
> KASAN_SANITIZE := n
>
> diff --git a/arch/s390/boot/compressed/Makefile b/arch/s390/boot/compressed/Makefile
> index de18dab518bb6..c3ab883e8425a 100644
> --- a/arch/s390/boot/compressed/Makefile
> +++ b/arch/s390/boot/compressed/Makefile
> @@ -7,6 +7,7 @@
>
> KCOV_INSTRUMENT := n
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
> KASAN_SANITIZE := n
>
> diff --git a/arch/s390/kernel/Makefile b/arch/s390/kernel/Makefile
> index dd73b7f074237..bd857aacad794 100644
> --- a/arch/s390/kernel/Makefile
> +++ b/arch/s390/kernel/Makefile
> @@ -14,6 +14,7 @@ CFLAGS_REMOVE_early.o = $(CC_FLAGS_FTRACE)
> endif
>
> GCOV_PROFILE_early.o := n
> +PGO_PROFILE_early.o := n
> KCOV_INSTRUMENT_early.o := n
> UBSAN_SANITIZE_early.o := n
> KASAN_SANITIZE_ipl.o := n
> diff --git a/arch/s390/kernel/vdso64/Makefile b/arch/s390/kernel/vdso64/Makefile
> index a6e0fb6b91d6c..d7c43b7c1db96 100644
> --- a/arch/s390/kernel/vdso64/Makefile
> +++ b/arch/s390/kernel/vdso64/Makefile
> @@ -35,8 +35,9 @@ obj-y += vdso64_wrapper.o
> targets += vdso64.lds
> CPPFLAGS_vdso64.lds += -P -C -U$(ARCH)
>
> -# Disable gcov profiling, ubsan and kasan for VDSO code
> +# Disable gcov and PGO profiling, ubsan and kasan for VDSO code
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
> KASAN_SANITIZE := n
>
> diff --git a/arch/s390/purgatory/Makefile b/arch/s390/purgatory/Makefile
> index c57f8c40e9926..9aef584e98466 100644
> --- a/arch/s390/purgatory/Makefile
> +++ b/arch/s390/purgatory/Makefile
> @@ -17,6 +17,7 @@ $(obj)/mem.o: $(srctree)/arch/s390/lib/mem.S FORCE
>
> KCOV_INSTRUMENT := n
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
> KASAN_SANITIZE := n
>
> diff --git a/arch/sh/boot/compressed/Makefile b/arch/sh/boot/compressed/Makefile
> index 589d2d8a573db..ae19aeeb3964c 100644
> --- a/arch/sh/boot/compressed/Makefile
> +++ b/arch/sh/boot/compressed/Makefile
> @@ -13,6 +13,7 @@ targets := vmlinux vmlinux.bin vmlinux.bin.gz \
> OBJECTS = $(obj)/head_32.o $(obj)/misc.o $(obj)/cache.o
>
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> #
> # IMAGE_OFFSET is the load offset of the compression loader
> diff --git a/arch/sh/mm/Makefile b/arch/sh/mm/Makefile
> index f69ddc70b1465..ea2782c631f43 100644
> --- a/arch/sh/mm/Makefile
> +++ b/arch/sh/mm/Makefile
> @@ -43,3 +43,4 @@ obj-$(CONFIG_UNCACHED_MAPPING) += uncached.o
> obj-$(CONFIG_HAVE_SRAM_POOL) += sram.o
>
> GCOV_PROFILE_pmb.o := n
> +PGO_PROFILE_pmb.o := n
> diff --git a/arch/sparc/vdso/Makefile b/arch/sparc/vdso/Makefile
> index c5e1545bc5cf9..ab5f3783fe199 100644
> --- a/arch/sparc/vdso/Makefile
> +++ b/arch/sparc/vdso/Makefile
> @@ -115,6 +115,7 @@ quiet_cmd_vdso = VDSO $@
>
> VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 -Bsymbolic
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> #
> # Install the unstripped copies of vdso*.so. If our toolchain supports
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 7b6dd10b162ac..a751b4f8f6645 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -95,6 +95,7 @@ config X86
> select ARCH_SUPPORTS_DEBUG_PAGEALLOC
> select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
> select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
> + select ARCH_SUPPORTS_PGO_CLANG if X86_64
> select ARCH_USE_BUILTIN_BSWAP
> select ARCH_USE_QUEUED_RWLOCKS
> select ARCH_USE_QUEUED_SPINLOCKS
> diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> index fe605205b4ce2..383853e32f673 100644
> --- a/arch/x86/boot/Makefile
> +++ b/arch/x86/boot/Makefile
> @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
>
> $(obj)/bzImage: asflags-y := $(SVGA_MODE)
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index e0bc3988c3faa..ed12ab65f6065 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
>
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE :=n
>
> KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index 02e3e42f380bd..26e2b3af0145c 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO $@
> VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
> $(call ld-option, --eh-frame-hdr) -Bsymbolic
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> quiet_cmd_vdso_and_check = VDSO $@
> cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index efd9e9ea17f25..f6cab2316c46a 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -184,6 +184,8 @@ SECTIONS
>
> BUG_TABLE
>
> + PGO_CLANG_DATA
> +
> ORC_UNWIND_TABLE
>
> . = ALIGN(PAGE_SIZE);
> diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
> index 84b09c230cbd5..5f22b31446ad4 100644
> --- a/arch/x86/platform/efi/Makefile
> +++ b/arch/x86/platform/efi/Makefile
> @@ -2,6 +2,7 @@
> OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
> KASAN_SANITIZE := n
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
> obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
> diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> index 95ea17a9d20cb..36f20e99da0bc 100644
> --- a/arch/x86/purgatory/Makefile
> +++ b/arch/x86/purgatory/Makefile
> @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
>
> # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> KASAN_SANITIZE := n
> UBSAN_SANITIZE := n
> KCSAN_SANITIZE := n
> diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> index 83f1b6a56449f..21797192f958f 100644
> --- a/arch/x86/realmode/rm/Makefile
> +++ b/arch/x86/realmode/rm/Makefile
> @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
> diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
> index 5943387e3f357..54f5768f58530 100644
> --- a/arch/x86/um/vdso/Makefile
> +++ b/arch/x86/um/vdso/Makefile
> @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@
>
> VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> #
> # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index 8a94388e38b33..2d81623b33f29 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -40,6 +40,7 @@ KBUILD_CFLAGS := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
> KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
>
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> # Sanitizer runtimes are unavailable and cannot be linked here.
> KASAN_SANITIZE := n
> KCSAN_SANITIZE := n
> diff --git a/drivers/s390/char/Makefile b/drivers/s390/char/Makefile
> index c6fdb81a068a6..bf6c5db5da1fc 100644
> --- a/drivers/s390/char/Makefile
> +++ b/drivers/s390/char/Makefile
> @@ -9,6 +9,7 @@ CFLAGS_REMOVE_sclp_early_core.o = $(CC_FLAGS_FTRACE)
> endif
>
> GCOV_PROFILE_sclp_early_core.o := n
> +PGO_PROFILE_sclp_early_core.o := n
> KCOV_INSTRUMENT_sclp_early_core.o := n
> UBSAN_SANITIZE_sclp_early_core.o := n
> KASAN_SANITIZE_sclp_early_core.o := n
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index b2b3d81b1535a..3a591bb18c5fb 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -316,6 +316,49 @@
> #define THERMAL_TABLE(name)
> #endif
>
> +#ifdef CONFIG_PGO_CLANG
> +#define PGO_CLANG_DATA \
> + __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_start = .; \
> + __llvm_prf_data_start = .; \
> + KEEP(*(__llvm_prf_data)) \
> + . = ALIGN(8); \
> + __llvm_prf_data_end = .; \
> + } \
> + __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_cnts_start = .; \
> + KEEP(*(__llvm_prf_cnts)) \
> + . = ALIGN(8); \
> + __llvm_prf_cnts_end = .; \
> + } \
> + __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_names_start = .; \
> + KEEP(*(__llvm_prf_names)) \
> + . = ALIGN(8); \
> + __llvm_prf_names_end = .; \
> + . = ALIGN(8); \
> + } \
> + __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
> + __llvm_prf_vals_start = .; \
> + KEEP(*(__llvm_prf_vals)) \
> + . = ALIGN(8); \
> + __llvm_prf_vals_end = .; \
> + . = ALIGN(8); \
> + } \
> + __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
> + __llvm_prf_vnds_start = .; \
> + KEEP(*(__llvm_prf_vnds)) \
> + . = ALIGN(8); \
> + __llvm_prf_vnds_end = .; \
> + __llvm_prf_end = .; \
> + }
> +#else
> +#define PGO_CLANG_DATA
> +#endif
> +
> #define KERNEL_DTB() \
> STRUCT_ALIGN(); \
> __dtb_start = .; \
> @@ -1125,6 +1168,7 @@
> CONSTRUCTORS \
> } \
> BUG_TABLE \
> + PGO_CLANG_DATA
>
> #define INIT_TEXT_SECTION(inittext_align) \
> . = ALIGN(inittext_align); \
> diff --git a/kernel/Makefile b/kernel/Makefile
> index aa7368c7eabf3..0b34ca228ba46 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
> obj-$(CONFIG_KCSAN) += kcsan/
> obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
> obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
> +obj-$(CONFIG_PGO_CLANG) += pgo/
>
> obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
> new file mode 100644
> index 0000000000000..318d36bb3d106
> --- /dev/null
> +++ b/kernel/pgo/Kconfig
> @@ -0,0 +1,34 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
> +
> +config ARCH_SUPPORTS_PGO_CLANG
> + bool
> +
> +config PGO_CLANG
> + bool "Enable clang's PGO-based kernel profiling"
> + depends on DEBUG_FS
> + depends on ARCH_SUPPORTS_PGO_CLANG
> + help
> + This option enables clang's PGO (Profile Guided Optimization) based
> + code profiling to better optimize the kernel.
> +
> + If unsure, say N.
> +
> + Run a representative workload for your application on a kernel
> + compiled with this option and download the raw profile file from
> + /sys/kernel/debug/pgo/profraw. This file needs to be processed with
> + llvm-profdata. It may be merged with other collected raw profiles.
> +
> + Copy the resulting profile file into vmlinux.profdata, and enable
> + KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
> + kernel.
> +
> + Note that a kernel compiled with profiling flags will be
> + significantly larger and run slower. Also be sure to exclude files
> + from profiling which are not linked to the kernel image to prevent
> + linker errors.
> +
> + Note that the debugfs filesystem has to be mounted to access
> + profiling data.
> +
> +endmenu
> diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
> new file mode 100644
> index 0000000000000..41e27cefd9a47
> --- /dev/null
> +++ b/kernel/pgo/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0
> +GCOV_PROFILE := n
> +PGO_PROFILE := n
> +
> +obj-y += fs.o instrument.o
> diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
> new file mode 100644
> index 0000000000000..790a8df037bfc
> --- /dev/null
> +++ b/kernel/pgo/fs.c
> @@ -0,0 +1,382 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt) "pgo: " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/debugfs.h>
> +#include <linux/fs.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/vmalloc.h>
> +#include "pgo.h"
> +
> +static struct dentry *directory;
> +
> +struct prf_private_data {
> + void *buffer;
> + unsigned long size;
> +};
> +
> +/*
> + * Raw profile data format:
> + *
> + * - llvm_prf_header
> + * - __llvm_prf_data
> + * - __llvm_prf_cnts
> + * - __llvm_prf_names
> + * - zero padding to 8 bytes
> + * - for each llvm_prf_data in __llvm_prf_data:
> + * - llvm_prf_value_data
> + * - llvm_prf_value_record + site count array
> + * - llvm_prf_value_node_data
> + * ...
> + * ...
> + * ...
> + */
> +
> +static void prf_fill_header(void **buffer)
> +{
> + struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
> +
> + header->magic = LLVM_PRF_MAGIC;
> + header->version = LLVM_PRF_VARIANT_MASK_IR | LLVM_PRF_VERSION;
> + header->data_size = prf_data_count();
> + header->padding_bytes_before_counters = 0;
> + header->counters_size = prf_cnts_count();
> + header->padding_bytes_after_counters = 0;
> + header->names_size = prf_names_count();
> + header->counters_delta = (u64)__llvm_prf_cnts_start;
> + header->names_delta = (u64)__llvm_prf_names_start;
> + header->value_kind_last = LLVM_PRF_IPVK_LAST;
> +
> + *buffer += sizeof(*header);
> +}
> +
> +/*
> + * Copy the source into the buffer, incrementing the pointer into buffer in the
> + * process.
> + */
> +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
> +{
> + memcpy(*buffer, src, size);
> + *buffer += size;
> +}
> +
> +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
> +{
> + struct llvm_prf_value_node **nodes =
> + (struct llvm_prf_value_node **)p->values;
> + u32 kinds = 0;
> + u32 size = 0;
> + unsigned int kind;
> + unsigned int n;
> + unsigned int s = 0;
> +
> + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> + unsigned int sites = p->num_value_sites[kind];
> +
> + if (!sites)
> + continue;
> +
> + /* Record + site count array */
> + size += prf_get_value_record_size(sites);
> + kinds++;
> +
> + if (!nodes)
> + continue;
> +
> + for (n = 0; n < sites; n++) {
> + u32 count = 0;
> + struct llvm_prf_value_node *site = nodes[s + n];
> +
> + while (site && ++count <= U8_MAX)
> + site = site->next;
> +
> + size += count *
> + sizeof(struct llvm_prf_value_node_data);
> + }
> +
> + s += sites;
> + }
> +
> + if (size)
> + size += sizeof(struct llvm_prf_value_data);
> +
> + if (value_kinds)
> + *value_kinds = kinds;
> +
> + return size;
> +}
> +
> +static u32 prf_get_value_size(void)
> +{
> + u32 size = 0;
> + struct llvm_prf_data *p;
> +
> + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> + size += __prf_get_value_size(p, NULL);
> +
> + return size;
> +}
> +
> +/* Serialize the profiling's value. */
> +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
> +{
> + struct llvm_prf_value_data header;
> + struct llvm_prf_value_node **nodes =
> + (struct llvm_prf_value_node **)p->values;
> + unsigned int kind;
> + unsigned int n;
> + unsigned int s = 0;
> +
> + header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
> +
> + if (!header.num_value_kinds)
> + /* Nothing to write. */
> + return;
> +
> + prf_copy_to_buffer(buffer, &header, sizeof(header));
> +
> + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> + struct llvm_prf_value_record *record;
> + u8 *counts;
> + unsigned int sites = p->num_value_sites[kind];
> +
> + if (!sites)
> + continue;
> +
> + /* Profiling value record. */
> + record = *(struct llvm_prf_value_record **)buffer;
> + *buffer += prf_get_value_record_header_size();
> +
> + record->kind = kind;
> + record->num_value_sites = sites;
> +
> + /* Site count array. */
> + counts = *(u8 **)buffer;
> + *buffer += prf_get_value_record_site_count_size(sites);
> +
> + /*
> + * If we don't have nodes, we can skip updating the site count
> + * array, because the buffer is zero filled.
> + */
> + if (!nodes)
> + continue;
> +
> + for (n = 0; n < sites; n++) {
> + u32 count = 0;
> + struct llvm_prf_value_node *site = nodes[s + n];
> +
> + while (site && ++count <= U8_MAX) {
> + prf_copy_to_buffer(buffer, site,
> + sizeof(struct llvm_prf_value_node_data));
> + site = site->next;
> + }
> +
> + counts[n] = (u8)count;
> + }
> +
> + s += sites;
> + }
> +}
> +
> +static void prf_serialize_values(void **buffer)
> +{
> + struct llvm_prf_data *p;
> +
> + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> + prf_serialize_value(p, buffer);
> +}
> +
> +static inline unsigned long prf_get_padding(unsigned long size)
> +{
> + return 8 - (size % 8);
> +}
> +
> +static unsigned long prf_buffer_size(void)
> +{
> + return sizeof(struct llvm_prf_header) +
> + prf_data_size() +
> + prf_cnts_size() +
> + prf_names_size() +
> + prf_get_padding(prf_names_size()) +
> + prf_get_value_size();
> +}
> +
> +/* Serialize the profiling data into a format LLVM's tools can understand. */
> +static int prf_serialize(struct prf_private_data *p)
> +{
> + int err = 0;
> + void *buffer;
> +
> + p->size = prf_buffer_size();
> + p->buffer = vzalloc(p->size);
> +
> + if (!p->buffer) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + buffer = p->buffer;
> +
> + prf_fill_header(&buffer);
> + prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
> + prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
> + prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
> + buffer += prf_get_padding(prf_names_size());
> +
> + prf_serialize_values(&buffer);
> +
> +out:
> + return err;
> +}
> +
> +/* open() implementation for PGO. Creates a copy of the profiling data set. */
> +static int prf_open(struct inode *inode, struct file *file)
> +{
> + struct prf_private_data *data;
> + unsigned long flags;
> + int err;
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL);
> + if (!data) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + flags = prf_lock();
> +
> + err = prf_serialize(data);
> + if (err) {
> + kfree(data);
> + goto out_unlock;
> + }
> +
> + file->private_data = data;
> +
> +out_unlock:
> + prf_unlock(flags);
> +out:
> + return err;
> +}
> +
> +/* read() implementation for PGO. */
> +static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
> + loff_t *ppos)
> +{
> + struct prf_private_data *data = file->private_data;
> +
> + BUG_ON(!data);
> +
> + return simple_read_from_buffer(buf, count, ppos, data->buffer,
> + data->size);
> +}
> +
> +/* release() implementation for PGO. Release resources allocated by open(). */
> +static int prf_release(struct inode *inode, struct file *file)
> +{
> + struct prf_private_data *data = file->private_data;
> +
> + if (data) {
> + vfree(data->buffer);
> + kfree(data);
> + }
> +
> + return 0;
> +}
> +
> +static const struct file_operations prf_fops = {
> + .owner = THIS_MODULE,
> + .open = prf_open,
> + .read = prf_read,
> + .llseek = default_llseek,
> + .release = prf_release
> +};
> +
> +/* write() implementation for resetting PGO's profile data. */
> +static ssize_t reset_write(struct file *file, const char __user *addr,
> + size_t len, loff_t *pos)
> +{
> + struct llvm_prf_data *data;
> +
> + memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
> +
> + for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; ++data) {
> + struct llvm_prf_value_node **vnodes;
> + u64 current_vsite_count;
> + u32 i;
> +
> + if (!data->values)
> + continue;
> +
> + current_vsite_count = 0;
> + vnodes = (struct llvm_prf_value_node **)data->values;
> +
> + for (i = LLVM_PRF_IPVK_FIRST; i <= LLVM_PRF_IPVK_LAST; ++i)
> + current_vsite_count += data->num_value_sites[i];
> +
> + for (i = 0; i < current_vsite_count; ++i) {
> + struct llvm_prf_value_node *current_vnode = vnodes[i];
> +
> + while (current_vnode) {
> + current_vnode->count = 0;
> + current_vnode = current_vnode->next;
> + }
> + }
> + }
> +
> + return len;
> +}
> +
> +static const struct file_operations prf_reset_fops = {
> + .owner = THIS_MODULE,
> + .write = reset_write,
> + .llseek = noop_llseek,
> +};
> +
> +/* Create debugfs entries. */
> +static int __init pgo_init(void)
> +{
> + directory = debugfs_create_dir("pgo", NULL);
> + if (!directory)
> + goto err_remove;
> +
> + if (!debugfs_create_file("profraw", 0600, directory, NULL,
> + &prf_fops))
> + goto err_remove;
> +
> + if (!debugfs_create_file("reset", 0200, directory, NULL,
> + &prf_reset_fops))
> + goto err_remove;
> +
> + return 0;
> +
> +err_remove:
> + pr_err("initialization failed\n");
> + return -EIO;
> +}
> +
> +/* Remove debugfs entries. */
> +static void __exit pgo_exit(void)
> +{
> + debugfs_remove_recursive(directory);
> +}
> +
> +module_init(pgo_init);
> +module_exit(pgo_exit);
> diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
> new file mode 100644
> index 0000000000000..d96b61a1cf712
> --- /dev/null
> +++ b/kernel/pgo/instrument.c
> @@ -0,0 +1,147 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt) "pgo: " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/export.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +#include "pgo.h"
> +
> +/* Lock guarding value node access and serialization. */
> +static DEFINE_SPINLOCK(pgo_lock);
> +static int current_node;
> +
> +unsigned long prf_lock(void)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&pgo_lock, flags);
> +
> + return flags;
> +}
> +
> +void prf_unlock(unsigned long flags)
> +{
> + spin_unlock_irqrestore(&pgo_lock, flags);
> +}
> +
> +/*
> + * Return a newly allocated profiling value node which contains the value
> + * tracked by the value profiler.
> + * Note: caller *must* hold pgo_lock.
> + */
> +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
> + u32 index, u64 value)
> +{
> + if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
> + return NULL; /* Out of nodes */
> +
> + current_node++;
> +
> + /* Make sure the node is entirely within the section */
> + if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
> + &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
> + return NULL;
> +
> + return &__llvm_prf_vnds_start[current_node];
> +}
> +
> +/*
> + * Counts the number of times a target value is seen.
> + *
> + * Records the target value for the CounterIndex if not seen before. Otherwise,
> + * increments the counter associated w/ the target value.
> + */
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
> +{
> + struct llvm_prf_data *p = (struct llvm_prf_data *)data;
> + struct llvm_prf_value_node **counters;
> + struct llvm_prf_value_node *curr;
> + struct llvm_prf_value_node *min = NULL;
> + struct llvm_prf_value_node *prev = NULL;
> + u64 min_count = U64_MAX;
> + u8 values = 0;
> + unsigned long flags;
> +
> + if (!p || !p->values)
> + return;
> +
> + counters = (struct llvm_prf_value_node **)p->values;
> + curr = counters[index];
> +
> + while (curr) {
> + if (target_value == curr->value) {
> + curr->count++;
> + return;
> + }
> +
> + if (curr->count < min_count) {
> + min_count = curr->count;
> + min = curr;
> + }
> +
> + prev = curr;
> + curr = curr->next;
> + values++;
> + }
> +
> + if (values >= LLVM_PRF_MAX_NUM_VALS_PER_SITE) {
> + if (!min->count || !(--min->count)) {
> + curr = min;
> + curr->value = target_value;
> + curr->count++;
> + }
> + return;
> + }
> +
> + /* Lock when updating the value node structure. */
> + flags = prf_lock();
> +
> + curr = allocate_node(p, index, target_value);
> + if (!curr)
> + goto out;
> +
> + curr->value = target_value;
> + curr->count++;
> +
> + if (!counters[index])
> + counters[index] = curr;
> + else if (prev && !prev->next)
> + prev->next = curr;
> +
> +out:
> + prf_unlock(flags);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_target);
> +
> +/* Counts the number of times a range of target values is seen. */
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> + u32 index, s64 precise_start,
> + s64 precise_last, s64 large_value)
> +{
> + if (large_value != S64_MIN && (s64)target_value >= large_value)
> + target_value = large_value;
> + else if ((s64)target_value < precise_start ||
> + (s64)target_value > precise_last)
> + target_value = precise_last + 1;
> +
> + __llvm_profile_instrument_target(target_value, data, index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_range);
> diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
> new file mode 100644
> index 0000000000000..df0aa278f28bd
> --- /dev/null
> +++ b/kernel/pgo/pgo.h
> @@ -0,0 +1,206 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#ifndef _PGO_H
> +#define _PGO_H
> +
> +/*
> + * Note: These internal LLVM definitions must match the compiler version.
> + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
> + */
> +
> +#ifdef CONFIG_64BIT
> + #define LLVM_PRF_MAGIC \
> + ((u64)255 << 56 | \
> + (u64)'l' << 48 | \
> + (u64)'p' << 40 | \
> + (u64)'r' << 32 | \
> + (u64)'o' << 24 | \
> + (u64)'f' << 16 | \
> + (u64)'r' << 8 | \
> + (u64)129)
> +#else
> + #define LLVM_PRF_MAGIC \
> + ((u64)255 << 56 | \
> + (u64)'l' << 48 | \
> + (u64)'p' << 40 | \
> + (u64)'r' << 32 | \
> + (u64)'o' << 24 | \
> + (u64)'f' << 16 | \
> + (u64)'R' << 8 | \
> + (u64)129)
> +#endif
> +
> +#define LLVM_PRF_VERSION 5
> +#define LLVM_PRF_DATA_ALIGN 8
> +#define LLVM_PRF_IPVK_FIRST 0
> +#define LLVM_PRF_IPVK_LAST 1
> +#define LLVM_PRF_MAX_NUM_VALS_PER_SITE 16
> +
> +#define LLVM_PRF_VARIANT_MASK_IR (0x1ull << 56)
> +#define LLVM_PRF_VARIANT_MASK_CSIR (0x1ull << 57)
> +
> +/**
> + * struct llvm_prf_header - represents the raw profile header data structure.
> + * @magic: the magic token for the file format.
> + * @version: the version of the file format.
> + * @data_size: the number of entries in the profile data section.
> + * @padding_bytes_before_counters: the number of padding bytes before the
> + * counters.
> + * @counters_size: the size in bytes of the LLVM profile section containing the
> + * counters.
> + * @padding_bytes_after_counters: the number of padding bytes after the
> + * counters.
> + * @names_size: the size in bytes of the LLVM profile section containing the
> + * counters' names.
> + * @counters_delta: the beginning of the LLVM profile counters section.
> + * @names_delta: the beginning of the LLVM profile names section.
> + * @value_kind_last: the last profile value kind.
> + */
> +struct llvm_prf_header {
> + u64 magic;
> + u64 version;
> + u64 data_size;
> + u64 padding_bytes_before_counters;
> + u64 counters_size;
> + u64 padding_bytes_after_counters;
> + u64 names_size;
> + u64 counters_delta;
> + u64 names_delta;
> + u64 value_kind_last;
> +};
> +
> +/**
> + * struct llvm_prf_data - represents the per-function control structure.
> + * @name_ref: the reference to the function's name.
> + * @func_hash: the hash value of the function.
> + * @counter_ptr: a pointer to the profile counter.
> + * @function_ptr: a pointer to the function.
> + * @values: the profiling values associated with this function.
> + * @num_counters: the number of counters in the function.
> + * @num_value_sites: the number of value profile sites.
> + */
> +struct llvm_prf_data {
> + const u64 name_ref;
> + const u64 func_hash;
> + const void *counter_ptr;
> + const void *function_ptr;
> + void *values;
> + const u32 num_counters;
> + const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
> +} __aligned(LLVM_PRF_DATA_ALIGN);
> +
> +/**
> + * struct llvm_prf_value_node_data - represents the data part of the struct
> + * llvm_prf_value_node data structure.
> + * @value: the value counters.
> + * @count: the counters' count.
> + */
> +struct llvm_prf_value_node_data {
> + u64 value;
> + u64 count;
> +};
> +
> +/**
> + * struct llvm_prf_value_node - represents an internal data structure used by
> + * the value profiler.
> + * @value: the value counters.
> + * @count: the counters' count.
> + * @next: the next value node.
> + */
> +struct llvm_prf_value_node {
> + u64 value;
> + u64 count;
> + struct llvm_prf_value_node *next;
> +};
> +
> +/**
> + * struct llvm_prf_value_data - represents the value profiling data in indexed
> + * format.
> + * @total_size: the total size in bytes including this field.
> + * @num_value_kinds: the number of value profile kinds that have value profile
> + * data.
> + */
> +struct llvm_prf_value_data {
> + u32 total_size;
> + u32 num_value_kinds;
> +};
> +
> +/**
> + * struct llvm_prf_value_record - represents the on-disk layout of the value
> + * profile data of a particular kind for one function.
> + * @kind: the kind of the value profile record.
> + * @num_value_sites: the number of value profile sites.
> + * @site_count_array: the first element of the array that stores the number
> + * of profiled values for each value site.
> + */
> +struct llvm_prf_value_record {
> + u32 kind;
> + u32 num_value_sites;
> + u8 site_count_array[];
> +};
> +
> +#define prf_get_value_record_header_size() \
> + offsetof(struct llvm_prf_value_record, site_count_array)
> +#define prf_get_value_record_site_count_size(sites) \
> + roundup((sites), 8)
> +#define prf_get_value_record_size(sites) \
> + (prf_get_value_record_header_size() + \
> + prf_get_value_record_site_count_size((sites)))
> +
> +/* Data sections */
> +extern struct llvm_prf_data __llvm_prf_data_start[];
> +extern struct llvm_prf_data __llvm_prf_data_end[];
> +
> +extern u64 __llvm_prf_cnts_start[];
> +extern u64 __llvm_prf_cnts_end[];
> +
> +extern char __llvm_prf_names_start[];
> +extern char __llvm_prf_names_end[];
> +
> +extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
> +extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
> +
> +/* Locking for vnodes */
> +extern unsigned long prf_lock(void);
> +extern void prf_unlock(unsigned long flags);
> +
> +#define __DEFINE_PRF_SIZE(s) \
> + static inline unsigned long prf_ ## s ## _size(void) \
> + { \
> + unsigned long start = \
> + (unsigned long)__llvm_prf_ ## s ## _start; \
> + unsigned long end = \
> + (unsigned long)__llvm_prf_ ## s ## _end; \
> + return roundup(end - start, \
> + sizeof(__llvm_prf_ ## s ## _start[0])); \
> + } \
> + static inline unsigned long prf_ ## s ## _count(void) \
> + { \
> + return prf_ ## s ## _size() / \
> + sizeof(__llvm_prf_ ## s ## _start[0]); \
> + }
> +
> +__DEFINE_PRF_SIZE(data);
> +__DEFINE_PRF_SIZE(cnts);
> +__DEFINE_PRF_SIZE(names);
> +__DEFINE_PRF_SIZE(vnds);
> +
> +#undef __DEFINE_PRF_SIZE
> +
> +#endif /* _PGO_H */
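
The size helpers in pgo.h above can be tried out in userspace. The following is a minimal sketch, not the kernel code itself: `roundup_to`, `struct value_record`, `section_size` and `section_count` are hypothetical stand-ins for the kernel's `roundup()`, `struct llvm_prf_value_record` and the functions generated by `__DEFINE_PRF_SIZE`, and the section bounds are passed as plain integers instead of linker symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's roundup(): round x up to a multiple of y. */
static inline size_t roundup_to(size_t x, size_t y)
{
	assert(y != 0);
	return ((x + y - 1) / y) * y;
}

/* Same field layout as struct llvm_prf_value_record in the patch. */
struct value_record {
	uint32_t kind;
	uint32_t num_value_sites;
	uint8_t site_count_array[];
};

/* Header size: everything before the flexible array member. */
static inline size_t value_record_header_size(void)
{
	return offsetof(struct value_record, site_count_array);
}

/* One byte per value site, padded to a multiple of 8, plus the header. */
static inline size_t value_record_size(size_t sites)
{
	return value_record_header_size() + roundup_to(sites, 8);
}

/*
 * Mirrors the prf_<s>_size() pattern: the distance between the start and end
 * addresses of a section, rounded up to a whole number of elements.
 */
static inline size_t section_size(uintptr_t start, uintptr_t end,
				  size_t elem_size)
{
	return roundup_to((size_t)(end - start), elem_size);
}

/* Mirrors the prf_<s>_count() pattern: number of elements in the section. */
static inline size_t section_count(uintptr_t start, uintptr_t end,
				   size_t elem_size)
{
	return section_size(start, end, elem_size) / elem_size;
}
```

With this layout the record header is 8 bytes, so a record with 1 to 8 value sites occupies 16 bytes and a record with 9 sites occupies 24.
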
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 213677a5ed33e..9b218afb5cb87 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
> $(CFLAGS_GCOV))
> endif
>
> +#
> +# Enable clang's PGO profiling flags for a file or directory depending on
> +# variables PGO_PROFILE_obj.o and PGO_PROFILE.
> +#
> +ifeq ($(CONFIG_PGO_CLANG),y)
> +_c_flags += $(if $(patsubst n%,, \
> + $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
> + $(CFLAGS_PGO_CLANG))
> +endif
> +
> #
> # Enable address sanitizer flags for kernel except some files or directories
> # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
> --
> 2.30.0.284.gd98b1dd5eaa7-goog
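
The `patsubst n%` test in the scripts/Makefile.lib hunk above decides whether `$(CFLAGS_PGO_CLANG)` is added for an object when `CONFIG_PGO_CLANG=y`. As a rough model of that decision, assuming each variable holds at most a single word: the per-object value, the per-directory value and a literal `y` are concatenated, and the flags are dropped only if the result starts with `n`. The function name `pgo_flags_enabled` is hypothetical and exists only for this illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * per_obj: value of PGO_PROFILE_<object>.o, or "" if unset
 * per_dir: value of PGO_PROFILE, or "" if unset
 *
 * Models $(if $(patsubst n%,,<per_obj><per_dir>y), $(CFLAGS_PGO_CLANG)):
 * patsubst deletes a word that starts with 'n', which makes the $(if ...)
 * condition empty, so no PGO flags are added.
 */
static bool pgo_flags_enabled(const char *per_obj, const char *per_dir)
{
	char word[64];

	/* Concatenate without spaces, exactly like the make expression. */
	snprintf(word, sizeof(word), "%s%sy", per_obj, per_dir);
	return word[0] != 'n';
}
```

Under this model an unset pair defaults to enabled, a per-object `n` disables profiling regardless of the directory setting, and a per-object `y` overrides a directory-wide `n`.
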
>

2021-01-11 08:45:56

by Sedat Dilek

Subject: Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jan 11, 2021 at 9:39 AM Sedat Dilek <[email protected]> wrote:
>
> On Mon, Jan 11, 2021 at 9:18 AM 'Bill Wendling' via Clang Built Linux
> <[email protected]> wrote:
> >
> > From: Sami Tolvanen <[email protected]>
> >
> > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > profile, the kernel is instrumented with PGO counters, a representative
> > workload is run, and the raw profile data is collected from
> > /sys/kernel/debug/pgo/profraw.
> >
> > The raw profile data must be processed by clang's "llvm-profdata" tool before
> > it can be used during recompilation:
> >
> > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >
> > Multiple raw profiles may be merged during this step.
> >
> > The data can be used either by the compiler if LTO isn't enabled:
> >
> > ... -fprofile-use=vmlinux.profdata ...
> >
> > or by LLD if LTO is enabled:
> >
> > ... -lto-cs-profile-file=vmlinux.profdata ...
> >
> > This initial submission is restricted to x86, as that's the platform we know
> > works. This restriction can be lifted once other platforms have been verified
> > to work with PGO.
> >
> > Note that this method of profiling the kernel is clang-native and isn't
> > compatible with clang's gcov support in kernel/gcov.
> >
> > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> >
>
> Hi Bill and Sami,
>
> I saw the pull request in the CBL issue tracker and had some
> questions in mind.
>
> Good that you sent this.
>
> First of all, I like to fetch development work from a Git
> repository.
> Can you offer one, please?
> What is the base for your work?
> I hope it is the freshly released Linux v5.11-rc3.
>
> I have some experience with a PGO + ThinLTO optimized LLVM
> toolchain built with the help of tc-build.
> It takes a very long time to build here.
>
> This means I have some profile data archived.
> Can I use it?
>
> Is a self-built PGO + ThinLTO optimized LLVM toolchain a prerequisite for
> this or not?

If I recall correctly, such an optimized LLVM toolchain saved 40% of
build time here.
So I am very interested in reducing my build time.

- Sedat -

> That is one of my important questions.
>
> Thanks for your work.
>
> Regards,
> - Sedat -
>
> > Signed-off-by: Sami Tolvanen <[email protected]>
> > Co-developed-by: Bill Wendling <[email protected]>
> > Signed-off-by: Bill Wendling <[email protected]>
> > ---
> > Documentation/dev-tools/index.rst | 1 +
> > Documentation/dev-tools/pgo.rst | 127 +++++++++
> > MAINTAINERS | 9 +
> > Makefile | 3 +
> > arch/Kconfig | 1 +
> > arch/arm/boot/bootp/Makefile | 1 +
> > arch/arm/boot/compressed/Makefile | 1 +
> > arch/arm/vdso/Makefile | 3 +-
> > arch/arm64/kernel/vdso/Makefile | 3 +-
> > arch/arm64/kvm/hyp/nvhe/Makefile | 1 +
> > arch/mips/boot/compressed/Makefile | 1 +
> > arch/mips/vdso/Makefile | 1 +
> > arch/nds32/kernel/vdso/Makefile | 4 +-
> > arch/parisc/boot/compressed/Makefile | 1 +
> > arch/powerpc/kernel/Makefile | 6 +-
> > arch/powerpc/kernel/trace/Makefile | 3 +-
> > arch/powerpc/kernel/vdso32/Makefile | 1 +
> > arch/powerpc/kernel/vdso64/Makefile | 1 +
> > arch/powerpc/kexec/Makefile | 3 +-
> > arch/powerpc/xmon/Makefile | 1 +
> > arch/riscv/kernel/vdso/Makefile | 3 +-
> > arch/s390/boot/Makefile | 1 +
> > arch/s390/boot/compressed/Makefile | 1 +
> > arch/s390/kernel/Makefile | 1 +
> > arch/s390/kernel/vdso64/Makefile | 3 +-
> > arch/s390/purgatory/Makefile | 1 +
> > arch/sh/boot/compressed/Makefile | 1 +
> > arch/sh/mm/Makefile | 1 +
> > arch/sparc/vdso/Makefile | 1 +
> > arch/x86/Kconfig | 1 +
> > arch/x86/boot/Makefile | 1 +
> > arch/x86/boot/compressed/Makefile | 1 +
> > arch/x86/entry/vdso/Makefile | 1 +
> > arch/x86/kernel/vmlinux.lds.S | 2 +
> > arch/x86/platform/efi/Makefile | 1 +
> > arch/x86/purgatory/Makefile | 1 +
> > arch/x86/realmode/rm/Makefile | 1 +
> > arch/x86/um/vdso/Makefile | 1 +
> > drivers/firmware/efi/libstub/Makefile | 1 +
> > drivers/s390/char/Makefile | 1 +
> > include/asm-generic/vmlinux.lds.h | 44 +++
> > kernel/Makefile | 1 +
> > kernel/pgo/Kconfig | 34 +++
> > kernel/pgo/Makefile | 5 +
> > kernel/pgo/fs.c | 382 ++++++++++++++++++++++++++
> > kernel/pgo/instrument.c | 147 ++++++++++
> > kernel/pgo/pgo.h | 206 ++++++++++++++
> > scripts/Makefile.lib | 10 +
> > 48 files changed, 1017 insertions(+), 9 deletions(-)
> > create mode 100644 Documentation/dev-tools/pgo.rst
> > create mode 100644 kernel/pgo/Kconfig
> > create mode 100644 kernel/pgo/Makefile
> > create mode 100644 kernel/pgo/fs.c
> > create mode 100644 kernel/pgo/instrument.c
> > create mode 100644 kernel/pgo/pgo.h
> >
> > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > index f7809c7b1ba9e..8d6418e858062 100644
> > --- a/Documentation/dev-tools/index.rst
> > +++ b/Documentation/dev-tools/index.rst
> > @@ -26,6 +26,7 @@ whole; patches welcome!
> > kgdb
> > kselftest
> > kunit/index
> > + pgo
> >
> >
> > .. only:: subproject and html
> > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> > new file mode 100644
> > index 0000000000000..2ed7f549b20ef
> > --- /dev/null
> > +++ b/Documentation/dev-tools/pgo.rst
> > @@ -0,0 +1,127 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +===============================
> > +Using PGO with the Linux kernel
> > +===============================
> > +
> > +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> > +when building with Clang. The profiling data is exported via the ``pgo``
> > +debugfs directory.
> > +
> > +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > +
> > +
> > +Preparation
> > +===========
> > +
> > +Configure the kernel with:
> > +
> > +.. code-block:: make
> > +
> > + CONFIG_DEBUG_FS=y
> > + CONFIG_PGO_CLANG=y
> > +
> > +Note that kernels compiled with profiling flags will be significantly larger
> > +and run slower.
> > +
> > +Profiling data will only become accessible once debugfs has been mounted:
> > +
> > +.. code-block:: sh
> > +
> > + mount -t debugfs none /sys/kernel/debug
> > +
> > +
> > +Customization
> > +=============
> > +
> > +You can enable or disable profiling for individual files and directories by
> > +adding a line similar to the following to the respective kernel Makefile:
> > +
> > +- For a single file (e.g. main.o)
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE_main.o := y
> > +
> > +- For all files in one directory
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE := y
> > +
> > +To exclude files from being profiled use
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE_main.o := n
> > +
> > +and
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE := n
> > +
> > +Only files which are linked to the main kernel image or are compiled as kernel
> > +modules are supported by this mechanism.
> > +
> > +
> > +Files
> > +=====
> > +
> > +The PGO kernel support creates the following files in debugfs:
> > +
> > +``/sys/kernel/debug/pgo``
> > + Parent directory for all PGO-related files.
> > +
> > +``/sys/kernel/debug/pgo/reset``
> > + Global reset file: resets all profile data to zero when written to.
> > +
> > +``/sys/kernel/debug/pgo/profraw``
> > + The raw PGO data that must be processed with ``llvm-profdata``.
> > +
> > +
> > +Workflow
> > +========
> > +
> > +The PGO kernel can be run on the host or test machines. The data, though, should
> > +be analyzed with Clang's tools from the same Clang version that compiled the
> > +kernel. Clang is tolerant of version skew, but it's easier to use the same
> > +Clang version.
> > +
> > +The profiling data is useful for optimizing the kernel, analyzing coverage,
> > +etc. Clang offers tools to perform these tasks.
> > +
> > +Here is an example workflow for profiling an instrumented kernel with PGO and
> > +using the result to optimize the kernel:
> > +
> > +1) Install the kernel on the TEST machine.
> > +
> > +2) Reset the data counters right before running the load tests
> > +
> > + .. code-block:: sh
> > +
> > + echo 1 > /sys/kernel/debug/pgo/reset
> > +
> > +3) Run the load tests.
> > +
> > +4) Collect the raw profile data
> > +
> > + .. code-block:: sh
> > +
> > + cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > +
> > +5) (Optional) Download the raw profile data to the HOST machine.
> > +
> > +6) Process the raw profile data
> > +
> > + .. code-block:: sh
> > +
> > + llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > +
> > + Note that multiple raw profile data files can be merged during this step.
> > +
> > +7) Rebuild the kernel using the profile data (PGO disabled)
> > +
> > + .. code-block:: sh
> > +
> > + make ... KCFLAGS=-fprofile-use=vmlinux.profdata
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 6390491b07e51..7a98bdaab9861 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -13955,6 +13955,15 @@ S: Maintained
> > F: include/linux/personality.h
> > F: include/uapi/linux/personality.h
> >
> > +PGO BASED KERNEL PROFILING
> > +M: Sami Tolvanen <[email protected]>
> > +M: Bill Wendling <[email protected]>
> > +R: Nathan Chancellor <[email protected]>
> > +R: Nick Desaulniers <[email protected]>
> > +S: Supported
> > +F: Documentation/dev-tools/pgo.rst
> > +F: kernel/pgo
> > +
> > PHOENIX RC FLIGHT CONTROLLER ADAPTER
> > M: Marcus Folkesson <[email protected]>
> > L: [email protected]
> > diff --git a/Makefile b/Makefile
> > index 8b2c3f88ee5ea..4f42957c78134 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
> > # Defaults to vmlinux, but the arch makefile usually adds further targets
> > all: vmlinux
> >
> > +CFLAGS_PGO_CLANG := -fprofile-generate
> > +export CFLAGS_PGO_CLANG
> > +
> > CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
> > $(call cc-option,-fno-tree-loop-im) \
> > $(call cc-disable-warning,maybe-uninitialized,)
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index 78c6f05b10f91..a7a6ab7d204dc 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -1106,6 +1106,7 @@ config ARCH_SUPPORTS_DEBUG_PAGEALLOC
> > bool
> >
> > source "kernel/gcov/Kconfig"
> > +source "kernel/pgo/Kconfig"
> >
> > source "scripts/gcc-plugins/Kconfig"
> >
> > diff --git a/arch/arm/boot/bootp/Makefile b/arch/arm/boot/bootp/Makefile
> > index 981a8d03f064c..523bd58df0a4b 100644
> > --- a/arch/arm/boot/bootp/Makefile
> > +++ b/arch/arm/boot/bootp/Makefile
> > @@ -7,6 +7,7 @@
> > #
> >
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > LDFLAGS_bootp := --no-undefined -X \
> > --defsym initrd_phys=$(INITRD_PHYS) \
> > diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
> > index fb521efcc6c20..5fd0fd85fc0e5 100644
> > --- a/arch/arm/boot/compressed/Makefile
> > +++ b/arch/arm/boot/compressed/Makefile
> > @@ -24,6 +24,7 @@ OBJS += hyp-stub.o
> > endif
> >
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > KASAN_SANITIZE := n
> >
> > # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
> > diff --git a/arch/arm/vdso/Makefile b/arch/arm/vdso/Makefile
> > index b558bee0e1f6b..11f6ce4b48b56 100644
> > --- a/arch/arm/vdso/Makefile
> > +++ b/arch/arm/vdso/Makefile
> > @@ -36,8 +36,9 @@ else
> > CFLAGS_vgettimeofday.o = -O2 -include $(c-gettimeofday-y)
> > endif
> >
> > -# Disable gcov profiling for VDSO code
> > +# Disable gcov and PGO profiling for VDSO code
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
> > KCOV_INSTRUMENT := n
> > diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
> > index cd9c3fa25902f..d48fc0df07020 100644
> > --- a/arch/arm64/kernel/vdso/Makefile
> > +++ b/arch/arm64/kernel/vdso/Makefile
> > @@ -41,8 +41,9 @@ ifneq ($(c-gettimeofday-y),)
> > CFLAGS_vgettimeofday.o += -include $(c-gettimeofday-y)
> > endif
> >
> > -# Disable gcov profiling for VDSO code
> > +# Disable gcov and PGO profiling for VDSO code
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > obj-y += vdso.o
> > targets += vdso.lds
> > diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
> > index 1f1e351c5fe2b..ad128ecdbfbdf 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/Makefile
> > +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
> > @@ -60,6 +60,7 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS), $(KBUILD_CFLAG
> > # compiler instrumentation that inserts callbacks or checks into the code may
> > # cause crashes. Just disable it.
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > KASAN_SANITIZE := n
> > UBSAN_SANITIZE := n
> > KCOV_INSTRUMENT := n
> > diff --git a/arch/mips/boot/compressed/Makefile b/arch/mips/boot/compressed/Makefile
> > index 47cd9dc7454af..0855ea12f2c7f 100644
> > --- a/arch/mips/boot/compressed/Makefile
> > +++ b/arch/mips/boot/compressed/Makefile
> > @@ -37,6 +37,7 @@ KBUILD_AFLAGS := $(KBUILD_AFLAGS) -D__ASSEMBLY__ \
> > # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
> > KCOV_INSTRUMENT := n
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > # decompressor objects (linked with vmlinuz)
> > vmlinuzobjs-y := $(obj)/head.o $(obj)/decompress.o $(obj)/string.o
> > diff --git a/arch/mips/vdso/Makefile b/arch/mips/vdso/Makefile
> > index 5810cc12bc1d9..d7eb64de35eae 100644
> > --- a/arch/mips/vdso/Makefile
> > +++ b/arch/mips/vdso/Makefile
> > @@ -66,6 +66,7 @@ ldflags-y := -Bsymbolic --no-undefined -soname=linux-vdso.so.1 \
> > CFLAGS_REMOVE_vdso.o = -pg
> >
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE := n
> > KCOV_INSTRUMENT := n
> >
> > diff --git a/arch/nds32/kernel/vdso/Makefile b/arch/nds32/kernel/vdso/Makefile
> > index 55df25ef00578..f2b53ee2124b7 100644
> > --- a/arch/nds32/kernel/vdso/Makefile
> > +++ b/arch/nds32/kernel/vdso/Makefile
> > @@ -15,9 +15,9 @@ obj-vdso := $(addprefix $(obj)/, $(obj-vdso))
> > ccflags-y := -shared -fno-common -fno-builtin -nostdlib -fPIC -Wl,-shared -g \
> > -Wl,-soname=linux-vdso.so.1 -Wl,--hash-style=sysv
> >
> > -# Disable gcov profiling for VDSO code
> > +# Disable gcov and PGO profiling for VDSO code
> > GCOV_PROFILE := n
> > -
> > +PGO_PROFILE := n
> >
> > obj-y += vdso.o
> > targets += vdso.lds
> > diff --git a/arch/parisc/boot/compressed/Makefile b/arch/parisc/boot/compressed/Makefile
> > index dff4536875305..5cf93a67f7da7 100644
> > --- a/arch/parisc/boot/compressed/Makefile
> > +++ b/arch/parisc/boot/compressed/Makefile
> > @@ -7,6 +7,7 @@
> >
> > KCOV_INSTRUMENT := n
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE := n
> >
> > targets := vmlinux.lds vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2
> > diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> > index fe2ef598e2ead..c642c046660d7 100644
> > --- a/arch/powerpc/kernel/Makefile
> > +++ b/arch/powerpc/kernel/Makefile
> > @@ -153,17 +153,21 @@ endif
> > obj-$(CONFIG_PPC_SECURE_BOOT) += secure_boot.o ima_arch.o secvar-ops.o
> > obj-$(CONFIG_PPC_SECVAR_SYSFS) += secvar-sysfs.o
> >
> > -# Disable GCOV, KCOV & sanitizers in odd or sensitive code
> > +# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
> > GCOV_PROFILE_prom_init.o := n
> > +PGO_PROFILE_prom_init.o := n
> > KCOV_INSTRUMENT_prom_init.o := n
> > UBSAN_SANITIZE_prom_init.o := n
> > GCOV_PROFILE_kprobes.o := n
> > +PGO_PROFILE_kprobes.o := n
> > KCOV_INSTRUMENT_kprobes.o := n
> > UBSAN_SANITIZE_kprobes.o := n
> > GCOV_PROFILE_kprobes-ftrace.o := n
> > +PGO_PROFILE_kprobes-ftrace.o := n
> > KCOV_INSTRUMENT_kprobes-ftrace.o := n
> > UBSAN_SANITIZE_kprobes-ftrace.o := n
> > GCOV_PROFILE_syscall_64.o := n
> > +PGO_PROFILE_syscall_64.o := n
> > KCOV_INSTRUMENT_syscall_64.o := n
> > UBSAN_SANITIZE_syscall_64.o := n
> > UBSAN_SANITIZE_vdso.o := n
> > diff --git a/arch/powerpc/kernel/trace/Makefile b/arch/powerpc/kernel/trace/Makefile
> > index 858503775c583..7d72ae7d4f8c6 100644
> > --- a/arch/powerpc/kernel/trace/Makefile
> > +++ b/arch/powerpc/kernel/trace/Makefile
> > @@ -23,7 +23,8 @@ obj-$(CONFIG_TRACING) += trace_clock.o
> > obj-$(CONFIG_PPC64) += $(obj64-y)
> > obj-$(CONFIG_PPC32) += $(obj32-y)
> >
> > -# Disable GCOV, KCOV & sanitizers in odd or sensitive code
> > +# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
> > GCOV_PROFILE_ftrace.o := n
> > +PGO_PROFILE_ftrace.o := n
> > KCOV_INSTRUMENT_ftrace.o := n
> > UBSAN_SANITIZE_ftrace.o := n
> > diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
> > index 9cb6f524854b9..655e159975a04 100644
> > --- a/arch/powerpc/kernel/vdso32/Makefile
> > +++ b/arch/powerpc/kernel/vdso32/Makefile
> > @@ -34,6 +34,7 @@ targets := $(obj-vdso32) vdso32.so.dbg
> > obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32))
> >
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > KCOV_INSTRUMENT := n
> > UBSAN_SANITIZE := n
> > KASAN_SANITIZE := n
> > diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile
> > index bf363ff371521..12c286f5afc16 100644
> > --- a/arch/powerpc/kernel/vdso64/Makefile
> > +++ b/arch/powerpc/kernel/vdso64/Makefile
> > @@ -21,6 +21,7 @@ targets := $(obj-vdso64) vdso64.so.dbg
> > obj-vdso64 := $(addprefix $(obj)/, $(obj-vdso64))
> >
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > KCOV_INSTRUMENT := n
> > UBSAN_SANITIZE := n
> > KASAN_SANITIZE := n
> > diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile
> > index 4aff6846c7726..1c7f65e3cb969 100644
> > --- a/arch/powerpc/kexec/Makefile
> > +++ b/arch/powerpc/kexec/Makefile
> > @@ -16,7 +16,8 @@ endif
> > endif
> >
> >
> > -# Disable GCOV, KCOV & sanitizers in odd or sensitive code
> > +# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
> > GCOV_PROFILE_core_$(BITS).o := n
> > +PGO_PROFILE_core_$(BITS).o := n
> > KCOV_INSTRUMENT_core_$(BITS).o := n
> > UBSAN_SANITIZE_core_$(BITS).o := n
> > diff --git a/arch/powerpc/xmon/Makefile b/arch/powerpc/xmon/Makefile
> > index eb25d7554ffd1..7aff80d18b44b 100644
> > --- a/arch/powerpc/xmon/Makefile
> > +++ b/arch/powerpc/xmon/Makefile
> > @@ -2,6 +2,7 @@
> > # Makefile for xmon
> >
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > KCOV_INSTRUMENT := n
> > UBSAN_SANITIZE := n
> > KASAN_SANITIZE := n
> > diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
> > index 0cfd6da784f84..882340dc3c647 100644
> > --- a/arch/riscv/kernel/vdso/Makefile
> > +++ b/arch/riscv/kernel/vdso/Makefile
> > @@ -32,8 +32,9 @@ CPPFLAGS_vdso.lds += -P -C -U$(ARCH)
> > # Disable -pg to prevent insert call site
> > CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
> >
> > -# Disable gcov profiling for VDSO code
> > +# Disable gcov and PGO profiling for VDSO code
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > KCOV_INSTRUMENT := n
> >
> > # Force dependency
> > diff --git a/arch/s390/boot/Makefile b/arch/s390/boot/Makefile
> > index 41a64b8dce252..bee4a32040e79 100644
> > --- a/arch/s390/boot/Makefile
> > +++ b/arch/s390/boot/Makefile
> > @@ -5,6 +5,7 @@
> >
> > KCOV_INSTRUMENT := n
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE := n
> > KASAN_SANITIZE := n
> >
> > diff --git a/arch/s390/boot/compressed/Makefile b/arch/s390/boot/compressed/Makefile
> > index de18dab518bb6..c3ab883e8425a 100644
> > --- a/arch/s390/boot/compressed/Makefile
> > +++ b/arch/s390/boot/compressed/Makefile
> > @@ -7,6 +7,7 @@
> >
> > KCOV_INSTRUMENT := n
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE := n
> > KASAN_SANITIZE := n
> >
> > diff --git a/arch/s390/kernel/Makefile b/arch/s390/kernel/Makefile
> > index dd73b7f074237..bd857aacad794 100644
> > --- a/arch/s390/kernel/Makefile
> > +++ b/arch/s390/kernel/Makefile
> > @@ -14,6 +14,7 @@ CFLAGS_REMOVE_early.o = $(CC_FLAGS_FTRACE)
> > endif
> >
> > GCOV_PROFILE_early.o := n
> > +PGO_PROFILE_early.o := n
> > KCOV_INSTRUMENT_early.o := n
> > UBSAN_SANITIZE_early.o := n
> > KASAN_SANITIZE_ipl.o := n
> > diff --git a/arch/s390/kernel/vdso64/Makefile b/arch/s390/kernel/vdso64/Makefile
> > index a6e0fb6b91d6c..d7c43b7c1db96 100644
> > --- a/arch/s390/kernel/vdso64/Makefile
> > +++ b/arch/s390/kernel/vdso64/Makefile
> > @@ -35,8 +35,9 @@ obj-y += vdso64_wrapper.o
> > targets += vdso64.lds
> > CPPFLAGS_vdso64.lds += -P -C -U$(ARCH)
> >
> > -# Disable gcov profiling, ubsan and kasan for VDSO code
> > +# Disable gcov and PGO profiling, ubsan and kasan for VDSO code
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE := n
> > KASAN_SANITIZE := n
> >
> > diff --git a/arch/s390/purgatory/Makefile b/arch/s390/purgatory/Makefile
> > index c57f8c40e9926..9aef584e98466 100644
> > --- a/arch/s390/purgatory/Makefile
> > +++ b/arch/s390/purgatory/Makefile
> > @@ -17,6 +17,7 @@ $(obj)/mem.o: $(srctree)/arch/s390/lib/mem.S FORCE
> >
> > KCOV_INSTRUMENT := n
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE := n
> > KASAN_SANITIZE := n
> >
> > diff --git a/arch/sh/boot/compressed/Makefile b/arch/sh/boot/compressed/Makefile
> > index 589d2d8a573db..ae19aeeb3964c 100644
> > --- a/arch/sh/boot/compressed/Makefile
> > +++ b/arch/sh/boot/compressed/Makefile
> > @@ -13,6 +13,7 @@ targets := vmlinux vmlinux.bin vmlinux.bin.gz \
> > OBJECTS = $(obj)/head_32.o $(obj)/misc.o $(obj)/cache.o
> >
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > #
> > # IMAGE_OFFSET is the load offset of the compression loader
> > diff --git a/arch/sh/mm/Makefile b/arch/sh/mm/Makefile
> > index f69ddc70b1465..ea2782c631f43 100644
> > --- a/arch/sh/mm/Makefile
> > +++ b/arch/sh/mm/Makefile
> > @@ -43,3 +43,4 @@ obj-$(CONFIG_UNCACHED_MAPPING) += uncached.o
> > obj-$(CONFIG_HAVE_SRAM_POOL) += sram.o
> >
> > GCOV_PROFILE_pmb.o := n
> > +PGO_PROFILE_pmb.o := n
> > diff --git a/arch/sparc/vdso/Makefile b/arch/sparc/vdso/Makefile
> > index c5e1545bc5cf9..ab5f3783fe199 100644
> > --- a/arch/sparc/vdso/Makefile
> > +++ b/arch/sparc/vdso/Makefile
> > @@ -115,6 +115,7 @@ quiet_cmd_vdso = VDSO $@
> >
> > VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 -Bsymbolic
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > #
> > # Install the unstripped copies of vdso*.so. If our toolchain supports
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 7b6dd10b162ac..a751b4f8f6645 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -95,6 +95,7 @@ config X86
> > select ARCH_SUPPORTS_DEBUG_PAGEALLOC
> > select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
> > select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
> > + select ARCH_SUPPORTS_PGO_CLANG if X86_64
> > select ARCH_USE_BUILTIN_BSWAP
> > select ARCH_USE_QUEUED_RWLOCKS
> > select ARCH_USE_QUEUED_SPINLOCKS
> > diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> > index fe605205b4ce2..383853e32f673 100644
> > --- a/arch/x86/boot/Makefile
> > +++ b/arch/x86/boot/Makefile
> > @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> > KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE := n
> >
> > $(obj)/bzImage: asflags-y := $(SVGA_MODE)
> > diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> > index e0bc3988c3faa..ed12ab65f6065 100644
> > --- a/arch/x86/boot/compressed/Makefile
> > +++ b/arch/x86/boot/compressed/Makefile
> > @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
> >
> > KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE :=n
> >
> > KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
> > diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> > index 02e3e42f380bd..26e2b3af0145c 100644
> > --- a/arch/x86/entry/vdso/Makefile
> > +++ b/arch/x86/entry/vdso/Makefile
> > @@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO $@
> > VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
> > $(call ld-option, --eh-frame-hdr) -Bsymbolic
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > quiet_cmd_vdso_and_check = VDSO $@
> > cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
> > diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> > index efd9e9ea17f25..f6cab2316c46a 100644
> > --- a/arch/x86/kernel/vmlinux.lds.S
> > +++ b/arch/x86/kernel/vmlinux.lds.S
> > @@ -184,6 +184,8 @@ SECTIONS
> >
> > BUG_TABLE
> >
> > + PGO_CLANG_DATA
> > +
> > ORC_UNWIND_TABLE
> >
> > . = ALIGN(PAGE_SIZE);
> > diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
> > index 84b09c230cbd5..5f22b31446ad4 100644
> > --- a/arch/x86/platform/efi/Makefile
> > +++ b/arch/x86/platform/efi/Makefile
> > @@ -2,6 +2,7 @@
> > OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
> > KASAN_SANITIZE := n
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
> > obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
> > diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> > index 95ea17a9d20cb..36f20e99da0bc 100644
> > --- a/arch/x86/purgatory/Makefile
> > +++ b/arch/x86/purgatory/Makefile
> > @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
> >
> > # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > KASAN_SANITIZE := n
> > UBSAN_SANITIZE := n
> > KCSAN_SANITIZE := n
> > diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> > index 83f1b6a56449f..21797192f958f 100644
> > --- a/arch/x86/realmode/rm/Makefile
> > +++ b/arch/x86/realmode/rm/Makefile
> > @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> > KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE := n
> > diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
> > index 5943387e3f357..54f5768f58530 100644
> > --- a/arch/x86/um/vdso/Makefile
> > +++ b/arch/x86/um/vdso/Makefile
> > @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@
> >
> > VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > #
> > # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
> > diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> > index 8a94388e38b33..2d81623b33f29 100644
> > --- a/drivers/firmware/efi/libstub/Makefile
> > +++ b/drivers/firmware/efi/libstub/Makefile
> > @@ -40,6 +40,7 @@ KBUILD_CFLAGS := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
> > KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
> >
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > # Sanitizer runtimes are unavailable and cannot be linked here.
> > KASAN_SANITIZE := n
> > KCSAN_SANITIZE := n
> > diff --git a/drivers/s390/char/Makefile b/drivers/s390/char/Makefile
> > index c6fdb81a068a6..bf6c5db5da1fc 100644
> > --- a/drivers/s390/char/Makefile
> > +++ b/drivers/s390/char/Makefile
> > @@ -9,6 +9,7 @@ CFLAGS_REMOVE_sclp_early_core.o = $(CC_FLAGS_FTRACE)
> > endif
> >
> > GCOV_PROFILE_sclp_early_core.o := n
> > +PGO_PROFILE_sclp_early_core.o := n
> > KCOV_INSTRUMENT_sclp_early_core.o := n
> > UBSAN_SANITIZE_sclp_early_core.o := n
> > KASAN_SANITIZE_sclp_early_core.o := n
> > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > index b2b3d81b1535a..3a591bb18c5fb 100644
> > --- a/include/asm-generic/vmlinux.lds.h
> > +++ b/include/asm-generic/vmlinux.lds.h
> > @@ -316,6 +316,49 @@
> > #define THERMAL_TABLE(name)
> > #endif
> >
> > +#ifdef CONFIG_PGO_CLANG
> > +#define PGO_CLANG_DATA \
> > + __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
> > + . = ALIGN(8); \
> > + __llvm_prf_start = .; \
> > + __llvm_prf_data_start = .; \
> > + KEEP(*(__llvm_prf_data)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_data_end = .; \
> > + } \
> > + __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
> > + . = ALIGN(8); \
> > + __llvm_prf_cnts_start = .; \
> > + KEEP(*(__llvm_prf_cnts)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_cnts_end = .; \
> > + } \
> > + __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
> > + . = ALIGN(8); \
> > + __llvm_prf_names_start = .; \
> > + KEEP(*(__llvm_prf_names)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_names_end = .; \
> > + . = ALIGN(8); \
> > + } \
> > + __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
> > + __llvm_prf_vals_start = .; \
> > + KEEP(*(__llvm_prf_vals)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_vals_end = .; \
> > + . = ALIGN(8); \
> > + } \
> > + __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
> > + __llvm_prf_vnds_start = .; \
> > + KEEP(*(__llvm_prf_vnds)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_vnds_end = .; \
> > + __llvm_prf_end = .; \
> > + }
> > +#else
> > +#define PGO_CLANG_DATA
> > +#endif
> > +
> > #define KERNEL_DTB() \
> > STRUCT_ALIGN(); \
> > __dtb_start = .; \
> > @@ -1125,6 +1168,7 @@
> > CONSTRUCTORS \
> > } \
> > BUG_TABLE \
> > + PGO_CLANG_DATA
> >
> > #define INIT_TEXT_SECTION(inittext_align) \
> > . = ALIGN(inittext_align); \
> > diff --git a/kernel/Makefile b/kernel/Makefile
> > index aa7368c7eabf3..0b34ca228ba46 100644
> > --- a/kernel/Makefile
> > +++ b/kernel/Makefile
> > @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
> > obj-$(CONFIG_KCSAN) += kcsan/
> > obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
> > obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
> > +obj-$(CONFIG_PGO_CLANG) += pgo/
> >
> > obj-$(CONFIG_PERF_EVENTS) += events/
> >
> > diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
> > new file mode 100644
> > index 0000000000000..318d36bb3d106
> > --- /dev/null
> > +++ b/kernel/pgo/Kconfig
> > @@ -0,0 +1,34 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
> > +
> > +config ARCH_SUPPORTS_PGO_CLANG
> > + bool
> > +
> > +config PGO_CLANG
> > + bool "Enable clang's PGO-based kernel profiling"
> > + depends on DEBUG_FS
> > + depends on ARCH_SUPPORTS_PGO_CLANG
> > + help
> > + This option enables clang's PGO (Profile Guided Optimization) based
> > + code profiling to better optimize the kernel.
> > +
> > + If unsure, say N.
> > +
> > + Run a representative workload for your application on a kernel
> > + compiled with this option and download the raw profile file from
> > + /sys/kernel/debug/pgo/profraw. This file needs to be processed with
> > + llvm-profdata. It may be merged with other collected raw profiles.
> > +
> > + Copy the resulting profile file into vmlinux.profdata, and enable
> > + KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
> > + kernel.
> > +
> > + Note that a kernel compiled with profiling flags will be
> > + significantly larger and run slower. Also be sure to exclude files
> > + from profiling which are not linked to the kernel image to prevent
> > + linker errors.
> > +
> > + Note that the debugfs filesystem has to be mounted to access
> > + profiling data.
> > +
> > +endmenu
> > diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
> > new file mode 100644
> > index 0000000000000..41e27cefd9a47
> > --- /dev/null
> > +++ b/kernel/pgo/Makefile
> > @@ -0,0 +1,5 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > +
> > +obj-y += fs.o instrument.o
> > diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
> > new file mode 100644
> > index 0000000000000..790a8df037bfc
> > --- /dev/null
> > +++ b/kernel/pgo/fs.c
> > @@ -0,0 +1,382 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + * Sami Tolvanen <[email protected]>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#define pr_fmt(fmt) "pgo: " fmt
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/debugfs.h>
> > +#include <linux/fs.h>
> > +#include <linux/module.h>
> > +#include <linux/slab.h>
> > +#include <linux/vmalloc.h>
> > +#include "pgo.h"
> > +
> > +static struct dentry *directory;
> > +
> > +struct prf_private_data {
> > + void *buffer;
> > + unsigned long size;
> > +};
> > +
> > +/*
> > + * Raw profile data format:
> > + *
> > + * - llvm_prf_header
> > + * - __llvm_prf_data
> > + * - __llvm_prf_cnts
> > + * - __llvm_prf_names
> > + * - zero padding to 8 bytes
> > + * - for each llvm_prf_data in __llvm_prf_data:
> > + * - llvm_prf_value_data
> > + * - llvm_prf_value_record + site count array
> > + * - llvm_prf_value_node_data
> > + * ...
> > + * ...
> > + * ...
> > + */
> > +
> > +static void prf_fill_header(void **buffer)
> > +{
> > + struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
> > +
> > + header->magic = LLVM_PRF_MAGIC;
> > + header->version = LLVM_PRF_VARIANT_MASK_IR | LLVM_PRF_VERSION;
> > + header->data_size = prf_data_count();
> > + header->padding_bytes_before_counters = 0;
> > + header->counters_size = prf_cnts_count();
> > + header->padding_bytes_after_counters = 0;
> > + header->names_size = prf_names_count();
> > + header->counters_delta = (u64)__llvm_prf_cnts_start;
> > + header->names_delta = (u64)__llvm_prf_names_start;
> > + header->value_kind_last = LLVM_PRF_IPVK_LAST;
> > +
> > + *buffer += sizeof(*header);
> > +}
> > +
> > +/*
> > + * Copy the source into the buffer, incrementing the pointer into buffer in the
> > + * process.
> > + */
> > +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
> > +{
> > + memcpy(*buffer, src, size);
> > + *buffer += size;
> > +}
> > +
> > +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
> > +{
> > + struct llvm_prf_value_node **nodes =
> > + (struct llvm_prf_value_node **)p->values;
> > + u32 kinds = 0;
> > + u32 size = 0;
> > + unsigned int kind;
> > + unsigned int n;
> > + unsigned int s = 0;
> > +
> > + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> > + unsigned int sites = p->num_value_sites[kind];
> > +
> > + if (!sites)
> > + continue;
> > +
> > + /* Record + site count array */
> > + size += prf_get_value_record_size(sites);
> > + kinds++;
> > +
> > + if (!nodes)
> > + continue;
> > +
> > + for (n = 0; n < sites; n++) {
> > + u32 count = 0;
> > + struct llvm_prf_value_node *site = nodes[s + n];
> > +
> > + while (site && ++count <= U8_MAX)
> > + site = site->next;
> > +
> > + size += count *
> > + sizeof(struct llvm_prf_value_node_data);
> > + }
> > +
> > + s += sites;
> > + }
> > +
> > + if (size)
> > + size += sizeof(struct llvm_prf_value_data);
> > +
> > + if (value_kinds)
> > + *value_kinds = kinds;
> > +
> > + return size;
> > +}
> > +
> > +static u32 prf_get_value_size(void)
> > +{
> > + u32 size = 0;
> > + struct llvm_prf_data *p;
> > +
> > + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> > + size += __prf_get_value_size(p, NULL);
> > +
> > + return size;
> > +}
> > +
> > +/* Serialize the profiling's value. */
> > +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
> > +{
> > + struct llvm_prf_value_data header;
> > + struct llvm_prf_value_node **nodes =
> > + (struct llvm_prf_value_node **)p->values;
> > + unsigned int kind;
> > + unsigned int n;
> > + unsigned int s = 0;
> > +
> > + header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
> > +
> > + if (!header.num_value_kinds)
> > + /* Nothing to write. */
> > + return;
> > +
> > + prf_copy_to_buffer(buffer, &header, sizeof(header));
> > +
> > + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> > + struct llvm_prf_value_record *record;
> > + u8 *counts;
> > + unsigned int sites = p->num_value_sites[kind];
> > +
> > + if (!sites)
> > + continue;
> > +
> > + /* Profiling value record. */
> > + record = *(struct llvm_prf_value_record **)buffer;
> > + *buffer += prf_get_value_record_header_size();
> > +
> > + record->kind = kind;
> > + record->num_value_sites = sites;
> > +
> > + /* Site count array. */
> > + counts = *(u8 **)buffer;
> > + *buffer += prf_get_value_record_site_count_size(sites);
> > +
> > + /*
> > + * If we don't have nodes, we can skip updating the site count
> > + * array, because the buffer is zero filled.
> > + */
> > + if (!nodes)
> > + continue;
> > +
> > + for (n = 0; n < sites; n++) {
> > + u32 count = 0;
> > + struct llvm_prf_value_node *site = nodes[s + n];
> > +
> > + while (site && ++count <= U8_MAX) {
> > + prf_copy_to_buffer(buffer, site,
> > + sizeof(struct llvm_prf_value_node_data));
> > + site = site->next;
> > + }
> > +
> > + counts[n] = (u8)count;
> > + }
> > +
> > + s += sites;
> > + }
> > +}
> > +
> > +static void prf_serialize_values(void **buffer)
> > +{
> > + struct llvm_prf_data *p;
> > +
> > + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> > + prf_serialize_value(p, buffer);
> > +}
> > +
> > +static inline unsigned long prf_get_padding(unsigned long size)
> > +{
> > + return (8 - (size % 8)) % 8;
> > +}
> > +
> > +static unsigned long prf_buffer_size(void)
> > +{
> > + return sizeof(struct llvm_prf_header) +
> > + prf_data_size() +
> > + prf_cnts_size() +
> > + prf_names_size() +
> > + prf_get_padding(prf_names_size()) +
> > + prf_get_value_size();
> > +}
> > +
> > +/* Serialize the profiling data into a format LLVM's tools can understand. */
> > +static int prf_serialize(struct prf_private_data *p)
> > +{
> > + int err = 0;
> > + void *buffer;
> > +
> > + p->size = prf_buffer_size();
> > + p->buffer = vzalloc(p->size);
> > +
> > + if (!p->buffer) {
> > + err = -ENOMEM;
> > + goto out;
> > + }
> > +
> > + buffer = p->buffer;
> > +
> > + prf_fill_header(&buffer);
> > + prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
> > + prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
> > + prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
> > + buffer += prf_get_padding(prf_names_size());
> > +
> > + prf_serialize_values(&buffer);
> > +
> > +out:
> > + return err;
> > +}
> > +
> > +/* open() implementation for PGO. Creates a copy of the profiling data set. */
> > +static int prf_open(struct inode *inode, struct file *file)
> > +{
> > + struct prf_private_data *data;
> > + unsigned long flags;
> > + int err;
> > +
> > + data = kzalloc(sizeof(*data), GFP_KERNEL);
> > + if (!data) {
> > + err = -ENOMEM;
> > + goto out;
> > + }
> > +
> > + flags = prf_lock();
> > +
> > + err = prf_serialize(data);
> > + if (err) {
> > + kfree(data);
> > + goto out_unlock;
> > + }
> > +
> > + file->private_data = data;
> > +
> > +out_unlock:
> > + prf_unlock(flags);
> > +out:
> > + return err;
> > +}
> > +
> > +/* read() implementation for PGO. */
> > +static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
> > + loff_t *ppos)
> > +{
> > + struct prf_private_data *data = file->private_data;
> > +
> > + BUG_ON(!data);
> > +
> > + return simple_read_from_buffer(buf, count, ppos, data->buffer,
> > + data->size);
> > +}
> > +
> > +/* release() implementation for PGO. Release resources allocated by open(). */
> > +static int prf_release(struct inode *inode, struct file *file)
> > +{
> > + struct prf_private_data *data = file->private_data;
> > +
> > + if (data) {
> > + vfree(data->buffer);
> > + kfree(data);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static const struct file_operations prf_fops = {
> > + .owner = THIS_MODULE,
> > + .open = prf_open,
> > + .read = prf_read,
> > + .llseek = default_llseek,
> > + .release = prf_release
> > +};
> > +
> > +/* write() implementation for resetting PGO's profile data. */
> > +static ssize_t reset_write(struct file *file, const char __user *addr,
> > + size_t len, loff_t *pos)
> > +{
> > + struct llvm_prf_data *data;
> > +
> > + memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
> > +
> > + for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; ++data) {
> > + struct llvm_prf_value_node **vnodes;
> > + u64 current_vsite_count;
> > + u32 i;
> > +
> > + if (!data->values)
> > + continue;
> > +
> > + current_vsite_count = 0;
> > + vnodes = (struct llvm_prf_value_node **)data->values;
> > +
> > + for (i = LLVM_PRF_IPVK_FIRST; i <= LLVM_PRF_IPVK_LAST; ++i)
> > + current_vsite_count += data->num_value_sites[i];
> > +
> > + for (i = 0; i < current_vsite_count; ++i) {
> > + struct llvm_prf_value_node *current_vnode = vnodes[i];
> > +
> > + while (current_vnode) {
> > + current_vnode->count = 0;
> > + current_vnode = current_vnode->next;
> > + }
> > + }
> > + }
> > +
> > + return len;
> > +}
> > +
> > +static const struct file_operations prf_reset_fops = {
> > + .owner = THIS_MODULE,
> > + .write = reset_write,
> > + .llseek = noop_llseek,
> > +};
> > +
> > +/* Create debugfs entries. */
> > +static int __init pgo_init(void)
> > +{
> > + directory = debugfs_create_dir("pgo", NULL);
> > + if (!directory)
> > + goto err_remove;
> > +
> > + if (!debugfs_create_file("profraw", 0600, directory, NULL,
> > + &prf_fops))
> > + goto err_remove;
> > +
> > + if (!debugfs_create_file("reset", 0200, directory, NULL,
> > + &prf_reset_fops))
> > + goto err_remove;
> > +
> > + return 0;
> > +
> > +err_remove:
> > + pr_err("initialization failed\n");
> > + return -EIO;
> > +}
> > +
> > +/* Remove debugfs entries. */
> > +static void __exit pgo_exit(void)
> > +{
> > + debugfs_remove_recursive(directory);
> > +}
> > +
> > +module_init(pgo_init);
> > +module_exit(pgo_exit);
> > diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
> > new file mode 100644
> > index 0000000000000..d96b61a1cf712
> > --- /dev/null
> > +++ b/kernel/pgo/instrument.c
> > @@ -0,0 +1,147 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + * Sami Tolvanen <[email protected]>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#define pr_fmt(fmt) "pgo: " fmt
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/export.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/types.h>
> > +#include "pgo.h"
> > +
> > +/* Lock guarding value node access and serialization. */
> > +static DEFINE_SPINLOCK(pgo_lock);
> > +static int current_node;
> > +
> > +unsigned long prf_lock(void)
> > +{
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&pgo_lock, flags);
> > +
> > + return flags;
> > +}
> > +
> > +void prf_unlock(unsigned long flags)
> > +{
> > + spin_unlock_irqrestore(&pgo_lock, flags);
> > +}
> > +
> > +/*
> > + * Return a newly allocated profiling value node which contains the value
> > + * tracked by the value profiler.
> > + * Note: caller *must* hold pgo_lock.
> > + */
> > +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
> > + u32 index, u64 value)
> > +{
> > + if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
> > + return NULL; /* Out of nodes */
> > +
> > + current_node++;
> > +
> > + /* Make sure the node is entirely within the section */
> > + if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
> > + &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
> > + return NULL;
> > +
> > + return &__llvm_prf_vnds_start[current_node];
> > +}
> > +
> > +/*
> > + * Counts the number of times a target value is seen.
> > + *
> > + * Records the target value for the CounterIndex if not seen before. Otherwise,
> > + * increments the counter associated w/ the target value.
> > + */
> > +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
> > +{
> > + struct llvm_prf_data *p = (struct llvm_prf_data *)data;
> > + struct llvm_prf_value_node **counters;
> > + struct llvm_prf_value_node *curr;
> > + struct llvm_prf_value_node *min = NULL;
> > + struct llvm_prf_value_node *prev = NULL;
> > + u64 min_count = U64_MAX;
> > + u8 values = 0;
> > + unsigned long flags;
> > +
> > + if (!p || !p->values)
> > + return;
> > +
> > + counters = (struct llvm_prf_value_node **)p->values;
> > + curr = counters[index];
> > +
> > + while (curr) {
> > + if (target_value == curr->value) {
> > + curr->count++;
> > + return;
> > + }
> > +
> > + if (curr->count < min_count) {
> > + min_count = curr->count;
> > + min = curr;
> > + }
> > +
> > + prev = curr;
> > + curr = curr->next;
> > + values++;
> > + }
> > +
> > + if (values >= LLVM_PRF_MAX_NUM_VALS_PER_SITE) {
> > + if (!min->count || !(--min->count)) {
> > + curr = min;
> > + curr->value = target_value;
> > + curr->count++;
> > + }
> > + return;
> > + }
> > +
> > + /* Lock when updating the value node structure. */
> > + flags = prf_lock();
> > +
> > + curr = allocate_node(p, index, target_value);
> > + if (!curr)
> > + goto out;
> > +
> > + curr->value = target_value;
> > + curr->count++;
> > +
> > + if (!counters[index])
> > + counters[index] = curr;
> > + else if (prev && !prev->next)
> > + prev->next = curr;
> > +
> > +out:
> > + prf_unlock(flags);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_target);
> > +
> > +/* Counts the number of times a range of target values is seen. */
> > +void __llvm_profile_instrument_range(u64 target_value, void *data,
> > + u32 index, s64 precise_start,
> > + s64 precise_last, s64 large_value)
> > +{
> > + if (large_value != S64_MIN && (s64)target_value >= large_value)
> > + target_value = large_value;
> > + else if ((s64)target_value < precise_start ||
> > + (s64)target_value > precise_last)
> > + target_value = precise_last + 1;
> > +
> > + __llvm_profile_instrument_target(target_value, data, index);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_range);
> > diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
> > new file mode 100644
> > index 0000000000000..df0aa278f28bd
> > --- /dev/null
> > +++ b/kernel/pgo/pgo.h
> > @@ -0,0 +1,206 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + * Sami Tolvanen <[email protected]>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#ifndef _PGO_H
> > +#define _PGO_H
> > +
> > +/*
> > + * Note: These internal LLVM definitions must match the compiler version.
> > + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
> > + */
> > +
> > +#ifdef CONFIG_64BIT
> > + #define LLVM_PRF_MAGIC \
> > + ((u64)255 << 56 | \
> > + (u64)'l' << 48 | \
> > + (u64)'p' << 40 | \
> > + (u64)'r' << 32 | \
> > + (u64)'o' << 24 | \
> > + (u64)'f' << 16 | \
> > + (u64)'r' << 8 | \
> > + (u64)129)
> > +#else
> > + #define LLVM_PRF_MAGIC \
> > + ((u64)255 << 56 | \
> > + (u64)'l' << 48 | \
> > + (u64)'p' << 40 | \
> > + (u64)'r' << 32 | \
> > + (u64)'o' << 24 | \
> > + (u64)'f' << 16 | \
> > + (u64)'R' << 8 | \
> > + (u64)129)
> > +#endif
> > +
> > +#define LLVM_PRF_VERSION 5
> > +#define LLVM_PRF_DATA_ALIGN 8
> > +#define LLVM_PRF_IPVK_FIRST 0
> > +#define LLVM_PRF_IPVK_LAST 1
> > +#define LLVM_PRF_MAX_NUM_VALS_PER_SITE 16
> > +
> > +#define LLVM_PRF_VARIANT_MASK_IR (0x1ull << 56)
> > +#define LLVM_PRF_VARIANT_MASK_CSIR (0x1ull << 57)
> > +
> > +/**
> > + * struct llvm_prf_header - represents the raw profile header data structure.
> > + * @magic: the magic token for the file format.
> > + * @version: the version of the file format.
> > + * @data_size: the number of entries in the profile data section.
> > + * @padding_bytes_before_counters: the number of padding bytes before the
> > + * counters.
> > + * @counters_size: the size in bytes of the LLVM profile section containing the
> > + * counters.
> > + * @padding_bytes_after_counters: the number of padding bytes after the
> > + * counters.
> > + * @names_size: the size in bytes of the LLVM profile section containing the
> > + * counters' names.
> > + * @counters_delta: the beginning of the LLVM profile counters section.
> > + * @names_delta: the beginning of the LLVM profile names section.
> > + * @value_kind_last: the last profile value kind.
> > + */
> > +struct llvm_prf_header {
> > + u64 magic;
> > + u64 version;
> > + u64 data_size;
> > + u64 padding_bytes_before_counters;
> > + u64 counters_size;
> > + u64 padding_bytes_after_counters;
> > + u64 names_size;
> > + u64 counters_delta;
> > + u64 names_delta;
> > + u64 value_kind_last;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_data - represents the per-function control structure.
> > + * @name_ref: the reference to the function's name.
> > + * @func_hash: the hash value of the function.
> > + * @counter_ptr: a pointer to the profile counter.
> > + * @function_ptr: a pointer to the function.
> > + * @values: the profiling values associated with this function.
> > + * @num_counters: the number of counters in the function.
> > + * @num_value_sites: the number of value profile sites.
> > + */
> > +struct llvm_prf_data {
> > + const u64 name_ref;
> > + const u64 func_hash;
> > + const void *counter_ptr;
> > + const void *function_ptr;
> > + void *values;
> > + const u32 num_counters;
> > + const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
> > +} __aligned(LLVM_PRF_DATA_ALIGN);
> > +
> > +/**
> > + * struct llvm_prf_value_node_data - represents the data part of the struct
> > + * llvm_prf_value_node data structure.
> > + * @value: the value counters.
> > + * @count: the counters' count.
> > + */
> > +struct llvm_prf_value_node_data {
> > + u64 value;
> > + u64 count;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_node - represents an internal data structure used by
> > + * the value profiler.
> > + * @value: the value counters.
> > + * @count: the counters' count.
> > + * @next: the next value node.
> > + */
> > +struct llvm_prf_value_node {
> > + u64 value;
> > + u64 count;
> > + struct llvm_prf_value_node *next;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_data - represents the value profiling data in indexed
> > + * format.
> > + * @total_size: the total size in bytes including this field.
> > + * @num_value_kinds: the number of value profile kinds that have value profile
> > + * data.
> > + */
> > +struct llvm_prf_value_data {
> > + u32 total_size;
> > + u32 num_value_kinds;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_record - represents the on-disk layout of the value
> > + * profile data of a particular kind for one function.
> > + * @kind: the kind of the value profile record.
> > + * @num_value_sites: the number of value profile sites.
> > + * @site_count_array: the first element of the array that stores the number
> > + * of profiled values for each value site.
> > + */
> > +struct llvm_prf_value_record {
> > + u32 kind;
> > + u32 num_value_sites;
> > + u8 site_count_array[];
> > +};
> > +
> > +#define prf_get_value_record_header_size() \
> > + offsetof(struct llvm_prf_value_record, site_count_array)
> > +#define prf_get_value_record_site_count_size(sites) \
> > + roundup((sites), 8)
> > +#define prf_get_value_record_size(sites) \
> > + (prf_get_value_record_header_size() + \
> > + prf_get_value_record_site_count_size((sites)))
> > +
> > +/* Data sections */
> > +extern struct llvm_prf_data __llvm_prf_data_start[];
> > +extern struct llvm_prf_data __llvm_prf_data_end[];
> > +
> > +extern u64 __llvm_prf_cnts_start[];
> > +extern u64 __llvm_prf_cnts_end[];
> > +
> > +extern char __llvm_prf_names_start[];
> > +extern char __llvm_prf_names_end[];
> > +
> > +extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
> > +extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
> > +
> > +/* Locking for vnodes */
> > +extern unsigned long prf_lock(void);
> > +extern void prf_unlock(unsigned long flags);
> > +
> > +#define __DEFINE_PRF_SIZE(s) \
> > + static inline unsigned long prf_ ## s ## _size(void) \
> > + { \
> > + unsigned long start = \
> > + (unsigned long)__llvm_prf_ ## s ## _start; \
> > + unsigned long end = \
> > + (unsigned long)__llvm_prf_ ## s ## _end; \
> > + return roundup(end - start, \
> > + sizeof(__llvm_prf_ ## s ## _start[0])); \
> > + } \
> > + static inline unsigned long prf_ ## s ## _count(void) \
> > + { \
> > + return prf_ ## s ## _size() / \
> > + sizeof(__llvm_prf_ ## s ## _start[0]); \
> > + }
> > +
> > +__DEFINE_PRF_SIZE(data);
> > +__DEFINE_PRF_SIZE(cnts);
> > +__DEFINE_PRF_SIZE(names);
> > +__DEFINE_PRF_SIZE(vnds);
> > +
> > +#undef __DEFINE_PRF_SIZE
> > +
> > +#endif /* _PGO_H */
> > diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> > index 213677a5ed33e..9b218afb5cb87 100644
> > --- a/scripts/Makefile.lib
> > +++ b/scripts/Makefile.lib
> > @@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
> > $(CFLAGS_GCOV))
> > endif
> >
> > +#
> > +# Enable clang's PGO profiling flags for a file or directory depending on
> > +# variables PGO_PROFILE_obj.o and PGO_PROFILE.
> > +#
> > +ifeq ($(CONFIG_PGO_CLANG),y)
> > +_c_flags += $(if $(patsubst n%,, \
> > + $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
> > + $(CFLAGS_PGO_CLANG))
> > +endif
> > +
> > #
> > # Enable address sanitizer flags for kernel except some files or directories
> > # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
> > --
> > 2.30.0.284.gd98b1dd5eaa7-goog
> >

2021-01-11 09:23:07

by Bill Wendling

Subject: Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jan 11, 2021 at 12:39 AM Sedat Dilek <[email protected]> wrote:
>
> On Mon, Jan 11, 2021 at 9:18 AM 'Bill Wendling' via Clang Built Linux
> <[email protected]> wrote:
> >
> > From: Sami Tolvanen <[email protected]>
> >
> > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > profile, the kernel is instrumented with PGO counters, a representative
> > workload is run, and the raw profile data is collected from
> > /sys/kernel/debug/pgo/profraw.
> >
> > The raw profile data must be processed by clang's "llvm-profdata" tool before
> > it can be used during recompilation:
> >
> > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >
> > Multiple raw profiles may be merged during this step.
> >
> > The data can be used either by the compiler if LTO isn't enabled:
> >
> > ... -fprofile-use=vmlinux.profdata ...
> >
> > or by LLD if LTO is enabled:
> >
> > ... -lto-cs-profile-file=vmlinux.profdata ...
> >
> > This initial submission is restricted to x86, as that's the platform we know
> > works. This restriction can be lifted once other platforms have been verified
> > to work with PGO.
> >
> > Note that this method of profiling the kernel is clang-native and isn't
> > compatible with clang's gcov support in kernel/gcov.
> >
> > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> >
>
> Hi Bill and Sami,
>
> I have seen the pull-request in the CBL issue tracker and had some
> questions in mind.
>
> Good you send this.
>
> First of all, I like to fetch any development stuff easily from a Git
> repository.

The version in the pull-request in the CBL issue tracker is roughly
the same as this patch. (There are some changes, but they aren't
functionality changes.)

> Can you offer this, please?
> What is the base for your work?
> I hope this is (fresh released) Linux v5.11-rc3.
>
This patch (and the PR on the CBL issue tracker) are from top-of-tree Linux.

> I myself had some experiences with a PGO + ThinLTO optimized LLVM
> toolchain built with the help of tc-build.
> Here it takes very long to build it.
>
> This means I have some profile-data archived.
> Can I use it?
>
LLVM is more tolerant of "stale" profile data than gcov, so it's
possible that your archived profile data would still work, but I can't
guarantee that it will be better than using new profile data.

> Is an own PGO + ThinLTO optimized LLVM toolchain pre-requirement for
> this or not?
> That is one of my important questions.
>
Do you mean that the LLVM tools (clang, llc, etc.) are compiled with
PGO + ThinLTO?

-bw

2021-01-11 13:06:37

by Sedat Dilek

Subject: Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jan 11, 2021 at 10:17 AM Bill Wendling <[email protected]> wrote:
>
> On Mon, Jan 11, 2021 at 12:39 AM Sedat Dilek <[email protected]> wrote:
> >
> > On Mon, Jan 11, 2021 at 9:18 AM 'Bill Wendling' via Clang Built Linux
> > <[email protected]> wrote:
> > >
> > > From: Sami Tolvanen <[email protected]>
> > >
> > > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > > profile, the kernel is instrumented with PGO counters, a representative
> > > workload is run, and the raw profile data is collected from
> > > /sys/kernel/debug/pgo/profraw.
> > >
> > > The raw profile data must be processed by clang's "llvm-profdata" tool before
> > > it can be used during recompilation:
> > >
> > > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > >
> > > Multiple raw profiles may be merged during this step.
> > >
> > > The data can be used either by the compiler if LTO isn't enabled:
> > >
> > > ... -fprofile-use=vmlinux.profdata ...
> > >
> > > or by LLD if LTO is enabled:
> > >
> > > ... -lto-cs-profile-file=vmlinux.profdata ...
> > >
> > > This initial submission is restricted to x86, as that's the platform we know
> > > works. This restriction can be lifted once other platforms have been verified
> > > to work with PGO.
> > >
> > > Note that this method of profiling the kernel is clang-native and isn't
> > > compatible with clang's gcov support in kernel/gcov.
> > >
> > > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > >
> >
> > Hi Bill and Sami,
> >
> > I have seen the pull-request in the CBL issue tracker and had some
> > questions in mind.
> >
> > Good you send this.
> >
> > First of all, I like to fetch any development stuff easily from a Git
> > repository.
>
> The version in the pull-request in the CBL issue tracker is roughly
> the same as this patch. (There are some changes, but they aren't
> functionality changes.)
>
> > Can you offer this, please?
> > What is the base for your work?
> > I hope this is (fresh released) Linux v5.11-rc3.
> >
> This patch (and the PR on the CBL issue tracker) are from top-of-tree Linux.
>
> > I myself had some experiences with a PGO + ThinLTO optimized LLVM
> > toolchain built with the help of tc-build.
> > Here it takes very long to build it.
> >
> > This means I have some profile-data archived.
> > Can I use it?
> >
> LLVM is more tolerant of "stale" profile data than gcov, so it's
> possible that your archived profile data would still work, but I can't
> guarantee that it will be better than using new profile data.
>
> > Is an own PGO + ThinLTO optimized LLVM toolchain pre-requirement for
> > this or not?
> > That is one of my important questions.
> >
> Do you mean that the LLVM tools (clang, llc, etc.) are compiled with
> PGO + ThinLTO?
>

Yes.

- Sedat -

2021-01-11 18:31:56

by Nathan Chancellor

Subject: Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jan 11, 2021 at 10:57:35AM +0100, Sedat Dilek wrote:
> On Mon, Jan 11, 2021 at 10:17 AM Bill Wendling <[email protected]> wrote:
> >
> > On Mon, Jan 11, 2021 at 12:39 AM Sedat Dilek <[email protected]> wrote:
> > >
> > > On Mon, Jan 11, 2021 at 9:18 AM 'Bill Wendling' via Clang Built Linux
> > > <[email protected]> wrote:
> > > >
> > > > From: Sami Tolvanen <[email protected]>
> > > >
> > > > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > > > profile, the kernel is instrumented with PGO counters, a representative
> > > > workload is run, and the raw profile data is collected from
> > > > /sys/kernel/debug/pgo/profraw.
> > > >
> > > > The raw profile data must be processed by clang's "llvm-profdata" tool before
> > > > it can be used during recompilation:
> > > >
> > > > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > >
> > > > Multiple raw profiles may be merged during this step.
> > > >
> > > > The data can be used either by the compiler if LTO isn't enabled:
> > > >
> > > > ... -fprofile-use=vmlinux.profdata ...
> > > >
> > > > or by LLD if LTO is enabled:
> > > >
> > > > ... -lto-cs-profile-file=vmlinux.profdata ...
> > > >
> > > > This initial submission is restricted to x86, as that's the platform we know
> > > > works. This restriction can be lifted once other platforms have been verified
> > > > to work with PGO.
> > > >
> > > > Note that this method of profiling the kernel is clang-native and isn't
> > > > compatible with clang's gcov support in kernel/gcov.
> > > >
> > > > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > >
> > >
> > > Hi Bill and Sami,
> > >
> > > I have seen the pull-request in the CBL issue tracker and had some
> > > questions in mind.
> > >
> > > Good you send this.
> > >
> > > First of all, I like to fetch any development stuff easily from a Git
> > > repository.
> >
> > The version in the pull-request in the CBL issue tracker is roughly
> > the same as this patch. (There are some changes, but they aren't
> > functionality changes.)
> >
> > > Can you offer this, please?
> > > What is the base for your work?
> > > I hope this is (fresh released) Linux v5.11-rc3.
> > >
> > This patch (and the PR on the CBL issue tracker) are from top-of-tree Linux.
> >
> > > I myself had some experiences with a PGO + ThinLTO optimized LLVM
> > > toolchain built with the help of tc-build.
> > > Here it takes very long to build it.
> > >
> > > This means I have some profile-data archived.
> > > Can I use it?
> > >
> > LLVM is more tolerant of "stale" profile data than gcov, so it's
> > possible that your archived profile data would still work, but I can't
> > guarantee that it will be better than using new profile data.
> >
> > > Is an own PGO + ThinLTO optimized LLVM toolchain pre-requirement for
> > > this or not?
> > > That is one of my important questions.
> > >
> > Do you mean that the LLVM tools (clang, llc, etc.) are compiled with
> > PGO + ThinLTO?
> >
>
> Yes.
>
> - Sedat -

No, having an optimized LLVM toolchain is not a requirement of this
patchset. It will make compiling the kernel faster but it does nothing
more than that.

Cheers,
Nathan

2021-01-11 20:16:15

by Fangrui Song

Subject: Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure

On 2021-01-11, 'Bill Wendling' via Clang Built Linux wrote:
>From: Sami Tolvanen <[email protected]>
>
>Enable the use of clang's Profile-Guided Optimization[1]. To generate a
>profile, the kernel is instrumented with PGO counters, a representative
>workload is run, and the raw profile data is collected from
>/sys/kernel/debug/pgo/profraw.
>
>The raw profile data must be processed by clang's "llvm-profdata" tool before
>it can be used during recompilation:
>
> $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
>Multiple raw profiles may be merged during this step.
>
>The data can be used either by the compiler if LTO isn't enabled:
>
> ... -fprofile-use=vmlinux.profdata ...
>
>or by LLD if LTO is enabled:
>
> ... -lto-cs-profile-file=vmlinux.profdata ...

This LLD option does not exist.
LLD does have some `--lto-*` options, but the `-lto-*` form is not supported
because it clashes with -l (see https://reviews.llvm.org/D79371).

(There is an older option, -fprofile-instr-generate, which does the
instrumentation in the Clang frontend, but it is not widely used for PGO.
It is used more for code coverage than for optimization.
Notably, it does not even implement the Kirchhoff's current law
optimization.)
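
For readers unfamiliar with the term: the "Kirchhoff's current law"
optimization presumably refers to the standard trick of placing counters on
only a subset of control-flow edges and recovering the remaining counts from
flow conservation (whatever enters a basic block must also leave it). The
sketch below is a hypothetical, hand-worked illustration for a single if/else
diamond, not LLVM's implementation; the struct and function names are made up
for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical illustration only, not LLVM code.
 *
 * Control-flow graph of:
 *     if (cond) then_block; else else_block;
 * It has four edges: entry->then, entry->else, then->exit, else->exit.
 */
struct diamond_counts {
	uint64_t entry_to_then;
	uint64_t entry_to_else;
	uint64_t then_to_exit;
	uint64_t else_to_exit;
};

/*
 * Assume only two counters were instrumented: how often the function was
 * entered, and how often the entry->then edge was taken. The other three
 * edge counts follow from flow conservation at each block.
 */
static struct diamond_counts derive_counts(uint64_t func_entries,
					   uint64_t entry_to_then)
{
	struct diamond_counts c;

	/* An edge cannot be taken more often than its source block runs. */
	assert(entry_to_then <= func_entries);

	c.entry_to_then = entry_to_then;
	/* Flow out of the entry block equals flow into it. */
	c.entry_to_else = func_entries - entry_to_then;
	/* Each branch block has one way in and one way out. */
	c.then_to_exit = c.entry_to_then;
	c.else_to_exit = c.entry_to_else;
	return c;
}
```

With 10 function entries and 7 executions of the then-branch,
derive_counts(10, 7) gives 3 for both else-side edges, so two counters are
enough to recover all four edge counts in this small case.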

-fprofile-use= is used by both regular PGO and context-sensitive PGO (CSPGO).

clang -flto=thin -fprofile-use= passes -plugin-opt=cs-profile-path= to the linker.
For regular PGO, this option is effectively a no-op (confirmed with the main CSPGO developer).

So I think the "or by LLD if LTO is enabled:" part should be removed.

>This initial submission is restricted to x86, as that's the platform we know
>works. This restriction can be lifted once other platforms have been verified
>to work with PGO.
>
>Note that this method of profiling the kernel is clang-native and isn't
>compatible with clang's gcov support in kernel/gcov.
>
>[1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>
>Signed-off-by: Sami Tolvanen <[email protected]>
>Co-developed-by: Bill Wendling <[email protected]>
>Signed-off-by: Bill Wendling <[email protected]>
>---
> Documentation/dev-tools/index.rst | 1 +
> Documentation/dev-tools/pgo.rst | 127 +++++++++
> MAINTAINERS | 9 +
> Makefile | 3 +
> arch/Kconfig | 1 +
> arch/arm/boot/bootp/Makefile | 1 +
> arch/arm/boot/compressed/Makefile | 1 +
> arch/arm/vdso/Makefile | 3 +-
> arch/arm64/kernel/vdso/Makefile | 3 +-
> arch/arm64/kvm/hyp/nvhe/Makefile | 1 +
> arch/mips/boot/compressed/Makefile | 1 +
> arch/mips/vdso/Makefile | 1 +
> arch/nds32/kernel/vdso/Makefile | 4 +-
> arch/parisc/boot/compressed/Makefile | 1 +
> arch/powerpc/kernel/Makefile | 6 +-
> arch/powerpc/kernel/trace/Makefile | 3 +-
> arch/powerpc/kernel/vdso32/Makefile | 1 +
> arch/powerpc/kernel/vdso64/Makefile | 1 +
> arch/powerpc/kexec/Makefile | 3 +-
> arch/powerpc/xmon/Makefile | 1 +
> arch/riscv/kernel/vdso/Makefile | 3 +-
> arch/s390/boot/Makefile | 1 +
> arch/s390/boot/compressed/Makefile | 1 +
> arch/s390/kernel/Makefile | 1 +
> arch/s390/kernel/vdso64/Makefile | 3 +-
> arch/s390/purgatory/Makefile | 1 +
> arch/sh/boot/compressed/Makefile | 1 +
> arch/sh/mm/Makefile | 1 +
> arch/sparc/vdso/Makefile | 1 +
> arch/x86/Kconfig | 1 +
> arch/x86/boot/Makefile | 1 +
> arch/x86/boot/compressed/Makefile | 1 +
> arch/x86/entry/vdso/Makefile | 1 +
> arch/x86/kernel/vmlinux.lds.S | 2 +
> arch/x86/platform/efi/Makefile | 1 +
> arch/x86/purgatory/Makefile | 1 +
> arch/x86/realmode/rm/Makefile | 1 +
> arch/x86/um/vdso/Makefile | 1 +
> drivers/firmware/efi/libstub/Makefile | 1 +
> drivers/s390/char/Makefile | 1 +
> include/asm-generic/vmlinux.lds.h | 44 +++
> kernel/Makefile | 1 +
> kernel/pgo/Kconfig | 34 +++
> kernel/pgo/Makefile | 5 +
> kernel/pgo/fs.c | 382 ++++++++++++++++++++++++++
> kernel/pgo/instrument.c | 147 ++++++++++
> kernel/pgo/pgo.h | 206 ++++++++++++++
> scripts/Makefile.lib | 10 +
> 48 files changed, 1017 insertions(+), 9 deletions(-)
> create mode 100644 Documentation/dev-tools/pgo.rst
> create mode 100644 kernel/pgo/Kconfig
> create mode 100644 kernel/pgo/Makefile
> create mode 100644 kernel/pgo/fs.c
> create mode 100644 kernel/pgo/instrument.c
> create mode 100644 kernel/pgo/pgo.h
>
>diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
>index f7809c7b1ba9e..8d6418e858062 100644
>--- a/Documentation/dev-tools/index.rst
>+++ b/Documentation/dev-tools/index.rst
>@@ -26,6 +26,7 @@ whole; patches welcome!
> kgdb
> kselftest
> kunit/index
>+ pgo
>
>
> .. only:: subproject and html
>diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
>new file mode 100644
>index 0000000000000..2ed7f549b20ef
>--- /dev/null
>+++ b/Documentation/dev-tools/pgo.rst
>@@ -0,0 +1,127 @@
>+.. SPDX-License-Identifier: GPL-2.0
>+
>+===============================
>+Using PGO with the Linux kernel
>+===============================
>+
>+Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
>+when building with Clang. The profiling data is exported via the ``pgo``
>+debugfs directory.
>+
>+.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>+
>+
>+Preparation
>+===========
>+
>+Configure the kernel with:
>+
>+.. code-block:: make
>+
>+ CONFIG_DEBUG_FS=y
>+ CONFIG_PGO_CLANG=y

Clang also supports SamplePGO (-fprofile-sample-use; AutoFDO) and
context-sensitive PGO (post-inline, -fcs-profile-generate).
Context-sensitive SamplePGO is under development.

If the naming does not make it difficult to add other PGO features in the
future, I am happy with CONFIG_PGO_CLANG :)


Aside from the question and the issue above, the description looks good to me.

>+Note that kernels compiled with profiling flags will be significantly larger
>+and run slower.
>+
>+Profiling data will only become accessible once debugfs has been mounted:
>+
>+.. code-block:: sh
>+
>+ mount -t debugfs none /sys/kernel/debug
>+
>+
>+Customization
>+=============
>+
>+You can enable or disable profiling for individual files and directories by
>+adding a line similar to the following to the respective kernel Makefile:
>+
>+- For a single file (e.g. main.o)
>+
>+ .. code-block:: make
>+
>+ PGO_PROFILE_main.o := y
>+
>+- For all files in one directory
>+
>+ .. code-block:: make
>+
>+ PGO_PROFILE := y
>+
>+To exclude files from being profiled use
>+
>+ .. code-block:: make
>+
>+ PGO_PROFILE_main.o := n
>+
>+and
>+
>+ .. code-block:: make
>+
>+ PGO_PROFILE := n
>+
>+Only files which are linked to the main kernel image or are compiled as kernel
>+modules are supported by this mechanism.
>+
>+
>+Files
>+=====
>+
>+The PGO kernel support creates the following files in debugfs:
>+
>+``/sys/kernel/debug/pgo``
>+ Parent directory for all PGO-related files.
>+
>+``/sys/kernel/debug/pgo/reset``
>+ Global reset file: resets all coverage data to zero when written to.
>+
>+``/sys/kernel/debug/pgo/profraw``
>+ The raw PGO data that must be processed with ``llvm-profdata``.
>+
>+
>+Workflow
>+========
>+
>+The PGO kernel can be run on the host or test machines. The data, though, should
>+be analyzed with tools from the same Clang version that compiled the kernel.
>+Clang is tolerant of version skew, but it is easier to use the same Clang
>+version.
>+
>+The profiling data is useful for optimizing the kernel, analyzing coverage,
>+etc. Clang offers tools to perform these tasks.
>+
>+Here is an example workflow for profiling an instrumented kernel with PGO and
>+using the result to optimize the kernel:
>+
>+1) Install the kernel on the TEST machine.
>+
>+2) Reset the data counters right before running the load tests
>+
>+ .. code-block:: sh
>+
>+ echo 1 > /sys/kernel/debug/pgo/reset
>+
>+3) Run the load tests.
>+
>+4) Collect the raw profile data
>+
>+ .. code-block:: sh
>+
>+ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
>+
>+5) (Optional) Download the raw profile data to the HOST machine.
>+
>+6) Process the raw profile data
>+
>+ .. code-block:: sh
>+
>+ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>+
>+ Note that multiple raw profile data files can be merged during this step.
>+
>+7) Rebuild the kernel using the profile data (PGO disabled)
>+
>+ .. code-block:: sh
>+
>+ make ... KCFLAGS=-fprofile-use=vmlinux.profdata
>diff --git a/MAINTAINERS b/MAINTAINERS
>index 6390491b07e51..7a98bdaab9861 100644
>--- a/MAINTAINERS
>+++ b/MAINTAINERS
>@@ -13955,6 +13955,15 @@ S: Maintained
> F: include/linux/personality.h
> F: include/uapi/linux/personality.h
>
>+PGO BASED KERNEL PROFILING
>+M: Sami Tolvanen <[email protected]>
>+M: Bill Wendling <[email protected]>
>+R: Nathan Chancellor <[email protected]>
>+R: Nick Desaulniers <[email protected]>
>+S: Supported
>+F: Documentation/dev-tools/pgo.rst
>+F: kernel/pgo
>+
> PHOENIX RC FLIGHT CONTROLLER ADAPTER
> M: Marcus Folkesson <[email protected]>
> L: [email protected]
>diff --git a/Makefile b/Makefile
>index 8b2c3f88ee5ea..4f42957c78134 100644
>--- a/Makefile
>+++ b/Makefile
>@@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
> # Defaults to vmlinux, but the arch makefile usually adds further targets
> all: vmlinux
>
>+CFLAGS_PGO_CLANG := -fprofile-generate
>+export CFLAGS_PGO_CLANG
>+
> CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
> $(call cc-option,-fno-tree-loop-im) \
> $(call cc-disable-warning,maybe-uninitialized,)
>diff --git a/arch/Kconfig b/arch/Kconfig
>index 78c6f05b10f91..a7a6ab7d204dc 100644
>--- a/arch/Kconfig
>+++ b/arch/Kconfig
>@@ -1106,6 +1106,7 @@ config ARCH_SUPPORTS_DEBUG_PAGEALLOC
> bool
>
> source "kernel/gcov/Kconfig"
>+source "kernel/pgo/Kconfig"
>
> source "scripts/gcc-plugins/Kconfig"
>
>diff --git a/arch/arm/boot/bootp/Makefile b/arch/arm/boot/bootp/Makefile
>index 981a8d03f064c..523bd58df0a4b 100644
>--- a/arch/arm/boot/bootp/Makefile
>+++ b/arch/arm/boot/bootp/Makefile
>@@ -7,6 +7,7 @@
> #
>
> GCOV_PROFILE := n
>+PGO_PROFILE := n
>
> LDFLAGS_bootp := --no-undefined -X \
> --defsym initrd_phys=$(INITRD_PHYS) \
>diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
>index fb521efcc6c20..5fd0fd85fc0e5 100644
>--- a/arch/arm/boot/compressed/Makefile
>+++ b/arch/arm/boot/compressed/Makefile
>@@ -24,6 +24,7 @@ OBJS += hyp-stub.o
> endif
>
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> KASAN_SANITIZE := n
>
> # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
>diff --git a/arch/arm/vdso/Makefile b/arch/arm/vdso/Makefile
>index b558bee0e1f6b..11f6ce4b48b56 100644
>--- a/arch/arm/vdso/Makefile
>+++ b/arch/arm/vdso/Makefile
>@@ -36,8 +36,9 @@ else
> CFLAGS_vgettimeofday.o = -O2 -include $(c-gettimeofday-y)
> endif
>
>-# Disable gcov profiling for VDSO code
>+# Disable gcov and PGO profiling for VDSO code
> GCOV_PROFILE := n
>+PGO_PROFILE := n
>
> # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
> KCOV_INSTRUMENT := n
>diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
>index cd9c3fa25902f..d48fc0df07020 100644
>--- a/arch/arm64/kernel/vdso/Makefile
>+++ b/arch/arm64/kernel/vdso/Makefile
>@@ -41,8 +41,9 @@ ifneq ($(c-gettimeofday-y),)
> CFLAGS_vgettimeofday.o += -include $(c-gettimeofday-y)
> endif
>
>-# Disable gcov profiling for VDSO code
>+# Disable gcov and PGO profiling for VDSO code
> GCOV_PROFILE := n
>+PGO_PROFILE := n
>
> obj-y += vdso.o
> targets += vdso.lds
>diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
>index 1f1e351c5fe2b..ad128ecdbfbdf 100644
>--- a/arch/arm64/kvm/hyp/nvhe/Makefile
>+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
>@@ -60,6 +60,7 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS), $(KBUILD_CFLAG
> # compiler instrumentation that inserts callbacks or checks into the code may
> # cause crashes. Just disable it.
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> KASAN_SANITIZE := n
> UBSAN_SANITIZE := n
> KCOV_INSTRUMENT := n
>diff --git a/arch/mips/boot/compressed/Makefile b/arch/mips/boot/compressed/Makefile
>index 47cd9dc7454af..0855ea12f2c7f 100644
>--- a/arch/mips/boot/compressed/Makefile
>+++ b/arch/mips/boot/compressed/Makefile
>@@ -37,6 +37,7 @@ KBUILD_AFLAGS := $(KBUILD_AFLAGS) -D__ASSEMBLY__ \
> # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
> KCOV_INSTRUMENT := n
> GCOV_PROFILE := n
>+PGO_PROFILE := n
>
> # decompressor objects (linked with vmlinuz)
> vmlinuzobjs-y := $(obj)/head.o $(obj)/decompress.o $(obj)/string.o
>diff --git a/arch/mips/vdso/Makefile b/arch/mips/vdso/Makefile
>index 5810cc12bc1d9..d7eb64de35eae 100644
>--- a/arch/mips/vdso/Makefile
>+++ b/arch/mips/vdso/Makefile
>@@ -66,6 +66,7 @@ ldflags-y := -Bsymbolic --no-undefined -soname=linux-vdso.so.1 \
> CFLAGS_REMOVE_vdso.o = -pg
>
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> UBSAN_SANITIZE := n
> KCOV_INSTRUMENT := n
>
>diff --git a/arch/nds32/kernel/vdso/Makefile b/arch/nds32/kernel/vdso/Makefile
>index 55df25ef00578..f2b53ee2124b7 100644
>--- a/arch/nds32/kernel/vdso/Makefile
>+++ b/arch/nds32/kernel/vdso/Makefile
>@@ -15,9 +15,9 @@ obj-vdso := $(addprefix $(obj)/, $(obj-vdso))
> ccflags-y := -shared -fno-common -fno-builtin -nostdlib -fPIC -Wl,-shared -g \
> -Wl,-soname=linux-vdso.so.1 -Wl,--hash-style=sysv
>
>-# Disable gcov profiling for VDSO code
>+# Disable gcov and PGO profiling for VDSO code
> GCOV_PROFILE := n
>-
>+PGO_PROFILE := n
>
> obj-y += vdso.o
> targets += vdso.lds
>diff --git a/arch/parisc/boot/compressed/Makefile b/arch/parisc/boot/compressed/Makefile
>index dff4536875305..5cf93a67f7da7 100644
>--- a/arch/parisc/boot/compressed/Makefile
>+++ b/arch/parisc/boot/compressed/Makefile
>@@ -7,6 +7,7 @@
>
> KCOV_INSTRUMENT := n
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> UBSAN_SANITIZE := n
>
> targets := vmlinux.lds vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2
>diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
>index fe2ef598e2ead..c642c046660d7 100644
>--- a/arch/powerpc/kernel/Makefile
>+++ b/arch/powerpc/kernel/Makefile
>@@ -153,17 +153,21 @@ endif
> obj-$(CONFIG_PPC_SECURE_BOOT) += secure_boot.o ima_arch.o secvar-ops.o
> obj-$(CONFIG_PPC_SECVAR_SYSFS) += secvar-sysfs.o
>
>-# Disable GCOV, KCOV & sanitizers in odd or sensitive code
>+# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
> GCOV_PROFILE_prom_init.o := n
>+PGO_PROFILE_prom_init.o := n
> KCOV_INSTRUMENT_prom_init.o := n
> UBSAN_SANITIZE_prom_init.o := n
> GCOV_PROFILE_kprobes.o := n
>+PGO_PROFILE_kprobes.o := n
> KCOV_INSTRUMENT_kprobes.o := n
> UBSAN_SANITIZE_kprobes.o := n
> GCOV_PROFILE_kprobes-ftrace.o := n
>+PGO_PROFILE_kprobes-ftrace.o := n
> KCOV_INSTRUMENT_kprobes-ftrace.o := n
> UBSAN_SANITIZE_kprobes-ftrace.o := n
> GCOV_PROFILE_syscall_64.o := n
>+PGO_PROFILE_syscall_64.o := n
> KCOV_INSTRUMENT_syscall_64.o := n
> UBSAN_SANITIZE_syscall_64.o := n
> UBSAN_SANITIZE_vdso.o := n
>diff --git a/arch/powerpc/kernel/trace/Makefile b/arch/powerpc/kernel/trace/Makefile
>index 858503775c583..7d72ae7d4f8c6 100644
>--- a/arch/powerpc/kernel/trace/Makefile
>+++ b/arch/powerpc/kernel/trace/Makefile
>@@ -23,7 +23,8 @@ obj-$(CONFIG_TRACING) += trace_clock.o
> obj-$(CONFIG_PPC64) += $(obj64-y)
> obj-$(CONFIG_PPC32) += $(obj32-y)
>
>-# Disable GCOV, KCOV & sanitizers in odd or sensitive code
>+# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
> GCOV_PROFILE_ftrace.o := n
>+PGO_PROFILE_ftrace.o := n
> KCOV_INSTRUMENT_ftrace.o := n
> UBSAN_SANITIZE_ftrace.o := n
>diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
>index 9cb6f524854b9..655e159975a04 100644
>--- a/arch/powerpc/kernel/vdso32/Makefile
>+++ b/arch/powerpc/kernel/vdso32/Makefile
>@@ -34,6 +34,7 @@ targets := $(obj-vdso32) vdso32.so.dbg
> obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32))
>
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> KCOV_INSTRUMENT := n
> UBSAN_SANITIZE := n
> KASAN_SANITIZE := n
>diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile
>index bf363ff371521..12c286f5afc16 100644
>--- a/arch/powerpc/kernel/vdso64/Makefile
>+++ b/arch/powerpc/kernel/vdso64/Makefile
>@@ -21,6 +21,7 @@ targets := $(obj-vdso64) vdso64.so.dbg
> obj-vdso64 := $(addprefix $(obj)/, $(obj-vdso64))
>
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> KCOV_INSTRUMENT := n
> UBSAN_SANITIZE := n
> KASAN_SANITIZE := n
>diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile
>index 4aff6846c7726..1c7f65e3cb969 100644
>--- a/arch/powerpc/kexec/Makefile
>+++ b/arch/powerpc/kexec/Makefile
>@@ -16,7 +16,8 @@ endif
> endif
>
>
>-# Disable GCOV, KCOV & sanitizers in odd or sensitive code
>+# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
> GCOV_PROFILE_core_$(BITS).o := n
>+PGO_PROFILE_core_$(BITS).o := n
> KCOV_INSTRUMENT_core_$(BITS).o := n
> UBSAN_SANITIZE_core_$(BITS).o := n
>diff --git a/arch/powerpc/xmon/Makefile b/arch/powerpc/xmon/Makefile
>index eb25d7554ffd1..7aff80d18b44b 100644
>--- a/arch/powerpc/xmon/Makefile
>+++ b/arch/powerpc/xmon/Makefile
>@@ -2,6 +2,7 @@
> # Makefile for xmon
>
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> KCOV_INSTRUMENT := n
> UBSAN_SANITIZE := n
> KASAN_SANITIZE := n
>diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
>index 0cfd6da784f84..882340dc3c647 100644
>--- a/arch/riscv/kernel/vdso/Makefile
>+++ b/arch/riscv/kernel/vdso/Makefile
>@@ -32,8 +32,9 @@ CPPFLAGS_vdso.lds += -P -C -U$(ARCH)
> # Disable -pg to prevent insert call site
> CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
>
>-# Disable gcov profiling for VDSO code
>+# Disable gcov and PGO profiling for VDSO code
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> KCOV_INSTRUMENT := n
>
> # Force dependency
>diff --git a/arch/s390/boot/Makefile b/arch/s390/boot/Makefile
>index 41a64b8dce252..bee4a32040e79 100644
>--- a/arch/s390/boot/Makefile
>+++ b/arch/s390/boot/Makefile
>@@ -5,6 +5,7 @@
>
> KCOV_INSTRUMENT := n
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> UBSAN_SANITIZE := n
> KASAN_SANITIZE := n
>
>diff --git a/arch/s390/boot/compressed/Makefile b/arch/s390/boot/compressed/Makefile
>index de18dab518bb6..c3ab883e8425a 100644
>--- a/arch/s390/boot/compressed/Makefile
>+++ b/arch/s390/boot/compressed/Makefile
>@@ -7,6 +7,7 @@
>
> KCOV_INSTRUMENT := n
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> UBSAN_SANITIZE := n
> KASAN_SANITIZE := n
>
>diff --git a/arch/s390/kernel/Makefile b/arch/s390/kernel/Makefile
>index dd73b7f074237..bd857aacad794 100644
>--- a/arch/s390/kernel/Makefile
>+++ b/arch/s390/kernel/Makefile
>@@ -14,6 +14,7 @@ CFLAGS_REMOVE_early.o = $(CC_FLAGS_FTRACE)
> endif
>
> GCOV_PROFILE_early.o := n
>+PGO_PROFILE_early.o := n
> KCOV_INSTRUMENT_early.o := n
> UBSAN_SANITIZE_early.o := n
> KASAN_SANITIZE_ipl.o := n
>diff --git a/arch/s390/kernel/vdso64/Makefile b/arch/s390/kernel/vdso64/Makefile
>index a6e0fb6b91d6c..d7c43b7c1db96 100644
>--- a/arch/s390/kernel/vdso64/Makefile
>+++ b/arch/s390/kernel/vdso64/Makefile
>@@ -35,8 +35,9 @@ obj-y += vdso64_wrapper.o
> targets += vdso64.lds
> CPPFLAGS_vdso64.lds += -P -C -U$(ARCH)
>
>-# Disable gcov profiling, ubsan and kasan for VDSO code
>+# Disable gcov and PGO profiling, ubsan and kasan for VDSO code
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> UBSAN_SANITIZE := n
> KASAN_SANITIZE := n
>
>diff --git a/arch/s390/purgatory/Makefile b/arch/s390/purgatory/Makefile
>index c57f8c40e9926..9aef584e98466 100644
>--- a/arch/s390/purgatory/Makefile
>+++ b/arch/s390/purgatory/Makefile
>@@ -17,6 +17,7 @@ $(obj)/mem.o: $(srctree)/arch/s390/lib/mem.S FORCE
>
> KCOV_INSTRUMENT := n
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> UBSAN_SANITIZE := n
> KASAN_SANITIZE := n
>
>diff --git a/arch/sh/boot/compressed/Makefile b/arch/sh/boot/compressed/Makefile
>index 589d2d8a573db..ae19aeeb3964c 100644
>--- a/arch/sh/boot/compressed/Makefile
>+++ b/arch/sh/boot/compressed/Makefile
>@@ -13,6 +13,7 @@ targets := vmlinux vmlinux.bin vmlinux.bin.gz \
> OBJECTS = $(obj)/head_32.o $(obj)/misc.o $(obj)/cache.o
>
> GCOV_PROFILE := n
>+PGO_PROFILE := n
>
> #
> # IMAGE_OFFSET is the load offset of the compression loader
>diff --git a/arch/sh/mm/Makefile b/arch/sh/mm/Makefile
>index f69ddc70b1465..ea2782c631f43 100644
>--- a/arch/sh/mm/Makefile
>+++ b/arch/sh/mm/Makefile
>@@ -43,3 +43,4 @@ obj-$(CONFIG_UNCACHED_MAPPING) += uncached.o
> obj-$(CONFIG_HAVE_SRAM_POOL) += sram.o
>
> GCOV_PROFILE_pmb.o := n
>+PGO_PROFILE_pmb.o := n
>diff --git a/arch/sparc/vdso/Makefile b/arch/sparc/vdso/Makefile
>index c5e1545bc5cf9..ab5f3783fe199 100644
>--- a/arch/sparc/vdso/Makefile
>+++ b/arch/sparc/vdso/Makefile
>@@ -115,6 +115,7 @@ quiet_cmd_vdso = VDSO $@
>
> VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 -Bsymbolic
> GCOV_PROFILE := n
>+PGO_PROFILE := n
>
> #
> # Install the unstripped copies of vdso*.so. If our toolchain supports
>diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>index 7b6dd10b162ac..a751b4f8f6645 100644
>--- a/arch/x86/Kconfig
>+++ b/arch/x86/Kconfig
>@@ -95,6 +95,7 @@ config X86
> select ARCH_SUPPORTS_DEBUG_PAGEALLOC
> select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
> select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
>+ select ARCH_SUPPORTS_PGO_CLANG if X86_64
> select ARCH_USE_BUILTIN_BSWAP
> select ARCH_USE_QUEUED_RWLOCKS
> select ARCH_USE_QUEUED_SPINLOCKS
>diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
>index fe605205b4ce2..383853e32f673 100644
>--- a/arch/x86/boot/Makefile
>+++ b/arch/x86/boot/Makefile
>@@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> UBSAN_SANITIZE := n
>
> $(obj)/bzImage: asflags-y := $(SVGA_MODE)
>diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
>index e0bc3988c3faa..ed12ab65f6065 100644
>--- a/arch/x86/boot/compressed/Makefile
>+++ b/arch/x86/boot/compressed/Makefile
>@@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
>
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> UBSAN_SANITIZE :=n
>
> KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
>diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
>index 02e3e42f380bd..26e2b3af0145c 100644
>--- a/arch/x86/entry/vdso/Makefile
>+++ b/arch/x86/entry/vdso/Makefile
>@@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO $@
> VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
> $(call ld-option, --eh-frame-hdr) -Bsymbolic
> GCOV_PROFILE := n
>+PGO_PROFILE := n
>
> quiet_cmd_vdso_and_check = VDSO $@
> cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
>diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
>index efd9e9ea17f25..f6cab2316c46a 100644
>--- a/arch/x86/kernel/vmlinux.lds.S
>+++ b/arch/x86/kernel/vmlinux.lds.S
>@@ -184,6 +184,8 @@ SECTIONS
>
> BUG_TABLE
>
>+ PGO_CLANG_DATA
>+
> ORC_UNWIND_TABLE
>
> . = ALIGN(PAGE_SIZE);
>diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
>index 84b09c230cbd5..5f22b31446ad4 100644
>--- a/arch/x86/platform/efi/Makefile
>+++ b/arch/x86/platform/efi/Makefile
>@@ -2,6 +2,7 @@
> OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
> KASAN_SANITIZE := n
> GCOV_PROFILE := n
>+PGO_PROFILE := n
>
> obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
> obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
>diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
>index 95ea17a9d20cb..36f20e99da0bc 100644
>--- a/arch/x86/purgatory/Makefile
>+++ b/arch/x86/purgatory/Makefile
>@@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
>
> # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> KASAN_SANITIZE := n
> UBSAN_SANITIZE := n
> KCSAN_SANITIZE := n
>diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
>index 83f1b6a56449f..21797192f958f 100644
>--- a/arch/x86/realmode/rm/Makefile
>+++ b/arch/x86/realmode/rm/Makefile
>@@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> UBSAN_SANITIZE := n
>diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
>index 5943387e3f357..54f5768f58530 100644
>--- a/arch/x86/um/vdso/Makefile
>+++ b/arch/x86/um/vdso/Makefile
>@@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@
>
> VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
> GCOV_PROFILE := n
>+PGO_PROFILE := n
>
> #
> # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
>diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
>index 8a94388e38b33..2d81623b33f29 100644
>--- a/drivers/firmware/efi/libstub/Makefile
>+++ b/drivers/firmware/efi/libstub/Makefile
>@@ -40,6 +40,7 @@ KBUILD_CFLAGS := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
> KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
>
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> # Sanitizer runtimes are unavailable and cannot be linked here.
> KASAN_SANITIZE := n
> KCSAN_SANITIZE := n
>diff --git a/drivers/s390/char/Makefile b/drivers/s390/char/Makefile
>index c6fdb81a068a6..bf6c5db5da1fc 100644
>--- a/drivers/s390/char/Makefile
>+++ b/drivers/s390/char/Makefile
>@@ -9,6 +9,7 @@ CFLAGS_REMOVE_sclp_early_core.o = $(CC_FLAGS_FTRACE)
> endif
>
> GCOV_PROFILE_sclp_early_core.o := n
>+PGO_PROFILE_sclp_early_core.o := n
> KCOV_INSTRUMENT_sclp_early_core.o := n
> UBSAN_SANITIZE_sclp_early_core.o := n
> KASAN_SANITIZE_sclp_early_core.o := n
>diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
>index b2b3d81b1535a..3a591bb18c5fb 100644
>--- a/include/asm-generic/vmlinux.lds.h
>+++ b/include/asm-generic/vmlinux.lds.h
>@@ -316,6 +316,49 @@
> #define THERMAL_TABLE(name)
> #endif
>
>+#ifdef CONFIG_PGO_CLANG
>+#define PGO_CLANG_DATA \
>+ __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
>+ . = ALIGN(8); \
>+ __llvm_prf_start = .; \
>+ __llvm_prf_data_start = .; \
>+ KEEP(*(__llvm_prf_data)) \
>+ . = ALIGN(8); \
>+ __llvm_prf_data_end = .; \
>+ } \
>+ __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
>+ . = ALIGN(8); \
>+ __llvm_prf_cnts_start = .; \
>+ KEEP(*(__llvm_prf_cnts)) \
>+ . = ALIGN(8); \
>+ __llvm_prf_cnts_end = .; \
>+ } \
>+ __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
>+ . = ALIGN(8); \
>+ __llvm_prf_names_start = .; \
>+ KEEP(*(__llvm_prf_names)) \
>+ . = ALIGN(8); \
>+ __llvm_prf_names_end = .; \
>+ . = ALIGN(8); \
>+ } \
>+ __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
>+ __llvm_prf_vals_start = .; \
>+ KEEP(*(__llvm_prf_vals)) \
>+ . = ALIGN(8); \
>+ __llvm_prf_vals_end = .; \
>+ . = ALIGN(8); \
>+ } \
>+ __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
>+ __llvm_prf_vnds_start = .; \
>+ KEEP(*(__llvm_prf_vnds)) \
>+ . = ALIGN(8); \
>+ __llvm_prf_vnds_end = .; \
>+ __llvm_prf_end = .; \
>+ }
>+#else
>+#define PGO_CLANG_DATA
>+#endif
>+
> #define KERNEL_DTB() \
> STRUCT_ALIGN(); \
> __dtb_start = .; \
>@@ -1125,6 +1168,7 @@
> CONSTRUCTORS \
> } \
> BUG_TABLE \
>+ PGO_CLANG_DATA
>
> #define INIT_TEXT_SECTION(inittext_align) \
> . = ALIGN(inittext_align); \
>diff --git a/kernel/Makefile b/kernel/Makefile
>index aa7368c7eabf3..0b34ca228ba46 100644
>--- a/kernel/Makefile
>+++ b/kernel/Makefile
>@@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
> obj-$(CONFIG_KCSAN) += kcsan/
> obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
> obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
>+obj-$(CONFIG_PGO_CLANG) += pgo/
>
> obj-$(CONFIG_PERF_EVENTS) += events/
>
>diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
>new file mode 100644
>index 0000000000000..318d36bb3d106
>--- /dev/null
>+++ b/kernel/pgo/Kconfig
>@@ -0,0 +1,34 @@
>+# SPDX-License-Identifier: GPL-2.0-only
>+menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
>+
>+config ARCH_SUPPORTS_PGO_CLANG
>+ bool
>+
>+config PGO_CLANG
>+ bool "Enable clang's PGO-based kernel profiling"
>+ depends on DEBUG_FS
>+ depends on ARCH_SUPPORTS_PGO_CLANG
>+ help
>+ This option enables clang's PGO (Profile Guided Optimization) based
>+ code profiling to better optimize the kernel.
>+
>+ If unsure, say N.
>+
>+ Run a representative workload for your application on a kernel
>+ compiled with this option and download the raw profile file from
>+ /sys/kernel/debug/pgo/profraw. This file needs to be processed with
>+ llvm-profdata. It may be merged with other collected raw profiles.
>+
>+ Copy the resulting profile file into vmlinux.profdata, and build with
>+ KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
>+ kernel.
>+
>+ Note that a kernel compiled with profiling flags will be
>+ significantly larger and run slower. Also be sure to exclude files
>+ from profiling which are not linked to the kernel image to prevent
>+ linker errors.
>+
>+ Note that the debugfs filesystem has to be mounted to access
>+ profiling data.
>+
>+endmenu
>diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
>new file mode 100644
>index 0000000000000..41e27cefd9a47
>--- /dev/null
>+++ b/kernel/pgo/Makefile
>@@ -0,0 +1,5 @@
>+# SPDX-License-Identifier: GPL-2.0
>+GCOV_PROFILE := n
>+PGO_PROFILE := n
>+
>+obj-y += fs.o instrument.o
>diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
>new file mode 100644
>index 0000000000000..790a8df037bfc
>--- /dev/null
>+++ b/kernel/pgo/fs.c
>@@ -0,0 +1,382 @@
>+// SPDX-License-Identifier: GPL-2.0
>+/*
>+ * Copyright (C) 2019 Google, Inc.
>+ *
>+ * Author:
>+ * Sami Tolvanen <[email protected]>
>+ *
>+ * This software is licensed under the terms of the GNU General Public
>+ * License version 2, as published by the Free Software Foundation, and
>+ * may be copied, distributed, and modified under those terms.
>+ *
>+ * This program is distributed in the hope that it will be useful,
>+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
>+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>+ * GNU General Public License for more details.
>+ *
>+ */
>+
>+#define pr_fmt(fmt) "pgo: " fmt
>+
>+#include <linux/kernel.h>
>+#include <linux/debugfs.h>
>+#include <linux/fs.h>
>+#include <linux/module.h>
>+#include <linux/slab.h>
>+#include <linux/vmalloc.h>
>+#include "pgo.h"
>+
>+static struct dentry *directory;
>+
>+struct prf_private_data {
>+ void *buffer;
>+ unsigned long size;
>+};
>+
>+/*
>+ * Raw profile data format:
>+ *
>+ * - llvm_prf_header
>+ * - __llvm_prf_data
>+ * - __llvm_prf_cnts
>+ * - __llvm_prf_names
>+ * - zero padding to 8 bytes
>+ * - for each llvm_prf_data in __llvm_prf_data:
>+ * - llvm_prf_value_data
>+ * - llvm_prf_value_record + site count array
>+ * - llvm_prf_value_node_data
>+ * ...
>+ * ...
>+ * ...
>+ */
>+
>+static void prf_fill_header(void **buffer)
>+{
>+ struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
>+
>+ header->magic = LLVM_PRF_MAGIC;
>+ header->version = LLVM_PRF_VARIANT_MASK_IR | LLVM_PRF_VERSION;
>+ header->data_size = prf_data_count();
>+ header->padding_bytes_before_counters = 0;
>+ header->counters_size = prf_cnts_count();
>+ header->padding_bytes_after_counters = 0;
>+ header->names_size = prf_names_count();
>+ header->counters_delta = (u64)__llvm_prf_cnts_start;
>+ header->names_delta = (u64)__llvm_prf_names_start;
>+ header->value_kind_last = LLVM_PRF_IPVK_LAST;
>+
>+ *buffer += sizeof(*header);
>+}
>+
>+/*
>+ * Copy the source into the buffer, incrementing the pointer into buffer in the
>+ * process.
>+ */
>+static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
>+{
>+ memcpy(*buffer, src, size);
>+ *buffer += size;
>+}
>+
>+static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
>+{
>+ struct llvm_prf_value_node **nodes =
>+ (struct llvm_prf_value_node **)p->values;
>+ u32 kinds = 0;
>+ u32 size = 0;
>+ unsigned int kind;
>+ unsigned int n;
>+ unsigned int s = 0;
>+
>+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
>+ unsigned int sites = p->num_value_sites[kind];
>+
>+ if (!sites)
>+ continue;
>+
>+ /* Record + site count array */
>+ size += prf_get_value_record_size(sites);
>+ kinds++;
>+
>+ if (!nodes)
>+ continue;
>+
>+ for (n = 0; n < sites; n++) {
>+ u32 count = 0;
>+ struct llvm_prf_value_node *site = nodes[s + n];
>+
>+ while (site && ++count <= U8_MAX)
>+ site = site->next;
>+
>+ size += count *
>+ sizeof(struct llvm_prf_value_node_data);
>+ }
>+
>+ s += sites;
>+ }
>+
>+ if (size)
>+ size += sizeof(struct llvm_prf_value_data);
>+
>+ if (value_kinds)
>+ *value_kinds = kinds;
>+
>+ return size;
>+}
>+
>+static u32 prf_get_value_size(void)
>+{
>+ u32 size = 0;
>+ struct llvm_prf_data *p;
>+
>+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
>+ size += __prf_get_value_size(p, NULL);
>+
>+ return size;
>+}
>+
>+/* Serialize the value profiling data of one function. */
>+static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
>+{
>+ struct llvm_prf_value_data header;
>+ struct llvm_prf_value_node **nodes =
>+ (struct llvm_prf_value_node **)p->values;
>+ unsigned int kind;
>+ unsigned int n;
>+ unsigned int s = 0;
>+
>+ header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
>+
>+ if (!header.num_value_kinds)
>+ /* Nothing to write. */
>+ return;
>+
>+ prf_copy_to_buffer(buffer, &header, sizeof(header));
>+
>+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
>+ struct llvm_prf_value_record *record;
>+ u8 *counts;
>+ unsigned int sites = p->num_value_sites[kind];
>+
>+ if (!sites)
>+ continue;
>+
>+ /* Profiling value record. */
>+ record = *(struct llvm_prf_value_record **)buffer;
>+ *buffer += prf_get_value_record_header_size();
>+
>+ record->kind = kind;
>+ record->num_value_sites = sites;
>+
>+ /* Site count array. */
>+ counts = *(u8 **)buffer;
>+ *buffer += prf_get_value_record_site_count_size(sites);
>+
>+ /*
>+ * If we don't have nodes, we can skip updating the site count
>+ * array, because the buffer is zero filled.
>+ */
>+ if (!nodes)
>+ continue;
>+
>+ for (n = 0; n < sites; n++) {
>+ u32 count = 0;
>+ struct llvm_prf_value_node *site = nodes[s + n];
>+
>+ while (site && ++count <= U8_MAX) {
>+ prf_copy_to_buffer(buffer, site,
>+ sizeof(struct llvm_prf_value_node_data));
>+ site = site->next;
>+ }
>+
>+ counts[n] = (u8)count;
>+ }
>+
>+ s += sites;
>+ }
>+}
>+
>+static void prf_serialize_values(void **buffer)
>+{
>+ struct llvm_prf_data *p;
>+
>+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
>+ prf_serialize_value(p, buffer);
>+}
>+
>+static inline unsigned long prf_get_padding(unsigned long size)
>+{
>+ return (8 - (size % 8)) % 8;
>+}
>+
>+static unsigned long prf_buffer_size(void)
>+{
>+ return sizeof(struct llvm_prf_header) +
>+ prf_data_size() +
>+ prf_cnts_size() +
>+ prf_names_size() +
>+ prf_get_padding(prf_names_size()) +
>+ prf_get_value_size();
>+}
>+
>+/* Serialize the profiling data into a format LLVM's tools can understand. */
>+static int prf_serialize(struct prf_private_data *p)
>+{
>+ int err = 0;
>+ void *buffer;
>+
>+ p->size = prf_buffer_size();
>+ p->buffer = vzalloc(p->size);
>+
>+ if (!p->buffer) {
>+ err = -ENOMEM;
>+ goto out;
>+ }
>+
>+ buffer = p->buffer;
>+
>+ prf_fill_header(&buffer);
>+ prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
>+ prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
>+ prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
>+ buffer += prf_get_padding(prf_names_size());
>+
>+ prf_serialize_values(&buffer);
>+
>+out:
>+ return err;
>+}
>+
>+/* open() implementation for PGO. Creates a copy of the profiling data set. */
>+static int prf_open(struct inode *inode, struct file *file)
>+{
>+ struct prf_private_data *data;
>+ unsigned long flags;
>+ int err;
>+
>+ data = kzalloc(sizeof(*data), GFP_KERNEL);
>+ if (!data) {
>+ err = -ENOMEM;
>+ goto out;
>+ }
>+
>+ flags = prf_lock();
>+
>+ err = prf_serialize(data);
>+ if (err) {
>+ kfree(data);
>+ goto out_unlock;
>+ }
>+
>+ file->private_data = data;
>+
>+out_unlock:
>+ prf_unlock(flags);
>+out:
>+ return err;
>+}
>+
>+/* read() implementation for PGO. */
>+static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
>+ loff_t *ppos)
>+{
>+ struct prf_private_data *data = file->private_data;
>+
>+ BUG_ON(!data);
>+
>+ return simple_read_from_buffer(buf, count, ppos, data->buffer,
>+ data->size);
>+}
>+
>+/* release() implementation for PGO. Release resources allocated by open(). */
>+static int prf_release(struct inode *inode, struct file *file)
>+{
>+ struct prf_private_data *data = file->private_data;
>+
>+ if (data) {
>+ vfree(data->buffer);
>+ kfree(data);
>+ }
>+
>+ return 0;
>+}
>+
>+static const struct file_operations prf_fops = {
>+ .owner = THIS_MODULE,
>+ .open = prf_open,
>+ .read = prf_read,
>+ .llseek = default_llseek,
>+ .release = prf_release
>+};
>+
>+/* write() implementation for resetting PGO's profile data. */
>+static ssize_t reset_write(struct file *file, const char __user *addr,
>+ size_t len, loff_t *pos)
>+{
>+ struct llvm_prf_data *data;
>+
>+ memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
>+
>+ for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; ++data) {
>+ struct llvm_prf_value_node **vnodes;
>+ u64 current_vsite_count;
>+ u32 i;
>+
>+ if (!data->values)
>+ continue;
>+
>+ current_vsite_count = 0;
>+ vnodes = (struct llvm_prf_value_node **)data->values;
>+
>+ for (i = LLVM_PRF_IPVK_FIRST; i <= LLVM_PRF_IPVK_LAST; ++i)
>+ current_vsite_count += data->num_value_sites[i];
>+
>+ for (i = 0; i < current_vsite_count; ++i) {
>+ struct llvm_prf_value_node *current_vnode = vnodes[i];
>+
>+ while (current_vnode) {
>+ current_vnode->count = 0;
>+ current_vnode = current_vnode->next;
>+ }
>+ }
>+ }
>+
>+ return len;
>+}
>+
>+static const struct file_operations prf_reset_fops = {
>+ .owner = THIS_MODULE,
>+ .write = reset_write,
>+ .llseek = noop_llseek,
>+};
>+
>+/* Create debugfs entries. */
>+static int __init pgo_init(void)
>+{
>+ directory = debugfs_create_dir("pgo", NULL);
>+ if (!directory)
>+ goto err_remove;
>+
>+ if (!debugfs_create_file("profraw", 0600, directory, NULL,
>+ &prf_fops))
>+ goto err_remove;
>+
>+ if (!debugfs_create_file("reset", 0200, directory, NULL,
>+ &prf_reset_fops))
>+ goto err_remove;
>+
>+ return 0;
>+
>+err_remove:
>+ pr_err("initialization failed\n");
>+ return -EIO;
>+}
>+
>+/* Remove debugfs entries. */
>+static void __exit pgo_exit(void)
>+{
>+ debugfs_remove_recursive(directory);
>+}
>+
>+module_init(pgo_init);
>+module_exit(pgo_exit);
>diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
>new file mode 100644
>index 0000000000000..d96b61a1cf712
>--- /dev/null
>+++ b/kernel/pgo/instrument.c
>@@ -0,0 +1,147 @@
>+// SPDX-License-Identifier: GPL-2.0
>+/*
>+ * Copyright (C) 2019 Google, Inc.
>+ *
>+ * Author:
>+ * Sami Tolvanen <[email protected]>
>+ *
>+ * This software is licensed under the terms of the GNU General Public
>+ * License version 2, as published by the Free Software Foundation, and
>+ * may be copied, distributed, and modified under those terms.
>+ *
>+ * This program is distributed in the hope that it will be useful,
>+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
>+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>+ * GNU General Public License for more details.
>+ *
>+ */
>+
>+#define pr_fmt(fmt) "pgo: " fmt
>+
>+#include <linux/kernel.h>
>+#include <linux/export.h>
>+#include <linux/spinlock.h>
>+#include <linux/types.h>
>+#include "pgo.h"
>+
>+/* Lock guarding value node access and serialization. */
>+static DEFINE_SPINLOCK(pgo_lock);
>+static int current_node;
>+
>+unsigned long prf_lock(void)
>+{
>+ unsigned long flags;
>+
>+ spin_lock_irqsave(&pgo_lock, flags);
>+
>+ return flags;
>+}
>+
>+void prf_unlock(unsigned long flags)
>+{
>+ spin_unlock_irqrestore(&pgo_lock, flags);
>+}
>+
>+/*
>+ * Return a newly allocated profiling value node which contains the tracked
>+ * value by the value profiler.
>+ * Note: caller *must* hold pgo_lock.
>+ */
>+static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
>+ u32 index, u64 value)
>+{
>+ if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
>+ return NULL; /* Out of nodes */
>+
>+ current_node++;
>+
>+ /* Make sure the node is entirely within the section */
>+ if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
>+ &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
>+ return NULL;
>+
>+ return &__llvm_prf_vnds_start[current_node];
>+}
>+
>+/*
>+ * Counts the number of times a target value is seen.
>+ *
>+ * Records the target value for the CounterIndex if not seen before. Otherwise,
>+ * increments the counter associated w/ the target value.
>+ */
>+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
>+{
>+ struct llvm_prf_data *p = (struct llvm_prf_data *)data;
>+ struct llvm_prf_value_node **counters;
>+ struct llvm_prf_value_node *curr;
>+ struct llvm_prf_value_node *min = NULL;
>+ struct llvm_prf_value_node *prev = NULL;
>+ u64 min_count = U64_MAX;
>+ u8 values = 0;
>+ unsigned long flags;
>+
>+ if (!p || !p->values)
>+ return;
>+
>+ counters = (struct llvm_prf_value_node **)p->values;
>+ curr = counters[index];
>+
>+ while (curr) {
>+ if (target_value == curr->value) {
>+ curr->count++;
>+ return;
>+ }
>+
>+ if (curr->count < min_count) {
>+ min_count = curr->count;
>+ min = curr;
>+ }
>+
>+ prev = curr;
>+ curr = curr->next;
>+ values++;
>+ }
>+
>+ if (values >= LLVM_PRF_MAX_NUM_VALS_PER_SITE) {
>+ if (!min->count || !(--min->count)) {
>+ curr = min;
>+ curr->value = target_value;
>+ curr->count++;
>+ }
>+ return;
>+ }
>+
>+ /* Lock when updating the value node structure. */
>+ flags = prf_lock();
>+
>+ curr = allocate_node(p, index, target_value);
>+ if (!curr)
>+ goto out;
>+
>+ curr->value = target_value;
>+ curr->count++;
>+
>+ if (!counters[index])
>+ counters[index] = curr;
>+ else if (prev && !prev->next)
>+ prev->next = curr;
>+
>+out:
>+ prf_unlock(flags);
>+}
>+EXPORT_SYMBOL(__llvm_profile_instrument_target);
>+
>+/* Counts the number of times a range of target values is seen. */
>+void __llvm_profile_instrument_range(u64 target_value, void *data,
>+ u32 index, s64 precise_start,
>+ s64 precise_last, s64 large_value)
>+{
>+ if (large_value != S64_MIN && (s64)target_value >= large_value)
>+ target_value = large_value;
>+ else if ((s64)target_value < precise_start ||
>+ (s64)target_value > precise_last)
>+ target_value = precise_last + 1;
>+
>+ __llvm_profile_instrument_target(target_value, data, index);
>+}
>+EXPORT_SYMBOL(__llvm_profile_instrument_range);
>diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
>new file mode 100644
>index 0000000000000..df0aa278f28bd
>--- /dev/null
>+++ b/kernel/pgo/pgo.h
>@@ -0,0 +1,206 @@
>+/* SPDX-License-Identifier: GPL-2.0 */
>+/*
>+ * Copyright (C) 2019 Google, Inc.
>+ *
>+ * Author:
>+ * Sami Tolvanen <[email protected]>
>+ *
>+ * This software is licensed under the terms of the GNU General Public
>+ * License version 2, as published by the Free Software Foundation, and
>+ * may be copied, distributed, and modified under those terms.
>+ *
>+ * This program is distributed in the hope that it will be useful,
>+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
>+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>+ * GNU General Public License for more details.
>+ *
>+ */
>+
>+#ifndef _PGO_H
>+#define _PGO_H
>+
>+/*
>+ * Note: These internal LLVM definitions must match the compiler version.
>+ * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
>+ */
>+
>+#ifdef CONFIG_64BIT
>+ #define LLVM_PRF_MAGIC \
>+ ((u64)255 << 56 | \
>+ (u64)'l' << 48 | \
>+ (u64)'p' << 40 | \
>+ (u64)'r' << 32 | \
>+ (u64)'o' << 24 | \
>+ (u64)'f' << 16 | \
>+ (u64)'r' << 8 | \
>+ (u64)129)
>+#else
>+ #define LLVM_PRF_MAGIC \
>+ ((u64)255 << 56 | \
>+ (u64)'l' << 48 | \
>+ (u64)'p' << 40 | \
>+ (u64)'r' << 32 | \
>+ (u64)'o' << 24 | \
>+ (u64)'f' << 16 | \
>+ (u64)'R' << 8 | \
>+ (u64)129)
>+#endif
>+
>+#define LLVM_PRF_VERSION 5
>+#define LLVM_PRF_DATA_ALIGN 8
>+#define LLVM_PRF_IPVK_FIRST 0
>+#define LLVM_PRF_IPVK_LAST 1
>+#define LLVM_PRF_MAX_NUM_VALS_PER_SITE 16
>+
>+#define LLVM_PRF_VARIANT_MASK_IR (0x1ull << 56)
>+#define LLVM_PRF_VARIANT_MASK_CSIR (0x1ull << 57)
>+
>+/**
>+ * struct llvm_prf_header - represents the raw profile header data structure.
>+ * @magic: the magic token for the file format.
>+ * @version: the version of the file format.
>+ * @data_size: the number of entries in the profile data section.
>+ * @padding_bytes_before_counters: the number of padding bytes before the
>+ * counters.
>+ * @counters_size: the size in bytes of the LLVM profile section containing the
>+ * counters.
>+ * @padding_bytes_after_counters: the number of padding bytes after the
>+ * counters.
>+ * @names_size: the size in bytes of the LLVM profile section containing the
>+ * counters' names.
>+ * @counters_delta: the beginning of the LLVM profile counters section.
>+ * @names_delta: the beginning of the LLVM profile names section.
>+ * @value_kind_last: the last profile value kind.
>+ */
>+struct llvm_prf_header {
>+ u64 magic;
>+ u64 version;
>+ u64 data_size;
>+ u64 padding_bytes_before_counters;
>+ u64 counters_size;
>+ u64 padding_bytes_after_counters;
>+ u64 names_size;
>+ u64 counters_delta;
>+ u64 names_delta;
>+ u64 value_kind_last;
>+};
>+
>+/**
>+ * struct llvm_prf_data - represents the per-function control structure.
>+ * @name_ref: the reference to the function's name.
>+ * @func_hash: the hash value of the function.
>+ * @counter_ptr: a pointer to the profile counter.
>+ * @function_ptr: a pointer to the function.
>+ * @values: the profiling values associated with this function.
>+ * @num_counters: the number of counters in the function.
>+ * @num_value_sites: the number of value profile sites.
>+ */
>+struct llvm_prf_data {
>+ const u64 name_ref;
>+ const u64 func_hash;
>+ const void *counter_ptr;
>+ const void *function_ptr;
>+ void *values;
>+ const u32 num_counters;
>+ const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
>+} __aligned(LLVM_PRF_DATA_ALIGN);
>+
>+/**
>+ * struct llvm_prf_value_node_data - represents the data part of the struct
>+ * llvm_prf_value_node data structure.
>+ * @value: the value counters.
>+ * @count: the counters' count.
>+ */
>+struct llvm_prf_value_node_data {
>+ u64 value;
>+ u64 count;
>+};
>+
>+/**
>+ * struct llvm_prf_value_node - represents an internal data structure used by
>+ * the value profiler.
>+ * @value: the value counters.
>+ * @count: the counters' count.
>+ * @next: the next value node.
>+ */
>+struct llvm_prf_value_node {
>+ u64 value;
>+ u64 count;
>+ struct llvm_prf_value_node *next;
>+};
>+
>+/**
>+ * struct llvm_prf_value_data - represents the value profiling data in indexed
>+ * format.
>+ * @total_size: the total size in bytes including this field.
>+ * @num_value_kinds: the number of value profile kinds that has value profile
>+ * data.
>+ */
>+struct llvm_prf_value_data {
>+ u32 total_size;
>+ u32 num_value_kinds;
>+};
>+
>+/**
>+ * struct llvm_prf_value_record - represents the on-disk layout of the value
>+ * profile data of a particular kind for one function.
>+ * @kind: the kind of the value profile record.
>+ * @num_value_sites: the number of value profile sites.
>+ * @site_count_array: the first element of the array that stores the number
>+ * of profiled values for each value site.
>+ */
>+struct llvm_prf_value_record {
>+ u32 kind;
>+ u32 num_value_sites;
>+ u8 site_count_array[];
>+};
>+
>+#define prf_get_value_record_header_size() \
>+ offsetof(struct llvm_prf_value_record, site_count_array)
>+#define prf_get_value_record_site_count_size(sites) \
>+ roundup((sites), 8)
>+#define prf_get_value_record_size(sites) \
>+ (prf_get_value_record_header_size() + \
>+ prf_get_value_record_site_count_size((sites)))
>+
>+/* Data sections */
>+extern struct llvm_prf_data __llvm_prf_data_start[];
>+extern struct llvm_prf_data __llvm_prf_data_end[];
>+
>+extern u64 __llvm_prf_cnts_start[];
>+extern u64 __llvm_prf_cnts_end[];
>+
>+extern char __llvm_prf_names_start[];
>+extern char __llvm_prf_names_end[];
>+
>+extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
>+extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
>+
>+/* Locking for vnodes */
>+extern unsigned long prf_lock(void);
>+extern void prf_unlock(unsigned long flags);
>+
>+#define __DEFINE_PRF_SIZE(s) \
>+ static inline unsigned long prf_ ## s ## _size(void) \
>+ { \
>+ unsigned long start = \
>+ (unsigned long)__llvm_prf_ ## s ## _start; \
>+ unsigned long end = \
>+ (unsigned long)__llvm_prf_ ## s ## _end; \
>+ return roundup(end - start, \
>+ sizeof(__llvm_prf_ ## s ## _start[0])); \
>+ } \
>+ static inline unsigned long prf_ ## s ## _count(void) \
>+ { \
>+ return prf_ ## s ## _size() / \
>+ sizeof(__llvm_prf_ ## s ## _start[0]); \
>+ }
>+
>+__DEFINE_PRF_SIZE(data);
>+__DEFINE_PRF_SIZE(cnts);
>+__DEFINE_PRF_SIZE(names);
>+__DEFINE_PRF_SIZE(vnds);
>+
>+#undef __DEFINE_PRF_SIZE
>+
>+#endif /* _PGO_H */
>diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
>index 213677a5ed33e..9b218afb5cb87 100644
>--- a/scripts/Makefile.lib
>+++ b/scripts/Makefile.lib
>@@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
> $(CFLAGS_GCOV))
> endif
>
>+#
>+# Enable clang's PGO profiling flags for a file or directory depending on
>+# variables PGO_PROFILE_obj.o and PGO_PROFILE.
>+#
>+ifeq ($(CONFIG_PGO_CLANG),y)
>+_c_flags += $(if $(patsubst n%,, \
>+ $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
>+ $(CFLAGS_PGO_CLANG))
>+endif
>+
> #
> # Enable address sanitizer flags for kernel except some files or directories
> # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
>--
>2.30.0.284.gd98b1dd5eaa7-goog
>
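
The layout comment in the quoted kernel/pgo/fs.c says the names section is followed by "zero padding to 8 bytes". A minimal userspace sketch of that alignment arithmetic, assuming the intent is that an already aligned size needs no padding at all. The helper names pad_to_8 and value_data_offset are hypothetical and do not appear in the patch:

```c
#include <assert.h>	/* only needed for the checks in the usage note */
#include <stddef.h>

/* Bytes of zero padding needed to round size up to a multiple of 8. */
static size_t pad_to_8(size_t size)
{
	return (8 - (size % 8)) % 8;
}

/*
 * Offset at which the per-function value data would start, given the
 * sizes in bytes of the header and of the data, counters, and names
 * sections (assumed layout, following the comment in fs.c).
 */
static size_t value_data_offset(size_t header, size_t data, size_t cnts,
				size_t names)
{
	return header + data + cnts + names + pad_to_8(names);
}
```

With this helper, pad_to_8(5) is 3 and pad_to_8(8) is 0, so a names section whose size is already a multiple of 8 is followed directly by the value data.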

2021-01-11 20:36:27

by Fangrui Song

Subject: Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure

On 2021-01-11, Bill Wendling wrote:
>On Mon, Jan 11, 2021 at 12:12 PM Fangrui Song <[email protected]> wrote:
>>
>> On 2021-01-11, 'Bill Wendling' via Clang Built Linux wrote:
>> >From: Sami Tolvanen <[email protected]>
>> >
>> >Enable the use of clang's Profile-Guided Optimization[1]. To generate a
>> >profile, the kernel is instrumented with PGO counters, a representative
>> >workload is run, and the raw profile data is collected from
>> >/sys/kernel/debug/pgo/profraw.
>> >
>> >The raw profile data must be processed by clang's "llvm-profdata" tool before
>> >it can be used during recompilation:
>> >
>> > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
>> > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>> >
>> >Multiple raw profiles may be merged during this step.
>> >
>> >The data can be used either by the compiler if LTO isn't enabled:
>> >
>> > ... -fprofile-use=vmlinux.profdata ...
>> >
>> >or by LLD if LTO is enabled:
>> >
>> > ... -lto-cs-profile-file=vmlinux.profdata ...
>>
>> This LLD option does not exist.
>> LLD does have some `--lto-*` options but the `-lto-*` form is not supported
>> (it clashes with -l) https://reviews.llvm.org/D79371
>>
>That's strange. I've been using that option for years now. :-) Is this
>a recent change?

The more frequently used options (specified by the clang driver) are
-plugin-opt=... (options implemented by LLVMgold.so).
`-lto-*` is rare.

>> (There is an earlier -fprofile-instr-generate which does
>> instrumentation in Clang, but the option does not have broad usage.
>> It is used more for code coverage, not for optimization.
>> Noticeably, it does not even implement the Kirchhoff's current law
>> optimization)
>>
>Right. I've been told outside of this email that -fprofile-generate is
>the preferred flag to use.
>
>> -fprofile-use= is used by both regular PGO and context-sensitive PGO (CSPGO).
>>
>> clang -flto=thin -fprofile-use= passes -plugin-opt=cs-profile-path= to the linker.
>> For regular PGO, this option is effectively a no-op (confirmed with CSPGO main developer).
>>
>> So I think the "or by LLD if LTO is enabled:" part should be removed.
>
>But what if you specify the linking step explicitly? Linux doesn't
>call "clang" when linking, but "ld.lld".

Regular PGO+LTO does not need -plugin-opt=cs-profile-path=; CSPGO+LTO does.
Because -fprofile-use= may be used by both, the Clang driver adds it.
CSPGO is not relevant to this patch, so the linker option does not need to be mentioned.

2021-01-11 21:07:47

by Nathan Chancellor

Subject: Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jan 11, 2021 at 12:18:21AM -0800, Bill Wendling wrote:
> From: Sami Tolvanen <[email protected]>
>
> Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> profile, the kernel is instrumented with PGO counters, a representative
> workload is run, and the raw profile data is collected from
> /sys/kernel/debug/pgo/profraw.
>
> The raw profile data must be processed by clang's "llvm-profdata" tool before
> it can be used during recompilation:
>
> $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
> Multiple raw profiles may be merged during this step.
>
> The data can be used either by the compiler if LTO isn't enabled:
>
> ... -fprofile-use=vmlinux.profdata ...
>
> or by LLD if LTO is enabled:
>
> ... -lto-cs-profile-file=vmlinux.profdata ...
>
> This initial submission is restricted to x86, as that's the platform we know
> works. This restriction can be lifted once other platforms have been verified
> to work with PGO.
>
> Note that this method of profiling the kernel is clang-native and isn't
> compatible with clang's gcov support in kernel/gcov.
>
> [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>
> Signed-off-by: Sami Tolvanen <[email protected]>
> Co-developed-by: Bill Wendling <[email protected]>
> Signed-off-by: Bill Wendling <[email protected]>

I took this for a spin against x86_64_defconfig and ran into two issues:

1. https://github.com/ClangBuiltLinux/linux/issues/1252

There is also one in drivers/gpu/drm/i915/i915_query.c. For the time
being, I added PGO_PROFILE_... := n for those two files.

2. After doing that, I run into an undefined function error with ld.lld.

How I tested:

$ make -skj"$(nproc)" LLVM=1 defconfig

$ scripts/config -e PGO_CLANG

$ make -skj"$(nproc)" LLVM=1 olddefconfig vmlinux all
ld.lld: error: undefined symbol: __llvm_profile_instrument_memop
>>> referenced by head64.c
>>> arch/x86/kernel/head64.o:(__early_make_pgtable)
>>> referenced by head64.c
>>> arch/x86/kernel/head64.o:(x86_64_start_kernel)
>>> referenced by head64.c
>>> arch/x86/kernel/head64.o:(copy_bootdata)
>>> referenced 2259 more times

Local diff:

diff --git a/drivers/char/Makefile b/drivers/char/Makefile
index ffce287ef415..4b2f238770b5 100644
--- a/drivers/char/Makefile
+++ b/drivers/char/Makefile
@@ -4,6 +4,7 @@
#

obj-y += mem.o random.o
+PGO_PROFILE_random.o := n
obj-$(CONFIG_TTY_PRINTK) += ttyprintk.o
obj-y += misc.o
obj-$(CONFIG_ATARI_DSP56K) += dsp56k.o
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index e5574e506a5c..d83cacc79b1a 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -168,6 +168,7 @@ i915-y += \
i915_vma.o \
intel_region_lmem.o \
intel_wopcm.o
+PGO_PROFILE_i915_query.o := n

# general-purpose microcontroller (GuC) support
i915-y += gt/uc/intel_uc.o \
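
The per-file opt-out in the local diff above relies on the patsubst test that the patch adds to scripts/Makefile.lib: the per-file value, the per-directory value, and a trailing "y" are concatenated, and the PGO flags are dropped only if the result starts with "n". A rough sketch of that decision in C, assuming each variable is empty, "y", or "n" (the function name pgo_enabled is hypothetical):

```c
#include <assert.h>	/* only needed for the checks in the usage note */
#include <stdbool.h>
#include <stdio.h>

/*
 * Mirrors $(if $(patsubst n%,, $(PGO_PROFILE_obj.o)$(PGO_PROFILE)y), ...):
 * the concatenated word is never empty because of the trailing "y", so
 * the flags are applied unless the word begins with 'n'.
 */
static bool pgo_enabled(const char *per_file, const char *per_dir)
{
	char word[64];

	snprintf(word, sizeof(word), "%s%sy", per_file, per_dir);
	return word[0] != 'n';
}
```

Under these assumptions, pgo_enabled("n", "") is false (the PGO_PROFILE_random.o := n case), pgo_enabled("", "n") is false (a directory-wide PGO_PROFILE := n), and pgo_enabled("y", "n") is true, so a per-file "y" overrides a directory-level "n".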

2021-01-11 21:20:45

by Nick Desaulniers

Subject: Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jan 11, 2021 at 1:04 PM Nathan Chancellor
<[email protected]> wrote:
>
> On Mon, Jan 11, 2021 at 12:18:21AM -0800, Bill Wendling wrote:
> > From: Sami Tolvanen <[email protected]>
> >
> > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > profile, the kernel is instrumented with PGO counters, a representative
> > workload is run, and the raw profile data is collected from
> > /sys/kernel/debug/pgo/profraw.
> >
> > The raw profile data must be processed by clang's "llvm-profdata" tool before
> > it can be used during recompilation:
> >
> > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >
> > Multiple raw profiles may be merged during this step.
> >
> > The data can be used either by the compiler if LTO isn't enabled:
> >
> > ... -fprofile-use=vmlinux.profdata ...
> >
> > or by LLD if LTO is enabled:
> >
> > ... -lto-cs-profile-file=vmlinux.profdata ...
> >
> > This initial submission is restricted to x86, as that's the platform we know
> > works. This restriction can be lifted once other platforms have been verified
> > to work with PGO.
> >
> > Note that this method of profiling the kernel is clang-native and isn't
> > compatible with clang's gcov support in kernel/gcov.
> >
> > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> >
> > Signed-off-by: Sami Tolvanen <[email protected]>
> > Co-developed-by: Bill Wendling <[email protected]>
> > Signed-off-by: Bill Wendling <[email protected]>
>
> I took this for a spin against x86_64_defconfig and ran into two issues:
>
> 1. https://github.com/ClangBuiltLinux/linux/issues/1252

"Cannot split an edge from a CallBrInst"
Looks like that should be fixed first, then we should gate this
feature on clang-12.

>
> There is also one in drivers/gpu/drm/i915/i915_query.c. For the time
> being, I added PGO_PROFILE_... := n for those two files.
>
> 2. After doing that, I run into an undefined function error with ld.lld.
>
> How I tested:
>
> $ make -skj"$(nproc)" LLVM=1 defconfig
>
> $ scripts/config -e PGO_CLANG
>
> $ make -skj"$(nproc)" LLVM=1 olddefconfig vmlinux all
> ld.lld: error: undefined symbol: __llvm_profile_instrument_memop

Err...that seems like it should be implemented in
kernel/pgo/instrument.c in this patch in a v2?

> >>> referenced by head64.c
> >>> arch/x86/kernel/head64.o:(__early_make_pgtable)
> >>> referenced by head64.c
> >>> arch/x86/kernel/head64.o:(x86_64_start_kernel)
> >>> referenced by head64.c
> >>> arch/x86/kernel/head64.o:(copy_bootdata)
> >>> referenced 2259 more times
>
> Local diff:
>
> diff --git a/drivers/char/Makefile b/drivers/char/Makefile
> index ffce287ef415..4b2f238770b5 100644
> --- a/drivers/char/Makefile
> +++ b/drivers/char/Makefile
> @@ -4,6 +4,7 @@
> #
>
> obj-y += mem.o random.o
> +PGO_PROFILE_random.o := n
> obj-$(CONFIG_TTY_PRINTK) += ttyprintk.o
> obj-y += misc.o
> obj-$(CONFIG_ATARI_DSP56K) += dsp56k.o
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index e5574e506a5c..d83cacc79b1a 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -168,6 +168,7 @@ i915-y += \
> i915_vma.o \
> intel_region_lmem.o \
> intel_wopcm.o
> +PGO_PROFILE_i915_query.o := n
>
> # general-purpose microcontroller (GuC) support
> i915-y += gt/uc/intel_uc.o \

I'd rather have these both sorted out before landing with PGO disabled
on these files.

--
Thanks,
~Nick Desaulniers

2021-01-11 21:38:13

by Bill Wendling

Subject: Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jan 11, 2021 at 1:18 PM Nick Desaulniers
<[email protected]> wrote:
>
> On Mon, Jan 11, 2021 at 1:04 PM Nathan Chancellor
> <[email protected]> wrote:
> >
> > On Mon, Jan 11, 2021 at 12:18:21AM -0800, Bill Wendling wrote:
> > > From: Sami Tolvanen <[email protected]>
> > >
> > > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > > profile, the kernel is instrumented with PGO counters, a representative
> > > workload is run, and the raw profile data is collected from
> > > /sys/kernel/debug/pgo/profraw.
> > >
> > > The raw profile data must be processed by clang's "llvm-profdata" tool before
> > > it can be used during recompilation:
> > >
> > > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > >
> > > Multiple raw profiles may be merged during this step.
> > >
> > > The data can be used either by the compiler if LTO isn't enabled:
> > >
> > > ... -fprofile-use=vmlinux.profdata ...
> > >
> > > or by LLD if LTO is enabled:
> > >
> > > ... -lto-cs-profile-file=vmlinux.profdata ...
> > >
> > > This initial submission is restricted to x86, as that's the platform we know
> > > works. This restriction can be lifted once other platforms have been verified
> > > to work with PGO.
> > >
> > > Note that this method of profiling the kernel is clang-native and isn't
> > > compatible with clang's gcov support in kernel/gcov.
> > >
> > > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > >
> > > Signed-off-by: Sami Tolvanen <[email protected]>
> > > Co-developed-by: Bill Wendling <[email protected]>
> > > Signed-off-by: Bill Wendling <[email protected]>
> >
> > I took this for a spin against x86_64_defconfig and ran into two issues:
> >
> > 1. https://github.com/ClangBuiltLinux/linux/issues/1252
>
> "Cannot split an edge from a CallBrInst"
> Looks like that should be fixed first, then we should gate this
> feature on clang-12.
>
Weird. I'll investigate.

> >
> > There is also one in drivers/gpu/drm/i915/i915_query.c. For the time
> > being, I added PGO_PROFILE_... := n for those two files.
> >
> > 2. After doing that, I run into an undefined function error with ld.lld.
> >
> > How I tested:
> >
> > $ make -skj"$(nproc)" LLVM=1 defconfig
> >
> > $ scripts/config -e PGO_CLANG
> >
> > $ make -skj"$(nproc)" LLVM=1 olddefconfig vmlinux all
> > ld.lld: error: undefined symbol: __llvm_profile_instrument_memop
>
> Err...that seems like it should be implemented in
> kernel/pgo/instrument.c in this patch in a v2?
>
Yes. I'll submit a new V2 with this and other feedback integrated.

> > >>> referenced by head64.c
> > >>> arch/x86/kernel/head64.o:(__early_make_pgtable)
> > >>> referenced by head64.c
> > >>> arch/x86/kernel/head64.o:(x86_64_start_kernel)
> > >>> referenced by head64.c
> > >>> arch/x86/kernel/head64.o:(copy_bootdata)
> > >>> referenced 2259 more times
> >
> > Local diff:
> >
> > diff --git a/drivers/char/Makefile b/drivers/char/Makefile
> > index ffce287ef415..4b2f238770b5 100644
> > --- a/drivers/char/Makefile
> > +++ b/drivers/char/Makefile
> > @@ -4,6 +4,7 @@
> > #
> >
> > obj-y += mem.o random.o
> > +PGO_PROFILE_random.o := n
> > obj-$(CONFIG_TTY_PRINTK) += ttyprintk.o
> > obj-y += misc.o
> > obj-$(CONFIG_ATARI_DSP56K) += dsp56k.o
> > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> > index e5574e506a5c..d83cacc79b1a 100644
> > --- a/drivers/gpu/drm/i915/Makefile
> > +++ b/drivers/gpu/drm/i915/Makefile
> > @@ -168,6 +168,7 @@ i915-y += \
> > i915_vma.o \
> > intel_region_lmem.o \
> > intel_wopcm.o
> > +PGO_PROFILE_i915_query.o := n
> >
> > # general-purpose microcontroller (GuC) support
> > i915-y += gt/uc/intel_uc.o \
>
> I'd rather have these both sorted out before landing with PGO disabled
> on these files.
>
Agreed.

-bw

2021-01-12 09:37:08

by Bill Wendling

Subject: Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jan 11, 2021 at 12:12 PM Fangrui Song <[email protected]> wrote:
>
> On 2021-01-11, 'Bill Wendling' via Clang Built Linux wrote:
> >From: Sami Tolvanen <[email protected]>
> >
> >Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> >profile, the kernel is instrumented with PGO counters, a representative
> >workload is run, and the raw profile data is collected from
> >/sys/kernel/debug/pgo/profraw.
> >
> >The raw profile data must be processed by clang's "llvm-profdata" tool before
> >it can be used during recompilation:
> >
> > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >
> >Multiple raw profiles may be merged during this step.
> >
> >The data can be used either by the compiler if LTO isn't enabled:
> >
> > ... -fprofile-use=vmlinux.profdata ...
> >
> >or by LLD if LTO is enabled:
> >
> > ... -lto-cs-profile-file=vmlinux.profdata ...
>
> This LLD option does not exist.
> LLD does have some `--lto-*` options but the `-lto-*` form is not supported
> (it clashes with -l) https://reviews.llvm.org/D79371
>
That's strange. I've been using that option for years now. :-) Is this
a recent change?

> (There is an earlier -fprofile-instr-generate which does
> instrumentation in Clang, but the option does not have broad usage.
> It is used more for code coverage, not for optimization.
> Noticeably, it does not even implement the Kirchhoff's current law
> optimization)
>
Right. I've been told outside of this email that -fprofile-generate is
the preferred flag to use.

> -fprofile-use= is used by both regular PGO and context-sensitive PGO (CSPGO).
>
> clang -flto=thin -fprofile-use= passes -plugin-opt=cs-profile-path= to the linker.
> For regular PGO, this option is effectively a no-op (confirmed with CSPGO main developer).
>
> So I think the "or by LLD if LTO is enabled:" part should be removed.

But what if you specify the linking step explicitly? Linux doesn't
call "clang" when linking, but "ld.lld".

-bw

2021-01-12 10:40:30

by Fangrui Song

Subject: Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jan 11, 2021 at 4:38 PM Bill Wendling <[email protected]> wrote:
>
> On Mon, Jan 11, 2021 at 12:31 PM Fangrui Song <[email protected]> wrote:
> > On 2021-01-11, Bill Wendling wrote:
> > >On Mon, Jan 11, 2021 at 12:12 PM Fangrui Song <[email protected]> wrote:
> > >>
> > >> On 2021-01-11, 'Bill Wendling' via Clang Built Linux wrote:
> > >> >From: Sami Tolvanen <[email protected]>
> > >> >
> > >> >Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > >> >profile, the kernel is instrumented with PGO counters, a representative
> > >> >workload is run, and the raw profile data is collected from
> > >> >/sys/kernel/debug/pgo/profraw.
> > >> >
> > >> >The raw profile data must be processed by clang's "llvm-profdata" tool before
> > >> >it can be used during recompilation:
> > >> >
> > >> > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > >> > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > >> >
> > >> >Multiple raw profiles may be merged during this step.
> > >> >
> > >> >The data can be used either by the compiler if LTO isn't enabled:
> > >> >
> > >> > ... -fprofile-use=vmlinux.profdata ...
> > >> >
> > >> >or by LLD if LTO is enabled:
> > >> >
> > >> > ... -lto-cs-profile-file=vmlinux.profdata ...
> > >>
> > >> This LLD option does not exist.
> > >> LLD does have some `--lto-*` options but the `-lto-*` form is not supported
> > >> (it clashes with -l) https://reviews.llvm.org/D79371
> > >>
> > >That's strange. I've been using that option for years now. :-) Is this
> > >a recent change?
> >
> > The more frequently used options (specified by the clang driver) are
> > -plugin-opt=... (options implemented by LLVMgold.so).
> > `-lto-*` is rare.
> >
> > >> (There is an earlier -fprofile-instr-generate which does
> > >> instrumentation in Clang, but the option does not have broad usage.
> > >> It is used more for code coverage, not for optimization.
> > >> Noticeably, it does not even implement the Kirchhoff's current law
> > >> optimization)
> > >>
> > >Right. I've been told outside of this email that -fprofile-generate is
> > >the preferred flag to use.
> > >
> > >> -fprofile-use= is used by both regular PGO and context-sensitive PGO (CSPGO).
> > >>
> > >> clang -flto=thin -fprofile-use= passes -plugin-opt=cs-profile-path= to the linker.
> > >> For regular PGO, this option is effectively a no-op (confirmed with CSPGO main developer).
> > >>
> > >> So I think the "or by LLD if LTO is enabled:" part should be removed.
> > >
> > >But what if you specify the linking step explicitly? Linux doesn't
> > >call "clang" when linking, but "ld.lld".
> >
> > Regular PGO+LTO does not need -plugin-opt=cs-profile-path=
> > CSPGO+LTO needs it.
> > Because -fprofile-use= may be used by both, Clang driver adds it.
> > CSPGO is not relevant in this patch, so the linker option does not need to be mentioned.
>
> I'm still a bit confused. Are you saying that when clang uses
> `-flto=thin -fprofile-use=foo` that the profile file "foo" is embedded
> into the bitcode file so that when the linker's run it'll be used?
>
> This is the workflow:
>
> clang ... -fprofile-use=vmlinux.profdata ... -c -o foo.o foo.c
> clang ... -fprofile-use=vmlinux.profdata ... -c -o bar.o bar.c
> ld.lld ... <output file> foo.o bar.o
>
> Are you saying that we don't need to have
> "-plugin-opt=cs-profile-path=vmlinux.profdata" on the "ld.lld ..."
> line?
>
> -bw

The backend compile step -flto=thin -fprofile-use=foo has all the information.

-plugin-opt=cs-profile-path=vmlinux.profdata is not needed for regular PGO.
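
A minimal dry-run sketch of the point above, not taken from the patch: for
regular PGO the profile is named only on the compile lines, and the link line
carries no profile option. The file names foo.c and bar.c, the output name
vmlinux, and the helper names compile_cmd and link_cmd are placeholders chosen
for illustration; the script only prints the commands and runs nothing.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands, does not invoke clang or ld.lld.
# PROFDATA is the merged profile produced by llvm-profdata (assumed name).
PROFDATA="${PROFDATA:-vmlinux.profdata}"

# compile_cmd SRC: print the ThinLTO compile line for one C source file.
# The profile is consumed here, at the compile step.
compile_cmd() {
    src="$1"
    printf 'clang -flto=thin -fprofile-use=%s -c -o %s %s\n' \
        "$PROFDATA" "${src%.c}.o" "$src"
}

# link_cmd OUT OBJ...: print the link line. For regular PGO no
# -plugin-opt=cs-profile-path= option is added.
link_cmd() {
    out="$1"; shift
    printf 'ld.lld -o %s %s\n' "$out" "$*"
}

compile_cmd foo.c
compile_cmd bar.c
link_cmd vmlinux foo.o bar.o
```

If CSPGO were in use, the link line would additionally need the
cs-profile-path option discussed earlier in this thread.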

2021-01-12 10:40:42

by Bill Wendling

Subject: Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jan 11, 2021 at 12:31 PM Fangrui Song <[email protected]> wrote:
> On 2021-01-11, Bill Wendling wrote:
> >On Mon, Jan 11, 2021 at 12:12 PM Fangrui Song <[email protected]> wrote:
> >>
> >> On 2021-01-11, 'Bill Wendling' via Clang Built Linux wrote:
> >> >From: Sami Tolvanen <[email protected]>
> >> >
> >> >Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> >> >profile, the kernel is instrumented with PGO counters, a representative
> >> >workload is run, and the raw profile data is collected from
> >> >/sys/kernel/debug/pgo/profraw.
> >> >
> >> >The raw profile data must be processed by clang's "llvm-profdata" tool before
> >> >it can be used during recompilation:
> >> >
> >> > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> >> > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >> >
> >> >Multiple raw profiles may be merged during this step.
> >> >
> >> >The data can be used either by the compiler if LTO isn't enabled:
> >> >
> >> > ... -fprofile-use=vmlinux.profdata ...
> >> >
> >> >or by LLD if LTO is enabled:
> >> >
> >> > ... -lto-cs-profile-file=vmlinux.profdata ...
> >>
> >> This LLD option does not exist.
> >> LLD does have some `--lto-*` options but the `-lto-*` form is not supported
> >> (it clashes with -l) https://reviews.llvm.org/D79371
> >>
> >That's strange. I've been using that option for years now. :-) Is this
> >a recent change?
>
> The more frequently used options (specified by the clang driver) are
> -plugin-opt=... (options implemented by LLVMgold.so).
> `-lto-*` is rare.
>
> >> (There is an earlier -fprofile-instr-generate which does
> >> instrumentation in Clang, but the option does not have broad usage.
> >> It is used more for code coverage, not for optimization.
> >> Noticeably, it does not even implement the Kirchhoff's current law
> >> optimization)
> >>
> >Right. I've been told outside of this email that -fprofile-generate is
> >the preferred flag to use.
> >
> >> -fprofile-use= is used by both regular PGO and context-sensitive PGO (CSPGO).
> >>
> >> clang -flto=thin -fprofile-use= passes -plugin-opt=cs-profile-path= to the linker.
> >> For regular PGO, this option is effectively a no-op (confirmed with CSPGO main developer).
> >>
> >> So I think the "or by LLD if LTO is enabled:" part should be removed.
> >
> >But what if you specify the linking step explicitly? Linux doesn't
> >call "clang" when linking, but "ld.lld".
>
> Regular PGO+LTO does not need -plugin-opt=cs-profile-path=
> CSPGO+LTO needs it.
> Because -fprofile-use= may be used by both, Clang driver adds it.
> CSPGO is not relevant in this patch, so the linker option does not need to be mentioned.

I'm still a bit confused. Are you saying that when clang uses
`-flto=thin -fprofile-use=foo` that the profile file "foo" is embedded
into the bitcode file so that when the linker's run it'll be used?

This is the workflow:

clang ... -fprofile-use=vmlinux.profdata ... -c -o foo.o foo.c
clang ... -fprofile-use=vmlinux.profdata ... -c -o bar.o bar.c
ld.lld ... <output file> foo.o bar.o

Are you saying that we don't need to have
"-plugin-opt=cs-profile-path=vmlinux.profdata" on the "ld.lld ..."
line?

-bw

2021-01-12 11:19:20

by Bill Wendling

Subject: [PATCH v2] pgo: add clang's Profile Guided Optimization infrastructure

From: Sami Tolvanen <[email protected]>

Enable the use of clang's Profile-Guided Optimization[1]. To generate a
profile, the kernel is instrumented with PGO counters, a representative
workload is run, and the raw profile data is collected from
/sys/kernel/debug/pgo/profraw.

The raw profile data must be processed by clang's "llvm-profdata" tool
before it can be used during recompilation:

$ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
$ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw

Multiple raw profiles may be merged during this step.

The data can now be used by the compiler:

$ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
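
The merge and rebuild steps above can be sketched as a small script. This is
an illustration under stated assumptions, not part of the patch: the raw
profile names boot.profraw and workload.profraw and the helper names run,
merge_profiles and rebuild_with_profile are hypothetical, and with the default
DRY_RUN=1 the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: merge several raw kernel profiles, then rebuild with the result.
# DRY_RUN=1 (the default) prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

# run CMD ARGS...: print or execute a command depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# merge_profiles OUTPUT RAW...: combine one or more raw profiles into a
# single indexed profile that the compiler can consume.
merge_profiles() {
    out="$1"; shift
    run llvm-profdata merge --output="$out" "$@"
}

# rebuild_with_profile PROFDATA: rebuild the kernel using the merged profile.
rebuild_with_profile() {
    run make LLVM=1 KCFLAGS=-fprofile-use="$1"
}

# Hypothetical raw profiles copied from /sys/kernel/debug/pgo/profraw
# after two different workloads.
merge_profiles vmlinux.profdata boot.profraw workload.profraw
rebuild_with_profile vmlinux.profdata
```

Keeping the commands behind a dry-run switch makes it easy to check the exact
invocations before starting a long kernel build.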

This initial submission is restricted to x86, as that's the platform we
know works. This restriction can be lifted once other platforms have
been verified to work with PGO.

Note that this method of profiling the kernel is clang-native and isn't
compatible with clang's gcov support in kernel/gcov.

[1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization

Signed-off-by: Sami Tolvanen <[email protected]>
Co-developed-by: Bill Wendling <[email protected]>
Signed-off-by: Bill Wendling <[email protected]>
---
Documentation/dev-tools/index.rst | 1 +
Documentation/dev-tools/pgo.rst | 127 +++++++++
MAINTAINERS | 9 +
Makefile | 3 +
arch/Kconfig | 1 +
arch/arm/boot/bootp/Makefile | 1 +
arch/arm/boot/compressed/Makefile | 1 +
arch/arm/vdso/Makefile | 3 +-
arch/arm64/kernel/vdso/Makefile | 3 +-
arch/arm64/kvm/hyp/nvhe/Makefile | 1 +
arch/mips/boot/compressed/Makefile | 1 +
arch/mips/vdso/Makefile | 1 +
arch/nds32/kernel/vdso/Makefile | 4 +-
arch/parisc/boot/compressed/Makefile | 1 +
arch/powerpc/kernel/Makefile | 6 +-
arch/powerpc/kernel/trace/Makefile | 3 +-
arch/powerpc/kernel/vdso32/Makefile | 1 +
arch/powerpc/kernel/vdso64/Makefile | 1 +
arch/powerpc/kexec/Makefile | 3 +-
arch/powerpc/xmon/Makefile | 1 +
arch/riscv/kernel/vdso/Makefile | 3 +-
arch/s390/boot/Makefile | 1 +
arch/s390/boot/compressed/Makefile | 1 +
arch/s390/kernel/Makefile | 1 +
arch/s390/kernel/vdso64/Makefile | 3 +-
arch/s390/purgatory/Makefile | 1 +
arch/sh/boot/compressed/Makefile | 1 +
arch/sh/mm/Makefile | 1 +
arch/sparc/vdso/Makefile | 1 +
arch/x86/Kconfig | 1 +
arch/x86/boot/Makefile | 1 +
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/entry/vdso/Makefile | 1 +
arch/x86/kernel/vmlinux.lds.S | 2 +
arch/x86/platform/efi/Makefile | 1 +
arch/x86/purgatory/Makefile | 1 +
arch/x86/realmode/rm/Makefile | 1 +
arch/x86/um/vdso/Makefile | 1 +
drivers/firmware/efi/libstub/Makefile | 1 +
drivers/s390/char/Makefile | 1 +
include/asm-generic/vmlinux.lds.h | 44 +++
kernel/Makefile | 1 +
kernel/pgo/Kconfig | 34 +++
kernel/pgo/Makefile | 5 +
kernel/pgo/fs.c | 382 ++++++++++++++++++++++++++
kernel/pgo/instrument.c | 188 +++++++++++++
kernel/pgo/pgo.h | 206 ++++++++++++++
scripts/Makefile.lib | 10 +
48 files changed, 1058 insertions(+), 9 deletions(-)
create mode 100644 Documentation/dev-tools/pgo.rst
create mode 100644 kernel/pgo/Kconfig
create mode 100644 kernel/pgo/Makefile
create mode 100644 kernel/pgo/fs.c
create mode 100644 kernel/pgo/instrument.c
create mode 100644 kernel/pgo/pgo.h

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index f7809c7b1ba9e..8d6418e858062 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -26,6 +26,7 @@ whole; patches welcome!
kgdb
kselftest
kunit/index
+ pgo


.. only:: subproject and html
diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
new file mode 100644
index 0000000000000..da0e654ae7078
--- /dev/null
+++ b/Documentation/dev-tools/pgo.rst
@@ -0,0 +1,127 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+Using PGO with the Linux kernel
+===============================
+
+Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
+when building with Clang. The profiling data is exported via the ``pgo``
+debugfs directory.
+
+.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
+
+
+Preparation
+===========
+
+Configure the kernel with:
+
+.. code-block:: make
+
+ CONFIG_DEBUG_FS=y
+ CONFIG_PGO_CLANG=y
+
+Note that kernels compiled with profiling flags will be significantly larger
+and run slower.
+
+Profiling data will only become accessible once debugfs has been mounted:
+
+.. code-block:: sh
+
+ mount -t debugfs none /sys/kernel/debug
+
+
+Customization
+=============
+
+You can enable or disable profiling for individual files and directories by
+adding a line similar to the following to the respective kernel Makefile:
+
+- For a single file (e.g. main.o)
+
+ .. code-block:: make
+
+ PGO_PROFILE_main.o := y
+
+- For all files in one directory
+
+ .. code-block:: make
+
+ PGO_PROFILE := y
+
+To exclude files from being profiled use
+
+ .. code-block:: make
+
+ PGO_PROFILE_main.o := n
+
+and
+
+ .. code-block:: make
+
+ PGO_PROFILE := n
+
+Only files which are linked to the main kernel image or are compiled as kernel
+modules are supported by this mechanism.
+
+
+Files
+=====
+
+The PGO kernel support creates the following files in debugfs:
+
+``/sys/kernel/debug/pgo``
+ Parent directory for all PGO-related files.
+
+``/sys/kernel/debug/pgo/reset``
+ Global reset file: resets all profile data to zero when written to.
+
+``/sys/kernel/debug/pgo/profraw``
+ The raw PGO data, which must be processed with ``llvm-profdata``.
+
+
+Workflow
+========
+
+The PGO kernel can be run on the host or test machines. The data, though, should
+be analyzed with Clang's tools from the same Clang version that was used to
+compile the kernel. Clang is tolerant of version skew, but it's easier to use
+the same Clang version.
+
+The profiling data is useful for optimizing the kernel, analyzing coverage,
+etc. Clang offers tools to perform these tasks.
+
+Here is an example workflow for profiling an instrumented kernel with PGO and
+using the result to optimize the kernel:
+
+1) Install the kernel on the TEST machine.
+
+2) Reset the data counters right before running the load tests
+
+ .. code-block:: sh
+
+ echo 1 > /sys/kernel/debug/pgo/reset
+
+3) Run the load tests.
+
+4) Collect the raw profile data
+
+ .. code-block:: sh
+
+ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
+
+5) (Optional) Download the raw profile data to the HOST machine.
+
+6) Process the raw profile data
+
+ .. code-block:: sh
+
+ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
+
+ Note that multiple raw profile data files can be merged during this step.
+
+7) Rebuild the kernel using the profile data (PGO disabled)
+
+ .. code-block:: sh
+
+ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
diff --git a/MAINTAINERS b/MAINTAINERS
index cc1e6a5ee6e67..1b979da316fa4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13954,6 +13954,15 @@ S: Maintained
F: include/linux/personality.h
F: include/uapi/linux/personality.h

+PGO BASED KERNEL PROFILING
+M: Sami Tolvanen <[email protected]>
+M: Bill Wendling <[email protected]>
+R: Nathan Chancellor <[email protected]>
+R: Nick Desaulniers <[email protected]>
+S: Supported
+F: Documentation/dev-tools/pgo.rst
+F: kernel/pgo
+
PHOENIX RC FLIGHT CONTROLLER ADAPTER
M: Marcus Folkesson <[email protected]>
L: [email protected]
diff --git a/Makefile b/Makefile
index 9e73f82e0d863..9128bfe1ccc97 100644
--- a/Makefile
+++ b/Makefile
@@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
# Defaults to vmlinux, but the arch makefile usually adds further targets
all: vmlinux

+CFLAGS_PGO_CLANG := -fprofile-generate
+export CFLAGS_PGO_CLANG
+
CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
$(call cc-option,-fno-tree-loop-im) \
$(call cc-disable-warning,maybe-uninitialized,)
diff --git a/arch/Kconfig b/arch/Kconfig
index 24862d15f3a36..f39d3991f6bfe 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
pairs of 32-bit arguments, select this option.

source "kernel/gcov/Kconfig"
+source "kernel/pgo/Kconfig"

source "scripts/gcc-plugins/Kconfig"

diff --git a/arch/arm/boot/bootp/Makefile b/arch/arm/boot/bootp/Makefile
index 981a8d03f064c..523bd58df0a4b 100644
--- a/arch/arm/boot/bootp/Makefile
+++ b/arch/arm/boot/bootp/Makefile
@@ -7,6 +7,7 @@
#

GCOV_PROFILE := n
+PGO_PROFILE := n

LDFLAGS_bootp := --no-undefined -X \
--defsym initrd_phys=$(INITRD_PHYS) \
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index fb521efcc6c20..5fd0fd85fc0e5 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -24,6 +24,7 @@ OBJS += hyp-stub.o
endif

GCOV_PROFILE := n
+PGO_PROFILE := n
KASAN_SANITIZE := n

# Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
diff --git a/arch/arm/vdso/Makefile b/arch/arm/vdso/Makefile
index b558bee0e1f6b..11f6ce4b48b56 100644
--- a/arch/arm/vdso/Makefile
+++ b/arch/arm/vdso/Makefile
@@ -36,8 +36,9 @@ else
CFLAGS_vgettimeofday.o = -O2 -include $(c-gettimeofday-y)
endif

-# Disable gcov profiling for VDSO code
+# Disable gcov and PGO profiling for VDSO code
GCOV_PROFILE := n
+PGO_PROFILE := n

# Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
KCOV_INSTRUMENT := n
diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
index cd9c3fa25902f..d48fc0df07020 100644
--- a/arch/arm64/kernel/vdso/Makefile
+++ b/arch/arm64/kernel/vdso/Makefile
@@ -41,8 +41,9 @@ ifneq ($(c-gettimeofday-y),)
CFLAGS_vgettimeofday.o += -include $(c-gettimeofday-y)
endif

-# Disable gcov profiling for VDSO code
+# Disable gcov and PGO profiling for VDSO code
GCOV_PROFILE := n
+PGO_PROFILE := n

obj-y += vdso.o
targets += vdso.lds
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 1f1e351c5fe2b..ad128ecdbfbdf 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -60,6 +60,7 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS), $(KBUILD_CFLAG
# compiler instrumentation that inserts callbacks or checks into the code may
# cause crashes. Just disable it.
GCOV_PROFILE := n
+PGO_PROFILE := n
KASAN_SANITIZE := n
UBSAN_SANITIZE := n
KCOV_INSTRUMENT := n
diff --git a/arch/mips/boot/compressed/Makefile b/arch/mips/boot/compressed/Makefile
index 47cd9dc7454af..0855ea12f2c7f 100644
--- a/arch/mips/boot/compressed/Makefile
+++ b/arch/mips/boot/compressed/Makefile
@@ -37,6 +37,7 @@ KBUILD_AFLAGS := $(KBUILD_AFLAGS) -D__ASSEMBLY__ \
# Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
KCOV_INSTRUMENT := n
GCOV_PROFILE := n
+PGO_PROFILE := n

# decompressor objects (linked with vmlinuz)
vmlinuzobjs-y := $(obj)/head.o $(obj)/decompress.o $(obj)/string.o
diff --git a/arch/mips/vdso/Makefile b/arch/mips/vdso/Makefile
index 5810cc12bc1d9..d7eb64de35eae 100644
--- a/arch/mips/vdso/Makefile
+++ b/arch/mips/vdso/Makefile
@@ -66,6 +66,7 @@ ldflags-y := -Bsymbolic --no-undefined -soname=linux-vdso.so.1 \
CFLAGS_REMOVE_vdso.o = -pg

GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n
KCOV_INSTRUMENT := n

diff --git a/arch/nds32/kernel/vdso/Makefile b/arch/nds32/kernel/vdso/Makefile
index 55df25ef00578..f2b53ee2124b7 100644
--- a/arch/nds32/kernel/vdso/Makefile
+++ b/arch/nds32/kernel/vdso/Makefile
@@ -15,9 +15,9 @@ obj-vdso := $(addprefix $(obj)/, $(obj-vdso))
ccflags-y := -shared -fno-common -fno-builtin -nostdlib -fPIC -Wl,-shared -g \
-Wl,-soname=linux-vdso.so.1 -Wl,--hash-style=sysv

-# Disable gcov profiling for VDSO code
+# Disable gcov and PGO profiling for VDSO code
GCOV_PROFILE := n
-
+PGO_PROFILE := n

obj-y += vdso.o
targets += vdso.lds
diff --git a/arch/parisc/boot/compressed/Makefile b/arch/parisc/boot/compressed/Makefile
index dff4536875305..5cf93a67f7da7 100644
--- a/arch/parisc/boot/compressed/Makefile
+++ b/arch/parisc/boot/compressed/Makefile
@@ -7,6 +7,7 @@

KCOV_INSTRUMENT := n
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n

targets := vmlinux.lds vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index fe2ef598e2ead..c642c046660d7 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -153,17 +153,21 @@ endif
obj-$(CONFIG_PPC_SECURE_BOOT) += secure_boot.o ima_arch.o secvar-ops.o
obj-$(CONFIG_PPC_SECVAR_SYSFS) += secvar-sysfs.o

-# Disable GCOV, KCOV & sanitizers in odd or sensitive code
+# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
GCOV_PROFILE_prom_init.o := n
+PGO_PROFILE_prom_init.o := n
KCOV_INSTRUMENT_prom_init.o := n
UBSAN_SANITIZE_prom_init.o := n
GCOV_PROFILE_kprobes.o := n
+PGO_PROFILE_kprobes.o := n
KCOV_INSTRUMENT_kprobes.o := n
UBSAN_SANITIZE_kprobes.o := n
GCOV_PROFILE_kprobes-ftrace.o := n
+PGO_PROFILE_kprobes-ftrace.o := n
KCOV_INSTRUMENT_kprobes-ftrace.o := n
UBSAN_SANITIZE_kprobes-ftrace.o := n
GCOV_PROFILE_syscall_64.o := n
+PGO_PROFILE_syscall_64.o := n
KCOV_INSTRUMENT_syscall_64.o := n
UBSAN_SANITIZE_syscall_64.o := n
UBSAN_SANITIZE_vdso.o := n
diff --git a/arch/powerpc/kernel/trace/Makefile b/arch/powerpc/kernel/trace/Makefile
index 858503775c583..7d72ae7d4f8c6 100644
--- a/arch/powerpc/kernel/trace/Makefile
+++ b/arch/powerpc/kernel/trace/Makefile
@@ -23,7 +23,8 @@ obj-$(CONFIG_TRACING) += trace_clock.o
obj-$(CONFIG_PPC64) += $(obj64-y)
obj-$(CONFIG_PPC32) += $(obj32-y)

-# Disable GCOV, KCOV & sanitizers in odd or sensitive code
+# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
GCOV_PROFILE_ftrace.o := n
+PGO_PROFILE_ftrace.o := n
KCOV_INSTRUMENT_ftrace.o := n
UBSAN_SANITIZE_ftrace.o := n
diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
index 9cb6f524854b9..655e159975a04 100644
--- a/arch/powerpc/kernel/vdso32/Makefile
+++ b/arch/powerpc/kernel/vdso32/Makefile
@@ -34,6 +34,7 @@ targets := $(obj-vdso32) vdso32.so.dbg
obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32))

GCOV_PROFILE := n
+PGO_PROFILE := n
KCOV_INSTRUMENT := n
UBSAN_SANITIZE := n
KASAN_SANITIZE := n
diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile
index bf363ff371521..12c286f5afc16 100644
--- a/arch/powerpc/kernel/vdso64/Makefile
+++ b/arch/powerpc/kernel/vdso64/Makefile
@@ -21,6 +21,7 @@ targets := $(obj-vdso64) vdso64.so.dbg
obj-vdso64 := $(addprefix $(obj)/, $(obj-vdso64))

GCOV_PROFILE := n
+PGO_PROFILE := n
KCOV_INSTRUMENT := n
UBSAN_SANITIZE := n
KASAN_SANITIZE := n
diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile
index 4aff6846c7726..1c7f65e3cb969 100644
--- a/arch/powerpc/kexec/Makefile
+++ b/arch/powerpc/kexec/Makefile
@@ -16,7 +16,8 @@ endif
endif


-# Disable GCOV, KCOV & sanitizers in odd or sensitive code
+# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
GCOV_PROFILE_core_$(BITS).o := n
+PGO_PROFILE_core_$(BITS).o := n
KCOV_INSTRUMENT_core_$(BITS).o := n
UBSAN_SANITIZE_core_$(BITS).o := n
diff --git a/arch/powerpc/xmon/Makefile b/arch/powerpc/xmon/Makefile
index eb25d7554ffd1..7aff80d18b44b 100644
--- a/arch/powerpc/xmon/Makefile
+++ b/arch/powerpc/xmon/Makefile
@@ -2,6 +2,7 @@
# Makefile for xmon

GCOV_PROFILE := n
+PGO_PROFILE := n
KCOV_INSTRUMENT := n
UBSAN_SANITIZE := n
KASAN_SANITIZE := n
diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
index 0cfd6da784f84..882340dc3c647 100644
--- a/arch/riscv/kernel/vdso/Makefile
+++ b/arch/riscv/kernel/vdso/Makefile
@@ -32,8 +32,9 @@ CPPFLAGS_vdso.lds += -P -C -U$(ARCH)
# Disable -pg to prevent insert call site
CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os

-# Disable gcov profiling for VDSO code
+# Disable gcov and PGO profiling for VDSO code
GCOV_PROFILE := n
+PGO_PROFILE := n
KCOV_INSTRUMENT := n

# Force dependency
diff --git a/arch/s390/boot/Makefile b/arch/s390/boot/Makefile
index 41a64b8dce252..bee4a32040e79 100644
--- a/arch/s390/boot/Makefile
+++ b/arch/s390/boot/Makefile
@@ -5,6 +5,7 @@

KCOV_INSTRUMENT := n
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n
KASAN_SANITIZE := n

diff --git a/arch/s390/boot/compressed/Makefile b/arch/s390/boot/compressed/Makefile
index de18dab518bb6..c3ab883e8425a 100644
--- a/arch/s390/boot/compressed/Makefile
+++ b/arch/s390/boot/compressed/Makefile
@@ -7,6 +7,7 @@

KCOV_INSTRUMENT := n
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n
KASAN_SANITIZE := n

diff --git a/arch/s390/kernel/Makefile b/arch/s390/kernel/Makefile
index dd73b7f074237..bd857aacad794 100644
--- a/arch/s390/kernel/Makefile
+++ b/arch/s390/kernel/Makefile
@@ -14,6 +14,7 @@ CFLAGS_REMOVE_early.o = $(CC_FLAGS_FTRACE)
endif

GCOV_PROFILE_early.o := n
+PGO_PROFILE_early.o := n
KCOV_INSTRUMENT_early.o := n
UBSAN_SANITIZE_early.o := n
KASAN_SANITIZE_ipl.o := n
diff --git a/arch/s390/kernel/vdso64/Makefile b/arch/s390/kernel/vdso64/Makefile
index a6e0fb6b91d6c..d7c43b7c1db96 100644
--- a/arch/s390/kernel/vdso64/Makefile
+++ b/arch/s390/kernel/vdso64/Makefile
@@ -35,8 +35,9 @@ obj-y += vdso64_wrapper.o
targets += vdso64.lds
CPPFLAGS_vdso64.lds += -P -C -U$(ARCH)

-# Disable gcov profiling, ubsan and kasan for VDSO code
+# Disable gcov and PGO profiling, ubsan and kasan for VDSO code
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n
KASAN_SANITIZE := n

diff --git a/arch/s390/purgatory/Makefile b/arch/s390/purgatory/Makefile
index c57f8c40e9926..9aef584e98466 100644
--- a/arch/s390/purgatory/Makefile
+++ b/arch/s390/purgatory/Makefile
@@ -17,6 +17,7 @@ $(obj)/mem.o: $(srctree)/arch/s390/lib/mem.S FORCE

KCOV_INSTRUMENT := n
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n
KASAN_SANITIZE := n

diff --git a/arch/sh/boot/compressed/Makefile b/arch/sh/boot/compressed/Makefile
index 589d2d8a573db..ae19aeeb3964c 100644
--- a/arch/sh/boot/compressed/Makefile
+++ b/arch/sh/boot/compressed/Makefile
@@ -13,6 +13,7 @@ targets := vmlinux vmlinux.bin vmlinux.bin.gz \
OBJECTS = $(obj)/head_32.o $(obj)/misc.o $(obj)/cache.o

GCOV_PROFILE := n
+PGO_PROFILE := n

#
# IMAGE_OFFSET is the load offset of the compression loader
diff --git a/arch/sh/mm/Makefile b/arch/sh/mm/Makefile
index f69ddc70b1465..ea2782c631f43 100644
--- a/arch/sh/mm/Makefile
+++ b/arch/sh/mm/Makefile
@@ -43,3 +43,4 @@ obj-$(CONFIG_UNCACHED_MAPPING) += uncached.o
obj-$(CONFIG_HAVE_SRAM_POOL) += sram.o

GCOV_PROFILE_pmb.o := n
+PGO_PROFILE_pmb.o := n
diff --git a/arch/sparc/vdso/Makefile b/arch/sparc/vdso/Makefile
index c5e1545bc5cf9..ab5f3783fe199 100644
--- a/arch/sparc/vdso/Makefile
+++ b/arch/sparc/vdso/Makefile
@@ -115,6 +115,7 @@ quiet_cmd_vdso = VDSO $@

VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 -Bsymbolic
GCOV_PROFILE := n
+PGO_PROFILE := n

#
# Install the unstripped copies of vdso*.so. If our toolchain supports
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 21f851179ff08..36305ea61dc09 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -96,6 +96,7 @@ config X86
select ARCH_SUPPORTS_DEBUG_PAGEALLOC
select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
+ select ARCH_SUPPORTS_PGO_CLANG if X86_64
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index fe605205b4ce2..383853e32f673 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n

$(obj)/bzImage: asflags-y := $(SVGA_MODE)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index e0bc3988c3faa..ed12ab65f6065 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/

KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE :=n

KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 02e3e42f380bd..26e2b3af0145c 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO $@
VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
$(call ld-option, --eh-frame-hdr) -Bsymbolic
GCOV_PROFILE := n
+PGO_PROFILE := n

quiet_cmd_vdso_and_check = VDSO $@
cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index efd9e9ea17f25..f6cab2316c46a 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -184,6 +184,8 @@ SECTIONS

BUG_TABLE

+ PGO_CLANG_DATA
+
ORC_UNWIND_TABLE

. = ALIGN(PAGE_SIZE);
diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
index 84b09c230cbd5..5f22b31446ad4 100644
--- a/arch/x86/platform/efi/Makefile
+++ b/arch/x86/platform/efi/Makefile
@@ -2,6 +2,7 @@
OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
KASAN_SANITIZE := n
GCOV_PROFILE := n
+PGO_PROFILE := n

obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 95ea17a9d20cb..36f20e99da0bc 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk

# Sanitizer, etc. runtimes are unavailable and cannot be linked here.
GCOV_PROFILE := n
+PGO_PROFILE := n
KASAN_SANITIZE := n
UBSAN_SANITIZE := n
KCSAN_SANITIZE := n
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index 83f1b6a56449f..21797192f958f 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n
diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
index 5943387e3f357..54f5768f58530 100644
--- a/arch/x86/um/vdso/Makefile
+++ b/arch/x86/um/vdso/Makefile
@@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@

VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
GCOV_PROFILE := n
+PGO_PROFILE := n

#
# Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 8a94388e38b33..2d81623b33f29 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -40,6 +40,7 @@ KBUILD_CFLAGS := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))

GCOV_PROFILE := n
+PGO_PROFILE := n
# Sanitizer runtimes are unavailable and cannot be linked here.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
diff --git a/drivers/s390/char/Makefile b/drivers/s390/char/Makefile
index c6fdb81a068a6..bf6c5db5da1fc 100644
--- a/drivers/s390/char/Makefile
+++ b/drivers/s390/char/Makefile
@@ -9,6 +9,7 @@ CFLAGS_REMOVE_sclp_early_core.o = $(CC_FLAGS_FTRACE)
endif

GCOV_PROFILE_sclp_early_core.o := n
+PGO_PROFILE_sclp_early_core.o := n
KCOV_INSTRUMENT_sclp_early_core.o := n
UBSAN_SANITIZE_sclp_early_core.o := n
KASAN_SANITIZE_sclp_early_core.o := n
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index b2b3d81b1535a..3a591bb18c5fb 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -316,6 +316,49 @@
#define THERMAL_TABLE(name)
#endif

+#ifdef CONFIG_PGO_CLANG
+#define PGO_CLANG_DATA \
+ __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
+ . = ALIGN(8); \
+ __llvm_prf_start = .; \
+ __llvm_prf_data_start = .; \
+ KEEP(*(__llvm_prf_data)) \
+ . = ALIGN(8); \
+ __llvm_prf_data_end = .; \
+ } \
+ __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
+ . = ALIGN(8); \
+ __llvm_prf_cnts_start = .; \
+ KEEP(*(__llvm_prf_cnts)) \
+ . = ALIGN(8); \
+ __llvm_prf_cnts_end = .; \
+ } \
+ __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
+ . = ALIGN(8); \
+ __llvm_prf_names_start = .; \
+ KEEP(*(__llvm_prf_names)) \
+ . = ALIGN(8); \
+ __llvm_prf_names_end = .; \
+ . = ALIGN(8); \
+ } \
+ __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
+ __llvm_prf_vals_start = .; \
+ KEEP(*(__llvm_prf_vals)) \
+ . = ALIGN(8); \
+ __llvm_prf_vals_end = .; \
+ . = ALIGN(8); \
+ } \
+ __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
+ __llvm_prf_vnds_start = .; \
+ KEEP(*(__llvm_prf_vnds)) \
+ . = ALIGN(8); \
+ __llvm_prf_vnds_end = .; \
+ __llvm_prf_end = .; \
+ }
+#else
+#define PGO_CLANG_DATA
+#endif
+
#define KERNEL_DTB() \
STRUCT_ALIGN(); \
__dtb_start = .; \
@@ -1125,6 +1168,7 @@
CONSTRUCTORS \
} \
BUG_TABLE \
+ PGO_CLANG_DATA

#define INIT_TEXT_SECTION(inittext_align) \
. = ALIGN(inittext_align); \
diff --git a/kernel/Makefile b/kernel/Makefile
index aa7368c7eabf3..0b34ca228ba46 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
obj-$(CONFIG_KCSAN) += kcsan/
obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
+obj-$(CONFIG_PGO_CLANG) += pgo/

obj-$(CONFIG_PERF_EVENTS) += events/

diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
new file mode 100644
index 0000000000000..318d36bb3d106
--- /dev/null
+++ b/kernel/pgo/Kconfig
@@ -0,0 +1,34 @@
+# SPDX-License-Identifier: GPL-2.0-only
+menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
+
+config ARCH_SUPPORTS_PGO_CLANG
+ bool
+
+config PGO_CLANG
+ bool "Enable clang's PGO-based kernel profiling"
+ depends on DEBUG_FS
+ depends on ARCH_SUPPORTS_PGO_CLANG
+ help
+ This option enables clang's PGO (Profile Guided Optimization) based
+ code profiling to better optimize the kernel.
+
+ If unsure, say N.
+
+ Run a representative workload for your application on a kernel
+ compiled with this option and download the raw profile file from
+ /sys/kernel/debug/pgo/profraw. This file needs to be processed with
+ llvm-profdata. It may be merged with other collected raw profiles.
+
+ Copy the resulting profile file into vmlinux.profdata, and enable
+ KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
+ kernel.
+
+ Note that a kernel compiled with profiling flags will be
+ significantly larger and run slower. Also be sure to exclude files
+ from profiling which are not linked to the kernel image to prevent
+ linker errors.
+
+ Note that the debugfs filesystem has to be mounted to access
+ profiling data.
+
+endmenu
diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
new file mode 100644
index 0000000000000..41e27cefd9a47
--- /dev/null
+++ b/kernel/pgo/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+GCOV_PROFILE := n
+PGO_PROFILE := n
+
+obj-y += fs.o instrument.o
diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
new file mode 100644
index 0000000000000..790a8df037bfc
--- /dev/null
+++ b/kernel/pgo/fs.c
@@ -0,0 +1,382 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ * Sami Tolvanen <[email protected]>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt) "pgo: " fmt
+
+#include <linux/kernel.h>
+#include <linux/debugfs.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include "pgo.h"
+
+static struct dentry *directory;
+
+struct prf_private_data {
+ void *buffer;
+ unsigned long size;
+};
+
+/*
+ * Raw profile data format:
+ *
+ * - llvm_prf_header
+ * - __llvm_prf_data
+ * - __llvm_prf_cnts
+ * - __llvm_prf_names
+ * - zero padding to 8 bytes
+ * - for each llvm_prf_data in __llvm_prf_data:
+ * - llvm_prf_value_data
+ * - llvm_prf_value_record + site count array
+ * - llvm_prf_value_node_data
+ * ...
+ * ...
+ * ...
+ */
+
+static void prf_fill_header(void **buffer)
+{
+ struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
+
+ header->magic = LLVM_PRF_MAGIC;
+ header->version = LLVM_PRF_VARIANT_MASK_IR | LLVM_PRF_VERSION;
+ header->data_size = prf_data_count();
+ header->padding_bytes_before_counters = 0;
+ header->counters_size = prf_cnts_count();
+ header->padding_bytes_after_counters = 0;
+ header->names_size = prf_names_count();
+ header->counters_delta = (u64)__llvm_prf_cnts_start;
+ header->names_delta = (u64)__llvm_prf_names_start;
+ header->value_kind_last = LLVM_PRF_IPVK_LAST;
+
+ *buffer += sizeof(*header);
+}
+
+/*
+ * Copy the source into the buffer, incrementing the pointer into buffer in the
+ * process.
+ */
+static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
+{
+ memcpy(*buffer, src, size);
+ *buffer += size;
+}
+
+static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
+{
+ struct llvm_prf_value_node **nodes =
+ (struct llvm_prf_value_node **)p->values;
+ u32 kinds = 0;
+ u32 size = 0;
+ unsigned int kind;
+ unsigned int n;
+ unsigned int s = 0;
+
+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+ unsigned int sites = p->num_value_sites[kind];
+
+ if (!sites)
+ continue;
+
+ /* Record + site count array */
+ size += prf_get_value_record_size(sites);
+ kinds++;
+
+ if (!nodes)
+ continue;
+
+ for (n = 0; n < sites; n++) {
+ u32 count = 0;
+ struct llvm_prf_value_node *site = nodes[s + n];
+
+ while (site && ++count <= U8_MAX)
+ site = site->next;
+
+ size += count *
+ sizeof(struct llvm_prf_value_node_data);
+ }
+
+ s += sites;
+ }
+
+ if (size)
+ size += sizeof(struct llvm_prf_value_data);
+
+ if (value_kinds)
+ *value_kinds = kinds;
+
+ return size;
+}
+
+static u32 prf_get_value_size(void)
+{
+ u32 size = 0;
+ struct llvm_prf_data *p;
+
+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+ size += __prf_get_value_size(p, NULL);
+
+ return size;
+}
+
+/* Serialize the value profiling data for one function. */
+static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
+{
+ struct llvm_prf_value_data header;
+ struct llvm_prf_value_node **nodes =
+ (struct llvm_prf_value_node **)p->values;
+ unsigned int kind;
+ unsigned int n;
+ unsigned int s = 0;
+
+ header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
+
+ if (!header.num_value_kinds)
+ /* Nothing to write. */
+ return;
+
+ prf_copy_to_buffer(buffer, &header, sizeof(header));
+
+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+ struct llvm_prf_value_record *record;
+ u8 *counts;
+ unsigned int sites = p->num_value_sites[kind];
+
+ if (!sites)
+ continue;
+
+ /* Profiling value record. */
+ record = *(struct llvm_prf_value_record **)buffer;
+ *buffer += prf_get_value_record_header_size();
+
+ record->kind = kind;
+ record->num_value_sites = sites;
+
+ /* Site count array. */
+ counts = *(u8 **)buffer;
+ *buffer += prf_get_value_record_site_count_size(sites);
+
+ /*
+ * If we don't have nodes, we can skip updating the site count
+ * array, because the buffer is zero filled.
+ */
+ if (!nodes)
+ continue;
+
+ for (n = 0; n < sites; n++) {
+ u32 count = 0;
+ struct llvm_prf_value_node *site = nodes[s + n];
+
+ while (site && ++count <= U8_MAX) {
+ prf_copy_to_buffer(buffer, site,
+ sizeof(struct llvm_prf_value_node_data));
+ site = site->next;
+ }
+
+ counts[n] = (u8)count;
+ }
+
+ s += sites;
+ }
+}
+
+static void prf_serialize_values(void **buffer)
+{
+ struct llvm_prf_data *p;
+
+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+ prf_serialize_value(p, buffer);
+}
+
+static inline unsigned long prf_get_padding(unsigned long size)
+{
+ return 7 & (8 - (size % 8));
+}
+
+static unsigned long prf_buffer_size(void)
+{
+ return sizeof(struct llvm_prf_header) +
+ prf_data_size() +
+ prf_cnts_size() +
+ prf_names_size() +
+ prf_get_padding(prf_names_size()) +
+ prf_get_value_size();
+}
+
+/* Serialize the profiling data into a format LLVM's tools can understand. */
+static int prf_serialize(struct prf_private_data *p)
+{
+ int err = 0;
+ void *buffer;
+
+ p->size = prf_buffer_size();
+ p->buffer = vzalloc(p->size);
+
+ if (!p->buffer) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ buffer = p->buffer;
+
+ prf_fill_header(&buffer);
+ prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
+ prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
+ prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
+ buffer += prf_get_padding(prf_names_size());
+
+ prf_serialize_values(&buffer);
+
+out:
+ return err;
+}
+
+/* open() implementation for PGO. Creates a copy of the profiling data set. */
+static int prf_open(struct inode *inode, struct file *file)
+{
+ struct prf_private_data *data;
+ unsigned long flags;
+ int err;
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL);
+ if (!data) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ flags = prf_lock();
+
+ err = prf_serialize(data);
+ if (err) {
+ kfree(data);
+ goto out_unlock;
+ }
+
+ file->private_data = data;
+
+out_unlock:
+ prf_unlock(flags);
+out:
+ return err;
+}
+
+/* read() implementation for PGO. */
+static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct prf_private_data *data = file->private_data;
+
+ BUG_ON(!data);
+
+ return simple_read_from_buffer(buf, count, ppos, data->buffer,
+ data->size);
+}
+
+/* release() implementation for PGO. Release resources allocated by open(). */
+static int prf_release(struct inode *inode, struct file *file)
+{
+ struct prf_private_data *data = file->private_data;
+
+ if (data) {
+ vfree(data->buffer);
+ kfree(data);
+ }
+
+ return 0;
+}
+
+static const struct file_operations prf_fops = {
+ .owner = THIS_MODULE,
+ .open = prf_open,
+ .read = prf_read,
+ .llseek = default_llseek,
+ .release = prf_release
+};
+
+/* write() implementation for resetting PGO's profile data. */
+static ssize_t reset_write(struct file *file, const char __user *addr,
+ size_t len, loff_t *pos)
+{
+ struct llvm_prf_data *data;
+
+ memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
+
+ for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; ++data) {
+ struct llvm_prf_value_node **vnodes;
+ u64 current_vsite_count;
+ u32 i;
+
+ if (!data->values)
+ continue;
+
+ current_vsite_count = 0;
+ vnodes = (struct llvm_prf_value_node **)data->values;
+
+ for (i = LLVM_PRF_IPVK_FIRST; i <= LLVM_PRF_IPVK_LAST; ++i)
+ current_vsite_count += data->num_value_sites[i];
+
+ for (i = 0; i < current_vsite_count; ++i) {
+ struct llvm_prf_value_node *current_vnode = vnodes[i];
+
+ while (current_vnode) {
+ current_vnode->count = 0;
+ current_vnode = current_vnode->next;
+ }
+ }
+ }
+
+ return len;
+}
+
+static const struct file_operations prf_reset_fops = {
+ .owner = THIS_MODULE,
+ .write = reset_write,
+ .llseek = noop_llseek,
+};
+
+/* Create debugfs entries. */
+static int __init pgo_init(void)
+{
+ directory = debugfs_create_dir("pgo", NULL);
+ if (!directory)
+ goto err_remove;
+
+ if (!debugfs_create_file("profraw", 0600, directory, NULL,
+ &prf_fops))
+ goto err_remove;
+
+ if (!debugfs_create_file("reset", 0200, directory, NULL,
+ &prf_reset_fops))
+ goto err_remove;
+
+ return 0;
+
+err_remove:
+ pr_err("initialization failed\n");
+ return -EIO;
+}
+
+/* Remove debugfs entries. */
+static void __exit pgo_exit(void)
+{
+ debugfs_remove_recursive(directory);
+}
+
+module_init(pgo_init);
+module_exit(pgo_exit);
diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
new file mode 100644
index 0000000000000..465615b7f8735
--- /dev/null
+++ b/kernel/pgo/instrument.c
@@ -0,0 +1,188 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ * Sami Tolvanen <[email protected]>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt) "pgo: " fmt
+
+#include <linux/kernel.h>
+#include <linux/export.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+#include "pgo.h"
+
+/* Lock guarding value node access and serialization. */
+static DEFINE_SPINLOCK(pgo_lock);
+static int current_node;
+
+unsigned long prf_lock(void)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&pgo_lock, flags);
+
+ return flags;
+}
+
+void prf_unlock(unsigned long flags)
+{
+ spin_unlock_irqrestore(&pgo_lock, flags);
+}
+
+/*
+ * Return a newly allocated profiling value node, which holds a value tracked
+ * by the value profiler.
+ * Note: caller *must* hold pgo_lock.
+ */
+static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
+ u32 index, u64 value)
+{
+ if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
+ return NULL; /* Out of nodes */
+
+ current_node++;
+
+ /* Make sure the node is entirely within the section */
+ if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
+ &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
+ return NULL;
+
+ return &__llvm_prf_vnds_start[current_node];
+}
+
+/*
+ * Counts the number of times a target value is seen.
+ *
+ * Records the target value for the CounterIndex if not seen before. Otherwise,
+ * increments the counter associated w/ the target value.
+ */
+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
+{
+ struct llvm_prf_data *p = (struct llvm_prf_data *)data;
+ struct llvm_prf_value_node **counters;
+ struct llvm_prf_value_node *curr;
+ struct llvm_prf_value_node *min = NULL;
+ struct llvm_prf_value_node *prev = NULL;
+ u64 min_count = U64_MAX;
+ u8 values = 0;
+ unsigned long flags;
+
+ if (!p || !p->values)
+ return;
+
+ counters = (struct llvm_prf_value_node **)p->values;
+ curr = counters[index];
+
+ while (curr) {
+ if (target_value == curr->value) {
+ curr->count++;
+ return;
+ }
+
+ if (curr->count < min_count) {
+ min_count = curr->count;
+ min = curr;
+ }
+
+ prev = curr;
+ curr = curr->next;
+ values++;
+ }
+
+ if (values >= LLVM_PRF_MAX_NUM_VALS_PER_SITE) {
+ if (!min->count || !(--min->count)) {
+ curr = min;
+ curr->value = target_value;
+ curr->count++;
+ }
+ return;
+ }
+
+ /* Lock when updating the value node structure. */
+ flags = prf_lock();
+
+ curr = allocate_node(p, index, target_value);
+ if (!curr)
+ goto out;
+
+ curr->value = target_value;
+ curr->count++;
+
+ if (!counters[index])
+ counters[index] = curr;
+ else if (prev && !prev->next)
+ prev->next = curr;
+
+out:
+ prf_unlock(flags);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_target);
+
+/* Counts the number of times a range of target values is seen. */
+void __llvm_profile_instrument_range(u64 target_value, void *data,
+ u32 index, s64 precise_start,
+ s64 precise_last, s64 large_value)
+{
+ if (large_value != S64_MIN && (s64)target_value >= large_value)
+ target_value = large_value;
+ else if ((s64)target_value < precise_start ||
+ (s64)target_value > precise_last)
+ target_value = precise_last + 1;
+
+ __llvm_profile_instrument_target(target_value, data, index);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_range);
+
+static inline int inst_prof_popcount(unsigned long long value)
+{
+ value = value - ((value >> 1) & 0x5555555555555555ULL);
+ value = (value & 0x3333333333333333ULL) +
+ ((value >> 2) & 0x3333333333333333ULL);
+ value = (value + (value >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
+
+ return (int)((unsigned long long)(value * 0x0101010101010101ULL) >> 56);
+}
+
+static u64 inst_prof_get_range_rep_value(u64 value)
+{
+ if (value <= 8)
+ /* The first ranges are individually tracked, use it as is. */
+ return value;
+ else if (value >= 513)
+ /* The last range is mapped to its lowest value. */
+ return 513;
+ else if (inst_prof_popcount(value) == 1)
+ /* If it's a power of two, use it as is. */
+ return value;
+
+ /* Otherwise, map to the previous power of two + 1. */
+ return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
+}
+
+/*
+ * The target values are partitioned into multiple ranges. The range spec is
+ * defined in compiler-rt/include/profile/InstrProfData.inc.
+ */
+void __llvm_profile_instrument_memop(u64 target_value, void *data,
+ u32 counter_index)
+{
+ u64 rep_value;
+
+ /* Map the target value to the representative value of its range. */
+ rep_value = inst_prof_get_range_rep_value(target_value);
+ __llvm_profile_instrument_target(rep_value, data, counter_index);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_memop);
diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
new file mode 100644
index 0000000000000..df0aa278f28bd
--- /dev/null
+++ b/kernel/pgo/pgo.h
@@ -0,0 +1,206 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ * Sami Tolvanen <[email protected]>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _PGO_H
+#define _PGO_H
+
+/*
+ * Note: These internal LLVM definitions must match the compiler version.
+ * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
+ */
+
+#ifdef CONFIG_64BIT
+ #define LLVM_PRF_MAGIC \
+ ((u64)255 << 56 | \
+ (u64)'l' << 48 | \
+ (u64)'p' << 40 | \
+ (u64)'r' << 32 | \
+ (u64)'o' << 24 | \
+ (u64)'f' << 16 | \
+ (u64)'r' << 8 | \
+ (u64)129)
+#else
+ #define LLVM_PRF_MAGIC \
+ ((u64)255 << 56 | \
+ (u64)'l' << 48 | \
+ (u64)'p' << 40 | \
+ (u64)'r' << 32 | \
+ (u64)'o' << 24 | \
+ (u64)'f' << 16 | \
+ (u64)'R' << 8 | \
+ (u64)129)
+#endif
+
+#define LLVM_PRF_VERSION 5
+#define LLVM_PRF_DATA_ALIGN 8
+#define LLVM_PRF_IPVK_FIRST 0
+#define LLVM_PRF_IPVK_LAST 1
+#define LLVM_PRF_MAX_NUM_VALS_PER_SITE 16
+
+#define LLVM_PRF_VARIANT_MASK_IR (0x1ull << 56)
+#define LLVM_PRF_VARIANT_MASK_CSIR (0x1ull << 57)
+
+/**
+ * struct llvm_prf_header - represents the raw profile header data structure.
+ * @magic: the magic token for the file format.
+ * @version: the version of the file format.
+ * @data_size: the number of entries in the profile data section.
+ * @padding_bytes_before_counters: the number of padding bytes before the
+ * counters.
+ * @counters_size: the size in bytes of the LLVM profile section containing the
+ * counters.
+ * @padding_bytes_after_counters: the number of padding bytes after the
+ * counters.
+ * @names_size: the size in bytes of the LLVM profile section containing the
+ * counters' names.
+ * @counters_delta: the beginning of the LLVM profile counters section.
+ * @names_delta: the beginning of the LLVM profile names section.
+ * @value_kind_last: the last profile value kind.
+ */
+struct llvm_prf_header {
+ u64 magic;
+ u64 version;
+ u64 data_size;
+ u64 padding_bytes_before_counters;
+ u64 counters_size;
+ u64 padding_bytes_after_counters;
+ u64 names_size;
+ u64 counters_delta;
+ u64 names_delta;
+ u64 value_kind_last;
+};
+
+/**
+ * struct llvm_prf_data - represents the per-function control structure.
+ * @name_ref: the reference to the function's name.
+ * @func_hash: the hash value of the function.
+ * @counter_ptr: a pointer to the profile counter.
+ * @function_ptr: a pointer to the function.
+ * @values: the profiling values associated with this function.
+ * @num_counters: the number of counters in the function.
+ * @num_value_sites: the number of value profile sites.
+ */
+struct llvm_prf_data {
+ const u64 name_ref;
+ const u64 func_hash;
+ const void *counter_ptr;
+ const void *function_ptr;
+ void *values;
+ const u32 num_counters;
+ const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
+} __aligned(LLVM_PRF_DATA_ALIGN);
+
+/**
+ * struct llvm_prf_value_node_data - represents the data part of the struct
+ * llvm_prf_value_node data structure.
+ * @value: the value counters.
+ * @count: the counters' count.
+ */
+struct llvm_prf_value_node_data {
+ u64 value;
+ u64 count;
+};
+
+/**
+ * struct llvm_prf_value_node - represents an internal data structure used by
+ * the value profiler.
+ * @value: the value counters.
+ * @count: the counters' count.
+ * @next: the next value node.
+ */
+struct llvm_prf_value_node {
+ u64 value;
+ u64 count;
+ struct llvm_prf_value_node *next;
+};
+
+/**
+ * struct llvm_prf_value_data - represents the value profiling data in indexed
+ * format.
+ * @total_size: the total size in bytes including this field.
+ * @num_value_kinds: the number of value profile kinds that have value profile
+ * data.
+ */
+struct llvm_prf_value_data {
+ u32 total_size;
+ u32 num_value_kinds;
+};
+
+/**
+ * struct llvm_prf_value_record - represents the on-disk layout of the value
+ * profile data of a particular kind for one function.
+ * @kind: the kind of the value profile record.
+ * @num_value_sites: the number of value profile sites.
+ * @site_count_array: the first element of the array that stores the number
+ * of profiled values for each value site.
+ */
+struct llvm_prf_value_record {
+ u32 kind;
+ u32 num_value_sites;
+ u8 site_count_array[];
+};
+
+#define prf_get_value_record_header_size() \
+ offsetof(struct llvm_prf_value_record, site_count_array)
+#define prf_get_value_record_site_count_size(sites) \
+ roundup((sites), 8)
+#define prf_get_value_record_size(sites) \
+ (prf_get_value_record_header_size() + \
+ prf_get_value_record_site_count_size((sites)))
+
+/* Data sections */
+extern struct llvm_prf_data __llvm_prf_data_start[];
+extern struct llvm_prf_data __llvm_prf_data_end[];
+
+extern u64 __llvm_prf_cnts_start[];
+extern u64 __llvm_prf_cnts_end[];
+
+extern char __llvm_prf_names_start[];
+extern char __llvm_prf_names_end[];
+
+extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
+extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
+
+/* Locking for vnodes */
+extern unsigned long prf_lock(void);
+extern void prf_unlock(unsigned long flags);
+
+#define __DEFINE_PRF_SIZE(s) \
+ static inline unsigned long prf_ ## s ## _size(void) \
+ { \
+ unsigned long start = \
+ (unsigned long)__llvm_prf_ ## s ## _start; \
+ unsigned long end = \
+ (unsigned long)__llvm_prf_ ## s ## _end; \
+ return roundup(end - start, \
+ sizeof(__llvm_prf_ ## s ## _start[0])); \
+ } \
+ static inline unsigned long prf_ ## s ## _count(void) \
+ { \
+ return prf_ ## s ## _size() / \
+ sizeof(__llvm_prf_ ## s ## _start[0]); \
+ }
+
+__DEFINE_PRF_SIZE(data);
+__DEFINE_PRF_SIZE(cnts);
+__DEFINE_PRF_SIZE(names);
+__DEFINE_PRF_SIZE(vnds);
+
+#undef __DEFINE_PRF_SIZE
+
+#endif /* _PGO_H */
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 213677a5ed33e..9b218afb5cb87 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
$(CFLAGS_GCOV))
endif

+#
+# Enable clang's PGO profiling flags for a file or directory depending on
+# variables PGO_PROFILE_obj.o and PGO_PROFILE.
+#
+ifeq ($(CONFIG_PGO_CLANG),y)
+_c_flags += $(if $(patsubst n%,, \
+ $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
+ $(CFLAGS_PGO_CLANG))
+endif
+
#
# Enable address sanitizer flags for kernel except some files or directories
# we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
--
2.30.0.284.gd98b1dd5eaa7-goog
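
For readers following the serialization in kernel/pgo/fs.c, two pieces of 8-byte alignment arithmetic shape the raw profile layout: the zero padding after the names section, and the size of a value record (fixed header plus a site count array rounded up to 8 bytes). The following is a minimal userspace sketch of that arithmetic, not the kernel code itself. The names `pad_to_8`, `value_record_size` and `struct value_record` are hypothetical, and the sketch assumes the usual convention that an already aligned size needs no padding (an unconditional `8 - (size % 8)` would yield 8 in that case).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of struct llvm_prf_value_record from pgo.h. */
struct value_record {
	uint32_t kind;
	uint32_t num_value_sites;
	uint8_t site_count_array[];
};

/* Bytes of zero padding needed to reach the next 8-byte boundary. */
static size_t pad_to_8(size_t size)
{
	return (8 - (size % 8)) & 7;
}

/* Record header plus the per-site count array, rounded up to 8 bytes. */
static size_t value_record_size(size_t sites)
{
	size_t header = offsetof(struct value_record, site_count_array);
	size_t counts = sites + pad_to_8(sites);

	/* The count array must always end on an 8-byte boundary. */
	assert(counts % 8 == 0);
	return header + counts;
}
```

With these helpers, a names section of 13 bytes is followed by 3 bytes of padding, and a record describing 9 value sites occupies 8 + 16 = 24 bytes.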

2021-01-12 11:20:15

by Bill Wendling

Subject: [PATCH v3] pgo: add clang's Profile Guided Optimization infrastructure

From: Sami Tolvanen <[email protected]>

Enable the use of clang's Profile-Guided Optimization[1]. To generate a
profile, the kernel is instrumented with PGO counters, a representative
workload is run, and the raw profile data is collected from
/sys/kernel/debug/pgo/profraw.

The raw profile data must be processed by clang's "llvm-profdata" tool
before it can be used during recompilation:

$ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
$ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw

Multiple raw profiles may be merged during this step.
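
Before handing a collected file to llvm-profdata, it can be useful to confirm that it really starts with the raw profile magic that the kernel writes into the header. The sketch below is a userspace illustration only: the helper names `profraw_magic64` and `profraw_has_magic` are hypothetical, the constant is assembled the same way as the 64-bit LLVM_PRF_MAGIC in kernel/pgo/pgo.h, and the check assumes the file is inspected on a machine with the same byte order as the one that produced it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 64-bit raw profile magic, composed like LLVM_PRF_MAGIC in pgo.h. */
static uint64_t profraw_magic64(void)
{
	return (uint64_t)255 << 56 | (uint64_t)'l' << 48 |
	       (uint64_t)'p' << 40 | (uint64_t)'r' << 32 |
	       (uint64_t)'o' << 24 | (uint64_t)'f' << 16 |
	       (uint64_t)'r' << 8 | (uint64_t)129;
}

/* Return 1 if the buffer begins with the magic in host byte order. */
static int profraw_has_magic(const void *buf, size_t len)
{
	uint64_t magic;

	assert(buf != NULL);
	if (len < sizeof(magic))
		return 0;

	/* memcpy avoids unaligned access on the caller's buffer. */
	memcpy(&magic, buf, sizeof(magic));
	return magic == profraw_magic64();
}
```

A caller would read the first 8 bytes of vmlinux.profraw into a buffer and pass it to `profraw_has_magic()`; a zero return suggests a truncated or unrelated file.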

The data can now be used by the compiler:

$ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...

This initial submission is restricted to x86, as that's the platform we
know works. This restriction can be lifted once other platforms have
been verified to work with PGO.

Note that this method of profiling the kernel is clang-native and isn't
compatible with clang's gcov support in kernel/gcov.

[1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization

Signed-off-by: Sami Tolvanen <[email protected]>
Co-developed-by: Bill Wendling <[email protected]>
Signed-off-by: Bill Wendling <[email protected]>
---
v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
testing.
- Corrected documentation, re PGO flags when using LTO, based on Fāng-ruì
Sòng's comments.
v3: - Added change log section based on Sedat Dilek's comments.
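
As background for the "__llvm_profile_instrument_memop" hook added in v2: it maps each observed size to a representative value for its range before counting it. The following userspace sketch restates that mapping with a plain loop instead of popcount and clz. The function name `memop_rep_value` is hypothetical, and the range boundaries (0 to 8 tracked individually, 513 and above collapsed into one bucket) are taken from inst_prof_get_range_rep_value() in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Map a memop size to the representative value of its range. */
static uint64_t memop_rep_value(uint64_t value)
{
	uint64_t pow2 = 1;

	if (value <= 8)
		return value;	/* small sizes are tracked individually */
	if (value >= 513)
		return 513;	/* everything large shares one bucket */

	/* Find the largest power of two that does not exceed value. */
	while (pow2 * 2 <= value)
		pow2 *= 2;
	assert(pow2 >= 8 && pow2 <= 512);

	/* Exact powers of two keep their value, the rest map to pow2 + 1. */
	return value == pow2 ? value : pow2 + 1;
}
```

For example, sizes 17 through 31 all map to 17, while 16 and 32 map to themselves.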
---
Documentation/dev-tools/index.rst | 1 +
Documentation/dev-tools/pgo.rst | 127 +++++++++
MAINTAINERS | 9 +
Makefile | 3 +
arch/Kconfig | 1 +
arch/arm/boot/bootp/Makefile | 1 +
arch/arm/boot/compressed/Makefile | 1 +
arch/arm/vdso/Makefile | 3 +-
arch/arm64/kernel/vdso/Makefile | 3 +-
arch/arm64/kvm/hyp/nvhe/Makefile | 1 +
arch/mips/boot/compressed/Makefile | 1 +
arch/mips/vdso/Makefile | 1 +
arch/nds32/kernel/vdso/Makefile | 4 +-
arch/parisc/boot/compressed/Makefile | 1 +
arch/powerpc/kernel/Makefile | 6 +-
arch/powerpc/kernel/trace/Makefile | 3 +-
arch/powerpc/kernel/vdso32/Makefile | 1 +
arch/powerpc/kernel/vdso64/Makefile | 1 +
arch/powerpc/kexec/Makefile | 3 +-
arch/powerpc/xmon/Makefile | 1 +
arch/riscv/kernel/vdso/Makefile | 3 +-
arch/s390/boot/Makefile | 1 +
arch/s390/boot/compressed/Makefile | 1 +
arch/s390/kernel/Makefile | 1 +
arch/s390/kernel/vdso64/Makefile | 3 +-
arch/s390/purgatory/Makefile | 1 +
arch/sh/boot/compressed/Makefile | 1 +
arch/sh/mm/Makefile | 1 +
arch/sparc/vdso/Makefile | 1 +
arch/x86/Kconfig | 1 +
arch/x86/boot/Makefile | 1 +
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/entry/vdso/Makefile | 1 +
arch/x86/kernel/vmlinux.lds.S | 2 +
arch/x86/platform/efi/Makefile | 1 +
arch/x86/purgatory/Makefile | 1 +
arch/x86/realmode/rm/Makefile | 1 +
arch/x86/um/vdso/Makefile | 1 +
drivers/firmware/efi/libstub/Makefile | 1 +
drivers/s390/char/Makefile | 1 +
include/asm-generic/vmlinux.lds.h | 44 +++
kernel/Makefile | 1 +
kernel/pgo/Kconfig | 34 +++
kernel/pgo/Makefile | 5 +
kernel/pgo/fs.c | 382 ++++++++++++++++++++++++++
kernel/pgo/instrument.c | 188 +++++++++++++
kernel/pgo/pgo.h | 206 ++++++++++++++
scripts/Makefile.lib | 10 +
48 files changed, 1058 insertions(+), 9 deletions(-)
create mode 100644 Documentation/dev-tools/pgo.rst
create mode 100644 kernel/pgo/Kconfig
create mode 100644 kernel/pgo/Makefile
create mode 100644 kernel/pgo/fs.c
create mode 100644 kernel/pgo/instrument.c
create mode 100644 kernel/pgo/pgo.h

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index f7809c7b1ba9e..8d6418e858062 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -26,6 +26,7 @@ whole; patches welcome!
kgdb
kselftest
kunit/index
+ pgo


.. only:: subproject and html
diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
new file mode 100644
index 0000000000000..da0e654ae7078
--- /dev/null
+++ b/Documentation/dev-tools/pgo.rst
@@ -0,0 +1,127 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+Using PGO with the Linux kernel
+===============================
+
+Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
+when building with Clang. The profiling data is exported via the ``pgo``
+debugfs directory.
+
+.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
+
+
+Preparation
+===========
+
+Configure the kernel with:
+
+.. code-block:: make
+
+ CONFIG_DEBUG_FS=y
+ CONFIG_PGO_CLANG=y
+
+Note that kernels compiled with profiling flags will be significantly larger
+and run slower.
+
+Profiling data will only become accessible once debugfs has been mounted:
+
+.. code-block:: sh
+
+ mount -t debugfs none /sys/kernel/debug
+
+
+Customization
+=============
+
+You can enable or disable profiling for individual files and directories by
+adding a line similar to the following to the respective kernel Makefile:
+
+- For a single file (e.g. main.o)
+
+ .. code-block:: make
+
+ PGO_PROFILE_main.o := y
+
+- For all files in one directory
+
+ .. code-block:: make
+
+ PGO_PROFILE := y
+
+To exclude files from being profiled use
+
+ .. code-block:: make
+
+ PGO_PROFILE_main.o := n
+
+and
+
+ .. code-block:: make
+
+ PGO_PROFILE := n
+
+Only files which are linked to the main kernel image or are compiled as kernel
+modules are supported by this mechanism.
+
+
+Files
+=====
+
+The PGO kernel support creates the following files in debugfs:
+
+``/sys/kernel/debug/pgo``
+ Parent directory for all PGO-related files.
+
+``/sys/kernel/debug/pgo/reset``
+ Global reset file: resets all profile data to zero when written to.
+
+``/sys/kernel/debug/pgo/profraw``
+ The raw PGO data that must be processed with ``llvm-profdata``.
+
+
+Workflow
+========
+
+The PGO kernel can be run on the host or on test machines. The data, though,
+should be analyzed with Clang's tools from the same Clang version that was used
+to compile the kernel. Clang is tolerant of version skew, but it's easier to
+use the same Clang version.
+
+The profiling data is useful for optimizing the kernel, analyzing coverage,
+etc. Clang offers tools to perform these tasks.
+
+Here is an example workflow for profiling an instrumented kernel with PGO and
+using the result to optimize the kernel:
+
+1) Install the kernel on the TEST machine.
+
+2) Reset the data counters right before running the load tests
+
+ .. code-block:: sh
+
+ echo 1 > /sys/kernel/debug/pgo/reset
+
+3) Run the load tests.
+
+4) Collect the raw profile data
+
+ .. code-block:: sh
+
+ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
+
+5) (Optional) Download the raw profile data to the HOST machine.
+
+6) Process the raw profile data
+
+ .. code-block:: sh
+
+ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
+
+ Note that multiple raw profile data files can be merged during this step.
+
+7) Rebuild the kernel using the profile data (with ``CONFIG_PGO_CLANG`` disabled)
+
+ .. code-block:: sh
+
+ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
diff --git a/MAINTAINERS b/MAINTAINERS
index cc1e6a5ee6e67..1b979da316fa4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13954,6 +13954,15 @@ S: Maintained
F: include/linux/personality.h
F: include/uapi/linux/personality.h

+PGO BASED KERNEL PROFILING
+M: Sami Tolvanen <[email protected]>
+M: Bill Wendling <[email protected]>
+R: Nathan Chancellor <[email protected]>
+R: Nick Desaulniers <[email protected]>
+S: Supported
+F: Documentation/dev-tools/pgo.rst
+F: kernel/pgo
+
PHOENIX RC FLIGHT CONTROLLER ADAPTER
M: Marcus Folkesson <[email protected]>
L: [email protected]
diff --git a/Makefile b/Makefile
index 9e73f82e0d863..9128bfe1ccc97 100644
--- a/Makefile
+++ b/Makefile
@@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
# Defaults to vmlinux, but the arch makefile usually adds further targets
all: vmlinux

+CFLAGS_PGO_CLANG := -fprofile-generate
+export CFLAGS_PGO_CLANG
+
CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
$(call cc-option,-fno-tree-loop-im) \
$(call cc-disable-warning,maybe-uninitialized,)
diff --git a/arch/Kconfig b/arch/Kconfig
index 24862d15f3a36..f39d3991f6bfe 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
pairs of 32-bit arguments, select this option.

source "kernel/gcov/Kconfig"
+source "kernel/pgo/Kconfig"

source "scripts/gcc-plugins/Kconfig"

diff --git a/arch/arm/boot/bootp/Makefile b/arch/arm/boot/bootp/Makefile
index 981a8d03f064c..523bd58df0a4b 100644
--- a/arch/arm/boot/bootp/Makefile
+++ b/arch/arm/boot/bootp/Makefile
@@ -7,6 +7,7 @@
#

GCOV_PROFILE := n
+PGO_PROFILE := n

LDFLAGS_bootp := --no-undefined -X \
--defsym initrd_phys=$(INITRD_PHYS) \
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index fb521efcc6c20..5fd0fd85fc0e5 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -24,6 +24,7 @@ OBJS += hyp-stub.o
endif

GCOV_PROFILE := n
+PGO_PROFILE := n
KASAN_SANITIZE := n

# Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
diff --git a/arch/arm/vdso/Makefile b/arch/arm/vdso/Makefile
index b558bee0e1f6b..11f6ce4b48b56 100644
--- a/arch/arm/vdso/Makefile
+++ b/arch/arm/vdso/Makefile
@@ -36,8 +36,9 @@ else
CFLAGS_vgettimeofday.o = -O2 -include $(c-gettimeofday-y)
endif

-# Disable gcov profiling for VDSO code
+# Disable gcov and PGO profiling for VDSO code
GCOV_PROFILE := n
+PGO_PROFILE := n

# Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
KCOV_INSTRUMENT := n
diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
index cd9c3fa25902f..d48fc0df07020 100644
--- a/arch/arm64/kernel/vdso/Makefile
+++ b/arch/arm64/kernel/vdso/Makefile
@@ -41,8 +41,9 @@ ifneq ($(c-gettimeofday-y),)
CFLAGS_vgettimeofday.o += -include $(c-gettimeofday-y)
endif

-# Disable gcov profiling for VDSO code
+# Disable gcov and PGO profiling for VDSO code
GCOV_PROFILE := n
+PGO_PROFILE := n

obj-y += vdso.o
targets += vdso.lds
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 1f1e351c5fe2b..ad128ecdbfbdf 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -60,6 +60,7 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS), $(KBUILD_CFLAG
# compiler instrumentation that inserts callbacks or checks into the code may
# cause crashes. Just disable it.
GCOV_PROFILE := n
+PGO_PROFILE := n
KASAN_SANITIZE := n
UBSAN_SANITIZE := n
KCOV_INSTRUMENT := n
diff --git a/arch/mips/boot/compressed/Makefile b/arch/mips/boot/compressed/Makefile
index 47cd9dc7454af..0855ea12f2c7f 100644
--- a/arch/mips/boot/compressed/Makefile
+++ b/arch/mips/boot/compressed/Makefile
@@ -37,6 +37,7 @@ KBUILD_AFLAGS := $(KBUILD_AFLAGS) -D__ASSEMBLY__ \
# Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
KCOV_INSTRUMENT := n
GCOV_PROFILE := n
+PGO_PROFILE := n

# decompressor objects (linked with vmlinuz)
vmlinuzobjs-y := $(obj)/head.o $(obj)/decompress.o $(obj)/string.o
diff --git a/arch/mips/vdso/Makefile b/arch/mips/vdso/Makefile
index 5810cc12bc1d9..d7eb64de35eae 100644
--- a/arch/mips/vdso/Makefile
+++ b/arch/mips/vdso/Makefile
@@ -66,6 +66,7 @@ ldflags-y := -Bsymbolic --no-undefined -soname=linux-vdso.so.1 \
CFLAGS_REMOVE_vdso.o = -pg

GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n
KCOV_INSTRUMENT := n

diff --git a/arch/nds32/kernel/vdso/Makefile b/arch/nds32/kernel/vdso/Makefile
index 55df25ef00578..f2b53ee2124b7 100644
--- a/arch/nds32/kernel/vdso/Makefile
+++ b/arch/nds32/kernel/vdso/Makefile
@@ -15,9 +15,9 @@ obj-vdso := $(addprefix $(obj)/, $(obj-vdso))
ccflags-y := -shared -fno-common -fno-builtin -nostdlib -fPIC -Wl,-shared -g \
-Wl,-soname=linux-vdso.so.1 -Wl,--hash-style=sysv

-# Disable gcov profiling for VDSO code
+# Disable gcov and PGO profiling for VDSO code
GCOV_PROFILE := n
-
+PGO_PROFILE := n

obj-y += vdso.o
targets += vdso.lds
diff --git a/arch/parisc/boot/compressed/Makefile b/arch/parisc/boot/compressed/Makefile
index dff4536875305..5cf93a67f7da7 100644
--- a/arch/parisc/boot/compressed/Makefile
+++ b/arch/parisc/boot/compressed/Makefile
@@ -7,6 +7,7 @@

KCOV_INSTRUMENT := n
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n

targets := vmlinux.lds vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index fe2ef598e2ead..c642c046660d7 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -153,17 +153,21 @@ endif
obj-$(CONFIG_PPC_SECURE_BOOT) += secure_boot.o ima_arch.o secvar-ops.o
obj-$(CONFIG_PPC_SECVAR_SYSFS) += secvar-sysfs.o

-# Disable GCOV, KCOV & sanitizers in odd or sensitive code
+# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
GCOV_PROFILE_prom_init.o := n
+PGO_PROFILE_prom_init.o := n
KCOV_INSTRUMENT_prom_init.o := n
UBSAN_SANITIZE_prom_init.o := n
GCOV_PROFILE_kprobes.o := n
+PGO_PROFILE_kprobes.o := n
KCOV_INSTRUMENT_kprobes.o := n
UBSAN_SANITIZE_kprobes.o := n
GCOV_PROFILE_kprobes-ftrace.o := n
+PGO_PROFILE_kprobes-ftrace.o := n
KCOV_INSTRUMENT_kprobes-ftrace.o := n
UBSAN_SANITIZE_kprobes-ftrace.o := n
GCOV_PROFILE_syscall_64.o := n
+PGO_PROFILE_syscall_64.o := n
KCOV_INSTRUMENT_syscall_64.o := n
UBSAN_SANITIZE_syscall_64.o := n
UBSAN_SANITIZE_vdso.o := n
diff --git a/arch/powerpc/kernel/trace/Makefile b/arch/powerpc/kernel/trace/Makefile
index 858503775c583..7d72ae7d4f8c6 100644
--- a/arch/powerpc/kernel/trace/Makefile
+++ b/arch/powerpc/kernel/trace/Makefile
@@ -23,7 +23,8 @@ obj-$(CONFIG_TRACING) += trace_clock.o
obj-$(CONFIG_PPC64) += $(obj64-y)
obj-$(CONFIG_PPC32) += $(obj32-y)

-# Disable GCOV, KCOV & sanitizers in odd or sensitive code
+# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
GCOV_PROFILE_ftrace.o := n
+PGO_PROFILE_ftrace.o := n
KCOV_INSTRUMENT_ftrace.o := n
UBSAN_SANITIZE_ftrace.o := n
diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
index 9cb6f524854b9..655e159975a04 100644
--- a/arch/powerpc/kernel/vdso32/Makefile
+++ b/arch/powerpc/kernel/vdso32/Makefile
@@ -34,6 +34,7 @@ targets := $(obj-vdso32) vdso32.so.dbg
obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32))

GCOV_PROFILE := n
+PGO_PROFILE := n
KCOV_INSTRUMENT := n
UBSAN_SANITIZE := n
KASAN_SANITIZE := n
diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile
index bf363ff371521..12c286f5afc16 100644
--- a/arch/powerpc/kernel/vdso64/Makefile
+++ b/arch/powerpc/kernel/vdso64/Makefile
@@ -21,6 +21,7 @@ targets := $(obj-vdso64) vdso64.so.dbg
obj-vdso64 := $(addprefix $(obj)/, $(obj-vdso64))

GCOV_PROFILE := n
+PGO_PROFILE := n
KCOV_INSTRUMENT := n
UBSAN_SANITIZE := n
KASAN_SANITIZE := n
diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile
index 4aff6846c7726..1c7f65e3cb969 100644
--- a/arch/powerpc/kexec/Makefile
+++ b/arch/powerpc/kexec/Makefile
@@ -16,7 +16,8 @@ endif
endif


-# Disable GCOV, KCOV & sanitizers in odd or sensitive code
+# Disable GCOV, PGO, KCOV & sanitizers in odd or sensitive code
GCOV_PROFILE_core_$(BITS).o := n
+PGO_PROFILE_core_$(BITS).o := n
KCOV_INSTRUMENT_core_$(BITS).o := n
UBSAN_SANITIZE_core_$(BITS).o := n
diff --git a/arch/powerpc/xmon/Makefile b/arch/powerpc/xmon/Makefile
index eb25d7554ffd1..7aff80d18b44b 100644
--- a/arch/powerpc/xmon/Makefile
+++ b/arch/powerpc/xmon/Makefile
@@ -2,6 +2,7 @@
# Makefile for xmon

GCOV_PROFILE := n
+PGO_PROFILE := n
KCOV_INSTRUMENT := n
UBSAN_SANITIZE := n
KASAN_SANITIZE := n
diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
index 0cfd6da784f84..882340dc3c647 100644
--- a/arch/riscv/kernel/vdso/Makefile
+++ b/arch/riscv/kernel/vdso/Makefile
@@ -32,8 +32,9 @@ CPPFLAGS_vdso.lds += -P -C -U$(ARCH)
# Disable -pg to prevent insert call site
CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os

-# Disable gcov profiling for VDSO code
+# Disable gcov and PGO profiling for VDSO code
GCOV_PROFILE := n
+PGO_PROFILE := n
KCOV_INSTRUMENT := n

# Force dependency
diff --git a/arch/s390/boot/Makefile b/arch/s390/boot/Makefile
index 41a64b8dce252..bee4a32040e79 100644
--- a/arch/s390/boot/Makefile
+++ b/arch/s390/boot/Makefile
@@ -5,6 +5,7 @@

KCOV_INSTRUMENT := n
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n
KASAN_SANITIZE := n

diff --git a/arch/s390/boot/compressed/Makefile b/arch/s390/boot/compressed/Makefile
index de18dab518bb6..c3ab883e8425a 100644
--- a/arch/s390/boot/compressed/Makefile
+++ b/arch/s390/boot/compressed/Makefile
@@ -7,6 +7,7 @@

KCOV_INSTRUMENT := n
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n
KASAN_SANITIZE := n

diff --git a/arch/s390/kernel/Makefile b/arch/s390/kernel/Makefile
index dd73b7f074237..bd857aacad794 100644
--- a/arch/s390/kernel/Makefile
+++ b/arch/s390/kernel/Makefile
@@ -14,6 +14,7 @@ CFLAGS_REMOVE_early.o = $(CC_FLAGS_FTRACE)
endif

GCOV_PROFILE_early.o := n
+PGO_PROFILE_early.o := n
KCOV_INSTRUMENT_early.o := n
UBSAN_SANITIZE_early.o := n
KASAN_SANITIZE_ipl.o := n
diff --git a/arch/s390/kernel/vdso64/Makefile b/arch/s390/kernel/vdso64/Makefile
index a6e0fb6b91d6c..d7c43b7c1db96 100644
--- a/arch/s390/kernel/vdso64/Makefile
+++ b/arch/s390/kernel/vdso64/Makefile
@@ -35,8 +35,9 @@ obj-y += vdso64_wrapper.o
targets += vdso64.lds
CPPFLAGS_vdso64.lds += -P -C -U$(ARCH)

-# Disable gcov profiling, ubsan and kasan for VDSO code
+# Disable gcov and PGO profiling, ubsan and kasan for VDSO code
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n
KASAN_SANITIZE := n

diff --git a/arch/s390/purgatory/Makefile b/arch/s390/purgatory/Makefile
index c57f8c40e9926..9aef584e98466 100644
--- a/arch/s390/purgatory/Makefile
+++ b/arch/s390/purgatory/Makefile
@@ -17,6 +17,7 @@ $(obj)/mem.o: $(srctree)/arch/s390/lib/mem.S FORCE

KCOV_INSTRUMENT := n
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n
KASAN_SANITIZE := n

diff --git a/arch/sh/boot/compressed/Makefile b/arch/sh/boot/compressed/Makefile
index 589d2d8a573db..ae19aeeb3964c 100644
--- a/arch/sh/boot/compressed/Makefile
+++ b/arch/sh/boot/compressed/Makefile
@@ -13,6 +13,7 @@ targets := vmlinux vmlinux.bin vmlinux.bin.gz \
OBJECTS = $(obj)/head_32.o $(obj)/misc.o $(obj)/cache.o

GCOV_PROFILE := n
+PGO_PROFILE := n

#
# IMAGE_OFFSET is the load offset of the compression loader
diff --git a/arch/sh/mm/Makefile b/arch/sh/mm/Makefile
index f69ddc70b1465..ea2782c631f43 100644
--- a/arch/sh/mm/Makefile
+++ b/arch/sh/mm/Makefile
@@ -43,3 +43,4 @@ obj-$(CONFIG_UNCACHED_MAPPING) += uncached.o
obj-$(CONFIG_HAVE_SRAM_POOL) += sram.o

GCOV_PROFILE_pmb.o := n
+PGO_PROFILE_pmb.o := n
diff --git a/arch/sparc/vdso/Makefile b/arch/sparc/vdso/Makefile
index c5e1545bc5cf9..ab5f3783fe199 100644
--- a/arch/sparc/vdso/Makefile
+++ b/arch/sparc/vdso/Makefile
@@ -115,6 +115,7 @@ quiet_cmd_vdso = VDSO $@

VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 -Bsymbolic
GCOV_PROFILE := n
+PGO_PROFILE := n

#
# Install the unstripped copies of vdso*.so. If our toolchain supports
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 21f851179ff08..36305ea61dc09 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -96,6 +96,7 @@ config X86
select ARCH_SUPPORTS_DEBUG_PAGEALLOC
select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
+ select ARCH_SUPPORTS_PGO_CLANG if X86_64
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index fe605205b4ce2..383853e32f673 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n

$(obj)/bzImage: asflags-y := $(SVGA_MODE)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index e0bc3988c3faa..ed12ab65f6065 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/

KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE :=n

KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 02e3e42f380bd..26e2b3af0145c 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO $@
VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
$(call ld-option, --eh-frame-hdr) -Bsymbolic
GCOV_PROFILE := n
+PGO_PROFILE := n

quiet_cmd_vdso_and_check = VDSO $@
cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index efd9e9ea17f25..f6cab2316c46a 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -184,6 +184,8 @@ SECTIONS

BUG_TABLE

+ PGO_CLANG_DATA
+
ORC_UNWIND_TABLE

. = ALIGN(PAGE_SIZE);
diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
index 84b09c230cbd5..5f22b31446ad4 100644
--- a/arch/x86/platform/efi/Makefile
+++ b/arch/x86/platform/efi/Makefile
@@ -2,6 +2,7 @@
OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
KASAN_SANITIZE := n
GCOV_PROFILE := n
+PGO_PROFILE := n

obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 95ea17a9d20cb..36f20e99da0bc 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk

# Sanitizer, etc. runtimes are unavailable and cannot be linked here.
GCOV_PROFILE := n
+PGO_PROFILE := n
KASAN_SANITIZE := n
UBSAN_SANITIZE := n
KCSAN_SANITIZE := n
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index 83f1b6a56449f..21797192f958f 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n
diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
index 5943387e3f357..54f5768f58530 100644
--- a/arch/x86/um/vdso/Makefile
+++ b/arch/x86/um/vdso/Makefile
@@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@

VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
GCOV_PROFILE := n
+PGO_PROFILE := n

#
# Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 8a94388e38b33..2d81623b33f29 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -40,6 +40,7 @@ KBUILD_CFLAGS := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))

GCOV_PROFILE := n
+PGO_PROFILE := n
# Sanitizer runtimes are unavailable and cannot be linked here.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
diff --git a/drivers/s390/char/Makefile b/drivers/s390/char/Makefile
index c6fdb81a068a6..bf6c5db5da1fc 100644
--- a/drivers/s390/char/Makefile
+++ b/drivers/s390/char/Makefile
@@ -9,6 +9,7 @@ CFLAGS_REMOVE_sclp_early_core.o = $(CC_FLAGS_FTRACE)
endif

GCOV_PROFILE_sclp_early_core.o := n
+PGO_PROFILE_sclp_early_core.o := n
KCOV_INSTRUMENT_sclp_early_core.o := n
UBSAN_SANITIZE_sclp_early_core.o := n
KASAN_SANITIZE_sclp_early_core.o := n
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index b2b3d81b1535a..3a591bb18c5fb 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -316,6 +316,49 @@
#define THERMAL_TABLE(name)
#endif

+#ifdef CONFIG_PGO_CLANG
+#define PGO_CLANG_DATA \
+ __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
+ . = ALIGN(8); \
+ __llvm_prf_start = .; \
+ __llvm_prf_data_start = .; \
+ KEEP(*(__llvm_prf_data)) \
+ . = ALIGN(8); \
+ __llvm_prf_data_end = .; \
+ } \
+ __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
+ . = ALIGN(8); \
+ __llvm_prf_cnts_start = .; \
+ KEEP(*(__llvm_prf_cnts)) \
+ . = ALIGN(8); \
+ __llvm_prf_cnts_end = .; \
+ } \
+ __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
+ . = ALIGN(8); \
+ __llvm_prf_names_start = .; \
+ KEEP(*(__llvm_prf_names)) \
+ . = ALIGN(8); \
+ __llvm_prf_names_end = .; \
+ . = ALIGN(8); \
+ } \
+ __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
+ __llvm_prf_vals_start = .; \
+ KEEP(*(__llvm_prf_vals)) \
+ . = ALIGN(8); \
+ __llvm_prf_vals_end = .; \
+ . = ALIGN(8); \
+ } \
+ __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
+ __llvm_prf_vnds_start = .; \
+ KEEP(*(__llvm_prf_vnds)) \
+ . = ALIGN(8); \
+ __llvm_prf_vnds_end = .; \
+ __llvm_prf_end = .; \
+ }
+#else
+#define PGO_CLANG_DATA
+#endif
+
#define KERNEL_DTB() \
STRUCT_ALIGN(); \
__dtb_start = .; \
@@ -1125,6 +1168,7 @@
CONSTRUCTORS \
} \
BUG_TABLE \
+ PGO_CLANG_DATA

#define INIT_TEXT_SECTION(inittext_align) \
. = ALIGN(inittext_align); \
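A note on the section layout above: each __llvm_prf_* output section is
bracketed by a pair of _start/_end symbols, which lets C code treat the section
as an ordinary array. The following is a minimal userspace sketch of that
idiom, not the kernel's pgo.h. The names struct record, table, table_start,
table_end, table_count() and table_size() are hypothetical, and a static array
stands in for the linker-provided section.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type; stands in for an entry of a profile section. */
struct record {
	uint64_t hash;
	uint64_t counter;
};

/* A static array stands in for the section emitted by the linker. */
static struct record table[4];

/* Stand-ins for the linker-defined start/end symbols. */
static struct record *const table_start = &table[0];
static struct record *const table_end = &table[4];

/* Number of records between the two bounds. */
static size_t table_count(void)
{
	return (size_t)(table_end - table_start);
}

/* Size in bytes of the region between the two bounds. */
static size_t table_size(void)
{
	return (size_t)((const char *)table_end - (const char *)table_start);
}
```

Iterating with "for (p = table_start; p < table_end; p++)" then visits every
record, which is the same loop shape the serialization code uses over its
section bounds.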
diff --git a/kernel/Makefile b/kernel/Makefile
index aa7368c7eabf3..0b34ca228ba46 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
obj-$(CONFIG_KCSAN) += kcsan/
obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
+obj-$(CONFIG_PGO_CLANG) += pgo/

obj-$(CONFIG_PERF_EVENTS) += events/

diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
new file mode 100644
index 0000000000000..318d36bb3d106
--- /dev/null
+++ b/kernel/pgo/Kconfig
@@ -0,0 +1,34 @@
+# SPDX-License-Identifier: GPL-2.0-only
+menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
+
+config ARCH_SUPPORTS_PGO_CLANG
+ bool
+
+config PGO_CLANG
+ bool "Enable clang's PGO-based kernel profiling"
+ depends on DEBUG_FS
+ depends on ARCH_SUPPORTS_PGO_CLANG
+ help
+ This option enables clang's PGO (Profile Guided Optimization) based
+ code profiling to better optimize the kernel.
+
+ If unsure, say N.
+
+ Run a representative workload for your application on a kernel
+ compiled with this option and download the raw profile file from
+ /sys/kernel/debug/pgo/profraw. This file needs to be processed with
+ llvm-profdata. It may be merged with other collected raw profiles.
+
+ Copy the resulting profile file into vmlinux.profdata, and enable
+ KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
+ kernel.
+
+ Note that a kernel compiled with profiling flags will be
+ significantly larger and run slower. Also be sure to exclude files
+ from profiling which are not linked to the kernel image to prevent
+ linker errors.
+
+ Note that the debugfs filesystem has to be mounted to access
+ profiling data.
+
+endmenu
diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
new file mode 100644
index 0000000000000..41e27cefd9a47
--- /dev/null
+++ b/kernel/pgo/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+GCOV_PROFILE := n
+PGO_PROFILE := n
+
+obj-y += fs.o instrument.o
diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
new file mode 100644
index 0000000000000..790a8df037bfc
--- /dev/null
+++ b/kernel/pgo/fs.c
@@ -0,0 +1,382 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ * Sami Tolvanen <[email protected]>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt) "pgo: " fmt
+
+#include <linux/kernel.h>
+#include <linux/debugfs.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include "pgo.h"
+
+static struct dentry *directory;
+
+struct prf_private_data {
+ void *buffer;
+ unsigned long size;
+};
+
+/*
+ * Raw profile data format:
+ *
+ * - llvm_prf_header
+ * - __llvm_prf_data
+ * - __llvm_prf_cnts
+ * - __llvm_prf_names
+ * - zero padding to 8 bytes
+ * - for each llvm_prf_data in __llvm_prf_data:
+ * - llvm_prf_value_data
+ * - llvm_prf_value_record + site count array
+ * - llvm_prf_value_node_data
+ * ...
+ * ...
+ * ...
+ */
+
+static void prf_fill_header(void **buffer)
+{
+ struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
+
+ header->magic = LLVM_PRF_MAGIC;
+ header->version = LLVM_PRF_VARIANT_MASK_IR | LLVM_PRF_VERSION;
+ header->data_size = prf_data_count();
+ header->padding_bytes_before_counters = 0;
+ header->counters_size = prf_cnts_count();
+ header->padding_bytes_after_counters = 0;
+ header->names_size = prf_names_count();
+ header->counters_delta = (u64)__llvm_prf_cnts_start;
+ header->names_delta = (u64)__llvm_prf_names_start;
+ header->value_kind_last = LLVM_PRF_IPVK_LAST;
+
+ *buffer += sizeof(*header);
+}
+
+/*
+ * Copy the source into the buffer, incrementing the pointer into buffer in the
+ * process.
+ */
+static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
+{
+ memcpy(*buffer, src, size);
+ *buffer += size;
+}
+
+static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
+{
+ struct llvm_prf_value_node **nodes =
+ (struct llvm_prf_value_node **)p->values;
+ u32 kinds = 0;
+ u32 size = 0;
+ unsigned int kind;
+ unsigned int n;
+ unsigned int s = 0;
+
+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+ unsigned int sites = p->num_value_sites[kind];
+
+ if (!sites)
+ continue;
+
+ /* Record + site count array */
+ size += prf_get_value_record_size(sites);
+ kinds++;
+
+ if (!nodes)
+ continue;
+
+ for (n = 0; n < sites; n++) {
+ u32 count = 0;
+ struct llvm_prf_value_node *site = nodes[s + n];
+
+ while (site && ++count <= U8_MAX)
+ site = site->next;
+
+ size += count *
+ sizeof(struct llvm_prf_value_node_data);
+ }
+
+ s += sites;
+ }
+
+ if (size)
+ size += sizeof(struct llvm_prf_value_data);
+
+ if (value_kinds)
+ *value_kinds = kinds;
+
+ return size;
+}
+
+static u32 prf_get_value_size(void)
+{
+ u32 size = 0;
+ struct llvm_prf_data *p;
+
+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+ size += __prf_get_value_size(p, NULL);
+
+ return size;
+}
+
+/* Serialize the value profiling data for one llvm_prf_data entry. */
+static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
+{
+ struct llvm_prf_value_data header;
+ struct llvm_prf_value_node **nodes =
+ (struct llvm_prf_value_node **)p->values;
+ unsigned int kind;
+ unsigned int n;
+ unsigned int s = 0;
+
+ header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
+
+ if (!header.num_value_kinds)
+ /* Nothing to write. */
+ return;
+
+ prf_copy_to_buffer(buffer, &header, sizeof(header));
+
+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+ struct llvm_prf_value_record *record;
+ u8 *counts;
+ unsigned int sites = p->num_value_sites[kind];
+
+ if (!sites)
+ continue;
+
+ /* Profiling value record. */
+ record = *(struct llvm_prf_value_record **)buffer;
+ *buffer += prf_get_value_record_header_size();
+
+ record->kind = kind;
+ record->num_value_sites = sites;
+
+ /* Site count array. */
+ counts = *(u8 **)buffer;
+ *buffer += prf_get_value_record_site_count_size(sites);
+
+ /*
+ * If we don't have nodes, we can skip updating the site count
+ * array, because the buffer is zero filled.
+ */
+ if (!nodes)
+ continue;
+
+ for (n = 0; n < sites; n++) {
+ u32 count = 0;
+ struct llvm_prf_value_node *site = nodes[s + n];
+
+ while (site && ++count <= U8_MAX) {
+ prf_copy_to_buffer(buffer, site,
+ sizeof(struct llvm_prf_value_node_data));
+ site = site->next;
+ }
+
+ counts[n] = (u8)count;
+ }
+
+ s += sites;
+ }
+}
+
+static void prf_serialize_values(void **buffer)
+{
+ struct llvm_prf_data *p;
+
+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+ prf_serialize_value(p, buffer);
+}
+
+static inline unsigned long prf_get_padding(unsigned long size)
+{
+ return (8 - (size % 8)) % 8;
+}
+
+static unsigned long prf_buffer_size(void)
+{
+ return sizeof(struct llvm_prf_header) +
+ prf_data_size() +
+ prf_cnts_size() +
+ prf_names_size() +
+ prf_get_padding(prf_names_size()) +
+ prf_get_value_size();
+}
+
+/* Serialize the profiling data into a format LLVM's tools can understand. */
+static int prf_serialize(struct prf_private_data *p)
+{
+ int err = 0;
+ void *buffer;
+
+ p->size = prf_buffer_size();
+ p->buffer = vzalloc(p->size);
+
+ if (!p->buffer) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ buffer = p->buffer;
+
+ prf_fill_header(&buffer);
+ prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
+ prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
+ prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
+ buffer += prf_get_padding(prf_names_size());
+
+ prf_serialize_values(&buffer);
+
+out:
+ return err;
+}
+
+/* open() implementation for PGO. Creates a copy of the profiling data set. */
+static int prf_open(struct inode *inode, struct file *file)
+{
+ struct prf_private_data *data;
+ unsigned long flags;
+ int err;
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL);
+ if (!data) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ flags = prf_lock();
+
+ err = prf_serialize(data);
+ if (err) {
+ kfree(data);
+ goto out_unlock;
+ }
+
+ file->private_data = data;
+
+out_unlock:
+ prf_unlock(flags);
+out:
+ return err;
+}
+
+/* read() implementation for PGO. */
+static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct prf_private_data *data = file->private_data;
+
+ BUG_ON(!data);
+
+ return simple_read_from_buffer(buf, count, ppos, data->buffer,
+ data->size);
+}
+
+/* release() implementation for PGO. Release resources allocated by open(). */
+static int prf_release(struct inode *inode, struct file *file)
+{
+ struct prf_private_data *data = file->private_data;
+
+ if (data) {
+ vfree(data->buffer);
+ kfree(data);
+ }
+
+ return 0;
+}
+
+static const struct file_operations prf_fops = {
+ .owner = THIS_MODULE,
+ .open = prf_open,
+ .read = prf_read,
+ .llseek = default_llseek,
+ .release = prf_release
+};
+
+/* write() implementation for resetting PGO's profile data. */
+static ssize_t reset_write(struct file *file, const char __user *addr,
+ size_t len, loff_t *pos)
+{
+ struct llvm_prf_data *data;
+
+ memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
+
+ for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; ++data) {
+ struct llvm_prf_value_node **vnodes;
+ u64 current_vsite_count;
+ u32 i;
+
+ if (!data->values)
+ continue;
+
+ current_vsite_count = 0;
+ vnodes = (struct llvm_prf_value_node **)data->values;
+
+ for (i = LLVM_PRF_IPVK_FIRST; i <= LLVM_PRF_IPVK_LAST; ++i)
+ current_vsite_count += data->num_value_sites[i];
+
+ for (i = 0; i < current_vsite_count; ++i) {
+ struct llvm_prf_value_node *current_vnode = vnodes[i];
+
+ while (current_vnode) {
+ current_vnode->count = 0;
+ current_vnode = current_vnode->next;
+ }
+ }
+ }
+
+ return len;
+}
+
+static const struct file_operations prf_reset_fops = {
+ .owner = THIS_MODULE,
+ .write = reset_write,
+ .llseek = noop_llseek,
+};
+
+/* Create debugfs entries. */
+static int __init pgo_init(void)
+{
+ directory = debugfs_create_dir("pgo", NULL);
+ if (!directory)
+ goto err_remove;
+
+ if (!debugfs_create_file("profraw", 0600, directory, NULL,
+ &prf_fops))
+ goto err_remove;
+
+ if (!debugfs_create_file("reset", 0200, directory, NULL,
+ &prf_reset_fops))
+ goto err_remove;
+
+ return 0;
+
+err_remove:
+ pr_err("initialization failed\n");
+ return -EIO;
+}
+
+/* Remove debugfs entries. */
+static void __exit pgo_exit(void)
+{
+ debugfs_remove_recursive(directory);
+}
+
+module_init(pgo_init);
+module_exit(pgo_exit);
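The "zero padding to 8 bytes" step in the raw profile layout comment amounts to
rounding the names section up to the next 8-byte boundary. A minimal userspace
sketch of that arithmetic follows. pad_to_8() and padded_size() are
hypothetical helper names, not functions from this patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes of padding needed to round size up to a multiple of 8.
 * The trailing "% 8" makes an already aligned size need 0 bytes, not 8.
 */
static size_t pad_to_8(size_t size)
{
	return (8 - size % 8) % 8;
}

/* Size after padding; always a multiple of 8. */
static size_t padded_size(size_t size)
{
	size_t total = size + pad_to_8(size);

	assert(total % 8 == 0);
	return total;
}
```

For example, a 13-byte names section needs 3 bytes of padding, while a 16-byte
one needs none.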
diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
new file mode 100644
index 0000000000000..465615b7f8735
--- /dev/null
+++ b/kernel/pgo/instrument.c
@@ -0,0 +1,188 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ * Sami Tolvanen <[email protected]>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt) "pgo: " fmt
+
+#include <linux/kernel.h>
+#include <linux/export.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+#include "pgo.h"
+
+/* Lock guarding value node access and serialization. */
+static DEFINE_SPINLOCK(pgo_lock);
+static int current_node;
+
+unsigned long prf_lock(void)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&pgo_lock, flags);
+
+ return flags;
+}
+
+void prf_unlock(unsigned long flags)
+{
+ spin_unlock_irqrestore(&pgo_lock, flags);
+}
+
+/*
+ * Return a newly allocated profiling value node which contains the value
+ * tracked by the value profiler.
+ * Note: caller *must* hold pgo_lock.
+ */
+static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
+ u32 index, u64 value)
+{
+ if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
+ return NULL; /* Out of nodes */
+
+ current_node++;
+
+ /* Make sure the node is entirely within the section */
+ if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
+ &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
+ return NULL;
+
+ return &__llvm_prf_vnds_start[current_node];
+}
+
+/*
+ * Counts the number of times a target value is seen.
+ *
+ * Records the target value for the CounterIndex if not seen before. Otherwise,
+ * increments the counter associated w/ the target value.
+ */
+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
+{
+ struct llvm_prf_data *p = (struct llvm_prf_data *)data;
+ struct llvm_prf_value_node **counters;
+ struct llvm_prf_value_node *curr;
+ struct llvm_prf_value_node *min = NULL;
+ struct llvm_prf_value_node *prev = NULL;
+ u64 min_count = U64_MAX;
+ u8 values = 0;
+ unsigned long flags;
+
+ if (!p || !p->values)
+ return;
+
+ counters = (struct llvm_prf_value_node **)p->values;
+ curr = counters[index];
+
+ while (curr) {
+ if (target_value == curr->value) {
+ curr->count++;
+ return;
+ }
+
+ if (curr->count < min_count) {
+ min_count = curr->count;
+ min = curr;
+ }
+
+ prev = curr;
+ curr = curr->next;
+ values++;
+ }
+
+ if (values >= LLVM_PRF_MAX_NUM_VALS_PER_SITE) {
+ if (!min->count || !(--min->count)) {
+ curr = min;
+ curr->value = target_value;
+ curr->count++;
+ }
+ return;
+ }
+
+ /* Lock when updating the value node structure. */
+ flags = prf_lock();
+
+ curr = allocate_node(p, index, target_value);
+ if (!curr)
+ goto out;
+
+ curr->value = target_value;
+ curr->count++;
+
+ if (!counters[index])
+ counters[index] = curr;
+ else if (prev && !prev->next)
+ prev->next = curr;
+
+out:
+ prf_unlock(flags);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_target);
+
+/* Counts the number of times a range of target values is seen. */
+void __llvm_profile_instrument_range(u64 target_value, void *data,
+ u32 index, s64 precise_start,
+ s64 precise_last, s64 large_value)
+{
+ if (large_value != S64_MIN && (s64)target_value >= large_value)
+ target_value = large_value;
+ else if ((s64)target_value < precise_start ||
+ (s64)target_value > precise_last)
+ target_value = precise_last + 1;
+
+ __llvm_profile_instrument_target(target_value, data, index);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_range);
+
+static inline int inst_prof_popcount(unsigned long long value)
+{
+ value = value - ((value >> 1) & 0x5555555555555555ULL);
+ value = (value & 0x3333333333333333ULL) +
+ ((value >> 2) & 0x3333333333333333ULL);
+ value = (value + (value >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
+
+ return (int)((unsigned long long)(value * 0x0101010101010101ULL) >> 56);
+}
+
+static u64 inst_prof_get_range_rep_value(u64 value)
+{
+ if (value <= 8)
+ /* The first ranges are individually tracked, use it as is. */
+ return value;
+ else if (value >= 513)
+ /* The last range is mapped to its lowest value. */
+ return 513;
+ else if (inst_prof_popcount(value) == 1)
+ /* If it's a power of two, use it as is. */
+ return value;
+
+ /* Otherwise, take the previous power of two + 1. */
+ return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
+}
+
+/*
+ * The target values are partitioned into multiple ranges. The range spec is
+ * defined in compiler-rt/include/profile/InstrProfData.inc.
+ */
+void __llvm_profile_instrument_memop(u64 target_value, void *data,
+ u32 counter_index)
+{
+ u64 rep_value;
+
+ /* Map the target value to the representative value of its range. */
+ rep_value = inst_prof_get_range_rep_value(target_value);
+ __llvm_profile_instrument_target(rep_value, data, counter_index);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_memop);
diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
new file mode 100644
index 0000000000000..df0aa278f28bd
--- /dev/null
+++ b/kernel/pgo/pgo.h
@@ -0,0 +1,206 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ * Sami Tolvanen <[email protected]>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _PGO_H
+#define _PGO_H
+
+/*
+ * Note: These internal LLVM definitions must match the compiler version.
+ * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
+ */
+
+#ifdef CONFIG_64BIT
+ #define LLVM_PRF_MAGIC \
+ ((u64)255 << 56 | \
+ (u64)'l' << 48 | \
+ (u64)'p' << 40 | \
+ (u64)'r' << 32 | \
+ (u64)'o' << 24 | \
+ (u64)'f' << 16 | \
+ (u64)'r' << 8 | \
+ (u64)129)
+#else
+ #define LLVM_PRF_MAGIC \
+ ((u64)255 << 56 | \
+ (u64)'l' << 48 | \
+ (u64)'p' << 40 | \
+ (u64)'r' << 32 | \
+ (u64)'o' << 24 | \
+ (u64)'f' << 16 | \
+ (u64)'R' << 8 | \
+ (u64)129)
+#endif
+
+#define LLVM_PRF_VERSION 5
+#define LLVM_PRF_DATA_ALIGN 8
+#define LLVM_PRF_IPVK_FIRST 0
+#define LLVM_PRF_IPVK_LAST 1
+#define LLVM_PRF_MAX_NUM_VALS_PER_SITE 16
+
+#define LLVM_PRF_VARIANT_MASK_IR (0x1ull << 56)
+#define LLVM_PRF_VARIANT_MASK_CSIR (0x1ull << 57)
+
+/**
+ * struct llvm_prf_header - represents the raw profile header data structure.
+ * @magic: the magic token for the file format.
+ * @version: the version of the file format.
+ * @data_size: the number of entries in the profile data section.
+ * @padding_bytes_before_counters: the number of padding bytes before the
+ * counters.
+ * @counters_size: the size in bytes of the LLVM profile section containing the
+ * counters.
+ * @padding_bytes_after_counters: the number of padding bytes after the
+ * counters.
+ * @names_size: the size in bytes of the LLVM profile section containing the
+ * counters' names.
+ * @counters_delta: the beginning of the LLVM profile counters section.
+ * @names_delta: the beginning of the LLVM profile names section.
+ * @value_kind_last: the last profile value kind.
+ */
+struct llvm_prf_header {
+ u64 magic;
+ u64 version;
+ u64 data_size;
+ u64 padding_bytes_before_counters;
+ u64 counters_size;
+ u64 padding_bytes_after_counters;
+ u64 names_size;
+ u64 counters_delta;
+ u64 names_delta;
+ u64 value_kind_last;
+};
+
+/**
+ * struct llvm_prf_data - represents the per-function control structure.
+ * @name_ref: the reference to the function's name.
+ * @func_hash: the hash value of the function.
+ * @counter_ptr: a pointer to the profile counter.
+ * @function_ptr: a pointer to the function.
+ * @values: the profiling values associated with this function.
+ * @num_counters: the number of counters in the function.
+ * @num_value_sites: the number of value profile sites.
+ */
+struct llvm_prf_data {
+ const u64 name_ref;
+ const u64 func_hash;
+ const void *counter_ptr;
+ const void *function_ptr;
+ void *values;
+ const u32 num_counters;
+ const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
+} __aligned(LLVM_PRF_DATA_ALIGN);
+
+/**
+ * struct llvm_prf_value_node_data - represents the data part of the struct
+ * llvm_prf_value_node data structure.
+ * @value: the value counters.
+ * @count: the counters' count.
+ */
+struct llvm_prf_value_node_data {
+ u64 value;
+ u64 count;
+};
+
+/**
+ * struct llvm_prf_value_node - represents an internal data structure used by
+ * the value profiler.
+ * @value: the value counters.
+ * @count: the counters' count.
+ * @next: the next value node.
+ */
+struct llvm_prf_value_node {
+ u64 value;
+ u64 count;
+ struct llvm_prf_value_node *next;
+};
+
+/**
+ * struct llvm_prf_value_data - represents the value profiling data in indexed
+ * format.
+ * @total_size: the total size in bytes including this field.
+ * @num_value_kinds: the number of value profile kinds that have value profile
+ * data.
+ */
+struct llvm_prf_value_data {
+ u32 total_size;
+ u32 num_value_kinds;
+};
+
+/**
+ * struct llvm_prf_value_record - represents the on-disk layout of the value
+ * profile data of a particular kind for one function.
+ * @kind: the kind of the value profile record.
+ * @num_value_sites: the number of value profile sites.
+ * @site_count_array: the first element of the array that stores the number
+ * of profiled values for each value site.
+ */
+struct llvm_prf_value_record {
+ u32 kind;
+ u32 num_value_sites;
+ u8 site_count_array[];
+};
+
+#define prf_get_value_record_header_size() \
+ offsetof(struct llvm_prf_value_record, site_count_array)
+#define prf_get_value_record_site_count_size(sites) \
+ roundup((sites), 8)
+#define prf_get_value_record_size(sites) \
+ (prf_get_value_record_header_size() + \
+ prf_get_value_record_site_count_size((sites)))
+
+/* Data sections */
+extern struct llvm_prf_data __llvm_prf_data_start[];
+extern struct llvm_prf_data __llvm_prf_data_end[];
+
+extern u64 __llvm_prf_cnts_start[];
+extern u64 __llvm_prf_cnts_end[];
+
+extern char __llvm_prf_names_start[];
+extern char __llvm_prf_names_end[];
+
+extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
+extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
+
+/* Locking for vnodes */
+extern unsigned long prf_lock(void);
+extern void prf_unlock(unsigned long flags);
+
+#define __DEFINE_PRF_SIZE(s) \
+ static inline unsigned long prf_ ## s ## _size(void) \
+ { \
+ unsigned long start = \
+ (unsigned long)__llvm_prf_ ## s ## _start; \
+ unsigned long end = \
+ (unsigned long)__llvm_prf_ ## s ## _end; \
+ return roundup(end - start, \
+ sizeof(__llvm_prf_ ## s ## _start[0])); \
+ } \
+ static inline unsigned long prf_ ## s ## _count(void) \
+ { \
+ return prf_ ## s ## _size() / \
+ sizeof(__llvm_prf_ ## s ## _start[0]); \
+ }
+
+__DEFINE_PRF_SIZE(data);
+__DEFINE_PRF_SIZE(cnts);
+__DEFINE_PRF_SIZE(names);
+__DEFINE_PRF_SIZE(vnds);
+
+#undef __DEFINE_PRF_SIZE
+
+#endif /* _PGO_H */
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 213677a5ed33e..9b218afb5cb87 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
$(CFLAGS_GCOV))
endif

+#
+# Enable clang's PGO profiling flags for a file or directory depending on
+# variables PGO_PROFILE_obj.o and PGO_PROFILE.
+#
+ifeq ($(CONFIG_PGO_CLANG),y)
+_c_flags += $(if $(patsubst n%,, \
+ $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
+ $(CFLAGS_PGO_CLANG))
+endif
+
#
# Enable address sanitizer flags for kernel except some files or directories
# we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
--
2.30.0.284.gd98b1dd5eaa7-goog
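
For readers following the memop range mapping in kernel/pgo/instrument.c above, here is a minimal user-space sketch of what inst_prof_get_range_rep_value computes. The function name range_rep_value is made up for illustration, the power-of-two test uses a bit trick instead of a popcount, and __builtin_clzll is assumed to be available (GCC or Clang). It is not the kernel code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for inst_prof_get_range_rep_value(). */
static uint64_t range_rep_value(uint64_t value)
{
	if (value <= 8)
		return value;	/* 0..8 are tracked individually */
	if (value >= 513)
		return 513;	/* the last range maps to its lowest value */
	if ((value & (value - 1)) == 0)
		return value;	/* exact powers of two map to themselves */

	/* clz is undefined for 0; the checks above guarantee 9..511 here. */
	assert(value > 8 && value < 513);

	/* Otherwise: the previous power of two, plus one. */
	return (UINT64_C(1) << (63 - __builtin_clzll(value))) + 1;
}
```

With this mapping, a size of 100 lands in the bucket represented by 65 (64 + 1), while 64 and 128 keep their own values.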

2021-01-12 17:26:26

by Nathan Chancellor

Subject: Re: [PATCH v3] pgo: add clang's Profile Guided Optimization infrastructure

On Tue, Jan 12, 2021 at 05:10:04PM +0800, kernel test robot wrote:
> Hi Bill,
>
> I love your patch! Perhaps something to improve:
>
> [auto build test WARNING on linus/master]
> [also build test WARNING on v5.11-rc3]
> [cannot apply to powerpc/next s390/features tip/x86/core next-20210111]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url: https://github.com/0day-ci/linux/commits/Bill-Wendling/pgo-add-clang-s-Profile-Guided-Optimization-infrastructure/20210112-133315
> base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git a0d54b4f5b219fb31f0776e9f53aa137e78ae431
> config: x86_64-allyesconfig (attached as .config)
> compiler: gcc-9 (Debian 9.3.0-15) 9.3.0

Hmmm... This should probably be gated on CC_IS_CLANG? Or even better
CLANG_VERSION >= 120000 due to
https://github.com/ClangBuiltLinux/linux/issues/1252?

> reproduce (this is a W=1 build):
> # https://github.com/0day-ci/linux/commit/6ab85bae7667afd0aa68c6442b7ca5c369fa1088
> git remote add linux-review https://github.com/0day-ci/linux
> git fetch --no-tags linux-review Bill-Wendling/pgo-add-clang-s-Profile-Guided-Optimization-infrastructure/20210112-133315
> git checkout 6ab85bae7667afd0aa68c6442b7ca5c369fa1088
> # save the attached .config to linux build tree
> make W=1 ARCH=x86_64
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <[email protected]>
>
> All warnings (new ones prefixed by >>):
>
> kernel/pgo/instrument.c:72:6: warning: no previous prototype for '__llvm_profile_instrument_target' [-Wmissing-prototypes]
> 72 | void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> kernel/pgo/instrument.c:135:6: warning: no previous prototype for '__llvm_profile_instrument_range' [-Wmissing-prototypes]
> 135 | void __llvm_profile_instrument_range(u64 target_value, void *data,
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >> kernel/pgo/instrument.c:179:6: warning: no previous prototype for '__llvm_profile_instrument_memop' [-Wmissing-prototypes]
> 179 | void __llvm_profile_instrument_memop(u64 target_value, void *data,
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>

I still think that this warning will show up with clang at W=1. Given
that these are compiler inserted functions, the prototypes don't matter
but we could shut it up by just putting the prototypes right above the
functions like was done in commit 1e1b6d63d634 ("lib/string.c: implement
stpcpy").
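
A minimal user-space sketch of that pattern, assuming a made-up function example_instrument_target and a made-up counter table (neither is from the patch): declaring the prototype directly above the definition is enough to satisfy -Wmissing-prototypes without a header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_NUM_SITES 4	/* hypothetical table size */

/* Prototype placed right above the definition to silence the warning. */
void example_instrument_target(uint64_t target_value, void *data,
			       uint32_t index);

void example_instrument_target(uint64_t target_value, void *data,
			       uint32_t index)
{
	uint64_t *hits = data;

	(void)target_value;	/* unused in this sketch */
	assert(index < EXAMPLE_NUM_SITES);
	if (!hits)
		return;
	hits[index]++;		/* count one call for this site */
}
```

Building this with -Wmissing-prototypes should produce no warning for the function, since a declaration is visible before the definition.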

Cheers,
Nathan

2021-01-12 17:40:06

by Nick Desaulniers

Subject: Re: [PATCH v2] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jan 11, 2021 at 9:14 PM Bill Wendling <[email protected]> wrote:
>
> From: Sami Tolvanen <[email protected]>
>
> Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> profile, the kernel is instrumented with PGO counters, a representative
> workload is run, and the raw profile data is collected from
> /sys/kernel/debug/pgo/profraw.
>
> The raw profile data must be processed by clang's "llvm-profdata" tool
> before it can be used during recompilation:
>
> $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
> Multiple raw profiles may be merged during this step.
>
> The data can now be used by the compiler:
>
> $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>
> This initial submission is restricted to x86, as that's the platform we

Please drop all changes to arch/* that are not to arch/x86/ then; we
can cross that bridge when we get to each arch. For example, there's
no point disabling PGO for architectures LLVM doesn't even have a
backend for.

> know works. This restriction can be lifted once other platforms have
> been verified to work with PGO.
>
> Note that this method of profiling the kernel is clang-native and isn't
> compatible with clang's gcov support in kernel/gcov.

Then the Kconfig option should depend on !GCOV so that they are
mutually exclusive and can't be selected together accidentally; such
as by bots doing randconfig tests.

<large snip>

> +static inline int inst_prof_popcount(unsigned long long value)
> +{
> + value = value - ((value >> 1) & 0x5555555555555555ULL);
> + value = (value & 0x3333333333333333ULL) +
> + ((value >> 2) & 0x3333333333333333ULL);
> + value = (value + (value >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
> +
> + return (int)((unsigned long long)(value * 0x0101010101010101ULL) >> 56);
> +}

The kernel has a portable popcnt implementation called hweight64 if
you #include <asm-generic/bitops/hweight.h>; does that work here?
https://en.wikipedia.org/wiki/Hamming_weight
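
As a rough user-space check that the quoted helper is a plain Hamming weight, the sketch below compares it against the compiler builtin. hweight64 is kernel-only, so __builtin_popcountll (GCC or Clang) stands in for it here; the names swar_popcount and check_popcounts are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The patch's bit-twiddling popcount, reproduced for comparison. */
static int swar_popcount(uint64_t value)
{
	value = value - ((value >> 1) & 0x5555555555555555ULL);
	value = (value & 0x3333333333333333ULL) +
		((value >> 2) & 0x3333333333333333ULL);
	value = (value + (value >> 4)) & 0x0F0F0F0F0F0F0F0FULL;

	return (int)((value * 0x0101010101010101ULL) >> 56);
}

/* Assert that both implementations agree on a handful of samples. */
static void check_popcounts(void)
{
	static const uint64_t samples[] = {
		0, 1, 8, 9, 512, 513, 0xFFULL << 56, UINT64_MAX,
	};
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(swar_popcount(samples[i]) ==
		       __builtin_popcountll(samples[i]));
}
```

If the two agree, swapping the helper for an existing Hamming-weight routine should not change which values are treated as powers of two.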
--
Thanks,
~Nick Desaulniers

2021-01-12 23:37:46

by kernel test robot

Subject: Re: [PATCH v3] pgo: add clang's Profile Guided Optimization infrastructure

Hi Bill,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.11-rc3]
[cannot apply to powerpc/next s390/features tip/x86/core next-20210111]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Bill-Wendling/pgo-add-clang-s-Profile-Guided-Optimization-infrastructure/20210112-133315
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git a0d54b4f5b219fb31f0776e9f53aa137e78ae431
config: x86_64-allyesconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce (this is a W=1 build):
# https://github.com/0day-ci/linux/commit/6ab85bae7667afd0aa68c6442b7ca5c369fa1088
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Bill-Wendling/pgo-add-clang-s-Profile-Guided-Optimization-infrastructure/20210112-133315
git checkout 6ab85bae7667afd0aa68c6442b7ca5c369fa1088
# save the attached .config to linux build tree
make W=1 ARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

kernel/pgo/instrument.c:72:6: warning: no previous prototype for '__llvm_profile_instrument_target' [-Wmissing-prototypes]
72 | void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/pgo/instrument.c:135:6: warning: no previous prototype for '__llvm_profile_instrument_range' [-Wmissing-prototypes]
135 | void __llvm_profile_instrument_range(u64 target_value, void *data,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> kernel/pgo/instrument.c:179:6: warning: no previous prototype for '__llvm_profile_instrument_memop' [-Wmissing-prototypes]
179 | void __llvm_profile_instrument_memop(u64 target_value, void *data,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


vim +/__llvm_profile_instrument_memop +179 kernel/pgo/instrument.c

174
175 /*
176 * The target values are partitioned into multiple ranges. The range spec is
177 * defined in compiler-rt/include/profile/InstrProfData.inc.
178 */
> 179 void __llvm_profile_instrument_memop(u64 target_value, void *data,

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]


Attachments:
(No filename) (2.56 kB)
.config.gz (76.00 kB)

2021-01-13 02:23:55

by Fangrui Song

Subject: Re: [PATCH v2] pgo: add clang's Profile Guided Optimization infrastructure

On Tue, Jan 12, 2021 at 9:37 AM 'Nick Desaulniers' via Clang Built
Linux <[email protected]> wrote:
>
> On Mon, Jan 11, 2021 at 9:14 PM Bill Wendling <[email protected]> wrote:
> >
> > From: Sami Tolvanen <[email protected]>
> >
> > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > profile, the kernel is instrumented with PGO counters, a representative
> > workload is run, and the raw profile data is collected from
> > /sys/kernel/debug/pgo/profraw.
> >
> > The raw profile data must be processed by clang's "llvm-profdata" tool
> > before it can be used during recompilation:
> >
> > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >
> > Multiple raw profiles may be merged during this step.
> >
> > The data can now be used by the compiler:
> >
> > $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> >
> > This initial submission is restricted to x86, as that's the platform we
>
> Please drop all changes to arch/* that are not to arch/x86/ then; we
> can cross that bridge when we get to each arch. For example, there's
> no point disabling PGO for architectures LLVM doesn't even have a
> backend for.
>
> > know works. This restriction can be lifted once other platforms have
> > been verified to work with PGO.
> >
> > Note that this method of profiling the kernel is clang-native and isn't
> > compatible with clang's gcov support in kernel/gcov.
>
> Then the Kconfig option should depend on !GCOV so that they are
> mutually exclusive and can't be selected together accidentally; such
> as by bots doing randconfig tests.

The profile formats (Clang PGO, Clang gcov, GCC gcov/PGO) are
different but Clang PGO can be used with Clang's gcov implementation:
clang -fprofile-generate --coverage a.cc; ./a.out => default*.profraw + a.gcda

> <large snip>
>
> > +static inline int inst_prof_popcount(unsigned long long value)
> > +{
> > + value = value - ((value >> 1) & 0x5555555555555555ULL);
> > + value = (value & 0x3333333333333333ULL) +
> > + ((value >> 2) & 0x3333333333333333ULL);
> > + value = (value + (value >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
> > +
> > + return (int)((unsigned long long)(value * 0x0101010101010101ULL) >> 56);
> > +}
>
> The kernel has a portable popcnt implementation called hweight64 if
> you #include <asm-generic/bitops/hweight.h>; does that work here?
> https://en.wikipedia.org/wiki/Hamming_weight
> --
> Thanks,
> ~Nick Desaulniers
>

2021-01-13 06:22:21

by Bill Wendling

Subject: [PATCH v4] pgo: add clang's Profile Guided Optimization infrastructure

From: Sami Tolvanen <[email protected]>

Enable the use of clang's Profile-Guided Optimization[1]. To generate a
profile, the kernel is instrumented with PGO counters, a representative
workload is run, and the raw profile data is collected from
/sys/kernel/debug/pgo/profraw.

The raw profile data must be processed by clang's "llvm-profdata" tool
before it can be used during recompilation:

$ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
$ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw

Multiple raw profiles may be merged during this step.

The data can now be used by the compiler:

$ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...

This initial submission is restricted to x86, as that's the platform we
know works. This restriction can be lifted once other platforms have
been verified to work with PGO.

Note that this method of profiling the kernel is clang-native, unlike
the clang support in kernel/gcov.

[1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization

Signed-off-by: Sami Tolvanen <[email protected]>
Co-developed-by: Bill Wendling <[email protected]>
Signed-off-by: Bill Wendling <[email protected]>
Change-Id: Ic78e69c682286d3a44c4549a0138578c98138b77
---
v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
testing.
- Corrected documentation, re PGO flags when using LTO, based on Fangrui
Song's comments.
v3: - Added change log section based on Sedat Dilek's comments.
v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
own popcount implementation, based on Nick Desaulniers's comment.
---
Documentation/dev-tools/index.rst | 1 +
Documentation/dev-tools/pgo.rst | 127 +++++++++
MAINTAINERS | 9 +
Makefile | 3 +
arch/Kconfig | 1 +
arch/x86/Kconfig | 1 +
arch/x86/boot/Makefile | 1 +
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/entry/vdso/Makefile | 1 +
arch/x86/kernel/vmlinux.lds.S | 2 +
arch/x86/platform/efi/Makefile | 1 +
arch/x86/purgatory/Makefile | 1 +
arch/x86/realmode/rm/Makefile | 1 +
arch/x86/um/vdso/Makefile | 1 +
drivers/firmware/efi/libstub/Makefile | 1 +
include/asm-generic/vmlinux.lds.h | 44 +++
kernel/Makefile | 1 +
kernel/pgo/Kconfig | 34 +++
kernel/pgo/Makefile | 5 +
kernel/pgo/fs.c | 382 ++++++++++++++++++++++++++
kernel/pgo/instrument.c | 185 +++++++++++++
kernel/pgo/pgo.h | 206 ++++++++++++++
scripts/Makefile.lib | 10 +
23 files changed, 1019 insertions(+)
create mode 100644 Documentation/dev-tools/pgo.rst
create mode 100644 kernel/pgo/Kconfig
create mode 100644 kernel/pgo/Makefile
create mode 100644 kernel/pgo/fs.c
create mode 100644 kernel/pgo/instrument.c
create mode 100644 kernel/pgo/pgo.h

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index f7809c7b1ba9e..8d6418e858062 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -26,6 +26,7 @@ whole; patches welcome!
kgdb
kselftest
kunit/index
+ pgo


.. only:: subproject and html
diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
new file mode 100644
index 0000000000000..b7f11d8405b73
--- /dev/null
+++ b/Documentation/dev-tools/pgo.rst
@@ -0,0 +1,127 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+Using PGO with the Linux kernel
+===============================
+
+Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
+when building with Clang. The profiling data is exported via the ``pgo``
+debugfs directory.
+
+.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
+
+
+Preparation
+===========
+
+Configure the kernel with:
+
+.. code-block:: make
+
+ CONFIG_DEBUG_FS=y
+ CONFIG_PGO_CLANG=y
+
+Note that kernels compiled with profiling flags will be significantly larger
+and run slower.
+
+Profiling data will only become accessible once debugfs has been mounted:
+
+.. code-block:: sh
+
+ mount -t debugfs none /sys/kernel/debug
+
+
+Customization
+=============
+
+You can enable or disable profiling for individual files and directories by
+adding a line similar to the following to the respective kernel Makefile:
+
+- For a single file (e.g. main.o)
+
+ .. code-block:: make
+
+ PGO_PROFILE_main.o := y
+
+- For all files in one directory
+
+ .. code-block:: make
+
+ PGO_PROFILE := y
+
+To exclude files from being profiled use
+
+ .. code-block:: make
+
+ PGO_PROFILE_main.o := n
+
+and
+
+ .. code-block:: make
+
+ PGO_PROFILE := n
+
+Only files which are linked to the main kernel image or are compiled as kernel
+modules are supported by this mechanism.
+
+
+Files
+=====
+
+The PGO kernel support creates the following files in debugfs:
+
+``/sys/kernel/debug/pgo``
+ Parent directory for all PGO-related files.
+
+``/sys/kernel/debug/pgo/reset``
+ Global reset file: resets all coverage data to zero when written to.
+
+``/sys/kernel/debug/pgo/profraw``
+ The raw PGO data that must be processed with ``llvm-profdata``.
+
+
+Workflow
+========
+
+The PGO kernel can be run on the host or test machines. The data, though, should
+be analyzed with Clang's tools from the same Clang version that compiled the
+kernel. Clang is tolerant of version skew, but it's easier to use the same
+Clang version.
+
+The profiling data is useful for optimizing the kernel, analyzing coverage,
+etc. Clang offers tools to perform these tasks.
+
+Here is an example workflow for profiling an instrumented kernel with PGO and
+using the result to optimize the kernel:
+
+1) Install the kernel on the TEST machine.
+
+2) Reset the data counters right before running the load tests
+
+ .. code-block:: sh
+
+ $ echo 1 > /sys/kernel/debug/pgo/reset
+
+3) Run the load tests.
+
+4) Collect the raw profile data
+
+ .. code-block:: sh
+
+ $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
+
+5) (Optional) Download the raw profile data to the HOST machine.
+
+6) Process the raw profile data
+
+ .. code-block:: sh
+
+ $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
+
+ Note that multiple raw profile data files can be merged during this step.
+
+7) Rebuild the kernel using the profile data (PGO disabled)
+
+ .. code-block:: sh
+
+ $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
diff --git a/MAINTAINERS b/MAINTAINERS
index cc1e6a5ee6e67..1b979da316fa4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13954,6 +13954,15 @@ S: Maintained
F: include/linux/personality.h
F: include/uapi/linux/personality.h

+PGO BASED KERNEL PROFILING
+M: Sami Tolvanen <[email protected]>
+M: Bill Wendling <[email protected]>
+R: Nathan Chancellor <[email protected]>
+R: Nick Desaulniers <[email protected]>
+S: Supported
+F: Documentation/dev-tools/pgo.rst
+F: kernel/pgo
+
PHOENIX RC FLIGHT CONTROLLER ADAPTER
M: Marcus Folkesson <[email protected]>
L: [email protected]
diff --git a/Makefile b/Makefile
index 9e73f82e0d863..9128bfe1ccc97 100644
--- a/Makefile
+++ b/Makefile
@@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
# Defaults to vmlinux, but the arch makefile usually adds further targets
all: vmlinux

+CFLAGS_PGO_CLANG := -fprofile-generate
+export CFLAGS_PGO_CLANG
+
CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
$(call cc-option,-fno-tree-loop-im) \
$(call cc-disable-warning,maybe-uninitialized,)
diff --git a/arch/Kconfig b/arch/Kconfig
index 24862d15f3a36..f39d3991f6bfe 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
pairs of 32-bit arguments, select this option.

source "kernel/gcov/Kconfig"
+source "kernel/pgo/Kconfig"

source "scripts/gcc-plugins/Kconfig"

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 21f851179ff08..36305ea61dc09 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -96,6 +96,7 @@ config X86
select ARCH_SUPPORTS_DEBUG_PAGEALLOC
select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
+ select ARCH_SUPPORTS_PGO_CLANG if X86_64
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index fe605205b4ce2..383853e32f673 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n

$(obj)/bzImage: asflags-y := $(SVGA_MODE)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index e0bc3988c3faa..ed12ab65f6065 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/

KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE :=n

KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 02e3e42f380bd..26e2b3af0145c 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO $@
VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
$(call ld-option, --eh-frame-hdr) -Bsymbolic
GCOV_PROFILE := n
+PGO_PROFILE := n

quiet_cmd_vdso_and_check = VDSO $@
cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index efd9e9ea17f25..f6cab2316c46a 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -184,6 +184,8 @@ SECTIONS

BUG_TABLE

+ PGO_CLANG_DATA
+
ORC_UNWIND_TABLE

. = ALIGN(PAGE_SIZE);
diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
index 84b09c230cbd5..5f22b31446ad4 100644
--- a/arch/x86/platform/efi/Makefile
+++ b/arch/x86/platform/efi/Makefile
@@ -2,6 +2,7 @@
OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
KASAN_SANITIZE := n
GCOV_PROFILE := n
+PGO_PROFILE := n

obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 95ea17a9d20cb..36f20e99da0bc 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk

# Sanitizer, etc. runtimes are unavailable and cannot be linked here.
GCOV_PROFILE := n
+PGO_PROFILE := n
KASAN_SANITIZE := n
UBSAN_SANITIZE := n
KCSAN_SANITIZE := n
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index 83f1b6a56449f..21797192f958f 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n
diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
index 5943387e3f357..54f5768f58530 100644
--- a/arch/x86/um/vdso/Makefile
+++ b/arch/x86/um/vdso/Makefile
@@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@

VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
GCOV_PROFILE := n
+PGO_PROFILE := n

#
# Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 8a94388e38b33..2d81623b33f29 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -40,6 +40,7 @@ KBUILD_CFLAGS := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))

GCOV_PROFILE := n
+PGO_PROFILE := n
# Sanitizer runtimes are unavailable and cannot be linked here.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index b2b3d81b1535a..3a591bb18c5fb 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -316,6 +316,49 @@
#define THERMAL_TABLE(name)
#endif

+#ifdef CONFIG_PGO_CLANG
+#define PGO_CLANG_DATA \
+ __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
+ . = ALIGN(8); \
+ __llvm_prf_start = .; \
+ __llvm_prf_data_start = .; \
+ KEEP(*(__llvm_prf_data)) \
+ . = ALIGN(8); \
+ __llvm_prf_data_end = .; \
+ } \
+ __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
+ . = ALIGN(8); \
+ __llvm_prf_cnts_start = .; \
+ KEEP(*(__llvm_prf_cnts)) \
+ . = ALIGN(8); \
+ __llvm_prf_cnts_end = .; \
+ } \
+ __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
+ . = ALIGN(8); \
+ __llvm_prf_names_start = .; \
+ KEEP(*(__llvm_prf_names)) \
+ . = ALIGN(8); \
+ __llvm_prf_names_end = .; \
+ . = ALIGN(8); \
+ } \
+ __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
+ __llvm_prf_vals_start = .; \
+ KEEP(*(__llvm_prf_vals)) \
+ . = ALIGN(8); \
+ __llvm_prf_vals_end = .; \
+ . = ALIGN(8); \
+ } \
+ __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
+ __llvm_prf_vnds_start = .; \
+ KEEP(*(__llvm_prf_vnds)) \
+ . = ALIGN(8); \
+ __llvm_prf_vnds_end = .; \
+ __llvm_prf_end = .; \
+ }
+#else
+#define PGO_CLANG_DATA
+#endif
+
#define KERNEL_DTB() \
STRUCT_ALIGN(); \
__dtb_start = .; \
@@ -1125,6 +1168,7 @@
CONSTRUCTORS \
} \
BUG_TABLE \
+ PGO_CLANG_DATA

#define INIT_TEXT_SECTION(inittext_align) \
. = ALIGN(inittext_align); \
diff --git a/kernel/Makefile b/kernel/Makefile
index aa7368c7eabf3..0b34ca228ba46 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
obj-$(CONFIG_KCSAN) += kcsan/
obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
+obj-$(CONFIG_PGO_CLANG) += pgo/

obj-$(CONFIG_PERF_EVENTS) += events/

diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
new file mode 100644
index 0000000000000..318d36bb3d106
--- /dev/null
+++ b/kernel/pgo/Kconfig
@@ -0,0 +1,34 @@
+# SPDX-License-Identifier: GPL-2.0-only
+menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
+
+config ARCH_SUPPORTS_PGO_CLANG
+ bool
+
+config PGO_CLANG
+ bool "Enable clang's PGO-based kernel profiling"
+ depends on DEBUG_FS
+ depends on ARCH_SUPPORTS_PGO_CLANG
+ help
+ This option enables clang's PGO (Profile Guided Optimization) based
+ code profiling to better optimize the kernel.
+
+ If unsure, say N.
+
+ Run a representative workload for your application on a kernel
+ compiled with this option and download the raw profile file from
+ /sys/kernel/debug/pgo/profraw. This file needs to be processed with
+ llvm-profdata. It may be merged with other collected raw profiles.
+
+ Copy the resulting profile file into vmlinux.profdata, and enable
+ KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
+ kernel.
+
+ Note that a kernel compiled with profiling flags will be
+ significantly larger and run slower. Also be sure to exclude files
+ from profiling which are not linked to the kernel image to prevent
+ linker errors.
+
+ Note that the debugfs filesystem has to be mounted to access
+ profiling data.
+
+endmenu
diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
new file mode 100644
index 0000000000000..41e27cefd9a47
--- /dev/null
+++ b/kernel/pgo/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+GCOV_PROFILE := n
+PGO_PROFILE := n
+
+obj-y += fs.o instrument.o
diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
new file mode 100644
index 0000000000000..790a8df037bfc
--- /dev/null
+++ b/kernel/pgo/fs.c
@@ -0,0 +1,382 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ * Sami Tolvanen <[email protected]>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt) "pgo: " fmt
+
+#include <linux/kernel.h>
+#include <linux/debugfs.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include "pgo.h"
+
+static struct dentry *directory;
+
+struct prf_private_data {
+ void *buffer;
+ unsigned long size;
+};
+
+/*
+ * Raw profile data format:
+ *
+ * - llvm_prf_header
+ * - __llvm_prf_data
+ * - __llvm_prf_cnts
+ * - __llvm_prf_names
+ * - zero padding to 8 bytes
+ * - for each llvm_prf_data in __llvm_prf_data:
+ * - llvm_prf_value_data
+ * - llvm_prf_value_record + site count array
+ * - llvm_prf_value_node_data
+ * ...
+ * ...
+ * ...
+ */
+
+static void prf_fill_header(void **buffer)
+{
+ struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
+
+ header->magic = LLVM_PRF_MAGIC;
+ header->version = LLVM_PRF_VARIANT_MASK_IR | LLVM_PRF_VERSION;
+ header->data_size = prf_data_count();
+ header->padding_bytes_before_counters = 0;
+ header->counters_size = prf_cnts_count();
+ header->padding_bytes_after_counters = 0;
+ header->names_size = prf_names_count();
+ header->counters_delta = (u64)__llvm_prf_cnts_start;
+ header->names_delta = (u64)__llvm_prf_names_start;
+ header->value_kind_last = LLVM_PRF_IPVK_LAST;
+
+ *buffer += sizeof(*header);
+}
+
+/*
+ * Copy the source into the buffer, incrementing the pointer into buffer in the
+ * process.
+ */
+static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
+{
+ memcpy(*buffer, src, size);
+ *buffer += size;
+}
+
+static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
+{
+ struct llvm_prf_value_node **nodes =
+ (struct llvm_prf_value_node **)p->values;
+ u32 kinds = 0;
+ u32 size = 0;
+ unsigned int kind;
+ unsigned int n;
+ unsigned int s = 0;
+
+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+ unsigned int sites = p->num_value_sites[kind];
+
+ if (!sites)
+ continue;
+
+ /* Record + site count array */
+ size += prf_get_value_record_size(sites);
+ kinds++;
+
+ if (!nodes)
+ continue;
+
+ for (n = 0; n < sites; n++) {
+ u32 count = 0;
+ struct llvm_prf_value_node *site = nodes[s + n];
+
+ while (site && ++count <= U8_MAX)
+ site = site->next;
+
+ size += count *
+ sizeof(struct llvm_prf_value_node_data);
+ }
+
+ s += sites;
+ }
+
+ if (size)
+ size += sizeof(struct llvm_prf_value_data);
+
+ if (value_kinds)
+ *value_kinds = kinds;
+
+ return size;
+}
+
+static u32 prf_get_value_size(void)
+{
+ u32 size = 0;
+ struct llvm_prf_data *p;
+
+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+ size += __prf_get_value_size(p, NULL);
+
+ return size;
+}
+
+/* Serialize the value profiling data of one function. */
+static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
+{
+ struct llvm_prf_value_data header;
+ struct llvm_prf_value_node **nodes =
+ (struct llvm_prf_value_node **)p->values;
+ unsigned int kind;
+ unsigned int n;
+ unsigned int s = 0;
+
+ header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
+
+ if (!header.num_value_kinds)
+ /* Nothing to write. */
+ return;
+
+ prf_copy_to_buffer(buffer, &header, sizeof(header));
+
+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+ struct llvm_prf_value_record *record;
+ u8 *counts;
+ unsigned int sites = p->num_value_sites[kind];
+
+ if (!sites)
+ continue;
+
+ /* Profiling value record. */
+ record = *(struct llvm_prf_value_record **)buffer;
+ *buffer += prf_get_value_record_header_size();
+
+ record->kind = kind;
+ record->num_value_sites = sites;
+
+ /* Site count array. */
+ counts = *(u8 **)buffer;
+ *buffer += prf_get_value_record_site_count_size(sites);
+
+ /*
+ * If we don't have nodes, we can skip updating the site count
+ * array, because the buffer is zero filled.
+ */
+ if (!nodes)
+ continue;
+
+ for (n = 0; n < sites; n++) {
+ u32 count = 0;
+ struct llvm_prf_value_node *site = nodes[s + n];
+
+ while (site && ++count <= U8_MAX) {
+ prf_copy_to_buffer(buffer, site,
+ sizeof(struct llvm_prf_value_node_data));
+ site = site->next;
+ }
+
+ counts[n] = (u8)count;
+ }
+
+ s += sites;
+ }
+}
+
+static void prf_serialize_values(void **buffer)
+{
+ struct llvm_prf_data *p;
+
+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+ prf_serialize_value(p, buffer);
+}
+
+static inline unsigned long prf_get_padding(unsigned long size)
+{
+ return 7 & (8 - (size % 8));
+}
+
+static unsigned long prf_buffer_size(void)
+{
+ return sizeof(struct llvm_prf_header) +
+ prf_data_size() +
+ prf_cnts_size() +
+ prf_names_size() +
+ prf_get_padding(prf_names_size()) +
+ prf_get_value_size();
+}
+
+/* Serialize the profiling data into a format LLVM's tools can understand. */
+static int prf_serialize(struct prf_private_data *p)
+{
+ int err = 0;
+ void *buffer;
+
+ p->size = prf_buffer_size();
+ p->buffer = vzalloc(p->size);
+
+ if (!p->buffer) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ buffer = p->buffer;
+
+ prf_fill_header(&buffer);
+ prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
+ prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
+ prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
+ buffer += prf_get_padding(prf_names_size());
+
+ prf_serialize_values(&buffer);
+
+out:
+ return err;
+}
+
+/* open() implementation for PGO. Creates a copy of the profiling data set. */
+static int prf_open(struct inode *inode, struct file *file)
+{
+ struct prf_private_data *data;
+ unsigned long flags;
+ int err;
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL);
+ if (!data) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ flags = prf_lock();
+
+ err = prf_serialize(data);
+ if (err) {
+ kfree(data);
+ goto out_unlock;
+ }
+
+ file->private_data = data;
+
+out_unlock:
+ prf_unlock(flags);
+out:
+ return err;
+}
+
+/* read() implementation for PGO. */
+static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct prf_private_data *data = file->private_data;
+
+ BUG_ON(!data);
+
+ return simple_read_from_buffer(buf, count, ppos, data->buffer,
+ data->size);
+}
+
+/* release() implementation for PGO. Release resources allocated by open(). */
+static int prf_release(struct inode *inode, struct file *file)
+{
+ struct prf_private_data *data = file->private_data;
+
+ if (data) {
+ vfree(data->buffer);
+ kfree(data);
+ }
+
+ return 0;
+}
+
+static const struct file_operations prf_fops = {
+ .owner = THIS_MODULE,
+ .open = prf_open,
+ .read = prf_read,
+ .llseek = default_llseek,
+ .release = prf_release
+};
+
+/* write() implementation for resetting PGO's profile data. */
+static ssize_t reset_write(struct file *file, const char __user *addr,
+ size_t len, loff_t *pos)
+{
+ struct llvm_prf_data *data;
+
+ memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
+
+ for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; ++data) {
+ struct llvm_prf_value_node **vnodes;
+ u64 current_vsite_count;
+ u32 i;
+
+ if (!data->values)
+ continue;
+
+ current_vsite_count = 0;
+ vnodes = (struct llvm_prf_value_node **)data->values;
+
+ for (i = LLVM_PRF_IPVK_FIRST; i <= LLVM_PRF_IPVK_LAST; ++i)
+ current_vsite_count += data->num_value_sites[i];
+
+ for (i = 0; i < current_vsite_count; ++i) {
+ struct llvm_prf_value_node *current_vnode = vnodes[i];
+
+ while (current_vnode) {
+ current_vnode->count = 0;
+ current_vnode = current_vnode->next;
+ }
+ }
+ }
+
+ return len;
+}
+
+static const struct file_operations prf_reset_fops = {
+ .owner = THIS_MODULE,
+ .write = reset_write,
+ .llseek = noop_llseek,
+};
+
+/* Create debugfs entries. */
+static int __init pgo_init(void)
+{
+ directory = debugfs_create_dir("pgo", NULL);
+ if (!directory)
+ goto err_remove;
+
+ if (!debugfs_create_file("profraw", 0600, directory, NULL,
+ &prf_fops))
+ goto err_remove;
+
+ if (!debugfs_create_file("reset", 0200, directory, NULL,
+ &prf_reset_fops))
+ goto err_remove;
+
+ return 0;
+
+err_remove:
+ pr_err("initialization failed\n");
+ return -EIO;
+}
+
+/* Remove debugfs entries. */
+static void __exit pgo_exit(void)
+{
+ debugfs_remove_recursive(directory);
+}
+
+module_init(pgo_init);
+module_exit(pgo_exit);
diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
new file mode 100644
index 0000000000000..6084ff0652e85
--- /dev/null
+++ b/kernel/pgo/instrument.c
@@ -0,0 +1,185 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ * Sami Tolvanen <[email protected]>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt) "pgo: " fmt
+
+#include <linux/bitops.h>
+#include <linux/kernel.h>
+#include <linux/export.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+#include "pgo.h"
+
+/* Lock guarding value node access and serialization. */
+static DEFINE_SPINLOCK(pgo_lock);
+static int current_node;
+
+unsigned long prf_lock(void)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&pgo_lock, flags);
+
+ return flags;
+}
+
+void prf_unlock(unsigned long flags)
+{
+ spin_unlock_irqrestore(&pgo_lock, flags);
+}
+
+/*
+ * Return a newly allocated profiling value node which contains the value
+ * tracked by the value profiler.
+ * Note: caller *must* hold pgo_lock.
+ */
+static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
+ u32 index, u64 value)
+{
+ if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
+ return NULL; /* Out of nodes */
+
+ current_node++;
+
+ /* Make sure the node is entirely within the section */
+ if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
+ &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
+ return NULL;
+
+ return &__llvm_prf_vnds_start[current_node];
+}
+
+/*
+ * Counts the number of times a target value is seen.
+ *
+ * Records the target value for the CounterIndex if not seen before. Otherwise,
+ * increments the counter associated w/ the target value.
+ */
+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
+{
+ struct llvm_prf_data *p = (struct llvm_prf_data *)data;
+ struct llvm_prf_value_node **counters;
+ struct llvm_prf_value_node *curr;
+ struct llvm_prf_value_node *min = NULL;
+ struct llvm_prf_value_node *prev = NULL;
+ u64 min_count = U64_MAX;
+ u8 values = 0;
+ unsigned long flags;
+
+ if (!p || !p->values)
+ return;
+
+ counters = (struct llvm_prf_value_node **)p->values;
+ curr = counters[index];
+
+ while (curr) {
+ if (target_value == curr->value) {
+ curr->count++;
+ return;
+ }
+
+ if (curr->count < min_count) {
+ min_count = curr->count;
+ min = curr;
+ }
+
+ prev = curr;
+ curr = curr->next;
+ values++;
+ }
+
+ if (values >= LLVM_PRF_MAX_NUM_VALS_PER_SITE) {
+ if (!min->count || !(--min->count)) {
+ curr = min;
+ curr->value = target_value;
+ curr->count++;
+ }
+ return;
+ }
+
+ /* Lock when updating the value node structure. */
+ flags = prf_lock();
+
+ curr = allocate_node(p, index, target_value);
+ if (!curr)
+ goto out;
+
+ curr->value = target_value;
+ curr->count++;
+
+ if (!counters[index])
+ counters[index] = curr;
+ else if (prev && !prev->next)
+ prev->next = curr;
+
+out:
+ prf_unlock(flags);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_target);
+
+/* Counts the number of times a range of target values is seen. */
+void __llvm_profile_instrument_range(u64 target_value, void *data,
+ u32 index, s64 precise_start,
+ s64 precise_last, s64 large_value);
+void __llvm_profile_instrument_range(u64 target_value, void *data,
+ u32 index, s64 precise_start,
+ s64 precise_last, s64 large_value)
+{
+ if (large_value != S64_MIN && (s64)target_value >= large_value)
+ target_value = large_value;
+ else if ((s64)target_value < precise_start ||
+ (s64)target_value > precise_last)
+ target_value = precise_last + 1;
+
+ __llvm_profile_instrument_target(target_value, data, index);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_range);
+
+static u64 inst_prof_get_range_rep_value(u64 value)
+{
+ if (value <= 8)
+ /* The first ranges are individually tracked, use it as is. */
+ return value;
+ else if (value >= 513)
+ /* The last range is mapped to its lowest value. */
+ return 513;
+ else if (hweight64(value) == 1)
+ /* If it's a power of two, use it as is. */
+ return value;
+
+ /* Otherwise, map to the previous power of two + 1. */
+ return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
+}
+
+/*
+ * The target values are partitioned into multiple ranges. The range spec is
+ * defined in compiler-rt/include/profile/InstrProfData.inc.
+ */
+void __llvm_profile_instrument_memop(u64 target_value, void *data,
+ u32 counter_index);
+void __llvm_profile_instrument_memop(u64 target_value, void *data,
+ u32 counter_index)
+{
+ u64 rep_value;
+
+ /* Map the target value to the representative value of its range. */
+ rep_value = inst_prof_get_range_rep_value(target_value);
+ __llvm_profile_instrument_target(rep_value, data, counter_index);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_memop);
diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
new file mode 100644
index 0000000000000..df0aa278f28bd
--- /dev/null
+++ b/kernel/pgo/pgo.h
@@ -0,0 +1,206 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ * Sami Tolvanen <[email protected]>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _PGO_H
+#define _PGO_H
+
+/*
+ * Note: These internal LLVM definitions must match the compiler version.
+ * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
+ */
+
+#ifdef CONFIG_64BIT
+ #define LLVM_PRF_MAGIC \
+ ((u64)255 << 56 | \
+ (u64)'l' << 48 | \
+ (u64)'p' << 40 | \
+ (u64)'r' << 32 | \
+ (u64)'o' << 24 | \
+ (u64)'f' << 16 | \
+ (u64)'r' << 8 | \
+ (u64)129)
+#else
+ #define LLVM_PRF_MAGIC \
+ ((u64)255 << 56 | \
+ (u64)'l' << 48 | \
+ (u64)'p' << 40 | \
+ (u64)'r' << 32 | \
+ (u64)'o' << 24 | \
+ (u64)'f' << 16 | \
+ (u64)'R' << 8 | \
+ (u64)129)
+#endif
+
+#define LLVM_PRF_VERSION 5
+#define LLVM_PRF_DATA_ALIGN 8
+#define LLVM_PRF_IPVK_FIRST 0
+#define LLVM_PRF_IPVK_LAST 1
+#define LLVM_PRF_MAX_NUM_VALS_PER_SITE 16
+
+#define LLVM_PRF_VARIANT_MASK_IR (0x1ull << 56)
+#define LLVM_PRF_VARIANT_MASK_CSIR (0x1ull << 57)
+
+/**
+ * struct llvm_prf_header - represents the raw profile header data structure.
+ * @magic: the magic token for the file format.
+ * @version: the version of the file format.
+ * @data_size: the number of entries in the profile data section.
+ * @padding_bytes_before_counters: the number of padding bytes before the
+ * counters.
+ * @counters_size: the size in bytes of the LLVM profile section containing the
+ * counters.
+ * @padding_bytes_after_counters: the number of padding bytes after the
+ * counters.
+ * @names_size: the size in bytes of the LLVM profile section containing the
+ * counters' names.
+ * @counters_delta: the beginning of the LLVM profile counters section.
+ * @names_delta: the beginning of the LLVM profile names section.
+ * @value_kind_last: the last profile value kind.
+ */
+struct llvm_prf_header {
+ u64 magic;
+ u64 version;
+ u64 data_size;
+ u64 padding_bytes_before_counters;
+ u64 counters_size;
+ u64 padding_bytes_after_counters;
+ u64 names_size;
+ u64 counters_delta;
+ u64 names_delta;
+ u64 value_kind_last;
+};
+
+/**
+ * struct llvm_prf_data - represents the per-function control structure.
+ * @name_ref: the reference to the function's name.
+ * @func_hash: the hash value of the function.
+ * @counter_ptr: a pointer to the profile counter.
+ * @function_ptr: a pointer to the function.
+ * @values: the profiling values associated with this function.
+ * @num_counters: the number of counters in the function.
+ * @num_value_sites: the number of value profile sites.
+ */
+struct llvm_prf_data {
+ const u64 name_ref;
+ const u64 func_hash;
+ const void *counter_ptr;
+ const void *function_ptr;
+ void *values;
+ const u32 num_counters;
+ const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
+} __aligned(LLVM_PRF_DATA_ALIGN);
+
+/**
+ * struct llvm_prf_value_node_data - represents the data part of the struct
+ * llvm_prf_value_node data structure.
+ * @value: the value counters.
+ * @count: the counters' count.
+ */
+struct llvm_prf_value_node_data {
+ u64 value;
+ u64 count;
+};
+
+/**
+ * struct llvm_prf_value_node - represents an internal data structure used by
+ * the value profiler.
+ * @value: the value counters.
+ * @count: the counters' count.
+ * @next: the next value node.
+ */
+struct llvm_prf_value_node {
+ u64 value;
+ u64 count;
+ struct llvm_prf_value_node *next;
+};
+
+/**
+ * struct llvm_prf_value_data - represents the value profiling data in indexed
+ * format.
+ * @total_size: the total size in bytes including this field.
+ * @num_value_kinds: the number of value profile kinds that have value profile
+ * data.
+ */
+struct llvm_prf_value_data {
+ u32 total_size;
+ u32 num_value_kinds;
+};
+
+/**
+ * struct llvm_prf_value_record - represents the on-disk layout of the value
+ * profile data of a particular kind for one function.
+ * @kind: the kind of the value profile record.
+ * @num_value_sites: the number of value profile sites.
+ * @site_count_array: the first element of the array that stores the number
+ * of profiled values for each value site.
+ */
+struct llvm_prf_value_record {
+ u32 kind;
+ u32 num_value_sites;
+ u8 site_count_array[];
+};
+
+#define prf_get_value_record_header_size() \
+ offsetof(struct llvm_prf_value_record, site_count_array)
+#define prf_get_value_record_site_count_size(sites) \
+ roundup((sites), 8)
+#define prf_get_value_record_size(sites) \
+ (prf_get_value_record_header_size() + \
+ prf_get_value_record_site_count_size((sites)))
+
+/* Data sections */
+extern struct llvm_prf_data __llvm_prf_data_start[];
+extern struct llvm_prf_data __llvm_prf_data_end[];
+
+extern u64 __llvm_prf_cnts_start[];
+extern u64 __llvm_prf_cnts_end[];
+
+extern char __llvm_prf_names_start[];
+extern char __llvm_prf_names_end[];
+
+extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
+extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
+
+/* Locking for vnodes */
+extern unsigned long prf_lock(void);
+extern void prf_unlock(unsigned long flags);
+
+#define __DEFINE_PRF_SIZE(s) \
+ static inline unsigned long prf_ ## s ## _size(void) \
+ { \
+ unsigned long start = \
+ (unsigned long)__llvm_prf_ ## s ## _start; \
+ unsigned long end = \
+ (unsigned long)__llvm_prf_ ## s ## _end; \
+ return roundup(end - start, \
+ sizeof(__llvm_prf_ ## s ## _start[0])); \
+ } \
+ static inline unsigned long prf_ ## s ## _count(void) \
+ { \
+ return prf_ ## s ## _size() / \
+ sizeof(__llvm_prf_ ## s ## _start[0]); \
+ }
+
+__DEFINE_PRF_SIZE(data);
+__DEFINE_PRF_SIZE(cnts);
+__DEFINE_PRF_SIZE(names);
+__DEFINE_PRF_SIZE(vnds);
+
+#undef __DEFINE_PRF_SIZE
+
+#endif /* _PGO_H */
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 213677a5ed33e..9b218afb5cb87 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
$(CFLAGS_GCOV))
endif

+#
+# Enable clang's PGO profiling flags for a file or directory depending on
+# variables PGO_PROFILE_obj.o and PGO_PROFILE.
+#
+ifeq ($(CONFIG_PGO_CLANG),y)
+_c_flags += $(if $(patsubst n%,, \
+ $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
+ $(CFLAGS_PGO_CLANG))
+endif
+
#
# Enable address sanitizer flags for kernel except some files or directories
# we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
--
2.30.0.284.gd98b1dd5eaa7-goog
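
The raw profile layout described in fs.c pads the names section to an 8-byte boundary before the value data. A minimal sketch of that padding computation follows, assuming the masked form, which yields 0 for a size that is already aligned; the unmasked expression `8 - (size % 8)` would give 8 in that case. The name `pad_to_8` is hypothetical and not part of the patch.

```c
#include <stdint.h>

/*
 * Number of bytes needed to advance `size` to the next multiple of 8.
 * The "7 &" mask turns the result for an already-aligned size from 8 into 0.
 * pad_to_8 is a hypothetical helper name used only for this illustration.
 */
static uint64_t pad_to_8(uint64_t size)
{
	return 7 & (8 - (size % 8));
}
```

For example, a 13-byte names section would get 3 bytes of padding, while a 16-byte one would get none.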

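The memop range mapping in instrument.c can be exercised outside the kernel. The sketch below is a userspace port of that logic, assuming a GCC- or Clang-compatible compiler for `__builtin_clzll`; the name `range_rep_value` is hypothetical, and the power-of-two test replaces the kernel's `hweight64()` with an equivalent bit trick.

```c
#include <stdint.h>

/*
 * Map a value to the representative value of its range:
 *   0..8      -> tracked individually
 *   >= 513    -> all share the bucket 513
 *   2^k       -> kept as is
 *   otherwise -> largest power of two below the value, plus one
 * range_rep_value is a hypothetical name for this illustration.
 */
static uint64_t range_rep_value(uint64_t value)
{
	if (value <= 8)
		return value;
	if (value >= 513)
		return 513;
	/* Exactly one bit set means the value is a power of two. */
	if ((value & (value - 1)) == 0)
		return value;
	/* value is nonzero here, so __builtin_clzll() is well defined. */
	return (UINT64_C(1) << (63 - __builtin_clzll(value))) + 1;
}
```

With this mapping, sizes 65 through 127 all collapse to 65, so each value site tracks a small, fixed set of buckets rather than every distinct size.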
2021-01-13 21:13:33

by Nathan Chancellor

[permalink] [raw]
Subject: Re: [PATCH v4] pgo: add clang's Profile Guided Optimization infrastructure

Hi Bill,

On Tue, Jan 12, 2021 at 10:19:58PM -0800, Bill Wendling wrote:
> From: Sami Tolvanen <[email protected]>
>
> Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> profile, the kernel is instrumented with PGO counters, a representative
> workload is run, and the raw profile data is collected from
> /sys/kernel/debug/pgo/profraw.
>
> The raw profile data must be processed by clang's "llvm-profdata" tool
> before it can be used during recompilation:
>
> $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
> Multiple raw profiles may be merged during this step.
>
> The data can now be used by the compiler:
>
> $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>
> This initial submission is restricted to x86, as that's the platform we
> know works. This restriction can be lifted once other platforms have
> been verified to work with PGO.
>
> Note that this method of profiling the kernel is clang-native, unlike
> the clang support in kernel/gcov.
>
> [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>
> Signed-off-by: Sami Tolvanen <[email protected]>
> Co-developed-by: Bill Wendling <[email protected]>
> Signed-off-by: Bill Wendling <[email protected]>
> Change-Id: Ic78e69c682286d3a44c4549a0138578c98138b77

Small nit: This should be removed.

I applied this patch on top of v5.11-rc3, built it with LLVM 12
(f1d5cbbdee5526bc86eac0a5652b115d9bc158e5 + D94470) with Microsoft's
WSL 5.4 config [1] + CONFIG_PGO_CLANG=y, and ran it on WSL2.

$ zgrep PGO /proc/config.gz
# Profile Guided Optimization (PGO) (EXPERIMENTAL)
CONFIG_ARCH_SUPPORTS_PGO_CLANG=y
CONFIG_PGO_CLANG=y
# end of Profile Guided Optimization (PGO) (EXPERIMENTAL)

However, I see an issue with actually using the data:

$ sudo -s
# mount -t debugfs none /sys/kernel/debug
# cp -a /sys/kernel/debug/pgo/profraw vmlinux.profraw
# chown nathan:nathan vmlinux.profraw
# exit
$ tc-build/build/llvm/stage1/bin/llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
warning: vmlinux.profraw: Invalid instrumentation profile data (bad magic)
error: No profiles could be merged.

Am I holding it wrong? :) Note, this is virtualized, I do not have any
"real" x86 hardware that I can afford to test on right now.

[1]: https://github.com/microsoft/WSL2-Linux-Kernel/raw/linux-msft-wsl-5.4.y/Microsoft/config-wsl

Cheers,
Nathan
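
When triaging a "bad magic" report like the one above, one quick sanity check is whether the copied file begins with the 64-bit magic value that pgo.h defines. The sketch below only performs that comparison and does not diagnose the actual cause; it assumes the file was produced and is inspected on hosts with the same byte order, and `PRF_MAGIC_64` and `has_prf_magic` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* 64-bit raw profile magic, spelled out as in the patch's pgo.h. */
#define PRF_MAGIC_64 \
	((uint64_t)255 << 56 | (uint64_t)'l' << 48 | (uint64_t)'p' << 40 | \
	 (uint64_t)'r' << 32 | (uint64_t)'o' << 24 | (uint64_t)'f' << 16 | \
	 (uint64_t)'r' << 8 | (uint64_t)129)

/*
 * Return 1 if the first 8 bytes of buf hold the magic in host byte order,
 * 0 otherwise (including when fewer than 8 bytes are available).
 * has_prf_magic is a hypothetical helper for this illustration.
 */
static int has_prf_magic(const unsigned char *buf, size_t len)
{
	uint64_t magic;

	if (len < sizeof(magic))
		return 0;
	memcpy(&magic, buf, sizeof(magic));
	return magic == PRF_MAGIC_64;
}
```

A caller would read the first eight bytes of vmlinux.profraw into a buffer and pass them in; an empty or truncated file fails the check just as a wrong header does.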

2021-01-14 02:03:20

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH v4] pgo: add clang's Profile Guided Optimization infrastructure

On Tue, Jan 12, 2021 at 10:20 PM Bill Wendling <[email protected]> wrote:
>
> From: Sami Tolvanen <[email protected]>
>
> Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> profile, the kernel is instrumented with PGO counters, a representative
> workload is run, and the raw profile data is collected from
> /sys/kernel/debug/pgo/profraw.
>
> The raw profile data must be processed by clang's "llvm-profdata" tool
> before it can be used during recompilation:
>
> $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
> Multiple raw profiles may be merged during this step.
>
> The data can now be used by the compiler:
>
> $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>
> This initial submission is restricted to x86, as that's the platform we
> know works. This restriction can be lifted once other platforms have
> been verified to work with PGO.
>
> Note that this method of profiling the kernel is clang-native, unlike
> the clang support in kernel/gcov.
>
> [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>
> Signed-off-by: Sami Tolvanen <[email protected]>
> Co-developed-by: Bill Wendling <[email protected]>
> Signed-off-by: Bill Wendling <[email protected]>
> Change-Id: Ic78e69c682286d3a44c4549a0138578c98138b77
> ---
> v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> testing.
> - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> Song's comments.
> v3: - Added change log section based on Sedat Dilek's comments.
> v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
> own popcount implementation, based on Nick Desaulniers's comment.
> ---
> Documentation/dev-tools/index.rst | 1 +
> Documentation/dev-tools/pgo.rst | 127 +++++++++
> MAINTAINERS | 9 +
> Makefile | 3 +
> arch/Kconfig | 1 +
> arch/x86/Kconfig | 1 +
> arch/x86/boot/Makefile | 1 +
> arch/x86/boot/compressed/Makefile | 1 +
> arch/x86/entry/vdso/Makefile | 1 +
> arch/x86/kernel/vmlinux.lds.S | 2 +
> arch/x86/platform/efi/Makefile | 1 +
> arch/x86/purgatory/Makefile | 1 +
> arch/x86/realmode/rm/Makefile | 1 +
> arch/x86/um/vdso/Makefile | 1 +
> drivers/firmware/efi/libstub/Makefile | 1 +
> include/asm-generic/vmlinux.lds.h | 44 +++
> kernel/Makefile | 1 +
> kernel/pgo/Kconfig | 34 +++
> kernel/pgo/Makefile | 5 +
> kernel/pgo/fs.c | 382 ++++++++++++++++++++++++++
> kernel/pgo/instrument.c | 185 +++++++++++++
> kernel/pgo/pgo.h | 206 ++++++++++++++
> scripts/Makefile.lib | 10 +
> 23 files changed, 1019 insertions(+)
> create mode 100644 Documentation/dev-tools/pgo.rst
> create mode 100644 kernel/pgo/Kconfig
> create mode 100644 kernel/pgo/Makefile
> create mode 100644 kernel/pgo/fs.c
> create mode 100644 kernel/pgo/instrument.c
> create mode 100644 kernel/pgo/pgo.h
>
> diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> index f7809c7b1ba9e..8d6418e858062 100644
> --- a/Documentation/dev-tools/index.rst
> +++ b/Documentation/dev-tools/index.rst
> @@ -26,6 +26,7 @@ whole; patches welcome!
> kgdb
> kselftest
> kunit/index
> + pgo
>
>
> .. only:: subproject and html
> diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> new file mode 100644
> index 0000000000000..b7f11d8405b73
> --- /dev/null
> +++ b/Documentation/dev-tools/pgo.rst
> @@ -0,0 +1,127 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===============================
> +Using PGO with the Linux kernel
> +===============================
> +
> +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> +when building with Clang. The profiling data is exported via the ``pgo``
> +debugfs directory.
> +
> +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> +
> +
> +Preparation
> +===========
> +
> +Configure the kernel with:
> +
> +.. code-block:: make
> +
> + CONFIG_DEBUG_FS=y
> + CONFIG_PGO_CLANG=y
> +
> +Note that kernels compiled with profiling flags will be significantly larger
> +and run slower.
> +
> +Profiling data will only become accessible once debugfs has been mounted:
> +
> +.. code-block:: sh
> +
> + mount -t debugfs none /sys/kernel/debug
> +
> +
> +Customization
> +=============
> +
> +You can enable or disable profiling for individual file and directories by
> +adding a line similar to the following to the respective kernel Makefile:
> +
> +- For a single file (e.g. main.o)
> +
> + .. code-block:: make
> +
> + PGO_PROFILE_main.o := y
> +
> +- For all files in one directory
> +
> + .. code-block:: make
> +
> + PGO_PROFILE := y
> +
> +To exclude files from being profiled use
> +
> + .. code-block:: make
> +
> + PGO_PROFILE_main.o := n
> +
> +and
> +
> + .. code-block:: make
> +
> + PGO_PROFILE := n
> +
> +Only files which are linked to the main kernel image or are compiled as kernel
> +modules are supported by this mechanism.
> +
> +
> +Files
> +=====
> +
> +The PGO kernel support creates the following files in debugfs:
> +
> +``/sys/kernel/debug/pgo``
> + Parent directory for all PGO-related files.
> +
> +``/sys/kernel/debug/pgo/reset``
> + Global reset file: resets all coverage data to zero when written to.
> +
> +``/sys/kernel/debug/profraw``
> + The raw PGO data that must be processed with ``llvm_profdata``.
> +
> +
> +Workflow
> +========
> +
> +The PGO kernel can be run on the host or test machines. The data though should
> +be analyzed with Clang's tools from the same Clang version as the kernel was
> +compiled. Clang's tolerant of version skew, but it's easier to use the same
> +Clang version.
> +
> +The profiling data is useful for optimizing the kernel, analyzing coverage,
> +etc. Clang offers tools to perform these tasks.
> +
> +Here is an example workflow for profiling an instrumented kernel with PGO and
> +using the result to optimize the kernel:
> +
> +1) Install the kernel on the TEST machine.
> +
> +2) Reset the data counters right before running the load tests
> +
> + .. code-block:: sh
> +
> + $ echo 1 > /sys/kernel/debug/pgo/reset

Maybe I'm a noob, but I first had to run:
$ mkdir -p /sys/kernel/debug
$ mount -t debugfs none /sys/kernel/debug

That might trip up future travelers (like myself; I'm prone to
forget these unless they're in my shell history).

> +
> +3) Run the load tests.
> +
> +4) Collect the raw profile data
> +
> + .. code-block:: sh
> +
> + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> +
> +5) (Optional) Download the raw profile data to the HOST machine.
> +
> +6) Process the raw profile data
> +
> + .. code-block:: sh
> +
> + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> +
> + Note that multiple raw profile data files can be merged during this step.
> +
> +7) Rebuild the kernel using the profile data (PGO disabled)
> +
> + .. code-block:: sh
> +
> + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> diff --git a/MAINTAINERS b/MAINTAINERS
> index cc1e6a5ee6e67..1b979da316fa4 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -13954,6 +13954,15 @@ S: Maintained
> F: include/linux/personality.h
> F: include/uapi/linux/personality.h
>
> +PGO BASED KERNEL PROFILING
> +M: Sami Tolvanen <[email protected]>
> +M: Bill Wendling <[email protected]>
> +R: Nathan Chancellor <[email protected]>
> +R: Nick Desaulniers <[email protected]>
> +S: Supported
> +F: Documentation/dev-tools/pgo.rst
> +F: kernel/pgo
> +
> PHOENIX RC FLIGHT CONTROLLER ADAPTER
> M: Marcus Folkesson <[email protected]>
> L: [email protected]
> diff --git a/Makefile b/Makefile
> index 9e73f82e0d863..9128bfe1ccc97 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
> # Defaults to vmlinux, but the arch makefile usually adds further targets
> all: vmlinux
>
> +CFLAGS_PGO_CLANG := -fprofile-generate
> +export CFLAGS_PGO_CLANG
> +
> CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
> $(call cc-option,-fno-tree-loop-im) \
> $(call cc-disable-warning,maybe-uninitialized,)
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 24862d15f3a36..f39d3991f6bfe 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
> pairs of 32-bit arguments, select this option.
>
> source "kernel/gcov/Kconfig"
> +source "kernel/pgo/Kconfig"
>
> source "scripts/gcc-plugins/Kconfig"
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 21f851179ff08..36305ea61dc09 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -96,6 +96,7 @@ config X86
> select ARCH_SUPPORTS_DEBUG_PAGEALLOC
> select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
> select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
> + select ARCH_SUPPORTS_PGO_CLANG if X86_64
> select ARCH_USE_BUILTIN_BSWAP
> select ARCH_USE_QUEUED_RWLOCKS
> select ARCH_USE_QUEUED_SPINLOCKS
> diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> index fe605205b4ce2..383853e32f673 100644
> --- a/arch/x86/boot/Makefile
> +++ b/arch/x86/boot/Makefile
> @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
>
> $(obj)/bzImage: asflags-y := $(SVGA_MODE)
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index e0bc3988c3faa..ed12ab65f6065 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
>
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE :=n
>
> KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index 02e3e42f380bd..26e2b3af0145c 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO $@
> VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
> $(call ld-option, --eh-frame-hdr) -Bsymbolic
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> quiet_cmd_vdso_and_check = VDSO $@
> cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index efd9e9ea17f25..f6cab2316c46a 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -184,6 +184,8 @@ SECTIONS
>
> BUG_TABLE
>
> + PGO_CLANG_DATA
> +
> ORC_UNWIND_TABLE
>
> . = ALIGN(PAGE_SIZE);
> diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
> index 84b09c230cbd5..5f22b31446ad4 100644
> --- a/arch/x86/platform/efi/Makefile
> +++ b/arch/x86/platform/efi/Makefile
> @@ -2,6 +2,7 @@
> OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
> KASAN_SANITIZE := n
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
> obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
> diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> index 95ea17a9d20cb..36f20e99da0bc 100644
> --- a/arch/x86/purgatory/Makefile
> +++ b/arch/x86/purgatory/Makefile
> @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
>
> # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> KASAN_SANITIZE := n
> UBSAN_SANITIZE := n
> KCSAN_SANITIZE := n
> diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> index 83f1b6a56449f..21797192f958f 100644
> --- a/arch/x86/realmode/rm/Makefile
> +++ b/arch/x86/realmode/rm/Makefile
> @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
> diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
> index 5943387e3f357..54f5768f58530 100644
> --- a/arch/x86/um/vdso/Makefile
> +++ b/arch/x86/um/vdso/Makefile
> @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@
>
> VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> #
> # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index 8a94388e38b33..2d81623b33f29 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -40,6 +40,7 @@ KBUILD_CFLAGS := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
> KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
>
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> # Sanitizer runtimes are unavailable and cannot be linked here.
> KASAN_SANITIZE := n
> KCSAN_SANITIZE := n
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index b2b3d81b1535a..3a591bb18c5fb 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -316,6 +316,49 @@
> #define THERMAL_TABLE(name)
> #endif
>
> +#ifdef CONFIG_PGO_CLANG
> +#define PGO_CLANG_DATA \
> + __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_start = .; \
> + __llvm_prf_data_start = .; \
> + KEEP(*(__llvm_prf_data)) \
> + . = ALIGN(8); \
> + __llvm_prf_data_end = .; \
> + } \

So if I do a build of
$ make CC=clang
I observe the following error:
`.discard.text' referenced in section `__llvm_prf_data' of
arch/x86/kernel/setup.o: defined in discarded section `.discard.text'
of arch/x86/kernel/setup.o
`.discard.text' referenced in section `__llvm_prf_data' of
arch/x86/mm/init.o: defined in discarded section `.discard.text' of
arch/x86/mm/init.o

This can be investigated more via:
$ llvm-objdump -Dr -j __llvm_prf_data arch/x86/mm/init.o | grep discard
0000000000000168: R_X86_64_64 .discard.text

So looks like a relocation is referencing something in .discard.text.

$ llvm-objdump -Dr -j .discard.text arch/x86/mm/init.o
...
0000000000000000 <__brk_reservation_fn_early_pgt_alloc__>:
0: 48 83 05 00 00 00 00 01 addq $1, (%rip) # 8
<__brk_reservation_fn_early_pgt_alloc__+0x8>
0000000000000003: R_X86_64_PC32 __llvm_prf_cnts+0xf3
8: c3 retq

Looks like arch/x86/include/asm/setup.h defines the macro RESERVE_BRK
which defines a static function in the .discard.text section. Is
there a function attribute that we can use to say "please don't
profile this one function?"

For arch/x86/kernel/setup.o it looks like the same issue with
__brk_reservation_fn_dmi_alloc__.
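
One possible shape for such an opt-out, as a rough sketch only: GCC
documents a `no_profile_instrument_function` attribute, and I'm
assuming here (unverified) that clang would honor the same spelling
for -fprofile-generate. The `no_pgo` macro and `brk_stub` function
are made up for illustration and are not part of the patch:

```c
/*
 * Hypothetical opt-out macro. The attribute name is GCC's spelling;
 * clang support for it is an assumption, not something I have tested.
 */
#define no_pgo __attribute__((no_profile_instrument_function))

/* Stand-in for a helper that should not receive a profile counter. */
static no_pgo int brk_stub(int x)
{
	return x + 1;
}
```

If something like this worked, RESERVE_BRK could tag the function it
emits into .discard.text and the relocation from __llvm_prf_data
would presumably never be generated.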

More specifically, this error goes away when using LLD:
$ make CC=clang LD=ld.lld

Not sure yet why it is only observed when using BFD. Maybe LLD
should also be producing this diagnostic, but is not?

Anyways, I was able to build+boot profiling mode binaries built with:
$ make LLVM=1 defconfig+PGO
$ make LLVM=1 LLVM_IAS=1 defconfig+PGO

It would be good to investigate the above error with BFD and fix it,
or make this config also depend on LLD for now. That is, the
configuration that currently fails is:
$ make CC=clang defconfig+PGO

I'm going to try rebuilding+booting with the profile data now, and
will report back.

> + __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_cnts_start = .; \
> + KEEP(*(__llvm_prf_cnts)) \
> + . = ALIGN(8); \
> + __llvm_prf_cnts_end = .; \
> + } \
> + __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_names_start = .; \
> + KEEP(*(__llvm_prf_names)) \
> + . = ALIGN(8); \
> + __llvm_prf_names_end = .; \
> + . = ALIGN(8); \
> + } \
> + __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
> + __llvm_prf_vals_start = .; \
> + KEEP(*(__llvm_prf_vals)) \
> + . = ALIGN(8); \
> + __llvm_prf_vals_end = .; \
> + . = ALIGN(8); \
> + } \
> + __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
> + __llvm_prf_vnds_start = .; \
> + KEEP(*(__llvm_prf_vnds)) \
> + . = ALIGN(8); \
> + __llvm_prf_vnds_end = .; \
> + __llvm_prf_end = .; \
> + }
> +#else
> +#define PGO_CLANG_DATA
> +#endif
> +
> #define KERNEL_DTB() \
> STRUCT_ALIGN(); \
> __dtb_start = .; \
> @@ -1125,6 +1168,7 @@
> CONSTRUCTORS \
> } \
> BUG_TABLE \
> + PGO_CLANG_DATA
>
> #define INIT_TEXT_SECTION(inittext_align) \
> . = ALIGN(inittext_align); \
> diff --git a/kernel/Makefile b/kernel/Makefile
> index aa7368c7eabf3..0b34ca228ba46 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
> obj-$(CONFIG_KCSAN) += kcsan/
> obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
> obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
> +obj-$(CONFIG_PGO_CLANG) += pgo/
>
> obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
> new file mode 100644
> index 0000000000000..318d36bb3d106
> --- /dev/null
> +++ b/kernel/pgo/Kconfig
> @@ -0,0 +1,34 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
> +
> +config ARCH_SUPPORTS_PGO_CLANG
> + bool
> +
> +config PGO_CLANG
> + bool "Enable clang's PGO-based kernel profiling"
> + depends on DEBUG_FS
> + depends on ARCH_SUPPORTS_PGO_CLANG

This should probably additionally have:
depends on CLANG_VERSION >= 120000

I'm observing the same failed assertion as Nathan when trying to build
x86_64 defconfig (drivers/gpu/drm/i915/i915_query.c); that should be
fixed by:
https://reviews.llvm.org/D94470

I think that would also help prevent this config from being selectable
if not using CC=clang?

> + help
> + This option enables clang's PGO (Profile Guided Optimization) based
> + code profiling to better optimize the kernel.
> +
> + If unsure, say N.
> +
> + Run a representative workload for your application on a kernel
> + compiled with this option and download the raw profile file from
> + /sys/kernel/debug/pgo/profraw. This file needs to be processed with
> + llvm-profdata. It may be merged with other collected raw profiles.
> +
> + Copy the resulting profile file into vmlinux.profdata, and enable
> + KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
> + kernel.
> +
> + Note that a kernel compiled with profiling flags will be
> + significatnly larger and run slower. Also be sure to exclude files

^ typo: significantly

> + from profiling which are not linked to the kernel image to prevent
> + linker errors.
> +
> + Note that the debugfs filesystem has to be mounted to access
> + profiling data.
> +
> +endmenu
> diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
> new file mode 100644
> index 0000000000000..41e27cefd9a47
> --- /dev/null
> +++ b/kernel/pgo/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0
> +GCOV_PROFILE := n
> +PGO_PROFILE := n
> +
> +obj-y += fs.o instrument.o
> diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
> new file mode 100644
> index 0000000000000..790a8df037bfc
> --- /dev/null
> +++ b/kernel/pgo/fs.c
> @@ -0,0 +1,382 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt) "pgo: " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/debugfs.h>
> +#include <linux/fs.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/vmalloc.h>
> +#include "pgo.h"
> +
> +static struct dentry *directory;
> +
> +struct prf_private_data {
> + void *buffer;
> + unsigned long size;
> +};
> +
> +/*
> + * Raw profile data format:
> + *
> + * - llvm_prf_header
> + * - __llvm_prf_data
> + * - __llvm_prf_cnts
> + * - __llvm_prf_names
> + * - zero padding to 8 bytes
> + * - for each llvm_prf_data in __llvm_prf_data:
> + * - llvm_prf_value_data
> + * - llvm_prf_value_record + site count array
> + * - llvm_prf_value_node_data
> + * ...
> + * ...
> + * ...
> + */
> +
> +static void prf_fill_header(void **buffer)
> +{
> + struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
> +
> + header->magic = LLVM_PRF_MAGIC;
> + header->version = LLVM_PRF_VARIANT_MASK_IR | LLVM_PRF_VERSION;
> + header->data_size = prf_data_count();
> + header->padding_bytes_before_counters = 0;
> + header->counters_size = prf_cnts_count();
> + header->padding_bytes_after_counters = 0;
> + header->names_size = prf_names_count();
> + header->counters_delta = (u64)__llvm_prf_cnts_start;
> + header->names_delta = (u64)__llvm_prf_names_start;
> + header->value_kind_last = LLVM_PRF_IPVK_LAST;
> +
> + *buffer += sizeof(*header);
> +}
> +
> +/*
> + * Copy the source into the buffer, incrementing the pointer into buffer in the
> + * process.
> + */
> +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
> +{
> + memcpy(*buffer, src, size);
> + *buffer += size;
> +}
> +
> +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
> +{
> + struct llvm_prf_value_node **nodes =
> + (struct llvm_prf_value_node **)p->values;
> + u32 kinds = 0;
> + u32 size = 0;
> + unsigned int kind;
> + unsigned int n;
> + unsigned int s = 0;
> +
> + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> + unsigned int sites = p->num_value_sites[kind];
> +
> + if (!sites)
> + continue;
> +
> + /* Record + site count array */
> + size += prf_get_value_record_size(sites);
> + kinds++;
> +
> + if (!nodes)
> + continue;
> +
> + for (n = 0; n < sites; n++) {
> + u32 count = 0;
> + struct llvm_prf_value_node *site = nodes[s + n];
> +
> + while (site && ++count <= U8_MAX)
> + site = site->next;
> +
> + size += count *
> + sizeof(struct llvm_prf_value_node_data);
> + }
> +
> + s += sites;
> + }
> +
> + if (size)
> + size += sizeof(struct llvm_prf_value_data);
> +
> + if (value_kinds)
> + *value_kinds = kinds;
> +
> + return size;
> +}
> +
> +static u32 prf_get_value_size(void)
> +{
> + u32 size = 0;
> + struct llvm_prf_data *p;
> +
> + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> + size += __prf_get_value_size(p, NULL);
> +
> + return size;
> +}
> +
> +/* Serialize the profiling's value. */
> +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
> +{
> + struct llvm_prf_value_data header;
> + struct llvm_prf_value_node **nodes =
> + (struct llvm_prf_value_node **)p->values;
> + unsigned int kind;
> + unsigned int n;
> + unsigned int s = 0;
> +
> + header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
> +
> + if (!header.num_value_kinds)
> + /* Nothing to write. */
> + return;
> +
> + prf_copy_to_buffer(buffer, &header, sizeof(header));
> +
> + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> + struct llvm_prf_value_record *record;
> + u8 *counts;
> + unsigned int sites = p->num_value_sites[kind];
> +
> + if (!sites)
> + continue;
> +
> + /* Profiling value record. */
> + record = *(struct llvm_prf_value_record **)buffer;
> + *buffer += prf_get_value_record_header_size();
> +
> + record->kind = kind;
> + record->num_value_sites = sites;
> +
> + /* Site count array. */
> + counts = *(u8 **)buffer;
> + *buffer += prf_get_value_record_site_count_size(sites);
> +
> + /*
> + * If we don't have nodes, we can skip updating the site count
> + * array, because the buffer is zero filled.
> + */
> + if (!nodes)
> + continue;
> +
> + for (n = 0; n < sites; n++) {
> + u32 count = 0;
> + struct llvm_prf_value_node *site = nodes[s + n];
> +
> + while (site && ++count <= U8_MAX) {
> + prf_copy_to_buffer(buffer, site,
> + sizeof(struct llvm_prf_value_node_data));
> + site = site->next;
> + }
> +
> + counts[n] = (u8)count;
> + }
> +
> + s += sites;
> + }
> +}
> +
> +static void prf_serialize_values(void **buffer)
> +{
> + struct llvm_prf_data *p;
> +
> + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> + prf_serialize_value(p, buffer);
> +}
> +
> +static inline unsigned long prf_get_padding(unsigned long size)
> +{
> + return 8 - (size % 8);
> +}
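
To make the arithmetic concrete, here is a small userspace sketch (not
kernel code; the function names are mine). As posted, a size that is
already a multiple of 8 gets 8 bytes of padding rather than 0.
prf_buffer_size() and prf_serialize() both use the same helper, so the
buffer stays self-consistent, but I have not checked whether
llvm-profdata accepts the extra 8 bytes. Treat the masked variant as a
guess, not a requested change:

```c
#include <stdio.h>

/* Same arithmetic as prf_get_padding() in the patch. */
static unsigned long padding_as_posted(unsigned long size)
{
	return 8 - (size % 8);
}

/*
 * Hypothetical variant (my assumption, not from the patch): mask the
 * result so that an already aligned size needs no padding.
 */
static unsigned long padding_masked(unsigned long size)
{
	return (8 - (size % 8)) & 7;
}

/* Print both results for a given names-section size. */
static void show_padding(unsigned long size)
{
	printf("size=%lu posted=%lu masked=%lu\n", size,
	       padding_as_posted(size), padding_masked(size));
}
```

Calling show_padding() with 5, 8 and 13 shows that the two helpers only
differ when the size is already 8-byte aligned.
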
> +
> +static unsigned long prf_buffer_size(void)
> +{
> + return sizeof(struct llvm_prf_header) +
> + prf_data_size() +
> + prf_cnts_size() +
> + prf_names_size() +
> + prf_get_padding(prf_names_size()) +
> + prf_get_value_size();
> +}
> +
> +/* Serialize the profling data into a format LLVM's tools can understand. */

^ typo: profiling

> +static int prf_serialize(struct prf_private_data *p)
> +{
> + int err = 0;
> + void *buffer;
> +
> + p->size = prf_buffer_size();
> + p->buffer = vzalloc(p->size);
> +
> + if (!p->buffer) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + buffer = p->buffer;
> +
> + prf_fill_header(&buffer);
> + prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
> + prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
> + prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
> + buffer += prf_get_padding(prf_names_size());
> +
> + prf_serialize_values(&buffer);
> +
> +out:
> + return err;
> +}
> +
> +/* open() implementation for PGO. Creates a copy of the profiling data set. */
> +static int prf_open(struct inode *inode, struct file *file)
> +{
> + struct prf_private_data *data;
> + unsigned long flags;
> + int err;
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL);
> + if (!data) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + flags = prf_lock();
> +
> + err = prf_serialize(data);
> + if (err) {
> + kfree(data);
> + goto out_unlock;
> + }
> +
> + file->private_data = data;
> +
> +out_unlock:
> + prf_unlock(flags);
> +out:
> + return err;
> +}
> +
> +/* read() implementation for PGO. */
> +static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
> + loff_t *ppos)
> +{
> + struct prf_private_data *data = file->private_data;
> +
> + BUG_ON(!data);
> +
> + return simple_read_from_buffer(buf, count, ppos, data->buffer,
> + data->size);
> +}
> +
> +/* release() implementation for PGO. Release resources allocated by open(). */
> +static int prf_release(struct inode *inode, struct file *file)
> +{
> + struct prf_private_data *data = file->private_data;
> +
> + if (data) {
> + vfree(data->buffer);
> + kfree(data);
> + }
> +
> + return 0;
> +}
> +
> +static const struct file_operations prf_fops = {
> + .owner = THIS_MODULE,
> + .open = prf_open,
> + .read = prf_read,
> + .llseek = default_llseek,
> + .release = prf_release
> +};
> +
> +/* write() implementation for resetting PGO's profile data. */
> +static ssize_t reset_write(struct file *file, const char __user *addr,
> + size_t len, loff_t *pos)
> +{
> + struct llvm_prf_data *data;
> +
> + memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
> +
> + for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; ++data) {
> + struct llvm_prf_value_node **vnodes;
> + u64 current_vsite_count;
> + u32 i;
> +
> + if (!data->values)
> + continue;
> +
> + current_vsite_count = 0;
> + vnodes = (struct llvm_prf_value_node **)data->values;
> +
> + for (i = LLVM_PRF_IPVK_FIRST; i <= LLVM_PRF_IPVK_LAST; ++i)
> + current_vsite_count += data->num_value_sites[i];
> +
> + for (i = 0; i < current_vsite_count; ++i) {
> + struct llvm_prf_value_node *current_vnode = vnodes[i];
> +
> + while (current_vnode) {
> + current_vnode->count = 0;
> + current_vnode = current_vnode->next;
> + }
> + }
> + }
> +
> + return len;
> +}
> +
> +static const struct file_operations prf_reset_fops = {
> + .owner = THIS_MODULE,
> + .write = reset_write,
> + .llseek = noop_llseek,
> +};
> +
> +/* Create debugfs entries. */
> +static int __init pgo_init(void)
> +{
> + directory = debugfs_create_dir("pgo", NULL);
> + if (!directory)
> + goto err_remove;
> +
> + if (!debugfs_create_file("profraw", 0600, directory, NULL,
> + &prf_fops))
> + goto err_remove;
> +
> + if (!debugfs_create_file("reset", 0200, directory, NULL,
> + &prf_reset_fops))
> + goto err_remove;
> +
> + return 0;
> +
> +err_remove:
> + pr_err("initialization failed\n");
> + return -EIO;
> +}
> +
> +/* Remove debufs entries. */

^ typo: debugfs

> +static void __exit pgo_exit(void)
> +{
> + debugfs_remove_recursive(directory);
> +}
> +
> +module_init(pgo_init);
> +module_exit(pgo_exit);
> diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
> new file mode 100644
> index 0000000000000..6084ff0652e85
> --- /dev/null
> +++ b/kernel/pgo/instrument.c
> @@ -0,0 +1,185 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt) "pgo: " fmt
> +
> +#include <linux/bitops.h>
> +#include <linux/kernel.h>
> +#include <linux/export.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +#include "pgo.h"
> +
> +/* Lock guarding value node access and serialization. */
> +static DEFINE_SPINLOCK(pgo_lock);
> +static int current_node;
> +
> +unsigned long prf_lock(void)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&pgo_lock, flags);
> +
> + return flags;
> +}
> +
> +void prf_unlock(unsigned long flags)
> +{
> + spin_unlock_irqrestore(&pgo_lock, flags);
> +}
> +
> +/*
> + * Return a newly allocated profiling value node which contains the tracked
> + * value by the value profiler.
> + * Note: caller *must* hold pgo_lock.
> + */
> +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
> + u32 index, u64 value)
> +{
> + if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
> + return NULL; /* Out of nodes */
> +
> + current_node++;
> +
> + /* Make sure the node is entirely within the section */
> + if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
> + &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
> + return NULL;
> +
> + return &__llvm_prf_vnds_start[current_node];
> +}
> +
> +/*
> + * Counts the number of times a target value is seen.
> + *
> + * Records the target value for the CounterIndex if not seen before. Otherwise,
> + * increments the counter associated w/ the target value.
> + */
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
> +{
> + struct llvm_prf_data *p = (struct llvm_prf_data *)data;
> + struct llvm_prf_value_node **counters;
> + struct llvm_prf_value_node *curr;
> + struct llvm_prf_value_node *min = NULL;
> + struct llvm_prf_value_node *prev = NULL;
> + u64 min_count = U64_MAX;
> + u8 values = 0;
> + unsigned long flags;
> +
> + if (!p || !p->values)
> + return;
> +
> + counters = (struct llvm_prf_value_node **)p->values;
> + curr = counters[index];
> +
> + while (curr) {
> + if (target_value == curr->value) {
> + curr->count++;
> + return;
> + }
> +
> + if (curr->count < min_count) {
> + min_count = curr->count;
> + min = curr;
> + }
> +
> + prev = curr;
> + curr = curr->next;
> + values++;
> + }
> +
> + if (values >= LLVM_PRF_MAX_NUM_VALS_PER_SITE) {
> + if (!min->count || !(--min->count)) {
> + curr = min;
> + curr->value = target_value;
> + curr->count++;
> + }
> + return;
> + }
> +
> + /* Lock when updating the value node structure. */
> + flags = prf_lock();
> +
> + curr = allocate_node(p, index, target_value);
> + if (!curr)
> + goto out;
> +
> + curr->value = target_value;
> + curr->count++;
> +
> + if (!counters[index])
> + counters[index] = curr;
> + else if (prev && !prev->next)
> + prev->next = curr;
> +
> +out:
> + prf_unlock(flags);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_target);
> +
> +/* Counts the number of times a range of targets values are seen. */
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> + u32 index, s64 precise_start,
> + s64 precise_last, s64 large_value);
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> + u32 index, s64 precise_start,
> + s64 precise_last, s64 large_value)
> +{
> + if (large_value != S64_MIN && (s64)target_value >= large_value)
> + target_value = large_value;
> + else if ((s64)target_value < precise_start ||
> + (s64)target_value > precise_last)
> + target_value = precise_last + 1;
> +
> + __llvm_profile_instrument_target(target_value, data, index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_range);
> +
> +static u64 inst_prof_get_range_rep_value(u64 value)
> +{
> + if (value <= 8)
> + /* The first ranges are individually tracked, us it as is. */
> + return value;
> + else if (value >= 513)
> + /* The last range is mapped to its lowest value. */
> + return 513;
> + else if (hweight64(value) == 1)
> + /* If it's a power of two, use it as is. */
> + return value;
> +
> + /* Otherwise, take to the previous power of two + 1. */
> + return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
> +}
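
For readers following along, the mapping above can be restated in
userspace roughly like this. It is a sketch under a couple of
assumptions: __builtin_popcountll() stands in for the kernel's
hweight64(), the shift uses a 64-bit constant instead of a plain int,
and the name range_rep_value is mine:

```c
#include <stdint.h>

/*
 * Userspace restatement of inst_prof_get_range_rep_value().
 * __builtin_popcountll() replaces hweight64() here.
 */
static uint64_t range_rep_value(uint64_t value)
{
	if (value <= 8)
		return value;	/* small values are tracked individually */
	if (value >= 513)
		return 513;	/* the last range collapses to 513 */
	if (__builtin_popcountll(value) == 1)
		return value;	/* exact power of two, used as is */

	/* Otherwise: previous power of two, plus one. */
	return (UINT64_C(1) << (64 - __builtin_clzll(value) - 1)) + 1;
}
```

So every value in 9..15 maps to 9, 16 stays 16, 17..31 map to 17, and
so on up to 512; anything from 513 upwards shares a single bucket.
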
> +
> +/*
> + * The target values are partitioned into multiple ranges. The range spec is
> + * defined in compiler-rt/include/profile/InstrProfData.inc.
> + */
> +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> + u32 counter_index);
> +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> + u32 counter_index)
> +{
> + u64 rep_value;
> +
> + /* Map the target value to the representative value of its range. */
> + rep_value = inst_prof_get_range_rep_value(target_value);
> + __llvm_profile_instrument_target(rep_value, data, counter_index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_memop);
> diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
> new file mode 100644
> index 0000000000000..df0aa278f28bd
> --- /dev/null
> +++ b/kernel/pgo/pgo.h
> @@ -0,0 +1,206 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#ifndef _PGO_H
> +#define _PGO_H
> +
> +/*
> + * Note: These internal LLVM definitions must match the compiler version.
> + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
> + */
> +
> +#ifdef CONFIG_64BIT
> + #define LLVM_PRF_MAGIC \
> + ((u64)255 << 56 | \
> + (u64)'l' << 48 | \
> + (u64)'p' << 40 | \
> + (u64)'r' << 32 | \
> + (u64)'o' << 24 | \
> + (u64)'f' << 16 | \
> + (u64)'r' << 8 | \
> + (u64)129)
> +#else
> + #define LLVM_PRF_MAGIC \
> + ((u64)255 << 56 | \
> + (u64)'l' << 48 | \
> + (u64)'p' << 40 | \
> + (u64)'r' << 32 | \
> + (u64)'o' << 24 | \
> + (u64)'f' << 16 | \
> + (u64)'R' << 8 | \
> + (u64)129)
> +#endif
> +
> +#define LLVM_PRF_VERSION 5
> +#define LLVM_PRF_DATA_ALIGN 8
> +#define LLVM_PRF_IPVK_FIRST 0
> +#define LLVM_PRF_IPVK_LAST 1
> +#define LLVM_PRF_MAX_NUM_VALS_PER_SITE 16
> +
> +#define LLVM_PRF_VARIANT_MASK_IR (0x1ull << 56)
> +#define LLVM_PRF_VARIANT_MASK_CSIR (0x1ull << 57)
> +
> +/**
> + * struct llvm_prf_header - represents the raw profile header data structure.
> + * @magic: the magic token for the file format.
> + * @version: the version of the file format.
> + * @data_size: the number of entries in the profile data section.
> + * @padding_bytes_before_counters: the number of padding bytes before the
> + * counters.
> + * @counters_size: the size in bytes of the LLVM profile section containing the
> + * counters.
> + * @padding_bytes_after_counters: the number of padding bytes after the
> + * counters.
> + * @names_size: the size in bytes of the LLVM profile section containing the
> + * counters' names.
> + * @counters_delta: the beginning of the LLVM profile counters section.
> + * @names_delta: the beginning of the LLVM profile names section.
> + * @value_kind_last: the last profile value kind.
> + */
> +struct llvm_prf_header {
> + u64 magic;
> + u64 version;
> + u64 data_size;
> + u64 padding_bytes_before_counters;
> + u64 counters_size;
> + u64 padding_bytes_after_counters;
> + u64 names_size;
> + u64 counters_delta;
> + u64 names_delta;
> + u64 value_kind_last;
> +};
> +
> +/**
> + * struct llvm_prf_data - represents the per-function control structure.
> + * @name_ref: the reference to the function's name.
> + * @func_hash: the hash value of the function.
> + * @counter_ptr: a pointer to the profile counter.
> + * @function_ptr: a pointer to the function.
> + * @values: the profiling values associated with this function.
> + * @num_counters: the number of counters in the function.
> + * @num_value_sites: the number of value profile sites.
> + */
> +struct llvm_prf_data {
> + const u64 name_ref;
> + const u64 func_hash;
> + const void *counter_ptr;
> + const void *function_ptr;
> + void *values;
> + const u32 num_counters;
> + const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
> +} __aligned(LLVM_PRF_DATA_ALIGN);
> +
> +/**
> + * struct llvm_prf_value_node_data - represents the data part of the struct
> + * llvm_prf_value_node data structure.
> + * @value: the value counters.
> + * @count: the counters' count.
> + */
> +struct llvm_prf_value_node_data {
> + u64 value;
> + u64 count;
> +};
> +
> +/**
> + * struct llvm_prf_value_node - represents an internal data structure used by
> + * the value profiler.
> + * @value: the value counters.
> + * @count: the counters' count.
> + * @next: the next value node.
> + */
> +struct llvm_prf_value_node {
> + u64 value;
> + u64 count;
> + struct llvm_prf_value_node *next;
> +};
> +
> +/**
> + * struct llvm_prf_value_data - represents the value profiling data in indexed
> + * format.
> + * @total_size: the total size in bytes including this field.
> + * @num_value_kinds: the number of value profile kinds that have value profile
> + * data.
> + */
> +struct llvm_prf_value_data {
> + u32 total_size;
> + u32 num_value_kinds;
> +};
> +
> +/**
> + * struct llvm_prf_value_record - represents the on-disk layout of the value
> + * profile data of a particular kind for one function.
> + * @kind: the kind of the value profile record.
> + * @num_value_sites: the number of value profile sites.
> + * @site_count_array: the first element of the array that stores the number
> + * of profiled values for each value site.
> + */
> +struct llvm_prf_value_record {
> + u32 kind;
> + u32 num_value_sites;
> + u8 site_count_array[];
> +};
> +
> +#define prf_get_value_record_header_size() \
> + offsetof(struct llvm_prf_value_record, site_count_array)
> +#define prf_get_value_record_site_count_size(sites) \
> + roundup((sites), 8)
> +#define prf_get_value_record_size(sites) \
> + (prf_get_value_record_header_size() + \
> + prf_get_value_record_site_count_size((sites)))
> +
> +/* Data sections */
> +extern struct llvm_prf_data __llvm_prf_data_start[];
> +extern struct llvm_prf_data __llvm_prf_data_end[];
> +
> +extern u64 __llvm_prf_cnts_start[];
> +extern u64 __llvm_prf_cnts_end[];
> +
> +extern char __llvm_prf_names_start[];
> +extern char __llvm_prf_names_end[];
> +
> +extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
> +extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
> +
> +/* Locking for vnodes */
> +extern unsigned long prf_lock(void);
> +extern void prf_unlock(unsigned long flags);
> +
> +#define __DEFINE_PRF_SIZE(s) \
> + static inline unsigned long prf_ ## s ## _size(void) \
> + { \
> + unsigned long start = \
> + (unsigned long)__llvm_prf_ ## s ## _start; \
> + unsigned long end = \
> + (unsigned long)__llvm_prf_ ## s ## _end; \
> + return roundup(end - start, \
> + sizeof(__llvm_prf_ ## s ## _start[0])); \
> + } \
> + static inline unsigned long prf_ ## s ## _count(void) \
> + { \
> + return prf_ ## s ## _size() / \
> + sizeof(__llvm_prf_ ## s ## _start[0]); \
> + }
> +
> +__DEFINE_PRF_SIZE(data);
> +__DEFINE_PRF_SIZE(cnts);
> +__DEFINE_PRF_SIZE(names);
> +__DEFINE_PRF_SIZE(vnds);
> +
> +#undef __DEFINE_PRF_SIZE
> +
> +#endif /* _PGO_H */
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 213677a5ed33e..9b218afb5cb87 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
> $(CFLAGS_GCOV))
> endif
>
> +#
> +# Enable clang's PGO profiling flags for a file or directory depending on
> +# variables PGO_PROFILE_obj.o and PGO_PROFILE.
> +#
> +ifeq ($(CONFIG_PGO_CLANG),y)
> +_c_flags += $(if $(patsubst n%,, \
> + $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
> + $(CFLAGS_PGO_CLANG))
> +endif
> +
> #
> # Enable address sanitizer flags for kernel except some files or directories
> # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
> --
> 2.30.0.284.gd98b1dd5eaa7-goog
>


--
Thanks,
~Nick Desaulniers

2021-01-14 02:13:54

by Bill Wendling

Subject: Re: [PATCH v4] pgo: add clang's Profile Guided Optimization infrastructure

On Wed, Jan 13, 2021 at 12:55 PM Nathan Chancellor
<[email protected]> wrote:
>
> Hi Bill,
>
> On Tue, Jan 12, 2021 at 10:19:58PM -0800, Bill Wendling wrote:
> > From: Sami Tolvanen <[email protected]>
> >
> > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > profile, the kernel is instrumented with PGO counters, a representative
> > workload is run, and the raw profile data is collected from
> > /sys/kernel/debug/pgo/profraw.
> >
> > The raw profile data must be processed by clang's "llvm-profdata" tool
> > before it can be used during recompilation:
> >
> > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >
> > Multiple raw profiles may be merged during this step.
> >
> > The data can now be used by the compiler:
> >
> > $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> >
> > This initial submission is restricted to x86, as that's the platform we
> > know works. This restriction can be lifted once other platforms have
> > been verified to work with PGO.
> >
> > Note that this method of profiling the kernel is clang-native, unlike
> > the clang support in kernel/gcov.
> >
> > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> >
> > Signed-off-by: Sami Tolvanen <[email protected]>
> > Co-developed-by: Bill Wendling <[email protected]>
> > Signed-off-by: Bill Wendling <[email protected]>
> > Change-Id: Ic78e69c682286d3a44c4549a0138578c98138b77
>
> Small nit: This should be removed.
>
Grrr....The git hook keeps adding it in there. :-(

> I applied this patch on top of v5.11-rc3, built it with LLVM 12
> (f1d5cbbdee5526bc86eac0a5652b115d9bc158e5 + D94470) with Microsoft's
> WSL 5.4 config [1] + CONFIG_PGO_CLANG=y, and ran it on WSL2.
>
> $ zgrep PGO /proc/config.gz
> # Profile Guided Optimization (PGO) (EXPERIMENTAL)
> CONFIG_ARCH_SUPPORTS_PGO_CLANG=y
> CONFIG_PGO_CLANG=y
> # end of Profile Guided Optimization (PGO) (EXPERIMENTAL)
>
> However, I see an issue with actually using the data:
>
> $ sudo -s
> # mount -t debugfs none /sys/kernel/debug
> # cp -a /sys/kernel/debug/pgo/profraw vmlinux.profraw
> # chown nathan:nathan vmlinux.profraw
> # exit
> $ tc-build/build/llvm/stage1/bin/llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> warning: vmlinux.profraw: Invalid instrumentation profile data (bad magic)
> error: No profiles could be merged.
>
> Am I holding it wrong? :) Note, this is virtualized, I do not have any
> "real" x86 hardware that I can afford to test on right now.
>
> [1]: https://github.com/microsoft/WSL2-Linux-Kernel/raw/linux-msft-wsl-5.4.y/Microsoft/config-wsl
>
Could you send me the vmlinux.profraw file? (Don't CC this list.)

-bw

2021-01-14 04:11:39

by Nick Desaulniers

Subject: Re: [PATCH v4] pgo: add clang's Profile Guided Optimization infrastructure

On Wed, Jan 13, 2021 at 12:55 PM Nathan Chancellor
<[email protected]> wrote:
>
> However, I see an issue with actually using the data:
>
> $ sudo -s
> # mount -t debugfs none /sys/kernel/debug
> # cp -a /sys/kernel/debug/pgo/profraw vmlinux.profraw
> # chown nathan:nathan vmlinux.profraw
> # exit
> $ tc-build/build/llvm/stage1/bin/llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> warning: vmlinux.profraw: Invalid instrumentation profile data (bad magic)
> error: No profiles could be merged.
>
> Am I holding it wrong? :) Note, this is virtualized, I do not have any
> "real" x86 hardware that I can afford to test on right now.

Same.

I think the magic calculation in this patch may differ from upstream
llvm: https://github.com/llvm/llvm-project/blob/49142991a685bd427d7e877c29c77371dfb7634c/llvm/include/llvm/ProfileData/SampleProf.h#L96-L101

vs this patch:

+ * Note: These internal LLVM definitions must match the compiler version.
+ * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
+ */
+
+#ifdef CONFIG_64BIT
+ #define LLVM_PRF_MAGIC \
+ ((u64)255 << 56 | \
+ (u64)'l' << 48 | \
+ (u64)'p' << 40 | \
+ (u64)'r' << 32 | \
+ (u64)'o' << 24 | \
+ (u64)'f' << 16 | \
+ (u64)'r' << 8 | \
+ (u64)129)
+#else
+ #define LLVM_PRF_MAGIC \
+ ((u64)255 << 56 | \
+ (u64)'l' << 48 | \
+ (u64)'p' << 40 | \
+ (u64)'r' << 32 | \
+ (u64)'o' << 24 | \
+ (u64)'f' << 16 | \
+ (u64)'R' << 8 | \
+ (u64)129)
+#endif

--
Thanks,
~Nick Desaulniers

2021-01-16 00:06:23

by Nick Desaulniers

Subject: Re: [PATCH v4] pgo: add clang's Profile Guided Optimization infrastructure

On Wed, Jan 13, 2021 at 8:07 PM Nick Desaulniers
<[email protected]> wrote:
>
> On Wed, Jan 13, 2021 at 12:55 PM Nathan Chancellor
> <[email protected]> wrote:
> >
> > However, I see an issue with actually using the data:
> >
> > $ sudo -s
> > # mount -t debugfs none /sys/kernel/debug
> > # cp -a /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > # chown nathan:nathan vmlinux.profraw
> > # exit
> > $ tc-build/build/llvm/stage1/bin/llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > warning: vmlinux.profraw: Invalid instrumentation profile data (bad magic)
> > error: No profiles could be merged.
> >
> > Am I holding it wrong? :) Note, this is virtualized, I do not have any
> > "real" x86 hardware that I can afford to test on right now.
>
> Same.
>
> I think the magic calculation in this patch may differ from upstream
> llvm: https://github.com/llvm/llvm-project/blob/49142991a685bd427d7e877c29c77371dfb7634c/llvm/include/llvm/ProfileData/SampleProf.h#L96-L101

Err...it looks like it was the padding calculation. With that fixed
up, we can query the profile data to get insights on the most heavily
called functions. Here's what my top 20 are (reset, then watch 10
minutes worth of cat videos on youtube while running `find /` and
sleeping at my desk). Anything curious stand out to anyone?

$ llvm-profdata show -topn=20 /tmp/vmlinux.profraw
Instrumentation level: IR entry_first = 0
Total functions: 48970
Maximum function count: 62070879
Maximum internal block count: 83221158
Top 20 functions with the largest internal block counts:
drivers/tty/n_tty.c:n_tty_write, max count = 83221158
rcu_read_unlock_strict, max count = 62070879
_cond_resched, max count = 25486882
rcu_all_qs, max count = 25451477
drivers/cpuidle/poll_state.c:poll_idle, max count = 23618576
_raw_spin_unlock_irqrestore, max count = 18874121
drivers/cpuidle/governors/menu.c:menu_select, max count = 18721624
_raw_spin_lock_irqsave, max count = 18509161
memchr, max count = 15525452
_raw_spin_lock, max count = 15484254
__mod_memcg_state, max count = 14604619
__mod_memcg_lruvec_state, max count = 14602783
fs/ext4/hash.c:str2hashbuf_signed, max count = 14098424
__mod_lruvec_state, max count = 12527154
__mod_node_page_state, max count = 12525172
native_sched_clock, max count = 8904692
sched_clock_cpu, max count = 8895832
sched_clock, max count = 8894627
kernel/entry/common.c:exit_to_user_mode_prepare, max count = 8289031
fpregs_assert_state_consistent, max count = 8287198

--
Thanks,
~Nick Desaulniers

2021-01-16 00:16:25

by Nick Desaulniers

Subject: Re: [PATCH v4] pgo: add clang's Profile Guided Optimization infrastructure

> On Wed, Jan 13, 2021 at 8:07 PM Nick Desaulniers
> <[email protected]> wrote:
> >
> > On Wed, Jan 13, 2021 at 12:55 PM Nathan Chancellor
> > <[email protected]> wrote:
> > >
> > > However, I see an issue with actually using the data:
> > >
> > > $ sudo -s
> > > # mount -t debugfs none /sys/kernel/debug
> > > # cp -a /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > # chown nathan:nathan vmlinux.profraw
> > > # exit
> > > $ tc-build/build/llvm/stage1/bin/llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > warning: vmlinux.profraw: Invalid instrumentation profile data (bad magic)
> > > error: No profiles could be merged.
> > >
> > > Am I holding it wrong? :) Note, this is virtualized, I do not have any
> > > "real" x86 hardware that I can afford to test on right now.
> >
> > Same.
> >
> > I think the magic calculation in this patch may differ from upstream
> > llvm: https://github.com/llvm/llvm-project/blob/49142991a685bd427d7e877c29c77371dfb7634c/llvm/include/llvm/ProfileData/SampleProf.h#L96-L101
>
> Err...it looks like it was the padding calculation. With that fixed
> up, we can query the profile data to get insights on the most heavily
> called functions. Here's what my top 20 are (reset, then watch 10
> minutes worth of cat videos on youtube while running `find /` and
> sleeping at my desk). Anything curious stand out to anyone?

Hello world from my personal laptop whose kernel was rebuilt with
profiling data! Wow, I can run `find /` and watch cat videos on youtube
so fast! Users will love this! /s

Checking the section sizes of .text.hot. and .text.unlikely. looks
good!

>
> $ llvm-profdata show -topn=20 /tmp/vmlinux.profraw
> Instrumentation level: IR entry_first = 0
> Total functions: 48970
> Maximum function count: 62070879
> Maximum internal block count: 83221158
> Top 20 functions with the largest internal block counts:
> drivers/tty/n_tty.c:n_tty_write, max count = 83221158
> rcu_read_unlock_strict, max count = 62070879
> _cond_resched, max count = 25486882
> rcu_all_qs, max count = 25451477
> drivers/cpuidle/poll_state.c:poll_idle, max count = 23618576
> _raw_spin_unlock_irqrestore, max count = 18874121
> drivers/cpuidle/governors/menu.c:menu_select, max count = 18721624
> _raw_spin_lock_irqsave, max count = 18509161
> memchr, max count = 15525452
> _raw_spin_lock, max count = 15484254
> __mod_memcg_state, max count = 14604619
> __mod_memcg_lruvec_state, max count = 14602783
> fs/ext4/hash.c:str2hashbuf_signed, max count = 14098424
> __mod_lruvec_state, max count = 12527154
> __mod_node_page_state, max count = 12525172
> native_sched_clock, max count = 8904692
> sched_clock_cpu, max count = 8895832
> sched_clock, max count = 8894627
> kernel/entry/common.c:exit_to_user_mode_prepare, max count = 8289031
> fpregs_assert_state_consistent, max count = 8287198
>
> --
> Thanks,
> ~Nick Desaulniers
>

2021-01-16 04:32:52

by Sedat Dilek

Subject: Re: [PATCH v4] pgo: add clang's Profile Guided Optimization infrastructure

On Sat, Jan 16, 2021 at 1:13 AM Nick Desaulniers
<[email protected]> wrote:
>
> > On Wed, Jan 13, 2021 at 8:07 PM Nick Desaulniers
> > <[email protected]> wrote:
> > >
> > > On Wed, Jan 13, 2021 at 12:55 PM Nathan Chancellor
> > > <[email protected]> wrote:
> > > >
> > > > However, I see an issue with actually using the data:
> > > >
> > > > $ sudo -s
> > > > # mount -t debugfs none /sys/kernel/debug
> > > > # cp -a /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > > # chown nathan:nathan vmlinux.profraw
> > > > # exit
> > > > $ tc-build/build/llvm/stage1/bin/llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > warning: vmlinux.profraw: Invalid instrumentation profile data (bad magic)
> > > > error: No profiles could be merged.
> > > >
> > > > Am I holding it wrong? :) Note, this is virtualized, I do not have any
> > > > "real" x86 hardware that I can afford to test on right now.
> > >
> > > Same.
> > >
> > > I think the magic calculation in this patch may differ from upstream
> > > llvm: https://github.com/llvm/llvm-project/blob/49142991a685bd427d7e877c29c77371dfb7634c/llvm/include/llvm/ProfileData/SampleProf.h#L96-L101
> >
> > Err...it looks like it was the padding calculation. With that fixed
> > up, we can query the profile data to get insights on the most heavily
> > called functions. Here's what my top 20 are (reset, then watch 10
> > minutes worth of cat videos on youtube while running `find /` and
> > sleeping at my desk). Anything curious stand out to anyone?
>
> Hello world from my personal laptop whose kernel was rebuilt with
> profiling data! Wow, I can run `find /` and watch cat videos on youtube
> so fast! Users will love this! /s
>
> Checking the section sizes of .text.hot. and .text.unlikely. looks
> good!
>

I love cat videos on youtube and do find parallelly...

I must try this :-)!

Might be good to write up an instruction (README) for followers?

- Sedat -

> >
> > $ llvm-profdata show -topn=20 /tmp/vmlinux.profraw
> > Instrumentation level: IR entry_first = 0
> > Total functions: 48970
> > Maximum function count: 62070879
> > Maximum internal block count: 83221158
> > Top 20 functions with the largest internal block counts:
> > drivers/tty/n_tty.c:n_tty_write, max count = 83221158
> > rcu_read_unlock_strict, max count = 62070879
> > _cond_resched, max count = 25486882
> > rcu_all_qs, max count = 25451477
> > drivers/cpuidle/poll_state.c:poll_idle, max count = 23618576
> > _raw_spin_unlock_irqrestore, max count = 18874121
> > drivers/cpuidle/governors/menu.c:menu_select, max count = 18721624
> > _raw_spin_lock_irqsave, max count = 18509161
> > memchr, max count = 15525452
> > _raw_spin_lock, max count = 15484254
> > __mod_memcg_state, max count = 14604619
> > __mod_memcg_lruvec_state, max count = 14602783
> > fs/ext4/hash.c:str2hashbuf_signed, max count = 14098424
> > __mod_lruvec_state, max count = 12527154
> > __mod_node_page_state, max count = 12525172
> > native_sched_clock, max count = 8904692
> > sched_clock_cpu, max count = 8895832
> > sched_clock, max count = 8894627
> > kernel/entry/common.c:exit_to_user_mode_prepare, max count = 8289031
> > fpregs_assert_state_consistent, max count = 8287198
> >
> > --
> > Thanks,
> > ~Nick Desaulniers
> >
>

2021-01-16 05:10:37

by Sedat Dilek

Subject: Re: [PATCH v4] pgo: add clang's Profile Guided Optimization infrastructure

On Sat, Jan 16, 2021 at 1:13 AM Nick Desaulniers
<[email protected]> wrote:
>
> > On Wed, Jan 13, 2021 at 8:07 PM Nick Desaulniers
> > <[email protected]> wrote:
> > >
> > > On Wed, Jan 13, 2021 at 12:55 PM Nathan Chancellor
> > > <[email protected]> wrote:
> > > >
> > > > However, I see an issue with actually using the data:
> > > >
> > > > $ sudo -s
> > > > # mount -t debugfs none /sys/kernel/debug
> > > > # cp -a /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > > # chown nathan:nathan vmlinux.profraw
> > > > # exit
> > > > $ tc-build/build/llvm/stage1/bin/llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > warning: vmlinux.profraw: Invalid instrumentation profile data (bad magic)
> > > > error: No profiles could be merged.
> > > >
> > > > Am I holding it wrong? :) Note, this is virtualized, I do not have any
> > > > "real" x86 hardware that I can afford to test on right now.
> > >
> > > Same.
> > >
> > > I think the magic calculation in this patch may differ from upstream
> > > llvm: https://github.com/llvm/llvm-project/blob/49142991a685bd427d7e877c29c77371dfb7634c/llvm/include/llvm/ProfileData/SampleProf.h#L96-L101
> >
> > Err...it looks like it was the padding calculation. With that fixed
> > up, we can query the profile data to get insights on the most heavily
> > called functions. Here's what my top 20 are (reset, then watch 10
> > minutes worth of cat videos on youtube while running `find /` and
> > sleeping at my desk). Anything curious stand out to anyone?
>
> Hello world from my personal laptop whose kernel was rebuilt with
> profiling data! Wow, I can run `find /` and watch cat videos on youtube
> so fast! Users will love this! /s
>
> Checking the section sizes of .text.hot. and .text.unlikely. looks
> good!
>

Is that the latest status of Bill's patch?

Or do you have me a lore link?

- Sedat -

[1] https://github.com/gwelymernans/linux/commits/gwelymernans/linux


> >
> > $ llvm-profdata show -topn=20 /tmp/vmlinux.profraw
> > Instrumentation level: IR entry_first = 0
> > Total functions: 48970
> > Maximum function count: 62070879
> > Maximum internal block count: 83221158
> > Top 20 functions with the largest internal block counts:
> > drivers/tty/n_tty.c:n_tty_write, max count = 83221158
> > rcu_read_unlock_strict, max count = 62070879
> > _cond_resched, max count = 25486882
> > rcu_all_qs, max count = 25451477
> > drivers/cpuidle/poll_state.c:poll_idle, max count = 23618576
> > _raw_spin_unlock_irqrestore, max count = 18874121
> > drivers/cpuidle/governors/menu.c:menu_select, max count = 18721624
> > _raw_spin_lock_irqsave, max count = 18509161
> > memchr, max count = 15525452
> > _raw_spin_lock, max count = 15484254
> > __mod_memcg_state, max count = 14604619
> > __mod_memcg_lruvec_state, max count = 14602783
> > fs/ext4/hash.c:str2hashbuf_signed, max count = 14098424
> > __mod_lruvec_state, max count = 12527154
> > __mod_node_page_state, max count = 12525172
> > native_sched_clock, max count = 8904692
> > sched_clock_cpu, max count = 8895832
> > sched_clock, max count = 8894627
> > kernel/entry/common.c:exit_to_user_mode_prepare, max count = 8289031
> > fpregs_assert_state_consistent, max count = 8287198
> >
> > --
> > Thanks,
> > ~Nick Desaulniers
> >
>

2021-01-16 05:23:43

by Sedat Dilek

Subject: Re: [PATCH v4] pgo: add clang's Profile Guided Optimization infrastructure

On Sat, Jan 16, 2021 at 6:07 AM Sedat Dilek <[email protected]> wrote:
>
> On Sat, Jan 16, 2021 at 1:13 AM Nick Desaulniers
> <[email protected]> wrote:
> >
> > > On Wed, Jan 13, 2021 at 8:07 PM Nick Desaulniers
> > > <[email protected]> wrote:
> > > >
> > > > On Wed, Jan 13, 2021 at 12:55 PM Nathan Chancellor
> > > > <[email protected]> wrote:
> > > > >
> > > > > However, I see an issue with actually using the data:
> > > > >
> > > > > $ sudo -s
> > > > > # mount -t debugfs none /sys/kernel/debug
> > > > > # cp -a /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > > > # chown nathan:nathan vmlinux.profraw
> > > > > # exit
> > > > > $ tc-build/build/llvm/stage1/bin/llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > > warning: vmlinux.profraw: Invalid instrumentation profile data (bad magic)
> > > > > error: No profiles could be merged.
> > > > >
> > > > > Am I holding it wrong? :) Note, this is virtualized, I do not have any
> > > > > "real" x86 hardware that I can afford to test on right now.
> > > >
> > > > Same.
> > > >
> > > > I think the magic calculation in this patch may differ from upstream
> > > > llvm: https://github.com/llvm/llvm-project/blob/49142991a685bd427d7e877c29c77371dfb7634c/llvm/include/llvm/ProfileData/SampleProf.h#L96-L101
> > >
> > > Err...it looks like it was the padding calculation. With that fixed
> > > up, we can query the profile data to get insights on the most heavily
> > > called functions. Here's what my top 20 are (reset, then watch 10
> > > minutes worth of cat videos on youtube while running `find /` and
> > > sleeping at my desk). Anything curious stand out to anyone?
> >
> > Hello world from my personal laptop whose kernel was rebuilt with
> > profiling data! Wow, I can run `find /` and watch cat videos on youtube
> > so fast! Users will love this! /s
> >
> > Checking the section sizes of .text.hot. and .text.unlikely. looks
> > good!
> >
>
> Is that the latest status of Bill's patch?
>
> Or do you have me a lore link?
>

I tried with the message-id of Bill's initial email:

link="https://lore.kernel.org/r/[email protected]"
b4 -d am $link

This gives me:

v4_20210112_morbo_pgo_add_clang_s_profile_guided_optimization_infrastructure.mbx

- Sedat -

>
> [1] https://github.com/gwelymernans/linux/commits/gwelymernans/linux
>
>
> > >
> > > $ llvm-profdata show -topn=20 /tmp/vmlinux.profraw
> > > Instrumentation level: IR entry_first = 0
> > > Total functions: 48970
> > > Maximum function count: 62070879
> > > Maximum internal block count: 83221158
> > > Top 20 functions with the largest internal block counts:
> > > drivers/tty/n_tty.c:n_tty_write, max count = 83221158
> > > rcu_read_unlock_strict, max count = 62070879
> > > _cond_resched, max count = 25486882
> > > rcu_all_qs, max count = 25451477
> > > drivers/cpuidle/poll_state.c:poll_idle, max count = 23618576
> > > _raw_spin_unlock_irqrestore, max count = 18874121
> > > drivers/cpuidle/governors/menu.c:menu_select, max count = 18721624
> > > _raw_spin_lock_irqsave, max count = 18509161
> > > memchr, max count = 15525452
> > > _raw_spin_lock, max count = 15484254
> > > __mod_memcg_state, max count = 14604619
> > > __mod_memcg_lruvec_state, max count = 14602783
> > > fs/ext4/hash.c:str2hashbuf_signed, max count = 14098424
> > > __mod_lruvec_state, max count = 12527154
> > > __mod_node_page_state, max count = 12525172
> > > native_sched_clock, max count = 8904692
> > > sched_clock_cpu, max count = 8895832
> > > sched_clock, max count = 8894627
> > > kernel/entry/common.c:exit_to_user_mode_prepare, max count = 8289031
> > > fpregs_assert_state_consistent, max count = 8287198
> > >
> > > --
> > > Thanks,
> > > ~Nick Desaulniers
> > >
> >

2021-01-16 09:46:51

by Bill Wendling

Subject: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

From: Sami Tolvanen <[email protected]>

Enable the use of clang's Profile-Guided Optimization[1]. To generate a
profile, the kernel is instrumented with PGO counters, a representative
workload is run, and the raw profile data is collected from
/sys/kernel/debug/pgo/profraw.

The raw profile data must be processed by clang's "llvm-profdata" tool
before it can be used during recompilation:

$ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
$ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw

Multiple raw profiles may be merged during this step.

The data can now be used by the compiler:

$ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...

This initial submission is restricted to x86, as that's the platform we
know works. This restriction can be lifted once other platforms have
been verified to work with PGO.

Note that this method of profiling the kernel is clang-native, unlike
the clang support in kernel/gcov.

[1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization

Signed-off-by: Sami Tolvanen <[email protected]>
Co-developed-by: Bill Wendling <[email protected]>
Signed-off-by: Bill Wendling <[email protected]>
---
v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
testing.
- Corrected documentation, re PGO flags when using LTO, based on Fangrui
Song's comments.
v3: - Added change log section based on Sedat Dilek's comments.
v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
own popcount implementation, based on Nick Desaulniers's comment.
v5: - Correct padding calculation, discovered by Nathan Chancellor.
---
Documentation/dev-tools/index.rst | 1 +
Documentation/dev-tools/pgo.rst | 127 +++++++++
MAINTAINERS | 9 +
Makefile | 3 +
arch/Kconfig | 1 +
arch/x86/Kconfig | 1 +
arch/x86/boot/Makefile | 1 +
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/crypto/Makefile | 2 +
arch/x86/entry/vdso/Makefile | 1 +
arch/x86/kernel/vmlinux.lds.S | 2 +
arch/x86/platform/efi/Makefile | 1 +
arch/x86/purgatory/Makefile | 1 +
arch/x86/realmode/rm/Makefile | 1 +
arch/x86/um/vdso/Makefile | 1 +
drivers/firmware/efi/libstub/Makefile | 1 +
include/asm-generic/vmlinux.lds.h | 44 +++
kernel/Makefile | 1 +
kernel/pgo/Kconfig | 35 +++
kernel/pgo/Makefile | 5 +
kernel/pgo/fs.c | 382 ++++++++++++++++++++++++++
kernel/pgo/instrument.c | 185 +++++++++++++
kernel/pgo/pgo.h | 206 ++++++++++++++
scripts/Makefile.lib | 10 +
24 files changed, 1022 insertions(+)
create mode 100644 Documentation/dev-tools/pgo.rst
create mode 100644 kernel/pgo/Kconfig
create mode 100644 kernel/pgo/Makefile
create mode 100644 kernel/pgo/fs.c
create mode 100644 kernel/pgo/instrument.c
create mode 100644 kernel/pgo/pgo.h

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index f7809c7b1ba9e..8d6418e858062 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -26,6 +26,7 @@ whole; patches welcome!
kgdb
kselftest
kunit/index
+ pgo


.. only:: subproject and html
diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
new file mode 100644
index 0000000000000..b7f11d8405b73
--- /dev/null
+++ b/Documentation/dev-tools/pgo.rst
@@ -0,0 +1,127 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+Using PGO with the Linux kernel
+===============================
+
+Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
+when building with Clang. The profiling data is exported via the ``pgo``
+debugfs directory.
+
+.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
+
+
+Preparation
+===========
+
+Configure the kernel with:
+
+.. code-block:: make
+
+ CONFIG_DEBUG_FS=y
+ CONFIG_PGO_CLANG=y
+
+Note that kernels compiled with profiling flags will be significantly larger
+and run slower.
+
+Profiling data will only become accessible once debugfs has been mounted:
+
+.. code-block:: sh
+
+ mount -t debugfs none /sys/kernel/debug
+
+
+Customization
+=============
+
+You can enable or disable profiling for individual files and directories by
+adding a line similar to the following to the respective kernel Makefile:
+
+- For a single file (e.g. main.o)
+
+ .. code-block:: make
+
+ PGO_PROFILE_main.o := y
+
+- For all files in one directory
+
+ .. code-block:: make
+
+ PGO_PROFILE := y
+
+To exclude files from being profiled use
+
+ .. code-block:: make
+
+ PGO_PROFILE_main.o := n
+
+and
+
+ .. code-block:: make
+
+ PGO_PROFILE := n
+
+Only files which are linked to the main kernel image or are compiled as kernel
+modules are supported by this mechanism.
+
+
+Files
+=====
+
+The PGO kernel support creates the following files in debugfs:
+
+``/sys/kernel/debug/pgo``
+ Parent directory for all PGO-related files.
+
+``/sys/kernel/debug/pgo/reset``
+ Global reset file: resets all profiling data to zero when written to.
+
+``/sys/kernel/debug/pgo/profraw``
+ The raw PGO data that must be processed with ``llvm-profdata``.
+
+
+Workflow
+========
+
+The PGO kernel can be run on the host or test machines. The data, though, should
+be analyzed with tools from the same Clang version that compiled the kernel.
+Clang is tolerant of version skew, but it is easier to use the same Clang
+version.
+
+The profiling data is useful for optimizing the kernel, analyzing coverage,
+etc. Clang offers tools to perform these tasks.
+
+Here is an example workflow for profiling an instrumented kernel with PGO and
+using the result to optimize the kernel:
+
+1) Install the kernel on the TEST machine.
+
+2) Reset the data counters right before running the load tests
+
+ .. code-block:: sh
+
+ $ echo 1 > /sys/kernel/debug/pgo/reset
+
+3) Run the load tests.
+
+4) Collect the raw profile data
+
+ .. code-block:: sh
+
+ $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
+
+5) (Optional) Download the raw profile data to the HOST machine.
+
+6) Process the raw profile data
+
+ .. code-block:: sh
+
+ $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
+
+ Note that multiple raw profile data files can be merged during this step.
+
+7) Rebuild the kernel using the profile data (PGO disabled)
+
+ .. code-block:: sh
+
+ $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
diff --git a/MAINTAINERS b/MAINTAINERS
index 79b400c97059f..cb1f1f2b2baf4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13948,6 +13948,15 @@ S: Maintained
F: include/linux/personality.h
F: include/uapi/linux/personality.h

+PGO BASED KERNEL PROFILING
+M: Sami Tolvanen <[email protected]>
+M: Bill Wendling <[email protected]>
+R: Nathan Chancellor <[email protected]>
+R: Nick Desaulniers <[email protected]>
+S: Supported
+F: Documentation/dev-tools/pgo.rst
+F: kernel/pgo
+
PHOENIX RC FLIGHT CONTROLLER ADAPTER
M: Marcus Folkesson <[email protected]>
L: [email protected]
diff --git a/Makefile b/Makefile
index 9e73f82e0d863..9128bfe1ccc97 100644
--- a/Makefile
+++ b/Makefile
@@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
# Defaults to vmlinux, but the arch makefile usually adds further targets
all: vmlinux

+CFLAGS_PGO_CLANG := -fprofile-generate
+export CFLAGS_PGO_CLANG
+
CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
$(call cc-option,-fno-tree-loop-im) \
$(call cc-disable-warning,maybe-uninitialized,)
diff --git a/arch/Kconfig b/arch/Kconfig
index 24862d15f3a36..f39d3991f6bfe 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
pairs of 32-bit arguments, select this option.

source "kernel/gcov/Kconfig"
+source "kernel/pgo/Kconfig"

source "scripts/gcc-plugins/Kconfig"

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 21f851179ff08..36305ea61dc09 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -96,6 +96,7 @@ config X86
select ARCH_SUPPORTS_DEBUG_PAGEALLOC
select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
+ select ARCH_SUPPORTS_PGO_CLANG if X86_64
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index fe605205b4ce2..383853e32f673 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n

$(obj)/bzImage: asflags-y := $(SVGA_MODE)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index e0bc3988c3faa..ed12ab65f6065 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/

KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE :=n

KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index a31de0c6ccde2..775fa0b368e98 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -4,6 +4,8 @@

OBJECT_FILES_NON_STANDARD := y

+PGO_PROFILE_curve25519-x86_64.o := n
+
obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o

obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 02e3e42f380bd..26e2b3af0145c 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO $@
VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
$(call ld-option, --eh-frame-hdr) -Bsymbolic
GCOV_PROFILE := n
+PGO_PROFILE := n

quiet_cmd_vdso_and_check = VDSO $@
cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index efd9e9ea17f25..f6cab2316c46a 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -184,6 +184,8 @@ SECTIONS

BUG_TABLE

+ PGO_CLANG_DATA
+
ORC_UNWIND_TABLE

. = ALIGN(PAGE_SIZE);
diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
index 84b09c230cbd5..5f22b31446ad4 100644
--- a/arch/x86/platform/efi/Makefile
+++ b/arch/x86/platform/efi/Makefile
@@ -2,6 +2,7 @@
OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
KASAN_SANITIZE := n
GCOV_PROFILE := n
+PGO_PROFILE := n

obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 95ea17a9d20cb..36f20e99da0bc 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk

# Sanitizer, etc. runtimes are unavailable and cannot be linked here.
GCOV_PROFILE := n
+PGO_PROFILE := n
KASAN_SANITIZE := n
UBSAN_SANITIZE := n
KCSAN_SANITIZE := n
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index 83f1b6a56449f..21797192f958f 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n
diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
index 5943387e3f357..54f5768f58530 100644
--- a/arch/x86/um/vdso/Makefile
+++ b/arch/x86/um/vdso/Makefile
@@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@

VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
GCOV_PROFILE := n
+PGO_PROFILE := n

#
# Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 8a94388e38b33..2d81623b33f29 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -40,6 +40,7 @@ KBUILD_CFLAGS := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))

GCOV_PROFILE := n
+PGO_PROFILE := n
# Sanitizer runtimes are unavailable and cannot be linked here.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index b2b3d81b1535a..3a591bb18c5fb 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -316,6 +316,49 @@
#define THERMAL_TABLE(name)
#endif

+#ifdef CONFIG_PGO_CLANG
+#define PGO_CLANG_DATA \
+ __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
+ . = ALIGN(8); \
+ __llvm_prf_start = .; \
+ __llvm_prf_data_start = .; \
+ KEEP(*(__llvm_prf_data)) \
+ . = ALIGN(8); \
+ __llvm_prf_data_end = .; \
+ } \
+ __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
+ . = ALIGN(8); \
+ __llvm_prf_cnts_start = .; \
+ KEEP(*(__llvm_prf_cnts)) \
+ . = ALIGN(8); \
+ __llvm_prf_cnts_end = .; \
+ } \
+ __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
+ . = ALIGN(8); \
+ __llvm_prf_names_start = .; \
+ KEEP(*(__llvm_prf_names)) \
+ . = ALIGN(8); \
+ __llvm_prf_names_end = .; \
+ . = ALIGN(8); \
+ } \
+ __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
+ __llvm_prf_vals_start = .; \
+ KEEP(*(__llvm_prf_vals)) \
+ . = ALIGN(8); \
+ __llvm_prf_vals_end = .; \
+ . = ALIGN(8); \
+ } \
+ __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
+ __llvm_prf_vnds_start = .; \
+ KEEP(*(__llvm_prf_vnds)) \
+ . = ALIGN(8); \
+ __llvm_prf_vnds_end = .; \
+ __llvm_prf_end = .; \
+ }
+#else
+#define PGO_CLANG_DATA
+#endif
+
#define KERNEL_DTB() \
STRUCT_ALIGN(); \
__dtb_start = .; \
@@ -1125,6 +1168,7 @@
CONSTRUCTORS \
} \
BUG_TABLE \
+ PGO_CLANG_DATA

#define INIT_TEXT_SECTION(inittext_align) \
. = ALIGN(inittext_align); \
diff --git a/kernel/Makefile b/kernel/Makefile
index aa7368c7eabf3..0b34ca228ba46 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
obj-$(CONFIG_KCSAN) += kcsan/
obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
+obj-$(CONFIG_PGO_CLANG) += pgo/

obj-$(CONFIG_PERF_EVENTS) += events/

diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
new file mode 100644
index 0000000000000..76a640b6cf6ed
--- /dev/null
+++ b/kernel/pgo/Kconfig
@@ -0,0 +1,35 @@
+# SPDX-License-Identifier: GPL-2.0-only
+menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
+
+config ARCH_SUPPORTS_PGO_CLANG
+ bool
+
+config PGO_CLANG
+ bool "Enable clang's PGO-based kernel profiling"
+ depends on DEBUG_FS
+ depends on ARCH_SUPPORTS_PGO_CLANG
+ depends on CC_IS_CLANG && CLANG_VERSION >= 120000
+ help
+ This option enables clang's PGO (Profile Guided Optimization) based
+ code profiling to better optimize the kernel.
+
+ If unsure, say N.
+
+ Run a representative workload for your application on a kernel
+ compiled with this option and download the raw profile file from
+ /sys/kernel/debug/pgo/profraw. This file needs to be processed with
+ llvm-profdata. It may be merged with other collected raw profiles.
+
+ Copy the resulting profile file into vmlinux.profdata, and build with
+ KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
+ kernel.
+
+ Note that a kernel compiled with profiling flags will be
+ significantly larger and run slower. Also be sure to exclude files
+ from profiling which are not linked to the kernel image to prevent
+ linker errors.
+
+ Note that the debugfs filesystem has to be mounted to access
+ profiling data.
+
+endmenu
diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
new file mode 100644
index 0000000000000..41e27cefd9a47
--- /dev/null
+++ b/kernel/pgo/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+GCOV_PROFILE := n
+PGO_PROFILE := n
+
+obj-y += fs.o instrument.o
diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
new file mode 100644
index 0000000000000..68b24672be10a
--- /dev/null
+++ b/kernel/pgo/fs.c
@@ -0,0 +1,382 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ * Sami Tolvanen <[email protected]>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt) "pgo: " fmt
+
+#include <linux/kernel.h>
+#include <linux/debugfs.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include "pgo.h"
+
+static struct dentry *directory;
+
+struct prf_private_data {
+ void *buffer;
+ unsigned long size;
+};
+
+/*
+ * Raw profile data format:
+ *
+ * - llvm_prf_header
+ * - __llvm_prf_data
+ * - __llvm_prf_cnts
+ * - __llvm_prf_names
+ * - zero padding to 8 bytes
+ * - for each llvm_prf_data in __llvm_prf_data:
+ * - llvm_prf_value_data
+ * - llvm_prf_value_record + site count array
+ * - llvm_prf_value_node_data
+ * ...
+ * ...
+ * ...
+ */
+
+static void prf_fill_header(void **buffer)
+{
+ struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
+
+ header->magic = LLVM_PRF_MAGIC;
+ header->version = LLVM_PRF_VARIANT_MASK_IR | LLVM_PRF_VERSION;
+ header->data_size = prf_data_count();
+ header->padding_bytes_before_counters = 0;
+ header->counters_size = prf_cnts_count();
+ header->padding_bytes_after_counters = 0;
+ header->names_size = prf_names_count();
+ header->counters_delta = (u64)__llvm_prf_cnts_start;
+ header->names_delta = (u64)__llvm_prf_names_start;
+ header->value_kind_last = LLVM_PRF_IPVK_LAST;
+
+ *buffer += sizeof(*header);
+}
+
+/*
+ * Copy the source into the buffer, incrementing the pointer into buffer in the
+ * process.
+ */
+static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
+{
+ memcpy(*buffer, src, size);
+ *buffer += size;
+}
+
+static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
+{
+ struct llvm_prf_value_node **nodes =
+ (struct llvm_prf_value_node **)p->values;
+ u32 kinds = 0;
+ u32 size = 0;
+ unsigned int kind;
+ unsigned int n;
+ unsigned int s = 0;
+
+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+ unsigned int sites = p->num_value_sites[kind];
+
+ if (!sites)
+ continue;
+
+ /* Record + site count array */
+ size += prf_get_value_record_size(sites);
+ kinds++;
+
+ if (!nodes)
+ continue;
+
+ for (n = 0; n < sites; n++) {
+ u32 count = 0;
+ struct llvm_prf_value_node *site = nodes[s + n];
+
+ while (site && ++count <= U8_MAX)
+ site = site->next;
+
+ size += count *
+ sizeof(struct llvm_prf_value_node_data);
+ }
+
+ s += sites;
+ }
+
+ if (size)
+ size += sizeof(struct llvm_prf_value_data);
+
+ if (value_kinds)
+ *value_kinds = kinds;
+
+ return size;
+}
+
+static u32 prf_get_value_size(void)
+{
+ u32 size = 0;
+ struct llvm_prf_data *p;
+
+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+ size += __prf_get_value_size(p, NULL);
+
+ return size;
+}
+
+/* Serialize the value profiling data of one function. */
+static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
+{
+ struct llvm_prf_value_data header;
+ struct llvm_prf_value_node **nodes =
+ (struct llvm_prf_value_node **)p->values;
+ unsigned int kind;
+ unsigned int n;
+ unsigned int s = 0;
+
+ header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
+
+ if (!header.num_value_kinds)
+ /* Nothing to write. */
+ return;
+
+ prf_copy_to_buffer(buffer, &header, sizeof(header));
+
+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+ struct llvm_prf_value_record *record;
+ u8 *counts;
+ unsigned int sites = p->num_value_sites[kind];
+
+ if (!sites)
+ continue;
+
+ /* Profiling value record. */
+ record = *(struct llvm_prf_value_record **)buffer;
+ *buffer += prf_get_value_record_header_size();
+
+ record->kind = kind;
+ record->num_value_sites = sites;
+
+ /* Site count array. */
+ counts = *(u8 **)buffer;
+ *buffer += prf_get_value_record_site_count_size(sites);
+
+ /*
+ * If we don't have nodes, we can skip updating the site count
+ * array, because the buffer is zero filled.
+ */
+ if (!nodes)
+ continue;
+
+ for (n = 0; n < sites; n++) {
+ u32 count = 0;
+ struct llvm_prf_value_node *site = nodes[s + n];
+
+ while (site && ++count <= U8_MAX) {
+ prf_copy_to_buffer(buffer, site,
+ sizeof(struct llvm_prf_value_node_data));
+ site = site->next;
+ }
+
+ counts[n] = (u8)count;
+ }
+
+ s += sites;
+ }
+}
+
+static void prf_serialize_values(void **buffer)
+{
+ struct llvm_prf_data *p;
+
+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+ prf_serialize_value(p, buffer);
+}
+
+static inline unsigned long prf_get_padding(unsigned long size)
+{
+ return 7 & (8 - size % 8);
+}
+
+static unsigned long prf_buffer_size(void)
+{
+ return sizeof(struct llvm_prf_header) +
+ prf_data_size() +
+ prf_cnts_size() +
+ prf_names_size() +
+ prf_get_padding(prf_names_size()) +
+ prf_get_value_size();
+}
+
+/* Serialize the profiling data into a format LLVM's tools can understand. */
+static int prf_serialize(struct prf_private_data *p)
+{
+ int err = 0;
+ void *buffer;
+
+ p->size = prf_buffer_size();
+ p->buffer = vzalloc(p->size);
+
+ if (!p->buffer) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ buffer = p->buffer;
+
+ prf_fill_header(&buffer);
+ prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
+ prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
+ prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
+ buffer += prf_get_padding(prf_names_size());
+
+ prf_serialize_values(&buffer);
+
+out:
+ return err;
+}
+
+/* open() implementation for PGO. Creates a copy of the profiling data set. */
+static int prf_open(struct inode *inode, struct file *file)
+{
+ struct prf_private_data *data;
+ unsigned long flags;
+ int err;
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL);
+ if (!data) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ flags = prf_lock();
+
+ err = prf_serialize(data);
+ if (err) {
+ kfree(data);
+ goto out_unlock;
+ }
+
+ file->private_data = data;
+
+out_unlock:
+ prf_unlock(flags);
+out:
+ return err;
+}
+
+/* read() implementation for PGO. */
+static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct prf_private_data *data = file->private_data;
+
+ BUG_ON(!data);
+
+ return simple_read_from_buffer(buf, count, ppos, data->buffer,
+ data->size);
+}
+
+/* release() implementation for PGO. Release resources allocated by open(). */
+static int prf_release(struct inode *inode, struct file *file)
+{
+ struct prf_private_data *data = file->private_data;
+
+ if (data) {
+ vfree(data->buffer);
+ kfree(data);
+ }
+
+ return 0;
+}
+
+static const struct file_operations prf_fops = {
+ .owner = THIS_MODULE,
+ .open = prf_open,
+ .read = prf_read,
+ .llseek = default_llseek,
+ .release = prf_release
+};
+
+/* write() implementation for resetting PGO's profile data. */
+static ssize_t reset_write(struct file *file, const char __user *addr,
+ size_t len, loff_t *pos)
+{
+ struct llvm_prf_data *data;
+
+ memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
+
+ for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; ++data) {
+ struct llvm_prf_value_node **vnodes;
+ u64 current_vsite_count;
+ u32 i;
+
+ if (!data->values)
+ continue;
+
+ current_vsite_count = 0;
+ vnodes = (struct llvm_prf_value_node **)data->values;
+
+ for (i = LLVM_PRF_IPVK_FIRST; i <= LLVM_PRF_IPVK_LAST; ++i)
+ current_vsite_count += data->num_value_sites[i];
+
+ for (i = 0; i < current_vsite_count; ++i) {
+ struct llvm_prf_value_node *current_vnode = vnodes[i];
+
+ while (current_vnode) {
+ current_vnode->count = 0;
+ current_vnode = current_vnode->next;
+ }
+ }
+ }
+
+ return len;
+}
+
+static const struct file_operations prf_reset_fops = {
+ .owner = THIS_MODULE,
+ .write = reset_write,
+ .llseek = noop_llseek,
+};
+
+/* Create debugfs entries. */
+static int __init pgo_init(void)
+{
+ directory = debugfs_create_dir("pgo", NULL);
+ if (!directory)
+ goto err_remove;
+
+ if (!debugfs_create_file("profraw", 0600, directory, NULL,
+ &prf_fops))
+ goto err_remove;
+
+ if (!debugfs_create_file("reset", 0200, directory, NULL,
+ &prf_reset_fops))
+ goto err_remove;
+
+ return 0;
+
+err_remove:
+ pr_err("initialization failed\n");
+ return -EIO;
+}
+
+/* Remove debugfs entries. */
+static void __exit pgo_exit(void)
+{
+ debugfs_remove_recursive(directory);
+}
+
+module_init(pgo_init);
+module_exit(pgo_exit);
diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
new file mode 100644
index 0000000000000..6084ff0652e85
--- /dev/null
+++ b/kernel/pgo/instrument.c
@@ -0,0 +1,185 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ * Sami Tolvanen <[email protected]>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt) "pgo: " fmt
+
+#include <linux/bitops.h>
+#include <linux/kernel.h>
+#include <linux/export.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+#include "pgo.h"
+
+/* Lock guarding value node access and serialization. */
+static DEFINE_SPINLOCK(pgo_lock);
+static int current_node;
+
+unsigned long prf_lock(void)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&pgo_lock, flags);
+
+ return flags;
+}
+
+void prf_unlock(unsigned long flags)
+{
+ spin_unlock_irqrestore(&pgo_lock, flags);
+}
+
+/*
+ * Return a newly allocated profiling value node, which holds a value tracked
+ * by the value profiler.
+ * Note: caller *must* hold pgo_lock.
+ */
+static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
+ u32 index, u64 value)
+{
+ if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
+ return NULL; /* Out of nodes */
+
+ current_node++;
+
+ /* Make sure the node is entirely within the section */
+ if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
+ &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
+ return NULL;
+
+ return &__llvm_prf_vnds_start[current_node];
+}
+
+/*
+ * Counts the number of times a target value is seen.
+ *
+ * Records the target value for the given index if not seen before. Otherwise,
+ * increments the counter associated with the target value.
+ */
+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
+{
+ struct llvm_prf_data *p = (struct llvm_prf_data *)data;
+ struct llvm_prf_value_node **counters;
+ struct llvm_prf_value_node *curr;
+ struct llvm_prf_value_node *min = NULL;
+ struct llvm_prf_value_node *prev = NULL;
+ u64 min_count = U64_MAX;
+ u8 values = 0;
+ unsigned long flags;
+
+ if (!p || !p->values)
+ return;
+
+ counters = (struct llvm_prf_value_node **)p->values;
+ curr = counters[index];
+
+ while (curr) {
+ if (target_value == curr->value) {
+ curr->count++;
+ return;
+ }
+
+ if (curr->count < min_count) {
+ min_count = curr->count;
+ min = curr;
+ }
+
+ prev = curr;
+ curr = curr->next;
+ values++;
+ }
+
+ if (values >= LLVM_PRF_MAX_NUM_VALS_PER_SITE) {
+ if (!min->count || !(--min->count)) {
+ curr = min;
+ curr->value = target_value;
+ curr->count++;
+ }
+ return;
+ }
+
+ /* Lock when updating the value node structure. */
+ flags = prf_lock();
+
+ curr = allocate_node(p, index, target_value);
+ if (!curr)
+ goto out;
+
+ curr->value = target_value;
+ curr->count++;
+
+ if (!counters[index])
+ counters[index] = curr;
+ else if (prev && !prev->next)
+ prev->next = curr;
+
+out:
+ prf_unlock(flags);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_target);
+
+/* Counts the number of times a range of target values is seen. */
+void __llvm_profile_instrument_range(u64 target_value, void *data,
+ u32 index, s64 precise_start,
+ s64 precise_last, s64 large_value);
+void __llvm_profile_instrument_range(u64 target_value, void *data,
+ u32 index, s64 precise_start,
+ s64 precise_last, s64 large_value)
+{
+ if (large_value != S64_MIN && (s64)target_value >= large_value)
+ target_value = large_value;
+ else if ((s64)target_value < precise_start ||
+ (s64)target_value > precise_last)
+ target_value = precise_last + 1;
+
+ __llvm_profile_instrument_target(target_value, data, index);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_range);
+
+static u64 inst_prof_get_range_rep_value(u64 value)
+{
+ if (value <= 8)
+ /* The first ranges are individually tracked, use it as is. */
+ return value;
+ else if (value >= 513)
+ /* The last range is mapped to its lowest value. */
+ return 513;
+ else if (hweight64(value) == 1)
+ /* If it's a power of two, use it as is. */
+ return value;
+
+ /* Otherwise, map to the previous power of two + 1. */
+ return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
+}
+
+/*
+ * The target values are partitioned into multiple ranges. The range spec is
+ * defined in compiler-rt/include/profile/InstrProfData.inc.
+ */
+void __llvm_profile_instrument_memop(u64 target_value, void *data,
+ u32 counter_index);
+void __llvm_profile_instrument_memop(u64 target_value, void *data,
+ u32 counter_index)
+{
+ u64 rep_value;
+
+ /* Map the target value to the representative value of its range. */
+ rep_value = inst_prof_get_range_rep_value(target_value);
+ __llvm_profile_instrument_target(rep_value, data, counter_index);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_memop);
diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
new file mode 100644
index 0000000000000..df0aa278f28bd
--- /dev/null
+++ b/kernel/pgo/pgo.h
@@ -0,0 +1,206 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ * Sami Tolvanen <[email protected]>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _PGO_H
+#define _PGO_H
+
+/*
+ * Note: These internal LLVM definitions must match the compiler version.
+ * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
+ */
+
+#ifdef CONFIG_64BIT
+ #define LLVM_PRF_MAGIC \
+ ((u64)255 << 56 | \
+ (u64)'l' << 48 | \
+ (u64)'p' << 40 | \
+ (u64)'r' << 32 | \
+ (u64)'o' << 24 | \
+ (u64)'f' << 16 | \
+ (u64)'r' << 8 | \
+ (u64)129)
+#else
+ #define LLVM_PRF_MAGIC \
+ ((u64)255 << 56 | \
+ (u64)'l' << 48 | \
+ (u64)'p' << 40 | \
+ (u64)'r' << 32 | \
+ (u64)'o' << 24 | \
+ (u64)'f' << 16 | \
+ (u64)'R' << 8 | \
+ (u64)129)
+#endif
+
+#define LLVM_PRF_VERSION 5
+#define LLVM_PRF_DATA_ALIGN 8
+#define LLVM_PRF_IPVK_FIRST 0
+#define LLVM_PRF_IPVK_LAST 1
+#define LLVM_PRF_MAX_NUM_VALS_PER_SITE 16
+
+#define LLVM_PRF_VARIANT_MASK_IR (0x1ull << 56)
+#define LLVM_PRF_VARIANT_MASK_CSIR (0x1ull << 57)
+
+/**
+ * struct llvm_prf_header - represents the raw profile header data structure.
+ * @magic: the magic token for the file format.
+ * @version: the version of the file format.
+ * @data_size: the number of entries in the profile data section.
+ * @padding_bytes_before_counters: the number of padding bytes before the
+ * counters.
+ * @counters_size: the size in bytes of the LLVM profile section containing the
+ * counters.
+ * @padding_bytes_after_counters: the number of padding bytes after the
+ * counters.
+ * @names_size: the size in bytes of the LLVM profile section containing the
+ * counters' names.
+ * @counters_delta: the beginning of the LLVM profile counters section.
+ * @names_delta: the beginning of the LLVM profile names section.
+ * @value_kind_last: the last profile value kind.
+ */
+struct llvm_prf_header {
+ u64 magic;
+ u64 version;
+ u64 data_size;
+ u64 padding_bytes_before_counters;
+ u64 counters_size;
+ u64 padding_bytes_after_counters;
+ u64 names_size;
+ u64 counters_delta;
+ u64 names_delta;
+ u64 value_kind_last;
+};
+
+/**
+ * struct llvm_prf_data - represents the per-function control structure.
+ * @name_ref: the reference to the function's name.
+ * @func_hash: the hash value of the function.
+ * @counter_ptr: a pointer to the profile counter.
+ * @function_ptr: a pointer to the function.
+ * @values: the profiling values associated with this function.
+ * @num_counters: the number of counters in the function.
+ * @num_value_sites: the number of value profile sites.
+ */
+struct llvm_prf_data {
+ const u64 name_ref;
+ const u64 func_hash;
+ const void *counter_ptr;
+ const void *function_ptr;
+ void *values;
+ const u32 num_counters;
+ const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
+} __aligned(LLVM_PRF_DATA_ALIGN);
+
+/**
+ * struct llvm_prf_value_node_data - represents the data part of the struct
+ * llvm_prf_value_node data structure.
+ * @value: the value counters.
+ * @count: the counters' count.
+ */
+struct llvm_prf_value_node_data {
+ u64 value;
+ u64 count;
+};
+
+/**
+ * struct llvm_prf_value_node - represents an internal data structure used by
+ * the value profiler.
+ * @value: the value counters.
+ * @count: the counters' count.
+ * @next: the next value node.
+ */
+struct llvm_prf_value_node {
+ u64 value;
+ u64 count;
+ struct llvm_prf_value_node *next;
+};
+
+/**
+ * struct llvm_prf_value_data - represents the value profiling data in indexed
+ * format.
+ * @total_size: the total size in bytes including this field.
+ * @num_value_kinds: the number of value profile kinds that have value profile
+ * data.
+ */
+struct llvm_prf_value_data {
+ u32 total_size;
+ u32 num_value_kinds;
+};
+
+/**
+ * struct llvm_prf_value_record - represents the on-disk layout of the value
+ * profile data of a particular kind for one function.
+ * @kind: the kind of the value profile record.
+ * @num_value_sites: the number of value profile sites.
+ * @site_count_array: the first element of the array that stores the number
+ * of profiled values for each value site.
+ */
+struct llvm_prf_value_record {
+ u32 kind;
+ u32 num_value_sites;
+ u8 site_count_array[];
+};
+
+#define prf_get_value_record_header_size() \
+ offsetof(struct llvm_prf_value_record, site_count_array)
+#define prf_get_value_record_site_count_size(sites) \
+ roundup((sites), 8)
+#define prf_get_value_record_size(sites) \
+ (prf_get_value_record_header_size() + \
+ prf_get_value_record_site_count_size((sites)))
+
+/* Data sections */
+extern struct llvm_prf_data __llvm_prf_data_start[];
+extern struct llvm_prf_data __llvm_prf_data_end[];
+
+extern u64 __llvm_prf_cnts_start[];
+extern u64 __llvm_prf_cnts_end[];
+
+extern char __llvm_prf_names_start[];
+extern char __llvm_prf_names_end[];
+
+extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
+extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
+
+/* Locking for vnodes */
+extern unsigned long prf_lock(void);
+extern void prf_unlock(unsigned long flags);
+
+#define __DEFINE_PRF_SIZE(s) \
+ static inline unsigned long prf_ ## s ## _size(void) \
+ { \
+ unsigned long start = \
+ (unsigned long)__llvm_prf_ ## s ## _start; \
+ unsigned long end = \
+ (unsigned long)__llvm_prf_ ## s ## _end; \
+ return roundup(end - start, \
+ sizeof(__llvm_prf_ ## s ## _start[0])); \
+ } \
+ static inline unsigned long prf_ ## s ## _count(void) \
+ { \
+ return prf_ ## s ## _size() / \
+ sizeof(__llvm_prf_ ## s ## _start[0]); \
+ }
+
+__DEFINE_PRF_SIZE(data);
+__DEFINE_PRF_SIZE(cnts);
+__DEFINE_PRF_SIZE(names);
+__DEFINE_PRF_SIZE(vnds);
+
+#undef __DEFINE_PRF_SIZE
+
+#endif /* _PGO_H */
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 213677a5ed33e..9b218afb5cb87 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
$(CFLAGS_GCOV))
endif

+#
+# Enable clang's PGO profiling flags for a file or directory depending on
+# variables PGO_PROFILE_obj.o and PGO_PROFILE.
+#
+ifeq ($(CONFIG_PGO_CLANG),y)
+_c_flags += $(if $(patsubst n%,, \
+ $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
+ $(CFLAGS_PGO_CLANG))
+endif
+
#
# Enable address sanitizer flags for kernel except some files or directories
# we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
--
2.30.0.284.gd98b1dd5eaa7-goog
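
For reference, the value-record sizing macros in pgo.h above appear to boil down to an 8-byte header plus one byte per value site, rounded up to a multiple of 8. Below is a minimal user-space sketch, assuming the two 32-bit fields are laid out without padding. The names model_value_record, model_round_up_8 and model_value_record_size are illustrative only and do not appear in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of struct llvm_prf_value_record (illustrative name). */
struct model_value_record {
	uint32_t kind;
	uint32_t num_value_sites;
	uint8_t site_count_array[];	/* one u8 counter per value site */
};

/* Round n up to the next multiple of 8, like roundup((sites), 8). */
static size_t model_round_up_8(size_t n)
{
	return (n + 7) / 8 * 8;
}

/* Header size plus the padded site count array. */
static size_t model_value_record_size(size_t sites)
{
	return offsetof(struct model_value_record, site_count_array) +
	       model_round_up_8(sites);
}
```

With this layout, a record with 1 to 8 sites takes 16 bytes and a record with 9 sites takes 24 bytes.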

2021-01-16 17:42:44

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

On Sat, Jan 16, 2021 at 10:44 AM 'Bill Wendling' via Clang Built Linux
<[email protected]> wrote:
>
> From: Sami Tolvanen <[email protected]>
>
> Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> profile, the kernel is instrumented with PGO counters, a representative
> workload is run, and the raw profile data is collected from
> /sys/kernel/debug/pgo/profraw.
>
> The raw profile data must be processed by clang's "llvm-profdata" tool
> before it can be used during recompilation:
>
> $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
> Multiple raw profiles may be merged during this step.
>
> The data can now be used by the compiler:
>
> $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>
> This initial submission is restricted to x86, as that's the platform we
> know works. This restriction can be lifted once other platforms have
> been verified to work with PGO.
>
> Note that this method of profiling the kernel is clang-native, unlike
> the clang support in kernel/gcov.
>
> [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>
> Signed-off-by: Sami Tolvanen <[email protected]>
> Co-developed-by: Bill Wendling <[email protected]>
> Signed-off-by: Bill Wendling <[email protected]>
> ---
> v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> testing.
> - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> Song's comments.
> v3: - Added change log section based on Sedat Dilek's comments.
> v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
> own popcount implementation, based on Nick Desaulniers's comment.
> v5: - Correct padding calculation, discovered by Nathan Chancellor.
> ---
> Documentation/dev-tools/index.rst | 1 +
> Documentation/dev-tools/pgo.rst | 127 +++++++++
> MAINTAINERS | 9 +
> Makefile | 3 +
> arch/Kconfig | 1 +
> arch/x86/Kconfig | 1 +
> arch/x86/boot/Makefile | 1 +
> arch/x86/boot/compressed/Makefile | 1 +
> arch/x86/crypto/Makefile | 2 +
> arch/x86/entry/vdso/Makefile | 1 +
> arch/x86/kernel/vmlinux.lds.S | 2 +
> arch/x86/platform/efi/Makefile | 1 +
> arch/x86/purgatory/Makefile | 1 +
> arch/x86/realmode/rm/Makefile | 1 +
> arch/x86/um/vdso/Makefile | 1 +
> drivers/firmware/efi/libstub/Makefile | 1 +
> include/asm-generic/vmlinux.lds.h | 44 +++
> kernel/Makefile | 1 +
> kernel/pgo/Kconfig | 35 +++
> kernel/pgo/Makefile | 5 +
> kernel/pgo/fs.c | 382 ++++++++++++++++++++++++++
> kernel/pgo/instrument.c | 185 +++++++++++++
> kernel/pgo/pgo.h | 206 ++++++++++++++
> scripts/Makefile.lib | 10 +
> 24 files changed, 1022 insertions(+)
> create mode 100644 Documentation/dev-tools/pgo.rst
> create mode 100644 kernel/pgo/Kconfig
> create mode 100644 kernel/pgo/Makefile
> create mode 100644 kernel/pgo/fs.c
> create mode 100644 kernel/pgo/instrument.c
> create mode 100644 kernel/pgo/pgo.h
>
> diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> index f7809c7b1ba9e..8d6418e858062 100644
> --- a/Documentation/dev-tools/index.rst
> +++ b/Documentation/dev-tools/index.rst
> @@ -26,6 +26,7 @@ whole; patches welcome!
> kgdb
> kselftest
> kunit/index
> + pgo
>
>
> .. only:: subproject and html
> diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> new file mode 100644
> index 0000000000000..b7f11d8405b73
> --- /dev/null
> +++ b/Documentation/dev-tools/pgo.rst
> @@ -0,0 +1,127 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===============================
> +Using PGO with the Linux kernel
> +===============================
> +
> +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> +when building with Clang. The profiling data is exported via the ``pgo``
> +debugfs directory.
> +
> +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> +
> +
> +Preparation
> +===========
> +
> +Configure the kernel with:
> +
> +.. code-block:: make
> +
> + CONFIG_DEBUG_FS=y
> + CONFIG_PGO_CLANG=y
> +
> +Note that kernels compiled with profiling flags will be significantly larger
> +and run slower.
> +
> +Profiling data will only become accessible once debugfs has been mounted:
> +
> +.. code-block:: sh
> +
> + mount -t debugfs none /sys/kernel/debug
> +
> +
> +Customization
> +=============
> +
> +You can enable or disable profiling for individual files and directories by
> +adding a line similar to the following to the respective kernel Makefile:
> +
> +- For a single file (e.g. main.o)
> +
> + .. code-block:: make
> +
> + PGO_PROFILE_main.o := y
> +
> +- For all files in one directory
> +
> + .. code-block:: make
> +
> + PGO_PROFILE := y
> +
> +To exclude files from being profiled use
> +
> + .. code-block:: make
> +
> + PGO_PROFILE_main.o := n
> +
> +and
> +
> + .. code-block:: make
> +
> + PGO_PROFILE := n
> +
> +Only files which are linked to the main kernel image or are compiled as kernel
> +modules are supported by this mechanism.
> +
> +
> +Files
> +=====
> +
> +The PGO kernel support creates the following files in debugfs:
> +
> +``/sys/kernel/debug/pgo``
> + Parent directory for all PGO-related files.
> +
> +``/sys/kernel/debug/pgo/reset``
> + Global reset file: resets all profile data to zero when written to.
> +
> +``/sys/kernel/debug/pgo/profraw``
> + The raw PGO data that must be processed with ``llvm-profdata``.
> +
> +
> +Workflow
> +========
> +
> +The PGO kernel can be run on the host or test machines. The data, however,
> +should be analyzed with tools from the same Clang version that was used to
> +compile the kernel. Clang is tolerant of version skew, but it is easier to use
> +the same Clang version.
> +
> +The profiling data is useful for optimizing the kernel, analyzing coverage,
> +etc. Clang offers tools to perform these tasks.
> +
> +Here is an example workflow for profiling an instrumented kernel with PGO and
> +using the result to optimize the kernel:
> +
> +1) Install the kernel on the TEST machine.
> +
> +2) Reset the data counters right before running the load tests
> +
> + .. code-block:: sh
> +
> + $ echo 1 > /sys/kernel/debug/pgo/reset
> +

I do not get this...

# mount | grep debugfs
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)

After the load-test...?

echo 0 > /sys/kernel/debug/pgo/reset
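
On the reset question: judging from reset_write() in kernel/pgo/fs.c further down, the handler never looks at the bytes written, so "echo 0" and "echo 1" should behave the same. That puts the write before the load test rather than after it. Below is a minimal user-space sketch of that behaviour. model_counters and model_reset_write are made-up names for illustration, not kernel symbols, and the value-node reset is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the __llvm_prf_cnts section (hypothetical values). */
static uint64_t model_counters[4] = { 3, 1, 4, 1 };

/*
 * Model of reset_write(): the buffer contents are ignored, every
 * counter is zeroed, and the full length is reported as consumed.
 */
static size_t model_reset_write(const char *buf, size_t len)
{
	(void)buf;
	memset(model_counters, 0, sizeof(model_counters));
	return len;
}
```

Calling model_reset_write("0\n", 2) and model_reset_write("1\n", 2) both leave every counter at zero.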

> +3) Run the load tests.
> +
> +4) Collect the raw profile data
> +
> + .. code-block:: sh
> +
> + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> +

This is only 4.9M in size and, judging by its timestamp, it dates from 5mins
before I ran the echo-1 line.

# ll /sys/kernel/debug/pgo
total 0
drwxr-xr-x 2 root root 0 16. Jan 17:29 .
drwx------ 41 root root 0 16. Jan 17:29 ..
-rw------- 1 root root 0 16. Jan 17:29 profraw
--w------- 1 root root 0 16. Jan 18:19 reset

# cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw

# ll /tmp/vmlinux.profraw
-rw------- 1 root root 4,9M 16. Jan 17:29 /tmp/vmlinux.profraw

For me there was no prof-data collected from my defconfig kernel-build.
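
A note on the numbers above, as far as the patch shows: prf_open() builds the raw profile when the file is opened, and the listing reports size 0 for the debugfs file itself. The size seen after copying is therefore whatever prf_buffer_size() adds up: header, data, counters, names, padding of the names to 8 bytes, then the value records. The timestamp on the copy comes from "cp -a" preserving the debugfs inode time, so on its own it probably says little about when the counters were last updated. Below is a user-space sketch of the size calculation. The function names and the byte counts in the usage note are hypothetical, not taken from a real kernel.

```c
#include <stddef.h>

/* Bytes needed to pad size up to the next multiple of 8 (0..7). */
static size_t model_padding(size_t size)
{
	return 7 & (8 - size % 8);
}

/* Sum of the pieces, in the order prf_buffer_size() adds them. */
static size_t model_profraw_size(size_t header, size_t data, size_t cnts,
				 size_t names, size_t values)
{
	return header + data + cnts + names + model_padding(names) + values;
}
```

For example, with made-up sizes of 80 (header), 48 (data), 16 (counters), 13 (names) and 0 (values), the names get 3 bytes of padding and the total is 160 bytes.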

> +5) (Optional) Download the raw profile data to the HOST machine.
> +
> +6) Process the raw profile data
> +
> + .. code-block:: sh
> +
> + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> +

Is that executed in /path/to/linux/git?

> + Note that multiple raw profile data files can be merged during this step.
> +
> +7) Rebuild the kernel using the profile data (PGO disabled)
> +
> + .. code-block:: sh
> +
> + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...

How big is vmlinux.profdata (make defconfig)?

Do I need to do a full defconfig build, or can I stop the build after,
let's say, 10mins?

- Sedat -

> diff --git a/MAINTAINERS b/MAINTAINERS
> index 79b400c97059f..cb1f1f2b2baf4 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -13948,6 +13948,15 @@ S: Maintained
> F: include/linux/personality.h
> F: include/uapi/linux/personality.h
>
> +PGO BASED KERNEL PROFILING
> +M: Sami Tolvanen <[email protected]>
> +M: Bill Wendling <[email protected]>
> +R: Nathan Chancellor <[email protected]>
> +R: Nick Desaulniers <[email protected]>
> +S: Supported
> +F: Documentation/dev-tools/pgo.rst
> +F: kernel/pgo
> +
> PHOENIX RC FLIGHT CONTROLLER ADAPTER
> M: Marcus Folkesson <[email protected]>
> L: [email protected]
> diff --git a/Makefile b/Makefile
> index 9e73f82e0d863..9128bfe1ccc97 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
> # Defaults to vmlinux, but the arch makefile usually adds further targets
> all: vmlinux
>
> +CFLAGS_PGO_CLANG := -fprofile-generate
> +export CFLAGS_PGO_CLANG
> +
> CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
> $(call cc-option,-fno-tree-loop-im) \
> $(call cc-disable-warning,maybe-uninitialized,)
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 24862d15f3a36..f39d3991f6bfe 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
> pairs of 32-bit arguments, select this option.
>
> source "kernel/gcov/Kconfig"
> +source "kernel/pgo/Kconfig"
>
> source "scripts/gcc-plugins/Kconfig"
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 21f851179ff08..36305ea61dc09 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -96,6 +96,7 @@ config X86
> select ARCH_SUPPORTS_DEBUG_PAGEALLOC
> select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
> select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
> + select ARCH_SUPPORTS_PGO_CLANG if X86_64
> select ARCH_USE_BUILTIN_BSWAP
> select ARCH_USE_QUEUED_RWLOCKS
> select ARCH_USE_QUEUED_SPINLOCKS
> diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> index fe605205b4ce2..383853e32f673 100644
> --- a/arch/x86/boot/Makefile
> +++ b/arch/x86/boot/Makefile
> @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
>
> $(obj)/bzImage: asflags-y := $(SVGA_MODE)
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index e0bc3988c3faa..ed12ab65f6065 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
>
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE :=n
>
> KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
> diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> index a31de0c6ccde2..775fa0b368e98 100644
> --- a/arch/x86/crypto/Makefile
> +++ b/arch/x86/crypto/Makefile
> @@ -4,6 +4,8 @@
>
> OBJECT_FILES_NON_STANDARD := y
>
> +PGO_PROFILE_curve25519-x86_64.o := n
> +
> obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o
>
> obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index 02e3e42f380bd..26e2b3af0145c 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO $@
> VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
> $(call ld-option, --eh-frame-hdr) -Bsymbolic
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> quiet_cmd_vdso_and_check = VDSO $@
> cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index efd9e9ea17f25..f6cab2316c46a 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -184,6 +184,8 @@ SECTIONS
>
> BUG_TABLE
>
> + PGO_CLANG_DATA
> +
> ORC_UNWIND_TABLE
>
> . = ALIGN(PAGE_SIZE);
> diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
> index 84b09c230cbd5..5f22b31446ad4 100644
> --- a/arch/x86/platform/efi/Makefile
> +++ b/arch/x86/platform/efi/Makefile
> @@ -2,6 +2,7 @@
> OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
> KASAN_SANITIZE := n
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
> obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
> diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> index 95ea17a9d20cb..36f20e99da0bc 100644
> --- a/arch/x86/purgatory/Makefile
> +++ b/arch/x86/purgatory/Makefile
> @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
>
> # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> KASAN_SANITIZE := n
> UBSAN_SANITIZE := n
> KCSAN_SANITIZE := n
> diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> index 83f1b6a56449f..21797192f958f 100644
> --- a/arch/x86/realmode/rm/Makefile
> +++ b/arch/x86/realmode/rm/Makefile
> @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
> diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
> index 5943387e3f357..54f5768f58530 100644
> --- a/arch/x86/um/vdso/Makefile
> +++ b/arch/x86/um/vdso/Makefile
> @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@
>
> VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> #
> # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index 8a94388e38b33..2d81623b33f29 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -40,6 +40,7 @@ KBUILD_CFLAGS := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
> KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
>
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> # Sanitizer runtimes are unavailable and cannot be linked here.
> KASAN_SANITIZE := n
> KCSAN_SANITIZE := n
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index b2b3d81b1535a..3a591bb18c5fb 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -316,6 +316,49 @@
> #define THERMAL_TABLE(name)
> #endif
>
> +#ifdef CONFIG_PGO_CLANG
> +#define PGO_CLANG_DATA \
> + __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_start = .; \
> + __llvm_prf_data_start = .; \
> + KEEP(*(__llvm_prf_data)) \
> + . = ALIGN(8); \
> + __llvm_prf_data_end = .; \
> + } \
> + __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_cnts_start = .; \
> + KEEP(*(__llvm_prf_cnts)) \
> + . = ALIGN(8); \
> + __llvm_prf_cnts_end = .; \
> + } \
> + __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_names_start = .; \
> + KEEP(*(__llvm_prf_names)) \
> + . = ALIGN(8); \
> + __llvm_prf_names_end = .; \
> + . = ALIGN(8); \
> + } \
> + __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
> + __llvm_prf_vals_start = .; \
> + KEEP(*(__llvm_prf_vals)) \
> + . = ALIGN(8); \
> + __llvm_prf_vals_end = .; \
> + . = ALIGN(8); \
> + } \
> + __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
> + __llvm_prf_vnds_start = .; \
> + KEEP(*(__llvm_prf_vnds)) \
> + . = ALIGN(8); \
> + __llvm_prf_vnds_end = .; \
> + __llvm_prf_end = .; \
> + }
> +#else
> +#define PGO_CLANG_DATA
> +#endif
> +
> #define KERNEL_DTB() \
> STRUCT_ALIGN(); \
> __dtb_start = .; \
> @@ -1125,6 +1168,7 @@
> CONSTRUCTORS \
> } \
> BUG_TABLE \
> + PGO_CLANG_DATA
>
> #define INIT_TEXT_SECTION(inittext_align) \
> . = ALIGN(inittext_align); \
> diff --git a/kernel/Makefile b/kernel/Makefile
> index aa7368c7eabf3..0b34ca228ba46 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
> obj-$(CONFIG_KCSAN) += kcsan/
> obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
> obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
> +obj-$(CONFIG_PGO_CLANG) += pgo/
>
> obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
> new file mode 100644
> index 0000000000000..76a640b6cf6ed
> --- /dev/null
> +++ b/kernel/pgo/Kconfig
> @@ -0,0 +1,35 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
> +
> +config ARCH_SUPPORTS_PGO_CLANG
> + bool
> +
> +config PGO_CLANG
> + bool "Enable clang's PGO-based kernel profiling"
> + depends on DEBUG_FS
> + depends on ARCH_SUPPORTS_PGO_CLANG
> + depends on CC_IS_CLANG && CLANG_VERSION >= 120000
> + help
> + This option enables clang's PGO (Profile Guided Optimization) based
> + code profiling to better optimize the kernel.
> +
> + If unsure, say N.
> +
> + Run a representative workload for your application on a kernel
> + compiled with this option and download the raw profile file from
> + /sys/kernel/debug/pgo/profraw. This file needs to be processed with
> + llvm-profdata. It may be merged with other collected raw profiles.
> +
> + Copy the resulting profile file into vmlinux.profdata, and build with
> + KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
> + kernel.
> +
> + Note that a kernel compiled with profiling flags will be
> + significantly larger and run slower. Also be sure to exclude files
> + from profiling which are not linked to the kernel image to prevent
> + linker errors.
> +
> + Note that the debugfs filesystem has to be mounted to access
> + profiling data.
> +
> +endmenu
> diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
> new file mode 100644
> index 0000000000000..41e27cefd9a47
> --- /dev/null
> +++ b/kernel/pgo/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0
> +GCOV_PROFILE := n
> +PGO_PROFILE := n
> +
> +obj-y += fs.o instrument.o
> diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
> new file mode 100644
> index 0000000000000..68b24672be10a
> --- /dev/null
> +++ b/kernel/pgo/fs.c
> @@ -0,0 +1,382 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt) "pgo: " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/debugfs.h>
> +#include <linux/fs.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/vmalloc.h>
> +#include "pgo.h"
> +
> +static struct dentry *directory;
> +
> +struct prf_private_data {
> + void *buffer;
> + unsigned long size;
> +};
> +
> +/*
> + * Raw profile data format:
> + *
> + * - llvm_prf_header
> + * - __llvm_prf_data
> + * - __llvm_prf_cnts
> + * - __llvm_prf_names
> + * - zero padding to 8 bytes
> + * - for each llvm_prf_data in __llvm_prf_data:
> + * - llvm_prf_value_data
> + * - llvm_prf_value_record + site count array
> + * - llvm_prf_value_node_data
> + * ...
> + * ...
> + * ...
> + */
> +
> +static void prf_fill_header(void **buffer)
> +{
> + struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
> +
> + header->magic = LLVM_PRF_MAGIC;
> + header->version = LLVM_PRF_VARIANT_MASK_IR | LLVM_PRF_VERSION;
> + header->data_size = prf_data_count();
> + header->padding_bytes_before_counters = 0;
> + header->counters_size = prf_cnts_count();
> + header->padding_bytes_after_counters = 0;
> + header->names_size = prf_names_count();
> + header->counters_delta = (u64)__llvm_prf_cnts_start;
> + header->names_delta = (u64)__llvm_prf_names_start;
> + header->value_kind_last = LLVM_PRF_IPVK_LAST;
> +
> + *buffer += sizeof(*header);
> +}
> +
> +/*
> + * Copy the source into the buffer, incrementing the pointer into buffer in the
> + * process.
> + */
> +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
> +{
> + memcpy(*buffer, src, size);
> + *buffer += size;
> +}
> +
> +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
> +{
> + struct llvm_prf_value_node **nodes =
> + (struct llvm_prf_value_node **)p->values;
> + u32 kinds = 0;
> + u32 size = 0;
> + unsigned int kind;
> + unsigned int n;
> + unsigned int s = 0;
> +
> + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> + unsigned int sites = p->num_value_sites[kind];
> +
> + if (!sites)
> + continue;
> +
> + /* Record + site count array */
> + size += prf_get_value_record_size(sites);
> + kinds++;
> +
> + if (!nodes)
> + continue;
> +
> + for (n = 0; n < sites; n++) {
> + u32 count = 0;
> + struct llvm_prf_value_node *site = nodes[s + n];
> +
> + while (site && ++count <= U8_MAX)
> + site = site->next;
> +
> + size += count *
> + sizeof(struct llvm_prf_value_node_data);
> + }
> +
> + s += sites;
> + }
> +
> + if (size)
> + size += sizeof(struct llvm_prf_value_data);
> +
> + if (value_kinds)
> + *value_kinds = kinds;
> +
> + return size;
> +}
> +
> +static u32 prf_get_value_size(void)
> +{
> + u32 size = 0;
> + struct llvm_prf_data *p;
> +
> + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> + size += __prf_get_value_size(p, NULL);
> +
> + return size;
> +}
> +
> +/* Serialize the value profile data of one llvm_prf_data entry. */
> +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
> +{
> + struct llvm_prf_value_data header;
> + struct llvm_prf_value_node **nodes =
> + (struct llvm_prf_value_node **)p->values;
> + unsigned int kind;
> + unsigned int n;
> + unsigned int s = 0;
> +
> + header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
> +
> + if (!header.num_value_kinds)
> + /* Nothing to write. */
> + return;
> +
> + prf_copy_to_buffer(buffer, &header, sizeof(header));
> +
> + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> + struct llvm_prf_value_record *record;
> + u8 *counts;
> + unsigned int sites = p->num_value_sites[kind];
> +
> + if (!sites)
> + continue;
> +
> + /* Profiling value record. */
> + record = *(struct llvm_prf_value_record **)buffer;
> + *buffer += prf_get_value_record_header_size();
> +
> + record->kind = kind;
> + record->num_value_sites = sites;
> +
> + /* Site count array. */
> + counts = *(u8 **)buffer;
> + *buffer += prf_get_value_record_site_count_size(sites);
> +
> + /*
> + * If we don't have nodes, we can skip updating the site count
> + * array, because the buffer is zero filled.
> + */
> + if (!nodes)
> + continue;
> +
> + for (n = 0; n < sites; n++) {
> + u32 count = 0;
> + struct llvm_prf_value_node *site = nodes[s + n];
> +
> + while (site && ++count <= U8_MAX) {
> + prf_copy_to_buffer(buffer, site,
> + sizeof(struct llvm_prf_value_node_data));
> + site = site->next;
> + }
> +
> + counts[n] = (u8)count;
> + }
> +
> + s += sites;
> + }
> +}
> +
> +static void prf_serialize_values(void **buffer)
> +{
> + struct llvm_prf_data *p;
> +
> + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> + prf_serialize_value(p, buffer);
> +}
> +
> +static inline unsigned long prf_get_padding(unsigned long size)
> +{
> + return 7 & (8 - size % 8);
> +}
> +
> +static unsigned long prf_buffer_size(void)
> +{
> + return sizeof(struct llvm_prf_header) +
> + prf_data_size() +
> + prf_cnts_size() +
> + prf_names_size() +
> + prf_get_padding(prf_names_size()) +
> + prf_get_value_size();
> +}
> +
> +/* Serialize the profiling data into a format LLVM's tools can understand. */
> +static int prf_serialize(struct prf_private_data *p)
> +{
> + int err = 0;
> + void *buffer;
> +
> + p->size = prf_buffer_size();
> + p->buffer = vzalloc(p->size);
> +
> + if (!p->buffer) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + buffer = p->buffer;
> +
> + prf_fill_header(&buffer);
> + prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
> + prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
> + prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
> + buffer += prf_get_padding(prf_names_size());
> +
> + prf_serialize_values(&buffer);
> +
> +out:
> + return err;
> +}
> +
> +/* open() implementation for PGO. Creates a copy of the profiling data set. */
> +static int prf_open(struct inode *inode, struct file *file)
> +{
> + struct prf_private_data *data;
> + unsigned long flags;
> + int err;
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL);
> + if (!data) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + flags = prf_lock();
> +
> + err = prf_serialize(data);
> + if (err) {
> + kfree(data);
> + goto out_unlock;
> + }
> +
> + file->private_data = data;
> +
> +out_unlock:
> + prf_unlock(flags);
> +out:
> + return err;
> +}
> +
> +/* read() implementation for PGO. */
> +static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
> + loff_t *ppos)
> +{
> + struct prf_private_data *data = file->private_data;
> +
> + BUG_ON(!data);
> +
> + return simple_read_from_buffer(buf, count, ppos, data->buffer,
> + data->size);
> +}
> +
> +/* release() implementation for PGO. Release resources allocated by open(). */
> +static int prf_release(struct inode *inode, struct file *file)
> +{
> + struct prf_private_data *data = file->private_data;
> +
> + if (data) {
> + vfree(data->buffer);
> + kfree(data);
> + }
> +
> + return 0;
> +}
> +
> +static const struct file_operations prf_fops = {
> + .owner = THIS_MODULE,
> + .open = prf_open,
> + .read = prf_read,
> + .llseek = default_llseek,
> + .release = prf_release
> +};
> +
> +/* write() implementation for resetting PGO's profile data. */
> +static ssize_t reset_write(struct file *file, const char __user *addr,
> + size_t len, loff_t *pos)
> +{
> + struct llvm_prf_data *data;
> +
> + memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
> +
> + for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; ++data) {
> + struct llvm_prf_value_node **vnodes;
> + u64 current_vsite_count;
> + u32 i;
> +
> + if (!data->values)
> + continue;
> +
> + current_vsite_count = 0;
> + vnodes = (struct llvm_prf_value_node **)data->values;
> +
> + for (i = LLVM_PRF_IPVK_FIRST; i <= LLVM_PRF_IPVK_LAST; ++i)
> + current_vsite_count += data->num_value_sites[i];
> +
> + for (i = 0; i < current_vsite_count; ++i) {
> + struct llvm_prf_value_node *current_vnode = vnodes[i];
> +
> + while (current_vnode) {
> + current_vnode->count = 0;
> + current_vnode = current_vnode->next;
> + }
> + }
> + }
> +
> + return len;
> +}
> +
> +static const struct file_operations prf_reset_fops = {
> + .owner = THIS_MODULE,
> + .write = reset_write,
> + .llseek = noop_llseek,
> +};
> +
> +/* Create debugfs entries. */
> +static int __init pgo_init(void)
> +{
> + directory = debugfs_create_dir("pgo", NULL);
> + if (!directory)
> + goto err_remove;
> +
> + if (!debugfs_create_file("profraw", 0600, directory, NULL,
> + &prf_fops))
> + goto err_remove;
> +
> + if (!debugfs_create_file("reset", 0200, directory, NULL,
> + &prf_reset_fops))
> + goto err_remove;
> +
> + return 0;
> +
> +err_remove:
> + pr_err("initialization failed\n");
> + return -EIO;
> +}
> +
> +/* Remove debugfs entries. */
> +static void __exit pgo_exit(void)
> +{
> + debugfs_remove_recursive(directory);
> +}
> +
> +module_init(pgo_init);
> +module_exit(pgo_exit);
> diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
> new file mode 100644
> index 0000000000000..6084ff0652e85
> --- /dev/null
> +++ b/kernel/pgo/instrument.c
> @@ -0,0 +1,185 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt) "pgo: " fmt
> +
> +#include <linux/bitops.h>
> +#include <linux/kernel.h>
> +#include <linux/export.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +#include "pgo.h"
> +
> +/* Lock guarding value node access and serialization. */
> +static DEFINE_SPINLOCK(pgo_lock);
> +static int current_node;
> +
> +unsigned long prf_lock(void)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&pgo_lock, flags);
> +
> + return flags;
> +}
> +
> +void prf_unlock(unsigned long flags)
> +{
> + spin_unlock_irqrestore(&pgo_lock, flags);
> +}
> +
> +/*
> + * Return a newly allocated profiling value node which holds the value
> + * tracked by the value profiler.
> + * Note: caller *must* hold pgo_lock.
> + */
> +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
> + u32 index, u64 value)
> +{
> + if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
> + return NULL; /* Out of nodes */
> +
> + current_node++;
> +
> + /* Make sure the node is entirely within the section */
> + if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
> + &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
> + return NULL;
> +
> + return &__llvm_prf_vnds_start[current_node];
> +}
> +
> +/*
> + * Counts the number of times a target value is seen.
> + *
> + * Records the target value for the CounterIndex if not seen before. Otherwise,
> + * increments the counter associated w/ the target value.
> + */
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
> +{
> + struct llvm_prf_data *p = (struct llvm_prf_data *)data;
> + struct llvm_prf_value_node **counters;
> + struct llvm_prf_value_node *curr;
> + struct llvm_prf_value_node *min = NULL;
> + struct llvm_prf_value_node *prev = NULL;
> + u64 min_count = U64_MAX;
> + u8 values = 0;
> + unsigned long flags;
> +
> + if (!p || !p->values)
> + return;
> +
> + counters = (struct llvm_prf_value_node **)p->values;
> + curr = counters[index];
> +
> + while (curr) {
> + if (target_value == curr->value) {
> + curr->count++;
> + return;
> + }
> +
> + if (curr->count < min_count) {
> + min_count = curr->count;
> + min = curr;
> + }
> +
> + prev = curr;
> + curr = curr->next;
> + values++;
> + }
> +
> + if (values >= LLVM_PRF_MAX_NUM_VALS_PER_SITE) {
> + if (!min->count || !(--min->count)) {
> + curr = min;
> + curr->value = target_value;
> + curr->count++;
> + }
> + return;
> + }
> +
> + /* Lock when updating the value node structure. */
> + flags = prf_lock();
> +
> + curr = allocate_node(p, index, target_value);
> + if (!curr)
> + goto out;
> +
> + curr->value = target_value;
> + curr->count++;
> +
> + if (!counters[index])
> + counters[index] = curr;
> + else if (prev && !prev->next)
> + prev->next = curr;
> +
> +out:
> + prf_unlock(flags);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_target);
> +
> +/* Counts the number of times a range of target values is seen. */
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> + u32 index, s64 precise_start,
> + s64 precise_last, s64 large_value);
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> + u32 index, s64 precise_start,
> + s64 precise_last, s64 large_value)
> +{
> + if (large_value != S64_MIN && (s64)target_value >= large_value)
> + target_value = large_value;
> + else if ((s64)target_value < precise_start ||
> + (s64)target_value > precise_last)
> + target_value = precise_last + 1;
> +
> + __llvm_profile_instrument_target(target_value, data, index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_range);
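
The clamping that __llvm_profile_instrument_range() applies before handing off to __llvm_profile_instrument_target() can be tried out in userspace. A minimal sketch follows; the function name clamp_range_value is made up, and like the original it does not guard against precise_last + 1 overflowing.

```c
#include <stdint.h>

/*
 * Map target_value to the value that would actually be recorded:
 *  - anything at or above large_value collapses to large_value
 *    (unless large_value is INT64_MIN, which disables that bucket);
 *  - anything outside [precise_start, precise_last] collapses to
 *    precise_last + 1;
 *  - values inside the precise range are kept as they are.
 */
static uint64_t clamp_range_value(uint64_t target_value, int64_t precise_start,
				  int64_t precise_last, int64_t large_value)
{
	if (large_value != INT64_MIN && (int64_t)target_value >= large_value)
		return (uint64_t)large_value;

	if ((int64_t)target_value < precise_start ||
	    (int64_t)target_value > precise_last)
		return (uint64_t)(precise_last + 1);

	return target_value;
}
```

With a hypothetical precise range of 0..8 and large_value of 128, an input of 5 stays 5, 100 becomes 9, and 4096 becomes 128.
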
> +
> +static u64 inst_prof_get_range_rep_value(u64 value)
> +{
> + if (value <= 8)
> + /* The first ranges are individually tracked, use it as is. */
> + return value;
> + else if (value >= 513)
> + /* The last range is mapped to its lowest value. */
> + return 513;
> + else if (hweight64(value) == 1)
> + /* If it's a power of two, use it as is. */
> + return value;
> +
> + /* Otherwise, map to the previous power of two + 1. */
> + return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
> +}
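
For anyone wanting to check the memop bucket mapping outside the kernel, here is a userspace sketch of inst_prof_get_range_rep_value(). It assumes a GCC- or Clang-compatible compiler for __builtin_clzll, and swaps hweight64() for the usual value & (value - 1) power-of-two test; range_rep_value is a hypothetical name.

```c
#include <stdint.h>

/* Map a memop size to the representative value of its range. */
static uint64_t range_rep_value(uint64_t value)
{
	if (value <= 8)
		return value;	/* 0..8 are tracked individually */

	if (value >= 513)
		return 513;	/* last range maps to its lowest value */

	if ((value & (value - 1)) == 0)
		return value;	/* exact power of two, kept as is */

	/* Otherwise: previous power of two, plus one. */
	return (UINT64_C(1) << (63 - __builtin_clzll(value))) + 1;
}
```

So sizes 9 through 15 all land in bucket 9, 16 stays 16, 17 through 31 land in 17, and so on up to 513.
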
> +
> +/*
> + * The target values are partitioned into multiple ranges. The range spec is
> + * defined in compiler-rt/include/profile/InstrProfData.inc.
> + */
> +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> + u32 counter_index);
> +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> + u32 counter_index)
> +{
> + u64 rep_value;
> +
> + /* Map the target value to the representative value of its range. */
> + rep_value = inst_prof_get_range_rep_value(target_value);
> + __llvm_profile_instrument_target(rep_value, data, counter_index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_memop);
> diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
> new file mode 100644
> index 0000000000000..df0aa278f28bd
> --- /dev/null
> +++ b/kernel/pgo/pgo.h
> @@ -0,0 +1,206 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#ifndef _PGO_H
> +#define _PGO_H
> +
> +/*
> + * Note: These internal LLVM definitions must match the compiler version.
> + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
> + */
> +
> +#ifdef CONFIG_64BIT
> + #define LLVM_PRF_MAGIC \
> + ((u64)255 << 56 | \
> + (u64)'l' << 48 | \
> + (u64)'p' << 40 | \
> + (u64)'r' << 32 | \
> + (u64)'o' << 24 | \
> + (u64)'f' << 16 | \
> + (u64)'r' << 8 | \
> + (u64)129)
> +#else
> + #define LLVM_PRF_MAGIC \
> + ((u64)255 << 56 | \
> + (u64)'l' << 48 | \
> + (u64)'p' << 40 | \
> + (u64)'r' << 32 | \
> + (u64)'o' << 24 | \
> + (u64)'f' << 16 | \
> + (u64)'R' << 8 | \
> + (u64)129)
> +#endif
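
As a quick sanity check of the two LLVM_PRF_MAGIC variants, the same expression can be evaluated in userspace. This is just a sketch; prf_magic and its is_64bit parameter are hypothetical stand-ins for the CONFIG_64BIT #ifdef.

```c
#include <stdint.h>

/* Build the raw-profile magic; only the seventh byte differs by word size. */
static uint64_t prf_magic(int is_64bit)
{
	return (uint64_t)255 << 56 |
	       (uint64_t)'l' << 48 |
	       (uint64_t)'p' << 40 |
	       (uint64_t)'r' << 32 |
	       (uint64_t)'o' << 24 |
	       (uint64_t)'f' << 16 |
	       (uint64_t)(is_64bit ? 'r' : 'R') << 8 |
	       (uint64_t)129;
}
```

On a 64-bit build this comes out as 0xFF6C70726F667281, i.e. the bytes 0xFF "lprofr" 0x81 when read most significant byte first.
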
> +
> +#define LLVM_PRF_VERSION 5
> +#define LLVM_PRF_DATA_ALIGN 8
> +#define LLVM_PRF_IPVK_FIRST 0
> +#define LLVM_PRF_IPVK_LAST 1
> +#define LLVM_PRF_MAX_NUM_VALS_PER_SITE 16
> +
> +#define LLVM_PRF_VARIANT_MASK_IR (0x1ull << 56)
> +#define LLVM_PRF_VARIANT_MASK_CSIR (0x1ull << 57)
> +
> +/**
> + * struct llvm_prf_header - represents the raw profile header data structure.
> + * @magic: the magic token for the file format.
> + * @version: the version of the file format.
> + * @data_size: the number of entries in the profile data section.
> + * @padding_bytes_before_counters: the number of padding bytes before the
> + * counters.
> + * @counters_size: the size in bytes of the LLVM profile section containing the
> + * counters.
> + * @padding_bytes_after_counters: the number of padding bytes after the
> + * counters.
> + * @names_size: the size in bytes of the LLVM profile section containing the
> + * counters' names.
> + * @counters_delta: the beginning of the LLVM profile counters section.
> + * @names_delta: the beginning of the LLVM profile names section.
> + * @value_kind_last: the last profile value kind.
> + */
> +struct llvm_prf_header {
> + u64 magic;
> + u64 version;
> + u64 data_size;
> + u64 padding_bytes_before_counters;
> + u64 counters_size;
> + u64 padding_bytes_after_counters;
> + u64 names_size;
> + u64 counters_delta;
> + u64 names_delta;
> + u64 value_kind_last;
> +};
> +
> +/**
> + * struct llvm_prf_data - represents the per-function control structure.
> + * @name_ref: the reference to the function's name.
> + * @func_hash: the hash value of the function.
> + * @counter_ptr: a pointer to the profile counter.
> + * @function_ptr: a pointer to the function.
> + * @values: the profiling values associated with this function.
> + * @num_counters: the number of counters in the function.
> + * @num_value_sites: the number of value profile sites.
> + */
> +struct llvm_prf_data {
> + const u64 name_ref;
> + const u64 func_hash;
> + const void *counter_ptr;
> + const void *function_ptr;
> + void *values;
> + const u32 num_counters;
> + const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
> +} __aligned(LLVM_PRF_DATA_ALIGN);
> +
> +/**
> + * struct llvm_prf_value_node_data - represents the data part of the struct
> + * llvm_prf_value_node data structure.
> + * @value: the value counters.
> + * @count: the counters' count.
> + */
> +struct llvm_prf_value_node_data {
> + u64 value;
> + u64 count;
> +};
> +
> +/**
> + * struct llvm_prf_value_node - represents an internal data structure used by
> + * the value profiler.
> + * @value: the value counters.
> + * @count: the counters' count.
> + * @next: the next value node.
> + */
> +struct llvm_prf_value_node {
> + u64 value;
> + u64 count;
> + struct llvm_prf_value_node *next;
> +};
> +
> +/**
> + * struct llvm_prf_value_data - represents the value profiling data in indexed
> + * format.
> + * @total_size: the total size in bytes including this field.
> + * @num_value_kinds: the number of value profile kinds that have value profile
> + * data.
> + */
> +struct llvm_prf_value_data {
> + u32 total_size;
> + u32 num_value_kinds;
> +};
> +
> +/**
> + * struct llvm_prf_value_record - represents the on-disk layout of the value
> + * profile data of a particular kind for one function.
> + * @kind: the kind of the value profile record.
> + * @num_value_sites: the number of value profile sites.
> + * @site_count_array: the first element of the array that stores the number
> + * of profiled values for each value site.
> + */
> +struct llvm_prf_value_record {
> + u32 kind;
> + u32 num_value_sites;
> + u8 site_count_array[];
> +};
> +
> +#define prf_get_value_record_header_size() \
> + offsetof(struct llvm_prf_value_record, site_count_array)
> +#define prf_get_value_record_site_count_size(sites) \
> + roundup((sites), 8)
> +#define prf_get_value_record_size(sites) \
> + (prf_get_value_record_header_size() + \
> + prf_get_value_record_site_count_size((sites)))
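
To make the arithmetic of the prf_get_value_record_*() macros concrete, here is a userspace sketch. It assumes the same struct layout as llvm_prf_value_record (two 32-bit fields followed by a flexible array) and replaces the kernel's roundup() with a hand-written helper; value_record, round_up8 and value_record_size are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct llvm_prf_value_record. */
struct value_record {
	uint32_t kind;
	uint32_t num_value_sites;
	uint8_t site_count_array[];
};

/* Round n up to the next multiple of 8. */
static size_t round_up8(size_t n)
{
	return (n + 7) / 8 * 8;
}

/* Header (8 bytes) plus one byte per site, padded to 8 bytes. */
static size_t value_record_size(size_t sites)
{
	return offsetof(struct value_record, site_count_array) +
	       round_up8(sites);
}
```

A record with 1 to 8 sites therefore takes 16 bytes, and one with 9 sites takes 24.
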
> +
> +/* Data sections */
> +extern struct llvm_prf_data __llvm_prf_data_start[];
> +extern struct llvm_prf_data __llvm_prf_data_end[];
> +
> +extern u64 __llvm_prf_cnts_start[];
> +extern u64 __llvm_prf_cnts_end[];
> +
> +extern char __llvm_prf_names_start[];
> +extern char __llvm_prf_names_end[];
> +
> +extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
> +extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
> +
> +/* Locking for vnodes */
> +extern unsigned long prf_lock(void);
> +extern void prf_unlock(unsigned long flags);
> +
> +#define __DEFINE_PRF_SIZE(s) \
> + static inline unsigned long prf_ ## s ## _size(void) \
> + { \
> + unsigned long start = \
> + (unsigned long)__llvm_prf_ ## s ## _start; \
> + unsigned long end = \
> + (unsigned long)__llvm_prf_ ## s ## _end; \
> + return roundup(end - start, \
> + sizeof(__llvm_prf_ ## s ## _start[0])); \
> + } \
> + static inline unsigned long prf_ ## s ## _count(void) \
> + { \
> + return prf_ ## s ## _size() / \
> + sizeof(__llvm_prf_ ## s ## _start[0]); \
> + }
> +
> +__DEFINE_PRF_SIZE(data);
> +__DEFINE_PRF_SIZE(cnts);
> +__DEFINE_PRF_SIZE(names);
> +__DEFINE_PRF_SIZE(vnds);
> +
> +#undef __DEFINE_PRF_SIZE
> +
> +#endif /* _PGO_H */
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 213677a5ed33e..9b218afb5cb87 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
> $(CFLAGS_GCOV))
> endif
>
> +#
> +# Enable clang's PGO profiling flags for a file or directory depending on
> +# variables PGO_PROFILE_obj.o and PGO_PROFILE.
> +#
> +ifeq ($(CONFIG_PGO_CLANG),y)
> +_c_flags += $(if $(patsubst n%,, \
> + $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
> + $(CFLAGS_PGO_CLANG))
> +endif
> +
> #
> # Enable address sanitizer flags for kernel except some files or directories
> # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
> --
> 2.30.0.284.gd98b1dd5eaa7-goog
>

2021-01-16 18:40:41

by Sedat Dilek

Subject: Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

On Sat, Jan 16, 2021 at 6:38 PM Sedat Dilek <[email protected]> wrote:
>
> On Sat, Jan 16, 2021 at 10:44 AM 'Bill Wendling' via Clang Built Linux
> <[email protected]> wrote:
> >
> > From: Sami Tolvanen <[email protected]>
> >
> > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > profile, the kernel is instrumented with PGO counters, a representative
> > workload is run, and the raw profile data is collected from
> > /sys/kernel/debug/pgo/profraw.
> >
> > The raw profile data must be processed by clang's "llvm-profdata" tool
> > before it can be used during recompilation:
> >
> > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >
> > Multiple raw profiles may be merged during this step.
> >
> > The data can now be used by the compiler:
> >
> > $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> >
> > This initial submission is restricted to x86, as that's the platform we
> > know works. This restriction can be lifted once other platforms have
> > been verified to work with PGO.
> >
> > Note that this method of profiling the kernel is clang-native, unlike
> > the clang support in kernel/gcov.
> >
> > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> >
> > Signed-off-by: Sami Tolvanen <[email protected]>
> > Co-developed-by: Bill Wendling <[email protected]>
> > Signed-off-by: Bill Wendling <[email protected]>
> > ---
> > v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> > testing.
> > - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> > Song's comments.
> > v3: - Added change log section based on Sedat Dilek's comments.
> > v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
> > own popcount implementation, based on Nick Desaulniers's comment.
> > v5: - Correct padding calculation, discovered by Nathan Chancellor.
> > ---
> > Documentation/dev-tools/index.rst | 1 +
> > Documentation/dev-tools/pgo.rst | 127 +++++++++
> > MAINTAINERS | 9 +
> > Makefile | 3 +
> > arch/Kconfig | 1 +
> > arch/x86/Kconfig | 1 +
> > arch/x86/boot/Makefile | 1 +
> > arch/x86/boot/compressed/Makefile | 1 +
> > arch/x86/crypto/Makefile | 2 +
> > arch/x86/entry/vdso/Makefile | 1 +
> > arch/x86/kernel/vmlinux.lds.S | 2 +
> > arch/x86/platform/efi/Makefile | 1 +
> > arch/x86/purgatory/Makefile | 1 +
> > arch/x86/realmode/rm/Makefile | 1 +
> > arch/x86/um/vdso/Makefile | 1 +
> > drivers/firmware/efi/libstub/Makefile | 1 +
> > include/asm-generic/vmlinux.lds.h | 44 +++
> > kernel/Makefile | 1 +
> > kernel/pgo/Kconfig | 35 +++
> > kernel/pgo/Makefile | 5 +
> > kernel/pgo/fs.c | 382 ++++++++++++++++++++++++++
> > kernel/pgo/instrument.c | 185 +++++++++++++
> > kernel/pgo/pgo.h | 206 ++++++++++++++
> > scripts/Makefile.lib | 10 +
> > 24 files changed, 1022 insertions(+)
> > create mode 100644 Documentation/dev-tools/pgo.rst
> > create mode 100644 kernel/pgo/Kconfig
> > create mode 100644 kernel/pgo/Makefile
> > create mode 100644 kernel/pgo/fs.c
> > create mode 100644 kernel/pgo/instrument.c
> > create mode 100644 kernel/pgo/pgo.h
> >
> > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > index f7809c7b1ba9e..8d6418e858062 100644
> > --- a/Documentation/dev-tools/index.rst
> > +++ b/Documentation/dev-tools/index.rst
> > @@ -26,6 +26,7 @@ whole; patches welcome!
> > kgdb
> > kselftest
> > kunit/index
> > + pgo
> >
> >
> > .. only:: subproject and html
> > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> > new file mode 100644
> > index 0000000000000..b7f11d8405b73
> > --- /dev/null
> > +++ b/Documentation/dev-tools/pgo.rst
> > @@ -0,0 +1,127 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +===============================
> > +Using PGO with the Linux kernel
> > +===============================
> > +
> > +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> > +when building with Clang. The profiling data is exported via the ``pgo``
> > +debugfs directory.
> > +
> > +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > +
> > +
> > +Preparation
> > +===========
> > +
> > +Configure the kernel with:
> > +
> > +.. code-block:: make
> > +
> > + CONFIG_DEBUG_FS=y
> > + CONFIG_PGO_CLANG=y
> > +
> > +Note that kernels compiled with profiling flags will be significantly larger
> > +and run slower.
> > +
> > +Profiling data will only become accessible once debugfs has been mounted:
> > +
> > +.. code-block:: sh
> > +
> > + mount -t debugfs none /sys/kernel/debug
> > +
> > +
> > +Customization
> > +=============
> > +
> > +You can enable or disable profiling for individual files and directories by
> > +adding a line similar to the following to the respective kernel Makefile:
> > +
> > +- For a single file (e.g. main.o)
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE_main.o := y
> > +
> > +- For all files in one directory
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE := y
> > +
> > +To exclude files from being profiled use
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE_main.o := n
> > +
> > +and
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE := n
> > +
> > +Only files which are linked to the main kernel image or are compiled as kernel
> > +modules are supported by this mechanism.
> > +
> > +
> > +Files
> > +=====
> > +
> > +The PGO kernel support creates the following files in debugfs:
> > +
> > +``/sys/kernel/debug/pgo``
> > + Parent directory for all PGO-related files.
> > +
> > +``/sys/kernel/debug/pgo/reset``
> > + Global reset file: resets all coverage data to zero when written to.
> > +
> > +``/sys/kernel/debug/pgo/profraw``
> > + The raw PGO data that must be processed with ``llvm-profdata``.
> > +
> > +
> > +Workflow
> > +========
> > +
> > +The PGO kernel can be run on the host or test machines. The data, though,
> > +should be analyzed with tools from the same Clang version that compiled the
> > +kernel. Clang is tolerant of version skew, but it's easier to use the same
> > +Clang version.
> > +
> > +The profiling data is useful for optimizing the kernel, analyzing coverage,
> > +etc. Clang offers tools to perform these tasks.
> > +
> > +Here is an example workflow for profiling an instrumented kernel with PGO and
> > +using the result to optimize the kernel:
> > +
> > +1) Install the kernel on the TEST machine.
> > +
> > +2) Reset the data counters right before running the load tests
> > +
> > + .. code-block:: sh
> > +
> > + $ echo 1 > /sys/kernel/debug/pgo/reset
> > +
>
> I do not get this...
>
> # mount | grep debugfs
> debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
>

I tried:

# umount /sys/kernel/debug

# mount -t debugfs none /sys/kernel/debug

# echo 1 > /sys/kernel/debug/pgo/reset

*** Run load-test ***

Again the profraw file has the earlier timestamp (19:14, versus 19:29 for reset).

# LC_ALL=C ll /sys/kernel/debug/pgo/
total 0
drwxr-xr-x 2 root root 0 Jan 16 17:29 .
drwx------ 41 root root 0 Jan 16 17:29 ..
-rw------- 1 root root 0 Jan 16 19:14 profraw
--w------- 1 root root 0 Jan 16 19:29 reset

Did this really profile my kernel-build?

- Sedat -
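
One way to tell whether a copied profraw file holds a plausible profile, rather than relying on the size and timestamp in the debugfs listing, is to look at the first two fields of struct llvm_prf_header. The sketch below makes several assumptions: the file was written by a little-endian 64-bit kernel (so the magic is 0xFF6C70726F667281), the top byte of the version field carries variant flags such as LLVM_PRF_VARIANT_MASK_IR, and the helper names load_le64 and profraw_version are made up. It works on a buffer holding the first 16 bytes of the file; reading the file is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Magic for a 64-bit kernel, per LLVM_PRF_MAGIC in pgo.h. */
#define PRF_MAGIC_64		UINT64_C(0xFF6C70726F667281)
/* Assumption: variant flags live in the top byte of the version field. */
#define PRF_VERSION_MASK	UINT64_C(0x00FFFFFFFFFFFFFF)

/* Read a little-endian 64-bit value from a byte buffer. */
static uint64_t load_le64(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];

	return v;
}

/*
 * Return the format version (flags stripped) if buf starts with the
 * expected magic, or 0 if the buffer is too short or the magic differs.
 */
static uint64_t profraw_version(const unsigned char *buf, size_t len)
{
	if (len < 16 || load_le64(buf) != PRF_MAGIC_64)
		return 0;

	return load_le64(buf + 8) & PRF_VERSION_MASK;
}
```

For a file produced by this patch the expected result would be 5 (LLVM_PRF_VERSION). Running llvm-profdata merge over the file remains the more thorough check, since it parses the whole thing.
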

> After the load-test...?
>
> echo 0 > /sys/kernel/debug/pgo/reset
>
> > +3) Run the load tests.
> > +
> > +4) Collect the raw profile data
> > +
> > + .. code-block:: sh
> > +
> > + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > +
>
> This is only 4.9M in size and, judging by its timestamp, dates from 5 minutes
> before I ran the echo-1 line.
>
> # ll /sys/kernel/debug/pgo
> total 0
> drwxr-xr-x 2 root root 0 16. Jan 17:29 .
> drwx------ 41 root root 0 16. Jan 17:29 ..
> -rw------- 1 root root 0 16. Jan 17:29 profraw
> --w------- 1 root root 0 16. Jan 18:19 reset
>
> # cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
>
> # ll /tmp/vmlinux.profraw
> -rw------- 1 root root 4,9M 16. Jan 17:29 /tmp/vmlinux.profraw
>
> For me there was no prof-data collected from my defconfig kernel-build.
>
> > +5) (Optional) Download the raw profile data to the HOST machine.
> > +
> > +6) Process the raw profile data
> > +
> > + .. code-block:: sh
> > +
> > + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > +
>
> Is that executed in /path/to/linux/git?
>
> > + Note that multiple raw profile data files can be merged during this step.
> > +
> > +7) Rebuild the kernel using the profile data (PGO disabled)
> > +
> > + .. code-block:: sh
> > +
> > + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>
> How big is vmlinux.profdata (make defconfig)?
>
> Do I need to do a full defconfig build or can I stop the build after,
> let's say, 10 minutes?
>
> - Sedat -
>
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 79b400c97059f..cb1f1f2b2baf4 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -13948,6 +13948,15 @@ S: Maintained
> > F: include/linux/personality.h
> > F: include/uapi/linux/personality.h
> >
> > +PGO BASED KERNEL PROFILING
> > +M: Sami Tolvanen <[email protected]>
> > +M: Bill Wendling <[email protected]>
> > +R: Nathan Chancellor <[email protected]>
> > +R: Nick Desaulniers <[email protected]>
> > +S: Supported
> > +F: Documentation/dev-tools/pgo.rst
> > +F: kernel/pgo
> > +
> > PHOENIX RC FLIGHT CONTROLLER ADAPTER
> > M: Marcus Folkesson <[email protected]>
> > L: [email protected]
> > diff --git a/Makefile b/Makefile
> > index 9e73f82e0d863..9128bfe1ccc97 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
> > # Defaults to vmlinux, but the arch makefile usually adds further targets
> > all: vmlinux
> >
> > +CFLAGS_PGO_CLANG := -fprofile-generate
> > +export CFLAGS_PGO_CLANG
> > +
> > CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
> > $(call cc-option,-fno-tree-loop-im) \
> > $(call cc-disable-warning,maybe-uninitialized,)
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index 24862d15f3a36..f39d3991f6bfe 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
> > pairs of 32-bit arguments, select this option.
> >
> > source "kernel/gcov/Kconfig"
> > +source "kernel/pgo/Kconfig"
> >
> > source "scripts/gcc-plugins/Kconfig"
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 21f851179ff08..36305ea61dc09 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -96,6 +96,7 @@ config X86
> > select ARCH_SUPPORTS_DEBUG_PAGEALLOC
> > select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
> > select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
> > + select ARCH_SUPPORTS_PGO_CLANG if X86_64
> > select ARCH_USE_BUILTIN_BSWAP
> > select ARCH_USE_QUEUED_RWLOCKS
> > select ARCH_USE_QUEUED_SPINLOCKS
> > diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> > index fe605205b4ce2..383853e32f673 100644
> > --- a/arch/x86/boot/Makefile
> > +++ b/arch/x86/boot/Makefile
> > @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> > KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE := n
> >
> > $(obj)/bzImage: asflags-y := $(SVGA_MODE)
> > diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> > index e0bc3988c3faa..ed12ab65f6065 100644
> > --- a/arch/x86/boot/compressed/Makefile
> > +++ b/arch/x86/boot/compressed/Makefile
> > @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
> >
> > KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE :=n
> >
> > KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
> > diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> > index a31de0c6ccde2..775fa0b368e98 100644
> > --- a/arch/x86/crypto/Makefile
> > +++ b/arch/x86/crypto/Makefile
> > @@ -4,6 +4,8 @@
> >
> > OBJECT_FILES_NON_STANDARD := y
> >
> > +PGO_PROFILE_curve25519-x86_64.o := n
> > +
> > obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o
> >
> > obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
> > diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> > index 02e3e42f380bd..26e2b3af0145c 100644
> > --- a/arch/x86/entry/vdso/Makefile
> > +++ b/arch/x86/entry/vdso/Makefile
> > @@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO $@
> > VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
> > $(call ld-option, --eh-frame-hdr) -Bsymbolic
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > quiet_cmd_vdso_and_check = VDSO $@
> > cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
> > diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> > index efd9e9ea17f25..f6cab2316c46a 100644
> > --- a/arch/x86/kernel/vmlinux.lds.S
> > +++ b/arch/x86/kernel/vmlinux.lds.S
> > @@ -184,6 +184,8 @@ SECTIONS
> >
> > BUG_TABLE
> >
> > + PGO_CLANG_DATA
> > +
> > ORC_UNWIND_TABLE
> >
> > . = ALIGN(PAGE_SIZE);
> > diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
> > index 84b09c230cbd5..5f22b31446ad4 100644
> > --- a/arch/x86/platform/efi/Makefile
> > +++ b/arch/x86/platform/efi/Makefile
> > @@ -2,6 +2,7 @@
> > OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
> > KASAN_SANITIZE := n
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
> > obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
> > diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> > index 95ea17a9d20cb..36f20e99da0bc 100644
> > --- a/arch/x86/purgatory/Makefile
> > +++ b/arch/x86/purgatory/Makefile
> > @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
> >
> > # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > KASAN_SANITIZE := n
> > UBSAN_SANITIZE := n
> > KCSAN_SANITIZE := n
> > diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> > index 83f1b6a56449f..21797192f958f 100644
> > --- a/arch/x86/realmode/rm/Makefile
> > +++ b/arch/x86/realmode/rm/Makefile
> > @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> > KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE := n
> > diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
> > index 5943387e3f357..54f5768f58530 100644
> > --- a/arch/x86/um/vdso/Makefile
> > +++ b/arch/x86/um/vdso/Makefile
> > @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@
> >
> > VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > #
> > # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
> > diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> > index 8a94388e38b33..2d81623b33f29 100644
> > --- a/drivers/firmware/efi/libstub/Makefile
> > +++ b/drivers/firmware/efi/libstub/Makefile
> > @@ -40,6 +40,7 @@ KBUILD_CFLAGS := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
> > KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
> >
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > # Sanitizer runtimes are unavailable and cannot be linked here.
> > KASAN_SANITIZE := n
> > KCSAN_SANITIZE := n
> > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > index b2b3d81b1535a..3a591bb18c5fb 100644
> > --- a/include/asm-generic/vmlinux.lds.h
> > +++ b/include/asm-generic/vmlinux.lds.h
> > @@ -316,6 +316,49 @@
> > #define THERMAL_TABLE(name)
> > #endif
> >
> > +#ifdef CONFIG_PGO_CLANG
> > +#define PGO_CLANG_DATA \
> > + __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
> > + . = ALIGN(8); \
> > + __llvm_prf_start = .; \
> > + __llvm_prf_data_start = .; \
> > + KEEP(*(__llvm_prf_data)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_data_end = .; \
> > + } \
> > + __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
> > + . = ALIGN(8); \
> > + __llvm_prf_cnts_start = .; \
> > + KEEP(*(__llvm_prf_cnts)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_cnts_end = .; \
> > + } \
> > + __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
> > + . = ALIGN(8); \
> > + __llvm_prf_names_start = .; \
> > + KEEP(*(__llvm_prf_names)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_names_end = .; \
> > + . = ALIGN(8); \
> > + } \
> > + __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
> > + __llvm_prf_vals_start = .; \
> > + KEEP(*(__llvm_prf_vals)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_vals_end = .; \
> > + . = ALIGN(8); \
> > + } \
> > + __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
> > + __llvm_prf_vnds_start = .; \
> > + KEEP(*(__llvm_prf_vnds)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_vnds_end = .; \
> > + __llvm_prf_end = .; \
> > + }
> > +#else
> > +#define PGO_CLANG_DATA
> > +#endif
> > +
> > #define KERNEL_DTB() \
> > STRUCT_ALIGN(); \
> > __dtb_start = .; \
> > @@ -1125,6 +1168,7 @@
> > CONSTRUCTORS \
> > } \
> > BUG_TABLE \
> > + PGO_CLANG_DATA
> >
> > #define INIT_TEXT_SECTION(inittext_align) \
> > . = ALIGN(inittext_align); \
> > diff --git a/kernel/Makefile b/kernel/Makefile
> > index aa7368c7eabf3..0b34ca228ba46 100644
> > --- a/kernel/Makefile
> > +++ b/kernel/Makefile
> > @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
> > obj-$(CONFIG_KCSAN) += kcsan/
> > obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
> > obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
> > +obj-$(CONFIG_PGO_CLANG) += pgo/
> >
> > obj-$(CONFIG_PERF_EVENTS) += events/
> >
> > diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
> > new file mode 100644
> > index 0000000000000..76a640b6cf6ed
> > --- /dev/null
> > +++ b/kernel/pgo/Kconfig
> > @@ -0,0 +1,35 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
> > +
> > +config ARCH_SUPPORTS_PGO_CLANG
> > + bool
> > +
> > +config PGO_CLANG
> > + bool "Enable clang's PGO-based kernel profiling"
> > + depends on DEBUG_FS
> > + depends on ARCH_SUPPORTS_PGO_CLANG
> > + depends on CC_IS_CLANG && CLANG_VERSION >= 120000
> > + help
> > + This option enables clang's PGO (Profile Guided Optimization) based
> > + code profiling to better optimize the kernel.
> > +
> > + If unsure, say N.
> > +
> > + Run a representative workload for your application on a kernel
> > + compiled with this option and download the raw profile file from
> > + /sys/kernel/debug/pgo/profraw. This file needs to be processed with
> > + llvm-profdata. It may be merged with other collected raw profiles.
> > +
> > + Copy the resulting profile file into vmlinux.profdata, and enable
> > + KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
> > + kernel.
> > +
> > + Note that a kernel compiled with profiling flags will be
> > + significantly larger and run slower. Also be sure to exclude files
> > + from profiling which are not linked to the kernel image to prevent
> > + linker errors.
> > +
> > + Note that the debugfs filesystem has to be mounted to access
> > + profiling data.
> > +
> > +endmenu
> > diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
> > new file mode 100644
> > index 0000000000000..41e27cefd9a47
> > --- /dev/null
> > +++ b/kernel/pgo/Makefile
> > @@ -0,0 +1,5 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > +
> > +obj-y += fs.o instrument.o
> > diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
> > new file mode 100644
> > index 0000000000000..68b24672be10a
> > --- /dev/null
> > +++ b/kernel/pgo/fs.c
> > @@ -0,0 +1,382 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + * Sami Tolvanen <[email protected]>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#define pr_fmt(fmt) "pgo: " fmt
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/debugfs.h>
> > +#include <linux/fs.h>
> > +#include <linux/module.h>
> > +#include <linux/slab.h>
> > +#include <linux/vmalloc.h>
> > +#include "pgo.h"
> > +
> > +static struct dentry *directory;
> > +
> > +struct prf_private_data {
> > + void *buffer;
> > + unsigned long size;
> > +};
> > +
> > +/*
> > + * Raw profile data format:
> > + *
> > + * - llvm_prf_header
> > + * - __llvm_prf_data
> > + * - __llvm_prf_cnts
> > + * - __llvm_prf_names
> > + * - zero padding to 8 bytes
> > + * - for each llvm_prf_data in __llvm_prf_data:
> > + * - llvm_prf_value_data
> > + * - llvm_prf_value_record + site count array
> > + * - llvm_prf_value_node_data
> > + * ...
> > + * ...
> > + * ...
> > + */
> > +
> > +static void prf_fill_header(void **buffer)
> > +{
> > + struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
> > +
> > + header->magic = LLVM_PRF_MAGIC;
> > + header->version = LLVM_PRF_VARIANT_MASK_IR | LLVM_PRF_VERSION;
> > + header->data_size = prf_data_count();
> > + header->padding_bytes_before_counters = 0;
> > + header->counters_size = prf_cnts_count();
> > + header->padding_bytes_after_counters = 0;
> > + header->names_size = prf_names_count();
> > + header->counters_delta = (u64)__llvm_prf_cnts_start;
> > + header->names_delta = (u64)__llvm_prf_names_start;
> > + header->value_kind_last = LLVM_PRF_IPVK_LAST;
> > +
> > + *buffer += sizeof(*header);
> > +}
> > +
> > +/*
> > + * Copy the source into the buffer, incrementing the pointer into buffer in the
> > + * process.
> > + */
> > +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
> > +{
> > + memcpy(*buffer, src, size);
> > + *buffer += size;
> > +}
> > +
> > +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
> > +{
> > + struct llvm_prf_value_node **nodes =
> > + (struct llvm_prf_value_node **)p->values;
> > + u32 kinds = 0;
> > + u32 size = 0;
> > + unsigned int kind;
> > + unsigned int n;
> > + unsigned int s = 0;
> > +
> > + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> > + unsigned int sites = p->num_value_sites[kind];
> > +
> > + if (!sites)
> > + continue;
> > +
> > + /* Record + site count array */
> > + size += prf_get_value_record_size(sites);
> > + kinds++;
> > +
> > + if (!nodes)
> > + continue;
> > +
> > + for (n = 0; n < sites; n++) {
> > + u32 count = 0;
> > + struct llvm_prf_value_node *site = nodes[s + n];
> > +
> > + while (site && ++count <= U8_MAX)
> > + site = site->next;
> > +
> > + size += count *
> > + sizeof(struct llvm_prf_value_node_data);
> > + }
> > +
> > + s += sites;
> > + }
> > +
> > + if (size)
> > + size += sizeof(struct llvm_prf_value_data);
> > +
> > + if (value_kinds)
> > + *value_kinds = kinds;
> > +
> > + return size;
> > +}
> > +
> > +static u32 prf_get_value_size(void)
> > +{
> > + u32 size = 0;
> > + struct llvm_prf_data *p;
> > +
> > + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> > + size += __prf_get_value_size(p, NULL);
> > +
> > + return size;
> > +}
> > +
> > +/* Serialize the profiling's value. */
> > +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
> > +{
> > + struct llvm_prf_value_data header;
> > + struct llvm_prf_value_node **nodes =
> > + (struct llvm_prf_value_node **)p->values;
> > + unsigned int kind;
> > + unsigned int n;
> > + unsigned int s = 0;
> > +
> > + header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
> > +
> > + if (!header.num_value_kinds)
> > + /* Nothing to write. */
> > + return;
> > +
> > + prf_copy_to_buffer(buffer, &header, sizeof(header));
> > +
> > + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> > + struct llvm_prf_value_record *record;
> > + u8 *counts;
> > + unsigned int sites = p->num_value_sites[kind];
> > +
> > + if (!sites)
> > + continue;
> > +
> > + /* Profiling value record. */
> > + record = *(struct llvm_prf_value_record **)buffer;
> > + *buffer += prf_get_value_record_header_size();
> > +
> > + record->kind = kind;
> > + record->num_value_sites = sites;
> > +
> > + /* Site count array. */
> > + counts = *(u8 **)buffer;
> > + *buffer += prf_get_value_record_site_count_size(sites);
> > +
> > + /*
> > + * If we don't have nodes, we can skip updating the site count
> > + * array, because the buffer is zero filled.
> > + */
> > + if (!nodes)
> > + continue;
> > +
> > + for (n = 0; n < sites; n++) {
> > + u32 count = 0;
> > + struct llvm_prf_value_node *site = nodes[s + n];
> > +
> > + while (site && ++count <= U8_MAX) {
> > + prf_copy_to_buffer(buffer, site,
> > + sizeof(struct llvm_prf_value_node_data));
> > + site = site->next;
> > + }
> > +
> > + counts[n] = (u8)count;
> > + }
> > +
> > + s += sites;
> > + }
> > +}
> > +
> > +static void prf_serialize_values(void **buffer)
> > +{
> > + struct llvm_prf_data *p;
> > +
> > + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> > + prf_serialize_value(p, buffer);
> > +}
> > +
> > +static inline unsigned long prf_get_padding(unsigned long size)
> > +{
> > + return 7 & (8 - size % 8);
> > +}
> > +
> > +static unsigned long prf_buffer_size(void)
> > +{
> > + return sizeof(struct llvm_prf_header) +
> > + prf_data_size() +
> > + prf_cnts_size() +
> > + prf_names_size() +
> > + prf_get_padding(prf_names_size()) +
> > + prf_get_value_size();
> > +}
> > +
> > +/* Serialize the profiling data into a format LLVM's tools can understand. */
> > +static int prf_serialize(struct prf_private_data *p)
> > +{
> > + int err = 0;
> > + void *buffer;
> > +
> > + p->size = prf_buffer_size();
> > + p->buffer = vzalloc(p->size);
> > +
> > + if (!p->buffer) {
> > + err = -ENOMEM;
> > + goto out;
> > + }
> > +
> > + buffer = p->buffer;
> > +
> > + prf_fill_header(&buffer);
> > + prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
> > + prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
> > + prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
> > + buffer += prf_get_padding(prf_names_size());
> > +
> > + prf_serialize_values(&buffer);
> > +
> > +out:
> > + return err;
> > +}
> > +
> > +/* open() implementation for PGO. Creates a copy of the profiling data set. */
> > +static int prf_open(struct inode *inode, struct file *file)
> > +{
> > + struct prf_private_data *data;
> > + unsigned long flags;
> > + int err;
> > +
> > + data = kzalloc(sizeof(*data), GFP_KERNEL);
> > + if (!data) {
> > + err = -ENOMEM;
> > + goto out;
> > + }
> > +
> > + flags = prf_lock();
> > +
> > + err = prf_serialize(data);
> > + if (err) {
> > + kfree(data);
> > + goto out_unlock;
> > + }
> > +
> > + file->private_data = data;
> > +
> > +out_unlock:
> > + prf_unlock(flags);
> > +out:
> > + return err;
> > +}
> > +
> > +/* read() implementation for PGO. */
> > +static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
> > + loff_t *ppos)
> > +{
> > + struct prf_private_data *data = file->private_data;
> > +
> > + BUG_ON(!data);
> > +
> > + return simple_read_from_buffer(buf, count, ppos, data->buffer,
> > + data->size);
> > +}
> > +
> > +/* release() implementation for PGO. Release resources allocated by open(). */
> > +static int prf_release(struct inode *inode, struct file *file)
> > +{
> > + struct prf_private_data *data = file->private_data;
> > +
> > + if (data) {
> > + vfree(data->buffer);
> > + kfree(data);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static const struct file_operations prf_fops = {
> > + .owner = THIS_MODULE,
> > + .open = prf_open,
> > + .read = prf_read,
> > + .llseek = default_llseek,
> > + .release = prf_release
> > +};
> > +
> > +/* write() implementation for resetting PGO's profile data. */
> > +static ssize_t reset_write(struct file *file, const char __user *addr,
> > + size_t len, loff_t *pos)
> > +{
> > + struct llvm_prf_data *data;
> > +
> > + memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
> > +
> > + for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; ++data) {
> > + struct llvm_prf_value_node **vnodes;
> > + u64 current_vsite_count;
> > + u32 i;
> > +
> > + if (!data->values)
> > + continue;
> > +
> > + current_vsite_count = 0;
> > + vnodes = (struct llvm_prf_value_node **)data->values;
> > +
> > + for (i = LLVM_PRF_IPVK_FIRST; i <= LLVM_PRF_IPVK_LAST; ++i)
> > + current_vsite_count += data->num_value_sites[i];
> > +
> > + for (i = 0; i < current_vsite_count; ++i) {
> > + struct llvm_prf_value_node *current_vnode = vnodes[i];
> > +
> > + while (current_vnode) {
> > + current_vnode->count = 0;
> > + current_vnode = current_vnode->next;
> > + }
> > + }
> > + }
> > +
> > + return len;
> > +}
> > +
> > +static const struct file_operations prf_reset_fops = {
> > + .owner = THIS_MODULE,
> > + .write = reset_write,
> > + .llseek = noop_llseek,
> > +};
> > +
> > +/* Create debugfs entries. */
> > +static int __init pgo_init(void)
> > +{
> > + directory = debugfs_create_dir("pgo", NULL);
> > + if (!directory)
> > + goto err_remove;
> > +
> > + if (!debugfs_create_file("profraw", 0600, directory, NULL,
> > + &prf_fops))
> > + goto err_remove;
> > +
> > + if (!debugfs_create_file("reset", 0200, directory, NULL,
> > + &prf_reset_fops))
> > + goto err_remove;
> > +
> > + return 0;
> > +
> > +err_remove:
> > + pr_err("initialization failed\n");
> > + return -EIO;
> > +}
> > +
> > +/* Remove debugfs entries. */
> > +static void __exit pgo_exit(void)
> > +{
> > + debugfs_remove_recursive(directory);
> > +}
> > +
> > +module_init(pgo_init);
> > +module_exit(pgo_exit);
> > diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
> > new file mode 100644
> > index 0000000000000..6084ff0652e85
> > --- /dev/null
> > +++ b/kernel/pgo/instrument.c
> > @@ -0,0 +1,185 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + * Sami Tolvanen <[email protected]>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#define pr_fmt(fmt) "pgo: " fmt
> > +
> > +#include <linux/bitops.h>
> > +#include <linux/kernel.h>
> > +#include <linux/export.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/types.h>
> > +#include "pgo.h"
> > +
> > +/* Lock guarding value node access and serialization. */
> > +static DEFINE_SPINLOCK(pgo_lock);
> > +static int current_node;
> > +
> > +unsigned long prf_lock(void)
> > +{
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&pgo_lock, flags);
> > +
> > + return flags;
> > +}
> > +
> > +void prf_unlock(unsigned long flags)
> > +{
> > + spin_unlock_irqrestore(&pgo_lock, flags);
> > +}
> > +
> > +/*
> > + * Return a newly allocated profiling value node which contains the value
> > + * tracked by the value profiler.
> > + * Note: caller *must* hold pgo_lock.
> > + */
> > +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
> > + u32 index, u64 value)
> > +{
> > + if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
> > + return NULL; /* Out of nodes */
> > +
> > + current_node++;
> > +
> > + /* Make sure the node is entirely within the section */
> > + if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
> > + &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
> > + return NULL;
> > +
> > + return &__llvm_prf_vnds_start[current_node];
> > +}
> > +
> > +/*
> > + * Counts the number of times a target value is seen.
> > + *
> > + * Records the target value for the CounterIndex if not seen before. Otherwise,
> > + * increments the counter associated w/ the target value.
> > + */
> > +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
> > +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
> > +{
> > + struct llvm_prf_data *p = (struct llvm_prf_data *)data;
> > + struct llvm_prf_value_node **counters;
> > + struct llvm_prf_value_node *curr;
> > + struct llvm_prf_value_node *min = NULL;
> > + struct llvm_prf_value_node *prev = NULL;
> > + u64 min_count = U64_MAX;
> > + u8 values = 0;
> > + unsigned long flags;
> > +
> > + if (!p || !p->values)
> > + return;
> > +
> > + counters = (struct llvm_prf_value_node **)p->values;
> > + curr = counters[index];
> > +
> > + while (curr) {
> > + if (target_value == curr->value) {
> > + curr->count++;
> > + return;
> > + }
> > +
> > + if (curr->count < min_count) {
> > + min_count = curr->count;
> > + min = curr;
> > + }
> > +
> > + prev = curr;
> > + curr = curr->next;
> > + values++;
> > + }
> > +
> > + if (values >= LLVM_PRF_MAX_NUM_VALS_PER_SITE) {
> > + if (!min->count || !(--min->count)) {
> > + curr = min;
> > + curr->value = target_value;
> > + curr->count++;
> > + }
> > + return;
> > + }
> > +
> > + /* Lock when updating the value node structure. */
> > + flags = prf_lock();
> > +
> > + curr = allocate_node(p, index, target_value);
> > + if (!curr)
> > + goto out;
> > +
> > + curr->value = target_value;
> > + curr->count++;
> > +
> > + if (!counters[index])
> > + counters[index] = curr;
> > + else if (prev && !prev->next)
> > + prev->next = curr;
> > +
> > +out:
> > + prf_unlock(flags);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_target);
> > +
> > +/* Counts the number of times a range of target values is seen. */
> > +void __llvm_profile_instrument_range(u64 target_value, void *data,
> > + u32 index, s64 precise_start,
> > + s64 precise_last, s64 large_value);
> > +void __llvm_profile_instrument_range(u64 target_value, void *data,
> > + u32 index, s64 precise_start,
> > + s64 precise_last, s64 large_value)
> > +{
> > + if (large_value != S64_MIN && (s64)target_value >= large_value)
> > + target_value = large_value;
> > + else if ((s64)target_value < precise_start ||
> > + (s64)target_value > precise_last)
> > + target_value = precise_last + 1;
> > +
> > + __llvm_profile_instrument_target(target_value, data, index);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_range);
> > +
> > +static u64 inst_prof_get_range_rep_value(u64 value)
> > +{
> > + if (value <= 8)
> > + /* The first ranges are individually tracked, use it as is. */
> > + return value;
> > + else if (value >= 513)
> > + /* The last range is mapped to its lowest value. */
> > + return 513;
> > + else if (hweight64(value) == 1)
> > + /* If it's a power of two, use it as is. */
> > + return value;
> > +
> > + /* Otherwise, map to the previous power of two + 1. */
> > + return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
> > +}
> > +
> > +/*
> > + * The target values are partitioned into multiple ranges. The range spec is
> > + * defined in compiler-rt/include/profile/InstrProfData.inc.
> > + */
> > +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> > + u32 counter_index);
> > +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> > + u32 counter_index)
> > +{
> > + u64 rep_value;
> > +
> > + /* Map the target value to the representative value of its range. */
> > + rep_value = inst_prof_get_range_rep_value(target_value);
> > + __llvm_profile_instrument_target(rep_value, data, counter_index);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_memop);
> > diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
> > new file mode 100644
> > index 0000000000000..df0aa278f28bd
> > --- /dev/null
> > +++ b/kernel/pgo/pgo.h
> > @@ -0,0 +1,206 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + * Sami Tolvanen <[email protected]>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#ifndef _PGO_H
> > +#define _PGO_H
> > +
> > +/*
> > + * Note: These internal LLVM definitions must match the compiler version.
> > + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
> > + */
> > +
> > +#ifdef CONFIG_64BIT
> > + #define LLVM_PRF_MAGIC \
> > + ((u64)255 << 56 | \
> > + (u64)'l' << 48 | \
> > + (u64)'p' << 40 | \
> > + (u64)'r' << 32 | \
> > + (u64)'o' << 24 | \
> > + (u64)'f' << 16 | \
> > + (u64)'r' << 8 | \
> > + (u64)129)
> > +#else
> > + #define LLVM_PRF_MAGIC \
> > + ((u64)255 << 56 | \
> > + (u64)'l' << 48 | \
> > + (u64)'p' << 40 | \
> > + (u64)'r' << 32 | \
> > + (u64)'o' << 24 | \
> > + (u64)'f' << 16 | \
> > + (u64)'R' << 8 | \
> > + (u64)129)
> > +#endif
> > +
> > +#define LLVM_PRF_VERSION 5
> > +#define LLVM_PRF_DATA_ALIGN 8
> > +#define LLVM_PRF_IPVK_FIRST 0
> > +#define LLVM_PRF_IPVK_LAST 1
> > +#define LLVM_PRF_MAX_NUM_VALS_PER_SITE 16
> > +
> > +#define LLVM_PRF_VARIANT_MASK_IR (0x1ull << 56)
> > +#define LLVM_PRF_VARIANT_MASK_CSIR (0x1ull << 57)
> > +
> > +/**
> > + * struct llvm_prf_header - represents the raw profile header data structure.
> > + * @magic: the magic token for the file format.
> > + * @version: the version of the file format.
> > + * @data_size: the number of entries in the profile data section.
> > + * @padding_bytes_before_counters: the number of padding bytes before the
> > + * counters.
> > + * @counters_size: the size in bytes of the LLVM profile section containing the
> > + * counters.
> > + * @padding_bytes_after_counters: the number of padding bytes after the
> > + * counters.
> > + * @names_size: the size in bytes of the LLVM profile section containing the
> > + * counters' names.
> > + * @counters_delta: the beginning of the LLVM profile counters section.
> > + * @names_delta: the beginning of the LLVM profile names section.
> > + * @value_kind_last: the last profile value kind.
> > + */
> > +struct llvm_prf_header {
> > + u64 magic;
> > + u64 version;
> > + u64 data_size;
> > + u64 padding_bytes_before_counters;
> > + u64 counters_size;
> > + u64 padding_bytes_after_counters;
> > + u64 names_size;
> > + u64 counters_delta;
> > + u64 names_delta;
> > + u64 value_kind_last;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_data - represents the per-function control structure.
> > + * @name_ref: the reference to the function's name.
> > + * @func_hash: the hash value of the function.
> > + * @counter_ptr: a pointer to the profile counter.
> > + * @function_ptr: a pointer to the function.
> > + * @values: the profiling values associated with this function.
> > + * @num_counters: the number of counters in the function.
> > + * @num_value_sites: the number of value profile sites.
> > + */
> > +struct llvm_prf_data {
> > + const u64 name_ref;
> > + const u64 func_hash;
> > + const void *counter_ptr;
> > + const void *function_ptr;
> > + void *values;
> > + const u32 num_counters;
> > + const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
> > +} __aligned(LLVM_PRF_DATA_ALIGN);
> > +
> > +/**
> > + * struct llvm_prf_value_node_data - represents the data part of the struct
> > + * llvm_prf_value_node data structure.
> > + * @value: the value counters.
> > + * @count: the counters' count.
> > + */
> > +struct llvm_prf_value_node_data {
> > + u64 value;
> > + u64 count;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_node - represents an internal data structure used by
> > + * the value profiler.
> > + * @value: the value counters.
> > + * @count: the counters' count.
> > + * @next: the next value node.
> > + */
> > +struct llvm_prf_value_node {
> > + u64 value;
> > + u64 count;
> > + struct llvm_prf_value_node *next;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_data - represents the value profiling data in indexed
> > + * format.
> > + * @total_size: the total size in bytes including this field.
> > + * @num_value_kinds: the number of value profile kinds that have value profile
> > + * data.
> > + */
> > +struct llvm_prf_value_data {
> > + u32 total_size;
> > + u32 num_value_kinds;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_record - represents the on-disk layout of the value
> > + * profile data of a particular kind for one function.
> > + * @kind: the kind of the value profile record.
> > + * @num_value_sites: the number of value profile sites.
> > + * @site_count_array: the first element of the array that stores the number
> > + * of profiled values for each value site.
> > + */
> > +struct llvm_prf_value_record {
> > + u32 kind;
> > + u32 num_value_sites;
> > + u8 site_count_array[];
> > +};
> > +
> > +#define prf_get_value_record_header_size() \
> > + offsetof(struct llvm_prf_value_record, site_count_array)
> > +#define prf_get_value_record_site_count_size(sites) \
> > + roundup((sites), 8)
> > +#define prf_get_value_record_size(sites) \
> > + (prf_get_value_record_header_size() + \
> > + prf_get_value_record_site_count_size((sites)))
> > +
> > +/* Data sections */
> > +extern struct llvm_prf_data __llvm_prf_data_start[];
> > +extern struct llvm_prf_data __llvm_prf_data_end[];
> > +
> > +extern u64 __llvm_prf_cnts_start[];
> > +extern u64 __llvm_prf_cnts_end[];
> > +
> > +extern char __llvm_prf_names_start[];
> > +extern char __llvm_prf_names_end[];
> > +
> > +extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
> > +extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
> > +
> > +/* Locking for vnodes */
> > +extern unsigned long prf_lock(void);
> > +extern void prf_unlock(unsigned long flags);
> > +
> > +#define __DEFINE_PRF_SIZE(s) \
> > + static inline unsigned long prf_ ## s ## _size(void) \
> > + { \
> > + unsigned long start = \
> > + (unsigned long)__llvm_prf_ ## s ## _start; \
> > + unsigned long end = \
> > + (unsigned long)__llvm_prf_ ## s ## _end; \
> > + return roundup(end - start, \
> > + sizeof(__llvm_prf_ ## s ## _start[0])); \
> > + } \
> > + static inline unsigned long prf_ ## s ## _count(void) \
> > + { \
> > + return prf_ ## s ## _size() / \
> > + sizeof(__llvm_prf_ ## s ## _start[0]); \
> > + }
> > +
> > +__DEFINE_PRF_SIZE(data);
> > +__DEFINE_PRF_SIZE(cnts);
> > +__DEFINE_PRF_SIZE(names);
> > +__DEFINE_PRF_SIZE(vnds);
> > +
> > +#undef __DEFINE_PRF_SIZE
> > +
> > +#endif /* _PGO_H */
> > diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> > index 213677a5ed33e..9b218afb5cb87 100644
> > --- a/scripts/Makefile.lib
> > +++ b/scripts/Makefile.lib
> > @@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
> > $(CFLAGS_GCOV))
> > endif
> >
> > +#
> > +# Enable clang's PGO profiling flags for a file or directory depending on
> > +# variables PGO_PROFILE_obj.o and PGO_PROFILE.
> > +#
> > +ifeq ($(CONFIG_PGO_CLANG),y)
> > +_c_flags += $(if $(patsubst n%,, \
> > + $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
> > + $(CFLAGS_PGO_CLANG))
> > +endif
> > +
> > #
> > # Enable address sanitizer flags for kernel except some files or directories
> > # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
> > --
> > 2.30.0.284.gd98b1dd5eaa7-goog
> >

2021-01-16 20:25:43

by Bill Wendling

Subject: Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

On Sat, Jan 16, 2021 at 9:39 AM Sedat Dilek <[email protected]> wrote:
> On Sat, Jan 16, 2021 at 10:44 AM 'Bill Wendling' via Clang Built Linux
> <[email protected]> wrote:
> >
> > From: Sami Tolvanen <[email protected]>
> >
> > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > profile, the kernel is instrumented with PGO counters, a representative
> > workload is run, and the raw profile data is collected from
> > /sys/kernel/debug/pgo/profraw.
> >
> > The raw profile data must be processed by clang's "llvm-profdata" tool
> > before it can be used during recompilation:
> >
> > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >
> > Multiple raw profiles may be merged during this step.
> >
> > The data can now be used by the compiler:
> >
> > $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> >
> > This initial submission is restricted to x86, as that's the platform we
> > know works. This restriction can be lifted once other platforms have
> > been verified to work with PGO.
> >
> > Note that this method of profiling the kernel is clang-native, unlike
> > the clang support in kernel/gcov.
> >
> > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> >
> > Signed-off-by: Sami Tolvanen <[email protected]>
> > Co-developed-by: Bill Wendling <[email protected]>
> > Signed-off-by: Bill Wendling <[email protected]>
> > ---
> > v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> > testing.
> > - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> > Song's comments.
> > v3: - Added change log section based on Sedat Dilek's comments.
> > v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
> > own popcount implementation, based on Nick Desaulniers's comment.
> > v5: - Correct padding calculation, discovered by Nathan Chancellor.
> > ---
> > Documentation/dev-tools/index.rst | 1 +
> > Documentation/dev-tools/pgo.rst | 127 +++++++++
> > MAINTAINERS | 9 +
> > Makefile | 3 +
> > arch/Kconfig | 1 +
> > arch/x86/Kconfig | 1 +
> > arch/x86/boot/Makefile | 1 +
> > arch/x86/boot/compressed/Makefile | 1 +
> > arch/x86/crypto/Makefile | 2 +
> > arch/x86/entry/vdso/Makefile | 1 +
> > arch/x86/kernel/vmlinux.lds.S | 2 +
> > arch/x86/platform/efi/Makefile | 1 +
> > arch/x86/purgatory/Makefile | 1 +
> > arch/x86/realmode/rm/Makefile | 1 +
> > arch/x86/um/vdso/Makefile | 1 +
> > drivers/firmware/efi/libstub/Makefile | 1 +
> > include/asm-generic/vmlinux.lds.h | 44 +++
> > kernel/Makefile | 1 +
> > kernel/pgo/Kconfig | 35 +++
> > kernel/pgo/Makefile | 5 +
> > kernel/pgo/fs.c | 382 ++++++++++++++++++++++++++
> > kernel/pgo/instrument.c | 185 +++++++++++++
> > kernel/pgo/pgo.h | 206 ++++++++++++++
> > scripts/Makefile.lib | 10 +
> > 24 files changed, 1022 insertions(+)
> > create mode 100644 Documentation/dev-tools/pgo.rst
> > create mode 100644 kernel/pgo/Kconfig
> > create mode 100644 kernel/pgo/Makefile
> > create mode 100644 kernel/pgo/fs.c
> > create mode 100644 kernel/pgo/instrument.c
> > create mode 100644 kernel/pgo/pgo.h
> >
> > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > index f7809c7b1ba9e..8d6418e858062 100644
> > --- a/Documentation/dev-tools/index.rst
> > +++ b/Documentation/dev-tools/index.rst
> > @@ -26,6 +26,7 @@ whole; patches welcome!
> > kgdb
> > kselftest
> > kunit/index
> > + pgo
> >
> >
> > .. only:: subproject and html
> > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> > new file mode 100644
> > index 0000000000000..b7f11d8405b73
> > --- /dev/null
> > +++ b/Documentation/dev-tools/pgo.rst
> > @@ -0,0 +1,127 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +===============================
> > +Using PGO with the Linux kernel
> > +===============================
> > +
> > +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> > +when building with Clang. The profiling data is exported via the ``pgo``
> > +debugfs directory.
> > +
> > +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > +
> > +
> > +Preparation
> > +===========
> > +
> > +Configure the kernel with:
> > +
> > +.. code-block:: make
> > +
> > + CONFIG_DEBUG_FS=y
> > + CONFIG_PGO_CLANG=y
> > +
> > +Note that kernels compiled with profiling flags will be significantly larger
> > +and run slower.
> > +
> > +Profiling data will only become accessible once debugfs has been mounted:
> > +
> > +.. code-block:: sh
> > +
> > + mount -t debugfs none /sys/kernel/debug
> > +
> > +
> > +Customization
> > +=============
> > +
> > +You can enable or disable profiling for individual files and directories by
> > +adding a line similar to the following to the respective kernel Makefile:
> > +
> > +- For a single file (e.g. main.o)
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE_main.o := y
> > +
> > +- For all files in one directory
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE := y
> > +
> > +To exclude files from being profiled use
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE_main.o := n
> > +
> > +and
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE := n
> > +
> > +Only files which are linked to the main kernel image or are compiled as kernel
> > +modules are supported by this mechanism.
> > +
> > +
> > +Files
> > +=====
> > +
> > +The PGO kernel support creates the following files in debugfs:
> > +
> > +``/sys/kernel/debug/pgo``
> > + Parent directory for all PGO-related files.
> > +
> > +``/sys/kernel/debug/pgo/reset``
> > + Global reset file: resets all profiling data to zero when written to.
> > +
> > +``/sys/kernel/debug/pgo/profraw``
> > + The raw PGO data that must be processed with ``llvm-profdata``.
> > +
> > +
> > +Workflow
> > +========
> > +
> > +The PGO kernel can be run on the host or test machines. The data, though,
> > +should be analyzed with tools from the same Clang version that compiled the
> > +kernel. Clang is tolerant of version skew, but it's easier to use the same
> > +Clang version.
> > +
> > +The profiling data is useful for optimizing the kernel, analyzing coverage,
> > +etc. Clang offers tools to perform these tasks.
> > +
> > +Here is an example workflow for profiling an instrumented kernel with PGO and
> > +using the result to optimize the kernel:
> > +
> > +1) Install the kernel on the TEST machine.
> > +
> > +2) Reset the data counters right before running the load tests
> > +
> > + .. code-block:: sh
> > +
> > + $ echo 1 > /sys/kernel/debug/pgo/reset
> > +
>
> I do not get this...
>
> # mount | grep debugfs
> debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
>
> After the load-test...?
>
> echo 0 > /sys/kernel/debug/pgo/reset
>
Writing anything to /sys/kernel/debug/pgo/reset will cause it to reset
the profiling counters. I picked 1 (one) semi-randomly, but it could
be any number, letter, your favorite short story, etc. You don't want
to reset it before collecting the profiling data from your load tests
though.
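
To make the "anything resets it" point concrete, here is a rough userland
model of what the reset handler in the patch does with the written data. It
is only a sketch: model_reset_write() is a made-up name, and the real
reset_write() in kernel/pgo/fs.c also walks the value nodes and zeroes their
counts, which is left out here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Userland model of the reset handler (hypothetical name, illustration only).
 * The payload is never inspected: every counter is zeroed and the full
 * length is reported as consumed, whatever was written.
 */
static size_t model_reset_write(uint64_t *counters, size_t ncounters,
				const char *payload, size_t len)
{
	(void)payload;	/* contents are ignored, as in reset_write() */
	memset(counters, 0, ncounters * sizeof(*counters));
	return len;
}
```

Writing "1", "0", or a whole paragraph gives the same result in this model:
the counters end up at zero and the write reports success for every byte.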

> > +3) Run the load tests.
> > +
> > +4) Collect the raw profile data
> > +
> > + .. code-block:: sh
> > +
> > + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > +
>
> This is only 4.9M, and going by the timestamp it dates from 5 minutes
> before I ran the echo-1 line.
>
> # ll /sys/kernel/debug/pgo
> total 0
> drwxr-xr-x 2 root root 0 16. Jan 17:29 .
> drwx------ 41 root root 0 16. Jan 17:29 ..
> -rw------- 1 root root 0 16. Jan 17:29 profraw
> --w------- 1 root root 0 16. Jan 18:19 reset
>
> # cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
>
> # ll /tmp/vmlinux.profraw
> -rw------- 1 root root 4,9M 16. Jan 17:29 /tmp/vmlinux.profraw
>
> For me there was no prof-data collected from my defconfig kernel-build.
>
The /sys/kernel/debug/pgo/profraw file is read-only. Nothing writes to
it, not even the kernel. All it does is serialize the profiling
counters from a memory location in the kernel into a format that
LLVM's tools can understand.
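
For a feel of how that serialized buffer is laid out, the following sketch
computes the section offsets in the order the patch writes them: header,
data, counters, names, padding to 8 bytes, then value data.
prf_get_padding() uses the same expression as the patch; struct
profraw_layout and profraw_compute_layout() are hypothetical names made up
for this illustration, and the sizes passed in are whatever the caller
assumes.

```c
#include <stddef.h>

/*
 * Same expression as prf_get_padding() in the patch: the number of bytes
 * needed to round size up to a multiple of 8 (0 when already aligned).
 */
static unsigned long prf_get_padding(unsigned long size)
{
	return 7 & (8 - size % 8);
}

/* Hypothetical helper type: byte offsets of each section in the buffer. */
struct profraw_layout {
	unsigned long data_off;
	unsigned long cnts_off;
	unsigned long names_off;
	unsigned long values_off;
	unsigned long total;
};

/* Lay the sections out back to back, padding only after the names. */
static struct profraw_layout profraw_compute_layout(unsigned long header_size,
						    unsigned long data_size,
						    unsigned long cnts_size,
						    unsigned long names_size,
						    unsigned long values_size)
{
	struct profraw_layout l;

	l.data_off = header_size;
	l.cnts_off = l.data_off + data_size;
	l.names_off = l.cnts_off + cnts_size;
	l.values_off = l.names_off + names_size + prf_get_padding(names_size);
	l.total = l.values_off + values_size;
	return l;
}
```

With a 13-byte names section, for instance, three bytes of zero padding
follow it so that the value data starts on an 8-byte boundary.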

> > +5) (Optional) Download the raw profile data to the HOST machine.
> > +
> > +6) Process the raw profile data
> > +
> > + .. code-block:: sh
> > +
> > + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > +
>
> Is that executed in /path/to/linux/git?
>
The llvm-profdata tool is not in the Linux source tree. You need to
grab it from a clang distribution (or build it from clang's git repo).

> > + Note that multiple raw profile data files can be merged during this step.
> > +
> > +7) Rebuild the kernel using the profile data (PGO disabled)
> > +
> > + .. code-block:: sh
> > +
> > + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>
> How big is vmlinux.profdata (make defconfig)?
>
I don't have numbers for this, but from what you listed here, it's ~5M
in size. The size is proportional to the number of counters
instrumented in the kernel.
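
As a back-of-the-envelope illustration for the raw profile read from profraw
(not the merged vmlinux.profdata, which uses a different format), the fixed
part of the buffer can be estimated from the structures in pgo.h. This is a
sketch under assumptions: the model_* structs mirror the patch's layout for a
64-bit kernel with pointers written as uint64_t, and profraw_fixed_size() is
a made-up helper. The names section and any value profiling data come on top
of this.

```c
#include <stdint.h>

/* Model of struct llvm_prf_header from pgo.h: ten u64 fields. */
struct model_prf_header {
	uint64_t fields[10];
};

/*
 * Model of struct llvm_prf_data for a 64-bit kernel. Pointers are
 * represented as uint64_t so the size does not depend on the host.
 */
struct model_prf_data {
	uint64_t name_ref;
	uint64_t func_hash;
	uint64_t counter_ptr;
	uint64_t function_ptr;
	uint64_t values;
	uint32_t num_counters;
	uint16_t num_value_sites[2];
};

/*
 * Hypothetical estimate: header, one data record per instrumented
 * function, and one 64-bit counter per instrumented counter.
 */
static uint64_t profraw_fixed_size(uint64_t nfuncs, uint64_t ncounters)
{
	return sizeof(struct model_prf_header) +
	       nfuncs * sizeof(struct model_prf_data) +
	       ncounters * sizeof(uint64_t);
}
```

Each additional counter costs 8 bytes and each additional instrumented
function one more data record, which is why the file grows with the amount
of instrumented code rather than with how long the workload ran.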

> Do I need to do a full defconfig build or can I stop the build after
> let me say 10mins?
>
You should do a full rebuild. Make sure that PGO is disabled during the rebuild.

-bw

2021-01-17 10:50:46

by Sedat Dilek

Subject: Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

On Sat, Jan 16, 2021 at 9:23 PM Bill Wendling <[email protected]> wrote:
>
> On Sat, Jan 16, 2021 at 9:39 AM Sedat Dilek <[email protected]> wrote:
> > On Sat, Jan 16, 2021 at 10:44 AM 'Bill Wendling' via Clang Built Linux
> > <[email protected]> wrote:
> > >
> > > From: Sami Tolvanen <[email protected]>
> > >
> > > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > > profile, the kernel is instrumented with PGO counters, a representative
> > > workload is run, and the raw profile data is collected from
> > > /sys/kernel/debug/pgo/profraw.
> > >
> > > The raw profile data must be processed by clang's "llvm-profdata" tool
> > > before it can be used during recompilation:
> > >
> > > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > >
> > > Multiple raw profiles may be merged during this step.
> > >
> > > The data can now be used by the compiler:
> > >
> > > $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > >
> > > This initial submission is restricted to x86, as that's the platform we
> > > know works. This restriction can be lifted once other platforms have
> > > been verified to work with PGO.
> > >
> > > Note that this method of profiling the kernel is clang-native, unlike
> > > the clang support in kernel/gcov.
> > >
> > > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > >
> > > Signed-off-by: Sami Tolvanen <[email protected]>
> > > Co-developed-by: Bill Wendling <[email protected]>
> > > Signed-off-by: Bill Wendling <[email protected]>
> > > ---
> > > v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> > > testing.
> > > - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> > > Song's comments.
> > > v3: - Added change log section based on Sedat Dilek's comments.
> > > v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
> > > own popcount implementation, based on Nick Desaulniers's comment.
> > > v5: - Correct padding calculation, discovered by Nathan Chancellor.
> > > ---
> > > Documentation/dev-tools/index.rst | 1 +
> > > Documentation/dev-tools/pgo.rst | 127 +++++++++
> > > MAINTAINERS | 9 +
> > > Makefile | 3 +
> > > arch/Kconfig | 1 +
> > > arch/x86/Kconfig | 1 +
> > > arch/x86/boot/Makefile | 1 +
> > > arch/x86/boot/compressed/Makefile | 1 +
> > > arch/x86/crypto/Makefile | 2 +
> > > arch/x86/entry/vdso/Makefile | 1 +
> > > arch/x86/kernel/vmlinux.lds.S | 2 +
> > > arch/x86/platform/efi/Makefile | 1 +
> > > arch/x86/purgatory/Makefile | 1 +
> > > arch/x86/realmode/rm/Makefile | 1 +
> > > arch/x86/um/vdso/Makefile | 1 +
> > > drivers/firmware/efi/libstub/Makefile | 1 +
> > > include/asm-generic/vmlinux.lds.h | 44 +++
> > > kernel/Makefile | 1 +
> > > kernel/pgo/Kconfig | 35 +++
> > > kernel/pgo/Makefile | 5 +
> > > kernel/pgo/fs.c | 382 ++++++++++++++++++++++++++
> > > kernel/pgo/instrument.c | 185 +++++++++++++
> > > kernel/pgo/pgo.h | 206 ++++++++++++++
> > > scripts/Makefile.lib | 10 +
> > > 24 files changed, 1022 insertions(+)
> > > create mode 100644 Documentation/dev-tools/pgo.rst
> > > create mode 100644 kernel/pgo/Kconfig
> > > create mode 100644 kernel/pgo/Makefile
> > > create mode 100644 kernel/pgo/fs.c
> > > create mode 100644 kernel/pgo/instrument.c
> > > create mode 100644 kernel/pgo/pgo.h
> > >
> > > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > > index f7809c7b1ba9e..8d6418e858062 100644
> > > --- a/Documentation/dev-tools/index.rst
> > > +++ b/Documentation/dev-tools/index.rst
> > > @@ -26,6 +26,7 @@ whole; patches welcome!
> > > kgdb
> > > kselftest
> > > kunit/index
> > > + pgo
> > >
> > >
> > > .. only:: subproject and html
> > > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> > > new file mode 100644
> > > index 0000000000000..b7f11d8405b73
> > > --- /dev/null
> > > +++ b/Documentation/dev-tools/pgo.rst
> > > @@ -0,0 +1,127 @@
> > > +.. SPDX-License-Identifier: GPL-2.0
> > > +
> > > +===============================
> > > +Using PGO with the Linux kernel
> > > +===============================
> > > +
> > > +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> > > +when building with Clang. The profiling data is exported via the ``pgo``
> > > +debugfs directory.
> > > +
> > > +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > +
> > > +
> > > +Preparation
> > > +===========
> > > +
> > > +Configure the kernel with:
> > > +
> > > +.. code-block:: make
> > > +
> > > + CONFIG_DEBUG_FS=y
> > > + CONFIG_PGO_CLANG=y
> > > +
> > > +Note that kernels compiled with profiling flags will be significantly larger
> > > +and run slower.
> > > +
> > > +Profiling data will only become accessible once debugfs has been mounted:
> > > +
> > > +.. code-block:: sh
> > > +
> > > + mount -t debugfs none /sys/kernel/debug
> > > +
> > > +
> > > +Customization
> > > +=============
> > > +
> > > +You can enable or disable profiling for individual files and directories by
> > > +adding a line similar to the following to the respective kernel Makefile:
> > > +
> > > +- For a single file (e.g. main.o)
> > > +
> > > + .. code-block:: make
> > > +
> > > + PGO_PROFILE_main.o := y
> > > +
> > > +- For all files in one directory
> > > +
> > > + .. code-block:: make
> > > +
> > > + PGO_PROFILE := y
> > > +
> > > +To exclude files from being profiled use
> > > +
> > > + .. code-block:: make
> > > +
> > > + PGO_PROFILE_main.o := n
> > > +
> > > +and
> > > +
> > > + .. code-block:: make
> > > +
> > > + PGO_PROFILE := n
> > > +
> > > +Only files which are linked to the main kernel image or are compiled as kernel
> > > +modules are supported by this mechanism.
> > > +
> > > +
> > > +Files
> > > +=====
> > > +
> > > +The PGO kernel support creates the following files in debugfs:
> > > +
> > > +``/sys/kernel/debug/pgo``
> > > + Parent directory for all PGO-related files.
> > > +
> > > +``/sys/kernel/debug/pgo/reset``
> > > + Global reset file: resets all coverage data to zero when written to.
> > > +
> > > +``/sys/kernel/debug/pgo/profraw``
> > > + The raw PGO data that must be processed with ``llvm-profdata``.
> > > +
> > > +
> > > +Workflow
> > > +========
> > > +
> > > +The PGO kernel can be run on the host or test machines. The data though should
> > > +be analyzed with Clang's tools from the same Clang version as the kernel was
> > > +compiled. Clang's tolerant of version skew, but it's easier to use the same
> > > +Clang version.
> > > +
> > > +The profiling data is useful for optimizing the kernel, analyzing coverage,
> > > +etc. Clang offers tools to perform these tasks.
> > > +
> > > +Here is an example workflow for profiling an instrumented kernel with PGO and
> > > +using the result to optimize the kernel:
> > > +
> > > +1) Install the kernel on the TEST machine.
> > > +
> > > +2) Reset the data counters right before running the load tests
> > > +
> > > + .. code-block:: sh
> > > +
> > > + $ echo 1 > /sys/kernel/debug/pgo/reset
> > > +
> >
> > I do not get this...
> >
> > # mount | grep debugfs
> > debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
> >
> > After the load-test...?
> >
> > echo 0 > /sys/kernel/debug/pgo/reset
> >
> Writing anything to /sys/kernel/debug/pgo/reset will cause it to reset
> the profiling counters. I picked 1 (one) semi-randomly, but it could
> be any number, letter, your favorite short story, etc. You don't want
> to reset it before collecting the profiling data from your load tests
> though.
>
> > > +3) Run the load tests.
> > > +
> > > +4) Collect the raw profile data
> > > +
> > > + .. code-block:: sh
> > > +
> > > + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > > +
> >
> > This is only 4,9M small and seen from the date 5mins before I run the
> > echo-1 line.
> >
> > # ll /sys/kernel/debug/pgo
> > insgesamt 0
> > drwxr-xr-x 2 root root 0 16. Jan 17:29 .
> > drwx------ 41 root root 0 16. Jan 17:29 ..
> > -rw------- 1 root root 0 16. Jan 17:29 profraw
> > --w------- 1 root root 0 16. Jan 18:19 reset
> >
> > # cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> >
> > # ll /tmp/vmlinux.profraw
> > -rw------- 1 root root 4,9M 16. Jan 17:29 /tmp/vmlinux.profraw
> >
> > For me there was no prof-data collected from my defconfig kernel-build.
> >
> The /sys/kernel/debug/pgo/profraw file is read-only. Nothing writes to
> it, not even the kernel. All it does is serialize the profiling
> counters from a memory location in the kernel into a format that
> LLVM's tools can understand.
>
> > > +5) (Optional) Download the raw profile data to the HOST machine.
> > > +
> > > +6) Process the raw profile data
> > > +
> > > + .. code-block:: sh
> > > +
> > > + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > +
> >
> > Is that executed in /path/to/linux/git?
> >
> The llvm-profdata tool is not in the linux source tree. You need to
> grab it from a clang distribution (or built from clang's git repo).
>
> > > + Note that multiple raw profile data files can be merged during this step.
> > > +
> > > +7) Rebuild the kernel using the profile data (PGO disabled)
> > > +
> > > + .. code-block:: sh
> > > +
> > > + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> >
> > How big is vmlinux.profdata (make defconfig)?
> >
> I don't have numbers for this, but from what you listed here, it's ~5M
> in size. The size is proportional to the number of counters
> instrumented in the kernel.
>
> > Do I need to do a full defconfig build or can I stop the build after
> > let me say 10mins?
> >
> You should do a full rebuild. Make sure that PGO is disabled during the rebuild.
>

Thanks Bill for all the information.

And sorry if I am being so pedantic.

I have installed my Debian system with Legacy-BIOS enabled.

When I rebuild with KCFLAGS=-fprofile-use=vmlinux.profdata (I use LLVM=1
by default) my system hangs on reboot.

[ diffconfig ]
$ scripts/diffconfig /boot/config-5.11.0-rc3-8-amd64-clang12-pgo
/boot/config-5.11.0-rc3-9-amd64-clang12-pgo
BUILD_SALT "5.11.0-rc3-8-amd64-clang12-pgo" -> "5.11.0-rc3-9-amd64-clang12-pgo"
PGO_CLANG y -> n

[ my make line ]
$ cat ../start-build_5.11.0-rc3-9-amd64-clang12-pgo.txt
dileks 63120 63095 0 06:47 pts/2 00:00:00 /usr/bin/perf_5.10
stat make V=1 -j4 HOSTCC=clang HOSTCXX=clang++ HOSTLD=ld.lld CC=clang
LD=ld.lld LLVM=1 LLVM_IAS=1 PAHOLE=/opt/pahole/bin/pahole
LOCALVERSION=-9-amd64-clang12-pgo KBUILD_VERBOSE=1
KBUILD_BUILD_HOST=iniza [email protected]
KBUILD_BUILD_TIMESTAMP=2021-01-17 bindeb-pkg
KDEB_PKGVERSION=5.11.0~rc3-9~bullseye+dileks1
KCFLAGS=-fprofile-use=vmlinux.profdata

( Yes, 06:47 in the morning :-). )

When I boot the rebuilt Linux kernel I see:

Wrong EFI loader signature
...
Decompressing
Parsing EFI
Performing Relocations done.
Booting the Kernel.

*** SYSTEM HANGS ***
( I waited for approx 1 min )

I tried to turn UEFI support ON and OFF.
No success.

Does Clang-PGO support Legacy-BIOS or is something else wrong?

Thanks.

- Sedat -

2021-01-17 10:57:03

by Sedat Dilek

Subject: Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

On Sun, Jan 17, 2021 at 11:44 AM Sedat Dilek <[email protected]> wrote:

My bootloader is GRUB.

In the UEFI-BIOS settings there is no option to disable secure boot,
just a simple "Use UEFI BIOS" enabled|disabled switch.

Installed Debian packages:

ii grub-common 2.04-12
ii grub-pc 2.04-12
ii grub-pc-bin 2.04-12
ii grub2-common 2.04-12

The link below suggests running this in the GRUB shell:

set check_signatures=no

But this applies only when grub-efi is installed.

- Sedat -

Link: https://unix.stackexchange.com/questions/126286/grub-efi-disable-signature-check

2021-01-17 11:27:52

by Sedat Dilek

Subject: Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

On Sun, Jan 17, 2021 at 11:53 AM Sedat Dilek <[email protected]> wrote:
>
> On Sun, Jan 17, 2021 at 11:44 AM Sedat Dilek <[email protected]> wrote:
> >
> > On Sat, Jan 16, 2021 at 9:23 PM Bill Wendling <[email protected]> wrote:
> > >
> > > On Sat, Jan 16, 2021 at 9:39 AM Sedat Dilek <[email protected]> wrote:
> > > > On Sat, Jan 16, 2021 at 10:44 AM 'Bill Wendling' via Clang Built Linux
> > > > <[email protected]> wrote:
> > > > >
> > > > > From: Sami Tolvanen <[email protected]>
> > > > >
> > > > > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > > > > profile, the kernel is instrumented with PGO counters, a representative
> > > > > workload is run, and the raw profile data is collected from
> > > > > /sys/kernel/debug/pgo/profraw.
> > > > >
> > > > > The raw profile data must be processed by clang's "llvm-profdata" tool
> > > > > before it can be used during recompilation:
> > > > >
> > > > > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > > > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > >
> > > > > Multiple raw profiles may be merged during this step.
> > > > >
> > > > > The data can now be used by the compiler:
> > > > >
> > > > > $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > > > >
> > > > > This initial submission is restricted to x86, as that's the platform we
> > > > > know works. This restriction can be lifted once other platforms have
> > > > > been verified to work with PGO.
> > > > >
> > > > > Note that this method of profiling the kernel is clang-native, unlike
> > > > > the clang support in kernel/gcov.
> > > > >
> > > > > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > > >
> > > > > Signed-off-by: Sami Tolvanen <[email protected]>
> > > > > Co-developed-by: Bill Wendling <[email protected]>
> > > > > Signed-off-by: Bill Wendling <[email protected]>
> > > > > ---
> > > > > v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> > > > > testing.
> > > > > - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> > > > > Song's comments.
> > > > > v3: - Added change log section based on Sedat Dilek's comments.
> > > > > v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
> > > > > own popcount implementation, based on Nick Desaulniers's comment.
> > > > > v5: - Correct padding calculation, discovered by Nathan Chancellor.
> > > > > ---
> > > > > Documentation/dev-tools/index.rst | 1 +
> > > > > Documentation/dev-tools/pgo.rst | 127 +++++++++
> > > > > MAINTAINERS | 9 +
> > > > > Makefile | 3 +
> > > > > arch/Kconfig | 1 +
> > > > > arch/x86/Kconfig | 1 +
> > > > > arch/x86/boot/Makefile | 1 +
> > > > > arch/x86/boot/compressed/Makefile | 1 +
> > > > > arch/x86/crypto/Makefile | 2 +
> > > > > arch/x86/entry/vdso/Makefile | 1 +
> > > > > arch/x86/kernel/vmlinux.lds.S | 2 +
> > > > > arch/x86/platform/efi/Makefile | 1 +
> > > > > arch/x86/purgatory/Makefile | 1 +
> > > > > arch/x86/realmode/rm/Makefile | 1 +
> > > > > arch/x86/um/vdso/Makefile | 1 +
> > > > > drivers/firmware/efi/libstub/Makefile | 1 +
> > > > > include/asm-generic/vmlinux.lds.h | 44 +++
> > > > > kernel/Makefile | 1 +
> > > > > kernel/pgo/Kconfig | 35 +++
> > > > > kernel/pgo/Makefile | 5 +
> > > > > kernel/pgo/fs.c | 382 ++++++++++++++++++++++++++
> > > > > kernel/pgo/instrument.c | 185 +++++++++++++
> > > > > kernel/pgo/pgo.h | 206 ++++++++++++++
> > > > > scripts/Makefile.lib | 10 +
> > > > > 24 files changed, 1022 insertions(+)
> > > > > create mode 100644 Documentation/dev-tools/pgo.rst
> > > > > create mode 100644 kernel/pgo/Kconfig
> > > > > create mode 100644 kernel/pgo/Makefile
> > > > > create mode 100644 kernel/pgo/fs.c
> > > > > create mode 100644 kernel/pgo/instrument.c
> > > > > create mode 100644 kernel/pgo/pgo.h
> > > > >
> > > > > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > > > > index f7809c7b1ba9e..8d6418e858062 100644
> > > > > --- a/Documentation/dev-tools/index.rst
> > > > > +++ b/Documentation/dev-tools/index.rst
> > > > > @@ -26,6 +26,7 @@ whole; patches welcome!
> > > > > kgdb
> > > > > kselftest
> > > > > kunit/index
> > > > > + pgo
> > > > >
> > > > >
> > > > > .. only:: subproject and html
> > > > > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> > > > > new file mode 100644
> > > > > index 0000000000000..b7f11d8405b73
> > > > > --- /dev/null
> > > > > +++ b/Documentation/dev-tools/pgo.rst
> > > > > @@ -0,0 +1,127 @@
> > > > > +.. SPDX-License-Identifier: GPL-2.0
> > > > > +
> > > > > +===============================
> > > > > +Using PGO with the Linux kernel
> > > > > +===============================
> > > > > +
> > > > > +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> > > > > +when building with Clang. The profiling data is exported via the ``pgo``
> > > > > +debugfs directory.
> > > > > +
> > > > > +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > > > +
> > > > > +
> > > > > +Preparation
> > > > > +===========
> > > > > +
> > > > > +Configure the kernel with:
> > > > > +
> > > > > +.. code-block:: make
> > > > > +
> > > > > + CONFIG_DEBUG_FS=y
> > > > > + CONFIG_PGO_CLANG=y
> > > > > +
> > > > > +Note that kernels compiled with profiling flags will be significantly larger
> > > > > +and run slower.
> > > > > +
> > > > > +Profiling data will only become accessible once debugfs has been mounted:
> > > > > +
> > > > > +.. code-block:: sh
> > > > > +
> > > > > + mount -t debugfs none /sys/kernel/debug
> > > > > +
> > > > > +
> > > > > +Customization
> > > > > +=============
> > > > > +
> > > > > +You can enable or disable profiling for individual files and directories by
> > > > > +adding a line similar to the following to the respective kernel Makefile:
> > > > > +
> > > > > +- For a single file (e.g. main.o)
> > > > > +
> > > > > + .. code-block:: make
> > > > > +
> > > > > + PGO_PROFILE_main.o := y
> > > > > +
> > > > > +- For all files in one directory
> > > > > +
> > > > > + .. code-block:: make
> > > > > +
> > > > > + PGO_PROFILE := y
> > > > > +
> > > > > +To exclude files from being profiled use
> > > > > +
> > > > > + .. code-block:: make
> > > > > +
> > > > > + PGO_PROFILE_main.o := n
> > > > > +
> > > > > +and
> > > > > +
> > > > > + .. code-block:: make
> > > > > +
> > > > > + PGO_PROFILE := n
> > > > > +
> > > > > +Only files which are linked to the main kernel image or are compiled as kernel
> > > > > +modules are supported by this mechanism.
> > > > > +
> > > > > +
> > > > > +Files
> > > > > +=====
> > > > > +
> > > > > +The PGO kernel support creates the following files in debugfs:
> > > > > +
> > > > > +``/sys/kernel/debug/pgo``
> > > > > + Parent directory for all PGO-related files.
> > > > > +
> > > > > +``/sys/kernel/debug/pgo/reset``
> > > > > + Global reset file: resets all coverage data to zero when written to.
> > > > > +
> > > > > +``/sys/kernel/debug/pgo/profraw``
> > > > > + The raw PGO data that must be processed with ``llvm-profdata``.
> > > > > +
> > > > > +
> > > > > +Workflow
> > > > > +========
> > > > > +
> > > > > +The PGO kernel can be run on the host or test machines. The data though should
> > > > > +be analyzed with Clang's tools from the same Clang version as the kernel was
> > > > > +compiled. Clang's tolerant of version skew, but it's easier to use the same
> > > > > +Clang version.
> > > > > +
> > > > > +The profiling data is useful for optimizing the kernel, analyzing coverage,
> > > > > +etc. Clang offers tools to perform these tasks.
> > > > > +
> > > > > +Here is an example workflow for profiling an instrumented kernel with PGO and
> > > > > +using the result to optimize the kernel:
> > > > > +
> > > > > +1) Install the kernel on the TEST machine.
> > > > > +
> > > > > +2) Reset the data counters right before running the load tests
> > > > > +
> > > > > + .. code-block:: sh
> > > > > +
> > > > > + $ echo 1 > /sys/kernel/debug/pgo/reset
> > > > > +
> > > >
> > > > I do not get this...
> > > >
> > > > # mount | grep debugfs
> > > > debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
> > > >
> > > > After the load-test...?
> > > >
> > > > echo 0 > /sys/kernel/debug/pgo/reset
> > > >
> > > Writing anything to /sys/kernel/debug/pgo/reset will cause it to reset
> > > the profiling counters. I picked 1 (one) semi-randomly, but it could
> > > be any number, letter, your favorite short story, etc. You don't want
> > > to reset it before collecting the profiling data from your load tests
> > > though.
> > >
> > > > > +3) Run the load tests.
> > > > > +
> > > > > +4) Collect the raw profile data
> > > > > +
> > > > > + .. code-block:: sh
> > > > > +
> > > > > + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > > > > +
> > > >
> > > > This is only 4,9M small and seen from the date 5mins before I run the
> > > > echo-1 line.
> > > >
> > > > # ll /sys/kernel/debug/pgo
> > > > insgesamt 0
> > > > drwxr-xr-x 2 root root 0 16. Jan 17:29 .
> > > > drwx------ 41 root root 0 16. Jan 17:29 ..
> > > > -rw------- 1 root root 0 16. Jan 17:29 profraw
> > > > --w------- 1 root root 0 16. Jan 18:19 reset
> > > >
> > > > # cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > > >
> > > > # ll /tmp/vmlinux.profraw
> > > > -rw------- 1 root root 4,9M 16. Jan 17:29 /tmp/vmlinux.profraw
> > > >
> > > > For me there was no prof-data collected from my defconfig kernel-build.
> > > >
> > > The /sys/kernel/debug/pgo/profraw file is read-only. Nothing writes to
> > > it, not even the kernel. All it does is serialize the profiling
> > > counters from a memory location in the kernel into a format that
> > > LLVM's tools can understand.
> > >
> > > > > +5) (Optional) Download the raw profile data to the HOST machine.
> > > > > +
> > > > > +6) Process the raw profile data
> > > > > +
> > > > > + .. code-block:: sh
> > > > > +
> > > > > + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > > +
> > > >
> > > > Is that executed in /path/to/linux/git?
> > > >
> > > The llvm-profdata tool is not in the linux source tree. You need to
> > > grab it from a clang distribution (or built from clang's git repo).
> > >
> > > > > + Note that multiple raw profile data files can be merged during this step.
> > > > > +
> > > > > +7) Rebuild the kernel using the profile data (PGO disabled)
> > > > > +
> > > > > + .. code-block:: sh
> > > > > +
> > > > > + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > > >
> > > > How big is vmlinux.profdata (make defconfig)?
> > > >
> > > I don't have numbers for this, but from what you listed here, it's ~5M
> > > in size. The size is proportional to the number of counters
> > > instrumented in the kernel.
> > >
> > > > Do I need to do a full defconfig build or can I stop the build after
> > > > let me say 10mins?
> > > >
> > > You should do a full rebuild. Make sure that PGO is disabled during the rebuild.
> > >
> >
> > Thanks Bill for all the information.
> >
> > And sorry if I am so pedantic.
> >
> > I have installed my Debian system with Legacy-BIOS enabled.
> >
> > When I rebuild with KCFLAGS=-fprofile-use=vmlinux.profdata (LLVM=1 I
> > have as a default) my system hangs on reboot.
> >
> > [ diffconfig ]
> > $ scripts/diffconfig /boot/config-5.11.0-rc3-8-amd64-clang12-pgo
> > /boot/config-5.11.0-rc3-9-amd64-clang12-pgo
> > BUILD_SALT "5.11.0-rc3-8-amd64-clang12-pgo" -> "5.11.0-rc3-9-amd64-clang12-pgo"
> > PGO_CLANG y -> n
> >
> > [ my make line ]
> > $ cat ../start-build_5.11.0-rc3-9-amd64-clang12-pgo.txt
> > dileks 63120 63095 0 06:47 pts/2 00:00:00 /usr/bin/perf_5.10
> > stat make V=1 -j4 HOSTCC=clang HOSTCXX=clang++ HOSTLD=ld.lld CC=clang
> > LD=ld.lld LLVM=1 LLVM_IAS=1 PAHOLE=/opt/pahole/bin/pahole
> > LOCALVERSION=-9-amd64-clang12-pgo KBUILD_VERBOSE=1
> > KBUILD_BUILD_HOST=iniza [email protected]
> > KBUILD_BUILD_TIMESTAMP=2021-01-17 bindeb-pkg
> > KDEB_PKGVERSION=5.11.0~rc3-9~bullseye+dileks1
> > KCFLAGS=-fprofile-use=vmlinux.profdata
> >
> > ( Yes, 06:47 a.m. in the morning :-). )
> >
> > When I boot with the rebuild Linux-kernel I see:
> >
> > Wrong EFI loader signature
> > ...
> > Decompressing
> > Parsing EFI
> > Performing Relocations done.
> > Booting the Kernel.
> >
> > *** SYSTEM HANGS ***
> > ( I waited for approx 1 min )
> >
> > I tried to turn UEFI support ON and OFF.
> > No success.
> >
> > Does Clang-PGO support Legacy-BIOS or is something different wrong?
> >
> > Thanks.
> >
>
> My bootloader is GRUB.
>
> In UEFI-BIOS settings there is no secure-boot disable option.
> Just simple "Use UEFI BIOS" enabled|disabled.
>
> Installed Debian packages:
>
> ii grub-common 2.04-12
> ii grub-pc 2.04-12
> ii grub-pc-bin 2.04-12
> ii grub2-common 2.04-12
>
> I found in the below link to do in grub-shell:
>
> set check_signatures=no
>
> But this is when grub-efi is installed.
>
> - Sedat -
>
> Link: https://unix.stackexchange.com/questions/126286/grub-efi-disable-signature-check

Forget about that "Wrong EFI loader signature" message - I see it with
all my other kernels, and they all boot fine.

I tried in QEMU with and without KASLR:

[ run_qemu.sh ]
KPATH=$(pwd)

APPEND="root=/dev/ram0 console=ttyS0 hung_task_panic=1 earlyprintk=ttyS0,115200"
APPEND="$APPEND nokaslr"

qemu-system-x86_64 -enable-kvm -M pc -kernel "$KPATH/bzImage" \
    -initrd "$KPATH/initrd.img" -m 512 -net none -serial stdio \
    -append "${APPEND}"
[ /run_qemu.sh ]

$ ./run_qemu.sh
Probing EDD (edd=off to disable)... ok
Wrong EFI loader signature.
early console in extract_kernel
input_data: 0x000000000289940d
input_len: 0x000000000069804a
output: 0x0000000001000000
output_len: 0x0000000001ef2010
kernel_total_size: 0x0000000001c2c000
needed_size: 0x0000000002000000
trampoline_32bit: 0x000000000009d000


KASLR disabled: 'nokaslr' on cmdline.


Decompressing Linux... Parsing ELF... No relocation needed... done.
Booting the kernel.

The QEMU run hangs at this point, too.
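
For anyone who wants to repeat the with/without-KASLR comparison, the two
QEMU invocations could be generated roughly like this. It is only a sketch:
qemu_argv and BASE_APPEND are names made up for illustration, and the options
are simply the ones from run_qemu.sh, not a recommended configuration.

```python
import shlex

# Kernel command line taken from run_qemu.sh, without the KASLR switch.
BASE_APPEND = (
    "root=/dev/ram0 console=ttyS0 hung_task_panic=1 "
    "earlyprintk=ttyS0,115200"
)


def qemu_argv(kpath, kaslr):
    """Build the qemu-system-x86_64 argument list for one test boot.

    kpath is the directory holding bzImage and initrd.img;
    kaslr=False appends "nokaslr" to the kernel command line.
    """
    append = BASE_APPEND if kaslr else BASE_APPEND + " nokaslr"
    return [
        "qemu-system-x86_64", "-enable-kvm", "-M", "pc",
        "-kernel", f"{kpath}/bzImage",
        "-initrd", f"{kpath}/initrd.img",
        "-m", "512", "-net", "none", "-serial", "stdio",
        "-append", append,
    ]


if __name__ == "__main__":
    # Print both command lines so they can be pasted into a shell.
    for kaslr in (False, True):
        print(shlex.join(qemu_argv(".", kaslr)))
```

Building an argument list instead of a single string keeps the -append value
as one argument, so it needs no extra shell quoting when passed to
subprocess.run().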

- Sedat

2021-01-17 12:02:04

by Sedat Dilek

Subject: Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

On Sun, Jan 17, 2021 at 12:42 PM Sedat Dilek <[email protected]> wrote:
>
> On Sun, Jan 17, 2021 at 12:23 PM Sedat Dilek <[email protected]> wrote:
> >
> > On Sun, Jan 17, 2021 at 11:53 AM Sedat Dilek <[email protected]> wrote:
> > >
> > > On Sun, Jan 17, 2021 at 11:44 AM Sedat Dilek <[email protected]> wrote:
> > > >
> > > > On Sat, Jan 16, 2021 at 9:23 PM Bill Wendling <[email protected]> wrote:
> > > > >
> > > > > On Sat, Jan 16, 2021 at 9:39 AM Sedat Dilek <[email protected]> wrote:
> > > > > > On Sat, Jan 16, 2021 at 10:44 AM 'Bill Wendling' via Clang Built Linux
> > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > From: Sami Tolvanen <[email protected]>
> > > > > > >
> > > > > > > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > > > > > > profile, the kernel is instrumented with PGO counters, a representative
> > > > > > > workload is run, and the raw profile data is collected from
> > > > > > > /sys/kernel/debug/pgo/profraw.
> > > > > > >
> > > > > > > The raw profile data must be processed by clang's "llvm-profdata" tool
> > > > > > > before it can be used during recompilation:
> > > > > > >
> > > > > > > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > > > > > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > > > >
> > > > > > > Multiple raw profiles may be merged during this step.
> > > > > > >
> > > > > > > The data can now be used by the compiler:
> > > > > > >
> > > > > > > $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > > > > > >
> > > > > > > This initial submission is restricted to x86, as that's the platform we
> > > > > > > know works. This restriction can be lifted once other platforms have
> > > > > > > been verified to work with PGO.
> > > > > > >
> > > > > > > Note that this method of profiling the kernel is clang-native, unlike
> > > > > > > the clang support in kernel/gcov.
> > > > > > >
> > > > > > > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > > > > >
> > > > > > > Signed-off-by: Sami Tolvanen <[email protected]>
> > > > > > > Co-developed-by: Bill Wendling <[email protected]>
> > > > > > > Signed-off-by: Bill Wendling <[email protected]>
> > > > > > > ---
> > > > > > > v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> > > > > > > testing.
> > > > > > > - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> > > > > > > Song's comments.
> > > > > > > v3: - Added change log section based on Sedat Dilek's comments.
> > > > > > > > v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
> > > > > > > own popcount implementation, based on Nick Desaulniers's comment.
> > > > > > > v5: - Correct padding calculation, discovered by Nathan Chancellor.
> > > > > > > ---
> > > > > > > Documentation/dev-tools/index.rst | 1 +
> > > > > > > Documentation/dev-tools/pgo.rst | 127 +++++++++
> > > > > > > MAINTAINERS | 9 +
> > > > > > > Makefile | 3 +
> > > > > > > arch/Kconfig | 1 +
> > > > > > > arch/x86/Kconfig | 1 +
> > > > > > > arch/x86/boot/Makefile | 1 +
> > > > > > > arch/x86/boot/compressed/Makefile | 1 +
> > > > > > > arch/x86/crypto/Makefile | 2 +
> > > > > > > arch/x86/entry/vdso/Makefile | 1 +
> > > > > > > arch/x86/kernel/vmlinux.lds.S | 2 +
> > > > > > > arch/x86/platform/efi/Makefile | 1 +
> > > > > > > arch/x86/purgatory/Makefile | 1 +
> > > > > > > arch/x86/realmode/rm/Makefile | 1 +
> > > > > > > arch/x86/um/vdso/Makefile | 1 +
> > > > > > > drivers/firmware/efi/libstub/Makefile | 1 +
> > > > > > > include/asm-generic/vmlinux.lds.h | 44 +++
> > > > > > > kernel/Makefile | 1 +
> > > > > > > kernel/pgo/Kconfig | 35 +++
> > > > > > > kernel/pgo/Makefile | 5 +
> > > > > > > kernel/pgo/fs.c | 382 ++++++++++++++++++++++++++
> > > > > > > kernel/pgo/instrument.c | 185 +++++++++++++
> > > > > > > kernel/pgo/pgo.h | 206 ++++++++++++++
> > > > > > > scripts/Makefile.lib | 10 +
> > > > > > > 24 files changed, 1022 insertions(+)
> > > > > > > create mode 100644 Documentation/dev-tools/pgo.rst
> > > > > > > create mode 100644 kernel/pgo/Kconfig
> > > > > > > create mode 100644 kernel/pgo/Makefile
> > > > > > > create mode 100644 kernel/pgo/fs.c
> > > > > > > create mode 100644 kernel/pgo/instrument.c
> > > > > > > create mode 100644 kernel/pgo/pgo.h
> > > > > > >
> > > > > > > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > > > > > > index f7809c7b1ba9e..8d6418e858062 100644
> > > > > > > --- a/Documentation/dev-tools/index.rst
> > > > > > > +++ b/Documentation/dev-tools/index.rst
> > > > > > > @@ -26,6 +26,7 @@ whole; patches welcome!
> > > > > > > kgdb
> > > > > > > kselftest
> > > > > > > kunit/index
> > > > > > > + pgo
> > > > > > >
> > > > > > >
> > > > > > > .. only:: subproject and html
> > > > > > > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> > > > > > > new file mode 100644
> > > > > > > index 0000000000000..b7f11d8405b73
> > > > > > > --- /dev/null
> > > > > > > +++ b/Documentation/dev-tools/pgo.rst
> > > > > > > @@ -0,0 +1,127 @@
> > > > > > > +.. SPDX-License-Identifier: GPL-2.0
> > > > > > > +
> > > > > > > +===============================
> > > > > > > +Using PGO with the Linux kernel
> > > > > > > +===============================
> > > > > > > +
> > > > > > > +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> > > > > > > +when building with Clang. The profiling data is exported via the ``pgo``
> > > > > > > +debugfs directory.
> > > > > > > +
> > > > > > > +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > > > > > +
> > > > > > > +
> > > > > > > +Preparation
> > > > > > > +===========
> > > > > > > +
> > > > > > > +Configure the kernel with:
> > > > > > > +
> > > > > > > +.. code-block:: make
> > > > > > > +
> > > > > > > + CONFIG_DEBUG_FS=y
> > > > > > > + CONFIG_PGO_CLANG=y
> > > > > > > +
> > > > > > > +Note that kernels compiled with profiling flags will be significantly larger
> > > > > > > +and run slower.
> > > > > > > +
> > > > > > > +Profiling data will only become accessible once debugfs has been mounted:
> > > > > > > +
> > > > > > > +.. code-block:: sh
> > > > > > > +
> > > > > > > + mount -t debugfs none /sys/kernel/debug
> > > > > > > +
> > > > > > > +
> > > > > > > +Customization
> > > > > > > +=============
> > > > > > > +
> > > > > > > > +You can enable or disable profiling for individual files and directories by
> > > > > > > +adding a line similar to the following to the respective kernel Makefile:
> > > > > > > +
> > > > > > > +- For a single file (e.g. main.o)
> > > > > > > +
> > > > > > > + .. code-block:: make
> > > > > > > +
> > > > > > > + PGO_PROFILE_main.o := y
> > > > > > > +
> > > > > > > +- For all files in one directory
> > > > > > > +
> > > > > > > + .. code-block:: make
> > > > > > > +
> > > > > > > + PGO_PROFILE := y
> > > > > > > +
> > > > > > > +To exclude files from being profiled use
> > > > > > > +
> > > > > > > + .. code-block:: make
> > > > > > > +
> > > > > > > + PGO_PROFILE_main.o := n
> > > > > > > +
> > > > > > > +and
> > > > > > > +
> > > > > > > + .. code-block:: make
> > > > > > > +
> > > > > > > + PGO_PROFILE := n
> > > > > > > +
> > > > > > > +Only files which are linked to the main kernel image or are compiled as kernel
> > > > > > > +modules are supported by this mechanism.
> > > > > > > +
> > > > > > > +
> > > > > > > +Files
> > > > > > > +=====
> > > > > > > +
> > > > > > > +The PGO kernel support creates the following files in debugfs:
> > > > > > > +
> > > > > > > +``/sys/kernel/debug/pgo``
> > > > > > > + Parent directory for all PGO-related files.
> > > > > > > +
> > > > > > > +``/sys/kernel/debug/pgo/reset``
> > > > > > > + Global reset file: resets all coverage data to zero when written to.
> > > > > > > +
> > > > > > > > +``/sys/kernel/debug/pgo/profraw``
> > > > > > > > + The raw PGO data that must be processed with ``llvm-profdata``.
> > > > > > > +
> > > > > > > +
> > > > > > > +Workflow
> > > > > > > +========
> > > > > > > +
> > > > > > > +The PGO kernel can be run on the host or test machines. The data though should
> > > > > > > +be analyzed with Clang's tools from the same Clang version as the kernel was
> > > > > > > +compiled. Clang's tolerant of version skew, but it's easier to use the same
> > > > > > > +Clang version.
> > > > > > > +
> > > > > > > +The profiling data is useful for optimizing the kernel, analyzing coverage,
> > > > > > > +etc. Clang offers tools to perform these tasks.
> > > > > > > +
> > > > > > > +Here is an example workflow for profiling an instrumented kernel with PGO and
> > > > > > > +using the result to optimize the kernel:
> > > > > > > +
> > > > > > > +1) Install the kernel on the TEST machine.
> > > > > > > +
> > > > > > > +2) Reset the data counters right before running the load tests
> > > > > > > +
> > > > > > > + .. code-block:: sh
> > > > > > > +
> > > > > > > + $ echo 1 > /sys/kernel/debug/pgo/reset
> > > > > > > +
> > > > > >
> > > > > > I do not get this...
> > > > > >
> > > > > > # mount | grep debugfs
> > > > > > debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
> > > > > >
> > > > > > After the load-test...?
> > > > > >
> > > > > > echo 0 > /sys/kernel/debug/pgo/reset
> > > > > >
> > > > > Writing anything to /sys/kernel/debug/pgo/reset will cause it to reset
> > > > > the profiling counters. I picked 1 (one) semi-randomly, but it could
> > > > > be any number, letter, your favorite short story, etc. You don't want
> > > > > to reset it before collecting the profiling data from your load tests
> > > > > though.
> > > > >
> > > > > > > +3) Run the load tests.
> > > > > > > +
> > > > > > > +4) Collect the raw profile data
> > > > > > > +
> > > > > > > + .. code-block:: sh
> > > > > > > +
> > > > > > > + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > > > > > > +
> > > > > >
> > > > > > This is only 4,9M small and seen from the date 5mins before I run the
> > > > > > echo-1 line.
> > > > > >
> > > > > > # ll /sys/kernel/debug/pgo
> > > > > > insgesamt 0
> > > > > > drwxr-xr-x 2 root root 0 16. Jan 17:29 .
> > > > > > drwx------ 41 root root 0 16. Jan 17:29 ..
> > > > > > -rw------- 1 root root 0 16. Jan 17:29 profraw
> > > > > > --w------- 1 root root 0 16. Jan 18:19 reset
> > > > > >
> > > > > > # cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > > > > >
> > > > > > # ll /tmp/vmlinux.profraw
> > > > > > -rw------- 1 root root 4,9M 16. Jan 17:29 /tmp/vmlinux.profraw
> > > > > >
> > > > > > For me there was no prof-data collected from my defconfig kernel-build.
> > > > > >
> > > > > The /sys/kernel/debug/pgo/profraw file is read-only. Nothing writes to
> > > > > it, not even the kernel. All it does is serialize the profiling
> > > > > counters from a memory location in the kernel into a format that
> > > > > LLVM's tools can understand.
> > > > >
> > > > > > > +5) (Optional) Download the raw profile data to the HOST machine.
> > > > > > > +
> > > > > > > +6) Process the raw profile data
> > > > > > > +
> > > > > > > + .. code-block:: sh
> > > > > > > +
> > > > > > > + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > > > > +
> > > > > >
> > > > > > Is that executed in /path/to/linux/git?
> > > > > >
> > > > > The llvm-profdata tool is not in the linux source tree. You need to
> > > > > grab it from a clang distribution (or built from clang's git repo).
> > > > >
> > > > > > > + Note that multiple raw profile data files can be merged during this step.
> > > > > > > +
> > > > > > > +7) Rebuild the kernel using the profile data (PGO disabled)
> > > > > > > +
> > > > > > > + .. code-block:: sh
> > > > > > > +
> > > > > > > + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > > > > >
> > > > > > How big is vmlinux.profdata (make defconfig)?
> > > > > >
> > > > > I don't have numbers for this, but from what you listed here, it's ~5M
> > > > > in size. The size is proportional to the number of counters
> > > > > instrumented in the kernel.
> > > > >
> > > > > > Do I need to do a full defconfig build or can I stop the build after
> > > > > > let me say 10mins?
> > > > > >
> > > > > You should do a full rebuild. Make sure that PGO is disabled during the rebuild.
> > > > >
> > > >
> > > > Thanks Bill for all the information.
> > > >
> > > > And sorry if I am so pedantic.
> > > >
> > > > I have installed my Debian system with Legacy-BIOS enabled.
> > > >
> > > > When I rebuild with KCFLAGS=-fprofile-use=vmlinux.profdata (LLVM=1 I
> > > > have as a default) my system hangs on reboot.
> > > >
> > > > [ diffconfig ]
> > > > $ scripts/diffconfig /boot/config-5.11.0-rc3-8-amd64-clang12-pgo
> > > > /boot/config-5.11.0-rc3-9-amd64-clang12-pgo
> > > > BUILD_SALT "5.11.0-rc3-8-amd64-clang12-pgo" -> "5.11.0-rc3-9-amd64-clang12-pgo"
> > > > PGO_CLANG y -> n
> > > >
> > > > [ my make line ]
> > > > $ cat ../start-build_5.11.0-rc3-9-amd64-clang12-pgo.txt
> > > > dileks 63120 63095 0 06:47 pts/2 00:00:00 /usr/bin/perf_5.10
> > > > stat make V=1 -j4 HOSTCC=clang HOSTCXX=clang++ HOSTLD=ld.lld CC=clang
> > > > LD=ld.lld LLVM=1 LLVM_IAS=1 PAHOLE=/opt/pahole/bin/pahole
> > > > LOCALVERSION=-9-amd64-clang12-pgo KBUILD_VERBOSE=1
> > > > KBUILD_BUILD_HOST=iniza [email protected]
> > > > KBUILD_BUILD_TIMESTAMP=2021-01-17 bindeb-pkg
> > > > KDEB_PKGVERSION=5.11.0~rc3-9~bullseye+dileks1
> > > > KCFLAGS=-fprofile-use=vmlinux.profdata
> > > >
> > > > ( Yes, 06:47 a.m. in the morning :-). )
> > > >
> > > > When I boot with the rebuild Linux-kernel I see:
> > > >
> > > > Wrong EFI loader signature
> > > > ...
> > > > Decompressing
> > > > Parsing EFI
> > > > Performing Relocations done.
> > > > Booting the Kernel.
> > > >
> > > > *** SYSTEM HANGS ***
> > > > ( I waited for approx 1 min )
> > > >
> > > > I tried to turn UEFI support ON and OFF.
> > > > No success.
> > > >
> > > > Does Clang-PGO support Legacy-BIOS or is something different wrong?
> > > >
> > > > Thanks.
> > > >
> > >
> > > My bootloader is GRUB.
> > >
> > > In UEFI-BIOS settings there is no secure-boot disable option.
> > > Just simple "Use UEFI BIOS" enabled|disabled.
> > >
> > > Installed Debian packages:
> > >
> > > ii grub-common 2.04-12
> > > ii grub-pc 2.04-12
> > > ii grub-pc-bin 2.04-12
> > > ii grub2-common 2.04-12
> > >
> > > I found in the below link to do in grub-shell:
> > >
> > > set check_signatures=no
> > >
> > > But this is when grub-efi is installed.
> > >
> > > - Sedat -
> > >
> > > Link: https://unix.stackexchange.com/questions/126286/grub-efi-disable-signature-check
> >
> > Forget about that "Wrong EFI bootloader" - I see this with all other
> > kernels (all boot fine).
> >
> > I tried in QEMU with and without KASLR:
> >
> > [ run_qemu.sh ]
> > KPATH=$(pwd)
> >
> > APPEND="root=/dev/ram0 console=ttyS0 hung_task_panic=1 earlyprintk=ttyS0,115200"
> > APPEND="$APPEND nokaslr"
> >
> > qemu-system-x86_64 -enable-kvm -M pc -kernel $KPATH/bzImage -initrd
> > $KPATH/initrd.img -m 512 -net none -serial stdio -append "${APPEND}"
> > [ /run_qemu.sh ]
> >
> > $ ./run_qemu.sh
> > Probing EDD (edd=off to disable)... ok
> > Wrong EFI loader signature.
> > early console in extract_kernel
> > input_data: 0x000000000289940d
> > input_len: 0x000000000069804a
> > output: 0x0000000001000000
> > output_len: 0x0000000001ef2010
> > kernel_total_size: 0x0000000001c2c000
> > needed_size: 0x0000000002000000
> > trampoline_32bit: 0x000000000009d000
> >
> >
> > KASLR disabled: 'nokaslr' on cmdline.
> >
> >
> > Decompressing Linux... Parsing ELF... No relocation needed... done.
> > Booting the kernel.
> >
> > QEMU run stops, too.
> >
>
> I re-generated my initrd.img with GZIP as compressor (my default is ZSTD).
>
> --- /etc/initramfs-tools/initramfs.conf 2021-01-17 12:35:30.823818501 +0100
> +++ /etc/initramfs-tools/initramfs.conf.zstd 2020-09-21 23:55:43.121735427 +0200
> @@ -41,7 +41,7 @@ KEYMAP=n
> # COMPRESS: [ gzip | bzip2 | lz4 | lzma | lzop | xz | zstd ]
> #
>
> -COMPRESS=gzip
> +COMPRESS=zstd
>
> #
> # DEVICE: ...
>
> root# KVER="5.11.0-rc3-9-amd64-clang12-pgo" ; update-initramfs -c -k $KVER
>
> QEMU boot stops at the same stage.
>
> Now, my head is empty...
>
> Any comments?
>

( Just as a side note I have Nick's DWARF-v5 support enabled. )

There are EFI-related warnings in my build log:

$ grep warning: build-log_5.11.0-rc3-9-amd64-clang12-pgo.txt
dpkg-architecture: warning: specified GNU system type x86_64-linux-gnu does not match CC system type x86_64-pc-linux-gnu, try setting a correct CC environment variable
warning: arch/x86/platform/efi/quirks.c: Function control flow change detected (hash mismatch) efi_arch_mem_reserve Hash = 391331300655996873 [-Wbackend-plugin]
warning: arch/x86/platform/efi/efi.c: Function control flow change detected (hash mismatch) efi_attr_is_visible Hash = 567185240781730690 [-Wbackend-plugin]
arch/x86/crypto/aegis128-aesni-glue.c:265:30: warning: unused variable 'simd_alg' [-Wunused-variable]
warning: lib/crypto/sha256.c: Function control flow change detected (hash mismatch) sha256_update Hash = 744640996947387358 [-Wbackend-plugin]
warning: arch/x86/boot/compressed/string.c: Function control flow change detected (hash mismatch) memcmp Hash = 742261418966908927 [-Wbackend-plugin]
warning: arch/x86/boot/compressed/string.c: Function control flow change detected (hash mismatch) bcmp Hash = 742261418966908927 [-Wbackend-plugin]
warning: arch/x86/boot/compressed/string.c: Function control flow change detected (hash mismatch) strcmp Hash = 536873291001348520 [-Wbackend-plugin]
warning: arch/x86/boot/compressed/string.c: Function control flow change detected (hash mismatch) strnlen Hash = 146835646621254984 [-Wbackend-plugin]
warning: arch/x86/boot/compressed/string.c: Function control flow change detected (hash mismatch) simple_strtoull Hash = 252792765950587360 [-Wbackend-plugin]
warning: arch/x86/boot/compressed/string.c: Function control flow change detected (hash mismatch) strstr Hash = 391331303349076211 [-Wbackend-plugin]
warning: arch/x86/boot/compressed/string.c: Function control flow change detected (hash mismatch) strchr Hash = 1063705159280644635 [-Wbackend-plugin]
warning: arch/x86/boot/compressed/string.c: Function control flow change detected (hash mismatch) kstrtoull Hash = 758414239132790022 [-Wbackend-plugin]
drivers/infiniband/hw/hfi1/platform.o: warning: objtool: tune_serdes() falls through to next function apply_tx_lanes()

Cannot say if this information is helpful.
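
To make the hash-mismatch warnings easier to scan, they can be grouped by
source file. The snippet below is a rough sketch, not part of the patch:
hash_mismatches and SAMPLE are illustrative names, and the regular expression
only assumes the warning wording seen in this build log. It collapses all
whitespace first, so it also copes with a log that a mail client has wrapped.

```python
import re
from collections import defaultdict

# One clang "hash mismatch" warning, after whitespace has been normalised.
MISMATCH = re.compile(
    r"warning: (?P<file>\S+): Function control flow change detected "
    r"\(hash mismatch\) (?P<func>\S+) Hash = (?P<hash>\d+)"
)


def hash_mismatches(log_text):
    """Map each source file to the functions flagged with a hash mismatch."""
    # Collapse newlines and runs of spaces so wrapped warnings match too.
    flat = " ".join(log_text.split())
    by_file = defaultdict(list)
    for match in MISMATCH.finditer(flat):
        by_file[match.group("file")].append(match.group("func"))
    return dict(by_file)


# Three warnings copied from the build log, wrapped as a mail client might.
SAMPLE = """\
warning: arch/x86/platform/efi/quirks.c: Function control flow change
detected (hash mismatch) efi_arch_mem_reserve Hash =
391331300655996873 [-Wbackend-plugin]
warning: arch/x86/boot/compressed/string.c: Function control flow
change detected (hash mismatch) memcmp Hash = 742261418966908927
[-Wbackend-plugin]
warning: arch/x86/boot/compressed/string.c: Function control flow
change detected (hash mismatch) bcmp Hash = 742261418966908927
[-Wbackend-plugin]
"""

if __name__ == "__main__":
    for path, funcs in hash_mismatches(SAMPLE).items():
        print(f"{path}: {', '.join(funcs)}")
```

Run over the full log, this kind of grouping shows that eight of the eleven
hash-mismatch warnings come from arch/x86/boot/compressed/string.c.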

- Sedat -

2021-01-17 12:37:19

by Sedat Dilek

Subject: Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

On Sun, Jan 17, 2021 at 12:23 PM Sedat Dilek <[email protected]> wrote:
>
> On Sun, Jan 17, 2021 at 11:53 AM Sedat Dilek <[email protected]> wrote:
> >
> > On Sun, Jan 17, 2021 at 11:44 AM Sedat Dilek <[email protected]> wrote:
> > >
> > > On Sat, Jan 16, 2021 at 9:23 PM Bill Wendling <[email protected]> wrote:
> > > >
> > > > On Sat, Jan 16, 2021 at 9:39 AM Sedat Dilek <[email protected]> wrote:
> > > > > On Sat, Jan 16, 2021 at 10:44 AM 'Bill Wendling' via Clang Built Linux
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > From: Sami Tolvanen <[email protected]>
> > > > > >
> > > > > > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > > > > > profile, the kernel is instrumented with PGO counters, a representative
> > > > > > workload is run, and the raw profile data is collected from
> > > > > > /sys/kernel/debug/pgo/profraw.
> > > > > >
> > > > > > The raw profile data must be processed by clang's "llvm-profdata" tool
> > > > > > before it can be used during recompilation:
> > > > > >
> > > > > > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > > > > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > > >
> > > > > > Multiple raw profiles may be merged during this step.
> > > > > >
> > > > > > The data can now be used by the compiler:
> > > > > >
> > > > > > $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > > > > >
> > > > > > This initial submission is restricted to x86, as that's the platform we
> > > > > > know works. This restriction can be lifted once other platforms have
> > > > > > been verified to work with PGO.
> > > > > >
> > > > > > Note that this method of profiling the kernel is clang-native, unlike
> > > > > > the clang support in kernel/gcov.
> > > > > >
> > > > > > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > > > >
> > > > > > Signed-off-by: Sami Tolvanen <[email protected]>
> > > > > > Co-developed-by: Bill Wendling <[email protected]>
> > > > > > Signed-off-by: Bill Wendling <[email protected]>
> > > > > > ---
> > > > > > v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> > > > > > testing.
> > > > > > - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> > > > > > Song's comments.
> > > > > > v3: - Added change log section based on Sedat Dilek's comments.
> > > > > > > v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
> > > > > > own popcount implementation, based on Nick Desaulniers's comment.
> > > > > > v5: - Correct padding calculation, discovered by Nathan Chancellor.
> > > > > > ---
> > > > > > Documentation/dev-tools/index.rst | 1 +
> > > > > > Documentation/dev-tools/pgo.rst | 127 +++++++++
> > > > > > MAINTAINERS | 9 +
> > > > > > Makefile | 3 +
> > > > > > arch/Kconfig | 1 +
> > > > > > arch/x86/Kconfig | 1 +
> > > > > > arch/x86/boot/Makefile | 1 +
> > > > > > arch/x86/boot/compressed/Makefile | 1 +
> > > > > > arch/x86/crypto/Makefile | 2 +
> > > > > > arch/x86/entry/vdso/Makefile | 1 +
> > > > > > arch/x86/kernel/vmlinux.lds.S | 2 +
> > > > > > arch/x86/platform/efi/Makefile | 1 +
> > > > > > arch/x86/purgatory/Makefile | 1 +
> > > > > > arch/x86/realmode/rm/Makefile | 1 +
> > > > > > arch/x86/um/vdso/Makefile | 1 +
> > > > > > drivers/firmware/efi/libstub/Makefile | 1 +
> > > > > > include/asm-generic/vmlinux.lds.h | 44 +++
> > > > > > kernel/Makefile | 1 +
> > > > > > kernel/pgo/Kconfig | 35 +++
> > > > > > kernel/pgo/Makefile | 5 +
> > > > > > kernel/pgo/fs.c | 382 ++++++++++++++++++++++++++
> > > > > > kernel/pgo/instrument.c | 185 +++++++++++++
> > > > > > kernel/pgo/pgo.h | 206 ++++++++++++++
> > > > > > scripts/Makefile.lib | 10 +
> > > > > > 24 files changed, 1022 insertions(+)
> > > > > > create mode 100644 Documentation/dev-tools/pgo.rst
> > > > > > create mode 100644 kernel/pgo/Kconfig
> > > > > > create mode 100644 kernel/pgo/Makefile
> > > > > > create mode 100644 kernel/pgo/fs.c
> > > > > > create mode 100644 kernel/pgo/instrument.c
> > > > > > create mode 100644 kernel/pgo/pgo.h
> > > > > >
> > > > > > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > > > > > index f7809c7b1ba9e..8d6418e858062 100644
> > > > > > --- a/Documentation/dev-tools/index.rst
> > > > > > +++ b/Documentation/dev-tools/index.rst
> > > > > > @@ -26,6 +26,7 @@ whole; patches welcome!
> > > > > > kgdb
> > > > > > kselftest
> > > > > > kunit/index
> > > > > > + pgo
> > > > > >
> > > > > >
> > > > > > .. only:: subproject and html
> > > > > > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> > > > > > new file mode 100644
> > > > > > index 0000000000000..b7f11d8405b73
> > > > > > --- /dev/null
> > > > > > +++ b/Documentation/dev-tools/pgo.rst
> > > > > > @@ -0,0 +1,127 @@
> > > > > > +.. SPDX-License-Identifier: GPL-2.0
> > > > > > +
> > > > > > +===============================
> > > > > > +Using PGO with the Linux kernel
> > > > > > +===============================
> > > > > > +
> > > > > > +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> > > > > > +when building with Clang. The profiling data is exported via the ``pgo``
> > > > > > +debugfs directory.
> > > > > > +
> > > > > > +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > > > > +
> > > > > > +
> > > > > > +Preparation
> > > > > > +===========
> > > > > > +
> > > > > > +Configure the kernel with:
> > > > > > +
> > > > > > +.. code-block:: make
> > > > > > +
> > > > > > + CONFIG_DEBUG_FS=y
> > > > > > + CONFIG_PGO_CLANG=y
> > > > > > +
> > > > > > +Note that kernels compiled with profiling flags will be significantly larger
> > > > > > +and run slower.
> > > > > > +
> > > > > > +Profiling data will only become accessible once debugfs has been mounted:
> > > > > > +
> > > > > > +.. code-block:: sh
> > > > > > +
> > > > > > + mount -t debugfs none /sys/kernel/debug
> > > > > > +
> > > > > > +
> > > > > > +Customization
> > > > > > +=============
> > > > > > +
> > > > > > > +You can enable or disable profiling for individual files and directories by
> > > > > > +adding a line similar to the following to the respective kernel Makefile:
> > > > > > +
> > > > > > +- For a single file (e.g. main.o)
> > > > > > +
> > > > > > + .. code-block:: make
> > > > > > +
> > > > > > + PGO_PROFILE_main.o := y
> > > > > > +
> > > > > > +- For all files in one directory
> > > > > > +
> > > > > > + .. code-block:: make
> > > > > > +
> > > > > > + PGO_PROFILE := y
> > > > > > +
> > > > > > +To exclude files from being profiled use
> > > > > > +
> > > > > > + .. code-block:: make
> > > > > > +
> > > > > > + PGO_PROFILE_main.o := n
> > > > > > +
> > > > > > +and
> > > > > > +
> > > > > > + .. code-block:: make
> > > > > > +
> > > > > > + PGO_PROFILE := n
> > > > > > +
> > > > > > +Only files which are linked to the main kernel image or are compiled as kernel
> > > > > > +modules are supported by this mechanism.
> > > > > > +
> > > > > > +
> > > > > > +Files
> > > > > > +=====
> > > > > > +
> > > > > > +The PGO kernel support creates the following files in debugfs:
> > > > > > +
> > > > > > +``/sys/kernel/debug/pgo``
> > > > > > + Parent directory for all PGO-related files.
> > > > > > +
> > > > > > +``/sys/kernel/debug/pgo/reset``
> > > > > > + Global reset file: resets all coverage data to zero when written to.
> > > > > > +
> > > > > > > +``/sys/kernel/debug/pgo/profraw``
> > > > > > > + The raw PGO data that must be processed with ``llvm-profdata``.
> > > > > > +
> > > > > > +
> > > > > > +Workflow
> > > > > > +========
> > > > > > +
> > > > > > +The PGO kernel can be run on the host or test machines. The data though should
> > > > > > +be analyzed with Clang's tools from the same Clang version as the kernel was
> > > > > > +compiled. Clang's tolerant of version skew, but it's easier to use the same
> > > > > > +Clang version.
> > > > > > +
> > > > > > +The profiling data is useful for optimizing the kernel, analyzing coverage,
> > > > > > +etc. Clang offers tools to perform these tasks.
> > > > > > +
> > > > > > +Here is an example workflow for profiling an instrumented kernel with PGO and
> > > > > > +using the result to optimize the kernel:
> > > > > > +
> > > > > > +1) Install the kernel on the TEST machine.
> > > > > > +
> > > > > > +2) Reset the data counters right before running the load tests
> > > > > > +
> > > > > > + .. code-block:: sh
> > > > > > +
> > > > > > + $ echo 1 > /sys/kernel/debug/pgo/reset
> > > > > > +
> > > > >
> > > > > I do not get this...
> > > > >
> > > > > # mount | grep debugfs
> > > > > debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
> > > > >
> > > > > After the load-test...?
> > > > >
> > > > > echo 0 > /sys/kernel/debug/pgo/reset
> > > > >
> > > > Writing anything to /sys/kernel/debug/pgo/reset will cause it to reset
> > > > the profiling counters. I picked 1 (one) semi-randomly, but it could
> > > > be any number, letter, your favorite short story, etc. You don't want
> > > > to reset it before collecting the profiling data from your load tests
> > > > though.
> > > >
> > > > > > +3) Run the load tests.
> > > > > > +
> > > > > > +4) Collect the raw profile data
> > > > > > +
> > > > > > + .. code-block:: sh
> > > > > > +
> > > > > > + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > > > > > +
> > > > >
> > > > > > This is only 4,9M in size and, judging from the date, is from 5 minutes
> > > > > > before I ran the echo-1 line.
> > > > >
> > > > > # ll /sys/kernel/debug/pgo
> > > > > insgesamt 0
> > > > > drwxr-xr-x 2 root root 0 16. Jan 17:29 .
> > > > > drwx------ 41 root root 0 16. Jan 17:29 ..
> > > > > -rw------- 1 root root 0 16. Jan 17:29 profraw
> > > > > --w------- 1 root root 0 16. Jan 18:19 reset
> > > > >
> > > > > # cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > > > >
> > > > > # ll /tmp/vmlinux.profraw
> > > > > -rw------- 1 root root 4,9M 16. Jan 17:29 /tmp/vmlinux.profraw
> > > > >
> > > > > For me there was no prof-data collected from my defconfig kernel-build.
> > > > >
> > > > The /sys/kernel/debug/pgo/profraw file is read-only. Nothing writes to
> > > > it, not even the kernel. All it does is serialize the profiling
> > > > counters from a memory location in the kernel into a format that
> > > > LLVM's tools can understand.
> > > >
> > > > > > +5) (Optional) Download the raw profile data to the HOST machine.
> > > > > > +
> > > > > > +6) Process the raw profile data
> > > > > > +
> > > > > > + .. code-block:: sh
> > > > > > +
> > > > > > + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > > > +
> > > > >
> > > > > Is that executed in /path/to/linux/git?
> > > > >
> > > > The llvm-profdata tool is not in the linux source tree. You need to
> > > > grab it from a clang distribution (or built from clang's git repo).
> > > >
> > > > > > + Note that multiple raw profile data files can be merged during this step.
> > > > > > +
> > > > > > +7) Rebuild the kernel using the profile data (PGO disabled)
> > > > > > +
> > > > > > + .. code-block:: sh
> > > > > > +
> > > > > > + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > > > >
> > > > > How big is vmlinux.profdata (make defconfig)?
> > > > >
> > > > I don't have numbers for this, but from what you listed here, it's ~5M
> > > > in size. The size is proportional to the number of counters
> > > > instrumented in the kernel.
> > > >
> > > > > Do I need to do a full defconfig build or can I stop the build after
> > > > > let me say 10mins?
> > > > >
> > > > You should do a full rebuild. Make sure that PGO is disabled during the rebuild.
> > > >
> > >
> > > Thanks Bill for all the information.
> > >
> > > And sorry if I am so pedantic.
> > >
> > > I have installed my Debian system with Legacy-BIOS enabled.
> > >
> > > When I rebuild with KCFLAGS=-fprofile-use=vmlinux.profdata (LLVM=1 I
> > > have as a default) my system hangs on reboot.
> > >
> > > [ diffconfig ]
> > > $ scripts/diffconfig /boot/config-5.11.0-rc3-8-amd64-clang12-pgo
> > > /boot/config-5.11.0-rc3-9-amd64-clang12-pgo
> > > BUILD_SALT "5.11.0-rc3-8-amd64-clang12-pgo" -> "5.11.0-rc3-9-amd64-clang12-pgo"
> > > PGO_CLANG y -> n
> > >
> > > [ my make line ]
> > > $ cat ../start-build_5.11.0-rc3-9-amd64-clang12-pgo.txt
> > > dileks 63120 63095 0 06:47 pts/2 00:00:00 /usr/bin/perf_5.10
> > > stat make V=1 -j4 HOSTCC=clang HOSTCXX=clang++ HOSTLD=ld.lld CC=clang
> > > LD=ld.lld LLVM=1 LLVM_IAS=1 PAHOLE=/opt/pahole/bin/pahole
> > > LOCALVERSION=-9-amd64-clang12-pgo KBUILD_VERBOSE=1
> > > KBUILD_BUILD_HOST=iniza [email protected]
> > > KBUILD_BUILD_TIMESTAMP=2021-01-17 bindeb-pkg
> > > KDEB_PKGVERSION=5.11.0~rc3-9~bullseye+dileks1
> > > KCFLAGS=-fprofile-use=vmlinux.profdata
> > >
> > > ( Yes, 06:47 a.m. in the morning :-). )
> > >
> > > > When I boot with the rebuilt Linux-kernel I see:
> > >
> > > Wrong EFI loader signature
> > > ...
> > > Decompressing
> > > Parsing EFI
> > > Performing Relocations done.
> > > Booting the Kernel.
> > >
> > > *** SYSTEM HANGS ***
> > > ( I waited for approx 1 min )
> > >
> > > I tried to turn UEFI support ON and OFF.
> > > No success.
> > >
> > > Does Clang-PGO support Legacy-BIOS or is something different wrong?
> > >
> > > Thanks.
> > >
> >
> > My bootloader is GRUB.
> >
> > In UEFI-BIOS settings there is no secure-boot disable option.
> > Just simple "Use UEFI BIOS" enabled|disabled.
> >
> > Installed Debian packages:
> >
> > ii grub-common 2.04-12
> > ii grub-pc 2.04-12
> > ii grub-pc-bin 2.04-12
> > ii grub2-common 2.04-12
> >
> > I found in the below link to do in grub-shell:
> >
> > set check_signatures=no
> >
> > But this is when grub-efi is installed.
> >
> > - Sedat -
> >
> > Link: https://unix.stackexchange.com/questions/126286/grub-efi-disable-signature-check
>
> Forget about that "Wrong EFI bootloader" - I see this with all other
> kernels (all boot fine).
>
> I tried in QEMU with and without KASLR:
>
> [ run_qemu.sh ]
> KPATH=$(pwd)
>
> APPEND="root=/dev/ram0 console=ttyS0 hung_task_panic=1 earlyprintk=ttyS0,115200"
> APPEND="$APPEND nokaslr"
>
> qemu-system-x86_64 -enable-kvm -M pc -kernel $KPATH/bzImage -initrd \
> $KPATH/initrd.img -m 512 -net none -serial stdio -append "${APPEND}"
> [ /run_qemu.sh ]
>
> $ ./run_qemu.sh
> Probing EDD (edd=off to disable)... ok
> Wrong EFI loader signature.
> early console in extract_kernel
> input_data: 0x000000000289940d
> input_len: 0x000000000069804a
> output: 0x0000000001000000
> output_len: 0x0000000001ef2010
> kernel_total_size: 0x0000000001c2c000
> needed_size: 0x0000000002000000
> trampoline_32bit: 0x000000000009d000
>
>
> KASLR disabled: 'nokaslr' on cmdline.
>
>
> Decompressing Linux... Parsing ELF... No relocation needed... done.
> Booting the kernel.
>
> QEMU run stops, too.
>

I re-generated my initrd.img with GZIP as compressor (my default is ZSTD).

--- /etc/initramfs-tools/initramfs.conf 2021-01-17 12:35:30.823818501 +0100
+++ /etc/initramfs-tools/initramfs.conf.zstd 2020-09-21 23:55:43.121735427 +0200
@@ -41,7 +41,7 @@ KEYMAP=n
# COMPRESS: [ gzip | bzip2 | lz4 | lzma | lzop | xz | zstd ]
#

-COMPRESS=gzip
+COMPRESS=zstd

#
# DEVICE: ...

root# KVER="5.11.0-rc3-9-amd64-clang12-pgo" ; update-initramfs -c -k $KVER

QEMU boot stops at the same stage.
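
To double-check that the regenerated initrd.img really uses the intended
compressor, the first bytes of the image can be inspected. This is only a
minimal sketch: the function name is made up, and it assumes the image starts
directly with the compressed stream. If an uncompressed early cpio archive
(for example microcode) is prepended, it reports "cpio" instead.

```sh
#!/bin/sh
# Hypothetical helper: print the compression format of an initrd image,
# judged only by its first four bytes.
initrd_compression() {
    # Dump the first 4 bytes as hex without offsets, then drop whitespace.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    case "$magic" in
        1f8b*)    echo gzip ;;      # gzip magic: 1f 8b
        28b52ffd) echo zstd ;;      # zstd frame magic: 28 b5 2f fd
        30373037) echo cpio ;;      # ASCII "0707": uncompressed cpio header
        *)        echo unknown ;;
    esac
}
```

Usage would be something like `initrd_compression /boot/initrd.img-$KVER`.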

Now, my head is empty...

Any comments?

- Sedat -

2021-01-17 12:55:18

by Sedat Dilek

Subject: Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

On Sun, Jan 17, 2021 at 12:58 PM Sedat Dilek <[email protected]> wrote:
>
> On Sun, Jan 17, 2021 at 12:42 PM Sedat Dilek <[email protected]> wrote:
> >
> > On Sun, Jan 17, 2021 at 12:23 PM Sedat Dilek <[email protected]> wrote:
> > >
> > > On Sun, Jan 17, 2021 at 11:53 AM Sedat Dilek <[email protected]> wrote:
> > > >
> > > > On Sun, Jan 17, 2021 at 11:44 AM Sedat Dilek <[email protected]> wrote:
> > > > >
> > > > > On Sat, Jan 16, 2021 at 9:23 PM Bill Wendling <[email protected]> wrote:
> > > > > >
> > > > > > On Sat, Jan 16, 2021 at 9:39 AM Sedat Dilek <[email protected]> wrote:
> > > > > > > On Sat, Jan 16, 2021 at 10:44 AM 'Bill Wendling' via Clang Built Linux
> > > > > > > <[email protected]> wrote:
> > > > > > > >
> > > > > > > > From: Sami Tolvanen <[email protected]>
> > > > > > > >
> > > > > > > > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > > > > > > > profile, the kernel is instrumented with PGO counters, a representative
> > > > > > > > workload is run, and the raw profile data is collected from
> > > > > > > > /sys/kernel/debug/pgo/profraw.
> > > > > > > >
> > > > > > > > The raw profile data must be processed by clang's "llvm-profdata" tool
> > > > > > > > before it can be used during recompilation:
> > > > > > > >
> > > > > > > > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > > > > > > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > > > > >
> > > > > > > > Multiple raw profiles may be merged during this step.
> > > > > > > >
> > > > > > > > The data can now be used by the compiler:
> > > > > > > >
> > > > > > > > $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > > > > > > >
> > > > > > > > This initial submission is restricted to x86, as that's the platform we
> > > > > > > > know works. This restriction can be lifted once other platforms have
> > > > > > > > been verified to work with PGO.
> > > > > > > >
> > > > > > > > Note that this method of profiling the kernel is clang-native, unlike
> > > > > > > > the clang support in kernel/gcov.
> > > > > > > >
> > > > > > > > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > > > > > >
> > > > > > > > Signed-off-by: Sami Tolvanen <[email protected]>
> > > > > > > > Co-developed-by: Bill Wendling <[email protected]>
> > > > > > > > Signed-off-by: Bill Wendling <[email protected]>
> > > > > > > > ---
> > > > > > > > v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> > > > > > > > testing.
> > > > > > > > - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> > > > > > > > Song's comments.
> > > > > > > > v3: - Added change log section based on Sedat Dilek's comments.
> > > > > > > > > v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
> > > > > > > > own popcount implementation, based on Nick Desaulniers's comment.
> > > > > > > > v5: - Correct padding calculation, discovered by Nathan Chancellor.
> > > > > > > > ---
> > > > > > > > Documentation/dev-tools/index.rst | 1 +
> > > > > > > > Documentation/dev-tools/pgo.rst | 127 +++++++++
> > > > > > > > MAINTAINERS | 9 +
> > > > > > > > Makefile | 3 +
> > > > > > > > arch/Kconfig | 1 +
> > > > > > > > arch/x86/Kconfig | 1 +
> > > > > > > > arch/x86/boot/Makefile | 1 +
> > > > > > > > arch/x86/boot/compressed/Makefile | 1 +
> > > > > > > > arch/x86/crypto/Makefile | 2 +
> > > > > > > > arch/x86/entry/vdso/Makefile | 1 +
> > > > > > > > arch/x86/kernel/vmlinux.lds.S | 2 +
> > > > > > > > arch/x86/platform/efi/Makefile | 1 +
> > > > > > > > arch/x86/purgatory/Makefile | 1 +
> > > > > > > > arch/x86/realmode/rm/Makefile | 1 +
> > > > > > > > arch/x86/um/vdso/Makefile | 1 +
> > > > > > > > drivers/firmware/efi/libstub/Makefile | 1 +
> > > > > > > > include/asm-generic/vmlinux.lds.h | 44 +++
> > > > > > > > kernel/Makefile | 1 +
> > > > > > > > kernel/pgo/Kconfig | 35 +++
> > > > > > > > kernel/pgo/Makefile | 5 +
> > > > > > > > kernel/pgo/fs.c | 382 ++++++++++++++++++++++++++
> > > > > > > > kernel/pgo/instrument.c | 185 +++++++++++++
> > > > > > > > kernel/pgo/pgo.h | 206 ++++++++++++++
> > > > > > > > scripts/Makefile.lib | 10 +
> > > > > > > > 24 files changed, 1022 insertions(+)
> > > > > > > > create mode 100644 Documentation/dev-tools/pgo.rst
> > > > > > > > create mode 100644 kernel/pgo/Kconfig
> > > > > > > > create mode 100644 kernel/pgo/Makefile
> > > > > > > > create mode 100644 kernel/pgo/fs.c
> > > > > > > > create mode 100644 kernel/pgo/instrument.c
> > > > > > > > create mode 100644 kernel/pgo/pgo.h
> > > > > > > >
> > > > > > > > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > > > > > > > index f7809c7b1ba9e..8d6418e858062 100644
> > > > > > > > --- a/Documentation/dev-tools/index.rst
> > > > > > > > +++ b/Documentation/dev-tools/index.rst
> > > > > > > > @@ -26,6 +26,7 @@ whole; patches welcome!
> > > > > > > > kgdb
> > > > > > > > kselftest
> > > > > > > > kunit/index
> > > > > > > > + pgo
> > > > > > > >
> > > > > > > >
> > > > > > > > .. only:: subproject and html
> > > > > > > > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> > > > > > > > new file mode 100644
> > > > > > > > index 0000000000000..b7f11d8405b73
> > > > > > > > --- /dev/null
> > > > > > > > +++ b/Documentation/dev-tools/pgo.rst
> > > > > > > > @@ -0,0 +1,127 @@
> > > > > > > > +.. SPDX-License-Identifier: GPL-2.0
> > > > > > > > +
> > > > > > > > +===============================
> > > > > > > > +Using PGO with the Linux kernel
> > > > > > > > +===============================
> > > > > > > > +
> > > > > > > > +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> > > > > > > > +when building with Clang. The profiling data is exported via the ``pgo``
> > > > > > > > +debugfs directory.
> > > > > > > > +
> > > > > > > > +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > > > > > > +
> > > > > > > > +
> > > > > > > > +Preparation
> > > > > > > > +===========
> > > > > > > > +
> > > > > > > > +Configure the kernel with:
> > > > > > > > +
> > > > > > > > +.. code-block:: make
> > > > > > > > +
> > > > > > > > + CONFIG_DEBUG_FS=y
> > > > > > > > + CONFIG_PGO_CLANG=y
> > > > > > > > +
> > > > > > > > +Note that kernels compiled with profiling flags will be significantly larger
> > > > > > > > +and run slower.
> > > > > > > > +
> > > > > > > > +Profiling data will only become accessible once debugfs has been mounted:
> > > > > > > > +
> > > > > > > > +.. code-block:: sh
> > > > > > > > +
> > > > > > > > + mount -t debugfs none /sys/kernel/debug
> > > > > > > > +
> > > > > > > > +
> > > > > > > > +Customization
> > > > > > > > +=============
> > > > > > > > +
> > > > > > > > > +You can enable or disable profiling for individual files and directories by
> > > > > > > > +adding a line similar to the following to the respective kernel Makefile:
> > > > > > > > +
> > > > > > > > +- For a single file (e.g. main.o)
> > > > > > > > +
> > > > > > > > + .. code-block:: make
> > > > > > > > +
> > > > > > > > + PGO_PROFILE_main.o := y
> > > > > > > > +
> > > > > > > > +- For all files in one directory
> > > > > > > > +
> > > > > > > > + .. code-block:: make
> > > > > > > > +
> > > > > > > > + PGO_PROFILE := y
> > > > > > > > +
> > > > > > > > +To exclude files from being profiled use
> > > > > > > > +
> > > > > > > > + .. code-block:: make
> > > > > > > > +
> > > > > > > > + PGO_PROFILE_main.o := n
> > > > > > > > +
> > > > > > > > +and
> > > > > > > > +
> > > > > > > > + .. code-block:: make
> > > > > > > > +
> > > > > > > > + PGO_PROFILE := n
> > > > > > > > +
> > > > > > > > +Only files which are linked to the main kernel image or are compiled as kernel
> > > > > > > > +modules are supported by this mechanism.
> > > > > > > > +
> > > > > > > > +
> > > > > > > > +Files
> > > > > > > > +=====
> > > > > > > > +
> > > > > > > > +The PGO kernel support creates the following files in debugfs:
> > > > > > > > +
> > > > > > > > +``/sys/kernel/debug/pgo``
> > > > > > > > + Parent directory for all PGO-related files.
> > > > > > > > +
> > > > > > > > +``/sys/kernel/debug/pgo/reset``
> > > > > > > > + Global reset file: resets all coverage data to zero when written to.
> > > > > > > > +
> > > > > > > > > +``/sys/kernel/debug/pgo/profraw``
> > > > > > > > > + The raw PGO data that must be processed with ``llvm-profdata``.
> > > > > > > > +
> > > > > > > > +
> > > > > > > > +Workflow
> > > > > > > > +========
> > > > > > > > +
> > > > > > > > > +The PGO kernel can be run on the host or test machines. The data, though,
> > > > > > > > > +should be analyzed with Clang's tools from the same Clang version that the
> > > > > > > > > +kernel was compiled with. Clang is tolerant of version skew, but it's easier
> > > > > > > > > +to use the same Clang version.
> > > > > > > > +
> > > > > > > > +The profiling data is useful for optimizing the kernel, analyzing coverage,
> > > > > > > > +etc. Clang offers tools to perform these tasks.
> > > > > > > > +
> > > > > > > > +Here is an example workflow for profiling an instrumented kernel with PGO and
> > > > > > > > +using the result to optimize the kernel:
> > > > > > > > +
> > > > > > > > +1) Install the kernel on the TEST machine.
> > > > > > > > +
> > > > > > > > +2) Reset the data counters right before running the load tests
> > > > > > > > +
> > > > > > > > + .. code-block:: sh
> > > > > > > > +
> > > > > > > > + $ echo 1 > /sys/kernel/debug/pgo/reset
> > > > > > > > +
> > > > > > >
> > > > > > > I do not get this...
> > > > > > >
> > > > > > > # mount | grep debugfs
> > > > > > > debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
> > > > > > >
> > > > > > > After the load-test...?
> > > > > > >
> > > > > > > echo 0 > /sys/kernel/debug/pgo/reset
> > > > > > >
> > > > > > Writing anything to /sys/kernel/debug/pgo/reset will cause it to reset
> > > > > > the profiling counters. I picked 1 (one) semi-randomly, but it could
> > > > > > be any number, letter, your favorite short story, etc. You don't want
> > > > > > to reset it before collecting the profiling data from your load tests
> > > > > > though.
> > > > > >
> > > > > > > > +3) Run the load tests.
> > > > > > > > +
> > > > > > > > +4) Collect the raw profile data
> > > > > > > > +
> > > > > > > > + .. code-block:: sh
> > > > > > > > +
> > > > > > > > + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > > > > > > > +
> > > > > > >
> > > > > > > > This is only 4,9M in size and, judging from the date, is from 5 minutes
> > > > > > > > before I ran the echo-1 line.
> > > > > > >
> > > > > > > # ll /sys/kernel/debug/pgo
> > > > > > > insgesamt 0
> > > > > > > drwxr-xr-x 2 root root 0 16. Jan 17:29 .
> > > > > > > drwx------ 41 root root 0 16. Jan 17:29 ..
> > > > > > > -rw------- 1 root root 0 16. Jan 17:29 profraw
> > > > > > > --w------- 1 root root 0 16. Jan 18:19 reset
> > > > > > >
> > > > > > > # cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > > > > > >
> > > > > > > # ll /tmp/vmlinux.profraw
> > > > > > > -rw------- 1 root root 4,9M 16. Jan 17:29 /tmp/vmlinux.profraw
> > > > > > >
> > > > > > > For me there was no prof-data collected from my defconfig kernel-build.
> > > > > > >
> > > > > > The /sys/kernel/debug/pgo/profraw file is read-only. Nothing writes to
> > > > > > it, not even the kernel. All it does is serialize the profiling
> > > > > > counters from a memory location in the kernel into a format that
> > > > > > LLVM's tools can understand.
> > > > > >
> > > > > > > > +5) (Optional) Download the raw profile data to the HOST machine.
> > > > > > > > +
> > > > > > > > +6) Process the raw profile data
> > > > > > > > +
> > > > > > > > + .. code-block:: sh
> > > > > > > > +
> > > > > > > > + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > > > > > +
> > > > > > >
> > > > > > > Is that executed in /path/to/linux/git?
> > > > > > >
> > > > > > The llvm-profdata tool is not in the linux source tree. You need to
> > > > > > grab it from a clang distribution (or built from clang's git repo).
> > > > > >
> > > > > > > > + Note that multiple raw profile data files can be merged during this step.
> > > > > > > > +
> > > > > > > > +7) Rebuild the kernel using the profile data (PGO disabled)
> > > > > > > > +
> > > > > > > > + .. code-block:: sh
> > > > > > > > +
> > > > > > > > + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > > > > > >
> > > > > > > How big is vmlinux.profdata (make defconfig)?
> > > > > > >
> > > > > > I don't have numbers for this, but from what you listed here, it's ~5M
> > > > > > in size. The size is proportional to the number of counters
> > > > > > instrumented in the kernel.
> > > > > >
> > > > > > > Do I need to do a full defconfig build or can I stop the build after
> > > > > > > let me say 10mins?
> > > > > > >
> > > > > > You should do a full rebuild. Make sure that PGO is disabled during the rebuild.
> > > > > >
> > > > >
> > > > > Thanks Bill for all the information.
> > > > >
> > > > > And sorry if I am so pedantic.
> > > > >
> > > > > I have installed my Debian system with Legacy-BIOS enabled.
> > > > >
> > > > > When I rebuild with KCFLAGS=-fprofile-use=vmlinux.profdata (LLVM=1 I
> > > > > have as a default) my system hangs on reboot.
> > > > >
> > > > > [ diffconfig ]
> > > > > $ scripts/diffconfig /boot/config-5.11.0-rc3-8-amd64-clang12-pgo
> > > > > /boot/config-5.11.0-rc3-9-amd64-clang12-pgo
> > > > > BUILD_SALT "5.11.0-rc3-8-amd64-clang12-pgo" -> "5.11.0-rc3-9-amd64-clang12-pgo"
> > > > > PGO_CLANG y -> n
> > > > >
> > > > > [ my make line ]
> > > > > $ cat ../start-build_5.11.0-rc3-9-amd64-clang12-pgo.txt
> > > > > dileks 63120 63095 0 06:47 pts/2 00:00:00 /usr/bin/perf_5.10
> > > > > stat make V=1 -j4 HOSTCC=clang HOSTCXX=clang++ HOSTLD=ld.lld CC=clang
> > > > > LD=ld.lld LLVM=1 LLVM_IAS=1 PAHOLE=/opt/pahole/bin/pahole
> > > > > LOCALVERSION=-9-amd64-clang12-pgo KBUILD_VERBOSE=1
> > > > > KBUILD_BUILD_HOST=iniza [email protected]
> > > > > KBUILD_BUILD_TIMESTAMP=2021-01-17 bindeb-pkg
> > > > > KDEB_PKGVERSION=5.11.0~rc3-9~bullseye+dileks1
> > > > > KCFLAGS=-fprofile-use=vmlinux.profdata
> > > > >
> > > > > ( Yes, 06:47 a.m. in the morning :-). )
> > > > >
> > > > > > When I boot with the rebuilt Linux-kernel I see:
> > > > >
> > > > > Wrong EFI loader signature
> > > > > ...
> > > > > Decompressing
> > > > > Parsing EFI
> > > > > Performing Relocations done.
> > > > > Booting the Kernel.
> > > > >
> > > > > *** SYSTEM HANGS ***
> > > > > ( I waited for approx 1 min )
> > > > >
> > > > > I tried to turn UEFI support ON and OFF.
> > > > > No success.
> > > > >
> > > > > Does Clang-PGO support Legacy-BIOS or is something different wrong?
> > > > >
> > > > > Thanks.
> > > > >
> > > >
> > > > My bootloader is GRUB.
> > > >
> > > > In UEFI-BIOS settings there is no secure-boot disable option.
> > > > Just simple "Use UEFI BIOS" enabled|disabled.
> > > >
> > > > Installed Debian packages:
> > > >
> > > > ii grub-common 2.04-12
> > > > ii grub-pc 2.04-12
> > > > ii grub-pc-bin 2.04-12
> > > > ii grub2-common 2.04-12
> > > >
> > > > I found in the below link to do in grub-shell:
> > > >
> > > > set check_signatures=no
> > > >
> > > > But this is when grub-efi is installed.
> > > >
> > > > - Sedat -
> > > >
> > > > Link: https://unix.stackexchange.com/questions/126286/grub-efi-disable-signature-check
> > >
> > > Forget about that "Wrong EFI bootloader" - I see this with all other
> > > kernels (all boot fine).
> > >
> > > I tried in QEMU with and without KASLR:
> > >
> > > [ run_qemu.sh ]
> > > KPATH=$(pwd)
> > >
> > > APPEND="root=/dev/ram0 console=ttyS0 hung_task_panic=1 earlyprintk=ttyS0,115200"
> > > APPEND="$APPEND nokaslr"
> > >
> > > qemu-system-x86_64 -enable-kvm -M pc -kernel $KPATH/bzImage -initrd \
> > > $KPATH/initrd.img -m 512 -net none -serial stdio -append "${APPEND}"
> > > [ /run_qemu.sh ]
> > >
> > > $ ./run_qemu.sh
> > > Probing EDD (edd=off to disable)... ok
> > > Wrong EFI loader signature.
> > > early console in extract_kernel
> > > input_data: 0x000000000289940d
> > > input_len: 0x000000000069804a
> > > output: 0x0000000001000000
> > > output_len: 0x0000000001ef2010
> > > kernel_total_size: 0x0000000001c2c000
> > > needed_size: 0x0000000002000000
> > > trampoline_32bit: 0x000000000009d000
> > >
> > >
> > > KASLR disabled: 'nokaslr' on cmdline.
> > >
> > >
> > > Decompressing Linux... Parsing ELF... No relocation needed... done.
> > > Booting the kernel.
> > >
> > > QEMU run stops, too.
> > >
> >
> > I re-generated my initrd.img with GZIP as compressor (my default is ZSTD).
> >
> > --- /etc/initramfs-tools/initramfs.conf 2021-01-17 12:35:30.823818501 +0100
> > +++ /etc/initramfs-tools/initramfs.conf.zstd 2020-09-21 23:55:43.121735427 +0200
> > @@ -41,7 +41,7 @@ KEYMAP=n
> > # COMPRESS: [ gzip | bzip2 | lz4 | lzma | lzop | xz | zstd ]
> > #
> >
> > -COMPRESS=gzip
> > +COMPRESS=zstd
> >
> > #
> > # DEVICE: ...
> >
> > root# KVER="5.11.0-rc3-9-amd64-clang12-pgo" ; update-initramfs -c -k $KVER
> >
> > QEMU boot stops at the same stage.
> >
> > Now, my head is empty...
> >
> > Any comments?
> >
>
> ( Just as a side note I have Nick's DWARF-v5 support enabled. )
>
> There is one EFI related warning in my build-log:
>
> $ grep warning: build-log_5.11.0-rc3-9-amd64-clang12-pgo.txt
> dpkg-architecture: warning: specified GNU system type x86_64-linux-gnu
> does not match CC system type x86_64-pc-linux-gnu, try setting a
> correct CC environment variable
> warning: arch/x86/platform/efi/quirks.c: Function control flow change
> detected (hash mismatch) efi_arch_mem_reserve Hash =
> 391331300655996873 [-Wbackend-plugin]
> warning: arch/x86/platform/efi/efi.c: Function control flow change
> detected (hash mismatch) efi_attr_is_visible Hash = 567185240781730690
> [-Wbackend-plugin]
> arch/x86/crypto/aegis128-aesni-glue.c:265:30: warning: unused variable
> 'simd_alg' [-Wunused-variable]
> warning: lib/crypto/sha256.c: Function control flow change detected
> (hash mismatch) sha256_update Hash = 744640996947387358
> [-Wbackend-plugin]
> warning: arch/x86/boot/compressed/string.c: Function control flow
> change detected (hash mismatch) memcmp Hash = 742261418966908927
> [-Wbackend-plugin]
> warning: arch/x86/boot/compressed/string.c: Function control flow
> change detected (hash mismatch) bcmp Hash = 742261418966908927
> [-Wbackend-plugin]
> warning: arch/x86/boot/compressed/string.c: Function control flow
> change detected (hash mismatch) strcmp Hash = 536873291001348520
> [-Wbackend-plugin]
> warning: arch/x86/boot/compressed/string.c: Function control flow
> change detected (hash mismatch) strnlen Hash = 146835646621254984
> [-Wbackend-plugin]
> warning: arch/x86/boot/compressed/string.c: Function control flow
> change detected (hash mismatch) simple_strtoull Hash =
> 252792765950587360 [-Wbackend-plugin]
> warning: arch/x86/boot/compressed/string.c: Function control flow
> change detected (hash mismatch) strstr Hash = 391331303349076211
> [-Wbackend-plugin]
> warning: arch/x86/boot/compressed/string.c: Function control flow
> change detected (hash mismatch) strchr Hash = 1063705159280644635
> [-Wbackend-plugin]
> warning: arch/x86/boot/compressed/string.c: Function control flow
> change detected (hash mismatch) kstrtoull Hash = 758414239132790022
> [-Wbackend-plugin]
> drivers/infiniband/hw/hfi1/platform.o: warning: objtool: tune_serdes()
> falls through to next function apply_tx_lanes()
>
> Cannot say if this information is helpful.
>
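
As far as I understand, Clang prints the "hash mismatch" warning when the
control-flow hash stored in the profile for a function does not match the code
being compiled, so the profile data for that function is not applied. To see
which source files are affected, the warnings can be tallied per file. This is
a rough sketch: the function name is made up, and it assumes each warning sits
on a single line in the build log (the wrapping above comes from the mail
client).

```sh
#!/bin/sh
# Hypothetical helper: count "hash mismatch" warnings per source file.
# Expected line format in the log:
#   warning: <file>: Function control flow change detected (hash mismatch) ...
count_hash_mismatches() {
    awk '/Function control flow change detected \(hash mismatch\)/ {
             f = $2              # second field is "<file>:"
             sub(/:$/, "", f)    # strip the trailing colon
             n[f]++
         }
         END { for (f in n) print n[f], f }' "$1" |
        sort -k1,1nr -k2         # highest count first, then by file name
}
```

With the log above this would be called as
`count_hash_mismatches build-log_5.11.0-rc3-9-amd64-clang12-pgo.txt`.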

My LLVM/Clang v12 is from <apt.llvm.org>:

clang-12 version 1:12~++20210115111113+45ef053bd709-1~exp1~20210115101809.3724

My kernel-config is attached.
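
For reference, the collection part of the workflow quoted above (reset the
counters, run the load test, copy profraw) can be sketched as a small shell
function. This is only a sketch: the name `collect_profraw` and its arguments
are made up here, and it assumes debugfs is already mounted and that it runs
as root.

```sh
#!/bin/sh
# Hypothetical helper, not part of the patch:
#   collect_profraw <pgo-dir> <output-file> <workload command...>
collect_profraw() {
    pgo_dir=$1
    out=$2
    shift 2
    # Any write to "reset" zeroes the counters; the value does not matter.
    echo 1 > "$pgo_dir/reset" || return 1
    # Run the load test whose behaviour should be profiled.
    "$@" || return 1
    # Reading "profraw" serializes the in-kernel counters into the raw format.
    cp "$pgo_dir/profraw" "$out"
}
```

A call might look like
`collect_profraw /sys/kernel/debug/pgo /tmp/vmlinux.profraw ./load-test.sh`
(with `./load-test.sh` standing in for whatever workload is used), followed by
the `llvm-profdata merge` step from the documentation.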

- Sedat -


Attachments:
config-5.11.0-rc3-9-amd64-clang12-pgo (232.65 kB)

2021-01-17 18:06:25

by Sedat Dilek

Subject: Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

On Sun, Jan 17, 2021 at 1:05 PM Sedat Dilek <[email protected]> wrote:
>
> On Sun, Jan 17, 2021 at 12:58 PM Sedat Dilek <[email protected]> wrote:
> >
> > On Sun, Jan 17, 2021 at 12:42 PM Sedat Dilek <[email protected]> wrote:
> > >
> > > On Sun, Jan 17, 2021 at 12:23 PM Sedat Dilek <[email protected]> wrote:
> > > >
> > > > On Sun, Jan 17, 2021 at 11:53 AM Sedat Dilek <[email protected]> wrote:
> > > > >
> > > > > On Sun, Jan 17, 2021 at 11:44 AM Sedat Dilek <[email protected]> wrote:
> > > > > >
> > > > > > On Sat, Jan 16, 2021 at 9:23 PM Bill Wendling <[email protected]> wrote:
> > > > > > >
> > > > > > > On Sat, Jan 16, 2021 at 9:39 AM Sedat Dilek <[email protected]> wrote:
> > > > > > > > On Sat, Jan 16, 2021 at 10:44 AM 'Bill Wendling' via Clang Built Linux
> > > > > > > > <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > From: Sami Tolvanen <[email protected]>
> > > > > > > > >
> > > > > > > > > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > > > > > > > > profile, the kernel is instrumented with PGO counters, a representative
> > > > > > > > > workload is run, and the raw profile data is collected from
> > > > > > > > > /sys/kernel/debug/pgo/profraw.
> > > > > > > > >
> > > > > > > > > The raw profile data must be processed by clang's "llvm-profdata" tool
> > > > > > > > > before it can be used during recompilation:
> > > > > > > > >
> > > > > > > > > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > > > > > > > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > > > > > >
> > > > > > > > > Multiple raw profiles may be merged during this step.
> > > > > > > > >
> > > > > > > > > The data can now be used by the compiler:
> > > > > > > > >
> > > > > > > > > $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > > > > > > > >
> > > > > > > > > This initial submission is restricted to x86, as that's the platform we
> > > > > > > > > know works. This restriction can be lifted once other platforms have
> > > > > > > > > been verified to work with PGO.
> > > > > > > > >
> > > > > > > > > Note that this method of profiling the kernel is clang-native, unlike
> > > > > > > > > the clang support in kernel/gcov.
> > > > > > > > >
> > > > > > > > > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > > > > > > >
> > > > > > > > > Signed-off-by: Sami Tolvanen <[email protected]>
> > > > > > > > > Co-developed-by: Bill Wendling <[email protected]>
> > > > > > > > > Signed-off-by: Bill Wendling <[email protected]>
> > > > > > > > > ---
> > > > > > > > > v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> > > > > > > > > testing.
> > > > > > > > > - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> > > > > > > > > Song's comments.
> > > > > > > > > v3: - Added change log section based on Sedat Dilek's comments.
> > > > > > > > > > v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
> > > > > > > > > own popcount implementation, based on Nick Desaulniers's comment.
> > > > > > > > > v5: - Correct padding calculation, discovered by Nathan Chancellor.
> > > > > > > > > ---
> > > > > > > > > Documentation/dev-tools/index.rst | 1 +
> > > > > > > > > Documentation/dev-tools/pgo.rst | 127 +++++++++
> > > > > > > > > MAINTAINERS | 9 +
> > > > > > > > > Makefile | 3 +
> > > > > > > > > arch/Kconfig | 1 +
> > > > > > > > > arch/x86/Kconfig | 1 +
> > > > > > > > > arch/x86/boot/Makefile | 1 +
> > > > > > > > > arch/x86/boot/compressed/Makefile | 1 +
> > > > > > > > > arch/x86/crypto/Makefile | 2 +
> > > > > > > > > arch/x86/entry/vdso/Makefile | 1 +
> > > > > > > > > arch/x86/kernel/vmlinux.lds.S | 2 +
> > > > > > > > > arch/x86/platform/efi/Makefile | 1 +
> > > > > > > > > arch/x86/purgatory/Makefile | 1 +
> > > > > > > > > arch/x86/realmode/rm/Makefile | 1 +
> > > > > > > > > arch/x86/um/vdso/Makefile | 1 +
> > > > > > > > > drivers/firmware/efi/libstub/Makefile | 1 +
> > > > > > > > > include/asm-generic/vmlinux.lds.h | 44 +++
> > > > > > > > > kernel/Makefile | 1 +
> > > > > > > > > kernel/pgo/Kconfig | 35 +++
> > > > > > > > > kernel/pgo/Makefile | 5 +
> > > > > > > > > kernel/pgo/fs.c | 382 ++++++++++++++++++++++++++
> > > > > > > > > kernel/pgo/instrument.c | 185 +++++++++++++
> > > > > > > > > kernel/pgo/pgo.h | 206 ++++++++++++++
> > > > > > > > > scripts/Makefile.lib | 10 +
> > > > > > > > > 24 files changed, 1022 insertions(+)
> > > > > > > > > create mode 100644 Documentation/dev-tools/pgo.rst
> > > > > > > > > create mode 100644 kernel/pgo/Kconfig
> > > > > > > > > create mode 100644 kernel/pgo/Makefile
> > > > > > > > > create mode 100644 kernel/pgo/fs.c
> > > > > > > > > create mode 100644 kernel/pgo/instrument.c
> > > > > > > > > create mode 100644 kernel/pgo/pgo.h
> > > > > > > > >
> > > > > > > > > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > > > > > > > > index f7809c7b1ba9e..8d6418e858062 100644
> > > > > > > > > --- a/Documentation/dev-tools/index.rst
> > > > > > > > > +++ b/Documentation/dev-tools/index.rst
> > > > > > > > > @@ -26,6 +26,7 @@ whole; patches welcome!
> > > > > > > > > kgdb
> > > > > > > > > kselftest
> > > > > > > > > kunit/index
> > > > > > > > > + pgo
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > .. only:: subproject and html
> > > > > > > > > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> > > > > > > > > new file mode 100644
> > > > > > > > > index 0000000000000..b7f11d8405b73
> > > > > > > > > --- /dev/null
> > > > > > > > > +++ b/Documentation/dev-tools/pgo.rst
> > > > > > > > > @@ -0,0 +1,127 @@
> > > > > > > > > +.. SPDX-License-Identifier: GPL-2.0
> > > > > > > > > +
> > > > > > > > > +===============================
> > > > > > > > > +Using PGO with the Linux kernel
> > > > > > > > > +===============================
> > > > > > > > > +
> > > > > > > > > +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> > > > > > > > > +when building with Clang. The profiling data is exported via the ``pgo``
> > > > > > > > > +debugfs directory.
> > > > > > > > > +
> > > > > > > > > +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > > > > > > > +
> > > > > > > > > +
> > > > > > > > > +Preparation
> > > > > > > > > +===========
> > > > > > > > > +
> > > > > > > > > +Configure the kernel with:
> > > > > > > > > +
> > > > > > > > > +.. code-block:: make
> > > > > > > > > +
> > > > > > > > > + CONFIG_DEBUG_FS=y
> > > > > > > > > + CONFIG_PGO_CLANG=y
> > > > > > > > > +
> > > > > > > > > +Note that kernels compiled with profiling flags will be significantly larger
> > > > > > > > > +and run slower.
> > > > > > > > > +
> > > > > > > > > +Profiling data will only become accessible once debugfs has been mounted:
> > > > > > > > > +
> > > > > > > > > +.. code-block:: sh
> > > > > > > > > +
> > > > > > > > > + mount -t debugfs none /sys/kernel/debug
> > > > > > > > > +
> > > > > > > > > +
> > > > > > > > > +Customization
> > > > > > > > > +=============
> > > > > > > > > +
> > > > > > > > > +You can enable or disable profiling for individual files and directories by
> > > > > > > > > +adding a line similar to the following to the respective kernel Makefile:
> > > > > > > > > +
> > > > > > > > > +- For a single file (e.g. main.o)
> > > > > > > > > +
> > > > > > > > > + .. code-block:: make
> > > > > > > > > +
> > > > > > > > > + PGO_PROFILE_main.o := y
> > > > > > > > > +
> > > > > > > > > +- For all files in one directory
> > > > > > > > > +
> > > > > > > > > + .. code-block:: make
> > > > > > > > > +
> > > > > > > > > + PGO_PROFILE := y
> > > > > > > > > +
> > > > > > > > > +To exclude files from being profiled use
> > > > > > > > > +
> > > > > > > > > + .. code-block:: make
> > > > > > > > > +
> > > > > > > > > + PGO_PROFILE_main.o := n
> > > > > > > > > +
> > > > > > > > > +and
> > > > > > > > > +
> > > > > > > > > + .. code-block:: make
> > > > > > > > > +
> > > > > > > > > + PGO_PROFILE := n
> > > > > > > > > +
> > > > > > > > > +Only files which are linked to the main kernel image or are compiled as kernel
> > > > > > > > > +modules are supported by this mechanism.
> > > > > > > > > +
> > > > > > > > > +
> > > > > > > > > +Files
> > > > > > > > > +=====
> > > > > > > > > +
> > > > > > > > > +The PGO kernel support creates the following files in debugfs:
> > > > > > > > > +
> > > > > > > > > +``/sys/kernel/debug/pgo``
> > > > > > > > > + Parent directory for all PGO-related files.
> > > > > > > > > +
> > > > > > > > > +``/sys/kernel/debug/pgo/reset``
> > > > > > > > > + Global reset file: resets all coverage data to zero when written to.
> > > > > > > > > +
> > > > > > > > > +``/sys/kernel/debug/pgo/profraw``
> > > > > > > > > + The raw PGO data that must be processed with ``llvm-profdata``.
> > > > > > > > > +
> > > > > > > > > +
> > > > > > > > > +Workflow
> > > > > > > > > +========
> > > > > > > > > +
> > > > > > > > > +The PGO kernel can be run on the host or test machines. The data, however,
> > > > > > > > > +should be analyzed with the tools from the same Clang version that was used to
> > > > > > > > > +compile the kernel. Clang is tolerant of version skew, but it's easier to use
> > > > > > > > > +the same Clang version.
> > > > > > > > > +
> > > > > > > > > +The profiling data is useful for optimizing the kernel, analyzing coverage,
> > > > > > > > > +etc. Clang offers tools to perform these tasks.
> > > > > > > > > +
> > > > > > > > > +Here is an example workflow for profiling an instrumented kernel with PGO and
> > > > > > > > > +using the result to optimize the kernel:
> > > > > > > > > +
> > > > > > > > > +1) Install the kernel on the TEST machine.
> > > > > > > > > +
> > > > > > > > > +2) Reset the data counters right before running the load tests
> > > > > > > > > +
> > > > > > > > > + .. code-block:: sh
> > > > > > > > > +
> > > > > > > > > + $ echo 1 > /sys/kernel/debug/pgo/reset
> > > > > > > > > +
> > > > > > > >
> > > > > > > > I do not get this...
> > > > > > > >
> > > > > > > > # mount | grep debugfs
> > > > > > > > debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
> > > > > > > >
> > > > > > > > After the load-test...?
> > > > > > > >
> > > > > > > > echo 0 > /sys/kernel/debug/pgo/reset
> > > > > > > >
> > > > > > > Writing anything to /sys/kernel/debug/pgo/reset will cause it to reset
> > > > > > > the profiling counters. I picked 1 (one) semi-randomly, but it could
> > > > > > > be any number, letter, your favorite short story, etc. You don't want
> > > > > > > to reset it before collecting the profiling data from your load tests
> > > > > > > though.
> > > > > > >
> > > > > > > > > +3) Run the load tests.
> > > > > > > > > +
> > > > > > > > > +4) Collect the raw profile data
> > > > > > > > > +
> > > > > > > > > + .. code-block:: sh
> > > > > > > > > +
> > > > > > > > > + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > > > > > > > > +
> > > > > > > >
> > > > > > > > This is only 4.9M in size and, going by its timestamp, dates from 5 minutes
> > > > > > > > before I ran the echo-1 line.
> > > > > > > >
> > > > > > > > # ll /sys/kernel/debug/pgo
> > > > > > > > total 0
> > > > > > > > drwxr-xr-x 2 root root 0 16. Jan 17:29 .
> > > > > > > > drwx------ 41 root root 0 16. Jan 17:29 ..
> > > > > > > > -rw------- 1 root root 0 16. Jan 17:29 profraw
> > > > > > > > --w------- 1 root root 0 16. Jan 18:19 reset
> > > > > > > >
> > > > > > > > # cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > > > > > > >
> > > > > > > > # ll /tmp/vmlinux.profraw
> > > > > > > > -rw------- 1 root root 4,9M 16. Jan 17:29 /tmp/vmlinux.profraw
> > > > > > > >
> > > > > > > > It seems to me that no profile data was collected from my defconfig kernel build.
> > > > > > > >
> > > > > > > The /sys/kernel/debug/pgo/profraw file is read-only. Nothing writes to
> > > > > > > it, not even the kernel. All it does is serialize the profiling
> > > > > > > counters from a memory location in the kernel into a format that
> > > > > > > LLVM's tools can understand.
> > > > > > >
> > > > > > > > > +5) (Optional) Download the raw profile data to the HOST machine.
> > > > > > > > > +
> > > > > > > > > +6) Process the raw profile data
> > > > > > > > > +
> > > > > > > > > + .. code-block:: sh
> > > > > > > > > +
> > > > > > > > > + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > > > > > > +
> > > > > > > >
> > > > > > > > Is that executed in /path/to/linux/git?
> > > > > > > >
> > > > > > > The llvm-profdata tool is not in the linux source tree. You need to
> > > > > > > grab it from a clang distribution (or build it from clang's git repo).
> > > > > > >
> > > > > > > > > + Note that multiple raw profile data files can be merged during this step.
> > > > > > > > > +
> > > > > > > > > +7) Rebuild the kernel using the profile data (PGO disabled)
> > > > > > > > > +
> > > > > > > > > + .. code-block:: sh
> > > > > > > > > +
> > > > > > > > > + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > > > > > > >
> > > > > > > > How big is vmlinux.profdata (make defconfig)?
> > > > > > > >
> > > > > > > I don't have numbers for this, but from what you listed here, it's ~5M
> > > > > > > in size. The size is proportional to the number of counters
> > > > > > > instrumented in the kernel.
> > > > > > >
> > > > > > > > Do I need to do a full defconfig build or can I stop the build after,
> > > > > > > > let's say, 10 minutes?
> > > > > > > >
> > > > > > > You should do a full rebuild. Make sure that PGO is disabled during the rebuild.
> > > > > > >
> > > > > >
> > > > > > Thanks Bill for all the information.
> > > > > >
> > > > > > And sorry for being so pedantic.
> > > > > >
> > > > > > I have installed my Debian system with Legacy-BIOS enabled.
> > > > > >
> > > > > > When I rebuild with KCFLAGS=-fprofile-use=vmlinux.profdata (LLVM=1 I
> > > > > > have as a default) my system hangs on reboot.
> > > > > >
> > > > > > [ diffconfig ]
> > > > > > $ scripts/diffconfig /boot/config-5.11.0-rc3-8-amd64-clang12-pgo
> > > > > > /boot/config-5.11.0-rc3-9-amd64-clang12-pgo
> > > > > > BUILD_SALT "5.11.0-rc3-8-amd64-clang12-pgo" -> "5.11.0-rc3-9-amd64-clang12-pgo"
> > > > > > PGO_CLANG y -> n
> > > > > >
> > > > > > [ my make line ]
> > > > > > $ cat ../start-build_5.11.0-rc3-9-amd64-clang12-pgo.txt
> > > > > > dileks 63120 63095 0 06:47 pts/2 00:00:00 /usr/bin/perf_5.10
> > > > > > stat make V=1 -j4 HOSTCC=clang HOSTCXX=clang++ HOSTLD=ld.lld CC=clang
> > > > > > LD=ld.lld LLVM=1 LLVM_IAS=1 PAHOLE=/opt/pahole/bin/pahole
> > > > > > LOCALVERSION=-9-amd64-clang12-pgo KBUILD_VERBOSE=1
> > > > > > KBUILD_BUILD_HOST=iniza [email protected]
> > > > > > KBUILD_BUILD_TIMESTAMP=2021-01-17 bindeb-pkg
> > > > > > KDEB_PKGVERSION=5.11.0~rc3-9~bullseye+dileks1
> > > > > > KCFLAGS=-fprofile-use=vmlinux.profdata
> > > > > >
> > > > > > ( Yes, 06:47 a.m. in the morning :-). )
> > > > > >
> > > > > > When I boot the rebuilt Linux kernel I see:
> > > > > >
> > > > > > Wrong EFI loader signature
> > > > > > ...
> > > > > > Decompressing
> > > > > > Parsing EFI
> > > > > > Performing Relocations done.
> > > > > > Booting the Kernel.
> > > > > >
> > > > > > *** SYSTEM HANGS ***
> > > > > > ( I waited for approx 1 min )
> > > > > >
> > > > > > I tried to turn UEFI support ON and OFF.
> > > > > > No success.
> > > > > >
> > > > > > Does Clang-PGO support Legacy-BIOS or is something else wrong?
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > >
> > > > > My bootloader is GRUB.
> > > > >
> > > > > In UEFI-BIOS settings there is no secure-boot disable option.
> > > > > Just simple "Use UEFI BIOS" enabled|disabled.
> > > > >
> > > > > Installed Debian packages:
> > > > >
> > > > > ii grub-common 2.04-12
> > > > > ii grub-pc 2.04-12
> > > > > ii grub-pc-bin 2.04-12
> > > > > ii grub2-common 2.04-12
> > > > >
> > > > > The link below suggests running this in the GRUB shell:
> > > > >
> > > > > set check_signatures=no
> > > > >
> > > > > But that applies when grub-efi is installed.
> > > > >
> > > > > - Sedat -
> > > > >
> > > > > Link: https://unix.stackexchange.com/questions/126286/grub-efi-disable-signature-check
> > > >
> > > > Forget about that "Wrong EFI loader signature" message - I see it with all
> > > > my other kernels (which all boot fine).
> > > >
> > > > I tried in QEMU with and without KASLR:
> > > >
> > > > [ run_qemu.sh ]
> > > > KPATH=$(pwd)
> > > >
> > > > APPEND="root=/dev/ram0 console=ttyS0 hung_task_panic=1 earlyprintk=ttyS0,115200"
> > > > APPEND="$APPEND nokaslr"
> > > >
> > > > qemu-system-x86_64 -enable-kvm -M pc -kernel $KPATH/bzImage -initrd
> > > > $KPATH/initrd.img -m 512 -net none -serial stdio -append "${APPEND}"
> > > > [ /run_qemu.sh ]
> > > >
> > > > $ ./run_qemu.sh
> > > > Probing EDD (edd=off to disable)... ok
> > > > Wrong EFI loader signature.
> > > > early console in extract_kernel
> > > > input_data: 0x000000000289940d
> > > > input_len: 0x000000000069804a
> > > > output: 0x0000000001000000
> > > > output_len: 0x0000000001ef2010
> > > > kernel_total_size: 0x0000000001c2c000
> > > > needed_size: 0x0000000002000000
> > > > trampoline_32bit: 0x000000000009d000
> > > >
> > > >
> > > > KASLR disabled: 'nokaslr' on cmdline.
> > > >
> > > >
> > > > Decompressing Linux... Parsing ELF... No relocation needed... done.
> > > > Booting the kernel.
> > > >
> > > > The QEMU run stops here, too.
> > > >
> > >
> > > I re-generated my initrd.img with GZIP as compressor (my default is ZSTD).
> > >
> > > --- /etc/initramfs-tools/initramfs.conf 2021-01-17 12:35:30.823818501 +0100
> > > +++ /etc/initramfs-tools/initramfs.conf.zstd 2020-09-21
> > > 23:55:43.121735427 +0200
> > > @@ -41,7 +41,7 @@ KEYMAP=n
> > > # COMPRESS: [ gzip | bzip2 | lz4 | lzma | lzop | xz | zstd ]
> > > #
> > >
> > > -COMPRESS=gzip
> > > +COMPRESS=zstd
> > >
> > > #
> > > # DEVICE: ...
> > >
> > > root# KVER="5.11.0-rc3-9-amd64-clang12-pgo" ; update-initramfs -c -k $KVER
> > >
> > > QEMU boot stops at the same stage.
> > >
> > > Now, my head is empty...
> > >
> > > Any comments?
> > >
> >
> > ( Just as a side note I have Nick's DWARF-v5 support enabled. )
> >
> > There is one EFI related warning in my build-log:
> >
> > $ grep warning: build-log_5.11.0-rc3-9-amd64-clang12-pgo.txt
> > dpkg-architecture: warning: specified GNU system type x86_64-linux-gnu
> > does not match CC system type x86_64-pc-linux-gnu, try setting a
> > correct CC environment variable
> > warning: arch/x86/platform/efi/quirks.c: Function control flow change
> > detected (hash mismatch) efi_arch_mem_reserve Hash =
> > 391331300655996873 [-Wbackend-plugin]
> > warning: arch/x86/platform/efi/efi.c: Function control flow change
> > detected (hash mismatch) efi_attr_is_visible Hash = 567185240781730690
> > [-Wbackend-plugin]
> > arch/x86/crypto/aegis128-aesni-glue.c:265:30: warning: unused variable
> > 'simd_alg' [-Wunused-variable]
> > warning: lib/crypto/sha256.c: Function control flow change detected
> > (hash mismatch) sha256_update Hash = 744640996947387358
> > [-Wbackend-plugin]
> > warning: arch/x86/boot/compressed/string.c: Function control flow
> > change detected (hash mismatch) memcmp Hash = 742261418966908927
> > [-Wbackend-plugin]
> > warning: arch/x86/boot/compressed/string.c: Function control flow
> > change detected (hash mismatch) bcmp Hash = 742261418966908927
> > [-Wbackend-plugin]
> > warning: arch/x86/boot/compressed/string.c: Function control flow
> > change detected (hash mismatch) strcmp Hash = 536873291001348520
> > [-Wbackend-plugin]
> > warning: arch/x86/boot/compressed/string.c: Function control flow
> > change detected (hash mismatch) strnlen Hash = 146835646621254984
> > [-Wbackend-plugin]
> > warning: arch/x86/boot/compressed/string.c: Function control flow
> > change detected (hash mismatch) simple_strtoull Hash =
> > 252792765950587360 [-Wbackend-plugin]
> > warning: arch/x86/boot/compressed/string.c: Function control flow
> > change detected (hash mismatch) strstr Hash = 391331303349076211
> > [-Wbackend-plugin]
> > warning: arch/x86/boot/compressed/string.c: Function control flow
> > change detected (hash mismatch) strchr Hash = 1063705159280644635
> > [-Wbackend-plugin]
> > warning: arch/x86/boot/compressed/string.c: Function control flow
> > change detected (hash mismatch) kstrtoull Hash = 758414239132790022
> > [-Wbackend-plugin]
> > drivers/infiniband/hw/hfi1/platform.o: warning: objtool: tune_serdes()
> > falls through to next function apply_tx_lanes()
> >
> > I cannot say whether this information is helpful.
> >
>
> My LLVM/Clang v12 is from <apt.llvm.org>:
>
> clang-12 version 1:12~++20210115111113+45ef053bd709-1~exp1~20210115101809.3724
>
> My kernel-config is attached.
>

I dropped "LLVM_IAS=1" from my make line and made these config changes for my next build:

$ scripts/diffconfig /boot/config-5.11.0-rc3-8-amd64-clang12-pgo .config
BUILD_SALT "5.11.0-rc3-8-amd64-clang12-pgo" -> "5.11.0-rc3-10-amd64-clang12-pgo"
DEBUG_INFO_DWARF2 n -> y
DEBUG_INFO_DWARF5 y -> n
PGO_CLANG y -> n

That is, I also dropped DWARF5 support.
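
For reference, the collection steps from the quoted pgo.rst workflow can be
sketched as a small shell helper. This is only a sketch: the function names
and the PGO_DIR variable are placeholders of mine and not part of the patch;
the debugfs paths and the llvm-profdata invocation are the ones quoted above.

```shell
#!/bin/sh
# Sketch of the profile collection steps from the quoted pgo.rst workflow.
# PGO_DIR and the pgo_* function names are illustrative assumptions.

PGO_DIR="${PGO_DIR:-/sys/kernel/debug/pgo}"

# Step 2: writing anything to the reset file resets the profiling counters.
pgo_reset() {
    echo 1 > "$PGO_DIR/reset"
}

# Step 4: copy the serialized counters out of debugfs.
# $1: destination file for the raw profile.
pgo_collect() {
    cp "$PGO_DIR/profraw" "$1"
}

# Step 6: merge one or more raw profiles into an indexed profile.
# $1: output .profdata file; remaining arguments: .profraw inputs.
pgo_merge() {
    out="$1"
    shift
    llvm-profdata merge --output="$out" "$@"
}
```

On the test machine this would be run as root: pgo_reset, then the load
tests, then pgo_collect /tmp/vmlinux.profraw, and finally
pgo_merge vmlinux.profdata /tmp/vmlinux.profraw before the rebuild with
PGO_CLANG disabled.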

- Sedat -

2021-01-17 21:12:15

by Bill Wendling

Subject: Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

On Sun, Jan 17, 2021 at 9:42 AM Sedat Dilek <[email protected]> wrote:
>
> On Sun, Jan 17, 2021 at 1:05 PM Sedat Dilek <[email protected]> wrote:
> >
> > On Sun, Jan 17, 2021 at 12:58 PM Sedat Dilek <[email protected]> wrote:
> > >
> > > On Sun, Jan 17, 2021 at 12:42 PM Sedat Dilek <[email protected]> wrote:
> > > >
> > > > On Sun, Jan 17, 2021 at 12:23 PM Sedat Dilek <[email protected]> wrote:
> > > > >
> > > > > On Sun, Jan 17, 2021 at 11:53 AM Sedat Dilek <[email protected]> wrote:
> > > > > >
> > > > > > On Sun, Jan 17, 2021 at 11:44 AM Sedat Dilek <[email protected]> wrote:
> > > > > > >
> > > > > > > On Sat, Jan 16, 2021 at 9:23 PM Bill Wendling <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Sat, Jan 16, 2021 at 9:39 AM Sedat Dilek <[email protected]> wrote:
> > > > > > > > > On Sat, Jan 16, 2021 at 10:44 AM 'Bill Wendling' via Clang Built Linux
> > > > > > > > > <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > From: Sami Tolvanen <[email protected]>
> > > > > > > > > >
> > > > > > > > > > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > > > > > > > > > profile, the kernel is instrumented with PGO counters, a representative
> > > > > > > > > > workload is run, and the raw profile data is collected from
> > > > > > > > > > /sys/kernel/debug/pgo/profraw.
> > > > > > > > > >
> > > > > > > > > > The raw profile data must be processed by clang's "llvm-profdata" tool
> > > > > > > > > > before it can be used during recompilation:
> > > > > > > > > >
> > > > > > > > > > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > > > > > > > > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > > > > > > >
> > > > > > > > > > Multiple raw profiles may be merged during this step.
> > > > > > > > > >
> > > > > > > > > > The data can now be used by the compiler:
> > > > > > > > > >
> > > > > > > > > > $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > > > > > > > > >
> > > > > > > > > > This initial submission is restricted to x86, as that's the platform we
> > > > > > > > > > know works. This restriction can be lifted once other platforms have
> > > > > > > > > > been verified to work with PGO.
> > > > > > > > > >
> > > > > > > > > > Note that this method of profiling the kernel is clang-native, unlike
> > > > > > > > > > the clang support in kernel/gcov.
> > > > > > > > > >
> > > > > > > > > > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Sami Tolvanen <[email protected]>
> > > > > > > > > > Co-developed-by: Bill Wendling <[email protected]>
> > > > > > > > > > Signed-off-by: Bill Wendling <[email protected]>
> > > > > > > > > > ---
> > > > > > > > > > v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> > > > > > > > > > testing.
> > > > > > > > > > - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> > > > > > > > > > Song's comments.
> > > > > > > > > > v3: - Added change log section based on Sedat Dilek's comments.
> > > > > > > > > > v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
> > > > > > > > > > own popcount implementation, based on Nick Desaulniers's comment.
> > > > > > > > > > v5: - Correct padding calculation, discovered by Nathan Chancellor.
> > > > > > > > > > ---
> > > > > > > > > > Documentation/dev-tools/index.rst | 1 +
> > > > > > > > > > Documentation/dev-tools/pgo.rst | 127 +++++++++
> > > > > > > > > > MAINTAINERS | 9 +
> > > > > > > > > > Makefile | 3 +
> > > > > > > > > > arch/Kconfig | 1 +
> > > > > > > > > > arch/x86/Kconfig | 1 +
> > > > > > > > > > arch/x86/boot/Makefile | 1 +
> > > > > > > > > > arch/x86/boot/compressed/Makefile | 1 +
> > > > > > > > > > arch/x86/crypto/Makefile | 2 +
> > > > > > > > > > arch/x86/entry/vdso/Makefile | 1 +
> > > > > > > > > > arch/x86/kernel/vmlinux.lds.S | 2 +
> > > > > > > > > > arch/x86/platform/efi/Makefile | 1 +
> > > > > > > > > > arch/x86/purgatory/Makefile | 1 +
> > > > > > > > > > arch/x86/realmode/rm/Makefile | 1 +
> > > > > > > > > > arch/x86/um/vdso/Makefile | 1 +
> > > > > > > > > > drivers/firmware/efi/libstub/Makefile | 1 +
> > > > > > > > > > include/asm-generic/vmlinux.lds.h | 44 +++
> > > > > > > > > > kernel/Makefile | 1 +
> > > > > > > > > > kernel/pgo/Kconfig | 35 +++
> > > > > > > > > > kernel/pgo/Makefile | 5 +
> > > > > > > > > > kernel/pgo/fs.c | 382 ++++++++++++++++++++++++++
> > > > > > > > > > kernel/pgo/instrument.c | 185 +++++++++++++
> > > > > > > > > > kernel/pgo/pgo.h | 206 ++++++++++++++
> > > > > > > > > > scripts/Makefile.lib | 10 +
> > > > > > > > > > 24 files changed, 1022 insertions(+)
> > > > > > > > > > create mode 100644 Documentation/dev-tools/pgo.rst
> > > > > > > > > > create mode 100644 kernel/pgo/Kconfig
> > > > > > > > > > create mode 100644 kernel/pgo/Makefile
> > > > > > > > > > create mode 100644 kernel/pgo/fs.c
> > > > > > > > > > create mode 100644 kernel/pgo/instrument.c
> > > > > > > > > > create mode 100644 kernel/pgo/pgo.h
> > > > > > > > > >
> > > > > > > > > > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > > > > > > > > > index f7809c7b1ba9e..8d6418e858062 100644
> > > > > > > > > > --- a/Documentation/dev-tools/index.rst
> > > > > > > > > > +++ b/Documentation/dev-tools/index.rst
> > > > > > > > > > @@ -26,6 +26,7 @@ whole; patches welcome!
> > > > > > > > > > kgdb
> > > > > > > > > > kselftest
> > > > > > > > > > kunit/index
> > > > > > > > > > + pgo
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > .. only:: subproject and html
> > > > > > > > > > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> > > > > > > > > > new file mode 100644
> > > > > > > > > > index 0000000000000..b7f11d8405b73
> > > > > > > > > > --- /dev/null
> > > > > > > > > > +++ b/Documentation/dev-tools/pgo.rst
> > > > > > > > > > @@ -0,0 +1,127 @@
> > > > > > > > > > +.. SPDX-License-Identifier: GPL-2.0
> > > > > > > > > > +
> > > > > > > > > > +===============================
> > > > > > > > > > +Using PGO with the Linux kernel
> > > > > > > > > > +===============================
> > > > > > > > > > +
> > > > > > > > > > +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> > > > > > > > > > +when building with Clang. The profiling data is exported via the ``pgo``
> > > > > > > > > > +debugfs directory.
> > > > > > > > > > +
> > > > > > > > > > +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > > > > > > > > +
> > > > > > > > > > +
> > > > > > > > > > +Preparation
> > > > > > > > > > +===========
> > > > > > > > > > +
> > > > > > > > > > +Configure the kernel with:
> > > > > > > > > > +
> > > > > > > > > > +.. code-block:: make
> > > > > > > > > > +
> > > > > > > > > > + CONFIG_DEBUG_FS=y
> > > > > > > > > > + CONFIG_PGO_CLANG=y
> > > > > > > > > > +
> > > > > > > > > > +Note that kernels compiled with profiling flags will be significantly larger
> > > > > > > > > > +and run slower.
> > > > > > > > > > +
> > > > > > > > > > +Profiling data will only become accessible once debugfs has been mounted:
> > > > > > > > > > +
> > > > > > > > > > +.. code-block:: sh
> > > > > > > > > > +
> > > > > > > > > > + mount -t debugfs none /sys/kernel/debug
> > > > > > > > > > +
> > > > > > > > > > +
> > > > > > > > > > +Customization
> > > > > > > > > > +=============
> > > > > > > > > > +
> > > > > > > > > > +You can enable or disable profiling for individual files and directories by
> > > > > > > > > > +adding a line similar to the following to the respective kernel Makefile:
> > > > > > > > > > +
> > > > > > > > > > +- For a single file (e.g. main.o)
> > > > > > > > > > +
> > > > > > > > > > + .. code-block:: make
> > > > > > > > > > +
> > > > > > > > > > + PGO_PROFILE_main.o := y
> > > > > > > > > > +
> > > > > > > > > > +- For all files in one directory
> > > > > > > > > > +
> > > > > > > > > > + .. code-block:: make
> > > > > > > > > > +
> > > > > > > > > > + PGO_PROFILE := y
> > > > > > > > > > +
> > > > > > > > > > +To exclude files from being profiled use
> > > > > > > > > > +
> > > > > > > > > > + .. code-block:: make
> > > > > > > > > > +
> > > > > > > > > > + PGO_PROFILE_main.o := n
> > > > > > > > > > +
> > > > > > > > > > +and
> > > > > > > > > > +
> > > > > > > > > > + .. code-block:: make
> > > > > > > > > > +
> > > > > > > > > > + PGO_PROFILE := n
> > > > > > > > > > +
> > > > > > > > > > +Only files which are linked to the main kernel image or are compiled as kernel
> > > > > > > > > > +modules are supported by this mechanism.
> > > > > > > > > > +
> > > > > > > > > > +
> > > > > > > > > > +Files
> > > > > > > > > > +=====
> > > > > > > > > > +
> > > > > > > > > > +The PGO kernel support creates the following files in debugfs:
> > > > > > > > > > +
> > > > > > > > > > +``/sys/kernel/debug/pgo``
> > > > > > > > > > + Parent directory for all PGO-related files.
> > > > > > > > > > +
> > > > > > > > > > +``/sys/kernel/debug/pgo/reset``
> > > > > > > > > > + Global reset file: resets all coverage data to zero when written to.
> > > > > > > > > > +
> > > > > > > > > > +``/sys/kernel/debug/pgo/profraw``
> > > > > > > > > > + The raw PGO data that must be processed with ``llvm-profdata``.
> > > > > > > > > > +
> > > > > > > > > > +
> > > > > > > > > > +Workflow
> > > > > > > > > > +========
> > > > > > > > > > +
> > > > > > > > > > +The PGO kernel can be run on the host or test machines. The data, however,
> > > > > > > > > > +should be analyzed with the tools from the same Clang version that was used to
> > > > > > > > > > +compile the kernel. Clang is tolerant of version skew, but it's easier to use
> > > > > > > > > > +the same Clang version.
> > > > > > > > > > +
> > > > > > > > > > +The profiling data is useful for optimizing the kernel, analyzing coverage,
> > > > > > > > > > +etc. Clang offers tools to perform these tasks.
> > > > > > > > > > +
> > > > > > > > > > +Here is an example workflow for profiling an instrumented kernel with PGO and
> > > > > > > > > > +using the result to optimize the kernel:
> > > > > > > > > > +
> > > > > > > > > > +1) Install the kernel on the TEST machine.
> > > > > > > > > > +
> > > > > > > > > > +2) Reset the data counters right before running the load tests
> > > > > > > > > > +
> > > > > > > > > > + .. code-block:: sh
> > > > > > > > > > +
> > > > > > > > > > + $ echo 1 > /sys/kernel/debug/pgo/reset
> > > > > > > > > > +
> > > > > > > > >
> > > > > > > > > I do not get this...
> > > > > > > > >
> > > > > > > > > # mount | grep debugfs
> > > > > > > > > debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
> > > > > > > > >
> > > > > > > > > After the load-test...?
> > > > > > > > >
> > > > > > > > > echo 0 > /sys/kernel/debug/pgo/reset
> > > > > > > > >
> > > > > > > > Writing anything to /sys/kernel/debug/pgo/reset will cause it to reset
> > > > > > > > the profiling counters. I picked 1 (one) semi-randomly, but it could
> > > > > > > > be any number, letter, your favorite short story, etc. You don't want
> > > > > > > > to reset it before collecting the profiling data from your load tests
> > > > > > > > though.
> > > > > > > >
> > > > > > > > > > +3) Run the load tests.
> > > > > > > > > > +
> > > > > > > > > > +4) Collect the raw profile data
> > > > > > > > > > +
> > > > > > > > > > + .. code-block:: sh
> > > > > > > > > > +
> > > > > > > > > > + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > > > > > > > > > +
> > > > > > > > >
> > > > > > > > > This is only 4.9M in size and, going by its timestamp, dates from 5 minutes
> > > > > > > > > before I ran the echo-1 line.
> > > > > > > > >
> > > > > > > > > # ll /sys/kernel/debug/pgo
> > > > > > > > > total 0
> > > > > > > > > drwxr-xr-x 2 root root 0 16. Jan 17:29 .
> > > > > > > > > drwx------ 41 root root 0 16. Jan 17:29 ..
> > > > > > > > > -rw------- 1 root root 0 16. Jan 17:29 profraw
> > > > > > > > > --w------- 1 root root 0 16. Jan 18:19 reset
> > > > > > > > >
> > > > > > > > > # cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > > > > > > > >
> > > > > > > > > # ll /tmp/vmlinux.profraw
> > > > > > > > > -rw------- 1 root root 4,9M 16. Jan 17:29 /tmp/vmlinux.profraw
> > > > > > > > >
> > > > > > > > > It seems to me that no profile data was collected from my defconfig kernel build.
> > > > > > > > >
> > > > > > > > The /sys/kernel/debug/pgo/profraw file is read-only. Nothing writes to
> > > > > > > > it, not even the kernel. All it does is serialize the profiling
> > > > > > > > counters from a memory location in the kernel into a format that
> > > > > > > > LLVM's tools can understand.
> > > > > > > >
> > > > > > > > > > +5) (Optional) Download the raw profile data to the HOST machine.
> > > > > > > > > > +
> > > > > > > > > > +6) Process the raw profile data
> > > > > > > > > > +
> > > > > > > > > > + .. code-block:: sh
> > > > > > > > > > +
> > > > > > > > > > + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > > > > > > > +
> > > > > > > > >
> > > > > > > > > Is that executed in /path/to/linux/git?
> > > > > > > > >
> > > > > > > > The llvm-profdata tool is not in the linux source tree. You need to
> > > > > > > > grab it from a clang distribution (or build it from clang's git repo).
> > > > > > > >
> > > > > > > > > > + Note that multiple raw profile data files can be merged during this step.
> > > > > > > > > > +
> > > > > > > > > > +7) Rebuild the kernel using the profile data (PGO disabled)
> > > > > > > > > > +
> > > > > > > > > > + .. code-block:: sh
> > > > > > > > > > +
> > > > > > > > > > + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > > > > > > > >
> > > > > > > > > How big is vmlinux.profdata (make defconfig)?
> > > > > > > > >
> > > > > > > > I don't have numbers for this, but from what you listed here, it's ~5M
> > > > > > > > in size. The size is proportional to the number of counters
> > > > > > > > instrumented in the kernel.
> > > > > > > >
> > > > > > > > > Do I need to do a full defconfig build or can I stop the build after,
> > > > > > > > > let's say, 10 minutes?
> > > > > > > > >
> > > > > > > > You should do a full rebuild. Make sure that PGO is disabled during the rebuild.
> > > > > > > >
> > > > > > >
> > > > > > > Thanks Bill for all the information.
> > > > > > >
> > > > > > > And sorry for being so pedantic.
> > > > > > >
> > > > > > > I have installed my Debian system with Legacy-BIOS enabled.
> > > > > > >
> > > > > > > When I rebuild with KCFLAGS=-fprofile-use=vmlinux.profdata (LLVM=1 I
> > > > > > > have as a default) my system hangs on reboot.
> > > > > > >
> > > > > > > [ diffconfig ]
> > > > > > > $ scripts/diffconfig /boot/config-5.11.0-rc3-8-amd64-clang12-pgo
> > > > > > > /boot/config-5.11.0-rc3-9-amd64-clang12-pgo
> > > > > > > BUILD_SALT "5.11.0-rc3-8-amd64-clang12-pgo" -> "5.11.0-rc3-9-amd64-clang12-pgo"
> > > > > > > PGO_CLANG y -> n
> > > > > > >
> > > > > > > [ my make line ]
> > > > > > > $ cat ../start-build_5.11.0-rc3-9-amd64-clang12-pgo.txt
> > > > > > > dileks 63120 63095 0 06:47 pts/2 00:00:00 /usr/bin/perf_5.10
> > > > > > > stat make V=1 -j4 HOSTCC=clang HOSTCXX=clang++ HOSTLD=ld.lld CC=clang
> > > > > > > LD=ld.lld LLVM=1 LLVM_IAS=1 PAHOLE=/opt/pahole/bin/pahole
> > > > > > > LOCALVERSION=-9-amd64-clang12-pgo KBUILD_VERBOSE=1
> > > > > > > KBUILD_BUILD_HOST=iniza [email protected]
> > > > > > > KBUILD_BUILD_TIMESTAMP=2021-01-17 bindeb-pkg
> > > > > > > KDEB_PKGVERSION=5.11.0~rc3-9~bullseye+dileks1
> > > > > > > KCFLAGS=-fprofile-use=vmlinux.profdata
> > > > > > >
> > > > > > > ( Yes, 06:47 a.m. in the morning :-). )
> > > > > > >
> > > > > > > When I boot the rebuilt Linux kernel I see:
> > > > > > >
> > > > > > > Wrong EFI loader signature
> > > > > > > ...
> > > > > > > Decompressing
> > > > > > > Parsing EFI
> > > > > > > Performing Relocations done.
> > > > > > > Booting the Kernel.
> > > > > > >
> > > > > > > *** SYSTEM HANGS ***
> > > > > > > ( I waited for approx 1 min )
> > > > > > >
> > > > > > > I tried to turn UEFI support ON and OFF.
> > > > > > > No success.
> > > > > > >
> > > > > > > Does Clang-PGO support Legacy-BIOS or is something different wrong?
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > >
> > > > > > My bootloader is GRUB.
> > > > > >
> > > > > > In UEFI-BIOS settings there is no secure-boot disable option.
> > > > > > Just simple "Use UEFI BIOS" enabled|disabled.
> > > > > >
> > > > > > Installed Debian packages:
> > > > > >
> > > > > > ii grub-common 2.04-12
> > > > > > ii grub-pc 2.04-12
> > > > > > ii grub-pc-bin 2.04-12
> > > > > > ii grub2-common 2.04-12
> > > > > >
> > > > > > I found in the below link to do in grub-shell:
> > > > > >
> > > > > > set check_signatures=no
> > > > > >
> > > > > > But this is when grub-efi is installed.
> > > > > >
> > > > > > - Sedat -
> > > > > >
> > > > > > Link: https://unix.stackexchange.com/questions/126286/grub-efi-disable-signature-check
> > > > >
> > > > > Forget about that "Wrong EFI bootloader" - I see this with all other
> > > > > kernels (all boot fine).
> > > > >
> > > > > I tried in QEMU with and without KASLR:
> > > > >
> > > > > [ run_qemu.sh ]
> > > > > KPATH=$(pwd)
> > > > >
> > > > > APPEND="root=/dev/ram0 console=ttyS0 hung_task_panic=1 earlyprintk=ttyS0,115200"
> > > > > APPEND="$APPEND nokaslr"
> > > > >
> > > > > qemu-system-x86_64 -enable-kvm -M pc -kernel $KPATH/bzImage -initrd
> > > > > $KPATH/initrd.img -m 512 -net none -serial stdio -append "${APPEND}"
> > > > > [ /run_qemu.sh ]
> > > > >
> > > > > $ ./run_qemu.sh
> > > > > Probing EDD (edd=off to disable)... ok
> > > > > Wrong EFI loader signature.
> > > > > early console in extract_kernel
> > > > > input_data: 0x000000000289940d
> > > > > input_len: 0x000000000069804a
> > > > > output: 0x0000000001000000
> > > > > output_len: 0x0000000001ef2010
> > > > > kernel_total_size: 0x0000000001c2c000
> > > > > needed_size: 0x0000000002000000
> > > > > trampoline_32bit: 0x000000000009d000
> > > > >
> > > > >
> > > > > KASLR disabled: 'nokaslr' on cmdline.
> > > > >
> > > > >
> > > > > Decompressing Linux... Parsing ELF... No relocation needed... done.
> > > > > Booting the kernel.
> > > > >
> > > > > QEMU run stops, too.
> > > > >
> > > >
> > > > I re-generated my initrd.img with GZIP as compressor (my default is ZSTD).
> > > >
> > > > --- /etc/initramfs-tools/initramfs.conf 2021-01-17 12:35:30.823818501 +0100
> > > > +++ /etc/initramfs-tools/initramfs.conf.zstd 2020-09-21
> > > > 23:55:43.121735427 +0200
> > > > @@ -41,7 +41,7 @@ KEYMAP=n
> > > > # COMPRESS: [ gzip | bzip2 | lz4 | lzma | lzop | xz | zstd ]
> > > > #
> > > >
> > > > -COMPRESS=gzip
> > > > +COMPRESS=zstd
> > > >
> > > > #
> > > > # DEVICE: ...
> > > >
> > > > root# KVER="5.11.0-rc3-9-amd64-clang12-pgo" ; update-initramfs -c -k $KVER
> > > >
> > > > QEMU boot stops at the same stage.
> > > >
> > > > Now, my head is empty...
> > > >
> > > > Any comments?
> > > >
> > >
> > > ( Just as a side note I have Nick's DWARF-v5 support enabled. )
> > >
> > > There is one EFI related warning in my build-log:
> > >
> > > $ grep warning: build-log_5.11.0-rc3-9-amd64-clang12-pgo.txt
> > > dpkg-architecture: warning: specified GNU system type x86_64-linux-gnu
> > > does not match CC system type x86_64-pc-linux-gnu, try setting a
> > > correct CC environment variable
> > > warning: arch/x86/platform/efi/quirks.c: Function control flow change
> > > detected (hash mismatch) efi_arch_mem_reserve Hash =
> > > 391331300655996873 [-Wbackend-plugin]
> > > warning: arch/x86/platform/efi/efi.c: Function control flow change
> > > detected (hash mismatch) efi_attr_is_visible Hash = 567185240781730690
> > > [-Wbackend-plugin]
> > > arch/x86/crypto/aegis128-aesni-glue.c:265:30: warning: unused variable
> > > 'simd_alg' [-Wunused-variable]
> > > warning: lib/crypto/sha256.c: Function control flow change detected
> > > (hash mismatch) sha256_update Hash = 744640996947387358
> > > [-Wbackend-plugin]
> > > warning: arch/x86/boot/compressed/string.c: Function control flow
> > > change detected (hash mismatch) memcmp Hash = 742261418966908927
> > > [-Wbackend-plugin]
> > > warning: arch/x86/boot/compressed/string.c: Function control flow
> > > change detected (hash mismatch) bcmp Hash = 742261418966908927
> > > [-Wbackend-plugin]
> > > warning: arch/x86/boot/compressed/string.c: Function control flow
> > > change detected (hash mismatch) strcmp Hash = 536873291001348520
> > > [-Wbackend-plugin]
> > > warning: arch/x86/boot/compressed/string.c: Function control flow
> > > change detected (hash mismatch) strnlen Hash = 146835646621254984
> > > [-Wbackend-plugin]
> > > warning: arch/x86/boot/compressed/string.c: Function control flow
> > > change detected (hash mismatch) simple_strtoull Hash =
> > > 252792765950587360 [-Wbackend-plugin]
> > > warning: arch/x86/boot/compressed/string.c: Function control flow
> > > change detected (hash mismatch) strstr Hash = 391331303349076211
> > > [-Wbackend-plugin]
> > > warning: arch/x86/boot/compressed/string.c: Function control flow
> > > change detected (hash mismatch) strchr Hash = 1063705159280644635
> > > [-Wbackend-plugin]
> > > warning: arch/x86/boot/compressed/string.c: Function control flow
> > > change detected (hash mismatch) kstrtoull Hash = 758414239132790022
> > > [-Wbackend-plugin]
> > > drivers/infiniband/hw/hfi1/platform.o: warning: objtool: tune_serdes()
> > > falls through to next function apply_tx_lanes()
> > >
> > > Cannot say if this information is helpful.
> > >
> >
> > My LLVM/Clang v12 is from <apt.llvm.org>:
> >
> > clang-12 version 1:12~++20210115111113+45ef053bd709-1~exp1~20210115101809.3724
> >
> > My kernel-config is attached.
> >
>
> I dropped "LLVM_IAS=1" from my make line and did for my next build:
>
> $ scripts/diffconfig /boot/config-5.11.0-rc3-8-amd64-clang12-pgo .config
> BUILD_SALT "5.11.0-rc3-8-amd64-clang12-pgo" -> "5.11.0-rc3-10-amd64-clang12-pgo"
> DEBUG_INFO_DWARF2 n -> y
> DEBUG_INFO_DWARF5 y -> n
> PGO_CLANG y -> n
>
> Means dropped DWARF5 support.
>
Hi Sedat,

Using PGO just improves optimizations. So unless there's a miscompile,
or some other nefarious thing, it shouldn't affect how the boot loader
runs.

As a sanity check, does the same Linux source and compiler version
generate a bootable kernel when PGO isn't used?

-bw

2021-01-17 23:50:17

by Sedat Dilek

Subject: Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

On Sun, Jan 17, 2021 at 9:35 PM Bill Wendling <[email protected]> wrote:
>
> On Sun, Jan 17, 2021 at 9:42 AM Sedat Dilek <[email protected]> wrote:
> >
> > On Sun, Jan 17, 2021 at 1:05 PM Sedat Dilek <[email protected]> wrote:
> > >
> > > On Sun, Jan 17, 2021 at 12:58 PM Sedat Dilek <[email protected]> wrote:
> > > >
> > > > On Sun, Jan 17, 2021 at 12:42 PM Sedat Dilek <[email protected]> wrote:
> > > > >
> > > > > On Sun, Jan 17, 2021 at 12:23 PM Sedat Dilek <[email protected]> wrote:
> > > > > >
> > > > > > On Sun, Jan 17, 2021 at 11:53 AM Sedat Dilek <[email protected]> wrote:
> > > > > > >
> > > > > > > On Sun, Jan 17, 2021 at 11:44 AM Sedat Dilek <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Sat, Jan 16, 2021 at 9:23 PM Bill Wendling <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > On Sat, Jan 16, 2021 at 9:39 AM Sedat Dilek <[email protected]> wrote:
> > > > > > > > > > On Sat, Jan 16, 2021 at 10:44 AM 'Bill Wendling' via Clang Built Linux
> > > > > > > > > > <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > From: Sami Tolvanen <[email protected]>
> > > > > > > > > > >
> > > > > > > > > > > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > > > > > > > > > > profile, the kernel is instrumented with PGO counters, a representative
> > > > > > > > > > > workload is run, and the raw profile data is collected from
> > > > > > > > > > > /sys/kernel/debug/pgo/profraw.
> > > > > > > > > > >
> > > > > > > > > > > The raw profile data must be processed by clang's "llvm-profdata" tool
> > > > > > > > > > > before it can be used during recompilation:
> > > > > > > > > > >
> > > > > > > > > > > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > > > > > > > > > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > > > > > > > >
> > > > > > > > > > > Multiple raw profiles may be merged during this step.
> > > > > > > > > > >
> > > > > > > > > > > The data can now be used by the compiler:
> > > > > > > > > > >
> > > > > > > > > > > $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > > > > > > > > > >
> > > > > > > > > > > This initial submission is restricted to x86, as that's the platform we
> > > > > > > > > > > know works. This restriction can be lifted once other platforms have
> > > > > > > > > > > been verified to work with PGO.
> > > > > > > > > > >
> > > > > > > > > > > Note that this method of profiling the kernel is clang-native, unlike
> > > > > > > > > > > the clang support in kernel/gcov.
> > > > > > > > > > >
> > > > > > > > > > > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Sami Tolvanen <[email protected]>
> > > > > > > > > > > Co-developed-by: Bill Wendling <[email protected]>
> > > > > > > > > > > Signed-off-by: Bill Wendling <[email protected]>
> > > > > > > > > > > ---
> > > > > > > > > > > v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> > > > > > > > > > > testing.
> > > > > > > > > > > - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> > > > > > > > > > > Song's comments.
> > > > > > > > > > > v3: - Added change log section based on Sedat Dilek's comments.
> > > > > > > > > > > > v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
> > > > > > > > > > > > own popcount implementation, based on Nick Desaulniers's comment.
> > > > > > > > > > > v5: - Correct padding calculation, discovered by Nathan Chancellor.
> > > > > > > > > > > ---
> > > > > > > > > > > Documentation/dev-tools/index.rst | 1 +
> > > > > > > > > > > Documentation/dev-tools/pgo.rst | 127 +++++++++
> > > > > > > > > > > MAINTAINERS | 9 +
> > > > > > > > > > > Makefile | 3 +
> > > > > > > > > > > arch/Kconfig | 1 +
> > > > > > > > > > > arch/x86/Kconfig | 1 +
> > > > > > > > > > > arch/x86/boot/Makefile | 1 +
> > > > > > > > > > > arch/x86/boot/compressed/Makefile | 1 +
> > > > > > > > > > > arch/x86/crypto/Makefile | 2 +
> > > > > > > > > > > arch/x86/entry/vdso/Makefile | 1 +
> > > > > > > > > > > arch/x86/kernel/vmlinux.lds.S | 2 +
> > > > > > > > > > > arch/x86/platform/efi/Makefile | 1 +
> > > > > > > > > > > arch/x86/purgatory/Makefile | 1 +
> > > > > > > > > > > arch/x86/realmode/rm/Makefile | 1 +
> > > > > > > > > > > arch/x86/um/vdso/Makefile | 1 +
> > > > > > > > > > > drivers/firmware/efi/libstub/Makefile | 1 +
> > > > > > > > > > > include/asm-generic/vmlinux.lds.h | 44 +++
> > > > > > > > > > > kernel/Makefile | 1 +
> > > > > > > > > > > kernel/pgo/Kconfig | 35 +++
> > > > > > > > > > > kernel/pgo/Makefile | 5 +
> > > > > > > > > > > kernel/pgo/fs.c | 382 ++++++++++++++++++++++++++
> > > > > > > > > > > kernel/pgo/instrument.c | 185 +++++++++++++
> > > > > > > > > > > kernel/pgo/pgo.h | 206 ++++++++++++++
> > > > > > > > > > > scripts/Makefile.lib | 10 +
> > > > > > > > > > > 24 files changed, 1022 insertions(+)
> > > > > > > > > > > create mode 100644 Documentation/dev-tools/pgo.rst
> > > > > > > > > > > create mode 100644 kernel/pgo/Kconfig
> > > > > > > > > > > create mode 100644 kernel/pgo/Makefile
> > > > > > > > > > > create mode 100644 kernel/pgo/fs.c
> > > > > > > > > > > create mode 100644 kernel/pgo/instrument.c
> > > > > > > > > > > create mode 100644 kernel/pgo/pgo.h
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > > > > > > > > > > index f7809c7b1ba9e..8d6418e858062 100644
> > > > > > > > > > > --- a/Documentation/dev-tools/index.rst
> > > > > > > > > > > +++ b/Documentation/dev-tools/index.rst
> > > > > > > > > > > @@ -26,6 +26,7 @@ whole; patches welcome!
> > > > > > > > > > > kgdb
> > > > > > > > > > > kselftest
> > > > > > > > > > > kunit/index
> > > > > > > > > > > + pgo
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > .. only:: subproject and html
> > > > > > > > > > > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> > > > > > > > > > > new file mode 100644
> > > > > > > > > > > index 0000000000000..b7f11d8405b73
> > > > > > > > > > > --- /dev/null
> > > > > > > > > > > +++ b/Documentation/dev-tools/pgo.rst
> > > > > > > > > > > @@ -0,0 +1,127 @@
> > > > > > > > > > > +.. SPDX-License-Identifier: GPL-2.0
> > > > > > > > > > > +
> > > > > > > > > > > +===============================
> > > > > > > > > > > +Using PGO with the Linux kernel
> > > > > > > > > > > +===============================
> > > > > > > > > > > +
> > > > > > > > > > > +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> > > > > > > > > > > +when building with Clang. The profiling data is exported via the ``pgo``
> > > > > > > > > > > +debugfs directory.
> > > > > > > > > > > +
> > > > > > > > > > > +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > > > > > > > > > > +
> > > > > > > > > > > +
> > > > > > > > > > > +Preparation
> > > > > > > > > > > +===========
> > > > > > > > > > > +
> > > > > > > > > > > +Configure the kernel with:
> > > > > > > > > > > +
> > > > > > > > > > > +.. code-block:: make
> > > > > > > > > > > +
> > > > > > > > > > > + CONFIG_DEBUG_FS=y
> > > > > > > > > > > + CONFIG_PGO_CLANG=y
> > > > > > > > > > > +
> > > > > > > > > > > +Note that kernels compiled with profiling flags will be significantly larger
> > > > > > > > > > > +and run slower.
> > > > > > > > > > > +
> > > > > > > > > > > +Profiling data will only become accessible once debugfs has been mounted:
> > > > > > > > > > > +
> > > > > > > > > > > +.. code-block:: sh
> > > > > > > > > > > +
> > > > > > > > > > > + mount -t debugfs none /sys/kernel/debug
> > > > > > > > > > > +
> > > > > > > > > > > +
> > > > > > > > > > > +Customization
> > > > > > > > > > > +=============
> > > > > > > > > > > +
> > > > > > > > > > > > +You can enable or disable profiling for individual files and directories by
> > > > > > > > > > > +adding a line similar to the following to the respective kernel Makefile:
> > > > > > > > > > > +
> > > > > > > > > > > +- For a single file (e.g. main.o)
> > > > > > > > > > > +
> > > > > > > > > > > + .. code-block:: make
> > > > > > > > > > > +
> > > > > > > > > > > + PGO_PROFILE_main.o := y
> > > > > > > > > > > +
> > > > > > > > > > > +- For all files in one directory
> > > > > > > > > > > +
> > > > > > > > > > > + .. code-block:: make
> > > > > > > > > > > +
> > > > > > > > > > > + PGO_PROFILE := y
> > > > > > > > > > > +
> > > > > > > > > > > +To exclude files from being profiled use
> > > > > > > > > > > +
> > > > > > > > > > > + .. code-block:: make
> > > > > > > > > > > +
> > > > > > > > > > > + PGO_PROFILE_main.o := n
> > > > > > > > > > > +
> > > > > > > > > > > +and
> > > > > > > > > > > +
> > > > > > > > > > > + .. code-block:: make
> > > > > > > > > > > +
> > > > > > > > > > > + PGO_PROFILE := n
> > > > > > > > > > > +
> > > > > > > > > > > +Only files which are linked to the main kernel image or are compiled as kernel
> > > > > > > > > > > +modules are supported by this mechanism.
> > > > > > > > > > > +
> > > > > > > > > > > +
> > > > > > > > > > > +Files
> > > > > > > > > > > +=====
> > > > > > > > > > > +
> > > > > > > > > > > +The PGO kernel support creates the following files in debugfs:
> > > > > > > > > > > +
> > > > > > > > > > > +``/sys/kernel/debug/pgo``
> > > > > > > > > > > + Parent directory for all PGO-related files.
> > > > > > > > > > > +
> > > > > > > > > > > +``/sys/kernel/debug/pgo/reset``
> > > > > > > > > > > > + Global reset file: resets all profiling counters to zero when written to.
> > > > > > > > > > > > +
> > > > > > > > > > > > +``/sys/kernel/debug/pgo/profraw``
> > > > > > > > > > > > + The raw PGO data that must be processed with ``llvm-profdata``.
> > > > > > > > > > > +
> > > > > > > > > > > +
> > > > > > > > > > > +Workflow
> > > > > > > > > > > +========
> > > > > > > > > > > +
> > > > > > > > > > > +The PGO kernel can be run on the host or test machines. The data though should
> > > > > > > > > > > +be analyzed with Clang's tools from the same Clang version as the kernel was
> > > > > > > > > > > +compiled. Clang's tolerant of version skew, but it's easier to use the same
> > > > > > > > > > > +Clang version.
> > > > > > > > > > > +
> > > > > > > > > > > +The profiling data is useful for optimizing the kernel, analyzing coverage,
> > > > > > > > > > > +etc. Clang offers tools to perform these tasks.
> > > > > > > > > > > +
> > > > > > > > > > > +Here is an example workflow for profiling an instrumented kernel with PGO and
> > > > > > > > > > > +using the result to optimize the kernel:
> > > > > > > > > > > +
> > > > > > > > > > > +1) Install the kernel on the TEST machine.
> > > > > > > > > > > +
> > > > > > > > > > > +2) Reset the data counters right before running the load tests
> > > > > > > > > > > +
> > > > > > > > > > > + .. code-block:: sh
> > > > > > > > > > > +
> > > > > > > > > > > + $ echo 1 > /sys/kernel/debug/pgo/reset
> > > > > > > > > > > +
> > > > > > > > > >
> > > > > > > > > > I do not get this...
> > > > > > > > > >
> > > > > > > > > > # mount | grep debugfs
> > > > > > > > > > debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
> > > > > > > > > >
> > > > > > > > > > After the load-test...?
> > > > > > > > > >
> > > > > > > > > > echo 0 > /sys/kernel/debug/pgo/reset
> > > > > > > > > >
> > > > > > > > > Writing anything to /sys/kernel/debug/pgo/reset will cause it to reset
> > > > > > > > > the profiling counters. I picked 1 (one) semi-randomly, but it could
> > > > > > > > > be any number, letter, your favorite short story, etc. You don't want
> > > > > > > > > to reset it before collecting the profiling data from your load tests
> > > > > > > > > though.
> > > > > > > > >
> > > > > > > > > > > +3) Run the load tests.
> > > > > > > > > > > +
> > > > > > > > > > > +4) Collect the raw profile data
> > > > > > > > > > > +
> > > > > > > > > > > + .. code-block:: sh
> > > > > > > > > > > +
> > > > > > > > > > > + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > > > > > > > > > > +
> > > > > > > > > >
> > > > > > > > > > This is only 4,9M small and seen from the date 5mins before I run the
> > > > > > > > > > echo-1 line.
> > > > > > > > > >
> > > > > > > > > > # ll /sys/kernel/debug/pgo
> > > > > > > > > > insgesamt 0
> > > > > > > > > > drwxr-xr-x 2 root root 0 16. Jan 17:29 .
> > > > > > > > > > drwx------ 41 root root 0 16. Jan 17:29 ..
> > > > > > > > > > -rw------- 1 root root 0 16. Jan 17:29 profraw
> > > > > > > > > > --w------- 1 root root 0 16. Jan 18:19 reset
> > > > > > > > > >
> > > > > > > > > > # cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > > > > > > > > >
> > > > > > > > > > # ll /tmp/vmlinux.profraw
> > > > > > > > > > -rw------- 1 root root 4,9M 16. Jan 17:29 /tmp/vmlinux.profraw
> > > > > > > > > >
> > > > > > > > > > For me there was no prof-data collected from my defconfig kernel-build.
> > > > > > > > > >
> > > > > > > > > The /sys/kernel/debug/pgo/profraw file is read-only. Nothing writes to
> > > > > > > > > it, not even the kernel. All it does is serialize the profiling
> > > > > > > > > counters from a memory location in the kernel into a format that
> > > > > > > > > LLVM's tools can understand.
> > > > > > > > >
> > > > > > > > > > > +5) (Optional) Download the raw profile data to the HOST machine.
> > > > > > > > > > > +
> > > > > > > > > > > +6) Process the raw profile data
> > > > > > > > > > > +
> > > > > > > > > > > + .. code-block:: sh
> > > > > > > > > > > +
> > > > > > > > > > > + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > > > > > > > > +
> > > > > > > > > >
> > > > > > > > > > Is that executed in /path/to/linux/git?
> > > > > > > > > >
> > > > > > > > > The llvm-profdata tool is not in the linux source tree. You need to
> > > > > > > > > grab it from a clang distribution (or built from clang's git repo).
> > > > > > > > >
> > > > > > > > > > > + Note that multiple raw profile data files can be merged during this step.
> > > > > > > > > > > +
> > > > > > > > > > > +7) Rebuild the kernel using the profile data (PGO disabled)
> > > > > > > > > > > +
> > > > > > > > > > > + .. code-block:: sh
> > > > > > > > > > > +
> > > > > > > > > > > + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > > > > > > > > >
> > > > > > > > > > How big is vmlinux.profdata (make defconfig)?
> > > > > > > > > >
> > > > > > > > > I don't have numbers for this, but from what you listed here, it's ~5M
> > > > > > > > > in size. The size is proportional to the number of counters
> > > > > > > > > instrumented in the kernel.
> > > > > > > > >
> > > > > > > > > > Do I need to do a full defconfig build or can I stop the build after
> > > > > > > > > > let me say 10mins?
> > > > > > > > > >
> > > > > > > > > You should do a full rebuild. Make sure that PGO is disabled during the rebuild.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Thanks Bill for all the information.
> > > > > > > >
> > > > > > > > And sorry if I am so pedantic.
> > > > > > > >
> > > > > > > > I have installed my Debian system with Legacy-BIOS enabled.
> > > > > > > >
> > > > > > > > When I rebuild with KCFLAGS=-fprofile-use=vmlinux.profdata (LLVM=1 I
> > > > > > > > have as a default) my system hangs on reboot.
> > > > > > > >
> > > > > > > > [ diffconfig ]
> > > > > > > > $ scripts/diffconfig /boot/config-5.11.0-rc3-8-amd64-clang12-pgo
> > > > > > > > /boot/config-5.11.0-rc3-9-amd64-clang12-pgo
> > > > > > > > BUILD_SALT "5.11.0-rc3-8-amd64-clang12-pgo" -> "5.11.0-rc3-9-amd64-clang12-pgo"
> > > > > > > > PGO_CLANG y -> n
> > > > > > > >
> > > > > > > > [ my make line ]
> > > > > > > > $ cat ../start-build_5.11.0-rc3-9-amd64-clang12-pgo.txt
> > > > > > > > dileks 63120 63095 0 06:47 pts/2 00:00:00 /usr/bin/perf_5.10
> > > > > > > > stat make V=1 -j4 HOSTCC=clang HOSTCXX=clang++ HOSTLD=ld.lld CC=clang
> > > > > > > > LD=ld.lld LLVM=1 LLVM_IAS=1 PAHOLE=/opt/pahole/bin/pahole
> > > > > > > > LOCALVERSION=-9-amd64-clang12-pgo KBUILD_VERBOSE=1
> > > > > > > > KBUILD_BUILD_HOST=iniza [email protected]
> > > > > > > > KBUILD_BUILD_TIMESTAMP=2021-01-17 bindeb-pkg
> > > > > > > > KDEB_PKGVERSION=5.11.0~rc3-9~bullseye+dileks1
> > > > > > > > KCFLAGS=-fprofile-use=vmlinux.profdata
> > > > > > > >
> > > > > > > > ( Yes, 06:47 a.m. in the morning :-). )
> > > > > > > >
> > > > > > > > When I boot with the rebuild Linux-kernel I see:
> > > > > > > >
> > > > > > > > Wrong EFI loader signature
> > > > > > > > ...
> > > > > > > > Decompressing
> > > > > > > > Parsing EFI
> > > > > > > > Performing Relocations done.
> > > > > > > > Booting the Kernel.
> > > > > > > >
> > > > > > > > *** SYSTEM HANGS ***
> > > > > > > > ( I waited for approx 1 min )
> > > > > > > >
> > > > > > > > I tried to turn UEFI support ON and OFF.
> > > > > > > > No success.
> > > > > > > >
> > > > > > > > Does Clang-PGO support Legacy-BIOS or is something different wrong?
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > >
> > > > > > > My bootloader is GRUB.
> > > > > > >
> > > > > > > In UEFI-BIOS settings there is no secure-boot disable option.
> > > > > > > Just simple "Use UEFI BIOS" enabled|disabled.
> > > > > > >
> > > > > > > Installed Debian packages:
> > > > > > >
> > > > > > > ii grub-common 2.04-12
> > > > > > > ii grub-pc 2.04-12
> > > > > > > ii grub-pc-bin 2.04-12
> > > > > > > ii grub2-common 2.04-12
> > > > > > >
> > > > > > > I found in the below link to do in grub-shell:
> > > > > > >
> > > > > > > set check_signatures=no
> > > > > > >
> > > > > > > But this is when grub-efi is installed.
> > > > > > >
> > > > > > > - Sedat -
> > > > > > >
> > > > > > > Link: https://unix.stackexchange.com/questions/126286/grub-efi-disable-signature-check
> > > > > >
> > > > > > Forget about that "Wrong EFI bootloader" - I see this with all other
> > > > > > kernels (all boot fine).
> > > > > >
> > > > > > I tried in QEMU with and without KASLR:
> > > > > >
> > > > > > [ run_qemu.sh ]
> > > > > > KPATH=$(pwd)
> > > > > >
> > > > > > APPEND="root=/dev/ram0 console=ttyS0 hung_task_panic=1 earlyprintk=ttyS0,115200"
> > > > > > APPEND="$APPEND nokaslr"
> > > > > >
> > > > > > qemu-system-x86_64 -enable-kvm -M pc -kernel $KPATH/bzImage -initrd
> > > > > > $KPATH/initrd.img -m 512 -net none -serial stdio -append "${APPEND}"
> > > > > > [ /run_qemu.sh ]
> > > > > >
> > > > > > $ ./run_qemu.sh
> > > > > > Probing EDD (edd=off to disable)... ok
> > > > > > Wrong EFI loader signature.
> > > > > > early console in extract_kernel
> > > > > > input_data: 0x000000000289940d
> > > > > > input_len: 0x000000000069804a
> > > > > > output: 0x0000000001000000
> > > > > > output_len: 0x0000000001ef2010
> > > > > > kernel_total_size: 0x0000000001c2c000
> > > > > > needed_size: 0x0000000002000000
> > > > > > trampoline_32bit: 0x000000000009d000
> > > > > >
> > > > > >
> > > > > > KASLR disabled: 'nokaslr' on cmdline.
> > > > > >
> > > > > >
> > > > > > Decompressing Linux... Parsing ELF... No relocation needed... done.
> > > > > > Booting the kernel.
> > > > > >
> > > > > > QEMU run stops, too.
> > > > > >
> > > > >
> > > > > I re-generated my initrd.img with GZIP as compressor (my default is ZSTD).
> > > > >
> > > > > --- /etc/initramfs-tools/initramfs.conf 2021-01-17 12:35:30.823818501 +0100
> > > > > +++ /etc/initramfs-tools/initramfs.conf.zstd 2020-09-21
> > > > > 23:55:43.121735427 +0200
> > > > > @@ -41,7 +41,7 @@ KEYMAP=n
> > > > > # COMPRESS: [ gzip | bzip2 | lz4 | lzma | lzop | xz | zstd ]
> > > > > #
> > > > >
> > > > > -COMPRESS=gzip
> > > > > +COMPRESS=zstd
> > > > >
> > > > > #
> > > > > # DEVICE: ...
> > > > >
> > > > > root# KVER="5.11.0-rc3-9-amd64-clang12-pgo" ; update-initramfs -c -k $KVER
> > > > >
> > > > > QEMU boot stops at the same stage.
> > > > >
> > > > > Now, my head is empty...
> > > > >
> > > > > Any comments?
> > > > >
> > > >
> > > > ( Just as a side note I have Nick's DWARF-v5 support enabled. )
> > > >
> > > > There is one EFI related warning in my build-log:
> > > >
> > > > $ grep warning: build-log_5.11.0-rc3-9-amd64-clang12-pgo.txt
> > > > dpkg-architecture: warning: specified GNU system type x86_64-linux-gnu
> > > > does not match CC system type x86_64-pc-linux-gnu, try setting a
> > > > correct CC environment variable
> > > > warning: arch/x86/platform/efi/quirks.c: Function control flow change
> > > > detected (hash mismatch) efi_arch_mem_reserve Hash =
> > > > 391331300655996873 [-Wbackend-plugin]
> > > > warning: arch/x86/platform/efi/efi.c: Function control flow change
> > > > detected (hash mismatch) efi_attr_is_visible Hash = 567185240781730690
> > > > [-Wbackend-plugin]
> > > > arch/x86/crypto/aegis128-aesni-glue.c:265:30: warning: unused variable
> > > > 'simd_alg' [-Wunused-variable]
> > > > warning: lib/crypto/sha256.c: Function control flow change detected
> > > > (hash mismatch) sha256_update Hash = 744640996947387358
> > > > [-Wbackend-plugin]
> > > > warning: arch/x86/boot/compressed/string.c: Function control flow
> > > > change detected (hash mismatch) memcmp Hash = 742261418966908927
> > > > [-Wbackend-plugin]
> > > > warning: arch/x86/boot/compressed/string.c: Function control flow
> > > > change detected (hash mismatch) bcmp Hash = 742261418966908927
> > > > [-Wbackend-plugin]
> > > > warning: arch/x86/boot/compressed/string.c: Function control flow
> > > > change detected (hash mismatch) strcmp Hash = 536873291001348520
> > > > [-Wbackend-plugin]
> > > > warning: arch/x86/boot/compressed/string.c: Function control flow
> > > > change detected (hash mismatch) strnlen Hash = 146835646621254984
> > > > [-Wbackend-plugin]
> > > > warning: arch/x86/boot/compressed/string.c: Function control flow
> > > > change detected (hash mismatch) simple_strtoull Hash =
> > > > 252792765950587360 [-Wbackend-plugin]
> > > > warning: arch/x86/boot/compressed/string.c: Function control flow
> > > > change detected (hash mismatch) strstr Hash = 391331303349076211
> > > > [-Wbackend-plugin]
> > > > warning: arch/x86/boot/compressed/string.c: Function control flow
> > > > change detected (hash mismatch) strchr Hash = 1063705159280644635
> > > > [-Wbackend-plugin]
> > > > warning: arch/x86/boot/compressed/string.c: Function control flow
> > > > change detected (hash mismatch) kstrtoull Hash = 758414239132790022
> > > > [-Wbackend-plugin]
> > > > drivers/infiniband/hw/hfi1/platform.o: warning: objtool: tune_serdes()
> > > > falls through to next function apply_tx_lanes()
> > > >
> > > > Cannot say if this information is helpful.
> > > >
> > >
> > > My LLVM/Clang v12 is from <apt.llvm.org>:
> > >
> > > clang-12 version 1:12~++20210115111113+45ef053bd709-1~exp1~20210115101809.3724
> > >
> > > My kernel-config is attached.
> > >
> >
> > I dropped "LLVM_IAS=1" from my make line and did for my next build:
> >
> > $ scripts/diffconfig /boot/config-5.11.0-rc3-8-amd64-clang12-pgo .config
> > BUILD_SALT "5.11.0-rc3-8-amd64-clang12-pgo" -> "5.11.0-rc3-10-amd64-clang12-pgo"
> > DEBUG_INFO_DWARF2 n -> y
> > DEBUG_INFO_DWARF5 y -> n
> > PGO_CLANG y -> n
> >
> > Means dropped DWARF5 support.
> >
> Hi Sedat,
>
> Using PGO just improves optimizations. So unless there's a miscompile,
> or some other nefarious thing, it shouldn't affect how the boot loader
> runs.
>
> As a sanity check, does the same Linux source and compiler version
> generate a bootable kernel when PGO isn't used?
>

Yes, I can boot with the same code base without PGO, using the
attached kernel-config.

I remember there is a fix in the CBL issue tracker
( https://github.com/ClangBuiltLinux/linux/issues/1250 ) for the
module-loading errors I now see:

Loading, please wait...
Starting version 247.2-4
[ 2.157223] floppy: module verification failed: signature and/or required key missing - tainting kernel
[ 2.179326] i2c_piix4: Unknown symbol _GLOBAL_OFFSET_TABLE_ (err -2)
[ 2.183558] scsi_mod: Unknown symbol _GLOBAL_OFFSET_TABLE_ (err -2)
[ 2.187991] floppy: Unknown symbol _GLOBAL_OFFSET_TABLE_ (err -2)
[ 2.195047] psmouse: Unknown symbol _GLOBAL_OFFSET_TABLE_ (err -2)
[ 2.210404] scsi_mod: Unknown symbol _GLOBAL_OFFSET_TABLE_ (err -2)
[ 2.231055] scsi_mod: Unknown symbol _GLOBAL_OFFSET_TABLE_ (err -2)
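
To see at a glance which modules are affected, log lines like the ones above can be reduced to a unique list of module names. This is only a minimal sketch: the function name `got_failures` is hypothetical, and it assumes dmesg-style lines of the form `[ time] module: Unknown symbol _GLOBAL_OFFSET_TABLE_ ...` exactly as shown in this log.

```shell
# Hypothetical helper (not part of the kernel tree or of this patch):
# read dmesg-style text on stdin and print, once each, the names of modules
# that failed to load because _GLOBAL_OFFSET_TABLE_ could not be resolved.
got_failures() {
    # Keep only the "<module>: Unknown symbol _GLOBAL_OFFSET_TABLE_" lines
    # and strip everything except the module name.
    sed -n 's/^\[ *[0-9.]*] \([^:]*\): Unknown symbol _GLOBAL_OFFSET_TABLE_ .*/\1/p' |
        LC_ALL=C sort -u
}

# Example usage (assumes permission to read the kernel log):
#   dmesg | got_failures
```

Lines that mention a module for another reason, such as the "module verification failed" message, are ignored because they do not match the pattern.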

Full QEMU log...

$ ./run_qemu.sh
Probing EDD (edd=off to disable)... ok
Wrong EFI loader signature.
early console in extract_kernel
input_data: 0x000000000289c40d
input_len: 0x0000000000693f62
output: 0x0000000001000000
output_len: 0x0000000001ef0224
kernel_total_size: 0x0000000001c2c000
needed_size: 0x0000000002000000
trampoline_32bit: 0x000000000009d000
Physical KASLR using RDTSC...
Virtual KASLR using RDTSC...

Decompressing Linux... Parsing ELF... Performing relocations... done.
Booting the kernel.
[ 0.000000] Linux version 5.11.0-rc3-10-amd64-clang12-pgo ([email protected]@iniza) (Debian clang version 12.0.0-++20210115111113+45ef053bd709-1~exp1~20210115101809.3724, LLD 12.0.0) #10~bullseye+dileks1 SMP 2021-01-17
[ 0.000000] Command line: root=/dev/ram0 console=ttyS0 hung_task_panic=1 earlyprintk=ttyS0,115200
[ 0.000000] x86/fpu: x87 FPU will use FXSAVE
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000001ffdffff] usable
[ 0.000000] BIOS-e820: [mem 0x000000001ffe0000-0x000000001fffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[ 0.000000] printk: bootconsole [earlyser0] enabled
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] SMBIOS 2.8 present.
[ 0.000000] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.14.0-2 04/01/2014
[ 0.000000] Hypervisor detected: KVM
[ 0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[ 0.000000] kvm-clock: cpu 0, msr c877001, primary cpu clock
[ 0.000000] kvm-clock: using sched offset of 510123624 cycles
[ 0.003240] clocksource: kvm-clock: mask: 0xffffffffffffffff
max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[ 0.009652] tsc: Detected 1596.372 MHz processor
[ 0.013107] last_pfn = 0x1ffe0 max_arch_pfn = 0x400000000
[ 0.015537] x86/PAT: PAT not supported by the CPU.
[ 0.017605] x86/PAT: Configuration [0-7]: WB WT UC- UC WB WT UC- UC
Memory KASLR using RDTSC...
[ 0.038444] found SMP MP-table at [mem 0x000f5ce0-0x000f5cef]
[ 0.042330] RAMDISK: [mem 0x1dfdb000-0x1ffdffff]
[ 0.044738] ACPI: Early table checksum verification disabled
[ 0.047289] ACPI: RSDP 0x00000000000F5B20 000014 (v00 BOCHS )
[ 0.049887] ACPI: RSDT 0x000000001FFE1550 000034 (v01 BOCHS
BXPCRSDT 00000001 BXPC 00000001)
[ 0.054578] ACPI: FACP 0x000000001FFE1404 000074 (v01 BOCHS
BXPCFACP 00000001 BXPC 00000001)
[ 0.058412] ACPI: DSDT 0x000000001FFE0040 0013C4 (v01 BOCHS
BXPCDSDT 00000001 BXPC 00000001)
[ 0.062056] ACPI: FACS 0x000000001FFE0000 000040
[ 0.064325] ACPI: APIC 0x000000001FFE1478 000078 (v01 BOCHS
BXPCAPIC 00000001 BXPC 00000001)
[ 0.068546] ACPI: HPET 0x000000001FFE14F0 000038 (v01 BOCHS
BXPCHPET 00000001 BXPC 00000001)
[ 0.073026] ACPI: WAET 0x000000001FFE1528 000028 (v01 BOCHS
BXPCWAET 00000001 BXPC 00000001)
[ 0.078063] No NUMA configuration found
[ 0.080007] Faking a node at [mem 0x0000000000000000-0x000000001ffdffff]
[ 0.083430] NODE_DATA(0) allocated [mem 0x1dfb1000-0x1dfdafff]
[ 0.086934] Zone ranges:
[ 0.087919] DMA [mem 0x0000000000001000-0x0000000000ffffff]
[ 0.089927] DMA32 [mem 0x0000000001000000-0x000000001ffdffff]
[ 0.092270] Normal empty
[ 0.093824] Device empty
[ 0.095069] Movable zone start for each node
[ 0.096880] Early memory node ranges
[ 0.098410] node 0: [mem 0x0000000000001000-0x000000000009efff]
[ 0.101939] node 0: [mem 0x0000000000100000-0x000000001ffdffff]
[ 0.106130] Zeroed struct page in unavailable ranges: 130 pages
[ 0.106139] Initmem setup node 0 [mem 0x0000000000001000-0x000000001ffdffff]
[ 0.115094] ACPI: PM-Timer IO Port: 0x608
[ 0.117173] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[ 0.121073] IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
[ 0.123537] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.126254] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[ 0.129062] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.131888] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[ 0.135065] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[ 0.137286] Using ACPI (MADT) for SMP configuration information
[ 0.139743] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[ 0.141956] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
[ 0.143678] PM: hibernation: Registered nosave memory: [mem
0x00000000-0x00000fff]
[ 0.146249] PM: hibernation: Registered nosave memory: [mem
0x0009f000-0x0009ffff]
[ 0.148784] PM: hibernation: Registered nosave memory: [mem
0x000a0000-0x000effff]
[ 0.152756] PM: hibernation: Registered nosave memory: [mem
0x000f0000-0x000fffff]
[ 0.155969] [mem 0x20000000-0xfeffbfff] available for PCI devices
[ 0.158542] Booting paravirtualized kernel on KVM
[ 0.160520] clocksource: refined-jiffies: mask: 0xffffffff
max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[ 0.171049] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:1
nr_cpu_ids:1 nr_node_ids:1
[ 0.175162] percpu: Embedded 54 pages/cpu s183512 r8192 d29480 u2097152
[ 0.178044] kvm-guest: stealtime: cpu 0, msr 1d418480
[ 0.180197] kvm-guest: PV spinlocks disabled, no host support
[ 0.182655] Built 1 zonelists, mobility grouping on. Total pages: 128872
[ 0.188717] Policy zone: DMA32
[ 0.190055] Kernel command line: root=/dev/ram0 console=ttyS0
hung_task_panic=1 earlyprintk=ttyS0,115200
[ 0.194307] Dentry cache hash table entries: 65536 (order: 7,
524288 bytes, linear)
[ 0.197691] Inode-cache hash table entries: 32768 (order: 6, 262144
bytes, linear)
[ 0.201953] mem auto-init: stack:off, heap alloc:on, heap free:off
[ 0.206787] Memory: 232680K/523768K available (12295K kernel code,
2462K rwdata, 4008K rodata, 2444K init, 1888K bss, 71012K reserved, 0K
cma-reserved)
[ 0.212719] random: get_random_u64 called from
kmem_cache_open+0x27/0x500 with crng_init=0
[ 0.212892] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.220858] Kernel/User page tables isolation: enabled
[ 0.223136] ftrace: allocating 36189 entries in 142 pages
[ 0.249721] ftrace: allocated 142 pages with 4 groups
[ 0.252993] rcu: Hierarchical RCU implementation.
[ 0.255411] rcu: RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=1.
[ 0.258890] Rude variant of Tasks RCU enabled.
[ 0.260761] Tracing variant of Tasks RCU enabled.
[ 0.262625] rcu: RCU calculated value of scheduler-enlistment delay
is 25 jiffies.
[ 0.265212] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[ 0.271882] NR_IRQS: 524544, nr_irqs: 256, preallocated irqs: 16
[ 0.295378] Console: colour VGA+ 80x25
[ 0.297439] printk: console [ttyS0] enabled
[ 0.297439] printk: console [ttyS0] enabled
[ 0.302560] printk: bootconsole [earlyser0] disabled
[ 0.302560] printk: bootconsole [earlyser0] disabled
[ 0.307728] ACPI: Core revision 20201113
[ 0.310172] clocksource: hpet: mask: 0xffffffff max_cycles:
0xffffffff, max_idle_ns: 19112604467 ns
[ 0.315115] APIC: Switch to symmetric I/O mode setup
[ 0.318899] x2apic enabled
[ 0.321088] Switched APIC routing to physical x2apic.
[ 0.326381] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.329537] clocksource: tsc-early: mask: 0xffffffffffffffff
max_cycles: 0x1702c1d9d3d, max_idle_ns: 440795278546 ns
[ 0.335417] Calibrating delay loop (skipped) preset value.. 3192.74
BogoMIPS (lpj=6385488)
[ 0.339418] pid_max: default: 32768 minimum: 301
[ 0.341620] LSM: Security Framework initializing
[ 0.343446] Yama: becoming mindful.
[ 0.345314] AppArmor: AppArmor initialized
[ 0.347421] TOMOYO Linux initialized
[ 0.349270] Mount-cache hash table entries: 1024 (order: 1, 8192
bytes, linear)
[ 0.351417] Mountpoint-cache hash table entries: 1024 (order: 1,
8192 bytes, linear)
Poking KASLR using RDTSC...
[ 0.361119] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[ 0.363416] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
[ 0.367419] Spectre V1 : Mitigation: usercopy/swapgs barriers and
__user pointer sanitization
[ 0.370260] Spectre V2 : Mitigation: Full generic retpoline
[ 0.371412] Spectre V2 : Spectre v2 / SpectreRSB mitigation:
Filling RSB on context switch
[ 0.374257] Speculative Store Bypass: Vulnerable
[ 0.375416] MDS: Vulnerable: Clear CPU buffers attempted, no microcode
[ 0.389948] Freeing SMP alternatives memory: 36K
[ 0.505617] APIC calibration not consistent with PM-Timer: 101ms
instead of 100ms
[ 0.507410] APIC delta adjusted to PM-Timer: 6252138 (6321934)
[ 0.507513] smpboot: CPU0: Intel QEMU Virtual CPU version 2.5+
(family: 0x6, model: 0x6, stepping: 0x3)
[ 0.512111] Performance Events: PMU not available due to
virtualization, using software events only.
[ 0.515510] rcu: Hierarchical SRCU implementation.
[ 0.517439] NMI watchdog: Perf NMI watchdog permanently disabled
[ 0.519477] smp: Bringing up secondary CPUs ...
[ 0.523416] smp: Brought up 1 node, 1 CPU
[ 0.525134] smpboot: Max logical packages: 1
[ 0.526969] smpboot: Total of 1 processors activated (3192.74 BogoMIPS)
[ 0.532118] node 0 deferred pages initialised in 4ms
[ 0.534052] devtmpfs: initialized
[ 0.535262] x86/mm: Memory block size: 128MB
[ 0.535711] clocksource: jiffies: mask: 0xffffffff max_cycles:
0xffffffff, max_idle_ns: 7645041785100000 ns
[ 0.539428] futex hash table entries: 256 (order: 2, 16384 bytes, linear)
[ 0.541875] pinctrl core: initialized pinctrl subsystem
[ 0.543936] NET: Registered protocol family 16
[ 0.547553] audit: initializing netlink subsys (disabled)
[ 0.551634] thermal_sys: Registered thermal governor 'fair_share'
[ 0.551637] thermal_sys: Registered thermal governor 'bang_bang'
[ 0.554723] thermal_sys: Registered thermal governor 'step_wise'
[ 0.555425] audit: type=2000 audit(1610926004.833:1):
state=initialized audit_enabled=0 res=1
[ 0.563420] thermal_sys: Registered thermal governor 'user_space'
[ 0.563434] cpuidle: using governor ladder
[ 0.569524] cpuidle: using governor menu
[ 0.571485] ACPI: bus type PCI registered
[ 0.573517] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[ 0.576324] PCI: Using configuration type 1 for base access
[ 0.580588] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[ 0.772658] ACPI: Added _OSI(Module Device)
[ 0.774521] ACPI: Added _OSI(Processor Device)
[ 0.775417] ACPI: Added _OSI(3.0 _SCP Extensions)
[ 0.778176] ACPI: Added _OSI(Processor Aggregator Device)
[ 0.779432] ACPI: Added _OSI(Linux-Dell-Video)
[ 0.783458] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[ 0.785480] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
[ 0.788133] ACPI: 1 ACPI AML tables successfully acquired and loaded
[ 0.792295] ACPI: Interpreter enabled
[ 0.794716] ACPI: (supports S0 S3 S4 S5)
[ 0.795415] ACPI: Using IOAPIC for interrupt routing
[ 0.797540] PCI: Using host bridge windows from ACPI; if necessary,
use "pci=nocrs" and report a bug
[ 0.799590] ACPI: Enabled 2 GPEs in block 00 to 0F
[ 0.807844] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[ 0.811186] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM
Segments MSI HPX-Type3]
[ 0.811444] acpi PNP0A03:00: fail to add MMCONFIG information,
can't access extended PCI configuration space under this bridge.
[ 0.815895] acpiphp: Slot [3] registered
[ 0.819473] acpiphp: Slot [4] registered
[ 0.821210] acpiphp: Slot [5] registered
[ 0.823453] acpiphp: Slot [6] registered
[ 0.825153] acpiphp: Slot [7] registered
[ 0.827461] acpiphp: Slot [8] registered
[ 0.829166] acpiphp: Slot [9] registered
[ 0.831537] acpiphp: Slot [10] registered
[ 0.833276] acpiphp: Slot [11] registered
[ 0.835447] acpiphp: Slot [12] registered
[ 0.837183] acpiphp: Slot [13] registered
[ 0.839428] acpiphp: Slot [14] registered
[ 0.841167] acpiphp: Slot [15] registered
[ 0.843042] acpiphp: Slot [16] registered
[ 0.843455] acpiphp: Slot [17] registered
[ 0.845205] acpiphp: Slot [18] registered
[ 0.847452] acpiphp: Slot [19] registered
[ 0.849209] acpiphp: Slot [20] registered
[ 0.851448] acpiphp: Slot [21] registered
[ 0.853215] acpiphp: Slot [22] registered
[ 0.855447] acpiphp: Slot [23] registered
[ 0.857179] acpiphp: Slot [24] registered
[ 0.859478] acpiphp: Slot [25] registered
[ 0.861807] acpiphp: Slot [26] registered
[ 0.863150] acpiphp: Slot [27] registered
[ 0.863458] acpiphp: Slot [28] registered
[ 0.865444] acpiphp: Slot [29] registered
[ 0.867451] acpiphp: Slot [30] registered
[ 0.868826] acpiphp: Slot [31] registered
[ 0.870296] PCI host bridge to bus 0000:00
[ 0.871415] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window]
[ 0.875414] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window]
[ 0.879418] pci_bus 0000:00: root bus resource [mem
0x000a0000-0x000bffff window]
[ 0.883416] pci_bus 0000:00: root bus resource [mem
0x20000000-0xfebfffff window]
[ 0.887416] pci_bus 0000:00: root bus resource [mem
0x100000000-0x17fffffff window]
[ 0.891277] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 0.891510] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
[ 0.896375] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100
[ 0.900672] pci 0000:00:01.1: [8086:7010] type 00 class 0x010180
[ 0.908157] pci 0000:00:01.1: reg 0x20: [io 0xc000-0xc00f]
[ 0.912723] pci 0000:00:01.1: legacy IDE quirk: reg 0x10: [io 0x01f0-0x01f7]
[ 0.915417] pci 0000:00:01.1: legacy IDE quirk: reg 0x14: [io 0x03f6]
[ 0.919413] pci 0000:00:01.1: legacy IDE quirk: reg 0x18: [io 0x0170-0x0177]
[ 0.923413] pci 0000:00:01.1: legacy IDE quirk: reg 0x1c: [io 0x0376]
[ 0.926608] pci 0000:00:01.3: [8086:7113] type 00 class 0x068000
[ 0.928431] pci 0000:00:01.3: quirk: [io 0x0600-0x063f] claimed by
PIIX4 ACPI
[ 0.931433] pci 0000:00:01.3: quirk: [io 0x0700-0x070f] claimed by PIIX4 SMB
[ 0.935898] pci 0000:00:02.0: [1234:1111] type 00 class 0x030000
[ 0.941665] pci 0000:00:02.0: reg 0x10: [mem 0xfd000000-0xfdffffff pref]
[ 0.949458] pci 0000:00:02.0: reg 0x18: [mem 0xfebf0000-0xfebf0fff]
[ 0.958562] pci 0000:00:02.0: reg 0x30: [mem 0xfebe0000-0xfebeffff pref]
[ 0.961151] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
[ 0.963610] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[ 0.966032] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[ 0.967627] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[ 0.971526] ACPI: PCI Interrupt Link [LNKS] (IRQs *9)
[ 0.974667] iommu: Default domain type: Translated
[ 0.975568] pci 0000:00:02.0: vgaarb: setting as boot VGA device
[ 0.978113] pci 0000:00:02.0: vgaarb: VGA device added:
decodes=io+mem,owns=io+mem,locks=none
[ 0.979413] pci 0000:00:02.0: vgaarb: bridge control possible
[ 0.983413] vgaarb: loaded
[ 0.984827] EDAC MC: Ver: 3.0.0
[ 0.988222] NetLabel: Initializing
[ 0.991415] NetLabel: domain hash size = 128
[ 0.992873] NetLabel: protocols = UNLABELED CIPSOv4 CALIPSO
[ 0.994825] NetLabel: unlabeled traffic allowed by default
[ 0.995430] PCI: Using ACPI for IRQ routing
[ 0.999490] hpet: 3 channels of 0 reserved for per-cpu timers
[ 1.001394] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[ 1.002975] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
[ 1.009634] clocksource: Switched to clocksource kvm-clock
[ 1.019197] VFS: Disk quotas dquot_6.6.0
[ 1.021644] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 1.029347] AppArmor: AppArmor Filesystem Enabled
[ 1.031431] pnp: PnP ACPI init
[ 1.033294] pnp: PnP ACPI: found 6 devices
[ 1.041838] clocksource: acpi_pm: mask: 0xffffff max_cycles:
0xffffff, max_idle_ns: 2085701024 ns
[ 1.045506] NET: Registered protocol family 2
[ 1.047325] tcp_listen_portaddr_hash hash table entries: 256
(order: 0, 4096 bytes, linear)
[ 1.051250] TCP established hash table entries: 4096 (order: 3,
32768 bytes, linear)
[ 1.054797] TCP bind hash table entries: 4096 (order: 4, 65536 bytes, linear)
[ 1.057867] TCP: Hash tables configured (established 4096 bind 4096)
[ 1.060657] UDP hash table entries: 256 (order: 1, 8192 bytes, linear)
[ 1.063438] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes, linear)
[ 1.066616] NET: Registered protocol family 1
[ 1.068525] NET: Registered protocol family 44
[ 1.070988] pci_bus 0000:00: resource 4 [io 0x0000-0x0cf7 window]
[ 1.073088] pci_bus 0000:00: resource 5 [io 0x0d00-0xffff window]
[ 1.075350] pci_bus 0000:00: resource 6 [mem 0x000a0000-0x000bffff window]
[ 1.078851] pci_bus 0000:00: resource 7 [mem 0x20000000-0xfebfffff window]
[ 1.082396] pci_bus 0000:00: resource 8 [mem 0x100000000-0x17fffffff window]
[ 1.086505] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[ 1.089003] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[ 1.091193] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[ 1.093281] pci 0000:00:02.0: Video device with shadowed ROM at
[mem 0x000c0000-0x000dffff]
[ 1.096308] PCI: CLS 0 bytes, default 64
[ 1.098784] Trying to unpack rootfs image as initramfs...
[ 1.756924] Freeing initrd memory: 32788K
[ 1.759044] clocksource: tsc: mask: 0xffffffffffffffff max_cycles:
0x1702c1d9d3d, max_idle_ns: 440795278546 ns
[ 1.765351] Initialise system trusted keyrings
[ 1.767287] Key type blacklist registered
[ 1.769096] workingset: timestamp_bits=36 max_order=17 bucket_order=0
[ 1.773218] zbud: loaded
[ 1.774596] integrity: Platform Keyring initialized
[ 1.776709] Key type asymmetric registered
[ 1.779399] Asymmetric key parser 'x509' registered
[ 1.781504] Block layer SCSI generic (bsg) driver version 0.4
loaded (major 251)
[ 1.784737] io scheduler mq-deadline registered
[ 1.786842] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[ 1.790028] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[ 1.793393] 00:04: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200)
is a 16550A
[ 1.798437] Linux agpgart interface v0.103
[ 1.799944] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel <[email protected]>
[ 1.802535] AMD-Vi: AMD IOMMUv2 functionality not available on this system
[ 1.806358] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU]
at 0x60,0x64 irq 1,12
[ 1.810762] serio: i8042 KBD port at 0x60,0x64 irq 1
[ 1.813927] serio: i8042 AUX port at 0x60,0x64 irq 12
[ 1.816306] mousedev: PS/2 mouse device common for all mice
[ 1.819254] input: AT Translated Set 2 keyboard as
/devices/platform/i8042/serio0/input/input0
[ 1.823023] rtc_cmos 00:05: RTC can wake from S4
[ 1.826320] rtc_cmos 00:05: registered as rtc0
[ 1.829030] rtc_cmos 00:05: setting system clock to
2021-01-17T23:26:45 UTC (1610926005)
[ 1.832489] rtc_cmos 00:05: alarms up to one day, y3k, 242 bytes
nvram, hpet irqs
[ 1.835661] intel_pstate: CPU model not supported
[ 1.837656] ledtrig-cpu: registered to indicate activity on CPUs
[ 1.840489] NET: Registered protocol family 10
[ 1.857135] Segment Routing with IPv6
[ 1.858772] mip6: Mobile IPv6
[ 1.860093] NET: Registered protocol family 17
[ 1.862844] mpls_gso: MPLS GSO support
[ 1.864379] IPI shorthand broadcast: enabled
[ 1.865844] sched_clock: Marking stable (1819436328,
44726425)->(1868284483, -4121730)
[ 1.869029] registered taskstats version 1
[ 1.870771] Loading compiled-in X.509 certificates
[ 1.873185] zswap: loaded using pool zstd/zbud
[ 1.875399] Key type ._fscrypt registered
[ 1.877158] Key type .fscrypt registered
[ 1.879447] Key type fscrypt-provisioning registered
[ 1.881189] AppArmor: AppArmor sha1 policy hashing enabled
[ 1.886920] Freeing unused kernel image (initmem) memory: 2444K
[ 1.891517] Write protecting the kernel read-only data: 18432k
[ 1.896049] Freeing unused kernel image (text/rodata gap) memory: 2040K
[ 1.899196] Freeing unused kernel image (rodata/data gap) memory: 88K
[ 1.968324] x86/mm: Checked W+X mappings: passed, no W+X pages found.
[ 1.971797] x86/mm: Checking user space page tables
[ 2.037848] x86/mm: Checked W+X mappings: passed, no W+X pages found.
[ 2.040258] Run /init as init process
Loading, please wait...
Starting version 247.2-4
[ 2.157223] floppy: module verification failed: signature and/or
required key missing - tainting kernel
[ 2.179326] i2c_piix4: Unknown symbol _GLOBAL_OFFSET_TABLE_ (err -2)
[ 2.183558] scsi_mod: Unknown symbol _GLOBAL_OFFSET_TABLE_ (err -2)
[ 2.187991] floppy: Unknown symbol _GLOBAL_OFFSET_TABLE_ (err -2)
[ 2.195047] psmouse: Unknown symbol _GLOBAL_OFFSET_TABLE_ (err -2)
[ 2.210404] scsi_mod: Unknown symbol _GLOBAL_OFFSET_TABLE_ (err -2)
[ 2.231055] scsi_mod: Unknown symbol _GLOBAL_OFFSET_TABLE_ (err -2)
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
Begin: Running /scripts/local-premount ... [ 2.261574] libcrc32c:
Unknown symbol _GLOBAL_OFFSET_TABLE_ (err -2)
Scanning for Btrfs filesystems
done.
Begin: Waiting for root file system ... Begin: Running
/scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
qemu-system-x86_64: terminating on signal 2
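
To summarize which modules fail in a log like the one above, here is a minimal Python sketch. The function name is my own, and it assumes the message format shown in the log, including a possible line wrap right after the module name (as happened with libcrc32c):

```python
import re
from collections import Counter

# Matches e.g. "scsi_mod: Unknown symbol _GLOBAL_OFFSET_TABLE_ (err -2)".
# The \s+ after the colon tolerates a line wrap between module and message.
UNKNOWN_SYMBOL = re.compile(r"(\w+):\s+Unknown symbol (\S+) \(err (-?\d+)\)")


def unknown_symbol_modules(log_text):
    """Count 'Unknown symbol' failures per (module, symbol) pair."""
    return Counter(
        (m.group(1), m.group(2)) for m in UNKNOWN_SYMBOL.finditer(log_text)
    )
```

Feeding it the saved serial output, e.g. `unknown_symbol_modules(open("qemu.log").read())` with `qemu.log` being a hypothetical file name, gives one counter entry per failing module.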

- Sedat -


Attachments:
config-5.11.0-rc3-10-amd64-clang12-pgo (232.66 kB)

2021-01-18 00:41:58

by Sedat Dilek

Subject: Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jan 18, 2021 at 12:33 AM Sedat Dilek <[email protected]> wrote:

[ big snip ]

> > > > > > > > > > > > +Workflow
> > > > > > > > > > > > +========
> > > > > > > > > > > > +
> > > > > > > > > > > > +The PGO kernel can be run on the host or test machines. The data though should
> > > > > > > > > > > > +be analyzed with Clang's tools from the same Clang version as the kernel was
> > > > > > > > > > > > +compiled. Clang's tolerant of version skew, but it's easier to use the same
> > > > > > > > > > > > +Clang version.
> > > > > > > > > > > > +
> > > > > > > > > > > > +The profiling data is useful for optimizing the kernel, analyzing coverage,
> > > > > > > > > > > > +etc. Clang offers tools to perform these tasks.
> > > > > > > > > > > > +
> > > > > > > > > > > > +Here is an example workflow for profiling an instrumented kernel with PGO and
> > > > > > > > > > > > +using the result to optimize the kernel:
> > > > > > > > > > > > +
> > > > > > > > > > > > +1) Install the kernel on the TEST machine.
> > > > > > > > > > > > +
> > > > > > > > > > > > +2) Reset the data counters right before running the load tests
> > > > > > > > > > > > +
> > > > > > > > > > > > + .. code-block:: sh
> > > > > > > > > > > > +
> > > > > > > > > > > > + $ echo 1 > /sys/kernel/debug/pgo/reset
> > > > > > > > > > > > +
> > > > > > > > > > >
> > > > > > > > > > > I do not get this...
> > > > > > > > > > >
> > > > > > > > > > > # mount | grep debugfs
> > > > > > > > > > > debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
> > > > > > > > > > >
> > > > > > > > > > > After the load-test...?
> > > > > > > > > > >
> > > > > > > > > > > echo 0 > /sys/kernel/debug/pgo/reset
> > > > > > > > > > >
> > > > > > > > > > Writing anything to /sys/kernel/debug/pgo/reset will cause it to reset
> > > > > > > > > > the profiling counters. I picked 1 (one) semi-randomly, but it could
> > > > > > > > > > be any number, letter, your favorite short story, etc. You don't want
> > > > > > > > > > to reset it before collecting the profiling data from your load tests
> > > > > > > > > > though.
> > > > > > > > > >
> > > > > > > > > > > > +3) Run the load tests.
> > > > > > > > > > > > +
> > > > > > > > > > > > +4) Collect the raw profile data
> > > > > > > > > > > > +
> > > > > > > > > > > > + .. code-block:: sh
> > > > > > > > > > > > +
> > > > > > > > > > > > + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > > > > > > > > > > > +
> > > > > > > > > > >
> > > > > > > > > > > This is only 4.9M in size and, going by its date, is from 5 minutes before I
> > > > > > > > > > > ran the echo-1 line.
> > > > > > > > > > >
> > > > > > > > > > > # ll /sys/kernel/debug/pgo
> > > > > > > > > > > total 0
> > > > > > > > > > > drwxr-xr-x 2 root root 0 16. Jan 17:29 .
> > > > > > > > > > > drwx------ 41 root root 0 16. Jan 17:29 ..
> > > > > > > > > > > -rw------- 1 root root 0 16. Jan 17:29 profraw
> > > > > > > > > > > --w------- 1 root root 0 16. Jan 18:19 reset
> > > > > > > > > > >
> > > > > > > > > > > # cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > > > > > > > > > >
> > > > > > > > > > > # ll /tmp/vmlinux.profraw
> > > > > > > > > > > -rw------- 1 root root 4,9M 16. Jan 17:29 /tmp/vmlinux.profraw
> > > > > > > > > > >
> > > > > > > > > > > For me there was no prof-data collected from my defconfig kernel-build.
> > > > > > > > > > >
> > > > > > > > > > The /sys/kernel/debug/pgo/profraw file is read-only. Nothing writes to
> > > > > > > > > > it, not even the kernel. All it does is serialize the profiling
> > > > > > > > > > counters from a memory location in the kernel into a format that
> > > > > > > > > > LLVM's tools can understand.
> > > > > > > > > >
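
A note on the size and date confusion quoted above: the profraw content is produced when the file is read, so `ls` showing size 0 is expected, and `cp -a` preserves the source timestamp, which would explain the old date on the copy. Below is a minimal Python sketch of collecting the data without carrying the timestamp over. The function name and default paths are illustrative only; reading the real file needs root and a PGO-instrumented kernel.

```python
import shutil
from pathlib import Path


def collect_profraw(src="/sys/kernel/debug/pgo/profraw",
                    dst="/tmp/vmlinux.profraw"):
    """Stream the raw profile to dst and return the number of bytes written."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        # Reads until EOF, so it does not rely on the size reported by stat().
        shutil.copyfileobj(fin, fout)
    return Path(dst).stat().st_size
```

The copy gets a fresh modification time, so its date reflects when the counters were collected rather than when the debugfs entry was created.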
> > > > > > > > > > > > +5) (Optional) Download the raw profile data to the HOST machine.
> > > > > > > > > > > > +
> > > > > > > > > > > > +6) Process the raw profile data
> > > > > > > > > > > > +
> > > > > > > > > > > > + .. code-block:: sh
> > > > > > > > > > > > +
> > > > > > > > > > > > + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > > > > > > > > > +
> > > > > > > > > > >
> > > > > > > > > > > Is that executed in /path/to/linux/git?
> > > > > > > > > > >
> > > > > > > > > > The llvm-profdata tool is not in the linux source tree. You need to
> > > > > > > > > > grab it from a clang distribution (or build it from clang's git repo).
> > > > > > > > > >
> > > > > > > > > > > > + Note that multiple raw profile data files can be merged during this step.
> > > > > > > > > > > > +
> > > > > > > > > > > > +7) Rebuild the kernel using the profile data (PGO disabled)
> > > > > > > > > > > > +
> > > > > > > > > > > > + .. code-block:: sh
> > > > > > > > > > > > +
> > > > > > > > > > > > + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > > > > > > > > > >
> > > > > > > > > > > How big is vmlinux.profdata (make defconfig)?
> > > > > > > > > > >
> > > > > > > > > > I don't have numbers for this, but from what you listed here, it's ~5M
> > > > > > > > > > in size. The size is proportional to the number of counters
> > > > > > > > > > instrumented in the kernel.
> > > > > > > > > >
> > > > > > > > > > > Do I need to do a full defconfig build or can I stop the build after
> > > > > > > > > > > let me say 10mins?
> > > > > > > > > > >
> > > > > > > > > > You should do a full rebuild. Make sure that PGO is disabled during the rebuild.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks Bill for all the information.
> > > > > > > > >
> > > > > > > > > And sorry if I am so pedantic.
> > > > > > > > >
> > > > > > > > > I have installed my Debian system with Legacy-BIOS enabled.
> > > > > > > > >
> > > > > > > > > When I rebuild with KCFLAGS=-fprofile-use=vmlinux.profdata (LLVM=1 I
> > > > > > > > > have as a default) my system hangs on reboot.
> > > > > > > > >
> > > > > > > > > [ diffconfig ]
> > > > > > > > > $ scripts/diffconfig /boot/config-5.11.0-rc3-8-amd64-clang12-pgo
> > > > > > > > > /boot/config-5.11.0-rc3-9-amd64-clang12-pgo
> > > > > > > > > BUILD_SALT "5.11.0-rc3-8-amd64-clang12-pgo" -> "5.11.0-rc3-9-amd64-clang12-pgo"
> > > > > > > > > PGO_CLANG y -> n
> > > > > > > > >
> > > > > > > > > [ my make line ]
> > > > > > > > > $ cat ../start-build_5.11.0-rc3-9-amd64-clang12-pgo.txt
> > > > > > > > > dileks 63120 63095 0 06:47 pts/2 00:00:00 /usr/bin/perf_5.10
> > > > > > > > > stat make V=1 -j4 HOSTCC=clang HOSTCXX=clang++ HOSTLD=ld.lld CC=clang
> > > > > > > > > LD=ld.lld LLVM=1 LLVM_IAS=1 PAHOLE=/opt/pahole/bin/pahole
> > > > > > > > > LOCALVERSION=-9-amd64-clang12-pgo KBUILD_VERBOSE=1
> > > > > > > > > KBUILD_BUILD_HOST=iniza [email protected]
> > > > > > > > > KBUILD_BUILD_TIMESTAMP=2021-01-17 bindeb-pkg
> > > > > > > > > KDEB_PKGVERSION=5.11.0~rc3-9~bullseye+dileks1
> > > > > > > > > KCFLAGS=-fprofile-use=vmlinux.profdata
> > > > > > > > >
> > > > > > > > > ( Yes, 06:47 a.m. in the morning :-). )
> > > > > > > > >
> > > > > > > > > When I boot the rebuilt Linux kernel I see:
> > > > > > > > >
> > > > > > > > > Wrong EFI loader signature
> > > > > > > > > ...
> > > > > > > > > Decompressing
> > > > > > > > > Parsing ELF
> > > > > > > > > Performing Relocations done.
> > > > > > > > > Booting the Kernel.
> > > > > > > > >
> > > > > > > > > *** SYSTEM HANGS ***
> > > > > > > > > ( I waited for approx 1 min )
> > > > > > > > >
> > > > > > > > > I tried to turn UEFI support ON and OFF.
> > > > > > > > > No success.
> > > > > > > > >
> > > > > > > > > Does Clang-PGO support Legacy-BIOS or is something different wrong?
> > > > > > > > >
> > > > > > > > > Thanks.
> > > > > > > > >
> > > > > > > >
> > > > > > > > My bootloader is GRUB.
> > > > > > > >
> > > > > > > > In UEFI-BIOS settings there is no secure-boot disable option.
> > > > > > > > Just simple "Use UEFI BIOS" enabled|disabled.
> > > > > > > >
> > > > > > > > Installed Debian packages:
> > > > > > > >
> > > > > > > > ii grub-common 2.04-12
> > > > > > > > ii grub-pc 2.04-12
> > > > > > > > ii grub-pc-bin 2.04-12
> > > > > > > > ii grub2-common 2.04-12
> > > > > > > >
> > > > > > > > The link below suggests running this in the GRUB shell:
> > > > > > > >
> > > > > > > > set check_signatures=no
> > > > > > > >
> > > > > > > > But this applies when grub-efi is installed.
> > > > > > > >
> > > > > > > > - Sedat -
> > > > > > > >
> > > > > > > > Link: https://unix.stackexchange.com/questions/126286/grub-efi-disable-signature-check
> > > > > > >
> > > > > > > Forget about that "Wrong EFI bootloader" - I see this with all other
> > > > > > > kernels (all boot fine).
> > > > > > >
> > > > > > > I tried in QEMU with and without KASLR:
> > > > > > >
> > > > > > > [ run_qemu.sh ]
> > > > > > > KPATH=$(pwd)
> > > > > > >
> > > > > > > APPEND="root=/dev/ram0 console=ttyS0 hung_task_panic=1 earlyprintk=ttyS0,115200"
> > > > > > > APPEND="$APPEND nokaslr"
> > > > > > >
> > > > > > > qemu-system-x86_64 -enable-kvm -M pc -kernel $KPATH/bzImage -initrd
> > > > > > > $KPATH/initrd.img -m 512 -net none -serial stdio -append "${APPEND}"
> > > > > > > [ /run_qemu.sh ]
> > > > > > >
> > > > > > > $ ./run_qemu.sh
> > > > > > > Probing EDD (edd=off to disable)... ok
> > > > > > > Wrong EFI loader signature.
> > > > > > > early console in extract_kernel
> > > > > > > input_data: 0x000000000289940d
> > > > > > > input_len: 0x000000000069804a
> > > > > > > output: 0x0000000001000000
> > > > > > > output_len: 0x0000000001ef2010
> > > > > > > kernel_total_size: 0x0000000001c2c000
> > > > > > > needed_size: 0x0000000002000000
> > > > > > > trampoline_32bit: 0x000000000009d000
> > > > > > >
> > > > > > >
> > > > > > > KASLR disabled: 'nokaslr' on cmdline.
> > > > > > >
> > > > > > >
> > > > > > > Decompressing Linux... Parsing ELF... No relocation needed... done.
> > > > > > > Booting the kernel.
> > > > > > >
> > > > > > > QEMU run stops, too.
> > > > > > >
> > > > > >
> > > > > > I re-generated my initrd.img with GZIP as compressor (my default is ZSTD).
> > > > > >
> > > > > > --- /etc/initramfs-tools/initramfs.conf 2021-01-17 12:35:30.823818501 +0100
> > > > > > +++ /etc/initramfs-tools/initramfs.conf.zstd 2020-09-21
> > > > > > 23:55:43.121735427 +0200
> > > > > > @@ -41,7 +41,7 @@ KEYMAP=n
> > > > > > # COMPRESS: [ gzip | bzip2 | lz4 | lzma | lzop | xz | zstd ]
> > > > > > #
> > > > > >
> > > > > > -COMPRESS=gzip
> > > > > > +COMPRESS=zstd
> > > > > >
> > > > > > #
> > > > > > # DEVICE: ...
> > > > > >
> > > > > > root# KVER="5.11.0-rc3-9-amd64-clang12-pgo" ; update-initramfs -c -k $KVER
> > > > > >
> > > > > > QEMU boot stops at the same stage.
> > > > > >
> > > > > > Now, my head is empty...
> > > > > >
> > > > > > Any comments?
> > > > > >
> > > > >
> > > > > ( Just as a side note I have Nick's DWARF-v5 support enabled. )
> > > > >
> > > > > There is one EFI related warning in my build-log:
> > > > >
> > > > > $ grep warning: build-log_5.11.0-rc3-9-amd64-clang12-pgo.txt
> > > > > dpkg-architecture: warning: specified GNU system type x86_64-linux-gnu
> > > > > does not match CC system type x86_64-pc-linux-gnu, try setting a
> > > > > correct CC environment variable
> > > > > warning: arch/x86/platform/efi/quirks.c: Function control flow change
> > > > > detected (hash mismatch) efi_arch_mem_reserve Hash =
> > > > > 391331300655996873 [-Wbackend-plugin]
> > > > > warning: arch/x86/platform/efi/efi.c: Function control flow change
> > > > > detected (hash mismatch) efi_attr_is_visible Hash = 567185240781730690
> > > > > [-Wbackend-plugin]
> > > > > arch/x86/crypto/aegis128-aesni-glue.c:265:30: warning: unused variable
> > > > > 'simd_alg' [-Wunused-variable]
> > > > > warning: lib/crypto/sha256.c: Function control flow change detected
> > > > > (hash mismatch) sha256_update Hash = 744640996947387358
> > > > > [-Wbackend-plugin]
> > > > > warning: arch/x86/boot/compressed/string.c: Function control flow
> > > > > change detected (hash mismatch) memcmp Hash = 742261418966908927
> > > > > [-Wbackend-plugin]
> > > > > warning: arch/x86/boot/compressed/string.c: Function control flow
> > > > > change detected (hash mismatch) bcmp Hash = 742261418966908927
> > > > > [-Wbackend-plugin]
> > > > > warning: arch/x86/boot/compressed/string.c: Function control flow
> > > > > change detected (hash mismatch) strcmp Hash = 536873291001348520
> > > > > [-Wbackend-plugin]
> > > > > warning: arch/x86/boot/compressed/string.c: Function control flow
> > > > > change detected (hash mismatch) strnlen Hash = 146835646621254984
> > > > > [-Wbackend-plugin]
> > > > > warning: arch/x86/boot/compressed/string.c: Function control flow
> > > > > change detected (hash mismatch) simple_strtoull Hash =
> > > > > 252792765950587360 [-Wbackend-plugin]
> > > > > warning: arch/x86/boot/compressed/string.c: Function control flow
> > > > > change detected (hash mismatch) strstr Hash = 391331303349076211
> > > > > [-Wbackend-plugin]
> > > > > warning: arch/x86/boot/compressed/string.c: Function control flow
> > > > > change detected (hash mismatch) strchr Hash = 1063705159280644635
> > > > > [-Wbackend-plugin]
> > > > > warning: arch/x86/boot/compressed/string.c: Function control flow
> > > > > change detected (hash mismatch) kstrtoull Hash = 758414239132790022
> > > > > [-Wbackend-plugin]
> > > > > drivers/infiniband/hw/hfi1/platform.o: warning: objtool: tune_serdes()
> > > > > falls through to next function apply_tx_lanes()
> > > > >
> > > > > Cannot say if this information is helpful.
> > > > >
> > > >
> > > > My LLVM/Clang v12 is from <apt.llvm.org>:
> > > >
> > > > clang-12 version 1:12~++20210115111113+45ef053bd709-1~exp1~20210115101809.3724
> > > >
> > > > My kernel-config is attached.
> > > >
> > >
> > > I dropped "LLVM_IAS=1" from my make line and did for my next build:
> > >
> > > $ scripts/diffconfig /boot/config-5.11.0-rc3-8-amd64-clang12-pgo .config
> > > BUILD_SALT "5.11.0-rc3-8-amd64-clang12-pgo" -> "5.11.0-rc3-10-amd64-clang12-pgo"
> > > DEBUG_INFO_DWARF2 n -> y
> > > DEBUG_INFO_DWARF5 y -> n
> > > PGO_CLANG y -> n
> > >
> > > Means dropped DWARF5 support.
> > >
> > Hi Sedat,
> >
> > Using PGO just improves optimizations. So unless there's miscompile,
> > or some other nefarious thing, it shouldn't affect how the boot loader
> > runs.
> >
> > As a sanity check, does the same Linux source and compiler version
> > generate a bootable kernel when PGO isn't used?
> >
>
> Yes, I can boot with the same code base without PGO.
>
> With the attached kernel-config.
>
> I remember there is a fix in CBL issue tracker for...
>
> ( https://github.com/ClangBuiltLinux/linux/issues/1250 )
>
> Loading, please wait...
> Starting version 247.2-4
> [ 2.157223] floppy: module verification failed: signature and/or
> required key missing - tainting kernel
> [ 2.179326] i2c_piix4: Unknown symbol _GLOBAL_OFFSET_TABLE_ (err -2)
> [ 2.183558] scsi_mod: Unknown symbol _GLOBAL_OFFSET_TABLE_ (err -2)
> [ 2.187991] floppy: Unknown symbol _GLOBAL_OFFSET_TABLE_ (err -2)
> [ 2.195047] psmouse: Unknown symbol _GLOBAL_OFFSET_TABLE_ (err -2)
> [ 2.210404] scsi_mod: Unknown symbol _GLOBAL_OFFSET_TABLE_ (err -2)
> [ 2.231055] scsi_mod: Unknown symbol _GLOBAL_OFFSET_TABLE_ (err -2)
>

[ CC Fangrui ]

With the attached...

[PATCH v3] module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for
undefined symbols

...I was finally able to boot into a rebuilt PGO-optimized Linux kernel.
For details see ClangBuiltLinux issue #1250 "Unknown symbol
_GLOBAL_OFFSET_TABLE_ loading kernel modules".

@ Bill Nick Sami Nathan

1. Can you say something about the impact of passing "LLVM_IAS=1" to make?
2. Can you please try Nick's DWARF v5 support patchset v5 and
CONFIG_DEBUG_INFO_DWARF5=y (see attachments)?

I would like to know what the impact of Clang's Integrated Assembler
and DWARF v5 is.

I dropped both, that is...

1. Do not pass "LLVM_IAS=1" to make.
2. Use default DWARF v2 (with Nick's patchset: CONFIG_DEBUG_INFO_DWARF2=y).

...for a successful build and boot on bare metal.

Thanks.

- Sedat -

[1] https://github.com/ClangBuiltLinux/linux/issues/1250


Attachments:
dmesg-T_5.11.0-rc3-10-amd64-clang12-pgo.txt (72.61 kB)
v3_20210115_maskray_module_ignore__global_offset_table__when_warning_for_undefined_symbols.mbx (3.78 kB)
config-5.11.0-rc3-10-amd64-clang12-pgo (232.66 kB)
v5_20210115_ndesaulniers_kbuild_dwarf_v5_support.mbx (13.80 kB)

2021-01-18 01:07:06

by Sedat Dilek

Subject: Re: [PATCH v4] pgo: add clang's Profile Guided Optimization infrastructure

On Sat, Jan 16, 2021 at 1:13 AM Nick Desaulniers
<[email protected]> wrote:
>
> > On Wed, Jan 13, 2021 at 8:07 PM Nick Desaulniers
> > <[email protected]> wrote:
> > >
> > > On Wed, Jan 13, 2021 at 12:55 PM Nathan Chancellor
> > > <[email protected]> wrote:
> > > >
> > > > However, I see an issue with actually using the data:
> > > >
> > > > $ sudo -s
> > > > # mount -t debugfs none /sys/kernel/debug
> > > > # cp -a /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > > # chown nathan:nathan vmlinux.profraw
> > > > # exit
> > > > $ tc-build/build/llvm/stage1/bin/llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > warning: vmlinux.profraw: Invalid instrumentation profile data (bad magic)
> > > > error: No profiles could be merged.
> > > >
> > > > Am I holding it wrong? :) Note, this is virtualized, I do not have any
> > > > "real" x86 hardware that I can afford to test on right now.
> > >
> > > Same.
> > >
> > > I think the magic calculation in this patch may differ from upstream
> > > llvm: https://github.com/llvm/llvm-project/blob/49142991a685bd427d7e877c29c77371dfb7634c/llvm/include/llvm/ProfileData/SampleProf.h#L96-L101
> >
> > Err...it looks like it was the padding calculation. With that fixed
> > up, we can query the profile data to get insights on the most heavily
> > called functions. Here's what my top 20 are (reset, then watch 10
> > minutes worth of cat videos on youtube while running `find /` and
> > sleeping at my desk). Anything curious stand out to anyone?
>
> Hello world from my personal laptop whose kernel was rebuilt with
> profiling data! Wow, I can run `find /` and watch cat videos on youtube
> so fast! Users will love this! /s
>
> Checking the sections sizes of .text.hot. and .text.unlikely. looks
> good!
>

On each rebuild, do I need to pass the following to make?

LLVM=1 -fprofile-use=vmlinux.profdata

Did you also try passing LLVM_IAS=1 to make?

- Sedat -


> >
> > $ llvm-profdata show -topn=20 /tmp/vmlinux.profraw
> > Instrumentation level: IR entry_first = 0
> > Total functions: 48970
> > Maximum function count: 62070879
> > Maximum internal block count: 83221158
> > Top 20 functions with the largest internal block counts:
> > drivers/tty/n_tty.c:n_tty_write, max count = 83221158
> > rcu_read_unlock_strict, max count = 62070879
> > _cond_resched, max count = 25486882
> > rcu_all_qs, max count = 25451477
> > drivers/cpuidle/poll_state.c:poll_idle, max count = 23618576
> > _raw_spin_unlock_irqrestore, max count = 18874121
> > drivers/cpuidle/governors/menu.c:menu_select, max count = 18721624
> > _raw_spin_lock_irqsave, max count = 18509161
> > memchr, max count = 15525452
> > _raw_spin_lock, max count = 15484254
> > __mod_memcg_state, max count = 14604619
> > __mod_memcg_lruvec_state, max count = 14602783
> > fs/ext4/hash.c:str2hashbuf_signed, max count = 14098424
> > __mod_lruvec_state, max count = 12527154
> > __mod_node_page_state, max count = 12525172
> > native_sched_clock, max count = 8904692
> > sched_clock_cpu, max count = 8895832
> > sched_clock, max count = 8894627
> > kernel/entry/common.c:exit_to_user_mode_prepare, max count = 8289031
> > fpregs_assert_state_consistent, max count = 8287198
> >
> > --
> > Thanks,
> > ~Nick Desaulniers
> >
>

2021-01-18 05:08:59

by Bill Wendling

Subject: Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

On Sun, Jan 17, 2021 at 4:27 PM Sedat Dilek <[email protected]> wrote:
>
> [ big snip ]

[More snippage.]

> [ CC Fangrui ]
>
> With the attached...
>
> [PATCH v3] module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for
> undefined symbols
>
> ...I was finally able to boot into a rebuild PGO-optimized Linux-kernel.
> For details see ClangBuiltLinux issue #1250 "Unknown symbol
> _GLOBAL_OFFSET_TABLE_ loading kernel modules".
>
Thanks for confirming that this works with the above patch.

> @ Bill Nick Sami Nathan
>
> 1, Can you say something of the impact passing "LLVM_IAS=1" to make?

The integrated assembler and this option are more-or-less orthogonal
to each other. One can still use the GNU assembler with PGO. If you're
having an issue, it may be related to ClangBuiltLinux issue #1250.

> 2. Can you please try Nick's DWARF v5 support patchset v5 and
> CONFIG_DEBUG_INFO_DWARF5=y (see attachments)?
>
I know Nick did several tests with PGO. He may have looked into it
already, but we can check.

> I would like to know what the impact of the Clang's Integrated
> Assembler and DWARF v5 are.
>
> I dropped both means...
>
> 1. Do not pass "LLVM_IAS=1" to make.
> 2. Use default DWARF v2 (with Nick's patchset: CONFIG_DEBUG_INFO_DWARF2=y).
>
> ...for a successfull build and boot on bare metal.
>

[Next message]

> On each rebuild I need to pass to make ...?
>
> LLVM=1 -fprofile-use=vmlinux.profdata
>
Yes.
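
Spelled out as a rough sketch (the helper name is only illustrative; the
three commands are the ones from the v5 commit message, which routes the
flag through KCFLAGS rather than handing it to make as a bare argument):

```shell
#!/bin/sh
# Sketch only: print the collect/merge/rebuild steps for a PGO rebuild.
# "pgo_rebuild_cmds" is a hypothetical helper name, not part of the patch.
# The optional argument overrides where the raw profile is read from.

pgo_rebuild_cmds() {
    profraw_src=${1:-/sys/kernel/debug/pgo/profraw}

    # 1) Copy the raw profile out of debugfs.
    echo "cp $profraw_src vmlinux.profraw"
    # 2) Convert (and optionally merge) raw profiles into an indexed profile.
    echo "llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw"
    # 3) Rebuild, passing the profile to the compiler via KCFLAGS.
    echo "make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata"
}
```

The helper only prints the commands, so they can be reviewed first;
piping its output into "sh -ex" from the top of the kernel tree would
run them in order.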

> Did you try together with passing LLVM_IAS=1 to make?

One of my tests was with the integrated assembler enabled. Are you
finding issues with it?

The problem with using top-of-tree clang is that it's not necessarily
stable. You could try using the clang 11.x release (changing the
"CLANG_VERSION >= 120000" in kernel/pgo/Kconfig to "CLANG_VERSION >=
110000").
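
As a sketch of that local change (the function name is made up for
illustration, and it assumes the version test appears in the Kconfig
file exactly as quoted above):

```shell
#!/bin/sh
# Sketch only: lower the clang version floor for local testing.
# "relax_pgo_clang_floor" is a hypothetical helper, not part of the patch.
# The optional argument is the Kconfig path (default: kernel/pgo/Kconfig).

relax_pgo_clang_floor() {
    kconfig=${1:-kernel/pgo/Kconfig}

    # Rewrite only the exact version test; write to a temp file and move
    # it into place so this does not depend on a particular "sed -i".
    sed 's/CLANG_VERSION >= 120000/CLANG_VERSION >= 110000/' \
        "$kconfig" > "$kconfig.tmp" && mv "$kconfig.tmp" "$kconfig"
}
```

Run from the top of the kernel tree as "relax_pgo_clang_floor"; this is
for experimenting only and should not be part of a submitted patch.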

-bw

2021-01-18 21:59:34

by Bill Wendling

Subject: Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jan 18, 2021 at 9:26 AM Sedat Dilek <[email protected]> wrote:
>
> On Mon, Jan 18, 2021 at 1:39 PM Sedat Dilek <[email protected]> wrote:
> >
> > On Mon, Jan 18, 2021 at 3:32 AM Bill Wendling <[email protected]> wrote:
> > >
> > > On Sun, Jan 17, 2021 at 4:27 PM Sedat Dilek <[email protected]> wrote:
> > > >
> > > > [ big snip ]
> > >
> > > [More snippage.]
> > >
> > > > [ CC Fangrui ]
> > > >
> > > > With the attached...
> > > >
> > > > [PATCH v3] module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for
> > > > undefined symbols
> > > >
> > > > ...I was finally able to boot into a rebuild PGO-optimized Linux-kernel.
> > > > For details see ClangBuiltLinux issue #1250 "Unknown symbol
> > > > _GLOBAL_OFFSET_TABLE_ loading kernel modules".
> > > >
> > > Thanks for confirming that this works with the above patch.
> > >
> > > > @ Bill Nick Sami Nathan
> > > >
> > > > 1, Can you say something of the impact passing "LLVM_IAS=1" to make?
> > >
> > > The integrated assembler and this option are more-or-less orthogonal
> > > to each other. One can still use the GNU assembler with PGO. If you're
> > > having an issue, it may be related to ClangBuiltLinux issue #1250.
> > >
> > > > 2. Can you please try Nick's DWARF v5 support patchset v5 and
> > > > CONFIG_DEBUG_INFO_DWARF5=y (see attachments)?
> > > >
> > > I know Nick did several tests with PGO. He may have looked into it
> > > already, but we can check.
> > >
> >
> > Reproducible.
> >
> > LLVM_IAS=1 + DWARF5 = Not bootable
> >
> > I will try:
> >
> > LLVM_IAS=1 + DWARF4
> >
>
> I was not able to boot into such a built Linux-kernel.
>
PGO will have no effect on debugging data. If this is an issue with
DWARF, then it's likely orthogonal to the PGO patch.

> For me worked: DWARF2 and LLVM_IAS=1 *not* set.
>
> Of course, this could be an issue with my system's LLVM/Clang.
>
> Debian clang version
> 12.0.0-++20210115111113+45ef053bd709-1~exp1~20210115101809.3724
>
Please use the official clang 11.0.1 release
(https://releases.llvm.org/download.html), modifying the
kernel/pgo/Kconfig as I suggested above. The reason we specify clang
12 for the minimal version is because of an issue that was recently
fixed.
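
To double-check which clang a build will actually pick up, something
like the following might help. It is a sketch: the function name is
made up, and it only assumes the first banner line contains
"clang version N.x".

```shell
#!/bin/sh
# Sketch only: extract the major version from a "clang --version" banner.
# "clang_major" is a hypothetical helper; it reads the banner on stdin.

clang_major() {
    # Keep only the number that follows "clang version " and precedes the
    # first dot; lines without that pattern are not printed.
    sed -n 's/.*clang version \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}
```

Typical use would be "clang --version | clang_major", comparing the
result against the version the Kconfig check expects.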

> Can you give me a LLVM commit-id where you had success with LLVM_IAS=1
> and especially CONFIG_DEBUG_INFO_DWARF5=y?
> Success means I was able to boot in QEMU and/or bare metal.
>
The DWARF5 patch isn't in yet, so I don't want to rely upon it too much.

-bw

2021-01-19 04:14:28

by Sedat Dilek

Subject: Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jan 18, 2021 at 3:32 AM Bill Wendling <[email protected]> wrote:
>
> On Sun, Jan 17, 2021 at 4:27 PM Sedat Dilek <[email protected]> wrote:
> >
> > [ big snip ]
>
> [More snippage.]
>
> > [ CC Fangrui ]
> >
> > With the attached...
> >
> > [PATCH v3] module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for
> > undefined symbols
> >
> > ...I was finally able to boot into a rebuild PGO-optimized Linux-kernel.
> > For details see ClangBuiltLinux issue #1250 "Unknown symbol
> > _GLOBAL_OFFSET_TABLE_ loading kernel modules".
> >
> Thanks for confirming that this works with the above patch.
>
> > @ Bill Nick Sami Nathan
> >
> > 1, Can you say something of the impact passing "LLVM_IAS=1" to make?
>
> The integrated assembler and this option are more-or-less orthogonal
> to each other. One can still use the GNU assembler with PGO. If you're
> having an issue, it may be related to ClangBuiltLinux issue #1250.
>
> > 2. Can you please try Nick's DWARF v5 support patchset v5 and
> > CONFIG_DEBUG_INFO_DWARF5=y (see attachments)?
> >
> I know Nick did several tests with PGO. He may have looked into it
> already, but we can check.
>

Reproducible.

LLVM_IAS=1 + DWARF5 = Not bootable

I will try:

LLVM_IAS=1 + DWARF4

- Sedat -

> > I would like to know what the impact of the Clang's Integrated
> > Assembler and DWARF v5 are.
> >
> > I dropped both means...
> >
> > 1. Do not pass "LLVM_IAS=1" to make.
> > 2. Use default DWARF v2 (with Nick's patchset: CONFIG_DEBUG_INFO_DWARF2=y).
> >
> > ...for a successfull build and boot on bare metal.
> >
>
> [Next message]
>
> > On each rebuild I need to pass to make ...?
> >
> > LLVM=1 -fprofile-use=vmlinux.profdata
> >
> Yes.
>
> > Did you try together with passing LLVM_IAS=1 to make?
>
> One of my tests was with the integrated assembler enabled. Are you
> finding issues with it?
>
> The problem with using top-of-tree clang is that it's not necessarily
> stable. You could try using the clang 11.x release (changing the
> "CLANG_VERSION >= 120000" in kernel/pgo/Kconfig/ to "CLANG_VERSION >=
> 110000").
>
> -bw

2021-01-19 04:57:39

by Sedat Dilek

Subject: Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jan 18, 2021 at 1:39 PM Sedat Dilek <[email protected]> wrote:
>
> On Mon, Jan 18, 2021 at 3:32 AM Bill Wendling <[email protected]> wrote:
> >
> > On Sun, Jan 17, 2021 at 4:27 PM Sedat Dilek <[email protected]> wrote:
> > >
> > > [ big snip ]
> >
> > [More snippage.]
> >
> > > [ CC Fangrui ]
> > >
> > > With the attached...
> > >
> > > [PATCH v3] module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for
> > > undefined symbols
> > >
> > > ...I was finally able to boot into a rebuild PGO-optimized Linux-kernel.
> > > For details see ClangBuiltLinux issue #1250 "Unknown symbol
> > > _GLOBAL_OFFSET_TABLE_ loading kernel modules".
> > >
> > Thanks for confirming that this works with the above patch.
> >
> > > @ Bill Nick Sami Nathan
> > >
> > > 1, Can you say something of the impact passing "LLVM_IAS=1" to make?
> >
> > The integrated assembler and this option are more-or-less orthogonal
> > to each other. One can still use the GNU assembler with PGO. If you're
> > having an issue, it may be related to ClangBuiltLinux issue #1250.
> >
> > > 2. Can you please try Nick's DWARF v5 support patchset v5 and
> > > CONFIG_DEBUG_INFO_DWARF5=y (see attachments)?
> > >
> > I know Nick did several tests with PGO. He may have looked into it
> > already, but we can check.
> >
>
> Reproducible.
>
> LLVM_IAS=1 + DWARF5 = Not bootable
>
> I will try:
>
> LLVM_IAS=1 + DWARF4
>

I was not able to boot a Linux kernel built that way.

What worked for me: DWARF2 with LLVM_IAS=1 *not* set.

Of course, this could be an issue with my system's LLVM/Clang.

Debian clang version
12.0.0-++20210115111113+45ef053bd709-1~exp1~20210115101809.3724

Can you give me an LLVM commit ID where you had success with LLVM_IAS=1
and especially CONFIG_DEBUG_INFO_DWARF5=y?
Success means I was able to boot in QEMU and/or bare metal.

Thanks.

Regards,
- Sedat -

>
> > > I would like to know what the impact of the Clang's Integrated
> > > Assembler and DWARF v5 are.
> > >
> > > I dropped both means...
> > >
> > > 1. Do not pass "LLVM_IAS=1" to make.
> > > 2. Use default DWARF v2 (with Nick's patchset: CONFIG_DEBUG_INFO_DWARF2=y).
> > >
> > > ...for a successfull build and boot on bare metal.
> > >
> >
> > [Next message]
> >
> > > On each rebuild I need to pass to make ...?
> > >
> > > LLVM=1 -fprofile-use=vmlinux.profdata
> > >
> > Yes.
> >
> > > Did you try together with passing LLVM_IAS=1 to make?
> >
> > One of my tests was with the integrated assembler enabled. Are you
> > finding issues with it?
> >
> > The problem with using top-of-tree clang is that it's not necessarily
> > stable. You could try using the clang 11.x release (changing the
> > "CLANG_VERSION >= 120000" in kernel/pgo/Kconfig/ to "CLANG_VERSION >=
> > 110000").
> >
> > -bw

2021-01-19 05:32:17

by Sedat Dilek

Subject: Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jan 18, 2021 at 10:56 PM Bill Wendling <[email protected]> wrote:
>
> On Mon, Jan 18, 2021 at 9:26 AM Sedat Dilek <[email protected]> wrote:
> >
> > On Mon, Jan 18, 2021 at 1:39 PM Sedat Dilek <[email protected]> wrote:
> > >
> > > On Mon, Jan 18, 2021 at 3:32 AM Bill Wendling <[email protected]> wrote:
> > > >
> > > > On Sun, Jan 17, 2021 at 4:27 PM Sedat Dilek <[email protected]> wrote:
> > > > >
> > > > > [ big snip ]
> > > >
> > > > [More snippage.]
> > > >
> > > > > [ CC Fangrui ]
> > > > >
> > > > > With the attached...
> > > > >
> > > > > [PATCH v3] module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for
> > > > > undefined symbols
> > > > >
> > > > > ...I was finally able to boot into a rebuild PGO-optimized Linux-kernel.
> > > > > For details see ClangBuiltLinux issue #1250 "Unknown symbol
> > > > > _GLOBAL_OFFSET_TABLE_ loading kernel modules".
> > > > >
> > > > Thanks for confirming that this works with the above patch.
> > > >
> > > > > @ Bill Nick Sami Nathan
> > > > >
> > > > > 1, Can you say something of the impact passing "LLVM_IAS=1" to make?
> > > >
> > > > The integrated assembler and this option are more-or-less orthogonal
> > > > to each other. One can still use the GNU assembler with PGO. If you're
> > > > having an issue, it may be related to ClangBuiltLinux issue #1250.
> > > >
> > > > > 2. Can you please try Nick's DWARF v5 support patchset v5 and
> > > > > CONFIG_DEBUG_INFO_DWARF5=y (see attachments)?
> > > > >
> > > > I know Nick did several tests with PGO. He may have looked into it
> > > > already, but we can check.
> > > >
> > >
> > > Reproducible.
> > >
> > > LLVM_IAS=1 + DWARF5 = Not bootable
> > >
> > > I will try:
> > >
> > > LLVM_IAS=1 + DWARF4
> > >
> >
> > I was not able to boot into such a built Linux-kernel.
> >
> PGO will have no effect on debugging data. If this is an issue with
> DWARF, then it's likely orthogonal to the PGO patch.
>
> > For me worked: DWARF2 and LLVM_IAS=1 *not* set.
> >
> > Of course, this could be an issue with my system's LLVM/Clang.
> >
> > Debian clang version
> > 12.0.0-++20210115111113+45ef053bd709-1~exp1~20210115101809.3724
> >
> Please use the official clang 11.0.1 release
> (https://releases.llvm.org/download.html), modifying the
> kernel/pgo/Kconfig as I suggested above. The reason we specify clang
> 12 for the minimal version is because of an issue that was recently
> fixed.
>
> > Can you give me a LLVM commit-id where you had success with LLVM_IAS=1
> > and especially CONFIG_DEBUG_INFO_DWARF5=y?
> > Success means I was able to boot in QEMU and/or bare metal.
> >
> The DWARF5 patch isn't in yet, so I don't want to rely upon it too much.
>

That means building the first PGO-enabled kernel with clang-11 and
then rebuilding it in a second step with the same clang-11.

Just FYI:
I was able to boot a Linux kernel rebuilt *without* LLVM_IAS=1
(that is, using "GNU AS 2.35.1") and with DWARF5, using LLVM=1 from
LLVM/Clang-12.

- Sedat -

2021-01-20 01:04:48

by Nick Desaulniers

Subject: Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

On Sat, Jan 16, 2021 at 1:44 AM Bill Wendling <[email protected]> wrote:
>
> From: Sami Tolvanen <[email protected]>
>
> Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> profile, the kernel is instrumented with PGO counters, a representative
> workload is run, and the raw profile data is collected from
> /sys/kernel/debug/pgo/profraw.
>
> The raw profile data must be processed by clang's "llvm-profdata" tool
> before it can be used during recompilation:
>
> $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
> Multiple raw profiles may be merged during this step.
>
> The data can now be used by the compiler:
>
> $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>
> This initial submission is restricted to x86, as that's the platform we
> know works. This restriction can be lifted once other platforms have
> been verified to work with PGO.
>
> Note that this method of profiling the kernel is clang-native, unlike
> the clang support in kernel/gcov.
>
> [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>
> Signed-off-by: Sami Tolvanen <[email protected]>
> Co-developed-by: Bill Wendling <[email protected]>
> Signed-off-by: Bill Wendling <[email protected]>
> ---
> v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> testing.
> - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> Song's comments.
> v3: - Added change log section based on Sedat Dilek's comments.
> v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
> own popcount implementation, based on Nick Desaulniers's comment.
> v5: - Correct padding calculation, discovered by Nathan Chancellor.

Specifically for v5:
Tested-by: Nick Desaulniers <[email protected]>

If anything changes drastically, please drop that and I'll retest it;
otherwise for changes to the commit message or docs, feel free to
carry it forward.

I'll try to provide code review by EOW, assuming we can stop
regressing LLVM so I can focus. (Ha!)
--
Thanks,
~Nick Desaulniers

2021-01-21 03:26:26

by Nick Desaulniers

Subject: Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

Thanks Bill, mostly questions below. Patch looks good to me modulo
disabling profiling for one crypto TU, mixing style of pre/post
increment, and some comments around locking. With those addressed,
I'm hoping akpm@ would consider picking this up.

On Sat, Jan 16, 2021 at 1:44 AM Bill Wendling <[email protected]> wrote:
>
> From: Sami Tolvanen <[email protected]>
>
> Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> profile, the kernel is instrumented with PGO counters, a representative
> workload is run, and the raw profile data is collected from
> /sys/kernel/debug/pgo/profraw.
>
> The raw profile data must be processed by clang's "llvm-profdata" tool
> before it can be used during recompilation:
>
> $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
> Multiple raw profiles may be merged during this step.
>
> The data can now be used by the compiler:
>
> $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>
> This initial submission is restricted to x86, as that's the platform we
> know works. This restriction can be lifted once other platforms have
> been verified to work with PGO.
>
> Note that this method of profiling the kernel is clang-native, unlike
> the clang support in kernel/gcov.
>
> [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>
> Signed-off-by: Sami Tolvanen <[email protected]>
> Co-developed-by: Bill Wendling <[email protected]>
> Signed-off-by: Bill Wendling <[email protected]>
> ---
> v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> testing.
> - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> Song's comments.
> v3: - Added change log section based on Sedat Dilek's comments.
> v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
> own popcount implementation, based on Nick Desaulniers's comment.
> v5: - Correct padding calculation, discovered by Nathan Chancellor.
> ---
> Documentation/dev-tools/index.rst | 1 +
> Documentation/dev-tools/pgo.rst | 127 +++++++++
> MAINTAINERS | 9 +
> Makefile | 3 +
> arch/Kconfig | 1 +
> arch/x86/Kconfig | 1 +
> arch/x86/boot/Makefile | 1 +
> arch/x86/boot/compressed/Makefile | 1 +
> arch/x86/crypto/Makefile | 2 +
> arch/x86/entry/vdso/Makefile | 1 +
> arch/x86/kernel/vmlinux.lds.S | 2 +
> arch/x86/platform/efi/Makefile | 1 +
> arch/x86/purgatory/Makefile | 1 +
> arch/x86/realmode/rm/Makefile | 1 +
> arch/x86/um/vdso/Makefile | 1 +
> drivers/firmware/efi/libstub/Makefile | 1 +
> include/asm-generic/vmlinux.lds.h | 44 +++
> kernel/Makefile | 1 +
> kernel/pgo/Kconfig | 35 +++
> kernel/pgo/Makefile | 5 +
> kernel/pgo/fs.c | 382 ++++++++++++++++++++++++++
> kernel/pgo/instrument.c | 185 +++++++++++++
> kernel/pgo/pgo.h | 206 ++++++++++++++
> scripts/Makefile.lib | 10 +
> 24 files changed, 1022 insertions(+)
> create mode 100644 Documentation/dev-tools/pgo.rst
> create mode 100644 kernel/pgo/Kconfig
> create mode 100644 kernel/pgo/Makefile
> create mode 100644 kernel/pgo/fs.c
> create mode 100644 kernel/pgo/instrument.c
> create mode 100644 kernel/pgo/pgo.h
>
> diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> index f7809c7b1ba9e..8d6418e858062 100644
> --- a/Documentation/dev-tools/index.rst
> +++ b/Documentation/dev-tools/index.rst
> @@ -26,6 +26,7 @@ whole; patches welcome!
> kgdb
> kselftest
> kunit/index
> + pgo
>
>
> .. only:: subproject and html
> diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> new file mode 100644
> index 0000000000000..b7f11d8405b73
> --- /dev/null
> +++ b/Documentation/dev-tools/pgo.rst
> @@ -0,0 +1,127 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===============================
> +Using PGO with the Linux kernel
> +===============================
> +
> +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> +when building with Clang. The profiling data is exported via the ``pgo``
> +debugfs directory.
> +
> +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> +
> +
> +Preparation
> +===========
> +
> +Configure the kernel with:
> +
> +.. code-block:: make
> +
> + CONFIG_DEBUG_FS=y
> + CONFIG_PGO_CLANG=y
> +
> +Note that kernels compiled with profiling flags will be significantly larger
> +and run slower.
> +
> +Profiling data will only become accessible once debugfs has been mounted:
> +
> +.. code-block:: sh
> +
> + mount -t debugfs none /sys/kernel/debug
> +
> +
> +Customization
> +=============
> +
> +You can enable or disable profiling for individual files and directories by
> +adding a line similar to the following to the respective kernel Makefile:
> +
> +- For a single file (e.g. main.o)
> +
> + .. code-block:: make
> +
> + PGO_PROFILE_main.o := y
> +
> +- For all files in one directory
> +
> + .. code-block:: make
> +
> + PGO_PROFILE := y
> +
> +To exclude files from being profiled use
> +
> + .. code-block:: make
> +
> + PGO_PROFILE_main.o := n
> +
> +and
> +
> + .. code-block:: make
> +
> + PGO_PROFILE := n
> +
> +Only files which are linked to the main kernel image or are compiled as kernel
> +modules are supported by this mechanism.
> +
> +
> +Files
> +=====
> +
> +The PGO kernel support creates the following files in debugfs:
> +
> +``/sys/kernel/debug/pgo``
> + Parent directory for all PGO-related files.
> +
> +``/sys/kernel/debug/pgo/reset``
> + Global reset file: resets all coverage data to zero when written to.
> +
> +``/sys/kernel/debug/pgo/profraw``
> + The raw PGO data that must be processed with ``llvm-profdata``.
> +
> +
> +Workflow
> +========
> +
> +The PGO kernel can be run on the host or test machines. The data though should
> +be analyzed with Clang's tools from the same Clang version the kernel was
> +compiled with. Clang's tolerant of version skew, but it's easier to use the same
> +Clang version.
> +
> +The profiling data is useful for optimizing the kernel, analyzing coverage,
> +etc. Clang offers tools to perform these tasks.
> +
> +Here is an example workflow for profiling an instrumented kernel with PGO and
> +using the result to optimize the kernel:
> +
> +1) Install the kernel on the TEST machine.
> +
> +2) Reset the data counters right before running the load tests
> +
> + .. code-block:: sh
> +
> + $ echo 1 > /sys/kernel/debug/pgo/reset
> +
> +3) Run the load tests.
> +
> +4) Collect the raw profile data
> +
> + .. code-block:: sh
> +
> + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> +
> +5) (Optional) Download the raw profile data to the HOST machine.
> +
> +6) Process the raw profile data
> +
> + .. code-block:: sh
> +
> + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> +
> + Note that multiple raw profile data files can be merged during this step.
> +
> +7) Rebuild the kernel using the profile data (PGO disabled)
> +
> + .. code-block:: sh
> +
> + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 79b400c97059f..cb1f1f2b2baf4 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -13948,6 +13948,15 @@ S: Maintained
> F: include/linux/personality.h
> F: include/uapi/linux/personality.h
>
> +PGO BASED KERNEL PROFILING
> +M: Sami Tolvanen <[email protected]>
> +M: Bill Wendling <[email protected]>
> +R: Nathan Chancellor <[email protected]>
> +R: Nick Desaulniers <[email protected]>
> +S: Supported
> +F: Documentation/dev-tools/pgo.rst
> +F: kernel/pgo
> +
> PHOENIX RC FLIGHT CONTROLLER ADAPTER
> M: Marcus Folkesson <[email protected]>
> L: [email protected]
> diff --git a/Makefile b/Makefile
> index 9e73f82e0d863..9128bfe1ccc97 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
> # Defaults to vmlinux, but the arch makefile usually adds further targets
> all: vmlinux
>
> +CFLAGS_PGO_CLANG := -fprofile-generate
> +export CFLAGS_PGO_CLANG
> +
> CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
> $(call cc-option,-fno-tree-loop-im) \
> $(call cc-disable-warning,maybe-uninitialized,)
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 24862d15f3a36..f39d3991f6bfe 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
> pairs of 32-bit arguments, select this option.
>
> source "kernel/gcov/Kconfig"
> +source "kernel/pgo/Kconfig"
>
> source "scripts/gcc-plugins/Kconfig"
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 21f851179ff08..36305ea61dc09 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -96,6 +96,7 @@ config X86
> select ARCH_SUPPORTS_DEBUG_PAGEALLOC
> select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
> select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
> + select ARCH_SUPPORTS_PGO_CLANG if X86_64
> select ARCH_USE_BUILTIN_BSWAP
> select ARCH_USE_QUEUED_RWLOCKS
> select ARCH_USE_QUEUED_SPINLOCKS
> diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> index fe605205b4ce2..383853e32f673 100644
> --- a/arch/x86/boot/Makefile
> +++ b/arch/x86/boot/Makefile
> @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
>
> $(obj)/bzImage: asflags-y := $(SVGA_MODE)
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index e0bc3988c3faa..ed12ab65f6065 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
>
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE :=n
>
> KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
> diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> index a31de0c6ccde2..775fa0b368e98 100644
> --- a/arch/x86/crypto/Makefile
> +++ b/arch/x86/crypto/Makefile
> @@ -4,6 +4,8 @@
>
> OBJECT_FILES_NON_STANDARD := y
>
> +PGO_PROFILE_curve25519-x86_64.o := n
> +

^ Do you have more info about this?

> obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o
>
> obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index 02e3e42f380bd..26e2b3af0145c 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO $@
> VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
> $(call ld-option, --eh-frame-hdr) -Bsymbolic
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> quiet_cmd_vdso_and_check = VDSO $@
> cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index efd9e9ea17f25..f6cab2316c46a 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -184,6 +184,8 @@ SECTIONS
>
> BUG_TABLE
>
> + PGO_CLANG_DATA
> +
> ORC_UNWIND_TABLE
>
> . = ALIGN(PAGE_SIZE);
> diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
> index 84b09c230cbd5..5f22b31446ad4 100644
> --- a/arch/x86/platform/efi/Makefile
> +++ b/arch/x86/platform/efi/Makefile
> @@ -2,6 +2,7 @@
> OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
> KASAN_SANITIZE := n
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
> obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
> diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> index 95ea17a9d20cb..36f20e99da0bc 100644
> --- a/arch/x86/purgatory/Makefile
> +++ b/arch/x86/purgatory/Makefile
> @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
>
> # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> KASAN_SANITIZE := n
> UBSAN_SANITIZE := n
> KCSAN_SANITIZE := n
> diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> index 83f1b6a56449f..21797192f958f 100644
> --- a/arch/x86/realmode/rm/Makefile
> +++ b/arch/x86/realmode/rm/Makefile
> @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
> diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
> index 5943387e3f357..54f5768f58530 100644
> --- a/arch/x86/um/vdso/Makefile
> +++ b/arch/x86/um/vdso/Makefile
> @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@
>
> VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> #
> # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index 8a94388e38b33..2d81623b33f29 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -40,6 +40,7 @@ KBUILD_CFLAGS := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
> KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
>
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> # Sanitizer runtimes are unavailable and cannot be linked here.
> KASAN_SANITIZE := n
> KCSAN_SANITIZE := n
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index b2b3d81b1535a..3a591bb18c5fb 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -316,6 +316,49 @@
> #define THERMAL_TABLE(name)
> #endif
>
> +#ifdef CONFIG_PGO_CLANG
> +#define PGO_CLANG_DATA \
> + __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_start = .; \
> + __llvm_prf_data_start = .; \
> + KEEP(*(__llvm_prf_data)) \
> + . = ALIGN(8); \
> + __llvm_prf_data_end = .; \
> + } \
> + __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_cnts_start = .; \
> + KEEP(*(__llvm_prf_cnts)) \
> + . = ALIGN(8); \
> + __llvm_prf_cnts_end = .; \
> + } \
> + __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_names_start = .; \
> + KEEP(*(__llvm_prf_names)) \
> + . = ALIGN(8); \
> + __llvm_prf_names_end = .; \
> + . = ALIGN(8); \
> + } \
> + __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
> + __llvm_prf_vals_start = .; \
> + KEEP(*(__llvm_prf_vals)) \
> + . = ALIGN(8); \
> + __llvm_prf_vals_end = .; \
> + . = ALIGN(8); \
> + } \
> + __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
> + __llvm_prf_vnds_start = .; \
> + KEEP(*(__llvm_prf_vnds)) \
> + . = ALIGN(8); \
> + __llvm_prf_vnds_end = .; \
> + __llvm_prf_end = .; \
> + }
> +#else
> +#define PGO_CLANG_DATA
> +#endif
> +
> #define KERNEL_DTB() \
> STRUCT_ALIGN(); \
> __dtb_start = .; \
> @@ -1125,6 +1168,7 @@
> CONSTRUCTORS \
> } \
> BUG_TABLE \
> + PGO_CLANG_DATA

^ since dropping other arch support from v4, could probably drop this,
too. We should be covered by the modification to
arch/x86/kernel/vmlinux.lds.S, right?

>
> #define INIT_TEXT_SECTION(inittext_align) \
> . = ALIGN(inittext_align); \
> diff --git a/kernel/Makefile b/kernel/Makefile
> index aa7368c7eabf3..0b34ca228ba46 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
> obj-$(CONFIG_KCSAN) += kcsan/
> obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
> obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
> +obj-$(CONFIG_PGO_CLANG) += pgo/
>
> obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
> new file mode 100644
> index 0000000000000..76a640b6cf6ed
> --- /dev/null
> +++ b/kernel/pgo/Kconfig
> @@ -0,0 +1,35 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
> +
> +config ARCH_SUPPORTS_PGO_CLANG
> + bool
> +
> +config PGO_CLANG
> + bool "Enable clang's PGO-based kernel profiling"
> + depends on DEBUG_FS
> + depends on ARCH_SUPPORTS_PGO_CLANG
> + depends on CC_IS_CLANG && CLANG_VERSION >= 120000
> + help
> + This option enables clang's PGO (Profile Guided Optimization) based
> + code profiling to better optimize the kernel.
> +
> + If unsure, say N.
> +
> + Run a representative workload for your application on a kernel
> + compiled with this option and download the raw profile file from
> + /sys/kernel/debug/pgo/profraw. This file needs to be processed with
> + llvm-profdata. It may be merged with other collected raw profiles.
> +
> + Copy the resulting profile file into vmlinux.profdata, and enable
> + KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
> + kernel.
> +
> + Note that a kernel compiled with profiling flags will be
> + significantly larger and run slower. Also be sure to exclude files
> + from profiling which are not linked to the kernel image to prevent
> + linker errors.
> +
> + Note that the debugfs filesystem has to be mounted to access
> + profiling data.
> +
> +endmenu
> diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
> new file mode 100644
> index 0000000000000..41e27cefd9a47
> --- /dev/null
> +++ b/kernel/pgo/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0
> +GCOV_PROFILE := n
> +PGO_PROFILE := n
> +
> +obj-y += fs.o instrument.o
> diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
> new file mode 100644
> index 0000000000000..68b24672be10a
> --- /dev/null
> +++ b/kernel/pgo/fs.c
> @@ -0,0 +1,382 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt) "pgo: " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/debugfs.h>
> +#include <linux/fs.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/vmalloc.h>
> +#include "pgo.h"
> +
> +static struct dentry *directory;
> +
> +struct prf_private_data {
> + void *buffer;
> + unsigned long size;
> +};
> +
> +/*
> + * Raw profile data format:
> + *
> + * - llvm_prf_header
> + * - __llvm_prf_data
> + * - __llvm_prf_cnts
> + * - __llvm_prf_names
> + * - zero padding to 8 bytes
> + * - for each llvm_prf_data in __llvm_prf_data:
> + * - llvm_prf_value_data
> + * - llvm_prf_value_record + site count array
> + * - llvm_prf_value_node_data
> + * ...
> + * ...
> + * ...
> + */
> +
> +static void prf_fill_header(void **buffer)
> +{
> + struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
> +
> + header->magic = LLVM_PRF_MAGIC;
> + header->version = LLVM_PRF_VARIANT_MASK_IR | LLVM_PRF_VERSION;
> + header->data_size = prf_data_count();
> + header->padding_bytes_before_counters = 0;
> + header->counters_size = prf_cnts_count();
> + header->padding_bytes_after_counters = 0;
> + header->names_size = prf_names_count();
> + header->counters_delta = (u64)__llvm_prf_cnts_start;
> + header->names_delta = (u64)__llvm_prf_names_start;
> + header->value_kind_last = LLVM_PRF_IPVK_LAST;
> +
> + *buffer += sizeof(*header);
> +}
> +
> +/*
> + * Copy the source into the buffer, incrementing the pointer into buffer in the
> + * process.
> + */
> +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
> +{
> + memcpy(*buffer, src, size);
> + *buffer += size;
> +}
> +
> +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
> +{
> + struct llvm_prf_value_node **nodes =
> + (struct llvm_prf_value_node **)p->values;
> + u32 kinds = 0;
> + u32 size = 0;
> + unsigned int kind;
> + unsigned int n;
> + unsigned int s = 0;
> +
> + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> + unsigned int sites = p->num_value_sites[kind];
> +
> + if (!sites)
> + continue;
> +
> + /* Record + site count array */
> + size += prf_get_value_record_size(sites);
> + kinds++;
> +
> + if (!nodes)
> + continue;
> +
> + for (n = 0; n < sites; n++) {
> + u32 count = 0;
> + struct llvm_prf_value_node *site = nodes[s + n];
> +
> + while (site && ++count <= U8_MAX)
> + site = site->next;
> +
> + size += count *
> + sizeof(struct llvm_prf_value_node_data);
> + }
> +
> + s += sites;
> + }
> +
> + if (size)
> + size += sizeof(struct llvm_prf_value_data);
> +
> + if (value_kinds)
> + *value_kinds = kinds;
> +
> + return size;
> +}
> +
> +static u32 prf_get_value_size(void)
> +{
> + u32 size = 0;
> + struct llvm_prf_data *p;
> +
> + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> + size += __prf_get_value_size(p, NULL);
> +
> + return size;
> +}
> +
> +/* Serialize the profiling's value. */
> +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
> +{
> + struct llvm_prf_value_data header;
> + struct llvm_prf_value_node **nodes =
> + (struct llvm_prf_value_node **)p->values;
> + unsigned int kind;
> + unsigned int n;
> + unsigned int s = 0;
> +
> + header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
> +
> + if (!header.num_value_kinds)
> + /* Nothing to write. */
> + return;
> +
> + prf_copy_to_buffer(buffer, &header, sizeof(header));
> +
> + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> + struct llvm_prf_value_record *record;
> + u8 *counts;
> + unsigned int sites = p->num_value_sites[kind];
> +
> + if (!sites)
> + continue;
> +
> + /* Profiling value record. */
> + record = *(struct llvm_prf_value_record **)buffer;
> + *buffer += prf_get_value_record_header_size();
> +
> + record->kind = kind;
> + record->num_value_sites = sites;
> +
> + /* Site count array. */
> + counts = *(u8 **)buffer;
> + *buffer += prf_get_value_record_site_count_size(sites);
> +
> + /*
> + * If we don't have nodes, we can skip updating the site count
> + * array, because the buffer is zero filled.
> + */
> + if (!nodes)
> + continue;
> +
> + for (n = 0; n < sites; n++) {
> + u32 count = 0;
> + struct llvm_prf_value_node *site = nodes[s + n];
> +
> + while (site && ++count <= U8_MAX) {
> + prf_copy_to_buffer(buffer, site,
> + sizeof(struct llvm_prf_value_node_data));
> + site = site->next;
> + }
> +
> + counts[n] = (u8)count;
> + }
> +
> + s += sites;
> + }
> +}
> +
> +static void prf_serialize_values(void **buffer)
> +{
> + struct llvm_prf_data *p;
> +
> + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> + prf_serialize_value(p, buffer);
> +}
> +
> +static inline unsigned long prf_get_padding(unsigned long size)
> +{
> + return 7 & (8 - size % 8);
> +}

This is ugly, but it looks like it corresponds to
__llvm_profile_get_num_padding_bytes() in
llvm-project/compiler-rt/lib/profile/InstrProfiling.c? If the kernel
supports platforms where `sizeof(unsigned long) != 8`, it might be
nicer to spell out `sizeof(unsigned long)` rather than hardcode 8.
Should we also use u64 for the parameter and u8 for the return type?
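
To make the suggestion concrete, here is a minimal userspace sketch of the
same computation with explicit types. The names prf_get_padding_orig() and
prf_get_padding_sketch() are hypothetical, and padding to sizeof(uint64_t)
rather than to the word size is an assumption based on the "zero padding to
8 bytes" note in the format comment earlier in fs.c:

```c
#include <assert.h>
#include <stdint.h>

/* The expression from the patch, copied as-is for comparison. */
static inline unsigned long prf_get_padding_orig(unsigned long size)
{
	return 7 & (8 - size % 8);
}

/*
 * Hypothetical rewrite: u64 in, u8 out, alignment spelled out.
 * Assumes the raw profile format pads to 8 bytes on every platform.
 */
static inline uint8_t prf_get_padding_sketch(uint64_t size)
{
	const uint64_t align = sizeof(uint64_t);

	/* Bytes needed to reach the next multiple of align (0 if aligned). */
	return (uint8_t)((align - size % align) % align);
}
```

Both forms agree for every size; the second one just avoids the bare
constants and makes the 0..7 result range visible in the return type.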

> +
> +static unsigned long prf_buffer_size(void)
> +{
> + return sizeof(struct llvm_prf_header) +
> + prf_data_size() +
> + prf_cnts_size() +
> + prf_names_size() +
> + prf_get_padding(prf_names_size()) +
> + prf_get_value_size();
> +}
> +
> +/* Serialize the profiling data into a format LLVM's tools can understand. */
> +static int prf_serialize(struct prf_private_data *p)
> +{
> + int err = 0;
> + void *buffer;
> +
> + p->size = prf_buffer_size();
> + p->buffer = vzalloc(p->size);
> +
> + if (!p->buffer) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + buffer = p->buffer;
> +
> + prf_fill_header(&buffer);
> + prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
> + prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
> + prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
> + buffer += prf_get_padding(prf_names_size());
> +
> + prf_serialize_values(&buffer);
> +
> +out:
> + return err;
> +}
> +
> +/* open() implementation for PGO. Creates a copy of the profiling data set. */
> +static int prf_open(struct inode *inode, struct file *file)
> +{
> + struct prf_private_data *data;
> + unsigned long flags;
> + int err;
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL);
> + if (!data) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + flags = prf_lock();
> +
> + err = prf_serialize(data);
> + if (err) {
> + kfree(data);
> + goto out_unlock;
> + }
> +
> + file->private_data = data;
> +
> +out_unlock:
> + prf_unlock(flags);

This is an improvement over earlier revisions, but kfree() is still
within the critical section. I wonder if it can be moved out? If not,
why, precisely? Otherwise, are we sure we have the concurrency correct?
Might be worth pursuing in a follow-up patch once the core has landed.

Also, the comments above the definition of pgo_lock and above
allocate_node() seem to indicate that the same lock is also used for
serialization. I'm curious to know more about why we can't access
current_node and serialize at the same time. At the least, it seems
that `prf_serialize` should have a comment similar to the one on
`allocate_node` about the caller being expected to hold `pgo_lock` via
a call to `prf_lock()`, yeah?

I can't help but look at the two call sites of prf_lock() and be
suspicious that pgo_lock is technically guarding access to more
variables than described in the comment. It would be good to explain
exactly what is going on should we need to revisit the concurrency
here in the future (and lower the bus factor).
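
As a rough illustration of the first point, here is a userspace model of the
open path with the free moved after the unlock. Everything prefixed model_
is a hypothetical stand-in (the lock only tracks state, the serializer does
nothing); it is a sketch of the control flow, not kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static bool lock_held;        /* set while the model lock is "held" */
static bool freed_under_lock; /* records a free inside the critical section */

/* Stand-ins for prf_lock()/prf_unlock(); they only flip a flag. */
static unsigned long model_lock(void)
{
	lock_held = true;
	return 0;
}

static void model_unlock(unsigned long flags)
{
	(void)flags;
	lock_held = false;
}

/* Stand-in for kfree() that notices if it runs with the lock held. */
static void model_free(void *p)
{
	if (lock_held)
		freed_under_lock = true;
	free(p);
}

/*
 * Stand-in for prf_serialize(). Caller must hold the model lock, which is
 * the kind of comment the real function could carry as well.
 */
static int model_serialize(void *data, bool fail)
{
	(void)data;
	return fail ? -12 : 0; /* -12 mirrors -ENOMEM */
}

static int model_open(bool fail, void **private_data)
{
	void *data = calloc(1, 16);
	unsigned long flags;
	int err;

	if (!data)
		return -12;

	flags = model_lock();
	err = model_serialize(data, fail);
	model_unlock(flags);

	/* Error cleanup happens only after the lock has been dropped. */
	if (err) {
		model_free(data);
		return err;
	}

	*private_data = data;
	return 0;
}
```

With this shape the lock covers only the serialization call, so the
out_unlock label is no longer needed on the error path.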

> +out:
> + return err;
> +}
> +
> +/* read() implementation for PGO. */
> +static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
> + loff_t *ppos)
> +{
> + struct prf_private_data *data = file->private_data;
> +
> + BUG_ON(!data);
> +
> + return simple_read_from_buffer(buf, count, ppos, data->buffer,
> + data->size);
> +}
> +
> +/* release() implementation for PGO. Release resources allocated by open(). */
> +static int prf_release(struct inode *inode, struct file *file)
> +{
> + struct prf_private_data *data = file->private_data;
> +
> + if (data) {
> + vfree(data->buffer);
> + kfree(data);
> + }
> +
> + return 0;
> +}
> +
> +static const struct file_operations prf_fops = {
> + .owner = THIS_MODULE,
> + .open = prf_open,
> + .read = prf_read,
> + .llseek = default_llseek,
> + .release = prf_release
> +};
> +
> +/* write() implementation for resetting PGO's profile data. */
> +static ssize_t reset_write(struct file *file, const char __user *addr,
> + size_t len, loff_t *pos)
> +{
> + struct llvm_prf_data *data;
> +
> + memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
> +
> + for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; ++data) {

^ this patch mixes pre-increment and post-increment in loops. The
kernel coding style docs (Documentation/process/coding-style.rst)
don't make a call on this, but it might be nice to be internally
consistent throughout the patch. I assume that's from having mixed
authors. Not a huge issue, but I'm pedantic.

> + struct llvm_prf_value_node **vnodes;
> + u64 current_vsite_count;
> + u32 i;
> +
> + if (!data->values)
> + continue;
> +
> + current_vsite_count = 0;
> + vnodes = (struct llvm_prf_value_node **)data->values;
> +
> + for (i = LLVM_PRF_IPVK_FIRST; i <= LLVM_PRF_IPVK_LAST; ++i)
> + current_vsite_count += data->num_value_sites[i];
> +
> + for (i = 0; i < current_vsite_count; ++i) {
> + struct llvm_prf_value_node *current_vnode = vnodes[i];
> +
> + while (current_vnode) {
> + current_vnode->count = 0;
> + current_vnode = current_vnode->next;
> + }
> + }
> + }
> +
> + return len;
> +}
> +
> +static const struct file_operations prf_reset_fops = {
> + .owner = THIS_MODULE,
> + .write = reset_write,
> + .llseek = noop_llseek,
> +};
> +
> +/* Create debugfs entries. */
> +static int __init pgo_init(void)
> +{
> + directory = debugfs_create_dir("pgo", NULL);
> + if (!directory)
> + goto err_remove;
> +
> + if (!debugfs_create_file("profraw", 0600, directory, NULL,
> + &prf_fops))
> + goto err_remove;
> +
> + if (!debugfs_create_file("reset", 0200, directory, NULL,
> + &prf_reset_fops))
> + goto err_remove;
> +
> + return 0;
> +
> +err_remove:
> + pr_err("initialization failed\n");
> + return -EIO;
> +}
> +
> +/* Remove debugfs entries. */
> +static void __exit pgo_exit(void)
> +{
> + debugfs_remove_recursive(directory);
> +}
> +
> +module_init(pgo_init);
> +module_exit(pgo_exit);
> diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
> new file mode 100644
> index 0000000000000..6084ff0652e85
> --- /dev/null
> +++ b/kernel/pgo/instrument.c
> @@ -0,0 +1,185 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt) "pgo: " fmt
> +
> +#include <linux/bitops.h>
> +#include <linux/kernel.h>
> +#include <linux/export.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +#include "pgo.h"
> +
> +/* Lock guarding value node access and serialization. */
> +static DEFINE_SPINLOCK(pgo_lock);
> +static int current_node;
> +
> +unsigned long prf_lock(void)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&pgo_lock, flags);
> +
> + return flags;
> +}
> +
> +void prf_unlock(unsigned long flags)
> +{
> + spin_unlock_irqrestore(&pgo_lock, flags);
> +}
> +
> +/*
> + * Return a newly allocated profiling value node which contains the tracked
> + * value by the value profiler.
> + * Note: caller *must* hold pgo_lock.
> + */
> +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
> + u32 index, u64 value)
> +{
> + if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
> + return NULL; /* Out of nodes */
> +
> + current_node++;
> +
> + /* Make sure the node is entirely within the section */
> + if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
> + &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
> + return NULL;
> +
> + return &__llvm_prf_vnds_start[current_node];
> +}
> +
> +/*
> + * Counts the number of times a target value is seen.
> + *
> + * Records the target value for the CounterIndex if not seen before. Otherwise,
> + * increments the counter associated w/ the target value.
> + */
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
> +{
> + struct llvm_prf_data *p = (struct llvm_prf_data *)data;
> + struct llvm_prf_value_node **counters;
> + struct llvm_prf_value_node *curr;
> + struct llvm_prf_value_node *min = NULL;
> + struct llvm_prf_value_node *prev = NULL;
> + u64 min_count = U64_MAX;
> + u8 values = 0;
> + unsigned long flags;
> +
> + if (!p || !p->values)
> + return;
> +
> + counters = (struct llvm_prf_value_node **)p->values;
> + curr = counters[index];
> +
> + while (curr) {
> + if (target_value == curr->value) {
> + curr->count++;
> + return;
> + }
> +
> + if (curr->count < min_count) {
> + min_count = curr->count;
> + min = curr;
> + }
> +
> + prev = curr;
> + curr = curr->next;
> + values++;
> + }
> +
> + if (values >= LLVM_PRF_MAX_NUM_VALS_PER_SITE) {
> + if (!min->count || !(--min->count)) {
> + curr = min;
> + curr->value = target_value;
> + curr->count++;
> + }
> + return;
> + }
> +
> + /* Lock when updating the value node structure. */
> + flags = prf_lock();
> +
> + curr = allocate_node(p, index, target_value);
> + if (!curr)
> + goto out;
> +
> + curr->value = target_value;
> + curr->count++;
> +
> + if (!counters[index])
> + counters[index] = curr;
> + else if (prev && !prev->next)
> + prev->next = curr;
> +
> +out:
> + prf_unlock(flags);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_target);
> +
> +/* Counts the number of times a range of targets values are seen. */
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> + u32 index, s64 precise_start,
> + s64 precise_last, s64 large_value);
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> + u32 index, s64 precise_start,
> + s64 precise_last, s64 large_value)
> +{
> + if (large_value != S64_MIN && (s64)target_value >= large_value)
> + target_value = large_value;
> + else if ((s64)target_value < precise_start ||
> + (s64)target_value > precise_last)
> + target_value = precise_last + 1;
> +
> + __llvm_profile_instrument_target(target_value, data, index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_range);
> +
> +static u64 inst_prof_get_range_rep_value(u64 value)
> +{
> + if (value <= 8)
> + /* The first ranges are individually tracked, us it as is. */

^ typo, "use"

> + return value;
> + else if (value >= 513)
> + /* The last range is mapped to its lowest value. */
> + return 513;
> + else if (hweight64(value) == 1)
> + /* If it's a power of two, use it as is. */
> + return value;
> +
> + /* Otherwise, take to the previous power of two + 1. */
> + return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
> +}
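
For reference, the mapping this helper implements can be checked in
userspace. This is a minimal sketch, assuming a GCC- or clang-compatible
compiler: rep_value() is a hypothetical name, __builtin_popcountll() stands
in for the kernel's hweight64(), and the shift is done on a 64-bit constant
rather than on an int:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace copy of the range mapping, for experimentation only. */
static uint64_t rep_value(uint64_t value)
{
	if (value <= 8)
		/* The first ranges are individually tracked. */
		return value;
	else if (value >= 513)
		/* The last range is mapped to its lowest value. */
		return 513;
	else if (__builtin_popcountll(value) == 1)
		/* Powers of two are used as is. */
		return value;

	/* Otherwise, map to the previous power of two + 1. */
	return ((uint64_t)1 << (64 - __builtin_clzll(value) - 1)) + 1;
}
```

So 10 and 15 both map to 9, 100 maps to 65, and anything from 513 upward
collapses into the single value 513.
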
> +
> +/*
> + * The target values are partitioned into multiple ranges. The range spec is
> + * defined in compiler-rt/include/profile/InstrProfData.inc.
> + */
> +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> + u32 counter_index);
> +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> + u32 counter_index)
> +{
> + u64 rep_value;
> +
> + /* Map the target value to the representative value of its range. */
> + rep_value = inst_prof_get_range_rep_value(target_value);
> + __llvm_profile_instrument_target(rep_value, data, counter_index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_memop);
> diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
> new file mode 100644
> index 0000000000000..df0aa278f28bd
> --- /dev/null
> +++ b/kernel/pgo/pgo.h
> @@ -0,0 +1,206 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#ifndef _PGO_H
> +#define _PGO_H
> +
> +/*
> + * Note: These internal LLVM definitions must match the compiler version.
> + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
> + */
> +
> +#ifdef CONFIG_64BIT
> + #define LLVM_PRF_MAGIC \
> + ((u64)255 << 56 | \
> + (u64)'l' << 48 | \
> + (u64)'p' << 40 | \
> + (u64)'r' << 32 | \
> + (u64)'o' << 24 | \
> + (u64)'f' << 16 | \
> + (u64)'r' << 8 | \
> + (u64)129)
> +#else
> + #define LLVM_PRF_MAGIC \
> + ((u64)255 << 56 | \
> + (u64)'l' << 48 | \
> + (u64)'p' << 40 | \
> + (u64)'r' << 32 | \
> + (u64)'o' << 24 | \
> + (u64)'f' << 16 | \
> + (u64)'R' << 8 | \
> + (u64)129)
> +#endif
> +
> +#define LLVM_PRF_VERSION 5
> +#define LLVM_PRF_DATA_ALIGN 8
> +#define LLVM_PRF_IPVK_FIRST 0
> +#define LLVM_PRF_IPVK_LAST 1
> +#define LLVM_PRF_MAX_NUM_VALS_PER_SITE 16

llvm/include/llvm/ProfileData/InstrProfData.inc defines
INSTR_PROF_MAX_NUM_VAL_PER_SITE as 255; does that need to match?

> +
> +#define LLVM_PRF_VARIANT_MASK_IR (0x1ull << 56)
> +#define LLVM_PRF_VARIANT_MASK_CSIR (0x1ull << 57)
> +
> +/**
> + * struct llvm_prf_header - represents the raw profile header data structure.
> + * @magic: the magic token for the file format.
> + * @version: the version of the file format.
> + * @data_size: the number of entries in the profile data section.
> + * @padding_bytes_before_counters: the number of padding bytes before the
> + * counters.
> + * @counters_size: the size in bytes of the LLVM profile section containing the
> + * counters.
> + * @padding_bytes_after_counters: the number of padding bytes after the
> + * counters.
> + * @names_size: the size in bytes of the LLVM profile section containing the
> + * counters' names.
> + * @counters_delta: the beginning of the LLMV profile counters section.
> + * @names_delta: the beginning of the LLMV profile names section.
> + * @value_kind_last: the last profile value kind.
> + */
> +struct llvm_prf_header {
> + u64 magic;
> + u64 version;
> + u64 data_size;
> + u64 padding_bytes_before_counters;
> + u64 counters_size;
> + u64 padding_bytes_after_counters;
> + u64 names_size;
> + u64 counters_delta;
> + u64 names_delta;
> + u64 value_kind_last;
> +};
> +
> +/**
> + * struct llvm_prf_data - represents the per-function control structure.
> + * @name_ref: the reference to the function's name.
> + * @func_hash: the hash value of the function.
> + * @counter_ptr: a pointer to the profile counter.
> + * @function_ptr: a pointer to the function.
> + * @values: the profiling values associated with this function.
> + * @num_counters: the number of counters in the function.
> + * @num_value_sites: the number of value profile sites.
> + */
> +struct llvm_prf_data {
> + const u64 name_ref;
> + const u64 func_hash;
> + const void *counter_ptr;
> + const void *function_ptr;
> + void *values;
> + const u32 num_counters;
> + const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
> +} __aligned(LLVM_PRF_DATA_ALIGN);
> +
> +/**
> + * structure llvm_prf_value_node_data - represents the data part of the struct
> + * llvm_prf_value_node data structure.
> + * @value: the value counters.
> + * @count: the counters' count.
> + */
> +struct llvm_prf_value_node_data {
> + u64 value;
> + u64 count;
> +};
> +
> +/**
> + * struct llvm_prf_value_node - represents an internal data structure used by
> + * the value profiler.
> + * @value: the value counters.
> + * @count: the counters' count.
> + * @next: the next value node.
> + */
> +struct llvm_prf_value_node {
> + u64 value;
> + u64 count;
> + struct llvm_prf_value_node *next;
> +};
> +
> +/**
> + * struct llvm_prf_value_data - represents the value profiling data in indexed
> + * format.
> + * @total_size: the total size in bytes including this field.
> + * @num_value_kinds: the number of value profile kinds that has value profile
> + * data.
> + */
> +struct llvm_prf_value_data {
> + u32 total_size;
> + u32 num_value_kinds;
> +};
> +
> +/**
> + * struct llvm_prf_value_record - represents the on-disk layout of the value
> + * profile data of a particular kind for one function.
> + * @kind: the kind of the value profile record.
> + * @num_value_sites: the number of value profile sites.
> + * @site_count_array: the first element of the array that stores the number
> + * of profiled values for each value site.
> + */
> +struct llvm_prf_value_record {
> + u32 kind;
> + u32 num_value_sites;
> + u8 site_count_array[];
> +};
> +
> +#define prf_get_value_record_header_size() \
> + offsetof(struct llvm_prf_value_record, site_count_array)
> +#define prf_get_value_record_site_count_size(sites) \
> + roundup((sites), 8)
> +#define prf_get_value_record_size(sites) \
> + (prf_get_value_record_header_size() + \
> + prf_get_value_record_site_count_size((sites)))
> +
> +/* Data sections */
> +extern struct llvm_prf_data __llvm_prf_data_start[];
> +extern struct llvm_prf_data __llvm_prf_data_end[];
> +
> +extern u64 __llvm_prf_cnts_start[];
> +extern u64 __llvm_prf_cnts_end[];
> +
> +extern char __llvm_prf_names_start[];
> +extern char __llvm_prf_names_end[];
> +
> +extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
> +extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
> +
> +/* Locking for vnodes */
> +extern unsigned long prf_lock(void);
> +extern void prf_unlock(unsigned long flags);
> +
> +#define __DEFINE_PRF_SIZE(s) \
> + static inline unsigned long prf_ ## s ## _size(void) \
> + { \
> + unsigned long start = \
> + (unsigned long)__llvm_prf_ ## s ## _start; \
> + unsigned long end = \
> + (unsigned long)__llvm_prf_ ## s ## _end; \
> + return roundup(end - start, \
> + sizeof(__llvm_prf_ ## s ## _start[0])); \
> + } \
> + static inline unsigned long prf_ ## s ## _count(void) \
> + { \
> + return prf_ ## s ## _size() / \
> + sizeof(__llvm_prf_ ## s ## _start[0]); \
> + }
> +
> +__DEFINE_PRF_SIZE(data);
> +__DEFINE_PRF_SIZE(cnts);
> +__DEFINE_PRF_SIZE(names);
> +__DEFINE_PRF_SIZE(vnds);
> +
> +#undef __DEFINE_PRF_SIZE
> +
> +#endif /* _PGO_H */
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 213677a5ed33e..9b218afb5cb87 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
> $(CFLAGS_GCOV))
> endif
>
> +#
> +# Enable clang's PGO profiling flags for a file or directory depending on
> +# variables PGO_PROFILE_obj.o and PGO_PROFILE.
> +#
> +ifeq ($(CONFIG_PGO_CLANG),y)
> +_c_flags += $(if $(patsubst n%,, \
> + $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
> + $(CFLAGS_PGO_CLANG))
> +endif
> +
> #
> # Enable address sanitizer flags for kernel except some files or directories
> # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
> --
> 2.30.0.284.gd98b1dd5eaa7-goog
>


--
Thanks,
~Nick Desaulniers

2021-01-21 03:33:22

by Sedat Dilek

Subject: Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure

Hi,

When I looked through the code, I wondered why we do not add a
"CONFIG_PGO_CLANG_PROFDATA" option, which would be helpful when doing the
PGO rebuild with a vmlinux.profdata.

This would introduce a "PGO_PROFDATA" variable to turn passing
"-fprofile-use=vmlinux.profdata" on or off (see CFLAGS_PGO_CLANG_PROFDATA
in the top-level Makefile).

If we turn profiling off via "PGO_PROFILE := n" in several Makefiles, should
we do the same and add "PGO_PROFDATA := n" to those Makefiles?

Please see the attached diff.

Regards,
- Sedat -


Attachments:
CONFIG_PGO_CLANG_PROFDATA.diff (4.66 kB)

2021-01-21 03:40:58

by Sedat Dilek

Subject: Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jan 18, 2021 at 10:56 PM Bill Wendling <[email protected]> wrote:
>
> On Mon, Jan 18, 2021 at 9:26 AM Sedat Dilek <[email protected]> wrote:
> >
> > On Mon, Jan 18, 2021 at 1:39 PM Sedat Dilek <[email protected]> wrote:
> > >
> > > On Mon, Jan 18, 2021 at 3:32 AM Bill Wendling <[email protected]> wrote:
> > > >
> > > > On Sun, Jan 17, 2021 at 4:27 PM Sedat Dilek <[email protected]> wrote:
> > > > >
> > > > > [ big snip ]
> > > >
> > > > [More snippage.]
> > > >
> > > > > [ CC Fangrui ]
> > > > >
> > > > > With the attached...
> > > > >
> > > > > [PATCH v3] module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for
> > > > > undefined symbols
> > > > >
> > > > > ...I was finally able to boot into a rebuild PGO-optimized Linux-kernel.
> > > > > For details see ClangBuiltLinux issue #1250 "Unknown symbol
> > > > > _GLOBAL_OFFSET_TABLE_ loading kernel modules".
> > > > >
> > > > Thanks for confirming that this works with the above patch.
> > > >
> > > > > @ Bill Nick Sami Nathan
> > > > >
> > > > > 1, Can you say something of the impact passing "LLVM_IAS=1" to make?
> > > >
> > > > The integrated assembler and this option are more-or-less orthogonal
> > > > to each other. One can still use the GNU assembler with PGO. If you're
> > > > having an issue, it may be related to ClangBuiltLinux issue #1250.
> > > >
> > > > > 2. Can you please try Nick's DWARF v5 support patchset v5 and
> > > > > CONFIG_DEBUG_INFO_DWARF5=y (see attachments)?
> > > > >
> > > > I know Nick did several tests with PGO. He may have looked into it
> > > > already, but we can check.
> > > >
> > >
> > > Reproducible.
> > >
> > > LLVM_IAS=1 + DWARF5 = Not bootable
> > >
> > > I will try:
> > >
> > > LLVM_IAS=1 + DWARF4
> > >
> >
> > I was not able to boot into such a built Linux-kernel.
> >
> PGO will have no effect on debugging data. If this is an issue with
> DWARF, then it's likely orthogonal to the PGO patch.
>
> > For me worked: DWARF2 and LLVM_IAS=1 *not* set.
> >
> > Of course, this could be an issue with my system's LLVM/Clang.
> >
> > Debian clang version
> > 12.0.0-++20210115111113+45ef053bd709-1~exp1~20210115101809.3724
> >
> Please use the official clang 11.0.1 release
> (https://releases.llvm.org/download.html), modifying the
> kernel/pgo/Kconfig as I suggested above. The reason we specify clang
> 12 for the minimal version is because of an issue that was recently
> fixed.
>

I downgraded to clang-11.1.0-rc1.
( See attachment. )

Doing the first build with PGO enabled plus DWARF5 and LLVM_IAS=1 works.

But again after generating vmlinux.profdata and doing the PGO-rebuild
- the resulting Linux-kernel does NOT boot in QEMU or on bare metal.
With GNU/as I can boot.

So this is independent of DWARF v4 or DWARF v5 (LLVM_IAS=1 and DWARF
v2 is not allowed).
There is something wrong (here) with passing LLVM_IAS=1 to make when
doing the PGO-rebuild.

Can someone please verify whether the PGO-rebuild with LLVM_IAS=1
boots or not?

Thanks.

- Sedat -

> > Can you give me a LLVM commit-id where you had success with LLVM_IAS=1
> > and especially CONFIG_DEBUG_INFO_DWARF5=y?
> > Success means I was able to boot in QEMU and/or bare metal.
> >
> The DWARF5 patch isn't in yet, so I don't want to rely upon it too much.
>
> -bw


Attachments:
0001-pgo-Allow-to-use-clang-v11.0.1.patch (841.00 B)

2021-01-21 08:35:17

by Bill Wendling

Subject: [PATCH v6] pgo: add clang's Profile Guided Optimization infrastructure

From: Sami Tolvanen <[email protected]>

Enable the use of clang's Profile-Guided Optimization[1]. To generate a
profile, the kernel is instrumented with PGO counters, a representative
workload is run, and the raw profile data is collected from
/sys/kernel/debug/pgo/profraw.

The raw profile data must be processed by clang's "llvm-profdata" tool
before it can be used during recompilation:

$ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
$ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw

Multiple raw profiles may be merged during this step.

The data can now be used by the compiler:

$ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...

This initial submission is restricted to x86, as that's the platform we
know works. This restriction can be lifted once other platforms have
been verified to work with PGO.

Note that this method of profiling the kernel is clang-native, unlike
the clang support in kernel/gcov.

[1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization

Signed-off-by: Sami Tolvanen <[email protected]>
Co-developed-by: Bill Wendling <[email protected]>
Signed-off-by: Bill Wendling <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
---
v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
testing.
- Corrected documentation, re PGO flags when using LTO, based on Fangrui
Song's comments.
v3: - Added change log section based on Sedat Dilek's comments.
v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
own popcount implementation, based on Nick Desaulniers's comment.
v5: - Correct padding calculation, discovered by Nathan Chancellor.
v6: - Add better documentation about the locking scheme and other things.
- Rename macros to better match the same macros in LLVM's source code.
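
For anyone checking the v5 padding fix by hand, here is a minimal userspace
sketch. It assumes the same expression as prf_get_padding() in kernel/pgo/fs.c,
with u64 spelled as uint64_t; prf_padded_size() is a hypothetical helper added
only for illustration and is not part of the patch.

```c
#include <stdint.h>

/*
 * Same arithmetic as prf_get_padding() in kernel/pgo/fs.c: the number of
 * zero bytes needed to bring "size" up to the next multiple of 8.
 * The "7 &" mask turns the result 8 (size already aligned) into 0.
 */
static unsigned long prf_get_padding(unsigned long size)
{
	return 7 & (sizeof(uint64_t) - size % sizeof(uint64_t));
}

/*
 * Hypothetical helper (not in the patch): size of the names section plus
 * its trailing padding, i.e. the offset at which the value data starts
 * relative to the start of the names.
 */
static unsigned long prf_padded_size(unsigned long size)
{
	return size + prf_get_padding(size);
}
```

For example, a 13-byte names section needs 3 bytes of padding and an already
aligned 8-byte section needs none, so the value profile data always starts on
an 8-byte boundary.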
---
Documentation/dev-tools/index.rst | 1 +
Documentation/dev-tools/pgo.rst | 127 +++++++++
MAINTAINERS | 9 +
Makefile | 3 +
arch/Kconfig | 1 +
arch/x86/Kconfig | 1 +
arch/x86/boot/Makefile | 1 +
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/crypto/Makefile | 4 +
arch/x86/entry/vdso/Makefile | 1 +
arch/x86/kernel/vmlinux.lds.S | 2 +
arch/x86/platform/efi/Makefile | 1 +
arch/x86/purgatory/Makefile | 1 +
arch/x86/realmode/rm/Makefile | 1 +
arch/x86/um/vdso/Makefile | 1 +
drivers/firmware/efi/libstub/Makefile | 1 +
include/asm-generic/vmlinux.lds.h | 44 +++
kernel/Makefile | 1 +
kernel/pgo/Kconfig | 35 +++
kernel/pgo/Makefile | 5 +
kernel/pgo/fs.c | 389 ++++++++++++++++++++++++++
kernel/pgo/instrument.c | 189 +++++++++++++
kernel/pgo/pgo.h | 203 ++++++++++++++
scripts/Makefile.lib | 10 +
24 files changed, 1032 insertions(+)
create mode 100644 Documentation/dev-tools/pgo.rst
create mode 100644 kernel/pgo/Kconfig
create mode 100644 kernel/pgo/Makefile
create mode 100644 kernel/pgo/fs.c
create mode 100644 kernel/pgo/instrument.c
create mode 100644 kernel/pgo/pgo.h

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index f7809c7b1ba9..8d6418e85806 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -26,6 +26,7 @@ whole; patches welcome!
kgdb
kselftest
kunit/index
+ pgo


.. only:: subproject and html
diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
new file mode 100644
index 000000000000..b7f11d8405b7
--- /dev/null
+++ b/Documentation/dev-tools/pgo.rst
@@ -0,0 +1,127 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+Using PGO with the Linux kernel
+===============================
+
+Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
+when building with Clang. The profiling data is exported via the ``pgo``
+debugfs directory.
+
+.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
+
+
+Preparation
+===========
+
+Configure the kernel with:
+
+.. code-block:: make
+
+ CONFIG_DEBUG_FS=y
+ CONFIG_PGO_CLANG=y
+
+Note that kernels compiled with profiling flags will be significantly larger
+and run slower.
+
+Profiling data will only become accessible once debugfs has been mounted:
+
+.. code-block:: sh
+
+ mount -t debugfs none /sys/kernel/debug
+
+
+Customization
+=============
+
+You can enable or disable profiling for individual files and directories by
+adding a line similar to the following to the respective kernel Makefile:
+
+- For a single file (e.g. main.o)
+
+ .. code-block:: make
+
+ PGO_PROFILE_main.o := y
+
+- For all files in one directory
+
+ .. code-block:: make
+
+ PGO_PROFILE := y
+
+To exclude files from being profiled use
+
+ .. code-block:: make
+
+ PGO_PROFILE_main.o := n
+
+and
+
+ .. code-block:: make
+
+ PGO_PROFILE := n
+
+Only files which are linked to the main kernel image or are compiled as kernel
+modules are supported by this mechanism.
+
+
+Files
+=====
+
+The PGO kernel support creates the following files in debugfs:
+
+``/sys/kernel/debug/pgo``
+ Parent directory for all PGO-related files.
+
+``/sys/kernel/debug/pgo/reset``
+ Global reset file: resets all profile data to zero when written to.
+
+``/sys/kernel/debug/pgo/profraw``
+ The raw PGO data that must be processed with ``llvm-profdata``.
+
+
+Workflow
+========
+
+The PGO kernel can be run on the host or test machines. The data, though, should
+be analyzed with the tools from the same Clang version that was used to compile
+the kernel. Clang is tolerant of version skew, but it's easier to use the same
+Clang version.
+
+The profiling data is useful for optimizing the kernel, analyzing coverage,
+etc. Clang offers tools to perform these tasks.
+
+Here is an example workflow for profiling an instrumented kernel with PGO and
+using the result to optimize the kernel:
+
+1) Install the kernel on the TEST machine.
+
+2) Reset the data counters right before running the load tests
+
+ .. code-block:: sh
+
+ $ echo 1 > /sys/kernel/debug/pgo/reset
+
+3) Run the load tests.
+
+4) Collect the raw profile data
+
+ .. code-block:: sh
+
+ $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
+
+5) (Optional) Download the raw profile data to the HOST machine.
+
+6) Process the raw profile data
+
+ .. code-block:: sh
+
+ $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
+
+ Note that multiple raw profile data files can be merged during this step.
+
+7) Rebuild the kernel using the profile data (PGO disabled)
+
+ .. code-block:: sh
+
+ $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
diff --git a/MAINTAINERS b/MAINTAINERS
index 00836f6452f0..13333026e140 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13948,6 +13948,15 @@ S: Maintained
F: include/linux/personality.h
F: include/uapi/linux/personality.h

+PGO BASED KERNEL PROFILING
+M: Sami Tolvanen <[email protected]>
+M: Bill Wendling <[email protected]>
+R: Nathan Chancellor <[email protected]>
+R: Nick Desaulniers <[email protected]>
+S: Supported
+F: Documentation/dev-tools/pgo.rst
+F: kernel/pgo
+
PHOENIX RC FLIGHT CONTROLLER ADAPTER
M: Marcus Folkesson <[email protected]>
L: [email protected]
diff --git a/Makefile b/Makefile
index b0e4767735dc..9339541f7cec 100644
--- a/Makefile
+++ b/Makefile
@@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
# Defaults to vmlinux, but the arch makefile usually adds further targets
all: vmlinux

+CFLAGS_PGO_CLANG := -fprofile-generate
+export CFLAGS_PGO_CLANG
+
CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
$(call cc-option,-fno-tree-loop-im) \
$(call cc-disable-warning,maybe-uninitialized,)
diff --git a/arch/Kconfig b/arch/Kconfig
index 24862d15f3a3..f39d3991f6bf 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
pairs of 32-bit arguments, select this option.

source "kernel/gcov/Kconfig"
+source "kernel/pgo/Kconfig"

source "scripts/gcc-plugins/Kconfig"

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 21f851179ff0..36305ea61dc0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -96,6 +96,7 @@ config X86
select ARCH_SUPPORTS_DEBUG_PAGEALLOC
select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
+ select ARCH_SUPPORTS_PGO_CLANG if X86_64
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index fe605205b4ce..383853e32f67 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n

$(obj)/bzImage: asflags-y := $(SVGA_MODE)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index e0bc3988c3fa..ed12ab65f606 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/

KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE :=n

KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index a31de0c6ccde..5753aea7bcbd 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -4,6 +4,10 @@

OBJECT_FILES_NON_STANDARD := y

+# Disable PGO for curve25519-x86_64. With PGO enabled, clang runs out of
+# registers for some of the functions.
+PGO_PROFILE_curve25519-x86_64.o := n
+
obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o

obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 02e3e42f380b..26e2b3af0145 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO $@
VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
$(call ld-option, --eh-frame-hdr) -Bsymbolic
GCOV_PROFILE := n
+PGO_PROFILE := n

quiet_cmd_vdso_and_check = VDSO $@
cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index efd9e9ea17f2..f6cab2316c46 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -184,6 +184,8 @@ SECTIONS

BUG_TABLE

+ PGO_CLANG_DATA
+
ORC_UNWIND_TABLE

. = ALIGN(PAGE_SIZE);
diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
index 84b09c230cbd..5f22b31446ad 100644
--- a/arch/x86/platform/efi/Makefile
+++ b/arch/x86/platform/efi/Makefile
@@ -2,6 +2,7 @@
OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
KASAN_SANITIZE := n
GCOV_PROFILE := n
+PGO_PROFILE := n

obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 95ea17a9d20c..36f20e99da0b 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk

# Sanitizer, etc. runtimes are unavailable and cannot be linked here.
GCOV_PROFILE := n
+PGO_PROFILE := n
KASAN_SANITIZE := n
UBSAN_SANITIZE := n
KCSAN_SANITIZE := n
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index 83f1b6a56449..21797192f958 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n
diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
index 5943387e3f35..54f5768f5853 100644
--- a/arch/x86/um/vdso/Makefile
+++ b/arch/x86/um/vdso/Makefile
@@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@

VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
GCOV_PROFILE := n
+PGO_PROFILE := n

#
# Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 8a94388e38b3..2d81623b33f2 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -40,6 +40,7 @@ KBUILD_CFLAGS := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))

GCOV_PROFILE := n
+PGO_PROFILE := n
# Sanitizer runtimes are unavailable and cannot be linked here.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index b2b3d81b1535..3a591bb18c5f 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -316,6 +316,49 @@
#define THERMAL_TABLE(name)
#endif

+#ifdef CONFIG_PGO_CLANG
+#define PGO_CLANG_DATA \
+ __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
+ . = ALIGN(8); \
+ __llvm_prf_start = .; \
+ __llvm_prf_data_start = .; \
+ KEEP(*(__llvm_prf_data)) \
+ . = ALIGN(8); \
+ __llvm_prf_data_end = .; \
+ } \
+ __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
+ . = ALIGN(8); \
+ __llvm_prf_cnts_start = .; \
+ KEEP(*(__llvm_prf_cnts)) \
+ . = ALIGN(8); \
+ __llvm_prf_cnts_end = .; \
+ } \
+ __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
+ . = ALIGN(8); \
+ __llvm_prf_names_start = .; \
+ KEEP(*(__llvm_prf_names)) \
+ . = ALIGN(8); \
+ __llvm_prf_names_end = .; \
+ . = ALIGN(8); \
+ } \
+ __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
+ __llvm_prf_vals_start = .; \
+ KEEP(*(__llvm_prf_vals)) \
+ . = ALIGN(8); \
+ __llvm_prf_vals_end = .; \
+ . = ALIGN(8); \
+ } \
+ __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
+ __llvm_prf_vnds_start = .; \
+ KEEP(*(__llvm_prf_vnds)) \
+ . = ALIGN(8); \
+ __llvm_prf_vnds_end = .; \
+ __llvm_prf_end = .; \
+ }
+#else
+#define PGO_CLANG_DATA
+#endif
+
#define KERNEL_DTB() \
STRUCT_ALIGN(); \
__dtb_start = .; \
@@ -1125,6 +1168,7 @@
CONSTRUCTORS \
} \
BUG_TABLE \
+ PGO_CLANG_DATA

#define INIT_TEXT_SECTION(inittext_align) \
. = ALIGN(inittext_align); \
diff --git a/kernel/Makefile b/kernel/Makefile
index aa7368c7eabf..0b34ca228ba4 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
obj-$(CONFIG_KCSAN) += kcsan/
obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
+obj-$(CONFIG_PGO_CLANG) += pgo/

obj-$(CONFIG_PERF_EVENTS) += events/

diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
new file mode 100644
index 000000000000..76a640b6cf6e
--- /dev/null
+++ b/kernel/pgo/Kconfig
@@ -0,0 +1,35 @@
+# SPDX-License-Identifier: GPL-2.0-only
+menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
+
+config ARCH_SUPPORTS_PGO_CLANG
+ bool
+
+config PGO_CLANG
+ bool "Enable clang's PGO-based kernel profiling"
+ depends on DEBUG_FS
+ depends on ARCH_SUPPORTS_PGO_CLANG
+ depends on CC_IS_CLANG && CLANG_VERSION >= 120000
+ help
+ This option enables clang's PGO (Profile Guided Optimization) based
+ code profiling to better optimize the kernel.
+
+ If unsure, say N.
+
+ Run a representative workload for your application on a kernel
+ compiled with this option and download the raw profile file from
+ /sys/kernel/debug/pgo/profraw. This file needs to be processed with
+ llvm-profdata. It may be merged with other collected raw profiles.
+
+ Copy the resulting profile file into vmlinux.profdata, and build with
+ KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
+ kernel.
+
+ Note that a kernel compiled with profiling flags will be
+ significantly larger and run slower. Also be sure to exclude files
+ from profiling which are not linked to the kernel image to prevent
+ linker errors.
+
+ Note that the debugfs filesystem has to be mounted to access
+ profiling data.
+
+endmenu
diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
new file mode 100644
index 000000000000..41e27cefd9a4
--- /dev/null
+++ b/kernel/pgo/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+GCOV_PROFILE := n
+PGO_PROFILE := n
+
+obj-y += fs.o instrument.o
diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
new file mode 100644
index 000000000000..132ff2ab3feb
--- /dev/null
+++ b/kernel/pgo/fs.c
@@ -0,0 +1,389 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ * Sami Tolvanen <[email protected]>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt) "pgo: " fmt
+
+#include <linux/kernel.h>
+#include <linux/debugfs.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include "pgo.h"
+
+static struct dentry *directory;
+
+struct prf_private_data {
+ void *buffer;
+ unsigned long size;
+};
+
+/*
+ * Raw profile data format:
+ *
+ * - llvm_prf_header
+ * - __llvm_prf_data
+ * - __llvm_prf_cnts
+ * - __llvm_prf_names
+ * - zero padding to 8 bytes
+ * - for each llvm_prf_data in __llvm_prf_data:
+ * - llvm_prf_value_data
+ * - llvm_prf_value_record + site count array
+ * - llvm_prf_value_node_data
+ * ...
+ * ...
+ * ...
+ */
+
+static void prf_fill_header(void **buffer)
+{
+ struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
+
+#ifdef CONFIG_64BIT
+ header->magic = LLVM_INSTR_PROF_RAW_MAGIC_64;
+#else
+ header->magic = LLVM_INSTR_PROF_RAW_MAGIC_32;
+#endif
+ header->version = LLVM_VARIANT_MASK_IR_PROF | LLVM_INSTR_PROF_RAW_VERSION;
+ header->data_size = prf_data_count();
+ header->padding_bytes_before_counters = 0;
+ header->counters_size = prf_cnts_count();
+ header->padding_bytes_after_counters = 0;
+ header->names_size = prf_names_count();
+ header->counters_delta = (u64)__llvm_prf_cnts_start;
+ header->names_delta = (u64)__llvm_prf_names_start;
+ header->value_kind_last = LLVM_PRF_IPVK_LAST;
+
+ *buffer += sizeof(*header);
+}
+
+/*
+ * Copy the source into the buffer, incrementing the pointer into buffer in the
+ * process.
+ */
+static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
+{
+ memcpy(*buffer, src, size);
+ *buffer += size;
+}
+
+static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
+{
+ struct llvm_prf_value_node **nodes =
+ (struct llvm_prf_value_node **)p->values;
+ u32 kinds = 0;
+ u32 size = 0;
+ unsigned int kind;
+ unsigned int n;
+ unsigned int s = 0;
+
+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+ unsigned int sites = p->num_value_sites[kind];
+
+ if (!sites)
+ continue;
+
+ /* Record + site count array */
+ size += prf_get_value_record_size(sites);
+ kinds++;
+
+ if (!nodes)
+ continue;
+
+ for (n = 0; n < sites; n++) {
+ u32 count = 0;
+ struct llvm_prf_value_node *site = nodes[s + n];
+
+ while (site && ++count <= U8_MAX)
+ site = site->next;
+
+ size += count *
+ sizeof(struct llvm_prf_value_node_data);
+ }
+
+ s += sites;
+ }
+
+ if (size)
+ size += sizeof(struct llvm_prf_value_data);
+
+ if (value_kinds)
+ *value_kinds = kinds;
+
+ return size;
+}
+
+static u32 prf_get_value_size(void)
+{
+ u32 size = 0;
+ struct llvm_prf_data *p;
+
+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+ size += __prf_get_value_size(p, NULL);
+
+ return size;
+}
+
+/* Serialize the value profile data of one llvm_prf_data entry. */
+static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
+{
+ struct llvm_prf_value_data header;
+ struct llvm_prf_value_node **nodes =
+ (struct llvm_prf_value_node **)p->values;
+ unsigned int kind;
+ unsigned int n;
+ unsigned int s = 0;
+
+ header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
+
+ if (!header.num_value_kinds)
+ /* Nothing to write. */
+ return;
+
+ prf_copy_to_buffer(buffer, &header, sizeof(header));
+
+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+ struct llvm_prf_value_record *record;
+ u8 *counts;
+ unsigned int sites = p->num_value_sites[kind];
+
+ if (!sites)
+ continue;
+
+ /* Profiling value record. */
+ record = *(struct llvm_prf_value_record **)buffer;
+ *buffer += prf_get_value_record_header_size();
+
+ record->kind = kind;
+ record->num_value_sites = sites;
+
+ /* Site count array. */
+ counts = *(u8 **)buffer;
+ *buffer += prf_get_value_record_site_count_size(sites);
+
+ /*
+ * If we don't have nodes, we can skip updating the site count
+ * array, because the buffer is zero filled.
+ */
+ if (!nodes)
+ continue;
+
+ for (n = 0; n < sites; n++) {
+ u32 count = 0;
+ struct llvm_prf_value_node *site = nodes[s + n];
+
+ while (site && ++count <= U8_MAX) {
+ prf_copy_to_buffer(buffer, site,
+ sizeof(struct llvm_prf_value_node_data));
+ site = site->next;
+ }
+
+ counts[n] = (u8)count;
+ }
+
+ s += sites;
+ }
+}
+
+static void prf_serialize_values(void **buffer)
+{
+ struct llvm_prf_data *p;
+
+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+ prf_serialize_value(p, buffer);
+}
+
+static inline unsigned long prf_get_padding(unsigned long size)
+{
+ return 7 & (sizeof(u64) - size % sizeof(u64));
+}
+
+static unsigned long prf_buffer_size(void)
+{
+ return sizeof(struct llvm_prf_header) +
+ prf_data_size() +
+ prf_cnts_size() +
+ prf_names_size() +
+ prf_get_padding(prf_names_size()) +
+ prf_get_value_size();
+}
+
+/*
+ * Serialize the profiling data into a format LLVM's tools can understand.
+ * Note: caller *must* hold pgo_lock.
+ */
+static int prf_serialize(struct prf_private_data *p)
+{
+ int err = 0;
+ void *buffer;
+
+ p->size = prf_buffer_size();
+ p->buffer = vzalloc(p->size);
+
+ if (!p->buffer) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ buffer = p->buffer;
+
+ prf_fill_header(&buffer);
+ prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
+ prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
+ prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
+ buffer += prf_get_padding(prf_names_size());
+
+ prf_serialize_values(&buffer);
+
+out:
+ return err;
+}
+
+/* open() implementation for PGO. Creates a copy of the profiling data set. */
+static int prf_open(struct inode *inode, struct file *file)
+{
+ struct prf_private_data *data;
+ unsigned long flags;
+ int err;
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL);
+ if (!data) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ flags = prf_lock();
+
+ err = prf_serialize(data);
+ if (unlikely(err)) {
+ kfree(data);
+ goto out_unlock;
+ }
+
+ file->private_data = data;
+
+out_unlock:
+ prf_unlock(flags);
+out:
+ return err;
+}
+
+/* read() implementation for PGO. */
+static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct prf_private_data *data = file->private_data;
+
+ BUG_ON(!data);
+
+ return simple_read_from_buffer(buf, count, ppos, data->buffer,
+ data->size);
+}
+
+/* release() implementation for PGO. Release resources allocated by open(). */
+static int prf_release(struct inode *inode, struct file *file)
+{
+ struct prf_private_data *data = file->private_data;
+
+ if (data) {
+ vfree(data->buffer);
+ kfree(data);
+ }
+
+ return 0;
+}
+
+static const struct file_operations prf_fops = {
+ .owner = THIS_MODULE,
+ .open = prf_open,
+ .read = prf_read,
+ .llseek = default_llseek,
+ .release = prf_release
+};
+
+/* write() implementation for resetting PGO's profile data. */
+static ssize_t reset_write(struct file *file, const char __user *addr,
+ size_t len, loff_t *pos)
+{
+ struct llvm_prf_data *data;
+
+ memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
+
+ for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; data++) {
+ struct llvm_prf_value_node **vnodes;
+ u64 current_vsite_count;
+ u32 i;
+
+ if (!data->values)
+ continue;
+
+ current_vsite_count = 0;
+ vnodes = (struct llvm_prf_value_node **)data->values;
+
+ for (i = LLVM_INSTR_PROF_IPVK_FIRST; i <= LLVM_INSTR_PROF_IPVK_LAST; i++)
+ current_vsite_count += data->num_value_sites[i];
+
+ for (i = 0; i < current_vsite_count; i++) {
+ struct llvm_prf_value_node *current_vnode = vnodes[i];
+
+ while (current_vnode) {
+ current_vnode->count = 0;
+ current_vnode = current_vnode->next;
+ }
+ }
+ }
+
+ return len;
+}
+
+static const struct file_operations prf_reset_fops = {
+ .owner = THIS_MODULE,
+ .write = reset_write,
+ .llseek = noop_llseek,
+};
+
+/* Create debugfs entries. */
+static int __init pgo_init(void)
+{
+ directory = debugfs_create_dir("pgo", NULL);
+ if (!directory)
+ goto err_remove;
+
+ if (!debugfs_create_file("profraw", 0600, directory, NULL,
+ &prf_fops))
+ goto err_remove;
+
+ if (!debugfs_create_file("reset", 0200, directory, NULL,
+ &prf_reset_fops))
+ goto err_remove;
+
+ return 0;
+
+err_remove:
+ pr_err("initialization failed\n");
+ return -EIO;
+}
+
+/* Remove debugfs entries. */
+static void __exit pgo_exit(void)
+{
+ debugfs_remove_recursive(directory);
+}
+
+module_init(pgo_init);
+module_exit(pgo_exit);
diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
new file mode 100644
index 000000000000..62ff5cfce7b1
--- /dev/null
+++ b/kernel/pgo/instrument.c
@@ -0,0 +1,189 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ * Sami Tolvanen <[email protected]>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt) "pgo: " fmt
+
+#include <linux/bitops.h>
+#include <linux/kernel.h>
+#include <linux/export.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+#include "pgo.h"
+
+/*
+ * This lock guards both profile count updating and serialization of the
+ * profiling data. Keeping both of these activities separate via locking
+ * ensures that we don't try to serialize data that's only partially updated.
+ */
+static DEFINE_SPINLOCK(pgo_lock);
+static int current_node;
+
+unsigned long prf_lock(void)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&pgo_lock, flags);
+
+ return flags;
+}
+
+void prf_unlock(unsigned long flags)
+{
+ spin_unlock_irqrestore(&pgo_lock, flags);
+}
+
+/*
+ * Return a newly allocated profiling value node which contains the tracked
+ * value by the value profiler.
+ * Note: caller *must* hold pgo_lock.
+ */
+static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
+ u32 index, u64 value)
+{
+ if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
+ return NULL; /* Out of nodes */
+
+ current_node++;
+
+ /* Make sure the node is entirely within the section */
+ if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
+ &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
+ return NULL;
+
+ return &__llvm_prf_vnds_start[current_node];
+}
+
+/*
+ * Counts the number of times a target value is seen.
+ *
+ * Records the target value for the index if not seen before. Otherwise,
+ * increments the counter associated w/ the target value.
+ */
+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
+{
+ struct llvm_prf_data *p = (struct llvm_prf_data *)data;
+ struct llvm_prf_value_node **counters;
+ struct llvm_prf_value_node *curr;
+ struct llvm_prf_value_node *min = NULL;
+ struct llvm_prf_value_node *prev = NULL;
+ u64 min_count = U64_MAX;
+ u8 values = 0;
+ unsigned long flags;
+
+ if (!p || !p->values)
+ return;
+
+ counters = (struct llvm_prf_value_node **)p->values;
+ curr = counters[index];
+
+ while (curr) {
+ if (target_value == curr->value) {
+ curr->count++;
+ return;
+ }
+
+ if (curr->count < min_count) {
+ min_count = curr->count;
+ min = curr;
+ }
+
+ prev = curr;
+ curr = curr->next;
+ values++;
+ }
+
+ if (values >= LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE) {
+ if (!min->count || !(--min->count)) {
+ curr = min;
+ curr->value = target_value;
+ curr->count++;
+ }
+ return;
+ }
+
+ /* Lock when updating the value node structure. */
+ flags = prf_lock();
+
+ curr = allocate_node(p, index, target_value);
+ if (!curr)
+ goto out;
+
+ curr->value = target_value;
+ curr->count++;
+
+ if (!counters[index])
+ counters[index] = curr;
+ else if (prev && !prev->next)
+ prev->next = curr;
+
+out:
+ prf_unlock(flags);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_target);
+
+/* Counts the number of times a range of target values is seen. */
+void __llvm_profile_instrument_range(u64 target_value, void *data,
+ u32 index, s64 precise_start,
+ s64 precise_last, s64 large_value);
+void __llvm_profile_instrument_range(u64 target_value, void *data,
+ u32 index, s64 precise_start,
+ s64 precise_last, s64 large_value)
+{
+ if (large_value != S64_MIN && (s64)target_value >= large_value)
+ target_value = large_value;
+ else if ((s64)target_value < precise_start ||
+ (s64)target_value > precise_last)
+ target_value = precise_last + 1;
+
+ __llvm_profile_instrument_target(target_value, data, index);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_range);
+
+static u64 inst_prof_get_range_rep_value(u64 value)
+{
+ if (value <= 8)
+ /* The first ranges are individually tracked, use it as is. */
+ return value;
+ else if (value >= 513)
+ /* The last range is mapped to its lowest value. */
+ return 513;
+ else if (hweight64(value) == 1)
+ /* If it's a power of two, use it as is. */
+ return value;
+
+ /* Otherwise, round down to the previous power of two and add 1. */
+ return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
+}
+
+/*
+ * The target values are partitioned into multiple ranges. The range spec is
+ * defined in compiler-rt/include/profile/InstrProfData.inc.
+ */
+void __llvm_profile_instrument_memop(u64 target_value, void *data,
+ u32 counter_index);
+void __llvm_profile_instrument_memop(u64 target_value, void *data,
+ u32 counter_index)
+{
+ u64 rep_value;
+
+ /* Map the target value to the representative value of its range. */
+ rep_value = inst_prof_get_range_rep_value(target_value);
+ __llvm_profile_instrument_target(rep_value, data, counter_index);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_memop);
diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
new file mode 100644
index 000000000000..0d33e07a0bf3
--- /dev/null
+++ b/kernel/pgo/pgo.h
@@ -0,0 +1,203 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ * Sami Tolvanen <[email protected]>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _PGO_H
+#define _PGO_H
+
+/*
+ * Note: These internal LLVM definitions must match the compiler version.
+ * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
+ */
+
+#define LLVM_INSTR_PROF_RAW_MAGIC_64 \
+ ((u64)255 << 56 | \
+ (u64)'l' << 48 | \
+ (u64)'p' << 40 | \
+ (u64)'r' << 32 | \
+ (u64)'o' << 24 | \
+ (u64)'f' << 16 | \
+ (u64)'r' << 8 | \
+ (u64)129)
+#define LLVM_INSTR_PROF_RAW_MAGIC_32 \
+ ((u64)255 << 56 | \
+ (u64)'l' << 48 | \
+ (u64)'p' << 40 | \
+ (u64)'r' << 32 | \
+ (u64)'o' << 24 | \
+ (u64)'f' << 16 | \
+ (u64)'R' << 8 | \
+ (u64)129)
+
+#define LLVM_INSTR_PROF_RAW_VERSION 5
+#define LLVM_INSTR_PROF_DATA_ALIGNMENT 8
+#define LLVM_INSTR_PROF_IPVK_FIRST 0
+#define LLVM_INSTR_PROF_IPVK_LAST 1
+#define LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE 255
+
+#define LLVM_VARIANT_MASK_IR_PROF (0x1ULL << 56)
+#define LLVM_VARIANT_MASK_CSIR_PROF (0x1ULL << 57)
+
+/**
+ * struct llvm_prf_header - represents the raw profile header data structure.
+ * @magic: the magic token for the file format.
+ * @version: the version of the file format.
+ * @data_size: the number of entries in the profile data section.
+ * @padding_bytes_before_counters: the number of padding bytes before the
+ * counters.
+ * @counters_size: the size in bytes of the LLVM profile section containing the
+ * counters.
+ * @padding_bytes_after_counters: the number of padding bytes after the
+ * counters.
+ * @names_size: the size in bytes of the LLVM profile section containing the
+ * counters' names.
+ * @counters_delta: the beginning of the LLVM profile counters section.
+ * @names_delta: the beginning of the LLVM profile names section.
+ * @value_kind_last: the last profile value kind.
+ */
+struct llvm_prf_header {
+ u64 magic;
+ u64 version;
+ u64 data_size;
+ u64 padding_bytes_before_counters;
+ u64 counters_size;
+ u64 padding_bytes_after_counters;
+ u64 names_size;
+ u64 counters_delta;
+ u64 names_delta;
+ u64 value_kind_last;
+};
+
+/**
+ * struct llvm_prf_data - represents the per-function control structure.
+ * @name_ref: the reference to the function's name.
+ * @func_hash: the hash value of the function.
+ * @counter_ptr: a pointer to the profile counter.
+ * @function_ptr: a pointer to the function.
+ * @values: the profiling values associated with this function.
+ * @num_counters: the number of counters in the function.
+ * @num_value_sites: the number of value profile sites.
+ */
+struct llvm_prf_data {
+ const u64 name_ref;
+ const u64 func_hash;
+ const void *counter_ptr;
+ const void *function_ptr;
+ void *values;
+ const u32 num_counters;
+ const u16 num_value_sites[LLVM_INSTR_PROF_IPVK_LAST + 1];
+} __aligned(LLVM_INSTR_PROF_DATA_ALIGNMENT);
+
+/**
+ * struct llvm_prf_value_node_data - represents the data part of the struct
+ * llvm_prf_value_node data structure.
+ * @value: the value counters.
+ * @count: the counters' count.
+ */
+struct llvm_prf_value_node_data {
+ u64 value;
+ u64 count;
+};
+
+/**
+ * struct llvm_prf_value_node - represents an internal data structure used by
+ * the value profiler.
+ * @value: the value counters.
+ * @count: the counters' count.
+ * @next: the next value node.
+ */
+struct llvm_prf_value_node {
+ u64 value;
+ u64 count;
+ struct llvm_prf_value_node *next;
+};
+
+/**
+ * struct llvm_prf_value_data - represents the value profiling data in indexed
+ * format.
+ * @total_size: the total size in bytes including this field.
+ * @num_value_kinds: the number of value profile kinds that have value profile
+ * data.
+ */
+struct llvm_prf_value_data {
+ u32 total_size;
+ u32 num_value_kinds;
+};
+
+/**
+ * struct llvm_prf_value_record - represents the on-disk layout of the value
+ * profile data of a particular kind for one function.
+ * @kind: the kind of the value profile record.
+ * @num_value_sites: the number of value profile sites.
+ * @site_count_array: the first element of the array that stores the number
+ * of profiled values for each value site.
+ */
+struct llvm_prf_value_record {
+ u32 kind;
+ u32 num_value_sites;
+ u8 site_count_array[];
+};
+
+#define prf_get_value_record_header_size() \
+ offsetof(struct llvm_prf_value_record, site_count_array)
+#define prf_get_value_record_site_count_size(sites) \
+ roundup((sites), 8)
+#define prf_get_value_record_size(sites) \
+ (prf_get_value_record_header_size() + \
+ prf_get_value_record_site_count_size((sites)))
+
+/* Data sections */
+extern struct llvm_prf_data __llvm_prf_data_start[];
+extern struct llvm_prf_data __llvm_prf_data_end[];
+
+extern u64 __llvm_prf_cnts_start[];
+extern u64 __llvm_prf_cnts_end[];
+
+extern char __llvm_prf_names_start[];
+extern char __llvm_prf_names_end[];
+
+extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
+extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
+
+/* Locking for vnodes */
+extern unsigned long prf_lock(void);
+extern void prf_unlock(unsigned long flags);
+
+#define __DEFINE_PRF_SIZE(s) \
+ static inline unsigned long prf_ ## s ## _size(void) \
+ { \
+ unsigned long start = \
+ (unsigned long)__llvm_prf_ ## s ## _start; \
+ unsigned long end = \
+ (unsigned long)__llvm_prf_ ## s ## _end; \
+ return roundup(end - start, \
+ sizeof(__llvm_prf_ ## s ## _start[0])); \
+ } \
+ static inline unsigned long prf_ ## s ## _count(void) \
+ { \
+ return prf_ ## s ## _size() / \
+ sizeof(__llvm_prf_ ## s ## _start[0]); \
+ }
+
+__DEFINE_PRF_SIZE(data);
+__DEFINE_PRF_SIZE(cnts);
+__DEFINE_PRF_SIZE(names);
+__DEFINE_PRF_SIZE(vnds);
+
+#undef __DEFINE_PRF_SIZE
+
+#endif /* _PGO_H */
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 213677a5ed33..9b218afb5cb8 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
$(CFLAGS_GCOV))
endif

+#
+# Enable clang's PGO profiling flags for a file or directory depending on
+# variables PGO_PROFILE_obj.o and PGO_PROFILE.
+#
+ifeq ($(CONFIG_PGO_CLANG),y)
+_c_flags += $(if $(patsubst n%,, \
+ $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
+ $(CFLAGS_PGO_CLANG))
+endif
+
#
# Enable address sanitizer flags for kernel except some files or directories
# we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
--
2.30.0.296.g2bfb1c46d8-goog
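
The bucketing done by inst_prof_get_range_rep_value() in kernel/pgo/instrument.c
above can be checked outside the kernel. What follows is a minimal userspace
sketch, not part of the patch: range_rep_value() is a hypothetical name,
__builtin_popcountll() stands in for the kernel's hweight64(), and a GCC- or
clang-compatible compiler is assumed for both builtins.

```c
#include <stdint.h>

/*
 * Userspace model of inst_prof_get_range_rep_value() from the patch.
 * __builtin_popcountll() replaces hweight64(); this is an assumption made
 * so the sketch builds without kernel headers.
 */
static uint64_t range_rep_value(uint64_t value)
{
	if (value <= 8)
		return value;	/* small sizes are tracked individually */
	if (value >= 513)
		return 513;	/* all large sizes share one bucket */
	if (__builtin_popcountll(value) == 1)
		return value;	/* exact powers of two keep their value */

	/* Otherwise: round down to the previous power of two and add 1. */
	return (UINT64_C(1) << (63 - __builtin_clzll(value))) + 1;
}
```

With this mapping a memop size of 100 lands in the bucket represented by 65
(64 + 1), while 512 is kept as is and anything from 513 upwards collapses
to 513.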

2021-01-21 08:38:53

by Bill Wendling

[permalink] [raw]
Subject: Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

On Wed, Jan 20, 2021 at 4:51 PM Nick Desaulniers
<[email protected]> wrote:
>
> Thanks Bill, mostly questions below. Patch looks good to me modulo
> disabling profiling for one crypto TU, mixing style of pre/post
> increment, and some comments around locking. With those addressed,
> I'm hoping akpm@ would consider picking this up.
>
> On Sat, Jan 16, 2021 at 1:44 AM Bill Wendling <[email protected]> wrote:
> >
> > From: Sami Tolvanen <[email protected]>
> >
> > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > profile, the kernel is instrumented with PGO counters, a representative
> > workload is run, and the raw profile data is collected from
> > /sys/kernel/debug/pgo/profraw.
> >
> > The raw profile data must be processed by clang's "llvm-profdata" tool
> > before it can be used during recompilation:
> >
> > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >
> > Multiple raw profiles may be merged during this step.
> >
> > The data can now be used by the compiler:
> >
> > $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> >
> > This initial submission is restricted to x86, as that's the platform we
> > know works. This restriction can be lifted once other platforms have
> > been verified to work with PGO.
> >
> > Note that this method of profiling the kernel is clang-native, unlike
> > the clang support in kernel/gcov.
> >
> > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> >
> > Signed-off-by: Sami Tolvanen <[email protected]>
> > Co-developed-by: Bill Wendling <[email protected]>
> > Signed-off-by: Bill Wendling <[email protected]>
> > ---
> > v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> > testing.
> > - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> > Song's comments.
> > v3: - Added change log section based on Sedat Dilek's comments.
> > v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
> > own popcount implementation, based on Nick Desaulniers's comment.
> > v5: - Correct padding calculation, discovered by Nathan Chancellor.
> > ---
> > Documentation/dev-tools/index.rst | 1 +
> > Documentation/dev-tools/pgo.rst | 127 +++++++++
> > MAINTAINERS | 9 +
> > Makefile | 3 +
> > arch/Kconfig | 1 +
> > arch/x86/Kconfig | 1 +
> > arch/x86/boot/Makefile | 1 +
> > arch/x86/boot/compressed/Makefile | 1 +
> > arch/x86/crypto/Makefile | 2 +
> > arch/x86/entry/vdso/Makefile | 1 +
> > arch/x86/kernel/vmlinux.lds.S | 2 +
> > arch/x86/platform/efi/Makefile | 1 +
> > arch/x86/purgatory/Makefile | 1 +
> > arch/x86/realmode/rm/Makefile | 1 +
> > arch/x86/um/vdso/Makefile | 1 +
> > drivers/firmware/efi/libstub/Makefile | 1 +
> > include/asm-generic/vmlinux.lds.h | 44 +++
> > kernel/Makefile | 1 +
> > kernel/pgo/Kconfig | 35 +++
> > kernel/pgo/Makefile | 5 +
> > kernel/pgo/fs.c | 382 ++++++++++++++++++++++++++
> > kernel/pgo/instrument.c | 185 +++++++++++++
> > kernel/pgo/pgo.h | 206 ++++++++++++++
> > scripts/Makefile.lib | 10 +
> > 24 files changed, 1022 insertions(+)
> > create mode 100644 Documentation/dev-tools/pgo.rst
> > create mode 100644 kernel/pgo/Kconfig
> > create mode 100644 kernel/pgo/Makefile
> > create mode 100644 kernel/pgo/fs.c
> > create mode 100644 kernel/pgo/instrument.c
> > create mode 100644 kernel/pgo/pgo.h
> >
> > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > index f7809c7b1ba9e..8d6418e858062 100644
> > --- a/Documentation/dev-tools/index.rst
> > +++ b/Documentation/dev-tools/index.rst
> > @@ -26,6 +26,7 @@ whole; patches welcome!
> > kgdb
> > kselftest
> > kunit/index
> > + pgo
> >
> >
> > .. only:: subproject and html
> > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> > new file mode 100644
> > index 0000000000000..b7f11d8405b73
> > --- /dev/null
> > +++ b/Documentation/dev-tools/pgo.rst
> > @@ -0,0 +1,127 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +===============================
> > +Using PGO with the Linux kernel
> > +===============================
> > +
> > +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> > +when building with Clang. The profiling data is exported via the ``pgo``
> > +debugfs directory.
> > +
> > +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > +
> > +
> > +Preparation
> > +===========
> > +
> > +Configure the kernel with:
> > +
> > +.. code-block:: make
> > +
> > + CONFIG_DEBUG_FS=y
> > + CONFIG_PGO_CLANG=y
> > +
> > +Note that kernels compiled with profiling flags will be significantly larger
> > +and run slower.
> > +
> > +Profiling data will only become accessible once debugfs has been mounted:
> > +
> > +.. code-block:: sh
> > +
> > + mount -t debugfs none /sys/kernel/debug
> > +
> > +
> > +Customization
> > +=============
> > +
> > +You can enable or disable profiling for individual files and directories by
> > +adding a line similar to the following to the respective kernel Makefile:
> > +
> > +- For a single file (e.g. main.o)
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE_main.o := y
> > +
> > +- For all files in one directory
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE := y
> > +
> > +To exclude files from being profiled use
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE_main.o := n
> > +
> > +and
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE := n
> > +
> > +Only files which are linked to the main kernel image or are compiled as kernel
> > +modules are supported by this mechanism.
> > +
> > +
> > +Files
> > +=====
> > +
> > +The PGO kernel support creates the following files in debugfs:
> > +
> > +``/sys/kernel/debug/pgo``
> > + Parent directory for all PGO-related files.
> > +
> > +``/sys/kernel/debug/pgo/reset``
> > + Global reset file: resets all profile data to zero when written to.
> > +
> > +``/sys/kernel/debug/pgo/profraw``
> > + The raw PGO data that must be processed with ``llvm-profdata``.
> > +
> > +
> > +Workflow
> > +========
> > +
> > +The PGO kernel can be run on the host or test machines. The data, though,
> > +should be analyzed with tools from the same Clang version that compiled the
> > +kernel. Clang is tolerant of version skew, but it's easier to use the same
> > +Clang version.
> > +
> > +The profiling data is useful for optimizing the kernel, analyzing coverage,
> > +etc. Clang offers tools to perform these tasks.
> > +
> > +Here is an example workflow for profiling an instrumented kernel with PGO and
> > +using the result to optimize the kernel:
> > +
> > +1) Install the kernel on the TEST machine.
> > +
> > +2) Reset the data counters right before running the load tests
> > +
> > + .. code-block:: sh
> > +
> > + $ echo 1 > /sys/kernel/debug/pgo/reset
> > +
> > +3) Run the load tests.
> > +
> > +4) Collect the raw profile data
> > +
> > + .. code-block:: sh
> > +
> > + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > +
> > +5) (Optional) Download the raw profile data to the HOST machine.
> > +
> > +6) Process the raw profile data
> > +
> > + .. code-block:: sh
> > +
> > + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > +
> > + Note that multiple raw profile data files can be merged during this step.
> > +
> > +7) Rebuild the kernel using the profile data (PGO disabled)
> > +
> > + .. code-block:: sh
> > +
> > + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 79b400c97059f..cb1f1f2b2baf4 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -13948,6 +13948,15 @@ S: Maintained
> > F: include/linux/personality.h
> > F: include/uapi/linux/personality.h
> >
> > +PGO BASED KERNEL PROFILING
> > +M: Sami Tolvanen <[email protected]>
> > +M: Bill Wendling <[email protected]>
> > +R: Nathan Chancellor <[email protected]>
> > +R: Nick Desaulniers <[email protected]>
> > +S: Supported
> > +F: Documentation/dev-tools/pgo.rst
> > +F: kernel/pgo
> > +
> > PHOENIX RC FLIGHT CONTROLLER ADAPTER
> > M: Marcus Folkesson <[email protected]>
> > L: [email protected]
> > diff --git a/Makefile b/Makefile
> > index 9e73f82e0d863..9128bfe1ccc97 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
> > # Defaults to vmlinux, but the arch makefile usually adds further targets
> > all: vmlinux
> >
> > +CFLAGS_PGO_CLANG := -fprofile-generate
> > +export CFLAGS_PGO_CLANG
> > +
> > CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
> > $(call cc-option,-fno-tree-loop-im) \
> > $(call cc-disable-warning,maybe-uninitialized,)
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index 24862d15f3a36..f39d3991f6bfe 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
> > pairs of 32-bit arguments, select this option.
> >
> > source "kernel/gcov/Kconfig"
> > +source "kernel/pgo/Kconfig"
> >
> > source "scripts/gcc-plugins/Kconfig"
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 21f851179ff08..36305ea61dc09 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -96,6 +96,7 @@ config X86
> > select ARCH_SUPPORTS_DEBUG_PAGEALLOC
> > select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
> > select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
> > + select ARCH_SUPPORTS_PGO_CLANG if X86_64
> > select ARCH_USE_BUILTIN_BSWAP
> > select ARCH_USE_QUEUED_RWLOCKS
> > select ARCH_USE_QUEUED_SPINLOCKS
> > diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> > index fe605205b4ce2..383853e32f673 100644
> > --- a/arch/x86/boot/Makefile
> > +++ b/arch/x86/boot/Makefile
> > @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> > KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE := n
> >
> > $(obj)/bzImage: asflags-y := $(SVGA_MODE)
> > diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> > index e0bc3988c3faa..ed12ab65f6065 100644
> > --- a/arch/x86/boot/compressed/Makefile
> > +++ b/arch/x86/boot/compressed/Makefile
> > @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
> >
> > KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE :=n
> >
> > KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
> > diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> > index a31de0c6ccde2..775fa0b368e98 100644
> > --- a/arch/x86/crypto/Makefile
> > +++ b/arch/x86/crypto/Makefile
> > @@ -4,6 +4,8 @@
> >
> > OBJECT_FILES_NON_STANDARD := y
> >
> > +PGO_PROFILE_curve25519-x86_64.o := n
> > +
>
> ^ Do you have more info about this?
>
This gave an error during compilation complaining about lacking
registers in some instances. This file is mostly inline asm or code
that doesn't super benefit from profiling, so I disabled it.

Note that the register issue happens only with PGO. Normal compilation is fine.

> > obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o
> >
> > obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
> > diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> > index 02e3e42f380bd..26e2b3af0145c 100644
> > --- a/arch/x86/entry/vdso/Makefile
> > +++ b/arch/x86/entry/vdso/Makefile
> > @@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO $@
> > VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
> > $(call ld-option, --eh-frame-hdr) -Bsymbolic
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > quiet_cmd_vdso_and_check = VDSO $@
> > cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
> > diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> > index efd9e9ea17f25..f6cab2316c46a 100644
> > --- a/arch/x86/kernel/vmlinux.lds.S
> > +++ b/arch/x86/kernel/vmlinux.lds.S
> > @@ -184,6 +184,8 @@ SECTIONS
> >
> > BUG_TABLE
> >
> > + PGO_CLANG_DATA
> > +
> > ORC_UNWIND_TABLE
> >
> > . = ALIGN(PAGE_SIZE);
> > diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
> > index 84b09c230cbd5..5f22b31446ad4 100644
> > --- a/arch/x86/platform/efi/Makefile
> > +++ b/arch/x86/platform/efi/Makefile
> > @@ -2,6 +2,7 @@
> > OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
> > KASAN_SANITIZE := n
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
> > obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
> > diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> > index 95ea17a9d20cb..36f20e99da0bc 100644
> > --- a/arch/x86/purgatory/Makefile
> > +++ b/arch/x86/purgatory/Makefile
> > @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
> >
> > # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > KASAN_SANITIZE := n
> > UBSAN_SANITIZE := n
> > KCSAN_SANITIZE := n
> > diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> > index 83f1b6a56449f..21797192f958f 100644
> > --- a/arch/x86/realmode/rm/Makefile
> > +++ b/arch/x86/realmode/rm/Makefile
> > @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> > KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE := n
> > diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
> > index 5943387e3f357..54f5768f58530 100644
> > --- a/arch/x86/um/vdso/Makefile
> > +++ b/arch/x86/um/vdso/Makefile
> > @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@
> >
> > VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > #
> > # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
> > diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> > index 8a94388e38b33..2d81623b33f29 100644
> > --- a/drivers/firmware/efi/libstub/Makefile
> > +++ b/drivers/firmware/efi/libstub/Makefile
> > @@ -40,6 +40,7 @@ KBUILD_CFLAGS := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
> > KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
> >
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > # Sanitizer runtimes are unavailable and cannot be linked here.
> > KASAN_SANITIZE := n
> > KCSAN_SANITIZE := n
> > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > index b2b3d81b1535a..3a591bb18c5fb 100644
> > --- a/include/asm-generic/vmlinux.lds.h
> > +++ b/include/asm-generic/vmlinux.lds.h
> > @@ -316,6 +316,49 @@
> > #define THERMAL_TABLE(name)
> > #endif
> >
> > +#ifdef CONFIG_PGO_CLANG
> > +#define PGO_CLANG_DATA \
> > + __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
> > + . = ALIGN(8); \
> > + __llvm_prf_start = .; \
> > + __llvm_prf_data_start = .; \
> > + KEEP(*(__llvm_prf_data)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_data_end = .; \
> > + } \
> > + __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
> > + . = ALIGN(8); \
> > + __llvm_prf_cnts_start = .; \
> > + KEEP(*(__llvm_prf_cnts)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_cnts_end = .; \
> > + } \
> > + __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
> > + . = ALIGN(8); \
> > + __llvm_prf_names_start = .; \
> > + KEEP(*(__llvm_prf_names)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_names_end = .; \
> > + . = ALIGN(8); \
> > + } \
> > + __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
> > + __llvm_prf_vals_start = .; \
> > + KEEP(*(__llvm_prf_vals)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_vals_end = .; \
> > + . = ALIGN(8); \
> > + } \
> > + __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
> > + __llvm_prf_vnds_start = .; \
> > + KEEP(*(__llvm_prf_vnds)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_vnds_end = .; \
> > + __llvm_prf_end = .; \
> > + }
> > +#else
> > +#define PGO_CLANG_DATA
> > +#endif
> > +
> > #define KERNEL_DTB() \
> > STRUCT_ALIGN(); \
> > __dtb_start = .; \
> > @@ -1125,6 +1168,7 @@
> > CONSTRUCTORS \
> > } \
> > BUG_TABLE \
> > + PGO_CLANG_DATA
>
> ^ since dropping other arch support from v4, could probably drop this,
> too. We should be covered by the modification to
> arch/x86/kernel/vmlinux.lds.S, right?
>
Possibly, but I'd like to keep it here anyway. It's the correct place
for this info, and will benefit us when we do enable other platforms.
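
As context for the section symbols discussed here: pgo.h turns each
__llvm_prf_<section>_start/_end pair into a byte size and an entry count via
__DEFINE_PRF_SIZE. Below is a minimal userspace sketch of that arithmetic,
assuming plain pointers stand in for the linker-provided symbols. round_up()
and section_count() are hypothetical names used for illustration only and do
not appear in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next multiple of align (align must be non-zero). */
static size_t round_up(size_t n, size_t align)
{
	return (n + align - 1) / align * align;
}

/*
 * Model of prf_<section>_count(): the number of elem_size-sized entries
 * between two section bounds. In the kernel the bounds would be the
 * linker-script symbols; here they are ordinary pointers (an assumption).
 */
static size_t section_count(const void *start, const void *end,
			    size_t elem_size)
{
	size_t bytes = (size_t)((uintptr_t)end - (uintptr_t)start);

	return round_up(bytes, elem_size) / elem_size;
}
```

Because the byte size is rounded up to a whole element first, a section whose
end symbol is not element-aligned still yields a count that covers the
trailing partial entry.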

> >
> > #define INIT_TEXT_SECTION(inittext_align) \
> > . = ALIGN(inittext_align); \
> > diff --git a/kernel/Makefile b/kernel/Makefile
> > index aa7368c7eabf3..0b34ca228ba46 100644
> > --- a/kernel/Makefile
> > +++ b/kernel/Makefile
> > @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
> > obj-$(CONFIG_KCSAN) += kcsan/
> > obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
> > obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
> > +obj-$(CONFIG_PGO_CLANG) += pgo/
> >
> > obj-$(CONFIG_PERF_EVENTS) += events/
> >
> > diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
> > new file mode 100644
> > index 0000000000000..76a640b6cf6ed
> > --- /dev/null
> > +++ b/kernel/pgo/Kconfig
> > @@ -0,0 +1,35 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
> > +
> > +config ARCH_SUPPORTS_PGO_CLANG
> > + bool
> > +
> > +config PGO_CLANG
> > + bool "Enable clang's PGO-based kernel profiling"
> > + depends on DEBUG_FS
> > + depends on ARCH_SUPPORTS_PGO_CLANG
> > + depends on CC_IS_CLANG && CLANG_VERSION >= 120000
> > + help
> > + This option enables clang's PGO (Profile Guided Optimization) based
> > + code profiling to better optimize the kernel.
> > +
> > + If unsure, say N.
> > +
> > + Run a representative workload for your application on a kernel
> > + compiled with this option and download the raw profile file from
> > + /sys/kernel/debug/pgo/profraw. This file needs to be processed with
> > + llvm-profdata. It may be merged with other collected raw profiles.
> > +
> > + Copy the resulting profile file into vmlinux.profdata, and enable
> > + KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
> > + kernel.
> > +
> > + Note that a kernel compiled with profiling flags will be
> > + significantly larger and run slower. Also be sure to exclude files
> > + from profiling which are not linked to the kernel image to prevent
> > + linker errors.
> > +
> > + Note that the debugfs filesystem has to be mounted to access
> > + profiling data.
> > +
> > +endmenu
> > diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
> > new file mode 100644
> > index 0000000000000..41e27cefd9a47
> > --- /dev/null
> > +++ b/kernel/pgo/Makefile
> > @@ -0,0 +1,5 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > +
> > +obj-y += fs.o instrument.o
> > diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
> > new file mode 100644
> > index 0000000000000..68b24672be10a
> > --- /dev/null
> > +++ b/kernel/pgo/fs.c
> > @@ -0,0 +1,382 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + * Sami Tolvanen <[email protected]>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#define pr_fmt(fmt) "pgo: " fmt
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/debugfs.h>
> > +#include <linux/fs.h>
> > +#include <linux/module.h>
> > +#include <linux/slab.h>
> > +#include <linux/vmalloc.h>
> > +#include "pgo.h"
> > +
> > +static struct dentry *directory;
> > +
> > +struct prf_private_data {
> > + void *buffer;
> > + unsigned long size;
> > +};
> > +
> > +/*
> > + * Raw profile data format:
> > + *
> > + * - llvm_prf_header
> > + * - __llvm_prf_data
> > + * - __llvm_prf_cnts
> > + * - __llvm_prf_names
> > + * - zero padding to 8 bytes
> > + * - for each llvm_prf_data in __llvm_prf_data:
> > + * - llvm_prf_value_data
> > + * - llvm_prf_value_record + site count array
> > + * - llvm_prf_value_node_data
> > + * ...
> > + * ...
> > + * ...
> > + */
> > +
> > +static void prf_fill_header(void **buffer)
> > +{
> > + struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
> > +
> > + header->magic = LLVM_PRF_MAGIC;
> > + header->version = LLVM_PRF_VARIANT_MASK_IR | LLVM_PRF_VERSION;
> > + header->data_size = prf_data_count();
> > + header->padding_bytes_before_counters = 0;
> > + header->counters_size = prf_cnts_count();
> > + header->padding_bytes_after_counters = 0;
> > + header->names_size = prf_names_count();
> > + header->counters_delta = (u64)__llvm_prf_cnts_start;
> > + header->names_delta = (u64)__llvm_prf_names_start;
> > + header->value_kind_last = LLVM_PRF_IPVK_LAST;
> > +
> > + *buffer += sizeof(*header);
> > +}
> > +
> > +/*
> > + * Copy the source into the buffer, incrementing the pointer into buffer in the
> > + * process.
> > + */
> > +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
> > +{
> > + memcpy(*buffer, src, size);
> > + *buffer += size;
> > +}
> > +
> > +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
> > +{
> > + struct llvm_prf_value_node **nodes =
> > + (struct llvm_prf_value_node **)p->values;
> > + u32 kinds = 0;
> > + u32 size = 0;
> > + unsigned int kind;
> > + unsigned int n;
> > + unsigned int s = 0;
> > +
> > + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> > + unsigned int sites = p->num_value_sites[kind];
> > +
> > + if (!sites)
> > + continue;
> > +
> > + /* Record + site count array */
> > + size += prf_get_value_record_size(sites);
> > + kinds++;
> > +
> > + if (!nodes)
> > + continue;
> > +
> > + for (n = 0; n < sites; n++) {
> > + u32 count = 0;
> > + struct llvm_prf_value_node *site = nodes[s + n];
> > +
> > + while (site && ++count <= U8_MAX)
> > + site = site->next;
> > +
> > + size += count *
> > + sizeof(struct llvm_prf_value_node_data);
> > + }
> > +
> > + s += sites;
> > + }
> > +
> > + if (size)
> > + size += sizeof(struct llvm_prf_value_data);
> > +
> > + if (value_kinds)
> > + *value_kinds = kinds;
> > +
> > + return size;
> > +}
> > +
> > +static u32 prf_get_value_size(void)
> > +{
> > + u32 size = 0;
> > + struct llvm_prf_data *p;
> > +
> > + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> > + size += __prf_get_value_size(p, NULL);
> > +
> > + return size;
> > +}
> > +
> > +/* Serialize the profiling's value. */
> > +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
> > +{
> > + struct llvm_prf_value_data header;
> > + struct llvm_prf_value_node **nodes =
> > + (struct llvm_prf_value_node **)p->values;
> > + unsigned int kind;
> > + unsigned int n;
> > + unsigned int s = 0;
> > +
> > + header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
> > +
> > + if (!header.num_value_kinds)
> > + /* Nothing to write. */
> > + return;
> > +
> > + prf_copy_to_buffer(buffer, &header, sizeof(header));
> > +
> > + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> > + struct llvm_prf_value_record *record;
> > + u8 *counts;
> > + unsigned int sites = p->num_value_sites[kind];
> > +
> > + if (!sites)
> > + continue;
> > +
> > + /* Profiling value record. */
> > + record = *(struct llvm_prf_value_record **)buffer;
> > + *buffer += prf_get_value_record_header_size();
> > +
> > + record->kind = kind;
> > + record->num_value_sites = sites;
> > +
> > + /* Site count array. */
> > + counts = *(u8 **)buffer;
> > + *buffer += prf_get_value_record_site_count_size(sites);
> > +
> > + /*
> > + * If we don't have nodes, we can skip updating the site count
> > + * array, because the buffer is zero filled.
> > + */
> > + if (!nodes)
> > + continue;
> > +
> > + for (n = 0; n < sites; n++) {
> > + u32 count = 0;
> > + struct llvm_prf_value_node *site = nodes[s + n];
> > +
> > + while (site && ++count <= U8_MAX) {
> > + prf_copy_to_buffer(buffer, site,
> > + sizeof(struct llvm_prf_value_node_data));
> > + site = site->next;
> > + }
> > +
> > + counts[n] = (u8)count;
> > + }
> > +
> > + s += sites;
> > + }
> > +}
> > +
> > +static void prf_serialize_values(void **buffer)
> > +{
> > + struct llvm_prf_data *p;
> > +
> > + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> > + prf_serialize_value(p, buffer);
> > +}
> > +
> > +static inline unsigned long prf_get_padding(unsigned long size)
> > +{
> > + return 7 & (8 - size % 8);
> > +}
>
> This is ugly but it looks like this corresponds with
> __llvm_profile_get_num_padding_bytes() in
> llvm-project/compiler-rt/lib/profile/InstrProfiling.c? If there are
> platforms where `sizeof(unsigned long) != 8` and are supported by the
> kernel, it might be nicer to spell out `sizeof(unsigned long)` rather
> than hardcode 8. Should we also use u64 for the parameter and u8 for
> the return type?
>
It's probably best to use what llvm uses in that function
(sizeof(uint64_t)). I can replace it.
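
A minimal sketch of what that replacement could look like, assuming the helper keeps its name and takes a 64-bit size as suggested in the review. It is written as plain userspace C (uint64_t/uint8_t standing in for the kernel's u64/u8) so it compiles on its own; the assert is only a sanity check for the illustration and is not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of zero bytes needed to round `size` up to the next multiple of
 * sizeof(uint64_t). Returns 0 when `size` is already aligned.
 */
static inline uint8_t prf_get_padding(uint64_t size)
{
	const uint64_t align = sizeof(uint64_t);
	uint64_t pad = (align - size % align) % align;

	/* Illustration-only check: padding is always smaller than the alignment. */
	assert(pad < align);

	return (uint8_t)pad;
}
```

For every input this yields the same result as the original `7 & (8 - size % 8)`; the only difference is that the alignment is spelled as sizeof(uint64_t) rather than a literal 8.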

> > +
> > +static unsigned long prf_buffer_size(void)
> > +{
> > + return sizeof(struct llvm_prf_header) +
> > + prf_data_size() +
> > + prf_cnts_size() +
> > + prf_names_size() +
> > + prf_get_padding(prf_names_size()) +
> > + prf_get_value_size();
> > +}
> > +
> > +/* Serialize the profiling data into a format LLVM's tools can understand. */
> > +static int prf_serialize(struct prf_private_data *p)
> > +{
> > + int err = 0;
> > + void *buffer;
> > +
> > + p->size = prf_buffer_size();
> > + p->buffer = vzalloc(p->size);
> > +
> > + if (!p->buffer) {
> > + err = -ENOMEM;
> > + goto out;
> > + }
> > +
> > + buffer = p->buffer;
> > +
> > + prf_fill_header(&buffer);
> > + prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
> > + prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
> > + prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
> > + buffer += prf_get_padding(prf_names_size());
> > +
> > + prf_serialize_values(&buffer);
> > +
> > +out:
> > + return err;
> > +}
> > +
> > +/* open() implementation for PGO. Creates a copy of the profiling data set. */
> > +static int prf_open(struct inode *inode, struct file *file)
> > +{
> > + struct prf_private_data *data;
> > + unsigned long flags;
> > + int err;
> > +
> > + data = kzalloc(sizeof(*data), GFP_KERNEL);
> > + if (!data) {
> > + err = -ENOMEM;
> > + goto out;
> > + }
> > +
> > + flags = prf_lock();
> > +
> > + err = prf_serialize(data);
> > + if (err) {
> > + kfree(data);
> > + goto out_unlock;
> > + }
> > +
> > + file->private_data = data;
> > +
> > +out_unlock:
> > + prf_unlock(flags);
>
> This is an improvement over earlier editions, but kfree() is still
> within the critical section. I wonder if it can be moved out? If not,
> why, precisely? Otherwise are we sure we have the concurrency correct?
> Might be worth pursuing in a follow up patch once the core has landed.
>
The kfree() isn't on the critical path, but done only when an error
occurs. I could add an "unlikely()" in the if-conditional hoping that
it's moved out-of-line, but the code it would be skipping would be a
couple of asm instructions. While I appreciate that performance in the
kernel is super important, we've already warned that performance with
an instrumented kernel won't be as good. :-)

> Also, it looks like the comment above the definition of pgo_lock and
> allocate_node() seem to indicate the same lock is used for
> serialization. I'm curious to know more about why we can't access
> current_node and serialize at the same time? At the least, it seems
> that `prf_serialize` should have a similar comment to `allocate_node`
> regarding the caller being expected to hold the `pgo_lock` via a call
> to `prf_lock()`, yeah?
>
> I can't help but look at the two call sites of prf_lock() and be
> suspicious that pgo_lock is technically guarding access to more
> variables than described in the comment. It would be good to explain
> exactly what is going on should we need to revisit the concurrency
> here in the future (and lower the bus factor).
>
I'll update the comments.

> > +out:
> > + return err;
> > +}
> > +
> > +/* read() implementation for PGO. */
> > +static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
> > + loff_t *ppos)
> > +{
> > + struct prf_private_data *data = file->private_data;
> > +
> > + BUG_ON(!data);
> > +
> > + return simple_read_from_buffer(buf, count, ppos, data->buffer,
> > + data->size);
> > +}
> > +
> > +/* release() implementation for PGO. Release resources allocated by open(). */
> > +static int prf_release(struct inode *inode, struct file *file)
> > +{
> > + struct prf_private_data *data = file->private_data;
> > +
> > + if (data) {
> > + vfree(data->buffer);
> > + kfree(data);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static const struct file_operations prf_fops = {
> > + .owner = THIS_MODULE,
> > + .open = prf_open,
> > + .read = prf_read,
> > + .llseek = default_llseek,
> > + .release = prf_release
> > +};
> > +
> > +/* write() implementation for resetting PGO's profile data. */
> > +static ssize_t reset_write(struct file *file, const char __user *addr,
> > + size_t len, loff_t *pos)
> > +{
> > + struct llvm_prf_data *data;
> > +
> > + memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
> > +
> > + for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; ++data) {
>
> ^ this patch mixes pre-increment and post-increment in loops. The
> kernel coding style docs (Documentation/process/coding-style.rst)
> don't make a call on this, but it might be nice to be internally
> consistent throughout the patch. I assume that's from having mixed
> authors. Not a huge issue, but I'm pedantic.
>
Okay.

> > + struct llvm_prf_value_node **vnodes;
> > + u64 current_vsite_count;
> > + u32 i;
> > +
> > + if (!data->values)
> > + continue;
> > +
> > + current_vsite_count = 0;
> > + vnodes = (struct llvm_prf_value_node **)data->values;
> > +
> > + for (i = LLVM_PRF_IPVK_FIRST; i <= LLVM_PRF_IPVK_LAST; ++i)
> > + current_vsite_count += data->num_value_sites[i];
> > +
> > + for (i = 0; i < current_vsite_count; ++i) {
> > + struct llvm_prf_value_node *current_vnode = vnodes[i];
> > +
> > + while (current_vnode) {
> > + current_vnode->count = 0;
> > + current_vnode = current_vnode->next;
> > + }
> > + }
> > + }
> > +
> > + return len;
> > +}
> > +
> > +static const struct file_operations prf_reset_fops = {
> > + .owner = THIS_MODULE,
> > + .write = reset_write,
> > + .llseek = noop_llseek,
> > +};
> > +
> > +/* Create debugfs entries. */
> > +static int __init pgo_init(void)
> > +{
> > + directory = debugfs_create_dir("pgo", NULL);
> > + if (!directory)
> > + goto err_remove;
> > +
> > + if (!debugfs_create_file("profraw", 0600, directory, NULL,
> > + &prf_fops))
> > + goto err_remove;
> > +
> > + if (!debugfs_create_file("reset", 0200, directory, NULL,
> > + &prf_reset_fops))
> > + goto err_remove;
> > +
> > + return 0;
> > +
> > +err_remove:
> > + pr_err("initialization failed\n");
> > + return -EIO;
> > +}
> > +
> > +/* Remove debugfs entries. */
> > +static void __exit pgo_exit(void)
> > +{
> > + debugfs_remove_recursive(directory);
> > +}
> > +
> > +module_init(pgo_init);
> > +module_exit(pgo_exit);
> > diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
> > new file mode 100644
> > index 0000000000000..6084ff0652e85
> > --- /dev/null
> > +++ b/kernel/pgo/instrument.c
> > @@ -0,0 +1,185 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + * Sami Tolvanen <[email protected]>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#define pr_fmt(fmt) "pgo: " fmt
> > +
> > +#include <linux/bitops.h>
> > +#include <linux/kernel.h>
> > +#include <linux/export.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/types.h>
> > +#include "pgo.h"
> > +
> > +/* Lock guarding value node access and serialization. */
> > +static DEFINE_SPINLOCK(pgo_lock);
> > +static int current_node;
> > +
> > +unsigned long prf_lock(void)
> > +{
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&pgo_lock, flags);
> > +
> > + return flags;
> > +}
> > +
> > +void prf_unlock(unsigned long flags)
> > +{
> > + spin_unlock_irqrestore(&pgo_lock, flags);
> > +}
> > +
> > +/*
> > + * Return a newly allocated profiling value node which contains the tracked
> > + * value by the value profiler.
> > + * Note: caller *must* hold pgo_lock.
> > + */
> > +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
> > + u32 index, u64 value)
> > +{
> > + if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
> > + return NULL; /* Out of nodes */
> > +
> > + current_node++;
> > +
> > + /* Make sure the node is entirely within the section */
> > + if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
> > + &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
> > + return NULL;
> > +
> > + return &__llvm_prf_vnds_start[current_node];
> > +}
> > +
> > +/*
> > + * Counts the number of times a target value is seen.
> > + *
> > + * Records the target value for the CounterIndex if not seen before. Otherwise,
> > + * increments the counter associated w/ the target value.
> > + */
> > +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
> > +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
> > +{
> > + struct llvm_prf_data *p = (struct llvm_prf_data *)data;
> > + struct llvm_prf_value_node **counters;
> > + struct llvm_prf_value_node *curr;
> > + struct llvm_prf_value_node *min = NULL;
> > + struct llvm_prf_value_node *prev = NULL;
> > + u64 min_count = U64_MAX;
> > + u8 values = 0;
> > + unsigned long flags;
> > +
> > + if (!p || !p->values)
> > + return;
> > +
> > + counters = (struct llvm_prf_value_node **)p->values;
> > + curr = counters[index];
> > +
> > + while (curr) {
> > + if (target_value == curr->value) {
> > + curr->count++;
> > + return;
> > + }
> > +
> > + if (curr->count < min_count) {
> > + min_count = curr->count;
> > + min = curr;
> > + }
> > +
> > + prev = curr;
> > + curr = curr->next;
> > + values++;
> > + }
> > +
> > + if (values >= LLVM_PRF_MAX_NUM_VALS_PER_SITE) {
> > + if (!min->count || !(--min->count)) {
> > + curr = min;
> > + curr->value = target_value;
> > + curr->count++;
> > + }
> > + return;
> > + }
> > +
> > + /* Lock when updating the value node structure. */
> > + flags = prf_lock();
> > +
> > + curr = allocate_node(p, index, target_value);
> > + if (!curr)
> > + goto out;
> > +
> > + curr->value = target_value;
> > + curr->count++;
> > +
> > + if (!counters[index])
> > + counters[index] = curr;
> > + else if (prev && !prev->next)
> > + prev->next = curr;
> > +
> > +out:
> > + prf_unlock(flags);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_target);
> > +
> > +/* Counts the number of times a range of target values is seen. */
> > +void __llvm_profile_instrument_range(u64 target_value, void *data,
> > + u32 index, s64 precise_start,
> > + s64 precise_last, s64 large_value);
> > +void __llvm_profile_instrument_range(u64 target_value, void *data,
> > + u32 index, s64 precise_start,
> > + s64 precise_last, s64 large_value)
> > +{
> > + if (large_value != S64_MIN && (s64)target_value >= large_value)
> > + target_value = large_value;
> > + else if ((s64)target_value < precise_start ||
> > + (s64)target_value > precise_last)
> > + target_value = precise_last + 1;
> > +
> > + __llvm_profile_instrument_target(target_value, data, index);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_range);
> > +
> > +static u64 inst_prof_get_range_rep_value(u64 value)
> > +{
> > + if (value <= 8)
> > + /* The first ranges are individually tracked, us it as is. */
>
> ^ typo, "use"
>
> > + return value;
> > + else if (value >= 513)
> > + /* The last range is mapped to its lowest value. */
> > + return 513;
> > + else if (hweight64(value) == 1)
> > + /* If it's a power of two, use it as is. */
> > + return value;
> > +
> > + /* Otherwise, take the previous power of two + 1. */
> > + return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
> > +}
> > +
> > +/*
> > + * The target values are partitioned into multiple ranges. The range spec is
> > + * defined in compiler-rt/include/profile/InstrProfData.inc.
> > + */
> > +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> > + u32 counter_index);
> > +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> > + u32 counter_index)
> > +{
> > + u64 rep_value;
> > +
> > + /* Map the target value to the representative value of its range. */
> > + rep_value = inst_prof_get_range_rep_value(target_value);
> > + __llvm_profile_instrument_target(rep_value, data, counter_index);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_memop);
> > diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
> > new file mode 100644
> > index 0000000000000..df0aa278f28bd
> > --- /dev/null
> > +++ b/kernel/pgo/pgo.h
> > @@ -0,0 +1,206 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + * Sami Tolvanen <[email protected]>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#ifndef _PGO_H
> > +#define _PGO_H
> > +
> > +/*
> > + * Note: These internal LLVM definitions must match the compiler version.
> > + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
> > + */
> > +
> > +#ifdef CONFIG_64BIT
> > + #define LLVM_PRF_MAGIC \
> > + ((u64)255 << 56 | \
> > + (u64)'l' << 48 | \
> > + (u64)'p' << 40 | \
> > + (u64)'r' << 32 | \
> > + (u64)'o' << 24 | \
> > + (u64)'f' << 16 | \
> > + (u64)'r' << 8 | \
> > + (u64)129)
> > +#else
> > + #define LLVM_PRF_MAGIC \
> > + ((u64)255 << 56 | \
> > + (u64)'l' << 48 | \
> > + (u64)'p' << 40 | \
> > + (u64)'r' << 32 | \
> > + (u64)'o' << 24 | \
> > + (u64)'f' << 16 | \
> > + (u64)'R' << 8 | \
> > + (u64)129)
> > +#endif
> > +
> > +#define LLVM_PRF_VERSION 5
> > +#define LLVM_PRF_DATA_ALIGN 8
> > +#define LLVM_PRF_IPVK_FIRST 0
> > +#define LLVM_PRF_IPVK_LAST 1
> > +#define LLVM_PRF_MAX_NUM_VALS_PER_SITE 16
>
> llvm/include/llvm/ProfileData/InstrProfData.inc defines
> INSTR_PROF_MAX_NUM_VAL_PER_SITE as 255; does that need to match?
>
Sure. I also updated the names to better match LLVM's names.
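
For context on why 255 is the natural upper bound: the serializer in fs.c stores each site's value count in a u8 element of the site count array, so a per-site chain longer than U8_MAX cannot be represented. The sketch below is a simplified userspace illustration of that cap, not the patch's code; `struct value_node` and `site_value_count()` are hypothetical names that mirror `struct llvm_prf_value_node` and the counting loop:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for struct llvm_prf_value_node (hypothetical name). */
struct value_node {
	uint64_t value;
	uint64_t count;
	struct value_node *next;
};

/*
 * Count the nodes chained at one value site, stopping at UINT8_MAX so the
 * result always fits in one byte of the site count array.
 */
static uint8_t site_value_count(const struct value_node *site)
{
	uint32_t count = 0;

	while (site && count < UINT8_MAX) {
		count++;
		site = site->next;
	}

	/* Illustration-only check that the cap held. */
	assert(count <= UINT8_MAX);

	return (uint8_t)count;
}
```

With a per-site limit of 16, as in the version under review, the cap is never reached; it only matters once the limit is raised to 255.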

> > +
> > +#define LLVM_PRF_VARIANT_MASK_IR (0x1ull << 56)
> > +#define LLVM_PRF_VARIANT_MASK_CSIR (0x1ull << 57)
> > +
> > +/**
> > + * struct llvm_prf_header - represents the raw profile header data structure.
> > + * @magic: the magic token for the file format.
> > + * @version: the version of the file format.
> > + * @data_size: the number of entries in the profile data section.
> > + * @padding_bytes_before_counters: the number of padding bytes before the
> > + * counters.
> > + * @counters_size: the size in bytes of the LLVM profile section containing the
> > + * counters.
> > + * @padding_bytes_after_counters: the number of padding bytes after the
> > + * counters.
> > + * @names_size: the size in bytes of the LLVM profile section containing the
> > + * counters' names.
> > + * @counters_delta: the beginning of the LLVM profile counters section.
> > + * @names_delta: the beginning of the LLVM profile names section.
> > + * @value_kind_last: the last profile value kind.
> > + */
> > +struct llvm_prf_header {
> > + u64 magic;
> > + u64 version;
> > + u64 data_size;
> > + u64 padding_bytes_before_counters;
> > + u64 counters_size;
> > + u64 padding_bytes_after_counters;
> > + u64 names_size;
> > + u64 counters_delta;
> > + u64 names_delta;
> > + u64 value_kind_last;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_data - represents the per-function control structure.
> > + * @name_ref: the reference to the function's name.
> > + * @func_hash: the hash value of the function.
> > + * @counter_ptr: a pointer to the profile counter.
> > + * @function_ptr: a pointer to the function.
> > + * @values: the profiling values associated with this function.
> > + * @num_counters: the number of counters in the function.
> > + * @num_value_sites: the number of value profile sites.
> > + */
> > +struct llvm_prf_data {
> > + const u64 name_ref;
> > + const u64 func_hash;
> > + const void *counter_ptr;
> > + const void *function_ptr;
> > + void *values;
> > + const u32 num_counters;
> > + const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
> > +} __aligned(LLVM_PRF_DATA_ALIGN);
> > +
> > +/**
> > + * struct llvm_prf_value_node_data - represents the data part of the struct
> > + * llvm_prf_value_node data structure.
> > + * @value: the value counters.
> > + * @count: the counters' count.
> > + */
> > +struct llvm_prf_value_node_data {
> > + u64 value;
> > + u64 count;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_node - represents an internal data structure used by
> > + * the value profiler.
> > + * @value: the value counters.
> > + * @count: the counters' count.
> > + * @next: the next value node.
> > + */
> > +struct llvm_prf_value_node {
> > + u64 value;
> > + u64 count;
> > + struct llvm_prf_value_node *next;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_data - represents the value profiling data in indexed
> > + * format.
> > + * @total_size: the total size in bytes including this field.
> > + * @num_value_kinds: the number of value profile kinds that have value profile
> > + * data.
> > + */
> > +struct llvm_prf_value_data {
> > + u32 total_size;
> > + u32 num_value_kinds;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_record - represents the on-disk layout of the value
> > + * profile data of a particular kind for one function.
> > + * @kind: the kind of the value profile record.
> > + * @num_value_sites: the number of value profile sites.
> > + * @site_count_array: the first element of the array that stores the number
> > + * of profiled values for each value site.
> > + */
> > +struct llvm_prf_value_record {
> > + u32 kind;
> > + u32 num_value_sites;
> > + u8 site_count_array[];
> > +};
> > +
> > +#define prf_get_value_record_header_size() \
> > + offsetof(struct llvm_prf_value_record, site_count_array)
> > +#define prf_get_value_record_site_count_size(sites) \
> > + roundup((sites), 8)
> > +#define prf_get_value_record_size(sites) \
> > + (prf_get_value_record_header_size() + \
> > + prf_get_value_record_site_count_size((sites)))
> > +
> > +/* Data sections */
> > +extern struct llvm_prf_data __llvm_prf_data_start[];
> > +extern struct llvm_prf_data __llvm_prf_data_end[];
> > +
> > +extern u64 __llvm_prf_cnts_start[];
> > +extern u64 __llvm_prf_cnts_end[];
> > +
> > +extern char __llvm_prf_names_start[];
> > +extern char __llvm_prf_names_end[];
> > +
> > +extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
> > +extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
> > +
> > +/* Locking for vnodes */
> > +extern unsigned long prf_lock(void);
> > +extern void prf_unlock(unsigned long flags);
> > +
> > +#define __DEFINE_PRF_SIZE(s) \
> > + static inline unsigned long prf_ ## s ## _size(void) \
> > + { \
> > + unsigned long start = \
> > + (unsigned long)__llvm_prf_ ## s ## _start; \
> > + unsigned long end = \
> > + (unsigned long)__llvm_prf_ ## s ## _end; \
> > + return roundup(end - start, \
> > + sizeof(__llvm_prf_ ## s ## _start[0])); \
> > + } \
> > + static inline unsigned long prf_ ## s ## _count(void) \
> > + { \
> > + return prf_ ## s ## _size() / \
> > + sizeof(__llvm_prf_ ## s ## _start[0]); \
> > + }
> > +
> > +__DEFINE_PRF_SIZE(data);
> > +__DEFINE_PRF_SIZE(cnts);
> > +__DEFINE_PRF_SIZE(names);
> > +__DEFINE_PRF_SIZE(vnds);
> > +
> > +#undef __DEFINE_PRF_SIZE
> > +
> > +#endif /* _PGO_H */
> > diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> > index 213677a5ed33e..9b218afb5cb87 100644
> > --- a/scripts/Makefile.lib
> > +++ b/scripts/Makefile.lib
> > @@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
> > $(CFLAGS_GCOV))
> > endif
> >
> > +#
> > +# Enable clang's PGO profiling flags for a file or directory depending on
> > +# variables PGO_PROFILE_obj.o and PGO_PROFILE.
> > +#
> > +ifeq ($(CONFIG_PGO_CLANG),y)
> > +_c_flags += $(if $(patsubst n%,, \
> > + $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
> > + $(CFLAGS_PGO_CLANG))
> > +endif
> > +
> > #
> > # Enable address sanitizer flags for kernel except some files or directories
> > # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
> > --
> > 2.30.0.284.gd98b1dd5eaa7-goog
> >

Sending patch v6. PTAL.

-bw

2021-01-21 10:39:34

by Sedat Dilek

Subject: Re: [PATCH v6] pgo: add clang's Profile Guided Optimization infrastructure

On Thu, Jan 21, 2021 at 9:25 AM 'Bill Wendling' via Clang Built Linux
<[email protected]> wrote:
>
> From: Sami Tolvanen <[email protected]>
>
> Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> profile, the kernel is instrumented with PGO counters, a representative
> workload is run, and the raw profile data is collected from
> /sys/kernel/debug/pgo/profraw.
>
> The raw profile data must be processed by clang's "llvm-profdata" tool
> before it can be used during recompilation:
>
> $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
> Multiple raw profiles may be merged during this step.
>
> The data can now be used by the compiler:
>
> $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>
> This initial submission is restricted to x86, as that's the platform we
> know works. This restriction can be lifted once other platforms have
> been verified to work with PGO.
>
> Note that this method of profiling the kernel is clang-native, unlike
> the clang support in kernel/gcov.
>
> [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>
> Signed-off-by: Sami Tolvanen <[email protected]>
> Co-developed-by: Bill Wendling <[email protected]>
> Signed-off-by: Bill Wendling <[email protected]>
> Tested-by: Nick Desaulniers <[email protected]>
> ---
> v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> testing.
> - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> Song's comments.
> v3: - Added change log section based on Sedat Dilek's comments.
> v4: - Remove non-x86 Makfile changes and se "hweight64" instead of using our
> own popcount implementation, based on Nick Desaulniers's comment.
> v5: - Correct padding calculation, discovered by Nathan Chancellor.
> v6: - Add better documentation about the locking scheme and other things.
> - Rename macros to better match the same macros in LLVM's source code.
> ---

Thanks for v6.

Small Changelog nits:
* Can you reverse-order the changelog - latest v6 first.
* v4: s/Makfile changes and se/Ma*k*efile changes and *u*se

Can you add a hint to this "Clang-PGO" patch requiring Clang >= 12?

Can you please add a note that people who use Clang >= 12 (ToT) with
kernel modules enabled need the patch from CBL issue #1250 [1]?
Otherwise they cannot boot and follow the next steps in the workflow.

Can you add a comment that writing the value "1" resets the profiling
counters, and that there is no "0" value that stops profiling?
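
As far as the quoted reset_write() shows, the handler never looks at the bytes written: any write to the file clears the counters and the full length is reported as consumed. A minimal userspace sketch of triggering a reset under that reading follows; `pgo_reset()` is a hypothetical helper name, and the path is passed in rather than hardcoded so the function can be exercised against any writable file:

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Write a single byte to the given reset file.
 * Returns 0 on success and -1 if the file cannot be opened or written.
 * The byte's value is arbitrary: "1" is used only to match the docs.
 */
static int pgo_reset(const char *path)
{
	ssize_t n;
	int fd;

	assert(path != NULL);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	n = write(fd, "1", 1);
	if (close(fd) < 0)
		return -1;

	return n == 1 ? 0 : -1;
}
```

On an instrumented kernel with debugfs mounted, this would be called as `pgo_reset("/sys/kernel/debug/pgo/reset")`, typically as root since the file is created with mode 0200.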

Can you add an example for the workload test?
Here I do an x86-64 defconfig build.
See attached script.

Usually, I download this patch from LORE.

link="https://lore.kernel.org/r/[email protected]"
b4 -d am $link

This downloads v6.

What if I want a previous version (to compare)?
Again, I would love to see a "clang-pgo" branch, and maybe tags for the
several versions, in your personal GitHub.
Come on, Bill :-).

If you like you can add my...

Tested-by: Sedat Dilek <[email protected]>

( I guess I have built approx. 10+ clang-pgo kernels. )

- Sedat -

[1] https://github.com/ClangBuiltLinux/linux/issues/1250







> Documentation/dev-tools/index.rst | 1 +
> Documentation/dev-tools/pgo.rst | 127 +++++++++
> MAINTAINERS | 9 +
> Makefile | 3 +
> arch/Kconfig | 1 +
> arch/x86/Kconfig | 1 +
> arch/x86/boot/Makefile | 1 +
> arch/x86/boot/compressed/Makefile | 1 +
> arch/x86/crypto/Makefile | 4 +
> arch/x86/entry/vdso/Makefile | 1 +
> arch/x86/kernel/vmlinux.lds.S | 2 +
> arch/x86/platform/efi/Makefile | 1 +
> arch/x86/purgatory/Makefile | 1 +
> arch/x86/realmode/rm/Makefile | 1 +
> arch/x86/um/vdso/Makefile | 1 +
> drivers/firmware/efi/libstub/Makefile | 1 +
> include/asm-generic/vmlinux.lds.h | 44 +++
> kernel/Makefile | 1 +
> kernel/pgo/Kconfig | 35 +++
> kernel/pgo/Makefile | 5 +
> kernel/pgo/fs.c | 389 ++++++++++++++++++++++++++
> kernel/pgo/instrument.c | 189 +++++++++++++
> kernel/pgo/pgo.h | 203 ++++++++++++++
> scripts/Makefile.lib | 10 +
> 24 files changed, 1032 insertions(+)
> create mode 100644 Documentation/dev-tools/pgo.rst
> create mode 100644 kernel/pgo/Kconfig
> create mode 100644 kernel/pgo/Makefile
> create mode 100644 kernel/pgo/fs.c
> create mode 100644 kernel/pgo/instrument.c
> create mode 100644 kernel/pgo/pgo.h
>
> diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> index f7809c7b1ba9..8d6418e85806 100644
> --- a/Documentation/dev-tools/index.rst
> +++ b/Documentation/dev-tools/index.rst
> @@ -26,6 +26,7 @@ whole; patches welcome!
> kgdb
> kselftest
> kunit/index
> + pgo
>
>
> .. only:: subproject and html
> diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> new file mode 100644
> index 000000000000..b7f11d8405b7
> --- /dev/null
> +++ b/Documentation/dev-tools/pgo.rst
> @@ -0,0 +1,127 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===============================
> +Using PGO with the Linux kernel
> +===============================
> +
> +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> +when building with Clang. The profiling data is exported via the ``pgo``
> +debugfs directory.
> +
> +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> +
> +
> +Preparation
> +===========
> +
> +Configure the kernel with:
> +
> +.. code-block:: make
> +
> + CONFIG_DEBUG_FS=y
> + CONFIG_PGO_CLANG=y
> +
> +Note that kernels compiled with profiling flags will be significantly larger
> +and run slower.
> +
> +Profiling data will only become accessible once debugfs has been mounted:
> +
> +.. code-block:: sh
> +
> + mount -t debugfs none /sys/kernel/debug
> +
> +
> +Customization
> +=============
> +
> +You can enable or disable profiling for individual files and directories by
> +adding a line similar to the following to the respective kernel Makefile:
> +
> +- For a single file (e.g. main.o)
> +
> + .. code-block:: make
> +
> + PGO_PROFILE_main.o := y
> +
> +- For all files in one directory
> +
> + .. code-block:: make
> +
> + PGO_PROFILE := y
> +
> +To exclude files from being profiled use
> +
> + .. code-block:: make
> +
> + PGO_PROFILE_main.o := n
> +
> +and
> +
> + .. code-block:: make
> +
> + PGO_PROFILE := n
> +
> +Only files which are linked to the main kernel image or are compiled as kernel
> +modules are supported by this mechanism.
> +
> +
> +Files
> +=====
> +
> +The PGO kernel support creates the following files in debugfs:
> +
> +``/sys/kernel/debug/pgo``
> + Parent directory for all PGO-related files.
> +
> +``/sys/kernel/debug/pgo/reset``
> + Global reset file: resets all profile data to zero when written to.
> +
> +``/sys/kernel/debug/pgo/profraw``
> + The raw PGO data that must be processed with ``llvm-profdata``.
> +
> +
> +Workflow
> +========
> +
> +The PGO kernel can be run on the host or test machines. The data, though, should
> +be analyzed with tools from the same Clang version that compiled the kernel.
> +Clang is tolerant of version skew, but it's easier to use the same Clang
> +version.
> +
> +The profiling data is useful for optimizing the kernel, analyzing coverage,
> +etc. Clang offers tools to perform these tasks.
> +
> +Here is an example workflow for profiling an instrumented kernel with PGO and
> +using the result to optimize the kernel:
> +
> +1) Install the kernel on the TEST machine.
> +
> +2) Reset the data counters right before running the load tests
> +
> + .. code-block:: sh
> +
> + $ echo 1 > /sys/kernel/debug/pgo/reset
> +
> +3) Run the load tests.
> +
> +4) Collect the raw profile data
> +
> + .. code-block:: sh
> +
> + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> +
> +5) (Optional) Download the raw profile data to the HOST machine.
> +
> +6) Process the raw profile data
> +
> + .. code-block:: sh
> +
> + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> +
> + Note that multiple raw profile data files can be merged during this step.
> +
> +7) Rebuild the kernel using the profile data (PGO disabled)
> +
> + .. code-block:: sh
> +
> + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 00836f6452f0..13333026e140 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -13948,6 +13948,15 @@ S: Maintained
> F: include/linux/personality.h
> F: include/uapi/linux/personality.h
>
> +PGO BASED KERNEL PROFILING
> +M: Sami Tolvanen <[email protected]>
> +M: Bill Wendling <[email protected]>
> +R: Nathan Chancellor <[email protected]>
> +R: Nick Desaulniers <[email protected]>
> +S: Supported
> +F: Documentation/dev-tools/pgo.rst
> +F: kernel/pgo
> +
> PHOENIX RC FLIGHT CONTROLLER ADAPTER
> M: Marcus Folkesson <[email protected]>
> L: [email protected]
> diff --git a/Makefile b/Makefile
> index b0e4767735dc..9339541f7cec 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
> # Defaults to vmlinux, but the arch makefile usually adds further targets
> all: vmlinux
>
> +CFLAGS_PGO_CLANG := -fprofile-generate
> +export CFLAGS_PGO_CLANG
> +
> CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
> $(call cc-option,-fno-tree-loop-im) \
> $(call cc-disable-warning,maybe-uninitialized,)
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 24862d15f3a3..f39d3991f6bf 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
> pairs of 32-bit arguments, select this option.
>
> source "kernel/gcov/Kconfig"
> +source "kernel/pgo/Kconfig"
>
> source "scripts/gcc-plugins/Kconfig"
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 21f851179ff0..36305ea61dc0 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -96,6 +96,7 @@ config X86
> select ARCH_SUPPORTS_DEBUG_PAGEALLOC
> select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
> select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
> + select ARCH_SUPPORTS_PGO_CLANG if X86_64
> select ARCH_USE_BUILTIN_BSWAP
> select ARCH_USE_QUEUED_RWLOCKS
> select ARCH_USE_QUEUED_SPINLOCKS
> diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> index fe605205b4ce..383853e32f67 100644
> --- a/arch/x86/boot/Makefile
> +++ b/arch/x86/boot/Makefile
> @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
>
> $(obj)/bzImage: asflags-y := $(SVGA_MODE)
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index e0bc3988c3fa..ed12ab65f606 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
>
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE :=n
>
> KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
> diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> index a31de0c6ccde..5753aea7bcbd 100644
> --- a/arch/x86/crypto/Makefile
> +++ b/arch/x86/crypto/Makefile
> @@ -4,6 +4,10 @@
>
> OBJECT_FILES_NON_STANDARD := y
>
> +# Disable PGO for curve25519-x86_64. With PGO enabled, clang runs out of
> +# registers for some of the functions.
> +PGO_PROFILE_curve25519-x86_64.o := n
> +
> obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o
>
> obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index 02e3e42f380b..26e2b3af0145 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO $@
> VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
> $(call ld-option, --eh-frame-hdr) -Bsymbolic
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> quiet_cmd_vdso_and_check = VDSO $@
> cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index efd9e9ea17f2..f6cab2316c46 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -184,6 +184,8 @@ SECTIONS
>
> BUG_TABLE
>
> + PGO_CLANG_DATA
> +
> ORC_UNWIND_TABLE
>
> . = ALIGN(PAGE_SIZE);
> diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
> index 84b09c230cbd..5f22b31446ad 100644
> --- a/arch/x86/platform/efi/Makefile
> +++ b/arch/x86/platform/efi/Makefile
> @@ -2,6 +2,7 @@
> OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
> KASAN_SANITIZE := n
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
> obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
> diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> index 95ea17a9d20c..36f20e99da0b 100644
> --- a/arch/x86/purgatory/Makefile
> +++ b/arch/x86/purgatory/Makefile
> @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
>
> # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> KASAN_SANITIZE := n
> UBSAN_SANITIZE := n
> KCSAN_SANITIZE := n
> diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> index 83f1b6a56449..21797192f958 100644
> --- a/arch/x86/realmode/rm/Makefile
> +++ b/arch/x86/realmode/rm/Makefile
> @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
> diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
> index 5943387e3f35..54f5768f5853 100644
> --- a/arch/x86/um/vdso/Makefile
> +++ b/arch/x86/um/vdso/Makefile
> @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@
>
> VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> #
> # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index 8a94388e38b3..2d81623b33f2 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -40,6 +40,7 @@ KBUILD_CFLAGS := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
> KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
>
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> # Sanitizer runtimes are unavailable and cannot be linked here.
> KASAN_SANITIZE := n
> KCSAN_SANITIZE := n
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index b2b3d81b1535..3a591bb18c5f 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -316,6 +316,49 @@
> #define THERMAL_TABLE(name)
> #endif
>
> +#ifdef CONFIG_PGO_CLANG
> +#define PGO_CLANG_DATA \
> + __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_start = .; \
> + __llvm_prf_data_start = .; \
> + KEEP(*(__llvm_prf_data)) \
> + . = ALIGN(8); \
> + __llvm_prf_data_end = .; \
> + } \
> + __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_cnts_start = .; \
> + KEEP(*(__llvm_prf_cnts)) \
> + . = ALIGN(8); \
> + __llvm_prf_cnts_end = .; \
> + } \
> + __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_names_start = .; \
> + KEEP(*(__llvm_prf_names)) \
> + . = ALIGN(8); \
> + __llvm_prf_names_end = .; \
> + . = ALIGN(8); \
> + } \
> + __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
> + __llvm_prf_vals_start = .; \
> + KEEP(*(__llvm_prf_vals)) \
> + . = ALIGN(8); \
> + __llvm_prf_vals_end = .; \
> + . = ALIGN(8); \
> + } \
> + __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
> + __llvm_prf_vnds_start = .; \
> + KEEP(*(__llvm_prf_vnds)) \
> + . = ALIGN(8); \
> + __llvm_prf_vnds_end = .; \
> + __llvm_prf_end = .; \
> + }
> +#else
> +#define PGO_CLANG_DATA
> +#endif
> +
> #define KERNEL_DTB() \
> STRUCT_ALIGN(); \
> __dtb_start = .; \
> @@ -1125,6 +1168,7 @@
> CONSTRUCTORS \
> } \
> BUG_TABLE \
> + PGO_CLANG_DATA
>
> #define INIT_TEXT_SECTION(inittext_align) \
> . = ALIGN(inittext_align); \
> diff --git a/kernel/Makefile b/kernel/Makefile
> index aa7368c7eabf..0b34ca228ba4 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
> obj-$(CONFIG_KCSAN) += kcsan/
> obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
> obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
> +obj-$(CONFIG_PGO_CLANG) += pgo/
>
> obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
> new file mode 100644
> index 000000000000..76a640b6cf6e
> --- /dev/null
> +++ b/kernel/pgo/Kconfig
> @@ -0,0 +1,35 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
> +
> +config ARCH_SUPPORTS_PGO_CLANG
> + bool
> +
> +config PGO_CLANG
> + bool "Enable clang's PGO-based kernel profiling"
> + depends on DEBUG_FS
> + depends on ARCH_SUPPORTS_PGO_CLANG
> + depends on CC_IS_CLANG && CLANG_VERSION >= 120000
> + help
> + This option enables clang's PGO (Profile Guided Optimization) based
> + code profiling to better optimize the kernel.
> +
> + If unsure, say N.
> +
> + Run a representative workload for your application on a kernel
> + compiled with this option and download the raw profile file from
> + /sys/kernel/debug/pgo/profraw. This file needs to be processed with
> + llvm-profdata. It may be merged with other collected raw profiles.
> +
> + Copy the resulting profile file into vmlinux.profdata, and enable
> + KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
> + kernel.
> +
> + Note that a kernel compiled with profiling flags will be
> + significantly larger and run slower. Also be sure to exclude files
> + from profiling which are not linked to the kernel image to prevent
> + linker errors.
> +
> + Note that the debugfs filesystem has to be mounted to access
> + profiling data.
> +
> +endmenu
> diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
> new file mode 100644
> index 000000000000..41e27cefd9a4
> --- /dev/null
> +++ b/kernel/pgo/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0
> +GCOV_PROFILE := n
> +PGO_PROFILE := n
> +
> +obj-y += fs.o instrument.o
> diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
> new file mode 100644
> index 000000000000..132ff2ab3feb
> --- /dev/null
> +++ b/kernel/pgo/fs.c
> @@ -0,0 +1,389 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt) "pgo: " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/debugfs.h>
> +#include <linux/fs.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/vmalloc.h>
> +#include "pgo.h"
> +
> +static struct dentry *directory;
> +
> +struct prf_private_data {
> + void *buffer;
> + unsigned long size;
> +};
> +
> +/*
> + * Raw profile data format:
> + *
> + * - llvm_prf_header
> + * - __llvm_prf_data
> + * - __llvm_prf_cnts
> + * - __llvm_prf_names
> + * - zero padding to 8 bytes
> + * - for each llvm_prf_data in __llvm_prf_data:
> + * - llvm_prf_value_data
> + * - llvm_prf_value_record + site count array
> + * - llvm_prf_value_node_data
> + * ...
> + * ...
> + * ...
> + */
> +
> +static void prf_fill_header(void **buffer)
> +{
> + struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
> +
> +#ifdef CONFIG_64BIT
> + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_64;
> +#else
> + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_32;
> +#endif
> + header->version = LLVM_VARIANT_MASK_IR_PROF | LLVM_INSTR_PROF_RAW_VERSION;
> + header->data_size = prf_data_count();
> + header->padding_bytes_before_counters = 0;
> + header->counters_size = prf_cnts_count();
> + header->padding_bytes_after_counters = 0;
> + header->names_size = prf_names_count();
> + header->counters_delta = (u64)__llvm_prf_cnts_start;
> + header->names_delta = (u64)__llvm_prf_names_start;
> + header->value_kind_last = LLVM_INSTR_PROF_IPVK_LAST;
> +
> + *buffer += sizeof(*header);
> +}
> +
> +/*
> + * Copy the source into the buffer, incrementing the pointer into buffer in the
> + * process.
> + */
> +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
> +{
> + memcpy(*buffer, src, size);
> + *buffer += size;
> +}
> +
> +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
> +{
> + struct llvm_prf_value_node **nodes =
> + (struct llvm_prf_value_node **)p->values;
> + u32 kinds = 0;
> + u32 size = 0;
> + unsigned int kind;
> + unsigned int n;
> + unsigned int s = 0;
> +
> + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> + unsigned int sites = p->num_value_sites[kind];
> +
> + if (!sites)
> + continue;
> +
> + /* Record + site count array */
> + size += prf_get_value_record_size(sites);
> + kinds++;
> +
> + if (!nodes)
> + continue;
> +
> + for (n = 0; n < sites; n++) {
> + u32 count = 0;
> + struct llvm_prf_value_node *site = nodes[s + n];
> +
> + while (site && ++count <= U8_MAX)
> + site = site->next;
> +
> + size += count *
> + sizeof(struct llvm_prf_value_node_data);
> + }
> +
> + s += sites;
> + }
> +
> + if (size)
> + size += sizeof(struct llvm_prf_value_data);
> +
> + if (value_kinds)
> + *value_kinds = kinds;
> +
> + return size;
> +}
> +
> +static u32 prf_get_value_size(void)
> +{
> + u32 size = 0;
> + struct llvm_prf_data *p;
> +
> + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> + size += __prf_get_value_size(p, NULL);
> +
> + return size;
> +}
> +
> +/* Serialize the value profile data for one llvm_prf_data entry. */
> +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
> +{
> + struct llvm_prf_value_data header;
> + struct llvm_prf_value_node **nodes =
> + (struct llvm_prf_value_node **)p->values;
> + unsigned int kind;
> + unsigned int n;
> + unsigned int s = 0;
> +
> + header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
> +
> + if (!header.num_value_kinds)
> + /* Nothing to write. */
> + return;
> +
> + prf_copy_to_buffer(buffer, &header, sizeof(header));
> +
> + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> + struct llvm_prf_value_record *record;
> + u8 *counts;
> + unsigned int sites = p->num_value_sites[kind];
> +
> + if (!sites)
> + continue;
> +
> + /* Profiling value record. */
> + record = *(struct llvm_prf_value_record **)buffer;
> + *buffer += prf_get_value_record_header_size();
> +
> + record->kind = kind;
> + record->num_value_sites = sites;
> +
> + /* Site count array. */
> + counts = *(u8 **)buffer;
> + *buffer += prf_get_value_record_site_count_size(sites);
> +
> + /*
> + * If we don't have nodes, we can skip updating the site count
> + * array, because the buffer is zero filled.
> + */
> + if (!nodes)
> + continue;
> +
> + for (n = 0; n < sites; n++) {
> + u32 count = 0;
> + struct llvm_prf_value_node *site = nodes[s + n];
> +
> + while (site && ++count <= U8_MAX) {
> + prf_copy_to_buffer(buffer, site,
> + sizeof(struct llvm_prf_value_node_data));
> + site = site->next;
> + }
> +
> + counts[n] = (u8)count;
> + }
> +
> + s += sites;
> + }
> +}
> +
> +static void prf_serialize_values(void **buffer)
> +{
> + struct llvm_prf_data *p;
> +
> + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> + prf_serialize_value(p, buffer);
> +}
> +
> +static inline unsigned long prf_get_padding(unsigned long size)
> +{
> + return 7 & (sizeof(u64) - size % sizeof(u64));
> +}
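
For readers checking the arithmetic: the expression above yields the number
of bytes needed to reach the next multiple of 8, and zero when the size is
already aligned. A minimal userspace sketch follows; demo_padding and
demo_padding_ref are hypothetical names used only for illustration, with
demo_padding mirroring prf_get_padding() and demo_padding_ref being a plain
round-up to compare against.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(uint64_t) == 8, "the mask of 7 assumes 8-byte alignment");

/* Userspace stand-in for prf_get_padding(); the name is illustrative only. */
static unsigned long demo_padding(unsigned long size)
{
	/* 8 - size % 8 lies in 1..8; masking with 7 folds the "8" case to 0. */
	return 7 & (sizeof(uint64_t) - size % sizeof(uint64_t));
}

/* Reference: distance from size to the next multiple of 8. */
static unsigned long demo_padding_ref(unsigned long size)
{
	return (size + 7) / 8 * 8 - size;
}
```

The mask is what keeps an already aligned names section from gaining a
spurious 8 bytes of padding.
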
> +
> +static unsigned long prf_buffer_size(void)
> +{
> + return sizeof(struct llvm_prf_header) +
> + prf_data_size() +
> + prf_cnts_size() +
> + prf_names_size() +
> + prf_get_padding(prf_names_size()) +
> + prf_get_value_size();
> +}
> +
> +/*
> + * Serialize the profiling data into a format LLVM's tools can understand.
> + * Note: caller *must* hold pgo_lock.
> + */
> +static int prf_serialize(struct prf_private_data *p)
> +{
> + int err = 0;
> + void *buffer;
> +
> + p->size = prf_buffer_size();
> + p->buffer = vzalloc(p->size);
> +
> + if (!p->buffer) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + buffer = p->buffer;
> +
> + prf_fill_header(&buffer);
> + prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
> + prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
> + prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
> + buffer += prf_get_padding(prf_names_size());
> +
> + prf_serialize_values(&buffer);
> +
> +out:
> + return err;
> +}
> +
> +/* open() implementation for PGO. Creates a copy of the profiling data set. */
> +static int prf_open(struct inode *inode, struct file *file)
> +{
> + struct prf_private_data *data;
> + unsigned long flags;
> + int err;
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL);
> + if (!data) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + flags = prf_lock();
> +
> + err = prf_serialize(data);
> + if (unlikely(err)) {
> + kfree(data);
> + goto out_unlock;
> + }
> +
> + file->private_data = data;
> +
> +out_unlock:
> + prf_unlock(flags);
> +out:
> + return err;
> +}
> +
> +/* read() implementation for PGO. */
> +static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
> + loff_t *ppos)
> +{
> + struct prf_private_data *data = file->private_data;
> +
> + BUG_ON(!data);
> +
> + return simple_read_from_buffer(buf, count, ppos, data->buffer,
> + data->size);
> +}
> +
> +/* release() implementation for PGO. Release resources allocated by open(). */
> +static int prf_release(struct inode *inode, struct file *file)
> +{
> + struct prf_private_data *data = file->private_data;
> +
> + if (data) {
> + vfree(data->buffer);
> + kfree(data);
> + }
> +
> + return 0;
> +}
> +
> +static const struct file_operations prf_fops = {
> + .owner = THIS_MODULE,
> + .open = prf_open,
> + .read = prf_read,
> + .llseek = default_llseek,
> + .release = prf_release
> +};
> +
> +/* write() implementation for resetting PGO's profile data. */
> +static ssize_t reset_write(struct file *file, const char __user *addr,
> + size_t len, loff_t *pos)
> +{
> + struct llvm_prf_data *data;
> +
> + memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
> +
> + for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; data++) {
> + struct llvm_prf_value_node **vnodes;
> + u64 current_vsite_count;
> + u32 i;
> +
> + if (!data->values)
> + continue;
> +
> + current_vsite_count = 0;
> + vnodes = (struct llvm_prf_value_node **)data->values;
> +
> + for (i = LLVM_INSTR_PROF_IPVK_FIRST; i <= LLVM_INSTR_PROF_IPVK_LAST; i++)
> + current_vsite_count += data->num_value_sites[i];
> +
> + for (i = 0; i < current_vsite_count; i++) {
> + struct llvm_prf_value_node *current_vnode = vnodes[i];
> +
> + while (current_vnode) {
> + current_vnode->count = 0;
> + current_vnode = current_vnode->next;
> + }
> + }
> + }
> +
> + return len;
> +}
> +
> +static const struct file_operations prf_reset_fops = {
> + .owner = THIS_MODULE,
> + .write = reset_write,
> + .llseek = noop_llseek,
> +};
> +
> +/* Create debugfs entries. */
> +static int __init pgo_init(void)
> +{
> + directory = debugfs_create_dir("pgo", NULL);
> + if (!directory)
> + goto err_remove;
> +
> + if (!debugfs_create_file("profraw", 0600, directory, NULL,
> + &prf_fops))
> + goto err_remove;
> +
> + if (!debugfs_create_file("reset", 0200, directory, NULL,
> + &prf_reset_fops))
> + goto err_remove;
> +
> + return 0;
> +
> +err_remove:
> + pr_err("initialization failed\n");
> + return -EIO;
> +}
> +
> +/* Remove debugfs entries. */
> +static void __exit pgo_exit(void)
> +{
> + debugfs_remove_recursive(directory);
> +}
> +
> +module_init(pgo_init);
> +module_exit(pgo_exit);
> diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
> new file mode 100644
> index 000000000000..62ff5cfce7b1
> --- /dev/null
> +++ b/kernel/pgo/instrument.c
> @@ -0,0 +1,189 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt) "pgo: " fmt
> +
> +#include <linux/bitops.h>
> +#include <linux/kernel.h>
> +#include <linux/export.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +#include "pgo.h"
> +
> +/*
> + * This lock guards both profile count updating and serialization of the
> + * profiling data. Keeping both of these activities separate via locking
> + * ensures that we don't try to serialize data that's only partially updated.
> + */
> +static DEFINE_SPINLOCK(pgo_lock);
> +static int current_node;
> +
> +unsigned long prf_lock(void)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&pgo_lock, flags);
> +
> + return flags;
> +}
> +
> +void prf_unlock(unsigned long flags)
> +{
> + spin_unlock_irqrestore(&pgo_lock, flags);
> +}
> +
> +/*
> + * Return a newly allocated profiling value node for the value profiler to
> + * store a tracked value in, or NULL if no nodes are left.
> + * Note: caller *must* hold pgo_lock.
> + */
> +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
> + u32 index, u64 value)
> +{
> + if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
> + return NULL; /* Out of nodes */
> +
> + current_node++;
> +
> + /* Make sure the node is entirely within the section */
> + if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
> + &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
> + return NULL;
> +
> + return &__llvm_prf_vnds_start[current_node];
> +}
> +
> +/*
> + * Counts the number of times a target value is seen.
> + *
> + * Records the target value for the index if not seen before. Otherwise,
> + * increments the counter associated with the target value.
> + */
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
> +{
> + struct llvm_prf_data *p = (struct llvm_prf_data *)data;
> + struct llvm_prf_value_node **counters;
> + struct llvm_prf_value_node *curr;
> + struct llvm_prf_value_node *min = NULL;
> + struct llvm_prf_value_node *prev = NULL;
> + u64 min_count = U64_MAX;
> + u8 values = 0;
> + unsigned long flags;
> +
> + if (!p || !p->values)
> + return;
> +
> + counters = (struct llvm_prf_value_node **)p->values;
> + curr = counters[index];
> +
> + while (curr) {
> + if (target_value == curr->value) {
> + curr->count++;
> + return;
> + }
> +
> + if (curr->count < min_count) {
> + min_count = curr->count;
> + min = curr;
> + }
> +
> + prev = curr;
> + curr = curr->next;
> + values++;
> + }
> +
> + if (values >= LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE) {
> + if (!min->count || !(--min->count)) {
> + curr = min;
> + curr->value = target_value;
> + curr->count++;
> + }
> + return;
> + }
> +
> + /* Lock when updating the value node structure. */
> + flags = prf_lock();
> +
> + curr = allocate_node(p, index, target_value);
> + if (!curr)
> + goto out;
> +
> + curr->value = target_value;
> + curr->count++;
> +
> + if (!counters[index])
> + counters[index] = curr;
> + else if (prev && !prev->next)
> + prev->next = curr;
> +
> +out:
> + prf_unlock(flags);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_target);
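
The update policy in the function above may be easier to follow in isolation:
a known value bumps its counter, a new value gets a node while the site has
room, and once the site is full the least-frequent entry is aged and replaced
when its count reaches zero. The sketch below is a userspace approximation
under stated assumptions: all demo_ names are hypothetical, the per-site limit
is shrunk to 2 (the patch uses 255), a static array stands in for the
__llvm_prf_vnds section, and locking is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_VALS 2	/* stands in for LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE */

struct demo_node {
	uint64_t value;
	uint64_t count;
	struct demo_node *next;
};

/* Fixed pool standing in for the __llvm_prf_vnds section. */
static struct demo_node demo_pool[8];
static size_t demo_used;

static void demo_record(struct demo_node **site, uint64_t target)
{
	struct demo_node *curr, *prev = NULL, *min = NULL;
	uint64_t min_count = UINT64_MAX;
	unsigned int values = 0;

	assert(site != NULL);

	for (curr = *site; curr; prev = curr, curr = curr->next, values++) {
		if (curr->value == target) {
			curr->count++;	/* value already tracked */
			return;
		}
		if (curr->count < min_count) {
			min_count = curr->count;
			min = curr;
		}
	}

	if (values >= DEMO_MAX_VALS) {
		/* Site is full: age the weakest entry, reuse it at zero. */
		if (!min->count || !(--min->count)) {
			min->value = target;
			min->count++;
		}
		return;
	}

	if (demo_used >= sizeof(demo_pool) / sizeof(demo_pool[0]))
		return;	/* out of nodes, drop the sample */

	curr = &demo_pool[demo_used++];
	curr->value = target;
	curr->count = 1;
	curr->next = NULL;
	if (prev)
		prev->next = curr;	/* append to the site's list */
	else
		*site = curr;		/* first node for this site */
}
```

With a limit of 2, recording 1, 1, 1, 2, 3 leaves the site tracking value 1
with count 3 and value 3 with count 1, since the single sample of 2 is aged
out by the arrival of 3.
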
> +
> +/* Counts the number of times a range of target values is seen. */
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> + u32 index, s64 precise_start,
> + s64 precise_last, s64 large_value);
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> + u32 index, s64 precise_start,
> + s64 precise_last, s64 large_value)
> +{
> + if (large_value != S64_MIN && (s64)target_value >= large_value)
> + target_value = large_value;
> + else if ((s64)target_value < precise_start ||
> + (s64)target_value > precise_last)
> + target_value = precise_last + 1;
> +
> + __llvm_profile_instrument_target(target_value, data, index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_range);
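
The bucketing done before the call to __llvm_profile_instrument_target() can
be shown as a pure function. This is a userspace sketch only: demo_clamp_range
is a hypothetical name, and treating INT64_MIN as the "no large-value bucket"
sentinel is read off the S64_MIN comparison in the code above.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative userspace copy of the clamping step; not kernel API. */
static uint64_t demo_clamp_range(uint64_t target, int64_t precise_start,
				 int64_t precise_last, int64_t large_value)
{
	assert(precise_start <= precise_last);

	/* INT64_MIN acts as the sentinel for "no large-value bucket". */
	if (large_value != INT64_MIN && (int64_t)target >= large_value)
		return (uint64_t)large_value;

	/* Anything outside [precise_start, precise_last] shares one bucket. */
	if ((int64_t)target < precise_start || (int64_t)target > precise_last)
		return (uint64_t)(precise_last + 1);

	return target;
}
```

For example, with a precise range of 0..8 and a large-value threshold of 1024,
a target of 500 lands in the shared bucket 9 while 2000 is recorded as 1024.
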
> +
> +static u64 inst_prof_get_range_rep_value(u64 value)
> +{
> + if (value <= 8)
> + /* The first ranges are individually tracked, use it as is. */
> + return value;
> + else if (value >= 513)
> + /* The last range is mapped to its lowest value. */
> + return 513;
> + else if (hweight64(value) == 1)
> + /* If it's a power of two, use it as is. */
> + return value;
> +
> + /* Otherwise, map it to the previous power of two + 1. */
> + return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
> +}
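
A few concrete inputs make the range mapping above easier to verify. The
sketch below is a userspace rendering of the same logic, assuming a GCC or
Clang toolchain for the builtins; demo_range_rep_value is a hypothetical name
and __builtin_popcountll stands in for the kernel's hweight64().

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(unsigned long long) == 8, "clzll arithmetic assumes 64 bits");

/* Userspace rendering of inst_prof_get_range_rep_value(); illustrative only. */
static uint64_t demo_range_rep_value(uint64_t value)
{
	if (value <= 8)
		return value;		/* 0..8 are tracked individually */
	if (value >= 513)
		return 513;		/* everything above 512 is one bucket */
	if (__builtin_popcountll(value) == 1)
		return value;		/* exact powers of two keep their value */

	/* 63 - clz is the index of the highest set bit, e.g. 3 for 9..15. */
	return (1ULL << (63 - __builtin_clzll(value))) + 1;
}
```

So sizes 10 through 15 are all recorded as 9, 100 is recorded as 65, and 16 or
512 are recorded unchanged.
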
> +
> +/*
> + * The target values are partitioned into multiple ranges. The range spec is
> + * defined in compiler-rt/include/profile/InstrProfData.inc.
> + */
> +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> + u32 counter_index);
> +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> + u32 counter_index)
> +{
> + u64 rep_value;
> +
> + /* Map the target value to the representative value of its range. */
> + rep_value = inst_prof_get_range_rep_value(target_value);
> + __llvm_profile_instrument_target(rep_value, data, counter_index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_memop);
> diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
> new file mode 100644
> index 000000000000..0d33e07a0bf3
> --- /dev/null
> +++ b/kernel/pgo/pgo.h
> @@ -0,0 +1,203 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#ifndef _PGO_H
> +#define _PGO_H
> +
> +/*
> + * Note: These internal LLVM definitions must match the compiler version.
> + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
> + */
> +
> +#define LLVM_INSTR_PROF_RAW_MAGIC_64 \
> + ((u64)255 << 56 | \
> + (u64)'l' << 48 | \
> + (u64)'p' << 40 | \
> + (u64)'r' << 32 | \
> + (u64)'o' << 24 | \
> + (u64)'f' << 16 | \
> + (u64)'r' << 8 | \
> + (u64)129)
> +#define LLVM_INSTR_PROF_RAW_MAGIC_32 \
> + ((u64)255 << 56 | \
> + (u64)'l' << 48 | \
> + (u64)'p' << 40 | \
> + (u64)'r' << 32 | \
> + (u64)'o' << 24 | \
> + (u64)'f' << 16 | \
> + (u64)'R' << 8 | \
> + (u64)129)
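
As a sanity check on the two magic constants: written out most-significant
byte first they spell 0xff "lprofr" 0x81 and 0xff "lprofR" 0x81, differing
only in the case of the final letter. A small userspace sketch, using
hypothetical DEMO_ and demo_ names rather than the kernel macros:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_RAW_MAGIC_64 \
	((uint64_t)255 << 56 | (uint64_t)'l' << 48 | (uint64_t)'p' << 40 | \
	 (uint64_t)'r' << 32 | (uint64_t)'o' << 24 | (uint64_t)'f' << 16 | \
	 (uint64_t)'r' << 8 | (uint64_t)129)
#define DEMO_RAW_MAGIC_32 \
	((uint64_t)255 << 56 | (uint64_t)'l' << 48 | (uint64_t)'p' << 40 | \
	 (uint64_t)'r' << 32 | (uint64_t)'o' << 24 | (uint64_t)'f' << 16 | \
	 (uint64_t)'R' << 8 | (uint64_t)129)

static_assert(DEMO_RAW_MAGIC_64 == 0xff6c70726f667281ULL, "64-bit magic value");
static_assert(DEMO_RAW_MAGIC_32 == 0xff6c70726f665281ULL, "32-bit magic value");

/* Split a magic value into bytes, most-significant byte first. */
static void demo_magic_bytes(uint64_t magic, unsigned char out[8])
{
	for (int i = 0; i < 8; i++)
		out[i] = (unsigned char)(magic >> (56 - 8 * i));
}
```

The byte order actually seen in a profraw file depends on the endianness of
the machine that wrote the header.
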
> +
> +#define LLVM_INSTR_PROF_RAW_VERSION 5
> +#define LLVM_INSTR_PROF_DATA_ALIGNMENT 8
> +#define LLVM_INSTR_PROF_IPVK_FIRST 0
> +#define LLVM_INSTR_PROF_IPVK_LAST 1
> +#define LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE 255
> +
> +#define LLVM_VARIANT_MASK_IR_PROF (0x1ULL << 56)
> +#define LLVM_VARIANT_MASK_CSIR_PROF (0x1ULL << 57)
> +
> +/**
> + * struct llvm_prf_header - represents the raw profile header data structure.
> + * @magic: the magic token for the file format.
> + * @version: the version of the file format.
> + * @data_size: the number of entries in the profile data section.
> + * @padding_bytes_before_counters: the number of padding bytes before the
> + * counters.
> + * @counters_size: the size in bytes of the LLVM profile section containing the
> + * counters.
> + * @padding_bytes_after_counters: the number of padding bytes after the
> + * counters.
> + * @names_size: the size in bytes of the LLVM profile section containing the
> + * counters' names.
> + * @counters_delta: the beginning of the LLVM profile counters section.
> + * @names_delta: the beginning of the LLVM profile names section.
> + * @value_kind_last: the last profile value kind.
> + */
> +struct llvm_prf_header {
> + u64 magic;
> + u64 version;
> + u64 data_size;
> + u64 padding_bytes_before_counters;
> + u64 counters_size;
> + u64 padding_bytes_after_counters;
> + u64 names_size;
> + u64 counters_delta;
> + u64 names_delta;
> + u64 value_kind_last;
> +};
> +
> +/**
> + * struct llvm_prf_data - represents the per-function control structure.
> + * @name_ref: the reference to the function's name.
> + * @func_hash: the hash value of the function.
> + * @counter_ptr: a pointer to the profile counter.
> + * @function_ptr: a pointer to the function.
> + * @values: the profiling values associated with this function.
> + * @num_counters: the number of counters in the function.
> + * @num_value_sites: the number of value profile sites.
> + */
> +struct llvm_prf_data {
> + const u64 name_ref;
> + const u64 func_hash;
> + const void *counter_ptr;
> + const void *function_ptr;
> + void *values;
> + const u32 num_counters;
> + const u16 num_value_sites[LLVM_INSTR_PROF_IPVK_LAST + 1];
> +} __aligned(LLVM_INSTR_PROF_DATA_ALIGNMENT);
> +
> +/**
> + * struct llvm_prf_value_node_data - represents the data part of the struct
> + * llvm_prf_value_node data structure.
> + * @value: the value counters.
> + * @count: the counters' count.
> + */
> +struct llvm_prf_value_node_data {
> + u64 value;
> + u64 count;
> +};
> +
> +/**
> + * struct llvm_prf_value_node - represents an internal data structure used by
> + * the value profiler.
> + * @value: the value counters.
> + * @count: the counters' count.
> + * @next: the next value node.
> + */
> +struct llvm_prf_value_node {
> + u64 value;
> + u64 count;
> + struct llvm_prf_value_node *next;
> +};
> +
> +/**
> + * struct llvm_prf_value_data - represents the value profiling data in indexed
> + * format.
> + * @total_size: the total size in bytes including this field.
> + * @num_value_kinds: the number of value profile kinds that have value profile
> + * data.
> + */
> +struct llvm_prf_value_data {
> + u32 total_size;
> + u32 num_value_kinds;
> +};
> +
> +/**
> + * struct llvm_prf_value_record - represents the on-disk layout of the value
> + * profile data of a particular kind for one function.
> + * @kind: the kind of the value profile record.
> + * @num_value_sites: the number of value profile sites.
> + * @site_count_array: the first element of the array that stores the number
> + * of profiled values for each value site.
> + */
> +struct llvm_prf_value_record {
> + u32 kind;
> + u32 num_value_sites;
> + u8 site_count_array[];
> +};
> +
> +#define prf_get_value_record_header_size() \
> + offsetof(struct llvm_prf_value_record, site_count_array)
> +#define prf_get_value_record_site_count_size(sites) \
> + roundup((sites), 8)
> +#define prf_get_value_record_size(sites) \
> + (prf_get_value_record_header_size() + \
> + prf_get_value_record_site_count_size((sites)))
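
To make the arithmetic of these three macros concrete, here is a small userspace sketch. It assumes the record header is two 32-bit fields followed by a flexible array of one-byte site counts, as in the struct quoted above. The demo_* names and DEMO_ROUNDUP are hypothetical stand-ins, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of struct llvm_prf_value_record. */
struct demo_value_record {
	uint32_t kind;
	uint32_t num_value_sites;
	uint8_t site_count_array[];	/* one u8 count per value site */
};

/* The header is everything before the flexible array: two u32 fields. */
static_assert(offsetof(struct demo_value_record, site_count_array) == 8,
	      "header is expected to be 8 bytes");

/* Round x up to the next multiple of y (y > 0). */
#define DEMO_ROUNDUP(x, y) ((((x) + (y) - 1) / (y)) * (y))

static size_t demo_record_header_size(void)
{
	return offsetof(struct demo_value_record, site_count_array);
}

/* The site count array is padded to a multiple of 8 bytes. */
static size_t demo_record_site_count_size(size_t sites)
{
	return DEMO_ROUNDUP(sites, (size_t)8);
}

static size_t demo_record_size(size_t sites)
{
	return demo_record_header_size() + demo_record_site_count_size(sites);
}
```

So a record with 1 to 8 value sites occupies 16 bytes, and a ninth site pushes it to 24.
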
> +
> +/* Data sections */
> +extern struct llvm_prf_data __llvm_prf_data_start[];
> +extern struct llvm_prf_data __llvm_prf_data_end[];
> +
> +extern u64 __llvm_prf_cnts_start[];
> +extern u64 __llvm_prf_cnts_end[];
> +
> +extern char __llvm_prf_names_start[];
> +extern char __llvm_prf_names_end[];
> +
> +extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
> +extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
> +
> +/* Locking for vnodes */
> +extern unsigned long prf_lock(void);
> +extern void prf_unlock(unsigned long flags);
> +
> +#define __DEFINE_PRF_SIZE(s) \
> + static inline unsigned long prf_ ## s ## _size(void) \
> + { \
> + unsigned long start = \
> + (unsigned long)__llvm_prf_ ## s ## _start; \
> + unsigned long end = \
> + (unsigned long)__llvm_prf_ ## s ## _end; \
> + return roundup(end - start, \
> + sizeof(__llvm_prf_ ## s ## _start[0])); \
> + } \
> + static inline unsigned long prf_ ## s ## _count(void) \
> + { \
> + return prf_ ## s ## _size() / \
> + sizeof(__llvm_prf_ ## s ## _start[0]); \
> + }
> +
> +__DEFINE_PRF_SIZE(data);
> +__DEFINE_PRF_SIZE(cnts);
> +__DEFINE_PRF_SIZE(names);
> +__DEFINE_PRF_SIZE(vnds);
> +
> +#undef __DEFINE_PRF_SIZE
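
For readers not used to token pasting, the macro above stamps out a prf_<s>_size() and prf_<s>_count() pair per section. A rough userspace equivalent follows. Ordinary arrays replace the linker-provided __llvm_prf_*_start/_end symbols, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the linker-provided section bounds. */
static uint64_t demo_cnts_start[5];
static uint64_t *const demo_cnts_end = demo_cnts_start + 5;

static char demo_names_start[13];
static char *const demo_names_end = demo_names_start + 13;

static_assert(sizeof(demo_cnts_start[0]) == 8, "counters are 64-bit");

/* Round x up to the next multiple of y (y > 0). */
#define DEMO_ROUNDUP(x, y) ((((x) + (y) - 1) / (y)) * (y))

/* Generate demo_<s>_size() (bytes) and demo_<s>_count() (elements). */
#define DEMO_DEFINE_SIZE(s)						\
	static inline size_t demo_ ## s ## _size(void)			\
	{								\
		uintptr_t start = (uintptr_t)demo_ ## s ## _start;	\
		uintptr_t end = (uintptr_t)demo_ ## s ## _end;		\
		return DEMO_ROUNDUP(end - start,			\
				    sizeof(demo_ ## s ## _start[0]));	\
	}								\
	static inline size_t demo_ ## s ## _count(void)			\
	{								\
		return demo_ ## s ## _size() /				\
		       sizeof(demo_ ## s ## _start[0]);			\
	}

DEMO_DEFINE_SIZE(cnts)
DEMO_DEFINE_SIZE(names)
```

With these stand-ins, demo_cnts_size() is the byte length of the counter array and demo_cnts_count() the number of 64-bit counters. For the names section the element size is one byte, so both functions return the same value.
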
> +
> +#endif /* _PGO_H */
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 213677a5ed33..9b218afb5cb8 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
> $(CFLAGS_GCOV))
> endif
>
> +#
> +# Enable clang's PGO profiling flags for a file or directory depending on
> +# variables PGO_PROFILE_obj.o and PGO_PROFILE.
> +#
> +ifeq ($(CONFIG_PGO_CLANG),y)
> +_c_flags += $(if $(patsubst n%,, \
> + $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
> + $(CFLAGS_PGO_CLANG))
> +endif
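
The $(if $(patsubst n%,, ...)) idiom is easy to misread, so here is the decision it makes, modelled in C. This sketch assumes CONFIG_PGO_CLANG=y and that each variable is empty, "y" or "n". demo_pgo_enabled() is a hypothetical helper, not something Kbuild provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Kbuild concatenates $(PGO_PROFILE_obj.o), $(PGO_PROFILE) and a literal
 * "y" into one word. patsubst n%,, empties that word if it starts with
 * 'n', and $(if ...) adds CFLAGS_PGO_CLANG only for a non-empty result.
 */
static bool demo_pgo_enabled(const char *file_setting, const char *dir_setting)
{
	char word[64];

	assert(file_setting && dir_setting);
	snprintf(word, sizeof(word), "%s%s%s", file_setting, dir_setting, "y");
	return word[0] != 'n';
}
```

Because only the first character counts, a per-file setting overrides the per-directory one, and leaving both unset means the file is instrumented.
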
> +
> #
> # Enable address sanitizer flags for kernel except some files or directories
> # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
> --
> 2.30.0.296.g2bfb1c46d8-goog
>


Attachments:
profile_clang-pgo.sh (1.94 kB)

2021-01-21 22:46:52

by Sedat Dilek

Subject: Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

On Thu, Jan 21, 2021 at 3:03 AM Sedat Dilek <[email protected]> wrote:
>
> On Mon, Jan 18, 2021 at 10:56 PM Bill Wendling <[email protected]> wrote:
> >
> > On Mon, Jan 18, 2021 at 9:26 AM Sedat Dilek <[email protected]> wrote:
> > >
> > > On Mon, Jan 18, 2021 at 1:39 PM Sedat Dilek <[email protected]> wrote:
> > > >
> > > > On Mon, Jan 18, 2021 at 3:32 AM Bill Wendling <[email protected]> wrote:
> > > > >
> > > > > On Sun, Jan 17, 2021 at 4:27 PM Sedat Dilek <[email protected]> wrote:
> > > > > >
> > > > > > [ big snip ]
> > > > >
> > > > > [More snippage.]
> > > > >
> > > > > > [ CC Fangrui ]
> > > > > >
> > > > > > With the attached...
> > > > > >
> > > > > > [PATCH v3] module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for
> > > > > > undefined symbols
> > > > > >
> > > > > > ...I was finally able to boot into a rebuilt PGO-optimized Linux-kernel.
> > > > > > For details see ClangBuiltLinux issue #1250 "Unknown symbol
> > > > > > _GLOBAL_OFFSET_TABLE_ loading kernel modules".
> > > > > >
> > > > > Thanks for confirming that this works with the above patch.
> > > > >
> > > > > > @ Bill Nick Sami Nathan
> > > > > >
> > > > > > 1. Can you say something about the impact of passing "LLVM_IAS=1" to make?
> > > > >
> > > > > The integrated assembler and this option are more-or-less orthogonal
> > > > > to each other. One can still use the GNU assembler with PGO. If you're
> > > > > having an issue, it may be related to ClangBuiltLinux issue #1250.
> > > > >
> > > > > > 2. Can you please try Nick's DWARF v5 support patchset v5 and
> > > > > > CONFIG_DEBUG_INFO_DWARF5=y (see attachments)?
> > > > > >
> > > > > I know Nick did several tests with PGO. He may have looked into it
> > > > > already, but we can check.
> > > > >
> > > >
> > > > Reproducible.
> > > >
> > > > LLVM_IAS=1 + DWARF5 = Not bootable
> > > >
> > > > I will try:
> > > >
> > > > LLVM_IAS=1 + DWARF4
> > > >
> > >
> > > I was not able to boot into such a built Linux-kernel.
> > >
> > PGO will have no effect on debugging data. If this is an issue with
> > DWARF, then it's likely orthogonal to the PGO patch.
> >
> > > What worked for me: DWARF2 and LLVM_IAS=1 *not* set.
> > >
> > > Of course, this could be an issue with my system's LLVM/Clang.
> > >
> > > Debian clang version
> > > 12.0.0-++20210115111113+45ef053bd709-1~exp1~20210115101809.3724
> > >
> > Please use the official clang 11.0.1 release
> > (https://releases.llvm.org/download.html), modifying the
> > kernel/pgo/Kconfig as I suggested above. The reason we specify clang
> > 12 for the minimal version is because of an issue that was recently
> > fixed.
> >
>
> I downgraded to clang-11.1.0-rc1.
> ( See attachment. )
>
> Doing the first build with PGO enabled plus DWARF5 and LLVM_IAS=1 works.
>
> But again after generating vmlinux.profdata and doing the PGO-rebuild
> - the resulting Linux-kernel does NOT boot in QEMU or on bare metal.
> With GNU/as I can boot.
>
> So this is independent of DWARF v4 or DWARF v5 (LLVM_IAS=1 and DWARF
> v2 is not allowed).
> There is something wrong (here) with passing LLVM_IAS=1 to make when
> doing the PGO-rebuild.
>
> Can someone please verify whether the PGO-rebuild with
> LLVM_IAS=1 boots or not?
>
> Thanks.
>
> - Sedat -
>
> > > Can you give me a LLVM commit-id where you had success with LLVM_IAS=1
> > > and especially CONFIG_DEBUG_INFO_DWARF5=y?
> > > Success means I was able to boot in QEMU and/or bare metal.
> > >
> > The DWARF5 patch isn't in yet, so I don't want to rely upon it too much.
> >

I passed LLVM_IAS=1 with KAFLAGS=-fprofile-use=vmlinux.profdata:

/usr/bin/perf_5.10 stat make V=1 -j4 HOSTCC=clang HOSTCXX=clang++
HOSTLD=ld.lld CC=clang LD=ld.lld PAHOLE=/opt/pahole/bin/pahole
LOCALVERSION=-2-amd64-clang11-pgo KBUILD_VERBOSE=1
KBUILD_BUILD_HOST=iniza [email protected]
KBUILD_BUILD_TIMESTAMP=2021-01-21 bindeb-pkg
KDEB_PKGVERSION=5.11.0~rc4-2~bullseye+dileks1 LLVM=1
KCFLAGS=-fprofile-use=vmlinux.profdata LLVM_IAS=1
KAFLAGS=-fprofile-use=vmlinux.profdata

The resulting Linux-kernel does not boot.

But I see in the build-log these warnings:

warning: arch/x86/platform/efi/quirks.c: Function control flow change
detected (hash mismatch) efi_arch_mem_reserve Hash = 73770966985
[-Wbackend-plugin]
warning: arch/x86/platform/efi/efi.c: Function control flow change
detected (hash mismatch) efi_attr_is_visible Hash = 57959232386
[-Wbackend-plugin]
warning: arch/x86/boot/compressed/string.c: Function control flow
change detected (hash mismatch) memcmp Hash = 12884901887
[-Wbackend-plugin]
warning: arch/x86/boot/compressed/string.c: Function control flow
change detected (hash mismatch) bcmp Hash = 12884901887
[-Wbackend-plugin]
warning: arch/x86/boot/compressed/string.c: Function control flow
change detected (hash mismatch) strcmp Hash = 44149752232
[-Wbackend-plugin]
warning: arch/x86/boot/compressed/string.c: Function control flow
change detected (hash mismatch) strnlen Hash = 29212902728
[-Wbackend-plugin]
warning: arch/x86/boot/compressed/string.c: Function control flow
change detected (hash mismatch) simple_strtoull Hash =
288230479369728480 [-Wbackend-plugin]
warning: arch/x86/boot/compressed/string.c: Function control flow
change detected (hash mismatch) strstr Hash = 76464046323
[-Wbackend-plugin]
warning: arch/x86/boot/compressed/string.c: Function control flow
change detected (hash mismatch) strchr Hash = 30948479515
[-Wbackend-plugin]
warning: arch/x86/boot/compressed/string.c: Function control flow
change detected (hash mismatch) kstrtoull Hash = 288230543187488006
[-Wbackend-plugin]

What does "Function control flow change detected (hash mismatch)" mean?
Is it related to my boot problems?

- Sedat -

2021-01-22 00:17:56

by Sedat Dilek

Subject: Re: [PATCH v6] pgo: add clang's Profile Guided Optimization infrastructure

On Thu, Jan 21, 2021 at 9:25 AM 'Bill Wendling' via Clang Built Linux
<[email protected]> wrote:
>
> From: Sami Tolvanen <[email protected]>
>
> Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> profile, the kernel is instrumented with PGO counters, a representative
> workload is run, and the raw profile data is collected from
> /sys/kernel/debug/pgo/profraw.
>
> The raw profile data must be processed by clang's "llvm-profdata" tool
> before it can be used during recompilation:
>
> $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
> Multiple raw profiles may be merged during this step.
>
> The data can now be used by the compiler:
>
> $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>
> This initial submission is restricted to x86, as that's the platform we
> know works. This restriction can be lifted once other platforms have
> been verified to work with PGO.
>
> Note that this method of profiling the kernel is clang-native, unlike
> the clang support in kernel/gcov.
>
> [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>
> Signed-off-by: Sami Tolvanen <[email protected]>
> Co-developed-by: Bill Wendling <[email protected]>
> Signed-off-by: Bill Wendling <[email protected]>
> Tested-by: Nick Desaulniers <[email protected]>

Was this v6 compile-tested?

This breaks the build with errors like this:

clang -Wp,-MMD,kernel/pgo/.fs.o.d -nostdinc -isystem
/opt/llvm-toolchain/lib/clang/11.1.0/include -I./arch/x86/include
-I./arch/x86/include/generated -I./include -I./arch/x86/include/uapi
-I./arch/x86/include/generated/uapi -I./include/uapi
-I./include/generated/uapi -include ./include/linux/kconfig.h
-include ./include/linux/compiler_types.h -D__KERNEL__
-Qunused-arguments -fmacro-prefix-map=./= -Wall -Wundef
-Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing
-fno-common -fshort-wchar -fno-PIE
-Werror=implicit-function-declaration -Werror=implicit-int
-Werror=return-type -Wno-format-security -std=gnu89
-Werror=unknown-warning-option -mno-sse -mno-mmx -mno-sse2 -mno-3dnow
-mno-avx -m64 -mno-80387 -mstack-alignment=8 -mtune=generic
-mno-red-zone -mcmodel=kernel -Wno-sign-compare
-fno-asynchronous-unwind-tables -mretpoline-external-thunk
-fno-delete-null-pointer-checks -Wno-frame-address
-Wno-address-of-packed-member -O2 -Wframe-larger-than=2048
-fstack-protector-strong -Wno-format-invalid-specifier -Wno-gnu
-mno-global-merge -Wno-unused-const-variable -g -gdwarf-5 -g -gdwarf-5
-pg -mfentry -DCC_USING_FENTRY -Wdeclaration-after-statement -Wvla
-Wno-pointer-sign -Wno-array-bounds -fno-strict-overflow
-fno-stack-check -Werror=date-time -Werror=incompatible-pointer-types
-fcf-protection=none -Wno-initializer-overrides -Wno-format
-Wno-sign-compare -Wno-format-zero-length -Wno-pointer-to-enum-cast
-Wno-tautological-constant-out-of-range-compare
-DKBUILD_MODFILE='"kernel/pgo/fs"' -DKBUILD_BASENAME='"fs"'
-DKBUILD_MODNAME='"fs"' -c -o kernel/pgo/fs.o kernel/pgo/fs.c
In file included from kernel/pgo/fs.c:27:
kernel/pgo/pgo.h:102:28: error: use of undeclared identifier
'LLVM_PRF_IPVK_LAST'
const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
^
kernel/pgo/fs.c:70:28: error: use of undeclared identifier 'LLVM_PRF_IPVK_LAST'
header->value_kind_last = LLVM_PRF_IPVK_LAST;
^
2 errors generated.
make[5]: *** [scripts/Makefile.build:279: kernel/pgo/fs.o] Error 1

- Sedat -

> ---
> v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> testing.
> - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> Song's comments.
> v3: - Added change log section based on Sedat Dilek's comments.
> v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
> own popcount implementation, based on Nick Desaulniers's comment.
> v5: - Correct padding calculation, discovered by Nathan Chancellor.
> v6: - Add better documentation about the locking scheme and other things.
> - Rename macros to better match the same macros in LLVM's source code.
> ---
> Documentation/dev-tools/index.rst | 1 +
> Documentation/dev-tools/pgo.rst | 127 +++++++++
> MAINTAINERS | 9 +
> Makefile | 3 +
> arch/Kconfig | 1 +
> arch/x86/Kconfig | 1 +
> arch/x86/boot/Makefile | 1 +
> arch/x86/boot/compressed/Makefile | 1 +
> arch/x86/crypto/Makefile | 4 +
> arch/x86/entry/vdso/Makefile | 1 +
> arch/x86/kernel/vmlinux.lds.S | 2 +
> arch/x86/platform/efi/Makefile | 1 +
> arch/x86/purgatory/Makefile | 1 +
> arch/x86/realmode/rm/Makefile | 1 +
> arch/x86/um/vdso/Makefile | 1 +
> drivers/firmware/efi/libstub/Makefile | 1 +
> include/asm-generic/vmlinux.lds.h | 44 +++
> kernel/Makefile | 1 +
> kernel/pgo/Kconfig | 35 +++
> kernel/pgo/Makefile | 5 +
> kernel/pgo/fs.c | 389 ++++++++++++++++++++++++++
> kernel/pgo/instrument.c | 189 +++++++++++++
> kernel/pgo/pgo.h | 203 ++++++++++++++
> scripts/Makefile.lib | 10 +
> 24 files changed, 1032 insertions(+)
> create mode 100644 Documentation/dev-tools/pgo.rst
> create mode 100644 kernel/pgo/Kconfig
> create mode 100644 kernel/pgo/Makefile
> create mode 100644 kernel/pgo/fs.c
> create mode 100644 kernel/pgo/instrument.c
> create mode 100644 kernel/pgo/pgo.h
>
> diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> index f7809c7b1ba9..8d6418e85806 100644
> --- a/Documentation/dev-tools/index.rst
> +++ b/Documentation/dev-tools/index.rst
> @@ -26,6 +26,7 @@ whole; patches welcome!
> kgdb
> kselftest
> kunit/index
> + pgo
>
>
> .. only:: subproject and html
> diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> new file mode 100644
> index 000000000000..b7f11d8405b7
> --- /dev/null
> +++ b/Documentation/dev-tools/pgo.rst
> @@ -0,0 +1,127 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===============================
> +Using PGO with the Linux kernel
> +===============================
> +
> +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> +when building with Clang. The profiling data is exported via the ``pgo``
> +debugfs directory.
> +
> +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> +
> +
> +Preparation
> +===========
> +
> +Configure the kernel with:
> +
> +.. code-block:: make
> +
> + CONFIG_DEBUG_FS=y
> + CONFIG_PGO_CLANG=y
> +
> +Note that kernels compiled with profiling flags will be significantly larger
> +and run slower.
> +
> +Profiling data will only become accessible once debugfs has been mounted:
> +
> +.. code-block:: sh
> +
> + mount -t debugfs none /sys/kernel/debug
> +
> +
> +Customization
> +=============
> +
> +You can enable or disable profiling for individual files and directories by
> +adding a line similar to the following to the respective kernel Makefile:
> +
> +- For a single file (e.g. main.o)
> +
> + .. code-block:: make
> +
> + PGO_PROFILE_main.o := y
> +
> +- For all files in one directory
> +
> + .. code-block:: make
> +
> + PGO_PROFILE := y
> +
> +To exclude files from being profiled use
> +
> + .. code-block:: make
> +
> + PGO_PROFILE_main.o := n
> +
> +and
> +
> + .. code-block:: make
> +
> + PGO_PROFILE := n
> +
> +Only files which are linked to the main kernel image or are compiled as kernel
> +modules are supported by this mechanism.
> +
> +
> +Files
> +=====
> +
> +The PGO kernel support creates the following files in debugfs:
> +
> +``/sys/kernel/debug/pgo``
> + Parent directory for all PGO-related files.
> +
> +``/sys/kernel/debug/pgo/reset``
> + Global reset file: resets all coverage data to zero when written to.
> +
> +``/sys/kernel/debug/pgo/profraw``
> + The raw PGO data that must be processed with ``llvm-profdata``.
> +
> +
> +Workflow
> +========
> +
> +The PGO kernel can be run on the host or test machines. However, the data
> +should be analyzed with Clang's tools from the same Clang version that was
> +used to compile the kernel. Clang is tolerant of version skew, but it is
> +easier to use the same Clang version.
> +
> +The profiling data is useful for optimizing the kernel, analyzing coverage,
> +etc. Clang offers tools to perform these tasks.
> +
> +Here is an example workflow for profiling an instrumented kernel with PGO and
> +using the result to optimize the kernel:
> +
> +1) Install the kernel on the TEST machine.
> +
> +2) Reset the data counters right before running the load tests
> +
> + .. code-block:: sh
> +
> + $ echo 1 > /sys/kernel/debug/pgo/reset
> +
> +3) Run the load tests.
> +
> +4) Collect the raw profile data
> +
> + .. code-block:: sh
> +
> + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> +
> +5) (Optional) Download the raw profile data to the HOST machine.
> +
> +6) Process the raw profile data
> +
> + .. code-block:: sh
> +
> + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> +
> + Note that multiple raw profile data files can be merged during this step.
> +
> +7) Rebuild the kernel using the profile data (PGO disabled)
> +
> + .. code-block:: sh
> +
> + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 00836f6452f0..13333026e140 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -13948,6 +13948,15 @@ S: Maintained
> F: include/linux/personality.h
> F: include/uapi/linux/personality.h
>
> +PGO BASED KERNEL PROFILING
> +M: Sami Tolvanen <[email protected]>
> +M: Bill Wendling <[email protected]>
> +R: Nathan Chancellor <[email protected]>
> +R: Nick Desaulniers <[email protected]>
> +S: Supported
> +F: Documentation/dev-tools/pgo.rst
> +F: kernel/pgo
> +
> PHOENIX RC FLIGHT CONTROLLER ADAPTER
> M: Marcus Folkesson <[email protected]>
> L: [email protected]
> diff --git a/Makefile b/Makefile
> index b0e4767735dc..9339541f7cec 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
> # Defaults to vmlinux, but the arch makefile usually adds further targets
> all: vmlinux
>
> +CFLAGS_PGO_CLANG := -fprofile-generate
> +export CFLAGS_PGO_CLANG
> +
> CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
> $(call cc-option,-fno-tree-loop-im) \
> $(call cc-disable-warning,maybe-uninitialized,)
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 24862d15f3a3..f39d3991f6bf 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
> pairs of 32-bit arguments, select this option.
>
> source "kernel/gcov/Kconfig"
> +source "kernel/pgo/Kconfig"
>
> source "scripts/gcc-plugins/Kconfig"
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 21f851179ff0..36305ea61dc0 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -96,6 +96,7 @@ config X86
> select ARCH_SUPPORTS_DEBUG_PAGEALLOC
> select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
> select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
> + select ARCH_SUPPORTS_PGO_CLANG if X86_64
> select ARCH_USE_BUILTIN_BSWAP
> select ARCH_USE_QUEUED_RWLOCKS
> select ARCH_USE_QUEUED_SPINLOCKS
> diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> index fe605205b4ce..383853e32f67 100644
> --- a/arch/x86/boot/Makefile
> +++ b/arch/x86/boot/Makefile
> @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
>
> $(obj)/bzImage: asflags-y := $(SVGA_MODE)
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index e0bc3988c3fa..ed12ab65f606 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
>
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE :=n
>
> KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
> diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> index a31de0c6ccde..5753aea7bcbd 100644
> --- a/arch/x86/crypto/Makefile
> +++ b/arch/x86/crypto/Makefile
> @@ -4,6 +4,10 @@
>
> OBJECT_FILES_NON_STANDARD := y
>
> +# Disable PGO for curve25519-x86_64. With PGO enabled, clang runs out of
> +# registers for some of the functions.
> +PGO_PROFILE_curve25519-x86_64.o := n
> +
> obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o
>
> obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index 02e3e42f380b..26e2b3af0145 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO $@
> VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
> $(call ld-option, --eh-frame-hdr) -Bsymbolic
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> quiet_cmd_vdso_and_check = VDSO $@
> cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index efd9e9ea17f2..f6cab2316c46 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -184,6 +184,8 @@ SECTIONS
>
> BUG_TABLE
>
> + PGO_CLANG_DATA
> +
> ORC_UNWIND_TABLE
>
> . = ALIGN(PAGE_SIZE);
> diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
> index 84b09c230cbd..5f22b31446ad 100644
> --- a/arch/x86/platform/efi/Makefile
> +++ b/arch/x86/platform/efi/Makefile
> @@ -2,6 +2,7 @@
> OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
> KASAN_SANITIZE := n
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
> obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
> diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> index 95ea17a9d20c..36f20e99da0b 100644
> --- a/arch/x86/purgatory/Makefile
> +++ b/arch/x86/purgatory/Makefile
> @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
>
> # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> KASAN_SANITIZE := n
> UBSAN_SANITIZE := n
> KCSAN_SANITIZE := n
> diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> index 83f1b6a56449..21797192f958 100644
> --- a/arch/x86/realmode/rm/Makefile
> +++ b/arch/x86/realmode/rm/Makefile
> @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
> diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
> index 5943387e3f35..54f5768f5853 100644
> --- a/arch/x86/um/vdso/Makefile
> +++ b/arch/x86/um/vdso/Makefile
> @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@
>
> VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> #
> # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index 8a94388e38b3..2d81623b33f2 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -40,6 +40,7 @@ KBUILD_CFLAGS := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
> KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
>
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> # Sanitizer runtimes are unavailable and cannot be linked here.
> KASAN_SANITIZE := n
> KCSAN_SANITIZE := n
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index b2b3d81b1535..3a591bb18c5f 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -316,6 +316,49 @@
> #define THERMAL_TABLE(name)
> #endif
>
> +#ifdef CONFIG_PGO_CLANG
> +#define PGO_CLANG_DATA \
> + __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_start = .; \
> + __llvm_prf_data_start = .; \
> + KEEP(*(__llvm_prf_data)) \
> + . = ALIGN(8); \
> + __llvm_prf_data_end = .; \
> + } \
> + __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_cnts_start = .; \
> + KEEP(*(__llvm_prf_cnts)) \
> + . = ALIGN(8); \
> + __llvm_prf_cnts_end = .; \
> + } \
> + __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_names_start = .; \
> + KEEP(*(__llvm_prf_names)) \
> + . = ALIGN(8); \
> + __llvm_prf_names_end = .; \
> + . = ALIGN(8); \
> + } \
> + __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
> + __llvm_prf_vals_start = .; \
> + KEEP(*(__llvm_prf_vals)) \
> + . = ALIGN(8); \
> + __llvm_prf_vals_end = .; \
> + . = ALIGN(8); \
> + } \
> + __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
> + __llvm_prf_vnds_start = .; \
> + KEEP(*(__llvm_prf_vnds)) \
> + . = ALIGN(8); \
> + __llvm_prf_vnds_end = .; \
> + __llvm_prf_end = .; \
> + }
> +#else
> +#define PGO_CLANG_DATA
> +#endif
> +
> #define KERNEL_DTB() \
> STRUCT_ALIGN(); \
> __dtb_start = .; \
> @@ -1125,6 +1168,7 @@
> CONSTRUCTORS \
> } \
> BUG_TABLE \
> + PGO_CLANG_DATA
>
> #define INIT_TEXT_SECTION(inittext_align) \
> . = ALIGN(inittext_align); \
> diff --git a/kernel/Makefile b/kernel/Makefile
> index aa7368c7eabf..0b34ca228ba4 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
> obj-$(CONFIG_KCSAN) += kcsan/
> obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
> obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
> +obj-$(CONFIG_PGO_CLANG) += pgo/
>
> obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
> new file mode 100644
> index 000000000000..76a640b6cf6e
> --- /dev/null
> +++ b/kernel/pgo/Kconfig
> @@ -0,0 +1,35 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
> +
> +config ARCH_SUPPORTS_PGO_CLANG
> + bool
> +
> +config PGO_CLANG
> + bool "Enable clang's PGO-based kernel profiling"
> + depends on DEBUG_FS
> + depends on ARCH_SUPPORTS_PGO_CLANG
> + depends on CC_IS_CLANG && CLANG_VERSION >= 120000
> + help
> + This option enables clang's PGO (Profile Guided Optimization) based
> + code profiling to better optimize the kernel.
> +
> + If unsure, say N.
> +
> + Run a representative workload for your application on a kernel
> + compiled with this option and download the raw profile file from
> + /sys/kernel/debug/pgo/profraw. This file needs to be processed with
> + llvm-profdata. It may be merged with other collected raw profiles.
> +
> + Copy the resulting profile file into vmlinux.profdata, and enable
> + KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
> + kernel.
> +
> + Note that a kernel compiled with profiling flags will be
> + significantly larger and run slower. Also be sure to exclude files
> + from profiling which are not linked to the kernel image to prevent
> + linker errors.
> +
> + Note that the debugfs filesystem has to be mounted to access
> + profiling data.
> +
> +endmenu
> diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
> new file mode 100644
> index 000000000000..41e27cefd9a4
> --- /dev/null
> +++ b/kernel/pgo/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0
> +GCOV_PROFILE := n
> +PGO_PROFILE := n
> +
> +obj-y += fs.o instrument.o
> diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
> new file mode 100644
> index 000000000000..132ff2ab3feb
> --- /dev/null
> +++ b/kernel/pgo/fs.c
> @@ -0,0 +1,389 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt) "pgo: " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/debugfs.h>
> +#include <linux/fs.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/vmalloc.h>
> +#include "pgo.h"
> +
> +static struct dentry *directory;
> +
> +struct prf_private_data {
> + void *buffer;
> + unsigned long size;
> +};
> +
> +/*
> + * Raw profile data format:
> + *
> + * - llvm_prf_header
> + * - __llvm_prf_data
> + * - __llvm_prf_cnts
> + * - __llvm_prf_names
> + * - zero padding to 8 bytes
> + * - for each llvm_prf_data in __llvm_prf_data:
> + * - llvm_prf_value_data
> + * - llvm_prf_value_record + site count array
> + * - llvm_prf_value_node_data
> + * ...
> + * ...
> + * ...
> + */
> +
> +static void prf_fill_header(void **buffer)
> +{
> + struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
> +
> +#ifdef CONFIG_64BIT
> + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_64;
> +#else
> + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_32;
> +#endif
> + header->version = LLVM_VARIANT_MASK_IR_PROF | LLVM_INSTR_PROF_RAW_VERSION;
> + header->data_size = prf_data_count();
> + header->padding_bytes_before_counters = 0;
> + header->counters_size = prf_cnts_count();
> + header->padding_bytes_after_counters = 0;
> + header->names_size = prf_names_count();
> + header->counters_delta = (u64)__llvm_prf_cnts_start;
> + header->names_delta = (u64)__llvm_prf_names_start;
> + header->value_kind_last = LLVM_PRF_IPVK_LAST;
> +
> + *buffer += sizeof(*header);
> +}
> +
> +/*
> + * Copy the source into the buffer, incrementing the pointer into buffer in the
> + * process.
> + */
> +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
> +{
> + memcpy(*buffer, src, size);
> + *buffer += size;
> +}
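
prf_copy_to_buffer() is a "write cursor" helper: the caller passes the address of its position pointer and the helper moves it forward after each copy. Here is a userspace sketch of the same pattern. It uses unsigned char * instead of the kernel's void * arithmetic, which is a GNU extension, and demo_copy_to_buffer() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy size bytes to *cursor, then advance *cursor past them. */
static void demo_copy_to_buffer(unsigned char **cursor, const void *src,
				size_t size)
{
	assert(cursor && *cursor && src);
	memcpy(*cursor, src, size);
	*cursor += size;
}
```

Successive calls append back to back, so the caller never has to track offsets itself. The helper does no bounds checking, so the buffer must have been sized up front, which is the job of prf_buffer_size() in the patch.
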
> +
> +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
> +{
> + struct llvm_prf_value_node **nodes =
> + (struct llvm_prf_value_node **)p->values;
> + u32 kinds = 0;
> + u32 size = 0;
> + unsigned int kind;
> + unsigned int n;
> + unsigned int s = 0;
> +
> + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> + unsigned int sites = p->num_value_sites[kind];
> +
> + if (!sites)
> + continue;
> +
> + /* Record + site count array */
> + size += prf_get_value_record_size(sites);
> + kinds++;
> +
> + if (!nodes)
> + continue;
> +
> + for (n = 0; n < sites; n++) {
> + u32 count = 0;
> + struct llvm_prf_value_node *site = nodes[s + n];
> +
> + while (site && ++count <= U8_MAX)
> + site = site->next;
> +
> + size += count *
> + sizeof(struct llvm_prf_value_node_data);
> + }
> +
> + s += sites;
> + }
> +
> + if (size)
> + size += sizeof(struct llvm_prf_value_data);
> +
> + if (value_kinds)
> + *value_kinds = kinds;
> +
> + return size;
> +}
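
The inner loop counts the nodes of one value site, with a cap so the result can be stored in a u8 site count. Here is a userspace sketch of the same loop shape. struct demo_node and demo_count_capped() are hypothetical, and UINT8_MAX stands in for the kernel's U8_MAX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static_assert(UINT8_MAX == 255, "cap matches a u8 site count");

/* Hypothetical singly linked node; only the link matters here. */
struct demo_node {
	struct demo_node *next;
};

/* Same loop shape as above: stop at the end of the list or at the cap. */
static uint32_t demo_count_capped(const struct demo_node *site)
{
	uint32_t count = 0;

	while (site && ++count <= UINT8_MAX)
		site = site->next;

	return count;
}
```

One thing the sketch makes visible: for a list longer than 255 nodes the counter ends at 256, not 255, because the increment happens before the comparison fails. Whether a value site can ever hold that many nodes in practice is not something this thread establishes.
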
> +
> +static u32 prf_get_value_size(void)
> +{
> + u32 size = 0;
> + struct llvm_prf_data *p;
> +
> + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> + size += __prf_get_value_size(p, NULL);
> +
> + return size;
> +}
> +
> +/* Serialize the profiling's value. */
> +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
> +{
> + struct llvm_prf_value_data header;
> + struct llvm_prf_value_node **nodes =
> + (struct llvm_prf_value_node **)p->values;
> + unsigned int kind;
> + unsigned int n;
> + unsigned int s = 0;
> +
> + header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
> +
> + if (!header.num_value_kinds)
> + /* Nothing to write. */
> + return;
> +
> + prf_copy_to_buffer(buffer, &header, sizeof(header));
> +
> + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> + struct llvm_prf_value_record *record;
> + u8 *counts;
> + unsigned int sites = p->num_value_sites[kind];
> +
> + if (!sites)
> + continue;
> +
> + /* Profiling value record. */
> + record = *(struct llvm_prf_value_record **)buffer;
> + *buffer += prf_get_value_record_header_size();
> +
> + record->kind = kind;
> + record->num_value_sites = sites;
> +
> + /* Site count array. */
> + counts = *(u8 **)buffer;
> + *buffer += prf_get_value_record_site_count_size(sites);
> +
> + /*
> + * If we don't have nodes, we can skip updating the site count
> + * array, because the buffer is zero filled.
> + */
> + if (!nodes)
> + continue;
> +
> + for (n = 0; n < sites; n++) {
> + u32 count = 0;
> + struct llvm_prf_value_node *site = nodes[s + n];
> +
> + while (site && ++count <= U8_MAX) {
> + prf_copy_to_buffer(buffer, site,
> + sizeof(struct llvm_prf_value_node_data));
> + site = site->next;
> + }
> +
> + counts[n] = (u8)count;
> + }
> +
> + s += sites;
> + }
> +}
> +
> +static void prf_serialize_values(void **buffer)
> +{
> + struct llvm_prf_data *p;
> +
> + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> + prf_serialize_value(p, buffer);
> +}
> +
> +static inline unsigned long prf_get_padding(unsigned long size)
> +{
> + return 7 & (sizeof(u64) - size % sizeof(u64));
> +}
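
The padding helper above returns the number of bytes needed to reach the next 8-byte boundary, and 0 when the size is already aligned (the v5 change log entry mentions a corrected padding calculation, presumably this one). A minimal userspace sketch of the same arithmetic, assuming uint64_t in place of the kernel's u64; the name pad_to_u64 is hypothetical and not part of the patch:

```c
#include <stdint.h>

/*
 * Hypothetical userspace copy of the arithmetic in prf_get_padding():
 * bytes of padding needed so that `size` becomes a multiple of 8.
 * The "7 &" mask turns the value 8 (already aligned) into 0.
 */
static unsigned long pad_to_u64(unsigned long size)
{
	return 7 & (sizeof(uint64_t) - size % sizeof(uint64_t));
}
```

For example, a 13-byte names section would get 3 bytes of padding, while an 8-byte or empty one would get none.
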
> +
> +static unsigned long prf_buffer_size(void)
> +{
> + return sizeof(struct llvm_prf_header) +
> + prf_data_size() +
> + prf_cnts_size() +
> + prf_names_size() +
> + prf_get_padding(prf_names_size()) +
> + prf_get_value_size();
> +}
> +
> +/*
> + * Serialize the profiling data into a format LLVM's tools can understand.
> + * Note: caller *must* hold pgo_lock.
> + */
> +static int prf_serialize(struct prf_private_data *p)
> +{
> + int err = 0;
> + void *buffer;
> +
> + p->size = prf_buffer_size();
> + p->buffer = vzalloc(p->size);
> +
> + if (!p->buffer) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + buffer = p->buffer;
> +
> + prf_fill_header(&buffer);
> + prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
> + prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
> + prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
> + buffer += prf_get_padding(prf_names_size());
> +
> + prf_serialize_values(&buffer);
> +
> +out:
> + return err;
> +}
> +
> +/* open() implementation for PGO. Creates a copy of the profiling data set. */
> +static int prf_open(struct inode *inode, struct file *file)
> +{
> + struct prf_private_data *data;
> + unsigned long flags;
> + int err;
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL);
> + if (!data) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + flags = prf_lock();
> +
> + err = prf_serialize(data);
> + if (unlikely(err)) {
> + kfree(data);
> + goto out_unlock;
> + }
> +
> + file->private_data = data;
> +
> +out_unlock:
> + prf_unlock(flags);
> +out:
> + return err;
> +}
> +
> +/* read() implementation for PGO. */
> +static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
> + loff_t *ppos)
> +{
> + struct prf_private_data *data = file->private_data;
> +
> + BUG_ON(!data);
> +
> + return simple_read_from_buffer(buf, count, ppos, data->buffer,
> + data->size);
> +}
> +
> +/* release() implementation for PGO. Release resources allocated by open(). */
> +static int prf_release(struct inode *inode, struct file *file)
> +{
> + struct prf_private_data *data = file->private_data;
> +
> + if (data) {
> + vfree(data->buffer);
> + kfree(data);
> + }
> +
> + return 0;
> +}
> +
> +static const struct file_operations prf_fops = {
> + .owner = THIS_MODULE,
> + .open = prf_open,
> + .read = prf_read,
> + .llseek = default_llseek,
> + .release = prf_release
> +};
> +
> +/* write() implementation for resetting PGO's profile data. */
> +static ssize_t reset_write(struct file *file, const char __user *addr,
> + size_t len, loff_t *pos)
> +{
> + struct llvm_prf_data *data;
> +
> + memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
> +
> + for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; data++) {
> + struct llvm_prf_value_node **vnodes;
> + u64 current_vsite_count;
> + u32 i;
> +
> + if (!data->values)
> + continue;
> +
> + current_vsite_count = 0;
> + vnodes = (struct llvm_prf_value_node **)data->values;
> +
> + for (i = LLVM_INSTR_PROF_IPVK_FIRST; i <= LLVM_INSTR_PROF_IPVK_LAST; i++)
> + current_vsite_count += data->num_value_sites[i];
> +
> + for (i = 0; i < current_vsite_count; i++) {
> + struct llvm_prf_value_node *current_vnode = vnodes[i];
> +
> + while (current_vnode) {
> + current_vnode->count = 0;
> + current_vnode = current_vnode->next;
> + }
> + }
> + }
> +
> + return len;
> +}
> +
> +static const struct file_operations prf_reset_fops = {
> + .owner = THIS_MODULE,
> + .write = reset_write,
> + .llseek = noop_llseek,
> +};
> +
> +/* Create debugfs entries. */
> +static int __init pgo_init(void)
> +{
> + directory = debugfs_create_dir("pgo", NULL);
> + if (!directory)
> + goto err_remove;
> +
> + if (!debugfs_create_file("profraw", 0600, directory, NULL,
> + &prf_fops))
> + goto err_remove;
> +
> + if (!debugfs_create_file("reset", 0200, directory, NULL,
> + &prf_reset_fops))
> + goto err_remove;
> +
> + return 0;
> +
> +err_remove:
> + pr_err("initialization failed\n");
> + return -EIO;
> +}
> +
> +/* Remove debugfs entries. */
> +static void __exit pgo_exit(void)
> +{
> + debugfs_remove_recursive(directory);
> +}
> +
> +module_init(pgo_init);
> +module_exit(pgo_exit);
> diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
> new file mode 100644
> index 000000000000..62ff5cfce7b1
> --- /dev/null
> +++ b/kernel/pgo/instrument.c
> @@ -0,0 +1,189 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt) "pgo: " fmt
> +
> +#include <linux/bitops.h>
> +#include <linux/kernel.h>
> +#include <linux/export.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +#include "pgo.h"
> +
> +/*
> + * This lock guards both profile count updating and serialization of the
> + * profiling data. Keeping both of these activities separate via locking
> + * ensures that we don't try to serialize data that's only partially updated.
> + */
> +static DEFINE_SPINLOCK(pgo_lock);
> +static int current_node;
> +
> +unsigned long prf_lock(void)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&pgo_lock, flags);
> +
> + return flags;
> +}
> +
> +void prf_unlock(unsigned long flags)
> +{
> + spin_unlock_irqrestore(&pgo_lock, flags);
> +}
> +
> +/*
> + * Return a newly allocated profiling value node which contains the value
> + * tracked by the value profiler.
> + * Note: caller *must* hold pgo_lock.
> + */
> +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
> + u32 index, u64 value)
> +{
> + if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
> + return NULL; /* Out of nodes */
> +
> + current_node++;
> +
> + /* Make sure the node is entirely within the section */
> + if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
> + &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
> + return NULL;
> +
> + return &__llvm_prf_vnds_start[current_node];
> +}
> +
> +/*
> + * Counts the number of times a target value is seen.
> + *
> + * Records the target value for the index if not seen before. Otherwise,
> + * increments the counter associated w/ the target value.
> + */
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
> +{
> + struct llvm_prf_data *p = (struct llvm_prf_data *)data;
> + struct llvm_prf_value_node **counters;
> + struct llvm_prf_value_node *curr;
> + struct llvm_prf_value_node *min = NULL;
> + struct llvm_prf_value_node *prev = NULL;
> + u64 min_count = U64_MAX;
> + u8 values = 0;
> + unsigned long flags;
> +
> + if (!p || !p->values)
> + return;
> +
> + counters = (struct llvm_prf_value_node **)p->values;
> + curr = counters[index];
> +
> + while (curr) {
> + if (target_value == curr->value) {
> + curr->count++;
> + return;
> + }
> +
> + if (curr->count < min_count) {
> + min_count = curr->count;
> + min = curr;
> + }
> +
> + prev = curr;
> + curr = curr->next;
> + values++;
> + }
> +
> + if (values >= LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE) {
> + if (!min->count || !(--min->count)) {
> + curr = min;
> + curr->value = target_value;
> + curr->count++;
> + }
> + return;
> + }
> +
> + /* Lock when updating the value node structure. */
> + flags = prf_lock();
> +
> + curr = allocate_node(p, index, target_value);
> + if (!curr)
> + goto out;
> +
> + curr->value = target_value;
> + curr->count++;
> +
> + if (!counters[index])
> + counters[index] = curr;
> + else if (prev && !prev->next)
> + prev->next = curr;
> +
> +out:
> + prf_unlock(flags);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_target);
> +
> +/* Counts the number of times a range of target values is seen. */
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> + u32 index, s64 precise_start,
> + s64 precise_last, s64 large_value);
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> + u32 index, s64 precise_start,
> + s64 precise_last, s64 large_value)
> +{
> + if (large_value != S64_MIN && (s64)target_value >= large_value)
> + target_value = large_value;
> + else if ((s64)target_value < precise_start ||
> + (s64)target_value > precise_last)
> + target_value = precise_last + 1;
> +
> + __llvm_profile_instrument_target(target_value, data, index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_range);
> +
> +static u64 inst_prof_get_range_rep_value(u64 value)
> +{
> + if (value <= 8)
> + /* The first ranges are individually tracked, use it as is. */
> + return value;
> + else if (value >= 513)
> + /* The last range is mapped to its lowest value. */
> + return 513;
> + else if (hweight64(value) == 1)
> + /* If it's a power of two, use it as is. */
> + return value;
> +
> + /* Otherwise, map to the previous power of two + 1. */
> + return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
> +}
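
To make the range mapping above easier to follow, here is a minimal userspace sketch of the same logic. It assumes a GCC or Clang compiler for the builtins, uses __builtin_popcountll as a stand-in for the kernel's hweight64(), and the name range_rep_value is hypothetical:

```c
#include <stdint.h>

/*
 * Hypothetical userspace copy of inst_prof_get_range_rep_value():
 * map a memop size to the representative value of its range.
 */
static uint64_t range_rep_value(uint64_t value)
{
	if (value <= 8)
		return value;		/* 0..8 are tracked individually */
	if (value >= 513)
		return 513;		/* everything above 512 shares one bucket */
	if (__builtin_popcountll(value) == 1)
		return value;		/* exact powers of two keep their value */

	/* Otherwise: previous power of two, plus one. */
	return (UINT64_C(1) << (64 - __builtin_clzll(value) - 1)) + 1;
}
```

Under this mapping, 15 lands in the bucket represented by 9, 100 in the bucket represented by 65, and 4096 in the final bucket represented by 513.
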
> +
> +/*
> + * The target values are partitioned into multiple ranges. The range spec is
> + * defined in compiler-rt/include/profile/InstrProfData.inc.
> + */
> +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> + u32 counter_index);
> +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> + u32 counter_index)
> +{
> + u64 rep_value;
> +
> + /* Map the target value to the representative value of its range. */
> + rep_value = inst_prof_get_range_rep_value(target_value);
> + __llvm_profile_instrument_target(rep_value, data, counter_index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_memop);
> diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
> new file mode 100644
> index 000000000000..0d33e07a0bf3
> --- /dev/null
> +++ b/kernel/pgo/pgo.h
> @@ -0,0 +1,203 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#ifndef _PGO_H
> +#define _PGO_H
> +
> +/*
> + * Note: These internal LLVM definitions must match the compiler version.
> + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
> + */
> +
> +#define LLVM_INSTR_PROF_RAW_MAGIC_64 \
> + ((u64)255 << 56 | \
> + (u64)'l' << 48 | \
> + (u64)'p' << 40 | \
> + (u64)'r' << 32 | \
> + (u64)'o' << 24 | \
> + (u64)'f' << 16 | \
> + (u64)'r' << 8 | \
> + (u64)129)
> +#define LLVM_INSTR_PROF_RAW_MAGIC_32 \
> + ((u64)255 << 56 | \
> + (u64)'l' << 48 | \
> + (u64)'p' << 40 | \
> + (u64)'r' << 32 | \
> + (u64)'o' << 24 | \
> + (u64)'f' << 16 | \
> + (u64)'R' << 8 | \
> + (u64)129)
> +
> +#define LLVM_INSTR_PROF_RAW_VERSION 5
> +#define LLVM_INSTR_PROF_DATA_ALIGNMENT 8
> +#define LLVM_INSTR_PROF_IPVK_FIRST 0
> +#define LLVM_INSTR_PROF_IPVK_LAST 1
> +#define LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE 255
> +
> +#define LLVM_VARIANT_MASK_IR_PROF (0x1ULL << 56)
> +#define LLVM_VARIANT_MASK_CSIR_PROF (0x1ULL << 57)
> +
> +/**
> + * struct llvm_prf_header - represents the raw profile header data structure.
> + * @magic: the magic token for the file format.
> + * @version: the version of the file format.
> + * @data_size: the number of entries in the profile data section.
> + * @padding_bytes_before_counters: the number of padding bytes before the
> + * counters.
> + * @counters_size: the size in bytes of the LLVM profile section containing the
> + * counters.
> + * @padding_bytes_after_counters: the number of padding bytes after the
> + * counters.
> + * @names_size: the size in bytes of the LLVM profile section containing the
> + * counters' names.
> + * @counters_delta: the beginning of the LLVM profile counters section.
> + * @names_delta: the beginning of the LLVM profile names section.
> + * @value_kind_last: the last profile value kind.
> + */
> +struct llvm_prf_header {
> + u64 magic;
> + u64 version;
> + u64 data_size;
> + u64 padding_bytes_before_counters;
> + u64 counters_size;
> + u64 padding_bytes_after_counters;
> + u64 names_size;
> + u64 counters_delta;
> + u64 names_delta;
> + u64 value_kind_last;
> +};
> +
> +/**
> + * struct llvm_prf_data - represents the per-function control structure.
> + * @name_ref: the reference to the function's name.
> + * @func_hash: the hash value of the function.
> + * @counter_ptr: a pointer to the profile counter.
> + * @function_ptr: a pointer to the function.
> + * @values: the profiling values associated with this function.
> + * @num_counters: the number of counters in the function.
> + * @num_value_sites: the number of value profile sites.
> + */
> +struct llvm_prf_data {
> + const u64 name_ref;
> + const u64 func_hash;
> + const void *counter_ptr;
> + const void *function_ptr;
> + void *values;
> + const u32 num_counters;
> + const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
> +} __aligned(LLVM_INSTR_PROF_DATA_ALIGNMENT);
> +
> +/**
> + * struct llvm_prf_value_node_data - represents the data part of the struct
> + * llvm_prf_value_node data structure.
> + * @value: the value counters.
> + * @count: the counters' count.
> + */
> +struct llvm_prf_value_node_data {
> + u64 value;
> + u64 count;
> +};
> +
> +/**
> + * struct llvm_prf_value_node - represents an internal data structure used by
> + * the value profiler.
> + * @value: the value counters.
> + * @count: the counters' count.
> + * @next: the next value node.
> + */
> +struct llvm_prf_value_node {
> + u64 value;
> + u64 count;
> + struct llvm_prf_value_node *next;
> +};
> +
> +/**
> + * struct llvm_prf_value_data - represents the value profiling data in indexed
> + * format.
> + * @total_size: the total size in bytes including this field.
> + * @num_value_kinds: the number of value profile kinds that have value profile
> + * data.
> + */
> +struct llvm_prf_value_data {
> + u32 total_size;
> + u32 num_value_kinds;
> +};
> +
> +/**
> + * struct llvm_prf_value_record - represents the on-disk layout of the value
> + * profile data of a particular kind for one function.
> + * @kind: the kind of the value profile record.
> + * @num_value_sites: the number of value profile sites.
> + * @site_count_array: the first element of the array that stores the number
> + * of profiled values for each value site.
> + */
> +struct llvm_prf_value_record {
> + u32 kind;
> + u32 num_value_sites;
> + u8 site_count_array[];
> +};
> +
> +#define prf_get_value_record_header_size() \
> + offsetof(struct llvm_prf_value_record, site_count_array)
> +#define prf_get_value_record_site_count_size(sites) \
> + roundup((sites), 8)
> +#define prf_get_value_record_size(sites) \
> + (prf_get_value_record_header_size() + \
> + prf_get_value_record_site_count_size((sites)))
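
The three size macros above describe a record as a fixed 8-byte header followed by one count byte per value site, with the count array rounded up to a multiple of 8. A minimal userspace sketch of that arithmetic, assuming fixed-width types from stdint.h in place of u32/u8; the struct and function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of struct llvm_prf_value_record. */
struct value_record_sketch {
	uint32_t kind;
	uint32_t num_value_sites;
	uint8_t site_count_array[];	/* one count byte per value site */
};

/* Size of the fixed part that precedes the site count array. */
static size_t record_header_size(void)
{
	return offsetof(struct value_record_sketch, site_count_array);
}

/* Site count array size, rounded up to a multiple of 8 bytes. */
static size_t site_count_size(size_t sites)
{
	return (sites + 7) / 8 * 8;
}

/* Header plus padded site count array. */
static size_t record_size(size_t sites)
{
	return record_header_size() + site_count_size(sites);
}
```

So a record with 1 to 8 sites would occupy 16 bytes, and a ninth site would push it to 24.
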
> +
> +/* Data sections */
> +extern struct llvm_prf_data __llvm_prf_data_start[];
> +extern struct llvm_prf_data __llvm_prf_data_end[];
> +
> +extern u64 __llvm_prf_cnts_start[];
> +extern u64 __llvm_prf_cnts_end[];
> +
> +extern char __llvm_prf_names_start[];
> +extern char __llvm_prf_names_end[];
> +
> +extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
> +extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
> +
> +/* Locking for vnodes */
> +extern unsigned long prf_lock(void);
> +extern void prf_unlock(unsigned long flags);
> +
> +#define __DEFINE_PRF_SIZE(s) \
> + static inline unsigned long prf_ ## s ## _size(void) \
> + { \
> + unsigned long start = \
> + (unsigned long)__llvm_prf_ ## s ## _start; \
> + unsigned long end = \
> + (unsigned long)__llvm_prf_ ## s ## _end; \
> + return roundup(end - start, \
> + sizeof(__llvm_prf_ ## s ## _start[0])); \
> + } \
> + static inline unsigned long prf_ ## s ## _count(void) \
> + { \
> + return prf_ ## s ## _size() / \
> + sizeof(__llvm_prf_ ## s ## _start[0]); \
> + }
> +
> +__DEFINE_PRF_SIZE(data);
> +__DEFINE_PRF_SIZE(cnts);
> +__DEFINE_PRF_SIZE(names);
> +__DEFINE_PRF_SIZE(vnds);
> +
> +#undef __DEFINE_PRF_SIZE
> +
> +#endif /* _PGO_H */
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 213677a5ed33..9b218afb5cb8 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
> $(CFLAGS_GCOV))
> endif
>
> +#
> +# Enable clang's PGO profiling flags for a file or directory depending on
> +# variables PGO_PROFILE_obj.o and PGO_PROFILE.
> +#
> +ifeq ($(CONFIG_PGO_CLANG),y)
> +_c_flags += $(if $(patsubst n%,, \
> + $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
> + $(CFLAGS_PGO_CLANG))
> +endif
> +
> #
> # Enable address sanitizer flags for kernel except some files or directories
> # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
> --
> 2.30.0.296.g2bfb1c46d8-goog
>

2021-01-22 01:13:13

by Sedat Dilek

Subject: Re: [PATCH v6] pgo: add clang's Profile Guided Optimization infrastructure

On Fri, Jan 22, 2021 at 1:14 AM Sedat Dilek <[email protected]> wrote:
>
> On Thu, Jan 21, 2021 at 9:25 AM 'Bill Wendling' via Clang Built Linux
> <[email protected]> wrote:
> >
> > From: Sami Tolvanen <[email protected]>
> >
> > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > profile, the kernel is instrumented with PGO counters, a representative
> > workload is run, and the raw profile data is collected from
> > /sys/kernel/debug/pgo/profraw.
> >
> > The raw profile data must be processed by clang's "llvm-profdata" tool
> > before it can be used during recompilation:
> >
> > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >
> > Multiple raw profiles may be merged during this step.
> >
> > The data can now be used by the compiler:
> >
> > $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> >
> > This initial submission is restricted to x86, as that's the platform we
> > know works. This restriction can be lifted once other platforms have
> > been verified to work with PGO.
> >
> > Note that this method of profiling the kernel is clang-native, unlike
> > the clang support in kernel/gcov.
> >
> > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> >
> > Signed-off-by: Sami Tolvanen <[email protected]>
> > Co-developed-by: Bill Wendling <[email protected]>
> > Signed-off-by: Bill Wendling <[email protected]>
> > Tested-by: Nick Desaulniers <[email protected]>
>
> Was this v6 compile-tested?
>
> The build breaks with errors like this:
>
> clang -Wp,-MMD,kernel/pgo/.fs.o.d -nostdinc -isystem
> /opt/llvm-toolchain/lib/clang/11.1.0/include -I./arch/x86/include
> -I./arch/x86/include/generated -I./include
> -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi
> -I./include/uapi -I./include/generated/uapi -include
> ./include/linux/kconfig.h -include ./include/linux/compiler_types.h
> -D__KERNEL__ -Qunused-arguments -fmacro-prefix-map=./=
> -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs
> -fno-strict-aliasing -fno-common -fshort-wchar
> -fno-PIE -Werror=implicit-function-declaration
> -Werror=implicit-int -Werror=return-type -Wno-format-security
> -std=gnu89 -Werror=unknown-warning-option -mno-sse -mno-mmx
> -mno-sse2 -mno-3dnow -mno-avx -m64 -mno-80387
> -mstack-alignment=8 -mtune=generic -mno-red-zone -mcmodel=kernel
> -Wno-sign-compare -fno-asynchronous-unwind-tables
> -mretpoline-external-thunk -fno-delete-null-pointer-checks
> -Wno-frame-address -Wno-address-of-packed-member -O2
> -Wframe-larger-than=2048 -fstack-protector-strong
> -Wno-format-invalid-specifier -Wno-gnu -mno-global-merge
> -Wno-unused-const-variable -g -gdwarf-5 -g -gdwarf-5 -pg -mfentry
> -DCC_USING_FENTRY -Wdeclaration-after-statement
> -Wvla -Wno-pointer-sign -Wno-array-bounds -fno-strict-overflow
> -fno-stack-check -Werror=date-time -Werror=incompatible-pointer-types
> -fcf-protection=none -Wno-initializer-overrides
> -Wno-format -Wno-sign-compare -Wno-format-zero-length
> -Wno-pointer-to-enum-cast
> -Wno-tautological-constant-out-of-range-compare
> -DKBUILD_MODFILE='"kernel/pgo/fs"'
> -DKBUILD_BASENAME='"fs"' -DKBUILD_MODNAME='"fs"' -c -o
> kernel/pgo/fs.o kernel/pgo/fs.c
> In file included from kernel/pgo/fs.c:27:
> kernel/pgo/pgo.h:102:28: error: use of undeclared identifier
> 'LLVM_PRF_IPVK_LAST'
> const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
> ^
> kernel/pgo/fs.c:70:28: error: use of undeclared identifier 'LLVM_PRF_IPVK_LAST'
> header->value_kind_last = LLVM_PRF_IPVK_LAST;
> ^
> 2 errors generated.
> make[5]: *** [scripts/Makefile.build:279: kernel/pgo/fs.o] Error 1
>

This should fix it; the v6 macro rename appears to have missed two uses of
LLVM_PRF_IPVK_LAST:

diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
index 132ff2ab3feb..1678df3b7d64 100644
--- a/kernel/pgo/fs.c
+++ b/kernel/pgo/fs.c
@@ -67,7 +67,7 @@ static void prf_fill_header(void **buffer)
header->names_size = prf_names_count();
header->counters_delta = (u64)__llvm_prf_cnts_start;
header->names_delta = (u64)__llvm_prf_names_start;
- header->value_kind_last = LLVM_PRF_IPVK_LAST;
+ header->value_kind_last = LLVM_INSTR_PROF_IPVK_LAST;

*buffer += sizeof(*header);
}
diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
index 0d33e07a0bf3..ddc8d3002fe5 100644
--- a/kernel/pgo/pgo.h
+++ b/kernel/pgo/pgo.h
@@ -99,7 +99,7 @@ struct llvm_prf_data {
const void *function_ptr;
void *values;
const u32 num_counters;
- const u16 num_value_sites[LLVM_PRF_IPVK_LAST + 1];
+ const u16 num_value_sites[LLVM_INSTR_PROF_IPVK_LAST + 1];
} __aligned(LLVM_INSTR_PROF_DATA_ALIGNMENT);

/**

s/LLVM_PRF_IPVK_LAST/LLVM_INSTR_PROF_IPVK_LAST/g
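
For context, the renamed macro both sizes the per-function num_value_sites array and bounds the loops over value kinds (as in reset_write()), so every user has to agree on the LLVM_INSTR_PROF_ prefix. A minimal userspace sketch of that relationship, assuming uint16_t/uint64_t in place of u16/u64; the struct name prf_data_sketch and the helper total_value_sites are hypothetical:

```c
#include <stdint.h>

/* Values as defined in the patch's pgo.h. */
#define LLVM_INSTR_PROF_IPVK_FIRST	0
#define LLVM_INSTR_PROF_IPVK_LAST	1

/* Hypothetical cut-down stand-in for struct llvm_prf_data. */
struct prf_data_sketch {
	/* One slot per value kind, FIRST..LAST inclusive. */
	uint16_t num_value_sites[LLVM_INSTR_PROF_IPVK_LAST + 1];
};

/* Sum the value sites over all kinds, as reset_write() does. */
static uint64_t total_value_sites(const struct prf_data_sketch *d)
{
	uint64_t total = 0;
	unsigned int kind;

	for (kind = LLVM_INSTR_PROF_IPVK_FIRST;
	     kind <= LLVM_INSTR_PROF_IPVK_LAST; kind++)
		total += d->num_value_sites[kind];

	return total;
}
```

With the old LLVM_PRF_IPVK_LAST spelling left in place, the array declaration itself fails to compile, which is the error shown above.
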

- Sedat -

>
> > ---
> > v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> > testing.
> > - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> > Song's comments.
> > v3: - Added change log section based on Sedat Dilek's comments.
> > > v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
> > own popcount implementation, based on Nick Desaulniers's comment.
> > v5: - Correct padding calculation, discovered by Nathan Chancellor.
> > v6: - Add better documentation about the locking scheme and other things.
> > - Rename macros to better match the same macros in LLVM's source code.
> > ---
> > Documentation/dev-tools/index.rst | 1 +
> > Documentation/dev-tools/pgo.rst | 127 +++++++++
> > MAINTAINERS | 9 +
> > Makefile | 3 +
> > arch/Kconfig | 1 +
> > arch/x86/Kconfig | 1 +
> > arch/x86/boot/Makefile | 1 +
> > arch/x86/boot/compressed/Makefile | 1 +
> > arch/x86/crypto/Makefile | 4 +
> > arch/x86/entry/vdso/Makefile | 1 +
> > arch/x86/kernel/vmlinux.lds.S | 2 +
> > arch/x86/platform/efi/Makefile | 1 +
> > arch/x86/purgatory/Makefile | 1 +
> > arch/x86/realmode/rm/Makefile | 1 +
> > arch/x86/um/vdso/Makefile | 1 +
> > drivers/firmware/efi/libstub/Makefile | 1 +
> > include/asm-generic/vmlinux.lds.h | 44 +++
> > kernel/Makefile | 1 +
> > kernel/pgo/Kconfig | 35 +++
> > kernel/pgo/Makefile | 5 +
> > kernel/pgo/fs.c | 389 ++++++++++++++++++++++++++
> > kernel/pgo/instrument.c | 189 +++++++++++++
> > kernel/pgo/pgo.h | 203 ++++++++++++++
> > scripts/Makefile.lib | 10 +
> > 24 files changed, 1032 insertions(+)
> > create mode 100644 Documentation/dev-tools/pgo.rst
> > create mode 100644 kernel/pgo/Kconfig
> > create mode 100644 kernel/pgo/Makefile
> > create mode 100644 kernel/pgo/fs.c
> > create mode 100644 kernel/pgo/instrument.c
> > create mode 100644 kernel/pgo/pgo.h
> >
> > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > index f7809c7b1ba9..8d6418e85806 100644
> > --- a/Documentation/dev-tools/index.rst
> > +++ b/Documentation/dev-tools/index.rst
> > @@ -26,6 +26,7 @@ whole; patches welcome!
> > kgdb
> > kselftest
> > kunit/index
> > + pgo
> >
> >
> > .. only:: subproject and html
> > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> > new file mode 100644
> > index 000000000000..b7f11d8405b7
> > --- /dev/null
> > +++ b/Documentation/dev-tools/pgo.rst
> > @@ -0,0 +1,127 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +===============================
> > +Using PGO with the Linux kernel
> > +===============================
> > +
> > +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> > +when building with Clang. The profiling data is exported via the ``pgo``
> > +debugfs directory.
> > +
> > +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > +
> > +
> > +Preparation
> > +===========
> > +
> > +Configure the kernel with:
> > +
> > +.. code-block:: make
> > +
> > + CONFIG_DEBUG_FS=y
> > + CONFIG_PGO_CLANG=y
> > +
> > +Note that kernels compiled with profiling flags will be significantly larger
> > +and run slower.
> > +
> > +Profiling data will only become accessible once debugfs has been mounted:
> > +
> > +.. code-block:: sh
> > +
> > + mount -t debugfs none /sys/kernel/debug
> > +
> > +
> > +Customization
> > +=============
> > +
> > > +You can enable or disable profiling for individual files and directories by
> > +adding a line similar to the following to the respective kernel Makefile:
> > +
> > +- For a single file (e.g. main.o)
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE_main.o := y
> > +
> > +- For all files in one directory
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE := y
> > +
> > +To exclude files from being profiled use
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE_main.o := n
> > +
> > +and
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE := n
> > +
> > +Only files which are linked to the main kernel image or are compiled as kernel
> > +modules are supported by this mechanism.
> > +
> > +
> > +Files
> > +=====
> > +
> > +The PGO kernel support creates the following files in debugfs:
> > +
> > +``/sys/kernel/debug/pgo``
> > + Parent directory for all PGO-related files.
> > +
> > +``/sys/kernel/debug/pgo/reset``
> > > + Global reset file: resets all profile data to zero when written to.
> > +
> > > +``/sys/kernel/debug/pgo/profraw``
> > > + The raw PGO data that must be processed with ``llvm-profdata``.
> > +
> > +
> > +Workflow
> > +========
> > +
> > > +The PGO kernel can be run on the host or test machines. The data, though,
> > > +should be analyzed with tools from the same Clang version that compiled the
> > > +kernel. Clang is tolerant of version skew, but using the same version is
> > > +easier.
> > +
> > +The profiling data is useful for optimizing the kernel, analyzing coverage,
> > +etc. Clang offers tools to perform these tasks.
> > +
> > +Here is an example workflow for profiling an instrumented kernel with PGO and
> > +using the result to optimize the kernel:
> > +
> > +1) Install the kernel on the TEST machine.
> > +
> > +2) Reset the data counters right before running the load tests
> > +
> > + .. code-block:: sh
> > +
> > + $ echo 1 > /sys/kernel/debug/pgo/reset
> > +
> > +3) Run the load tests.
> > +
> > +4) Collect the raw profile data
> > +
> > + .. code-block:: sh
> > +
> > + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > +
> > +5) (Optional) Download the raw profile data to the HOST machine.
> > +
> > +6) Process the raw profile data
> > +
> > + .. code-block:: sh
> > +
> > + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > +
> > + Note that multiple raw profile data files can be merged during this step.
> > +
> > +7) Rebuild the kernel using the profile data (PGO disabled)
> > +
> > + .. code-block:: sh
> > +
> > + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 00836f6452f0..13333026e140 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -13948,6 +13948,15 @@ S: Maintained
> > F: include/linux/personality.h
> > F: include/uapi/linux/personality.h
> >
> > +PGO BASED KERNEL PROFILING
> > +M: Sami Tolvanen <[email protected]>
> > +M: Bill Wendling <[email protected]>
> > +R: Nathan Chancellor <[email protected]>
> > +R: Nick Desaulniers <[email protected]>
> > +S: Supported
> > +F: Documentation/dev-tools/pgo.rst
> > +F: kernel/pgo
> > +
> > PHOENIX RC FLIGHT CONTROLLER ADAPTER
> > M: Marcus Folkesson <[email protected]>
> > L: [email protected]
> > diff --git a/Makefile b/Makefile
> > index b0e4767735dc..9339541f7cec 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
> > # Defaults to vmlinux, but the arch makefile usually adds further targets
> > all: vmlinux
> >
> > +CFLAGS_PGO_CLANG := -fprofile-generate
> > +export CFLAGS_PGO_CLANG
> > +
> > CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
> > $(call cc-option,-fno-tree-loop-im) \
> > $(call cc-disable-warning,maybe-uninitialized,)
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index 24862d15f3a3..f39d3991f6bf 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
> > pairs of 32-bit arguments, select this option.
> >
> > source "kernel/gcov/Kconfig"
> > +source "kernel/pgo/Kconfig"
> >
> > source "scripts/gcc-plugins/Kconfig"
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 21f851179ff0..36305ea61dc0 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -96,6 +96,7 @@ config X86
> > select ARCH_SUPPORTS_DEBUG_PAGEALLOC
> > select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
> > select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
> > + select ARCH_SUPPORTS_PGO_CLANG if X86_64
> > select ARCH_USE_BUILTIN_BSWAP
> > select ARCH_USE_QUEUED_RWLOCKS
> > select ARCH_USE_QUEUED_SPINLOCKS
> > diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> > index fe605205b4ce..383853e32f67 100644
> > --- a/arch/x86/boot/Makefile
> > +++ b/arch/x86/boot/Makefile
> > @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> > KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE := n
> >
> > $(obj)/bzImage: asflags-y := $(SVGA_MODE)
> > diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> > index e0bc3988c3fa..ed12ab65f606 100644
> > --- a/arch/x86/boot/compressed/Makefile
> > +++ b/arch/x86/boot/compressed/Makefile
> > @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
> >
> > KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE :=n
> >
> > KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
> > diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> > index a31de0c6ccde..5753aea7bcbd 100644
> > --- a/arch/x86/crypto/Makefile
> > +++ b/arch/x86/crypto/Makefile
> > @@ -4,6 +4,10 @@
> >
> > OBJECT_FILES_NON_STANDARD := y
> >
> > +# Disable PGO for curve25519-x86_64. With PGO enabled, clang runs out of
> > +# registers for some of the functions.
> > +PGO_PROFILE_curve25519-x86_64.o := n
> > +
> > obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o
> >
> > obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
> > diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> > index 02e3e42f380b..26e2b3af0145 100644
> > --- a/arch/x86/entry/vdso/Makefile
> > +++ b/arch/x86/entry/vdso/Makefile
> > @@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO $@
> > VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
> > $(call ld-option, --eh-frame-hdr) -Bsymbolic
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > quiet_cmd_vdso_and_check = VDSO $@
> > cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
> > diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> > index efd9e9ea17f2..f6cab2316c46 100644
> > --- a/arch/x86/kernel/vmlinux.lds.S
> > +++ b/arch/x86/kernel/vmlinux.lds.S
> > @@ -184,6 +184,8 @@ SECTIONS
> >
> > BUG_TABLE
> >
> > + PGO_CLANG_DATA
> > +
> > ORC_UNWIND_TABLE
> >
> > . = ALIGN(PAGE_SIZE);
> > diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
> > index 84b09c230cbd..5f22b31446ad 100644
> > --- a/arch/x86/platform/efi/Makefile
> > +++ b/arch/x86/platform/efi/Makefile
> > @@ -2,6 +2,7 @@
> > OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
> > KASAN_SANITIZE := n
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
> > obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
> > diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> > index 95ea17a9d20c..36f20e99da0b 100644
> > --- a/arch/x86/purgatory/Makefile
> > +++ b/arch/x86/purgatory/Makefile
> > @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
> >
> > # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > KASAN_SANITIZE := n
> > UBSAN_SANITIZE := n
> > KCSAN_SANITIZE := n
> > diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> > index 83f1b6a56449..21797192f958 100644
> > --- a/arch/x86/realmode/rm/Makefile
> > +++ b/arch/x86/realmode/rm/Makefile
> > @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> > KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE := n
> > diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
> > index 5943387e3f35..54f5768f5853 100644
> > --- a/arch/x86/um/vdso/Makefile
> > +++ b/arch/x86/um/vdso/Makefile
> > @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@
> >
> > VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > #
> > # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
> > diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> > index 8a94388e38b3..2d81623b33f2 100644
> > --- a/drivers/firmware/efi/libstub/Makefile
> > +++ b/drivers/firmware/efi/libstub/Makefile
> > @@ -40,6 +40,7 @@ KBUILD_CFLAGS := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
> > KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
> >
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > # Sanitizer runtimes are unavailable and cannot be linked here.
> > KASAN_SANITIZE := n
> > KCSAN_SANITIZE := n
> > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > index b2b3d81b1535..3a591bb18c5f 100644
> > --- a/include/asm-generic/vmlinux.lds.h
> > +++ b/include/asm-generic/vmlinux.lds.h
> > @@ -316,6 +316,49 @@
> > #define THERMAL_TABLE(name)
> > #endif
> >
> > +#ifdef CONFIG_PGO_CLANG
> > +#define PGO_CLANG_DATA \
> > + __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
> > + . = ALIGN(8); \
> > + __llvm_prf_start = .; \
> > + __llvm_prf_data_start = .; \
> > + KEEP(*(__llvm_prf_data)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_data_end = .; \
> > + } \
> > + __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
> > + . = ALIGN(8); \
> > + __llvm_prf_cnts_start = .; \
> > + KEEP(*(__llvm_prf_cnts)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_cnts_end = .; \
> > + } \
> > + __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
> > + . = ALIGN(8); \
> > + __llvm_prf_names_start = .; \
> > + KEEP(*(__llvm_prf_names)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_names_end = .; \
> > + . = ALIGN(8); \
> > + } \
> > + __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
> > + __llvm_prf_vals_start = .; \
> > + KEEP(*(__llvm_prf_vals)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_vals_end = .; \
> > + . = ALIGN(8); \
> > + } \
> > + __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
> > + __llvm_prf_vnds_start = .; \
> > + KEEP(*(__llvm_prf_vnds)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_vnds_end = .; \
> > + __llvm_prf_end = .; \
> > + }
> > +#else
> > +#define PGO_CLANG_DATA
> > +#endif
> > +
> > #define KERNEL_DTB() \
> > STRUCT_ALIGN(); \
> > __dtb_start = .; \
> > @@ -1125,6 +1168,7 @@
> > CONSTRUCTORS \
> > } \
> > BUG_TABLE \
> > + PGO_CLANG_DATA
> >
> > #define INIT_TEXT_SECTION(inittext_align) \
> > . = ALIGN(inittext_align); \
> > diff --git a/kernel/Makefile b/kernel/Makefile
> > index aa7368c7eabf..0b34ca228ba4 100644
> > --- a/kernel/Makefile
> > +++ b/kernel/Makefile
> > @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
> > obj-$(CONFIG_KCSAN) += kcsan/
> > obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
> > obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
> > +obj-$(CONFIG_PGO_CLANG) += pgo/
> >
> > obj-$(CONFIG_PERF_EVENTS) += events/
> >
> > diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
> > new file mode 100644
> > index 000000000000..76a640b6cf6e
> > --- /dev/null
> > +++ b/kernel/pgo/Kconfig
> > @@ -0,0 +1,35 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
> > +
> > +config ARCH_SUPPORTS_PGO_CLANG
> > + bool
> > +
> > +config PGO_CLANG
> > + bool "Enable clang's PGO-based kernel profiling"
> > + depends on DEBUG_FS
> > + depends on ARCH_SUPPORTS_PGO_CLANG
> > + depends on CC_IS_CLANG && CLANG_VERSION >= 120000
> > + help
> > + This option enables clang's PGO (Profile Guided Optimization) based
> > + code profiling to better optimize the kernel.
> > +
> > + If unsure, say N.
> > +
> > + Run a representative workload for your application on a kernel
> > + compiled with this option and download the raw profile file from
> > + /sys/kernel/debug/pgo/profraw. This file needs to be processed with
> > + llvm-profdata. It may be merged with other collected raw profiles.
> > +
> > + Copy the resulting profile file into vmlinux.profdata, and enable
> > + KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
> > + kernel.
> > +
> > + Note that a kernel compiled with profiling flags will be
> > + significantly larger and run slower. Also be sure to exclude files
> > + from profiling which are not linked to the kernel image to prevent
> > + linker errors.
> > +
> > + Note that the debugfs filesystem has to be mounted to access
> > + profiling data.
> > +
> > +endmenu
> > diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
> > new file mode 100644
> > index 000000000000..41e27cefd9a4
> > --- /dev/null
> > +++ b/kernel/pgo/Makefile
> > @@ -0,0 +1,5 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > +
> > +obj-y += fs.o instrument.o
> > diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
> > new file mode 100644
> > index 000000000000..132ff2ab3feb
> > --- /dev/null
> > +++ b/kernel/pgo/fs.c
> > @@ -0,0 +1,389 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + * Sami Tolvanen <[email protected]>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#define pr_fmt(fmt) "pgo: " fmt
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/debugfs.h>
> > +#include <linux/fs.h>
> > +#include <linux/module.h>
> > +#include <linux/slab.h>
> > +#include <linux/vmalloc.h>
> > +#include "pgo.h"
> > +
> > +static struct dentry *directory;
> > +
> > +struct prf_private_data {
> > + void *buffer;
> > + unsigned long size;
> > +};
> > +
> > +/*
> > + * Raw profile data format:
> > + *
> > + * - llvm_prf_header
> > + * - __llvm_prf_data
> > + * - __llvm_prf_cnts
> > + * - __llvm_prf_names
> > + * - zero padding to 8 bytes
> > + * - for each llvm_prf_data in __llvm_prf_data:
> > + * - llvm_prf_value_data
> > + * - llvm_prf_value_record + site count array
> > + * - llvm_prf_value_node_data
> > + * ...
> > + * ...
> > + * ...
> > + */
> > +
> > +static void prf_fill_header(void **buffer)
> > +{
> > + struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
> > +
> > +#ifdef CONFIG_64BIT
> > + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_64;
> > +#else
> > + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_32;
> > +#endif
> > + header->version = LLVM_VARIANT_MASK_IR_PROF | LLVM_INSTR_PROF_RAW_VERSION;
> > + header->data_size = prf_data_count();
> > + header->padding_bytes_before_counters = 0;
> > + header->counters_size = prf_cnts_count();
> > + header->padding_bytes_after_counters = 0;
> > + header->names_size = prf_names_count();
> > + header->counters_delta = (u64)__llvm_prf_cnts_start;
> > + header->names_delta = (u64)__llvm_prf_names_start;
> > + header->value_kind_last = LLVM_INSTR_PROF_IPVK_LAST;
> > +
> > + *buffer += sizeof(*header);
> > +}
> > +
> > +/*
> > + * Copy the source into the buffer, incrementing the pointer into buffer in the
> > + * process.
> > + */
> > +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
> > +{
> > + memcpy(*buffer, src, size);
> > + *buffer += size;
> > +}
> > +
> > +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
> > +{
> > + struct llvm_prf_value_node **nodes =
> > + (struct llvm_prf_value_node **)p->values;
> > + u32 kinds = 0;
> > + u32 size = 0;
> > + unsigned int kind;
> > + unsigned int n;
> > + unsigned int s = 0;
> > +
> > + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> > + unsigned int sites = p->num_value_sites[kind];
> > +
> > + if (!sites)
> > + continue;
> > +
> > + /* Record + site count array */
> > + size += prf_get_value_record_size(sites);
> > + kinds++;
> > +
> > + if (!nodes)
> > + continue;
> > +
> > + for (n = 0; n < sites; n++) {
> > + u32 count = 0;
> > + struct llvm_prf_value_node *site = nodes[s + n];
> > +
> > + while (site && ++count <= U8_MAX)
> > + site = site->next;
> > +
> > + size += count *
> > + sizeof(struct llvm_prf_value_node_data);
> > + }
> > +
> > + s += sites;
> > + }
> > +
> > + if (size)
> > + size += sizeof(struct llvm_prf_value_data);
> > +
> > + if (value_kinds)
> > + *value_kinds = kinds;
> > +
> > + return size;
> > +}
> > +
> > +static u32 prf_get_value_size(void)
> > +{
> > + u32 size = 0;
> > + struct llvm_prf_data *p;
> > +
> > + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> > + size += __prf_get_value_size(p, NULL);
> > +
> > + return size;
> > +}
> > +
> > +/* Serialize the value profiling data for one llvm_prf_data entry. */
> > +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
> > +{
> > + struct llvm_prf_value_data header;
> > + struct llvm_prf_value_node **nodes =
> > + (struct llvm_prf_value_node **)p->values;
> > + unsigned int kind;
> > + unsigned int n;
> > + unsigned int s = 0;
> > +
> > + header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
> > +
> > + if (!header.num_value_kinds)
> > + /* Nothing to write. */
> > + return;
> > +
> > + prf_copy_to_buffer(buffer, &header, sizeof(header));
> > +
> > + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> > + struct llvm_prf_value_record *record;
> > + u8 *counts;
> > + unsigned int sites = p->num_value_sites[kind];
> > +
> > + if (!sites)
> > + continue;
> > +
> > + /* Profiling value record. */
> > + record = *(struct llvm_prf_value_record **)buffer;
> > + *buffer += prf_get_value_record_header_size();
> > +
> > + record->kind = kind;
> > + record->num_value_sites = sites;
> > +
> > + /* Site count array. */
> > + counts = *(u8 **)buffer;
> > + *buffer += prf_get_value_record_site_count_size(sites);
> > +
> > + /*
> > + * If we don't have nodes, we can skip updating the site count
> > + * array, because the buffer is zero filled.
> > + */
> > + if (!nodes)
> > + continue;
> > +
> > + for (n = 0; n < sites; n++) {
> > + u32 count = 0;
> > + struct llvm_prf_value_node *site = nodes[s + n];
> > +
> > + while (site && ++count <= U8_MAX) {
> > + prf_copy_to_buffer(buffer, site,
> > + sizeof(struct llvm_prf_value_node_data));
> > + site = site->next;
> > + }
> > +
> > + counts[n] = (u8)count;
> > + }
> > +
> > + s += sites;
> > + }
> > +}
> > +
> > +static void prf_serialize_values(void **buffer)
> > +{
> > + struct llvm_prf_data *p;
> > +
> > + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> > + prf_serialize_value(p, buffer);
> > +}
> > +
> > +static inline unsigned long prf_get_padding(unsigned long size)
> > +{
> > + return 7 & (sizeof(u64) - size % sizeof(u64));
> > +}
> > +
> > +static unsigned long prf_buffer_size(void)
> > +{
> > + return sizeof(struct llvm_prf_header) +
> > + prf_data_size() +
> > + prf_cnts_size() +
> > + prf_names_size() +
> > + prf_get_padding(prf_names_size()) +
> > + prf_get_value_size();
> > +}
> > +
> > +/*
> > + * Serialize the profiling data into a format LLVM's tools can understand.
> > + * Note: caller *must* hold pgo_lock.
> > + */
> > +static int prf_serialize(struct prf_private_data *p)
> > +{
> > + int err = 0;
> > + void *buffer;
> > +
> > + p->size = prf_buffer_size();
> > + p->buffer = vzalloc(p->size);
> > +
> > + if (!p->buffer) {
> > + err = -ENOMEM;
> > + goto out;
> > + }
> > +
> > + buffer = p->buffer;
> > +
> > + prf_fill_header(&buffer);
> > + prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
> > + prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
> > + prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
> > + buffer += prf_get_padding(prf_names_size());
> > +
> > + prf_serialize_values(&buffer);
> > +
> > +out:
> > + return err;
> > +}
> > +
> > +/* open() implementation for PGO. Creates a copy of the profiling data set. */
> > +static int prf_open(struct inode *inode, struct file *file)
> > +{
> > + struct prf_private_data *data;
> > + unsigned long flags;
> > + int err;
> > +
> > + data = kzalloc(sizeof(*data), GFP_KERNEL);
> > + if (!data) {
> > + err = -ENOMEM;
> > + goto out;
> > + }
> > +
> > + flags = prf_lock();
> > +
> > + err = prf_serialize(data);
> > + if (unlikely(err)) {
> > + kfree(data);
> > + goto out_unlock;
> > + }
> > +
> > + file->private_data = data;
> > +
> > +out_unlock:
> > + prf_unlock(flags);
> > +out:
> > + return err;
> > +}
> > +
> > +/* read() implementation for PGO. */
> > +static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
> > + loff_t *ppos)
> > +{
> > + struct prf_private_data *data = file->private_data;
> > +
> > + BUG_ON(!data);
> > +
> > + return simple_read_from_buffer(buf, count, ppos, data->buffer,
> > + data->size);
> > +}
> > +
> > +/* release() implementation for PGO. Release resources allocated by open(). */
> > +static int prf_release(struct inode *inode, struct file *file)
> > +{
> > + struct prf_private_data *data = file->private_data;
> > +
> > + if (data) {
> > + vfree(data->buffer);
> > + kfree(data);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static const struct file_operations prf_fops = {
> > + .owner = THIS_MODULE,
> > + .open = prf_open,
> > + .read = prf_read,
> > + .llseek = default_llseek,
> > + .release = prf_release
> > +};
> > +
> > +/* write() implementation for resetting PGO's profile data. */
> > +static ssize_t reset_write(struct file *file, const char __user *addr,
> > + size_t len, loff_t *pos)
> > +{
> > + struct llvm_prf_data *data;
> > +
> > + memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
> > +
> > + for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; data++) {
> > + struct llvm_prf_value_node **vnodes;
> > + u64 current_vsite_count;
> > + u32 i;
> > +
> > + if (!data->values)
> > + continue;
> > +
> > + current_vsite_count = 0;
> > + vnodes = (struct llvm_prf_value_node **)data->values;
> > +
> > + for (i = LLVM_INSTR_PROF_IPVK_FIRST; i <= LLVM_INSTR_PROF_IPVK_LAST; i++)
> > + current_vsite_count += data->num_value_sites[i];
> > +
> > + for (i = 0; i < current_vsite_count; i++) {
> > + struct llvm_prf_value_node *current_vnode = vnodes[i];
> > +
> > + while (current_vnode) {
> > + current_vnode->count = 0;
> > + current_vnode = current_vnode->next;
> > + }
> > + }
> > + }
> > +
> > + return len;
> > +}
> > +
> > +static const struct file_operations prf_reset_fops = {
> > + .owner = THIS_MODULE,
> > + .write = reset_write,
> > + .llseek = noop_llseek,
> > +};
> > +
> > +/* Create debugfs entries. */
> > +static int __init pgo_init(void)
> > +{
> > + directory = debugfs_create_dir("pgo", NULL);
> > + if (!directory)
> > + goto err_remove;
> > +
> > + if (!debugfs_create_file("profraw", 0600, directory, NULL,
> > + &prf_fops))
> > + goto err_remove;
> > +
> > + if (!debugfs_create_file("reset", 0200, directory, NULL,
> > + &prf_reset_fops))
> > + goto err_remove;
> > +
> > + return 0;
> > +
> > +err_remove:
> > + pr_err("initialization failed\n");
> > + return -EIO;
> > +}
> > +
> > +/* Remove debugfs entries. */
> > +static void __exit pgo_exit(void)
> > +{
> > + debugfs_remove_recursive(directory);
> > +}
> > +
> > +module_init(pgo_init);
> > +module_exit(pgo_exit);
> > diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
> > new file mode 100644
> > index 000000000000..62ff5cfce7b1
> > --- /dev/null
> > +++ b/kernel/pgo/instrument.c
> > @@ -0,0 +1,189 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + * Sami Tolvanen <[email protected]>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#define pr_fmt(fmt) "pgo: " fmt
> > +
> > +#include <linux/bitops.h>
> > +#include <linux/kernel.h>
> > +#include <linux/export.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/types.h>
> > +#include "pgo.h"
> > +
> > +/*
> > + * This lock guards both profile count updating and serialization of the
> > + * profiling data. Keeping both of these activities separate via locking
> > + * ensures that we don't try to serialize data that's only partially updated.
> > + */
> > +static DEFINE_SPINLOCK(pgo_lock);
> > +static int current_node;
> > +
> > +unsigned long prf_lock(void)
> > +{
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&pgo_lock, flags);
> > +
> > + return flags;
> > +}
> > +
> > +void prf_unlock(unsigned long flags)
> > +{
> > + spin_unlock_irqrestore(&pgo_lock, flags);
> > +}
> > +
> > +/*
> > + * Return a newly allocated profiling value node which contains the value
> > + * tracked by the value profiler.
> > + * Note: caller *must* hold pgo_lock.
> > + */
> > +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
> > + u32 index, u64 value)
> > +{
> > + if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
> > + return NULL; /* Out of nodes */
> > +
> > + current_node++;
> > +
> > + /* Make sure the node is entirely within the section */
> > + if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
> > + &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
> > + return NULL;
> > +
> > + return &__llvm_prf_vnds_start[current_node];
> > +}
> > +
> > +/*
> > + * Counts the number of times a target value is seen.
> > + *
> > + * Records the target value for the index if not seen before. Otherwise,
> > + * increments the counter associated with the target value.
> > + */
> > +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
> > +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
> > +{
> > + struct llvm_prf_data *p = (struct llvm_prf_data *)data;
> > + struct llvm_prf_value_node **counters;
> > + struct llvm_prf_value_node *curr;
> > + struct llvm_prf_value_node *min = NULL;
> > + struct llvm_prf_value_node *prev = NULL;
> > + u64 min_count = U64_MAX;
> > + u8 values = 0;
> > + unsigned long flags;
> > +
> > + if (!p || !p->values)
> > + return;
> > +
> > + counters = (struct llvm_prf_value_node **)p->values;
> > + curr = counters[index];
> > +
> > + while (curr) {
> > + if (target_value == curr->value) {
> > + curr->count++;
> > + return;
> > + }
> > +
> > + if (curr->count < min_count) {
> > + min_count = curr->count;
> > + min = curr;
> > + }
> > +
> > + prev = curr;
> > + curr = curr->next;
> > + values++;
> > + }
> > +
> > + if (values >= LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE) {
> > + if (!min->count || !(--min->count)) {
> > + curr = min;
> > + curr->value = target_value;
> > + curr->count++;
> > + }
> > + return;
> > + }
> > +
> > + /* Lock when updating the value node structure. */
> > + flags = prf_lock();
> > +
> > + curr = allocate_node(p, index, target_value);
> > + if (!curr)
> > + goto out;
> > +
> > + curr->value = target_value;
> > + curr->count++;
> > +
> > + if (!counters[index])
> > + counters[index] = curr;
> > + else if (prev && !prev->next)
> > + prev->next = curr;
> > +
> > +out:
> > + prf_unlock(flags);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_target);
> > +
> > +/* Counts the number of times a range of target values is seen. */
> > +void __llvm_profile_instrument_range(u64 target_value, void *data,
> > + u32 index, s64 precise_start,
> > + s64 precise_last, s64 large_value);
> > +void __llvm_profile_instrument_range(u64 target_value, void *data,
> > + u32 index, s64 precise_start,
> > + s64 precise_last, s64 large_value)
> > +{
> > + if (large_value != S64_MIN && (s64)target_value >= large_value)
> > + target_value = large_value;
> > + else if ((s64)target_value < precise_start ||
> > + (s64)target_value > precise_last)
> > + target_value = precise_last + 1;
> > +
> > + __llvm_profile_instrument_target(target_value, data, index);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_range);
> > +
> > +static u64 inst_prof_get_range_rep_value(u64 value)
> > +{
> > + if (value <= 8)
> > + /* The first ranges are individually tracked, use it as is. */
> > + return value;
> > + else if (value >= 513)
> > + /* The last range is mapped to its lowest value. */
> > + return 513;
> > + else if (hweight64(value) == 1)
> > + /* If it's a power of two, use it as is. */
> > + return value;
> > +
> > + /* Otherwise, round down to the previous power of two, plus one. */
> > + return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
> > +}
> > +
> > +/*
> > + * The target values are partitioned into multiple ranges. The range spec is
> > + * defined in compiler-rt/include/profile/InstrProfData.inc.
> > + */
> > +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> > + u32 counter_index);
> > +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> > + u32 counter_index)
> > +{
> > + u64 rep_value;
> > +
> > + /* Map the target value to the representative value of its range. */
> > + rep_value = inst_prof_get_range_rep_value(target_value);
> > + __llvm_profile_instrument_target(rep_value, data, counter_index);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_memop);
> > diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
> > new file mode 100644
> > index 000000000000..0d33e07a0bf3
> > --- /dev/null
> > +++ b/kernel/pgo/pgo.h
> > @@ -0,0 +1,203 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + * Sami Tolvanen <[email protected]>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#ifndef _PGO_H
> > +#define _PGO_H
> > +
> > +/*
> > + * Note: These internal LLVM definitions must match the compiler version.
> > + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
> > + */
> > +
> > +#define LLVM_INSTR_PROF_RAW_MAGIC_64 \
> > + ((u64)255 << 56 | \
> > + (u64)'l' << 48 | \
> > + (u64)'p' << 40 | \
> > + (u64)'r' << 32 | \
> > + (u64)'o' << 24 | \
> > + (u64)'f' << 16 | \
> > + (u64)'r' << 8 | \
> > + (u64)129)
> > +#define LLVM_INSTR_PROF_RAW_MAGIC_32 \
> > + ((u64)255 << 56 | \
> > + (u64)'l' << 48 | \
> > + (u64)'p' << 40 | \
> > + (u64)'r' << 32 | \
> > + (u64)'o' << 24 | \
> > + (u64)'f' << 16 | \
> > + (u64)'R' << 8 | \
> > + (u64)129)
> > +
> > +#define LLVM_INSTR_PROF_RAW_VERSION 5
> > +#define LLVM_INSTR_PROF_DATA_ALIGNMENT 8
> > +#define LLVM_INSTR_PROF_IPVK_FIRST 0
> > +#define LLVM_INSTR_PROF_IPVK_LAST 1
> > +#define LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE 255
> > +
> > +#define LLVM_VARIANT_MASK_IR_PROF (0x1ULL << 56)
> > +#define LLVM_VARIANT_MASK_CSIR_PROF (0x1ULL << 57)
> > +
> > +/**
> > + * struct llvm_prf_header - represents the raw profile header data structure.
> > + * @magic: the magic token for the file format.
> > + * @version: the version of the file format.
> > + * @data_size: the number of entries in the profile data section.
> > + * @padding_bytes_before_counters: the number of padding bytes before the
> > + * counters.
> > + * @counters_size: the number of entries in the LLVM profile section containing
> > + * the counters.
> > + * @padding_bytes_after_counters: the number of padding bytes after the
> > + * counters.
> > + * @names_size: the size in bytes of the LLVM profile section containing the
> > + * counters' names.
> > + * @counters_delta: the beginning of the LLVM profile counters section.
> > + * @names_delta: the beginning of the LLVM profile names section.
> > + * @value_kind_last: the last profile value kind.
> > + */
> > +struct llvm_prf_header {
> > + u64 magic;
> > + u64 version;
> > + u64 data_size;
> > + u64 padding_bytes_before_counters;
> > + u64 counters_size;
> > + u64 padding_bytes_after_counters;
> > + u64 names_size;
> > + u64 counters_delta;
> > + u64 names_delta;
> > + u64 value_kind_last;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_data - represents the per-function control structure.
> > + * @name_ref: the reference to the function's name.
> > + * @func_hash: the hash value of the function.
> > + * @counter_ptr: a pointer to the profile counter.
> > + * @function_ptr: a pointer to the function.
> > + * @values: the profiling values associated with this function.
> > + * @num_counters: the number of counters in the function.
> > + * @num_value_sites: the number of value profile sites.
> > + */
> > +struct llvm_prf_data {
> > + const u64 name_ref;
> > + const u64 func_hash;
> > + const void *counter_ptr;
> > + const void *function_ptr;
> > + void *values;
> > + const u32 num_counters;
> > + const u16 num_value_sites[LLVM_INSTR_PROF_IPVK_LAST + 1];
> > +} __aligned(LLVM_INSTR_PROF_DATA_ALIGNMENT);
> > +
> > +/**
> > + * struct llvm_prf_value_node_data - represents the data part of the struct
> > + * llvm_prf_value_node data structure.
> > + * @value: the value counters.
> > + * @count: the counters' count.
> > + */
> > +struct llvm_prf_value_node_data {
> > + u64 value;
> > + u64 count;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_node - represents an internal data structure used by
> > + * the value profiler.
> > + * @value: the value counters.
> > + * @count: the counters' count.
> > + * @next: the next value node.
> > + */
> > +struct llvm_prf_value_node {
> > + u64 value;
> > + u64 count;
> > + struct llvm_prf_value_node *next;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_data - represents the value profiling data in indexed
> > + * format.
> > + * @total_size: the total size in bytes including this field.
> > + * @num_value_kinds: the number of value profile kinds that have value profile
> > + * data.
> > + */
> > +struct llvm_prf_value_data {
> > + u32 total_size;
> > + u32 num_value_kinds;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_record - represents the on-disk layout of the value
> > + * profile data of a particular kind for one function.
> > + * @kind: the kind of the value profile record.
> > + * @num_value_sites: the number of value profile sites.
> > + * @site_count_array: the first element of the array that stores the number
> > + * of profiled values for each value site.
> > + */
> > +struct llvm_prf_value_record {
> > + u32 kind;
> > + u32 num_value_sites;
> > + u8 site_count_array[];
> > +};
> > +
> > +#define prf_get_value_record_header_size() \
> > + offsetof(struct llvm_prf_value_record, site_count_array)
> > +#define prf_get_value_record_site_count_size(sites) \
> > + roundup((sites), 8)
> > +#define prf_get_value_record_size(sites) \
> > + (prf_get_value_record_header_size() + \
> > + prf_get_value_record_site_count_size((sites)))
> > +
> > +/* Data sections */
> > +extern struct llvm_prf_data __llvm_prf_data_start[];
> > +extern struct llvm_prf_data __llvm_prf_data_end[];
> > +
> > +extern u64 __llvm_prf_cnts_start[];
> > +extern u64 __llvm_prf_cnts_end[];
> > +
> > +extern char __llvm_prf_names_start[];
> > +extern char __llvm_prf_names_end[];
> > +
> > +extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
> > +extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
> > +
> > +/* Locking for vnodes */
> > +extern unsigned long prf_lock(void);
> > +extern void prf_unlock(unsigned long flags);
> > +
> > +#define __DEFINE_PRF_SIZE(s) \
> > + static inline unsigned long prf_ ## s ## _size(void) \
> > + { \
> > + unsigned long start = \
> > + (unsigned long)__llvm_prf_ ## s ## _start; \
> > + unsigned long end = \
> > + (unsigned long)__llvm_prf_ ## s ## _end; \
> > + return roundup(end - start, \
> > + sizeof(__llvm_prf_ ## s ## _start[0])); \
> > + } \
> > + static inline unsigned long prf_ ## s ## _count(void) \
> > + { \
> > + return prf_ ## s ## _size() / \
> > + sizeof(__llvm_prf_ ## s ## _start[0]); \
> > + }
> > +
> > +__DEFINE_PRF_SIZE(data);
> > +__DEFINE_PRF_SIZE(cnts);
> > +__DEFINE_PRF_SIZE(names);
> > +__DEFINE_PRF_SIZE(vnds);
> > +
> > +#undef __DEFINE_PRF_SIZE
> > +
> > +#endif /* _PGO_H */
> > diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> > index 213677a5ed33..9b218afb5cb8 100644
> > --- a/scripts/Makefile.lib
> > +++ b/scripts/Makefile.lib
> > @@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
> > $(CFLAGS_GCOV))
> > endif
> >
> > +#
> > +# Enable clang's PGO profiling flags for a file or directory depending on
> > +# variables PGO_PROFILE_obj.o and PGO_PROFILE.
> > +#
> > +ifeq ($(CONFIG_PGO_CLANG),y)
> > +_c_flags += $(if $(patsubst n%,, \
> > + $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
> > + $(CFLAGS_PGO_CLANG))
> > +endif
> > +
> > #
> > # Enable address sanitizer flags for kernel except some files or directories
> > # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
> > --
> > 2.30.0.296.g2bfb1c46d8-goog
> >
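
As a quick sanity check of the value-record size macros quoted above, the arithmetic can be reproduced in shell. This is only a sketch: it assumes the record header is the two u32 fields (kind, num_value_sites), i.e. 8 bytes with no padding before site_count_array, and `record_size` is a name made up for the illustration.

```shell
# Mirrors prf_get_value_record_size(sites) from pgo.h:
#   header size + roundup(sites, 8)
# Assumption: offsetof(struct llvm_prf_value_record, site_count_array) == 8.
header=8

record_size() {
	sites=$1
	# (sites + 7) / 8 * 8 rounds sites up to the next multiple of 8.
	echo $(( header + (sites + 7) / 8 * 8 ))
}

record_size 3   # prints 16
record_size 9   # prints 24
```

So a record with 1 to 8 value sites occupies 16 bytes, and each further group of up to 8 sites adds another 8 bytes.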

2021-01-22 01:31:56

by Nick Desaulniers

Subject: Re: [PATCH v6] pgo: add clang's Profile Guided Optimization infrastructure

On Thu, Jan 21, 2021 at 12:24 AM Bill Wendling <[email protected]> wrote:
>
> From: Sami Tolvanen <[email protected]>
>
> Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> profile, the kernel is instrumented with PGO counters, a representative
> workload is run, and the raw profile data is collected from
> /sys/kernel/debug/pgo/profraw.
>
> The raw profile data must be processed by clang's "llvm-profdata" tool
> before it can be used during recompilation:
>
> $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
> Multiple raw profiles may be merged during this step.
>
> The data can now be used by the compiler:
>
> $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>
> This initial submission is restricted to x86, as that's the platform we
> know works. This restriction can be lifted once other platforms have
> been verified to work with PGO.
>
> Note that this method of profiling the kernel is clang-native, unlike
> the clang support in kernel/gcov.
>
> [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>
> Signed-off-by: Sami Tolvanen <[email protected]>
> Co-developed-by: Bill Wendling <[email protected]>
> Signed-off-by: Bill Wendling <[email protected]>
> Tested-by: Nick Desaulniers <[email protected]>
> ---
> v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> testing.
> - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> Song's comments.
> v3: - Added change log section based on Sedat Dilek's comments.
> v4: - Remove non-x86 Makfile changes and se "hweight64" instead of using our
> own popcount implementation, based on Nick Desaulniers's comment.
> v5: - Correct padding calculation, discovered by Nathan Chancellor.
> v6: - Add better documentation about the locking scheme and other things.
> - Rename macros to better match the same macros in LLVM's source code.

This is a major win for readability and comparing it against LLVM's
compiler-rt implementation! Thank you for doing that. It looks like
it addresses most of my concerns. I'm not against following up on
little details in subsequent patches on top. However Sedat is right
about the small issue that v6 doesn't compile. If you were to roll
his fixup into a v7 I'd be happy to sign off on it at this point.
--
Thanks,
~Nick Desaulniers

2021-01-22 01:37:16

by Nick Desaulniers

Subject: Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure

On Wed, Jan 20, 2021 at 6:21 PM Sedat Dilek <[email protected]> wrote:
>
> Hi,
>
> When I looked through the code I wondered why we do not add a
> "CONFIG_PGO_CLANG_PROFDATA" which can be helpful when doing the PGO
> rebuild with a vmlinux.profdata.
>
> This introduces a "PGO_PROFDATA" to turn on/off to pass
> "-fprofile-use=vmlinux.profdata" (see CFLAGS_PGO_CLANG_PROFDATA in
> top-level Makefile).
>
> If we turn off via "PGO_PROFILE := n" in several Makefiles - we should
> do the same and add "PGO_PROFDATA := n" to the same Makefiles?
>
> Please see the attached diff.

This is a good idea; something that I brought up in initial code
review (on github). Would it be ok with you to land the core first,
then follow up with this suggestion?

Also, AutoFDO production builds are so incredibly similar to PGO
builds that I could see a possible path forward:
1. land PGO upstream
2. add docs for AutoFDO
3. consider a config for hardcoding the location of the profiling data
so that we don't need to specify it at the command line invocation of
make.
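
To make point 3 concrete: today the profile path has to be spelled out by hand on every rebuild, roughly as below. This is only a sketch assembled from the commands in the commit message; `RUN` and `pgo_rebuild` are names made up here, and nothing like them exists in the patch.

```shell
#!/bin/sh
# Sketch of the manual PGO rebuild steps from the commit message.
# RUN is a hypothetical knob: set RUN=echo to print the commands
# instead of executing them.
RUN=${RUN:-}
PROFRAW=/sys/kernel/debug/pgo/profraw

pgo_rebuild() {
	# 1. Copy the raw profile out of debugfs (requires a running,
	#    instrumented kernel with debugfs mounted).
	$RUN cp "$PROFRAW" vmlinux.profraw
	# 2. Convert the raw profile into the indexed format clang reads.
	$RUN llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
	# 3. Rebuild, passing the profile location on the command line.
	$RUN make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata
}
```

Calling `RUN=echo pgo_rebuild` prints the three commands without touching anything. A config option for the profile location would remove the need for the KCFLAGS argument in step 3.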
--
Thanks,
~Nick Desaulniers

2021-01-22 01:45:53

by Nick Desaulniers

Subject: Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

On Wed, Jan 20, 2021 at 6:03 PM Sedat Dilek <[email protected]> wrote:
>
> On Mon, Jan 18, 2021 at 10:56 PM Bill Wendling <[email protected]> wrote:
> >
> > On Mon, Jan 18, 2021 at 9:26 AM Sedat Dilek <[email protected]> wrote:
> > >
> > > On Mon, Jan 18, 2021 at 1:39 PM Sedat Dilek <[email protected]> wrote:
> > > >
> > > > On Mon, Jan 18, 2021 at 3:32 AM Bill Wendling <[email protected]> wrote:
> > > > >
> > > > > On Sun, Jan 17, 2021 at 4:27 PM Sedat Dilek <[email protected]> wrote:
> > > > > >
> > > > > > [ big snip ]
> > > > >
> > > > > [More snippage.]
> > > > >
> > > > > > [ CC Fangrui ]
> > > > > >
> > > > > > With the attached...
> > > > > >
> > > > > > [PATCH v3] module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for
> > > > > > undefined symbols
> > > > > >
> > > > > > ...I was finally able to boot into a rebuild PGO-optimized Linux-kernel.
> > > > > > For details see ClangBuiltLinux issue #1250 "Unknown symbol
> > > > > > _GLOBAL_OFFSET_TABLE_ loading kernel modules".
> > > > > >
> > > > > Thanks for confirming that this works with the above patch.
> > > > >
> > > > > > @ Bill Nick Sami Nathan
> > > > > >
> > > > > > 1, Can you say something of the impact passing "LLVM_IAS=1" to make?
> > > > >
> > > > > The integrated assembler and this option are more-or-less orthogonal
> > > > > to each other. One can still use the GNU assembler with PGO. If you're
> > > > > having an issue, it may be related to ClangBuiltLinux issue #1250.
> > > > >
> > > > > > 2. Can you please try Nick's DWARF v5 support patchset v5 and
> > > > > > CONFIG_DEBUG_INFO_DWARF5=y (see attachments)?
> > > > > >
> > > > > I know Nick did several tests with PGO. He may have looked into it
> > > > > already, but we can check.
> > > > >
> > > >
> > > > Reproducible.
> > > >
> > > > LLVM_IAS=1 + DWARF5 = Not bootable
> > > >
> > > > I will try:
> > > >
> > > > LLVM_IAS=1 + DWARF4
> > > >
> > >
> > > I was not able to boot into such a built Linux-kernel.
> > >
> > PGO will have no effect on debugging data. If this is an issue with
> > DWARF, then it's likely orthogonal to the PGO patch.
> >
> > > For me worked: DWARF2 and LLVM_IAS=1 *not* set.
> > >
> > > Of course, this could be an issue with my system's LLVM/Clang.
> > >
> > > Debian clang version
> > > 12.0.0-++20210115111113+45ef053bd709-1~exp1~20210115101809.3724
> > >
> > Please use the official clang 11.0.1 release
> > (https://releases.llvm.org/download.html), modifying the
> > kernel/pgo/Kconfig as I suggested above. The reason we specify clang
> > 12 for the minimal version is because of an issue that was recently
> > fixed.
> >
>
> I downgraded to clang-11.1.0-rc1.
> ( See attachment. )
>
> Doing the first build with PGO enabled plus DWARF5 and LLVM_IAS=1 works.
>
> But again after generating vmlinux.profdata and doing the PGO-rebuild
> - the resulting Linux-kernel does NOT boot in QEMU or on bare metal.
> With GNU/as I can boot.
>
> So this is independent of DWARF v4 or DWARF v5 (LLVM_IAS=1 and DWARF
> v2 is not allowed).
> There is something wrong (here) with passing LLVM_IAS=1 to make when
> doing the PGO-rebuild.
>
> Can someone please verify and confirm that the PGO-rebuild with
> LLVM_IAS=1 boots or boots not?

I was able to build+boot with LLVM_IAS=1 on my personal laptop (no
dwarf 5, just mainline+v5).

>
> Thanks.
>
> - Sedat -
>
> > > Can you give me a LLVM commit-id where you had success with LLVM_IAS=1
> > > and especially CONFIG_DEBUG_INFO_DWARF5=y?
> > > Success means I was able to boot in QEMU and/or bare metal.
> > >
> > The DWARF5 patch isn't in yet, so I don't want to rely upon it too much.

I agree, providing test results with patches that haven't landed yet
can cloud the interpretation of results. It would be helpful to drop
local patch sets before trying this.

If the resulting image still isn't working for you, can you please
provide your config? Surely we'd be able to reproduce boot failures in
QEMU? Nothing comes to mind about a change of assemblers causing an
issue; I would assume assembly cannot be instrumented by the compiler
(even though the compiler is the "driver" of the assembler).

The hash warnings are certainly curious.
IndexedInstrProfReader::getInstrProfRecord() is the only place in LLVM
sources that can emit that.
--
Thanks,
~Nick Desaulniers

2021-01-22 01:47:12

by Sedat Dilek

Subject: Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure

On Fri, Jan 22, 2021 at 2:34 AM Nick Desaulniers
<[email protected]> wrote:
>
> On Wed, Jan 20, 2021 at 6:21 PM Sedat Dilek <[email protected]> wrote:
> >
> > Hi,
> >
> > When I looked through the code I wondered why we do not add a
> > "CONFIG_PGO_CLANG_PROFDATA" which can be helpful when doing the PGO
> > rebuild with a vmlinux.profdata.
> >
> > This introduces a "PGO_PROFDATA" to turn on/off to pass
> > "-fprofile-use=vmlinux.profdata" (see CFLAGS_PGO_CLANG_PROFDATA in
> > top-level Makefile).
> >
> > If we turn off via "PGO_PROFILE := n" in several Makefiles - we should
> > do the same and add "PGO_PROFDATA := n" to the same Makefiles?
> >
> > Please see the attached diff.
>
> This is a good idea; something that I brought up in initial code
> review (on github). Would it be ok with you to land the core first,
> then follow up with this suggestion?
>
> Also, AutoFDO production builds are so incredibly similar to PGO
> builds that I could see a possible path forward:
> 1. land PGO upstream
> 2. adds docs for AutoFDO
> 3. consider a config for hardcoding the location of the profiling data
> so that we don't need to specify it at the command line invocation of
> make.
>

I made a v3 - with some small nits fixed.
The idea was to make the "PGO-rebuild" handling a bit easier.

But as you say that can wait.

Some personal notes:

I will be very happy when people verify/confirm what's going on with
PGO-rebuild + LLVM_IAS=1.
As said GNU/AS and PGO-rebuild is fine.
( This seems to be independent of clang-12 or clang-11. )
( This seems to be independent of DWARF v4 or v5 enabled. )

The benefit I saw here was a build-time reduction of 00:30 out of a
total of 04:30 when using a PGO-rebuilt Linux-kernel.
Approx. 10%?
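
For the record, the arithmetic behind that estimate (a throwaway shell sketch; the variable names are just for illustration, and the ratio is the same whether the times are read as mm:ss or hh:mm):

```shell
# 04:30 total and 00:30 saved, expressed in the smaller unit.
total=$((4 * 60 + 30))   # 270
saved=30

# Integer percentage of the total that was saved.
pct=$((saved * 100 / total))
echo "build time saved: ${pct}%"   # prints "build time saved: 11%"
```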

This is not much compared to a ThinLTO + PGO optimized LLVM toolchain,
which saved 40% of build-time here.

- Sedat -

2021-01-22 01:48:27

by Nick Desaulniers

Subject: Re: [PATCH v6] pgo: add clang's Profile Guided Optimization infrastructure

On Thu, Jan 21, 2021 at 2:35 AM Sedat Dilek <[email protected]> wrote:
>
> Thanks for v6.
>
> Small Changelog nits:
> * Can you reverse-order the changelog - latest v6 first.
> * v4: s/Makfile changes and se/Ma*k*efile changes and *u*se
>
> Can you add a hint to this "Clang-PGO" patch requiring Clang >= 12?
>
> Can you please add a comment for people using Clang >= 12 (ToT) and
> have kernel-modules enabled, they will need the patch from CBL issue
> #1250?
> Otherwise they cannot boot and follow the next steps in the workflow.
>
> Can you put a comment about value "1" to reset the profiling counter?
> That there is no "0" value stopping it.
>
> Can you add an example for the workload test?
> Here I do a x86-64 defconfig build.
> See attached script.
>
> Usually, I download this patch from LORE.
>
> link="https://lore.kernel.org/r/[email protected]"
> b4 -d am $link
>
> This downloads v6.
>
> What if I want a previous version (compare)?
> Again, I will love to see a "clang-pgo" branch and maybe tags for the
> several versions in your personal GitHub.
> Come on, Bill :-).

That's quite a long list, Sedat! Do you think some of these can be
follow ups, once the core lands? I'd much prefer to land the meat of
this and follow up quickly, than tire out poor Bill! :P
--
Thanks,
~Nick Desaulniers

2021-01-22 01:51:00

by Sedat Dilek

Subject: Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

On Fri, Jan 22, 2021 at 2:43 AM Nick Desaulniers
<[email protected]> wrote:
>
> On Wed, Jan 20, 2021 at 6:03 PM Sedat Dilek <[email protected]> wrote:
> >
> > On Mon, Jan 18, 2021 at 10:56 PM Bill Wendling <[email protected]> wrote:
> > >
> > > On Mon, Jan 18, 2021 at 9:26 AM Sedat Dilek <[email protected]> wrote:
> > > >
> > > > On Mon, Jan 18, 2021 at 1:39 PM Sedat Dilek <[email protected]> wrote:
> > > > >
> > > > > On Mon, Jan 18, 2021 at 3:32 AM Bill Wendling <[email protected]> wrote:
> > > > > >
> > > > > > On Sun, Jan 17, 2021 at 4:27 PM Sedat Dilek <[email protected]> wrote:
> > > > > > >
> > > > > > > [ big snip ]
> > > > > >
> > > > > > [More snippage.]
> > > > > >
> > > > > > > [ CC Fangrui ]
> > > > > > >
> > > > > > > With the attached...
> > > > > > >
> > > > > > > [PATCH v3] module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for
> > > > > > > undefined symbols
> > > > > > >
> > > > > > > ...I was finally able to boot into a rebuild PGO-optimized Linux-kernel.
> > > > > > > For details see ClangBuiltLinux issue #1250 "Unknown symbol
> > > > > > > _GLOBAL_OFFSET_TABLE_ loading kernel modules".
> > > > > > >
> > > > > > Thanks for confirming that this works with the above patch.
> > > > > >
> > > > > > > @ Bill Nick Sami Nathan
> > > > > > >
> > > > > > > 1, Can you say something of the impact passing "LLVM_IAS=1" to make?
> > > > > >
> > > > > > The integrated assembler and this option are more-or-less orthogonal
> > > > > > to each other. One can still use the GNU assembler with PGO. If you're
> > > > > > having an issue, it may be related to ClangBuiltLinux issue #1250.
> > > > > >
> > > > > > > 2. Can you please try Nick's DWARF v5 support patchset v5 and
> > > > > > > CONFIG_DEBUG_INFO_DWARF5=y (see attachments)?
> > > > > > >
> > > > > > I know Nick did several tests with PGO. He may have looked into it
> > > > > > already, but we can check.
> > > > > >
> > > > >
> > > > > Reproducible.
> > > > >
> > > > > LLVM_IAS=1 + DWARF5 = Not bootable
> > > > >
> > > > > I will try:
> > > > >
> > > > > LLVM_IAS=1 + DWARF4
> > > > >
> > > >
> > > > I was not able to boot into such a built Linux-kernel.
> > > >
> > > PGO will have no effect on debugging data. If this is an issue with
> > > DWARF, then it's likely orthogonal to the PGO patch.
> > >
> > > > For me worked: DWARF2 and LLVM_IAS=1 *not* set.
> > > >
> > > > Of course, this could be an issue with my system's LLVM/Clang.
> > > >
> > > > Debian clang version
> > > > 12.0.0-++20210115111113+45ef053bd709-1~exp1~20210115101809.3724
> > > >
> > > Please use the official clang 11.0.1 release
> > > (https://releases.llvm.org/download.html), modifying the
> > > kernel/pgo/Kconfig as I suggested above. The reason we specify clang
> > > 12 for the minimal version is because of an issue that was recently
> > > fixed.
> > >
> >
> > I downgraded to clang-11.1.0-rc1.
> > ( See attachment. )
> >
> > Doing the first build with PGO enabled plus DWARF5 and LLVM_IAS=1 works.
> >
> > But again after generating vmlinux.profdata and doing the PGO-rebuild
> > - the resulting Linux-kernel does NOT boot in QEMU or on bare metal.
> > With GNU/as I can boot.
> >
> > So this is independent of DWARF v4 or DWARF v5 (LLVM_IAS=1 and DWARF
> > v2 is not allowed).
> > There is something wrong (here) with passing LLVM_IAS=1 to make when
> > doing the PGO-rebuild.
> >
> > Can someone please verify and confirm that the PGO-rebuild with
> > LLVM_IAS=1 boots or boots not?
>
> I was able to build+boot with LLVM_IAS=1 on my personal laptop (no
> dwarf 5, just mainline+v5).
>

To clarify:

I can build a PGO-enabled Linux-kernel and boot it.
Afterwards generate a vmlinux.profdata.
In a next step: A rebuild with the PGO-Kconfig disabled + LLVM_IAS=1
does not boot.

- Sedat -

> >
> > Thanks.
> >
> > - Sedat -
> >
> > > > Can you give me a LLVM commit-id where you had success with LLVM_IAS=1
> > > > and especially CONFIG_DEBUG_INFO_DWARF5=y?
> > > > Success means I was able to boot in QEMU and/or bare metal.
> > > >
> > > The DWARF5 patch isn't in yet, so I don't want to rely upon it too much.
>
> I agree, providing test results with patches that haven't landed yet
> can cloud the interpretation of results. It would be helpful to drop
> local patch sets before trying this.
>
> If the resulting image still isn't working for you, can you please
> provide your config? Surely we'd be able to reproduce boot failures in
> QEMU? Nothing comes to mind about a change of assemblers causing an
> issue; I would assume assembly cannot be instrumented by the compiler
> (even though the compiler is the "driver" of the assembler).
>
> The hash warnings are certainly curious.
> IndexedInstrProfReader::getInstrProfRecord() is the only place in LLVM
> sources that can emit that.
> --
> Thanks,
> ~Nick Desaulniers

2021-01-22 01:56:08

by Nick Desaulniers

Subject: Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

On Thu, Jan 21, 2021 at 5:49 PM Sedat Dilek <[email protected]> wrote:
>
> On Fri, Jan 22, 2021 at 2:43 AM Nick Desaulniers
> <[email protected]> wrote:
> >
> > On Wed, Jan 20, 2021 at 6:03 PM Sedat Dilek <[email protected]> wrote:
> > >
> > > On Mon, Jan 18, 2021 at 10:56 PM Bill Wendling <[email protected]> wrote:
> > > >
> > > > On Mon, Jan 18, 2021 at 9:26 AM Sedat Dilek <[email protected]> wrote:
> > > > >
> > > > > On Mon, Jan 18, 2021 at 1:39 PM Sedat Dilek <[email protected]> wrote:
> > > > > >
> > > > > > On Mon, Jan 18, 2021 at 3:32 AM Bill Wendling <[email protected]> wrote:
> > > > > > >
> > > > > > > On Sun, Jan 17, 2021 at 4:27 PM Sedat Dilek <[email protected]> wrote:
> > > > > > > >
> > > > > > > > [ big snip ]
> > > > > > >
> > > > > > > [More snippage.]
> > > > > > >
> > > > > > > > [ CC Fangrui ]
> > > > > > > >
> > > > > > > > With the attached...
> > > > > > > >
> > > > > > > > [PATCH v3] module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for
> > > > > > > > undefined symbols
> > > > > > > >
> > > > > > > > ...I was finally able to boot into a rebuild PGO-optimized Linux-kernel.
> > > > > > > > For details see ClangBuiltLinux issue #1250 "Unknown symbol
> > > > > > > > _GLOBAL_OFFSET_TABLE_ loading kernel modules".
> > > > > > > >
> > > > > > > Thanks for confirming that this works with the above patch.
> > > > > > >
> > > > > > > > @ Bill Nick Sami Nathan
> > > > > > > >
> > > > > > > > 1, Can you say something of the impact passing "LLVM_IAS=1" to make?
> > > > > > >
> > > > > > > The integrated assembler and this option are more-or-less orthogonal
> > > > > > > to each other. One can still use the GNU assembler with PGO. If you're
> > > > > > > having an issue, it may be related to ClangBuiltLinux issue #1250.
> > > > > > >
> > > > > > > > 2. Can you please try Nick's DWARF v5 support patchset v5 and
> > > > > > > > CONFIG_DEBUG_INFO_DWARF5=y (see attachments)?
> > > > > > > >
> > > > > > > I know Nick did several tests with PGO. He may have looked into it
> > > > > > > already, but we can check.
> > > > > > >
> > > > > >
> > > > > > Reproducible.
> > > > > >
> > > > > > LLVM_IAS=1 + DWARF5 = Not bootable
> > > > > >
> > > > > > I will try:
> > > > > >
> > > > > > LLVM_IAS=1 + DWARF4
> > > > > >
> > > > >
> > > > > I was not able to boot into such a built Linux-kernel.
> > > > >
> > > > PGO will have no effect on debugging data. If this is an issue with
> > > > DWARF, then it's likely orthogonal to the PGO patch.
> > > >
> > > > > For me worked: DWARF2 and LLVM_IAS=1 *not* set.
> > > > >
> > > > > Of course, this could be an issue with my system's LLVM/Clang.
> > > > >
> > > > > Debian clang version
> > > > > 12.0.0-++20210115111113+45ef053bd709-1~exp1~20210115101809.3724
> > > > >
> > > > Please use the official clang 11.0.1 release
> > > > (https://releases.llvm.org/download.html), modifying the
> > > > kernel/pgo/Kconfig as I suggested above. The reason we specify clang
> > > > 12 for the minimal version is because of an issue that was recently
> > > > fixed.
> > > >
> > >
> > > I downgraded to clang-11.1.0-rc1.
> > > ( See attachment. )
> > >
> > > Doing the first build with PGO enabled plus DWARF5 and LLVM_IAS=1 works.
> > >
> > > But again after generating vmlinux.profdata and doing the PGO-rebuild
> > > - the resulting Linux-kernel does NOT boot in QEMU or on bare metal.
> > > With GNU/as I can boot.
> > >
> > > So this is independent of DWARF v4 or DWARF v5 (LLVM_IAS=1 and DWARF
> > > v2 is not allowed).
> > > There is something wrong (here) with passing LLVM_IAS=1 to make when
> > > doing the PGO-rebuild.
> > >
> > > Can someone please verify and confirm that the PGO-rebuild with
> > > LLVM_IAS=1 boots or boots not?
> >
> > I was able to build+boot with LLVM_IAS=1 on my personal laptop (no
> > dwarf 5, just mainline+v5).
> >
>
> To clarify:
>
> I can build a PGO-enabled Linux-kernel and boot it.
> Afterwards generate a vmlinux.profdata.
> In a next step: A rebuild without PGO-Kconfig disabled + LLVM_IAS=1
> does not boot.

Does the rebuild produce the hash warnings previously reported?

Can you send your .config for this?
--
Thanks,
~Nick Desaulniers

2021-01-22 02:38:45

by Sedat Dilek

Subject: Re: [PATCH v6] pgo: add clang's Profile Guided Optimization infrastructure

On Fri, Jan 22, 2021 at 2:44 AM Nick Desaulniers
<[email protected]> wrote:
>
> On Thu, Jan 21, 2021 at 2:35 AM Sedat Dilek <[email protected]> wrote:
> >
> > Thanks for v6.
> >
> > Small Changelog nits:
> > * Can you reverse-order the changelog - latest v6 first.
> > * v4: s/Makfile changes and se/Ma*k*efile changes and *u*se
> >
> > Can you add a hint to this "Clang-PGO" patch requiring Clang >= 12?
> >
> > Can you please add a comment for people using Clang >= 12 (ToT) and
> > have kernel-modules enabled, they will need the patch from CBL issue
> > #1250?
> > Otherwise they cannot boot and follow the next steps in the workflow.
> >
> > Can you put a comment about value "1" to reset the profiling counter?
> > That there is no "0" value stopping it.
> >
> > Can you add an example for the workload test?
> > Here I do a x86-64 defconfig build.
> > See attached script.
> >
> > Usually, I download this patch from LORE.
> >
> > link="https://lore.kernel.org/r/[email protected]"
> > b4 -d am $link
> >
> > This downloads v6.
> >
> > What if I want a previous version (compare)?
> > Again, I will love to see a "clang-pgo" branch and maybe tags for the
> > several versions in your personal GitHub.
> > Come on, Bill :-).
>
> That's quite a long list, Sedat! Do you think some of these can be
> follow ups, once the core lands? I'd much prefer to land the meat of
> this and follow up quickly, than tire out poor Bill! :P
>

Poor Bill - he lost his hairs :-)?

I had hoped that the documentation would be improved and made clearer in some places.

- Sedat -

2021-01-22 02:38:54

by Sedat Dilek

Subject: Re: [PATCH v5] pgo: add clang's Profile Guided Optimization infrastructure

On Fri, Jan 22, 2021 at 2:52 AM Nick Desaulniers
<[email protected]> wrote:
>
> On Thu, Jan 21, 2021 at 5:49 PM Sedat Dilek <[email protected]> wrote:
> >
> > On Fri, Jan 22, 2021 at 2:43 AM Nick Desaulniers
> > <[email protected]> wrote:
> > >
> > > On Wed, Jan 20, 2021 at 6:03 PM Sedat Dilek <[email protected]> wrote:
> > > >
> > > > On Mon, Jan 18, 2021 at 10:56 PM Bill Wendling <[email protected]> wrote:
> > > > >
> > > > > On Mon, Jan 18, 2021 at 9:26 AM Sedat Dilek <[email protected]> wrote:
> > > > > >
> > > > > > On Mon, Jan 18, 2021 at 1:39 PM Sedat Dilek <[email protected]> wrote:
> > > > > > >
> > > > > > > On Mon, Jan 18, 2021 at 3:32 AM Bill Wendling <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Sun, Jan 17, 2021 at 4:27 PM Sedat Dilek <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > [ big snip ]
> > > > > > > >
> > > > > > > > [More snippage.]
> > > > > > > >
> > > > > > > > > [ CC Fangrui ]
> > > > > > > > >
> > > > > > > > > With the attached...
> > > > > > > > >
> > > > > > > > > [PATCH v3] module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for
> > > > > > > > > undefined symbols
> > > > > > > > >
> > > > > > > > > ...I was finally able to boot into a rebuild PGO-optimized Linux-kernel.
> > > > > > > > > For details see ClangBuiltLinux issue #1250 "Unknown symbol
> > > > > > > > > _GLOBAL_OFFSET_TABLE_ loading kernel modules".
> > > > > > > > >
> > > > > > > > Thanks for confirming that this works with the above patch.
> > > > > > > >
> > > > > > > > > @ Bill Nick Sami Nathan
> > > > > > > > >
> > > > > > > > > 1, Can you say something of the impact passing "LLVM_IAS=1" to make?
> > > > > > > >
> > > > > > > > The integrated assembler and this option are more-or-less orthogonal
> > > > > > > > to each other. One can still use the GNU assembler with PGO. If you're
> > > > > > > > having an issue, it may be related to ClangBuiltLinux issue #1250.
> > > > > > > >
> > > > > > > > > 2. Can you please try Nick's DWARF v5 support patchset v5 and
> > > > > > > > > CONFIG_DEBUG_INFO_DWARF5=y (see attachments)?
> > > > > > > > >
> > > > > > > > I know Nick did several tests with PGO. He may have looked into it
> > > > > > > > already, but we can check.
> > > > > > > >
> > > > > > >
> > > > > > > Reproducible.
> > > > > > >
> > > > > > > LLVM_IAS=1 + DWARF5 = Not bootable
> > > > > > >
> > > > > > > I will try:
> > > > > > >
> > > > > > > LLVM_IAS=1 + DWARF4
> > > > > > >
> > > > > >
> > > > > > I was not able to boot into such a built Linux-kernel.
> > > > > >
> > > > > PGO will have no effect on debugging data. If this is an issue with
> > > > > DWARF, then it's likely orthogonal to the PGO patch.
> > > > >
> > > > > > For me worked: DWARF2 and LLVM_IAS=1 *not* set.
> > > > > >
> > > > > > Of course, this could be an issue with my system's LLVM/Clang.
> > > > > >
> > > > > > Debian clang version
> > > > > > 12.0.0-++20210115111113+45ef053bd709-1~exp1~20210115101809.3724
> > > > > >
> > > > > Please use the official clang 11.0.1 release
> > > > > (https://releases.llvm.org/download.html), modifying the
> > > > > kernel/pgo/Kconfig as I suggested above. The reason we specify clang
> > > > > 12 for the minimal version is because of an issue that was recently
> > > > > fixed.
> > > > >
> > > >
> > > > I downgraded to clang-11.1.0-rc1.
> > > > ( See attachment. )
> > > >
> > > > Doing the first build with PGO enabled plus DWARF5 and LLVM_IAS=1 works.
> > > >
> > > > But again after generating vmlinux.profdata and doing the PGO-rebuild
> > > > - the resulting Linux-kernel does NOT boot in QEMU or on bare metal.
> > > > With GNU/as I can boot.
> > > >
> > > > So this is independent of DWARF v4 or DWARF v5 (LLVM_IAS=1 and DWARF
> > > > v2 is not allowed).
> > > > There is something wrong (here) with passing LLVM_IAS=1 to make when
> > > > doing the PGO-rebuild.
> > > >
> > > > Can someone please verify and confirm that the PGO-rebuild with
> > > > LLVM_IAS=1 boots or boots not?
> > >
> > > I was able to build+boot with LLVM_IAS=1 on my personal laptop (no
> > > dwarf 5, just mainline+v5).
> > >
> >
> > To clarify:
> >
> > I can build a PGO-enabled Linux-kernel and boot it.
> > Afterwards generate a vmlinux.profdata.
> > In a next step: A rebuild without PGO-Kconfig disabled + LLVM_IAS=1
> > does not boot.
>
> Does the rebuild produce the hash warnings previously reported?
>
> Can you send your .config for this?

Exactly!

Attached is config-5.11.0-rc4-2-amd64-clang11-pgo.

- Sedat -


Attachments:
config-5.11.0-rc4-2-amd64-clang11-pgo (232.68 kB)

2021-01-22 10:17:06

by Bill Wendling

Subject: [PATCH v7] pgo: add clang's Profile Guided Optimization infrastructure

From: Sami Tolvanen <[email protected]>

Enable the use of clang's Profile-Guided Optimization[1]. To generate a
profile, the kernel is instrumented with PGO counters, a representative
workload is run, and the raw profile data is collected from
/sys/kernel/debug/pgo/profraw.

The raw profile data must be processed by clang's "llvm-profdata" tool
before it can be used during recompilation:

$ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
$ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw

Multiple raw profiles may be merged during this step.

The data can now be used by the compiler:

$ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...

This initial submission is restricted to x86, as that's the platform we
know works. This restriction can be lifted once other platforms have
been verified to work with PGO.

Note that this method of profiling the kernel is clang-native, unlike
the clang support in kernel/gcov.

[1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization

Signed-off-by: Sami Tolvanen <[email protected]>
Co-developed-by: Bill Wendling <[email protected]>
Signed-off-by: Bill Wendling <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
---
v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
testing.
- Corrected documentation, re PGO flags when using LTO, based on Fangrui
Song's comments.
v3: - Added change log section based on Sedat Dilek's comments.
v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
own popcount implementation, based on Nick Desaulniers's comment.
v5: - Correct padding calculation, discovered by Nathan Chancellor.
v6: - Add better documentation about the locking scheme and other things.
- Rename macros to better match the same macros in LLVM's source code.
v7: - Fix minor build failure reported by Sedat.
---
Documentation/dev-tools/index.rst | 1 +
Documentation/dev-tools/pgo.rst | 127 +++++++++
MAINTAINERS | 9 +
Makefile | 3 +
arch/Kconfig | 1 +
arch/x86/Kconfig | 1 +
arch/x86/boot/Makefile | 1 +
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/crypto/Makefile | 4 +
arch/x86/entry/vdso/Makefile | 1 +
arch/x86/kernel/vmlinux.lds.S | 2 +
arch/x86/platform/efi/Makefile | 1 +
arch/x86/purgatory/Makefile | 1 +
arch/x86/realmode/rm/Makefile | 1 +
arch/x86/um/vdso/Makefile | 1 +
drivers/firmware/efi/libstub/Makefile | 1 +
include/asm-generic/vmlinux.lds.h | 44 +++
kernel/Makefile | 1 +
kernel/pgo/Kconfig | 35 +++
kernel/pgo/Makefile | 5 +
kernel/pgo/fs.c | 389 ++++++++++++++++++++++++++
kernel/pgo/instrument.c | 189 +++++++++++++
kernel/pgo/pgo.h | 203 ++++++++++++++
scripts/Makefile.lib | 10 +
24 files changed, 1032 insertions(+)
create mode 100644 Documentation/dev-tools/pgo.rst
create mode 100644 kernel/pgo/Kconfig
create mode 100644 kernel/pgo/Makefile
create mode 100644 kernel/pgo/fs.c
create mode 100644 kernel/pgo/instrument.c
create mode 100644 kernel/pgo/pgo.h
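
Reviewer note on __llvm_profile_instrument_memop (added in v2, reworked in v4):
the size bucketing can be tried in userspace. This is a hedged sketch, not part
of the patch. range_rep_value() is a hypothetical name for a mirror of
inst_prof_get_range_rep_value() in kernel/pgo/instrument.c, and
__builtin_popcountll() stands in for the kernel's hweight64(), so it assumes a
GCC- or Clang-compatible compiler.

```c
#include <stdint.h>

/*
 * Map a memop size to the representative value of its range:
 *   0..8       tracked individually
 *   >= 513     collapsed to 513
 *   2^n        kept as is
 *   otherwise  previous power of two, plus 1
 */
static uint64_t range_rep_value(uint64_t value)
{
	if (value <= 8)
		return value;
	if (value >= 513)
		return 513;
	if (__builtin_popcountll(value) == 1)	/* stand-in for hweight64() */
		return value;

	/*
	 * 1ULL keeps the shift in 64 bits. The patch uses a plain int 1,
	 * which gives the same result here because the shift is at most 8.
	 */
	return (1ULL << (64 - __builtin_clzll(value) - 1)) + 1;
}
```

With this mapping a size of 100 lands in the bucket represented by 65, while 16
and 512 keep their own buckets.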

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index f7809c7b1ba9..8d6418e85806 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -26,6 +26,7 @@ whole; patches welcome!
kgdb
kselftest
kunit/index
+ pgo


.. only:: subproject and html
diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
new file mode 100644
index 000000000000..b7f11d8405b7
--- /dev/null
+++ b/Documentation/dev-tools/pgo.rst
@@ -0,0 +1,127 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+Using PGO with the Linux kernel
+===============================
+
+Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
+when building with Clang. The profiling data is exported via the ``pgo``
+debugfs directory.
+
+.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
+
+
+Preparation
+===========
+
+Configure the kernel with:
+
+.. code-block:: make
+
+ CONFIG_DEBUG_FS=y
+ CONFIG_PGO_CLANG=y
+
+Note that kernels compiled with profiling flags will be significantly larger
+and run slower.
+
+Profiling data will only become accessible once debugfs has been mounted:
+
+.. code-block:: sh
+
+ mount -t debugfs none /sys/kernel/debug
+
+
+Customization
+=============
+
+You can enable or disable profiling for individual files and directories by
+adding a line similar to the following to the respective kernel Makefile:
+
+- For a single file (e.g. main.o)
+
+ .. code-block:: make
+
+ PGO_PROFILE_main.o := y
+
+- For all files in one directory
+
+ .. code-block:: make
+
+ PGO_PROFILE := y
+
+To exclude files from being profiled use
+
+ .. code-block:: make
+
+ PGO_PROFILE_main.o := n
+
+and
+
+ .. code-block:: make
+
+ PGO_PROFILE := n
+
+Only files which are linked to the main kernel image or are compiled as kernel
+modules are supported by this mechanism.
+
+
+Files
+=====
+
+The PGO kernel support creates the following files in debugfs:
+
+``/sys/kernel/debug/pgo``
+ Parent directory for all PGO-related files.
+
+``/sys/kernel/debug/pgo/reset``
+ Global reset file: resets all profile data to zero when written to.
+
+``/sys/kernel/debug/pgo/profraw``
+ The raw PGO data that must be processed with ``llvm-profdata``.
+
+
+Workflow
+========
+
+The PGO kernel can be run on the host or test machines. The data, though,
+should be analyzed with Clang's tools from the same Clang version that the
+kernel was compiled with. Clang is tolerant of version skew, but it's easier
+to use the same Clang version.
+
+The profiling data is useful for optimizing the kernel, analyzing coverage,
+etc. Clang offers tools to perform these tasks.
+
+Here is an example workflow for profiling an instrumented kernel with PGO and
+using the result to optimize the kernel:
+
+1) Install the kernel on the TEST machine.
+
+2) Reset the data counters right before running the load tests
+
+ .. code-block:: sh
+
+ $ echo 1 > /sys/kernel/debug/pgo/reset
+
+3) Run the load tests.
+
+4) Collect the raw profile data
+
+ .. code-block:: sh
+
+ $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
+
+5) (Optional) Download the raw profile data to the HOST machine.
+
+6) Process the raw profile data
+
+ .. code-block:: sh
+
+ $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
+
+ Note that multiple raw profile data files can be merged during this step.
+
+7) Rebuild the kernel using the profile data (PGO disabled)
+
+ .. code-block:: sh
+
+ $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
diff --git a/MAINTAINERS b/MAINTAINERS
index 705776b31c8d..0a75d223682d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13948,6 +13948,15 @@ S: Maintained
F: include/linux/personality.h
F: include/uapi/linux/personality.h

+PGO BASED KERNEL PROFILING
+M: Sami Tolvanen <[email protected]>
+M: Bill Wendling <[email protected]>
+R: Nathan Chancellor <[email protected]>
+R: Nick Desaulniers <[email protected]>
+S: Supported
+F: Documentation/dev-tools/pgo.rst
+F: kernel/pgo
+
PHOENIX RC FLIGHT CONTROLLER ADAPTER
M: Marcus Folkesson <[email protected]>
L: [email protected]
diff --git a/Makefile b/Makefile
index b0e4767735dc..9339541f7cec 100644
--- a/Makefile
+++ b/Makefile
@@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
# Defaults to vmlinux, but the arch makefile usually adds further targets
all: vmlinux

+CFLAGS_PGO_CLANG := -fprofile-generate
+export CFLAGS_PGO_CLANG
+
CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
$(call cc-option,-fno-tree-loop-im) \
$(call cc-disable-warning,maybe-uninitialized,)
diff --git a/arch/Kconfig b/arch/Kconfig
index 24862d15f3a3..f39d3991f6bf 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
pairs of 32-bit arguments, select this option.

source "kernel/gcov/Kconfig"
+source "kernel/pgo/Kconfig"

source "scripts/gcc-plugins/Kconfig"

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 21f851179ff0..36305ea61dc0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -96,6 +96,7 @@ config X86
select ARCH_SUPPORTS_DEBUG_PAGEALLOC
select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
+ select ARCH_SUPPORTS_PGO_CLANG if X86_64
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index fe605205b4ce..383853e32f67 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n

$(obj)/bzImage: asflags-y := $(SVGA_MODE)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index e0bc3988c3fa..ed12ab65f606 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/

KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE :=n

KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index a31de0c6ccde..5753aea7bcbd 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -4,6 +4,10 @@

OBJECT_FILES_NON_STANDARD := y

+# Disable PGO for curve25519-x86_64. With PGO enabled, clang runs out of
+# registers for some of the functions.
+PGO_PROFILE_curve25519-x86_64.o := n
+
obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o

obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 02e3e42f380b..26e2b3af0145 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO $@
VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
$(call ld-option, --eh-frame-hdr) -Bsymbolic
GCOV_PROFILE := n
+PGO_PROFILE := n

quiet_cmd_vdso_and_check = VDSO $@
cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index efd9e9ea17f2..f6cab2316c46 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -184,6 +184,8 @@ SECTIONS

BUG_TABLE

+ PGO_CLANG_DATA
+
ORC_UNWIND_TABLE

. = ALIGN(PAGE_SIZE);
diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
index 84b09c230cbd..5f22b31446ad 100644
--- a/arch/x86/platform/efi/Makefile
+++ b/arch/x86/platform/efi/Makefile
@@ -2,6 +2,7 @@
OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
KASAN_SANITIZE := n
GCOV_PROFILE := n
+PGO_PROFILE := n

obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 95ea17a9d20c..36f20e99da0b 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk

# Sanitizer, etc. runtimes are unavailable and cannot be linked here.
GCOV_PROFILE := n
+PGO_PROFILE := n
KASAN_SANITIZE := n
UBSAN_SANITIZE := n
KCSAN_SANITIZE := n
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index 83f1b6a56449..21797192f958 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n
diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
index 5943387e3f35..54f5768f5853 100644
--- a/arch/x86/um/vdso/Makefile
+++ b/arch/x86/um/vdso/Makefile
@@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@

VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
GCOV_PROFILE := n
+PGO_PROFILE := n

#
# Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 8a94388e38b3..2d81623b33f2 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -40,6 +40,7 @@ KBUILD_CFLAGS := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))

GCOV_PROFILE := n
+PGO_PROFILE := n
# Sanitizer runtimes are unavailable and cannot be linked here.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index b2b3d81b1535..3a591bb18c5f 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -316,6 +316,49 @@
#define THERMAL_TABLE(name)
#endif

+#ifdef CONFIG_PGO_CLANG
+#define PGO_CLANG_DATA \
+ __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
+ . = ALIGN(8); \
+ __llvm_prf_start = .; \
+ __llvm_prf_data_start = .; \
+ KEEP(*(__llvm_prf_data)) \
+ . = ALIGN(8); \
+ __llvm_prf_data_end = .; \
+ } \
+ __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
+ . = ALIGN(8); \
+ __llvm_prf_cnts_start = .; \
+ KEEP(*(__llvm_prf_cnts)) \
+ . = ALIGN(8); \
+ __llvm_prf_cnts_end = .; \
+ } \
+ __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
+ . = ALIGN(8); \
+ __llvm_prf_names_start = .; \
+ KEEP(*(__llvm_prf_names)) \
+ . = ALIGN(8); \
+ __llvm_prf_names_end = .; \
+ . = ALIGN(8); \
+ } \
+ __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
+ __llvm_prf_vals_start = .; \
+ KEEP(*(__llvm_prf_vals)) \
+ . = ALIGN(8); \
+ __llvm_prf_vals_end = .; \
+ . = ALIGN(8); \
+ } \
+ __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
+ __llvm_prf_vnds_start = .; \
+ KEEP(*(__llvm_prf_vnds)) \
+ . = ALIGN(8); \
+ __llvm_prf_vnds_end = .; \
+ __llvm_prf_end = .; \
+ }
+#else
+#define PGO_CLANG_DATA
+#endif
+
#define KERNEL_DTB() \
STRUCT_ALIGN(); \
__dtb_start = .; \
@@ -1125,6 +1168,7 @@
CONSTRUCTORS \
} \
BUG_TABLE \
+ PGO_CLANG_DATA

#define INIT_TEXT_SECTION(inittext_align) \
. = ALIGN(inittext_align); \
diff --git a/kernel/Makefile b/kernel/Makefile
index aa7368c7eabf..0b34ca228ba4 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
obj-$(CONFIG_KCSAN) += kcsan/
obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
+obj-$(CONFIG_PGO_CLANG) += pgo/

obj-$(CONFIG_PERF_EVENTS) += events/

diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
new file mode 100644
index 000000000000..76a640b6cf6e
--- /dev/null
+++ b/kernel/pgo/Kconfig
@@ -0,0 +1,35 @@
+# SPDX-License-Identifier: GPL-2.0-only
+menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
+
+config ARCH_SUPPORTS_PGO_CLANG
+ bool
+
+config PGO_CLANG
+ bool "Enable clang's PGO-based kernel profiling"
+ depends on DEBUG_FS
+ depends on ARCH_SUPPORTS_PGO_CLANG
+ depends on CC_IS_CLANG && CLANG_VERSION >= 120000
+ help
+ This option enables clang's PGO (Profile Guided Optimization) based
+ code profiling to better optimize the kernel.
+
+ If unsure, say N.
+
+ Run a representative workload for your application on a kernel
+ compiled with this option and download the raw profile file from
+ /sys/kernel/debug/pgo/profraw. This file needs to be processed with
+ llvm-profdata. It may be merged with other collected raw profiles.
+
+ Copy the resulting profile file into vmlinux.profdata, and enable
+ KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
+ kernel.
+
+ Note that a kernel compiled with profiling flags will be
+ significantly larger and run slower. Also be sure to exclude files
+ from profiling which are not linked to the kernel image to prevent
+ linker errors.
+
+ Note that the debugfs filesystem has to be mounted to access
+ profiling data.
+
+endmenu
diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
new file mode 100644
index 000000000000..41e27cefd9a4
--- /dev/null
+++ b/kernel/pgo/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+GCOV_PROFILE := n
+PGO_PROFILE := n
+
+obj-y += fs.o instrument.o
diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
new file mode 100644
index 000000000000..1678df3b7d64
--- /dev/null
+++ b/kernel/pgo/fs.c
@@ -0,0 +1,389 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ * Sami Tolvanen <[email protected]>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt) "pgo: " fmt
+
+#include <linux/kernel.h>
+#include <linux/debugfs.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include "pgo.h"
+
+static struct dentry *directory;
+
+struct prf_private_data {
+ void *buffer;
+ unsigned long size;
+};
+
+/*
+ * Raw profile data format:
+ *
+ * - llvm_prf_header
+ * - __llvm_prf_data
+ * - __llvm_prf_cnts
+ * - __llvm_prf_names
+ * - zero padding to 8 bytes
+ * - for each llvm_prf_data in __llvm_prf_data:
+ * - llvm_prf_value_data
+ * - llvm_prf_value_record + site count array
+ * - llvm_prf_value_node_data
+ * ...
+ * ...
+ * ...
+ */
+
+static void prf_fill_header(void **buffer)
+{
+ struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
+
+#ifdef CONFIG_64BIT
+ header->magic = LLVM_INSTR_PROF_RAW_MAGIC_64;
+#else
+ header->magic = LLVM_INSTR_PROF_RAW_MAGIC_32;
+#endif
+ header->version = LLVM_VARIANT_MASK_IR_PROF | LLVM_INSTR_PROF_RAW_VERSION;
+ header->data_size = prf_data_count();
+ header->padding_bytes_before_counters = 0;
+ header->counters_size = prf_cnts_count();
+ header->padding_bytes_after_counters = 0;
+ header->names_size = prf_names_count();
+ header->counters_delta = (u64)__llvm_prf_cnts_start;
+ header->names_delta = (u64)__llvm_prf_names_start;
+ header->value_kind_last = LLVM_INSTR_PROF_IPVK_LAST;
+
+ *buffer += sizeof(*header);
+}
+
+/*
+ * Copy the source into the buffer, incrementing the pointer into buffer in the
+ * process.
+ */
+static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
+{
+ memcpy(*buffer, src, size);
+ *buffer += size;
+}
+
+static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
+{
+ struct llvm_prf_value_node **nodes =
+ (struct llvm_prf_value_node **)p->values;
+ u32 kinds = 0;
+ u32 size = 0;
+ unsigned int kind;
+ unsigned int n;
+ unsigned int s = 0;
+
+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+ unsigned int sites = p->num_value_sites[kind];
+
+ if (!sites)
+ continue;
+
+ /* Record + site count array */
+ size += prf_get_value_record_size(sites);
+ kinds++;
+
+ if (!nodes)
+ continue;
+
+ for (n = 0; n < sites; n++) {
+ u32 count = 0;
+ struct llvm_prf_value_node *site = nodes[s + n];
+
+ while (site && ++count <= U8_MAX)
+ site = site->next;
+
+ size += count *
+ sizeof(struct llvm_prf_value_node_data);
+ }
+
+ s += sites;
+ }
+
+ if (size)
+ size += sizeof(struct llvm_prf_value_data);
+
+ if (value_kinds)
+ *value_kinds = kinds;
+
+ return size;
+}
+
+static u32 prf_get_value_size(void)
+{
+ u32 size = 0;
+ struct llvm_prf_data *p;
+
+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+ size += __prf_get_value_size(p, NULL);
+
+ return size;
+}
+
+/* Serialize the profiling's value. */
+static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
+{
+ struct llvm_prf_value_data header;
+ struct llvm_prf_value_node **nodes =
+ (struct llvm_prf_value_node **)p->values;
+ unsigned int kind;
+ unsigned int n;
+ unsigned int s = 0;
+
+ header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
+
+ if (!header.num_value_kinds)
+ /* Nothing to write. */
+ return;
+
+ prf_copy_to_buffer(buffer, &header, sizeof(header));
+
+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+ struct llvm_prf_value_record *record;
+ u8 *counts;
+ unsigned int sites = p->num_value_sites[kind];
+
+ if (!sites)
+ continue;
+
+ /* Profiling value record. */
+ record = *(struct llvm_prf_value_record **)buffer;
+ *buffer += prf_get_value_record_header_size();
+
+ record->kind = kind;
+ record->num_value_sites = sites;
+
+ /* Site count array. */
+ counts = *(u8 **)buffer;
+ *buffer += prf_get_value_record_site_count_size(sites);
+
+ /*
+ * If we don't have nodes, we can skip updating the site count
+ * array, because the buffer is zero filled.
+ */
+ if (!nodes)
+ continue;
+
+ for (n = 0; n < sites; n++) {
+ u32 count = 0;
+ struct llvm_prf_value_node *site = nodes[s + n];
+
+ while (site && ++count <= U8_MAX) {
+ prf_copy_to_buffer(buffer, site,
+ sizeof(struct llvm_prf_value_node_data));
+ site = site->next;
+ }
+
+ counts[n] = (u8)count;
+ }
+
+ s += sites;
+ }
+}
+
+static void prf_serialize_values(void **buffer)
+{
+ struct llvm_prf_data *p;
+
+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+ prf_serialize_value(p, buffer);
+}
+
+static inline unsigned long prf_get_padding(unsigned long size)
+{
+ return 7 & (sizeof(u64) - size % sizeof(u64));
+}
+
+static unsigned long prf_buffer_size(void)
+{
+ return sizeof(struct llvm_prf_header) +
+ prf_data_size() +
+ prf_cnts_size() +
+ prf_names_size() +
+ prf_get_padding(prf_names_size()) +
+ prf_get_value_size();
+}
+
+/*
+ * Serialize the profiling data into a format LLVM's tools can understand.
+ * Note: caller *must* hold pgo_lock.
+ */
+static int prf_serialize(struct prf_private_data *p)
+{
+ int err = 0;
+ void *buffer;
+
+ p->size = prf_buffer_size();
+ p->buffer = vzalloc(p->size);
+
+ if (!p->buffer) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ buffer = p->buffer;
+
+ prf_fill_header(&buffer);
+ prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
+ prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
+ prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
+ buffer += prf_get_padding(prf_names_size());
+
+ prf_serialize_values(&buffer);
+
+out:
+ return err;
+}
+
+/* open() implementation for PGO. Creates a copy of the profiling data set. */
+static int prf_open(struct inode *inode, struct file *file)
+{
+ struct prf_private_data *data;
+ unsigned long flags;
+ int err;
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL);
+ if (!data) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ flags = prf_lock();
+
+ err = prf_serialize(data);
+ if (unlikely(err)) {
+ kfree(data);
+ goto out_unlock;
+ }
+
+ file->private_data = data;
+
+out_unlock:
+ prf_unlock(flags);
+out:
+ return err;
+}
+
+/* read() implementation for PGO. */
+static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct prf_private_data *data = file->private_data;
+
+ BUG_ON(!data);
+
+ return simple_read_from_buffer(buf, count, ppos, data->buffer,
+ data->size);
+}
+
+/* release() implementation for PGO. Release resources allocated by open(). */
+static int prf_release(struct inode *inode, struct file *file)
+{
+ struct prf_private_data *data = file->private_data;
+
+ if (data) {
+ vfree(data->buffer);
+ kfree(data);
+ }
+
+ return 0;
+}
+
+static const struct file_operations prf_fops = {
+ .owner = THIS_MODULE,
+ .open = prf_open,
+ .read = prf_read,
+ .llseek = default_llseek,
+ .release = prf_release
+};
+
+/* write() implementation for resetting PGO's profile data. */
+static ssize_t reset_write(struct file *file, const char __user *addr,
+ size_t len, loff_t *pos)
+{
+ struct llvm_prf_data *data;
+
+ memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
+
+ for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; data++) {
+ struct llvm_prf_value_node **vnodes;
+ u64 current_vsite_count;
+ u32 i;
+
+ if (!data->values)
+ continue;
+
+ current_vsite_count = 0;
+ vnodes = (struct llvm_prf_value_node **)data->values;
+
+ for (i = LLVM_INSTR_PROF_IPVK_FIRST; i <= LLVM_INSTR_PROF_IPVK_LAST; i++)
+ current_vsite_count += data->num_value_sites[i];
+
+ for (i = 0; i < current_vsite_count; i++) {
+ struct llvm_prf_value_node *current_vnode = vnodes[i];
+
+ while (current_vnode) {
+ current_vnode->count = 0;
+ current_vnode = current_vnode->next;
+ }
+ }
+ }
+
+ return len;
+}
+
+static const struct file_operations prf_reset_fops = {
+ .owner = THIS_MODULE,
+ .write = reset_write,
+ .llseek = noop_llseek,
+};
+
+/* Create debugfs entries. */
+static int __init pgo_init(void)
+{
+ directory = debugfs_create_dir("pgo", NULL);
+ if (!directory)
+ goto err_remove;
+
+ if (!debugfs_create_file("profraw", 0600, directory, NULL,
+ &prf_fops))
+ goto err_remove;
+
+ if (!debugfs_create_file("reset", 0200, directory, NULL,
+ &prf_reset_fops))
+ goto err_remove;
+
+ return 0;
+
+err_remove:
+ pr_err("initialization failed\n");
+ return -EIO;
+}
+
+/* Remove debugfs entries. */
+static void __exit pgo_exit(void)
+{
+ debugfs_remove_recursive(directory);
+}
+
+module_init(pgo_init);
+module_exit(pgo_exit);
diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
new file mode 100644
index 000000000000..62ff5cfce7b1
--- /dev/null
+++ b/kernel/pgo/instrument.c
@@ -0,0 +1,189 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ * Sami Tolvanen <[email protected]>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt) "pgo: " fmt
+
+#include <linux/bitops.h>
+#include <linux/kernel.h>
+#include <linux/export.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+#include "pgo.h"
+
+/*
+ * This lock guards both profile count updating and serialization of the
+ * profiling data. Keeping both of these activities separate via locking
+ * ensures that we don't try to serialize data that's only partially updated.
+ */
+static DEFINE_SPINLOCK(pgo_lock);
+static int current_node;
+
+unsigned long prf_lock(void)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&pgo_lock, flags);
+
+ return flags;
+}
+
+void prf_unlock(unsigned long flags)
+{
+ spin_unlock_irqrestore(&pgo_lock, flags);
+}
+
+/*
+ * Return a newly allocated profiling value node which contains the value
+ * tracked by the value profiler.
+ * Note: caller *must* hold pgo_lock.
+ */
+static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
+ u32 index, u64 value)
+{
+ if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
+ return NULL; /* Out of nodes */
+
+ current_node++;
+
+ /* Make sure the node is entirely within the section */
+ if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
+ &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
+ return NULL;
+
+ return &__llvm_prf_vnds_start[current_node];
+}
+
+/*
+ * Counts the number of times a target value is seen.
+ *
+ * Records the target value for the index if not seen before. Otherwise,
+ * increments the counter associated w/ the target value.
+ */
+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
+{
+ struct llvm_prf_data *p = (struct llvm_prf_data *)data;
+ struct llvm_prf_value_node **counters;
+ struct llvm_prf_value_node *curr;
+ struct llvm_prf_value_node *min = NULL;
+ struct llvm_prf_value_node *prev = NULL;
+ u64 min_count = U64_MAX;
+ u8 values = 0;
+ unsigned long flags;
+
+ if (!p || !p->values)
+ return;
+
+ counters = (struct llvm_prf_value_node **)p->values;
+ curr = counters[index];
+
+ while (curr) {
+ if (target_value == curr->value) {
+ curr->count++;
+ return;
+ }
+
+ if (curr->count < min_count) {
+ min_count = curr->count;
+ min = curr;
+ }
+
+ prev = curr;
+ curr = curr->next;
+ values++;
+ }
+
+ if (values >= LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE) {
+ if (!min->count || !(--min->count)) {
+ curr = min;
+ curr->value = target_value;
+ curr->count++;
+ }
+ return;
+ }
+
+ /* Lock when updating the value node structure. */
+ flags = prf_lock();
+
+ curr = allocate_node(p, index, target_value);
+ if (!curr)
+ goto out;
+
+ curr->value = target_value;
+ curr->count++;
+
+ if (!counters[index])
+ counters[index] = curr;
+ else if (prev && !prev->next)
+ prev->next = curr;
+
+out:
+ prf_unlock(flags);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_target);
+
+/* Counts the number of times a range of target values is seen. */
+void __llvm_profile_instrument_range(u64 target_value, void *data,
+ u32 index, s64 precise_start,
+ s64 precise_last, s64 large_value);
+void __llvm_profile_instrument_range(u64 target_value, void *data,
+ u32 index, s64 precise_start,
+ s64 precise_last, s64 large_value)
+{
+ if (large_value != S64_MIN && (s64)target_value >= large_value)
+ target_value = large_value;
+ else if ((s64)target_value < precise_start ||
+ (s64)target_value > precise_last)
+ target_value = precise_last + 1;
+
+ __llvm_profile_instrument_target(target_value, data, index);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_range);
+
+static u64 inst_prof_get_range_rep_value(u64 value)
+{
+ if (value <= 8)
+ /* The first ranges are individually tracked, use it as is. */
+ return value;
+ else if (value >= 513)
+ /* The last range is mapped to its lowest value. */
+ return 513;
+ else if (hweight64(value) == 1)
+ /* If it's a power of two, use it as is. */
+ return value;
+
+ /* Otherwise, round down to the previous power of two and add 1. */
+ return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
+}
+
+/*
+ * The target values are partitioned into multiple ranges. The range spec is
+ * defined in compiler-rt/include/profile/InstrProfData.inc.
+ */
+void __llvm_profile_instrument_memop(u64 target_value, void *data,
+ u32 counter_index);
+void __llvm_profile_instrument_memop(u64 target_value, void *data,
+ u32 counter_index)
+{
+ u64 rep_value;
+
+ /* Map the target value to the representative value of its range. */
+ rep_value = inst_prof_get_range_rep_value(target_value);
+ __llvm_profile_instrument_target(rep_value, data, counter_index);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_memop);
diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
new file mode 100644
index 000000000000..ddc8d3002fe5
--- /dev/null
+++ b/kernel/pgo/pgo.h
@@ -0,0 +1,203 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ * Sami Tolvanen <[email protected]>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _PGO_H
+#define _PGO_H
+
+/*
+ * Note: These internal LLVM definitions must match the compiler version.
+ * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
+ */
+
+#define LLVM_INSTR_PROF_RAW_MAGIC_64 \
+ ((u64)255 << 56 | \
+ (u64)'l' << 48 | \
+ (u64)'p' << 40 | \
+ (u64)'r' << 32 | \
+ (u64)'o' << 24 | \
+ (u64)'f' << 16 | \
+ (u64)'r' << 8 | \
+ (u64)129)
+#define LLVM_INSTR_PROF_RAW_MAGIC_32 \
+ ((u64)255 << 56 | \
+ (u64)'l' << 48 | \
+ (u64)'p' << 40 | \
+ (u64)'r' << 32 | \
+ (u64)'o' << 24 | \
+ (u64)'f' << 16 | \
+ (u64)'R' << 8 | \
+ (u64)129)
+
+#define LLVM_INSTR_PROF_RAW_VERSION 5
+#define LLVM_INSTR_PROF_DATA_ALIGNMENT 8
+#define LLVM_INSTR_PROF_IPVK_FIRST 0
+#define LLVM_INSTR_PROF_IPVK_LAST 1
+#define LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE 255
+
+#define LLVM_VARIANT_MASK_IR_PROF (0x1ULL << 56)
+#define LLVM_VARIANT_MASK_CSIR_PROF (0x1ULL << 57)
+
+/**
+ * struct llvm_prf_header - represents the raw profile header data structure.
+ * @magic: the magic token for the file format.
+ * @version: the version of the file format.
+ * @data_size: the number of entries in the profile data section.
+ * @padding_bytes_before_counters: the number of padding bytes before the
+ * counters.
+ * @counters_size: the size in bytes of the LLVM profile section containing the
+ * counters.
+ * @padding_bytes_after_counters: the number of padding bytes after the
+ * counters.
+ * @names_size: the size in bytes of the LLVM profile section containing the
+ * counters' names.
+ * @counters_delta: the beginning of the LLVM profile counters section.
+ * @names_delta: the beginning of the LLVM profile names section.
+ * @value_kind_last: the last profile value kind.
+ */
+struct llvm_prf_header {
+ u64 magic;
+ u64 version;
+ u64 data_size;
+ u64 padding_bytes_before_counters;
+ u64 counters_size;
+ u64 padding_bytes_after_counters;
+ u64 names_size;
+ u64 counters_delta;
+ u64 names_delta;
+ u64 value_kind_last;
+};
+
+/**
+ * struct llvm_prf_data - represents the per-function control structure.
+ * @name_ref: the reference to the function's name.
+ * @func_hash: the hash value of the function.
+ * @counter_ptr: a pointer to the profile counter.
+ * @function_ptr: a pointer to the function.
+ * @values: the profiling values associated with this function.
+ * @num_counters: the number of counters in the function.
+ * @num_value_sites: the number of value profile sites.
+ */
+struct llvm_prf_data {
+ const u64 name_ref;
+ const u64 func_hash;
+ const void *counter_ptr;
+ const void *function_ptr;
+ void *values;
+ const u32 num_counters;
+ const u16 num_value_sites[LLVM_INSTR_PROF_IPVK_LAST + 1];
+} __aligned(LLVM_INSTR_PROF_DATA_ALIGNMENT);
+
+/**
+ * struct llvm_prf_value_node_data - represents the data part of the struct
+ * llvm_prf_value_node data structure.
+ * @value: the value counters.
+ * @count: the counters' count.
+ */
+struct llvm_prf_value_node_data {
+ u64 value;
+ u64 count;
+};
+
+/**
+ * struct llvm_prf_value_node - represents an internal data structure used by
+ * the value profiler.
+ * @value: the value counters.
+ * @count: the counters' count.
+ * @next: the next value node.
+ */
+struct llvm_prf_value_node {
+ u64 value;
+ u64 count;
+ struct llvm_prf_value_node *next;
+};
+
+/**
+ * struct llvm_prf_value_data - represents the value profiling data in indexed
+ * format.
+ * @total_size: the total size in bytes including this field.
+ * @num_value_kinds: the number of value profile kinds that have value profile
+ * data.
+ */
+struct llvm_prf_value_data {
+ u32 total_size;
+ u32 num_value_kinds;
+};
+
+/**
+ * struct llvm_prf_value_record - represents the on-disk layout of the value
+ * profile data of a particular kind for one function.
+ * @kind: the kind of the value profile record.
+ * @num_value_sites: the number of value profile sites.
+ * @site_count_array: the first element of the array that stores the number
+ * of profiled values for each value site.
+ */
+struct llvm_prf_value_record {
+ u32 kind;
+ u32 num_value_sites;
+ u8 site_count_array[];
+};
+
+#define prf_get_value_record_header_size() \
+ offsetof(struct llvm_prf_value_record, site_count_array)
+#define prf_get_value_record_site_count_size(sites) \
+ roundup((sites), 8)
+#define prf_get_value_record_size(sites) \
+ (prf_get_value_record_header_size() + \
+ prf_get_value_record_site_count_size((sites)))
+
+/* Data sections */
+extern struct llvm_prf_data __llvm_prf_data_start[];
+extern struct llvm_prf_data __llvm_prf_data_end[];
+
+extern u64 __llvm_prf_cnts_start[];
+extern u64 __llvm_prf_cnts_end[];
+
+extern char __llvm_prf_names_start[];
+extern char __llvm_prf_names_end[];
+
+extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
+extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
+
+/* Locking for vnodes */
+extern unsigned long prf_lock(void);
+extern void prf_unlock(unsigned long flags);
+
+#define __DEFINE_PRF_SIZE(s) \
+ static inline unsigned long prf_ ## s ## _size(void) \
+ { \
+ unsigned long start = \
+ (unsigned long)__llvm_prf_ ## s ## _start; \
+ unsigned long end = \
+ (unsigned long)__llvm_prf_ ## s ## _end; \
+ return roundup(end - start, \
+ sizeof(__llvm_prf_ ## s ## _start[0])); \
+ } \
+ static inline unsigned long prf_ ## s ## _count(void) \
+ { \
+ return prf_ ## s ## _size() / \
+ sizeof(__llvm_prf_ ## s ## _start[0]); \
+ }
+
+__DEFINE_PRF_SIZE(data);
+__DEFINE_PRF_SIZE(cnts);
+__DEFINE_PRF_SIZE(names);
+__DEFINE_PRF_SIZE(vnds);
+
+#undef __DEFINE_PRF_SIZE
+
+#endif /* _PGO_H */
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 213677a5ed33..9b218afb5cb8 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
$(CFLAGS_GCOV))
endif

+#
+# Enable clang's PGO profiling flags for a file or directory depending on
+# variables PGO_PROFILE_obj.o and PGO_PROFILE.
+#
+ifeq ($(CONFIG_PGO_CLANG),y)
+_c_flags += $(if $(patsubst n%,, \
+ $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
+ $(CFLAGS_PGO_CLANG))
+endif
+
#
# Enable address sanitizer flags for kernel except some files or directories
# we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
--
2.30.0.280.ga3ce27912f-goog
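
As an aside for readers of pgo.h: the value record size macros can be checked in userspace. This is a minimal sketch, assuming a stand-in struct value_record and a local round_up_8() helper in place of the kernel's roundup(); both names are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for struct llvm_prf_value_record (hypothetical name). */
struct value_record {
	uint32_t kind;
	uint32_t num_value_sites;
	uint8_t site_count_array[];	/* one u8 count per value site */
};

/* Round n up to a multiple of 8; stands in for the kernel's roundup(n, 8). */
static size_t round_up_8(size_t n)
{
	return (n + 7) / 8 * 8;
}

/* Record header plus the 8-byte padded site count array. */
static size_t value_record_size(size_t sites)
{
	return offsetof(struct value_record, site_count_array) +
	       round_up_8(sites);
}

/* A few spot checks of the arithmetic. */
static void value_record_demo(void)
{
	assert(offsetof(struct value_record, site_count_array) == 8);
	assert(value_record_size(1) == 16);
	assert(value_record_size(8) == 16);
	assert(value_record_size(9) == 24);
}
```

In this sketch the header is 8 bytes, so any record with one to eight value sites occupies 16 bytes before its value node data.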

2021-01-22 11:35:20

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7] pgo: add clang's Profile Guided Optimization infrastructure

On Fri, Jan 22, 2021 at 11:12 AM 'Bill Wendling' via Clang Built Linux
<[email protected]> wrote:
>
> From: Sami Tolvanen <[email protected]>
>
> Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> profile, the kernel is instrumented with PGO counters, a representative
> workload is run, and the raw profile data is collected from
> /sys/kernel/debug/pgo/profraw.
>
> The raw profile data must be processed by clang's "llvm-profdata" tool
> before it can be used during recompilation:
>
> $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
> Multiple raw profiles may be merged during this step.
>
> The data can now be used by the compiler:
>
> $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>
> This initial submission is restricted to x86, as that's the platform we
> know works. This restriction can be lifted once other platforms have
> been verified to work with PGO.
>
> Note that this method of profiling the kernel is clang-native, unlike
> the clang support in kernel/gcov.
>
> [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>
> Signed-off-by: Sami Tolvanen <[email protected]>
> Co-developed-by: Bill Wendling <[email protected]>
> Signed-off-by: Bill Wendling <[email protected]>
> Tested-by: Nick Desaulniers <[email protected]>
> ---
> v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> testing.
> - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> Song's comments.
> v3: - Added change log section based on Sedat Dilek's comments.
> v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
> own popcount implementation, based on Nick Desaulniers's comment.
> v5: - Correct padding calculation, discovered by Nathan Chancellor.
> v6: - Add better documentation about the locking scheme and other things.
> - Rename macros to better match the same macros in LLVM's source code.
> v7: - Fix minor build failure reported by Sedat.

Thanks for v7.

Tested-by: Sedat Dilek <[email protected]> # LLVM/Clang v11.1.0-rc1

- Sedat -

> ---
> Documentation/dev-tools/index.rst | 1 +
> Documentation/dev-tools/pgo.rst | 127 +++++++++
> MAINTAINERS | 9 +
> Makefile | 3 +
> arch/Kconfig | 1 +
> arch/x86/Kconfig | 1 +
> arch/x86/boot/Makefile | 1 +
> arch/x86/boot/compressed/Makefile | 1 +
> arch/x86/crypto/Makefile | 4 +
> arch/x86/entry/vdso/Makefile | 1 +
> arch/x86/kernel/vmlinux.lds.S | 2 +
> arch/x86/platform/efi/Makefile | 1 +
> arch/x86/purgatory/Makefile | 1 +
> arch/x86/realmode/rm/Makefile | 1 +
> arch/x86/um/vdso/Makefile | 1 +
> drivers/firmware/efi/libstub/Makefile | 1 +
> include/asm-generic/vmlinux.lds.h | 44 +++
> kernel/Makefile | 1 +
> kernel/pgo/Kconfig | 35 +++
> kernel/pgo/Makefile | 5 +
> kernel/pgo/fs.c | 389 ++++++++++++++++++++++++++
> kernel/pgo/instrument.c | 189 +++++++++++++
> kernel/pgo/pgo.h | 203 ++++++++++++++
> scripts/Makefile.lib | 10 +
> 24 files changed, 1032 insertions(+)
> create mode 100644 Documentation/dev-tools/pgo.rst
> create mode 100644 kernel/pgo/Kconfig
> create mode 100644 kernel/pgo/Makefile
> create mode 100644 kernel/pgo/fs.c
> create mode 100644 kernel/pgo/instrument.c
> create mode 100644 kernel/pgo/pgo.h
>
> diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> index f7809c7b1ba9..8d6418e85806 100644
> --- a/Documentation/dev-tools/index.rst
> +++ b/Documentation/dev-tools/index.rst
> @@ -26,6 +26,7 @@ whole; patches welcome!
> kgdb
> kselftest
> kunit/index
> + pgo
>
>
> .. only:: subproject and html
> diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> new file mode 100644
> index 000000000000..b7f11d8405b7
> --- /dev/null
> +++ b/Documentation/dev-tools/pgo.rst
> @@ -0,0 +1,127 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===============================
> +Using PGO with the Linux kernel
> +===============================
> +
> +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> +when building with Clang. The profiling data is exported via the ``pgo``
> +debugfs directory.
> +
> +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> +
> +
> +Preparation
> +===========
> +
> +Configure the kernel with:
> +
> +.. code-block:: make
> +
> + CONFIG_DEBUG_FS=y
> + CONFIG_PGO_CLANG=y
> +
> +Note that kernels compiled with profiling flags will be significantly larger
> +and run slower.
> +
> +Profiling data will only become accessible once debugfs has been mounted:
> +
> +.. code-block:: sh
> +
> + mount -t debugfs none /sys/kernel/debug
> +
> +
> +Customization
> +=============
> +
> +You can enable or disable profiling for individual files and directories by
> +adding a line similar to the following to the respective kernel Makefile:
> +
> +- For a single file (e.g. main.o)
> +
> + .. code-block:: make
> +
> + PGO_PROFILE_main.o := y
> +
> +- For all files in one directory
> +
> + .. code-block:: make
> +
> + PGO_PROFILE := y
> +
> +To exclude files from being profiled use
> +
> + .. code-block:: make
> +
> + PGO_PROFILE_main.o := n
> +
> +and
> +
> + .. code-block:: make
> +
> + PGO_PROFILE := n
> +
> +Only files which are linked to the main kernel image or are compiled as kernel
> +modules are supported by this mechanism.
> +
> +
> +Files
> +=====
> +
> +The PGO kernel support creates the following files in debugfs:
> +
> +``/sys/kernel/debug/pgo``
> + Parent directory for all PGO-related files.
> +
> +``/sys/kernel/debug/pgo/reset``
> + Global reset file: resets all coverage data to zero when written to.
> +
> +``/sys/kernel/debug/pgo/profraw``
> + The raw PGO data that must be processed with ``llvm-profdata``.
> +
> +
> +Workflow
> +========
> +
> +The PGO kernel can be run on the host or test machines. The data, though, should
> +be analyzed with Clang's tools from the same Clang version the kernel was
> +compiled with. Clang is tolerant of version skew, but it's easier to use the
> +same Clang version.
> +
> +The profiling data is useful for optimizing the kernel, analyzing coverage,
> +etc. Clang offers tools to perform these tasks.
> +
> +Here is an example workflow for profiling an instrumented kernel with PGO and
> +using the result to optimize the kernel:
> +
> +1) Install the kernel on the TEST machine.
> +
> +2) Reset the data counters right before running the load tests
> +
> + .. code-block:: sh
> +
> + $ echo 1 > /sys/kernel/debug/pgo/reset
> +
> +3) Run the load tests.
> +
> +4) Collect the raw profile data
> +
> + .. code-block:: sh
> +
> + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> +
> +5) (Optional) Download the raw profile data to the HOST machine.
> +
> +6) Process the raw profile data
> +
> + .. code-block:: sh
> +
> + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> +
> + Note that multiple raw profile data files can be merged during this step.
> +
> +7) Rebuild the kernel using the profile data (PGO disabled)
> +
> + .. code-block:: sh
> +
> + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 705776b31c8d..0a75d223682d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -13948,6 +13948,15 @@ S: Maintained
> F: include/linux/personality.h
> F: include/uapi/linux/personality.h
>
> +PGO BASED KERNEL PROFILING
> +M: Sami Tolvanen <[email protected]>
> +M: Bill Wendling <[email protected]>
> +R: Nathan Chancellor <[email protected]>
> +R: Nick Desaulniers <[email protected]>
> +S: Supported
> +F: Documentation/dev-tools/pgo.rst
> +F: kernel/pgo
> +
> PHOENIX RC FLIGHT CONTROLLER ADAPTER
> M: Marcus Folkesson <[email protected]>
> L: [email protected]
> diff --git a/Makefile b/Makefile
> index b0e4767735dc..9339541f7cec 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
> # Defaults to vmlinux, but the arch makefile usually adds further targets
> all: vmlinux
>
> +CFLAGS_PGO_CLANG := -fprofile-generate
> +export CFLAGS_PGO_CLANG
> +
> CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
> $(call cc-option,-fno-tree-loop-im) \
> $(call cc-disable-warning,maybe-uninitialized,)
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 24862d15f3a3..f39d3991f6bf 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
> pairs of 32-bit arguments, select this option.
>
> source "kernel/gcov/Kconfig"
> +source "kernel/pgo/Kconfig"
>
> source "scripts/gcc-plugins/Kconfig"
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 21f851179ff0..36305ea61dc0 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -96,6 +96,7 @@ config X86
> select ARCH_SUPPORTS_DEBUG_PAGEALLOC
> select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
> select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
> + select ARCH_SUPPORTS_PGO_CLANG if X86_64
> select ARCH_USE_BUILTIN_BSWAP
> select ARCH_USE_QUEUED_RWLOCKS
> select ARCH_USE_QUEUED_SPINLOCKS
> diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> index fe605205b4ce..383853e32f67 100644
> --- a/arch/x86/boot/Makefile
> +++ b/arch/x86/boot/Makefile
> @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
>
> $(obj)/bzImage: asflags-y := $(SVGA_MODE)
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index e0bc3988c3fa..ed12ab65f606 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
>
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE :=n
>
> KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
> diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> index a31de0c6ccde..5753aea7bcbd 100644
> --- a/arch/x86/crypto/Makefile
> +++ b/arch/x86/crypto/Makefile
> @@ -4,6 +4,10 @@
>
> OBJECT_FILES_NON_STANDARD := y
>
> +# Disable PGO for curve25519-x86_64. With PGO enabled, clang runs out of
> +# registers for some of the functions.
> +PGO_PROFILE_curve25519-x86_64.o := n
> +
> obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o
>
> obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index 02e3e42f380b..26e2b3af0145 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO $@
> VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
> $(call ld-option, --eh-frame-hdr) -Bsymbolic
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> quiet_cmd_vdso_and_check = VDSO $@
> cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index efd9e9ea17f2..f6cab2316c46 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -184,6 +184,8 @@ SECTIONS
>
> BUG_TABLE
>
> + PGO_CLANG_DATA
> +
> ORC_UNWIND_TABLE
>
> . = ALIGN(PAGE_SIZE);
> diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
> index 84b09c230cbd..5f22b31446ad 100644
> --- a/arch/x86/platform/efi/Makefile
> +++ b/arch/x86/platform/efi/Makefile
> @@ -2,6 +2,7 @@
> OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
> KASAN_SANITIZE := n
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
> obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
> diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> index 95ea17a9d20c..36f20e99da0b 100644
> --- a/arch/x86/purgatory/Makefile
> +++ b/arch/x86/purgatory/Makefile
> @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
>
> # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> KASAN_SANITIZE := n
> UBSAN_SANITIZE := n
> KCSAN_SANITIZE := n
> diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> index 83f1b6a56449..21797192f958 100644
> --- a/arch/x86/realmode/rm/Makefile
> +++ b/arch/x86/realmode/rm/Makefile
> @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
> diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
> index 5943387e3f35..54f5768f5853 100644
> --- a/arch/x86/um/vdso/Makefile
> +++ b/arch/x86/um/vdso/Makefile
> @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@
>
> VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> #
> # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index 8a94388e38b3..2d81623b33f2 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -40,6 +40,7 @@ KBUILD_CFLAGS := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
> KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
>
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> # Sanitizer runtimes are unavailable and cannot be linked here.
> KASAN_SANITIZE := n
> KCSAN_SANITIZE := n
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index b2b3d81b1535..3a591bb18c5f 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -316,6 +316,49 @@
> #define THERMAL_TABLE(name)
> #endif
>
> +#ifdef CONFIG_PGO_CLANG
> +#define PGO_CLANG_DATA \
> + __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_start = .; \
> + __llvm_prf_data_start = .; \
> + KEEP(*(__llvm_prf_data)) \
> + . = ALIGN(8); \
> + __llvm_prf_data_end = .; \
> + } \
> + __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_cnts_start = .; \
> + KEEP(*(__llvm_prf_cnts)) \
> + . = ALIGN(8); \
> + __llvm_prf_cnts_end = .; \
> + } \
> + __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_names_start = .; \
> + KEEP(*(__llvm_prf_names)) \
> + . = ALIGN(8); \
> + __llvm_prf_names_end = .; \
> + . = ALIGN(8); \
> + } \
> + __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
> + __llvm_prf_vals_start = .; \
> + KEEP(*(__llvm_prf_vals)) \
> + . = ALIGN(8); \
> + __llvm_prf_vals_end = .; \
> + . = ALIGN(8); \
> + } \
> + __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
> + __llvm_prf_vnds_start = .; \
> + KEEP(*(__llvm_prf_vnds)) \
> + . = ALIGN(8); \
> + __llvm_prf_vnds_end = .; \
> + __llvm_prf_end = .; \
> + }
> +#else
> +#define PGO_CLANG_DATA
> +#endif
> +
> #define KERNEL_DTB() \
> STRUCT_ALIGN(); \
> __dtb_start = .; \
> @@ -1125,6 +1168,7 @@
> CONSTRUCTORS \
> } \
> BUG_TABLE \
> + PGO_CLANG_DATA
>
> #define INIT_TEXT_SECTION(inittext_align) \
> . = ALIGN(inittext_align); \
> diff --git a/kernel/Makefile b/kernel/Makefile
> index aa7368c7eabf..0b34ca228ba4 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
> obj-$(CONFIG_KCSAN) += kcsan/
> obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
> obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
> +obj-$(CONFIG_PGO_CLANG) += pgo/
>
> obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
> new file mode 100644
> index 000000000000..76a640b6cf6e
> --- /dev/null
> +++ b/kernel/pgo/Kconfig
> @@ -0,0 +1,35 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
> +
> +config ARCH_SUPPORTS_PGO_CLANG
> + bool
> +
> +config PGO_CLANG
> + bool "Enable clang's PGO-based kernel profiling"
> + depends on DEBUG_FS
> + depends on ARCH_SUPPORTS_PGO_CLANG
> + depends on CC_IS_CLANG && CLANG_VERSION >= 120000
> + help
> + This option enables clang's PGO (Profile Guided Optimization) based
> + code profiling to better optimize the kernel.
> +
> + If unsure, say N.
> +
> + Run a representative workload for your application on a kernel
> + compiled with this option and download the raw profile file from
> + /sys/kernel/debug/pgo/profraw. This file needs to be processed with
> + llvm-profdata. It may be merged with other collected raw profiles.
> +
> + Copy the resulting profile file into vmlinux.profdata, and enable
> + KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
> + kernel.
> +
> + Note that a kernel compiled with profiling flags will be
> + significantly larger and run slower. Also be sure to exclude files
> + from profiling which are not linked to the kernel image to prevent
> + linker errors.
> +
> + Note that the debugfs filesystem has to be mounted to access
> + profiling data.
> +
> +endmenu
> diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
> new file mode 100644
> index 000000000000..41e27cefd9a4
> --- /dev/null
> +++ b/kernel/pgo/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0
> +GCOV_PROFILE := n
> +PGO_PROFILE := n
> +
> +obj-y += fs.o instrument.o
> diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
> new file mode 100644
> index 000000000000..1678df3b7d64
> --- /dev/null
> +++ b/kernel/pgo/fs.c
> @@ -0,0 +1,389 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt) "pgo: " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/debugfs.h>
> +#include <linux/fs.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/vmalloc.h>
> +#include "pgo.h"
> +
> +static struct dentry *directory;
> +
> +struct prf_private_data {
> + void *buffer;
> + unsigned long size;
> +};
> +
> +/*
> + * Raw profile data format:
> + *
> + * - llvm_prf_header
> + * - __llvm_prf_data
> + * - __llvm_prf_cnts
> + * - __llvm_prf_names
> + * - zero padding to 8 bytes
> + * - for each llvm_prf_data in __llvm_prf_data:
> + * - llvm_prf_value_data
> + * - llvm_prf_value_record + site count array
> + * - llvm_prf_value_node_data
> + * ...
> + * ...
> + * ...
> + */
> +
> +static void prf_fill_header(void **buffer)
> +{
> + struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
> +
> +#ifdef CONFIG_64BIT
> + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_64;
> +#else
> + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_32;
> +#endif
> + header->version = LLVM_VARIANT_MASK_IR_PROF | LLVM_INSTR_PROF_RAW_VERSION;
> + header->data_size = prf_data_count();
> + header->padding_bytes_before_counters = 0;
> + header->counters_size = prf_cnts_count();
> + header->padding_bytes_after_counters = 0;
> + header->names_size = prf_names_count();
> + header->counters_delta = (u64)__llvm_prf_cnts_start;
> + header->names_delta = (u64)__llvm_prf_names_start;
> + header->value_kind_last = LLVM_INSTR_PROF_IPVK_LAST;
> +
> + *buffer += sizeof(*header);
> +}
> +
> +/*
> + * Copy the source into the buffer, incrementing the pointer into buffer in the
> + * process.
> + */
> +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
> +{
> + memcpy(*buffer, src, size);
> + *buffer += size;
> +}
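
The "copy and advance" pattern used by prf_copy_to_buffer() can be tried outside the kernel. This is a minimal sketch, assuming a hypothetical copy_to_cursor() helper that takes an unsigned char cursor, because arithmetic on void * is a GNU extension and not portable C:

```c
#include <assert.h>
#include <string.h>

/* Copy size bytes to *cursor, then advance the cursor past them. */
static void copy_to_cursor(unsigned char **cursor, const void *src,
			   size_t size)
{
	memcpy(*cursor, src, size);
	*cursor += size;
}

/* Write two fields back to back and return the number of bytes used. */
static size_t cursor_demo(unsigned char *out)
{
	unsigned char *cursor = out;

	copy_to_cursor(&cursor, "head", 4);
	copy_to_cursor(&cursor, "body", 4);

	return (size_t)(cursor - out);
}
```

Each call leaves the cursor at the first free byte, so sections are laid out contiguously in call order, which is how the serializer builds the raw profile.
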
> +
> +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
> +{
> + struct llvm_prf_value_node **nodes =
> + (struct llvm_prf_value_node **)p->values;
> + u32 kinds = 0;
> + u32 size = 0;
> + unsigned int kind;
> + unsigned int n;
> + unsigned int s = 0;
> +
> + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> + unsigned int sites = p->num_value_sites[kind];
> +
> + if (!sites)
> + continue;
> +
> + /* Record + site count array */
> + size += prf_get_value_record_size(sites);
> + kinds++;
> +
> + if (!nodes)
> + continue;
> +
> + for (n = 0; n < sites; n++) {
> + u32 count = 0;
> + struct llvm_prf_value_node *site = nodes[s + n];
> +
> + while (site && ++count <= U8_MAX)
> + site = site->next;
> +
> + size += count *
> + sizeof(struct llvm_prf_value_node_data);
> + }
> +
> + s += sites;
> + }
> +
> + if (size)
> + size += sizeof(struct llvm_prf_value_data);
> +
> + if (value_kinds)
> + *value_kinds = kinds;
> +
> + return size;
> +}
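
The node counting loop above is compact, so here is a userspace sketch of the same loop shape. The struct node type and both helpers are hypothetical stand-ins, and UINT8_MAX plays the role of the kernel's U8_MAX:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a value node; only the link matters for counting. */
struct node {
	struct node *next;
};

/* Same loop shape as the patch: walk until the list ends or count passes 255. */
static uint32_t count_nodes_capped(const struct node *site)
{
	uint32_t count = 0;

	while (site && ++count <= UINT8_MAX)
		site = site->next;

	return count;
}

/* Link the first n elements of pool into a list and return its head. */
static struct node *make_list(struct node *pool, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		pool[i].next = (i + 1 < n) ? &pool[i + 1] : NULL;

	return n ? &pool[0] : NULL;
}
```

In this sketch a list of up to 255 nodes yields its exact length, while a longer list yields 256, which becomes 0 when stored in a u8. Whether such a list can occur in the kernel depends on LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE, whose definition is not shown in this part of the patch.
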
> +
> +static u32 prf_get_value_size(void)
> +{
> + u32 size = 0;
> + struct llvm_prf_data *p;
> +
> + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> + size += __prf_get_value_size(p, NULL);
> +
> + return size;
> +}
> +
> +/* Serialize the value profiling data of one function. */
> +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
> +{
> + struct llvm_prf_value_data header;
> + struct llvm_prf_value_node **nodes =
> + (struct llvm_prf_value_node **)p->values;
> + unsigned int kind;
> + unsigned int n;
> + unsigned int s = 0;
> +
> + header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
> +
> + if (!header.num_value_kinds)
> + /* Nothing to write. */
> + return;
> +
> + prf_copy_to_buffer(buffer, &header, sizeof(header));
> +
> + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> + struct llvm_prf_value_record *record;
> + u8 *counts;
> + unsigned int sites = p->num_value_sites[kind];
> +
> + if (!sites)
> + continue;
> +
> + /* Profiling value record. */
> + record = *(struct llvm_prf_value_record **)buffer;
> + *buffer += prf_get_value_record_header_size();
> +
> + record->kind = kind;
> + record->num_value_sites = sites;
> +
> + /* Site count array. */
> + counts = *(u8 **)buffer;
> + *buffer += prf_get_value_record_site_count_size(sites);
> +
> + /*
> + * If we don't have nodes, we can skip updating the site count
> + * array, because the buffer is zero filled.
> + */
> + if (!nodes)
> + continue;
> +
> + for (n = 0; n < sites; n++) {
> + u32 count = 0;
> + struct llvm_prf_value_node *site = nodes[s + n];
> +
> + while (site && ++count <= U8_MAX) {
> + prf_copy_to_buffer(buffer, site,
> + sizeof(struct llvm_prf_value_node_data));
> + site = site->next;
> + }
> +
> + counts[n] = (u8)count;
> + }
> +
> + s += sites;
> + }
> +}
> +
> +static void prf_serialize_values(void **buffer)
> +{
> + struct llvm_prf_data *p;
> +
> + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> + prf_serialize_value(p, buffer);
> +}
> +
> +static inline unsigned long prf_get_padding(unsigned long size)
> +{
> + return 7 & (sizeof(u64) - size % sizeof(u64));
> +}
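
The padding expression can be verified in isolation. This is a minimal userspace sketch, assuming a hypothetical padding_to_u64() that uses the same expression as prf_get_padding():

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed to bring size up to the next multiple of sizeof(u64). */
static unsigned long padding_to_u64(unsigned long size)
{
	return 7 & (sizeof(uint64_t) - size % sizeof(uint64_t));
}

/* Check that the padded size is always 8-byte aligned. */
static void padding_demo(void)
{
	unsigned long size;

	for (size = 0; size < 64; size++) {
		unsigned long pad = padding_to_u64(size);

		assert(pad < 8);
		assert((size + pad) % 8 == 0);
	}
}
```

The mask with 7 handles the already aligned case: for a size that is a multiple of 8 the subtraction gives 8, and masking turns that into 0.
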
> +
> +static unsigned long prf_buffer_size(void)
> +{
> + return sizeof(struct llvm_prf_header) +
> + prf_data_size() +
> + prf_cnts_size() +
> + prf_names_size() +
> + prf_get_padding(prf_names_size()) +
> + prf_get_value_size();
> +}
> +
> +/*
> + * Serialize the profiling data into a format LLVM's tools can understand.
> + * Note: caller *must* hold pgo_lock.
> + */
> +static int prf_serialize(struct prf_private_data *p)
> +{
> + int err = 0;
> + void *buffer;
> +
> + p->size = prf_buffer_size();
> + p->buffer = vzalloc(p->size);
> +
> + if (!p->buffer) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + buffer = p->buffer;
> +
> + prf_fill_header(&buffer);
> + prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
> + prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
> + prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
> + buffer += prf_get_padding(prf_names_size());
> +
> + prf_serialize_values(&buffer);
> +
> +out:
> + return err;
> +}
> +
> +/* open() implementation for PGO. Creates a copy of the profiling data set. */
> +static int prf_open(struct inode *inode, struct file *file)
> +{
> + struct prf_private_data *data;
> + unsigned long flags;
> + int err;
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL);
> + if (!data) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + flags = prf_lock();
> +
> + err = prf_serialize(data);
> + if (unlikely(err)) {
> + kfree(data);
> + goto out_unlock;
> + }
> +
> + file->private_data = data;
> +
> +out_unlock:
> + prf_unlock(flags);
> +out:
> + return err;
> +}
> +
> +/* read() implementation for PGO. */
> +static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
> + loff_t *ppos)
> +{
> + struct prf_private_data *data = file->private_data;
> +
> + BUG_ON(!data);
> +
> + return simple_read_from_buffer(buf, count, ppos, data->buffer,
> + data->size);
> +}
> +
> +/* release() implementation for PGO. Release resources allocated by open(). */
> +static int prf_release(struct inode *inode, struct file *file)
> +{
> + struct prf_private_data *data = file->private_data;
> +
> + if (data) {
> + vfree(data->buffer);
> + kfree(data);
> + }
> +
> + return 0;
> +}
> +
> +static const struct file_operations prf_fops = {
> + .owner = THIS_MODULE,
> + .open = prf_open,
> + .read = prf_read,
> + .llseek = default_llseek,
> + .release = prf_release
> +};
> +
> +/* write() implementation for resetting PGO's profile data. */
> +static ssize_t reset_write(struct file *file, const char __user *addr,
> + size_t len, loff_t *pos)
> +{
> + struct llvm_prf_data *data;
> +
> + memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
> +
> + for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; data++) {
> + struct llvm_prf_value_node **vnodes;
> + u64 current_vsite_count;
> + u32 i;
> +
> + if (!data->values)
> + continue;
> +
> + current_vsite_count = 0;
> + vnodes = (struct llvm_prf_value_node **)data->values;
> +
> + for (i = LLVM_INSTR_PROF_IPVK_FIRST; i <= LLVM_INSTR_PROF_IPVK_LAST; i++)
> + current_vsite_count += data->num_value_sites[i];
> +
> + for (i = 0; i < current_vsite_count; i++) {
> + struct llvm_prf_value_node *current_vnode = vnodes[i];
> +
> + while (current_vnode) {
> + current_vnode->count = 0;
> + current_vnode = current_vnode->next;
> + }
> + }
> + }
> +
> + return len;
> +}
> +
> +static const struct file_operations prf_reset_fops = {
> + .owner = THIS_MODULE,
> + .write = reset_write,
> + .llseek = noop_llseek,
> +};
> +
> +/* Create debugfs entries. */
> +static int __init pgo_init(void)
> +{
> + directory = debugfs_create_dir("pgo", NULL);
> + if (!directory)
> + goto err_remove;
> +
> + if (!debugfs_create_file("profraw", 0600, directory, NULL,
> + &prf_fops))
> + goto err_remove;
> +
> + if (!debugfs_create_file("reset", 0200, directory, NULL,
> + &prf_reset_fops))
> + goto err_remove;
> +
> + return 0;
> +
> +err_remove:
> + pr_err("initialization failed\n");
> + return -EIO;
> +}
> +
> +/* Remove debugfs entries. */
> +static void __exit pgo_exit(void)
> +{
> + debugfs_remove_recursive(directory);
> +}
> +
> +module_init(pgo_init);
> +module_exit(pgo_exit);
> diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
> new file mode 100644
> index 000000000000..62ff5cfce7b1
> --- /dev/null
> +++ b/kernel/pgo/instrument.c
> @@ -0,0 +1,189 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt) "pgo: " fmt
> +
> +#include <linux/bitops.h>
> +#include <linux/kernel.h>
> +#include <linux/export.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +#include "pgo.h"
> +
> +/*
> + * This lock guards both profile count updating and serialization of the
> + * profiling data. Keeping both of these activities separate via locking
> + * ensures that we don't try to serialize data that's only partially updated.
> + */
> +static DEFINE_SPINLOCK(pgo_lock);
> +static int current_node;
> +
> +unsigned long prf_lock(void)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&pgo_lock, flags);
> +
> + return flags;
> +}
> +
> +void prf_unlock(unsigned long flags)
> +{
> + spin_unlock_irqrestore(&pgo_lock, flags);
> +}
> +
> +/*
> + * Return a newly allocated profiling value node which contains the value
> + * tracked by the value profiler.
> + * Note: caller *must* hold pgo_lock.
> + */
> +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
> + u32 index, u64 value)
> +{
> + if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
> + return NULL; /* Out of nodes */
> +
> + current_node++;
> +
> + /* Make sure the node is entirely within the section */
> + if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
> + &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
> + return NULL;
> +
> + return &__llvm_prf_vnds_start[current_node];
> +}
> +
> +/*
> + * Counts the number of times a target value is seen.
> + *
> + * Records the target value for the index if not seen before. Otherwise,
> + * increments the counter associated w/ the target value.
> + */
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
> +{
> + struct llvm_prf_data *p = (struct llvm_prf_data *)data;
> + struct llvm_prf_value_node **counters;
> + struct llvm_prf_value_node *curr;
> + struct llvm_prf_value_node *min = NULL;
> + struct llvm_prf_value_node *prev = NULL;
> + u64 min_count = U64_MAX;
> + u8 values = 0;
> + unsigned long flags;
> +
> + if (!p || !p->values)
> + return;
> +
> + counters = (struct llvm_prf_value_node **)p->values;
> + curr = counters[index];
> +
> + while (curr) {
> + if (target_value == curr->value) {
> + curr->count++;
> + return;
> + }
> +
> + if (curr->count < min_count) {
> + min_count = curr->count;
> + min = curr;
> + }
> +
> + prev = curr;
> + curr = curr->next;
> + values++;
> + }
> +
> + if (values >= LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE) {
> + if (!min->count || !(--min->count)) {
> + curr = min;
> + curr->value = target_value;
> + curr->count++;
> + }
> + return;
> + }
> +
> + /* Lock when updating the value node structure. */
> + flags = prf_lock();
> +
> + curr = allocate_node(p, index, target_value);
> + if (!curr)
> + goto out;
> +
> + curr->value = target_value;
> + curr->count++;
> +
> + if (!counters[index])
> + counters[index] = curr;
> + else if (prev && !prev->next)
> + prev->next = curr;
> +
> +out:
> + prf_unlock(flags);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_target);
> +
> +/* Counts the number of times a range of target values is seen. */
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> + u32 index, s64 precise_start,
> + s64 precise_last, s64 large_value);
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> + u32 index, s64 precise_start,
> + s64 precise_last, s64 large_value)
> +{
> + if (large_value != S64_MIN && (s64)target_value >= large_value)
> + target_value = large_value;
> + else if ((s64)target_value < precise_start ||
> + (s64)target_value > precise_last)
> + target_value = precise_last + 1;
> +
> + __llvm_profile_instrument_target(target_value, data, index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_range);
> +
> +static u64 inst_prof_get_range_rep_value(u64 value)
> +{
> + if (value <= 8)
> + /* The first ranges are individually tracked, use it as is. */
> + return value;
> + else if (value >= 513)
> + /* The last range is mapped to its lowest value. */
> + return 513;
> + else if (hweight64(value) == 1)
> + /* If it's a power of two, use it as is. */
> + return value;
> +
> + /* Otherwise, take the previous power of two + 1. */
> + return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
> +}
> +
> +/*
> + * The target values are partitioned into multiple ranges. The range spec is
> + * defined in compiler-rt/include/profile/InstrProfData.inc.
> + */
> +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> + u32 counter_index);
> +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> + u32 counter_index)
> +{
> + u64 rep_value;
> +
> + /* Map the target value to the representative value of its range. */
> + rep_value = inst_prof_get_range_rep_value(target_value);
> + __llvm_profile_instrument_target(rep_value, data, counter_index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_memop);
> diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
> new file mode 100644
> index 000000000000..ddc8d3002fe5
> --- /dev/null
> +++ b/kernel/pgo/pgo.h
> @@ -0,0 +1,203 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#ifndef _PGO_H
> +#define _PGO_H
> +
> +/*
> + * Note: These internal LLVM definitions must match the compiler version.
> + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
> + */
> +
> +#define LLVM_INSTR_PROF_RAW_MAGIC_64 \
> + ((u64)255 << 56 | \
> + (u64)'l' << 48 | \
> + (u64)'p' << 40 | \
> + (u64)'r' << 32 | \
> + (u64)'o' << 24 | \
> + (u64)'f' << 16 | \
> + (u64)'r' << 8 | \
> + (u64)129)
> +#define LLVM_INSTR_PROF_RAW_MAGIC_32 \
> + ((u64)255 << 56 | \
> + (u64)'l' << 48 | \
> + (u64)'p' << 40 | \
> + (u64)'r' << 32 | \
> + (u64)'o' << 24 | \
> + (u64)'f' << 16 | \
> + (u64)'R' << 8 | \
> + (u64)129)
> +
> +#define LLVM_INSTR_PROF_RAW_VERSION 5
> +#define LLVM_INSTR_PROF_DATA_ALIGNMENT 8
> +#define LLVM_INSTR_PROF_IPVK_FIRST 0
> +#define LLVM_INSTR_PROF_IPVK_LAST 1
> +#define LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE 255
> +
> +#define LLVM_VARIANT_MASK_IR_PROF (0x1ULL << 56)
> +#define LLVM_VARIANT_MASK_CSIR_PROF (0x1ULL << 57)
> +
> +/**
> + * struct llvm_prf_header - represents the raw profile header data structure.
> + * @magic: the magic token for the file format.
> + * @version: the version of the file format.
> + * @data_size: the number of entries in the profile data section.
> + * @padding_bytes_before_counters: the number of padding bytes before the
> + * counters.
> + * @counters_size: the size in bytes of the LLVM profile section containing the
> + * counters.
> + * @padding_bytes_after_counters: the number of padding bytes after the
> + * counters.
> + * @names_size: the size in bytes of the LLVM profile section containing the
> + * counters' names.
> + * @counters_delta: the beginning of the LLVM profile counters section.
> + * @names_delta: the beginning of the LLVM profile names section.
> + * @value_kind_last: the last profile value kind.
> + */
> +struct llvm_prf_header {
> + u64 magic;
> + u64 version;
> + u64 data_size;
> + u64 padding_bytes_before_counters;
> + u64 counters_size;
> + u64 padding_bytes_after_counters;
> + u64 names_size;
> + u64 counters_delta;
> + u64 names_delta;
> + u64 value_kind_last;
> +};
> +
> +/**
> + * struct llvm_prf_data - represents the per-function control structure.
> + * @name_ref: the reference to the function's name.
> + * @func_hash: the hash value of the function.
> + * @counter_ptr: a pointer to the profile counter.
> + * @function_ptr: a pointer to the function.
> + * @values: the profiling values associated with this function.
> + * @num_counters: the number of counters in the function.
> + * @num_value_sites: the number of value profile sites.
> + */
> +struct llvm_prf_data {
> + const u64 name_ref;
> + const u64 func_hash;
> + const void *counter_ptr;
> + const void *function_ptr;
> + void *values;
> + const u32 num_counters;
> + const u16 num_value_sites[LLVM_INSTR_PROF_IPVK_LAST + 1];
> +} __aligned(LLVM_INSTR_PROF_DATA_ALIGNMENT);
> +
> +/**
> + * struct llvm_prf_value_node_data - represents the data part of the struct
> + * llvm_prf_value_node data structure.
> + * @value: the value counters.
> + * @count: the counters' count.
> + */
> +struct llvm_prf_value_node_data {
> + u64 value;
> + u64 count;
> +};
> +
> +/**
> + * struct llvm_prf_value_node - represents an internal data structure used by
> + * the value profiler.
> + * @value: the value counters.
> + * @count: the counters' count.
> + * @next: the next value node.
> + */
> +struct llvm_prf_value_node {
> + u64 value;
> + u64 count;
> + struct llvm_prf_value_node *next;
> +};
> +
> +/**
> + * struct llvm_prf_value_data - represents the value profiling data in indexed
> + * format.
> + * @total_size: the total size in bytes including this field.
> + * @num_value_kinds: the number of value profile kinds that have value profile
> + * data.
> + */
> +struct llvm_prf_value_data {
> + u32 total_size;
> + u32 num_value_kinds;
> +};
> +
> +/**
> + * struct llvm_prf_value_record - represents the on-disk layout of the value
> + * profile data of a particular kind for one function.
> + * @kind: the kind of the value profile record.
> + * @num_value_sites: the number of value profile sites.
> + * @site_count_array: the first element of the array that stores the number
> + * of profiled values for each value site.
> + */
> +struct llvm_prf_value_record {
> + u32 kind;
> + u32 num_value_sites;
> + u8 site_count_array[];
> +};
> +
> +#define prf_get_value_record_header_size() \
> + offsetof(struct llvm_prf_value_record, site_count_array)
> +#define prf_get_value_record_site_count_size(sites) \
> + roundup((sites), 8)
> +#define prf_get_value_record_size(sites) \
> + (prf_get_value_record_header_size() + \
> + prf_get_value_record_site_count_size((sites)))
> +
> +/* Data sections */
> +extern struct llvm_prf_data __llvm_prf_data_start[];
> +extern struct llvm_prf_data __llvm_prf_data_end[];
> +
> +extern u64 __llvm_prf_cnts_start[];
> +extern u64 __llvm_prf_cnts_end[];
> +
> +extern char __llvm_prf_names_start[];
> +extern char __llvm_prf_names_end[];
> +
> +extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
> +extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
> +
> +/* Locking for vnodes */
> +extern unsigned long prf_lock(void);
> +extern void prf_unlock(unsigned long flags);
> +
> +#define __DEFINE_PRF_SIZE(s) \
> + static inline unsigned long prf_ ## s ## _size(void) \
> + { \
> + unsigned long start = \
> + (unsigned long)__llvm_prf_ ## s ## _start; \
> + unsigned long end = \
> + (unsigned long)__llvm_prf_ ## s ## _end; \
> + return roundup(end - start, \
> + sizeof(__llvm_prf_ ## s ## _start[0])); \
> + } \
> + static inline unsigned long prf_ ## s ## _count(void) \
> + { \
> + return prf_ ## s ## _size() / \
> + sizeof(__llvm_prf_ ## s ## _start[0]); \
> + }
> +
> +__DEFINE_PRF_SIZE(data);
> +__DEFINE_PRF_SIZE(cnts);
> +__DEFINE_PRF_SIZE(names);
> +__DEFINE_PRF_SIZE(vnds);
> +
> +#undef __DEFINE_PRF_SIZE
> +
> +#endif /* _PGO_H */
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 213677a5ed33..9b218afb5cb8 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
> $(CFLAGS_GCOV))
> endif
>
> +#
> +# Enable clang's PGO profiling flags for a file or directory depending on
> +# variables PGO_PROFILE_obj.o and PGO_PROFILE.
> +#
> +ifeq ($(CONFIG_PGO_CLANG),y)
> +_c_flags += $(if $(patsubst n%,, \
> + $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
> + $(CFLAGS_PGO_CLANG))
> +endif
> +
> #
> # Enable address sanitizer flags for kernel except some files or directories
> # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
> --
> 2.30.0.280.ga3ce27912f-goog
>

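A note for readers following inst_prof_get_range_rep_value() in
kernel/pgo/instrument.c above: the bucketing it applies to memop sizes can
be tried out in userspace. The following is only a minimal sketch, not part
of the patch. The name range_rep_value() is made up for illustration, and
__builtin_popcountll() is assumed here as a stand-in for the kernel's
hweight64().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace sketch of the range mapping used for memop value profiling.
 * Assumption: __builtin_popcountll() replaces the kernel's hweight64();
 * range_rep_value() is a hypothetical name, not a kernel symbol.
 */
static uint64_t range_rep_value(uint64_t value)
{
	if (value <= 8)
		return value;	/* 0..8 are tracked individually */
	if (value >= 513)
		return 513;	/* everything >= 513 shares one bucket */
	if (__builtin_popcountll(value) == 1)
		return value;	/* exact powers of two map to themselves */

	/* __builtin_clzll() is undefined for 0; value is 9..511 here. */
	assert(value != 0);

	/* Otherwise: the previous power of two, plus one. */
	return (UINT64_C(1) << (63 - __builtin_clzll(value))) + 1;
}
```

With this mapping, sizes 9 through 15 all collapse to 9, a size of 100
lands in the bucket represented by 65, and anything from 513 upward is
reported as 513, so each value site only ever sees a small, fixed set of
representative values.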
2021-01-22 19:41:32

by Nick Desaulniers

Subject: Re: [PATCH v7] pgo: add clang's Profile Guided Optimization infrastructure

On Fri, Jan 22, 2021 at 2:12 AM Bill Wendling <[email protected]> wrote:
>
> From: Sami Tolvanen <[email protected]>
>
> Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> profile, the kernel is instrumented with PGO counters, a representative
> workload is run, and the raw profile data is collected from
> /sys/kernel/debug/pgo/profraw.
>
> The raw profile data must be processed by clang's "llvm-profdata" tool
> before it can be used during recompilation:
>
> $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
> Multiple raw profiles may be merged during this step.
>
> The data can now be used by the compiler:
>
> $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>
> This initial submission is restricted to x86, as that's the platform we
> know works. This restriction can be lifted once other platforms have
> been verified to work with PGO.
>
> Note that this method of profiling the kernel is clang-native, unlike
> the clang support in kernel/gcov.
>
> [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>
> Signed-off-by: Sami Tolvanen <[email protected]>
> Co-developed-by: Bill Wendling <[email protected]>
> Signed-off-by: Bill Wendling <[email protected]>
> Tested-by: Nick Desaulniers <[email protected]>

Reviewed-by: Nick Desaulniers <[email protected]>

Let's get this queued up, then start thinking about how we can follow
up with improvements to docs, ergonomics of passing the profiling
data, and nailing down which configs tickle any compiler bugs,
boot failures, or hash mismatches.

> ---
> v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> testing.
> - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> Song's comments.
> v3: - Added change log section based on Sedat Dilek's comments.
> v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
> own popcount implementation, based on Nick Desaulniers's comment.
> v5: - Correct padding calculation, discovered by Nathan Chancellor.
> v6: - Add better documentation about the locking scheme and other things.
> - Rename macros to better match the same macros in LLVM's source code.
> v7: - Fix minor build failure reported by Sedat.
> ---
> Documentation/dev-tools/index.rst | 1 +
> Documentation/dev-tools/pgo.rst | 127 +++++++++
> MAINTAINERS | 9 +
> Makefile | 3 +
> arch/Kconfig | 1 +
> arch/x86/Kconfig | 1 +
> arch/x86/boot/Makefile | 1 +
> arch/x86/boot/compressed/Makefile | 1 +
> arch/x86/crypto/Makefile | 4 +
> arch/x86/entry/vdso/Makefile | 1 +
> arch/x86/kernel/vmlinux.lds.S | 2 +
> arch/x86/platform/efi/Makefile | 1 +
> arch/x86/purgatory/Makefile | 1 +
> arch/x86/realmode/rm/Makefile | 1 +
> arch/x86/um/vdso/Makefile | 1 +
> drivers/firmware/efi/libstub/Makefile | 1 +
> include/asm-generic/vmlinux.lds.h | 44 +++
> kernel/Makefile | 1 +
> kernel/pgo/Kconfig | 35 +++
> kernel/pgo/Makefile | 5 +
> kernel/pgo/fs.c | 389 ++++++++++++++++++++++++++
> kernel/pgo/instrument.c | 189 +++++++++++++
> kernel/pgo/pgo.h | 203 ++++++++++++++
> scripts/Makefile.lib | 10 +
> 24 files changed, 1032 insertions(+)
> create mode 100644 Documentation/dev-tools/pgo.rst
> create mode 100644 kernel/pgo/Kconfig
> create mode 100644 kernel/pgo/Makefile
> create mode 100644 kernel/pgo/fs.c
> create mode 100644 kernel/pgo/instrument.c
> create mode 100644 kernel/pgo/pgo.h
>
> diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> index f7809c7b1ba9..8d6418e85806 100644
> --- a/Documentation/dev-tools/index.rst
> +++ b/Documentation/dev-tools/index.rst
> @@ -26,6 +26,7 @@ whole; patches welcome!
> kgdb
> kselftest
> kunit/index
> + pgo
>
>
> .. only:: subproject and html
> diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> new file mode 100644
> index 000000000000..b7f11d8405b7
> --- /dev/null
> +++ b/Documentation/dev-tools/pgo.rst
> @@ -0,0 +1,127 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===============================
> +Using PGO with the Linux kernel
> +===============================
> +
> +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> +when building with Clang. The profiling data is exported via the ``pgo``
> +debugfs directory.
> +
> +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> +
> +
> +Preparation
> +===========
> +
> +Configure the kernel with:
> +
> +.. code-block:: make
> +
> + CONFIG_DEBUG_FS=y
> + CONFIG_PGO_CLANG=y
> +
> +Note that kernels compiled with profiling flags will be significantly larger
> +and run slower.
> +
> +Profiling data will only become accessible once debugfs has been mounted:
> +
> +.. code-block:: sh
> +
> + mount -t debugfs none /sys/kernel/debug
> +
> +
> +Customization
> +=============
> +
> +You can enable or disable profiling for individual files and directories by
> +adding a line similar to the following to the respective kernel Makefile:
> +
> +- For a single file (e.g. main.o)
> +
> + .. code-block:: make
> +
> + PGO_PROFILE_main.o := y
> +
> +- For all files in one directory
> +
> + .. code-block:: make
> +
> + PGO_PROFILE := y
> +
> +To exclude files from being profiled use
> +
> + .. code-block:: make
> +
> + PGO_PROFILE_main.o := n
> +
> +and
> +
> + .. code-block:: make
> +
> + PGO_PROFILE := n
> +
> +Only files which are linked to the main kernel image or are compiled as kernel
> +modules are supported by this mechanism.
> +
> +
> +Files
> +=====
> +
> +The PGO kernel support creates the following files in debugfs:
> +
> +``/sys/kernel/debug/pgo``
> + Parent directory for all PGO-related files.
> +
> +``/sys/kernel/debug/pgo/reset``
> + Global reset file: resets all coverage data to zero when written to.
> +
> +``/sys/kernel/debug/pgo/profraw``
> + The raw PGO data that must be processed with ``llvm-profdata``.
> +
> +
> +Workflow
> +========
> +
> +The PGO kernel can be run on the host or test machines. The data though should
> +be analyzed with Clang's tools from the same Clang version that the kernel was
> +compiled with. Clang's tolerant of version skew, but it's easier to use the
> +same Clang version.
> +
> +The profiling data is useful for optimizing the kernel, analyzing coverage,
> +etc. Clang offers tools to perform these tasks.
> +
> +Here is an example workflow for profiling an instrumented kernel with PGO and
> +using the result to optimize the kernel:
> +
> +1) Install the kernel on the TEST machine.
> +
> +2) Reset the data counters right before running the load tests
> +
> + .. code-block:: sh
> +
> + $ echo 1 > /sys/kernel/debug/pgo/reset
> +
> +3) Run the load tests.
> +
> +4) Collect the raw profile data
> +
> + .. code-block:: sh
> +
> + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> +
> +5) (Optional) Download the raw profile data to the HOST machine.
> +
> +6) Process the raw profile data
> +
> + .. code-block:: sh
> +
> + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> +
> + Note that multiple raw profile data files can be merged during this step.
> +
> +7) Rebuild the kernel using the profile data (PGO disabled)
> +
> + .. code-block:: sh
> +
> + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 705776b31c8d..0a75d223682d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -13948,6 +13948,15 @@ S: Maintained
> F: include/linux/personality.h
> F: include/uapi/linux/personality.h
>
> +PGO BASED KERNEL PROFILING
> +M: Sami Tolvanen <[email protected]>
> +M: Bill Wendling <[email protected]>
> +R: Nathan Chancellor <[email protected]>
> +R: Nick Desaulniers <[email protected]>
> +S: Supported
> +F: Documentation/dev-tools/pgo.rst
> +F: kernel/pgo
> +
> PHOENIX RC FLIGHT CONTROLLER ADAPTER
> M: Marcus Folkesson <[email protected]>
> L: [email protected]
> diff --git a/Makefile b/Makefile
> index b0e4767735dc..9339541f7cec 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
> # Defaults to vmlinux, but the arch makefile usually adds further targets
> all: vmlinux
>
> +CFLAGS_PGO_CLANG := -fprofile-generate
> +export CFLAGS_PGO_CLANG
> +
> CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
> $(call cc-option,-fno-tree-loop-im) \
> $(call cc-disable-warning,maybe-uninitialized,)
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 24862d15f3a3..f39d3991f6bf 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
> pairs of 32-bit arguments, select this option.
>
> source "kernel/gcov/Kconfig"
> +source "kernel/pgo/Kconfig"
>
> source "scripts/gcc-plugins/Kconfig"
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 21f851179ff0..36305ea61dc0 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -96,6 +96,7 @@ config X86
> select ARCH_SUPPORTS_DEBUG_PAGEALLOC
> select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
> select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
> + select ARCH_SUPPORTS_PGO_CLANG if X86_64
> select ARCH_USE_BUILTIN_BSWAP
> select ARCH_USE_QUEUED_RWLOCKS
> select ARCH_USE_QUEUED_SPINLOCKS
> diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> index fe605205b4ce..383853e32f67 100644
> --- a/arch/x86/boot/Makefile
> +++ b/arch/x86/boot/Makefile
> @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
>
> $(obj)/bzImage: asflags-y := $(SVGA_MODE)
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index e0bc3988c3fa..ed12ab65f606 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
>
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE :=n
>
> KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
> diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> index a31de0c6ccde..5753aea7bcbd 100644
> --- a/arch/x86/crypto/Makefile
> +++ b/arch/x86/crypto/Makefile
> @@ -4,6 +4,10 @@
>
> OBJECT_FILES_NON_STANDARD := y
>
> +# Disable PGO for curve25519-x86_64. With PGO enabled, clang runs out of
> +# registers for some of the functions.
> +PGO_PROFILE_curve25519-x86_64.o := n
> +
> obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o
>
> obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index 02e3e42f380b..26e2b3af0145 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO $@
> VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
> $(call ld-option, --eh-frame-hdr) -Bsymbolic
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> quiet_cmd_vdso_and_check = VDSO $@
> cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index efd9e9ea17f2..f6cab2316c46 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -184,6 +184,8 @@ SECTIONS
>
> BUG_TABLE
>
> + PGO_CLANG_DATA
> +
> ORC_UNWIND_TABLE
>
> . = ALIGN(PAGE_SIZE);
> diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
> index 84b09c230cbd..5f22b31446ad 100644
> --- a/arch/x86/platform/efi/Makefile
> +++ b/arch/x86/platform/efi/Makefile
> @@ -2,6 +2,7 @@
> OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
> KASAN_SANITIZE := n
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
> obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
> diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> index 95ea17a9d20c..36f20e99da0b 100644
> --- a/arch/x86/purgatory/Makefile
> +++ b/arch/x86/purgatory/Makefile
> @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
>
> # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> KASAN_SANITIZE := n
> UBSAN_SANITIZE := n
> KCSAN_SANITIZE := n
> diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> index 83f1b6a56449..21797192f958 100644
> --- a/arch/x86/realmode/rm/Makefile
> +++ b/arch/x86/realmode/rm/Makefile
> @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
> diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
> index 5943387e3f35..54f5768f5853 100644
> --- a/arch/x86/um/vdso/Makefile
> +++ b/arch/x86/um/vdso/Makefile
> @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@
>
> VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> #
> # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index 8a94388e38b3..2d81623b33f2 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -40,6 +40,7 @@ KBUILD_CFLAGS := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
> KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
>
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> # Sanitizer runtimes are unavailable and cannot be linked here.
> KASAN_SANITIZE := n
> KCSAN_SANITIZE := n
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index b2b3d81b1535..3a591bb18c5f 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -316,6 +316,49 @@
> #define THERMAL_TABLE(name)
> #endif
>
> +#ifdef CONFIG_PGO_CLANG
> +#define PGO_CLANG_DATA \
> + __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_start = .; \
> + __llvm_prf_data_start = .; \
> + KEEP(*(__llvm_prf_data)) \
> + . = ALIGN(8); \
> + __llvm_prf_data_end = .; \
> + } \
> + __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_cnts_start = .; \
> + KEEP(*(__llvm_prf_cnts)) \
> + . = ALIGN(8); \
> + __llvm_prf_cnts_end = .; \
> + } \
> + __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_names_start = .; \
> + KEEP(*(__llvm_prf_names)) \
> + . = ALIGN(8); \
> + __llvm_prf_names_end = .; \
> + . = ALIGN(8); \
> + } \
> + __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
> + __llvm_prf_vals_start = .; \
> + KEEP(*(__llvm_prf_vals)) \
> + . = ALIGN(8); \
> + __llvm_prf_vals_end = .; \
> + . = ALIGN(8); \
> + } \
> + __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
> + __llvm_prf_vnds_start = .; \
> + KEEP(*(__llvm_prf_vnds)) \
> + . = ALIGN(8); \
> + __llvm_prf_vnds_end = .; \
> + __llvm_prf_end = .; \
> + }
> +#else
> +#define PGO_CLANG_DATA
> +#endif
> +
> #define KERNEL_DTB() \
> STRUCT_ALIGN(); \
> __dtb_start = .; \
> @@ -1125,6 +1168,7 @@
> CONSTRUCTORS \
> } \
> BUG_TABLE \
> + PGO_CLANG_DATA
>
> #define INIT_TEXT_SECTION(inittext_align) \
> . = ALIGN(inittext_align); \
> diff --git a/kernel/Makefile b/kernel/Makefile
> index aa7368c7eabf..0b34ca228ba4 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
> obj-$(CONFIG_KCSAN) += kcsan/
> obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
> obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
> +obj-$(CONFIG_PGO_CLANG) += pgo/
>
> obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
> new file mode 100644
> index 000000000000..76a640b6cf6e
> --- /dev/null
> +++ b/kernel/pgo/Kconfig
> @@ -0,0 +1,35 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
> +
> +config ARCH_SUPPORTS_PGO_CLANG
> + bool
> +
> +config PGO_CLANG
> + bool "Enable clang's PGO-based kernel profiling"
> + depends on DEBUG_FS
> + depends on ARCH_SUPPORTS_PGO_CLANG
> + depends on CC_IS_CLANG && CLANG_VERSION >= 120000
> + help
> + This option enables clang's PGO (Profile Guided Optimization) based
> + code profiling to better optimize the kernel.
> +
> + If unsure, say N.
> +
> + Run a representative workload for your application on a kernel
> + compiled with this option and download the raw profile file from
> + /sys/kernel/debug/pgo/profraw. This file needs to be processed with
> + llvm-profdata. It may be merged with other collected raw profiles.
> +
> + Copy the resulting profile file into vmlinux.profdata, and enable
> + KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
> + kernel.
> +
> + Note that a kernel compiled with profiling flags will be
> + significantly larger and run slower. Also be sure to exclude files
> + from profiling which are not linked to the kernel image to prevent
> + linker errors.
> +
> + Note that the debugfs filesystem has to be mounted to access
> + profiling data.
> +
> +endmenu
> diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
> new file mode 100644
> index 000000000000..41e27cefd9a4
> --- /dev/null
> +++ b/kernel/pgo/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0
> +GCOV_PROFILE := n
> +PGO_PROFILE := n
> +
> +obj-y += fs.o instrument.o
> diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
> new file mode 100644
> index 000000000000..1678df3b7d64
> --- /dev/null
> +++ b/kernel/pgo/fs.c
> @@ -0,0 +1,389 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt) "pgo: " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/debugfs.h>
> +#include <linux/fs.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/vmalloc.h>
> +#include "pgo.h"
> +
> +static struct dentry *directory;
> +
> +struct prf_private_data {
> + void *buffer;
> + unsigned long size;
> +};
> +
> +/*
> + * Raw profile data format:
> + *
> + * - llvm_prf_header
> + * - __llvm_prf_data
> + * - __llvm_prf_cnts
> + * - __llvm_prf_names
> + * - zero padding to 8 bytes
> + * - for each llvm_prf_data in __llvm_prf_data:
> + * - llvm_prf_value_data
> + * - llvm_prf_value_record + site count array
> + * - llvm_prf_value_node_data
> + * ...
> + * ...
> + * ...
> + */
> +
> +static void prf_fill_header(void **buffer)
> +{
> + struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
> +
> +#ifdef CONFIG_64BIT
> + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_64;
> +#else
> + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_32;
> +#endif
> + header->version = LLVM_VARIANT_MASK_IR_PROF | LLVM_INSTR_PROF_RAW_VERSION;
> + header->data_size = prf_data_count();
> + header->padding_bytes_before_counters = 0;
> + header->counters_size = prf_cnts_count();
> + header->padding_bytes_after_counters = 0;
> + header->names_size = prf_names_count();
> + header->counters_delta = (u64)__llvm_prf_cnts_start;
> + header->names_delta = (u64)__llvm_prf_names_start;
> + header->value_kind_last = LLVM_INSTR_PROF_IPVK_LAST;
> +
> + *buffer += sizeof(*header);
> +}
> +
> +/*
> + * Copy the source into the buffer, incrementing the pointer into buffer in the
> + * process.
> + */
> +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
> +{
> + memcpy(*buffer, src, size);
> + *buffer += size;
> +}
> +
> +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
> +{
> + struct llvm_prf_value_node **nodes =
> + (struct llvm_prf_value_node **)p->values;
> + u32 kinds = 0;
> + u32 size = 0;
> + unsigned int kind;
> + unsigned int n;
> + unsigned int s = 0;
> +
> + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> + unsigned int sites = p->num_value_sites[kind];
> +
> + if (!sites)
> + continue;
> +
> + /* Record + site count array */
> + size += prf_get_value_record_size(sites);
> + kinds++;
> +
> + if (!nodes)
> + continue;
> +
> + for (n = 0; n < sites; n++) {
> + u32 count = 0;
> + struct llvm_prf_value_node *site = nodes[s + n];
> +
> + while (site && ++count <= U8_MAX)
> + site = site->next;
> +
> + size += count *
> + sizeof(struct llvm_prf_value_node_data);
> + }
> +
> + s += sites;
> + }
> +
> + if (size)
> + size += sizeof(struct llvm_prf_value_data);
> +
> + if (value_kinds)
> + *value_kinds = kinds;
> +
> + return size;
> +}
> +
> +static u32 prf_get_value_size(void)
> +{
> + u32 size = 0;
> + struct llvm_prf_data *p;
> +
> + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> + size += __prf_get_value_size(p, NULL);
> +
> + return size;
> +}
> +
> +/* Serialize the value profiling data of one function. */
> +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
> +{
> + struct llvm_prf_value_data header;
> + struct llvm_prf_value_node **nodes =
> + (struct llvm_prf_value_node **)p->values;
> + unsigned int kind;
> + unsigned int n;
> + unsigned int s = 0;
> +
> + header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
> +
> + if (!header.num_value_kinds)
> + /* Nothing to write. */
> + return;
> +
> + prf_copy_to_buffer(buffer, &header, sizeof(header));
> +
> + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> + struct llvm_prf_value_record *record;
> + u8 *counts;
> + unsigned int sites = p->num_value_sites[kind];
> +
> + if (!sites)
> + continue;
> +
> + /* Profiling value record. */
> + record = *(struct llvm_prf_value_record **)buffer;
> + *buffer += prf_get_value_record_header_size();
> +
> + record->kind = kind;
> + record->num_value_sites = sites;
> +
> + /* Site count array. */
> + counts = *(u8 **)buffer;
> + *buffer += prf_get_value_record_site_count_size(sites);
> +
> + /*
> + * If we don't have nodes, we can skip updating the site count
> + * array, because the buffer is zero filled.
> + */
> + if (!nodes)
> + continue;
> +
> + for (n = 0; n < sites; n++) {
> + u32 count = 0;
> + struct llvm_prf_value_node *site = nodes[s + n];
> +
> + while (site && ++count <= U8_MAX) {
> + prf_copy_to_buffer(buffer, site,
> + sizeof(struct llvm_prf_value_node_data));
> + site = site->next;
> + }
> +
> + counts[n] = (u8)count;
> + }
> +
> + s += sites;
> + }
> +}
> +
> +static void prf_serialize_values(void **buffer)
> +{
> + struct llvm_prf_data *p;
> +
> + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> + prf_serialize_value(p, buffer);
> +}
> +
> +static inline unsigned long prf_get_padding(unsigned long size)
> +{
> + return 7 & (sizeof(u64) - size % sizeof(u64));
> +}
> +
> +static unsigned long prf_buffer_size(void)
> +{
> + return sizeof(struct llvm_prf_header) +
> + prf_data_size() +
> + prf_cnts_size() +
> + prf_names_size() +
> + prf_get_padding(prf_names_size()) +
> + prf_get_value_size();
> +}
> +
> +/*
> + * Serialize the profiling data into a format LLVM's tools can understand.
> + * Note: caller *must* hold pgo_lock.
> + */
> +static int prf_serialize(struct prf_private_data *p)
> +{
> + int err = 0;
> + void *buffer;
> +
> + p->size = prf_buffer_size();
> + p->buffer = vzalloc(p->size);
> +
> + if (!p->buffer) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + buffer = p->buffer;
> +
> + prf_fill_header(&buffer);
> + prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
> + prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
> + prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
> + buffer += prf_get_padding(prf_names_size());
> +
> + prf_serialize_values(&buffer);
> +
> +out:
> + return err;
> +}
> +
> +/* open() implementation for PGO. Creates a copy of the profiling data set. */
> +static int prf_open(struct inode *inode, struct file *file)
> +{
> + struct prf_private_data *data;
> + unsigned long flags;
> + int err;
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL);
> + if (!data) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + flags = prf_lock();
> +
> + err = prf_serialize(data);
> + if (unlikely(err)) {
> + kfree(data);
> + goto out_unlock;
> + }
> +
> + file->private_data = data;
> +
> +out_unlock:
> + prf_unlock(flags);
> +out:
> + return err;
> +}
> +
> +/* read() implementation for PGO. */
> +static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
> + loff_t *ppos)
> +{
> + struct prf_private_data *data = file->private_data;
> +
> + BUG_ON(!data);
> +
> + return simple_read_from_buffer(buf, count, ppos, data->buffer,
> + data->size);
> +}
> +
> +/* release() implementation for PGO. Release resources allocated by open(). */
> +static int prf_release(struct inode *inode, struct file *file)
> +{
> + struct prf_private_data *data = file->private_data;
> +
> + if (data) {
> + vfree(data->buffer);
> + kfree(data);
> + }
> +
> + return 0;
> +}
> +
> +static const struct file_operations prf_fops = {
> + .owner = THIS_MODULE,
> + .open = prf_open,
> + .read = prf_read,
> + .llseek = default_llseek,
> + .release = prf_release
> +};
> +
> +/* write() implementation for resetting PGO's profile data. */
> +static ssize_t reset_write(struct file *file, const char __user *addr,
> + size_t len, loff_t *pos)
> +{
> + struct llvm_prf_data *data;
> +
> + memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
> +
> + for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; data++) {
> + struct llvm_prf_value_node **vnodes;
> + u64 current_vsite_count;
> + u32 i;
> +
> + if (!data->values)
> + continue;
> +
> + current_vsite_count = 0;
> + vnodes = (struct llvm_prf_value_node **)data->values;
> +
> + for (i = LLVM_INSTR_PROF_IPVK_FIRST; i <= LLVM_INSTR_PROF_IPVK_LAST; i++)
> + current_vsite_count += data->num_value_sites[i];
> +
> + for (i = 0; i < current_vsite_count; i++) {
> + struct llvm_prf_value_node *current_vnode = vnodes[i];
> +
> + while (current_vnode) {
> + current_vnode->count = 0;
> + current_vnode = current_vnode->next;
> + }
> + }
> + }
> +
> + return len;
> +}
> +
> +static const struct file_operations prf_reset_fops = {
> + .owner = THIS_MODULE,
> + .write = reset_write,
> + .llseek = noop_llseek,
> +};
> +
> +/* Create debugfs entries. */
> +static int __init pgo_init(void)
> +{
> + directory = debugfs_create_dir("pgo", NULL);
> + if (!directory)
> + goto err_remove;
> +
> + if (!debugfs_create_file("profraw", 0600, directory, NULL,
> + &prf_fops))
> + goto err_remove;
> +
> + if (!debugfs_create_file("reset", 0200, directory, NULL,
> + &prf_reset_fops))
> + goto err_remove;
> +
> + return 0;
> +
> +err_remove:
> + pr_err("initialization failed\n");
> + return -EIO;
> +}
> +
> +/* Remove debugfs entries. */
> +static void __exit pgo_exit(void)
> +{
> + debugfs_remove_recursive(directory);
> +}
> +
> +module_init(pgo_init);
> +module_exit(pgo_exit);
> diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
> new file mode 100644
> index 000000000000..62ff5cfce7b1
> --- /dev/null
> +++ b/kernel/pgo/instrument.c
> @@ -0,0 +1,189 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt) "pgo: " fmt
> +
> +#include <linux/bitops.h>
> +#include <linux/kernel.h>
> +#include <linux/export.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +#include "pgo.h"
> +
> +/*
> + * This lock guards both profile count updating and serialization of the
> + * profiling data. Keeping both of these activities separate via locking
> + * ensures that we don't try to serialize data that's only partially updated.
> + */
> +static DEFINE_SPINLOCK(pgo_lock);
> +static int current_node;
> +
> +unsigned long prf_lock(void)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&pgo_lock, flags);
> +
> + return flags;
> +}
> +
> +void prf_unlock(unsigned long flags)
> +{
> + spin_unlock_irqrestore(&pgo_lock, flags);
> +}
> +
> +/*
> + * Return a newly allocated profiling value node which contains the value
> + * tracked by the value profiler.
> + * Note: caller *must* hold pgo_lock.
> + */
> +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
> + u32 index, u64 value)
> +{
> + if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
> + return NULL; /* Out of nodes */
> +
> + current_node++;
> +
> + /* Make sure the node is entirely within the section */
> + if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
> + &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
> + return NULL;
> +
> + return &__llvm_prf_vnds_start[current_node];
> +}
> +
> +/*
> + * Counts the number of times a target value is seen.
> + *
> + * Records the target value for the index if not seen before. Otherwise,
> + * increments the counter associated w/ the target value.
> + */
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
> +{
> + struct llvm_prf_data *p = (struct llvm_prf_data *)data;
> + struct llvm_prf_value_node **counters;
> + struct llvm_prf_value_node *curr;
> + struct llvm_prf_value_node *min = NULL;
> + struct llvm_prf_value_node *prev = NULL;
> + u64 min_count = U64_MAX;
> + u8 values = 0;
> + unsigned long flags;
> +
> + if (!p || !p->values)
> + return;
> +
> + counters = (struct llvm_prf_value_node **)p->values;
> + curr = counters[index];
> +
> + while (curr) {
> + if (target_value == curr->value) {
> + curr->count++;
> + return;
> + }
> +
> + if (curr->count < min_count) {
> + min_count = curr->count;
> + min = curr;
> + }
> +
> + prev = curr;
> + curr = curr->next;
> + values++;
> + }
> +
> + if (values >= LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE) {
> + if (!min->count || !(--min->count)) {
> + curr = min;
> + curr->value = target_value;
> + curr->count++;
> + }
> + return;
> + }
> +
> + /* Lock when updating the value node structure. */
> + flags = prf_lock();
> +
> + curr = allocate_node(p, index, target_value);
> + if (!curr)
> + goto out;
> +
> + curr->value = target_value;
> + curr->count++;
> +
> + if (!counters[index])
> + counters[index] = curr;
> + else if (prev && !prev->next)
> + prev->next = curr;
> +
> +out:
> + prf_unlock(flags);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_target);
> +
> +/* Counts the number of times a range of target values is seen. */
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> + u32 index, s64 precise_start,
> + s64 precise_last, s64 large_value);
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> + u32 index, s64 precise_start,
> + s64 precise_last, s64 large_value)
> +{
> + if (large_value != S64_MIN && (s64)target_value >= large_value)
> + target_value = large_value;
> + else if ((s64)target_value < precise_start ||
> + (s64)target_value > precise_last)
> + target_value = precise_last + 1;
> +
> + __llvm_profile_instrument_target(target_value, data, index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_range);
> +
> +static u64 inst_prof_get_range_rep_value(u64 value)
> +{
> + if (value <= 8)
> + /* The first ranges are individually tracked, use it as is. */
> + return value;
> + else if (value >= 513)
> + /* The last range is mapped to its lowest value. */
> + return 513;
> + else if (hweight64(value) == 1)
> + /* If it's a power of two, use it as is. */
> + return value;
> +
> + /* Otherwise, map it to the previous power of two + 1. */
> + return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
> +}
> +
> +/*
> + * The target values are partitioned into multiple ranges. The range spec is
> + * defined in compiler-rt/include/profile/InstrProfData.inc.
> + */
> +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> + u32 counter_index);
> +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> + u32 counter_index)
> +{
> + u64 rep_value;
> +
> + /* Map the target value to the representative value of its range. */
> + rep_value = inst_prof_get_range_rep_value(target_value);
> + __llvm_profile_instrument_target(rep_value, data, counter_index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_memop);
> diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
> new file mode 100644
> index 000000000000..ddc8d3002fe5
> --- /dev/null
> +++ b/kernel/pgo/pgo.h
> @@ -0,0 +1,203 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#ifndef _PGO_H
> +#define _PGO_H
> +
> +/*
> + * Note: These internal LLVM definitions must match the compiler version.
> + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
> + */
> +
> +#define LLVM_INSTR_PROF_RAW_MAGIC_64 \
> + ((u64)255 << 56 | \
> + (u64)'l' << 48 | \
> + (u64)'p' << 40 | \
> + (u64)'r' << 32 | \
> + (u64)'o' << 24 | \
> + (u64)'f' << 16 | \
> + (u64)'r' << 8 | \
> + (u64)129)
> +#define LLVM_INSTR_PROF_RAW_MAGIC_32 \
> + ((u64)255 << 56 | \
> + (u64)'l' << 48 | \
> + (u64)'p' << 40 | \
> + (u64)'r' << 32 | \
> + (u64)'o' << 24 | \
> + (u64)'f' << 16 | \
> + (u64)'R' << 8 | \
> + (u64)129)
> +
> +#define LLVM_INSTR_PROF_RAW_VERSION 5
> +#define LLVM_INSTR_PROF_DATA_ALIGNMENT 8
> +#define LLVM_INSTR_PROF_IPVK_FIRST 0
> +#define LLVM_INSTR_PROF_IPVK_LAST 1
> +#define LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE 255
> +
> +#define LLVM_VARIANT_MASK_IR_PROF (0x1ULL << 56)
> +#define LLVM_VARIANT_MASK_CSIR_PROF (0x1ULL << 57)
> +
> +/**
> + * struct llvm_prf_header - represents the raw profile header data structure.
> + * @magic: the magic token for the file format.
> + * @version: the version of the file format.
> + * @data_size: the number of entries in the profile data section.
> + * @padding_bytes_before_counters: the number of padding bytes before the
> + * counters.
> + * @counters_size: the size in bytes of the LLVM profile section containing the
> + * counters.
> + * @padding_bytes_after_counters: the number of padding bytes after the
> + * counters.
> + * @names_size: the size in bytes of the LLVM profile section containing the
> + * counters' names.
> + * @counters_delta: the beginning of the LLVM profile counters section.
> + * @names_delta: the beginning of the LLVM profile names section.
> + * @value_kind_last: the last profile value kind.
> + */
> +struct llvm_prf_header {
> + u64 magic;
> + u64 version;
> + u64 data_size;
> + u64 padding_bytes_before_counters;
> + u64 counters_size;
> + u64 padding_bytes_after_counters;
> + u64 names_size;
> + u64 counters_delta;
> + u64 names_delta;
> + u64 value_kind_last;
> +};
> +
> +/**
> + * struct llvm_prf_data - represents the per-function control structure.
> + * @name_ref: the reference to the function's name.
> + * @func_hash: the hash value of the function.
> + * @counter_ptr: a pointer to the profile counter.
> + * @function_ptr: a pointer to the function.
> + * @values: the profiling values associated with this function.
> + * @num_counters: the number of counters in the function.
> + * @num_value_sites: the number of value profile sites.
> + */
> +struct llvm_prf_data {
> + const u64 name_ref;
> + const u64 func_hash;
> + const void *counter_ptr;
> + const void *function_ptr;
> + void *values;
> + const u32 num_counters;
> + const u16 num_value_sites[LLVM_INSTR_PROF_IPVK_LAST + 1];
> +} __aligned(LLVM_INSTR_PROF_DATA_ALIGNMENT);
> +
> +/**
> + * struct llvm_prf_value_node_data - represents the data part of the struct
> + * llvm_prf_value_node data structure.
> + * @value: the value counters.
> + * @count: the counters' count.
> + */
> +struct llvm_prf_value_node_data {
> + u64 value;
> + u64 count;
> +};
> +
> +/**
> + * struct llvm_prf_value_node - represents an internal data structure used by
> + * the value profiler.
> + * @value: the value counters.
> + * @count: the counters' count.
> + * @next: the next value node.
> + */
> +struct llvm_prf_value_node {
> + u64 value;
> + u64 count;
> + struct llvm_prf_value_node *next;
> +};
> +
> +/**
> + * struct llvm_prf_value_data - represents the value profiling data in indexed
> + * format.
> + * @total_size: the total size in bytes including this field.
> + * @num_value_kinds: the number of value profile kinds that have value profile
> + * data.
> + */
> +struct llvm_prf_value_data {
> + u32 total_size;
> + u32 num_value_kinds;
> +};
> +
> +/**
> + * struct llvm_prf_value_record - represents the on-disk layout of the value
> + * profile data of a particular kind for one function.
> + * @kind: the kind of the value profile record.
> + * @num_value_sites: the number of value profile sites.
> + * @site_count_array: the first element of the array that stores the number
> + * of profiled values for each value site.
> + */
> +struct llvm_prf_value_record {
> + u32 kind;
> + u32 num_value_sites;
> + u8 site_count_array[];
> +};
> +
> +#define prf_get_value_record_header_size() \
> + offsetof(struct llvm_prf_value_record, site_count_array)
> +#define prf_get_value_record_site_count_size(sites) \
> + roundup((sites), 8)
> +#define prf_get_value_record_size(sites) \
> + (prf_get_value_record_header_size() + \
> + prf_get_value_record_site_count_size((sites)))
> +
> +/* Data sections */
> +extern struct llvm_prf_data __llvm_prf_data_start[];
> +extern struct llvm_prf_data __llvm_prf_data_end[];
> +
> +extern u64 __llvm_prf_cnts_start[];
> +extern u64 __llvm_prf_cnts_end[];
> +
> +extern char __llvm_prf_names_start[];
> +extern char __llvm_prf_names_end[];
> +
> +extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
> +extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
> +
> +/* Locking for vnodes */
> +extern unsigned long prf_lock(void);
> +extern void prf_unlock(unsigned long flags);
> +
> +#define __DEFINE_PRF_SIZE(s) \
> + static inline unsigned long prf_ ## s ## _size(void) \
> + { \
> + unsigned long start = \
> + (unsigned long)__llvm_prf_ ## s ## _start; \
> + unsigned long end = \
> + (unsigned long)__llvm_prf_ ## s ## _end; \
> + return roundup(end - start, \
> + sizeof(__llvm_prf_ ## s ## _start[0])); \
> + } \
> + static inline unsigned long prf_ ## s ## _count(void) \
> + { \
> + return prf_ ## s ## _size() / \
> + sizeof(__llvm_prf_ ## s ## _start[0]); \
> + }
> +
> +__DEFINE_PRF_SIZE(data);
> +__DEFINE_PRF_SIZE(cnts);
> +__DEFINE_PRF_SIZE(names);
> +__DEFINE_PRF_SIZE(vnds);
> +
> +#undef __DEFINE_PRF_SIZE
> +
> +#endif /* _PGO_H */
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 213677a5ed33..9b218afb5cb8 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
> $(CFLAGS_GCOV))
> endif
>
> +#
> +# Enable clang's PGO profiling flags for a file or directory depending on
> +# variables PGO_PROFILE_obj.o and PGO_PROFILE.
> +#
> +ifeq ($(CONFIG_PGO_CLANG),y)
> +_c_flags += $(if $(patsubst n%,, \
> + $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
> + $(CFLAGS_PGO_CLANG))
> +endif
> +
> #
> # Enable address sanitizer flags for kernel except some files or directories
> # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
> --
> 2.30.0.280.ga3ce27912f-goog
>


--
Thanks,
~Nick Desaulniers

2021-01-28 20:51:09

by Sedat Dilek

Subject: Re: [PATCH v7] pgo: add clang's Profile Guided Optimization infrastructure

On Fri, Jan 22, 2021 at 7:41 PM 'Nick Desaulniers' via Clang Built
Linux <[email protected]> wrote:
>
> On Fri, Jan 22, 2021 at 2:12 AM Bill Wendling <[email protected]> wrote:
> >
> > From: Sami Tolvanen <[email protected]>
> >
> > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > profile, the kernel is instrumented with PGO counters, a representative
> > workload is run, and the raw profile data is collected from
> > /sys/kernel/debug/pgo/profraw.
> >
> > The raw profile data must be processed by clang's "llvm-profdata" tool
> > before it can be used during recompilation:
> >
> > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >
> > Multiple raw profiles may be merged during this step.
> >
> > The data can now be used by the compiler:
> >
> > $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> >
> > This initial submission is restricted to x86, as that's the platform we
> > know works. This restriction can be lifted once other platforms have
> > been verified to work with PGO.
> >
> > Note that this method of profiling the kernel is clang-native, unlike
> > the clang support in kernel/gcov.
> >
> > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> >
> > Signed-off-by: Sami Tolvanen <[email protected]>
> > Co-developed-by: Bill Wendling <[email protected]>
> > Signed-off-by: Bill Wendling <[email protected]>
> > Tested-by: Nick Desaulniers <[email protected]>
>
> Reviewed-by: Nick Desaulniers <[email protected]>
>
> Let's get this queued up, then start thinking about how we can follow
> up with improvements to docs, ergonomics of passing the profiling
> data, and nailing down which configs tickle any compiler bugs,
> boot failures, or hash mismatches.
>

[ LLVM ]

Today, I switched over to LLVM version 12.0.0-rc1.


[ Step #1: 5.11.0-rc5-5-amd64-clang12-pgo ]

My first kernel was built with CONFIG_PGO_CLANG=y and LLVM=1 plus LLVM_IAS=1.

[ start-build_5.11.0-rc5-5-amd64-clang12-pgo.txt ]
dileks 193090 193065 0 06:54 pts/2 00:00:00 /usr/bin/perf_5.10
stat make V=1 -j4 HOSTCC=clang HOSTCXX=clang++ HOSTLD=ld.lld CC=clang
LD=ld.lld LLVM=1 LLVM_IAS=1 PAHOLE=/opt/pahole/bin/pahole
LOCALVERSION=-5-amd64-clang12-pgo KBUILD_VERBOSE=1
KBUILD_BUILD_HOST=iniza [email protected]
KBUILD_BUILD_TIMESTAMP=2021-01-28 bindeb-pkg
KDEB_PKGVERSION=5.11.0~rc5-5~bullseye+dileks1

Config: config-5.11.0-rc5-5-amd64-clang12-pgo


[ Step #2: x86-64 defconfig & vmlinux.profdata ]

Booted into 5.11.0-rc5-5-amd64-clang12-pgo and built an x86-64
defconfig to generate/merge a vmlinux.profdata file.

[ start-build_x86-64-defconfig.txt ]
dileks 18430 15640 0 11:15 pts/2 00:00:00 make V=1 -j4
HOSTCC=clang HOSTCXX=clang++ HOSTLD=ld.lld CC=clang LD=ld.lld LLVM=1
LLVM_IAS=1

Script: profile_clang-pgo.sh
Config: dot-config.x86-64-defconfig


[ Step #3.1: 5.11.0-rc5-6-amd64-clang12-pgo & GNU-AS ]

The first rebuild with CONFIG_PGO_CLANG=n and "LLVM=1
KCFLAGS=-fprofile-use=vmlinux.profdata".
I was able to boot into this one.
Used assembler: GNU-AS 2.35.1

[ start-build_5.11.0-rc5-6-amd64-clang12-pgo.txt ]
dileks 65734 65709 0 11:54 pts/2 00:00:00 /usr/bin/perf_5.10
stat make V=1 -j4 HOSTCC=clang HOSTCXX=clang++ HOSTLD=ld.lld CC=clang
LD=ld.lld PAHOLE=/opt/pahole/bin/pahole
LOCALVERSION=-6-amd64-clang12-pgo KBUILD_VERBOSE=1
KBUILD_BUILD_HOST=iniza [email protected]
KBUILD_BUILD_TIMESTAMP=2021-01-28 bindeb-pkg
KDEB_PKGVERSION=5.11.0~rc5-6~bullseye+dileks1 LLVM=1
KCFLAGS=-fprofile-use=vmlinux.profdata

Config: config-5.11.0-rc5-6-amd64-clang12-pgo


[ Step #3.2: 5.11.0-rc5-7-amd64-clang12-pgo & Clang-IAS ]

The second rebuild with CONFIG_PGO_CLANG=n and "LLVM=1
KCFLAGS=-fprofile-use=vmlinux.profdata" plus LLVM_IAS=1.
Compilable but NOT bootable in QEMU and on bare metal.
Used assembler: Clang-IAS v12.0.0-rc1

[ start-build_5.11.0-rc5-7-amd64-clang12-pgo.txt ]
dileks 6545 6520 0 16:31 pts/2 00:00:00 /usr/bin/perf_5.10
stat make V=1 -j4 HOSTCC=clang HOSTCXX=clang++ HOSTLD=ld.lld CC=clang
LD=ld.lld PAHOLE=/opt/pahole/bin/pahole
LOCALVERSION=-7-amd64-clang12-pgo KBUILD_VERBOSE=1
KBUILD_BUILD_HOST=iniza [email protected]
KBUILD_BUILD_TIMESTAMP=2021-01-28 bindeb-pkg
KDEB_PKGVERSION=5.11.0~rc5-7~bullseye+dileks1 LLVM=1
KCFLAGS=-fprofile-use=vmlinux.profdata LLVM_IAS=1

Config: config-5.11.0-rc5-7-amd64-clang12-pgo


[ Conclusion ]

The only thing I can say for certain is that a "PGO optimized" rebuild
with LLVM_IAS=1 compiles but does NOT boot.

- Sedat -

P.S.: See the attached kernel configs and scripts.

> > ---
> > v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> > testing.
> > - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> > Song's comments.
> > v3: - Added change log section based on Sedat Dilek's comments.
> > v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
> > own popcount implementation, based on Nick Desaulniers's comment.
> > v5: - Correct padding calculation, discovered by Nathan Chancellor.
> > v6: - Add better documentation about the locking scheme and other things.
> > - Rename macros to better match the same macros in LLVM's source code.
> > v7: - Fix minor build failure reported by Sedat.
> > ---
> > Documentation/dev-tools/index.rst | 1 +
> > Documentation/dev-tools/pgo.rst | 127 +++++++++
> > MAINTAINERS | 9 +
> > Makefile | 3 +
> > arch/Kconfig | 1 +
> > arch/x86/Kconfig | 1 +
> > arch/x86/boot/Makefile | 1 +
> > arch/x86/boot/compressed/Makefile | 1 +
> > arch/x86/crypto/Makefile | 4 +
> > arch/x86/entry/vdso/Makefile | 1 +
> > arch/x86/kernel/vmlinux.lds.S | 2 +
> > arch/x86/platform/efi/Makefile | 1 +
> > arch/x86/purgatory/Makefile | 1 +
> > arch/x86/realmode/rm/Makefile | 1 +
> > arch/x86/um/vdso/Makefile | 1 +
> > drivers/firmware/efi/libstub/Makefile | 1 +
> > include/asm-generic/vmlinux.lds.h | 44 +++
> > kernel/Makefile | 1 +
> > kernel/pgo/Kconfig | 35 +++
> > kernel/pgo/Makefile | 5 +
> > kernel/pgo/fs.c | 389 ++++++++++++++++++++++++++
> > kernel/pgo/instrument.c | 189 +++++++++++++
> > kernel/pgo/pgo.h | 203 ++++++++++++++
> > scripts/Makefile.lib | 10 +
> > 24 files changed, 1032 insertions(+)
> > create mode 100644 Documentation/dev-tools/pgo.rst
> > create mode 100644 kernel/pgo/Kconfig
> > create mode 100644 kernel/pgo/Makefile
> > create mode 100644 kernel/pgo/fs.c
> > create mode 100644 kernel/pgo/instrument.c
> > create mode 100644 kernel/pgo/pgo.h
> >
> > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > index f7809c7b1ba9..8d6418e85806 100644
> > --- a/Documentation/dev-tools/index.rst
> > +++ b/Documentation/dev-tools/index.rst
> > @@ -26,6 +26,7 @@ whole; patches welcome!
> > kgdb
> > kselftest
> > kunit/index
> > + pgo
> >
> >
> > .. only:: subproject and html
> > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> > new file mode 100644
> > index 000000000000..b7f11d8405b7
> > --- /dev/null
> > +++ b/Documentation/dev-tools/pgo.rst
> > @@ -0,0 +1,127 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +===============================
> > +Using PGO with the Linux kernel
> > +===============================
> > +
> > +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> > +when building with Clang. The profiling data is exported via the ``pgo``
> > +debugfs directory.
> > +
> > +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > +
> > +
> > +Preparation
> > +===========
> > +
> > +Configure the kernel with:
> > +
> > +.. code-block:: make
> > +
> > + CONFIG_DEBUG_FS=y
> > + CONFIG_PGO_CLANG=y
> > +
> > +Note that kernels compiled with profiling flags will be significantly larger
> > +and run slower.
> > +
> > +Profiling data will only become accessible once debugfs has been mounted:
> > +
> > +.. code-block:: sh
> > +
> > + mount -t debugfs none /sys/kernel/debug
> > +
> > +
> > +Customization
> > +=============
> > +
> > +You can enable or disable profiling for individual files and directories by
> > +adding a line similar to the following to the respective kernel Makefile:
> > +
> > +- For a single file (e.g. main.o)
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE_main.o := y
> > +
> > +- For all files in one directory
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE := y
> > +
> > +To exclude files from being profiled use
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE_main.o := n
> > +
> > +and
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE := n
> > +
> > +Only files which are linked to the main kernel image or are compiled as kernel
> > +modules are supported by this mechanism.
> > +
> > +
> > +Files
> > +=====
> > +
> > +The PGO kernel support creates the following files in debugfs:
> > +
> > +``/sys/kernel/debug/pgo``
> > + Parent directory for all PGO-related files.
> > +
> > +``/sys/kernel/debug/pgo/reset``
> > + Global reset file: resets all profile data to zero when written to.
> > +
> > +``/sys/kernel/debug/pgo/profraw``
> > + The raw PGO data that must be processed with ``llvm-profdata``.
> > +
> > +
> > +Workflow
> > +========
> > +
> > +The PGO kernel can be run on the host or test machines. The data, though,
> > +should be analyzed with Clang's tools from the same Clang version that the
> > +kernel was compiled with. Clang's tolerant of version skew, but it's easier to
> > +use the same Clang version.
> > +
> > +The profiling data is useful for optimizing the kernel, analyzing coverage,
> > +etc. Clang offers tools to perform these tasks.
> > +
> > +Here is an example workflow for profiling an instrumented kernel with PGO and
> > +using the result to optimize the kernel:
> > +
> > +1) Install the kernel on the TEST machine.
> > +
> > +2) Reset the data counters right before running the load tests
> > +
> > + .. code-block:: sh
> > +
> > + $ echo 1 > /sys/kernel/debug/pgo/reset
> > +
> > +3) Run the load tests.
> > +
> > +4) Collect the raw profile data
> > +
> > + .. code-block:: sh
> > +
> > + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > +
> > +5) (Optional) Download the raw profile data to the HOST machine.
> > +
> > +6) Process the raw profile data
> > +
> > + .. code-block:: sh
> > +
> > + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > +
> > + Note that multiple raw profile data files can be merged during this step.
> > +
> > +7) Rebuild the kernel using the profile data (PGO disabled)
> > +
> > + .. code-block:: sh
> > +
> > + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 705776b31c8d..0a75d223682d 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -13948,6 +13948,15 @@ S: Maintained
> > F: include/linux/personality.h
> > F: include/uapi/linux/personality.h
> >
> > +PGO BASED KERNEL PROFILING
> > +M: Sami Tolvanen <[email protected]>
> > +M: Bill Wendling <[email protected]>
> > +R: Nathan Chancellor <[email protected]>
> > +R: Nick Desaulniers <[email protected]>
> > +S: Supported
> > +F: Documentation/dev-tools/pgo.rst
> > +F: kernel/pgo
> > +
> > PHOENIX RC FLIGHT CONTROLLER ADAPTER
> > M: Marcus Folkesson <[email protected]>
> > L: [email protected]
> > diff --git a/Makefile b/Makefile
> > index b0e4767735dc..9339541f7cec 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
> > # Defaults to vmlinux, but the arch makefile usually adds further targets
> > all: vmlinux
> >
> > +CFLAGS_PGO_CLANG := -fprofile-generate
> > +export CFLAGS_PGO_CLANG
> > +
> > CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
> > $(call cc-option,-fno-tree-loop-im) \
> > $(call cc-disable-warning,maybe-uninitialized,)
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index 24862d15f3a3..f39d3991f6bf 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
> > pairs of 32-bit arguments, select this option.
> >
> > source "kernel/gcov/Kconfig"
> > +source "kernel/pgo/Kconfig"
> >
> > source "scripts/gcc-plugins/Kconfig"
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 21f851179ff0..36305ea61dc0 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -96,6 +96,7 @@ config X86
> > select ARCH_SUPPORTS_DEBUG_PAGEALLOC
> > select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
> > select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
> > + select ARCH_SUPPORTS_PGO_CLANG if X86_64
> > select ARCH_USE_BUILTIN_BSWAP
> > select ARCH_USE_QUEUED_RWLOCKS
> > select ARCH_USE_QUEUED_SPINLOCKS
> > diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> > index fe605205b4ce..383853e32f67 100644
> > --- a/arch/x86/boot/Makefile
> > +++ b/arch/x86/boot/Makefile
> > @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> > KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE := n
> >
> > $(obj)/bzImage: asflags-y := $(SVGA_MODE)
> > diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> > index e0bc3988c3fa..ed12ab65f606 100644
> > --- a/arch/x86/boot/compressed/Makefile
> > +++ b/arch/x86/boot/compressed/Makefile
> > @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
> >
> > KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE :=n
> >
> > KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
> > diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> > index a31de0c6ccde..5753aea7bcbd 100644
> > --- a/arch/x86/crypto/Makefile
> > +++ b/arch/x86/crypto/Makefile
> > @@ -4,6 +4,10 @@
> >
> > OBJECT_FILES_NON_STANDARD := y
> >
> > +# Disable PGO for curve25519-x86_64. With PGO enabled, clang runs out of
> > +# registers for some of the functions.
> > +PGO_PROFILE_curve25519-x86_64.o := n
> > +
> > obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o
> >
> > obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
> > diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> > index 02e3e42f380b..26e2b3af0145 100644
> > --- a/arch/x86/entry/vdso/Makefile
> > +++ b/arch/x86/entry/vdso/Makefile
> > @@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO $@
> > VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
> > $(call ld-option, --eh-frame-hdr) -Bsymbolic
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > quiet_cmd_vdso_and_check = VDSO $@
> > cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
> > diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> > index efd9e9ea17f2..f6cab2316c46 100644
> > --- a/arch/x86/kernel/vmlinux.lds.S
> > +++ b/arch/x86/kernel/vmlinux.lds.S
> > @@ -184,6 +184,8 @@ SECTIONS
> >
> > BUG_TABLE
> >
> > + PGO_CLANG_DATA
> > +
> > ORC_UNWIND_TABLE
> >
> > . = ALIGN(PAGE_SIZE);
> > diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
> > index 84b09c230cbd..5f22b31446ad 100644
> > --- a/arch/x86/platform/efi/Makefile
> > +++ b/arch/x86/platform/efi/Makefile
> > @@ -2,6 +2,7 @@
> > OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
> > KASAN_SANITIZE := n
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
> > obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
> > diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> > index 95ea17a9d20c..36f20e99da0b 100644
> > --- a/arch/x86/purgatory/Makefile
> > +++ b/arch/x86/purgatory/Makefile
> > @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
> >
> > # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > KASAN_SANITIZE := n
> > UBSAN_SANITIZE := n
> > KCSAN_SANITIZE := n
> > diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> > index 83f1b6a56449..21797192f958 100644
> > --- a/arch/x86/realmode/rm/Makefile
> > +++ b/arch/x86/realmode/rm/Makefile
> > @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> > KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE := n
> > diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
> > index 5943387e3f35..54f5768f5853 100644
> > --- a/arch/x86/um/vdso/Makefile
> > +++ b/arch/x86/um/vdso/Makefile
> > @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@
> >
> > VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > #
> > # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
> > diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> > index 8a94388e38b3..2d81623b33f2 100644
> > --- a/drivers/firmware/efi/libstub/Makefile
> > +++ b/drivers/firmware/efi/libstub/Makefile
> > @@ -40,6 +40,7 @@ KBUILD_CFLAGS := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
> > KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
> >
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > # Sanitizer runtimes are unavailable and cannot be linked here.
> > KASAN_SANITIZE := n
> > KCSAN_SANITIZE := n
> > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > index b2b3d81b1535..3a591bb18c5f 100644
> > --- a/include/asm-generic/vmlinux.lds.h
> > +++ b/include/asm-generic/vmlinux.lds.h
> > @@ -316,6 +316,49 @@
> > #define THERMAL_TABLE(name)
> > #endif
> >
> > +#ifdef CONFIG_PGO_CLANG
> > +#define PGO_CLANG_DATA \
> > + __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
> > + . = ALIGN(8); \
> > + __llvm_prf_start = .; \
> > + __llvm_prf_data_start = .; \
> > + KEEP(*(__llvm_prf_data)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_data_end = .; \
> > + } \
> > + __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
> > + . = ALIGN(8); \
> > + __llvm_prf_cnts_start = .; \
> > + KEEP(*(__llvm_prf_cnts)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_cnts_end = .; \
> > + } \
> > + __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
> > + . = ALIGN(8); \
> > + __llvm_prf_names_start = .; \
> > + KEEP(*(__llvm_prf_names)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_names_end = .; \
> > + . = ALIGN(8); \
> > + } \
> > + __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
> > + __llvm_prf_vals_start = .; \
> > + KEEP(*(__llvm_prf_vals)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_vals_end = .; \
> > + . = ALIGN(8); \
> > + } \
> > + __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
> > + __llvm_prf_vnds_start = .; \
> > + KEEP(*(__llvm_prf_vnds)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_vnds_end = .; \
> > + __llvm_prf_end = .; \
> > + }
> > +#else
> > +#define PGO_CLANG_DATA
> > +#endif
> > +
> > #define KERNEL_DTB() \
> > STRUCT_ALIGN(); \
> > __dtb_start = .; \
> > @@ -1125,6 +1168,7 @@
> > CONSTRUCTORS \
> > } \
> > BUG_TABLE \
> > + PGO_CLANG_DATA
> >
> > #define INIT_TEXT_SECTION(inittext_align) \
> > . = ALIGN(inittext_align); \
> > diff --git a/kernel/Makefile b/kernel/Makefile
> > index aa7368c7eabf..0b34ca228ba4 100644
> > --- a/kernel/Makefile
> > +++ b/kernel/Makefile
> > @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
> > obj-$(CONFIG_KCSAN) += kcsan/
> > obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
> > obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
> > +obj-$(CONFIG_PGO_CLANG) += pgo/
> >
> > obj-$(CONFIG_PERF_EVENTS) += events/
> >
> > diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
> > new file mode 100644
> > index 000000000000..76a640b6cf6e
> > --- /dev/null
> > +++ b/kernel/pgo/Kconfig
> > @@ -0,0 +1,35 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
> > +
> > +config ARCH_SUPPORTS_PGO_CLANG
> > + bool
> > +
> > +config PGO_CLANG
> > + bool "Enable clang's PGO-based kernel profiling"
> > + depends on DEBUG_FS
> > + depends on ARCH_SUPPORTS_PGO_CLANG
> > + depends on CC_IS_CLANG && CLANG_VERSION >= 120000
> > + help
> > + This option enables clang's PGO (Profile Guided Optimization) based
> > + code profiling to better optimize the kernel.
> > +
> > + If unsure, say N.
> > +
> > + Run a representative workload for your application on a kernel
> > + compiled with this option and download the raw profile file from
> > + /sys/kernel/debug/pgo/profraw. This file needs to be processed with
> > + llvm-profdata. It may be merged with other collected raw profiles.
> > +
> > + Copy the resulting profile file into vmlinux.profdata, and enable
> > + KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
> > + kernel.
> > +
> > + Note that a kernel compiled with profiling flags will be
> > + significantly larger and run slower. Also be sure to exclude files
> > + from profiling which are not linked to the kernel image to prevent
> > + linker errors.
> > +
> > + Note that the debugfs filesystem has to be mounted to access
> > + profiling data.
> > +
> > +endmenu
> > diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
> > new file mode 100644
> > index 000000000000..41e27cefd9a4
> > --- /dev/null
> > +++ b/kernel/pgo/Makefile
> > @@ -0,0 +1,5 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > +
> > +obj-y += fs.o instrument.o
> > diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
> > new file mode 100644
> > index 000000000000..1678df3b7d64
> > --- /dev/null
> > +++ b/kernel/pgo/fs.c
> > @@ -0,0 +1,389 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + * Sami Tolvanen <[email protected]>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#define pr_fmt(fmt) "pgo: " fmt
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/debugfs.h>
> > +#include <linux/fs.h>
> > +#include <linux/module.h>
> > +#include <linux/slab.h>
> > +#include <linux/vmalloc.h>
> > +#include "pgo.h"
> > +
> > +static struct dentry *directory;
> > +
> > +struct prf_private_data {
> > + void *buffer;
> > + unsigned long size;
> > +};
> > +
> > +/*
> > + * Raw profile data format:
> > + *
> > + * - llvm_prf_header
> > + * - __llvm_prf_data
> > + * - __llvm_prf_cnts
> > + * - __llvm_prf_names
> > + * - zero padding to 8 bytes
> > + * - for each llvm_prf_data in __llvm_prf_data:
> > + * - llvm_prf_value_data
> > + * - llvm_prf_value_record + site count array
> > + * - llvm_prf_value_node_data
> > + * ...
> > + * ...
> > + * ...
> > + */
> > +
> > +static void prf_fill_header(void **buffer)
> > +{
> > + struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
> > +
> > +#ifdef CONFIG_64BIT
> > + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_64;
> > +#else
> > + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_32;
> > +#endif
> > + header->version = LLVM_VARIANT_MASK_IR_PROF | LLVM_INSTR_PROF_RAW_VERSION;
> > + header->data_size = prf_data_count();
> > + header->padding_bytes_before_counters = 0;
> > + header->counters_size = prf_cnts_count();
> > + header->padding_bytes_after_counters = 0;
> > + header->names_size = prf_names_count();
> > + header->counters_delta = (u64)__llvm_prf_cnts_start;
> > + header->names_delta = (u64)__llvm_prf_names_start;
> > + header->value_kind_last = LLVM_INSTR_PROF_IPVK_LAST;
> > +
> > + *buffer += sizeof(*header);
> > +}
> > +
> > +/*
> > + * Copy the source into the buffer, incrementing the pointer into buffer in the
> > + * process.
> > + */
> > +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
> > +{
> > + memcpy(*buffer, src, size);
> > + *buffer += size;
> > +}
> > +
> > +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
> > +{
> > + struct llvm_prf_value_node **nodes =
> > + (struct llvm_prf_value_node **)p->values;
> > + u32 kinds = 0;
> > + u32 size = 0;
> > + unsigned int kind;
> > + unsigned int n;
> > + unsigned int s = 0;
> > +
> > + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> > + unsigned int sites = p->num_value_sites[kind];
> > +
> > + if (!sites)
> > + continue;
> > +
> > + /* Record + site count array */
> > + size += prf_get_value_record_size(sites);
> > + kinds++;
> > +
> > + if (!nodes)
> > + continue;
> > +
> > + for (n = 0; n < sites; n++) {
> > + u32 count = 0;
> > + struct llvm_prf_value_node *site = nodes[s + n];
> > +
> > + while (site && ++count <= U8_MAX)
> > + site = site->next;
> > +
> > + size += count *
> > + sizeof(struct llvm_prf_value_node_data);
> > + }
> > +
> > + s += sites;
> > + }
> > +
> > + if (size)
> > + size += sizeof(struct llvm_prf_value_data);
> > +
> > + if (value_kinds)
> > + *value_kinds = kinds;
> > +
> > + return size;
> > +}
> > +
> > +static u32 prf_get_value_size(void)
> > +{
> > + u32 size = 0;
> > + struct llvm_prf_data *p;
> > +
> > + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> > + size += __prf_get_value_size(p, NULL);
> > +
> > + return size;
> > +}
> > +
> > +/* Serialize the value profiling data for one function. */
> > +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
> > +{
> > + struct llvm_prf_value_data header;
> > + struct llvm_prf_value_node **nodes =
> > + (struct llvm_prf_value_node **)p->values;
> > + unsigned int kind;
> > + unsigned int n;
> > + unsigned int s = 0;
> > +
> > + header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
> > +
> > + if (!header.num_value_kinds)
> > + /* Nothing to write. */
> > + return;
> > +
> > + prf_copy_to_buffer(buffer, &header, sizeof(header));
> > +
> > + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> > + struct llvm_prf_value_record *record;
> > + u8 *counts;
> > + unsigned int sites = p->num_value_sites[kind];
> > +
> > + if (!sites)
> > + continue;
> > +
> > + /* Profiling value record. */
> > + record = *(struct llvm_prf_value_record **)buffer;
> > + *buffer += prf_get_value_record_header_size();
> > +
> > + record->kind = kind;
> > + record->num_value_sites = sites;
> > +
> > + /* Site count array. */
> > + counts = *(u8 **)buffer;
> > + *buffer += prf_get_value_record_site_count_size(sites);
> > +
> > + /*
> > + * If we don't have nodes, we can skip updating the site count
> > + * array, because the buffer is zero filled.
> > + */
> > + if (!nodes)
> > + continue;
> > +
> > + for (n = 0; n < sites; n++) {
> > + u32 count = 0;
> > + struct llvm_prf_value_node *site = nodes[s + n];
> > +
> > + while (site && ++count <= U8_MAX) {
> > + prf_copy_to_buffer(buffer, site,
> > + sizeof(struct llvm_prf_value_node_data));
> > + site = site->next;
> > + }
> > +
> > + counts[n] = (u8)count;
> > + }
> > +
> > + s += sites;
> > + }
> > +}
> > +
> > +static void prf_serialize_values(void **buffer)
> > +{
> > + struct llvm_prf_data *p;
> > +
> > + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> > + prf_serialize_value(p, buffer);
> > +}
> > +
> > +static inline unsigned long prf_get_padding(unsigned long size)
> > +{
> > + return 7 & (sizeof(u64) - size % sizeof(u64));
> > +}
> > +
> > +static unsigned long prf_buffer_size(void)
> > +{
> > + return sizeof(struct llvm_prf_header) +
> > + prf_data_size() +
> > + prf_cnts_size() +
> > + prf_names_size() +
> > + prf_get_padding(prf_names_size()) +
> > + prf_get_value_size();
> > +}
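
For anyone checking the size arithmetic here, a minimal userspace sketch of the padding and buffer-size computation follows. It assumes an 80-byte header (the ten u64 fields of struct llvm_prf_header in pgo.h). PRF_HEADER_SIZE and raw_profile_size() are hypothetical names used only for illustration, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: ten u64 fields, as in struct llvm_prf_header from the patch. */
#define PRF_HEADER_SIZE (10 * sizeof(uint64_t))

/* Same expression as prf_get_padding(): bytes up to the next multiple of 8. */
static inline unsigned long prf_get_padding(unsigned long size)
{
	unsigned long pad = 7 & (sizeof(uint64_t) - size % sizeof(uint64_t));

	/* The padded size must land on an 8-byte boundary. */
	assert((size + pad) % sizeof(uint64_t) == 0);
	return pad;
}

/* Hypothetical helper mirroring prf_buffer_size(); all sizes are in bytes. */
static inline unsigned long raw_profile_size(unsigned long data,
					     unsigned long cnts,
					     unsigned long names,
					     unsigned long values)
{
	return PRF_HEADER_SIZE + data + cnts + names +
	       prf_get_padding(names) + values;
}
```

With a 13-byte names section the padding is 3 bytes, so only the names section is padded and the value data that follows starts 8-byte aligned.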
> > +
> > +/*
> > + * Serialize the profiling data into a format LLVM's tools can understand.
> > + * Note: caller *must* hold pgo_lock.
> > + */
> > +static int prf_serialize(struct prf_private_data *p)
> > +{
> > + int err = 0;
> > + void *buffer;
> > +
> > + p->size = prf_buffer_size();
> > + p->buffer = vzalloc(p->size);
> > +
> > + if (!p->buffer) {
> > + err = -ENOMEM;
> > + goto out;
> > + }
> > +
> > + buffer = p->buffer;
> > +
> > + prf_fill_header(&buffer);
> > + prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
> > + prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
> > + prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
> > + buffer += prf_get_padding(prf_names_size());
> > +
> > + prf_serialize_values(&buffer);
> > +
> > +out:
> > + return err;
> > +}
> > +
> > +/* open() implementation for PGO. Creates a copy of the profiling data set. */
> > +static int prf_open(struct inode *inode, struct file *file)
> > +{
> > + struct prf_private_data *data;
> > + unsigned long flags;
> > + int err;
> > +
> > + data = kzalloc(sizeof(*data), GFP_KERNEL);
> > + if (!data) {
> > + err = -ENOMEM;
> > + goto out;
> > + }
> > +
> > + flags = prf_lock();
> > +
> > + err = prf_serialize(data);
> > + if (unlikely(err)) {
> > + kfree(data);
> > + goto out_unlock;
> > + }
> > +
> > + file->private_data = data;
> > +
> > +out_unlock:
> > + prf_unlock(flags);
> > +out:
> > + return err;
> > +}
> > +
> > +/* read() implementation for PGO. */
> > +static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
> > + loff_t *ppos)
> > +{
> > + struct prf_private_data *data = file->private_data;
> > +
> > + BUG_ON(!data);
> > +
> > + return simple_read_from_buffer(buf, count, ppos, data->buffer,
> > + data->size);
> > +}
> > +
> > +/* release() implementation for PGO. Release resources allocated by open(). */
> > +static int prf_release(struct inode *inode, struct file *file)
> > +{
> > + struct prf_private_data *data = file->private_data;
> > +
> > + if (data) {
> > + vfree(data->buffer);
> > + kfree(data);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static const struct file_operations prf_fops = {
> > + .owner = THIS_MODULE,
> > + .open = prf_open,
> > + .read = prf_read,
> > + .llseek = default_llseek,
> > + .release = prf_release
> > +};
> > +
> > +/* write() implementation for resetting PGO's profile data. */
> > +static ssize_t reset_write(struct file *file, const char __user *addr,
> > + size_t len, loff_t *pos)
> > +{
> > + struct llvm_prf_data *data;
> > +
> > + memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
> > +
> > + for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; data++) {
> > + struct llvm_prf_value_node **vnodes;
> > + u64 current_vsite_count;
> > + u32 i;
> > +
> > + if (!data->values)
> > + continue;
> > +
> > + current_vsite_count = 0;
> > + vnodes = (struct llvm_prf_value_node **)data->values;
> > +
> > + for (i = LLVM_INSTR_PROF_IPVK_FIRST; i <= LLVM_INSTR_PROF_IPVK_LAST; i++)
> > + current_vsite_count += data->num_value_sites[i];
> > +
> > + for (i = 0; i < current_vsite_count; i++) {
> > + struct llvm_prf_value_node *current_vnode = vnodes[i];
> > +
> > + while (current_vnode) {
> > + current_vnode->count = 0;
> > + current_vnode = current_vnode->next;
> > + }
> > + }
> > + }
> > +
> > + return len;
> > +}
> > +
> > +static const struct file_operations prf_reset_fops = {
> > + .owner = THIS_MODULE,
> > + .write = reset_write,
> > + .llseek = noop_llseek,
> > +};
> > +
> > +/* Create debugfs entries. */
> > +static int __init pgo_init(void)
> > +{
> > + directory = debugfs_create_dir("pgo", NULL);
> > + if (!directory)
> > + goto err_remove;
> > +
> > + if (!debugfs_create_file("profraw", 0600, directory, NULL,
> > + &prf_fops))
> > + goto err_remove;
> > +
> > + if (!debugfs_create_file("reset", 0200, directory, NULL,
> > + &prf_reset_fops))
> > + goto err_remove;
> > +
> > + return 0;
> > +
> > +err_remove:
> > + pr_err("initialization failed\n");
> > + return -EIO;
> > +}
> > +
> > +/* Remove debugfs entries. */
> > +static void __exit pgo_exit(void)
> > +{
> > + debugfs_remove_recursive(directory);
> > +}
> > +
> > +module_init(pgo_init);
> > +module_exit(pgo_exit);
> > diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
> > new file mode 100644
> > index 000000000000..62ff5cfce7b1
> > --- /dev/null
> > +++ b/kernel/pgo/instrument.c
> > @@ -0,0 +1,189 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + * Sami Tolvanen <[email protected]>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#define pr_fmt(fmt) "pgo: " fmt
> > +
> > +#include <linux/bitops.h>
> > +#include <linux/kernel.h>
> > +#include <linux/export.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/types.h>
> > +#include "pgo.h"
> > +
> > +/*
> > + * This lock guards both profile count updating and serialization of the
> > + * profiling data. Keeping both of these activities separate via locking
> > + * ensures that we don't try to serialize data that's only partially updated.
> > + */
> > +static DEFINE_SPINLOCK(pgo_lock);
> > +static int current_node;
> > +
> > +unsigned long prf_lock(void)
> > +{
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&pgo_lock, flags);
> > +
> > + return flags;
> > +}
> > +
> > +void prf_unlock(unsigned long flags)
> > +{
> > + spin_unlock_irqrestore(&pgo_lock, flags);
> > +}
> > +
> > +/*
> > + * Return a newly allocated profiling value node, which holds a value tracked
> > + * by the value profiler.
> > + * Note: caller *must* hold pgo_lock.
> > + */
> > +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
> > + u32 index, u64 value)
> > +{
> > + if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
> > + return NULL; /* Out of nodes */
> > +
> > + current_node++;
> > +
> > + /* Make sure the node is entirely within the section */
> > + if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
> > + &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
> > + return NULL;
> > +
> > + return &__llvm_prf_vnds_start[current_node];
> > +}
> > +
> > +/*
> > + * Counts the number of times a target value is seen.
> > + *
> > + * Records the target value for the index if not seen before. Otherwise,
> > + * increments the counter associated w/ the target value.
> > + */
> > +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
> > +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
> > +{
> > + struct llvm_prf_data *p = (struct llvm_prf_data *)data;
> > + struct llvm_prf_value_node **counters;
> > + struct llvm_prf_value_node *curr;
> > + struct llvm_prf_value_node *min = NULL;
> > + struct llvm_prf_value_node *prev = NULL;
> > + u64 min_count = U64_MAX;
> > + u8 values = 0;
> > + unsigned long flags;
> > +
> > + if (!p || !p->values)
> > + return;
> > +
> > + counters = (struct llvm_prf_value_node **)p->values;
> > + curr = counters[index];
> > +
> > + while (curr) {
> > + if (target_value == curr->value) {
> > + curr->count++;
> > + return;
> > + }
> > +
> > + if (curr->count < min_count) {
> > + min_count = curr->count;
> > + min = curr;
> > + }
> > +
> > + prev = curr;
> > + curr = curr->next;
> > + values++;
> > + }
> > +
> > + if (values >= LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE) {
> > + if (!min->count || !(--min->count)) {
> > + curr = min;
> > + curr->value = target_value;
> > + curr->count++;
> > + }
> > + return;
> > + }
> > +
> > + /* Lock when updating the value node structure. */
> > + flags = prf_lock();
> > +
> > + curr = allocate_node(p, index, target_value);
> > + if (!curr)
> > + goto out;
> > +
> > + curr->value = target_value;
> > + curr->count++;
> > +
> > + if (!counters[index])
> > + counters[index] = curr;
> > + else if (prev && !prev->next)
> > + prev->next = curr;
> > +
> > +out:
> > + prf_unlock(flags);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_target);
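
The update policy above is easier to follow on a small model. Below is a userspace sketch of my reading of it, using a fixed array in place of the linked list and a limit of 3 values per site instead of 255. struct site, MAX_VALS and site_record() are hypothetical names, and locking and node allocation are left out.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_VALS 3	/* illustration only; the patch uses 255 */

/* Hypothetical array-based stand-in for one value site's node list. */
struct site {
	uint64_t value[MAX_VALS];
	uint64_t count[MAX_VALS];
	unsigned int used;
};

static inline void site_record(struct site *s, uint64_t target)
{
	unsigned int i, min = 0;

	assert(s && s->used <= MAX_VALS);

	for (i = 0; i < s->used; i++) {
		if (s->value[i] == target) {
			s->count[i]++;		/* value already tracked */
			return;
		}
		if (s->count[i] < s->count[min])
			min = i;		/* first least-seen entry */
	}

	if (s->used < MAX_VALS) {
		/* Room left: start tracking the new value. */
		s->value[s->used] = target;
		s->count[s->used] = 1;
		s->used++;
		return;
	}

	/* Full: age the least-seen entry and take it over once it hits 0. */
	if (s->count[min] == 0 || --s->count[min] == 0) {
		s->value[min] = target;
		s->count[min] = 1;
	}
}
```

In this model a new value arriving at a full site only displaces the least-seen entry once that entry's count has been decremented to zero, so frequently seen values survive bursts of one-off targets.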
> > +
> > +/* Counts the number of times target values within a range are seen. */
> > +void __llvm_profile_instrument_range(u64 target_value, void *data,
> > + u32 index, s64 precise_start,
> > + s64 precise_last, s64 large_value);
> > +void __llvm_profile_instrument_range(u64 target_value, void *data,
> > + u32 index, s64 precise_start,
> > + s64 precise_last, s64 large_value)
> > +{
> > + if (large_value != S64_MIN && (s64)target_value >= large_value)
> > + target_value = large_value;
> > + else if ((s64)target_value < precise_start ||
> > + (s64)target_value > precise_last)
> > + target_value = precise_last + 1;
> > +
> > + __llvm_profile_instrument_target(target_value, data, index);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_range);
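
To make the clamping concrete, here is a userspace sketch of the same mapping as a pure function. clamp_range_target() is a hypothetical name, and the guard against precise_last == INT64_MAX is my own assumption to keep the "+ 1" from overflowing; the patch has no such check.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical pure-function version of the mapping done by
 * __llvm_profile_instrument_range() before it records the value.
 */
static inline uint64_t clamp_range_target(uint64_t target,
					  int64_t precise_start,
					  int64_t precise_last,
					  int64_t large_value)
{
	/* Assumption (not in the patch): keep precise_last + 1 from overflowing. */
	assert(precise_last < INT64_MAX);

	/* Everything at or above large_value shares one bucket. */
	if (large_value != INT64_MIN && (int64_t)target >= large_value)
		return (uint64_t)large_value;

	/* Values outside the precise range share the bucket precise_last + 1. */
	if ((int64_t)target < precise_start || (int64_t)target > precise_last)
		return (uint64_t)(precise_last + 1);

	/* Values inside the precise range are tracked individually. */
	return target;
}
```

With a precise range of 0..8 and a large_value of 8192, a target of 5 is kept as is, 100 is recorded as 9, and 10000 is recorded as 8192.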
> > +
> > +static u64 inst_prof_get_range_rep_value(u64 value)
> > +{
> > + if (value <= 8)
> > + /* The first ranges are individually tracked, use it as is. */
> > + return value;
> > + else if (value >= 513)
> > + /* The last range is mapped to its lowest value. */
> > + return 513;
> > + else if (hweight64(value) == 1)
> > + /* If it's a power of two, use it as is. */
> > + return value;
> > +
> > + /* Otherwise, map to the previous power of two + 1. */
> > + return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
> > +}
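
A few worked values may help when reviewing the bucketing. This is a userspace copy of the function above, assuming a GCC- or Clang-style compiler for the builtins. range_rep_value() is a hypothetical name, and hweight64() is replaced with __builtin_popcountll().

```c
#include <assert.h>
#include <stdint.h>

/* Userspace copy of the bucketing logic in inst_prof_get_range_rep_value(). */
static inline uint64_t range_rep_value(uint64_t value)
{
	if (value <= 8)
		return value;		/* 0..8 are tracked individually */
	if (value >= 513)
		return 513;		/* last range maps to its lowest value */
	if (__builtin_popcountll(value) == 1)
		return value;		/* powers of two are kept as is */

	/* Only 9..511 can reach this point, so the shift is at most 8 bits. */
	assert(value > 8 && value < 513);
	return (UINT64_C(1) << (63 - __builtin_clzll(value))) + 1;
}
```

So 15 maps to 9, 16 stays 16, 100 maps to 65, and anything from 513 upward maps to 513.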
> > +
> > +/*
> > + * The target values are partitioned into multiple ranges. The range spec is
> > + * defined in compiler-rt/include/profile/InstrProfData.inc.
> > + */
> > +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> > + u32 counter_index);
> > +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> > + u32 counter_index)
> > +{
> > + u64 rep_value;
> > +
> > + /* Map the target value to the representative value of its range. */
> > + rep_value = inst_prof_get_range_rep_value(target_value);
> > + __llvm_profile_instrument_target(rep_value, data, counter_index);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_memop);
> > diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
> > new file mode 100644
> > index 000000000000..ddc8d3002fe5
> > --- /dev/null
> > +++ b/kernel/pgo/pgo.h
> > @@ -0,0 +1,203 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + * Sami Tolvanen <[email protected]>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#ifndef _PGO_H
> > +#define _PGO_H
> > +
> > +/*
> > + * Note: These internal LLVM definitions must match the compiler version.
> > + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
> > + */
> > +
> > +#define LLVM_INSTR_PROF_RAW_MAGIC_64 \
> > + ((u64)255 << 56 | \
> > + (u64)'l' << 48 | \
> > + (u64)'p' << 40 | \
> > + (u64)'r' << 32 | \
> > + (u64)'o' << 24 | \
> > + (u64)'f' << 16 | \
> > + (u64)'r' << 8 | \
> > + (u64)129)
> > +#define LLVM_INSTR_PROF_RAW_MAGIC_32 \
> > + ((u64)255 << 56 | \
> > + (u64)'l' << 48 | \
> > + (u64)'p' << 40 | \
> > + (u64)'r' << 32 | \
> > + (u64)'o' << 24 | \
> > + (u64)'f' << 16 | \
> > + (u64)'R' << 8 | \
> > + (u64)129)
> > +
> > +#define LLVM_INSTR_PROF_RAW_VERSION 5
> > +#define LLVM_INSTR_PROF_DATA_ALIGNMENT 8
> > +#define LLVM_INSTR_PROF_IPVK_FIRST 0
> > +#define LLVM_INSTR_PROF_IPVK_LAST 1
> > +#define LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE 255
> > +
> > +#define LLVM_VARIANT_MASK_IR_PROF (0x1ULL << 56)
> > +#define LLVM_VARIANT_MASK_CSIR_PROF (0x1ULL << 57)
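
As a quick sanity check of these constants, the sketch below tests whether a buffer starts with the 64-bit magic and the version word that prf_fill_header() writes (IR variant bit plus version 5). is_ir_profraw_v5() is a hypothetical helper, and it assumes the buffer is inspected in the byte order of the machine that produced it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same values as the LLVM_* definitions in the patch. */
#define RAW_MAGIC_64 \
	((uint64_t)255 << 56 | (uint64_t)'l' << 48 | (uint64_t)'p' << 40 | \
	 (uint64_t)'r' << 32 | (uint64_t)'o' << 24 | (uint64_t)'f' << 16 | \
	 (uint64_t)'r' << 8 | (uint64_t)129)
#define RAW_VERSION		5
#define VARIANT_MASK_IR_PROF	(UINT64_C(1) << 56)

/* Hypothetical helper: does buf start with a 64-bit IR-level v5 header? */
static inline int is_ir_profraw_v5(const void *buf, size_t len)
{
	uint64_t magic, version;

	assert(buf != NULL);
	if (len < 2 * sizeof(uint64_t))
		return 0;

	/* memcpy avoids alignment assumptions about buf. */
	memcpy(&magic, buf, sizeof(magic));
	memcpy(&version, (const unsigned char *)buf + sizeof(magic),
	       sizeof(version));

	return magic == RAW_MAGIC_64 &&
	       version == (VARIANT_MASK_IR_PROF | RAW_VERSION);
}
```

The magic works out to 0xff6c70726f667281 and the version word to 0x0100000000000005, which is what a hex dump of the first 16 bytes of the profraw file should show as two native-endian 64-bit words.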
> > +
> > +/**
> > + * struct llvm_prf_header - represents the raw profile header data structure.
> > + * @magic: the magic token for the file format.
> > + * @version: the version of the file format.
> > + * @data_size: the number of entries in the profile data section.
> > + * @padding_bytes_before_counters: the number of padding bytes before the
> > + * counters.
> > + * @counters_size: the number of counters in the LLVM profile counters
> > + * section.
> > + * @padding_bytes_after_counters: the number of padding bytes after the
> > + * counters.
> > + * @names_size: the size in bytes of the LLVM profile section containing the
> > + * counters' names.
> > + * @counters_delta: the beginning of the LLVM profile counters section.
> > + * @names_delta: the beginning of the LLVM profile names section.
> > + * @value_kind_last: the last profile value kind.
> > + */
> > +struct llvm_prf_header {
> > + u64 magic;
> > + u64 version;
> > + u64 data_size;
> > + u64 padding_bytes_before_counters;
> > + u64 counters_size;
> > + u64 padding_bytes_after_counters;
> > + u64 names_size;
> > + u64 counters_delta;
> > + u64 names_delta;
> > + u64 value_kind_last;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_data - represents the per-function control structure.
> > + * @name_ref: the reference to the function's name.
> > + * @func_hash: the hash value of the function.
> > + * @counter_ptr: a pointer to the profile counter.
> > + * @function_ptr: a pointer to the function.
> > + * @values: the profiling values associated with this function.
> > + * @num_counters: the number of counters in the function.
> > + * @num_value_sites: the number of value profile sites.
> > + */
> > +struct llvm_prf_data {
> > + const u64 name_ref;
> > + const u64 func_hash;
> > + const void *counter_ptr;
> > + const void *function_ptr;
> > + void *values;
> > + const u32 num_counters;
> > + const u16 num_value_sites[LLVM_INSTR_PROF_IPVK_LAST + 1];
> > +} __aligned(LLVM_INSTR_PROF_DATA_ALIGNMENT);
> > +
> > +/**
> > + * struct llvm_prf_value_node_data - represents the data part of the struct
> > + * llvm_prf_value_node data structure.
> > + * @value: the value counters.
> > + * @count: the counters' count.
> > + */
> > +struct llvm_prf_value_node_data {
> > + u64 value;
> > + u64 count;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_node - represents an internal data structure used by
> > + * the value profiler.
> > + * @value: the value counters.
> > + * @count: the counters' count.
> > + * @next: the next value node.
> > + */
> > +struct llvm_prf_value_node {
> > + u64 value;
> > + u64 count;
> > + struct llvm_prf_value_node *next;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_data - represents the value profiling data in indexed
> > + * format.
> > + * @total_size: the total size in bytes including this field.
> > + * @num_value_kinds: the number of value profile kinds that have value profile
> > + * data.
> > + */
> > +struct llvm_prf_value_data {
> > + u32 total_size;
> > + u32 num_value_kinds;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_record - represents the on-disk layout of the value
> > + * profile data of a particular kind for one function.
> > + * @kind: the kind of the value profile record.
> > + * @num_value_sites: the number of value profile sites.
> > + * @site_count_array: the first element of the array that stores the number
> > + * of profiled values for each value site.
> > + */
> > +struct llvm_prf_value_record {
> > + u32 kind;
> > + u32 num_value_sites;
> > + u8 site_count_array[];
> > +};
> > +
> > +#define prf_get_value_record_header_size() \
> > + offsetof(struct llvm_prf_value_record, site_count_array)
> > +#define prf_get_value_record_site_count_size(sites) \
> > + roundup((sites), 8)
> > +#define prf_get_value_record_size(sites) \
> > + (prf_get_value_record_header_size() + \
> > + prf_get_value_record_site_count_size((sites)))
> > +
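For anyone checking the record-size arithmetic in the three macros above, here is a minimal userspace sketch. It assumes a local roundup() stand-in for the kernel macro and stdint types in place of u32/u8; the wrapper value_record_size() is a hypothetical name that is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's roundup(); assumes y > 0. */
#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))

/* Same field layout as the quoted struct, with u32/u8 swapped for stdint types. */
struct llvm_prf_value_record {
	uint32_t kind;
	uint32_t num_value_sites;
	uint8_t site_count_array[];
};

#define prf_get_value_record_header_size() \
	offsetof(struct llvm_prf_value_record, site_count_array)
#define prf_get_value_record_site_count_size(sites) \
	roundup((sites), 8)
#define prf_get_value_record_size(sites) \
	(prf_get_value_record_header_size() + \
	 prf_get_value_record_site_count_size((sites)))

/* The header is two 32-bit fields, so the site counts start at byte 8. */
static_assert(prf_get_value_record_header_size() == 8, "unexpected header size");

/* Hypothetical wrapper so the macro result can be checked as a plain function. */
static size_t value_record_size(uint32_t sites)
{
	return prf_get_value_record_size(sites);
}
```

The one-byte-per-site count array is padded to a multiple of 8 bytes, so a record with 1 to 8 sites occupies 16 bytes and a ninth site pushes it to 24.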
> > +/* Data sections */
> > +extern struct llvm_prf_data __llvm_prf_data_start[];
> > +extern struct llvm_prf_data __llvm_prf_data_end[];
> > +
> > +extern u64 __llvm_prf_cnts_start[];
> > +extern u64 __llvm_prf_cnts_end[];
> > +
> > +extern char __llvm_prf_names_start[];
> > +extern char __llvm_prf_names_end[];
> > +
> > +extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
> > +extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
> > +
> > +/* Locking for vnodes */
> > +extern unsigned long prf_lock(void);
> > +extern void prf_unlock(unsigned long flags);
> > +
> > +#define __DEFINE_PRF_SIZE(s) \
> > + static inline unsigned long prf_ ## s ## _size(void) \
> > + { \
> > + unsigned long start = \
> > + (unsigned long)__llvm_prf_ ## s ## _start; \
> > + unsigned long end = \
> > + (unsigned long)__llvm_prf_ ## s ## _end; \
> > + return roundup(end - start, \
> > + sizeof(__llvm_prf_ ## s ## _start[0])); \
> > + } \
> > + static inline unsigned long prf_ ## s ## _count(void) \
> > + { \
> > + return prf_ ## s ## _size() / \
> > + sizeof(__llvm_prf_ ## s ## _start[0]); \
> > + }
> > +
> > +__DEFINE_PRF_SIZE(data);
> > +__DEFINE_PRF_SIZE(cnts);
> > +__DEFINE_PRF_SIZE(names);
> > +__DEFINE_PRF_SIZE(vnds);
> > +
> > +#undef __DEFINE_PRF_SIZE
> > +
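The size/count helper pattern generated by __DEFINE_PRF_SIZE can be tried outside the kernel. The sketch below is only an illustration under stated assumptions: the linker-provided __llvm_prf_*_start/_end symbols are replaced by hypothetical demo_prf_* pointers into ordinary arrays, and roundup() is a local stand-in for the kernel macro.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's roundup(); assumes y > 0. */
#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))

/* Hypothetical stand-ins for the section bounds the linker script provides. */
static uint64_t demo_cnts_backing[5];
static uint64_t *const demo_prf_cnts_start = demo_cnts_backing;
static uint64_t *const demo_prf_cnts_end = demo_cnts_backing + 5;

static char demo_names_backing[13];
static char *const demo_prf_names_start = demo_names_backing;
static char *const demo_prf_names_end = demo_names_backing + 13;

/* Same shape as the quoted macro: one _size() and one _count() per section. */
#define DEFINE_DEMO_PRF_SIZE(s) \
	static inline unsigned long demo_prf_ ## s ## _size(void) \
	{ \
		uintptr_t start = (uintptr_t)demo_prf_ ## s ## _start; \
		uintptr_t end = (uintptr_t)demo_prf_ ## s ## _end; \
		/* Section length in bytes, rounded up to a whole element. */ \
		return roundup(end - start, \
			       sizeof(demo_prf_ ## s ## _start[0])); \
	} \
	static inline unsigned long demo_prf_ ## s ## _count(void) \
	{ \
		return demo_prf_ ## s ## _size() / \
		       sizeof(demo_prf_ ## s ## _start[0]); \
	}

DEFINE_DEMO_PRF_SIZE(cnts)
DEFINE_DEMO_PRF_SIZE(names)
```

With five 8-byte counters the size helper returns 40 and the count helper 5; for the byte-sized names section the rounding is a no-op.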
> > +#endif /* _PGO_H */
> > diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> > index 213677a5ed33..9b218afb5cb8 100644
> > --- a/scripts/Makefile.lib
> > +++ b/scripts/Makefile.lib
> > @@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
> > $(CFLAGS_GCOV))
> > endif
> >
> > +#
> > +# Enable clang's PGO profiling flags for a file or directory depending on
> > +# variables PGO_PROFILE_obj.o and PGO_PROFILE.
> > +#
> > +ifeq ($(CONFIG_PGO_CLANG),y)
> > +_c_flags += $(if $(patsubst n%,, \
> > + $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
> > + $(CFLAGS_PGO_CLANG))
> > +endif
> > +
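The $(patsubst n%,, ...) test in this hunk is compact, so here is a rough model of the decision it makes once CONFIG_PGO_CLANG=y. This is a sketch, not kbuild code: pgo_flags_enabled() is a hypothetical helper, unset make variables are modelled as empty strings, and values are assumed to contain no whitespace.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical model of the kbuild expression: the per-object value, the
 * per-directory value and a literal "y" are concatenated into one word,
 * and patsubst drops that word when it starts with 'n'. A non-empty
 * result means $(CFLAGS_PGO_CLANG) is added to the compile flags.
 */
static bool pgo_flags_enabled(const char *per_object, const char *per_dir)
{
	char word[64];

	snprintf(word, sizeof(word), "%s%s%s", per_object, per_dir, "y");
	return word[0] != 'n';
}
```

Under this model an object is instrumented by default, a per-object "n" always wins, and a per-object "y" overrides a directory-wide "n" because the concatenated word then starts with 'y'.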
> > #
> > # Enable address sanitizer flags for kernel except some files or directories
> > # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
> > --
> > 2.30.0.280.ga3ce27912f-goog
> >
>
>
> --
> Thanks,
> ~Nick Desaulniers
>


Attachments:
config-5.11.0-rc5-5-amd64-clang12-pgo (232.96 kB)
config-5.11.0-rc5-6-amd64-clang12-pgo (232.97 kB)
config-5.11.0-rc5-7-amd64-clang12-pgo (232.97 kB)
dot-config.x86-64-defconfig (124.36 kB)
profile_clang-pgo.sh (1.94 kB)

2021-01-28 21:14:03

by Nick Desaulniers

Subject: Re: [PATCH v7] pgo: add clang's Profile Guided Optimization infrastructure

On Thu, Jan 28, 2021 at 12:46 PM Sedat Dilek <[email protected]> wrote:
>
> [ LLVM ]
>
> Today, I switched over to LLVM version 12.0.0-rc1.
>
>
> [ Step #1: 5.11.0-rc5-5-amd64-clang12-pgo ]
>
> My first kernel was built with CONFIG_PGO_CLANG=y and LLVM=1 plus LLVM_IAS=1.
>
> [ start-build_5.11.0-rc5-5-amd64-clang12-pgo.txt ]
> dileks 193090 193065 0 06:54 pts/2 00:00:00 /usr/bin/perf_5.10
> stat make V=1 -j4 HOSTCC=clang HOSTCXX=clang++ HOSTLD=ld.lld CC=clang
> LD=ld.lld LLVM=1 LLVM_IAS=1 PAHOLE=/opt/pahole/bin/pahole
> LOCALVERSION=-5-amd64-clang12-pgo KBUILD_VERBOSE=1
> KBUILD_BUILD_HOST=iniza [email protected]
> KBUILD_BUILD_TIMESTAMP=2021-01-28 bindeb-pkg
> KDEB_PKGVERSION=5.11.0~rc5-5~bullseye+dileks1
>
> Config: config-5.11.0-rc5-5-amd64-clang12-pgo
>
>
> [ Step #2: x86-64 defconfig & vmlinux.profdata ]
>
> Booted into 5.11.0-rc5-5-amd64-clang12-pgo and built an x86-64
> defconfig to generate/merge a vmlinux.profdata file.
>
> [ start-build_x86-64-defconfig.txt ]
> dileks 18430 15640 0 11:15 pts/2 00:00:00 make V=1 -j4
> HOSTCC=clang HOSTCXX=clang++ HOSTLD=ld.lld CC=clang LD=ld.lld LLVM=1
> LLVM_IAS=1
>
> Script: profile_clang-pgo.sh
> Config: dot-config.x86-64-defconfig
>
>
> [ Step #3.1: 5.11.0-rc5-6-amd64-clang12-pgo & GNU-AS ]
>
> The first rebuild with CONFIG_PGO_CLANG=n and "LLVM=1
> KCFLAGS=-fprofile-use=vmlinux.profdata".
> I was able to boot into this one.
> Used assembler: GNU-AS 2.35.1
>
> [ start-build_5.11.0-rc5-6-amd64-clang12-pgo.txt ]
> dileks 65734 65709 0 11:54 pts/2 00:00:00 /usr/bin/perf_5.10
> stat make V=1 -j4 HOSTCC=clang HOSTCXX=clang++ HOSTLD=ld.lld CC=clang
> LD=ld.lld PAHOLE=/opt/pahole/bin/pahole
> LOCALVERSION=-6-amd64-clang12-pgo KBUILD_VERBOSE=1
> KBUILD_BUILD_HOST=iniza [email protected]
> KBUILD_BUILD_TIMESTAMP=2021-01-28 bindeb-pkg
> KDEB_PKGVERSION=5.11.0~rc5-6~bullseye+dileks1 LLVM=1
> KCFLAGS=-fprofile-use=vmlinux.profdata
>
> Config: config-5.11.0-rc5-6-amd64-clang12-pgo
>
>
> [ Step #3.2: 5.11.0-rc5-7-amd64-clang12-pgo & Clang-IAS ]
>
> The second rebuild with CONFIG_PGO_CLANG=n and "LLVM=1
> KCFLAGS=-fprofile-use=vmlinux.profdata" plus LLVM_IAS=1.
> Compilable but NOT bootable in QEMU and on bare metal.
> Used assembler: Clang-IAS v12.0.0-rc1
>
> [ start-build_5.11.0-rc5-7-amd64-clang12-pgo.txt ]
> dileks 6545 6520 0 16:31 pts/2 00:00:00 /usr/bin/perf_5.10
> stat make V=1 -j4 HOSTCC=clang HOSTCXX=clang++ HOSTLD=ld.lld CC=clang
> LD=ld.lld PAHOLE=/opt/pahole/bin/pahole
> LOCALVERSION=-7-amd64-clang12-pgo KBUILD_VERBOSE=1
> KBUILD_BUILD_HOST=iniza [email protected]
> KBUILD_BUILD_TIMESTAMP=2021-01-28 bindeb-pkg
> KDEB_PKGVERSION=5.11.0~rc5-7~bullseye+dileks1 LLVM=1
> KCFLAGS=-fprofile-use=vmlinux.profdata LLVM_IAS=1
>
> Config: config-5.11.0-rc5-7-amd64-clang12-pgo
>
>
> [ Conclusion ]
>
> The only statement I can tell you is a "PGO optimized" rebuild with
> LLVM_IAS=1 is compilable but NOT bootable.

Thanks for the extensive testing and report. Can you compress, upload,
and post a link to your kernel image? I would like to take it for a
spin in QEMU and see if I can find what it's doing, then work
backwards from there.

--
Thanks,
~Nick Desaulniers

2021-01-28 21:22:00

by Sedat Dilek

Subject: Re: [PATCH v7] pgo: add clang's Profile Guided Optimization infrastructure

On Thu, Jan 28, 2021 at 10:12 PM Nick Desaulniers
<[email protected]> wrote:
>
> Thanks for the extensive testing and report. Can you compress, upload,
> and post a link to your kernel image? I would like to take it for a
> spin in QEMU and see if I can find what it's doing, then work
> backwards from there.
>

Which files do you need?
For QEMU: bzImage and initrd.img enough?

- Sedat -

2021-01-28 21:27:04

by Nick Desaulniers

Subject: Re: [PATCH v7] pgo: add clang's Profile Guided Optimization infrastructure

On Thu, Jan 28, 2021 at 1:19 PM Sedat Dilek <[email protected]> wrote:
>
> On Thu, Jan 28, 2021 at 10:12 PM Nick Desaulniers
> <[email protected]> wrote:
> >
> > Thanks for the extensive testing and report. Can you compress, upload,
> > and post a link to your kernel image? I would like to take it for a
> > spin in QEMU and see if I can find what it's doing, then work
> > backwards from there.
> >
>
> Which files do you need?
> For QEMU: bzImage and initrd.img enough?

bzImage should be enough; I'll use my own initrd. If that boots for
me, maybe then I'll take a look with the initrd added.
--
Thanks,
~Nick Desaulniers

2021-01-28 21:42:22

by Sedat Dilek

Subject: Re: [PATCH v7] pgo: add clang's Profile Guided Optimization infrastructure

On Thu, Jan 28, 2021 at 10:24 PM Nick Desaulniers
<[email protected]> wrote:
>
> On Thu, Jan 28, 2021 at 1:19 PM Sedat Dilek <[email protected]> wrote:
> >
> > On Thu, Jan 28, 2021 at 10:12 PM Nick Desaulniers
> > <[email protected]> wrote:
> > >
> > > Thanks for the extensive testing and report. Can you compress, upload,
> > > and post a link to your kernel image? I would like to take it for a
> > > spin in QEMU and see if I can find what it's doing, then work
> > > backwards from there.
> > >
> >
> > Which files do you need?
> > For QEMU: bzImage and initrd.img enough?
>
> bzImage should be enough; I'll use my own initrd. If that boots for
> me, maybe then I'll take a look with the initrd added.
>

You should receive an email with a link to my Dropbox shared folder
"clang-pgo > for-nick".
Please let me know whether you were able to download it.

Thanks, Sedat

2021-01-29 07:45:31

by Sedat Dilek

Subject: Re: [PATCH v7] pgo: add clang's Profile Guided Optimization infrastructure

On Fri, Jan 22, 2021 at 7:41 PM 'Nick Desaulniers' via Clang Built
Linux <[email protected]> wrote:
>
> On Fri, Jan 22, 2021 at 2:12 AM Bill Wendling <[email protected]> wrote:
> >
> > From: Sami Tolvanen <[email protected]>
> >
> > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > profile, the kernel is instrumented with PGO counters, a representative
> > workload is run, and the raw profile data is collected from
> > /sys/kernel/debug/pgo/profraw.
> >
> > The raw profile data must be processed by clang's "llvm-profdata" tool
> > before it can be used during recompilation:
> >
> > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >
> > Multiple raw profiles may be merged during this step.
> >
> > The data can now be used by the compiler:
> >
> > $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> >
> > This initial submission is restricted to x86, as that's the platform we
> > know works. This restriction can be lifted once other platforms have
> > been verified to work with PGO.
> >
> > Note that this method of profiling the kernel is clang-native, unlike
> > the clang support in kernel/gcov.
> >
> > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> >
> > Signed-off-by: Sami Tolvanen <[email protected]>
> > Co-developed-by: Bill Wendling <[email protected]>
> > Signed-off-by: Bill Wendling <[email protected]>
> > Tested-by: Nick Desaulniers <[email protected]>
>
> Reviewed-by: Nick Desaulniers <[email protected]>
>
> Let's get this queued up, then start thinking about how we can follow
> up with improvements to docs, ergonomics of passing the profiling
> data, and nailing down which configs tickle any compiler bugs,
> boot failures, or hash mismatches.
>

Some comments:

[ hash mismatches ]

Observed identical warnings when doing a rebuild with GAS or Clang-IAS.

[ Importance of LLVM_IAS=1 working ]

Clang-LTO and Clang-CFI both depend on LLVM_IAS=1 (see, for example,
"kbuild: add support for Clang LTO" [1]).
Sooner or later we will have to deal with this issue (I hope it is not a local problem).

- Sedat -

[1] https://github.com/samitolvanen/linux/commit/27da26bada87bde166f01cb1f61b88b727f83a84


> > ---
> > v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> > testing.
> > - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> > Song's comments.
> > v3: - Added change log section based on Sedat Dilek's comments.
> > v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
> > own popcount implementation, based on Nick Desaulniers's comment.
> > v5: - Correct padding calculation, discovered by Nathan Chancellor.
> > v6: - Add better documentation about the locking scheme and other things.
> > - Rename macros to better match the same macros in LLVM's source code.
> > v7: - Fix minor build failure reported by Sedat.
> > ---
> > Documentation/dev-tools/index.rst | 1 +
> > Documentation/dev-tools/pgo.rst | 127 +++++++++
> > MAINTAINERS | 9 +
> > Makefile | 3 +
> > arch/Kconfig | 1 +
> > arch/x86/Kconfig | 1 +
> > arch/x86/boot/Makefile | 1 +
> > arch/x86/boot/compressed/Makefile | 1 +
> > arch/x86/crypto/Makefile | 4 +
> > arch/x86/entry/vdso/Makefile | 1 +
> > arch/x86/kernel/vmlinux.lds.S | 2 +
> > arch/x86/platform/efi/Makefile | 1 +
> > arch/x86/purgatory/Makefile | 1 +
> > arch/x86/realmode/rm/Makefile | 1 +
> > arch/x86/um/vdso/Makefile | 1 +
> > drivers/firmware/efi/libstub/Makefile | 1 +
> > include/asm-generic/vmlinux.lds.h | 44 +++
> > kernel/Makefile | 1 +
> > kernel/pgo/Kconfig | 35 +++
> > kernel/pgo/Makefile | 5 +
> > kernel/pgo/fs.c | 389 ++++++++++++++++++++++++++
> > kernel/pgo/instrument.c | 189 +++++++++++++
> > kernel/pgo/pgo.h | 203 ++++++++++++++
> > scripts/Makefile.lib | 10 +
> > 24 files changed, 1032 insertions(+)
> > create mode 100644 Documentation/dev-tools/pgo.rst
> > create mode 100644 kernel/pgo/Kconfig
> > create mode 100644 kernel/pgo/Makefile
> > create mode 100644 kernel/pgo/fs.c
> > create mode 100644 kernel/pgo/instrument.c
> > create mode 100644 kernel/pgo/pgo.h
> >
> > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > index f7809c7b1ba9..8d6418e85806 100644
> > --- a/Documentation/dev-tools/index.rst
> > +++ b/Documentation/dev-tools/index.rst
> > @@ -26,6 +26,7 @@ whole; patches welcome!
> > kgdb
> > kselftest
> > kunit/index
> > + pgo
> >
> >
> > .. only:: subproject and html
> > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> > new file mode 100644
> > index 000000000000..b7f11d8405b7
> > --- /dev/null
> > +++ b/Documentation/dev-tools/pgo.rst
> > @@ -0,0 +1,127 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +===============================
> > +Using PGO with the Linux kernel
> > +===============================
> > +
> > +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> > +when building with Clang. The profiling data is exported via the ``pgo``
> > +debugfs directory.
> > +
> > +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > +
> > +
> > +Preparation
> > +===========
> > +
> > +Configure the kernel with:
> > +
> > +.. code-block:: make
> > +
> > + CONFIG_DEBUG_FS=y
> > + CONFIG_PGO_CLANG=y
> > +
> > +Note that kernels compiled with profiling flags will be significantly larger
> > +and run slower.
> > +
> > +Profiling data will only become accessible once debugfs has been mounted:
> > +
> > +.. code-block:: sh
> > +
> > + mount -t debugfs none /sys/kernel/debug
> > +
> > +
> > +Customization
> > +=============
> > +
> > +You can enable or disable profiling for individual files and directories by
> > +adding a line similar to the following to the respective kernel Makefile:
> > +
> > +- For a single file (e.g. main.o)
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE_main.o := y
> > +
> > +- For all files in one directory
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE := y
> > +
> > +To exclude files from being profiled use
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE_main.o := n
> > +
> > +and
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE := n
> > +
> > +Only files which are linked to the main kernel image or are compiled as kernel
> > +modules are supported by this mechanism.
> > +
> > +
> > +Files
> > +=====
> > +
> > +The PGO kernel support creates the following files in debugfs:
> > +
> > +``/sys/kernel/debug/pgo``
> > + Parent directory for all PGO-related files.
> > +
> > +``/sys/kernel/debug/pgo/reset``
> > + Global reset file: resets all profile data to zero when written to.
> > +
> > +``/sys/kernel/debug/pgo/profraw``
> > + The raw PGO data that must be processed with ``llvm-profdata``.
> > +
> > +
> > +Workflow
> > +========
> > +
> > +The PGO kernel can be run on the host or test machines. The data, though,
> > +should be analyzed with the tools from the same Clang version that compiled
> > +the kernel. Clang is tolerant of version skew, but using the same version is
> > +easier.
> > +
> > +The profiling data is useful for optimizing the kernel, analyzing coverage,
> > +etc. Clang offers tools to perform these tasks.
> > +
> > +Here is an example workflow for profiling an instrumented kernel with PGO and
> > +using the result to optimize the kernel:
> > +
> > +1) Install the kernel on the TEST machine.
> > +
> > +2) Reset the data counters right before running the load tests
> > +
> > + .. code-block:: sh
> > +
> > + $ echo 1 > /sys/kernel/debug/pgo/reset
> > +
> > +3) Run the load tests.
> > +
> > +4) Collect the raw profile data
> > +
> > + .. code-block:: sh
> > +
> > + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > +
> > +5) (Optional) Download the raw profile data to the HOST machine.
> > +
> > +6) Process the raw profile data
> > +
> > + .. code-block:: sh
> > +
> > + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > +
> > + Note that multiple raw profile data files can be merged during this step.
> > +
> > +7) Rebuild the kernel using the profile data (PGO disabled)
> > +
> > + .. code-block:: sh
> > +
> > + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 705776b31c8d..0a75d223682d 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -13948,6 +13948,15 @@ S: Maintained
> > F: include/linux/personality.h
> > F: include/uapi/linux/personality.h
> >
> > +PGO BASED KERNEL PROFILING
> > +M: Sami Tolvanen <[email protected]>
> > +M: Bill Wendling <[email protected]>
> > +R: Nathan Chancellor <[email protected]>
> > +R: Nick Desaulniers <[email protected]>
> > +S: Supported
> > +F: Documentation/dev-tools/pgo.rst
> > +F: kernel/pgo
> > +
> > PHOENIX RC FLIGHT CONTROLLER ADAPTER
> > M: Marcus Folkesson <[email protected]>
> > L: [email protected]
> > diff --git a/Makefile b/Makefile
> > index b0e4767735dc..9339541f7cec 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
> > # Defaults to vmlinux, but the arch makefile usually adds further targets
> > all: vmlinux
> >
> > +CFLAGS_PGO_CLANG := -fprofile-generate
> > +export CFLAGS_PGO_CLANG
> > +
> > CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
> > $(call cc-option,-fno-tree-loop-im) \
> > $(call cc-disable-warning,maybe-uninitialized,)
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index 24862d15f3a3..f39d3991f6bf 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
> > pairs of 32-bit arguments, select this option.
> >
> > source "kernel/gcov/Kconfig"
> > +source "kernel/pgo/Kconfig"
> >
> > source "scripts/gcc-plugins/Kconfig"
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 21f851179ff0..36305ea61dc0 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -96,6 +96,7 @@ config X86
> > select ARCH_SUPPORTS_DEBUG_PAGEALLOC
> > select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
> > select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
> > + select ARCH_SUPPORTS_PGO_CLANG if X86_64
> > select ARCH_USE_BUILTIN_BSWAP
> > select ARCH_USE_QUEUED_RWLOCKS
> > select ARCH_USE_QUEUED_SPINLOCKS
> > diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> > index fe605205b4ce..383853e32f67 100644
> > --- a/arch/x86/boot/Makefile
> > +++ b/arch/x86/boot/Makefile
> > @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> > KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE := n
> >
> > $(obj)/bzImage: asflags-y := $(SVGA_MODE)
> > diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> > index e0bc3988c3fa..ed12ab65f606 100644
> > --- a/arch/x86/boot/compressed/Makefile
> > +++ b/arch/x86/boot/compressed/Makefile
> > @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
> >
> > KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE :=n
> >
> > KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
> > diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> > index a31de0c6ccde..5753aea7bcbd 100644
> > --- a/arch/x86/crypto/Makefile
> > +++ b/arch/x86/crypto/Makefile
> > @@ -4,6 +4,10 @@
> >
> > OBJECT_FILES_NON_STANDARD := y
> >
> > +# Disable PGO for curve25519-x86_64. With PGO enabled, clang runs out of
> > +# registers for some of the functions.
> > +PGO_PROFILE_curve25519-x86_64.o := n
> > +
> > obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o
> >
> > obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
> > diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> > index 02e3e42f380b..26e2b3af0145 100644
> > --- a/arch/x86/entry/vdso/Makefile
> > +++ b/arch/x86/entry/vdso/Makefile
> > @@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO $@
> > VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
> > $(call ld-option, --eh-frame-hdr) -Bsymbolic
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > quiet_cmd_vdso_and_check = VDSO $@
> > cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
> > diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> > index efd9e9ea17f2..f6cab2316c46 100644
> > --- a/arch/x86/kernel/vmlinux.lds.S
> > +++ b/arch/x86/kernel/vmlinux.lds.S
> > @@ -184,6 +184,8 @@ SECTIONS
> >
> > BUG_TABLE
> >
> > + PGO_CLANG_DATA
> > +
> > ORC_UNWIND_TABLE
> >
> > . = ALIGN(PAGE_SIZE);
> > diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
> > index 84b09c230cbd..5f22b31446ad 100644
> > --- a/arch/x86/platform/efi/Makefile
> > +++ b/arch/x86/platform/efi/Makefile
> > @@ -2,6 +2,7 @@
> > OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
> > KASAN_SANITIZE := n
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
> > obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
> > diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> > index 95ea17a9d20c..36f20e99da0b 100644
> > --- a/arch/x86/purgatory/Makefile
> > +++ b/arch/x86/purgatory/Makefile
> > @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
> >
> > # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > KASAN_SANITIZE := n
> > UBSAN_SANITIZE := n
> > KCSAN_SANITIZE := n
> > diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> > index 83f1b6a56449..21797192f958 100644
> > --- a/arch/x86/realmode/rm/Makefile
> > +++ b/arch/x86/realmode/rm/Makefile
> > @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> > KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE := n
> > diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
> > index 5943387e3f35..54f5768f5853 100644
> > --- a/arch/x86/um/vdso/Makefile
> > +++ b/arch/x86/um/vdso/Makefile
> > @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@
> >
> > VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > #
> > # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
> > diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> > index 8a94388e38b3..2d81623b33f2 100644
> > --- a/drivers/firmware/efi/libstub/Makefile
> > +++ b/drivers/firmware/efi/libstub/Makefile
> > @@ -40,6 +40,7 @@ KBUILD_CFLAGS := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
> > KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
> >
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > # Sanitizer runtimes are unavailable and cannot be linked here.
> > KASAN_SANITIZE := n
> > KCSAN_SANITIZE := n
> > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > index b2b3d81b1535..3a591bb18c5f 100644
> > --- a/include/asm-generic/vmlinux.lds.h
> > +++ b/include/asm-generic/vmlinux.lds.h
> > @@ -316,6 +316,49 @@
> > #define THERMAL_TABLE(name)
> > #endif
> >
> > +#ifdef CONFIG_PGO_CLANG
> > +#define PGO_CLANG_DATA \
> > + __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
> > + . = ALIGN(8); \
> > + __llvm_prf_start = .; \
> > + __llvm_prf_data_start = .; \
> > + KEEP(*(__llvm_prf_data)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_data_end = .; \
> > + } \
> > + __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
> > + . = ALIGN(8); \
> > + __llvm_prf_cnts_start = .; \
> > + KEEP(*(__llvm_prf_cnts)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_cnts_end = .; \
> > + } \
> > + __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
> > + . = ALIGN(8); \
> > + __llvm_prf_names_start = .; \
> > + KEEP(*(__llvm_prf_names)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_names_end = .; \
> > + . = ALIGN(8); \
> > + } \
> > + __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
> > + __llvm_prf_vals_start = .; \
> > + KEEP(*(__llvm_prf_vals)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_vals_end = .; \
> > + . = ALIGN(8); \
> > + } \
> > + __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
> > + __llvm_prf_vnds_start = .; \
> > + KEEP(*(__llvm_prf_vnds)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_vnds_end = .; \
> > + __llvm_prf_end = .; \
> > + }
> > +#else
> > +#define PGO_CLANG_DATA
> > +#endif
> > +
> > #define KERNEL_DTB() \
> > STRUCT_ALIGN(); \
> > __dtb_start = .; \
> > @@ -1125,6 +1168,7 @@
> > CONSTRUCTORS \
> > } \
> > BUG_TABLE \
> > + PGO_CLANG_DATA
> >
> > #define INIT_TEXT_SECTION(inittext_align) \
> > . = ALIGN(inittext_align); \
> > diff --git a/kernel/Makefile b/kernel/Makefile
> > index aa7368c7eabf..0b34ca228ba4 100644
> > --- a/kernel/Makefile
> > +++ b/kernel/Makefile
> > @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
> > obj-$(CONFIG_KCSAN) += kcsan/
> > obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
> > obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
> > +obj-$(CONFIG_PGO_CLANG) += pgo/
> >
> > obj-$(CONFIG_PERF_EVENTS) += events/
> >
> > diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
> > new file mode 100644
> > index 000000000000..76a640b6cf6e
> > --- /dev/null
> > +++ b/kernel/pgo/Kconfig
> > @@ -0,0 +1,35 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
> > +
> > +config ARCH_SUPPORTS_PGO_CLANG
> > + bool
> > +
> > +config PGO_CLANG
> > + bool "Enable clang's PGO-based kernel profiling"
> > + depends on DEBUG_FS
> > + depends on ARCH_SUPPORTS_PGO_CLANG
> > + depends on CC_IS_CLANG && CLANG_VERSION >= 120000
> > + help
> > + This option enables clang's PGO (Profile Guided Optimization) based
> > + code profiling to better optimize the kernel.
> > +
> > + If unsure, say N.
> > +
> > + Run a representative workload for your application on a kernel
> > + compiled with this option and download the raw profile file from
> > + /sys/kernel/debug/pgo/profraw. This file needs to be processed with
> > + llvm-profdata. It may be merged with other collected raw profiles.
> > +
> > + Copy the resulting profile file into vmlinux.profdata, and enable
> > + KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
> > + kernel.
> > +
> > + Note that a kernel compiled with profiling flags will be
> > + significantly larger and run slower. Also be sure to exclude files
> > + from profiling which are not linked to the kernel image to prevent
> > + linker errors.
> > +
> > + Note that the debugfs filesystem has to be mounted to access
> > + profiling data.
> > +
> > +endmenu
> > diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
> > new file mode 100644
> > index 000000000000..41e27cefd9a4
> > --- /dev/null
> > +++ b/kernel/pgo/Makefile
> > @@ -0,0 +1,5 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > +
> > +obj-y += fs.o instrument.o
> > diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
> > new file mode 100644
> > index 000000000000..1678df3b7d64
> > --- /dev/null
> > +++ b/kernel/pgo/fs.c
> > @@ -0,0 +1,389 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + * Sami Tolvanen <[email protected]>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#define pr_fmt(fmt) "pgo: " fmt
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/debugfs.h>
> > +#include <linux/fs.h>
> > +#include <linux/module.h>
> > +#include <linux/slab.h>
> > +#include <linux/vmalloc.h>
> > +#include "pgo.h"
> > +
> > +static struct dentry *directory;
> > +
> > +struct prf_private_data {
> > + void *buffer;
> > + unsigned long size;
> > +};
> > +
> > +/*
> > + * Raw profile data format:
> > + *
> > + * - llvm_prf_header
> > + * - __llvm_prf_data
> > + * - __llvm_prf_cnts
> > + * - __llvm_prf_names
> > + * - zero padding to 8 bytes
> > + * - for each llvm_prf_data in __llvm_prf_data:
> > + * - llvm_prf_value_data
> > + * - llvm_prf_value_record + site count array
> > + * - llvm_prf_value_node_data
> > + * ...
> > + * ...
> > + * ...
> > + */
> > +
> > +static void prf_fill_header(void **buffer)
> > +{
> > + struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
> > +
> > +#ifdef CONFIG_64BIT
> > + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_64;
> > +#else
> > + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_32;
> > +#endif
> > + header->version = LLVM_VARIANT_MASK_IR_PROF | LLVM_INSTR_PROF_RAW_VERSION;
> > + header->data_size = prf_data_count();
> > + header->padding_bytes_before_counters = 0;
> > + header->counters_size = prf_cnts_count();
> > + header->padding_bytes_after_counters = 0;
> > + header->names_size = prf_names_count();
> > + header->counters_delta = (u64)__llvm_prf_cnts_start;
> > + header->names_delta = (u64)__llvm_prf_names_start;
> > + header->value_kind_last = LLVM_INSTR_PROF_IPVK_LAST;
> > +
> > + *buffer += sizeof(*header);
> > +}
> > +
> > +/*
> > + * Copy the source into the buffer, incrementing the pointer into buffer in the
> > + * process.
> > + */
> > +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
> > +{
> > + memcpy(*buffer, src, size);
> > + *buffer += size;
> > +}
> > +
> > +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
> > +{
> > + struct llvm_prf_value_node **nodes =
> > + (struct llvm_prf_value_node **)p->values;
> > + u32 kinds = 0;
> > + u32 size = 0;
> > + unsigned int kind;
> > + unsigned int n;
> > + unsigned int s = 0;
> > +
> > + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> > + unsigned int sites = p->num_value_sites[kind];
> > +
> > + if (!sites)
> > + continue;
> > +
> > + /* Record + site count array */
> > + size += prf_get_value_record_size(sites);
> > + kinds++;
> > +
> > + if (!nodes)
> > + continue;
> > +
> > + for (n = 0; n < sites; n++) {
> > + u32 count = 0;
> > + struct llvm_prf_value_node *site = nodes[s + n];
> > +
> > + while (site && ++count <= U8_MAX)
> > + site = site->next;
> > +
> > + size += count *
> > + sizeof(struct llvm_prf_value_node_data);
> > + }
> > +
> > + s += sites;
> > + }
> > +
> > + if (size)
> > + size += sizeof(struct llvm_prf_value_data);
> > +
> > + if (value_kinds)
> > + *value_kinds = kinds;
> > +
> > + return size;
> > +}
> > +
> > +static u32 prf_get_value_size(void)
> > +{
> > + u32 size = 0;
> > + struct llvm_prf_data *p;
> > +
> > + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> > + size += __prf_get_value_size(p, NULL);
> > +
> > + return size;
> > +}
> > +
> > +/* Serialize the profiling's value. */
> > +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
> > +{
> > + struct llvm_prf_value_data header;
> > + struct llvm_prf_value_node **nodes =
> > + (struct llvm_prf_value_node **)p->values;
> > + unsigned int kind;
> > + unsigned int n;
> > + unsigned int s = 0;
> > +
> > + header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
> > +
> > + if (!header.num_value_kinds)
> > + /* Nothing to write. */
> > + return;
> > +
> > + prf_copy_to_buffer(buffer, &header, sizeof(header));
> > +
> > + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> > + struct llvm_prf_value_record *record;
> > + u8 *counts;
> > + unsigned int sites = p->num_value_sites[kind];
> > +
> > + if (!sites)
> > + continue;
> > +
> > + /* Profiling value record. */
> > + record = *(struct llvm_prf_value_record **)buffer;
> > + *buffer += prf_get_value_record_header_size();
> > +
> > + record->kind = kind;
> > + record->num_value_sites = sites;
> > +
> > + /* Site count array. */
> > + counts = *(u8 **)buffer;
> > + *buffer += prf_get_value_record_site_count_size(sites);
> > +
> > + /*
> > + * If we don't have nodes, we can skip updating the site count
> > + * array, because the buffer is zero filled.
> > + */
> > + if (!nodes)
> > + continue;
> > +
> > + for (n = 0; n < sites; n++) {
> > + u32 count = 0;
> > + struct llvm_prf_value_node *site = nodes[s + n];
> > +
> > + while (site && ++count <= U8_MAX) {
> > + prf_copy_to_buffer(buffer, site,
> > + sizeof(struct llvm_prf_value_node_data));
> > + site = site->next;
> > + }
> > +
> > + counts[n] = (u8)count;
> > + }
> > +
> > + s += sites;
> > + }
> > +}
> > +
> > +static void prf_serialize_values(void **buffer)
> > +{
> > + struct llvm_prf_data *p;
> > +
> > + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> > + prf_serialize_value(p, buffer);
> > +}
> > +
> > +static inline unsigned long prf_get_padding(unsigned long size)
> > +{
> > + return 7 & (sizeof(u64) - size % sizeof(u64));
> > +}
> > +
> > +static unsigned long prf_buffer_size(void)
> > +{
> > + return sizeof(struct llvm_prf_header) +
> > + prf_data_size() +
> > + prf_cnts_size() +
> > + prf_names_size() +
> > + prf_get_padding(prf_names_size()) +
> > + prf_get_value_size();
> > +}
> > +
> > +/*
> > + * Serialize the profiling data into a format LLVM's tools can understand.
> > + * Note: caller *must* hold pgo_lock.
> > + */
> > +static int prf_serialize(struct prf_private_data *p)
> > +{
> > + int err = 0;
> > + void *buffer;
> > +
> > + p->size = prf_buffer_size();
> > + p->buffer = vzalloc(p->size);
> > +
> > + if (!p->buffer) {
> > + err = -ENOMEM;
> > + goto out;
> > + }
> > +
> > + buffer = p->buffer;
> > +
> > + prf_fill_header(&buffer);
> > + prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
> > + prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
> > + prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
> > + buffer += prf_get_padding(prf_names_size());
> > +
> > + prf_serialize_values(&buffer);
> > +
> > +out:
> > + return err;
> > +}
> > +
> > +/* open() implementation for PGO. Creates a copy of the profiling data set. */
> > +static int prf_open(struct inode *inode, struct file *file)
> > +{
> > + struct prf_private_data *data;
> > + unsigned long flags;
> > + int err;
> > +
> > + data = kzalloc(sizeof(*data), GFP_KERNEL);
> > + if (!data) {
> > + err = -ENOMEM;
> > + goto out;
> > + }
> > +
> > + flags = prf_lock();
> > +
> > + err = prf_serialize(data);
> > + if (unlikely(err)) {
> > + kfree(data);
> > + goto out_unlock;
> > + }
> > +
> > + file->private_data = data;
> > +
> > +out_unlock:
> > + prf_unlock(flags);
> > +out:
> > + return err;
> > +}
> > +
> > +/* read() implementation for PGO. */
> > +static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
> > + loff_t *ppos)
> > +{
> > + struct prf_private_data *data = file->private_data;
> > +
> > + BUG_ON(!data);
> > +
> > + return simple_read_from_buffer(buf, count, ppos, data->buffer,
> > + data->size);
> > +}
> > +
> > +/* release() implementation for PGO. Release resources allocated by open(). */
> > +static int prf_release(struct inode *inode, struct file *file)
> > +{
> > + struct prf_private_data *data = file->private_data;
> > +
> > + if (data) {
> > + vfree(data->buffer);
> > + kfree(data);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static const struct file_operations prf_fops = {
> > + .owner = THIS_MODULE,
> > + .open = prf_open,
> > + .read = prf_read,
> > + .llseek = default_llseek,
> > + .release = prf_release
> > +};
> > +
> > +/* write() implementation for resetting PGO's profile data. */
> > +static ssize_t reset_write(struct file *file, const char __user *addr,
> > + size_t len, loff_t *pos)
> > +{
> > + struct llvm_prf_data *data;
> > +
> > + memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
> > +
> > + for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; data++) {
> > + struct llvm_prf_value_node **vnodes;
> > + u64 current_vsite_count;
> > + u32 i;
> > +
> > + if (!data->values)
> > + continue;
> > +
> > + current_vsite_count = 0;
> > + vnodes = (struct llvm_prf_value_node **)data->values;
> > +
> > + for (i = LLVM_INSTR_PROF_IPVK_FIRST; i <= LLVM_INSTR_PROF_IPVK_LAST; i++)
> > + current_vsite_count += data->num_value_sites[i];
> > +
> > + for (i = 0; i < current_vsite_count; i++) {
> > + struct llvm_prf_value_node *current_vnode = vnodes[i];
> > +
> > + while (current_vnode) {
> > + current_vnode->count = 0;
> > + current_vnode = current_vnode->next;
> > + }
> > + }
> > + }
> > +
> > + return len;
> > +}
> > +
> > +static const struct file_operations prf_reset_fops = {
> > + .owner = THIS_MODULE,
> > + .write = reset_write,
> > + .llseek = noop_llseek,
> > +};
> > +
> > +/* Create debugfs entries. */
> > +static int __init pgo_init(void)
> > +{
> > + directory = debugfs_create_dir("pgo", NULL);
> > + if (!directory)
> > + goto err_remove;
> > +
> > + if (!debugfs_create_file("profraw", 0600, directory, NULL,
> > + &prf_fops))
> > + goto err_remove;
> > +
> > + if (!debugfs_create_file("reset", 0200, directory, NULL,
> > + &prf_reset_fops))
> > + goto err_remove;
> > +
> > + return 0;
> > +
> > +err_remove:
> > + pr_err("initialization failed\n");
> > + return -EIO;
> > +}
> > +
> > +/* Remove debugfs entries. */
> > +static void __exit pgo_exit(void)
> > +{
> > + debugfs_remove_recursive(directory);
> > +}
> > +
> > +module_init(pgo_init);
> > +module_exit(pgo_exit);
> > diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
> > new file mode 100644
> > index 000000000000..62ff5cfce7b1
> > --- /dev/null
> > +++ b/kernel/pgo/instrument.c
> > @@ -0,0 +1,189 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + * Sami Tolvanen <[email protected]>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#define pr_fmt(fmt) "pgo: " fmt
> > +
> > +#include <linux/bitops.h>
> > +#include <linux/kernel.h>
> > +#include <linux/export.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/types.h>
> > +#include "pgo.h"
> > +
> > +/*
> > + * This lock guards both profile count updating and serialization of the
> > + * profiling data. Keeping both of these activities separate via locking
> > + * ensures that we don't try to serialize data that's only partially updated.
> > + */
> > +static DEFINE_SPINLOCK(pgo_lock);
> > +static int current_node;
> > +
> > +unsigned long prf_lock(void)
> > +{
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&pgo_lock, flags);
> > +
> > + return flags;
> > +}
> > +
> > +void prf_unlock(unsigned long flags)
> > +{
> > + spin_unlock_irqrestore(&pgo_lock, flags);
> > +}
> > +
> > +/*
> > + * Return a newly allocated profiling value node which contains the tracked
> > + * value by the value profiler.
> > + * Note: caller *must* hold pgo_lock.
> > + */
> > +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
> > + u32 index, u64 value)
> > +{
> > + if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
> > + return NULL; /* Out of nodes */
> > +
> > + current_node++;
> > +
> > + /* Make sure the node is entirely within the section */
> > + if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
> > + &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
> > + return NULL;
> > +
> > + return &__llvm_prf_vnds_start[current_node];
> > +}
> > +
> > +/*
> > + * Counts the number of times a target value is seen.
> > + *
> > + * Records the target value for the index if not seen before. Otherwise,
> > + * increments the counter associated w/ the target value.
> > + */
> > +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
> > +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
> > +{
> > + struct llvm_prf_data *p = (struct llvm_prf_data *)data;
> > + struct llvm_prf_value_node **counters;
> > + struct llvm_prf_value_node *curr;
> > + struct llvm_prf_value_node *min = NULL;
> > + struct llvm_prf_value_node *prev = NULL;
> > + u64 min_count = U64_MAX;
> > + u8 values = 0;
> > + unsigned long flags;
> > +
> > + if (!p || !p->values)
> > + return;
> > +
> > + counters = (struct llvm_prf_value_node **)p->values;
> > + curr = counters[index];
> > +
> > + while (curr) {
> > + if (target_value == curr->value) {
> > + curr->count++;
> > + return;
> > + }
> > +
> > + if (curr->count < min_count) {
> > + min_count = curr->count;
> > + min = curr;
> > + }
> > +
> > + prev = curr;
> > + curr = curr->next;
> > + values++;
> > + }
> > +
> > + if (values >= LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE) {
> > + if (!min->count || !(--min->count)) {
> > + curr = min;
> > + curr->value = target_value;
> > + curr->count++;
> > + }
> > + return;
> > + }
> > +
> > + /* Lock when updating the value node structure. */
> > + flags = prf_lock();
> > +
> > + curr = allocate_node(p, index, target_value);
> > + if (!curr)
> > + goto out;
> > +
> > + curr->value = target_value;
> > + curr->count++;
> > +
> > + if (!counters[index])
> > + counters[index] = curr;
> > + else if (prev && !prev->next)
> > + prev->next = curr;
> > +
> > +out:
> > + prf_unlock(flags);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_target);
> > +
> > +/* Counts the number of times a range of target values is seen. */
> > +void __llvm_profile_instrument_range(u64 target_value, void *data,
> > + u32 index, s64 precise_start,
> > + s64 precise_last, s64 large_value);
> > +void __llvm_profile_instrument_range(u64 target_value, void *data,
> > + u32 index, s64 precise_start,
> > + s64 precise_last, s64 large_value)
> > +{
> > + if (large_value != S64_MIN && (s64)target_value >= large_value)
> > + target_value = large_value;
> > + else if ((s64)target_value < precise_start ||
> > + (s64)target_value > precise_last)
> > + target_value = precise_last + 1;
> > +
> > + __llvm_profile_instrument_target(target_value, data, index);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_range);
> > +
> > +static u64 inst_prof_get_range_rep_value(u64 value)
> > +{
> > + if (value <= 8)
> > + /* The first ranges are individually tracked, use it as is. */
> > + return value;
> > + else if (value >= 513)
> > + /* The last range is mapped to its lowest value. */
> > + return 513;
> > + else if (hweight64(value) == 1)
> > + /* If it's a power of two, use it as is. */
> > + return value;
> > +
> > + /* Otherwise, take the previous power of two + 1. */
> > + return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
> > +}
> > +
> > +/*
> > + * The target values are partitioned into multiple ranges. The range spec is
> > + * defined in compiler-rt/include/profile/InstrProfData.inc.
> > + */
> > +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> > + u32 counter_index);
> > +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> > + u32 counter_index)
> > +{
> > + u64 rep_value;
> > +
> > + /* Map the target value to the representative value of its range. */
> > + rep_value = inst_prof_get_range_rep_value(target_value);
> > + __llvm_profile_instrument_target(rep_value, data, counter_index);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_memop);
> > diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
> > new file mode 100644
> > index 000000000000..ddc8d3002fe5
> > --- /dev/null
> > +++ b/kernel/pgo/pgo.h
> > @@ -0,0 +1,203 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + * Sami Tolvanen <[email protected]>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#ifndef _PGO_H
> > +#define _PGO_H
> > +
> > +/*
> > + * Note: These internal LLVM definitions must match the compiler version.
> > + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
> > + */
> > +
> > +#define LLVM_INSTR_PROF_RAW_MAGIC_64 \
> > + ((u64)255 << 56 | \
> > + (u64)'l' << 48 | \
> > + (u64)'p' << 40 | \
> > + (u64)'r' << 32 | \
> > + (u64)'o' << 24 | \
> > + (u64)'f' << 16 | \
> > + (u64)'r' << 8 | \
> > + (u64)129)
> > +#define LLVM_INSTR_PROF_RAW_MAGIC_32 \
> > + ((u64)255 << 56 | \
> > + (u64)'l' << 48 | \
> > + (u64)'p' << 40 | \
> > + (u64)'r' << 32 | \
> > + (u64)'o' << 24 | \
> > + (u64)'f' << 16 | \
> > + (u64)'R' << 8 | \
> > + (u64)129)
> > +
> > +#define LLVM_INSTR_PROF_RAW_VERSION 5
> > +#define LLVM_INSTR_PROF_DATA_ALIGNMENT 8
> > +#define LLVM_INSTR_PROF_IPVK_FIRST 0
> > +#define LLVM_INSTR_PROF_IPVK_LAST 1
> > +#define LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE 255
> > +
> > +#define LLVM_VARIANT_MASK_IR_PROF (0x1ULL << 56)
> > +#define LLVM_VARIANT_MASK_CSIR_PROF (0x1ULL << 57)
> > +
> > +/**
> > + * struct llvm_prf_header - represents the raw profile header data structure.
> > + * @magic: the magic token for the file format.
> > + * @version: the version of the file format.
> > + * @data_size: the number of entries in the profile data section.
> > + * @padding_bytes_before_counters: the number of padding bytes before the
> > + * counters.
> > + * @counters_size: the size in bytes of the LLVM profile section containing the
> > + * counters.
> > + * @padding_bytes_after_counters: the number of padding bytes after the
> > + * counters.
> > + * @names_size: the size in bytes of the LLVM profile section containing the
> > + * counters' names.
> > + * @counters_delta: the beginning of the LLVM profile counters section.
> > + * @names_delta: the beginning of the LLVM profile names section.
> > + * @value_kind_last: the last profile value kind.
> > + */
> > +struct llvm_prf_header {
> > + u64 magic;
> > + u64 version;
> > + u64 data_size;
> > + u64 padding_bytes_before_counters;
> > + u64 counters_size;
> > + u64 padding_bytes_after_counters;
> > + u64 names_size;
> > + u64 counters_delta;
> > + u64 names_delta;
> > + u64 value_kind_last;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_data - represents the per-function control structure.
> > + * @name_ref: the reference to the function's name.
> > + * @func_hash: the hash value of the function.
> > + * @counter_ptr: a pointer to the profile counter.
> > + * @function_ptr: a pointer to the function.
> > + * @values: the profiling values associated with this function.
> > + * @num_counters: the number of counters in the function.
> > + * @num_value_sites: the number of value profile sites.
> > + */
> > +struct llvm_prf_data {
> > + const u64 name_ref;
> > + const u64 func_hash;
> > + const void *counter_ptr;
> > + const void *function_ptr;
> > + void *values;
> > + const u32 num_counters;
> > + const u16 num_value_sites[LLVM_INSTR_PROF_IPVK_LAST + 1];
> > +} __aligned(LLVM_INSTR_PROF_DATA_ALIGNMENT);
> > +
> > +/**
> > + * struct llvm_prf_value_node_data - represents the data part of the struct
> > + * llvm_prf_value_node data structure.
> > + * @value: the value counters.
> > + * @count: the counters' count.
> > + */
> > +struct llvm_prf_value_node_data {
> > + u64 value;
> > + u64 count;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_node - represents an internal data structure used by
> > + * the value profiler.
> > + * @value: the value counters.
> > + * @count: the counters' count.
> > + * @next: the next value node.
> > + */
> > +struct llvm_prf_value_node {
> > + u64 value;
> > + u64 count;
> > + struct llvm_prf_value_node *next;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_data - represents the value profiling data in indexed
> > + * format.
> > + * @total_size: the total size in bytes including this field.
> > + * @num_value_kinds: the number of value profile kinds that has value profile
> > + * data.
> > + */
> > +struct llvm_prf_value_data {
> > + u32 total_size;
> > + u32 num_value_kinds;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_record - represents the on-disk layout of the value
> > + * profile data of a particular kind for one function.
> > + * @kind: the kind of the value profile record.
> > + * @num_value_sites: the number of value profile sites.
> > + * @site_count_array: the first element of the array that stores the number
> > + * of profiled values for each value site.
> > + */
> > +struct llvm_prf_value_record {
> > + u32 kind;
> > + u32 num_value_sites;
> > + u8 site_count_array[];
> > +};
> > +
> > +#define prf_get_value_record_header_size() \
> > + offsetof(struct llvm_prf_value_record, site_count_array)
> > +#define prf_get_value_record_site_count_size(sites) \
> > + roundup((sites), 8)
> > +#define prf_get_value_record_size(sites) \
> > + (prf_get_value_record_header_size() + \
> > + prf_get_value_record_site_count_size((sites)))
> > +
> > +/* Data sections */
> > +extern struct llvm_prf_data __llvm_prf_data_start[];
> > +extern struct llvm_prf_data __llvm_prf_data_end[];
> > +
> > +extern u64 __llvm_prf_cnts_start[];
> > +extern u64 __llvm_prf_cnts_end[];
> > +
> > +extern char __llvm_prf_names_start[];
> > +extern char __llvm_prf_names_end[];
> > +
> > +extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
> > +extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
> > +
> > +/* Locking for vnodes */
> > +extern unsigned long prf_lock(void);
> > +extern void prf_unlock(unsigned long flags);
> > +
> > +#define __DEFINE_PRF_SIZE(s) \
> > + static inline unsigned long prf_ ## s ## _size(void) \
> > + { \
> > + unsigned long start = \
> > + (unsigned long)__llvm_prf_ ## s ## _start; \
> > + unsigned long end = \
> > + (unsigned long)__llvm_prf_ ## s ## _end; \
> > + return roundup(end - start, \
> > + sizeof(__llvm_prf_ ## s ## _start[0])); \
> > + } \
> > + static inline unsigned long prf_ ## s ## _count(void) \
> > + { \
> > + return prf_ ## s ## _size() / \
> > + sizeof(__llvm_prf_ ## s ## _start[0]); \
> > + }
> > +
> > +__DEFINE_PRF_SIZE(data);
> > +__DEFINE_PRF_SIZE(cnts);
> > +__DEFINE_PRF_SIZE(names);
> > +__DEFINE_PRF_SIZE(vnds);
> > +
> > +#undef __DEFINE_PRF_SIZE
> > +
> > +#endif /* _PGO_H */
> > diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> > index 213677a5ed33..9b218afb5cb8 100644
> > --- a/scripts/Makefile.lib
> > +++ b/scripts/Makefile.lib
> > @@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
> > $(CFLAGS_GCOV))
> > endif
> >
> > +#
> > +# Enable clang's PGO profiling flags for a file or directory depending on
> > +# variables PGO_PROFILE_obj.o and PGO_PROFILE.
> > +#
> > +ifeq ($(CONFIG_PGO_CLANG),y)
> > +_c_flags += $(if $(patsubst n%,, \
> > + $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
> > + $(CFLAGS_PGO_CLANG))
> > +endif
> > +
> > #
> > # Enable address sanitizer flags for kernel except some files or directories
> > # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
> > --
> > 2.30.0.280.ga3ce27912f-goog
> >
>
>
> --
> Thanks,
> ~Nick Desaulniers
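A note on inst_prof_get_range_rep_value() in the quoted patch: it collapses memop sizes into a small set of representative values, so that sizes up to 8 are tracked individually, everything from 513 upward shares one bucket, and sizes in between are mapped to a power of two or to "previous power of two + 1". The following is only a user-space sketch of that mapping under stated assumptions: uint64_t stands in for the kernel's u64, __builtin_popcountll() (a GCC/Clang builtin) stands in for hweight64(), and the name range_rep_value() is illustrative rather than taken from the patch. The sketch also shifts a 64-bit constant instead of the patch's plain int "1"; the result is the same for the values that reach that branch (9 through 512).

```c
#include <stdint.h>

/*
 * Map a memop size to the representative value of its range, mirroring
 * the logic of inst_prof_get_range_rep_value() in the quoted patch.
 */
static uint64_t range_rep_value(uint64_t value)
{
	if (value <= 8)
		/* Small sizes are tracked individually. */
		return value;
	if (value >= 513)
		/* The last range is mapped to its lowest value. */
		return 513;
	if (__builtin_popcountll(value) == 1)
		/* Exact powers of two are kept as is. */
		return value;

	/* Otherwise: previous power of two, plus one. */
	return (UINT64_C(1) << (64 - __builtin_clzll(value) - 1)) + 1;
}
```

For example, a size of 12 falls between 8 and 16 and maps to 9, while 100 falls between 64 and 128 and maps to 65.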
>
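A note on prf_get_padding() in the quoted patch: the raw profile layout requires the names section to be followed by zero padding up to an 8-byte boundary, and the helper returns how many padding bytes that takes. The sketch below restates the same expression in user-space C so it can be checked in isolation. It assumes uint64_t in place of the kernel's u64, and padded_size() is a hypothetical helper added here for illustration; it is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Same expression as prf_get_padding() in the quoted patch: the number of
 * bytes needed to round "size" up to a multiple of 8. The "7 &" mask turns
 * the value 8 (already aligned) into 0.
 */
static inline unsigned long prf_get_padding(unsigned long size)
{
	return 7 & (sizeof(uint64_t) - size % sizeof(uint64_t));
}

/* Hypothetical helper (not in the patch): section length after padding. */
static inline unsigned long padded_size(unsigned long size)
{
	unsigned long total = size + prf_get_padding(size);

	/* The padded length must always be a multiple of 8. */
	assert(total % sizeof(uint64_t) == 0);
	return total;
}
```

With this, a 13-byte names section needs 3 bytes of padding and occupies 16 bytes in the serialized buffer, while a section whose size is already a multiple of 8 needs none.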

2021-01-29 21:52:23

by Nick Desaulniers

Subject: Re: [PATCH v7] pgo: add clang's Profile Guided Optimization infrastructure

On Thu, Jan 28, 2021 at 11:43 PM Sedat Dilek <[email protected]> wrote:
>
> Some comments:
>
> [ hash mismatches ]
>
> Observed identical warnings when doing a rebuild with GAS or Clang-IAS.
>
> [ Importance of LLVM_IAS=1 working ]
>
> Clang-LTO and Clang-CFI both depend on LLVM_IAS=1 (see for example
> "kbuild: add support for Clang LTO").
> Sooner or later we will have to deal with this issue (I hope it is not a
> local problem).

If you're switching back and forth between GAS and IAS, then I would
expect a hash error if you're trying to reuse profiling data from one
with the other. Profiling data is not portable across toolchains: the
toolchain used to collect the profile must match the one that
consumes it.
--
Thanks,
~Nick Desaulniers

2021-02-10 23:29:19

by Bill Wendling

Subject: Re: [PATCH v7] pgo: add clang's Profile Guided Optimization infrastructure

Bumping for review from Masahiro Yamada and Andrew Morton.

-bw

On Fri, Jan 22, 2021 at 2:12 AM Bill Wendling <[email protected]> wrote:
>
> From: Sami Tolvanen <[email protected]>
>
> Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> profile, the kernel is instrumented with PGO counters, a representative
> workload is run, and the raw profile data is collected from
> /sys/kernel/debug/pgo/profraw.
>
> The raw profile data must be processed by clang's "llvm-profdata" tool
> before it can be used during recompilation:
>
> $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
> Multiple raw profiles may be merged during this step.
>
> The data can now be used by the compiler:
>
> $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>
> This initial submission is restricted to x86, as that's the platform we
> know works. This restriction can be lifted once other platforms have
> been verified to work with PGO.
>
> Note that this method of profiling the kernel is clang-native, unlike
> the clang support in kernel/gcov.
>
> [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>
> Signed-off-by: Sami Tolvanen <[email protected]>
> Co-developed-by: Bill Wendling <[email protected]>
> Signed-off-by: Bill Wendling <[email protected]>
> Tested-by: Nick Desaulniers <[email protected]>
> ---
> v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> testing.
> - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> Song's comments.
> v3: - Added change log section based on Sedat Dilek's comments.
> v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
> own popcount implementation, based on Nick Desaulniers's comment.
> v5: - Correct padding calculation, discovered by Nathan Chancellor.
> v6: - Add better documentation about the locking scheme and other things.
> - Rename macros to better match the same macros in LLVM's source code.
> v7: - Fix minor build failure reported by Sedat.
> ---
> Documentation/dev-tools/index.rst | 1 +
> Documentation/dev-tools/pgo.rst | 127 +++++++++
> MAINTAINERS | 9 +
> Makefile | 3 +
> arch/Kconfig | 1 +
> arch/x86/Kconfig | 1 +
> arch/x86/boot/Makefile | 1 +
> arch/x86/boot/compressed/Makefile | 1 +
> arch/x86/crypto/Makefile | 4 +
> arch/x86/entry/vdso/Makefile | 1 +
> arch/x86/kernel/vmlinux.lds.S | 2 +
> arch/x86/platform/efi/Makefile | 1 +
> arch/x86/purgatory/Makefile | 1 +
> arch/x86/realmode/rm/Makefile | 1 +
> arch/x86/um/vdso/Makefile | 1 +
> drivers/firmware/efi/libstub/Makefile | 1 +
> include/asm-generic/vmlinux.lds.h | 44 +++
> kernel/Makefile | 1 +
> kernel/pgo/Kconfig | 35 +++
> kernel/pgo/Makefile | 5 +
> kernel/pgo/fs.c | 389 ++++++++++++++++++++++++++
> kernel/pgo/instrument.c | 189 +++++++++++++
> kernel/pgo/pgo.h | 203 ++++++++++++++
> scripts/Makefile.lib | 10 +
> 24 files changed, 1032 insertions(+)
> create mode 100644 Documentation/dev-tools/pgo.rst
> create mode 100644 kernel/pgo/Kconfig
> create mode 100644 kernel/pgo/Makefile
> create mode 100644 kernel/pgo/fs.c
> create mode 100644 kernel/pgo/instrument.c
> create mode 100644 kernel/pgo/pgo.h
>
> diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> index f7809c7b1ba9..8d6418e85806 100644
> --- a/Documentation/dev-tools/index.rst
> +++ b/Documentation/dev-tools/index.rst
> @@ -26,6 +26,7 @@ whole; patches welcome!
> kgdb
> kselftest
> kunit/index
> + pgo
>
>
> .. only:: subproject and html
> diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> new file mode 100644
> index 000000000000..b7f11d8405b7
> --- /dev/null
> +++ b/Documentation/dev-tools/pgo.rst
> @@ -0,0 +1,127 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===============================
> +Using PGO with the Linux kernel
> +===============================
> +
> +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> +when building with Clang. The profiling data is exported via the ``pgo``
> +debugfs directory.
> +
> +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> +
> +
> +Preparation
> +===========
> +
> +Configure the kernel with:
> +
> +.. code-block:: make
> +
> + CONFIG_DEBUG_FS=y
> + CONFIG_PGO_CLANG=y
> +
> +Note that kernels compiled with profiling flags will be significantly larger
> +and run slower.
> +
> +Profiling data will only become accessible once debugfs has been mounted:
> +
> +.. code-block:: sh
> +
> + mount -t debugfs none /sys/kernel/debug
> +
> +
> +Customization
> +=============
> +
> +You can enable or disable profiling for individual files and directories by
> +adding a line similar to the following to the respective kernel Makefile:
> +
> +- For a single file (e.g. main.o)
> +
> + .. code-block:: make
> +
> + PGO_PROFILE_main.o := y
> +
> +- For all files in one directory
> +
> + .. code-block:: make
> +
> + PGO_PROFILE := y
> +
> +To exclude files from being profiled use
> +
> + .. code-block:: make
> +
> + PGO_PROFILE_main.o := n
> +
> +and
> +
> + .. code-block:: make
> +
> + PGO_PROFILE := n
> +
> +Only files which are linked to the main kernel image or are compiled as kernel
> +modules are supported by this mechanism.
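
The precedence between the per-file and per-directory settings follows from
the $(if $(patsubst n%,, ...)) rule this patch adds to scripts/Makefile.lib:
the per-file value, the per-directory value and a literal "y" are
concatenated, and the flags are dropped only if the result starts with "n".
Below is a small C model of that decision, assuming CONFIG_PGO_CLANG=y and
single-word values. The function name pgo_enabled() is hypothetical and
exists only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Model of $(if $(patsubst n%,, <per-file><per-dir>y), $(CFLAGS_PGO_CLANG)).
 * Unset make variables are passed as NULL or "" and expand to nothing.
 */
static bool pgo_enabled(const char *per_file, const char *per_dir)
{
	const char *parts[] = { per_file, per_dir, "y" };
	size_t i;

	for (i = 0; i < 3; i++) {
		/* The first character of the concatenation decides. */
		if (parts[i] && parts[i][0] != '\0')
			return parts[i][0] != 'n';
	}
	return true; /* not reached: the trailing "y" is never empty */
}
```

So a per-file setting wins over the directory setting, and a file with no
setting at all is profiled by default.
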
> +
> +
> +Files
> +=====
> +
> +The PGO kernel support creates the following files in debugfs:
> +
> +``/sys/kernel/debug/pgo``
> + Parent directory for all PGO-related files.
> +
> +``/sys/kernel/debug/pgo/reset``
> + Global reset file: resets all profile data to zero when written to.
> +
> +``/sys/kernel/debug/pgo/profraw``
> + The raw PGO data, which must be processed with ``llvm-profdata``.
> +
> +
> +Workflow
> +========
> +
> +The PGO kernel can be run on the host or test machines. The data, though,
> +should be analyzed with the tools from the same Clang version that compiled
> +the kernel. Clang tolerates some version skew, but using the same version is
> +easier.
> +
> +The profiling data is useful for optimizing the kernel, analyzing coverage,
> +etc. Clang offers tools to perform these tasks.
> +
> +Here is an example workflow for profiling an instrumented kernel with PGO and
> +using the result to optimize the kernel:
> +
> +1) Install the kernel on the TEST machine.
> +
> +2) Reset the data counters right before running the load tests
> +
> + .. code-block:: sh
> +
> + $ echo 1 > /sys/kernel/debug/pgo/reset
> +
> +3) Run the load tests.
> +
> +4) Collect the raw profile data
> +
> + .. code-block:: sh
> +
> + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> +
> +5) (Optional) Download the raw profile data to the HOST machine.
> +
> +6) Process the raw profile data
> +
> + .. code-block:: sh
> +
> + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> +
> + Note that multiple raw profile data files can be merged during this step.
> +
> +7) Rebuild the kernel using the profile data (with CONFIG_PGO_CLANG disabled)
> +
> + .. code-block:: sh
> +
> + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
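
Between steps 4 and 6 it can be handy to sanity-check that the copied file
really is a raw profile before handing it to llvm-profdata. The sketch below
checks the 64-bit raw profile magic. The constant is my reading of
INSTR_PROF_RAW_MAGIC_64 in LLVM's InstrProfData.inc, so verify it against your
LLVM version. The helper name looks_like_profraw64() is hypothetical, and
little-endian byte order is assumed, as on x86.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * 64-bit raw profile magic: 0xff 'l' 'p' 'r' 'o' 'f' 'r' 0x81,
 * most significant byte first.
 */
#define PROFRAW_MAGIC_64 \
	(((uint64_t)255 << 56) | ((uint64_t)'l' << 48) | \
	 ((uint64_t)'p' << 40) | ((uint64_t)'r' << 32) | \
	 ((uint64_t)'o' << 24) | ((uint64_t)'f' << 16) | \
	 ((uint64_t)'r' << 8) | (uint64_t)129)

/* Return true if buf starts with the little-endian 64-bit profraw magic. */
static bool looks_like_profraw64(const unsigned char *buf, size_t len)
{
	uint64_t magic = 0;
	size_t i;

	if (len < 8)
		return false;
	/* Assemble the first 8 bytes as a little-endian u64. */
	for (i = 0; i < 8; i++)
		magic |= (uint64_t)buf[i] << (8 * i);
	return magic == PROFRAW_MAGIC_64;
}
```

A caller would read the first 8 bytes of /tmp/vmlinux.profraw and pass them
in. A failed check usually means the copy was truncated or the wrong file was
picked up.
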
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 705776b31c8d..0a75d223682d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -13948,6 +13948,15 @@ S: Maintained
> F: include/linux/personality.h
> F: include/uapi/linux/personality.h
>
> +PGO BASED KERNEL PROFILING
> +M: Sami Tolvanen <[email protected]>
> +M: Bill Wendling <[email protected]>
> +R: Nathan Chancellor <[email protected]>
> +R: Nick Desaulniers <[email protected]>
> +S: Supported
> +F: Documentation/dev-tools/pgo.rst
> +F: kernel/pgo
> +
> PHOENIX RC FLIGHT CONTROLLER ADAPTER
> M: Marcus Folkesson <[email protected]>
> L: [email protected]
> diff --git a/Makefile b/Makefile
> index b0e4767735dc..9339541f7cec 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
> # Defaults to vmlinux, but the arch makefile usually adds further targets
> all: vmlinux
>
> +CFLAGS_PGO_CLANG := -fprofile-generate
> +export CFLAGS_PGO_CLANG
> +
> CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
> $(call cc-option,-fno-tree-loop-im) \
> $(call cc-disable-warning,maybe-uninitialized,)
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 24862d15f3a3..f39d3991f6bf 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
> pairs of 32-bit arguments, select this option.
>
> source "kernel/gcov/Kconfig"
> +source "kernel/pgo/Kconfig"
>
> source "scripts/gcc-plugins/Kconfig"
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 21f851179ff0..36305ea61dc0 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -96,6 +96,7 @@ config X86
> select ARCH_SUPPORTS_DEBUG_PAGEALLOC
> select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
> select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
> + select ARCH_SUPPORTS_PGO_CLANG if X86_64
> select ARCH_USE_BUILTIN_BSWAP
> select ARCH_USE_QUEUED_RWLOCKS
> select ARCH_USE_QUEUED_SPINLOCKS
> diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> index fe605205b4ce..383853e32f67 100644
> --- a/arch/x86/boot/Makefile
> +++ b/arch/x86/boot/Makefile
> @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
>
> $(obj)/bzImage: asflags-y := $(SVGA_MODE)
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index e0bc3988c3fa..ed12ab65f606 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
>
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE :=n
>
> KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
> diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> index a31de0c6ccde..5753aea7bcbd 100644
> --- a/arch/x86/crypto/Makefile
> +++ b/arch/x86/crypto/Makefile
> @@ -4,6 +4,10 @@
>
> OBJECT_FILES_NON_STANDARD := y
>
> +# Disable PGO for curve25519-x86_64. With PGO enabled, clang runs out of
> +# registers for some of the functions.
> +PGO_PROFILE_curve25519-x86_64.o := n
> +
> obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o
>
> obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index 02e3e42f380b..26e2b3af0145 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO $@
> VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
> $(call ld-option, --eh-frame-hdr) -Bsymbolic
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> quiet_cmd_vdso_and_check = VDSO $@
> cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index efd9e9ea17f2..f6cab2316c46 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -184,6 +184,8 @@ SECTIONS
>
> BUG_TABLE
>
> + PGO_CLANG_DATA
> +
> ORC_UNWIND_TABLE
>
> . = ALIGN(PAGE_SIZE);
> diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
> index 84b09c230cbd..5f22b31446ad 100644
> --- a/arch/x86/platform/efi/Makefile
> +++ b/arch/x86/platform/efi/Makefile
> @@ -2,6 +2,7 @@
> OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
> KASAN_SANITIZE := n
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
> obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
> diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> index 95ea17a9d20c..36f20e99da0b 100644
> --- a/arch/x86/purgatory/Makefile
> +++ b/arch/x86/purgatory/Makefile
> @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
>
> # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> KASAN_SANITIZE := n
> UBSAN_SANITIZE := n
> KCSAN_SANITIZE := n
> diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> index 83f1b6a56449..21797192f958 100644
> --- a/arch/x86/realmode/rm/Makefile
> +++ b/arch/x86/realmode/rm/Makefile
> @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
> diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
> index 5943387e3f35..54f5768f5853 100644
> --- a/arch/x86/um/vdso/Makefile
> +++ b/arch/x86/um/vdso/Makefile
> @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@
>
> VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> #
> # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index 8a94388e38b3..2d81623b33f2 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -40,6 +40,7 @@ KBUILD_CFLAGS := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
> KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
>
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> # Sanitizer runtimes are unavailable and cannot be linked here.
> KASAN_SANITIZE := n
> KCSAN_SANITIZE := n
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index b2b3d81b1535..3a591bb18c5f 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -316,6 +316,49 @@
> #define THERMAL_TABLE(name)
> #endif
>
> +#ifdef CONFIG_PGO_CLANG
> +#define PGO_CLANG_DATA \
> + __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_start = .; \
> + __llvm_prf_data_start = .; \
> + KEEP(*(__llvm_prf_data)) \
> + . = ALIGN(8); \
> + __llvm_prf_data_end = .; \
> + } \
> + __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_cnts_start = .; \
> + KEEP(*(__llvm_prf_cnts)) \
> + . = ALIGN(8); \
> + __llvm_prf_cnts_end = .; \
> + } \
> + __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_names_start = .; \
> + KEEP(*(__llvm_prf_names)) \
> + . = ALIGN(8); \
> + __llvm_prf_names_end = .; \
> + . = ALIGN(8); \
> + } \
> + __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
> + __llvm_prf_vals_start = .; \
> + KEEP(*(__llvm_prf_vals)) \
> + . = ALIGN(8); \
> + __llvm_prf_vals_end = .; \
> + . = ALIGN(8); \
> + } \
> + __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
> + __llvm_prf_vnds_start = .; \
> + KEEP(*(__llvm_prf_vnds)) \
> + . = ALIGN(8); \
> + __llvm_prf_vnds_end = .; \
> + __llvm_prf_end = .; \
> + }
> +#else
> +#define PGO_CLANG_DATA
> +#endif
> +
> #define KERNEL_DTB() \
> STRUCT_ALIGN(); \
> __dtb_start = .; \
> @@ -1125,6 +1168,7 @@
> CONSTRUCTORS \
> } \
> BUG_TABLE \
> + PGO_CLANG_DATA
>
> #define INIT_TEXT_SECTION(inittext_align) \
> . = ALIGN(inittext_align); \
> diff --git a/kernel/Makefile b/kernel/Makefile
> index aa7368c7eabf..0b34ca228ba4 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
> obj-$(CONFIG_KCSAN) += kcsan/
> obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
> obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
> +obj-$(CONFIG_PGO_CLANG) += pgo/
>
> obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
> new file mode 100644
> index 000000000000..76a640b6cf6e
> --- /dev/null
> +++ b/kernel/pgo/Kconfig
> @@ -0,0 +1,35 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
> +
> +config ARCH_SUPPORTS_PGO_CLANG
> + bool
> +
> +config PGO_CLANG
> + bool "Enable clang's PGO-based kernel profiling"
> + depends on DEBUG_FS
> + depends on ARCH_SUPPORTS_PGO_CLANG
> + depends on CC_IS_CLANG && CLANG_VERSION >= 120000
> + help
> + This option enables clang's PGO (Profile Guided Optimization) based
> + code profiling to better optimize the kernel.
> +
> + If unsure, say N.
> +
> + Run a representative workload for your application on a kernel
> + compiled with this option and download the raw profile file from
> + /sys/kernel/debug/pgo/profraw. This file needs to be processed with
> + llvm-profdata. It may be merged with other collected raw profiles.
> +
> + Copy the resulting profile file to vmlinux.profdata, and build with
> + KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
> + kernel.
> +
> + Note that a kernel compiled with profiling flags will be
> + significantly larger and run slower. Also be sure to exclude files
> + from profiling which are not linked to the kernel image to prevent
> + linker errors.
> +
> + Note that the debugfs filesystem has to be mounted to access
> + profiling data.
> +
> +endmenu
> diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
> new file mode 100644
> index 000000000000..41e27cefd9a4
> --- /dev/null
> +++ b/kernel/pgo/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0
> +GCOV_PROFILE := n
> +PGO_PROFILE := n
> +
> +obj-y += fs.o instrument.o
> diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
> new file mode 100644
> index 000000000000..1678df3b7d64
> --- /dev/null
> +++ b/kernel/pgo/fs.c
> @@ -0,0 +1,389 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt) "pgo: " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/debugfs.h>
> +#include <linux/fs.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/vmalloc.h>
> +#include "pgo.h"
> +
> +static struct dentry *directory;
> +
> +struct prf_private_data {
> + void *buffer;
> + unsigned long size;
> +};
> +
> +/*
> + * Raw profile data format:
> + *
> + * - llvm_prf_header
> + * - __llvm_prf_data
> + * - __llvm_prf_cnts
> + * - __llvm_prf_names
> + * - zero padding to 8 bytes
> + * - for each llvm_prf_data in __llvm_prf_data:
> + * - llvm_prf_value_data
> + * - llvm_prf_value_record + site count array
> + * - llvm_prf_value_node_data
> + * ...
> + * ...
> + * ...
> + */
> +
> +static void prf_fill_header(void **buffer)
> +{
> + struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
> +
> +#ifdef CONFIG_64BIT
> + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_64;
> +#else
> + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_32;
> +#endif
> + header->version = LLVM_VARIANT_MASK_IR_PROF | LLVM_INSTR_PROF_RAW_VERSION;
> + header->data_size = prf_data_count();
> + header->padding_bytes_before_counters = 0;
> + header->counters_size = prf_cnts_count();
> + header->padding_bytes_after_counters = 0;
> + header->names_size = prf_names_count();
> + header->counters_delta = (u64)__llvm_prf_cnts_start;
> + header->names_delta = (u64)__llvm_prf_names_start;
> + header->value_kind_last = LLVM_INSTR_PROF_IPVK_LAST;
> +
> + *buffer += sizeof(*header);
> +}
> +
> +/*
> + * Copy the source into the buffer, incrementing the pointer into buffer in the
> + * process.
> + */
> +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
> +{
> + memcpy(*buffer, src, size);
> + *buffer += size;
> +}
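
The serializer relies on this "copy, then advance the cursor" helper so that
each section lands directly after the previous one. A user-space sketch of
the same pattern follows. The name copy_and_advance() is made up for this
example, and it takes a char ** rather than a void ** because arithmetic on
void * is a GNU extension.

```c
#include <stddef.h>
#include <string.h>

/* Copy size bytes from src to *cursor, then advance *cursor past them. */
static void copy_and_advance(char **cursor, const void *src, size_t size)
{
	memcpy(*cursor, src, size);
	*cursor += size;
}
```

Two back-to-back calls with "ab" and "cde" leave the cursor five bytes past
the start of the buffer, with "abcde" written contiguously.
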
> +
> +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
> +{
> + struct llvm_prf_value_node **nodes =
> + (struct llvm_prf_value_node **)p->values;
> + u32 kinds = 0;
> + u32 size = 0;
> + unsigned int kind;
> + unsigned int n;
> + unsigned int s = 0;
> +
> + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> + unsigned int sites = p->num_value_sites[kind];
> +
> + if (!sites)
> + continue;
> +
> + /* Record + site count array */
> + size += prf_get_value_record_size(sites);
> + kinds++;
> +
> + if (!nodes)
> + continue;
> +
> + for (n = 0; n < sites; n++) {
> + u32 count = 0;
> + struct llvm_prf_value_node *site = nodes[s + n];
> +
> + while (site && ++count <= U8_MAX)
> + site = site->next;
> +
> + size += count *
> + sizeof(struct llvm_prf_value_node_data);
> + }
> +
> + s += sites;
> + }
> +
> + if (size)
> + size += sizeof(struct llvm_prf_value_data);
> +
> + if (value_kinds)
> + *value_kinds = kinds;
> +
> + return size;
> +}
> +
> +static u32 prf_get_value_size(void)
> +{
> + u32 size = 0;
> + struct llvm_prf_data *p;
> +
> + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> + size += __prf_get_value_size(p, NULL);
> +
> + return size;
> +}
> +
> +/* Serialize the value profiling data of one llvm_prf_data entry. */
> +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
> +{
> + struct llvm_prf_value_data header;
> + struct llvm_prf_value_node **nodes =
> + (struct llvm_prf_value_node **)p->values;
> + unsigned int kind;
> + unsigned int n;
> + unsigned int s = 0;
> +
> + header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
> +
> + if (!header.num_value_kinds)
> + /* Nothing to write. */
> + return;
> +
> + prf_copy_to_buffer(buffer, &header, sizeof(header));
> +
> + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> + struct llvm_prf_value_record *record;
> + u8 *counts;
> + unsigned int sites = p->num_value_sites[kind];
> +
> + if (!sites)
> + continue;
> +
> + /* Profiling value record. */
> + record = *(struct llvm_prf_value_record **)buffer;
> + *buffer += prf_get_value_record_header_size();
> +
> + record->kind = kind;
> + record->num_value_sites = sites;
> +
> + /* Site count array. */
> + counts = *(u8 **)buffer;
> + *buffer += prf_get_value_record_site_count_size(sites);
> +
> + /*
> + * If we don't have nodes, we can skip updating the site count
> + * array, because the buffer is zero filled.
> + */
> + if (!nodes)
> + continue;
> +
> + for (n = 0; n < sites; n++) {
> + u32 count = 0;
> + struct llvm_prf_value_node *site = nodes[s + n];
> +
> + while (site && ++count <= U8_MAX) {
> + prf_copy_to_buffer(buffer, site,
> + sizeof(struct llvm_prf_value_node_data));
> + site = site->next;
> + }
> +
> + counts[n] = (u8)count;
> + }
> +
> + s += sites;
> + }
> +}
> +
> +static void prf_serialize_values(void **buffer)
> +{
> + struct llvm_prf_data *p;
> +
> + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> + prf_serialize_value(p, buffer);
> +}
> +
> +static inline unsigned long prf_get_padding(unsigned long size)
> +{
> + return 7 & (sizeof(u64) - size % sizeof(u64));
> +}
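
A worked example of the padding arithmetic, as a user-space sketch (the name
padding_to_u64() is hypothetical). Without the mask, an already aligned size
would give 8 - 0 = 8 bytes of padding. The "7 &" folds that case to 0 and
leaves every other result unchanged.

```c
#include <stdint.h>

/* Bytes needed to round size up to the next multiple of 8 (0 if aligned). */
static unsigned long padding_to_u64(unsigned long size)
{
	return 7 & (sizeof(uint64_t) - size % sizeof(uint64_t));
}
```

For instance, a names section of 13 bytes gets 3 bytes of padding, while one
of 16 bytes gets none.
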
> +
> +static unsigned long prf_buffer_size(void)
> +{
> + return sizeof(struct llvm_prf_header) +
> + prf_data_size() +
> + prf_cnts_size() +
> + prf_names_size() +
> + prf_get_padding(prf_names_size()) +
> + prf_get_value_size();
> +}
> +
> +/*
> + * Serialize the profiling data into a format LLVM's tools can understand.
> + * Note: caller *must* hold pgo_lock.
> + */
> +static int prf_serialize(struct prf_private_data *p)
> +{
> + int err = 0;
> + void *buffer;
> +
> + p->size = prf_buffer_size();
> + p->buffer = vzalloc(p->size);
> +
> + if (!p->buffer) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + buffer = p->buffer;
> +
> + prf_fill_header(&buffer);
> + prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
> + prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
> + prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
> + buffer += prf_get_padding(prf_names_size());
> +
> + prf_serialize_values(&buffer);
> +
> +out:
> + return err;
> +}
> +
> +/* open() implementation for PGO. Creates a copy of the profiling data set. */
> +static int prf_open(struct inode *inode, struct file *file)
> +{
> + struct prf_private_data *data;
> + unsigned long flags;
> + int err;
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL);
> + if (!data) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + flags = prf_lock();
> +
> + err = prf_serialize(data);
> + if (unlikely(err)) {
> + kfree(data);
> + goto out_unlock;
> + }
> +
> + file->private_data = data;
> +
> +out_unlock:
> + prf_unlock(flags);
> +out:
> + return err;
> +}
> +
> +/* read() implementation for PGO. */
> +static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
> + loff_t *ppos)
> +{
> + struct prf_private_data *data = file->private_data;
> +
> + BUG_ON(!data);
> +
> + return simple_read_from_buffer(buf, count, ppos, data->buffer,
> + data->size);
> +}
> +
> +/* release() implementation for PGO. Release resources allocated by open(). */
> +static int prf_release(struct inode *inode, struct file *file)
> +{
> + struct prf_private_data *data = file->private_data;
> +
> + if (data) {
> + vfree(data->buffer);
> + kfree(data);
> + }
> +
> + return 0;
> +}
> +
> +static const struct file_operations prf_fops = {
> + .owner = THIS_MODULE,
> + .open = prf_open,
> + .read = prf_read,
> + .llseek = default_llseek,
> + .release = prf_release
> +};
> +
> +/* write() implementation for resetting PGO's profile data. */
> +static ssize_t reset_write(struct file *file, const char __user *addr,
> + size_t len, loff_t *pos)
> +{
> + struct llvm_prf_data *data;
> +
> + memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
> +
> + for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; data++) {
> + struct llvm_prf_value_node **vnodes;
> + u64 current_vsite_count;
> + u32 i;
> +
> + if (!data->values)
> + continue;
> +
> + current_vsite_count = 0;
> + vnodes = (struct llvm_prf_value_node **)data->values;
> +
> + for (i = LLVM_INSTR_PROF_IPVK_FIRST; i <= LLVM_INSTR_PROF_IPVK_LAST; i++)
> + current_vsite_count += data->num_value_sites[i];
> +
> + for (i = 0; i < current_vsite_count; i++) {
> + struct llvm_prf_value_node *current_vnode = vnodes[i];
> +
> + while (current_vnode) {
> + current_vnode->count = 0;
> + current_vnode = current_vnode->next;
> + }
> + }
> + }
> +
> + return len;
> +}
> +
> +static const struct file_operations prf_reset_fops = {
> + .owner = THIS_MODULE,
> + .write = reset_write,
> + .llseek = noop_llseek,
> +};
> +
> +/* Create debugfs entries. */
> +static int __init pgo_init(void)
> +{
> + directory = debugfs_create_dir("pgo", NULL);
> + if (!directory)
> + goto err_remove;
> +
> + if (!debugfs_create_file("profraw", 0600, directory, NULL,
> + &prf_fops))
> + goto err_remove;
> +
> + if (!debugfs_create_file("reset", 0200, directory, NULL,
> + &prf_reset_fops))
> + goto err_remove;
> +
> + return 0;
> +
> +err_remove:
> + pr_err("initialization failed\n");
> + return -EIO;
> +}
> +
> +/* Remove debugfs entries. */
> +static void __exit pgo_exit(void)
> +{
> + debugfs_remove_recursive(directory);
> +}
> +
> +module_init(pgo_init);
> +module_exit(pgo_exit);
> diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
> new file mode 100644
> index 000000000000..62ff5cfce7b1
> --- /dev/null
> +++ b/kernel/pgo/instrument.c
> @@ -0,0 +1,189 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt) "pgo: " fmt
> +
> +#include <linux/bitops.h>
> +#include <linux/kernel.h>
> +#include <linux/export.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +#include "pgo.h"
> +
> +/*
> + * This lock guards both profile count updating and serialization of the
> + * profiling data. Keeping both of these activities separate via locking
> + * ensures that we don't try to serialize data that's only partially updated.
> + */
> +static DEFINE_SPINLOCK(pgo_lock);
> +static int current_node;
> +
> +unsigned long prf_lock(void)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&pgo_lock, flags);
> +
> + return flags;
> +}
> +
> +void prf_unlock(unsigned long flags)
> +{
> + spin_unlock_irqrestore(&pgo_lock, flags);
> +}
> +
> +/*
> + * Return a newly allocated profiling value node, which holds a value
> + * tracked by the value profiler.
> + * Note: caller *must* hold pgo_lock.
> + */
> +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
> + u32 index, u64 value)
> +{
> + if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
> + return NULL; /* Out of nodes */
> +
> + current_node++;
> +
> + /* Make sure the node is entirely within the section */
> + if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
> + &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
> + return NULL;
> +
> + return &__llvm_prf_vnds_start[current_node];
> +}
> +
> +/*
> + * Counts the number of times a target value is seen.
> + *
> + * Records the target value for the index if not seen before. Otherwise,
> + * increments the counter associated w/ the target value.
> + */
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
> +{
> + struct llvm_prf_data *p = (struct llvm_prf_data *)data;
> + struct llvm_prf_value_node **counters;
> + struct llvm_prf_value_node *curr;
> + struct llvm_prf_value_node *min = NULL;
> + struct llvm_prf_value_node *prev = NULL;
> + u64 min_count = U64_MAX;
> + u8 values = 0;
> + unsigned long flags;
> +
> + if (!p || !p->values)
> + return;
> +
> + counters = (struct llvm_prf_value_node **)p->values;
> + curr = counters[index];
> +
> + while (curr) {
> + if (target_value == curr->value) {
> + curr->count++;
> + return;
> + }
> +
> + if (curr->count < min_count) {
> + min_count = curr->count;
> + min = curr;
> + }
> +
> + prev = curr;
> + curr = curr->next;
> + values++;
> + }
> +
> + if (values >= LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE) {
> + if (!min->count || !(--min->count)) {
> + curr = min;
> + curr->value = target_value;
> + curr->count++;
> + }
> + return;
> + }
> +
> + /* Lock when updating the value node structure. */
> + flags = prf_lock();
> +
> + curr = allocate_node(p, index, target_value);
> + if (!curr)
> + goto out;
> +
> + curr->value = target_value;
> + curr->count++;
> +
> + if (!counters[index])
> + counters[index] = curr;
> + else if (prev && !prev->next)
> + prev->next = curr;
> +
> +out:
> + prf_unlock(flags);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_target);
> +
> +/* Counts the number of times a range of target values is seen. */
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> + u32 index, s64 precise_start,
> + s64 precise_last, s64 large_value);
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> + u32 index, s64 precise_start,
> + s64 precise_last, s64 large_value)
> +{
> + if (large_value != S64_MIN && (s64)target_value >= large_value)
> + target_value = large_value;
> + else if ((s64)target_value < precise_start ||
> + (s64)target_value > precise_last)
> + target_value = precise_last + 1;
> +
> + __llvm_profile_instrument_target(target_value, data, index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_range);
> +
> +static u64 inst_prof_get_range_rep_value(u64 value)
> +{
> + if (value <= 8)
> + /* The first ranges are individually tracked, use it as is. */
> + return value;
> + else if (value >= 513)
> + /* The last range is mapped to its lowest value. */
> + return 513;
> + else if (hweight64(value) == 1)
> + /* If it's a power of two, use it as is. */
> + return value;
> +
> + /* Otherwise, round down to the previous power of two and add 1. */
> + return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
> +}
> +
> +/*
> + * The target values are partitioned into multiple ranges. The range spec is
> + * defined in compiler-rt/include/profile/InstrProfData.inc.
> + */
> +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> + u32 counter_index);
> +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> + u32 counter_index)
> +{
> + u64 rep_value;
> +
> + /* Map the target value to the representative value of its range. */
> + rep_value = inst_prof_get_range_rep_value(target_value);
> + __llvm_profile_instrument_target(rep_value, data, counter_index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_memop);
> diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
> new file mode 100644
> index 000000000000..ddc8d3002fe5
> --- /dev/null
> +++ b/kernel/pgo/pgo.h
> @@ -0,0 +1,203 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#ifndef _PGO_H
> +#define _PGO_H
> +
> +/*
> + * Note: These internal LLVM definitions must match the compiler version.
> + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
> + */
> +
> +#define LLVM_INSTR_PROF_RAW_MAGIC_64 \
> + ((u64)255 << 56 | \
> + (u64)'l' << 48 | \
> + (u64)'p' << 40 | \
> + (u64)'r' << 32 | \
> + (u64)'o' << 24 | \
> + (u64)'f' << 16 | \
> + (u64)'r' << 8 | \
> + (u64)129)
> +#define LLVM_INSTR_PROF_RAW_MAGIC_32 \
> + ((u64)255 << 56 | \
> + (u64)'l' << 48 | \
> + (u64)'p' << 40 | \
> + (u64)'r' << 32 | \
> + (u64)'o' << 24 | \
> + (u64)'f' << 16 | \
> + (u64)'R' << 8 | \
> + (u64)129)
> +
> +#define LLVM_INSTR_PROF_RAW_VERSION 5
> +#define LLVM_INSTR_PROF_DATA_ALIGNMENT 8
> +#define LLVM_INSTR_PROF_IPVK_FIRST 0
> +#define LLVM_INSTR_PROF_IPVK_LAST 1
> +#define LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE 255
> +
> +#define LLVM_VARIANT_MASK_IR_PROF (0x1ULL << 56)
> +#define LLVM_VARIANT_MASK_CSIR_PROF (0x1ULL << 57)
> +
> +/**
> + * struct llvm_prf_header - represents the raw profile header data structure.
> + * @magic: the magic token for the file format.
> + * @version: the version of the file format.
> + * @data_size: the number of entries in the profile data section.
> + * @padding_bytes_before_counters: the number of padding bytes before the
> + * counters.
> + * @counters_size: the size in bytes of the LLVM profile section containing the
> + * counters.
> + * @padding_bytes_after_counters: the number of padding bytes after the
> + * counters.
> + * @names_size: the size in bytes of the LLVM profile section containing the
> + * counters' names.
> + * @counters_delta: the beginning of the LLVM profile counters section.
> + * @names_delta: the beginning of the LLVM profile names section.
> + * @value_kind_last: the last profile value kind.
> + */
> +struct llvm_prf_header {
> + u64 magic;
> + u64 version;
> + u64 data_size;
> + u64 padding_bytes_before_counters;
> + u64 counters_size;
> + u64 padding_bytes_after_counters;
> + u64 names_size;
> + u64 counters_delta;
> + u64 names_delta;
> + u64 value_kind_last;
> +};
> +
> +/**
> + * struct llvm_prf_data - represents the per-function control structure.
> + * @name_ref: the reference to the function's name.
> + * @func_hash: the hash value of the function.
> + * @counter_ptr: a pointer to the profile counter.
> + * @function_ptr: a pointer to the function.
> + * @values: the profiling values associated with this function.
> + * @num_counters: the number of counters in the function.
> + * @num_value_sites: the number of value profile sites.
> + */
> +struct llvm_prf_data {
> + const u64 name_ref;
> + const u64 func_hash;
> + const void *counter_ptr;
> + const void *function_ptr;
> + void *values;
> + const u32 num_counters;
> + const u16 num_value_sites[LLVM_INSTR_PROF_IPVK_LAST + 1];
> +} __aligned(LLVM_INSTR_PROF_DATA_ALIGNMENT);
> +
> +/**
> + * struct llvm_prf_value_node_data - represents the data part of the struct
> + * llvm_prf_value_node data structure.
> + * @value: the value counters.
> + * @count: the counters' count.
> + */
> +struct llvm_prf_value_node_data {
> + u64 value;
> + u64 count;
> +};
> +
> +/**
> + * struct llvm_prf_value_node - represents an internal data structure used by
> + * the value profiler.
> + * @value: the value counters.
> + * @count: the counters' count.
> + * @next: the next value node.
> + */
> +struct llvm_prf_value_node {
> + u64 value;
> + u64 count;
> + struct llvm_prf_value_node *next;
> +};
> +
> +/**
> + * struct llvm_prf_value_data - represents the value profiling data in indexed
> + * format.
> + * @total_size: the total size in bytes including this field.
> + * @num_value_kinds: the number of value profile kinds that have value profile
> + * data.
> + */
> +struct llvm_prf_value_data {
> + u32 total_size;
> + u32 num_value_kinds;
> +};
> +
> +/**
> + * struct llvm_prf_value_record - represents the on-disk layout of the value
> + * profile data of a particular kind for one function.
> + * @kind: the kind of the value profile record.
> + * @num_value_sites: the number of value profile sites.
> + * @site_count_array: the first element of the array that stores the number
> + * of profiled values for each value site.
> + */
> +struct llvm_prf_value_record {
> + u32 kind;
> + u32 num_value_sites;
> + u8 site_count_array[];
> +};
> +
> +#define prf_get_value_record_header_size() \
> + offsetof(struct llvm_prf_value_record, site_count_array)
> +#define prf_get_value_record_site_count_size(sites) \
> + roundup((sites), 8)
> +#define prf_get_value_record_size(sites) \
> + (prf_get_value_record_header_size() + \
> + prf_get_value_record_site_count_size((sites)))
> +
> +/* Data sections */
> +extern struct llvm_prf_data __llvm_prf_data_start[];
> +extern struct llvm_prf_data __llvm_prf_data_end[];
> +
> +extern u64 __llvm_prf_cnts_start[];
> +extern u64 __llvm_prf_cnts_end[];
> +
> +extern char __llvm_prf_names_start[];
> +extern char __llvm_prf_names_end[];
> +
> +extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
> +extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
> +
> +/* Locking for vnodes */
> +extern unsigned long prf_lock(void);
> +extern void prf_unlock(unsigned long flags);
> +
> +#define __DEFINE_PRF_SIZE(s) \
> + static inline unsigned long prf_ ## s ## _size(void) \
> + { \
> + unsigned long start = \
> + (unsigned long)__llvm_prf_ ## s ## _start; \
> + unsigned long end = \
> + (unsigned long)__llvm_prf_ ## s ## _end; \
> + return roundup(end - start, \
> + sizeof(__llvm_prf_ ## s ## _start[0])); \
> + } \
> + static inline unsigned long prf_ ## s ## _count(void) \
> + { \
> + return prf_ ## s ## _size() / \
> + sizeof(__llvm_prf_ ## s ## _start[0]); \
> + }
> +
> +__DEFINE_PRF_SIZE(data);
> +__DEFINE_PRF_SIZE(cnts);
> +__DEFINE_PRF_SIZE(names);
> +__DEFINE_PRF_SIZE(vnds);
> +
> +#undef __DEFINE_PRF_SIZE
> +
> +#endif /* _PGO_H */
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 213677a5ed33..9b218afb5cb8 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
> $(CFLAGS_GCOV))
> endif
>
> +#
> +# Enable clang's PGO profiling flags for a file or directory depending on
> +# variables PGO_PROFILE_obj.o and PGO_PROFILE.
> +#
> +ifeq ($(CONFIG_PGO_CLANG),y)
> +_c_flags += $(if $(patsubst n%,, \
> + $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
> + $(CFLAGS_PGO_CLANG))
> +endif
> +
> #
> # Enable address sanitizer flags for kernel except some files or directories
> # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
> --
> 2.30.0.280.ga3ce27912f-goog
>
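
For anyone checking the memop bucketing in inst_prof_get_range_rep_value()
quoted above, here is a minimal userspace sketch of the same mapping. It is an
illustration only and not part of the patch: the name rep_value() is
hypothetical, __builtin_popcountll() stands in for the kernel's hweight64(),
and the shift uses 1ULL instead of the patch's int literal.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace sketch of the range mapping (hypothetical helper name).
 * Requires a compiler with GCC/Clang builtins.
 */
static uint64_t rep_value(uint64_t value)
{
	if (value <= 8)
		return value;		/* 0..8 are tracked individually */
	if (value >= 513)
		return 513;		/* everything large shares one bucket */
	if (__builtin_popcountll(value) == 1)
		return value;		/* exact powers of two keep their value */

	/* Otherwise: previous power of two, plus one. */
	return (1ULL << (64 - __builtin_clzll(value) - 1)) + 1;
}

/* A few spot checks of the buckets described in the comments above. */
static void rep_value_spot_checks(void)
{
	assert(rep_value(7) == 7);
	assert(rep_value(16) == 16);
	assert(rep_value(100) == 65);	/* 64 < 100 < 128, so 64 + 1 */
	assert(rep_value(511) == 257);	/* 256 < 511 < 512, so 256 + 1 */
	assert(rep_value(4096) == 513);
}
```

The effect is that a memop size between two powers of two is counted under a
single representative value, which keeps the number of distinct values per
site small.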

2021-02-22 22:44:40

by Bill Wendling

[permalink] [raw]
Subject: Re: [PATCH v7] pgo: add clang's Profile Guided Optimization infrastructure

Another bump for review. :-)


On Wed, Feb 10, 2021 at 3:25 PM Bill Wendling <[email protected]> wrote:
>
> Bumping for review from Masahiro Yamada and Andrew Morton.
>
> -bw
>
> On Fri, Jan 22, 2021 at 2:12 AM Bill Wendling <[email protected]> wrote:
> >
> > From: Sami Tolvanen <[email protected]>
> >
> > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > profile, the kernel is instrumented with PGO counters, a representative
> > workload is run, and the raw profile data is collected from
> > /sys/kernel/debug/pgo/profraw.
> >
> > The raw profile data must be processed by clang's "llvm-profdata" tool
> > before it can be used during recompilation:
> >
> > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >
> > Multiple raw profiles may be merged during this step.
> >
> > The data can now be used by the compiler:
> >
> > $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> >
> > This initial submission is restricted to x86, as that's the platform we
> > know works. This restriction can be lifted once other platforms have
> > been verified to work with PGO.
> >
> > Note that this method of profiling the kernel is clang-native, unlike
> > the clang support in kernel/gcov.
> >
> > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> >
> > Signed-off-by: Sami Tolvanen <[email protected]>
> > Co-developed-by: Bill Wendling <[email protected]>
> > Signed-off-by: Bill Wendling <[email protected]>
> > Tested-by: Nick Desaulniers <[email protected]>
> > ---
> > v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> > testing.
> > - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> > Song's comments.
> > v3: - Added change log section based on Sedat Dilek's comments.
> > v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
> > own popcount implementation, based on Nick Desaulniers's comment.
> > v5: - Correct padding calculation, discovered by Nathan Chancellor.
> > v6: - Add better documentation about the locking scheme and other things.
> > - Rename macros to better match the same macros in LLVM's source code.
> > v7: - Fix minor build failure reported by Sedat.
> > ---
> > Documentation/dev-tools/index.rst | 1 +
> > Documentation/dev-tools/pgo.rst | 127 +++++++++
> > MAINTAINERS | 9 +
> > Makefile | 3 +
> > arch/Kconfig | 1 +
> > arch/x86/Kconfig | 1 +
> > arch/x86/boot/Makefile | 1 +
> > arch/x86/boot/compressed/Makefile | 1 +
> > arch/x86/crypto/Makefile | 4 +
> > arch/x86/entry/vdso/Makefile | 1 +
> > arch/x86/kernel/vmlinux.lds.S | 2 +
> > arch/x86/platform/efi/Makefile | 1 +
> > arch/x86/purgatory/Makefile | 1 +
> > arch/x86/realmode/rm/Makefile | 1 +
> > arch/x86/um/vdso/Makefile | 1 +
> > drivers/firmware/efi/libstub/Makefile | 1 +
> > include/asm-generic/vmlinux.lds.h | 44 +++
> > kernel/Makefile | 1 +
> > kernel/pgo/Kconfig | 35 +++
> > kernel/pgo/Makefile | 5 +
> > kernel/pgo/fs.c | 389 ++++++++++++++++++++++++++
> > kernel/pgo/instrument.c | 189 +++++++++++++
> > kernel/pgo/pgo.h | 203 ++++++++++++++
> > scripts/Makefile.lib | 10 +
> > 24 files changed, 1032 insertions(+)
> > create mode 100644 Documentation/dev-tools/pgo.rst
> > create mode 100644 kernel/pgo/Kconfig
> > create mode 100644 kernel/pgo/Makefile
> > create mode 100644 kernel/pgo/fs.c
> > create mode 100644 kernel/pgo/instrument.c
> > create mode 100644 kernel/pgo/pgo.h
> >
> > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > index f7809c7b1ba9..8d6418e85806 100644
> > --- a/Documentation/dev-tools/index.rst
> > +++ b/Documentation/dev-tools/index.rst
> > @@ -26,6 +26,7 @@ whole; patches welcome!
> > kgdb
> > kselftest
> > kunit/index
> > + pgo
> >
> >
> > .. only:: subproject and html
> > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> > new file mode 100644
> > index 000000000000..b7f11d8405b7
> > --- /dev/null
> > +++ b/Documentation/dev-tools/pgo.rst
> > @@ -0,0 +1,127 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +===============================
> > +Using PGO with the Linux kernel
> > +===============================
> > +
> > +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> > +when building with Clang. The profiling data is exported via the ``pgo``
> > +debugfs directory.
> > +
> > +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > +
> > +
> > +Preparation
> > +===========
> > +
> > +Configure the kernel with:
> > +
> > +.. code-block:: make
> > +
> > + CONFIG_DEBUG_FS=y
> > + CONFIG_PGO_CLANG=y
> > +
> > +Note that kernels compiled with profiling flags will be significantly larger
> > +and run slower.
> > +
> > +Profiling data will only become accessible once debugfs has been mounted:
> > +
> > +.. code-block:: sh
> > +
> > + mount -t debugfs none /sys/kernel/debug
> > +
> > +
> > +Customization
> > +=============
> > +
> > +You can enable or disable profiling for individual files and directories by
> > +adding a line similar to the following to the respective kernel Makefile:
> > +
> > +- For a single file (e.g. main.o)
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE_main.o := y
> > +
> > +- For all files in one directory
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE := y
> > +
> > +To exclude files from being profiled use
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE_main.o := n
> > +
> > +and
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE := n
> > +
> > +Only files which are linked to the main kernel image or are compiled as kernel
> > +modules are supported by this mechanism.
> > +
> > +
> > +Files
> > +=====
> > +
> > +The PGO kernel support creates the following files in debugfs:
> > +
> > +``/sys/kernel/debug/pgo``
> > + Parent directory for all PGO-related files.
> > +
> > +``/sys/kernel/debug/pgo/reset``
> > + Global reset file: resets all profiling data to zero when written to.
> > +
> > +``/sys/kernel/debug/pgo/profraw``
> > + The raw PGO data that must be processed with ``llvm-profdata``.
> > +
> > +
> > +Workflow
> > +========
> > +
> > +The PGO kernel can be run on the host or test machines. The data, though,
> > +should be analyzed with tools from the same Clang version that compiled the
> > +kernel. Clang is tolerant of version skew, but it's easier to use the same
> > +Clang version.
> > +
> > +The profiling data is useful for optimizing the kernel, analyzing coverage,
> > +etc. Clang offers tools to perform these tasks.
> > +
> > +Here is an example workflow for profiling an instrumented kernel with PGO and
> > +using the result to optimize the kernel:
> > +
> > +1) Install the kernel on the TEST machine.
> > +
> > +2) Reset the data counters right before running the load tests
> > +
> > + .. code-block:: sh
> > +
> > + $ echo 1 > /sys/kernel/debug/pgo/reset
> > +
> > +3) Run the load tests.
> > +
> > +4) Collect the raw profile data
> > +
> > + .. code-block:: sh
> > +
> > + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > +
> > +5) (Optional) Download the raw profile data to the HOST machine.
> > +
> > +6) Process the raw profile data
> > +
> > + .. code-block:: sh
> > +
> > + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > +
> > + Note that multiple raw profile data files can be merged during this step.
> > +
> > +7) Rebuild the kernel using the profile data (with CONFIG_PGO_CLANG disabled)
> > +
> > + .. code-block:: sh
> > +
> > + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 705776b31c8d..0a75d223682d 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -13948,6 +13948,15 @@ S: Maintained
> > F: include/linux/personality.h
> > F: include/uapi/linux/personality.h
> >
> > +PGO BASED KERNEL PROFILING
> > +M: Sami Tolvanen <[email protected]>
> > +M: Bill Wendling <[email protected]>
> > +R: Nathan Chancellor <[email protected]>
> > +R: Nick Desaulniers <[email protected]>
> > +S: Supported
> > +F: Documentation/dev-tools/pgo.rst
> > +F: kernel/pgo
> > +
> > PHOENIX RC FLIGHT CONTROLLER ADAPTER
> > M: Marcus Folkesson <[email protected]>
> > L: [email protected]
> > diff --git a/Makefile b/Makefile
> > index b0e4767735dc..9339541f7cec 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -659,6 +659,9 @@ endif # KBUILD_EXTMOD
> > # Defaults to vmlinux, but the arch makefile usually adds further targets
> > all: vmlinux
> >
> > +CFLAGS_PGO_CLANG := -fprofile-generate
> > +export CFLAGS_PGO_CLANG
> > +
> > CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
> > $(call cc-option,-fno-tree-loop-im) \
> > $(call cc-disable-warning,maybe-uninitialized,)
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index 24862d15f3a3..f39d3991f6bf 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -1112,6 +1112,7 @@ config ARCH_SPLIT_ARG64
> > pairs of 32-bit arguments, select this option.
> >
> > source "kernel/gcov/Kconfig"
> > +source "kernel/pgo/Kconfig"
> >
> > source "scripts/gcc-plugins/Kconfig"
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 21f851179ff0..36305ea61dc0 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -96,6 +96,7 @@ config X86
> > select ARCH_SUPPORTS_DEBUG_PAGEALLOC
> > select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
> > select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
> > + select ARCH_SUPPORTS_PGO_CLANG if X86_64
> > select ARCH_USE_BUILTIN_BSWAP
> > select ARCH_USE_QUEUED_RWLOCKS
> > select ARCH_USE_QUEUED_SPINLOCKS
> > diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> > index fe605205b4ce..383853e32f67 100644
> > --- a/arch/x86/boot/Makefile
> > +++ b/arch/x86/boot/Makefile
> > @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> > KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE := n
> >
> > $(obj)/bzImage: asflags-y := $(SVGA_MODE)
> > diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> > index e0bc3988c3fa..ed12ab65f606 100644
> > --- a/arch/x86/boot/compressed/Makefile
> > +++ b/arch/x86/boot/compressed/Makefile
> > @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
> >
> > KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE :=n
> >
> > KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
> > diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> > index a31de0c6ccde..5753aea7bcbd 100644
> > --- a/arch/x86/crypto/Makefile
> > +++ b/arch/x86/crypto/Makefile
> > @@ -4,6 +4,10 @@
> >
> > OBJECT_FILES_NON_STANDARD := y
> >
> > +# Disable PGO for curve25519-x86_64. With PGO enabled, clang runs out of
> > +# registers for some of the functions.
> > +PGO_PROFILE_curve25519-x86_64.o := n
> > +
> > obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o
> >
> > obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
> > diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> > index 02e3e42f380b..26e2b3af0145 100644
> > --- a/arch/x86/entry/vdso/Makefile
> > +++ b/arch/x86/entry/vdso/Makefile
> > @@ -179,6 +179,7 @@ quiet_cmd_vdso = VDSO $@
> > VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
> > $(call ld-option, --eh-frame-hdr) -Bsymbolic
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > quiet_cmd_vdso_and_check = VDSO $@
> > cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
> > diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> > index efd9e9ea17f2..f6cab2316c46 100644
> > --- a/arch/x86/kernel/vmlinux.lds.S
> > +++ b/arch/x86/kernel/vmlinux.lds.S
> > @@ -184,6 +184,8 @@ SECTIONS
> >
> > BUG_TABLE
> >
> > + PGO_CLANG_DATA
> > +
> > ORC_UNWIND_TABLE
> >
> > . = ALIGN(PAGE_SIZE);
> > diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
> > index 84b09c230cbd..5f22b31446ad 100644
> > --- a/arch/x86/platform/efi/Makefile
> > +++ b/arch/x86/platform/efi/Makefile
> > @@ -2,6 +2,7 @@
> > OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
> > KASAN_SANITIZE := n
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
> > obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
> > diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> > index 95ea17a9d20c..36f20e99da0b 100644
> > --- a/arch/x86/purgatory/Makefile
> > +++ b/arch/x86/purgatory/Makefile
> > @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
> >
> > # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > KASAN_SANITIZE := n
> > UBSAN_SANITIZE := n
> > KCSAN_SANITIZE := n
> > diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> > index 83f1b6a56449..21797192f958 100644
> > --- a/arch/x86/realmode/rm/Makefile
> > +++ b/arch/x86/realmode/rm/Makefile
> > @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> > KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE := n
> > diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
> > index 5943387e3f35..54f5768f5853 100644
> > --- a/arch/x86/um/vdso/Makefile
> > +++ b/arch/x86/um/vdso/Makefile
> > @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@
> >
> > VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > #
> > # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
> > diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> > index 8a94388e38b3..2d81623b33f2 100644
> > --- a/drivers/firmware/efi/libstub/Makefile
> > +++ b/drivers/firmware/efi/libstub/Makefile
> > @@ -40,6 +40,7 @@ KBUILD_CFLAGS := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
> > KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
> >
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > # Sanitizer runtimes are unavailable and cannot be linked here.
> > KASAN_SANITIZE := n
> > KCSAN_SANITIZE := n
> > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > index b2b3d81b1535..3a591bb18c5f 100644
> > --- a/include/asm-generic/vmlinux.lds.h
> > +++ b/include/asm-generic/vmlinux.lds.h
> > @@ -316,6 +316,49 @@
> > #define THERMAL_TABLE(name)
> > #endif
> >
> > +#ifdef CONFIG_PGO_CLANG
> > +#define PGO_CLANG_DATA \
> > + __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
> > + . = ALIGN(8); \
> > + __llvm_prf_start = .; \
> > + __llvm_prf_data_start = .; \
> > + KEEP(*(__llvm_prf_data)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_data_end = .; \
> > + } \
> > + __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
> > + . = ALIGN(8); \
> > + __llvm_prf_cnts_start = .; \
> > + KEEP(*(__llvm_prf_cnts)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_cnts_end = .; \
> > + } \
> > + __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
> > + . = ALIGN(8); \
> > + __llvm_prf_names_start = .; \
> > + KEEP(*(__llvm_prf_names)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_names_end = .; \
> > + . = ALIGN(8); \
> > + } \
> > + __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
> > + __llvm_prf_vals_start = .; \
> > + KEEP(*(__llvm_prf_vals)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_vals_end = .; \
> > + . = ALIGN(8); \
> > + } \
> > + __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
> > + __llvm_prf_vnds_start = .; \
> > + KEEP(*(__llvm_prf_vnds)) \
> > + . = ALIGN(8); \
> > + __llvm_prf_vnds_end = .; \
> > + __llvm_prf_end = .; \
> > + }
> > +#else
> > +#define PGO_CLANG_DATA
> > +#endif
> > +
> > #define KERNEL_DTB() \
> > STRUCT_ALIGN(); \
> > __dtb_start = .; \
> > @@ -1125,6 +1168,7 @@
> > CONSTRUCTORS \
> > } \
> > BUG_TABLE \
> > + PGO_CLANG_DATA
> >
> > #define INIT_TEXT_SECTION(inittext_align) \
> > . = ALIGN(inittext_align); \
> > diff --git a/kernel/Makefile b/kernel/Makefile
> > index aa7368c7eabf..0b34ca228ba4 100644
> > --- a/kernel/Makefile
> > +++ b/kernel/Makefile
> > @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
> > obj-$(CONFIG_KCSAN) += kcsan/
> > obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
> > obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
> > +obj-$(CONFIG_PGO_CLANG) += pgo/
> >
> > obj-$(CONFIG_PERF_EVENTS) += events/
> >
> > diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
> > new file mode 100644
> > index 000000000000..76a640b6cf6e
> > --- /dev/null
> > +++ b/kernel/pgo/Kconfig
> > @@ -0,0 +1,35 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
> > +
> > +config ARCH_SUPPORTS_PGO_CLANG
> > + bool
> > +
> > +config PGO_CLANG
> > + bool "Enable clang's PGO-based kernel profiling"
> > + depends on DEBUG_FS
> > + depends on ARCH_SUPPORTS_PGO_CLANG
> > + depends on CC_IS_CLANG && CLANG_VERSION >= 120000
> > + help
> > + This option enables clang's PGO (Profile Guided Optimization) based
> > + code profiling to better optimize the kernel.
> > +
> > + If unsure, say N.
> > +
> > + Run a representative workload for your application on a kernel
> > + compiled with this option and download the raw profile file from
> > + /sys/kernel/debug/pgo/profraw. This file needs to be processed with
> > + llvm-profdata. It may be merged with other collected raw profiles.
> > +
> > + Copy the resulting profile file into vmlinux.profdata, and enable
> > + KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
> > + kernel.
> > +
> > + Note that a kernel compiled with profiling flags will be
> > + significantly larger and run slower. Also be sure to exclude files
> > + from profiling which are not linked to the kernel image to prevent
> > + linker errors.
> > +
> > + Note that the debugfs filesystem has to be mounted to access
> > + profiling data.
> > +
> > +endmenu
> > diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
> > new file mode 100644
> > index 000000000000..41e27cefd9a4
> > --- /dev/null
> > +++ b/kernel/pgo/Makefile
> > @@ -0,0 +1,5 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > +
> > +obj-y += fs.o instrument.o
> > diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
> > new file mode 100644
> > index 000000000000..1678df3b7d64
> > --- /dev/null
> > +++ b/kernel/pgo/fs.c
> > @@ -0,0 +1,389 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + * Sami Tolvanen <[email protected]>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#define pr_fmt(fmt) "pgo: " fmt
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/debugfs.h>
> > +#include <linux/fs.h>
> > +#include <linux/module.h>
> > +#include <linux/slab.h>
> > +#include <linux/vmalloc.h>
> > +#include "pgo.h"
> > +
> > +static struct dentry *directory;
> > +
> > +struct prf_private_data {
> > + void *buffer;
> > + unsigned long size;
> > +};
> > +
> > +/*
> > + * Raw profile data format:
> > + *
> > + * - llvm_prf_header
> > + * - __llvm_prf_data
> > + * - __llvm_prf_cnts
> > + * - __llvm_prf_names
> > + * - zero padding to 8 bytes
> > + * - for each llvm_prf_data in __llvm_prf_data:
> > + * - llvm_prf_value_data
> > + * - llvm_prf_value_record + site count array
> > + * - llvm_prf_value_node_data
> > + * ...
> > + * ...
> > + * ...
> > + */
> > +
> > +static void prf_fill_header(void **buffer)
> > +{
> > + struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
> > +
> > +#ifdef CONFIG_64BIT
> > + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_64;
> > +#else
> > + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_32;
> > +#endif
> > + header->version = LLVM_VARIANT_MASK_IR_PROF | LLVM_INSTR_PROF_RAW_VERSION;
> > + header->data_size = prf_data_count();
> > + header->padding_bytes_before_counters = 0;
> > + header->counters_size = prf_cnts_count();
> > + header->padding_bytes_after_counters = 0;
> > + header->names_size = prf_names_count();
> > + header->counters_delta = (u64)__llvm_prf_cnts_start;
> > + header->names_delta = (u64)__llvm_prf_names_start;
> > + header->value_kind_last = LLVM_INSTR_PROF_IPVK_LAST;
> > +
> > + *buffer += sizeof(*header);
> > +}
> > +
> > +/*
> > + * Copy the source into the buffer, incrementing the pointer into buffer in the
> > + * process.
> > + */
> > +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
> > +{
> > + memcpy(*buffer, src, size);
> > + *buffer += size;
> > +}
> > +
> > +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
> > +{
> > + struct llvm_prf_value_node **nodes =
> > + (struct llvm_prf_value_node **)p->values;
> > + u32 kinds = 0;
> > + u32 size = 0;
> > + unsigned int kind;
> > + unsigned int n;
> > + unsigned int s = 0;
> > +
> > + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> > + unsigned int sites = p->num_value_sites[kind];
> > +
> > + if (!sites)
> > + continue;
> > +
> > + /* Record + site count array */
> > + size += prf_get_value_record_size(sites);
> > + kinds++;
> > +
> > + if (!nodes)
> > + continue;
> > +
> > + for (n = 0; n < sites; n++) {
> > + u32 count = 0;
> > + struct llvm_prf_value_node *site = nodes[s + n];
> > +
> > + while (site && ++count <= U8_MAX)
> > + site = site->next;
> > +
> > + size += count *
> > + sizeof(struct llvm_prf_value_node_data);
> > + }
> > +
> > + s += sites;
> > + }
> > +
> > + if (size)
> > + size += sizeof(struct llvm_prf_value_data);
> > +
> > + if (value_kinds)
> > + *value_kinds = kinds;
> > +
> > + return size;
> > +}
> > +
> > +static u32 prf_get_value_size(void)
> > +{
> > + u32 size = 0;
> > + struct llvm_prf_data *p;
> > +
> > + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> > + size += __prf_get_value_size(p, NULL);
> > +
> > + return size;
> > +}
> > +
> > +/* Serialize the value profiling data of one function. */
> > +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
> > +{
> > + struct llvm_prf_value_data header;
> > + struct llvm_prf_value_node **nodes =
> > + (struct llvm_prf_value_node **)p->values;
> > + unsigned int kind;
> > + unsigned int n;
> > + unsigned int s = 0;
> > +
> > + header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
> > +
> > + if (!header.num_value_kinds)
> > + /* Nothing to write. */
> > + return;
> > +
> > + prf_copy_to_buffer(buffer, &header, sizeof(header));
> > +
> > + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> > + struct llvm_prf_value_record *record;
> > + u8 *counts;
> > + unsigned int sites = p->num_value_sites[kind];
> > +
> > + if (!sites)
> > + continue;
> > +
> > + /* Profiling value record. */
> > + record = *(struct llvm_prf_value_record **)buffer;
> > + *buffer += prf_get_value_record_header_size();
> > +
> > + record->kind = kind;
> > + record->num_value_sites = sites;
> > +
> > + /* Site count array. */
> > + counts = *(u8 **)buffer;
> > + *buffer += prf_get_value_record_site_count_size(sites);
> > +
> > + /*
> > + * If we don't have nodes, we can skip updating the site count
> > + * array, because the buffer is zero filled.
> > + */
> > + if (!nodes)
> > + continue;
> > +
> > + for (n = 0; n < sites; n++) {
> > + u32 count = 0;
> > + struct llvm_prf_value_node *site = nodes[s + n];
> > +
> > + while (site && ++count <= U8_MAX) {
> > + prf_copy_to_buffer(buffer, site,
> > + sizeof(struct llvm_prf_value_node_data));
> > + site = site->next;
> > + }
> > +
> > + counts[n] = (u8)count;
> > + }
> > +
> > + s += sites;
> > + }
> > +}
> > +
> > +static void prf_serialize_values(void **buffer)
> > +{
> > + struct llvm_prf_data *p;
> > +
> > + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> > + prf_serialize_value(p, buffer);
> > +}
> > +
> > +static inline unsigned long prf_get_padding(unsigned long size)
> > +{
> > + return 7 & (sizeof(u64) - size % sizeof(u64));
> > +}
> > +
> > +static unsigned long prf_buffer_size(void)
> > +{
> > + return sizeof(struct llvm_prf_header) +
> > + prf_data_size() +
> > + prf_cnts_size() +
> > + prf_names_size() +
> > + prf_get_padding(prf_names_size()) +
> > + prf_get_value_size();
> > +}
> > +
> > +/*
> > + * Serialize the profiling data into a format LLVM's tools can understand.
> > + * Note: caller *must* hold pgo_lock.
> > + */
> > +static int prf_serialize(struct prf_private_data *p)
> > +{
> > + int err = 0;
> > + void *buffer;
> > +
> > + p->size = prf_buffer_size();
> > + p->buffer = vzalloc(p->size);
> > +
> > + if (!p->buffer) {
> > + err = -ENOMEM;
> > + goto out;
> > + }
> > +
> > + buffer = p->buffer;
> > +
> > + prf_fill_header(&buffer);
> > + prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
> > + prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
> > + prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
> > + buffer += prf_get_padding(prf_names_size());
> > +
> > + prf_serialize_values(&buffer);
> > +
> > +out:
> > + return err;
> > +}
> > +
> > +/* open() implementation for PGO. Creates a copy of the profiling data set. */
> > +static int prf_open(struct inode *inode, struct file *file)
> > +{
> > + struct prf_private_data *data;
> > + unsigned long flags;
> > + int err;
> > +
> > + data = kzalloc(sizeof(*data), GFP_KERNEL);
> > + if (!data) {
> > + err = -ENOMEM;
> > + goto out;
> > + }
> > +
> > + flags = prf_lock();
> > +
> > + err = prf_serialize(data);
> > + if (unlikely(err)) {
> > + kfree(data);
> > + goto out_unlock;
> > + }
> > +
> > + file->private_data = data;
> > +
> > +out_unlock:
> > + prf_unlock(flags);
> > +out:
> > + return err;
> > +}
> > +
> > +/* read() implementation for PGO. */
> > +static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
> > + loff_t *ppos)
> > +{
> > + struct prf_private_data *data = file->private_data;
> > +
> > + BUG_ON(!data);
> > +
> > + return simple_read_from_buffer(buf, count, ppos, data->buffer,
> > + data->size);
> > +}
> > +
> > +/* release() implementation for PGO. Release resources allocated by open(). */
> > +static int prf_release(struct inode *inode, struct file *file)
> > +{
> > + struct prf_private_data *data = file->private_data;
> > +
> > + if (data) {
> > + vfree(data->buffer);
> > + kfree(data);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static const struct file_operations prf_fops = {
> > + .owner = THIS_MODULE,
> > + .open = prf_open,
> > + .read = prf_read,
> > + .llseek = default_llseek,
> > + .release = prf_release
> > +};
> > +
> > +/* write() implementation for resetting PGO's profile data. */
> > +static ssize_t reset_write(struct file *file, const char __user *addr,
> > + size_t len, loff_t *pos)
> > +{
> > + struct llvm_prf_data *data;
> > +
> > + memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
> > +
> > + for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; data++) {
> > + struct llvm_prf_value_node **vnodes;
> > + u64 current_vsite_count;
> > + u32 i;
> > +
> > + if (!data->values)
> > + continue;
> > +
> > + current_vsite_count = 0;
> > + vnodes = (struct llvm_prf_value_node **)data->values;
> > +
> > + for (i = LLVM_INSTR_PROF_IPVK_FIRST; i <= LLVM_INSTR_PROF_IPVK_LAST; i++)
> > + current_vsite_count += data->num_value_sites[i];
> > +
> > + for (i = 0; i < current_vsite_count; i++) {
> > + struct llvm_prf_value_node *current_vnode = vnodes[i];
> > +
> > + while (current_vnode) {
> > + current_vnode->count = 0;
> > + current_vnode = current_vnode->next;
> > + }
> > + }
> > + }
> > +
> > + return len;
> > +}
> > +
> > +static const struct file_operations prf_reset_fops = {
> > + .owner = THIS_MODULE,
> > + .write = reset_write,
> > + .llseek = noop_llseek,
> > +};
> > +
> > +/* Create debugfs entries. */
> > +static int __init pgo_init(void)
> > +{
> > + directory = debugfs_create_dir("pgo", NULL);
> > + if (!directory)
> > + goto err_remove;
> > +
> > + if (!debugfs_create_file("profraw", 0600, directory, NULL,
> > + &prf_fops))
> > + goto err_remove;
> > +
> > + if (!debugfs_create_file("reset", 0200, directory, NULL,
> > + &prf_reset_fops))
> > + goto err_remove;
> > +
> > + return 0;
> > +
> > +err_remove:
> > + pr_err("initialization failed\n");
> > + return -EIO;
> > +}
> > +
> > +/* Remove debugfs entries. */
> > +static void __exit pgo_exit(void)
> > +{
> > + debugfs_remove_recursive(directory);
> > +}
> > +
> > +module_init(pgo_init);
> > +module_exit(pgo_exit);
> > diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
> > new file mode 100644
> > index 000000000000..62ff5cfce7b1
> > --- /dev/null
> > +++ b/kernel/pgo/instrument.c
> > @@ -0,0 +1,189 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + * Sami Tolvanen <[email protected]>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#define pr_fmt(fmt) "pgo: " fmt
> > +
> > +#include <linux/bitops.h>
> > +#include <linux/kernel.h>
> > +#include <linux/export.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/types.h>
> > +#include "pgo.h"
> > +
> > +/*
> > + * This lock guards both profile count updating and serialization of the
> > + * profiling data. Keeping both of these activities separate via locking
> > + * ensures that we don't try to serialize data that's only partially updated.
> > + */
> > +static DEFINE_SPINLOCK(pgo_lock);
> > +static int current_node;
> > +
> > +unsigned long prf_lock(void)
> > +{
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&pgo_lock, flags);
> > +
> > + return flags;
> > +}
> > +
> > +void prf_unlock(unsigned long flags)
> > +{
> > + spin_unlock_irqrestore(&pgo_lock, flags);
> > +}
> > +
> > +/*
> > + * Return a newly allocated profiling value node, which holds a value tracked
> > + * by the value profiler.
> > + * Note: caller *must* hold pgo_lock.
> > + */
> > +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
> > + u32 index, u64 value)
> > +{
> > + if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
> > + return NULL; /* Out of nodes */
> > +
> > + current_node++;
> > +
> > + /* Make sure the node is entirely within the section */
> > + if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
> > + &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
> > + return NULL;
> > +
> > + return &__llvm_prf_vnds_start[current_node];
> > +}
> > +
> > +/*
> > + * Counts the number of times a target value is seen.
> > + *
> > + * Records the target value for the index if not seen before. Otherwise,
> > + * increments the counter associated with the target value.
> > + */
> > +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
> > +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
> > +{
> > + struct llvm_prf_data *p = (struct llvm_prf_data *)data;
> > + struct llvm_prf_value_node **counters;
> > + struct llvm_prf_value_node *curr;
> > + struct llvm_prf_value_node *min = NULL;
> > + struct llvm_prf_value_node *prev = NULL;
> > + u64 min_count = U64_MAX;
> > + u8 values = 0;
> > + unsigned long flags;
> > +
> > + if (!p || !p->values)
> > + return;
> > +
> > + counters = (struct llvm_prf_value_node **)p->values;
> > + curr = counters[index];
> > +
> > + while (curr) {
> > + if (target_value == curr->value) {
> > + curr->count++;
> > + return;
> > + }
> > +
> > + if (curr->count < min_count) {
> > + min_count = curr->count;
> > + min = curr;
> > + }
> > +
> > + prev = curr;
> > + curr = curr->next;
> > + values++;
> > + }
> > +
> > + if (values >= LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE) {
> > + if (!min->count || !(--min->count)) {
> > + curr = min;
> > + curr->value = target_value;
> > + curr->count++;
> > + }
> > + return;
> > + }
> > +
> > + /* Lock when updating the value node structure. */
> > + flags = prf_lock();
> > +
> > + curr = allocate_node(p, index, target_value);
> > + if (!curr)
> > + goto out;
> > +
> > + curr->value = target_value;
> > + curr->count++;
> > +
> > + if (!counters[index])
> > + counters[index] = curr;
> > + else if (prev && !prev->next)
> > + prev->next = curr;
> > +
> > +out:
> > + prf_unlock(flags);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_target);
> > +
> > +/* Counts the number of times a range of target values is seen. */
> > +void __llvm_profile_instrument_range(u64 target_value, void *data,
> > + u32 index, s64 precise_start,
> > + s64 precise_last, s64 large_value);
> > +void __llvm_profile_instrument_range(u64 target_value, void *data,
> > + u32 index, s64 precise_start,
> > + s64 precise_last, s64 large_value)
> > +{
> > + if (large_value != S64_MIN && (s64)target_value >= large_value)
> > + target_value = large_value;
> > + else if ((s64)target_value < precise_start ||
> > + (s64)target_value > precise_last)
> > + target_value = precise_last + 1;
> > +
> > + __llvm_profile_instrument_target(target_value, data, index);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_range);
> > +
> > +static u64 inst_prof_get_range_rep_value(u64 value)
> > +{
> > + if (value <= 8)
> > + /* The first ranges are individually tracked. Use the value as is. */
> > + return value;
> > + else if (value >= 513)
> > + /* The last range is mapped to its lowest value. */
> > + return 513;
> > + else if (hweight64(value) == 1)
> > + /* If it's a power of two, use it as is. */
> > + return value;
> > +
> > + /* Otherwise, map it to the previous power of two + 1. */
> > + return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
> > +}
> > +
> > +/*
> > + * The target values are partitioned into multiple ranges. The range spec is
> > + * defined in compiler-rt/include/profile/InstrProfData.inc.
> > + */
> > +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> > + u32 counter_index);
> > +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> > + u32 counter_index)
> > +{
> > + u64 rep_value;
> > +
> > + /* Map the target value to the representative value of its range. */
> > + rep_value = inst_prof_get_range_rep_value(target_value);
> > + __llvm_profile_instrument_target(rep_value, data, counter_index);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_memop);
> > diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
> > new file mode 100644
> > index 000000000000..ddc8d3002fe5
> > --- /dev/null
> > +++ b/kernel/pgo/pgo.h
> > @@ -0,0 +1,203 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + * Sami Tolvanen <[email protected]>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#ifndef _PGO_H
> > +#define _PGO_H
> > +
> > +/*
> > + * Note: These internal LLVM definitions must match the compiler version.
> > + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
> > + */
> > +
> > +#define LLVM_INSTR_PROF_RAW_MAGIC_64 \
> > + ((u64)255 << 56 | \
> > + (u64)'l' << 48 | \
> > + (u64)'p' << 40 | \
> > + (u64)'r' << 32 | \
> > + (u64)'o' << 24 | \
> > + (u64)'f' << 16 | \
> > + (u64)'r' << 8 | \
> > + (u64)129)
> > +#define LLVM_INSTR_PROF_RAW_MAGIC_32 \
> > + ((u64)255 << 56 | \
> > + (u64)'l' << 48 | \
> > + (u64)'p' << 40 | \
> > + (u64)'r' << 32 | \
> > + (u64)'o' << 24 | \
> > + (u64)'f' << 16 | \
> > + (u64)'R' << 8 | \
> > + (u64)129)
> > +
> > +#define LLVM_INSTR_PROF_RAW_VERSION 5
> > +#define LLVM_INSTR_PROF_DATA_ALIGNMENT 8
> > +#define LLVM_INSTR_PROF_IPVK_FIRST 0
> > +#define LLVM_INSTR_PROF_IPVK_LAST 1
> > +#define LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE 255
> > +
> > +#define LLVM_VARIANT_MASK_IR_PROF (0x1ULL << 56)
> > +#define LLVM_VARIANT_MASK_CSIR_PROF (0x1ULL << 57)
> > +
> > +/**
> > + * struct llvm_prf_header - represents the raw profile header data structure.
> > + * @magic: the magic token for the file format.
> > + * @version: the version of the file format.
> > + * @data_size: the number of entries in the profile data section.
> > + * @padding_bytes_before_counters: the number of padding bytes before the
> > + * counters.
> > + * @counters_size: the size in bytes of the LLVM profile section containing the
> > + * counters.
> > + * @padding_bytes_after_counters: the number of padding bytes after the
> > + * counters.
> > + * @names_size: the size in bytes of the LLVM profile section containing the
> > + * counters' names.
> > + * @counters_delta: the beginning of the LLVM profile counters section.
> > + * @names_delta: the beginning of the LLVM profile names section.
> > + * @value_kind_last: the last profile value kind.
> > + */
> > +struct llvm_prf_header {
> > + u64 magic;
> > + u64 version;
> > + u64 data_size;
> > + u64 padding_bytes_before_counters;
> > + u64 counters_size;
> > + u64 padding_bytes_after_counters;
> > + u64 names_size;
> > + u64 counters_delta;
> > + u64 names_delta;
> > + u64 value_kind_last;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_data - represents the per-function control structure.
> > + * @name_ref: the reference to the function's name.
> > + * @func_hash: the hash value of the function.
> > + * @counter_ptr: a pointer to the profile counter.
> > + * @function_ptr: a pointer to the function.
> > + * @values: the profiling values associated with this function.
> > + * @num_counters: the number of counters in the function.
> > + * @num_value_sites: the number of value profile sites.
> > + */
> > +struct llvm_prf_data {
> > + const u64 name_ref;
> > + const u64 func_hash;
> > + const void *counter_ptr;
> > + const void *function_ptr;
> > + void *values;
> > + const u32 num_counters;
> > + const u16 num_value_sites[LLVM_INSTR_PROF_IPVK_LAST + 1];
> > +} __aligned(LLVM_INSTR_PROF_DATA_ALIGNMENT);
> > +
> > +/**
> > + * struct llvm_prf_value_node_data - represents the data part of the struct
> > + * llvm_prf_value_node data structure.
> > + * @value: the value counters.
> > + * @count: the counters' count.
> > + */
> > +struct llvm_prf_value_node_data {
> > + u64 value;
> > + u64 count;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_node - represents an internal data structure used by
> > + * the value profiler.
> > + * @value: the value counters.
> > + * @count: the counters' count.
> > + * @next: the next value node.
> > + */
> > +struct llvm_prf_value_node {
> > + u64 value;
> > + u64 count;
> > + struct llvm_prf_value_node *next;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_data - represents the value profiling data in indexed
> > + * format.
> > + * @total_size: the total size in bytes including this field.
> > + * @num_value_kinds: the number of value profile kinds that have value profile
> > + * data.
> > + */
> > +struct llvm_prf_value_data {
> > + u32 total_size;
> > + u32 num_value_kinds;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_record - represents the on-disk layout of the value
> > + * profile data of a particular kind for one function.
> > + * @kind: the kind of the value profile record.
> > + * @num_value_sites: the number of value profile sites.
> > + * @site_count_array: the first element of the array that stores the number
> > + * of profiled values for each value site.
> > + */
> > +struct llvm_prf_value_record {
> > + u32 kind;
> > + u32 num_value_sites;
> > + u8 site_count_array[];
> > +};
> > +
> > +#define prf_get_value_record_header_size() \
> > + offsetof(struct llvm_prf_value_record, site_count_array)
> > +#define prf_get_value_record_site_count_size(sites) \
> > + roundup((sites), 8)
> > +#define prf_get_value_record_size(sites) \
> > + (prf_get_value_record_header_size() + \
> > + prf_get_value_record_site_count_size((sites)))
> > +
> > +/* Data sections */
> > +extern struct llvm_prf_data __llvm_prf_data_start[];
> > +extern struct llvm_prf_data __llvm_prf_data_end[];
> > +
> > +extern u64 __llvm_prf_cnts_start[];
> > +extern u64 __llvm_prf_cnts_end[];
> > +
> > +extern char __llvm_prf_names_start[];
> > +extern char __llvm_prf_names_end[];
> > +
> > +extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
> > +extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
> > +
> > +/* Locking for vnodes */
> > +extern unsigned long prf_lock(void);
> > +extern void prf_unlock(unsigned long flags);
> > +
> > +#define __DEFINE_PRF_SIZE(s) \
> > + static inline unsigned long prf_ ## s ## _size(void) \
> > + { \
> > + unsigned long start = \
> > + (unsigned long)__llvm_prf_ ## s ## _start; \
> > + unsigned long end = \
> > + (unsigned long)__llvm_prf_ ## s ## _end; \
> > + return roundup(end - start, \
> > + sizeof(__llvm_prf_ ## s ## _start[0])); \
> > + } \
> > + static inline unsigned long prf_ ## s ## _count(void) \
> > + { \
> > + return prf_ ## s ## _size() / \
> > + sizeof(__llvm_prf_ ## s ## _start[0]); \
> > + }
> > +
> > +__DEFINE_PRF_SIZE(data);
> > +__DEFINE_PRF_SIZE(cnts);
> > +__DEFINE_PRF_SIZE(names);
> > +__DEFINE_PRF_SIZE(vnds);
> > +
> > +#undef __DEFINE_PRF_SIZE
> > +
> > +#endif /* _PGO_H */
> > diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> > index 213677a5ed33..9b218afb5cb8 100644
> > --- a/scripts/Makefile.lib
> > +++ b/scripts/Makefile.lib
> > @@ -143,6 +143,16 @@ _c_flags += $(if $(patsubst n%,, \
> > $(CFLAGS_GCOV))
> > endif
> >
> > +#
> > +# Enable clang's PGO profiling flags for a file or directory depending on
> > +# variables PGO_PROFILE_obj.o and PGO_PROFILE.
> > +#
> > +ifeq ($(CONFIG_PGO_CLANG),y)
> > +_c_flags += $(if $(patsubst n%,, \
> > + $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
> > + $(CFLAGS_PGO_CLANG))
> > +endif
> > +
> > #
> > # Enable address sanitizer flags for kernel except some files or directories
> > # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
> > --
> > 2.30.0.280.ga3ce27912f-goog
> >
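
For readers following the padding arithmetic in prf_get_padding() quoted above, here is a minimal user-space sketch of the same expression. It assumes only that the names section is padded up to the next multiple of sizeof(u64) before the value data is written; pad_to_u64() and check_padding() are hypothetical names used for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * User-space copy of the expression used by prf_get_padding() in the patch:
 * the number of bytes needed to round "size" up to a multiple of 8.
 * The "7 &" mask turns the result 8 (size already aligned) into 0.
 */
static unsigned long pad_to_u64(unsigned long size)
{
	return 7 & (sizeof(uint64_t) - size % sizeof(uint64_t));
}

/* Illustrative self-check (hypothetical helper, not in the patch). */
static void check_padding(void)
{
	unsigned long size;

	for (size = 0; size < 64; size++) {
		unsigned long pad = pad_to_u64(size);

		/* Padding never reaches a full word ... */
		assert(pad < sizeof(uint64_t));
		/* ... and always lands on an 8-byte boundary. */
		assert((size + pad) % sizeof(uint64_t) == 0);
	}
}
```

Calling check_padding() from a small test program exercises every residue modulo 8, including the already-aligned case where no padding is added.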

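The bucketing done by inst_prof_get_range_rep_value() quoted above can be sketched in user space as follows. This is an illustration under stated assumptions, not the patch's code: range_rep_value() is a hypothetical stand-in, __builtin_popcountll() replaces the kernel's hweight64(), and a GCC- or Clang-compatible compiler is assumed for the builtins.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a memop size to the representative value of its range, following
 * the logic of inst_prof_get_range_rep_value() in the patch.
 */
static uint64_t range_rep_value(uint64_t value)
{
	if (value <= 8)
		return value;		/* 0..8 are tracked individually */
	if (value >= 513)
		return 513;		/* last range maps to its lowest value */
	if (__builtin_popcountll(value) == 1)
		return value;		/* exact powers of two are kept */

	/* Otherwise: previous power of two + 1 (value is in 9..511 here). */
	return (UINT64_C(1) << (63 - __builtin_clzll(value))) + 1;
}

/* Illustrative self-check (hypothetical helper, not in the patch). */
static void check_range_rep_value(void)
{
	assert(range_rep_value(0) == 0);
	assert(range_rep_value(8) == 8);
	assert(range_rep_value(9) == 9);	/* 8 + 1 */
	assert(range_rep_value(15) == 9);
	assert(range_rep_value(16) == 16);
	assert(range_rep_value(100) == 65);	/* 64 + 1 */
	assert(range_rep_value(512) == 512);
	assert(range_rep_value(4096) == 513);
}
```

With this mapping, every size between two powers of two shares one counter, which keeps the number of distinct values per site small.
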
2021-02-26 22:23:42

by Bill Wendling

Subject: [PATCH v8] pgo: add clang's Profile Guided Optimization infrastructure

From: Sami Tolvanen <[email protected]>

Enable the use of clang's Profile-Guided Optimization[1]. To generate a
profile, the kernel is instrumented with PGO counters, a representative
workload is run, and the raw profile data is collected from
/sys/kernel/debug/pgo/profraw.

The raw profile data must be processed by clang's "llvm-profdata" tool
before it can be used during recompilation:

$ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
$ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw

Multiple raw profiles may be merged during this step.

The data can now be used by the compiler:

$ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...

This initial submission is restricted to x86, as that's the platform we
know works. This restriction can be lifted once other platforms have
been verified to work with PGO.

Note that this method of profiling the kernel is clang-native, unlike
the clang support in kernel/gcov.

[1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization

Signed-off-by: Sami Tolvanen <[email protected]>
Co-developed-by: Bill Wendling <[email protected]>
Signed-off-by: Bill Wendling <[email protected]>
---
v8: - Rebased on top-of-tree.
v7: - Fix minor build failure reported by Sedat.
v6: - Add better documentation about the locking scheme and other things.
- Rename macros to better match the same macros in LLVM's source code.
v5: - Correct padding calculation, discovered by Nathan Chancellor.
v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
own popcount implementation, based on Nick Desaulniers's comment.
v3: - Added change log section based on Sedat Dilek's comments.
v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
testing.
- Corrected documentation, re PGO flags when using LTO, based on Fangrui
Song's comments.
---
Documentation/dev-tools/index.rst | 1 +
Documentation/dev-tools/pgo.rst | 127 +++++++++
MAINTAINERS | 9 +
Makefile | 3 +
arch/Kconfig | 1 +
arch/x86/Kconfig | 1 +
arch/x86/boot/Makefile | 1 +
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/crypto/Makefile | 4 +
arch/x86/entry/vdso/Makefile | 1 +
arch/x86/kernel/vmlinux.lds.S | 2 +
arch/x86/platform/efi/Makefile | 1 +
arch/x86/purgatory/Makefile | 1 +
arch/x86/realmode/rm/Makefile | 1 +
arch/x86/um/vdso/Makefile | 1 +
drivers/firmware/efi/libstub/Makefile | 1 +
include/asm-generic/vmlinux.lds.h | 44 +++
kernel/Makefile | 1 +
kernel/pgo/Kconfig | 35 +++
kernel/pgo/Makefile | 5 +
kernel/pgo/fs.c | 389 ++++++++++++++++++++++++++
kernel/pgo/instrument.c | 189 +++++++++++++
kernel/pgo/pgo.h | 203 ++++++++++++++
scripts/Makefile.lib | 10 +
24 files changed, 1032 insertions(+)
create mode 100644 Documentation/dev-tools/pgo.rst
create mode 100644 kernel/pgo/Kconfig
create mode 100644 kernel/pgo/Makefile
create mode 100644 kernel/pgo/fs.c
create mode 100644 kernel/pgo/instrument.c
create mode 100644 kernel/pgo/pgo.h

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index f7809c7b1ba9..8d6418e85806 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -26,6 +26,7 @@ whole; patches welcome!
kgdb
kselftest
kunit/index
+ pgo


.. only:: subproject and html
diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
new file mode 100644
index 000000000000..b7f11d8405b7
--- /dev/null
+++ b/Documentation/dev-tools/pgo.rst
@@ -0,0 +1,127 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+Using PGO with the Linux kernel
+===============================
+
+Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
+when building with Clang. The profiling data is exported via the ``pgo``
+debugfs directory.
+
+.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
+
+
+Preparation
+===========
+
+Configure the kernel with:
+
+.. code-block:: make
+
+ CONFIG_DEBUG_FS=y
+ CONFIG_PGO_CLANG=y
+
+Note that kernels compiled with profiling flags will be significantly larger
+and run slower.
+
+Profiling data will only become accessible once debugfs has been mounted:
+
+.. code-block:: sh
+
+ mount -t debugfs none /sys/kernel/debug
+
+
+Customization
+=============
+
+You can enable or disable profiling for individual files and directories by
+adding a line similar to the following to the respective kernel Makefile:
+
+- For a single file (e.g. main.o)
+
+ .. code-block:: make
+
+ PGO_PROFILE_main.o := y
+
+- For all files in one directory
+
+ .. code-block:: make
+
+ PGO_PROFILE := y
+
+To exclude files from being profiled use
+
+ .. code-block:: make
+
+ PGO_PROFILE_main.o := n
+
+and
+
+ .. code-block:: make
+
+ PGO_PROFILE := n
+
+Only files which are linked to the main kernel image or are compiled as kernel
+modules are supported by this mechanism.
+
+
+Files
+=====
+
+The PGO kernel support creates the following files in debugfs:
+
+``/sys/kernel/debug/pgo``
+ Parent directory for all PGO-related files.
+
+``/sys/kernel/debug/pgo/reset``
+ Global reset file: resets all profile data to zero when written to.
+
+``/sys/kernel/debug/pgo/profraw``
+ The raw PGO data that must be processed with ``llvm-profdata``.
+
+
+Workflow
+========
+
+The PGO kernel can be run on the host or test machines. The data, though,
+should be analyzed with the tools from the same Clang version that compiled the
+kernel. Clang is tolerant of version skew, but it's easier to use the same
+Clang version.
+
+The profiling data is useful for optimizing the kernel, analyzing coverage,
+etc. Clang offers tools to perform these tasks.
+
+Here is an example workflow for profiling an instrumented kernel with PGO and
+using the result to optimize the kernel:
+
+1) Install the kernel on the TEST machine.
+
+2) Reset the data counters right before running the load tests
+
+ .. code-block:: sh
+
+ $ echo 1 > /sys/kernel/debug/pgo/reset
+
+3) Run the load tests.
+
+4) Collect the raw profile data
+
+ .. code-block:: sh
+
+ $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
+
+5) (Optional) Download the raw profile data to the HOST machine.
+
+6) Process the raw profile data
+
+ .. code-block:: sh
+
+ $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
+
+ Note that multiple raw profile data files can be merged during this step.
+
+7) Rebuild the kernel using the profile data (with CONFIG_PGO_CLANG disabled)
+
+ .. code-block:: sh
+
+ $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
diff --git a/MAINTAINERS b/MAINTAINERS
index c71664ca8bfd..3a6668792bc5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14019,6 +14019,15 @@ S: Maintained
F: include/linux/personality.h
F: include/uapi/linux/personality.h

+PGO BASED KERNEL PROFILING
+M: Sami Tolvanen <[email protected]>
+M: Bill Wendling <[email protected]>
+R: Nathan Chancellor <[email protected]>
+R: Nick Desaulniers <[email protected]>
+S: Supported
+F: Documentation/dev-tools/pgo.rst
+F: kernel/pgo
+
PHOENIX RC FLIGHT CONTROLLER ADAPTER
M: Marcus Folkesson <[email protected]>
L: [email protected]
diff --git a/Makefile b/Makefile
index 6ecd0d22e608..b57d4d44c799 100644
--- a/Makefile
+++ b/Makefile
@@ -657,6 +657,9 @@ endif # KBUILD_EXTMOD
# Defaults to vmlinux, but the arch makefile usually adds further targets
all: vmlinux

+CFLAGS_PGO_CLANG := -fprofile-generate
+export CFLAGS_PGO_CLANG
+
CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
$(call cc-option,-fno-tree-loop-im) \
$(call cc-disable-warning,maybe-uninitialized,)
diff --git a/arch/Kconfig b/arch/Kconfig
index 2bb30673d8e6..111e642a2af7 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1192,6 +1192,7 @@ config ARCH_HAS_ELFCORE_COMPAT
bool

source "kernel/gcov/Kconfig"
+source "kernel/pgo/Kconfig"

source "scripts/gcc-plugins/Kconfig"

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cd4b9b1204a8..c9808583b528 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -99,6 +99,7 @@ config X86
select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
select ARCH_SUPPORTS_LTO_CLANG if X86_64
select ARCH_SUPPORTS_LTO_CLANG_THIN if X86_64
+ select ARCH_SUPPORTS_PGO_CLANG if X86_64
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index fe605205b4ce..383853e32f67 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n

$(obj)/bzImage: asflags-y := $(SVGA_MODE)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index e0bc3988c3fa..ed12ab65f606 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/

KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE :=n

KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index b28e36b7c96b..4b2e9620c412 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -4,6 +4,10 @@

OBJECT_FILES_NON_STANDARD := y

+# Disable PGO for curve25519-x86_64. With PGO enabled, clang runs out of
+# registers for some of the functions.
+PGO_PROFILE_curve25519-x86_64.o := n
+
obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o
obj-$(CONFIG_CRYPTO_TWOFISH_X86_64) += twofish-x86_64.o
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 05c4abc2fdfd..f7421e44725a 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -180,6 +180,7 @@ quiet_cmd_vdso = VDSO $@
VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
$(call ld-option, --eh-frame-hdr) -Bsymbolic
GCOV_PROFILE := n
+PGO_PROFILE := n

quiet_cmd_vdso_and_check = VDSO $@
cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index efd9e9ea17f2..f6cab2316c46 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -184,6 +184,8 @@ SECTIONS

BUG_TABLE

+ PGO_CLANG_DATA
+
ORC_UNWIND_TABLE

. = ALIGN(PAGE_SIZE);
diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
index 84b09c230cbd..5f22b31446ad 100644
--- a/arch/x86/platform/efi/Makefile
+++ b/arch/x86/platform/efi/Makefile
@@ -2,6 +2,7 @@
OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
KASAN_SANITIZE := n
GCOV_PROFILE := n
+PGO_PROFILE := n

obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 95ea17a9d20c..36f20e99da0b 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk

# Sanitizer, etc. runtimes are unavailable and cannot be linked here.
GCOV_PROFILE := n
+PGO_PROFILE := n
KASAN_SANITIZE := n
UBSAN_SANITIZE := n
KCSAN_SANITIZE := n
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index 83f1b6a56449..21797192f958 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n
diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
index 5943387e3f35..54f5768f5853 100644
--- a/arch/x86/um/vdso/Makefile
+++ b/arch/x86/um/vdso/Makefile
@@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@

VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
GCOV_PROFILE := n
+PGO_PROFILE := n

#
# Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index c23466e05e60..724fb389bb9d 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -42,6 +42,7 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS))

GCOV_PROFILE := n
+PGO_PROFILE := n
# Sanitizer runtimes are unavailable and cannot be linked here.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 6786f8c0182f..4a0c21b840b3 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -329,6 +329,49 @@
#define DTPM_TABLE()
#endif

+#ifdef CONFIG_PGO_CLANG
+#define PGO_CLANG_DATA \
+ __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
+ . = ALIGN(8); \
+ __llvm_prf_start = .; \
+ __llvm_prf_data_start = .; \
+ KEEP(*(__llvm_prf_data)) \
+ . = ALIGN(8); \
+ __llvm_prf_data_end = .; \
+ } \
+ __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
+ . = ALIGN(8); \
+ __llvm_prf_cnts_start = .; \
+ KEEP(*(__llvm_prf_cnts)) \
+ . = ALIGN(8); \
+ __llvm_prf_cnts_end = .; \
+ } \
+ __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
+ . = ALIGN(8); \
+ __llvm_prf_names_start = .; \
+ KEEP(*(__llvm_prf_names)) \
+ . = ALIGN(8); \
+ __llvm_prf_names_end = .; \
+ . = ALIGN(8); \
+ } \
+ __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
+ __llvm_prf_vals_start = .; \
+ KEEP(*(__llvm_prf_vals)) \
+ . = ALIGN(8); \
+ __llvm_prf_vals_end = .; \
+ . = ALIGN(8); \
+ } \
+ __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
+ __llvm_prf_vnds_start = .; \
+ KEEP(*(__llvm_prf_vnds)) \
+ . = ALIGN(8); \
+ __llvm_prf_vnds_end = .; \
+ __llvm_prf_end = .; \
+ }
+#else
+#define PGO_CLANG_DATA
+#endif
+
#define KERNEL_DTB() \
STRUCT_ALIGN(); \
__dtb_start = .; \
@@ -1105,6 +1148,7 @@
CONSTRUCTORS \
} \
BUG_TABLE \
+ PGO_CLANG_DATA

#define INIT_TEXT_SECTION(inittext_align) \
. = ALIGN(inittext_align); \
diff --git a/kernel/Makefile b/kernel/Makefile
index 320f1f3941b7..a2a23ef2b12f 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
obj-$(CONFIG_KCSAN) += kcsan/
obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
+obj-$(CONFIG_PGO_CLANG) += pgo/

obj-$(CONFIG_PERF_EVENTS) += events/

diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
new file mode 100644
index 000000000000..76a640b6cf6e
--- /dev/null
+++ b/kernel/pgo/Kconfig
@@ -0,0 +1,35 @@
+# SPDX-License-Identifier: GPL-2.0-only
+menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
+
+config ARCH_SUPPORTS_PGO_CLANG
+ bool
+
+config PGO_CLANG
+ bool "Enable clang's PGO-based kernel profiling"
+ depends on DEBUG_FS
+ depends on ARCH_SUPPORTS_PGO_CLANG
+ depends on CC_IS_CLANG && CLANG_VERSION >= 120000
+ help
+ This option enables clang's PGO (Profile Guided Optimization) based
+ code profiling to better optimize the kernel.
+
+ If unsure, say N.
+
+ Run a representative workload for your application on a kernel
+ compiled with this option and download the raw profile file from
+ /sys/kernel/debug/pgo/profraw. This file needs to be processed with
+ llvm-profdata. It may be merged with other collected raw profiles.
+
+ Copy the resulting profile file into vmlinux.profdata, and enable
+ KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
+ kernel.
+
+ Note that a kernel compiled with profiling flags will be
+ significantly larger and run slower. Also be sure to exclude files
+ from profiling which are not linked to the kernel image to prevent
+ linker errors.
+
+ Note that the debugfs filesystem has to be mounted to access
+ profiling data.
+
+endmenu
diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
new file mode 100644
index 000000000000..41e27cefd9a4
--- /dev/null
+++ b/kernel/pgo/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+GCOV_PROFILE := n
+PGO_PROFILE := n
+
+obj-y += fs.o instrument.o
diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
new file mode 100644
index 000000000000..1678df3b7d64
--- /dev/null
+++ b/kernel/pgo/fs.c
@@ -0,0 +1,389 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ * Sami Tolvanen <[email protected]>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt) "pgo: " fmt
+
+#include <linux/kernel.h>
+#include <linux/debugfs.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include "pgo.h"
+
+static struct dentry *directory;
+
+struct prf_private_data {
+ void *buffer;
+ unsigned long size;
+};
+
+/*
+ * Raw profile data format:
+ *
+ * - llvm_prf_header
+ * - __llvm_prf_data
+ * - __llvm_prf_cnts
+ * - __llvm_prf_names
+ * - zero padding to 8 bytes
+ * - for each llvm_prf_data in __llvm_prf_data:
+ * - llvm_prf_value_data
+ * - llvm_prf_value_record + site count array
+ * - llvm_prf_value_node_data
+ * ...
+ * ...
+ * ...
+ */
+
+static void prf_fill_header(void **buffer)
+{
+ struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
+
+#ifdef CONFIG_64BIT
+ header->magic = LLVM_INSTR_PROF_RAW_MAGIC_64;
+#else
+ header->magic = LLVM_INSTR_PROF_RAW_MAGIC_32;
+#endif
+ header->version = LLVM_VARIANT_MASK_IR_PROF | LLVM_INSTR_PROF_RAW_VERSION;
+ header->data_size = prf_data_count();
+ header->padding_bytes_before_counters = 0;
+ header->counters_size = prf_cnts_count();
+ header->padding_bytes_after_counters = 0;
+ header->names_size = prf_names_count();
+ header->counters_delta = (u64)__llvm_prf_cnts_start;
+ header->names_delta = (u64)__llvm_prf_names_start;
+ header->value_kind_last = LLVM_INSTR_PROF_IPVK_LAST;
+
+ *buffer += sizeof(*header);
+}
+
+/*
+ * Copy the source into the buffer, incrementing the pointer into buffer in the
+ * process.
+ */
+static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
+{
+ memcpy(*buffer, src, size);
+ *buffer += size;
+}
+
+static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
+{
+ struct llvm_prf_value_node **nodes =
+ (struct llvm_prf_value_node **)p->values;
+ u32 kinds = 0;
+ u32 size = 0;
+ unsigned int kind;
+ unsigned int n;
+ unsigned int s = 0;
+
+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+ unsigned int sites = p->num_value_sites[kind];
+
+ if (!sites)
+ continue;
+
+ /* Record + site count array */
+ size += prf_get_value_record_size(sites);
+ kinds++;
+
+ if (!nodes)
+ continue;
+
+ for (n = 0; n < sites; n++) {
+ u32 count = 0;
+ struct llvm_prf_value_node *site = nodes[s + n];
+
+ while (site && ++count <= U8_MAX)
+ site = site->next;
+
+ size += count *
+ sizeof(struct llvm_prf_value_node_data);
+ }
+
+ s += sites;
+ }
+
+ if (size)
+ size += sizeof(struct llvm_prf_value_data);
+
+ if (value_kinds)
+ *value_kinds = kinds;
+
+ return size;
+}
+
+static u32 prf_get_value_size(void)
+{
+ u32 size = 0;
+ struct llvm_prf_data *p;
+
+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+ size += __prf_get_value_size(p, NULL);
+
+ return size;
+}
+
+/* Serialize the value profiling data of one function. */
+static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
+{
+ struct llvm_prf_value_data header;
+ struct llvm_prf_value_node **nodes =
+ (struct llvm_prf_value_node **)p->values;
+ unsigned int kind;
+ unsigned int n;
+ unsigned int s = 0;
+
+ header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
+
+ if (!header.num_value_kinds)
+ /* Nothing to write. */
+ return;
+
+ prf_copy_to_buffer(buffer, &header, sizeof(header));
+
+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+ struct llvm_prf_value_record *record;
+ u8 *counts;
+ unsigned int sites = p->num_value_sites[kind];
+
+ if (!sites)
+ continue;
+
+ /* Profiling value record. */
+ record = *(struct llvm_prf_value_record **)buffer;
+ *buffer += prf_get_value_record_header_size();
+
+ record->kind = kind;
+ record->num_value_sites = sites;
+
+ /* Site count array. */
+ counts = *(u8 **)buffer;
+ *buffer += prf_get_value_record_site_count_size(sites);
+
+ /*
+ * If we don't have nodes, we can skip updating the site count
+ * array, because the buffer is zero filled.
+ */
+ if (!nodes)
+ continue;
+
+ for (n = 0; n < sites; n++) {
+ u32 count = 0;
+ struct llvm_prf_value_node *site = nodes[s + n];
+
+ while (site && ++count <= U8_MAX) {
+ prf_copy_to_buffer(buffer, site,
+ sizeof(struct llvm_prf_value_node_data));
+ site = site->next;
+ }
+
+ counts[n] = (u8)count;
+ }
+
+ s += sites;
+ }
+}
+
+static void prf_serialize_values(void **buffer)
+{
+ struct llvm_prf_data *p;
+
+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+ prf_serialize_value(p, buffer);
+}
+
+static inline unsigned long prf_get_padding(unsigned long size)
+{
+ return 7 & (sizeof(u64) - size % sizeof(u64));
+}
+
+static unsigned long prf_buffer_size(void)
+{
+ return sizeof(struct llvm_prf_header) +
+ prf_data_size() +
+ prf_cnts_size() +
+ prf_names_size() +
+ prf_get_padding(prf_names_size()) +
+ prf_get_value_size();
+}
+
+/*
+ * Serialize the profiling data into a format LLVM's tools can understand.
+ * Note: caller *must* hold pgo_lock.
+ */
+static int prf_serialize(struct prf_private_data *p)
+{
+ int err = 0;
+ void *buffer;
+
+ p->size = prf_buffer_size();
+ p->buffer = vzalloc(p->size);
+
+ if (!p->buffer) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ buffer = p->buffer;
+
+ prf_fill_header(&buffer);
+ prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
+ prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
+ prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
+ buffer += prf_get_padding(prf_names_size());
+
+ prf_serialize_values(&buffer);
+
+out:
+ return err;
+}
+
+/* open() implementation for PGO. Creates a copy of the profiling data set. */
+static int prf_open(struct inode *inode, struct file *file)
+{
+ struct prf_private_data *data;
+ unsigned long flags;
+ int err;
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL);
+ if (!data) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ flags = prf_lock();
+
+ err = prf_serialize(data);
+ if (unlikely(err)) {
+ kfree(data);
+ goto out_unlock;
+ }
+
+ file->private_data = data;
+
+out_unlock:
+ prf_unlock(flags);
+out:
+ return err;
+}
+
+/* read() implementation for PGO. */
+static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct prf_private_data *data = file->private_data;
+
+ BUG_ON(!data);
+
+ return simple_read_from_buffer(buf, count, ppos, data->buffer,
+ data->size);
+}
+
+/* release() implementation for PGO. Release resources allocated by open(). */
+static int prf_release(struct inode *inode, struct file *file)
+{
+ struct prf_private_data *data = file->private_data;
+
+ if (data) {
+ vfree(data->buffer);
+ kfree(data);
+ }
+
+ return 0;
+}
+
+static const struct file_operations prf_fops = {
+ .owner = THIS_MODULE,
+ .open = prf_open,
+ .read = prf_read,
+ .llseek = default_llseek,
+ .release = prf_release
+};
+
+/* write() implementation for resetting PGO's profile data. */
+static ssize_t reset_write(struct file *file, const char __user *addr,
+ size_t len, loff_t *pos)
+{
+ struct llvm_prf_data *data;
+
+ memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
+
+ for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; data++) {
+ struct llvm_prf_value_node **vnodes;
+ u64 current_vsite_count;
+ u32 i;
+
+ if (!data->values)
+ continue;
+
+ current_vsite_count = 0;
+ vnodes = (struct llvm_prf_value_node **)data->values;
+
+ for (i = LLVM_INSTR_PROF_IPVK_FIRST; i <= LLVM_INSTR_PROF_IPVK_LAST; i++)
+ current_vsite_count += data->num_value_sites[i];
+
+ for (i = 0; i < current_vsite_count; i++) {
+ struct llvm_prf_value_node *current_vnode = vnodes[i];
+
+ while (current_vnode) {
+ current_vnode->count = 0;
+ current_vnode = current_vnode->next;
+ }
+ }
+ }
+
+ return len;
+}
+
+static const struct file_operations prf_reset_fops = {
+ .owner = THIS_MODULE,
+ .write = reset_write,
+ .llseek = noop_llseek,
+};
+
+/* Create debugfs entries. */
+static int __init pgo_init(void)
+{
+ directory = debugfs_create_dir("pgo", NULL);
+ if (!directory)
+ goto err_remove;
+
+ if (!debugfs_create_file("profraw", 0600, directory, NULL,
+ &prf_fops))
+ goto err_remove;
+
+ if (!debugfs_create_file("reset", 0200, directory, NULL,
+ &prf_reset_fops))
+ goto err_remove;
+
+ return 0;
+
+err_remove:
+ pr_err("initialization failed\n");
+ return -EIO;
+}
+
+/* Remove debugfs entries. */
+static void __exit pgo_exit(void)
+{
+ debugfs_remove_recursive(directory);
+}
+
+module_init(pgo_init);
+module_exit(pgo_exit);
diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
new file mode 100644
index 000000000000..62ff5cfce7b1
--- /dev/null
+++ b/kernel/pgo/instrument.c
@@ -0,0 +1,189 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ * Sami Tolvanen <[email protected]>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt) "pgo: " fmt
+
+#include <linux/bitops.h>
+#include <linux/kernel.h>
+#include <linux/export.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+#include "pgo.h"
+
+/*
+ * This lock guards both profile count updating and serialization of the
+ * profiling data. Keeping both of these activities separate via locking
+ * ensures that we don't try to serialize data that's only partially updated.
+ */
+static DEFINE_SPINLOCK(pgo_lock);
+static int current_node;
+
+unsigned long prf_lock(void)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&pgo_lock, flags);
+
+ return flags;
+}
+
+void prf_unlock(unsigned long flags)
+{
+ spin_unlock_irqrestore(&pgo_lock, flags);
+}
+
+/*
+ * Return a newly allocated profiling value node which holds the value
+ * tracked by the value profiler.
+ * Note: caller *must* hold pgo_lock.
+ */
+static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
+ u32 index, u64 value)
+{
+ if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
+ return NULL; /* Out of nodes */
+
+ current_node++;
+
+ /* Make sure the node is entirely within the section */
+ if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
+ &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
+ return NULL;
+
+ return &__llvm_prf_vnds_start[current_node];
+}
+
+/*
+ * Counts the number of times a target value is seen.
+ *
+ * Records the target value for the index if not seen before. Otherwise,
+ * increments the counter associated with the target value.
+ */
+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
+{
+ struct llvm_prf_data *p = (struct llvm_prf_data *)data;
+ struct llvm_prf_value_node **counters;
+ struct llvm_prf_value_node *curr;
+ struct llvm_prf_value_node *min = NULL;
+ struct llvm_prf_value_node *prev = NULL;
+ u64 min_count = U64_MAX;
+ u8 values = 0;
+ unsigned long flags;
+
+ if (!p || !p->values)
+ return;
+
+ counters = (struct llvm_prf_value_node **)p->values;
+ curr = counters[index];
+
+ while (curr) {
+ if (target_value == curr->value) {
+ curr->count++;
+ return;
+ }
+
+ if (curr->count < min_count) {
+ min_count = curr->count;
+ min = curr;
+ }
+
+ prev = curr;
+ curr = curr->next;
+ values++;
+ }
+
+ if (values >= LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE) {
+ if (!min->count || !(--min->count)) {
+ curr = min;
+ curr->value = target_value;
+ curr->count++;
+ }
+ return;
+ }
+
+ /* Lock when updating the value node structure. */
+ flags = prf_lock();
+
+ curr = allocate_node(p, index, target_value);
+ if (!curr)
+ goto out;
+
+ curr->value = target_value;
+ curr->count++;
+
+ if (!counters[index])
+ counters[index] = curr;
+ else if (prev && !prev->next)
+ prev->next = curr;
+
+out:
+ prf_unlock(flags);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_target);
+
+/* Counts the number of times a range of target values is seen. */
+void __llvm_profile_instrument_range(u64 target_value, void *data,
+ u32 index, s64 precise_start,
+ s64 precise_last, s64 large_value);
+void __llvm_profile_instrument_range(u64 target_value, void *data,
+ u32 index, s64 precise_start,
+ s64 precise_last, s64 large_value)
+{
+ if (large_value != S64_MIN && (s64)target_value >= large_value)
+ target_value = large_value;
+ else if ((s64)target_value < precise_start ||
+ (s64)target_value > precise_last)
+ target_value = precise_last + 1;
+
+ __llvm_profile_instrument_target(target_value, data, index);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_range);
+
+static u64 inst_prof_get_range_rep_value(u64 value)
+{
+ if (value <= 8)
+ /* The first ranges are individually tracked, use it as is. */
+ return value;
+ else if (value >= 513)
+ /* The last range is mapped to its lowest value. */
+ return 513;
+ else if (hweight64(value) == 1)
+ /* If it's a power of two, use it as is. */
+ return value;
+
+ /* Otherwise, map to the previous power of two + 1. */
+ return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
+}
+
+/*
+ * The target values are partitioned into multiple ranges. The range spec is
+ * defined in compiler-rt/include/profile/InstrProfData.inc.
+ */
+void __llvm_profile_instrument_memop(u64 target_value, void *data,
+ u32 counter_index);
+void __llvm_profile_instrument_memop(u64 target_value, void *data,
+ u32 counter_index)
+{
+ u64 rep_value;
+
+ /* Map the target value to the representative value of its range. */
+ rep_value = inst_prof_get_range_rep_value(target_value);
+ __llvm_profile_instrument_target(rep_value, data, counter_index);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_memop);
diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
new file mode 100644
index 000000000000..ddc8d3002fe5
--- /dev/null
+++ b/kernel/pgo/pgo.h
@@ -0,0 +1,203 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ * Sami Tolvanen <[email protected]>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _PGO_H
+#define _PGO_H
+
+/*
+ * Note: These internal LLVM definitions must match the compiler version.
+ * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
+ */
+
+#define LLVM_INSTR_PROF_RAW_MAGIC_64 \
+ ((u64)255 << 56 | \
+ (u64)'l' << 48 | \
+ (u64)'p' << 40 | \
+ (u64)'r' << 32 | \
+ (u64)'o' << 24 | \
+ (u64)'f' << 16 | \
+ (u64)'r' << 8 | \
+ (u64)129)
+#define LLVM_INSTR_PROF_RAW_MAGIC_32 \
+ ((u64)255 << 56 | \
+ (u64)'l' << 48 | \
+ (u64)'p' << 40 | \
+ (u64)'r' << 32 | \
+ (u64)'o' << 24 | \
+ (u64)'f' << 16 | \
+ (u64)'R' << 8 | \
+ (u64)129)
+
+#define LLVM_INSTR_PROF_RAW_VERSION 5
+#define LLVM_INSTR_PROF_DATA_ALIGNMENT 8
+#define LLVM_INSTR_PROF_IPVK_FIRST 0
+#define LLVM_INSTR_PROF_IPVK_LAST 1
+#define LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE 255
+
+#define LLVM_VARIANT_MASK_IR_PROF (0x1ULL << 56)
+#define LLVM_VARIANT_MASK_CSIR_PROF (0x1ULL << 57)
+
+/**
+ * struct llvm_prf_header - represents the raw profile header data structure.
+ * @magic: the magic token for the file format.
+ * @version: the version of the file format.
+ * @data_size: the number of entries in the profile data section.
+ * @padding_bytes_before_counters: the number of padding bytes before the
+ * counters.
+ * @counters_size: the size in bytes of the LLVM profile section containing the
+ * counters.
+ * @padding_bytes_after_counters: the number of padding bytes after the
+ * counters.
+ * @names_size: the size in bytes of the LLVM profile section containing the
+ * counters' names.
+ * @counters_delta: the beginning of the LLVM profile counters section.
+ * @names_delta: the beginning of the LLVM profile names section.
+ * @value_kind_last: the last profile value kind.
+ */
+struct llvm_prf_header {
+ u64 magic;
+ u64 version;
+ u64 data_size;
+ u64 padding_bytes_before_counters;
+ u64 counters_size;
+ u64 padding_bytes_after_counters;
+ u64 names_size;
+ u64 counters_delta;
+ u64 names_delta;
+ u64 value_kind_last;
+};
+
+/**
+ * struct llvm_prf_data - represents the per-function control structure.
+ * @name_ref: the reference to the function's name.
+ * @func_hash: the hash value of the function.
+ * @counter_ptr: a pointer to the profile counter.
+ * @function_ptr: a pointer to the function.
+ * @values: the profiling values associated with this function.
+ * @num_counters: the number of counters in the function.
+ * @num_value_sites: the number of value profile sites.
+ */
+struct llvm_prf_data {
+ const u64 name_ref;
+ const u64 func_hash;
+ const void *counter_ptr;
+ const void *function_ptr;
+ void *values;
+ const u32 num_counters;
+ const u16 num_value_sites[LLVM_INSTR_PROF_IPVK_LAST + 1];
+} __aligned(LLVM_INSTR_PROF_DATA_ALIGNMENT);
+
+/**
+ * struct llvm_prf_value_node_data - represents the data part of the struct
+ * llvm_prf_value_node data structure.
+ * @value: the value counters.
+ * @count: the counters' count.
+ */
+struct llvm_prf_value_node_data {
+ u64 value;
+ u64 count;
+};
+
+/**
+ * struct llvm_prf_value_node - represents an internal data structure used by
+ * the value profiler.
+ * @value: the value counters.
+ * @count: the counters' count.
+ * @next: the next value node.
+ */
+struct llvm_prf_value_node {
+ u64 value;
+ u64 count;
+ struct llvm_prf_value_node *next;
+};
+
+/**
+ * struct llvm_prf_value_data - represents the value profiling data in indexed
+ * format.
+ * @total_size: the total size in bytes including this field.
+ * @num_value_kinds: the number of value profile kinds that have value profile
+ * data.
+ */
+struct llvm_prf_value_data {
+ u32 total_size;
+ u32 num_value_kinds;
+};
+
+/**
+ * struct llvm_prf_value_record - represents the on-disk layout of the value
+ * profile data of a particular kind for one function.
+ * @kind: the kind of the value profile record.
+ * @num_value_sites: the number of value profile sites.
+ * @site_count_array: the first element of the array that stores the number
+ * of profiled values for each value site.
+ */
+struct llvm_prf_value_record {
+ u32 kind;
+ u32 num_value_sites;
+ u8 site_count_array[];
+};
+
+#define prf_get_value_record_header_size() \
+ offsetof(struct llvm_prf_value_record, site_count_array)
+#define prf_get_value_record_site_count_size(sites) \
+ roundup((sites), 8)
+#define prf_get_value_record_size(sites) \
+ (prf_get_value_record_header_size() + \
+ prf_get_value_record_site_count_size((sites)))
+
+/* Data sections */
+extern struct llvm_prf_data __llvm_prf_data_start[];
+extern struct llvm_prf_data __llvm_prf_data_end[];
+
+extern u64 __llvm_prf_cnts_start[];
+extern u64 __llvm_prf_cnts_end[];
+
+extern char __llvm_prf_names_start[];
+extern char __llvm_prf_names_end[];
+
+extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
+extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
+
+/* Locking for vnodes */
+extern unsigned long prf_lock(void);
+extern void prf_unlock(unsigned long flags);
+
+#define __DEFINE_PRF_SIZE(s) \
+ static inline unsigned long prf_ ## s ## _size(void) \
+ { \
+ unsigned long start = \
+ (unsigned long)__llvm_prf_ ## s ## _start; \
+ unsigned long end = \
+ (unsigned long)__llvm_prf_ ## s ## _end; \
+ return roundup(end - start, \
+ sizeof(__llvm_prf_ ## s ## _start[0])); \
+ } \
+ static inline unsigned long prf_ ## s ## _count(void) \
+ { \
+ return prf_ ## s ## _size() / \
+ sizeof(__llvm_prf_ ## s ## _start[0]); \
+ }
+
+__DEFINE_PRF_SIZE(data);
+__DEFINE_PRF_SIZE(cnts);
+__DEFINE_PRF_SIZE(names);
+__DEFINE_PRF_SIZE(vnds);
+
+#undef __DEFINE_PRF_SIZE
+
+#endif /* _PGO_H */
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index eee59184de64..48a65d092c5b 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -139,6 +139,16 @@ _c_flags += $(if $(patsubst n%,, \
$(CFLAGS_GCOV))
endif

+#
+# Enable clang's PGO profiling flags for a file or directory depending on
+# variables PGO_PROFILE_obj.o and PGO_PROFILE.
+#
+ifeq ($(CONFIG_PGO_CLANG),y)
+_c_flags += $(if $(patsubst n%,, \
+ $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
+ $(CFLAGS_PGO_CLANG))
+endif
+
#
# Enable address sanitizer flags for kernel except some files or directories
# we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
--
2.30.1.766.gb4fecdf3b7-goog
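
For reference, the padding expression used by prf_get_padding() in
kernel/pgo/fs.c above can be exercised in user space. The following is a
minimal sketch and not part of the patch: pad_to_u64() is a hypothetical name,
and uint64_t stands in for the kernel's u64.

```c
#include <stdint.h>

/*
 * Number of zero bytes needed to round `size` up to a multiple of
 * sizeof(uint64_t), using the same expression as prf_get_padding().
 * For an already aligned size the subtraction yields 8, and the "7 &"
 * mask turns that into 0.
 * pad_to_u64 is a hypothetical name, not a kernel function.
 */
unsigned long pad_to_u64(unsigned long size)
{
	return 7 & (sizeof(uint64_t) - size % sizeof(uint64_t));
}
```

With this expression a names section of 13 bytes would get 3 bytes of padding
and one of 16 bytes would get none, which matches the "zero padding to 8 bytes"
step in the raw profile layout comment.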

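The bucketing done by inst_prof_get_range_rep_value() in
kernel/pgo/instrument.c can be tried out the same way. This is a user-space
sketch under stated assumptions, not the kernel code: range_rep_value() is a
hypothetical name, and the GCC/Clang builtin __builtin_popcountll() is assumed
as a stand-in for the kernel's hweight64().

```c
#include <stdint.h>

/*
 * Map a memop size to the representative value of its range:
 *   0..8      tracked individually
 *   >= 513    collapsed to 513
 *   2^k       kept as is
 *   otherwise the previous power of two plus one
 * range_rep_value is a hypothetical name, not a kernel function.
 */
uint64_t range_rep_value(uint64_t value)
{
	if (value <= 8)
		return value;
	if (value >= 513)
		return 513;
	if (__builtin_popcountll(value) == 1)
		return value;

	/* 63 - clz(value) is the index of the highest set bit. */
	return (UINT64_C(1) << (63 - __builtin_clzll(value))) + 1;
}
```

For example, sizes 10 through 15 all fall into the bucket represented by 9,
while 16 keeps its own bucket.
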
2021-02-26 22:57:34

by Bill Wendling

Subject: Re: [PATCH v8] pgo: add clang's Profile Guided Optimization infrastructure

On Fri, Feb 26, 2021 at 2:20 PM Bill Wendling <[email protected]> wrote:
>
> From: Sami Tolvanen <[email protected]>
>
> Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> profile, the kernel is instrumented with PGO counters, a representative
> workload is run, and the raw profile data is collected from
> /sys/kernel/debug/pgo/profraw.
>
> The raw profile data must be processed by clang's "llvm-profdata" tool
> before it can be used during recompilation:
>
> $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
> Multiple raw profiles may be merged during this step.
>
> The data can now be used by the compiler:
>
> $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>
> This initial submission is restricted to x86, as that's the platform we
> know works. This restriction can be lifted once other platforms have
> been verified to work with PGO.
>
> Note that this method of profiling the kernel is clang-native, unlike
> the clang support in kernel/gcov.
>
> [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>
> Signed-off-by: Sami Tolvanen <[email protected]>
> Co-developed-by: Bill Wendling <[email protected]>
> Signed-off-by: Bill Wendling <[email protected]>

I forgot to add these tags:

Tested-by: Nick Desaulniers <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>

> ---
> v8: - Rebased on top-of-tree.
> v7: - Fix minor build failure reported by Sedat.
> v6: - Add better documentation about the locking scheme and other things.
> - Rename macros to better match the same macros in LLVM's source code.
> v5: - Correct padding calculation, discovered by Nathan Chancellor.
> v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
> own popcount implementation, based on Nick Desaulniers's comment.
> v3: - Added change log section based on Sedat Dilek's comments.
> v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> testing.
> - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> Song's comments.
> ---
> Documentation/dev-tools/index.rst | 1 +
> Documentation/dev-tools/pgo.rst | 127 +++++++++
> MAINTAINERS | 9 +
> Makefile | 3 +
> arch/Kconfig | 1 +
> arch/x86/Kconfig | 1 +
> arch/x86/boot/Makefile | 1 +
> arch/x86/boot/compressed/Makefile | 1 +
> arch/x86/crypto/Makefile | 4 +
> arch/x86/entry/vdso/Makefile | 1 +
> arch/x86/kernel/vmlinux.lds.S | 2 +
> arch/x86/platform/efi/Makefile | 1 +
> arch/x86/purgatory/Makefile | 1 +
> arch/x86/realmode/rm/Makefile | 1 +
> arch/x86/um/vdso/Makefile | 1 +
> drivers/firmware/efi/libstub/Makefile | 1 +
> include/asm-generic/vmlinux.lds.h | 44 +++
> kernel/Makefile | 1 +
> kernel/pgo/Kconfig | 35 +++
> kernel/pgo/Makefile | 5 +
> kernel/pgo/fs.c | 389 ++++++++++++++++++++++++++
> kernel/pgo/instrument.c | 189 +++++++++++++
> kernel/pgo/pgo.h | 203 ++++++++++++++
> scripts/Makefile.lib | 10 +
> 24 files changed, 1032 insertions(+)
> create mode 100644 Documentation/dev-tools/pgo.rst
> create mode 100644 kernel/pgo/Kconfig
> create mode 100644 kernel/pgo/Makefile
> create mode 100644 kernel/pgo/fs.c
> create mode 100644 kernel/pgo/instrument.c
> create mode 100644 kernel/pgo/pgo.h
>
> diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> index f7809c7b1ba9..8d6418e85806 100644
> --- a/Documentation/dev-tools/index.rst
> +++ b/Documentation/dev-tools/index.rst
> @@ -26,6 +26,7 @@ whole; patches welcome!
> kgdb
> kselftest
> kunit/index
> + pgo
>
>
> .. only:: subproject and html
> diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> new file mode 100644
> index 000000000000..b7f11d8405b7
> --- /dev/null
> +++ b/Documentation/dev-tools/pgo.rst
> @@ -0,0 +1,127 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===============================
> +Using PGO with the Linux kernel
> +===============================
> +
> +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> +when building with Clang. The profiling data is exported via the ``pgo``
> +debugfs directory.
> +
> +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> +
> +
> +Preparation
> +===========
> +
> +Configure the kernel with:
> +
> +.. code-block:: make
> +
> + CONFIG_DEBUG_FS=y
> + CONFIG_PGO_CLANG=y
> +
> +Note that kernels compiled with profiling flags will be significantly larger
> +and run slower.
> +
> +Profiling data will only become accessible once debugfs has been mounted:
> +
> +.. code-block:: sh
> +
> + mount -t debugfs none /sys/kernel/debug
> +
> +
> +Customization
> +=============
> +
> +You can enable or disable profiling for individual files and directories by
> +adding a line similar to the following to the respective kernel Makefile:
> +
> +- For a single file (e.g. main.o)
> +
> + .. code-block:: make
> +
> + PGO_PROFILE_main.o := y
> +
> +- For all files in one directory
> +
> + .. code-block:: make
> +
> + PGO_PROFILE := y
> +
> +To exclude files from being profiled use
> +
> + .. code-block:: make
> +
> + PGO_PROFILE_main.o := n
> +
> +and
> +
> + .. code-block:: make
> +
> + PGO_PROFILE := n
> +
> +Only files which are linked to the main kernel image or are compiled as kernel
> +modules are supported by this mechanism.
> +
> +
> +Files
> +=====
> +
> +The PGO kernel support creates the following files in debugfs:
> +
> +``/sys/kernel/debug/pgo``
> + Parent directory for all PGO-related files.
> +
> +``/sys/kernel/debug/pgo/reset``
> + Global reset file: resets all profiling data to zero when written to.
> +
> +``/sys/kernel/debug/pgo/profraw``
> + The raw PGO data that must be processed with ``llvm-profdata``.
> +
> +
> +Workflow
> +========
> +
> +The PGO kernel can be run on the host or test machines. The profile data
> +should be processed with the tools from the same Clang version that compiled
> +the kernel. Clang tolerates some version skew, but it is easier to use the
> +same Clang version.
> +
> +The profiling data is useful for optimizing the kernel, analyzing coverage,
> +etc. Clang offers tools to perform these tasks.
> +
> +Here is an example workflow for profiling an instrumented kernel with PGO and
> +using the result to optimize the kernel:
> +
> +1) Install the kernel on the TEST machine.
> +
> +2) Reset the data counters right before running the load tests
> +
> + .. code-block:: sh
> +
> + $ echo 1 > /sys/kernel/debug/pgo/reset
> +
> +3) Run the load tests.
> +
> +4) Collect the raw profile data
> +
> + .. code-block:: sh
> +
> + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> +
> +5) (Optional) Download the raw profile data to the HOST machine.
> +
> +6) Process the raw profile data
> +
> + .. code-block:: sh
> +
> + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> +
> + Note that multiple raw profile data files can be merged during this step.
> +
> +7) Rebuild the kernel using the profile data (PGO disabled)
> +
> + .. code-block:: sh
> +
> + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c71664ca8bfd..3a6668792bc5 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14019,6 +14019,15 @@ S: Maintained
> F: include/linux/personality.h
> F: include/uapi/linux/personality.h
>
> +PGO BASED KERNEL PROFILING
> +M: Sami Tolvanen <[email protected]>
> +M: Bill Wendling <[email protected]>
> +R: Nathan Chancellor <[email protected]>
> +R: Nick Desaulniers <[email protected]>
> +S: Supported
> +F: Documentation/dev-tools/pgo.rst
> +F: kernel/pgo
> +
> PHOENIX RC FLIGHT CONTROLLER ADAPTER
> M: Marcus Folkesson <[email protected]>
> L: [email protected]
> diff --git a/Makefile b/Makefile
> index 6ecd0d22e608..b57d4d44c799 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -657,6 +657,9 @@ endif # KBUILD_EXTMOD
> # Defaults to vmlinux, but the arch makefile usually adds further targets
> all: vmlinux
>
> +CFLAGS_PGO_CLANG := -fprofile-generate
> +export CFLAGS_PGO_CLANG
> +
> CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
> $(call cc-option,-fno-tree-loop-im) \
> $(call cc-disable-warning,maybe-uninitialized,)
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 2bb30673d8e6..111e642a2af7 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1192,6 +1192,7 @@ config ARCH_HAS_ELFCORE_COMPAT
> bool
>
> source "kernel/gcov/Kconfig"
> +source "kernel/pgo/Kconfig"
>
> source "scripts/gcc-plugins/Kconfig"
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index cd4b9b1204a8..c9808583b528 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -99,6 +99,7 @@ config X86
> select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
> select ARCH_SUPPORTS_LTO_CLANG if X86_64
> select ARCH_SUPPORTS_LTO_CLANG_THIN if X86_64
> + select ARCH_SUPPORTS_PGO_CLANG if X86_64
> select ARCH_USE_BUILTIN_BSWAP
> select ARCH_USE_QUEUED_RWLOCKS
> select ARCH_USE_QUEUED_SPINLOCKS
> diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> index fe605205b4ce..383853e32f67 100644
> --- a/arch/x86/boot/Makefile
> +++ b/arch/x86/boot/Makefile
> @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
>
> $(obj)/bzImage: asflags-y := $(SVGA_MODE)
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index e0bc3988c3fa..ed12ab65f606 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
>
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE :=n
>
> KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
> diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> index b28e36b7c96b..4b2e9620c412 100644
> --- a/arch/x86/crypto/Makefile
> +++ b/arch/x86/crypto/Makefile
> @@ -4,6 +4,10 @@
>
> OBJECT_FILES_NON_STANDARD := y
>
> +# Disable PGO for curve25519-x86_64. With PGO enabled, clang runs out of
> +# registers for some of the functions.
> +PGO_PROFILE_curve25519-x86_64.o := n
> +
> obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
> twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o
> obj-$(CONFIG_CRYPTO_TWOFISH_X86_64) += twofish-x86_64.o
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index 05c4abc2fdfd..f7421e44725a 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -180,6 +180,7 @@ quiet_cmd_vdso = VDSO $@
> VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
> $(call ld-option, --eh-frame-hdr) -Bsymbolic
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> quiet_cmd_vdso_and_check = VDSO $@
> cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index efd9e9ea17f2..f6cab2316c46 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -184,6 +184,8 @@ SECTIONS
>
> BUG_TABLE
>
> + PGO_CLANG_DATA
> +
> ORC_UNWIND_TABLE
>
> . = ALIGN(PAGE_SIZE);
> diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
> index 84b09c230cbd..5f22b31446ad 100644
> --- a/arch/x86/platform/efi/Makefile
> +++ b/arch/x86/platform/efi/Makefile
> @@ -2,6 +2,7 @@
> OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
> KASAN_SANITIZE := n
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
> obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
> diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> index 95ea17a9d20c..36f20e99da0b 100644
> --- a/arch/x86/purgatory/Makefile
> +++ b/arch/x86/purgatory/Makefile
> @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
>
> # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> KASAN_SANITIZE := n
> UBSAN_SANITIZE := n
> KCSAN_SANITIZE := n
> diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> index 83f1b6a56449..21797192f958 100644
> --- a/arch/x86/realmode/rm/Makefile
> +++ b/arch/x86/realmode/rm/Makefile
> @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
> diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
> index 5943387e3f35..54f5768f5853 100644
> --- a/arch/x86/um/vdso/Makefile
> +++ b/arch/x86/um/vdso/Makefile
> @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@
>
> VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> #
> # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index c23466e05e60..724fb389bb9d 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -42,6 +42,7 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
> KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS))
>
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> # Sanitizer runtimes are unavailable and cannot be linked here.
> KASAN_SANITIZE := n
> KCSAN_SANITIZE := n
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index 6786f8c0182f..4a0c21b840b3 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -329,6 +329,49 @@
> #define DTPM_TABLE()
> #endif
>
> +#ifdef CONFIG_PGO_CLANG
> +#define PGO_CLANG_DATA \
> + __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_start = .; \
> + __llvm_prf_data_start = .; \
> + KEEP(*(__llvm_prf_data)) \
> + . = ALIGN(8); \
> + __llvm_prf_data_end = .; \
> + } \
> + __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_cnts_start = .; \
> + KEEP(*(__llvm_prf_cnts)) \
> + . = ALIGN(8); \
> + __llvm_prf_cnts_end = .; \
> + } \
> + __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
> + . = ALIGN(8); \
> + __llvm_prf_names_start = .; \
> + KEEP(*(__llvm_prf_names)) \
> + . = ALIGN(8); \
> + __llvm_prf_names_end = .; \
> + . = ALIGN(8); \
> + } \
> + __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
> + __llvm_prf_vals_start = .; \
> + KEEP(*(__llvm_prf_vals)) \
> + . = ALIGN(8); \
> + __llvm_prf_vals_end = .; \
> + . = ALIGN(8); \
> + } \
> + __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
> + __llvm_prf_vnds_start = .; \
> + KEEP(*(__llvm_prf_vnds)) \
> + . = ALIGN(8); \
> + __llvm_prf_vnds_end = .; \
> + __llvm_prf_end = .; \
> + }
> +#else
> +#define PGO_CLANG_DATA
> +#endif
> +
> #define KERNEL_DTB() \
> STRUCT_ALIGN(); \
> __dtb_start = .; \
> @@ -1105,6 +1148,7 @@
> CONSTRUCTORS \
> } \
> BUG_TABLE \
> + PGO_CLANG_DATA
>
> #define INIT_TEXT_SECTION(inittext_align) \
> . = ALIGN(inittext_align); \
> diff --git a/kernel/Makefile b/kernel/Makefile
> index 320f1f3941b7..a2a23ef2b12f 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
> obj-$(CONFIG_KCSAN) += kcsan/
> obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
> obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
> +obj-$(CONFIG_PGO_CLANG) += pgo/
>
> obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
> new file mode 100644
> index 000000000000..76a640b6cf6e
> --- /dev/null
> +++ b/kernel/pgo/Kconfig
> @@ -0,0 +1,35 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
> +
> +config ARCH_SUPPORTS_PGO_CLANG
> + bool
> +
> +config PGO_CLANG
> + bool "Enable clang's PGO-based kernel profiling"
> + depends on DEBUG_FS
> + depends on ARCH_SUPPORTS_PGO_CLANG
> + depends on CC_IS_CLANG && CLANG_VERSION >= 120000
> + help
> + This option enables clang's PGO (Profile Guided Optimization) based
> + code profiling to better optimize the kernel.
> +
> + If unsure, say N.
> +
> + Run a representative workload for your application on a kernel
> + compiled with this option and download the raw profile file from
> + /sys/kernel/debug/pgo/profraw. This file needs to be processed with
> + llvm-profdata. It may be merged with other collected raw profiles.
> +
> + Copy the resulting profile file into vmlinux.profdata, and enable
> + KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
> + kernel.
> +
> + Note that a kernel compiled with profiling flags will be
> + significantly larger and run slower. Also be sure to exclude files
> + from profiling which are not linked to the kernel image to prevent
> + linker errors.
> +
> + Note that the debugfs filesystem has to be mounted to access
> + profiling data.
> +
> +endmenu
> diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
> new file mode 100644
> index 000000000000..41e27cefd9a4
> --- /dev/null
> +++ b/kernel/pgo/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0
> +GCOV_PROFILE := n
> +PGO_PROFILE := n
> +
> +obj-y += fs.o instrument.o
> diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
> new file mode 100644
> index 000000000000..1678df3b7d64
> --- /dev/null
> +++ b/kernel/pgo/fs.c
> @@ -0,0 +1,389 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt) "pgo: " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/debugfs.h>
> +#include <linux/fs.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/vmalloc.h>
> +#include "pgo.h"
> +
> +static struct dentry *directory;
> +
> +struct prf_private_data {
> + void *buffer;
> + unsigned long size;
> +};
> +
> +/*
> + * Raw profile data format:
> + *
> + * - llvm_prf_header
> + * - __llvm_prf_data
> + * - __llvm_prf_cnts
> + * - __llvm_prf_names
> + * - zero padding to 8 bytes
> + * - for each llvm_prf_data in __llvm_prf_data:
> + * - llvm_prf_value_data
> + * - llvm_prf_value_record + site count array
> + * - llvm_prf_value_node_data
> + * ...
> + * ...
> + * ...
> + */
> +
> +static void prf_fill_header(void **buffer)
> +{
> + struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
> +
> +#ifdef CONFIG_64BIT
> + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_64;
> +#else
> + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_32;
> +#endif
> + header->version = LLVM_VARIANT_MASK_IR_PROF | LLVM_INSTR_PROF_RAW_VERSION;
> + header->data_size = prf_data_count();
> + header->padding_bytes_before_counters = 0;
> + header->counters_size = prf_cnts_count();
> + header->padding_bytes_after_counters = 0;
> + header->names_size = prf_names_count();
> + header->counters_delta = (u64)__llvm_prf_cnts_start;
> + header->names_delta = (u64)__llvm_prf_names_start;
> + header->value_kind_last = LLVM_INSTR_PROF_IPVK_LAST;
> +
> + *buffer += sizeof(*header);
> +}
> +
> +/*
> + * Copy the source into the buffer, incrementing the pointer into buffer in the
> + * process.
> + */
> +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
> +{
> + memcpy(*buffer, src, size);
> + *buffer += size;
> +}
> +
> +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
> +{
> + struct llvm_prf_value_node **nodes =
> + (struct llvm_prf_value_node **)p->values;
> + u32 kinds = 0;
> + u32 size = 0;
> + unsigned int kind;
> + unsigned int n;
> + unsigned int s = 0;
> +
> + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> + unsigned int sites = p->num_value_sites[kind];
> +
> + if (!sites)
> + continue;
> +
> + /* Record + site count array */
> + size += prf_get_value_record_size(sites);
> + kinds++;
> +
> + if (!nodes)
> + continue;
> +
> + for (n = 0; n < sites; n++) {
> + u32 count = 0;
> + struct llvm_prf_value_node *site = nodes[s + n];
> +
> + while (site && ++count <= U8_MAX)
> + site = site->next;
> +
> + size += count *
> + sizeof(struct llvm_prf_value_node_data);
> + }
> +
> + s += sites;
> + }
> +
> + if (size)
> + size += sizeof(struct llvm_prf_value_data);
> +
> + if (value_kinds)
> + *value_kinds = kinds;
> +
> + return size;
> +}
> +
> +static u32 prf_get_value_size(void)
> +{
> + u32 size = 0;
> + struct llvm_prf_data *p;
> +
> + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> + size += __prf_get_value_size(p, NULL);
> +
> + return size;
> +}
> +
> +/* Serialize the value profiling data of one function. */
> +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
> +{
> + struct llvm_prf_value_data header;
> + struct llvm_prf_value_node **nodes =
> + (struct llvm_prf_value_node **)p->values;
> + unsigned int kind;
> + unsigned int n;
> + unsigned int s = 0;
> +
> + header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
> +
> + if (!header.num_value_kinds)
> + /* Nothing to write. */
> + return;
> +
> + prf_copy_to_buffer(buffer, &header, sizeof(header));
> +
> + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> + struct llvm_prf_value_record *record;
> + u8 *counts;
> + unsigned int sites = p->num_value_sites[kind];
> +
> + if (!sites)
> + continue;
> +
> + /* Profiling value record. */
> + record = *(struct llvm_prf_value_record **)buffer;
> + *buffer += prf_get_value_record_header_size();
> +
> + record->kind = kind;
> + record->num_value_sites = sites;
> +
> + /* Site count array. */
> + counts = *(u8 **)buffer;
> + *buffer += prf_get_value_record_site_count_size(sites);
> +
> + /*
> + * If we don't have nodes, we can skip updating the site count
> + * array, because the buffer is zero filled.
> + */
> + if (!nodes)
> + continue;
> +
> + for (n = 0; n < sites; n++) {
> + u32 count = 0;
> + struct llvm_prf_value_node *site = nodes[s + n];
> +
> + while (site && ++count <= U8_MAX) {
> + prf_copy_to_buffer(buffer, site,
> + sizeof(struct llvm_prf_value_node_data));
> + site = site->next;
> + }
> +
> + counts[n] = (u8)count;
> + }
> +
> + s += sites;
> + }
> +}
> +
> +static void prf_serialize_values(void **buffer)
> +{
> + struct llvm_prf_data *p;
> +
> + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> + prf_serialize_value(p, buffer);
> +}
> +
> +static inline unsigned long prf_get_padding(unsigned long size)
> +{
> + return 7 & (sizeof(u64) - size % sizeof(u64));
> +}
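
To illustrate the padding arithmetic in prf_get_padding(): the expression yields the number of bytes needed to round a size up to the next multiple of 8, and the outer "7 &" turns the result 8 (size already aligned) into 0. A minimal userspace sketch, assuming nothing beyond the quoted expression; the name pad_to_u64 is hypothetical and not part of the patch:

```c
#include <stdint.h>

/*
 * Userspace copy of the expression used by prf_get_padding().
 * Returns how many zero bytes must follow `size` bytes so that the
 * next item starts on an 8-byte boundary.
 */
static inline unsigned long pad_to_u64(unsigned long size)
{
	/* sizeof(uint64_t) - size % 8 is in 1..8; masking with 7 maps 8 to 0. */
	return 7 & (sizeof(uint64_t) - size % sizeof(uint64_t));
}
```

For example, a 13-byte names section needs 3 bytes of padding, while an 8-byte one needs none.
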
> +
> +static unsigned long prf_buffer_size(void)
> +{
> + return sizeof(struct llvm_prf_header) +
> + prf_data_size() +
> + prf_cnts_size() +
> + prf_names_size() +
> + prf_get_padding(prf_names_size()) +
> + prf_get_value_size();
> +}
> +
> +/*
> + * Serialize the profiling data into a format LLVM's tools can understand.
> + * Note: caller *must* hold pgo_lock.
> + */
> +static int prf_serialize(struct prf_private_data *p)
> +{
> + int err = 0;
> + void *buffer;
> +
> + p->size = prf_buffer_size();
> + p->buffer = vzalloc(p->size);
> +
> + if (!p->buffer) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + buffer = p->buffer;
> +
> + prf_fill_header(&buffer);
> + prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
> + prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
> + prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
> + buffer += prf_get_padding(prf_names_size());
> +
> + prf_serialize_values(&buffer);
> +
> +out:
> + return err;
> +}
> +
> +/* open() implementation for PGO. Creates a copy of the profiling data set. */
> +static int prf_open(struct inode *inode, struct file *file)
> +{
> + struct prf_private_data *data;
> + unsigned long flags;
> + int err;
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL);
> + if (!data) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + flags = prf_lock();
> +
> + err = prf_serialize(data);
> + if (unlikely(err)) {
> + kfree(data);
> + goto out_unlock;
> + }
> +
> + file->private_data = data;
> +
> +out_unlock:
> + prf_unlock(flags);
> +out:
> + return err;
> +}
> +
> +/* read() implementation for PGO. */
> +static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
> + loff_t *ppos)
> +{
> + struct prf_private_data *data = file->private_data;
> +
> + BUG_ON(!data);
> +
> + return simple_read_from_buffer(buf, count, ppos, data->buffer,
> + data->size);
> +}
> +
> +/* release() implementation for PGO. Release resources allocated by open(). */
> +static int prf_release(struct inode *inode, struct file *file)
> +{
> + struct prf_private_data *data = file->private_data;
> +
> + if (data) {
> + vfree(data->buffer);
> + kfree(data);
> + }
> +
> + return 0;
> +}
> +
> +static const struct file_operations prf_fops = {
> + .owner = THIS_MODULE,
> + .open = prf_open,
> + .read = prf_read,
> + .llseek = default_llseek,
> + .release = prf_release
> +};
> +
> +/* write() implementation for resetting PGO's profile data. */
> +static ssize_t reset_write(struct file *file, const char __user *addr,
> + size_t len, loff_t *pos)
> +{
> + struct llvm_prf_data *data;
> +
> + memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
> +
> + for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; data++) {
> + struct llvm_prf_value_node **vnodes;
> + u64 current_vsite_count;
> + u32 i;
> +
> + if (!data->values)
> + continue;
> +
> + current_vsite_count = 0;
> + vnodes = (struct llvm_prf_value_node **)data->values;
> +
> + for (i = LLVM_INSTR_PROF_IPVK_FIRST; i <= LLVM_INSTR_PROF_IPVK_LAST; i++)
> + current_vsite_count += data->num_value_sites[i];
> +
> + for (i = 0; i < current_vsite_count; i++) {
> + struct llvm_prf_value_node *current_vnode = vnodes[i];
> +
> + while (current_vnode) {
> + current_vnode->count = 0;
> + current_vnode = current_vnode->next;
> + }
> + }
> + }
> +
> + return len;
> +}
> +
> +static const struct file_operations prf_reset_fops = {
> + .owner = THIS_MODULE,
> + .write = reset_write,
> + .llseek = noop_llseek,
> +};
> +
> +/* Create debugfs entries. */
> +static int __init pgo_init(void)
> +{
> + directory = debugfs_create_dir("pgo", NULL);
> + if (!directory)
> + goto err_remove;
> +
> + if (!debugfs_create_file("profraw", 0600, directory, NULL,
> + &prf_fops))
> + goto err_remove;
> +
> + if (!debugfs_create_file("reset", 0200, directory, NULL,
> + &prf_reset_fops))
> + goto err_remove;
> +
> + return 0;
> +
> +err_remove:
> + pr_err("initialization failed\n");
> + return -EIO;
> +}
> +
> +/* Remove debugfs entries. */
> +static void __exit pgo_exit(void)
> +{
> + debugfs_remove_recursive(directory);
> +}
> +
> +module_init(pgo_init);
> +module_exit(pgo_exit);
> diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
> new file mode 100644
> index 000000000000..62ff5cfce7b1
> --- /dev/null
> +++ b/kernel/pgo/instrument.c
> @@ -0,0 +1,189 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt) "pgo: " fmt
> +
> +#include <linux/bitops.h>
> +#include <linux/kernel.h>
> +#include <linux/export.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +#include "pgo.h"
> +
> +/*
> + * This lock guards both profile count updating and serialization of the
> + * profiling data. Keeping both of these activities separate via locking
> + * ensures that we don't try to serialize data that's only partially updated.
> + */
> +static DEFINE_SPINLOCK(pgo_lock);
> +static int current_node;
> +
> +unsigned long prf_lock(void)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&pgo_lock, flags);
> +
> + return flags;
> +}
> +
> +void prf_unlock(unsigned long flags)
> +{
> + spin_unlock_irqrestore(&pgo_lock, flags);
> +}
> +
> +/*
> + * Return a newly allocated profiling value node to hold a value tracked by
> + * the value profiler.
> + * Note: caller *must* hold pgo_lock.
> + */
> +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
> + u32 index, u64 value)
> +{
> + if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
> + return NULL; /* Out of nodes */
> +
> + current_node++;
> +
> + /* Make sure the node is entirely within the section */
> + if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
> + &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
> + return NULL;
> +
> + return &__llvm_prf_vnds_start[current_node];
> +}
> +
> +/*
> + * Counts the number of times a target value is seen.
> + *
> + * Records the target value for the index if not seen before. Otherwise,
> + * increments the counter associated w/ the target value.
> + */
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
> +{
> + struct llvm_prf_data *p = (struct llvm_prf_data *)data;
> + struct llvm_prf_value_node **counters;
> + struct llvm_prf_value_node *curr;
> + struct llvm_prf_value_node *min = NULL;
> + struct llvm_prf_value_node *prev = NULL;
> + u64 min_count = U64_MAX;
> + u8 values = 0;
> + unsigned long flags;
> +
> + if (!p || !p->values)
> + return;
> +
> + counters = (struct llvm_prf_value_node **)p->values;
> + curr = counters[index];
> +
> + while (curr) {
> + if (target_value == curr->value) {
> + curr->count++;
> + return;
> + }
> +
> + if (curr->count < min_count) {
> + min_count = curr->count;
> + min = curr;
> + }
> +
> + prev = curr;
> + curr = curr->next;
> + values++;
> + }
> +
> + if (values >= LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE) {
> + if (!min->count || !(--min->count)) {
> + curr = min;
> + curr->value = target_value;
> + curr->count++;
> + }
> + return;
> + }
> +
> + /* Lock when updating the value node structure. */
> + flags = prf_lock();
> +
> + curr = allocate_node(p, index, target_value);
> + if (!curr)
> + goto out;
> +
> + curr->value = target_value;
> + curr->count++;
> +
> + if (!counters[index])
> + counters[index] = curr;
> + else if (prev && !prev->next)
> + prev->next = curr;
> +
> +out:
> + prf_unlock(flags);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_target);
> +
> +/* Counts the number of times target values within a range are seen. */
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> + u32 index, s64 precise_start,
> + s64 precise_last, s64 large_value);
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> + u32 index, s64 precise_start,
> + s64 precise_last, s64 large_value)
> +{
> + if (large_value != S64_MIN && (s64)target_value >= large_value)
> + target_value = large_value;
> + else if ((s64)target_value < precise_start ||
> + (s64)target_value > precise_last)
> + target_value = precise_last + 1;
> +
> + __llvm_profile_instrument_target(target_value, data, index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_range);
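
The bucketing done by __llvm_profile_instrument_range() before it hands the value to __llvm_profile_instrument_target() can be read in isolation. A hedged userspace sketch of just that mapping; the function name range_bucket is hypothetical, and INT64_MIN stands in for the kernel's S64_MIN:

```c
#include <stdint.h>

/*
 * Map a target value to the value that is actually recorded:
 *  - anything at or above large_value collapses to large_value
 *    (unless large_value is INT64_MIN, which disables that bucket);
 *  - anything outside [precise_start, precise_last] collapses to
 *    precise_last + 1;
 *  - values inside the precise range are recorded unchanged.
 */
static inline uint64_t range_bucket(uint64_t target, int64_t precise_start,
				    int64_t precise_last, int64_t large_value)
{
	if (large_value != INT64_MIN && (int64_t)target >= large_value)
		return (uint64_t)large_value;
	if ((int64_t)target < precise_start || (int64_t)target > precise_last)
		return (uint64_t)(precise_last + 1);
	return target;
}
```

With a precise range of 0..8 and no large bucket, 5 stays 5 and 20 becomes 9; with a large bucket at 64, 100 becomes 64.
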
> +
> +static u64 inst_prof_get_range_rep_value(u64 value)
> +{
> + if (value <= 8)
> + /* The first ranges are individually tracked, use it as is. */
> + return value;
> + else if (value >= 513)
> + /* The last range is mapped to its lowest value. */
> + return 513;
> + else if (hweight64(value) == 1)
> + /* If it's a power of two, use it as is. */
> + return value;
> +
> + /* Otherwise, map it to the previous power of two + 1. */
> + return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
> +}
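
For readers checking the range mapping by hand, here is a small userspace sketch of the same rules as inst_prof_get_range_rep_value(). It assumes a GCC or Clang toolchain for the builtins, uses __builtin_popcountll in place of the kernel's hweight64(), and the name range_rep_value is hypothetical:

```c
#include <stdint.h>

/*
 * Representative value for a memop size:
 *  0..8          -> tracked individually
 *  >= 513        -> 513
 *  power of two  -> itself
 *  anything else -> previous power of two, plus one
 */
static inline uint64_t range_rep_value(uint64_t value)
{
	if (value <= 8)
		return value;
	if (value >= 513)
		return 513;
	if (__builtin_popcountll(value) == 1)
		return value;
	/* 63 - clz(value) is the index of the highest set bit. */
	return (1ULL << (63 - __builtin_clzll(value))) + 1;
}
```

So 20 maps to 17, 100 maps to 65, 511 maps to 257, and 512 stays 512.
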
> +
> +/*
> + * The target values are partitioned into multiple ranges. The range spec is
> + * defined in compiler-rt/include/profile/InstrProfData.inc.
> + */
> +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> + u32 counter_index);
> +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> + u32 counter_index)
> +{
> + u64 rep_value;
> +
> + /* Map the target value to the representative value of its range. */
> + rep_value = inst_prof_get_range_rep_value(target_value);
> + __llvm_profile_instrument_target(rep_value, data, counter_index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_memop);
> diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
> new file mode 100644
> index 000000000000..ddc8d3002fe5
> --- /dev/null
> +++ b/kernel/pgo/pgo.h
> @@ -0,0 +1,203 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#ifndef _PGO_H
> +#define _PGO_H
> +
> +/*
> + * Note: These internal LLVM definitions must match the compiler version.
> + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
> + */
> +
> +#define LLVM_INSTR_PROF_RAW_MAGIC_64 \
> + ((u64)255 << 56 | \
> + (u64)'l' << 48 | \
> + (u64)'p' << 40 | \
> + (u64)'r' << 32 | \
> + (u64)'o' << 24 | \
> + (u64)'f' << 16 | \
> + (u64)'r' << 8 | \
> + (u64)129)
> +#define LLVM_INSTR_PROF_RAW_MAGIC_32 \
> + ((u64)255 << 56 | \
> + (u64)'l' << 48 | \
> + (u64)'p' << 40 | \
> + (u64)'r' << 32 | \
> + (u64)'o' << 24 | \
> + (u64)'f' << 16 | \
> + (u64)'R' << 8 | \
> + (u64)129)
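
As a sanity check on the two magic constants, the byte layout can be rebuilt in userspace. This is only a sketch of the arithmetic in the macros above; raw_magic is a hypothetical helper, and the only difference between the 64-bit and 32-bit variants is the seventh byte ('r' versus 'R'):

```c
#include <stdint.h>

/* Assemble the raw profile magic: 0xff 'l' 'p' 'r' 'o' 'f' <seventh> 0x81. */
static inline uint64_t raw_magic(char seventh)
{
	return (uint64_t)255 << 56 |
	       (uint64_t)'l' << 48 |
	       (uint64_t)'p' << 40 |
	       (uint64_t)'r' << 32 |
	       (uint64_t)'o' << 24 |
	       (uint64_t)'f' << 16 |
	       (uint64_t)(unsigned char)seventh << 8 |
	       (uint64_t)129;
}
```

raw_magic('r') evaluates to 0xFF6C70726F667281 and raw_magic('R') to 0xFF6C70726F665281.
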
> +
> +#define LLVM_INSTR_PROF_RAW_VERSION 5
> +#define LLVM_INSTR_PROF_DATA_ALIGNMENT 8
> +#define LLVM_INSTR_PROF_IPVK_FIRST 0
> +#define LLVM_INSTR_PROF_IPVK_LAST 1
> +#define LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE 255
> +
> +#define LLVM_VARIANT_MASK_IR_PROF (0x1ULL << 56)
> +#define LLVM_VARIANT_MASK_CSIR_PROF (0x1ULL << 57)
> +
> +/**
> + * struct llvm_prf_header - represents the raw profile header data structure.
> + * @magic: the magic token for the file format.
> + * @version: the version of the file format.
> + * @data_size: the number of entries in the profile data section.
> + * @padding_bytes_before_counters: the number of padding bytes before the
> + * counters.
> + * @counters_size: the size in bytes of the LLVM profile section containing the
> + * counters.
> + * @padding_bytes_after_counters: the number of padding bytes after the
> + * counters.
> + * @names_size: the size in bytes of the LLVM profile section containing the
> + * counters' names.
> + * @counters_delta: the beginning of the LLVM profile counters section.
> + * @names_delta: the beginning of the LLVM profile names section.
> + * @value_kind_last: the last profile value kind.
> + */
> +struct llvm_prf_header {
> + u64 magic;
> + u64 version;
> + u64 data_size;
> + u64 padding_bytes_before_counters;
> + u64 counters_size;
> + u64 padding_bytes_after_counters;
> + u64 names_size;
> + u64 counters_delta;
> + u64 names_delta;
> + u64 value_kind_last;
> +};
> +
> +/**
> + * struct llvm_prf_data - represents the per-function control structure.
> + * @name_ref: the reference to the function's name.
> + * @func_hash: the hash value of the function.
> + * @counter_ptr: a pointer to the profile counter.
> + * @function_ptr: a pointer to the function.
> + * @values: the profiling values associated with this function.
> + * @num_counters: the number of counters in the function.
> + * @num_value_sites: the number of value profile sites.
> + */
> +struct llvm_prf_data {
> + const u64 name_ref;
> + const u64 func_hash;
> + const void *counter_ptr;
> + const void *function_ptr;
> + void *values;
> + const u32 num_counters;
> + const u16 num_value_sites[LLVM_INSTR_PROF_IPVK_LAST + 1];
> +} __aligned(LLVM_INSTR_PROF_DATA_ALIGNMENT);
> +
> +/**
> + * struct llvm_prf_value_node_data - represents the data part of the struct
> + * llvm_prf_value_node data structure.
> + * @value: the value counters.
> + * @count: the counters' count.
> + */
> +struct llvm_prf_value_node_data {
> + u64 value;
> + u64 count;
> +};
> +
> +/**
> + * struct llvm_prf_value_node - represents an internal data structure used by
> + * the value profiler.
> + * @value: the value counters.
> + * @count: the counters' count.
> + * @next: the next value node.
> + */
> +struct llvm_prf_value_node {
> + u64 value;
> + u64 count;
> + struct llvm_prf_value_node *next;
> +};
> +
> +/**
> + * struct llvm_prf_value_data - represents the value profiling data in indexed
> + * format.
> + * @total_size: the total size in bytes including this field.
> + * @num_value_kinds: the number of value profile kinds that have value profile
> + * data.
> + */
> +struct llvm_prf_value_data {
> + u32 total_size;
> + u32 num_value_kinds;
> +};
> +
> +/**
> + * struct llvm_prf_value_record - represents the on-disk layout of the value
> + * profile data of a particular kind for one function.
> + * @kind: the kind of the value profile record.
> + * @num_value_sites: the number of value profile sites.
> + * @site_count_array: the first element of the array that stores the number
> + * of profiled values for each value site.
> + */
> +struct llvm_prf_value_record {
> + u32 kind;
> + u32 num_value_sites;
> + u8 site_count_array[];
> +};
> +
> +#define prf_get_value_record_header_size() \
> + offsetof(struct llvm_prf_value_record, site_count_array)
> +#define prf_get_value_record_site_count_size(sites) \
> + roundup((sites), 8)
> +#define prf_get_value_record_size(sites) \
> + (prf_get_value_record_header_size() + \
> + prf_get_value_record_site_count_size((sites)))
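
The three size macros are easier to follow with numbers. A rough userspace sketch, assuming the same record layout as struct llvm_prf_value_record; the struct and function names here are hypothetical stand-ins, and the rounding is written out instead of using the kernel's roundup():

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct llvm_prf_value_record. */
struct value_record {
	uint32_t kind;
	uint32_t num_value_sites;
	uint8_t site_count_array[];	/* one byte per value site */
};

/* Fixed header plus the site count array rounded up to 8 bytes. */
static inline size_t value_record_size(uint32_t sites)
{
	size_t header = offsetof(struct value_record, site_count_array);
	size_t counts = ((size_t)sites + 7) / 8 * 8;

	return header + counts;
}
```

The header is 8 bytes, so records with 1 to 8 sites take 16 bytes and a record with 9 sites takes 24.
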
> +
> +/* Data sections */
> +extern struct llvm_prf_data __llvm_prf_data_start[];
> +extern struct llvm_prf_data __llvm_prf_data_end[];
> +
> +extern u64 __llvm_prf_cnts_start[];
> +extern u64 __llvm_prf_cnts_end[];
> +
> +extern char __llvm_prf_names_start[];
> +extern char __llvm_prf_names_end[];
> +
> +extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
> +extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
> +
> +/* Locking for vnodes */
> +extern unsigned long prf_lock(void);
> +extern void prf_unlock(unsigned long flags);
> +
> +#define __DEFINE_PRF_SIZE(s) \
> + static inline unsigned long prf_ ## s ## _size(void) \
> + { \
> + unsigned long start = \
> + (unsigned long)__llvm_prf_ ## s ## _start; \
> + unsigned long end = \
> + (unsigned long)__llvm_prf_ ## s ## _end; \
> + return roundup(end - start, \
> + sizeof(__llvm_prf_ ## s ## _start[0])); \
> + } \
> + static inline unsigned long prf_ ## s ## _count(void) \
> + { \
> + return prf_ ## s ## _size() / \
> + sizeof(__llvm_prf_ ## s ## _start[0]); \
> + }
> +
> +__DEFINE_PRF_SIZE(data);
> +__DEFINE_PRF_SIZE(cnts);
> +__DEFINE_PRF_SIZE(names);
> +__DEFINE_PRF_SIZE(vnds);
> +
> +#undef __DEFINE_PRF_SIZE
> +
> +#endif /* _PGO_H */
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index eee59184de64..48a65d092c5b 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -139,6 +139,16 @@ _c_flags += $(if $(patsubst n%,, \
> $(CFLAGS_GCOV))
> endif
>
> +#
> +# Enable clang's PGO profiling flags for a file or directory depending on
> +# variables PGO_PROFILE_obj.o and PGO_PROFILE.
> +#
> +ifeq ($(CONFIG_PGO_CLANG),y)
> +_c_flags += $(if $(patsubst n%,, \
> + $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
> + $(CFLAGS_PGO_CLANG))
> +endif
> +
> #
> # Enable address sanitizer flags for kernel except some files or directories
> # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
> --
> 2.30.1.766.gb4fecdf3b7-goog
>

2021-02-28 20:39:22

by Fangrui Song

Subject: Re: [PATCH v8] pgo: add clang's Profile Guided Optimization infrastructure

Reviewed-by: Fangrui Song <[email protected]>

Some minor items below:

On 2021-02-26, 'Bill Wendling' via Clang Built Linux wrote:
>From: Sami Tolvanen <[email protected]>
>
>Enable the use of clang's Profile-Guided Optimization[1]. To generate a
>profile, the kernel is instrumented with PGO counters, a representative
>workload is run, and the raw profile data is collected from
>/sys/kernel/debug/pgo/profraw.
>
>The raw profile data must be processed by clang's "llvm-profdata" tool
>before it can be used during recompilation:
>
> $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
>Multiple raw profiles may be merged during this step.
>
>The data can now be used by the compiler:
>
> $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>
>This initial submission is restricted to x86, as that's the platform we
>know works. This restriction can be lifted once other platforms have
>been verified to work with PGO.
>
>Note that this method of profiling the kernel is clang-native, unlike
>the clang support in kernel/gcov.
>
>[1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>
>Signed-off-by: Sami Tolvanen <[email protected]>
>Co-developed-by: Bill Wendling <[email protected]>
>Signed-off-by: Bill Wendling <[email protected]>
>---
>v8: - Rebased on top-of-tree.
>v7: - Fix minor build failure reported by Sedat.
>v6: - Add better documentation about the locking scheme and other things.
> - Rename macros to better match the same macros in LLVM's source code.
>v5: - Correct padding calculation, discovered by Nathan Chancellor.
>v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
> own popcount implementation, based on Nick Desaulniers's comment.
>v3: - Added change log section based on Sedat Dilek's comments.
>v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
> testing.
> - Corrected documentation, re PGO flags when using LTO, based on Fangrui
> Song's comments.
>---
> Documentation/dev-tools/index.rst | 1 +
> Documentation/dev-tools/pgo.rst | 127 +++++++++
> MAINTAINERS | 9 +
> Makefile | 3 +
> arch/Kconfig | 1 +
> arch/x86/Kconfig | 1 +
> arch/x86/boot/Makefile | 1 +
> arch/x86/boot/compressed/Makefile | 1 +
> arch/x86/crypto/Makefile | 4 +
> arch/x86/entry/vdso/Makefile | 1 +
> arch/x86/kernel/vmlinux.lds.S | 2 +
> arch/x86/platform/efi/Makefile | 1 +
> arch/x86/purgatory/Makefile | 1 +
> arch/x86/realmode/rm/Makefile | 1 +
> arch/x86/um/vdso/Makefile | 1 +
> drivers/firmware/efi/libstub/Makefile | 1 +
> include/asm-generic/vmlinux.lds.h | 44 +++
> kernel/Makefile | 1 +
> kernel/pgo/Kconfig | 35 +++
> kernel/pgo/Makefile | 5 +
> kernel/pgo/fs.c | 389 ++++++++++++++++++++++++++
> kernel/pgo/instrument.c | 189 +++++++++++++
> kernel/pgo/pgo.h | 203 ++++++++++++++
> scripts/Makefile.lib | 10 +
> 24 files changed, 1032 insertions(+)
> create mode 100644 Documentation/dev-tools/pgo.rst
> create mode 100644 kernel/pgo/Kconfig
> create mode 100644 kernel/pgo/Makefile
> create mode 100644 kernel/pgo/fs.c
> create mode 100644 kernel/pgo/instrument.c
> create mode 100644 kernel/pgo/pgo.h
>
>diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
>index f7809c7b1ba9..8d6418e85806 100644
>--- a/Documentation/dev-tools/index.rst
>+++ b/Documentation/dev-tools/index.rst
>@@ -26,6 +26,7 @@ whole; patches welcome!
> kgdb
> kselftest
> kunit/index
>+ pgo
>
>
> .. only:: subproject and html
>diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
>new file mode 100644
>index 000000000000..b7f11d8405b7
>--- /dev/null
>+++ b/Documentation/dev-tools/pgo.rst
>@@ -0,0 +1,127 @@
>+.. SPDX-License-Identifier: GPL-2.0
>+
>+===============================
>+Using PGO with the Linux kernel
>+===============================
>+
>+Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
>+when building with Clang. The profiling data is exported via the ``pgo``
>+debugfs directory.
>+
>+.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>+
>+
>+Preparation
>+===========
>+
>+Configure the kernel with:
>+
>+.. code-block:: make
>+
>+ CONFIG_DEBUG_FS=y
>+ CONFIG_PGO_CLANG=y
>+
>+Note that kernels compiled with profiling flags will be significantly larger
>+and run slower.
>+
>+Profiling data will only become accessible once debugfs has been mounted:
>+
>+.. code-block:: sh
>+
>+ mount -t debugfs none /sys/kernel/debug
>+
>+
>+Customization
>+=============
>+
>+You can enable or disable profiling for individual files and directories by
>+adding a line similar to the following to the respective kernel Makefile:
>+
>+- For a single file (e.g. main.o)
>+
>+ .. code-block:: make
>+
>+ PGO_PROFILE_main.o := y
>+
>+- For all files in one directory
>+
>+ .. code-block:: make
>+
>+ PGO_PROFILE := y
>+
>+To exclude files from being profiled use
>+
>+ .. code-block:: make
>+
>+ PGO_PROFILE_main.o := n
>+
>+and
>+
>+ .. code-block:: make
>+
>+ PGO_PROFILE := n
>+
>+Only files which are linked to the main kernel image or are compiled as kernel
>+modules are supported by this mechanism.
>+
>+
>+Files
>+=====
>+
>+The PGO kernel support creates the following files in debugfs:
>+
>+``/sys/kernel/debug/pgo``
>+ Parent directory for all PGO-related files.
>+
>+``/sys/kernel/debug/pgo/reset``
>+ Global reset file: resets all coverage data to zero when written to.
>+
>+``/sys/kernel/debug/pgo/profraw``
>+ The raw PGO data that must be processed with ``llvm-profdata``.
>+
>+
>+Workflow
>+========
>+
>+The PGO kernel can be run on the host or test machines. The data, though, should
>+be analyzed with Clang's tools from the same Clang version that the kernel was
>+compiled with. Clang is tolerant of version skew, but it's easier to use the
>+same Clang version.
>+
>+The profiling data is useful for optimizing the kernel, analyzing coverage,
>+etc. Clang offers tools to perform these tasks.
>+
>+Here is an example workflow for profiling an instrumented kernel with PGO and
>+using the result to optimize the kernel:
>+
>+1) Install the kernel on the TEST machine.
>+
>+2) Reset the data counters right before running the load tests
>+
>+ .. code-block:: sh
>+
>+ $ echo 1 > /sys/kernel/debug/pgo/reset
>+
>+3) Run the load tests.
>+
>+4) Collect the raw profile data
>+
>+ .. code-block:: sh
>+
>+ $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
>+
>+5) (Optional) Download the raw profile data to the HOST machine.
>+
>+6) Process the raw profile data
>+
>+ .. code-block:: sh
>+
>+ $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>+
>+ Note that multiple raw profile data files can be merged during this step.
>+
>+7) Rebuild the kernel using the profile data (PGO disabled)
>+
>+ .. code-block:: sh
>+
>+ $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>diff --git a/MAINTAINERS b/MAINTAINERS
>index c71664ca8bfd..3a6668792bc5 100644
>--- a/MAINTAINERS
>+++ b/MAINTAINERS
>@@ -14019,6 +14019,15 @@ S: Maintained
> F: include/linux/personality.h
> F: include/uapi/linux/personality.h
>
>+PGO BASED KERNEL PROFILING
>+M: Sami Tolvanen <[email protected]>
>+M: Bill Wendling <[email protected]>
>+R: Nathan Chancellor <[email protected]>
>+R: Nick Desaulniers <[email protected]>
>+S: Supported
>+F: Documentation/dev-tools/pgo.rst
>+F: kernel/pgo
>+
> PHOENIX RC FLIGHT CONTROLLER ADAPTER
> M: Marcus Folkesson <[email protected]>
> L: [email protected]
>diff --git a/Makefile b/Makefile
>index 6ecd0d22e608..b57d4d44c799 100644
>--- a/Makefile
>+++ b/Makefile
>@@ -657,6 +657,9 @@ endif # KBUILD_EXTMOD
> # Defaults to vmlinux, but the arch makefile usually adds further targets
> all: vmlinux
>
>+CFLAGS_PGO_CLANG := -fprofile-generate
>+export CFLAGS_PGO_CLANG
>+
> CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
> $(call cc-option,-fno-tree-loop-im) \
> $(call cc-disable-warning,maybe-uninitialized,)
>diff --git a/arch/Kconfig b/arch/Kconfig
>index 2bb30673d8e6..111e642a2af7 100644
>--- a/arch/Kconfig
>+++ b/arch/Kconfig
>@@ -1192,6 +1192,7 @@ config ARCH_HAS_ELFCORE_COMPAT
> bool
>
> source "kernel/gcov/Kconfig"
>+source "kernel/pgo/Kconfig"
>
> source "scripts/gcc-plugins/Kconfig"
>
>diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>index cd4b9b1204a8..c9808583b528 100644
>--- a/arch/x86/Kconfig
>+++ b/arch/x86/Kconfig
>@@ -99,6 +99,7 @@ config X86
> select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
> select ARCH_SUPPORTS_LTO_CLANG if X86_64
> select ARCH_SUPPORTS_LTO_CLANG_THIN if X86_64
>+ select ARCH_SUPPORTS_PGO_CLANG if X86_64
> select ARCH_USE_BUILTIN_BSWAP
> select ARCH_USE_QUEUED_RWLOCKS
> select ARCH_USE_QUEUED_SPINLOCKS
>diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
>index fe605205b4ce..383853e32f67 100644
>--- a/arch/x86/boot/Makefile
>+++ b/arch/x86/boot/Makefile
>@@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> UBSAN_SANITIZE := n
>
> $(obj)/bzImage: asflags-y := $(SVGA_MODE)
>diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
>index e0bc3988c3fa..ed12ab65f606 100644
>--- a/arch/x86/boot/compressed/Makefile
>+++ b/arch/x86/boot/compressed/Makefile
>@@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
>
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> UBSAN_SANITIZE :=n
>
> KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
>diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
>index b28e36b7c96b..4b2e9620c412 100644
>--- a/arch/x86/crypto/Makefile
>+++ b/arch/x86/crypto/Makefile
>@@ -4,6 +4,10 @@
>
> OBJECT_FILES_NON_STANDARD := y
>
>+# Disable PGO for curve25519-x86_64. With PGO enabled, clang runs out of
>+# registers for some of the functions.
>+PGO_PROFILE_curve25519-x86_64.o := n
>+
> obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
> twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o
> obj-$(CONFIG_CRYPTO_TWOFISH_X86_64) += twofish-x86_64.o
>diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
>index 05c4abc2fdfd..f7421e44725a 100644
>--- a/arch/x86/entry/vdso/Makefile
>+++ b/arch/x86/entry/vdso/Makefile
>@@ -180,6 +180,7 @@ quiet_cmd_vdso = VDSO $@
> VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
> $(call ld-option, --eh-frame-hdr) -Bsymbolic
> GCOV_PROFILE := n
>+PGO_PROFILE := n
>
> quiet_cmd_vdso_and_check = VDSO $@
> cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
>diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
>index efd9e9ea17f2..f6cab2316c46 100644
>--- a/arch/x86/kernel/vmlinux.lds.S
>+++ b/arch/x86/kernel/vmlinux.lds.S
>@@ -184,6 +184,8 @@ SECTIONS
>
> BUG_TABLE
>
>+ PGO_CLANG_DATA
>+
> ORC_UNWIND_TABLE
>
> . = ALIGN(PAGE_SIZE);
>diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
>index 84b09c230cbd..5f22b31446ad 100644
>--- a/arch/x86/platform/efi/Makefile
>+++ b/arch/x86/platform/efi/Makefile
>@@ -2,6 +2,7 @@
> OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
> KASAN_SANITIZE := n
> GCOV_PROFILE := n
>+PGO_PROFILE := n
>
> obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
> obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
>diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
>index 95ea17a9d20c..36f20e99da0b 100644
>--- a/arch/x86/purgatory/Makefile
>+++ b/arch/x86/purgatory/Makefile
>@@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
>
> # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> KASAN_SANITIZE := n
> UBSAN_SANITIZE := n
> KCSAN_SANITIZE := n
>diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
>index 83f1b6a56449..21797192f958 100644
>--- a/arch/x86/realmode/rm/Makefile
>+++ b/arch/x86/realmode/rm/Makefile
>@@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> UBSAN_SANITIZE := n
>diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
>index 5943387e3f35..54f5768f5853 100644
>--- a/arch/x86/um/vdso/Makefile
>+++ b/arch/x86/um/vdso/Makefile
>@@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@
>
> VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
> GCOV_PROFILE := n
>+PGO_PROFILE := n
>
> #
> # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
>diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
>index c23466e05e60..724fb389bb9d 100644
>--- a/drivers/firmware/efi/libstub/Makefile
>+++ b/drivers/firmware/efi/libstub/Makefile
>@@ -42,6 +42,7 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
> KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS))
>
> GCOV_PROFILE := n
>+PGO_PROFILE := n
> # Sanitizer runtimes are unavailable and cannot be linked here.
> KASAN_SANITIZE := n
> KCSAN_SANITIZE := n
>diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
>index 6786f8c0182f..4a0c21b840b3 100644
>--- a/include/asm-generic/vmlinux.lds.h
>+++ b/include/asm-generic/vmlinux.lds.h
>@@ -329,6 +329,49 @@
> #define DTPM_TABLE()
> #endif
>
>+#ifdef CONFIG_PGO_CLANG
>+#define PGO_CLANG_DATA \
>+ __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
>+ . = ALIGN(8); \
>+ __llvm_prf_start = .; \
>+ __llvm_prf_data_start = .; \
>+ KEEP(*(__llvm_prf_data)) \
>+ . = ALIGN(8); \
>+ __llvm_prf_data_end = .; \
>+ } \

Some minor items on linker script usage. The end of a metadata section
usually does not need alignment. Does the . = ALIGN(8) have
significance? Ditto below.
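
One effect the trailing ALIGN(8) does appear to have, if I read
__DEFINE_PRF_SIZE in pgo.h correctly: the size is computed as end - start
rounded up to the element size, so for a byte-granular section such as
__llvm_prf_names any padding placed before the _end symbol is counted as
payload. A minimal userspace sketch of that arithmetic follows; align_up and
region_size are hypothetical stand-ins rather than kernel helpers, and the
addresses used with them are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of a (a power of two), like `. = ALIGN(a)`. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Mirrors prf_<s>_size(): (end - start) rounded up to the element size. */
static uint64_t region_size(uint64_t start, uint64_t end, uint64_t elem_size)
{
	uint64_t len = end - start;

	assert(elem_size != 0);
	return (len + elem_size - 1) / elem_size * elem_size;
}
```

With 13 bytes of names starting at 0x1000, an unaligned end symbol gives a
size of 13, while an end symbol placed after ALIGN(8) gives 16, so the three
padding bytes end up in the reported names size.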



This is an item about LD_DEAD_CODE_DATA_ELIMINATION. Feel free to
postpone it until after this patch is in tree:

KEEP(*(__llvm_prf_data))

KEEP should be dropped.

I have been working on improving linker GC for such metadata sections
(a recent interest of mine :)
https://maskray.me/blog/2021-01-31-metadata-sections-comdat-and-shf-link-order)

With LLVM>=13 (https://reviews.llvm.org/D96757), __llvm_prf_* sections
associated with non-COMDAT text sections can be GCed as well. KEEP would
unnecessarily retain them under LD_DEAD_CODE_DATA_ELIMINATION.

For older releases (at least 10<=LLVM<13), such __llvm_prf_* sections
are not in zero flag section groups, so they usually cannot be discarded.
So with or without KEEP, you probably won't see much of a size
difference.

>+ __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
>+ . = ALIGN(8); \
>+ __llvm_prf_cnts_start = .; \
>+ KEEP(*(__llvm_prf_cnts)) \
>+ . = ALIGN(8); \
>+ __llvm_prf_cnts_end = .; \
>+ } \
>+ __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
>+ . = ALIGN(8); \
>+ __llvm_prf_names_start = .; \
>+ KEEP(*(__llvm_prf_names)) \
>+ . = ALIGN(8); \
>+ __llvm_prf_names_end = .; \
>+ . = ALIGN(8); \
>+ } \

__llvm_prf_names does not need alignment.
Its alignment is often 1 in userspace programs.
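
This seems consistent with how the serializer in this patch treats the names
blob: prf_buffer_size() in fs.c adds prf_get_padding(prf_names_size()), so
whatever follows the names starts on an 8-byte boundary regardless of the
section's own alignment. A userspace sketch of that helper is below; the
function names are hypothetical, and only the padding expression is taken
from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes of zero padding needed to bring size up to a multiple of 8. */
static size_t names_padding(size_t size)
{
	return 7 & (sizeof(uint64_t) - size % sizeof(uint64_t));
}

/* Size of the names blob once the serializer's padding is appended. */
static size_t padded_names_size(size_t size)
{
	size_t total = size + names_padding(size);

	assert(total % sizeof(uint64_t) == 0);
	return total;
}
```

A size that is already a multiple of 8 needs no padding, which is the case
whenever the linker script has aligned the end symbol.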

>+ __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
>+ __llvm_prf_vals_start = .; \
>+ KEEP(*(__llvm_prf_vals)) \
>+ . = ALIGN(8); \
>+ __llvm_prf_vals_end = .; \
>+ . = ALIGN(8); \
>+ } \
>+ __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
>+ __llvm_prf_vnds_start = .; \
>+ KEEP(*(__llvm_prf_vnds)) \
>+ . = ALIGN(8); \
>+ __llvm_prf_vnds_end = .; \
>+ __llvm_prf_end = .; \
>+ }

In userspace PGO instrumentation, the start is often aligned to 16.
The end does not need alignment.

>+#else
>+#define PGO_CLANG_DATA
>+#endif
>+
> #define KERNEL_DTB() \
> STRUCT_ALIGN(); \
> __dtb_start = .; \
>@@ -1105,6 +1148,7 @@
> CONSTRUCTORS \
> } \
> BUG_TABLE \
>+ PGO_CLANG_DATA
>
> #define INIT_TEXT_SECTION(inittext_align) \
> . = ALIGN(inittext_align); \
>diff --git a/kernel/Makefile b/kernel/Makefile
>index 320f1f3941b7..a2a23ef2b12f 100644
>--- a/kernel/Makefile
>+++ b/kernel/Makefile
>@@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
> obj-$(CONFIG_KCSAN) += kcsan/
> obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
> obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
>+obj-$(CONFIG_PGO_CLANG) += pgo/
>
> obj-$(CONFIG_PERF_EVENTS) += events/
>
>diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
>new file mode 100644
>index 000000000000..76a640b6cf6e
>--- /dev/null
>+++ b/kernel/pgo/Kconfig
>@@ -0,0 +1,35 @@
>+# SPDX-License-Identifier: GPL-2.0-only
>+menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
>+
>+config ARCH_SUPPORTS_PGO_CLANG
>+ bool
>+
>+config PGO_CLANG
>+ bool "Enable clang's PGO-based kernel profiling"
>+ depends on DEBUG_FS
>+ depends on ARCH_SUPPORTS_PGO_CLANG
>+ depends on CC_IS_CLANG && CLANG_VERSION >= 120000
>+ help
>+ This option enables clang's PGO (Profile Guided Optimization) based
>+ code profiling to better optimize the kernel.
>+
>+ If unsure, say N.
>+
>+ Run a representative workload for your application on a kernel
>+ compiled with this option and download the raw profile file from
>+ /sys/kernel/debug/pgo/profraw. This file needs to be processed with
>+ llvm-profdata. It may be merged with other collected raw profiles.
>+
>+ Copy the resulting profile file into vmlinux.profdata, and enable
>+ KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
>+ kernel.
>+
>+ Note that a kernel compiled with profiling flags will be
>+ significantly larger and run slower. Also be sure to exclude files
>+ from profiling which are not linked to the kernel image to prevent
>+ linker errors.
>+
>+ Note that the debugfs filesystem has to be mounted to access
>+ profiling data.
>+
>+endmenu
>diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
>new file mode 100644
>index 000000000000..41e27cefd9a4
>--- /dev/null
>+++ b/kernel/pgo/Makefile
>@@ -0,0 +1,5 @@
>+# SPDX-License-Identifier: GPL-2.0
>+GCOV_PROFILE := n
>+PGO_PROFILE := n
>+
>+obj-y += fs.o instrument.o
>diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
>new file mode 100644
>index 000000000000..1678df3b7d64
>--- /dev/null
>+++ b/kernel/pgo/fs.c
>@@ -0,0 +1,389 @@
>+// SPDX-License-Identifier: GPL-2.0
>+/*
>+ * Copyright (C) 2019 Google, Inc.
>+ *
>+ * Author:
>+ * Sami Tolvanen <[email protected]>
>+ *
>+ * This software is licensed under the terms of the GNU General Public
>+ * License version 2, as published by the Free Software Foundation, and
>+ * may be copied, distributed, and modified under those terms.
>+ *
>+ * This program is distributed in the hope that it will be useful,
>+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
>+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>+ * GNU General Public License for more details.
>+ *
>+ */
>+
>+#define pr_fmt(fmt) "pgo: " fmt
>+
>+#include <linux/kernel.h>
>+#include <linux/debugfs.h>
>+#include <linux/fs.h>
>+#include <linux/module.h>
>+#include <linux/slab.h>
>+#include <linux/vmalloc.h>
>+#include "pgo.h"
>+
>+static struct dentry *directory;
>+
>+struct prf_private_data {
>+ void *buffer;
>+ unsigned long size;
>+};
>+
>+/*
>+ * Raw profile data format:
>+ *
>+ * - llvm_prf_header
>+ * - __llvm_prf_data
>+ * - __llvm_prf_cnts
>+ * - __llvm_prf_names
>+ * - zero padding to 8 bytes
>+ * - for each llvm_prf_data in __llvm_prf_data:
>+ * - llvm_prf_value_data
>+ * - llvm_prf_value_record + site count array
>+ * - llvm_prf_value_node_data
>+ * ...
>+ * ...
>+ * ...
>+ */
>+
>+static void prf_fill_header(void **buffer)
>+{
>+ struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
>+
>+#ifdef CONFIG_64BIT
>+ header->magic = LLVM_INSTR_PROF_RAW_MAGIC_64;
>+#else
>+ header->magic = LLVM_INSTR_PROF_RAW_MAGIC_32;
>+#endif
>+ header->version = LLVM_VARIANT_MASK_IR_PROF | LLVM_INSTR_PROF_RAW_VERSION;
>+ header->data_size = prf_data_count();
>+ header->padding_bytes_before_counters = 0;
>+ header->counters_size = prf_cnts_count();
>+ header->padding_bytes_after_counters = 0;
>+ header->names_size = prf_names_count();
>+ header->counters_delta = (u64)__llvm_prf_cnts_start;
>+ header->names_delta = (u64)__llvm_prf_names_start;
>+ header->value_kind_last = LLVM_INSTR_PROF_IPVK_LAST;
>+
>+ *buffer += sizeof(*header);
>+}
>+
>+/*
>+ * Copy the source into the buffer, incrementing the pointer into buffer in the
>+ * process.
>+ */
>+static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
>+{
>+ memcpy(*buffer, src, size);
>+ *buffer += size;
>+}
>+
>+static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
>+{
>+ struct llvm_prf_value_node **nodes =
>+ (struct llvm_prf_value_node **)p->values;
>+ u32 kinds = 0;
>+ u32 size = 0;
>+ unsigned int kind;
>+ unsigned int n;
>+ unsigned int s = 0;
>+
>+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
>+ unsigned int sites = p->num_value_sites[kind];
>+
>+ if (!sites)
>+ continue;
>+
>+ /* Record + site count array */
>+ size += prf_get_value_record_size(sites);
>+ kinds++;
>+
>+ if (!nodes)
>+ continue;
>+
>+ for (n = 0; n < sites; n++) {
>+ u32 count = 0;
>+ struct llvm_prf_value_node *site = nodes[s + n];
>+
>+ while (site && ++count <= U8_MAX)
>+ site = site->next;
>+
>+ size += count *
>+ sizeof(struct llvm_prf_value_node_data);
>+ }
>+
>+ s += sites;
>+ }
>+
>+ if (size)
>+ size += sizeof(struct llvm_prf_value_data);
>+
>+ if (value_kinds)
>+ *value_kinds = kinds;
>+
>+ return size;
>+}
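
For anyone following the size computation above, here is a userspace sketch
of the per-function layout it accounts for: one llvm_prf_value_data header,
then for each value kind with at least one site a record header plus a site
count array rounded up to 8 bytes, followed by 16 bytes per recorded node.
The macro and function names are hypothetical stand-ins, and the sketch
takes the node total per kind as an input instead of walking the lists (the
patch additionally bounds each per-site walk with U8_MAX).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VALUE_DATA_HDR   8u  /* two u32 fields: total_size, num_value_kinds */
#define VALUE_RECORD_HDR 8u  /* two u32 fields: kind, num_value_sites */
#define VALUE_NODE_DATA  16u /* two u64 fields: value, count */

/* One record: header, site counts padded to 8 bytes, then the node data. */
static uint32_t value_record_bytes(uint32_t sites, uint32_t total_nodes)
{
	uint32_t site_counts = (sites + 7u) / 8u * 8u; /* roundup(sites, 8) */

	assert(sites != 0);
	return VALUE_RECORD_HDR + site_counts + total_nodes * VALUE_NODE_DATA;
}

/* Total for one function: all non-empty kinds plus the leading header. */
static uint32_t value_data_bytes(const uint32_t *sites, const uint32_t *nodes,
				 size_t kinds)
{
	uint32_t size = 0;

	for (size_t k = 0; k < kinds; k++) {
		if (!sites[k])
			continue; /* kinds without sites contribute nothing */
		size += value_record_bytes(sites[k], nodes[k]);
	}

	return size ? size + VALUE_DATA_HDR : 0;
}
```

A function with no value sites at all contributes zero bytes, matching the
"if (size)" check before the header is added.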
>+
>+static u32 prf_get_value_size(void)
>+{
>+ u32 size = 0;
>+ struct llvm_prf_data *p;
>+
>+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
>+ size += __prf_get_value_size(p, NULL);
>+
>+ return size;
>+}
>+
>+/* Serialize the profiling's value. */
>+static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
>+{
>+ struct llvm_prf_value_data header;
>+ struct llvm_prf_value_node **nodes =
>+ (struct llvm_prf_value_node **)p->values;
>+ unsigned int kind;
>+ unsigned int n;
>+ unsigned int s = 0;
>+
>+ header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
>+
>+ if (!header.num_value_kinds)
>+ /* Nothing to write. */
>+ return;
>+
>+ prf_copy_to_buffer(buffer, &header, sizeof(header));
>+
>+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
>+ struct llvm_prf_value_record *record;
>+ u8 *counts;
>+ unsigned int sites = p->num_value_sites[kind];
>+
>+ if (!sites)
>+ continue;
>+
>+ /* Profiling value record. */
>+ record = *(struct llvm_prf_value_record **)buffer;
>+ *buffer += prf_get_value_record_header_size();
>+
>+ record->kind = kind;
>+ record->num_value_sites = sites;
>+
>+ /* Site count array. */
>+ counts = *(u8 **)buffer;
>+ *buffer += prf_get_value_record_site_count_size(sites);
>+
>+ /*
>+ * If we don't have nodes, we can skip updating the site count
>+ * array, because the buffer is zero filled.
>+ */
>+ if (!nodes)
>+ continue;
>+
>+ for (n = 0; n < sites; n++) {
>+ u32 count = 0;
>+ struct llvm_prf_value_node *site = nodes[s + n];
>+
>+ while (site && ++count <= U8_MAX) {
>+ prf_copy_to_buffer(buffer, site,
>+ sizeof(struct llvm_prf_value_node_data));
>+ site = site->next;
>+ }
>+
>+ counts[n] = (u8)count;
>+ }
>+
>+ s += sites;
>+ }
>+}
>+
>+static void prf_serialize_values(void **buffer)
>+{
>+ struct llvm_prf_data *p;
>+
>+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
>+ prf_serialize_value(p, buffer);
>+}
>+
>+static inline unsigned long prf_get_padding(unsigned long size)
>+{
>+ return 7 & (sizeof(u64) - size % sizeof(u64));
>+}
>+
>+static unsigned long prf_buffer_size(void)
>+{
>+ return sizeof(struct llvm_prf_header) +
>+ prf_data_size() +
>+ prf_cnts_size() +
>+ prf_names_size() +
>+ prf_get_padding(prf_names_size()) +
>+ prf_get_value_size();
>+}
>+
>+/*
>+ * Serialize the profiling data into a format LLVM's tools can understand.
>+ * Note: caller *must* hold pgo_lock.
>+ */
>+static int prf_serialize(struct prf_private_data *p)
>+{
>+ int err = 0;
>+ void *buffer;
>+
>+ p->size = prf_buffer_size();
>+ p->buffer = vzalloc(p->size);
>+
>+ if (!p->buffer) {
>+ err = -ENOMEM;
>+ goto out;
>+ }
>+
>+ buffer = p->buffer;
>+
>+ prf_fill_header(&buffer);
>+ prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
>+ prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
>+ prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
>+ buffer += prf_get_padding(prf_names_size());
>+
>+ prf_serialize_values(&buffer);
>+
>+out:
>+ return err;
>+}
>+
>+/* open() implementation for PGO. Creates a copy of the profiling data set. */
>+static int prf_open(struct inode *inode, struct file *file)
>+{
>+ struct prf_private_data *data;
>+ unsigned long flags;
>+ int err;
>+
>+ data = kzalloc(sizeof(*data), GFP_KERNEL);
>+ if (!data) {
>+ err = -ENOMEM;
>+ goto out;
>+ }
>+
>+ flags = prf_lock();
>+
>+ err = prf_serialize(data);
>+ if (unlikely(err)) {
>+ kfree(data);
>+ goto out_unlock;
>+ }
>+
>+ file->private_data = data;
>+
>+out_unlock:
>+ prf_unlock(flags);
>+out:
>+ return err;
>+}
>+
>+/* read() implementation for PGO. */
>+static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
>+ loff_t *ppos)
>+{
>+ struct prf_private_data *data = file->private_data;
>+
>+ BUG_ON(!data);
>+
>+ return simple_read_from_buffer(buf, count, ppos, data->buffer,
>+ data->size);
>+}
>+
>+/* release() implementation for PGO. Release resources allocated by open(). */
>+static int prf_release(struct inode *inode, struct file *file)
>+{
>+ struct prf_private_data *data = file->private_data;
>+
>+ if (data) {
>+ vfree(data->buffer);
>+ kfree(data);
>+ }
>+
>+ return 0;
>+}
>+
>+static const struct file_operations prf_fops = {
>+ .owner = THIS_MODULE,
>+ .open = prf_open,
>+ .read = prf_read,
>+ .llseek = default_llseek,
>+ .release = prf_release
>+};
>+
>+/* write() implementation for resetting PGO's profile data. */
>+static ssize_t reset_write(struct file *file, const char __user *addr,
>+ size_t len, loff_t *pos)
>+{
>+ struct llvm_prf_data *data;
>+
>+ memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
>+
>+ for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; data++) {
>+ struct llvm_prf_value_node **vnodes;
>+ u64 current_vsite_count;
>+ u32 i;
>+
>+ if (!data->values)
>+ continue;
>+
>+ current_vsite_count = 0;
>+ vnodes = (struct llvm_prf_value_node **)data->values;
>+
>+ for (i = LLVM_INSTR_PROF_IPVK_FIRST; i <= LLVM_INSTR_PROF_IPVK_LAST; i++)
>+ current_vsite_count += data->num_value_sites[i];
>+
>+ for (i = 0; i < current_vsite_count; i++) {
>+ struct llvm_prf_value_node *current_vnode = vnodes[i];
>+
>+ while (current_vnode) {
>+ current_vnode->count = 0;
>+ current_vnode = current_vnode->next;
>+ }
>+ }
>+ }
>+
>+ return len;
>+}
>+
>+static const struct file_operations prf_reset_fops = {
>+ .owner = THIS_MODULE,
>+ .write = reset_write,
>+ .llseek = noop_llseek,
>+};
>+
>+/* Create debugfs entries. */
>+static int __init pgo_init(void)
>+{
>+ directory = debugfs_create_dir("pgo", NULL);
>+ if (!directory)
>+ goto err_remove;
>+
>+ if (!debugfs_create_file("profraw", 0600, directory, NULL,
>+ &prf_fops))
>+ goto err_remove;
>+
>+ if (!debugfs_create_file("reset", 0200, directory, NULL,
>+ &prf_reset_fops))
>+ goto err_remove;
>+
>+ return 0;
>+
>+err_remove:
>+ pr_err("initialization failed\n");
>+ return -EIO;
>+}
>+
>+/* Remove debugfs entries. */
>+static void __exit pgo_exit(void)
>+{
>+ debugfs_remove_recursive(directory);
>+}
>+
>+module_init(pgo_init);
>+module_exit(pgo_exit);
>diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
>new file mode 100644
>index 000000000000..62ff5cfce7b1
>--- /dev/null
>+++ b/kernel/pgo/instrument.c
>@@ -0,0 +1,189 @@
>+// SPDX-License-Identifier: GPL-2.0
>+/*
>+ * Copyright (C) 2019 Google, Inc.
>+ *
>+ * Author:
>+ * Sami Tolvanen <[email protected]>
>+ *
>+ * This software is licensed under the terms of the GNU General Public
>+ * License version 2, as published by the Free Software Foundation, and
>+ * may be copied, distributed, and modified under those terms.
>+ *
>+ * This program is distributed in the hope that it will be useful,
>+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
>+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>+ * GNU General Public License for more details.
>+ *
>+ */
>+
>+#define pr_fmt(fmt) "pgo: " fmt
>+
>+#include <linux/bitops.h>
>+#include <linux/kernel.h>
>+#include <linux/export.h>
>+#include <linux/spinlock.h>
>+#include <linux/types.h>
>+#include "pgo.h"
>+
>+/*
>+ * This lock guards both profile count updating and serialization of the
>+ * profiling data. Keeping both of these activities separate via locking
>+ * ensures that we don't try to serialize data that's only partially updated.
>+ */
>+static DEFINE_SPINLOCK(pgo_lock);
>+static int current_node;
>+
>+unsigned long prf_lock(void)
>+{
>+ unsigned long flags;
>+
>+ spin_lock_irqsave(&pgo_lock, flags);
>+
>+ return flags;
>+}
>+
>+void prf_unlock(unsigned long flags)
>+{
>+ spin_unlock_irqrestore(&pgo_lock, flags);
>+}
>+
>+/*
>+ * Return a newly allocated profiling value node which contains the tracked
>+ * value by the value profiler.
>+ * Note: caller *must* hold pgo_lock.
>+ */
>+static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
>+ u32 index, u64 value)
>+{
>+ if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
>+ return NULL; /* Out of nodes */
>+
>+ current_node++;
>+
>+ /* Make sure the node is entirely within the section */
>+ if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
>+ &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
>+ return NULL;
>+
>+ return &__llvm_prf_vnds_start[current_node];
>+}
>+
>+/*
>+ * Counts the number of times a target value is seen.
>+ *
>+ * Records the target value for the index if not seen before. Otherwise,
>+ * increments the counter associated w/ the target value.
>+ */
>+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
>+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
>+{
>+ struct llvm_prf_data *p = (struct llvm_prf_data *)data;
>+ struct llvm_prf_value_node **counters;
>+ struct llvm_prf_value_node *curr;
>+ struct llvm_prf_value_node *min = NULL;
>+ struct llvm_prf_value_node *prev = NULL;
>+ u64 min_count = U64_MAX;
>+ u8 values = 0;
>+ unsigned long flags;
>+
>+ if (!p || !p->values)
>+ return;
>+
>+ counters = (struct llvm_prf_value_node **)p->values;
>+ curr = counters[index];
>+
>+ while (curr) {
>+ if (target_value == curr->value) {
>+ curr->count++;
>+ return;
>+ }
>+
>+ if (curr->count < min_count) {
>+ min_count = curr->count;
>+ min = curr;
>+ }
>+
>+ prev = curr;
>+ curr = curr->next;
>+ values++;
>+ }
>+
>+ if (values >= LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE) {
>+ if (!min->count || !(--min->count)) {
>+ curr = min;
>+ curr->value = target_value;
>+ curr->count++;
>+ }
>+ return;
>+ }
>+
>+ /* Lock when updating the value node structure. */
>+ flags = prf_lock();
>+
>+ curr = allocate_node(p, index, target_value);
>+ if (!curr)
>+ goto out;
>+
>+ curr->value = target_value;
>+ curr->count++;
>+
>+ if (!counters[index])
>+ counters[index] = curr;
>+ else if (prev && !prev->next)
>+ prev->next = curr;
>+
>+out:
>+ prf_unlock(flags);
>+}
>+EXPORT_SYMBOL(__llvm_profile_instrument_target);
>+
>+/* Counts the number of times a range of target values is seen. */
>+void __llvm_profile_instrument_range(u64 target_value, void *data,
>+ u32 index, s64 precise_start,
>+ s64 precise_last, s64 large_value);
>+void __llvm_profile_instrument_range(u64 target_value, void *data,
>+ u32 index, s64 precise_start,
>+ s64 precise_last, s64 large_value)
>+{
>+ if (large_value != S64_MIN && (s64)target_value >= large_value)
>+ target_value = large_value;
>+ else if ((s64)target_value < precise_start ||
>+ (s64)target_value > precise_last)
>+ target_value = precise_last + 1;
>+
>+ __llvm_profile_instrument_target(target_value, data, index);
>+}
>+EXPORT_SYMBOL(__llvm_profile_instrument_range);
>+
>+static u64 inst_prof_get_range_rep_value(u64 value)
>+{
>+ if (value <= 8)
>+ /* The first ranges are individually tracked, use it as is. */
>+ return value;
>+ else if (value >= 513)
>+ /* The last range is mapped to its lowest value. */
>+ return 513;
>+ else if (hweight64(value) == 1)
>+ /* If it's a power of two, use it as is. */
>+ return value;
>+
>+ /* Otherwise, use the previous power of two + 1. */
>+ return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
>+}
>+
>+/*
>+ * The target values are partitioned into multiple ranges. The range spec is
>+ * defined in compiler-rt/include/profile/InstrProfData.inc.
>+ */
>+void __llvm_profile_instrument_memop(u64 target_value, void *data,
>+ u32 counter_index);
>+void __llvm_profile_instrument_memop(u64 target_value, void *data,
>+ u32 counter_index)
>+{
>+ u64 rep_value;
>+
>+ /* Map the target value to the representative value of its range. */
>+ rep_value = inst_prof_get_range_rep_value(target_value);
>+ __llvm_profile_instrument_target(rep_value, data, counter_index);
>+}
>+EXPORT_SYMBOL(__llvm_profile_instrument_memop);
>diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
>new file mode 100644
>index 000000000000..ddc8d3002fe5
>--- /dev/null
>+++ b/kernel/pgo/pgo.h
>@@ -0,0 +1,203 @@
>+/* SPDX-License-Identifier: GPL-2.0 */
>+/*
>+ * Copyright (C) 2019 Google, Inc.
>+ *
>+ * Author:
>+ * Sami Tolvanen <[email protected]>
>+ *
>+ * This software is licensed under the terms of the GNU General Public
>+ * License version 2, as published by the Free Software Foundation, and
>+ * may be copied, distributed, and modified under those terms.
>+ *
>+ * This program is distributed in the hope that it will be useful,
>+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
>+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>+ * GNU General Public License for more details.
>+ *
>+ */
>+
>+#ifndef _PGO_H
>+#define _PGO_H
>+
>+/*
>+ * Note: These internal LLVM definitions must match the compiler version.
>+ * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
>+ */
>+
>+#define LLVM_INSTR_PROF_RAW_MAGIC_64 \
>+ ((u64)255 << 56 | \
>+ (u64)'l' << 48 | \
>+ (u64)'p' << 40 | \
>+ (u64)'r' << 32 | \
>+ (u64)'o' << 24 | \
>+ (u64)'f' << 16 | \
>+ (u64)'r' << 8 | \
>+ (u64)129)
>+#define LLVM_INSTR_PROF_RAW_MAGIC_32 \
>+ ((u64)255 << 56 | \
>+ (u64)'l' << 48 | \
>+ (u64)'p' << 40 | \
>+ (u64)'r' << 32 | \
>+ (u64)'o' << 24 | \
>+ (u64)'f' << 16 | \
>+ (u64)'R' << 8 | \
>+ (u64)129)
>+
>+#define LLVM_INSTR_PROF_RAW_VERSION 5
>+#define LLVM_INSTR_PROF_DATA_ALIGNMENT 8
>+#define LLVM_INSTR_PROF_IPVK_FIRST 0
>+#define LLVM_INSTR_PROF_IPVK_LAST 1
>+#define LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE 255
>+
>+#define LLVM_VARIANT_MASK_IR_PROF (0x1ULL << 56)
>+#define LLVM_VARIANT_MASK_CSIR_PROF (0x1ULL << 57)
>+
>+/**
>+ * struct llvm_prf_header - represents the raw profile header data structure.
>+ * @magic: the magic token for the file format.
>+ * @version: the version of the file format.
>+ * @data_size: the number of entries in the profile data section.
>+ * @padding_bytes_before_counters: the number of padding bytes before the
>+ * counters.
>+ * @counters_size: the size in bytes of the LLVM profile section containing the
>+ * counters.
>+ * @padding_bytes_after_counters: the number of padding bytes after the
>+ * counters.
>+ * @names_size: the size in bytes of the LLVM profile section containing the
>+ * counters' names.
>+ * @counters_delta: the beginning of the LLVM profile counters section.
>+ * @names_delta: the beginning of the LLVM profile names section.
>+ * @value_kind_last: the last profile value kind.
>+ */
>+struct llvm_prf_header {
>+ u64 magic;
>+ u64 version;
>+ u64 data_size;
>+ u64 padding_bytes_before_counters;
>+ u64 counters_size;
>+ u64 padding_bytes_after_counters;
>+ u64 names_size;
>+ u64 counters_delta;
>+ u64 names_delta;
>+ u64 value_kind_last;
>+};
>+
>+/**
>+ * struct llvm_prf_data - represents the per-function control structure.
>+ * @name_ref: the reference to the function's name.
>+ * @func_hash: the hash value of the function.
>+ * @counter_ptr: a pointer to the profile counter.
>+ * @function_ptr: a pointer to the function.
>+ * @values: the profiling values associated with this function.
>+ * @num_counters: the number of counters in the function.
>+ * @num_value_sites: the number of value profile sites.
>+ */
>+struct llvm_prf_data {
>+ const u64 name_ref;
>+ const u64 func_hash;
>+ const void *counter_ptr;
>+ const void *function_ptr;
>+ void *values;
>+ const u32 num_counters;
>+ const u16 num_value_sites[LLVM_INSTR_PROF_IPVK_LAST + 1];
>+} __aligned(LLVM_INSTR_PROF_DATA_ALIGNMENT);
>+
>+/**
>+ * struct llvm_prf_value_node_data - represents the data part of the struct
>+ * llvm_prf_value_node data structure.
>+ * @value: the value counters.
>+ * @count: the counters' count.
>+ */
>+struct llvm_prf_value_node_data {
>+ u64 value;
>+ u64 count;
>+};
>+
>+/**
>+ * struct llvm_prf_value_node - represents an internal data structure used by
>+ * the value profiler.
>+ * @value: the value counters.
>+ * @count: the counters' count.
>+ * @next: the next value node.
>+ */
>+struct llvm_prf_value_node {
>+ u64 value;
>+ u64 count;
>+ struct llvm_prf_value_node *next;
>+};
>+
>+/**
>+ * struct llvm_prf_value_data - represents the value profiling data in indexed
>+ * format.
>+ * @total_size: the total size in bytes including this field.
>+ * @num_value_kinds: the number of value profile kinds that have value profile
>+ * data.
>+ */
>+struct llvm_prf_value_data {
>+ u32 total_size;
>+ u32 num_value_kinds;
>+};
>+
>+/**
>+ * struct llvm_prf_value_record - represents the on-disk layout of the value
>+ * profile data of a particular kind for one function.
>+ * @kind: the kind of the value profile record.
>+ * @num_value_sites: the number of value profile sites.
>+ * @site_count_array: the first element of the array that stores the number
>+ * of profiled values for each value site.
>+ */
>+struct llvm_prf_value_record {
>+ u32 kind;
>+ u32 num_value_sites;
>+ u8 site_count_array[];
>+};
>+
>+#define prf_get_value_record_header_size() \
>+ offsetof(struct llvm_prf_value_record, site_count_array)
>+#define prf_get_value_record_site_count_size(sites) \
>+ roundup((sites), 8)
>+#define prf_get_value_record_size(sites) \
>+ (prf_get_value_record_header_size() + \
>+ prf_get_value_record_site_count_size((sites)))
>+
>+/* Data sections */
>+extern struct llvm_prf_data __llvm_prf_data_start[];
>+extern struct llvm_prf_data __llvm_prf_data_end[];
>+
>+extern u64 __llvm_prf_cnts_start[];
>+extern u64 __llvm_prf_cnts_end[];
>+
>+extern char __llvm_prf_names_start[];
>+extern char __llvm_prf_names_end[];
>+
>+extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
>+extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
>+
>+/* Locking for vnodes */
>+extern unsigned long prf_lock(void);
>+extern void prf_unlock(unsigned long flags);
>+
>+#define __DEFINE_PRF_SIZE(s) \
>+ static inline unsigned long prf_ ## s ## _size(void) \
>+ { \
>+ unsigned long start = \
>+ (unsigned long)__llvm_prf_ ## s ## _start; \
>+ unsigned long end = \
>+ (unsigned long)__llvm_prf_ ## s ## _end; \
>+ return roundup(end - start, \
>+ sizeof(__llvm_prf_ ## s ## _start[0])); \
>+ } \
>+ static inline unsigned long prf_ ## s ## _count(void) \
>+ { \
>+ return prf_ ## s ## _size() / \
>+ sizeof(__llvm_prf_ ## s ## _start[0]); \
>+ }
>+
>+__DEFINE_PRF_SIZE(data);
>+__DEFINE_PRF_SIZE(cnts);
>+__DEFINE_PRF_SIZE(names);
>+__DEFINE_PRF_SIZE(vnds);
>+
>+#undef __DEFINE_PRF_SIZE
>+
>+#endif /* _PGO_H */
>diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
>index eee59184de64..48a65d092c5b 100644
>--- a/scripts/Makefile.lib
>+++ b/scripts/Makefile.lib
>@@ -139,6 +139,16 @@ _c_flags += $(if $(patsubst n%,, \
> $(CFLAGS_GCOV))
> endif
>
>+#
>+# Enable clang's PGO profiling flags for a file or directory depending on
>+# variables PGO_PROFILE_obj.o and PGO_PROFILE.
>+#
>+ifeq ($(CONFIG_PGO_CLANG),y)
>+_c_flags += $(if $(patsubst n%,, \
>+ $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
>+ $(CFLAGS_PGO_CLANG))
>+endif
>+
> #
> # Enable address sanitizer flags for kernel except some files or directories
> # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
>--
>2.30.1.766.gb4fecdf3b7-goog
>

2021-02-28 23:33:03

by Fangrui Song

Subject: Re: [PATCH v8] pgo: add clang's Profile Guided Optimization infrastructure

On 2021-02-28, Fangrui Song wrote:
>Reviewed-by: Fangrui Song <[email protected]>
>
>Some minor items below:
>
>On 2021-02-26, 'Bill Wendling' via Clang Built Linux wrote:
>>From: Sami Tolvanen <[email protected]>
>>
>>Enable the use of clang's Profile-Guided Optimization[1]. To generate a
>>profile, the kernel is instrumented with PGO counters, a representative
>>workload is run, and the raw profile data is collected from
>>/sys/kernel/debug/pgo/profraw.
>>
>>The raw profile data must be processed by clang's "llvm-profdata" tool
>>before it can be used during recompilation:
>>
>> $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
>> $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>>
>>Multiple raw profiles may be merged during this step.
>>
>>The data can now be used by the compiler:
>>
>> $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>>
>>This initial submission is restricted to x86, as that's the platform we
>>know works. This restriction can be lifted once other platforms have
>>been verified to work with PGO.
>>
>>Note that this method of profiling the kernel is clang-native, unlike
>>the clang support in kernel/gcov.
>>
>>[1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>>
>>Signed-off-by: Sami Tolvanen <[email protected]>
>>Co-developed-by: Bill Wendling <[email protected]>
>>Signed-off-by: Bill Wendling <[email protected]>
>>---
>>v8: - Rebased on top-of-tree.
>>v7: - Fix minor build failure reported by Sedat.
>>v6: - Add better documentation about the locking scheme and other things.
>> - Rename macros to better match the same macros in LLVM's source code.
>>v5: - Correct padding calculation, discovered by Nathan Chancellor.
>>v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
>> own popcount implementation, based on Nick Desaulniers's comment.
>>v3: - Added change log section based on Sedat Dilek's comments.
>>v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
>> testing.
>> - Corrected documentation, re PGO flags when using LTO, based on Fangrui
>> Song's comments.
>>---
>>Documentation/dev-tools/index.rst | 1 +
>>Documentation/dev-tools/pgo.rst | 127 +++++++++
>>MAINTAINERS | 9 +
>>Makefile | 3 +
>>arch/Kconfig | 1 +
>>arch/x86/Kconfig | 1 +
>>arch/x86/boot/Makefile | 1 +
>>arch/x86/boot/compressed/Makefile | 1 +
>>arch/x86/crypto/Makefile | 4 +
>>arch/x86/entry/vdso/Makefile | 1 +
>>arch/x86/kernel/vmlinux.lds.S | 2 +
>>arch/x86/platform/efi/Makefile | 1 +
>>arch/x86/purgatory/Makefile | 1 +
>>arch/x86/realmode/rm/Makefile | 1 +
>>arch/x86/um/vdso/Makefile | 1 +
>>drivers/firmware/efi/libstub/Makefile | 1 +
>>include/asm-generic/vmlinux.lds.h | 44 +++
>>kernel/Makefile | 1 +
>>kernel/pgo/Kconfig | 35 +++
>>kernel/pgo/Makefile | 5 +
>>kernel/pgo/fs.c | 389 ++++++++++++++++++++++++++
>>kernel/pgo/instrument.c | 189 +++++++++++++
>>kernel/pgo/pgo.h | 203 ++++++++++++++
>>scripts/Makefile.lib | 10 +
>>24 files changed, 1032 insertions(+)
>>create mode 100644 Documentation/dev-tools/pgo.rst
>>create mode 100644 kernel/pgo/Kconfig
>>create mode 100644 kernel/pgo/Makefile
>>create mode 100644 kernel/pgo/fs.c
>>create mode 100644 kernel/pgo/instrument.c
>>create mode 100644 kernel/pgo/pgo.h
>>
>>diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
>>index f7809c7b1ba9..8d6418e85806 100644
>>--- a/Documentation/dev-tools/index.rst
>>+++ b/Documentation/dev-tools/index.rst
>>@@ -26,6 +26,7 @@ whole; patches welcome!
>> kgdb
>> kselftest
>> kunit/index
>>+ pgo
>>
>>
>>.. only:: subproject and html
>>diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
>>new file mode 100644
>>index 000000000000..b7f11d8405b7
>>--- /dev/null
>>+++ b/Documentation/dev-tools/pgo.rst
>>@@ -0,0 +1,127 @@
>>+.. SPDX-License-Identifier: GPL-2.0
>>+
>>+===============================
>>+Using PGO with the Linux kernel
>>+===============================
>>+
>>+Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
>>+when building with Clang. The profiling data is exported via the ``pgo``
>>+debugfs directory.
>>+
>>+.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>>+
>>+
>>+Preparation
>>+===========
>>+
>>+Configure the kernel with:
>>+
>>+.. code-block:: make
>>+
>>+ CONFIG_DEBUG_FS=y
>>+ CONFIG_PGO_CLANG=y
>>+
>>+Note that kernels compiled with profiling flags will be significantly larger
>>+and run slower.
>>+
>>+Profiling data will only become accessible once debugfs has been mounted:
>>+
>>+.. code-block:: sh
>>+
>>+ mount -t debugfs none /sys/kernel/debug
>>+
>>+
>>+Customization
>>+=============
>>+
>>+You can enable or disable profiling for individual files and directories by
>>+adding a line similar to the following to the respective kernel Makefile:
>>+
>>+- For a single file (e.g. main.o)
>>+
>>+ .. code-block:: make
>>+
>>+ PGO_PROFILE_main.o := y
>>+
>>+- For all files in one directory
>>+
>>+ .. code-block:: make
>>+
>>+ PGO_PROFILE := y
>>+
>>+To exclude files from being profiled use
>>+
>>+ .. code-block:: make
>>+
>>+ PGO_PROFILE_main.o := n
>>+
>>+and
>>+
>>+ .. code-block:: make
>>+
>>+ PGO_PROFILE := n
>>+
>>+Only files which are linked to the main kernel image or are compiled as kernel
>>+modules are supported by this mechanism.
>>+
>>+
>>+Files
>>+=====
>>+
>>+The PGO kernel support creates the following files in debugfs:
>>+
>>+``/sys/kernel/debug/pgo``
>>+ Parent directory for all PGO-related files.
>>+
>>+``/sys/kernel/debug/pgo/reset``
>>+ Global reset file: resets all coverage data to zero when written to.
>>+
>>+``/sys/kernel/debug/pgo/profraw``
>>+ The raw PGO data that must be processed with ``llvm-profdata``.
>>+
>>+
>>+Workflow
>>+========
>>+
>>+The PGO kernel can be run on the host or test machines. The data, though,
>>+should be analyzed with Clang's tools from the same Clang version that the
>>+kernel was compiled with. Clang is tolerant of version skew, but it's easier
>>+to use the same Clang version.
>>+
>>+The profiling data is useful for optimizing the kernel, analyzing coverage,
>>+etc. Clang offers tools to perform these tasks.
>>+
>>+Here is an example workflow for profiling an instrumented kernel with PGO and
>>+using the result to optimize the kernel:
>>+
>>+1) Install the kernel on the TEST machine.
>>+
>>+2) Reset the data counters right before running the load tests
>>+
>>+ .. code-block:: sh
>>+
>>+ $ echo 1 > /sys/kernel/debug/pgo/reset
>>+
>>+3) Run the load tests.
>>+
>>+4) Collect the raw profile data
>>+
>>+ .. code-block:: sh
>>+
>>+ $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
>>+
>>+5) (Optional) Download the raw profile data to the HOST machine.
>>+
>>+6) Process the raw profile data
>>+
>>+ .. code-block:: sh
>>+
>>+ $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>>+
>>+ Note that multiple raw profile data files can be merged during this step.
>>+
>>+7) Rebuild the kernel using the profile data (PGO disabled)
>>+
>>+ .. code-block:: sh
>>+
>>+ $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>>diff --git a/MAINTAINERS b/MAINTAINERS
>>index c71664ca8bfd..3a6668792bc5 100644
>>--- a/MAINTAINERS
>>+++ b/MAINTAINERS
>>@@ -14019,6 +14019,15 @@ S: Maintained
>>F: include/linux/personality.h
>>F: include/uapi/linux/personality.h
>>
>>+PGO BASED KERNEL PROFILING
>>+M: Sami Tolvanen <[email protected]>
>>+M: Bill Wendling <[email protected]>
>>+R: Nathan Chancellor <[email protected]>
>>+R: Nick Desaulniers <[email protected]>
>>+S: Supported
>>+F: Documentation/dev-tools/pgo.rst
>>+F: kernel/pgo
>>+
>>PHOENIX RC FLIGHT CONTROLLER ADAPTER
>>M: Marcus Folkesson <[email protected]>
>>L: [email protected]
>>diff --git a/Makefile b/Makefile
>>index 6ecd0d22e608..b57d4d44c799 100644
>>--- a/Makefile
>>+++ b/Makefile
>>@@ -657,6 +657,9 @@ endif # KBUILD_EXTMOD
>># Defaults to vmlinux, but the arch makefile usually adds further targets
>>all: vmlinux
>>
>>+CFLAGS_PGO_CLANG := -fprofile-generate
>>+export CFLAGS_PGO_CLANG
>>+
>>CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
>> $(call cc-option,-fno-tree-loop-im) \
>> $(call cc-disable-warning,maybe-uninitialized,)
>>diff --git a/arch/Kconfig b/arch/Kconfig
>>index 2bb30673d8e6..111e642a2af7 100644
>>--- a/arch/Kconfig
>>+++ b/arch/Kconfig
>>@@ -1192,6 +1192,7 @@ config ARCH_HAS_ELFCORE_COMPAT
>> bool
>>
>>source "kernel/gcov/Kconfig"
>>+source "kernel/pgo/Kconfig"
>>
>>source "scripts/gcc-plugins/Kconfig"
>>
>>diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>>index cd4b9b1204a8..c9808583b528 100644
>>--- a/arch/x86/Kconfig
>>+++ b/arch/x86/Kconfig
>>@@ -99,6 +99,7 @@ config X86
>> select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
>> select ARCH_SUPPORTS_LTO_CLANG if X86_64
>> select ARCH_SUPPORTS_LTO_CLANG_THIN if X86_64
>>+ select ARCH_SUPPORTS_PGO_CLANG if X86_64
>> select ARCH_USE_BUILTIN_BSWAP
>> select ARCH_USE_QUEUED_RWLOCKS
>> select ARCH_USE_QUEUED_SPINLOCKS
>>diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
>>index fe605205b4ce..383853e32f67 100644
>>--- a/arch/x86/boot/Makefile
>>+++ b/arch/x86/boot/Makefile
>>@@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
>>KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
>>KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
>>GCOV_PROFILE := n
>>+PGO_PROFILE := n
>>UBSAN_SANITIZE := n
>>
>>$(obj)/bzImage: asflags-y := $(SVGA_MODE)
>>diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
>>index e0bc3988c3fa..ed12ab65f606 100644
>>--- a/arch/x86/boot/compressed/Makefile
>>+++ b/arch/x86/boot/compressed/Makefile
>>@@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
>>
>>KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
>>GCOV_PROFILE := n
>>+PGO_PROFILE := n
>>UBSAN_SANITIZE :=n
>>
>>KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
>>diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
>>index b28e36b7c96b..4b2e9620c412 100644
>>--- a/arch/x86/crypto/Makefile
>>+++ b/arch/x86/crypto/Makefile
>>@@ -4,6 +4,10 @@
>>
>>OBJECT_FILES_NON_STANDARD := y
>>
>>+# Disable PGO for curve25519-x86_64. With PGO enabled, clang runs out of
>>+# registers for some of the functions.
>>+PGO_PROFILE_curve25519-x86_64.o := n
>>+
>>obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
>>twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o
>>obj-$(CONFIG_CRYPTO_TWOFISH_X86_64) += twofish-x86_64.o
>>diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
>>index 05c4abc2fdfd..f7421e44725a 100644
>>--- a/arch/x86/entry/vdso/Makefile
>>+++ b/arch/x86/entry/vdso/Makefile
>>@@ -180,6 +180,7 @@ quiet_cmd_vdso = VDSO $@
>>VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
>> $(call ld-option, --eh-frame-hdr) -Bsymbolic
>>GCOV_PROFILE := n
>>+PGO_PROFILE := n
>>
>>quiet_cmd_vdso_and_check = VDSO $@
>> cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
>>diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
>>index efd9e9ea17f2..f6cab2316c46 100644
>>--- a/arch/x86/kernel/vmlinux.lds.S
>>+++ b/arch/x86/kernel/vmlinux.lds.S
>>@@ -184,6 +184,8 @@ SECTIONS
>>
>> BUG_TABLE
>>
>>+ PGO_CLANG_DATA
>>+
>> ORC_UNWIND_TABLE
>>
>> . = ALIGN(PAGE_SIZE);
>>diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
>>index 84b09c230cbd..5f22b31446ad 100644
>>--- a/arch/x86/platform/efi/Makefile
>>+++ b/arch/x86/platform/efi/Makefile
>>@@ -2,6 +2,7 @@
>>OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
>>KASAN_SANITIZE := n
>>GCOV_PROFILE := n
>>+PGO_PROFILE := n
>>
>>obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
>>obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
>>diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
>>index 95ea17a9d20c..36f20e99da0b 100644
>>--- a/arch/x86/purgatory/Makefile
>>+++ b/arch/x86/purgatory/Makefile
>>@@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
>>
>># Sanitizer, etc. runtimes are unavailable and cannot be linked here.
>>GCOV_PROFILE := n
>>+PGO_PROFILE := n
>>KASAN_SANITIZE := n
>>UBSAN_SANITIZE := n
>>KCSAN_SANITIZE := n
>>diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
>>index 83f1b6a56449..21797192f958 100644
>>--- a/arch/x86/realmode/rm/Makefile
>>+++ b/arch/x86/realmode/rm/Makefile
>>@@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
>>KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
>>KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
>>GCOV_PROFILE := n
>>+PGO_PROFILE := n
>>UBSAN_SANITIZE := n
>>diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
>>index 5943387e3f35..54f5768f5853 100644
>>--- a/arch/x86/um/vdso/Makefile
>>+++ b/arch/x86/um/vdso/Makefile
>>@@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@
>>
>>VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
>>GCOV_PROFILE := n
>>+PGO_PROFILE := n
>>
>>#
>># Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
>>diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
>>index c23466e05e60..724fb389bb9d 100644
>>--- a/drivers/firmware/efi/libstub/Makefile
>>+++ b/drivers/firmware/efi/libstub/Makefile
>>@@ -42,6 +42,7 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
>>KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS))
>>
>>GCOV_PROFILE := n
>>+PGO_PROFILE := n
>># Sanitizer runtimes are unavailable and cannot be linked here.
>>KASAN_SANITIZE := n
>>KCSAN_SANITIZE := n
>>diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
>>index 6786f8c0182f..4a0c21b840b3 100644
>>--- a/include/asm-generic/vmlinux.lds.h
>>+++ b/include/asm-generic/vmlinux.lds.h
>>@@ -329,6 +329,49 @@
>>#define DTPM_TABLE()
>>#endif
>>
>>+#ifdef CONFIG_PGO_CLANG
>>+#define PGO_CLANG_DATA \
>>+ __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
>>+ . = ALIGN(8); \
>>+ __llvm_prf_start = .; \
>>+ __llvm_prf_data_start = .; \
>>+ KEEP(*(__llvm_prf_data)) \
>>+ . = ALIGN(8); \
>>+ __llvm_prf_data_end = .; \
>>+ } \
>
>Some minor items on linker script usage. The end of a metadata section
>usually does not need alignment. Does the . = ALIGN(8) have
>significance? Ditto below.
>
>
>
>This is an item about LD_DEAD_CODE_DATA_ELIMINATION. Feel free to
>postpone it until after this patch is in tree:
>
> KEEP(*(__llvm_prf_data))
>
>KEEP should be dropped.
>
>I have been involved in improving GC of such metadata sections (a recent
>interest of mine :)
>https://maskray.me/blog/2021-01-31-metadata-sections-comdat-and-shf-link-order)
>
>With LLVM>=13 (https://reviews.llvm.org/D96757), __llvm_prf_* sections
>associated with non-COMDAT text sections can be GCed as well. KEEP would
>unnecessarily retain them under LD_DEAD_CODE_DATA_ELIMINATION.
>
>For older releases (at least 10<=LLVM<13), such __llvm_prf_* sections
>are not in zero flag section groups so they usually cannot be discarded.
>So with or without KEEP, you probably won't see much of a size
>difference.
>
>>+ __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
>>+ . = ALIGN(8); \
>>+ __llvm_prf_cnts_start = .; \
>>+ KEEP(*(__llvm_prf_cnts)) \
>>+ . = ALIGN(8); \
>>+ __llvm_prf_cnts_end = .; \
>>+ } \
>>+ __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
>>+ . = ALIGN(8); \
>>+ __llvm_prf_names_start = .; \
>>+ KEEP(*(__llvm_prf_names)) \
>>+ . = ALIGN(8); \
>>+ __llvm_prf_names_end = .; \
>>+ . = ALIGN(8); \
>>+ } \
>
>__llvm_prf_names does not need alignment.
>Its alignment is often 1 in userspace programs.
>
>>+ __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
>>+ __llvm_prf_vals_start = .; \
>>+ KEEP(*(__llvm_prf_vals)) \
>>+ . = ALIGN(8); \
>>+ __llvm_prf_vals_end = .; \
>>+ . = ALIGN(8); \
>>+ } \
>>+ __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
>>+ __llvm_prf_vnds_start = .; \
>>+ KEEP(*(__llvm_prf_vnds)) \
>>+ . = ALIGN(8); \
>>+ __llvm_prf_vnds_end = .; \
>>+ __llvm_prf_end = .; \
>>+ }
>
>In userspace PGO instrumentation, the start is often aligned to 16.
>The end does not need alignment.

Thinking more, my suggestion is to drop explicit alignment annotations
entirely:


__llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
__llvm_prf_vals_start = .; \
*(__llvm_prf_vals) \
__llvm_prf_vals_end = .; \
} \

__llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
__llvm_prf_vnds_start = .; \
*(__llvm_prf_vnds) \
__llvm_prf_vnds_end = .; \
__llvm_prf_end = .; \
}

// _cnts, _names and _data are similar. Just delete all ALIGN.
// I deleted KEEP above to facilitate --gc-sections as well.


Let the linker derive the output section alignment from the alignments
of the input sections
(https://lld.llvm.org/ELF/linker_script.html#output-section-alignment).

Omitting explicit alignment is preferable in most cases. The exception
is when no input section is present (either none was emitted or all were
discarded by ld --gc-sections). That is a very rare event: it happened
with commit 793f49a87aae ("firmware_loader: align .builtin_fw to 8"),
but it is unlikely to happen with PGO.
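
To illustrate why I don't expect the trailing ALIGN to matter for the size
helpers in pgo.h, here is a rough userspace sketch, not kernel code. It
assumes the helpers keep computing roundup(end - start, sizeof(element)) as
in this patch; struct vnode, round_up(), section_count() and
section_count_examples() are hypothetical stand-ins I made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct llvm_prf_value_node: two u64 fields and a pointer. */
struct vnode {
	uint64_t value;
	uint64_t count;
	struct vnode *next;
};

/* Round n up to the next multiple of step (step must be non-zero). */
static size_t round_up(size_t n, size_t step)
{
	return (n + step - 1) / step * step;
}

/*
 * Hypothetical mirror of prf_<s>_count(): number of elements between the
 * start and end symbols, after rounding the byte size up to a whole element.
 */
size_t section_count(uintptr_t start, uintptr_t end, size_t elem_size)
{
	return round_up(end - start, elem_size) / elem_size;
}

/* Spot checks with made-up addresses; none of them pad the end symbol. */
void section_count_examples(void)
{
	const size_t sz = sizeof(struct vnode);

	/* Three whole nodes: the byte size is already a multiple of sz. */
	assert(section_count(0x1000, 0x1000 + 3 * sz, sz) == 3);
	/* Empty section: start == end gives zero elements. */
	assert(section_count(0x1000, 0x1000, sz) == 0);
	/* Byte-granular section (like the names): size is reported exactly. */
	assert(section_count(0x2000, 0x2000 + 13, 1) == 13);
}
```

Calling section_count_examples() from a small test program exercises the
three cases. If the input sections only ever contribute whole elements, the
end symbol already sits on an element boundary, which is why I think the
explicit ALIGN before *_end is redundant for this computation.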

>

>>+#else
>>+#define PGO_CLANG_DATA
>>+#endif
>>+
>>#define KERNEL_DTB() \
>> STRUCT_ALIGN(); \
>> __dtb_start = .; \
>>@@ -1105,6 +1148,7 @@
>> CONSTRUCTORS \
>> } \
>> BUG_TABLE \
>>+ PGO_CLANG_DATA
>>
>>#define INIT_TEXT_SECTION(inittext_align) \
>> . = ALIGN(inittext_align); \
>>diff --git a/kernel/Makefile b/kernel/Makefile
>>index 320f1f3941b7..a2a23ef2b12f 100644
>>--- a/kernel/Makefile
>>+++ b/kernel/Makefile
>>@@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
>>obj-$(CONFIG_KCSAN) += kcsan/
>>obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
>>obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
>>+obj-$(CONFIG_PGO_CLANG) += pgo/
>>
>>obj-$(CONFIG_PERF_EVENTS) += events/
>>
>>diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
>>new file mode 100644
>>index 000000000000..76a640b6cf6e
>>--- /dev/null
>>+++ b/kernel/pgo/Kconfig
>>@@ -0,0 +1,35 @@
>>+# SPDX-License-Identifier: GPL-2.0-only
>>+menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
>>+
>>+config ARCH_SUPPORTS_PGO_CLANG
>>+ bool
>>+
>>+config PGO_CLANG
>>+ bool "Enable clang's PGO-based kernel profiling"
>>+ depends on DEBUG_FS
>>+ depends on ARCH_SUPPORTS_PGO_CLANG
>>+ depends on CC_IS_CLANG && CLANG_VERSION >= 120000
>>+ help
>>+ This option enables clang's PGO (Profile Guided Optimization) based
>>+ code profiling to better optimize the kernel.
>>+
>>+ If unsure, say N.
>>+
>>+ Run a representative workload for your application on a kernel
>>+ compiled with this option and download the raw profile file from
>>+ /sys/kernel/debug/pgo/profraw. This file needs to be processed with
>>+ llvm-profdata. It may be merged with other collected raw profiles.
>>+
>>+ Copy the resulting profile file into vmlinux.profdata, and enable
>>+ KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
>>+ kernel.
>>+
>>+ Note that a kernel compiled with profiling flags will be
>>+ significantly larger and run slower. Also be sure to exclude files
>>+ from profiling which are not linked to the kernel image to prevent
>>+ linker errors.
>>+
>>+ Note that the debugfs filesystem has to be mounted to access
>>+ profiling data.
>>+
>>+endmenu
>>diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
>>new file mode 100644
>>index 000000000000..41e27cefd9a4
>>--- /dev/null
>>+++ b/kernel/pgo/Makefile
>>@@ -0,0 +1,5 @@
>>+# SPDX-License-Identifier: GPL-2.0
>>+GCOV_PROFILE := n
>>+PGO_PROFILE := n
>>+
>>+obj-y += fs.o instrument.o
>>diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
>>new file mode 100644
>>index 000000000000..1678df3b7d64
>>--- /dev/null
>>+++ b/kernel/pgo/fs.c
>>@@ -0,0 +1,389 @@
>>+// SPDX-License-Identifier: GPL-2.0
>>+/*
>>+ * Copyright (C) 2019 Google, Inc.
>>+ *
>>+ * Author:
>>+ * Sami Tolvanen <[email protected]>
>>+ *
>>+ * This software is licensed under the terms of the GNU General Public
>>+ * License version 2, as published by the Free Software Foundation, and
>>+ * may be copied, distributed, and modified under those terms.
>>+ *
>>+ * This program is distributed in the hope that it will be useful,
>>+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>>+ * GNU General Public License for more details.
>>+ *
>>+ */
>>+
>>+#define pr_fmt(fmt) "pgo: " fmt
>>+
>>+#include <linux/kernel.h>
>>+#include <linux/debugfs.h>
>>+#include <linux/fs.h>
>>+#include <linux/module.h>
>>+#include <linux/slab.h>
>>+#include <linux/vmalloc.h>
>>+#include "pgo.h"
>>+
>>+static struct dentry *directory;
>>+
>>+struct prf_private_data {
>>+ void *buffer;
>>+ unsigned long size;
>>+};
>>+
>>+/*
>>+ * Raw profile data format:
>>+ *
>>+ * - llvm_prf_header
>>+ * - __llvm_prf_data
>>+ * - __llvm_prf_cnts
>>+ * - __llvm_prf_names
>>+ * - zero padding to 8 bytes
>>+ * - for each llvm_prf_data in __llvm_prf_data:
>>+ * - llvm_prf_value_data
>>+ * - llvm_prf_value_record + site count array
>>+ * - llvm_prf_value_node_data
>>+ * ...
>>+ * ...
>>+ * ...
>>+ */
>>+
>>+static void prf_fill_header(void **buffer)
>>+{
>>+ struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
>>+
>>+#ifdef CONFIG_64BIT
>>+ header->magic = LLVM_INSTR_PROF_RAW_MAGIC_64;
>>+#else
>>+ header->magic = LLVM_INSTR_PROF_RAW_MAGIC_32;
>>+#endif
>>+ header->version = LLVM_VARIANT_MASK_IR_PROF | LLVM_INSTR_PROF_RAW_VERSION;
>>+ header->data_size = prf_data_count();
>>+ header->padding_bytes_before_counters = 0;
>>+ header->counters_size = prf_cnts_count();
>>+ header->padding_bytes_after_counters = 0;
>>+ header->names_size = prf_names_count();
>>+ header->counters_delta = (u64)__llvm_prf_cnts_start;
>>+ header->names_delta = (u64)__llvm_prf_names_start;
>>+ header->value_kind_last = LLVM_INSTR_PROF_IPVK_LAST;
>>+
>>+ *buffer += sizeof(*header);
>>+}
>>+
>>+/*
>>+ * Copy the source into the buffer, incrementing the pointer into buffer in the
>>+ * process.
>>+ */
>>+static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
>>+{
>>+ memcpy(*buffer, src, size);
>>+ *buffer += size;
>>+}
>>+
>>+static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
>>+{
>>+ struct llvm_prf_value_node **nodes =
>>+ (struct llvm_prf_value_node **)p->values;
>>+ u32 kinds = 0;
>>+ u32 size = 0;
>>+ unsigned int kind;
>>+ unsigned int n;
>>+ unsigned int s = 0;
>>+
>>+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
>>+ unsigned int sites = p->num_value_sites[kind];
>>+
>>+ if (!sites)
>>+ continue;
>>+
>>+ /* Record + site count array */
>>+ size += prf_get_value_record_size(sites);
>>+ kinds++;
>>+
>>+ if (!nodes)
>>+ continue;
>>+
>>+ for (n = 0; n < sites; n++) {
>>+ u32 count = 0;
>>+ struct llvm_prf_value_node *site = nodes[s + n];
>>+
>>+ while (site && ++count <= U8_MAX)
>>+ site = site->next;
>>+
>>+ size += count *
>>+ sizeof(struct llvm_prf_value_node_data);
>>+ }
>>+
>>+ s += sites;
>>+ }
>>+
>>+ if (size)
>>+ size += sizeof(struct llvm_prf_value_data);
>>+
>>+ if (value_kinds)
>>+ *value_kinds = kinds;
>>+
>>+ return size;
>>+}
>>+
>>+static u32 prf_get_value_size(void)
>>+{
>>+ u32 size = 0;
>>+ struct llvm_prf_data *p;
>>+
>>+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
>>+ size += __prf_get_value_size(p, NULL);
>>+
>>+ return size;
>>+}
>>+
>>+/* Serialize the profiling's value. */
>>+static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
>>+{
>>+ struct llvm_prf_value_data header;
>>+ struct llvm_prf_value_node **nodes =
>>+ (struct llvm_prf_value_node **)p->values;
>>+ unsigned int kind;
>>+ unsigned int n;
>>+ unsigned int s = 0;
>>+
>>+ header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
>>+
>>+ if (!header.num_value_kinds)
>>+ /* Nothing to write. */
>>+ return;
>>+
>>+ prf_copy_to_buffer(buffer, &header, sizeof(header));
>>+
>>+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
>>+ struct llvm_prf_value_record *record;
>>+ u8 *counts;
>>+ unsigned int sites = p->num_value_sites[kind];
>>+
>>+ if (!sites)
>>+ continue;
>>+
>>+ /* Profiling value record. */
>>+ record = *(struct llvm_prf_value_record **)buffer;
>>+ *buffer += prf_get_value_record_header_size();
>>+
>>+ record->kind = kind;
>>+ record->num_value_sites = sites;
>>+
>>+ /* Site count array. */
>>+ counts = *(u8 **)buffer;
>>+ *buffer += prf_get_value_record_site_count_size(sites);
>>+
>>+ /*
>>+ * If we don't have nodes, we can skip updating the site count
>>+ * array, because the buffer is zero filled.
>>+ */
>>+ if (!nodes)
>>+ continue;
>>+
>>+ for (n = 0; n < sites; n++) {
>>+ u32 count = 0;
>>+ struct llvm_prf_value_node *site = nodes[s + n];
>>+
>>+ while (site && ++count <= U8_MAX) {
>>+ prf_copy_to_buffer(buffer, site,
>>+ sizeof(struct llvm_prf_value_node_data));
>>+ site = site->next;
>>+ }
>>+
>>+ counts[n] = (u8)count;
>>+ }
>>+
>>+ s += sites;
>>+ }
>>+}
>>+
>>+static void prf_serialize_values(void **buffer)
>>+{
>>+ struct llvm_prf_data *p;
>>+
>>+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
>>+ prf_serialize_value(p, buffer);
>>+}
>>+
>>+static inline unsigned long prf_get_padding(unsigned long size)
>>+{
>>+ return 7 & (sizeof(u64) - size % sizeof(u64));
>>+}
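
A minimal userspace sketch of the padding rule above, for illustration only. The name `pad_to_u64` is made up for this example and is not part of the patch. The final `& 7` is what makes an already aligned size yield 0 bytes of padding rather than 8.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for prf_get_padding(): the number of bytes needed
 * to round size up to the next multiple of 8, or 0 if it is already aligned.
 */
static size_t pad_to_u64(size_t size)
{
	size_t pad = (8 - size % 8) & 7;

	/* The padded size must always land on an 8-byte boundary. */
	assert((size + pad) % 8 == 0);
	return pad;
}
```

For example, a 13-byte names section would get 3 bytes of zero padding before the value data starts.
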
>>+
>>+static unsigned long prf_buffer_size(void)
>>+{
>>+ return sizeof(struct llvm_prf_header) +
>>+ prf_data_size() +
>>+ prf_cnts_size() +
>>+ prf_names_size() +
>>+ prf_get_padding(prf_names_size()) +
>>+ prf_get_value_size();
>>+}
>>+
>>+/*
>>+ * Serialize the profiling data into a format LLVM's tools can understand.
>>+ * Note: caller *must* hold pgo_lock.
>>+ */
>>+static int prf_serialize(struct prf_private_data *p)
>>+{
>>+ int err = 0;
>>+ void *buffer;
>>+
>>+ p->size = prf_buffer_size();
>>+ p->buffer = vzalloc(p->size);
>>+
>>+ if (!p->buffer) {
>>+ err = -ENOMEM;
>>+ goto out;
>>+ }
>>+
>>+ buffer = p->buffer;
>>+
>>+ prf_fill_header(&buffer);
>>+ prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
>>+ prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
>>+ prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
>>+ buffer += prf_get_padding(prf_names_size());
>>+
>>+ prf_serialize_values(&buffer);
>>+
>>+out:
>>+ return err;
>>+}
>>+
>>+/* open() implementation for PGO. Creates a copy of the profiling data set. */
>>+static int prf_open(struct inode *inode, struct file *file)
>>+{
>>+ struct prf_private_data *data;
>>+ unsigned long flags;
>>+ int err;
>>+
>>+ data = kzalloc(sizeof(*data), GFP_KERNEL);
>>+ if (!data) {
>>+ err = -ENOMEM;
>>+ goto out;
>>+ }
>>+
>>+ flags = prf_lock();
>>+
>>+ err = prf_serialize(data);
>>+ if (unlikely(err)) {
>>+ kfree(data);
>>+ goto out_unlock;
>>+ }
>>+
>>+ file->private_data = data;
>>+
>>+out_unlock:
>>+ prf_unlock(flags);
>>+out:
>>+ return err;
>>+}
>>+
>>+/* read() implementation for PGO. */
>>+static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
>>+ loff_t *ppos)
>>+{
>>+ struct prf_private_data *data = file->private_data;
>>+
>>+ BUG_ON(!data);
>>+
>>+ return simple_read_from_buffer(buf, count, ppos, data->buffer,
>>+ data->size);
>>+}
>>+
>>+/* release() implementation for PGO. Release resources allocated by open(). */
>>+static int prf_release(struct inode *inode, struct file *file)
>>+{
>>+ struct prf_private_data *data = file->private_data;
>>+
>>+ if (data) {
>>+ vfree(data->buffer);
>>+ kfree(data);
>>+ }
>>+
>>+ return 0;
>>+}
>>+
>>+static const struct file_operations prf_fops = {
>>+ .owner = THIS_MODULE,
>>+ .open = prf_open,
>>+ .read = prf_read,
>>+ .llseek = default_llseek,
>>+ .release = prf_release
>>+};
>>+
>>+/* write() implementation for resetting PGO's profile data. */
>>+static ssize_t reset_write(struct file *file, const char __user *addr,
>>+ size_t len, loff_t *pos)
>>+{
>>+ struct llvm_prf_data *data;
>>+
>>+ memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
>>+
>>+ for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; data++) {
>>+ struct llvm_prf_value_node **vnodes;
>>+ u64 current_vsite_count;
>>+ u32 i;
>>+
>>+ if (!data->values)
>>+ continue;
>>+
>>+ current_vsite_count = 0;
>>+ vnodes = (struct llvm_prf_value_node **)data->values;
>>+
>>+ for (i = LLVM_INSTR_PROF_IPVK_FIRST; i <= LLVM_INSTR_PROF_IPVK_LAST; i++)
>>+ current_vsite_count += data->num_value_sites[i];
>>+
>>+ for (i = 0; i < current_vsite_count; i++) {
>>+ struct llvm_prf_value_node *current_vnode = vnodes[i];
>>+
>>+ while (current_vnode) {
>>+ current_vnode->count = 0;
>>+ current_vnode = current_vnode->next;
>>+ }
>>+ }
>>+ }
>>+
>>+ return len;
>>+}
>>+
>>+static const struct file_operations prf_reset_fops = {
>>+ .owner = THIS_MODULE,
>>+ .write = reset_write,
>>+ .llseek = noop_llseek,
>>+};
>>+
>>+/* Create debugfs entries. */
>>+static int __init pgo_init(void)
>>+{
>>+ directory = debugfs_create_dir("pgo", NULL);
>>+ if (!directory)
>>+ goto err_remove;
>>+
>>+ if (!debugfs_create_file("profraw", 0600, directory, NULL,
>>+ &prf_fops))
>>+ goto err_remove;
>>+
>>+ if (!debugfs_create_file("reset", 0200, directory, NULL,
>>+ &prf_reset_fops))
>>+ goto err_remove;
>>+
>>+ return 0;
>>+
>>+err_remove:
>>+ pr_err("initialization failed\n");
>>+ return -EIO;
>>+}
>>+
>>+/* Remove debugfs entries. */
>>+static void __exit pgo_exit(void)
>>+{
>>+ debugfs_remove_recursive(directory);
>>+}
>>+
>>+module_init(pgo_init);
>>+module_exit(pgo_exit);
>>diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
>>new file mode 100644
>>index 000000000000..62ff5cfce7b1
>>--- /dev/null
>>+++ b/kernel/pgo/instrument.c
>>@@ -0,0 +1,189 @@
>>+// SPDX-License-Identifier: GPL-2.0
>>+/*
>>+ * Copyright (C) 2019 Google, Inc.
>>+ *
>>+ * Author:
>>+ * Sami Tolvanen <[email protected]>
>>+ *
>>+ * This software is licensed under the terms of the GNU General Public
>>+ * License version 2, as published by the Free Software Foundation, and
>>+ * may be copied, distributed, and modified under those terms.
>>+ *
>>+ * This program is distributed in the hope that it will be useful,
>>+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>>+ * GNU General Public License for more details.
>>+ *
>>+ */
>>+
>>+#define pr_fmt(fmt) "pgo: " fmt
>>+
>>+#include <linux/bitops.h>
>>+#include <linux/kernel.h>
>>+#include <linux/export.h>
>>+#include <linux/spinlock.h>
>>+#include <linux/types.h>
>>+#include "pgo.h"
>>+
>>+/*
>>+ * This lock guards both profile count updating and serialization of the
>>+ * profiling data. Keeping both of these activities separate via locking
>>+ * ensures that we don't try to serialize data that's only partially updated.
>>+ */
>>+static DEFINE_SPINLOCK(pgo_lock);
>>+static int current_node;
>>+
>>+unsigned long prf_lock(void)
>>+{
>>+ unsigned long flags;
>>+
>>+ spin_lock_irqsave(&pgo_lock, flags);
>>+
>>+ return flags;
>>+}
>>+
>>+void prf_unlock(unsigned long flags)
>>+{
>>+ spin_unlock_irqrestore(&pgo_lock, flags);
>>+}
>>+
>>+/*
>>+ * Return a newly allocated profiling value node which contains the value
>>+ * tracked by the value profiler.
>>+ * Note: caller *must* hold pgo_lock.
>>+ */
>>+static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
>>+ u32 index, u64 value)
>>+{
>>+ if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
>>+ return NULL; /* Out of nodes */
>>+
>>+ current_node++;
>>+
>>+ /* Make sure the node is entirely within the section */
>>+ if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
>>+ &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
>>+ return NULL;
>>+
>>+ return &__llvm_prf_vnds_start[current_node];
>>+}
>>+
>>+/*
>>+ * Counts the number of times a target value is seen.
>>+ *
>>+ * Records the target value for the index if not seen before. Otherwise,
>>+ * increments the counter associated w/ the target value.
>>+ */
>>+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
>>+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
>>+{
>>+ struct llvm_prf_data *p = (struct llvm_prf_data *)data;
>>+ struct llvm_prf_value_node **counters;
>>+ struct llvm_prf_value_node *curr;
>>+ struct llvm_prf_value_node *min = NULL;
>>+ struct llvm_prf_value_node *prev = NULL;
>>+ u64 min_count = U64_MAX;
>>+ u8 values = 0;
>>+ unsigned long flags;
>>+
>>+ if (!p || !p->values)
>>+ return;
>>+
>>+ counters = (struct llvm_prf_value_node **)p->values;
>>+ curr = counters[index];
>>+
>>+ while (curr) {
>>+ if (target_value == curr->value) {
>>+ curr->count++;
>>+ return;
>>+ }
>>+
>>+ if (curr->count < min_count) {
>>+ min_count = curr->count;
>>+ min = curr;
>>+ }
>>+
>>+ prev = curr;
>>+ curr = curr->next;
>>+ values++;
>>+ }
>>+
>>+ if (values >= LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE) {
>>+ if (!min->count || !(--min->count)) {
>>+ curr = min;
>>+ curr->value = target_value;
>>+ curr->count++;
>>+ }
>>+ return;
>>+ }
>>+
>>+ /* Lock when updating the value node structure. */
>>+ flags = prf_lock();
>>+
>>+ curr = allocate_node(p, index, target_value);
>>+ if (!curr)
>>+ goto out;
>>+
>>+ curr->value = target_value;
>>+ curr->count++;
>>+
>>+ if (!counters[index])
>>+ counters[index] = curr;
>>+ else if (prev && !prev->next)
>>+ prev->next = curr;
>>+
>>+out:
>>+ prf_unlock(flags);
>>+}
>>+EXPORT_SYMBOL(__llvm_profile_instrument_target);
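
The per-site bookkeeping above can be sketched in userspace roughly as follows. This is a simplified illustration, not the patch's code: `record_value`, `alloc_node`, the fixed `pool` and the small `MAX_VALUES_PER_SITE` cap are all made up for the example, and locking is left out entirely. The idea it tries to show is the same: bump the counter of a known value, append a node while the site has room, and otherwise decay the least-counted entry until it can be replaced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VALUES_PER_SITE 4	/* illustrative cap; the patch uses 255 */
#define POOL_SIZE 16		/* stands in for the __llvm_prf_vnds section */

struct value_node {
	uint64_t value;
	uint64_t count;
	struct value_node *next;
};

static struct value_node pool[POOL_SIZE];
static size_t pool_used;

/* Hand out the next unused node, or NULL once the pool is exhausted. */
static struct value_node *alloc_node(void)
{
	if (pool_used >= POOL_SIZE)
		return NULL;
	return &pool[pool_used++];
}

/* Record one observation of target at the value site headed by *site. */
static void record_value(struct value_node **site, uint64_t target)
{
	struct value_node *curr = *site, *prev = NULL, *min = NULL;
	uint64_t min_count = UINT64_MAX;
	unsigned int values = 0;

	while (curr) {
		if (curr->value == target) {
			curr->count++;		/* already tracked */
			return;
		}
		if (!min || curr->count < min_count) {
			min_count = curr->count;
			min = curr;
		}
		prev = curr;
		curr = curr->next;
		values++;
	}

	if (values >= MAX_VALUES_PER_SITE) {
		/* Site is full: decay the least-seen entry, reuse it at zero. */
		if (!min->count || !(--min->count)) {
			min->value = target;
			min->count++;
		}
		return;
	}

	curr = alloc_node();
	if (!curr)
		return;				/* out of nodes, drop the sample */

	curr->value = target;
	curr->count = 1;
	curr->next = NULL;
	if (prev)
		prev->next = curr;
	else
		*site = curr;
}
```

With a cap of 4, a fifth distinct value only displaces an existing entry once that entry's count has been decayed to zero, so frequently seen targets tend to survive.
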
>>+
>>+/* Counts the number of times a range of target values is seen. */
>>+void __llvm_profile_instrument_range(u64 target_value, void *data,
>>+ u32 index, s64 precise_start,
>>+ s64 precise_last, s64 large_value);
>>+void __llvm_profile_instrument_range(u64 target_value, void *data,
>>+ u32 index, s64 precise_start,
>>+ s64 precise_last, s64 large_value)
>>+{
>>+ if (large_value != S64_MIN && (s64)target_value >= large_value)
>>+ target_value = large_value;
>>+ else if ((s64)target_value < precise_start ||
>>+ (s64)target_value > precise_last)
>>+ target_value = precise_last + 1;
>>+
>>+ __llvm_profile_instrument_target(target_value, data, index);
>>+}
>>+EXPORT_SYMBOL(__llvm_profile_instrument_range);
>>+
>>+static u64 inst_prof_get_range_rep_value(u64 value)
>>+{
>>+ if (value <= 8)
>>+ /* The first ranges are individually tracked, use it as is. */
>>+ return value;
>>+ else if (value >= 513)
>>+ /* The last range is mapped to its lowest value. */
>>+ return 513;
>>+ else if (hweight64(value) == 1)
>>+ /* If it's a power of two, use it as is. */
>>+ return value;
>>+
>>+ /* Otherwise, take the previous power of two + 1. */
>>+ return (1 << (64 - __builtin_clzll(value) - 1)) + 1;
>>+}

`1 << ...` is another very minor issue: the literal `1` is an int, so the
shift is done in int width rather than 64 bits.

I sent https://reviews.llvm.org/D97640 to fix this upstream.
The overflow won't happen in practice because the function is only used
for the size parameter of memory operations (e.g. memcpy).
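
For illustration, a minimal userspace sketch of the mapping above with the shift operand widened to 64 bits. The names `prev_pow2_plus_one` and `range_rep_value` are made up for this example, `__builtin_clzll` and `__builtin_popcountll` are GCC/Clang builtins standing in for the kernel helpers, and this is not the change from the review linked above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Previous power of two + 1, computed with a 64-bit shift operand so the
 * expression stays well defined for any nonzero 64-bit input.
 */
static uint64_t prev_pow2_plus_one(uint64_t value)
{
	assert(value != 0);	/* __builtin_clzll(0) is undefined */
	return (1ULL << (64 - __builtin_clzll(value) - 1)) + 1;
}

/* Hypothetical userspace version of inst_prof_get_range_rep_value(). */
static uint64_t range_rep_value(uint64_t value)
{
	if (value <= 8)
		return value;	/* small sizes are tracked individually */
	if (value >= 513)
		return 513;	/* the last range maps to its lowest value */
	if (__builtin_popcountll(value) == 1)
		return value;	/* powers of two are used as is */

	return prev_pow2_plus_one(value);
}
```

Because of the `value >= 513` check, the shift count in `range_rep_value` never gets anywhere near 31, which matches the point that the overflow cannot be hit in practice; the wider operand only removes the latent problem.
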

>>+/*
>>+ * The target values are partitioned into multiple ranges. The range spec is
>>+ * defined in compiler-rt/include/profile/InstrProfData.inc.
>>+ */
>>+void __llvm_profile_instrument_memop(u64 target_value, void *data,
>>+ u32 counter_index);
>>+void __llvm_profile_instrument_memop(u64 target_value, void *data,
>>+ u32 counter_index)
>>+{
>>+ u64 rep_value;
>>+
>>+ /* Map the target value to the representative value of its range. */
>>+ rep_value = inst_prof_get_range_rep_value(target_value);
>>+ __llvm_profile_instrument_target(rep_value, data, counter_index);
>>+}
>>+EXPORT_SYMBOL(__llvm_profile_instrument_memop);
>>diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
>>new file mode 100644
>>index 000000000000..ddc8d3002fe5
>>--- /dev/null
>>+++ b/kernel/pgo/pgo.h
>>@@ -0,0 +1,203 @@
>>+/* SPDX-License-Identifier: GPL-2.0 */
>>+/*
>>+ * Copyright (C) 2019 Google, Inc.
>>+ *
>>+ * Author:
>>+ * Sami Tolvanen <[email protected]>
>>+ *
>>+ * This software is licensed under the terms of the GNU General Public
>>+ * License version 2, as published by the Free Software Foundation, and
>>+ * may be copied, distributed, and modified under those terms.
>>+ *
>>+ * This program is distributed in the hope that it will be useful,
>>+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>>+ * GNU General Public License for more details.
>>+ *
>>+ */
>>+
>>+#ifndef _PGO_H
>>+#define _PGO_H
>>+
>>+/*
>>+ * Note: These internal LLVM definitions must match the compiler version.
>>+ * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
>>+ */
>>+
>>+#define LLVM_INSTR_PROF_RAW_MAGIC_64 \
>>+ ((u64)255 << 56 | \
>>+ (u64)'l' << 48 | \
>>+ (u64)'p' << 40 | \
>>+ (u64)'r' << 32 | \
>>+ (u64)'o' << 24 | \
>>+ (u64)'f' << 16 | \
>>+ (u64)'r' << 8 | \
>>+ (u64)129)
>>+#define LLVM_INSTR_PROF_RAW_MAGIC_32 \
>>+ ((u64)255 << 56 | \
>>+ (u64)'l' << 48 | \
>>+ (u64)'p' << 40 | \
>>+ (u64)'r' << 32 | \
>>+ (u64)'o' << 24 | \
>>+ (u64)'f' << 16 | \
>>+ (u64)'R' << 8 | \
>>+ (u64)129)
>>+
>>+#define LLVM_INSTR_PROF_RAW_VERSION 5
>>+#define LLVM_INSTR_PROF_DATA_ALIGNMENT 8
>>+#define LLVM_INSTR_PROF_IPVK_FIRST 0
>>+#define LLVM_INSTR_PROF_IPVK_LAST 1
>>+#define LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE 255
>>+
>>+#define LLVM_VARIANT_MASK_IR_PROF (0x1ULL << 56)
>>+#define LLVM_VARIANT_MASK_CSIR_PROF (0x1ULL << 57)
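
A small userspace sketch of how these constants combine into the header fields, for illustration. The helper names are made up, and `VARIANT_MASKS_ALL` (the top byte reserved for variant flags) is an assumption about LLVM's layout that is not defined in this patch.

```c
#include <assert.h>
#include <stdint.h>

#define RAW_VERSION		5
#define VARIANT_MASK_IR_PROF	(0x1ULL << 56)
/* Assumption: the top byte of the version field holds the variant flags. */
#define VARIANT_MASKS_ALL	0xff00000000000000ULL

/* The 64-bit magic: 0xff, the bytes "lprofr", then 0x81. */
static uint64_t raw_magic_64(void)
{
	return (uint64_t)255 << 56 | (uint64_t)'l' << 48 |
	       (uint64_t)'p' << 40 | (uint64_t)'r' << 32 |
	       (uint64_t)'o' << 24 | (uint64_t)'f' << 16 |
	       (uint64_t)'r' << 8 | (uint64_t)129;
}

/* Combine a format version with variant flags, as prf_fill_header() does. */
static uint64_t make_version(uint64_t version, uint64_t variant_flags)
{
	assert((version & VARIANT_MASKS_ALL) == 0);
	return variant_flags | version;
}

/* Strip the variant flags to recover the plain format version. */
static uint64_t version_number(uint64_t field)
{
	return field & ~VARIANT_MASKS_ALL;
}

static int is_ir_profile(uint64_t field)
{
	return (field & VARIANT_MASK_IR_PROF) != 0;
}
```

A reader of the raw profile would presumably check the magic first, then split the version field this way to tell an IR-level profile from a front-end one.
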
>>+
>>+/**
>>+ * struct llvm_prf_header - represents the raw profile header data structure.
>>+ * @magic: the magic token for the file format.
>>+ * @version: the version of the file format.
>>+ * @data_size: the number of entries in the profile data section.
>>+ * @padding_bytes_before_counters: the number of padding bytes before the
>>+ * counters.
>>+ * @counters_size: the size in bytes of the LLVM profile section containing the
>>+ * counters.
>>+ * @padding_bytes_after_counters: the number of padding bytes after the
>>+ * counters.
>>+ * @names_size: the size in bytes of the LLVM profile section containing the
>>+ * counters' names.
>>+ * @counters_delta: the beginning of the LLVM profile counters section.
>>+ * @names_delta: the beginning of the LLVM profile names section.
>>+ * @value_kind_last: the last profile value kind.
>>+ */
>>+struct llvm_prf_header {
>>+ u64 magic;
>>+ u64 version;
>>+ u64 data_size;
>>+ u64 padding_bytes_before_counters;
>>+ u64 counters_size;
>>+ u64 padding_bytes_after_counters;
>>+ u64 names_size;
>>+ u64 counters_delta;
>>+ u64 names_delta;
>>+ u64 value_kind_last;
>>+};
>>+
>>+/**
>>+ * struct llvm_prf_data - represents the per-function control structure.
>>+ * @name_ref: the reference to the function's name.
>>+ * @func_hash: the hash value of the function.
>>+ * @counter_ptr: a pointer to the profile counter.
>>+ * @function_ptr: a pointer to the function.
>>+ * @values: the profiling values associated with this function.
>>+ * @num_counters: the number of counters in the function.
>>+ * @num_value_sites: the number of value profile sites.
>>+ */
>>+struct llvm_prf_data {
>>+ const u64 name_ref;
>>+ const u64 func_hash;
>>+ const void *counter_ptr;
>>+ const void *function_ptr;
>>+ void *values;
>>+ const u32 num_counters;
>>+ const u16 num_value_sites[LLVM_INSTR_PROF_IPVK_LAST + 1];
>>+} __aligned(LLVM_INSTR_PROF_DATA_ALIGNMENT);
>>+
>>+/**
>>+ * struct llvm_prf_value_node_data - represents the data part of the struct
>>+ * llvm_prf_value_node data structure.
>>+ * @value: the value counters.
>>+ * @count: the counters' count.
>>+ */
>>+struct llvm_prf_value_node_data {
>>+ u64 value;
>>+ u64 count;
>>+};
>>+
>>+/**
>>+ * struct llvm_prf_value_node - represents an internal data structure used by
>>+ * the value profiler.
>>+ * @value: the value counters.
>>+ * @count: the counters' count.
>>+ * @next: the next value node.
>>+ */
>>+struct llvm_prf_value_node {
>>+ u64 value;
>>+ u64 count;
>>+ struct llvm_prf_value_node *next;
>>+};
>>+
>>+/**
>>+ * struct llvm_prf_value_data - represents the value profiling data in indexed
>>+ * format.
>>+ * @total_size: the total size in bytes including this field.
>>+ * @num_value_kinds: the number of value profile kinds that have value profile
>>+ * data.
>>+ */
>>+struct llvm_prf_value_data {
>>+ u32 total_size;
>>+ u32 num_value_kinds;
>>+};
>>+
>>+/**
>>+ * struct llvm_prf_value_record - represents the on-disk layout of the value
>>+ * profile data of a particular kind for one function.
>>+ * @kind: the kind of the value profile record.
>>+ * @num_value_sites: the number of value profile sites.
>>+ * @site_count_array: the first element of the array that stores the number
>>+ * of profiled values for each value site.
>>+ */
>>+struct llvm_prf_value_record {
>>+ u32 kind;
>>+ u32 num_value_sites;
>>+ u8 site_count_array[];
>>+};
>>+
>>+#define prf_get_value_record_header_size() \
>>+ offsetof(struct llvm_prf_value_record, site_count_array)
>>+#define prf_get_value_record_site_count_size(sites) \
>>+ roundup((sites), 8)
>>+#define prf_get_value_record_size(sites) \
>>+ (prf_get_value_record_header_size() + \
>>+ prf_get_value_record_site_count_size((sites)))
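
To make the size arithmetic in these macros concrete, here is a minimal userspace sketch. `struct value_record` and `value_record_size` are illustrative names that mirror the definitions above; the 8-byte header is simply what `offsetof` gives for two `u32` fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct value_record {
	uint32_t kind;
	uint32_t num_value_sites;
	uint8_t site_count_array[];	/* one byte per value site */
};

/* Record header plus the per-site count bytes rounded up to 8. */
static size_t value_record_size(size_t sites)
{
	size_t header = offsetof(struct value_record, site_count_array);
	size_t counts = (sites + 7) / 8 * 8;

	assert(header == 8);
	return header + counts;
}
```

So a record with 1 to 8 sites takes 16 bytes and a record with 9 sites takes 24, before any value node data is appended.
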
>>+
>>+/* Data sections */
>>+extern struct llvm_prf_data __llvm_prf_data_start[];
>>+extern struct llvm_prf_data __llvm_prf_data_end[];
>>+
>>+extern u64 __llvm_prf_cnts_start[];
>>+extern u64 __llvm_prf_cnts_end[];
>>+
>>+extern char __llvm_prf_names_start[];
>>+extern char __llvm_prf_names_end[];
>>+
>>+extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
>>+extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
>>+
>>+/* Locking for vnodes */
>>+extern unsigned long prf_lock(void);
>>+extern void prf_unlock(unsigned long flags);
>>+
>>+#define __DEFINE_PRF_SIZE(s) \
>>+ static inline unsigned long prf_ ## s ## _size(void) \
>>+ { \
>>+ unsigned long start = \
>>+ (unsigned long)__llvm_prf_ ## s ## _start; \
>>+ unsigned long end = \
>>+ (unsigned long)__llvm_prf_ ## s ## _end; \
>>+ return roundup(end - start, \
>>+ sizeof(__llvm_prf_ ## s ## _start[0])); \
>>+ } \
>>+ static inline unsigned long prf_ ## s ## _count(void) \
>>+ { \
>>+ return prf_ ## s ## _size() / \
>>+ sizeof(__llvm_prf_ ## s ## _start[0]); \
>>+ }
>>+
>>+__DEFINE_PRF_SIZE(data);
>>+__DEFINE_PRF_SIZE(cnts);
>>+__DEFINE_PRF_SIZE(names);
>>+__DEFINE_PRF_SIZE(vnds);
>>+
>>+#undef __DEFINE_PRF_SIZE
>>+
>>+#endif /* _PGO_H */
>>diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
>>index eee59184de64..48a65d092c5b 100644
>>--- a/scripts/Makefile.lib
>>+++ b/scripts/Makefile.lib
>>@@ -139,6 +139,16 @@ _c_flags += $(if $(patsubst n%,, \
>> $(CFLAGS_GCOV))
>>endif
>>
>>+#
>>+# Enable clang's PGO profiling flags for a file or directory depending on
>>+# variables PGO_PROFILE_obj.o and PGO_PROFILE.
>>+#
>>+ifeq ($(CONFIG_PGO_CLANG),y)
>>+_c_flags += $(if $(patsubst n%,, \
>>+ $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
>>+ $(CFLAGS_PGO_CLANG))
>>+endif
>>+
>>#
>># Enable address sanitizer flags for kernel except some files or directories
>># we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
>>--
>>2.30.1.766.gb4fecdf3b7-goog

2021-04-07 22:05:11

by Bill Wendling

Subject: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

From: Sami Tolvanen <[email protected]>

Enable the use of clang's Profile-Guided Optimization[1]. To generate a
profile, the kernel is instrumented with PGO counters, a representative
workload is run, and the raw profile data is collected from
/sys/kernel/debug/pgo/profraw.

The raw profile data must be processed by clang's "llvm-profdata" tool
before it can be used during recompilation:

$ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
$ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw

Multiple raw profiles may be merged during this step.

The data can now be used by the compiler:

$ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...

This initial submission is restricted to x86, as that's the platform we
know works. This restriction can be lifted once other platforms have
been verified to work with PGO.

Note that this method of profiling the kernel is clang-native, unlike
the clang support in kernel/gcov.

[1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization

Signed-off-by: Sami Tolvanen <[email protected]>
Co-developed-by: Bill Wendling <[email protected]>
Signed-off-by: Bill Wendling <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Reviewed-by: Fangrui Song <[email protected]>
---
v9: - [maskray] Remove explicit 'ALIGN' and 'KEEP' from PGO variables in
vmlinux.lds.h.
v8: - Rebased on top-of-tree.
v7: - [sedat.dilek] Fix minor build failure.
v6: - Add better documentation about the locking scheme and other things.
- Rename macros to better match the same macros in LLVM's source code.
v5: - [natechancellor] Correct padding calculation.
v4: - [ndesaulniers] Remove non-x86 Makefile changes and use "hweight64" instead
of using our own popcount implementation.
v3: - [sedat.dilek] Added change log section.
v2: - [natechancellor] Added "__llvm_profile_instrument_memop".
- [maskray] Corrected documentation, re PGO flags when using LTO.
---
Documentation/dev-tools/index.rst | 1 +
Documentation/dev-tools/pgo.rst | 127 +++++++++
MAINTAINERS | 9 +
Makefile | 3 +
arch/Kconfig | 1 +
arch/x86/Kconfig | 1 +
arch/x86/boot/Makefile | 1 +
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/crypto/Makefile | 4 +
arch/x86/entry/vdso/Makefile | 1 +
arch/x86/kernel/vmlinux.lds.S | 2 +
arch/x86/platform/efi/Makefile | 1 +
arch/x86/purgatory/Makefile | 1 +
arch/x86/realmode/rm/Makefile | 1 +
arch/x86/um/vdso/Makefile | 1 +
drivers/firmware/efi/libstub/Makefile | 1 +
include/asm-generic/vmlinux.lds.h | 34 +++
kernel/Makefile | 1 +
kernel/pgo/Kconfig | 35 +++
kernel/pgo/Makefile | 5 +
kernel/pgo/fs.c | 389 ++++++++++++++++++++++++++
kernel/pgo/instrument.c | 189 +++++++++++++
kernel/pgo/pgo.h | 203 ++++++++++++++
scripts/Makefile.lib | 10 +
24 files changed, 1022 insertions(+)
create mode 100644 Documentation/dev-tools/pgo.rst
create mode 100644 kernel/pgo/Kconfig
create mode 100644 kernel/pgo/Makefile
create mode 100644 kernel/pgo/fs.c
create mode 100644 kernel/pgo/instrument.c
create mode 100644 kernel/pgo/pgo.h

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index 1b1cf4f5c9d9..6a30cd98e6f9 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -27,6 +27,7 @@ whole; patches welcome!
kgdb
kselftest
kunit/index
+ pgo


.. only:: subproject and html
diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
new file mode 100644
index 000000000000..b7f11d8405b7
--- /dev/null
+++ b/Documentation/dev-tools/pgo.rst
@@ -0,0 +1,127 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+Using PGO with the Linux kernel
+===============================
+
+Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
+when building with Clang. The profiling data is exported via the ``pgo``
+debugfs directory.
+
+.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
+
+
+Preparation
+===========
+
+Configure the kernel with:
+
+.. code-block:: make
+
+ CONFIG_DEBUG_FS=y
+ CONFIG_PGO_CLANG=y
+
+Note that kernels compiled with profiling flags will be significantly larger
+and run slower.
+
+Profiling data will only become accessible once debugfs has been mounted:
+
+.. code-block:: sh
+
+ mount -t debugfs none /sys/kernel/debug
+
+
+Customization
+=============
+
+You can enable or disable profiling for individual files and directories by
+adding a line similar to the following to the respective kernel Makefile:
+
+- For a single file (e.g. main.o)
+
+ .. code-block:: make
+
+ PGO_PROFILE_main.o := y
+
+- For all files in one directory
+
+ .. code-block:: make
+
+ PGO_PROFILE := y
+
+To exclude files from being profiled use
+
+ .. code-block:: make
+
+ PGO_PROFILE_main.o := n
+
+and
+
+ .. code-block:: make
+
+ PGO_PROFILE := n
+
+Only files which are linked to the main kernel image or are compiled as kernel
+modules are supported by this mechanism.
+
+
+Files
+=====
+
+The PGO kernel support creates the following files in debugfs:
+
+``/sys/kernel/debug/pgo``
+ Parent directory for all PGO-related files.
+
+``/sys/kernel/debug/pgo/reset``
+ Global reset file: resets all profile data to zero when written to.
+
+``/sys/kernel/debug/pgo/profraw``
+ The raw PGO data that must be processed with ``llvm-profdata``.
+
+
+Workflow
+========
+
+The PGO kernel can be run on the host or test machines. The data, though,
+should be analyzed with tools from the same Clang version that was used to
+compile the kernel. Clang is tolerant of version skew, but it's easier to use
+the same Clang version.
+
+The profiling data is useful for optimizing the kernel, analyzing coverage,
+etc. Clang offers tools to perform these tasks.
+
+Here is an example workflow for profiling an instrumented kernel with PGO and
+using the result to optimize the kernel:
+
+1) Install the kernel on the TEST machine.
+
+2) Reset the data counters right before running the load tests
+
+ .. code-block:: sh
+
+ $ echo 1 > /sys/kernel/debug/pgo/reset
+
+3) Run the load tests.
+
+4) Collect the raw profile data
+
+ .. code-block:: sh
+
+ $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
+
+5) (Optional) Download the raw profile data to the HOST machine.
+
+6) Process the raw profile data
+
+ .. code-block:: sh
+
+ $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
+
+ Note that multiple raw profile data files can be merged during this step.
+
+7) Rebuild the kernel using the profile data (PGO disabled)
+
+ .. code-block:: sh
+
+ $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
diff --git a/MAINTAINERS b/MAINTAINERS
index c80ad735b384..742058188af2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14054,6 +14054,15 @@ S: Maintained
F: include/linux/personality.h
F: include/uapi/linux/personality.h

+PGO BASED KERNEL PROFILING
+M: Sami Tolvanen <[email protected]>
+M: Bill Wendling <[email protected]>
+R: Nathan Chancellor <[email protected]>
+R: Nick Desaulniers <[email protected]>
+S: Supported
+F: Documentation/dev-tools/pgo.rst
+F: kernel/pgo
+
PHOENIX RC FLIGHT CONTROLLER ADAPTER
M: Marcus Folkesson <[email protected]>
L: [email protected]
diff --git a/Makefile b/Makefile
index cc77fd45ca64..6450faceb137 100644
--- a/Makefile
+++ b/Makefile
@@ -660,6 +660,9 @@ endif # KBUILD_EXTMOD
# Defaults to vmlinux, but the arch makefile usually adds further targets
all: vmlinux

+CFLAGS_PGO_CLANG := -fprofile-generate
+export CFLAGS_PGO_CLANG
+
CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
$(call cc-option,-fno-tree-loop-im) \
$(call cc-disable-warning,maybe-uninitialized,)
diff --git a/arch/Kconfig b/arch/Kconfig
index ecfd3520b676..afd082133e0a 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1191,6 +1191,7 @@ config ARCH_HAS_ELFCORE_COMPAT
bool

source "kernel/gcov/Kconfig"
+source "kernel/pgo/Kconfig"

source "scripts/gcc-plugins/Kconfig"

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2792879d398e..62be93b199ff 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -99,6 +99,7 @@ config X86
select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
select ARCH_SUPPORTS_LTO_CLANG if X86_64
select ARCH_SUPPORTS_LTO_CLANG_THIN if X86_64
+ select ARCH_SUPPORTS_PGO_CLANG if X86_64
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index fe605205b4ce..383853e32f67 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n

$(obj)/bzImage: asflags-y := $(SVGA_MODE)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index e0bc3988c3fa..ed12ab65f606 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/

KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE :=n

KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index b28e36b7c96b..4b2e9620c412 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -4,6 +4,10 @@

OBJECT_FILES_NON_STANDARD := y

+# Disable PGO for curve25519-x86_64. With PGO enabled, clang runs out of
+# registers for some of the functions.
+PGO_PROFILE_curve25519-x86_64.o := n
+
obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o
obj-$(CONFIG_CRYPTO_TWOFISH_X86_64) += twofish-x86_64.o
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 05c4abc2fdfd..f7421e44725a 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -180,6 +180,7 @@ quiet_cmd_vdso = VDSO $@
VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
$(call ld-option, --eh-frame-hdr) -Bsymbolic
GCOV_PROFILE := n
+PGO_PROFILE := n

quiet_cmd_vdso_and_check = VDSO $@
cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index efd9e9ea17f2..f6cab2316c46 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -184,6 +184,8 @@ SECTIONS

BUG_TABLE

+ PGO_CLANG_DATA
+
ORC_UNWIND_TABLE

. = ALIGN(PAGE_SIZE);
diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
index 84b09c230cbd..5f22b31446ad 100644
--- a/arch/x86/platform/efi/Makefile
+++ b/arch/x86/platform/efi/Makefile
@@ -2,6 +2,7 @@
OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
KASAN_SANITIZE := n
GCOV_PROFILE := n
+PGO_PROFILE := n

obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 95ea17a9d20c..36f20e99da0b 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk

# Sanitizer, etc. runtimes are unavailable and cannot be linked here.
GCOV_PROFILE := n
+PGO_PROFILE := n
KASAN_SANITIZE := n
UBSAN_SANITIZE := n
KCSAN_SANITIZE := n
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index 83f1b6a56449..21797192f958 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
GCOV_PROFILE := n
+PGO_PROFILE := n
UBSAN_SANITIZE := n
diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
index 5943387e3f35..54f5768f5853 100644
--- a/arch/x86/um/vdso/Makefile
+++ b/arch/x86/um/vdso/Makefile
@@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@

VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
GCOV_PROFILE := n
+PGO_PROFILE := n

#
# Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index c23466e05e60..724fb389bb9d 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -42,6 +42,7 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS))

GCOV_PROFILE := n
+PGO_PROFILE := n
# Sanitizer runtimes are unavailable and cannot be linked here.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 0331d5d49551..b371857097e8 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -329,6 +329,39 @@
#define DTPM_TABLE()
#endif

+#ifdef CONFIG_PGO_CLANG
+#define PGO_CLANG_DATA \
+ __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
+ __llvm_prf_start = .; \
+ __llvm_prf_data_start = .; \
+ *(__llvm_prf_data) \
+ __llvm_prf_data_end = .; \
+ } \
+ __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
+ __llvm_prf_cnts_start = .; \
+ *(__llvm_prf_cnts) \
+ __llvm_prf_cnts_end = .; \
+ } \
+ __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
+ __llvm_prf_names_start = .; \
+ *(__llvm_prf_names) \
+ __llvm_prf_names_end = .; \
+ } \
+ __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
+ __llvm_prf_vals_start = .; \
+ *(__llvm_prf_vals) \
+ __llvm_prf_vals_end = .; \
+ } \
+ __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
+ __llvm_prf_vnds_start = .; \
+ *(__llvm_prf_vnds) \
+ __llvm_prf_vnds_end = .; \
+ __llvm_prf_end = .; \
+ }
+#else
+#define PGO_CLANG_DATA
+#endif
+
#define KERNEL_DTB() \
STRUCT_ALIGN(); \
__dtb_start = .; \
@@ -1106,6 +1139,7 @@
CONSTRUCTORS \
} \
BUG_TABLE \
+ PGO_CLANG_DATA

#define INIT_TEXT_SECTION(inittext_align) \
. = ALIGN(inittext_align); \
diff --git a/kernel/Makefile b/kernel/Makefile
index 320f1f3941b7..a2a23ef2b12f 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
obj-$(CONFIG_KCSAN) += kcsan/
obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
+obj-$(CONFIG_PGO_CLANG) += pgo/

obj-$(CONFIG_PERF_EVENTS) += events/

diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
new file mode 100644
index 000000000000..76a640b6cf6e
--- /dev/null
+++ b/kernel/pgo/Kconfig
@@ -0,0 +1,35 @@
+# SPDX-License-Identifier: GPL-2.0-only
+menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
+
+config ARCH_SUPPORTS_PGO_CLANG
+ bool
+
+config PGO_CLANG
+ bool "Enable clang's PGO-based kernel profiling"
+ depends on DEBUG_FS
+ depends on ARCH_SUPPORTS_PGO_CLANG
+ depends on CC_IS_CLANG && CLANG_VERSION >= 120000
+ help
+ This option enables clang's PGO (Profile Guided Optimization) based
+ code profiling to better optimize the kernel.
+
+ If unsure, say N.
+
+ Run a representative workload for your application on a kernel
+ compiled with this option and download the raw profile file from
+ /sys/kernel/debug/pgo/profraw. This file needs to be processed with
+ llvm-profdata. It may be merged with other collected raw profiles.
+
+ Copy the resulting profile file into vmlinux.profdata, and enable
+ KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
+ kernel.
+
+ Note that a kernel compiled with profiling flags will be
+ significantly larger and run slower. Also be sure to exclude files
+ from profiling which are not linked to the kernel image to prevent
+ linker errors.
+
+ Note that the debugfs filesystem has to be mounted to access
+ profiling data.
+
+endmenu
diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
new file mode 100644
index 000000000000..41e27cefd9a4
--- /dev/null
+++ b/kernel/pgo/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+GCOV_PROFILE := n
+PGO_PROFILE := n
+
+obj-y += fs.o instrument.o
diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
new file mode 100644
index 000000000000..1678df3b7d64
--- /dev/null
+++ b/kernel/pgo/fs.c
@@ -0,0 +1,389 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ * Sami Tolvanen <[email protected]>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt) "pgo: " fmt
+
+#include <linux/kernel.h>
+#include <linux/debugfs.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include "pgo.h"
+
+static struct dentry *directory;
+
+struct prf_private_data {
+ void *buffer;
+ unsigned long size;
+};
+
+/*
+ * Raw profile data format:
+ *
+ * - llvm_prf_header
+ * - __llvm_prf_data
+ * - __llvm_prf_cnts
+ * - __llvm_prf_names
+ * - zero padding to 8 bytes
+ * - for each llvm_prf_data in __llvm_prf_data:
+ * - llvm_prf_value_data
+ * - llvm_prf_value_record + site count array
+ * - llvm_prf_value_node_data
+ * ...
+ * ...
+ * ...
+ */
+
+static void prf_fill_header(void **buffer)
+{
+ struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
+
+#ifdef CONFIG_64BIT
+ header->magic = LLVM_INSTR_PROF_RAW_MAGIC_64;
+#else
+ header->magic = LLVM_INSTR_PROF_RAW_MAGIC_32;
+#endif
+ header->version = LLVM_VARIANT_MASK_IR_PROF | LLVM_INSTR_PROF_RAW_VERSION;
+ header->data_size = prf_data_count();
+ header->padding_bytes_before_counters = 0;
+ header->counters_size = prf_cnts_count();
+ header->padding_bytes_after_counters = 0;
+ header->names_size = prf_names_count();
+ header->counters_delta = (u64)__llvm_prf_cnts_start;
+ header->names_delta = (u64)__llvm_prf_names_start;
+ header->value_kind_last = LLVM_INSTR_PROF_IPVK_LAST;
+
+ *buffer += sizeof(*header);
+}
+
+/*
+ * Copy the source into the buffer, incrementing the pointer into buffer in the
+ * process.
+ */
+static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
+{
+ memcpy(*buffer, src, size);
+ *buffer += size;
+}
+
+static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
+{
+ struct llvm_prf_value_node **nodes =
+ (struct llvm_prf_value_node **)p->values;
+ u32 kinds = 0;
+ u32 size = 0;
+ unsigned int kind;
+ unsigned int n;
+ unsigned int s = 0;
+
+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+ unsigned int sites = p->num_value_sites[kind];
+
+ if (!sites)
+ continue;
+
+ /* Record + site count array */
+ size += prf_get_value_record_size(sites);
+ kinds++;
+
+ if (!nodes)
+ continue;
+
+ for (n = 0; n < sites; n++) {
+ u32 count = 0;
+ struct llvm_prf_value_node *site = nodes[s + n];
+
+ while (site && ++count <= U8_MAX)
+ site = site->next;
+
+ size += count *
+ sizeof(struct llvm_prf_value_node_data);
+ }
+
+ s += sites;
+ }
+
+ if (size)
+ size += sizeof(struct llvm_prf_value_data);
+
+ if (value_kinds)
+ *value_kinds = kinds;
+
+ return size;
+}
+
+static u32 prf_get_value_size(void)
+{
+ u32 size = 0;
+ struct llvm_prf_data *p;
+
+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+ size += __prf_get_value_size(p, NULL);
+
+ return size;
+}
+
+/* Serialize the value profiling data of one llvm_prf_data entry. */
+static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
+{
+ struct llvm_prf_value_data header;
+ struct llvm_prf_value_node **nodes =
+ (struct llvm_prf_value_node **)p->values;
+ unsigned int kind;
+ unsigned int n;
+ unsigned int s = 0;
+
+ header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
+
+ if (!header.num_value_kinds)
+ /* Nothing to write. */
+ return;
+
+ prf_copy_to_buffer(buffer, &header, sizeof(header));
+
+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
+ struct llvm_prf_value_record *record;
+ u8 *counts;
+ unsigned int sites = p->num_value_sites[kind];
+
+ if (!sites)
+ continue;
+
+ /* Profiling value record. */
+ record = *(struct llvm_prf_value_record **)buffer;
+ *buffer += prf_get_value_record_header_size();
+
+ record->kind = kind;
+ record->num_value_sites = sites;
+
+ /* Site count array. */
+ counts = *(u8 **)buffer;
+ *buffer += prf_get_value_record_site_count_size(sites);
+
+ /*
+ * If we don't have nodes, we can skip updating the site count
+ * array, because the buffer is zero filled.
+ */
+ if (!nodes)
+ continue;
+
+ for (n = 0; n < sites; n++) {
+ u32 count = 0;
+ struct llvm_prf_value_node *site = nodes[s + n];
+
+ while (site && ++count <= U8_MAX) {
+ prf_copy_to_buffer(buffer, site,
+ sizeof(struct llvm_prf_value_node_data));
+ site = site->next;
+ }
+
+ counts[n] = (u8)count;
+ }
+
+ s += sites;
+ }
+}
+
+static void prf_serialize_values(void **buffer)
+{
+ struct llvm_prf_data *p;
+
+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
+ prf_serialize_value(p, buffer);
+}
+
+static inline unsigned long prf_get_padding(unsigned long size)
+{
+ return 7 & (sizeof(u64) - size % sizeof(u64));
+}
+
+static unsigned long prf_buffer_size(void)
+{
+ return sizeof(struct llvm_prf_header) +
+ prf_data_size() +
+ prf_cnts_size() +
+ prf_names_size() +
+ prf_get_padding(prf_names_size()) +
+ prf_get_value_size();
+}
+
+/*
+ * Serialize the profiling data into a format LLVM's tools can understand.
+ * Note: caller *must* hold pgo_lock.
+ */
+static int prf_serialize(struct prf_private_data *p)
+{
+ int err = 0;
+ void *buffer;
+
+ p->size = prf_buffer_size();
+ p->buffer = vzalloc(p->size);
+
+ if (!p->buffer) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ buffer = p->buffer;
+
+ prf_fill_header(&buffer);
+ prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
+ prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
+ prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
+ buffer += prf_get_padding(prf_names_size());
+
+ prf_serialize_values(&buffer);
+
+out:
+ return err;
+}
+
+/* open() implementation for PGO. Creates a copy of the profiling data set. */
+static int prf_open(struct inode *inode, struct file *file)
+{
+ struct prf_private_data *data;
+ unsigned long flags;
+ int err;
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL);
+ if (!data) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ flags = prf_lock();
+
+ err = prf_serialize(data);
+ if (unlikely(err)) {
+ kfree(data);
+ goto out_unlock;
+ }
+
+ file->private_data = data;
+
+out_unlock:
+ prf_unlock(flags);
+out:
+ return err;
+}
+
+/* read() implementation for PGO. */
+static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct prf_private_data *data = file->private_data;
+
+ BUG_ON(!data);
+
+ return simple_read_from_buffer(buf, count, ppos, data->buffer,
+ data->size);
+}
+
+/* release() implementation for PGO. Release resources allocated by open(). */
+static int prf_release(struct inode *inode, struct file *file)
+{
+ struct prf_private_data *data = file->private_data;
+
+ if (data) {
+ vfree(data->buffer);
+ kfree(data);
+ }
+
+ return 0;
+}
+
+static const struct file_operations prf_fops = {
+ .owner = THIS_MODULE,
+ .open = prf_open,
+ .read = prf_read,
+ .llseek = default_llseek,
+ .release = prf_release
+};
+
+/* write() implementation for resetting PGO's profile data. */
+static ssize_t reset_write(struct file *file, const char __user *addr,
+ size_t len, loff_t *pos)
+{
+ struct llvm_prf_data *data;
+
+ memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
+
+ for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; data++) {
+ struct llvm_prf_value_node **vnodes;
+ u64 current_vsite_count;
+ u32 i;
+
+ if (!data->values)
+ continue;
+
+ current_vsite_count = 0;
+ vnodes = (struct llvm_prf_value_node **)data->values;
+
+ for (i = LLVM_INSTR_PROF_IPVK_FIRST; i <= LLVM_INSTR_PROF_IPVK_LAST; i++)
+ current_vsite_count += data->num_value_sites[i];
+
+ for (i = 0; i < current_vsite_count; i++) {
+ struct llvm_prf_value_node *current_vnode = vnodes[i];
+
+ while (current_vnode) {
+ current_vnode->count = 0;
+ current_vnode = current_vnode->next;
+ }
+ }
+ }
+
+ return len;
+}
+
+static const struct file_operations prf_reset_fops = {
+ .owner = THIS_MODULE,
+ .write = reset_write,
+ .llseek = noop_llseek,
+};
+
+/* Create debugfs entries. */
+static int __init pgo_init(void)
+{
+ directory = debugfs_create_dir("pgo", NULL);
+ if (!directory)
+ goto err_remove;
+
+ if (!debugfs_create_file("profraw", 0600, directory, NULL,
+ &prf_fops))
+ goto err_remove;
+
+ if (!debugfs_create_file("reset", 0200, directory, NULL,
+ &prf_reset_fops))
+ goto err_remove;
+
+ return 0;
+
+err_remove:
+ pr_err("initialization failed\n");
+ return -EIO;
+}
+
+/* Remove debugfs entries. */
+static void __exit pgo_exit(void)
+{
+ debugfs_remove_recursive(directory);
+}
+
+module_init(pgo_init);
+module_exit(pgo_exit);
diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
new file mode 100644
index 000000000000..464b3bc77431
--- /dev/null
+++ b/kernel/pgo/instrument.c
@@ -0,0 +1,189 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ * Sami Tolvanen <[email protected]>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#define pr_fmt(fmt) "pgo: " fmt
+
+#include <linux/bitops.h>
+#include <linux/kernel.h>
+#include <linux/export.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+#include "pgo.h"
+
+/*
+ * This lock guards both profile count updating and serialization of the
+ * profiling data. Keeping both of these activities separate via locking
+ * ensures that we don't try to serialize data that's only partially updated.
+ */
+static DEFINE_SPINLOCK(pgo_lock);
+static int current_node;
+
+unsigned long prf_lock(void)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&pgo_lock, flags);
+
+ return flags;
+}
+
+void prf_unlock(unsigned long flags)
+{
+ spin_unlock_irqrestore(&pgo_lock, flags);
+}
+
+/*
+ * Return a newly allocated profiling value node for a value tracked by the
+ * value profiler.
+ * Note: caller *must* hold pgo_lock.
+ */
+static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
+ u32 index, u64 value)
+{
+ if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
+ return NULL; /* Out of nodes */
+
+ current_node++;
+
+ /* Make sure the node is entirely within the section */
+ if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
+ &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
+ return NULL;
+
+ return &__llvm_prf_vnds_start[current_node];
+}
+
+/*
+ * Counts the number of times a target value is seen.
+ *
+ * Records the target value for the index if not seen before. Otherwise,
+ * increments the counter associated w/ the target value.
+ */
+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
+{
+ struct llvm_prf_data *p = (struct llvm_prf_data *)data;
+ struct llvm_prf_value_node **counters;
+ struct llvm_prf_value_node *curr;
+ struct llvm_prf_value_node *min = NULL;
+ struct llvm_prf_value_node *prev = NULL;
+ u64 min_count = U64_MAX;
+ u8 values = 0;
+ unsigned long flags;
+
+ if (!p || !p->values)
+ return;
+
+ counters = (struct llvm_prf_value_node **)p->values;
+ curr = counters[index];
+
+ while (curr) {
+ if (target_value == curr->value) {
+ curr->count++;
+ return;
+ }
+
+ if (curr->count < min_count) {
+ min_count = curr->count;
+ min = curr;
+ }
+
+ prev = curr;
+ curr = curr->next;
+ values++;
+ }
+
+ if (values >= LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE) {
+ if (!min->count || !(--min->count)) {
+ curr = min;
+ curr->value = target_value;
+ curr->count++;
+ }
+ return;
+ }
+
+ /* Lock when updating the value node structure. */
+ flags = prf_lock();
+
+ curr = allocate_node(p, index, target_value);
+ if (!curr)
+ goto out;
+
+ curr->value = target_value;
+ curr->count++;
+
+ if (!counters[index])
+ counters[index] = curr;
+ else if (prev && !prev->next)
+ prev->next = curr;
+
+out:
+ prf_unlock(flags);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_target);
+
+/* Counts the number of times a range of target values is seen. */
+void __llvm_profile_instrument_range(u64 target_value, void *data,
+ u32 index, s64 precise_start,
+ s64 precise_last, s64 large_value);
+void __llvm_profile_instrument_range(u64 target_value, void *data,
+ u32 index, s64 precise_start,
+ s64 precise_last, s64 large_value)
+{
+ if (large_value != S64_MIN && (s64)target_value >= large_value)
+ target_value = large_value;
+ else if ((s64)target_value < precise_start ||
+ (s64)target_value > precise_last)
+ target_value = precise_last + 1;
+
+ __llvm_profile_instrument_target(target_value, data, index);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_range);
+
+static u64 inst_prof_get_range_rep_value(u64 value)
+{
+ if (value <= 8)
+ /* The first ranges are individually tracked, use it as is. */
+ return value;
+ else if (value >= 513)
+ /* The last range is mapped to its lowest value. */
+ return 513;
+ else if (hweight64(value) == 1)
+ /* If it's a power of two, use it as is. */
+ return value;
+
+ /* Otherwise, take the previous power of two + 1. */
+ return ((u64)1 << (64 - __builtin_clzll(value) - 1)) + 1;
+}
+
+/*
+ * The target values are partitioned into multiple ranges. The range spec is
+ * defined in compiler-rt/include/profile/InstrProfData.inc.
+ */
+void __llvm_profile_instrument_memop(u64 target_value, void *data,
+ u32 counter_index);
+void __llvm_profile_instrument_memop(u64 target_value, void *data,
+ u32 counter_index)
+{
+ u64 rep_value;
+
+ /* Map the target value to the representative value of its range. */
+ rep_value = inst_prof_get_range_rep_value(target_value);
+ __llvm_profile_instrument_target(rep_value, data, counter_index);
+}
+EXPORT_SYMBOL(__llvm_profile_instrument_memop);
diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
new file mode 100644
index 000000000000..ddc8d3002fe5
--- /dev/null
+++ b/kernel/pgo/pgo.h
@@ -0,0 +1,203 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Google, Inc.
+ *
+ * Author:
+ * Sami Tolvanen <[email protected]>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _PGO_H
+#define _PGO_H
+
+/*
+ * Note: These internal LLVM definitions must match the compiler version.
+ * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
+ */
+
+#define LLVM_INSTR_PROF_RAW_MAGIC_64 \
+ ((u64)255 << 56 | \
+ (u64)'l' << 48 | \
+ (u64)'p' << 40 | \
+ (u64)'r' << 32 | \
+ (u64)'o' << 24 | \
+ (u64)'f' << 16 | \
+ (u64)'r' << 8 | \
+ (u64)129)
+#define LLVM_INSTR_PROF_RAW_MAGIC_32 \
+ ((u64)255 << 56 | \
+ (u64)'l' << 48 | \
+ (u64)'p' << 40 | \
+ (u64)'r' << 32 | \
+ (u64)'o' << 24 | \
+ (u64)'f' << 16 | \
+ (u64)'R' << 8 | \
+ (u64)129)
+
+#define LLVM_INSTR_PROF_RAW_VERSION 5
+#define LLVM_INSTR_PROF_DATA_ALIGNMENT 8
+#define LLVM_INSTR_PROF_IPVK_FIRST 0
+#define LLVM_INSTR_PROF_IPVK_LAST 1
+#define LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE 255
+
+#define LLVM_VARIANT_MASK_IR_PROF (0x1ULL << 56)
+#define LLVM_VARIANT_MASK_CSIR_PROF (0x1ULL << 57)
+
+/**
+ * struct llvm_prf_header - represents the raw profile header data structure.
+ * @magic: the magic token for the file format.
+ * @version: the version of the file format.
+ * @data_size: the number of entries in the profile data section.
+ * @padding_bytes_before_counters: the number of padding bytes before the
+ * counters.
+ * @counters_size: the number of counters in the LLVM profile section
+ * containing the counters.
+ * @padding_bytes_after_counters: the number of padding bytes after the
+ * counters.
+ * @names_size: the size in bytes of the LLVM profile section containing the
+ * counters' names.
+ * @counters_delta: the beginning of the LLVM profile counters section.
+ * @names_delta: the beginning of the LLVM profile names section.
+ * @value_kind_last: the last profile value kind.
+ */
+struct llvm_prf_header {
+ u64 magic;
+ u64 version;
+ u64 data_size;
+ u64 padding_bytes_before_counters;
+ u64 counters_size;
+ u64 padding_bytes_after_counters;
+ u64 names_size;
+ u64 counters_delta;
+ u64 names_delta;
+ u64 value_kind_last;
+};
+
+/**
+ * struct llvm_prf_data - represents the per-function control structure.
+ * @name_ref: the reference to the function's name.
+ * @func_hash: the hash value of the function.
+ * @counter_ptr: a pointer to the profile counter.
+ * @function_ptr: a pointer to the function.
+ * @values: the profiling values associated with this function.
+ * @num_counters: the number of counters in the function.
+ * @num_value_sites: the number of value profile sites.
+ */
+struct llvm_prf_data {
+ const u64 name_ref;
+ const u64 func_hash;
+ const void *counter_ptr;
+ const void *function_ptr;
+ void *values;
+ const u32 num_counters;
+ const u16 num_value_sites[LLVM_INSTR_PROF_IPVK_LAST + 1];
+} __aligned(LLVM_INSTR_PROF_DATA_ALIGNMENT);
+
+/**
+ * struct llvm_prf_value_node_data - represents the data part of the struct
+ * llvm_prf_value_node data structure.
+ * @value: the value counters.
+ * @count: the counters' count.
+ */
+struct llvm_prf_value_node_data {
+ u64 value;
+ u64 count;
+};
+
+/**
+ * struct llvm_prf_value_node - represents an internal data structure used by
+ * the value profiler.
+ * @value: the value counters.
+ * @count: the counters' count.
+ * @next: the next value node.
+ */
+struct llvm_prf_value_node {
+ u64 value;
+ u64 count;
+ struct llvm_prf_value_node *next;
+};
+
+/**
+ * struct llvm_prf_value_data - represents the value profiling data in indexed
+ * format.
+ * @total_size: the total size in bytes including this field.
+ * @num_value_kinds: the number of value profile kinds that have value profile
+ * data.
+ */
+struct llvm_prf_value_data {
+ u32 total_size;
+ u32 num_value_kinds;
+};
+
+/**
+ * struct llvm_prf_value_record - represents the on-disk layout of the value
+ * profile data of a particular kind for one function.
+ * @kind: the kind of the value profile record.
+ * @num_value_sites: the number of value profile sites.
+ * @site_count_array: the first element of the array that stores the number
+ * of profiled values for each value site.
+ */
+struct llvm_prf_value_record {
+ u32 kind;
+ u32 num_value_sites;
+ u8 site_count_array[];
+};
+
+#define prf_get_value_record_header_size() \
+ offsetof(struct llvm_prf_value_record, site_count_array)
+#define prf_get_value_record_site_count_size(sites) \
+ roundup((sites), 8)
+#define prf_get_value_record_size(sites) \
+ (prf_get_value_record_header_size() + \
+ prf_get_value_record_site_count_size((sites)))
+
+/* Data sections */
+extern struct llvm_prf_data __llvm_prf_data_start[];
+extern struct llvm_prf_data __llvm_prf_data_end[];
+
+extern u64 __llvm_prf_cnts_start[];
+extern u64 __llvm_prf_cnts_end[];
+
+extern char __llvm_prf_names_start[];
+extern char __llvm_prf_names_end[];
+
+extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
+extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
+
+/* Locking for vnodes */
+extern unsigned long prf_lock(void);
+extern void prf_unlock(unsigned long flags);
+
+#define __DEFINE_PRF_SIZE(s) \
+ static inline unsigned long prf_ ## s ## _size(void) \
+ { \
+ unsigned long start = \
+ (unsigned long)__llvm_prf_ ## s ## _start; \
+ unsigned long end = \
+ (unsigned long)__llvm_prf_ ## s ## _end; \
+ return roundup(end - start, \
+ sizeof(__llvm_prf_ ## s ## _start[0])); \
+ } \
+ static inline unsigned long prf_ ## s ## _count(void) \
+ { \
+ return prf_ ## s ## _size() / \
+ sizeof(__llvm_prf_ ## s ## _start[0]); \
+ }
+
+__DEFINE_PRF_SIZE(data);
+__DEFINE_PRF_SIZE(cnts);
+__DEFINE_PRF_SIZE(names);
+__DEFINE_PRF_SIZE(vnds);
+
+#undef __DEFINE_PRF_SIZE
+
+#endif /* _PGO_H */
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 8cd67b1b6d15..d411e92dd0d6 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -139,6 +139,16 @@ _c_flags += $(if $(patsubst n%,, \
$(CFLAGS_GCOV))
endif

+#
+# Enable clang's PGO profiling flags for a file or directory depending on
+# variables PGO_PROFILE_obj.o and PGO_PROFILE.
+#
+ifeq ($(CONFIG_PGO_CLANG),y)
+_c_flags += $(if $(patsubst n%,, \
+ $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
+ $(CFLAGS_PGO_CLANG))
+endif
+
#
# Enable address sanitizer flags for kernel except some files or directories
# we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
--
2.31.0.208.g409f899ff0-goog
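
For anyone who wants to try the memop range mapping from
kernel/pgo/instrument.c outside the kernel, here is a minimal userspace
sketch. It is not part of the patch. It mirrors
inst_prof_get_range_rep_value() from the diff above, with
__builtin_popcountll() standing in for hweight64(). The names
range_rep_value() and check_examples() are made up for illustration, and
the code assumes GCC or Clang for the builtins.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace mirror of inst_prof_get_range_rep_value() from the patch.
 * The function name is hypothetical; __builtin_popcountll() replaces the
 * kernel's hweight64().
 */
static uint64_t range_rep_value(uint64_t value)
{
	if (value <= 8)
		/* The first ranges are tracked individually. */
		return value;
	if (value >= 513)
		/* The last range is mapped to its lowest value. */
		return 513;
	if (__builtin_popcountll(value) == 1)
		/* Powers of two are used as is. */
		return value;

	/* Otherwise, take the previous power of two + 1. */
	return ((uint64_t)1 << (64 - __builtin_clzll(value) - 1)) + 1;
}

/* A few spot checks of the mapping (hypothetical helper). */
static void check_examples(void)
{
	assert(range_rep_value(0) == 0);
	assert(range_rep_value(8) == 8);
	assert(range_rep_value(9) == 9);	/* 8 + 1 */
	assert(range_rep_value(15) == 9);	/* still in the 9..15 range */
	assert(range_rep_value(16) == 16);	/* power of two */
	assert(range_rep_value(17) == 17);	/* 16 + 1 */
	assert(range_rep_value(512) == 512);
	assert(range_rep_value(1000) == 513);	/* everything >= 513 */
}
```

Calling check_examples() from a small main() is enough to confirm that
every size in a range collapses to the same representative value.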

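The size arithmetic that kernel/pgo/fs.c and kernel/pgo/pgo.h use when
laying out the raw profile can be checked the same way. The sketch below
is a userspace illustration only, not part of the patch. padding_to_u64()
mirrors prf_get_padding() and value_record_size() mirrors
prf_get_value_record_size(). Those names, struct value_record, and
check_layout() are hypothetical, and the kernel's roundup() is open-coded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field layout as struct llvm_prf_value_record in pgo.h. */
struct value_record {
	uint32_t kind;
	uint32_t num_value_sites;
	uint8_t site_count_array[];
};

/* Mirror of prf_get_padding(): bytes needed to reach an 8-byte boundary. */
static unsigned long padding_to_u64(unsigned long size)
{
	return 7 & (sizeof(uint64_t) - size % sizeof(uint64_t));
}

/*
 * Mirror of prf_get_value_record_size(): the record header plus the site
 * count array rounded up to a multiple of 8 bytes.
 */
static unsigned long value_record_size(unsigned long sites)
{
	unsigned long counts = (sites + 7) / 8 * 8;

	return offsetof(struct value_record, site_count_array) + counts;
}

/* Spot checks of both helpers (hypothetical helper). */
static void check_layout(void)
{
	assert(padding_to_u64(0) == 0);
	assert(padding_to_u64(1) == 7);
	assert(padding_to_u64(13) == 3);
	assert(padding_to_u64(16) == 0);

	assert(value_record_size(1) == 16);	/* 8-byte header + 8 */
	assert(value_record_size(8) == 16);
	assert(value_record_size(9) == 24);
}
```

The "7 &" mask in the padding helper is what turns an already aligned
size into a padding of 0 instead of 8.
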
2021-04-07 22:07:12

by Kees Cook

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Wed, Apr 07, 2021 at 02:17:04PM -0700, 'Bill Wendling' via Clang Built Linux wrote:
> From: Sami Tolvanen <[email protected]>
>
> Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> profile, the kernel is instrumented with PGO counters, a representative
> workload is run, and the raw profile data is collected from
> /sys/kernel/debug/pgo/profraw.
>
> The raw profile data must be processed by clang's "llvm-profdata" tool
> before it can be used during recompilation:
>
> $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
> Multiple raw profiles may be merged during this step.
>
> The data can now be used by the compiler:
>
> $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>
> This initial submission is restricted to x86, as that's the platform we
> know works. This restriction can be lifted once other platforms have
> been verified to work with PGO.
>
> Note that this method of profiling the kernel is clang-native, unlike
> the clang support in kernel/gcov.
>
> [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>
> Signed-off-by: Sami Tolvanen <[email protected]>
> Co-developed-by: Bill Wendling <[email protected]>
> Signed-off-by: Bill Wendling <[email protected]>
> Tested-by: Nick Desaulniers <[email protected]>
> Reviewed-by: Nick Desaulniers <[email protected]>
> Reviewed-by: Fangrui Song <[email protected]>

Thanks for sending this again! I'm looking forward to using it.

Masahiro and Andrew, unless one of you would prefer to take this in your
tree, I figure I can snag it to send to Linus.

Anyone else have feedback?

Thanks!

-Kees

--
Kees Cook

2021-04-07 22:07:25

by Fangrui Song

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Wed, Apr 7, 2021 at 2:22 PM Kees Cook <[email protected]> wrote:
>
> On Wed, Apr 07, 2021 at 02:17:04PM -0700, 'Bill Wendling' via Clang Built Linux wrote:
> > From: Sami Tolvanen <[email protected]>
> >
> > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > profile, the kernel is instrumented with PGO counters, a representative
> > workload is run, and the raw profile data is collected from
> > /sys/kernel/debug/pgo/profraw.
> >
> > The raw profile data must be processed by clang's "llvm-profdata" tool
> > before it can be used during recompilation:
> >
> > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >
> > Multiple raw profiles may be merged during this step.
> >
> > The data can now be used by the compiler:
> >
> > $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> >
> > This initial submission is restricted to x86, as that's the platform we
> > know works. This restriction can be lifted once other platforms have
> > been verified to work with PGO.
> >
> > Note that this method of profiling the kernel is clang-native, unlike
> > the clang support in kernel/gcov.
> >
> > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> >
> > Signed-off-by: Sami Tolvanen <[email protected]>
> > Co-developed-by: Bill Wendling <[email protected]>
> > Signed-off-by: Bill Wendling <[email protected]>
> > Tested-by: Nick Desaulniers <[email protected]>
> > Reviewed-by: Nick Desaulniers <[email protected]>
> > Reviewed-by: Fangrui Song <[email protected]>
>
> Thanks for sending this again! I'm looking forward to using it.

Yay. Quite excited about that:)

> Masahiro and Andrew, unless one of you would prefer to take this in your
> tree, I figure I can snag it to send to Linus.
>
> Anyone else have feedback?

I have carefully compared the implementation and the original
implementation in llvm-project/compiler-rt.
This looks great.
Also very happy about the cleaner include/asm-generic/vmlinux.lds.h now.

Just adding a note here for folks who may want to help test the
not-yet-common option LD_DEAD_CODE_DATA_ELIMINATION.
--gc-sections may not work perfectly with some advanced PGO features
before Clang 13 (not broken but probably just in an inferior state).
There were some upstream changes in this area recently and I think as
of my https://reviews.llvm.org/D97649 things should be perfect with GC
now.
This does not deserve any comment without more testing, though.

Thanks for already carrying my Reviewed-by tag.

> Thanks!
>
> -Kees
>
> --
> Kees Cook



--
宋方睿

2021-04-07 22:08:00

by Bill Wendling

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Wed, Apr 7, 2021 at 2:47 PM Nathan Chancellor <[email protected]> wrote:
>
> Hi Bill,
>
> On Wed, Apr 07, 2021 at 02:17:04PM -0700, Bill Wendling wrote:
> > From: Sami Tolvanen <[email protected]>
> >
> > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > profile, the kernel is instrumented with PGO counters, a representative
> > workload is run, and the raw profile data is collected from
> > /sys/kernel/debug/pgo/profraw.
> >
> > The raw profile data must be processed by clang's "llvm-profdata" tool
> > before it can be used during recompilation:
> >
> > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >
> > Multiple raw profiles may be merged during this step.
> >
> > The data can now be used by the compiler:
> >
> > $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> >
> > This initial submission is restricted to x86, as that's the platform we
> > know works. This restriction can be lifted once other platforms have
> > been verified to work with PGO.
> >
> > Note that this method of profiling the kernel is clang-native, unlike
> > the clang support in kernel/gcov.
> >
> > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> >
> > Signed-off-by: Sami Tolvanen <[email protected]>
> > Co-developed-by: Bill Wendling <[email protected]>
> > Signed-off-by: Bill Wendling <[email protected]>
> > Tested-by: Nick Desaulniers <[email protected]>
> > Reviewed-by: Nick Desaulniers <[email protected]>
> > Reviewed-by: Fangrui Song <[email protected]>
>
> Few small nits below, not sure they warrant a v10 versus just some
> follow up patches, up to you. Regardless:
>
> Reviewed-by: Nathan Chancellor <[email protected]>
>
> > ---
> > v9: - [maskray] Remove explicit 'ALIGN' and 'KEEP' from PGO variables in
> > vmlinux.lds.h.
> > v8: - Rebased on top-of-tree.
> > v7: - [sedat.dilek] Fix minor build failure.
> > v6: - Add better documentation about the locking scheme and other things.
> > - Rename macros to better match the same macros in LLVM's source code.
> > v5: - [natechancellor] Correct padding calculation.
> > v4: - [ndesaulniers] Remove non-x86 Makefile changes and use "hweight64" instead
> > of using our own popcount implementation.
> > v3: - [sedat.dilek] Added change log section.
> > v2: - [natechancellor] Added "__llvm_profile_instrument_memop".
> > - [maskray] Corrected documentation, re PGO flags when using LTO.
> > ---
> > Documentation/dev-tools/index.rst | 1 +
> > Documentation/dev-tools/pgo.rst | 127 +++++++++
> > MAINTAINERS | 9 +
> > Makefile | 3 +
> > arch/Kconfig | 1 +
> > arch/x86/Kconfig | 1 +
> > arch/x86/boot/Makefile | 1 +
> > arch/x86/boot/compressed/Makefile | 1 +
> > arch/x86/crypto/Makefile | 4 +
> > arch/x86/entry/vdso/Makefile | 1 +
> > arch/x86/kernel/vmlinux.lds.S | 2 +
> > arch/x86/platform/efi/Makefile | 1 +
> > arch/x86/purgatory/Makefile | 1 +
> > arch/x86/realmode/rm/Makefile | 1 +
> > arch/x86/um/vdso/Makefile | 1 +
> > drivers/firmware/efi/libstub/Makefile | 1 +
> > include/asm-generic/vmlinux.lds.h | 34 +++
> > kernel/Makefile | 1 +
> > kernel/pgo/Kconfig | 35 +++
> > kernel/pgo/Makefile | 5 +
> > kernel/pgo/fs.c | 389 ++++++++++++++++++++++++++
> > kernel/pgo/instrument.c | 189 +++++++++++++
> > kernel/pgo/pgo.h | 203 ++++++++++++++
> > scripts/Makefile.lib | 10 +
> > 24 files changed, 1022 insertions(+)
> > create mode 100644 Documentation/dev-tools/pgo.rst
> > create mode 100644 kernel/pgo/Kconfig
> > create mode 100644 kernel/pgo/Makefile
> > create mode 100644 kernel/pgo/fs.c
> > create mode 100644 kernel/pgo/instrument.c
> > create mode 100644 kernel/pgo/pgo.h
> >
> > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> > index 1b1cf4f5c9d9..6a30cd98e6f9 100644
> > --- a/Documentation/dev-tools/index.rst
> > +++ b/Documentation/dev-tools/index.rst
> > @@ -27,6 +27,7 @@ whole; patches welcome!
> > kgdb
> > kselftest
> > kunit/index
> > + pgo
> >
> >
> > .. only:: subproject and html
> > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> > new file mode 100644
> > index 000000000000..b7f11d8405b7
> > --- /dev/null
> > +++ b/Documentation/dev-tools/pgo.rst
> > @@ -0,0 +1,127 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +===============================
> > +Using PGO with the Linux kernel
> > +===============================
> > +
> > +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> > +when building with Clang. The profiling data is exported via the ``pgo``
> > +debugfs directory.
> > +
> > +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> > +
> > +
> > +Preparation
> > +===========
> > +
> > +Configure the kernel with:
> > +
> > +.. code-block:: make
> > +
> > + CONFIG_DEBUG_FS=y
> > + CONFIG_PGO_CLANG=y
> > +
> > +Note that kernels compiled with profiling flags will be significantly larger
> > +and run slower.
> > +
> > +Profiling data will only become accessible once debugfs has been mounted:
> > +
> > +.. code-block:: sh
> > +
> > + mount -t debugfs none /sys/kernel/debug
> > +
> > +
> > +Customization
> > +=============
> > +
> > +You can enable or disable profiling for individual files and directories by
> > +adding a line similar to the following to the respective kernel Makefile:
> > +
> > +- For a single file (e.g. main.o)
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE_main.o := y
> > +
> > +- For all files in one directory
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE := y
> > +
> > +To exclude files from being profiled use
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE_main.o := n
> > +
> > +and
> > +
> > + .. code-block:: make
> > +
> > + PGO_PROFILE := n
> > +
> > +Only files which are linked to the main kernel image or are compiled as kernel
> > +modules are supported by this mechanism.
> > +
> > +
> > +Files
> > +=====
> > +
> > +The PGO kernel support creates the following files in debugfs:
> > +
> > +``/sys/kernel/debug/pgo``
> > + Parent directory for all PGO-related files.
> > +
> > +``/sys/kernel/debug/pgo/reset``
> > + Global reset file: resets all profile data to zero when written to.
> > +
> > +``/sys/kernel/debug/pgo/profraw``
> > + The raw PGO data that must be processed with ``llvm-profdata``.
> > +
> > +
> > +Workflow
> > +========
> > +
> > +The PGO kernel can be run on the host or test machines. The data, though, should
> > +be analyzed with Clang's tools from the same Clang version that compiled the
> > +kernel. Clang is tolerant of version skew, but it's easier to use the same
> > +Clang version.
> > +
> > +The profiling data is useful for optimizing the kernel, analyzing coverage,
> > +etc. Clang offers tools to perform these tasks.
> > +
> > +Here is an example workflow for profiling an instrumented kernel with PGO and
> > +using the result to optimize the kernel:
> > +
> > +1) Install the kernel on the TEST machine.
> > +
> > +2) Reset the data counters right before running the load tests
> > +
> > + .. code-block:: sh
> > +
> > + $ echo 1 > /sys/kernel/debug/pgo/reset
> > +
> > +3) Run the load tests.
> > +
> > +4) Collect the raw profile data
> > +
> > + .. code-block:: sh
> > +
> > + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> > +
> > +5) (Optional) Download the raw profile data to the HOST machine.
> > +
> > +6) Process the raw profile data
> > +
> > + .. code-block:: sh
> > +
> > + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > +
> > + Note that multiple raw profile data files can be merged during this step.
> > +
> > +7) Rebuild the kernel using the profile data (PGO disabled)
> > +
> > + .. code-block:: sh
> > +
> > + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index c80ad735b384..742058188af2 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -14054,6 +14054,15 @@ S: Maintained
> > F: include/linux/personality.h
> > F: include/uapi/linux/personality.h
> >
> > +PGO BASED KERNEL PROFILING
> > +M: Sami Tolvanen <[email protected]>
> > +M: Bill Wendling <[email protected]>
> > +R: Nathan Chancellor <[email protected]>
>
> This should be updated to my @kernel.org address. I can send a follow-up
> patch if need be.
>
Sorry about that!

> > +R: Nick Desaulniers <[email protected]>
> > +S: Supported
> > +F: Documentation/dev-tools/pgo.rst
> > +F: kernel/pgo
> > +
> > PHOENIX RC FLIGHT CONTROLLER ADAPTER
> > M: Marcus Folkesson <[email protected]>
> > L: [email protected]
> > diff --git a/Makefile b/Makefile
> > index cc77fd45ca64..6450faceb137 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -660,6 +660,9 @@ endif # KBUILD_EXTMOD
> > # Defaults to vmlinux, but the arch makefile usually adds further targets
> > all: vmlinux
> >
> > +CFLAGS_PGO_CLANG := -fprofile-generate
> > +export CFLAGS_PGO_CLANG
> > +
> > CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
> > $(call cc-option,-fno-tree-loop-im) \
> > $(call cc-disable-warning,maybe-uninitialized,)
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index ecfd3520b676..afd082133e0a 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -1191,6 +1191,7 @@ config ARCH_HAS_ELFCORE_COMPAT
> > bool
> >
> > source "kernel/gcov/Kconfig"
> > +source "kernel/pgo/Kconfig"
> >
> > source "scripts/gcc-plugins/Kconfig"
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 2792879d398e..62be93b199ff 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -99,6 +99,7 @@ config X86
> > select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
> > select ARCH_SUPPORTS_LTO_CLANG if X86_64
> > select ARCH_SUPPORTS_LTO_CLANG_THIN if X86_64
> > + select ARCH_SUPPORTS_PGO_CLANG if X86_64
> > select ARCH_USE_BUILTIN_BSWAP
> > select ARCH_USE_QUEUED_RWLOCKS
> > select ARCH_USE_QUEUED_SPINLOCKS
> > diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> > index fe605205b4ce..383853e32f67 100644
> > --- a/arch/x86/boot/Makefile
> > +++ b/arch/x86/boot/Makefile
> > @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> > KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE := n
> >
> > $(obj)/bzImage: asflags-y := $(SVGA_MODE)
> > diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> > index e0bc3988c3fa..ed12ab65f606 100644
> > --- a/arch/x86/boot/compressed/Makefile
> > +++ b/arch/x86/boot/compressed/Makefile
> > @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
> >
> > KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE :=n
> >
> > KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
> > diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> > index b28e36b7c96b..4b2e9620c412 100644
> > --- a/arch/x86/crypto/Makefile
> > +++ b/arch/x86/crypto/Makefile
> > @@ -4,6 +4,10 @@
> >
> > OBJECT_FILES_NON_STANDARD := y
> >
> > +# Disable PGO for curve25519-x86_64. With PGO enabled, clang runs out of
> > +# registers for some of the functions.
> > +PGO_PROFILE_curve25519-x86_64.o := n
> > +
> > obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
> > twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o
> > obj-$(CONFIG_CRYPTO_TWOFISH_X86_64) += twofish-x86_64.o
> > diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> > index 05c4abc2fdfd..f7421e44725a 100644
> > --- a/arch/x86/entry/vdso/Makefile
> > +++ b/arch/x86/entry/vdso/Makefile
> > @@ -180,6 +180,7 @@ quiet_cmd_vdso = VDSO $@
> > VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
> > $(call ld-option, --eh-frame-hdr) -Bsymbolic
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > quiet_cmd_vdso_and_check = VDSO $@
> > cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
> > diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> > index efd9e9ea17f2..f6cab2316c46 100644
> > --- a/arch/x86/kernel/vmlinux.lds.S
> > +++ b/arch/x86/kernel/vmlinux.lds.S
> > @@ -184,6 +184,8 @@ SECTIONS
> >
> > BUG_TABLE
> >
> > + PGO_CLANG_DATA
> > +
> > ORC_UNWIND_TABLE
> >
> > . = ALIGN(PAGE_SIZE);
> > diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
> > index 84b09c230cbd..5f22b31446ad 100644
> > --- a/arch/x86/platform/efi/Makefile
> > +++ b/arch/x86/platform/efi/Makefile
> > @@ -2,6 +2,7 @@
> > OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
> > KASAN_SANITIZE := n
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
> > obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
> > diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> > index 95ea17a9d20c..36f20e99da0b 100644
> > --- a/arch/x86/purgatory/Makefile
> > +++ b/arch/x86/purgatory/Makefile
> > @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
> >
> > # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > KASAN_SANITIZE := n
> > UBSAN_SANITIZE := n
> > KCSAN_SANITIZE := n
> > diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> > index 83f1b6a56449..21797192f958 100644
> > --- a/arch/x86/realmode/rm/Makefile
> > +++ b/arch/x86/realmode/rm/Makefile
> > @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> > KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > UBSAN_SANITIZE := n
> > diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
> > index 5943387e3f35..54f5768f5853 100644
> > --- a/arch/x86/um/vdso/Makefile
> > +++ b/arch/x86/um/vdso/Makefile
> > @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@
> >
> > VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> >
> > #
> > # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
> > diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> > index c23466e05e60..724fb389bb9d 100644
> > --- a/drivers/firmware/efi/libstub/Makefile
> > +++ b/drivers/firmware/efi/libstub/Makefile
> > @@ -42,6 +42,7 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
> > KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS))
> >
> > GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > # Sanitizer runtimes are unavailable and cannot be linked here.
> > KASAN_SANITIZE := n
> > KCSAN_SANITIZE := n
> > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > index 0331d5d49551..b371857097e8 100644
> > --- a/include/asm-generic/vmlinux.lds.h
> > +++ b/include/asm-generic/vmlinux.lds.h
> > @@ -329,6 +329,39 @@
> > #define DTPM_TABLE()
> > #endif
> >
> > +#ifdef CONFIG_PGO_CLANG
> > +#define PGO_CLANG_DATA \
> > + __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
> > + __llvm_prf_start = .; \
> > + __llvm_prf_data_start = .; \
> > + *(__llvm_prf_data) \
> > + __llvm_prf_data_end = .; \
> > + } \
> > + __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
> > + __llvm_prf_cnts_start = .; \
> > + *(__llvm_prf_cnts) \
> > + __llvm_prf_cnts_end = .; \
> > + } \
> > + __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
> > + __llvm_prf_names_start = .; \
> > + *(__llvm_prf_names) \
> > + __llvm_prf_names_end = .; \
> > + } \
> > + __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
> > + __llvm_prf_vals_start = .; \
> > + *(__llvm_prf_vals) \
> > + __llvm_prf_vals_end = .; \
> > + } \
> > + __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
> > + __llvm_prf_vnds_start = .; \
> > + *(__llvm_prf_vnds) \
> > + __llvm_prf_vnds_end = .; \
> > + __llvm_prf_end = .; \
> > + }
> > +#else
> > +#define PGO_CLANG_DATA
> > +#endif
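
For context on how these symbols get used: each output section is
bracketed by a _start/_end pair, and the C side (for example the loops
over __llvm_prf_data_start .. __llvm_prf_data_end in fs.c) treats the
pair as the bounds of an array. Below is a userspace sketch of that
idiom, with an ordinary array standing in for the linker-provided
section. struct record, rec_start, rec_end and the helper names are
invented for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type; the real entries are struct llvm_prf_data. */
struct record {
	unsigned long long hash;
	unsigned long long counter;
};

/* An ordinary array stands in for the section the linker script lays out. */
static struct record records[3];
static struct record *const rec_start = records;	/* like ..._start */
static struct record *const rec_end = records + 3;	/* like ..._end, one past the last entry */

/* Number of entries between the two bounds. */
static size_t rec_count(void)
{
	assert(rec_start <= rec_end);
	return (size_t)(rec_end - rec_start);
}

/* Size in bytes of the bracketed region. */
static size_t rec_size(void)
{
	return (size_t)((const char *)rec_end - (const char *)rec_start);
}

/* Visit every entry with the same loop shape the patch uses. */
static unsigned long long sum_counters(void)
{
	unsigned long long sum = 0;
	struct record *p;

	for (p = rec_start; p < rec_end; p++)
		sum += p->counter;

	return sum;
}
```

In the kernel the bounds come from the linker script rather than from an
array definition, so the entry count is only known after the final link.
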
> > +
> > #define KERNEL_DTB() \
> > STRUCT_ALIGN(); \
> > __dtb_start = .; \
> > @@ -1106,6 +1139,7 @@
> > CONSTRUCTORS \
> > } \
> > BUG_TABLE \
> > + PGO_CLANG_DATA
> >
> > #define INIT_TEXT_SECTION(inittext_align) \
> > . = ALIGN(inittext_align); \
> > diff --git a/kernel/Makefile b/kernel/Makefile
> > index 320f1f3941b7..a2a23ef2b12f 100644
> > --- a/kernel/Makefile
> > +++ b/kernel/Makefile
> > @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
> > obj-$(CONFIG_KCSAN) += kcsan/
> > obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
> > obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
> > +obj-$(CONFIG_PGO_CLANG) += pgo/
> >
> > obj-$(CONFIG_PERF_EVENTS) += events/
> >
> > diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
> > new file mode 100644
> > index 000000000000..76a640b6cf6e
> > --- /dev/null
> > +++ b/kernel/pgo/Kconfig
> > @@ -0,0 +1,35 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
> > +
> > +config ARCH_SUPPORTS_PGO_CLANG
> > + bool
> > +
> > +config PGO_CLANG
> > + bool "Enable clang's PGO-based kernel profiling"
> > + depends on DEBUG_FS
> > + depends on ARCH_SUPPORTS_PGO_CLANG
> > + depends on CC_IS_CLANG && CLANG_VERSION >= 120000
> > + help
> > + This option enables clang's PGO (Profile Guided Optimization) based
> > + code profiling to better optimize the kernel.
> > +
> > + If unsure, say N.
> > +
> > + Run a representative workload for your application on a kernel
> > + compiled with this option and download the raw profile file from
> > + /sys/kernel/debug/pgo/profraw. This file needs to be processed with
> > + llvm-profdata. It may be merged with other collected raw profiles.
> > +
> > + Copy the resulting profile file into vmlinux.profdata, and enable
> > + KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
> > + kernel.
> > +
> > + Note that a kernel compiled with profiling flags will be
> > + significantly larger and run slower. Also be sure to exclude files
> > + from profiling which are not linked to the kernel image to prevent
> > + linker errors.
> > +
> > + Note that the debugfs filesystem has to be mounted to access
> > + profiling data.
>
> It might be nice to have CONFIG_PGO_PROFILE_ALL like
> CONFIG_GCOV_PROFILE_ALL so that people do not have to go sprinkle the
> kernel with PGO_PROFILE definitions in the Makefile.
>
It seemed to me that the GCOV_PROFILE_ALL option was there to
differentiate between profiling and coverage. I may be wrong about
that. I didn't add the PGO_PROFILE_ALL because there's only one use
when you enable PGO_CLANG, profiling the entire kernel. It may be
useful to have PGO_PROFILE_ALL once we include coverage. Thoughts?

-bw

> > +endmenu
> > diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
> > new file mode 100644
> > index 000000000000..41e27cefd9a4
> > --- /dev/null
> > +++ b/kernel/pgo/Makefile
> > @@ -0,0 +1,5 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +GCOV_PROFILE := n
> > +PGO_PROFILE := n
> > +
> > +obj-y += fs.o instrument.o
> > diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
> > new file mode 100644
> > index 000000000000..1678df3b7d64
> > --- /dev/null
> > +++ b/kernel/pgo/fs.c
> > @@ -0,0 +1,389 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + * Sami Tolvanen <[email protected]>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#define pr_fmt(fmt) "pgo: " fmt
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/debugfs.h>
> > +#include <linux/fs.h>
> > +#include <linux/module.h>
> > +#include <linux/slab.h>
> > +#include <linux/vmalloc.h>
> > +#include "pgo.h"
> > +
> > +static struct dentry *directory;
> > +
> > +struct prf_private_data {
> > + void *buffer;
> > + unsigned long size;
> > +};
> > +
> > +/*
> > + * Raw profile data format:
> > + *
> > + * - llvm_prf_header
> > + * - __llvm_prf_data
> > + * - __llvm_prf_cnts
> > + * - __llvm_prf_names
> > + * - zero padding to 8 bytes
> > + * - for each llvm_prf_data in __llvm_prf_data:
> > + * - llvm_prf_value_data
> > + * - llvm_prf_value_record + site count array
> > + * - llvm_prf_value_node_data
> > + * ...
> > + * ...
> > + * ...
> > + */
> > +
> > +static void prf_fill_header(void **buffer)
> > +{
> > + struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
> > +
> > +#ifdef CONFIG_64BIT
> > + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_64;
> > +#else
> > + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_32;
> > +#endif
> > + header->version = LLVM_VARIANT_MASK_IR_PROF | LLVM_INSTR_PROF_RAW_VERSION;
> > + header->data_size = prf_data_count();
> > + header->padding_bytes_before_counters = 0;
> > + header->counters_size = prf_cnts_count();
> > + header->padding_bytes_after_counters = 0;
> > + header->names_size = prf_names_count();
> > + header->counters_delta = (u64)__llvm_prf_cnts_start;
> > + header->names_delta = (u64)__llvm_prf_names_start;
> > + header->value_kind_last = LLVM_INSTR_PROF_IPVK_LAST;
> > +
> > + *buffer += sizeof(*header);
> > +}
> > +
> > +/*
> > + * Copy the source into the buffer, incrementing the pointer into buffer in the
> > + * process.
> > + */
> > +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
> > +{
> > + memcpy(*buffer, src, size);
> > + *buffer += size;
> > +}
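
A note for readers: prf_copy_to_buffer() treats *buffer as a write
cursor that moves past each copied region, which is what lets
prf_serialize() lay the sections out back to back. Here is a minimal
userspace sketch of the same pattern. copy_and_advance() and pack_two()
are made-up names, and the sketch uses unsigned char * because
arithmetic on void * is a GNU extension rather than standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for prf_copy_to_buffer(). */
static void copy_and_advance(unsigned char **cursor, const void *src,
			     size_t size)
{
	assert(cursor && *cursor);
	memcpy(*cursor, src, size);
	*cursor += size;	/* the next copy lands right after this one */
}

/* Lay two regions out back to back; returns the number of bytes written. */
static size_t pack_two(unsigned char *out, const void *a, size_t a_len,
		       const void *b, size_t b_len)
{
	unsigned char *cursor = out;

	copy_and_advance(&cursor, a, a_len);
	copy_and_advance(&cursor, b, b_len);

	return (size_t)(cursor - out);
}
```

The caller keeps the original pointer (p->buffer in the patch) and hands
only a copy to the cursor, so the start of the buffer is never lost.
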
> > +
> > +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
> > +{
> > + struct llvm_prf_value_node **nodes =
> > + (struct llvm_prf_value_node **)p->values;
> > + u32 kinds = 0;
> > + u32 size = 0;
> > + unsigned int kind;
> > + unsigned int n;
> > + unsigned int s = 0;
> > +
> > + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> > + unsigned int sites = p->num_value_sites[kind];
> > +
> > + if (!sites)
> > + continue;
> > +
> > + /* Record + site count array */
> > + size += prf_get_value_record_size(sites);
> > + kinds++;
> > +
> > + if (!nodes)
> > + continue;
> > +
> > + for (n = 0; n < sites; n++) {
> > + u32 count = 0;
> > + struct llvm_prf_value_node *site = nodes[s + n];
> > +
> > + while (site && ++count <= U8_MAX)
> > + site = site->next;
> > +
> > + size += count *
> > + sizeof(struct llvm_prf_value_node_data);
> > + }
> > +
> > + s += sites;
> > + }
> > +
> > + if (size)
> > + size += sizeof(struct llvm_prf_value_data);
> > +
> > + if (value_kinds)
> > + *value_kinds = kinds;
> > +
> > + return size;
> > +}
> > +
> > +static u32 prf_get_value_size(void)
> > +{
> > + u32 size = 0;
> > + struct llvm_prf_data *p;
> > +
> > + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> > + size += __prf_get_value_size(p, NULL);
> > +
> > + return size;
> > +}
> > +
> > +/* Serialize the profiling's value. */
> > +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
> > +{
> > + struct llvm_prf_value_data header;
> > + struct llvm_prf_value_node **nodes =
> > + (struct llvm_prf_value_node **)p->values;
> > + unsigned int kind;
> > + unsigned int n;
> > + unsigned int s = 0;
> > +
> > + header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
> > +
> > + if (!header.num_value_kinds)
> > + /* Nothing to write. */
> > + return;
> > +
> > + prf_copy_to_buffer(buffer, &header, sizeof(header));
> > +
> > + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> > + struct llvm_prf_value_record *record;
> > + u8 *counts;
> > + unsigned int sites = p->num_value_sites[kind];
> > +
> > + if (!sites)
> > + continue;
> > +
> > + /* Profiling value record. */
> > + record = *(struct llvm_prf_value_record **)buffer;
> > + *buffer += prf_get_value_record_header_size();
> > +
> > + record->kind = kind;
> > + record->num_value_sites = sites;
> > +
> > + /* Site count array. */
> > + counts = *(u8 **)buffer;
> > + *buffer += prf_get_value_record_site_count_size(sites);
> > +
> > + /*
> > + * If we don't have nodes, we can skip updating the site count
> > + * array, because the buffer is zero filled.
> > + */
> > + if (!nodes)
> > + continue;
> > +
> > + for (n = 0; n < sites; n++) {
> > + u32 count = 0;
> > + struct llvm_prf_value_node *site = nodes[s + n];
> > +
> > + while (site && ++count <= U8_MAX) {
> > + prf_copy_to_buffer(buffer, site,
> > + sizeof(struct llvm_prf_value_node_data));
> > + site = site->next;
> > + }
> > +
> > + counts[n] = (u8)count;
> > + }
> > +
> > + s += sites;
> > + }
> > +}
> > +
> > +static void prf_serialize_values(void **buffer)
> > +{
> > + struct llvm_prf_data *p;
> > +
> > + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> > + prf_serialize_value(p, buffer);
> > +}
> > +
> > +static inline unsigned long prf_get_padding(unsigned long size)
> > +{
> > + return 7 & (sizeof(u64) - size % sizeof(u64));
> > +}
> > +
> > +static unsigned long prf_buffer_size(void)
> > +{
> > + return sizeof(struct llvm_prf_header) +
> > + prf_data_size() +
> > + prf_cnts_size() +
> > + prf_names_size() +
> > + prf_get_padding(prf_names_size()) +
> > + prf_get_value_size();
> > +}
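
To make the padding arithmetic concrete: prf_get_padding() returns the
number of bytes needed to round a size up to the next multiple of 8, and
the "7 &" mask is what makes an already aligned size yield 0 rather
than 8. The sketch below reuses the same expression in userspace with
sizeof(u64) written out as 8. padding_to_8() and total_size() are
illustrative names, and the section sizes passed to them are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Same expression as prf_get_padding(), with sizeof(u64) spelled as 8. */
static size_t padding_to_8(size_t size)
{
	return 7 & (8 - size % 8);
}

/*
 * Mirrors the sum in prf_buffer_size(): only the names section is padded,
 * so that the value data following it starts on an 8-byte boundary
 * relative to the names.
 */
static size_t total_size(size_t header, size_t data, size_t cnts,
			 size_t names, size_t values)
{
	size_t padded_names = names + padding_to_8(names);

	assert(padded_names % 8 == 0);

	return header + data + cnts + padded_names + values;
}
```

For example, a 13-byte names section needs 3 bytes of padding, while a
16-byte one needs none.
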
> > +
> > +/*
> > + * Serialize the profiling data into a format LLVM's tools can understand.
> > + * Note: caller *must* hold pgo_lock.
> > + */
> > +static int prf_serialize(struct prf_private_data *p)
> > +{
> > + int err = 0;
> > + void *buffer;
> > +
> > + p->size = prf_buffer_size();
> > + p->buffer = vzalloc(p->size);
> > +
> > + if (!p->buffer) {
> > + err = -ENOMEM;
> > + goto out;
> > + }
> > +
> > + buffer = p->buffer;
> > +
> > + prf_fill_header(&buffer);
> > + prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
> > + prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
> > + prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
> > + buffer += prf_get_padding(prf_names_size());
> > +
> > + prf_serialize_values(&buffer);
> > +
> > +out:
> > + return err;
> > +}
> > +
> > +/* open() implementation for PGO. Creates a copy of the profiling data set. */
> > +static int prf_open(struct inode *inode, struct file *file)
> > +{
> > + struct prf_private_data *data;
> > + unsigned long flags;
> > + int err;
> > +
> > + data = kzalloc(sizeof(*data), GFP_KERNEL);
> > + if (!data) {
> > + err = -ENOMEM;
> > + goto out;
> > + }
> > +
> > + flags = prf_lock();
> > +
> > + err = prf_serialize(data);
> > + if (unlikely(err)) {
> > + kfree(data);
> > + goto out_unlock;
> > + }
> > +
> > + file->private_data = data;
> > +
> > +out_unlock:
> > + prf_unlock(flags);
> > +out:
> > + return err;
> > +}
> > +
> > +/* read() implementation for PGO. */
> > +static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
> > + loff_t *ppos)
> > +{
> > + struct prf_private_data *data = file->private_data;
> > +
> > + BUG_ON(!data);
> > +
> > + return simple_read_from_buffer(buf, count, ppos, data->buffer,
> > + data->size);
> > +}
> > +
> > +/* release() implementation for PGO. Release resources allocated by open(). */
> > +static int prf_release(struct inode *inode, struct file *file)
> > +{
> > + struct prf_private_data *data = file->private_data;
> > +
> > + if (data) {
> > + vfree(data->buffer);
> > + kfree(data);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static const struct file_operations prf_fops = {
> > + .owner = THIS_MODULE,
> > + .open = prf_open,
> > + .read = prf_read,
> > + .llseek = default_llseek,
> > + .release = prf_release
> > +};
> > +
> > +/* write() implementation for resetting PGO's profile data. */
> > +static ssize_t reset_write(struct file *file, const char __user *addr,
> > + size_t len, loff_t *pos)
> > +{
> > + struct llvm_prf_data *data;
> > +
> > + memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
> > +
> > + for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; data++) {
> > + struct llvm_prf_value_node **vnodes;
> > + u64 current_vsite_count;
> > + u32 i;
> > +
> > + if (!data->values)
> > + continue;
> > +
> > + current_vsite_count = 0;
> > + vnodes = (struct llvm_prf_value_node **)data->values;
> > +
> > + for (i = LLVM_INSTR_PROF_IPVK_FIRST; i <= LLVM_INSTR_PROF_IPVK_LAST; i++)
> > + current_vsite_count += data->num_value_sites[i];
> > +
> > + for (i = 0; i < current_vsite_count; i++) {
> > + struct llvm_prf_value_node *current_vnode = vnodes[i];
> > +
> > + while (current_vnode) {
> > + current_vnode->count = 0;
> > + current_vnode = current_vnode->next;
> > + }
> > + }
> > + }
> > +
> > + return len;
> > +}
> > +
> > +static const struct file_operations prf_reset_fops = {
> > + .owner = THIS_MODULE,
> > + .write = reset_write,
> > + .llseek = noop_llseek,
> > +};
> > +
> > +/* Create debugfs entries. */
> > +static int __init pgo_init(void)
> > +{
> > + directory = debugfs_create_dir("pgo", NULL);
> > + if (!directory)
> > + goto err_remove;
> > +
> > + if (!debugfs_create_file("profraw", 0600, directory, NULL,
> > + &prf_fops))
> > + goto err_remove;
> > +
> > + if (!debugfs_create_file("reset", 0200, directory, NULL,
> > + &prf_reset_fops))
> > + goto err_remove;
> > +
> > + return 0;
> > +
> > +err_remove:
> > + pr_err("initialization failed\n");
> > + return -EIO;
> > +}
> > +
> > +/* Remove debugfs entries. */
> > +static void __exit pgo_exit(void)
> > +{
> > + debugfs_remove_recursive(directory);
> > +}
> > +
> > +module_init(pgo_init);
> > +module_exit(pgo_exit);
> > diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
> > new file mode 100644
> > index 000000000000..464b3bc77431
> > --- /dev/null
> > +++ b/kernel/pgo/instrument.c
> > @@ -0,0 +1,189 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + * Sami Tolvanen <[email protected]>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#define pr_fmt(fmt) "pgo: " fmt
> > +
> > +#include <linux/bitops.h>
> > +#include <linux/kernel.h>
> > +#include <linux/export.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/types.h>
> > +#include "pgo.h"
> > +
> > +/*
> > + * This lock guards both profile count updating and serialization of the
> > + * profiling data. Keeping both of these activities separate via locking
> > + * ensures that we don't try to serialize data that's only partially updated.
> > + */
> > +static DEFINE_SPINLOCK(pgo_lock);
> > +static int current_node;
> > +
> > +unsigned long prf_lock(void)
> > +{
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&pgo_lock, flags);
> > +
> > + return flags;
> > +}
> > +
> > +void prf_unlock(unsigned long flags)
> > +{
> > + spin_unlock_irqrestore(&pgo_lock, flags);
> > +}
> > +
> > +/*
> > + * Return a newly allocated profiling value node, which holds a value
> > + * tracked by the value profiler.
> > + * Note: caller *must* hold pgo_lock.
> > + */
> > +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
> > + u32 index, u64 value)
> > +{
> > + if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
> > + return NULL; /* Out of nodes */
> > +
> > + current_node++;
> > +
> > + /* Make sure the node is entirely within the section */
> > + if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
> > + &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
> > + return NULL;
> > +
> > + return &__llvm_prf_vnds_start[current_node];
> > +}
> > +
> > +/*
> > + * Counts the number of times a target value is seen.
> > + *
> > + * Records the target value for the index if not seen before. Otherwise,
> > + * increments the counter associated w/ the target value.
> > + */
> > +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
> > +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
> > +{
> > + struct llvm_prf_data *p = (struct llvm_prf_data *)data;
> > + struct llvm_prf_value_node **counters;
> > + struct llvm_prf_value_node *curr;
> > + struct llvm_prf_value_node *min = NULL;
> > + struct llvm_prf_value_node *prev = NULL;
> > + u64 min_count = U64_MAX;
> > + u8 values = 0;
> > + unsigned long flags;
> > +
> > + if (!p || !p->values)
> > + return;
> > +
> > + counters = (struct llvm_prf_value_node **)p->values;
> > + curr = counters[index];
> > +
> > + while (curr) {
> > + if (target_value == curr->value) {
> > + curr->count++;
> > + return;
> > + }
> > +
> > + if (curr->count < min_count) {
> > + min_count = curr->count;
> > + min = curr;
> > + }
> > +
> > + prev = curr;
> > + curr = curr->next;
> > + values++;
> > + }
> > +
> > + if (values >= LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE) {
> > + if (!min->count || !(--min->count)) {
> > + curr = min;
> > + curr->value = target_value;
> > + curr->count++;
> > + }
> > + return;
> > + }
> > +
> > + /* Lock when updating the value node structure. */
> > + flags = prf_lock();
> > +
> > + curr = allocate_node(p, index, target_value);
> > + if (!curr)
> > + goto out;
> > +
> > + curr->value = target_value;
> > + curr->count++;
> > +
> > + if (!counters[index])
> > + counters[index] = curr;
> > + else if (prev && !prev->next)
> > + prev->next = curr;
> > +
> > +out:
> > + prf_unlock(flags);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_target);
> > +
> > +/* Counts the number of times a range of target values is seen. */
> > +void __llvm_profile_instrument_range(u64 target_value, void *data,
> > + u32 index, s64 precise_start,
> > + s64 precise_last, s64 large_value);
> > +void __llvm_profile_instrument_range(u64 target_value, void *data,
> > + u32 index, s64 precise_start,
> > + s64 precise_last, s64 large_value)
> > +{
> > + if (large_value != S64_MIN && (s64)target_value >= large_value)
> > + target_value = large_value;
> > + else if ((s64)target_value < precise_start ||
> > + (s64)target_value > precise_last)
> > + target_value = precise_last + 1;
> > +
> > + __llvm_profile_instrument_target(target_value, data, index);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_range);
> > +
> > +static u64 inst_prof_get_range_rep_value(u64 value)
> > +{
> > + if (value <= 8)
> > + /* The first ranges are individually tracked, use it as is. */
> > + return value;
> > + else if (value >= 513)
> > + /* The last range is mapped to its lowest value. */
> > + return 513;
> > + else if (hweight64(value) == 1)
> > + /* If it's a power of two, use it as is. */
> > + return value;
> > +
> > + /* Otherwise, round down to the previous power of two, then add 1. */
> > + return ((u64)1 << (64 - __builtin_clzll(value) - 1)) + 1;
> > +}
> > +
> > +/*
> > + * The target values are partitioned into multiple ranges. The range spec is
> > + * defined in compiler-rt/include/profile/InstrProfData.inc.
> > + */
> > +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> > + u32 counter_index);
> > +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> > + u32 counter_index)
> > +{
> > + u64 rep_value;
> > +
> > + /* Map the target value to the representative value of its range. */
> > + rep_value = inst_prof_get_range_rep_value(target_value);
> > + __llvm_profile_instrument_target(rep_value, data, counter_index);
> > +}
> > +EXPORT_SYMBOL(__llvm_profile_instrument_memop);
> > diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
> > new file mode 100644
> > index 000000000000..ddc8d3002fe5
> > --- /dev/null
> > +++ b/kernel/pgo/pgo.h
> > @@ -0,0 +1,203 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Copyright (C) 2019 Google, Inc.
> > + *
> > + * Author:
> > + * Sami Tolvanen <[email protected]>
> > + *
> > + * This software is licensed under the terms of the GNU General Public
> > + * License version 2, as published by the Free Software Foundation, and
> > + * may be copied, distributed, and modified under those terms.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#ifndef _PGO_H
> > +#define _PGO_H
> > +
> > +/*
> > + * Note: These internal LLVM definitions must match the compiler version.
> > + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
> > + */
> > +
> > +#define LLVM_INSTR_PROF_RAW_MAGIC_64 \
> > + ((u64)255 << 56 | \
> > + (u64)'l' << 48 | \
> > + (u64)'p' << 40 | \
> > + (u64)'r' << 32 | \
> > + (u64)'o' << 24 | \
> > + (u64)'f' << 16 | \
> > + (u64)'r' << 8 | \
> > + (u64)129)
> > +#define LLVM_INSTR_PROF_RAW_MAGIC_32 \
> > + ((u64)255 << 56 | \
> > + (u64)'l' << 48 | \
> > + (u64)'p' << 40 | \
> > + (u64)'r' << 32 | \
> > + (u64)'o' << 24 | \
> > + (u64)'f' << 16 | \
> > + (u64)'R' << 8 | \
> > + (u64)129)
> > +
> > +#define LLVM_INSTR_PROF_RAW_VERSION 5
> > +#define LLVM_INSTR_PROF_DATA_ALIGNMENT 8
> > +#define LLVM_INSTR_PROF_IPVK_FIRST 0
> > +#define LLVM_INSTR_PROF_IPVK_LAST 1
> > +#define LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE 255
> > +
> > +#define LLVM_VARIANT_MASK_IR_PROF (0x1ULL << 56)
> > +#define LLVM_VARIANT_MASK_CSIR_PROF (0x1ULL << 57)
> > +
> > +/**
> > + * struct llvm_prf_header - represents the raw profile header data structure.
> > + * @magic: the magic token for the file format.
> > + * @version: the version of the file format.
> > + * @data_size: the number of entries in the profile data section.
> > + * @padding_bytes_before_counters: the number of padding bytes before the
> > + * counters.
> > + * @counters_size: the size in bytes of the LLVM profile section containing the
> > + * counters.
> > + * @padding_bytes_after_counters: the number of padding bytes after the
> > + * counters.
> > + * @names_size: the size in bytes of the LLVM profile section containing the
> > + * counters' names.
> > + * @counters_delta: the beginning of the LLVM profile counters section.
> > + * @names_delta: the beginning of the LLVM profile names section.
> > + * @value_kind_last: the last profile value kind.
> > + */
> > +struct llvm_prf_header {
> > + u64 magic;
> > + u64 version;
> > + u64 data_size;
> > + u64 padding_bytes_before_counters;
> > + u64 counters_size;
> > + u64 padding_bytes_after_counters;
> > + u64 names_size;
> > + u64 counters_delta;
> > + u64 names_delta;
> > + u64 value_kind_last;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_data - represents the per-function control structure.
> > + * @name_ref: the reference to the function's name.
> > + * @func_hash: the hash value of the function.
> > + * @counter_ptr: a pointer to the profile counter.
> > + * @function_ptr: a pointer to the function.
> > + * @values: the profiling values associated with this function.
> > + * @num_counters: the number of counters in the function.
> > + * @num_value_sites: the number of value profile sites.
> > + */
> > +struct llvm_prf_data {
> > + const u64 name_ref;
> > + const u64 func_hash;
> > + const void *counter_ptr;
> > + const void *function_ptr;
> > + void *values;
> > + const u32 num_counters;
> > + const u16 num_value_sites[LLVM_INSTR_PROF_IPVK_LAST + 1];
> > +} __aligned(LLVM_INSTR_PROF_DATA_ALIGNMENT);
> > +
> > +/**
> > + * struct llvm_prf_value_node_data - represents the data part of the struct
> > + * llvm_prf_value_node data structure.
> > + * @value: the value counters.
> > + * @count: the counters' count.
> > + */
> > +struct llvm_prf_value_node_data {
> > + u64 value;
> > + u64 count;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_node - represents an internal data structure used by
> > + * the value profiler.
> > + * @value: the value counters.
> > + * @count: the counters' count.
> > + * @next: the next value node.
> > + */
> > +struct llvm_prf_value_node {
> > + u64 value;
> > + u64 count;
> > + struct llvm_prf_value_node *next;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_data - represents the value profiling data in indexed
> > + * format.
> > + * @total_size: the total size in bytes including this field.
> > + * @num_value_kinds: the number of value profile kinds that have value profile
> > + * data.
> > + */
> > +struct llvm_prf_value_data {
> > + u32 total_size;
> > + u32 num_value_kinds;
> > +};
> > +
> > +/**
> > + * struct llvm_prf_value_record - represents the on-disk layout of the value
> > + * profile data of a particular kind for one function.
> > + * @kind: the kind of the value profile record.
> > + * @num_value_sites: the number of value profile sites.
> > + * @site_count_array: the first element of the array that stores the number
> > + * of profiled values for each value site.
> > + */
> > +struct llvm_prf_value_record {
> > + u32 kind;
> > + u32 num_value_sites;
> > + u8 site_count_array[];
> > +};
> > +
> > +#define prf_get_value_record_header_size() \
> > + offsetof(struct llvm_prf_value_record, site_count_array)
> > +#define prf_get_value_record_site_count_size(sites) \
> > + roundup((sites), 8)
> > +#define prf_get_value_record_size(sites) \
> > + (prf_get_value_record_header_size() + \
> > + prf_get_value_record_site_count_size((sites)))
> > +
> > +/* Data sections */
> > +extern struct llvm_prf_data __llvm_prf_data_start[];
> > +extern struct llvm_prf_data __llvm_prf_data_end[];
> > +
> > +extern u64 __llvm_prf_cnts_start[];
> > +extern u64 __llvm_prf_cnts_end[];
> > +
> > +extern char __llvm_prf_names_start[];
> > +extern char __llvm_prf_names_end[];
> > +
> > +extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
> > +extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
> > +
> > +/* Locking for vnodes */
> > +extern unsigned long prf_lock(void);
> > +extern void prf_unlock(unsigned long flags);
> > +
> > +#define __DEFINE_PRF_SIZE(s) \
> > + static inline unsigned long prf_ ## s ## _size(void) \
> > + { \
> > + unsigned long start = \
> > + (unsigned long)__llvm_prf_ ## s ## _start; \
> > + unsigned long end = \
> > + (unsigned long)__llvm_prf_ ## s ## _end; \
> > + return roundup(end - start, \
> > + sizeof(__llvm_prf_ ## s ## _start[0])); \
> > + } \
> > + static inline unsigned long prf_ ## s ## _count(void) \
> > + { \
> > + return prf_ ## s ## _size() / \
> > + sizeof(__llvm_prf_ ## s ## _start[0]); \
> > + }
> > +
> > +__DEFINE_PRF_SIZE(data);
> > +__DEFINE_PRF_SIZE(cnts);
> > +__DEFINE_PRF_SIZE(names);
> > +__DEFINE_PRF_SIZE(vnds);
> > +
> > +#undef __DEFINE_PRF_SIZE
> > +
> > +#endif /* _PGO_H */
> > diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> > index 8cd67b1b6d15..d411e92dd0d6 100644
> > --- a/scripts/Makefile.lib
> > +++ b/scripts/Makefile.lib
> > @@ -139,6 +139,16 @@ _c_flags += $(if $(patsubst n%,, \
> > $(CFLAGS_GCOV))
> > endif
> >
> > +#
> > +# Enable clang's PGO profiling flags for a file or directory depending on
> > +# variables PGO_PROFILE_obj.o and PGO_PROFILE.
> > +#
> > +ifeq ($(CONFIG_PGO_CLANG),y)
> > +_c_flags += $(if $(patsubst n%,, \
> > + $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
> > + $(CFLAGS_PGO_CLANG))
> > +endif
> > +
> > #
> > # Enable address sanitizer flags for kernel except some files or directories
> > # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
> > --
> > 2.31.0.208.g409f899ff0-goog
> >
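
For readers following the memop value profiling in the quoted instrument.c, here is a minimal userspace sketch of the range mapping done by inst_prof_get_range_rep_value(). It is an illustration only, not part of the patch: the name range_rep_value is hypothetical, __builtin_popcountll() stands in for the kernel's hweight64(), and both builtins assume GCC or Clang.

```c
#include <stdint.h>

/*
 * Userspace sketch (hypothetical name) of the memop range mapping.
 * Sizes 0..8 are tracked individually, powers of two are kept as is,
 * everything from 513 upward shares one bucket, and any other size is
 * represented by (previous power of two) + 1.
 */
static uint64_t range_rep_value(uint64_t value)
{
	if (value <= 8)
		return value;
	if (value >= 513)
		return 513;
	/* Stand-in for hweight64(): exactly one bit set means a power of two. */
	if (__builtin_popcountll(value) == 1)
		return value;
	/* value > 8 here, so __builtin_clzll() never sees zero. */
	return ((uint64_t)1 << (64 - __builtin_clzll(value) - 1)) + 1;
}
```

Under this mapping, sizes 9 through 15 share the bucket represented by 9, 16 is tracked on its own, 17 through 31 are represented by 17, and anything from 513 upward collapses to 513.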

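The raw-profile magic values in the quoted pgo.h can be checked by hand. The sketch below rebuilds them in userspace C; the helper name prf_raw_magic is hypothetical (the patch defines macros instead), and the byte values assume an ASCII execution character set.

```c
#include <stdint.h>

/*
 * Build the 8-byte raw-profile magic the same way the quoted macros do.
 * variant is 'r' for the 64-bit format and 'R' for the 32-bit format.
 * Hypothetical helper, for illustration only.
 */
static uint64_t prf_raw_magic(char variant)
{
	return (uint64_t)255 << 56 |
	       (uint64_t)'l' << 48 |
	       (uint64_t)'p' << 40 |
	       (uint64_t)'r' << 32 |
	       (uint64_t)'o' << 24 |
	       (uint64_t)'f' << 16 |
	       (uint64_t)(unsigned char)variant << 8 |
	       (uint64_t)129;
}
```

With ASCII, prf_raw_magic('r') is 0xff6c70726f667281, which reads as the bytes 0xff, "lprofr", 0x81 from most to least significant; the 32-bit variant differs only in the 'R' byte.
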
2021-04-07 22:08:00

by Nathan Chancellor

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

Hi Bill,

On Wed, Apr 07, 2021 at 02:17:04PM -0700, Bill Wendling wrote:
> From: Sami Tolvanen <[email protected]>
>
> Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> profile, the kernel is instrumented with PGO counters, a representative
> workload is run, and the raw profile data is collected from
> /sys/kernel/debug/pgo/profraw.
>
> The raw profile data must be processed by clang's "llvm-profdata" tool
> before it can be used during recompilation:
>
> $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
> Multiple raw profiles may be merged during this step.
>
> The data can now be used by the compiler:
>
> $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>
> This initial submission is restricted to x86, as that's the platform we
> know works. This restriction can be lifted once other platforms have
> been verified to work with PGO.
>
> Note that this method of profiling the kernel is clang-native, unlike
> the clang support in kernel/gcov.
>
> [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>
> Signed-off-by: Sami Tolvanen <[email protected]>
> Co-developed-by: Bill Wendling <[email protected]>
> Signed-off-by: Bill Wendling <[email protected]>
> Tested-by: Nick Desaulniers <[email protected]>
> Reviewed-by: Nick Desaulniers <[email protected]>
> Reviewed-by: Fangrui Song <[email protected]>

A few small nits below. I'm not sure they warrant a v10 versus just some
follow-up patches, so that's up to you. Regardless:

Reviewed-by: Nathan Chancellor <[email protected]>

> ---
> v9: - [maskray] Remove explicit 'ALIGN' and 'KEEP' from PGO variables in
> vmlinux.lds.h.
> v8: - Rebased on top-of-tree.
> v7: - [sedat.dilek] Fix minor build failure.
> v6: - Add better documentation about the locking scheme and other things.
> - Rename macros to better match the same macros in LLVM's source code.
> v5: - [natechancellor] Correct padding calculation.
> v4: - [ndesaulniers] Remove non-x86 Makefile changes and use "hweight64" instead
> of using our own popcount implementation.
> v3: - [sedat.dilek] Added change log section.
> v2: - [natechancellor] Added "__llvm_profile_instrument_memop".
> - [maskray] Corrected documentation, re PGO flags when using LTO.
> ---
> Documentation/dev-tools/index.rst | 1 +
> Documentation/dev-tools/pgo.rst | 127 +++++++++
> MAINTAINERS | 9 +
> Makefile | 3 +
> arch/Kconfig | 1 +
> arch/x86/Kconfig | 1 +
> arch/x86/boot/Makefile | 1 +
> arch/x86/boot/compressed/Makefile | 1 +
> arch/x86/crypto/Makefile | 4 +
> arch/x86/entry/vdso/Makefile | 1 +
> arch/x86/kernel/vmlinux.lds.S | 2 +
> arch/x86/platform/efi/Makefile | 1 +
> arch/x86/purgatory/Makefile | 1 +
> arch/x86/realmode/rm/Makefile | 1 +
> arch/x86/um/vdso/Makefile | 1 +
> drivers/firmware/efi/libstub/Makefile | 1 +
> include/asm-generic/vmlinux.lds.h | 34 +++
> kernel/Makefile | 1 +
> kernel/pgo/Kconfig | 35 +++
> kernel/pgo/Makefile | 5 +
> kernel/pgo/fs.c | 389 ++++++++++++++++++++++++++
> kernel/pgo/instrument.c | 189 +++++++++++++
> kernel/pgo/pgo.h | 203 ++++++++++++++
> scripts/Makefile.lib | 10 +
> 24 files changed, 1022 insertions(+)
> create mode 100644 Documentation/dev-tools/pgo.rst
> create mode 100644 kernel/pgo/Kconfig
> create mode 100644 kernel/pgo/Makefile
> create mode 100644 kernel/pgo/fs.c
> create mode 100644 kernel/pgo/instrument.c
> create mode 100644 kernel/pgo/pgo.h
>
> diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> index 1b1cf4f5c9d9..6a30cd98e6f9 100644
> --- a/Documentation/dev-tools/index.rst
> +++ b/Documentation/dev-tools/index.rst
> @@ -27,6 +27,7 @@ whole; patches welcome!
> kgdb
> kselftest
> kunit/index
> + pgo
>
>
> .. only:: subproject and html
> diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> new file mode 100644
> index 000000000000..b7f11d8405b7
> --- /dev/null
> +++ b/Documentation/dev-tools/pgo.rst
> @@ -0,0 +1,127 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===============================
> +Using PGO with the Linux kernel
> +===============================
> +
> +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> +when building with Clang. The profiling data is exported via the ``pgo``
> +debugfs directory.
> +
> +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> +
> +
> +Preparation
> +===========
> +
> +Configure the kernel with:
> +
> +.. code-block:: make
> +
> + CONFIG_DEBUG_FS=y
> + CONFIG_PGO_CLANG=y
> +
> +Note that kernels compiled with profiling flags will be significantly larger
> +and run slower.
> +
> +Profiling data will only become accessible once debugfs has been mounted:
> +
> +.. code-block:: sh
> +
> + mount -t debugfs none /sys/kernel/debug
> +
> +
> +Customization
> +=============
> +
> +You can enable or disable profiling for individual files and directories by
> +adding a line similar to the following to the respective kernel Makefile:
> +
> +- For a single file (e.g. main.o)
> +
> + .. code-block:: make
> +
> + PGO_PROFILE_main.o := y
> +
> +- For all files in one directory
> +
> + .. code-block:: make
> +
> + PGO_PROFILE := y
> +
> +To exclude files from being profiled use
> +
> + .. code-block:: make
> +
> + PGO_PROFILE_main.o := n
> +
> +and
> +
> + .. code-block:: make
> +
> + PGO_PROFILE := n
> +
> +Only files which are linked to the main kernel image or are compiled as kernel
> +modules are supported by this mechanism.
> +
> +
> +Files
> +=====
> +
> +The PGO kernel support creates the following files in debugfs:
> +
> +``/sys/kernel/debug/pgo``
> + Parent directory for all PGO-related files.
> +
> +``/sys/kernel/debug/pgo/reset``
> + Global reset file: resets all profile data to zero when written to.
> +
> +``/sys/kernel/debug/pgo/profraw``
> + The raw PGO data that must be processed with ``llvm-profdata``.
> +
> +
> +Workflow
> +========
> +
> +The PGO kernel can be run on the host or test machines. The data though should
> +be analyzed with Clang's tools from the same Clang version the kernel was
> +compiled with. Clang is tolerant of version skew, but it's easier to use the
> +same Clang version.
> +
> +The profiling data is useful for optimizing the kernel, analyzing coverage,
> +etc. Clang offers tools to perform these tasks.
> +
> +Here is an example workflow for profiling an instrumented kernel with PGO and
> +using the result to optimize the kernel:
> +
> +1) Install the kernel on the TEST machine.
> +
> +2) Reset the data counters right before running the load tests
> +
> + .. code-block:: sh
> +
> + $ echo 1 > /sys/kernel/debug/pgo/reset
> +
> +3) Run the load tests.
> +
> +4) Collect the raw profile data
> +
> + .. code-block:: sh
> +
> + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> +
> +5) (Optional) Download the raw profile data to the HOST machine.
> +
> +6) Process the raw profile data
> +
> + .. code-block:: sh
> +
> + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> +
> + Note that multiple raw profile data files can be merged during this step.
> +
> +7) Rebuild the kernel using the profile data (PGO disabled)
> +
> + .. code-block:: sh
> +
> + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c80ad735b384..742058188af2 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14054,6 +14054,15 @@ S: Maintained
> F: include/linux/personality.h
> F: include/uapi/linux/personality.h
>
> +PGO BASED KERNEL PROFILING
> +M: Sami Tolvanen <[email protected]>
> +M: Bill Wendling <[email protected]>
> +R: Nathan Chancellor <[email protected]>

This should be updated to my @kernel.org address. I can send a follow-up
patch if need be.

> +R: Nick Desaulniers <[email protected]>
> +S: Supported
> +F: Documentation/dev-tools/pgo.rst
> +F: kernel/pgo
> +
> PHOENIX RC FLIGHT CONTROLLER ADAPTER
> M: Marcus Folkesson <[email protected]>
> L: [email protected]
> diff --git a/Makefile b/Makefile
> index cc77fd45ca64..6450faceb137 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -660,6 +660,9 @@ endif # KBUILD_EXTMOD
> # Defaults to vmlinux, but the arch makefile usually adds further targets
> all: vmlinux
>
> +CFLAGS_PGO_CLANG := -fprofile-generate
> +export CFLAGS_PGO_CLANG
> +
> CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
> $(call cc-option,-fno-tree-loop-im) \
> $(call cc-disable-warning,maybe-uninitialized,)
> diff --git a/arch/Kconfig b/arch/Kconfig
> index ecfd3520b676..afd082133e0a 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1191,6 +1191,7 @@ config ARCH_HAS_ELFCORE_COMPAT
> bool
>
> source "kernel/gcov/Kconfig"
> +source "kernel/pgo/Kconfig"
>
> source "scripts/gcc-plugins/Kconfig"
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 2792879d398e..62be93b199ff 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -99,6 +99,7 @@ config X86
> select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
> select ARCH_SUPPORTS_LTO_CLANG if X86_64
> select ARCH_SUPPORTS_LTO_CLANG_THIN if X86_64
> + select ARCH_SUPPORTS_PGO_CLANG if X86_64
> select ARCH_USE_BUILTIN_BSWAP
> select ARCH_USE_QUEUED_RWLOCKS
> select ARCH_USE_QUEUED_SPINLOCKS
> diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> index fe605205b4ce..383853e32f67 100644
> --- a/arch/x86/boot/Makefile
> +++ b/arch/x86/boot/Makefile
> @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
>
> $(obj)/bzImage: asflags-y := $(SVGA_MODE)
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index e0bc3988c3fa..ed12ab65f606 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
>
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE :=n
>
> KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
> diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> index b28e36b7c96b..4b2e9620c412 100644
> --- a/arch/x86/crypto/Makefile
> +++ b/arch/x86/crypto/Makefile
> @@ -4,6 +4,10 @@
>
> OBJECT_FILES_NON_STANDARD := y
>
> +# Disable PGO for curve25519-x86_64. With PGO enabled, clang runs out of
> +# registers for some of the functions.
> +PGO_PROFILE_curve25519-x86_64.o := n
> +
> obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
> twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o
> obj-$(CONFIG_CRYPTO_TWOFISH_X86_64) += twofish-x86_64.o
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index 05c4abc2fdfd..f7421e44725a 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -180,6 +180,7 @@ quiet_cmd_vdso = VDSO $@
> VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
> $(call ld-option, --eh-frame-hdr) -Bsymbolic
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> quiet_cmd_vdso_and_check = VDSO $@
> cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index efd9e9ea17f2..f6cab2316c46 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -184,6 +184,8 @@ SECTIONS
>
> BUG_TABLE
>
> + PGO_CLANG_DATA
> +
> ORC_UNWIND_TABLE
>
> . = ALIGN(PAGE_SIZE);
> diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
> index 84b09c230cbd..5f22b31446ad 100644
> --- a/arch/x86/platform/efi/Makefile
> +++ b/arch/x86/platform/efi/Makefile
> @@ -2,6 +2,7 @@
> OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
> KASAN_SANITIZE := n
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
> obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
> diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> index 95ea17a9d20c..36f20e99da0b 100644
> --- a/arch/x86/purgatory/Makefile
> +++ b/arch/x86/purgatory/Makefile
> @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
>
> # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> KASAN_SANITIZE := n
> UBSAN_SANITIZE := n
> KCSAN_SANITIZE := n
> diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> index 83f1b6a56449..21797192f958 100644
> --- a/arch/x86/realmode/rm/Makefile
> +++ b/arch/x86/realmode/rm/Makefile
> @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
> diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
> index 5943387e3f35..54f5768f5853 100644
> --- a/arch/x86/um/vdso/Makefile
> +++ b/arch/x86/um/vdso/Makefile
> @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@
>
> VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> #
> # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index c23466e05e60..724fb389bb9d 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -42,6 +42,7 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
> KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS))
>
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> # Sanitizer runtimes are unavailable and cannot be linked here.
> KASAN_SANITIZE := n
> KCSAN_SANITIZE := n
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index 0331d5d49551..b371857097e8 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -329,6 +329,39 @@
> #define DTPM_TABLE()
> #endif
>
> +#ifdef CONFIG_PGO_CLANG
> +#define PGO_CLANG_DATA \
> + __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
> + __llvm_prf_start = .; \
> + __llvm_prf_data_start = .; \
> + *(__llvm_prf_data) \
> + __llvm_prf_data_end = .; \
> + } \
> + __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
> + __llvm_prf_cnts_start = .; \
> + *(__llvm_prf_cnts) \
> + __llvm_prf_cnts_end = .; \
> + } \
> + __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
> + __llvm_prf_names_start = .; \
> + *(__llvm_prf_names) \
> + __llvm_prf_names_end = .; \
> + } \
> + __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
> + __llvm_prf_vals_start = .; \
> + *(__llvm_prf_vals) \
> + __llvm_prf_vals_end = .; \
> + } \
> + __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
> + __llvm_prf_vnds_start = .; \
> + *(__llvm_prf_vnds) \
> + __llvm_prf_vnds_end = .; \
> + __llvm_prf_end = .; \
> + }
> +#else
> +#define PGO_CLANG_DATA
> +#endif
> +
> #define KERNEL_DTB() \
> STRUCT_ALIGN(); \
> __dtb_start = .; \
> @@ -1106,6 +1139,7 @@
> CONSTRUCTORS \
> } \
> BUG_TABLE \
> + PGO_CLANG_DATA
>
> #define INIT_TEXT_SECTION(inittext_align) \
> . = ALIGN(inittext_align); \
> diff --git a/kernel/Makefile b/kernel/Makefile
> index 320f1f3941b7..a2a23ef2b12f 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
> obj-$(CONFIG_KCSAN) += kcsan/
> obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
> obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
> +obj-$(CONFIG_PGO_CLANG) += pgo/
>
> obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
> new file mode 100644
> index 000000000000..76a640b6cf6e
> --- /dev/null
> +++ b/kernel/pgo/Kconfig
> @@ -0,0 +1,35 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
> +
> +config ARCH_SUPPORTS_PGO_CLANG
> + bool
> +
> +config PGO_CLANG
> + bool "Enable clang's PGO-based kernel profiling"
> + depends on DEBUG_FS
> + depends on ARCH_SUPPORTS_PGO_CLANG
> + depends on CC_IS_CLANG && CLANG_VERSION >= 120000
> + help
> + This option enables clang's PGO (Profile Guided Optimization) based
> + code profiling to better optimize the kernel.
> +
> + If unsure, say N.
> +
> + Run a representative workload for your application on a kernel
> + compiled with this option and download the raw profile file from
> + /sys/kernel/debug/pgo/profraw. This file needs to be processed with
> + llvm-profdata. It may be merged with other collected raw profiles.
> +
> + Copy the resulting profile file into vmlinux.profdata, and enable
> + KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
> + kernel.
> +
> + Note that a kernel compiled with profiling flags will be
> + significantly larger and run slower. Also be sure to exclude files
> + from profiling which are not linked to the kernel image to prevent
> + linker errors.
> +
> + Note that the debugfs filesystem has to be mounted to access
> + profiling data.

It might be nice to have CONFIG_PGO_PROFILE_ALL like
CONFIG_GCOV_PROFILE_ALL so that people do not have to go sprinkle the
kernel with PGO_PROFILE definitions in the Makefile.

> +endmenu
> diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
> new file mode 100644
> index 000000000000..41e27cefd9a4
> --- /dev/null
> +++ b/kernel/pgo/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0
> +GCOV_PROFILE := n
> +PGO_PROFILE := n
> +
> +obj-y += fs.o instrument.o
> diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
> new file mode 100644
> index 000000000000..1678df3b7d64
> --- /dev/null
> +++ b/kernel/pgo/fs.c
> @@ -0,0 +1,389 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt) "pgo: " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/debugfs.h>
> +#include <linux/fs.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/vmalloc.h>
> +#include "pgo.h"
> +
> +static struct dentry *directory;
> +
> +struct prf_private_data {
> + void *buffer;
> + unsigned long size;
> +};
> +
> +/*
> + * Raw profile data format:
> + *
> + * - llvm_prf_header
> + * - __llvm_prf_data
> + * - __llvm_prf_cnts
> + * - __llvm_prf_names
> + * - zero padding to 8 bytes
> + * - for each llvm_prf_data in __llvm_prf_data:
> + * - llvm_prf_value_data
> + * - llvm_prf_value_record + site count array
> + * - llvm_prf_value_node_data
> + * ...
> + * ...
> + * ...
> + */
> +
> +static void prf_fill_header(void **buffer)
> +{
> + struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
> +
> +#ifdef CONFIG_64BIT
> + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_64;
> +#else
> + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_32;
> +#endif
> + header->version = LLVM_VARIANT_MASK_IR_PROF | LLVM_INSTR_PROF_RAW_VERSION;
> + header->data_size = prf_data_count();
> + header->padding_bytes_before_counters = 0;
> + header->counters_size = prf_cnts_count();
> + header->padding_bytes_after_counters = 0;
> + header->names_size = prf_names_count();
> + header->counters_delta = (u64)__llvm_prf_cnts_start;
> + header->names_delta = (u64)__llvm_prf_names_start;
> + header->value_kind_last = LLVM_INSTR_PROF_IPVK_LAST;
> +
> + *buffer += sizeof(*header);
> +}
> +
> +/*
> + * Copy the source into the buffer, incrementing the pointer into buffer in the
> + * process.
> + */
> +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
> +{
> + memcpy(*buffer, src, size);
> + *buffer += size;
> +}
> +
> +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
> +{
> + struct llvm_prf_value_node **nodes =
> + (struct llvm_prf_value_node **)p->values;
> + u32 kinds = 0;
> + u32 size = 0;
> + unsigned int kind;
> + unsigned int n;
> + unsigned int s = 0;
> +
> + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> + unsigned int sites = p->num_value_sites[kind];
> +
> + if (!sites)
> + continue;
> +
> + /* Record + site count array */
> + size += prf_get_value_record_size(sites);
> + kinds++;
> +
> + if (!nodes)
> + continue;
> +
> + for (n = 0; n < sites; n++) {
> + u32 count = 0;
> + struct llvm_prf_value_node *site = nodes[s + n];
> +
> + while (site && ++count <= U8_MAX)
> + site = site->next;
> +
> + size += count *
> + sizeof(struct llvm_prf_value_node_data);
> + }
> +
> + s += sites;
> + }
> +
> + if (size)
> + size += sizeof(struct llvm_prf_value_data);
> +
> + if (value_kinds)
> + *value_kinds = kinds;
> +
> + return size;
> +}
> +
> +static u32 prf_get_value_size(void)
> +{
> + u32 size = 0;
> + struct llvm_prf_data *p;
> +
> + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> + size += __prf_get_value_size(p, NULL);
> +
> + return size;
> +}
> +
> +/* Serialize the value profiling data of one llvm_prf_data entry. */
> +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
> +{
> + struct llvm_prf_value_data header;
> + struct llvm_prf_value_node **nodes =
> + (struct llvm_prf_value_node **)p->values;
> + unsigned int kind;
> + unsigned int n;
> + unsigned int s = 0;
> +
> + header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
> +
> + if (!header.num_value_kinds)
> + /* Nothing to write. */
> + return;
> +
> + prf_copy_to_buffer(buffer, &header, sizeof(header));
> +
> + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> + struct llvm_prf_value_record *record;
> + u8 *counts;
> + unsigned int sites = p->num_value_sites[kind];
> +
> + if (!sites)
> + continue;
> +
> + /* Profiling value record. */
> + record = *(struct llvm_prf_value_record **)buffer;
> + *buffer += prf_get_value_record_header_size();
> +
> + record->kind = kind;
> + record->num_value_sites = sites;
> +
> + /* Site count array. */
> + counts = *(u8 **)buffer;
> + *buffer += prf_get_value_record_site_count_size(sites);
> +
> + /*
> + * If we don't have nodes, we can skip updating the site count
> + * array, because the buffer is zero filled.
> + */
> + if (!nodes)
> + continue;
> +
> + for (n = 0; n < sites; n++) {
> + u32 count = 0;
> + struct llvm_prf_value_node *site = nodes[s + n];
> +
> + while (site && ++count <= U8_MAX) {
> + prf_copy_to_buffer(buffer, site,
> + sizeof(struct llvm_prf_value_node_data));
> + site = site->next;
> + }
> +
> + counts[n] = (u8)count;
> + }
> +
> + s += sites;
> + }
> +}
> +
> +static void prf_serialize_values(void **buffer)
> +{
> + struct llvm_prf_data *p;
> +
> + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> + prf_serialize_value(p, buffer);
> +}
> +
> +static inline unsigned long prf_get_padding(unsigned long size)
> +{
> + return 7 & (sizeof(u64) - size % sizeof(u64));
> +}
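
The padding expression above can be checked outside the kernel. Below is a minimal user-space sketch, assuming `uint64_t` in place of the kernel's `u64`; the names `padding_for` and `padded_size` are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Sanity check for this sketch: the layout pads to 8-byte boundaries. */
static_assert(sizeof(uint64_t) == 8, "u64 must be 8 bytes");

/* Same expression as prf_get_padding(), written with user-space types. */
static unsigned long padding_for(unsigned long size)
{
	return 7 & (sizeof(uint64_t) - size % sizeof(uint64_t));
}

/* Hypothetical helper: size of a section once its padding is appended. */
static unsigned long padded_size(unsigned long size)
{
	return size + padding_for(size);
}
```

The `7 &` mask is what makes an already aligned size yield zero padding instead of a full 8 bytes.
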
> +
> +static unsigned long prf_buffer_size(void)
> +{
> + return sizeof(struct llvm_prf_header) +
> + prf_data_size() +
> + prf_cnts_size() +
> + prf_names_size() +
> + prf_get_padding(prf_names_size()) +
> + prf_get_value_size();
> +}
> +
> +/*
> + * Serialize the profiling data into a format LLVM's tools can understand.
> + * Note: caller *must* hold pgo_lock.
> + */
> +static int prf_serialize(struct prf_private_data *p)
> +{
> + int err = 0;
> + void *buffer;
> +
> + p->size = prf_buffer_size();
> + p->buffer = vzalloc(p->size);
> +
> + if (!p->buffer) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + buffer = p->buffer;
> +
> + prf_fill_header(&buffer);
> + prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
> + prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
> + prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
> + buffer += prf_get_padding(prf_names_size());
> +
> + prf_serialize_values(&buffer);
> +
> +out:
> + return err;
> +}
> +
> +/* open() implementation for PGO. Creates a copy of the profiling data set. */
> +static int prf_open(struct inode *inode, struct file *file)
> +{
> + struct prf_private_data *data;
> + unsigned long flags;
> + int err;
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL);
> + if (!data) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + flags = prf_lock();
> +
> + err = prf_serialize(data);
> + if (unlikely(err)) {
> + kfree(data);
> + goto out_unlock;
> + }
> +
> + file->private_data = data;
> +
> +out_unlock:
> + prf_unlock(flags);
> +out:
> + return err;
> +}
> +
> +/* read() implementation for PGO. */
> +static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
> + loff_t *ppos)
> +{
> + struct prf_private_data *data = file->private_data;
> +
> + BUG_ON(!data);
> +
> + return simple_read_from_buffer(buf, count, ppos, data->buffer,
> + data->size);
> +}
> +
> +/* release() implementation for PGO. Release resources allocated by open(). */
> +static int prf_release(struct inode *inode, struct file *file)
> +{
> + struct prf_private_data *data = file->private_data;
> +
> + if (data) {
> + vfree(data->buffer);
> + kfree(data);
> + }
> +
> + return 0;
> +}
> +
> +static const struct file_operations prf_fops = {
> + .owner = THIS_MODULE,
> + .open = prf_open,
> + .read = prf_read,
> + .llseek = default_llseek,
> + .release = prf_release
> +};
> +
> +/* write() implementation for resetting PGO's profile data. */
> +static ssize_t reset_write(struct file *file, const char __user *addr,
> + size_t len, loff_t *pos)
> +{
> + struct llvm_prf_data *data;
> +
> + memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
> +
> + for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; data++) {
> + struct llvm_prf_value_node **vnodes;
> + u64 current_vsite_count;
> + u32 i;
> +
> + if (!data->values)
> + continue;
> +
> + current_vsite_count = 0;
> + vnodes = (struct llvm_prf_value_node **)data->values;
> +
> + for (i = LLVM_INSTR_PROF_IPVK_FIRST; i <= LLVM_INSTR_PROF_IPVK_LAST; i++)
> + current_vsite_count += data->num_value_sites[i];
> +
> + for (i = 0; i < current_vsite_count; i++) {
> + struct llvm_prf_value_node *current_vnode = vnodes[i];
> +
> + while (current_vnode) {
> + current_vnode->count = 0;
> + current_vnode = current_vnode->next;
> + }
> + }
> + }
> +
> + return len;
> +}
> +
> +static const struct file_operations prf_reset_fops = {
> + .owner = THIS_MODULE,
> + .write = reset_write,
> + .llseek = noop_llseek,
> +};
> +
> +/* Create debugfs entries. */
> +static int __init pgo_init(void)
> +{
> + directory = debugfs_create_dir("pgo", NULL);
> + if (!directory)
> + goto err_remove;
> +
> + if (!debugfs_create_file("profraw", 0600, directory, NULL,
> + &prf_fops))
> + goto err_remove;
> +
> + if (!debugfs_create_file("reset", 0200, directory, NULL,
> + &prf_reset_fops))
> + goto err_remove;
> +
> + return 0;
> +
> +err_remove:
> + pr_err("initialization failed\n");
> + return -EIO;
> +}
> +
> +/* Remove debugfs entries. */
> +static void __exit pgo_exit(void)
> +{
> + debugfs_remove_recursive(directory);
> +}
> +
> +module_init(pgo_init);
> +module_exit(pgo_exit);
> diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
> new file mode 100644
> index 000000000000..464b3bc77431
> --- /dev/null
> +++ b/kernel/pgo/instrument.c
> @@ -0,0 +1,189 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt) "pgo: " fmt
> +
> +#include <linux/bitops.h>
> +#include <linux/kernel.h>
> +#include <linux/export.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +#include "pgo.h"
> +
> +/*
> + * This lock guards both profile count updating and serialization of the
> + * profiling data. Keeping both of these activities separate via locking
> + * ensures that we don't try to serialize data that's only partially updated.
> + */
> +static DEFINE_SPINLOCK(pgo_lock);
> +static int current_node;
> +
> +unsigned long prf_lock(void)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&pgo_lock, flags);
> +
> + return flags;
> +}
> +
> +void prf_unlock(unsigned long flags)
> +{
> + spin_unlock_irqrestore(&pgo_lock, flags);
> +}
> +
> +/*
> + * Return a newly allocated profiling value node, which holds a value tracked
> + * by the value profiler.
> + * Note: caller *must* hold pgo_lock.
> + */
> +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
> + u32 index, u64 value)
> +{
> + if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
> + return NULL; /* Out of nodes */
> +
> + current_node++;
> +
> + /* Make sure the node is entirely within the section */
> + if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
> + &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
> + return NULL;
> +
> + return &__llvm_prf_vnds_start[current_node];
> +}
> +
> +/*
> + * Counts the number of times a target value is seen.
> + *
> + * Records the target value for the index if not seen before. Otherwise,
> + * increments the counter associated w/ the target value.
> + */
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
> +{
> + struct llvm_prf_data *p = (struct llvm_prf_data *)data;
> + struct llvm_prf_value_node **counters;
> + struct llvm_prf_value_node *curr;
> + struct llvm_prf_value_node *min = NULL;
> + struct llvm_prf_value_node *prev = NULL;
> + u64 min_count = U64_MAX;
> + u8 values = 0;
> + unsigned long flags;
> +
> + if (!p || !p->values)
> + return;
> +
> + counters = (struct llvm_prf_value_node **)p->values;
> + curr = counters[index];
> +
> + while (curr) {
> + if (target_value == curr->value) {
> + curr->count++;
> + return;
> + }
> +
> + if (curr->count < min_count) {
> + min_count = curr->count;
> + min = curr;
> + }
> +
> + prev = curr;
> + curr = curr->next;
> + values++;
> + }
> +
> + if (values >= LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE) {
> + if (!min->count || !(--min->count)) {
> + curr = min;
> + curr->value = target_value;
> + curr->count++;
> + }
> + return;
> + }
> +
> + /* Lock when updating the value node structure. */
> + flags = prf_lock();
> +
> + curr = allocate_node(p, index, target_value);
> + if (!curr)
> + goto out;
> +
> + curr->value = target_value;
> + curr->count++;
> +
> + if (!counters[index])
> + counters[index] = curr;
> + else if (prev && !prev->next)
> + prev->next = curr;
> +
> +out:
> + prf_unlock(flags);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_target);
> +
> +/* Counts the number of times a range of target values is seen. */
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> + u32 index, s64 precise_start,
> + s64 precise_last, s64 large_value);
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> + u32 index, s64 precise_start,
> + s64 precise_last, s64 large_value)
> +{
> + if (large_value != S64_MIN && (s64)target_value >= large_value)
> + target_value = large_value;
> + else if ((s64)target_value < precise_start ||
> + (s64)target_value > precise_last)
> + target_value = precise_last + 1;
> +
> + __llvm_profile_instrument_target(target_value, data, index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_range);
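
The clamping done by `__llvm_profile_instrument_range()` before it hands off to the target counter can be sketched in user space as follows. This is an illustration only: `clamp_to_range` is a hypothetical name, `INT64_MIN` stands in for the kernel's `S64_MIN`, and the `assert` encodes an assumption of this sketch (a non-empty precise range), not a check the patch performs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a raw target value onto the value that is actually recorded:
 *  - values at or above large_value collapse to large_value
 *    (unless large_value is INT64_MIN, which disables that bucket);
 *  - values outside [precise_start, precise_last] collapse to
 *    precise_last + 1;
 *  - everything else is recorded as is.
 */
static uint64_t clamp_to_range(uint64_t target_value, int64_t precise_start,
			       int64_t precise_last, int64_t large_value)
{
	/* Assumption of this sketch only. */
	assert(precise_start <= precise_last);

	if (large_value != INT64_MIN && (int64_t)target_value >= large_value)
		return (uint64_t)large_value;
	if ((int64_t)target_value < precise_start ||
	    (int64_t)target_value > precise_last)
		return (uint64_t)(precise_last + 1);
	return target_value;
}
```
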
> +
> +static u64 inst_prof_get_range_rep_value(u64 value)
> +{
> + if (value <= 8)
> + /* The first ranges are individually tracked; use the value as is. */
> + return value;
> + else if (value >= 513)
> + /* The last range is mapped to its lowest value. */
> + return 513;
> + else if (hweight64(value) == 1)
> + /* If it's a power of two, use it as is. */
> + return value;
> +
> + /* Otherwise, round down to the previous power of two and add 1. */
> + return ((u64)1 << (64 - __builtin_clzll(value) - 1)) + 1;
> +}
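
To make the bucketing concrete, here is a minimal user-space sketch of the same mapping. It assumes a GCC- or Clang-compatible compiler for the `__builtin_popcountll` and `__builtin_clzll` builtins (the kernel code uses `hweight64()` for the former); `range_rep_value` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Map a memop size to the representative value of its range. */
static uint64_t range_rep_value(uint64_t value)
{
	if (value <= 8)
		return value;	/* 0..8 are tracked individually */
	if (value >= 513)
		return 513;	/* everything from 513 up shares one bucket */

	/* value is in 9..512 here, so it is non-zero and clz is defined. */
	assert(value != 0);

	if (__builtin_popcountll(value) == 1)
		return value;	/* exact powers of two keep their own bucket */

	/* Previous power of two, plus one: e.g. 100 maps to 64 + 1. */
	return ((uint64_t)1 << (64 - __builtin_clzll(value) - 1)) + 1;
}
```

Under this mapping every size between two powers of two shares one counter, which keeps the number of distinct values per site small.
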
> +
> +/*
> + * The target values are partitioned into multiple ranges. The range spec is
> + * defined in compiler-rt/include/profile/InstrProfData.inc.
> + */
> +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> + u32 counter_index);
> +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> + u32 counter_index)
> +{
> + u64 rep_value;
> +
> + /* Map the target value to the representative value of its range. */
> + rep_value = inst_prof_get_range_rep_value(target_value);
> + __llvm_profile_instrument_target(rep_value, data, counter_index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_memop);
> diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
> new file mode 100644
> index 000000000000..ddc8d3002fe5
> --- /dev/null
> +++ b/kernel/pgo/pgo.h
> @@ -0,0 +1,203 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#ifndef _PGO_H
> +#define _PGO_H
> +
> +/*
> + * Note: These internal LLVM definitions must match the compiler version.
> + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
> + */
> +
> +#define LLVM_INSTR_PROF_RAW_MAGIC_64 \
> + ((u64)255 << 56 | \
> + (u64)'l' << 48 | \
> + (u64)'p' << 40 | \
> + (u64)'r' << 32 | \
> + (u64)'o' << 24 | \
> + (u64)'f' << 16 | \
> + (u64)'r' << 8 | \
> + (u64)129)
> +#define LLVM_INSTR_PROF_RAW_MAGIC_32 \
> + ((u64)255 << 56 | \
> + (u64)'l' << 48 | \
> + (u64)'p' << 40 | \
> + (u64)'r' << 32 | \
> + (u64)'o' << 24 | \
> + (u64)'f' << 16 | \
> + (u64)'R' << 8 | \
> + (u64)129)
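
As a cross-check of the two magic values, the byte-wise definitions can be compared against their hexadecimal form at compile time. This is a user-space sketch, assuming `uint64_t` for `u64` and ASCII character constants; the shortened macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Byte-wise definitions, mirroring the macros in pgo.h. */
#define RAW_MAGIC_64 \
	((uint64_t)255 << 56 | (uint64_t)'l' << 48 | (uint64_t)'p' << 40 | \
	 (uint64_t)'r' << 32 | (uint64_t)'o' << 24 | (uint64_t)'f' << 16 | \
	 (uint64_t)'r' << 8 | (uint64_t)129)
#define RAW_MAGIC_32 \
	((uint64_t)255 << 56 | (uint64_t)'l' << 48 | (uint64_t)'p' << 40 | \
	 (uint64_t)'r' << 32 | (uint64_t)'o' << 24 | (uint64_t)'f' << 16 | \
	 (uint64_t)'R' << 8 | (uint64_t)129)

/* Spelled out as bytes: 0xff "lprofr" 0x81 and 0xff "lprofR" 0x81. */
static_assert(RAW_MAGIC_64 == 0xFF6C70726F667281ULL, "64-bit magic");
static_assert(RAW_MAGIC_32 == 0xFF6C70726F665281ULL, "32-bit magic");
```

The two values differ only in the case of the final letter, which is how a reader of the raw file can tell a 64-bit profile from a 32-bit one.
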
> +
> +#define LLVM_INSTR_PROF_RAW_VERSION 5
> +#define LLVM_INSTR_PROF_DATA_ALIGNMENT 8
> +#define LLVM_INSTR_PROF_IPVK_FIRST 0
> +#define LLVM_INSTR_PROF_IPVK_LAST 1
> +#define LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE 255
> +
> +#define LLVM_VARIANT_MASK_IR_PROF (0x1ULL << 56)
> +#define LLVM_VARIANT_MASK_CSIR_PROF (0x1ULL << 57)
> +
> +/**
> + * struct llvm_prf_header - represents the raw profile header data structure.
> + * @magic: the magic token for the file format.
> + * @version: the version of the file format.
> + * @data_size: the number of entries in the profile data section.
> + * @padding_bytes_before_counters: the number of padding bytes before the
> + * counters.
> + * @counters_size: the size in bytes of the LLVM profile section containing the
> + * counters.
> + * @padding_bytes_after_counters: the number of padding bytes after the
> + * counters.
> + * @names_size: the size in bytes of the LLVM profile section containing the
> + * counters' names.
> + * @counters_delta: the beginning of the LLVM profile counters section.
> + * @names_delta: the beginning of the LLVM profile names section.
> + * @value_kind_last: the last profile value kind.
> + */
> +struct llvm_prf_header {
> + u64 magic;
> + u64 version;
> + u64 data_size;
> + u64 padding_bytes_before_counters;
> + u64 counters_size;
> + u64 padding_bytes_after_counters;
> + u64 names_size;
> + u64 counters_delta;
> + u64 names_delta;
> + u64 value_kind_last;
> +};
> +
> +/**
> + * struct llvm_prf_data - represents the per-function control structure.
> + * @name_ref: the reference to the function's name.
> + * @func_hash: the hash value of the function.
> + * @counter_ptr: a pointer to the profile counter.
> + * @function_ptr: a pointer to the function.
> + * @values: the profiling values associated with this function.
> + * @num_counters: the number of counters in the function.
> + * @num_value_sites: the number of value profile sites.
> + */
> +struct llvm_prf_data {
> + const u64 name_ref;
> + const u64 func_hash;
> + const void *counter_ptr;
> + const void *function_ptr;
> + void *values;
> + const u32 num_counters;
> + const u16 num_value_sites[LLVM_INSTR_PROF_IPVK_LAST + 1];
> +} __aligned(LLVM_INSTR_PROF_DATA_ALIGNMENT);
> +
> +/**
> + * struct llvm_prf_value_node_data - represents the data part of the struct
> + * llvm_prf_value_node data structure.
> + * @value: the value counters.
> + * @count: the counters' count.
> + */
> +struct llvm_prf_value_node_data {
> + u64 value;
> + u64 count;
> +};
> +
> +/**
> + * struct llvm_prf_value_node - represents an internal data structure used by
> + * the value profiler.
> + * @value: the value counters.
> + * @count: the counters' count.
> + * @next: the next value node.
> + */
> +struct llvm_prf_value_node {
> + u64 value;
> + u64 count;
> + struct llvm_prf_value_node *next;
> +};
> +
> +/**
> + * struct llvm_prf_value_data - represents the value profiling data in indexed
> + * format.
> + * @total_size: the total size in bytes including this field.
> + * @num_value_kinds: the number of value profile kinds that have value profile
> + * data.
> + */
> +struct llvm_prf_value_data {
> + u32 total_size;
> + u32 num_value_kinds;
> +};
> +
> +/**
> + * struct llvm_prf_value_record - represents the on-disk layout of the value
> + * profile data of a particular kind for one function.
> + * @kind: the kind of the value profile record.
> + * @num_value_sites: the number of value profile sites.
> + * @site_count_array: the first element of the array that stores the number
> + * of profiled values for each value site.
> + */
> +struct llvm_prf_value_record {
> + u32 kind;
> + u32 num_value_sites;
> + u8 site_count_array[];
> +};
> +
> +#define prf_get_value_record_header_size() \
> + offsetof(struct llvm_prf_value_record, site_count_array)
> +#define prf_get_value_record_site_count_size(sites) \
> + roundup((sites), 8)
> +#define prf_get_value_record_size(sites) \
> + (prf_get_value_record_header_size() + \
> + prf_get_value_record_site_count_size((sites)))
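
The record-size macros above amount to an 8-byte header plus a site count array rounded up to a multiple of 8. A minimal user-space sketch, assuming fixed-width types from `<stdint.h>` and a hand-written round-up in place of the kernel's `roundup()`; the struct and function names here are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-in for struct llvm_prf_value_record. */
struct value_record {
	uint32_t kind;
	uint32_t num_value_sites;
	uint8_t site_count_array[];
};

/* The header is the two u32 fields that precede the flexible array. */
static_assert(offsetof(struct value_record, site_count_array) == 8,
	      "record header is expected to be 8 bytes");

/* One u8 counter per site, rounded up to a multiple of 8 bytes. */
static size_t site_count_size(size_t sites)
{
	return (sites + 7) / 8 * 8;
}

/* Header plus padded site count array. */
static size_t value_record_size(size_t sites)
{
	return offsetof(struct value_record, site_count_array) +
	       site_count_size(sites);
}
```

So one to eight sites cost 16 bytes per record, and a ninth site pushes the record to 24 bytes.
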
> +
> +/* Data sections */
> +extern struct llvm_prf_data __llvm_prf_data_start[];
> +extern struct llvm_prf_data __llvm_prf_data_end[];
> +
> +extern u64 __llvm_prf_cnts_start[];
> +extern u64 __llvm_prf_cnts_end[];
> +
> +extern char __llvm_prf_names_start[];
> +extern char __llvm_prf_names_end[];
> +
> +extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
> +extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
> +
> +/* Locking for vnodes */
> +extern unsigned long prf_lock(void);
> +extern void prf_unlock(unsigned long flags);
> +
> +#define __DEFINE_PRF_SIZE(s) \
> + static inline unsigned long prf_ ## s ## _size(void) \
> + { \
> + unsigned long start = \
> + (unsigned long)__llvm_prf_ ## s ## _start; \
> + unsigned long end = \
> + (unsigned long)__llvm_prf_ ## s ## _end; \
> + return roundup(end - start, \
> + sizeof(__llvm_prf_ ## s ## _start[0])); \
> + } \
> + static inline unsigned long prf_ ## s ## _count(void) \
> + { \
> + return prf_ ## s ## _size() / \
> + sizeof(__llvm_prf_ ## s ## _start[0]); \
> + }
> +
> +__DEFINE_PRF_SIZE(data);
> +__DEFINE_PRF_SIZE(cnts);
> +__DEFINE_PRF_SIZE(names);
> +__DEFINE_PRF_SIZE(vnds);
> +
> +#undef __DEFINE_PRF_SIZE
> +
> +#endif /* _PGO_H */
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 8cd67b1b6d15..d411e92dd0d6 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -139,6 +139,16 @@ _c_flags += $(if $(patsubst n%,, \
> $(CFLAGS_GCOV))
> endif
>
> +#
> +# Enable clang's PGO profiling flags for a file or directory depending on
> +# variables PGO_PROFILE_obj.o and PGO_PROFILE.
> +#
> +ifeq ($(CONFIG_PGO_CLANG),y)
> +_c_flags += $(if $(patsubst n%,, \
> + $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
> + $(CFLAGS_PGO_CLANG))
> +endif
> +
> #
> # Enable address sanitizer flags for kernel except some files or directories
> # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
> --
> 2.31.0.208.g409f899ff0-goog
>

2021-05-19 21:39:26

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Wed, Apr 07, 2021 at 02:17:04PM -0700, Bill Wendling wrote:
> From: Sami Tolvanen <[email protected]>
>
> Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> profile, the kernel is instrumented with PGO counters, a representative
> workload is run, and the raw profile data is collected from
> /sys/kernel/debug/pgo/profraw.
>
> The raw profile data must be processed by clang's "llvm-profdata" tool
> before it can be used during recompilation:
>
> $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
> Multiple raw profiles may be merged during this step.
>
> The data can now be used by the compiler:
>
> $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>
> This initial submission is restricted to x86, as that's the platform we
> know works. This restriction can be lifted once other platforms have
> been verified to work with PGO.
>
> Note that this method of profiling the kernel is clang-native, unlike
> the clang support in kernel/gcov.
>
> [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>
> Signed-off-by: Sami Tolvanen <[email protected]>
> Co-developed-by: Bill Wendling <[email protected]>
> Signed-off-by: Bill Wendling <[email protected]>
> Tested-by: Nick Desaulniers <[email protected]>
> Reviewed-by: Nick Desaulniers <[email protected]>
> Reviewed-by: Fangrui Song <[email protected]>
> ---
> v9: - [maskray] Remove explicit 'ALIGN' and 'KEEP' from PGO variables in
> vmlinux.lds.h.
> v8: - Rebased on top-of-tree.
> v7: - [sedat.dilek] Fix minor build failure.
> v6: - Add better documentation about the locking scheme and other things.
> - Rename macros to better match the same macros in LLVM's source code.
> v5: - [natechancellor] Correct padding calculation.
> v4: - [ndesaulniers] Remove non-x86 Makefile changes and use "hweight64" instead
> of using our own popcount implementation.
> v3: - [sedat.dilek] Added change log section.
> v2: - [natechancellor] Added "__llvm_profile_instrument_memop".
> - [maskray] Corrected documentation, re PGO flags when using LTO.
> ---
> Documentation/dev-tools/index.rst | 1 +
> Documentation/dev-tools/pgo.rst | 127 +++++++++
> MAINTAINERS | 9 +
> Makefile | 3 +
> arch/Kconfig | 1 +
> arch/x86/Kconfig | 1 +
> arch/x86/boot/Makefile | 1 +
> arch/x86/boot/compressed/Makefile | 1 +
> arch/x86/crypto/Makefile | 4 +
> arch/x86/entry/vdso/Makefile | 1 +
> arch/x86/kernel/vmlinux.lds.S | 2 +
> arch/x86/platform/efi/Makefile | 1 +
> arch/x86/purgatory/Makefile | 1 +
> arch/x86/realmode/rm/Makefile | 1 +
> arch/x86/um/vdso/Makefile | 1 +
> drivers/firmware/efi/libstub/Makefile | 1 +
> include/asm-generic/vmlinux.lds.h | 34 +++
> kernel/Makefile | 1 +
> kernel/pgo/Kconfig | 35 +++
> kernel/pgo/Makefile | 5 +
> kernel/pgo/fs.c | 389 ++++++++++++++++++++++++++
> kernel/pgo/instrument.c | 189 +++++++++++++
> kernel/pgo/pgo.h | 203 ++++++++++++++
> scripts/Makefile.lib | 10 +
> 24 files changed, 1022 insertions(+)
> create mode 100644 Documentation/dev-tools/pgo.rst
> create mode 100644 kernel/pgo/Kconfig
> create mode 100644 kernel/pgo/Makefile
> create mode 100644 kernel/pgo/fs.c
> create mode 100644 kernel/pgo/instrument.c
> create mode 100644 kernel/pgo/pgo.h
>
> diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
> index 1b1cf4f5c9d9..6a30cd98e6f9 100644
> --- a/Documentation/dev-tools/index.rst
> +++ b/Documentation/dev-tools/index.rst
> @@ -27,6 +27,7 @@ whole; patches welcome!
> kgdb
> kselftest
> kunit/index
> + pgo
>
>
> .. only:: subproject and html
> diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
> new file mode 100644
> index 000000000000..b7f11d8405b7
> --- /dev/null
> +++ b/Documentation/dev-tools/pgo.rst
> @@ -0,0 +1,127 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===============================
> +Using PGO with the Linux kernel
> +===============================
> +
> +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
> +when building with Clang. The profiling data is exported via the ``pgo``
> +debugfs directory.
> +
> +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
> +
> +
> +Preparation
> +===========
> +
> +Configure the kernel with:
> +
> +.. code-block:: make
> +
> + CONFIG_DEBUG_FS=y
> + CONFIG_PGO_CLANG=y
> +
> +Note that kernels compiled with profiling flags will be significantly larger
> +and run slower.
> +
> +Profiling data will only become accessible once debugfs has been mounted:
> +
> +.. code-block:: sh
> +
> + mount -t debugfs none /sys/kernel/debug
> +
> +
> +Customization
> +=============
> +
> +You can enable or disable profiling for individual files and directories by
> +adding a line similar to the following to the respective kernel Makefile:
> +
> +- For a single file (e.g. main.o)
> +
> + .. code-block:: make
> +
> + PGO_PROFILE_main.o := y
> +
> +- For all files in one directory
> +
> + .. code-block:: make
> +
> + PGO_PROFILE := y
> +
> +To exclude files from being profiled use
> +
> + .. code-block:: make
> +
> + PGO_PROFILE_main.o := n
> +
> +and
> +
> + .. code-block:: make
> +
> + PGO_PROFILE := n
> +
> +Only files which are linked to the main kernel image or are compiled as kernel
> +modules are supported by this mechanism.
> +
> +
> +Files
> +=====
> +
> +The PGO kernel support creates the following files in debugfs:
> +
> +``/sys/kernel/debug/pgo``
> + Parent directory for all PGO-related files.
> +
> +``/sys/kernel/debug/pgo/reset``
> + Global reset file: resets all profile data to zero when written to.
> +
> +``/sys/kernel/debug/pgo/profraw``
> + The raw PGO data that must be processed with ``llvm-profdata``.
> +
> +
> +Workflow
> +========
> +
> +The PGO kernel can be run on the host or test machines. The data though should
> +be analyzed with Clang's tools from the same Clang version as the kernel was
> +compiled with. Clang's tolerant of version skew, but it's easier to use the same
> +Clang version.
> +
> +The profiling data is useful for optimizing the kernel, analyzing coverage,
> +etc. Clang offers tools to perform these tasks.
> +
> +Here is an example workflow for profiling an instrumented kernel with PGO and
> +using the result to optimize the kernel:
> +
> +1) Install the kernel on the TEST machine.
> +
> +2) Reset the data counters right before running the load tests
> +
> + .. code-block:: sh
> +
> + $ echo 1 > /sys/kernel/debug/pgo/reset
> +
> +3) Run the load tests.
> +
> +4) Collect the raw profile data
> +
> + .. code-block:: sh
> +
> + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
> +
> +5) (Optional) Download the raw profile data to the HOST machine.
> +
> +6) Process the raw profile data
> +
> + .. code-block:: sh
> +
> + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> +
> + Note that multiple raw profile data files can be merged during this step.
> +
> +7) Rebuild the kernel using the profile data (PGO disabled)
> +
> + .. code-block:: sh
> +
> + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c80ad735b384..742058188af2 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14054,6 +14054,15 @@ S: Maintained
> F: include/linux/personality.h
> F: include/uapi/linux/personality.h
>
> +PGO BASED KERNEL PROFILING
> +M: Sami Tolvanen <[email protected]>
> +M: Bill Wendling <[email protected]>
> +R: Nathan Chancellor <[email protected]>
> +R: Nick Desaulniers <[email protected]>
> +S: Supported
> +F: Documentation/dev-tools/pgo.rst
> +F: kernel/pgo
> +
> PHOENIX RC FLIGHT CONTROLLER ADAPTER
> M: Marcus Folkesson <[email protected]>
> L: [email protected]
> diff --git a/Makefile b/Makefile
> index cc77fd45ca64..6450faceb137 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -660,6 +660,9 @@ endif # KBUILD_EXTMOD
> # Defaults to vmlinux, but the arch makefile usually adds further targets
> all: vmlinux
>
> +CFLAGS_PGO_CLANG := -fprofile-generate
> +export CFLAGS_PGO_CLANG
> +
> CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
> $(call cc-option,-fno-tree-loop-im) \
> $(call cc-disable-warning,maybe-uninitialized,)
> diff --git a/arch/Kconfig b/arch/Kconfig
> index ecfd3520b676..afd082133e0a 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1191,6 +1191,7 @@ config ARCH_HAS_ELFCORE_COMPAT
> bool
>
> source "kernel/gcov/Kconfig"
> +source "kernel/pgo/Kconfig"
>
> source "scripts/gcc-plugins/Kconfig"
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 2792879d398e..62be93b199ff 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -99,6 +99,7 @@ config X86
> select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
> select ARCH_SUPPORTS_LTO_CLANG if X86_64
> select ARCH_SUPPORTS_LTO_CLANG_THIN if X86_64
> + select ARCH_SUPPORTS_PGO_CLANG if X86_64
> select ARCH_USE_BUILTIN_BSWAP
> select ARCH_USE_QUEUED_RWLOCKS
> select ARCH_USE_QUEUED_SPINLOCKS
> diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> index fe605205b4ce..383853e32f67 100644
> --- a/arch/x86/boot/Makefile
> +++ b/arch/x86/boot/Makefile
> @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
>
> $(obj)/bzImage: asflags-y := $(SVGA_MODE)
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index e0bc3988c3fa..ed12ab65f606 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
>
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE :=n
>
> KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
> diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> index b28e36b7c96b..4b2e9620c412 100644
> --- a/arch/x86/crypto/Makefile
> +++ b/arch/x86/crypto/Makefile
> @@ -4,6 +4,10 @@
>
> OBJECT_FILES_NON_STANDARD := y
>
> +# Disable PGO for curve25519-x86_64. With PGO enabled, clang runs out of
> +# registers for some of the functions.
> +PGO_PROFILE_curve25519-x86_64.o := n
> +
> obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
> twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o
> obj-$(CONFIG_CRYPTO_TWOFISH_X86_64) += twofish-x86_64.o
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index 05c4abc2fdfd..f7421e44725a 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -180,6 +180,7 @@ quiet_cmd_vdso = VDSO $@
> VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \
> $(call ld-option, --eh-frame-hdr) -Bsymbolic
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> quiet_cmd_vdso_and_check = VDSO $@
> cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check)
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index efd9e9ea17f2..f6cab2316c46 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -184,6 +184,8 @@ SECTIONS
>
> BUG_TABLE
>
> + PGO_CLANG_DATA
> +
> ORC_UNWIND_TABLE
>
> . = ALIGN(PAGE_SIZE);
> diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
> index 84b09c230cbd..5f22b31446ad 100644
> --- a/arch/x86/platform/efi/Makefile
> +++ b/arch/x86/platform/efi/Makefile
> @@ -2,6 +2,7 @@
> OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
> KASAN_SANITIZE := n
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
> obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o
> diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> index 95ea17a9d20c..36f20e99da0b 100644
> --- a/arch/x86/purgatory/Makefile
> +++ b/arch/x86/purgatory/Makefile
> @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk
>
> # Sanitizer, etc. runtimes are unavailable and cannot be linked here.
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> KASAN_SANITIZE := n
> UBSAN_SANITIZE := n
> KCSAN_SANITIZE := n
> diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> index 83f1b6a56449..21797192f958 100644
> --- a/arch/x86/realmode/rm/Makefile
> +++ b/arch/x86/realmode/rm/Makefile
> @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> UBSAN_SANITIZE := n
> diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
> index 5943387e3f35..54f5768f5853 100644
> --- a/arch/x86/um/vdso/Makefile
> +++ b/arch/x86/um/vdso/Makefile
> @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@
>
> VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv
> GCOV_PROFILE := n
> +PGO_PROFILE := n
>
> #
> # Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index c23466e05e60..724fb389bb9d 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -42,6 +42,7 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
> KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS))
>
> GCOV_PROFILE := n
> +PGO_PROFILE := n
> # Sanitizer runtimes are unavailable and cannot be linked here.
> KASAN_SANITIZE := n
> KCSAN_SANITIZE := n
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index 0331d5d49551..b371857097e8 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -329,6 +329,39 @@
> #define DTPM_TABLE()
> #endif
>
> +#ifdef CONFIG_PGO_CLANG
> +#define PGO_CLANG_DATA \
> + __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \
> + __llvm_prf_start = .; \
> + __llvm_prf_data_start = .; \
> + *(__llvm_prf_data) \
> + __llvm_prf_data_end = .; \
> + } \
> + __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \
> + __llvm_prf_cnts_start = .; \
> + *(__llvm_prf_cnts) \
> + __llvm_prf_cnts_end = .; \
> + } \
> + __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \
> + __llvm_prf_names_start = .; \
> + *(__llvm_prf_names) \
> + __llvm_prf_names_end = .; \
> + } \
> + __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \
> + __llvm_prf_vals_start = .; \
> + *(__llvm_prf_vals) \
> + __llvm_prf_vals_end = .; \
> + } \
> + __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \
> + __llvm_prf_vnds_start = .; \
> + *(__llvm_prf_vnds) \
> + __llvm_prf_vnds_end = .; \
> + __llvm_prf_end = .; \
> + }
> +#else
> +#define PGO_CLANG_DATA
> +#endif
> +
> #define KERNEL_DTB() \
> STRUCT_ALIGN(); \
> __dtb_start = .; \
> @@ -1106,6 +1139,7 @@
> CONSTRUCTORS \
> } \
> BUG_TABLE \
> + PGO_CLANG_DATA
>
> #define INIT_TEXT_SECTION(inittext_align) \
> . = ALIGN(inittext_align); \
> diff --git a/kernel/Makefile b/kernel/Makefile
> index 320f1f3941b7..a2a23ef2b12f 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/
> obj-$(CONFIG_KCSAN) += kcsan/
> obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
> obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o
> +obj-$(CONFIG_PGO_CLANG) += pgo/
>
> obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig
> new file mode 100644
> index 000000000000..76a640b6cf6e
> --- /dev/null
> +++ b/kernel/pgo/Kconfig
> @@ -0,0 +1,35 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)"
> +
> +config ARCH_SUPPORTS_PGO_CLANG
> + bool
> +
> +config PGO_CLANG
> + bool "Enable clang's PGO-based kernel profiling"
> + depends on DEBUG_FS
> + depends on ARCH_SUPPORTS_PGO_CLANG
> + depends on CC_IS_CLANG && CLANG_VERSION >= 120000
> + help
> + This option enables clang's PGO (Profile Guided Optimization) based
> + code profiling to better optimize the kernel.
> +
> + If unsure, say N.
> +
> + Run a representative workload for your application on a kernel
> + compiled with this option and download the raw profile file from
> + /sys/kernel/debug/pgo/profraw. This file needs to be processed with
> + llvm-profdata. It may be merged with other collected raw profiles.
> +
> + Copy the resulting profile file into vmlinux.profdata, and enable
> + KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized
> + kernel.
> +
> + Note that a kernel compiled with profiling flags will be
> + significantly larger and run slower. Also be sure to exclude files
> + from profiling which are not linked to the kernel image to prevent
> + linker errors.
> +
> + Note that the debugfs filesystem has to be mounted to access
> + profiling data.
> +
> +endmenu
> diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile
> new file mode 100644
> index 000000000000..41e27cefd9a4
> --- /dev/null
> +++ b/kernel/pgo/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0
> +GCOV_PROFILE := n
> +PGO_PROFILE := n
> +
> +obj-y += fs.o instrument.o
> diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c
> new file mode 100644
> index 000000000000..1678df3b7d64
> --- /dev/null
> +++ b/kernel/pgo/fs.c
> @@ -0,0 +1,389 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt) "pgo: " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/debugfs.h>
> +#include <linux/fs.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/vmalloc.h>
> +#include "pgo.h"
> +
> +static struct dentry *directory;
> +
> +struct prf_private_data {
> + void *buffer;
> + unsigned long size;
> +};
> +
> +/*
> + * Raw profile data format:
> + *
> + * - llvm_prf_header
> + * - __llvm_prf_data
> + * - __llvm_prf_cnts
> + * - __llvm_prf_names
> + * - zero padding to 8 bytes
> + * - for each llvm_prf_data in __llvm_prf_data:
> + * - llvm_prf_value_data
> + * - llvm_prf_value_record + site count array
> + * - llvm_prf_value_node_data
> + * ...
> + * ...
> + * ...
> + */
> +
> +static void prf_fill_header(void **buffer)
> +{
> + struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer;
> +
> +#ifdef CONFIG_64BIT
> + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_64;
> +#else
> + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_32;
> +#endif
> + header->version = LLVM_VARIANT_MASK_IR_PROF | LLVM_INSTR_PROF_RAW_VERSION;
> + header->data_size = prf_data_count();
> + header->padding_bytes_before_counters = 0;
> + header->counters_size = prf_cnts_count();
> + header->padding_bytes_after_counters = 0;
> + header->names_size = prf_names_count();
> + header->counters_delta = (u64)__llvm_prf_cnts_start;
> + header->names_delta = (u64)__llvm_prf_names_start;
> + header->value_kind_last = LLVM_INSTR_PROF_IPVK_LAST;
> +
> + *buffer += sizeof(*header);
> +}
> +
> +/*
> + * Copy the source into the buffer, incrementing the pointer into buffer in the
> + * process.
> + */
> +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size)
> +{
> + memcpy(*buffer, src, size);
> + *buffer += size;
> +}
> +
> +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds)
> +{
> + struct llvm_prf_value_node **nodes =
> + (struct llvm_prf_value_node **)p->values;
> + u32 kinds = 0;
> + u32 size = 0;
> + unsigned int kind;
> + unsigned int n;
> + unsigned int s = 0;
> +
> + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> + unsigned int sites = p->num_value_sites[kind];
> +
> + if (!sites)
> + continue;
> +
> + /* Record + site count array */
> + size += prf_get_value_record_size(sites);
> + kinds++;
> +
> + if (!nodes)
> + continue;
> +
> + for (n = 0; n < sites; n++) {
> + u32 count = 0;
> + struct llvm_prf_value_node *site = nodes[s + n];
> +
> + while (site && ++count <= U8_MAX)
> + site = site->next;
> +
> + size += count *
> + sizeof(struct llvm_prf_value_node_data);
> + }
> +
> + s += sites;
> + }
> +
> + if (size)
> + size += sizeof(struct llvm_prf_value_data);
> +
> + if (value_kinds)
> + *value_kinds = kinds;
> +
> + return size;
> +}
> +
> +static u32 prf_get_value_size(void)
> +{
> + u32 size = 0;
> + struct llvm_prf_data *p;
> +
> + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> + size += __prf_get_value_size(p, NULL);
> +
> + return size;
> +}
> +
> +/* Serialize the profiling's value. */
> +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer)
> +{
> + struct llvm_prf_value_data header;
> + struct llvm_prf_value_node **nodes =
> + (struct llvm_prf_value_node **)p->values;
> + unsigned int kind;
> + unsigned int n;
> + unsigned int s = 0;
> +
> + header.total_size = __prf_get_value_size(p, &header.num_value_kinds);
> +
> + if (!header.num_value_kinds)
> + /* Nothing to write. */
> + return;
> +
> + prf_copy_to_buffer(buffer, &header, sizeof(header));
> +
> + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) {
> + struct llvm_prf_value_record *record;
> + u8 *counts;
> + unsigned int sites = p->num_value_sites[kind];
> +
> + if (!sites)
> + continue;
> +
> + /* Profiling value record. */
> + record = *(struct llvm_prf_value_record **)buffer;
> + *buffer += prf_get_value_record_header_size();
> +
> + record->kind = kind;
> + record->num_value_sites = sites;
> +
> + /* Site count array. */
> + counts = *(u8 **)buffer;
> + *buffer += prf_get_value_record_site_count_size(sites);
> +
> + /*
> + * If we don't have nodes, we can skip updating the site count
> + * array, because the buffer is zero filled.
> + */
> + if (!nodes)
> + continue;
> +
> + for (n = 0; n < sites; n++) {
> + u32 count = 0;
> + struct llvm_prf_value_node *site = nodes[s + n];
> +
> + while (site && ++count <= U8_MAX) {
> + prf_copy_to_buffer(buffer, site,
> + sizeof(struct llvm_prf_value_node_data));
> + site = site->next;
> + }
> +
> + counts[n] = (u8)count;
> + }
> +
> + s += sites;
> + }
> +}
> +
> +static void prf_serialize_values(void **buffer)
> +{
> + struct llvm_prf_data *p;
> +
> + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++)
> + prf_serialize_value(p, buffer);
> +}
> +
> +static inline unsigned long prf_get_padding(unsigned long size)
> +{
> + return 7 & (sizeof(u64) - size % sizeof(u64));
> +}
> +
> +static unsigned long prf_buffer_size(void)
> +{
> + return sizeof(struct llvm_prf_header) +
> + prf_data_size() +
> + prf_cnts_size() +
> + prf_names_size() +
> + prf_get_padding(prf_names_size()) +
> + prf_get_value_size();
> +}
> +
> +/*
> + * Serialize the profiling data into a format LLVM's tools can understand.
> + * Note: caller *must* hold pgo_lock.
> + */
> +static int prf_serialize(struct prf_private_data *p)
> +{
> + int err = 0;
> + void *buffer;
> +
> + p->size = prf_buffer_size();
> + p->buffer = vzalloc(p->size);
> +
> + if (!p->buffer) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + buffer = p->buffer;
> +
> + prf_fill_header(&buffer);
> + prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size());
> + prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size());
> + prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size());
> + buffer += prf_get_padding(prf_names_size());
> +
> + prf_serialize_values(&buffer);
> +
> +out:
> + return err;
> +}
> +
> +/* open() implementation for PGO. Creates a copy of the profiling data set. */
> +static int prf_open(struct inode *inode, struct file *file)
> +{
> + struct prf_private_data *data;
> + unsigned long flags;
> + int err;
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL);
> + if (!data) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + flags = prf_lock();
> +
> + err = prf_serialize(data);
> + if (unlikely(err)) {
> + kfree(data);
> + goto out_unlock;
> + }
> +
> + file->private_data = data;
> +
> +out_unlock:
> + prf_unlock(flags);
> +out:
> + return err;
> +}
> +
> +/* read() implementation for PGO. */
> +static ssize_t prf_read(struct file *file, char __user *buf, size_t count,
> + loff_t *ppos)
> +{
> + struct prf_private_data *data = file->private_data;
> +
> + BUG_ON(!data);

I've changed this to:

if (WARN_ON_ONCE(!data))
return -ENOMEM;

> +
> + return simple_read_from_buffer(buf, count, ppos, data->buffer,
> + data->size);
> +}
> +
> +/* release() implementation for PGO. Release resources allocated by open(). */
> +static int prf_release(struct inode *inode, struct file *file)
> +{
> + struct prf_private_data *data = file->private_data;
> +
> + if (data) {
> + vfree(data->buffer);
> + kfree(data);
> + }
> +
> + return 0;
> +}
> +
> +static const struct file_operations prf_fops = {
> + .owner = THIS_MODULE,
> + .open = prf_open,
> + .read = prf_read,
> + .llseek = default_llseek,
> + .release = prf_release
> +};
> +
> +/* write() implementation for resetting PGO's profile data. */
> +static ssize_t reset_write(struct file *file, const char __user *addr,
> + size_t len, loff_t *pos)
> +{
> + struct llvm_prf_data *data;
> +
> + memset(__llvm_prf_cnts_start, 0, prf_cnts_size());
> +
> + for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; data++) {
> + struct llvm_prf_value_node **vnodes;
> + u64 current_vsite_count;
> + u32 i;
> +
> + if (!data->values)
> + continue;
> +
> + current_vsite_count = 0;
> + vnodes = (struct llvm_prf_value_node **)data->values;
> +
> + for (i = LLVM_INSTR_PROF_IPVK_FIRST; i <= LLVM_INSTR_PROF_IPVK_LAST; i++)
> + current_vsite_count += data->num_value_sites[i];
> +
> + for (i = 0; i < current_vsite_count; i++) {
> + struct llvm_prf_value_node *current_vnode = vnodes[i];
> +
> + while (current_vnode) {
> + current_vnode->count = 0;
> + current_vnode = current_vnode->next;
> + }
> + }
> + }
> +
> + return len;
> +}
> +
> +static const struct file_operations prf_reset_fops = {
> + .owner = THIS_MODULE,
> + .write = reset_write,
> + .llseek = noop_llseek,
> +};
> +
> +/* Create debugfs entries. */
> +static int __init pgo_init(void)
> +{
> + directory = debugfs_create_dir("pgo", NULL);
> + if (!directory)
> + goto err_remove;
> +
> + if (!debugfs_create_file("profraw", 0600, directory, NULL,
> + &prf_fops))
> + goto err_remove;
> +
> + if (!debugfs_create_file("reset", 0200, directory, NULL,
> + &prf_reset_fops))
> + goto err_remove;
> +
> + return 0;
> +
> +err_remove:
> + pr_err("initialization failed\n");
> + return -EIO;
> +}
> +
> +/* Remove debugfs entries. */
> +static void __exit pgo_exit(void)
> +{
> + debugfs_remove_recursive(directory);
> +}
> +
> +module_init(pgo_init);
> +module_exit(pgo_exit);
> diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c
> new file mode 100644
> index 000000000000..464b3bc77431
> --- /dev/null
> +++ b/kernel/pgo/instrument.c
> @@ -0,0 +1,189 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#define pr_fmt(fmt) "pgo: " fmt
> +
> +#include <linux/bitops.h>
> +#include <linux/kernel.h>
> +#include <linux/export.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +#include "pgo.h"
> +
> +/*
> + * This lock guards both profile count updating and serialization of the
> + * profiling data. Keeping both of these activities separate via locking
> + * ensures that we don't try to serialize data that's only partially updated.
> + */
> +static DEFINE_SPINLOCK(pgo_lock);
> +static int current_node;
> +
> +unsigned long prf_lock(void)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&pgo_lock, flags);
> +
> + return flags;
> +}
> +
> +void prf_unlock(unsigned long flags)
> +{
> + spin_unlock_irqrestore(&pgo_lock, flags);
> +}
> +
> +/*
> + * Return a newly allocated profiling value node which contains the tracked
> + * value by the value profiler.
> + * Note: caller *must* hold pgo_lock.
> + */
> +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p,
> + u32 index, u64 value)
> +{
> + if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end)
> + return NULL; /* Out of nodes */
> +
> + current_node++;
> +
> + /* Make sure the node is entirely within the section */
> + if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end ||
> + &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end)
> + return NULL;
> +
> + return &__llvm_prf_vnds_start[current_node];
> +}
> +
> +/*
> + * Counts the number of times a target value is seen.
> + *
> + * Records the target value for the index if not seen before. Otherwise,
> + * increments the counter associated w/ the target value.
> + */
> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index);

I've moved each of these declarations to the pgo.h file so that both W=1
builds and checkpatch.pl stay happy.

> +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index)
> +{
> + struct llvm_prf_data *p = (struct llvm_prf_data *)data;
> + struct llvm_prf_value_node **counters;
> + struct llvm_prf_value_node *curr;
> + struct llvm_prf_value_node *min = NULL;
> + struct llvm_prf_value_node *prev = NULL;
> + u64 min_count = U64_MAX;
> + u8 values = 0;
> + unsigned long flags;
> +
> + if (!p || !p->values)
> + return;
> +
> + counters = (struct llvm_prf_value_node **)p->values;
> + curr = counters[index];
> +
> + while (curr) {
> + if (target_value == curr->value) {
> + curr->count++;
> + return;
> + }
> +
> + if (curr->count < min_count) {
> + min_count = curr->count;
> + min = curr;
> + }
> +
> + prev = curr;
> + curr = curr->next;
> + values++;
> + }
> +
> + if (values >= LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE) {
> + if (!min->count || !(--min->count)) {
> + curr = min;
> + curr->value = target_value;
> + curr->count++;
> + }
> + return;
> + }
> +
> + /* Lock when updating the value node structure. */
> + flags = prf_lock();
> +
> + curr = allocate_node(p, index, target_value);
> + if (!curr)
> + goto out;
> +
> + curr->value = target_value;
> + curr->count++;
> +
> + if (!counters[index])
> + counters[index] = curr;
> + else if (prev && !prev->next)
> + prev->next = curr;
> +
> +out:
> + prf_unlock(flags);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_target);
> +
> +/* Counts the number of times a range of target values is seen. */
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> + u32 index, s64 precise_start,
> + s64 precise_last, s64 large_value);
> +void __llvm_profile_instrument_range(u64 target_value, void *data,
> + u32 index, s64 precise_start,
> + s64 precise_last, s64 large_value)
> +{
> + if (large_value != S64_MIN && (s64)target_value >= large_value)
> + target_value = large_value;
> + else if ((s64)target_value < precise_start ||
> + (s64)target_value > precise_last)
> + target_value = precise_last + 1;
> +
> + __llvm_profile_instrument_target(target_value, data, index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_range);
> +
> +static u64 inst_prof_get_range_rep_value(u64 value)
> +{
> + if (value <= 8)
> + /* The first ranges are individually tracked, use it as is. */
> + return value;
> + else if (value >= 513)
> + /* The last range is mapped to its lowest value. */
> + return 513;
> + else if (hweight64(value) == 1)
> + /* If it's a power of two, use it as is. */
> + return value;
> +
> + /* Otherwise, take the previous power of two + 1. */
> + return ((u64)1 << (64 - __builtin_clzll(value) - 1)) + 1;
> +}
> +
> +/*
> + * The target values are partitioned into multiple ranges. The range spec is
> + * defined in compiler-rt/include/profile/InstrProfData.inc.
> + */
> +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> + u32 counter_index);
> +void __llvm_profile_instrument_memop(u64 target_value, void *data,
> + u32 counter_index)
> +{
> + u64 rep_value;
> +
> + /* Map the target value to the representative value of its range. */
> + rep_value = inst_prof_get_range_rep_value(target_value);
> + __llvm_profile_instrument_target(rep_value, data, counter_index);
> +}
> +EXPORT_SYMBOL(__llvm_profile_instrument_memop);
> diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h
> new file mode 100644
> index 000000000000..ddc8d3002fe5
> --- /dev/null
> +++ b/kernel/pgo/pgo.h
> @@ -0,0 +1,203 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2019 Google, Inc.
> + *
> + * Author:
> + * Sami Tolvanen <[email protected]>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#ifndef _PGO_H
> +#define _PGO_H
> +
> +/*
> + * Note: These internal LLVM definitions must match the compiler version.
> + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code.
> + */
> +
> +#define LLVM_INSTR_PROF_RAW_MAGIC_64 \
> + ((u64)255 << 56 | \
> + (u64)'l' << 48 | \
> + (u64)'p' << 40 | \
> + (u64)'r' << 32 | \
> + (u64)'o' << 24 | \
> + (u64)'f' << 16 | \
> + (u64)'r' << 8 | \
> + (u64)129)
> +#define LLVM_INSTR_PROF_RAW_MAGIC_32 \
> + ((u64)255 << 56 | \
> + (u64)'l' << 48 | \
> + (u64)'p' << 40 | \
> + (u64)'r' << 32 | \
> + (u64)'o' << 24 | \
> + (u64)'f' << 16 | \
> + (u64)'R' << 8 | \
> + (u64)129)
> +
> +#define LLVM_INSTR_PROF_RAW_VERSION 5
> +#define LLVM_INSTR_PROF_DATA_ALIGNMENT 8
> +#define LLVM_INSTR_PROF_IPVK_FIRST 0
> +#define LLVM_INSTR_PROF_IPVK_LAST 1
> +#define LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE 255
> +
> +#define LLVM_VARIANT_MASK_IR_PROF (0x1ULL << 56)
> +#define LLVM_VARIANT_MASK_CSIR_PROF (0x1ULL << 57)
> +
> +/**
> + * struct llvm_prf_header - represents the raw profile header data structure.
> + * @magic: the magic token for the file format.
> + * @version: the version of the file format.
> + * @data_size: the number of entries in the profile data section.
> + * @padding_bytes_before_counters: the number of padding bytes before the
> + * counters.
> + * @counters_size: the size in bytes of the LLVM profile section containing the
> + * counters.
> + * @padding_bytes_after_counters: the number of padding bytes after the
> + * counters.
> + * @names_size: the size in bytes of the LLVM profile section containing the
> + * counters' names.
> + * @counters_delta: the beginning of the LLVM profile counters section.
> + * @names_delta: the beginning of the LLVM profile names section.
> + * @value_kind_last: the last profile value kind.
> + */
> +struct llvm_prf_header {
> + u64 magic;
> + u64 version;
> + u64 data_size;
> + u64 padding_bytes_before_counters;
> + u64 counters_size;
> + u64 padding_bytes_after_counters;
> + u64 names_size;
> + u64 counters_delta;
> + u64 names_delta;
> + u64 value_kind_last;
> +};
> +
> +/**
> + * struct llvm_prf_data - represents the per-function control structure.
> + * @name_ref: the reference to the function's name.
> + * @func_hash: the hash value of the function.
> + * @counter_ptr: a pointer to the profile counter.
> + * @function_ptr: a pointer to the function.
> + * @values: the profiling values associated with this function.
> + * @num_counters: the number of counters in the function.
> + * @num_value_sites: the number of value profile sites.
> + */
> +struct llvm_prf_data {
> + const u64 name_ref;
> + const u64 func_hash;
> + const void *counter_ptr;
> + const void *function_ptr;
> + void *values;
> + const u32 num_counters;
> + const u16 num_value_sites[LLVM_INSTR_PROF_IPVK_LAST + 1];
> +} __aligned(LLVM_INSTR_PROF_DATA_ALIGNMENT);
> +
> +/**
> + * struct llvm_prf_value_node_data - represents the data part of the struct
> + * llvm_prf_value_node data structure.
> + * @value: the value counters.
> + * @count: the counters' count.
> + */
> +struct llvm_prf_value_node_data {
> + u64 value;
> + u64 count;
> +};
> +
> +/**
> + * struct llvm_prf_value_node - represents an internal data structure used by
> + * the value profiler.
> + * @value: the value counters.
> + * @count: the counters' count.
> + * @next: the next value node.
> + */
> +struct llvm_prf_value_node {
> + u64 value;
> + u64 count;
> + struct llvm_prf_value_node *next;
> +};
> +
> +/**
> + * struct llvm_prf_value_data - represents the value profiling data in indexed
> + * format.
> + * @total_size: the total size in bytes including this field.
> + * @num_value_kinds: the number of value profile kinds that have value profile
> + * data.
> + */
> +struct llvm_prf_value_data {
> + u32 total_size;
> + u32 num_value_kinds;
> +};
> +
> +/**
> + * struct llvm_prf_value_record - represents the on-disk layout of the value
> + * profile data of a particular kind for one function.
> + * @kind: the kind of the value profile record.
> + * @num_value_sites: the number of value profile sites.
> + * @site_count_array: the first element of the array that stores the number
> + * of profiled values for each value site.
> + */
> +struct llvm_prf_value_record {
> + u32 kind;
> + u32 num_value_sites;
> + u8 site_count_array[];
> +};
> +
> +#define prf_get_value_record_header_size() \
> + offsetof(struct llvm_prf_value_record, site_count_array)
> +#define prf_get_value_record_site_count_size(sites) \
> + roundup((sites), 8)
> +#define prf_get_value_record_size(sites) \
> + (prf_get_value_record_header_size() + \
> + prf_get_value_record_site_count_size((sites)))
> +
> +/* Data sections */
> +extern struct llvm_prf_data __llvm_prf_data_start[];
> +extern struct llvm_prf_data __llvm_prf_data_end[];
> +
> +extern u64 __llvm_prf_cnts_start[];
> +extern u64 __llvm_prf_cnts_end[];
> +
> +extern char __llvm_prf_names_start[];
> +extern char __llvm_prf_names_end[];
> +
> +extern struct llvm_prf_value_node __llvm_prf_vnds_start[];
> +extern struct llvm_prf_value_node __llvm_prf_vnds_end[];
> +
> +/* Locking for vnodes */
> +extern unsigned long prf_lock(void);
> +extern void prf_unlock(unsigned long flags);
> +
> +#define __DEFINE_PRF_SIZE(s) \
> + static inline unsigned long prf_ ## s ## _size(void) \
> + { \
> + unsigned long start = \
> + (unsigned long)__llvm_prf_ ## s ## _start; \
> + unsigned long end = \
> + (unsigned long)__llvm_prf_ ## s ## _end; \
> + return roundup(end - start, \
> + sizeof(__llvm_prf_ ## s ## _start[0])); \
> + } \
> + static inline unsigned long prf_ ## s ## _count(void) \
> + { \
> + return prf_ ## s ## _size() / \
> + sizeof(__llvm_prf_ ## s ## _start[0]); \
> + }
> +
> +__DEFINE_PRF_SIZE(data);
> +__DEFINE_PRF_SIZE(cnts);
> +__DEFINE_PRF_SIZE(names);
> +__DEFINE_PRF_SIZE(vnds);
> +
> +#undef __DEFINE_PRF_SIZE
> +
> +#endif /* _PGO_H */
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 8cd67b1b6d15..d411e92dd0d6 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -139,6 +139,16 @@ _c_flags += $(if $(patsubst n%,, \
> $(CFLAGS_GCOV))
> endif
>
> +#
> +# Enable clang's PGO profiling flags for a file or directory depending on
> +# variables PGO_PROFILE_obj.o and PGO_PROFILE.
> +#
> +ifeq ($(CONFIG_PGO_CLANG),y)
> +_c_flags += $(if $(patsubst n%,, \
> + $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
> + $(CFLAGS_PGO_CLANG))
> +endif
> +
> #
> # Enable address sanitizer flags for kernel except some files or directories
> # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
> --
> 2.31.0.208.g409f899ff0-goog
>

I've added this patch to my -next tree now:

https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=for-next/clang/pgo&id=e1af496cbe9b4517428601a4e44fee3602dd3c15

Thanks!

-Kees

--
Kees Cook

2021-05-22 23:54:47

by Bill Wendling

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Wed, May 19, 2021 at 2:37 PM Kees Cook <[email protected]> wrote:
>
> I've added this patch to my -next tree now:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=for-next/clang/pgo&id=e1af496cbe9b4517428601a4e44fee3602dd3c15
>
> Thanks!
> Kees Cook

Thank you!

-bw

2021-05-31 21:13:36

by Nathan Chancellor

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Wed, May 19, 2021 at 02:37:26PM -0700, Kees Cook wrote:
> I've added this patch to my -next tree now:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=for-next/clang/pgo&id=e1af496cbe9b4517428601a4e44fee3602dd3c15
>

Would this be appropriate to send? Someone sent some patches based on
this work so it would be nice to solidify how they will get to Linus
if/when the time comes :)

https://lore.kernel.org/r/[email protected]/
https://lore.kernel.org/r/[email protected]/
https://lore.kernel.org/r/[email protected]/
https://lore.kernel.org/r/[email protected]/
https://lore.kernel.org/r/[email protected]/
https://lore.kernel.org/r/[email protected]/

Cheers,
Nathan

======================================

diff --git a/MAINTAINERS b/MAINTAINERS
index c45613c30803..0d03f6ccdb70 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14378,9 +14378,13 @@ F: include/uapi/linux/personality.h
PGO BASED KERNEL PROFILING
M: Sami Tolvanen <[email protected]>
M: Bill Wendling <[email protected]>
+M: Kees Cook <[email protected]>
R: Nathan Chancellor <[email protected]>
R: Nick Desaulniers <[email protected]>
+L: [email protected]
S: Supported
+B: https://github.com/ClangBuiltLinux/linux/issues
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/clang/pgo
F: Documentation/dev-tools/pgo.rst
F: kernel/pgo/

2021-06-01 17:35:45

by Nick Desaulniers

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, May 31, 2021 at 2:12 PM Nathan Chancellor <[email protected]> wrote:
> Would this be appropriate to send?
>
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c45613c30803..0d03f6ccdb70 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14378,9 +14378,13 @@ F: include/uapi/linux/personality.h
> PGO BASED KERNEL PROFILING
> M: Sami Tolvanen <[email protected]>
> M: Bill Wendling <[email protected]>
> +M: Kees Cook <[email protected]>
> R: Nathan Chancellor <[email protected]>
> R: Nick Desaulniers <[email protected]>
> +L: [email protected]
> S: Supported
> +B: https://github.com/ClangBuiltLinux/linux/issues
> +T: git git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/clang/pgo
> F: Documentation/dev-tools/pgo.rst
> F: kernel/pgo/
>

I think so.
Acked-by: Nick Desaulniers <[email protected]>
--
Thanks,
~Nick Desaulniers

2021-06-01 18:54:43

by Kees Cook

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, May 31, 2021 at 02:12:46PM -0700, Nathan Chancellor wrote:
> On Wed, May 19, 2021 at 02:37:26PM -0700, Kees Cook wrote:
> > I've added this patch to my -next tree now:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=for-next/clang/pgo&id=e1af496cbe9b4517428601a4e44fee3602dd3c15
> >
>
> Would this be appropriate to send? Someone sent some patches based on
> this work so it would be nice to solidify how they will get to Linus
> if/when the time comes :)

Yeah, good idea.

> https://lore.kernel.org/r/[email protected]/
> https://lore.kernel.org/r/[email protected]/
> https://lore.kernel.org/r/[email protected]/
> https://lore.kernel.org/r/[email protected]/
> https://lore.kernel.org/r/[email protected]/
> https://lore.kernel.org/r/[email protected]/

BTW, Jarmo, if you haven't had this suggested yet, I'd recommend using
this kind of a script for your email sending workflow to get a set of
threaded patches:

#!/bin/sh
set -x

MYSELF="Your Name <[email protected]>"
prefix="PATCH"
# or
#prefix="PATCH v2"
# etc...
SHA="SHA your series is based on"


format_args="--cover-letter -n -o outgoing/"
maint_args="--norolestats"

mkdir -p outgoing
git format-patch $format_args --subject-prefix "$prefix" "$SHA"

./scripts/checkpatch.pl "$@" --codespell outgoing/0*patch

${EDITOR:-vi} outgoing/*

# Send patches
git send-email --transfer-encoding=8bit --8bit-encoding=UTF-8 \
	--no-chain-reply-to --thread \
	--from="$MYSELF" --cc="$MYSELF" \
	--to-cmd="./scripts/get_maintainer.pl $maint_args -m" \
	--cc-cmd="./scripts/get_maintainer.pl $maint_args --nom" \
	outgoing/*


>
> Cheers,
> Nathan
>
> ======================================
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c45613c30803..0d03f6ccdb70 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14378,9 +14378,13 @@ F: include/uapi/linux/personality.h
> PGO BASED KERNEL PROFILING
> M: Sami Tolvanen <[email protected]>
> M: Bill Wendling <[email protected]>
> +M: Kees Cook <[email protected]>
> R: Nathan Chancellor <[email protected]>
> R: Nick Desaulniers <[email protected]>
> +L: [email protected]
> S: Supported
> +B: https://github.com/ClangBuiltLinux/linux/issues
> +T: git git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/clang/pgo

I think I'm going to keep things combined in a single tree for now since the patch rate
is low:

+T: git git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/clang/features

> F: Documentation/dev-tools/pgo.rst
> F: kernel/pgo/
>

I should likely do the same entry for CFI.

--
Kees Cook

2021-06-12 17:06:40

by Peter Zijlstra

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Wed, Apr 07, 2021 at 02:17:04PM -0700, Bill Wendling wrote:
> From: Sami Tolvanen <[email protected]>
>
> Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> profile, the kernel is instrumented with PGO counters, a representative
> workload is run, and the raw profile data is collected from
> /sys/kernel/debug/pgo/profraw.
>
> The raw profile data must be processed by clang's "llvm-profdata" tool
> before it can be used during recompilation:
>
> $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
> Multiple raw profiles may be merged during this step.
>
> The data can now be used by the compiler:
>
> $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
>
> This initial submission is restricted to x86, as that's the platform we
> know works. This restriction can be lifted once other platforms have
> been verified to work with PGO.

*sigh*, and not a single x86 person on Cc, how nice :-/

> Note that this method of profiling the kernel is clang-native, unlike
> the clang support in kernel/gcov.
>
> [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization

Also, and I don't see this answered *anywhere*, why are you not using
perf for this? Your link even mentions Sampling Profilers (and I happen
to know there's been significant effort to make perf output work as
input for the PGO passes of the various compilers).

> Signed-off-by: Sami Tolvanen <[email protected]>
> Co-developed-by: Bill Wendling <[email protected]>
> Signed-off-by: Bill Wendling <[email protected]>
> Tested-by: Nick Desaulniers <[email protected]>
> Reviewed-by: Nick Desaulniers <[email protected]>
> Reviewed-by: Fangrui Song <[email protected]>
> ---
> Documentation/dev-tools/index.rst | 1 +
> Documentation/dev-tools/pgo.rst | 127 +++++++++
> MAINTAINERS | 9 +
> Makefile | 3 +
> arch/Kconfig | 1 +
> arch/x86/Kconfig | 1 +
> arch/x86/boot/Makefile | 1 +
> arch/x86/boot/compressed/Makefile | 1 +
> arch/x86/crypto/Makefile | 4 +
> arch/x86/entry/vdso/Makefile | 1 +
> arch/x86/kernel/vmlinux.lds.S | 2 +
> arch/x86/platform/efi/Makefile | 1 +
> arch/x86/purgatory/Makefile | 1 +
> arch/x86/realmode/rm/Makefile | 1 +
> arch/x86/um/vdso/Makefile | 1 +
> drivers/firmware/efi/libstub/Makefile | 1 +
> include/asm-generic/vmlinux.lds.h | 34 +++
> kernel/Makefile | 1 +
> kernel/pgo/Kconfig | 35 +++
> kernel/pgo/Makefile | 5 +
> kernel/pgo/fs.c | 389 ++++++++++++++++++++++++++
> kernel/pgo/instrument.c | 189 +++++++++++++
> kernel/pgo/pgo.h | 203 ++++++++++++++
> scripts/Makefile.lib | 10 +
> 24 files changed, 1022 insertions(+)
> create mode 100644 Documentation/dev-tools/pgo.rst
> create mode 100644 kernel/pgo/Kconfig
> create mode 100644 kernel/pgo/Makefile
> create mode 100644 kernel/pgo/fs.c
> create mode 100644 kernel/pgo/instrument.c
> create mode 100644 kernel/pgo/pgo.h

> --- a/Makefile
> +++ b/Makefile
> @@ -660,6 +660,9 @@ endif # KBUILD_EXTMOD
> # Defaults to vmlinux, but the arch makefile usually adds further targets
> all: vmlinux
>
> +CFLAGS_PGO_CLANG := -fprofile-generate
> +export CFLAGS_PGO_CLANG
> +
> CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
> $(call cc-option,-fno-tree-loop-im) \
> $(call cc-disable-warning,maybe-uninitialized,)

And which of the many flags in noinstr disables this?

Basically I would like to NAK this whole thing until someone can
adequately explain the interaction with noinstr and why we need those
many lines of kernel code and can't simply use perf for this.

2021-06-12 17:30:01

by Bill Wendling

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Sat, Jun 12, 2021 at 9:59 AM Peter Zijlstra <[email protected]> wrote:
>
> On Wed, Apr 07, 2021 at 02:17:04PM -0700, Bill Wendling wrote:
> > From: Sami Tolvanen <[email protected]>
> >
> > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > profile, the kernel is instrumented with PGO counters, a representative
> > workload is run, and the raw profile data is collected from
> > /sys/kernel/debug/pgo/profraw.
> >
> > The raw profile data must be processed by clang's "llvm-profdata" tool
> > before it can be used during recompilation:
> >
> > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >
> > Multiple raw profiles may be merged during this step.
> >
> > The data can now be used by the compiler:
> >
> > $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> >
> > This initial submission is restricted to x86, as that's the platform we
> > know works. This restriction can be lifted once other platforms have
> > been verified to work with PGO.
>
> *sigh*, and not a single x86 person on Cc, how nice :-/
>
This tool is generic and, despite the fact that it's first enabled for
x86, it contains no x86-specific code. The reason we're restricting it
to x86 is because it's the platform we tested on.

> > Note that this method of profiling the kernel is clang-native, unlike
> > the clang support in kernel/gcov.
> >
> > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>
> Also, and I don't see this answered *anywhere*, why are you not using
> perf for this? Your link even mentions Sampling Profilers (and I happen
> to know there's been significant effort to make perf output work as
> input for the PGO passes of the various compilers).
>
Instrumentation-based (non-sampling) profiling gives us a better
context-sensitive profile, making PGO more impactful. It's also useful
for coverage, whereas sampling profiles are not.

> > Signed-off-by: Sami Tolvanen <[email protected]>
> > Co-developed-by: Bill Wendling <[email protected]>
> > Signed-off-by: Bill Wendling <[email protected]>
> > Tested-by: Nick Desaulniers <[email protected]>
> > Reviewed-by: Nick Desaulniers <[email protected]>
> > Reviewed-by: Fangrui Song <[email protected]>
> > ---
> > Documentation/dev-tools/index.rst | 1 +
> > Documentation/dev-tools/pgo.rst | 127 +++++++++
> > MAINTAINERS | 9 +
> > Makefile | 3 +
> > arch/Kconfig | 1 +
> > arch/x86/Kconfig | 1 +
> > arch/x86/boot/Makefile | 1 +
> > arch/x86/boot/compressed/Makefile | 1 +
> > arch/x86/crypto/Makefile | 4 +
> > arch/x86/entry/vdso/Makefile | 1 +
> > arch/x86/kernel/vmlinux.lds.S | 2 +
> > arch/x86/platform/efi/Makefile | 1 +
> > arch/x86/purgatory/Makefile | 1 +
> > arch/x86/realmode/rm/Makefile | 1 +
> > arch/x86/um/vdso/Makefile | 1 +
> > drivers/firmware/efi/libstub/Makefile | 1 +
> > include/asm-generic/vmlinux.lds.h | 34 +++
> > kernel/Makefile | 1 +
> > kernel/pgo/Kconfig | 35 +++
> > kernel/pgo/Makefile | 5 +
> > kernel/pgo/fs.c | 389 ++++++++++++++++++++++++++
> > kernel/pgo/instrument.c | 189 +++++++++++++
> > kernel/pgo/pgo.h | 203 ++++++++++++++
> > scripts/Makefile.lib | 10 +
> > 24 files changed, 1022 insertions(+)
> > create mode 100644 Documentation/dev-tools/pgo.rst
> > create mode 100644 kernel/pgo/Kconfig
> > create mode 100644 kernel/pgo/Makefile
> > create mode 100644 kernel/pgo/fs.c
> > create mode 100644 kernel/pgo/instrument.c
> > create mode 100644 kernel/pgo/pgo.h
>
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -660,6 +660,9 @@ endif # KBUILD_EXTMOD
> > # Defaults to vmlinux, but the arch makefile usually adds further targets
> > all: vmlinux
> >
> > +CFLAGS_PGO_CLANG := -fprofile-generate
> > +export CFLAGS_PGO_CLANG
> > +
> > CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
> > $(call cc-option,-fno-tree-loop-im) \
> > $(call cc-disable-warning,maybe-uninitialized,)
>
> And which of the many flags in noinstr disables this?
>
These flags aren't used with PGO. So there's no need to disable them.

> Basically I would like to NAK this whole thing until someone can
> adequately explain the interaction with noinstr and why we need those
> many lines of kernel code and can't simply use perf for this.

-bw

2021-06-12 18:17:37

by Peter Zijlstra

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Sat, Jun 12, 2021 at 10:25:57AM -0700, Bill Wendling wrote:
> On Sat, Jun 12, 2021 at 9:59 AM Peter Zijlstra <[email protected]> wrote:
> >
> > On Wed, Apr 07, 2021 at 02:17:04PM -0700, Bill Wendling wrote:
> > > From: Sami Tolvanen <[email protected]>
> > >
> > > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > > profile, the kernel is instrumented with PGO counters, a representative
> > > workload is run, and the raw profile data is collected from
> > > /sys/kernel/debug/pgo/profraw.
> > >
> > > The raw profile data must be processed by clang's "llvm-profdata" tool
> > > before it can be used during recompilation:
> > >
> > > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > >
> > > Multiple raw profiles may be merged during this step.
> > >
> > > The data can now be used by the compiler:
> > >
> > > $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > >
> > > This initial submission is restricted to x86, as that's the platform we
> > > know works. This restriction can be lifted once other platforms have
> > > been verified to work with PGO.
> >
> > *sigh*, and not a single x86 person on Cc, how nice :-/
> >
> This tool is generic and, despite the fact that it's first enabled for
> x86, it contains no x86-specific code. The reason we're restricting it
> to x86 is because it's the platform we tested on.

You're modifying a lot of x86 files, you don't think it's good to let us
know? Worse, afaict this -fprofile-generate changes code generation,
and we definitely want to know about that.

> > > arch/x86/Kconfig | 1 +
> > > arch/x86/boot/Makefile | 1 +
> > > arch/x86/boot/compressed/Makefile | 1 +
> > > arch/x86/crypto/Makefile | 4 +
> > > arch/x86/entry/vdso/Makefile | 1 +
> > > arch/x86/kernel/vmlinux.lds.S | 2 +
> > > arch/x86/platform/efi/Makefile | 1 +
> > > arch/x86/purgatory/Makefile | 1 +
> > > arch/x86/realmode/rm/Makefile | 1 +
> > > arch/x86/um/vdso/Makefile | 1 +


> > > +CFLAGS_PGO_CLANG := -fprofile-generate
> > > +export CFLAGS_PGO_CLANG

> > And which of the many flags in noinstr disables this?
> >
> These flags aren't used with PGO. So there's no need to disable them.

Supposedly -fprofile-generate adds instrumentation to the generated
code. noinstr *MUST* disable that. If not, this is a complete
non-starter for x86.

> > Also, and I don't see this answered *anywhere*, why are you not using
> > perf for this? Your link even mentions Sampling Profilers (and I happen
> > to know there's been significant effort to make perf output work as
> > input for the PGO passes of the various compilers).
> >
> Instrumentation-based (non-sampling) profiling gives us a better
> context-sensitive profile, making PGO more impactful. It's also useful
> for coverage, whereas sampling profiles are not.

We've got KCOV and GCOV support already. Coverage is also not an
argument mentioned anywhere else. Coverage can go pound sand, we really
don't need a third means of getting that.

Do you have actual numbers that back up the sampling vs instrumented
argument? Having the instrumentation will affect performance which can
skew the profile just the same.

Also, sampling tends to capture the hot spots very well.

2021-06-12 19:14:34

by Bill Wendling

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Sat, Jun 12, 2021 at 11:15 AM Peter Zijlstra <[email protected]> wrote:
>
> On Sat, Jun 12, 2021 at 10:25:57AM -0700, Bill Wendling wrote:
> > On Sat, Jun 12, 2021 at 9:59 AM Peter Zijlstra <[email protected]> wrote:
> > >
> > > On Wed, Apr 07, 2021 at 02:17:04PM -0700, Bill Wendling wrote:
> > > > From: Sami Tolvanen <[email protected]>
> > > >
> > > > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > > > profile, the kernel is instrumented with PGO counters, a representative
> > > > workload is run, and the raw profile data is collected from
> > > > /sys/kernel/debug/pgo/profraw.
> > > >
> > > > The raw profile data must be processed by clang's "llvm-profdata" tool
> > > > before it can be used during recompilation:
> > > >
> > > > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > >
> > > > Multiple raw profiles may be merged during this step.
> > > >
> > > > The data can now be used by the compiler:
> > > >
> > > > $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > > >
> > > > This initial submission is restricted to x86, as that's the platform we
> > > > know works. This restriction can be lifted once other platforms have
> > > > been verified to work with PGO.
> > >
> > > *sigh*, and not a single x86 person on Cc, how nice :-/
> > >
> > This tool is generic and, despite the fact that it's first enabled for
> > x86, it contains no x86-specific code. The reason we're restricting it
> > to x86 is because it's the platform we tested on.
>
> You're modifying a lot of x86 files, you don't think it's good to let us
> know? Worse, afaict this -fprofile-generate changes code generation,
> and we definitely want to know about that.
>
I got the list of people to add from scripts/get_maintainer.pl.
The files you list below are mostly changes in Makefiles, so it added
the kbuild maintainers and list. There's a small change to the linker
script to add the clang PGO data section, which is defined in
"include/asm-generic/vmlinux.lds.h". Using the initial "kernel/gcov"
implementation as a guideline
(2521f2c228ad750701ba4702484e31d876dbc386), there's one Intel person
CC'ed, but he didn't sign off on it. These patches have been available
for review for months now, posted to all of the lists and CC'ed to the
people from scripts/get_maintainer.pl. Perhaps that program should be
improved?

> > > > arch/x86/Kconfig | 1 +
> > > > arch/x86/boot/Makefile | 1 +
> > > > arch/x86/boot/compressed/Makefile | 1 +
> > > > arch/x86/crypto/Makefile | 4 +
> > > > arch/x86/entry/vdso/Makefile | 1 +
> > > > arch/x86/kernel/vmlinux.lds.S | 2 +
> > > > arch/x86/platform/efi/Makefile | 1 +
> > > > arch/x86/purgatory/Makefile | 1 +
> > > > arch/x86/realmode/rm/Makefile | 1 +
> > > > arch/x86/um/vdso/Makefile | 1 +
>
>
> > > > +CFLAGS_PGO_CLANG := -fprofile-generate
> > > > +export CFLAGS_PGO_CLANG
>
> > > And which of the many flags in noinstr disables this?
> > >
> > These flags aren't used with PGO. So there's no need to disable them.
>
> Supposedly -fprofile-generate adds instrumentation to the generated
> code. noinstr *MUST* disable that. If not, this is a complete
> non-starter for x86.

"noinstr" has "notrace", which is defined as
"__attribute__((__no_instrument_function__))", which is honored by
both gcc and clang.

> > > Also, and I don't see this answered *anywhere*, why are you not using
> > > perf for this? Your link even mentions Sampling Profilers (and I happen
> > > to know there's been significant effort to make perf output work as
> > > input for the PGO passes of the various compilers).
> > >
> > Instrumentation-based (non-sampling) profiling gives us a better
> > context-sensitive profile, making PGO more impactful. It's also useful
> > for coverage, whereas sampling profiles are not.
>
> We've got KCOV and GCOV support already. Coverage is also not an
> argument mentioned anywhere else. Coverage can go pound sand, we really
> don't need a third means of getting that.
>
Those aren't useful for clang-based implementations. And I like to
look forward to potential improvements.

> Do you have actual numbers that back up the sampling vs instrumented
> argument? Having the instrumentation will affect performance which can
> > skew the profile just the same.
>
Instrumentation counts the number of times a branch is taken. Sampling
is at a gross level, where if the sampling time is fine enough, you
can get an idea of where the hot spots are, but it won't give you the
fine-grained information that clang finds useful. Essentially, while
sampling can "capture the hot spots very well", relying solely on
sampling is basically leaving optimization on the floor.

Our optimization experts here have determined, through data of
course, that instrumentation is the best option for PGO.

> Also, sampling tends to capture the hot spots very well.


-bw

2021-06-12 19:37:56

by Bill Wendling

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Sat, Jun 12, 2021 at 12:10 PM Bill Wendling <[email protected]> wrote:
> On Sat, Jun 12, 2021 at 11:15 AM Peter Zijlstra <[email protected]> wrote:
> > On Sat, Jun 12, 2021 at 10:25:57AM -0700, Bill Wendling wrote:
> > > On Sat, Jun 12, 2021 at 9:59 AM Peter Zijlstra <[email protected]> wrote:
> > > >
> > > > On Wed, Apr 07, 2021 at 02:17:04PM -0700, Bill Wendling wrote:
> > > > > From: Sami Tolvanen <[email protected]>
> > > > >
> > > > > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > > > > profile, the kernel is instrumented with PGO counters, a representative
> > > > > workload is run, and the raw profile data is collected from
> > > > > /sys/kernel/debug/pgo/profraw.
> > > > >
> > > > > The raw profile data must be processed by clang's "llvm-profdata" tool
> > > > > before it can be used during recompilation:
> > > > >
> > > > > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > > > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > > >
> > > > > Multiple raw profiles may be merged during this step.
> > > > >
> > > > > The data can now be used by the compiler:
> > > > >
> > > > > $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> > > > >
> > > > > This initial submission is restricted to x86, as that's the platform we
> > > > > know works. This restriction can be lifted once other platforms have
> > > > > been verified to work with PGO.
> > > >
> > > > *sigh*, and not a single x86 person on Cc, how nice :-/
> > > >
> > > This tool is generic and, despite the fact that it's first enabled for
> > > x86, it contains no x86-specific code. The reason we're restricting it
> > > to x86 is because it's the platform we tested on.
> >
> > You're modifying a lot of x86 files, you don't think it's good to let us
> > know? Worse, afaict this -fprofile-generate changes code generation,
> > and we definitely want to know about that.
> >
> I got the list of people to add from scripts/get_maintainer.pl.
> The files you list below are mostly changes in Makefiles, so it added
> the kbuild maintainers and list. There's a small change to the linker
> script to add the clang PGO data section, which is defined in
> "include/asm-generic/vmlinux.lds.h". Using the initial "kernel/gcov"
> implementation as a guideline
> (2521f2c228ad750701ba4702484e31d876dbc386), there's one Intel person
> CC'ed, but he didn't sign off on it. These patches have been available
> for review for months now, posted to all of the lists and CC'ed to the
> people from scripts/get_maintainer.pl. Perhaps that program should be
> improved?
>
Correction: I see now that it lists X86 maintainers. That was somehow
missed in my initial submission. Sorry about that. Please add any
reviewers you think are necessary.

-bw

2021-06-12 20:28:06

by Fangrui Song

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On 2021-06-12, Peter Zijlstra wrote:
>On Sat, Jun 12, 2021 at 10:25:57AM -0700, Bill Wendling wrote:
>> On Sat, Jun 12, 2021 at 9:59 AM Peter Zijlstra <[email protected]> wrote:
>> > Also, and I don't see this answered *anywhere*, why are you not using
>> > perf for this? Your link even mentions Sampling Profilers (and I happen
>> > to know there's been significant effort to make perf output work as
>> > input for the PGO passes of the various compilers).
>> >
>> Instrumentation-based (non-sampling) profiling gives us a better
>> context-sensitive profile, making PGO more impactful. It's also useful
>> for coverage, whereas sampling profiles are not.
>
>We've got KCOV and GCOV support already. Coverage is also not an
>argument mentioned anywhere else. Coverage can go pound sand, we really
>don't need a third means of getting that.
>
>Do you have actual numbers that back up the sampling vs instrumented
>argument? Having the instrumentation will affect performance which can
>skew the profile just the same.
>
>Also, sampling tends to capture the hot spots very well.

[I don't do kernel development. My experience is user-space toolchain.]

For applications, I think instrumentation based PGO can be 1%~4% faster
than sample-based PGO (e.g. AutoFDO) on x86.

Sample-based PGO has a CPU requirement (e.g. a Performance Monitoring Unit).
(My gut feeling is that there may be a larger gap between
instrumentation-based PGO and sample-based PGO for aarch64/ppc64, even
though they can use sample-based PGO.)
Instrumentation-based PGO can be ported to more architectures.

In addition, having an infrastructure for instrumentation-based PGO
makes it easy to deploy newer techniques like context-sensitive PGO
(only the compile options change; no new source-level annotation is
needed).

2021-06-12 20:29:26

by Peter Zijlstra

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Sat, Jun 12, 2021 at 12:10:03PM -0700, Bill Wendling wrote:
> > You're modifying a lot of x86 files, you don't think it's good to let us
> > know? Worse, afaict this -fprofile-generate changes code generation,
> > and we definitely want to know about that.
> >
> I got the list of people to add from the scripts/get_maintainer.pl.

$ ./scripts/get_maintainer.pl -f arch/x86/Makefile
Thomas Gleixner <[email protected]> (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
Ingo Molnar <[email protected]> (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
Borislav Petkov <[email protected]> (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
[email protected] (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))

> there's one Intel person CC'ed, but he didn't sign off on it.

Intel does not employ the main x86 maintainers, and even if it did,
mailing a random Google person wouldn't get the mail to you either,
would it?

> These patches were available for review for months now,

Which doesn't help if you don't Cc the right people, does it? *Nobody*
has time to read LKML.

> and posted to all of the lists and CC'ed to the people from
> scripts/get_maintainer.pl. Perhaps that program should be improved?

I suspect operator error, see above.
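
A minimal sketch of how the recipient list could be cross-checked before sending: pull the touched paths out of a diffstat and ask get_maintainer.pl about each one. The function names `diffstat_files` and `maintainers_for_diffstat` are made up for illustration; only `scripts/get_maintainer.pl -f` is taken from the command above, and the loop assumes it is run from the top of a kernel source tree.

```shell
#!/bin/sh
# Hypothetical helper: print the file paths named in a diffstat read from
# stdin. Lines without a "|" separator (such as the "N files changed"
# summary) are skipped.
diffstat_files() {
    awk -F'|' 'NF == 2 {
        gsub(/^[ \t]+|[ \t]+$/, "", $1)  # trim surrounding whitespace
        print $1
    }'
}

# Hypothetical helper: run get_maintainer.pl on every path in a diffstat and
# print each maintainer or list once.
# Assumes the current directory is the top of a kernel source tree.
maintainers_for_diffstat() {
    diffstat_files | while IFS= read -r path; do
        ./scripts/get_maintainer.pl -f "$path"
    done | sort -u
}
```

Feeding the diffstat from a cover letter into `maintainers_for_diffstat` should then list everyone the script associates with the touched files. get_maintainer.pl also accepts a patch file directly as its argument, which is usually the simpler route.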

> > Supposedly -fprofile-generate adds instrumentation to the generated
> > code. noinstr *MUST* disable that. If not, this is a complete
> > non-starter for x86.
>
> "noinstr" has "notrace", which is defined as
> "__attribute__((__no_instrument_function__))", which is honored by
> both gcc and clang.

Yes it is, but is that sufficient in this case? It very much isn't for
KASAN, UBSAN, and a whole host of other instrumentation crud. They all
needed their own 'bugger-off' attributes.
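
One rough way to answer that empirically is to compile a small test file to assembly and count profile-counter updates inside the function in question. The sketch below does only the counting. The helper name `count_counter_updates` is invented here, and it assumes counter symbols contain `__gcov0.` (GCC's gcov counters) or `__profc_` (the prefix clang uses for its PGO counters), so a result of zero is a hint rather than proof.

```shell
#!/bin/sh
# Hypothetical helper: read assembly on stdin and print how many lines inside
# function $1 reference a profile counter symbol.
# Assumption: counters are named "__gcov0.<func>" (GCC gcov) or
# "__profc_<func>" (clang PGO); other instrumentation is not detected.
count_counter_updates() {
    awk -v fn="$1" '
        # A global label such as "no_instr:" starts a new function. Local
        # labels such as ".L3:" begin with a dot and are ignored.
        /^[A-Za-z_][A-Za-z0-9_$]*:/ {
            cur = $1
            sub(/:.*$/, "", cur)
        }
        cur == fn && /__gcov0\.|__profc_/ { n++ }
        END { print n + 0 }
    '
}
```

For instance, `clang -O2 -S -o - -fprofile-generate test.c | count_counter_updates some_function` (with `test.c` and `some_function` as placeholders) prints the number of counter references found in that function.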

> > We've got KCOV and GCOV support already. Coverage is also not an
> > argument mentioned anywhere else. Coverage can go pound sand, we really
> > don't need a third means of getting that.
> >
> Those aren't useful for clang-based implementations. And I like to
> look forward to potential improvements.

I look forward to fewer things doing the same over and over. The obvious
solution is of course to make clang use what we have, not the other way
around.

> > Do you have actual numbers that back up the sampling vs instrumented
> > argument? Having the instrumentation will affect performance which can
> > skew the profile just the same.
> >
> Instrumentation counts the number of times a branch is taken. Sampling
> is at a gross level, where if the sampling time is fine enough, you
> can get an idea of where the hot spots are, but it won't give you the
> fine-grained information that clang finds useful. Essentially, while
> sampling can "capture the hot spots very well", relying solely on
> sampling is basically leaving optimization on the floor.
>
> Our optimization experts here have determined, through data of
> course, that instrumentation is the best option for PGO.

It would be very good to post some of that data and explicit examples.
Hearsay doesn't carry much weight.

2021-06-12 20:33:14

by Peter Zijlstra

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Sat, Jun 12, 2021 at 01:20:15PM -0700, Fangrui Song wrote:

> For applications, I think instrumentation based PGO can be 1%~4% faster
> than sample-based PGO (e.g. AutoFDO) on x86.

Why? What specifically is missed by sample-based? I thought that LBR
augmented samples were very useful for exactly this.

> Sample-based PGO has a CPU requirement (e.g. a Performance Monitoring Unit).
> (My gut feeling is that there may be a larger gap between
> instrumentation-based PGO and sample-based PGO for aarch64/ppc64, even
> though they can use sample-based PGO.)
> Instrumentation-based PGO can be ported to more architectures.

Every architecture that cares about performance had better have a
hardware PMU. Both argh64 and ppc64 have one.

> In addition, having an infrastructure for instrumentation-based PGO
> makes it easy to deploy newer techniques like context-sensitive PGO
> (only the compile options change; no new source-level annotation is
> needed).

What's this context sensitive stuff you speak of? The link provided
earlier is devoid of useful information.

2021-06-12 20:59:25

by Bill Wendling

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Sat, Jun 12, 2021 at 1:25 PM Peter Zijlstra <[email protected]> wrote:
> On Sat, Jun 12, 2021 at 12:10:03PM -0700, Bill Wendling wrote:
> Yes it is, but is that sufficient in this case? It very much isn't for
> KASAN, UBSAN, and a whole host of other instrumentation crud. They all
> needed their own 'bugger-off' attributes.
>
> > > We've got KCOV and GCOV support already. Coverage is also not an
> > > argument mentioned anywhere else. Coverage can go pound sand, we really
> > > don't need a third means of getting that.
> > >
> > Those aren't useful for clang-based implementations. And I like to
> > look forward to potential improvements.
>
> I look forward to fewer things doing the same over and over. The obvious
> solution is of course to make clang use what we have, not the other way
> around.
>
That is not the obvious "solution".

> > > Do you have actual numbers that back up the sampling vs instrumented
> > > argument? Having the instrumentation will affect performance which can
> > > skew the profile just the same.
> > >
> > Instrumentation counts the number of times a branch is taken. Sampling
> > is at a gross level, where if the sampling time is fine enough, you
> > can get an idea of where the hot spots are, but it won't give you the
> > fine-grained information that clang finds useful. Essentially, while
> > sampling can "capture the hot spots very well", relying solely on
> > sampling is basically leaving optimization on the floor.
> >
> > Our optimization experts here have determined, through data of
> > course, that instrumentation is the best option for PGO.
>
> It would be very good to post some of that data and explicit examples.
> Hearsay doesn't carry much weight.

Should I add measurements from waving a dead chicken over my keyboard?
I heard somewhere that that works as well. Or how about a feature that
hasn't been integrated yet, like using the perf tool apparently? I'm
sure that will be worth my time. You can't just come up with a
potential, unimplemented alternative (gcov is still a thing and not
using "perf") and expect people to dance to your tune.

I could give you numbers, but they would mean nothing to you, and I
suspect that you would reject them out of hand because it may not
benefit *everything*. The nature of FDO/PGO is that it's targeted to
specific tasks.

For example, Fangrui gave you numbers, and you rejected them out of
hand. I've explained to you why instrumentation is better than
sampling (at least for clang). Fangrui gave you numbers. Let's move on
to something else.

Now, for the "noinstr" issue. I'll see if we need an additional change for that.

-bw

2021-06-12 22:59:09

by Bill Wendling

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Sat, Jun 12, 2021 at 1:56 PM Bill Wendling <[email protected]> wrote:
> On Sat, Jun 12, 2021 at 1:25 PM Peter Zijlstra <[email protected]> wrote:
> > On Sat, Jun 12, 2021 at 12:10:03PM -0700, Bill Wendling wrote:
> > Yes it is, but is that sufficient in this case? It very much isn't for
> > KASAN, UBSAN, and a whole host of other instrumentation crud. They all
> > needed their own 'bugger-off' attributes.
> >
> Now, for the "noinstr" issue. I'll see if we need an additional change for that.
>
The GCOV implementation disables profiling in those directories where
instrumentation would fail. We do the same. Both clang and gcc seem to
treat the no_instrument_function attribute similarly.

-bw

2021-06-13 18:11:08

by Bill Wendling

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Sat, Jun 12, 2021 at 3:47 PM Bill Wendling <[email protected]> wrote:
>
> On Sat, Jun 12, 2021 at 1:56 PM Bill Wendling <[email protected]> wrote:
> > On Sat, Jun 12, 2021 at 1:25 PM Peter Zijlstra <[email protected]> wrote:
> > > On Sat, Jun 12, 2021 at 12:10:03PM -0700, Bill Wendling wrote:
> > > Yes it is, but is that sufficient in this case? It very much isn't for
> > > KASAN, UBSAN, and a whole host of other instrumentation crud. They all
> > > needed their own 'bugger-off' attributes.
> > >
> > Now, for the "noinstr" issue. I'll see if we need an additional change for that.
> >
> The GCOV implementation disables profiling in those directories where
> instrumentation would fail. We do the same. Both clang and gcc seem to
> treat the no_instrument_function attribute similarly.
>
An example:

$ cat n.c
int g(int);

int __attribute__((__no_instrument_function__))
__attribute__((no_instrument_function))
no_instr(int a) {
    int sum = 0;
    for (int i = 0; i < a; i++)
        sum += g(i);
    return sum;
}

int instr(int a) {
    int sum = 0;
    for (int i = 0; i < a; i++)
        sum += g(i);
    return sum;
}

$ gcc -S -o - n.c -fprofile-arcs -ftest-coverage -O2
.globl no_instr
.type no_instr, @function
no_instr:
.LFB0:
...
addq $1, __gcov0.no_instr(%rip)
pushq %rbp
...
.L3:
...
addq $1, 8+__gcov0.no_instr(%rip)
...
addq $1, 16+__gcov0.no_instr(%rip)
...
addq $1, 16+__gcov0.no_instr(%rip)
...
ret
.globl instr
.type instr, @function
instr:
.LFB1:
...
addq $1, __gcov0.instr(%rip)
...
addq $1, 8+__gcov0.instr(%rip)
...
addq $1, 16+__gcov0.instr(%rip)
...
addq $1, 16+__gcov0.instr(%rip)
...
ret

$ clang -S -o - n.c -fprofile-generate -O2
.globl no_instr # -- Begin function no_instr
.p2align 4, 0x90
.type no_instr,@function
no_instr: # @no_instr
...
addq $1, .L__profc_no_instr+8(%rip)
...
movq .L__profc_no_instr(%rip), %rax
...
movq %rax, .L__profc_no_instr(%rip)
...
retq
.globl instr # -- Begin function instr
.p2align 4, 0x90
.type instr,@function
instr: # @instr
...
addq $1, .L__profc_instr+8(%rip)
...
movq .L__profc_instr(%rip), %rax
...
movq %rax, .L__profc_instr(%rip)
...
retq
.Lfunc_end1:

2021-06-14 07:53:38

by Peter Zijlstra

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Sat, Jun 12, 2021 at 01:56:41PM -0700, Bill Wendling wrote:
> For example, Fangrui gave you numbers, and you rejected them out of
> hand. I've explained to you why instrumentation is better than
> sampling (at least for clang). Fangrui gave you numbers. Let's move on
> to something else.

I did not dismiss them; I asked for clarification. I would like to
understand what exactly is missed by sampling based PGO data that makes
such a difference.

2021-06-14 09:03:17

by Peter Zijlstra

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Sat, Jun 12, 2021 at 01:56:41PM -0700, Bill Wendling wrote:
> On Sat, Jun 12, 2021 at 1:25 PM Peter Zijlstra <[email protected]> wrote:
> > On Sat, Jun 12, 2021 at 12:10:03PM -0700, Bill Wendling wrote:
> > Yes it is, but is that sufficient in this case? It very much isn't for
> > KASAN, UBSAN, and a whole host of other instrumentation crud. They all
> > needed their own 'bugger-off' attributes.
> >
> > > > We've got KCOV and GCOV support already. Coverage is also not an
> > > > argument mentioned anywhere else. Coverage can go pound sand, we really
> > > > don't need a third means of getting that.
> > > >
> > > Those aren't useful for clang-based implementations. And I like to
> > > look forward to potential improvements.
> >
> > I look forward to fewer things doing the same over and over. The obvious
> > solution is of course to make clang use what we have, not the other way
> > around.
> >
> That is not the obvious "solution".

Because having GCOV, KCOV and PGO all do essentially the same thing
differently, makes heaps of sense?

I understand that the compilers actually generate radically different
instrumentation for the various cases, but essentially they're all
collecting (function/branch) arcs.

I'm thinking it might be about time to build _one_ infrastructure for
that and define a kernel arc format and call it a day.

Note that if your compiler does arcs with functions (like gcc, unlike
clang) we can also trivially augment the arcs with PMU counter data. I
once did that for userspace.

2021-06-14 09:42:46

by Bill Wendling

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jun 14, 2021 at 2:01 AM Peter Zijlstra <[email protected]> wrote:
> On Sat, Jun 12, 2021 at 01:56:41PM -0700, Bill Wendling wrote:
> > On Sat, Jun 12, 2021 at 1:25 PM Peter Zijlstra <[email protected]> wrote:
> > > On Sat, Jun 12, 2021 at 12:10:03PM -0700, Bill Wendling wrote:
> > > Yes it is, but is that sufficient in this case? It very much isn't for
> > > KASAN, UBSAN, and a whole host of other instrumentation crud. They all
> > > needed their own 'bugger-off' attributes.
> > >
> > > > > We've got KCOV and GCOV support already. Coverage is also not an
> > > > > argument mentioned anywhere else. Coverage can go pound sand, we really
> > > > > don't need a third means of getting that.
> > > > >
> > > > Those aren't useful for clang-based implementations. And I like to
> > > > look forward to potential improvements.
> > >
> > > I look forward to fewer things doing the same over and over. The obvious
> > > solution is of course to make clang use what we have, not the other way
> > > around.
> > >
> > That is not the obvious "solution".
>
> Because having GCOV, KCOV and PGO all do essentially the same thing
> differently, makes heaps of sense?
>
It does when you're dealing with one toolchain without access to another.

> I understand that the compilers actually generate radically different
> instrumentation for the various cases, but essentially they're all
> collecting (function/branch) arcs.
>
That's true, but there's no one format for profiling data that's
usable between all compilers. I'm not even sure there's a good way to
translate between, say, gcov and llvm's format. To make matters more
complicated, each compiler's format is tightly coupled to a specific
version of that compiler. And depending on *how* the data is collected
(e.g. sampling or instrumentation), it may not give us the full
benefit of FDO/PGO.

> I'm thinking it might be about time to build _one_ infrastructure for
> that and define a kernel arc format and call it a day.
>
That may be nice, but it's a rather large request.

> Note that if your compiler does arcs with functions (like gcc, unlike
> clang) we can also trivially augment the arcs with PMU counter data. I
> once did that for userspace.

2021-06-14 09:45:28

by Peter Zijlstra

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Sun, Jun 13, 2021 at 11:07:26AM -0700, Bill Wendling wrote:

> > > Now, for the "noinstr" issue. I'll see if we need an additional change for that.
> > >
> > The GCOV implementation disables profiling in those directories where
> > instrumentation would fail. We do the same. Both clang and gcc seem to
> > treat the no_instrument_function attribute similarly.

Both seem to emit instrumentation, so they're both, similarly, *broken*.

noinstr *MUST* disable all compiler generated instrumentation. Also see:

https://lkml.kernel.org/r/[email protected]

I'll go mark GCOV support as BROKEN for x86.

2021-06-14 10:22:19

by Peter Zijlstra

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jun 14, 2021 at 11:43:12AM +0200, Peter Zijlstra wrote:
> On Sun, Jun 13, 2021 at 11:07:26AM -0700, Bill Wendling wrote:
>
> > > > Now, for the "noinstr" issue. I'll see if we need an additional change for that.
> > > >
> > > The GCOV implementation disables profiling in those directories where
> > > instrumentation would fail. We do the same. Both clang and gcc seem to
> > > treat the no_instrument_function attribute similarly.
>
> Both seem to emit instrumentation, so they're both, similarly, *broken*.
>
> noinstr *MUST* disable all compiler generated instrumentation. Also see:
>
> https://lkml.kernel.org/r/[email protected]
>
> I'll go mark GCOV support as BROKEN for x86.

https://lkml.kernel.org/r/YMcssV/[email protected]

2021-06-14 10:59:31

by Peter Zijlstra

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jun 14, 2021 at 02:39:41AM -0700, Bill Wendling wrote:
> On Mon, Jun 14, 2021 at 2:01 AM Peter Zijlstra <[email protected]> wrote:

> > Because having GCOV, KCOV and PGO all do essentially the same thing
> > differently, makes heaps of sense?
> >
> It does when you're dealing with one toolchain without access to another.

Here's a sekrit, don't tell anyone, but you can get a free copy of GCC
right here:

https://gcc.gnu.org/

We also have this linux-toolchains list (Cc'ed now) that contains folks
from both sides.

> > I understand that the compilers actually generate radically different
> > instrumentation for the various cases, but essentially they're all
> > collecting (function/branch) arcs.
> >
> That's true, but there's no one format for profiling data that's
> usable between all compilers. I'm not even sure there's a good way to
> translate between, say, gcov and llvm's format. To make matters more
> complicated, each compiler's format is tightly coupled to a specific
> version of that compiler. And depending on *how* the data is collected
> (e.g. sampling or instrumentation), it may not give us the full
> benefit of FDO/PGO.

I'm thinking that something simple like:

struct arc {
        u64 from;
        u64 to;
        u64 nr;
        u64 cntrs[0];
};

goes a very long way. Stick a header on that says how large cntrs[] is,
and some other data (like load offset and whatnot) and you should be
good.

Combine that with the executable image (say /proc/kcore) to recover
what's @from (call, jmp or conditional branch) and I'm thinking one
ought to be able to construct lots of useful data.
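
To make the idea concrete, here is a minimal user-space sketch of such a
record stream. It is only an illustration under stated assumptions: the
arc_header layout, its field names, and the helpers arc_record_size(),
arc_at() and arc_total_from() are all hypothetical, uint64_t stands in for
u64, and a C99 flexible array member stands in for cntrs[0].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical header (an assumption, not part of the proposal beyond
 * "how large cntrs[] is" and "load offset").
 */
struct arc_header {
	uint64_t nr_cntrs;    /* number of cntrs[] entries in every arc */
	uint64_t load_offset; /* runtime load offset of the image */
	uint64_t nr_arcs;     /* number of arc records that follow */
};

/* The record from the proposal, with a flexible array member. */
struct arc {
	uint64_t from;
	uint64_t to;
	uint64_t nr;
	uint64_t cntrs[];
};

/* Size in bytes of one record, including its trailing counters. */
static size_t arc_record_size(const struct arc_header *hdr)
{
	return sizeof(struct arc) + hdr->nr_cntrs * sizeof(uint64_t);
}

/* Return the idx-th record of a packed, 8-byte aligned record buffer. */
static const struct arc *arc_at(const struct arc_header *hdr,
				const void *records, uint64_t idx)
{
	assert(idx < hdr->nr_arcs);
	return (const struct arc *)((const char *)records +
				    idx * arc_record_size(hdr));
}

/* Sum the traversal counts of all arcs that leave address 'from'. */
static uint64_t arc_total_from(const struct arc_header *hdr,
			       const void *records, uint64_t from)
{
	uint64_t total = 0;

	for (uint64_t i = 0; i < hdr->nr_arcs; i++) {
		const struct arc *a = arc_at(hdr, records, i);

		if (a->from == from)
			total += a->nr;
	}
	return total;
}
```

A consumer would read the header first, then step through the records with
arc_at(); dividing one arc's nr by arc_total_from() for its @from address
gives a branch probability for that edge.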

I've also been led to believe that the KCOV data format is not in fact
dependent on which toolchain is used.

> > I'm thinking it might be about time to build _one_ infrastructure for
> > that and define a kernel arc format and call it a day.
> >
> That may be nice, but it's a rather large request.

Given GCOV just died, perhaps you can look at what KCOV does and see if
that can be extended to do as you want. KCOV is actively used and
we actually tripped over all the fun little noinstr bugs at the time.

2021-06-14 11:46:14

by Bill Wendling

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jun 14, 2021 at 3:45 AM Peter Zijlstra <[email protected]> wrote:
> On Mon, Jun 14, 2021 at 02:39:41AM -0700, Bill Wendling wrote:
> > On Mon, Jun 14, 2021 at 2:01 AM Peter Zijlstra <[email protected]> wrote:
>
> > > Because having GCOV, KCOV and PGO all do essentially the same thing
> > > differently, makes heaps of sense?
> > >
> > It does when you're dealing with one toolchain without access to another.
>
> Here's a sekrit, don't tell anyone, but you can get a free copy of GCC
> right here:
>
> https://gcc.gnu.org/
>
> We also have this linux-toolchains list (Cc'ed now) that contains folks
> from both sides.
>
Your sarcasm is not useful.

> > > I understand that the compilers actually generate radically different
> > > instrumentation for the various cases, but essentially they're all
> > > collecting (function/branch) arcs.
> > >
> > That's true, but there's no one format for profiling data that's
> > usable between all compilers. I'm not even sure there's a good way to
> > translate between, say, gcov and llvm's format. To make matters more
> > complicated, each compiler's format is tightly coupled to a specific
> > version of that compiler. And depending on *how* the data is collected
> > (e.g. sampling or instrumentation), it may not give us the full
> > benefit of FDO/PGO.
>
> I'm thinking that something simple like:
>
> struct arc {
>         u64 from;
>         u64 to;
>         u64 nr;
>         u64 cntrs[0];
> };
>
> goes a very long way. Stick a header on that says how large cntrs[] is,
> and some other data (like load offset and whatnot) and you should be
> good.
>
> Combine that with the executable image (say /proc/kcore) to recover
> what's @from (call, jmp or conditional branch) and I'm thinking one
> ought to be able to construct lots of useful data.
>
> I've also been led to believe that the KCOV data format is not in fact
> dependent on which toolchain is used.
>
> > > I'm thinking it might be about time to build _one_ infrastructure for
> > > that and define a kernel arc format and call it a day.
> > >
> > That may be nice, but it's a rather large request.
>
> Given GCOV just died, perhaps you can look at what KCOV does and see if
> that can be extended to do as you want. KCOV is actively used and
> we actually tripped over all the fun little noinstr bugs at the time.

2021-06-14 11:46:43

by Bill Wendling

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jun 14, 2021 at 3:45 AM Peter Zijlstra <[email protected]> wrote:
> On Mon, Jun 14, 2021 at 02:39:41AM -0700, Bill Wendling wrote:
> > On Mon, Jun 14, 2021 at 2:01 AM Peter Zijlstra <[email protected]> wrote:
> > > I understand that the compilers actually generate radically different
> > > instrumentation for the various cases, but essentially they're all
> > > collecting (function/branch) arcs.
> > >
> > That's true, but there's no one format for profiling data that's
> > usable between all compilers. I'm not even sure there's a good way to
> > translate between, say, gcov and llvm's format. To make matters more
> > complicated, each compiler's format is tightly coupled to a specific
> > version of that compiler. And depending on *how* the data is collected
> > (e.g. sampling or instrumentation), it may not give us the full
> > benefit of FDO/PGO.
>
> I'm thinking that something simple like:
>
> struct arc {
>         u64 from;
>         u64 to;
>         u64 nr;
>         u64 cntrs[0];
> };
>
> goes a very long way. Stick a header on that says how large cntrs[] is,
> and some other data (like load offset and whatnot) and you should be
> good.
>
> Combine that with the executable image (say /proc/kcore) to recover
> what's @from (call, jmp or conditional branch) and I'm thinking one
> ought to be able to construct lots of useful data.
>
> I've also been led to believe that the KCOV data format is not in fact
> dependent on which toolchain is used.
>
Awesome! I await your RFC on both the gcc and clang mailing lists.

-bw

> > > I'm thinking it might be about time to build _one_ infrastructure for
> > > that and define a kernel arc format and call it a day.
> > >
> > That may be nice, but it's a rather large request.
>
> Given GCOV just died, perhaps you can look at what KCOV does and see if
> that can be extended to do as you want. KCOV is actively used and
> we actually tripped over all the fun little noinstr bugs at the time.

2021-06-14 14:19:41

by Marco Elver

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, 14 Jun 2021 at 12:45, Peter Zijlstra <[email protected]> wrote:
[...]
> I've also been led to believe that the KCOV data format is not in fact
> dependent on which toolchain is used.

Correct, we use KCOV with both gcc and clang. Both gcc and clang emit
the same instrumentation for -fsanitize-coverage. Thus, the user-space
portion and interface is indeed identical:
https://www.kernel.org/doc/html/latest/dev-tools/kcov.html

> > > I'm thinking it might be about time to build _one_ infrastructure for
> > > that and define a kernel arc format and call it a day.
> > >
> > That may be nice, but it's a rather large request.
>
> Given GCOV just died, perhaps you can look at what KCOV does and see if
> that can be extended to do as you want. KCOV is actively used and
> we actually tripped over all the fun little noinstr bugs at the time.

There might be a subtle mismatch between coverage instrumentation for
testing/fuzzing and for profiling. (Disclaimer: I'm not too familiar
with Clang-PGO's requirements.) For example, while for testing/fuzzing
we may only require information if a code-path has been visited, for
profiling the "hotness" might be of interest. Therefore, the
user-space exported data format can make several trade-offs in
complexity.
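
A toy sketch of that distinction, in plain C. This is not KCOV's or clang's
actual data layout; NR_SITES, struct site_data and the helper names are
hypothetical and exist only to show that a "visited" flag and a hit counter
answer different questions about the same instrumented sites.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_SITES 4 /* hypothetical number of instrumented sites */

struct site_data {
	bool visited[NR_SITES];  /* coverage view: was the site reached? */
	uint64_t hits[NR_SITES]; /* profile view: how often was it reached? */
};

/* Record one execution of an instrumented site in both views. */
static void site_hit(struct site_data *d, unsigned int site)
{
	assert(site < NR_SITES);
	d->visited[site] = true;
	d->hits[site]++;
}

/* Coverage question: how many distinct sites were reached at all? */
static unsigned int sites_covered(const struct site_data *d)
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < NR_SITES; i++)
		n += d->visited[i];
	return n;
}

/* Profile question: which site is hottest? Ties go to the lowest index. */
static unsigned int hottest_site(const struct site_data *d)
{
	unsigned int best = 0;

	for (unsigned int i = 1; i < NR_SITES; i++)
		if (d->hits[i] > d->hits[best])
			best = i;
	return best;
}
```

Two sites reached once and a thousand times look identical in the coverage
view, while only the counters let an optimizer tell them apart.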

In theory, I imagine there's a limit to how generic one could make
profiling information, because one compiler's optimizations are not
another compiler's optimizations. On the other hand, it may be doable
to collect unified profiling information for common stuff, but I guess
there's little motivation for figuring out the common ground given the
producer and consumer of the PGO data is the same compiler by design
(unlike coverage info for testing/fuzzing).

Therefore, if KCOV's exposed information does not match PGO's
requirements today, I'm not sure what realistically can be done
without turning KCOV into a monster. Because KCOV is optimized for
testing/fuzzing coverage, and I'm not sure how complex we can or want
to make it to cater to a new use-case.

My intuition is that the simpler design is to have 2 subsystems for
instrumentation-based coverage collection: one for testing/fuzzing,
and the other for profiling.

Alas, there's the problem of GCOV, which should be replaceable by KCOV
for most use cases. But it would be good to hear from a GCOV user if
there are some.

But as we learned GCOV is broken on x86 now, I see these options:

1. Remove GCOV, make KCOV the de-facto test-coverage collection
subsystem. Introduce PGO-instrumentation subsystem for profile
collection only, and make it _very_ clear that KCOV != PGO data as
hinted above. A pre-requisite is that compiler-support for PGO
instrumentation adds selective instrumentation support, likely just
making attribute no_instrument_function do the right thing.

2. Like (1) but also keep GCOV, given proper support for attribute
no_instrument_function would probably fix it (?).

3. Keep GCOV (and KCOV of course). Somehow extract PGO profiles from KCOV.

4. Somehow extract PGO profiles from GCOV, or modify kernel/gcov to do so.

Thanks.

2021-06-14 15:30:35

by Kees Cook

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jun 14, 2021 at 04:16:16PM +0200, 'Marco Elver' via Clang Built Linux wrote:
> On Mon, 14 Jun 2021 at 12:45, Peter Zijlstra <[email protected]> wrote:
> [...]
> > I've also been led to believe that the KCOV data format is not in fact
> > dependent on which toolchain is used.
>
> Correct, we use KCOV with both gcc and clang. Both gcc and clang emit
> the same instrumentation for -fsanitize-coverage. Thus, the user-space
> portion and interface is indeed identical:
> https://www.kernel.org/doc/html/latest/dev-tools/kcov.html
>
> > > > I'm thinking it might be about time to build _one_ infrastructure for
> > > > that and define a kernel arc format and call it a day.
> > > >
> > > That may be nice, but it's a rather large request.
> >
> > Given GCOV just died, perhaps you can look at what KCOV does and see if
> > that can be extended to do as you want. KCOV is actively used and
> > we actually tripped over all the fun little noinstr bugs at the time.
>
> There might be a subtle mismatch between coverage instrumentation for
> testing/fuzzing and for profiling. (Disclaimer: I'm not too familiar
> with Clang-PGO's requirements.) For example, while for testing/fuzzing
> we may only require information if a code-path has been visited, for
> profiling the "hotness" might be of interest. Therefore, the
> user-space exported data format can make several trade-offs in
> complexity.

This has been my primary take-away: given that Clang's PGO is different
enough from the other things and provides more specific/actionable
results, I think it's justified to exist on its own separate from the
other parts.

> In theory, I imagine there's a limit to how generic one could make
> profiling information, because one compiler's optimizations are not
> another compiler's optimizations. On the other hand, it may be doable
> to collect unified profiling information for common stuff, but I guess
> there's little motivation for figuring out the common ground given the
> producer and consumer of the PGO data is the same compiler by design
> (unlike coverage info for testing/fuzzing).
>
> Therefore, if KCOV's exposed information does not match PGO's
> requirements today, I'm not sure what realistically can be done
> without turning KCOV into a monster. Because KCOV is optimized for
> testing/fuzzing coverage, and I'm not sure how complex we can or want
> to make it to cater to a new use-case.
>
> My intuition is that the simpler design is to have 2 subsystems for
> instrumentation-based coverage collection: one for testing/fuzzing,
> and the other for profiling.
>
> Alas, there's the problem of GCOV, which should be replaceable by KCOV
> for most use cases. But it would be good to hear from a GCOV user if
> there are some.
>
> But as we learned GCOV is broken on x86 now, I see these options:
>
> 1. Remove GCOV, make KCOV the de-facto test-coverage collection
> subsystem. Introduce PGO-instrumentation subsystem for profile
> collection only, and make it _very_ clear that KCOV != PGO data as
> hinted above. A pre-requisite is that compiler-support for PGO
> instrumentation adds selective instrumentation support, likely just
> making attribute no_instrument_function do the right thing.

Right. I can't speak to GCOV, but KCOV certainly isn't PGO.

> 2. Like (1) but also keep GCOV, given proper support for attribute
> no_instrument_function would probably fix it (?).
>
> 3. Keep GCOV (and KCOV of course). Somehow extract PGO profiles from KCOV.
>
> 4. Somehow extract PGO profiles from GCOV, or modify kernel/gcov to do so.

If there *is* a way to "combine" these, I don't think it makes sense
to do it now. PGO has users (and is expanding[1]), and trying to
optimize the design before even landing the first version seems like a
needless obstruction, and to likely not address currently undiscovered
requirements.

So, AFAICT, the original blocking issue ("PGO does not respect noinstr")
is not actually an issue (noinstr contains notrace, which IS respected
by PGO[2]), I think this is fine to move forward.

-Kees

[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://lore.kernel.org/lkml/CAGG=3QVHkkJ236mCJ8Jt_6JtgYtWHV9b4aVXnoj6ypc7GOnc0A@mail.gmail.com/

--
Kees Cook

2021-06-14 15:39:46

by Peter Zijlstra

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jun 14, 2021 at 08:26:01AM -0700, Kees Cook wrote:
> So, AFAICT, the original blocking issue ("PGO does not respect noinstr")
> is not actually an issue (noinstr contains notrace, which IS respected
> by PGO[2]), I think this is fine to move forward.

It is *NOT*: https://godbolt.org/z/9c7xdvGd9

Look at how both compilers generate instrumentation in the no_instr()
function.

2021-06-14 15:48:17

by Peter Zijlstra

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jun 14, 2021 at 08:26:01AM -0700, Kees Cook wrote:
> > 2. Like (1) but also keep GCOV, given proper support for attribute
> > no_instrument_function would probably fix it (?).
> >
> > 3. Keep GCOV (and KCOV of course). Somehow extract PGO profiles from KCOV.
> >
> > 4. Somehow extract PGO profiles from GCOV, or modify kernel/gcov to do so.
>
> If there *is* a way to "combine" these, I don't think it makes sense
> to do it now. PGO has users (and is expanding[1]), and trying to
> optimize the design before even landing the first version seems like a
> needless obstruction, and to likely not address currently undiscovered
> requirements.

Even if that were so (and I'm not yet convinced), the current proposal
is wedded to llvm-pgo, there is no way gcc-pgo could reuse any of this
code afaict, which then means they have to create yet another variant.

Sorting this *before* the first version is exactly the right time.

Since when are we merging code when the requirements are not clear?

Just to clarify:

Nacked-by: Peter Zijlstra (Intel) <[email protected]>

For all this PGO crud.

2021-06-14 16:06:06

by Nick Desaulniers

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jun 14, 2021 at 8:46 AM Peter Zijlstra <[email protected]> wrote:
>
> On Mon, Jun 14, 2021 at 08:26:01AM -0700, Kees Cook wrote:
> > > 2. Like (1) but also keep GCOV, given proper support for attribute
> > > no_instrument_function would probably fix it (?).
> > >
> > > 3. Keep GCOV (and KCOV of course). Somehow extract PGO profiles from KCOV.
> > >
> > > 4. Somehow extract PGO profiles from GCOV, or modify kernel/gcov to do so.
> >
> > If there *is* a way to "combine" these, I don't think it makes sense
> > to do it now. PGO has users (and is expanding[1]), and trying to
> > optimize the design before even landing the first version seems like a
> > needless obstruction, and to likely not address currently undiscovered
> > requirements.
>
> Even if that were so (and I'm not yet convinced), the current proposal
> is wedded to llvm-pgo, there is no way gcc-pgo could reuse any of this
> code afaict, which then means they have to create yet another variant.

Similar to GCOV, the runtime support for exporting such data is
heavily compiler (and compiler version) specific, as is the data
format for compilers to consume. We were able to reuse most of the
runtime code between GCC and Clang support in GCOV; I don't see why we
couldn't do a similar factoring of the runtime code being added to the
kernel here, should anyone care to pursue implementing PGO with GCC.
Having an implementation is a great starting point for folks looking
to extend support or to understand how to support PGO in such a bare
metal environment (one that doesn't dynamically link against
traditional compiler runtimes).
--
Thanks,
~Nick Desaulniers

2021-06-14 16:26:53

by Kees Cook

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jun 14, 2021 at 05:35:45PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 14, 2021 at 08:26:01AM -0700, Kees Cook wrote:
> > So, AFAICT, the original blocking issue ("PGO does not respect noinstr")
> > is not actually an issue (noinstr contains notrace, which IS respected
> > by PGO[2]), I think this is fine to move forward.
>
> It is *NOT*: https://godbolt.org/z/9c7xdvGd9
>
> Look at how both compilers generate instrumentation in the no_instr()
> function.

Well that's disappointing. I'll put this on hold until Clang can grow an
appropriate attribute (or similar work-around). Thanks for catching
that.

--
Kees Cook

2021-06-14 18:13:55

by Nick Desaulniers

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jun 14, 2021 at 9:23 AM Kees Cook <[email protected]> wrote:
>
> On Mon, Jun 14, 2021 at 05:35:45PM +0200, Peter Zijlstra wrote:
> > On Mon, Jun 14, 2021 at 08:26:01AM -0700, Kees Cook wrote:
> > > So, AFAICT, the original blocking issue ("PGO does not respect noinstr")
> > > is not actually an issue (noinstr contains notrace, which IS respected
> > > by PGO[2]), I think this is fine to move forward.
> >
> > It is *NOT*: https://godbolt.org/z/9c7xdvGd9
> >
> > Look at how both compilers generate instrumentation in the no_instr()
> > function.
>
> Well that's disappointing. I'll put this on hold until Clang can grow an
> appropriate attribute (or similar work-around). Thanks for catching
> that.

Cross referencing since these two threads are related.
https://lore.kernel.org/lkml/CAKwvOdmPTi93n2L0_yQkrzLdmpxzrOR7zggSzonyaw2PGshApw@mail.gmail.com/
--
Thanks,
~Nick Desaulniers

2021-06-14 20:54:00

by Nick Desaulniers

Subject: Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

On Mon, Jun 14, 2021 at 11:07 AM Nick Desaulniers
<[email protected]> wrote:
>
> On Mon, Jun 14, 2021 at 9:23 AM Kees Cook <[email protected]> wrote:
> >
> > On Mon, Jun 14, 2021 at 05:35:45PM +0200, Peter Zijlstra wrote:
> > > On Mon, Jun 14, 2021 at 08:26:01AM -0700, Kees Cook wrote:
> > > > So, AFAICT, the original blocking issue ("PGO does not respect noinstr")
> > > > is not actually an issue (noinstr contains notrace, which IS respected
> > > > by PGO[2]), I think this is fine to move forward.
> > >
> > > It is *NOT*: https://godbolt.org/z/9c7xdvGd9
> > >
> > > Look at how both compilers generate instrumentation in the no_instr()
> > > function.
> >
> > Well that's disappointing. I'll put this on hold until Clang can grow an
> > appropriate attribute (or similar work-around). Thanks for catching
> > that.
>
> Cross referencing since these two threads are related.
> https://lore.kernel.org/lkml/CAKwvOdmPTi93n2L0_yQkrzLdmpxzrOR7zggSzonyaw2PGshApw@mail.gmail.com/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223 looked appropriate
to me, so I commented on it.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223#c6

Patches for:
PGO: https://reviews.llvm.org/D104253
GCOV: https://reviews.llvm.org/D104257
--
Thanks,
~Nick Desaulniers