This series adds a new DYNAMIC_FTRACE_WITH_CALL_OPS mechanism, and
enables support for this on arm64. This significantly reduces the
overhead of tracing when a callsite/tracee has a single associated
tracer, avoids a number of issues that make it undesirable and
infeasible to use dynamically-allocated trampolines (e.g. branch range
limitations), and makes it possible to implement support for
DYNAMIC_FTRACE_WITH_DIRECT_CALLS in future.
The main idea is to give each ftrace callsite an associated pointer to
an ftrace_ops. The architecture's ftrace_caller trampoline can recover
the ops pointer and invoke ops->func from this without needing to use
ftrace_ops_list_func, which has to iterate through all registered ops.
To do this, we use -fpatchable-function-entry=M,N, where N NOPs are
placed before the function entry point. On arm64 NOPs are always 4
bytes, so by allocating 2 per-function NOPs, we have enough space to
place a 64-bit value. So that we can manipulate the pointer atomically,
we need to align instrumented functions to at least 8 bytes.
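Concretely, an instrumented callsite ends up laid out roughly as below
(a sketch for the non-BTI case; see the arm64 patches for full details):

  func - 8:  NOP    // }  patched to hold the 64-bit
  func - 4:  NOP    // }  ftrace_ops pointer literal
  func + 0:  NOP    // patched to MOV X9, LR
  func + 4:  NOP    // patched to BL ftrace_caller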
The first three patches enable this function alignment, requiring
changes to the ACPICA Makefile, and working around cases where GCC drops
alignment.
The fourth patch adds the core DYNAMIC_FTRACE_WITH_CALL_OPS mechanism,
and the final four patches implement support for arm64. As noted in the
final patch, this results in a significant reduction in overhead:
Before this patch:
  Number of tracers   || Total time  | Per-call average time (ns)
 Relevant | Irrelevant ||    (ns)     |    Total     |  Overhead
=========+============++=============+==============+============
        0 |          0 ||      94,583 |         0.95 | -
        0 |          1 ||      93,709 |         0.94 | -
        0 |          2 ||      93,666 |         0.94 | -
        0 |         10 ||      93,709 |         0.94 | -
        0 |        100 ||      93,792 |         0.94 | -
---------+------------++-------------+--------------+------------
        1 |          1 ||   6,467,833 |        64.68 | 63.73
        1 |          2 ||   7,509,708 |        75.10 | 74.15
        1 |         10 ||  23,786,792 |       237.87 | 236.92
        1 |        100 || 106,432,500 |     1,064.43 | 1063.38
---------+------------++-------------+--------------+------------
        1 |          0 ||   1,431,875 |        14.32 | 13.37
        2 |          0 ||   6,456,334 |        64.56 | 63.62
       10 |          0 ||  22,717,000 |       227.17 | 226.22
      100 |          0 || 103,293,667 |      1032.94 | 1031.99
---------+------------++-------------+--------------+------------
Note: per-call overhead is estimated relative to the baseline case
with 0 relevant tracers and 0 irrelevant tracers.
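For example, in the '1 relevant, 0 irrelevant' row above, the per-call
overhead is 14.32 - 0.95 = 13.37 ns.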
After this patch:
  Number of tracers   || Total time  | Per-call average time (ns)
 Relevant | Irrelevant ||    (ns)     |    Total     |  Overhead
=========+============++=============+==============+============
        0 |          0 ||      94,541 |         0.95 | -
        0 |          1 ||      93,666 |         0.94 | -
        0 |          2 ||      93,709 |         0.94 | -
        0 |         10 ||      93,667 |         0.94 | -
        0 |        100 ||      93,792 |         0.94 | -
---------+------------++-------------+--------------+------------
        1 |          1 ||     281,000 |         2.81 | 1.86
        1 |          2 ||     281,042 |         2.81 | 1.87
        1 |         10 ||     280,958 |         2.81 | 1.86
        1 |        100 ||     281,250 |         2.81 | 1.87
---------+------------++-------------+--------------+------------
        1 |          0 ||     280,959 |         2.81 | 1.86
        2 |          0 ||   6,502,708 |        65.03 | 64.08
       10 |          0 ||  18,681,209 |       186.81 | 185.87
      100 |          0 || 103,550,458 |     1,035.50 | 1034.56
---------+------------++-------------+--------------+------------
Note: per-call overhead is estimated relative to the baseline case
with 0 relevant tracers and 0 irrelevant tracers.
Thanks,
Mark.
Mark Rutland (8):
Compiler attributes: GCC function alignment workarounds
ACPI: Don't build ACPICA with '-Os'
arm64: Extend support for CONFIG_FUNCTION_ALIGNMENT
ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS
arm64: insn: Add helpers for BTI
arm64: patching: Add aarch64_insn_write_literal_u64()
arm64: ftrace: Update stale comment
arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
arch/arm64/Kconfig | 3 +
arch/arm64/Makefile | 5 +-
arch/arm64/include/asm/ftrace.h | 15 +--
arch/arm64/include/asm/insn.h | 1 +
arch/arm64/include/asm/linkage.h | 10 +-
arch/arm64/include/asm/patching.h | 2 +
arch/arm64/kernel/asm-offsets.c | 4 +
arch/arm64/kernel/entry-ftrace.S | 32 +++++-
arch/arm64/kernel/ftrace.c | 158 +++++++++++++++++++++++++++-
arch/arm64/kernel/patching.c | 17 +++
drivers/acpi/acpica/Makefile | 2 +-
include/linux/compiler_attributes.h | 23 +++-
include/linux/ftrace.h | 15 ++-
kernel/trace/Kconfig | 7 ++
kernel/trace/ftrace.c | 109 ++++++++++++++++++-
15 files changed, 371 insertions(+), 32 deletions(-)
--
2.30.2
On arm64 we don't align assembly functions in the same way as C
functions. This somewhat limits the utility of
CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B for testing.
Follow the example of x86, and align assembly functions in the same way
as C functions.
I've tested this by selecting CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B=y,
building and booting a kernel, and looking for misaligned text symbols:
Before:
# grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | wc -l
322
After:
# grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | wc -l
115
The remaining unaligned text symbols are a combination of non-function labels
in assembly and early position-independent code which is built with '-Os'.
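(For reference, the grep pattern excludes symbols whose address ends in
0x00, 0x40, 0x80, or 0xc0, i.e. it lists only text symbols which are not
64-byte aligned.)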
Signed-off-by: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Florent Revest <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/include/asm/linkage.h | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/linkage.h b/arch/arm64/include/asm/linkage.h
index 1436fa1cde24d..df18a3446ce82 100644
--- a/arch/arm64/include/asm/linkage.h
+++ b/arch/arm64/include/asm/linkage.h
@@ -5,8 +5,14 @@
#include <asm/assembler.h>
#endif
-#define __ALIGN .align 2
-#define __ALIGN_STR ".align 2"
+#if CONFIG_FUNCTION_ALIGNMENT > 0
+#define ARM64_FUNCTION_ALIGNMENT CONFIG_FUNCTION_ALIGNMENT
+#else
+#define ARM64_FUNCTION_ALIGNMENT 4
+#endif
+
+#define __ALIGN .balign ARM64_FUNCTION_ALIGNMENT
+#define __ALIGN_STR ".balign " #ARM64_FUNCTION_ALIGNMENT
/*
* When using in-kernel BTI we need to ensure that PCS-conformant
--
2.30.2
In subsequent patches we'd like to check whether an instruction is a
BTI. In preparation for this, add basic instruction helpers for BTI
instructions.
Per ARM DDI 0487H.a section C6.2.41, BTI is encoded in binary as
follows, MSB to LSB:
1101 0101 0000 0011 0010 0100 xx01 1111
Where the `xx` bits encode J/C/JC:
00 : (omitted)
01 : C
10 : J
11 : JC
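For illustration, a check for this encoding masks out the `xx` bits and
compares the remainder against the fixed pattern; a minimal sketch (the
helper name is hypothetical, the mask/value pair matches the patch below):

	static bool insn_is_bti(u32 insn)
	{
		/* ignore bits [7:6] (the J/C/JC field), match the rest */
		return (insn & 0xFFFFFF3F) == 0xD503241F;
	}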
Signed-off-by: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Florent Revest <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/include/asm/insn.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index aaf1f52fbf3e0..139a88e4e8522 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -420,6 +420,7 @@ __AARCH64_INSN_FUNCS(sb, 0xFFFFFFFF, 0xD50330FF)
__AARCH64_INSN_FUNCS(clrex, 0xFFFFF0FF, 0xD503305F)
__AARCH64_INSN_FUNCS(ssbb, 0xFFFFFFFF, 0xD503309F)
__AARCH64_INSN_FUNCS(pssbb, 0xFFFFFFFF, 0xD503349F)
+__AARCH64_INSN_FUNCS(bti, 0xFFFFFF3F, 0xD503241f)
#undef __AARCH64_INSN_FUNCS
--
2.30.2
The ACPICA code has been built with '-Os' since the beginning of git
history, though there's no explanatory comment as to why.
This is unfortunate as building with '-Os' overrides -falign-functions,
which prevents CONFIG_FUNCTION_ALIGNMENT and
CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B from having their expected effect
on the ACPICA code. This is doubly unfortunate as in subsequent patches
arm64 will depend upon CONFIG_FUNCTION_ALIGNMENT for its ftrace
implementation.
Drop the '-Os' flag when building the ACPICA code. With this removed,
the code builds cleanly and works correctly in testing so far.
I've tested this by selecting CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B=y,
building and booting a kernel using ACPI, and looking for misaligned
text symbols:
* arm64:
Before:
# grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | wc -l
908
# grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | grep acpi | wc -l
576
After:
# grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | wc -l
322
# grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | grep acpi | wc -l
0
* x86_64:
Before:
# grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | wc -l
2057
# grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | grep acpi | wc -l
706
After:
# grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | wc -l
1351
# grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | grep acpi | wc -l
0
With the patch applied, the remaining unaligned text labels are a
combination of static call trampolines and labels in assembly, which
will be dealt with in subsequent patches.
Signed-off-by: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Florent Revest <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Robert Moore <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
---
drivers/acpi/acpica/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/acpi/acpica/Makefile b/drivers/acpi/acpica/Makefile
index 9e0d95d76fff7..30f3fc13c29d1 100644
--- a/drivers/acpi/acpica/Makefile
+++ b/drivers/acpi/acpica/Makefile
@@ -3,7 +3,7 @@
# Makefile for ACPICA Core interpreter
#
-ccflags-y := -Os -D_LINUX -DBUILDING_ACPICA
+ccflags-y := -D_LINUX -DBUILDING_ACPICA
ccflags-$(CONFIG_ACPI_DEBUG) += -DACPI_DEBUG_OUTPUT
# use acpi.o to put all files here into acpi.o modparam namespace
--
2.30.2
In commit:
26299b3f6ba26bfc ("ftrace: arm64: move from REGS to ARGS")
... we folded ftrace_regs_entry into ftrace_caller, and
ftrace_regs_entry no longer exists.
Update the comment accordingly.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Florent Revest <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/kernel/ftrace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index b30b955a89211..38ebdf063255b 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -209,7 +209,7 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
* | NOP | MOV X9, LR | MOV X9, LR |
* | NOP | NOP | BL <entry> |
*
- * The LR value will be recovered by ftrace_regs_entry, and restored into LR
+ * The LR value will be recovered by ftrace_caller, and restored into LR
* before returning to the regular function prologue. When a function is not
* being traced, the MOV is not harmful given x9 is not live per the AAPCS.
*
--
2.30.2
This patch enables support for DYNAMIC_FTRACE_WITH_CALL_OPS on arm64.
This allows each ftrace callsite to provide an ftrace_ops to the common
ftrace trampoline, allowing each callsite to invoke distinct tracer
functions without the need to fall back to list processing or to
allocate custom trampolines for each callsite. This significantly speeds
up cases where multiple distinct trace functions are used and callsites
are mostly traced by a single tracer.
The main idea is to place a pointer to the ftrace_ops as a literal at a
fixed offset from the function entry point, which can be recovered by
the common ftrace trampoline. Using a 64-bit literal avoids branch range
limitations, and permits the ops to be swapped atomically without
special considerations that apply to code-patching. In future this will
also allow for the implementation of DYNAMIC_FTRACE_WITH_DIRECT_CALLS
without branch range limitations by using additional fields in struct
ftrace_ops.
As noted in the core patch adding support for
DYNAMIC_FTRACE_WITH_CALL_OPS, this approach allows for directly invoking
ftrace_ops::func even for ftrace_ops which are dynamically-allocated (or
part of a module), without going via ftrace_ops_list_func.
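For illustration, the recovery is roughly equivalent to the following C
(a sketch only; the real implementation is the asm added to ftrace_caller
below, and `lr` here stands for the trampoline's incoming return address):

	/*
	 * The callsite BL leaves LR pointing just past itself; the ops
	 * literal sits at an 8-byte aligned address 16 bytes earlier,
	 * whether or not a BTI is present.
	 */
	struct ftrace_ops *ops = *(struct ftrace_ops **)(ALIGN_DOWN(lr, 8) - 16);
	ops->func(ip, parent_ip, ops, fregs);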
Currently, this approach is not compatible with CFI_CLANG, as the
presence/absence of pre-function NOPs changes the offset of the
pre-function type hash, and there's no existing mechanism to ensure a
consistent offset for instrumented and uninstrumented functions. When
CFI_CLANG is enabled, the existing scheme with a global ops->func
pointer is used, and there should be no functional change. I am
currently working with others to allow the two to work together in
future (though this will likely require updated compiler support).
I've benchmarked this with the ftrace_ops sample module [1], which is
not currently upstream, but available at:
https://lore.kernel.org/lkml/[email protected]
git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git ftrace-ops-sample-20230109
Using that module I measured the total time taken for 100,000 calls to a
trivial instrumented function, with a number of tracers enabled with
relevant filters (which would apply to the instrumented function) and a
number of tracers enabled with irrelevant filters (which would not apply
to the instrumented function). I tested on an M1 MacBook Pro, running
under an HVF-accelerated QEMU VM (i.e. on real hardware).
Before this patch:
  Number of tracers   || Total time  | Per-call average time (ns)
 Relevant | Irrelevant ||    (ns)     |    Total     |  Overhead
=========+============++=============+==============+============
        0 |          0 ||      94,583 |         0.95 | -
        0 |          1 ||      93,709 |         0.94 | -
        0 |          2 ||      93,666 |         0.94 | -
        0 |         10 ||      93,709 |         0.94 | -
        0 |        100 ||      93,792 |         0.94 | -
---------+------------++-------------+--------------+------------
        1 |          1 ||   6,467,833 |        64.68 | 63.73
        1 |          2 ||   7,509,708 |        75.10 | 74.15
        1 |         10 ||  23,786,792 |       237.87 | 236.92
        1 |        100 || 106,432,500 |     1,064.43 | 1063.38
---------+------------++-------------+--------------+------------
        1 |          0 ||   1,431,875 |        14.32 | 13.37
        2 |          0 ||   6,456,334 |        64.56 | 63.62
       10 |          0 ||  22,717,000 |       227.17 | 226.22
      100 |          0 || 103,293,667 |      1032.94 | 1031.99
---------+------------++-------------+--------------+------------
Note: per-call overhead is estimated relative to the baseline case
with 0 relevant tracers and 0 irrelevant tracers.
After this patch:
  Number of tracers   || Total time  | Per-call average time (ns)
 Relevant | Irrelevant ||    (ns)     |    Total     |  Overhead
=========+============++=============+==============+============
        0 |          0 ||      94,541 |         0.95 | -
        0 |          1 ||      93,666 |         0.94 | -
        0 |          2 ||      93,709 |         0.94 | -
        0 |         10 ||      93,667 |         0.94 | -
        0 |        100 ||      93,792 |         0.94 | -
---------+------------++-------------+--------------+------------
        1 |          1 ||     281,000 |         2.81 | 1.86
        1 |          2 ||     281,042 |         2.81 | 1.87
        1 |         10 ||     280,958 |         2.81 | 1.86
        1 |        100 ||     281,250 |         2.81 | 1.87
---------+------------++-------------+--------------+------------
        1 |          0 ||     280,959 |         2.81 | 1.86
        2 |          0 ||   6,502,708 |        65.03 | 64.08
       10 |          0 ||  18,681,209 |       186.81 | 185.87
      100 |          0 || 103,550,458 |     1,035.50 | 1034.56
---------+------------++-------------+--------------+------------
Note: per-call overhead is estimated relative to the baseline case
with 0 relevant tracers and 0 irrelevant tracers.
As can be seen from the above:
a) Whenever there is a single relevant tracer function associated with a
tracee, the overhead of invoking the tracer is constant, and does not
scale with the number of tracers which are *not* associated with that
tracee.
b) The overhead for a single relevant tracer has dropped to ~1/7 of the
overhead prior to this series (from 13.37ns to 1.86ns). This is
largely due to permitting calls to dynamically-allocated ftrace_ops
without going through ftrace_ops_list_func.
Signed-off-by: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Florent Revest <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/Kconfig | 3 +
arch/arm64/Makefile | 5 +-
arch/arm64/include/asm/ftrace.h | 15 +--
arch/arm64/kernel/asm-offsets.c | 4 +
arch/arm64/kernel/entry-ftrace.S | 32 ++++++-
arch/arm64/kernel/ftrace.c | 156 +++++++++++++++++++++++++++++++
6 files changed, 195 insertions(+), 20 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 03934808b2ed0..7838f568fa158 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -123,6 +123,7 @@ config ARM64
select DMA_DIRECT_REMAP
select EDAC_SUPPORT
select FRAME_POINTER
+ select FUNCTION_ALIGNMENT_8B if DYNAMIC_FTRACE_WITH_CALL_OPS
select GENERIC_ALLOCATOR
select GENERIC_ARCH_TOPOLOGY
select GENERIC_CLOCKEVENTS_BROADCAST
@@ -186,6 +187,8 @@ config ARM64
select HAVE_DYNAMIC_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_ARGS \
if $(cc-option,-fpatchable-function-entry=2)
+ select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS \
+ if (DYNAMIC_FTRACE_WITH_ARGS && !CFI_CLANG)
select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY \
if DYNAMIC_FTRACE_WITH_ARGS
select HAVE_EFFICIENT_UNALIGNED_ACCESS
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index d62bd221828f7..4c3be442fbb35 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -139,7 +139,10 @@ endif
CHECKFLAGS += -D__aarch64__
-ifeq ($(CONFIG_DYNAMIC_FTRACE_WITH_ARGS),y)
+ifeq ($(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS),y)
+ KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
+ CC_FLAGS_FTRACE := -fpatchable-function-entry=4,2
+else ifeq ($(CONFIG_DYNAMIC_FTRACE_WITH_ARGS),y)
KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
CC_FLAGS_FTRACE := -fpatchable-function-entry=2
endif
diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index 5664729800ae1..1c2672bbbf379 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -62,20 +62,7 @@ extern unsigned long ftrace_graph_call;
extern void return_to_handler(void);
-static inline unsigned long ftrace_call_adjust(unsigned long addr)
-{
- /*
- * Adjust addr to point at the BL in the callsite.
- * See ftrace_init_nop() for the callsite sequence.
- */
- if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_ARGS))
- return addr + AARCH64_INSN_SIZE;
- /*
- * addr is the address of the mcount call instruction.
- * recordmcount does the necessary offset calculation.
- */
- return addr;
-}
+unsigned long ftrace_call_adjust(unsigned long addr);
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
struct dyn_ftrace;
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 2234624536d95..ae345b06e9f7e 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -9,6 +9,7 @@
#include <linux/arm_sdei.h>
#include <linux/sched.h>
+#include <linux/ftrace.h>
#include <linux/kexec.h>
#include <linux/mm.h>
#include <linux/dma-mapping.h>
@@ -193,6 +194,9 @@ int main(void)
DEFINE(KIMAGE_HEAD, offsetof(struct kimage, head));
DEFINE(KIMAGE_START, offsetof(struct kimage, start));
BLANK();
+#endif
+#ifdef CONFIG_FUNCTION_TRACER
+ DEFINE(FTRACE_OPS_FUNC, offsetof(struct ftrace_ops, func));
#endif
return 0;
}
diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
index 3b625f76ffbae..350ed81324ace 100644
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -65,13 +65,35 @@ SYM_CODE_START(ftrace_caller)
stp x29, x30, [sp, #FREGS_SIZE]
add x29, sp, #FREGS_SIZE
- sub x0, x30, #AARCH64_INSN_SIZE // ip (callsite's BL insn)
- mov x1, x9 // parent_ip (callsite's LR)
- ldr_l x2, function_trace_op // op
- mov x3, sp // regs
+ /* Prepare arguments for the tracer func */
+ sub x0, x30, #AARCH64_INSN_SIZE // ip (callsite's BL insn)
+ mov x1, x9 // parent_ip (callsite's LR)
+ mov x3, sp // regs
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+ /*
+ * The literal pointer to the ops is at an 8-byte aligned boundary
+ * which is either 12 or 16 bytes before the BL instruction in the call
+ * site. See ftrace_call_adjust() for details.
+ *
+ * Therefore here the LR points at `literal + 16` or `literal + 20`,
+ * and we can find the address of the literal in either case by
+ * aligning to an 8-byte boundary and subtracting 16. We do the
+ * alignment first as this allows us to fold the subtraction into the
+ * LDR.
+ */
+ bic x2, x30, 0x7
+ ldr x2, [x2, #-16] // op
+
+ ldr x4, [x2, #FTRACE_OPS_FUNC] // op->func
+ blr x4 // op->func(ip, parent_ip, op, regs)
+
+#else
+ ldr_l x2, function_trace_op // op
SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
- bl ftrace_stub
+ bl ftrace_stub // func(ip, parent_ip, op, regs)
+#endif
/*
* At the callsite x0-x8 and x19-x30 were live. Any C code will have preserved
diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index 38ebdf063255b..5545fe1a90125 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -60,6 +60,89 @@ int ftrace_regs_query_register_offset(const char *name)
}
#endif
+unsigned long ftrace_call_adjust(unsigned long addr)
+{
+ /*
+ * When using mcount, addr is the address of the mcount call
+ * instruction, and no adjustment is necessary.
+ */
+ if (!IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_ARGS))
+ return addr;
+
+ /*
+ * When using patchable-function-entry without pre-function NOPS, addr
+ * is the address of the first NOP after the function entry point.
+ *
+ * The compiler has either generated:
+ *
+ * addr+00: func: NOP // To be patched to MOV X9, LR
+ * addr+04: NOP // To be patched to BL <caller>
+ *
+ * Or:
+ *
+ * addr-04: BTI C
+ * addr+00: func: NOP // To be patched to MOV X9, LR
+ * addr+04: NOP // To be patched to BL <caller>
+ *
+ * We must adjust addr to the address of the NOP which will be patched
+ * to `BL <caller>`, which is at `addr + 4` bytes in either case.
+ *
+ */
+ if (!IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS))
+ return addr + AARCH64_INSN_SIZE;
+
+ /*
+ * When using patchable-function-entry with pre-function NOPs, addr is
+ * the address of the first pre-function NOP.
+ *
+ * Starting from an 8-byte aligned base, the compiler has either
+ * generated:
+ *
+ * addr+00: NOP // Literal (first 32 bits)
+ * addr+04: NOP // Literal (last 32 bits)
+ * addr+08: func: NOP // To be patched to MOV X9, LR
+ * addr+12: NOP // To be patched to BL <caller>
+ *
+ * Or:
+ *
+ * addr+00: NOP // Literal (first 32 bits)
+ * addr+04: NOP // Literal (last 32 bits)
+ * addr+08: func: BTI C
+ * addr+12: NOP // To be patched to MOV X9, LR
+ * addr+16: NOP // To be patched to BL <caller>
+ *
+ * We must adjust addr to the address of the NOP which will be patched
+ * to `BL <caller>`, which is at either addr+12 or addr+16 depending on
+ * whether there is a BTI.
+ */
+
+ if (!IS_ALIGNED(addr, sizeof(unsigned long))) {
+ WARN_RATELIMIT(1, "Misaligned patch-site %pS\n",
+ (void *)(addr + 8));
+ return 0;
+ }
+
+ /* Skip the NOPs placed before the function entry point */
+ addr += 2 * AARCH64_INSN_SIZE;
+
+ /* Skip any BTI */
+ if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL)) {
+ u32 insn = le32_to_cpu(*(__le32 *)addr);
+
+ if (aarch64_insn_is_bti(insn)) {
+ addr += AARCH64_INSN_SIZE;
+ } else if (insn != aarch64_insn_gen_nop()) {
+ WARN_RATELIMIT(1, "unexpected insn in patch-site %pS: 0x%08x\n",
+ (void *)addr, insn);
+ }
+ }
+
+ /* Skip the first NOP after function entry */
+ addr += AARCH64_INSN_SIZE;
+
+ return addr;
+}
+
/*
* Replace a single instruction, which may be a branch or NOP.
* If @validate == true, a replaced instruction is checked against 'old'.
@@ -98,6 +181,13 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
unsigned long pc;
u32 new;
+ /*
+ * When using CALL_OPS, the function to call is associated with the
+ * call site, and we don't have a global function pointer to update.
+ */
+ if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS))
+ return 0;
+
pc = (unsigned long)ftrace_call;
new = aarch64_insn_gen_branch_imm(pc, (unsigned long)func,
AARCH64_INSN_BRANCH_LINK);
@@ -176,6 +266,44 @@ static bool ftrace_find_callable_addr(struct dyn_ftrace *rec,
return true;
}
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+static const struct ftrace_ops *arm64_rec_get_ops(struct dyn_ftrace *rec)
+{
+ const struct ftrace_ops *ops = NULL;
+
+ if (rec->flags & FTRACE_FL_CALL_OPS_EN) {
+ ops = ftrace_find_unique_ops(rec);
+ WARN_ON_ONCE(!ops);
+ }
+
+ if (!ops)
+ ops = &ftrace_list_ops;
+
+ return ops;
+}
+
+static int ftrace_rec_set_ops(const struct dyn_ftrace *rec,
+ const struct ftrace_ops *ops)
+{
+ unsigned long literal = ALIGN_DOWN(rec->ip - 12, 8);
+ return aarch64_insn_write_literal_u64((void *)literal,
+ (unsigned long)ops);
+}
+
+static int ftrace_rec_set_nop_ops(struct dyn_ftrace *rec)
+{
+ return ftrace_rec_set_ops(rec, &ftrace_nop_ops);
+}
+
+static int ftrace_rec_update_ops(struct dyn_ftrace *rec)
+{
+ return ftrace_rec_set_ops(rec, arm64_rec_get_ops(rec));
+}
+#else
+static int ftrace_rec_set_nop_ops(struct dyn_ftrace *rec) { return 0; }
+static int ftrace_rec_update_ops(struct dyn_ftrace *rec) { return 0; }
+#endif
+
/*
* Turn on the call to ftrace_caller() in instrumented function
*/
@@ -183,6 +311,11 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
{
unsigned long pc = rec->ip;
u32 old, new;
+ int ret;
+
+ ret = ftrace_rec_update_ops(rec);
+ if (ret)
+ return ret;
if (!ftrace_find_callable_addr(rec, NULL, &addr))
return -EINVAL;
@@ -193,6 +326,19 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
return ftrace_modify_code(pc, old, new, true);
}
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
+ unsigned long addr)
+{
+ if (WARN_ON_ONCE(old_addr != (unsigned long)ftrace_caller))
+ return -EINVAL;
+ if (WARN_ON_ONCE(addr != (unsigned long)ftrace_caller))
+ return -EINVAL;
+
+ return ftrace_rec_update_ops(rec);
+}
+#endif
+
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
/*
* The compiler has inserted two NOPs before the regular function prologue.
@@ -220,6 +366,11 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
{
unsigned long pc = rec->ip - AARCH64_INSN_SIZE;
u32 old, new;
+ int ret;
+
+ ret = ftrace_rec_set_nop_ops(rec);
+ if (ret)
+ return ret;
old = aarch64_insn_gen_nop();
new = aarch64_insn_gen_move_reg(AARCH64_INSN_REG_9,
@@ -237,9 +388,14 @@ int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
{
unsigned long pc = rec->ip;
u32 old = 0, new;
+ int ret;
new = aarch64_insn_gen_nop();
+ ret = ftrace_rec_set_nop_ops(rec);
+ if (ret)
+ return ret;
+
/*
* When using mcount, callsites in modules may have been initalized to
* call an arbitrary module PLT (which redirects to the _mcount stub)
--
2.30.2
From local testing, contemporary versions of GCC (e.g. GCC 12.2.0)
don't respect '-falign-functions=N' in all cases. This is unfortunate,
as (for non-zero values of N) CONFIG_FUNCTION_ALIGNMENT=N will set
'-falign-functions=N', but this won't take effect for all functions.
LLVM appears to respect '-falign-functions=N' in all cases.
Today, for x86 this turns out to be functionally benign, though it does
somewhat undermine the CONFIG_FUNCTION_ALIGNMENT option, and it means
that CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B is not as robust as we'd
like.
On arm64 we'd like to use CONFIG_FUNCTION_ALIGNMENT to implement ftrace
functionality, and we'll require function alignment to be respected for
functional reasons.
As far as I can tell, GCC doesn't respect '-falign-functions=N':
* When the __weak__ attribute is used
GCC seems to forget the alignment specified by '-falign-functions=N',
but will respect the '__aligned__(N)' function attribute. Thus, we can
work around this by explciitly setting the alignment for weak
functions.
* When the __cold__ attribute is used
GCC seems to forget the alignment specified by '-falign-functions=N',
and also doesn't seem to respect the '__aligned__(N)' function
attribute. The only way to work around this is to not use the __cold__
attibute.
This patch implements workarounds for these two cases, using a function
attribute to set the alignment of __weak__ functions, and preventing the
use of the __cold__ attribute when CONFIG_FUNCTION_ALIGNMENT is
non-zero.
I've tested this by selecting CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B=y,
building and booting a kernel, and looking for misaligned text symbols:
* arm64:
Before:
# grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | wc -l
4939
After:
# grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | wc -l
908
* x86_64:
Before:
# grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | wc -l
7969
After:
# grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | wc -l
2057
With the patch applied, the remaining unaligned text labels are a
combination of static call trampolines, non-function labels in assembly,
and ACPICA functions, which will be dealt with in subsequent patches.
Signed-off-by: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Florent Revest <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Nick Desaulniers <[email protected]>
---
include/linux/compiler_attributes.h | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)
diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h
index 898b3458b24a0..dcb7ac67b764f 100644
--- a/include/linux/compiler_attributes.h
+++ b/include/linux/compiler_attributes.h
@@ -33,6 +33,17 @@
#define __aligned(x) __attribute__((__aligned__(x)))
#define __aligned_largest __attribute__((__aligned__))
+/*
+ * Contemporary versions of GCC (e.g. 12.2.0) don't always respect
+ * '-falign-functions=N', and require alignment to be specificed via a function
+ * attribute in some cases.
+ */
+#if CONFIG_FUNCTION_ALIGNMENT > 0
+#define __function_aligned __aligned(CONFIG_FUNCTION_ALIGNMENT)
+#else
+#define __function_aligned
+#endif
+
/*
* Note: do not use this directly. Instead, use __alloc_size() since it is conditionally
* available and includes other attributes. For GCC < 9.1, __alloc_size__ gets undefined
@@ -78,8 +89,15 @@
/*
* gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-cold-function-attribute
* gcc: https://gcc.gnu.org/onlinedocs/gcc/Label-Attributes.html#index-cold-label-attribute
+ *
+ * GCC drops function alignment when the __cold__ attribute is used. Avoid the
+ * __cold__ attribute if function alignment is required.
*/
+#if !defined(CONFIG_CC_IS_GCC) || (CONFIG_FUNCTION_ALIGNMENT == 0)
#define __cold __attribute__((__cold__))
+#else
+#define __cold
+#endif
/*
* Note the long name.
@@ -369,8 +387,11 @@
/*
* gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-weak-function-attribute
* gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#index-weak-variable-attribute
+ *
+ * GCC drops function alignment when the __weak__ attribute is used. This can
+ * be restored with function attributes.
*/
-#define __weak __attribute__((__weak__))
+#define __weak __attribute__((__weak__)) __function_aligned
/*
* Used by functions that use '__builtin_return_address'. These function
--
2.30.2
On Mon, Jan 9, 2023 at 2:58 PM Mark Rutland <[email protected]> wrote:
>
> As far as I can tell, GCC doesn't respect '-falign-functions=N':
>
> * When the __weak__ attribute is used
>
> GCC seems to forget the alignment specified by '-falign-functions=N',
> but will respect the '__aligned__(N)' function attribute. Thus, we can
> work around this by explciitly setting the alignment for weak
> functions.
>
> * When the __cold__ attribute is used
>
> GCC seems to forget the alignment specified by '-falign-functions=N',
> and also doesn't seem to respect the '__aligned__(N)' function
> attribute. The only way to work around this is to not use the __cold__
> attibute.
If you happen to have a reduced case, then it would be nice to link it
in the commit. A bug report to GCC would also be nice.
I gave it a very quick try in Compiler Explorer, but I couldn't
reproduce it, so I guess it depends on flags, non-trivial functions or
something else.
> + * '-falign-functions=N', and require alignment to be specificed via a function
Nit: specificed -> specified
> +#if CONFIG_FUNCTION_ALIGNMENT > 0
> +#define __function_aligned __aligned(CONFIG_FUNCTION_ALIGNMENT)
> +#else
> +#define __function_aligned
> +#endif
Currently, the file is intended for attributes that do not depend on
`CONFIG_*` options.
What I usually mention is that we could change that policy, but
otherwise these would go into e.g. `compiler_types.h`.
> +#if !defined(CONFIG_CC_IS_GCC) || (CONFIG_FUNCTION_ALIGNMENT == 0)
> #define __cold __attribute__((__cold__))
> +#else
> +#define __cold
> +#endif
Similarly, in this case this could go into `compiler-gcc.h` /
`compiler-clang.h` etc., since the definition will be different for
each.
Cheers,
Miguel
On Mon, Jan 09, 2023 at 03:43:16PM +0100, Miguel Ojeda wrote:
> On Mon, Jan 9, 2023 at 2:58 PM Mark Rutland <[email protected]> wrote:
> >
> > As far as I can tell, GCC doesn't respect '-falign-functions=N':
> >
> > * When the __weak__ attribute is used
> >
> > GCC seems to forget the alignment specified by '-falign-functions=N',
> > but will respect the '__aligned__(N)' function attribute. Thus, we can
> > work around this by explciitly setting the alignment for weak
Whoops: s/explciitly/explicitly/ here too; I'll go re-proofread the series.
> > functions.
> >
> > * When the __cold__ attribute is used
> >
> > GCC seems to forget the alignment specified by '-falign-functions=N',
> > and also doesn't seem to respect the '__aligned__(N)' function
> > attribute. The only way to work around this is to not use the __cold__
> > attibute.
Whoops: s/attibute/attribute/
> If you happen to have a reduced case, then it would be nice to link it
> in the commit. A bug report to GCC would also be nice.
>
> I gave it a very quick try in Compiler Explorer, but I couldn't
> reproduce it, so I guess it depends on flags, non-trivial functions or
> something else.
Sorry, that is something I had intended to do but I hadn't extracted a
reproducer yet. I'll try to come up with something that can be included in the
commit message and reported to GCC folk (and double-check at the same time that
there's not another hidden cause)
With this series applied and this patch reverted, the issue is visible when
building defconfig + CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B=y: scanning
/proc/kallsyms with:
$ grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] '
... will show a bunch of cold functions (and their callees/callers), largely
init/exit functions (so I'll double-check whether section handling has an
effect), e.g.
ffffdf08be173b8c t snd_soc_exit
ffffdf08be173bc4 t apple_mca_driver_exit
ffffdf08be173be8 t failover_exit
ffffdf08be173c10 t inet_diag_exit
ffffdf08be173c60 t tcp_diag_exit
ffffdf08be173c84 t cubictcp_unregister
ffffdf08be173cac t af_unix_exit
ffffdf08be173cf4 t packet_exit
ffffdf08be173d3c t cleanup_sunrpc
ffffdf08be173d8c t exit_rpcsec_gss
ffffdf08be173dc4 t exit_p9
ffffdf08be173dec T p9_client_exit
ffffdf08be173e10 t p9_trans_fd_exit
ffffdf08be173e58 t p9_virtio_cleanup
ffffdf08be173e90 t exit_dns_resolver
> > + * '-falign-functions=N', and require alignment to be specificed via a function
>
> Nit: specificed -> specified
Thanks, fixed
> > +#if CONFIG_FUNCTION_ALIGNMENT > 0
> > +#define __function_aligned __aligned(CONFIG_FUNCTION_ALIGNMENT)
> > +#else
> > +#define __function_aligned
> > +#endif
>
> Currently, the file is intended for attributes that do not depend on
> `CONFIG_*` options.
>
> What I usually mention is that we could change that policy, but
> otherwise these would go into e.g. `compiler_types.h`.
I'm happy to move these, I just wasn't sure what the policy would be w.r.t. the
existing __weak and __cold definitions since those end up depending upon
__function_aligned.
I assume I should move them all? i.e. move __weak as well?
> > +#if !defined(CONFIG_CC_IS_GCC) || (CONFIG_FUNCTION_ALIGNMENT == 0)
> > #define __cold __attribute__((__cold__))
> > +#else
> > +#define __cold
> > +#endif
>
> Similarly, in this case this could go into `compiler-gcc.h` /
> `compiler-clang.h` etc., since the definition will be different for
> each.
Sure, can do.
Thanks,
Mark.
On Mon, Jan 9, 2023 at 6:06 PM Mark Rutland <[email protected]> wrote:
>
> Sorry, that is something I had intended to do but I hadn't extracted a
> reproducer yet. I'll try to come up with something that can be included in the
> commit message and reported to GCC folk (and double-check at the same time that
> there's not another hidden cause)
Yeah, no worries :) I suggested it because from my quick test it
didn't appear to be reproducible trivially, so I thought having the
reproducer would be nice.
> I'm happy to move these, I just wasn't sure what the policy would be w.r.t. the
> existing __weak and __cold definitions since those end up depending upon
> __function_aligned.
>
> I assume I should move them all? i.e. move __weak as well?
Yeah, with the current policy, all should be moved since their
behavior now depends on the config (eventually).
Cheers,
Miguel
From: Mark Rutland
> Sent: 09 January 2023 13:58
>
> This series adds a new DYNAMIC_FTRACE_WITH_CALL_OPS mechanism, and
> enables support for this on arm64. This significantly reduces the
> overhead of tracing when a callsite/tracee has a single associated
> tracer, avoids a number of issues that make it undesireably and
> infeasible to use dynamically-allocated trampolines (e.g. branch range
> limitations), and makes it possible to implement support for
> DYNAMIC_FTRACE_WITH_DIRECT_CALLS in future.
>
> The main idea is to give each ftrace callsite an associated pointer to
> an ftrace_ops. The architecture's ftrace_caller trampoline can recover
> the ops pointer and invoke ops->func from this without needing to use
> ftrace_ops_list_func, which has to iterate through all registered ops.
>
> To do this, we use -fpatchable-function-entry=M,N, there N NOPs are
> placed before the function entry point...
Doesn't this bump the minimum gcc version up to something like 9.0 ?
How does it interact with the 'CFI stuff' that also uses the same area?
David
On Tue, Jan 10, 2023 at 08:55:58AM +0000, David Laight wrote:
> From: Mark Rutland
> > Sent: 09 January 2023 13:58
> >
> > This series adds a new DYNAMIC_FTRACE_WITH_CALL_OPS mechanism, and
> > enables support for this on arm64. This significantly reduces the
> > overhead of tracing when a callsite/tracee has a single associated
> > tracer, avoids a number of issues that make it undesireably and
> > infeasible to use dynamically-allocated trampolines (e.g. branch range
> > limitations), and makes it possible to implement support for
> > DYNAMIC_FTRACE_WITH_DIRECT_CALLS in future.
> >
> > The main idea is to give each ftrace callsite an associated pointer to
> > an ftrace_ops. The architecture's ftrace_caller trampoline can recover
> > the ops pointer and invoke ops->func from this without needing to use
> > ftrace_ops_list_func, which has to iterate through all registered ops.
> >
> > To do this, we use -fpatchable-function-entry=M,N, there N NOPs are
> > placed before the function entry point...
>
> Doesn't this bump the minimum gcc version up to something like 9.0 ?
This doesn't bump the minimum GCC version, but users of older toolchains
won't get the speedup.
We already support -fpatchable-function-entry based ftrace with GCC 8+ (and
this is necessary to play nicely with pointer authentication), for older GCC
versions we still support using -pg / mcount.
> How does it interact with the 'CFI stuff' that also uses the same area?
There's some more detail in patch 8, but the summary is that they're mutually
exclusive for now (enforced by Kconfig), and I'm working with others to get
improved compiler support necessary for them to play nicely together.
Currently LLVM will place the type-hash before the pre-function NOPs, which
works if everything has pre-function NOPs, but doesn't work for calls between
instrumented and non-instrumented functions, since the latter don't have
pre-function NOPs and the type hash is placed at a different offset. To make
them work better together we'll need some improved compiler support, and I'm
working with others for that currently.
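To illustrate (a sketch based on the description above, assuming a single
32-bit hash word and two pre-function NOPs, not actual compiler output):

	instrumented:			uninstrumented:
	func - 12: <type hash>		func - 4: <type hash>
	func -  8: NOP
	func -  4: NOP
	func +  0: ...			func + 0: ...

so a caller checking the hash at a fixed offset from the branch target
cannot be correct for both kinds of callee.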
GCC doesn't currently have KCFI support, but the plan is to match whatever LLVM
does.
Atop that we'll need some trivial changes to the asm function macros, but
without the underlying compiler support there's not much point.
Thanks,
Mark.
On Mon, Jan 9, 2023 at 2:58 PM Mark Rutland <[email protected]> wrote:
>
> The ACPICA code has been built with '-Os' since the beginning of git
> history, though there's no explanatory comment as to why.
>
> This is unfortunate as building with '-Os' overrides -falign-functions,
> which prevents CONFIG_FUNCTION_ALIGNMENT and
> CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B from having their expected effect
> on the ACPICA code. This is doubly unfortunate as in subsequent patches
> arm64 will depend upon CONFIG_FUNCTION_ALIGNMENT for its ftrace
> implementation.
>
> Drop the '-Os' flag when building the ACPICA code. With this removed,
> the code builds cleanly and works correctly in testing so far.
Fair enough.
Acked-by: Rafael J. Wysocki <[email protected]>
and please feel free to route this through the arch tree along with
the rest of the series.
Thanks!
> I've tested this by selecting CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B=y,
> building and booting a kernel using ACPI, and looking for misaligned
> text symbols:
>
> * arm64:
>
> Before:
> # grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | wc -l
> 908
> # grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | grep acpi | wc -l
> 576
>
> After:
> # grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | wc -l
> 322
> # grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | grep acpi | wc -l
> 0
>
> * x86_64:
>
> Before:
> # grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | wc -l
> 2057
> # grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | grep acpi | wc -l
> 706
>
> After:
> # grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | wc -l
> 1351
> # grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | grep acpi | wc -l
> 0
>
> With the patch applied, the remaining unaligned text labels are a
> combination of static call trampolines and labels in assembly, which
> will be dealt with in subsequent patches.
>
> Signed-off-by: Mark Rutland <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Florent Revest <[email protected]>
> Cc: Len Brown <[email protected]>
> Cc: Masami Hiramatsu <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Rafael J. Wysocki <[email protected]>
> Cc: Robert Moore <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: [email protected]
> ---
> drivers/acpi/acpica/Makefile | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/acpica/Makefile b/drivers/acpi/acpica/Makefile
> index 9e0d95d76fff7..30f3fc13c29d1 100644
> --- a/drivers/acpi/acpica/Makefile
> +++ b/drivers/acpi/acpica/Makefile
> @@ -3,7 +3,7 @@
> # Makefile for ACPICA Core interpreter
> #
>
> -ccflags-y := -Os -D_LINUX -DBUILDING_ACPICA
> +ccflags-y := -D_LINUX -DBUILDING_ACPICA
> ccflags-$(CONFIG_ACPI_DEBUG) += -DACPI_DEBUG_OUTPUT
>
> # use acpi.o to put all files here into acpi.o modparam namespace
> --
> 2.30.2
>
On Tue, Jan 10, 2023 at 09:35:18PM +0100, Peter Zijlstra wrote:
> On Mon, Jan 09, 2023 at 01:58:23PM +0000, Mark Rutland wrote:
>
> > diff --git a/arch/arm64/include/asm/linkage.h b/arch/arm64/include/asm/linkage.h
> > index 1436fa1cde24d..df18a3446ce82 100644
> > --- a/arch/arm64/include/asm/linkage.h
> > +++ b/arch/arm64/include/asm/linkage.h
> > @@ -5,8 +5,14 @@
> > #include <asm/assembler.h>
> > #endif
> >
> > -#define __ALIGN .align 2
> > -#define __ALIGN_STR ".align 2"
> > +#if CONFIG_FUNCTION_ALIGNMENT > 0
> > +#define ARM64_FUNCTION_ALIGNMENT CONFIG_FUNCTION_ALIGNMENT
> > +#else
> > +#define ARM64_FUNCTION_ALIGNMENT 4
> > +#endif
> > +
> > +#define __ALIGN .balign ARM64_FUNCTION_ALIGNMENT
> > +#define __ALIGN_STR ".balign " #ARM64_FUNCTION_ALIGNMENT
>
> Isn't that much the same as having ARM64 select FUNCTION_ALIGNMENT_4B
> and simply removing all these lines and relying on the default
> behaviour?
There's a proposal (with some rough performance claims) to select
FUNCTION_ALIGNMENT_16B over at:
https://lore.kernel.org/r/[email protected]
so we could just go with that?
Will
On Mon, Jan 09, 2023 at 01:58:23PM +0000, Mark Rutland wrote:
> diff --git a/arch/arm64/include/asm/linkage.h b/arch/arm64/include/asm/linkage.h
> index 1436fa1cde24d..df18a3446ce82 100644
> --- a/arch/arm64/include/asm/linkage.h
> +++ b/arch/arm64/include/asm/linkage.h
> @@ -5,8 +5,14 @@
> #include <asm/assembler.h>
> #endif
>
> -#define __ALIGN .align 2
> -#define __ALIGN_STR ".align 2"
> +#if CONFIG_FUNCTION_ALIGNMENT > 0
> +#define ARM64_FUNCTION_ALIGNMENT CONFIG_FUNCTION_ALIGNMENT
> +#else
> +#define ARM64_FUNCTION_ALIGNMENT 4
> +#endif
> +
> +#define __ALIGN .balign ARM64_FUNCTION_ALIGNMENT
> +#define __ALIGN_STR ".balign " #ARM64_FUNCTION_ALIGNMENT
Isn't that much the same as having ARM64 select FUNCTION_ALIGNMENT_4B
and simply removing all these lines and relying on the default
behaviour?
On Tue, Jan 10, 2023 at 08:43:20PM +0000, Will Deacon wrote:
> On Tue, Jan 10, 2023 at 09:35:18PM +0100, Peter Zijlstra wrote:
> > On Mon, Jan 09, 2023 at 01:58:23PM +0000, Mark Rutland wrote:
> >
> > > diff --git a/arch/arm64/include/asm/linkage.h b/arch/arm64/include/asm/linkage.h
> > > index 1436fa1cde24d..df18a3446ce82 100644
> > > --- a/arch/arm64/include/asm/linkage.h
> > > +++ b/arch/arm64/include/asm/linkage.h
> > > @@ -5,8 +5,14 @@
> > > #include <asm/assembler.h>
> > > #endif
> > >
> > > -#define __ALIGN .align 2
> > > -#define __ALIGN_STR ".align 2"
> > > +#if CONFIG_FUNCTION_ALIGNMENT > 0
> > > +#define ARM64_FUNCTION_ALIGNMENT CONFIG_FUNCTION_ALIGNMENT
> > > +#else
> > > +#define ARM64_FUNCTION_ALIGNMENT 4
> > > +#endif
> > > +
> > > +#define __ALIGN .balign ARM64_FUNCTION_ALIGNMENT
> > > +#define __ALIGN_STR ".balign " #ARM64_FUNCTION_ALIGNMENT
> >
> > Isn't that much the same as having ARM64 select FUNCTION_ALIGNMENT_4B
> > and simply removing all these lines and relying on the default
> > behaviour?
>
> There's a proposal (with some rough performance claims) to select
> FUNCTION_ALIGNMENT_16B over at:
>
> https://lore.kernel.org/r/[email protected]
>
> so we could just go with that?
I reckon it'd be worth having that as a separate patch atop, to split the
infrastructure from the actual change, but I'm happy to go with 16B immediately
if you'd prefer.
It'd be nice if we could get some numbers...
Thanks,
Mark.
On Tue, Jan 10, 2023 at 09:35:18PM +0100, Peter Zijlstra wrote:
> On Mon, Jan 09, 2023 at 01:58:23PM +0000, Mark Rutland wrote:
>
> > diff --git a/arch/arm64/include/asm/linkage.h b/arch/arm64/include/asm/linkage.h
> > index 1436fa1cde24d..df18a3446ce82 100644
> > --- a/arch/arm64/include/asm/linkage.h
> > +++ b/arch/arm64/include/asm/linkage.h
> > @@ -5,8 +5,14 @@
> > #include <asm/assembler.h>
> > #endif
> >
> > -#define __ALIGN .align 2
> > -#define __ALIGN_STR ".align 2"
> > +#if CONFIG_FUNCTION_ALIGNMENT > 0
> > +#define ARM64_FUNCTION_ALIGNMENT CONFIG_FUNCTION_ALIGNMENT
> > +#else
> > +#define ARM64_FUNCTION_ALIGNMENT 4
> > +#endif
> > +
> > +#define __ALIGN .balign ARM64_FUNCTION_ALIGNMENT
> > +#define __ALIGN_STR ".balign " #ARM64_FUNCTION_ALIGNMENT
>
> Isn't that much the same as having ARM64 select FUNCTION_ALIGNMENT_4B
> and simply removing all these lines and relying on the default
> behaviour?
Yes, it is.
I was focussed on obviously retaining the existing semantic by default, and I
missed that this was possible by selecting FUNCTION_ALIGNMENT_4B.
Thanks,
Mark.
On Mon, Jan 09, 2023 at 03:43:16PM +0100, Miguel Ojeda wrote:
> On Mon, Jan 9, 2023 at 2:58 PM Mark Rutland <[email protected]> wrote:
> >
> > As far as I can tell, GCC doesn't respect '-falign-functions=N':
> >
> > * When the __weak__ attribute is used
> >
> > GCC seems to forget the alignment specified by '-falign-functions=N',
> > but will respect the '__aligned__(N)' function attribute. Thus, we can
> > work around this by explciitly setting the alignment for weak
> > functions.
> >
> > * When the __cold__ attribute is used
> >
> > GCC seems to forget the alignment specified by '-falign-functions=N',
> > and also doesn't seem to respect the '__aligned__(N)' function
> > attribute. The only way to work around this is to not use the __cold__
> > attibute.
>
> If you happen to have a reduced case, then it would be nice to link it
> in the commit. A bug report to GCC would also be nice.
>
> I gave it a very quick try in Compiler Explorer, but I couldn't
> reproduce it, so I guess it depends on flags, non-trivial functions or
> something else.
So having spent today coming up with tests, it turns out it's not quite as I
described above, but in a sense worse. I'm posting a summary here for
posterity; I'll try to get this to compiler folk shortly.
GCC appears to not align cold functions to the alignment specified by
`-falign-functions=N` when compiling at `-O1` or above. Alignment *can* be
restored with explicit attributes on each function, but due to some
interprocedural analysis, callees can be implicitly marked as cold (losing
their default alignment), which means we don't have a reliable mechanism to
ensure functions are always aligned short of annotating *every* function
explicitly (and I suspect that's not sufficient due to some interprocedural optimizations).
I've tested with the 12.1.0 binary release from the kernel.org cross toolchains
page.
LLVM always seems to respect `-falign-functions=N` at both `-O1` and `-O2` (I
tested the 10.0.0, 11.0.0, 11.0.1, 15.0.6 binary releases from llvm.org).
For example:
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% cat test-cold.c
| #define __cold \
| __attribute__((cold))
|
| #define EXPORT_FUNC_PTR(func) \
| typeof((func)) *__ptr_##func = (func)
|
| __cold
| void cold_func_a(void) { }
|
| __cold
| void cold_func_b(void) { }
|
| __cold
| void cold_func_c(void) { }
|
| static __cold
| void static_cold_func_a(void) { }
| EXPORT_FUNC_PTR(static_cold_func_a);
|
| static __cold
| void static_cold_func_b(void) { }
| EXPORT_FUNC_PTR(static_cold_func_b);
|
| static __cold
| void static_cold_func_c(void) { }
| EXPORT_FUNC_PTR(static_cold_func_c);
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-gcc -falign-functions=16 -c test-cold.c -O1
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -d test-cold.o
|
| test-cold.o: file format elf64-littleaarch64
|
|
| Disassembly of section .text:
|
| 0000000000000000 <static_cold_func_a>:
| 0: d65f03c0 ret
|
| 0000000000000004 <static_cold_func_b>:
| 4: d65f03c0 ret
|
| 0000000000000008 <static_cold_func_c>:
| 8: d65f03c0 ret
|
| 000000000000000c <cold_func_a>:
| c: d65f03c0 ret
|
| 0000000000000010 <cold_func_b>:
| 10: d65f03c0 ret
|
| 0000000000000014 <cold_func_c>:
| 14: d65f03c0 ret
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -h test-cold.o
|
| test-cold.o: file format elf64-littleaarch64
|
| Sections:
| Idx Name Size VMA LMA File off Algn
| 0 .text 00000018 0000000000000000 0000000000000000 00000040 2**2
| CONTENTS, ALLOC, LOAD, READONLY, CODE
| 1 .data 00000018 0000000000000000 0000000000000000 00000058 2**3
| CONTENTS, ALLOC, LOAD, RELOC, DATA
| 2 .bss 00000000 0000000000000000 0000000000000000 00000070 2**0
| ALLOC
| 3 .comment 00000013 0000000000000000 0000000000000000 00000070 2**0
| CONTENTS, READONLY
| 4 .note.GNU-stack 00000000 0000000000000000 0000000000000000 00000083 2**0
| CONTENTS, READONLY
| 5 .eh_frame 00000090 0000000000000000 0000000000000000 00000088 2**3
| CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
In simple cases, alignment *can* be restored if an explicit function attribute
is used. For example:
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% cat test-aligned-cold.c
| #define __aligned(n) \
| __attribute__((aligned(n)))
|
| #define __cold \
| __attribute__((cold)) __aligned(16)
|
| #define EXPORT_FUNC_PTR(func) \
| typeof((func)) *__ptr_##func = (func)
|
| __cold
| void cold_func_a(void) { }
|
| __cold
| void cold_func_b(void) { }
|
| __cold
| void cold_func_c(void) { }
|
| static __cold
| void static_cold_func_a(void) { }
| EXPORT_FUNC_PTR(static_cold_func_a);
|
| static __cold
| void static_cold_func_b(void) { }
| EXPORT_FUNC_PTR(static_cold_func_b);
|
| static __cold
| void static_cold_func_c(void) { }
| EXPORT_FUNC_PTR(static_cold_func_c);
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-gcc -falign-functions=16 -c test-aligned-cold.c -O1
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -d test-aligned-cold.o
|
| test-aligned-cold.o: file format elf64-littleaarch64
|
|
| Disassembly of section .text:
|
| 0000000000000000 <static_cold_func_a>:
| 0: d65f03c0 ret
| 4: d503201f nop
| 8: d503201f nop
| c: d503201f nop
|
| 0000000000000010 <static_cold_func_b>:
| 10: d65f03c0 ret
| 14: d503201f nop
| 18: d503201f nop
| 1c: d503201f nop
|
| 0000000000000020 <static_cold_func_c>:
| 20: d65f03c0 ret
| 24: d503201f nop
| 28: d503201f nop
| 2c: d503201f nop
|
| 0000000000000030 <cold_func_a>:
| 30: d65f03c0 ret
| 34: d503201f nop
| 38: d503201f nop
| 3c: d503201f nop
|
| 0000000000000040 <cold_func_b>:
| 40: d65f03c0 ret
| 44: d503201f nop
| 48: d503201f nop
| 4c: d503201f nop
|
| 0000000000000050 <cold_func_c>:
| 50: d65f03c0 ret
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -h test-aligned-cold.o
|
| test-aligned-cold.o: file format elf64-littleaarch64
|
| Sections:
| Idx Name Size VMA LMA File off Algn
| 0 .text 00000054 0000000000000000 0000000000000000 00000040 2**4
| CONTENTS, ALLOC, LOAD, READONLY, CODE
| 1 .data 00000018 0000000000000000 0000000000000000 00000098 2**3
| CONTENTS, ALLOC, LOAD, RELOC, DATA
| 2 .bss 00000000 0000000000000000 0000000000000000 000000b0 2**0
| ALLOC
| 3 .comment 00000013 0000000000000000 0000000000000000 000000b0 2**0
| CONTENTS, READONLY
| 4 .note.GNU-stack 00000000 0000000000000000 0000000000000000 000000c3 2**0
| CONTENTS, READONLY
| 5 .eh_frame 00000090 0000000000000000 0000000000000000 000000c8 2**3
| CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
Unfortunately it appears that some interprocedural analysis determines that if
a callee is only called/referenced from cold callers, the callee is marked as
cold, and the alignment it would have got from the command line option is
dropped. If it's given an explicit alignment attribute, the alignment is
retained.
For example:
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% cat test-aligned-cold-caller.c
| #define noinline \
| __attribute__((noinline))
|
| #define __aligned(n) \
| __attribute__((aligned(n)))
|
| #define __cold \
| __attribute__((cold)) __aligned(16)
|
| #define EXPORT_FUNC_PTR(func) \
| typeof((func)) *__ptr_##func = (func)
|
| static noinline void callee_a(void)
| {
| asm volatile("// callee_a\n" ::: "memory");
| }
|
| static noinline void callee_b(void)
| {
| asm volatile("// callee_b\n" ::: "memory");
| }
|
| static noinline void callee_c(void)
| {
| asm volatile("// callee_c\n" ::: "memory");
| }
| __cold
| void cold_func_a(void) { callee_a(); }
|
| __cold
| void cold_func_b(void) { callee_b(); }
|
| __cold
| void cold_func_c(void) { callee_c(); }
|
| static __cold
| void static_cold_func_a(void) { callee_a(); }
| EXPORT_FUNC_PTR(static_cold_func_a);
|
| static __cold
| void static_cold_func_b(void) { callee_b(); }
| EXPORT_FUNC_PTR(static_cold_func_b);
|
| static __cold
| void static_cold_func_c(void) { callee_c(); }
| EXPORT_FUNC_PTR(static_cold_func_c);
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-gcc -falign-functions=16 -c test-aligned-cold-caller.c -O1
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -d test-aligned-cold-caller.o
|
| test-aligned-cold-caller.o: file format elf64-littleaarch64
|
|
| Disassembly of section .text:
|
| 0000000000000000 <callee_a>:
| 0: d65f03c0 ret
|
| 0000000000000004 <callee_b>:
| 4: d65f03c0 ret
|
| 0000000000000008 <callee_c>:
| 8: d65f03c0 ret
| c: d503201f nop
|
| 0000000000000010 <static_cold_func_a>:
| 10: a9bf7bfd stp x29, x30, [sp, #-16]!
| 14: 910003fd mov x29, sp
| 18: 97fffffa bl 0 <callee_a>
| 1c: a8c17bfd ldp x29, x30, [sp], #16
| 20: d65f03c0 ret
| 24: d503201f nop
| 28: d503201f nop
| 2c: d503201f nop
|
| 0000000000000030 <static_cold_func_b>:
| 30: a9bf7bfd stp x29, x30, [sp, #-16]!
| 34: 910003fd mov x29, sp
| 38: 97fffff3 bl 4 <callee_b>
| 3c: a8c17bfd ldp x29, x30, [sp], #16
| 40: d65f03c0 ret
| 44: d503201f nop
| 48: d503201f nop
| 4c: d503201f nop
|
| 0000000000000050 <static_cold_func_c>:
| 50: a9bf7bfd stp x29, x30, [sp, #-16]!
| 54: 910003fd mov x29, sp
| 58: 97ffffec bl 8 <callee_c>
| 5c: a8c17bfd ldp x29, x30, [sp], #16
| 60: d65f03c0 ret
| 64: d503201f nop
| 68: d503201f nop
| 6c: d503201f nop
|
| 0000000000000070 <cold_func_a>:
| 70: a9bf7bfd stp x29, x30, [sp, #-16]!
| 74: 910003fd mov x29, sp
| 78: 97ffffe2 bl 0 <callee_a>
| 7c: a8c17bfd ldp x29, x30, [sp], #16
| 80: d65f03c0 ret
| 84: d503201f nop
| 88: d503201f nop
| 8c: d503201f nop
|
| 0000000000000090 <cold_func_b>:
| 90: a9bf7bfd stp x29, x30, [sp, #-16]!
| 94: 910003fd mov x29, sp
| 98: 97ffffdb bl 4 <callee_b>
| 9c: a8c17bfd ldp x29, x30, [sp], #16
| a0: d65f03c0 ret
| a4: d503201f nop
| a8: d503201f nop
| ac: d503201f nop
|
| 00000000000000b0 <cold_func_c>:
| b0: a9bf7bfd stp x29, x30, [sp, #-16]!
| b4: 910003fd mov x29, sp
| b8: 97ffffd4 bl 8 <callee_c>
| bc: a8c17bfd ldp x29, x30, [sp], #16
| c0: d65f03c0 ret
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -h test-aligned-cold-caller.o
|
| test-aligned-cold-caller.o: file format elf64-littleaarch64
|
| Sections:
| Idx Name Size VMA LMA File off Algn
| 0 .text 000000c4 0000000000000000 0000000000000000 00000040 2**4
| CONTENTS, ALLOC, LOAD, READONLY, CODE
| 1 .data 00000018 0000000000000000 0000000000000000 00000108 2**3
| CONTENTS, ALLOC, LOAD, RELOC, DATA
| 2 .bss 00000000 0000000000000000 0000000000000000 00000120 2**0
| ALLOC
| 3 .comment 00000013 0000000000000000 0000000000000000 00000120 2**0
| CONTENTS, READONLY
| 4 .note.GNU-stack 00000000 0000000000000000 0000000000000000 00000133 2**0
| CONTENTS, READONLY
| 5 .eh_frame 00000110 0000000000000000 0000000000000000 00000138 2**3
| CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
Thanks,
Mark.
On Wed, Jan 11, 2023 at 06:27:53PM +0000, Mark Rutland wrote:
> On Mon, Jan 09, 2023 at 03:43:16PM +0100, Miguel Ojeda wrote:
> > On Mon, Jan 9, 2023 at 2:58 PM Mark Rutland <[email protected]> wrote:
> > >
> > > As far as I can tell, GCC doesn't respect '-falign-functions=N':
> > >
> > > * When the __weak__ attribute is used
> > >
> > > GCC seems to forget the alignment specified by '-falign-functions=N',
> > > but will respect the '__aligned__(N)' function attribute. Thus, we can
> > > work around this by explciitly setting the alignment for weak
> > > functions.
> > >
> > > * When the __cold__ attribute is used
> > >
> > > GCC seems to forget the alignment specified by '-falign-functions=N',
> > > and also doesn't seem to respect the '__aligned__(N)' function
> > > attribute. The only way to work around this is to not use the __cold__
> > > attibute.
> >
> > If you happen to have a reduced case, then it would be nice to link it
> > in the commit. A bug report to GCC would also be nice.
> >
> > I gave it a very quick try in Compiler Explorer, but I couldn't
> > reproduce it, so I guess it depends on flags, non-trivial functions or
> > something else.
>
> So having spent today coming up with tests, it turns out it's not quite as I
> described above, but in a sense worse. I'm posting a summary here for
> posterity; I'll try to get this to compiler folk shortly.
I've added the cold bits to an existing ticket:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345
I have not been able to reproduce the issue with __weak__, so I'll go dig into
that some more; it's likely I was mistaken there.
Thanks,
Mark.
On Thu, Jan 12, 2023 at 11:38:17AM +0000, Mark Rutland wrote:
> On Wed, Jan 11, 2023 at 06:27:53PM +0000, Mark Rutland wrote:
> > On Mon, Jan 09, 2023 at 03:43:16PM +0100, Miguel Ojeda wrote:
> > > On Mon, Jan 9, 2023 at 2:58 PM Mark Rutland <[email protected]> wrote:
> > > >
> > > > As far as I can tell, GCC doesn't respect '-falign-functions=N':
> > > >
> > > > * When the __weak__ attribute is used
> > > >
> > > > GCC seems to forget the alignment specified by '-falign-functions=N',
> > > > but will respect the '__aligned__(N)' function attribute. Thus, we can
> > > > work around this by explciitly setting the alignment for weak
> > > > functions.
> > > >
> > > > * When the __cold__ attribute is used
> > > >
> > > > GCC seems to forget the alignment specified by '-falign-functions=N',
> > > > and also doesn't seem to respect the '__aligned__(N)' function
> > > > attribute. The only way to work around this is to not use the __cold__
> > > > attibute.
> > >
> > > If you happen to have a reduced case, then it would be nice to link it
> > > in the commit. A bug report to GCC would also be nice.
> > >
> > > I gave it a very quick try in Compiler Explorer, but I couldn't
> > > reproduce it, so I guess it depends on flags, non-trivial functions or
> > > something else.
> >
> > So having spent today coming up with tests, it turns out it's not quite as I
> > described above, but in a sense worse. I'm posting a summary here for
> > posterity; I'll try to get this to compiler folk shortly.
>
> I've added the cold bits to an existing ticket:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345
>
> I have not been able to reproduce the issue with __weak__, so I'll go dig into
> that some more; it's likely I was mistaken there.
It turns out that was a red herring; GCC is actually implicitly marking the
abort() function as cold, and as Linux's implementation happened to be marked
as weak I assumed that was the culprit.
I'll drop the changes to weak and update our abort implementation specifically,
with a comment.
I'll also go update the ticket above.
Thanks,
Mark.
On Fri, Jan 13, 2023 at 1:49 PM Mark Rutland <[email protected]> wrote:
>
> It turns out that was a red herring; GCC is actually implicitly marking the
> abort() function as cold, and as Linux's implementation happened to be marked
> as weak I assumed that was the culprit.
That and your previous message probably explain why I
couldn't reproduce it.
Thanks a lot for all the details -- the `cold` issue is reproducible
since gcc 4.6 at least: https://godbolt.org/z/PoxazzT9T
The `abort` case appears to happen since gcc 8.1.
Cheers,
Miguel