2023-04-05 18:04:55

by Florent Revest

Subject: [PATCH v6 0/5] Add ftrace direct call for arm64

This series adds ftrace direct call support to arm64.
This makes BPF tracing programs (fentry/fexit/fmod_ret/lsm) work on arm64.

It is meant to be taken by the arm64 tree but it depends on the
trace-direct-v6.3-rc3 tag of the linux-trace tree:
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
That tag was created by Steven Rostedt so the arm64 tree can pull the prior work
this depends on. [1]

Thanks to the ftrace refactoring under that tag, an ftrace_ops backing an ftrace
direct call will only ever point to *one* direct call. This means we can look up
the direct called trampoline address stored in the ops from the ftrace_caller
trampoline in the case when the destination would be out of reach of a BL
instruction at the ftrace callsite. This fixes limitations of previous attempts
such as [2].
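
As a rough illustration of the "out of reach of a BL instruction" condition,
the range check can be sketched in plain userspace C as below. It mirrors the
reachable_by_bl() helper added in patch 1; the SZ_128M definition and the
pick_callsite_target() wrapper are stand-ins written for this sketch only, and
the module PLT handling done by the real ftrace_find_callable_addr() is left
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel's SZ_128M; BL reaches [-128 MiB, +128 MiB). */
#define SZ_128M ((int64_t)128 * 1024 * 1024)

/* Same range check as reachable_by_bl() in patch 1, on fixed-width types. */
static bool reachable_by_bl(uint64_t addr, uint64_t pc)
{
	int64_t offset = (int64_t)(addr - pc);

	return offset >= -SZ_128M && offset < SZ_128M;
}

/*
 * Hypothetical helper (not a kernel function): pick the address the
 * callsite BL should target. An unreachable direct trampoline falls back
 * to ftrace_caller, which then reaches it through ops->direct_call.
 */
static uint64_t pick_callsite_target(uint64_t tramp, uint64_t pc,
				     uint64_t ftrace_caller)
{
	if (tramp != ftrace_caller && !reachable_by_bl(tramp, pc))
		return ftrace_caller;

	return tramp;
}
```

With a callsite at 0x10000000, a trampoline at 0x40000000 is more than 128 MiB
away, so this sketch returns the ftrace_caller address instead.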

This series has been tested on arm64 with:
1- CONFIG_FTRACE_SELFTEST
2- samples/ftrace/*.ko (cf: patch 4)
3- tools/testing/selftests/bpf/test_progs (cf: patch 5)

Changes since v5 [3]:
- Fixed saving the fourth argument of handle_mm_fault in both the x86 (patch 3)
  and arm64 (as part of patch 4) "ftrace-direct-too" sample trampolines
- Fixed the address of the traced function logged by some direct call samples
  (ftrace-direct-multi and ftrace-direct-multi-modify) by moving lr into x0

1: https://lore.kernel.org/all/ZB2Nl7fzpHoq5V20@FVFF77S0Q05N/
2: https://lore.kernel.org/all/[email protected]/
3: https://lore.kernel.org/bpf/[email protected]/

Florent Revest (5):
  arm64: ftrace: Add direct call support
  arm64: ftrace: Simplify get_ftrace_plt
  samples: ftrace: Save required argument registers in sample
    trampolines
  arm64: ftrace: Add direct call trampoline samples support
  selftests/bpf: Update the tests deny list on aarch64

arch/arm64/Kconfig | 6 ++
arch/arm64/include/asm/ftrace.h | 22 +++++
arch/arm64/kernel/asm-offsets.c | 6 ++
arch/arm64/kernel/entry-ftrace.S | 90 ++++++++++++++++----
arch/arm64/kernel/ftrace.c | 46 +++++++---
samples/ftrace/ftrace-direct-modify.c | 34 ++++++++
samples/ftrace/ftrace-direct-multi-modify.c | 40 +++++++++
samples/ftrace/ftrace-direct-multi.c | 24 ++++++
samples/ftrace/ftrace-direct-too.c | 40 +++++++--
samples/ftrace/ftrace-direct.c | 24 ++++++
tools/testing/selftests/bpf/DENYLIST.aarch64 | 82 ++----------------
11 files changed, 306 insertions(+), 108 deletions(-)

--
2.40.0.577.gac1e443424-goog


2023-04-05 18:05:01

by Florent Revest

Subject: [PATCH v6 2/5] arm64: ftrace: Simplify get_ftrace_plt

Following recent refactorings, the get_ftrace_plt function only ever
gets called with addr = FTRACE_ADDR so its code can be simplified to
always return the ftrace trampoline plt.

Signed-off-by: Florent Revest <[email protected]>
Acked-by: Mark Rutland <[email protected]>
---
arch/arm64/kernel/ftrace.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index 758436727fba..432626c866a8 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -195,15 +195,15 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
return ftrace_modify_code(pc, 0, new, false);
}

-static struct plt_entry *get_ftrace_plt(struct module *mod, unsigned long addr)
+static struct plt_entry *get_ftrace_plt(struct module *mod)
{
#ifdef CONFIG_ARM64_MODULE_PLTS
struct plt_entry *plt = mod->arch.ftrace_trampolines;

- if (addr == FTRACE_ADDR)
- return &plt[FTRACE_PLT_IDX];
-#endif
+ return &plt[FTRACE_PLT_IDX];
+#else
return NULL;
+#endif
}

static bool reachable_by_bl(unsigned long addr, unsigned long pc)
@@ -270,7 +270,7 @@ static bool ftrace_find_callable_addr(struct dyn_ftrace *rec,
if (WARN_ON(!mod))
return false;

- plt = get_ftrace_plt(mod, *addr);
+ plt = get_ftrace_plt(mod);
if (!plt) {
pr_err("ftrace: no module PLT for %ps\n", (void *)*addr);
return false;
--
2.40.0.577.gac1e443424-goog

2023-04-05 18:05:17

by Florent Revest

Subject: [PATCH v6 1/5] arm64: ftrace: Add direct call support

This builds on the CALL_OPS work which extends the ftrace patchsite
on arm64 with an ops pointer usable by the ftrace trampoline.

This ops pointer is valid at all times. Indeed, it is either pointing to
ftrace_list_ops or to the single ops which should be called from that
patchsite.
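
The trampoline finds this ops pointer from the link register alone. Going by
the comment in the entry-ftrace.S hunk below, the literal sits at an 8-byte
aligned address 12 or 16 bytes before the BL, so LR is `literal + 16` or
`literal + 20`. A small userspace sketch of that arithmetic follows;
ops_literal_addr() is a name made up for this example, and AARCH64_INSN_SIZE
is assumed to be 4 as in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the kernel's definition: A64 instructions are 4 bytes. */
#define AARCH64_INSN_SIZE 4

/*
 * Hypothetical helper mirroring the BIC + LDR pair in ftrace_caller:
 * align LR down to 8 bytes first, then step back 16 bytes to the literal.
 */
static uint64_t ops_literal_addr(uint64_t lr)
{
	return (lr & ~(uint64_t)0x7) - 4 * AARCH64_INSN_SIZE;
}
```

For a literal at 0x1000, both possible LR values (0x1010 and 0x1014) align
down to 0x1010, so the same subtraction recovers 0x1000 in either layout.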

There are a few cases to distinguish:
- If a direct call ops is the only one tracing a function:
  - If the direct called trampoline is within the reach of a BL
    instruction
     -> the ftrace patchsite jumps to the trampoline
  - Else
     -> the ftrace patchsite jumps to the ftrace_caller trampoline which
        reads the ops pointer in the patchsite and jumps to the direct
        call address stored in the ops
- Else
  -> the ftrace patchsite jumps to the ftrace_caller trampoline and its
     ops literal points to ftrace_list_ops so it iterates over all
     registered ftrace ops, including the direct call ops and calls its
     call_direct_funcs handler which stores the direct called
     trampoline's address in the ftrace_regs and the ftrace_caller
     trampoline will return to that address instead of returning to the
     traced function
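
The cases above can be summarised as a small decision function. This is only
a sketch for readers: the enum, its values and pick_direct_path() are invented
here and do not exist in the kernel, where the decision is spread across the
patchsite update code and the ftrace_caller trampoline.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three paths described above. */
enum direct_path {
	PATCHSITE_TO_TRAMP,	/* BL straight to the direct trampoline */
	CALLER_VIA_OPS,		/* ftrace_caller jumps to ops->direct_call */
	CALLER_VIA_LIST,	/* ftrace_list_ops, then fregs->direct_tramp */
};

/* Hypothetical helper: map the two conditions onto the path taken. */
static enum direct_path pick_direct_path(bool only_direct_ops,
					 bool tramp_in_bl_range)
{
	if (!only_direct_ops)
		return CALLER_VIA_LIST;

	return tramp_in_bl_range ? PATCHSITE_TO_TRAMP : CALLER_VIA_OPS;
}
```

Note that the BL range only matters when the direct call ops is the sole
tracer; with several ops attached the list path is taken either way.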

Signed-off-by: Florent Revest <[email protected]>
Co-developed-by: Mark Rutland <[email protected]>
Signed-off-by: Mark Rutland <[email protected]>
---
arch/arm64/Kconfig | 4 ++
arch/arm64/include/asm/ftrace.h | 22 ++++++++
arch/arm64/kernel/asm-offsets.c | 6 +++
arch/arm64/kernel/entry-ftrace.S | 90 ++++++++++++++++++++++++++------
arch/arm64/kernel/ftrace.c | 36 +++++++++++--
5 files changed, 138 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1023e896d46b..f3503d0cc1b8 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -185,6 +185,10 @@ config ARM64
select HAVE_DEBUG_KMEMLEAK
select HAVE_DMA_CONTIGUOUS
select HAVE_DYNAMIC_FTRACE
+ select HAVE_DYNAMIC_FTRACE_WITH_ARGS \
+ if $(cc-option,-fpatchable-function-entry=2)
+ select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS \
+ if DYNAMIC_FTRACE_WITH_ARGS && DYNAMIC_FTRACE_WITH_CALL_OPS
select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS \
if (DYNAMIC_FTRACE_WITH_ARGS && !CFI_CLANG && \
!CC_OPTIMIZE_FOR_SIZE)
diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index 1c2672bbbf37..b87d70b693c6 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -70,10 +70,19 @@ struct ftrace_ops;

#define arch_ftrace_get_regs(regs) NULL

+/*
+ * Note: sizeof(struct ftrace_regs) must be a multiple of 16 to ensure correct
+ * stack alignment
+ */
struct ftrace_regs {
/* x0 - x8 */
unsigned long regs[9];
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+ unsigned long direct_tramp;
+#else
unsigned long __unused;
+#endif

unsigned long fp;
unsigned long lr;
@@ -136,6 +145,19 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec);
void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *op, struct ftrace_regs *fregs);
#define ftrace_graph_func ftrace_graph_func
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+static inline void arch_ftrace_set_direct_caller(struct ftrace_regs *fregs,
+ unsigned long addr)
+{
+ /*
+ * The ftrace trampoline will return to this address instead of the
+ * instrumented function.
+ */
+ fregs->direct_tramp = addr;
+}
+#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
+
#endif

#define ftrace_return_address(n) return_address(n)
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index ae345b06e9f7..0996094b0d22 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -93,6 +93,9 @@ int main(void)
DEFINE(FREGS_LR, offsetof(struct ftrace_regs, lr));
DEFINE(FREGS_SP, offsetof(struct ftrace_regs, sp));
DEFINE(FREGS_PC, offsetof(struct ftrace_regs, pc));
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+ DEFINE(FREGS_DIRECT_TRAMP, offsetof(struct ftrace_regs, direct_tramp));
+#endif
DEFINE(FREGS_SIZE, sizeof(struct ftrace_regs));
BLANK();
#endif
@@ -197,6 +200,9 @@ int main(void)
#endif
#ifdef CONFIG_FUNCTION_TRACER
DEFINE(FTRACE_OPS_FUNC, offsetof(struct ftrace_ops, func));
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+ DEFINE(FTRACE_OPS_DIRECT_CALL, offsetof(struct ftrace_ops, direct_call));
+#endif
#endif
return 0;
}
diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
index 350ed81324ac..1c38a60575aa 100644
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -36,6 +36,31 @@
SYM_CODE_START(ftrace_caller)
bti c

+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+ /*
+ * The literal pointer to the ops is at an 8-byte aligned boundary
+ * which is either 12 or 16 bytes before the BL instruction in the call
+ * site. See ftrace_call_adjust() for details.
+ *
+ * Therefore here the LR points at `literal + 16` or `literal + 20`,
+ * and we can find the address of the literal in either case by
+ * aligning to an 8-byte boundary and subtracting 16. We do the
+ * alignment first as this allows us to fold the subtraction into the
+ * LDR.
+ */
+ bic x11, x30, 0x7
+ ldr x11, [x11, #-(4 * AARCH64_INSN_SIZE)] // op
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+ /*
+ * If the op has a direct call, handle it immediately without
+ * saving/restoring registers.
+ */
+ ldr x17, [x11, #FTRACE_OPS_DIRECT_CALL] // op->direct_call
+ cbnz x17, ftrace_caller_direct
+#endif
+#endif
+
/* Save original SP */
mov x10, sp

@@ -49,6 +74,10 @@ SYM_CODE_START(ftrace_caller)
stp x6, x7, [sp, #FREGS_X6]
str x8, [sp, #FREGS_X8]

+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+ str xzr, [sp, #FREGS_DIRECT_TRAMP]
+#endif
+
/* Save the callsite's FP, LR, SP */
str x29, [sp, #FREGS_FP]
str x9, [sp, #FREGS_LR]
@@ -71,20 +100,7 @@ SYM_CODE_START(ftrace_caller)
mov x3, sp // regs

#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
- /*
- * The literal pointer to the ops is at an 8-byte aligned boundary
- * which is either 12 or 16 bytes before the BL instruction in the call
- * site. See ftrace_call_adjust() for details.
- *
- * Therefore here the LR points at `literal + 16` or `literal + 20`,
- * and we can find the address of the literal in either case by
- * aligning to an 8-byte boundary and subtracting 16. We do the
- * alignment first as this allows us to fold the subtraction into the
- * LDR.
- */
- bic x2, x30, 0x7
- ldr x2, [x2, #-16] // op
-
+ mov x2, x11 // op
ldr x4, [x2, #FTRACE_OPS_FUNC] // op->func
blr x4 // op->func(ip, parent_ip, op, regs)

@@ -107,8 +123,15 @@ SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
ldp x6, x7, [sp, #FREGS_X6]
ldr x8, [sp, #FREGS_X8]

- /* Restore the callsite's FP, LR, PC */
+ /* Restore the callsite's FP */
ldr x29, [sp, #FREGS_FP]
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+ ldr x17, [sp, #FREGS_DIRECT_TRAMP]
+ cbnz x17, ftrace_caller_direct_late
+#endif
+
+ /* Restore the callsite's LR and PC */
ldr x30, [sp, #FREGS_LR]
ldr x9, [sp, #FREGS_PC]

@@ -116,8 +139,45 @@ SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
add sp, sp, #FREGS_SIZE + 32

ret x9
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+SYM_INNER_LABEL(ftrace_caller_direct_late, SYM_L_LOCAL)
+ /*
+ * Head to a direct trampoline in x17 after having run other tracers.
+ * The ftrace_regs are live, and x0-x8 and FP have been restored. The
+ * LR, PC, and SP have not been restored.
+ */
+
+ /*
+ * Restore the callsite's LR and PC matching the trampoline calling
+ * convention.
+ */
+ ldr x9, [sp, #FREGS_LR]
+ ldr x30, [sp, #FREGS_PC]
+
+ /* Restore the callsite's SP */
+ add sp, sp, #FREGS_SIZE + 32
+
+SYM_INNER_LABEL(ftrace_caller_direct, SYM_L_LOCAL)
+ /*
+ * Head to a direct trampoline in x17.
+ *
+ * We use `BR X17` as this can safely land on a `BTI C` or `PACIASP` in
+ * the trampoline, and will not unbalance any return stack.
+ */
+ br x17
+#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
SYM_CODE_END(ftrace_caller)

+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+SYM_CODE_START(ftrace_stub_direct_tramp)
+ bti c
+ mov x10, x30
+ mov x30, x9
+ ret x10
+SYM_CODE_END(ftrace_stub_direct_tramp)
+#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
+
#else /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */

/*
diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index 5545fe1a9012..758436727fba 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -206,6 +206,13 @@ static struct plt_entry *get_ftrace_plt(struct module *mod, unsigned long addr)
return NULL;
}

+static bool reachable_by_bl(unsigned long addr, unsigned long pc)
+{
+ long offset = (long)addr - (long)pc;
+
+ return offset >= -SZ_128M && offset < SZ_128M;
+}
+
/*
* Find the address the callsite must branch to in order to reach '*addr'.
*
@@ -220,14 +227,21 @@ static bool ftrace_find_callable_addr(struct dyn_ftrace *rec,
unsigned long *addr)
{
unsigned long pc = rec->ip;
- long offset = (long)*addr - (long)pc;
struct plt_entry *plt;

+ /*
+ * If a custom trampoline is unreachable, rely on the ftrace_caller
+ * trampoline which knows how to indirectly reach that trampoline
+ * through ops->direct_call.
+ */
+ if (*addr != FTRACE_ADDR && !reachable_by_bl(*addr, pc))
+ *addr = FTRACE_ADDR;
+
/*
* When the target is within range of the 'BL' instruction, use 'addr'
* as-is and branch to that directly.
*/
- if (offset >= -SZ_128M && offset < SZ_128M)
+ if (reachable_by_bl(*addr, pc))
return true;

/*
@@ -330,12 +344,24 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
unsigned long addr)
{
- if (WARN_ON_ONCE(old_addr != (unsigned long)ftrace_caller))
+ unsigned long pc = rec->ip;
+ u32 old, new;
+ int ret;
+
+ ret = ftrace_rec_set_ops(rec, arm64_rec_get_ops(rec));
+ if (ret)
+ return ret;
+
+ if (!ftrace_find_callable_addr(rec, NULL, &old_addr))
return -EINVAL;
- if (WARN_ON_ONCE(addr != (unsigned long)ftrace_caller))
+ if (!ftrace_find_callable_addr(rec, NULL, &addr))
return -EINVAL;

- return ftrace_rec_update_ops(rec);
+ old = aarch64_insn_gen_branch_imm(pc, old_addr,
+ AARCH64_INSN_BRANCH_LINK);
+ new = aarch64_insn_gen_branch_imm(pc, addr, AARCH64_INSN_BRANCH_LINK);
+
+ return ftrace_modify_code(pc, old, new, true);
}
#endif

--
2.40.0.577.gac1e443424-goog

2023-04-05 18:05:42

by Florent Revest

Subject: [PATCH v6 4/5] arm64: ftrace: Add direct call trampoline samples support

The ftrace samples need per-architecture trampoline implementations
to save and restore argument registers around the calls to
my_direct_func* and to restore polluted registers (eg: x30).
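
To make the x9/x30 handling easier to follow, here is a userspace model of
what the arm64 sample trampolines do with those two registers. Judging from
ftrace_caller_direct_late and ftrace_stub_direct_tramp in patch 1, a direct
trampoline is entered with the traced function's original LR in x9 and the
address to resume at in x30. The struct and function names below are made up
for this sketch, and the argument register saves are omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Register state on trampoline entry (illustrative model only). */
struct tramp_entry {
	uint64_t x9;	/* traced function's original LR */
	uint64_t x30;	/* address to resume at in the traced function */
};

/* Register state when the trampoline returns. */
struct tramp_exit {
	uint64_t x30;	/* LR seen by the traced function */
	uint64_t pc;	/* where execution continues */
};

/* Hypothetical model of the x9/x30 shuffle in the sample trampolines. */
static struct tramp_exit model_sample_tramp(struct tramp_entry in)
{
	uint64_t frame[2];
	struct tramp_exit out;
	uint64_t x9;

	frame[0] = in.x9;	/* stp x9, x30, [sp] */
	frame[1] = in.x30;

	/* "bl my_direct_func" would overwrite x30 at this point. */

	out.x30 = frame[0];	/* ldp x30, x9, [sp] swaps the pair back */
	x9 = frame[1];
	out.pc = x9;		/* ret x9 */

	return out;
}
```

The swapped order of the LDP operands is what hands the original LR back to
the traced function while execution resumes just after the patchsite.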

These samples also include <asm/asm-offsets.h> which, on arm64, is not
necessary and redefines previously defined macros (resulting in
warnings) so these includes are guarded by !CONFIG_ARM64.

Signed-off-by: Florent Revest <[email protected]>
---
arch/arm64/Kconfig | 2 ++
samples/ftrace/ftrace-direct-modify.c | 34 ++++++++++++++++++
samples/ftrace/ftrace-direct-multi-modify.c | 40 +++++++++++++++++++++
samples/ftrace/ftrace-direct-multi.c | 24 +++++++++++++
samples/ftrace/ftrace-direct-too.c | 26 ++++++++++++++
samples/ftrace/ftrace-direct.c | 24 +++++++++++++
6 files changed, 150 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index f3503d0cc1b8..c2bf28099abd 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -194,6 +194,8 @@ config ARM64
!CC_OPTIMIZE_FOR_SIZE)
select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY \
if DYNAMIC_FTRACE_WITH_ARGS
+ select HAVE_SAMPLE_FTRACE_DIRECT
+ select HAVE_SAMPLE_FTRACE_DIRECT_MULTI
select HAVE_EFFICIENT_UNALIGNED_ACCESS
select HAVE_FAST_GUP
select HAVE_FTRACE_MCOUNT_RECORD
diff --git a/samples/ftrace/ftrace-direct-modify.c b/samples/ftrace/ftrace-direct-modify.c
index 25fba66f61c0..98d1b7385f08 100644
--- a/samples/ftrace/ftrace-direct-modify.c
+++ b/samples/ftrace/ftrace-direct-modify.c
@@ -2,7 +2,9 @@
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/ftrace.h>
+#ifndef CONFIG_ARM64
#include <asm/asm-offsets.h>
+#endif

extern void my_direct_func1(void);
extern void my_direct_func2(void);
@@ -96,6 +98,38 @@ asm (

#endif /* CONFIG_S390 */

+#ifdef CONFIG_ARM64
+
+asm (
+" .pushsection .text, \"ax\", @progbits\n"
+" .type my_tramp1, @function\n"
+" .globl my_tramp1\n"
+" my_tramp1:"
+" bti c\n"
+" sub sp, sp, #16\n"
+" stp x9, x30, [sp]\n"
+" bl my_direct_func1\n"
+" ldp x30, x9, [sp]\n"
+" add sp, sp, #16\n"
+" ret x9\n"
+" .size my_tramp1, .-my_tramp1\n"
+
+" .type my_tramp2, @function\n"
+" .globl my_tramp2\n"
+" my_tramp2:"
+" bti c\n"
+" sub sp, sp, #16\n"
+" stp x9, x30, [sp]\n"
+" bl my_direct_func2\n"
+" ldp x30, x9, [sp]\n"
+" add sp, sp, #16\n"
+" ret x9\n"
+" .size my_tramp2, .-my_tramp2\n"
+" .popsection\n"
+);
+
+#endif /* CONFIG_ARM64 */
+
static struct ftrace_ops direct;

static unsigned long my_tramp = (unsigned long)my_tramp1;
diff --git a/samples/ftrace/ftrace-direct-multi-modify.c b/samples/ftrace/ftrace-direct-multi-modify.c
index f72623899602..26956c8fc513 100644
--- a/samples/ftrace/ftrace-direct-multi-modify.c
+++ b/samples/ftrace/ftrace-direct-multi-modify.c
@@ -2,7 +2,9 @@
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/ftrace.h>
+#ifndef CONFIG_ARM64
#include <asm/asm-offsets.h>
+#endif

extern void my_direct_func1(unsigned long ip);
extern void my_direct_func2(unsigned long ip);
@@ -103,6 +105,44 @@ asm (

#endif /* CONFIG_S390 */

+#ifdef CONFIG_ARM64
+
+asm (
+" .pushsection .text, \"ax\", @progbits\n"
+" .type my_tramp1, @function\n"
+" .globl my_tramp1\n"
+" my_tramp1:"
+" bti c\n"
+" sub sp, sp, #32\n"
+" stp x9, x30, [sp]\n"
+" str x0, [sp, #16]\n"
+" mov x0, x30\n"
+" bl my_direct_func1\n"
+" ldp x30, x9, [sp]\n"
+" ldr x0, [sp, #16]\n"
+" add sp, sp, #32\n"
+" ret x9\n"
+" .size my_tramp1, .-my_tramp1\n"
+
+" .type my_tramp2, @function\n"
+" .globl my_tramp2\n"
+" my_tramp2:"
+" bti c\n"
+" sub sp, sp, #32\n"
+" stp x9, x30, [sp]\n"
+" str x0, [sp, #16]\n"
+" mov x0, x30\n"
+" bl my_direct_func2\n"
+" ldp x30, x9, [sp]\n"
+" ldr x0, [sp, #16]\n"
+" add sp, sp, #32\n"
+" ret x9\n"
+" .size my_tramp2, .-my_tramp2\n"
+" .popsection\n"
+);
+
+#endif /* CONFIG_ARM64 */
+
static unsigned long my_tramp = (unsigned long)my_tramp1;
static unsigned long tramps[2] = {
(unsigned long)my_tramp1,
diff --git a/samples/ftrace/ftrace-direct-multi.c b/samples/ftrace/ftrace-direct-multi.c
index 1547c2c6be02..b2ac90e0c02e 100644
--- a/samples/ftrace/ftrace-direct-multi.c
+++ b/samples/ftrace/ftrace-direct-multi.c
@@ -4,7 +4,9 @@
#include <linux/mm.h> /* for handle_mm_fault() */
#include <linux/ftrace.h>
#include <linux/sched/stat.h>
+#ifndef CONFIG_ARM64
#include <asm/asm-offsets.h>
+#endif

extern void my_direct_func(unsigned long ip);

@@ -66,6 +68,28 @@ asm (

#endif /* CONFIG_S390 */

+#ifdef CONFIG_ARM64
+
+asm (
+" .pushsection .text, \"ax\", @progbits\n"
+" .type my_tramp, @function\n"
+" .globl my_tramp\n"
+" my_tramp:"
+" bti c\n"
+" sub sp, sp, #32\n"
+" stp x9, x30, [sp]\n"
+" str x0, [sp, #16]\n"
+" mov x0, x30\n"
+" bl my_direct_func\n"
+" ldp x30, x9, [sp]\n"
+" ldr x0, [sp, #16]\n"
+" add sp, sp, #32\n"
+" ret x9\n"
+" .size my_tramp, .-my_tramp\n"
+" .popsection\n"
+);
+
+#endif /* CONFIG_ARM64 */
static struct ftrace_ops direct;

static int __init ftrace_direct_multi_init(void)
diff --git a/samples/ftrace/ftrace-direct-too.c b/samples/ftrace/ftrace-direct-too.c
index 71ed4ee8cb4a..38f6f677f913 100644
--- a/samples/ftrace/ftrace-direct-too.c
+++ b/samples/ftrace/ftrace-direct-too.c
@@ -3,7 +3,9 @@

#include <linux/mm.h> /* for handle_mm_fault() */
#include <linux/ftrace.h>
+#ifndef CONFIG_ARM64
#include <asm/asm-offsets.h>
+#endif

extern void my_direct_func(struct vm_area_struct *vma, unsigned long address,
unsigned int flags, struct pt_regs *regs);
@@ -72,6 +74,30 @@ asm (

#endif /* CONFIG_S390 */

+#ifdef CONFIG_ARM64
+
+asm (
+" .pushsection .text, \"ax\", @progbits\n"
+" .type my_tramp, @function\n"
+" .globl my_tramp\n"
+" my_tramp:"
+" bti c\n"
+" sub sp, sp, #48\n"
+" stp x9, x30, [sp]\n"
+" stp x0, x1, [sp, #16]\n"
+" stp x2, x3, [sp, #32]\n"
+" bl my_direct_func\n"
+" ldp x30, x9, [sp]\n"
+" ldp x0, x1, [sp, #16]\n"
+" ldp x2, x3, [sp, #32]\n"
+" add sp, sp, #48\n"
+" ret x9\n"
+" .size my_tramp, .-my_tramp\n"
+" .popsection\n"
+);
+
+#endif /* CONFIG_ARM64 */
+
static struct ftrace_ops direct;

static int __init ftrace_direct_init(void)
diff --git a/samples/ftrace/ftrace-direct.c b/samples/ftrace/ftrace-direct.c
index d81a9473b585..e5312f9c15d3 100644
--- a/samples/ftrace/ftrace-direct.c
+++ b/samples/ftrace/ftrace-direct.c
@@ -3,7 +3,9 @@

#include <linux/sched.h> /* for wake_up_process() */
#include <linux/ftrace.h>
+#ifndef CONFIG_ARM64
#include <asm/asm-offsets.h>
+#endif

extern void my_direct_func(struct task_struct *p);

@@ -63,6 +65,28 @@ asm (

#endif /* CONFIG_S390 */

+#ifdef CONFIG_ARM64
+
+asm (
+" .pushsection .text, \"ax\", @progbits\n"
+" .type my_tramp, @function\n"
+" .globl my_tramp\n"
+" my_tramp:"
+" bti c\n"
+" sub sp, sp, #32\n"
+" stp x9, x30, [sp]\n"
+" str x0, [sp, #16]\n"
+" bl my_direct_func\n"
+" ldp x30, x9, [sp]\n"
+" ldr x0, [sp, #16]\n"
+" add sp, sp, #32\n"
+" ret x9\n"
+" .size my_tramp, .-my_tramp\n"
+" .popsection\n"
+);
+
+#endif /* CONFIG_ARM64 */
+
static struct ftrace_ops direct;

static int __init ftrace_direct_init(void)
--
2.40.0.577.gac1e443424-goog

2023-04-05 18:05:52

by Florent Revest

Subject: [PATCH v6 5/5] selftests/bpf: Update the tests deny list on aarch64

Now that ftrace supports direct call on arm64, BPF tracing programs work
on that architecture. This fixes the vast majority of BPF selftests
except for:

- multi_kprobe programs which require fprobe, not available on arm64 yet
- tracing_struct which requires trampoline support to access struct args

This patch updates the list of BPF selftests which are known to fail so
the BPF CI can validate the tests which pass now.

Signed-off-by: Florent Revest <[email protected]>
---
tools/testing/selftests/bpf/DENYLIST.aarch64 | 82 ++------------------
1 file changed, 5 insertions(+), 77 deletions(-)

diff --git a/tools/testing/selftests/bpf/DENYLIST.aarch64 b/tools/testing/selftests/bpf/DENYLIST.aarch64
index 99cc33c51eaa..6b95cb544094 100644
--- a/tools/testing/selftests/bpf/DENYLIST.aarch64
+++ b/tools/testing/selftests/bpf/DENYLIST.aarch64
@@ -1,33 +1,5 @@
-bloom_filter_map # libbpf: prog 'check_bloom': failed to attach: ERROR: strerror_r(-524)=22
-bpf_cookie/lsm
-bpf_cookie/multi_kprobe_attach_api
-bpf_cookie/multi_kprobe_link_api
-bpf_cookie/trampoline
-bpf_loop/check_callback_fn_stop # link unexpected error: -524
-bpf_loop/check_invalid_flags
-bpf_loop/check_nested_calls
-bpf_loop/check_non_constant_callback
-bpf_loop/check_nr_loops
-bpf_loop/check_null_callback_ctx
-bpf_loop/check_stack
-bpf_mod_race # bpf_mod_kfunc_race__attach unexpected error: -524 (errno 524)
-bpf_tcp_ca/dctcp_fallback
-btf_dump/btf_dump: var_data # find type id unexpected find type id: actual -2 < expected 0
-cgroup_hierarchical_stats # attach unexpected error: -524 (errno 524)
-d_path/basic # setup attach failed: -524
-deny_namespace # attach unexpected error: -524 (errno 524)
-fentry_fexit # fentry_attach unexpected error: -1 (errno 524)
-fentry_test # fentry_attach unexpected error: -1 (errno 524)
-fexit_sleep # fexit_attach fexit attach failed: -1
-fexit_stress # fexit attach unexpected fexit attach: actual -524 < expected 0
-fexit_test # fexit_attach unexpected error: -1 (errno 524)
-get_func_args_test # get_func_args_test__attach unexpected error: -524 (errno 524) (trampoline)
-get_func_ip_test # get_func_ip_test__attach unexpected error: -524 (errno 524) (trampoline)
-htab_update/reenter_update
-kfree_skb # attach fentry unexpected error: -524 (trampoline)
-kfunc_call/subprog # extern (var ksym) 'bpf_prog_active': not found in kernel BTF
-kfunc_call/subprog_lskel # skel unexpected error: -2
-kfunc_dynptr_param/dynptr_data_null # libbpf: prog 'dynptr_data_null': failed to attach: ERROR: strerror_r(-524)=22
+bpf_cookie/multi_kprobe_attach_api # kprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3
+bpf_cookie/multi_kprobe_link_api # kprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3
kprobe_multi_bench_attach # bpf_program__attach_kprobe_multi_opts unexpected error: -95
kprobe_multi_test/attach_api_addrs # bpf_program__attach_kprobe_multi_opts unexpected error: -95
kprobe_multi_test/attach_api_pattern # bpf_program__attach_kprobe_multi_opts unexpected error: -95
@@ -35,50 +7,6 @@ kprobe_multi_test/attach_api_syms # bpf_program__attach_kprobe_mu
kprobe_multi_test/bench_attach # bpf_program__attach_kprobe_multi_opts unexpected error: -95
kprobe_multi_test/link_api_addrs # link_fd unexpected link_fd: actual -95 < expected 0
kprobe_multi_test/link_api_syms # link_fd unexpected link_fd: actual -95 < expected 0
-kprobe_multi_test/skel_api # kprobe_multi__attach unexpected error: -524 (errno 524)
-ksyms_module/libbpf # 'bpf_testmod_ksym_percpu': not found in kernel BTF
-ksyms_module/lskel # test_ksyms_module_lskel__open_and_load unexpected error: -2
-libbpf_get_fd_by_id_opts # test_libbpf_get_fd_by_id_opts__attach unexpected error: -524 (errno 524)
-linked_list
-lookup_key # test_lookup_key__attach unexpected error: -524 (errno 524)
-lru_bug # lru_bug__attach unexpected error: -524 (errno 524)
-modify_return # modify_return__attach failed unexpected error: -524 (errno 524)
-module_attach # skel_attach skeleton attach failed: -524
-mptcp/base # run_test mptcp unexpected error: -524 (errno 524)
-netcnt # packets unexpected packets: actual 10001 != expected 10000
-rcu_read_lock # failed to attach: ERROR: strerror_r(-524)=22
-recursion # skel_attach unexpected error: -524 (errno 524)
-ringbuf # skel_attach skeleton attachment failed: -1
-setget_sockopt # attach_cgroup unexpected error: -524
-sk_storage_tracing # test_sk_storage_tracing__attach unexpected error: -524 (errno 524)
-skc_to_unix_sock # could not attach BPF object unexpected error: -524 (errno 524)
-socket_cookie # prog_attach unexpected error: -524
-stacktrace_build_id # compare_stack_ips stackmap vs. stack_amap err -1 errno 2
-task_local_storage/exit_creds # skel_attach unexpected error: -524 (errno 524)
-task_local_storage/recursion # skel_attach unexpected error: -524 (errno 524)
-test_bprm_opts # attach attach failed: -524
-test_ima # attach attach failed: -524
-test_local_storage # attach lsm attach failed: -524
-test_lsm # test_lsm_first_attach unexpected error: -524 (errno 524)
-test_overhead # attach_fentry unexpected error: -524
-timer # timer unexpected error: -524 (errno 524)
-timer_crash # timer_crash__attach unexpected error: -524 (errno 524)
-timer_mim # timer_mim unexpected error: -524 (errno 524)
-trace_printk # trace_printk__attach unexpected error: -1 (errno 524)
-trace_vprintk # trace_vprintk__attach unexpected error: -1 (errno 524)
-tracing_struct # tracing_struct__attach unexpected error: -524 (errno 524)
-trampoline_count # attach_prog unexpected error: -524
-unpriv_bpf_disabled # skel_attach unexpected error: -524 (errno 524)
-user_ringbuf/test_user_ringbuf_post_misaligned # misaligned_skel unexpected error: -524 (errno 524)
-user_ringbuf/test_user_ringbuf_post_producer_wrong_offset
-user_ringbuf/test_user_ringbuf_post_larger_than_ringbuf_sz
-user_ringbuf/test_user_ringbuf_basic # ringbuf_basic_skel unexpected error: -524 (errno 524)
-user_ringbuf/test_user_ringbuf_sample_full_ring_buffer
-user_ringbuf/test_user_ringbuf_post_alignment_autoadjust
-user_ringbuf/test_user_ringbuf_overfill
-user_ringbuf/test_user_ringbuf_discards_properly_ignored
-user_ringbuf/test_user_ringbuf_loop
-user_ringbuf/test_user_ringbuf_msg_protocol
-user_ringbuf/test_user_ringbuf_blocking_reserve
-verify_pkcs7_sig # test_verify_pkcs7_sig__attach unexpected error: -524 (errno 524)
-vmlinux # skel_attach skeleton attach failed: -524
+kprobe_multi_test/skel_api # libbpf: failed to load BPF skeleton 'kprobe_multi': -3
+module_attach # prog 'kprobe_multi': failed to auto-attach: -95
+tracing_struct # tracing_struct__attach unexpected error: -524 (errno 524)
\ No newline at end of file
--
2.40.0.577.gac1e443424-goog

2023-04-05 18:07:11

by Florent Revest

Subject: [PATCH v6 3/5] samples: ftrace: Save required argument registers in sample trampolines

The ftrace-direct-too sample traces the handle_mm_fault function whose
signature changed since the introduction of the sample. Since:
commit bce617edecad ("mm: do page fault accounting in handle_mm_fault")
handle_mm_fault now has 4 arguments. Therefore, the sample trampoline
should save 4 argument registers.

s390 saves all argument registers already so it does not need a change
but x86_64 needs an extra push and pop.

This also evolves the signature of the tracing function to make it
mirror the signature of the traced function.

Signed-off-by: Florent Revest <[email protected]>
---
samples/ftrace/ftrace-direct-too.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/samples/ftrace/ftrace-direct-too.c b/samples/ftrace/ftrace-direct-too.c
index f28e7b99840f..71ed4ee8cb4a 100644
--- a/samples/ftrace/ftrace-direct-too.c
+++ b/samples/ftrace/ftrace-direct-too.c
@@ -5,14 +5,14 @@
#include <linux/ftrace.h>
#include <asm/asm-offsets.h>

-extern void my_direct_func(struct vm_area_struct *vma,
- unsigned long address, unsigned int flags);
+extern void my_direct_func(struct vm_area_struct *vma, unsigned long address,
+ unsigned int flags, struct pt_regs *regs);

-void my_direct_func(struct vm_area_struct *vma,
- unsigned long address, unsigned int flags)
+void my_direct_func(struct vm_area_struct *vma, unsigned long address,
+ unsigned int flags, struct pt_regs *regs)
{
- trace_printk("handle mm fault vma=%p address=%lx flags=%x\n",
- vma, address, flags);
+ trace_printk("handle mm fault vma=%p address=%lx flags=%x regs=%p\n",
+ vma, address, flags, regs);
}

extern void my_tramp(void *);
@@ -34,7 +34,9 @@ asm (
" pushq %rdi\n"
" pushq %rsi\n"
" pushq %rdx\n"
+" pushq %rcx\n"
" call my_direct_func\n"
+" popq %rcx\n"
" popq %rdx\n"
" popq %rsi\n"
" popq %rdi\n"
--
2.40.0.577.gac1e443424-goog

2023-04-05 20:44:33

by Steven Rostedt

Subject: Re: [PATCH v6 3/5] samples: ftrace: Save required argument registers in sample trampolines

On Wed, 5 Apr 2023 20:02:48 +0200
Florent Revest <[email protected]> wrote:

> The ftrace-direct-too sample traces the handle_mm_fault function whose
> signature changed since the introduction of the sample. Since:
> commit bce617edecad ("mm: do page fault accounting in handle_mm_fault")
> handle_mm_fault now has 4 arguments. Therefore, the sample trampoline
> should save 4 argument registers.
>
> s390 saves all argument registers already so it does not need a change
> but x86_64 needs an extra push and pop.
>
> This also evolves the signature of the tracing function to make it
> mirror the signature of the traced function.
>

Should probably add:

Cc: [email protected]
Fixes: bce617edecad ("mm: do page fault accounting in handle_mm_fault")

Reviewed-by: Steven Rostedt (Google) <[email protected]>

-- Steve

2023-04-06 10:35:06

by Mark Rutland

Subject: Re: [PATCH v6 3/5] samples: ftrace: Save required argument registers in sample trampolines

On Wed, Apr 05, 2023 at 08:02:48PM +0200, Florent Revest wrote:
> The ftrace-direct-too sample traces the handle_mm_fault function whose
> signature changed since the introduction of the sample. Since:
> commit bce617edecad ("mm: do page fault accounting in handle_mm_fault")
> handle_mm_fault now has 4 arguments. Therefore, the sample trampoline
> should save 4 argument registers.
>
> s390 saves all argument registers already so it does not need a change
> but x86_64 needs an extra push and pop.
>
> This also evolves the signature of the tracing function to make it
> mirror the signature of the traced function.
>
> Signed-off-by: Florent Revest <[email protected]>

Reviewed-by: Mark Rutland <[email protected]>

Thanks for this!

Mark.

> ---
> samples/ftrace/ftrace-direct-too.c | 14 ++++++++------
> 1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/samples/ftrace/ftrace-direct-too.c b/samples/ftrace/ftrace-direct-too.c
> index f28e7b99840f..71ed4ee8cb4a 100644
> --- a/samples/ftrace/ftrace-direct-too.c
> +++ b/samples/ftrace/ftrace-direct-too.c
> @@ -5,14 +5,14 @@
> #include <linux/ftrace.h>
> #include <asm/asm-offsets.h>
>
> -extern void my_direct_func(struct vm_area_struct *vma,
> - unsigned long address, unsigned int flags);
> +extern void my_direct_func(struct vm_area_struct *vma, unsigned long address,
> + unsigned int flags, struct pt_regs *regs);
>
> -void my_direct_func(struct vm_area_struct *vma,
> - unsigned long address, unsigned int flags)
> +void my_direct_func(struct vm_area_struct *vma, unsigned long address,
> + unsigned int flags, struct pt_regs *regs)
> {
> - trace_printk("handle mm fault vma=%p address=%lx flags=%x\n",
> - vma, address, flags);
> + trace_printk("handle mm fault vma=%p address=%lx flags=%x regs=%p\n",
> + vma, address, flags, regs);
> }
>
> extern void my_tramp(void *);
> @@ -34,7 +34,9 @@ asm (
> " pushq %rdi\n"
> " pushq %rsi\n"
> " pushq %rdx\n"
> +" pushq %rcx\n"
> " call my_direct_func\n"
> +" popq %rcx\n"
> " popq %rdx\n"
> " popq %rsi\n"
> " popq %rdi\n"
> --
> 2.40.0.577.gac1e443424-goog
>

2023-04-06 11:09:33

by Mark Rutland

Subject: Re: [PATCH v6 4/5] arm64: ftrace: Add direct call trampoline samples support

On Wed, Apr 05, 2023 at 08:02:49PM +0200, Florent Revest wrote:
> The ftrace samples need per-architecture trampoline implementations
> to save and restore argument registers around the calls to
> my_direct_func* and to restore polluted registers (eg: x30).
>
> These samples also include <asm/asm-offsets.h> which, on arm64, is not
> necessary and redefines previously defined macros (resulting in
> warnings) so these includes are guarded by !CONFIG_ARM64.
>
> Signed-off-by: Florent Revest <[email protected]>

These all look good to me. I gave each module a spin in an 8-vCPU VM on an M1
Macbook Pro with a bunch of other work going on, and all of those worked as
expected with sensible output in /sys/kernel/tracing/trace, and no noticeable
failures elsewhere. So:

Reviewed-by: Mark Rutland <[email protected]>
Tested-by: Mark Rutland <[email protected]>

Mark.

> ---
> arch/arm64/Kconfig | 2 ++
> samples/ftrace/ftrace-direct-modify.c | 34 ++++++++++++++++++
> samples/ftrace/ftrace-direct-multi-modify.c | 40 +++++++++++++++++++++
> samples/ftrace/ftrace-direct-multi.c | 24 +++++++++++++
> samples/ftrace/ftrace-direct-too.c | 26 ++++++++++++++
> samples/ftrace/ftrace-direct.c | 24 +++++++++++++
> 6 files changed, 150 insertions(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index f3503d0cc1b8..c2bf28099abd 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -194,6 +194,8 @@ config ARM64
> !CC_OPTIMIZE_FOR_SIZE)
> select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY \
> if DYNAMIC_FTRACE_WITH_ARGS
> + select HAVE_SAMPLE_FTRACE_DIRECT
> + select HAVE_SAMPLE_FTRACE_DIRECT_MULTI
> select HAVE_EFFICIENT_UNALIGNED_ACCESS
> select HAVE_FAST_GUP
> select HAVE_FTRACE_MCOUNT_RECORD
> diff --git a/samples/ftrace/ftrace-direct-modify.c b/samples/ftrace/ftrace-direct-modify.c
> index 25fba66f61c0..98d1b7385f08 100644
> --- a/samples/ftrace/ftrace-direct-modify.c
> +++ b/samples/ftrace/ftrace-direct-modify.c
> @@ -2,7 +2,9 @@
> #include <linux/module.h>
> #include <linux/kthread.h>
> #include <linux/ftrace.h>
> +#ifndef CONFIG_ARM64
> #include <asm/asm-offsets.h>
> +#endif
>
> extern void my_direct_func1(void);
> extern void my_direct_func2(void);
> @@ -96,6 +98,38 @@ asm (
>
> #endif /* CONFIG_S390 */
>
> +#ifdef CONFIG_ARM64
> +
> +asm (
> +" .pushsection .text, \"ax\", @progbits\n"
> +" .type my_tramp1, @function\n"
> +" .globl my_tramp1\n"
> +" my_tramp1:"
> +" bti c\n"
> +" sub sp, sp, #16\n"
> +" stp x9, x30, [sp]\n"
> +" bl my_direct_func1\n"
> +" ldp x30, x9, [sp]\n"
> +" add sp, sp, #16\n"
> +" ret x9\n"
> +" .size my_tramp1, .-my_tramp1\n"
> +
> +" .type my_tramp2, @function\n"
> +" .globl my_tramp2\n"
> +" my_tramp2:"
> +" bti c\n"
> +" sub sp, sp, #16\n"
> +" stp x9, x30, [sp]\n"
> +" bl my_direct_func2\n"
> +" ldp x30, x9, [sp]\n"
> +" add sp, sp, #16\n"
> +" ret x9\n"
> +" .size my_tramp2, .-my_tramp2\n"
> +" .popsection\n"
> +);
> +
> +#endif /* CONFIG_ARM64 */
> +
> static struct ftrace_ops direct;
>
> static unsigned long my_tramp = (unsigned long)my_tramp1;
> diff --git a/samples/ftrace/ftrace-direct-multi-modify.c b/samples/ftrace/ftrace-direct-multi-modify.c
> index f72623899602..26956c8fc513 100644
> --- a/samples/ftrace/ftrace-direct-multi-modify.c
> +++ b/samples/ftrace/ftrace-direct-multi-modify.c
> @@ -2,7 +2,9 @@
> #include <linux/module.h>
> #include <linux/kthread.h>
> #include <linux/ftrace.h>
> +#ifndef CONFIG_ARM64
> #include <asm/asm-offsets.h>
> +#endif
>
> extern void my_direct_func1(unsigned long ip);
> extern void my_direct_func2(unsigned long ip);
> @@ -103,6 +105,44 @@ asm (
>
> #endif /* CONFIG_S390 */
>
> +#ifdef CONFIG_ARM64
> +
> +asm (
> +" .pushsection .text, \"ax\", @progbits\n"
> +" .type my_tramp1, @function\n"
> +" .globl my_tramp1\n"
> +" my_tramp1:"
> +" bti c\n"
> +" sub sp, sp, #32\n"
> +" stp x9, x30, [sp]\n"
> +" str x0, [sp, #16]\n"
> +" mov x0, x30\n"
> +" bl my_direct_func1\n"
> +" ldp x30, x9, [sp]\n"
> +" ldr x0, [sp, #16]\n"
> +" add sp, sp, #32\n"
> +" ret x9\n"
> +" .size my_tramp1, .-my_tramp1\n"
> +
> +" .type my_tramp2, @function\n"
> +" .globl my_tramp2\n"
> +" my_tramp2:"
> +" bti c\n"
> +" sub sp, sp, #32\n"
> +" stp x9, x30, [sp]\n"
> +" str x0, [sp, #16]\n"
> +" mov x0, x30\n"
> +" bl my_direct_func2\n"
> +" ldp x30, x9, [sp]\n"
> +" ldr x0, [sp, #16]\n"
> +" add sp, sp, #32\n"
> +" ret x9\n"
> +" .size my_tramp2, .-my_tramp2\n"
> +" .popsection\n"
> +);
> +
> +#endif /* CONFIG_ARM64 */
> +
> static unsigned long my_tramp = (unsigned long)my_tramp1;
> static unsigned long tramps[2] = {
> (unsigned long)my_tramp1,
> diff --git a/samples/ftrace/ftrace-direct-multi.c b/samples/ftrace/ftrace-direct-multi.c
> index 1547c2c6be02..b2ac90e0c02e 100644
> --- a/samples/ftrace/ftrace-direct-multi.c
> +++ b/samples/ftrace/ftrace-direct-multi.c
> @@ -4,7 +4,9 @@
> #include <linux/mm.h> /* for handle_mm_fault() */
> #include <linux/ftrace.h>
> #include <linux/sched/stat.h>
> +#ifndef CONFIG_ARM64
> #include <asm/asm-offsets.h>
> +#endif
>
> extern void my_direct_func(unsigned long ip);
>
> @@ -66,6 +68,28 @@ asm (
>
> #endif /* CONFIG_S390 */
>
> +#ifdef CONFIG_ARM64
> +
> +asm (
> +" .pushsection .text, \"ax\", @progbits\n"
> +" .type my_tramp, @function\n"
> +" .globl my_tramp\n"
> +" my_tramp:"
> +" bti c\n"
> +" sub sp, sp, #32\n"
> +" stp x9, x30, [sp]\n"
> +" str x0, [sp, #16]\n"
> +" mov x0, x30\n"
> +" bl my_direct_func\n"
> +" ldp x30, x9, [sp]\n"
> +" ldr x0, [sp, #16]\n"
> +" add sp, sp, #32\n"
> +" ret x9\n"
> +" .size my_tramp, .-my_tramp\n"
> +" .popsection\n"
> +);
> +
> +#endif /* CONFIG_ARM64 */
> static struct ftrace_ops direct;
>
> static int __init ftrace_direct_multi_init(void)
> diff --git a/samples/ftrace/ftrace-direct-too.c b/samples/ftrace/ftrace-direct-too.c
> index 71ed4ee8cb4a..38f6f677f913 100644
> --- a/samples/ftrace/ftrace-direct-too.c
> +++ b/samples/ftrace/ftrace-direct-too.c
> @@ -3,7 +3,9 @@
>
> #include <linux/mm.h> /* for handle_mm_fault() */
> #include <linux/ftrace.h>
> +#ifndef CONFIG_ARM64
> #include <asm/asm-offsets.h>
> +#endif
>
> extern void my_direct_func(struct vm_area_struct *vma, unsigned long address,
> unsigned int flags, struct pt_regs *regs);
> @@ -72,6 +74,30 @@ asm (
>
> #endif /* CONFIG_S390 */
>
> +#ifdef CONFIG_ARM64
> +
> +asm (
> +" .pushsection .text, \"ax\", @progbits\n"
> +" .type my_tramp, @function\n"
> +" .globl my_tramp\n"
> +" my_tramp:"
> +" bti c\n"
> +" sub sp, sp, #48\n"
> +" stp x9, x30, [sp]\n"
> +" stp x0, x1, [sp, #16]\n"
> +" stp x2, x3, [sp, #32]\n"
> +" bl my_direct_func\n"
> +" ldp x30, x9, [sp]\n"
> +" ldp x0, x1, [sp, #16]\n"
> +" ldp x2, x3, [sp, #32]\n"
> +" add sp, sp, #48\n"
> +" ret x9\n"
> +" .size my_tramp, .-my_tramp\n"
> +" .popsection\n"
> +);
> +
> +#endif /* CONFIG_ARM64 */
> +
> static struct ftrace_ops direct;
>
> static int __init ftrace_direct_init(void)
> diff --git a/samples/ftrace/ftrace-direct.c b/samples/ftrace/ftrace-direct.c
> index d81a9473b585..e5312f9c15d3 100644
> --- a/samples/ftrace/ftrace-direct.c
> +++ b/samples/ftrace/ftrace-direct.c
> @@ -3,7 +3,9 @@
>
> #include <linux/sched.h> /* for wake_up_process() */
> #include <linux/ftrace.h>
> +#ifndef CONFIG_ARM64
> #include <asm/asm-offsets.h>
> +#endif
>
> extern void my_direct_func(struct task_struct *p);
>
> @@ -63,6 +65,28 @@ asm (
>
> #endif /* CONFIG_S390 */
>
> +#ifdef CONFIG_ARM64
> +
> +asm (
> +" .pushsection .text, \"ax\", @progbits\n"
> +" .type my_tramp, @function\n"
> +" .globl my_tramp\n"
> +" my_tramp:"
> +" bti c\n"
> +" sub sp, sp, #32\n"
> +" stp x9, x30, [sp]\n"
> +" str x0, [sp, #16]\n"
> +" bl my_direct_func\n"
> +" ldp x30, x9, [sp]\n"
> +" ldr x0, [sp, #16]\n"
> +" add sp, sp, #32\n"
> +" ret x9\n"
> +" .size my_tramp, .-my_tramp\n"
> +" .popsection\n"
> +);
> +
> +#endif /* CONFIG_ARM64 */
> +
> static struct ftrace_ops direct;
>
> static int __init ftrace_direct_init(void)
> --
> 2.40.0.577.gac1e443424-goog
>
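For readers less familiar with the arm64 trampolines quoted above, here is a user-space sketch of what my_tramp in ftrace-direct-multi.c does with x0, x9 and x30. It is not kernel code. It assumes the usual arm64 ftrace callsite convention: on entry x9 holds the traced function's original lr and x30 holds the address at which to resume the traced function. struct a64_regs, direct_func(), tramp_multi(), frame_bytes() and seen_ip are names invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the registers the sample trampoline touches. */
struct a64_regs {
	uint64_t x0;   /* first argument of the traced function */
	uint64_t x9;   /* on entry (assumed): the traced function's original lr */
	uint64_t x30;  /* on entry (assumed): where to resume the traced function */
};

/* Bytes to reserve for nregs 8-byte registers, keeping sp 16-byte aligned. */
unsigned int frame_bytes(unsigned int nregs)
{
	return (nregs * 8 + 15) & ~15u;
}

/* Last ip handed to the tracing function, recorded for inspection. */
uint64_t seen_ip;

/* Stand-in for my_direct_func(ip): assume the call clobbers x0, x9 and x30. */
static void direct_func(struct a64_regs *r, uint64_t ip)
{
	seen_ip = ip;
	r->x0 = r->x9 = r->x30 = 0;
}

/* Model of my_tramp; returns the address that "ret x9" branches to. */
uint64_t tramp_multi(struct a64_regs *r)
{
	uint64_t frame[4];                 /* sub sp, sp, #32 */

	assert(frame_bytes(3) == sizeof(frame));
	frame[0] = r->x9;                  /* stp x9, x30, [sp] */
	frame[1] = r->x30;
	frame[2] = r->x0;                  /* str x0, [sp, #16] */
	r->x0 = r->x30;                    /* mov x0, x30 */
	direct_func(r, r->x0);             /* bl my_direct_func */
	r->x30 = frame[0];                 /* ldp x30, x9, [sp] (note the swap) */
	r->x9 = frame[1];
	r->x0 = frame[2];                  /* ldr x0, [sp, #16] */
	return r->x9;                      /* add sp, sp, #32; ret x9 */
}
```

The swapped ldp means the trampoline returns into the traced function through x9 while x30 again holds the original lr, so the traced function later returns to its own caller. frame_bytes() reproduces the 16, 32 and 48 byte frames the samples use for two, three and six saved registers.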

2023-04-11 16:04:54

by Mark Rutland

Subject: Re: [PATCH v6 0/5] Add ftrace direct call for arm64

On Wed, Apr 05, 2023 at 08:02:45PM +0200, Florent Revest wrote:
> This series adds ftrace direct call support to arm64.
> This makes BPF tracing programs (fentry/fexit/fmod_ret/lsm) work on arm64.
>
> It is meant to be taken by the arm64 tree but it depends on the
> trace-direct-v6.3-rc3 tag of the linux-trace tree:
> git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
> That tag was created by Steven Rostedt so the arm64 tree can pull the prior work
> this depends on. [1]

Catalin, Will, are you happy to pick this via the arm64 tree, or for it to go
via the trace tree?

We'd been assuming the former, but it looks like there'll be a (simple) merge
conflict with the series adding FUNCTION_GRAPH_RETVAL:

https://lore.kernel.org/lkml/[email protected]/

... as both series add some definitions to arm64's asm-offsets.c in the same
place, and all those additions need to be kept. Other than that, the two series
are independent.

IIUC Steve was hoping to take the FUNCTION_GRAPH_RETVAL series through the
trace tree, and if that's still the plan, maybe both should go that way?

Mark.

> Thanks to the ftrace refactoring under that tag, an ftrace_ops backing a ftrace
> direct call will only ever point to *one* direct call. This means we can look up
> the direct called trampoline address stored in the ops from the ftrace_caller
> trampoline in the case when the destination would be out of reach of a BL
> instruction at the ftrace callsite. This fixes limitations of previous attempts
> such as [2].
>
> This series has been tested on arm64 with:
> 1- CONFIG_FTRACE_SELFTEST
> 2- samples/ftrace/*.ko (cf: patch 4)
> 3- tools/testing/selftests/bpf/test_progs (cf: patch 5)
>
> Changes since v5 [3]:
> - Fixed saving the fourth argument of handle_mm_fault in both the x86 (patch 3)
> and arm64 (as part of patch 4) "ftrace-direct-too" sample trampolines
> - Fixed the address of the traced function logged by some direct call samples
> (ftrace-direct-multi and ftrace-direct-multi-modify) by moving lr into x0
>
> 1: https://lore.kernel.org/all/ZB2Nl7fzpHoq5V20@FVFF77S0Q05N/
> 2: https://lore.kernel.org/all/[email protected]/
> 3: https://lore.kernel.org/bpf/[email protected]/
>
> Florent Revest (5):
> arm64: ftrace: Add direct call support
> arm64: ftrace: Simplify get_ftrace_plt
> samples: ftrace: Save required argument registers in sample
> trampolines
> arm64: ftrace: Add direct call trampoline samples support
> selftests/bpf: Update the tests deny list on aarch64
>
> arch/arm64/Kconfig | 6 ++
> arch/arm64/include/asm/ftrace.h | 22 +++++
> arch/arm64/kernel/asm-offsets.c | 6 ++
> arch/arm64/kernel/entry-ftrace.S | 90 ++++++++++++++++----
> arch/arm64/kernel/ftrace.c | 46 +++++++---
> samples/ftrace/ftrace-direct-modify.c | 34 ++++++++
> samples/ftrace/ftrace-direct-multi-modify.c | 40 +++++++++
> samples/ftrace/ftrace-direct-multi.c | 24 ++++++
> samples/ftrace/ftrace-direct-too.c | 40 +++++++--
> samples/ftrace/ftrace-direct.c | 24 ++++++
> tools/testing/selftests/bpf/DENYLIST.aarch64 | 82 ++----------------
> 11 files changed, 306 insertions(+), 108 deletions(-)
>
> --
> 2.40.0.577.gac1e443424-goog
>

2023-04-11 16:49:16

by Steven Rostedt

Subject: Re: [PATCH v6 0/5] Add ftrace direct call for arm64

On Tue, 11 Apr 2023 16:56:45 +0100
Mark Rutland <[email protected]> wrote:

> IIUC Steve was hoping to take the FUNCTION_GRAPH_RETVAL series through the
> trace tree, and if that's still the plan, maybe both should go that way?

The conflict is minor, and I think I prefer to still have the ARM64 bits go
through the arm64 tree, as it will get better testing, and I don't like to
merge branches ;-)

I've added Linus to the Cc so he knows that there will be conflicts, but as
long as we mention it in our pull request, with a branch that includes the
solution, it should be fine going through two different trees.

-- Steve

2023-04-11 17:17:16

by Will Deacon

Subject: Re: [PATCH v6 0/5] Add ftrace direct call for arm64

On Tue, Apr 11, 2023 at 12:47:49PM -0400, Steven Rostedt wrote:
> On Tue, 11 Apr 2023 16:56:45 +0100
> Mark Rutland <[email protected]> wrote:
>
> > IIUC Steve was hoping to take the FUNCTION_GRAPH_RETVAL series through the
> > trace tree, and if that's still the plan, maybe both should go that way?
>
> The conflict is minor, and I think I prefer to still have the ARM64 bits go
> through the arm64 tree, as it will get better testing, and I don't like to
> merge branches ;-)
>
> I've added Linus to the Cc so he knows that there will be conflicts, but as
> long as we mention it in our pull request, with a branch that includes the
> solution, it should be fine going through two different trees.

If it's just the simple asm-offsets conflict that Mark mentioned, then that
sounds fine to me. However, patches 3-5 don't seem to have anything to do
with arm64 at all and I'd prefer those to go via other trees (esp. as patch
3 is an independent -stable candidate and the last one is a bpf selftest
change which conflicts in -next).

So I'll queue the first two in arm64 on a branch (for-next/ftrace) based
on trace-direct-v6.3-rc3.

Will

2023-04-11 17:46:16

by Steven Rostedt

Subject: Re: [PATCH v6 0/5] Add ftrace direct call for arm64

On Tue, 11 Apr 2023 18:08:08 +0100
Will Deacon <[email protected]> wrote:

> On Tue, Apr 11, 2023 at 12:47:49PM -0400, Steven Rostedt wrote:
> > On Tue, 11 Apr 2023 16:56:45 +0100
> > Mark Rutland <[email protected]> wrote:
> >
> > > IIUC Steve was hoping to take the FUNCTION_GRAPH_RETVAL series through the
> > > trace tree, and if that's still the plan, maybe both should go that way?
> >
> > The conflict is minor, and I think I prefer to still have the ARM64 bits go
> > through the arm64 tree, as it will get better testing, and I don't like to
> > merge branches ;-)
> >
> > I've added Linus to the Cc so he knows that there will be conflicts, but as
> > long as we mention it in our pull request, with a branch that includes the
> > solution, it should be fine going through two different trees.
>
> If it's just the simple asm-offsets conflict that Mark mentioned, then that
> sounds fine to me. However, patches 3-5 don't seem to have anything to do

I guess 3 and 5 are not, but patch 4 adds arm64 code to the samples (as
it requires arch specific asm to handle the direct trampolines).

> with arm64 at all and I'd prefer those to go via other trees (esp. as patch
> 3 is an independent -stable candidate and the last one is a bpf selftest
> change which conflicts in -next).
>
> So I'll queue the first two in arm64 on a branch (for-next/ftrace) based
> on trace-direct-v6.3-rc3.

Are 3-5 dependent on those changes? If not, I can pull them into my tree.

-- Steve

2023-04-11 18:02:44

by Will Deacon

Subject: Re: [PATCH v6 0/5] Add ftrace direct call for arm64

On Tue, Apr 11, 2023 at 01:44:56PM -0400, Steven Rostedt wrote:
> On Tue, 11 Apr 2023 18:08:08 +0100
> Will Deacon <[email protected]> wrote:
>
> > On Tue, Apr 11, 2023 at 12:47:49PM -0400, Steven Rostedt wrote:
> > > On Tue, 11 Apr 2023 16:56:45 +0100
> > > Mark Rutland <[email protected]> wrote:
> > >
> > > > IIUC Steve was hoping to take the FUNCTION_GRAPH_RETVAL series through the
> > > > trace tree, and if that's still the plan, maybe both should go that way?
> > >
> > > The conflict is minor, and I think I prefer to still have the ARM64 bits go
> > > through the arm64 tree, as it will get better testing, and I don't like to
> > > merge branches ;-)
> > >
> > > I've added Linus to the Cc so he knows that there will be conflicts, but as
> > > long as we mention it in our pull request, with a branch that includes the
> > > solution, it should be fine going through two different trees.
> >
> > If it's just the simple asm-offsets conflict that Mark mentioned, then that
> > sounds fine to me. However, patches 3-5 don't seem to have anything to do
>
> I guess 3 and 5 are not, but patch 4 adds arm64 code to the samples (as
> it requires arch specific asm to handle the direct trampolines).

Sorry, yes, I was thinking of arch/arm64/ and then failed spectacularly
at communicating :)

> > with arm64 at all and I'd prefer those to go via other trees (esp. as patch
> > 3 is an independent -stable candidate and the last one is a bpf selftest
> > change which conflicts in -next).
> >
> > So I'll queue the first two in arm64 on a branch (for-next/ftrace) based
> > on trace-direct-v6.3-rc3.
>
> Are 3-5 dependent on those changes? If not, I can pull them into my tree.

Good question. Florent?

Will

2023-04-11 18:43:19

by Will Deacon

Subject: Re: [PATCH v6 0/5] Add ftrace direct call for arm64

On Wed, 5 Apr 2023 20:02:45 +0200, Florent Revest wrote:
> This series adds ftrace direct call support to arm64.
> This makes BPF tracing programs (fentry/fexit/fmod_ret/lsm) work on arm64.
>
> It is meant to be taken by the arm64 tree but it depends on the
> trace-direct-v6.3-rc3 tag of the linux-trace tree:
> git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
> That tag was created by Steven Rostedt so the arm64 tree can pull the prior work
> this depends on. [1]
>
> [...]

Applied first two to arm64 (for-next/ftrace), thanks!

[1/5] arm64: ftrace: Add direct call support
https://git.kernel.org/arm64/c/2aa6ac03516d
[2/5] arm64: ftrace: Simplify get_ftrace_plt
https://git.kernel.org/arm64/c/0f59dca63bf2

Cheers,
--
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev

2023-04-12 10:04:15

by Mark Rutland

Subject: Re: [PATCH v6 0/5] Add ftrace direct call for arm64

On Tue, Apr 11, 2023 at 06:54:24PM +0100, Will Deacon wrote:
> On Tue, Apr 11, 2023 at 01:44:56PM -0400, Steven Rostedt wrote:
> > On Tue, 11 Apr 2023 18:08:08 +0100
> > Will Deacon <[email protected]> wrote:
> >
> > > On Tue, Apr 11, 2023 at 12:47:49PM -0400, Steven Rostedt wrote:
> > > > On Tue, 11 Apr 2023 16:56:45 +0100
> > > > Mark Rutland <[email protected]> wrote:
> > > >
> > > > > IIUC Steve was hoping to take the FUNCTION_GRAPH_RETVAL series through the
> > > > > trace tree, and if that's still the plan, maybe both should go that way?
> > > >
> > > > The conflict is minor, and I think I prefer to still have the ARM64 bits go
> > > > through the arm64 tree, as it will get better testing, and I don't like to
> > > > merge branches ;-)
> > > >
> > > > I've added Linus to the Cc so he knows that there will be conflicts, but as
> > > > long as we mention it in our pull request, with a branch that includes the
> > > > solution, it should be fine going through two different trees.
> > >
> > > If it's just the simple asm-offsets conflict that Mark mentioned, then that
> > > sounds fine to me. However, patches 3-5 don't seem to have anything to do
> >
> > I guess 3 and 5 are not, but patch 4 adds arm64 code to the samples (as
> > it requires arch specific asm to handle the direct trampolines).
>
> Sorry, yes, I was thinking of arch/arm64/ and then failed spectacularly
> at communicating :)
>
> > > with arm64 at all and I'd prefer those to go via other trees (esp. as patch
> > > 3 is an independent -stable candidate and the last one is a bpf selftest
> > > change which conflicts in -next).
> > >
> > > So I'll queue the first two in arm64 on a branch (for-next/ftrace) based
> > > on trace-direct-v6.3-rc3.
> >
> > Are 3-5 dependent on those changes? If not, I can pull them into my tree.
>
> Good question. Florent?

Patch 3 (the fix to the ftrace test) does not depend upon patches 1 and 2. It
probably would've been better to queue that as a preparatory fix before the
other changes.

Patch 4 (adding arm64 support to the samples) depends on patch 3. The arm64
parts depend upon patch 1 to be selectable, and without patch 1 the samples
will behave the same as before. It could be queued independently of patch 1,
but won't have any effect until merged with patch 1.

Patch 5 (the bpf selftest list changes) depends on patch 1 alone.

Perhaps we could queue 1 and 2 via the arm64 tree, 3 and 4 via the ftrace tree,
and follow up with patch 5 via the bpf tree after -rc1?

Thanks,
Mark.

2023-04-24 20:13:22

by Steven Rostedt

Subject: Re: [PATCH v6 0/5] Add ftrace direct call for arm64

On Wed, 12 Apr 2023 10:50:21 +0100
Mark Rutland <[email protected]> wrote:

> Perhaps we could queue 1 and 2 via the arm64 tree, 3 and 4 via the ftrace tree,
> and follow up with patch 5 via the bpf tree after -rc1?

Any patches that you want through the ftrace tree, please send as a
separate queue to the linux-trace-kernel mailing list (and lkml) if you
haven't done that already. I'm still a thousand emails behind, and
walking through them while at the airport lounge.

-- Steve