Date: 2023-04-03 11:41:40
From: Florent Revest
Subject: [PATCH v5 0/4] Add ftrace direct call for arm64

This series adds ftrace direct call support to arm64.
This makes BPF tracing programs (fentry/fexit/fmod_ret/lsm) work on arm64.

This series is meant to be taken through the arm64 tree, but it depends on the
trace-direct-v6.3-rc3 tag of the linux-trace tree:
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
That tag was created by Steven Rostedt so that the arm64 tree can pull the
prior work this series depends on. [1]

Thanks to the ftrace refactoring under that tag, an ftrace_ops backing an
ftrace direct call will only ever point to *one* direct call. This means that,
when the destination is out of reach of a BL instruction at the ftrace
callsite, the ftrace_caller trampoline can look up the direct call trampoline
address stored in the ops. This fixes limitations of previous attempts such
as [2].
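
To illustrate the idea (the actual lookup happens in the arm64 ftrace_caller
assembly added by patch 1; the helper below is only a conceptual sketch based
on the direct_call field that the linux-trace refactoring stores in each ops):

#include <linux/ftrace.h>

static unsigned long direct_call_target(struct ftrace_ops *ops)
{
	/* Exactly one direct call per ops, so the ops itself names the target. */
	return ops->direct_call;
}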

This series has been tested on arm64 with:
1- CONFIG_FTRACE_SELFTEST
2- samples/ftrace/*.ko (cf: patch 3)
3- tools/testing/selftests/bpf/test_progs (cf: patch 4)

Changes since v4 [3]:
- *Actually* added "BTI C" instructions at the beginning of each ftrace direct
call sample (v4 was supposed to do that but I forgot to git commit --amend...)

1: https://lore.kernel.org/all/ZB2Nl7fzpHoq5V20@FVFF77S0Q05N/
2: https://lore.kernel.org/all/[email protected]/
3: https://lore.kernel.org/bpf/[email protected]/

Florent Revest (4):
arm64: ftrace: Add direct call support
arm64: ftrace: Simplify get_ftrace_plt
arm64: ftrace: Add direct call trampoline samples support
selftests/bpf: Update the tests deny list on aarch64

arch/arm64/Kconfig | 6 ++
arch/arm64/include/asm/ftrace.h | 22 +++++
arch/arm64/kernel/asm-offsets.c | 6 ++
arch/arm64/kernel/entry-ftrace.S | 90 ++++++++++++++++----
arch/arm64/kernel/ftrace.c | 46 +++++++---
samples/ftrace/ftrace-direct-modify.c | 34 ++++++++
samples/ftrace/ftrace-direct-multi-modify.c | 38 +++++++++
samples/ftrace/ftrace-direct-multi.c | 23 +++++
samples/ftrace/ftrace-direct-too.c | 26 ++++++
samples/ftrace/ftrace-direct.c | 24 ++++++
tools/testing/selftests/bpf/DENYLIST.aarch64 | 82 ++----------------
11 files changed, 295 insertions(+), 102 deletions(-)

--
2.40.0.423.gd6c402a77b-goog


Date: 2023-04-03 11:41:39
From: Florent Revest
Subject: [PATCH v5 2/4] arm64: ftrace: Simplify get_ftrace_plt

Following recent refactorings, get_ftrace_plt() is only ever called with
addr == FTRACE_ADDR, so it can be simplified to always return the ftrace
trampoline PLT.

Signed-off-by: Florent Revest <[email protected]>
Acked-by: Mark Rutland <[email protected]>
---
arch/arm64/kernel/ftrace.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index 758436727fba..432626c866a8 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -195,15 +195,15 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
return ftrace_modify_code(pc, 0, new, false);
}

-static struct plt_entry *get_ftrace_plt(struct module *mod, unsigned long addr)
+static struct plt_entry *get_ftrace_plt(struct module *mod)
{
#ifdef CONFIG_ARM64_MODULE_PLTS
struct plt_entry *plt = mod->arch.ftrace_trampolines;

- if (addr == FTRACE_ADDR)
- return &plt[FTRACE_PLT_IDX];
-#endif
+ return &plt[FTRACE_PLT_IDX];
+#else
return NULL;
+#endif
}

static bool reachable_by_bl(unsigned long addr, unsigned long pc)
@@ -270,7 +270,7 @@ static bool ftrace_find_callable_addr(struct dyn_ftrace *rec,
if (WARN_ON(!mod))
return false;

- plt = get_ftrace_plt(mod, *addr);
+ plt = get_ftrace_plt(mod);
if (!plt) {
pr_err("ftrace: no module PLT for %ps\n", (void *)*addr);
return false;
--
2.40.0.423.gd6c402a77b-goog

Date: 2023-04-03 11:41:53
From: Florent Revest
Subject: [PATCH v5 3/4] arm64: ftrace: Add direct call trampoline samples support

The ftrace samples need per-architecture trampoline implementations
to save and restore argument registers around the calls to
my_direct_func* and to restore polluted registers (e.g. x30).

These samples also include <asm/asm-offsets.h> which, on arm64, is not
necessary and redefines previously defined macros (resulting in
warnings), so these includes are guarded by !CONFIG_ARM64.

Signed-off-by: Florent Revest <[email protected]>
---
arch/arm64/Kconfig | 2 ++
samples/ftrace/ftrace-direct-modify.c | 34 ++++++++++++++++++
samples/ftrace/ftrace-direct-multi-modify.c | 38 +++++++++++++++++++++
samples/ftrace/ftrace-direct-multi.c | 23 +++++++++++++
samples/ftrace/ftrace-direct-too.c | 26 ++++++++++++++
samples/ftrace/ftrace-direct.c | 24 +++++++++++++
6 files changed, 147 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index f3503d0cc1b8..c2bf28099abd 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -194,6 +194,8 @@ config ARM64
!CC_OPTIMIZE_FOR_SIZE)
select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY \
if DYNAMIC_FTRACE_WITH_ARGS
+ select HAVE_SAMPLE_FTRACE_DIRECT
+ select HAVE_SAMPLE_FTRACE_DIRECT_MULTI
select HAVE_EFFICIENT_UNALIGNED_ACCESS
select HAVE_FAST_GUP
select HAVE_FTRACE_MCOUNT_RECORD
diff --git a/samples/ftrace/ftrace-direct-modify.c b/samples/ftrace/ftrace-direct-modify.c
index 25fba66f61c0..98d1b7385f08 100644
--- a/samples/ftrace/ftrace-direct-modify.c
+++ b/samples/ftrace/ftrace-direct-modify.c
@@ -2,7 +2,9 @@
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/ftrace.h>
+#ifndef CONFIG_ARM64
#include <asm/asm-offsets.h>
+#endif

extern void my_direct_func1(void);
extern void my_direct_func2(void);
@@ -96,6 +98,38 @@ asm (

#endif /* CONFIG_S390 */

+#ifdef CONFIG_ARM64
+
+asm (
+" .pushsection .text, \"ax\", @progbits\n"
+" .type my_tramp1, @function\n"
+" .globl my_tramp1\n"
+" my_tramp1:"
+" bti c\n"
+" sub sp, sp, #16\n"
+" stp x9, x30, [sp]\n"
+" bl my_direct_func1\n"
+" ldp x30, x9, [sp]\n"
+" add sp, sp, #16\n"
+" ret x9\n"
+" .size my_tramp1, .-my_tramp1\n"
+
+" .type my_tramp2, @function\n"
+" .globl my_tramp2\n"
+" my_tramp2:"
+" bti c\n"
+" sub sp, sp, #16\n"
+" stp x9, x30, [sp]\n"
+" bl my_direct_func2\n"
+" ldp x30, x9, [sp]\n"
+" add sp, sp, #16\n"
+" ret x9\n"
+" .size my_tramp2, .-my_tramp2\n"
+" .popsection\n"
+);
+
+#endif /* CONFIG_ARM64 */
+
static struct ftrace_ops direct;

static unsigned long my_tramp = (unsigned long)my_tramp1;
diff --git a/samples/ftrace/ftrace-direct-multi-modify.c b/samples/ftrace/ftrace-direct-multi-modify.c
index f72623899602..e39108eb085d 100644
--- a/samples/ftrace/ftrace-direct-multi-modify.c
+++ b/samples/ftrace/ftrace-direct-multi-modify.c
@@ -2,7 +2,9 @@
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/ftrace.h>
+#ifndef CONFIG_ARM64
#include <asm/asm-offsets.h>
+#endif

extern void my_direct_func1(unsigned long ip);
extern void my_direct_func2(unsigned long ip);
@@ -103,6 +105,42 @@ asm (

#endif /* CONFIG_S390 */

+#ifdef CONFIG_ARM64
+
+asm (
+" .pushsection .text, \"ax\", @progbits\n"
+" .type my_tramp1, @function\n"
+" .globl my_tramp1\n"
+" my_tramp1:"
+" bti c\n"
+" sub sp, sp, #32\n"
+" stp x9, x30, [sp]\n"
+" str x0, [sp, #16]\n"
+" bl my_direct_func1\n"
+" ldp x30, x9, [sp]\n"
+" ldr x0, [sp, #16]\n"
+" add sp, sp, #32\n"
+" ret x9\n"
+" .size my_tramp1, .-my_tramp1\n"
+
+" .type my_tramp2, @function\n"
+" .globl my_tramp2\n"
+" my_tramp2:"
+" bti c\n"
+" sub sp, sp, #32\n"
+" stp x9, x30, [sp]\n"
+" str x0, [sp, #16]\n"
+" bl my_direct_func2\n"
+" ldp x30, x9, [sp]\n"
+" ldr x0, [sp, #16]\n"
+" add sp, sp, #32\n"
+" ret x9\n"
+" .size my_tramp2, .-my_tramp2\n"
+" .popsection\n"
+);
+
+#endif /* CONFIG_ARM64 */
+
static unsigned long my_tramp = (unsigned long)my_tramp1;
static unsigned long tramps[2] = {
(unsigned long)my_tramp1,
diff --git a/samples/ftrace/ftrace-direct-multi.c b/samples/ftrace/ftrace-direct-multi.c
index 1547c2c6be02..5a395d2d2e07 100644
--- a/samples/ftrace/ftrace-direct-multi.c
+++ b/samples/ftrace/ftrace-direct-multi.c
@@ -4,7 +4,9 @@
#include <linux/mm.h> /* for handle_mm_fault() */
#include <linux/ftrace.h>
#include <linux/sched/stat.h>
+#ifndef CONFIG_ARM64
#include <asm/asm-offsets.h>
+#endif

extern void my_direct_func(unsigned long ip);

@@ -66,6 +68,27 @@ asm (

#endif /* CONFIG_S390 */

+#ifdef CONFIG_ARM64
+
+asm (
+" .pushsection .text, \"ax\", @progbits\n"
+" .type my_tramp, @function\n"
+" .globl my_tramp\n"
+" my_tramp:"
+" bti c\n"
+" sub sp, sp, #32\n"
+" stp x9, x30, [sp]\n"
+" str x0, [sp, #16]\n"
+" bl my_direct_func\n"
+" ldp x30, x9, [sp]\n"
+" ldr x0, [sp, #16]\n"
+" add sp, sp, #32\n"
+" ret x9\n"
+" .size my_tramp, .-my_tramp\n"
+" .popsection\n"
+);
+
+#endif /* CONFIG_ARM64 */
static struct ftrace_ops direct;

static int __init ftrace_direct_multi_init(void)
diff --git a/samples/ftrace/ftrace-direct-too.c b/samples/ftrace/ftrace-direct-too.c
index f28e7b99840f..6e93c45fea86 100644
--- a/samples/ftrace/ftrace-direct-too.c
+++ b/samples/ftrace/ftrace-direct-too.c
@@ -3,7 +3,9 @@

#include <linux/mm.h> /* for handle_mm_fault() */
#include <linux/ftrace.h>
+#ifndef CONFIG_ARM64
#include <asm/asm-offsets.h>
+#endif

extern void my_direct_func(struct vm_area_struct *vma,
unsigned long address, unsigned int flags);
@@ -70,6 +72,30 @@ asm (

#endif /* CONFIG_S390 */

+#ifdef CONFIG_ARM64
+
+asm (
+" .pushsection .text, \"ax\", @progbits\n"
+" .type my_tramp, @function\n"
+" .globl my_tramp\n"
+" my_tramp:"
+" bti c\n"
+" sub sp, sp, #48\n"
+" stp x9, x30, [sp]\n"
+" stp x0, x1, [sp, #16]\n"
+" str x2, [sp, #32]\n"
+" bl my_direct_func\n"
+" ldp x30, x9, [sp]\n"
+" ldp x0, x1, [sp, #16]\n"
+" ldr x2, [sp, #32]\n"
+" add sp, sp, #48\n"
+" ret x9\n"
+" .size my_tramp, .-my_tramp\n"
+" .popsection\n"
+);
+
+#endif /* CONFIG_ARM64 */
+
static struct ftrace_ops direct;

static int __init ftrace_direct_init(void)
diff --git a/samples/ftrace/ftrace-direct.c b/samples/ftrace/ftrace-direct.c
index d81a9473b585..e5312f9c15d3 100644
--- a/samples/ftrace/ftrace-direct.c
+++ b/samples/ftrace/ftrace-direct.c
@@ -3,7 +3,9 @@

#include <linux/sched.h> /* for wake_up_process() */
#include <linux/ftrace.h>
+#ifndef CONFIG_ARM64
#include <asm/asm-offsets.h>
+#endif

extern void my_direct_func(struct task_struct *p);

@@ -63,6 +65,28 @@ asm (

#endif /* CONFIG_S390 */

+#ifdef CONFIG_ARM64
+
+asm (
+" .pushsection .text, \"ax\", @progbits\n"
+" .type my_tramp, @function\n"
+" .globl my_tramp\n"
+" my_tramp:"
+" bti c\n"
+" sub sp, sp, #32\n"
+" stp x9, x30, [sp]\n"
+" str x0, [sp, #16]\n"
+" bl my_direct_func\n"
+" ldp x30, x9, [sp]\n"
+" ldr x0, [sp, #16]\n"
+" add sp, sp, #32\n"
+" ret x9\n"
+" .size my_tramp, .-my_tramp\n"
+" .popsection\n"
+);
+
+#endif /* CONFIG_ARM64 */
+
static struct ftrace_ops direct;

static int __init ftrace_direct_init(void)
--
2.40.0.423.gd6c402a77b-goog

Date: 2023-04-03 11:44:22
From: Florent Revest
Subject: [PATCH v5 4/4] selftests/bpf: Update the tests deny list on aarch64

Now that ftrace supports direct calls on arm64, BPF tracing programs work
on that architecture. This fixes the vast majority of BPF selftests,
except for:

- multi_kprobe programs, which require fprobe, not yet available on arm64
- tracing_struct, which requires trampoline support to access struct args

This patch updates the list of BPF selftests known to fail so that the
BPF CI can validate the tests that now pass.

Signed-off-by: Florent Revest <[email protected]>
---
tools/testing/selftests/bpf/DENYLIST.aarch64 | 82 ++------------------
1 file changed, 5 insertions(+), 77 deletions(-)

diff --git a/tools/testing/selftests/bpf/DENYLIST.aarch64 b/tools/testing/selftests/bpf/DENYLIST.aarch64
index 99cc33c51eaa..6b95cb544094 100644
--- a/tools/testing/selftests/bpf/DENYLIST.aarch64
+++ b/tools/testing/selftests/bpf/DENYLIST.aarch64
@@ -1,33 +1,5 @@
-bloom_filter_map # libbpf: prog 'check_bloom': failed to attach: ERROR: strerror_r(-524)=22
-bpf_cookie/lsm
-bpf_cookie/multi_kprobe_attach_api
-bpf_cookie/multi_kprobe_link_api
-bpf_cookie/trampoline
-bpf_loop/check_callback_fn_stop # link unexpected error: -524
-bpf_loop/check_invalid_flags
-bpf_loop/check_nested_calls
-bpf_loop/check_non_constant_callback
-bpf_loop/check_nr_loops
-bpf_loop/check_null_callback_ctx
-bpf_loop/check_stack
-bpf_mod_race # bpf_mod_kfunc_race__attach unexpected error: -524 (errno 524)
-bpf_tcp_ca/dctcp_fallback
-btf_dump/btf_dump: var_data # find type id unexpected find type id: actual -2 < expected 0
-cgroup_hierarchical_stats # attach unexpected error: -524 (errno 524)
-d_path/basic # setup attach failed: -524
-deny_namespace # attach unexpected error: -524 (errno 524)
-fentry_fexit # fentry_attach unexpected error: -1 (errno 524)
-fentry_test # fentry_attach unexpected error: -1 (errno 524)
-fexit_sleep # fexit_attach fexit attach failed: -1
-fexit_stress # fexit attach unexpected fexit attach: actual -524 < expected 0
-fexit_test # fexit_attach unexpected error: -1 (errno 524)
-get_func_args_test # get_func_args_test__attach unexpected error: -524 (errno 524) (trampoline)
-get_func_ip_test # get_func_ip_test__attach unexpected error: -524 (errno 524) (trampoline)
-htab_update/reenter_update
-kfree_skb # attach fentry unexpected error: -524 (trampoline)
-kfunc_call/subprog # extern (var ksym) 'bpf_prog_active': not found in kernel BTF
-kfunc_call/subprog_lskel # skel unexpected error: -2
-kfunc_dynptr_param/dynptr_data_null # libbpf: prog 'dynptr_data_null': failed to attach: ERROR: strerror_r(-524)=22
+bpf_cookie/multi_kprobe_attach_api # kprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3
+bpf_cookie/multi_kprobe_link_api # kprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3
kprobe_multi_bench_attach # bpf_program__attach_kprobe_multi_opts unexpected error: -95
kprobe_multi_test/attach_api_addrs # bpf_program__attach_kprobe_multi_opts unexpected error: -95
kprobe_multi_test/attach_api_pattern # bpf_program__attach_kprobe_multi_opts unexpected error: -95
@@ -35,50 +7,6 @@ kprobe_multi_test/attach_api_syms # bpf_program__attach_kprobe_mu
kprobe_multi_test/bench_attach # bpf_program__attach_kprobe_multi_opts unexpected error: -95
kprobe_multi_test/link_api_addrs # link_fd unexpected link_fd: actual -95 < expected 0
kprobe_multi_test/link_api_syms # link_fd unexpected link_fd: actual -95 < expected 0
-kprobe_multi_test/skel_api # kprobe_multi__attach unexpected error: -524 (errno 524)
-ksyms_module/libbpf # 'bpf_testmod_ksym_percpu': not found in kernel BTF
-ksyms_module/lskel # test_ksyms_module_lskel__open_and_load unexpected error: -2
-libbpf_get_fd_by_id_opts # test_libbpf_get_fd_by_id_opts__attach unexpected error: -524 (errno 524)
-linked_list
-lookup_key # test_lookup_key__attach unexpected error: -524 (errno 524)
-lru_bug # lru_bug__attach unexpected error: -524 (errno 524)
-modify_return # modify_return__attach failed unexpected error: -524 (errno 524)
-module_attach # skel_attach skeleton attach failed: -524
-mptcp/base # run_test mptcp unexpected error: -524 (errno 524)
-netcnt # packets unexpected packets: actual 10001 != expected 10000
-rcu_read_lock # failed to attach: ERROR: strerror_r(-524)=22
-recursion # skel_attach unexpected error: -524 (errno 524)
-ringbuf # skel_attach skeleton attachment failed: -1
-setget_sockopt # attach_cgroup unexpected error: -524
-sk_storage_tracing # test_sk_storage_tracing__attach unexpected error: -524 (errno 524)
-skc_to_unix_sock # could not attach BPF object unexpected error: -524 (errno 524)
-socket_cookie # prog_attach unexpected error: -524
-stacktrace_build_id # compare_stack_ips stackmap vs. stack_amap err -1 errno 2
-task_local_storage/exit_creds # skel_attach unexpected error: -524 (errno 524)
-task_local_storage/recursion # skel_attach unexpected error: -524 (errno 524)
-test_bprm_opts # attach attach failed: -524
-test_ima # attach attach failed: -524
-test_local_storage # attach lsm attach failed: -524
-test_lsm # test_lsm_first_attach unexpected error: -524 (errno 524)
-test_overhead # attach_fentry unexpected error: -524
-timer # timer unexpected error: -524 (errno 524)
-timer_crash # timer_crash__attach unexpected error: -524 (errno 524)
-timer_mim # timer_mim unexpected error: -524 (errno 524)
-trace_printk # trace_printk__attach unexpected error: -1 (errno 524)
-trace_vprintk # trace_vprintk__attach unexpected error: -1 (errno 524)
-tracing_struct # tracing_struct__attach unexpected error: -524 (errno 524)
-trampoline_count # attach_prog unexpected error: -524
-unpriv_bpf_disabled # skel_attach unexpected error: -524 (errno 524)
-user_ringbuf/test_user_ringbuf_post_misaligned # misaligned_skel unexpected error: -524 (errno 524)
-user_ringbuf/test_user_ringbuf_post_producer_wrong_offset
-user_ringbuf/test_user_ringbuf_post_larger_than_ringbuf_sz
-user_ringbuf/test_user_ringbuf_basic # ringbuf_basic_skel unexpected error: -524 (errno 524)
-user_ringbuf/test_user_ringbuf_sample_full_ring_buffer
-user_ringbuf/test_user_ringbuf_post_alignment_autoadjust
-user_ringbuf/test_user_ringbuf_overfill
-user_ringbuf/test_user_ringbuf_discards_properly_ignored
-user_ringbuf/test_user_ringbuf_loop
-user_ringbuf/test_user_ringbuf_msg_protocol
-user_ringbuf/test_user_ringbuf_blocking_reserve
-verify_pkcs7_sig # test_verify_pkcs7_sig__attach unexpected error: -524 (errno 524)
-vmlinux # skel_attach skeleton attach failed: -524
+kprobe_multi_test/skel_api # libbpf: failed to load BPF skeleton 'kprobe_multi': -3
+module_attach # prog 'kprobe_multi': failed to auto-attach: -95
+tracing_struct # tracing_struct__attach unexpected error: -524 (errno 524)
\ No newline at end of file
--
2.40.0.423.gd6c402a77b-goog

Date: 2023-04-03 17:28:52
From: Mark Rutland
Subject: Re: [PATCH v5 3/4] arm64: ftrace: Add direct call trampoline samples support

On Mon, Apr 03, 2023 at 01:35:51PM +0200, Florent Revest wrote:
> The ftrace samples need per-architecture trampoline implementations
> to save and restore argument registers around the calls to
> my_direct_func* and to restore polluted registers (eg: x30).
>
> These samples also include <asm/asm-offsets.h> which, on arm64, is not
> necessary and redefines previously defined macros (resulting in
> warnings) so these includes are guarded by !CONFIG_ARM64.
>
> Signed-off-by: Florent Revest <[email protected]>

Overall this looks pretty good!

I spotted a few bugs below while testing, and I've suggested some fixups below.

w.r.t. the asm-offsets include guards. I took a look at fixing arm64's
asm-offsets.c to not be problematic, but it requires some invasive refactoring,
so I'd like to clean that up as a separate series. I don't think that should
block this series, and I think that the include guards are fine for now.

> ---
> arch/arm64/Kconfig | 2 ++
> samples/ftrace/ftrace-direct-modify.c | 34 ++++++++++++++++++
> samples/ftrace/ftrace-direct-multi-modify.c | 38 +++++++++++++++++++++
> samples/ftrace/ftrace-direct-multi.c | 23 +++++++++++++
> samples/ftrace/ftrace-direct-too.c | 26 ++++++++++++++
> samples/ftrace/ftrace-direct.c | 24 +++++++++++++
> 6 files changed, 147 insertions(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index f3503d0cc1b8..c2bf28099abd 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -194,6 +194,8 @@ config ARM64
> !CC_OPTIMIZE_FOR_SIZE)
> select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY \
> if DYNAMIC_FTRACE_WITH_ARGS
> + select HAVE_SAMPLE_FTRACE_DIRECT
> + select HAVE_SAMPLE_FTRACE_DIRECT_MULTI
> select HAVE_EFFICIENT_UNALIGNED_ACCESS
> select HAVE_FAST_GUP
> select HAVE_FTRACE_MCOUNT_RECORD
> diff --git a/samples/ftrace/ftrace-direct-modify.c b/samples/ftrace/ftrace-direct-modify.c
> index 25fba66f61c0..98d1b7385f08 100644
> --- a/samples/ftrace/ftrace-direct-modify.c
> +++ b/samples/ftrace/ftrace-direct-modify.c
> @@ -2,7 +2,9 @@
> #include <linux/module.h>
> #include <linux/kthread.h>
> #include <linux/ftrace.h>
> +#ifndef CONFIG_ARM64
> #include <asm/asm-offsets.h>
> +#endif
>
> extern void my_direct_func1(void);
> extern void my_direct_func2(void);
> @@ -96,6 +98,38 @@ asm (
>
> #endif /* CONFIG_S390 */
>
> +#ifdef CONFIG_ARM64
> +
> +asm (
> +" .pushsection .text, \"ax\", @progbits\n"
> +" .type my_tramp1, @function\n"
> +" .globl my_tramp1\n"
> +" my_tramp1:"
> +" bti c\n"
> +" sub sp, sp, #16\n"
> +" stp x9, x30, [sp]\n"
> +" bl my_direct_func1\n"
> +" ldp x30, x9, [sp]\n"
> +" add sp, sp, #16\n"
> +" ret x9\n"
> +" .size my_tramp1, .-my_tramp1\n"
> +
> +" .type my_tramp2, @function\n"
> +" .globl my_tramp2\n"
> +" my_tramp2:"
> +" bti c\n"
> +" sub sp, sp, #16\n"
> +" stp x9, x30, [sp]\n"
> +" bl my_direct_func2\n"
> +" ldp x30, x9, [sp]\n"
> +" add sp, sp, #16\n"
> +" ret x9\n"
> +" .size my_tramp2, .-my_tramp2\n"
> +" .popsection\n"
> +);
> +
> +#endif /* CONFIG_ARM64 */

These look functionally correct, given that they'll only be attached to schedule()
and the direct funcs take no arguments, so there are no arguments to save/restore
and nothing to shuffle.

As an aside, I believe we'll need to rework the sequences when we add support
for RELIABLE_STACKTRACE so that the unwinder can reliably acquire the address
of the instrumented function and its caller, but I think for now it's
preferable to keep this simple and I'm happy to make that a problem for future
me.

> diff --git a/samples/ftrace/ftrace-direct-multi-modify.c b/samples/ftrace/ftrace-direct-multi-modify.c
> index f72623899602..e39108eb085d 100644
> --- a/samples/ftrace/ftrace-direct-multi-modify.c
> +++ b/samples/ftrace/ftrace-direct-multi-modify.c
> @@ -2,7 +2,9 @@
> #include <linux/module.h>
> #include <linux/kthread.h>
> #include <linux/ftrace.h>
> +#ifndef CONFIG_ARM64
> #include <asm/asm-offsets.h>
> +#endif
>
> extern void my_direct_func1(unsigned long ip);
> extern void my_direct_func2(unsigned long ip);
> @@ -103,6 +105,42 @@ asm (
>
> #endif /* CONFIG_S390 */
>
> +#ifdef CONFIG_ARM64
> +
> +asm (
> +" .pushsection .text, \"ax\", @progbits\n"
> +" .type my_tramp1, @function\n"
> +" .globl my_tramp1\n"
> +" my_tramp1:"
> +" bti c\n"
> +" sub sp, sp, #32\n"
> +" stp x9, x30, [sp]\n"
> +" str x0, [sp, #16]\n"
> +" bl my_direct_func1\n"
> +" ldp x30, x9, [sp]\n"
> +" ldr x0, [sp, #16]\n"
> +" add sp, sp, #32\n"
> +" ret x9\n"
> +" .size my_tramp1, .-my_tramp1\n"
> +
> +" .type my_tramp2, @function\n"
> +" .globl my_tramp2\n"
> +" my_tramp2:"
> +" bti c\n"
> +" sub sp, sp, #32\n"
> +" stp x9, x30, [sp]\n"
> +" str x0, [sp, #16]\n"
> +" bl my_direct_func2\n"
> +" ldp x30, x9, [sp]\n"
> +" ldr x0, [sp, #16]\n"
> +" add sp, sp, #32\n"
> +" ret x9\n"
> +" .size my_tramp2, .-my_tramp2\n"
> +" .popsection\n"
> +);
> +
> +#endif /* CONFIG_ARM64 */

For both of these trampolines we need to pass the trampoline's return address
(i.e. where we'll return to in the instrumented function) as the 'ip' argument
to my_direct_func{1,2}().

In both cases, just before the 'bl my_direct_func{1,2}' we'll need to add:

mov x0, x30
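
For example, with that fixup applied my_tramp1 would look something like this
(a sketch; my_tramp2 is identical apart from the callee):

	bti	c
	sub	sp, sp, #32
	stp	x9, x30, [sp]
	str	x0, [sp, #16]
	mov	x0, x30			// ip = the trampoline's return address
	bl	my_direct_func1
	ldp	x30, x9, [sp]
	ldr	x0, [sp, #16]
	add	sp, sp, #32
	ret	x9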

[...]

> diff --git a/samples/ftrace/ftrace-direct-multi.c b/samples/ftrace/ftrace-direct-multi.c
> index 1547c2c6be02..5a395d2d2e07 100644
> --- a/samples/ftrace/ftrace-direct-multi.c
> +++ b/samples/ftrace/ftrace-direct-multi.c
> @@ -4,7 +4,9 @@
> #include <linux/mm.h> /* for handle_mm_fault() */
> #include <linux/ftrace.h>
> #include <linux/sched/stat.h>
> +#ifndef CONFIG_ARM64
> #include <asm/asm-offsets.h>
> +#endif
>
> extern void my_direct_func(unsigned long ip);
>
> @@ -66,6 +68,27 @@ asm (
>
> #endif /* CONFIG_S390 */
>
> +#ifdef CONFIG_ARM64
> +
> +asm (
> +" .pushsection .text, \"ax\", @progbits\n"
> +" .type my_tramp, @function\n"
> +" .globl my_tramp\n"
> +" my_tramp:"
> +" bti c\n"
> +" sub sp, sp, #32\n"
> +" stp x9, x30, [sp]\n"
> +" str x0, [sp, #16]\n"
> +" bl my_direct_func\n"
> +" ldp x30, x9, [sp]\n"
> +" ldr x0, [sp, #16]\n"
> +" add sp, sp, #32\n"
> +" ret x9\n"
> +" .size my_tramp, .-my_tramp\n"
> +" .popsection\n"
> +);
> +
> +#endif /* CONFIG_ARM64 */
> static struct ftrace_ops direct;

As with ftrace-direct-multi-modify.c, we need to pass the return address of the
trampoline as the 'ip' argument to my_direct_func(), so just before the 'bl
my_direct_func' we'll need to add:

mov x0, x30

[...]

> diff --git a/samples/ftrace/ftrace-direct-too.c b/samples/ftrace/ftrace-direct-too.c
> index f28e7b99840f..6e93c45fea86 100644
> --- a/samples/ftrace/ftrace-direct-too.c
> +++ b/samples/ftrace/ftrace-direct-too.c
> @@ -3,7 +3,9 @@
>
> #include <linux/mm.h> /* for handle_mm_fault() */
> #include <linux/ftrace.h>
> +#ifndef CONFIG_ARM64
> #include <asm/asm-offsets.h>
> +#endif
>
> extern void my_direct_func(struct vm_area_struct *vma,
> unsigned long address, unsigned int flags);

This gets attached to handle_mm_fault(), whose prototype is currently:

vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
unsigned int flags, struct pt_regs *regs)

i.e. it has 4 arguments, in x0 to x3.

> @@ -70,6 +72,30 @@ asm (
>
> #endif /* CONFIG_S390 */
>
> +#ifdef CONFIG_ARM64
> +
> +asm (
> +" .pushsection .text, \"ax\", @progbits\n"
> +" .type my_tramp, @function\n"
> +" .globl my_tramp\n"
> +" my_tramp:"
> +" bti c\n"
> +" sub sp, sp, #48\n"
> +" stp x9, x30, [sp]\n"
> +" stp x0, x1, [sp, #16]\n"
> +" str x2, [sp, #32]\n"
> +" bl my_direct_func\n"
> +" ldp x30, x9, [sp]\n"
> +" ldp x0, x1, [sp, #16]\n"
> +" ldr x2, [sp, #32]\n"

So here we need to save+restore x3 also.

We already have the space reserved, so that should just be a matter of using
stp/ldp for x2 and x3.
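
i.e. something like this (a sketch; the 48-byte frame already has room for x3):

	bti	c
	sub	sp, sp, #48
	stp	x9, x30, [sp]
	stp	x0, x1, [sp, #16]
	stp	x2, x3, [sp, #32]	// also preserve x3, the 4th handle_mm_fault() arg
	bl	my_direct_func
	ldp	x30, x9, [sp]
	ldp	x0, x1, [sp, #16]
	ldp	x2, x3, [sp, #32]
	add	sp, sp, #48
	ret	x9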

> +" add sp, sp, #48\n"
> +" ret x9\n"
> +" .size my_tramp, .-my_tramp\n"
> +" .popsection\n"
> +);
> +
> +#endif /* CONFIG_ARM64 */
> +
> static struct ftrace_ops direct;
>
> static int __init ftrace_direct_init(void)
> diff --git a/samples/ftrace/ftrace-direct.c b/samples/ftrace/ftrace-direct.c
> index d81a9473b585..e5312f9c15d3 100644
> --- a/samples/ftrace/ftrace-direct.c
> +++ b/samples/ftrace/ftrace-direct.c
> @@ -3,7 +3,9 @@
>
> #include <linux/sched.h> /* for wake_up_process() */
> #include <linux/ftrace.h>
> +#ifndef CONFIG_ARM64
> #include <asm/asm-offsets.h>
> +#endif
>
> extern void my_direct_func(struct task_struct *p);
>
> @@ -63,6 +65,28 @@ asm (
>
> #endif /* CONFIG_S390 */
>
> +#ifdef CONFIG_ARM64
> +
> +asm (
> +" .pushsection .text, \"ax\", @progbits\n"
> +" .type my_tramp, @function\n"
> +" .globl my_tramp\n"
> +" my_tramp:"
> +" bti c\n"
> +" sub sp, sp, #32\n"
> +" stp x9, x30, [sp]\n"
> +" str x0, [sp, #16]\n"
> +" bl my_direct_func\n"
> +" ldp x30, x9, [sp]\n"
> +" ldr x0, [sp, #16]\n"
> +" add sp, sp, #32\n"
> +" ret x9\n"
> +" .size my_tramp, .-my_tramp\n"
> +" .popsection\n"
> +);

This looks fine. Since my_direct_func() is attached to wake_up_process() and
expects its single argument, saving/restoring x0 is sufficient and we don't
need any additional register shuffling.

Thanks,
Mark.

Date: 2023-04-04 13:44:52
From: Florent Revest
Subject: Re: [PATCH v5 3/4] arm64: ftrace: Add direct call trampoline samples support

On Mon, Apr 3, 2023 at 7:26 PM Mark Rutland <[email protected]> wrote:
>
> On Mon, Apr 03, 2023 at 01:35:51PM +0200, Florent Revest wrote:
> > The ftrace samples need per-architecture trampoline implementations
> > to save and restore argument registers around the calls to
> > my_direct_func* and to restore polluted registers (eg: x30).
> >
> > These samples also include <asm/asm-offsets.h> which, on arm64, is not
> > necessary and redefines previously defined macros (resulting in
> > warnings) so these includes are guarded by !CONFIG_ARM64.
> >
> > Signed-off-by: Florent Revest <[email protected]>
>
> Overall this looks pretty good!
>
> I spotted a few bugs below while testing, and I've suggested some fixups below.
>
> w.r.t. the asm-offsets include guards. I took a look at fixing arm64's
> asm-offsets.c to not be problematic, but it requires some invasive refactoring,
> so I'd like to clean that up as a separate series. I don't think that should
> block this series, and I think that the include guards are fine for now.

Sounds great! Thank you Mark :)

> > ---
> > arch/arm64/Kconfig | 2 ++
> > samples/ftrace/ftrace-direct-modify.c | 34 ++++++++++++++++++
> > samples/ftrace/ftrace-direct-multi-modify.c | 38 +++++++++++++++++++++
> > samples/ftrace/ftrace-direct-multi.c | 23 +++++++++++++
> > samples/ftrace/ftrace-direct-too.c | 26 ++++++++++++++
> > samples/ftrace/ftrace-direct.c | 24 +++++++++++++
> > 6 files changed, 147 insertions(+)
> >
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index f3503d0cc1b8..c2bf28099abd 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -194,6 +194,8 @@ config ARM64
> > !CC_OPTIMIZE_FOR_SIZE)
> > select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY \
> > if DYNAMIC_FTRACE_WITH_ARGS
> > + select HAVE_SAMPLE_FTRACE_DIRECT
> > + select HAVE_SAMPLE_FTRACE_DIRECT_MULTI
> > select HAVE_EFFICIENT_UNALIGNED_ACCESS
> > select HAVE_FAST_GUP
> > select HAVE_FTRACE_MCOUNT_RECORD
> > diff --git a/samples/ftrace/ftrace-direct-modify.c b/samples/ftrace/ftrace-direct-modify.c
> > index 25fba66f61c0..98d1b7385f08 100644
> > --- a/samples/ftrace/ftrace-direct-modify.c
> > +++ b/samples/ftrace/ftrace-direct-modify.c
> > @@ -2,7 +2,9 @@
> > #include <linux/module.h>
> > #include <linux/kthread.h>
> > #include <linux/ftrace.h>
> > +#ifndef CONFIG_ARM64
> > #include <asm/asm-offsets.h>
> > +#endif
> >
> > extern void my_direct_func1(void);
> > extern void my_direct_func2(void);
> > @@ -96,6 +98,38 @@ asm (
> >
> > #endif /* CONFIG_S390 */
> >
> > +#ifdef CONFIG_ARM64
> > +
> > +asm (
> > +" .pushsection .text, \"ax\", @progbits\n"
> > +" .type my_tramp1, @function\n"
> > +" .globl my_tramp1\n"
> > +" my_tramp1:"
> > +" bti c\n"
> > +" sub sp, sp, #16\n"
> > +" stp x9, x30, [sp]\n"
> > +" bl my_direct_func1\n"
> > +" ldp x30, x9, [sp]\n"
> > +" add sp, sp, #16\n"
> > +" ret x9\n"
> > +" .size my_tramp1, .-my_tramp1\n"
> > +
> > +" .type my_tramp2, @function\n"
> > +" .globl my_tramp2\n"
> > +" my_tramp2:"
> > +" bti c\n"
> > +" sub sp, sp, #16\n"
> > +" stp x9, x30, [sp]\n"
> > +" bl my_direct_func2\n"
> > +" ldp x30, x9, [sp]\n"
> > +" add sp, sp, #16\n"
> > +" ret x9\n"
> > +" .size my_tramp2, .-my_tramp2\n"
> > +" .popsection\n"
> > +);
> > +
> > +#endif /* CONFIG_ARM64 */
>
> These look functionally correct, given that they'll only be attached to schedule()
> and the direct funcs take no arguments, so there are no arguments to save/restore
> and nothing to shuffle.
>
> As an aside, I believe we'll need to rework the sequences when we add support
> for RELIABLE_STACKTRACE so that the unwinder can reliably acquire the address
> of the instrumented function and its caller, but I think for now it's
> preferable to keep this simple and I'm happy to make that a problem for future
> me.

Ah, interesting. I'd be happy to help when that time comes! :)

> > diff --git a/samples/ftrace/ftrace-direct-multi-modify.c b/samples/ftrace/ftrace-direct-multi-modify.c
> > index f72623899602..e39108eb085d 100644
> > --- a/samples/ftrace/ftrace-direct-multi-modify.c
> > +++ b/samples/ftrace/ftrace-direct-multi-modify.c
> > @@ -2,7 +2,9 @@
> > #include <linux/module.h>
> > #include <linux/kthread.h>
> > #include <linux/ftrace.h>
> > +#ifndef CONFIG_ARM64
> > #include <asm/asm-offsets.h>
> > +#endif
> >
> > extern void my_direct_func1(unsigned long ip);
> > extern void my_direct_func2(unsigned long ip);
> > @@ -103,6 +105,42 @@ asm (
> >
> > #endif /* CONFIG_S390 */
> >
> > +#ifdef CONFIG_ARM64
> > +
> > +asm (
> > +" .pushsection .text, \"ax\", @progbits\n"
> > +" .type my_tramp1, @function\n"
> > +" .globl my_tramp1\n"
> > +" my_tramp1:"
> > +" bti c\n"
> > +" sub sp, sp, #32\n"
> > +" stp x9, x30, [sp]\n"
> > +" str x0, [sp, #16]\n"
> > +" bl my_direct_func1\n"
> > +" ldp x30, x9, [sp]\n"
> > +" ldr x0, [sp, #16]\n"
> > +" add sp, sp, #32\n"
> > +" ret x9\n"
> > +" .size my_tramp1, .-my_tramp1\n"
> > +
> > +" .type my_tramp2, @function\n"
> > +" .globl my_tramp2\n"
> > +" my_tramp2:"
> > +" bti c\n"
> > +" sub sp, sp, #32\n"
> > +" stp x9, x30, [sp]\n"
> > +" str x0, [sp, #16]\n"
> > +" bl my_direct_func2\n"
> > +" ldp x30, x9, [sp]\n"
> > +" ldr x0, [sp, #16]\n"
> > +" add sp, sp, #32\n"
> > +" ret x9\n"
> > +" .size my_tramp2, .-my_tramp2\n"
> > +" .popsection\n"
> > +);
> > +
> > +#endif /* CONFIG_ARM64 */
>
> For both of these trampolines we need to pass the trampoline's return address
> (i.e. where we'll return to in the instrumented function) as the 'ip' argument
> to my_direct_func{1,2}().
>
> In both cases, just before the 'bl my_direct_func{1,2}' we'll need to add:
>
> mov x0, x30

Oopsie, yes! Very good catch!

> > diff --git a/samples/ftrace/ftrace-direct-multi.c b/samples/ftrace/ftrace-direct-multi.c
> > index 1547c2c6be02..5a395d2d2e07 100644
> > --- a/samples/ftrace/ftrace-direct-multi.c
> > +++ b/samples/ftrace/ftrace-direct-multi.c
> > @@ -4,7 +4,9 @@
> > #include <linux/mm.h> /* for handle_mm_fault() */
> > #include <linux/ftrace.h>
> > #include <linux/sched/stat.h>
> > +#ifndef CONFIG_ARM64
> > #include <asm/asm-offsets.h>
> > +#endif
> >
> > extern void my_direct_func(unsigned long ip);
> >
> > @@ -66,6 +68,27 @@ asm (
> >
> > #endif /* CONFIG_S390 */
> >
> > +#ifdef CONFIG_ARM64
> > +
> > +asm (
> > +" .pushsection .text, \"ax\", @progbits\n"
> > +" .type my_tramp, @function\n"
> > +" .globl my_tramp\n"
> > +" my_tramp:"
> > +" bti c\n"
> > +" sub sp, sp, #32\n"
> > +" stp x9, x30, [sp]\n"
> > +" str x0, [sp, #16]\n"
> > +" bl my_direct_func\n"
> > +" ldp x30, x9, [sp]\n"
> > +" ldr x0, [sp, #16]\n"
> > +" add sp, sp, #32\n"
> > +" ret x9\n"
> > +" .size my_tramp, .-my_tramp\n"
> > +" .popsection\n"
> > +);
> > +
> > +#endif /* CONFIG_ARM64 */
> > static struct ftrace_ops direct;
>
> As with ftrace-direct-multi-modify.c, we need to pass the return address of the
> trampoline as the 'ip' argument to my_direct_func(), so just before the 'bl
> my_direct_func' we'll need to add:
>
> mov x0, x30

Will do :)

> > diff --git a/samples/ftrace/ftrace-direct-too.c b/samples/ftrace/ftrace-direct-too.c
> > index f28e7b99840f..6e93c45fea86 100644
> > --- a/samples/ftrace/ftrace-direct-too.c
> > +++ b/samples/ftrace/ftrace-direct-too.c
> > @@ -3,7 +3,9 @@
> >
> > #include <linux/mm.h> /* for handle_mm_fault() */
> > #include <linux/ftrace.h>
> > +#ifndef CONFIG_ARM64
> > #include <asm/asm-offsets.h>
> > +#endif
> >
> > extern void my_direct_func(struct vm_area_struct *vma,
> > unsigned long address, unsigned int flags);
>
> This gets attached to handle_mm_fault(), whose prototype is currently:
>
> vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> unsigned int flags, struct pt_regs *regs)
>
> i.e. it has 4 arguments, in x0 to x3.
>
> > @@ -70,6 +72,30 @@ asm (
> >
> > #endif /* CONFIG_S390 */
> >
> > +#ifdef CONFIG_ARM64
> > +
> > +asm (
> > +" .pushsection .text, \"ax\", @progbits\n"
> > +" .type my_tramp, @function\n"
> > +" .globl my_tramp\n"
> > +" my_tramp:"
> > +" bti c\n"
> > +" sub sp, sp, #48\n"
> > +" stp x9, x30, [sp]\n"
> > +" stp x0, x1, [sp, #16]\n"
> > +" str x2, [sp, #32]\n"
> > +" bl my_direct_func\n"
> > +" ldp x30, x9, [sp]\n"
> > +" ldp x0, x1, [sp, #16]\n"
> > +" ldr x2, [sp, #32]\n"
>
> So here we need to save+restore x3 also.
>
> We already have the space reserved, so that should just be a matter of using
> stp/ldp for x2 and x3.

That's also a very good catch. It looks like it's an issue for the
sample trampoline on x86_64 as well; I'll fix that too in a separate
patch as part of v6. (If I understand s390 asm correctly, the stmg
already saves all arg registers in one instruction, so s390 doesn't
need to change.)