2024-02-28 23:27:35

by Puranjay Mohan

[permalink] [raw]
Subject: arm64: Supporting DYNAMIC_FTRACE_WITH_CALL_OPS with CLANG_CFI


I recently worked on allowing BPF programs to work properly on CLANG_CFI
enabled kernels [1]. While doing this I found that fentry programs are
failing to attach because DYNAMIC_FTRACE_WITH_CALL_OPS doesn't work with
CLANG_CFI.

Mark told me that the problem is that clang CFI places the type hash
immediately before any pre-function NOPs, and so where some functions
have pre-function NOPs and others do not, the type hashes are not at a
consistent offset (and effectively the functions have different ABIs and
cannot call one another)

I tried enabling both Clang CFI and -fpatchable-function-entry=4,2 to
see the behaviour and where this could fail. Here is an example:

This is the disassembly of jump_label_cmp() that has two pre-function nops and the CFI
hash before them. So, the hash is at (addr - 12).

ffff80008033e9b0: 16c516ce [kCFI hash for 'static int jump_label_cmp(const void *a, const void *b)']
ffff80008033e9b4: d503201f nop
ffff80008033e9b8: d503201f nop
ffff80008033e9bc <jump_label_cmp>:
ffff80008033e9bc: d503245f bti c
ffff80008033e9c0: d503201f nop
ffff80008033e9c4: d503201f nop
[.....]

The following is the disassembly of the sort_r() function that makes an indirect call to
jump_label_cmp() but loads the CFI hash from (addr - 4) rather than
(addr - 12). So, it is loading the nop instruction and not the hash.

ffff80008084e19c <sort_r>:
[.....]
0xffff80008084e454 <+696>: ldur w16, [x8, #-4] (#-4 here should be #-12)
0xffff80008084e458 <+700>: movk w17, #0x16ce
0xffff80008084e45c <+704>: movk w17, #0x16c5, lsl #16
0xffff80008084e460 <+708>: cmp w16, w17
0xffff80008084e464 <+712>: b.eq 0xffff80008084e46c <sort_r+720> // b.none
0xffff80008084e468 <+716>: brk #0x8228
0xffff80008084e46c <+720>: blr x8

This would cause a cfi exception.

As I haven't spent more time trying to understand this, I am not aware
how the compiler emits 2 nops before some functions and none for others.

I would propose the following changes to the compiler that could fix this
issue:

1. The kCFI hash should always be generated at func_addr - 4, this would
make the calling code consistent.

2. The two(n) nops should be generated before the kCFI hash. We would
modify the ftrace code to look for these nops at (fun_addr - 12) and
(func_addr - 8) when CFI is enabled, and (func_addr - 8), (func_addr -
4) when CFI is disabled.

The generated code could then look like:

ffff80008033e9b0: d503201f nop
ffff80008033e9b4: d503201f nop
ffff80008033e9b8: 16c516ce kCFI hash
ffff80008033e9bc <jump_label_cmp>:
ffff80008033e9bc: d503245f bti c
ffff80008033e9c0: d503201f nop
ffff80008033e9c4: d503201f nop
[.....]

Note: I am overlooking the alignment requirements here, we might need to
add another nop above the hash to make sure the top two nops are aligned at 8 bytes.

I am not sure how useful this solution is, looking forward to hear from
others who know more about this topic.

Thanks,
Puranjay

[1] https://lore.kernel.org/bpf/[email protected]/