Add jump optimization support for RISC-V.
Replace the ebreak instructions used by normal kprobes with an
auipc+jalr instruction pair, with the aim of reducing the probe-hit
overhead.
All known optprobe-capable RISC architectures use a single jump or
branch instruction, while this patch chooses not to. RISC-V has a
quite limited jump range (4KB or 2MB) for both its branch and jump
instructions, which prevents the optimization from supporting probes
spread all over the kernel.
The auipc+jalr instruction pair is introduced with a much wider jump
range (4GB), where auipc loads the upper 20 bits to a free register
and jalr appends the lower 12 bits to form a 32-bit immediate. Note
that returning from the probe handler requires another free register.
As kprobes can appear almost anywhere inside the kernel, the free
register must be found in a generic way, without depending on the
calling convention or any other conventions.
The algorithm for finding the free register is inspired by register
renaming in modern processors. From the perspective of register
renaming, a register could be represented as two different registers
if two neighbouring instructions both write to it but nothing reads it
in between. Extending this fact, a register is considered free if
there is no read before its next write in the execution flow. We are
free to change its value without interfering with normal execution.
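For illustration, here is a minimal sketch of that rule in C (the
decode_rs_rd() helper is hypothetical; the series decodes each RVI/RVC
format individually):

  /* Bitmasks of the GPRs an instruction reads (rs) and writes (rd). */
  struct insn_use { unsigned long rs, rd; };

  /* Hypothetical helper: fill 'use' and return the insn length. */
  int decode_rs_rd(unsigned long addr, struct insn_use *use);

  unsigned long free_regs(unsigned long addr, unsigned long end)
  {
          unsigned long read = 0, write = 0;
          struct insn_use use;

          while (addr < end) {
                  addr += decode_rs_rd(addr, &use);
                  read  |= use.rs & ~write; /* read before any write */
                  write |= use.rd & ~read;  /* written, never read   */
          }
          return write; /* set bits mark registers free to clobber */
  }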
Static analysis shows that 51% of the instructions in the kernel
(default config) are capable of being replaced, i.e. one free register
can be found at both the start and the end of the replaced instruction
pairs, while the replaced instructions can be directly executed. We
also ran an efficiency test on Gem5 RISC-V, which shows a more than 5x
speedup over the breakpoint-based implementation.
Contribution:
Chen Guokai invented the free-register search algorithm, evaluated the
optimization ratio, and implemented the basic support for RVI kernel
binaries.
Liao Chang added support for hybrid RVI and RVC kernel binaries, fixed
bugs with different kernel configurations, and refactored the entire
feature into individual patches.
v5:
1. Correct known nits
2. Enable the usage of unused caller-saved registers
3. Append an efficiency test result on Gem 5
v4:
Correct the sequence of Signed-off-by and Co-developed-by.
v3:
1. Support of hybrid RVI and RVC kernel binary.
2. Refactor out entire feature into some individual patches.
v2:
1. Adjust comments
2. Remove improper copyright
3. Clean up format issues that are not common practice
4. Extract common definition of instruction decoder
5. Fix race issue on SMP platforms.
v1:
Chen Guokai contributed the basic functionality code.
Chen Guokai (1):
riscv/kprobe: Search free registers from unused caller-saved ones
Liao Chang (8):
riscv/kprobe: Prepare the skeleton to implement RISCV OPTPROBES
feature
riscv/kprobe: Allocate detour buffer from module area
riscv/kprobe: Prepare the skeleton to prepare optimized kprobe
riscv/kprobe: Add common RVI and RVC instruction decoder code
riscv/kprobe: Search free register(s) to clobber for 'AUIPC/JALR'
riscv/kprobe: Add code to check if kprobe can be optimized
riscv/kprobe: Prepare detour buffer for optimized kprobe
riscv/kprobe: Patch AUIPC/JALR pair to optimize kprobe
arch/riscv/Kconfig | 1 +
arch/riscv/include/asm/bug.h | 5 +-
arch/riscv/include/asm/kprobes.h | 49 ++
arch/riscv/include/asm/patch.h | 1 +
arch/riscv/kernel/patch.c | 23 +-
arch/riscv/kernel/probes/Makefile | 1 +
arch/riscv/kernel/probes/decode-insn.h | 153 +++++
arch/riscv/kernel/probes/kprobes.c | 24 +
arch/riscv/kernel/probes/opt.c | 693 ++++++++++++++++++++++
arch/riscv/kernel/probes/opt_trampoline.S | 137 +++++
arch/riscv/kernel/probes/simulate-insn.h | 41 ++
11 files changed, 1123 insertions(+), 5 deletions(-)
create mode 100644 arch/riscv/kernel/probes/opt.c
create mode 100644 arch/riscv/kernel/probes/opt_trampoline.S
--
2.34.1
From: Liao Chang <[email protected]>
This patch implements the algorithm for searching for free register(s)
to form a long-jump instruction pair.
The AUIPC/JALR instruction pair is introduced with a much wider jump
range (4GB), where auipc loads the upper 20 bits to a free register and
jalr appends the lower 12 bits to form a 32-bit immediate. Since kprobes
can be instrumented anywhere in kernel space, the free register must be
found in a generic way, without depending on the calling convention
or any other conventions.
The algorithm for finding the free register is inspired by register
renaming in modern processors. From the perspective of register
renaming, a register could be represented as two different registers
if two neighbouring instructions both write to it but nothing reads it
in between. Extending this fact, a register is considered free if
there is no read before its next write in the execution flow. We are
free to change its value without interfering with normal execution.
In order to do jump optimization, two free registers need to be found:
the first one is used to form the AUIPC/JALR jumping to the detour
buffer, the second one is used to form the JR jumping back from the
detour buffer. If the first one is never updated by any of the
instructions replaced by 'AUIPC/JALR', both registers can be the same
one.
Let's use the example below to explain how the algorithm works. Given
a hybrid RVI and RVC kernel binary, one kprobe is instrumented at the
entry of the function idle_dummy.
       Before                 Optimized              Detour buffer
<idle_dummy>:                                        ...
#1 add  sp,sp,-16        auipc a0, #?                add  sp,sp,-16
#2 sd   s0,8(sp)                                     sd   s0,8(sp)
#3 addi s0,sp,16         jalr  a0, #?(a0)            addi s0,sp,16
#4 ld   s0,8(sp)                                     ld   s0,8(sp)
#5 li   a0,0             li    a0,0                  auipc a0, #?
#6 addi sp,sp,16         addi  sp,sp,16              jr    x0, #?(a0)
#7 ret                   ret
For a regular kprobe, it is trivial to replace the first instruction
with C.EBREAK; no extra instruction or register is clobbered. In order
to optimize a kprobe with a long jump, the first 8 bytes are patched
with AUIPC/JALR, and a0 is chosen to save the address to jump to,
because from #1 to #7 a0 is the only register that satisfies two
conditions: (1) no read before write, (2) never updated in the detour
buffer. s0 has been used as a source register at #2, so it is not free
to clobber.
The search starts from the kprobe and stops at the last instruction of
the function or at the first branch/jump instruction. It decodes the
'rs' and 'rd' parts of each visited instruction. If the 'rd' has never
been read before, record it in the bitmask 'write'; if the 'rs' has
never been written before, record it in another bitmask 'read'. When
the search stops, the remaining bits of 'write' are the free registers
available to form AUIPC/JALR or JR.
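Tracing the idle_dummy example through these rules gives (a hand-worked
trace, not code from this patch):

  #1 add  sp,sp,-16   reads sp            -> read  = {sp}
  #2 sd   s0,8(sp)    reads s0, sp        -> read  = {sp, s0}
  #3 addi s0,sp,16    writes s0           -> s0 already read, not free
  #4 ld   s0,8(sp)    writes s0           -> likewise
  #5 li   a0,0        writes a0, no read  -> write = {a0}
  #6 addi sp,sp,16    reads/writes sp
  #7 ret              reads ra            -> read  = {sp, s0, ra}; stop

  'write' ends as {a0}, so a0 is the only register free to clobber.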
Signed-off-by: Liao Chang <[email protected]>
Co-developed-by: Chen Guokai <[email protected]>
Signed-off-by: Chen Guokai <[email protected]>
---
arch/riscv/kernel/probes/opt.c | 223 +++++++++++++++++++++++++++++++++
1 file changed, 223 insertions(+)
diff --git a/arch/riscv/kernel/probes/opt.c b/arch/riscv/kernel/probes/opt.c
index a4271e6033ba..a0d2ab39e3fa 100644
--- a/arch/riscv/kernel/probes/opt.c
+++ b/arch/riscv/kernel/probes/opt.c
@@ -12,6 +12,9 @@
#include <asm/kprobes.h>
#include <asm/patch.h>
+#include "simulate-insn.h"
+#include "decode-insn.h"
+
static inline int in_auipc_jalr_range(long val)
{
#ifdef CONFIG_ARCH_RV32I
@@ -37,15 +40,235 @@ static void prepare_detour_buffer(kprobe_opcode_t *code, kprobe_opcode_t *slot,
{
}
+/* Registers the first usage of which is the destination of instruction */
+#define WRITE_ON(reg) \
+ (*write |= (((*read >> (reg)) ^ 1UL) & 1) << (reg))
+/* Registers the first usage of which is the source of instruction */
+#define READ_ON(reg) \
+ (*read |= (((*write >> (reg)) ^ 1UL) & 1) << (reg))
+
/*
* In the RISC-V ISA, AUIPC/JALR clobber one register to form the target
* address. Inspired by register renaming in OoO processors, this searches
* for a register that is not previously used as a source register but is
* used as a destination register before any branch or jump instruction.
*/
+static void find_register(unsigned long start, unsigned long end,
+ unsigned long *write, unsigned long *read)
+{
+ kprobe_opcode_t insn;
+ unsigned long addr, offset = 0UL;
+
+ for (addr = start; addr < end; addr += offset) {
+ insn = *(kprobe_opcode_t *)addr;
+ offset = GET_INSN_LENGTH(insn);
+
+#ifdef CONFIG_RISCV_ISA_C
+ if (offset == RVI_INSN_LEN)
+ goto is_rvi;
+
+ insn &= __COMPRESSED_INSN_MASK;
+ /* Stop searching until any control transfer instruction */
+ if (riscv_insn_is_c_ebreak(insn) || riscv_insn_is_c_j(insn))
+ break;
+
+ if (riscv_insn_is_c_jal(insn)) {
+ /* The rd of C.JAL is x1 by default */
+ WRITE_ON(1);
+ break;
+ }
+
+ if (riscv_insn_is_c_jr(insn)) {
+ READ_ON(rvc_r_rs1(insn));
+ break;
+ }
+
+ if (riscv_insn_is_c_jalr(insn)) {
+ READ_ON(rvc_r_rs1(insn));
+ /* The rd of C.JALR is x1 by default */
+ WRITE_ON(1);
+ break;
+ }
+
+ if (riscv_insn_is_c_beqz(insn) || riscv_insn_is_c_bnez(insn)) {
+ READ_ON(rvc_b_rs(insn));
+ break;
+ }
+
+ /*
+ * Decode RVC instructions that encode integer registers, and try
+ * to find destination registers that are never used as source
+ * registers.
+ */
+ if (riscv_insn_is_c_sub(insn) || riscv_insn_is_c_subw(insn)) {
+ READ_ON(rvc_a_rs1(insn));
+ READ_ON(rvc_a_rs2(insn));
+ continue;
+ } else if (riscv_insn_is_c_sq(insn) ||
+ riscv_insn_is_c_sw(insn) ||
+ riscv_insn_is_c_sd(insn)) {
+ READ_ON(rvc_s_rs1(insn));
+ READ_ON(rvc_s_rs2(insn));
+ continue;
+ } else if (riscv_insn_is_c_addi16sp(insn) ||
+ riscv_insn_is_c_addi(insn) ||
+ riscv_insn_is_c_addiw(insn) ||
+ riscv_insn_is_c_slli(insn)) {
+ READ_ON(rvc_i_rs1(insn));
+ continue;
+ } else if (riscv_insn_is_c_sri(insn) ||
+ riscv_insn_is_c_andi(insn)) {
+ READ_ON(rvc_b_rs(insn));
+ continue;
+ } else if (riscv_insn_is_c_sqsp(insn) ||
+ riscv_insn_is_c_swsp(insn) ||
+ riscv_insn_is_c_sdsp(insn)) {
+ READ_ON(rvc_ss_rs2(insn));
+ /* The rs2 of C.SQSP/SWSP/SDSP are x2 by default */
+ READ_ON(2);
+ continue;
+ } else if (riscv_insn_is_c_mv(insn)) {
+ READ_ON(rvc_r_rs2(insn));
+ WRITE_ON(rvc_r_rd(insn));
+ } else if (riscv_insn_is_c_addi4spn(insn)) {
+ /* The rs of C.ADDI4SPN is x2 by default */
+ READ_ON(2);
+ WRITE_ON(rvc_l_rd(insn));
+ } else if (riscv_insn_is_c_lq(insn) ||
+ riscv_insn_is_c_lw(insn) ||
+ riscv_insn_is_c_ld(insn)) {
+ /* FIXME: c.lw/c.ld share opcode with c.flw/c.fld */
+ READ_ON(rvc_l_rs(insn));
+ WRITE_ON(rvc_l_rd(insn));
+ } else if (riscv_insn_is_c_lqsp(insn) ||
+ riscv_insn_is_c_lwsp(insn) ||
+ riscv_insn_is_c_ldsp(insn)) {
+ /*
+ * FIXME: c.lwsp/c.ldsp share opcode with c.flwsp/c.fldsp
+ * The rs of C.LQSP/C.LWSP/C.LDSP is x2 by default.
+ */
+ READ_ON(2);
+ WRITE_ON(rvc_i_rd(insn));
+ } else if (riscv_insn_is_c_li(insn) ||
+ riscv_insn_is_c_lui(insn)) {
+ WRITE_ON(rvc_i_rd(insn));
+ }
+
+ if ((*write > 1UL) && __builtin_ctzl(*write & ~1UL))
+ return;
+is_rvi:
+#endif
+ /* Stop searching until any control transfer instruction */
+ if (riscv_insn_is_branch(insn)) {
+ READ_ON(rvi_rs1(insn));
+ READ_ON(rvi_rs2(insn));
+ break;
+ }
+
+ if (riscv_insn_is_jal(insn)) {
+ WRITE_ON(rvi_rd(insn));
+ break;
+ }
+
+ if (riscv_insn_is_jalr(insn)) {
+ READ_ON(rvi_rs1(insn));
+ WRITE_ON(rvi_rd(insn));
+ break;
+ }
+
+ if (riscv_insn_is_system(insn)) {
+ /* csrrw, csrrs, csrrc */
+ if (rvi_rs1(insn))
+ READ_ON(rvi_rs1(insn));
+ /* csrrwi, csrrsi, csrrci, csrrw, csrrs, csrrc */
+ if (rvi_rd(insn))
+ WRITE_ON(rvi_rd(insn));
+ break;
+ }
+
+ /*
+ * Decode RVI instructions that have rd and rs, and try to find
+ * some rd that is never used as an rs.
+ */
+ if (riscv_insn_is_lui(insn) || riscv_insn_is_auipc(insn)) {
+ WRITE_ON(rvi_rd(insn));
+ } else if (riscv_insn_is_arith_ri(insn) ||
+ riscv_insn_is_load(insn)) {
+ READ_ON(rvi_rs1(insn));
+ WRITE_ON(rvi_rd(insn));
+ } else if (riscv_insn_is_arith_rr(insn) ||
+ riscv_insn_is_store(insn) ||
+ riscv_insn_is_amo(insn)) {
+ READ_ON(rvi_rs1(insn));
+ READ_ON(rvi_rs2(insn));
+ WRITE_ON(rvi_rd(insn));
+ }
+
+ if ((*write > 1UL) && __builtin_ctzl(*write & ~1UL))
+ return;
+ }
+}
+
static void find_free_registers(struct kprobe *kp, struct optimized_kprobe *op,
int *rd, int *ra)
{
+ unsigned long start, end;
+ /*
+ * Searching algorithm explanation:
+ *
+ * 1. Define two types of instruction area firstly:
+ *
+ * +-----+
+ * + +
+ * + + ---> instructions modified by optprobe, named 'O-Area'.
+ * + +
+ * +-----+
+ * + +
+ * + + ---> instructions after optprobe, named 'K-Area'.
+ * + +
+ * + ~ +
+ *
+ * 2. There are two usages for each GPR in given instruction area.
+ *
+ * - W: GPR is used as the RD operand at its first occurrence.
+ * - R: GPR is used as the RS operand at its first occurrence.
+ *
+ * Then there are 4 different usages for each GPR totally:
+ *
+ * 1. Used as W in O-Area, Used as W in K-Area.
+ * 2. Used as W in O-Area, Used as R in K-Area.
+ * 3. Used as R in O-Area, Used as W in K-Area.
+ * 4. Used as R in O-Area, Used as R in K-Area.
+ *
+ * All registers satisfying #1 or #3 could be chosen to form the
+ * 'AUIPC/JALR' jumping to the detour buffer.
+ *
+ * All registers satisfying #1 or #2 could be chosen to form the 'JR'
+ * jumping back from the detour buffer.
+ */
+ unsigned long kw = 0UL, kr = 0UL, ow = 0UL, or = 0UL;
+
+ /* Search one free register used to form AUIPC/JALR */
+ start = (unsigned long)&kp->opcode;
+ end = start + GET_INSN_LENGTH(kp->opcode);
+ find_register(start, end, &ow, &or);
+
+ start = (unsigned long)kp->addr + GET_INSN_LENGTH(kp->opcode);
+ end = (unsigned long)kp->addr + op->optinsn.length;
+ find_register(start, end, &ow, &or);
+
+ /* Search one free register used to form JR */
+ find_register(end, (unsigned long)_end, &kw, &kr);
+
+ if ((kw & ow) > 1UL) {
+ *rd = __builtin_ctzl((kw & ow) & ~1UL);
+ *ra = *rd;
+ return;
+ }
+
+ *rd = ((kw | ow) == 1UL) ? 0 : __builtin_ctzl((kw | ow) & ~1UL);
+ *ra = (kw == 1UL) ? 0 : __builtin_ctzl(kw & ~1UL);
}
/*
--
2.34.1
From: Liao Chang <[email protected]>
Prepare the skeleton to implement optimized kprobes on RISC-V. It
consists of a Makefile, Kconfig and some architecture-specific files:
kprobes.h and opt.c. opt.c includes some macros, type definitions and
functions required by the kprobe framework; opt_trampoline.S provides
a piece of assembly code template used to construct the detour buffer,
which is the target of the long-jump instruction(s) for each optimized
kprobe.
Since the jump range of the PC-relative instruction JAL is +/-2M,
which is too small to reach the detour buffer, the fundamental idea to
support OPTPROBES on RISC-V is to replace 'EBREAK' with 'AUIPC/JALR'.
This means one more instruction beside the kprobe instruction needs to
be clobbered. Furthermore, RISC-V supports hybrid RVI and RVC in a
single kernel binary, so in theory a pair of 'AUIPC/JALR' is about to
clobber 10 bytes (3 RVC and 1 RVI, where 2 bytes are padding for
alignment) in the worst case. The second hard problem is looking for
one integer register as the destination of 'AUIPC/JALR' without any
side effect.
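For reference, the 32-bit offset is split as sketched below; the
'+ (1 << 11)' rounding compensates for JALR sign-extending its 12-bit
immediate (a sketch mirroring how later patches in this series compute
the rv_auipc()/rv_jalr() immediates):

  /* Assumes offs fits the +/-2GB AUIPC/JALR range. */
  long offs = (long)detour_buf - (long)probe_addr;
  u32 hi20 = (offs + (1 << 11)) >> 12; /* AUIPC immediate               */
  u32 lo12 = offs & 0xfff;             /* JALR immediate, sign-extended */
  /* auipc rd, hi20 ; jalr rd, lo12(rd)  ==>  jump to pc + offs         */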
Signed-off-by: Liao Chang <[email protected]>
Co-developed-by: Chen Guokai <[email protected]>
Signed-off-by: Chen Guokai <[email protected]>
---
arch/riscv/Kconfig | 1 +
arch/riscv/include/asm/kprobes.h | 32 ++++++++++++++
arch/riscv/kernel/probes/Makefile | 1 +
arch/riscv/kernel/probes/opt.c | 51 +++++++++++++++++++++++
arch/riscv/kernel/probes/opt_trampoline.S | 12 ++++++
5 files changed, 97 insertions(+)
create mode 100644 arch/riscv/kernel/probes/opt.c
create mode 100644 arch/riscv/kernel/probes/opt_trampoline.S
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index e2b656043abf..5fa3094d55bc 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -103,6 +103,7 @@ config RISCV
select HAVE_KPROBES_ON_FTRACE if !XIP_KERNEL
select HAVE_KRETPROBES if !XIP_KERNEL
select HAVE_RETHOOK if !XIP_KERNEL
+ select HAVE_OPTPROBES if !XIP_KERNEL
select HAVE_MOVE_PMD
select HAVE_MOVE_PUD
select HAVE_PCI
diff --git a/arch/riscv/include/asm/kprobes.h b/arch/riscv/include/asm/kprobes.h
index e7882ccb0fd4..e85130c9112f 100644
--- a/arch/riscv/include/asm/kprobes.h
+++ b/arch/riscv/include/asm/kprobes.h
@@ -41,5 +41,37 @@ int kprobe_fault_handler(struct pt_regs *regs, unsigned int trapnr);
bool kprobe_breakpoint_handler(struct pt_regs *regs);
bool kprobe_single_step_handler(struct pt_regs *regs);
+#ifdef CONFIG_OPTPROBES
+
+/* optinsn template addresses */
+extern __visible kprobe_opcode_t optprobe_template_entry[];
+extern __visible kprobe_opcode_t optprobe_template_end[];
+
+#define MAX_OPTINSN_SIZE \
+ ((unsigned long)optprobe_template_end - \
+ (unsigned long)optprobe_template_entry)
+
+/*
+ * For an RVI and RVC hybrid encoding kernel, although a long jump just
+ * needs 2 RVI instructions (AUIPC+JALR), the optimized instructions are
+ * 10 bytes long at most to ensure no RVI would be truncated, which
+ * means four combinations:
+ * - 2 RVI
+ * - 4 RVC
+ * - 2 RVC + 1 RVI
+ * - 3 RVC + 1 RVI (truncated, need padding)
+ */
+#define MAX_COPIED_INSN 4
+#define MAX_OPTIMIZED_LENGTH 10
+
+struct arch_optimized_insn {
+ kprobe_opcode_t copied_insn[MAX_COPIED_INSN];
+ /* detour code buffer */
+ kprobe_opcode_t *insn;
+ unsigned long length;
+ int rd;
+};
+
+#endif /* CONFIG_OPTPROBES */
#endif /* CONFIG_KPROBES */
#endif /* _ASM_RISCV_KPROBES_H */
diff --git a/arch/riscv/kernel/probes/Makefile b/arch/riscv/kernel/probes/Makefile
index c40139e9ca47..3d837eb5f9be 100644
--- a/arch/riscv/kernel/probes/Makefile
+++ b/arch/riscv/kernel/probes/Makefile
@@ -3,4 +3,5 @@ obj-$(CONFIG_KPROBES) += kprobes.o decode-insn.o simulate-insn.o
obj-$(CONFIG_RETHOOK) += rethook.o rethook_trampoline.o
obj-$(CONFIG_KPROBES_ON_FTRACE) += ftrace.o
obj-$(CONFIG_UPROBES) += uprobes.o decode-insn.o simulate-insn.o
+obj-$(CONFIG_OPTPROBES) += opt.o opt_trampoline.o
CFLAGS_REMOVE_simulate-insn.o = $(CC_FLAGS_FTRACE)
diff --git a/arch/riscv/kernel/probes/opt.c b/arch/riscv/kernel/probes/opt.c
new file mode 100644
index 000000000000..56c8a227c857
--- /dev/null
+++ b/arch/riscv/kernel/probes/opt.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Kernel Probes Jump Optimization (Optprobes)
+ *
+ * Copyright (C) Guokai Chen, 2022
+ * Author: Guokai Chen [email protected]
+ */
+
+#define pr_fmt(fmt) "optprobe: " fmt
+
+#include <linux/kprobes.h>
+#include <asm/kprobes.h>
+
+int arch_prepared_optinsn(struct arch_optimized_insn *optinsn)
+{
+ return 0;
+}
+
+int arch_check_optimized_kprobe(struct optimized_kprobe *op)
+{
+ return 0;
+}
+
+int arch_prepare_optimized_kprobe(struct optimized_kprobe *op,
+ struct kprobe *orig)
+{
+ return 0;
+}
+
+void arch_remove_optimized_kprobe(struct optimized_kprobe *op)
+{
+}
+
+void arch_optimize_kprobes(struct list_head *oplist)
+{
+}
+
+void arch_unoptimize_kprobes(struct list_head *oplist,
+ struct list_head *done_list)
+{
+}
+
+void arch_unoptimize_kprobe(struct optimized_kprobe *op)
+{
+}
+
+int arch_within_optimized_kprobe(struct optimized_kprobe *op,
+ kprobe_opcode_t *addr)
+{
+ return 0;
+}
diff --git a/arch/riscv/kernel/probes/opt_trampoline.S b/arch/riscv/kernel/probes/opt_trampoline.S
new file mode 100644
index 000000000000..16160c4367ff
--- /dev/null
+++ b/arch/riscv/kernel/probes/opt_trampoline.S
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2022 Guokai Chen
+ */
+
+#include <linux/linkage.h>
+
+#include <asm/csr.h>
+#include <asm/asm-offsets.h>
+
+SYM_ENTRY(optprobe_template_entry, SYM_L_GLOBAL, SYM_A_NONE)
+SYM_ENTRY(optprobe_template_end, SYM_L_GLOBAL, SYM_A_NONE)
--
2.34.1
From: Liao Chang <[email protected]>
This patch introduces code to prepare the instruction slot for an
optimized kprobe. The instruction slot for a regular kprobe just
records two instructions: the first one is the original instruction
replaced by EBREAK, the second one is an EBREAK for single-stepping.
The instruction slot for an optimized kprobe is larger; besides
executing the instruction out-of-line, it also contains a standalone
stackframe for calling the kprobe handler.
All optimized instruction slots consist of 5 major parts, copied from
the assembly code template in opt_trampoline.S.
SAVE REGS
CALL optimized_callback
RESTORE REGS
EXECUTE INSNS OUT-OF-LINE
RETURN BACK
Although most instructions in each slot are the same, the slots still
differ a bit in their payload, which results from three parts:
- 'CALL optimized_callback': the relative offset of the 'call'
instruction is different for each kprobe.
- 'EXECUTE INSN OUT-OF-LINE': no doubt.
- 'RETURN BACK': the chosen free register is reused here as the
destination register for jumping back.
So the slot payload also needs to be customized for each optimized
kprobe.
Signed-off-by: Liao Chang <[email protected]>
Co-developed-by: Chen Guokai <[email protected]>
Signed-off-by: Chen Guokai <[email protected]>
---
arch/riscv/include/asm/kprobes.h | 16 +++
arch/riscv/kernel/probes/opt.c | 76 +++++++++++++
arch/riscv/kernel/probes/opt_trampoline.S | 125 ++++++++++++++++++++++
3 files changed, 217 insertions(+)
diff --git a/arch/riscv/include/asm/kprobes.h b/arch/riscv/include/asm/kprobes.h
index e85130c9112f..e40c837d0a1d 100644
--- a/arch/riscv/include/asm/kprobes.h
+++ b/arch/riscv/include/asm/kprobes.h
@@ -46,10 +46,26 @@ bool kprobe_single_step_handler(struct pt_regs *regs);
/* optinsn template addresses */
extern __visible kprobe_opcode_t optprobe_template_entry[];
extern __visible kprobe_opcode_t optprobe_template_end[];
+extern __visible kprobe_opcode_t optprobe_template_save[];
+extern __visible kprobe_opcode_t optprobe_template_call[];
+extern __visible kprobe_opcode_t optprobe_template_insn[];
+extern __visible kprobe_opcode_t optprobe_template_return[];
#define MAX_OPTINSN_SIZE \
((unsigned long)optprobe_template_end - \
(unsigned long)optprobe_template_entry)
+#define DETOUR_SAVE_OFFSET \
+ ((unsigned long)optprobe_template_save - \
+ (unsigned long)optprobe_template_entry)
+#define DETOUR_CALL_OFFSET \
+ ((unsigned long)optprobe_template_call - \
+ (unsigned long)optprobe_template_entry)
+#define DETOUR_INSN_OFFSET \
+ ((unsigned long)optprobe_template_insn - \
+ (unsigned long)optprobe_template_entry)
+#define DETOUR_RETURN_OFFSET \
+ ((unsigned long)optprobe_template_return - \
+ (unsigned long)optprobe_template_entry)
/*
* For an RVI and RVC hybrid encoding kernel, although a long jump just
diff --git a/arch/riscv/kernel/probes/opt.c b/arch/riscv/kernel/probes/opt.c
index 258a283c906d..bc232fce5b39 100644
--- a/arch/riscv/kernel/probes/opt.c
+++ b/arch/riscv/kernel/probes/opt.c
@@ -11,9 +11,37 @@
#include <linux/kprobes.h>
#include <asm/kprobes.h>
#include <asm/patch.h>
+#include <asm/asm-offsets.h>
#include "simulate-insn.h"
#include "decode-insn.h"
+#include "../../net/bpf_jit.h"
+
+static void
+optimized_callback(struct optimized_kprobe *op, struct pt_regs *regs)
+{
+ unsigned long flags;
+ struct kprobe_ctlblk *kcb;
+
+ /* Save skipped registers */
+ regs->epc = (unsigned long)op->kp.addr;
+ regs->orig_a0 = ~0UL;
+
+ local_irq_save(flags);
+ kcb = get_kprobe_ctlblk();
+
+ if (kprobe_running()) {
+ kprobes_inc_nmissed_count(&op->kp);
+ } else {
+ __this_cpu_write(current_kprobe, &op->kp);
+ kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+ opt_pre_handler(&op->kp, regs);
+ __this_cpu_write(current_kprobe, NULL);
+ }
+ local_irq_restore(flags);
+}
+
+NOKPROBE_SYMBOL(optimized_callback);
static inline int in_auipc_jalr_range(long val)
{
@@ -30,6 +58,11 @@ static inline int in_auipc_jalr_range(long val)
#endif
}
+#define DETOUR_ADDR(code, offs) \
+ ((void *)((unsigned long)(code) + (offs)))
+#define DETOUR_INSN(code, offs) \
+ (*(kprobe_opcode_t *)((unsigned long)(code) + (offs)))
+
/*
* Copy optprobe assembly code template into detour buffer and modify some
* instructions for each kprobe.
@@ -38,6 +71,49 @@ static void prepare_detour_buffer(kprobe_opcode_t *code, kprobe_opcode_t *slot,
int rd, struct optimized_kprobe *op,
kprobe_opcode_t opcode)
{
+ long offs;
+ unsigned long data;
+
+ memcpy(code, optprobe_template_entry, MAX_OPTINSN_SIZE);
+
+ /* Step1: record optimized_kprobe pointer into detour buffer */
+ memcpy(DETOUR_ADDR(code, DETOUR_SAVE_OFFSET), &op, sizeof(op));
+
+ /*
+ * Step2
+ * auipc ra, 0 --> auipc ra, HI20.{optimized_callback - pc}
+ * jalr ra, 0(ra) --> jalr ra, LO12.{optimized_callback - pc}(ra)
+ */
+ offs = (unsigned long)&optimized_callback -
+ (unsigned long)DETOUR_ADDR(slot, DETOUR_CALL_OFFSET);
+ DETOUR_INSN(code, DETOUR_CALL_OFFSET) =
+ rv_auipc(1, (offs + (1 << 11)) >> 12);
+ DETOUR_INSN(code, DETOUR_CALL_OFFSET + 0x4) =
+ rv_jalr(1, 1, offs & 0xFFF);
+
+ /* Step3: copy replaced instructions into detour buffer */
+ memcpy(DETOUR_ADDR(code, DETOUR_INSN_OFFSET), op->kp.addr,
+ op->optinsn.length);
+ memcpy(DETOUR_ADDR(code, DETOUR_INSN_OFFSET), &opcode,
+ GET_INSN_LENGTH(opcode));
+
+ /* Step4: record return address of long jump into detour buffer */
+ data = (unsigned long)op->kp.addr + op->optinsn.length;
+ memcpy(DETOUR_ADDR(code, DETOUR_RETURN_OFFSET), &data, sizeof(data));
+
+ /*
+ * Step5
+ * auipc ra, 0 --> auipc rd, 0
+ * ld/w ra, -8(ra) --> ld/w rd, -8(rd)
+ * jalr x0, 0(ra) --> jalr x0, 0(rd)
+ */
+ DETOUR_INSN(code, DETOUR_RETURN_OFFSET + 0x8) = rv_auipc(rd, 0);
+#if __riscv_xlen == 32
+ DETOUR_INSN(code, DETOUR_RETURN_OFFSET + 0xC) = rv_lw(rd, -8, rd);
+#else
+ DETOUR_INSN(code, DETOUR_RETURN_OFFSET + 0xC) = rv_ld(rd, -8, rd);
+#endif
+ DETOUR_INSN(code, DETOUR_RETURN_OFFSET + 0x10) = rv_jalr(0, rd, 0);
}
/* Registers the first usage of which is the destination of instruction */
diff --git a/arch/riscv/kernel/probes/opt_trampoline.S b/arch/riscv/kernel/probes/opt_trampoline.S
index 16160c4367ff..75e34e373cf2 100644
--- a/arch/riscv/kernel/probes/opt_trampoline.S
+++ b/arch/riscv/kernel/probes/opt_trampoline.S
@@ -1,12 +1,137 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Copyright (C) 2022 Guokai Chen
+ * Copyright (C) 2022 Liao, Chang <[email protected]>
*/
#include <linux/linkage.h>
+#include <asm/asm.h>
#include <asm/csr.h>
#include <asm/asm-offsets.h>
SYM_ENTRY(optprobe_template_entry, SYM_L_GLOBAL, SYM_A_NONE)
+ addi sp, sp, -(PT_SIZE_ON_STACK)
+ REG_S x1, PT_RA(sp)
+ REG_S x2, PT_SP(sp)
+ REG_S x3, PT_GP(sp)
+ REG_S x4, PT_TP(sp)
+ REG_S x5, PT_T0(sp)
+ REG_S x6, PT_T1(sp)
+ REG_S x7, PT_T2(sp)
+ REG_S x8, PT_S0(sp)
+ REG_S x9, PT_S1(sp)
+ REG_S x10, PT_A0(sp)
+ REG_S x11, PT_A1(sp)
+ REG_S x12, PT_A2(sp)
+ REG_S x13, PT_A3(sp)
+ REG_S x14, PT_A4(sp)
+ REG_S x15, PT_A5(sp)
+ REG_S x16, PT_A6(sp)
+ REG_S x17, PT_A7(sp)
+ REG_S x18, PT_S2(sp)
+ REG_S x19, PT_S3(sp)
+ REG_S x20, PT_S4(sp)
+ REG_S x21, PT_S5(sp)
+ REG_S x22, PT_S6(sp)
+ REG_S x23, PT_S7(sp)
+ REG_S x24, PT_S8(sp)
+ REG_S x25, PT_S9(sp)
+ REG_S x26, PT_S10(sp)
+ REG_S x27, PT_S11(sp)
+ REG_S x28, PT_T3(sp)
+ REG_S x29, PT_T4(sp)
+ REG_S x30, PT_T5(sp)
+ REG_S x31, PT_T6(sp)
+ /* Updating fp is friendly for stacktrace */
+ addi s0, sp, (PT_SIZE_ON_STACK)
+ j 1f
+
+SYM_ENTRY(optprobe_template_save, SYM_L_GLOBAL, SYM_A_NONE)
+ /*
+ * Step1:
+ * Filled with the pointer to optimized_kprobe data
+ */
+ .dword 0
+1:
+ /* Load the optimized_kprobe pointer from the .dword above */
+ auipc a0, 0
+ REG_L a0, -8(a0)
+ add a1, sp, x0
+
+SYM_ENTRY(optprobe_template_call, SYM_L_GLOBAL, SYM_A_NONE)
+ /*
+ * Step2:
+ * The <IMME> fields of AUIPC/JALR are modified to the PC-relative
+ * offset of optimized_callback.
+ */
+ auipc ra, 0
+ jalr ra, 0(ra)
+
+ REG_L x1, PT_RA(sp)
+ REG_L x3, PT_GP(sp)
+ REG_L x4, PT_TP(sp)
+ REG_L x5, PT_T0(sp)
+ REG_L x6, PT_T1(sp)
+ REG_L x7, PT_T2(sp)
+ REG_L x8, PT_S0(sp)
+ REG_L x9, PT_S1(sp)
+ REG_L x10, PT_A0(sp)
+ REG_L x11, PT_A1(sp)
+ REG_L x12, PT_A2(sp)
+ REG_L x13, PT_A3(sp)
+ REG_L x14, PT_A4(sp)
+ REG_L x15, PT_A5(sp)
+ REG_L x16, PT_A6(sp)
+ REG_L x17, PT_A7(sp)
+ REG_L x18, PT_S2(sp)
+ REG_L x19, PT_S3(sp)
+ REG_L x20, PT_S4(sp)
+ REG_L x21, PT_S5(sp)
+ REG_L x22, PT_S6(sp)
+ REG_L x23, PT_S7(sp)
+ REG_L x24, PT_S8(sp)
+ REG_L x25, PT_S9(sp)
+ REG_L x26, PT_S10(sp)
+ REG_L x27, PT_S11(sp)
+ REG_L x28, PT_T3(sp)
+ REG_L x29, PT_T4(sp)
+ REG_L x30, PT_T5(sp)
+ REG_L x31, PT_T6(sp)
+ REG_L x2, PT_SP(sp)
+ addi sp, sp, (PT_SIZE_ON_STACK)
+
+SYM_ENTRY(optprobe_template_insn, SYM_L_GLOBAL, SYM_A_NONE)
+ /*
+ * Step3:
+ * NOPs will be replaced by the probed instructions; in the worst
+ * case 3 RVC and 1 RVI instructions are executed out of line.
+ */
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ j 2f
+
+SYM_ENTRY(optprobe_template_return, SYM_L_GLOBAL, SYM_A_NONE)
+ /*
+ * Step4:
+ * Filled with the return address of the long jump (AUIPC/JALR)
+ */
+ .dword 0
+2:
+ /*
+ * Step5:
+ * The <RA> of AUIPC/LD/JALR will be replaced for each kprobe,
+ * used to read return address saved in .dword above.
+ */
+ auipc ra, 0
+ REG_L ra, -8(ra)
+ jalr x0, 0(ra)
SYM_ENTRY(optprobe_template_end, SYM_L_GLOBAL, SYM_A_NONE)
--
2.34.1
From: Liao Chang <[email protected]>
This patch adds code that can be used to decode RVI and RVC
instructions when searching for one register for 'AUIPC/JALR'. As
mentioned in the previous patch, a kprobe can't be optimized until one
free integer register is found to save the jump target. In order to
carry out the register search, all instructions from the kprobe to the
last one of the function need to be decoded and tested for whether
they contain one candidate register.
For all RVI instruction formats, the position and length of the 'rs1',
'rs2', 'rd' and 'opcode' parts are uniform, but the rules for the RVC
instruction formats are more complicated, so a couple of inline
functions are added to decode rs1/rs2/rd for RVC.
These instruction decoders are supposed to be consistent with the RVC
and RV32/RV64G instruction set lists specified in the riscv
instruction reference published on August 25, 2022.
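As a sanity check of the RVI field layout (a hand-worked example, not
part of the patch), 'add a0, a1, a2' encodes to 0x00c58533 and decodes
as:

  rvi_rd(0x00c58533)  == 10; /* bits [11:7]  -> a0 (x10) */
  rvi_rs1(0x00c58533) == 11; /* bits [19:15] -> a1 (x11) */
  rvi_rs2(0x00c58533) == 12; /* bits [24:20] -> a2 (x12) */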
Signed-off-by: Liao Chang <[email protected]>
Co-developed-by: Chen Guokai <[email protected]>
Signed-off-by: Chen Guokai <[email protected]>
---
arch/riscv/include/asm/bug.h | 5 +-
arch/riscv/kernel/probes/decode-insn.h | 148 +++++++++++++++++++++++
arch/riscv/kernel/probes/simulate-insn.h | 41 +++++++
3 files changed, 193 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/include/asm/bug.h b/arch/riscv/include/asm/bug.h
index 1aaea81fb141..9c33d3b58225 100644
--- a/arch/riscv/include/asm/bug.h
+++ b/arch/riscv/include/asm/bug.h
@@ -19,11 +19,14 @@
#define __BUG_INSN_32 _UL(0x00100073) /* ebreak */
#define __BUG_INSN_16 _UL(0x9002) /* c.ebreak */
+#define RVI_INSN_LEN 4UL
+#define RVC_INSN_LEN 2UL
+
#define GET_INSN_LENGTH(insn) \
({ \
unsigned long __len; \
__len = ((insn & __INSN_LENGTH_MASK) == __INSN_LENGTH_32) ? \
- 4UL : 2UL; \
+ RVI_INSN_LEN : RVC_INSN_LEN; \
__len; \
})
diff --git a/arch/riscv/kernel/probes/decode-insn.h b/arch/riscv/kernel/probes/decode-insn.h
index 42269a7d676d..785b023a62ea 100644
--- a/arch/riscv/kernel/probes/decode-insn.h
+++ b/arch/riscv/kernel/probes/decode-insn.h
@@ -3,6 +3,7 @@
#ifndef _RISCV_KERNEL_KPROBES_DECODE_INSN_H
#define _RISCV_KERNEL_KPROBES_DECODE_INSN_H
+#include <linux/bitops.h>
#include <asm/sections.h>
#include <asm/kprobes.h>
@@ -15,4 +16,151 @@ enum probe_insn {
enum probe_insn __kprobes
riscv_probe_decode_insn(probe_opcode_t *addr, struct arch_probe_insn *asi);
+#ifdef CONFIG_KPROBES
+
+static inline u16 rvi_rs1(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 15) & 0x1f);
+}
+
+static inline u16 rvi_rs2(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 20) & 0x1f);
+}
+
+static inline u16 rvi_rd(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 7) & 0x1f);
+}
+
+static inline s32 rvi_branch_imme(kprobe_opcode_t opcode)
+{
+ u32 imme = 0;
+
+ imme |= (((opcode >> 8) & 0xf) << 1) |
+ (((opcode >> 25) & 0x3f) << 5) |
+ (((opcode >> 7) & 0x1) << 11) |
+ (((opcode >> 31) & 0x1) << 12);
+
+ return sign_extend32(imme, 13);
+}
+
+static inline s32 rvi_jal_imme(kprobe_opcode_t opcode)
+{
+ u32 imme = 0;
+
+ imme |= (((opcode >> 21) & 0x3ff) << 1) |
+ (((opcode >> 20) & 0x1) << 11) |
+ (((opcode >> 12) & 0xff) << 12) |
+ (((opcode >> 31) & 0x1) << 20);
+
+ return sign_extend32(imme, 21);
+}
+
+#ifdef CONFIG_RISCV_ISA_C
+static inline u16 rvc_r_rs1(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 7) & 0x1f);
+}
+
+static inline u16 rvc_r_rs2(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 2) & 0x1f);
+}
+
+static inline u16 rvc_r_rd(kprobe_opcode_t opcode)
+{
+ return rvc_r_rs1(opcode);
+}
+
+static inline u16 rvc_i_rs1(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 7) & 0x1f);
+}
+
+static inline u16 rvc_i_rd(kprobe_opcode_t opcode)
+{
+ return rvc_i_rs1(opcode);
+}
+
+static inline u16 rvc_ss_rs2(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 2) & 0x1f);
+}
+
+static inline u16 rvc_l_rd(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 2) & 0x7);
+}
+
+static inline u16 rvc_l_rs(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 7) & 0x7);
+}
+
+static inline u16 rvc_s_rs2(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 2) & 0x7);
+}
+
+static inline u16 rvc_s_rs1(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 7) & 0x7);
+}
+
+static inline u16 rvc_a_rs2(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 2) & 0x7);
+}
+
+static inline u16 rvc_a_rs1(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 7) & 0x7);
+}
+
+static inline u16 rvc_a_rd(kprobe_opcode_t opcode)
+{
+ return rvc_a_rs1(opcode);
+}
+
+static inline u16 rvc_b_rd(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 7) & 0x7);
+}
+
+static inline u16 rvc_b_rs(kprobe_opcode_t opcode)
+{
+ return rvc_b_rd(opcode);
+}
+
+static inline s32 rvc_branch_imme(kprobe_opcode_t opcode)
+{
+ u32 imme = 0;
+
+ imme |= (((opcode >> 3) & 0x3) << 1) |
+ (((opcode >> 10) & 0x3) << 3) |
+ (((opcode >> 2) & 0x1) << 5) |
+ (((opcode >> 5) & 0x3) << 6) |
+ (((opcode >> 12) & 0x1) << 8);
+
+ return sign_extend32(imme, 9);
+}
+
+static inline s32 rvc_jal_imme(kprobe_opcode_t opcode)
+{
+ u32 imme = 0;
+
+ imme |= (((opcode >> 3) & 0x3) << 1) |
+ (((opcode >> 11) & 0x1) << 4) |
+ (((opcode >> 2) & 0x1) << 5) |
+ (((opcode >> 7) & 0x1) << 6) |
+ (((opcode >> 6) & 0x1) << 7) |
+ (((opcode >> 9) & 0x3) << 8) |
+ (((opcode >> 8) & 0x1) << 10) |
+ (((opcode >> 12) & 0x1) << 11);
+
+ return sign_extend32(imme, 12);
+}
+#endif /* CONFIG_RISCV_ISA_C */
+#endif /* CONFIG_KPROBES */
#endif /* _RISCV_KERNEL_KPROBES_DECODE_INSN_H */
diff --git a/arch/riscv/kernel/probes/simulate-insn.h b/arch/riscv/kernel/probes/simulate-insn.h
index cb6ff7dccb92..74d8c1ba9064 100644
--- a/arch/riscv/kernel/probes/simulate-insn.h
+++ b/arch/riscv/kernel/probes/simulate-insn.h
@@ -37,6 +37,40 @@ __RISCV_INSN_FUNCS(c_jalr, 0xf007, 0x9002);
__RISCV_INSN_FUNCS(c_beqz, 0xe003, 0xc001);
__RISCV_INSN_FUNCS(c_bnez, 0xe003, 0xe001);
__RISCV_INSN_FUNCS(c_ebreak, 0xffff, 0x9002);
+/* RVC(S) instructions contain rs1 and rs2 */
+__RISCV_INSN_FUNCS(c_sq, 0xe003, 0xa000);
+__RISCV_INSN_FUNCS(c_sw, 0xe003, 0xc000);
+__RISCV_INSN_FUNCS(c_sd, 0xe003, 0xe000);
+/* RVC(A) instructions contain rs1 and rs2 */
+__RISCV_INSN_FUNCS(c_sub, 0xfc03, 0x8c01);
+__RISCV_INSN_FUNCS(c_subw, 0xfc43, 0x9c01);
+/* RVC(L) instructions contain rs1 */
+__RISCV_INSN_FUNCS(c_lq, 0xe003, 0x2000);
+__RISCV_INSN_FUNCS(c_lw, 0xe003, 0x4000);
+__RISCV_INSN_FUNCS(c_ld, 0xe003, 0x6000);
+/* RVC(I) instructions contain rs1 */
+__RISCV_INSN_FUNCS(c_addi, 0xe003, 0x0001);
+__RISCV_INSN_FUNCS(c_addiw, 0xe003, 0x2001);
+__RISCV_INSN_FUNCS(c_addi16sp, 0xe183, 0x6101);
+__RISCV_INSN_FUNCS(c_slli, 0xe003, 0x0002);
+/* RVC(B) instructions contain rs1 */
+__RISCV_INSN_FUNCS(c_sri, 0xe803, 0x8001);
+__RISCV_INSN_FUNCS(c_andi, 0xec03, 0x8801);
+/* RVC(SS) instructions contain rs2 */
+__RISCV_INSN_FUNCS(c_sqsp, 0xe003, 0xa002);
+__RISCV_INSN_FUNCS(c_swsp, 0xe003, 0xc002);
+__RISCV_INSN_FUNCS(c_sdsp, 0xe003, 0xe002);
+/* RVC(R) instructions contain rs2 and rd */
+__RISCV_INSN_FUNCS(c_mv, 0xe003, 0x8002);
+/* RVC(I) instructions contain sp and rd */
+__RISCV_INSN_FUNCS(c_lqsp, 0xe003, 0x2002);
+__RISCV_INSN_FUNCS(c_lwsp, 0xe003, 0x4002);
+__RISCV_INSN_FUNCS(c_ldsp, 0xe003, 0x6002);
+/* RVC(CW) instructions contain sp and rd */
+__RISCV_INSN_FUNCS(c_addi4spn, 0xe003, 0x0000);
+/* RVC(I) instructions contain rd */
+__RISCV_INSN_FUNCS(c_li, 0xe003, 0x4001);
+__RISCV_INSN_FUNCS(c_lui, 0xe003, 0x6001);
__RISCV_INSN_FUNCS(auipc, 0x7f, 0x17);
__RISCV_INSN_FUNCS(branch, 0x7f, 0x63);
@@ -44,4 +78,11 @@ __RISCV_INSN_FUNCS(branch, 0x7f, 0x63);
__RISCV_INSN_FUNCS(jal, 0x7f, 0x6f);
__RISCV_INSN_FUNCS(jalr, 0x707f, 0x67);
+__RISCV_INSN_FUNCS(arith_rr, 0x77, 0x33);
+__RISCV_INSN_FUNCS(arith_ri, 0x77, 0x13);
+__RISCV_INSN_FUNCS(lui, 0x7f, 0x37);
+__RISCV_INSN_FUNCS(load, 0x7f, 0x03);
+__RISCV_INSN_FUNCS(store, 0x7f, 0x23);
+__RISCV_INSN_FUNCS(amo, 0x7f, 0x2f);
+
#endif /* _RISCV_KERNEL_PROBES_SIMULATE_INSN_H */
--
2.34.1
From: Liao Chang <[email protected]>
This patch adds code to check whether a kprobe can be optimized. A
regular kprobe replaces a single instruction with EBREAK or C.EBREAK;
it just requires that the instrumented instruction supports execution
out-of-line or simulation. An optimized kprobe patches an AUIPC/JALR
pair to do a long jump, which makes everything more complicated,
especially for a kernel that is a hybrid RVI and RVC binary: although
AUIPC/JALR just needs 8 bytes of space, the bytes to patch are 10
bytes long in the worst case to ensure no RVI would be truncated. So
there are four ways to patch an optimized kprobe:
- Replace 2 RVI with AUIPC/JALR.
- Replace 4 RVC with AUIPC/JALR.
- Replace 2 RVC and 1 RVI with AUIPC/JALR.
- Replace 3 RVC and 1 RVI with AUIPC/JALR, and patch C.NOP into the
last two bytes for alignment.
So it has to find an instruction window large enough to patch
AUIPC/JALR starting from the instrumented breakpoint address, and
meanwhile ensure that no instruction can jump into the range of the
patched window, as the worked example below illustrates.
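For example, when the probed instruction is a 2-byte RVC followed by
two more RVC and one RVI instruction, the numbers work out as below
(hand-worked, assuming every instruction in the window supports
execution out-of-line):

  replaced window: 2 (RVC) + 2 (RVC) + 2 (RVC) + 4 (RVI) = 10 bytes
  patched bytes:   4 (AUIPC) + 4 (JALR) + 2 (C.NOP pad)  = 10 bytes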
Signed-off-by: Liao Chang <[email protected]>
Co-developed-by: Chen Guokai <[email protected]>
Signed-off-by: Chen Guokai <[email protected]>
---
arch/riscv/kernel/probes/opt.c | 98 ++++++++++++++++++++++++++++++++--
1 file changed, 93 insertions(+), 5 deletions(-)
diff --git a/arch/riscv/kernel/probes/opt.c b/arch/riscv/kernel/probes/opt.c
index a0d2ab39e3fa..258a283c906d 100644
--- a/arch/riscv/kernel/probes/opt.c
+++ b/arch/riscv/kernel/probes/opt.c
@@ -271,15 +271,103 @@ static void find_free_registers(struct kprobe *kp, struct optimized_kprobe *op,
*ra = (kw == 1UL) ? 0 : __builtin_ctzl(kw & ~1UL);
}
+static bool insn_jump_into_range(unsigned long addr, unsigned long start,
+ unsigned long end)
+{
+ kprobe_opcode_t insn = *(kprobe_opcode_t *)addr;
+ unsigned long target, offset = GET_INSN_LENGTH(insn);
+
+#ifdef CONFIG_RISCV_ISA_C
+ if (offset == RVC_INSN_LEN) {
+ if (riscv_insn_is_c_beqz(insn) || riscv_insn_is_c_bnez(insn))
+ target = addr + rvc_branch_imme(insn);
+ else if (riscv_insn_is_c_jal(insn) || riscv_insn_is_c_j(insn))
+ target = addr + rvc_jal_imme(insn);
+ else
+ target = 0;
+ return (target >= start) && (target < end);
+ }
+#endif
+
+ if (riscv_insn_is_branch(insn))
+ target = addr + rvi_branch_imme(insn);
+ else if (riscv_insn_is_jal(insn))
+ target = addr + rvi_jal_imme(insn);
+ else
+ target = 0;
+ return (target >= start) && (target < end);
+}
+
+static int search_copied_insn(unsigned long paddr, struct optimized_kprobe *op)
+{
+ int i = 1;
+ unsigned long offset = GET_INSN_LENGTH(*(kprobe_opcode_t *)paddr);
+
+ while ((i++ < MAX_COPIED_INSN) && (offset < 2 * RVI_INSN_LEN)) {
+ if (riscv_probe_decode_insn((probe_opcode_t *)(paddr + offset),
+ NULL) != INSN_GOOD)
+ return -1;
+ offset += GET_INSN_LENGTH(*(kprobe_opcode_t *)(paddr + offset));
+ }
+
+ op->optinsn.length = offset;
+ return 0;
+}
+
/*
- * If two free registers can be found at both the start and the end
- * of the replaced code, it can be optimized. Also, in-function jumps
- * need to be checked to make sure that there is no jump to the second
- * instruction to be replaced.
+ * The kprobe can be optimized when no in-function jump reaches the
+ * instructions replaced by the optimized jump instructions (AUIPC/JALR).
*/
static bool can_optimize(unsigned long paddr, struct optimized_kprobe *op)
{
- return false;
+ int ret;
+ unsigned long addr, size = 0, offset = 0;
+ struct kprobe *kp = get_kprobe((kprobe_opcode_t *)paddr);
+
+ /*
+ * Skip optimization if the kprobe has been disarmed or the
+ * instrumented instruction doesn't support execution out-of-line.
+ */
+ if (!kp || (riscv_probe_decode_insn(&kp->opcode, NULL) != INSN_GOOD))
+ return false;
+
+ /*
+ * Find an instruction window large enough to contain a pair
+ * of AUIPC/JALR, and ensure each instruction in this window
+ * supports XOI.
+ */
+ ret = search_copied_insn(paddr, op);
+ if (ret)
+ return false;
+
+ if (!kallsyms_lookup_size_offset(paddr, &size, &offset))
+ return false;
+
+ /* Check there is enough space for relative jump(AUIPC/JALR) */
+ if (size - offset <= op->optinsn.length)
+ return false;
+
+ /*
+ * Decode instructions until the function end, checking that no
+ * instruction jumps into the window used to emit the optprobe
+ * (AUIPC/JALR).
+ */
+ addr = paddr - offset;
+ while (addr < paddr) {
+ if (insn_jump_into_range(addr, paddr + RVC_INSN_LEN,
+ paddr + op->optinsn.length))
+ return false;
+ addr += GET_INSN_LENGTH(*(kprobe_opcode_t *)addr);
+ }
+
+ addr = paddr + op->optinsn.length;
+ while (addr < paddr - offset + size) {
+ if (insn_jump_into_range(addr, paddr + RVC_INSN_LEN,
+ paddr + op->optinsn.length))
+ return false;
+ addr += GET_INSN_LENGTH(*(kprobe_opcode_t *)addr);
+ }
+
+ return true;
}
int arch_prepared_optinsn(struct arch_optimized_insn *optinsn)
--
2.34.1
From: Liao Chang <[email protected]>
This patch provides a skeleton for preparing the optimized kprobe
instruction slot. It consists of two major parts: the first part
checks whether the current kprobe satisfies the requirements for
optimization. A kprobe based on a breakpoint just requires that the
instrumented instruction supports execution out-of-line or simulation;
an optimized kprobe based on a long jump has more requirements:
- The target of the long jump is within the range of 'AUIPC/JALR'.
- No nearby instruction jumps to any instruction replaced by
'AUIPC/JALR'.
- One free register can be found to form the 'AUIPC/JALR' jumping to
the detour buffer.
- One free register can be found to form the 'JR' jumping back from
the detour buffer.
The second part allocates a larger instruction slot for each optimized
kprobe, the payload of which is patched with the assembly code defined
in opt_trampoline.S, a call to the kprobe pre_handler, and the
instructions replaced by 'AUIPC/JALR'.
Signed-off-by: Liao Chang <[email protected]>
Co-developed-by: Chen Guokai <[email protected]>
Signed-off-by: Chen Guokai <[email protected]>
---
arch/riscv/kernel/probes/opt.c | 107 ++++++++++++++++++++++++++++++++-
1 file changed, 106 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/kernel/probes/opt.c b/arch/riscv/kernel/probes/opt.c
index 56c8a227c857..a4271e6033ba 100644
--- a/arch/riscv/kernel/probes/opt.c
+++ b/arch/riscv/kernel/probes/opt.c
@@ -10,6 +10,54 @@
#include <linux/kprobes.h>
#include <asm/kprobes.h>
+#include <asm/patch.h>
+
+static inline int in_auipc_jalr_range(long val)
+{
+#ifdef CONFIG_ARCH_RV32I
+ return 1;
+#else
+ /*
+ * Note that the set of address offsets that can be formed
+ * by pairing LUI with LD, AUIPC with JALR, etc. in RV64I is
+ * [−2^31−2^11, 2^31−2^11−1].
+ */
+ return ((-(1L << 31) - (1L << 11)) <= val) &&
+ (val < ((1L << 31) - (1L << 11)));
+#endif
+}
+
+/*
+ * Copy optprobe assembly code template into detour buffer and modify some
+ * instructions for each kprobe.
+ */
+static void prepare_detour_buffer(kprobe_opcode_t *code, kprobe_opcode_t *slot,
+ int rd, struct optimized_kprobe *op,
+ kprobe_opcode_t opcode)
+{
+}
+
+/*
+ * In the RISC-V ISA, AUIPC/JALR clobber one register to form the target
+ * address. Inspired by register renaming in OoO processors, this searches
+ * for a register that is not previously used as a source register but is
+ * used as a destination register before any branch or jump instruction.
+ */
+static void find_free_registers(struct kprobe *kp, struct optimized_kprobe *op,
+ int *rd, int *ra)
+{
+}
+
+/*
+ * If two free registers can be found at both the start and the end
+ * of the replaced code, it can be optimized. Also, in-function jumps
+ * need to be checked to make sure that there is no jump to the second
+ * instruction to be replaced.
+ */
+static bool can_optimize(unsigned long paddr, struct optimized_kprobe *op)
+{
+ return false;
+}
int arch_prepared_optinsn(struct arch_optimized_insn *optinsn)
{
@@ -24,7 +72,64 @@ int arch_check_optimized_kprobe(struct optimized_kprobe *op)
int arch_prepare_optimized_kprobe(struct optimized_kprobe *op,
struct kprobe *orig)
{
- return 0;
+ long rel;
+ int rd, ra, ret;
+ kprobe_opcode_t *code = NULL, *slot = NULL;
+
+ if (!can_optimize((unsigned long)orig->addr, op))
+ return -EILSEQ;
+
+ code = kzalloc(MAX_OPTINSN_SIZE, GFP_KERNEL);
+ slot = get_optinsn_slot();
+ if (!code || !slot) {
+ ret = -ENOMEM;
+ goto on_error;
+ }
+
+ /*
+ * Verify that the address gap is within the 4GB range, because an
+ * auipc+jalr pair is used.
+ */
+ rel = (unsigned long)slot - (unsigned long)orig->addr;
+ if (!in_auipc_jalr_range(rel)) {
+ /*
+ * Different from x86, we free the code buffer directly instead of
+ * calling __arch_remove_optimized_kprobe() because we have not
+ * filled in any field of op yet.
+ */
+ ret = -ERANGE;
+ goto on_error;
+ }
+
+ /*
+ * Search for two free registers: rd is used to form the AUIPC/JALR
+ * jumping to the detour buffer, ra is used to form the JR jumping
+ * back from the detour buffer.
+ */
+ find_free_registers(orig, op, &rd, &ra);
+ if (rd == 0 || ra == 0) {
+ ret = -EILSEQ;
+ goto on_error;
+ }
+
+ op->optinsn.rd = rd;
+ prepare_detour_buffer(code, slot, ra, op, orig->opcode);
+
+ ret = patch_text_nosync((void *)slot, code, MAX_OPTINSN_SIZE);
+ if (!ret) {
+ op->optinsn.insn = slot;
+ kfree(code);
+ return 0;
+ }
+
+on_error:
+ if (slot) {
+ free_optinsn_slot(slot, 0);
+ op->optinsn.insn = NULL;
+ op->optinsn.length = 0;
+ }
+ kfree(code);
+ return ret;
}
void arch_remove_optimized_kprobe(struct optimized_kprobe *op)
--
2.34.1
From: Liao Chang <[email protected]>
This patch optimizes 'EBREAK' into 'AUIPC/JALR' and introduces a new
patching function to modify multiple instructions, sketched below.
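A minimal usage sketch (hypothetical offsets and register, mirroring
how arch_optimize_kprobes() below emits the pair; rv_auipc()/rv_jalr()/
rvc_addi() come from bpf_jit.h):

  kprobe_opcode_t insn[3];
  long offs = (long)detour_buf - (long)addr;   /* hypothetical values */

  insn[0] = rv_auipc(rd, (offs + (1 << 11)) >> 12);
  insn[1] = rv_jalr(rd, rd, offs & 0xfff);
  insn[2] = rvc_addi(0, 0);             /* C.NOP, only in 10-byte case */
  patch_text_batch(addr, insn, length); /* stop_machine() on all CPUs  */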
Signed-off-by: Liao Chang <[email protected]>
Co-developed-by: Chen Guokai <[email protected]>
Signed-off-by: Chen Guokai <[email protected]>
---
arch/riscv/include/asm/patch.h | 1 +
arch/riscv/kernel/patch.c | 23 ++++++++++---
arch/riscv/kernel/probes/opt.c | 63 ++++++++++++++++++++++++++++++++--
3 files changed, 81 insertions(+), 6 deletions(-)
diff --git a/arch/riscv/include/asm/patch.h b/arch/riscv/include/asm/patch.h
index 9a7d7346001e..ee31539de65f 100644
--- a/arch/riscv/include/asm/patch.h
+++ b/arch/riscv/include/asm/patch.h
@@ -8,5 +8,6 @@
int patch_text_nosync(void *addr, const void *insns, size_t len);
int patch_text(void *addr, u32 insn);
+int patch_text_batch(void *addr, const void *insn, size_t size);
#endif /* _ASM_RISCV_PATCH_H */
diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
index 765004b60513..ce324b6a6998 100644
--- a/arch/riscv/kernel/patch.c
+++ b/arch/riscv/kernel/patch.c
@@ -15,7 +15,8 @@
struct patch_insn {
void *addr;
- u32 insn;
+ const void *insn;
+ size_t size;
atomic_t cpu_count;
};
@@ -106,8 +107,7 @@ static int patch_text_cb(void *data)
if (atomic_inc_return(&patch->cpu_count) == num_online_cpus()) {
ret =
- patch_text_nosync(patch->addr, &patch->insn,
- GET_INSN_LENGTH(patch->insn));
+ patch_text_nosync(patch->addr, patch->insn, patch->size);
atomic_inc(&patch->cpu_count);
} else {
while (atomic_read(&patch->cpu_count) <= num_online_cpus())
@@ -123,7 +123,8 @@ int patch_text(void *addr, u32 insn)
{
struct patch_insn patch = {
.addr = addr,
- .insn = insn,
+ .insn = &insn,
+ .size = GET_INSN_LENGTH(insn),
.cpu_count = ATOMIC_INIT(0),
};
@@ -131,3 +132,17 @@ int patch_text(void *addr, u32 insn)
&patch, cpu_online_mask);
}
NOKPROBE_SYMBOL(patch_text);
+
+int patch_text_batch(void *addr, const void *insn, size_t size)
+{
+ struct patch_insn patch = {
+ .addr = addr,
+ .insn = insn,
+ .size = size,
+ .cpu_count = ATOMIC_INIT(0),
+ };
+
+ return stop_machine_cpuslocked(patch_text_cb, &patch, cpu_online_mask);
+}
+
+NOKPROBE_SYMBOL(patch_text_batch);
diff --git a/arch/riscv/kernel/probes/opt.c b/arch/riscv/kernel/probes/opt.c
index bc232fce5b39..1c0e9d218f6f 100644
--- a/arch/riscv/kernel/probes/opt.c
+++ b/arch/riscv/kernel/probes/opt.c
@@ -448,11 +448,19 @@ static bool can_optimize(unsigned long paddr, struct optimized_kprobe *op)
int arch_prepared_optinsn(struct arch_optimized_insn *optinsn)
{
- return 0;
+ return optinsn->length;
}
int arch_check_optimized_kprobe(struct optimized_kprobe *op)
{
+ unsigned long i;
+ struct kprobe *p;
+
+ for (i = RVC_INSN_LEN; i < op->optinsn.length; i += RVC_INSN_LEN) {
+ p = get_kprobe(op->kp.addr + i);
+ if (p && !kprobe_disabled(p))
+ return -EEXIST;
+ }
return 0;
}
@@ -521,23 +529,74 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op,
void arch_remove_optimized_kprobe(struct optimized_kprobe *op)
{
+ if (op->optinsn.insn) {
+ free_optinsn_slot(op->optinsn.insn, 1);
+ op->optinsn.insn = NULL;
+ op->optinsn.length = 0;
+ }
}
void arch_optimize_kprobes(struct list_head *oplist)
{
+ long offs;
+ kprobe_opcode_t insn[3];
+ struct optimized_kprobe *op, *tmp;
+
+ list_for_each_entry_safe(op, tmp, oplist, list) {
+ WARN_ON(kprobe_disabled(&op->kp));
+
+ /* Back up the instructions that will be replaced by the long jump */
+ memcpy(op->optinsn.copied_insn,
+ DETOUR_ADDR(op->kp.addr, GET_INSN_LENGTH(op->kp.opcode)),
+ op->optinsn.length - GET_INSN_LENGTH(op->kp.opcode));
+
+ /*
+ * After patch, it should be:
+ * auipc free_register, %hi(detour_buffer)
+ * jalr free_register, free_register, %lo(detour_buffer)
+ * where free_register will eventually save the return address
+ */
+ offs = (unsigned long)op->optinsn.insn -
+ (unsigned long)op->kp.addr;
+ insn[0] = rv_auipc(op->optinsn.rd, (offs + (1 << 11)) >> 12);
+ insn[1] = rv_jalr(op->optinsn.rd, op->optinsn.rd, offs & 0xFFF);
+ /* For 3 RVC + 1 RVI scenario, need C.NOP for padding */
+ if (op->optinsn.length > 2 * RVI_INSN_LEN)
+ insn[2] = rvc_addi(0, 0);
+
+ patch_text_batch(op->kp.addr, insn, op->optinsn.length);
+ if (memcmp(op->kp.addr, insn, op->optinsn.length))
+ continue;
+
+ list_del_init(&op->list);
+ }
}
void arch_unoptimize_kprobes(struct list_head *oplist,
struct list_head *done_list)
{
+ struct optimized_kprobe *op, *tmp;
+
+ list_for_each_entry_safe(op, tmp, oplist, list) {
+ arch_unoptimize_kprobe(op);
+ list_move(&op->list, done_list);
+ }
}
void arch_unoptimize_kprobe(struct optimized_kprobe *op)
{
+ kprobe_opcode_t buf[MAX_COPIED_INSN];
+ unsigned long offset = GET_INSN_LENGTH(op->kp.opcode);
+
+ buf[0] = (offset == RVI_INSN_LEN) ? __BUG_INSN_32 : __BUG_INSN_16;
+ memcpy(DETOUR_ADDR(buf, offset), op->optinsn.copied_insn,
+ op->optinsn.length - offset);
+ patch_text_batch(op->kp.addr, buf, op->optinsn.length);
}
int arch_within_optimized_kprobe(struct optimized_kprobe *op,
kprobe_opcode_t *addr)
{
- return 0;
+ return (op->kp.addr <= addr &&
+ op->kp.addr + op->optinsn.length > addr);
}
--
2.34.1
From: Liao Chang <[email protected]>
To address the limitation of PC-relative branch instructions on the
riscv architecture, the detour buffer slot used for optprobes is
allocated from a region whose distance from the kernel is less than
4GB. For the time being, the modules region always lives before the
kernel, but the vmalloc region resides far from the kernel: the
distance is half of the kernel address space (see
Documentation/riscv/vm-layout.rst). Hence alloc_optinsn_page() is
overridden to make sure the detour buffer is allocated from a
jump-safe region.
Signed-off-by: Liao Chang <[email protected]>
Co-developed-by: Chen Guokai <[email protected]>
Signed-off-by: Chen Guokai <[email protected]>
---
arch/riscv/kernel/probes/kprobes.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/arch/riscv/kernel/probes/kprobes.c b/arch/riscv/kernel/probes/kprobes.c
index f21592d20306..e1856b04db04 100644
--- a/arch/riscv/kernel/probes/kprobes.c
+++ b/arch/riscv/kernel/probes/kprobes.c
@@ -6,6 +6,7 @@
#include <linux/extable.h>
#include <linux/slab.h>
#include <linux/stop_machine.h>
+#include <linux/set_memory.h>
#include <asm/ptrace.h>
#include <linux/uaccess.h>
#include <asm/sections.h>
@@ -84,6 +85,29 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p)
}
#ifdef CONFIG_MMU
+#if defined(CONFIG_OPTPROBES) && defined(CONFIG_64BIT)
+void *alloc_optinsn_page(void)
+{
+ void *page;
+
+ page = __vmalloc_node_range(PAGE_SIZE, 1, MODULES_VADDR,
+ MODULES_END, GFP_KERNEL,
+ PAGE_KERNEL, 0, NUMA_NO_NODE,
+ __builtin_return_address(0));
+ if (!page)
+ return NULL;
+
+ set_vm_flush_reset_perms(page);
+ /*
+ * First make the page read-only, and only then make it executable to
+ * prevent it from being W+X in between.
+ */
+ set_memory_rox((unsigned long)page, 1);
+
+ return page;
+}
+#endif
+
void *alloc_insn_page(void)
{
return __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START, VMALLOC_END,
--
2.34.1
This patch further allows optprobes to use, as free registers,
caller-saved registers that are not used across the function being
optimized.
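For reference, the NON_CALLER_SAVED_MASK value added below can be
derived by setting one bit per register that is not caller-saved in
the standard RISC-V calling convention (a hand derivation, not code
from this patch):

  x0 (zero), x2 (sp), x3 (gp), x4 (tp), x8/x9 (s0/s1), x18-x27 (s2-s11):
  (1<<0)|(1<<2)|(1<<3)|(1<<4)|(1<<8)|(1<<9)|(0x3ff<<18) = 0xffc031d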
Signed-off-by: Chen Guokai <[email protected]>
Co-developed-by: Liao Chang <[email protected]>
Signed-off-by: Liao Chang <[email protected]>
Reported-by: Björn Töpel <[email protected]>
---
arch/riscv/include/asm/kprobes.h | 1 +
arch/riscv/kernel/probes/decode-insn.h | 5 +
arch/riscv/kernel/probes/opt.c | 121 ++++++++++++++++++++++---
3 files changed, 112 insertions(+), 15 deletions(-)
diff --git a/arch/riscv/include/asm/kprobes.h b/arch/riscv/include/asm/kprobes.h
index e40c837d0a1d..7fecec799077 100644
--- a/arch/riscv/include/asm/kprobes.h
+++ b/arch/riscv/include/asm/kprobes.h
@@ -86,6 +86,7 @@ struct arch_optimized_insn {
kprobe_opcode_t *insn;
unsigned long length;
int rd;
+ u32 free_reg;
};
#endif /* CONFIG_OPTPROBES */
diff --git a/arch/riscv/kernel/probes/decode-insn.h b/arch/riscv/kernel/probes/decode-insn.h
index 785b023a62ea..907b951f2c86 100644
--- a/arch/riscv/kernel/probes/decode-insn.h
+++ b/arch/riscv/kernel/probes/decode-insn.h
@@ -13,6 +13,11 @@ enum probe_insn {
INSN_GOOD,
};
+#define NRREG 32
+#define ALL_REG_OCCUPIED 0xffffffffu
+/* If a register is not caller-saved, its corresponding bit is set */
+#define NON_CALLER_SAVED_MASK 0xffc031d
+
enum probe_insn __kprobes
riscv_probe_decode_insn(probe_opcode_t *addr, struct arch_probe_insn *asi);
diff --git a/arch/riscv/kernel/probes/opt.c b/arch/riscv/kernel/probes/opt.c
index 1c0e9d218f6f..884e77d2df4c 100644
--- a/arch/riscv/kernel/probes/opt.c
+++ b/arch/riscv/kernel/probes/opt.c
@@ -12,6 +12,7 @@
#include <asm/kprobes.h>
#include <asm/patch.h>
#include <asm/asm-offsets.h>
+#include <linux/extable.h>
#include "simulate-insn.h"
#include "decode-insn.h"
@@ -130,7 +131,7 @@ static void prepare_detour_buffer(kprobe_opcode_t *code, kprobe_opcode_t *slot,
* as a destination register before any branch or jump instruction.
*/
static void find_register(unsigned long start, unsigned long end,
- unsigned long *write, unsigned long *read)
+ unsigned long *write, unsigned long *read)
{
kprobe_opcode_t insn;
unsigned long addr, offset = 0UL;
@@ -390,16 +391,99 @@ static int search_copied_insn(unsigned long paddr, struct optimized_kprobe *op)
return 0;
}
+static void update_free_reg(unsigned long addr, uint32_t *used_reg)
+{
+ kprobe_opcode_t insn = *(kprobe_opcode_t *)addr;
+ unsigned long offset = GET_INSN_LENGTH(insn);
+
+#ifdef CONFIG_RISCV_ISA_C
+ if (offset == RVI_INSN_LEN)
+ goto is_rvi;
+
+ insn &= __COMPRESSED_INSN_MASK;
+ if (riscv_insn_is_c_jal(insn)) {
+ *used_reg |= 1 << 1;
+ } else if (riscv_insn_is_c_jr(insn)) {
+ *used_reg |= 1 << rvc_r_rs1(insn);
+ } else if (riscv_insn_is_c_jalr(insn)) {
+ *used_reg |= 1 << rvc_r_rs1(insn);
+ } else if (riscv_insn_is_c_beqz(insn) || riscv_insn_is_c_bnez(insn)) {
+ *used_reg |= 1 << rvc_b_rs(insn);
+ } else if (riscv_insn_is_c_sub(insn) || riscv_insn_is_c_subw(insn)) {
+ *used_reg |= 1 << rvc_a_rs1(insn);
+ *used_reg |= 1 << rvc_a_rs2(insn);
+ } else if (riscv_insn_is_c_sq(insn) || riscv_insn_is_c_sw(insn) ||
+ riscv_insn_is_c_sd(insn)) {
+ *used_reg |= 1 << rvc_s_rs1(insn);
+ *used_reg |= 1 << rvc_s_rs2(insn);
+ } else if (riscv_insn_is_c_addi16sp(insn) || riscv_insn_is_c_addi(insn) ||
+ riscv_insn_is_c_addiw(insn) ||
+ riscv_insn_is_c_slli(insn)) {
+ *used_reg |= 1 << rvc_i_rs1(insn);
+ } else if (riscv_insn_is_c_sri(insn) ||
+ riscv_insn_is_c_andi(insn)) {
+ *used_reg |= 1 << rvc_b_rs(insn);
+ } else if (riscv_insn_is_c_sqsp(insn) || riscv_insn_is_c_swsp(insn) ||
+ riscv_insn_is_c_sdsp(insn)) {
+ *used_reg |= 1 << rvc_ss_rs2(insn);
+ *used_reg |= 1 << 2;
+ } else if (riscv_insn_is_c_mv(insn)) {
+ *used_reg |= 1 << rvc_r_rs2(insn);
+ } else if (riscv_insn_is_c_addi4spn(insn)) {
+ *used_reg |= 1 << 2;
+ } else if (riscv_insn_is_c_lq(insn) || riscv_insn_is_c_lw(insn) ||
+ riscv_insn_is_c_ld(insn)) {
+ *used_reg |= 1 << rvc_l_rs(insn);
+ } else if (riscv_insn_is_c_lqsp(insn) || riscv_insn_is_c_lwsp(insn) ||
+ riscv_insn_is_c_ldsp(insn)) {
+ *used_reg |= 1 << 2;
+ }
+ /* li and lui do not have a source register */
+ return;
+is_rvi:
+#endif
+ if (riscv_insn_is_arith_ri(insn) || riscv_insn_is_load(insn)) {
+ *used_reg |= 1 << rvi_rs1(insn);
+ } else if (riscv_insn_is_arith_rr(insn) || riscv_insn_is_store(insn) ||
+ riscv_insn_is_amo(insn)) {
+ *used_reg |= 1 << rvi_rs1(insn);
+ *used_reg |= 1 << rvi_rs2(insn);
+ } else if (riscv_insn_is_branch(insn)) {
+ *used_reg |= 1 << rvi_rs1(insn);
+ *used_reg |= 1 << rvi_rs2(insn);
+ } else if (riscv_insn_is_jalr(insn)) {
+ *used_reg |= 1 << rvi_rs1(insn);
+ }
+}
+
+static bool scan_code(unsigned long *addr, unsigned long paddr,
+ struct optimized_kprobe *op, uint32_t *used_reg)
+{
+ if (insn_jump_into_range(*addr, paddr + RVC_INSN_LEN,
+ paddr + op->optinsn.length))
+ return false;
+ if (search_exception_tables(*addr))
+ return false;
+ update_free_reg(*addr, used_reg);
+ *addr += GET_INSN_LENGTH(*(kprobe_opcode_t *)addr);
+ return true;
+}
+
/*
* The kprobe can be optimized when no in-function jump reaches to the
* instructions replaced by optimized jump instructions(AUIPC/JALR).
*/
-static bool can_optimize(unsigned long paddr, struct optimized_kprobe *op)
+static bool can_optimize(unsigned long paddr, struct optimized_kprobe *op, uint32_t *used_reg)
{
int ret;
unsigned long addr, size = 0, offset = 0;
struct kprobe *kp = get_kprobe((kprobe_opcode_t *)paddr);
+ /*
+ * Start with all non-caller-saved registers marked as occupied.
+ */
+ *used_reg = NON_CALLER_SAVED_MASK;
+
/*
* Skip optimization if kprobe has been disarmed or instrumented
* instruction support XOI.
@@ -429,18 +513,14 @@ static bool can_optimize(unsigned long paddr, struct optimized_kprobe *op)
*/
addr = paddr - offset;
while (addr < paddr) {
- if (insn_jump_into_range(addr, paddr + RVC_INSN_LEN,
- paddr + op->optinsn.length))
+ if (!scan_code(&addr, paddr, op, used_reg))
return false;
- addr += GET_INSN_LENGTH(*(kprobe_opcode_t *)addr);
}
-
- addr = paddr + op->optinsn.length;
+ update_free_reg((unsigned long)&kp->opcode, used_reg);
+ addr = paddr + GET_INSN_LENGTH(*(kprobe_opcode_t *)&kp->opcode);
while (addr < paddr - offset + size) {
- if (insn_jump_into_range(addr, paddr + RVC_INSN_LEN,
- paddr + op->optinsn.length))
+ if (!scan_code(&addr, paddr, op, used_reg))
return false;
- addr += GET_INSN_LENGTH(*(kprobe_opcode_t *)addr);
}
return true;
@@ -469,10 +549,13 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op,
{
long rel;
int rd, ra, ret;
+ u32 used_reg;
kprobe_opcode_t *code = NULL, *slot = NULL;
- if (!can_optimize((unsigned long)orig->addr, op))
+ if (!can_optimize((unsigned long)orig->addr, op, &used_reg)) {
+ op->optinsn.rd = -1;
return -EILSEQ;
+ }
code = kzalloc(MAX_OPTINSN_SIZE, GFP_KERNEL);
slot = get_optinsn_slot();
@@ -497,11 +580,17 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op,
}
/*
- * Search two free registers, rd is used as to form AUIPC/JALR jumping
- * to detour buffer, ra is used as to form JR jumping back from detour
- * buffer.
+ * Search two free registers if there are no unused ones: rd is used to
+ * form the AUIPC/JALR jump to the detour buffer, ra is used to form the
+ * JR jump back from the detour buffer.
*/
- find_free_registers(orig, op, &rd, &ra);
+ if (used_reg == ALL_REG_OCCUPIED) {
+ find_free_registers(orig, op, &rd, &ra);
+ } else {
+ rd = ffz(used_reg);
+ ra = rd;
+ }
+
if (rd == 0 || ra == 0) {
ret = -EILSEQ;
goto on_error;
@@ -545,6 +634,8 @@ void arch_optimize_kprobes(struct list_head *oplist)
list_for_each_entry_safe(op, tmp, oplist, list) {
WARN_ON(kprobe_disabled(&op->kp));
+ if (op->optinsn.rd < 0)
+ continue;
/* Backup instructions which will be replaced by jump address */
memcpy(op->optinsn.copied_insn,
DETOUR_ADDR(op->kp.addr, GET_INSN_LENGTH(op->kp.opcode)),
--
2.34.1
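A note on the constants in the patch above: NON_CALLER_SAVED_MASK pre-sets the
bit of every register that must not be clobbered freely, and ffz() then picks
the lowest register whose bit is still clear. A standalone sketch, plain C
rather than kernel code, illustrative only:

#include <stdio.h>

/* Per the RISC-V psABI, zero(x0), sp(x2), gp(x3), tp(x4), s0(x8), s1(x9)
 * and s2-s11(x18-x27) are not caller-saved, so their bits start out set. */
#define NON_CALLER_SAVED_MASK \
	((1u << 0) | (1u << 2) | (1u << 3) | (1u << 4) | \
	 (1u << 8) | (1u << 9) | (0x3ffu << 18))

int main(void)
{
	unsigned int used = NON_CALLER_SAVED_MASK;

	printf("mask = 0x%x\n", used);		/* 0xffc031d, as in the patch */
	/* ffz() equivalent: index of the lowest clear bit, i.e. the first
	 * register free to clobber (x1/ra when nothing else is read). */
	printf("free reg = x%d\n", __builtin_ctz(~used));
	return 0;
}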
Chen Guokai <[email protected]> writes:
> From: Liao Chang <[email protected]>
>
> Prepare skeleton to implement optimized kprobe on RISCV, it is consist
> of Makfile, Kconfig and some architecture specific files: kprobe.h and
> opt.c opt.c include some macro, type definition and functions required
> by kprobe framework, opt_trampoline.S provide a piece of assembly code
> template used to construct the detour buffer as the target of long jump
> instruction(s) for each optimzed kprobe.
This is pretty much just reiterating what the diff-stat says. Please try to
explain why a certain change is done, instead of what. The "what" is already
in the patch.
> Since the jump range of PC-relative instruction JAL is +/-2M, that is
> too small to reach the detour buffer, hence the foudamental idea to
> address OPTPROBES on RISCV is replace 'EBREAK' with 'AUIPC/JALR'. which
> means it needs to clobber one more instruction beside the kprobe
> instruction, furthermore, RISCV supports hybird RVI and RVC in single
> kernel binary, so in theory a pair of 'AUIPC/JALR' is about to clobber
> 10 bytes(3 RVC and 1 RVI, 2 bytes is padding for alignment) at worst
> case. The second hardsome problem is looking for one integer register as
> the destination of 'AUIPC/JALR' without any side-effect.
There are a number of spelling errors, please use a spellchecker and if
you reference a file (e.g. Makefile), make sure it is correctly spelled
out.
The comments above apply to all the commit messages of this series.
Björn
Chen Guokai <[email protected]> writes:
> Add jump optimization support for RISC-V.
Thank you for continuing to work on the series! I took the series for a
spin, and ran into a number of issues that make me wonder how you test
the series, and how the testing is different from my runs.
I'll outline the general/big issues here, and leave the specifics per-patch.
I've done simple testing, using "Kprobe-based Event Tracing"
(CONFIG_KPROBE_EVENTS=y) via tracefs.
All the tests were run on commit 88603b6dc419 ("Linux 6.2-rc2") with the
series applied. All the bugs were triggered by setting different probes to
do_sys_openat2. Code:
do_sys_openat2:
...snip...
ffffffff802d138c: 89aa c.mv s3,a0 // +44
ffffffff802d138e: 892e c.mv s2,a1 // +46
ffffffff802d1390: 8532 c.mv a0,a2
ffffffff802d1392: fa040593 addi a1,s0,-96
ffffffff802d1396: 84b2 c.mv s1,a2
ffffffff802d1398: fa043023 sd zero,-96(s0)
ffffffff802d139c: fa043423 sd zero,-88(s0)
ffffffff802d13a0: fa042823 sw zero,-80(s0)
ffffffff802d13a4: 00000097 auipc ra,0x0
...snip...
1. Fail to register kprobe to c.mv
Add a kprobe:
echo 'p do_sys_openat2+44' > /sys/kernel/debug/tracing/kprobe_events
register_kprobe returns -22 (EINVAL). This is due to a bug in the
instruction decoder. I've sent a fix upstream [1].
2. (with [1] applied) Oops when registering a probe
Add a kprobe:
echo 'p do_sys_openat2+44' > /sys/kernel/debug/tracing/kprobe_events
You get a splat:
Unable to handle kernel access to user memory without uaccess routines at virtual address 0000000000000008
Oops [#1]
Modules linked in:
CPU: 1 PID: 242 Comm: bash Tainted: G W 6.2.0-rc2-00010-g09ff1aa7b1f9-dirty #14
Hardware name: riscv-virtio,qemu (DT)
epc : riscv_probe_decode_insn+0x16a/0x192
ra : riscv_probe_decode_insn+0x32/0x192
epc : ffffffff8127b2bc ra : ffffffff8127b184 sp : ff2000000173bac0
gp : ffffffff82533f70 tp : ff60000086ab2b40 t0 : 0000000000000000
t1 : 0000000000000850 t2 : 65646f6365642054 s0 : ff2000000173bae0
s1 : 0000000000000017 a0 : 000000000000e001 a1 : 000000000000003f
a2 : 0000000000009002 a3 : 0000000000000017 a4 : 000000000000c001
a5 : ffffffff8127b38a a6 : ff6000047d666000 a7 : 0000000000040000
s2 : 0000000000000000 s3 : 0000000000000006 s4 : ff6000008558f718
s5 : ff6000008558f718 s6 : 0000000000000001 s7 : ff6000008558f768
s8 : 0000000000000007 s9 : 0000000000000003 s10: 0000000000000002
s11: 00aaaaaad62baf78 t3 : 0000000000000000 t4 : 8dd70b0100000000
t5 : ffffffffffffe000 t6 : ff2000000173b8c8
status: 0000000200000120 badaddr: 0000000000000008 cause: 000000000000000f
[<ffffffff81257e48>] arch_prepare_optimized_kprobe+0xc2/0x4ec
[<ffffffff8125b420>] alloc_aggr_kprobe+0x5c/0x6a
[<ffffffff8125ba0a>] register_kprobe+0x5dc/0x6a2
[<ffffffff8016f266>] __register_trace_kprobe.part.0+0x98/0xbc
[<ffffffff80170544>] __trace_kprobe_create+0x6ea/0xbcc
[<ffffffff80176cee>] trace_probe_create+0x6c/0x7c
[<ffffffff8016f1a2>] create_or_delete_trace_kprobe+0x24/0x50
[<ffffffff80150642>] trace_parse_run_command+0x9e/0x12a
[<ffffffff8016f176>] probes_write+0x18/0x20
[<ffffffff802d494a>] vfs_write+0xca/0x41e
[<ffffffff802d4f96>] ksys_write+0x70/0xee
[<ffffffff802d5036>] sys_write+0x22/0x2a
[<ffffffff80004196>] ret_from_syscall+0x0/0x2
This is because of a call to riscv_probe_decode_insn(probe_opcode_t *addr,
struct arch_probe_insn *api) where api is NULL (tripping over auipc). Should
be a common scenario... (a minimal sketch of the guard pattern follows at the
end of this mail.)
3. No bound check for instructions
Add a probe to a non-valid instruction (in the middle of addi):
echo 'p 0xffffffff802d1394' > /sys/kernel/debug/tracing/kprobe_events
You get the same splat as above from the auipc NULL-pointer, but the
"half" addi-instruction is parsed as a correct instruction.
4. Lockdep splat
Might be a false positive; when enabling a probe, e.g.
echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
======================================================
WARNING: possible circular locking dependency detected
------------------------------------------------------
bash/244 is trying to acquire lock:
ffffffff8223ee90 (cpu_hotplug_lock){++++}-{0:0}, at: stop_machine+0x2c/0x54
but task is already holding lock:
ffffffff82249f70 (text_mutex){+.+.}-{3:3}, at: ftrace_arch_code_modify_prepare+0x1a/0x22
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (text_mutex){+.+.}-{3:3}:
lock_acquire+0x10a/0x328
__mutex_lock+0xa8/0x770
mutex_lock_nested+0x28/0x30
register_kprobe+0x3ae/0x5ea
__register_trace_kprobe.part.0+0x98/0xbc
__trace_kprobe_create+0x6ea/0xbcc
trace_probe_create+0x6c/0x7c
create_or_delete_trace_kprobe+0x24/0x50
trace_parse_run_command+0x9e/0x12a
probes_write+0x18/0x20
vfs_write+0xca/0x41e
ksys_write+0x70/0xee
sys_write+0x22/0x2a
ret_from_syscall+0x0/0x2
-> #0 (cpu_hotplug_lock){++++}-{0:0}:
check_noncircular+0x122/0x13a
__lock_acquire+0x1058/0x20e4
lock_acquire+0x10a/0x328
cpus_read_lock+0x4c/0x11c
stop_machine+0x2c/0x54
arch_ftrace_update_code+0x2e/0x4c
ftrace_startup+0xd0/0x15e
register_ftrace_function+0x32/0x7c
arm_kprobe+0x132/0x198
enable_kprobe+0x9c/0xc0
enable_trace_kprobe+0x6e/0xea
kprobe_register+0x64/0x6c
__ftrace_event_enable_disable+0x72/0x246
event_enable_write+0x94/0xe4
vfs_write+0xca/0x41e
ksys_write+0x70/0xee
sys_write+0x22/0x2a
ret_from_syscall+0x0/0x2
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(text_mutex);
lock(cpu_hotplug_lock);
lock(text_mutex);
lock(cpu_hotplug_lock);
*** DEADLOCK ***
5 locks held by bash/244:
#0: ff60000080f49438 (sb_writers#12){.+.+}-{0:0}, at: ksys_write+0x70/0xee
#1: ffffffff822d9468 (event_mutex){+.+.}-{3:3}, at: event_enable_write+0x7c/0xe4
#2: ffffffff822d3fa8 (kprobe_mutex){+.+.}-{3:3}, at: enable_kprobe+0x32/0xc0
#3: ffffffff822d56d8 (ftrace_lock){+.+.}-{3:3}, at: register_ftrace_function+0x26/0x7c
#4: ffffffff82249f70 (text_mutex){+.+.}-{3:3}, at: ftrace_arch_code_modify_prepare+0x1a/0x22
stack backtrace:
CPU: 2 PID: 244 Comm: bash Not tainted 6.2.0-rc1-00008-g544b2c59fd81 #1
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff80006e80>] dump_backtrace+0x30/0x38
[<ffffffff81256e82>] show_stack+0x40/0x4c
[<ffffffff8126e054>] dump_stack_lvl+0x62/0x84
[<ffffffff8126e08e>] dump_stack+0x18/0x20
[<ffffffff8009b37e>] print_circular_bug+0x2ac/0x318
[<ffffffff8009b50c>] check_noncircular+0x122/0x13a
[<ffffffff8009e020>] __lock_acquire+0x1058/0x20e4
[<ffffffff8009f90c>] lock_acquire+0x10a/0x328
[<ffffffff8002fb8a>] cpus_read_lock+0x4c/0x11c
[<ffffffff8011ed60>] stop_machine+0x2c/0x54
[<ffffffff8013aec6>] arch_ftrace_update_code+0x2e/0x4c
[<ffffffff8013e796>] ftrace_startup+0xd0/0x15e
[<ffffffff8013e856>] register_ftrace_function+0x32/0x7c
[<ffffffff8012f928>] arm_kprobe+0x132/0x198
[<ffffffff8012fa2a>] enable_kprobe+0x9c/0xc0
[<ffffffff8016ff62>] enable_trace_kprobe+0x6e/0xea
[<ffffffff801700da>] kprobe_register+0x64/0x6c
[<ffffffff8015eba6>] __ftrace_event_enable_disable+0x72/0x246
[<ffffffff8015eeea>] event_enable_write+0x94/0xe4
[<ffffffff802d5e1a>] vfs_write+0xca/0x41e
[<ffffffff802d6466>] ksys_write+0x70/0xee
[<ffffffff802d6506>] sys_write+0x22/0x2a
[<ffffffff80004196>] ret_from_syscall+0x0/0x2
5. 32b support?
I've noticed that the code supports rv32. Is this tested? Do regular
kprobes work on 32b?
Thanks,
Björn
[1] https://lore.kernel.org/linux-riscv/[email protected]/
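A minimal sketch of the guard pattern issue 2 calls for, as standalone C with
stub types (the kernel's actual decoder is more involved; the names here are
illustrative):

#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct arch_probe_insn: only the field this sketch touches. */
struct api_stub {
	void (*handler)(uint32_t insn);
};

static int decode_stub(uint32_t insn, struct api_stub *api)
{
	if ((insn & 0x7f) == 0x17) {		/* AUIPC major opcode */
		if (api)			/* api may legally be NULL... */
			api->handler = NULL;	/* ...so guard before writing */
		return 1;
	}
	return 0;
}

int main(void)
{
	decode_stub(0x00000097, NULL);	/* auipc ra,0x0 -- must not oops */
	return 0;
}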
Chen Guokai <[email protected]> writes:
> diff --git a/arch/riscv/kernel/probes/opt.c b/arch/riscv/kernel/probes/opt.c
> index 1c0e9d218f6f..884e77d2df4c 100644
> --- a/arch/riscv/kernel/probes/opt.c
> +++ b/arch/riscv/kernel/probes/opt.c
> @@ -12,6 +12,7 @@
> #include <asm/kprobes.h>
> #include <asm/patch.h>
> #include <asm/asm-offsets.h>
> +#include <linux/extable.h>
>
> #include "simulate-insn.h"
> #include "decode-insn.h"
> @@ -130,7 +131,7 @@ static void prepare_detour_buffer(kprobe_opcode_t *code, kprobe_opcode_t *slot,
> * as a destination register before any branch or jump instruction.
> */
> static void find_register(unsigned long start, unsigned long end,
> - unsigned long *write, unsigned long *read)
> + unsigned long *write, unsigned long *read)
Probably a patch mess-up; this makes the series not apply cleanly.
Björn
Chen Guokai <[email protected]> writes:
> From: Liao Chang <[email protected]>
> diff --git a/arch/riscv/kernel/probes/simulate-insn.h b/arch/riscv/kernel/probes/simulate-insn.h
> index cb6ff7dccb92..74d8c1ba9064 100644
> --- a/arch/riscv/kernel/probes/simulate-insn.h
> +++ b/arch/riscv/kernel/probes/simulate-insn.h
> @@ -37,6 +37,40 @@ __RISCV_INSN_FUNCS(c_jalr, 0xf007, 0x9002);
> __RISCV_INSN_FUNCS(c_beqz, 0xe003, 0xc001);
> __RISCV_INSN_FUNCS(c_bnez, 0xe003, 0xe001);
> __RISCV_INSN_FUNCS(c_ebreak, 0xffff, 0x9002);
> +/* RVC(S) instructions contain rs1 and rs2 */
> +__RISCV_INSN_FUNCS(c_sq, 0xe003, 0xa000);
> +__RISCV_INSN_FUNCS(c_sw, 0xe003, 0xc000);
> +__RISCV_INSN_FUNCS(c_sd, 0xe003, 0xe000);
> +/* RVC(A) instructions contain rs1 and rs2 */
> +__RISCV_INSN_FUNCS(c_sub, 0xfc03, 0x8c01);
Incorrect mask.
> +__RISCV_INSN_FUNCS(c_subw, 0xfc43, 0x9c01);
> +/* RVC(L) instructions contain rs1 */
> +__RISCV_INSN_FUNCS(c_lq, 0xe003, 0x2000);
> +__RISCV_INSN_FUNCS(c_lw, 0xe003, 0x4000);
> +__RISCV_INSN_FUNCS(c_ld, 0xe003, 0x6000);
> +/* RVC(I) instructions contain rs1 */
> +__RISCV_INSN_FUNCS(c_addi, 0xe003, 0x0001);
> +__RISCV_INSN_FUNCS(c_addiw, 0xe003, 0x2001);
> +__RISCV_INSN_FUNCS(c_addi16sp, 0xe183, 0x6101);
> +__RISCV_INSN_FUNCS(c_slli, 0xe003, 0x0002);
> +/* RVC(B) instructions contain rs1 */
> +__RISCV_INSN_FUNCS(c_sri, 0xe803, 0x8001);
> +__RISCV_INSN_FUNCS(c_andi, 0xec03, 0x8801);
> +/* RVC(SS) instructions contain rs2 */
> +__RISCV_INSN_FUNCS(c_sqsp, 0xe003, 0xa002);
> +__RISCV_INSN_FUNCS(c_swsp, 0xe003, 0xc002);
> +__RISCV_INSN_FUNCS(c_sdsp, 0xe003, 0xe002);
> +/* RVC(R) instructions contain rs2 and rd */
> +__RISCV_INSN_FUNCS(c_mv, 0xe003, 0x8002);
Shouldn't the mask be 0xf003?
Björn
Chen Guokai <[email protected]> writes:
> From: Liao Chang <[email protected]>
> diff --git a/arch/riscv/kernel/probes/opt.c b/arch/riscv/kernel/probes/opt.c
> index 258a283c906d..bc232fce5b39 100644
> --- a/arch/riscv/kernel/probes/opt.c
> +++ b/arch/riscv/kernel/probes/opt.c
> @@ -11,9 +11,37 @@
> #include <linux/kprobes.h>
> #include <asm/kprobes.h>
> #include <asm/patch.h>
> +#include <asm/asm-offsets.h>
>
> #include "simulate-insn.h"
> #include "decode-insn.h"
> +#include "../../net/bpf_jit.h"
> +
> +static void
Super-nit, but I really prefer *not* breaking function name and return
value, for grepability.
> diff --git a/arch/riscv/kernel/probes/opt_trampoline.S b/arch/riscv/kernel/probes/opt_trampoline.S
> index 16160c4367ff..75e34e373cf2 100644
> --- a/arch/riscv/kernel/probes/opt_trampoline.S
> +++ b/arch/riscv/kernel/probes/opt_trampoline.S
> @@ -1,12 +1,137 @@
> /* SPDX-License-Identifier: GPL-2.0-only */
> /*
> * Copyright (C) 2022 Guokai Chen
> + * Copyright (C) 2022 Liao, Chang <[email protected]>
> */
>
> #include <linux/linkage.h>
>
> +#include <asm/asm.h>
> #include <asm/csr.h>
> #include <asm/asm-offsets.h>
>
> SYM_ENTRY(optprobe_template_entry, SYM_L_GLOBAL, SYM_A_NONE)
> + addi sp, sp, -(PT_SIZE_ON_STACK)
> + REG_S x1, PT_RA(sp)
> + REG_S x2, PT_SP(sp)
> + REG_S x3, PT_GP(sp)
> + REG_S x4, PT_TP(sp)
> + REG_S x5, PT_T0(sp)
> + REG_S x6, PT_T1(sp)
> + REG_S x7, PT_T2(sp)
> + REG_S x8, PT_S0(sp)
> + REG_S x9, PT_S1(sp)
> + REG_S x10, PT_A0(sp)
> + REG_S x11, PT_A1(sp)
> + REG_S x12, PT_A2(sp)
> + REG_S x13, PT_A3(sp)
> + REG_S x14, PT_A4(sp)
> + REG_S x15, PT_A5(sp)
> + REG_S x16, PT_A6(sp)
> + REG_S x17, PT_A7(sp)
> + REG_S x18, PT_S2(sp)
> + REG_S x19, PT_S3(sp)
> + REG_S x20, PT_S4(sp)
> + REG_S x21, PT_S5(sp)
> + REG_S x22, PT_S6(sp)
> + REG_S x23, PT_S7(sp)
> + REG_S x24, PT_S8(sp)
> + REG_S x25, PT_S9(sp)
> + REG_S x26, PT_S10(sp)
> + REG_S x27, PT_S11(sp)
> + REG_S x28, PT_T3(sp)
> + REG_S x29, PT_T4(sp)
> + REG_S x30, PT_T5(sp)
> + REG_S x31, PT_T6(sp)
> + /* Update fp is friendly for stacktrace */
> + addi s0, sp, (PT_SIZE_ON_STACK)
> + j 1f
> +
> +SYM_ENTRY(optprobe_template_save, SYM_L_GLOBAL, SYM_A_NONE)
> + /*
> + * Step1:
> + * Filled with the pointer to optimized_kprobe data
> + */
> + .dword 0
> +1:
> + /* Load optimize_kprobe pointer from .dword below */
> + auipc a0, 0
> + REG_L a0, -8(a0)
> + add a1, sp, x0
> +
> +SYM_ENTRY(optprobe_template_call, SYM_L_GLOBAL, SYM_A_NONE)
> + /*
> + * Step2:
> + * <IMME> of AUIPC/JALR are modified to the offset to optimized_callback
> + * jump target is loaded from above .dword.
> + */
> + auipc ra, 0
> + jalr ra, 0(ra)
> +
> + REG_L x1, PT_RA(sp)
> + REG_L x3, PT_GP(sp)
> + REG_L x4, PT_TP(sp)
> + REG_L x5, PT_T0(sp)
> + REG_L x6, PT_T1(sp)
> + REG_L x7, PT_T2(sp)
> + REG_L x8, PT_S0(sp)
> + REG_L x9, PT_S1(sp)
> + REG_L x10, PT_A0(sp)
> + REG_L x11, PT_A1(sp)
> + REG_L x12, PT_A2(sp)
> + REG_L x13, PT_A3(sp)
> + REG_L x14, PT_A4(sp)
> + REG_L x15, PT_A5(sp)
> + REG_L x16, PT_A6(sp)
> + REG_L x17, PT_A7(sp)
> + REG_L x18, PT_S2(sp)
> + REG_L x19, PT_S3(sp)
> + REG_L x20, PT_S4(sp)
> + REG_L x21, PT_S5(sp)
> + REG_L x22, PT_S6(sp)
> + REG_L x23, PT_S7(sp)
> + REG_L x24, PT_S8(sp)
> + REG_L x25, PT_S9(sp)
> + REG_L x26, PT_S10(sp)
> + REG_L x27, PT_S11(sp)
> + REG_L x28, PT_T3(sp)
> + REG_L x29, PT_T4(sp)
> + REG_L x30, PT_T5(sp)
> + REG_L x31, PT_T6(sp)
> + REG_L x2, PT_SP(sp)
> + addi sp, sp, (PT_SIZE_ON_STACK)
> +
> +SYM_ENTRY(optprobe_template_insn, SYM_L_GLOBAL, SYM_A_NONE)
> + /*
> + * Step3:
> + * NOPS will be replaced by the probed instruction, at worst case 3 RVC
> + * and 1 RVI instructions is about to execute out of line.
> + */
> + nop
A nop here will be either a compressed nop or a non-compressed,
depending on the build (C-enabled or not), right? Maybe be explicit to
the assembler what you want?
Björn
Chen Guokai <[email protected]> writes:
> From: Liao Chang <[email protected]>
> arch/riscv/kernel/probes/opt.c | 107 ++++++++++++++++++++++++++++++++-
>
> diff --git a/arch/riscv/kernel/probes/opt.c b/arch/riscv/kernel/probes/opt.c
> index 56c8a227c857..a4271e6033ba 100644
> --- a/arch/riscv/kernel/probes/opt.c
> +++ b/arch/riscv/kernel/probes/opt.c
> @@ -24,7 +72,64 @@ int arch_check_optimized_kprobe(struct optimized_kprobe *op)
> int arch_prepare_optimized_kprobe(struct optimized_kprobe *op,
> struct kprobe *orig)
> {
> - return 0;
> + long rel;
> + int rd, ra, ret;
> + kprobe_opcode_t *code = NULL, *slot = NULL;
> +
> + if (!can_optimize((unsigned long)orig->addr, op))
> + return -EILSEQ;
> +
> + code = kzalloc(MAX_OPTINSN_SIZE, GFP_KERNEL);
> + slot = get_optinsn_slot();
> + if (!code || !slot) {
> + ret = -ENOMEM;
> + goto on_error;
> + }
> +
> + /*
> + * Verify if the address gap is within 4GB range, because this uses
> + * a auipc+jalr pair.
Try to be consistent. You're mixing "AUIPC/JALR" with "auipc+jalr".
> + */
> + rel = (unsigned long)slot - (unsigned long)orig->addr;
> + if (!in_auipc_jalr_range(rel)) {
> + /*
> + * Different from x86, we free code buf directly instead
> of
Reword for readers that are not familiar with x86.
Björn
Chen Guokai <[email protected]> writes:
> From: Liao Chang <[email protected]>
> diff --git a/arch/riscv/kernel/probes/opt.c b/arch/riscv/kernel/probes/opt.c
> index a0d2ab39e3fa..258a283c906d 100644
> --- a/arch/riscv/kernel/probes/opt.c
> +++ b/arch/riscv/kernel/probes/opt.c
> @@ -271,15 +271,103 @@ static void find_free_registers(struct kprobe *kp, struct optimized_kprobe *op,
> *ra = (kw == 1UL) ? 0 : __builtin_ctzl(kw & ~1UL);
> }
>
> +static bool insn_jump_into_range(unsigned long addr, unsigned long start,
> + unsigned long end)
> +{
> + kprobe_opcode_t insn = *(kprobe_opcode_t *)addr;
> + unsigned long target, offset = GET_INSN_LENGTH(insn);
> +
> +#ifdef CONFIG_RISCV_ISA_C
> + if (offset == RVC_INSN_LEN) {
> + if (riscv_insn_is_c_beqz(insn) || riscv_insn_is_c_bnez(insn))
> + target = addr + rvc_branch_imme(insn);
> + else if (riscv_insn_is_c_jal(insn) || riscv_insn_is_c_j(insn))
> + target = addr + rvc_jal_imme(insn);
> + else
> + target = 0;
> + return (target >= start) && (target < end);
> + }
> +#endif
> +
> + if (riscv_insn_is_branch(insn))
> + target = addr + rvi_branch_imme(insn);
> + else if (riscv_insn_is_jal(insn))
> + target = addr + rvi_jal_imme(insn);
> + else
> + target = 0;
> + return (target >= start) && (target < end);
> +}
> +
> +static int search_copied_insn(unsigned long paddr, struct optimized_kprobe *op)
> +{
> + int i = 1;
> + unsigned long offset = GET_INSN_LENGTH(*(kprobe_opcode_t *)paddr);
> +
> + while ((i++ < MAX_COPIED_INSN) && (offset < 2 * RVI_INSN_LEN)) {
> + if (riscv_probe_decode_insn((probe_opcode_t *)paddr + offset,
> + NULL) != INSN_GOOD)
If the second argument is NULL, and the insn is auipc, we'll splat with
NULL-ptr exception.
Hmm, probe_opcode_t is u32, right? And GET_INSN_LENGTH() returns 4 or 2
...then the pointer arithmetic will be a mess?
Björn
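To make the arithmetic concern concrete, a standalone sketch of byte-granular
stepping over a mixed RVC/RVI stream (the length rule is the standard RISC-V
encoding; everything else here is illustrative):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Low two bits == 0b11 means a 32-bit (RVI) instruction, else 16-bit (RVC). */
static unsigned int insn_len(uint16_t low)
{
	return (low & 0x3) == 0x3 ? 4 : 2;
}

int main(void)
{
	/* c.mv s3,a0; c.mv s2,a1; addi a1,s0,-96 -- from the
	 * do_sys_openat2 dump earlier in the thread */
	const uint8_t code[] = { 0xaa, 0x89, 0x2e, 0x89,
				 0x93, 0x05, 0x04, 0xfa };
	uintptr_t addr = (uintptr_t)code, end = addr + sizeof(code);

	while (addr < end) {
		uint16_t low;

		memcpy(&low, (const void *)addr, sizeof(low));
		printf("insn of %u bytes\n", insn_len(low));
		addr += insn_len(low);	/* advance in bytes, not u32 units */
	}
	return 0;
}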
Chen Guokai <[email protected]> writes:
> From: Liao Chang <[email protected]>
> diff --git a/arch/riscv/kernel/probes/opt.c b/arch/riscv/kernel/probes/opt.c
> index 56c8a227c857..a4271e6033ba 100644
> --- a/arch/riscv/kernel/probes/opt.c
> +++ b/arch/riscv/kernel/probes/opt.c
> @@ -10,6 +10,54 @@
>
> #include <linux/kprobes.h>
> #include <asm/kprobes.h>
> +#include <asm/patch.h>
> +
> +static inline int in_auipc_jalr_range(long val)
Leave "inline" out; Let the compiler decide.
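For reference, the window such a check has to cover: AUIPC contributes a
sign-extended imm20 << 12 and JALR then adds a sign-extended 12-bit immediate,
so the reachable offsets are roughly [-2^31 - 2^11, 2^31 - 2^11). A standalone
sketch with those assumed bounds, not the patch's exact code (LP64 only):

#include <stdbool.h>
#include <stdio.h>

static bool in_auipc_jalr_range(long val)
{
	return val >= -(1L << 31) - (1L << 11) &&
	       val <   (1L << 31) - (1L << 11);
}

int main(void)
{
	printf("%d %d\n", in_auipc_jalr_range(0x7fffe000L),	/* 1 */
	       in_auipc_jalr_range(1L << 32));			/* 0 */
	return 0;
}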
On 2023/1/3 2:03, Björn Töpel wrote:
> Chen Guokai <[email protected]> writes:
>
>> From: Liao Chang <[email protected]>
>
>> diff --git a/arch/riscv/kernel/probes/simulate-insn.h b/arch/riscv/kernel/probes/simulate-insn.h
>> index cb6ff7dccb92..74d8c1ba9064 100644
>> --- a/arch/riscv/kernel/probes/simulate-insn.h
>> +++ b/arch/riscv/kernel/probes/simulate-insn.h
>> @@ -37,6 +37,40 @@ __RISCV_INSN_FUNCS(c_jalr, 0xf007, 0x9002);
>> __RISCV_INSN_FUNCS(c_beqz, 0xe003, 0xc001);
>> __RISCV_INSN_FUNCS(c_bnez, 0xe003, 0xe001);
>> __RISCV_INSN_FUNCS(c_ebreak, 0xffff, 0x9002);
>> +/* RVC(S) instructions contain rs1 and rs2 */
>> +__RISCV_INSN_FUNCS(c_sq, 0xe003, 0xa000);
>> +__RISCV_INSN_FUNCS(c_sw, 0xe003, 0xc000);
>> +__RISCV_INSN_FUNCS(c_sd, 0xe003, 0xe000);
>> +/* RVC(A) instructions contain rs1 and rs2 */
>> +__RISCV_INSN_FUNCS(c_sub, 0xfc03, 0x8c01);
>
> Incorrect mask.
Thanks for checking. I studied the opcode of C.SUB [1]; the correct mask
should be 0xFC63.

        15..10 (funct6) | 9..7     | 6..5 (funct2) | 4..2 | 1..0 (op)
c.sub:  1 0 0 0 1 1     | rd'/rs1' | 0 0           | rs2' | 0 1

mask:  0xFC63 (funct6, funct2 and op bits set)
value: 0x8C01
>
>> +__RISCV_INSN_FUNCS(c_subw, 0xfc43, 0x9c01);
>> +/* RVC(L) instructions contain rs1 */
>> +__RISCV_INSN_FUNCS(c_lq, 0xe003, 0x2000);
>> +__RISCV_INSN_FUNCS(c_lw, 0xe003, 0x4000);
>> +__RISCV_INSN_FUNCS(c_ld, 0xe003, 0x6000);
>> +/* RVC(I) instructions contain rs1 */
>> +__RISCV_INSN_FUNCS(c_addi, 0xe003, 0x0001);
>> +__RISCV_INSN_FUNCS(c_addiw, 0xe003, 0x2001);
>> +__RISCV_INSN_FUNCS(c_addi16sp, 0xe183, 0x6101);
>> +__RISCV_INSN_FUNCS(c_slli, 0xe003, 0x0002);
>> +/* RVC(B) instructions contain rs1 */
>> +__RISCV_INSN_FUNCS(c_sri, 0xe803, 0x8001);
>> +__RISCV_INSN_FUNCS(c_andi, 0xec03, 0x8801);
>> +/* RVC(SS) instructions contain rs2 */
>> +__RISCV_INSN_FUNCS(c_sqsp, 0xe003, 0xa002);
>> +__RISCV_INSN_FUNCS(c_swsp, 0xe003, 0xc002);
>> +__RISCV_INSN_FUNCS(c_sdsp, 0xe003, 0xe002);
>> +/* RVC(R) instructions contain rs2 and rd */
>> +__RISCV_INSN_FUNCS(c_mv, 0xe003, 0x8002);
>
> Shouldn't the mask be 0xf003?
Actually, the mask should indeed be 0xf003, but that brings another problem:
it can't tell C.MV and C.JR apart via the mask and value parts. Look at the
opcodes below:

        15..12 (funct4) | 11..7 | 6..2  | 1..0 (op)
C.JR:   1 0 0 0         | rs1   | 00000 | 1 0
C.MV:   1 0 0 0         | rd    | rs2   | 1 0

The only difference between C.MV and C.JR is bits[6:2]: that bitfield is zero
for C.JR, while for C.MV it holds rs2, which is never zero. To tell C.MV and
C.JR apart correctly, it is better to adjust the mask of C.JR to 0xf07f, as in
your patch (riscv, kprobe: Stricter c.jr/c.jalr decoding).
Looking forward to your feedback.
>
>
> Björn
[1] https://github.com/riscv/riscv-isa-manual/releases
--
BR,
Liao, Chang
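For concreteness, the stricter matches discussed above as a standalone sketch
(the helper names are illustrative; only the mask/value pairs matter):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static bool rvc_match(uint16_t insn, uint16_t mask, uint16_t val)
{
	return (insn & mask) == val;
}

/* c.sub needs funct6 and funct2 covered by the mask: 0xfc63, not 0xfc03. */
static bool is_c_sub(uint16_t insn) { return rvc_match(insn, 0xfc63, 0x8c01); }

/* c.jr pins bits[6:2] (the rs2 field) to zero, hence the wider 0xf07f mask. */
static bool is_c_jr(uint16_t insn)  { return rvc_match(insn, 0xf07f, 0x8002); }

/* c.mv is what remains of the 0xf003/0x8002 pattern once c.jr is excluded. */
static bool is_c_mv(uint16_t insn)
{
	return rvc_match(insn, 0xf003, 0x8002) && !is_c_jr(insn);
}

int main(void)
{
	printf("%d %d %d\n", is_c_sub(0x8d0d),	/* c.sub a0,a1 -> 1 */
	       is_c_jr(0x8082),			/* c.jr ra     -> 1 */
	       is_c_mv(0x89aa));		/* c.mv s3,a0  -> 1 */
	return 0;
}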
On 2023/1/3 16:27, Björn Töpel wrote:
> Chen Guokai <[email protected]> writes:
>
>> From: Liao Chang <[email protected]>
>
>> diff --git a/arch/riscv/kernel/probes/opt.c b/arch/riscv/kernel/probes/opt.c
>> index 56c8a227c857..a4271e6033ba 100644
>> --- a/arch/riscv/kernel/probes/opt.c
>> +++ b/arch/riscv/kernel/probes/opt.c
>> @@ -10,6 +10,54 @@
>>
>> #include <linux/kprobes.h>
>> #include <asm/kprobes.h>
>> +#include <asm/patch.h>
>> +
>> +static inline int in_auipc_jalr_range(long val)
>
> Leave "inline" out; Let the compiler decide.
Will do.
--
BR,
Liao, Chang
On 2023/1/3 2:03, Björn Töpel wrote:
> Chen Guokai <[email protected]> writes:
>
>> From: Liao Chang <[email protected]>
>
>> arch/riscv/kernel/probes/opt.c | 107 ++++++++++++++++++++++++++++++++-
>>
>> diff --git a/arch/riscv/kernel/probes/opt.c b/arch/riscv/kernel/probes/opt.c
>> index 56c8a227c857..a4271e6033ba 100644
>> --- a/arch/riscv/kernel/probes/opt.c
>> +++ b/arch/riscv/kernel/probes/opt.c
>
>> @@ -24,7 +72,64 @@ int arch_check_optimized_kprobe(struct optimized_kprobe *op)
>> int arch_prepare_optimized_kprobe(struct optimized_kprobe *op,
>> struct kprobe *orig)
>> {
>> - return 0;
>> + long rel;
>> + int rd, ra, ret;
>> + kprobe_opcode_t *code = NULL, *slot = NULL;
>> +
>> + if (!can_optimize((unsigned long)orig->addr, op))
>> + return -EILSEQ;
>> +
>> + code = kzalloc(MAX_OPTINSN_SIZE, GFP_KERNEL);
>> + slot = get_optinsn_slot();
>> + if (!code || !slot) {
>> + ret = -ENOMEM;
>> + goto on_error;
>> + }
>> +
>> + /*
>> + * Verify if the address gap is within 4GB range, because this uses
>> + * a auipc+jalr pair.
>
> Try to be consistent. You're mixing "AUIPC/JALR" with "auipc+jalr".
OK, I will use AUIPC/JALR in all commit messages and comments of this series.
>
>> + */
>> + rel = (unsigned long)slot - (unsigned long)orig->addr;
>> + if (!in_auipc_jalr_range(rel)) {
>> + /*
>> + * Different from x86, we free code buf directly instead
>> of
>
> Reword for readers that are not familiar with x86.
OK. BTW, I think the code following the on_error tag is fairly self-explanatory;
perhaps this comment is not needed anymore.
>
>
> Björn
--
BR,
Liao, Chang
"liaochang (A)" <[email protected]> writes:
>> Shouldn't the mask be 0xf003?
>
> Actually, the mask should indeed be 0xf003, but that brings another problem:
> it can't tell C.MV and C.JR apart via the mask and value parts. Look at the
> opcodes below:
>
>         15..12 (funct4) | 11..7 | 6..2  | 1..0 (op)
> C.JR:   1 0 0 0         | rs1   | 00000 | 1 0
> C.MV:   1 0 0 0         | rd    | rs2   | 1 0
>
> The only difference between C.MV and C.JR is bits[6:2]: that bitfield is zero
> for C.JR, while for C.MV it holds rs2, which is never zero. To tell C.MV and
> C.JR apart correctly, it is better to adjust the mask of C.JR to 0xf07f, as in
> your patch (riscv, kprobe: Stricter c.jr/c.jalr decoding).
>
> Looking forward to your feedback.
Yup, that was the reason I submitted the fix! Let's wait for the fix to
be applied, and not include that fix in your feature series.
Björn
On 2023/1/3 2:04, Björn Töpel wrote:
> Chen Guokai <[email protected]> writes:
>
>> From: Liao Chang <[email protected]>
>
>> diff --git a/arch/riscv/kernel/probes/opt.c b/arch/riscv/kernel/probes/opt.c
>> index a0d2ab39e3fa..258a283c906d 100644
>> --- a/arch/riscv/kernel/probes/opt.c
>> +++ b/arch/riscv/kernel/probes/opt.c
>> @@ -271,15 +271,103 @@ static void find_free_registers(struct kprobe *kp, struct optimized_kprobe *op,
>> *ra = (kw == 1UL) ? 0 : __builtin_ctzl(kw & ~1UL);
>> }
>>
>> +static bool insn_jump_into_range(unsigned long addr, unsigned long start,
>> + unsigned long end)
>> +{
>> + kprobe_opcode_t insn = *(kprobe_opcode_t *)addr;
>> + unsigned long target, offset = GET_INSN_LENGTH(insn);
>> +
>> +#ifdef CONFIG_RISCV_ISA_C
>> + if (offset == RVC_INSN_LEN) {
>> + if (riscv_insn_is_c_beqz(insn) || riscv_insn_is_c_bnez(insn))
>> + target = addr + rvc_branch_imme(insn);
>> + else if (riscv_insn_is_c_jal(insn) || riscv_insn_is_c_j(insn))
>> + target = addr + rvc_jal_imme(insn);
>> + else
>> + target = 0;
>> + return (target >= start) && (target < end);
>> + }
>> +#endif
>> +
>> + if (riscv_insn_is_branch(insn))
>> + target = addr + rvi_branch_imme(insn);
>> + else if (riscv_insn_is_jal(insn))
>> + target = addr + rvi_jal_imme(insn);
>> + else
>> + target = 0;
>> + return (target >= start) && (target < end);
>> +}
>> +
>> +static int search_copied_insn(unsigned long paddr, struct optimized_kprobe *op)
>> +{
>> + int i = 1;
>> + unsigned long offset = GET_INSN_LENGTH(*(kprobe_opcode_t *)paddr);
>> +
>> + while ((i++ < MAX_COPIED_INSN) && (offset < 2 * RVI_INSN_LEN)) {
>> + if (riscv_probe_decode_insn((probe_opcode_t *)paddr + offset,
>> + NULL) != INSN_GOOD)
>
> If the second argument is NULL, and the insn is auipc, we'll splat with
> NULL-ptr exception.
Good catch, it was my fault to overlook the access to the second argument in the RISCV_INSN_SET_SIMULATE macro.
>
> Hmm, probe_opcode_t is u32, right? And GET_INSN_LENGTH() returns 4 or 2
> ...then the pointer arithmetic will be a mess?
Hmm, this pointer arithmetic indeed makes no sense here, yet I had debugged this function on QEMU step by step
and it worked well. Anyway, I will go through this function again, thanks.
>
>
> Björn
--
BR,
Liao, Chang
Hi Björn, I appreciate your review and testing of this feature.
On 2023/1/3 2:02, Björn Töpel wrote:
> Chen Guokai <[email protected]> writes:
>
>> Add jump optimization support for RISC-V.
>
> Thank you for continuing to work on the series! I took the series for a
> spin, and ran into a number of issues that make me wonder how you test
> the series, and how the testing is different from my runs.
I picked some kernel functions to test this series, which means all optprobes
were installed at function entry; I guess the instruction pattern of my
testcases is not versatile enough, which let some bugs go undiscovered.
Do you think it is a good idea to test this feature via the ftracetest binary
and the kprobe-related testcase scripts in the tools/testing/selftests/ftrace
directory?
Thanks.
>
> I'll outline the general/big issues here, and leave the specifics per-patch.
>
> I've done simple testing, using "Kprobe-based Event Tracing"
> (CONFIG_KPROBE_EVENTS=y) via tracefs.
>
> All the tests were run on commit 88603b6dc419 ("Linux 6.2-rc2") with the
> series applied. All the bugs were triggered by setting different probes to
> do_sys_openat2. Code:
>
> do_sys_openat2:
> ...snip...
> ffffffff802d138c: 89aa c.mv s3,a0 // +44
> ffffffff802d138e: 892e c.mv s2,a1 // +46
> ffffffff802d1390: 8532 c.mv a0,a2
> ffffffff802d1392: fa040593 addi a1,s0,-96
> ffffffff802d1396: 84b2 c.mv s1,a2
> ffffffff802d1398: fa043023 sd zero,-96(s0)
> ffffffff802d139c: fa043423 sd zero,-88(s0)
> ffffffff802d13a0: fa042823 sw zero,-80(s0)
> ffffffff802d13a4: 00000097 auipc ra,0x0
> ...snip...
>
>
> 1. Fail to register kprobe to c.mv
>
> Add a kprobe:
> echo 'p do_sys_openat2+44' > /sys/kernel/debug/tracing/kprobe_events
>
> register_kprobe returns -22 (EINVAL). This is due to a bug in the
> instruction decoder. I've sent a fix upstream [1].
>
> 2. (with [1] applied) Oops when registering a probe
>
> Add a kprobe:
> echo 'p do_sys_openat2+44' > /sys/kernel/debug/tracing/kprobe_events
>
> You get a splat:
> Unable to handle kernel access to user memory without uaccess routines at virtual address 0000000000000008
> Oops [#1]
> Modules linked in:
> CPU: 1 PID: 242 Comm: bash Tainted: G W 6.2.0-rc2-00010-g09ff1aa7b1f9-dirty #14
> Hardware name: riscv-virtio,qemu (DT)
> epc : riscv_probe_decode_insn+0x16a/0x192
> ra : riscv_probe_decode_insn+0x32/0x192
> epc : ffffffff8127b2bc ra : ffffffff8127b184 sp : ff2000000173bac0
> gp : ffffffff82533f70 tp : ff60000086ab2b40 t0 : 0000000000000000
> t1 : 0000000000000850 t2 : 65646f6365642054 s0 : ff2000000173bae0
> s1 : 0000000000000017 a0 : 000000000000e001 a1 : 000000000000003f
> a2 : 0000000000009002 a3 : 0000000000000017 a4 : 000000000000c001
> a5 : ffffffff8127b38a a6 : ff6000047d666000 a7 : 0000000000040000
> s2 : 0000000000000000 s3 : 0000000000000006 s4 : ff6000008558f718
> s5 : ff6000008558f718 s6 : 0000000000000001 s7 : ff6000008558f768
> s8 : 0000000000000007 s9 : 0000000000000003 s10: 0000000000000002
> s11: 00aaaaaad62baf78 t3 : 0000000000000000 t4 : 8dd70b0100000000
> t5 : ffffffffffffe000 t6 : ff2000000173b8c8
> status: 0000000200000120 badaddr: 0000000000000008 cause: 000000000000000f
> [<ffffffff81257e48>] arch_prepare_optimized_kprobe+0xc2/0x4ec
> [<ffffffff8125b420>] alloc_aggr_kprobe+0x5c/0x6a
> [<ffffffff8125ba0a>] register_kprobe+0x5dc/0x6a2
> [<ffffffff8016f266>] __register_trace_kprobe.part.0+0x98/0xbc
> [<ffffffff80170544>] __trace_kprobe_create+0x6ea/0xbcc
> [<ffffffff80176cee>] trace_probe_create+0x6c/0x7c
> [<ffffffff8016f1a2>] create_or_delete_trace_kprobe+0x24/0x50
> [<ffffffff80150642>] trace_parse_run_command+0x9e/0x12a
> [<ffffffff8016f176>] probes_write+0x18/0x20
> [<ffffffff802d494a>] vfs_write+0xca/0x41e
> [<ffffffff802d4f96>] ksys_write+0x70/0xee
> [<ffffffff802d5036>] sys_write+0x22/0x2a
> [<ffffffff80004196>] ret_from_syscall+0x0/0x2
>
> This is because a call to riscv_probe_decode_insn(probe_opcode_t *addr,
> struct arch_probe_insn *api), where api is NULL (and tripping over
> auipc). Should be a common scenario...
>
> 3. No bound check for instructions
>
> Add a probe to a non-valid instruction (in the middle of addi):
> echo 'p 0xffffffff802d1394' > /sys/kernel/debug/tracing/kprobe_events
>
> You get the same splat as above from the auipc NULL-pointer, but the
> "half" addi-instruction is parsed as a correct instruction.
>
> 4. Lockdep splat
>
> Might be a false positive; when enabling a probe, e.g.
> echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
>
>
> ======================================================
> WARNING: possible circular locking dependency detected
>
> ------------------------------------------------------
> bash/244 is trying to acquire lock:
> ffffffff8223ee90 (cpu_hotplug_lock){++++}-{0:0}, at: stop_machine+0x2c/0x54
>
> but task is already holding lock:
> ffffffff82249f70 (text_mutex){+.+.}-{3:3}, at: ftrace_arch_code_modify_prepare+0x1a/0x22
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (text_mutex){+.+.}-{3:3}:
> lock_acquire+0x10a/0x328
> __mutex_lock+0xa8/0x770
> mutex_lock_nested+0x28/0x30
> register_kprobe+0x3ae/0x5ea
> __register_trace_kprobe.part.0+0x98/0xbc
> __trace_kprobe_create+0x6ea/0xbcc
> trace_probe_create+0x6c/0x7c
> create_or_delete_trace_kprobe+0x24/0x50
> trace_parse_run_command+0x9e/0x12a
> probes_write+0x18/0x20
> vfs_write+0xca/0x41e
> ksys_write+0x70/0xee
> sys_write+0x22/0x2a
> ret_from_syscall+0x0/0x2
>
> -> #0 (cpu_hotplug_lock){++++}-{0:0}:
> check_noncircular+0x122/0x13a
> __lock_acquire+0x1058/0x20e4
> lock_acquire+0x10a/0x328
> cpus_read_lock+0x4c/0x11c
> stop_machine+0x2c/0x54
> arch_ftrace_update_code+0x2e/0x4c
> ftrace_startup+0xd0/0x15e
> register_ftrace_function+0x32/0x7c
> arm_kprobe+0x132/0x198
> enable_kprobe+0x9c/0xc0
> enable_trace_kprobe+0x6e/0xea
> kprobe_register+0x64/0x6c
> __ftrace_event_enable_disable+0x72/0x246
> event_enable_write+0x94/0xe4
> vfs_write+0xca/0x41e
> ksys_write+0x70/0xee
> sys_write+0x22/0x2a
> ret_from_syscall+0x0/0x2
>
> other info that might help us debug this:
I need to study this backtrace further, but at first glance, I guess CONFIG_DYNAMIC_FTRACE is enabled on your kernel, right?
If so, all kprobes are installed via the ftrace stub, so kprobe optimization occurs in the ftrace trampoline code, which is
also a corner case for the current optprobe implementation.
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(text_mutex);
> lock(cpu_hotplug_lock);
> lock(text_mutex);
> lock(cpu_hotplug_lock);
>
> *** DEADLOCK ***
>
> 5 locks held by bash/244:
> #0: ff60000080f49438 (sb_writers#12){.+.+}-{0:0}, at: ksys_write+0x70/0xee
> #1: ffffffff822d9468 (event_mutex){+.+.}-{3:3}, at: event_enable_write+0x7c/0xe4
> #2: ffffffff822d3fa8 (kprobe_mutex){+.+.}-{3:3}, at: enable_kprobe+0x32/0xc0
> #3: ffffffff822d56d8 (ftrace_lock){+.+.}-{3:3}, at: register_ftrace_function+0x26/0x7c
> #4: ffffffff82249f70 (text_mutex){+.+.}-{3:3}, at: ftrace_arch_code_modify_prepare+0x1a/0x22
>
> stack backtrace:
> CPU: 2 PID: 244 Comm: bash Not tainted 6.2.0-rc1-00008-g544b2c59fd81 #1
> Hardware name: riscv-virtio,qemu (DT)
> Call Trace:
> [<ffffffff80006e80>] dump_backtrace+0x30/0x38
> [<ffffffff81256e82>] show_stack+0x40/0x4c
> [<ffffffff8126e054>] dump_stack_lvl+0x62/0x84
> [<ffffffff8126e08e>] dump_stack+0x18/0x20
> [<ffffffff8009b37e>] print_circular_bug+0x2ac/0x318
> [<ffffffff8009b50c>] check_noncircular+0x122/0x13a
> [<ffffffff8009e020>] __lock_acquire+0x1058/0x20e4
> [<ffffffff8009f90c>] lock_acquire+0x10a/0x328
> [<ffffffff8002fb8a>] cpus_read_lock+0x4c/0x11c
> [<ffffffff8011ed60>] stop_machine+0x2c/0x54
> [<ffffffff8013aec6>] arch_ftrace_update_code+0x2e/0x4c
> [<ffffffff8013e796>] ftrace_startup+0xd0/0x15e
> [<ffffffff8013e856>] register_ftrace_function+0x32/0x7c
> [<ffffffff8012f928>] arm_kprobe+0x132/0x198
> [<ffffffff8012fa2a>] enable_kprobe+0x9c/0xc0
> [<ffffffff8016ff62>] enable_trace_kprobe+0x6e/0xea
> [<ffffffff801700da>] kprobe_register+0x64/0x6c
> [<ffffffff8015eba6>] __ftrace_event_enable_disable+0x72/0x246
> [<ffffffff8015eeea>] event_enable_write+0x94/0xe4
> [<ffffffff802d5e1a>] vfs_write+0xca/0x41e
> [<ffffffff802d6466>] ksys_write+0x70/0xee
> [<ffffffff802d6506>] sys_write+0x22/0x2a
> [<ffffffff80004196>] ret_from_syscall+0x0/0x2
>
My comment is the same as the last one.
>
> 5. 32b support?
>
> I've noticed that the code supports rv32. Is this tested? Do regular
> kprobes work on 32b?
Not yet, I will test on rv32.
>
>
> Thanks,
> Björn
>
>
> [1] https://lore.kernel.org/linux-riscv/[email protected]/
>
>
--
BR,
Liao, Chang
"liaochang (A)" <[email protected]> writes:
>>> + */
>>> + rel = (unsigned long)slot - (unsigned long)orig->addr;
>>> + if (!in_auipc_jalr_range(rel)) {
>>> + /*
>>> + * Different from x86, we free code buf directly instead
>>> of
>>
>> Reword for readers that are not familiar with x86.
>
> OK. BTW, I think the code following the on_error tag is fairly self-explanatory;
> perhaps this comment is not needed anymore.
Fair enough! :-)
Björn
On 2023/1/3 2:03, Björn Töpel wrote:
> Chen Guokai <[email protected]> writes:
>
>> From: Liao Chang <[email protected]>
>>
>> Prepare skeleton to implement optimized kprobe on RISCV, it is consist
>> of Makfile, Kconfig and some architecture specific files: kprobe.h and
>> opt.c opt.c include some macro, type definition and functions required
>> by kprobe framework, opt_trampoline.S provide a piece of assembly code
>> template used to construct the detour buffer as the target of long jump
>> instruction(s) for each optimzed kprobe.
>
> This is pretty much just reiterating what the diff-stat says. Please try to
> explain why a certain change is done, instead of what. The "what" is already
> in the patch.
Thanks for your suggestion, I will explain further in the next revision.
>
>> Since the jump range of PC-relative instruction JAL is +/-2M, that is
>> too small to reach the detour buffer, hence the foudamental idea to
>> address OPTPROBES on RISCV is replace 'EBREAK' with 'AUIPC/JALR'. which
>> means it needs to clobber one more instruction beside the kprobe
>> instruction, furthermore, RISCV supports hybird RVI and RVC in single
>> kernel binary, so in theory a pair of 'AUIPC/JALR' is about to clobber
>> 10 bytes(3 RVC and 1 RVI, 2 bytes is padding for alignment) at worst
>> case. The second hardsome problem is looking for one integer register as
>> the destination of 'AUIPC/JALR' without any side-effect.
>
> There are a number of spelling errors, please use a spellchecker and if
> you reference a file (e.g. Makefile), make sure it is correctly spelled
> out.
>
> The comments above apply to all the commit messages of this series.
Thanks for reviewing, I will correct these spelling errors.
>
>
> Björn
--
BR,
Liao, Chang
On 2023/1/3 2:04, Björn Töpel wrote:
> Chen Guokai <[email protected]> writes:
>
>> From: Liao Chang <[email protected]>
>
>> diff --git a/arch/riscv/kernel/probes/opt.c b/arch/riscv/kernel/probes/opt.c
>> index 258a283c906d..bc232fce5b39 100644
>> --- a/arch/riscv/kernel/probes/opt.c
>> +++ b/arch/riscv/kernel/probes/opt.c
>> @@ -11,9 +11,37 @@
>> #include <linux/kprobes.h>
>> #include <asm/kprobes.h>
>> #include <asm/patch.h>
>> +#include <asm/asm-offsets.h>
>>
>> #include "simulate-insn.h"
>> #include "decode-insn.h"
>> +#include "../../net/bpf_jit.h"
>> +
>> +static void
>
> Super-nit, but I really prefer *not* breaking function name and return
> value, for grepability.
OK, I will keep the function name and return type on the same line.
>
>> diff --git a/arch/riscv/kernel/probes/opt_trampoline.S b/arch/riscv/kernel/probes/opt_trampoline.S
>> index 16160c4367ff..75e34e373cf2 100644
>> --- a/arch/riscv/kernel/probes/opt_trampoline.S
>> +++ b/arch/riscv/kernel/probes/opt_trampoline.S
>> @@ -1,12 +1,137 @@
>> /* SPDX-License-Identifier: GPL-2.0-only */
>> /*
>> * Copyright (C) 2022 Guokai Chen
>> + * Copyright (C) 2022 Liao, Chang <[email protected]>
>> */
>>
>> #include <linux/linkage.h>
>>
>> +#include <asm/asm.h>
>> #include <asm/csr.h>
>> #include <asm/asm-offsets.h>
>>
>> SYM_ENTRY(optprobe_template_entry, SYM_L_GLOBAL, SYM_A_NONE)
>> + addi sp, sp, -(PT_SIZE_ON_STACK)
>> + REG_S x1, PT_RA(sp)
>> + REG_S x2, PT_SP(sp)
>> + REG_S x3, PT_GP(sp)
>> + REG_S x4, PT_TP(sp)
>> + REG_S x5, PT_T0(sp)
>> + REG_S x6, PT_T1(sp)
>> + REG_S x7, PT_T2(sp)
>> + REG_S x8, PT_S0(sp)
>> + REG_S x9, PT_S1(sp)
>> + REG_S x10, PT_A0(sp)
>> + REG_S x11, PT_A1(sp)
>> + REG_S x12, PT_A2(sp)
>> + REG_S x13, PT_A3(sp)
>> + REG_S x14, PT_A4(sp)
>> + REG_S x15, PT_A5(sp)
>> + REG_S x16, PT_A6(sp)
>> + REG_S x17, PT_A7(sp)
>> + REG_S x18, PT_S2(sp)
>> + REG_S x19, PT_S3(sp)
>> + REG_S x20, PT_S4(sp)
>> + REG_S x21, PT_S5(sp)
>> + REG_S x22, PT_S6(sp)
>> + REG_S x23, PT_S7(sp)
>> + REG_S x24, PT_S8(sp)
>> + REG_S x25, PT_S9(sp)
>> + REG_S x26, PT_S10(sp)
>> + REG_S x27, PT_S11(sp)
>> + REG_S x28, PT_T3(sp)
>> + REG_S x29, PT_T4(sp)
>> + REG_S x30, PT_T5(sp)
>> + REG_S x31, PT_T6(sp)
>> + /* Update fp is friendly for stacktrace */
>> + addi s0, sp, (PT_SIZE_ON_STACK)
>> + j 1f
>> +
>> +SYM_ENTRY(optprobe_template_save, SYM_L_GLOBAL, SYM_A_NONE)
>> + /*
>> + * Step1:
>> + * Filled with the pointer to optimized_kprobe data
>> + */
>> + .dword 0
>> +1:
>> + /* Load optimize_kprobe pointer from .dword below */
>> + auipc a0, 0
>> + REG_L a0, -8(a0)
>> + add a1, sp, x0
>> +
>> +SYM_ENTRY(optprobe_template_call, SYM_L_GLOBAL, SYM_A_NONE)
>> + /*
>> + * Step2:
>> + * <IMME> of AUIPC/JALR are modified to the offset to optimized_callback
>> + * jump target is loaded from above .dword.
>> + */
>> + auipc ra, 0
>> + jalr ra, 0(ra)
>> +
>> + REG_L x1, PT_RA(sp)
>> + REG_L x3, PT_GP(sp)
>> + REG_L x4, PT_TP(sp)
>> + REG_L x5, PT_T0(sp)
>> + REG_L x6, PT_T1(sp)
>> + REG_L x7, PT_T2(sp)
>> + REG_L x8, PT_S0(sp)
>> + REG_L x9, PT_S1(sp)
>> + REG_L x10, PT_A0(sp)
>> + REG_L x11, PT_A1(sp)
>> + REG_L x12, PT_A2(sp)
>> + REG_L x13, PT_A3(sp)
>> + REG_L x14, PT_A4(sp)
>> + REG_L x15, PT_A5(sp)
>> + REG_L x16, PT_A6(sp)
>> + REG_L x17, PT_A7(sp)
>> + REG_L x18, PT_S2(sp)
>> + REG_L x19, PT_S3(sp)
>> + REG_L x20, PT_S4(sp)
>> + REG_L x21, PT_S5(sp)
>> + REG_L x22, PT_S6(sp)
>> + REG_L x23, PT_S7(sp)
>> + REG_L x24, PT_S8(sp)
>> + REG_L x25, PT_S9(sp)
>> + REG_L x26, PT_S10(sp)
>> + REG_L x27, PT_S11(sp)
>> + REG_L x28, PT_T3(sp)
>> + REG_L x29, PT_T4(sp)
>> + REG_L x30, PT_T5(sp)
>> + REG_L x31, PT_T6(sp)
>> + REG_L x2, PT_SP(sp)
>> + addi sp, sp, (PT_SIZE_ON_STACK)
>> +
>> +SYM_ENTRY(optprobe_template_insn, SYM_L_GLOBAL, SYM_A_NONE)
>> + /*
>> + * Step3:
>> + * NOPS will be replaced by the probed instruction, at worst case 3 RVC
>> + * and 1 RVI instructions is about to execute out of line.
>> + */
>> + nop
>
> A nop here will be either a compressed nop or a non-compressed,
> depending on the build (C-enabled or not), right? Maybe be explicit to
> the assembler what you want?
>
You are right: if CONFIG_RISCV_ISA_C is disabled, two NOPs are enough to execute 2 RVI instructions out of line;
if CONFIG_RISCV_ISA_C is enabled, it needs eight C.NOPs here for the worst case (3 RVC + 1 RVI).
I will use {C}.NOP explicitly for the different configurations in the next revision, thanks.
>
> Björn
--
BR,
Liao, Chang
"liaochang (A)" <[email protected]> writes:
> Hi,Björn,appreciate for your review and testing about this feature.
Thank you for the hard work!
> On 2023/1/3 2:02, Björn Töpel wrote:
>> Chen Guokai <[email protected]> writes:
>>
>>> Add jump optimization support for RISC-V.
>>
>> Thank you for continuing to work on the series! I took the series for a
>> spin, and ran into a number of issues that makes me wonder how you test
>> the series, and how the testing is different from my runs.
>
> I picked some kernel functions to test this series, which means all optprobes
> were installed at function entry; I guess the instruction pattern of my
> testcases is not versatile enough, which let some bugs go undiscovered.
>
> Do you think it is a good idea to test this feature via the ftracetest binary
> and the kprobe-related testcase scripts in the tools/testing/selftests/ftrace
> directory?
Definitely! Both running all tests in tools/testing/selftests/ftrace and
with the CONFIG_KPROBES_SANITY_TEST module.
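For the record, one way to drive those from a kernel tree (paths as of 6.2;
adjust to your tree):

$ make -C tools/testing/selftests/ftrace run_tests
$ # or only the kprobe cases:
$ cd tools/testing/selftests/ftrace && sudo ./ftracetest test.d/kprobe/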
[...]
>> 4. Lockdep splat
[...]
> I need to study this backtrace further, but at first glance, I guess CONFIG_DYNAMIC_FTRACE is enabled on your kernel, right?
> If so, all kprobes are installed via the ftrace stub, so kprobe optimization occurs in the ftrace trampoline code, which is
> also a corner case for the current optprobe implementation.
Yes, CONFIG_DYNAMIC_FTRACE is on. My kernel config was simply:
make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- defconfig
make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- kselftest-merge
Thanks,
Björn
On 2023/1/3 2:04, Björn Töpel wrote:
> Chen Guokai <[email protected]> writes:
>
>> diff --git a/arch/riscv/kernel/probes/opt.c b/arch/riscv/kernel/probes/opt.c
>> index 1c0e9d218f6f..884e77d2df4c 100644
>> --- a/arch/riscv/kernel/probes/opt.c
>> +++ b/arch/riscv/kernel/probes/opt.c
>> @@ -12,6 +12,7 @@
>> #include <asm/kprobes.h>
>> #include <asm/patch.h>
>> #include <asm/asm-offsets.h>
>> +#include <linux/extable.h>
>>
>> #include "simulate-insn.h"
>> #include "decode-insn.h"
>> @@ -130,7 +131,7 @@ static void prepare_detour_buffer(kprobe_opcode_t *code, kprobe_opcode_t *slot,
>> * as a destination register before any branch or jump instruction.
>> */
>> static void find_register(unsigned long start, unsigned long end,
>> - unsigned long *write, unsigned long *read)
>> + unsigned long *write, unsigned long *read)
>
> Probably a patch mess-up; this makes the series not apply cleanly.
Not sure; I will re-apply this patch on the latest kernel (Linux 6.2-rc2).
>
>
> Björn
--
BR,
Liao, Chang
"liaochang (A)" <[email protected]> writes:
>>> +SYM_ENTRY(optprobe_template_insn, SYM_L_GLOBAL, SYM_A_NONE)
>>> + /*
>>> + * Step3:
>>> + * NOPS will be replaced by the probed instruction, at worst case 3 RVC
>>> + * and 1 RVI instructions is about to execute out of line.
>>> + */
>>> + nop
>>
>> A nop here will be either a compressed nop or a non-compressed,
>> depending on the build (C-enabled or not), right? Maybe be explicit to
>> the assembler what you want?
>>
>
>> You are right: if CONFIG_RISCV_ISA_C is disabled, two NOPs are enough to execute 2 RVI instructions out of line;
>> if CONFIG_RISCV_ISA_C is enabled, it needs eight C.NOPs here for the worst case (3 RVC + 1 RVI).
>>
>> I will use {C}.NOP explicitly for the different configurations in the next revision, thanks.
What I meant was that "nop" can expand to compressed instructions, and
you should be explicit. So you know how it's expanded by the
compiler/assembler.
An example:
$ cat bar.S
.text
bar:
nop
nop
$ riscv64-linux-gnu-gcc -O2 -o bar.o -c bar.S && riscv64-linux-gnu-objdump -M no-aliases -d bar.o
bar.o: file format elf64-littleriscv
Disassembly of section .text:
0000000000000000 <bar>:
0: 0001 c.addi zero,0
2: 0001 c.addi zero,0
vs
$ cat foo.S
.text
foo:
.option norvc
nop
nop
$ riscv64-linux-gnu-gcc -O2 -o foo.o -c foo.S && riscv64-linux-gnu-objdump -M no-aliases -d foo.o
foo.o: file format elf64-littleriscv
Disassembly of section .text:
0000000000000000 <foo>:
0: 00000013 addi zero,zero,0
4: 00000013 addi zero,zero,0
Björn
Hi Björn,
Thanks for your detailed review! I ran tests mainly on some syscall/timer
related functions where these issues were not triggered. I will check all
these issues, as well as the comments spread per-patch, before a new version
of the patch set is sent.
FYI, the 32b support is included and was tested with mostly the same cases as
the 64b one.
Regards,
Guokai Chen
> On Jan 3, 2023, at 02:02, Björn Töpel <[email protected]> wrote:
>
> Chen Guokai <[email protected]> writes:
>
>> Add jump optimization support for RISC-V.
>
> Thank you for continuing to work on the series! I took the series for a
> spin, and ran into a number of issues that make me wonder how you test
> the series, and how the testing is different from my runs.
>
> [...snip...]
>
Xim <[email protected]> writes:
> Hi Björn,
>
> Thanks for your detailed review! I ran tests mainly on some syscall/timer
> related functions where these issues were not triggered. I will check all
> these issues, as well as the comments spread per-patch, before a new version
> of the patch set is sent.
>
> FYI, the 32b support is included and was tested with mostly the same cases as
> the 64b one.
Ok! Thank you for clarifying!
Björn
On 2023/1/4 17:12, Björn Töpel wrote:
> "liaochang (A)" <[email protected]> writes:
>
>>>> +SYM_ENTRY(optprobe_template_insn, SYM_L_GLOBAL, SYM_A_NONE)
>>>> + /*
>>>> + * Step3:
>>>> + * NOPS will be replaced by the probed instruction, at worst case 3 RVC
>>>> + * and 1 RVI instructions is about to execute out of line.
>>>> + */
>>>> + nop
>>>
>>> A nop here will be either a compressed nop or a non-compressed,
>>> depending on the build (C-enabled or not), right? Maybe be explicit to
>>> the assembler what you want?
>>>
>>
>> You are right: if CONFIG_RISCV_ISA_C is disabled, two NOPs are enough to execute 2 RVI instructions out of line;
>> if CONFIG_RISCV_ISA_C is enabled, it needs eight C.NOPs here for the worst case (3 RVC + 1 RVI).
>>
>> I will use {C}.NOP explicitly for the different configurations in the next revision, thanks.
>
> What I meant was that "nop" can expand to compressed instructions, and
> you should be explicit. So you know how it's expanded by the
> compiler/assembler.
>
> An example:
>
> $ cat bar.S
> .text
> bar:
> nop
> nop
> $ riscv64-linux-gnu-gcc -O2 -o bar.o -c bar.S && riscv64-linux-gnu-objdump -M no-aliases -d bar.o
>
> bar.o: file format elf64-littleriscv
>
>
> Disassembly of section .text:
>
> 0000000000000000 <bar>:
> 0: 0001 c.addi zero,0
> 2: 0001 c.addi zero,0
>
>
> vs
>
> $ cat foo.S
> .text
> foo:
> .option norvc
> nop
> nop
>
> $ riscv64-linux-gnu-gcc -O2 -o foo.o -c foo.S && riscv64-linux-gnu-objdump -M no-aliases -d foo.o
>
> foo.o: file format elf64-littleriscv
>
>
> Disassembly of section .text:
>
> 0000000000000000 <foo>:
> 0: 00000013 addi zero,zero,0
> 4: 00000013 addi zero,zero,0
The above examples are very clear; I will use these expanded instructions in the next revision, thanks.
>
>
> Björn
--
BR,
Liao, Chang