From: Liao Chang <[email protected]>
Add jump optimization support for RISC-V.
Replaces ebreak instructions used by normal kprobes with an
auipc+jalr instruction pair, at the aim of suppressing the probe-hit
overhead.
All known optprobe-capable RISC architectures have been using a single
jump or branch instructions while this patch chooses not. RISC-V has a
quite limited jump range (4KB or 2MB) for both its branch and jump
instructions, which prevent optimizations from supporting probes that
spread all over the kernel.
Auipc-jalr instruction pair is introduced with a much wider jump range
(4GB), where auipc loads the upper 12 bits to a free register and jalr
Deaconappends the lower 20 bits to form a 32 bit immediate. Note that
returns from probe handler requires another free register. As kprobes
can appear almost anywhere inside the kernel, the free register should
be found in a generic way, not depending on calling convention or any
other regulations.
The algorithm for finding the free register is inspired by the register
renaming in modern processors. From the perspective of register renaming,
a register could be represented as two different registers if two neighbour
instructions both write to it but no one ever reads. Extending this fact,
a register is considered to be free if there is no read before its next
write in the execution flow. We are free to change its value without
interfering normal execution.
Static analysis shows that 51% instructions of the kernel (default config)
is capable of being replaced i.e. one free register can be found at both
the start and end of replaced instruction pairs while the replaced
instructions can be directly executed.
Contribution:
Chen Guokai invents the algorithm of searching free register, evaluated
the ratio of optimizaion, the basic function for RVI kernel binary.
Liao Chang adds the support for hybrid RVI and RVC kernel binary, fix
some bugs with different kernel configurations, refactor out entire
feature into some individual patches.
v3:
1. Support of hybrid RVI and RVC kernel binary.
2. Refactor out entire feature into some individual patches.
v2:
1. Adjust comments
2. Remove improper copyright
3. Clean up format issues that is no common practice
4. Extract common definition of instruction decoder
5. Fix race issue in SMP platform.
v1:
Chen Guokai contribute the basic functionality code.
Liao Chang (8):
riscv/kprobe: Prepare the skeleton to implement RISCV OPTPROBES
feature
riscv/kprobe: Allocate detour buffer from module area
riscv/kprobe: Prepare the skeleton to prepare optimized kprobe
riscv/kprobe: Add common RVI and RVC instruction decoder code
riscv/kprobe: Search free register(s) to clobber for 'AUIPC/JALR'
riscv/kprobe: Add code to check if kprobe can be optimized
riscv/kprobe: Prepare detour buffer for optimized kprobe
riscv/kprobe: Patch AUIPC/JALR pair to optimize kprobe
arch/riscv/Kconfig | 1 +
arch/riscv/include/asm/bug.h | 5 +-
arch/riscv/include/asm/kprobes.h | 48 ++
arch/riscv/include/asm/patch.h | 1 +
arch/riscv/kernel/patch.c | 22 +-
arch/riscv/kernel/probes/Makefile | 1 +
arch/riscv/kernel/probes/decode-insn.h | 145 ++++++
arch/riscv/kernel/probes/kprobes.c | 25 +
arch/riscv/kernel/probes/opt.c | 559 ++++++++++++++++++++++
arch/riscv/kernel/probes/opt_trampoline.S | 137 ++++++
arch/riscv/kernel/probes/simulate-insn.h | 41 ++
11 files changed, 980 insertions(+), 5 deletions(-)
create mode 100644 arch/riscv/kernel/probes/opt.c
create mode 100644 arch/riscv/kernel/probes/opt_trampoline.S
--
2.17.1
From: Liao Chang <[email protected]>
This patch adds code that can be used to decode RVI and RVC instructions
in searching one register for 'AUIPC/JALR'. As mentioned in previous
patch, kprobe can't be optimized until one free integer register can be
found to save the jump target, in order to figure out the register
searching, all instructions starting from the kprobe point to the last
one of function needs to be decoded and tested if it contains one
candidate register.
For all RVI instruction format, the position and length of 'rs1', 'rs2'
,'rd' and 'opcode' part are uniform, but the rule of RVC instruction
format is more complicated, so it addresses a couple of inline functions
to decode rs1/rs2/rd for RVC.
These instruction decoders are supposed to be consistent with the RVC and
RV32/RV64G instruction set list specified in the riscv instruction
reference published at August 25, 2022.
Co-developed-by: Chen Guokai <[email protected]>
Signed-off-by: Chen Guokai <[email protected]>
Signed-off-by: Liao Chang <[email protected]>
---
arch/riscv/include/asm/bug.h | 5 +-
arch/riscv/kernel/probes/decode-insn.h | 145 +++++++++++++++++++++++
arch/riscv/kernel/probes/simulate-insn.h | 41 +++++++
3 files changed, 190 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/include/asm/bug.h b/arch/riscv/include/asm/bug.h
index 1aaea81fb141..9c33d3b58225 100644
--- a/arch/riscv/include/asm/bug.h
+++ b/arch/riscv/include/asm/bug.h
@@ -19,11 +19,14 @@
#define __BUG_INSN_32 _UL(0x00100073) /* ebreak */
#define __BUG_INSN_16 _UL(0x9002) /* c.ebreak */
+#define RVI_INSN_LEN 4UL
+#define RVC_INSN_LEN 2UL
+
#define GET_INSN_LENGTH(insn) \
({ \
unsigned long __len; \
__len = ((insn & __INSN_LENGTH_MASK) == __INSN_LENGTH_32) ? \
- 4UL : 2UL; \
+ RVI_INSN_LEN : RVC_INSN_LEN; \
__len; \
})
diff --git a/arch/riscv/kernel/probes/decode-insn.h b/arch/riscv/kernel/probes/decode-insn.h
index 42269a7d676d..1c202b0ac7d4 100644
--- a/arch/riscv/kernel/probes/decode-insn.h
+++ b/arch/riscv/kernel/probes/decode-insn.h
@@ -3,6 +3,7 @@
#ifndef _RISCV_KERNEL_KPROBES_DECODE_INSN_H
#define _RISCV_KERNEL_KPROBES_DECODE_INSN_H
+#include <linux/bitops.h>
#include <asm/sections.h>
#include <asm/kprobes.h>
@@ -15,4 +16,148 @@ enum probe_insn {
enum probe_insn __kprobes
riscv_probe_decode_insn(probe_opcode_t *addr, struct arch_probe_insn *asi);
+static inline u16 rvi_rs1(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 15) & 0x1f);
+}
+
+static inline u16 rvi_rs2(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 20) & 0x1f);
+}
+
+static inline u16 rvi_rd(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 7) & 0x1f);
+}
+
+static inline s32 rvi_branch_imme(kprobe_opcode_t opcode)
+{
+ u32 imme = 0;
+
+ imme |= (((opcode >> 8) & 0xf) << 1) |
+ (((opcode >> 25) & 0x3f) << 5) |
+ (((opcode >> 7) & 0x1) << 11) |
+ (((opcode >> 31) & 0x1) << 12);
+
+ return sign_extend32(imme, 13);
+}
+
+static inline s32 rvi_jal_imme(kprobe_opcode_t opcode)
+{
+ u32 imme = 0;
+
+ imme |= (((opcode >> 21) & 0x3ff) << 1) |
+ (((opcode >> 20) & 0x1) << 11) |
+ (((opcode >> 12) & 0xff) << 12) |
+ (((opcode >> 31) & 0x1) << 20);
+
+ return sign_extend32(imme, 21);
+}
+
+#ifdef CONFIG_RISCV_ISA_C
+static inline u16 rvc_r_rs1(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 2) & 0x1f);
+}
+
+static inline u16 rvc_r_rs2(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 2) & 0x1f);
+}
+
+static inline u16 rvc_r_rd(kprobe_opcode_t opcode)
+{
+ return rvc_r_rs1(opcode);
+}
+
+static inline u16 rvc_i_rs1(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 7) & 0x1f);
+}
+
+static inline u16 rvc_i_rd(kprobe_opcode_t opcode)
+{
+ return rvc_i_rs1(opcode);
+}
+
+static inline u16 rvc_ss_rs2(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 2) & 0x1f);
+}
+
+static inline u16 rvc_l_rd(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 2) & 0x7);
+}
+
+static inline u16 rvc_l_rs(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 7) & 0x7);
+}
+
+static inline u16 rvc_s_rs2(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 2) & 0x7);
+}
+
+static inline u16 rvc_s_rs1(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 7) & 0x7);
+}
+
+static inline u16 rvc_a_rs2(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 2) & 0x7);
+}
+
+static inline u16 rvc_a_rs1(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 7) & 0x7);
+}
+
+static inline u16 rvc_a_rd(kprobe_opcode_t opcode)
+{
+ return rvc_a_rs1(opcode);
+}
+
+static inline u16 rvc_b_rd(kprobe_opcode_t opcode)
+{
+ return (u16)((opcode >> 7) & 0x7);
+}
+
+static inline u16 rvc_b_rs(kprobe_opcode_t opcode)
+{
+ return rvc_b_rd(opcode);
+}
+
+static inline s32 rvc_branch_imme(kprobe_opcode_t opcode)
+{
+ u32 imme = 0;
+
+ imme |= (((opcode >> 3) & 0x3) << 1) |
+ (((opcode >> 10) & 0x3) << 3) |
+ (((opcode >> 2) & 0x1) << 5) |
+ (((opcode >> 5) & 0x3) << 6) |
+ (((opcode >> 12) & 0x1) << 8);
+
+ return sign_extend32(imme, 9);
+}
+
+static inline s32 rvc_jal_imme(kprobe_opcode_t opcode)
+{
+ u32 imme = 0;
+
+ imme |= (((opcode >> 3) & 0x3) << 1) |
+ (((opcode >> 11) & 0x1) << 4) |
+ (((opcode >> 2) & 0x1) << 5) |
+ (((opcode >> 7) & 0x1) << 6) |
+ (((opcode >> 6) & 0x1) << 7) |
+ (((opcode >> 9) & 0x3) << 8) |
+ (((opcode >> 8) & 0x1) << 10) |
+ (((opcode >> 12) & 0x1) << 11);
+
+ return sign_extend32(imme, 12);
+}
+#endif /* CONFIG_RISCV_ISA_C */
#endif /* _RISCV_KERNEL_KPROBES_DECODE_INSN_H */
diff --git a/arch/riscv/kernel/probes/simulate-insn.h b/arch/riscv/kernel/probes/simulate-insn.h
index cb6ff7dccb92..74d8c1ba9064 100644
--- a/arch/riscv/kernel/probes/simulate-insn.h
+++ b/arch/riscv/kernel/probes/simulate-insn.h
@@ -37,6 +37,40 @@ __RISCV_INSN_FUNCS(c_jalr, 0xf007, 0x9002);
__RISCV_INSN_FUNCS(c_beqz, 0xe003, 0xc001);
__RISCV_INSN_FUNCS(c_bnez, 0xe003, 0xe001);
__RISCV_INSN_FUNCS(c_ebreak, 0xffff, 0x9002);
+/* RVC(S) instructions contain rs1 and rs2 */
+__RISCV_INSN_FUNCS(c_sq, 0xe003, 0xa000);
+__RISCV_INSN_FUNCS(c_sw, 0xe003, 0xc000);
+__RISCV_INSN_FUNCS(c_sd, 0xe003, 0xe000);
+/* RVC(A) instructions contain rs1 and rs2 */
+__RISCV_INSN_FUNCS(c_sub, 0xfc03, 0x8c01);
+__RISCV_INSN_FUNCS(c_subw, 0xfc43, 0x9c01);
+/* RVC(L) instructions contain rs1 */
+__RISCV_INSN_FUNCS(c_lq, 0xe003, 0x2000);
+__RISCV_INSN_FUNCS(c_lw, 0xe003, 0x4000);
+__RISCV_INSN_FUNCS(c_ld, 0xe003, 0x6000);
+/* RVC(I) instructions contain rs1 */
+__RISCV_INSN_FUNCS(c_addi, 0xe003, 0x0001);
+__RISCV_INSN_FUNCS(c_addiw, 0xe003, 0x2001);
+__RISCV_INSN_FUNCS(c_addi16sp, 0xe183, 0x6101);
+__RISCV_INSN_FUNCS(c_slli, 0xe003, 0x0002);
+/* RVC(B) instructions contain rs1 */
+__RISCV_INSN_FUNCS(c_sri, 0xe803, 0x8001);
+__RISCV_INSN_FUNCS(c_andi, 0xec03, 0x8801);
+/* RVC(SS) instructions contain rs2 */
+__RISCV_INSN_FUNCS(c_sqsp, 0xe003, 0xa002);
+__RISCV_INSN_FUNCS(c_swsp, 0xe003, 0xc002);
+__RISCV_INSN_FUNCS(c_sdsp, 0xe003, 0xe002);
+/* RVC(R) instructions contain rs2 and rd */
+__RISCV_INSN_FUNCS(c_mv, 0xe003, 0x8002);
+/* RVC(I) instructions contain sp and rd */
+__RISCV_INSN_FUNCS(c_lqsp, 0xe003, 0x2002);
+__RISCV_INSN_FUNCS(c_lwsp, 0xe003, 0x4002);
+__RISCV_INSN_FUNCS(c_ldsp, 0xe003, 0x6002);
+/* RVC(CW) instructions contain sp and rd */
+__RISCV_INSN_FUNCS(c_addi4spn, 0xe003, 0x0000);
+/* RVC(I) instructions contain rd */
+__RISCV_INSN_FUNCS(c_li, 0xe003, 0x4001);
+__RISCV_INSN_FUNCS(c_lui, 0xe003, 0x6001);
__RISCV_INSN_FUNCS(auipc, 0x7f, 0x17);
__RISCV_INSN_FUNCS(branch, 0x7f, 0x63);
@@ -44,4 +78,11 @@ __RISCV_INSN_FUNCS(branch, 0x7f, 0x63);
__RISCV_INSN_FUNCS(jal, 0x7f, 0x6f);
__RISCV_INSN_FUNCS(jalr, 0x707f, 0x67);
+__RISCV_INSN_FUNCS(arith_rr, 0x77, 0x33);
+__RISCV_INSN_FUNCS(arith_ri, 0x77, 0x13);
+__RISCV_INSN_FUNCS(lui, 0x7f, 0x37);
+__RISCV_INSN_FUNCS(load, 0x7f, 0x03);
+__RISCV_INSN_FUNCS(store, 0x7f, 0x23);
+__RISCV_INSN_FUNCS(amo, 0x7f, 0x2f);
+
#endif /* _RISCV_KERNEL_PROBES_SIMULATE_INSN_H */
--
2.25.1
From: Liao Chang <[email protected]>
This patch introduces code to prepare instruction slot for optimized
kprobe. The instruction slot for regular kprobe just records two
instructions, first one is the original instruction replaced by EBREAK,
the second one is EBREAK for single-step. While instruction slot for
optimized kprobe is larger, beside execute instruction out-of-line, it
also contains a standalone stackframe for calling kprobe handler.
All optimized instruction slots consis of 5 major parts, which copied
from the assembly code template in opt_trampoline.S.
SAVE REGS
CALL optimized_callback
RESTORE REGS
EXECUTE INSNS OUT-OF-LINE
RETURN BACK
Although most instructions in each slot are same, these slots still have
a bit difference in their payload, it is caused by three parts:
- 'CALL optimized_callback', the relative offset for 'call'
instruction is different for each kprobe.
- 'EXECUTE INSN OUT-OF-LINE', no doubt.
- 'RETURN BACK', the chosen free register is reused here as the
destination register of jumping back.
So we also need customize the slot payload for each optimized kprobe.
Co-developed-by: Chen Guokai <[email protected]>
Signed-off-by: Chen Guokai <[email protected]>
Signed-off-by: Liao Chang <[email protected]>
---
arch/riscv/include/asm/kprobes.h | 16 +++
arch/riscv/kernel/probes/opt.c | 75 +++++++++++++
arch/riscv/kernel/probes/opt_trampoline.S | 125 ++++++++++++++++++++++
3 files changed, 216 insertions(+)
diff --git a/arch/riscv/include/asm/kprobes.h b/arch/riscv/include/asm/kprobes.h
index 22b73a2fd1fd..a9ef864f7225 100644
--- a/arch/riscv/include/asm/kprobes.h
+++ b/arch/riscv/include/asm/kprobes.h
@@ -48,10 +48,26 @@ void __kprobes *trampoline_probe_handler(struct pt_regs *regs);
/* optinsn template addresses */
extern __visible kprobe_opcode_t optprobe_template_entry[];
extern __visible kprobe_opcode_t optprobe_template_end[];
+extern __visible kprobe_opcode_t optprobe_template_save[];
+extern __visible kprobe_opcode_t optprobe_template_call[];
+extern __visible kprobe_opcode_t optprobe_template_insn[];
+extern __visible kprobe_opcode_t optprobe_template_return[];
#define MAX_OPTINSN_SIZE \
((unsigned long)optprobe_template_end - \
(unsigned long)optprobe_template_entry)
+#define DETOUR_SAVE_OFFSET \
+ ((unsigned long)optprobe_template_save - \
+ (unsigned long)optprobe_template_entry)
+#define DETOUR_CALL_OFFSET \
+ ((unsigned long)optprobe_template_call - \
+ (unsigned long)optprobe_template_entry)
+#define DETOUR_INSN_OFFSET \
+ ((unsigned long)optprobe_template_insn - \
+ (unsigned long)optprobe_template_entry)
+#define DETOUR_RETURN_OFFSET \
+ ((unsigned long)optprobe_template_return - \
+ (unsigned long)optprobe_template_entry)
/*
* For RVI and RVC hybird encoding kernel, althought long jump just needs
diff --git a/arch/riscv/kernel/probes/opt.c b/arch/riscv/kernel/probes/opt.c
index 876bec539554..77248ed7d4e8 100644
--- a/arch/riscv/kernel/probes/opt.c
+++ b/arch/riscv/kernel/probes/opt.c
@@ -11,9 +11,37 @@
#include <linux/kprobes.h>
#include <asm/kprobes.h>
#include <asm/patch.h>
+#include <asm/asm-offsets.h>
#include "simulate-insn.h"
#include "decode-insn.h"
+#include "../../net/bpf_jit.h"
+
+static void
+optimized_callback(struct optimized_kprobe *op, struct pt_regs *regs)
+{
+ unsigned long flags;
+ struct kprobe_ctlblk *kcb;
+
+ /* Save skipped registers */
+ regs->epc = (unsigned long)op->kp.addr;
+ regs->orig_a0 = ~0UL;
+
+ local_irq_save(flags);
+ kcb = get_kprobe_ctlblk();
+
+ if (kprobe_running()) {
+ kprobes_inc_nmissed_count(&op->kp);
+ } else {
+ __this_cpu_write(current_kprobe, &op->kp);
+ kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+ opt_pre_handler(&op->kp, regs);
+ __this_cpu_write(current_kprobe, NULL);
+ }
+ local_irq_restore(flags);
+}
+
+NOKPROBE_SYMBOL(optimized_callback)
static inline int in_auipc_jalr_range(long val)
{
@@ -30,6 +58,11 @@ static inline int in_auipc_jalr_range(long val)
#endif
}
+#define DETOUR_ADDR(code, offs) \
+ ((void *)((unsigned long)(code) + (offs)))
+#define DETOUR_INSN(code, offs) \
+ (*(kprobe_opcode_t *)((unsigned long)(code) + (offs)))
+
/*
* Copy optprobe assembly code template into detour buffer and modify some
* instructions for each kprobe.
@@ -38,6 +71,49 @@ static void prepare_detour_buffer(kprobe_opcode_t *code, kprobe_opcode_t *slot,
int rd, struct optimized_kprobe *op,
kprobe_opcode_t opcode)
{
+ long offs;
+ unsigned long data;
+
+ memcpy(code, optprobe_template_entry, MAX_OPTINSN_SIZE);
+
+ /* Step1: record optimized_kprobe pointer into detour buffer */
+ memcpy(DETOUR_ADDR(code, DETOUR_SAVE_OFFSET), &op, sizeof(op));
+
+ /*
+ * Step2
+ * auipc ra, 0 --> aupic ra, HI20.{optimized_callback - pc}
+ * jalr ra, 0(ra) --> jalr ra, LO12.{optimized_callback - pc}(ra)
+ */
+ offs = (unsigned long)&optimized_callback -
+ (unsigned long)DETOUR_ADDR(slot, DETOUR_CALL_OFFSET);
+ DETOUR_INSN(code, DETOUR_CALL_OFFSET) =
+ rv_auipc(1, (offs + (1 << 11)) >> 12);
+ DETOUR_INSN(code, DETOUR_CALL_OFFSET + 0x4) =
+ rv_jalr(1, 1, offs & 0xFFF);
+
+ /* Step3: copy replaced instructions into detour buffer */
+ memcpy(DETOUR_ADDR(code, DETOUR_INSN_OFFSET), op->kp.addr,
+ op->optinsn.length);
+ memcpy(DETOUR_ADDR(code, DETOUR_INSN_OFFSET), &opcode,
+ GET_INSN_LENGTH(opcode));
+
+ /* Step4: record return address of long jump into detour buffer */
+ data = (unsigned long)op->kp.addr + op->optinsn.length;
+ memcpy(DETOUR_ADDR(code, DETOUR_RETURN_OFFSET), &data, sizeof(data));
+
+ /*
+ * Step5
+ * auipc ra, 0 --> auipc rd, 0
+ * ld/w ra, -4(ra) --> ld/w rd, -8(rd)
+ * jalr x0, 0(ra) --> jalr x0, 0(rd)
+ */
+ DETOUR_INSN(code, DETOUR_RETURN_OFFSET + 0x8) = rv_auipc(rd, 0);
+#if __riscv_xlen == 32
+ DETOUR_INSN(code, DETOUR_RETURN_OFFSET + 0xC) = rv_lw(rd, -8, rd);
+#else
+ DETOUR_INSN(code, DETOUR_RETURN_OFFSET + 0xC) = rv_ld(rd, -8, rd);
+#endif
+ DETOUR_INSN(code, DETOUR_RETURN_OFFSET + 0x10) = rv_jalr(0, rd, 0);
}
/* Registers the first usage of which is the destination of instruction */
diff --git a/arch/riscv/kernel/probes/opt_trampoline.S b/arch/riscv/kernel/probes/opt_trampoline.S
index 16160c4367ff..75e34e373cf2 100644
--- a/arch/riscv/kernel/probes/opt_trampoline.S
+++ b/arch/riscv/kernel/probes/opt_trampoline.S
@@ -1,12 +1,137 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Copyright (C) 2022 Guokai Chen
+ * Copyright (C) 2022 Liao, Chang <[email protected]>
*/
#include <linux/linkage.h>
+#include <asm/asm.h>
#incldue <asm/csr.h>
#include <asm/asm-offsets.h>
SYM_ENTRY(optprobe_template_entry, SYM_L_GLOBAL, SYM_A_NONE)
+ addi sp, sp, -(PT_SIZE_ON_STACK)
+ REG_S x1, PT_RA(sp)
+ REG_S x2, PT_SP(sp)
+ REG_S x3, PT_GP(sp)
+ REG_S x4, PT_TP(sp)
+ REG_S x5, PT_T0(sp)
+ REG_S x6, PT_T1(sp)
+ REG_S x7, PT_T2(sp)
+ REG_S x8, PT_S0(sp)
+ REG_S x9, PT_S1(sp)
+ REG_S x10, PT_A0(sp)
+ REG_S x11, PT_A1(sp)
+ REG_S x12, PT_A2(sp)
+ REG_S x13, PT_A3(sp)
+ REG_S x14, PT_A4(sp)
+ REG_S x15, PT_A5(sp)
+ REG_S x16, PT_A6(sp)
+ REG_S x17, PT_A7(sp)
+ REG_S x18, PT_S2(sp)
+ REG_S x19, PT_S3(sp)
+ REG_S x20, PT_S4(sp)
+ REG_S x21, PT_S5(sp)
+ REG_S x22, PT_S6(sp)
+ REG_S x23, PT_S7(sp)
+ REG_S x24, PT_S8(sp)
+ REG_S x25, PT_S9(sp)
+ REG_S x26, PT_S10(sp)
+ REG_S x27, PT_S11(sp)
+ REG_S x28, PT_T3(sp)
+ REG_S x29, PT_T4(sp)
+ REG_S x30, PT_T5(sp)
+ REG_S x31, PT_T6(sp)
+ /* Update fp is friendly for stacktrace */
+ addi s0, sp, (PT_SIZE_ON_STACK)
+ j 1f
+
+SYM_ENTRY(optprobe_template_save, SYM_L_GLOBAL, SYM_A_NONE)
+ /*
+ * Step1:
+ * Filled with the pointer to optimized_kprobe data
+ */
+ .dword 0
+1:
+ /* Load optimize_kprobe pointer from .dword below */
+ auipc a0, 0
+ REG_L a0, -8(a0)
+ add a1, sp, x0
+
+SYM_ENTRY(optprobe_template_call, SYM_L_GLOBAL, SYM_A_NONE)
+ /*
+ * Step2:
+ * <IMME> of AUIPC/JALR are modified to the offset to optimized_callback
+ * jump target is loaded from above .dword.
+ */
+ auipc ra, 0
+ jalr ra, 0(ra)
+
+ REG_L x1, PT_RA(sp)
+ REG_L x3, PT_GP(sp)
+ REG_L x4, PT_TP(sp)
+ REG_L x5, PT_T0(sp)
+ REG_L x6, PT_T1(sp)
+ REG_L x7, PT_T2(sp)
+ REG_L x8, PT_S0(sp)
+ REG_L x9, PT_S1(sp)
+ REG_L x10, PT_A0(sp)
+ REG_L x11, PT_A1(sp)
+ REG_L x12, PT_A2(sp)
+ REG_L x13, PT_A3(sp)
+ REG_L x14, PT_A4(sp)
+ REG_L x15, PT_A5(sp)
+ REG_L x16, PT_A6(sp)
+ REG_L x17, PT_A7(sp)
+ REG_L x18, PT_S2(sp)
+ REG_L x19, PT_S3(sp)
+ REG_L x20, PT_S4(sp)
+ REG_L x21, PT_S5(sp)
+ REG_L x22, PT_S6(sp)
+ REG_L x23, PT_S7(sp)
+ REG_L x24, PT_S8(sp)
+ REG_L x25, PT_S9(sp)
+ REG_L x26, PT_S10(sp)
+ REG_L x27, PT_S11(sp)
+ REG_L x28, PT_T3(sp)
+ REG_L x29, PT_T4(sp)
+ REG_L x30, PT_T5(sp)
+ REG_L x31, PT_T6(sp)
+ REG_L x2, PT_SP(sp)
+ addi sp, sp, (PT_SIZE_ON_STACK)
+
+SYM_ENTRY(optprobe_template_insn, SYM_L_GLOBAL, SYM_A_NONE)
+ /*
+ * Step3:
+ * NOPS will be replaced by the probed instruction, at worst case 3 RVC
+ * and 1 RVI instructions is about to execute out of line.
+ */
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ j 2f
+
+SYM_ENTRY(optprobe_template_return, SYM_L_GLOBAL, SYM_A_NONE)
+ /*
+ * Step4:
+ * Filled with the return address of long jump(AUIPC/JALR)
+ */
+ .dword 0
+2:
+ /*
+ * Step5:
+ * The <RA> of AUIPC/LD/JALR will be replaced for each kprobe,
+ * used to read return address saved in .dword above.
+ */
+ auipc ra, 0
+ REG_L ra, -8(ra)
+ jalr x0, 0(ra)
SYM_ENTRY(optprobe_template_end, SYM_L_GLOBAL, SYM_A_NONE)
--
2.25.1