This series makes atomic code patching possible in riscv ftrace. A
direct benefit of this is that we can get rid of stop_machine() when
patching function entries. This also makes it possible to run ftrace
with full kernel preemption. Before this series, the kernel initializes
patchable function entries to NOP4 + NOP4. To start tracing, it updates
entries to AUIPC + JALR while holding other cores in stop_machine().
stop_machine() is required because it is impossible to update two
instructions and have the update be observed atomically. And preemption
must be disabled, because kernel preemption allows a process to be
scheduled out while executing one of these instruction pairs.
This series addresses the problem by initializing the first NOP4 to
AUIPC. Atomic patching then becomes possible because the kernel only has
to update one instruction. As long as that instruction is naturally
aligned, it is expected to be updated atomically.
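As a minimal illustrative sketch (not part of the series; the helper
name is hypothetical, and the instruction constants mirror those in
arch/riscv/include/asm/ftrace.h), the two states of a patchable entry
differ only in the second 4-byte word:
    #include <linux/types.h>
    #define NOP4       0x00000013  /* addi x0, x0, 0 */
    #define AUIPC_T0   0x00000297  /* auipc t0, 0    */
    #define JALR_T0    0x000282e7  /* jalr  t0, 0(t0) */
    /* Hypothetical helper: build the 8-byte entry for a given state. */
    static void fill_patchable_entry(u32 insn[2], unsigned long pc,
                                     unsigned long tramp, bool enable)
    {
            long offset = tramp - pc;
            /* auipc t0, hi20 -- written once at init, never changed */
            insn[0] = AUIPC_T0 | (u32)((offset + 0x800) & 0xfffff000);
            /* runtime patching only flips this word: JALR <-> NOP */
            insn[1] = enable ? (JALR_T0 | (u32)((offset & 0xfff) << 20))
                             : NOP4;
    }
Since the AUIPC word never changes after initialization, enabling or
disabling tracing is a single naturally-aligned 4-byte store.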
However, after applying this series the address range reachable from a
patched entry is limited to +-2KB around ftrace_caller. This limitation
is expected to be solved by Puranjay's CALL_OPS, which adds an 8-byte
naturally aligned data word in front of each patchable function and can
use it to direct execution out to any custom trampoline.
The series is composed of two parts. The first part (patches 1-3) cleans
up existing issues that were found during testing and are not caused by
this implementation. The second part (patches 4-6) modifies the ftrace
code patching mechanism as described above. Finally, patches 7 and 8
prepare ftrace to run with kernel preemption.
---
Andy Chiu (8):
riscv: stacktrace: convert arch_stack_walk() to noinstr
tracing: do not trace kernel_text_address()
riscv: ftrace: support fastcc in Clang for WITH_ARGS
riscv: ftrace: align patchable functions to 4 Byte boundary
riscv: ftrace: prepare ftrace for atomic code patching
riscv: ftrace: do not use stop_machine to update code
riscv: vector: Support calling schedule() for preemptible Vector
riscv: ftrace: support PREEMPT
arch/riscv/Kconfig | 3 +-
arch/riscv/Makefile | 7 +-
arch/riscv/include/asm/ftrace.h | 11 +++
arch/riscv/include/asm/processor.h | 5 ++
arch/riscv/include/asm/vector.h | 22 +++++-
arch/riscv/kernel/asm-offsets.c | 7 ++
arch/riscv/kernel/ftrace.c | 133 ++++++++++++++++---------------------
arch/riscv/kernel/mcount-dyn.S | 25 +++++--
arch/riscv/kernel/stacktrace.c | 2 +-
kernel/extable.c | 4 +-
10 files changed, 129 insertions(+), 90 deletions(-)
---
base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0
change-id: 20240613-dev-andyc-dyn-ftrace-v4-941d4a00ea19
Best regards,
--
Andy Chiu <[email protected]>
arch_stack_walk() is called intensively in function_graph when the
kernel is compiled with CONFIG_TRACE_IRQFLAGS. As a result, the kernel
logs a lot of arch_stack_walk and its sub-functions into the ftrace
buffer. However, these functions should not appear in the trace log
because they are part of ftrace itself. This patch follows what arm64
does for the same function. It also prevents a possible kprobe re-entry
issue, which can happen on riscv as well.
Related-to: commit 0fbcd8abf337 ("arm64: Prohibit instrumentation on arch_stack_walk()")
Fixes: 680341382da5 ("riscv: add CALLER_ADDRx support")
Signed-off-by: Andy Chiu <[email protected]>
---
arch/riscv/kernel/stacktrace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/riscv/kernel/stacktrace.c b/arch/riscv/kernel/stacktrace.c
index 528ec7cc9a62..0d3f00eb0bae 100644
--- a/arch/riscv/kernel/stacktrace.c
+++ b/arch/riscv/kernel/stacktrace.c
@@ -156,7 +156,7 @@ unsigned long __get_wchan(struct task_struct *task)
return pc;
}
-noinline void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
+noinline noinstr void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
struct task_struct *task, struct pt_regs *regs)
{
walk_stackframe(task, regs, consume_entry, cookie);
--
2.43.0
kernel_text_address() and __kernel_text_address() are called in
arch_stack_walk() on riscv. This results in an excessive amount of
unrelated traces when the kernel is compiled with CONFIG_TRACE_IRQFLAGS.
The situation worsens when function_graph is active, as it calls
local_irq_save/restore in each function's entry/exit. This patch marks
both functions as notrace, so they won't show up in the trace records.
Signed-off-by: Andy Chiu <[email protected]>
---
kernel/extable.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/extable.c b/kernel/extable.c
index 71f482581cab..d03fa462fa8b 100644
--- a/kernel/extable.c
+++ b/kernel/extable.c
@@ -74,7 +74,7 @@ int notrace core_kernel_text(unsigned long addr)
return 0;
}
-int __kernel_text_address(unsigned long addr)
+int notrace __kernel_text_address(unsigned long addr)
{
if (kernel_text_address(addr))
return 1;
@@ -91,7 +91,7 @@ int __kernel_text_address(unsigned long addr)
return 0;
}
-int kernel_text_address(unsigned long addr)
+int notrace kernel_text_address(unsigned long addr)
{
bool no_rcu;
int ret = 1;
--
2.43.0
Some caller-saved registers which are not defined as function arguments
in the ABI can still be passed as arguments when the kernel is compiled
with Clang [1]. As a result, we must save and restore those registers to
prevent ftrace from clobbering them.
- [1]: https://reviews.llvm.org/D68559
Reported-by: Evgenii Shatokhin <[email protected]>
Closes: https://lore.kernel.org/linux-riscv/[email protected]/
Signed-off-by: Andy Chiu <[email protected]>
---
arch/riscv/include/asm/ftrace.h | 7 +++++++
arch/riscv/kernel/asm-offsets.c | 7 +++++++
arch/riscv/kernel/mcount-dyn.S | 16 ++++++++++++++--
3 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
index 9eb31a7ea0aa..5f81c53dbfd9 100644
--- a/arch/riscv/include/asm/ftrace.h
+++ b/arch/riscv/include/asm/ftrace.h
@@ -144,6 +144,13 @@ struct ftrace_regs {
unsigned long a5;
unsigned long a6;
unsigned long a7;
+#ifdef CONFIG_CC_IS_CLANG
+ unsigned long t2;
+ unsigned long t3;
+ unsigned long t4;
+ unsigned long t5;
+ unsigned long t6;
+#endif
};
};
};
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index b09ca5f944f7..db5a26fcc9ae 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -497,6 +497,13 @@ void asm_offsets(void)
DEFINE(FREGS_SP, offsetof(struct ftrace_regs, sp));
DEFINE(FREGS_S0, offsetof(struct ftrace_regs, s0));
DEFINE(FREGS_T1, offsetof(struct ftrace_regs, t1));
+#ifdef CONFIG_CC_IS_CLANG
+ DEFINE(FREGS_T2, offsetof(struct ftrace_regs, t2));
+ DEFINE(FREGS_T3, offsetof(struct ftrace_regs, t3));
+ DEFINE(FREGS_T4, offsetof(struct ftrace_regs, t4));
+ DEFINE(FREGS_T5, offsetof(struct ftrace_regs, t5));
+ DEFINE(FREGS_T6, offsetof(struct ftrace_regs, t6));
+#endif
DEFINE(FREGS_A0, offsetof(struct ftrace_regs, a0));
DEFINE(FREGS_A1, offsetof(struct ftrace_regs, a1));
DEFINE(FREGS_A2, offsetof(struct ftrace_regs, a2));
diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
index 745dd4c4a69c..e988bd26b28b 100644
--- a/arch/riscv/kernel/mcount-dyn.S
+++ b/arch/riscv/kernel/mcount-dyn.S
@@ -96,7 +96,13 @@
REG_S x8, FREGS_S0(sp)
#endif
REG_S x6, FREGS_T1(sp)
-
+#ifdef CONFIG_CC_IS_CLANG
+ REG_S x7, FREGS_T2(sp)
+ REG_S x28, FREGS_T3(sp)
+ REG_S x29, FREGS_T4(sp)
+ REG_S x30, FREGS_T5(sp)
+ REG_S x31, FREGS_T6(sp)
+#endif
// save the arguments
REG_S x10, FREGS_A0(sp)
REG_S x11, FREGS_A1(sp)
@@ -115,7 +121,13 @@
REG_L x8, FREGS_S0(sp)
#endif
REG_L x6, FREGS_T1(sp)
-
+#ifdef CONFIG_CC_IS_CLANG
+ REG_L x7, FREGS_T2(sp)
+ REG_L x28, FREGS_T3(sp)
+ REG_L x29, FREGS_T4(sp)
+ REG_L x30, FREGS_T5(sp)
+ REG_L x31, FREGS_T6(sp)
+#endif
// restore the arguments
REG_L x10, FREGS_A0(sp)
REG_L x11, FREGS_A1(sp)
--
2.43.0
We are changing ftrace code patching in order to remove the dependency
on stop_machine() and to enable kernel preemption. This requires
function entries to be aligned to a 4-byte boundary.
However, -falign-functions alone on older versions of GCC is not strong
enough to align all functions. In fact, cold functions are not aligned
after turning on optimizations. We consider this a bug in GCC and turn
off guess-branch-probability as a workaround to align all functions.
GCC bug id: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345
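For illustration only (hypothetical code, not from the patch), this is
the kind of layout that can trigger the problem: on affected GCC
versions, a function the compiler considers cold (for example one only
reachable from an unlikely() path, or an outlined *.cold part) may not
honour -falign-functions=4, so its patchable entry can start on a
2-byte boundary and defeat atomic 4-byte patching.
    #include <linux/kernel.h>
    #include <linux/errno.h>
    /* May be treated as cold, since it is only called on an unlikely path. */
    static noinline void report_rare_failure(int err)
    {
            pr_err("rare failure: %d\n", err);
    }
    int do_work(int x)
    {
            if (unlikely(x < 0)) {          /* cold path */
                    report_rare_failure(x);
                    return -EINVAL;
            }
            return x * 2;
    }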
The option -fmin-function-alignment is able to align all functions
properly on newer versions of GCC. So, we add a cc-option test to check
whether the toolchain supports it.
Suggested-by: Evgenii Shatokhin <[email protected]>
Signed-off-by: Andy Chiu <[email protected]>
---
arch/riscv/Kconfig | 1 +
arch/riscv/Makefile | 7 ++++++-
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index b94176e25be1..80b8d48e1e46 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -203,6 +203,7 @@ config CLANG_SUPPORTS_DYNAMIC_FTRACE
config GCC_SUPPORTS_DYNAMIC_FTRACE
def_bool CC_IS_GCC
depends on $(cc-option,-fpatchable-function-entry=8)
+ depends on $(cc-option,-fmin-function-alignment=4) || !RISCV_ISA_C
config HAVE_SHADOW_CALL_STACK
def_bool $(cc-option,-fsanitize=shadow-call-stack)
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 06de9d365088..74628ad8dcf8 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -14,8 +14,13 @@ endif
ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
LDFLAGS_vmlinux += --no-relax
KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
+ifeq ($(CONFIG_CC_IS_CLANG),y)
+ cflags_ftrace_align := -falign-functions=4
+else
+ cflags_ftrace_align := -fmin-function-alignment=4
+endif
ifeq ($(CONFIG_RISCV_ISA_C),y)
- CC_FLAGS_FTRACE := -fpatchable-function-entry=4
+ CC_FLAGS_FTRACE := -fpatchable-function-entry=4 $(cflags_ftrace_align)
else
CC_FLAGS_FTRACE := -fpatchable-function-entry=2
endif
--
2.43.0
We use an AUIPC+JALR pair to jump into the ftrace trampoline. Since
instruction fetch can be split into 4-byte accesses, it is impossible
to update the two instructions without a race. To mitigate this, we
initialize the patchable entry to AUIPC + NOP4. The run-time code
patching then only has to change NOP4 to JALR to enable/disable ftrace
for a function. This limits the reach of each ftrace entry to within
+-2KB of ftrace_caller.
Starting from the trampoline, we add a level of indirection for it to
reach the ftrace caller target. It now loads the target address from a
memory location and then performs the jump. This enables the kernel to
update the target atomically.
The ordering of reading/updating the target address is guarded by the
generic ftrace code, which sends out an smp_rmb IPI.
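A minimal sketch of the indirection, assuming the same names as in the
diff below (set_call_dest() is a hypothetical, simplified stand-in for
__ftrace_modify_call_site()):
    #include <linux/ftrace.h>
    ftrace_func_t ftrace_call_dest = ftrace_stub;
    static void set_call_dest(ftrace_func_t target, bool enable)
    {
            /* A single aligned pointer store: the trampoline's
             * "REG_L ra, ftrace_call_dest; jalr 0(ra)" observes either
             * the old or the new destination, never a torn value.
             * Cross-CPU ordering is handled by the generic ftrace core. */
            WRITE_ONCE(ftrace_call_dest, enable ? target : &ftrace_stub);
    }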
Signed-off-by: Andy Chiu <[email protected]>
---
arch/riscv/include/asm/ftrace.h | 4 +++
arch/riscv/kernel/ftrace.c | 80 ++++++++++++++++++++++++++---------------
arch/riscv/kernel/mcount-dyn.S | 9 +++--
3 files changed, 62 insertions(+), 31 deletions(-)
diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
index 5f81c53dbfd9..7199383f8c02 100644
--- a/arch/riscv/include/asm/ftrace.h
+++ b/arch/riscv/include/asm/ftrace.h
@@ -81,6 +81,7 @@ struct dyn_arch_ftrace {
#define JALR_T0 (0x000282e7)
#define AUIPC_T0 (0x00000297)
#define NOP4 (0x00000013)
+#define JALR_RANGE (JALR_SIGN_MASK - 1)
#define to_jalr_t0(offset) \
(((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_T0)
@@ -118,6 +119,9 @@ do { \
* Let auipc+jalr be the basic *mcount unit*, so we make it 8 bytes here.
*/
#define MCOUNT_INSN_SIZE 8
+#define MCOUNT_AUIPC_SIZE 4
+#define MCOUNT_JALR_SIZE 4
+#define MCOUNT_NOP4_SIZE 4
#ifndef __ASSEMBLY__
struct dyn_ftrace;
diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index 87cbd86576b2..f3b09f2d3ecc 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -64,42 +64,64 @@ static int ftrace_check_current_call(unsigned long hook_pos,
return 0;
}
-static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target,
- bool enable, bool ra)
+static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target, bool validate)
{
unsigned int call[2];
- unsigned int nops[2] = {NOP4, NOP4};
+ unsigned int replaced[2];
+
+ make_call_t0(hook_pos, target, call);
- if (ra)
- make_call_ra(hook_pos, target, call);
- else
- make_call_t0(hook_pos, target, call);
+ if (validate) {
+ /*
+ * Read the text we want to modify;
+ * return must be -EFAULT on read error
+ */
+ if (copy_from_kernel_nofault(replaced, (void *)hook_pos,
+ MCOUNT_INSN_SIZE))
+ return -EFAULT;
+
+ if (replaced[0] != call[0]) {
+ pr_err("%p: expected (%08x) but got (%08x)\n",
+ (void *)hook_pos, call[0], replaced[0]);
+ return -EINVAL;
+ }
+ }
- /* Replace the auipc-jalr pair at once. Return -EPERM on write error. */
- if (patch_insn_write((void *)hook_pos, enable ? call : nops, MCOUNT_INSN_SIZE))
+ /* Replace the jalr at once. Return -EPERM on write error. */
+ if (patch_insn_write((void *)(hook_pos + MCOUNT_AUIPC_SIZE), call + 1, MCOUNT_JALR_SIZE))
return -EPERM;
return 0;
}
-int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
+static int __ftrace_modify_call_site(ftrace_func_t *hook_pos, ftrace_func_t target, bool enable)
{
- unsigned int call[2];
+ ftrace_func_t call = target;
+ ftrace_func_t nops = &ftrace_stub;
- make_call_t0(rec->ip, addr, call);
-
- if (patch_insn_write((void *)rec->ip, call, MCOUNT_INSN_SIZE))
- return -EPERM;
+ WRITE_ONCE(*hook_pos, enable ? call : nops);
return 0;
}
+int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
+{
+ unsigned long distance, orig_addr;
+
+ orig_addr = (unsigned long)&ftrace_caller;
+ distance = addr > orig_addr ? addr - orig_addr : orig_addr - addr;
+ if (distance > JALR_RANGE)
+ return -EINVAL;
+
+ return __ftrace_modify_call(rec->ip, addr, false);
+}
+
int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
unsigned long addr)
{
- unsigned int nops[2] = {NOP4, NOP4};
+ unsigned int nops[1] = {NOP4};
- if (patch_insn_write((void *)rec->ip, nops, MCOUNT_INSN_SIZE))
+ if (patch_insn_write((void *)(rec->ip + MCOUNT_AUIPC_SIZE), nops, MCOUNT_NOP4_SIZE))
return -EPERM;
return 0;
@@ -114,10 +136,14 @@ int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
*/
int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
{
+ unsigned int nops[2];
int out;
+ make_call_t0(rec->ip, &ftrace_caller, nops);
+ nops[1] = NOP4;
+
mutex_lock(&text_mutex);
- out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
+ out = patch_insn_write((void *)rec->ip, nops, MCOUNT_INSN_SIZE);
mutex_unlock(&text_mutex);
if (!mod)
@@ -126,12 +152,10 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
return out;
}
+ftrace_func_t ftrace_call_dest = ftrace_stub;
int ftrace_update_ftrace_func(ftrace_func_t func)
{
- int ret = __ftrace_modify_call((unsigned long)&ftrace_call,
- (unsigned long)func, true, true);
-
- return ret;
+ return __ftrace_modify_call_site(&ftrace_call_dest, func, true);
}
struct ftrace_modify_param {
@@ -185,7 +209,7 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
if (ret)
return ret;
- return __ftrace_modify_call(caller, addr, true, false);
+ return __ftrace_modify_call(caller, addr, true);
}
#endif
@@ -220,17 +244,17 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
prepare_ftrace_return(&fregs->ra, ip, fregs->s0);
}
#else /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */
-extern void ftrace_graph_call(void);
+ftrace_func_t ftrace_graph_call_dest = ftrace_stub;
int ftrace_enable_ftrace_graph_caller(void)
{
- return __ftrace_modify_call((unsigned long)&ftrace_graph_call,
- (unsigned long)&prepare_ftrace_return, true, true);
+ return __ftrace_modify_call_site(&ftrace_graph_call_dest,
+ &prepare_ftrace_return, true);
}
int ftrace_disable_ftrace_graph_caller(void)
{
- return __ftrace_modify_call((unsigned long)&ftrace_graph_call,
- (unsigned long)&prepare_ftrace_return, false, true);
+ return __ftrace_modify_call_site(&ftrace_graph_call_dest,
+ &prepare_ftrace_return, false);
}
#endif /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */
#endif /* CONFIG_DYNAMIC_FTRACE */
diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
index e988bd26b28b..bc06e8ab81cf 100644
--- a/arch/riscv/kernel/mcount-dyn.S
+++ b/arch/riscv/kernel/mcount-dyn.S
@@ -162,7 +162,8 @@ SYM_FUNC_START(ftrace_caller)
mv a3, sp
SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
- call ftrace_stub
+ REG_L ra, ftrace_call_dest
+ jalr 0(ra)
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
addi a0, sp, ABI_RA
@@ -172,7 +173,8 @@ SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
mv a2, s0
#endif
SYM_INNER_LABEL(ftrace_graph_call, SYM_L_GLOBAL)
- call ftrace_stub
+ REG_L ra, ftrace_graph_call_dest
+ jalr 0(ra)
#endif
RESTORE_ABI
jr t0
@@ -185,7 +187,8 @@ SYM_FUNC_START(ftrace_caller)
PREPARE_ARGS
SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
- call ftrace_stub
+ REG_L ra, ftrace_call_dest
+ jalr 0(ra)
RESTORE_ABI_REGS
bnez t1, .Ldirect
--
2.43.0
Each function entry implies a call into the ftrace infrastructure, which
may call into schedule() in some cases. So, it is possible for
preemptible kernel-mode Vector to implicitly call into schedule(). Since
all V-regs are caller-saved, it is possible to drop the whole V context
when a thread voluntarily calls schedule(). Besides, we currently don't
pass arguments through vector registers, so we don't have to save or
restore V-regs in the ftrace trampoline.
Signed-off-by: Andy Chiu <[email protected]>
---
arch/riscv/include/asm/processor.h | 5 +++++
arch/riscv/include/asm/vector.h | 22 +++++++++++++++++++---
2 files changed, 24 insertions(+), 3 deletions(-)
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index 68c3432dc6ea..02598e168659 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -95,6 +95,10 @@ struct pt_regs;
* Thus, the task does not own preempt_v. Any use of Vector will have to
* save preempt_v, if dirty, and fallback to non-preemptible kernel-mode
* Vector.
+ * - bit 29: The thread voluntarily calls schedule() while holding an active
+ * preempt_v. All preempt_v context should be dropped in such case because
+ * V-regs are caller-saved. Only sstatus.VS=ON is persisted across a
+ * schedule() call.
* - bit 30: The in-kernel preempt_v context is saved, and requries to be
* restored when returning to the context that owns the preempt_v.
* - bit 31: The in-kernel preempt_v context is dirty, as signaled by the
@@ -109,6 +113,7 @@ struct pt_regs;
#define RISCV_PREEMPT_V 0x00000100
#define RISCV_PREEMPT_V_DIRTY 0x80000000
#define RISCV_PREEMPT_V_NEED_RESTORE 0x40000000
+#define RISCV_PREEMPT_V_IN_SCHEDULE 0x20000000
/* CPU-specific state of a task */
struct thread_struct {
diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index 731dcd0ed4de..50693cffbe78 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -75,6 +75,11 @@ static __always_inline void riscv_v_disable(void)
csr_clear(CSR_SSTATUS, SR_VS);
}
+static __always_inline bool riscv_v_is_on(void)
+{
+ return !!(csr_read(CSR_SSTATUS) & SR_VS);
+}
+
static __always_inline void __vstate_csr_save(struct __riscv_v_ext_state *dest)
{
asm volatile (
@@ -243,6 +248,11 @@ static inline void __switch_to_vector(struct task_struct *prev,
struct pt_regs *regs;
if (riscv_preempt_v_started(prev)) {
+ if (riscv_v_is_on()) {
+ WARN_ON(prev->thread.riscv_v_flags & RISCV_V_CTX_DEPTH_MASK);
+ riscv_v_disable();
+ prev->thread.riscv_v_flags |= RISCV_PREEMPT_V_IN_SCHEDULE;
+ }
if (riscv_preempt_v_dirty(prev)) {
__riscv_v_vstate_save(&prev->thread.kernel_vstate,
prev->thread.kernel_vstate.datap);
@@ -253,10 +263,16 @@ static inline void __switch_to_vector(struct task_struct *prev,
riscv_v_vstate_save(&prev->thread.vstate, regs);
}
- if (riscv_preempt_v_started(next))
- riscv_preempt_v_set_restore(next);
- else
+ if (riscv_preempt_v_started(next)) {
+ if (next->thread.riscv_v_flags & RISCV_PREEMPT_V_IN_SCHEDULE) {
+ next->thread.riscv_v_flags &= ~RISCV_PREEMPT_V_IN_SCHEDULE;
+ riscv_v_enable();
+ } else {
+ riscv_preempt_v_set_restore(next);
+ }
+ } else {
riscv_v_vstate_set_restore(next, task_pt_regs(next));
+ }
}
void riscv_v_vstate_ctrl_init(struct task_struct *tsk);
--
2.43.0
Now, we can safely enable dynamic ftrace with kernel preemption.
Signed-off-by: Andy Chiu <[email protected]>
---
arch/riscv/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 80b8d48e1e46..c1493ee1b8cd 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -133,7 +133,7 @@ config RISCV
select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_GRAPH_RETVAL if HAVE_FUNCTION_GRAPH_TRACER
- select HAVE_FUNCTION_TRACER if !XIP_KERNEL && !PREEMPTION
+ select HAVE_FUNCTION_TRACER if !XIP_KERNEL
select HAVE_EBPF_JIT if MMU
select HAVE_GUP_FAST if MMU
select HAVE_FUNCTION_ARG_ACCESS_API
--
2.43.0
Now it is safe to remove the dependency on stop_machine() for patching
code in ftrace.
Signed-off-by: Andy Chiu <[email protected]>
---
arch/riscv/kernel/ftrace.c | 53 ++++------------------------------------------
1 file changed, 4 insertions(+), 49 deletions(-)
diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index f3b09f2d3ecc..9a421e151b1d 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -13,23 +13,13 @@
#include <asm/patch.h>
#ifdef CONFIG_DYNAMIC_FTRACE
-void ftrace_arch_code_modify_prepare(void) __acquires(&text_mutex)
+void arch_ftrace_update_code(int command)
{
mutex_lock(&text_mutex);
-
- /*
- * The code sequences we use for ftrace can't be patched while the
- * kernel is running, so we need to use stop_machine() to modify them
- * for now. This doesn't play nice with text_mutex, we use this flag
- * to elide the check.
- */
- riscv_patch_in_stop_machine = true;
-}
-
-void ftrace_arch_code_modify_post_process(void) __releases(&text_mutex)
-{
- riscv_patch_in_stop_machine = false;
+ command |= FTRACE_MAY_SLEEP;
+ ftrace_modify_all_code(command);
mutex_unlock(&text_mutex);
+ flush_icache_all();
}
static int ftrace_check_current_call(unsigned long hook_pos,
@@ -158,41 +148,6 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
return __ftrace_modify_call_site(&ftrace_call_dest, func, true);
}
-struct ftrace_modify_param {
- int command;
- atomic_t cpu_count;
-};
-
-static int __ftrace_modify_code(void *data)
-{
- struct ftrace_modify_param *param = data;
-
- if (atomic_inc_return(¶m->cpu_count) == num_online_cpus()) {
- ftrace_modify_all_code(param->command);
- /*
- * Make sure the patching store is effective *before* we
- * increment the counter which releases all waiting CPUs
- * by using the release variant of atomic increment. The
- * release pairs with the call to local_flush_icache_all()
- * on the waiting CPU.
- */
- atomic_inc_return_release(¶m->cpu_count);
- } else {
- while (atomic_read(¶m->cpu_count) <= num_online_cpus())
- cpu_relax();
- }
-
- local_flush_icache_all();
-
- return 0;
-}
-
-void arch_ftrace_update_code(int command)
-{
- struct ftrace_modify_param param = { command, ATOMIC_INIT(0) };
-
- stop_machine(__ftrace_modify_code, ¶m, cpu_online_mask);
-}
#endif
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
--
2.43.0
On Thu, 13 Jun 2024 15:11:07 +0800
Andy Chiu <[email protected]> wrote:
> kernel_text_address() and __kernel_text_address() are called in
> arch_stack_walk() of riscv. This results in excess amount of un-related
> traces when the kernel is compiled with CONFIG_TRACE_IRQFLAGS. The
> situation worsens when function_graph is active, as it calls
> local_irq_save/restore in each function's entry/exit. This patch adds
> both functions to notrace, so they won't show up on the trace records.
I'd rather not add notrace just because something is noisy.
You can always just add:
echo '*kernel_text_address' > /sys/kernel/tracing/set_ftrace_notrace
and achieve the same result.
-- Steve
Hi Andy,
On Thu, Jun 13, 2024 at 03:11:09PM +0800, Andy Chiu wrote:
> We are changing ftrace code patching in order to remove dependency from
> stop_machine() and enable kernel preemption. This requires us to align
> functions entry at a 4-B align address.
>
> However, -falign-functions on older versions of GCC alone was not strong
> enoungh to align all functions. In fact, cold functions are not aligned
> after turning on optimizations. We consider this is a bug in GCC and
> turn off guess-branch-probility as a workaround to align all functions.
>
> GCC bug id: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345
>
> The option -fmin-function-alignment is able to align all functions
> properly on newer versions of gcc. So, we add a cc-option to test if
> the toolchain supports it.
>
> Suggested-by: Evgenii Shatokhin <[email protected]>
> Signed-off-by: Andy Chiu <[email protected]>
> ---
> arch/riscv/Kconfig | 1 +
> arch/riscv/Makefile | 7 ++++++-
> 2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index b94176e25be1..80b8d48e1e46 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -203,6 +203,7 @@ config CLANG_SUPPORTS_DYNAMIC_FTRACE
> config GCC_SUPPORTS_DYNAMIC_FTRACE
> def_bool CC_IS_GCC
> depends on $(cc-option,-fpatchable-function-entry=8)
> + depends on $(cc-option,-fmin-function-alignment=4) || !RISCV_ISA_C
Please use CC_HAS_MIN_FUNCTION_ALIGNMENT (from arch/Kconfig), which
already checks for support for this option.
> config HAVE_SHADOW_CALL_STACK
> def_bool $(cc-option,-fsanitize=shadow-call-stack)
> diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> index 06de9d365088..74628ad8dcf8 100644
> --- a/arch/riscv/Makefile
> +++ b/arch/riscv/Makefile
> @@ -14,8 +14,13 @@ endif
> ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
> LDFLAGS_vmlinux += --no-relax
> KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
> +ifeq ($(CONFIG_CC_IS_CLANG),y)
Same here, please invert this and use
ifdef CONFIG_CC_HAS_MIN_FUNCTION_ALIGNMENT
like the main Makefile does.
> + cflags_ftrace_align := -falign-functions=4
> +else
> + cflags_ftrace_align := -fmin-function-alignment=4
> +endif
> ifeq ($(CONFIG_RISCV_ISA_C),y)
> - CC_FLAGS_FTRACE := -fpatchable-function-entry=4
> + CC_FLAGS_FTRACE := -fpatchable-function-entry=4 $(cflags_ftrace_align)
> else
> CC_FLAGS_FTRACE := -fpatchable-function-entry=2
> endif
>
> --
> 2.43.0
>
>
On Thu, Jun 13, 2024 at 03:11:08PM +0800, Andy Chiu wrote:
> Some caller-saved registers which are not defined as function arguments
> in the ABI can still be passed as arguments when the kernel is compiled
> with Clang. As a result, we must save and restore those registers to
> prevent ftrace from clobbering them.
>
> - [1]: https://reviews.llvm.org/D68559
> Reported-by: Evgenii Shatokhin <[email protected]>
> Closes: https://lore.kernel.org/linux-riscv/[email protected]/
> Signed-off-by: Andy Chiu <[email protected]>
Acked-by: Nathan Chancellor <[email protected]>
> ---
> arch/riscv/include/asm/ftrace.h | 7 +++++++
> arch/riscv/kernel/asm-offsets.c | 7 +++++++
> arch/riscv/kernel/mcount-dyn.S | 16 ++++++++++++++--
> 3 files changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
> index 9eb31a7ea0aa..5f81c53dbfd9 100644
> --- a/arch/riscv/include/asm/ftrace.h
> +++ b/arch/riscv/include/asm/ftrace.h
> @@ -144,6 +144,13 @@ struct ftrace_regs {
> unsigned long a5;
> unsigned long a6;
> unsigned long a7;
> +#ifdef CONFIG_CC_IS_CLANG
> + unsigned long t2;
> + unsigned long t3;
> + unsigned long t4;
> + unsigned long t5;
> + unsigned long t6;
> +#endif
> };
> };
> };
> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> index b09ca5f944f7..db5a26fcc9ae 100644
> --- a/arch/riscv/kernel/asm-offsets.c
> +++ b/arch/riscv/kernel/asm-offsets.c
> @@ -497,6 +497,13 @@ void asm_offsets(void)
> DEFINE(FREGS_SP, offsetof(struct ftrace_regs, sp));
> DEFINE(FREGS_S0, offsetof(struct ftrace_regs, s0));
> DEFINE(FREGS_T1, offsetof(struct ftrace_regs, t1));
> +#ifdef CONFIG_CC_IS_CLANG
> + DEFINE(FREGS_T2, offsetof(struct ftrace_regs, t2));
> + DEFINE(FREGS_T3, offsetof(struct ftrace_regs, t3));
> + DEFINE(FREGS_T4, offsetof(struct ftrace_regs, t4));
> + DEFINE(FREGS_T5, offsetof(struct ftrace_regs, t5));
> + DEFINE(FREGS_T6, offsetof(struct ftrace_regs, t6));
> +#endif
> DEFINE(FREGS_A0, offsetof(struct ftrace_regs, a0));
> DEFINE(FREGS_A1, offsetof(struct ftrace_regs, a1));
> DEFINE(FREGS_A2, offsetof(struct ftrace_regs, a2));
> diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
> index 745dd4c4a69c..e988bd26b28b 100644
> --- a/arch/riscv/kernel/mcount-dyn.S
> +++ b/arch/riscv/kernel/mcount-dyn.S
> @@ -96,7 +96,13 @@
> REG_S x8, FREGS_S0(sp)
> #endif
> REG_S x6, FREGS_T1(sp)
> -
> +#ifdef CONFIG_CC_IS_CLANG
> + REG_S x7, FREGS_T2(sp)
> + REG_S x28, FREGS_T3(sp)
> + REG_S x29, FREGS_T4(sp)
> + REG_S x30, FREGS_T5(sp)
> + REG_S x31, FREGS_T6(sp)
> +#endif
> // save the arguments
> REG_S x10, FREGS_A0(sp)
> REG_S x11, FREGS_A1(sp)
> @@ -115,7 +121,13 @@
> REG_L x8, FREGS_S0(sp)
> #endif
> REG_L x6, FREGS_T1(sp)
> -
> +#ifdef CONFIG_CC_IS_CLANG
> + REG_L x7, FREGS_T2(sp)
> + REG_L x28, FREGS_T3(sp)
> + REG_L x29, FREGS_T4(sp)
> + REG_L x30, FREGS_T5(sp)
> + REG_L x31, FREGS_T6(sp)
> +#endif
> // restore the arguments
> REG_L x10, FREGS_A0(sp)
> REG_L x11, FREGS_A1(sp)
>
> --
> 2.43.0
>