Like the commit cb9d7fd51d9f ("watchdog: Mark watchdog touch functions
as notrace"), some architectures assume that the stopped CPUs don't make
function calls to traceable functions when they are in the stopped
state. For example, it causes unexpected kernel crashes when switching
tracers on RISC-V.
The commits in the Fixes tags below added calls to these two functions;
fix this by adding the notrace annotations.
Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield")
Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in
multi_cpu_stop()")
Signed-off-by: Zong Li <[email protected]>
---
kernel/rcu/tree.c | 2 +-
kernel/stop_machine.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 06895ef85d69..2a52f42f64b6 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -409,7 +409,7 @@ bool rcu_eqs_special_set(int cpu)
*
* The caller must have disabled interrupts and must not be idle.
*/
-void rcu_momentary_dyntick_idle(void)
+notrace void rcu_momentary_dyntick_idle(void)
{
int special;
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 865bb0228ab6..890b79cf0e7c 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -178,7 +178,7 @@ static void ack_state(struct multi_stop_data *msdata)
set_state(msdata, msdata->state + 1);
}
-void __weak stop_machine_yield(const struct cpumask *cpumask)
+notrace void __weak stop_machine_yield(const struct cpumask *cpumask)
{
cpu_relax();
}
--
2.28.0
On Wed, 21 Oct 2020 15:38:39 +0800
Zong Li <[email protected]> wrote:
> Like the commit cb9d7fd51d9f ("watchdog: Mark watchdog touch functions
> as notrace"), some architectures assume that the stopped CPUs don't make
> function calls to traceable functions when they are in the stopped
> state. For example, it causes unexpected kernel crashes when switching
> tracers on RISC-V.
>
> The commits in the Fixes tags below added calls to these two functions;
> fix this by adding the notrace annotations.
>
> Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield")
> Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in
> multi_cpu_stop()")
I really do not like to add "notrace" to core functions because a single
architecture has issues with it. Why does RISCV have problems with these
functions but no other architecture does?
NACK from me until it is shown that these are issues for a broader set of
architectures.
It sounds to me like you are treating a symptom and not the disease.
-- Steve
>
> Signed-off-by: Zong Li <[email protected]>
> ---
> kernel/rcu/tree.c | 2 +-
> kernel/stop_machine.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 06895ef85d69..2a52f42f64b6 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -409,7 +409,7 @@ bool rcu_eqs_special_set(int cpu)
> *
> * The caller must have disabled interrupts and must not be idle.
> */
> -void rcu_momentary_dyntick_idle(void)
> +notrace void rcu_momentary_dyntick_idle(void)
> {
> int special;
>
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 865bb0228ab6..890b79cf0e7c 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -178,7 +178,7 @@ static void ack_state(struct multi_stop_data *msdata)
> set_state(msdata, msdata->state + 1);
> }
>
> -void __weak stop_machine_yield(const struct cpumask *cpumask)
> +notrace void __weak stop_machine_yield(const struct cpumask *cpumask)
> {
> cpu_relax();
> }
On Wed, 21 Oct 2020 10:12:16 -0400
Steven Rostedt <[email protected]> wrote:
> > Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield")
> > Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in
> > multi_cpu_stop()")
>
> I really do not like to add "notrace" to core functions because a single
> architecture has issues with it. Why does RISCV have problems with these
> functions but no other architecture does?
>
> NACK from me until it is shown that these are issues for a broader set of
> architectures.
After looking at the two above fixes, I take back my NACK ;-)
One of them duplicates an already notraced function, so that looks fine.
The other makes a static function global, which could cause issues as well.
After further review:
Acked-by: Steven Rostedt (VMware) <[email protected]>
-- Steve
On Wed, 21 Oct 2020 08:44:56 -0700
"Paul E. McKenney" <[email protected]> wrote:
> Or let me know if you would like me to take it, target v5.11.
I'm not sure if these can wait, as I believe they are fixing a regression
with RISCV function tracing.
Probably best to have them go through the RISCV tree.
-- Steve
On Wed, Oct 21, 2020 at 11:54:51AM -0400, Steven Rostedt wrote:
> On Wed, 21 Oct 2020 08:44:56 -0700
> "Paul E. McKenney" <[email protected]> wrote:
>
> > Or let me know if you would like me to take it, target v5.11.
>
> I'm not sure if these can wait, as I believe they are fixing a regression
> with RISCV function tracing.
>
> Probably best to have them go through the RISCV tree.
Works for me!
Thanx, Paul
On Wed, Oct 21, 2020 at 12:38 AM Zong Li <[email protected]> wrote:
>
> Like the commit cb9d7fd51d9f ("watchdog: Mark watchdog touch functions
> as notrace"), some architectures assume that the stopped CPUs don't make
> function calls to traceable functions when they are in the stopped
> state. For example, it causes unexpected kernel crashes when switching
> tracers on RISC-V.
>
> The commits in the Fixes tags below added calls to these two functions;
> fix this by adding the notrace annotations.
>
> Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield")
> Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in
> multi_cpu_stop()")
>
> Signed-off-by: Zong Li <[email protected]>
> ---
> kernel/rcu/tree.c | 2 +-
> kernel/stop_machine.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 06895ef85d69..2a52f42f64b6 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -409,7 +409,7 @@ bool rcu_eqs_special_set(int cpu)
> *
> * The caller must have disabled interrupts and must not be idle.
> */
> -void rcu_momentary_dyntick_idle(void)
> +notrace void rcu_momentary_dyntick_idle(void)
> {
> int special;
>
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 865bb0228ab6..890b79cf0e7c 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -178,7 +178,7 @@ static void ack_state(struct multi_stop_data *msdata)
> set_state(msdata, msdata->state + 1);
> }
>
> -void __weak stop_machine_yield(const struct cpumask *cpumask)
> +notrace void __weak stop_machine_yield(const struct cpumask *cpumask)
> {
> cpu_relax();
> }
> --
> 2.28.0
>
Thanks for the fix. FWIW,
Tested-by: Atish Patra <[email protected]>
Can you update the bugzilla as well?
https://bugzilla.kernel.org/show_bug.cgi?id=209317
--
Regards,
Atish
On Wed, Oct 21, 2020 at 10:15:22AM -0400, Steven Rostedt wrote:
> On Wed, 21 Oct 2020 10:12:16 -0400
> Steven Rostedt <[email protected]> wrote:
>
> > > Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield")
> > > Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in
> > > multi_cpu_stop()")
> >
> > I really do not like to add "notrace" to core functions because a single
> > architecture has issues with it. Why does RISCV have problems with these
> > functions but no other architecture does?
> >
> > NACK from me until it is shown that these are issues for a broader set of
> > architectures.
>
> After looking at the two above fixes, I take back my NACK ;-)
>
> One of them duplicates an already notraced function, so that looks fine.
> The other makes a static function global, which could cause issues as well.
>
> After further review:
>
> Acked-by: Steven Rostedt (VMware) <[email protected]>
If someone else would like to take this:
Acked-by: Paul E. McKenney <[email protected]>
Or let me know if you would like me to take it, target v5.11.
Thanx, Paul
On 21/10/2020 08:38, Zong Li wrote:
> Like the commit cb9d7fd51d9f ("watchdog: Mark watchdog touch functions
> as notrace"), some architectures assume that the stopped CPUs don't make
> function calls to traceable functions when they are in the stopped
> state. For example, it causes unexpected kernel crashes when switching
> tracers on RISC-V.
>
> The commits in the Fixes tags below added calls to these two functions;
> fix this by adding the notrace annotations.
>
> Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield")
> Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in
> multi_cpu_stop()")
>
> Signed-off-by: Zong Li <[email protected]>
> ---
> kernel/rcu/tree.c | 2 +-
> kernel/stop_machine.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 06895ef85d69..2a52f42f64b6 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -409,7 +409,7 @@ bool rcu_eqs_special_set(int cpu)
> *
> * The caller must have disabled interrupts and must not be idle.
> */
> -void rcu_momentary_dyntick_idle(void)
> +notrace void rcu_momentary_dyntick_idle(void)
> {
> int special;
>
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 865bb0228ab6..890b79cf0e7c 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -178,7 +178,7 @@ static void ack_state(struct multi_stop_data *msdata)
> set_state(msdata, msdata->state + 1);
> }
>
> -void __weak stop_machine_yield(const struct cpumask *cpumask)
> +notrace void __weak stop_machine_yield(const struct cpumask *cpumask)
> {
> cpu_relax();
> }
>
Apologies for taking so long to reply, I needed to test this on several
devices.
This not only fixes the ftrace issue I see on RISC-V but also an
ftrace hang issue on ARM64 in 5.8.
Tested-by: Colin Ian King <[email protected]>
Many thanks!
On Sat, Oct 24, 2020 at 3:29 AM Colin Ian King <[email protected]> wrote:
>
> On 21/10/2020 08:38, Zong Li wrote:
> > Like the commit cb9d7fd51d9f ("watchdog: Mark watchdog touch functions
> > as notrace"), some architectures assume that the stopped CPUs don't make
> > function calls to traceable functions when they are in the stopped
> > state. For example, it causes unexpected kernel crashes when switching
> > tracers on RISC-V.
> >
> > The commits in the Fixes tags below added calls to these two functions;
> > fix this by adding the notrace annotations.
> >
> > Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield")
> > Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in
> > multi_cpu_stop()")
> >
> > Signed-off-by: Zong Li <[email protected]>
> > ---
> > kernel/rcu/tree.c | 2 +-
> > kernel/stop_machine.c | 2 +-
> > 2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 06895ef85d69..2a52f42f64b6 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -409,7 +409,7 @@ bool rcu_eqs_special_set(int cpu)
> > *
> > * The caller must have disabled interrupts and must not be idle.
> > */
> > -void rcu_momentary_dyntick_idle(void)
> > +notrace void rcu_momentary_dyntick_idle(void)
> > {
> > int special;
> >
> > diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> > index 865bb0228ab6..890b79cf0e7c 100644
> > --- a/kernel/stop_machine.c
> > +++ b/kernel/stop_machine.c
> > @@ -178,7 +178,7 @@ static void ack_state(struct multi_stop_data *msdata)
> > set_state(msdata, msdata->state + 1);
> > }
> >
> > -void __weak stop_machine_yield(const struct cpumask *cpumask)
> > +notrace void __weak stop_machine_yield(const struct cpumask *cpumask)
> > {
> > cpu_relax();
> > }
> >
>
> Apologies for taking so long to reply, I needed to test this on several
> devices.
>
> This not only fixes the ftrace issue I see on RISC-V but also an
> ftrace hang issue on ARM64 in 5.8.
>
> Tested-by: Colin Ian King <[email protected]>
>
> Many thanks!
Many thanks to all for reviewing and testing.
Hi Palmer,
As Steven suggested, could you help pick up this patch through the
RISC-V tree?
The following commit has been merged into the smp/urgent branch of tip:
Commit-ID: 4230e2deaa484b385aa01d598b2aea8e7f2660a6
Gitweb: https://git.kernel.org/tip/4230e2deaa484b385aa01d598b2aea8e7f2660a6
Author: Zong Li <[email protected]>
AuthorDate: Wed, 21 Oct 2020 15:38:39 +08:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Mon, 26 Oct 2020 12:12:27 +01:00
stop_machine, rcu: Mark functions as notrace
Some architectures assume that the stopped CPUs don't make function calls
to traceable functions when they are in the stopped state. See also commit
cb9d7fd51d9f ("watchdog: Mark watchdog touch functions as notrace").
Violating this assumption causes kernel crashes when switching tracer on
RISC-V.
Mark rcu_momentary_dyntick_idle() and stop_machine_yield() notrace to
prevent this.
Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield")
Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in multi_cpu_stop()")
Signed-off-by: Zong Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Atish Patra <[email protected]>
Tested-by: Colin Ian King <[email protected]>
Acked-by: Steven Rostedt (VMware) <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
---
kernel/rcu/tree.c | 2 +-
kernel/stop_machine.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 06895ef..2a52f42 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -409,7 +409,7 @@ bool rcu_eqs_special_set(int cpu)
*
* The caller must have disabled interrupts and must not be idle.
*/
-void rcu_momentary_dyntick_idle(void)
+notrace void rcu_momentary_dyntick_idle(void)
{
int special;
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 865bb02..890b79c 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -178,7 +178,7 @@ static void ack_state(struct multi_stop_data *msdata)
set_state(msdata, msdata->state + 1);
}
-void __weak stop_machine_yield(const struct cpumask *cpumask)
+notrace void __weak stop_machine_yield(const struct cpumask *cpumask)
{
cpu_relax();
}
On Wed, Oct 28, 2020 at 8:44 AM Guo Ren <[email protected]> wrote:
>
> Hi Zong & Atish,
>
> In our 2-hart c910 chip, we found:
>
> echo function > /sys/kernel/debug/tracing/current_tracer
> echo function_graph > /sys/kernel/debug/tracing/current_tracer
> echo function > /sys/kernel/debug/tracing/current_tracer
> echo function_graph > /sys/kernel/debug/tracing/current_tracer
>
> Then one core halted at stop_machine_yield:
> arch_cpu_idle () at arch/riscv/kernel/process.c:39
> 39 local_irq_enable();
> (gdb) i th
> Id Target Id Frame
> * 1 Thread 1 (CPU#0) arch_cpu_idle () at arch/riscv/kernel/process.c:39
> 2 Thread 2 (CPU#1) stop_machine_yield (cpumask=0xffffffe001371fa8 <__cpu_online_mask>) at ./arch/riscv/include/asm/vdso/processor.h:12
> (gdb) thread 2
> [Switching to thread 2 (Thread 2)]
> #0 stop_machine_yield (cpumask=0xffffffe001371fa8 <__cpu_online_mask>) at ./arch/riscv/include/asm/vdso/processor.h:12
> 12 __asm__ __volatile__ ("div %0, %0, zero" : "=r" (dummy));
>
> With your patch, it's solved. For this patch, I'll give:
> Tested-by: Guo Ren <[email protected]>
>
> But that's not enough, we still need:
>
> diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
> index 226ccce..12b8808 100644
> --- a/arch/riscv/kernel/sbi.c
> +++ b/arch/riscv/kernel/sbi.c
> @@ -376,7 +376,7 @@ EXPORT_SYMBOL(sbi_send_ipi);
> *
> * Return: None
> */
> -void sbi_remote_fence_i(const unsigned long *hart_mask)
> +void notrace sbi_remote_fence_i(const unsigned long *hart_mask)
> {
> __sbi_rfence(SBI_EXT_RFENCE_REMOTE_FENCE_I,
> hart_mask, 0, 0, 0, 0);
> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> index 400b945d..9467d987 100644
> --- a/arch/riscv/mm/cacheflush.c
> +++ b/arch/riscv/mm/cacheflush.c
> @@ -9,12 +9,12 @@
>
> #include <asm/sbi.h>
>
> -static void ipi_remote_fence_i(void *info)
> +static void notrace ipi_remote_fence_i(void *info)
> {
> return local_flush_icache_all();
> }
>
> -void flush_icache_all(void)
> +void notrace flush_icache_all(void)
> {
> if (IS_ENABLED(CONFIG_RISCV_SBI))
> sbi_remote_fence_i(NULL);
>
Did you see any issue if these functions are not marked as notrace?
As per Zong's explanation, the issue was that the other harts had
already fetched the next two nops and executed one of them while the
kernel patching code replaced the other with one half of the auipc +
jalr pair.
@Zong can correct me if I am wrong.
These functions are far removed from the patched site. Can they cause
such issues? If yes, then we need to mark each and every function that
can be invoked from patch_text_nosync and is not inlined.
That includes copy_to_kernel_nofault, __sbi_rfence_v02,
__sbi_rfence_v02_call, and sbi_ecall.
Some of these functions may be inlined by the compiler. Can we depend
on that?
> Because:
> (gdb) bt
> #0 flush_icache_all () at arch/riscv/mm/cacheflush.c:20
> #1 0xffffffe00020473a in patch_text_nosync (addr=<optimized out>, insns=<optimized out>, len=<optimized out>) at arch/riscv/kernel/patch.c:96
> #2 0xffffffe000206792 in ftrace_make_call (rec=<optimized out>, addr=<optimized out>) at arch/riscv/kernel/ftrace.c:109
> #3 0xffffffe0002c9be4 in __ftrace_replace_code (rec=0xffffffe01ae40020, enable=true) at kernel/trace/ftrace.c:2503
> #4 0xffffffe0002ca092 in ftrace_replace_code (mod_flags=<optimized out>) at kernel/trace/ftrace.c:2530
> #5 0xffffffe0002ca24a in ftrace_modify_all_code (command=9) at kernel/trace/ftrace.c:2677
> #6 0xffffffe0002ca2ee in __ftrace_modify_code (data=<optimized out>) at kernel/trace/ftrace.c:2703
> #7 0xffffffe0002c1390 in multi_cpu_stop (data=0x0) at kernel/stop_machine.c:224
> #8 0xffffffe0002c0fbe in cpu_stopper_thread (cpu=<optimized out>) at kernel/stop_machine.c:491
> #9 0xffffffe0002343be in smpboot_thread_fn (data=0x0) at kernel/smpboot.c:165
> #10 0xffffffe00022f894 in kthread (_create=0xffffffe01af13040) at kernel/kthread.c:292
> #11 0xffffffe000201fac in handle_exception () at arch/riscv/kernel/entry.S:236
>
> On Wed, Oct 21, 2020 at 3:38 PM Zong Li <[email protected]> wrote:
> >
> > Like the commit cb9d7fd51d9f ("watchdog: Mark watchdog touch functions
> > as notrace"), some architectures assume that the stopped CPUs don't make
> > function calls to traceable functions when they are in the stopped
> > state. For example, it causes unexpected kernel crashes when switching
> > tracers on RISC-V.
> >
> > The commits in the Fixes tags below added calls to these two functions;
> > fix this by adding the notrace annotations.
> >
> > Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield")
> > Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in
> > multi_cpu_stop()")
> >
> > Signed-off-by: Zong Li <[email protected]>
> > ---
> > kernel/rcu/tree.c | 2 +-
> > kernel/stop_machine.c | 2 +-
> > 2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 06895ef85d69..2a52f42f64b6 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -409,7 +409,7 @@ bool rcu_eqs_special_set(int cpu)
> > *
> > * The caller must have disabled interrupts and must not be idle.
> > */
> > -void rcu_momentary_dyntick_idle(void)
> > +notrace void rcu_momentary_dyntick_idle(void)
> > {
> > int special;
> >
> > diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> > index 865bb0228ab6..890b79cf0e7c 100644
> > --- a/kernel/stop_machine.c
> > +++ b/kernel/stop_machine.c
> > @@ -178,7 +178,7 @@ static void ack_state(struct multi_stop_data *msdata)
> > set_state(msdata, msdata->state + 1);
> > }
> >
> > -void __weak stop_machine_yield(const struct cpumask *cpumask)
> > +notrace void __weak stop_machine_yield(const struct cpumask *cpumask)
> > {
> > cpu_relax();
> > }
> > --
> > 2.28.0
> >
>
>
> --
> Best Regards
> Guo Ren
>
> ML: https://lore.kernel.org/linux-csky/
--
Regards,
Atish
Hi Zong & Atish,
In our 2-hart c910 chip, we found:
echo function > /sys/kernel/debug/tracing/current_tracer
echo function_graph > /sys/kernel/debug/tracing/current_tracer
echo function > /sys/kernel/debug/tracing/current_tracer
echo function_graph > /sys/kernel/debug/tracing/current_tracer
Then one core halted at stop_machine_yield:
arch_cpu_idle () at arch/riscv/kernel/process.c:39
39 local_irq_enable();
(gdb) i th
Id Target Id Frame
* 1 Thread 1 (CPU#0) arch_cpu_idle () at arch/riscv/kernel/process.c:39
2 Thread 2 (CPU#1) stop_machine_yield (cpumask=0xffffffe001371fa8 <__cpu_online_mask>) at ./arch/riscv/include/asm/vdso/processor.h:12
(gdb) thread 2
[Switching to thread 2 (Thread 2)]
#0 stop_machine_yield (cpumask=0xffffffe001371fa8 <__cpu_online_mask>) at ./arch/riscv/include/asm/vdso/processor.h:12
12 __asm__ __volatile__ ("div %0, %0, zero" : "=r" (dummy));
With your patch, it's solved. For this patch, I'll give:
Tested-by: Guo Ren <[email protected]>
But that's not enough, we still need:
diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
index 226ccce..12b8808 100644
--- a/arch/riscv/kernel/sbi.c
+++ b/arch/riscv/kernel/sbi.c
@@ -376,7 +376,7 @@ EXPORT_SYMBOL(sbi_send_ipi);
*
* Return: None
*/
-void sbi_remote_fence_i(const unsigned long *hart_mask)
+void notrace sbi_remote_fence_i(const unsigned long *hart_mask)
{
__sbi_rfence(SBI_EXT_RFENCE_REMOTE_FENCE_I,
hart_mask, 0, 0, 0, 0);
diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index 400b945d..9467d987 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -9,12 +9,12 @@
#include <asm/sbi.h>
-static void ipi_remote_fence_i(void *info)
+static void notrace ipi_remote_fence_i(void *info)
{
return local_flush_icache_all();
}
-void flush_icache_all(void)
+void notrace flush_icache_all(void)
{
if (IS_ENABLED(CONFIG_RISCV_SBI))
sbi_remote_fence_i(NULL);
Because:
(gdb) bt
#0 flush_icache_all () at arch/riscv/mm/cacheflush.c:20
#1 0xffffffe00020473a in patch_text_nosync (addr=<optimized out>, insns=<optimized out>, len=<optimized out>) at arch/riscv/kernel/patch.c:96
#2 0xffffffe000206792 in ftrace_make_call (rec=<optimized out>, addr=<optimized out>) at arch/riscv/kernel/ftrace.c:109
#3 0xffffffe0002c9be4 in __ftrace_replace_code (rec=0xffffffe01ae40020, enable=true) at kernel/trace/ftrace.c:2503
#4 0xffffffe0002ca092 in ftrace_replace_code (mod_flags=<optimized out>) at kernel/trace/ftrace.c:2530
#5 0xffffffe0002ca24a in ftrace_modify_all_code (command=9) at kernel/trace/ftrace.c:2677
#6 0xffffffe0002ca2ee in __ftrace_modify_code (data=<optimized out>) at kernel/trace/ftrace.c:2703
#7 0xffffffe0002c1390 in multi_cpu_stop (data=0x0) at kernel/stop_machine.c:224
#8 0xffffffe0002c0fbe in cpu_stopper_thread (cpu=<optimized out>) at kernel/stop_machine.c:491
#9 0xffffffe0002343be in smpboot_thread_fn (data=0x0) at kernel/smpboot.c:165
#10 0xffffffe00022f894 in kthread (_create=0xffffffe01af13040) at kernel/kthread.c:292
#11 0xffffffe000201fac in handle_exception () at arch/riscv/kernel/entry.S:236
On Wed, Oct 21, 2020 at 3:38 PM Zong Li <[email protected]> wrote:
>
> Like the commit cb9d7fd51d9f ("watchdog: Mark watchdog touch functions
> as notrace"), some architectures assume that the stopped CPUs don't make
> function calls to traceable functions when they are in the stopped
> state. For example, it causes unexpected kernel crashes when switching
> tracers on RISC-V.
>
> The commits in the Fixes tags below added calls to these two functions;
> fix this by adding the notrace annotations.
>
> Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield")
> Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in
> multi_cpu_stop()")
>
> Signed-off-by: Zong Li <[email protected]>
> ---
> kernel/rcu/tree.c | 2 +-
> kernel/stop_machine.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 06895ef85d69..2a52f42f64b6 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -409,7 +409,7 @@ bool rcu_eqs_special_set(int cpu)
> *
> * The caller must have disabled interrupts and must not be idle.
> */
> -void rcu_momentary_dyntick_idle(void)
> +notrace void rcu_momentary_dyntick_idle(void)
> {
> int special;
>
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 865bb0228ab6..890b79cf0e7c 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -178,7 +178,7 @@ static void ack_state(struct multi_stop_data *msdata)
> set_state(msdata, msdata->state + 1);
> }
>
> -void __weak stop_machine_yield(const struct cpumask *cpumask)
> +notrace void __weak stop_machine_yield(const struct cpumask *cpumask)
> {
> cpu_relax();
> }
> --
> 2.28.0
>
--
Best Regards
Guo Ren
ML: https://lore.kernel.org/linux-csky/
On Thu, Oct 29, 2020 at 8:23 AM Atish Patra <[email protected]> wrote:
>
> On Wed, Oct 28, 2020 at 8:44 AM Guo Ren <[email protected]> wrote:
> >
> > Hi Zong & Atish,
> >
> > In our 2-hart c910 chip, we found:
> >
> > echo function > /sys/kernel/debug/tracing/current_tracer
> > echo function_graph > /sys/kernel/debug/tracing/current_tracer
> > echo function > /sys/kernel/debug/tracing/current_tracer
> > echo function_graph > /sys/kernel/debug/tracing/current_tracer
> >
> > Then one core halted at stop_machine_yield:
> > arch_cpu_idle () at arch/riscv/kernel/process.c:39
> > 39 local_irq_enable();
> > (gdb) i th
> > Id Target Id Frame
> > * 1 Thread 1 (CPU#0) arch_cpu_idle () at arch/riscv/kernel/process.c:39
> > 2 Thread 2 (CPU#1) stop_machine_yield (cpumask=0xffffffe001371fa8 <__cpu_online_mask>) at ./arch/riscv/include/asm/vdso/processor.h:12
> > (gdb) thread 2
> > [Switching to thread 2 (Thread 2)]
> > #0 stop_machine_yield (cpumask=0xffffffe001371fa8 <__cpu_online_mask>) at ./arch/riscv/include/asm/vdso/processor.h:12
> > 12 __asm__ __volatile__ ("div %0, %0, zero" : "=r" (dummy));
> >
> > With your patch, it's solved. For this patch, I'll give:
> > Tested-by: Guo Ren <[email protected]>
> >
> > But that's not enough, we still need:
> >
> > diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
> > index 226ccce..12b8808 100644
> > --- a/arch/riscv/kernel/sbi.c
> > +++ b/arch/riscv/kernel/sbi.c
> > @@ -376,7 +376,7 @@ EXPORT_SYMBOL(sbi_send_ipi);
> > *
> > * Return: None
> > */
> > -void sbi_remote_fence_i(const unsigned long *hart_mask)
> > +void notrace sbi_remote_fence_i(const unsigned long *hart_mask)
> > {
> > __sbi_rfence(SBI_EXT_RFENCE_REMOTE_FENCE_I,
> > hart_mask, 0, 0, 0, 0);
> > diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> > index 400b945d..9467d987 100644
> > --- a/arch/riscv/mm/cacheflush.c
> > +++ b/arch/riscv/mm/cacheflush.c
> > @@ -9,12 +9,12 @@
> >
> > #include <asm/sbi.h>
> >
> > -static void ipi_remote_fence_i(void *info)
> > +static void notrace ipi_remote_fence_i(void *info)
> > {
> > return local_flush_icache_all();
> > }
> >
> > -void flush_icache_all(void)
> > +void notrace flush_icache_all(void)
> > {
> > if (IS_ENABLED(CONFIG_RISCV_SBI))
> > sbi_remote_fence_i(NULL);
> >
>
> Did you see any issue if these functions are not marked as notrace?
>
> As per Zong's explanation, the issue was that the other harts had
> already fetched the next two nops and executed one of them while the
> kernel patching code replaced the other with one half of the auipc +
> jalr pair.
>
> @Zong can correct me if I am wrong.
>
> These functions are far removed from the patched site. Can they cause
> such issues? If yes, then we need to mark each and every function that
> can be invoked from patch_text_nosync and is not inlined.
>
> That includes copy_to_kernel_nofault, __sbi_rfence_v02,
> __sbi_rfence_v02_call, and sbi_ecall.
>
> Some of these functions may be inlined by the compiler. Can we depend
> on that?
>
> > Because:
> > (gdb) bt
> > #0 flush_icache_all () at arch/riscv/mm/cacheflush.c:20
> > #1 0xffffffe00020473a in patch_text_nosync (addr=<optimized out>, insns=<optimized out>, len=<optimized out>) at arch/riscv/kernel/patch.c:96
> > #2 0xffffffe000206792 in ftrace_make_call (rec=<optimized out>, addr=<optimized out>) at arch/riscv/kernel/ftrace.c:109
> > #3 0xffffffe0002c9be4 in __ftrace_replace_code (rec=0xffffffe01ae40020, enable=true) at kernel/trace/ftrace.c:2503
> > #4 0xffffffe0002ca092 in ftrace_replace_code (mod_flags=<optimized out>) at kernel/trace/ftrace.c:2530
> > #5 0xffffffe0002ca24a in ftrace_modify_all_code (command=9) at kernel/trace/ftrace.c:2677
> > #6 0xffffffe0002ca2ee in __ftrace_modify_code (data=<optimized out>) at kernel/trace/ftrace.c:2703
> > #7 0xffffffe0002c1390 in multi_cpu_stop (data=0x0) at kernel/stop_machine.c:224
> > #8 0xffffffe0002c0fbe in cpu_stopper_thread (cpu=<optimized out>) at kernel/stop_machine.c:491
> > #9 0xffffffe0002343be in smpboot_thread_fn (data=0x0) at kernel/smpboot.c:165
> > #10 0xffffffe00022f894 in kthread (_create=0xffffffe01af13040) at kernel/kthread.c:292
> > #11 0xffffffe000201fac in handle_exception () at arch/riscv/kernel/entry.S:236
> >
It seems to me that the problem happens on the waiting harts; it
doesn't cause an issue on the hart running the patching code, so it is
OK for these functions to remain traceable. I probably haven't figured
out all possible situations, though. Did you find any issue, or a
reason to change the annotation of these functions?
> > On Wed, Oct 21, 2020 at 3:38 PM Zong Li <[email protected]> wrote:
> > >
> > > Like the commit cb9d7fd51d9f ("watchdog: Mark watchdog touch functions
> > > as notrace"), some architectures assume that the stopped CPUs don't make
> > > function calls to traceable functions when they are in the stopped
> > > state. For example, it causes unexpected kernel crashes when switching
> > > tracers on RISC-V.
> > >
> > > The commits in the Fixes tags below added calls to these two functions;
> > > fix this by adding the notrace annotations.
> > >
> > > Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield")
> > > Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in
> > > multi_cpu_stop()")
> > >
> > > Signed-off-by: Zong Li <[email protected]>
> > > ---
> > > kernel/rcu/tree.c | 2 +-
> > > kernel/stop_machine.c | 2 +-
> > > 2 files changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 06895ef85d69..2a52f42f64b6 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -409,7 +409,7 @@ bool rcu_eqs_special_set(int cpu)
> > > *
> > > * The caller must have disabled interrupts and must not be idle.
> > > */
> > > -void rcu_momentary_dyntick_idle(void)
> > > +notrace void rcu_momentary_dyntick_idle(void)
> > > {
> > > int special;
> > >
> > > diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> > > index 865bb0228ab6..890b79cf0e7c 100644
> > > --- a/kernel/stop_machine.c
> > > +++ b/kernel/stop_machine.c
> > > @@ -178,7 +178,7 @@ static void ack_state(struct multi_stop_data *msdata)
> > > set_state(msdata, msdata->state + 1);
> > > }
> > >
> > > -void __weak stop_machine_yield(const struct cpumask *cpumask)
> > > +notrace void __weak stop_machine_yield(const struct cpumask *cpumask)
> > > {
> > > cpu_relax();
> > > }
> > > --
> > > 2.28.0
> > >
> >
> >
> > --
> > Best Regards
> > Guo Ren
> >
> > ML: https://lore.kernel.org/linux-csky/
>
>
>
> --
> Regards,
> Atish
On Thu, Oct 29, 2020 at 8:23 AM Atish Patra <[email protected]> wrote:
>
> On Wed, Oct 28, 2020 at 8:44 AM Guo Ren <[email protected]> wrote:
> >
> > Hi Zong & Atish,
> >
> > In our 2-hart c910 chip, we found:
> >
> > echo function > /sys/kernel/debug/tracing/current_tracer
> > echo function_graph > /sys/kernel/debug/tracing/current_tracer
> > echo function > /sys/kernel/debug/tracing/current_tracer
> > echo function_graph > /sys/kernel/debug/tracing/current_tracer
> >
> > Then one core halted at stop_machine_yield:
> > arch_cpu_idle () at arch/riscv/kernel/process.c:39
> > 39 local_irq_enable();
> > (gdb) i th
> > Id Target Id Frame
> > * 1 Thread 1 (CPU#0) arch_cpu_idle () at arch/riscv/kernel/process.c:39
> > 2 Thread 2 (CPU#1) stop_machine_yield (cpumask=0xffffffe001371fa8 <__cpu_online_mask>) at ./arch/riscv/include/asm/vdso/processor.h:12
> > (gdb) thread 2
> > [Switching to thread 2 (Thread 2)]
> > #0 stop_machine_yield (cpumask=0xffffffe001371fa8
> > <__cpu_online_mask>) at ./arch/riscv/include/asm/vdso/processor.h:12
> > 12 __asm__ __volatile__ ("div %0, %0, zero" : "=r" (dummy));
> >
> > With your patch, it's solved. For this patch, I'll give:
> > Tested-by: Guo Ren <[email protected]>
> >
> > But that's not enough, we still need:
> >
> > diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
> > index 226ccce..12b8808 100644
> > --- a/arch/riscv/kernel/sbi.c
> > +++ b/arch/riscv/kernel/sbi.c
> > @@ -376,7 +376,7 @@ EXPORT_SYMBOL(sbi_send_ipi);
> > *
> > * Return: None
> > */
> > -void sbi_remote_fence_i(const unsigned long *hart_mask)
> > +void notrace sbi_remote_fence_i(const unsigned long *hart_mask)
> > {
> > __sbi_rfence(SBI_EXT_RFENCE_REMOTE_FENCE_I,
> > hart_mask, 0, 0, 0, 0);
> > diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> > index 400b945d..9467d987 100644
> > --- a/arch/riscv/mm/cacheflush.c
> > +++ b/arch/riscv/mm/cacheflush.c
> > @@ -9,12 +9,12 @@
> >
> > #include <asm/sbi.h>
> >
> > -static void ipi_remote_fence_i(void *info)
> > +static void notrace ipi_remote_fence_i(void *info)
> > {
> > return local_flush_icache_all();
> > }
> >
> > -void flush_icache_all(void)
> > +void notrace flush_icache_all(void)
> > {
> > if (IS_ENABLED(CONFIG_RISCV_SBI))
> > sbi_remote_fence_i(NULL);
> >
>
> Did you see any issue if these functions are not marked as notrace ?
Welcome to Buildroot
buildroot login: root
#
# cat /proc/cpuinfo
processor : 0
hart : 0
isa : rv64imafdcsu
mmu : sv39
#
#
# echo function > /sys/kernel/debug/tracing/current_tracer
[   45.234334] Unable to handle kernel paging request at virtual address ffffffd38ae80900
[   45.242313] Oops [#1]
[   45.244600] Modules linked in:
[   45.247678] CPU: 0 PID: 11 Comm: migration/0 Not tainted 5.9.0-00025-g9b7db83-dirty #215
[   45.255797] epc: ffffffe00021689a ra : ffffffe00021718e sp : ffffffe01afabb58
[   45.262955] gp : ffffffe00136afa0 tp : ffffffe01af94d00 t0 : 0000000000000002
[   45.270200] t1 : 0000000000000000 t2 : 0000000000000001 s0 : ffffffe01afabc08
[   45.277443] s1 : ffffffe0013718a8 a0 : 0000000000000000 a1 : ffffffe01afabba8
[   45.284686] a2 : 0000000000000000 a3 : 0000000000000000 a4 : c4c16ad38ae80900
[   45.291929] a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000052464e43
[   45.299173] s2 : 0000000000000001 s3 : ffffffe000206a60 s4 : ffffffe000206a60
[   45.306415] s5 : 00000000000009ec s6 : ffffffe0013718a8 s7 : c4c16ad38ae80900
[   45.313658] s8 : 0000000000000004 s9 : 0000000000000001 s10: 0000000000000001
[   45.320902] s11: 0000000000000003 t3 : 0000000000000001 t4 : ffffffffd192fe79
[   45.328144] t5 : ffffffffb8f80000 t6 : 0000000000040000
[   45.333472] status: 0000000200000100 badaddr: ffffffd38ae80900 cause: 000000000000000f
[ 45.341514] ---[ end trace d95102172248fdcf ]---
[ 45.346176] note: migration/0[11] exited with preempt_count 1
(gdb) x /2i $pc
=> 0xffffffe00021689a <__do_proc_dointvec+196>: sd zero,0(s7)
0xffffffe00021689e <__do_proc_dointvec+200>: li s11,0
(gdb) bt
#0 __do_proc_dointvec (tbl_data=0x0, table=0xffffffe01afabba8,
write=0, buffer=0x0, lenp=0x7bf897061f9a0800, ppos=0x4, conv=0x0,
data=0x52464e43) at kernel/sysctl.c:581
#1 0xffffffe00021718e in do_proc_dointvec (data=<optimized out>,
conv=<optimized out>, ppos=<optimized out>, lenp=<optimized out>,
buffer=<optimized out>, write=<optimized out>, table=<optimized out>)
at kernel/sysctl.c:964
#2 proc_dointvec_minmax (ppos=<optimized out>, lenp=<optimized out>,
buffer=<optimized out>, write=<optimized out>, table=<optimized out>)
at kernel/sysctl.c:964
#3 proc_do_static_key (table=<optimized out>, write=1, buffer=0x0,
lenp=0x0, ppos=0x7bf897061f9a0800) at kernel/sysctl.c:1643
#4 0xffffffe000206792 in ftrace_make_call (rec=<optimized out>,
addr=<optimized out>) at arch/riscv/kernel/ftrace.c:109
#5 0xffffffe0002c9c04 in __ftrace_replace_code
(rec=0xffffffe01ae40c30, enable=3) at kernel/trace/ftrace.c:2503
#6 0xffffffe0002ca0b2 in ftrace_replace_code (mod_flags=<optimized
out>) at kernel/trace/ftrace.c:2530
#7 0xffffffe0002ca26a in ftrace_modify_all_code (command=5) at
kernel/trace/ftrace.c:2677
#8 0xffffffe0002ca30e in __ftrace_modify_code (data=<optimized out>)
at kernel/trace/ftrace.c:2703
#9 0xffffffe0002c13b0 in multi_cpu_stop (data=0x0) at kernel/stop_machine.c:224
#10 0xffffffe0002c0fde in cpu_stopper_thread (cpu=<optimized out>) at
kernel/stop_machine.c:491
#11 0xffffffe0002343de in smpboot_thread_fn (data=0x0) at kernel/smpboot.c:165
#12 0xffffffe00022f8b4 in kthread (_create=0xffffffe01af0c040) at
kernel/kthread.c:292
#13 0xffffffe000201fac in handle_exception () at arch/riscv/kernel/entry.S:236
0xffffffe00020678a <+114>: auipc ra,0xffffe
0xffffffe00020678e <+118>: jalr -118(ra) # 0xffffffe000204714
<patch_text_nosync>
0xffffffe000206792 <+122>: snez a0,a0
(gdb) disassemble patch_text_nosync
Dump of assembler code for function patch_text_nosync:
0xffffffe000204714 <+0>: addi sp,sp,-32
0xffffffe000204716 <+2>: sd s0,16(sp)
0xffffffe000204718 <+4>: sd ra,24(sp)
0xffffffe00020471a <+6>: addi s0,sp,32
0xffffffe00020471c <+8>: auipc ra,0x0
0xffffffe000204720 <+12>: jalr -384(ra) # 0xffffffe00020459c
<patch_insn_write>
0xffffffe000204724 <+16>: beqz a0,0xffffffe00020472e
<patch_text_nosync+26>
0xffffffe000204726 <+18>: ld ra,24(sp)
0xffffffe000204728 <+20>: ld s0,16(sp)
0xffffffe00020472a <+22>: addi sp,sp,32
0xffffffe00020472c <+24>: ret
0xffffffe00020472e <+26>: sd a0,-24(s0)
0xffffffe000204732 <+30>: auipc ra,0x4
0xffffffe000204736 <+34>: jalr -1464(ra) # 0xffffffe00020817a
<flush_icache_all>
0xffffffe00020473a <+38>: ld a0,-24(s0)
0xffffffe00020473e <+42>: ld ra,24(sp)
0xffffffe000204740 <+44>: ld s0,16(sp)
0xffffffe000204742 <+46>: addi sp,sp,32
0xffffffe000204744 <+48>: ret
(gdb) disassemble flush_icache_all-4
Dump of assembler code for function flush_icache_all:
0xffffffe00020817a <+0>: addi sp,sp,-8
0xffffffe00020817c <+2>: sd ra,0(sp)
0xffffffe00020817e <+4>: auipc ra,0xfffff
0xffffffe000208182 <+8>: jalr -1822(ra) # 0xffffffe000206a60
<ftrace_caller>
0xffffffe000208186 <+12>: ld ra,0(sp)
0xffffffe000208188 <+14>: addi sp,sp,8
0xffffffe00020818a <+0>: addi sp,sp,-16
0xffffffe00020818c <+2>: sd s0,0(sp)
0xffffffe00020818e <+4>: sd ra,8(sp)
0xffffffe000208190 <+6>: addi s0,sp,16
0xffffffe000208192 <+8>: li a0,0
0xffffffe000208194 <+10>: auipc ra,0xfffff
0xffffffe000208198 <+14>: jalr -410(ra) # 0xffffffe000206ffa
<sbi_remote_fence_i>
0xffffffe00020819c <+18>: ld s0,0(sp)
0xffffffe00020819e <+20>: ld ra,8(sp)
0xffffffe0002081a0 <+22>: addi sp,sp,16
0xffffffe0002081a2 <+24>: ret
(gdb) frame 5
#5 0xffffffe0002c9c04 in __ftrace_replace_code
(rec=0xffffffe01ae40c30, enable=3) at kernel/trace/ftrace.c:2503
2503 return ftrace_make_call(rec, ftrace_addr);
(gdb) p /x rec->ip
$2 = 0xffffffe00020817a -> flush_icache_all !
Look at what happens when we modify flush_icache_all's patchable entry
with ftrace_caller:
1. Insert a call to ftrace_caller at the flush_icache_all entry.
2. Call flush_icache_all to sync the I/D cache, but flush_icache_all is
the very function we have just modified, and it is not yet ready to be
called!
>
> As per Zong's explanation, the issue was that the other harts already
> fetched the next 2 nops and
> executed 1 while kernel patching replaced other with one of the auipc
> + jalr pair.
>
> @Zong can correct me if I am wrong.
>
> These functions are too far ahead. Can it cause such issues ? If yes,
> then we need to mark each and every function
> that can be invoked from patch_text_nosync and are not inlined.
>
> That includes copy_to_kernel_nofault, __sbi_rfence_v02,
> __sbi_rfence_v02_call, sbi_ecall.
Yes, mark all of them.
>
> Few of these functions may be inlined by compiler. Can we depend on that ?
It works, but we'd better mark them notrace explicitly rather than
depend on the compiler inlining them.
>
> > Because:
> > (gdb) bt
> > #0 flush_icache_all () at arch/riscv/mm/cacheflush.c:20
> > #1 0xffffffe00020473a in patch_text_nosync (addr=<optimized out>, insns=
> > <optimized out>, len=<optimized out>) at arch/riscv/kernel/patch.c:96
> > #2 0xffffffe000206792 in ftrace_make_call (rec=<optimized out>,
> > addr=<optimized out>) at arch/riscv/kernel/ftrace.c:109
> > #3 0xffffffe0002c9be4 in __ftrace_replace_code (rec=0xffffffe01ae40020, e
> > nable=true) at kernel/trace/ftrace.c:2503
> > #4 0xffffffe0002ca092 in ftrace_replace_code (mod_flags=<optimized
> > out>) at kernel/trace/ftrace.c:2530
> > #5 0xffffffe0002ca24a in ftrace_modify_all_code (command=9) at kernel
> > /trace/ftrace.c:2677
> > #6 0xffffffe0002ca2ee in __ftrace_modify_code (data=<optimized out>) at
> > kernel/trace/ftrace.c:2703
> > #7 0xffffffe0002c1390 in multi_cpu_stop (data=0x0) at kernel/stop_machin
> > e.c:224
> > #8 0xffffffe0002c0fbe in cpu_stopper_thread (cpu=<optimized out>) at kern
> > el/stop_machine.c:491
> > #9 0xffffffe0002343be in smpboot_thread_fn (data=0x0) at kernel/smpboot.
> > c:165
> > #10 0xffffffe00022f894 in kthread (_create=0xffffffe01af13040) at kern
> > el/kthread.c:292
> > #11 0xffffffe000201fac in handle_exception () at arch/riscv/kernel/entry.S:236
> >
> > On Wed, Oct 21, 2020 at 3:38 PM Zong Li <[email protected]> wrote:
> > >
> > > Like the commit cb9d7fd51d9f ("watchdog: Mark watchdog touch functions
> > > as notrace"), some architectures assume that the stopped CPUs don't make
> > > function calls to traceable functions when they are in the stopped
> > > state. For example, it causes unexpected kernel crashed when switching
> > > tracer on RISC-V.
> > >
> > > The following patches added calls to these two functions, fix it by
> > > adding the notrace annotations.
> > >
> > > Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield")
> > > Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in
> > > multi_cpu_stop()")
> > >
> > > Signed-off-by: Zong Li <[email protected]>
> > > ---
> > > kernel/rcu/tree.c | 2 +-
> > > kernel/stop_machine.c | 2 +-
> > > 2 files changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 06895ef85d69..2a52f42f64b6 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -409,7 +409,7 @@ bool rcu_eqs_special_set(int cpu)
> > > *
> > > * The caller must have disabled interrupts and must not be idle.
> > > */
> > > -void rcu_momentary_dyntick_idle(void)
> > > +notrace void rcu_momentary_dyntick_idle(void)
> > > {
> > > int special;
> > >
> > > diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> > > index 865bb0228ab6..890b79cf0e7c 100644
> > > --- a/kernel/stop_machine.c
> > > +++ b/kernel/stop_machine.c
> > > @@ -178,7 +178,7 @@ static void ack_state(struct multi_stop_data *msdata)
> > > set_state(msdata, msdata->state + 1);
> > > }
> > >
> > > -void __weak stop_machine_yield(const struct cpumask *cpumask)
> > > +notrace void __weak stop_machine_yield(const struct cpumask *cpumask)
> > > {
> > > cpu_relax();
> > > }
> > > --
> > > 2.28.0
> > >
> >
> >
> > --
> > Best Regards
> > Guo Ren
> >
> > ML: https://lore.kernel.org/linux-csky/
>
>
>
> --
> Regards,
> Atish
--
Best Regards
Guo Ren
ML: https://lore.kernel.org/linux-csky/
On Thu, Oct 29, 2020 at 10:34 AM Zong Li <[email protected]> wrote:
>
> On Thu, Oct 29, 2020 at 8:23 AM Atish Patra <[email protected]> wrote:
> >
> > On Wed, Oct 28, 2020 at 8:44 AM Guo Ren <[email protected]> wrote:
> > >
> > > Hi Zong & Atish,
> > >
> > > In our 2 harts c910 chip, we found:
> > >
> > > echo function > /sys/kernel/debug/tracing/current_tracer
> > > echo function_graph > /sys/kernel/debug/tracing/current_tracer
> > > echo function > /sys/kernel/debug/tracing/current_tracer
> > > echo function_graph > /sys/kernel/debug/tracing/current_tracer
> > >
> > > Then one core halted at stop_machine_yield:
> > > arch_cpu_idle () at arch/riscv/kernel/process.c:39
> > > 39 local_irq_enable();
> > > (gdb) i th
> > > Id Target Id Frame
> > > * 1 Thread 1 (CPU#0) arch_cpu_idle () at arch/riscv/kernel/process.c:39
> > > 2 Thread 2 (CPU#1) stop_machine_yield
> > > (cpumask=0xffffffe001371fa8 <__cpu_online_mask>) at
> > > ./arch/riscv/include/asm/vdso/processor.h:12
> > > (gdb) thread 2
> > > [Switching to thread 2 (Thread 2)]
> > > #0 stop_machine_yield (cpumask=0xffffffe001371fa8
> > > <__cpu_online_mask>) at ./arch/riscv/include/asm/vdso/processor.h:12
> > > 12 __asm__ __volatile__ ("div %0, %0, zero" : "=r" (dummy));
> > >
> > > With your patch, it's solved. For this patch, I'll give:
> > > Tested by: Guo Ren <[email protected]>
> > >
> > > But that's not enough, we still need:
> > >
> > > diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
> > > index 226ccce..12b8808 100644
> > > --- a/arch/riscv/kernel/sbi.c
> > > +++ b/arch/riscv/kernel/sbi.c
> > > @@ -376,7 +376,7 @@ EXPORT_SYMBOL(sbi_send_ipi);
> > > *
> > > * Return: None
> > > */
> > > -void sbi_remote_fence_i(const unsigned long *hart_mask)
> > > +void notrace sbi_remote_fence_i(const unsigned long *hart_mask)
> > > {
> > > __sbi_rfence(SBI_EXT_RFENCE_REMOTE_FENCE_I,
> > > hart_mask, 0, 0, 0, 0);
> > > diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> > > index 400b945d..9467d987 100644
> > > --- a/arch/riscv/mm/cacheflush.c
> > > +++ b/arch/riscv/mm/cacheflush.c
> > > @@ -9,12 +9,12 @@
> > >
> > > #include <asm/sbi.h>
> > >
> > > -static void ipi_remote_fence_i(void *info)
> > > +static void notrace ipi_remote_fence_i(void *info)
> > > {
> > > return local_flush_icache_all();
> > > }
> > >
> > > -void flush_icache_all(void)
> > > +void notrace flush_icache_all(void)
> > > {
> > > if (IS_ENABLED(CONFIG_RISCV_SBI))
> > > sbi_remote_fence_i(NULL);
> > >
> >
> > Did you see any issue if these functions are not marked as notrace ?
> >
> > As per Zong's explanation, the issue was that the other harts already
> > fetched the next 2 nops and
> > executed 1 while kernel patching replaced other with one of the auipc
> > + jalr pair.
> >
> > @Zong can correct me if I am wrong.
> >
> > These functions are too far ahead. Can it cause such issues ? If yes,
> > then we need to mark each and every function
> > that can be invoked from patch_text_nosync and are not inlined.
> >
> > That includes copy_to_kernel_nofault, __sbi_rfence_v02,
> > __sbi_rfence_v02_call, sbi_ecall.
> >
> > Few of these functions may be inlined by compiler. Can we depend on that ?
> >
> > > Because:
> > > (gdb) bt
> > > #0 flush_icache_all () at arch/riscv/mm/cacheflush.c:20
> > > #1 0xffffffe00020473a in patch_text_nosync (addr=<optimized out>, insns=
> > > <optimized out>, len=<optimized out>) at arch/riscv/kernel/patch.c:96
> > > #2 0xffffffe000206792 in ftrace_make_call (rec=<optimized out>,
> > > addr=<optimized out>) at arch/riscv/kernel/ftrace.c:109
> > > #3 0xffffffe0002c9be4 in __ftrace_replace_code (rec=0xffffffe01ae40020, e
> > > nable=true) at kernel/trace/ftrace.c:2503
> > > #4 0xffffffe0002ca092 in ftrace_replace_code (mod_flags=<optimized
> > > out>) at kernel/trace/ftrace.c:2530
> > > #5 0xffffffe0002ca24a in ftrace_modify_all_code (command=9) at kernel
> > > /trace/ftrace.c:2677
> > > #6 0xffffffe0002ca2ee in __ftrace_modify_code (data=<optimized out>) at
> > > kernel/trace/ftrace.c:2703
> > > #7 0xffffffe0002c1390 in multi_cpu_stop (data=0x0) at kernel/stop_machin
> > > e.c:224
> > > #8 0xffffffe0002c0fbe in cpu_stopper_thread (cpu=<optimized out>) at kern
> > > el/stop_machine.c:491
> > > #9 0xffffffe0002343be in smpboot_thread_fn (data=0x0) at kernel/smpboot.
> > > c:165
> > > #10 0xffffffe00022f894 in kthread (_create=0xffffffe01af13040) at kern
> > > el/kthread.c:292
> > > #11 0xffffffe000201fac in handle_exception () at arch/riscv/kernel/entry.S:236
> > >
>
> It seems to me that the problem happens on the waiting threads, it
No, that call trace shows the path ftrace_make_call ->
flush_icache_all; we should mark the whole path notrace.
> doesn't cause the issue on the patching code thread, so it is OK that
> these functions are traceable. I probably don't figure out all
> possible situations, do you find any issue and reason to change the
> annotation of these functions?
>
> > > On Wed, Oct 21, 2020 at 3:38 PM Zong Li <[email protected]> wrote:
> > > >
> > > > Like the commit cb9d7fd51d9f ("watchdog: Mark watchdog touch functions
> > > > as notrace"), some architectures assume that the stopped CPUs don't make
> > > > function calls to traceable functions when they are in the stopped
> > > > state. For example, it causes unexpected kernel crashed when switching
> > > > tracer on RISC-V.
> > > >
> > > > The following patches added calls to these two functions, fix it by
> > > > adding the notrace annotations.
> > > >
> > > > Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield")
> > > > Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in
> > > > multi_cpu_stop()")
> > > >
> > > > Signed-off-by: Zong Li <[email protected]>
> > > > ---
> > > > kernel/rcu/tree.c | 2 +-
> > > > kernel/stop_machine.c | 2 +-
> > > > 2 files changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 06895ef85d69..2a52f42f64b6 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -409,7 +409,7 @@ bool rcu_eqs_special_set(int cpu)
> > > > *
> > > > * The caller must have disabled interrupts and must not be idle.
> > > > */
> > > > -void rcu_momentary_dyntick_idle(void)
> > > > +notrace void rcu_momentary_dyntick_idle(void)
> > > > {
> > > > int special;
> > > >
> > > > diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> > > > index 865bb0228ab6..890b79cf0e7c 100644
> > > > --- a/kernel/stop_machine.c
> > > > +++ b/kernel/stop_machine.c
> > > > @@ -178,7 +178,7 @@ static void ack_state(struct multi_stop_data *msdata)
> > > > set_state(msdata, msdata->state + 1);
> > > > }
> > > >
> > > > -void __weak stop_machine_yield(const struct cpumask *cpumask)
> > > > +notrace void __weak stop_machine_yield(const struct cpumask *cpumask)
> > > > {
> > > > cpu_relax();
> > > > }
> > > > --
> > > > 2.28.0
> > > >
> > >
> > >
> > > --
> > > Best Regards
> > > Guo Ren
> > >
> > > ML: https://lore.kernel.org/linux-csky/
> >
> >
> >
> > --
> > Regards,
> > Atish
--
Best Regards
Guo Ren
ML: https://lore.kernel.org/linux-csky/
On Thu, Oct 29, 2020 at 9:06 AM Guo Ren <[email protected]> wrote:
>
> On Thu, Oct 29, 2020 at 10:34 AM Zong Li <[email protected]> wrote:
> >
> > On Thu, Oct 29, 2020 at 8:23 AM Atish Patra <[email protected]> wrote:
> > >
> > > On Wed, Oct 28, 2020 at 8:44 AM Guo Ren <[email protected]> wrote:
> > > >
> > > > Hi Zong & Atish,
> > > >
> > > > In our 2 harts c910 chip, we found:
> > > >
> > > > echo function > /sys/kernel/debug/tracing/current_tracer
> > > > echo function_graph > /sys/kernel/debug/tracing/current_tracer
> > > > echo function > /sys/kernel/debug/tracing/current_tracer
> > > > echo function_graph > /sys/kernel/debug/tracing/current_tracer
> > > >
> > > > Then one core halted at stop_machine_yield:
> > > > arch_cpu_idle () at arch/riscv/kernel/process.c:39
> > > > 39 local_irq_enable();
> > > > (gdb) i th
> > > > Id Target Id Frame
> > > > * 1 Thread 1 (CPU#0) arch_cpu_idle () at arch/riscv/kernel/process.c:39
> > > > 2 Thread 2 (CPU#1) stop_machine_yield
> > > > (cpumask=0xffffffe001371fa8 <__cpu_online_mask>) at
> > > > ./arch/riscv/include/asm/vdso/processor.h:12
> > > > (gdb) thread 2
> > > > [Switching to thread 2 (Thread 2)]
> > > > #0 stop_machine_yield (cpumask=0xffffffe001371fa8
> > > > <__cpu_online_mask>) at ./arch/riscv/include/asm/vdso/processor.h:12
> > > > 12 __asm__ __volatile__ ("div %0, %0, zero" : "=r" (dummy));
> > > >
> > > > With your patch, it's solved. For this patch, I'll give:
> > > > Tested by: Guo Ren <[email protected]>
> > > >
> > > > But that's not enough, we still need:
> > > >
> > > > diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
> > > > index 226ccce..12b8808 100644
> > > > --- a/arch/riscv/kernel/sbi.c
> > > > +++ b/arch/riscv/kernel/sbi.c
> > > > @@ -376,7 +376,7 @@ EXPORT_SYMBOL(sbi_send_ipi);
> > > > *
> > > > * Return: None
> > > > */
> > > > -void sbi_remote_fence_i(const unsigned long *hart_mask)
> > > > +void notrace sbi_remote_fence_i(const unsigned long *hart_mask)
> > > > {
> > > > __sbi_rfence(SBI_EXT_RFENCE_REMOTE_FENCE_I,
> > > > hart_mask, 0, 0, 0, 0);
> > > > diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> > > > index 400b945d..9467d987 100644
> > > > --- a/arch/riscv/mm/cacheflush.c
> > > > +++ b/arch/riscv/mm/cacheflush.c
> > > > @@ -9,12 +9,12 @@
> > > >
> > > > #include <asm/sbi.h>
> > > >
> > > > -static void ipi_remote_fence_i(void *info)
> > > > +static void notrace ipi_remote_fence_i(void *info)
> > > > {
> > > > return local_flush_icache_all();
> > > > }
> > > >
> > > > -void flush_icache_all(void)
> > > > +void notrace flush_icache_all(void)
> > > > {
> > > > if (IS_ENABLED(CONFIG_RISCV_SBI))
> > > > sbi_remote_fence_i(NULL);
> > > >
> > >
> > > Did you see any issue if these functions are not marked as notrace ?
> > >
> > > As per Zong's explanation, the issue was that the other harts already
> > > fetched the next 2 nops and
> > > executed 1 while kernel patching replaced other with one of the auipc
> > > + jalr pair.
> > >
> > > @Zong can correct me if I am wrong.
> > >
> > > These functions are too far ahead. Can it cause such issues ? If yes,
> > > then we need to mark each and every function
> > > that can be invoked from patch_text_nosync and are not inlined.
> > >
> > > That includes copy_to_kernel_nofault, __sbi_rfence_v02,
> > > __sbi_rfence_v02_call, sbi_ecall.
> > >
> > > Few of these functions may be inlined by compiler. Can we depend on that ?
> > >
> > > > Because:
> > > > (gdb) bt
> > > > #0 flush_icache_all () at arch/riscv/mm/cacheflush.c:20
> > > > #1 0xffffffe00020473a in patch_text_nosync (addr=<optimized out>, insns=
> > > > <optimized out>, len=<optimized out>) at arch/riscv/kernel/patch.c:96
> > > > #2 0xffffffe000206792 in ftrace_make_call (rec=<optimized out>,
> > > > addr=<optimized out>) at arch/riscv/kernel/ftrace.c:109
> > > > #3 0xffffffe0002c9be4 in __ftrace_replace_code (rec=0xffffffe01ae40020, e
> > > > nable=true) at kernel/trace/ftrace.c:2503
> > > > #4 0xffffffe0002ca092 in ftrace_replace_code (mod_flags=<optimized
> > > > out>) at kernel/trace/ftrace.c:2530
> > > > #5 0xffffffe0002ca24a in ftrace_modify_all_code (command=9) at kernel
> > > > /trace/ftrace.c:2677
> > > > #6 0xffffffe0002ca2ee in __ftrace_modify_code (data=<optimized out>) at
> > > > kernel/trace/ftrace.c:2703
> > > > #7 0xffffffe0002c1390 in multi_cpu_stop (data=0x0) at kernel/stop_machin
> > > > e.c:224
> > > > #8 0xffffffe0002c0fbe in cpu_stopper_thread (cpu=<optimized out>) at kern
> > > > el/stop_machine.c:491
> > > > #9 0xffffffe0002343be in smpboot_thread_fn (data=0x0) at kernel/smpboot.
> > > > c:165
> > > > #10 0xffffffe00022f894 in kthread (_create=0xffffffe01af13040) at kern
> > > > el/kthread.c:292
> > > > #11 0xffffffe000201fac in handle_exception () at arch/riscv/kernel/entry.S:236
> > > >
> >
> > It seems to me that the problem happens on the waiting threads, it
> No, that is the call trace to show ftrace_make_call ->
> flush_icache_all and we should give notrace on the whole path.
>
Hmm. I am curious to understand how other architectures avoid this problem.
Is it a bigger issue in RISC-V because we have to switch privilege
mode to sync I/D cache ?
> > doesn't cause the issue on the patching code thread, so it is OK that
> > these functions are traceable. I probably don't figure out all
> > possible situations, do you find any issue and reason to change the
> > annotation of these functions?
> >
> > > > On Wed, Oct 21, 2020 at 3:38 PM Zong Li <[email protected]> wrote:
> > > > >
> > > > > Like the commit cb9d7fd51d9f ("watchdog: Mark watchdog touch functions
> > > > > as notrace"), some architectures assume that the stopped CPUs don't make
> > > > > function calls to traceable functions when they are in the stopped
> > > > > state. For example, it causes unexpected kernel crashed when switching
> > > > > tracer on RISC-V.
> > > > >
> > > > > The following patches added calls to these two functions, fix it by
> > > > > adding the notrace annotations.
> > > > >
> > > > > Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield")
> > > > > Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in
> > > > > multi_cpu_stop()")
> > > > >
> > > > > Signed-off-by: Zong Li <[email protected]>
> > > > > ---
> > > > > kernel/rcu/tree.c | 2 +-
> > > > > kernel/stop_machine.c | 2 +-
> > > > > 2 files changed, 2 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index 06895ef85d69..2a52f42f64b6 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -409,7 +409,7 @@ bool rcu_eqs_special_set(int cpu)
> > > > > *
> > > > > * The caller must have disabled interrupts and must not be idle.
> > > > > */
> > > > > -void rcu_momentary_dyntick_idle(void)
> > > > > +notrace void rcu_momentary_dyntick_idle(void)
> > > > > {
> > > > > int special;
> > > > >
> > > > > diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> > > > > index 865bb0228ab6..890b79cf0e7c 100644
> > > > > --- a/kernel/stop_machine.c
> > > > > +++ b/kernel/stop_machine.c
> > > > > @@ -178,7 +178,7 @@ static void ack_state(struct multi_stop_data *msdata)
> > > > > set_state(msdata, msdata->state + 1);
> > > > > }
> > > > >
> > > > > -void __weak stop_machine_yield(const struct cpumask *cpumask)
> > > > > +notrace void __weak stop_machine_yield(const struct cpumask *cpumask)
> > > > > {
> > > > > cpu_relax();
> > > > > }
> > > > > --
> > > > > 2.28.0
> > > > >
> > > >
> > > >
> > > > --
> > > > Best Regards
> > > > Guo Ren
> > > >
> > > > ML: https://lore.kernel.org/linux-csky/
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Atish
>
>
>
> --
> Best Regards
> Guo Ren
>
> ML: https://lore.kernel.org/linux-csky/
--
Regards,
Atish
On Fri, Oct 30, 2020 at 2:46 AM Atish Patra <[email protected]> wrote:
>
> On Thu, Oct 29, 2020 at 9:06 AM Guo Ren <[email protected]> wrote:
> >
> > On Thu, Oct 29, 2020 at 10:34 AM Zong Li <[email protected]> wrote:
> > >
> > > On Thu, Oct 29, 2020 at 8:23 AM Atish Patra <[email protected]> wrote:
> > > >
> > > > On Wed, Oct 28, 2020 at 8:44 AM Guo Ren <[email protected]> wrote:
> > > > >
> > > > > Hi Zong & Atish,
> > > > >
> > > > > In our 2 harts c910 chip, we found:
> > > > >
> > > > > echo function > /sys/kernel/debug/tracing/current_tracer
> > > > > echo function_graph > /sys/kernel/debug/tracing/current_tracer
> > > > > echo function > /sys/kernel/debug/tracing/current_tracer
> > > > > echo function_graph > /sys/kernel/debug/tracing/current_tracer
> > > > >
> > > > > Then one core halted at stop_machine_yield:
> > > > > arch_cpu_idle () at arch/riscv/kernel/process.c:39
> > > > > 39 local_irq_enable();
> > > > > (gdb) i th
> > > > > Id Target Id Frame
> > > > > * 1 Thread 1 (CPU#0) arch_cpu_idle () at arch/riscv/kernel/process.c:39
> > > > > 2 Thread 2 (CPU#1) stop_machine_yield
> > > > > (cpumask=0xffffffe001371fa8 <__cpu_online_mask>) at
> > > > > ./arch/riscv/include/asm/vdso/processor.h:12
> > > > > (gdb) thread 2
> > > > > [Switching to thread 2 (Thread 2)]
> > > > > #0 stop_machine_yield (cpumask=0xffffffe001371fa8
> > > > > <__cpu_online_mask>) at ./arch/riscv/include/asm/vdso/processor.h:12
> > > > > 12 __asm__ __volatile__ ("div %0, %0, zero" : "=r" (dummy));
> > > > >
> > > > > With your patch, it's solved. For this patch, I'll give:
> > > > > Tested by: Guo Ren <[email protected]>
> > > > >
> > > > > But that's not enough, we still need:
> > > > >
> > > > > diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
> > > > > index 226ccce..12b8808 100644
> > > > > --- a/arch/riscv/kernel/sbi.c
> > > > > +++ b/arch/riscv/kernel/sbi.c
> > > > > @@ -376,7 +376,7 @@ EXPORT_SYMBOL(sbi_send_ipi);
> > > > > *
> > > > > * Return: None
> > > > > */
> > > > > -void sbi_remote_fence_i(const unsigned long *hart_mask)
> > > > > +void notrace sbi_remote_fence_i(const unsigned long *hart_mask)
> > > > > {
> > > > > __sbi_rfence(SBI_EXT_RFENCE_REMOTE_FENCE_I,
> > > > > hart_mask, 0, 0, 0, 0);
> > > > > diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> > > > > index 400b945d..9467d987 100644
> > > > > --- a/arch/riscv/mm/cacheflush.c
> > > > > +++ b/arch/riscv/mm/cacheflush.c
> > > > > @@ -9,12 +9,12 @@
> > > > >
> > > > > #include <asm/sbi.h>
> > > > >
> > > > > -static void ipi_remote_fence_i(void *info)
> > > > > +static void notrace ipi_remote_fence_i(void *info)
> > > > > {
> > > > > return local_flush_icache_all();
> > > > > }
> > > > >
> > > > > -void flush_icache_all(void)
> > > > > +void notrace flush_icache_all(void)
> > > > > {
> > > > > if (IS_ENABLED(CONFIG_RISCV_SBI))
> > > > > sbi_remote_fence_i(NULL);
> > > > >
> > > >
> > > > Did you see any issue if these functions are not marked as notrace ?
> > > >
> > > > As per Zong's explanation, the issue was that the other harts already
> > > > fetched the next 2 nops and
> > > > executed 1 while kernel patching replaced other with one of the auipc
> > > > + jalr pair.
> > > >
> > > > @Zong can correct me if I am wrong.
> > > >
> > > > These functions are too far ahead. Can it cause such issues ? If yes,
> > > > then we need to mark each and every function
> > > > that can be invoked from patch_text_nosync and are not inlined.
> > > >
> > > > That includes copy_to_kernel_nofault, __sbi_rfence_v02,
> > > > __sbi_rfence_v02_call, sbi_ecall.
> > > >
> > > > Few of these functions may be inlined by compiler. Can we depend on that ?
> > > >
> > > > > Because:
> > > > > (gdb) bt
> > > > > #0 flush_icache_all () at arch/riscv/mm/cacheflush.c:20
> > > > > #1 0xffffffe00020473a in patch_text_nosync (addr=<optimized out>, insns=
> > > > > <optimized out>, len=<optimized out>) at arch/riscv/kernel/patch.c:96
> > > > > #2 0xffffffe000206792 in ftrace_make_call (rec=<optimized out>,
> > > > > addr=<optimized out>) at arch/riscv/kernel/ftrace.c:109
> > > > > #3 0xffffffe0002c9be4 in __ftrace_replace_code (rec=0xffffffe01ae40020, e
> > > > > nable=true) at kernel/trace/ftrace.c:2503
> > > > > #4 0xffffffe0002ca092 in ftrace_replace_code (mod_flags=<optimized
> > > > > out>) at kernel/trace/ftrace.c:2530
> > > > > #5 0xffffffe0002ca24a in ftrace_modify_all_code (command=9) at kernel
> > > > > /trace/ftrace.c:2677
> > > > > #6 0xffffffe0002ca2ee in __ftrace_modify_code (data=<optimized out>) at
> > > > > kernel/trace/ftrace.c:2703
> > > > > #7 0xffffffe0002c1390 in multi_cpu_stop (data=0x0) at kernel/stop_machin
> > > > > e.c:224
> > > > > #8 0xffffffe0002c0fbe in cpu_stopper_thread (cpu=<optimized out>) at kern
> > > > > el/stop_machine.c:491
> > > > > #9 0xffffffe0002343be in smpboot_thread_fn (data=0x0) at kernel/smpboot.
> > > > > c:165
> > > > > #10 0xffffffe00022f894 in kthread (_create=0xffffffe01af13040) at kern
> > > > > el/kthread.c:292
> > > > > #11 0xffffffe000201fac in handle_exception () at arch/riscv/kernel/entry.S:236
> > > > >
> > >
> > > It seems to me that the problem happens on the waiting threads, it
> > No, that is the call trace to show ftrace_make_call ->
> > flush_icache_all and we should give notrace on the whole path.
> >
>
> Hmm. I am curious to understand how other architectures avoid this problem.
for arm64
static int ftrace_modify_code(unsigned long pc, u32 old, u32 new,
bool validate)
{
u32 replaced;
...
if (aarch64_insn_patch_text_nosync((void *)pc, new))
return -EPERM;
int __kprobes aarch64_insn_patch_text_nosync(void *addr, u32 insn)
{
u32 *tp = addr;
int ret;
/* A64 instructions must be word aligned */
if ((uintptr_t)tp & 0x3)
return -EINVAL;
ret = aarch64_insn_write(tp, insn);
if (ret == 0)
__flush_icache_range((uintptr_t)tp,
(uintptr_t)tp + AARCH64_INSN_SIZE);
Look at arm64: they use the __kprobes flag, and I guess that also
prevents an ftrace call site.
__flush_icache_range is written in asm, so it has no possible ftrace
call site at all.
> Is it a bigger issue in RISC-V because we have to switch privilege
> mode to sync I/D cache ?
We should sync the I/D cache in S-mode because we need the virtual address.
For c910 we've added icache broadcast invalidation instructions by physical
address and by virtual address.
The current linux/arch/riscv I/D cache sync is very expensive.
--
Best Regards
Guo Ren
ML: https://lore.kernel.org/linux-csky/
On Thu, Oct 29, 2020 at 8:28 PM Guo Ren <[email protected]> wrote:
>
> On Fri, Oct 30, 2020 at 2:46 AM Atish Patra <[email protected]> wrote:
> >
> > On Thu, Oct 29, 2020 at 9:06 AM Guo Ren <[email protected]> wrote:
> > >
> > > On Thu, Oct 29, 2020 at 10:34 AM Zong Li <[email protected]> wrote:
> > > >
> > > > On Thu, Oct 29, 2020 at 8:23 AM Atish Patra <[email protected]> wrote:
> > > > >
> > > > > On Wed, Oct 28, 2020 at 8:44 AM Guo Ren <[email protected]> wrote:
> > > > > >
> > > > > > Hi Zong & Atish,
> > > > > >
> > > > > > In our 2 harts c910 chip, we found:
> > > > > >
> > > > > > echo function > /sys/kernel/debug/tracing/current_tracer
> > > > > > echo function_graph > /sys/kernel/debug/tracing/current_tracer
> > > > > > echo function > /sys/kernel/debug/tracing/current_tracer
> > > > > > echo function_graph > /sys/kernel/debug/tracing/current_tracer
> > > > > >
> > > > > > Then one core halted at stop_machine_yield:
> > > > > > arch_cpu_idle () at arch/riscv/kernel/process.c:39
> > > > > > 39 local_irq_enable();
> > > > > > (gdb) i th
> > > > > > Id Target Id Frame
> > > > > > * 1 Thread 1 (CPU#0) arch_cpu_idle () at arch/riscv/kernel/process.c:39
> > > > > > 2 Thread 2 (CPU#1) stop_machine_yield
> > > > > > (cpumask=0xffffffe001371fa8 <__cpu_online_mask>) at
> > > > > > ./arch/riscv/include/asm/vdso/processor.h:12
> > > > > > (gdb) thread 2
> > > > > > [Switching to thread 2 (Thread 2)]
> > > > > > #0 stop_machine_yield (cpumask=0xffffffe001371fa8
> > > > > > <__cpu_online_mask>) at ./arch/riscv/include/asm/vdso/processor.h:12
> > > > > > 12 __asm__ __volatile__ ("div %0, %0, zero" : "=r" (dummy));
> > > > > >
> > > > > > With your patch, it's solved. For this patch, I'll give:
> > > > > > Tested by: Guo Ren <[email protected]>
> > > > > >
> > > > > > But that's not enough, we still need:
> > > > > >
> > > > > > diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
> > > > > > index 226ccce..12b8808 100644
> > > > > > --- a/arch/riscv/kernel/sbi.c
> > > > > > +++ b/arch/riscv/kernel/sbi.c
> > > > > > @@ -376,7 +376,7 @@ EXPORT_SYMBOL(sbi_send_ipi);
> > > > > > *
> > > > > > * Return: None
> > > > > > */
> > > > > > -void sbi_remote_fence_i(const unsigned long *hart_mask)
> > > > > > +void notrace sbi_remote_fence_i(const unsigned long *hart_mask)
> > > > > > {
> > > > > > __sbi_rfence(SBI_EXT_RFENCE_REMOTE_FENCE_I,
> > > > > > hart_mask, 0, 0, 0, 0);
> > > > > > diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> > > > > > index 400b945d..9467d987 100644
> > > > > > --- a/arch/riscv/mm/cacheflush.c
> > > > > > +++ b/arch/riscv/mm/cacheflush.c
> > > > > > @@ -9,12 +9,12 @@
> > > > > >
> > > > > > #include <asm/sbi.h>
> > > > > >
> > > > > > -static void ipi_remote_fence_i(void *info)
> > > > > > +static void notrace ipi_remote_fence_i(void *info)
> > > > > > {
> > > > > > return local_flush_icache_all();
> > > > > > }
> > > > > >
> > > > > > -void flush_icache_all(void)
> > > > > > +void notrace flush_icache_all(void)
> > > > > > {
> > > > > > if (IS_ENABLED(CONFIG_RISCV_SBI))
> > > > > > sbi_remote_fence_i(NULL);
> > > > > >
> > > > >
> > > > > Did you see any issue if these functions are not marked as notrace ?
> > > > >
> > > > > As per Zong's explanation, the issue was that the other harts already
> > > > > fetched the next 2 nops and
> > > > > executed 1 while kernel patching replaced other with one of the auipc
> > > > > + jalr pair.
> > > > >
> > > > > @Zong can correct me if I am wrong.
> > > > >
> > > > > These functions are too far ahead. Can it cause such issues ? If yes,
> > > > > then we need to mark each and every function
> > > > > that can be invoked from patch_text_nosync and are not inlined.
> > > > >
> > > > > That includes copy_to_kernel_nofault, __sbi_rfence_v02,
> > > > > __sbi_rfence_v02_call, sbi_ecall.
> > > > >
> > > > > Few of these functions may be inlined by compiler. Can we depend on that ?
> > > > >
> > > > > > Because:
> > > > > > (gdb) bt
> > > > > > #0 flush_icache_all () at arch/riscv/mm/cacheflush.c:20
> > > > > > #1 0xffffffe00020473a in patch_text_nosync (addr=<optimized out>, insns=
> > > > > > <optimized out>, len=<optimized out>) at arch/riscv/kernel/patch.c:96
> > > > > > #2 0xffffffe000206792 in ftrace_make_call (rec=<optimized out>,
> > > > > > addr=<optimized out>) at arch/riscv/kernel/ftrace.c:109
> > > > > > #3 0xffffffe0002c9be4 in __ftrace_replace_code (rec=0xffffffe01ae40020, e
> > > > > > nable=true) at kernel/trace/ftrace.c:2503
> > > > > > #4 0xffffffe0002ca092 in ftrace_replace_code (mod_flags=<optimized
> > > > > > out>) at kernel/trace/ftrace.c:2530
> > > > > > #5 0xffffffe0002ca24a in ftrace_modify_all_code (command=9) at kernel
> > > > > > /trace/ftrace.c:2677
> > > > > > #6 0xffffffe0002ca2ee in __ftrace_modify_code (data=<optimized out>) at
> > > > > > kernel/trace/ftrace.c:2703
> > > > > > #7 0xffffffe0002c1390 in multi_cpu_stop (data=0x0) at kernel/stop_machin
> > > > > > e.c:224
> > > > > > #8 0xffffffe0002c0fbe in cpu_stopper_thread (cpu=<optimized out>) at kern
> > > > > > el/stop_machine.c:491
> > > > > > #9 0xffffffe0002343be in smpboot_thread_fn (data=0x0) at kernel/smpboot.
> > > > > > c:165
> > > > > > #10 0xffffffe00022f894 in kthread (_create=0xffffffe01af13040) at kern
> > > > > > el/kthread.c:292
> > > > > > #11 0xffffffe000201fac in handle_exception () at arch/riscv/kernel/entry.S:236
> > > > > >
> > > >
> > > > It seems to me that the problem happens on the waiting threads, it
> > > No, that is the call trace to show ftrace_make_call ->
> > > flush_icache_all and we should give notrace on the whole path.
> > >
> >
> > Hmm. I am curious to understand how other architectures avoid this problem.
>
> for arm64
> static int ftrace_modify_code(unsigned long pc, u32 old, u32 new,
> bool validate)
> {
> u32 replaced;
> ...
> if (aarch64_insn_patch_text_nosync((void *)pc, new))
> return -EPERM;
>
> int __kprobes aarch64_insn_patch_text_nosync(void *addr, u32 insn)
> {
> u32 *tp = addr;
> int ret;
>
> /* A64 instructions must be word aligned */
> if ((uintptr_t)tp & 0x3)
> return -EINVAL;
>
> ret = aarch64_insn_write(tp, insn);
> if (ret == 0)
> __flush_icache_range((uintptr_t)tp,
> (uintptr_t)tp + AARCH64_INSN_SIZE);
>
> Look at arm64, they __kprobes flag and I guess it would also prevent
> ftrace call site.
>
Are you sure about that? __kprobes puts the code in the .kprobes.text section,
which is among the whitelisted sections in recordmcount.pl & recordmcount.c.
> __flush_icache_range is written in asm and no possible ftrace call site.
>
> > Is it a bigger issue in RISC-V because we have to switch privilege
> > mode to sync I/D cache ?
> We should sync I/D cache at s-mode because we need virtual address.
> For c910 we've added icache broadcast invalid instructions by physical
> address and virtual address.
>
> Current linux/arch/riscv I/D cache sync is so expensive.
>
Yes. It is a known fact. Unfortunately, the RISC-V specification doesn't
allow any other method yet.
I hope the specification is amended to allow some method to sync the I/D
cache from S-mode soon.
> --
> Best Regards
> Guo Ren
>
> ML: https://lore.kernel.org/linux-csky/
--
Regards,
Atish
On Fri, 30 Oct 2020 14:47:56 -0700
Atish Patra <[email protected]> wrote:
> > Look at arm64, they __kprobes flag and I guess it would also prevent
> > ftrace call site.
> >
>
> Are you sure about that ? __kprobes puts the code in .kprobes.text section
> which is under whitelist sections in recordmcount.pl & recordmcount.c.
Correct, ftrace can trace functions marked with __kprobes. That said,
the instruction you are looking at here is in a file that is
blacklisted from recordmcount.
CFLAGS_REMOVE_insn.o = $(CC_FLAGS_FTRACE)
All ftrace flags are removed from the compiling of insn.c, and every
function in that file will not be traced.
-- Steve
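
The per-file mechanism described above works roughly like this, a simplified
sketch of Kbuild's per-object flag filtering rather than the exact
scripts/Makefile.lib code (the `-pg` default shown is an assumption for
illustration):

```make
# Flags listed in CFLAGS_REMOVE_<object> are filtered out of the command
# line used to compile that one object, so no mcount call or
# patchable-entry nops are emitted anywhere in the file.
CC_FLAGS_FTRACE ?= -pg        # riscv uses -fpatchable-function-entry=8
CFLAGS_REMOVE_insn.o = $(CC_FLAGS_FTRACE)

# Roughly what scripts/Makefile.lib does per object:
_c_flags = $(filter-out $(CFLAGS_REMOVE_$(basetarget).o), $(KBUILD_CFLAGS))
```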
On Sat, Oct 31, 2020 at 8:28 AM Steven Rostedt <[email protected]> wrote:
>
> On Fri, 30 Oct 2020 14:47:56 -0700
> Atish Patra <[email protected]> wrote:
>
> > > Look at arm64, they __kprobes flag and I guess it would also prevent
> > > ftrace call site.
> > >
> >
> > Are you sure about that ? __kprobes puts the code in .kprobes.text section
> > which is under whitelist sections in recordmcount.pl & recordmcount.c.
>
> Correct, ftrace can trace functions marked with __kprobes. That said,
I guessed wrong, thanks for correcting me.
> the instruction you are looking at here, is in a file that is
> blacklisted from recordmcount.
>
> CFLAGS_REMOVE_insn.o = $(CC_FLAGS_FTRACE)
>
> All ftrace flags are removed from the compiling of insn.c, and every
> function in that file will not be traced.
Yes, arm64 prevents the whole file from being traced by ftrace. My patch
just uses the notrace flag on some functions.
@Atish What do you think:
CFLAGS_REMOVE_cacheflush.o = $(CC_FLAGS_FTRACE)
CFLAGS_REMOVE_sbi.o = $(CC_FLAGS_FTRACE)
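
Spelled out, the suggestion would amount to something like this
(hypothetical placement in the arch Makefiles, mirroring arm64's handling
of insn.c):

```make
# arch/riscv/mm/Makefile (sketch): build cacheflush.o with no ftrace
# instrumentation at all, so flush_icache_all() and ipi_remote_fence_i()
# can never be patched while a stopped hart sits inside them.
CFLAGS_REMOVE_cacheflush.o = $(CC_FLAGS_FTRACE)

# arch/riscv/kernel/Makefile (sketch): same treatment for the SBI call
# wrappers used on the code-patching path.
CFLAGS_REMOVE_sbi.o = $(CC_FLAGS_FTRACE)
```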
--
Best Regards
Guo Ren
ML: https://lore.kernel.org/linux-csky/
On Sat, Oct 31, 2020 at 12:42 AM Guo Ren <[email protected]> wrote:
>
> On Sat, Oct 31, 2020 at 8:28 AM Steven Rostedt <[email protected]> wrote:
> >
> > On Fri, 30 Oct 2020 14:47:56 -0700
> > Atish Patra <[email protected]> wrote:
> >
> > > > Look at arm64, they __kprobes flag and I guess it would also prevent
> > > > ftrace call site.
> > > >
> > >
> > > Are you sure about that ? __kprobes puts the code in .kprobes.text section
> > > which is under whitelist sections in recordmcount.pl & recordmcount.c.
> >
> > Correct, ftrace can trace functions marked with __kprobes. That said,
> I guess wrong, thx for correct me.
>
> > the instruction you are looking at here, is in a file that is
> > blacklisted from recordmcount.
> >
> > CFLAGS_REMOVE_insn.o = $(CC_FLAGS_FTRACE)
> >
> > All ftrace flags are removed from the compiling of insn.c, and every
> > function in that file will not be traced.
> Yes, arm64 prevents the whole file from ftrace. My patch just use
> notrace flag setting on some functions.
>
> @Atish How do think:
> CFLAGS_REMOVE_cacheflush.o = $(CC_FLAGS_FTRACE)
> CFLAGS_REMOVE_sbi.o = $(CC_FLAGS_FTRACE)
>
Looks good to me. What should be done for copy_to_kernel_nofault?
That is also in the calling path.
> --
> Best Regards
> Guo Ren
>
> ML: https://lore.kernel.org/linux-csky/
--
Regards,
Atish
On Tue, Nov 3, 2020 at 11:33 PM Atish Patra <[email protected]> wrote:
>
> On Sat, Oct 31, 2020 at 12:42 AM Guo Ren <[email protected]> wrote:
> >
> > On Sat, Oct 31, 2020 at 8:28 AM Steven Rostedt <[email protected]> wrote:
> > >
> > > On Fri, 30 Oct 2020 14:47:56 -0700
> > > Atish Patra <[email protected]> wrote:
> > >
> > > > > Look at arm64, they __kprobes flag and I guess it would also prevent
> > > > > ftrace call site.
> > > > >
> > > >
> > > > Are you sure about that ? __kprobes puts the code in .kprobes.text section
> > > > which is under whitelist sections in recordmcount.pl & recordmcount.c.
> > >
> > > Correct, ftrace can trace functions marked with __kprobes. That said,
> > I guess wrong, thx for correct me.
> >
> > > the instruction you are looking at here, is in a file that is
> > > blacklisted from recordmcount.
> > >
> > > CFLAGS_REMOVE_insn.o = $(CC_FLAGS_FTRACE)
> > >
> > > All ftrace flags are removed from the compiling of insn.c, and every
> > > function in that file will not be traced.
> > Yes, arm64 prevents the whole file from ftrace. My patch just use
> > notrace flag setting on some functions.
> >
> > @Atish How do think:
> > CFLAGS_REMOVE_cacheflush.o = $(CC_FLAGS_FTRACE)
> > CFLAGS_REMOVE_sbi.o = $(CC_FLAGS_FTRACE)
> >
>
> Looks good to me. What should be done for copy_to_kernel_nofault ?
> That is also in the calling path.
There are no patchable-entry nops in the prologue of copy_to_kernel_nofault.
>>>>
000000000000007c <.LVL6>:
}
7c: 6105 addi sp,sp,32
7e: 8082 ret
0000000000000080 <copy_to_user_nofault>:
*
* Safely write to address @dst from the buffer at @src. If a kernel fault
* happens, handle that and return -EFAULT.
*/
long copy_to_user_nofault(void __user *dst, const void *src, size_t size)
{
80: 1101 addi sp,sp,-32
82: e822 sd s0,16(sp)
84: ec06 sd ra,24(sp)
86: e426 sd s1,8(sp)
88: e04a sd s2,0(sp)
8a: 1000 addi s0,sp,32
<<<<
>>>>
cmd_mm/maccess.o :=
/root/source/riscv-tools/install_64gc/bin/riscv64-unknown-linux-gnu-gcc
-Wp,-MMD,mm/.maccess.o.d -nostdinc -isystem
/root/source/riscv-tools/install_64gc/bin/../lib/gcc/riscv64-unknown-linux-gnu/8.4.0/include
-I./arch/riscv/include -I./arch/riscv/include/generated -I./include
-I./arch/riscv/include/uapi -I./arch/riscv/include/generated/uapi
-I./include/uapi -I./include/generated/uapi -include
./include/linux/kconfig.h -include ./include/linux/compiler_types.h
-D__KERNEL__ ***-DCC_USING_PATCHABLE_FUNCTION_ENTRY*** -Wall -Wundef
-Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing
-fno-common -fshort-wchar -fno-PIE
-Werror=implicit-function-declaration -Werror=implicit-int
-Wno-format-security -std=gnu89 -mabi=lp64 -march=rv64imac
-mno-save-restore -DCONFIG_PAGE_OFFSET=0xffffffe000000000
-mcmodel=medany -fno-omit-frame-pointer -mstrict-align
-fno-delete-null-pointer-checks -Wno-frame-address
-Wno-format-truncation -Wno-format-overflow -O2
--param=allow-store-data-races=0 -Wframe-larger-than=2048
-fstack-protector-strong -Wno-unused-but-set-variable
-Wimplicit-fallthrough -Wno-unused-const-variable
-fno-omit-frame-pointer -fno-optimize-sibling-calls
-fno-var-tracking-assignments -g ***-fpatchable-function-entry=8***
-Wdeclaration-after-statement -Wvla -Wno-pointer-sign
-Wno-stringop-truncation -Wno-array-bounds -Wno-stringop-overflow
-Wno-restrict -Wno-maybe-uninitialized -fno-strict-overflow
-fno-merge-all-constants -fmerge-constants -fno-stack-check
-fconserve-stack -Werror=date-time -Werror=incompatible-pointer-types
-Werror=designated-init -fmacro-prefix-map=./= -Wno-packed-not-aligned
-DKBUILD_MODFILE='"mm/maccess"' -DKBUILD_BASENAME='"maccess"'
-DKBUILD_MODNAME='"maccess"' -c -o mm/maccess.o mm/maccess.c
<<<<
But copy_from_user_nofault has:
000000000000007c <.LVL6>:
}
7c: 6105 addi sp,sp,32
7e: 8082 ret
0000000000000080 <copy_to_user_nofault>:
*
* Safely write to address @dst from the buffer at @src. If a kernel fault
* happens, handle that and return -EFAULT.
*/
long copy_to_user_nofault(void __user *dst, const void *src, size_t size)
{
80: 1101 addi sp,sp,-32
82: e822 sd s0,16(sp)
84: ec06 sd ra,24(sp)
86: e426 sd s1,8(sp)
88: e04a sd s2,0(sp)
8a: 1000 addi s0,sp,32
I think it's a gcc problem, but it satisfies our ftrace requirement.
--
Best Regards
Guo Ren
ML: https://lore.kernel.org/linux-csky/