2024-02-06 20:54:05

by Alexandre Ghiti

Subject: [PATCH] riscv: Fix text patching when icache flushes use IPIs

For now, we use stop_machine() to patch the text, and when IPIs are used
for remote icache flushes, the system hangs since irqs are disabled on all
cpus.

So instead, make sure every cpu executes the stop_machine() patching
function so that each of them emits a local icache flush, which avoids
the use of IPIs.

Co-developed-by: Björn Töpel <[email protected]>
Signed-off-by: Björn Töpel <[email protected]>
Signed-off-by: Alexandre Ghiti <[email protected]>
---
arch/riscv/include/asm/patch.h | 1 +
arch/riscv/kernel/ftrace.c | 38 ++++++++++++++++++++++++++++++----
arch/riscv/kernel/patch.c | 13 ++++++-------
3 files changed, 41 insertions(+), 11 deletions(-)

diff --git a/arch/riscv/include/asm/patch.h b/arch/riscv/include/asm/patch.h
index e88b52d39eac..9f5d6e14c405 100644
--- a/arch/riscv/include/asm/patch.h
+++ b/arch/riscv/include/asm/patch.h
@@ -6,6 +6,7 @@
#ifndef _ASM_RISCV_PATCH_H
#define _ASM_RISCV_PATCH_H

+int patch_insn_write(void *addr, const void *insn, size_t len);
int patch_text_nosync(void *addr, const void *insns, size_t len);
int patch_text_set_nosync(void *addr, u8 c, size_t len);
int patch_text(void *addr, u32 *insns, int ninsns);
diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index f5aa24d9e1c1..1694a1861d1e 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -8,6 +8,7 @@
#include <linux/ftrace.h>
#include <linux/uaccess.h>
#include <linux/memory.h>
+#include <linux/stop_machine.h>
#include <asm/cacheflush.h>
#include <asm/patch.h>

@@ -75,8 +76,7 @@ static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target,
make_call_t0(hook_pos, target, call);

/* Replace the auipc-jalr pair at once. Return -EPERM on write error. */
- if (patch_text_nosync
- ((void *)hook_pos, enable ? call : nops, MCOUNT_INSN_SIZE))
+ if (patch_insn_write((void *)hook_pos, enable ? call : nops, MCOUNT_INSN_SIZE))
return -EPERM;

return 0;
@@ -88,7 +88,7 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)

make_call_t0(rec->ip, addr, call);

- if (patch_text_nosync((void *)rec->ip, call, MCOUNT_INSN_SIZE))
+ if (patch_insn_write((void *)rec->ip, call, MCOUNT_INSN_SIZE))
return -EPERM;

return 0;
@@ -99,7 +99,7 @@ int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
{
unsigned int nops[2] = {NOP4, NOP4};

- if (patch_text_nosync((void *)rec->ip, nops, MCOUNT_INSN_SIZE))
+ if (patch_insn_write((void *)rec->ip, nops, MCOUNT_INSN_SIZE))
return -EPERM;

return 0;
@@ -134,6 +134,36 @@ int ftrace_update_ftrace_func(ftrace_func_t func)

return ret;
}
+
+struct ftrace_modify_param {
+ int command;
+ atomic_t cpu_count;
+};
+
+static int __ftrace_modify_code(void *data)
+{
+ struct ftrace_modify_param *param = data;
+
+ if (atomic_inc_return(&param->cpu_count) == num_online_cpus()) {
+ ftrace_modify_all_code(param->command);
+ atomic_inc(&param->cpu_count);
+ } else {
+ while (atomic_read(&param->cpu_count) <= num_online_cpus())
+ cpu_relax();
+ smp_mb();
+ }
+
+ local_flush_icache_all();
+
+ return 0;
+}
+
+void arch_ftrace_update_code(int command)
+{
+ struct ftrace_modify_param param = { command, ATOMIC_INIT(0) };
+
+ stop_machine(__ftrace_modify_code, &param, cpu_online_mask);
+}
#endif

#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
index 37e87fdcf6a0..ec7760a4d6cd 100644
--- a/arch/riscv/kernel/patch.c
+++ b/arch/riscv/kernel/patch.c
@@ -188,7 +188,7 @@ int patch_text_set_nosync(void *addr, u8 c, size_t len)
}
NOKPROBE_SYMBOL(patch_text_set_nosync);

-static int patch_insn_write(void *addr, const void *insn, size_t len)
+int patch_insn_write(void *addr, const void *insn, size_t len)
{
size_t patched = 0;
size_t size;
@@ -211,11 +211,9 @@ NOKPROBE_SYMBOL(patch_insn_write);

int patch_text_nosync(void *addr, const void *insns, size_t len)
{
- u32 *tp = addr;
int ret;

- ret = patch_insn_write(tp, insns, len);
-
+ ret = patch_insn_write(addr, insns, len);
if (!ret)
- flush_icache_range((uintptr_t) tp, (uintptr_t) tp + len);
+ flush_icache_range((uintptr_t) addr, (uintptr_t) addr + len);

return ret;
@@ -232,8 +230,7 @@ static int patch_text_cb(void *data)
if (atomic_inc_return(&patch->cpu_count) == num_online_cpus()) {
for (i = 0; ret == 0 && i < patch->ninsns; i++) {
len = GET_INSN_LENGTH(patch->insns[i]);
- ret = patch_text_nosync(patch->addr + i * len,
- &patch->insns[i], len);
+ ret = patch_insn_write(patch->addr + i * len, &patch->insns[i], len);
}
atomic_inc(&patch->cpu_count);
} else {
@@ -242,6 +239,8 @@ static int patch_text_cb(void *data)
cpu_relax();
smp_mb();
}

+ local_flush_icache_all();
+
return ret;
}
NOKPROBE_SYMBOL(patch_text_cb);
--
2.39.2



2024-02-07 07:31:10

by Björn Töpel

Subject: Re: [PATCH] riscv: Fix text patching when icache flushes use IPIs

Alexandre Ghiti <[email protected]> writes:

> For now, we use stop_machine() to patch the text, and when IPIs are used
> for remote icache flushes, the system hangs since irqs are disabled on all
> cpus.
>
> So instead, make sure every cpu executes the stop_machine() patching
> function so that each of them emits a local icache flush, which avoids
> the use of IPIs.
>
> Co-developed-by: Björn Töpel <[email protected]>
> Signed-off-by: Björn Töpel <[email protected]>
> Signed-off-by: Alexandre Ghiti <[email protected]>

FWIW, the BPF selftests pass nicely with this (especially the
fentry/fexit tests ;-)). I don't know if it's worth much saying that
your own stuff was tested, but here goes:

Tested-by: Björn Töpel <[email protected]>

2024-02-08 11:43:16

by Andrea Parri

Subject: Re: [PATCH] riscv: Fix text patching when icache flushes use IPIs

> +static int __ftrace_modify_code(void *data)
> +{
> + struct ftrace_modify_param *param = data;
> +
> + if (atomic_inc_return(&param->cpu_count) == num_online_cpus()) {
> + ftrace_modify_all_code(param->command);
> + atomic_inc(&param->cpu_count);

I stared at ftrace_modify_all_code() for a bit but honestly I don't see
what prevents the ->cpu_count increment from being reordered before the
insn write(s) (architecturally) now that you have removed the IPI dance:
perhaps add an smp_wmb() right before the atomic_inc() (or promote the
latter to a (void)atomic_inc_return_release()) and/or an inline comment
saying why such reordering is not possible?
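
To make it concrete, the release variant would read something like the
below (completely untested):

	ftrace_modify_all_code(param->command);
	/*
	 * Order the insn write(s) above before the increment that
	 * releases the waiting cpus.
	 */
	(void)atomic_inc_return_release(&param->cpu_count);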


> + } else {
> + while (atomic_read(&param->cpu_count) <= num_online_cpus())
> + cpu_relax();
> + smp_mb();

I see that you've lifted/copied the memory barrier from patch_text_cb():
what's its point? AFAIU, the barrier has no ordering effect on
program-order later insn fetches; perhaps the code was based on some
legacy/old version of Zifencei? IAC, comments, comments, ... or maybe
just remove that memory barrier?
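
(For reference, local_flush_icache_all() boils down to a single FENCE.I
on the local hart, cf. arch/riscv/include/asm/cacheflush.h:

	static inline void local_flush_icache_all(void)
	{
		asm volatile ("fence.i" ::: "memory");
	}

so any ordering w.r.t. later insn fetches would come from that fence,
not from the smp_mb().)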


> + }
> +
> + local_flush_icache_all();
> +
> + return 0;
> +}

[...]


> @@ -232,8 +230,7 @@ static int patch_text_cb(void *data)
> if (atomic_inc_return(&patch->cpu_count) == num_online_cpus()) {
> for (i = 0; ret == 0 && i < patch->ninsns; i++) {
> len = GET_INSN_LENGTH(patch->insns[i]);
> - ret = patch_text_nosync(patch->addr + i * len,
> - &patch->insns[i], len);
> + ret = patch_insn_write(patch->addr + i * len, &patch->insns[i], len);
> }
> atomic_inc(&patch->cpu_count);
> } else {
> @@ -242,6 +239,8 @@ static int patch_text_cb(void *data)
> cpu_relax();
> smp_mb();
> }
>
> + local_flush_icache_all();
> +
> return ret;
> }
> NOKPROBE_SYMBOL(patch_text_cb);

My above remarks/questions also apply to this function.


On a last topic, although somehow orthogonal to the scope of this patch,
I'm not sure the patch_{map,unmap}() dance in our patch_insn_write() is
correct: I can see why we may want (need to do) the local TLB flush
before returning from patch_{map,unmap}(), but does a local flush suffice?
For comparison, arm64 seems to go through a complete dsb-tlbi-dsb(-isb)
sequence in their unmapping stage (and apparently relying on "no caching
of invalid ptes" in their mapping stage). Of course, "broadcasting" our
(riscv's) TLB invalidations will necessarily introduce some complexity...

Thoughts?

Andrea

2024-02-08 13:55:01

by Alexandre Ghiti

Subject: Re: [PATCH] riscv: Fix text patching when icache flushes use IPIs

Hi Andrea,

On Thu, Feb 8, 2024 at 12:42 PM Andrea Parri <[email protected]> wrote:
>
> > +static int __ftrace_modify_code(void *data)
> > +{
> > + struct ftrace_modify_param *param = data;
> > +
> > + if (atomic_inc_return(&param->cpu_count) == num_online_cpus()) {
> > + ftrace_modify_all_code(param->command);
> > + atomic_inc(&param->cpu_count);
>
> I stared at ftrace_modify_all_code() for a bit but honestly I don't see
> what prevents the ->cpu_count increment from being reordered before the
> insn write(s) (architecturally) now that you have removed the IPI dance:
> perhaps add an smp_wmb() right before the atomic_inc() (or promote the
> latter to a (void)atomic_inc_return_release()) and/or an inline comment
> saying why such reordering is not possible?

I did not even think of that, and it actually makes sense so I'll go
with what you propose: I'll replace atomic_inc() with
atomic_inc_return_release(). And I'll add the following comment if
that's ok with you:

"Make sure the patching store is effective *before* we increment the
counter which releases all waiting cpus"
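
i.e., roughly:

	ftrace_modify_all_code(param->command);
	/*
	 * Make sure the patching store is effective *before* we increment
	 * the counter which releases all waiting cpus.
	 */
	(void)atomic_inc_return_release(&param->cpu_count);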

>
>
> > + } else {
> > + while (atomic_read(&param->cpu_count) <= num_online_cpus())
> > + cpu_relax();
> > + smp_mb();
>
> I see that you've lifted/copied the memory barrier from patch_text_cb():
> what's its point? AFAIU, the barrier has no ordering effect on
> program-order later insn fetches; perhaps the code was based on some
> legacy/old version of Zifencei? IAC, comments, comments, ... or maybe
> just remove that memory barrier?

Honestly, I looked at it for a minute, did not understand its purpose
and said to myself "ok, that can't hurt anyway, I may be missing
something".

FWIW, I see that arm64 uses isb() here. If you don't see its purpose,
I'll remove it (here and where I copied it).

>
>
> > + }
> > +
> > + local_flush_icache_all();
> > +
> > + return 0;
> > +}
>
> [...]
>
>
> > @@ -232,8 +230,7 @@ static int patch_text_cb(void *data)
> > if (atomic_inc_return(&patch->cpu_count) == num_online_cpus()) {
> > for (i = 0; ret == 0 && i < patch->ninsns; i++) {
> > len = GET_INSN_LENGTH(patch->insns[i]);
> > - ret = patch_text_nosync(patch->addr + i * len,
> > - &patch->insns[i], len);
> > + ret = patch_insn_write(patch->addr + i * len, &patch->insns[i], len);
> > }
> > atomic_inc(&patch->cpu_count);
> > } else {
> > @@ -242,6 +239,8 @@ static int patch_text_cb(void *data)
> > smp_mb();
> > }
> >
> > + local_flush_icache_all();
> > +
> > return ret;
> > }
> > NOKPROBE_SYMBOL(patch_text_cb);
>
> My above remarks/questions also apply to this function.
>
>
> On a last topic, although somehow orthogonal to the scope of this patch,
> I'm not sure the patch_{map,unmap}() dance in our patch_insn_write() is
> correct: I can see why we may want (need to do) the local TLB flush
> before returning from patch_{map,unmap}(), but does a local flush suffice?
> For comparison, arm64 seems to go through a complete dsb-tlbi-dsb(-isb)
> sequence in their unmapping stage (and apparently relying on "no caching
> of invalid ptes" in their mapping stage). Of course, "broadcasting" our
> (riscv's) TLB invalidations will necessarily introduce some complexity...
>
> Thoughts?

To avoid a remote TLBI, could we simply disable preemption before
the first patch_map()? arm64 disables irqs, but that seems overkill
to me; then again, maybe I'm missing something?
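
Something like the below sketch (untested, and ignoring the loop that
handles insns crossing a page boundary), so that the fixmap mapping and
its local TLB entry cannot be left behind on another cpu:

	int patch_insn_write(void *addr, const void *insn, size_t len)
	{
		void *waddr;
		int ret;

		preempt_disable();
		waddr = patch_map(addr, FIX_TEXT_POKE0);
		ret = copy_to_kernel_nofault(waddr, insn, len);
		patch_unmap(FIX_TEXT_POKE0);
		preempt_enable();

		return ret;
	}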

Thanks for your comments Andrea,

Alex

>
> Andrea

2024-02-08 14:24:06

by Andrea Parri

Subject: Re: [PATCH] riscv: Fix text patching when icache flushes use IPIs

> I did not even think of that, and it actually makes sense so I'll go
> with what you propose: I'll replace atomic_inc() with
> atomic_inc_return_release(). And I'll add the following comment if
> that's ok with you:
>
> "Make sure the patching store is effective *before* we increment the
> counter which releases all waiting cpus"

Yes, this sounds good to me.


> Honestly, I looked at it for a minute, did not understand its purpose
> and said to myself "ok, that can't hurt anyway, I may be missing
> something".
>
> FWIW, I see that arm64 uses isb() here. If you don't see its purpose,
> I'll remove it (here and where I copied it).

Removing the smp_mb() (and keeping the local_flush_icache_all()) seems
fine to me; thanks for the confirmation.


> > On a last topic, although somehow orthogonal to the scope of this patch,
> > I'm not sure the patch_{map,unmap}() dance in our patch_insn_write() is
> > correct: I can see why we may want (need to do) the local TLB flush
> > before returning from patch_{map,unmap}(), but does a local flush suffice?
> > For comparison, arm64 seems to go through a complete dsb-tlbi-dsb(-isb)
> > sequence in their unmapping stage (and apparently relying on "no caching
> > of invalid ptes" in their mapping stage). Of course, "broadcasting" our
> > (riscv's) TLB invalidations will necessarily introduce some complexity...
> >
> > Thoughts?
>
> To avoid a remote TLBI, could we simply disable preemption before
> the first patch_map()? arm64 disables irqs, but that seems overkill
> to me; then again, maybe I'm missing something?

Mmh, I'm afraid this will require more thinking/probing on my end (not
really "the expert" of the codebase at stake...). Maybe the ftrace
reviewers will provide further ideas/suggestions for us to brainstorm.

Andrea