2022-09-27 02:30:07

by Liao, Chang

Subject: [PATCH V4 0/3] kprobe: Optimize the performance of patching ss

The single-step slot is not used until the kprobe is enabled, which
means no race condition can occur on it under SMP; hence it is safe to
patch the ss slot without stopping the machine.
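For context: patch_text() on these architectures wraps the write in a
stop_machine() round so that no other CPU can execute the instruction
mid-update. Because the ss slot is private until the probe is armed, a
plain copy plus cache maintenance suffices. Below is a minimal
user-space model of that idea; flush_caches() is a hypothetical
stand-in for the per-arch D-cache writeback / I-cache invalidate pair,
and the opcode values are only illustrative.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the per-arch cache maintenance pair
 * (D-cache writeback + I-cache invalidate); a no-op in this model. */
static void flush_caches(void *addr, size_t len)
{
	(void)addr;
	(void)len;
}

/* An ss slot holds the probed instruction followed by a trap/BRK. */
struct ss_slot {
	uint32_t insn[2];
};

/*
 * The slot is unreachable by any CPU until the probe is armed, so
 * plain stores are race-free and no stop_machine()-style global
 * synchronization is needed; cache maintenance alone is enough.
 */
static void prepare_ss_slot(struct ss_slot *slot, uint32_t opcode, uint32_t brk)
{
	slot->insn[0] = opcode;
	slot->insn[1] = brk;
	flush_caches(slot->insn, sizeof(slot->insn));
}

int main(void)
{
	struct ss_slot s;

	/* Illustrative AArch64 encodings: NOP and BRK #0. */
	prepare_ss_slot(&s, 0xd503201f, 0xd4200000);
	printf("slot: %08x %08x\n", s.insn[0], s.insn[1]);
	return 0;
}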

v4:
1. Add Acked-by from Will Deacon.
2. Document the arm64 micro-architectural subtleties pointed out by
Mark Rutland.

v3:
1. Drop duplicated I-Cache maintenance for arm64.
2. Add Acked-by from Masami Hiramatsu.

v2:
Port the riscv patch to csky and arm64.


Liao Chang (3):
riscv/kprobe: Optimize the performance of patching single-step slot
csky/kprobe: Optimize the performance of patching single-step slot
arm64/kprobe: Optimize the performance of patching single-step slot

arch/arm64/kernel/probes/kprobes.c | 27 +++++++++++++++++++++------
arch/csky/kernel/probes/kprobes.c | 6 +++++-
arch/riscv/kernel/probes/kprobes.c | 8 +++++---
3 files changed, 31 insertions(+), 10 deletions(-)

--
2.17.1


2022-09-27 02:48:08

by Liao, Chang

Subject: [PATCH 2/3] csky/kprobe: Optimize the performance of patching single-step slot

The single-step slot is not used until the kprobe is enabled, which
means no race condition can occur on it under SMP; hence it is safe to
patch the ss slot without stopping the machine.

Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Liao Chang <[email protected]>
---
arch/csky/kernel/probes/kprobes.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/csky/kernel/probes/kprobes.c b/arch/csky/kernel/probes/kprobes.c
index 3c6e5c725d81..4feb5ce16264 100644
--- a/arch/csky/kernel/probes/kprobes.c
+++ b/arch/csky/kernel/probes/kprobes.c
@@ -57,7 +57,11 @@ static void __kprobes arch_prepare_ss_slot(struct kprobe *p)

p->ainsn.api.restore = (unsigned long)p->addr + offset;

- patch_text(p->ainsn.api.insn, p->opcode);
+ memcpy(p->ainsn.api.insn, &p->opcode, offset);
+ dcache_wb_range((unsigned long)p->ainsn.api.insn,
+ (unsigned long)p->ainsn.api.insn + offset);
+ icache_inv_range((unsigned long)p->ainsn.api.insn,
+ (unsigned long)p->ainsn.api.insn + offset);
}

static void __kprobes arch_prepare_simulate(struct kprobe *p)
--
2.17.1
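The csky hunk open-codes the copy-plus-maintenance sequence, apparently
because csky (unlike riscv) has no patch_text_nosync() helper. Below is
a hedged sketch of how those three steps could be factored into one
helper: csky_patch_text_nosync() is a hypothetical name, and
dcache_wb_range()/icache_inv_range() are stubbed here with the
signatures used in the hunk above.

#include <stdint.h>
#include <string.h>

typedef uint32_t kprobe_opcode_t;

/* Stubs modeling the csky cache helpers called in the hunk above; in
 * the kernel they operate on the virtual range [start, end). */
static void dcache_wb_range(unsigned long start, unsigned long end)
{
	(void)start;
	(void)end;
}

static void icache_inv_range(unsigned long start, unsigned long end)
{
	(void)start;
	(void)end;
}

/* Hypothetical helper: one plain copy plus cache maintenance, safe
 * only while no CPU can fetch from 'addr' (i.e. before the probe is
 * armed). */
static void csky_patch_text_nosync(void *addr, const void *insn, size_t len)
{
	memcpy(addr, insn, len);
	dcache_wb_range((unsigned long)addr, (unsigned long)addr + len);
	icache_inv_range((unsigned long)addr, (unsigned long)addr + len);
}

int main(void)
{
	kprobe_opcode_t slot = 0;
	kprobe_opcode_t opcode = 0x6c03; /* illustrative opcode value only */

	csky_patch_text_nosync(&slot, &opcode, sizeof(opcode));
	return 0;
}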

2022-09-27 02:51:39

by Liao, Chang

Subject: [PATCH 1/3] riscv/kprobe: Optimize the performance of patching single-step slot

The single-step slot is not used until the kprobe is enabled, which
means no race condition can occur on it under SMP; hence it is safe to
patch the ss slot without stopping the machine.

Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Liao Chang <[email protected]>
---
arch/riscv/kernel/probes/kprobes.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/kernel/probes/kprobes.c b/arch/riscv/kernel/probes/kprobes.c
index e6e950b7cf32..bc1f39b96e41 100644
--- a/arch/riscv/kernel/probes/kprobes.c
+++ b/arch/riscv/kernel/probes/kprobes.c
@@ -24,12 +24,14 @@ post_kprobe_handler(struct kprobe *, struct kprobe_ctlblk *, struct pt_regs *);
static void __kprobes arch_prepare_ss_slot(struct kprobe *p)
{
unsigned long offset = GET_INSN_LENGTH(p->opcode);
+ kprobe_opcode_t slot[MAX_INSN_SIZE];

p->ainsn.api.restore = (unsigned long)p->addr + offset;

- patch_text(p->ainsn.api.insn, p->opcode);
- patch_text((void *)((unsigned long)(p->ainsn.api.insn) + offset),
- __BUG_INSN_32);
+ memcpy(slot, &p->opcode, offset);
+ *(kprobe_opcode_t *)((unsigned long)slot + offset) = __BUG_INSN_32;
+ patch_text_nosync(p->ainsn.api.insn, slot,
+ offset + GET_INSN_LENGTH(__BUG_INSN_32));
}

static void __kprobes arch_prepare_simulate(struct kprobe *p)
--
2.17.1
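One detail worth calling out in the riscv hunk: a compressed probed
instruction is 2 bytes while __BUG_INSN_32 (ebreak) is 4, so the slot
is staged in a local buffer and written with a single
patch_text_nosync() call instead of two patch_text() calls. Below is a
self-contained model of that staging; GET_INSN_LENGTH() and
patch_text_nosync() are reduced to stubs that mimic the kernel
behavior.

#include <stdint.h>
#include <string.h>

typedef uint32_t kprobe_opcode_t;

#define MAX_INSN_SIZE 2            /* slot size in kprobe_opcode_t units */
#define __BUG_INSN_32 0x00100073UL /* ebreak */

/* Stub: RISC-V insns whose low bits are not 0b11 are 16-bit compressed. */
static unsigned long GET_INSN_LENGTH(unsigned long insn)
{
	return (insn & 0x3) == 0x3 ? 4 : 2;
}

/* Stub modeling riscv's patch_text_nosync(): copy the bytes and flush
 * the I-cache for the range, with no stop_machine() round. */
static int patch_text_nosync(void *addr, const void *insns, size_t len)
{
	memcpy(addr, insns, len);
	/* flush_icache_range(addr, addr + len) in the kernel */
	return 0;
}

/* Mirrors the hunk above: stage probed insn + ebreak, then one write. */
static void prepare_ss_slot(kprobe_opcode_t *insn_page, kprobe_opcode_t opcode)
{
	unsigned long offset = GET_INSN_LENGTH(opcode);
	kprobe_opcode_t slot[MAX_INSN_SIZE];

	memcpy(slot, &opcode, offset);
	*(kprobe_opcode_t *)((unsigned long)slot + offset) = __BUG_INSN_32;
	patch_text_nosync(insn_page, slot,
			  offset + GET_INSN_LENGTH(__BUG_INSN_32));
}

int main(void)
{
	kprobe_opcode_t page[MAX_INSN_SIZE] = {0};

	prepare_ss_slot(page, 0x00000013); /* addi x0,x0,0 (nop), 4 bytes */
	return 0;
}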

2022-09-27 02:54:04

by Liao, Chang

Subject: [PATCH 3/3] arm64/kprobe: Optimize the performance of patching single-step slot

The single-step slot is not used until the kprobe is enabled, which
means no race condition can occur on it under SMP; hence it is safe to
patch the ss slot without stopping the machine.

Since aarch64_insn_patch_text_nosync() already makes the I and D caches
coherent for the single-step slot, there is no need to do it again via
flush_icache_range().

Acked-by: Will Deacon <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Liao Chang <[email protected]>
---
arch/arm64/kernel/probes/kprobes.c | 27 +++++++++++++++++++++------
1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
index d1d182320245..c9e4d0720285 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -44,13 +44,28 @@ post_kprobe_handler(struct kprobe *, struct kprobe_ctlblk *, struct pt_regs *);
static void __kprobes arch_prepare_ss_slot(struct kprobe *p)
{
kprobe_opcode_t *addr = p->ainsn.api.insn;
- void *addrs[] = {addr, addr + 1};
- u32 insns[] = {p->opcode, BRK64_OPCODE_KPROBES_SS};

- /* prepare insn slot */
- aarch64_insn_patch_text(addrs, insns, 2);
-
- flush_icache_range((uintptr_t)addr, (uintptr_t)(addr + MAX_INSN_SIZE));
+ /*
+ * Prepare insn slot. As Mark Rutland points out, this depends on a
+ * couple of subtleties:
+ *
+ * - That the I-cache maintenance for these instructions is complete
+ * *before* the kprobe BRK is written (and aarch64_insn_patch_text_nosync()
+ * ensures this, but just omits causing a Context-Synchronization-Event
+ * on all CPUs).
+ *
+ * - That the kprobe BRK results in an exception (and consequently a
+ * Context-Synchronization-Event), which ensures that the CPU will
+ * fetch the single-step slot instructions *after* this, ensuring that
+ * the new instructions are used.
+ *
+ * Normally an ISB would be placed after patching to guarantee that the
+ * I-cache maintenance is observed on all CPUs; however, the single-step
+ * slot is only ever entered via the BRK exception handler, so generating
+ * another Context-Synchronization-Event via ISB is unnecessary.
+ */
+ aarch64_insn_patch_text_nosync(addr, p->opcode);
+ aarch64_insn_patch_text_nosync(addr + 1, BRK64_OPCODE_KPROBES_SS);

/*
* Needs restoring of return address after stepping xol.
--
2.17.1
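To make the ordering argument in the new comment concrete, here is a
small model of the arm64 flow. aarch64_insn_patch_text_nosync() is
stubbed (in the kernel it writes the instruction and performs D-cache
clean plus I-cache invalidate, without broadcasting a
Context-Synchronization-Event), and the BRK64_OPCODE_KPROBES_SS
encoding shown is only illustrative; the real value comes from the
arm64 headers.

#include <stdint.h>

typedef uint32_t kprobe_opcode_t;

/* Illustrative BRK encoding (BRK + the kprobes single-step immediate);
 * assumed value, see the arm64 brk-imm definitions for the real one. */
#define BRK64_OPCODE_KPROBES_SS 0xd42000c0U

/* Stub: write one insn, then clean/invalidate caches for that word.
 * No ISB, so no Context-Synchronization-Event is generated here. */
static int aarch64_insn_patch_text_nosync(void *addr, uint32_t insn)
{
	*(uint32_t *)addr = insn;
	return 0;
}

static void prepare_ss_slot(kprobe_opcode_t *addr, kprobe_opcode_t opcode)
{
	/* Two nosync writes; safe because no CPU executes the slot yet. */
	aarch64_insn_patch_text_nosync(addr, opcode);
	aarch64_insn_patch_text_nosync(addr + 1, BRK64_OPCODE_KPROBES_SS);

	/*
	 * No trailing ISB: the slot is only ever entered through the
	 * kprobe BRK exception, and exception entry is itself a
	 * Context-Synchronization-Event, so the CPU is guaranteed to
	 * fetch the freshly written instructions.
	 */
}

int main(void)
{
	kprobe_opcode_t slot[2] = {0, 0};

	prepare_ss_slot(slot, 0xd503201f /* NOP, illustrative */);
	return 0;
}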

2022-09-29 17:37:31

by Catalin Marinas

Subject: Re: [PATCH 3/3] arm64/kprobe: Optimize the performance of patching single-step slot

On Tue, Sep 27, 2022 at 10:24:35AM +0800, Liao Chang wrote:
> The single-step slot is not used until the kprobe is enabled, which
> means no race condition can occur on it under SMP; hence it is safe to
> patch the ss slot without stopping the machine.
>
> Since aarch64_insn_patch_text_nosync() already makes the I and D caches
> coherent for the single-step slot, there is no need to do it again via
> flush_icache_range().
>
> Acked-by: Will Deacon <[email protected]>
> Acked-by: Masami Hiramatsu (Google) <[email protected]>
> Signed-off-by: Liao Chang <[email protected]>
> ---
> arch/arm64/kernel/probes/kprobes.c | 27 +++++++++++++++++++++------
> 1 file changed, 21 insertions(+), 6 deletions(-)

What's your expectation with this series, should the arch maintainers
just pick the individual patches?

--
Catalin

2022-09-30 01:12:42

by Liao, Chang

Subject: Re: [PATCH 3/3] arm64/kprobe: Optimize the performance of patching single-step slot



On 2022/9/30 0:50, Catalin Marinas wrote:
> On Tue, Sep 27, 2022 at 10:24:35AM +0800, Liao Chang wrote:
>> The single-step slot is not used until the kprobe is enabled, which
>> means no race condition can occur on it under SMP; hence it is safe to
>> patch the ss slot without stopping the machine.
>>
>> Since aarch64_insn_patch_text_nosync() already makes the I and D caches
>> coherent for the single-step slot, there is no need to do it again via
>> flush_icache_range().
>>
>> Acked-by: Will Deacon <[email protected]>
>> Acked-by: Masami Hiramatsu (Google) <[email protected]>
>> Signed-off-by: Liao Chang <[email protected]>
>> ---
>> arch/arm64/kernel/probes/kprobes.c | 27 +++++++++++++++++++++------
>> 1 file changed, 21 insertions(+), 6 deletions(-)
>
> What's your expectation with this series, should the arch maintainers
> just pick the individual patches?

Yes, or should I split this series into individual patches?

Thanks.

>

--
BR,
Liao, Chang

2022-09-30 08:36:44

by Catalin Marinas

Subject: Re: [PATCH 3/3] arm64/kprobe: Optimize the performance of patching single-step slot

On Fri, Sep 30, 2022 at 09:02:20AM +0800, liaochang (A) wrote:
>
>
> On 2022/9/30 0:50, Catalin Marinas wrote:
> > On Tue, Sep 27, 2022 at 10:24:35AM +0800, Liao Chang wrote:
> >> The single-step slot is not used until the kprobe is enabled, which
> >> means no race condition can occur on it under SMP; hence it is safe to
> >> patch the ss slot without stopping the machine.
> >>
> >> Since aarch64_insn_patch_text_nosync() already makes the I and D caches
> >> coherent for the single-step slot, there is no need to do it again via
> >> flush_icache_range().
> >>
> >> Acked-by: Will Deacon <[email protected]>
> >> Acked-by: Masami Hiramatsu (Google) <[email protected]>
> >> Signed-off-by: Liao Chang <[email protected]>
> >> ---
> >> arch/arm64/kernel/probes/kprobes.c | 27 +++++++++++++++++++++------
> >> 1 file changed, 21 insertions(+), 6 deletions(-)
> >
> > What's your expectation with this series, should the arch maintainers
> > just pick the individual patches?
>
> Yes, or should i split this series into individual patch?

No need to, I can pick the arm64 patch. If the other maintainers don't
merge the patches, you might want to post them again individually so as
to avoid confusion.

--
Catalin

2022-09-30 16:41:10

by Catalin Marinas

Subject: Re: (subset) [PATCH V4 0/3] kprobe: Optimize the performance of patching ss

On Tue, 27 Sep 2022 10:24:32 +0800, Liao Chang wrote:
> The single-step slot is not used until the kprobe is enabled, which
> means no race condition can occur on it under SMP; hence it is safe to
> patch the ss slot without stopping the machine.
>
> v4:
> 1. Add Acked-by from Will Deacon.
> 2. Document the arm64 micro-architectural subtleties pointed out by
> Mark Rutland.
>
> [...]

Applied to arm64 (for-next/misc), thanks!

[3/3] arm64/kprobe: Optimize the performance of patching single-step slot
https://git.kernel.org/arm64/c/a0caebbd0460

--
Catalin