2022-03-24 01:39:53

by Christophe Leroy

[permalink] [raw]
Subject: [PATCH v1 0/4] Kill the time spent in patch_instruction()

This series reduces by 70% the time required to activate
ftrace on an 8xx with CONFIG_STRICT_KERNEL_RWX.

Measure is performed in function ftrace_replace_code() using mftb()
around the loop.

With the series,
- Without CONFIG_STRICT_KERNEL_RWX, 416000 TB ticks are measured.
- With CONFIG_STRICT_KERNEL_RWX, 546000 TB ticks are measured.

Before this series,
- Without CONFIG_STRICT_KERNEL_RWX, 427000 TB ticks are measured.
- With CONFIG_STRICT_KERNEL_RWX, 1744000 TB ticks are measured.

Before the series, CONFIG_STRICT_KERNEL_RWX multiplies the time
required for ftrace activation by more than 4.

With the series, CONFIG_STRICT_KERNEL_RWX increases the time
required for ftrace activation by only 30%

Christophe Leroy (4):
powerpc/code-patching: Don't call is_vmalloc_or_module_addr() without
CONFIG_MODULES
powerpc/code-patching: Speed up page mapping/unmapping
powerpc/code-patching: Use jump_label for testing freed initmem
powerpc/code-patching: Use jump_label to check if poking_init() is
done

arch/powerpc/include/asm/code-patching.h | 2 ++
arch/powerpc/lib/code-patching.c | 37 +++++++++++++++---------
arch/powerpc/mm/mem.c | 2 ++
3 files changed, 28 insertions(+), 13 deletions(-)

--
2.35.1


2022-05-16 09:01:19

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH v1 0/4] Kill the time spent in patch_instruction()

On Tue, 22 Mar 2022 16:40:17 +0100, Christophe Leroy wrote:
> This series reduces by 70% the time required to activate
> ftrace on an 8xx with CONFIG_STRICT_KERNEL_RWX.
>
> Measure is performed in function ftrace_replace_code() using mftb()
> around the loop.
>
> With the series,
> - Without CONFIG_STRICT_KERNEL_RWX, 416000 TB ticks are measured.
> - With CONFIG_STRICT_KERNEL_RWX, 546000 TB ticks are measured.
>
> [...]

Patches 1, 3 and 4 applied to powerpc/next.

[1/4] powerpc/code-patching: Don't call is_vmalloc_or_module_addr() without CONFIG_MODULES
https://git.kernel.org/powerpc/c/cb3ac45214c03852430979a43180371a44b74596
[3/4] powerpc/code-patching: Use jump_label for testing freed initmem
https://git.kernel.org/powerpc/c/b033767848c4115e486b1a51946de3bee2ac0fa6
[4/4] powerpc/code-patching: Use jump_label to check if poking_init() is done
https://git.kernel.org/powerpc/c/1751289268ef959db68b0b6f798d904d6403309a

cheers

2022-05-17 16:43:30

by Christophe Leroy

[permalink] [raw]
Subject: Re: [PATCH v1 0/4] Kill the time spent in patch_instruction()



Le 15/05/2022 à 12:28, Michael Ellerman a écrit :
> On Tue, 22 Mar 2022 16:40:17 +0100, Christophe Leroy wrote:
>> This series reduces by 70% the time required to activate
>> ftrace on an 8xx with CONFIG_STRICT_KERNEL_RWX.
>>
>> Measure is performed in function ftrace_replace_code() using mftb()
>> around the loop.
>>
>> With the series,
>> - Without CONFIG_STRICT_KERNEL_RWX, 416000 TB ticks are measured.
>> - With CONFIG_STRICT_KERNEL_RWX, 546000 TB ticks are measured.
>>
>> [...]
>
> Patches 1, 3 and 4 applied to powerpc/next.
>
> [1/4] powerpc/code-patching: Don't call is_vmalloc_or_module_addr() without CONFIG_MODULES
> https://git.kernel.org/powerpc/c/cb3ac45214c03852430979a43180371a44b74596
> [3/4] powerpc/code-patching: Use jump_label for testing freed initmem
> https://git.kernel.org/powerpc/c/b033767848c4115e486b1a51946de3bee2ac0fa6
> [4/4] powerpc/code-patching: Use jump_label to check if poking_init() is done
> https://git.kernel.org/powerpc/c/1751289268ef959db68b0b6f798d904d6403309a
>

Patch 2 was the keystone of this series. What happened to it ?

Christophe

2022-05-17 22:03:22

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH v1 0/4] Kill the time spent in patch_instruction()

Christophe Leroy <[email protected]> writes:
> Le 15/05/2022 à 12:28, Michael Ellerman a écrit :
>> On Tue, 22 Mar 2022 16:40:17 +0100, Christophe Leroy wrote:
>>> This series reduces by 70% the time required to activate
>>> ftrace on an 8xx with CONFIG_STRICT_KERNEL_RWX.
>>>
>>> Measure is performed in function ftrace_replace_code() using mftb()
>>> around the loop.
>>>
>>> With the series,
>>> - Without CONFIG_STRICT_KERNEL_RWX, 416000 TB ticks are measured.
>>> - With CONFIG_STRICT_KERNEL_RWX, 546000 TB ticks are measured.
>>>
>>> [...]
>>
>> Patches 1, 3 and 4 applied to powerpc/next.
>>
>> [1/4] powerpc/code-patching: Don't call is_vmalloc_or_module_addr() without CONFIG_MODULES
>> https://git.kernel.org/powerpc/c/cb3ac45214c03852430979a43180371a44b74596
>> [3/4] powerpc/code-patching: Use jump_label for testing freed initmem
>> https://git.kernel.org/powerpc/c/b033767848c4115e486b1a51946de3bee2ac0fa6
>> [4/4] powerpc/code-patching: Use jump_label to check if poking_init() is done
>> https://git.kernel.org/powerpc/c/1751289268ef959db68b0b6f798d904d6403309a
>>
>
> Patch 2 was the keystone of this series. What happened to it ?

It broke on 64-bit. I think I know why but I haven't had time to test
it. Will try and get it fixed in the next day or two.

cheers

2022-06-01 20:48:09

by Christophe Leroy

[permalink] [raw]
Subject: Re: [PATCH v1 0/4] Kill the time spent in patch_instruction()



Le 17/05/2022 à 14:37, Michael Ellerman a écrit :
> Christophe Leroy <[email protected]> writes:
>> Le 15/05/2022 à 12:28, Michael Ellerman a écrit :
>>> On Tue, 22 Mar 2022 16:40:17 +0100, Christophe Leroy wrote:
>>>> This series reduces by 70% the time required to activate
>>>> ftrace on an 8xx with CONFIG_STRICT_KERNEL_RWX.
>>>>
>>>> Measure is performed in function ftrace_replace_code() using mftb()
>>>> around the loop.
>>>>
>>>> With the series,
>>>> - Without CONFIG_STRICT_KERNEL_RWX, 416000 TB ticks are measured.
>>>> - With CONFIG_STRICT_KERNEL_RWX, 546000 TB ticks are measured.
>>>>
>>>> [...]
>>>
>>> Patches 1, 3 and 4 applied to powerpc/next.
>>>
>>> [1/4] powerpc/code-patching: Don't call is_vmalloc_or_module_addr() without CONFIG_MODULES
>>> https://git.kernel.org/powerpc/c/cb3ac45214c03852430979a43180371a44b74596
>>> [3/4] powerpc/code-patching: Use jump_label for testing freed initmem
>>> https://git.kernel.org/powerpc/c/b033767848c4115e486b1a51946de3bee2ac0fa6
>>> [4/4] powerpc/code-patching: Use jump_label to check if poking_init() is done
>>> https://git.kernel.org/powerpc/c/1751289268ef959db68b0b6f798d904d6403309a
>>>
>>
>> Patch 2 was the keystone of this series. What happened to it ?
>
> It broke on 64-bit. I think I know why but I haven't had time to test
> it. Will try and get it fixed in the next day or two.
>

You didn't find any solution at the end, or didn't have time ?

What was the problem exactly ? I made a quick try on QEMU and it was
working as expected.

Christophe

2022-06-24 07:18:07

by Christophe Leroy

[permalink] [raw]
Subject: Re: [PATCH v1 0/4] Kill the time spent in patch_instruction()

Michael ?

Le 31/05/2022 à 08:24, Christophe Leroy a écrit :
>
>
> Le 17/05/2022 à 14:37, Michael Ellerman a écrit :
>> Christophe Leroy <[email protected]> writes:
>>> Le 15/05/2022 à 12:28, Michael Ellerman a écrit :
>>>> On Tue, 22 Mar 2022 16:40:17 +0100, Christophe Leroy wrote:
>>>>> This series reduces by 70% the time required to activate
>>>>> ftrace on an 8xx with CONFIG_STRICT_KERNEL_RWX.
>>>>>
>>>>> Measure is performed in function ftrace_replace_code() using mftb()
>>>>> around the loop.
>>>>>
>>>>> With the series,
>>>>> - Without CONFIG_STRICT_KERNEL_RWX, 416000 TB ticks are measured.
>>>>> - With CONFIG_STRICT_KERNEL_RWX, 546000 TB ticks are measured.
>>>>>
>>>>> [...]
>>>>
>>>> Patches 1, 3 and 4 applied to powerpc/next.
>>>>
>>>> [1/4] powerpc/code-patching: Don't call is_vmalloc_or_module_addr()
>>>> without CONFIG_MODULES
>>>>
>>>> https://git.kernel.org/powerpc/c/cb3ac45214c03852430979a43180371a44b74596
>>>>
>>>> [3/4] powerpc/code-patching: Use jump_label for testing freed initmem
>>>>
>>>> https://git.kernel.org/powerpc/c/b033767848c4115e486b1a51946de3bee2ac0fa6
>>>>
>>>> [4/4] powerpc/code-patching: Use jump_label to check if
>>>> poking_init() is done
>>>>
>>>> https://git.kernel.org/powerpc/c/1751289268ef959db68b0b6f798d904d6403309a
>>>>
>>>>
>>>
>>> Patch 2 was the keystone of this series. What happened to it ?
>>
>> It broke on 64-bit. I think I know why but I haven't had time to test
>> it. Will try and get it fixed in the next day or two.
>>
>
> You didn't find any solution at the end, or didn't have time ?
>
> What was the problem exactly ? I made a quick try on QEMU and it was
> working as expected.
>

Should I make it a ppc32-only change ?

2022-09-27 15:16:17

by Christophe Leroy

[permalink] [raw]
Subject: Re: [PATCH v1 0/4] Kill the time spent in patch_instruction()

Argh !! Looks like I sent an old already applied series nested in the
new one.

Ignore those x/4 patches, only look at the x/6 ones.


Le 27/09/2022 à 16:33, Christophe Leroy a écrit :
> This series reduces by 70% the time required to activate
> ftrace on an 8xx with CONFIG_STRICT_KERNEL_RWX.
>
> Measure is performed in function ftrace_replace_code() using mftb()
> around the loop.
>
> With the series,
> - Without CONFIG_STRICT_KERNEL_RWX, 416000 TB ticks are measured.
> - With CONFIG_STRICT_KERNEL_RWX, 546000 TB ticks are measured.
>
> Before this series,
> - Without CONFIG_STRICT_KERNEL_RWX, 427000 TB ticks are measured.
> - With CONFIG_STRICT_KERNEL_RWX, 1744000 TB ticks are measured.
>
> Before the series, CONFIG_STRICT_KERNEL_RWX multiplies the time
> required for ftrace activation by more than 4.
>
> With the series, CONFIG_STRICT_KERNEL_RWX increases the time
> required for ftrace activation by only 30%
>
> Christophe Leroy (4):
> powerpc/code-patching: Don't call is_vmalloc_or_module_addr() without
> CONFIG_MODULES
> powerpc/code-patching: Speed up page mapping/unmapping
> powerpc/code-patching: Use jump_label for testing freed initmem
> powerpc/code-patching: Use jump_label to check if poking_init() is
> done
>
> arch/powerpc/include/asm/code-patching.h | 2 ++
> arch/powerpc/lib/code-patching.c | 37 +++++++++++++++---------
> arch/powerpc/mm/mem.c | 2 ++
> 3 files changed, 28 insertions(+), 13 deletions(-)
>

2022-09-27 15:20:39

by Christophe Leroy

[permalink] [raw]
Subject: [PATCH v1 1/6] powerpc/code-patching: Use pte_offset_kernel() instead of virt_to_kpte()

virt_to_kpte() checks pmd_none() and returns NULL in that case.

__do_patch_instruction() doesn't expect the pmd to be none and
doesn't handle the case anyway.

So avoid the pmd_none() check by using pte_offset_kernel()
directly.

It improves ftrace activation by approx 1% on an 8xx.

Signed-off-by: Christophe Leroy <[email protected]>
---
arch/powerpc/lib/code-patching.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index ad0cf3108dd0..0f3acb0534b6 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -158,7 +158,7 @@ static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
text_poke_addr = (unsigned long)__this_cpu_read(text_poke_area)->addr & PAGE_MASK;
patch_addr = (u32 *)(text_poke_addr + offset_in_page(addr));

- pte = virt_to_kpte(text_poke_addr);
+ pte = pte_offset_kernel(pmd_off_k(text_poke_addr), text_poke_addr);
__set_pte_at(&init_mm, text_poke_addr, pte, pfn_pte(pfn, PAGE_KERNEL), 0);
/* See ptesync comment in radix__set_pte_at() */
if (radix_enabled())
--
2.37.1

2022-09-27 15:23:34

by Christophe Leroy

[permalink] [raw]
Subject: [PATCH v1 3/6] powerpc/feature-fixups: Refactor entry fixups patching

Several fonctions have the same loop for patching instructions.

Introduce function do_patch_entry_fixups() to refactor those loops.

Signed-off-by: Christophe Leroy <[email protected]>
---
arch/powerpc/lib/feature-fixups.c | 84 ++++++++++++-------------------
1 file changed, 32 insertions(+), 52 deletions(-)

diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c
index 993d3f31832a..6767a6c3106f 100644
--- a/arch/powerpc/lib/feature-fixups.c
+++ b/arch/powerpc/lib/feature-fixups.c
@@ -118,9 +118,33 @@ void do_feature_fixups(unsigned long value, void *fixup_start, void *fixup_end)
}

#ifdef CONFIG_PPC_BOOK3S_64
+static int do_patch_entry_fixups(long *start, long *end, unsigned int *instrs,
+ bool do_fallback, void *fallback)
+{
+ int i;
+
+ for (i = 0; start < end; start++, i++) {
+ unsigned int *dest = (void *)start + *start;
+
+ pr_devel("patching dest %lx\n", (unsigned long)dest);
+
+ // See comment in do_entry_flush_fixups() RE order of patching
+ if (do_fallback) {
+ patch_instruction(dest, ppc_inst(instrs[0]));
+ patch_instruction(dest + 2, ppc_inst(instrs[2]));
+ patch_branch(dest + 1, (unsigned long)fallback, BRANCH_SET_LINK);
+ } else {
+ patch_instruction(dest + 1, ppc_inst(instrs[1]));
+ patch_instruction(dest + 2, ppc_inst(instrs[2]));
+ patch_instruction(dest, ppc_inst(instrs[0]));
+ }
+ }
+ return i;
+}
+
static void do_stf_entry_barrier_fixups(enum stf_barrier_type types)
{
- unsigned int instrs[3], *dest;
+ unsigned int instrs[3];
long *start, *end;
int i;

@@ -144,23 +168,8 @@ static void do_stf_entry_barrier_fixups(enum stf_barrier_type types)
instrs[i++] = PPC_RAW_ORI(_R31, _R31, 0); /* speculation barrier */
}

- for (i = 0; start < end; start++, i++) {
- dest = (void *)start + *start;
-
- pr_devel("patching dest %lx\n", (unsigned long)dest);
-
- // See comment in do_entry_flush_fixups() RE order of patching
- if (types & STF_BARRIER_FALLBACK) {
- patch_instruction(dest, ppc_inst(instrs[0]));
- patch_instruction(dest + 2, ppc_inst(instrs[2]));
- patch_branch(dest + 1,
- (unsigned long)&stf_barrier_fallback, BRANCH_SET_LINK);
- } else {
- patch_instruction(dest + 1, ppc_inst(instrs[1]));
- patch_instruction(dest + 2, ppc_inst(instrs[2]));
- patch_instruction(dest, ppc_inst(instrs[0]));
- }
- }
+ i = do_patch_entry_fixups(start, end, instrs, types & STF_BARRIER_FALLBACK,
+ &stf_barrier_fallback);

printk(KERN_DEBUG "stf-barrier: patched %d entry locations (%s barrier)\n", i,
(types == STF_BARRIER_NONE) ? "no" :
@@ -325,7 +334,7 @@ void do_uaccess_flush_fixups(enum l1d_flush_type types)
static int __do_entry_flush_fixups(void *data)
{
enum l1d_flush_type types = *(enum l1d_flush_type *)data;
- unsigned int instrs[3], *dest;
+ unsigned int instrs[3];
long *start, *end;
int i;

@@ -375,42 +384,13 @@ static int __do_entry_flush_fixups(void *data)

start = PTRRELOC(&__start___entry_flush_fixup);
end = PTRRELOC(&__stop___entry_flush_fixup);
- for (i = 0; start < end; start++, i++) {
- dest = (void *)start + *start;
-
- pr_devel("patching dest %lx\n", (unsigned long)dest);
-
- if (types == L1D_FLUSH_FALLBACK) {
- patch_instruction(dest, ppc_inst(instrs[0]));
- patch_instruction(dest + 2, ppc_inst(instrs[2]));
- patch_branch(dest + 1,
- (unsigned long)&entry_flush_fallback, BRANCH_SET_LINK);
- } else {
- patch_instruction(dest + 1, ppc_inst(instrs[1]));
- patch_instruction(dest + 2, ppc_inst(instrs[2]));
- patch_instruction(dest, ppc_inst(instrs[0]));
- }
- }
+ i = do_patch_entry_fixups(start, end, instrs, types == L1D_FLUSH_FALLBACK,
+ &entry_flush_fallback);

start = PTRRELOC(&__start___scv_entry_flush_fixup);
end = PTRRELOC(&__stop___scv_entry_flush_fixup);
- for (; start < end; start++, i++) {
- dest = (void *)start + *start;
-
- pr_devel("patching dest %lx\n", (unsigned long)dest);
-
- if (types == L1D_FLUSH_FALLBACK) {
- patch_instruction(dest, ppc_inst(instrs[0]));
- patch_instruction(dest + 2, ppc_inst(instrs[2]));
- patch_branch(dest + 1,
- (unsigned long)&scv_entry_flush_fallback, BRANCH_SET_LINK);
- } else {
- patch_instruction(dest + 1, ppc_inst(instrs[1]));
- patch_instruction(dest + 2, ppc_inst(instrs[2]));
- patch_instruction(dest, ppc_inst(instrs[0]));
- }
- }
-
+ i += do_patch_entry_fixups(start, end, instrs, types == L1D_FLUSH_FALLBACK,
+ &scv_entry_flush_fallback);

printk(KERN_DEBUG "entry-flush: patched %d locations (%s flush)\n", i,
(types == L1D_FLUSH_NONE) ? "no" :
--
2.37.1