2024-04-11 17:36:12

by Mike Rapoport

Subject: [RFC PATCH 0/7] x86/module: use large ROX pages for text allocations

From: "Mike Rapoport (IBM)" <[email protected]>

Hi,

These patches add support for using large ROX pages for allocations of
executable memory on x86.

They address Andy's comments [1] about having executable mappings for code
that is not yet fully formed.

The approach taken is to allocate ROX memory along with writable but not
executable memory and use the writable copy to perform relocations and
alternatives patching. After the module text gets into its final shape,
the contents of the writable memory are copied into the actual ROX
location using text poking.
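
Schematically, the intended lifecycle is roughly the following (a
simplified sketch, not the exact code from the series):

        /* final ROX location plus a plain writable scratch copy */
        void *text = execmem_alloc(EXECMEM_MODULE_TEXT, size);
        void *rw_copy = vzalloc(size);

        /* relocations and alternatives patching are applied to rw_copy,
         * with addresses computed as if the code already lived at text */

        /* once the text is in its final shape, install it in one go */
        execmem_update_copy(text, rw_copy, size);
        vfree(rw_copy);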

The allocations of the ROX memory use vmalloc with VM_ALLOW_HUGE_VMAP to
allocate PMD aligned memory, fill that memory with invalid instructions and
in the end remap it as ROX. Portions of these large pages are handed out to
execmem_alloc() callers without any changes to the permissions. When the
memory is freed with execmem_free() it is invalidated again so that it
won't contain stale instructions.
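
Populating the cache might look roughly like this (a sketch of the
mechanism only; INVALID_INSN stands in for the architecture's
invalid-instruction filler, e.g. 0xcc/INT3 on x86):

        /* reserve a PMD-aligned huge mapping for the cache */
        void *p = vmalloc_huge(PMD_SIZE, GFP_KERNEL);

        /* fill with invalid instructions while still writable, then
         * seal the mapping read-only-executable */
        memset(p, INVALID_INSN, PMD_SIZE);
        set_memory_rox((unsigned long)p, PMD_SIZE >> PAGE_SHIFT);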

The module memory allocation and the x86 code dealing with relocations
and alternatives patching take into account the existence of the two
copies: the writable memory and the ROX memory at the actual allocated
virtual address.

This is an early RFC, not yet well tested, and there is a lot of room
for improvement. For example, the locking of execmem_cache can be made
more fine grained, freeing of PMD-sized pages from execmem_cache can be
implemented with a shrinker, and the large pages can be removed from the
direct map when they are added to the cache and restored when they are
freed from the cache.

Still, I'd like to hear feedback on the approach in general before moving
forward with polishing the details.

The series applies on top of v4 of "jit/text allocator" [2] and is also
available at git:

https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/v4%2bx86-rox

[1] https://lore.kernel.org/all/[email protected]
[2] https://lore.kernel.org/linux-mm/[email protected]

Mike Rapoport (IBM) (6):
asm-generic: introduce text-patching.h
mm: vmalloc: don't account for number of nodes for HUGE_VMAP allocations
module: prepare to handle ROX allocations for text
x86/module: prepare module loading for ROX allocations of text
execmem: add support for cache of large ROX pages
x86/module: enable ROX caches for module text

Song Liu (1):
ftrace: Add swap_func to ftrace_process_locs()

arch/alpha/include/asm/Kbuild | 1 +
arch/arc/include/asm/Kbuild | 1 +
.../include/asm/{patch.h => text-patching.h} | 0
arch/arm/kernel/ftrace.c | 2 +-
arch/arm/kernel/jump_label.c | 2 +-
arch/arm/kernel/kgdb.c | 2 +-
arch/arm/kernel/patch.c | 2 +-
arch/arm/probes/kprobes/core.c | 2 +-
arch/arm/probes/kprobes/opt-arm.c | 2 +-
.../asm/{patching.h => text-patching.h} | 0
arch/arm64/kernel/ftrace.c | 2 +-
arch/arm64/kernel/jump_label.c | 2 +-
arch/arm64/kernel/kgdb.c | 2 +-
arch/arm64/kernel/patching.c | 2 +-
arch/arm64/kernel/probes/kprobes.c | 2 +-
arch/arm64/kernel/traps.c | 2 +-
arch/arm64/net/bpf_jit_comp.c | 2 +-
arch/csky/include/asm/Kbuild | 1 +
arch/hexagon/include/asm/Kbuild | 1 +
arch/loongarch/include/asm/Kbuild | 1 +
arch/m68k/include/asm/Kbuild | 1 +
arch/microblaze/include/asm/Kbuild | 1 +
arch/mips/include/asm/Kbuild | 1 +
arch/nios2/include/asm/Kbuild | 1 +
arch/openrisc/include/asm/Kbuild | 1 +
.../include/asm/{patch.h => text-patching.h} | 0
arch/parisc/kernel/ftrace.c | 2 +-
arch/parisc/kernel/jump_label.c | 2 +-
arch/parisc/kernel/kgdb.c | 2 +-
arch/parisc/kernel/kprobes.c | 2 +-
arch/parisc/kernel/patch.c | 2 +-
arch/powerpc/include/asm/kprobes.h | 2 +-
.../asm/{code-patching.h => text-patching.h} | 0
arch/powerpc/kernel/crash_dump.c | 2 +-
arch/powerpc/kernel/epapr_paravirt.c | 2 +-
arch/powerpc/kernel/jump_label.c | 2 +-
arch/powerpc/kernel/kgdb.c | 2 +-
arch/powerpc/kernel/kprobes.c | 2 +-
arch/powerpc/kernel/module_32.c | 2 +-
arch/powerpc/kernel/module_64.c | 2 +-
arch/powerpc/kernel/optprobes.c | 2 +-
arch/powerpc/kernel/process.c | 2 +-
arch/powerpc/kernel/security.c | 2 +-
arch/powerpc/kernel/setup_32.c | 2 +-
arch/powerpc/kernel/setup_64.c | 2 +-
arch/powerpc/kernel/static_call.c | 2 +-
arch/powerpc/kernel/trace/ftrace.c | 2 +-
arch/powerpc/kernel/trace/ftrace_64_pg.c | 2 +-
arch/powerpc/lib/code-patching.c | 2 +-
arch/powerpc/lib/feature-fixups.c | 2 +-
arch/powerpc/lib/test-code-patching.c | 2 +-
arch/powerpc/lib/test_emulate_step.c | 2 +-
arch/powerpc/mm/book3s32/mmu.c | 2 +-
arch/powerpc/mm/book3s64/hash_utils.c | 2 +-
arch/powerpc/mm/book3s64/slb.c | 2 +-
arch/powerpc/mm/kasan/init_32.c | 2 +-
arch/powerpc/mm/mem.c | 2 +-
arch/powerpc/mm/nohash/44x.c | 2 +-
arch/powerpc/mm/nohash/book3e_pgtable.c | 2 +-
arch/powerpc/mm/nohash/tlb.c | 2 +-
arch/powerpc/net/bpf_jit_comp.c | 2 +-
arch/powerpc/perf/8xx-pmu.c | 2 +-
arch/powerpc/perf/core-book3s.c | 2 +-
arch/powerpc/platforms/85xx/smp.c | 2 +-
arch/powerpc/platforms/86xx/mpc86xx_smp.c | 2 +-
arch/powerpc/platforms/cell/smp.c | 2 +-
arch/powerpc/platforms/powermac/smp.c | 2 +-
arch/powerpc/platforms/powernv/idle.c | 2 +-
arch/powerpc/platforms/powernv/smp.c | 2 +-
arch/powerpc/platforms/pseries/smp.c | 2 +-
arch/powerpc/xmon/xmon.c | 2 +-
arch/riscv/errata/andes/errata.c | 2 +-
arch/riscv/errata/sifive/errata.c | 2 +-
arch/riscv/errata/thead/errata.c | 2 +-
.../include/asm/{patch.h => text-patching.h} | 0
arch/riscv/include/asm/uprobes.h | 2 +-
arch/riscv/kernel/alternative.c | 2 +-
arch/riscv/kernel/cpufeature.c | 3 +-
arch/riscv/kernel/ftrace.c | 2 +-
arch/riscv/kernel/jump_label.c | 2 +-
arch/riscv/kernel/patch.c | 2 +-
arch/riscv/kernel/probes/kprobes.c | 2 +-
arch/riscv/net/bpf_jit_comp64.c | 2 +-
arch/riscv/net/bpf_jit_core.c | 2 +-
arch/sh/include/asm/Kbuild | 1 +
arch/sparc/include/asm/Kbuild | 1 +
arch/um/kernel/um_arch.c | 16 +-
arch/x86/entry/vdso/vma.c | 3 +-
arch/x86/include/asm/alternative.h | 14 +-
arch/x86/include/asm/text-patching.h | 1 +
arch/x86/kernel/alternative.c | 152 ++++++----
arch/x86/kernel/ftrace.c | 41 ++-
arch/x86/kernel/module.c | 17 +-
arch/x86/mm/init.c | 29 +-
arch/xtensa/include/asm/Kbuild | 1 +
include/asm-generic/text-patching.h | 5 +
include/linux/execmem.h | 25 ++
include/linux/ftrace.h | 2 +
include/linux/module.h | 11 +
include/linux/text-patching.h | 15 +
kernel/module/main.c | 70 ++++-
kernel/module/strict_rwx.c | 3 +
kernel/trace/ftrace.c | 13 +-
mm/execmem.c | 278 +++++++++++++++++-
mm/vmalloc.c | 9 +-
105 files changed, 663 insertions(+), 193 deletions(-)
rename arch/arm/include/asm/{patch.h => text-patching.h} (100%)
rename arch/arm64/include/asm/{patching.h => text-patching.h} (100%)
rename arch/parisc/include/asm/{patch.h => text-patching.h} (100%)
rename arch/powerpc/include/asm/{code-patching.h => text-patching.h} (100%)
rename arch/riscv/include/asm/{patch.h => text-patching.h} (100%)
create mode 100644 include/asm-generic/text-patching.h
create mode 100644 include/linux/text-patching.h

--
2.43.0



2024-04-11 17:36:46

by Mike Rapoport

Subject: [RFC PATCH 2/7] mm: vmalloc: don't account for number of nodes for HUGE_VMAP allocations

From: "Mike Rapoport (IBM)" <[email protected]>

vmalloc allocations with VM_ALLOW_HUGE_VMAP that do not explicitly
specify a node ID will use huge pages only if size_per_node is larger
than PMD_SIZE.
Still, the allocated memory is not actually distributed between nodes,
so there is no advantage in this approach.
On the contrary, BPF allocates PMD_SIZE * num_possible_nodes() for each
new bpf_prog_pack just to clear this threshold, while a plain PMD_SIZE
pack would do (e.g. 8M rather than 2M on a 4-node x86 system).

Don't account for number of nodes for VM_ALLOW_HUGE_VMAP with
NUMA_NO_NODE and use huge pages whenever the requested allocation size
is larger than PMD_SIZE.

Signed-off-by: Mike Rapoport (IBM) <[email protected]>
---
mm/vmalloc.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 22aa63f4ef63..5fc8b514e457 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3737,8 +3737,6 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
}

if (vmap_allow_huge && (vm_flags & VM_ALLOW_HUGE_VMAP)) {
- unsigned long size_per_node;
-
/*
* Try huge pages. Only try for PAGE_KERNEL allocations,
* others like modules don't yet expect huge pages in
@@ -3746,13 +3744,10 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
* supporting them.
*/

- size_per_node = size;
- if (node == NUMA_NO_NODE)
- size_per_node /= num_online_nodes();
- if (arch_vmap_pmd_supported(prot) && size_per_node >= PMD_SIZE)
+ if (arch_vmap_pmd_supported(prot) && size >= PMD_SIZE)
shift = PMD_SHIFT;
else
- shift = arch_vmap_pte_supported_shift(size_per_node);
+ shift = arch_vmap_pte_supported_shift(size);

align = max(real_align, 1UL << shift);
size = ALIGN(real_size, 1UL << shift);
--
2.43.0


2024-04-11 17:36:46

by Mike Rapoport

Subject: [RFC PATCH 1/7] asm-generic: introduce text-patching.h

From: "Mike Rapoport (IBM)" <[email protected]>

Several architectures support text patching, but they name the header
files that declare patching functions differently.

Make all such headers consistently named text-patching.h and add an empty
header in asm-generic for architectures that do not support text patching.
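
An architecture that does implement text patching overrides the generic
fallback by declaring the function in its asm/text-patching.h and
defining its name as a macro, as the x86 hunk below does:

        extern void *text_poke_copy(void *addr, const void *opcode, size_t len);
        #define text_poke_copy text_poke_copy

so the memcpy-based text_poke_copy() in linux/text-patching.h is
compiled out.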

Signed-off-by: Mike Rapoport (IBM) <[email protected]>
---
arch/alpha/include/asm/Kbuild | 1 +
arch/arc/include/asm/Kbuild | 1 +
arch/arm/include/asm/{patch.h => text-patching.h} | 0
arch/arm/kernel/ftrace.c | 2 +-
arch/arm/kernel/jump_label.c | 2 +-
arch/arm/kernel/kgdb.c | 2 +-
arch/arm/kernel/patch.c | 2 +-
arch/arm/probes/kprobes/core.c | 2 +-
arch/arm/probes/kprobes/opt-arm.c | 2 +-
.../include/asm/{patching.h => text-patching.h} | 0
arch/arm64/kernel/ftrace.c | 2 +-
arch/arm64/kernel/jump_label.c | 2 +-
arch/arm64/kernel/kgdb.c | 2 +-
arch/arm64/kernel/patching.c | 2 +-
arch/arm64/kernel/probes/kprobes.c | 2 +-
arch/arm64/kernel/traps.c | 2 +-
arch/arm64/net/bpf_jit_comp.c | 2 +-
arch/csky/include/asm/Kbuild | 1 +
arch/hexagon/include/asm/Kbuild | 1 +
arch/loongarch/include/asm/Kbuild | 1 +
arch/m68k/include/asm/Kbuild | 1 +
arch/microblaze/include/asm/Kbuild | 1 +
arch/mips/include/asm/Kbuild | 1 +
arch/nios2/include/asm/Kbuild | 1 +
arch/openrisc/include/asm/Kbuild | 1 +
.../include/asm/{patch.h => text-patching.h} | 0
arch/parisc/kernel/ftrace.c | 2 +-
arch/parisc/kernel/jump_label.c | 2 +-
arch/parisc/kernel/kgdb.c | 2 +-
arch/parisc/kernel/kprobes.c | 2 +-
arch/parisc/kernel/patch.c | 2 +-
arch/powerpc/include/asm/kprobes.h | 2 +-
.../asm/{code-patching.h => text-patching.h} | 0
arch/powerpc/kernel/crash_dump.c | 2 +-
arch/powerpc/kernel/epapr_paravirt.c | 2 +-
arch/powerpc/kernel/jump_label.c | 2 +-
arch/powerpc/kernel/kgdb.c | 2 +-
arch/powerpc/kernel/kprobes.c | 2 +-
arch/powerpc/kernel/module_32.c | 2 +-
arch/powerpc/kernel/module_64.c | 2 +-
arch/powerpc/kernel/optprobes.c | 2 +-
arch/powerpc/kernel/process.c | 2 +-
arch/powerpc/kernel/security.c | 2 +-
arch/powerpc/kernel/setup_32.c | 2 +-
arch/powerpc/kernel/setup_64.c | 2 +-
arch/powerpc/kernel/static_call.c | 2 +-
arch/powerpc/kernel/trace/ftrace.c | 2 +-
arch/powerpc/kernel/trace/ftrace_64_pg.c | 2 +-
arch/powerpc/lib/code-patching.c | 2 +-
arch/powerpc/lib/feature-fixups.c | 2 +-
arch/powerpc/lib/test-code-patching.c | 2 +-
arch/powerpc/lib/test_emulate_step.c | 2 +-
arch/powerpc/mm/book3s32/mmu.c | 2 +-
arch/powerpc/mm/book3s64/hash_utils.c | 2 +-
arch/powerpc/mm/book3s64/slb.c | 2 +-
arch/powerpc/mm/kasan/init_32.c | 2 +-
arch/powerpc/mm/mem.c | 2 +-
arch/powerpc/mm/nohash/44x.c | 2 +-
arch/powerpc/mm/nohash/book3e_pgtable.c | 2 +-
arch/powerpc/mm/nohash/tlb.c | 2 +-
arch/powerpc/net/bpf_jit_comp.c | 2 +-
arch/powerpc/perf/8xx-pmu.c | 2 +-
arch/powerpc/perf/core-book3s.c | 2 +-
arch/powerpc/platforms/85xx/smp.c | 2 +-
arch/powerpc/platforms/86xx/mpc86xx_smp.c | 2 +-
arch/powerpc/platforms/cell/smp.c | 2 +-
arch/powerpc/platforms/powermac/smp.c | 2 +-
arch/powerpc/platforms/powernv/idle.c | 2 +-
arch/powerpc/platforms/powernv/smp.c | 2 +-
arch/powerpc/platforms/pseries/smp.c | 2 +-
arch/powerpc/xmon/xmon.c | 2 +-
arch/riscv/errata/andes/errata.c | 2 +-
arch/riscv/errata/sifive/errata.c | 2 +-
arch/riscv/errata/thead/errata.c | 2 +-
.../include/asm/{patch.h => text-patching.h} | 0
arch/riscv/include/asm/uprobes.h | 2 +-
arch/riscv/kernel/alternative.c | 2 +-
arch/riscv/kernel/cpufeature.c | 3 ++-
arch/riscv/kernel/ftrace.c | 2 +-
arch/riscv/kernel/jump_label.c | 2 +-
arch/riscv/kernel/patch.c | 2 +-
arch/riscv/kernel/probes/kprobes.c | 2 +-
arch/riscv/net/bpf_jit_comp64.c | 2 +-
arch/riscv/net/bpf_jit_core.c | 2 +-
arch/sh/include/asm/Kbuild | 1 +
arch/sparc/include/asm/Kbuild | 1 +
arch/um/kernel/um_arch.c | 5 +++++
arch/x86/include/asm/text-patching.h | 1 +
arch/xtensa/include/asm/Kbuild | 1 +
include/asm-generic/text-patching.h | 5 +++++
include/linux/text-patching.h | 15 +++++++++++++++
91 files changed, 109 insertions(+), 69 deletions(-)
rename arch/arm/include/asm/{patch.h => text-patching.h} (100%)
rename arch/arm64/include/asm/{patching.h => text-patching.h} (100%)
rename arch/parisc/include/asm/{patch.h => text-patching.h} (100%)
rename arch/powerpc/include/asm/{code-patching.h => text-patching.h} (100%)
rename arch/riscv/include/asm/{patch.h => text-patching.h} (100%)
create mode 100644 include/asm-generic/text-patching.h
create mode 100644 include/linux/text-patching.h

diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild
index 396caece6d6d..483965c5a4de 100644
--- a/arch/alpha/include/asm/Kbuild
+++ b/arch/alpha/include/asm/Kbuild
@@ -5,3 +5,4 @@ generic-y += agp.h
generic-y += asm-offsets.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += text-patching.h
diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index 3c1afa524b9c..955741f08b93 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -4,3 +4,4 @@ generic-y += kvm_para.h
generic-y += mcs_spinlock.h
generic-y += parport.h
generic-y += user.h
+generic-y += text-patching.h
diff --git a/arch/arm/include/asm/patch.h b/arch/arm/include/asm/text-patching.h
similarity index 100%
rename from arch/arm/include/asm/patch.h
rename to arch/arm/include/asm/text-patching.h
diff --git a/arch/arm/kernel/ftrace.c b/arch/arm/kernel/ftrace.c
index a0b6d1e3812f..e9a2a3096967 100644
--- a/arch/arm/kernel/ftrace.c
+++ b/arch/arm/kernel/ftrace.c
@@ -23,7 +23,7 @@
#include <asm/insn.h>
#include <asm/set_memory.h>
#include <asm/stacktrace.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>

/*
* The compiler emitted profiling hook consists of
diff --git a/arch/arm/kernel/jump_label.c b/arch/arm/kernel/jump_label.c
index eb9c24b6e8e2..a06a92d0f550 100644
--- a/arch/arm/kernel/jump_label.c
+++ b/arch/arm/kernel/jump_label.c
@@ -1,7 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/kernel.h>
#include <linux/jump_label.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
#include <asm/insn.h>

static void __arch_jump_label_transform(struct jump_entry *entry,
diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
index 22f937e6f3ff..ab76c55fd610 100644
--- a/arch/arm/kernel/kgdb.c
+++ b/arch/arm/kernel/kgdb.c
@@ -15,7 +15,7 @@
#include <linux/kgdb.h>
#include <linux/uaccess.h>

-#include <asm/patch.h>
+#include <asm/text-patching.h>
#include <asm/traps.h>

struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] =
diff --git a/arch/arm/kernel/patch.c b/arch/arm/kernel/patch.c
index e9e828b6bb30..4d45e60cd46d 100644
--- a/arch/arm/kernel/patch.c
+++ b/arch/arm/kernel/patch.c
@@ -9,7 +9,7 @@
#include <asm/fixmap.h>
#include <asm/smp_plat.h>
#include <asm/opcodes.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>

struct patch {
void *addr;
diff --git a/arch/arm/probes/kprobes/core.c b/arch/arm/probes/kprobes/core.c
index d8238da095df..9fd877c87a38 100644
--- a/arch/arm/probes/kprobes/core.c
+++ b/arch/arm/probes/kprobes/core.c
@@ -25,7 +25,7 @@
#include <asm/cacheflush.h>
#include <linux/percpu.h>
#include <linux/bug.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
#include <asm/sections.h>

#include "../decode-arm.h"
diff --git a/arch/arm/probes/kprobes/opt-arm.c b/arch/arm/probes/kprobes/opt-arm.c
index 7f65048380ca..966c6042c5ad 100644
--- a/arch/arm/probes/kprobes/opt-arm.c
+++ b/arch/arm/probes/kprobes/opt-arm.c
@@ -14,7 +14,7 @@
/* for arm_gen_branch */
#include <asm/insn.h>
/* for patch_text */
-#include <asm/patch.h>
+#include <asm/text-patching.h>

#include "core.h"

diff --git a/arch/arm64/include/asm/patching.h b/arch/arm64/include/asm/text-patching.h
similarity index 100%
rename from arch/arm64/include/asm/patching.h
rename to arch/arm64/include/asm/text-patching.h
diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index a650f5e11fc5..3575d03d60af 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -15,7 +15,7 @@
#include <asm/debug-monitors.h>
#include <asm/ftrace.h>
#include <asm/insn.h>
-#include <asm/patching.h>
+#include <asm/text-patching.h>

#ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
struct fregs_offset {
diff --git a/arch/arm64/kernel/jump_label.c b/arch/arm64/kernel/jump_label.c
index faf88ec9c48e..bf089a06a2d1 100644
--- a/arch/arm64/kernel/jump_label.c
+++ b/arch/arm64/kernel/jump_label.c
@@ -8,7 +8,7 @@
#include <linux/kernel.h>
#include <linux/jump_label.h>
#include <asm/insn.h>
-#include <asm/patching.h>
+#include <asm/text-patching.h>

void arch_jump_label_transform(struct jump_entry *entry,
enum jump_label_type type)
diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index 4e1f983df3d1..f3c4d3a8a20f 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -17,7 +17,7 @@

#include <asm/debug-monitors.h>
#include <asm/insn.h>
-#include <asm/patching.h>
+#include <asm/text-patching.h>
#include <asm/traps.h>

struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = {
diff --git a/arch/arm64/kernel/patching.c b/arch/arm64/kernel/patching.c
index 255534930368..bd183c7a65b2 100644
--- a/arch/arm64/kernel/patching.c
+++ b/arch/arm64/kernel/patching.c
@@ -10,7 +10,7 @@
#include <asm/fixmap.h>
#include <asm/insn.h>
#include <asm/kprobes.h>
-#include <asm/patching.h>
+#include <asm/text-patching.h>
#include <asm/sections.h>

static DEFINE_RAW_SPINLOCK(patch_lock);
diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
index 4268678d0e86..01dbe9a56956 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -27,7 +27,7 @@
#include <asm/debug-monitors.h>
#include <asm/insn.h>
#include <asm/irq.h>
-#include <asm/patching.h>
+#include <asm/text-patching.h>
#include <asm/ptrace.h>
#include <asm/sections.h>
#include <asm/system_misc.h>
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 215e6d7f2df8..878c805c8561 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -41,7 +41,7 @@
#include <asm/extable.h>
#include <asm/insn.h>
#include <asm/kprobes.h>
-#include <asm/patching.h>
+#include <asm/text-patching.h>
#include <asm/traps.h>
#include <asm/smp.h>
#include <asm/stack_pointer.h>
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 456f5af239fc..5736caf33ff3 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -19,7 +19,7 @@
#include <asm/cacheflush.h>
#include <asm/debug-monitors.h>
#include <asm/insn.h>
-#include <asm/patching.h>
+#include <asm/text-patching.h>
#include <asm/set_memory.h>

#include "bpf_jit.h"
diff --git a/arch/csky/include/asm/Kbuild b/arch/csky/include/asm/Kbuild
index 1117c28cb7e8..2ac9542342f8 100644
--- a/arch/csky/include/asm/Kbuild
+++ b/arch/csky/include/asm/Kbuild
@@ -10,3 +10,4 @@ generic-y += qspinlock.h
generic-y += parport.h
generic-y += user.h
generic-y += vmlinux.lds.h
+generic-y += text-patching.h
diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild
index 3ece3c93fe08..c9cd8a600f23 100644
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -3,3 +3,4 @@ generic-y += extable.h
generic-y += iomap.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += text-patching.h
diff --git a/arch/loongarch/include/asm/Kbuild b/arch/loongarch/include/asm/Kbuild
index 2dbec7853ae8..676907b23fe7 100644
--- a/arch/loongarch/include/asm/Kbuild
+++ b/arch/loongarch/include/asm/Kbuild
@@ -27,3 +27,4 @@ generic-y += param.h
generic-y += posix_types.h
generic-y += resource.h
generic-y += kvm_para.h
+generic-y += text-patching.h
diff --git a/arch/m68k/include/asm/Kbuild b/arch/m68k/include/asm/Kbuild
index 0dbf9c5c6fae..b282e0dd8dc1 100644
--- a/arch/m68k/include/asm/Kbuild
+++ b/arch/m68k/include/asm/Kbuild
@@ -4,3 +4,4 @@ generic-y += extable.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
generic-y += spinlock.h
+generic-y += text-patching.h
diff --git a/arch/microblaze/include/asm/Kbuild b/arch/microblaze/include/asm/Kbuild
index a055f5dbe00a..7178f990e8b3 100644
--- a/arch/microblaze/include/asm/Kbuild
+++ b/arch/microblaze/include/asm/Kbuild
@@ -8,3 +8,4 @@ generic-y += parport.h
generic-y += syscalls.h
generic-y += tlb.h
generic-y += user.h
+generic-y += text-patching.h
diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index 7ba67a0d6c97..684569b2ecd6 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -13,3 +13,4 @@ generic-y += parport.h
generic-y += qrwlock.h
generic-y += qspinlock.h
generic-y += user.h
+generic-y += text-patching.h
diff --git a/arch/nios2/include/asm/Kbuild b/arch/nios2/include/asm/Kbuild
index 7fe7437555fb..52af5c1ee6f6 100644
--- a/arch/nios2/include/asm/Kbuild
+++ b/arch/nios2/include/asm/Kbuild
@@ -5,3 +5,4 @@ generic-y += kvm_para.h
generic-y += mcs_spinlock.h
generic-y += spinlock.h
generic-y += user.h
+generic-y += text-patching.h
diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/Kbuild
index c8c99b554ca4..4526edffc8d6 100644
--- a/arch/openrisc/include/asm/Kbuild
+++ b/arch/openrisc/include/asm/Kbuild
@@ -7,3 +7,4 @@ generic-y += spinlock.h
generic-y += qrwlock_types.h
generic-y += qrwlock.h
generic-y += user.h
+generic-y += text-patching.h
diff --git a/arch/parisc/include/asm/patch.h b/arch/parisc/include/asm/text-patching.h
similarity index 100%
rename from arch/parisc/include/asm/patch.h
rename to arch/parisc/include/asm/text-patching.h
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index 621a4b386ae4..fd8c34152783 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -20,7 +20,7 @@
#include <asm/assembly.h>
#include <asm/sections.h>
#include <asm/ftrace.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>

#define __hot __section(".text.hot")

diff --git a/arch/parisc/kernel/jump_label.c b/arch/parisc/kernel/jump_label.c
index e253b134500d..ea51f15bf0e6 100644
--- a/arch/parisc/kernel/jump_label.c
+++ b/arch/parisc/kernel/jump_label.c
@@ -8,7 +8,7 @@
#include <linux/jump_label.h>
#include <linux/bug.h>
#include <asm/alternative.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>

static inline int reassemble_17(int as17)
{
diff --git a/arch/parisc/kernel/kgdb.c b/arch/parisc/kernel/kgdb.c
index b16fa9bac5f4..fee81f877525 100644
--- a/arch/parisc/kernel/kgdb.c
+++ b/arch/parisc/kernel/kgdb.c
@@ -16,7 +16,7 @@
#include <asm/ptrace.h>
#include <asm/traps.h>
#include <asm/processor.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
#include <asm/cacheflush.h>

const struct kgdb_arch arch_kgdb_ops = {
diff --git a/arch/parisc/kernel/kprobes.c b/arch/parisc/kernel/kprobes.c
index 6e0b86652f30..9255adba67a3 100644
--- a/arch/parisc/kernel/kprobes.c
+++ b/arch/parisc/kernel/kprobes.c
@@ -12,7 +12,7 @@
#include <linux/kprobes.h>
#include <linux/slab.h>
#include <asm/cacheflush.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>

DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
diff --git a/arch/parisc/kernel/patch.c b/arch/parisc/kernel/patch.c
index e59574f65e64..35dd764b871e 100644
--- a/arch/parisc/kernel/patch.c
+++ b/arch/parisc/kernel/patch.c
@@ -13,7 +13,7 @@

#include <asm/cacheflush.h>
#include <asm/fixmap.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>

struct patch {
void *addr;
diff --git a/arch/powerpc/include/asm/kprobes.h b/arch/powerpc/include/asm/kprobes.h
index 4525a9c68260..dfe2e5ad3b21 100644
--- a/arch/powerpc/include/asm/kprobes.h
+++ b/arch/powerpc/include/asm/kprobes.h
@@ -21,7 +21,7 @@
#include <linux/percpu.h>
#include <linux/module.h>
#include <asm/probes.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>

#ifdef CONFIG_KPROBES
#define __ARCH_WANT_KPROBES_INSN_SLOT
diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/text-patching.h
similarity index 100%
rename from arch/powerpc/include/asm/code-patching.h
rename to arch/powerpc/include/asm/text-patching.h
diff --git a/arch/powerpc/kernel/crash_dump.c b/arch/powerpc/kernel/crash_dump.c
index 2086fa6cdc25..103b6605dd68 100644
--- a/arch/powerpc/kernel/crash_dump.c
+++ b/arch/powerpc/kernel/crash_dump.c
@@ -13,7 +13,7 @@
#include <linux/io.h>
#include <linux/memblock.h>
#include <linux/of.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/kdump.h>
#include <asm/firmware.h>
#include <linux/uio.h>
diff --git a/arch/powerpc/kernel/epapr_paravirt.c b/arch/powerpc/kernel/epapr_paravirt.c
index d4b8aff20815..247ab2acaccc 100644
--- a/arch/powerpc/kernel/epapr_paravirt.c
+++ b/arch/powerpc/kernel/epapr_paravirt.c
@@ -9,7 +9,7 @@
#include <linux/of_fdt.h>
#include <asm/epapr_hcalls.h>
#include <asm/cacheflush.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/machdep.h>
#include <asm/inst.h>

diff --git a/arch/powerpc/kernel/jump_label.c b/arch/powerpc/kernel/jump_label.c
index 5277cf582c16..2659e1ac8604 100644
--- a/arch/powerpc/kernel/jump_label.c
+++ b/arch/powerpc/kernel/jump_label.c
@@ -5,7 +5,7 @@

#include <linux/kernel.h>
#include <linux/jump_label.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/inst.h>

void arch_jump_label_transform(struct jump_entry *entry,
diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
index ebe4d1645ca1..0699640a460e 100644
--- a/arch/powerpc/kernel/kgdb.c
+++ b/arch/powerpc/kernel/kgdb.c
@@ -21,7 +21,7 @@
#include <asm/processor.h>
#include <asm/machdep.h>
#include <asm/debug.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <linux/slab.h>
#include <asm/inst.h>

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 14c5ddec3056..9dff58143720 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -21,7 +21,7 @@
#include <linux/slab.h>
#include <linux/set_memory.h>
#include <linux/execmem.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/cacheflush.h>
#include <asm/sstep.h>
#include <asm/sections.h>
diff --git a/arch/powerpc/kernel/module_32.c b/arch/powerpc/kernel/module_32.c
index 816a63fd71fb..f930e3395a7f 100644
--- a/arch/powerpc/kernel/module_32.c
+++ b/arch/powerpc/kernel/module_32.c
@@ -18,7 +18,7 @@
#include <linux/bug.h>
#include <linux/sort.h>
#include <asm/setup.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>

/* Count how many different relocations (different symbol, different
addend) */
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 7112adc597a8..2f07071229a3 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -17,7 +17,7 @@
#include <linux/kernel.h>
#include <asm/module.h>
#include <asm/firmware.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <linux/sort.h>
#include <asm/setup.h>
#include <asm/sections.h>
diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c
index 004fae2044a3..a540f13210fb 100644
--- a/arch/powerpc/kernel/optprobes.c
+++ b/arch/powerpc/kernel/optprobes.c
@@ -13,7 +13,7 @@
#include <asm/kprobes.h>
#include <asm/ptrace.h>
#include <asm/cacheflush.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/sstep.h>
#include <asm/ppc-opcode.h>
#include <asm/inst.h>
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 9452a54d356c..9e8c1dccb150 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -54,7 +54,7 @@
#include <asm/firmware.h>
#include <asm/hw_irq.h>
#endif
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/exec.h>
#include <asm/livepatch.h>
#include <asm/cpu_has_feature.h>
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index 4856e1a5161c..fbb7ebd8aa08 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -14,7 +14,7 @@
#include <linux/debugfs.h>

#include <asm/asm-prototypes.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/security_features.h>
#include <asm/sections.h>
#include <asm/setup.h>
diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
index b761cc1a403c..fae65e7e5fcc 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -40,7 +40,7 @@
#include <asm/time.h>
#include <asm/serial.h>
#include <asm/udbg.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/cpu_has_feature.h>
#include <asm/asm-prototypes.h>
#include <asm/kdump.h>
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 2f19d5e94485..01c089caf88c 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -60,7 +60,7 @@
#include <asm/xmon.h>
#include <asm/udbg.h>
#include <asm/kexec.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/ftrace.h>
#include <asm/opal.h>
#include <asm/cputhreads.h>
diff --git a/arch/powerpc/kernel/static_call.c b/arch/powerpc/kernel/static_call.c
index 863a7aa24650..f1875b55f0a6 100644
--- a/arch/powerpc/kernel/static_call.c
+++ b/arch/powerpc/kernel/static_call.c
@@ -2,7 +2,7 @@
#include <linux/memory.h>
#include <linux/static_call.h>

-#include <asm/code-patching.h>
+#include <asm/text-patching.h>

void arch_static_call_transform(void *site, void *tramp, void *func, bool tail)
{
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index d8d6b4fd9a14..be1a245241b3 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -23,7 +23,7 @@
#include <linux/list.h>

#include <asm/cacheflush.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/ftrace.h>
#include <asm/syscall.h>
#include <asm/inst.h>
diff --git a/arch/powerpc/kernel/trace/ftrace_64_pg.c b/arch/powerpc/kernel/trace/ftrace_64_pg.c
index 12fab1803bcf..9e862ba55263 100644
--- a/arch/powerpc/kernel/trace/ftrace_64_pg.c
+++ b/arch/powerpc/kernel/trace/ftrace_64_pg.c
@@ -23,7 +23,7 @@
#include <linux/list.h>

#include <asm/cacheflush.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/ftrace.h>
#include <asm/syscall.h>
#include <asm/inst.h>
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 7af791446ddf..3a2bd3a71c47 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -17,7 +17,7 @@
#include <asm/tlb.h>
#include <asm/tlbflush.h>
#include <asm/page.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/inst.h>

static int __patch_instruction(u32 *exec_addr, ppc_inst_t instr, u32 *patch_addr)
diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c
index 4f82581ca203..2957f1040ccb 100644
--- a/arch/powerpc/lib/feature-fixups.c
+++ b/arch/powerpc/lib/feature-fixups.c
@@ -16,7 +16,7 @@
#include <linux/sched/mm.h>
#include <linux/stop_machine.h>
#include <asm/cputable.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/interrupt.h>
#include <asm/page.h>
#include <asm/sections.h>
diff --git a/arch/powerpc/lib/test-code-patching.c b/arch/powerpc/lib/test-code-patching.c
index c44823292f73..5380c6b681b2 100644
--- a/arch/powerpc/lib/test-code-patching.c
+++ b/arch/powerpc/lib/test-code-patching.c
@@ -6,7 +6,7 @@
#include <linux/vmalloc.h>
#include <linux/init.h>

-#include <asm/code-patching.h>
+#include <asm/text-patching.h>

static int __init instr_is_branch_to_addr(const u32 *instr, unsigned long addr)
{
diff --git a/arch/powerpc/lib/test_emulate_step.c b/arch/powerpc/lib/test_emulate_step.c
index 23c7805fb7b3..66b5b4fa1686 100644
--- a/arch/powerpc/lib/test_emulate_step.c
+++ b/arch/powerpc/lib/test_emulate_step.c
@@ -11,7 +11,7 @@
#include <asm/cpu_has_feature.h>
#include <asm/sstep.h>
#include <asm/ppc-opcode.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/inst.h>

#define MAX_SUBTESTS 16
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index 625fe7d08e06..bc44066ec9f8 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -25,7 +25,7 @@

#include <asm/mmu.h>
#include <asm/machdep.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/sections.h>

#include <mm/mmu_decl.h>
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 01c3b4b65241..9381274f040c 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -57,7 +57,7 @@
#include <asm/sections.h>
#include <asm/copro.h>
#include <asm/udbg.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/fadump.h>
#include <asm/firmware.h>
#include <asm/tm.h>
diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
index f2708c8629a5..6b783552403c 100644
--- a/arch/powerpc/mm/book3s64/slb.c
+++ b/arch/powerpc/mm/book3s64/slb.c
@@ -24,7 +24,7 @@
#include <linux/pgtable.h>

#include <asm/udbg.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>

#include "internal.h"

diff --git a/arch/powerpc/mm/kasan/init_32.c b/arch/powerpc/mm/kasan/init_32.c
index aa9aa11927b2..03666d790a53 100644
--- a/arch/powerpc/mm/kasan/init_32.c
+++ b/arch/powerpc/mm/kasan/init_32.c
@@ -7,7 +7,7 @@
#include <linux/memblock.h>
#include <linux/sched/task.h>
#include <asm/pgalloc.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <mm/mmu_decl.h>

static pgprot_t __init kasan_prot_ro(void)
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 82723dc966e4..6bf161beabac 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -25,7 +25,7 @@
#include <asm/svm.h>
#include <asm/mmzone.h>
#include <asm/ftrace.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/setup.h>
#include <asm/fixmap.h>

diff --git a/arch/powerpc/mm/nohash/44x.c b/arch/powerpc/mm/nohash/44x.c
index 1beae802bb1c..6d10c6d8be71 100644
--- a/arch/powerpc/mm/nohash/44x.c
+++ b/arch/powerpc/mm/nohash/44x.c
@@ -24,7 +24,7 @@
#include <asm/mmu.h>
#include <asm/page.h>
#include <asm/cacheflush.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/smp.h>

#include <mm/mmu_decl.h>
diff --git a/arch/powerpc/mm/nohash/book3e_pgtable.c b/arch/powerpc/mm/nohash/book3e_pgtable.c
index 1c5e4ecbebeb..4601b9ce228d 100644
--- a/arch/powerpc/mm/nohash/book3e_pgtable.c
+++ b/arch/powerpc/mm/nohash/book3e_pgtable.c
@@ -10,7 +10,7 @@
#include <asm/pgalloc.h>
#include <asm/tlb.h>
#include <asm/dma.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>

#include <mm/mmu_decl.h>

diff --git a/arch/powerpc/mm/nohash/tlb.c b/arch/powerpc/mm/nohash/tlb.c
index 5ffa0af4328a..297e61d242cd 100644
--- a/arch/powerpc/mm/nohash/tlb.c
+++ b/arch/powerpc/mm/nohash/tlb.c
@@ -37,7 +37,7 @@
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
#include <asm/tlb.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/cputhreads.h>
#include <asm/hugetlb.h>
#include <asm/paca.h>
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 0f9a21783329..48be919d058e 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -18,7 +18,7 @@
#include <linux/bpf.h>

#include <asm/kprobes.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>

#include "bpf_jit.h"

diff --git a/arch/powerpc/perf/8xx-pmu.c b/arch/powerpc/perf/8xx-pmu.c
index 308a2e40d7be..1d2972229e3a 100644
--- a/arch/powerpc/perf/8xx-pmu.c
+++ b/arch/powerpc/perf/8xx-pmu.c
@@ -14,7 +14,7 @@
#include <asm/machdep.h>
#include <asm/firmware.h>
#include <asm/ptrace.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/inst.h>

#define PERF_8xx_ID_CPU_CYCLES 1
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 6b5f8a94e7d8..c0534534fcfd 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -16,7 +16,7 @@
#include <asm/machdep.h>
#include <asm/firmware.h>
#include <asm/ptrace.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/hw_irq.h>
#include <asm/interrupt.h>

diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
index 40aa58206888..165d9ec29691 100644
--- a/arch/powerpc/platforms/85xx/smp.c
+++ b/arch/powerpc/platforms/85xx/smp.c
@@ -23,7 +23,7 @@
#include <asm/mpic.h>
#include <asm/cacheflush.h>
#include <asm/dbell.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/cputhreads.h>
#include <asm/fsl_pm.h>

diff --git a/arch/powerpc/platforms/86xx/mpc86xx_smp.c b/arch/powerpc/platforms/86xx/mpc86xx_smp.c
index 8a7e55acf090..9be33e41af6d 100644
--- a/arch/powerpc/platforms/86xx/mpc86xx_smp.c
+++ b/arch/powerpc/platforms/86xx/mpc86xx_smp.c
@@ -12,7 +12,7 @@
#include <linux/delay.h>
#include <linux/pgtable.h>

-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/page.h>
#include <asm/pci-bridge.h>
#include <asm/mpic.h>
diff --git a/arch/powerpc/platforms/cell/smp.c b/arch/powerpc/platforms/cell/smp.c
index 30394c6f8894..14c2b45fdd52 100644
--- a/arch/powerpc/platforms/cell/smp.c
+++ b/arch/powerpc/platforms/cell/smp.c
@@ -35,7 +35,7 @@
#include <asm/firmware.h>
#include <asm/rtas.h>
#include <asm/cputhreads.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>

#include "interrupt.h"
#include <asm/udbg.h>
diff --git a/arch/powerpc/platforms/powermac/smp.c b/arch/powerpc/platforms/powermac/smp.c
index 15644be31990..6a50f6c408d1 100644
--- a/arch/powerpc/platforms/powermac/smp.c
+++ b/arch/powerpc/platforms/powermac/smp.c
@@ -35,7 +35,7 @@

#include <asm/ptrace.h>
#include <linux/atomic.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/irq.h>
#include <asm/page.h>
#include <asm/sections.h>
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index ad41dffe4d92..d98b933e4984 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -18,7 +18,7 @@
#include <asm/opal.h>
#include <asm/cputhreads.h>
#include <asm/cpuidle.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/smp.h>
#include <asm/runlatch.h>
#include <asm/dbell.h>
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index 8f14f0581a21..6b746feeabe4 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -28,7 +28,7 @@
#include <asm/xive.h>
#include <asm/opal.h>
#include <asm/runlatch.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/dbell.h>
#include <asm/kvm_ppc.h>
#include <asm/ppc-opcode.h>
diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index c597711ef20a..db99725e752b 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -39,7 +39,7 @@
#include <asm/xive.h>
#include <asm/dbell.h>
#include <asm/plpar_wrappers.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/svm.h>
#include <asm/kvm_guest.h>

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index d79d6633f333..08b8c211743b 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -50,7 +50,7 @@
#include <asm/xive.h>
#include <asm/opal.h>
#include <asm/firmware.h>
-#include <asm/code-patching.h>
+#include <asm/text-patching.h>
#include <asm/sections.h>
#include <asm/inst.h>
#include <asm/interrupt.h>
diff --git a/arch/riscv/errata/andes/errata.c b/arch/riscv/errata/andes/errata.c
index f2708a9494a1..1dbc476b91de 100644
--- a/arch/riscv/errata/andes/errata.c
+++ b/arch/riscv/errata/andes/errata.c
@@ -13,7 +13,7 @@
#include <asm/alternative.h>
#include <asm/cacheflush.h>
#include <asm/errata_list.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
#include <asm/processor.h>
#include <asm/sbi.h>
#include <asm/vendorid_list.h>
diff --git a/arch/riscv/errata/sifive/errata.c b/arch/riscv/errata/sifive/errata.c
index 3d9a32d791f7..cdd5ed2f0b6d 100644
--- a/arch/riscv/errata/sifive/errata.c
+++ b/arch/riscv/errata/sifive/errata.c
@@ -8,7 +8,7 @@
#include <linux/module.h>
#include <linux/string.h>
#include <linux/bug.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
#include <asm/alternative.h>
#include <asm/vendorid_list.h>
#include <asm/errata_list.h>
diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c
index b1c410bbc1ae..4bb39e471a4f 100644
--- a/arch/riscv/errata/thead/errata.c
+++ b/arch/riscv/errata/thead/errata.c
@@ -16,7 +16,7 @@
#include <asm/errata_list.h>
#include <asm/hwprobe.h>
#include <asm/io.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
#include <asm/vendorid_list.h>

static bool errata_probe_pbmt(unsigned int stage,
diff --git a/arch/riscv/include/asm/patch.h b/arch/riscv/include/asm/text-patching.h
similarity index 100%
rename from arch/riscv/include/asm/patch.h
rename to arch/riscv/include/asm/text-patching.h
diff --git a/arch/riscv/include/asm/uprobes.h b/arch/riscv/include/asm/uprobes.h
index 3fc7deda9190..5008f76cdc27 100644
--- a/arch/riscv/include/asm/uprobes.h
+++ b/arch/riscv/include/asm/uprobes.h
@@ -4,7 +4,7 @@
#define _ASM_RISCV_UPROBES_H

#include <asm/probes.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
#include <asm/bug.h>

#define MAX_UINSN_BYTES 8
diff --git a/arch/riscv/kernel/alternative.c b/arch/riscv/kernel/alternative.c
index 0128b161bfda..7eb3cb1215c6 100644
--- a/arch/riscv/kernel/alternative.c
+++ b/arch/riscv/kernel/alternative.c
@@ -18,7 +18,7 @@
#include <asm/sbi.h>
#include <asm/csr.h>
#include <asm/insn.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>

struct cpu_manufacturer_info_t {
unsigned long vendor_id;
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 3ed2359eae35..4119429becad 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -20,7 +20,8 @@
#include <asm/cacheflush.h>
#include <asm/cpufeature.h>
#include <asm/hwcap.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
+#include <asm/hwprobe.h>
#include <asm/processor.h>
#include <asm/sbi.h>
#include <asm/vector.h>
diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index f5aa24d9e1c1..4b6d02cf2015 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -9,7 +9,7 @@
#include <linux/uaccess.h>
#include <linux/memory.h>
#include <asm/cacheflush.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>

#ifdef CONFIG_DYNAMIC_FTRACE
void ftrace_arch_code_modify_prepare(void) __acquires(&text_mutex)
diff --git a/arch/riscv/kernel/jump_label.c b/arch/riscv/kernel/jump_label.c
index e6694759dbd0..9acb0bf56637 100644
--- a/arch/riscv/kernel/jump_label.c
+++ b/arch/riscv/kernel/jump_label.c
@@ -9,7 +9,7 @@
#include <linux/memory.h>
#include <linux/mutex.h>
#include <asm/bug.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>

#define RISCV_INSN_NOP 0x00000013U
#define RISCV_INSN_JAL 0x0000006fU
diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
index 37e87fdcf6a0..65fa689872e3 100644
--- a/arch/riscv/kernel/patch.c
+++ b/arch/riscv/kernel/patch.c
@@ -13,7 +13,7 @@
#include <asm/cacheflush.h>
#include <asm/fixmap.h>
#include <asm/ftrace.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
#include <asm/sections.h>

struct patch_insn {
diff --git a/arch/riscv/kernel/probes/kprobes.c b/arch/riscv/kernel/probes/kprobes.c
index e64f2f3064eb..97a4393b11e1 100644
--- a/arch/riscv/kernel/probes/kprobes.c
+++ b/arch/riscv/kernel/probes/kprobes.c
@@ -11,7 +11,7 @@
#include <asm/sections.h>
#include <asm/cacheflush.h>
#include <asm/bug.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>

#include "decode-insn.h"

diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
index 1adf2f39ce59..0787e63bf9e4 100644
--- a/arch/riscv/net/bpf_jit_comp64.c
+++ b/arch/riscv/net/bpf_jit_comp64.c
@@ -10,7 +10,7 @@
#include <linux/filter.h>
#include <linux/memory.h>
#include <linux/stop_machine.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
#include <asm/cfi.h>
#include "bpf_jit.h"

diff --git a/arch/riscv/net/bpf_jit_core.c b/arch/riscv/net/bpf_jit_core.c
index e238fdbd5dbc..33bfdda5618b 100644
--- a/arch/riscv/net/bpf_jit_core.c
+++ b/arch/riscv/net/bpf_jit_core.c
@@ -9,7 +9,7 @@
#include <linux/bpf.h>
#include <linux/filter.h>
#include <linux/memory.h>
-#include <asm/patch.h>
+#include <asm/text-patching.h>
#include <asm/cfi.h>
#include "bpf_jit.h"

diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index fc44d9c88b41..4d3f10ed8275 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -3,3 +3,4 @@ generated-y += syscall_table.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
generic-y += parport.h
+generic-y += text-patching.h
diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild
index 43b0ae4c2c21..17ee8a273aa6 100644
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -4,3 +4,4 @@ generated-y += syscall_table_64.h
generic-y += agp.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += text-patching.h
diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c
index 7a9820797eae..c5ca89e62552 100644
--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -470,6 +470,11 @@ void *text_poke(void *addr, const void *opcode, size_t len)
return memcpy(addr, opcode, len);
}

+void *text_poke_copy(void *addr, const void *opcode, size_t len)
+{
+ return text_poke(addr, opcode, len);
+}
+
void text_poke_sync(void)
{
}
diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
index 345aafbc1964..a2424db8e628 100644
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -35,6 +35,7 @@ extern void *text_poke(void *addr, const void *opcode, size_t len);
extern void text_poke_sync(void);
extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len);
extern void *text_poke_copy(void *addr, const void *opcode, size_t len);
+#define text_poke_copy text_poke_copy
extern void *text_poke_copy_locked(void *addr, const void *opcode, size_t len, bool core_ok);
extern void *text_poke_set(void *addr, int c, size_t len);
extern int poke_int3_handler(struct pt_regs *regs);
diff --git a/arch/xtensa/include/asm/Kbuild b/arch/xtensa/include/asm/Kbuild
index fa07c686cbcc..cc5dba738389 100644
--- a/arch/xtensa/include/asm/Kbuild
+++ b/arch/xtensa/include/asm/Kbuild
@@ -8,3 +8,4 @@ generic-y += parport.h
generic-y += qrwlock.h
generic-y += qspinlock.h
generic-y += user.h
+generic-y += text-patching.h
diff --git a/include/asm-generic/text-patching.h b/include/asm-generic/text-patching.h
new file mode 100644
index 000000000000..2245c641b741
--- /dev/null
+++ b/include/asm-generic/text-patching.h
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_GENERIC_TEXT_PATCHING_H
+#define _ASM_GENERIC_TEXT_PATCHING_H
+
+#endif /* _ASM_GENERIC_TEXT_PATCHING_H */
diff --git a/include/linux/text-patching.h b/include/linux/text-patching.h
new file mode 100644
index 000000000000..ad5877ab0855
--- /dev/null
+++ b/include/linux/text-patching.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_TEXT_PATCHING_H
+#define _LINUX_TEXT_PATCHING_H
+
+#include <asm/text-patching.h>
+
+#ifndef text_poke_copy
+static inline void *text_poke_copy(void *dst, const void *src, size_t len)
+{
+ return memcpy(dst, src, len);
+}
+#define text_poke_copy text_poke_copy
+#endif
+
+#endif /* _LINUX_TEXT_PATCHING_H */
--
2.43.0


2024-04-11 17:37:04

by Mike Rapoport

Subject: [RFC PATCH 3/7] module: prepare to handle ROX allocations for text

From: "Mike Rapoport (IBM)" <[email protected]>

In order to support ROX allocations for module text, it is necessary to
handle modifications to the code, such as relocations and alternatives
patching, without write access to that memory.

One option is to use text patching, but this would make module loading
extremely slow and would expose executable code that is not yet fully
formed.

A better way is to have the memory allocated with ROX permissions contain
invalid instructions and to keep a writable, but not executable, copy of
the module text. The relocations and alternatives patching are done on
the writable copy using the addresses of the ROX memory.
Once the module is completely ready, the updated text is copied to the
ROX memory using text patching in one go and the writable copy is freed.

Add support for that to the module initialization code and provide the
necessary interfaces in execmem, used roughly as sketched below.
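
For example (a sketch; apply_fixups() is a hypothetical stand-in for
relocation and alternatives processing):

        if (execmem_is_rox(EXECMEM_MODULE_TEXT)) {
                /* edit the writable copy; mem->base stays ROX throughout */
                apply_fixups(mem->rw_copy, mem->size);
                if (!execmem_update_copy(mem->base, mem->rw_copy, mem->size))
                        return -ENOMEM;
        }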
---
include/linux/execmem.h | 23 +++++++++++++
include/linux/module.h | 11 ++++++
kernel/module/main.c | 70 ++++++++++++++++++++++++++++++++++----
kernel/module/strict_rwx.c | 3 ++
mm/execmem.c | 11 ++++++
5 files changed, 111 insertions(+), 7 deletions(-)

diff --git a/include/linux/execmem.h b/include/linux/execmem.h
index ffd0d12feef5..9d22999dbd7d 100644
--- a/include/linux/execmem.h
+++ b/include/linux/execmem.h
@@ -46,9 +46,11 @@ enum execmem_type {
/**
* enum execmem_range_flags - options for executable memory allocations
* @EXECMEM_KASAN_SHADOW: allocate kasan shadow
+ * @EXECMEM_ROX_CACHE: allocations should use a cache of large ROX pages
*/
enum execmem_range_flags {
EXECMEM_KASAN_SHADOW = (1 << 0),
+ EXECMEM_ROX_CACHE = (1 << 1),
};

/**
@@ -123,6 +125,27 @@ void *execmem_alloc(enum execmem_type type, size_t size);
*/
void execmem_free(void *ptr);

+/**
+ * execmem_update_copy - copy an update to executable memory
+ * @dst: destination address to update
+ * @src: source address containing the data
+ * @size: how many bytes of memory should be copied
+ *
+ * Copy @size bytes from @src to @dst using text poking if the memory at
+ * @dst is read-only.
+ *
+ * Return: a pointer to @dst or NULL on error
+ */
+void *execmem_update_copy(void *dst, const void *src, size_t size);
+
+/**
+ * execmem_is_rox - check if execmem is read-only
+ * @type: the execmem type to check
+ *
+ * Return: %true if the @type is read-only, %false if it's writable
+ */
+bool execmem_is_rox(enum execmem_type type);
+
#ifdef CONFIG_ARCH_WANTS_EXECMEM_EARLY
void execmem_early_init(void);
#else
diff --git a/include/linux/module.h b/include/linux/module.h
index 1153b0d99a80..3df3202680a2 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -361,6 +361,8 @@ enum mod_mem_type {

struct module_memory {
void *base;
+ void *rw_copy;
+ bool is_rox;
unsigned int size;

#ifdef CONFIG_MODULES_TREE_LOOKUP
@@ -368,6 +370,15 @@ struct module_memory {
#endif
};

+#ifdef CONFIG_MODULES
+unsigned long module_writable_offset(struct module *mod, void *loc);
+#else
+static inline unsigned long module_writable_offset(struct module *mod, void *loc)
+{
+ return 0;
+}
+#endif
+
#ifdef CONFIG_MODULES_TREE_LOOKUP
/* Only touch one cacheline for common rbtree-for-core-layout case. */
#define __module_memory_align ____cacheline_aligned
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 91e185607d4b..f83fbb9c95ee 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1188,6 +1188,21 @@ void __weak module_arch_freeing_init(struct module *mod)
{
}

+unsigned long module_writable_offset(struct module *mod, void *loc)
+{
+ if (!mod)
+ return 0;
+
+ for_class_mod_mem_type(type, text) {
+ struct module_memory *mem = &mod->mem[type];
+
+ if (loc >= mem->base && loc < mem->base + mem->size)
+ return (unsigned long)(mem->rw_copy - mem->base);
+ }
+
+ return 0;
+}
+
static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
{
unsigned int size = PAGE_ALIGN(mod->mem[type].size);
@@ -1205,6 +1220,23 @@ static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
if (!ptr)
return -ENOMEM;

+ mod->mem[type].base = ptr;
+
+ if (execmem_is_rox(execmem_type)) {
+ ptr = vzalloc(size);
+
+ if (!ptr) {
+ execmem_free(mod->mem[type].base);
+ return -ENOMEM;
+ }
+
+ mod->mem[type].rw_copy = ptr;
+ mod->mem[type].is_rox = true;
+ } else {
+ mod->mem[type].rw_copy = mod->mem[type].base;
+ memset(mod->mem[type].base, 0, size);
+ }
+
/*
* The pointer to these blocks of memory are stored on the module
* structure and we keep that around so long as the module is
@@ -1218,15 +1250,16 @@ static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
*/
kmemleak_not_leak(ptr);

- memset(ptr, 0, size);
- mod->mem[type].base = ptr;
-
return 0;
}

static void module_memory_free(struct module *mod, enum mod_mem_type type)
{
- void *ptr = mod->mem[type].base;
+ struct module_memory *mem = &mod->mem[type];
+ void *ptr = mem->base;
+
+ if (mem->is_rox)
+ vfree(mem->rw_copy);

execmem_free(ptr);
}
@@ -2237,6 +2270,7 @@ static int move_module(struct module *mod, struct load_info *info)
for_each_mod_mem_type(type) {
if (!mod->mem[type].size) {
mod->mem[type].base = NULL;
+ mod->mem[type].rw_copy = NULL;
continue;
}

@@ -2253,11 +2287,14 @@ static int move_module(struct module *mod, struct load_info *info)
void *dest;
Elf_Shdr *shdr = &info->sechdrs[i];
enum mod_mem_type type = shdr->sh_entsize >> SH_ENTSIZE_TYPE_SHIFT;
+ unsigned long offset = shdr->sh_entsize & SH_ENTSIZE_OFFSET_MASK;
+ unsigned long addr;

if (!(shdr->sh_flags & SHF_ALLOC))
continue;

- dest = mod->mem[type].base + (shdr->sh_entsize & SH_ENTSIZE_OFFSET_MASK);
+ addr = (unsigned long)mod->mem[type].base + offset;
+ dest = mod->mem[type].rw_copy + offset;

if (shdr->sh_type != SHT_NOBITS) {
/*
@@ -2279,7 +2316,7 @@ static int move_module(struct module *mod, struct load_info *info)
* users of info can keep taking advantage and using the newly
* minted official memory area.
*/
- shdr->sh_addr = (unsigned long)dest;
+ shdr->sh_addr = addr;
pr_debug("\t0x%lx 0x%.8lx %s\n", (long)shdr->sh_addr,
(long)shdr->sh_size, info->secstrings + shdr->sh_name);
}
@@ -2429,6 +2466,8 @@ int __weak module_finalize(const Elf_Ehdr *hdr,

static int post_relocation(struct module *mod, const struct load_info *info)
{
+ int ret;
+
/* Sort exception table now relocations are done. */
sort_extable(mod->extable, mod->extable + mod->num_exentries);

@@ -2440,7 +2479,24 @@ static int post_relocation(struct module *mod, const struct load_info *info)
add_kallsyms(mod, info);

/* Arch-specific module finalizing. */
- return module_finalize(info->hdr, info->sechdrs, mod);
+ ret = module_finalize(info->hdr, info->sechdrs, mod);
+ if (ret)
+ return ret;
+
+ for_each_mod_mem_type(type) {
+ struct module_memory *mem = &mod->mem[type];
+
+ if (mem->is_rox) {
+ if (!execmem_update_copy(mem->base, mem->rw_copy,
+ mem->size))
+ return -ENOMEM;
+
+ vfree(mem->rw_copy);
+ mem->rw_copy = NULL;
+ }
+ }
+
+ return 0;
}

/* Call module constructors. */
diff --git a/kernel/module/strict_rwx.c b/kernel/module/strict_rwx.c
index c45caa4690e5..239e5013359d 100644
--- a/kernel/module/strict_rwx.c
+++ b/kernel/module/strict_rwx.c
@@ -34,6 +34,9 @@ int module_enable_text_rox(const struct module *mod)
for_class_mod_mem_type(type, text) {
int ret;

+ if (mod->mem[type].is_rox)
+ continue;
+
if (IS_ENABLED(CONFIG_STRICT_MODULE_RWX))
ret = module_set_memory(mod, type, set_memory_rox);
else
diff --git a/mm/execmem.c b/mm/execmem.c
index aabc0afabdbc..c920d2b5a721 100644
--- a/mm/execmem.c
+++ b/mm/execmem.c
@@ -4,6 +4,7 @@
#include <linux/vmalloc.h>
#include <linux/execmem.h>
#include <linux/moduleloader.h>
+#include <linux/text-patching.h>

static struct execmem_info *execmem_info __ro_after_init;
static struct execmem_info default_execmem_info __ro_after_init;
@@ -63,6 +64,16 @@ void execmem_free(void *ptr)
vfree(ptr);
}

+void *execmem_update_copy(void *dst, const void *src, size_t size)
+{
+ return text_poke_copy(dst, src, size);
+}
+
+bool execmem_is_rox(enum execmem_type type)
+{
+ return !!(execmem_info->ranges[type].flags & EXECMEM_ROX_CACHE);
+}
+
static bool execmem_validate(struct execmem_info *info)
{
struct execmem_range *r = &info->ranges[EXECMEM_DEFAULT];
--
2.43.0


2024-04-11 17:37:19

by Mike Rapoport

[permalink] [raw]
Subject: [RFC PATCH 4/7] ftrace: Add swap_func to ftrace_process_locs()

From: Song Liu <[email protected]>

ftrace_process_locs() sorts the module mcount locations, which reside in
RO memory. Add ftrace_swap_func() so that architectures can use an
RO-memory-poke function to do the sorting.

Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Mike Rapoport (IBM) <[email protected]>
---
include/linux/ftrace.h | 2 ++
kernel/trace/ftrace.c | 13 ++++++++++++-
2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 54d53f345d14..54393ce57f08 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1172,4 +1172,6 @@ unsigned long arch_syscall_addr(int nr);

#endif /* CONFIG_FTRACE_SYSCALLS */

+void ftrace_swap_func(void *a, void *b, int n);
+
#endif /* _LINUX_FTRACE_H */
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index da1710499698..95f11c8cdc5e 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6480,6 +6480,17 @@ static void test_is_sorted(unsigned long *start, unsigned long count)
}
#endif

+void __weak ftrace_swap_func(void *a, void *b, int n)
+{
+ unsigned long t;
+
+ WARN_ON_ONCE(n != sizeof(t));
+
+ t = *((unsigned long *)a);
+ *(unsigned long *)a = *(unsigned long *)b;
+ *(unsigned long *)b = t;
+}
+
static int ftrace_process_locs(struct module *mod,
unsigned long *start,
unsigned long *end)
@@ -6507,7 +6518,7 @@ static int ftrace_process_locs(struct module *mod,
*/
if (!IS_ENABLED(CONFIG_BUILDTIME_MCOUNT_SORT) || mod) {
sort(start, count, sizeof(*start),
- ftrace_cmp_ips, NULL);
+ ftrace_cmp_ips, ftrace_swap_func);
} else {
test_is_sorted(start, count);
}
--
2.43.0


2024-04-11 17:37:35

by Mike Rapoport

[permalink] [raw]
Subject: [RFC PATCH 5/7] x86/module: prepare module loading for ROX allocations of text

From: "Mike Rapoport (IBM)" <[email protected]>

When module text memory is allocated with ROX permissions, the memory at
the actual address where the module will live contains invalid
instructions, while a separate writable copy holds the actual module
code.

Update the relocation and alternatives patching code to deal with this
split.
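
For illustration, the pattern applied throughout the patching code looks
roughly like this (a sketch; module_writable_offset() is the helper added
earlier in the series and is assumed to return 0 when there is no
writable copy, insn_buff/insn_len stand in for the computed patch):

	s32 *s;

	for (s = start; s < end; s++) {
		void *addr = (void *)s + *s;
		/* writable alias; equals addr when there is no RW copy */
		void *wr_addr = addr + module_writable_offset(mod, addr);

		/* decode and compute the patch relative to addr,
		 * then write the new bytes via the writable alias */
		text_poke_early(wr_addr, insn_buff, insn_len);
	}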

Signed-off-by: Mike Rapoport (IBM) <[email protected]>
---
arch/um/kernel/um_arch.c | 11 ++-
arch/x86/entry/vdso/vma.c | 3 +-
arch/x86/include/asm/alternative.h | 14 +--
arch/x86/kernel/alternative.c | 152 +++++++++++++++++------------
arch/x86/kernel/ftrace.c | 41 +++++---
arch/x86/kernel/module.c | 17 ++--
6 files changed, 140 insertions(+), 98 deletions(-)

diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c
index c5ca89e62552..5183c955974e 100644
--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -437,24 +437,25 @@ void __init arch_cpu_finalize_init(void)
os_check_bugs();
}

-void apply_seal_endbr(s32 *start, s32 *end)
+void apply_seal_endbr(s32 *start, s32 *end, struct module *mod)
{
}

-void apply_retpolines(s32 *start, s32 *end)
+void apply_retpolines(s32 *start, s32 *end, struct module *mod)
{
}

-void apply_returns(s32 *start, s32 *end)
+void apply_returns(s32 *start, s32 *end, struct module *mod)
{
}

void apply_fineibt(s32 *start_retpoline, s32 *end_retpoline,
- s32 *start_cfi, s32 *end_cfi)
+ s32 *start_cfi, s32 *end_cfi, struct module *mod)
{
}

-void apply_alternatives(struct alt_instr *start, struct alt_instr *end)
+void apply_alternatives(struct alt_instr *start, struct alt_instr *end,
+ struct module *mod)
{
}

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 6d83ceb7f1ba..31412adef5d2 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -51,7 +51,8 @@ int __init init_vdso_image(const struct vdso_image *image)

apply_alternatives((struct alt_instr *)(image->data + image->alt),
(struct alt_instr *)(image->data + image->alt +
- image->alt_len));
+ image->alt_len),
+ NULL);

return 0;
}
diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index fcd20c6dc7f9..6f4b0776fc89 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -96,16 +96,16 @@ extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
* instructions were patched in already:
*/
extern int alternatives_patched;
+struct module;

extern void alternative_instructions(void);
-extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end);
-extern void apply_retpolines(s32 *start, s32 *end);
-extern void apply_returns(s32 *start, s32 *end);
-extern void apply_seal_endbr(s32 *start, s32 *end);
+extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end,
+ struct module *mod);
+extern void apply_retpolines(s32 *start, s32 *end, struct module *mod);
+extern void apply_returns(s32 *start, s32 *end, struct module *mod);
+extern void apply_seal_endbr(s32 *start, s32 *end, struct module *mod);
extern void apply_fineibt(s32 *start_retpoline, s32 *end_retpoine,
- s32 *start_cfi, s32 *end_cfi);
-
-struct module;
+ s32 *start_cfi, s32 *end_cfi, struct module *mod);

struct callthunk_sites {
s32 *call_start, *call_end;
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 45a280f2161c..b4d6868df573 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -462,7 +462,8 @@ static int alt_replace_call(u8 *instr, u8 *insn_buff, struct alt_instr *a)
* to refetch changed I$ lines.
*/
void __init_or_module noinline apply_alternatives(struct alt_instr *start,
- struct alt_instr *end)
+ struct alt_instr *end,
+ struct module *mod)
{
struct alt_instr *a;
u8 *instr, *replacement;
@@ -490,10 +491,18 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
* order.
*/
for (a = start; a < end; a++) {
+ unsigned long ins_offs, repl_offs;
int insn_buff_sz = 0;
+ u8 *wr_instr, *wr_replacement;

instr = (u8 *)&a->instr_offset + a->instr_offset;
+ ins_offs = module_writable_offset(mod, instr);
+ wr_instr = instr + ins_offs;
+
replacement = (u8 *)&a->repl_offset + a->repl_offset;
+ repl_offs = module_writable_offset(mod, replacement);
+ wr_replacement = replacement + repl_offs;
+
BUG_ON(a->instrlen > sizeof(insn_buff));
BUG_ON(a->cpuid >= (NCAPINTS + NBUGINTS) * 32);

@@ -504,17 +513,17 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
* patch if feature is *NOT* present.
*/
if (!boot_cpu_has(a->cpuid) == !(a->flags & ALT_FLAG_NOT)) {
- optimize_nops_inplace(instr, a->instrlen);
+ optimize_nops_inplace(wr_instr, a->instrlen);
continue;
}

- DPRINTK(ALT, "feat: %d*32+%d, old: (%pS (%px) len: %d), repl: (%px, len: %d) flags: 0x%x",
+ DPRINTK(ALT, "feat: %d*32+%d, old: (%px (%px) len: %d), repl: (%px (%px), len: %d) flags: 0x%x",
a->cpuid >> 5,
a->cpuid & 0x1f,
- instr, instr, a->instrlen,
- replacement, a->replacementlen, a->flags);
+ instr, wr_instr, a->instrlen,
+ replacement, wr_replacement, a->replacementlen, a->flags);

- memcpy(insn_buff, replacement, a->replacementlen);
+ memcpy(insn_buff, wr_replacement, a->replacementlen);
insn_buff_sz = a->replacementlen;

if (a->flags & ALT_FLAG_DIRECT_CALL) {
@@ -528,11 +537,11 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,

apply_relocation(insn_buff, a->instrlen, instr, replacement, a->replacementlen);

- DUMP_BYTES(ALT, instr, a->instrlen, "%px: old_insn: ", instr);
+ DUMP_BYTES(ALT, wr_instr, a->instrlen, "%px: old_insn: ", instr);
DUMP_BYTES(ALT, replacement, a->replacementlen, "%px: rpl_insn: ", replacement);
DUMP_BYTES(ALT, insn_buff, insn_buff_sz, "%px: final_insn: ", instr);

- text_poke_early(instr, insn_buff, insn_buff_sz);
+ text_poke_early(wr_instr, insn_buff, insn_buff_sz);
}

kasan_enable_current();
@@ -723,18 +732,20 @@ static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
/*
* Generated by 'objtool --retpoline'.
*/
-void __init_or_module noinline apply_retpolines(s32 *start, s32 *end)
+void __init_or_module noinline apply_retpolines(s32 *start, s32 *end,
+ struct module *mod)
{
s32 *s;

for (s = start; s < end; s++) {
void *addr = (void *)s + *s;
+ void *wr_addr = addr + module_writable_offset(mod, addr);
struct insn insn;
int len, ret;
u8 bytes[16];
u8 op1, op2;

- ret = insn_decode_kernel(&insn, addr);
+ ret = insn_decode_kernel(&insn, wr_addr);
if (WARN_ON_ONCE(ret < 0))
continue;

@@ -762,9 +773,9 @@ void __init_or_module noinline apply_retpolines(s32 *start, s32 *end)
len = patch_retpoline(addr, &insn, bytes);
if (len == insn.length) {
optimize_nops(bytes, len);
- DUMP_BYTES(RETPOLINE, ((u8*)addr), len, "%px: orig: ", addr);
+ DUMP_BYTES(RETPOLINE, ((u8*)wr_addr), len, "%px: orig: ", addr);
DUMP_BYTES(RETPOLINE, ((u8*)bytes), len, "%px: repl: ", addr);
- text_poke_early(addr, bytes, len);
+ text_poke_early(wr_addr, bytes, len);
}
}
}
@@ -800,7 +811,8 @@ static int patch_return(void *addr, struct insn *insn, u8 *bytes)
return i;
}

-void __init_or_module noinline apply_returns(s32 *start, s32 *end)
+void __init_or_module noinline apply_returns(s32 *start, s32 *end,
+ struct module *mod)
{
s32 *s;

@@ -809,12 +821,13 @@ void __init_or_module noinline apply_returns(s32 *start, s32 *end)

for (s = start; s < end; s++) {
void *dest = NULL, *addr = (void *)s + *s;
+ void *wr_addr = addr + module_writable_offset(mod, addr);
struct insn insn;
int len, ret;
u8 bytes[16];
u8 op;

- ret = insn_decode_kernel(&insn, addr);
+ ret = insn_decode_kernel(&insn, wr_addr);
if (WARN_ON_ONCE(ret < 0))
continue;

@@ -834,28 +847,31 @@ void __init_or_module noinline apply_returns(s32 *start, s32 *end)

len = patch_return(addr, &insn, bytes);
if (len == insn.length) {
- DUMP_BYTES(RET, ((u8*)addr), len, "%px: orig: ", addr);
+ DUMP_BYTES(RET, ((u8*)wr_addr), len, "%px: orig: ", addr);
DUMP_BYTES(RET, ((u8*)bytes), len, "%px: repl: ", addr);
- text_poke_early(addr, bytes, len);
+ text_poke_early(wr_addr, bytes, len);
}
}
}
#else
-void __init_or_module noinline apply_returns(s32 *start, s32 *end) { }
+void __init_or_module noinline apply_returns(s32 *start, s32 *end,
+ struct module *mod) { }
#endif /* CONFIG_MITIGATION_RETHUNK */

#else /* !CONFIG_MITIGATION_RETPOLINE || !CONFIG_OBJTOOL */

-void __init_or_module noinline apply_retpolines(s32 *start, s32 *end) { }
-void __init_or_module noinline apply_returns(s32 *start, s32 *end) { }
+void __init_or_module noinline apply_retpolines(s32 *start, s32 *end,
+ struct module *mod) { }
+void __init_or_module noinline apply_returns(s32 *start, s32 *end,
+ struct module *mod) { }

#endif /* CONFIG_MITIGATION_RETPOLINE && CONFIG_OBJTOOL */

#ifdef CONFIG_X86_KERNEL_IBT

-static void poison_cfi(void *addr);
+static void poison_cfi(void *addr, void *wr_addr);

-static void __init_or_module poison_endbr(void *addr, bool warn)
+static void __init_or_module poison_endbr(void *addr, void *wr_addr, bool warn)
{
u32 endbr, poison = gen_endbr_poison();

@@ -874,7 +890,7 @@ static void __init_or_module poison_endbr(void *addr, bool warn)
*/
DUMP_BYTES(ENDBR, ((u8*)addr), 4, "%px: orig: ", addr);
DUMP_BYTES(ENDBR, ((u8*)&poison), 4, "%px: repl: ", addr);
- text_poke_early(addr, &poison, 4);
+ text_poke_early(wr_addr, &poison, 4);
}

/*
@@ -883,22 +899,23 @@ static void __init_or_module poison_endbr(void *addr, bool warn)
* Seal the functions for indirect calls by clobbering the ENDBR instructions
* and the kCFI hash value.
*/
-void __init_or_module noinline apply_seal_endbr(s32 *start, s32 *end)
+void __init_or_module noinline apply_seal_endbr(s32 *start, s32 *end, struct module *mod)
{
s32 *s;

for (s = start; s < end; s++) {
void *addr = (void *)s + *s;
+ void *wr_addr = addr + module_writable_offset(mod, addr);

- poison_endbr(addr, true);
+ poison_endbr(addr, wr_addr, true);
if (IS_ENABLED(CONFIG_FINEIBT))
- poison_cfi(addr - 16);
+ poison_cfi(addr - 16, wr_addr - 16);
}
}

#else

-void __init_or_module apply_seal_endbr(s32 *start, s32 *end) { }
+void __init_or_module apply_seal_endbr(s32 *start, s32 *end, struct module *mod) { }

#endif /* CONFIG_X86_KERNEL_IBT */

@@ -1120,7 +1137,7 @@ static u32 decode_caller_hash(void *addr)
}

/* .retpoline_sites */
-static int cfi_disable_callers(s32 *start, s32 *end)
+static int cfi_disable_callers(s32 *start, s32 *end, struct module *mod)
{
/*
* Disable kCFI by patching in a JMP.d8, this leaves the hash immediate
@@ -1132,6 +1149,7 @@ static int cfi_disable_callers(s32 *start, s32 *end)

for (s = start; s < end; s++) {
void *addr = (void *)s + *s;
+ void *wr_addr = addr + module_writable_offset(mod, addr);
u32 hash;

addr -= fineibt_caller_size;
@@ -1139,13 +1157,13 @@ static int cfi_disable_callers(s32 *start, s32 *end)
if (!hash) /* nocfi callers */
continue;

- text_poke_early(addr, jmp, 2);
+ text_poke_early(wr_addr, jmp, 2);
}

return 0;
}

-static int cfi_enable_callers(s32 *start, s32 *end)
+static int cfi_enable_callers(s32 *start, s32 *end, struct module *mod)
{
/*
* Re-enable kCFI, undo what cfi_disable_callers() did.
@@ -1155,6 +1173,7 @@ static int cfi_enable_callers(s32 *start, s32 *end)

for (s = start; s < end; s++) {
void *addr = (void *)s + *s;
+ void *wr_addr = addr + module_writable_offset(mod, addr);
u32 hash;

addr -= fineibt_caller_size;
@@ -1162,19 +1181,20 @@ static int cfi_enable_callers(s32 *start, s32 *end)
if (!hash) /* nocfi callers */
continue;

- text_poke_early(addr, mov, 2);
+ text_poke_early(wr_addr, mov, 2);
}

return 0;
}

/* .cfi_sites */
-static int cfi_rand_preamble(s32 *start, s32 *end)
+static int cfi_rand_preamble(s32 *start, s32 *end, struct module *mod)
{
s32 *s;

for (s = start; s < end; s++) {
void *addr = (void *)s + *s;
+ void *wr_addr = addr + module_writable_offset(mod, addr);
u32 hash;

hash = decode_preamble_hash(addr);
@@ -1183,18 +1203,19 @@ static int cfi_rand_preamble(s32 *start, s32 *end)
return -EINVAL;

hash = cfi_rehash(hash);
- text_poke_early(addr + 1, &hash, 4);
+ text_poke_early(wr_addr + 1, &hash, 4);
}

return 0;
}

-static int cfi_rewrite_preamble(s32 *start, s32 *end)
+static int cfi_rewrite_preamble(s32 *start, s32 *end, struct module *mod)
{
s32 *s;

for (s = start; s < end; s++) {
void *addr = (void *)s + *s;
+ void *wr_addr = addr + module_writable_offset(mod, addr);
u32 hash;

hash = decode_preamble_hash(addr);
@@ -1202,59 +1223,62 @@ static int cfi_rewrite_preamble(s32 *start, s32 *end)
addr, addr, 5, addr))
return -EINVAL;

- text_poke_early(addr, fineibt_preamble_start, fineibt_preamble_size);
+ text_poke_early(wr_addr, fineibt_preamble_start, fineibt_preamble_size);
WARN_ON(*(u32 *)(addr + fineibt_preamble_hash) != 0x12345678);
- text_poke_early(addr + fineibt_preamble_hash, &hash, 4);
+ text_poke_early(wr_addr + fineibt_preamble_hash, &hash, 4);
}

return 0;
}

-static void cfi_rewrite_endbr(s32 *start, s32 *end)
+static void cfi_rewrite_endbr(s32 *start, s32 *end, struct module *mod)
{
s32 *s;

for (s = start; s < end; s++) {
void *addr = (void *)s + *s;
+ void *wr_addr = addr + module_writable_offset(mod, addr);

- poison_endbr(addr+16, false);
+ poison_endbr(addr+16, wr_addr, false);
}
}

/* .retpoline_sites */
-static int cfi_rand_callers(s32 *start, s32 *end)
+static int cfi_rand_callers(s32 *start, s32 *end, struct module *mod)
{
s32 *s;

for (s = start; s < end; s++) {
void *addr = (void *)s + *s;
+ void *wr_addr = addr + module_writable_offset(mod, addr);
u32 hash;

addr -= fineibt_caller_size;
hash = decode_caller_hash(addr);
if (hash) {
hash = -cfi_rehash(hash);
- text_poke_early(addr + 2, &hash, 4);
+ text_poke_early(wr_addr + 2, &hash, 4);
}
}

return 0;
}

-static int cfi_rewrite_callers(s32 *start, s32 *end)
+static int cfi_rewrite_callers(s32 *start, s32 *end, struct module *mod)
{
s32 *s;

for (s = start; s < end; s++) {
void *addr = (void *)s + *s;
+ void *wr_addr = addr + module_writable_offset(mod, addr);
u32 hash;

addr -= fineibt_caller_size;
hash = decode_caller_hash(addr);
if (hash) {
- text_poke_early(addr, fineibt_caller_start, fineibt_caller_size);
+ text_poke_early(wr_addr, fineibt_caller_start, fineibt_caller_size);
WARN_ON(*(u32 *)(addr + fineibt_caller_hash) != 0x12345678);
- text_poke_early(addr + fineibt_caller_hash, &hash, 4);
+ text_poke_early(wr_addr + fineibt_caller_hash, &hash, 4);
}
/* rely on apply_retpolines() */
}
@@ -1263,8 +1287,9 @@ static int cfi_rewrite_callers(s32 *start, s32 *end)
}

static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline,
- s32 *start_cfi, s32 *end_cfi, bool builtin)
+ s32 *start_cfi, s32 *end_cfi, struct module *mod)
{
+	bool builtin = !mod;
int ret;

if (WARN_ONCE(fineibt_preamble_size != 16,
@@ -1282,7 +1307,7 @@ static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline,
* rewrite them. This disables all CFI. If this succeeds but any of the
* later stages fails, we're without CFI.
*/
- ret = cfi_disable_callers(start_retpoline, end_retpoline);
+ ret = cfi_disable_callers(start_retpoline, end_retpoline, mod);
if (ret)
goto err;

@@ -1293,11 +1318,11 @@ static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline,
cfi_bpf_subprog_hash = cfi_rehash(cfi_bpf_subprog_hash);
}

- ret = cfi_rand_preamble(start_cfi, end_cfi);
+ ret = cfi_rand_preamble(start_cfi, end_cfi, mod);
if (ret)
goto err;

- ret = cfi_rand_callers(start_retpoline, end_retpoline);
+ ret = cfi_rand_callers(start_retpoline, end_retpoline, mod);
if (ret)
goto err;
}
@@ -1309,7 +1334,7 @@ static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline,
return;

case CFI_KCFI:
- ret = cfi_enable_callers(start_retpoline, end_retpoline);
+ ret = cfi_enable_callers(start_retpoline, end_retpoline, mod);
if (ret)
goto err;

@@ -1319,17 +1344,17 @@ static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline,

case CFI_FINEIBT:
/* place the FineIBT preamble at func()-16 */
- ret = cfi_rewrite_preamble(start_cfi, end_cfi);
+ ret = cfi_rewrite_preamble(start_cfi, end_cfi, mod);
if (ret)
goto err;

/* rewrite the callers to target func()-16 */
- ret = cfi_rewrite_callers(start_retpoline, end_retpoline);
+ ret = cfi_rewrite_callers(start_retpoline, end_retpoline, mod);
if (ret)
goto err;

/* now that nobody targets func()+0, remove ENDBR there */
- cfi_rewrite_endbr(start_cfi, end_cfi);
+ cfi_rewrite_endbr(start_cfi, end_cfi, mod);

if (builtin)
pr_info("Using FineIBT CFI\n");
@@ -1348,7 +1373,7 @@ static inline void poison_hash(void *addr)
*(u32 *)addr = 0;
}

-static void poison_cfi(void *addr)
+static void poison_cfi(void *addr, void *wr_addr)
{
switch (cfi_mode) {
case CFI_FINEIBT:
@@ -1360,8 +1385,8 @@ static void poison_cfi(void *addr)
* ud2
* 1: nop
*/
- poison_endbr(addr, false);
- poison_hash(addr + fineibt_preamble_hash);
+ poison_endbr(addr, wr_addr, false);
+ poison_hash(wr_addr + fineibt_preamble_hash);
break;

case CFI_KCFI:
@@ -1370,7 +1395,7 @@ static void poison_cfi(void *addr)
* movl $0, %eax
* .skip 11, 0x90
*/
- poison_hash(addr + 1);
+ poison_hash(wr_addr + 1);
break;

default:
@@ -1381,22 +1406,21 @@ static void poison_cfi(void *addr)
#else

static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline,
- s32 *start_cfi, s32 *end_cfi, bool builtin)
+ s32 *start_cfi, s32 *end_cfi, struct module *mod)
{
}

#ifdef CONFIG_X86_KERNEL_IBT
-static void poison_cfi(void *addr) { }
+static void poison_cfi(void *addr, void *wr_addr) { }
#endif

#endif

void apply_fineibt(s32 *start_retpoline, s32 *end_retpoline,
- s32 *start_cfi, s32 *end_cfi)
+ s32 *start_cfi, s32 *end_cfi, struct module *mod)
{
return __apply_fineibt(start_retpoline, end_retpoline,
- start_cfi, end_cfi,
- /* .builtin = */ false);
+ start_cfi, end_cfi, mod);
}

#ifdef CONFIG_SMP
@@ -1693,16 +1717,16 @@ void __init alternative_instructions(void)
paravirt_set_cap();

__apply_fineibt(__retpoline_sites, __retpoline_sites_end,
- __cfi_sites, __cfi_sites_end, true);
+ __cfi_sites, __cfi_sites_end, NULL);

/*
* Rewrite the retpolines, must be done before alternatives since
* those can rewrite the retpoline thunks.
*/
- apply_retpolines(__retpoline_sites, __retpoline_sites_end);
- apply_returns(__return_sites, __return_sites_end);
+ apply_retpolines(__retpoline_sites, __retpoline_sites_end, NULL);
+ apply_returns(__return_sites, __return_sites_end, NULL);

- apply_alternatives(__alt_instructions, __alt_instructions_end);
+ apply_alternatives(__alt_instructions, __alt_instructions_end, NULL);

/*
* Now all calls are established. Apply the call thunks if
@@ -1713,7 +1737,7 @@ void __init alternative_instructions(void)
/*
* Seal all functions that do not have their address taken.
*/
- apply_seal_endbr(__ibt_endbr_seal, __ibt_endbr_seal_end);
+ apply_seal_endbr(__ibt_endbr_seal, __ibt_endbr_seal_end, NULL);

#ifdef CONFIG_SMP
/* Patch to UP if other cpus not imminent. */
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 8da0e66ca22d..563d9a890ce2 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -118,10 +118,13 @@ ftrace_modify_code_direct(unsigned long ip, const char *old_code,
return ret;

/* replace the text with the new text */
- if (ftrace_poke_late)
+ if (ftrace_poke_late) {
text_poke_queue((void *)ip, new_code, MCOUNT_INSN_SIZE, NULL);
- else
- text_poke_early((void *)ip, new_code, MCOUNT_INSN_SIZE);
+ } else {
+ mutex_lock(&text_mutex);
+ text_poke((void *)ip, new_code, MCOUNT_INSN_SIZE);
+ mutex_unlock(&text_mutex);
+ }
return 0;
}

@@ -318,7 +321,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
unsigned const char op_ref[] = { 0x48, 0x8b, 0x15 };
unsigned const char retq[] = { RET_INSN_OPCODE, INT3_INSN_OPCODE };
union ftrace_op_code_union op_ptr;
- int ret;
+ void *ret;

if (ops->flags & FTRACE_OPS_FL_SAVE_REGS) {
start_offset = (unsigned long)ftrace_regs_caller;
@@ -349,15 +352,15 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
npages = DIV_ROUND_UP(*tramp_size, PAGE_SIZE);

/* Copy ftrace_caller onto the trampoline memory */
- ret = copy_from_kernel_nofault(trampoline, (void *)start_offset, size);
- if (WARN_ON(ret < 0))
+ ret = text_poke_copy(trampoline, (void *)start_offset, size);
+ if (WARN_ON(!ret))
goto fail;

ip = trampoline + size;
if (cpu_feature_enabled(X86_FEATURE_RETHUNK))
__text_gen_insn(ip, JMP32_INSN_OPCODE, ip, x86_return_thunk, JMP32_INSN_SIZE);
else
- memcpy(ip, retq, sizeof(retq));
+ text_poke_copy(ip, retq, sizeof(retq));

/* No need to test direct calls on created trampolines */
if (ops->flags & FTRACE_OPS_FL_SAVE_REGS) {
@@ -365,8 +368,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
ip = trampoline + (jmp_offset - start_offset);
if (WARN_ON(*(char *)ip != 0x75))
goto fail;
- ret = copy_from_kernel_nofault(ip, x86_nops[2], 2);
- if (ret < 0)
+ if (!text_poke_copy(ip, x86_nops[2], 2))
goto fail;
}

@@ -379,7 +381,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
*/

ptr = (unsigned long *)(trampoline + size + RET_SIZE);
- *ptr = (unsigned long)ops;
+ text_poke_copy(ptr, &ops, sizeof(unsigned long));

op_offset -= start_offset;
memcpy(&op_ptr, trampoline + op_offset, OP_REF_SIZE);
@@ -395,7 +397,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
op_ptr.offset = offset;

/* put in the new offset to the ftrace_ops */
- memcpy(trampoline + op_offset, &op_ptr, OP_REF_SIZE);
+ text_poke_copy(trampoline + op_offset, &op_ptr, OP_REF_SIZE);

/* put in the call to the function */
mutex_lock(&text_mutex);
@@ -405,9 +407,9 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
* the depth accounting before the call already.
*/
dest = ftrace_ops_get_func(ops);
- memcpy(trampoline + call_offset,
- text_gen_insn(CALL_INSN_OPCODE, trampoline + call_offset, dest),
- CALL_INSN_SIZE);
+ text_poke_copy_locked(trampoline + call_offset,
+ text_gen_insn(CALL_INSN_OPCODE, trampoline + call_offset, dest),
+ CALL_INSN_SIZE, false);
mutex_unlock(&text_mutex);

/* ALLOC_TRAMP flags lets us know we created it */
@@ -654,4 +656,15 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
}
#endif

+void ftrace_swap_func(void *a, void *b, int n)
+{
+ unsigned long t;
+
+ WARN_ON_ONCE(n != sizeof(t));
+
+ t = *((unsigned long *)a);
+ text_poke_copy(a, b, sizeof(t));
+ text_poke_copy(b, &t, sizeof(t));
+}
+
#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 837450b6e882..30eed5228f44 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -146,18 +146,21 @@ static int __write_relocate_add(Elf64_Shdr *sechdrs,
}

if (apply) {
- if (memcmp(loc, &zero, size)) {
+ void *wr_loc = loc + module_writable_offset(me, loc);
+
+ if (memcmp(wr_loc, &zero, size)) {
pr_err("x86/modules: Invalid relocation target, existing value is nonzero for type %d, loc %p, val %Lx\n",
(int)ELF64_R_TYPE(rel[i].r_info), loc, val);
return -ENOEXEC;
}
- write(loc, &val, size);
+ write(wr_loc, &val, size);
} else {
if (memcmp(loc, &val, size)) {
pr_warn("x86/modules: Invalid relocation target, existing value does not match expected value for type %d, loc %p, val %Lx\n",
(int)ELF64_R_TYPE(rel[i].r_info), loc, val);
return -ENOEXEC;
}
+ /* FIXME: needs care for ROX module allocations */
write(loc, &zero, size);
}
}
@@ -265,20 +268,20 @@ int module_finalize(const Elf_Ehdr *hdr,
csize = cfi->sh_size;
}

- apply_fineibt(rseg, rseg + rsize, cseg, cseg + csize);
+ apply_fineibt(rseg, rseg + rsize, cseg, cseg + csize, me);
}
if (retpolines) {
void *rseg = (void *)retpolines->sh_addr;
- apply_retpolines(rseg, rseg + retpolines->sh_size);
+ apply_retpolines(rseg, rseg + retpolines->sh_size, me);
}
if (returns) {
void *rseg = (void *)returns->sh_addr;
- apply_returns(rseg, rseg + returns->sh_size);
+ apply_returns(rseg, rseg + returns->sh_size, me);
}
if (alt) {
/* patch .altinstructions */
void *aseg = (void *)alt->sh_addr;
- apply_alternatives(aseg, aseg + alt->sh_size);
+ apply_alternatives(aseg, aseg + alt->sh_size, me);
}
if (calls || alt) {
struct callthunk_sites cs = {};
@@ -297,7 +300,7 @@ int module_finalize(const Elf_Ehdr *hdr,
}
if (ibt_endbr) {
void *iseg = (void *)ibt_endbr->sh_addr;
- apply_seal_endbr(iseg, iseg + ibt_endbr->sh_size);
+ apply_seal_endbr(iseg, iseg + ibt_endbr->sh_size, me);
}
if (locks) {
void *lseg = (void *)locks->sh_addr;
--
2.43.0


2024-04-11 17:37:52

by Mike Rapoport

[permalink] [raw]
Subject: [RFC PATCH 6/7] execmem: add support for cache of large ROX pages

From: "Mike Rapoport (IBM)" <[email protected]>

Using large pages to map text areas reduces iTLB pressure and improves
performance.

Extend execmem_alloc() with the ability to use PMD_SIZE'ed pages with ROX
permissions as a cache for smaller allocations.

To populate the cache, a writable large page is allocated from vmalloc with
VM_ALLOW_HUGE_VMAP, filled with invalid instructions and then remapped as
ROX.

Portions of that large page are handed out to execmem_alloc() callers
without any changes to the permissions.

When the memory is freed with execmem_free() it is invalidated again so
that it won't contain stale instructions.

The cache is enabled when an architecture sets the EXECMEM_ROX_CACHE flag
in the definition of an execmem_range.
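
For example, an architecture opts in with something like this in its
execmem ranges (a sketch mirroring the x86 patch later in the series; the
range's pgprot is set to PAGE_KERNEL_ROX at runtime):

	[EXECMEM_MODULE_TEXT] = {
		.flags = EXECMEM_KASAN_SHADOW | EXECMEM_ROX_CACHE,
		.alignment = MODULE_ALIGN,
	},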

Signed-off-by: Mike Rapoport (IBM) <[email protected]>
---
include/linux/execmem.h | 2 +
mm/execmem.c | 267 ++++++++++++++++++++++++++++++++++++++--
2 files changed, 262 insertions(+), 7 deletions(-)

diff --git a/include/linux/execmem.h b/include/linux/execmem.h
index 9d22999dbd7d..06f678e6fe55 100644
--- a/include/linux/execmem.h
+++ b/include/linux/execmem.h
@@ -77,12 +77,14 @@ struct execmem_range {

/**
* struct execmem_info - architecture parameters for code allocations
+ * @invalidate: set memory to contain invalid instructions
* @ranges: array of parameter sets defining architecture specific
* parameters for executable memory allocations. The ranges that are not
* explicitly initialized by an architecture use parameters defined for
* @EXECMEM_DEFAULT.
*/
struct execmem_info {
+ void (*invalidate)(void *ptr, size_t size, bool writable);
struct execmem_range ranges[EXECMEM_TYPE_MAX];
};

diff --git a/mm/execmem.c b/mm/execmem.c
index c920d2b5a721..716fba68ab0e 100644
--- a/mm/execmem.c
+++ b/mm/execmem.c
@@ -1,30 +1,88 @@
// SPDX-License-Identifier: GPL-2.0

#include <linux/mm.h>
+#include <linux/mutex.h>
#include <linux/vmalloc.h>
#include <linux/execmem.h>
+#include <linux/maple_tree.h>
#include <linux/moduleloader.h>
#include <linux/text-patching.h>

+#include <asm/tlbflush.h>
+
+#include "internal.h"
+
static struct execmem_info *execmem_info __ro_after_init;
static struct execmem_info default_execmem_info __ro_after_init;

-static void *__execmem_alloc(struct execmem_range *range, size_t size)
+struct execmem_cache {
+ struct mutex mutex;
+ struct maple_tree busy_areas;
+ struct maple_tree free_areas;
+};
+
+static struct execmem_cache execmem_cache = {
+ .mutex = __MUTEX_INITIALIZER(execmem_cache.mutex),
+ .busy_areas = MTREE_INIT_EXT(busy_areas, MT_FLAGS_LOCK_EXTERN,
+ execmem_cache.mutex),
+ .free_areas = MTREE_INIT_EXT(free_areas, MT_FLAGS_LOCK_EXTERN,
+ execmem_cache.mutex),
+};
+
+static void execmem_cache_clean(struct work_struct *work)
+{
+ struct maple_tree *free_areas = &execmem_cache.free_areas;
+ struct mutex *mutex = &execmem_cache.mutex;
+ MA_STATE(mas, free_areas, 0, ULONG_MAX);
+ void *area;
+
+ mutex_lock(mutex);
+ mas_for_each(&mas, area, ULONG_MAX) {
+ size_t size;
+
+ if (!xa_is_value(area))
+ continue;
+
+ size = xa_to_value(area);
+
+ if (IS_ALIGNED(size, PMD_SIZE) && IS_ALIGNED(mas.index, PMD_SIZE)) {
+ void *ptr = (void *)mas.index;
+
+ mas_erase(&mas);
+ vfree(ptr);
+ }
+ }
+ mutex_unlock(mutex);
+}
+
+static DECLARE_WORK(execmem_cache_clean_work, execmem_cache_clean);
+
+static void execmem_invalidate(void *ptr, size_t size, bool writable)
+{
+ if (execmem_info->invalidate)
+ execmem_info->invalidate(ptr, size, writable);
+ else
+ memset(ptr, 0, size);
+}
+
+static void *execmem_vmalloc(struct execmem_range *range, size_t size,
+ pgprot_t pgprot, unsigned long vm_flags)
{
bool kasan = range->flags & EXECMEM_KASAN_SHADOW;
- unsigned long vm_flags = VM_FLUSH_RESET_PERMS;
gfp_t gfp_flags = GFP_KERNEL | __GFP_NOWARN;
+ unsigned int align = range->alignment;
unsigned long start = range->start;
unsigned long end = range->end;
- unsigned int align = range->alignment;
- pgprot_t pgprot = range->pgprot;
void *p;

if (kasan)
vm_flags |= VM_DEFER_KMEMLEAK;

- p = __vmalloc_node_range(size, align, start, end, gfp_flags,
- pgprot, vm_flags, NUMA_NO_NODE,
+ if (vm_flags & VM_ALLOW_HUGE_VMAP)
+ align = PMD_SIZE;
+
+ p = __vmalloc_node_range(size, align, start, end, gfp_flags, pgprot,
+ vm_flags, NUMA_NO_NODE,
__builtin_return_address(0));
if (!p && range->fallback_start) {
start = range->fallback_start;
@@ -44,6 +102,199 @@ static void *__execmem_alloc(struct execmem_range *range, size_t size)
return NULL;
}

+ return p;
+}
+
+static int execmem_cache_add(void *ptr, size_t size)
+{
+ struct maple_tree *free_areas = &execmem_cache.free_areas;
+ struct mutex *mutex = &execmem_cache.mutex;
+ unsigned long addr = (unsigned long)ptr;
+ MA_STATE(mas, free_areas, addr - 1, addr + 1);
+ unsigned long lower, lower_size = 0;
+ unsigned long upper, upper_size = 0;
+ unsigned long area_size;
+ void *area = NULL;
+ int err;
+
+ lower = addr;
+ upper = addr + size - 1;
+
+ mutex_lock(mutex);
+ area = mas_walk(&mas);
+ if (area && xa_is_value(area) && mas.last == addr - 1) {
+ lower = mas.index;
+ lower_size = xa_to_value(area);
+ }
+
+ area = mas_next(&mas, ULONG_MAX);
+ if (area && xa_is_value(area) && mas.index == addr + size) {
+ upper = mas.last;
+ upper_size = xa_to_value(area);
+ }
+
+ mas_set_range(&mas, lower, upper);
+ area_size = lower_size + upper_size + size;
+ err = mas_store_gfp(&mas, xa_mk_value(area_size), GFP_KERNEL);
+ mutex_unlock(mutex);
+ if (err)
+ return -ENOMEM;
+
+ return 0;
+}
+
+static void *__execmem_cache_alloc(size_t size)
+{
+ struct maple_tree *free_areas = &execmem_cache.free_areas;
+ struct maple_tree *busy_areas = &execmem_cache.busy_areas;
+ MA_STATE(mas_free, free_areas, 0, ULONG_MAX);
+ MA_STATE(mas_busy, busy_areas, 0, ULONG_MAX);
+ struct mutex *mutex = &execmem_cache.mutex;
+ unsigned long addr, last, area_size = 0;
+ void *area, *ptr = NULL;
+ int err;
+
+ mutex_lock(mutex);
+ mas_for_each(&mas_free, area, ULONG_MAX) {
+ area_size = xa_to_value(area);
+ if (area_size >= size)
+ break;
+ }
+
+ if (area_size < size)
+ goto out_unlock;
+
+ addr = mas_free.index;
+ last = mas_free.last;
+
+ /* insert allocated size to busy_areas at range [addr, addr + size) */
+ mas_set_range(&mas_busy, addr, addr + size - 1);
+ err = mas_store_gfp(&mas_busy, xa_mk_value(size), GFP_KERNEL);
+ if (err)
+ goto out_unlock;
+
+ mas_erase(&mas_free);
+ if (area_size > size) {
+ /*
+ * re-insert remaining free size to free_areas at range
+ * [addr + size, last]
+ */
+ mas_set_range(&mas_free, addr + size, last);
+ size = area_size - size;
+ err = mas_store_gfp(&mas_free, xa_mk_value(size), GFP_KERNEL);
+ if (err) {
+ mas_erase(&mas_busy);
+ goto out_unlock;
+ }
+ }
+ ptr = (void *)addr;
+
+out_unlock:
+ mutex_unlock(mutex);
+ return ptr;
+}
+
+static int execmem_cache_populate(struct execmem_range *range, size_t size)
+{
+ unsigned long vm_flags = VM_FLUSH_RESET_PERMS | VM_ALLOW_HUGE_VMAP;
+ unsigned long start, end;
+ struct vm_struct *vm;
+ size_t alloc_size;
+ int err = -ENOMEM;
+ void *p;
+
+ alloc_size = round_up(size, PMD_SIZE);
+ p = execmem_vmalloc(range, alloc_size, PAGE_KERNEL, vm_flags);
+ if (!p)
+ return err;
+
+ vm = find_vm_area(p);
+ if (!vm)
+ goto err_free_mem;
+
+ /* fill memory with invalid instructions */
+ execmem_invalidate(p, alloc_size, /* writable = */ true);
+
+ start = (unsigned long)p;
+ end = start + alloc_size;
+
+ vunmap_range_noflush(start, end);
+ flush_tlb_kernel_range(start, end);
+
+ /* FIXME: handle direct map alias */
+
+ err = vmap_pages_range_noflush(start, end, range->pgprot, vm->pages,
+ PMD_SHIFT);
+ if (err)
+ goto err_free_mem;
+
+ err = execmem_cache_add(p, alloc_size);
+ if (err)
+ goto err_free_mem;
+
+ return 0;
+
+err_free_mem:
+ vfree(p);
+ return err;
+}
+
+static void *execmem_cache_alloc(struct execmem_range *range, size_t size)
+{
+ void *p;
+ int err;
+
+ p = __execmem_cache_alloc(size);
+ if (p)
+ return p;
+
+ err = execmem_cache_populate(range, size);
+ if (err)
+ return NULL;
+
+ return __execmem_cache_alloc(size);
+}
+
+static bool execmem_cache_free(void *ptr)
+{
+ struct maple_tree *busy_areas = &execmem_cache.busy_areas;
+ struct mutex *mutex = &execmem_cache.mutex;
+ unsigned long addr = (unsigned long)ptr;
+ MA_STATE(mas, busy_areas, addr, addr);
+ size_t size;
+ void *area;
+
+ mutex_lock(mutex);
+ area = mas_walk(&mas);
+ if (!area) {
+ mutex_unlock(mutex);
+ return false;
+ }
+ size = xa_to_value(area);
+ mas_erase(&mas);
+ mutex_unlock(mutex);
+
+ execmem_invalidate(ptr, size, /* writable = */ false);
+
+ execmem_cache_add(ptr, size);
+
+ schedule_work(&execmem_cache_clean_work);
+
+ return true;
+}
+
+static void *__execmem_alloc(struct execmem_range *range, size_t size)
+{
+ bool use_cache = range->flags & EXECMEM_ROX_CACHE;
+ unsigned long vm_flags = VM_FLUSH_RESET_PERMS;
+ pgprot_t pgprot = range->pgprot;
+ void *p;
+
+ if (use_cache)
+ p = execmem_cache_alloc(range, size);
+ else
+ p = execmem_vmalloc(range, size, pgprot, vm_flags);
+
return kasan_reset_tag(p);
}

@@ -61,7 +312,9 @@ void execmem_free(void *ptr)
* supported by vmalloc.
*/
WARN_ON(in_interrupt());
- vfree(ptr);
+
+ if (!execmem_cache_free(ptr))
+ vfree(ptr);
}

void *execmem_update_copy(void *dst, const void *src, size_t size)
--
2.43.0


2024-04-11 17:38:08

by Mike Rapoport

[permalink] [raw]
Subject: [RFC PATCH 7/7] x86/module: enable ROX caches for module text

From: "Mike Rapoport (IBM)" <[email protected]>

Enable execmem's cache of PMD_SIZE'ed pages mapped as ROX for module
text allocations.

Signed-off-by: Mike Rapoport (IBM) <[email protected]>
---
arch/x86/mm/init.c | 29 +++++++++++++++++++++++++----
1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 8e8cd0de3af6..049a8b4c64e2 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -1102,9 +1102,23 @@ unsigned long arch_max_swapfile_size(void)
#endif

#ifdef CONFIG_EXECMEM
+static void execmem_invalidate(void *ptr, size_t size, bool writeable)
+{
+ /* fill memory with INT3 instructions */
+ if (writeable)
+ memset(ptr, 0xcc, size);
+ else
+ text_poke_set(ptr, 0xcc, size);
+}
+
static struct execmem_info execmem_info __ro_after_init = {
+ .invalidate = execmem_invalidate,
.ranges = {
- [EXECMEM_DEFAULT] = {
+ [EXECMEM_MODULE_TEXT] = {
+ .flags = EXECMEM_KASAN_SHADOW | EXECMEM_ROX_CACHE,
+ .alignment = MODULE_ALIGN,
+ },
+ [EXECMEM_KPROBES...EXECMEM_MODULE_DATA] = {
.flags = EXECMEM_KASAN_SHADOW,
.alignment = MODULE_ALIGN,
},
@@ -1119,9 +1133,16 @@ struct execmem_info __init *execmem_arch_setup(void)
offset = get_random_u32_inclusive(1, 1024) * PAGE_SIZE;

start = MODULES_VADDR + offset;
- execmem_info.ranges[EXECMEM_DEFAULT].start = start;
- execmem_info.ranges[EXECMEM_DEFAULT].end = MODULES_END;
- execmem_info.ranges[EXECMEM_DEFAULT].pgprot = PAGE_KERNEL;
+
+ for (int i = EXECMEM_MODULE_TEXT; i < EXECMEM_TYPE_MAX; i++) {
+ struct execmem_range *r = &execmem_info.ranges[i];
+
+ r->start = start;
+ r->end = MODULES_END;
+ r->pgprot = PAGE_KERNEL;
+ }
+
+ execmem_info.ranges[EXECMEM_MODULE_TEXT].pgprot = PAGE_KERNEL_ROX;

return &execmem_info;
}
--
2.43.0


2024-04-12 06:07:51

by Christophe Leroy

[permalink] [raw]
Subject: Re: [RFC PATCH 2/7] mm: vmalloc: don't account for number of nodes for HUGE_VMAP allocations



On 11/04/2024 at 18:05, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" <[email protected]>
>
> vmalloc allocations with VM_ALLOW_HUGE_VMAP that do not explicitly
> specify node ID will use huge pages only if size_per_node is larger than
> PMD_SIZE.
> Still the actual allocated memory is not distributed between nodes and
> there is no advantage in such an approach.
> On the contrary, BPF allocates PMD_SIZE * num_possible_nodes() for each
> new bpf_prog_pack, while it could do with PMD_SIZE'ed packs.
>
> Don't account for number of nodes for VM_ALLOW_HUGE_VMAP with
> NUMA_NO_NODE and use huge pages whenever the requested allocation size
> is larger than PMD_SIZE.

Patch looks ok but message is confusing. We also use huge pages at PTE
size, for instance 512k pages or 16k pages on powerpc 8xx, while
PMD_SIZE is 4M.

Christophe

>
> Signed-off-by: Mike Rapoport (IBM) <[email protected]>
> ---
> mm/vmalloc.c | 9 ++-------
> 1 file changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 22aa63f4ef63..5fc8b514e457 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3737,8 +3737,6 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
> }
>
> if (vmap_allow_huge && (vm_flags & VM_ALLOW_HUGE_VMAP)) {
> - unsigned long size_per_node;
> -
> /*
> * Try huge pages. Only try for PAGE_KERNEL allocations,
> * others like modules don't yet expect huge pages in
> @@ -3746,13 +3744,10 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
> * supporting them.
> */
>
> - size_per_node = size;
> - if (node == NUMA_NO_NODE)
> - size_per_node /= num_online_nodes();
> - if (arch_vmap_pmd_supported(prot) && size_per_node >= PMD_SIZE)
> + if (arch_vmap_pmd_supported(prot) && size >= PMD_SIZE)
> shift = PMD_SHIFT;
> else
> - shift = arch_vmap_pte_supported_shift(size_per_node);
> + shift = arch_vmap_pte_supported_shift(size);
>
> align = max(real_align, 1UL << shift);
> size = ALIGN(real_size, 1UL << shift);

2024-04-12 09:08:19

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC PATCH 5/7] x86/module: prepare module loading for ROX allocations of text


* Mike Rapoport <[email protected]> wrote:

> for (s = start; s < end; s++) {
> void *addr = (void *)s + *s;
> + void *wr_addr = addr + module_writable_offset(mod, addr);

So instead of repeating this pattern in a dozen of places, why not use a
simpler method:

void *wr_addr = module_writable_address(mod, addr);

or so, since we have to pass 'addr' to the module code anyway.
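
For illustration, such a helper could be a trivial wrapper around the
offset helper this series already adds (a sketch, not tested):

	static inline void *module_writable_address(struct module *mod,
						    void *addr)
	{
		return addr + module_writable_offset(mod, addr);
	}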

The text patching code is pretty complex already.

Thanks,

Ingo

2024-04-14 07:35:27

by Mike Rapoport

[permalink] [raw]
Subject: Re: [RFC PATCH 2/7] mm: vmalloc: don't account for number of nodes for HUGE_VMAP allocations

On Fri, Apr 12, 2024 at 06:07:19AM +0000, Christophe Leroy wrote:
>
>
> On 11/04/2024 at 18:05, Mike Rapoport wrote:
> > From: "Mike Rapoport (IBM)" <[email protected]>
> >
> > vmalloc allocations with VM_ALLOW_HUGE_VMAP that do not explicitly
> > specify node ID will use huge pages only if size_per_node is larger than
> > PMD_SIZE.
> > Still the actual allocated memory is not distributed between nodes and
> > there is no advantage in such an approach.
> > On the contrary, BPF allocates PMD_SIZE * num_possible_nodes() for each
> > new bpf_prog_pack, while it could do with PMD_SIZE'ed packs.
> >
> > Don't account for number of nodes for VM_ALLOW_HUGE_VMAP with
> > NUMA_NO_NODE and use huge pages whenever the requested allocation size
> > is larger than PMD_SIZE.
>
> Patch looks ok but message is confusing. We also use huge pages at PTE
> size, for instance 512k pages or 16k pages on powerpc 8xx, while
> PMD_SIZE is 4M.

Ok, I'll rephrase.

> Christophe
>
> >
> > Signed-off-by: Mike Rapoport (IBM) <[email protected]>
> > ---
> > mm/vmalloc.c | 9 ++-------
> > 1 file changed, 2 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 22aa63f4ef63..5fc8b514e457 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -3737,8 +3737,6 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
> > }
> >
> > if (vmap_allow_huge && (vm_flags & VM_ALLOW_HUGE_VMAP)) {
> > - unsigned long size_per_node;
> > -
> > /*
> > * Try huge pages. Only try for PAGE_KERNEL allocations,
> > * others like modules don't yet expect huge pages in
> > @@ -3746,13 +3744,10 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
> > * supporting them.
> > */
> >
> > - size_per_node = size;
> > - if (node == NUMA_NO_NODE)
> > - size_per_node /= num_online_nodes();
> > - if (arch_vmap_pmd_supported(prot) && size_per_node >= PMD_SIZE)
> > + if (arch_vmap_pmd_supported(prot) && size >= PMD_SIZE)
> > shift = PMD_SHIFT;
> > else
> > - shift = arch_vmap_pte_supported_shift(size_per_node);
> > + shift = arch_vmap_pte_supported_shift(size);
> >
> > align = max(real_align, 1UL << shift);
> > size = ALIGN(real_size, 1UL << shift);

--
Sincerely yours,
Mike.

2024-04-14 07:37:06

by Mike Rapoport

[permalink] [raw]
Subject: Re: [RFC PATCH 5/7] x86/module: prepare module loading for ROX allocations of text

On Fri, Apr 12, 2024 at 11:08:00AM +0200, Ingo Molnar wrote:
>
> * Mike Rapoport <[email protected]> wrote:
>
> > for (s = start; s < end; s++) {
> > void *addr = (void *)s + *s;
> > + void *wr_addr = addr + module_writable_offset(mod, addr);
>
> So instead of repeating this pattern in a dozen of places, why not use a
> simpler method:
>
> void *wr_addr = module_writable_address(mod, addr);
>
> or so, since we have to pass 'addr' to the module code anyway.

Agree.

> The text patching code is pretty complex already.
>
> Thanks,
>
> Ingo

--
Sincerely yours,
Mike.

2024-04-15 10:43:54

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH 5/7] x86/module: prepare module loading for ROX allocations of text

On Thu, Apr 11, 2024 at 07:05:24PM +0300, Mike Rapoport wrote:
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 45a280f2161c..b4d6868df573 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c

> @@ -504,17 +513,17 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
> * patch if feature is *NOT* present.
> */
> if (!boot_cpu_has(a->cpuid) == !(a->flags & ALT_FLAG_NOT)) {
> - optimize_nops_inplace(instr, a->instrlen);
> + optimize_nops_inplace(wr_instr, a->instrlen);
> continue;
> }
>
> - DPRINTK(ALT, "feat: %d*32+%d, old: (%pS (%px) len: %d), repl: (%px, len: %d) flags: 0x%x",
> + DPRINTK(ALT, "feat: %d*32+%d, old: (%px (%px) len: %d), repl: (%px (%px), len: %d) flags: 0x%x",
> a->cpuid >> 5,
> a->cpuid & 0x1f,
> - instr, instr, a->instrlen,
> - replacement, a->replacementlen, a->flags);
> + instr, wr_instr, a->instrlen,
> + replacement, wr_replacement, a->replacementlen, a->flags);

I think this, and

>
> - memcpy(insn_buff, replacement, a->replacementlen);
> + memcpy(insn_buff, wr_replacement, a->replacementlen);
> insn_buff_sz = a->replacementlen;
>
> if (a->flags & ALT_FLAG_DIRECT_CALL) {
> @@ -528,11 +537,11 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
>
> apply_relocation(insn_buff, a->instrlen, instr, replacement, a->replacementlen);
>
> - DUMP_BYTES(ALT, instr, a->instrlen, "%px: old_insn: ", instr);
> + DUMP_BYTES(ALT, wr_instr, a->instrlen, "%px: old_insn: ", instr);

this, want to remain as is.

> DUMP_BYTES(ALT, replacement, a->replacementlen, "%px: rpl_insn: ", replacement);
> DUMP_BYTES(ALT, insn_buff, insn_buff_sz, "%px: final_insn: ", instr);
>
> - text_poke_early(instr, insn_buff, insn_buff_sz);
> + text_poke_early(wr_instr, insn_buff, insn_buff_sz);
> }
>
> kasan_enable_current();

The rationale being that we then print an address that can be correlated
to the kernel image (provided one either kills kaslr or adjusts for it).

2024-04-15 17:03:00

by Mike Rapoport

[permalink] [raw]
Subject: Re: [RFC PATCH 6/7] execmem: add support for cache of large ROX pages

On Mon, Apr 15, 2024 at 12:47:50PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 11, 2024 at 07:05:25PM +0300, Mike Rapoport wrote:
>
> > To populate the cache, a writable large page is allocated from vmalloc with
> > VM_ALLOW_HUGE_VMAP, filled with invalid instructions and then remapped as
> > ROX.
>
> > +static void execmem_invalidate(void *ptr, size_t size, bool writable)
> > +{
> > + if (execmem_info->invalidate)
> > + execmem_info->invalidate(ptr, size, writable);
> > + else
> > + memset(ptr, 0, size);
> > +}
>
> +static void execmem_invalidate(void *ptr, size_t size, bool writeable)
> +{
> + /* fill memory with INT3 instructions */
> + if (writeable)
> + memset(ptr, 0xcc, size);
> + else
> + text_poke_set(ptr, 0xcc, size);
> +}
>
> Thing is, 0xcc (aka INT3_INSN_OPCODE) is not an invalid instruction.
> It raises #BP not #UD.

Do you mean that _invalidate is a poor name choice or that it's necessary
to use an instruction that raises #UD?

--
Sincerely yours,
Mike.

2024-04-15 17:09:10

by Mike Rapoport

[permalink] [raw]
Subject: Re: [RFC PATCH 5/7] x86/module: prepare module loading for ROX allocations of text

On Mon, Apr 15, 2024 at 12:43:16PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 11, 2024 at 07:05:24PM +0300, Mike Rapoport wrote:
> > diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> > index 45a280f2161c..b4d6868df573 100644
> > --- a/arch/x86/kernel/alternative.c
> > +++ b/arch/x86/kernel/alternative.c
>
> > @@ -504,17 +513,17 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
> > * patch if feature is *NOT* present.
> > */
> > if (!boot_cpu_has(a->cpuid) == !(a->flags & ALT_FLAG_NOT)) {
> > - optimize_nops_inplace(instr, a->instrlen);
> > + optimize_nops_inplace(wr_instr, a->instrlen);
> > continue;
> > }
> >
> > - DPRINTK(ALT, "feat: %d*32+%d, old: (%pS (%px) len: %d), repl: (%px, len: %d) flags: 0x%x",
> > + DPRINTK(ALT, "feat: %d*32+%d, old: (%px (%px) len: %d), repl: (%px (%px), len: %d) flags: 0x%x",
> > a->cpuid >> 5,
> > a->cpuid & 0x1f,
> > - instr, instr, a->instrlen,
> > - replacement, a->replacementlen, a->flags);
> > + instr, wr_instr, a->instrlen,
> > + replacement, wr_replacement, a->replacementlen, a->flags);
>
> I think this, and

I've found printing both address handy when I debugged it, but no strong
feelings here.

> >
> > - memcpy(insn_buff, replacement, a->replacementlen);
> > + memcpy(insn_buff, wr_replacement, a->replacementlen);
> > insn_buff_sz = a->replacementlen;
> >
> > if (a->flags & ALT_FLAG_DIRECT_CALL) {
> > @@ -528,11 +537,11 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
> >
> > apply_relocation(insn_buff, a->instrlen, instr, replacement, a->replacementlen);
> >
> > - DUMP_BYTES(ALT, instr, a->instrlen, "%px: old_insn: ", instr);
> > + DUMP_BYTES(ALT, wr_instr, a->instrlen, "%px: old_insn: ", instr);
>
> this, want to remain as is.

here wr_instr is the buffer to dump:

DUMP_BYTES(type, buf, len, fmt, args...)

rather than an address, which remained 'instr'.

> > DUMP_BYTES(ALT, replacement, a->replacementlen, "%px: rpl_insn: ", replacement);
> > DUMP_BYTES(ALT, insn_buff, insn_buff_sz, "%px: final_insn: ", instr);
> >
> > - text_poke_early(instr, insn_buff, insn_buff_sz);
> > + text_poke_early(wr_instr, insn_buff, insn_buff_sz);
> > }
> >
> > kasan_enable_current();
>
> The rationale being that we then print an address that can be correlated
> to the kernel image (provided one either kills kaslr or adjusts for it).

--
Sincerely yours,
Mike.

2024-04-15 17:57:25

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH 6/7] execmem: add support for cache of large ROX pages

On Thu, Apr 11, 2024 at 07:05:25PM +0300, Mike Rapoport wrote:

> To populate the cache, a writable large page is allocated from vmalloc with
> VM_ALLOW_HUGE_VMAP, filled with invalid instructions and then remapped as
> ROX.

> +static void execmem_invalidate(void *ptr, size_t size, bool writable)
> +{
> + if (execmem_info->invalidate)
> + execmem_info->invalidate(ptr, size, writable);
> + else
> + memset(ptr, 0, size);
> +}

+static void execmem_invalidate(void *ptr, size_t size, bool writeable)
+{
+ /* fill memory with INT3 instructions */
+ if (writeable)
+ memset(ptr, 0xcc, size);
+ else
+ text_poke_set(ptr, 0xcc, size);
+}

Thing is, 0xcc (aka INT3_INSN_OPCODE) is not an invalid instruction.
It raises #BP not #UD.
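
For illustration (a sketch, not part of the patch): INT3 is the single
byte 0xcc and traps with #BP, while the canonical invalid instruction UD2
is the two-byte sequence 0x0f 0x0b and raises #UD:

	memset(ptr, INT3_INSN_OPCODE, size);	/* 0xcc: #BP, not #UD */

	/* a UD2 fill would raise #UD, but its two-byte pattern cannot
	 * be written with a single-byte memset()/text_poke_set() */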

2024-04-16 09:36:39

by Nadav Amit

[permalink] [raw]
Subject: Re: [RFC PATCH 3/7] module: prepare to handle ROX allocations for text



> On 11 Apr 2024, at 19:05, Mike Rapoport <[email protected]> wrote:
>
> @@ -2440,7 +2479,24 @@ static int post_relocation(struct module *mod, const struct load_info *info)
> add_kallsyms(mod, info);
>
> /* Arch-specific module finalizing. */
> - return module_finalize(info->hdr, info->sechdrs, mod);
> + ret = module_finalize(info->hdr, info->sechdrs, mod);
> + if (ret)
> + return ret;
> +
> + for_each_mod_mem_type(type) {
> + struct module_memory *mem = &mod->mem[type];
> +
> + if (mem->is_rox) {
> + if (!execmem_update_copy(mem->base, mem->rw_copy,
> + mem->size))
> + return -ENOMEM;
> +
> + vfree(mem->rw_copy);
> + mem->rw_copy = NULL;
> + }
> + }
> +
> + return 0;
> }

I might be missing something, but it seems a bit racy.

IIUC, module_finalize() calls alternatives_smp_module_add(). At this
point, since you don’t hold the text_mutex, someone might do text_poke(),
e.g., by enabling/disabling static-key, and the update would be
overwritten. No?

2024-04-16 14:06:26

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH 6/7] execmem: add support for cache of large ROX pages

On Mon, Apr 15, 2024 at 08:00:26PM +0300, Mike Rapoport wrote:
> On Mon, Apr 15, 2024 at 12:47:50PM +0200, Peter Zijlstra wrote:
> > On Thu, Apr 11, 2024 at 07:05:25PM +0300, Mike Rapoport wrote:
> >
> > > To populate the cache, a writable large page is allocated from vmalloc with
> > > VM_ALLOW_HUGE_VMAP, filled with invalid instructions and then remapped as
> > > ROX.
> >
> > > +static void execmem_invalidate(void *ptr, size_t size, bool writable)
> > > +{
> > > + if (execmem_info->invalidate)
> > > + execmem_info->invalidate(ptr, size, writable);
> > > + else
> > > + memset(ptr, 0, size);
> > > +}
> >
> > +static void execmem_invalidate(void *ptr, size_t size, bool writeable)
> > +{
> > + /* fill memory with INT3 instructions */
> > + if (writeable)
> > + memset(ptr, 0xcc, size);
> > + else
> > + text_poke_set(ptr, 0xcc, size);
> > +}
> >
> > Thing is, 0xcc (aka INT3_INSN_OPCODE) is not an invalid instruction.
> > It raises #BP not #UD.
>
> Do you mean that _invalidate is a poor name choice or that it's necessary
> to use an instruction that raises #UD?

Poor naming, mostly. #BP handler will still scream bloody murder if the
site is otherwise unclaimed.

It just isn't an invalid instruction.

2024-04-18 10:21:47

by Mike Rapoport

[permalink] [raw]
Subject: Re: [RFC PATCH 3/7] module: prepare to handle ROX allocations for text

On Tue, Apr 16, 2024 at 12:36:08PM +0300, Nadav Amit wrote:
>
>
> > On 11 Apr 2024, at 19:05, Mike Rapoport <[email protected]> wrote:
> >
> > @@ -2440,7 +2479,24 @@ static int post_relocation(struct module *mod, const struct load_info *info)
> > add_kallsyms(mod, info);
> >
> > /* Arch-specific module finalizing. */
> > - return module_finalize(info->hdr, info->sechdrs, mod);
> > + ret = module_finalize(info->hdr, info->sechdrs, mod);
> > + if (ret)
> > + return ret;
> > +
> > + for_each_mod_mem_type(type) {
> > + struct module_memory *mem = &mod->mem[type];
> > +
> > + if (mem->is_rox) {
> > + if (!execmem_update_copy(mem->base, mem->rw_copy,
> > + mem->size))
> > + return -ENOMEM;
> > +
> > + vfree(mem->rw_copy);
> > + mem->rw_copy = NULL;
> > + }
> > + }
> > +
> > + return 0;
> > }
>
> I might be missing something, but it seems a bit racy.
>
> IIUC, module_finalize() calls alternatives_smp_module_add(). At this
> point, since you don’t hold the text_mutex, someone might do text_poke(),
> e.g., by enabling/disabling static-key, and the update would be
> overwritten. No?

Right :(
Even worse, in the UP case alternatives_smp_unlock() will "patch" a
still-empty area.

So I'm thinking about calling alternatives_smp_module_add() from an
additional callback after the execmem_update_copy().

Does it make sense to you?

--
Sincerely yours,
Mike.

2024-04-18 10:25:23

by Mike Rapoport

[permalink] [raw]
Subject: Re: [RFC PATCH 6/7] execmem: add support for cache of large ROX pages

On Tue, Apr 16, 2024 at 09:52:34AM +0200, Peter Zijlstra wrote:
> On Mon, Apr 15, 2024 at 08:00:26PM +0300, Mike Rapoport wrote:
> > On Mon, Apr 15, 2024 at 12:47:50PM +0200, Peter Zijlstra wrote:
> > > On Thu, Apr 11, 2024 at 07:05:25PM +0300, Mike Rapoport wrote:
> > >
> > > > To populate the cache, a writable large page is allocated from vmalloc with
> > > > VM_ALLOW_HUGE_VMAP, filled with invalid instructions and then remapped as
> > > > ROX.
> > >
> > > > +static void execmem_invalidate(void *ptr, size_t size, bool writable)
> > > > +{
> > > > + if (execmem_info->invalidate)
> > > > + execmem_info->invalidate(ptr, size, writable);
> > > > + else
> > > > + memset(ptr, 0, size);
> > > > +}
> > >
> > > +static void execmem_invalidate(void *ptr, size_t size, bool writeable)
> > > +{
> > > + /* fill memory with INT3 instructions */
> > > + if (writeable)
> > > + memset(ptr, 0xcc, size);
> > > + else
> > > + text_poke_set(ptr, 0xcc, size);
> > > +}
> > >
> > > Thing is, 0xcc (aka INT3_INSN_OPCODE) is not an invalid instruction.
> > > It raises #BP not #UD.
> >
> > Do you mean that _invalidate is a poor name choice or that it's necessary
> > to use an instruction that raises #UD?
>
> Poor naming, mostly. #BP handler will still scream bloody murder if the
> site is otherwise unclaimed.
>
> It just isn't an invalid instruction.

Well, execmem_fill_with_insns_screaming_bloody_murder seems too long, how
about execmem_fill_trapping_insns?

--
Sincerely yours,
Mike.

2024-04-18 19:31:45

by Nadav Amit

[permalink] [raw]
Subject: Re: [RFC PATCH 3/7] module: prepare to handle ROX allocations for text



> On 18 Apr 2024, at 13:20, Mike Rapoport <[email protected]> wrote:
>
> On Tue, Apr 16, 2024 at 12:36:08PM +0300, Nadav Amit wrote:
>>
>>
>>
>> I might be missing something, but it seems a bit racy.
>>
>> IIUC, module_finalize() calls alternatives_smp_module_add(). At this
>> point, since you don’t hold the text_mutex, someone might do text_poke(),
>> e.g., by enabling/disabling static-key, and the update would be
>> overwritten. No?
>
> Right :(
> Even worse, in the UP case alternatives_smp_unlock() will "patch" a
> still-empty area.
>
> So I'm thinking about calling alternatives_smp_module_add() from an
> additional callback after the execmem_update_copy().
>
> Does it make sense to you?

Going over the code again - I might have just been wrong: I confused the
alternatives and the jump-label mechanisms (as they do share a lot of
code and characteristics).

The jump-labels are updated when prepare_coming_module() is called, which
happens after post_relocation() [which means they would be updated using
text_poke() “inefficiently” but should be safe].

The “alternatives” appear to use text_poke() (in contrast to
text_poke_early()) only from a few specific flows, e.g.,
common_cpu_up() -> alternatives_enable_smp().

Do those flows pose a problem after boot?

Anyhow, sorry for the noise.

2024-04-18 19:48:51

by Mike Rapoport

[permalink] [raw]
Subject: Re: [RFC PATCH 3/7] module: prepare to handle ROX allocations for text

On Thu, Apr 18, 2024 at 10:31:16PM +0300, Nadav Amit wrote:
>
> > On 18 Apr 2024, at 13:20, Mike Rapoport <[email protected]> wrote:
> >
> > On Tue, Apr 16, 2024 at 12:36:08PM +0300, Nadav Amit wrote:
> >>
> >>
> >>
> >> I might be missing something, but it seems a bit racy.
> >>
> >> IIUC, module_finalize() calls alternatives_smp_module_add(). At this
> >> point, since you don’t hold the text_mutex, someone might do text_poke(),
> >> e.g., by enabling/disabling static-key, and the update would be
> >> overwritten. No?
> >
> > Right :(
> > Even worse, in the UP case alternatives_smp_unlock() will "patch" a
> > still-empty area.
> >
> > So I'm thinking about calling alternatives_smp_module_add() from an
> > additional callback after the execmem_update_copy().
> >
> > Does it make sense to you?
>
> Going over the code again - I might have just been wrong: I confused the
> alternatives and the jump-label mechanisms (as they do share a lot of
> code and characteristics).
>
> The jump-labels are updated when prepare_coming_module() is called, which
> happens after post_relocation() [which means they would be updated using
> text_poke() “inefficiently” but should be safe].
>
> The “alternatives” appear to use text_poke() (in contrast to
> text_poke_early()) only from a few specific flows, e.g.,
> common_cpu_up() -> alternatives_enable_smp().
>
> Do those flows pose a problem after boot?

Yes, common_cpu_up is called on CPU hotplug, so it's possible to have a
race between alternatives_smp_module_add() and common_cpu_up() ->
alternatives_enable_smp().

And in UP case alternatives_smp_module_add() will call
alternatives_smp_unlock() that will patch module text before it is updated.

> Anyhow, sorry for the noise.

On the contrary, I would have missed it.

--
Sincerely yours,
Mike.