2023-01-25 18:51:02

by Song Liu

Subject: [PATCH v2] module: replace module_layout with module_memory

module_layout manages different types of memory (text, data, rodata, etc.)
in one allocation, which is problematic for a few reasons:

1. It is hard to enable CONFIG_STRICT_MODULE_RWX.
2. It is hard to use huge pages in modules (without breaking strict rwx).
3. Many archs use module_layout for arch-specific data, but it is not
obvious how these data are used (are they RO, RX, or RW?)

Improve the scenario by replacing 2 (or 3) module_layout per module with
up to 7 module_memory per module:

MOD_MEM_TYPE_TEXT,
MOD_MEM_TYPE_DATA,
MOD_MEM_TYPE_RODATA,
MOD_MEM_TYPE_RO_AFTER_INIT,
MOD_MEM_TYPE_INIT_TEXT,
MOD_MEM_TYPE_INIT_DATA,
MOD_MEM_TYPE_INIT_RODATA,

and allocating them separately.

Various archs use module_layout for different data. These data are put
into different module_memory based on their location in module_layout.
IOW, data that used to go with text is allocated with MOD_MEM_TYPE_TEXT;
data that used to go with data is allocated with MOD_MEM_TYPE_DATA, etc.

Signed-off-by: Song Liu <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>

---

This is the preparation work for the type aware module_alloc() discussed
in [1]. While this work is not covered much in the discussion, it is a
critical step in the effort.

As this part grows pretty big (~1000 lines, + and -), I would like to get
some feedback on it, so that I know it is on the right track.

Please share your comments. Thanks!

Test coverage: Tested on x86_64.
Build tested by kernel test bot in [2]. The only regression in [2] was a
typo in parisc, which is also fixed.

[1] https://lore.kernel.org/linux-mm/[email protected]/T/#u
[2] https://lore.kernel.org/linux-raid/63b8827e.clJQX2wg+I+tiX7m%[email protected]/T/#u

Changes v1 => v2:
1. Add data_addr_[min|max] for CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
case.
---
arch/alpha/kernel/module.c | 3 +-
arch/arc/kernel/unwind.c | 9 +-
arch/arm/kernel/module-plts.c | 5 +-
arch/arm64/kernel/module-plts.c | 5 +-
arch/ia64/kernel/module.c | 31 ++-
arch/mips/kernel/vpe.c | 11 +-
arch/parisc/kernel/module.c | 53 +++--
arch/powerpc/kernel/module_32.c | 10 +-
arch/s390/kernel/module.c | 30 +--
include/linux/module.h | 65 +++---
kernel/module/internal.h | 34 ++-
kernel/module/kallsyms.c | 54 +++--
kernel/module/kdb.c | 17 +-
kernel/module/main.c | 389 +++++++++++++++++++-------------
kernel/module/procfs.c | 17 +-
kernel/module/strict_rwx.c | 113 +++-------
kernel/module/tree_lookup.c | 43 ++--
17 files changed, 484 insertions(+), 405 deletions(-)

diff --git a/arch/alpha/kernel/module.c b/arch/alpha/kernel/module.c
index 5b60c248de9e..905af4e8d835 100644
--- a/arch/alpha/kernel/module.c
+++ b/arch/alpha/kernel/module.c
@@ -148,7 +148,8 @@ apply_relocate_add(Elf64_Shdr *sechdrs, const char *strtab,

/* The small sections were sorted to the end of the segment.
The following should definitely cover them. */
- gp = (u64)me->core_layout.base + me->core_layout.size - 0x8000;
+ gp = (u64)me->mod_mem[MOD_MEM_TYPE_DATA].base +
+ me->mod_mem[MOD_MEM_TYPE_DATA].size - 0x8000;
got = sechdrs[me->arch.gotsecindex].sh_addr;

for (i = 0; i < n; i++) {
diff --git a/arch/arc/kernel/unwind.c b/arch/arc/kernel/unwind.c
index 200270a94558..2c55f6215f79 100644
--- a/arch/arc/kernel/unwind.c
+++ b/arch/arc/kernel/unwind.c
@@ -369,6 +369,8 @@ void *unwind_add_table(struct module *module, const void *table_start,
unsigned long table_size)
{
struct unwind_table *table;
+ struct module_memory *mod_mem_core_text;
+ struct module_memory *mod_mem_init_text;

if (table_size <= 0)
return NULL;
@@ -377,9 +379,12 @@ void *unwind_add_table(struct module *module, const void *table_start,
if (!table)
return NULL;

+ mod_mem_core_text = &module->mod_mem[MOD_MEM_TYPE_TEXT];
+ mod_mem_init_text = &module->mod_mem[MOD_MEM_TYPE_INIT_TEXT];
+
init_unwind_table(table, module->name,
- module->core_layout.base, module->core_layout.size,
- module->init_layout.base, module->init_layout.size,
+ mod_mem_core_text->base, mod_mem_core_text->size,
+ mod_mem_init_text->base, mod_mem_init_text->size,
table_start, table_size,
NULL, 0);

diff --git a/arch/arm/kernel/module-plts.c b/arch/arm/kernel/module-plts.c
index 1fc309b41f94..e68a1da41206 100644
--- a/arch/arm/kernel/module-plts.c
+++ b/arch/arm/kernel/module-plts.c
@@ -30,7 +30,10 @@ static const u32 fixed_plts[] = {

static bool in_init(const struct module *mod, unsigned long loc)
{
- return loc - (u32)mod->init_layout.base < mod->init_layout.size;
+ const struct module_memory *mod_mem;
+
+ mod_mem = &mod->mod_mem[MOD_MEM_TYPE_INIT_TEXT];
+ return loc - (u32)mod_mem->base < mod_mem->size;
}

static void prealloc_fixed(struct mod_plt_sec *pltsec, struct plt_entries *plt)
diff --git a/arch/arm64/kernel/module-plts.c b/arch/arm64/kernel/module-plts.c
index 5a0a8f552a61..2f1ca5e2294b 100644
--- a/arch/arm64/kernel/module-plts.c
+++ b/arch/arm64/kernel/module-plts.c
@@ -67,7 +67,10 @@ static bool plt_entries_equal(const struct plt_entry *a,

static bool in_init(const struct module *mod, void *loc)
{
- return (u64)loc - (u64)mod->init_layout.base < mod->init_layout.size;
+ const struct module_memory *mod_mem;
+
+ mod_mem = &mod->mod_mem[MOD_MEM_TYPE_INIT_TEXT];
+ return (u64)loc - (u64)mod_mem->base < mod_mem->size;
}

u64 module_emit_plt_entry(struct module *mod, Elf64_Shdr *sechdrs,
diff --git a/arch/ia64/kernel/module.c b/arch/ia64/kernel/module.c
index 8f62cf97f691..966a8997424d 100644
--- a/arch/ia64/kernel/module.c
+++ b/arch/ia64/kernel/module.c
@@ -485,19 +485,25 @@ module_frob_arch_sections (Elf_Ehdr *ehdr, Elf_Shdr *sechdrs, char *secstrings,
return 0;
}

-static inline int
+static inline bool
in_init (const struct module *mod, uint64_t addr)
{
- return addr - (uint64_t) mod->init_layout.base < mod->init_layout.size;
+ const struct module_memory *mod_mem;
+
+ mod_mem = &mod->mod_mem[MOD_MEM_TYPE_INIT_TEXT];
+ return addr - (uint64_t)mod_mem->base < mod_mem->size;
}

-static inline int
+static inline bool
in_core (const struct module *mod, uint64_t addr)
{
- return addr - (uint64_t) mod->core_layout.base < mod->core_layout.size;
+ const struct module_memory *mod_mem;
+
+ mod_mem = &mod->mod_mem[MOD_MEM_TYPE_TEXT];
+ return addr - (uint64_t)mod_mem->base < mod_mem->size;
}

-static inline int
+static inline bool
is_internal (const struct module *mod, uint64_t value)
{
return in_init(mod, value) || in_core(mod, value);
@@ -677,7 +683,9 @@ do_reloc (struct module *mod, uint8_t r_type, Elf64_Sym *sym, uint64_t addend,
break;

case RV_BDREL:
- val -= (uint64_t) (in_init(mod, val) ? mod->init_layout.base : mod->core_layout.base);
+ val -= (uint64_t) (in_init(mod, val) ?
+ mod->mod_mem[MOD_MEM_TYPE_INIT_TEXT].base :
+ mod->mod_mem[MOD_MEM_TYPE_TEXT].base);
break;

case RV_LTV:
@@ -812,15 +820,18 @@ apply_relocate_add (Elf64_Shdr *sechdrs, const char *strtab, unsigned int symind
* addresses have been selected...
*/
uint64_t gp;
- if (mod->core_layout.size > MAX_LTOFF)
+ struct module_memory *mod_mem;
+
+ mod_mem = &mod->mod_mem[MOD_MEM_TYPE_DATA];
+ if (mod_mem->size > MAX_LTOFF)
/*
* This takes advantage of fact that SHF_ARCH_SMALL gets allocated
* at the end of the module.
*/
- gp = mod->core_layout.size - MAX_LTOFF / 2;
+ gp = mod_mem->size - MAX_LTOFF / 2;
else
- gp = mod->core_layout.size / 2;
- gp = (uint64_t) mod->core_layout.base + ((gp + 7) & -8);
+ gp = mod_mem->size / 2;
+ gp = (uint64_t) mod_mem->base + ((gp + 7) & -8);
mod->arch.gp = gp;
DEBUGP("%s: placing gp at 0x%lx\n", __func__, gp);
}
diff --git a/arch/mips/kernel/vpe.c b/arch/mips/kernel/vpe.c
index 13294972707b..b13363a1f6a3 100644
--- a/arch/mips/kernel/vpe.c
+++ b/arch/mips/kernel/vpe.c
@@ -199,18 +199,17 @@ static void layout_sections(struct module *mod, const Elf_Ehdr *hdr,
for (m = 0; m < ARRAY_SIZE(masks); ++m) {
for (i = 0; i < hdr->e_shnum; ++i) {
Elf_Shdr *s = &sechdrs[i];
+ struct module_memory *mod_mem;
+
+ mod_mem = &mod->mod_mem[MOD_MEM_TYPE_TEXT];

if ((s->sh_flags & masks[m][0]) != masks[m][0]
|| (s->sh_flags & masks[m][1])
|| s->sh_entsize != ~0UL)
continue;
s->sh_entsize =
- get_offset((unsigned long *)&mod->core_layout.size, s);
+ get_offset((unsigned long *)&mod_mem->size, s);
}
-
- if (m == 0)
- mod->core_layout.text_size = mod->core_layout.size;
-
}
}

@@ -641,7 +640,7 @@ static int vpe_elfload(struct vpe *v)
layout_sections(&mod, hdr, sechdrs, secstrings);
}

- v->load_addr = alloc_progmem(mod.core_layout.size);
+ v->load_addr = alloc_progmem(mod.mod_mem[MOD_MEM_TYPE_TEXT].size);
if (!v->load_addr)
return -ENOMEM;

diff --git a/arch/parisc/kernel/module.c b/arch/parisc/kernel/module.c
index 7df140545b22..f2af72ec4d66 100644
--- a/arch/parisc/kernel/module.c
+++ b/arch/parisc/kernel/module.c
@@ -27,9 +27,11 @@
* We are not doing SEGREL32 handling correctly. According to the ABI, we
* should do a value offset, like this:
* if (in_init(me, (void *)val))
- * val -= (uint32_t)me->init_layout.base;
+ * val -= (uint32_t)me->
+ * mod_mem[MOD_MEM_TYPE_INIT_TEXT].base;
* else
- * val -= (uint32_t)me->core_layout.base;
+ * val -= (uint32_t)me->
+ * mod_mem[MOD_MEM_TYPE_TEXT].base;
* However, SEGREL32 is used only for PARISC unwind entries, and we want
* those entries to have an absolute address, and not just an offset.
*
@@ -76,23 +78,18 @@
* allows us to allocate up to 4095 GOT entries. */
#define MAX_GOTS 4095

-/* three functions to determine where in the module core
- * or init pieces the location is */
-static inline int in_init(struct module *me, void *loc)
+static inline bool in_local(struct module *me, void *loc)
{
- return (loc >= me->init_layout.base &&
- loc <= (me->init_layout.base + me->init_layout.size));
-}
+ enum mod_mem_type type;

-static inline int in_core(struct module *me, void *loc)
-{
- return (loc >= me->core_layout.base &&
- loc <= (me->core_layout.base + me->core_layout.size));
-}
+ for (type = MOD_MEM_TYPE_TEXT; type < MOD_MEM_NUM_TYPES; type++) {
+ struct module_memory *mod_mem = &me->mod_mem[type];

-static inline int in_local(struct module *me, void *loc)
-{
- return in_init(me, loc) || in_core(me, loc);
+ if (loc >= mod_mem->base &&
+ loc <= (mod_mem->base + mod_mem->size))
+ return true;
+ }
+ return false;
}

#ifndef CONFIG_64BIT
@@ -302,6 +299,7 @@ int module_frob_arch_sections(CONST Elf_Ehdr *hdr,
{
unsigned long gots = 0, fdescs = 0, len;
unsigned int i;
+ struct module_memory *mod_mem;

len = hdr->e_shnum * sizeof(me->arch.section[0]);
me->arch.section = kzalloc(len, GFP_KERNEL);
@@ -346,14 +344,15 @@ int module_frob_arch_sections(CONST Elf_Ehdr *hdr,
me->arch.section[s].stub_entries += count;
}

+ mod_mem = &me->mod_mem[MOD_MEM_TYPE_TEXT];
/* align things a bit */
- me->core_layout.size = ALIGN(me->core_layout.size, 16);
- me->arch.got_offset = me->core_layout.size;
- me->core_layout.size += gots * sizeof(struct got_entry);
+ mod_mem->size = ALIGN(mod_mem->size, 16);
+ me->arch.got_offset = mod_mem->size;
+ mod_mem->size += gots * sizeof(struct got_entry);

- me->core_layout.size = ALIGN(me->core_layout.size, 16);
- me->arch.fdesc_offset = me->core_layout.size;
- me->core_layout.size += fdescs * sizeof(Elf_Fdesc);
+ mod_mem->size = ALIGN(mod_mem->size, 16);
+ me->arch.fdesc_offset = mod_mem->size;
+ mod_mem->size += fdescs * sizeof(Elf_Fdesc);

me->arch.got_max = gots;
me->arch.fdesc_max = fdescs;
@@ -371,7 +370,7 @@ static Elf64_Word get_got(struct module *me, unsigned long value, long addend)

BUG_ON(value == 0);

- got = me->core_layout.base + me->arch.got_offset;
+ got = me->mod_mem[MOD_MEM_TYPE_TEXT].base + me->arch.got_offset;
for (i = 0; got[i].addr; i++)
if (got[i].addr == value)
goto out;
@@ -389,7 +388,7 @@ static Elf64_Word get_got(struct module *me, unsigned long value, long addend)
#ifdef CONFIG_64BIT
static Elf_Addr get_fdesc(struct module *me, unsigned long value)
{
- Elf_Fdesc *fdesc = me->core_layout.base + me->arch.fdesc_offset;
+ Elf_Fdesc *fdesc = me->mod_mem[MOD_MEM_TYPE_TEXT].base + me->arch.fdesc_offset;

if (!value) {
printk(KERN_ERR "%s: zero OPD requested!\n", me->name);
@@ -407,7 +406,7 @@ static Elf_Addr get_fdesc(struct module *me, unsigned long value)

/* Create new one */
fdesc->addr = value;
- fdesc->gp = (Elf_Addr)me->core_layout.base + me->arch.got_offset;
+ fdesc->gp = (Elf_Addr)me->mod_mem[MOD_MEM_TYPE_TEXT].base + me->arch.got_offset;
return (Elf_Addr)fdesc;
}
#endif /* CONFIG_64BIT */
@@ -839,7 +838,7 @@ register_unwind_table(struct module *me,

table = (unsigned char *)sechdrs[me->arch.unwind_section].sh_addr;
end = table + sechdrs[me->arch.unwind_section].sh_size;
- gp = (Elf_Addr)me->core_layout.base + me->arch.got_offset;
+ gp = (Elf_Addr)me->mod_mem[MOD_MEM_TYPE_TEXT].base + me->arch.got_offset;

pr_debug("register_unwind_table(), sect = %d at 0x%p - 0x%p (gp=0x%lx)\n",
me->arch.unwind_section, table, end, gp);
@@ -977,7 +976,7 @@ void module_arch_cleanup(struct module *mod)
#ifdef CONFIG_64BIT
void *dereference_module_function_descriptor(struct module *mod, void *ptr)
{
- unsigned long start_opd = (Elf64_Addr)mod->core_layout.base +
+ unsigned long start_opd = (Elf64_Addr)mod->mod_mem[MOD_MEM_TYPE_TEXT].base +
mod->arch.fdesc_offset;
unsigned long end_opd = start_opd +
mod->arch.fdesc_count * sizeof(Elf64_Fdesc);
diff --git a/arch/powerpc/kernel/module_32.c b/arch/powerpc/kernel/module_32.c
index ea6536171778..1496551c9e9e 100644
--- a/arch/powerpc/kernel/module_32.c
+++ b/arch/powerpc/kernel/module_32.c
@@ -160,11 +160,13 @@ static uint32_t do_plt_call(void *location,
struct module *mod)
{
struct ppc_plt_entry *entry;
+ struct module_memory *mod_mem;

pr_debug("Doing plt for call to 0x%x at 0x%x\n", val, (unsigned int)location);
+
+ mod_mem = &mod->mod_mem[MOD_MEM_TYPE_TEXT];
/* Init, or core PLT? */
- if (location >= mod->core_layout.base
- && location < mod->core_layout.base + mod->core_layout.size)
+ if (location >= mod_mem->base && location < mod_mem->base + mod_mem->size)
entry = (void *)sechdrs[mod->arch.core_plt_section].sh_addr;
else
entry = (void *)sechdrs[mod->arch.init_plt_section].sh_addr;
@@ -322,14 +324,14 @@ notrace int module_trampoline_target(struct module *mod, unsigned long addr,

int module_finalize_ftrace(struct module *module, const Elf_Shdr *sechdrs)
{
- module->arch.tramp = do_plt_call(module->core_layout.base,
+ module->arch.tramp = do_plt_call(module->mod_mem[MOD_MEM_TYPE_TEXT].base,
(unsigned long)ftrace_caller,
sechdrs, module);
if (!module->arch.tramp)
return -ENOENT;

#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
- module->arch.tramp_regs = do_plt_call(module->core_layout.base,
+ module->arch.tramp_regs = do_plt_call(module->mod_mem[MOD_MEM_TYPE_TEXT].base,
(unsigned long)ftrace_regs_caller,
sechdrs, module);
if (!module->arch.tramp_regs)
diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c
index 2d159b32885b..5f9c45fa2ffd 100644
--- a/arch/s390/kernel/module.c
+++ b/arch/s390/kernel/module.c
@@ -108,7 +108,7 @@ static void check_rela(Elf_Rela *rela, struct module *me)
case R_390_GLOB_DAT:
case R_390_JMP_SLOT:
case R_390_RELATIVE:
- /* Only needed if we want to support loading of
+ /* Only needed if we want to support loading of
modules linked with -shared. */
break;
}
@@ -126,6 +126,7 @@ int module_frob_arch_sections(Elf_Ehdr *hdr, Elf_Shdr *sechdrs,
Elf_Rela *rela;
char *strings;
int nrela, i, j;
+ struct module_memory *mod_mem;

/* Find symbol table and string table. */
symtab = NULL;
@@ -173,14 +174,15 @@ int module_frob_arch_sections(Elf_Ehdr *hdr, Elf_Shdr *sechdrs,

/* Increase core size by size of got & plt and set start
offsets for got and plt. */
- me->core_layout.size = ALIGN(me->core_layout.size, 4);
- me->arch.got_offset = me->core_layout.size;
- me->core_layout.size += me->arch.got_size;
- me->arch.plt_offset = me->core_layout.size;
+ mod_mem = &me->mod_mem[MOD_MEM_TYPE_TEXT];
+ mod_mem->size = ALIGN(mod_mem->size, 4);
+ me->arch.got_offset = mod_mem->size;
+ mod_mem->size += me->arch.got_size;
+ me->arch.plt_offset = mod_mem->size;
if (me->arch.plt_size) {
if (IS_ENABLED(CONFIG_EXPOLINE) && !nospec_disable)
me->arch.plt_size += PLT_ENTRY_SIZE;
- me->core_layout.size += me->arch.plt_size;
+ mod_mem->size += me->arch.plt_size;
}
return 0;
}
@@ -304,7 +306,7 @@ static int apply_rela(Elf_Rela *rela, Elf_Addr base, Elf_Sym *symtab,
case R_390_GOTPLT64: /* 64 bit offset to jump slot. */
case R_390_GOTPLTENT: /* 32 bit rel. offset to jump slot >> 1. */
if (info->got_initialized == 0) {
- Elf_Addr *gotent = me->core_layout.base +
+ Elf_Addr *gotent = me->mod_mem[MOD_MEM_TYPE_TEXT].base +
me->arch.got_offset +
info->got_offset;

@@ -329,7 +331,7 @@ static int apply_rela(Elf_Rela *rela, Elf_Addr base, Elf_Sym *symtab,
rc = apply_rela_bits(loc, val, 0, 64, 0, write);
else if (r_type == R_390_GOTENT ||
r_type == R_390_GOTPLTENT) {
- val += (Elf_Addr) me->core_layout.base - loc;
+ val += (Elf_Addr) me->mod_mem[MOD_MEM_TYPE_TEXT].base - loc;
rc = apply_rela_bits(loc, val, 1, 32, 1, write);
}
break;
@@ -345,7 +347,7 @@ static int apply_rela(Elf_Rela *rela, Elf_Addr base, Elf_Sym *symtab,
char *plt_base;
char *ip;

- plt_base = me->core_layout.base + me->arch.plt_offset;
+ plt_base = me->mod_mem[MOD_MEM_TYPE_TEXT].base + me->arch.plt_offset;
ip = plt_base + info->plt_offset;
*(int *)insn = 0x0d10e310; /* basr 1,0 */
*(int *)&insn[4] = 0x100c0004; /* lg 1,12(1) */
@@ -375,7 +377,7 @@ static int apply_rela(Elf_Rela *rela, Elf_Addr base, Elf_Sym *symtab,
val - loc + 0xffffUL < 0x1ffffeUL) ||
(r_type == R_390_PLT32DBL &&
val - loc + 0xffffffffULL < 0x1fffffffeULL)))
- val = (Elf_Addr) me->core_layout.base +
+ val = (Elf_Addr) me->mod_mem[MOD_MEM_TYPE_TEXT].base +
me->arch.plt_offset +
info->plt_offset;
val += rela->r_addend - loc;
@@ -397,7 +399,7 @@ static int apply_rela(Elf_Rela *rela, Elf_Addr base, Elf_Sym *symtab,
case R_390_GOTOFF32: /* 32 bit offset to GOT. */
case R_390_GOTOFF64: /* 64 bit offset to GOT. */
val = val + rela->r_addend -
- ((Elf_Addr) me->core_layout.base + me->arch.got_offset);
+ ((Elf_Addr) me->mod_mem[MOD_MEM_TYPE_TEXT].base + me->arch.got_offset);
if (r_type == R_390_GOTOFF16)
rc = apply_rela_bits(loc, val, 0, 16, 0, write);
else if (r_type == R_390_GOTOFF32)
@@ -407,7 +409,7 @@ static int apply_rela(Elf_Rela *rela, Elf_Addr base, Elf_Sym *symtab,
break;
case R_390_GOTPC: /* 32 bit PC relative offset to GOT. */
case R_390_GOTPCDBL: /* 32 bit PC rel. off. to GOT shifted by 1. */
- val = (Elf_Addr) me->core_layout.base + me->arch.got_offset +
+ val = (Elf_Addr) me->mod_mem[MOD_MEM_TYPE_TEXT].base + me->arch.got_offset +
rela->r_addend - loc;
if (r_type == R_390_GOTPC)
rc = apply_rela_bits(loc, val, 1, 32, 0, write);
@@ -418,7 +420,7 @@ static int apply_rela(Elf_Rela *rela, Elf_Addr base, Elf_Sym *symtab,
case R_390_GLOB_DAT: /* Create GOT entry. */
case R_390_JMP_SLOT: /* Create PLT entry. */
case R_390_RELATIVE: /* Adjust by program base. */
- /* Only needed if we want to support loading of
+ /* Only needed if we want to support loading of
modules linked with -shared. */
return -ENOEXEC;
default:
@@ -515,7 +517,7 @@ int module_finalize(const Elf_Ehdr *hdr,
!nospec_disable && me->arch.plt_size) {
unsigned int *ij;

- ij = me->core_layout.base + me->arch.plt_offset +
+ ij = me->mod_mem[MOD_MEM_TYPE_TEXT].base + me->arch.plt_offset +
me->arch.plt_size - PLT_ENTRY_SIZE;
ij[0] = 0xc6000000; /* exrl %r0,.+10 */
ij[1] = 0x0005a7f4; /* j . */
diff --git a/include/linux/module.h b/include/linux/module.h
index ec61fb53979a..3380ef8ebf57 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -320,17 +320,22 @@ struct mod_tree_node {
struct latch_tree_node node;
};

-struct module_layout {
- /* The actual code + data. */
+enum mod_mem_type {
+ MOD_MEM_TYPE_TEXT,
+ MOD_MEM_TYPE_DATA,
+ MOD_MEM_TYPE_RODATA,
+ MOD_MEM_TYPE_RO_AFTER_INIT,
+ MOD_MEM_TYPE_INIT_TEXT,
+ MOD_MEM_TYPE_INIT_DATA,
+ MOD_MEM_TYPE_INIT_RODATA,
+
+ MOD_MEM_NUM_TYPES,
+ MOD_MEM_TYPE_INVALID = -1,
+};
+
+struct module_memory {
void *base;
- /* Total size. */
unsigned int size;
- /* The size of the executable code. */
- unsigned int text_size;
- /* Size of RO section of the module (text+rodata) */
- unsigned int ro_size;
- /* Size of RO after init section */
- unsigned int ro_after_init_size;

#ifdef CONFIG_MODULES_TREE_LOOKUP
struct mod_tree_node mtn;
@@ -339,9 +344,9 @@ struct module_layout {

#ifdef CONFIG_MODULES_TREE_LOOKUP
/* Only touch one cacheline for common rbtree-for-core-layout case. */
-#define __module_layout_align ____cacheline_aligned
+#define __module_memory_align ____cacheline_aligned
#else
-#define __module_layout_align
+#define __module_memory_align
#endif

struct mod_kallsyms {
@@ -418,12 +423,8 @@ struct module {
/* Startup function. */
int (*init)(void);

- /* Core layout: rbtree is accessed frequently, so keep together. */
- struct module_layout core_layout __module_layout_align;
- struct module_layout init_layout;
-#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
- struct module_layout data_layout;
-#endif
+ /* rbtree is accessed frequently, so keep together. */
+ struct module_memory mod_mem[MOD_MEM_NUM_TYPES] __module_memory_align;

/* Arch-specific module values */
struct mod_arch_specific arch;
@@ -573,23 +574,35 @@ bool __is_module_percpu_address(unsigned long addr, unsigned long *can_addr);
bool is_module_percpu_address(unsigned long addr);
bool is_module_text_address(unsigned long addr);

+static inline bool within_module_mem_type(unsigned long addr,
+ const struct module *mod,
+ enum mod_mem_type type)
+{
+ const struct module_memory *mod_mem;
+
+ if (WARN_ON_ONCE(type < MOD_MEM_TYPE_TEXT || type >= MOD_MEM_NUM_TYPES))
+ return false;
+
+ mod_mem = &mod->mod_mem[type];
+ return (unsigned long)mod_mem->base <= addr &&
+ addr < (unsigned long)mod_mem->base + mod_mem->size;
+}
+
static inline bool within_module_core(unsigned long addr,
const struct module *mod)
{
-#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
- if ((unsigned long)mod->data_layout.base <= addr &&
- addr < (unsigned long)mod->data_layout.base + mod->data_layout.size)
- return true;
-#endif
- return (unsigned long)mod->core_layout.base <= addr &&
- addr < (unsigned long)mod->core_layout.base + mod->core_layout.size;
+ return within_module_mem_type(addr, mod, MOD_MEM_TYPE_TEXT) ||
+ within_module_mem_type(addr, mod, MOD_MEM_TYPE_DATA) ||
+ within_module_mem_type(addr, mod, MOD_MEM_TYPE_RODATA) ||
+ within_module_mem_type(addr, mod, MOD_MEM_TYPE_RO_AFTER_INIT);
}

static inline bool within_module_init(unsigned long addr,
const struct module *mod)
{
- return (unsigned long)mod->init_layout.base <= addr &&
- addr < (unsigned long)mod->init_layout.base + mod->init_layout.size;
+ return within_module_mem_type(addr, mod, MOD_MEM_TYPE_INIT_TEXT) ||
+ within_module_mem_type(addr, mod, MOD_MEM_TYPE_INIT_DATA) ||
+ within_module_mem_type(addr, mod, MOD_MEM_TYPE_INIT_RODATA);
}

static inline bool within_module(unsigned long addr, const struct module *mod)
diff --git a/kernel/module/internal.h b/kernel/module/internal.h
index 2e2bf236f558..d31f28b0a973 100644
--- a/kernel/module/internal.h
+++ b/kernel/module/internal.h
@@ -22,22 +22,17 @@
/* Maximum number of characters written by module_flags() */
#define MODULE_FLAGS_BUF_SIZE (TAINT_FLAGS_COUNT + 4)

-#ifndef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
-#define data_layout core_layout
-#endif
+#define for_each_mod_mem_type(type) \
+ for ((type) = MOD_MEM_TYPE_TEXT; \
+ (type) < MOD_MEM_NUM_TYPES; (type)++)

-/*
- * Modules' sections will be aligned on page boundaries
- * to ensure complete separation of code and data, but
- * only when CONFIG_STRICT_MODULE_RWX=y
- */
-static inline unsigned int strict_align(unsigned int size)
-{
- if (IS_ENABLED(CONFIG_STRICT_MODULE_RWX))
- return PAGE_ALIGN(size);
- else
- return size;
-}
+#define for_core_mod_mem_type(type) \
+ for ((type) = MOD_MEM_TYPE_TEXT; \
+ (type) < MOD_MEM_TYPE_INIT_TEXT; (type)++)
+
+#define for_init_mod_mem_type(type) \
+ for ((type) = MOD_MEM_TYPE_INIT_TEXT; \
+ (type) < MOD_MEM_NUM_TYPES; (type)++)

extern struct mutex module_mutex;
extern struct list_head modules;
@@ -101,8 +96,8 @@ int try_to_force_load(struct module *mod, const char *reason);
bool find_symbol(struct find_symbol_arg *fsa);
struct module *find_module_all(const char *name, size_t len, bool even_unformed);
int cmp_name(const void *name, const void *sym);
-long module_get_offset(struct module *mod, unsigned int *size, Elf_Shdr *sechdr,
- unsigned int section);
+long module_get_offset_and_type(struct module *mod, enum mod_mem_type type,
+ Elf_Shdr *sechdr, unsigned int section);
char *module_flags(struct module *mod, char *buf, bool show_state);
size_t module_flags_taint(unsigned long taints, char *buf);

@@ -190,10 +185,13 @@ struct mod_tree_root {
#endif
unsigned long addr_min;
unsigned long addr_max;
+#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
+ unsigned long data_addr_min;
+ unsigned long data_addr_max;
+#endif
};

extern struct mod_tree_root mod_tree;
-extern struct mod_tree_root mod_data_tree;

#ifdef CONFIG_MODULES_TREE_LOOKUP
void mod_tree_insert(struct module *mod);
diff --git a/kernel/module/kallsyms.c b/kernel/module/kallsyms.c
index f5c5c9175333..f68431a4f331 100644
--- a/kernel/module/kallsyms.c
+++ b/kernel/module/kallsyms.c
@@ -113,11 +113,13 @@ void layout_symtab(struct module *mod, struct load_info *info)
Elf_Shdr *strsect = info->sechdrs + info->index.str;
const Elf_Sym *src;
unsigned int i, nsrc, ndst, strtab_size = 0;
+ struct module_memory *mod_mem_data = &mod->mod_mem[MOD_MEM_TYPE_DATA];
+ struct module_memory *mod_mem_init_data = &mod->mod_mem[MOD_MEM_TYPE_INIT_DATA];

/* Put symbol section at end of init part of module. */
symsect->sh_flags |= SHF_ALLOC;
- symsect->sh_entsize = module_get_offset(mod, &mod->init_layout.size, symsect,
- info->index.sym) | INIT_OFFSET_MASK;
+ symsect->sh_entsize = module_get_offset_and_type(mod, MOD_MEM_TYPE_INIT_DATA,
+ symsect, info->index.sym);
pr_debug("\t%s\n", info->secstrings + symsect->sh_name);

src = (void *)info->hdr + symsect->sh_offset;
@@ -134,28 +136,27 @@ void layout_symtab(struct module *mod, struct load_info *info)
}

/* Append room for core symbols at end of core part. */
- info->symoffs = ALIGN(mod->data_layout.size, symsect->sh_addralign ?: 1);
- info->stroffs = mod->data_layout.size = info->symoffs + ndst * sizeof(Elf_Sym);
- mod->data_layout.size += strtab_size;
+ info->symoffs = ALIGN(mod_mem_data->size, symsect->sh_addralign ?: 1);
+ info->stroffs = mod_mem_data->size = info->symoffs + ndst * sizeof(Elf_Sym);
+ mod_mem_data->size += strtab_size;
/* Note add_kallsyms() computes strtab_size as core_typeoffs - stroffs */
- info->core_typeoffs = mod->data_layout.size;
- mod->data_layout.size += ndst * sizeof(char);
- mod->data_layout.size = strict_align(mod->data_layout.size);
+ info->core_typeoffs = mod_mem_data->size;
+ mod_mem_data->size += ndst * sizeof(char);

/* Put string table section at end of init part of module. */
strsect->sh_flags |= SHF_ALLOC;
- strsect->sh_entsize = module_get_offset(mod, &mod->init_layout.size, strsect,
- info->index.str) | INIT_OFFSET_MASK;
+ strsect->sh_entsize = module_get_offset_and_type(mod, MOD_MEM_TYPE_INIT_DATA,
+ strsect, info->index.str);
pr_debug("\t%s\n", info->secstrings + strsect->sh_name);

/* We'll tack temporary mod_kallsyms on the end. */
- mod->init_layout.size = ALIGN(mod->init_layout.size,
- __alignof__(struct mod_kallsyms));
- info->mod_kallsyms_init_off = mod->init_layout.size;
- mod->init_layout.size += sizeof(struct mod_kallsyms);
- info->init_typeoffs = mod->init_layout.size;
- mod->init_layout.size += nsrc * sizeof(char);
- mod->init_layout.size = strict_align(mod->init_layout.size);
+ mod_mem_init_data->size = ALIGN(mod_mem_init_data->size,
+ __alignof__(struct mod_kallsyms));
+ info->mod_kallsyms_init_off = mod_mem_init_data->size;
+
+ mod_mem_init_data->size += sizeof(struct mod_kallsyms);
+ info->init_typeoffs = mod_mem_init_data->size;
+ mod_mem_init_data->size += nsrc * sizeof(char);
}

/*
@@ -171,9 +172,11 @@ void add_kallsyms(struct module *mod, const struct load_info *info)
char *s;
Elf_Shdr *symsec = &info->sechdrs[info->index.sym];
unsigned long strtab_size;
+ void *data_base = mod->mod_mem[MOD_MEM_TYPE_DATA].base;
+ void *init_data_base = mod->mod_mem[MOD_MEM_TYPE_INIT_DATA].base;

/* Set up to point into init section. */
- mod->kallsyms = (void __rcu *)mod->init_layout.base +
+ mod->kallsyms = (void __rcu *)init_data_base +
info->mod_kallsyms_init_off;

rcu_read_lock();
@@ -183,15 +186,15 @@ void add_kallsyms(struct module *mod, const struct load_info *info)
/* Make sure we get permanent strtab: don't use info->strtab. */
rcu_dereference(mod->kallsyms)->strtab =
(void *)info->sechdrs[info->index.str].sh_addr;
- rcu_dereference(mod->kallsyms)->typetab = mod->init_layout.base + info->init_typeoffs;
+ rcu_dereference(mod->kallsyms)->typetab = init_data_base + info->init_typeoffs;

/*
* Now populate the cut down core kallsyms for after init
* and set types up while we still have access to sections.
*/
- mod->core_kallsyms.symtab = dst = mod->data_layout.base + info->symoffs;
- mod->core_kallsyms.strtab = s = mod->data_layout.base + info->stroffs;
- mod->core_kallsyms.typetab = mod->data_layout.base + info->core_typeoffs;
+ mod->core_kallsyms.symtab = dst = data_base + info->symoffs;
+ mod->core_kallsyms.strtab = s = data_base + info->stroffs;
+ mod->core_kallsyms.typetab = data_base + info->core_typeoffs;
strtab_size = info->core_typeoffs - info->stroffs;
src = rcu_dereference(mod->kallsyms)->symtab;
for (ndst = i = 0; i < rcu_dereference(mod->kallsyms)->num_symtab; i++) {
@@ -267,12 +270,15 @@ static const char *find_kallsyms_symbol(struct module *mod,
unsigned int i, best = 0;
unsigned long nextval, bestval;
struct mod_kallsyms *kallsyms = rcu_dereference_sched(mod->kallsyms);
+ struct module_memory *mod_mem;

/* At worse, next value is at end of module */
if (within_module_init(addr, mod))
- nextval = (unsigned long)mod->init_layout.base + mod->init_layout.text_size;
+ mod_mem = &mod->mod_mem[MOD_MEM_TYPE_INIT_TEXT];
else
- nextval = (unsigned long)mod->core_layout.base + mod->core_layout.text_size;
+ mod_mem = &mod->mod_mem[MOD_MEM_TYPE_TEXT];
+
+ nextval = (unsigned long)mod_mem->base + mod_mem->size;

bestval = kallsyms_symbol_value(&kallsyms->symtab[best]);

diff --git a/kernel/module/kdb.c b/kernel/module/kdb.c
index f4317f92e189..33f595454488 100644
--- a/kernel/module/kdb.c
+++ b/kernel/module/kdb.c
@@ -26,10 +26,11 @@ int kdb_lsmod(int argc, const char **argv)
if (mod->state == MODULE_STATE_UNFORMED)
continue;

- kdb_printf("%-20s%8u", mod->name, mod->core_layout.size);
-#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
- kdb_printf("/%8u", mod->data_layout.size);
-#endif
+ kdb_printf("%-20s%8u", mod->name, mod->mod_mem[MOD_MEM_TYPE_TEXT].size);
+ kdb_printf("/%8u", mod->mod_mem[MOD_MEM_TYPE_RODATA].size);
+ kdb_printf("/%8u", mod->mod_mem[MOD_MEM_TYPE_RO_AFTER_INIT].size);
+ kdb_printf("/%8u", mod->mod_mem[MOD_MEM_TYPE_DATA].size);
+
kdb_printf(" 0x%px ", (void *)mod);
#ifdef CONFIG_MODULE_UNLOAD
kdb_printf("%4d ", module_refcount(mod));
@@ -40,10 +41,10 @@ int kdb_lsmod(int argc, const char **argv)
kdb_printf(" (Loading)");
else
kdb_printf(" (Live)");
- kdb_printf(" 0x%px", mod->core_layout.base);
-#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
- kdb_printf("/0x%px", mod->data_layout.base);
-#endif
+ kdb_printf(" 0x%px", mod->mod_mem[MOD_MEM_TYPE_TEXT].base);
+ kdb_printf("/0x%px", mod->mod_mem[MOD_MEM_TYPE_RODATA].base);
+ kdb_printf("/0x%px", mod->mod_mem[MOD_MEM_TYPE_RO_AFTER_INIT].base);
+ kdb_printf("/0x%px", mod->mod_mem[MOD_MEM_TYPE_DATA].base);

#ifdef CONFIG_MODULE_UNLOAD
{
diff --git a/kernel/module/main.c b/kernel/module/main.c
index d02d39c7174e..c90eb2ba0985 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -79,15 +79,6 @@ struct mod_tree_root mod_tree __cacheline_aligned = {
.addr_min = -1UL,
};

-#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
-struct mod_tree_root mod_data_tree __cacheline_aligned = {
- .addr_min = -1UL,
-};
-#endif
-
-#define module_addr_min mod_tree.addr_min
-#define module_addr_max mod_tree.addr_max
-
struct symsearch {
const struct kernel_symbol *start, *stop;
const s32 *crcs;
@@ -95,14 +86,25 @@ struct symsearch {
};

/*
- * Bounds of module text, for speeding up __module_address.
+ * Bounds of module memory, for speeding up __module_address.
* Protected by module_mutex.
*/
-static void __mod_update_bounds(void *base, unsigned int size, struct mod_tree_root *tree)
+static void __mod_update_bounds(enum mod_mem_type type __maybe_unused, void *base,
+ unsigned int size, struct mod_tree_root *tree)
{
unsigned long min = (unsigned long)base;
unsigned long max = min + size;

+#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
+ if (type == MOD_MEM_TYPE_DATA || type == MOD_MEM_TYPE_RODATA ||
+ type == MOD_MEM_TYPE_RO_AFTER_INIT) {
+ if (min < tree->data_addr_min)
+ tree->data_addr_min = min;
+ if (max > tree->data_addr_max)
+ tree->data_addr_max = max;
+ return;
+ }
+#endif
if (min < tree->addr_min)
tree->addr_min = min;
if (max > tree->addr_max)
@@ -111,12 +113,14 @@ static void __mod_update_bounds(void *base, unsigned int size, struct mod_tree_r

static void mod_update_bounds(struct module *mod)
{
- __mod_update_bounds(mod->core_layout.base, mod->core_layout.size, &mod_tree);
- if (mod->init_layout.size)
- __mod_update_bounds(mod->init_layout.base, mod->init_layout.size, &mod_tree);
-#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
- __mod_update_bounds(mod->data_layout.base, mod->data_layout.size, &mod_data_tree);
-#endif
+ enum mod_mem_type type;
+
+ for_each_mod_mem_type(type) {
+ struct module_memory *mod_mem = &mod->mod_mem[type];
+
+ if (mod_mem->size)
+ __mod_update_bounds(type, mod_mem->base, mod_mem->size, &mod_tree);
+ }
}

/* Block module loading/unloading? */
@@ -925,11 +929,31 @@ static ssize_t store_uevent(struct module_attribute *mattr,
struct module_attribute module_uevent =
__ATTR(uevent, 0200, NULL, store_uevent);

+#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
+
+static ssize_t show_coresize(struct module_attribute *mattr,
+ struct module_kobject *mk, char *buffer)
+{
+ enum mod_mem_type type;
+ unsigned int size = 0;
+
+ return sprintf(buffer, "%u\n",
+ mk->mod->mod_mem[MOD_MEM_TYPE_TEXT].size);
+}
+
+#else
+
static ssize_t show_coresize(struct module_attribute *mattr,
struct module_kobject *mk, char *buffer)
{
- return sprintf(buffer, "%u\n", mk->mod->core_layout.size);
+ enum mod_mem_type type;
+ unsigned int size = 0;
+
+ for_core_mod_mem_type(type)
+ size += mk->mod->mod_mem[type].size;
+ return sprintf(buffer, "%u\n", size);
}
+#endif

static struct module_attribute modinfo_coresize =
__ATTR(coresize, 0444, show_coresize, NULL);
@@ -938,7 +962,15 @@ static struct module_attribute modinfo_coresize =
static ssize_t show_datasize(struct module_attribute *mattr,
struct module_kobject *mk, char *buffer)
{
- return sprintf(buffer, "%u\n", mk->mod->data_layout.size);
+ enum mod_mem_type type;
+ unsigned int size = 0;
+
+ for_core_mod_mem_type(type) {
+ if (type == MOD_MEM_TYPE_TEXT)
+ continue;
+ size += mk->mod->mod_mem[type].size;
+ }
+ return sprintf(buffer, "%u\n", size);
}

static struct module_attribute modinfo_datasize =
@@ -948,7 +980,12 @@ static struct module_attribute modinfo_datasize =
static ssize_t show_initsize(struct module_attribute *mattr,
struct module_kobject *mk, char *buffer)
{
- return sprintf(buffer, "%u\n", mk->mod->init_layout.size);
+ enum mod_mem_type type;
+ unsigned int size = 0;
+
+ for_init_mod_mem_type(type)
+ size += mk->mod->mod_mem[type].size;
+ return sprintf(buffer, "%u\n", size);
}

static struct module_attribute modinfo_initsize =
@@ -1145,6 +1182,54 @@ void __weak module_arch_freeing_init(struct module *mod)
{
}

+#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
+static bool mod_mem_use_vmalloc(enum mod_mem_type type)
+{
+ return type == MOD_MEM_TYPE_DATA ||
+ type == MOD_MEM_TYPE_RODATA ||
+ type == MOD_MEM_TYPE_RO_AFTER_INIT;
+}
+#else
+static bool mod_mem_use_vmalloc(enum mod_mem_type type)
+{
+ return false;
+}
+#endif
+
+static void free_mod_mem(struct module *mod)
+{
+ /* free the memory in the right order to avoid use-after-free */
+ static enum mod_mem_type mod_mem_free_order[MOD_MEM_NUM_TYPES] = {
+ /* first free init sections */
+ MOD_MEM_TYPE_INIT_TEXT,
+ MOD_MEM_TYPE_INIT_DATA,
+ MOD_MEM_TYPE_INIT_RODATA,
+
+ /* then core sections, except rw data */
+ MOD_MEM_TYPE_TEXT,
+ MOD_MEM_TYPE_RODATA,
+ MOD_MEM_TYPE_RO_AFTER_INIT,
+
+ /* last, rw data */
+ MOD_MEM_TYPE_DATA,
+ };
+ int i;
+
+ for (i = 0; i < MOD_MEM_NUM_TYPES; i++) {
+ enum mod_mem_type type = mod_mem_free_order[i];
+ struct module_memory *mod_mem = &mod->mod_mem[type];
+
+ /* Free lock-classes; relies on the preceding sync_rcu(). */
+ lockdep_free_key_range(mod_mem->base, mod_mem->size);
+ if (mod_mem->size) {
+ if (mod_mem_use_vmalloc(type))
+ vfree(mod_mem->base);
+ else
+ module_memfree(mod_mem->base);
+ }
+ }
+}
+
/* Free a module, remove from lists, etc. */
static void free_module(struct module *mod)
{
@@ -1191,18 +1276,10 @@ static void free_module(struct module *mod)

/* This may be empty, but that's OK */
module_arch_freeing_init(mod);
- module_memfree(mod->init_layout.base);
kfree(mod->args);
percpu_modfree(mod);

- /* Free lock-classes; relies on the preceding sync_rcu(). */
- lockdep_free_key_range(mod->data_layout.base, mod->data_layout.size);
-
- /* Finally, free the core (containing the module structure) */
- module_memfree(mod->core_layout.base);
-#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
- vfree(mod->data_layout.base);
-#endif
+ free_mod_mem(mod);
}

void *__symbol_get(const char *symbol)
@@ -1389,16 +1466,29 @@ unsigned int __weak arch_mod_section_prepend(struct module *mod,
return 0;
}

-/* Update size with this section: return offset. */
-long module_get_offset(struct module *mod, unsigned int *size,
- Elf_Shdr *sechdr, unsigned int section)
+/*
+ * Use highest 4 bits of sh_entsize to store the mod_mem_type of this
+ * section. This leaves 28 bits for offset on 32-bit systems, which is
+ * about 256 MiB (WARN_ON_ONCE if we exceed that).
+ */
+
+#define SH_ENTSIZE_TYPE_BITS 4
+#define SH_ENTSIZE_TYPE_SHIFT (BITS_PER_LONG - SH_ENTSIZE_TYPE_BITS)
+#define SH_ENTSIZE_TYPE_MASK ((1UL << SH_ENTSIZE_TYPE_BITS) - 1)
+#define SH_ENTSIZE_OFFSET_MASK ((1UL << (BITS_PER_LONG - SH_ENTSIZE_TYPE_BITS)) - 1)
+
+long module_get_offset_and_type(struct module *mod, enum mod_mem_type type,
+ Elf_Shdr *sechdr, unsigned int section)
{
- long ret;
+ long offset;
+ long mask = ((unsigned long)(type) & SH_ENTSIZE_TYPE_MASK) << SH_ENTSIZE_TYPE_SHIFT;

- *size += arch_mod_section_prepend(mod, section);
- ret = ALIGN(*size, sechdr->sh_addralign ?: 1);
- *size = ret + sechdr->sh_size;
- return ret;
+ mod->mod_mem[type].size += arch_mod_section_prepend(mod, section);
+ offset = ALIGN(mod->mod_mem[type].size, sechdr->sh_addralign ?: 1);
+ mod->mod_mem[type].size = offset + sechdr->sh_size;
+
+ WARN_ON_ONCE(offset & mask);
+ return offset | mask;
}

static bool module_init_layout_section(const char *sname)
@@ -1430,6 +1520,20 @@ static void layout_sections(struct module *mod, struct load_info *info)
{ SHF_WRITE | SHF_ALLOC, ARCH_SHF_SMALL },
{ ARCH_SHF_SMALL | SHF_ALLOC, 0 }
};
+ static int core_m_to_mem_type[] = {
+ MOD_MEM_TYPE_TEXT,
+ MOD_MEM_TYPE_RODATA,
+ MOD_MEM_TYPE_RO_AFTER_INIT,
+ MOD_MEM_TYPE_DATA,
+ MOD_MEM_TYPE_INVALID,
+ };
+ static int init_m_to_mem_type[] = {
+ MOD_MEM_TYPE_INIT_TEXT,
+ MOD_MEM_TYPE_INIT_RODATA,
+ MOD_MEM_TYPE_INVALID,
+ MOD_MEM_TYPE_INIT_DATA,
+ MOD_MEM_TYPE_INVALID,
+ };
unsigned int m, i;

for (i = 0; i < info->hdr->e_shnum; i++)
@@ -1437,41 +1541,30 @@ static void layout_sections(struct module *mod, struct load_info *info)

pr_debug("Core section allocation order:\n");
for (m = 0; m < ARRAY_SIZE(masks); ++m) {
+ enum mod_mem_type type = core_m_to_mem_type[m];
+
for (i = 0; i < info->hdr->e_shnum; ++i) {
Elf_Shdr *s = &info->sechdrs[i];
const char *sname = info->secstrings + s->sh_name;
- unsigned int *sizep;

if ((s->sh_flags & masks[m][0]) != masks[m][0]
|| (s->sh_flags & masks[m][1])
|| s->sh_entsize != ~0UL
|| module_init_layout_section(sname))
continue;
- sizep = m ? &mod->data_layout.size : &mod->core_layout.size;
- s->sh_entsize = module_get_offset(mod, sizep, s, i);
+
+ if (WARN_ON_ONCE(type == MOD_MEM_TYPE_INVALID))
+ continue;
+
+ s->sh_entsize = module_get_offset_and_type(mod, type, s, i);
pr_debug("\t%s\n", sname);
}
- switch (m) {
- case 0: /* executable */
- mod->core_layout.size = strict_align(mod->core_layout.size);
- mod->core_layout.text_size = mod->core_layout.size;
- break;
- case 1: /* RO: text and ro-data */
- mod->data_layout.size = strict_align(mod->data_layout.size);
- mod->data_layout.ro_size = mod->data_layout.size;
- break;
- case 2: /* RO after init */
- mod->data_layout.size = strict_align(mod->data_layout.size);
- mod->data_layout.ro_after_init_size = mod->data_layout.size;
- break;
- case 4: /* whole core */
- mod->data_layout.size = strict_align(mod->data_layout.size);
- break;
- }
}

pr_debug("Init section allocation order:\n");
for (m = 0; m < ARRAY_SIZE(masks); ++m) {
+ enum mod_mem_type type = init_m_to_mem_type[m];
+
for (i = 0; i < info->hdr->e_shnum; ++i) {
Elf_Shdr *s = &info->sechdrs[i];
const char *sname = info->secstrings + s->sh_name;
@@ -1481,30 +1574,13 @@ static void layout_sections(struct module *mod, struct load_info *info)
|| s->sh_entsize != ~0UL
|| !module_init_layout_section(sname))
continue;
- s->sh_entsize = (module_get_offset(mod, &mod->init_layout.size, s, i)
- | INIT_OFFSET_MASK);
+
+ if (WARN_ON_ONCE(type == MOD_MEM_TYPE_INVALID))
+ continue;
+
+ s->sh_entsize = module_get_offset_and_type(mod, type, s, i);
pr_debug("\t%s\n", sname);
}
- switch (m) {
- case 0: /* executable */
- mod->init_layout.size = strict_align(mod->init_layout.size);
- mod->init_layout.text_size = mod->init_layout.size;
- break;
- case 1: /* RO: text and ro-data */
- mod->init_layout.size = strict_align(mod->init_layout.size);
- mod->init_layout.ro_size = mod->init_layout.size;
- break;
- case 2:
- /*
- * RO after init doesn't apply to init_layout (only
- * core_layout), so it just takes the value of ro_size.
- */
- mod->init_layout.ro_after_init_size = mod->init_layout.ro_size;
- break;
- case 4: /* whole init */
- mod->init_layout.size = strict_align(mod->init_layout.size);
- break;
- }
}
}

@@ -2117,74 +2193,45 @@ static int find_module_sections(struct module *mod, struct load_info *info)

static int move_module(struct module *mod, struct load_info *info)
{
+ enum mod_mem_type type;
int i;
void *ptr;

- /* Do the allocs. */
- ptr = module_alloc(mod->core_layout.size);
- /*
- * The pointer to this block is stored in the module structure
- * which is inside the block. Just mark it as not being a
- * leak.
- */
- kmemleak_not_leak(ptr);
- if (!ptr)
- return -ENOMEM;
-
- memset(ptr, 0, mod->core_layout.size);
- mod->core_layout.base = ptr;
+ for_each_mod_mem_type(type) {
+ if (!mod->mod_mem[type].size) {
+ mod->mod_mem[type].base = NULL;
+ continue;
+ }
+ mod->mod_mem[type].size = PAGE_ALIGN(mod->mod_mem[type].size);
+ if (mod_mem_use_vmalloc(type))
+ ptr = vzalloc(mod->mod_mem[type].size);
+ else
+ ptr = module_alloc(mod->mod_mem[type].size);

- if (mod->init_layout.size) {
- ptr = module_alloc(mod->init_layout.size);
/*
* The pointer to this block is stored in the module structure
- * which is inside the block. This block doesn't need to be
- * scanned as it contains data and code that will be freed
- * after the module is initialized.
+ * which is inside the block. Just mark it as not being a
+ * leak.
*/
kmemleak_ignore(ptr);
- if (!ptr) {
- module_memfree(mod->core_layout.base);
- return -ENOMEM;
- }
- memset(ptr, 0, mod->init_layout.size);
- mod->init_layout.base = ptr;
- } else
- mod->init_layout.base = NULL;
-
-#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
- /* Do the allocs. */
- ptr = vzalloc(mod->data_layout.size);
- /*
- * The pointer to this block is stored in the module structure
- * which is inside the block. Just mark it as not being a
- * leak.
- */
- kmemleak_not_leak(ptr);
- if (!ptr) {
- module_memfree(mod->core_layout.base);
- module_memfree(mod->init_layout.base);
- return -ENOMEM;
+ if (!ptr)
+ goto out_enomem;
+ memset(ptr, 0, mod->mod_mem[type].size);
+ mod->mod_mem[type].base = ptr;
}

- mod->data_layout.base = ptr;
-#endif
/* Transfer each section which specifies SHF_ALLOC */
pr_debug("final section addresses:\n");
for (i = 0; i < info->hdr->e_shnum; i++) {
void *dest;
Elf_Shdr *shdr = &info->sechdrs[i];
+ enum mod_mem_type type = shdr->sh_entsize >> SH_ENTSIZE_TYPE_SHIFT;

if (!(shdr->sh_flags & SHF_ALLOC))
continue;

- if (shdr->sh_entsize & INIT_OFFSET_MASK)
- dest = mod->init_layout.base
- + (shdr->sh_entsize & ~INIT_OFFSET_MASK);
- else if (!(shdr->sh_flags & SHF_EXECINSTR))
- dest = mod->data_layout.base + shdr->sh_entsize;
- else
- dest = mod->core_layout.base + shdr->sh_entsize;
+ dest = mod->mod_mem[type].base +
+ (shdr->sh_entsize & SH_ENTSIZE_OFFSET_MASK);

if (shdr->sh_type != SHT_NOBITS)
memcpy(dest, (void *)shdr->sh_addr, shdr->sh_size);
@@ -2195,6 +2242,14 @@ static int move_module(struct module *mod, struct load_info *info)
}

return 0;
+out_enomem:
+ for (type--; type >= 0; type--) {
+ if (mod_mem_use_vmalloc(type))
+ vfree(mod->mod_mem[type].base);
+ else
+ module_memfree(mod->mod_mem[type].base);
+ }
+ return -ENOMEM;
}

static int check_module_license_and_versions(struct module *mod)
@@ -2234,17 +2289,20 @@ static int check_module_license_and_versions(struct module *mod)

static void flush_module_icache(const struct module *mod)
{
+ enum mod_mem_type type;
+
/*
* Flush the instruction cache, since we've played with text.
* Do it before processing of module parameters, so the module
* can provide parameter accessor functions of its own.
*/
- if (mod->init_layout.base)
- flush_icache_range((unsigned long)mod->init_layout.base,
- (unsigned long)mod->init_layout.base
- + mod->init_layout.size);
- flush_icache_range((unsigned long)mod->core_layout.base,
- (unsigned long)mod->core_layout.base + mod->core_layout.size);
+ for_each_mod_mem_type(type) {
+ const struct module_memory *mod_mem = &mod->mod_mem[type];
+
+ if (mod_mem->size)
+ flush_icache_range((unsigned long)mod_mem->base,
+ (unsigned long)mod_mem->base + mod_mem->size);
+ }
}

int __weak module_frob_arch_sections(Elf_Ehdr *hdr,
@@ -2342,11 +2400,8 @@ static void module_deallocate(struct module *mod, struct load_info *info)
{
percpu_modfree(mod);
module_arch_freeing_init(mod);
- module_memfree(mod->init_layout.base);
- module_memfree(mod->core_layout.base);
-#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
- vfree(mod->data_layout.base);
-#endif
+
+ free_mod_mem(mod);
}

int __weak module_finalize(const Elf_Ehdr *hdr,
@@ -2406,7 +2461,9 @@ static void do_mod_ctors(struct module *mod)
/* For freeing module_init on success, in case kallsyms traversing */
struct mod_initfree {
struct llist_node node;
- void *module_init;
+ void *init_text;
+ void *init_data;
+ void *init_rodata;
};

static void do_free_init(struct work_struct *w)
@@ -2420,7 +2477,9 @@ static void do_free_init(struct work_struct *w)

llist_for_each_safe(pos, n, list) {
initfree = container_of(pos, struct mod_initfree, node);
- module_memfree(initfree->module_init);
+ module_memfree(initfree->init_text);
+ module_memfree(initfree->init_data);
+ module_memfree(initfree->init_rodata);
kfree(initfree);
}
}
@@ -2441,13 +2500,16 @@ static noinline int do_init_module(struct module *mod)
{
int ret = 0;
struct mod_initfree *freeinit;
+ enum mod_mem_type type;

freeinit = kmalloc(sizeof(*freeinit), GFP_KERNEL);
if (!freeinit) {
ret = -ENOMEM;
goto fail;
}
- freeinit->module_init = mod->init_layout.base;
+ freeinit->init_text = mod->mod_mem[MOD_MEM_TYPE_INIT_TEXT].base;
+ freeinit->init_data = mod->mod_mem[MOD_MEM_TYPE_INIT_DATA].base;
+ freeinit->init_rodata = mod->mod_mem[MOD_MEM_TYPE_INIT_RODATA].base;

do_mod_ctors(mod);
/* Start the module */
@@ -2483,8 +2545,9 @@ static noinline int do_init_module(struct module *mod)
if (!mod->async_probe_requested)
async_synchronize_full();

- ftrace_free_mem(mod, mod->init_layout.base, mod->init_layout.base +
- mod->init_layout.size);
+ ftrace_free_mem(mod, mod->mod_mem[MOD_MEM_TYPE_INIT_TEXT].base,
+ mod->mod_mem[MOD_MEM_TYPE_INIT_TEXT].base +
+ mod->mod_mem[MOD_MEM_TYPE_INIT_TEXT].size);
mutex_lock(&module_mutex);
/* Drop initial reference. */
module_put(mod);
@@ -2496,11 +2559,10 @@ static noinline int do_init_module(struct module *mod)
module_enable_ro(mod, true);
mod_tree_remove_init(mod);
module_arch_freeing_init(mod);
- mod->init_layout.base = NULL;
- mod->init_layout.size = 0;
- mod->init_layout.ro_size = 0;
- mod->init_layout.ro_after_init_size = 0;
- mod->init_layout.text_size = 0;
+ for_init_mod_mem_type(type) {
+ mod->mod_mem[type].base = NULL;
+ mod->mod_mem[type].size = 0;
+ }
#ifdef CONFIG_DEBUG_INFO_BTF_MODULES
/* .BTF is not SHF_ALLOC and will get removed, so sanitize pointer */
mod->btf_data = NULL;
@@ -2674,6 +2736,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
struct module *mod;
long err = 0;
char *after_dashes;
+ enum mod_mem_type type;

/*
* Do the signature check (if any) first. All that
@@ -2885,7 +2948,10 @@ static int load_module(struct load_info *info, const char __user *uargs,
mutex_unlock(&module_mutex);
free_module:
/* Free lock-classes; relies on the preceding sync_rcu() */
- lockdep_free_key_range(mod->data_layout.base, mod->data_layout.size);
+ for_core_mod_mem_type(type) {
+ lockdep_free_key_range(mod->mod_mem[type].base,
+ mod->mod_mem[type].size);
+ }

module_deallocate(mod, info);
free_copy:
@@ -3036,20 +3102,21 @@ bool is_module_address(unsigned long addr)
struct module *__module_address(unsigned long addr)
{
struct module *mod;
- struct mod_tree_root *tree;

if (addr >= mod_tree.addr_min && addr <= mod_tree.addr_max)
- tree = &mod_tree;
+ goto lookup;
+
#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
- else if (addr >= mod_data_tree.addr_min && addr <= mod_data_tree.addr_max)
- tree = &mod_data_tree;
+ if (addr >= mod_tree.data_addr_min && addr <= mod_tree.data_addr_max)
+ goto lookup;
#endif
- else
- return NULL;

+ return NULL;
+
+lookup:
module_assert_mutex_or_preempt();

- mod = mod_find(addr, tree);
+ mod = mod_find(addr, &mod_tree);
if (mod) {
BUG_ON(!within_module(addr, mod));
if (mod->state == MODULE_STATE_UNFORMED)
@@ -3088,9 +3155,13 @@ struct module *__module_text_address(unsigned long addr)
{
struct module *mod = __module_address(addr);
if (mod) {
+ struct module_memory *text_mem, *init_text_mem;
+
+ text_mem = &mod->mod_mem[MOD_MEM_TYPE_TEXT];
+ init_text_mem = &mod->mod_mem[MOD_MEM_TYPE_INIT_TEXT];
/* Make sure it's within the text section. */
- if (!within(addr, mod->init_layout.base, mod->init_layout.text_size)
- && !within(addr, mod->core_layout.base, mod->core_layout.text_size))
+ if (!within(addr, text_mem->base, text_mem->size)
+ && !within(addr, init_text_mem->base, init_text_mem->size))
mod = NULL;
}
return mod;
diff --git a/kernel/module/procfs.c b/kernel/module/procfs.c
index cf5b9f1e6ec4..1fce4209aeb2 100644
--- a/kernel/module/procfs.c
+++ b/kernel/module/procfs.c
@@ -62,6 +62,16 @@ static void m_stop(struct seq_file *m, void *p)
mutex_unlock(&module_mutex);
}

+static unsigned int module_total_size(struct module *mod)
+{
+ int size = 0;
+ enum mod_mem_type type;
+
+ for_each_mod_mem_type(type)
+ size += mod->mod_mem[type].size;
+ return size;
+}
+
static int m_show(struct seq_file *m, void *p)
{
struct module *mod = list_entry(p, struct module, list);
@@ -73,10 +83,7 @@ static int m_show(struct seq_file *m, void *p)
if (mod->state == MODULE_STATE_UNFORMED)
return 0;

- size = mod->init_layout.size + mod->core_layout.size;
-#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
- size += mod->data_layout.size;
-#endif
+ size = module_total_size(mod);
seq_printf(m, "%s %u", mod->name, size);
print_unload_info(m, mod);

@@ -86,7 +93,7 @@ static int m_show(struct seq_file *m, void *p)
mod->state == MODULE_STATE_COMING ? "Loading" :
"Live");
/* Used by oprofile and other similar tools. */
- value = m->private ? NULL : mod->core_layout.base;
+ value = m->private ? NULL : mod->mod_mem[MOD_MEM_TYPE_TEXT].base;
seq_printf(m, " 0x%px", value);

/* Taints info */
diff --git a/kernel/module/strict_rwx.c b/kernel/module/strict_rwx.c
index 14fbea66f12f..1db895642afc 100644
--- a/kernel/module/strict_rwx.c
+++ b/kernel/module/strict_rwx.c
@@ -11,82 +11,40 @@
#include <linux/set_memory.h>
#include "internal.h"

-/*
- * LKM RO/NX protection: protect module's text/ro-data
- * from modification and any data from execution.
- *
- * General layout of module is:
- * [text] [read-only-data] [ro-after-init] [writable data]
- * text_size -----^ ^ ^ ^
- * ro_size ------------------------| | |
- * ro_after_init_size -----------------------------| |
- * size -----------------------------------------------------------|
- *
- * These values are always page-aligned (as is base) when
- * CONFIG_STRICT_MODULE_RWX is set.
- */
-
-/*
- * Since some arches are moving towards PAGE_KERNEL module allocations instead
- * of PAGE_KERNEL_EXEC, keep frob_text() and module_enable_x() independent of
- * CONFIG_STRICT_MODULE_RWX because they are needed regardless of whether we
- * are strict.
- */
-static void frob_text(const struct module_layout *layout,
- int (*set_memory)(unsigned long start, int num_pages))
-{
- set_memory((unsigned long)layout->base,
- PAGE_ALIGN(layout->text_size) >> PAGE_SHIFT);
-}
-
-static void frob_rodata(const struct module_layout *layout,
- int (*set_memory)(unsigned long start, int num_pages))
-{
- set_memory((unsigned long)layout->base + layout->text_size,
- (layout->ro_size - layout->text_size) >> PAGE_SHIFT);
-}
-
-static void frob_ro_after_init(const struct module_layout *layout,
- int (*set_memory)(unsigned long start, int num_pages))
+bool module_check_misalignment(const struct module *mod)
{
- set_memory((unsigned long)layout->base + layout->ro_size,
- (layout->ro_after_init_size - layout->ro_size) >> PAGE_SHIFT);
-}
+ enum mod_mem_type type;

-static void frob_writable_data(const struct module_layout *layout,
- int (*set_memory)(unsigned long start, int num_pages))
-{
- set_memory((unsigned long)layout->base + layout->ro_after_init_size,
- (layout->size - layout->ro_after_init_size) >> PAGE_SHIFT);
-}
+ if (!IS_ENABLED(CONFIG_STRICT_MODULE_RWX))
+ return false;

-static bool layout_check_misalignment(const struct module_layout *layout)
-{
- return WARN_ON(!PAGE_ALIGNED(layout->base)) ||
- WARN_ON(!PAGE_ALIGNED(layout->text_size)) ||
- WARN_ON(!PAGE_ALIGNED(layout->ro_size)) ||
- WARN_ON(!PAGE_ALIGNED(layout->ro_after_init_size)) ||
- WARN_ON(!PAGE_ALIGNED(layout->size));
+ for_each_mod_mem_type(type) {
+ if (WARN_ON(!PAGE_ALIGNED(mod->mod_mem[type].base)))
+ return true;
+ }
+ return false;
}

-bool module_check_misalignment(const struct module *mod)
+static void module_set_memory(
+ const struct module *mod, enum mod_mem_type type,
+ int (*set_memory)(unsigned long start, int num_pages))
{
- if (!IS_ENABLED(CONFIG_STRICT_MODULE_RWX))
- return false;
+ const struct module_memory *mod_mem = &mod->mod_mem[type];

- return layout_check_misalignment(&mod->core_layout) ||
- layout_check_misalignment(&mod->data_layout) ||
- layout_check_misalignment(&mod->init_layout);
+ set_vm_flush_reset_perms(mod_mem->base);
+ set_memory((unsigned long)mod_mem->base, mod_mem->size >> PAGE_SHIFT);
}

+/*
+ * Since some arches are moving towards PAGE_KERNEL module allocations instead
+ * of PAGE_KERNEL_EXEC, keep module_enable_x() independent of
+ * CONFIG_STRICT_MODULE_RWX because they are needed regardless of whether we
+ * are strict.
+ */
void module_enable_x(const struct module *mod)
{
- if (!PAGE_ALIGNED(mod->core_layout.base) ||
- !PAGE_ALIGNED(mod->init_layout.base))
- return;
-
- frob_text(&mod->core_layout, set_memory_x);
- frob_text(&mod->init_layout, set_memory_x);
+ module_set_memory(mod, MOD_MEM_TYPE_TEXT, set_memory_x);
+ module_set_memory(mod, MOD_MEM_TYPE_INIT_TEXT, set_memory_x);
}

void module_enable_ro(const struct module *mod, bool after_init)
@@ -98,28 +56,27 @@ void module_enable_ro(const struct module *mod, bool after_init)
return;
#endif

- set_vm_flush_reset_perms(mod->core_layout.base);
- set_vm_flush_reset_perms(mod->init_layout.base);
- frob_text(&mod->core_layout, set_memory_ro);
-
- frob_rodata(&mod->data_layout, set_memory_ro);
- frob_text(&mod->init_layout, set_memory_ro);
- frob_rodata(&mod->init_layout, set_memory_ro);
+ module_set_memory(mod, MOD_MEM_TYPE_TEXT, set_memory_ro);
+ module_set_memory(mod, MOD_MEM_TYPE_INIT_TEXT, set_memory_ro);
+ module_set_memory(mod, MOD_MEM_TYPE_RODATA, set_memory_ro);
+ module_set_memory(mod, MOD_MEM_TYPE_INIT_RODATA, set_memory_ro);

if (after_init)
- frob_ro_after_init(&mod->data_layout, set_memory_ro);
+ module_set_memory(mod, MOD_MEM_TYPE_RO_AFTER_INIT, set_memory_ro);
}

void module_enable_nx(const struct module *mod)
{
+ enum mod_mem_type type;
+
if (!IS_ENABLED(CONFIG_STRICT_MODULE_RWX))
return;

- frob_rodata(&mod->data_layout, set_memory_nx);
- frob_ro_after_init(&mod->data_layout, set_memory_nx);
- frob_writable_data(&mod->data_layout, set_memory_nx);
- frob_rodata(&mod->init_layout, set_memory_nx);
- frob_writable_data(&mod->init_layout, set_memory_nx);
+ for_each_mod_mem_type(type) {
+ if (type == MOD_MEM_TYPE_TEXT || type == MOD_MEM_TYPE_INIT_TEXT)
+ continue;
+ module_set_memory(mod, type, set_memory_nx);
+ }
}

int module_enforce_rwx_sections(Elf_Ehdr *hdr, Elf_Shdr *sechdrs,
diff --git a/kernel/module/tree_lookup.c b/kernel/module/tree_lookup.c
index 8ec5cfd60496..e9744da6f6af 100644
--- a/kernel/module/tree_lookup.c
+++ b/kernel/module/tree_lookup.c
@@ -21,16 +21,16 @@

static __always_inline unsigned long __mod_tree_val(struct latch_tree_node *n)
{
- struct module_layout *layout = container_of(n, struct module_layout, mtn.node);
+ struct module_memory *mod_mem = container_of(n, struct module_memory, mtn.node);

- return (unsigned long)layout->base;
+ return (unsigned long)mod_mem->base;
}

static __always_inline unsigned long __mod_tree_size(struct latch_tree_node *n)
{
- struct module_layout *layout = container_of(n, struct module_layout, mtn.node);
+ struct module_memory *mod_mem = container_of(n, struct module_memory, mtn.node);

- return (unsigned long)layout->size;
+ return (unsigned long)mod_mem->size;
}

static __always_inline bool
@@ -77,32 +77,33 @@ static void __mod_tree_remove(struct mod_tree_node *node, struct mod_tree_root *
*/
void mod_tree_insert(struct module *mod)
{
- mod->core_layout.mtn.mod = mod;
- mod->init_layout.mtn.mod = mod;
+ enum mod_mem_type type;

- __mod_tree_insert(&mod->core_layout.mtn, &mod_tree);
- if (mod->init_layout.size)
- __mod_tree_insert(&mod->init_layout.mtn, &mod_tree);
-
-#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
- mod->data_layout.mtn.mod = mod;
- __mod_tree_insert(&mod->data_layout.mtn, &mod_data_tree);
-#endif
+ for_each_mod_mem_type(type) {
+ mod->mod_mem[type].mtn.mod = mod;
+ if (mod->mod_mem[type].size)
+ __mod_tree_insert(&mod->mod_mem[type].mtn, &mod_tree);
+ }
}

void mod_tree_remove_init(struct module *mod)
{
- if (mod->init_layout.size)
- __mod_tree_remove(&mod->init_layout.mtn, &mod_tree);
+ enum mod_mem_type type;
+
+ for_init_mod_mem_type(type) {
+ if (mod->mod_mem[type].size)
+ __mod_tree_remove(&mod->mod_mem[type].mtn, &mod_tree);
+ }
}

void mod_tree_remove(struct module *mod)
{
- __mod_tree_remove(&mod->core_layout.mtn, &mod_tree);
- mod_tree_remove_init(mod);
-#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
- __mod_tree_remove(&mod->data_layout.mtn, &mod_data_tree);
-#endif
+ enum mod_mem_type type;
+
+ for_each_mod_mem_type(type) {
+ if (mod->mod_mem[type].size)
+ __mod_tree_remove(&mod->mod_mem[type].mtn, &mod_tree);
+ }
}

struct module *mod_find(unsigned long addr, struct mod_tree_root *tree)
--
2.30.2



2023-01-26 06:06:23

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH v2] module: replace module_layout with module_memory

On Wed, Jan 25, 2023 at 10:50:04AM -0800, Song Liu wrote:
> module_layout manages different types of memory (text, data, rodata, etc.)
> in one allocation, which is problematic for some reasons:
>
> 1. It is hard to enable CONFIG_STRICT_MODULE_RWX.
> 2. It is hard to use huge pages in modules (and not break strict rwx).
> 3. Many archs uses module_layout for arch-specific data, but it is not
> obvious how these data are used (are they RO, RX, or RW?)
>
> Improve the scenario by replacing 2 (or 3) module_layout per module with
> up to 7 module_memory per module:
>
> MOD_MEM_TYPE_TEXT,
> MOD_MEM_TYPE_DATA,
> MOD_MEM_TYPE_RODATA,
> MOD_MEM_TYPE_RO_AFTER_INIT,
> MOD_MEM_TYPE_INIT_TEXT,
> MOD_MEM_TYPE_INIT_DATA,
> MOD_MEM_TYPE_INIT_RODATA,
>
> and allocating them separately.

First thanks for doing this work!

This seems not to acknowledge the original goal of the first module_layout and
the latched rb-tree use, which was to speed up __module_address(), since it
*can* even be triggered from NMI context. I say this because the first question
that comes to me is the impact on __module_address() performance. I can't see
that changing much here, but I mention it because a similar consideration
should be made in case future changes modify this path.

Microbenchmarking something as trivial as __module_address() may not be very
useful on an idle system, but at the very least being able to compare before and
after, even when idle, may be useful *if* you eventually make some more radical
changes here. The modules-related kernel/kallsyms_selftest.c did that for
kallsyms_lookup_name() and friends just recently for a minor performance
enhancement.

At a high level this is perhaps my only concern so far while reviewing this
patch, and I am glad it is now clear that addressing modules is a requirement.
The rest seems like a highly welcome cleanup, even though the diffstat does not
reflect that.

> Various archs use module_layout for different data. These data are put
> into different module_memory based on their location in module_layout.
> IOW, data that used to go with text is allocated with MOD_MEM_TYPE_TEXT;
> data that used to go with data is allocated with MOD_MEM_TYPE_DATA, etc.

I think the commit log should document a bit of the rationale for why
ARCH_WANTS_MODULES_DATA_IN_VMALLOC *can* be tidied up so well here.

> Signed-off-by: Song Liu <[email protected]>
> Cc: Luis Chamberlain <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
>
> ---
>
> This is the preparation work for the type aware module_alloc() discussed
> in [1]. While this work is not covered much in the discussion, it is a
> critical step of the effort.
>
> As this part grows pretty big (~1000 lines, + and -), I would like get
> some feedback on it, so that I know it is on the right track.
>
> Please share your comments. Thanks!
>
> Test coverage: Tested on x86_64.

I will likely merge this onto modules-next soon, not because I think it is
ready, but because I think it *is* mostly ready and the next thing we need is
exposure and testing. rc5 is pretty late to consider this for v6.3, so
hopefully for this cycle we can at least settle on something which can sit in
linux-next once it reopens after v6.3-rc1 is released.

> Build tested by kernel test bot in [2]. The only regression in [2] was a
> typo in parisc, which is also fixed.
>
> [1] https://lore.kernel.org/linux-mm/[email protected]/T/#u

You still never addressed my performance suggestions, so don't be surprised if
I insist later. Yes, you can use existing performance benchmarks, especially
now with modules as a hard requirement, to show gains. So I'd like to clarify
that if I'm reviewing late it is because:

a) my modules patch review queue has been high as of late
b) you seem not to have taken these performance suggestions into consideration
before, and so I tend to put such patches at the end of my review queue.

> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index d02d39c7174e..c90eb2ba0985 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -925,11 +929,31 @@ static ssize_t store_uevent(struct module_attribute *mattr,
> struct module_attribute module_uevent =
> __ATTR(uevent, 0200, NULL, store_uevent);
>
> +#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
> +
> +static ssize_t show_coresize(struct module_attribute *mattr,
> + struct module_kobject *mk, char *buffer)
> +{
> + enum mod_mem_type type;

This is unused.

> @@ -1389,16 +1466,29 @@ unsigned int __weak arch_mod_section_prepend(struct module *mod,
> return 0;
> }
>
> -/* Update size with this section: return offset. */
> -long module_get_offset(struct module *mod, unsigned int *size,
> - Elf_Shdr *sechdr, unsigned int section)
> +/*
> + * Use highest 4 bits of sh_entsize to store the mod_mem_type of this
> + * section. This leaves 28 bits for offset on 32-bit systems, which is
> + * about 256 MiB (WARN_ON_ONCE if we exceed that).
> + */
> +
> +#define SH_ENTSIZE_TYPE_BITS 4
> +#define SH_ENTSIZE_TYPE_SHIFT (BITS_PER_LONG - SH_ENTSIZE_TYPE_BITS)
> +#define SH_ENTSIZE_TYPE_MASK ((1UL << SH_ENTSIZE_TYPE_BITS) - 1)
> +#define SH_ENTSIZE_OFFSET_MASK ((1UL << (BITS_PER_LONG - SH_ENTSIZE_TYPE_BITS)) - 1)
> +
> +long module_get_offset_and_type(struct module *mod, enum mod_mem_type type,
> + Elf_Shdr *sechdr, unsigned int section)
> {
> - long ret;
> + long offset;
> + long mask = ((unsigned long)(type) & SH_ENTSIZE_TYPE_MASK) << SH_ENTSIZE_TYPE_SHIFT;
>
> - *size += arch_mod_section_prepend(mod, section);
> - ret = ALIGN(*size, sechdr->sh_addralign ?: 1);
> - *size = ret + sechdr->sh_size;
> - return ret;
> + mod->mod_mem[type].size += arch_mod_section_prepend(mod, section);
> + offset = ALIGN(mod->mod_mem[type].size, sechdr->sh_addralign ?: 1);
> + mod->mod_mem[type].size = offset + sechdr->sh_size;
> +
> + WARN_ON_ONCE(offset & mask);
> + return offset | mask;
> }

Could you split this somehow with the mask stuff up into its own
non-functional patch first somehow to make this easier to review?

No worries if you can't.

> @@ -2117,74 +2193,45 @@ static int find_module_sections(struct module *mod, struct load_info *info)
>
> static int move_module(struct module *mod, struct load_info *info)
> {
> + enum mod_mem_type type;
> int i;
> void *ptr;
>
> - /* Do the allocs. */
> - ptr = module_alloc(mod->core_layout.size);
> - /*
> - * The pointer to this block is stored in the module structure
> - * which is inside the block. Just mark it as not being a
> - * leak.
> - */
> - kmemleak_not_leak(ptr);
> - if (!ptr)
> - return -ENOMEM;
> -
> - memset(ptr, 0, mod->core_layout.size);
> - mod->core_layout.base = ptr;
> + for_each_mod_mem_type(type) {
> + if (!mod->mod_mem[type].size) {
> + mod->mod_mem[type].base = NULL;
> + continue;
> + }
> + mod->mod_mem[type].size = PAGE_ALIGN(mod->mod_mem[type].size);
> + if (mod_mem_use_vmalloc(type))
> + ptr = vzalloc(mod->mod_mem[type].size);
> + else
> + ptr = module_alloc(mod->mod_mem[type].size);

This form to check for mod_mem_use_vmalloc() is used twice, how about
a helper to just do it one line for us?

> @@ -2195,6 +2242,14 @@ static int move_module(struct module *mod, struct load_info *info)
> }
>
> return 0;
> +out_enomem:
> + for (type--; type >= 0; type--) {
> + if (mod_mem_use_vmalloc(type))
> + vfree(mod->mod_mem[type].base);
> + else
> + module_memfree(mod->mod_mem[type].base);
> + }
> + return -ENOMEM;
> }

Here's the other user.

> diff --git a/kernel/module/strict_rwx.c b/kernel/module/strict_rwx.c

This was the hardest file to review and so I ask the same, if a separate
non-functional change could be done first to make the changes easier
to review.

No worries if not.

Reviewed-by: Luis Chamberlain <[email protected]>

I'll queue this onto modules-testing to see what blows up. After at
least one more review I'll queue this onto modules-next.

Luis

2023-01-26 06:13:36

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH v2] module: replace module_layout with module_memory

On Wed, Jan 25, 2023 at 10:06:04PM -0800, Luis Chamberlain wrote:
> I'll queue this onto modules-testing to see what blows up. After at
> least one more review I'll queue this onto modules-next.

I had to manually address a small conflict on modules-next. For your next spin,
can you base it on modules-next [0]?

[0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=modules-next

Luis

2023-01-26 20:02:04

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH v2] module: replace module_layout with module_memory

Hi Luis,

Thanks for your kind review!

On Wed, Jan 25, 2023 at 10:06 PM Luis Chamberlain <[email protected]> wrote:
>
[...]
> >
> > MOD_MEM_TYPE_TEXT,
> > MOD_MEM_TYPE_DATA,
> > MOD_MEM_TYPE_RODATA,
> > MOD_MEM_TYPE_RO_AFTER_INIT,
> > MOD_MEM_TYPE_INIT_TEXT,
> > MOD_MEM_TYPE_INIT_DATA,
> > MOD_MEM_TYPE_INIT_RODATA,
> >
> > and allocating them separately.
>
> First thanks for doing this work!
>
> This seems to not acknolwedge the original goal of the first module_layout and
> the latched rb-tree use, and that was was for speeding up __module_address()
> since it *can* even be triggered on NMIs. I say this because the first question
> that comes to me is the impact to performance on __module_address() I can't
> see that changing much here, but mention it as it similar consideration
> should be made in case future changes modify this path.

To make sure I understand this correctly: do you mean we need something like
the following in the commit log?

"""
This adds slightly more entries to mod_tree (from up to 3 entries per module, to
up to 7 entries per module). However, this at most adds a small constant
overhead to __module_address(), which is expected to be fast.
"""

>
> Microbenching something so trivial as __module_address() may not be as useful
> for an idle system, at the very least being able to compare before and after
> even on idle may be useful *if* you eventually do some more radical changes
> here. Modules-related kernel/kallsyms_selftest.c did that for kallsyms_lookup_name()
> and friends just recently for a minor performance enhancement.

kernel/kallsyms_selftest.c is new to me. I will give it a try.

>
> At a high level it is perhaps my only conern so far while reviewing this patch,
> and I am glad it is now clear that addressing modules is a requirement. The rest
> seems like a highly welcomed cleanup even though the diffstat does not reflect
> that.
>
> > Various archs use module_layout for different data. These data are put
> > into different module_memory based on their location in module_layout.
> > IOW, data that used to go with text is allocated with MOD_MEM_TYPE_TEXT;
> > data that used to go with data is allocated with MOD_MEM_TYPE_DATA, etc.
>
> I think the commit log should document a bit the rationale for why
> ARCH_WANTS_MODULES_DATA_IN_VMALLOC *can* get tidied up so well here.

Great point! Will add this in v3.

>
> > Signed-off-by: Song Liu <[email protected]>
> > Cc: Luis Chamberlain <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> >
> > ---
> >
> > This is the preparation work for the type aware module_alloc() discussed
> > in [1]. While this work is not covered much in the discussion, it is a
> > critical step of the effort.
> >
> > As this part grows pretty big (~1000 lines, + and -), I would like to get
> > some feedback on it, so that I know it is on the right track.
> >
> > Please share your comments. Thanks!
> >
> > Test coverage: Tested on x86_64.
>
> I will likely merge this onto modules-next soon, not because I think it is
> ready, but just because I think it *is* mostly ready and the next thing
> we need is exposure and testing. rc5 is pretty late to consider this
> for v6.3 and so hopefully for this cycle we can at least settle on
> something which will sit in linux-next once the respective linux-next
> after v6.3-rc1 is released.

Yes, this definitely needs more tests. Given that different archs use
module_layout in all sorts of ways, I will be very surprised if I updated
all of them correctly (though I tried hard to).

>
> > Build tested by kernel test bot in [2]. The only regression in [2] was a
> > typo in parisc, which is also fixed.
> >
> > [1] https://lore.kernel.org/linux-mm/[email protected]/T/#u
>
> You still never addressed my performance suggestions so don't be
> surprised if I insist later. Yes you can use existing performance
> > benchmarks, especially now with modules as a hard requirement, to
> show gains. So I'd like to clarify that if I'm reviewing late it is
> because:
>
> a) my modules patch review queue has been high as of late
> b) you seem to not have taken these performance suggestions into consideration
> > before, and so I tend to put it at the end of my queue for review.

I think it will be a lot easier to run performance tests with the
module support. Let's see what we can do when we get to the
performance test part.

[...]
>
> > @@ -1389,16 +1466,29 @@ unsigned int __weak arch_mod_section_prepend(struct module *mod,
> > return 0;
> > }
> >
> > -/* Update size with this section: return offset. */
> > -long module_get_offset(struct module *mod, unsigned int *size,
> > - Elf_Shdr *sechdr, unsigned int section)
> > +/*
> > + * Use highest 4 bits of sh_entsize to store the mod_mem_type of this
> > + * section. This leaves 28 bits for offset on 32-bit systems, which is
> > + * about 256 MiB (WARN_ON_ONCE if we exceed that).
> > + */
> > +
> > +#define SH_ENTSIZE_TYPE_BITS 4
> > +#define SH_ENTSIZE_TYPE_SHIFT (BITS_PER_LONG - SH_ENTSIZE_TYPE_BITS)
> > +#define SH_ENTSIZE_TYPE_MASK ((1UL << SH_ENTSIZE_TYPE_BITS) - 1)
> > +#define SH_ENTSIZE_OFFSET_MASK ((1UL << (BITS_PER_LONG - SH_ENTSIZE_TYPE_BITS)) - 1)
> > +
> > +long module_get_offset_and_type(struct module *mod, enum mod_mem_type type,
> > + Elf_Shdr *sechdr, unsigned int section)
> > {
> > - long ret;
> > + long offset;
> > + long mask = ((unsigned long)(type) & SH_ENTSIZE_TYPE_MASK) << SH_ENTSIZE_TYPE_SHIFT;
> >
> > - *size += arch_mod_section_prepend(mod, section);
> > - ret = ALIGN(*size, sechdr->sh_addralign ?: 1);
> > - *size = ret + sechdr->sh_size;
> > - return ret;
> > + mod->mod_mem[type].size += arch_mod_section_prepend(mod, section);
> > + offset = ALIGN(mod->mod_mem[type].size, sechdr->sh_addralign ?: 1);
> > + mod->mod_mem[type].size = offset + sechdr->sh_size;
> > +
> > + WARN_ON_ONCE(offset & mask);
> > + return offset | mask;
> > }
>
> Could you split this somehow with the mask stuff up into its own
> non-functional patch first somehow to make this easier to review?
>
> No worries if you can't.

This is not very straightforward with INIT_OFFSET_MASK. So I will keep
it as-is for now. Btw, while looking into this, I found a leftover
INIT_OFFSET_MASK that I will remove in v3.

>
> > @@ -2117,74 +2193,45 @@ static int find_module_sections(struct module *mod, struct load_info *info)
> >
> > static int move_module(struct module *mod, struct load_info *info)
> > {
> > + enum mod_mem_type type;
> > int i;
> > void *ptr;
> >
> > - /* Do the allocs. */
> > - ptr = module_alloc(mod->core_layout.size);
> > - /*
> > - * The pointer to this block is stored in the module structure
> > - * which is inside the block. Just mark it as not being a
> > - * leak.
> > - */
> > - kmemleak_not_leak(ptr);
> > - if (!ptr)
> > - return -ENOMEM;
> > -
> > - memset(ptr, 0, mod->core_layout.size);
> > - mod->core_layout.base = ptr;
> > + for_each_mod_mem_type(type) {
> > + if (!mod->mod_mem[type].size) {
> > + mod->mod_mem[type].base = NULL;
> > + continue;
> > + }
> > + mod->mod_mem[type].size = PAGE_ALIGN(mod->mod_mem[type].size);
> > + if (mod_mem_use_vmalloc(type))
> > + ptr = vzalloc(mod->mod_mem[type].size);
> > + else
> > + ptr = module_alloc(mod->mod_mem[type].size);
>
> This form to check for mod_mem_use_vmalloc() is used twice, how about
> a helper to just do it one line for us?

Good idea! Will update in v3.

>
> > @@ -2195,6 +2242,14 @@ static int move_module(struct module *mod, struct load_info *info)
> > }
> >
> > return 0;
> > +out_enomem:
> > + for (type--; type >= 0; type--) {
> > + if (mod_mem_use_vmalloc(type))
> > + vfree(mod->mod_mem[type].base);
> > + else
> > + module_memfree(mod->mod_mem[type].base);
> > + }
> > + return -ENOMEM;
> > }
>
> Here's the other user.
>
> > diff --git a/kernel/module/strict_rwx.c b/kernel/module/strict_rwx.c
>
> This was the hardest file to review and so I ask the same, if a separate
> non-functional change could be done first to make the changes easier
> to review.

AFAICT, strict_rwx.c is pretty straightforward after the change. And it is
just ~100 LOC. The non-functional change is very likely to be more
complicated. So I will keep this part as-is.

>
> No worries if not.
>
> Reviewed-by: Luis Chamberlain <[email protected]>

I will fix these, rebase, and send v3.

Thanks again!
Song

2023-01-26 21:00:52

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH v2] module: replace module_layout with module_memory

Guenter Roeck,

Any chance you can give this branch a good spin on your multi-arch setup
to see what may blow up?

https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=modules-testing

On Thu, Jan 26, 2023 at 12:01:40PM -0800, Song Liu wrote:
> Hi Luis,
>
> Thanks for your kind review!
>
> On Wed, Jan 25, 2023 at 10:06 PM Luis Chamberlain <[email protected]> wrote:
> >
> [...]
> > >
> > > MOD_MEM_TYPE_TEXT,
> > > MOD_MEM_TYPE_DATA,
> > > MOD_MEM_TYPE_RODATA,
> > > MOD_MEM_TYPE_RO_AFTER_INIT,
> > > MOD_MEM_TYPE_INIT_TEXT,
> > > MOD_MEM_TYPE_INIT_DATA,
> > > MOD_MEM_TYPE_INIT_RODATA,
> > >
> > > and allocating them separately.
> >
> > First thanks for doing this work!
> >
> > This seems to not acknowledge the original goal of the first module_layout and
> > the latched rb-tree use, and that was for speeding up __module_address()
> > since it *can* even be triggered on NMIs. I say this because the first question
> > that comes to me is the impact on __module_address() performance. I can't
> > see that changing much here, but mention it because a similar consideration
> > should be made in case future changes modify this path.
>
> To make sure I understand this correctly: do you mean we need something like
> the following in the commit log?
>
> """
> This adds slightly more entries to mod_tree (from up to 3 entries per module, to
> up to 7 entries per module). However, this at most adds a small constant
> overhead to __module_address(), which is expected to be fast.
> """

Yes, I think this is very useful information for the reviewer.

> > Microbenching something so trivial as __module_address() may not be as useful
> > for an idle system, at the very least being able to compare before and after
> > even on idle may be useful *if* you eventually do some more radical changes
> > here. Modules-related kernel/kallsyms_selftest.c did that for kallsyms_lookup_name()
> > and friends just recently for a minor performance enhancement.
>
> kernel/kallsyms_selftest.c is new to me. I will give it a try.

It was just merged; be sure to have da35048f2600 ("kallsyms: Fix
scheduling with interrupts disabled in self-test").

> > > Signed-off-by: Song Liu <[email protected]>
> > > Cc: Luis Chamberlain <[email protected]>
> > > Cc: Thomas Gleixner <[email protected]>
> > > Cc: Peter Zijlstra <[email protected]>
> > >
> > > ---
> > >
> > > This is the preparation work for the type aware module_alloc() discussed
> > > in [1]. While this work is not covered much in the discussion, it is a
> > > critical step of the effort.
> > >
> > > As this part grows pretty big (~1000 lines, + and -), I would like to get
> > > some feedback on it, so that I know it is on the right track.
> > >
> > > Please share your comments. Thanks!
> > >
> > > Test coverage: Tested on x86_64.
> >
> > I will likely merge this onto modules-next soon, not because I think it is
> > ready, but just because I think it *is* mostly ready and the next thing
> > we need is exposure and testing. rc5 is pretty late to consider this
> > for v6.3 and so hopefully for this cycle we can at least settle on
> > something which will sit in linux-next once the respective linux-next
> > after v6.3-rc1 is released.
>
> Yes, this definitely needs more tests. Given that different archs use
> module_layout in all sorts of ways, I will be very surprised if I updated
> all of them correctly (though I tried hard to).

OK so let's be patient with testing. Getting help from Guenter here
can probably speed up finding issues.

> > > Build tested by kernel test bot in [2]. The only regression in [2] was a
> > > typo in parisc, which is also fixed.
> > >
> > > [1] https://lore.kernel.org/linux-mm/[email protected]/T/#u
> >
> > You still never addressed my performance suggestions so don't be
> > surprised if I insist later. Yes you can use existing performance
> > > benchmarks, especially now with modules as a hard requirement, to
> > show gains. So I'd like to clarify that if I'm reviewing late it is
> > because:
> >
> > a) my modules patch review queue has been high as of late
> > b) you seem to not have taken these performance suggestions into consideration
> > before, and so I tend to put it at the end of my queue for review.
>
> I think it will be a lot easier to run performance tests with the
> module support. Let's see what we can do when we get to the
> performance test part.

Fantastic. Clearly I'm interested in being able to reproduce results, so I
will email you offline about some techniques I've used to make testing with
modules easier.

Luis

2023-01-27 13:15:17

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH v2] module: replace module_layout with module_memory

On Thu, Jan 26, 2023 at 01:00:43PM -0800, Luis Chamberlain wrote:
> Guenter Roeck,
>
> Any chance you can give this branch a good spin on your multi-arch setup
> to see what may blow up?
>
I assume I should test v3?

Guenter

> https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=modules-testing
>
> On Thu, Jan 26, 2023 at 12:01:40PM -0800, Song Liu wrote:
> > Hi Luis,
> >
> > Thanks for your kind review!
> >
> > On Wed, Jan 25, 2023 at 10:06 PM Luis Chamberlain <[email protected]> wrote:
> > >
> > [...]
> > > >
> > > > MOD_MEM_TYPE_TEXT,
> > > > MOD_MEM_TYPE_DATA,
> > > > MOD_MEM_TYPE_RODATA,
> > > > MOD_MEM_TYPE_RO_AFTER_INIT,
> > > > MOD_MEM_TYPE_INIT_TEXT,
> > > > MOD_MEM_TYPE_INIT_DATA,
> > > > MOD_MEM_TYPE_INIT_RODATA,
> > > >
> > > > and allocating them separately.
> > >
> > > First thanks for doing this work!
> > >
> > > This seems to not acknowledge the original goal of the first module_layout and
> > > the latched rb-tree use, and that was for speeding up __module_address()
> > > since it *can* even be triggered on NMIs. I say this because the first question
> > > that comes to me is the impact on __module_address() performance. I can't
> > > see that changing much here, but mention it because a similar consideration
> > > should be made in case future changes modify this path.
> >
> > To make sure I understand this correctly: do you mean we need something like
> > the following in the commit log?
> >
> > """
> > This adds slightly more entries to mod_tree (from up to 3 entries per module, to
> > up to 7 entries per module). However, this at most adds a small constant
> > overhead to __module_address(), which is expected to be fast.
> > """
>
> Yes, I think this is very useful information for the reviewer.
>
> > > Microbenching something so trivial as __module_address() may not be as useful
> > > for an idle system, at the very least being able to compare before and after
> > > even on idle may be useful *if* you eventually do some more radical changes
> > > here. Modules-related kernel/kallsyms_selftest.c did that for kallsyms_lookup_name()
> > > and friends just recently for a minor performance enhancement.
> >
> > kernel/kallsyms_selftest.c is new to me. I will give it a try.
>
> It was just merged; be sure to have da35048f2600 ("kallsyms: Fix
> scheduling with interrupts disabled in self-test").
>
> > > > Signed-off-by: Song Liu <[email protected]>
> > > > Cc: Luis Chamberlain <[email protected]>
> > > > Cc: Thomas Gleixner <[email protected]>
> > > > Cc: Peter Zijlstra <[email protected]>
> > > >
> > > > ---
> > > >
> > > > This is the preparation work for the type aware module_alloc() discussed
> > > > in [1]. While this work is not covered much in the discussion, it is a
> > > > critical step of the effort.
> > > >
> > > > As this part grows pretty big (~1000 lines, + and -), I would like to get
> > > > some feedback on it, so that I know it is on the right track.
> > > >
> > > > Please share your comments. Thanks!
> > > >
> > > > Test coverage: Tested on x86_64.
> > >
> > > I will likely merge this onto modules-next soon, not because I think it is
> > > ready, but just because I think it *is* mostly ready and the next thing
> > > we need is exposure and testing. rc5 is pretty late to consider this
> > > for v6.3 and so hopefully for this cycle we can at least settle on
> > > something which will sit in linux-next once the respective linux-next
> > > after v6.3-rc1 is released.
> >
> > Yes, this definitely needs more tests. Given that different archs use
> > module_layout in all sorts of ways, I will be very surprised if I updated
> > all of them correctly (though I tried hard to).
>
> OK so let's be patient with testing. Getting help from Guenter here
> can probably speed up finding issues.
>
> > > > Build tested by kernel test bot in [2]. The only regression in [2] was a
> > > > typo in parisc, which is also fixed.
> > > >
> > > > [1] https://lore.kernel.org/linux-mm/[email protected]/T/#u
> > >
> > > You still never addressed my performance suggestions so don't be
> > > surprised if I insist later. Yes you can use existing performance
> > > > benchmarks, especially now with modules as a hard requirement, to
> > > show gains. So I'd like to clarify that if I'm reviewing late it is
> > > because:
> > >
> > > a) my modules patch review queue has been high as of late
> > > b) you seem to not have taken these performance suggestions into consideration
> > > before, and so I tend to put it at the end of my queue for review.
> >
> > I think it will be a lot easier to run performance tests with the
> > module support. Let's see what we can do when we get to the
> > performance test part.
>
> Fantastic. Clearly I'm interested in being able to reproduce results, so I
> will email you offline about some techniques I've used to make testing with
> modules easier.
>
> Luis

2023-01-27 16:44:01

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH v2] module: replace module_layout with module_memory

Hi Guenter,

On Fri, Jan 27, 2023 at 5:13 AM Guenter Roeck <[email protected]> wrote:
>
> On Thu, Jan 26, 2023 at 01:00:43PM -0800, Luis Chamberlain wrote:
> > Guenter Roeck,
> >
> > Any chance you can give this branch a good spin on your multi-arch setup
> > to see what may blow up?
> >
> > I assume I should test v3?

Yes, please. The one in the modules-testing [1] branch is already v3.

Thanks,
Song

[1] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=modules-testing

2023-01-27 16:59:48

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH v2] module: replace module_layout with module_memory

On Fri, Jan 27, 2023 at 08:42:57AM -0800, Song Liu wrote:
> Hi Guenter,
>
> On Fri, Jan 27, 2023 at 5:13 AM Guenter Roeck <[email protected]> wrote:
> >
> > On Thu, Jan 26, 2023 at 01:00:43PM -0800, Luis Chamberlain wrote:
> > > Guenter Roeck,
> > >
> > > Any chance you can give this branch a good spin on your multi-arch setup
> > > > to see what may blow up?
> > >
> > > I assume I should test v3?
>
> Yes, please. The one in the modules-testing [1] branch is already v3.
>

I started a test on v6.2-rc5-52-gc96fb00e9ddf.

Guenter