2022-07-29 09:34:13

by Xi Ruoyao

Subject: [PATCH v4 0/4] LoongArch: Support new relocation types

Version 2.00 of the LoongArch ELF ABI specification introduced new
relocation types, and the development trees of Binutils and GCC have
started to use them. If the kernel is built with the latest snapshot of
Binutils or GCC, it will fail to load modules because of unrecognized
relocation types in them.

Add support for GOT and new relocation types for the module loader, so
the kernel (with modules) can be built with the "normal" code model and
function properly.

This series does not break compatibility with old toolchains using
stack-based relocation types, so with the patches applied the kernel can
be built with both old and new toolchains.

Tested by building the kernel with both Binutils & GCC master branch and
my system Binutils & GCC (without new relocation type support), running
both the builds with 35 in-tree modules loaded, and loading one module
with 20 GOT loads (loaded addresses verified by comparing with
/proc/kallsyms).

Changes from v3 to v4:

- No code change. Reword the commit message of the 3rd patch again
based on suggestion from Huacai.

Changes from v2 to v3:

- Use `union loongarch_instruction` instead of explicit bit shifts
applying the relocation. Suggested by Youling.
- For R_LARCH_B26, move the alignment check before the range check to be
consistent with stack pop relocations. Suggested by Youling.
- Reword the commit message of the 3rd patch. Suggested by Huacai.

Changes from v1 to v2:

- Fix a stupid programming error (confusion between the number of PLT
entries and the number of GOT entries). (Bug spotted by Youling).
- Synthesize the _GLOBAL_OFFSET_TABLE_ symbol with module.lds, instead
of faking it at runtime. The 3rd patch from V1 is now merged into
the 1st patch because it would be a one-line change. (Suggested by
Jinyang).
- Keep reloc_rela_handlers[] ordered by the relocation type ID.
(Suggested by Youling).
- Remove -fplt along with -Wa,-mla-* options because it's the default.
(Suggested by Youling).

Xi Ruoyao (4):
LoongArch: Add section of GOT for kernel module
LoongArch: Support R_LARCH_SOP_PUSH_GPREL relocation type in kernel
module
LoongArch: Remove -fplt and -Wa,-mla-* from CFLAGS
LoongArch: Support modules with new relocation types

arch/loongarch/Makefile | 4 --
arch/loongarch/include/asm/elf.h | 37 ++++++++++
arch/loongarch/include/asm/module.h | 23 ++++++
arch/loongarch/include/asm/module.lds.h | 1 +
arch/loongarch/kernel/head.S | 10 +--
arch/loongarch/kernel/module-sections.c | 51 +++++++++++--
arch/loongarch/kernel/module.c | 96 +++++++++++++++++++++++++
7 files changed, 209 insertions(+), 13 deletions(-)

--
2.37.0



2022-07-29 09:34:43

by Xi Ruoyao

Subject: [PATCH v4 4/4] LoongArch: Support modules with new relocation types

If GAS 2.40 and/or GCC 13 is used to build the kernel, the modules will
contain R_LARCH_B26, R_LARCH_PCALA_HI20, R_LARCH_PCALA_LO12,
R_LARCH_GOT_PC_HI20, and R_LARCH_GOT_PC_LO12 relocations. Support them
in the module loader so that a kernel built with the latest toolchain is
able to load its modules.

Signed-off-by: Xi Ruoyao <[email protected]>
---
arch/loongarch/include/asm/elf.h | 37 +++++++++++
arch/loongarch/kernel/module-sections.c | 12 +++-
arch/loongarch/kernel/module.c | 85 +++++++++++++++++++++++++
3 files changed, 132 insertions(+), 2 deletions(-)

diff --git a/arch/loongarch/include/asm/elf.h b/arch/loongarch/include/asm/elf.h
index 5f3ff4781fda..7af0cebf28d7 100644
--- a/arch/loongarch/include/asm/elf.h
+++ b/arch/loongarch/include/asm/elf.h
@@ -74,6 +74,43 @@
#define R_LARCH_SUB64 56
#define R_LARCH_GNU_VTINHERIT 57
#define R_LARCH_GNU_VTENTRY 58
+#define R_LARCH_B16 64
+#define R_LARCH_B21 65
+#define R_LARCH_B26 66
+#define R_LARCH_ABS_HI20 67
+#define R_LARCH_ABS_LO12 68
+#define R_LARCH_ABS64_LO20 69
+#define R_LARCH_ABS64_HI12 70
+#define R_LARCH_PCALA_HI20 71
+#define R_LARCH_PCALA_LO12 72
+#define R_LARCH_PCALA64_LO20 73
+#define R_LARCH_PCALA64_HI12 74
+#define R_LARCH_GOT_PC_HI20 75
+#define R_LARCH_GOT_PC_LO12 76
+#define R_LARCH_GOT64_PC_LO20 77
+#define R_LARCH_GOT64_PC_HI12 78
+#define R_LARCH_GOT_HI20 79
+#define R_LARCH_GOT_LO12 80
+#define R_LARCH_GOT64_LO20 81
+#define R_LARCH_GOT64_HI12 82
+#define R_LARCH_TLS_LE_HI20 83
+#define R_LARCH_TLS_LE_LO12 84
+#define R_LARCH_TLS_LE64_LO20 85
+#define R_LARCH_TLS_LE64_HI12 86
+#define R_LARCH_TLS_IE_PC_HI20 87
+#define R_LARCH_TLS_IE_PC_LO12 88
+#define R_LARCH_TLS_IE64_PC_LO20 89
+#define R_LARCH_TLS_IE64_PC_HI12 90
+#define R_LARCH_TLS_IE_HI20 91
+#define R_LARCH_TLS_IE_LO12 92
+#define R_LARCH_TLS_IE64_LO20 93
+#define R_LARCH_TLS_IE64_HI12 94
+#define R_LARCH_TLS_LD_PC_HI20 95
+#define R_LARCH_TLS_LD_HI20 96
+#define R_LARCH_TLS_GD_PC_HI20 97
+#define R_LARCH_TLS_GD_HI20 98
+#define R_LARCH_32_PCREL 99
+#define R_LARCH_RELAX 100

#ifndef ELF_ARCH

diff --git a/arch/loongarch/kernel/module-sections.c b/arch/loongarch/kernel/module-sections.c
index 36a77771d18c..8c0e4ad048cc 100644
--- a/arch/loongarch/kernel/module-sections.c
+++ b/arch/loongarch/kernel/module-sections.c
@@ -76,12 +76,20 @@ static void count_max_entries(Elf_Rela *relas, int num,

for (i = 0; i < num; i++) {
type = ELF_R_TYPE(relas[i].r_info);
- if (type == R_LARCH_SOP_PUSH_PLT_PCREL) {
+ switch (type) {
+ case R_LARCH_SOP_PUSH_PLT_PCREL:
+ case R_LARCH_B26:
if (!duplicate_rela(relas, i))
(*plts)++;
- } else if (type == R_LARCH_SOP_PUSH_GPREL)
+ break;
+ case R_LARCH_SOP_PUSH_GPREL:
+ case R_LARCH_GOT_PC_HI20:
if (!duplicate_rela(relas, i))
(*gots)++;
+ break;
+ default:
+ /* Do nothing. */
+ }
}
}

diff --git a/arch/loongarch/kernel/module.c b/arch/loongarch/kernel/module.c
index 3ac4fbb5f109..c7b40150e1f0 100644
--- a/arch/loongarch/kernel/module.c
+++ b/arch/loongarch/kernel/module.c
@@ -291,6 +291,86 @@ static int apply_r_larch_add_sub(struct module *mod, u32 *location, Elf_Addr v,
}
}

+static int apply_r_larch_b26(struct module *mod, u32 *location, Elf_Addr v,
+ s64 *rela_stack, size_t *rela_stack_top, unsigned int type)
+{
+ ptrdiff_t offset = (void *)v - (void *)location;
+ union loongarch_instruction *insn = (union loongarch_instruction *)location;
+
+ if (offset >= SZ_128M)
+ v = module_emit_plt_entry(mod, v);
+
+ if (offset < -SZ_128M)
+ v = module_emit_plt_entry(mod, v);
+
+ offset = (void *)v - (void *)location;
+
+ if (offset & 3) {
+ pr_err("module %s: jump offset = 0x%llx unaligned! dangerous R_LARCH_B26 (%u) relocation\n",
+ mod->name, (long long)offset, type);
+ return -ENOEXEC;
+ }
+
+ if (!signed_imm_check(offset, 28)) {
+ pr_err("module %s: jump offset = 0x%llx overflow! dangerous R_LARCH_B26 (%u) relocation\n",
+ mod->name, (long long)offset, type);
+ return -ENOEXEC;
+ }
+
+ offset >>= 2;
+ insn->reg0i26_format.immediate_l = offset & 0xffff;
+ insn->reg0i26_format.immediate_h = (offset >> 16) & 0x3ff;
+ return 0;
+}
+
+static int apply_r_larch_pcala_hi20(struct module *mod, u32 *location,
+ Elf_Addr v, s64 *rela_stack, size_t *rela_stack_top,
+ unsigned int type)
+{
+ ptrdiff_t offset = (void *)((v + 0x800) & ~0xfff) -
+ (void *)((Elf_Addr)location & ~0xfff);
+ union loongarch_instruction *insn = (union loongarch_instruction *)location;
+
+ if (!signed_imm_check(offset, 32)) {
+ pr_err("module %s: PCALA offset = 0x%llx does not fit in 32-bit signed and is unsupported by kernel! dangerous %s (%u) relocation\n",
+ mod->name, (long long)offset, __func__, type);
+ return -ENOEXEC;
+ }
+
+ insn->reg1i20_format.immediate = (offset >> 12) & 0xfffff;
+ return 0;
+}
+
+static int apply_r_larch_got_pc_hi20(struct module *mod, u32 *location,
+ Elf_Addr v, s64 *rela_stack, size_t *rela_stack_top,
+ unsigned int type)
+{
+ Elf_Addr got = module_emit_got_entry(mod, v);
+
+ return apply_r_larch_pcala_hi20(mod, location, got, rela_stack,
+ rela_stack_top, type);
+}
+
+static int apply_r_larch_pcala_lo12(struct module *mod, u32 *location,
+ Elf_Addr v, s64 *rela_stack, size_t *rela_stack_top,
+ unsigned int type)
+{
+ union loongarch_instruction *insn = (union loongarch_instruction *)location;
+
+ insn->reg2i12_format.immediate = v & 0xfff;
+ return 0;
+}
+
+static int apply_r_larch_got_pc_lo12(struct module *mod, u32 *location,
+ Elf_Addr v, s64 *rela_stack, size_t *rela_stack_top,
+ unsigned int type)
+{
+ Elf_Addr got = module_emit_got_entry(mod, v);
+
+ return apply_r_larch_pcala_lo12(mod, location, got, rela_stack,
+ rela_stack_top, type);
+}
+
/*
* reloc_handlers_rela() - Apply a particular relocation to a module
* @mod: the module to apply the reloc to
@@ -321,6 +401,11 @@ static reloc_rela_handler reloc_rela_handlers[] = {
[R_LARCH_SOP_SUB ... R_LARCH_SOP_IF_ELSE] = apply_r_larch_sop,
[R_LARCH_SOP_POP_32_S_10_5 ... R_LARCH_SOP_POP_32_U] = apply_r_larch_sop_imm_field,
[R_LARCH_ADD32 ... R_LARCH_SUB64] = apply_r_larch_add_sub,
+ [R_LARCH_B26] = apply_r_larch_b26,
+ [R_LARCH_PCALA_HI20] = apply_r_larch_pcala_hi20,
+ [R_LARCH_PCALA_LO12] = apply_r_larch_pcala_lo12,
+ [R_LARCH_GOT_PC_HI20] = apply_r_larch_got_pc_hi20,
+ [R_LARCH_GOT_PC_LO12] = apply_r_larch_got_pc_lo12,
};

int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
--
2.37.0
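
For readers following the R_LARCH_B26 handler in the patch above, here is a minimal user-space sketch of the same immediate split: the byte offset must be 4-byte aligned and fit in 28 signed bits, and the word offset is then stored as a low 16-bit field and a high 10-bit field. The names `struct b26_imm`, `fits_signed` and `encode_b26` are hypothetical stand-ins for illustration, not kernel APIs, and the PLT fallback is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two immediate fields of a reg0i26 instruction. */
struct b26_imm {
	uint32_t immediate_l;	/* word offset bits [15:0] */
	uint32_t immediate_h;	/* word offset bits [25:16] */
};

/* True if val fits in a signed two's-complement field of the given width. */
static bool fits_signed(int64_t val, unsigned int bits)
{
	int64_t lim = (int64_t)1 << (bits - 1);

	return val >= -lim && val < lim;
}

/*
 * Split a byte offset into the two B26 immediate fields.
 * Returns 0 on success, -1 if the offset is unaligned or out of the
 * +/-128 MiB range (the patch returns -ENOEXEC in those cases).
 */
static int encode_b26(int64_t offset, struct b26_imm *out)
{
	uint64_t words;

	if (offset & 3)			/* alignment check comes first, as in the patch */
		return -1;
	if (!fits_signed(offset, 28))	/* then the range check */
		return -1;

	/* Exact division (offset is aligned); avoids shifting a negative value. */
	words = (uint64_t)(offset / 4);
	out->immediate_l = words & 0xffff;
	out->immediate_h = (words >> 16) & 0x3ff;
	return 0;
}
```

A backward jump of one instruction (offset -4) sets every bit of both fields, while an offset of exactly 128 MiB is rejected, which is the case where the real handler first redirects the branch through a PLT entry.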


2022-07-29 09:35:48

by Xi Ruoyao

Subject: [PATCH v4 2/4] LoongArch: Support R_LARCH_SOP_PUSH_GPREL relocation type in kernel module

This relocation type pushes the offset of the GOT entry for a symbol
from the beginning of the GOT onto the relocation stack. Our linker script
has initialized an empty GOT, so we need to create a new GOT entry if
none exists yet for the symbol.

Signed-off-by: Xi Ruoyao <[email protected]>
---
arch/loongarch/kernel/module.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/arch/loongarch/kernel/module.c b/arch/loongarch/kernel/module.c
index 638427ff0d51..3ac4fbb5f109 100644
--- a/arch/loongarch/kernel/module.c
+++ b/arch/loongarch/kernel/module.c
@@ -122,6 +122,16 @@ static int apply_r_larch_sop_push_plt_pcrel(struct module *mod, u32 *location, E
return apply_r_larch_sop_push_pcrel(mod, location, v, rela_stack, rela_stack_top, type);
}

+static int apply_r_larch_sop_push_gprel(struct module *mod, u32 *location,
+ Elf_Addr v, s64 *rela_stack, size_t *rela_stack_top,
+ unsigned int type)
+{
+ Elf_Addr got = module_emit_got_entry(mod, v);
+ ptrdiff_t offset = (void *)got - (void *)mod->arch.got.shdr->sh_addr;
+
+ return rela_stack_push(offset, rela_stack, rela_stack_top);
+}
+
static int apply_r_larch_sop(struct module *mod, u32 *location, Elf_Addr v,
s64 *rela_stack, size_t *rela_stack_top, unsigned int type)
{
@@ -306,6 +316,7 @@ static reloc_rela_handler reloc_rela_handlers[] = {
[R_LARCH_SOP_PUSH_PCREL] = apply_r_larch_sop_push_pcrel,
[R_LARCH_SOP_PUSH_ABSOLUTE] = apply_r_larch_sop_push_absolute,
[R_LARCH_SOP_PUSH_DUP] = apply_r_larch_sop_push_dup,
+ [R_LARCH_SOP_PUSH_GPREL] = apply_r_larch_sop_push_gprel,
[R_LARCH_SOP_PUSH_PLT_PCREL] = apply_r_larch_sop_push_plt_pcrel,
[R_LARCH_SOP_SUB ... R_LARCH_SOP_IF_ELSE] = apply_r_larch_sop,
[R_LARCH_SOP_POP_32_S_10_5 ... R_LARCH_SOP_POP_32_U] = apply_r_larch_sop_imm_field,
--
2.37.0
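
The "find or create a GOT entry, then push its offset from the start of the GOT" behaviour described in the commit message above can be modelled in user space roughly as follows. This is only a toy sketch: `struct toy_got`, `TOY_GOT_MAX` and `toy_got_offset` are hypothetical names, and the real module_emit_got_entry() is not shown in this thread, so nothing here should be read as its actual implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed capacity; the kernel sizes the GOT by counting relocations. */
#define TOY_GOT_MAX 8

/* Toy model of a module GOT: one 64-bit slot per distinct symbol address. */
struct toy_got {
	uint64_t entries[TOY_GOT_MAX];
	size_t num_entries;
};

/*
 * Return the byte offset, from the start of the GOT, of the entry that
 * holds sym_addr. Reuse an existing entry if there is one, otherwise
 * append a new one. Returns -1 if the toy GOT is full.
 */
static ptrdiff_t toy_got_offset(struct toy_got *got, uint64_t sym_addr)
{
	size_t i;

	for (i = 0; i < got->num_entries; i++)
		if (got->entries[i] == sym_addr)
			return (ptrdiff_t)(i * sizeof(got->entries[0]));

	if (got->num_entries == TOY_GOT_MAX)
		return -1;

	got->entries[got->num_entries] = sym_addr;
	return (ptrdiff_t)(got->num_entries++ * sizeof(got->entries[0]));
}
```

In this model, asking twice for the same symbol address yields the same offset and does not grow the table, which is the property that lets several relocations against one symbol share a single GOT slot.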


2022-07-29 10:06:32

by WANG Xuerui

Subject: Re: [PATCH v4 0/4] LoongArch: Support new relocation types

Thanks very much for the timely adaptation. I'm rebuilding my Gentoo
toolchain from upstream HEAD, will test this weekend.

--
WANG "xen0n" Xuerui

Linux/LoongArch mailing list: https://lore.kernel.org/loongarch/

2022-07-29 10:24:12

by Youling Tang

Subject: Re: [PATCH v4 0/4] LoongArch: Support new relocation types

Hi, Ruoyao

Tested this series of patches v3 on a CLFS 5.5 system, using the new
cross toolchain,
$ dmesg | head
[ 0.000000] Linux version 5.19.0-rc7new-toolchain+ (loongson@linux)
(loongarch64-unknown-linux-gnu-gcc (GCC) 13.0.0 20220726 (experimental)
[master revision
cf7eac5805e:1e0611b64d8:3fb68f2e666d9de7e0326af9f43b12c9e98f19a6], GNU
ld (GNU Binutils) 2.39.50.20220726) #1 SMP PREEMPT Fri Jul 29 05:24:15
EDT 2022

Relocation error when manually loading nf_tables.ko module,
$ sudo modprobe nf_tables
modprobe: ERROR: could not insert 'nf_tables': Exec format error

$ dmesg
[ 61.506737] kmod: module nf_tables: PCALA offset = 0x90007ffffed8c000
does not fit in 32-bit signed and is unsupported by kernel! dangerous
apply_r_larch_pcala_hi20 (71) relocation

Do you have the same problem over there?

Thanks,
Youling


2022-07-29 10:35:41

by Xi Ruoyao

Subject: Re: [PATCH v4 0/4] LoongArch: Support new relocation types

On Fri, 2022-07-29 at 17:49 +0800, Youling Tang wrote:
> Hi, Ruoyao
>
> Tested this series of patches v3 on a CLFS 5.5 system, using the new
> cross toolchain,
> $ dmesg | head
> [    0.000000] Linux version 5.19.0-rc7new-toolchain+ (loongson@linux)
> (loongarch64-unknown-linux-gnu-gcc (GCC) 13.0.0 20220726 (experimental)
> [master revision
> cf7eac5805e:1e0611b64d8:3fb68f2e666d9de7e0326af9f43b12c9e98f19a6], GNU
> ld (GNU Binutils) 2.39.50.20220726) #1 SMP PREEMPT Fri Jul 29 05:24:15
> EDT 2022
>
> Relocation error when manually loading nf_tables.ko module,
> $ sudo modprobe nf_tables
> modprobe: ERROR: could not insert 'nf_tables': Exec format error
>
> $ dmesg
> [   61.506737] kmod: module nf_tables: PCALA offset = 0x90007ffffed8c000
> does not fit in 32-bit signed and is unsupported by kernel! dangerous
> apply_r_larch_pcala_hi20 (71) relocation
>
> Do you have the same problem over there?

I can reproduce it with "modprobe x_tables". Will try to debug...

2022-07-29 10:37:58

by Xi Ruoyao

Subject: Re: [PATCH v4 0/4] LoongArch: Support new relocation types

On Fri, 2022-07-29 at 18:18 +0800, Xi Ruoyao wrote:
> On Fri, 2022-07-29 at 17:49 +0800, Youling Tang wrote:
> > Hi, Ruoyao
> >
> > Tested this series of patches v3 on a CLFS 5.5 system, using the new
> > cross toolchain,
> > $ dmesg | head
> > [    0.000000] Linux version 5.19.0-rc7new-toolchain+ (loongson@linux)
> > (loongarch64-unknown-linux-gnu-gcc (GCC) 13.0.0 20220726 (experimental)
> > [master revision
> > cf7eac5805e:1e0611b64d8:3fb68f2e666d9de7e0326af9f43b12c9e98f19a6], GNU
> > ld (GNU Binutils) 2.39.50.20220726) #1 SMP PREEMPT Fri Jul 29 05:24:15
> > EDT 2022
> >
> > Relocation error when manually loading nf_tables.ko module,
> > $ sudo modprobe nf_tables
> > modprobe: ERROR: could not insert 'nf_tables': Exec format error
> >
> > $ dmesg
> > [   61.506737] kmod: module nf_tables: PCALA offset = 0x90007ffffed8c000
> > does not fit in 32-bit signed and is unsupported by kernel! dangerous
> > apply_r_larch_pcala_hi20 (71) relocation
> >
> > Do you have the same problem over there?
>
> I can reproduce it with "modprobe x_tables".  Will try to debug...

Relocations against local per-CPU variables are broken. I'll try to
fix it.

--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University

2022-07-29 11:53:51

by Xi Ruoyao

Subject: Re: [PATCH v4 0/4] LoongArch: Support new relocation types

On Fri, 2022-07-29 at 18:36 +0800, Xi Ruoyao wrote:

> The relocation against local percpu variable is broken up.  I'll try
> to fix it.

Hmm... The problem is that the "addresses" of per-CPU symbols are faked: they
are actually offsets from $r21. So we can't just load such an offset
with PCALA addressing.

It looks like we'll need to introduce a GCC attribute to mark a
variable as "must be addressed via GOT", and add the attribute to
PER_CPU_ATTRIBUTES.
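
To illustrate why a faked per-CPU "address" trips the range check, here is a small user-space sketch of the PCALA HI20/LO12 split, using the same arithmetic as the handlers in patch 4/4. The names `struct pcala_imm` and `pcala_split` are hypothetical, and the addresses mentioned afterwards are made up for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two immediates of a pcalau12i + addi.d pair. */
struct pcala_imm {
	uint32_t hi20;	/* page delta, goes into pcalau12i */
	uint32_t lo12;	/* low 12 bits of the target, goes into addi.d/ld.d */
};

/*
 * Split "target as seen from pc" into HI20/LO12. The 0x800 bias
 * compensates for the sign extension of the 12-bit low part.
 * Returns false if the page delta does not fit in 32 signed bits,
 * which is the condition the patch reports as an error.
 */
static bool pcala_split(uint64_t pc, uint64_t target, struct pcala_imm *out)
{
	/* Assumes two's-complement conversion, as on all kernel targets. */
	int64_t offset = (int64_t)(((target + 0x800) & ~UINT64_C(0xfff)) -
				   (pc & ~UINT64_C(0xfff)));

	if (offset < INT32_MIN || offset > INT32_MAX)
		return false;

	out->hi20 = (uint32_t)(((uint64_t)offset >> 12) & 0xfffff);
	out->lo12 = (uint32_t)(target & 0xfff);
	return true;
}
```

With a code address such as 0x9000000000201000 and a nearby target the split succeeds, but a small per-CPU offset like 0x4000 "seen" from a module address such as 0xffff800002000000 is far outside the +/-2 GiB window and is rejected, matching the failure mode reported in this thread.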

--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University

2022-07-29 12:25:31

by Youling Tang

Subject: Re: [PATCH v4 0/4] LoongArch: Support new relocation types


On 07/29/2022 07:45 PM, Xi Ruoyao wrote:
> On Fri, 2022-07-29 at 18:36 +0800, Xi Ruoyao wrote:
>
>> The relocation against local percpu variable is broken up. I'll try
>> to fix it.
>
> Hmm... The problem is the "addresses" of per-cpu symbols are faked: they
> are actually offsets from $r21. So we can't just load such an offset
> with PCALA addressing.
>
> It looks like we'll need to introduce an attribute for GCC to make an
> variable "must be addressed via GOT", and add the attribute into
> PER_CPU_ATTRIBUTES.
Yes, we need a GCC attribute to specify the per-cpu variable.

Thanks,
Youling
>

2022-07-29 18:08:06

by Xi Ruoyao

Subject: Re: [PATCH v4 0/4] LoongArch: Support new relocation types

On Fri, 2022-07-29 at 20:19 +0800, Youling Tang wrote:

> On 07/29/2022 07:45 PM, Xi Ruoyao wrote:
> > Hmm... The problem is the "addresses" of per-cpu symbols are faked: they
> > are actually offsets from $r21.  So we can't just load such an offset
> > with PCALA addressing.
> >
> > It looks like we'll need to introduce an attribute for GCC to make an
> > variable "must be addressed via GOT", and add the attribute into
> > PER_CPU_ATTRIBUTES.

> Yes, we need a GCC attribute to specify the per-cpu variable.

GCC patch adding "addr_global" attribute for LoongArch:
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/599064.html

An experiment to use it:
https://github.com/xry111/linux/commit/c1d5d70

This fixes "modprobe x_tables" for me.
--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University

2022-07-30 02:28:05

by Xi Ruoyao

Subject: Re: [PATCH v4 0/4] LoongArch: Support new relocation types

On Sat, 2022-07-30 at 01:55 +0800, Xi Ruoyao wrote:
> On Fri, 2022-07-29 at 20:19 +0800, Youling Tang wrote:
>
> > On 07/29/2022 07:45 PM, Xi Ruoyao wrote:
> > > Hmm... The problem is the "addresses" of per-cpu symbols are
> > > faked: they
> > > are actually offsets from $r21.  So we can't just load such an
> > > offset
> > > with PCALA addressing.
> > >
> > > It looks like we'll need to introduce an attribute for GCC to make
> > > an
> > > variable "must be addressed via GOT", and add the attribute into
> > > PER_CPU_ATTRIBUTES.
>
> > Yes, we need a GCC attribute to specify the per-cpu variable.
>
> GCC patch adding "addr_global" attribute for LoongArch:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-July/599064.html
>
> An experiment to use it:
> https://github.com/xry111/linux/commit/c1d5d70

Correction: https://github.com/xry111/linux/commit/c1d5d708

It seems a 7-digit SHA prefix is not enough for the kernel repo.

--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University

2022-07-30 02:56:48

by Xi Ruoyao

Subject: Re: [PATCH v4 0/4] LoongArch: Support new relocation types

On Sat, 2022-07-30 at 10:24 +0800, Xi Ruoyao wrote:
> On Sat, 2022-07-30 at 01:55 +0800, Xi Ruoyao wrote:
> > On Fri, 2022-07-29 at 20:19 +0800, Youling Tang wrote:
> >
> > > On 07/29/2022 07:45 PM, Xi Ruoyao wrote:
> > > > Hmm... The problem is the "addresses" of per-cpu symbols are
> > > > faked: they
> > > > are actually offsets from $r21.  So we can't just load such an
> > > > offset
> > > > with PCALA addressing.
> > > >
> > > > It looks like we'll need to introduce an attribute for GCC to
> > > > make
> > > > an
> > > > variable "must be addressed via GOT", and add the attribute into
> > > > PER_CPU_ATTRIBUTES.
> >
> > > Yes, we need a GCC attribute to specify the per-cpu variable.
> >
> > GCC patch adding "addr_global" attribute for LoongArch:
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-July/599064.html
> >
> > An experiment to use it:
> > https://github.com/xry111/linux/commit/c1d5d70
>
> Correction: https://github.com/xry111/linux/commit/c1d5d708
>
> It seems 7-bit SHA is not enough for kernel repo.

If addr_global is rejected or not implemented (for example, when building the
kernel with GCC 12), *I expect* the following hack to work (I've not
tested it because I'm AFK now). Using visibility in the kernel seems
strange, but I think it may make some sense because modules are
somewhat similar to ELF shared objects being dlopen()'ed, and our way
of injecting per-CPU symbols is analogous to ELF interposition.

arch/loongarch/include/asm/percpu.h:

#if !__has_attribute(__addr_global__) && defined(MODULE)
/* Magically remove "static" for per-CPU variables. */
# define ARCH_NEEDS_WEAK_PER_CPU
/* Force GOT-relocation for per-CPU variables. */
# define PER_CPU_ATTRIBUTES __attribute__((__visibility__("default")))
#endif

arch/loongarch/Makefile:

# Hack for per-CPU variables, see PER_CPU_ATTRIBUTES in
# include/asm/percpu.h
if (call gcc-does-not-support-addr-global)
KBUILD_CFLAGS_MODULE += -fPIC -fvisibility=hidden
endif

--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University

2022-07-30 06:26:53

by Huacai Chen

Subject: Re: [PATCH v4 0/4] LoongArch: Support new relocation types

Hi, Ruoyao,

On Sat, Jul 30, 2022 at 10:53 AM Xi Ruoyao <[email protected]> wrote:
>
> On Sat, 2022-07-30 at 10:24 +0800, Xi Ruoyao wrote:
> > On Sat, 2022-07-30 at 01:55 +0800, Xi Ruoyao wrote:
> > > On Fri, 2022-07-29 at 20:19 +0800, Youling Tang wrote:
> > >
> > > > On 07/29/2022 07:45 PM, Xi Ruoyao wrote:
> > > > > Hmm... The problem is the "addresses" of per-cpu symbols are
> > > > > faked: they
> > > > > are actually offsets from $r21. So we can't just load such an
> > > > > offset
> > > > > with PCALA addressing.
> > > > >
> > > > > It looks like we'll need to introduce an attribute for GCC to
> > > > > make
> > > > > an
> > > > > variable "must be addressed via GOT", and add the attribute into
> > > > > PER_CPU_ATTRIBUTES.
> > >
> > > > Yes, we need a GCC attribute to specify the per-cpu variable.
> > >
> > > GCC patch adding "addr_global" attribute for LoongArch:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2022-July/599064.html
> > >
> > > An experiment to use it:
> > > https://github.com/xry111/linux/commit/c1d5d70
> >
> > Correction: https://github.com/xry111/linux/commit/c1d5d708
> >
> > It seems 7-bit SHA is not enough for kernel repo.
>
> If addr_global is rejected or not implemented (for example, building the
> kernel with GCC 12), *I expect* the following hack to work (I've not
> tested it because I'm AFK now). Using visibility in kernel seems
> strange, but I think it may make some sense because the modules are some
> sort of similar to an ELF shared object being dlopen()'ed, and our way
> to inject per-CPU symbols is analog to ELF interposition.
Sadly, I don't know what visibility is. Does it have something to do
with __visible in include/linux/compiler_attributes.h?

Huacai
>
> arch/loongarch/include/asm/percpu.h:
>
> #if !__has_attribute(__addr_global__) && defined(MODULE)
> /* Magically remove "static" for per-CPU variables. */
> # define ARCH_NEEDS_WEAK_PER_CPU
> /* Force GOT-relocation for per-CPU variables. */
> # define PER_CPU_ATTRIBUTES __attribute__((__visibility__("default")))
> #endif
>
> arch/loongarch/Makefile:
>
> # Hack for per-CPU variables, see PER_CPU_ATTRIBUTES in
> # include/asm/percpu.h
> if (call gcc-does-not-support-addr-global)
> KBUILD_CFLAGS_MODULE += -fPIC -fvisibility=hidden
> endif
>
> --
> Xi Ruoyao <[email protected]>
> School of Aerospace Science and Technology, Xidian University
>

2022-07-30 10:03:08

by Xi Ruoyao

Subject: Re: [PATCH v4 0/4] LoongArch: Support new relocation types

On Sat, 2022-07-30 at 14:44 +0800, Lulu Cheng wrote:
> > > If addr_global is rejected or not implemented (for example, building the
> > > kernel with GCC 12), *I expect* the following hack to work (I've not
> > > tested it because I'm AFK now). Using visibility in kernel seems
> > > strange, but I think it may make some sense because the modules are some
> > > sort of similar to an ELF shared object being dlopen()'ed, and our way
> > > to inject per-CPU symbols is analog to ELF interposition.
> > >
> > Sadly, I don't know what visibility is, does it have something to do
> > with __visible in include/linux/compiler_attributes.h?

They are different definitions of visibility and mostly unrelated.
Unfortunately humans do not have enough words in the language to
disambiguate those different concepts :).

-fvisibility and __attribute__((visibility)) are for ELF shared objects.
Kernel developers usually do not need to care about them (unless
working on the vDSO).

-fvisibility=default (yes, it's the default) makes a symbol interposable
when compiling with -fPIC. For example:

$ cat main.c
extern int f(void);
extern int printf(const char *, ...);
int x = 1;
int main() { printf("%d\n", f()); }
$ cat shared.c
int x = 42;
int f(void) { return x; }
$ cc shared.c -fPIC -shared -o libshared.so
$ cc main.c -L. -Wl,-rpath,. -lshared
$ ./a.out
1

You may find this strange, but it is the so-called "symbol interposition"
mandated by the ELF spec. To make it work, the compiler has to use GOT
access for "x" instead of PC-relative access.

OTOH, a "hidden" visibility disallows interposition:

$ cat shared-1.c
__attribute__((visibility("hidden"))) int x = 42;
int f(void) { return x; }
$ cc shared-1.c -fPIC -shared -o libshared.so
$ ./a.out
42

Now the compiler will use PC-relative access for "x" in "f".

In my hack the combination of "-fPIC" and
"__attribute__((visibility("default")))" for per-CPU symbols makes per-
CPU symbols accessed via GOT, and "-fvisibility=hidden" keeps normal
symbols accessed via PC-relative within a TU.

Note that the visibility of a symbol is also recorded in the symtab, and
ld.so will refuse to access a hidden symbol in one shared object from
another. But the kernel module loader just doesn't care about the visibility
field in the symtab, so it won't affect us.

Basically the hack just uses visibility options & attributes *in a way
they are not designed for* to trick the compiler into emitting GOT accesses
for per-CPU symbols. A new attribute ("get_through_got"/"movable" or
whatever) is definitely wanted to avoid such a tricky approach, but the
hack can be used if we want a modular kernel that can be built with GCC 12.
--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University

2022-07-30 10:56:34

by Huacai Chen

Subject: Re: [PATCH v4 0/4] LoongArch: Support new relocation types

On Sat, Jul 30, 2022 at 5:51 PM Xi Ruoyao <[email protected]> wrote:
>
> On Sat, 2022-07-30 at 14:44 +0800, Lulu Cheng wrote:
> > > > If addr_global is rejected or not implemented (for example, building the
> > > > kernel with GCC 12), *I expect* the following hack to work (I've not
> > > > tested it because I'm AFK now). Using visibility in kernel seems
> > > > strange, but I think it may make some sense because the modules are some
> > > > sort of similar to an ELF shared object being dlopen()'ed, and our way
> > > > to inject per-CPU symbols is analog to ELF interposition.
> > > >
> > > Sadly, I don't know what visibility is, does it have something to do
> > > with __visible in include/linux/compiler_attributes.h?
>
> They are different definitions of visibility and mostly unrelated.
> Unfortunately humans do not have enough words in the language to
> disambiguate those different concepts :).
>
> -fvisibility and __attribute__((visibility)) are for ELF shared objects.
> Kernel developers usually do not need to take care of them (unless
> working on VDSO).
>
> -fvisibility=default (yes, it's the default) makes the symbol "possible
> to be interposed" while -fPIC. Say
>
> $ cat main.c
> extern int f(void);
> extern int printf(const char *, ...);
> int x = 1;
> int main() { printf("%d\n", f()); }
> $ cat shared.c
> int x = 42;
> int f(void) { return x; }
> $ cc shared.c -fPIC -shared -o libshared.so
> $ cc main.c -L. -Wl,-rpath,. -lshared
> $ ./a.out
> 1
>
> You may think it strange but it's so-called "symbol interposition"
> mandated by ELF spec. To make it work, the compiler has to use GOT
> access for "x" instead of PC-relative access.
>
> OTOH, a "hidden" visibility disallows interposition:
>
> $ cat shared-1.c
> __attribute__((visibility("hidden"))) int x = 42;
> int f(void) { return x; }
> $ cc shared-1.c -fPIC -shared -o libshared.so
> $ ./a.out
> 42
>
> Now the compiler will use PC-relative access for "x" in "f".
>
> In my hack the combination of "-fPIC" and
> "__attribute__((visibility("default")))" for per-CPU symbols makes per-
> CPU symbols accessed via GOT, and "-fvisibility=hidden" keeps normal
> symbols accessed via PC-relative within a TU.
>
> Note that the visibility of a symbol is also recorded in the symtab, and
> ld.so will refuse to access a hidden symbol in one shared object from
> another. But the kernel module loader just doesn't care the visibility
> field in symtab so it won't affect us.
>
> Basically the hack just uses visibility options & attributes *in a way
> they are not designed for* to trick the compiler to emit GOT accesses
> for per-CPU symbols. A new attribute ("get_through_got"/"movable" or
> whatever) is definitely wanted to avoid such a tricky approach, but the
> hack can be used if we want modular kernel able to be built with GCC 12.
So it has nothing to do with __visible in include/linux/compiler_attributes.h?
Or is __visible a similar thing used by the Linux kernel?

Huacai
> --
> Xi Ruoyao <[email protected]>
> School of Aerospace Science and Technology, Xidian University

2022-07-31 03:49:55

by Xi Ruoyao

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] LoongArch: Support new relocation types

On Sat, 2022-07-30 at 18:38 +0800, Huacai Chen wrote:
> So it has nothing to do with __visible in include/linux/compiler_attributes.h?
> Or __visible is a similar thing that used by Linux kernel?

They are two different things. __visible means an object can be
accessed from another TU mysteriously (in some way not described by C
semantics), but it does not change the way the symbol is resolved
within the same TU.
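
A minimal sketch of the distinction, assuming __visible expands to GCC's
"externally_visible" attribute (worth checking against
compiler_attributes.h). The names my_visible, entry_from_asm and
same_tu_caller are made up for this example.

```c
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

/* Assumed equivalent of the kernel's __visible; falls back to nothing
 * on compilers without the attribute. */
#if __has_attribute(__externally_visible__)
# define my_visible __attribute__((__externally_visible__))
#else
# define my_visible
#endif

/* Tells the compiler that something outside its view (for example
 * assembly code) may reference this function, so it must not be
 * discarded or made local under -fwhole-program or LTO. */
my_visible int entry_from_asm(int x)
{
	return x + 1;
}

/* Unlike ELF visibility, the attribute does not change how a
 * reference from the same TU is resolved or addressed. */
int same_tu_caller(void)
{
	return entry_from_asm(41);
}
```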
--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University

2022-08-01 02:34:05

by Youling Tang

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] LoongArch: Support new relocation types

Hi, Ruoyao

On 07/30/2022 10:52 AM, Xi Ruoyao wrote:
> On Sat, 2022-07-30 at 10:24 +0800, Xi Ruoyao wrote:
>> On Sat, 2022-07-30 at 01:55 +0800, Xi Ruoyao wrote:
>>> On Fri, 2022-07-29 at 20:19 +0800, Youling Tang wrote:
>>>
>>>> On 07/29/2022 07:45 PM, Xi Ruoyao wrote:
>>>>> Hmm... The problem is the "addresses" of per-cpu symbols are
>>>>> faked: they
>>>>> are actually offsets from $r21. So we can't just load such an
>>>>> offset
>>>>> with PCALA addressing.
>>>>>
>>>>> It looks like we'll need to introduce an attribute for GCC to
>>>>> make
>>>>> an
>>>>> variable "must be addressed via GOT", and add the attribute into
>>>>> PER_CPU_ATTRIBUTES.
>>>
>>>> Yes, we need a GCC attribute to specify the per-cpu variable.
>>>
>>> GCC patch adding "addr_global" attribute for LoongArch:
>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-July/599064.html
>>>
>>> An experiment to use it:
>>> https://github.com/xry111/linux/commit/c1d5d70
>>
>> Correction: https://github.com/xry111/linux/commit/c1d5d708
>>
>> It seems 7-bit SHA is not enough for kernel repo.
>
> If addr_global is rejected or not implemented (for example, building the
> kernel with GCC 12), *I expect* the following hack to work (I've not
> tested it because I'm AFK now). Using visibility in kernel seems
> strange, but I think it may make some sense because the modules are some
> sort of similar to an ELF shared object being dlopen()'ed, and our way
> to inject per-CPU symbols is analog to ELF interposition.
>
> arch/loongarch/include/asm/percpu.h:
>
> #if !__has_attribute(__addr_global__) && defined(MODULE)
> /* Magically remove "static" for per-CPU variables. */
> # define ARCH_NEEDS_WEAK_PER_CPU
> /* Force GOT-relocation for per-CPU variables. */
> # define PER_CPU_ATTRIBUTES __attribute__((__visibility__("default")))
> #endif
>
> arch/loongarch/Makefile:
>
> # Hack for per-CPU variables, see PER_CPU_ATTRIBUTES in
> # include/asm/percpu.h
> if (call gcc-does-not-support-addr-global)
> KBUILD_CFLAGS_MODULE += -fPIC -fvisibility=hidden
> endif
>
With the old toolchain (GCC 12), the nf_tables.ko module loads
successfully after applying the above patch.

Thanks,
Youling


2022-08-01 02:59:34

by Huacai Chen

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] LoongArch: Support new relocation types

Hi, all,

On Mon, Aug 1, 2022 at 10:16 AM Youling Tang <[email protected]> wrote:
>
> Hi, Ruoyao
>
> On 07/30/2022 10:52 AM, Xi Ruoyao wrote:
> > On Sat, 2022-07-30 at 10:24 +0800, Xi Ruoyao wrote:
> >> On Sat, 2022-07-30 at 01:55 +0800, Xi Ruoyao wrote:
> >>> On Fri, 2022-07-29 at 20:19 +0800, Youling Tang wrote:
> >>>
> >>>> On 07/29/2022 07:45 PM, Xi Ruoyao wrote:
> >>>>> Hmm... The problem is the "addresses" of per-cpu symbols are
> >>>>> faked: they
> >>>>> are actually offsets from $r21. So we can't just load such an
> >>>>> offset
> >>>>> with PCALA addressing.
> >>>>>
> >>>>> It looks like we'll need to introduce an attribute for GCC to
> >>>>> make
> >>>>> an
> >>>>> variable "must be addressed via GOT", and add the attribute into
> >>>>> PER_CPU_ATTRIBUTES.
> >>>
> >>>> Yes, we need a GCC attribute to specify the per-cpu variable.
> >>>
> >>> GCC patch adding "addr_global" attribute for LoongArch:
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2022-July/599064.html
> >>>
> >>> An experiment to use it:
> >>> https://github.com/xry111/linux/commit/c1d5d70
> >>
> >> Correction: https://github.com/xry111/linux/commit/c1d5d708
> >>
> >> It seems 7-bit SHA is not enough for kernel repo.
> >
> > If addr_global is rejected or not implemented (for example, building the
> > kernel with GCC 12), *I expect* the following hack to work (I've not
> > tested it because I'm AFK now). Using visibility in kernel seems
> > strange, but I think it may make some sense because the modules are some
> > sort of similar to an ELF shared object being dlopen()'ed, and our way
> > to inject per-CPU symbols is analog to ELF interposition.
> >
> > arch/loongarch/include/asm/percpu.h:
> >
> > #if !__has_attribute(__addr_global__) && defined(MODULE)
> > /* Magically remove "static" for per-CPU variables. */
> > # define ARCH_NEEDS_WEAK_PER_CPU
> > /* Force GOT-relocation for per-CPU variables. */
> > # define PER_CPU_ATTRIBUTES __attribute__((__visibility__("default")))
> > #endif
> >
> > arch/loongarch/Makefile:
> >
> > # Hack for per-CPU variables, see PER_CPU_ATTRIBUTES in
> > # include/asm/percpu.h
> > if (call gcc-does-not-support-addr-global)
> > KBUILD_CFLAGS_MODULE += -fPIC -fvisibility=hidden
> > endif
> >
> Using the old toolchain (GCC 12) can successfully load the nf_tables.ko
> module after applying the above patch.
I don't like such a hack... Can we consider using the old relocation
types when building with old toolchains?

Huacai
>
> Thanks,
> Youling
>

2022-08-01 04:49:41

by Youling Tang

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] LoongArch: Support new relocation types

Hi, all

On 08/01/2022 10:34 AM, Huacai Chen wrote:
> Hi, all,
>
> On Mon, Aug 1, 2022 at 10:16 AM Youling Tang <[email protected]> wrote:
>>
>> Hi, Ruoyao
>>
>> On 07/30/2022 10:52 AM, Xi Ruoyao wrote:
>>> On Sat, 2022-07-30 at 10:24 +0800, Xi Ruoyao wrote:
>>>> On Sat, 2022-07-30 at 01:55 +0800, Xi Ruoyao wrote:
>>>>> On Fri, 2022-07-29 at 20:19 +0800, Youling Tang wrote:
>>>>>
>>>>>> On 07/29/2022 07:45 PM, Xi Ruoyao wrote:
>>>>>>> Hmm... The problem is the "addresses" of per-cpu symbols are
>>>>>>> faked: they
>>>>>>> are actually offsets from $r21. So we can't just load such an
>>>>>>> offset
>>>>>>> with PCALA addressing.
>>>>>>>
>>>>>>> It looks like we'll need to introduce an attribute for GCC to
>>>>>>> make
>>>>>>> an
>>>>>>> variable "must be addressed via GOT", and add the attribute into
>>>>>>> PER_CPU_ATTRIBUTES.
>>>>>
>>>>>> Yes, we need a GCC attribute to specify the per-cpu variable.
>>>>>
>>>>> GCC patch adding "addr_global" attribute for LoongArch:
>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-July/599064.html
>>>>>
>>>>> An experiment to use it:
>>>>> https://github.com/xry111/linux/commit/c1d5d70
>>>>
>>>> Correction: https://github.com/xry111/linux/commit/c1d5d708

With the new toolchain (which has the "addr_global" attribute), the
nf_tables.ko module loads successfully on a kernel built with the
"c1d5d708" commit applied.

Thanks,
Youling
>>>>
>>>> It seems 7-bit SHA is not enough for kernel repo.
>>>
>>> If addr_global is rejected or not implemented (for example, building the
>>> kernel with GCC 12), *I expect* the following hack to work (I've not
>>> tested it because I'm AFK now). Using visibility in kernel seems
>>> strange, but I think it may make some sense because the modules are some
>>> sort of similar to an ELF shared object being dlopen()'ed, and our way
>>> to inject per-CPU symbols is analog to ELF interposition.
>>>
>>> arch/loongarch/include/asm/percpu.h:
>>>
>>> #if !__has_attribute(__addr_global__) && defined(MODULE)
>>> /* Magically remove "static" for per-CPU variables. */
>>> # define ARCH_NEEDS_WEAK_PER_CPU
>>> /* Force GOT-relocation for per-CPU variables. */
>>> # define PER_CPU_ATTRIBUTES __attribute__((__visibility__("default")))
>>> #endif
>>>
>>> arch/loongarch/Makefile:
>>>
>>> # Hack for per-CPU variables, see PER_CPU_ATTRIBUTES in
>>> # include/asm/percpu.h
>>> if (call gcc-does-not-support-addr-global)
>>> KBUILD_CFLAGS_MODULE += -fPIC -fvisibility=hidden
>>> endif
>>>
>> Using the old toolchain (GCC 12) can successfully load the nf_tables.ko
>> module after applying the above patch.
> I don't like such a hack..., can we consider using old relocation
> types when building by old toolchains?
>
> Huacai
>>
>> Thanks,
>> Youling
>>


2022-08-01 10:07:41

by Xi Ruoyao

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] LoongArch: Support new relocation types

On Mon, 2022-08-01 at 10:34 +0800, Huacai Chen wrote:
> Hi, all,
>
> On Mon, Aug 1, 2022 at 10:16 AM Youling Tang <[email protected]>
> wrote:
> >
> > Hi, Ruoyao
> >
> > On 07/30/2022 10:52 AM, Xi Ruoyao wrote:
> > > On Sat, 2022-07-30 at 10:24 +0800, Xi Ruoyao wrote:
> > > > On Sat, 2022-07-30 at 01:55 +0800, Xi Ruoyao wrote:
> > > > > On Fri, 2022-07-29 at 20:19 +0800, Youling Tang wrote:
> > > > >
> > > > > > On 07/29/2022 07:45 PM, Xi Ruoyao wrote:
> > > > > > > Hmm... The problem is the "addresses" of per-cpu symbols
> > > > > > > are
> > > > > > > faked: they
> > > > > > > are actually offsets from $r21.  So we can't just load
> > > > > > > such an
> > > > > > > offset
> > > > > > > with PCALA addressing.
> > > > > > >
> > > > > > > It looks like we'll need to introduce an attribute for GCC
> > > > > > > to
> > > > > > > make
> > > > > > > an
> > > > > > > variable "must be addressed via GOT", and add the
> > > > > > > attribute into
> > > > > > > PER_CPU_ATTRIBUTES.
> > > > >
> > > > > > Yes, we need a GCC attribute to specify the per-cpu
> > > > > > variable.
> > > > >
> > > > > GCC patch adding "addr_global" attribute for LoongArch:
> > > > > https://gcc.gnu.org/pipermail/gcc-patches/2022-July/599064.html
> > > > >
> > > > > An experiment to use it:
> > > > > https://github.com/xry111/linux/commit/c1d5d70
> > > >
> > > > Correction: https://github.com/xry111/linux/commit/c1d5d708
> > > >
> > > > It seems 7-bit SHA is not enough for kernel repo.
> > >
> > > If addr_global is rejected or not implemented (for example,
> > > building the
> > > kernel with GCC 12), *I expect* the following hack to work (I've
> > > not
> > > tested it because I'm AFK now).  Using visibility in kernel seems
> > > strange, but I think it may make some sense because the modules
> > > are some
> > > sort of similar to an ELF shared object being dlopen()'ed, and our
> > > way
> > > to inject per-CPU symbols is analog to ELF interposition.
> > >
> > > arch/loongarch/include/asm/percpu.h:
> > >
> > >    #if !__has_attribute(__addr_global__) && defined(MODULE)
> > >    /* Magically remove "static" for per-CPU variables.  */
> > >    # define ARCH_NEEDS_WEAK_PER_CPU
> > >    /* Force GOT-relocation for per-CPU variables.  */
> > >    # define PER_CPU_ATTRIBUTES
> > > __attribute__((__visibility__("default")))
> > >    #endif
> > >
> > > arch/loongarch/Makefile:
> > >
> > >    # Hack for per-CPU variables, see PER_CPU_ATTRIBUTES in
> > >    # include/asm/percpu.h
> > >    if (call gcc-does-not-support-addr-global)
> > >      KBUILD_CFLAGS_MODULE += -fPIC -fvisibility=hidden
> > >    endif
> > >
> > Using the old toolchain (GCC 12) can successfully load the
> > nf_tables.ko
> > module after applying the above patch.
> I don't like such a hack..., can we consider using old relocation
> types when building by old toolchains?


I don't like the hack either. I only developed it as an intellectual game.

We need to consider multiple combinations:

(1) Old GCC + old Binutils. We need -mla-local-with-abs for
KBUILD_CFLAGS_MODULE.

(2) Old GCC + new Binutils. We need -mla-local-with-abs for
KBUILD_CFLAGS_MODULE, *and* support for
R_LARCH_ABS{_HI20,_LO12,64_LO20,64_HI12} in the kernel module loader.

(3) New GCC + old Binutils. As new GCC should support our new attribute
(I now intend to send V2 patch to gcc-patches using "movable" as the
attribute name), no special action is needed.

Basically, we need:

(1) Handle R_LARCH_ABS{_HI20,_LO12,64_LO20,64_HI12} in the kernel module
loader.
(2) Add -Wa,-mla-local-with-abs into KBUILD_CFLAGS_MODULE if GCC version
is <= 12.
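
As a rough sketch of what item (1) involves: as far as I understand the
psABI, each of the four R_LARCH_ABS* types patches one slice of the 64-bit
absolute address into an instruction immediate. The helper names below are
made up, and the actual instruction patching is left out; this only shows
which address bits go where and that the four slices cover the whole
address.

```c
#include <stdint.h>

/* Hypothetical helpers: the slice of absolute address v that each
 * relocation type would place into its instruction's immediate. */
uint32_t abs_hi20(uint64_t v)   { return (v >> 12) & 0xfffff; } /* bits 31..12 */
uint32_t abs_lo12(uint64_t v)   { return v & 0xfff; }           /* bits 11..0  */
uint32_t abs64_lo20(uint64_t v) { return (v >> 32) & 0xfffff; } /* bits 51..32 */
uint32_t abs64_hi12(uint64_t v) { return (v >> 52) & 0xfff; }   /* bits 63..52 */

/* Net effect of the four-instruction load sequence: the slices are
 * put back together into the original 64-bit value. */
uint64_t abs_join(uint32_t hi20, uint32_t lo12,
		  uint32_t lo20_64, uint32_t hi12_64)
{
	return ((uint64_t)hi12_64 << 52) | ((uint64_t)lo20_64 << 32) |
	       ((uint64_t)hi20 << 12) | (uint64_t)lo12;
}
```

Since nothing here depends on the PC, handlers for these types would not
need the range checks that the PC-relative ones do.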
--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University

2022-08-01 10:17:51

by Youling Tang

[permalink] [raw]
Subject: Re: [PATCH v4 4/4] LoongArch: Support modules with new relocation types

Hi, Ruoyao

On 07/29/2022 04:42 PM, Xi Ruoyao wrote:
> If GAS 2.40 and/or GCC 13 is used to build the kernel, the modules will
> contain R_LARCH_B26, R_LARCH_PCALA_HI20, R_LARCH_PCALA_LO12,
> R_LARCH_GOT_PC_HI20, and R_LARCH_GOT_PC_LO12 relocations. Support them
> in the module loader to allow a kernel built with latest toolchain
> capable to load the modules.
>
> Signed-off-by: Xi Ruoyao <[email protected]>
> ---
> arch/loongarch/include/asm/elf.h | 37 +++++++++++
> arch/loongarch/kernel/module-sections.c | 12 +++-
> arch/loongarch/kernel/module.c | 85 +++++++++++++++++++++++++
> 3 files changed, 132 insertions(+), 2 deletions(-)
>
> diff --git a/arch/loongarch/include/asm/elf.h b/arch/loongarch/include/asm/elf.h
> index 5f3ff4781fda..7af0cebf28d7 100644
> --- a/arch/loongarch/include/asm/elf.h
> +++ b/arch/loongarch/include/asm/elf.h
> @@ -74,6 +74,43 @@
> #define R_LARCH_SUB64 56
> #define R_LARCH_GNU_VTINHERIT 57
> #define R_LARCH_GNU_VTENTRY 58
> +#define R_LARCH_B16 64
> +#define R_LARCH_B21 65
> +#define R_LARCH_B26 66
> +#define R_LARCH_ABS_HI20 67
> +#define R_LARCH_ABS_LO12 68
> +#define R_LARCH_ABS64_LO20 69
> +#define R_LARCH_ABS64_HI12 70
> +#define R_LARCH_PCALA_HI20 71
> +#define R_LARCH_PCALA_LO12 72
> +#define R_LARCH_PCALA64_LO20 73
> +#define R_LARCH_PCALA64_HI12 74
> +#define R_LARCH_GOT_PC_HI20 75
> +#define R_LARCH_GOT_PC_LO12 76
> +#define R_LARCH_GOT64_PC_LO20 77
> +#define R_LARCH_GOT64_PC_HI12 78
> +#define R_LARCH_GOT_HI20 79
> +#define R_LARCH_GOT_LO12 80
> +#define R_LARCH_GOT64_LO20 81
> +#define R_LARCH_GOT64_HI12 82
> +#define R_LARCH_TLS_LE_HI20 83
> +#define R_LARCH_TLS_LE_LO12 84
> +#define R_LARCH_TLS_LE64_LO20 85
> +#define R_LARCH_TLS_LE64_HI12 86
> +#define R_LARCH_TLS_IE_PC_HI20 87
> +#define R_LARCH_TLS_IE_PC_LO12 88
> +#define R_LARCH_TLS_IE64_PC_LO20 89
> +#define R_LARCH_TLS_IE64_PC_HI12 90
> +#define R_LARCH_TLS_IE_HI20 91
> +#define R_LARCH_TLS_IE_LO12 92
> +#define R_LARCH_TLS_IE64_LO20 93
> +#define R_LARCH_TLS_IE64_HI12 94
> +#define R_LARCH_TLS_LD_PC_HI20 95
> +#define R_LARCH_TLS_LD_HI20 96
> +#define R_LARCH_TLS_GD_PC_HI20 97
> +#define R_LARCH_TLS_GD_HI20 98
> +#define R_LARCH_32_PCREL 99
> +#define R_LARCH_RELAX 100
>
> #ifndef ELF_ARCH
>
> diff --git a/arch/loongarch/kernel/module-sections.c b/arch/loongarch/kernel/module-sections.c
> index 36a77771d18c..8c0e4ad048cc 100644
> --- a/arch/loongarch/kernel/module-sections.c
> +++ b/arch/loongarch/kernel/module-sections.c
> @@ -76,12 +76,20 @@ static void count_max_entries(Elf_Rela *relas, int num,
>
> for (i = 0; i < num; i++) {
> type = ELF_R_TYPE(relas[i].r_info);
> - if (type == R_LARCH_SOP_PUSH_PLT_PCREL) {
> + switch (type) {
> + case R_LARCH_SOP_PUSH_PLT_PCREL:
> + case R_LARCH_B26:
> if (!duplicate_rela(relas, i))
> (*plts)++;
> - } else if (type == R_LARCH_SOP_PUSH_GPREL)
> + break;
> + case R_LARCH_SOP_PUSH_GPREL:
> + case R_LARCH_GOT_PC_HI20:
> if (!duplicate_rela(relas, i))
> (*gots)++;
> + break;
> + default:
> + /* Do nothing. */
> + }
> }
> }
>
> diff --git a/arch/loongarch/kernel/module.c b/arch/loongarch/kernel/module.c
> index 3ac4fbb5f109..c7b40150e1f0 100644
> --- a/arch/loongarch/kernel/module.c
> +++ b/arch/loongarch/kernel/module.c
> @@ -291,6 +291,86 @@ static int apply_r_larch_add_sub(struct module *mod, u32 *location, Elf_Addr v,
> }
> }
>
> +static int apply_r_larch_b26(struct module *mod, u32 *location, Elf_Addr v,
> + s64 *rela_stack, size_t *rela_stack_top, unsigned int type)
> +{
> + ptrdiff_t offset = (void *)v - (void *)location;
> + union loongarch_instruction *insn = (union loongarch_instruction *)location;
> +
> + if (offset >= SZ_128M)
> + v = module_emit_plt_entry(mod, v);
> +
> + if (offset < -SZ_128M)
> + v = module_emit_plt_entry(mod, v);
> +
> + offset = (void *)v - (void *)location;
> +
> + if (offset & 3) {
> + pr_err("module %s: jump offset = 0x%llx unaligned! dangerous R_LARCH_B26 (%u) relocation\n",
> + mod->name, (long long)offset, type);
> + return -ENOEXEC;
> + }
> +
> + if (!signed_imm_check(offset, 28)) {
> + pr_err("module %s: jump offset = 0x%llx overflow! dangerous R_LARCH_B26 (%u) relocation\n",
> + mod->name, (long long)offset, type);
> + return -ENOEXEC;
> + }
> +
> + offset >>= 2;
> + insn->reg0i26_format.immediate_l = offset & 0xffff;
> + insn->reg0i26_format.immediate_h = (offset >> 16) & 0x3ff;
> + return 0;
> +}
> +
> +static int apply_r_larch_pcala_hi20(struct module *mod, u32 *location,
> + Elf_Addr v, s64 *rela_stack, size_t *rela_stack_top,
> + unsigned int type)
> +{
> + ptrdiff_t offset = (void *)((v + 0x800) & ~0xfff) -
> + (void *)((Elf_Addr)location & ~0xfff);
> + union loongarch_instruction *insn = (union loongarch_instruction *)location;
> +
> + if (!signed_imm_check(offset, 32)) {
> + pr_err("module %s: PCALA offset = 0x%llx does not fit in 32-bit signed and is unsupported by kernel! dangerous %s (%u) relocation\n",
> + mod->name, (long long)offset, __func__, type);
> + return -ENOEXEC;
> + }
Should we also print "location" when the relocation fails? Having the
PC in the message would make debugging easier.

> +
> + insn->reg1i20_format.immediate = (offset >> 12) & 0xfffff;
> + return 0;
> +}
> +
> +static int apply_r_larch_got_pc_hi20(struct module *mod, u32 *location,
> + Elf_Addr v, s64 *rela_stack, size_t *rela_stack_top,
> + unsigned int type)
> +{
> + Elf_Addr got = module_emit_got_entry(mod, v);
> +
> + return apply_r_larch_pcala_hi20(mod, location, got, rela_stack,
> + rela_stack_top, type);
> +}
> +
> +static int apply_r_larch_pcala_lo12(struct module *mod, u32 *location,
> + Elf_Addr v, s64 *rela_stack, size_t *rela_stack_top,
> + unsigned int type)
> +{
> + union loongarch_instruction *insn = (union loongarch_instruction *)location;
> +
> + insn->reg2i12_format.immediate = v & 0xfff;
> + return 0;
> +}
> +
> +static int apply_r_larch_got_pc_lo12(struct module *mod, u32 *location,
> + Elf_Addr v, s64 *rela_stack, size_t *rela_stack_top,
> + unsigned int type)
> +{
> + Elf_Addr got = module_emit_got_entry(mod, v);
> +
> + return apply_r_larch_pcala_lo12(mod, location, got, rela_stack,
> + rela_stack_top, type);
> +}
It might look better to keep the apply_r_larch_* functions ordered by
relocation type ID:
apply_r_larch_pcala_hi20
apply_r_larch_pcala_lo12
apply_r_larch_got_pc_hi20
apply_r_larch_got_pc_lo12

Thanks,
Youling

> +
> /*
> * reloc_handlers_rela() - Apply a particular relocation to a module
> * @mod: the module to apply the reloc to
> @@ -321,6 +401,11 @@ static reloc_rela_handler reloc_rela_handlers[] = {
> [R_LARCH_SOP_SUB ... R_LARCH_SOP_IF_ELSE] = apply_r_larch_sop,
> [R_LARCH_SOP_POP_32_S_10_5 ... R_LARCH_SOP_POP_32_U] = apply_r_larch_sop_imm_field,
> [R_LARCH_ADD32 ... R_LARCH_SUB64] = apply_r_larch_add_sub,
> + [R_LARCH_B26] = apply_r_larch_b26,
> + [R_LARCH_PCALA_HI20] = apply_r_larch_pcala_hi20,
> + [R_LARCH_PCALA_LO12] = apply_r_larch_pcala_lo12,
> + [R_LARCH_GOT_PC_HI20] = apply_r_larch_got_pc_hi20,
> + [R_LARCH_GOT_PC_LO12] = apply_r_larch_got_pc_lo12,
> };
>
> int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
>
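
For readers following the review, a standalone sketch of the R_LARCH_B26
field layout used by apply_r_larch_b26 above. It works on a plain 32-bit
word instead of "union loongarch_instruction", and the function names are
made up. It assumes the layout implied by the patch: the low 16 bits of the
shifted offset go to instruction bits 25..10 and the high 10 bits go to
bits 9..0.

```c
#include <stdint.h>

/* 1 if offset is 4-byte aligned and within the signed 28-bit range
 * (+/-128 MiB) that a 26-bit word offset can express. */
int b26_fits(int64_t offset)
{
	if (offset & 3)
		return 0;
	return offset >= -(1LL << 27) && offset < (1LL << 27);
}

/* Return insn with its 26-bit immediate replaced by offset[27:2].
 * The caller is expected to have checked b26_fits() first. */
uint32_t b26_encode(uint32_t insn, int64_t offset)
{
	uint32_t imm = (uint32_t)(((uint64_t)offset >> 2) & 0x3ffffff);

	insn &= 0xfc000000u;            /* keep the 6-bit opcode */
	insn |= (imm & 0xffff) << 10;   /* imm[15:0]  -> bits 25..10 */
	insn |= (imm >> 16) & 0x3ff;    /* imm[25:16] -> bits 9..0 */
	return insn;
}
```

The alignment test comes before the range test here as well, matching the
order requested in the v3 review.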


2022-08-01 10:23:50

by Jinyang He

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] LoongArch: Support new relocation types

On 08/01/2022 05:55 PM, Xi Ruoyao wrote:

> On Mon, 2022-08-01 at 10:34 +0800, Huacai Chen wrote:
>> Hi, all,
>>
>> On Mon, Aug 1, 2022 at 10:16 AM Youling Tang <[email protected]>
>> wrote:
>>> Hi, Ruoyao
>>>
>>> On 07/30/2022 10:52 AM, Xi Ruoyao wrote:
>>>> On Sat, 2022-07-30 at 10:24 +0800, Xi Ruoyao wrote:
>>>>> On Sat, 2022-07-30 at 01:55 +0800, Xi Ruoyao wrote:
>>>>>> On Fri, 2022-07-29 at 20:19 +0800, Youling Tang wrote:
>>>>>>
>>>>>>> On 07/29/2022 07:45 PM, Xi Ruoyao wrote:
>>>>>>>> Hmm... The problem is the "addresses" of per-cpu symbols
>>>>>>>> are
>>>>>>>> faked: they
>>>>>>>> are actually offsets from $r21. So we can't just load
>>>>>>>> such an
>>>>>>>> offset
>>>>>>>> with PCALA addressing.
>>>>>>>>
>>>>>>>> It looks like we'll need to introduce an attribute for GCC
>>>>>>>> to
>>>>>>>> make
>>>>>>>> an
>>>>>>>> variable "must be addressed via GOT", and add the
>>>>>>>> attribute into
>>>>>>>> PER_CPU_ATTRIBUTES.
>>>>>>> Yes, we need a GCC attribute to specify the per-cpu
>>>>>>> variable.
>>>>>> GCC patch adding "addr_global" attribute for LoongArch:
>>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-July/599064.html
>>>>>>
>>>>>> An experiment to use it:
>>>>>> https://github.com/xry111/linux/commit/c1d5d70
>>>>> Correction: https://github.com/xry111/linux/commit/c1d5d708
>>>>>
>>>>> It seems 7-bit SHA is not enough for kernel repo.
>>>> If addr_global is rejected or not implemented (for example,
>>>> building the
>>>> kernel with GCC 12), *I expect* the following hack to work (I've
>>>> not
>>>> tested it because I'm AFK now). Using visibility in kernel seems
>>>> strange, but I think it may make some sense because the modules
>>>> are some
>>>> sort of similar to an ELF shared object being dlopen()'ed, and our
>>>> way
>>>> to inject per-CPU symbols is analog to ELF interposition.
>>>>
>>>> arch/loongarch/include/asm/percpu.h:
>>>>
>>>> #if !__has_attribute(__addr_global__) && defined(MODULE)
>>>> /* Magically remove "static" for per-CPU variables. */
>>>> # define ARCH_NEEDS_WEAK_PER_CPU
>>>> /* Force GOT-relocation for per-CPU variables. */
>>>> # define PER_CPU_ATTRIBUTES
>>>> __attribute__((__visibility__("default")))
>>>> #endif
>>>>
>>>> arch/loongarch/Makefile:
>>>>
>>>> # Hack for per-CPU variables, see PER_CPU_ATTRIBUTES in
>>>> # include/asm/percpu.h
>>>> if (call gcc-does-not-support-addr-global)
>>>> KBUILD_CFLAGS_MODULE += -fPIC -fvisibility=hidden
>>>> endif
>>>>
>>> Using the old toolchain (GCC 12) can successfully load the
>>> nf_tables.ko
>>> module after applying the above patch.
>> I don't like such a hack..., can we consider using old relocation
>> types when building by old toolchains?
>
> I don't like the hack too. I only developed it as an intellectual game.
>
> We need to consider multiple combinations:
>
> (1) Old GCC + old Binutils. We need -mla-local-with-abs for
> KBUILD_CFLAGS_MODULE.
>
> (2) Old GCC + new Binutils. We need -mla-local-with-abs for
> KBUILD_CFLAGS_MODULE, *and* adding the support for
> R_LARCH_ABS{_HI20,_LO12,64_LO20,64_HI12} in the kernel module loader.
>
> (3) New GCC + old Binutils. As new GCC should support our new attribute
> (I now intend to send V2 patch to gcc-patches using "movable" as the
> attribute name), no special action is needed.
>
> Basically, we need:
>
> (1) Handle R_LARCH_ABS{_HI20,_LO12,64_LO20,64_HI12} in the kernel module
> loader.
> (2) Add -Wa,-mla-local-with-abs into KBUILD_CFLAGS_MODULE if GCC version
> is <= 12.

Actually, I really hope the kernel image can be in XKVRANGE rather
than in XKPRANGE, so that we can keep the kernel and the modules
within a 4GB range. I think that would make everything work normally. :-(
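
The "4GB range" remark can be made concrete with a small sketch. The
helper names are made up; the arithmetic follows
apply_r_larch_pcala_hi20 and apply_r_larch_pcala_lo12 in patch 4/4, and it
assumes both immediates are sign-extended by the pcalau12i + addi.d pair.
Such a pair only reaches targets whose page offset fits in a signed 32-bit
value.

```c
#include <stdint.h>

/* Page-granular offset from pc to v, rounded so that a sign-extended
 * low 12 bits can be added back afterwards. */
int64_t pcala_page_offset(uint64_t pc, uint64_t v)
{
	return (int64_t)(((v + 0x800) & ~0xfffULL) - (pc & ~0xfffULL));
}

/* 1 if the target is reachable, i.e. the offset fits in 32 bits signed. */
int pcala_in_range(uint64_t pc, uint64_t v)
{
	int64_t offset = pcala_page_offset(pc, v);

	return offset >= -(1LL << 31) && offset < (1LL << 31);
}

uint32_t pcala_hi20(uint64_t pc, uint64_t v)
{
	return (uint32_t)(((uint64_t)pcala_page_offset(pc, v) >> 12) & 0xfffff);
}

uint32_t pcala_lo12(uint64_t v)
{
	return (uint32_t)(v & 0xfff);
}

/* What the instruction pair would compute from the two fields. */
uint64_t pcala_resolve(uint64_t pc, uint32_t hi20, uint32_t lo12)
{
	int64_t hi = (int64_t)(hi20 ^ 0x80000) - 0x80000; /* sign-extend 20 bits */
	int64_t lo = (int64_t)(lo12 ^ 0x800) - 0x800;     /* sign-extend 12 bits */

	return (pc & ~0xfffULL) + ((uint64_t)hi << 12) + (uint64_t)lo;
}
```

When pcala_in_range() fails, the loader has to reject the relocation, which
is the error path in apply_r_larch_pcala_hi20.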



2022-08-01 11:13:52

by WANG Xuerui

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] LoongArch: Support new relocation types

On 2022/8/1 18:08, Jinyang He wrote:
> [snip]
>
> Actually, I really hope kernel image is in the XKVRANGE, rather
> than being in XKPRANGE. So that we can limit kernel and modules
> be in 4GB range. I think it will make all work normally. :-(

Just my 2c. I'd prefer any scheme in which memfd_secret is possible. The
current design makes it impossible to carve out memory regions from
kernel's view, IIUC, which is of course something to improve...

--
WANG "xen0n" Xuerui

Linux/LoongArch mailing list: https://lore.kernel.org/loongarch/


2022-08-01 11:14:36

by Huacai Chen

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] LoongArch: Support new relocation types

Hi, Ruoyao,

On Mon, Aug 1, 2022 at 5:55 PM Xi Ruoyao <[email protected]> wrote:
>
> On Mon, 2022-08-01 at 10:34 +0800, Huacai Chen wrote:
> > Hi, all,
> >
> > On Mon, Aug 1, 2022 at 10:16 AM Youling Tang <[email protected]>
> > wrote:
> > >
> > > Hi, Ruoyao
> > >
> > > On 07/30/2022 10:52 AM, Xi Ruoyao wrote:
> > > > On Sat, 2022-07-30 at 10:24 +0800, Xi Ruoyao wrote:
> > > > > On Sat, 2022-07-30 at 01:55 +0800, Xi Ruoyao wrote:
> > > > > > On Fri, 2022-07-29 at 20:19 +0800, Youling Tang wrote:
> > > > > >
> > > > > > > On 07/29/2022 07:45 PM, Xi Ruoyao wrote:
> > > > > > > > Hmm... The problem is the "addresses" of per-cpu symbols
> > > > > > > > are
> > > > > > > > faked: they
> > > > > > > > are actually offsets from $r21. So we can't just load
> > > > > > > > such an
> > > > > > > > offset
> > > > > > > > with PCALA addressing.
> > > > > > > >
> > > > > > > > It looks like we'll need to introduce an attribute for GCC
> > > > > > > > to
> > > > > > > > make
> > > > > > > > an
> > > > > > > > variable "must be addressed via GOT", and add the
> > > > > > > > attribute into
> > > > > > > > PER_CPU_ATTRIBUTES.
> > > > > >
> > > > > > > Yes, we need a GCC attribute to specify the per-cpu
> > > > > > > variable.
> > > > > >
> > > > > > GCC patch adding "addr_global" attribute for LoongArch:
> > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2022-July/599064.html
> > > > > >
> > > > > > An experiment to use it:
> > > > > > https://github.com/xry111/linux/commit/c1d5d70
> > > > >
> > > > > Correction: https://github.com/xry111/linux/commit/c1d5d708
> > > > >
> > > > > It seems 7-bit SHA is not enough for kernel repo.
> > > >
> > > > If addr_global is rejected or not implemented (for example,
> > > > building the
> > > > kernel with GCC 12), *I expect* the following hack to work (I've
> > > > not
> > > > tested it because I'm AFK now). Using visibility in kernel seems
> > > > strange, but I think it may make some sense because the modules
> > > > are some
> > > > sort of similar to an ELF shared object being dlopen()'ed, and our
> > > > way
> > > > to inject per-CPU symbols is analog to ELF interposition.
> > > >
> > > > arch/loongarch/include/asm/percpu.h:
> > > >
> > > > #if !__has_attribute(__addr_global__) && defined(MODULE)
> > > > /* Magically remove "static" for per-CPU variables. */
> > > > # define ARCH_NEEDS_WEAK_PER_CPU
> > > > /* Force GOT-relocation for per-CPU variables. */
> > > > # define PER_CPU_ATTRIBUTES
> > > > __attribute__((__visibility__("default")))
> > > > #endif
> > > >
> > > > arch/loongarch/Makefile:
> > > >
> > > > # Hack for per-CPU variables, see PER_CPU_ATTRIBUTES in
> > > > # include/asm/percpu.h
> > > > if (call gcc-does-not-support-addr-global)
> > > > KBUILD_CFLAGS_MODULE += -fPIC -fvisibility=hidden
> > > > endif
> > > >
> > > Using the old toolchain (GCC 12) can successfully load the
> > > nf_tables.ko
> > > module after applying the above patch.
> > I don't like such a hack..., can we consider using old relocation
> > types when building by old toolchains?
>
>
> I don't like the hack too. I only developed it as an intellectual game.
>
> We need to consider multiple combinations:
>
> (1) Old GCC + old Binutils. We need -mla-local-with-abs for
> KBUILD_CFLAGS_MODULE.
>
> (2) Old GCC + new Binutils. We need -mla-local-with-abs for
> KBUILD_CFLAGS_MODULE, *and* adding the support for
> R_LARCH_ABS{_HI20,_LO12,64_LO20,64_HI12} in the kernel module loader.
>
> (3) New GCC + old Binutils. As new GCC should support our new attribute
> (I now intend to send V2 patch to gcc-patches using "movable" as the
> attribute name), no special action is needed.
>
> Basically, we need:
>
> (1) Handle R_LARCH_ABS{_HI20,_LO12,64_LO20,64_HI12} in the kernel module
> loader.
> (2) Add -Wa,-mla-local-with-abs into KBUILD_CFLAGS_MODULE if GCC version
> is <= 12.
There is another simple solution: just refuse to build if the
toolchain is too old.

Huacai

> --
> Xi Ruoyao <[email protected]>
> School of Aerospace Science and Technology, Xidian University
>

2022-08-01 11:38:37

by Youling Tang

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] LoongArch: Support new relocation types

Hi, Jinyang

On 08/01/2022 06:08 PM, Jinyang He wrote:
> On 08/01/2022 05:55 PM, Xi Ruoyao wrote:
>
>> On Mon, 2022-08-01 at 10:34 +0800, Huacai Chen wrote:
>>> Hi, all,
>>>
>>> On Mon, Aug 1, 2022 at 10:16 AM Youling Tang <[email protected]>
>>> wrote:
>>>> Hi, Ruoyao
>>>>
>>>> On 07/30/2022 10:52 AM, Xi Ruoyao wrote:
>>>>> On Sat, 2022-07-30 at 10:24 +0800, Xi Ruoyao wrote:
>>>>>> On Sat, 2022-07-30 at 01:55 +0800, Xi Ruoyao wrote:
>>>>>>> On Fri, 2022-07-29 at 20:19 +0800, Youling Tang wrote:
>>>>>>>
>>>>>>>> On 07/29/2022 07:45 PM, Xi Ruoyao wrote:
>>>>>>>>> Hmm... The problem is the "addresses" of per-cpu symbols
>>>>>>>>> are
>>>>>>>>> faked: they
>>>>>>>>> are actually offsets from $r21. So we can't just load
>>>>>>>>> such an
>>>>>>>>> offset
>>>>>>>>> with PCALA addressing.
>>>>>>>>>
>>>>>>>>> It looks like we'll need to introduce an attribute for GCC
>>>>>>>>> to
>>>>>>>>> make
>>>>>>>>> an
>>>>>>>>> variable "must be addressed via GOT", and add the
>>>>>>>>> attribute into
>>>>>>>>> PER_CPU_ATTRIBUTES.
>>>>>>>> Yes, we need a GCC attribute to specify the per-cpu
>>>>>>>> variable.
>>>>>>> GCC patch adding "addr_global" attribute for LoongArch:
>>>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-July/599064.html
>>>>>>>
>>>>>>> An experiment to use it:
>>>>>>> https://github.com/xry111/linux/commit/c1d5d70
>>>>>> Correction: https://github.com/xry111/linux/commit/c1d5d708
>>>>>>
>>>>>> It seems 7-bit SHA is not enough for kernel repo.
>>>>> If addr_global is rejected or not implemented (for example, building
>>>>> the kernel with GCC 12), *I expect* the following hack to work (I've
>>>>> not tested it because I'm AFK now). Using visibility in the kernel
>>>>> seems strange, but I think it may make some sense because the modules
>>>>> are somewhat similar to an ELF shared object being dlopen()'ed, and
>>>>> our way to inject per-CPU symbols is analogous to ELF interposition.
>>>>>
>>>>> arch/loongarch/include/asm/percpu.h:
>>>>>
>>>>> #if !__has_attribute(__addr_global__) && defined(MODULE)
>>>>> /* Magically remove "static" for per-CPU variables. */
>>>>> # define ARCH_NEEDS_WEAK_PER_CPU
>>>>> /* Force GOT-relocation for per-CPU variables. */
>>>>> # define PER_CPU_ATTRIBUTES __attribute__((__visibility__("default")))
>>>>> #endif
>>>>>
>>>>> arch/loongarch/Makefile:
>>>>>
>>>>> # Hack for per-CPU variables, see PER_CPU_ATTRIBUTES in
>>>>> # include/asm/percpu.h
>>>>> if (call gcc-does-not-support-addr-global)
>>>>> KBUILD_CFLAGS_MODULE += -fPIC -fvisibility=hidden
>>>>> endif
>>>>>
>>>> Using the old toolchain (GCC 12) can successfully load the
>>>> nf_tables.ko
>>>> module after applying the above patch.
>>> I don't like such a hack..., can we consider using old relocation
>>> types when building by old toolchains?
>>
>> I don't like the hack either. I only developed it as an intellectual game.
>>
>> We need to consider multiple combinations:
>>
>> (1) Old GCC + old Binutils. We need -mla-local-with-abs for
>> KBUILD_CFLAGS_MODULE.
>>
>> (2) Old GCC + new Binutils. We need -mla-local-with-abs for
>> KBUILD_CFLAGS_MODULE, *and* adding the support for
>> R_LARCH_ABS{_HI20,_LO12,64_LO20,64_HI12} in the kernel module loader.
>>
>> (3) New GCC + old Binutils. As new GCC should support our new attribute
>> (I now intend to send V2 patch to gcc-patches using "movable" as the
>> attribute name), no special action is needed.
>>
>> Basically, we need:
>>
>> (1) Handle R_LARCH_ABS{_HI20,_LO12,64_LO20,64_HI12} in the kernel module
>> loader.
>> (2) Add -Wa,-mla-local-with-abs into KBUILD_CFLAGS_MODULE if GCC version
>> is <= 12.
>
> Actually, I really hope kernel image is in the XKVRANGE, rather
> than being in XKPRANGE. So that we can limit kernel and modules
> be in 4GB range. I think it will make all work normally. :-(
>

Assuming that the kernel and modules are limited to a 4GB range, external
symbols would be accessed through pcrel32, so there would be no need to go
through GOT entries and no need for GOT support. Then there would be no
percpu problem, and everything would work normally?

Youling.


2022-08-01 11:50:58

by Xi Ruoyao

Subject: Re: [PATCH v4 0/4] LoongArch: Support new relocation types

On Mon, 2022-08-01 at 19:28 +0800, Youling Tang wrote:

> > Actually, I really hope kernel image is in the XKVRANGE, rather
> > than being in XKPRANGE. So that we can limit kernel and modules
> > be in 4GB range. I think it will make all work normally. :-(
> >
>
> Assuming that the kernel and modules are limited to 4G, the external
> symbols will be accessed through pcrel32, which means that there is no
> need to pass the GOT table entry and there is no need for got support

We'll still need to modify GCC to use PC-rel for accessing an object in
another TU (by default, or an option to control), instead of GOT. Or
just add support to GOT relocations here. But anyway it will be much
easier as we won't need to handle per-CPU variables specially.
--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University
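
For readers following the PC-relative versus GOT discussion, here is a minimal user-space sketch of the arithmetic behind a PC-relative address load (a `pcalau12i` followed by a 12-bit add). It mirrors the rounding used by the R_LARCH_PCALA_HI20 / R_LARCH_PCALA_LO12 handlers in patch 4/4, but it is not the kernel code: the helper names `fits_signed`, `pcala_split` and `pcala_resolve` are hypothetical, and the instruction behaviour is modelled under the assumption that the high part is sign-extended from 32 bits and the low part from 12 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Does val fit in a signed immediate of `bits` bits?  Same idea as the
 * kernel's signed_imm_check(); this helper is a stand-in for illustration. */
static bool fits_signed(int64_t val, unsigned int bits)
{
	int64_t lim = (int64_t)1 << (bits - 1);

	return val >= -lim && val < lim;
}

/* Compute the HI20 and LO12 immediates needed to reach address v from an
 * instruction at pc.  Returns false if the page delta exceeds 32 bits. */
static bool pcala_split(uint64_t pc, uint64_t v, uint32_t *hi20, uint32_t *lo12)
{
	/* Adding 0x800 compensates for the sign extension of the low 12 bits. */
	int64_t off = (int64_t)(((v + 0x800) & ~0xfffULL) - (pc & ~0xfffULL));

	if (!fits_signed(off, 32))
		return false;

	*hi20 = (uint32_t)(((uint64_t)off >> 12) & 0xfffff);
	*lo12 = (uint32_t)(v & 0xfff);
	return true;
}

/* Model of what the two-instruction sequence computes at run time. */
static uint64_t pcala_resolve(uint64_t pc, uint32_t hi20, uint32_t lo12)
{
	int64_t hi = (int64_t)(int32_t)(hi20 << 12);   /* sign-extend 32 bits */
	int64_t lo = (int64_t)(lo12 ^ 0x800) - 0x800;  /* sign-extend 12 bits */

	return (pc & ~0xfffULL) + (uint64_t)hi + (uint64_t)lo;
}
```

The `false` return corresponds to the case the thread is worried about: a target more than about 2GB away from the instruction cannot be reached by this two-instruction form.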

2022-08-01 12:54:48

by Youling Tang

Subject: Re: [PATCH v4 0/4] LoongArch: Support new relocation types



On 08/01/2022 07:39 PM, Xi Ruoyao wrote:
> On Mon, 2022-08-01 at 19:28 +0800, Youling Tang wrote:
>
>>> Actually, I really hope kernel image is in the XKVRANGE, rather
>>> than being in XKPRANGE. So that we can limit kernel and modules
>>> be in 4GB range. I think it will make all work normally. :-(
>>>
>>
>> Assuming that the kernel and modules are limited to 4G, the external
>> symbols will be accessed through pcrel32, which means that there is no
>> need to pass the GOT table entry and there is no need for got support
>
> We'll still need to modify GCC to use PC-rel for accessing an object in
> another TU (by default, or an option to control), instead of GOT. Or
> just add support to GOT relocations here. But anyway it will be much
> easier as we won't need to handle per-CPU variables specially.
>
Oh, old toolchains require extra handling no matter how we modify things.
Maybe rejecting old toolchain builds is a good option, as Huacai said.

Youling.


2022-08-01 12:55:51

by Huacai Chen

Subject: Re: [PATCH v4 0/4] LoongArch: Support new relocation types

Hi, Ruoyao,

On Mon, Aug 1, 2022 at 7:40 PM Xi Ruoyao <[email protected]> wrote:
>
> On Mon, 2022-08-01 at 19:28 +0800, Youling Tang wrote:
>
> > > Actually, I really hope kernel image is in the XKVRANGE, rather
> > > than being in XKPRANGE. So that we can limit kernel and modules
> > > be in 4GB range. I think it will make all work normally. :-(
> > >
> >
> > Assuming that the kernel and modules are limited to 4G, the external
> > symbols will be accessed through pcrel32, which means that there is no
> > need to pass the GOT table entry and there is no need for got support
>
> We'll still need to modify GCC to use PC-rel for accessing an object in
> another TU (by default, or an option to control), instead of GOT. Or
> just add support to GOT relocations here. But anyway it will be much
> easier as we won't need to handle per-CPU variables specially.
A fully TLB-mapped kernel may be supported in the future, but not now,
because there are a ton of problems. :)

Huacai
> --
> Xi Ruoyao <[email protected]>
> School of Aerospace Science and Technology, Xidian University

2022-08-02 07:48:03

by Xi Ruoyao

Subject: Re: [PATCH v4 0/4] LoongArch: Support new relocation types

On Tue, 2022-08-02 at 14:39 +0800, Lulu Cheng wrote:

> >
> > Oh, old toolchains require extra handling no matter how we modify things.
> > Maybe rejecting old toolchain builds is a good option, as Huacai
> > said.

> Sorry to ask, is it possible to use an absolute address for the access here?
> That is, for percpu, do not use pcrel or GOT, and directly use
> four instructions to obtain the absolute address.
> This can likewise be achieved by adding an attribute in GCC.

Both GOT and ABS will work. But to me GOT is better because it only
needs two instructions while ABS needs four.

The most troubling issue is how to support the old GCC. It seems we
have to check the GCC version and use -Wa,-mla-local-with-abs for GCC 12
(I still think GOT is better, but we don't have -mla-local-with-got,
*and* the ABS option will make every local object address load cost 4
instructions), or we just say "it's impossible to use GCC 12 to build
Linux 6.0 for LoongArch".

--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University
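
To illustrate why an absolute address load costs four instructions, the sketch below splits a 64-bit address into the four immediates covered by R_LARCH_ABS_HI20, R_LARCH_ABS_LO12, R_LARCH_ABS64_LO20 and R_LARCH_ABS64_HI12, then reassembles it the way a `lu12i.w` / `ori` / `lu32i.d` / `lu52i.d` sequence would. This is a user-space model written for this note, not code from the series; `struct abs_fields`, `abs_split` and `abs_materialize` are hypothetical names, and the instruction sequence is an assumption about what the assembler emits for an absolute load.

```c
#include <stdint.h>

/* Hypothetical container for the four immediates of an absolute load. */
struct abs_fields {
	uint32_t hi20;    /* address bits 31..12, R_LARCH_ABS_HI20   */
	uint32_t lo12;    /* address bits 11..0,  R_LARCH_ABS_LO12   */
	uint32_t lo20_64; /* address bits 51..32, R_LARCH_ABS64_LO20 */
	uint32_t hi12_64; /* address bits 63..52, R_LARCH_ABS64_HI12 */
};

/* Cut a 64-bit address into the four relocation fields. */
static struct abs_fields abs_split(uint64_t v)
{
	struct abs_fields f = {
		.hi20    = (uint32_t)((v >> 12) & 0xfffff),
		.lo12    = (uint32_t)(v & 0xfff),
		.lo20_64 = (uint32_t)((v >> 32) & 0xfffff),
		.hi12_64 = (uint32_t)((v >> 52) & 0xfff),
	};

	return f;
}

/* Model the four instructions, one step per instruction. */
static uint64_t abs_materialize(struct abs_fields f)
{
	/* lu12i.w: load bits 31..12, sign-extended from 32 bits. */
	uint64_t r = (uint64_t)(int64_t)(int32_t)(f.hi20 << 12);

	/* ori: fill in bits 11..0 (zero-extended immediate). */
	r |= f.lo12;

	/* lu32i.d: keep bits 31..0, set bits 63..32 to the sign-extended
	 * 20-bit immediate. */
	int64_t s = (int64_t)(f.lo20_64 ^ 0x80000) - 0x80000;
	r = (r & 0xffffffffULL) | ((uint64_t)s << 32);

	/* lu52i.d: keep bits 51..0, set bits 63..52. */
	r = (r & 0x000fffffffffffffULL) | ((uint64_t)f.hi12_64 << 52);

	return r;
}
```

Each field needs its own instruction, which is the cost being compared against the two-instruction GOT load.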

2022-08-09 12:02:28

by Youling Tang

Subject: Re: [PATCH v4 4/4] LoongArch: Support modules with new relocation types

Hi, Ruoyao

On 07/29/2022 04:42 PM, Xi Ruoyao wrote:
> If GAS 2.40 and/or GCC 13 is used to build the kernel, the modules will
> contain R_LARCH_B26, R_LARCH_PCALA_HI20, R_LARCH_PCALA_LO12,
> R_LARCH_GOT_PC_HI20, and R_LARCH_GOT_PC_LO12 relocations. Support them
> in the module loader so that a kernel built with the latest toolchain
> is able to load its modules.
>
> Signed-off-by: Xi Ruoyao <[email protected]>
> ---
> arch/loongarch/include/asm/elf.h | 37 +++++++++++
> arch/loongarch/kernel/module-sections.c | 12 +++-
> arch/loongarch/kernel/module.c | 85 +++++++++++++++++++++++++
> 3 files changed, 132 insertions(+), 2 deletions(-)
>
> diff --git a/arch/loongarch/include/asm/elf.h b/arch/loongarch/include/asm/elf.h
> index 5f3ff4781fda..7af0cebf28d7 100644
> --- a/arch/loongarch/include/asm/elf.h
> +++ b/arch/loongarch/include/asm/elf.h
> @@ -74,6 +74,43 @@
> #define R_LARCH_SUB64 56
> #define R_LARCH_GNU_VTINHERIT 57
> #define R_LARCH_GNU_VTENTRY 58
> +#define R_LARCH_B16 64
> +#define R_LARCH_B21 65
> +#define R_LARCH_B26 66
> +#define R_LARCH_ABS_HI20 67
> +#define R_LARCH_ABS_LO12 68
> +#define R_LARCH_ABS64_LO20 69
> +#define R_LARCH_ABS64_HI12 70
ARCH_REL_TYPE_ABS should be extended with the corresponding absolute
relocation types now that the new relocation types are added. Maybe we
should add R_LARCH_ABS* in arch/loongarch/vdso/Makefile?

Thanks,
Youling

> +#define R_LARCH_PCALA_HI20 71
> +#define R_LARCH_PCALA_LO12 72
> +#define R_LARCH_PCALA64_LO20 73
> +#define R_LARCH_PCALA64_HI12 74
> +#define R_LARCH_GOT_PC_HI20 75
> +#define R_LARCH_GOT_PC_LO12 76
> +#define R_LARCH_GOT64_PC_LO20 77
> +#define R_LARCH_GOT64_PC_HI12 78
> +#define R_LARCH_GOT_HI20 79
> +#define R_LARCH_GOT_LO12 80
> +#define R_LARCH_GOT64_LO20 81
> +#define R_LARCH_GOT64_HI12 82
> +#define R_LARCH_TLS_LE_HI20 83
> +#define R_LARCH_TLS_LE_LO12 84
> +#define R_LARCH_TLS_LE64_LO20 85
> +#define R_LARCH_TLS_LE64_HI12 86
> +#define R_LARCH_TLS_IE_PC_HI20 87
> +#define R_LARCH_TLS_IE_PC_LO12 88
> +#define R_LARCH_TLS_IE64_PC_LO20 89
> +#define R_LARCH_TLS_IE64_PC_HI12 90
> +#define R_LARCH_TLS_IE_HI20 91
> +#define R_LARCH_TLS_IE_LO12 92
> +#define R_LARCH_TLS_IE64_LO20 93
> +#define R_LARCH_TLS_IE64_HI12 94
> +#define R_LARCH_TLS_LD_PC_HI20 95
> +#define R_LARCH_TLS_LD_HI20 96
> +#define R_LARCH_TLS_GD_PC_HI20 97
> +#define R_LARCH_TLS_GD_HI20 98
> +#define R_LARCH_32_PCREL 99
> +#define R_LARCH_RELAX 100
>
> #ifndef ELF_ARCH
>
> diff --git a/arch/loongarch/kernel/module-sections.c b/arch/loongarch/kernel/module-sections.c
> index 36a77771d18c..8c0e4ad048cc 100644
> --- a/arch/loongarch/kernel/module-sections.c
> +++ b/arch/loongarch/kernel/module-sections.c
> @@ -76,12 +76,20 @@ static void count_max_entries(Elf_Rela *relas, int num,
>
> for (i = 0; i < num; i++) {
> type = ELF_R_TYPE(relas[i].r_info);
> - if (type == R_LARCH_SOP_PUSH_PLT_PCREL) {
> + switch (type) {
> + case R_LARCH_SOP_PUSH_PLT_PCREL:
> + case R_LARCH_B26:
> if (!duplicate_rela(relas, i))
> (*plts)++;
> - } else if (type == R_LARCH_SOP_PUSH_GPREL)
> + break;
> + case R_LARCH_SOP_PUSH_GPREL:
> + case R_LARCH_GOT_PC_HI20:
> if (!duplicate_rela(relas, i))
> (*gots)++;
> + break;
> + default:
> + break; /* Do nothing. */
> + }
> }
> }
>
> diff --git a/arch/loongarch/kernel/module.c b/arch/loongarch/kernel/module.c
> index 3ac4fbb5f109..c7b40150e1f0 100644
> --- a/arch/loongarch/kernel/module.c
> +++ b/arch/loongarch/kernel/module.c
> @@ -291,6 +291,86 @@ static int apply_r_larch_add_sub(struct module *mod, u32 *location, Elf_Addr v,
> }
> }
>
> +static int apply_r_larch_b26(struct module *mod, u32 *location, Elf_Addr v,
> + s64 *rela_stack, size_t *rela_stack_top, unsigned int type)
> +{
> + ptrdiff_t offset = (void *)v - (void *)location;
> + union loongarch_instruction *insn = (union loongarch_instruction *)location;
> +
> + if (offset >= SZ_128M)
> + v = module_emit_plt_entry(mod, v);
> +
> + if (offset < -SZ_128M)
> + v = module_emit_plt_entry(mod, v);
> +
> + offset = (void *)v - (void *)location;
> +
> + if (offset & 3) {
> + pr_err("module %s: jump offset = 0x%llx unaligned! dangerous R_LARCH_B26 (%u) relocation\n",
> + mod->name, (long long)offset, type);
> + return -ENOEXEC;
> + }
> +
> + if (!signed_imm_check(offset, 28)) {
> + pr_err("module %s: jump offset = 0x%llx overflow! dangerous R_LARCH_B26 (%u) relocation\n",
> + mod->name, (long long)offset, type);
> + return -ENOEXEC;
> + }
> +
> + offset >>= 2;
> + insn->reg0i26_format.immediate_l = offset & 0xffff;
> + insn->reg0i26_format.immediate_h = (offset >> 16) & 0x3ff;
> + return 0;
> +}
> +
> +static int apply_r_larch_pcala_hi20(struct module *mod, u32 *location,
> + Elf_Addr v, s64 *rela_stack, size_t *rela_stack_top,
> + unsigned int type)
> +{
> + ptrdiff_t offset = (void *)((v + 0x800) & ~0xfff) -
> + (void *)((Elf_Addr)location & ~0xfff);
> + union loongarch_instruction *insn = (union loongarch_instruction *)location;
> +
> + if (!signed_imm_check(offset, 32)) {
> + pr_err("module %s: PCALA offset = 0x%llx does not fit in 32-bit signed and is unsupported by kernel! dangerous %s (%u) relocation\n",
> + mod->name, (long long)offset, __func__, type);
> + return -ENOEXEC;
> + }
> +
> + insn->reg1i20_format.immediate = (offset >> 12) & 0xfffff;
> + return 0;
> +}
> +
> +static int apply_r_larch_got_pc_hi20(struct module *mod, u32 *location,
> + Elf_Addr v, s64 *rela_stack, size_t *rela_stack_top,
> + unsigned int type)
> +{
> + Elf_Addr got = module_emit_got_entry(mod, v);
> +
> + return apply_r_larch_pcala_hi20(mod, location, got, rela_stack,
> + rela_stack_top, type);
> +}
> +
> +static int apply_r_larch_pcala_lo12(struct module *mod, u32 *location,
> + Elf_Addr v, s64 *rela_stack, size_t *rela_stack_top,
> + unsigned int type)
> +{
> + union loongarch_instruction *insn = (union loongarch_instruction *)location;
> +
> + insn->reg2i12_format.immediate = v & 0xfff;
> + return 0;
> +}
> +
> +static int apply_r_larch_got_pc_lo12(struct module *mod, u32 *location,
> + Elf_Addr v, s64 *rela_stack, size_t *rela_stack_top,
> + unsigned int type)
> +{
> + Elf_Addr got = module_emit_got_entry(mod, v);
> +
> + return apply_r_larch_pcala_lo12(mod, location, got, rela_stack,
> + rela_stack_top, type);
> +}
> +
> /*
> * reloc_handlers_rela() - Apply a particular relocation to a module
> * @mod: the module to apply the reloc to
> @@ -321,6 +401,11 @@ static reloc_rela_handler reloc_rela_handlers[] = {
> [R_LARCH_SOP_SUB ... R_LARCH_SOP_IF_ELSE] = apply_r_larch_sop,
> [R_LARCH_SOP_POP_32_S_10_5 ... R_LARCH_SOP_POP_32_U] = apply_r_larch_sop_imm_field,
> [R_LARCH_ADD32 ... R_LARCH_SUB64] = apply_r_larch_add_sub,
> + [R_LARCH_B26] = apply_r_larch_b26,
> + [R_LARCH_PCALA_HI20] = apply_r_larch_pcala_hi20,
> + [R_LARCH_PCALA_LO12] = apply_r_larch_pcala_lo12,
> + [R_LARCH_GOT_PC_HI20] = apply_r_larch_got_pc_hi20,
> + [R_LARCH_GOT_PC_LO12] = apply_r_larch_got_pc_lo12,
> };
>
> int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
>
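
As a companion to the apply_r_larch_b26() hunk quoted above, here is a small user-space sketch of the encoding it performs: the byte offset must be 4-byte aligned and fit in 28 signed bits, and the resulting 26-bit word offset is split into a 16-bit low field and a 10-bit high field. The names `b26_encode` and `b26_decode` are hypothetical and exist only for this illustration. The kernel handler additionally redirects out-of-range targets through a PLT entry before checking, which this sketch does not model.

```c
#include <stdint.h>

/* Encode a byte offset into the two immediate fields of a 26-bit branch.
 * Returns 0 on success, -1 if the offset is unaligned or out of the
 * +/-128 MiB range. */
static int b26_encode(int64_t offset, uint32_t *imm_l, uint32_t *imm_h)
{
	/* Branch targets are instruction-aligned. */
	if (offset & 3)
		return -1;

	/* Equivalent to a 28-bit signed immediate check on the byte offset. */
	if (offset < -(INT64_C(1) << 27) || offset >= (INT64_C(1) << 27))
		return -1;

	/* Drop the two always-zero low bits; only the low 26 bits are kept. */
	uint64_t words = (uint64_t)offset >> 2;

	*imm_l = (uint32_t)(words & 0xffff);        /* low 16 bits  */
	*imm_h = (uint32_t)((words >> 16) & 0x3ff); /* high 10 bits */
	return 0;
}

/* Reverse operation: rebuild the signed byte offset from the two fields. */
static int64_t b26_decode(uint32_t imm_l, uint32_t imm_h)
{
	uint32_t raw = (imm_h << 16) | imm_l;                   /* 26 bits */
	int64_t words = (int64_t)(raw ^ 0x2000000) - 0x2000000; /* sign-extend */

	return words * 4;
}
```

A round trip through `b26_encode` and `b26_decode` returns the original offset for any aligned value in range, which is a quick way to sanity-check the field widths.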

2022-08-27 14:05:06

by Huacai Chen

Subject: Re: [PATCH v4 0/4] LoongArch: Support new relocation types

Hi, Arnd,

Could you please update the cross-compilers here [1] with the current
snapshot of binutils and gcc? Or can that only be done once the new
binutils/gcc are released?

New relocation types will be supported in binutils-2.40 and gcc-13, but
those will be released after kernel-6.1. If possible, we want to get
this series upstream in the 6.1 cycle and remove support for the old
relocation types at the same time (this means old toolchains can no
longer be used to build new kernels). Since 5.19 and 6.0 are un-bootable
on LoongArch, this will break nothing and will make life easier by
reducing maintenance cost.

[1] https://mirrors.edge.kernel.org/pub/tools/crosstool/


Huacai

On Tue, Aug 2, 2022 at 3:15 PM Xi Ruoyao <[email protected]> wrote:
>
> On Tue, 2022-08-02 at 14:39 +0800, Lulu Cheng wrote:
>
> > >
> > > Oh, old toolchains require extra handling no matter how we modify things.
> > > Maybe rejecting old toolchain builds is a good option, as Huacai
> > > said.
>
> > Sorry to ask, is it possible to use an absolute address for the access here?
> > That is, for percpu, do not use pcrel or GOT, and directly use
> > four instructions to obtain the absolute address.
> > This can likewise be achieved by adding an attribute in GCC.
>
> Both GOT and ABS will work. But to me GOT is better because it only
> needs two instructions while ABS needs four.
>
> The most troubling issue is how to support the old GCC. It seems we
> have to check the GCC version and use -Wa,-mla-local-with-abs for GCC 12
> (I still think GOT is better, but we don't have -mla-local-with-got,
> *and* the ABS option will make every local object address load cost 4
> instructions), or we just say "it's impossible to use GCC 12 to build
> Linux 6.0 for LoongArch".
>
> --
> Xi Ruoyao <[email protected]>
> School of Aerospace Science and Technology, Xidian University