2023-06-01 12:12:34

by Masahiro Yamada

[permalink] [raw]
Subject: [PATCH 0/7] modpost: fix section mismatch detection for ARM

addend_arm_rel() is completely, entirely bogus.

Fix the code, and also catch more section mismatches.

I confirmed this series is cleanly applicable to linux-next 20230601.



Masahiro Yamada (7):
modpost: fix section mismatch message for R_ARM_ABS32
modpost: fix section mismatch message for R_ARM_{PC24,CALL,JUMP24}
modpost: detect section mismatch for R_ARM_{MOVW_ABS_NC,MOVT_ABS}
modpost: refactor find_fromsym() and find_tosym()
modpost: detect section mismatch for R_ARM_THM_{MOVW_ABS_NC,MOVT_ABS}
modpost: fix section_mismatch message for
R_ARM_THM_{CALL,JUMP24,JUMP19}
modpost: detect section mismatch for R_ARM_REL32

scripts/mod/modpost.c | 193 ++++++++++++++++++++++++++++--------------
1 file changed, 129 insertions(+), 64 deletions(-)

--
2.39.2



2023-06-01 12:12:52

by Masahiro Yamada

[permalink] [raw]
Subject: [PATCH 2/7] modpost: fix section mismatch message for R_ARM_{PC24,CALL,JUMP24}

addend_arm_rel() processes R_ARM_PC24, R_ARM_CALL, R_ARM_JUMP24 in a
wrong way.

Here, test code.

[test code for R_ARM_JUMP24]

.section .init.text,"ax"
bar:
bx lr

.section .text,"ax"
.globl foo
foo:
b bar

[test code for R_ARM_CALL]

.section .init.text,"ax"
bar:
bx lr

.section .text,"ax"
.globl foo
foo:
push {lr}
bl bar
pop {pc}

If you compile it with ARM multi_v7_defconfig, modpost will show the
symbol name, (unknown).

WARNING: modpost: vmlinux.o: section mismatch in reference: foo (section: .text) -> (unknown) (section: .init.text)

(You need to use GNU linker instead of LLD to reproduce it.)

Fix the code to make modpost show the correct symbol name.

I imported (with adjustment) sign_extend32() from include/linux/bitops.h.

The '+8' is the compensation for pc-relative instruction. It is
documented in "ELF for the Arm Architecture" [1].

"If the relocation is pc-relative then compensation for the PC bias
(the PC value is 8 bytes ahead of the executing instruction in Arm
state and 4 bytes in Thumb state) must be encoded in the relocation
by the object producer."

[1]: https://github.com/ARM-software/abi-aa/blob/main/aaelf32/aaelf32.rst

Fixes: 56a974fa2d59 ("kbuild: make better section mismatch reports on arm")
Fixes: 6e2e340b59d2 ("ARM: 7324/1: modpost: Fix section warnings for ARM for many compilers")
Signed-off-by: Masahiro Yamada <[email protected]>
---

scripts/mod/modpost.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index c68dad45ace2..e47bba7cfad2 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -1277,12 +1277,20 @@ static int addend_386_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)
#define R_ARM_THM_JUMP19 51
#endif

+static int32_t sign_extend32(int32_t value, int index)
+{
+ uint8_t shift = 31 - index;
+
+ return (int32_t)(value << shift) >> shift;
+}
+
static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)
{
unsigned int r_typ = ELF_R_TYPE(r->r_info);
Elf_Sym *sym = elf->symtab_start + ELF_R_SYM(r->r_info);
void *loc = reloc_location(elf, sechdr, r);
uint32_t inst;
+ int32_t offset;

switch (r_typ) {
case R_ARM_ABS32:
@@ -1292,6 +1300,10 @@ static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)
case R_ARM_PC24:
case R_ARM_CALL:
case R_ARM_JUMP24:
+ inst = TO_NATIVE(*(uint32_t *)loc);
+ offset = sign_extend32((inst & 0x00ffffff) << 2, 25);
+ r->r_addend = offset + sym->st_value + 8;
+ break;
case R_ARM_THM_CALL:
case R_ARM_THM_JUMP24:
case R_ARM_THM_JUMP19:
--
2.39.2


2023-06-01 12:12:52

by Masahiro Yamada

[permalink] [raw]
Subject: [PATCH 3/7] modpost: detect section mismatch for R_ARM_{MOVW_ABS_NC,MOVT_ABS}

For ARM defconfig (i.e. multi_v7_defconfig), modpost fails to detect
some types of section mismatches.

[test code]

#include <linux/init.h>

int __initdata foo;
int get_foo(void) { return foo; }

It is apparently a bad reference, but modpost does not report anything.

The test code above produces the following relocations.

Relocation section '.rel.text' at offset 0x200 contains 2 entries:
Offset Info Type Sym.Value Sym. Name
00000000 0000062b R_ARM_MOVW_ABS_NC 00000000 .LANCHOR0
00000004 0000062c R_ARM_MOVT_ABS 00000000 .LANCHOR0

Currently, R_ARM_MOVW_ABS_NC and R_ARM_MOVT_ABS are just skipped.

Add code to handle them. I checked arch/arm/kernel/module.c to learn
how the offset is encoded in the instruction.

The referenced symbol in relocation might be a local anchor.
If is_valid_name() returns false, let's search for a better symbol name.

Signed-off-by: Masahiro Yamada <[email protected]>
---

scripts/mod/modpost.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index e47bba7cfad2..5a5e802b160c 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -1078,7 +1078,7 @@ static inline int is_valid_name(struct elf_info *elf, Elf_Sym *sym)
/**
* Find symbol based on relocation record info.
* In some cases the symbol supplied is a valid symbol so
- * return refsym. If st_name != 0 we assume this is a valid symbol.
+ * return refsym. If is_valid_name() == true, we assume this is a valid symbol.
* In other cases the symbol needs to be looked up in the symbol table
* based on section and address.
* **/
@@ -1091,7 +1091,7 @@ static Elf_Sym *find_tosym(struct elf_info *elf, Elf64_Sword addr,
Elf64_Sword d;
unsigned int relsym_secindex;

- if (relsym->st_name != 0)
+ if (is_valid_name(elf, relsym))
return relsym;

/*
@@ -1297,6 +1297,13 @@ static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)
inst = TO_NATIVE(*(uint32_t *)loc);
r->r_addend = inst + sym->st_value;
break;
+ case R_ARM_MOVW_ABS_NC:
+ case R_ARM_MOVT_ABS:
+ inst = TO_NATIVE(*(uint32_t *)loc);
+ offset = sign_extend32(((inst & 0xf0000) >> 4) | (inst & 0xfff),
+ 15);
+ r->r_addend = offset + sym->st_value;
+ break;
case R_ARM_PC24:
case R_ARM_CALL:
case R_ARM_JUMP24:
--
2.39.2


2023-06-01 12:13:00

by Masahiro Yamada

[permalink] [raw]
Subject: [PATCH 4/7] modpost: refactor find_fromsym() and find_tosym()

find_fromsym() and find_tosym() are similar - both of them iterate
in the .symtab section and return the nearest symbol.

The difference between them is that find_tosym() allows a negative
distance, but the distance must be less than 20.

Factor out the common part into find_nearest_sym().

Signed-off-by: Masahiro Yamada <[email protected]>
---

scripts/mod/modpost.c | 95 ++++++++++++++++---------------------------
1 file changed, 36 insertions(+), 59 deletions(-)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 5a5e802b160c..32d56efe3f3b 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -1075,81 +1075,58 @@ static inline int is_valid_name(struct elf_info *elf, Elf_Sym *sym)
return !is_mapping_symbol(name);
}

-/**
- * Find symbol based on relocation record info.
- * In some cases the symbol supplied is a valid symbol so
- * return refsym. If is_valid_name() == true, we assume this is a valid symbol.
- * In other cases the symbol needs to be looked up in the symbol table
- * based on section and address.
- * **/
-static Elf_Sym *find_tosym(struct elf_info *elf, Elf64_Sword addr,
- Elf_Sym *relsym)
+/* Look up the nearest symbol based on the section and the address */
+static Elf_Sym *find_nearest_sym(struct elf_info *elf, Elf_Addr addr,
+ unsigned int secndx, bool allow_negative,
+ Elf_Addr min_distance)
{
Elf_Sym *sym;
Elf_Sym *near = NULL;
- Elf64_Sword distance = 20;
- Elf64_Sword d;
- unsigned int relsym_secindex;
-
- if (is_valid_name(elf, relsym))
- return relsym;
-
- /*
- * Strive to find a better symbol name, but the resulting name may not
- * match the symbol referenced in the original code.
- */
- relsym_secindex = get_secindex(elf, relsym);
- for (sym = elf->symtab_start; sym < elf->symtab_stop; sym++) {
- if (get_secindex(elf, sym) != relsym_secindex)
- continue;
- if (ELF_ST_TYPE(sym->st_info) == STT_SECTION)
- continue;
- if (!is_valid_name(elf, sym))
- continue;
- if (sym->st_value == addr)
- return sym;
- /* Find a symbol nearby - addr are maybe negative */
- d = sym->st_value - addr;
- if (d < 0)
- d = addr - sym->st_value;
- if (d < distance) {
- distance = d;
- near = sym;
- }
- }
- /* We need a close match */
- if (distance < 20)
- return near;
- else
- return NULL;
-}
-
-/*
- * Find symbols before or equal addr and after addr - in the section sec.
- * If we find two symbols with equal offset prefer one with a valid name.
- * The ELF format may have a better way to detect what type of symbol
- * it is, but this works for now.
- **/
-static Elf_Sym *find_fromsym(struct elf_info *elf, Elf_Addr addr,
- unsigned int secndx)
-{
- Elf_Sym *sym;
- Elf_Sym *near = NULL;
- Elf_Addr distance = ~0;
+ Elf_Addr distance;

for (sym = elf->symtab_start; sym < elf->symtab_stop; sym++) {
if (get_secindex(elf, sym) != secndx)
continue;
if (!is_valid_name(elf, sym))
continue;
- if (sym->st_value <= addr && addr - sym->st_value <= distance) {
+
+ if (addr >= sym->st_value)
distance = addr - sym->st_value;
+ else if (allow_negative)
+ distance = sym->st_value - addr;
+ else
+ continue;
+
+ if (distance <= min_distance) {
+ min_distance = distance;
near = sym;
}
+
+ if (min_distance == 0)
+ break;
}
return near;
}

+static Elf_Sym *find_fromsym(struct elf_info *elf, Elf_Addr addr,
+ unsigned int secndx)
+{
+ return find_nearest_sym(elf, addr, secndx, false, ~0);
+}
+
+static Elf_Sym *find_tosym(struct elf_info *elf, Elf_Addr addr, Elf_Sym *sym)
+{
+ /* If the supplied symbol has a valid name, return it */
+ if (is_valid_name(elf, sym))
+ return sym;
+
+ /*
+ * Strive to find a better symbol name, but the resulting name may not
+ * match the symbol referenced in the original code.
+ */
+ return find_nearest_sym(elf, addr, get_secindex(elf, sym), true, 20);
+}
+
static bool is_executable_section(struct elf_info *elf, unsigned int secndx)
{
if (secndx > elf->num_sections)
--
2.39.2


2023-06-01 12:13:19

by Masahiro Yamada

[permalink] [raw]
Subject: [PATCH 7/7] modpost: detect section mismatch for R_ARM_REL32

For ARM, modpost fails to detect some types of section mismatches.

[test code]

.section .init.data,"aw"
bar:
.long 0

.section .data,"aw"
.globl foo
foo:
.long bar - .

It is apparently a bad reference, but modpost does not report anything.

The test code above produces the following relocations.

Relocation section '.rel.data' at offset 0xe8 contains 1 entry:
Offset Info Type Sym.Value Sym. Name
00000000 00000403 R_ARM_REL32 00000000 .init.data

Currently, R_ARM_REL32 is just skipped.

Handle it like R_ARM_ABS32.

Signed-off-by: Masahiro Yamada <[email protected]>
---

scripts/mod/modpost.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 55d142bb000b..9f0c87064ca5 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -1281,6 +1281,7 @@ static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)

switch (r_typ) {
case R_ARM_ABS32:
+ case R_ARM_REL32:
inst = TO_NATIVE(*(uint32_t *)loc);
r->r_addend = inst + sym->st_value;
break;
--
2.39.2


2023-06-01 12:14:09

by Masahiro Yamada

[permalink] [raw]
Subject: [PATCH 5/7] modpost: detect section mismatch for R_ARM_THM_{MOVW_ABS_NC,MOVT_ABS}

When CONFIG_THUMB2_KERNEL is enabled, modpost fails to detect some
types of section mismatches.

[test code]

#include <linux/init.h>

int __initdata foo;
int get_foo(void) { return foo; }

It is apparently a bad reference, but modpost does not report anything.

The test code above produces the following relocations.

Relocation section '.rel.text' at offset 0x1e8 contains 2 entries:
Offset Info Type Sym.Value Sym. Name
00000000 0000052f R_ARM_THM_MOVW_AB 00000000 .LANCHOR0
00000004 00000530 R_ARM_THM_MOVT_AB 00000000 .LANCHOR0

Currently, R_ARM_THM_MOVW_ABS_NC and R_ARM_THM_MOVT_ABS are just skipped.

Add code to handle them. I checked arch/arm/kernel/module.c to learn
how the offset is encoded in the instruction.

One more thing to note for Thumb instructions - the st_value is an odd
value, so you need to mask the bit 0 to get the offset. Otherwise, you
will get an off-by-one error in the nearest symbol look-up.

It is documented in "ELF for the ARM Architecture" [1]:

* If the symbol addresses a Thumb instruction, its value is the address
of the instruction with bit zero set (in a relocatable object, the
section offset with bit zero set).

* For the purposes of relocation the value used shall be the address
of the instruction (st_value & ~1).

[1]: https://github.com/ARM-software/abi-aa/blob/main/aaelf32/aaelf32.rst

Signed-off-by: Masahiro Yamada <[email protected]>
---

scripts/mod/modpost.c | 31 ++++++++++++++++++++++++++-----
1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 32d56efe3f3b..528aa9175e84 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -1082,7 +1082,8 @@ static Elf_Sym *find_nearest_sym(struct elf_info *elf, Elf_Addr addr,
{
Elf_Sym *sym;
Elf_Sym *near = NULL;
- Elf_Addr distance;
+ Elf_Addr sym_addr, distance;
+ bool is_arm = (elf->hdr->e_machine == EM_ARM);

for (sym = elf->symtab_start; sym < elf->symtab_stop; sym++) {
if (get_secindex(elf, sym) != secndx)
@@ -1090,10 +1091,19 @@ static Elf_Sym *find_nearest_sym(struct elf_info *elf, Elf_Addr addr,
if (!is_valid_name(elf, sym))
continue;

- if (addr >= sym->st_value)
- distance = addr - sym->st_value;
+ sym_addr = sym->st_value;
+
+ /*
+ * For ARM Thumb instruction, the bit 0 of st_value is set.
+ * Mask it to get the address.
+ */
+ if (is_arm)
+ sym_addr &= ~1;
+
+ if (addr >= sym_addr)
+ distance = addr - sym_addr;
else if (allow_negative)
- distance = sym->st_value - addr;
+ distance = sym_addr - addr;
else
continue;

@@ -1266,7 +1276,7 @@ static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)
unsigned int r_typ = ELF_R_TYPE(r->r_info);
Elf_Sym *sym = elf->symtab_start + ELF_R_SYM(r->r_info);
void *loc = reloc_location(elf, sechdr, r);
- uint32_t inst;
+ uint32_t inst, upper, lower;
int32_t offset;

switch (r_typ) {
@@ -1288,6 +1298,17 @@ static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)
offset = sign_extend32((inst & 0x00ffffff) << 2, 25);
r->r_addend = offset + sym->st_value + 8;
break;
+ case R_ARM_THM_MOVW_ABS_NC:
+ case R_ARM_THM_MOVT_ABS:
+ upper = TO_NATIVE(*(uint16_t *)loc);
+ lower = TO_NATIVE(*((uint16_t *)loc + 1));
+ offset = sign_extend32(((upper & 0x000f) << 12) |
+ ((upper & 0x0400) << 1) |
+ ((lower & 0x7000) >> 4) |
+ (lower & 0x00ff),
+ 15);
+ r->r_addend = offset + sym->st_value;
+ break;
case R_ARM_THM_CALL:
case R_ARM_THM_JUMP24:
case R_ARM_THM_JUMP19:
--
2.39.2


2023-06-01 12:16:29

by Masahiro Yamada

[permalink] [raw]
Subject: [PATCH 6/7] modpost: fix section_mismatch message for R_ARM_THM_{CALL,JUMP24,JUMP19}

addend_arm_rel() processes R_ARM_THM_CALL, R_ARM_THM_JUMP24,
R_ARM_THM_JUMP19 in a wrong way.

Here, test code.

[test code for R_ARM_THM_JUMP24]

  .section .init.text,"ax"
  bar:
          bx      lr

  .section .text,"ax"
  .globl foo
  foo:
          b       bar

[test code for R_ARM_THM_CALL]

  .section .init.text,"ax"
  bar:
          bx      lr

  .section .text,"ax"
  .globl foo
  foo:
          push    {lr}
          bl      bar
          pop     {pc}

If you compile it with CONFIG_THUMB2_KERNEL=y, modpost will show the
symbol name, (unknown).

  WARNING: modpost: vmlinux.o: section mismatch in reference: foo (section: .text) -> (unknown) (section: .init.text)

(You need to use GNU linker instead of LLD to reproduce it.)

Fix the code to make modpost show the correct symbol name. I checked
arch/arm/kernel/module.c to learn the encoding of R_ARM_THM_CALL and
R_ARM_THM_JUMP24. The module does not support R_ARM_THM_JUMP19, but
I checked its encoding in ARM ARM.

The '+4' is the compensation for pc-relative instruction. It is
documented in "ELF for the Arm Architecture" [1].

  "If the relocation is pc-relative then compensation for the PC bias
  (the PC value is 8 bytes ahead of the executing instruction in Arm
  state and 4 bytes in Thumb state) must be encoded in the relocation
  by the object producer."

[1]: https://github.com/ARM-software/abi-aa/blob/main/aaelf32/aaelf32.rst

Fixes: c9698e5cd6ad ("ARM: 7964/1: Detect section mismatches in thumb relocations")
Signed-off-by: Masahiro Yamada <[email protected]>
---

scripts/mod/modpost.c | 53 ++++++++++++++++++++++++++++++++++++++-----
1 file changed, 47 insertions(+), 6 deletions(-)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 528aa9175e84..55d142bb000b 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -1276,7 +1276,7 @@ static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)
unsigned int r_typ = ELF_R_TYPE(r->r_info);
Elf_Sym *sym = elf->symtab_start + ELF_R_SYM(r->r_info);
void *loc = reloc_location(elf, sechdr, r);
- uint32_t inst, upper, lower;
+ uint32_t inst, upper, lower, sign, j1, j2;
int32_t offset;

switch (r_typ) {
@@ -1309,13 +1309,54 @@ static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)
15);
r->r_addend = offset + sym->st_value;
break;
+ case R_ARM_THM_JUMP19:
+ /*
+ * Encoding T3:
+ * S = upper[10]
+ * imm6 = upper[5:0]
+ * J1 = lower[13]
+ * J2 = lower[11]
+ * imm11 = lower[10:0]
+ * imm32 = SignExtend(S:J2:J1:imm6:imm11:'0')
+ */
+ upper = TO_NATIVE(*(uint16_t *)loc);
+ lower = TO_NATIVE(*((uint16_t *)loc + 1));
+
+ sign = (upper >> 10) & 1;
+ j1 = (lower >> 13) & 1;
+ j2 = (lower >> 11) & 1;
+ offset = sign_extend32((sign << 20) | (j2 << 19) | (j1 << 18) |
+ ((upper & 0x03f) << 12) |
+ ((lower & 0x07ff) << 1),
+ 20);
+ r->r_addend = offset + sym->st_value + 4;
+ break;
case R_ARM_THM_CALL:
case R_ARM_THM_JUMP24:
- case R_ARM_THM_JUMP19:
- /* From ARM ABI: ((S + A) | T) - P */
- r->r_addend = (int)(long)(elf->hdr +
- sechdr->sh_offset +
- (r->r_offset - sechdr->sh_addr));
+ /*
+ * Encoding T4:
+ * S = upper[10]
+ * imm10 = upper[9:0]
+ * J1 = lower[13]
+ * J2 = lower[11]
+ * imm11 = lower[10:0]
+ * I1 = NOT(J1 XOR S)
+ * I2 = NOT(J2 XOR S)
+ * imm32 = SignExtend(S:I1:I2:imm10:imm11:'0')
+ */
+ upper = TO_NATIVE(*(uint16_t *)loc);
+ lower = TO_NATIVE(*((uint16_t *)loc + 1));
+
+ sign = (upper >> 10) & 1;
+ j1 = (lower >> 13) & 1;
+ j2 = (lower >> 11) & 1;
+ offset = sign_extend32((sign << 24) |
+ ((~(j1 ^ sign) & 1) << 23) |
+ ((~(j2 ^ sign) & 1) << 22) |
+ ((upper & 0x03ff) << 12) |
+ ((lower & 0x07ff) << 1),
+ 24);
+ r->r_addend = offset + sym->st_value + 4;
break;
default:
return 1;
--
2.39.2


2023-06-01 12:23:57

by Masahiro Yamada

[permalink] [raw]
Subject: [PATCH 1/7] modpost: fix section mismatch message for R_ARM_ABS32

addend_arm_rel() processes R_ARM_ABS32 in a wrong way.

Here, test code.

[test code 1]

#include <linux/init.h>

int __initdata foo;
int get_foo(void) { return foo; }

If you compile it with ARM versatile_defconfig, modpost will show the
symbol name, (unknown).

WARNING: modpost: vmlinux.o: section mismatch in reference: get_foo (section: .text) -> (unknown) (section: .init.data)

(You need to use GNU linker instead of LLD to reproduce it.)

If you compile it for other architectures, modpost will show the correct
symbol name.

WARNING: modpost: vmlinux.o: section mismatch in reference: get_foo (section: .text) -> foo (section: .init.data)

For R_ARM_ABS32, addend_arm_rel() sets r->r_addend to a wrong value.

I just mimicked the code in arch/arm/kernel/module.c.

However, there is more difficulty for ARM.

Here, test code.

[test code 2]

#include <linux/init.h>

int __initdata foo;
int get_foo(void) { return foo; }

int __initdata bar;
int get_bar(void) { return bar; }

With this commit applied, modpost will show the following messages
for ARM versatile_defconfig:

WARNING: modpost: vmlinux.o: section mismatch in reference: get_foo (section: .text) -> foo (section: .init.data)
WARNING: modpost: vmlinux.o: section mismatch in reference: get_bar (section: .text) -> foo (section: .init.data)

The reference from 'get_bar' to 'foo' seems wrong.

I have no solution for this because it is true in assembly level.

In the following output, relocation at 0x1c is no longer associated
with 'bar'. The two relocation entries point to the same symbol, and
the offset to 'bar' is encoded in the instruction 'r0, [r3, #4]'.

Disassembly of section .text:

00000000 <get_foo>:
0: e59f3004 ldr r3, [pc, #4] @ c <get_foo+0xc>
4: e5930000 ldr r0, [r3]
8: e12fff1e bx lr
c: 00000000 .word 0x00000000

00000010 <get_bar>:
10: e59f3004 ldr r3, [pc, #4] @ 1c <get_bar+0xc>
14: e5930004 ldr r0, [r3, #4]
18: e12fff1e bx lr
1c: 00000000 .word 0x00000000

Relocation section '.rel.text' at offset 0x244 contains 2 entries:
Offset Info Type Sym.Value Sym. Name
0000000c 00000c02 R_ARM_ABS32 00000000 .init.data
0000001c 00000c02 R_ARM_ABS32 00000000 .init.data

When find_elf_symbol() gets into a situation where relsym->st_name is
zero, there is no guarantee to get the symbol name as written in C.

I am keeping the current logic because it is useful in many architectures,
but the symbol name is not always correct depending on the optimization.
I left some comments in find_tosym().

Fixes: 56a974fa2d59 ("kbuild: make better section mismatch reports on arm")
Signed-off-by: Masahiro Yamada <[email protected]>
---

scripts/mod/modpost.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 7031e5da62e5..c68dad45ace2 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -1094,6 +1094,10 @@ static Elf_Sym *find_tosym(struct elf_info *elf, Elf64_Sword addr,
if (relsym->st_name != 0)
return relsym;

+ /*
+ * Strive to find a better symbol name, but the resulting name may not
+ * match the symbol referenced in the original code.
+ */
relsym_secindex = get_secindex(elf, relsym);
for (sym = elf->symtab_start; sym < elf->symtab_stop; sym++) {
if (get_secindex(elf, sym) != relsym_secindex)
@@ -1276,12 +1280,14 @@ static int addend_386_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)
static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)
{
unsigned int r_typ = ELF_R_TYPE(r->r_info);
+ Elf_Sym *sym = elf->symtab_start + ELF_R_SYM(r->r_info);
+ void *loc = reloc_location(elf, sechdr, r);
+ uint32_t inst;

switch (r_typ) {
case R_ARM_ABS32:
- /* From ARM ABI: (S + A) | T */
- r->r_addend = (int)(long)
- (elf->symtab_start + ELF_R_SYM(r->r_info));
+ inst = TO_NATIVE(*(uint32_t *)loc);
+ r->r_addend = inst + sym->st_value;
break;
case R_ARM_PC24:
case R_ARM_CALL:
--
2.39.2


2023-06-01 12:30:05

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 5/7] modpost: detect section mismatch for R_ARM_THM_{MOVW_ABS_NC,MOVT_ABS}

On Thu, 1 Jun 2023 at 14:10, Masahiro Yamada <[email protected]> wrote:
>
> When CONFIG_THUMB2_KERNEL is enabled, modpost fails to detect some
> types of section mismatches.
>
> [test code]
>
> #include <linux/init.h>
>
> int __initdata foo;
> int get_foo(void) { return foo; }
>
> It is apparently a bad reference, but modpost does not report anything.
>
> The test code above produces the following relocations.
>
> Relocation section '.rel.text' at offset 0x1e8 contains 2 entries:
> Offset Info Type Sym.Value Sym. Name
> 00000000 0000052f R_ARM_THM_MOVW_AB 00000000 .LANCHOR0
> 00000004 00000530 R_ARM_THM_MOVT_AB 00000000 .LANCHOR0
>
> Currently, R_ARM_THM_MOVW_ABS_NC and R_ARM_THM_MOVT_ABS are just skipped.
>
> Add code to handle them. I checked arch/arm/kernel/module.c to learn
> how the offset is encoded in the instruction.
>
> One more thing to note for Thumb instructions - the st_value is an odd
> value, so you need to mask the bit 0 to get the offset. Otherwise, you
> will get an off-by-one error in the nearest symbol look-up.
>
> It is documented in "ELF for the ARM Architecture" [1]:
>
> * If the symbol addresses a Thumb instruction, its value is the address
> of the instruction with bit zero set (in a relocatable object, the
> section offset with bit zero set).
>
> * For the purposes of relocation the value used shall be the address
> of the instruction (st_value & ~1).
>
> [1]: https://github.com/ARM-software/abi-aa/blob/main/aaelf32/aaelf32.rst
>
> Signed-off-by: Masahiro Yamada <[email protected]>
> ---
>
> scripts/mod/modpost.c | 31 ++++++++++++++++++++++++++-----
> 1 file changed, 26 insertions(+), 5 deletions(-)
>
> diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
> index 32d56efe3f3b..528aa9175e84 100644
> --- a/scripts/mod/modpost.c
> +++ b/scripts/mod/modpost.c
> @@ -1082,7 +1082,8 @@ static Elf_Sym *find_nearest_sym(struct elf_info *elf, Elf_Addr addr,
> {
> Elf_Sym *sym;
> Elf_Sym *near = NULL;
> - Elf_Addr distance;
> + Elf_Addr sym_addr, distance;
> + bool is_arm = (elf->hdr->e_machine == EM_ARM);
>
> for (sym = elf->symtab_start; sym < elf->symtab_stop; sym++) {
> if (get_secindex(elf, sym) != secndx)
> @@ -1090,10 +1091,19 @@ static Elf_Sym *find_nearest_sym(struct elf_info *elf, Elf_Addr addr,
> if (!is_valid_name(elf, sym))
> continue;
>
> - if (addr >= sym->st_value)
> - distance = addr - sym->st_value;
> + sym_addr = sym->st_value;
> +
> + /*
> + * For ARM Thumb instruction, the bit 0 of st_value is set.
> + * Mask it to get the address.
> + */
> + if (is_arm)
> + sym_addr &= ~1;
> +

This is only appropriate for STT_FUNC symbols. If this is a data
reference, bit 0 could be a valid address bit.



> + if (addr >= sym_addr)
> + distance = addr - sym_addr;
> else if (allow_negative)
> - distance = sym->st_value - addr;
> + distance = sym_addr - addr;
> else
> continue;
>
> @@ -1266,7 +1276,7 @@ static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)
> unsigned int r_typ = ELF_R_TYPE(r->r_info);
> Elf_Sym *sym = elf->symtab_start + ELF_R_SYM(r->r_info);
> void *loc = reloc_location(elf, sechdr, r);
> - uint32_t inst;
> + uint32_t inst, upper, lower;
> int32_t offset;
>
> switch (r_typ) {
> @@ -1288,6 +1298,17 @@ static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)
> offset = sign_extend32((inst & 0x00ffffff) << 2, 25);
> r->r_addend = offset + sym->st_value + 8;
> break;
> + case R_ARM_THM_MOVW_ABS_NC:
> + case R_ARM_THM_MOVT_ABS:
> + upper = TO_NATIVE(*(uint16_t *)loc);
> + lower = TO_NATIVE(*((uint16_t *)loc + 1));
> + offset = sign_extend32(((upper & 0x000f) << 12) |
> + ((upper & 0x0400) << 1) |
> + ((lower & 0x7000) >> 4) |
> + (lower & 0x00ff),
> + 15);
> + r->r_addend = offset + sym->st_value;
> + break;
> case R_ARM_THM_CALL:
> case R_ARM_THM_JUMP24:
> case R_ARM_THM_JUMP19:
> --
> 2.39.2
>

2023-06-01 13:19:53

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 7/7] modpost: detect section mismatch for R_ARM_REL32

On Thu, 1 Jun 2023 at 14:10, Masahiro Yamada <[email protected]> wrote:
>
> For ARM, modpost fails to detect some types of section mismatches.
>
> [test code]
>
> .section .init.data,"aw"
> bar:
> .long 0
>
> .section .data,"aw"
> .globl foo
> foo:
> .long bar - .
>
> It is apparently a bad reference, but modpost does not report anything.
>
> The test code above produces the following relocations.
>
> Relocation section '.rel.data' at offset 0xe8 contains 1 entry:
> Offset Info Type Sym.Value Sym. Name
> 00000000 00000403 R_ARM_REL32 00000000 .init.data
>
> Currently, R_ARM_REL32 is just skipped.
>
> Handle it like R_ARM_ABS32.

OK, so the reason we can handle these in the same way is because we
never calculate the resulting value, right? Because that value would
be different for these cases.


>
> Signed-off-by: Masahiro Yamada <[email protected]>
> ---
>
> scripts/mod/modpost.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
> index 55d142bb000b..9f0c87064ca5 100644
> --- a/scripts/mod/modpost.c
> +++ b/scripts/mod/modpost.c
> @@ -1281,6 +1281,7 @@ static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)
>
> switch (r_typ) {
> case R_ARM_ABS32:
> + case R_ARM_REL32:
> inst = TO_NATIVE(*(uint32_t *)loc);
> r->r_addend = inst + sym->st_value;
> break;
> --
> 2.39.2
>

2023-06-01 14:53:08

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 7/7] modpost: detect section mismatch for R_ARM_REL32

On Thu, 1 Jun 2023 at 16:36, Masahiro Yamada <[email protected]> wrote:
>
> On Thu, Jun 1, 2023 at 9:40 PM Ard Biesheuvel <[email protected]> wrote:
> >
> > On Thu, 1 Jun 2023 at 14:10, Masahiro Yamada <[email protected]> wrote:
> > >
> > > For ARM, modpost fails to detect some types of section mismatches.
> > >
> > > [test code]
> > >
> > > .section .init.data,"aw"
> > > bar:
> > > .long 0
> > >
> > > .section .data,"aw"
> > > .globl foo
> > > foo:
> > > .long bar - .
> > >
> > > It is apparently a bad reference, but modpost does not report anything.
> > >
> > > The test code above produces the following relocations.
> > >
> > > Relocation section '.rel.data' at offset 0xe8 contains 1 entry:
> > > Offset Info Type Sym.Value Sym. Name
> > > 00000000 00000403 R_ARM_REL32 00000000 .init.data
> > >
> > > Currently, R_ARM_REL32 is just skipped.
> > >
> > > Handle it like R_ARM_ABS32.
> >
> > OK, so the reason we can handle these in the same way is because we
> > never calculate the resulting value, right? Because that value would
> > be different for these cases.
>
> Right.
>
> '- loc' is unnecessary here because modpost never calculates the
> resulting instruction.
>
> modpost wants to know the location of the referenced symbol.
> (the offset from the start of the section).
>
> For the same reason, I omitted '- loc' for
> PC-relative ones such as R_ARM_CALL, R_ARM_JUMP24, etc.
>

OK makes sense - I just wanted to double check

2023-06-01 15:00:26

by Masahiro Yamada

[permalink] [raw]
Subject: Re: [PATCH 7/7] modpost: detect section mismatch for R_ARM_REL32

On Thu, Jun 1, 2023 at 9:40 PM Ard Biesheuvel <[email protected]> wrote:
>
> On Thu, 1 Jun 2023 at 14:10, Masahiro Yamada <[email protected]> wrote:
> >
> > For ARM, modpost fails to detect some types of section mismatches.
> >
> > [test code]
> >
> > .section .init.data,"aw"
> > bar:
> > .long 0
> >
> > .section .data,"aw"
> > .globl foo
> > foo:
> > .long bar - .
> >
> > It is apparently a bad reference, but modpost does not report anything.
> >
> > The test code above produces the following relocations.
> >
> > Relocation section '.rel.data' at offset 0xe8 contains 1 entry:
> > Offset Info Type Sym.Value Sym. Name
> > 00000000 00000403 R_ARM_REL32 00000000 .init.data
> >
> > Currently, R_ARM_REL32 is just skipped.
> >
> > Handle it like R_ARM_ABS32.
>
> OK, so the reason we can handle these in the same way is because we
> never calculate the resulting value, right? Because that value would
> be different for these cases.

Right.

'- loc' is unnecessary here because modpost never calculates the
resulting instruction.

modpost wants to know the location of the referenced symbol.
(the offset from the start of the section).

For the same reason, I omitted '- loc' for
PC-relative ones such as R_ARM_CALL, R_ARM_JUMP24, etc.







--
Best Regards

Masahiro Yamada

2023-06-01 15:00:42

by Masahiro Yamada

[permalink] [raw]
Subject: Re: [PATCH 5/7] modpost: detect section mismatch for R_ARM_THM_{MOVW_ABS_NC,MOVT_ABS}

On Thu, Jun 1, 2023 at 9:23 PM Ard Biesheuvel <[email protected]> wrote:
>
> On Thu, 1 Jun 2023 at 14:10, Masahiro Yamada <[email protected]> wrote:
> >
> > When CONFIG_THUMB2_KERNEL is enabled, modpost fails to detect some
> > types of section mismatches.
> >
> > [test code]
> >
> > #include <linux/init.h>
> >
> > int __initdata foo;
> > int get_foo(void) { return foo; }
> >
> > It is apparently a bad reference, but modpost does not report anything.
> >
> > The test code above produces the following relocations.
> >
> > Relocation section '.rel.text' at offset 0x1e8 contains 2 entries:
> > Offset Info Type Sym.Value Sym. Name
> > 00000000 0000052f R_ARM_THM_MOVW_AB 00000000 .LANCHOR0
> > 00000004 00000530 R_ARM_THM_MOVT_AB 00000000 .LANCHOR0
> >
> > Currently, R_ARM_THM_MOVW_ABS_NC and R_ARM_THM_MOVT_ABS are just skipped.
> >
> > Add code to handle them. I checked arch/arm/kernel/module.c to learn
> > how the offset is encoded in the instruction.
> >
> > One more thing to note for Thumb instructions - the st_value is an odd
> > value, so you need to mask the bit 0 to get the offset. Otherwise, you
> > will get an off-by-one error in the nearest symbol look-up.
> >
> > It is documented in "ELF for the ARM Architecture" [1]:
> >
> > * If the symbol addresses a Thumb instruction, its value is the address
> > of the instruction with bit zero set (in a relocatable object, the
> > section offset with bit zero set).
> >
> > * For the purposes of relocation the value used shall be the address
> > of the instruction (st_value & ~1).
> >
> > [1]: https://github.com/ARM-software/abi-aa/blob/main/aaelf32/aaelf32.rst
> >
> > Signed-off-by: Masahiro Yamada <[email protected]>
> > ---
> >
> > scripts/mod/modpost.c | 31 ++++++++++++++++++++++++++-----
> > 1 file changed, 26 insertions(+), 5 deletions(-)
> >
> > diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
> > index 32d56efe3f3b..528aa9175e84 100644
> > --- a/scripts/mod/modpost.c
> > +++ b/scripts/mod/modpost.c
> > @@ -1082,7 +1082,8 @@ static Elf_Sym *find_nearest_sym(struct elf_info *elf, Elf_Addr addr,
> > {
> > Elf_Sym *sym;
> > Elf_Sym *near = NULL;
> > - Elf_Addr distance;
> > + Elf_Addr sym_addr, distance;
> > + bool is_arm = (elf->hdr->e_machine == EM_ARM);
> >
> > for (sym = elf->symtab_start; sym < elf->symtab_stop; sym++) {
> > if (get_secindex(elf, sym) != secndx)
> > @@ -1090,10 +1091,19 @@ static Elf_Sym *find_nearest_sym(struct elf_info *elf, Elf_Addr addr,
> > if (!is_valid_name(elf, sym))
> > continue;
> >
> > - if (addr >= sym->st_value)
> > - distance = addr - sym->st_value;
> > + sym_addr = sym->st_value;
> > +
> > + /*
> > + * For ARM Thumb instruction, the bit 0 of st_value is set.
> > + * Mask it to get the address.
> > + */
> > + if (is_arm)
> > + sym_addr &= ~1;
> > +
>
> This is only appropriate for STT_FUNC symbols. If this is a data
> reference, bit 0 could be a valid address bit.


Thanks for catching it.

I will fix it as follows:

/*
* For ARM Thumb instruction, the bit 0 of st_value is set if
* the symbol is STT_FUNC type. Mask it to get the address.
*/
if (is_arm && ELF_ST_TYPE(sym->st_info) == STT_FUNC)
sym_addr &= ~1;




--
Best Regards
Masahiro Yamada