2024-04-20 14:53:31

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH v5] kallsyms: Avoid weak references for kallsyms symbols

From: Ard Biesheuvel <[email protected]>

kallsyms is a directory of all the symbols in the vmlinux binary, and so
creating it poses somewhat of a chicken-and-egg problem, as its non-zero
size affects the layout of the binary, and therefore the values of the
symbols.

For this reason, the kernel is linked more than once, and the first pass
does not include any kallsyms data at all. For the linker to accept
this, the symbol declarations describing the kallsyms metadata are
emitted as having weak linkage, so they can remain unsatisfied. During
the subsequent passes, the weak references are satisfied by the kallsyms
metadata that was constructed based on information gathered from the
preceding passes.

Weak references lead to somewhat worse codegen, because taking their
address may need to produce NULL (if the reference was unsatisfied), and
this is not usually supported by RIP or PC relative symbol references.

Given that these references are ultimately always satisfied in the final
link, let's drop the weak annotation on the declarations, and instead,
provide fallback definitions with weak linkage. This informs the
compiler that ultimately, the reference will always be satisfied.

While at it, drop the FRV specific annotation that these symbols reside
in .rodata - FRV is long gone.

Cc: Masahiro Yamada <[email protected]>
Cc: [email protected]
Acked-by: Nick Desaulniers <[email protected]>
Acked-by: Kees Cook <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Ard Biesheuvel <[email protected]>
---
v5: - avoid PROVIDE() in the linker script, use weak definitions instead
- drop tested-by, replace reviewed-by with acked-by

kernel/kallsyms.c | 31 ++++++++++++++++----
kernel/kallsyms_internal.h | 25 ++++++----------
2 files changed, 34 insertions(+), 22 deletions(-)

diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 18edd57b5fe8..fada7fbb24cf 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -34,6 +34,31 @@

#include "kallsyms_internal.h"

+/*
+ * The real definitions of the symbols below will not exist yet during the
+ * first pass of the link, but are guaranteed to exist in the final link.
+ * Provide preliminary weak definitions that will be superseded in the final
+ * link, to avoid having to rely on weak references, which require a GOT when
+ * used in position independent code.
+ */
+
+#ifndef CONFIG_KALLSYMS_BASE_RELATIVE
+const unsigned long __weak kallsyms_addresses[1];
+#else
+const int __weak kallsyms_offsets[1];
+const unsigned long __weak kallsyms_relative_base;
+#endif
+
+const u8 __weak kallsyms_names[1];
+
+const unsigned int __weak kallsyms_num_syms;
+
+const char __weak kallsyms_token_table[1];
+const u16 __weak kallsyms_token_index[1];
+
+const unsigned int __weak kallsyms_markers[1];
+const u8 __weak kallsyms_seqs_of_names[3];
+
/*
* Expand a compressed symbol data into the resulting uncompressed string,
* if uncompressed string is too long (>= maxlen), it will be truncated,
@@ -325,12 +350,6 @@ static unsigned long get_symbol_pos(unsigned long addr,
unsigned long symbol_start = 0, symbol_end = 0;
unsigned long i, low, high, mid;

- /* This kernel should never had been booted. */
- if (!IS_ENABLED(CONFIG_KALLSYMS_BASE_RELATIVE))
- BUG_ON(!kallsyms_addresses);
- else
- BUG_ON(!kallsyms_offsets);
-
/* Do a binary search on the sorted kallsyms_addresses array. */
low = 0;
high = kallsyms_num_syms;
diff --git a/kernel/kallsyms_internal.h b/kernel/kallsyms_internal.h
index 27fabdcc40f5..cf4124dbcc5b 100644
--- a/kernel/kallsyms_internal.h
+++ b/kernel/kallsyms_internal.h
@@ -8,24 +8,17 @@
* These will be re-linked against their real values
* during the second link stage.
*/
-extern const unsigned long kallsyms_addresses[] __weak;
-extern const int kallsyms_offsets[] __weak;
-extern const u8 kallsyms_names[] __weak;
+extern const unsigned long kallsyms_addresses[];
+extern const int kallsyms_offsets[];
+extern const u8 kallsyms_names[];

-/*
- * Tell the compiler that the count isn't in the small data section if the arch
- * has one (eg: FRV).
- */
-extern const unsigned int kallsyms_num_syms
-__section(".rodata") __attribute__((weak));
-
-extern const unsigned long kallsyms_relative_base
-__section(".rodata") __attribute__((weak));
+extern const unsigned int kallsyms_num_syms;
+extern const unsigned long kallsyms_relative_base;

-extern const char kallsyms_token_table[] __weak;
-extern const u16 kallsyms_token_index[] __weak;
+extern const char kallsyms_token_table[];
+extern const u16 kallsyms_token_index[];

-extern const unsigned int kallsyms_markers[] __weak;
-extern const u8 kallsyms_seqs_of_names[] __weak;
+extern const unsigned int kallsyms_markers[];
+extern const u8 kallsyms_seqs_of_names[];

#endif // LINUX_KALLSYMS_INTERNAL_H_
--
2.44.0.769.g3c40516874-goog



2024-04-22 16:03:49

by Masahiro Yamada

[permalink] [raw]
Subject: Re: [PATCH v5] kallsyms: Avoid weak references for kallsyms symbols

On Sat, Apr 20, 2024 at 11:53 PM Ard Biesheuvel <[email protected]> wrote:
>
> From: Ard Biesheuvel <[email protected]>
>
> kallsyms is a directory of all the symbols in the vmlinux binary, and so
> creating it poses somewhat of a chicken-and-egg problem, as its non-zero
> size affects the layout of the binary, and therefore the values of the
> symbols.
>
> For this reason, the kernel is linked more than once, and the first pass
> does not include any kallsyms data at all. For the linker to accept
> this, the symbol declarations describing the kallsyms metadata are
> emitted as having weak linkage, so they can remain unsatisfied. During
> the subsequent passes, the weak references are satisfied by the kallsyms
> metadata that was constructed based on information gathered from the
> preceding passes.
>
> Weak references lead to somewhat worse codegen, because taking their
> address may need to produce NULL (if the reference was unsatisfied), and
> this is not usually supported by RIP or PC relative symbol references.
>
> Given that these references are ultimately always satisfied in the final
> link, let's drop the weak annotation on the declarations, and instead,
> provide fallback definitions with weak linkage. This informs the
> compiler that ultimately, the reference will always be satisfied.
>
> While at it, drop the FRV specific annotation that these symbols reside
> in .rodata - FRV is long gone.
>
> Cc: Masahiro Yamada <[email protected]>
> Cc: [email protected]
> Acked-by: Nick Desaulniers <[email protected]>
> Acked-by: Kees Cook <[email protected]>
> Acked-by: Arnd Bergmann <[email protected]>
> Link: https://lore.kernel.org/all/[email protected]
> Signed-off-by: Ard Biesheuvel <[email protected]>
> ---
> v5: - avoid PROVIDE() in the linker script, use weak definitions instead
> - drop tested-by, replace reviewed-by with acked-by
>

Applied to linux-kbuild. Thanks.


--
Best Regards
Masahiro Yamada

2024-04-23 16:02:39

by Konrad Dybcio

[permalink] [raw]
Subject: Re: [PATCH v5] kallsyms: Avoid weak references for kallsyms symbols



On 4/22/24 18:02, Masahiro Yamada wrote:
> On Sat, Apr 20, 2024 at 11:53 PM Ard Biesheuvel <[email protected]> wrote:
>>
>> From: Ard Biesheuvel <[email protected]>
>>
>> kallsyms is a directory of all the symbols in the vmlinux binary, and so
>> creating it poses somewhat of a chicken-and-egg problem, as its non-zero
>> size affects the layout of the binary, and therefore the values of the
>> symbols.
>>
>> For this reason, the kernel is linked more than once, and the first pass
>> does not include any kallsyms data at all. For the linker to accept
>> this, the symbol declarations describing the kallsyms metadata are
>> emitted as having weak linkage, so they can remain unsatisfied. During
>> the subsequent passes, the weak references are satisfied by the kallsyms
>> metadata that was constructed based on information gathered from the
>> preceding passes.
>>
>> Weak references lead to somewhat worse codegen, because taking their
>> address may need to produce NULL (if the reference was unsatisfied), and
>> this is not usually supported by RIP or PC relative symbol references.
>>
>> Given that these references are ultimately always satisfied in the final
>> link, let's drop the weak annotation on the declarations, and instead,
>> provide fallback definitions with weak linkage. This informs the
>> compiler that ultimately, the reference will always be satisfied.
>>
>> While at it, drop the FRV specific annotation that these symbols reside
>> in .rodata - FRV is long gone.
>>
>> Cc: Masahiro Yamada <[email protected]>
>> Cc: [email protected]
>> Acked-by: Nick Desaulniers <[email protected]>
>> Acked-by: Kees Cook <[email protected]>
>> Acked-by: Arnd Bergmann <[email protected]>
>> Link: https://lore.kernel.org/all/[email protected]
>> Signed-off-by: Ard Biesheuvel <[email protected]>
>> ---
>> v5: - avoid PROVIDE() in the linker script, use weak definitions instead
>> - drop tested-by, replace reviewed-by with acked-by
>>
>
> Applied to linux-kbuild. Thanks.

Hi, this commit seems to break call traces, resulting in output like:

[ 2.777006] Call trace:
[ 2.777007] _text+0x89e7e8/0x39e0000
[ 2.777008] _text+0x89e82c/0x39e0000
[ 2.777009] _text+0x2b940cc/0x2bd2a90
[ 2.777011] _text+0x2b941a4/0x2bd2a90
[ 2.777012] _text+0x145dc/0x39e0000
[ 2.777014] _text+0x2b51184/0x2bd2a90
[ 2.777016] _text+0x18fc6a4/0x39e0000
[ 2.777018] _text+0x15644/0x39e0000
[ 2.777019] ---[ end trace 0000000000000000 ]---

Konrad

2024-04-23 16:22:43

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH v5] kallsyms: Avoid weak references for kallsyms symbols

On Tue, 23 Apr 2024 at 18:01, Konrad Dybcio <[email protected]> wrote:
>
>
>
> On 4/22/24 18:02, Masahiro Yamada wrote:
> > On Sat, Apr 20, 2024 at 11:53 PM Ard Biesheuvel <[email protected]> wrote:
> >>
> >> From: Ard Biesheuvel <[email protected]>
> >>
> >> kallsyms is a directory of all the symbols in the vmlinux binary, and so
> >> creating it poses somewhat of a chicken-and-egg problem, as its non-zero
> >> size affects the layout of the binary, and therefore the values of the
> >> symbols.
> >>
> >> For this reason, the kernel is linked more than once, and the first pass
> >> does not include any kallsyms data at all. For the linker to accept
> >> this, the symbol declarations describing the kallsyms metadata are
> >> emitted as having weak linkage, so they can remain unsatisfied. During
> >> the subsequent passes, the weak references are satisfied by the kallsyms
> >> metadata that was constructed based on information gathered from the
> >> preceding passes.
> >>
> >> Weak references lead to somewhat worse codegen, because taking their
> >> address may need to produce NULL (if the reference was unsatisfied), and
> >> this is not usually supported by RIP or PC relative symbol references.
> >>
> >> Given that these references are ultimately always satisfied in the final
> >> link, let's drop the weak annotation on the declarations, and instead,
> >> provide fallback definitions with weak linkage. This informs the
> >> compiler that ultimately, the reference will always be satisfied.
> >>
> >> While at it, drop the FRV specific annotation that these symbols reside
> >> in .rodata - FRV is long gone.
> >>
> >> Cc: Masahiro Yamada <[email protected]>
> >> Cc: [email protected]
> >> Acked-by: Nick Desaulniers <[email protected]>
> >> Acked-by: Kees Cook <[email protected]>
> >> Acked-by: Arnd Bergmann <[email protected]>
> >> Link: https://lore.kernel.org/all/[email protected]
> >> Signed-off-by: Ard Biesheuvel <[email protected]>
> >> ---
> >> v5: - avoid PROVIDE() in the linker script, use weak definitions instead
> >> - drop tested-by, replace reviewed-by with acked-by
> >>
> >
> > Applied to linux-kbuild. Thanks.
>
> Hi, this commit seems to break call traces, resulting in output like:
>
> [ 2.777006] Call trace:
> [ 2.777007] _text+0x89e7e8/0x39e0000
> [ 2.777008] _text+0x89e82c/0x39e0000
> [ 2.777009] _text+0x2b940cc/0x2bd2a90
> [ 2.777011] _text+0x2b941a4/0x2bd2a90
> [ 2.777012] _text+0x145dc/0x39e0000
> [ 2.777014] _text+0x2b51184/0x2bd2a90
> [ 2.777016] _text+0x18fc6a4/0x39e0000
> [ 2.777018] _text+0x15644/0x39e0000
> [ 2.777019] ---[ end trace 0000000000000000 ]---
>

This patch triggers an issue in the compiler, which appears to perform
constant propagation on variables defined as weak, and this is
definitely a compiler bug. (A weak variable can be superseded by
another instance from a different object at link time, so the compiler
cannot make assumptions based on the version of the variable it
observes at compile time)

It has already been dropped from the kbuild tree.

2024-04-23 16:52:44

by Konrad Dybcio

[permalink] [raw]
Subject: Re: [PATCH v5] kallsyms: Avoid weak references for kallsyms symbols



On 4/23/24 18:22, Ard Biesheuvel wrote:
> On Tue, 23 Apr 2024 at 18:01, Konrad Dybcio <[email protected]> wrote:
>>
>>
>>
>> On 4/22/24 18:02, Masahiro Yamada wrote:
>>> On Sat, Apr 20, 2024 at 11:53 PM Ard Biesheuvel <[email protected]> wrote:
>>>>
>>>> From: Ard Biesheuvel <[email protected]>
>>>>
>>>> kallsyms is a directory of all the symbols in the vmlinux binary, and so
>>>> creating it poses somewhat of a chicken-and-egg problem, as its non-zero
>>>> size affects the layout of the binary, and therefore the values of the
>>>> symbols.
>>>>
>>>> For this reason, the kernel is linked more than once, and the first pass
>>>> does not include any kallsyms data at all. For the linker to accept
>>>> this, the symbol declarations describing the kallsyms metadata are
>>>> emitted as having weak linkage, so they can remain unsatisfied. During
>>>> the subsequent passes, the weak references are satisfied by the kallsyms
>>>> metadata that was constructed based on information gathered from the
>>>> preceding passes.
>>>>
>>>> Weak references lead to somewhat worse codegen, because taking their
>>>> address may need to produce NULL (if the reference was unsatisfied), and
>>>> this is not usually supported by RIP or PC relative symbol references.
>>>>
>>>> Given that these references are ultimately always satisfied in the final
>>>> link, let's drop the weak annotation on the declarations, and instead,
>>>> provide fallback definitions with weak linkage. This informs the
>>>> compiler that ultimately, the reference will always be satisfied.
>>>>
>>>> While at it, drop the FRV specific annotation that these symbols reside
>>>> in .rodata - FRV is long gone.
>>>>
>>>> Cc: Masahiro Yamada <[email protected]>
>>>> Cc: [email protected]
>>>> Acked-by: Nick Desaulniers <[email protected]>
>>>> Acked-by: Kees Cook <[email protected]>
>>>> Acked-by: Arnd Bergmann <[email protected]>
>>>> Link: https://lore.kernel.org/all/[email protected]
>>>> Signed-off-by: Ard Biesheuvel <[email protected]>
>>>> ---
>>>> v5: - avoid PROVIDE() in the linker script, use weak definitions instead
>>>> - drop tested-by, replace reviewed-by with acked-by
>>>>
>>>
>>> Applied to linux-kbuild. Thanks.
>>
>> Hi, this commit seems to break call traces, resulting in output like:
>>
>> [ 2.777006] Call trace:
>> [ 2.777007] _text+0x89e7e8/0x39e0000
>> [ 2.777008] _text+0x89e82c/0x39e0000
>> [ 2.777009] _text+0x2b940cc/0x2bd2a90
>> [ 2.777011] _text+0x2b941a4/0x2bd2a90
>> [ 2.777012] _text+0x145dc/0x39e0000
>> [ 2.777014] _text+0x2b51184/0x2bd2a90
>> [ 2.777016] _text+0x18fc6a4/0x39e0000
>> [ 2.777018] _text+0x15644/0x39e0000
>> [ 2.777019] ---[ end trace 0000000000000000 ]---
>>
>
> This patch triggers an issue in the compiler, which appears to perform
> constant propagation on variables defined as weak, and this is
> definitely a compiler bug. (A weak variable can be superseded by
> another instance from a different object at link time, so the compiler
> cannot make assumptions based on the version of the variable it
> observes at compile time)

Sounds like fun..

>
> It has already been dropped from the kbuild tree.

Thanks!

Konrad