2023-09-14 13:22:37

by Will Deacon

Subject: [PATCH v4 0/3] Fix 'faddr2line' for LLVM arm64 builds

Hi all,

Here's version four of my faddr2line fixes previously posted here:

v1: https://lore.kernel.org/r/[email protected]
v2: https://lore.kernel.org/r/[email protected]
v3: https://lore.kernel.org/r/[email protected]

Changes since v3 include:
* Add support for specifying specific LLVM versions with LLVM=
* Drop the mksysmap filter in favour of a simpler regex implementing
is_mapping_symbol()

Cheers,

Will

Cc: Masahiro Yamada <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Nicolas Schier <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: John Stultz <[email protected]>
Cc: [email protected]

--->8

Will Deacon (3):
scripts/faddr2line: Don't filter out non-function symbols from readelf
scripts/faddr2line: Use LLVM addr2line and readelf if LLVM=1
scripts/faddr2line: Skip over mapping symbols in output from readelf

scripts/faddr2line | 24 +++++++++++++++++++++---
1 file changed, 21 insertions(+), 3 deletions(-)

--
2.42.0.283.g2d96d420d3-goog


2023-09-14 16:12:45

by Will Deacon

Subject: [PATCH v4 2/3] scripts/faddr2line: Use LLVM addr2line and readelf if LLVM=1

GNU utilities cannot necessarily parse objects built by LLVM, which can
result in confusing errors when using 'faddr2line':

$ CROSS_COMPILE=aarch64-linux-gnu- ./scripts/faddr2line vmlinux do_one_initcall+0xf4/0x260
aarch64-linux-gnu-addr2line: vmlinux: unknown type [0x13] section `.relr.dyn'
aarch64-linux-gnu-addr2line: DWARF error: invalid or unhandled FORM value: 0x25
do_one_initcall+0xf4/0x260:
aarch64-linux-gnu-addr2line: vmlinux: unknown type [0x13] section `.relr.dyn'
aarch64-linux-gnu-addr2line: DWARF error: invalid or unhandled FORM value: 0x25
$x.73 at main.c:?

Although this can be worked around by setting CROSS_COMPILE to "llvm-",
it's cleaner to follow the same syntax as the top-level Makefile and
accept LLVM= as an indication to use the llvm- tools, optionally
specifying their location or specific version number.

Cc: Josh Poimboeuf <[email protected]>
Cc: John Stultz <[email protected]>
Suggested-by: Masahiro Yamada <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
---
scripts/faddr2line | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/scripts/faddr2line b/scripts/faddr2line
index a35a420d0f26..6b8206802157 100755
--- a/scripts/faddr2line
+++ b/scripts/faddr2line
@@ -58,8 +58,21 @@ die() {
 	exit 1
 }
 
-READELF="${CROSS_COMPILE:-}readelf"
-ADDR2LINE="${CROSS_COMPILE:-}addr2line"
+UTIL_SUFFIX=""
+if [[ "${LLVM:-}" == "" ]]; then
+	UTIL_PREFIX=${CROSS_COMPILE:-}
+else
+	UTIL_PREFIX=llvm-
+
+	if [[ "${LLVM}" == *"/" ]]; then
+		UTIL_PREFIX=${LLVM}${UTIL_PREFIX}
+	elif [[ "${LLVM}" == "-"* ]]; then
+		UTIL_SUFFIX=${LLVM}
+	fi
+fi
+
+READELF="${UTIL_PREFIX}readelf${UTIL_SUFFIX}"
+ADDR2LINE="${UTIL_PREFIX}addr2line${UTIL_SUFFIX}"
 AWK="awk"
 GREP="grep"

--
2.42.0.283.g2d96d420d3-goog
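
As a rough illustration of the selection logic in the hunk above, the
sketch below shows which tool name each form of LLVM= would produce. The
helper name resolve_tool and the path /opt/llvm/bin/ are made up for the
example, and it uses 'case' rather than '[[ ... ]]' so that it runs under
a plain POSIX shell; it is not the code from the patch.

```sh
# Hypothetical helper mirroring the prefix/suffix selection in the patch.
# Usage: resolve_tool <tool> <LLVM value> <CROSS_COMPILE value>
resolve_tool() {
	tool=$1 llvm=$2 cross=$3
	suffix=""
	if [ -z "$llvm" ]; then
		# LLVM unset or empty: fall back to the CROSS_COMPILE prefix.
		prefix=$cross
	else
		prefix=llvm-
		case $llvm in
		*/) prefix=${llvm}${prefix} ;;	# LLVM=<dir>/ selects a tool directory
		-*) suffix=$llvm ;;		# LLVM=-<ver> selects a versioned tool
		esac
	fi
	printf '%s\n' "${prefix}${tool}${suffix}"
}

resolve_tool readelf   ""             aarch64-linux-gnu-
resolve_tool readelf   1              aarch64-linux-gnu-
resolve_tool addr2line /opt/llvm/bin/ ""
resolve_tool addr2line -17            ""
```

Note that a non-empty LLVM= takes precedence over CROSS_COMPILE, as in the
second call.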

2023-09-14 16:12:48

by Will Deacon

Subject: [PATCH v4 3/3] scripts/faddr2line: Skip over mapping symbols in output from readelf

Mapping symbols emitted in the readelf output can confuse the
'faddr2line' symbol size calculation, resulting in the erroneous
rejection of valid offsets. This is especially prevalent when building
an arm64 kernel with CONFIG_CFI_CLANG=y, where most functions are
prefixed with a 32-bit data value in a '$d.n' section. For example:

  447538: ffff800080014b80   548 FUNC    GLOBAL DEFAULT    2 do_one_initcall
     104: ffff800080014c74     0 NOTYPE  LOCAL  DEFAULT    2 $x.73
     106: ffff800080014d30     0 NOTYPE  LOCAL  DEFAULT    2 $x.75
     111: ffff800080014da4     0 NOTYPE  LOCAL  DEFAULT    2 $d.78
     112: ffff800080014da8     0 NOTYPE  LOCAL  DEFAULT    2 $x.79
      36: ffff800080014de0   200 FUNC    LOCAL  DEFAULT    2 run_init_process

Adding a warning to do_one_initcall() results in:

| WARNING: CPU: 0 PID: 1 at init/main.c:1236 do_one_initcall+0xf4/0x260

Which 'faddr2line' refuses to accept:

$ ./scripts/faddr2line vmlinux do_one_initcall+0xf4/0x260
skipping do_one_initcall address at 0xffff800080014c74 due to size mismatch (0x260 != 0x224)
no match for do_one_initcall+0xf4/0x260

Filter out these entries from readelf using a shell reimplementation of
is_mapping_symbol(), so that the size of a symbol is calculated as a
delta to the next symbol present in ksymtab.

Cc: Josh Poimboeuf <[email protected]>
Cc: John Stultz <[email protected]>
Suggested-by: Masahiro Yamada <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
---
scripts/faddr2line | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/scripts/faddr2line b/scripts/faddr2line
index 6b8206802157..20d9b3d37843 100755
--- a/scripts/faddr2line
+++ b/scripts/faddr2line
@@ -179,6 +179,11 @@ __faddr2line() {
 			local cur_sym_elf_size=${fields[2]}
 			local cur_sym_name=${fields[7]:-}
 
+			# is_mapping_symbol(cur_sym_name)
+			if [[ ${cur_sym_name} =~ ^((\.L)|(L0)|(\$[adtx](\.|$))) ]]; then
+				continue
+			fi
+
 			if [[ $cur_sym_addr = $sym_addr ]] &&
 			   [[ $cur_sym_elf_size = $sym_elf_size ]] &&
 			   [[ $cur_sym_name = $sym_name ]]; then
--
2.42.0.283.g2d96d420d3-goog
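
To make the 0x224 versus 0x260 mismatch concrete, here is a small sketch
that replays the size calculation over the six sample rows from the commit
message. It assumes the size is the distance to the first following symbol
that is not inside the ELF-sized body of the function, which is a
simplified reading of the script's loop. The function name sym_size and
its "keep"/"filter" argument are invented for the example, and 'grep -E'
stands in for the '=~' test so that the sketch runs in a plain POSIX
shell.

```sh
# Sample rows (address, ELF size, name) from the readelf excerpt.
symbols='ffff800080014b80 548 do_one_initcall
ffff800080014c74 0 $x.73
ffff800080014d30 0 $x.75
ffff800080014da4 0 $d.78
ffff800080014da8 0 $x.79
ffff800080014de0 200 run_init_process'

# Print the computed size of the first symbol in the list.
# $1 = "filter" to skip mapping symbols, anything else to keep them.
sym_size() {
	mode=$1 base="" elf_size=""
	while read -r addr size name; do
		# All sample addresses share their upper 32 bits, so only the
		# low 8 hex digits are used to stay within signed arithmetic.
		lo=${addr#????????}
		if [ -z "$base" ]; then
			base=$lo elf_size=$size
			continue
		fi
		if [ "$mode" = filter ] &&
		   printf '%s\n' "$name" | grep -Eq '^((\.L)|(L0)|(\$[adtx](\.|$)))'; then
			continue
		fi
		delta=$((0x$lo - 0x$base))
		# Symbols inside the function body cannot mark its end.
		[ "$delta" -lt "$elf_size" ] && continue
		printf '0x%x\n' "$delta"
		return
	done <<EOF
$symbols
EOF
}

echo "mapping symbols kept:     $(sym_size keep)"
echo "mapping symbols filtered: $(sym_size filter)"
```

With the mapping symbols kept, $d.78 sits exactly at the end of the
function and yields 0x224; with them filtered, the next symbol is
run_init_process and the size becomes 0x260, matching the value in the
WARNING line.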

2023-09-14 21:56:57

by Nick Desaulniers

Subject: Re: [PATCH v4 2/3] scripts/faddr2line: Use LLVM addr2line and readelf if LLVM=1

On Thu, Sep 14, 2023 at 6:12 AM Will Deacon <[email protected]> wrote:
>
> GNU utilities cannot necessarily parse objects built by LLVM, which can
> result in confusing errors when using 'faddr2line':
>
> $ CROSS_COMPILE=aarch64-linux-gnu- ./scripts/faddr2line vmlinux do_one_initcall+0xf4/0x260
> aarch64-linux-gnu-addr2line: vmlinux: unknown type [0x13] section `.relr.dyn'
> aarch64-linux-gnu-addr2line: DWARF error: invalid or unhandled FORM value: 0x25
> do_one_initcall+0xf4/0x260:
> aarch64-linux-gnu-addr2line: vmlinux: unknown type [0x13] section `.relr.dyn'
> aarch64-linux-gnu-addr2line: DWARF error: invalid or unhandled FORM value: 0x25
> $x.73 at main.c:?
>
> Although this can be worked around by setting CROSS_COMPILE to "llvm-",
> it's cleaner to follow the same syntax as the top-level Makefile and
> accept LLVM= as an indication to use the llvm- tools, optionally
> specifying their location or specific version number.
>
> Cc: Josh Poimboeuf <[email protected]>
> Cc: John Stultz <[email protected]>
> Suggested-by: Masahiro Yamada <[email protected]>
> Signed-off-by: Will Deacon <[email protected]>

Thanks for the patch series!
Reviewed-by: Nick Desaulniers <[email protected]>

> ---
> scripts/faddr2line | 17 +++++++++++++++--
> 1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/scripts/faddr2line b/scripts/faddr2line
> index a35a420d0f26..6b8206802157 100755
> --- a/scripts/faddr2line
> +++ b/scripts/faddr2line
> @@ -58,8 +58,21 @@ die() {
> exit 1
> }
>
> -READELF="${CROSS_COMPILE:-}readelf"
> -ADDR2LINE="${CROSS_COMPILE:-}addr2line"
> +UTIL_SUFFIX=""
> +if [[ "${LLVM:-}" == "" ]]; then
> + UTIL_PREFIX=${CROSS_COMPILE:-}
> +else
> + UTIL_PREFIX=llvm-
> +
> + if [[ "${LLVM}" == *"/" ]]; then
> + UTIL_PREFIX=${LLVM}${UTIL_PREFIX}
> + elif [[ "${LLVM}" == "-"* ]]; then
> + UTIL_SUFFIX=${LLVM}
> + fi
> +fi
> +
> +READELF="${UTIL_PREFIX}readelf${UTIL_SUFFIX}"
> +ADDR2LINE="${UTIL_PREFIX}addr2line${UTIL_SUFFIX}"
> AWK="awk"
> GREP="grep"
>
> --
> 2.42.0.283.g2d96d420d3-goog
>


--
Thanks,
~Nick Desaulniers

2023-09-18 16:49:01

by Nick Desaulniers

Subject: Re: [PATCH v4 3/3] scripts/faddr2line: Skip over mapping symbols in output from readelf

On Thu, Sep 14, 2023 at 6:12 AM Will Deacon <[email protected]> wrote:
>
> Mapping symbols emitted in the readelf output can confuse the
> 'faddr2line' symbol size calculation, resulting in the erroneous
> rejection of valid offsets. This is especially prevalent when building
> an arm64 kernel with CONFIG_CFI_CLANG=y, where most functions are
> prefixed with a 32-bit data value in a '$d.n' section. For example:
>
> 447538: ffff800080014b80 548 FUNC GLOBAL DEFAULT 2 do_one_initcall
> 104: ffff800080014c74 0 NOTYPE LOCAL DEFAULT 2 $x.73
> 106: ffff800080014d30 0 NOTYPE LOCAL DEFAULT 2 $x.75
> 111: ffff800080014da4 0 NOTYPE LOCAL DEFAULT 2 $d.78
> 112: ffff800080014da8 0 NOTYPE LOCAL DEFAULT 2 $x.79
> 36: ffff800080014de0 200 FUNC LOCAL DEFAULT 2 run_init_process
>
> Adding a warning to do_one_initcall() results in:
>
> | WARNING: CPU: 0 PID: 1 at init/main.c:1236 do_one_initcall+0xf4/0x260
>
> Which 'faddr2line' refuses to accept:
>
> $ ./scripts/faddr2line vmlinux do_one_initcall+0xf4/0x260
> skipping do_one_initcall address at 0xffff800080014c74 due to size mismatch (0x260 != 0x224)
> no match for do_one_initcall+0xf4/0x260
>
> Filter out these entries from readelf using a shell reimplementation of
> is_mapping_symbol(), so that the size of a symbol is calculated as a
> delta to the next symbol present in ksymtab.
>
> Cc: Josh Poimboeuf <[email protected]>
> Cc: John Stultz <[email protected]>
> Suggested-by: Masahiro Yamada <[email protected]>
> Signed-off-by: Will Deacon <[email protected]>
> ---
> scripts/faddr2line | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/scripts/faddr2line b/scripts/faddr2line
> index 6b8206802157..20d9b3d37843 100755
> --- a/scripts/faddr2line
> +++ b/scripts/faddr2line
> @@ -179,6 +179,11 @@ __faddr2line() {
> local cur_sym_elf_size=${fields[2]}
> local cur_sym_name=${fields[7]:-}
>
> + # is_mapping_symbol(cur_sym_name)
> + if [[ ${cur_sym_name} =~ ^((\.L)|(L0)|(\$[adtx](\.|$))) ]]; then

Thanks for the patch!

I'm curious about the `|$` in the final part of the regex. IIUC that
will match something like
$a
Do we have any such symbols without `.<n>` suffixes?

With aarch64 defconfig + cfi:
$ llvm-readelf -s vmlinux | grep '\$' | rev | cut -d ' ' -f 1 | rev | sort -u
I only see $d.<n> and $x.<n> where the initial value of <n> is zero
(as opposed to no `.<n>` suffix).
Can we tighten up that last part of the regex to be `\$[adtx]\.[0-9]+$` ?
Or perhaps you've observed mapping symbols use another convention than
what clang is doing?

https://sourceware.org/binutils/docs/as/AArch64-Mapping-Symbols.html
also only mentions $d and $x. Ah,
https://developer.arm.com/documentation/dui0803/a/Accessing-and-managing-symbols-with-armlink/About-mapping-symbols
mentions $a for A32 and $t for T32.
Consider adding a link to the ARM documentation on mapping symbols in
the commit message?

(Curiously, `llvm-nm` does not print these symbols, but `llvm-readelf -s` does).

> + continue
> + fi
> +
> if [[ $cur_sym_addr = $sym_addr ]] &&
> [[ $cur_sym_elf_size = $sym_elf_size ]] &&
> [[ $cur_sym_name = $sym_name ]]; then
> --
> 2.42.0.283.g2d96d420d3-goog
>


--
Thanks,
~Nick Desaulniers
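
The question about the `|$` alternative can be checked with a short
sketch. The "tightened" pattern below is an assumption: it substitutes the
suggested `\$[adtx]\.[0-9]+$` for the last alternative of the regex in the
patch. The helper name matches is made up, and 'grep -E' is used in place
of '=~' so that the sketch runs in a plain POSIX shell.

```sh
original='^((\.L)|(L0)|(\$[adtx](\.|$)))'
tightened='^((\.L)|(L0)|(\$[adtx]\.[0-9]+$))'

# matches <regex> <name>: succeed if the name matches the ERE.
matches() { printf '%s\n' "$2" | grep -Eq "$1"; }

for name in '$x.73' '$d.0' '$a' '$x.foo' do_one_initcall; do
	o=no; t=no
	matches "$original" "$name" && o=yes
	matches "$tightened" "$name" && t=yes
	printf '%-16s original=%-3s tightened=%s\n' "$name" "$o" "$t"
done
```

The two patterns agree on the numbered forms and differ on a bare `$a`
and on a non-numeric suffix such as `$x.foo`, which only the original
accepts.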

2023-09-25 16:53:57

by Masahiro Yamada

Subject: Re: [PATCH v4 3/3] scripts/faddr2line: Skip over mapping symbols in output from readelf

On Thu, Sep 14, 2023 at 10:12 PM Will Deacon <[email protected]> wrote:
>
> Mapping symbols emitted in the readelf output can confuse the
> 'faddr2line' symbol size calculation, resulting in the erroneous
> rejection of valid offsets. This is especially prevalent when building
> an arm64 kernel with CONFIG_CFI_CLANG=y, where most functions are
> prefixed with a 32-bit data value in a '$d.n' section. For example:
>
> 447538: ffff800080014b80 548 FUNC GLOBAL DEFAULT 2 do_one_initcall
> 104: ffff800080014c74 0 NOTYPE LOCAL DEFAULT 2 $x.73
> 106: ffff800080014d30 0 NOTYPE LOCAL DEFAULT 2 $x.75
> 111: ffff800080014da4 0 NOTYPE LOCAL DEFAULT 2 $d.78
> 112: ffff800080014da8 0 NOTYPE LOCAL DEFAULT 2 $x.79
> 36: ffff800080014de0 200 FUNC LOCAL DEFAULT 2 run_init_process
>
> Adding a warning to do_one_initcall() results in:
>
> | WARNING: CPU: 0 PID: 1 at init/main.c:1236 do_one_initcall+0xf4/0x260
>
> Which 'faddr2line' refuses to accept:
>
> $ ./scripts/faddr2line vmlinux do_one_initcall+0xf4/0x260
> skipping do_one_initcall address at 0xffff800080014c74 due to size mismatch (0x260 != 0x224)
> no match for do_one_initcall+0xf4/0x260
>
> Filter out these entries from readelf using a shell reimplementation of
> is_mapping_symbol(), so that the size of a symbol is calculated as a
> delta to the next symbol present in ksymtab.
>
> Cc: Josh Poimboeuf <[email protected]>
> Cc: John Stultz <[email protected]>
> Suggested-by: Masahiro Yamada <[email protected]>
> Signed-off-by: Will Deacon <[email protected]>
> ---
> scripts/faddr2line | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/scripts/faddr2line b/scripts/faddr2line
> index 6b8206802157..20d9b3d37843 100755
> --- a/scripts/faddr2line
> +++ b/scripts/faddr2line
> @@ -179,6 +179,11 @@ __faddr2line() {
> local cur_sym_elf_size=${fields[2]}
> local cur_sym_name=${fields[7]:-}
>
> + # is_mapping_symbol(cur_sym_name)
> + if [[ ${cur_sym_name} =~ ^((\.L)|(L0)|(\$[adtx](\.|$))) ]]; then
> + continue
> + fi
> +


Too many parentheses.


The latest include/linux/module_symbol.h looks like this.

static inline int is_mapping_symbol(const char *str)
{
	if (str[0] == '.' && str[1] == 'L')
		return true;
	if (str[0] == 'L' && str[1] == '0')
		return true;
	return str[0] == '$';
}

Does this work?

	if [[ ${cur_sym_name} =~ ^(\.L|L0|\$) ]]; then
		continue
	fi

> if [[ $cur_sym_addr = $sym_addr ]] &&
> [[ $cur_sym_elf_size = $sym_elf_size ]] &&
> [[ $cur_sym_name = $sym_name ]]; then
> --
> 2.42.0.283.g2d96d420d3-goog
>


--
Best Regards
Masahiro Yamada

2023-09-29 14:28:45

by Will Deacon

Subject: Re: [PATCH v4 3/3] scripts/faddr2line: Skip over mapping symbols in output from readelf

On Tue, Sep 26, 2023 at 01:50:20AM +0900, Masahiro Yamada wrote:
> On Thu, Sep 14, 2023 at 10:12 PM Will Deacon <[email protected]> wrote:
> >
> > Mapping symbols emitted in the readelf output can confuse the
> > 'faddr2line' symbol size calculation, resulting in the erroneous
> > rejection of valid offsets. This is especially prevalent when building
> > an arm64 kernel with CONFIG_CFI_CLANG=y, where most functions are
> > prefixed with a 32-bit data value in a '$d.n' section. For example:
> >
> > 447538: ffff800080014b80 548 FUNC GLOBAL DEFAULT 2 do_one_initcall
> > 104: ffff800080014c74 0 NOTYPE LOCAL DEFAULT 2 $x.73
> > 106: ffff800080014d30 0 NOTYPE LOCAL DEFAULT 2 $x.75
> > 111: ffff800080014da4 0 NOTYPE LOCAL DEFAULT 2 $d.78
> > 112: ffff800080014da8 0 NOTYPE LOCAL DEFAULT 2 $x.79
> > 36: ffff800080014de0 200 FUNC LOCAL DEFAULT 2 run_init_process
> >
> > Adding a warning to do_one_initcall() results in:
> >
> > | WARNING: CPU: 0 PID: 1 at init/main.c:1236 do_one_initcall+0xf4/0x260
> >
> > Which 'faddr2line' refuses to accept:
> >
> > $ ./scripts/faddr2line vmlinux do_one_initcall+0xf4/0x260
> > skipping do_one_initcall address at 0xffff800080014c74 due to size mismatch (0x260 != 0x224)
> > no match for do_one_initcall+0xf4/0x260
> >
> > Filter out these entries from readelf using a shell reimplementation of
> > is_mapping_symbol(), so that the size of a symbol is calculated as a
> > delta to the next symbol present in ksymtab.
> >
> > Cc: Josh Poimboeuf <[email protected]>
> > Cc: John Stultz <[email protected]>
> > Suggested-by: Masahiro Yamada <[email protected]>
> > Signed-off-by: Will Deacon <[email protected]>
> > ---
> > scripts/faddr2line | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/scripts/faddr2line b/scripts/faddr2line
> > index 6b8206802157..20d9b3d37843 100755
> > --- a/scripts/faddr2line
> > +++ b/scripts/faddr2line
> > @@ -179,6 +179,11 @@ __faddr2line() {
> > local cur_sym_elf_size=${fields[2]}
> > local cur_sym_name=${fields[7]:-}
> >
> > + # is_mapping_symbol(cur_sym_name)
> > + if [[ ${cur_sym_name} =~ ^((\.L)|(L0)|(\$[adtx](\.|$))) ]]; then
> > + continue
> > + fi
> > +
>
>
> Too many parentheses.

Ha, well _that_ is subjective! I really think they help when it comes to
regex syntax. However...

> The latest include/linux/module_symbol.h looks like this.
>
> static inline int is_mapping_symbol(const char *str)
> {
> 	if (str[0] == '.' && str[1] == 'L')
> 		return true;
> 	if (str[0] == 'L' && str[1] == '0')
> 		return true;
> 	return str[0] == '$';
> }

...oh, nice, that got simplified a whole lot by ff09f6fd2972 ("modpost,
kallsyms: Treat add '$'-prefixed symbols as mapping symbols") in the
recent merge window, so I can definitely simplify the regex.

> Does this work?
>
> 	if [[ ${cur_sym_name} =~ ^(\.L|L0|\$) ]]; then
> 		continue
> 	fi

Looks about right.

Will
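
One way to sanity-check that the simplified regex tracks the quoted C
helper is to transliterate the three prefix tests into shell patterns and
compare the two on a few names. This is only a sketch: the shell function
below reuses the name is_mapping_symbol for readability but is not kernel
code, regex_matches is a made-up helper, and 'grep -E' stands in for the
'=~' test so that it runs in a plain POSIX shell.

```sh
# Shell transliteration of the quoted C helper: ".L", "L0" or "$" prefix.
is_mapping_symbol() {
	case $1 in
	.L*|L0*|'$'*) return 0 ;;
	*) return 1 ;;
	esac
}

# The simplified regex proposed in the thread.
regex_matches() { printf '%s\n' "$1" | grep -Eq '^(\.L|L0|\$)'; }

for name in .Ltmp0 L0 '$x.73' '$foo' L1 do_one_initcall; do
	c=no; r=no
	is_mapping_symbol "$name" && c=yes
	regex_matches "$name" && r=yes
	printf '%-16s case=%-3s regex=%s\n' "$name" "$c" "$r"
done
```

Unlike the v4 regex, both forms treat any '$'-prefixed name (for example
`$foo`) as a mapping symbol.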

2023-10-02 15:40:00

by Will Deacon

Subject: Re: [PATCH v4 3/3] scripts/faddr2line: Skip over mapping symbols in output from readelf

On Mon, Sep 18, 2023 at 08:46:22AM -0700, Nick Desaulniers wrote:
> On Thu, Sep 14, 2023 at 6:12 AM Will Deacon <[email protected]> wrote:
> >
> > Mapping symbols emitted in the readelf output can confuse the
> > 'faddr2line' symbol size calculation, resulting in the erroneous
> > rejection of valid offsets. This is especially prevalent when building
> > an arm64 kernel with CONFIG_CFI_CLANG=y, where most functions are
> > prefixed with a 32-bit data value in a '$d.n' section. For example:
> >
> > 447538: ffff800080014b80 548 FUNC GLOBAL DEFAULT 2 do_one_initcall
> > 104: ffff800080014c74 0 NOTYPE LOCAL DEFAULT 2 $x.73
> > 106: ffff800080014d30 0 NOTYPE LOCAL DEFAULT 2 $x.75
> > 111: ffff800080014da4 0 NOTYPE LOCAL DEFAULT 2 $d.78
> > 112: ffff800080014da8 0 NOTYPE LOCAL DEFAULT 2 $x.79
> > 36: ffff800080014de0 200 FUNC LOCAL DEFAULT 2 run_init_process
> >
> > Adding a warning to do_one_initcall() results in:
> >
> > | WARNING: CPU: 0 PID: 1 at init/main.c:1236 do_one_initcall+0xf4/0x260
> >
> > Which 'faddr2line' refuses to accept:
> >
> > $ ./scripts/faddr2line vmlinux do_one_initcall+0xf4/0x260
> > skipping do_one_initcall address at 0xffff800080014c74 due to size mismatch (0x260 != 0x224)
> > no match for do_one_initcall+0xf4/0x260
> >
> > Filter out these entries from readelf using a shell reimplementation of
> > is_mapping_symbol(), so that the size of a symbol is calculated as a
> > delta to the next symbol present in ksymtab.
> >
> > Cc: Josh Poimboeuf <[email protected]>
> > Cc: John Stultz <[email protected]>
> > Suggested-by: Masahiro Yamada <[email protected]>
> > Signed-off-by: Will Deacon <[email protected]>
> > ---
> > scripts/faddr2line | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/scripts/faddr2line b/scripts/faddr2line
> > index 6b8206802157..20d9b3d37843 100755
> > --- a/scripts/faddr2line
> > +++ b/scripts/faddr2line
> > @@ -179,6 +179,11 @@ __faddr2line() {
> > local cur_sym_elf_size=${fields[2]}
> > local cur_sym_name=${fields[7]:-}
> >
> > + # is_mapping_symbol(cur_sym_name)
> > + if [[ ${cur_sym_name} =~ ^((\.L)|(L0)|(\$[adtx](\.|$))) ]]; then
>
> Thanks for the patch!
>
> I'm curious about the `|$` in the final part of the regex. IIUC that
> will match something like
> $a
> Do we have any such symbols without `.<n>` suffixes?

tbh, I just blindly followed the implementation of is_mapping_symbol()
at the time, but Masahiro has since pointed out that it's been
significantly simplified so this regex should get much more manageable
in the next version.

Will