LinuxLists.cc - [PATCH v6 17/23] scripts: decode_stacktrace: demangle Rust symbols

2022-05-08 08:56:15

Subject: [PATCH v6 17/23] scripts: decode_stacktrace: demangle Rust symbols

Recent versions of both Binutils (`c++filt`) and LLVM (`llvm-cxxfilt`)
provide Rust v0 mangling support.

Co-developed-by: Alex Gaynor <[email protected]>
Signed-off-by: Alex Gaynor <[email protected]>
Co-developed-by: Wedson Almeida Filho <[email protected]>
Signed-off-by: Wedson Almeida Filho <[email protected]>
Signed-off-by: Miguel Ojeda <[email protected]>
---
I would like to use this patch for discussing the demangling topic.

The following discusses the different approaches we could take.

# Leave demangling to userspace

This is the easiest and less invasive approach, the one implemented
by this patch.

The `decode_stacktrace.sh` script is already needed to map
the offsets to the source code. Therefore, we could also take
the chance to demangle the symbols here.

With this approach, we do not need to introduce any change in the
`vsprintf` machinery and we minimize the risk of breaking user tools.

Note that, if we take this approach, it is likely we want to ask
for a minimum version of either of the tools (since there may be
users of the script that do not have recent enough toolchains).

# Demangling in kernelspace on-the-fly

That is, at backtrace print time, we demangle the Rust symbols.

The size of the code that would be needed is fairly small; around
5 KiB using the "official" library (written in Rust), e.g.:

text data bss dec hex filename
7799976 1689820 2129920 11619716 b14d84 vmlinux
7801111 1693916 2129920 11624947 b161f3 vmlinux + demangling

We can remove a few bits from the official library, e.g. punycode
support that we do not need (all our identifiers will be ASCII),
but it does not make a substantial difference.

The official library performs the demangling without requiring
allocations. However, of course, it will increased our stack usage
and complexity, specially considering a stack dump may be requested
in not ideal conditions.

Furthermore, this approach (and the ones below) likely require adding
a new `%p` specifier (or a new modifier to existing ones) if we do
not want to affect non-backtrace uses of the `B`/`S` ones. Also,
it is unclear whether we should write the demangled versions in an
extra, different line or replace the real symbol -- we could be
breaking user tools relying on parsing backtraces (e.g. our own
`decode_stacktrace.sh`). For instance, they could be relying on
having real symbols there, or may break due to e.g. spaces.

# Demangling at compile-time

This implies having kallsyms demangle all the Rust symbols.

The size of this data is around the same order of magnitude of the
non-demangled ones. However, this is notably more than the demangling
code (see previous point), e.g. 120 KiB (uncompressed) in a
small kernel.

This approach also brings the same concerns regarding modifying
the backtrace printing (see previous point).

# Demangling at compile-time and substituting symbols by hashes

One variation of the previous alternative is avoiding the mangled
names inside the kernel, by hashing them. This would avoid having
to support "big symbols" and would also reduce the size of the
kallsyms tables, while still allowing to link modules.

However, if we do not have the real symbols around, then we do not
have the possibility of providing both the mangled and demangled
versions in the backtrace, which brings us back to the issues
related to breaking userspace tools. There are also other places
other than backtraces using "real" symbols that users may be
relying on, such as `/proc/*/stack`.

scripts/decode_stacktrace.sh | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/scripts/decode_stacktrace.sh b/scripts/decode_stacktrace.sh
index 5fbad61fe490..f3c7b506d440 100755
--- a/scripts/decode_stacktrace.sh
+++ b/scripts/decode_stacktrace.sh
@@ -8,6 +8,14 @@ usage() {
echo " $0 -r <release> | <vmlinux> [<base path>|auto] [<modules path>]"
}

+# Try to find a Rust demangler
+if type llvm-cxxfilt >/dev/null 2>&1 ; then
+ cppfilt=llvm-cxxfilt
+elif type c++filt >/dev/null 2>&1 ; then
+ cppfilt=c++filt
+ cppfilt_opts=-i
+fi
+
if [[ $1 == "-r" ]] ; then
vmlinux=""
basepath="auto"
@@ -169,6 +177,12 @@ parse_symbol() {
# In the case of inlines, move everything to same line
code=${code//$'\n'/' '}

+ # Demangle if the name looks like a Rust symbol and if
+ # we got a Rust demangler
+ if [[ $name =~ ^_R && $cppfilt != "" ]] ; then
+ name=$("$cppfilt" "$cppfilt_opts" "$name")
+ fi
+
# Replace old address with pretty line numbers
symbol="$segment$name ($code)"
}
--
2.35.3

2022-05-09 06:09:19

by Miguel Ojeda

[permalink] [raw]

Subject: Re: [PATCH v6 17/23] scripts: decode_stacktrace: demangle Rust symbols

Hi Kees,

Thanks a lot for taking the time to read all that :)

On Sat, May 7, 2022 at 10:32 AM Kees Cook <[email protected]> wrote:
>
> I may need some examples here for what you're thinking will cause
> problems. Why a new specifier? Won't demangling just give us text? Is
> the concern about breaking backtrace parsers that only understand C
> symbols?

What I was thinking here is that if we replace how `%pB` works, for
instance, and do demangling unconditionally, then we would break some
of the use cases that are expecting real symbol names, e.g.
`/proc/pid/stack`. So we would need to introduce a way to
differentiate the cases where real symbols should be kept vs.
demangled backtraces for humans, e.g. a new specifier or a modifier
for the existing ones.

Similarly, if we modify the backtrace printed in the kernel log, there
is a high chance we break somebody's userspace backtrace parsers and
other tools connected to those in different ways, e.g.:

- If we replace the mangled symbol, then some tools may not expect
e.g. whitespace or other characters (which Rust demangled symbols
have); or if they handle them, they may be expecting actual symbols
(like in the case above) because they use them later on to correlate
them to some other data.

- If we keep the mangled symbols (so that tools still have the real
symbols) and introduce an extra line (or extra length in the same
line) per Rust symbol (where we write the demangled version), that
could still break some parsers just because of the new line (or extra
data).

So my concern is all about how to introduce the new information
without breaking any existing use case.

> It seems all of that would be in the build-time helper, not the kernel
> image, though, so that seems better than run-time demangling.

Hmm... I am not sure what you mean here.

What I meant by this option is that we pre-generate a table (at
compile-time) and put it into `vmlinux` (and similar for each loadable
module) so that we can then just look it up within the kernel instead
of running the demangle algorithm for each symbol (e.g. when printing
a backtrace).

If you mean giving userspace that table (e.g. that distros keep in a
file in `/boot` for tools to use etc.), that could be a good idea to
avoid userspace tools requiring a library for demangling, but it would
be an improvement on top of option 1 ("# Leave demangling to
userspace") rather than an independent option (since we need to choose
what we do for backtraces), no? Or what do you mean?

Cheers,
Miguel

2022-05-09 06:09:31

by Kees Cook

[permalink] [raw]

Subject: Re: [PATCH v6 17/23] scripts: decode_stacktrace: demangle Rust symbols

On Sat, May 07, 2022 at 12:21:14PM +0200, Miguel Ojeda wrote:
> - If we replace the mangled symbol, then some tools may not expect
> e.g. whitespace or other characters (which Rust demangled symbols
> have); or if they handle them, they may be expecting actual symbols
> (like in the case above) because they use them later on to correlate
> them to some other data.

Yeah. I think this is the fundamental issue, and it requires just
leaving it up to userspace. I don't see any significant benefits to any
of the other solutions.

Maybe some day we'll want demangling visible in traces, etc, but it
doesn't make sense to try to design that now. The mangled version is
existing-parser-safe.

--
Kees Cook

2022-05-09 11:27:33

by Kees Cook

[permalink] [raw]

Subject: Re: [PATCH v6 17/23] scripts: decode_stacktrace: demangle Rust symbols

On Sat, May 07, 2022 at 07:24:15AM +0200, Miguel Ojeda wrote:
> Recent versions of both Binutils (`c++filt`) and LLVM (`llvm-cxxfilt`)
> provide Rust v0 mangling support.
>
> Co-developed-by: Alex Gaynor <[email protected]>
> Signed-off-by: Alex Gaynor <[email protected]>
> Co-developed-by: Wedson Almeida Filho <[email protected]>
> Signed-off-by: Wedson Almeida Filho <[email protected]>
> Signed-off-by: Miguel Ojeda <[email protected]>
> ---
> I would like to use this patch for discussing the demangling topic.
>
> The following discusses the different approaches we could take.
>
>
> # Leave demangling to userspace
>
> This is the easiest and less invasive approach, the one implemented
> by this patch.
>
> The `decode_stacktrace.sh` script is already needed to map
> the offsets to the source code. Therefore, we could also take
> the chance to demangle the symbols here.
>
> With this approach, we do not need to introduce any change in the
> `vsprintf` machinery and we minimize the risk of breaking user tools.
>
> Note that, if we take this approach, it is likely we want to ask
> for a minimum version of either of the tools (since there may be
> users of the script that do not have recent enough toolchains).

For the first in-tree Rust support, I think this is entirely the right
approach.

> # Demangling in kernelspace on-the-fly

Please no. :) I don't see a benefit compared to doing it at
compile-time.

> Furthermore, this approach (and the ones below) likely require adding
> a new `%p` specifier (or a new modifier to existing ones) if we do
> not want to affect non-backtrace uses of the `B`/`S` ones. Also,
> it is unclear whether we should write the demangled versions in an
> extra, different line or replace the real symbol -- we could be
> breaking user tools relying on parsing backtraces (e.g. our own
> `decode_stacktrace.sh`). For instance, they could be relying on
> having real symbols there, or may break due to e.g. spaces.

I may need some examples here for what you're thinking will cause
problems. Why a new specifier? Won't demangling just give us text? Is
the concern about breaking backtrace parsers that only understand C
symbols?

> # Demangling at compile-time
>
> This implies having kallsyms demangle all the Rust symbols.
>
> The size of this data is around the same order of magnitude of the
> non-demangled ones. However, this is notably more than the demangling
> code (see previous point), e.g. 120 KiB (uncompressed) in a
> small kernel.

It seems all of that would be in the build-time helper, not the kernel
image, though, so that seems better than run-time demangling.

> # Demangling at compile-time and substituting symbols by hashes

Nah; this is even less readable than the mangled symbols. :) I don't
think the symbol length should be a concern. (Though maybe there are
some crash parsers that we can buffer overflow!)

> scripts/decode_stacktrace.sh | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)

Reviewed-by: Kees Cook <[email protected]>

--
Kees Cook