From: Omar Sandoval <[email protected]>
Commits ffb1b4a41016 ("x86/unwind/orc: Add 'signal' field to ORC
metadata") and fb799447ae29 ("x86,objtool: Split UNWIND_HINT_EMPTY in
two") changed the ORC format. Although ORC is internal to the kernel,
it's the only way for external tools to get reliable kernel stack traces
on x86-64. In particular, the drgn debugger [1] uses ORC for stack
unwinding, and these format changes broke it [2]. As the drgn
maintainer, I don't care how often or how much the kernel changes the
ORC format as long as I have a way to detect the change. Using the
kernel version is not a solution because distros frequently backport
changes.
It suffices to store a version number for the ORC format in the vmlinux
and kernel module ELF files (to use when parsing ORC sections from ELF),
and in kernel memory (to use when parsing ORC from a core dump). This
patch adds both of these by creating an .orc_header ELF section
containing a 4-byte version number and the corresponding
__start_orc_header and __stop_orc_header symbols.
The current version number is 3. Version 1 is the original version
merged in commit ee9f8fce9964 ("x86/unwind: Add the ORC unwinder").
Version 2 is the version from commit ffb1b4a41016 ("x86/unwind/orc: Add
'signal' field to ORC metadata"), which obviously didn't include this
header but could get it in a backport to the 6.3 stable branch.
1: https://github.com/osandov/drgn
2: https://github.com/osandov/drgn/issues/303
Signed-off-by: Omar Sandoval <[email protected]>
---
Hi,
As mentioned in the commit message, the motivation for this patch is
allowing drgn to continue to make use of ORC for kernel stack unwinding.
I want to make it clear that I don't want ORC to be stable ABI. The
kernel is free to change the format as much as needed, I just need a way
to detect the change. (drgn already pokes at many kernel internals and
needs updates for most kernel versions anyways. We have a big test suite
to catch changes we care about.)
I'm not at all married to (or proud of) this particular implementation;
I'd be happy to use anything that lets me detect the format version in
both cases mentioned in the commit message (ELF file or core dump +
symbol table).
It'd be great if we could get a solution in before 6.4 is released. I
would've reported this sooner, but I just got back from paternity leave
last week.
Thanks!
Omar
arch/x86/include/asm/orc_header.h | 14 ++++++++++++++
arch/x86/include/asm/orc_types.h | 14 ++++++++++++++
arch/x86/kernel/unwind_orc.c | 3 +++
include/asm-generic/vmlinux.lds.h | 3 +++
scripts/mod/modpost.c | 5 +++++
tools/arch/x86/include/asm/orc_types.h | 14 ++++++++++++++
6 files changed, 53 insertions(+)
create mode 100644 arch/x86/include/asm/orc_header.h
diff --git a/arch/x86/include/asm/orc_header.h b/arch/x86/include/asm/orc_header.h
new file mode 100644
index 000000000000..08c3710311f7
--- /dev/null
+++ b/arch/x86/include/asm/orc_header.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* Copyright (c) Meta Platforms, Inc. and affiliates. */
+
+#ifndef _ORC_HEADER_H
+#define _ORC_HEADER_H
+
+#include <asm/orc_types.h>
+
+/* For now, the header is just the 4-byte version number. */
+#define ORC_HEADER \
+ __used __section(".orc_header") \
+ static const u32 orc_header = ORC_VERSION
+
+#endif /* _ORC_HEADER_H */
diff --git a/arch/x86/include/asm/orc_types.h b/arch/x86/include/asm/orc_types.h
index 46d7e06763c9..fe1669284254 100644
--- a/arch/x86/include/asm/orc_types.h
+++ b/arch/x86/include/asm/orc_types.h
@@ -9,6 +9,20 @@
#include <linux/types.h>
#include <linux/compiler.h>
+/*
+ * Bump this whenever the format of .orc_unwind, .orc_unwind_ip, or .orc_lookup
+ * changes.
+ *
+ * - Version 2:
+ * - Added struct orc_entry.signal
+ * - Version 3:
+ * - Removed struct orc_entry.end
+ * - Made struct orc_entry.type 3 bits
+ * - Added ORC_TYPE_UNDEFINED and ORC_TYPE_END_OF_STACK
+ * - Renumbered ORC_TYPE_CALL, ORC_TYPE_REGS, and ORC_TYPE_REGS_PARTIAL
+ */
+#define ORC_VERSION 3
+
/*
* The ORC_REG_* registers are base registers which are used to find other
* registers on the stack.
diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 3ac50b7298d1..4d8e518365f4 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -7,6 +7,9 @@
#include <asm/unwind.h>
#include <asm/orc_types.h>
#include <asm/orc_lookup.h>
+#include <asm/orc_header.h>
+
+ORC_HEADER;
#define orc_warn(fmt, ...) \
printk_deferred_once(KERN_WARNING "WARNING: " fmt, ##__VA_ARGS__)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index cebdf1ca415d..da9e5629ea43 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -839,6 +839,9 @@
#ifdef CONFIG_UNWINDER_ORC
#define ORC_UNWIND_TABLE \
+ .orc_header : AT(ADDR(.orc_header) - LOAD_OFFSET) { \
+ BOUNDED_SECTION_BY(.orc_header, _orc_header) \
+ } \
. = ALIGN(4); \
.orc_unwind_ip : AT(ADDR(.orc_unwind_ip) - LOAD_OFFSET) { \
BOUNDED_SECTION_BY(.orc_unwind_ip, _orc_unwind_ip) \
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index d4531d09984d..c12150f96b88 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -1979,6 +1979,11 @@ static void add_header(struct buffer *b, struct module *mod)
buf_printf(b, "#include <linux/vermagic.h>\n");
buf_printf(b, "#include <linux/compiler.h>\n");
buf_printf(b, "\n");
+ buf_printf(b, "#ifdef CONFIG_UNWINDER_ORC\n");
+ buf_printf(b, "#include <asm/orc_header.h>\n");
+ buf_printf(b, "ORC_HEADER;\n");
+ buf_printf(b, "#endif\n");
+ buf_printf(b, "\n");
buf_printf(b, "BUILD_SALT;\n");
buf_printf(b, "BUILD_LTO_INFO;\n");
buf_printf(b, "\n");
diff --git a/tools/arch/x86/include/asm/orc_types.h b/tools/arch/x86/include/asm/orc_types.h
index 46d7e06763c9..fe1669284254 100644
--- a/tools/arch/x86/include/asm/orc_types.h
+++ b/tools/arch/x86/include/asm/orc_types.h
@@ -9,6 +9,20 @@
#include <linux/types.h>
#include <linux/compiler.h>
+/*
+ * Bump this whenever the format of .orc_unwind, .orc_unwind_ip, or .orc_lookup
+ * changes.
+ *
+ * - Version 2:
+ * - Added struct orc_entry.signal
+ * - Version 3:
+ * - Removed struct orc_entry.end
+ * - Made struct orc_entry.type 3 bits
+ * - Added ORC_TYPE_UNDEFINED and ORC_TYPE_END_OF_STACK
+ * - Renumbered ORC_TYPE_CALL, ORC_TYPE_REGS, and ORC_TYPE_REGS_PARTIAL
+ */
+#define ORC_VERSION 3
+
/*
* The ORC_REG_* registers are base registers which are used to find other
* registers on the stack.
--
2.40.1
On Thu, Jun 08, 2023 at 03:38:38PM -0700, Omar Sandoval wrote:
> Hi,
>
> As mentioned in the commit message, the motivation for this patch is
> allowing drgn to continue to make use of ORC for kernel stack unwinding.
>
> I want to make it clear that I don't want ORC to be stable ABI. The
> kernel is free to change the format as much as needed, I just need a way
> to detect the change. (drgn already pokes at many kernel internals and
> needs updates for most kernel versions anyways. We have a big test suite
> to catch changes we care about.)
>
> I'm not at all married to (or proud of) this particular implementation;
> I'd be happy to use anything that lets me detect the format version in
> both cases mentioned in the commit message (ELF file or core dump +
> symbol table).
>
> It'd be great if we could get a solution in before 6.4 is released. I
> would've reported this sooner, but I just got back from paternity leave
> last week.
Hi Omar,
Peter and I agree this seems fine in principle. Though, instead of
using an incrementing version, Peter had the idea to hash the struct,
like:
awk '/^struct orc_entry {$/ { p=1 } p { print } /^}/ { p=0 }' arch/x86/include/asm/orc_types.h | sha1sum
That way we don't have to remember to bump the version number, and it
would be more resilient to partial backports in distros.
Would something like that work for you?
--
Josh
On Fri, Jun 09, 2023 at 03:04:30PM -0700, Josh Poimboeuf wrote:
> On Thu, Jun 08, 2023 at 03:38:38PM -0700, Omar Sandoval wrote:
> > Hi,
> >
> > As mentioned in the commit message, the motivation for this patch is
> > allowing drgn to continue to make use of ORC for kernel stack unwinding.
> >
> > I want to make it clear that I don't want ORC to be stable ABI. The
> > kernel is free to change the format as much as needed, I just need a way
> > to detect the change. (drgn already pokes at many kernel internals and
> > needs updates for most kernel versions anyways. We have a big test suite
> > to catch changes we care about.)
> >
> > I'm not at all married to (or proud of) this particular implementation;
> > I'd be happy to use anything that lets me detect the format version in
> > both cases mentioned in the commit message (ELF file or core dump +
> > symbol table).
> >
> > It'd be great if we could get a solution in before 6.4 is released. I
> > would've reported this sooner, but I just got back from paternity leave
> > last week.
>
> Hi Omar,
>
> Peter and I agree this seems fine in principle.
Glad to hear that!
> Though, instead of
> using an incrementing version, Peter had the idea to hash the struct,
> like:
>
> awk '/^struct orc_entry {$/ { p=1 } p { print } /^}/ { p=0 }' arch/x86/include/asm/orc_types.h | sha1sum
>
> That way we don't have to remember to bump the version number, and it
> would be more resilient to partial backports in distros.
>
> Would something like that work for you?
Any sort of unique identifier works for me. One thing that the proposed
hash wouldn't catch is if ORC_REG_* or ORC_TYPE_* are ever renumbered
(i.e., the meanings of existing values change). It also wouldn't catch
if something about the .orc_unwind_ip section changed. But assuming
changes like that would be much rarer, it could be handled manually by
bumping a "salt" for the hash. E.g., by adding 'BEGIN { print <SALT> }'
to the awk script:
awk 'BEGIN { print 1 } /^struct orc_entry {$/ { p=1 } p { print } /^}/ { p=0 }' arch/x86/include/asm/orc_types.h | sha1sum
I'll defer to you guys whether it's easier to remember to bump a version
everytime or only in those rare cases.
Thanks,
Omar
On Fri, Jun 09, 2023 at 03:24:38PM -0700, Omar Sandoval wrote:
> > Though, instead of
> > using an incrementing version, Peter had the idea to hash the struct,
> > like:
> >
> > awk '/^struct orc_entry {$/ { p=1 } p { print } /^}/ { p=0 }' arch/x86/include/asm/orc_types.h | sha1sum
> >
> > That way we don't have to remember to bump the version number, and it
> > would be more resilient to partial backports in distros.
> >
> > Would something like that work for you?
>
> Any sort of unique identifier works for me. One thing that the proposed
> hash wouldn't catch is if ORC_REG_* or ORC_TYPE_* are ever renumbered
> (i.e., the meanings of existing values change). It also wouldn't catch
> if something about the .orc_unwind_ip section changed. But assuming
> changes like that would be much rarer, it could be handled manually by
> bumping a "salt" for the hash. E.g., by adding 'BEGIN { print <SALT> }'
> to the awk script:
>
> awk 'BEGIN { print 1 } /^struct orc_entry {$/ { p=1 } p { print } /^}/ { p=0 }' arch/x86/include/asm/orc_types.h | sha1sum
>
> I'll defer to you guys whether it's easier to remember to bump a version
> everytime or only in those rare cases.
I think I'd prefer only bumping it in the rare cases, because we're
going to end up forgetting either way ;-)
To catch REG/TYPE changes, we could forego awk and just hash the whole
file, the only downside being that it might introduce unnecessary
changes if we change a comment or something. But the file changes
rarely enough.
Or we could tweak the awk to also print ORC_{REG,TYPE}_* pretty easily.
--
Josh
On Fri, Jun 09, 2023 at 03:48:03PM -0700, Josh Poimboeuf wrote:
> On Fri, Jun 09, 2023 at 03:24:38PM -0700, Omar Sandoval wrote:
> > > Though, instead of
> > > using an incrementing version, Peter had the idea to hash the struct,
> > > like:
> > >
> > > awk '/^struct orc_entry {$/ { p=1 } p { print } /^}/ { p=0 }' arch/x86/include/asm/orc_types.h | sha1sum
> > >
> > > That way we don't have to remember to bump the version number, and it
> > > would be more resilient to partial backports in distros.
> > >
> > > Would something like that work for you?
> >
> > Any sort of unique identifier works for me. One thing that the proposed
> > hash wouldn't catch is if ORC_REG_* or ORC_TYPE_* are ever renumbered
> > (i.e., the meanings of existing values change). It also wouldn't catch
> > if something about the .orc_unwind_ip section changed. But assuming
> > changes like that would be much rarer, it could be handled manually by
> > bumping a "salt" for the hash. E.g., by adding 'BEGIN { print <SALT> }'
> > to the awk script:
> >
> > awk 'BEGIN { print 1 } /^struct orc_entry {$/ { p=1 } p { print } /^}/ { p=0 }' arch/x86/include/asm/orc_types.h | sha1sum
> >
> > I'll defer to you guys whether it's easier to remember to bump a version
> > everytime or only in those rare cases.
>
> I think I'd prefer only bumping it in the rare cases, because we're
> going to end up forgetting either way ;-)
>
> To catch REG/TYPE changes, we could forego awk and just hash the whole
> file, the only downside being that it might introduce unnecessary
> changes if we change a comment or something. But the file changes
> rarely enough.
>
> Or we could tweak the awk to also print ORC_{REG,TYPE}_* pretty easily.
Printing ORC_{REG,TYPE}_* should cover almost everything, and if there's
anything not covered, it can be handled manually, so that sounds good to
me. I'll draft a patch doing that.
Thanks,
Omar