LinuxLists.cc - [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

2022-09-09 18:14:35

Subject: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

Hi,

Here's a preview of what I'm planning to discuss at the LPC toolchains
microconference. Feel free to start the discussion early :-)

This is a proposal for some new minor GCC/Clang features which would
help objtool greatly.

Background
----------

Objtool is a kernel-specific tool which reverse engineers the control
flow graph (CFG) of compiled objects. It then performs various
validations, annotations, and modifications, mostly with the goal of
improving robustness and security of the kernel.

Objtool features which use the CFG include:
validation/generation of unwinding metadata; validation of Intel SMAP
rules; and validation of kernel "noinstr" rules (preventing compiler
instrumentation in certain critical sections).

In general it's not feasible for the traditional toolchain to do any of
this work, because the kernel has a lot of "blind spots" which the
toolchain doesn't have visibility to, notably asm and inline asm.
Manual .cfi annotations are very difficult to maintain and even more
difficult to ensure correctness. Also, due to kernel live patching, the
kernel relies on 100% correctness of unwinding metadata, whereas the
toolchain treats it as a best effort.

Challenges
----------

Reverse engineering the control flow graph is mostly quite
straightforward, with two notable exceptions:

1) Jump tables (e.g., switch statements):

Depending on the architecture, it's somewhere between difficult and
impossible to reliably identify which indirect jumps correspond to
jump tables, and what are their corresponding intra-function jump
destinations.

2) Noreturn functions:

There's no reliable way to determine which functions are designated
by the compiler to be noreturn (either explicitly via function
attribute, or implicitly via a static function which is a wrapper
around a noreturn function.) This information is needed because the
code after the call to such a function is optimized out as
unreachable and objtool has no way of knowing that.

Proposal
--------

Add the following new compiler flags which create non-allocatable ELF
sections which "annotate" control flow:

(Note this is purely hypothetical, intended for starting a discussion.
I'm not a compiler person and I haven't written any compiler code.)

1) -fannotate-jump-table

Create an .annotate.jump_table section which is an array of the
following variable-length structure:

    struct annotate_jump_table {
        void *indirect_jmp;
        long num_targets;
        void *targets[];
    };

For example, given the following switch statement code:

    .Lswitch_jmp:
        // %rax is .Lcase_1 or .Lcase_2
        jmp *%rax

    .Lcase_1:
        ...
    .Lcase_2:
        ...

Add the following code:

    .pushsection .annotate.jump_table
        // indirect JMP address
        .quad .Lswitch_jmp

        // num jump targets
        .quad 2

        // indirect JMP target addresses
        .quad .Lcase_1
        .quad .Lcase_2
    .popsection
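
As a rough illustration of how a consumer such as objtool might walk this variable-length layout, here is a minimal C sketch. It assumes the section contents have already been read into memory as 64-bit words with relocations applied; the names walk_jump_tables and jt_visit_fn are hypothetical and appear nowhere in the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback invoked once per annotated indirect jump. */
typedef void (*jt_visit_fn)(uint64_t jmp_addr, const uint64_t *targets,
                            uint64_t num_targets, void *ctx);

/*
 * Walk the raw contents of .annotate.jump_table, viewed as 64-bit words:
 *   [indirect jmp address][num_targets][target 0] ... [target n-1]
 * Returns the number of records visited, or -1 if the data is truncated.
 */
static long walk_jump_tables(const uint64_t *words, size_t num_words,
                             jt_visit_fn visit, void *ctx)
{
    size_t pos = 0;
    long records = 0;

    while (pos < num_words) {
        /* Each record needs at least its two fixed words. */
        if (num_words - pos < 2)
            return -1;

        uint64_t jmp_addr = words[pos];
        uint64_t num_targets = words[pos + 1];
        pos += 2;

        /* The target array must fit in what is left of the section. */
        if (num_targets > num_words - pos)
            return -1;

        if (visit)
            visit(jmp_addr, &words[pos], num_targets, ctx);
        pos += num_targets;
        records++;
    }
    return records;
}
```

Because the records carry their own length, the only way to find record N is to walk records 0..N-1, which is why a truncated or corrupted count has to be detected explicitly.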

2) -fannotate-noreturn

Create an .annotate.noreturn section which is an array of pointers to
noreturn functions (both explicit/implicit and defined/undefined).

For example, given the following three noreturn functions:

    // explicit noreturn:
    __attribute__((__noreturn__)) void func1(void)
    {
        exit(1);
    }

    // explicit noreturn (extern):
    extern __attribute__((__noreturn__)) void func2(void);

    // implicit noreturn:
    static void func3(void)
    {
        // call noreturn function
        func2();
    }

Add the following code:

    .pushsection .annotate.noreturn
        .quad func1
        .quad func2
        .quad func3
    .popsection
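
A consumer of this flat pointer array could be as simple as the following C sketch, which assumes the entries have already been resolved to addresses. The helper names noreturn_table_sort and is_noreturn are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparison of two 64-bit addresses for qsort()/bsearch(). */
static int cmp_addr(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;

    return (x > y) - (x < y);
}

/* Sort the resolved function addresses once so lookups can use bsearch(). */
static void noreturn_table_sort(uint64_t *funcs, size_t n)
{
    qsort(funcs, n, sizeof(*funcs), cmp_addr);
}

/* True if call_target is one of the annotated noreturn functions. */
static bool is_noreturn(const uint64_t *sorted_funcs, size_t n,
                        uint64_t call_target)
{
    return bsearch(&call_target, sorted_funcs, n, sizeof(*sorted_funcs),
                   cmp_addr) != NULL;
}
```

When the CFG walker reaches a direct call, it would ask is_noreturn() about the call target and, if the answer is yes, stop following the fall-through path after the call.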

Alternatives
------------

Another idea which has been floated in the past is for objtool to read
DWARF (or .eh_frame) to help it figure out the control flow. That
hasn't been tried yet, but would be considerably more difficult and
fragile IMO.

--
Josh

2022-09-11 16:13:11

by Peter Zijlstra

Subject: Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:
> Alternatives
> ------------
>
> Another idea which has been floated in the past is for objtool to read
> DWARF (or .eh_frame) to help it figure out the control flow. That
> hasn't been tried yet, but would be considerably more difficult and
> fragile IMO.

I thought Ard played around with that a bit on ARM64. And yes, given that
most toolchains consider DWARF itself best-effort, I'm not holding my
breath there.

On top of that, building a kernel with DWARF enabled is just so much
slower.

2022-09-11 16:14:05

by Ard Biesheuvel

Subject: Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

On Sun, 11 Sept 2022 at 16:26, Peter Zijlstra wrote:
>
> On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:
> > Alternatives
> > ------------
> >
> > Another idea which has been floated in the past is for objtool to read
> > DWARF (or .eh_frame) to help it figure out the control flow. That
> > hasn't been tried yet, but would be considerably more difficult and
> > fragile IMO.
>
> I thought Ard played around with that a bit on ARM64. And yes, given that
> most toolchains consider DWARF itself best-effort, I'm not holding my
> breath there.
>

I have patches out that use unwind data to locate pointer auth
sign/authenticate instructions in the code, in order to patch them to
shadow call stack pushes and pops at runtime if pointer authentication
is not supported by the hardware. This has little to do with objtool
or reliable stack traces.

I still think DWARF could help to make objtool's job a bit easier, but
I don't think it will be of any use with jump tables or noreturn
functions in particular.

2022-09-12 11:52:03

by Borislav Petkov

Subject: Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

+ matz.

Micha, any opinions on the below are appreciated.

Thx.

On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:
> Hi,
>
> Here's a preview of what I'm planning to discuss at the LPC toolchains
> microconference. Feel free to start the discussion early :-)
>
> This is a proposal for some new minor GCC/Clang features which would
> help objtool greatly.
>
>
> Background
> ----------
>
> Objtool is a kernel-specific tool which reverse engineers the control
> flow graph (CFG) of compiled objects. It then performs various
> validations, annotations, and modifications, mostly with the goal of
> improving robustness and security of the kernel.
>
> Objtool features which use the CFG include:
> validation/generation of unwinding metadata; validation of Intel SMAP
> rules; and validation of kernel "noinstr" rules (preventing compiler
> instrumentation in certain critical sections).
>
> In general it's not feasible for the traditional toolchain to do any of
> this work, because the kernel has a lot of "blind spots" which the
> toolchain doesn't have visibility to, notably asm and inline asm.
> Manual .cfi annotations are very difficult to maintain and even more
> difficult to ensure correctness. Also, due to kernel live patching, the
> kernel relies on 100% correctness of unwinding metadata, whereas the
> toolchain treats it as a best effort.
>
>
> Challenges
> ----------
>
> Reverse engineering the control flow graph is mostly quite
> straightforward, with two notable exceptions:
>
> 1) Jump tables (e.g., switch statements):
>
> Depending on the architecture, it's somewhere between difficult and
> impossible to reliably identify which indirect jumps correspond to
> jump tables, and what are their corresponding intra-function jump
> destinations.
>
> 2) Noreturn functions:
>
> There's no reliable way to determine which functions are designated
> by the compiler to be noreturn (either explicitly via function
> attribute, or implicitly via a static function which is a wrapper
> around a noreturn function.) This information is needed because the
> code after the call to such a function is optimized out as
> unreachable and objtool has no way of knowing that.
>
>
> Proposal
> --------
>
> Add the following new compiler flags which create non-allocatable ELF
> sections which "annotate" control flow:
>
> (Note this is purely hypothetical, intended for starting a discussion.
> I'm not a compiler person and I haven't written any compiler code.)
>
>
> 1) -fannotate-jump-table
>
> Create an .annotate.jump_table section which is an array of the
> following variable-length structure:
>
> struct annotate_jump_table {
> void *indirect_jmp;
> long num_targets;
> void *targets[];
> };
>
>
> For example, given the following switch statement code:
>
> .Lswitch_jmp:
> // %rax is .Lcase_1 or .Lcase_2
> jmp %rax
>
> .Lcase_1:
> ...
> .Lcase_2:
> ...
>
>
> Add the following code:
>
> .pushsection .annotate.jump_table
> // indirect JMP address
> .quad .Lswitch_jmp
>
> // num jump targets
> .quad 2
>
> // indirect JMP target addresses
> .quad .Lcase_1
> .quad .Lcase_2
> .popsection
>
>
> 2) -fannotate-noreturn
>
> Create an .annotate.noreturn section which is an array of pointers to
> noreturn functions (both explicit/implicit and defined/undefined).
>
>
> For example, given the following three noreturn functions:
>
> // explicit noreturn:
> __attribute__((__noreturn__)) void func1(void)
> {
> exit(1);
> }
>
> // explicit noreturn (extern):
> extern __attribute__((__noreturn__)) void func2(void);
>
> // implicit noreturn:
> static void func3(void)
> {
> // call noreturn function
> func2();
> }
>
>
> Add the following code:
>
> .pushsection .annotate.noreturn
> .quad func1
> .quad func2
> .quad func3
> .popsection
>
>
> Alternatives
> ------------
>
> Another idea which has been floated in the past is for objtool to read
> DWARF (or .eh_frame) to help it figure out the control flow. That
> hasn't been tried yet, but would be considerably more difficult and
> fragile IMO.
>
>
> --
> Josh

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-09-12 12:02:09

by Segher Boessenkool

Subject: Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

Hi!

On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:
> 2) Noreturn functions:
>
> There's no reliable way to determine which functions are designated
> by the compiler to be noreturn (either explicitly via function
> attribute, or implicitly via a static function which is a wrapper
> around a noreturn function.)

Or just a function that does not return for any other reason.

The compiler makes no distinction between functions that have the
attribute and functions that do not. There are good reasons to not
have the attribute on functions that do in fact not return. The
not-returningness of the function may be just an implementation
accident, something you do not want part of the API, so it *should* not
have that attribute; or you may want the callers to a function to not be
optimised according to this knowledge (you cannot *prevent* that, the
compiler can figure it out in other ways, but still) for any other
reason.

> This information is needed because the
> code after the call to such a function is optimized out as
> unreachable and objtool has no way of knowing that.

Since June we (GCC) have -funreachable-traps. This creates a trap insn
wherever control flow would otherwise go into limbo.

Segher

2022-09-12 15:02:15

by Michael Matz

Subject: Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

Hey,

On Mon, 12 Sep 2022, Borislav Petkov wrote:

> Micha, any opinions on the below are appreciated.
>
> On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:

> > difficult to ensure correctness. Also, due to kernel live patching, the
> > kernel relies on 100% correctness of unwinding metadata, whereas the
> > toolchain treats it as a best effort.

Unwinding certainly is not best effort. It's 100% reliable as far as the
source language or compilation options require. But as it doesn't
touch the discussed features I won't belabor that point.

I will mention that objtool's existence is based on mistrust, of persons
(not correctly annotating stuff) and of tools (not correctly heeding those
annotations). The mistrust in persons is understandable and can be dealt
with by tools, but the mistrust in tools can't be fixed by making tools
more complicated by emitting even more information; there's no good reason
to assume that one piece of info can be trusted more than other pieces.
So, if you mistrust the tools you have already lost. That's somewhat
philosophical, so I won't beat that horse much more either.

Now, recovering the CFG. I'll switch order of your two items:

2) noreturn function

> > .pushsection .annotate.noreturn
> > .quad func1
> > .quad func2
> > .quad func3
> > .popsection

This won't work for indirect calls to noreturn functions:

    void (* __attribute__((noreturn)) noretptr)(void);
    int callnoret (int i)
    {
        noretptr();
        return i + 32;
    }

The return statement is unreachable (and removed by GCC). To know that
you would have to mark the call statements, not the individual functions.
All schemes that mark functions to indicate some meaningful difference in
the calling sequence (e.g. the ABI of functions) have the same problem:
that property is part of the call expression's type, not of the individual
decls.

Second problem: it's not extensible. Today it's noreturn functions you
want to know, and tomorrow? So, add a flag word per entry, define bit 0
for now to be NORETURN, and see what comes. Add a header with a version
(and/or identifier) as well and it's properly extensible. For easy
linking and identifying the blobs in the linked result include a length in
the header. If this were in an allocated section it would be a good idea
to refer to the symbols in a PC-relative manner, so as to not result in
runtime relocations. In this case, as it's within a non-alloc section
that doesn't matter. So:

    .section .annotate.functions
    .long 1            # version
    .long 0xcafe       # ident
    .long 2f-1f        # length
  1:
    .quad func1, 1     # noreturn
    .quad func2, 1     # noreturn
    .quad func3, 32    # something_else_but_not_noreturn
    ...
  2:
    .long 1b-2b        # align and "checksum"

It might be that the length-and-header scheme is cumbersome if you need to
write those section commands by hand, in which case another scheme might
be preferable, but it should somehow be self-delimiting.
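
To make the self-delimiting idea concrete, here is a hedged C sketch of how a reader might validate one such blob in a linked section. The field widths follow the example above (three 32-bit header words, 16-byte entries, a 32-bit trailer holding the negated length); host byte order, the constant names, and parse_blob itself are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ANNOTATE_VERSION 1u
#define ANNOTATE_IDENT   0xcafeu   /* ident value from the example above */
#define FLAG_NORETURN    (1u << 0) /* bit 0, as suggested above */

/* One payload entry: ".quad func, flags". */
struct func_entry {
    uint64_t addr;
    uint64_t flags;
};

static uint32_t read_u32(const unsigned char *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof(v)); /* assumes the blob is in host byte order */
    return v;
}

/*
 * Validate one blob at buf. Returns its total size in bytes (header,
 * entries and trailer) so the caller can step to the next blob, or 0 if
 * the blob is malformed. On success *entries_off and *num_entries
 * describe the payload.
 */
static size_t parse_blob(const unsigned char *buf, size_t size,
                         size_t *entries_off, size_t *num_entries)
{
    const size_t hdr = 12, trailer = 4;

    if (size < hdr + trailer)
        return 0;
    if (read_u32(buf) != ANNOTATE_VERSION ||
        read_u32(buf + 4) != ANNOTATE_IDENT)
        return 0;

    uint32_t len = read_u32(buf + 8);
    if (len % sizeof(struct func_entry) != 0 || len > size - hdr - trailer)
        return 0;

    /* The trailer holds 1b-2b, i.e. the negated payload length. */
    if (read_u32(buf + hdr + len) != (uint32_t)(0u - len))
        return 0;

    *entries_off = hdr;
    *num_entries = len / sizeof(struct func_entry);
    return hdr + len + trailer;
}
```

After linking, the section is a concatenation of such blobs from many object files, so the returned size is what lets a reader hop from one blob to the next without knowing where the object boundaries were.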

For the above problem of indirect calls to noreturns, instead do:

    .text
  noretcalllabel:
    call noreturn
  othercall:
    call really_special_thing
    .section .annotate.noretcalls
    .quad noretcalllabel, 1    # noreturn call
    .quad othercall, 32        # call to some special(-ABI?) function

Same thoughts re extensibility and self-delimitation apply.
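
A consumer of the call-site variant might look roughly like this C sketch. The struct layout mirrors the two `.quad` values per entry shown above; the names call_annotation, CALL_NORETURN and call_falls_through are hypothetical, and treating unannotated calls as returning is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CALL_NORETURN (1ull << 0) /* bit 0 = noreturn, as in the example */

/* One entry of the hypothetical .annotate.noretcalls section. */
struct call_annotation {
    uint64_t call_addr; /* address of the call instruction */
    uint64_t flags;
};

/*
 * Decide whether the CFG has a fall-through edge after the call at
 * call_addr. Calls without an annotation are assumed to return.
 */
static bool call_falls_through(const struct call_annotation *tab, size_t n,
                               uint64_t call_addr)
{
    for (size_t i = 0; i < n; i++) {
        if (tab[i].call_addr == call_addr)
            return !(tab[i].flags & CALL_NORETURN);
    }
    return true;
}
```

Since the key is the address of the call instruction rather than the callee, the same lookup covers direct and indirect calls alike, which is the point of annotating call sites.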

1) jump tables

> > Create an .annotate.jump_table section which is an array of the
> > following variable-length structure:
> >
> > struct annotate_jump_table {
> > void *indirect_jmp;
> > long num_targets;
> > void *targets[];
> > };

It's very often the case that the compiler already emits what your
.targets[] member would encode, just at some unknown place, length and
encoding. So you would save space if you instead only remember the
encoding and places of those jump tables:

    struct {
        void *indirect_jump;
        long num_tables;
        struct {
            unsigned num_entries;
            unsigned encoding;
            void *start_of_table;
        } tables[];
    };

The usual encodings are: direct, PC-relative, relative-to-start-of-table.
Usually for a specific jump instruction there's only one table, so
optimizing for that makes sense. For strange unthought-of cases it's
probably a good idea to have your initial scheme as fallback, which could
be indicated by a special .encoding value.

> > For example, given the following switch statement code:
> >
> > .Lswitch_jmp:
> > // %rax is .Lcase_1 or .Lcase_2
> > jmp %rax

So, usually %rax would point into a table (somewhere in .rodata/.text)
that looks like so:

    .Ljump_table:
        .quad .Lcase_1 - .Ljump_table
        .quad .Lcase_2 - .Ljump_table

(for position-independent code)

and hence you would emit this as annotation:

    .quad .Lswitch_jmp
    .quad 1                    # only a single table
    .long 2                    # with two entries
    .long RELATIVE_TO_START    # all entries are X - start_of_table
    .quad .Ljump_table

In this case you won't save anything of course, but as soon as there's a
meaningful number of cases you will.

Again, if that info would be put into an allocated section you would want
to use relative encodings of the addresses to avoid runtime relocs. And
the remarks about self-delimitation and extensibility also apply here.
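
As a sketch of what the encoding field would mean for a consumer, the following C function turns a raw table entry back into a jump target. The enum values are invented here, 64-bit entries are assumed as in the `.quad` example, and "PC-relative" is interpreted as relative to the entry's own address, which is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding values; the text above only names the kinds. */
enum jt_encoding {
    JT_DIRECT,            /* entry is the absolute target address */
    JT_ENTRY_RELATIVE,    /* entry is target minus the entry's own address */
    JT_RELATIVE_TO_START, /* entry is target minus start_of_table */
};

/* Recover the jump target stored in entries[index] of a table at table_addr. */
static uint64_t jt_decode(enum jt_encoding enc, uint64_t table_addr,
                          const int64_t *entries, size_t index)
{
    /* Unsigned arithmetic lets negative offsets wrap to the right address. */
    uint64_t raw = (uint64_t)entries[index];

    switch (enc) {
    case JT_DIRECT:
        return raw;
    case JT_ENTRY_RELATIVE:
        return table_addr + index * sizeof(entries[0]) + raw;
    case JT_RELATIVE_TO_START:
        return table_addr + raw;
    }
    return 0; /* unknown encoding */
}
```

With the table from the example at a made-up address of 0x5000 and .Lcase_2 at 0x4020, the stored entry is 0x4020 - 0x5000 and JT_RELATIVE_TO_START decoding adds the table address back to recover 0x4020.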

> > Alternatives
> > ------------
> >
> > Another idea which has been floated in the past is for objtool to read
> > DWARF (or .eh_frame) to help it figure out the control flow. That
> > hasn't been tried yet, but would be considerably more difficult and
> > fragile IMO.

While noreturn functions are marked in the debug info, noreturn
function types currently aren't quite correct. And jump-tables aren't
marked at all, so that would lose.

Ciao,
Michael.

2022-09-13 23:15:10

by Indu Bhagat

Subject: Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

Hi Josh,

On 9/9/22 11:07, Josh Poimboeuf wrote:
> Hi,
>
> Here's a preview of what I'm planning to discuss at the LPC toolchains
> microconference. Feel free to start the discussion early :-)
>
> This is a proposal for some new minor GCC/Clang features which would
> help objtool greatly.
>
>
> Background
> ----------
>
> Objtool is a kernel-specific tool which reverse engineers the control
> flow graph (CFG) of compiled objects. It then performs various
> validations, annotations, and modifications, mostly with the goal of
> improving robustness and security of the kernel.
>
> Objtool features which use the CFG include:
> validation/generation of unwinding metadata; validation of Intel SMAP
> rules; and validation of kernel "noinstr" rules (preventing compiler
> instrumentation in certain critical sections).
>
> In general it's not feasible for the traditional toolchain to do any of
> this work, because the kernel has a lot of "blind spots" which the
> toolchain doesn't have visibility to, notably asm and inline asm.
> Manual .cfi annotations are very difficult to maintain and even more
> difficult to ensure correctness. Also, due to kernel live patching, the
> kernel relies on 100% correctness of unwinding metadata, whereas the
> toolchain treats it as a best effort.
>
>
> Challenges
> ----------
>
> Reverse engineering the control flow graph is mostly quite
> straightforward, with two notable exceptions:
>
> 1) Jump tables (e.g., switch statements):
>
> Depending on the architecture, it's somewhere between difficult and
> impossible to reliably identify which indirect jumps correspond to
> jump tables, and what are their corresponding intra-function jump
> destinations.
>
> 2) Noreturn functions:
>
> There's no reliable way to determine which functions are designated
> by the compiler to be noreturn (either explicitly via function
> attribute, or implicitly via a static function which is a wrapper
> around a noreturn function.) This information is needed because the
> code after the call to such a function is optimized out as
> unreachable and objtool has no way of knowing that.
>
>

Curious to know which objtool features rely on reverse engineering the
control flow graph. Is it a larger set, or is it only ORC generation?

> Proposal
> --------
>
> Add the following new compiler flags which create non-allocatable ELF
> sections which "annotate" control flow:
>
> (Note this is purely hypothetical, intended for starting a discussion.
> I'm not a compiler person and I haven't written any compiler code.)
>
>
> 1) -fannotate-jump-table
>
> Create an .annotate.jump_table section which is an array of the
> following variable-length structure:
>
> struct annotate_jump_table {
> void *indirect_jmp;
> long num_targets;
> void *targets[];
> };
>
>
> For example, given the following switch statement code:
>
> .Lswitch_jmp:
> // %rax is .Lcase_1 or .Lcase_2
> jmp %rax
>
> .Lcase_1:
> ...
> .Lcase_2:
> ...
>
>
> Add the following code:
>
> .pushsection .annotate.jump_table
> // indirect JMP address
> .quad .Lswitch_jmp
>
> // num jump targets
> .quad 2
>
> // indirect JMP target addresses
> .quad .Lcase_1
> .quad .Lcase_2
> .popsection
>
>
> 2) -fannotate-noreturn
>
> Create an .annotate.noreturn section which is an array of pointers to
> noreturn functions (both explicit/implicit and defined/undefined).
>
>
> For example, given the following three noreturn functions:
>
> // explicit noreturn:
> __attribute__((__noreturn__)) void func1(void)
> {
> exit(1);
> }
>
> // explicit noreturn (extern):
> extern __attribute__((__noreturn__)) void func2(void);
>
> // implicit noreturn:
> static void func3(void)
> {
> // call noreturn function
> func2();
> }
>
>
> Add the following code:
>
> .pushsection .annotate.noreturn
> .quad func1
> .quad func2
> .quad func3
> .popsection
>
>
> Alternatives
> ------------
>
> Another idea which has been floated in the past is for objtool to read
> DWARF (or .eh_frame) to help it figure out the control flow. That
> hasn't been tried yet, but would be considerably more difficult and
> fragile IMO.
>

2022-09-14 00:42:02

by Josh Poimboeuf

Subject: Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

On Mon, Sep 12, 2022 at 02:17:36PM +0000, Michael Matz wrote:
> Hey,

Hi Michael,

Thanks for looking at this.

> On Mon, 12 Sep 2022, Borislav Petkov wrote:
>
> > Micha, any opinions on the below are appreciated.
> >
> > On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:
>
> > > difficult to ensure correctness. Also, due to kernel live patching, the
> > > kernel relies on 100% correctness of unwinding metadata, whereas the
> > > toolchain treats it as a best effort.
>
> Unwinding certainly is not best effort. It's 100% reliable as far as the
> source language or compilation options require. But as it doesn't
> touch the discussed features I won't belabor that point.

Ok, maybe I had the wrong impression about the reliability of DWARF.

> I will mention that objtool's existence is based on mistrust, of persons
> (not correctly annotating stuff) and of tools (not correctly heeding those
> annotations). The mistrust in persons is understandable and can be dealt
> with by tools, but the mistrust in tools can't be fixed by making tools
> more complicated by emitting even more information; there's no good reason
> to assume that one piece of info can be trusted more than other pieces.
> So, if you mistrust the tools you have already lost. That's somewhat
> philosophical, so I won't beat that horse much more either.

Maybe this is semantics, but I wouldn't characterize objtool's existence
as being based on the mistrust of tools. Its main motivation is to
fill in the toolchain's blind spots in asm and inline-asm, which exist
by design.

(Objtool has actually found many compiler bugs, but that's a side
benefit and not its reason for existence.)

I understand the concern about trusting one piece of info more than
others, but we have to trust the toolchain. Also, objtool does a lot of
consistency checks, and experience shows that if there's a bug in the
existing jump table or noreturn detection logic, it almost always
quickly surfaces as an objtool warning: unreachable instruction, stack
state mismatch, falling through the end of a function, etc.

> Now, recovering the CFG. I'll switch order of your two items:
>
> 2) noreturn function
>
> > > .pushsection .annotate.noreturn
> > > .quad func1
> > > .quad func2
> > > .quad func3
> > > .popsection
>
> This won't work for indirect calls to noreturn functions:
>
> void (* __attribute__((noreturn)) noretptr)(void);
> int callnoret (int i)
> {
> noretptr();
> return i + 32;
> }
>
> The return statement is unreachable (and removed by GCC). To know that
> you would have to mark the call statements, not the individual functions.
> All schemes that mark functions that somehow indicates a meaningful
> difference in the calling sequence (e.g. the ABI of functions) have the
> same problem: it's part of the call expressions type, not of individual
> decls.
>
> Second problem: it's not extensible. Today it's noreturn functions you
> want to know, and tomorrow? So, add a flag word per entry, define bit 0
> for now to be NORETURN, and see what comes. Add a header with a version
> (and/or identifier) as well and it's properly extensible. For easy
> linking and identifying the blobs in the linked result include a length in
> the header. If this were in an allocated section it would be a good idea
> to refer to the symbols in a PC-relative manner, so as to not result in
> runtime relocations. In this case, as it's within a non-alloc section
> that doesn't matter. So:
>
> .section .annotate.functions
> .long 1 # version
> .long 0xcafe # ident
> .long 2f-1f # length
> 1:
> .quad func1, 1 # noreturn
> .quad func2, 1 # noreturn
> .quad func3, 32 # something_else_but_not_noreturn
> ...
> 2:
> .long 1b-2b # align and "checksum"
>
> It might be that the length-and-header scheme is cumbersome if you need to
> write those section commands by hand, in which case another scheme might
> be preferable, but it should somehow be self-delimiting.
>
> For the above problem of indirect calls to noreturns, instead do:
>
> .text
> noretcalllabel:
> call noreturn
> othercall:
> call really_special_thing
> .section .annotate.noretcalls
> .quad noretcalllabel, 1 # noreturn call
> .quad othercall, 32 # call to some special(-ABI?) function
>
> Same thoughts re extensibility and self-delimitation apply.

Hm, I didn't know noreturn function pointers were a thing. Annotating
the call site instead of the function would be fine.

I'm thinking PC-relative relocs are a good idea regardless, it makes the
binary smaller even if the section isn't allocatable.
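
For illustration, a PC-relative entry could be a 32-bit offset from the field's own address instead of a 64-bit absolute pointer, which halves the size of each address. A minimal C sketch of encoding and resolving such a field follows; the helper names are hypothetical and the 32-bit width is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Encode target as a signed 32-bit offset from the address of the field
 * that stores it. Fails if the target is more than +/-2 GiB away.
 */
static bool encode_rel32(uint64_t field_addr, uint64_t target, int32_t *out)
{
    int64_t delta = (int64_t)(target - field_addr);

    if (delta < INT32_MIN || delta > INT32_MAX)
        return false;
    *out = (int32_t)delta;
    return true;
}

/* Recover the absolute address from a self-relative 32-bit field. */
static uint64_t resolve_rel32(uint64_t field_addr, int32_t rel)
{
    return field_addr + (uint64_t)(int64_t)rel;
}
```

The trade-off is the range check: an absolute `.quad` can reach anywhere, whereas the 32-bit form only works while annotated code stays within 2 GiB of the annotation itself.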

As far as extending goes, I had been thinking future annotation types
would just go in new sections, e.g. .annotate.retpolinecalls, each
section with its own format. And that has the benefit of being a
simpler and easier to parse format (no headers, versions, lengths, etc).
But either way is fine I think.

>
> 1) jump tables
>
> > > Create an .annotate.jump_table section which is an array of the
> > > following variable-length structure:
> > >
> > > struct annotate_jump_table {
> > > void *indirect_jmp;
> > > long num_targets;
> > > void *targets[];
> > > };
>
> It's very often the case that the compiler already emits what your
> .targets[] member would encode, just at some unknown place, length and
> encoding. So you would save space if you instead only remember the
> encoding and places of those jump tables:
>
> struct {
> void *indirect_jump;
> long num_tables;
> struct {
> unsigned num_entries;
> unsigned encoding;
> void *start_of_table;
> } tables[];
> };
>
> The usual encodings are: direct, PC-relative, relative-to-start-of-table.
> Usually for a specific jump instruction there's only one table, so
> optimizing for that makes sense. For strange unthought-of cases it's
> probably a good idea to have your initial scheme as fallback, which could
> be indicated by a special .encoding value.
>
> > > For example, given the following switch statement code:
> > >
> > > .Lswitch_jmp:
> > > // %rax is .Lcase_1 or .Lcase_2
> > > jmp %rax
>
> So, usually %rax would point into a table (somewhere in .rodata/.text)
> that looks like so:
>
> .Ljump_table:
> .quad .Lcase_1 - .Ljump_table
> .quad .Lcase_2 - .Ljump_table
>
> (for position-independent code)
>
> and hence you would emit this as annotation:
>
> .quad .Lswitch_jmp
> .quad 1 # only a single table
> .long 2 # with two entries
> .long RELATIVE_TO_START # all entries are X - start_of_table
> .quad .Ljump_table
>
> In this case you won't save anything of course, but as soon as there's a
> meaningful number of cases you will.

As a user of the data, I would prefer a simpler format (something like
my original scheme) which uses more space, rather than needing headers,
fallback scheme, encodings, blob lengths, etc just to save some
non-allocatable bytes. But the above seems fine.

--
Josh

2022-09-14 00:47:03

by Josh Poimboeuf

Subject: Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

On Tue, Sep 13, 2022 at 03:51:44PM -0700, Indu Bhagat wrote:
> Curious to know which objtool features rely on reverse engineering the
> control flow graph. Is it a larger set, or is it only ORC generation?

Objtool features which rely on the CFG:

- Frame pointer rule validation (when using
CONFIG_UNWINDER_FRAME_POINTER)

- ORC metadata generation

- Intel SMAP rule validation - ensures EFLAGS #AC is only set during
usercopy

- "noinstr" rule validation - ensures no instrumentation/tracing
functions are called in certain critical sections

--
Josh

2022-09-14 10:24:13

by Josh Poimboeuf

Subject: Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

On Mon, Sep 12, 2022 at 06:31:14AM -0500, Segher Boessenkool wrote:
> Hi!
>
> On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:
> > 2) Noreturn functions:
> >
> > There's no reliable way to determine which functions are designated
> > by the compiler to be noreturn (either explicitly via function
> > attribute, or implicitly via a static function which is a wrapper
> > around a noreturn function.)
>
> Or just a function that does not return for any other reason.
>
> The compiler makes no difference between functions that have the
> attribute and functions that do not. There are good reasons to not
> have the attribute on functions that do in fact not return. The
> not-returningness of the function may be just an implementation
> accident, something you do not want part of the API, so it *should* not
> have that attribute; or you may want the callers to a function to not be
> optimised according to this knowledge (you cannot *prevent* that, the
> compiler can figure it out in other ways, but still) for any other
> reason.

Yes, many static functions that are wrappers around noreturn functions
have this "implicit noreturn" property. I agree we would need to know
about those functions (or, as Michael suggested, their call sites) as
well.

> > This information is needed because the
> > code after the call to such a function is optimized out as
> > unreachable and objtool has no way of knowing that.
>
> Since June we (GCC) have -funreachable-traps. This creates a trap insn
> wherever control flow would otherwise go into limbo.

Ah, that's interesting, though I'm not sure if we'd be able to
distinguish between "call doesn't return" traps and other traps or
reasons for UD2.

--
Josh

2022-09-14 12:36:17

by Segher Boessenkool

Subject: Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

On Wed, Sep 14, 2022 at 11:21:00AM +0100, Josh Poimboeuf wrote:
> On Mon, Sep 12, 2022 at 06:31:14AM -0500, Segher Boessenkool wrote:
> > On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:
> > > 2) Noreturn functions:
> > >
> > > There's no reliable way to determine which functions are designated
> > > by the compiler to be noreturn (either explicitly via function
> > > attribute, or implicitly via a static function which is a wrapper
> > > around a noreturn function.)
> >
> > Or just a function that does not return for any other reason.
> >
> > The compiler makes no difference between functions that have the
> > attribute and functions that do not. There are good reasons to not
> > have the attribute on functions that do in fact not return. The
> > not-returningness of the function may be just an implementation
> > accident, something you do not want part of the API, so it *should* not
> > have that attribute; or you may want the callers to a function to not be
> > optimised according to this knowledge (you cannot *prevent* that, the
> > compiler can figure it out in other ways, but still) for any other
> > reason.
>
> Yes, many static functions that are wrappers around noreturn functions
> have this "implicit noreturn" property.

I meant functions that are noreturn intrinsically. The trivial example:

void f(void)
{
	for (;;)
		;
}

> I agree we would need to know
> about those functions (or, as Michael suggested, their call sites) as
> well.

Many "potentially does not return" functions (there are very many such
functions!) turn into "never returns" functions, for some inputs (or
something in the environment). If the compiler specialises a code path
that does not return, you'll not see that marked up any way. Of course
such a path should not be taken in the kernel, normally :-)

> > > This information is needed because the
> > > code after the call to such a function is optimized out as
> > > unreachable and objtool has no way of knowing that.
> >
> > Since June we (GCC) have -funreachable-traps. This creates a trap insn
> > wherever control flow would otherwise go into limbo.
>
> Ah, that's interesting, though I'm not sure if we'd be able to
> distinguish between "call doesn't return" traps and other traps or
> reasons for UD2.

The trap handler can see where the trap came from. And then look up
that address in some tables or such. Just like __bug_table?
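
A minimal sketch of that kind of lookup, assuming a table of trap sites sorted by address. The names (struct trap_entry, classify_trap, the reason codes) are hypothetical and only illustrate the idea; this is not the layout of __bug_table or of anything GCC emits today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reason codes, not an existing kernel or GCC interface. */
enum trap_reason { TRAP_UNKNOWN = 0, TRAP_NORETURN_CALL, TRAP_BUG };

struct trap_entry {
	uint64_t addr;           /* address of the trap instruction */
	enum trap_reason reason; /* why the trap was placed there */
};

static int cmp_trap(const void *key, const void *elem)
{
	uint64_t addr = *(const uint64_t *)key;
	const struct trap_entry *e = elem;

	if (addr < e->addr)
		return -1;
	return addr > e->addr;
}

/*
 * Look up a trapping address in a table sorted by address.
 * Addresses that are not listed are reported as TRAP_UNKNOWN.
 */
enum trap_reason classify_trap(const struct trap_entry *table, size_t n,
			       uint64_t trap_addr)
{
	const struct trap_entry *e;

	assert(table != NULL);
	e = bsearch(&trap_addr, table, n, sizeof(*table), cmp_trap);
	return e ? e->reason : TRAP_UNKNOWN;
}
```

The same lookup would work for a build-time tool reading the table out of the object file, which is closer to what objtool would need than a runtime trap handler.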

Segher

2022-09-14 12:40:03

by Michael Matz

Subject: Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

Hello,

On Wed, 14 Sep 2022, Josh Poimboeuf wrote:

> > > This information is needed because the
> > > code after the call to such a function is optimized out as
> > > unreachable and objtool has no way of knowing that.
> >
> > Since June we (GCC) have -funreachable-traps. This creates a trap insn
> > wherever control flow would otherwise go into limbo.
>
> Ah, that's interesting, though I'm not sure if we'd be able to
> distinguish between "call doesn't return" traps and other traps or
> reasons for UD2.

There are two reasons (which will turn out to be the same) for a trap (say
'UD2' on x86-64) directly after a call insn:
1) "the call shall not have returned"
2) something else jumps to that trap because it was __builtin_unreachable
(or equivalent), and the compiler happened to put that ud2 directly
after the call. It could have done that only when the call itself was
noreturn:
	cmp $foo, %rax
	jne do_trap
	call noret
do_trap:
	ud2

So, it's all the same. If there's an ud2 (or whatever the trap maker is)
after a call then it was because it's noreturn.

(But, of course this costs (little) code size, unlike the non-alloc
checker sections)
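
As a rough sketch of how a checker could use that rule, assuming the object was built with -funreachable-traps and that the instructions of one function have already been decoded in address order. The types and the function name here are made up for illustration and are not objtool's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified view of a decoded instruction. */
enum insn_kind { INSN_OTHER, INSN_CALL, INSN_TRAP };

struct insn {
	enum insn_kind kind;
	bool dead_end; /* control never continues past this instruction */
};

/*
 * Mark every call that is directly followed by a trap as a dead end,
 * i.e. treat the callee (or the call site) as noreturn.
 * Returns the number of calls marked.
 */
size_t mark_noreturn_calls(struct insn *insns, size_t n)
{
	size_t marked = 0;

	assert(insns != NULL || n == 0);
	for (size_t i = 0; i + 1 < n; i++) {
		if (insns[i].kind == INSN_CALL &&
		    insns[i + 1].kind == INSN_TRAP) {
			insns[i].dead_end = true;
			marked++;
		}
	}
	return marked;
}
```

Note that this only looks at adjacency, so it also covers the case above where another branch jumps to the same ud2.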

Ciao,
Michael.

2022-09-14 14:25:06

by Peter Zijlstra

Subject: Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

On Wed, Sep 14, 2022 at 01:04:16AM +0100, Josh Poimboeuf wrote:

> > I will mention that objtool's existence is based on mistrust, of persons
> > (not correctly annotating stuff) and of tools (not correctly heeding those
> > annotations). The mistrust in persons is understandable and can be dealt
> > with by tools, but the mistrust in tools can't be fixed by making tools
> > more complicated by emitting even more information; there's no good reason
> > to assume that one piece of info can be trusted more than other pieces.
> > So, if you mistrust the tools you have already lost. That's somewhat
> > philosophical, so I won't beat that horse much more either.
>
> Maybe this is semantics, but I wouldn't characterize objtool's existence
> as being based on the mistrust of tools. Its main motivation is to
> fill in the toolchain's blind spots in asm and inline-asm, which exist
> by design.

That and a fairly deep seated loathing for the regular CFI annotations
and DWARF in general. Linus was fairly firm he didn't want anything to
do with DWARF for in-kernel unwinding.

That left us in a spot that we needed unwind information in a 'better'
format than DWARF.

Objtool was born out of those constraints. ORC not needing the CFI
annotations and ORC being *much* faster at unwinding and generation
(debug builds are slow) were all good.

2022-09-14 14:53:04

by Michael Matz

Subject: Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

Hello,

On Wed, 14 Sep 2022, Peter Zijlstra wrote:

> > Maybe this is semantics, but I wouldn't characterize objtool's existence
> > as being based on the mistrust of tools. Its main motivation is to
> > fill in the toolchain's blind spots in asm and inline-asm, which exist
> > by design.
>
> That and a fairly deep seated loathing for the regular CFI annotations
> and DWARF in general. Linus was fairly firm he didn't want anything to
> do with DWARF for in-kernel unwinding.

I was referring only to the check-stuff functionality of objtool, not to
its other parts. Although, of course, "deep seated loathing" is a special
form of mistrust as well ;-)

> That left us in a spot that we needed unwind information in a 'better'
> format than DWARF.
>
> Objtool was born out of those constraints. ORC not needing the CFI
> annotations and ORC being *much* faster at unwinding and generation
> (debug builds are slow) were all good.

Don't mix DWARF debug info with DWARF-based unwinding info, the latter
doesn't imply the former. Out of interest: how does ORC get around the
need for CFI annotations (or equivalents to restore registers) and what
makes it fast? I want faster unwinding for DWARF as well, when there's
feature parity :-) Maybe something can be learned for integration into
dwarf-unwind.

Ciao,
Michael.

2022-09-14 16:08:22

by Peter Zijlstra

Subject: Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

On Wed, Sep 14, 2022 at 02:28:26PM +0000, Michael Matz wrote:
> Hello,
>
> On Wed, 14 Sep 2022, Peter Zijlstra wrote:
>
> > > Maybe this is semantics, but I wouldn't characterize objtool's existence
> > > as being based on the mistrust of tools. Its main motivation is to
> > > fill in the toolchain's blind spots in asm and inline-asm, which exist
> > > by design.
> >
> > That and a fairly deep seated loathing for the regular CFI annotations
> > and DWARF in general. Linus was fairly firm he didn't want anything to
> > do with DWARF for in-kernel unwinding.
>
> I was referring only to the check-stuff functionality of objtool, not to
> its other parts. Although, of course, "deep seated loathing" is a special
> form of mistrust as well ;-)

Those were born out of the DWARF unwinder itself crashing the kernel due to
its inherent complexity (tracking the whole DWARF state machine and not
being quite robust itself).

That, and the manual CFI annotations were 'always' wrong, due to humans
and no tooling verifying them.

That said, objtool does have a number of annotations as well; mostly
things telling it what kind of stackframe stuff starts with.

> > That left us in a spot that we needed unwind information in a 'better'
> > format than DWARF.
> >
> > Objtool was born out of those constraints. ORC not needing the CFI
> > annotations and ORC being *much* faster at unwinding and generation
> > (debug builds are slow) were all good.
>
> Don't mix DWARF debug info with DWARF-based unwinding info, the latter
> doesn't imply the former. Out of interest: how does ORC get around the
> need for CFI annotations (or equivalents to restore registers) and what

Objtool 'interprets' the stackops. So it follows the call-graph and is
an interpreter for all instructions that modify the stack. Doing that it
knows what the stackframe is at 'most' places.
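
Very roughly, and only for straight-line x86-64 code, the idea looks like the sketch below. The instruction model and the names (enum stack_op, track_cfa) are invented for this mail; the real thing also follows branches and tracks the frame pointer and saved registers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified set of stack-modifying operations. */
enum stack_op { OP_NONE, OP_PUSH, OP_POP, OP_SUB_SP, OP_ADD_SP };

struct insn {
	enum stack_op op;
	long imm; /* byte count for OP_SUB_SP / OP_ADD_SP, else unused */
};

#define WORD_SIZE 8 /* x86-64 push/pop width */

/*
 * For each instruction, record the offset from the stack pointer to the
 * CFA (the caller's stack pointer) *before* that instruction executes.
 * At function entry only the return address has been pushed, so the
 * offset starts at WORD_SIZE.
 */
void track_cfa(const struct insn *insns, size_t n, long *cfa_off)
{
	long off = WORD_SIZE;

	assert(cfa_off != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		cfa_off[i] = off;
		switch (insns[i].op) {
		case OP_PUSH:   off += WORD_SIZE;    break;
		case OP_POP:    off -= WORD_SIZE;    break;
		case OP_SUB_SP: off += insns[i].imm; break;
		case OP_ADD_SP: off -= insns[i].imm; break;
		case OP_NONE:   break;
		}
	}
}
```

With a per-address table like cfa_off[], the unwinder only needs a lookup by instruction address, with no state machine to run.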

> makes it fast? I want faster unwinding for DWARF as well, when there's
> feature parity :-) Maybe something can be learned for integration into
> dwarf-unwind.

I think we have some details here:

https://www.kernel.org/doc/html/latest/x86/orc-unwinder.html

2022-09-14 18:04:07

by Segher Boessenkool

Subject: Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

On Wed, Sep 14, 2022 at 04:55:27PM +0200, Peter Zijlstra wrote:
> On Wed, Sep 14, 2022 at 02:28:26PM +0000, Michael Matz wrote:
> > Don't mix DWARF debug info with DWARF-based unwinding info, the latter
> > doesn't imply the former. Out of interest: how does ORC get around the
> > need for CFI annotations (or equivalents to restore registers) and what
>
> Objtool 'interprets' the stackops. So it follows the call-graph and is
> an interpreter for all instructions that modify the stack. Doing that it
> knows what the stackframe is at 'most' places.

To get correct backtraces on e.g. PowerPC you need to emulate many of
the integer insns. That is why GCC enables -fasynchronous-unwind-tables
by default for us.

The same is true for s390, aarch64, and x86 (unless 32-bit w/ frame
pointer).

The problem is that you do not know how to access anything on the stack,
whether in the current frame or in a previous frame, from a random point
in the program. GDB has many heuristics for this, and it still does not
get it right in all cases.

> > makes it fast? I want faster unwinding for DWARF as well, when there's
> > feature parity :-) Maybe something can be learned for integration into
> > dwarf-unwind.
>
> I think we have some details here:
>
> https://www.kernel.org/doc/html/latest/x86/orc-unwinder.html

It is faster because it does a whole lot less. Is that still enough?
It's not clear (to me) what exact information it wants to provide :-(

Segher

2022-09-15 03:31:23

by Chen Zhongjin

Subject: Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

Hi,

On 2022/9/12 22:17, Michael Matz wrote:
> Hey,
>
> On Mon, 12 Sep 2022, Borislav Petkov wrote:
>
>> Micha, any opinions on the below are appreciated.
>>
>> On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:
>>> difficult to ensure correctness. Also, due to kernel live patching, the
>>> kernel relies on 100% correctness of unwinding metadata, whereas the
>>> toolchain treats it as a best effort.
> Unwinding certainly is not best effort. It's 100% reliable as far as the
> source language or compilation options require. But as it doesn't
> touch the discussed features I won't belabor that point.
>
> I will mention that objtool's existence is based on mistrust, of persons
> (not correctly annotating stuff) and of tools (not correctly heeding those
> annotations). The mistrust in persons is understandable and can be dealt
> with by tools, but the mistrust in tools can't be fixed by making tools
> more complicated by emitting even more information; there's no good reason
> to assume that one piece of info can be trusted more than other pieces.
> So, if you mistrust the tools you have already lost. That's somewhat
> philosophical, so I won't beat that horse much more either.
>
> Now, recovering the CFG. I'll switch order of your two items:
>
> 2) noreturn function
>
>>> .pushsection .annotate.noreturn
>>> .quad func1
>>> .quad func2
>>> .quad func3
>>> .popsection
> This won't work for indirect calls to noreturn functions:
>
> void (* __attribute__((noreturn)) noretptr)(void);
> int callnoret (int i)
> {
>   noretptr();
>   return i + 32;
> }
>
> The return statement is unreachable (and removed by GCC). To know that
> you would have to mark the call statements, not the individual functions.
> All schemes that mark functions to indicate a meaningful difference in
> the calling sequence (e.g. the ABI of functions) have the same problem:
> it's part of the call expression's type, not of individual
> decls.
>
> Second problem: it's not extensible. Today it's noreturn functions you
> want to know, and tomorrow? So, add a flag word per entry, define bit 0
> for now to be NORETURN, and see what comes. Add a header with a version
> (and/or identifier) as well and it's properly extensible. For easy
> linking and identifying the blobs in the linked result include a length in
> the header. If this were in an allocated section it would be a good idea
> to refer to the symbols in a PC-relative manner, so as to not result in
> runtime relocations. In this case, as it's within a non-alloc section
> that doesn't matter. So:
>
> .section .annotate.functions
> .long 1 # version
> .long 0xcafe # ident
> .long 2f-1f # length
> 1:
> .quad func1, 1 # noreturn
> .quad func2, 1 # noreturn
> .quad func3, 32 # something_else_but_not_noreturn
> ...
> 2:
> .long 1b-2b # align and "checksum"
>
> It might be that the length-and-header scheme is cumbersome if you need to
> write those section commands by hand, in which case another scheme might
> be preferable, but it should somehow be self-delimiting.
>
> For the above problem of indirect calls to noreturns, instead do:
>
> .text
> noretcalllabel:
> call noreturn
> othercall:
> call really_special_thing
> .section .annotate.noretcalls
> .quad noretcalllabel, 1 # noreturn call
> .quad othercall, 32 # call to some special(-ABI?) function
>
> Same thoughts re extensibility and self-delimitation apply.
>
> 1) jump tables
>
>>> Create an .annotate.jump_table section which is an array of the
>>> following variable-length structure:
>>>
>>> struct annotate_jump_table {
>>>         void *indirect_jmp;
>>>         long num_targets;
>>>         void *targets[];
>>> };
> It's very often the case that the compiler already emits what your
> .targets[] member would encode, just at some unknown place, length and
> encoding. So you would save space if you instead only remember the
> encoding and places of those jump tables:

We have found some anonymous information on x86 in .rodata.

I'm not sure if those are *all* that Josh wanted on x86. However, for arm64
we did not find them in the same section, so it is a problem on arm64 now.

Will the compiler emit these for all arches? At least I tried and
didn't find anything meaningful (maybe I missed it).

Best,

Chen

> struct {
>   void *indirect_jump;
>   long num_tables;
>   struct {
>     unsigned num_entries;
>     unsigned encoding;
>     void *start_of_table;
>   } tables[];
> };
>
> The usual encodings are: direct, PC-relative, relative-to-start-of-table.
> Usually for a specific jump instruction there's only one table, so
> optimizing for that makes sense. For strange unthought-of cases it's
> probably a good idea to have your initial scheme as fallback, which could
> be indicated by a special .encoding value.
>
>>> For example, given the following switch statement code:
>>>
>>> .Lswitch_jmp:
>>> // %rax is .Lcase_1 or .Lcase_2
>>> jmp %rax
> So, usually %rax would point into a table (somewhere in .rodata/.text)
> that looks like so:
>
> .Ljump_table:
> .quad .Lcase_1 - .Ljump_table
> .quad .Lcase_2 - .Ljump_table
>
> (for position-independent code)
>
> and hence you would emit this as annotation:
>
> .quad .Lswitch_jmp
> .quad 1 # only a single table
> .long 2 # with two entries
> .long RELATIVE_TO_START # all entries are X - start_of_table
> .quad .Ljump_table
>
> In this case you won't save anything of course, but as soon as there's a
> meaningful number of cases you will.
>
> Again, if that info would be put into an allocated section you would want
> to use relative encodings of the addresses to avoid runtime relocs. And
> the remarks about self-delimitation and extensibility also apply here.
>
>>> Alternatives
>>> ------------
>>>
>>> Another idea which has been floated in the past is for objtool to read
>>> DWARF (or .eh_frame) to help it figure out the control flow. That
>>> hasn't been tried yet, but would be considerably more difficult and
>>> fragile IMO.
> While noreturn functions are marked in the debug info, noreturn
> function types currently aren't quite correct. And jump-tables aren't
> marked at all, so that would lose.
>
>
> Ciao,
> Michael.

2022-09-15 09:02:05

by Peter Zijlstra

Subject: Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

On Thu, Sep 15, 2022 at 10:56:58AM +0800, Chen Zhongjin wrote:

> We have found some anonymous information on x86 in .rodata.

Well yes, but that's still a bunch of heuristics on our side.

> I'm not sure if those are *all* that Josh wanted on x86. However, for arm64 we
> did not find them in the same section, so it is a problem on arm64 now.

Nick found Bolt managed the ARM64 jumptables:

https://github.com/llvm/llvm-project/blob/main/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp#L484

But that does look like a less than ideal solution too.

> Will the compiler emit these for all arches? At least I tried and
> didn't find anything meaningful (maybe I missed it).

That's the question; can we get the compiler to help us here in a well
defined manner.

2022-09-20 17:30:33

by Ard Biesheuvel

Subject: Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

On Thu, 15 Sept 2022 at 10:47, Peter Zijlstra <[email protected]> wrote:
>
> On Thu, Sep 15, 2022 at 10:56:58AM +0800, Chen Zhongjin wrote:
>
> > We have found some anonymous information on x86 in .rodata.
>
> Well yes, but that's still a bunch of heuristics on our side.
>
> > I'm not sure if those are *all* that Josh wanted on x86. However, for arm64 we
> > did not find them in the same section, so it is a problem on arm64 now.
>
> Nick found Bolt managed the ARM64 jumptables:
>
> https://github.com/llvm/llvm-project/blob/main/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp#L484
>
> But that does look like a less than ideal solution too.
>
> > Will the compiler emit these for all arches? At least I tried and
> > didn't find anything meaningful (maybe I missed it).
>
> That's the question; can we get the compiler to help us here in a well
> defined manner.

Do BTI landing pads help at all here? I.e., I assume that objtool just
treats any indirect call as a dangling edge in the control flow graph,
and the problem is identifying the valid targets. In the BTI case,
those will all start with a 'BTI J' instruction.

2022-09-21 03:44:31

by Chen Zhongjin

Subject: Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

Hi,

On 2022/9/21 0:49, Ard Biesheuvel wrote:
> On Thu, 15 Sept 2022 at 10:47, Peter Zijlstra <[email protected]> wrote:
>> On Thu, Sep 15, 2022 at 10:56:58AM +0800, Chen Zhongjin wrote:
>>
>>> We have found some anonymous information on x86 in .rodata.
>> Well yes, but that's still a bunch of heuristics on our side.
>>
>>> I'm not sure if those are *all* that Josh wanted on x86. However, for arm64 we
>>> did not find them in the same section, so it is a problem on arm64 now.
>> Nick found Bolt managed the ARM64 jumptables:
>>
>> https://github.com/llvm/llvm-project/blob/main/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp#L484
>>
>> But that does look like a less than ideal solution too.
>>
>>> Will the compiler emit these for all arches? At least I tried and
>>> didn't find anything meaningful (maybe I missed it).
>> That's the question; can we get the compiler to help us here in a well
>> defined manner.
> Do BTI landing pads help at all here? I.e., I assume that objtool just
> treats any indirect call as a dangling edge in the control flow graph,
> and the problem is identifying the valid targets. In the BTI case,
> those will all start with a 'BTI J' instruction.

Maybe not enough, I guess.

For switch jump tables we need to know each table's *own* jump targets so
that we can go through all of its branches. If there is more than one
indirect jump inside a function, only marking targets with BTI J doesn't
help match each jump with its targets.

Anyway, I think this job is more for the compiler. Switch jump tables are
different from other indirect jumps/calls. They have fixed control flow,
just like if/else, and the indirect jump table is just a compiler
optimization which hides this.
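
To make the "own targets" point concrete, here is a small sketch of what a per-jump annotation along the lines Michael suggested would let a tool do. The struct layout, encoding names and function name are assumptions for illustration only; nothing like this is emitted by any compiler today, and the PC-relative encoding is left out because its base is not defined in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encodings, following the ones named earlier in the thread. */
enum jt_encoding { JT_DIRECT, JT_PC_RELATIVE, JT_RELATIVE_TO_START };

struct jump_table {
	uint64_t indirect_jump;  /* address of the jump using this table */
	uint64_t start;          /* address of the table itself */
	enum jt_encoding encoding;
	size_t num_entries;
	const int64_t *entries;  /* raw table contents */
};

/*
 * Decode the targets that belong to one specific indirect jump.
 * Returns the number of targets written to out, or -1 if out is too
 * small or the encoding is not handled by this sketch.
 */
int jump_table_targets(const struct jump_table *jt, uint64_t *out, size_t cap)
{
	assert(jt != NULL && out != NULL);
	if (jt->num_entries > cap)
		return -1;

	for (size_t i = 0; i < jt->num_entries; i++) {
		switch (jt->encoding) {
		case JT_DIRECT:
			out[i] = (uint64_t)jt->entries[i];
			break;
		case JT_RELATIVE_TO_START:
			/* entry is "target - start_of_table" */
			out[i] = jt->start + (uint64_t)jt->entries[i];
			break;
		default:
			return -1;
		}
	}
	return (int)jt->num_entries;
}
```

With two such records in one function, each indirect jump gets exactly its own set of successors, which BTI J markers alone cannot provide.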

Best,

Chen