LinuxLists.cc - Suboptimal inline heuristics due to non-code sections

2018-05-01 06:50:43

Subject: Suboptimal inline heuristics due to non-code sections

When gcc considers the size of a function for inlining decisions, it
apparently considers *all* sections. Since the kernel extensively uses
sections for things other than code (e.g., exception-table, bug-table), the
optimality of these decisions seem questionable to me.

The objtool’s sections may be the most extreme case, as these sections are
discarded, while their size still appears to be considered by the inlining
heuristics. It may be beneficial not to consider (some) the other sections
as well, as they do not affect code-caching but only increase the kernel
size.

To illustrate the issue, consider the function copy_overflow():

0xffffffff819315e0 <+0>: push %rbp
0xffffffff819315e1 <+1>: mov %rsi,%rdx
0xffffffff819315e4 <+4>: mov %edi,%esi
0xffffffff819315e6 <+6>: mov $0xffffffff820bc4b8,%rdi
0xffffffff819315ed <+13>: mov %rsp,%rbp
0xffffffff819315f0 <+16>: callq 0xffffffff81089b70 <__warn_printk>
0xffffffff819315f5 <+21>: ud2
0xffffffff819315f7 <+23>: pop %rbp
0xffffffff819315f8 <+24>: retq

This function seems to me as a great candidate for inlining. Yet, in my 4.16
build (using gcc 7.2), I get 38 non-inlined instances of this function in
vmlinux. Forcing CONFIG_STACK_VALIDATION to be disabled reduces the number
non-inlined instances to 35. Removing, in addition, the data which is saved
in the __bug_table makes all the instances of the function to be inlined.

Obviously this certain function can be set as __always_inline, but the inline
heuristics seems to me as wrongfully biased.

What do you think?

Is there a way to make gcc to ignore sections for its inlining heuristics?

Thanks,
Nadav

2018-05-01 13:40:47

by Josh Poimboeuf

[permalink] [raw]

Subject: Re: Suboptimal inline heuristics due to non-code sections

On Tue, May 01, 2018 at 06:50:14AM +0000, Nadav Amit wrote:
> When gcc considers the size of a function for inlining decisions, it
> apparently considers *all* sections. Since the kernel extensively uses
> sections for things other than code (e.g., exception-table, bug-table), the
> optimality of these decisions seem questionable to me.
>
> The objtool’s sections may be the most extreme case, as these sections are
> discarded, while their size still appears to be considered by the inlining
> heuristics. It may be beneficial not to consider (some) the other sections
> as well, as they do not affect code-caching but only increase the kernel
> size.
>
> To illustrate the issue, consider the function copy_overflow():
>
> 0xffffffff819315e0 <+0>: push %rbp
> 0xffffffff819315e1 <+1>: mov %rsi,%rdx
> 0xffffffff819315e4 <+4>: mov %edi,%esi
> 0xffffffff819315e6 <+6>: mov $0xffffffff820bc4b8,%rdi
> 0xffffffff819315ed <+13>: mov %rsp,%rbp
> 0xffffffff819315f0 <+16>: callq 0xffffffff81089b70 <__warn_printk>
> 0xffffffff819315f5 <+21>: ud2
> 0xffffffff819315f7 <+23>: pop %rbp
> 0xffffffff819315f8 <+24>: retq
>
> This function seems to me as a great candidate for inlining. Yet, in my 4.16
> build (using gcc 7.2), I get 38 non-inlined instances of this function in
> vmlinux. Forcing CONFIG_STACK_VALIDATION to be disabled reduces the number
> non-inlined instances to 35. Removing, in addition, the data which is saved
> in the __bug_table makes all the instances of the function to be inlined.
>
> Obviously this certain function can be set as __always_inline, but the inline
> heuristics seems to me as wrongfully biased.
>
> What do you think?
>
> Is there a way to make gcc to ignore sections for its inlining heuristics?

Good find.

Playing around with one of the affected files (crypto/af_alg.o), if I
make the .discard.reachable section empty by removing the text reference
from the annotate_reachable() macro, then copy_overflow() still isn't
inlined.

But if I remove the section completely by removing the
pushsection/popsection, then copy_overflow() gets inlined.

So GCC's inlining decisions are somehow influenced by the existence of
some random empty section. This definitely seems like a GCC bug to me.

--
Josh

2018-05-01 15:38:23

by Linus Torvalds

[permalink] [raw]

Subject: Re: Suboptimal inline heuristics due to non-code sections

On Tue, May 1, 2018 at 6:40 AM Josh Poimboeuf <[email protected]> wrote:

> But if I remove the section completely by removing the
> pushsection/popsection, then copy_overflow() gets inlined.

> So GCC's inlining decisions are somehow influenced by the existence of
> some random empty section. This definitely seems like a GCC bug to me.

I think gcc uses the size of the string to approximate the size of an
inline asm.

So I don't think it's the "empty section" that makes gcc do this, I think
it's literally "our inline asms _look_ big".

Linus "does this section directive make me look fat?"
Torvalds

2018-05-01 16:39:54

by Nadav Amit

[permalink] [raw]

Subject: Re: Suboptimal inline heuristics due to non-code sections

Linus Torvalds <[email protected]> wrote:

> On Tue, May 1, 2018 at 6:40 AM Josh Poimboeuf <[email protected]> wrote:
>
>> But if I remove the section completely by removing the
>> pushsection/popsection, then copy_overflow() gets inlined.
>
>> So GCC's inlining decisions are somehow influenced by the existence of
>> some random empty section. This definitely seems like a GCC bug to me.
>
> I think gcc uses the size of the string to approximate the size of an
> inline asm.
>
> So I don't think it's the "empty section" that makes gcc do this, I think
> it's literally "our inline asms _look_ big”.

I didn’t think about that.

Playing with the code a bit more, it seems that it is actually related to
the number of “new-lines” in the inline assembly. Removing 4 new-lines from
_BUG_FLAGS (those that can be removed without breaking assembly) eliminated
most of the non-inlined versions of copy_overflow().

Would it be reasonable to remove new-lines in such cases?

Regards,
Nadav

2018-05-01 16:48:01

by Nadav Amit

[permalink] [raw]

Subject: Re: Suboptimal inline heuristics due to non-code sections

Nadav Amit <[email protected]> wrote:

> Linus Torvalds <[email protected]> wrote:
>
>> On Tue, May 1, 2018 at 6:40 AM Josh Poimboeuf <[email protected]> wrote:
>>
>>> But if I remove the section completely by removing the
>>> pushsection/popsection, then copy_overflow() gets inlined.
>>
>>> So GCC's inlining decisions are somehow influenced by the existence of
>>> some random empty section. This definitely seems like a GCC bug to me.
>>
>> I think gcc uses the size of the string to approximate the size of an
>> inline asm.
>>
>> So I don't think it's the "empty section" that makes gcc do this, I think
>> it's literally "our inline asms _look_ big”.
>
> I didn’t think about that.
>
> Playing with the code a bit more, it seems that it is actually related to
> the number of “new-lines” in the inline assembly. Removing 4 new-lines from
> _BUG_FLAGS (those that can be removed without breaking assembly) eliminated
> most of the non-inlined versions of copy_overflow().
>
> Would it be reasonable to remove new-lines in such cases?

My bad. It’s not the new-line. Let me do some more digging.

Nadav

2018-05-01 16:48:51

by Linus Torvalds

[permalink] [raw]

Subject: Re: Suboptimal inline heuristics due to non-code sections

On Tue, May 1, 2018 at 9:39 AM Nadav Amit <[email protected]> wrote:

> Would it be reasonable to remove new-lines in such cases?

Yeah, that may be fine for some of our (already illegible) section crud.

Linus

2018-05-01 16:52:36

by Linus Torvalds

[permalink] [raw]

Subject: Re: Suboptimal inline heuristics due to non-code sections

On Tue, May 1, 2018 at 9:46 AM Nadav Amit <[email protected]> wrote:

> My bad. It’s not the new-line. Let me do some more digging.

From the gcc docs:

Some targets require that GCC track the size of each instruction used
in order to generate correct code. Because the final length of the
code produced by an @code{asm} statement is only known by the
assembler, GCC must make an estimate as to how big it will be. It
does this by counting the number of instructions in the pattern of the
@code{asm} and multiplying that by the length of the longest
instruction supported by that processor. (When working out the number
of instructions, it assumes that any occurrence of a newline or of
whatever statement separator character is supported by the assembler --
typically @samp{;} --- indicates the end of an instruction.)

so it probably counts newlines and semicolons to estimate the size.

Linus

2018-05-01 16:55:20

by Nadav Amit

[permalink] [raw]

Subject: Re: Suboptimal inline heuristics due to non-code sections

Linus Torvalds <[email protected]> wrote:

> On Tue, May 1, 2018 at 9:46 AM Nadav Amit <[email protected]> wrote:
>
>> My bad. It’s not the new-line. Let me do some more digging.
>
> From the gcc docs:
>
> Some targets require that GCC track the size of each instruction used
> in order to generate correct code. Because the final length of the
> code produced by an @code{asm} statement is only known by the
> assembler, GCC must make an estimate as to how big it will be. It
> does this by counting the number of instructions in the pattern of the
> @code{asm} and multiplying that by the length of the longest
> instruction supported by that processor. (When working out the number
> of instructions, it assumes that any occurrence of a newline or of
> whatever statement separator character is supported by the assembler --
> typically @samp{;} --- indicates the end of an instruction.)
>
> so it probably counts newlines and semicolons to estimate the size.

Thanks. I probably did not have enough coffee.

I’ll work on it.

Regards,
Nadav