On Wed, May 13, 2020 at 09:52:07PM -0700, Linus Torvalds wrote:
> On Wed, May 13, 2020, 20:50 Andy Lutomirski <[email protected]> wrote:
>
> >
> > LTO isn’t a linker taking regular .o files full of regular machine
> > code and optimizing it. That’s nuts.
> >
>
> Yeah, you're right. I was originally thinking just an optimizing
> assembler, and then started thinking about link-time optimizations in that
> sense, but it was wrong to then go from that to LTO which has a very
> specific meaning.
>
> We do have assemblers that do some optimizations, but they tend to all be
> at the single instruction level (eg things like turning "add $128" into
> "sub $-128" which fits in a byte constant).
>
> Linus
>
> >
The gcc docs [1,2] at least don't inspire much confidence that this will
continue working with plain asm("") though:

"Note that GCC’s optimizers can move asm statements relative to other
code, including across jumps."
...
"Note that the compiler can move even volatile asm instructions relative
to other code, including across jump instructions."

Even if we don't include an instruction in it I think it should at least
have a memory clobber, to stop the compiler from deciding that it can be
moved before the call so it can do the tail-call optimization.

[1] https://gcc.gnu.org/onlinedocs/gcc/Basic-Asm.html#Basic-Asm
[2] https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Volatile
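
For illustration, a minimal sketch of the memory-clobber variant suggested
above, assuming GCC or Clang extended asm. The names prevent_tail_call(),
callee() and caller_with_barrier() are hypothetical and are not taken from
the patch under discussion. The GCC docs quoted above do not promise that
this blocks the tail call, so treat it as a sketch of the intent rather
than a guarantee:

```c
/* Hypothetical helper: an empty asm that emits no instruction. The
 * "memory" clobber tells the compiler the statement may read or write
 * arbitrary memory, which is intended to discourage hoisting it above
 * the preceding call. */
#define prevent_tail_call() __asm__ __volatile__("" ::: "memory")

int calls;

/* noinline so that there is a real call that could become a tail call */
__attribute__((noinline)) void callee(void)
{
	calls++;
}

void caller_with_barrier(void)
{
	callee();
	/* Without this, callee() is the last statement in the function
	 * and the compiler may turn the call into a jump. */
	prevent_tail_call();
}
```

Comparing the output of "gcc -O2 -S" with and without the macro should show
whether the call is emitted as a call or as a jmp on a given compiler.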
On Thu, May 14, 2020 at 7:22 AM Arvind Sankar <[email protected]> wrote:
> On Wed, May 13, 2020 at 09:52:07PM -0700, Linus Torvalds wrote:
> > On Wed, May 13, 2020, 20:50 Andy Lutomirski <[email protected]> wrote:
> The gcc docs [1,2] at least don't inspire much confidence that this will
> continue working with plain asm("") though:
>
> "Note that GCC’s optimizers can move asm statements relative to other
> code, including across jumps."
> ...
> "Note that the compiler can move even volatile asm instructions relative
> to other code, including across jump instructions."
>
> Even if we don't include an instruction in it I think it should at least
> have a memory clobber, to stop the compiler from deciding that it can be
> moved before the call so it can do the tail-call optimization.
I think LTO would still be able to notice that cpu_startup_entry() can
be annotated __attribute__((noreturn)) and optimize the callers
accordingly, which in turn would allow a tail call again after dead code
elimination.
Arnd
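
A small sketch of the noreturn concern raised above, again assuming GCC or
Clang. Here never_returns() is a hypothetical stand-in for
cpu_startup_entry(), and the annotation is written out by hand where the
message above worries LTO might infer it. Once the callee is known not to
return, everything after the call is unreachable, so the compiler is free
to drop it, including any empty asm placed there:

```c
#include <setjmp.h>

static jmp_buf env;
int after_call_ran;

/* Hypothetical stand-in for cpu_startup_entry(): it leaves via longjmp()
 * and so never returns to its caller. */
__attribute__((noreturn)) void never_returns(void)
{
	longjmp(env, 1);
}

void secondary_entry(void)
{
	never_returns();
	/* Unreachable: the compiler may delete both statements below,
	 * after which the call above is in tail position again. */
	after_call_ran = 1;
	__asm__ __volatile__("" ::: "memory");
}

/* Test driver: returns 0 because the code after the call never runs. */
int run_entry(void)
{
	if (setjmp(env) == 0)
		secondary_entry();
	return after_call_ran;
}
```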