2023-12-02 12:45:56

by Alexey Dobriyan

Subject: [PATCH] nolibc: optimise _start() on x86_64

Just jump into _start_c, it is not going to return anyway.

Signed-off-by: Alexey Dobriyan <[email protected]>
---

Also, kernel clears all registers before starting process,
I'm not sure why

xor ebp, ebp

was added.

tools/include/nolibc/arch-x86_64.h | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

--- a/tools/include/nolibc/arch-x86_64.h
+++ b/tools/include/nolibc/arch-x86_64.h
@@ -167,8 +167,7 @@ void __attribute__((weak, noreturn, optimize("Os", "omit-frame-pointer"))) __no_
 		"xor %ebp, %ebp\n"	/* zero the stack frame */
 		"mov %rsp, %rdi\n"	/* save stack pointer to %rdi, as arg1 of _start_c */
 		"and $-16, %rsp\n"	/* %rsp must be 16-byte aligned before call */
-		"call _start_c\n"	/* transfer to c runtime */
-		"hlt\n"	/* ensure it does not return */
+		"jmp _start_c\n"	/* transfer to c runtime */
 	);
 	__builtin_unreachable();
 }


2023-12-02 13:24:20

by Willy Tarreau

Subject: Re: [PATCH] nolibc: optimise _start() on x86_64

Hi Alexey,

On Sat, Dec 02, 2023 at 03:45:13PM +0300, Alexey Dobriyan wrote:
> Just jump into _start_c, it is not going to return anyway.

Thanks, but what's upper in the stack there ? I'm trying to make sure
that if _start_c returns we don't get a random behavior. If we get a
systematic crash (e.g. 0 always there) that's fine, what would be
annoying would be random infinite loops etc. In the psABI description
(table 3.9) I'm seeing "undefined" before argc, which I don't find
much appealing.

> Signed-off-by: Alexey Dobriyan <[email protected]>
> ---
>
> Also, kernel clears all registers before starting process,
> I'm not sure why
>
> xor ebp, ebp
>
> was added.

Hmmm psABI says:

Only the registers listed below have specified values at process entry:

%rbp The content of this register is unspecified at process initialization
time, but the user code should mark the deepest stack frame by setting
the frame pointer to zero.

%rsp The stack pointer holds the address of the byte with lowest address
which is part of the stack. It is guaranteed to be 16-byte aligned at
process entry.

%rdx a function pointer that the application should register with atexit (BA_OS).

Thus apparently it's documented as being our job to clear it :-/

Willy
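
The reason the psABI asks user code to zero %rbp can be illustrated with a small model of a frame-pointer walk. This is only a sketch: `struct frame` and `backtrace_depth()` are hypothetical names, not part of nolibc or of any real unwinder, and the layout assumes code built with frame pointers.

```c
#include <stddef.h>

/*
 * Hypothetical model of the record left on the stack by
 * "push %rbp; mov %rsp, %rbp" in each function prologue.
 */
struct frame {
	const struct frame *caller;	/* saved %rbp of the caller */
	const void *return_address;	/* pushed by "call" */
};

/*
 * Count frames by following saved frame pointers until the zero
 * marker is reached. "limit" bounds the walk in case the chain is
 * corrupt or was never terminated.
 */
static inline size_t backtrace_depth(const struct frame *rbp, size_t limit)
{
	size_t depth = 0;

	while (rbp != NULL && depth < limit) {
		depth++;
		rbp = rbp->caller;
	}
	return depth;
}
```

If _start left an arbitrary value in %rbp, the outermost saved frame pointer would be that value and a walk like this one would have no reliable stopping point.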

2023-12-03 12:01:31

by Alexey Dobriyan

Subject: Re: [PATCH] nolibc: optimise _start() on x86_64

On Sat, Dec 02, 2023 at 02:23:59PM +0100, Willy Tarreau wrote:
> Hi Alexey,
>
> On Sat, Dec 02, 2023 at 03:45:13PM +0300, Alexey Dobriyan wrote:
> > Just jump into _start_c, it is not going to return anyway.
>
> Thanks, but what's upper in the stack there ?

argc

(gdb) break _start
(gdb) run

(gdb) x/20gx $sp
0x7fffffffdae0: 0x0000000000000004 0x00007fffffffdf33
0x7fffffffdaf0: 0x00007fffffffdf49 0x00007fffffffdf4b
0x7fffffffdb00: 0x00007fffffffdf4d 0x0000000000000000
0x7fffffffdb10: 0x00007fffffffdf4f 0x00007fffffffdf70
0x7fffffffdb20: 0x00007fffffffdf80 0x00007fffffffdfce

(gdb) x/s 0x00007fffffffdf33
0x7fffffffdf33: "/home/ad/s-test/a.out"

> I'm trying to make sure
> that if _start_c returns we don't get a random behavior.

Yes, it should segfault executing from very small address.
I tested with

	.intel_syntax noprefix
	.globl _start
_start:
	ret
	mov eax, 231
	xor edi, edi
	syscall
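
What this test exercises can be modelled in a few lines of C. The sketch below is illustrative only: `ret_target()` and `initial_stack` are hypothetical names, and the values are simply the first words of the gdb dump above.

```c
#include <stdint.h>

/*
 * Model of "ret": pop the 8-byte word at the top of the stack and
 * use it as the next instruction address.
 */
static inline uint64_t ret_target(const uint64_t *rsp)
{
	return rsp[0];
}

/* First words of the initial stack, copied from the gdb dump. */
static const uint64_t initial_stack[] = {
	0x0000000000000004,	/* argc */
	0x00007fffffffdf33,	/* argv[0] */
	0x00007fffffffdf49,	/* argv[1] */
	0x00007fffffffdf4b,	/* argv[2] */
	0x00007fffffffdf4d,	/* argv[3] */
	0x0000000000000000,	/* argv[argc] == NULL */
};
```

Here `ret_target(initial_stack)` evaluates to argc, so a return taken with nothing pushed on top of the initial stack jumps to a very small address, which is normally unmapped.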

> If we get a
> systematic crash (e.g. 0 always there) that's fine, what would be
> annoying would be random infinite loops etc. In the psABI description
> (table 3.9) I'm seeing "undefined" before argc, which I don't find
> much appealing.
>
> > Signed-off-by: Alexey Dobriyan <[email protected]>
> > ---
> >
> > Also, kernel clears all registers before starting process,
> > I'm not sure why
> >
> > xor ebp, ebp
> >
> > was added.
>
> Hmmm psABI says:
>
> Only the registers listed below have specified values at process entry:
>
> %rbp The content of this register is unspecified at process initialization
> time, but the user code should mark the deepest stack frame by setting
> the frame pointer to zero.
>
> %rsp The stack pointer holds the address of the byte with lowest address
> which is part of the stack. It is guaranteed to be 16-byte aligned at
> process entry.
>
> %rdx a function pointer that the application should register with atexit (BA_OS).
>
> Thus apparently it's documented as being our job to clear it :-/

I meant, ELF loader clears all registers except rsp and aligns the stack to 16 bytes.
There were problems with stack aligning, but registers, I think, were always zeroed.

2023-12-03 21:38:13

by Willy Tarreau

Subject: Re: [PATCH] nolibc: optimise _start() on x86_64

On Sun, Dec 03, 2023 at 03:00:48PM +0300, Alexey Dobriyan wrote:
> On Sat, Dec 02, 2023 at 02:23:59PM +0100, Willy Tarreau wrote:
> > Hi Alexey,
> >
> > On Sat, Dec 02, 2023 at 03:45:13PM +0300, Alexey Dobriyan wrote:
> > > Just jump into _start_c, it is not going to return anyway.
> >
> > Thanks, but what's upper in the stack there ?
>
> argc
>
> (gdb) break _start
> (gdb) run
>
> (gdb) x/20gx $sp
> 0x7fffffffdae0: 0x0000000000000004 0x00007fffffffdf33
> 0x7fffffffdaf0: 0x00007fffffffdf49 0x00007fffffffdf4b
> 0x7fffffffdb00: 0x00007fffffffdf4d 0x0000000000000000
> 0x7fffffffdb10: 0x00007fffffffdf4f 0x00007fffffffdf70
> 0x7fffffffdb20: 0x00007fffffffdf80 0x00007fffffffdfce
>
> (gdb) x/s 0x00007fffffffdf33
> 0x7fffffffdf33: "/home/ad/s-test/a.out"
>
> > I'm trying to make sure
> > that if _start_c returns we don't get a random behavior.
>
> Yes, it should segfault executing from very small address.
> I tested with
>
> 	.intel_syntax noprefix
> 	.globl _start
> _start:
> 	ret
> 	mov eax, 231
> 	xor edi, edi
> 	syscall

Well, this could possibly be acceptable then but the ABI also says that
we need rsp to be 16-byte aligned before the call, so it's supposed to
be 8 on top of this, so this would actually require more code to maintain
this guarantee, since a sub rsp,8 is longer than just the hlt we're saving.
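
The arithmetic behind this can be written out as a small model. It is a sketch under the assumption that _start has just executed "and $-16, %rsp"; the helper names are hypothetical and do not exist in nolibc.

```c
#include <stdint.h>

/* Model of "and $-16, %rsp": round down to a 16-byte boundary. */
static inline uint64_t align16(uint64_t rsp)
{
	return rsp & ~(uint64_t)15;
}

/* "call" pushes an 8-byte return address before transferring control. */
static inline uint64_t rsp_on_entry_via_call(uint64_t rsp)
{
	return align16(rsp) - 8;
}

/* "jmp" transfers control without touching the stack. */
static inline uint64_t rsp_on_entry_via_jmp(uint64_t rsp)
{
	return align16(rsp);
}

/* Offset of the stack pointer from the previous 16-byte boundary. */
static inline unsigned int offset_in_16(uint64_t rsp)
{
	return (unsigned int)(rsp % 16);
}
```

With the stack pointer from the gdb session, 0x7fffffffdae0, the call path enters _start_c with an offset of 8, which is what a compiled function expects on entry, while the jmp path enters with an offset of 0.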

> > If we get a
> > systematic crash (e.g. 0 always there) that's fine, what would be
> > annoying would be random infinite loops etc. In the psABI description
> > (table 3.9) I'm seeing "undefined" before argc, which I don't find
> > much appealing.
> >
> > > Signed-off-by: Alexey Dobriyan <[email protected]>
> > > ---
> > >
> > > Also, kernel clears all registers before starting process,
> > > I'm not sure why
> > >
> > > xor ebp, ebp
> > >
> > > was added.
> >
> > Hmmm psABI says:
> >
> > Only the registers listed below have specified values at process entry:
> >
> > %rbp The content of this register is unspecified at process initialization
> > time, but the user code should mark the deepest stack frame by setting
> > the frame pointer to zero.
> >
> > %rsp The stack pointer holds the address of the byte with lowest address
> > which is part of the stack. It is guaranteed to be 16-byte aligned at
> > process entry.
> >
> > %rdx a function pointer that the application should register with atexit (BA_OS).
> >
> > Thus apparently it's documented as being our job to clear it :-/
>
> I meant, ELF loader clears all registers except rsp and aligns the stack to 16 bytes.
> There were problems with stack aligning, but registers, I think, were always zeroed.

But there's a strong difference between what's observed and what's
specified. If you get the x86_64 ABI spec to reflect this, it becomes
the standard and we can rely on it. Otherwise the standard remains what
is documented, and what is implemented may change while remaining within
the specs above.

Willy