2023-09-14 02:20:49

by Sebastian Ott

[permalink] [raw]
Subject: aarch64 binaries using nolibc segfault before reaching the entry point

Hi,

the tpidr2 selftest on an arm box segfaults before reaching the entry point.
I have no clue what is to blame for this or how to debug it but for a
statically linked binary there shouldn't be much stuff going on besides the
elf loader?

I can reproduce this with a program using an empty main function. Also checked
for other nolibc users - same result for init.c from rcutorture.

tools/testing/selftests/arm64/fp/za-fork is working though - the only
difference I could spot here is that it is linked together with another object
file. I also looked at the elf headers but didn't find anything obvious (but
I'm a bit out of my comfort zone here..)

After playing around with linker options I found that using -static-pie
lets the binaries run successfully.

[root@arm abi]# cat test.c
int main(void)
{
	return 1;
}
[root@arm abi]# gcc -Os -static -Wall -lgcc -nostdlib -ffreestanding -include ../../../../include/nolibc/nolibc.h test.c
[root@arm abi]# ./a.out
Segmentation fault
[root@arm abi]# gcc -Os -static -Wall -lgcc -nostdlib -ffreestanding -static-pie -include ../../../../include/nolibc/nolibc.h test.c
[root@arm abi]# ./a.out
[root@arm abi]#

All on aarch64 running fedora37 + upstream kernel. Any hints on what could
be broken here or how to actually fix it?

Sebastian


2023-09-14 10:14:19

by Thomas Weißschuh

[permalink] [raw]
Subject: Re: aarch64 binaries using nolibc segfault before reaching the entry point

On 2023-09-13 22:19:00+0200, Thomas Weißschuh wrote:
> On 2023-09-13 20:44:59+0200, Sebastian Ott wrote:
> > the tpidr2 selftest on an arm box segfaults before reaching the entry point.
> > I have no clue what is to blame for this or how to debug it but for a
> > statically linked binary there shouldn't be much stuff going on besides the
> > elf loader?
>
> [..]
>
> I reduced it to the following reproducer:
>
> $ cat test.c
> int foo; /* It works when deleting this variable */
>
> void __attribute__((weak, noreturn, optimize("Os", "omit-frame-pointer"))) _start(void)
> {
> 	__asm__ volatile (
> 		"mov x8, 93\n" /* NR_exit == 93 */
> 		"svc #0\n"
> 	);
> 	__builtin_unreachable();
> }
>
> $ aarch64-linux-gnu-gcc -Os -static -fno-stack-protector -Wall -nostdlib test.c
> $ ./a.out
> Segmentation fault
>
> Also when running under gdb the error message is:
>
> During startup program terminated with signal SIGSEGV, Segmentation fault.
>
> So it seems the error already happens during loading.
>
> Could be a compiler or kernel bug?

Callchain for the failure:

load_elf_binary()
-> if (likely(elf_bss != elf_brk) && unlikely(padzero(elf_bss)))
-> padzero()
-> clear_user()
-> __arch_clear_user()
-> failure in arch/arm64/lib/clear_user.S

Resulting in an EFAULT which gets translated to SIGSEGV somewhere.
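
As a rough illustration of the padzero() step in the callchain above, the
sketch below computes how many bytes would have to be cleared between a given
address and the end of its page. It is a userspace approximation and not the
kernel code: the 4 KiB page size and the names SKETCH_PAGE_SIZE,
sketch_page_offset() and padzero_len() are assumptions made up for this
example.

```c
#include <stdint.h>

/* Assumed page size for this sketch; the kernel uses ELF_MIN_ALIGN. */
#define SKETCH_PAGE_SIZE UINT64_C(0x1000)

/* Offset of addr within its page (hypothetical helper). */
static inline uint64_t sketch_page_offset(uint64_t addr)
{
	return addr & (SKETCH_PAGE_SIZE - 1);
}

/*
 * Number of bytes from addr up to the end of its page, or 0 if addr is
 * already page-aligned (hypothetical helper, modelled on what padzero()
 * asks clear_user() to zero).
 */
static inline uint64_t padzero_len(uint64_t addr)
{
	uint64_t off = sketch_page_offset(addr);

	return off ? SKETCH_PAGE_SIZE - off : 0;
}
```

For a made-up address such as 0x4100e8, padzero_len() yields 0xf18 bytes. If
no writable mapping backs that range, clearing it would be expected to fault.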


The following patch, which seems sensible to me, fixes it for me.
But as this is really old, heavily used code I'm a bit hesitant.

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 7b3d2d491407..13f71733ba63 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -112,7 +112,7 @@ static struct linux_binfmt elf_format = {

 static int set_brk(unsigned long start, unsigned long end, int prot)
 {
-	start = ELF_PAGEALIGN(start);
+	start = ELF_PAGESTART(start);
 	end = ELF_PAGEALIGN(end);
 	if (end > start) {
 		/*
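
To make the one-line change above easier to follow, here is a small userspace
sketch of the two rounding operations involved. It only mirrors the arithmetic
of ELF_PAGESTART (round down) and ELF_PAGEALIGN (round up). The 4 KiB page
size and the names SKETCH_PAGE_SIZE, sketch_page_start() and
sketch_page_align() are assumptions for illustration, not kernel identifiers.

```c
#include <stdint.h>

/* Assumed page size for this sketch; the kernel uses ELF_MIN_ALIGN. */
#define SKETCH_PAGE_SIZE UINT64_C(0x1000)

/* Round addr down to the start of its page (cf. ELF_PAGESTART). */
static inline uint64_t sketch_page_start(uint64_t addr)
{
	return addr & ~(SKETCH_PAGE_SIZE - 1);
}

/* Round addr up to the next page boundary (cf. ELF_PAGEALIGN). */
static inline uint64_t sketch_page_align(uint64_t addr)
{
	return (addr + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}
```

With a made-up start address of 0x4100e8, rounding down gives 0x410000 and
rounding up gives 0x411000. So when start is not page-aligned, the patched
set_brk() would begin its range one page earlier, and that range includes the
partial page that padzero() tries to clear. For an already aligned address
both helpers return the same value.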

2023-09-16 21:10:52

by Thomas Weißschuh

[permalink] [raw]
Subject: Re: aarch64 binaries using nolibc segfault before reaching the entry point

On 2023-09-13 20:44:59+0200, Sebastian Ott wrote:
> Hi,
>
> the tpidr2 selftest on an arm box segfaults before reaching the entry point.
> I have no clue what is to blame for this or how to debug it but for a
> statically linked binary there shouldn't be much stuff going on besides the
> elf loader?
>
> I can reproduce this with a program using an empty main function. Also checked
> for other nolibc users - same result for init.c from rcutorture.
>
> tools/testing/selftests/arm64/fp/za-fork is working though - the only
> difference I could spot here is that it is linked together with another object
> file. I also looked at the elf headers but didn't find anything obvious (but
> I'm a bit out of my comfort zone here..)
>
> After playing around with linker options I found that using -static-pie
> lets the binaries run successfully.
>
> [root@arm abi]# cat test.c
> int main(void)
> {
> 	return 1;
> }
> [root@arm abi]# gcc -Os -static -Wall -lgcc -nostdlib -ffreestanding -include ../../../../include/nolibc/nolibc.h test.c
> [root@arm abi]# ./a.out
> Segmentation fault
> [root@arm abi]# gcc -Os -static -Wall -lgcc -nostdlib -ffreestanding -static-pie -include ../../../../include/nolibc/nolibc.h test.c
> [root@arm abi]# ./a.out
> [root@arm abi]#
>
> All on aarch64 running fedora37 + upstream kernel. Any hints on what could
> be broken here or how to actually fix it?

I reduced it to the following reproducer:

$ cat test.c
int foo; /* It works when deleting this variable */

void __attribute__((weak, noreturn, optimize("Os", "omit-frame-pointer"))) _start(void)
{
	__asm__ volatile (
		"mov x8, 93\n" /* NR_exit == 93 */
		"svc #0\n"
	);
	__builtin_unreachable();
}

$ aarch64-linux-gnu-gcc -Os -static -fno-stack-protector -Wall -nostdlib test.c
$ ./a.out
Segmentation fault

Also when running under gdb the error message is:

During startup program terminated with signal SIGSEGV, Segmentation fault.

So it seems the error already happens during loading.

Could be a compiler or kernel bug?

Thomas