Hello Paul,
this 3rd series aims at generally improving usability and maintainance
of nolibc. It needs to be applied on top of the s390 series:
https://lore.kernel.org/lkml/[email protected]/
- first, I've encountered remaining problems related to section
reordering happening at certain optimization levels and that is also
arch-dependent. We were fortunate they didn't appear in rcutorture,
but I could reproduce them with ARM in Thumb mode at -O0. The problem
is that our out-of-block asm() statement changes the current section
to ".text" without the compiler knowing. Thus the compiler may believe
it's still in .bss and emit variables immediately after (e.g. errno),
which end up in the wrong section. Switching the section at the end
to .bss doesn't work either because at -Os I was seeing sys_brk()
placed immediately after and crashing as .bss is not executable. The
only safe solution is to turn _start to real functions. This was
tested on all archs at -O0,-O1,-O2,-O3,-Os and all of them now work.
- second, thumb mode support was in complete on ARM. Only thumb-2 was
supported and depending on how the toolchain is configured, passing
"-mthumb" would result in thumb-1 (if armv4/5 was the default) or
thumb-2 (when armv7 was the default). I discovered this when first
trying the toolchains from kernel.org because mine works in v7 by
default, hence thumb-2. The change only replaces a few instructions
that are not available in thumb-1 by their compatible equivalent. In
addition, thumb cannot be used at -O0 or with frame pointers in general
because register r7 is the frame pointer there, and cannot be assigned
by the compiler. That's bad because r7 carries the syscall number. Now
when thumb is detected, we simply use a slightly larger setup code
which uses r6 and swaps it with r7 when performing the call.
- third, the definitions of the (possibly wrong) arch-specific O_* values
were dropped in favor of those coming from asm/fcntl.h. Not only these
ones are correct, but doing so will avoid build redefinition warnings
should the file be included for whatever other reason.
- the errno, environ and the auxiliary vector were really a pain to
use. errno was declared as a static variable, showing a different
one to each build unit. environ had to be declared by the application,
and the auxv had to be both declared and found by the application if
needed. All three of them have now been declared as weak symbols,
and environ and _auxv are setup during startup, so that only one
instance of them exists across the whole binary, and that code
currently declaring them continues to work. This now means that code
not using them will not optimize them away anymore, but let's face
it, errno was always used and no relevant application manages to get
rid of .bss, so the amount of extra space is really just 8-16 bytes
total for a much better simplicity for the user.
- getauxval() and getpagesize() were added by Ammar Faizi (along with
the associated selftests).
This was tested on arm64/armv5/armv7/thumb1/thumb2/i386/x86_64/mips/riscv
and s390 at all optimization levels. I could also verify that my original
preinit code continues to build and work fine, so please consider queuing
it.
Thank you!
Willy
Changes since v1:
- added missing s-o-b on the last 3 patches
Ammar Faizi (3):
nolibc/stdlib: Implement `getauxval(3)` function
nolibc/sys: Implement `getpagesize(2)` function
selftests/nolibc: Add `getpagesize(2)` selftest
Sven Schnelle (2):
tools/nolibc: export environ as a weak symbol on s390
tools/nolibc: add auxiliary vector retrieval for s390
Willy Tarreau (17):
tools/nolibc: make compiler and assembler agree on the section around
_start
tools/nolibc: enable support for thumb1 mode for ARM
tools/nolibc: support thumb mode with frame pointers on ARM
tools/nolibc: remove local definitions of O_* flags for open/fcntl
tools/nolibc: make errno a weak symbol instead of a static one
tools/nolibc: export environ as a weak symbol on x86_64
tools/nolibc: export environ as a weak symbol on i386
tools/nolibc: export environ as a weak symbol on arm64
tools/nolibc: export environ as a weak symbol on arm
tools/nolibc: export environ as a weak symbol on mips
tools/nolibc: export environ as a weak symbol on riscv
tools/nolibc: add auxiliary vector retrieval for i386
tools/nolibc: add auxiliary vector retrieval for x86_64
tools/nolibc: add auxiliary vector retrieval for arm64
tools/nolibc: add auxiliary vector retrieval for arm
tools/nolibc: add auxiliary vector retrieval for riscv
tools/nolibc: add auxiliary vector retrieval for mips
tools/include/nolibc/arch-aarch64.h | 52 +++----
tools/include/nolibc/arch-arm.h | 138 ++++++++++++-------
tools/include/nolibc/arch-i386.h | 60 ++++----
tools/include/nolibc/arch-mips.h | 79 ++++++-----
tools/include/nolibc/arch-riscv.h | 62 +++++----
tools/include/nolibc/arch-s390.h | 70 +++++-----
tools/include/nolibc/arch-x86_64.h | 52 +++----
tools/include/nolibc/errno.h | 4 +-
tools/include/nolibc/stdlib.h | 27 ++++
tools/include/nolibc/sys.h | 22 +++
tools/testing/selftests/nolibc/nolibc-test.c | 30 ++++
11 files changed, 363 insertions(+), 233 deletions(-)
--
2.17.5
The environ is retrieved from the _start code and is easy to store at
this moment. Let's declare the variable weak and store the value into
it. By not being static it will be visible to all units. By being weak,
if some programs already declared it, they will continue to be able to
use it. This was tested on riscv64 both with environ inherited from
_start and extracted from envp.
Signed-off-by: Willy Tarreau <[email protected]>
---
tools/include/nolibc/arch-riscv.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/include/nolibc/arch-riscv.h b/tools/include/nolibc/arch-riscv.h
index c2b5db383d96..1608e6bd94b9 100644
--- a/tools/include/nolibc/arch-riscv.h
+++ b/tools/include/nolibc/arch-riscv.h
@@ -170,6 +170,8 @@ struct sys_stat_struct {
_arg1; \
})
+char **environ __attribute__((weak));
+
/* startup code */
void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
{
@@ -183,6 +185,8 @@ void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
"slli a2, a0, "PTRLOG"\n" // envp (a2) = SZREG*argc ...
"add a2, a2, "SZREG"\n" // + SZREG (skip null)
"add a2,a2,a1\n" // + argv
+ "lui a3, %hi(environ)\n" // a3 = &environ (high bits)
+ "sd a2,%lo(environ)(a3)\n" // store envp(a2) into environ
"andi sp,a1,-16\n" // sp must be 16-byte aligned
"call main\n" // main() returns the status code, we'll exit with it.
"li a7, 93\n" // NR_exit == 93
--
2.17.5
In the _start block we now iterate over envp to find the auxiliary
vector after the NULL. The pointer is saved into an _auxv variable
that is marked as weak so that it's accessible from multiple units.
Signed-off-by: Willy Tarreau <[email protected]>
---
tools/include/nolibc/arch-i386.h | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/tools/include/nolibc/arch-i386.h b/tools/include/nolibc/arch-i386.h
index 60b586120727..e8d0cf545bf1 100644
--- a/tools/include/nolibc/arch-i386.h
+++ b/tools/include/nolibc/arch-i386.h
@@ -179,6 +179,7 @@ struct sys_stat_struct {
})
char **environ __attribute__((weak));
+const unsigned long *_auxv __attribute__((weak));
/* startup code */
/*
@@ -195,6 +196,12 @@ void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
"lea 4(%ebx,%eax,4),%ecx\n" // then a NULL then envp (third arg, %ecx)
"mov %ecx, environ\n" // save environ
"xor %ebp, %ebp\n" // zero the stack frame
+ "mov %ecx, %edx\n" // search for auxv (follows NULL after last env)
+ "0:\n"
+ "add $4, %edx\n" // search for auxv using edx, it follows the
+ "cmp -4(%edx), %ebp\n" // ... NULL after last env (ebp is zero here)
+ "jnz 0b\n"
+ "mov %edx, _auxv\n" // save it into _auxv
"and $-16, %esp\n" // x86 ABI : esp must be 16-byte aligned before
"sub $4, %esp\n" // the call instruction (args are aligned)
"push %ecx\n" // push all registers on the stack so that we
--
2.17.5
In the _start block we now iterate over envp to find the auxiliary
vector after the NULL. The pointer is saved into an _auxv variable
that is marked as weak so that it's accessible from multiple units.
Signed-off-by: Willy Tarreau <[email protected]>
---
tools/include/nolibc/arch-aarch64.h | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/tools/include/nolibc/arch-aarch64.h b/tools/include/nolibc/arch-aarch64.h
index 2e3d9adc4c4c..383baddef701 100644
--- a/tools/include/nolibc/arch-aarch64.h
+++ b/tools/include/nolibc/arch-aarch64.h
@@ -170,6 +170,7 @@ struct sys_stat_struct {
})
char **environ __attribute__((weak));
+const unsigned long *_auxv __attribute__((weak));
/* startup code */
void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
@@ -182,6 +183,12 @@ void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
"add x2, x2, x1\n" // + argv
"adrp x3, environ\n" // x3 = &environ (high bits)
"str x2, [x3, #:lo12:environ]\n" // store envp into environ
+ "mov x4, x2\n" // search for auxv (follows NULL after last env)
+ "0:\n"
+ "ldr x5, [x4], 8\n" // x5 = *x4; x4 += 8
+ "cbnz x5, 0b\n" // and stop at NULL after last env
+ "adrp x3, _auxv\n" // x3 = &_auxv (high bits)
+ "str x4, [x3, #:lo12:_auxv]\n" // store x4 into _auxv
"and sp, x1, -16\n" // sp must be 16-byte aligned in the callee
"bl main\n" // main() returns the status code, we'll exit with it.
"mov x8, 93\n" // NR_exit == 93
--
2.17.5
From: Ammar Faizi <[email protected]>
This function returns the page size used by the running kernel. The
page size value is taken from the auxiliary vector at 'AT_PAGESZ' key.
'getpagesize(2)' is assumed as a syscall becuase the manpage placement
of this function is in entry 2 ('man 2 getpagesize') despite there is
no real 'getpagesize(2)' syscall in the Linux syscall table. Define
this function in 'sys.h'.
Signed-off-by: Ammar Faizi <[email protected]>
Signed-off-by: Willy Tarreau <[email protected]>
---
tools/include/nolibc/sys.h | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/tools/include/nolibc/sys.h b/tools/include/nolibc/sys.h
index 47bf67668860..b5f8cd35c03b 100644
--- a/tools/include/nolibc/sys.h
+++ b/tools/include/nolibc/sys.h
@@ -19,6 +19,7 @@
#include <linux/fs.h>
#include <linux/loop.h>
#include <linux/time.h>
+#include <linux/auxvec.h>
#include "arch.h"
#include "errno.h"
@@ -499,6 +500,26 @@ pid_t gettid(void)
return sys_gettid();
}
+static unsigned long getauxval(unsigned long key);
+
+/*
+ * long getpagesize(void);
+ */
+
+static __attribute__((unused))
+long getpagesize(void)
+{
+ long ret;
+
+ ret = getauxval(AT_PAGESZ);
+ if (!ret) {
+ SET_ERRNO(ENOENT);
+ return -1;
+ }
+
+ return ret;
+}
+
/*
* int gettimeofday(struct timeval *tv, struct timezone *tz);
--
2.17.5
From: Sven Schnelle <[email protected]>
The environ is retrieved from the _start code and is easy to store at
this moment. Let's declare the variable weak and store the value into
it. By not being static it will be visible to all units. By being weak,
if some programs already declared it, they will continue to be able to
use it. This was tested on s390 both with environ inherited from
_start and extracted from envp.
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Willy Tarreau <[email protected]>
---
tools/include/nolibc/arch-s390.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/include/nolibc/arch-s390.h b/tools/include/nolibc/arch-s390.h
index b58f64d47b82..039b454e79f0 100644
--- a/tools/include/nolibc/arch-s390.h
+++ b/tools/include/nolibc/arch-s390.h
@@ -159,6 +159,8 @@ struct sys_stat_struct {
_arg1; \
})
+char **environ __attribute__((weak));
+
/* startup code */
void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
{
@@ -174,6 +176,8 @@ void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
"la %r4,8(%r4)\n" /* advance pointer */
"jnz 0b\n" /* no -> test next pointer */
/* yes -> r4 now contains start of envp */
+ "larl %r1,environ\n"
+ "stg %r4,0(%r1)\n"
"aghi %r15,-160\n" /* allocate new stackframe */
"xc 0(8,%r15),0(%r15)\n" /* clear backchain */
--
2.17.5
The environ is retrieved from the _start code and is easy to store at
this moment. Let's declare the variable weak and store the value into
it. By not being static it will be visible to all units. By being weak,
if some programs already declared it, they will continue to be able to
use it. This was tested both with environ inherited from _start and
extracted from envp.
Signed-off-by: Willy Tarreau <[email protected]>
---
tools/include/nolibc/arch-i386.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/include/nolibc/arch-i386.h b/tools/include/nolibc/arch-i386.h
index ef2a836ee667..60b586120727 100644
--- a/tools/include/nolibc/arch-i386.h
+++ b/tools/include/nolibc/arch-i386.h
@@ -178,6 +178,8 @@ struct sys_stat_struct {
_eax; \
})
+char **environ __attribute__((weak));
+
/* startup code */
/*
* i386 System V ABI mandates:
@@ -191,6 +193,7 @@ void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
"pop %eax\n" // argc (first arg, %eax)
"mov %esp, %ebx\n" // argv[] (second arg, %ebx)
"lea 4(%ebx,%eax,4),%ecx\n" // then a NULL then envp (third arg, %ecx)
+ "mov %ecx, environ\n" // save environ
"xor %ebp, %ebp\n" // zero the stack frame
"and $-16, %esp\n" // x86 ABI : esp must be 16-byte aligned before
"sub $4, %esp\n" // the call instruction (args are aligned)
--
2.17.5
In the _start block we now iterate over envp to find the auxiliary
vector after the NULL. The pointer is saved into an _auxv variable
that is marked as weak so that it's accessible from multiple units.
Signed-off-by: Willy Tarreau <[email protected]>
---
tools/include/nolibc/arch-mips.h | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/tools/include/nolibc/arch-mips.h b/tools/include/nolibc/arch-mips.h
index 7d22f7bc38b3..bf83432d23ed 100644
--- a/tools/include/nolibc/arch-mips.h
+++ b/tools/include/nolibc/arch-mips.h
@@ -177,6 +177,7 @@ struct sys_stat_struct {
})
char **environ __attribute__((weak));
+const unsigned long *_auxv __attribute__((weak));
/* startup code, note that it's called __start on MIPS */
void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) __start(void)
@@ -196,6 +197,16 @@ void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) __start(void)
"lui $a3, %hi(environ)\n" // load environ into a3 (hi)
"addiu $a3, %lo(environ)\n" // load environ into a3 (lo)
"sw $a2,($a3)\n" // store envp(a2) into environ
+
+ "move $t0, $a2\n" // iterate t0 over envp, look for NULL
+ "0:" // do {
+ "lw $a3, ($t0)\n" // a3=*(t0);
+ "bne $a3, $0, 0b\n" // } while (a3);
+ "addiu $t0, $t0, 4\n" // delayed slot: t0+=4;
+ "lui $a3, %hi(_auxv)\n" // load _auxv into a3 (hi)
+ "addiu $a3, %lo(_auxv)\n" // load _auxv into a3 (lo)
+ "sw $t0, ($a3)\n" // store t0 into _auxv
+
"li $t0, -8\n"
"and $sp, $sp, $t0\n" // sp must be 8-byte aligned
"addiu $sp,$sp,-16\n" // the callee expects to save a0..a3 there!
--
2.17.5
Till now errno was declared static so that it could be eliminated if
unused. While the goal is commendable for tiny executables as it allows
to eliminate any data and bss segments when not used, this comes with
some limitations, one of which being that the errno symbol seen in
different units are not the same. Even though this has never been a
real issue given the nature of the programs involved till now, it
happens that referencing the same symbol from multiple units can also
be achieved using weak symbols, with a difference being that only one
of them will be used for all of them. Compared to weak symbols, static
basically have no benefit for regular programs since there are always
at least a few variables in most of these, so the bss segment cannot
be eliminated. E.g:
$ size nolibc-test-static-errno
text data bss dec hex filename
11531 0 48 11579 2d3b nolibc-test-static-errno
Furthermore, the weak symbol doesn't use bss storage at all, resulting
in a slightly section:
$ size nolibc-test-weak-errno
text data bss dec hex filename
11531 0 40 11571 2d33 nolibc-test-weak-errno
This patch thus converts errno from static to weak.
Signed-off-by: Willy Tarreau <[email protected]>
---
tools/include/nolibc/errno.h | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/tools/include/nolibc/errno.h b/tools/include/nolibc/errno.h
index 9dc4919c769b..a44486ff0477 100644
--- a/tools/include/nolibc/errno.h
+++ b/tools/include/nolibc/errno.h
@@ -9,11 +9,9 @@
#include <asm/errno.h>
-/* this way it will be removed if unused */
-static int errno;
-
#ifndef NOLIBC_IGNORE_ERRNO
#define SET_ERRNO(v) do { errno = (v); } while (0)
+int errno __attribute__((weak));
#else
#define SET_ERRNO(v) do { } while (0)
#endif
--
2.17.5
In the _start block we now iterate over envp to find the auxiliary
vector after the NULL. The pointer is saved into an _auxv variable
that is marked as weak so that it's accessible from multiple units.
Signed-off-by: Willy Tarreau <[email protected]>
It was tested in arm, thumb1 and thumb2 modes.
---
tools/include/nolibc/arch-arm.h | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/tools/include/nolibc/arch-arm.h b/tools/include/nolibc/arch-arm.h
index 79666b590e87..42499f23e73c 100644
--- a/tools/include/nolibc/arch-arm.h
+++ b/tools/include/nolibc/arch-arm.h
@@ -197,6 +197,7 @@ struct sys_stat_struct {
})
char **environ __attribute__((weak));
+const unsigned long *_auxv __attribute__((weak));
/* startup code */
void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
@@ -211,6 +212,16 @@ void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
"ldr %r3, 1f\n" // r3 = &environ (see below)
"str %r2, [r3]\n" // store envp into environ
+ "mov r4, r2\n" // search for auxv (follows NULL after last env)
+ "0:\n"
+ "mov r5, r4\n" // r5 = r4
+ "add r4, r4, #4\n" // r4 += 4
+ "ldr r5,[r5]\n" // r5 = *r5 = *(r4-4)
+ "cmp r5, #0\n" // and stop at NULL after last env
+ "bne 0b\n"
+ "ldr %r3, 2f\n" // r3 = &_auxv (low bits)
+ "str r4, [r3]\n" // store r4 into _auxv
+
"mov %r3, $8\n" // AAPCS : sp must be 8-byte aligned in the
"neg %r3, %r3\n" // callee, and bl doesn't push (lr=pc)
"and %r3, %r3, %r1\n" // so we do sp = r1(=sp) & r3(=-8);
@@ -222,6 +233,8 @@ void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
".align 2\n" // below are the pointers to a few variables
"1:\n"
".word environ\n"
+ "2:\n"
+ ".word _auxv\n"
);
__builtin_unreachable();
}
--
2.17.5
The historic nolibc code did not include asm/fcntl.h and had to define
the various O_RDWR etc macros in each arch-specific file (since such
values differ between certain archs). This was found at least once to
induce bugs due to wrong definitions. Let's get rid of all of them and
include asm/nolibc.h from sys.h instead. This was verified to work
properly on all supported architectures.
Signed-off-by: Willy Tarreau <[email protected]>
---
tools/include/nolibc/arch-aarch64.h | 12 ------------
tools/include/nolibc/arch-arm.h | 12 ------------
tools/include/nolibc/arch-i386.h | 12 ------------
tools/include/nolibc/arch-mips.h | 12 ------------
tools/include/nolibc/arch-riscv.h | 12 ------------
tools/include/nolibc/arch-s390.h | 12 ------------
tools/include/nolibc/arch-x86_64.h | 12 ------------
tools/include/nolibc/sys.h | 1 +
8 files changed, 1 insertion(+), 84 deletions(-)
diff --git a/tools/include/nolibc/arch-aarch64.h b/tools/include/nolibc/arch-aarch64.h
index 4d263661411f..f480993159ec 100644
--- a/tools/include/nolibc/arch-aarch64.h
+++ b/tools/include/nolibc/arch-aarch64.h
@@ -7,18 +7,6 @@
#ifndef _NOLIBC_ARCH_AARCH64_H
#define _NOLIBC_ARCH_AARCH64_H
-/* O_* macros for fcntl/open are architecture-specific */
-#define O_RDONLY 0
-#define O_WRONLY 1
-#define O_RDWR 2
-#define O_CREAT 0x40
-#define O_EXCL 0x80
-#define O_NOCTTY 0x100
-#define O_TRUNC 0x200
-#define O_APPEND 0x400
-#define O_NONBLOCK 0x800
-#define O_DIRECTORY 0x4000
-
/* The struct returned by the newfstatat() syscall. Differs slightly from the
* x86_64's stat one by field ordering, so be careful.
*/
diff --git a/tools/include/nolibc/arch-arm.h b/tools/include/nolibc/arch-arm.h
index ef94df2d93d5..48bd95492c87 100644
--- a/tools/include/nolibc/arch-arm.h
+++ b/tools/include/nolibc/arch-arm.h
@@ -7,18 +7,6 @@
#ifndef _NOLIBC_ARCH_ARM_H
#define _NOLIBC_ARCH_ARM_H
-/* O_* macros for fcntl/open are architecture-specific */
-#define O_RDONLY 0
-#define O_WRONLY 1
-#define O_RDWR 2
-#define O_CREAT 0x40
-#define O_EXCL 0x80
-#define O_NOCTTY 0x100
-#define O_TRUNC 0x200
-#define O_APPEND 0x400
-#define O_NONBLOCK 0x800
-#define O_DIRECTORY 0x4000
-
/* The struct returned by the stat() syscall, 32-bit only, the syscall returns
* exactly 56 bytes (stops before the unused array). In big endian, the format
* differs as devices are returned as short only.
diff --git a/tools/include/nolibc/arch-i386.h b/tools/include/nolibc/arch-i386.h
index b1bed2d87f74..ef2a836ee667 100644
--- a/tools/include/nolibc/arch-i386.h
+++ b/tools/include/nolibc/arch-i386.h
@@ -7,18 +7,6 @@
#ifndef _NOLIBC_ARCH_I386_H
#define _NOLIBC_ARCH_I386_H
-/* O_* macros for fcntl/open are architecture-specific */
-#define O_RDONLY 0
-#define O_WRONLY 1
-#define O_RDWR 2
-#define O_CREAT 0x40
-#define O_EXCL 0x80
-#define O_NOCTTY 0x100
-#define O_TRUNC 0x200
-#define O_APPEND 0x400
-#define O_NONBLOCK 0x800
-#define O_DIRECTORY 0x10000
-
/* The struct returned by the stat() syscall, 32-bit only, the syscall returns
* exactly 56 bytes (stops before the unused array).
*/
diff --git a/tools/include/nolibc/arch-mips.h b/tools/include/nolibc/arch-mips.h
index 11270ef25ea5..a3764f6d267e 100644
--- a/tools/include/nolibc/arch-mips.h
+++ b/tools/include/nolibc/arch-mips.h
@@ -7,18 +7,6 @@
#ifndef _NOLIBC_ARCH_MIPS_H
#define _NOLIBC_ARCH_MIPS_H
-/* O_* macros for fcntl/open are architecture-specific */
-#define O_RDONLY 0
-#define O_WRONLY 1
-#define O_RDWR 2
-#define O_APPEND 0x0008
-#define O_NONBLOCK 0x0080
-#define O_CREAT 0x0100
-#define O_TRUNC 0x0200
-#define O_EXCL 0x0400
-#define O_NOCTTY 0x0800
-#define O_DIRECTORY 0x10000
-
/* The struct returned by the stat() syscall. 88 bytes are returned by the
* syscall.
*/
diff --git a/tools/include/nolibc/arch-riscv.h b/tools/include/nolibc/arch-riscv.h
index bee769e6885c..c2b5db383d96 100644
--- a/tools/include/nolibc/arch-riscv.h
+++ b/tools/include/nolibc/arch-riscv.h
@@ -7,18 +7,6 @@
#ifndef _NOLIBC_ARCH_RISCV_H
#define _NOLIBC_ARCH_RISCV_H
-/* O_* macros for fcntl/open are architecture-specific */
-#define O_RDONLY 0
-#define O_WRONLY 1
-#define O_RDWR 2
-#define O_CREAT 0x40
-#define O_EXCL 0x80
-#define O_NOCTTY 0x100
-#define O_TRUNC 0x200
-#define O_APPEND 0x400
-#define O_NONBLOCK 0x800
-#define O_DIRECTORY 0x10000
-
struct sys_stat_struct {
unsigned long st_dev; /* Device. */
unsigned long st_ino; /* File serial number. */
diff --git a/tools/include/nolibc/arch-s390.h b/tools/include/nolibc/arch-s390.h
index 2c0b8847c050..b58f64d47b82 100644
--- a/tools/include/nolibc/arch-s390.h
+++ b/tools/include/nolibc/arch-s390.h
@@ -7,18 +7,6 @@
#define _NOLIBC_ARCH_S390_H
#include <asm/unistd.h>
-/* O_* macros for fcntl/open are architecture-specific */
-#define O_RDONLY 0
-#define O_WRONLY 1
-#define O_RDWR 2
-#define O_CREAT 0x40
-#define O_EXCL 0x80
-#define O_NOCTTY 0x100
-#define O_TRUNC 0x200
-#define O_APPEND 0x400
-#define O_NONBLOCK 0x800
-#define O_DIRECTORY 0x10000
-
/* The struct returned by the stat() syscall, equivalent to stat64(). The
* syscall returns 116 bytes and stops in the middle of __unused.
*/
diff --git a/tools/include/nolibc/arch-x86_64.h b/tools/include/nolibc/arch-x86_64.h
index c70a84612a9e..8d482505c347 100644
--- a/tools/include/nolibc/arch-x86_64.h
+++ b/tools/include/nolibc/arch-x86_64.h
@@ -7,18 +7,6 @@
#ifndef _NOLIBC_ARCH_X86_64_H
#define _NOLIBC_ARCH_X86_64_H
-/* O_* macros for fcntl/open are architecture-specific */
-#define O_RDONLY 0
-#define O_WRONLY 1
-#define O_RDWR 2
-#define O_CREAT 0x40
-#define O_EXCL 0x80
-#define O_NOCTTY 0x100
-#define O_TRUNC 0x200
-#define O_APPEND 0x400
-#define O_NONBLOCK 0x800
-#define O_DIRECTORY 0x10000
-
/* The struct returned by the stat() syscall, equivalent to stat64(). The
* syscall returns 116 bytes and stops in the middle of __unused.
*/
diff --git a/tools/include/nolibc/sys.h b/tools/include/nolibc/sys.h
index a42d7c405bdc..47bf67668860 100644
--- a/tools/include/nolibc/sys.h
+++ b/tools/include/nolibc/sys.h
@@ -11,6 +11,7 @@
#include "std.h"
/* system includes */
+#include <asm/fcntl.h> // for O_*
#include <asm/unistd.h>
#include <asm/signal.h> // for SIGCHLD
#include <asm/ioctls.h>
--
2.17.5
Passing -mthumb to the kernel.org arm toolchain failed to build because it
defaults to armv5 hence thumb1, which has a fairly limited instruction set
compared to thumb2 enabled with armv7 that is much more complete. It's not
very difficult to adjust the instructions to also build on thumb1, it only
adds a total of 3 instructions, so it's worth doing it at least to ease use
by casual testers. It was verified that the adjusted code now builds and
works fine for armv5, thumb1, armv7 and thumb2, as long as frame pointers
are not used.
Signed-off-by: Willy Tarreau <[email protected]>
---
tools/include/nolibc/arch-arm.h | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/tools/include/nolibc/arch-arm.h b/tools/include/nolibc/arch-arm.h
index 875b21975137..e4ba77b0310f 100644
--- a/tools/include/nolibc/arch-arm.h
+++ b/tools/include/nolibc/arch-arm.h
@@ -180,10 +180,16 @@ void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
__asm__ volatile (
"pop {%r0}\n" // argc was in the stack
"mov %r1, %sp\n" // argv = sp
- "add %r2, %r1, %r0, lsl #2\n" // envp = argv + 4*argc ...
- "add %r2, %r2, $4\n" // ... + 4
- "and %r3, %r1, $-8\n" // AAPCS : sp must be 8-byte aligned in the
- "mov %sp, %r3\n" // callee, an bl doesn't push (lr=pc)
+
+ "add %r2, %r0, $1\n" // envp = (argc + 1) ...
+ "lsl %r2, %r2, $2\n" // * 4 ...
+ "add %r2, %r2, %r1\n" // + argv
+
+ "mov %r3, $8\n" // AAPCS : sp must be 8-byte aligned in the
+ "neg %r3, %r3\n" // callee, and bl doesn't push (lr=pc)
+ "and %r3, %r3, %r1\n" // so we do sp = r1(=sp) & r3(=-8);
+ "mov %sp, %r3\n" //
+
"bl main\n" // main() returns the status code, we'll exit with it.
"movs r7, $1\n" // NR_exit == 1
"svc $0x00\n"
--
2.17.5
The out-of-block asm() statement carrying _start does not allow the
compiler to know what section the assembly code is being emitted to,
and there's no easy way to push/pop the current section and restore
it. It sometimes causes issues depending on the include files ordering
and compiler optimizations. For example if a variable is declared
immediately before the asm() block and another one after, the compiler
assumes that the current section is still .bss and doesn't re-emit it,
making the second variable appear inside the .text section instead.
Forcing .bss at the end of the _start block doesn't work either because
at certain optimizations the compiler may reorder blocks and will make
some real code appear just after this block.
A significant number of solutions were attempted, but many of them were
still sensitive to section reordering. In the end, the best way to make
sure the compiler and assembler agree on the current section is to place
this code inside a function. Here the function is directly called _start
and configured not to emit a frame-pointer, hence to have no prologue.
If some future architectures would still emit some prologue, another
working approach consists in naming the function differently and placing
the _start label inside the asm statement. But the current solution is
simpler.
It was tested with nolibc-test at -O,-O0,-O2,-O3,-Os for arm,arm64,i386,
mips,riscv,s390 and x86_64.
Signed-off-by: Willy Tarreau <[email protected]>
---
tools/include/nolibc/arch-aarch64.h | 29 ++++++++--------
tools/include/nolibc/arch-arm.h | 40 +++++++++-------------
tools/include/nolibc/arch-i386.h | 38 +++++++++++----------
tools/include/nolibc/arch-mips.h | 51 +++++++++++++++--------------
tools/include/nolibc/arch-riscv.h | 36 ++++++++++----------
tools/include/nolibc/arch-s390.h | 44 +++++++++++++------------
tools/include/nolibc/arch-x86_64.h | 30 +++++++++--------
7 files changed, 135 insertions(+), 133 deletions(-)
diff --git a/tools/include/nolibc/arch-aarch64.h b/tools/include/nolibc/arch-aarch64.h
index f68baf8f395f..4d263661411f 100644
--- a/tools/include/nolibc/arch-aarch64.h
+++ b/tools/include/nolibc/arch-aarch64.h
@@ -182,18 +182,19 @@ struct sys_stat_struct {
})
/* startup code */
-__asm__ (".section .text\n"
- ".weak _start\n"
- "_start:\n"
- "ldr x0, [sp]\n" // argc (x0) was in the stack
- "add x1, sp, 8\n" // argv (x1) = sp
- "lsl x2, x0, 3\n" // envp (x2) = 8*argc ...
- "add x2, x2, 8\n" // + 8 (skip null)
- "add x2, x2, x1\n" // + argv
- "and sp, x1, -16\n" // sp must be 16-byte aligned in the callee
- "bl main\n" // main() returns the status code, we'll exit with it.
- "mov x8, 93\n" // NR_exit == 93
- "svc #0\n"
- "");
-
+void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
+{
+ __asm__ volatile (
+ "ldr x0, [sp]\n" // argc (x0) was in the stack
+ "add x1, sp, 8\n" // argv (x1) = sp
+ "lsl x2, x0, 3\n" // envp (x2) = 8*argc ...
+ "add x2, x2, 8\n" // + 8 (skip null)
+ "add x2, x2, x1\n" // + argv
+ "and sp, x1, -16\n" // sp must be 16-byte aligned in the callee
+ "bl main\n" // main() returns the status code, we'll exit with it.
+ "mov x8, 93\n" // NR_exit == 93
+ "svc #0\n"
+ );
+ __builtin_unreachable();
+}
#endif // _NOLIBC_ARCH_AARCH64_H
diff --git a/tools/include/nolibc/arch-arm.h b/tools/include/nolibc/arch-arm.h
index f31be8e967d6..875b21975137 100644
--- a/tools/include/nolibc/arch-arm.h
+++ b/tools/include/nolibc/arch-arm.h
@@ -175,30 +175,20 @@ struct sys_stat_struct {
})
/* startup code */
-__asm__ (".section .text\n"
- ".weak _start\n"
- "_start:\n"
-#if defined(__THUMBEB__) || defined(__THUMBEL__)
- /* We enter here in 32-bit mode but if some previous functions were in
- * 16-bit mode, the assembler cannot know, so we need to tell it we're in
- * 32-bit now, then switch to 16-bit (is there a better way to do it than
- * adding 1 by hand ?) and tell the asm we're now in 16-bit mode so that
- * it generates correct instructions. Note that we do not support thumb1.
- */
- ".code 32\n"
- "add r0, pc, #1\n"
- "bx r0\n"
- ".code 16\n"
-#endif
- "pop {%r0}\n" // argc was in the stack
- "mov %r1, %sp\n" // argv = sp
- "add %r2, %r1, %r0, lsl #2\n" // envp = argv + 4*argc ...
- "add %r2, %r2, $4\n" // ... + 4
- "and %r3, %r1, $-8\n" // AAPCS : sp must be 8-byte aligned in the
- "mov %sp, %r3\n" // callee, an bl doesn't push (lr=pc)
- "bl main\n" // main() returns the status code, we'll exit with it.
- "movs r7, $1\n" // NR_exit == 1
- "svc $0x00\n"
- "");
+void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
+{
+ __asm__ volatile (
+ "pop {%r0}\n" // argc was in the stack
+ "mov %r1, %sp\n" // argv = sp
+ "add %r2, %r1, %r0, lsl #2\n" // envp = argv + 4*argc ...
+ "add %r2, %r2, $4\n" // ... + 4
+ "and %r3, %r1, $-8\n" // AAPCS : sp must be 8-byte aligned in the
+ "mov %sp, %r3\n" // callee, an bl doesn't push (lr=pc)
+ "bl main\n" // main() returns the status code, we'll exit with it.
+ "movs r7, $1\n" // NR_exit == 1
+ "svc $0x00\n"
+ );
+ __builtin_unreachable();
+}
#endif // _NOLIBC_ARCH_ARM_H
diff --git a/tools/include/nolibc/arch-i386.h b/tools/include/nolibc/arch-i386.h
index d7e7212346e2..b1bed2d87f74 100644
--- a/tools/include/nolibc/arch-i386.h
+++ b/tools/include/nolibc/arch-i386.h
@@ -197,23 +197,25 @@ struct sys_stat_struct {
* 2) The deepest stack frame should be set to zero
*
*/
-__asm__ (".section .text\n"
- ".weak _start\n"
- "_start:\n"
- "pop %eax\n" // argc (first arg, %eax)
- "mov %esp, %ebx\n" // argv[] (second arg, %ebx)
- "lea 4(%ebx,%eax,4),%ecx\n" // then a NULL then envp (third arg, %ecx)
- "xor %ebp, %ebp\n" // zero the stack frame
- "and $-16, %esp\n" // x86 ABI : esp must be 16-byte aligned before
- "sub $4, %esp\n" // the call instruction (args are aligned)
- "push %ecx\n" // push all registers on the stack so that we
- "push %ebx\n" // support both regparm and plain stack modes
- "push %eax\n"
- "call main\n" // main() returns the status code in %eax
- "mov %eax, %ebx\n" // retrieve exit code (32-bit int)
- "movl $1, %eax\n" // NR_exit == 1
- "int $0x80\n" // exit now
- "hlt\n" // ensure it does not
- "");
+void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
+{
+ __asm__ volatile (
+ "pop %eax\n" // argc (first arg, %eax)
+ "mov %esp, %ebx\n" // argv[] (second arg, %ebx)
+ "lea 4(%ebx,%eax,4),%ecx\n" // then a NULL then envp (third arg, %ecx)
+ "xor %ebp, %ebp\n" // zero the stack frame
+ "and $-16, %esp\n" // x86 ABI : esp must be 16-byte aligned before
+ "sub $4, %esp\n" // the call instruction (args are aligned)
+ "push %ecx\n" // push all registers on the stack so that we
+ "push %ebx\n" // support both regparm and plain stack modes
+ "push %eax\n"
+ "call main\n" // main() returns the status code in %eax
+ "mov %eax, %ebx\n" // retrieve exit code (32-bit int)
+ "movl $1, %eax\n" // NR_exit == 1
+ "int $0x80\n" // exit now
+ "hlt\n" // ensure it does not
+ );
+ __builtin_unreachable();
+}
#endif // _NOLIBC_ARCH_I386_H
diff --git a/tools/include/nolibc/arch-mips.h b/tools/include/nolibc/arch-mips.h
index 7380093ba9e7..11270ef25ea5 100644
--- a/tools/include/nolibc/arch-mips.h
+++ b/tools/include/nolibc/arch-mips.h
@@ -189,29 +189,32 @@ struct sys_stat_struct {
})
/* startup code, note that it's called __start on MIPS */
-__asm__ (".section .text\n"
- ".weak __start\n"
- ".set nomips16\n"
- ".set push\n"
- ".set noreorder\n"
- ".option pic0\n"
- ".ent __start\n"
- "__start:\n"
- "lw $a0,($sp)\n" // argc was in the stack
- "addiu $a1, $sp, 4\n" // argv = sp + 4
- "sll $a2, $a0, 2\n" // a2 = argc * 4
- "add $a2, $a2, $a1\n" // envp = argv + 4*argc ...
- "addiu $a2, $a2, 4\n" // ... + 4
- "li $t0, -8\n"
- "and $sp, $sp, $t0\n" // sp must be 8-byte aligned
- "addiu $sp,$sp,-16\n" // the callee expects to save a0..a3 there!
- "jal main\n" // main() returns the status code, we'll exit with it.
- "nop\n" // delayed slot
- "move $a0, $v0\n" // retrieve 32-bit exit code from v0
- "li $v0, 4001\n" // NR_exit == 4001
- "syscall\n"
- ".end __start\n"
- ".set pop\n"
- "");
+void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) __start(void)
+{
+ __asm__ volatile (
+ //".set nomips16\n"
+ ".set push\n"
+ ".set noreorder\n"
+ ".option pic0\n"
+ //".ent __start\n"
+ //"__start:\n"
+ "lw $a0,($sp)\n" // argc was in the stack
+ "addiu $a1, $sp, 4\n" // argv = sp + 4
+ "sll $a2, $a0, 2\n" // a2 = argc * 4
+ "add $a2, $a2, $a1\n" // envp = argv + 4*argc ...
+ "addiu $a2, $a2, 4\n" // ... + 4
+ "li $t0, -8\n"
+ "and $sp, $sp, $t0\n" // sp must be 8-byte aligned
+ "addiu $sp,$sp,-16\n" // the callee expects to save a0..a3 there!
+ "jal main\n" // main() returns the status code, we'll exit with it.
+ "nop\n" // delayed slot
+ "move $a0, $v0\n" // retrieve 32-bit exit code from v0
+ "li $v0, 4001\n" // NR_exit == 4001
+ "syscall\n"
+ //".end __start\n"
+ ".set pop\n"
+ );
+ __builtin_unreachable();
+}
#endif // _NOLIBC_ARCH_MIPS_H
diff --git a/tools/include/nolibc/arch-riscv.h b/tools/include/nolibc/arch-riscv.h
index a3bdd9803f8c..bee769e6885c 100644
--- a/tools/include/nolibc/arch-riscv.h
+++ b/tools/include/nolibc/arch-riscv.h
@@ -183,22 +183,24 @@ struct sys_stat_struct {
})
/* startup code */
-__asm__ (".section .text\n"
- ".weak _start\n"
- "_start:\n"
- ".option push\n"
- ".option norelax\n"
- "lla gp, __global_pointer$\n"
- ".option pop\n"
- "lw a0, 0(sp)\n" // argc (a0) was in the stack
- "add a1, sp, "SZREG"\n" // argv (a1) = sp
- "slli a2, a0, "PTRLOG"\n" // envp (a2) = SZREG*argc ...
- "add a2, a2, "SZREG"\n" // + SZREG (skip null)
- "add a2,a2,a1\n" // + argv
- "andi sp,a1,-16\n" // sp must be 16-byte aligned
- "call main\n" // main() returns the status code, we'll exit with it.
- "li a7, 93\n" // NR_exit == 93
- "ecall\n"
- "");
+void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
+{
+ __asm__ volatile (
+ ".option push\n"
+ ".option norelax\n"
+ "lla gp, __global_pointer$\n"
+ ".option pop\n"
+ "lw a0, 0(sp)\n" // argc (a0) was in the stack
+ "add a1, sp, "SZREG"\n" // argv (a1) = sp
+ "slli a2, a0, "PTRLOG"\n" // envp (a2) = SZREG*argc ...
+ "add a2, a2, "SZREG"\n" // + SZREG (skip null)
+ "add a2,a2,a1\n" // + argv
+ "andi sp,a1,-16\n" // sp must be 16-byte aligned
+ "call main\n" // main() returns the status code, we'll exit with it.
+ "li a7, 93\n" // NR_exit == 93
+ "ecall\n"
+ );
+ __builtin_unreachable();
+}
#endif // _NOLIBC_ARCH_RISCV_H
diff --git a/tools/include/nolibc/arch-s390.h b/tools/include/nolibc/arch-s390.h
index 76bc8fdaf922..2c0b8847c050 100644
--- a/tools/include/nolibc/arch-s390.h
+++ b/tools/include/nolibc/arch-s390.h
@@ -172,27 +172,29 @@ struct sys_stat_struct {
})
/* startup code */
-__asm__ (".section .text\n"
- ".weak _start\n"
- "_start:\n"
- "lg %r2,0(%r15)\n" /* argument count */
- "la %r3,8(%r15)\n" /* argument pointers */
-
- "xgr %r0,%r0\n" /* r0 will be our NULL value */
- /* search for envp */
- "lgr %r4,%r3\n" /* start at argv */
- "0:\n"
- "clg %r0,0(%r4)\n" /* entry zero? */
- "la %r4,8(%r4)\n" /* advance pointer */
- "jnz 0b\n" /* no -> test next pointer */
- /* yes -> r4 now contains start of envp */
-
- "aghi %r15,-160\n" /* allocate new stackframe */
- "xc 0(8,%r15),0(%r15)\n" /* clear backchain */
- "brasl %r14,main\n" /* ret value of main is arg to exit */
- "lghi %r1,1\n" /* __NR_exit */
- "svc 0\n"
- "");
+void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
+{
+ __asm__ volatile (
+ "lg %r2,0(%r15)\n" /* argument count */
+ "la %r3,8(%r15)\n" /* argument pointers */
+
+ "xgr %r0,%r0\n" /* r0 will be our NULL value */
+ /* search for envp */
+ "lgr %r4,%r3\n" /* start at argv */
+ "0:\n"
+ "clg %r0,0(%r4)\n" /* entry zero? */
+ "la %r4,8(%r4)\n" /* advance pointer */
+ "jnz 0b\n" /* no -> test next pointer */
+ /* yes -> r4 now contains start of envp */
+
+ "aghi %r15,-160\n" /* allocate new stackframe */
+ "xc 0(8,%r15),0(%r15)\n" /* clear backchain */
+ "brasl %r14,main\n" /* ret value of main is arg to exit */
+ "lghi %r1,1\n" /* __NR_exit */
+ "svc 0\n"
+ );
+ __builtin_unreachable();
+}
struct s390_mmap_arg_struct {
unsigned long addr;
diff --git a/tools/include/nolibc/arch-x86_64.h b/tools/include/nolibc/arch-x86_64.h
index 0e1e9eb8545d..c70a84612a9e 100644
--- a/tools/include/nolibc/arch-x86_64.h
+++ b/tools/include/nolibc/arch-x86_64.h
@@ -197,19 +197,21 @@ struct sys_stat_struct {
* 2) The deepest stack frame should be zero (the %rbp).
*
*/
-__asm__ (".section .text\n"
- ".weak _start\n"
- "_start:\n"
- "pop %rdi\n" // argc (first arg, %rdi)
- "mov %rsp, %rsi\n" // argv[] (second arg, %rsi)
- "lea 8(%rsi,%rdi,8),%rdx\n" // then a NULL then envp (third arg, %rdx)
- "xor %ebp, %ebp\n" // zero the stack frame
- "and $-16, %rsp\n" // x86 ABI : esp must be 16-byte aligned before call
- "call main\n" // main() returns the status code, we'll exit with it.
- "mov %eax, %edi\n" // retrieve exit code (32 bit)
- "mov $60, %eax\n" // NR_exit == 60
- "syscall\n" // really exit
- "hlt\n" // ensure it does not return
- "");
+void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
+{
+ __asm__ volatile (
+ "pop %rdi\n" // argc (first arg, %rdi)
+ "mov %rsp, %rsi\n" // argv[] (second arg, %rsi)
+ "lea 8(%rsi,%rdi,8),%rdx\n" // then a NULL then envp (third arg, %rdx)
+ "xor %ebp, %ebp\n" // zero the stack frame
+ "and $-16, %rsp\n" // x86 ABI : esp must be 16-byte aligned before call
+ "call main\n" // main() returns the status code, we'll exit with it.
+ "mov %eax, %edi\n" // retrieve exit code (32 bit)
+ "mov $60, %eax\n" // NR_exit == 60
+ "syscall\n" // really exit
+ "hlt\n" // ensure it does not return
+ );
+ __builtin_unreachable();
+}
#endif // _NOLIBC_ARCH_X86_64_H
--
2.17.5
In Thumb mode, register r7 is normally used to store the frame pointer.
By default when optimizing at -Os there's no frame pointer so this works
fine. But if no optimization is set, then build errors occur, indicating
that r7 cannot not be used. It's difficult to cheat because it's the
compiler that is complaining, not the assembler, so it's not even possible
to report that the register was clobbered. The solution consists in saving
and restoring r7 around the syscall, but this slightly inflates the code.
The syscall number is passed via r6 which is never used by syscalls.
The current patch adds a few macroes which do that only in Thumb mode,
and which continue to directly assign the syscall number to register r7
in ARM mode. Now this always builds and works for all modes (tested on
Arm, Thumbv1, Thumbv2 modes, at -Os, -O0, -O0 -fomit-frame-pointer).
The code is very slightly inflated in thumb-mode without frame-pointers
compared to previously (e.g. 7928 vs 7864 bytes for nolibc-test) but at
least it's always operational. And it's possible to disable this mechanism
by setting NOLIBC_OMIT_FRAME_POINTER.
Signed-off-by: Willy Tarreau <[email protected]>
---
tools/include/nolibc/arch-arm.h | 60 ++++++++++++++++++++++++++-------
1 file changed, 47 insertions(+), 13 deletions(-)
diff --git a/tools/include/nolibc/arch-arm.h b/tools/include/nolibc/arch-arm.h
index e4ba77b0310f..ef94df2d93d5 100644
--- a/tools/include/nolibc/arch-arm.h
+++ b/tools/include/nolibc/arch-arm.h
@@ -70,20 +70,44 @@ struct sys_stat_struct {
* don't have to experience issues with register constraints.
* - the syscall number is always specified last in order to allow to force
* some registers before (gcc refuses a %-register at the last position).
+ * - in thumb mode without -fomit-frame-pointer, r7 is also used to store the
+ * frame pointer, and we cannot directly assign it as a register variable,
+ * nor can we clobber it. Instead we assign the r6 register and swap it
+ * with r7 before calling svc, and r6 is marked as clobbered.
+ * We're just using any regular register which we assign to r7 after saving
+ * it.
*
* Also, ARM supports the old_select syscall if newselect is not available
*/
#define __ARCH_WANT_SYS_OLD_SELECT
+#if (defined(__THUMBEB__) || defined(__THUMBEL__)) && \
+ !defined(NOLIBC_OMIT_FRAME_POINTER)
+/* swap r6,r7 needed in Thumb mode since we can't use nor clobber r7 */
+#define _NOLIBC_SYSCALL_REG "r6"
+#define _NOLIBC_THUMB_SET_R7 "eor r7, r6\neor r6, r7\neor r7, r6\n"
+#define _NOLIBC_THUMB_RESTORE_R7 "mov r7, r6\n"
+
+#else /* we're in ARM mode */
+/* in Arm mode we can directly use r7 */
+#define _NOLIBC_SYSCALL_REG "r7"
+#define _NOLIBC_THUMB_SET_R7 ""
+#define _NOLIBC_THUMB_RESTORE_R7 ""
+
+#endif /* end THUMB */
+
#define my_syscall0(num) \
({ \
- register long _num __asm__ ("r7") = (num); \
+ register long _num __asm__(_NOLIBC_SYSCALL_REG) = (num); \
register long _arg1 __asm__ ("r0"); \
\
__asm__ volatile ( \
+ _NOLIBC_THUMB_SET_R7 \
"svc #0\n" \
- : "=r"(_arg1) \
- : "r"(_num) \
+ _NOLIBC_THUMB_RESTORE_R7 \
+ : "=r"(_arg1), "=r"(_num) \
+ : "r"(_arg1), \
+ "r"(_num) \
: "memory", "cc", "lr" \
); \
_arg1; \
@@ -91,12 +115,14 @@ struct sys_stat_struct {
#define my_syscall1(num, arg1) \
({ \
- register long _num __asm__ ("r7") = (num); \
+ register long _num __asm__(_NOLIBC_SYSCALL_REG) = (num); \
register long _arg1 __asm__ ("r0") = (long)(arg1); \
\
__asm__ volatile ( \
+ _NOLIBC_THUMB_SET_R7 \
"svc #0\n" \
- : "=r"(_arg1) \
+ _NOLIBC_THUMB_RESTORE_R7 \
+ : "=r"(_arg1), "=r" (_num) \
: "r"(_arg1), \
"r"(_num) \
: "memory", "cc", "lr" \
@@ -106,13 +132,15 @@ struct sys_stat_struct {
#define my_syscall2(num, arg1, arg2) \
({ \
- register long _num __asm__ ("r7") = (num); \
+ register long _num __asm__(_NOLIBC_SYSCALL_REG) = (num); \
register long _arg1 __asm__ ("r0") = (long)(arg1); \
register long _arg2 __asm__ ("r1") = (long)(arg2); \
\
__asm__ volatile ( \
+ _NOLIBC_THUMB_SET_R7 \
"svc #0\n" \
- : "=r"(_arg1) \
+ _NOLIBC_THUMB_RESTORE_R7 \
+ : "=r"(_arg1), "=r" (_num) \
: "r"(_arg1), "r"(_arg2), \
"r"(_num) \
: "memory", "cc", "lr" \
@@ -122,14 +150,16 @@ struct sys_stat_struct {
#define my_syscall3(num, arg1, arg2, arg3) \
({ \
- register long _num __asm__ ("r7") = (num); \
+ register long _num __asm__(_NOLIBC_SYSCALL_REG) = (num); \
register long _arg1 __asm__ ("r0") = (long)(arg1); \
register long _arg2 __asm__ ("r1") = (long)(arg2); \
register long _arg3 __asm__ ("r2") = (long)(arg3); \
\
__asm__ volatile ( \
+ _NOLIBC_THUMB_SET_R7 \
"svc #0\n" \
- : "=r"(_arg1) \
+ _NOLIBC_THUMB_RESTORE_R7 \
+ : "=r"(_arg1), "=r" (_num) \
: "r"(_arg1), "r"(_arg2), "r"(_arg3), \
"r"(_num) \
: "memory", "cc", "lr" \
@@ -139,15 +169,17 @@ struct sys_stat_struct {
#define my_syscall4(num, arg1, arg2, arg3, arg4) \
({ \
- register long _num __asm__ ("r7") = (num); \
+ register long _num __asm__(_NOLIBC_SYSCALL_REG) = (num); \
register long _arg1 __asm__ ("r0") = (long)(arg1); \
register long _arg2 __asm__ ("r1") = (long)(arg2); \
register long _arg3 __asm__ ("r2") = (long)(arg3); \
register long _arg4 __asm__ ("r3") = (long)(arg4); \
\
__asm__ volatile ( \
+ _NOLIBC_THUMB_SET_R7 \
"svc #0\n" \
- : "=r"(_arg1) \
+ _NOLIBC_THUMB_RESTORE_R7 \
+ : "=r"(_arg1), "=r" (_num) \
: "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), \
"r"(_num) \
: "memory", "cc", "lr" \
@@ -157,7 +189,7 @@ struct sys_stat_struct {
#define my_syscall5(num, arg1, arg2, arg3, arg4, arg5) \
({ \
- register long _num __asm__ ("r7") = (num); \
+ register long _num __asm__(_NOLIBC_SYSCALL_REG) = (num); \
register long _arg1 __asm__ ("r0") = (long)(arg1); \
register long _arg2 __asm__ ("r1") = (long)(arg2); \
register long _arg3 __asm__ ("r2") = (long)(arg3); \
@@ -165,8 +197,10 @@ struct sys_stat_struct {
register long _arg5 __asm__ ("r4") = (long)(arg5); \
\
__asm__ volatile ( \
+ _NOLIBC_THUMB_SET_R7 \
"svc #0\n" \
- : "=r" (_arg1) \
+ _NOLIBC_THUMB_RESTORE_R7 \
+ : "=r"(_arg1), "=r" (_num) \
: "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), \
"r"(_num) \
: "memory", "cc", "lr" \
--
2.17.5
The environ is retrieved from the _start code and is easy to store at
this moment. Let's declare the variable weak and store the value into
it. By not being static it will be visible to all units. By being weak,
if some programs already declared it, they will continue to be able to
use it. This was tested both with environ inherited from _start and
extracted from envp.
Signed-off-by: Willy Tarreau <[email protected]>
---
tools/include/nolibc/arch-x86_64.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/include/nolibc/arch-x86_64.h b/tools/include/nolibc/arch-x86_64.h
index 8d482505c347..683702a16a61 100644
--- a/tools/include/nolibc/arch-x86_64.h
+++ b/tools/include/nolibc/arch-x86_64.h
@@ -178,6 +178,8 @@ struct sys_stat_struct {
_ret; \
})
+char **environ __attribute__((weak));
+
/* startup code */
/*
* x86-64 System V ABI mandates:
@@ -191,6 +193,7 @@ void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
"pop %rdi\n" // argc (first arg, %rdi)
"mov %rsp, %rsi\n" // argv[] (second arg, %rsi)
"lea 8(%rsi,%rdi,8),%rdx\n" // then a NULL then envp (third arg, %rdx)
+ "mov %rdx, environ\n" // save environ
"xor %ebp, %ebp\n" // zero the stack frame
"and $-16, %rsp\n" // x86 ABI : esp must be 16-byte aligned before call
"call main\n" // main() returns the status code, we'll exit with it.
--
2.17.5
From: Sven Schnelle <[email protected]>
In the _start block we now iterate over envp to find the auxiliary
vector after the NULL. The pointer is saved into an _auxv variable
that is marked as weak so that it's accessible from multiple units.
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Willy Tarreau <[email protected]>
---
tools/include/nolibc/arch-s390.h | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/tools/include/nolibc/arch-s390.h b/tools/include/nolibc/arch-s390.h
index 039b454e79f0..6b0e54ed543d 100644
--- a/tools/include/nolibc/arch-s390.h
+++ b/tools/include/nolibc/arch-s390.h
@@ -160,6 +160,7 @@ struct sys_stat_struct {
})
char **environ __attribute__((weak));
+const unsigned long *_auxv __attribute__((weak));
/* startup code */
void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
@@ -179,6 +180,15 @@ void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
"larl %r1,environ\n"
"stg %r4,0(%r1)\n"
+ /* search for auxv */
+ "lgr %r5,%r4\n" /* start at envp */
+ "1:\n"
+ "clg %r0,0(%r5)\n" /* entry zero? */
+ "la %r5,8(%r5)\n" /* advance pointer */
+ "jnz 1b\n" /* no -> test next pointer */
+ "larl %r1,_auxv\n" /* yes -> store value in _auxv */
+ "stg %r5,0(%r1)\n"
+
"aghi %r15,-160\n" /* allocate new stackframe */
"xc 0(8,%r15),0(%r15)\n" /* clear backchain */
"brasl %r14,main\n" /* ret value of main is arg to exit */
--
2.17.5
The environ is retrieved from the _start code and is easy to store at
this moment. Let's declare the variable weak and store the value into
it. By not being static it will be visible to all units. By being weak,
if some programs already declared it, they will continue to be able to
use it. This was tested both with environ inherited from _start and
extracted from envp.
Signed-off-by: Willy Tarreau <[email protected]>
---
tools/include/nolibc/arch-aarch64.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/include/nolibc/arch-aarch64.h b/tools/include/nolibc/arch-aarch64.h
index f480993159ec..2e3d9adc4c4c 100644
--- a/tools/include/nolibc/arch-aarch64.h
+++ b/tools/include/nolibc/arch-aarch64.h
@@ -169,6 +169,8 @@ struct sys_stat_struct {
_arg1; \
})
+char **environ __attribute__((weak));
+
/* startup code */
void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
{
@@ -178,6 +180,8 @@ void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
"lsl x2, x0, 3\n" // envp (x2) = 8*argc ...
"add x2, x2, 8\n" // + 8 (skip null)
"add x2, x2, x1\n" // + argv
+ "adrp x3, environ\n" // x3 = &environ (high bits)
+ "str x2, [x3, #:lo12:environ]\n" // store envp into environ
"and sp, x1, -16\n" // sp must be 16-byte aligned in the callee
"bl main\n" // main() returns the status code, we'll exit with it.
"mov x8, 93\n" // NR_exit == 93
--
2.17.5
From: Ammar Faizi <[email protected]>
Previous commits save the address of the auxiliary vector into a global
variable @_auxv. This commit creates a new function 'getauxval()' as a
helper function to get the auxv value based on the given key.
The behavior of this function is identic with the function documented
in 'man 3 getauxval'. This function is also needed to implement
'getpagesize()' function that we will wire up in the next patches.
Signed-off-by: Ammar Faizi <[email protected]>
Signed-off-by: Willy Tarreau <[email protected]>
---
tools/include/nolibc/stdlib.h | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/tools/include/nolibc/stdlib.h b/tools/include/nolibc/stdlib.h
index a24000d1e822..894c955d027e 100644
--- a/tools/include/nolibc/stdlib.h
+++ b/tools/include/nolibc/stdlib.h
@@ -12,6 +12,7 @@
#include "types.h"
#include "sys.h"
#include "string.h"
+#include <linux/auxvec.h>
struct nolibc_heap {
size_t len;
@@ -108,6 +109,32 @@ char *getenv(const char *name)
return _getenv(name, environ);
}
+static __attribute__((unused))
+unsigned long getauxval(unsigned long type)
+{
+ const unsigned long *auxv = _auxv;
+ unsigned long ret;
+
+ if (!auxv)
+ return 0;
+
+ while (1) {
+ if (!auxv[0] && !auxv[1]) {
+ ret = 0;
+ break;
+ }
+
+ if (auxv[0] == type) {
+ ret = auxv[1];
+ break;
+ }
+
+ auxv += 2;
+ }
+
+ return ret;
+}
+
static __attribute__((unused))
void *malloc(size_t len)
{
--
2.17.5
In the _start block we now iterate over envp to find the auxiliary
vector after the NULL. The pointer is saved into an _auxv variable
that is marked as weak so that it's accessible from multiple units.
It was tested on riscv64 only.
Signed-off-by: Willy Tarreau <[email protected]>
---
tools/include/nolibc/arch-riscv.h | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/tools/include/nolibc/arch-riscv.h b/tools/include/nolibc/arch-riscv.h
index 1608e6bd94b9..e197fcb10ac0 100644
--- a/tools/include/nolibc/arch-riscv.h
+++ b/tools/include/nolibc/arch-riscv.h
@@ -171,6 +171,7 @@ struct sys_stat_struct {
})
char **environ __attribute__((weak));
+const unsigned long *_auxv __attribute__((weak));
/* startup code */
void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
@@ -185,6 +186,15 @@ void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
"slli a2, a0, "PTRLOG"\n" // envp (a2) = SZREG*argc ...
"add a2, a2, "SZREG"\n" // + SZREG (skip null)
"add a2,a2,a1\n" // + argv
+
+ "add a3, a2, zero\n" // iterate a3 over envp to find auxv (after NULL)
+ "0:\n" // do {
+ "ld a4, 0(a3)\n" // a4 = *a3;
+ "add a3, a3, "SZREG"\n" // a3 += sizeof(void*);
+ "bne a4, zero, 0b\n" // } while (a4);
+ "lui a4, %hi(_auxv)\n" // a4 = &_auxv (high bits)
+ "sd a3, %lo(_auxv)(a4)\n" // store a3 into _auxv
+
"lui a3, %hi(environ)\n" // a3 = &environ (high bits)
"sd a2,%lo(environ)(a3)\n" // store envp(a2) into environ
"andi sp,a1,-16\n" // sp must be 16-byte aligned
--
2.17.5
From: Ammar Faizi <[email protected]>
Test the getpagesize() function. Make sure it returns the correct
value.
Signed-off-by: Ammar Faizi <[email protected]>
Signed-off-by: Willy Tarreau <[email protected]>
---
tools/testing/selftests/nolibc/nolibc-test.c | 30 ++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/tools/testing/selftests/nolibc/nolibc-test.c b/tools/testing/selftests/nolibc/nolibc-test.c
index f14f5076fb6d..c4a0c915139c 100644
--- a/tools/testing/selftests/nolibc/nolibc-test.c
+++ b/tools/testing/selftests/nolibc/nolibc-test.c
@@ -442,6 +442,35 @@ int test_getdents64(const char *dir)
return ret;
}
+static int test_getpagesize(void)
+{
+ long x = getpagesize();
+ int c;
+
+ if (x < 0)
+ return x;
+
+#if defined(__x86_64__) || defined(__i386__) || defined(__i486__) || defined(__i586__) || defined(__i686__)
+ /*
+ * x86 family is always 4K page.
+ */
+ c = (x == 4096);
+#elif defined(__aarch64__)
+ /*
+ * Linux aarch64 supports three values of page size: 4K, 16K, and 64K
+ * which are selected at kernel compilation time.
+ */
+ c = (x == 4096 || x == (16 * 1024) || x == (64 * 1024));
+#else
+ /*
+ * Assuming other architectures must have at least 4K page.
+ */
+ c = (x >= 4096);
+#endif
+
+ return !c;
+}
+
/* Run syscall tests between IDs <min> and <max>.
* Return 0 on success, non-zero on failure.
*/
@@ -502,6 +531,7 @@ int run_syscall(int min, int max)
CASE_TEST(gettimeofday_bad2); EXPECT_SYSER(1, gettimeofday(NULL, (void *)1), -1, EFAULT); break;
CASE_TEST(gettimeofday_bad2); EXPECT_SYSER(1, gettimeofday(NULL, (void *)1), -1, EFAULT); break;
#endif
+ CASE_TEST(getpagesize); EXPECT_SYSZR(1, test_getpagesize()); break;
CASE_TEST(ioctl_tiocinq); EXPECT_SYSZR(1, ioctl(0, TIOCINQ, &tmp)); break;
CASE_TEST(ioctl_tiocinq); EXPECT_SYSZR(1, ioctl(0, TIOCINQ, &tmp)); break;
CASE_TEST(link_root1); EXPECT_SYSER(1, link("/", "/"), -1, EEXIST); break;
--
2.17.5
The environ is retrieved from the _start code and is easy to store at
this moment. Let's declare the variable weak and store the value into
it. By not being static it will be visible to all units. By being weak,
if some programs already declared it, they will continue to be able to
use it. This was tested in arm and thumb1 and thumb2 modes, and for each
mode, both with environ inherited from _start and extracted from envp.
Signed-off-by: Willy Tarreau <[email protected]>
---
tools/include/nolibc/arch-arm.h | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/tools/include/nolibc/arch-arm.h b/tools/include/nolibc/arch-arm.h
index 48bd95492c87..79666b590e87 100644
--- a/tools/include/nolibc/arch-arm.h
+++ b/tools/include/nolibc/arch-arm.h
@@ -196,6 +196,8 @@ struct sys_stat_struct {
_arg1; \
})
+char **environ __attribute__((weak));
+
/* startup code */
void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
{
@@ -206,6 +208,8 @@ void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
"add %r2, %r0, $1\n" // envp = (argc + 1) ...
"lsl %r2, %r2, $2\n" // * 4 ...
"add %r2, %r2, %r1\n" // + argv
+ "ldr %r3, 1f\n" // r3 = &environ (see below)
+ "str %r2, [r3]\n" // store envp into environ
"mov %r3, $8\n" // AAPCS : sp must be 8-byte aligned in the
"neg %r3, %r3\n" // callee, and bl doesn't push (lr=pc)
@@ -215,7 +219,10 @@ void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
"bl main\n" // main() returns the status code, we'll exit with it.
"movs r7, $1\n" // NR_exit == 1
"svc $0x00\n"
- );
+ ".align 2\n" // below are the pointers to a few variables
+ "1:\n"
+ ".word environ\n"
+ );
__builtin_unreachable();
}
--
2.17.5
The environ is retrieved from the _start code and is easy to store at
this moment. Let's declare the variable weak and store the value into
it. By not being static it will be visible to all units. By being weak,
if some programs already declared it, they will continue to be able to
use it. This was tested with mips24kc (BE) both with environ inherited
from _start and extracted from envp.
Signed-off-by: Willy Tarreau <[email protected]>
---
tools/include/nolibc/arch-mips.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/include/nolibc/arch-mips.h b/tools/include/nolibc/arch-mips.h
index a3764f6d267e..7d22f7bc38b3 100644
--- a/tools/include/nolibc/arch-mips.h
+++ b/tools/include/nolibc/arch-mips.h
@@ -176,6 +176,8 @@ struct sys_stat_struct {
_arg4 ? -_num : _num; \
})
+char **environ __attribute__((weak));
+
/* startup code, note that it's called __start on MIPS */
void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) __start(void)
{
@@ -191,6 +193,9 @@ void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) __start(void)
"sll $a2, $a0, 2\n" // a2 = argc * 4
"add $a2, $a2, $a1\n" // envp = argv + 4*argc ...
"addiu $a2, $a2, 4\n" // ... + 4
+ "lui $a3, %hi(environ)\n" // load environ into a3 (hi)
+ "addiu $a3, %lo(environ)\n" // load environ into a3 (lo)
+ "sw $a2,($a3)\n" // store envp(a2) into environ
"li $t0, -8\n"
"and $sp, $sp, $t0\n" // sp must be 8-byte aligned
"addiu $sp,$sp,-16\n" // the callee expects to save a0..a3 there!
--
2.17.5
In the _start block we now iterate over envp to find the auxiliary
vector after the NULL. The pointer is saved into an _auxv variable
that is marked as weak so that it's accessible from multiple units.
Signed-off-by: Willy Tarreau <[email protected]>
---
tools/include/nolibc/arch-x86_64.h | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/tools/include/nolibc/arch-x86_64.h b/tools/include/nolibc/arch-x86_64.h
index 683702a16a61..17f6751208e7 100644
--- a/tools/include/nolibc/arch-x86_64.h
+++ b/tools/include/nolibc/arch-x86_64.h
@@ -179,6 +179,7 @@ struct sys_stat_struct {
})
char **environ __attribute__((weak));
+const unsigned long *_auxv __attribute__((weak));
/* startup code */
/*
@@ -195,6 +196,12 @@ void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) _start(void)
"lea 8(%rsi,%rdi,8),%rdx\n" // then a NULL then envp (third arg, %rdx)
"mov %rdx, environ\n" // save environ
"xor %ebp, %ebp\n" // zero the stack frame
+ "mov %rdx, %rax\n" // search for auxv (follows NULL after last env)
+ "0:\n"
+ "add $8, %rax\n" // search for auxv using rax, it follows the
+ "cmp -8(%rax), %rbp\n" // ... NULL after last env (rbp is zero here)
+ "jnz 0b\n"
+ "mov %rax, _auxv\n" // save it into _auxv
"and $-16, %rsp\n" // x86 ABI : esp must be 16-byte aligned before call
"call main\n" // main() returns the status code, we'll exit with it.
"mov %eax, %edi\n" // retrieve exit code (32 bit)
--
2.17.5
On Tue, Jan 10, 2023 at 08:24:12AM +0100, Willy Tarreau wrote:
> Hello Paul,
>
> this 3rd series aims at generally improving usability and maintainance
> of nolibc. It needs to be applied on top of the s390 series:
>
> https://lore.kernel.org/lkml/[email protected]/
>
> - first, I've encountered remaining problems related to section
> reordering happening at certain optimization levels and that is also
> arch-dependent. We were fortunate they didn't appear in rcutorture,
> but I could reproduce them with ARM in Thumb mode at -O0. The problem
> is that our out-of-block asm() statement changes the current section
> to ".text" without the compiler knowing. Thus the compiler may believe
> it's still in .bss and emit variables immediately after (e.g. errno),
> which end up in the wrong section. Switching the section at the end
> to .bss doesn't work either because at -Os I was seeing sys_brk()
> placed immediately after and crashing as .bss is not executable. The
> only safe solution is to turn _start to real functions. This was
> tested on all archs at -O0,-O1,-O2,-O3,-Os and all of them now work.
>
> - second, thumb mode support was in complete on ARM. Only thumb-2 was
> supported and depending on how the toolchain is configured, passing
> "-mthumb" would result in thumb-1 (if armv4/5 was the default) or
> thumb-2 (when armv7 was the default). I discovered this when first
> trying the toolchains from kernel.org because mine works in v7 by
> default, hence thumb-2. The change only replaces a few instructions
> that are not available in thumb-1 by their compatible equivalent. In
> addition, thumb cannot be used at -O0 or with frame pointers in general
> because register r7 is the frame pointer there, and cannot be assigned
> by the compiler. That's bad because r7 carries the syscall number. Now
> when thumb is detected, we simply use a slightly larger setup code
> which uses r6 and swaps it with r7 when performing the call.
>
> - third, the definitions of the (possibly wrong) arch-specific O_* values
> were dropped in favor of those coming from asm/fcntl.h. Not only these
> ones are correct, but doing so will avoid build redefinition warnings
> should the file be included for whatever other reason.
>
> - the errno, environ and the auxiliary vector were really a pain to
> use. errno was declared as a static variable, showing a different
> one to each build unit. environ had to be declared by the application,
> and the auxv had to be both declared and found by the application if
> needed. All three of them have now been declared as weak symbols,
> and environ and _auxv are setup during startup, so that only one
> instance of them exists across the whole binary, and that code
> currently declaring them continues to work. This now means that code
> not using them will not optimize them away anymore, but let's face
> it, errno was always used and no relevant application manages to get
> rid of .bss, so the amount of extra space is really just 8-16 bytes
> total for a much better simplicity for the user.
>
> - getauxval() and getpagesize() were added by Ammar Faizi (along with
> the associated selftests).
>
> This was tested on arm64/armv5/armv7/thumb1/thumb2/i386/x86_64/mips/riscv
> and s390 at all optimization levels. I could also verify that my original
> preinit code continues to build and work fine, so please consider queuing
> it.
Queued and pushed, thank you!
Thanx, Paul
> Thank you!
> Willy
>
> Changes since v1:
> - added missing s-o-b on the last 3 patches
>
> Ammar Faizi (3):
> nolibc/stdlib: Implement `getauxval(3)` function
> nolibc/sys: Implement `getpagesize(2)` function
> selftests/nolibc: Add `getpagesize(2)` selftest
>
> Sven Schnelle (2):
> tools/nolibc: export environ as a weak symbol on s390
> tools/nolibc: add auxiliary vector retrieval for s390
>
> Willy Tarreau (17):
> tools/nolibc: make compiler and assembler agree on the section around
> _start
> tools/nolibc: enable support for thumb1 mode for ARM
> tools/nolibc: support thumb mode with frame pointers on ARM
> tools/nolibc: remove local definitions of O_* flags for open/fcntl
> tools/nolibc: make errno a weak symbol instead of a static one
> tools/nolibc: export environ as a weak symbol on x86_64
> tools/nolibc: export environ as a weak symbol on i386
> tools/nolibc: export environ as a weak symbol on arm64
> tools/nolibc: export environ as a weak symbol on arm
> tools/nolibc: export environ as a weak symbol on mips
> tools/nolibc: export environ as a weak symbol on riscv
> tools/nolibc: add auxiliary vector retrieval for i386
> tools/nolibc: add auxiliary vector retrieval for x86_64
> tools/nolibc: add auxiliary vector retrieval for arm64
> tools/nolibc: add auxiliary vector retrieval for arm
> tools/nolibc: add auxiliary vector retrieval for riscv
> tools/nolibc: add auxiliary vector retrieval for mips
>
> tools/include/nolibc/arch-aarch64.h | 52 +++----
> tools/include/nolibc/arch-arm.h | 138 ++++++++++++-------
> tools/include/nolibc/arch-i386.h | 60 ++++----
> tools/include/nolibc/arch-mips.h | 79 ++++++-----
> tools/include/nolibc/arch-riscv.h | 62 +++++----
> tools/include/nolibc/arch-s390.h | 70 +++++-----
> tools/include/nolibc/arch-x86_64.h | 52 +++----
> tools/include/nolibc/errno.h | 4 +-
> tools/include/nolibc/stdlib.h | 27 ++++
> tools/include/nolibc/sys.h | 22 +++
> tools/testing/selftests/nolibc/nolibc-test.c | 30 ++++
> 11 files changed, 363 insertions(+), 233 deletions(-)
>
> --
> 2.17.5
>