...
> We also need this style of checking for the delta logic in __atoi_add(). I have
> tried a random selection of clang and gcc versions and all of them seem to work
> correctly, but the compile speed is not that good if we want to support the
> worst cases like "((0x900000 + 0x0f0000) + 5)". The shorter form
> "((0x900000+0x0f0000)+5)" is used by ARM+OABI (not supported by nolibc
> currently), so we can strip some trailing branches, but it is still not
> that fast. Of course, the other architectures/variants can use faster
> __atoi_add() versions with fewer branches and without hex detection, comparison
> and calculation.
If there are only a few prefix offsets then the code can be optimised
to explicitly detect them - rather than decoding arbitrary hex values.
After all it only needs to decode the values that actually appear.
The code also needs a compile-time assert that the result
is constant (__builtin_constant_p() will do the check).
But you can't use _Static_assert() to report the error
because that requires an 'integer constant expression'.
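One common way to get such a check, shown here only as a minimal sketch, is to call a function declared with the GCC/clang 'error' attribute on the non-constant path. The names __not_constant(), __const_or_fail() and demo_digit() are hypothetical and are not taken from nolibc, and the check is disabled without optimisation because constant folding is not guaranteed there:

```c
/*
 * Hypothetical helper: declared but never defined. With GCC, and with
 * clang versions that support the 'error' attribute, any call that is
 * still present after optimisation is reported as a compile-time error.
 */
extern long __not_constant(void)
	__attribute__((error("expression is not a compile-time constant")));

#ifdef __OPTIMIZE__
/* Constant input: the call is discarded. Otherwise the build fails. */
#define __const_or_fail(x) \
	(__builtin_constant_p(x) ? (long)(x) : __not_constant())
#else
/* Without optimisation folding is not guaranteed, so skip the check. */
#define __const_or_fail(x) ((long)(x))
#endif

/* Example: one digit taken from a string literal, checked at build time. */
static inline long demo_digit(void)
{
	return __const_or_fail("(4000 + 428)"[8] - '0');
}
```

Unlike _Static_assert(), this accepts values that the compiler can fold but that are not 'integer constant expressions', such as a character read from a string literal.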
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Hi, David
> ...
> > We also need this style of checking for the delta logic in __atoi_add(). I have
> > tried a random selection of clang and gcc versions and all of them seem to work
> > correctly, but the compile speed is not that good if we want to support the
> > worst cases like "((0x900000 + 0x0f0000) + 5)". The shorter form
> > "((0x900000+0x0f0000)+5)" is used by ARM+OABI (not supported by nolibc
> > currently), so we can strip some trailing branches, but it is still not
> > that fast. Of course, the other architectures/variants can use faster
> > __atoi_add() versions with fewer branches and without hex detection, comparison
> > and calculation.
>
> If there are only a few prefix offsets then the code can be optimised
> to explicitly detect them - rather than decoding arbitrary hex values.
> After all it only needs to decode the values that actually appear.
>
> The code also needs a compile-time assert that the result
> is constant (__builtin_constant_p() will do the check).
> But you can't use _Static_assert() to report the error
> because that requires an 'integer constant expression'.
>
Thanks a lot, your suggestion above was very inspiring.
I have looked at ARM and MIPS again and found that only the right-hand
part of their __NR_* definitions is 'dynamic':
arch/mips/include/generated/uapi/asm/unistd_o32.h:#define __NR_io_uring_register (__NR_Linux + 427)
arch/mips/include/generated/uapi/asm/unistd_o32.h:#define __NR_open_tree (__NR_Linux + 428)
arch/mips/include/generated/uapi/asm/unistd_o32.h:#define __NR_move_mount (__NR_Linux + 429)
arch/mips/include/generated/uapi/asm/unistd_o32.h:#define __NR_fsopen (__NR_Linux + 430)
arch/mips/include/generated/uapi/asm/unistd_o32.h:#define __NR_fsconfig (__NR_Linux + 431)
arch/arm/include/generated/uapi/asm/unistd-eabi.h:#define __NR_io_uring_setup (__NR_SYSCALL_BASE + 425)
arch/arm/include/generated/uapi/asm/unistd-eabi.h:#define __NR_io_uring_enter (__NR_SYSCALL_BASE + 426)
arch/arm/include/generated/uapi/asm/unistd-eabi.h:#define __NR_io_uring_register (__NR_SYSCALL_BASE + 427)
arch/arm/include/generated/uapi/asm/unistd-eabi.h:#define __NR_open_tree (__NR_SYSCALL_BASE + 428)
arch/arm/include/generated/uapi/asm/unistd-eabi.h:#define __NR_move_mount (__NR_SYSCALL_BASE + 429)
The left part, __NR_Linux or __NR_SYSCALL_BASE, is always defined, so we
can use its value directly, with no need to stringify and unstringify it.
As a result, the delta addition becomes:
base + __atoi_from(str, sizeof(#base) + 3)
We can simply convert our old __atoi() into __atoi_from() by changing the
fixed 'from' of 0 into a dynamic 'from', and a simple __get_from() can
find the right offset for variants such as:
(__NR_Linux+1), (__NR_Linux + 1).
So, the new __atoi_add() becomes:
__atoi_add(str, base):
--> __atoi_add(__stringify(__NR_open_tree), __NR_Linux)
--> __atoi_add("(4000 + 428)", 4000)
--> __atoi_from("(4000 + 428)", sizeof(#4000) + 3) + 4000
--> __atoi_from("(4000 + 428)", 8) + 4000
                  ~~~~   ^     /     ~~~~
                  base    \___/      base
                          from
--> 428 + 4000
--> 4428
It is very fast and its cost is deterministic. It also works for
the most complicated case mentioned earlier:
__atoi_add("((0x900000+0x0f0000)+5)", (0x900000+0x0f0000))
--> __atoi_from("((0x900000+0x0f0000)+5)", sizeof(#(0x900000+0x0f0000)) + 1) + (0x900000+0x0f0000)
                                      ^                   /
                                       \_________________/
--> ...
--> 5 + (0x900000+0x0f0000)
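The two walk-throughs above can be sketched in plain C roughly as follows. This is only an illustration under stated assumptions, not the actual nolibc patch: the DEMO_* macros are hypothetical stand-ins for the MIPS and ARM OABI definitions, the __atoi_ch/__atoi_isd/__atoi_val/__atoi_N helpers are invented names, and only up to four decimal digits are decoded.

```c
/* Hypothetical stand-ins for the values quoted in this thread. */
#define DEMO_NR_Linux     4000
#define DEMO_NR_open_tree (DEMO_NR_Linux + 428)
#define DEMO_ARM_BASE     (0x900000+0x0f0000)
#define DEMO_ARM_set_tls  (DEMO_ARM_BASE+5)

#define __stringify_1(x) #x
#define __stringify(x)   __stringify_1(x)

/* Character at index i of string literal s, or '\0' past its end. */
#define __atoi_ch(s, i)  ((i) < sizeof(s) ? (s)[(i)] : '\0')
#define __atoi_isd(s, i) (__atoi_ch(s, i) >= '0' && __atoi_ch(s, i) <= '9')
#define __atoi_val(s, i) (__atoi_ch(s, i) - '0')

/* One nesting level per digit; decoding stops at the first non-digit. */
#define __atoi_1(s, i, a) \
	(__atoi_isd(s, i) ? (a) * 10 + __atoi_val(s, i) : (a))
#define __atoi_2(s, i, a) \
	(__atoi_isd(s, i) ? __atoi_1(s, (i) + 1, (a) * 10 + __atoi_val(s, i)) : (a))
#define __atoi_3(s, i, a) \
	(__atoi_isd(s, i) ? __atoi_2(s, (i) + 1, (a) * 10 + __atoi_val(s, i)) : (a))
#define __atoi_4(s, i, a) \
	(__atoi_isd(s, i) ? __atoi_3(s, (i) + 1, (a) * 10 + __atoi_val(s, i)) : (a))

/* Decode up to four decimal digits of s, starting at index 'from'. */
#define __atoi_from(s, from) __atoi_4(s, from, 0L)

/* Index of the first offset digit: skip "(", the base, then "+" or " + ". */
#define __get_from(s, base_str) \
	(sizeof(base_str) + (__atoi_ch(s, sizeof(base_str)) == ' ' ? 3 : 1))

/* The base is used as a number; only the small offset is parsed. */
#define __atoi_add(str, base) \
	((base) + __atoi_from(str, __get_from(str, __stringify(base))))

/* "(4000 + 428)": from = 8, result 428 + 4000. */
static inline long demo_open_tree(void)
{
	return __atoi_add(__stringify(DEMO_NR_open_tree), DEMO_NR_Linux);
}

/* "((0x900000+0x0f0000)+5)": from = 21, result 5 + 0x9f0000. */
static inline long demo_set_tls(void)
{
	return __atoi_add(__stringify(DEMO_ARM_set_tls), DEMO_ARM_BASE);
}
```

Because the hex base never goes through the string parser, the nesting depth depends only on the number of digits in the offset, not on the length of the base expression.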
So the most complicated part of the calculation can simply be skipped:
we only need to convert the minimal 'dynamic' part from a string to an
integer. The 'dynamic' part is not that big either: most values will
likely stay below 1000 in the near future, i.e. at most 4 characters,
so __atoi_from() only needs branches nested 4 levels deep. So even with
hex conversion of the 'dynamic' part (which we may no longer need), the
compile speed is still very fast.
A simple local test on most of the architectures shows that the compile
speed is very close to that of our previously proposed NOLIBC__NR_* macros
for every __NR_* (defined as (-1L) when __NR_* is not defined), and the
generated binary size is the same. So we are near the ultimate solution,
but it still needs more testing. Thanks again for your helpful suggestion!
Best regards,
Zhangjin