2024-04-25 16:30:19

by Willy Tarreau

Subject: Re: [PATCH 2/2] tools/nolibc: implement strtol() and friends

Hi Thomas,

On Thu, Apr 25, 2024 at 06:09:27PM +0200, Thomas Weißschuh wrote:
> The implementation always works on uintmax_t values.
>
> This is inefficient when only 32bit are needed.
> However for all functions this only happens for strtol() on 32bit
> platforms.

That's indeed very useful! I think there are two small bugs below, where
the second one hides the first one:

> +static __attribute__((unused))
> +uintmax_t __strtox(const char *nptr, char **endptr, int base, intmax_t lower_limit, uintmax_t upper_limit)
> +{
> + const char signed_ = lower_limit != 0;
> + unsigned char neg = 0, overflow = 0;
> + uintmax_t val = 0, limit, old_val;
> + char c;
> +
> + if (base < 0 || base > 35) {
^^^^^^^^^
This should be 36, otherwise you won't support [0-9a-z].

> + SET_ERRNO(EINVAL);
> + goto out;
> + }
(...)
> + if (c > base)
> + goto out;

This should be "c >= base", otherwise 'z' is accepted in base 35, for
example. I think it could be useful to add one more test covering base
36 to make sure all chars pass?
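To illustrate both fixes, here is a minimal sketch. It is not the patch's actual code: `digit_value()`, `is_valid_digit()` and `all_base36_digits_pass()` are hypothetical helpers. The base is accepted up to 36, and a digit is rejected when its value is greater than or equal to the base. The last helper is the kind of base-36 check suggested above. It assumes base 0 has already been resolved to 8, 10 or 16 by the caller.

```c
/* Hypothetical helper: value of c as a digit, or -1 if not [0-9a-zA-Z]. */
static int digit_value(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'z')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'Z')
		return c - 'A' + 10;
	return -1;
}

/* Hypothetical: 1 if c is a valid digit in base, 0 otherwise.
 * Assumes base 0 was already resolved to 8/10/16 by the caller. */
static int is_valid_digit(char c, int base)
{
	int d;

	/* upper bound is 36 so that all of [0-9a-z] is reachable */
	if (base < 0 || base > 36)
		return 0;
	d = digit_value(c);
	/* ">=" rather than ">": in base 35, 'z' (value 35) must be refused */
	return d >= 0 && d < base;
}

/* Hypothetical self-check: every char of [0-9a-z] passes in base 36,
 * and 'z' is refused in base 35. Returns 1 on success. */
static int all_base36_digits_pass(void)
{
	const char *all = "0123456789abcdefghijklmnopqrstuvwxyz";
	int i;

	for (i = 0; all[i]; i++)
		if (!is_valid_digit(all[i], 36))
			return 0;
	return !is_valid_digit('z', 35);
}
```

With the original "base > 35" and "c > base" checks, base 36 is refused outright, yet base 35 silently accepts 'z'. That is how the second bug hides the first.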

> + if (endptr)
> + *endptr = (char *)nptr;
> + return (neg ? -1 : 1) * val;

I just checked what the compiler does with this, and quite frequently
it emits a multiply, while the other approach, involving only a negation,
is always at least as short:

return neg ? -val : val;

E.g. here's the test code:

long fct1(long neg, long val)
{
	return (neg ? -1 : 1) * val;
}

long fct2(long neg, long val)
{
	return neg ? -val : val;
}
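Both forms yield the same value, so the change only affects code generation. Below is a small sketch using hypothetical names `ret_mul()` and `ret_neg()` and the uintmax_t type that __strtox() returns. It checks that the two forms agree on a few edge values. Since unsigned arithmetic wraps modulo 2^N, "-val" is well defined here. The int -1 in the multiply form is converted to UINTMAX_MAX, so that form also computes -val modulo 2^N.

```c
#include <stdint.h>

/* Hypothetical: the form used in the patch. The int -1 is converted to
 * UINTMAX_MAX, so the product is -val modulo 2^N, computed via a multiply. */
static uintmax_t ret_mul(int neg, uintmax_t val)
{
	return (neg ? -1 : 1) * val;
}

/* Hypothetical: the suggested form, a plain (wrapping) unsigned negation. */
static uintmax_t ret_neg(int neg, uintmax_t val)
{
	return neg ? -val : val;
}

/* Returns 1 if both forms agree on a few edge values, 0 otherwise. */
static int forms_agree(void)
{
	const uintmax_t vals[] = {
		0, 1, 42, (uintmax_t)INTMAX_MAX,
		(uintmax_t)INTMAX_MAX + 1, UINTMAX_MAX
	};
	unsigned i;
	int neg;

	for (i = 0; i < sizeof(vals) / sizeof(vals[0]); i++)
		for (neg = 0; neg <= 1; neg++)
			if (ret_mul(neg, vals[i]) != ret_neg(neg, vals[i]))
				return 0;
	return 1;
}
```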

- on x86_64 with gcc-13.2 -Os:

0000000000000000 <fct1>:
0: f7 df neg %edi
2: 48 19 c0 sbb %rax,%rax
5: 48 83 c8 01 or $0x1,%rax
9: 48 0f af c6 imul %rsi,%rax
d: c3 ret

000000000000000e <fct2>:
e: 48 89 f0 mov %rsi,%rax
11: 85 ff test %edi,%edi
13: 74 03 je 18 <fct2+0xa>
15: 48 f7 d8 neg %rax
18: c3 ret

- on riscv64 with gcc-13.2 -Os:

0000000000000000 <fct1>:
0: c509 beqz a0,a
2: 557d li a0,-1
4: 02b50533 mul a0,a0,a1
8: 8082 ret
a: 4505 li a0,1
c: bfe5 j 4

000000000000000e <fct2>:
e: c119 beqz a0,14
10: 40b005b3 neg a1,a1
14: 852e mv a0,a1
16: 8082 ret

So IMHO it would be better to go the simpler way, even if it only saves a
few bytes (and possibly one less mul on some slow archs).

Thanks!
Willy