2022-10-09 15:39:59

by Willy Tarreau

[permalink] [raw]
Subject: tools/nolibc: fix missing strlen() definition and infinite loop with gcc-12

When built at -Os, gcc-12 recognizes an strlen() pattern in nolibc_strlen()
and replaces it with a jump to strlen(), which is not defined as a symbol
and breaks compilation. Worse, when the function is called strlen(), the
function is simply replaced with a jump to itself, hence becomes an
infinite loop.

One way to avoid this is to always set -ffreestanding, but the calling
code doesn't know this and there's no way (either via attributes or
pragmas) to globally enable it from include files, effectively leaving
a painful situation for the caller.

It turns out that -fno-tree-loop-distribute-patterns disables replacement
of strlen-like loops with calls to strlen and that this option is accepted
in the optimize() function attribute. Thus at least it allows us to make
sure our local definition is not replaced with a self jump. The function
only needs to be renamed back to strlen() so that the symbol exists, which
implies that nolibc_strlen() which is used on variable strings has to be
declared as a macro that points back to it before the strlen() macro is
redifined.

It was verified to produce valid code with gcc 3.4 to 12.1 at different
optimization levels, and both with constant and variable strings.

Reported-by: kernel test robot <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Fixes: 66b6f755ad45 ("rcutorture: Import a copy of nolibc")
Fixes: 96980b833a21 ("tools/nolibc/string: do not use __builtin_strlen() at -O0")
Cc: "Paul E. McKenney" <[email protected]>
Signed-off-by: Willy Tarreau <[email protected]>
---
tools/include/nolibc/string.h | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/tools/include/nolibc/string.h b/tools/include/nolibc/string.h
index bef35bee9c44..5ef8778cd16f 100644
--- a/tools/include/nolibc/string.h
+++ b/tools/include/nolibc/string.h
@@ -125,10 +125,16 @@ char *strcpy(char *dst, const char *src)
}

/* this function is only used with arguments that are not constants or when
- * it's not known because optimizations are disabled.
+ * it's not known because optimizations are disabled. Note that gcc 12
+ * recognizes an strlen() pattern and replaces it with a jump to strlen(),
+ * thus itself, hence the optimize() attribute below that's meant to disable
+ * this confusing practice.
*/
+#if defined(__GNUC__) && (__GNUC__ >= 12)
+__attribute__((optimize("no-tree-loop-distribute-patterns")))
+#endif
static __attribute__((unused))
-size_t nolibc_strlen(const char *str)
+size_t strlen(const char *str)
{
size_t len;

@@ -140,13 +146,12 @@ size_t nolibc_strlen(const char *str)
* the two branches, then will rely on an external definition of strlen().
*/
#if defined(__OPTIMIZE__)
+#define nolibc_strlen(x) strlen(x)
#define strlen(str) ({ \
__builtin_constant_p((str)) ? \
__builtin_strlen((str)) : \
nolibc_strlen((str)); \
})
-#else
-#define strlen(str) nolibc_strlen((str))
#endif

static __attribute__((unused))
--
2.35.3


2022-10-09 16:13:39

by Alexey Dobriyan

[permalink] [raw]
Subject: Re: tools/nolibc: fix missing strlen() definition and infinite loop with gcc-12

Willy Tarreau wrote:
> +#if defined(__GNUC__) && (__GNUC__ >= 12)
> +__attribute__((optimize("no-tree-loop-distribute-patterns")))
> +#endif
> static __attribute__((unused))
> -size_t nolibc_strlen(const char *str

I'd suggest to use asm("") in the loop body. It worked in the past
to prevent folding division loop back into division instruction.

Or switch to

size_t f(const char *s)
{
const char *s0 = s;
while (*s++)
;
return s - s0 - 1;
}

which compiles to 1 branch, not 2.

But of course they could recognise this pattern too.

2022-10-09 18:43:13

by Willy Tarreau

[permalink] [raw]
Subject: [PATCH v2] tools/nolibc: fix missing strlen() definition and infinite loop with gcc-12

When built at -Os, gcc-12 recognizes an strlen() pattern in nolibc_strlen()
and replaces it with a jump to strlen(), which is not defined as a symbol
and breaks compilation. Worse, when the function is called strlen(), the
function is simply replaced with a jump to itself, hence becomes an
infinite loop.

One way to avoid this is to always set -ffreestanding, but the calling
code doesn't know this and there's no way (either via attributes or
pragmas) to globally enable it from include files, effectively leaving
a painful situation for the caller.

Alexey suggested to place an empty asm() statement inside the loop to
stop gcc from recognizing a well-known pattern, which happens to work
pretty fine. At least it allows us to make sure our local definition
is not replaced with a self jump.

The function only needs to be renamed back to strlen() so that the symbol
exists, which implies that nolibc_strlen() which is used on variable
strings has to be declared as a macro that points back to it before the
strlen() macro is redifined.

It was verified to produce valid code with gcc 3.4 to 12.1 at different
optimization levels, and both with constant and variable strings.

In case this problem surfaces again in the future, an alternate approach
consisting in adding an optimize("no-tree-loop-distribute-patterns")
function attribute for gcc>=12 worked as well but is less pretty.

Reported-by: kernel test robot <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Fixes: 66b6f755ad45 ("rcutorture: Import a copy of nolibc")
Fixes: 96980b833a21 ("tools/nolibc/string: do not use __builtin_strlen() at -O0")
Cc: "Paul E. McKenney" <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Signed-off-by: Willy Tarreau <[email protected]>
---
v2: dropped the attribute(optimize) in favor of an empty asm() statement

---
tools/include/nolibc/string.h | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/tools/include/nolibc/string.h b/tools/include/nolibc/string.h
index bef35bee9c44..718a405ffbc3 100644
--- a/tools/include/nolibc/string.h
+++ b/tools/include/nolibc/string.h
@@ -125,14 +125,18 @@ char *strcpy(char *dst, const char *src)
}

/* this function is only used with arguments that are not constants or when
- * it's not known because optimizations are disabled.
+ * it's not known because optimizations are disabled. Note that gcc 12
+ * recognizes an strlen() pattern and replaces it with a jump to strlen(),
+ * thus itself, hence the asm() statement below that's meant to disable this
+ * confusing practice.
*/
static __attribute__((unused))
-size_t nolibc_strlen(const char *str)
+size_t strlen(const char *str)
{
size_t len;

- for (len = 0; str[len]; len++);
+ for (len = 0; str[len]; len++)
+ asm("");
return len;
}

@@ -140,13 +144,12 @@ size_t nolibc_strlen(const char *str)
* the two branches, then will rely on an external definition of strlen().
*/
#if defined(__OPTIMIZE__)
+#define nolibc_strlen(x) strlen(x)
#define strlen(str) ({ \
__builtin_constant_p((str)) ? \
__builtin_strlen((str)) : \
nolibc_strlen((str)); \
})
-#else
-#define strlen(str) nolibc_strlen((str))
#endif

static __attribute__((unused))
--
2.35.3

2022-10-09 18:54:31

by Willy Tarreau

[permalink] [raw]
Subject: Re: tools/nolibc: fix missing strlen() definition and infinite loop with gcc-12

Hi Alexey,

On Sun, Oct 09, 2022 at 06:45:49PM +0300, Alexey Dobriyan wrote:
> Willy Tarreau wrote:
> > +#if defined(__GNUC__) && (__GNUC__ >= 12)
> > +__attribute__((optimize("no-tree-loop-distribute-patterns")))
> > +#endif
> > static __attribute__((unused))
> > -size_t nolibc_strlen(const char *str
>
> I'd suggest to use asm("") in the loop body. It worked in the past
> to prevent folding division loop back into division instruction.

Ah excellent idea! I initially thought about using asm() to hide a
variable provenance but didn't like it much because it undermines
code optimization. But you're right, with an empty asm() statement
alone, the loop will not look like an strlen() anymore. Just tried
and it works like a charm, I'll resend a patch so that we can get
rid of the ugly ifdef.

> Or switch to
>
> size_t f(const char *s)
> {
> const char *s0 = s;
> while (*s++)
> ;
> return s - s0 - 1;
> }
>
> which compiles to 1 branch, not 2.

In fact it depends. In the original code that approach was part of
the ones I had considered, but it doesn't always in better code due
to the prologue and epilogue being larger. It's only better at -O1,
and -O2, but not -Os, and once you add asm() into it, only -O1
remains better:

$ nm --size len.o|grep O|rev|sort|rev
000000000000001a T len_while_O1
0000000000000022 T len_while_asm_O1
0000000000000026 T len_for_O1
000000000000001a T len_while_O2
000000000000002b T len_while_asm_O2
0000000000000021 T len_for_O2
0000000000000013 T len_while_Os
0000000000000015 T len_while_asm_Os
000000000000000e T len_for_Os

This observation seems consistent for me on x86_64, i386, arm and arm64.

> But of course they could recognise this pattern too.

Yes definitely, hence the need for asm() there as well to complete the
comparison.

Thanks for the suggestion, I'll send a v2 shortly.
Willy

2022-10-09 19:12:01

by Willy Tarreau

[permalink] [raw]
Subject: Re: tools/nolibc: fix missing strlen() definition and infinite loop with gcc-12

On Sun, Oct 09, 2022 at 07:59:20PM +0200, Willy Tarreau wrote:
> Hi Alexey,
>
> On Sun, Oct 09, 2022 at 06:45:49PM +0300, Alexey Dobriyan wrote:
> > Willy Tarreau wrote:
> > > +#if defined(__GNUC__) && (__GNUC__ >= 12)
> > > +__attribute__((optimize("no-tree-loop-distribute-patterns")))
> > > +#endif
> > > static __attribute__((unused))
> > > -size_t nolibc_strlen(const char *str
> >
> > I'd suggest to use asm("") in the loop body. It worked in the past
> > to prevent folding division loop back into division instruction.
>
> Ah excellent idea! I initially thought about using asm() to hide a
> variable provenance but didn't like it much because it undermines
> code optimization. But you're right, with an empty asm() statement
> alone, the loop will not look like an strlen() anymore. Just tried
> and it works like a charm, I'll resend a patch so that we can get
> rid of the ugly ifdef.
>
> > Or switch to
> >
> > size_t f(const char *s)
> > {
> > const char *s0 = s;
> > while (*s++)
> > ;
> > return s - s0 - 1;
> > }
> >
> > which compiles to 1 branch, not 2.
>
> In fact it depends. In the original code that approach was part of
> the ones I had considered, but it doesn't always in better code due
> to the prologue and epilogue being larger. It's only better at -O1,
> and -O2, but not -Os, and once you add asm() into it, only -O1
> remains better:
>
> $ nm --size len.o|grep O|rev|sort|rev
> 000000000000001a T len_while_O1
> 0000000000000022 T len_while_asm_O1
> 0000000000000026 T len_for_O1
> 000000000000001a T len_while_O2
> 000000000000002b T len_while_asm_O2
> 0000000000000021 T len_for_O2
> 0000000000000013 T len_while_Os
> 0000000000000015 T len_while_asm_Os
> 000000000000000e T len_for_Os
>
> This observation seems consistent for me on x86_64, i386, arm and arm64.

By the way, just for the sake of completeness, the one that consistently
gives me a better output is this one:

size_t strlen(const char *str)
{
const char *s0 = str--;

while (*++str)
;
return str - s0;
}

Which gives me this:


0000000000000000 <strlen>:
0: 48 8d 47 ff lea -0x1(%rdi),%rax
4: 48 ff c0 inc %rax
7: 80 38 00 cmpb $0x0,(%rax)
a: 75 f8 jne 4 <len+0x4>
c: 48 29 f8 sub %rdi,%rax
f: c3 ret

But this is totally ruined by the addition of asm() in the loop. However
I suspect that the construct is difficult to match against a real strlen()
since it starts on an extra character, thus placing the asm() statement
before the loop could durably preserve it. It does work here (the code
remains the exact same one), but for how long, that's the question. Maybe
we can revisit the various loop-based functions in the future with this in
mind.

Cheers,
Willy

2022-10-10 10:40:56

by David Laight

[permalink] [raw]
Subject: RE: tools/nolibc: fix missing strlen() definition and infinite loop with gcc-12

From: Willy Tarreau <[email protected]>
> Sent: 09 October 2022 19:36
...
> By the way, just for the sake of completeness, the one that consistently
> gives me a better output is this one:
>
> size_t strlen(const char *str)
> {
> const char *s0 = str--;
>
> while (*++str)
> ;
> return str - s0;
> }
>
> Which gives me this:
>
>
> 0000000000000000 <strlen>:
> 0: 48 8d 47 ff lea -0x1(%rdi),%rax
> 4: 48 ff c0 inc %rax
> 7: 80 38 00 cmpb $0x0,(%rax)
> a: 75 f8 jne 4 <len+0x4>
> c: 48 29 f8 sub %rdi,%rax
> f: c3 ret
>
> But this is totally ruined by the addition of asm() in the loop. However
> I suspect that the construct is difficult to match against a real strlen()
> since it starts on an extra character, thus placing the asm() statement
> before the loop could durably preserve it. It does work here (the code
> remains the exact same one), but for how long, that's the question. Maybe
> we can revisit the various loop-based functions in the future with this in
> mind.

clang wilfully and persistently generates:

strlen: # @strlen
movq $-1, %rax
.LBB0_1: # =>This Inner Loop Header: Depth=1
cmpb $0, 1(%rdi,%rax)
leaq 1(%rax), %rax
jne .LBB0_1
retq

But feed the C for that into gcc and it generates a 'jmp strlen'
at everything above -O1.
I suspect that might run with less clocks/byte than the code above.

Somewhere I hate some complier pessimisations.
Substituting a call to strlen() is typical.
strlen() is almost certainly optimised for long strings.
If the string is short the coded loop will be faster.
The same is true (and probably more so) for memcpy.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2022-10-11 06:39:14

by Willy Tarreau

[permalink] [raw]
Subject: Re: tools/nolibc: fix missing strlen() definition and infinite loop with gcc-12

On Mon, Oct 10, 2022 at 10:03:53AM +0000, David Laight wrote:
> From: Willy Tarreau <[email protected]>
> > Sent: 09 October 2022 19:36
> ...
> > By the way, just for the sake of completeness, the one that consistently
> > gives me a better output is this one:
> >
> > size_t strlen(const char *str)
> > {
> > const char *s0 = str--;
> >
> > while (*++str)
> > ;
> > return str - s0;
> > }
> >
> > Which gives me this:
> >
> >
> > 0000000000000000 <strlen>:
> > 0: 48 8d 47 ff lea -0x1(%rdi),%rax
> > 4: 48 ff c0 inc %rax
> > 7: 80 38 00 cmpb $0x0,(%rax)
> > a: 75 f8 jne 4 <len+0x4>
> > c: 48 29 f8 sub %rdi,%rax
> > f: c3 ret
> >
> > But this is totally ruined by the addition of asm() in the loop. However
> > I suspect that the construct is difficult to match against a real strlen()
> > since it starts on an extra character, thus placing the asm() statement
> > before the loop could durably preserve it. It does work here (the code
> > remains the exact same one), but for how long, that's the question. Maybe
> > we can revisit the various loop-based functions in the future with this in
> > mind.
>
> clang wilfully and persistently generates:
>
> strlen: # @strlen
> movq $-1, %rax
> .LBB0_1: # =>This Inner Loop Header: Depth=1
> cmpb $0, 1(%rdi,%rax)
> leaq 1(%rax), %rax
> jne .LBB0_1
> retq
>
> But feed the C for that into gcc and it generates a 'jmp strlen'
> at everything above -O1.

Interesting, that's not the case for me here with 12.2 from kernel.org
on x86_64, which gives this at -O1, -O2, -O3 and -Ofast:

0000000000000000 <strlen>:
0: 48 8d 47 ff lea -0x1(%rdi),%rax
4: 0f 1f 40 00 nopl 0x0(%rax)
8: 48 83 c0 01 add $0x1,%rax
c: 80 38 00 cmpb $0x0,(%rax)
f: 75 f7 jne 8 <strlen+0x8>
11: 48 29 f8 sub %rdi,%rax
14: c3 ret

Out of curiosity what version were you using ?

> I suspect that might run with less clocks/byte than the code above.

Certainly for large strings, but not for short ones.

> Somewhere I hate some complier pessimisations.
> Substituting a call to strlen() is typical.
> strlen() is almost certainly optimised for long strings.
> If the string is short the coded loop will be faster.

Yes, and more importantly, if a developer takes the time to explicitly
write a loop to do something that matches such a function, it's very
likely that they already considered the function and did *not* want to
use it for whatever reason. And the problem we're currently having with
compilers is that they are not willing to respect the developer's intent
because they always know better.

> The same is true (and probably more so) for memcpy.

Yes, I think that we'll eventually have to stuff a few asm() here and
there in a few such loops as compilers become less and less trustable.

Willy

2022-10-11 10:34:15

by David Laight

[permalink] [raw]
Subject: RE: tools/nolibc: fix missing strlen() definition and infinite loop with gcc-12

From: Willy Tarreau
> Sent: 11 October 2022 07:21
>
> On Mon, Oct 10, 2022 at 10:03:53AM +0000, David Laight wrote:
> > From: Willy Tarreau <[email protected]>
> > > Sent: 09 October 2022 19:36
> > ...
> > > By the way, just for the sake of completeness, the one that consistently
> > > gives me a better output is this one:
> > >
> > > size_t strlen(const char *str)
> > > {
> > > const char *s0 = str--;
> > >
> > > while (*++str)
> > > ;
> > > return str - s0;
> > > }
> > >
> > > Which gives me this:
> > >
> > >
> > > 0000000000000000 <strlen>:
> > > 0: 48 8d 47 ff lea -0x1(%rdi),%rax
> > > 4: 48 ff c0 inc %rax
> > > 7: 80 38 00 cmpb $0x0,(%rax)
> > > a: 75 f8 jne 4 <len+0x4>
> > > c: 48 29 f8 sub %rdi,%rax
> > > f: c3 ret
> > >
> > > But this is totally ruined by the addition of asm() in the loop. However
> > > I suspect that the construct is difficult to match against a real strlen()
> > > since it starts on an extra character, thus placing the asm() statement
> > > before the loop could durably preserve it. It does work here (the code
> > > remains the exact same one), but for how long, that's the question. Maybe
> > > we can revisit the various loop-based functions in the future with this in
> > > mind.
> >
> > clang wilfully and persistently generates:
> >
> > strlen: # @strlen
> > movq $-1, %rax
> > .LBB0_1: # =>This Inner Loop Header: Depth=1
> > cmpb $0, 1(%rdi,%rax)
> > leaq 1(%rax), %rax
> > jne .LBB0_1
> > retq
> >
> > But feed the C for that into gcc and it generates a 'jmp strlen'
> > at everything above -O1.
>
> Interesting, that's not the case for me here with 12.2 from kernel.org
> on x86_64, which gives this at -O1, -O2, -O3 and -Ofast:
>
> 0000000000000000 <strlen>:
> 0: 48 8d 47 ff lea -0x1(%rdi),%rax
> 4: 0f 1f 40 00 nopl 0x0(%rax)
> 8: 48 83 c0 01 add $0x1,%rax
> c: 80 38 00 cmpb $0x0,(%rax)
> f: 75 f7 jne 8 <strlen+0x8>
> 11: 48 29 f8 sub %rdi,%rax
> 14: c3 ret
>
> Out of curiosity what version were you using ?

Clang 12.0.0 onwards, see https://godbolt.org/z/67Gnzs8js

> > I suspect that might run with less clocks/byte than the code above.
>
> Certainly for large strings, but not for short ones.

For short strings not needing the final sub and not having
the read depend on the increment should make the leal one faster.
(The nop to align the loop label is monumentally pointless.)

For long strings what matters is how many clocks it takes
to schedule the 4 uops in the loop.
It might be possible to get down to 2 clocks - but I think
both the loops are 3 clocks (assuming the adjacent cmp/jne fuse).

I'm not going to try to instrument the loops though!

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2022-10-20 22:52:57

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v2] tools/nolibc: fix missing strlen() definition and infinite loop with gcc-12

On Sun, Oct 09, 2022 at 08:29:36PM +0200, Willy Tarreau wrote:
> When built at -Os, gcc-12 recognizes an strlen() pattern in nolibc_strlen()
> and replaces it with a jump to strlen(), which is not defined as a symbol
> and breaks compilation. Worse, when the function is called strlen(), the
> function is simply replaced with a jump to itself, hence becomes an
> infinite loop.
>
> One way to avoid this is to always set -ffreestanding, but the calling
> code doesn't know this and there's no way (either via attributes or
> pragmas) to globally enable it from include files, effectively leaving
> a painful situation for the caller.
>
> Alexey suggested to place an empty asm() statement inside the loop to
> stop gcc from recognizing a well-known pattern, which happens to work
> pretty fine. At least it allows us to make sure our local definition
> is not replaced with a self jump.
>
> The function only needs to be renamed back to strlen() so that the symbol
> exists, which implies that nolibc_strlen() which is used on variable
> strings has to be declared as a macro that points back to it before the
> strlen() macro is redifined.
>
> It was verified to produce valid code with gcc 3.4 to 12.1 at different
> optimization levels, and both with constant and variable strings.
>
> In case this problem surfaces again in the future, an alternate approach
> consisting in adding an optimize("no-tree-loop-distribute-patterns")
> function attribute for gcc>=12 worked as well but is less pretty.
>
> Reported-by: kernel test robot <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> Fixes: 66b6f755ad45 ("rcutorture: Import a copy of nolibc")
> Fixes: 96980b833a21 ("tools/nolibc/string: do not use __builtin_strlen() at -O0")
> Cc: "Paul E. McKenney" <[email protected]>
> Cc: Alexey Dobriyan <[email protected]>
> Signed-off-by: Willy Tarreau <[email protected]>

Hearing no objections, I have pulled this one in for review and testing,
thank you!

Thanx, Paul

> ---
> v2: dropped the attribute(optimize) in favor of an empty asm() statement
>
> ---
> tools/include/nolibc/string.h | 13 ++++++++-----
> 1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/tools/include/nolibc/string.h b/tools/include/nolibc/string.h
> index bef35bee9c44..718a405ffbc3 100644
> --- a/tools/include/nolibc/string.h
> +++ b/tools/include/nolibc/string.h
> @@ -125,14 +125,18 @@ char *strcpy(char *dst, const char *src)
> }
>
> /* this function is only used with arguments that are not constants or when
> - * it's not known because optimizations are disabled.
> + * it's not known because optimizations are disabled. Note that gcc 12
> + * recognizes an strlen() pattern and replaces it with a jump to strlen(),
> + * thus itself, hence the asm() statement below that's meant to disable this
> + * confusing practice.
> */
> static __attribute__((unused))
> -size_t nolibc_strlen(const char *str)
> +size_t strlen(const char *str)
> {
> size_t len;
>
> - for (len = 0; str[len]; len++);
> + for (len = 0; str[len]; len++)
> + asm("");
> return len;
> }
>
> @@ -140,13 +144,12 @@ size_t nolibc_strlen(const char *str)
> * the two branches, then will rely on an external definition of strlen().
> */
> #if defined(__OPTIMIZE__)
> +#define nolibc_strlen(x) strlen(x)
> #define strlen(str) ({ \
> __builtin_constant_p((str)) ? \
> __builtin_strlen((str)) : \
> nolibc_strlen((str)); \
> })
> -#else
> -#define strlen(str) nolibc_strlen((str))
> #endif
>
> static __attribute__((unused))
> --
> 2.35.3
>