2020-08-15 22:20:43

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH v2] lib/string.c: implement stpcpy

On Sat, Aug 15, 2020 at 2:31 PM Joe Perches <[email protected]> wrote:
>
> On Sat, 2020-08-15 at 14:28 -0700, Nick Desaulniers wrote:
> > On Sat, Aug 15, 2020 at 2:24 PM Joe Perches <[email protected]> wrote:
> > > On Sat, 2020-08-15 at 13:47 -0700, Nick Desaulniers wrote:
> > > > On Sat, Aug 15, 2020 at 9:34 AM Kees Cook <[email protected]> wrote:
> > > > > On Fri, Aug 14, 2020 at 07:09:44PM -0700, Nick Desaulniers wrote:
> > > > > > LLVM implemented a recent "libcall optimization" that lowers calls to
> > > > > > `sprintf(dest, "%s", str)` where the return value is used to
> > > > > > `stpcpy(dest, str) - dest`. This generally avoids the machinery involved
> > > > > > in parsing format strings. Calling `sprintf` with overlapping arguments
> > > > > > was clarified in ISO C99 and POSIX.1-2001 to be undefined behavior.
> > > > > >
> > > > > > `stpcpy` is just like `strcpy` except it returns the pointer to the new
> > > > > > tail of `dest`. This allows you to chain multiple calls to `stpcpy` in
> > > > > > one statement.
> > > > >
> > > > > O_O What?
> > > > >
> > > > > No; this is a _terrible_ API: there is no bounds checking, there are no
> > > > > buffer sizes. Anything using the example sprintf() pattern is _already_
> > > > > wrong and must be removed from the kernel. (Yes, I realize that the
> > > > > kernel is *filled* with this bad assumption that "I'll never write more
> > > > > than PAGE_SIZE bytes to this buffer", but that's both theoretically
> > > > > wrong ("640k is enough for anybody") and has been known to be wrong in
> > > > > practice too (e.g. when suddenly your writing routine is reachable by
> > > > > splice(2) and you may not have a PAGE_SIZE buffer).
> > > > >
> > > > > But we cannot _add_ another dangerous string API. We're already in a
> > > > > terrible mess trying to remove strcpy[1], strlcpy[2], and strncpy[3]. This
> > > > > needs to be addressed up by removing the unbounded sprintf() uses. (And
> > > > > to do so without introducing bugs related to using snprintf() when
> > > > > scnprintf() is expected[4].)
> > > >
> > > > Well, everything (-next, mainline, stable) is broken right now (with
> > > > ToT Clang) without providing this symbol. I'm not going to go clean
> > > > the entire kernel's use of sprintf to get our CI back to being green.
> > >
> > > Maybe this should get place in compiler-clang.h so it isn't
> > > generic and public.
> >
> > https://bugs.llvm.org/show_bug.cgi?id=47162#c7 and
> > https://bugs.llvm.org/show_bug.cgi?id=47144
> > Seem to imply that Clang is not the only compiler that can lower a
> > sequence of libcalls to stpcpy. Do we want to wait until we have a
> > fire drill w/ GCC to move such an implementation from
> > include/linux/compiler-clang.h back in to lib/string.c?
>
> My guess is yes, wait until gcc, if ever, needs it.

The suggestion to use static inline doesn't even make sense. The
compiler is lowering calls to other library routines; `stpcpy` isn't
being explicitly called. Even if it was, not sure we want it being
inlined. No symbol definition will be emitted; problem not solved.
And I refuse to add any more code using `extern inline`. Putting the
definition in lib/string.c is the most straightforward and avoids
revisiting this issue in the future for other toolchains. I'll limit
access by removing the declaration, and adding a comment to avoid its
use. But if you're going to use a gnu target triple without using
-ffreestanding because you *want* libcall optimizations, then you have
to provide symbols for all possible library routines!
--
Thanks,
~Nick Desaulniers


2020-08-16 07:17:35

by Fangrui Song

[permalink] [raw]
Subject: Re: [PATCH v2] lib/string.c: implement stpcpy

On 2020-08-15, 'Nick Desaulniers' via Clang Built Linux wrote:
>On Sat, Aug 15, 2020 at 2:31 PM Joe Perches <[email protected]> wrote:
>>
>> On Sat, 2020-08-15 at 14:28 -0700, Nick Desaulniers wrote:
>> > On Sat, Aug 15, 2020 at 2:24 PM Joe Perches <[email protected]> wrote:
>> > > On Sat, 2020-08-15 at 13:47 -0700, Nick Desaulniers wrote:
>> > > > On Sat, Aug 15, 2020 at 9:34 AM Kees Cook <[email protected]> wrote:
>> > > > > On Fri, Aug 14, 2020 at 07:09:44PM -0700, Nick Desaulniers wrote:
>> > > > > > LLVM implemented a recent "libcall optimization" that lowers calls to
>> > > > > > `sprintf(dest, "%s", str)` where the return value is used to
>> > > > > > `stpcpy(dest, str) - dest`. This generally avoids the machinery involved
>> > > > > > in parsing format strings. Calling `sprintf` with overlapping arguments
>> > > > > > was clarified in ISO C99 and POSIX.1-2001 to be undefined behavior.
>> > > > > >
>> > > > > > `stpcpy` is just like `strcpy` except it returns the pointer to the new
>> > > > > > tail of `dest`. This allows you to chain multiple calls to `stpcpy` in
>> > > > > > one statement.
>> > > > >
>> > > > > O_O What?
>> > > > >
>> > > > > No; this is a _terrible_ API: there is no bounds checking, there are no
>> > > > > buffer sizes. Anything using the example sprintf() pattern is _already_
>> > > > > wrong and must be removed from the kernel. (Yes, I realize that the
>> > > > > kernel is *filled* with this bad assumption that "I'll never write more
>> > > > > than PAGE_SIZE bytes to this buffer", but that's both theoretically
>> > > > > wrong ("640k is enough for anybody") and has been known to be wrong in
>> > > > > practice too (e.g. when suddenly your writing routine is reachable by
>> > > > > splice(2) and you may not have a PAGE_SIZE buffer).
>> > > > >
>> > > > > But we cannot _add_ another dangerous string API. We're already in a
>> > > > > terrible mess trying to remove strcpy[1], strlcpy[2], and strncpy[3]. This
>> > > > > needs to be addressed up by removing the unbounded sprintf() uses. (And
>> > > > > to do so without introducing bugs related to using snprintf() when
>> > > > > scnprintf() is expected[4].)
>> > > >
>> > > > Well, everything (-next, mainline, stable) is broken right now (with
>> > > > ToT Clang) without providing this symbol. I'm not going to go clean
>> > > > the entire kernel's use of sprintf to get our CI back to being green.
>> > >
>> > > Maybe this should get place in compiler-clang.h so it isn't
>> > > generic and public.
>> >
>> > https://bugs.llvm.org/show_bug.cgi?id=47162#c7 and
>> > https://bugs.llvm.org/show_bug.cgi?id=47144
>> > Seem to imply that Clang is not the only compiler that can lower a
>> > sequence of libcalls to stpcpy. Do we want to wait until we have a
>> > fire drill w/ GCC to move such an implementation from
>> > include/linux/compiler-clang.h back in to lib/string.c?
>>
>> My guess is yes, wait until gcc, if ever, needs it.
>
>The suggestion to use static inline doesn't even make sense. The
>compiler is lowering calls to other library routines; `stpcpy` isn't
>being explicitly called. Even if it was, not sure we want it being
>inlined. No symbol definition will be emitted; problem not solved.
>And I refuse to add any more code using `extern inline`. Putting the
>definition in lib/string.c is the most straightforward and avoids
>revisiting this issue in the future for other toolchains. I'll limit
>access by removing the declaration, and adding a comment to avoid its
>use. But if you're going to use a gnu target triple without using
>-ffreestanding because you *want* libcall optimizations, then you have
>to provide symbols for all possible library routines!

Adding a definition without a declaration for stpcpy looks good.
Clang LTO will work.

(If the kernel does not want to provide these routines,
is http://git.kernel.org/linus/6edfba1b33c701108717f4e036320fc39abe1912
probably wrong? (why remove -ffreestanding from the main Makefile) )

2020-08-16 09:13:49

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v2] lib/string.c: implement stpcpy

On Sun, Aug 16, 2020 at 2:19 AM 'Fangrui Song' via Clang Built Linux
<[email protected]> wrote:
>
> On 2020-08-15, 'Nick Desaulniers' via Clang Built Linux wrote:
> >On Sat, Aug 15, 2020 at 2:31 PM Joe Perches <[email protected]> wrote:
> >>
> >> On Sat, 2020-08-15 at 14:28 -0700, Nick Desaulniers wrote:
> >> > On Sat, Aug 15, 2020 at 2:24 PM Joe Perches <[email protected]> wrote:
> >> > > On Sat, 2020-08-15 at 13:47 -0700, Nick Desaulniers wrote:
> >> > > > On Sat, Aug 15, 2020 at 9:34 AM Kees Cook <[email protected]> wrote:
> >> > > > > On Fri, Aug 14, 2020 at 07:09:44PM -0700, Nick Desaulniers wrote:
> >> > > > > > LLVM implemented a recent "libcall optimization" that lowers calls to
> >> > > > > > `sprintf(dest, "%s", str)` where the return value is used to
> >> > > > > > `stpcpy(dest, str) - dest`. This generally avoids the machinery involved
> >> > > > > > in parsing format strings. Calling `sprintf` with overlapping arguments
> >> > > > > > was clarified in ISO C99 and POSIX.1-2001 to be undefined behavior.
> >> > > > > >
> >> > > > > > `stpcpy` is just like `strcpy` except it returns the pointer to the new
> >> > > > > > tail of `dest`. This allows you to chain multiple calls to `stpcpy` in
> >> > > > > > one statement.
> >> > > > >
> >> > > > > O_O What?
> >> > > > >
> >> > > > > No; this is a _terrible_ API: there is no bounds checking, there are no
> >> > > > > buffer sizes. Anything using the example sprintf() pattern is _already_
> >> > > > > wrong and must be removed from the kernel. (Yes, I realize that the
> >> > > > > kernel is *filled* with this bad assumption that "I'll never write more
> >> > > > > than PAGE_SIZE bytes to this buffer", but that's both theoretically
> >> > > > > wrong ("640k is enough for anybody") and has been known to be wrong in
> >> > > > > practice too (e.g. when suddenly your writing routine is reachable by
> >> > > > > splice(2) and you may not have a PAGE_SIZE buffer).
> >> > > > >
> >> > > > > But we cannot _add_ another dangerous string API. We're already in a
> >> > > > > terrible mess trying to remove strcpy[1], strlcpy[2], and strncpy[3]. This
> >> > > > > needs to be addressed up by removing the unbounded sprintf() uses. (And
> >> > > > > to do so without introducing bugs related to using snprintf() when
> >> > > > > scnprintf() is expected[4].)
> >> > > >
> >> > > > Well, everything (-next, mainline, stable) is broken right now (with
> >> > > > ToT Clang) without providing this symbol. I'm not going to go clean
> >> > > > the entire kernel's use of sprintf to get our CI back to being green.
> >> > >
> >> > > Maybe this should get place in compiler-clang.h so it isn't
> >> > > generic and public.
> >> >
> >> > https://bugs.llvm.org/show_bug.cgi?id=47162#c7 and
> >> > https://bugs.llvm.org/show_bug.cgi?id=47144
> >> > Seem to imply that Clang is not the only compiler that can lower a
> >> > sequence of libcalls to stpcpy. Do we want to wait until we have a
> >> > fire drill w/ GCC to move such an implementation from
> >> > include/linux/compiler-clang.h back in to lib/string.c?
> >>
> >> My guess is yes, wait until gcc, if ever, needs it.
> >
> >The suggestion to use static inline doesn't even make sense. The
> >compiler is lowering calls to other library routines; `stpcpy` isn't
> >being explicitly called. Even if it was, not sure we want it being
> >inlined. No symbol definition will be emitted; problem not solved.
> >And I refuse to add any more code using `extern inline`. Putting the
> >definition in lib/string.c is the most straightforward and avoids
> >revisiting this issue in the future for other toolchains. I'll limit
> >access by removing the declaration, and adding a comment to avoid its
> >use. But if you're going to use a gnu target triple without using
> >-ffreestanding because you *want* libcall optimizations, then you have
> >to provide symbols for all possible library routines!
>
> Adding a definition without a declaration for stpcpy looks good.
> Clang LTO will work.
>
> (If the kernel does not want to provide these routines,
> is http://git.kernel.org/linus/6edfba1b33c701108717f4e036320fc39abe1912
> probably wrong? (why remove -ffreestanding from the main Makefile) )
>

We had some many issues in arch/x86 where *FLAGS were removed or used
differently and had to re-add them :-(.

So if -ffreestanding is used in arch/x86 and was! used in top-level
Makefile - it makes sense to re-add it back?
( I cannot speak for archs other than x86. )

- Sedat -

2020-08-16 15:25:12

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH v2] lib/string.c: implement stpcpy

On Sun, Aug 16, 2020 at 07:22:35AM +0200, Sedat Dilek wrote:
> On Sun, Aug 16, 2020 at 2:19 AM 'Fangrui Song' via Clang Built Linux
> <[email protected]> wrote:
> >
> > Adding a definition without a declaration for stpcpy looks good.
> > Clang LTO will work.
> >
> > (If the kernel does not want to provide these routines,
> > is http://git.kernel.org/linus/6edfba1b33c701108717f4e036320fc39abe1912
> > probably wrong? (why remove -ffreestanding from the main Makefile) )
> >
>
> We had some many issues in arch/x86 where *FLAGS were removed or used
> differently and had to re-add them :-(.
>
> So if -ffreestanding is used in arch/x86 and was! used in top-level
> Makefile - it makes sense to re-add it back?
> ( I cannot speak for archs other than x86. )
>
> - Sedat -

-ffreestanding disables _all_ builtins and libcall optimizations, which
is probably not desirable. If we added it back, we'd need to also go
back to #define various string functions to the __builtin versions.

Though I don't understand the original issue, with -ffreestanding,
sprintf shouldn't have been turned into strcpy in the first place.

32-bit still has -ffreestanding -- I wonder if that's actually necessary
any more?

Why does -fno-builtin-stpcpy not work with clang LTO? Isn't that a
compiler bug?

Thanks.

2020-08-17 22:36:19

by Sami Tolvanen

[permalink] [raw]
Subject: Re: [PATCH v2] lib/string.c: implement stpcpy

On Sun, Aug 16, 2020 at 8:02 AM Arvind Sankar <[email protected]> wrote:
>
> On Sun, Aug 16, 2020 at 07:22:35AM +0200, Sedat Dilek wrote:
> > On Sun, Aug 16, 2020 at 2:19 AM 'Fangrui Song' via Clang Built Linux
> > <[email protected]> wrote:
> > >
> > > Adding a definition without a declaration for stpcpy looks good.
> > > Clang LTO will work.
> > >
> > > (If the kernel does not want to provide these routines,
> > > is http://git.kernel.org/linus/6edfba1b33c701108717f4e036320fc39abe1912
> > > probably wrong? (why remove -ffreestanding from the main Makefile) )
> > >
> >
> > We had some many issues in arch/x86 where *FLAGS were removed or used
> > differently and had to re-add them :-(.
> >
> > So if -ffreestanding is used in arch/x86 and was! used in top-level
> > Makefile - it makes sense to re-add it back?
> > ( I cannot speak for archs other than x86. )
> >
> > - Sedat -
>
> -ffreestanding disables _all_ builtins and libcall optimizations, which
> is probably not desirable. If we added it back, we'd need to also go
> back to #define various string functions to the __builtin versions.
>
> Though I don't understand the original issue, with -ffreestanding,
> sprintf shouldn't have been turned into strcpy in the first place.
>
> 32-bit still has -ffreestanding -- I wonder if that's actually necessary
> any more?
>
> Why does -fno-builtin-stpcpy not work with clang LTO? Isn't that a
> compiler bug?

I just confirmed that adding -fno-builtin-stpcpy to KBUILD_CFLAGS does
work with LTO as well.

Sami

2020-08-17 22:45:37

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v2] lib/string.c: implement stpcpy

On Mon, Aug 17, 2020 at 10:14:43AM -0700, Sami Tolvanen wrote:
> I just confirmed that adding -fno-builtin-stpcpy to KBUILD_CFLAGS does
> work with LTO as well.

Oh, I read this out of order; sorry! Yes, if -fno-builtin-stpcpy works,
let's use that instead. Doesn't that solve it too?

--
Kees Cook