I haven't looked into why this is so slow with clang, but it really is
painfully slow:
time make CC=clang allmodconfig
real 0m2.667s
vs the gcc case:
time make CC=gcc allmodconfig
real 0m0.903s
Yeah, yeah, three seconds may sound like "not a lot of time", but
considering that the subsequent full build (which for me is often
empty) doesn't take all that much longer, that config time clang waste
is actually quite noticeable.
I actually don't do allmodconfig builds with clang, but I do my
default kernel builds with it:
time make oldconfig
real 0m2.748s
time sh -c "make -j128 > ../makes"
real 0m3.546s
so that "make oldconfig" really is almost as slow as the whole
"confirm build is done" thing. Its' quite noticeable in my workflow.
The gcc config isn't super-fast either, but there's a big 3x
difference, so the clang case really is doing something extra wrong.
I've not actually looked into _why_. Except I do see that "clang" gets
invoked with small (empty?) test files several times, probably to
check for command line flags being valid.
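(For reference, each of those probes boils down to roughly the following; this
is only a sketch - the real checks live in scripts/Kconfig.include and the
Kbuild cc-option helpers, and '-Wsome-new-flag' is a stand-in for whatever
option is being tested:
$ echo | clang -Werror -Wsome-new-flag -x c -c -o /dev/null - ; echo $?
# exit status 0 means the flag is accepted, nonzero means it is rejected)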
Sending this to relevant parties in the hope that somebody goes "Yeah,
that's silly" and fixes it.
This is on my F34 machine:
clang version 12.0.0 (Fedora 12.0.0-0.3.rc1.fc34)
in case it matters (but I don't see why it should).
Many many moons ago the promise for clang was faster build speeds.
That didn't turn out to be true, but can we please at least try to
make them not painfully much slower?
Linus
On Thu, Apr 29, 2021 at 2:53 PM Linus Torvalds
<[email protected]> wrote:
>
> I haven't looked into why this is so slow with clang, but it really is
> painfully slow:
>
> time make CC=clang allmodconfig
> real 0m2.667s
>
> vs the gcc case:
>
> time make CC=gcc allmodconfig
> real 0m0.903s
Hmmm...I seem to only be able to reproduce such a drastic difference
between the two if I:
1. make clean
2. time make CC=<either> allmodconfig
3. time make CC=<the other> allmodconfig
without doing another `make clean` in between 2 and 3, regardless of
which toolchain I use first vs second. Otherwise I pretty
consistently get 1.49-1.62s for clang and 1.28-1.4s for gcc; that's with
a build of clang that has assertions enabled, too.
Can you confirm your observations with `make clean` between runs? Can
you provide info about your clang build such as the version string,
and whether this was built locally perhaps?
>
> Yeah, yeah, three seconds may sound like "not a lot of time", but
> considering that the subsequent full build (which for me is often
> empty) doesn't take all that much longer, that config time clang waste
> is actually quite noticeable.
>
> I actually don't do allmodconfig builds with clang, but I do my
> default kernel builds with it:
:)
>
> time make oldconfig
> real 0m2.748s
>
> time sh -c "make -j128 > ../makes"
> real 0m3.546s
>
> so that "make oldconfig" really is almost as slow as the whole
> "confirm build is done" thing. Its' quite noticeable in my workflow.
>
> The gcc config isn't super-fast either, but there's a big 3x
> difference, so the clang case really is doing something extra wrong.
>
> I've not actually looked into _why_. Except I do see that "clang" gets
> invoked with small (empty?) test files several times, probably to
> check for command line flags being valid.
There's probably more we can be doing to speed up the flag checking
case; Nathan had a good idea about using -fsyntax-only to stop the
compilation pipeline after flags have been validated. I think we
should run some testing on that to see if it makes a measurable
impact; I'd imagine that being beneficial to both compilers.
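For concreteness, the idea is to stop right after the driver has validated
the flags and parsed the (empty) input, something like this untested sketch,
where '-Wsome-new-flag' again stands in for the option being probed:
$ echo | clang -Werror -Wsome-new-flag -fsyntax-only -x c -  # no codegen, no output file; exit status says whether the flag is accepted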
>
> Sending this to relevant parties in the hope that somebody goes "Yeah,
> that's silly" and fixes it.
>
> This is on my F34 machine:
>
> clang version 12.0.0 (Fedora 12.0.0-0.3.rc1.fc34)
>
> in case it matters (but I don't see why it should).
>
> Many many moons ago the promise for clang was faster build speeds.
> That didn't turn out to be true, but can we please at least try to
> make them not painfully much slower?
Ack. Forwarded that request directly up the chain of command. ;)
In the interest of build speed, have you tried LLD yet? `make LLVM=1
...` or `make LD=ld.lld ...` should do it; you'll find it's much
faster than the competition, especially when there's a large number of
cores on the host. Not going to help with the allmodconfig
configuration, but would definitely help incremental rebuilds.
--
Thanks,
~Nick Desaulniers
On Thu, Apr 29, 2021 at 02:53:08PM -0700, Linus Torvalds wrote:
> I haven't looked into why this is so slow with clang, but it really is
> painfully slow:
>
> time make CC=clang allmodconfig
> real 0m2.667s
>
> vs the gcc case:
>
> time make CC=gcc allmodconfig
> real 0m0.903s
>
> Yeah, yeah, three seconds may sound like "not a lot of time", but
> considering that the subsequent full build (which for me is often
> empty) doesn't take all that much longer, that config time clang waste
> is actually quite noticeable.
>
> I actually don't do allmodconfig builds with clang, but I do my
> default kernel builds with it:
>
> time make oldconfig
> real 0m2.748s
>
> time sh -c "make -j128 > ../makes"
> real 0m3.546s
>
> so that "make oldconfig" really is almost as slow as the whole
> "confirm build is done" thing. Its' quite noticeable in my workflow.
>
> The gcc config isn't super-fast either, but there's a big 3x
> difference, so the clang case really is doing something extra wrong.
>
> I've not actually looked into _why_. Except I do see that "clang" gets
> invoked with small (empty?) test files several times, probably to
> check for command line flags being valid.
>
> Sending this to relevant parties in the hope that somebody goes "Yeah,
> that's silly" and fixes it.
>
> This is on my F34 machine:
>
> clang version 12.0.0 (Fedora 12.0.0-0.3.rc1.fc34)
>
> in case it matters (but I don't see why it should).
>
> Many many moons ago the promise for clang was faster build speeds.
> That didn't turn out to be true, but can we please at least try to
> make them not painfully much slower?
Hi Linus,
I benchmarked this with your latest tree
(8ca5297e7e38f2dc8c753d33a5092e7be181fff0) with my distribution versions
of clang 11.1.0 and gcc 10.2.0 and I saw the same results, benchmarking
with hyperfine.
$ hyperfine -L comp_var "","CC=clang " -r 100 -S /bin/sh -w 5 'make {comp_var}allmodconfig'
Benchmark #1: make allmodconfig
Time (mean ± σ): 1.490 s ± 0.012 s [User: 1.153 s, System: 0.374 s]
Range (min … max): 1.462 s … 1.522 s 100 runs
Benchmark #2: make CC=clang allmodconfig
Time (mean ± σ): 4.001 s ± 0.020 s [User: 2.761 s, System: 1.274 s]
Range (min … max): 3.939 s … 4.038 s 100 runs
Summary
'make allmodconfig' ran
2.69 ± 0.03 times faster than 'make CC=clang allmodconfig'
It was also reproducible in a Fedora Docker image, which has newer
versions of those tools than my distro does (GCC 11.1.0 and clang
12.0.0):
$ hyperfine -L comp_var "","CC=clang " -r 100 -S /bin/sh -w 5 'make {comp_var}allmodconfig'
Benchmark #1: make allmodconfig
Time (mean ± σ): 989.9 ms ± 3.5 ms [User: 747.0 ms, System: 271.1 ms]
Range (min … max): 983.0 ms … 998.2 ms 100 runs
Benchmark #2: make CC=clang allmodconfig
Time (mean ± σ): 3.328 s ± 0.005 s [User: 2.408 s, System: 0.948 s]
Range (min … max): 3.316 s … 3.343 s 100 runs
Summary
'make allmodconfig' ran
3.36 ± 0.01 times faster than 'make CC=clang allmodconfig'
Unfortunately, I doubt there is much that can be done on the kernel side
because this is reproducible just invoking the compilers without any
source input.
Clang 11.1.0 and GCC 10.2.0:
$ hyperfine -i -L compiler gcc,clang -r 5000 -S /bin/sh -w 5 'echo | {compiler} -x c -c -o /dev/null -'
Benchmark #1: echo | gcc -x c -c -o /dev/null -
Time (mean ± σ): 9.6 ms ± 1.0 ms [User: 6.5 ms, System: 3.4 ms]
Range (min … max): 5.8 ms … 12.7 ms 5000 runs
Benchmark #2: echo | clang -x c -c -o /dev/null -
Time (mean ± σ): 33.0 ms ± 0.8 ms [User: 22.4 ms, System: 10.9 ms]
Range (min … max): 30.3 ms … 36.0 ms 5000 runs
Summary
'echo | gcc -x c -c -o /dev/null -' ran
3.45 ± 0.39 times faster than 'echo | clang -x c -c -o /dev/null -'
$ hyperfine -i -L compiler gcc,clang -r 5000 -S /bin/sh -w 5 'echo | {compiler} -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -'
Benchmark #1: echo | gcc -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -
Time (mean ± σ): 11.9 ms ± 1.1 ms [User: 10.5 ms, System: 1.8 ms]
Range (min … max): 8.2 ms … 15.1 ms 5000 runs
Warning: Ignoring non-zero exit code.
Benchmark #2: echo | clang -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -
Time (mean ± σ): 31.0 ms ± 0.8 ms [User: 20.3 ms, System: 10.9 ms]
Range (min … max): 27.9 ms … 33.8 ms 5000 runs
Warning: Ignoring non-zero exit code.
Summary
'echo | gcc -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -' ran
2.62 ± 0.26 times faster than 'echo | clang -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -'
Clang 12.0.0 and GCC 11.1.0:
$ hyperfine -i -L compiler gcc,clang -r 5000 -S /bin/sh -w 5 'echo | {compiler} -x c -c -o /dev/null -'
Benchmark #1: echo | gcc -x c -c -o /dev/null -
Time (mean ± σ): 8.5 ms ± 0.3 ms [User: 5.6 ms, System: 3.3 ms]
Range (min … max): 7.6 ms … 9.8 ms 5000 runs
Benchmark #2: echo | clang -x c -c -o /dev/null -
Time (mean ± σ): 27.4 ms ± 0.4 ms [User: 19.6 ms, System: 8.1 ms]
Range (min … max): 26.4 ms … 29.1 ms 5000 runs
Summary
'echo | gcc -x c -c -o /dev/null -' ran
3.22 ± 0.13 times faster than 'echo | clang -x c -c -o /dev/null -'
$ hyperfine -i -L compiler gcc,clang -r 5000 -S /bin/sh -w 5 'echo | {compiler} -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -'
Benchmark #1: echo | gcc -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -
Time (mean ± σ): 12.2 ms ± 0.3 ms [User: 11.5 ms, System: 1.0 ms]
Range (min … max): 11.7 ms … 13.9 ms 5000 runs
Warning: Ignoring non-zero exit code.
Benchmark #2: echo | clang -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -
Time (mean ± σ): 26.3 ms ± 0.5 ms [User: 19.1 ms, System: 7.5 ms]
Range (min … max): 25.2 ms … 28.1 ms 5000 runs
Warning: Ignoring non-zero exit code.
Summary
'echo | gcc -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -' ran
2.16 ± 0.06 times faster than 'echo | clang -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -'
Seems that GCC is faster to complete when it does not have to parse
warning flags, while clang shows no major variance. Thinking more about it,
cc-option gives clang an empty file, so it should not have to actually
parse anything; I do not think '-fsyntax-only' will gain us a whole
ton because we should not be dipping into the backend at all.
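That said, it should be easy to measure whether it matters at all, along
these lines (untested sketch, comparing a normal empty compile against a
front-end-only run):
$ hyperfine -i -r 5000 -S /bin/sh -w 5 'echo | clang -x c -c -o /dev/null -' 'echo | clang -fsyntax-only -x c -'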
Tangentially, my version of clang built with Profile Guided Optimization
gets me close to GCC. I am surprised to see this level of gain.
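(For reference, a two-stage PGO build of clang can be set up roughly like the
following. This is only a sketch: paths are placeholders, and the cmake option
names should be checked against the LLVM docs for the release being built.
# stage 1: build an IR-instrumented clang
$ cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS=clang \
    -DLLVM_BUILD_INSTRUMENTED=IR ~/llvm-project/llvm && ninja clang
# exercise the instrumented clang on a representative workload (e.g. a kernel
# build), then merge the raw profiles it writes:
$ llvm-profdata merge -output=clang.profdata path/to/profiles/*.profraw
# stage 2: rebuild clang in a separate build directory using the profile
$ cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS=clang \
    -DLLVM_PROFDATA_FILE=$PWD/clang.profdata ~/llvm-project/llvm && ninja clang
)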
$ hyperfine -i -L compiler gcc,clang -r 5000 -S /bin/sh -w 5 'echo | {compiler} -x c -c -o /dev/null -'
Benchmark #1: echo | gcc -x c -c -o /dev/null -
Time (mean ± σ): 9.6 ms ± 1.0 ms [User: 6.4 ms, System: 3.5 ms]
Range (min … max): 5.6 ms … 12.9 ms 5000 runs
Benchmark #2: echo | clang -x c -c -o /dev/null -
Time (mean ± σ): 8.7 ms ± 1.3 ms [User: 4.3 ms, System: 4.9 ms]
Range (min … max): 4.9 ms … 12.1 ms 5000 runs
Warning: Command took less than 5 ms to complete. Results might be inaccurate.
Summary
'echo | clang -x c -c -o /dev/null -' ran
1.10 ± 0.20 times faster than 'echo | gcc -x c -c -o /dev/null -'
$ hyperfine -L comp_var "","CC=clang " -r 100 -S /bin/sh -w 5 'make {comp_var}allmodconfig'
Benchmark #1: make allmodconfig
Time (mean ± σ): 1.531 s ± 0.011 s [User: 1.180 s, System: 0.388 s]
Range (min … max): 1.501 s … 1.561 s 100 runs
Benchmark #2: make CC=clang allmodconfig
Time (mean ± σ): 1.828 s ± 0.015 s [User: 1.209 s, System: 0.760 s]
Range (min … max): 1.802 s … 1.872 s 100 runs
Summary
'make allmodconfig' ran
1.19 ± 0.01 times faster than 'make CC=clang allmodconfig'
I think that we should definitely see what we can do to speed up the front end.
Cheers,
Nathan
On Thu, Apr 29, 2021 at 5:52 PM Nathan Chancellor <[email protected]> wrote:
>
> On Thu, Apr 29, 2021 at 02:53:08PM -0700, Linus Torvalds wrote:
> > I haven't looked into why this is so slow with clang, but it really is
> > painfully slow:
> >
> > time make CC=clang allmodconfig
> > real 0m2.667s
> >
> > vs the gcc case:
> >
> > time make CC=gcc allmodconfig
> > real 0m0.903s
> >
> > Yeah, yeah, three seconds may sound like "not a lot of time", but
> > considering that the subsequent full build (which for me is often
> > empty) doesn't take all that much longer, that config time clang waste
> > is actually quite noticeable.
> >
> > I actually don't do allmodconfig builds with clang, but I do my
> > default kernel builds with it:
> >
> > time make oldconfig
> > real 0m2.748s
> >
> > time sh -c "make -j128 > ../makes"
> > real 0m3.546s
> >
> > so that "make oldconfig" really is almost as slow as the whole
> > "confirm build is done" thing. Its' quite noticeable in my workflow.
> >
> > The gcc config isn't super-fast either, but there's a big 3x
> > difference, so the clang case really is doing something extra wrong.
> >
> > I've not actually looked into _why_. Except I do see that "clang" gets
> > invoked with small (empty?) test files several times, probably to
> > check for command line flags being valid.
> >
> > Sending this to relevant parties in the hope that somebody goes "Yeah,
> > that's silly" and fixes it.
> >
> > This is on my F34 machine:
> >
> > clang version 12.0.0 (Fedora 12.0.0-0.3.rc1.fc34)
> >
> > in case it matters (but I don't see why it should).
> >
> > Many many moons ago the promise for clang was faster build speeds.
> > That didn't turn out to be true, but can we please at least try to
> > make them not painfully much slower?
>
> Hi Linus,
>
> I benchmarked this with your latest tree
> (8ca5297e7e38f2dc8c753d33a5092e7be181fff0) with my distribution versions
> of clang 11.1.0 and gcc 10.2.0 and I saw the same results, benchmarking
> with hyperfine.
>
> $ hyperfine -L comp_var "","CC=clang " -r 100 -S /bin/sh -w 5 'make {comp_var}allmodconfig'
> Benchmark #1: make allmodconfig
> Time (mean ± σ): 1.490 s ± 0.012 s [User: 1.153 s, System: 0.374 s]
> Range (min … max): 1.462 s … 1.522 s 100 runs
>
> Benchmark #2: make CC=clang allmodconfig
> Time (mean ± σ): 4.001 s ± 0.020 s [User: 2.761 s, System: 1.274 s]
> Range (min … max): 3.939 s … 4.038 s 100 runs
>
> Summary
> 'make allmodconfig' ran
> 2.69 ± 0.03 times faster than 'make CC=clang allmodconfig'
$ hyperfine -L comp_var "","CC=clang " -r 100 -S /bin/sh -w 5 'make {comp_var}allmodconfig'
Benchmark #1: make allmodconfig
Time (mean ± σ): 2.095 s ± 0.025 s [User: 1.285 s, System: 0.880 s]
Range (min … max): 2.014 s … 2.168 s 100 runs
Benchmark #2: make CC=clang allmodconfig
Time (mean ± σ): 2.930 s ± 0.034 s [User: 1.522 s, System: 1.477 s]
Range (min … max): 2.849 s … 3.005 s 100 runs
Summary
'make allmodconfig' ran
1.40 ± 0.02 times faster than 'make CC=clang allmodconfig'
Swapping the order, I get pretty similar results to my initial run:
hyperfine -L comp_var "CC=clang ","" -r 100 -S /bin/sh -w 5 'make {comp_var}allmodconfig'
Benchmark #1: make CC=clang allmodconfig
Time (mean ± σ): 2.915 s ± 0.031 s [User: 1.501 s, System: 1.482 s]
Range (min … max): 2.825 s … 3.004 s 100 runs
Benchmark #2: make allmodconfig
Time (mean ± σ): 2.093 s ± 0.022 s [User: 1.284 s, System: 0.879 s]
Range (min … max): 2.037 s … 2.136 s 100 runs
Summary
'make allmodconfig' ran
1.39 ± 0.02 times faster than 'make CC=clang allmodconfig'
So, yes, slower, but not quite as drastic as others have observed.
>
> It was also reproducible in a Fedora Docker image, which has newer
> versions of those tools than my distro does (GCC 11.1.0 and clang
> 12.0.0):
>
> $ hyperfine -L comp_var "","CC=clang " -r 100 -S /bin/sh -w 5 'make {comp_var}allmodconfig'
> Benchmark #1: make allmodconfig
> Time (mean ± σ): 989.9 ms ± 3.5 ms [User: 747.0 ms, System: 271.1 ms]
> Range (min … max): 983.0 ms … 998.2 ms 100 runs
>
> Benchmark #2: make CC=clang allmodconfig
> Time (mean ± σ): 3.328 s ± 0.005 s [User: 2.408 s, System: 0.948 s]
> Range (min … max): 3.316 s … 3.343 s 100 runs
>
> Summary
> 'make allmodconfig' ran
> 3.36 ± 0.01 times faster than 'make CC=clang allmodconfig'
>
> Unfortunately, I doubt there is much that can be done on the kernel side
> because this is reproducible just invoking the compilers without any
> source input.
>
> Clang 11.1.0 and GCC 10.2.0:
>
> $ hyperfine -i -L compiler gcc,clang -r 5000 -S /bin/sh -w 5 'echo | {compiler} -x c -c -o /dev/null -'
> Benchmark #1: echo | gcc -x c -c -o /dev/null -
> Time (mean ± σ): 9.6 ms ± 1.0 ms [User: 6.5 ms, System: 3.4 ms]
> Range (min … max): 5.8 ms … 12.7 ms 5000 runs
>
> Benchmark #2: echo | clang -x c -c -o /dev/null -
> Time (mean ± σ): 33.0 ms ± 0.8 ms [User: 22.4 ms, System: 10.9 ms]
> Range (min … max): 30.3 ms … 36.0 ms 5000 runs
>
> Summary
> 'echo | gcc -x c -c -o /dev/null -' ran
> 3.45 ± 0.39 times faster than 'echo | clang -x c -c -o /dev/null -'
hyperfine -i -L compiler gcc,clang -r 5000 -S /bin/sh -w 5 'echo | {compiler} -x c -c -o /dev/null -'
Benchmark #1: echo | gcc -x c -c -o /dev/null -
Time (mean ± σ): 21.4 ms ± 2.4 ms [User: 11.6 ms, System: 10.8 ms]
Range (min … max): 12.8 ms … 27.3 ms 5000 runs
Benchmark #2: echo | clang -x c -c -o /dev/null -
Time (mean ± σ): 16.4 ms ± 2.3 ms [User: 8.6 ms, System: 8.8 ms]
Range (min … max): 10.4 ms … 25.4 ms 5000 runs
Summary
' echo | clang -x c -c -o /dev/null -' ran
1.31 ± 0.24 times faster than ' echo | gcc -x c -c -o /dev/null -'
>
> $ hyperfine -i -L compiler gcc,clang -r 5000 -S /bin/sh -w 5 'echo | {compiler} -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -'
> Benchmark #1: echo | gcc -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -
> Time (mean ± σ): 11.9 ms ± 1.1 ms [User: 10.5 ms, System: 1.8 ms]
> Range (min … max): 8.2 ms … 15.1 ms 5000 runs
>
> Warning: Ignoring non-zero exit code.
>
> Benchmark #2: echo | clang -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -
> Time (mean ± σ): 31.0 ms ± 0.8 ms [User: 20.3 ms, System: 10.9 ms]
> Range (min … max): 27.9 ms … 33.8 ms 5000 runs
>
> Warning: Ignoring non-zero exit code.
>
> Summary
> 'echo | gcc -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -' ran
> 2.62 ± 0.26 times faster than 'echo | clang -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -'
hyperfine -i -L compiler gcc,clang -r 5000 -S /bin/sh -w 5 'echo | {compiler} -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -'
Benchmark #1: echo | gcc -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -
Time (mean ± σ): 18.5 ms ± 2.4 ms [User: 17.0 ms, System: 2.7 ms]
Range (min … max): 12.2 ms … 24.6 ms 5000 runs
Warning: Ignoring non-zero exit code.
Benchmark #2: echo | clang -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -
Time (mean ± σ): 15.4 ms ± 2.3 ms [User: 8.4 ms, System: 8.1 ms]
Range (min … max): 9.5 ms … 22.6 ms 5000 runs
Warning: Ignoring non-zero exit code.
Summary
' echo | clang -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -' ran
1.20 ± 0.23 times faster than ' echo | gcc -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -'
>
> Clang 12.0.0 and GCC 11.1.0:
>
> $ hyperfine -i -L compiler gcc,clang -r 5000 -S /bin/sh -w 5 'echo | {compiler} -x c -c -o /dev/null -'
> Benchmark #1: echo | gcc -x c -c -o /dev/null -
> Time (mean ± σ): 8.5 ms ± 0.3 ms [User: 5.6 ms, System: 3.3 ms]
> Range (min … max): 7.6 ms … 9.8 ms 5000 runs
>
> Benchmark #2: echo | clang -x c -c -o /dev/null -
> Time (mean ± σ): 27.4 ms ± 0.4 ms [User: 19.6 ms, System: 8.1 ms]
> Range (min … max): 26.4 ms … 29.1 ms 5000 runs
>
> Summary
> 'echo | gcc -x c -c -o /dev/null -' ran
> 3.22 ± 0.13 times faster than 'echo | clang -x c -c -o /dev/null -'
>
> $ hyperfine -i -L compiler gcc,clang -r 5000 -S /bin/sh -w 5 'echo | {compiler} -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -'
> Benchmark #1: echo | gcc -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -
> Time (mean ± σ): 12.2 ms ± 0.3 ms [User: 11.5 ms, System: 1.0 ms]
> Range (min … max): 11.7 ms … 13.9 ms 5000 runs
>
> Warning: Ignoring non-zero exit code.
>
> Benchmark #2: echo | clang -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -
> Time (mean ± σ): 26.3 ms ± 0.5 ms [User: 19.1 ms, System: 7.5 ms]
> Range (min … max): 25.2 ms … 28.1 ms 5000 runs
>
> Warning: Ignoring non-zero exit code.
>
> Summary
> 'echo | gcc -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -' ran
> 2.16 ± 0.06 times faster than 'echo | clang -Werror -Wflag-that-does-not-exit -x c -c -o /dev/null -'
>
> Seems that GCC is faster to complete when it does not have to parse
> warning flags, while clang shows no major variance. Thinking more about it,
> cc-option gives clang an empty file, so it should not have to actually
> parse anything; I do not think '-fsyntax-only' will gain us a whole
> ton because we should not be dipping into the backend at all.
>
> Tangentially, my version of clang built with Profile Guided Optimization
> gets me close to GCC. I am surprised to see this level of gain.
>
> $ hyperfine -i -L compiler gcc,clang -r 5000 -S /bin/sh -w 5 'echo | {compiler} -x c -c -o /dev/null -'
> Benchmark #1: echo | gcc -x c -c -o /dev/null -
> Time (mean ± σ): 9.6 ms ± 1.0 ms [User: 6.4 ms, System: 3.5 ms]
> Range (min … max): 5.6 ms … 12.9 ms 5000 runs
hyperfine -i -L compiler gcc,clang -r 5000 -S /bin/sh -w 5 'echo | {compiler} -x c -c -o /dev/null -'
Benchmark #1: echo | gcc -x c -c -o /dev/null -
Time (mean ± σ): 21.3 ms ± 2.4 ms [User: 11.7 ms, System: 10.6 ms]
Range (min … max): 12.2 ms … 27.4 ms 5000 runs
Benchmark #2: echo | clang -x c -c -o /dev/null -
Time (mean ± σ): 16.3 ms ± 2.3 ms [User: 8.5 ms, System: 8.8 ms]
Range (min … max): 10.1 ms … 25.2 ms 5000 runs
Summary
' echo | clang -x c -c -o /dev/null -' ran
1.31 ± 0.24 times faster than ' echo | gcc -x c -c -o /dev/null -'
So now clang is faster? Am I holding it wrong?
>
> Benchmark #2: echo | clang -x c -c -o /dev/null -
> Time (mean ± σ): 8.7 ms ± 1.3 ms [User: 4.3 ms, System: 4.9 ms]
> Range (min … max): 4.9 ms … 12.1 ms 5000 runs
>
> Warning: Command took less than 5 ms to complete. Results might be inaccurate.
>
> Summary
> 'echo | clang -x c -c -o /dev/null -' ran
> 1.10 ± 0.20 times faster than 'echo | gcc -x c -c -o /dev/null -'
>
> $ hyperfine -L comp_var "","CC=clang " -r 100 -S /bin/sh -w 5 'make {comp_var}allmodconfig'
> Benchmark #1: make allmodconfig
> Time (mean ± σ): 1.531 s ± 0.011 s [User: 1.180 s, System: 0.388 s]
> Range (min … max): 1.501 s … 1.561 s 100 runs
>
> Benchmark #2: make CC=clang allmodconfig
> Time (mean ± σ): 1.828 s ± 0.015 s [User: 1.209 s, System: 0.760 s]
> Range (min … max): 1.802 s … 1.872 s 100 runs
>
> Summary
> 'make allmodconfig' ran
> 1.19 ± 0.01 times faster than 'make CC=clang allmodconfig'
>
> I think that we should definitely see what we can do to speed up the front end.
Numbers between machines probably aren't directly comparable, but I
would be curious whether the toolchain used to build clang makes a
difference, whether debug builds are significantly slower, and whether
distro toolchains vs local builds from the same branch (but all on
the same machine) are noticeably different.
--
Thanks,
~Nick Desaulniers
On Thu, Apr 29, 2021 at 5:19 PM Nick Desaulniers
<[email protected]> wrote:
>
> On Thu, Apr 29, 2021 at 2:53 PM Linus Torvalds
> <[email protected]> wrote:
> >
> > I haven't looked into why this is so slow with clang, but it really is
> > painfully slow:
> >
> > time make CC=clang allmodconfig
> > real 0m2.667s
> >
> > vs the gcc case:
> >
> > time make CC=gcc allmodconfig
> > real 0m0.903s
>
> Can
> you provide info about your clang build such as the version string,
> and whether this was built locally perhaps?
d'oh it was below.
> > This is on my F34 machine:
> >
> > clang version 12.0.0 (Fedora 12.0.0-0.3.rc1.fc34)
--
Thanks,
~Nick Desaulniers
On Thu, Apr 29, 2021 at 7:22 PM Nick Desaulniers
<[email protected]> wrote:
>
> On Thu, Apr 29, 2021 at 5:19 PM Nick Desaulniers
> <[email protected]> wrote:
> >
> > On Thu, Apr 29, 2021 at 2:53 PM Linus Torvalds
> > <[email protected]> wrote:
> > >
> > > I haven't looked into why this is so slow with clang, but it really is
> > > painfully slow:
> > >
> > > time make CC=clang allmodconfig
> > > real 0m2.667s
> > >
> > > vs the gcc case:
> > >
> > > time make CC=gcc allmodconfig
> > > real 0m0.903s
> >
> > Can
> > you provide info about your clang build such as the version string,
> > and whether this was built locally perhaps?
>
> d'oh it was below.
>
> > > This is on my F34 machine:
> > >
> > > clang version 12.0.0 (Fedora 12.0.0-0.3.rc1.fc34)
A quick:
$ perf record -e cycles:pp --call-graph lbr make LLVM=1 LLVM_IAS=1 -j72 allmodconfig
$ perf report --no-children --sort=dso,symbol
shows:
2.35% [unknown] [k] 0xffffffffabc00fc7
+ 2.29% libc-2.31.so [.] _int_malloc
1.24% libc-2.31.so [.] _int_free
+ 1.23% ld-2.31.so [.] do_lookup_x
+ 1.14% libc-2.31.so [.] __strlen_avx2
+ 1.06% libc-2.31.so [.] malloc
+ 1.03% clang-13 [.] llvm::StringMapImpl::LookupBucketFor
1.01% libc-2.31.so [.] __memmove_avx_unaligned_erms
+ 0.76% conf [.] yylex
+ 0.68% clang-13 [.] llvm::Instruction::getNumSuccessors
+ 0.63% libbfd-2.35.2-system.so [.] bfd_hash_lookup
+ 0.63% clang-13 [.] llvm::PMDataManager::findAnalysisPass
+ 0.63% ld-2.31.so [.] _dl_lookup_symbol_x
0.62% libc-2.31.so [.] __memcmp_avx2_movbe
0.60% libc-2.31.so [.] __strcmp_avx2
+ 0.56% clang-13 [.] llvm::ValueHandleBase::AddToUseList
+ 0.56% clang-13 [.] llvm::operator==<llvm::DenseMap<llvm::BasicBlock const*, unsigned int, llvm::DenseMapInfo<llvm::BasicBlock const*>, llvm::detail::Dense
0.53% clang-13 [.] llvm::SmallPtrSetImplBase::insert_imp_big
(yes, I know about kptr_restrict)(sorry if there's a better way to
share such perf data; don't you need to share perf.data and the same
binary, IIRC?)
The string map lookups look expected; the compiler flags are one very
large string map, though we've previously identified that the hashing
could perhaps be sped up.
llvm::Instruction::getNumSuccessors looks unexpectedly like codegen,
but this was a trace of `allmodconfig`; I wouldn't be surprised if
this is LLVM=1 setting HOSTCC=clang; might be good to try to isolate
those out.
Some other questions that came to mind thinking about this overnight:
- is Kbuild/make doing more work than is necessary when building with
clang (beyond perhaps a few more cc-option checks)? I don't think perf
is the right tool for profiling GNU make. V=1 to make hides a lot of
the work macros like cc-option are doing.
- is clang doing more work than necessary for just checking support of
command line flags? Probably; I'm not sure that has been optimized
before. But if we pursue that and the slowdown turns out to be mostly
the previous point, it would potentially be a waste of time. (A quick
way to isolate the per-flag cost is sketched below.)
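Something along these lines should measure a single probe in isolation,
without Kbuild in the way ('-Wsome-new-flag' is a hypothetical stand-in for
whatever option is being tested):
$ perf stat -r 50 -- sh -c 'echo | clang -Werror -Wsome-new-flag -x c -c -o /dev/null -'
$ perf stat -r 50 -- sh -c 'echo | gcc -Werror -Wsome-new-flag -x c -c -o /dev/null -'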
--
Thanks,
~Nick Desaulniers
On Fri, Apr 30, 2021 at 5:19 PM Nick Desaulniers
<[email protected]> wrote:
>
> A quick:
> $ perf record -e cycles:pp --call-graph lbr make LLVM=1 LLVM_IAS=1
> -j72 allmodconfig
> $ perf report --no-children --sort=dso,symbol
> shows:
> 2.35% [unknown] [k] 0xffffffffabc00fc7
> + 2.29% libc-2.31.so [.] _int_malloc
> 1.24% libc-2.31.so [.] _int_free
> + 1.23% ld-2.31.so [.] do_lookup_x
^ bfd
> + 1.14% libc-2.31.so [.] __strlen_avx2
> + 1.06% libc-2.31.so [.] malloc
> + 1.03% clang-13 [.] llvm::StringMapImpl::LookupBucketFor
> 1.01% libc-2.31.so [.] __memmove_avx_unaligned_erms
> + 0.76% conf [.] yylex
^ kconfig
> + 0.68% clang-13 [.] llvm::Instruction::getNumSuccessors
> + 0.63% libbfd-2.35.2-system.so [.] bfd_hash_lookup
^ bfd
> + 0.63% clang-13 [.] llvm::PMDataManager::findAnalysisPass
^ this is another suspect one to me, though perhaps I should isolate
out the HOSTCC invocations vs cc-option checks. Should retest with
just CC=clang rather than HOSTCC=clang set via LLVM=1.
> + 0.63% ld-2.31.so [.] _dl_lookup_symbol_x
^ bfd
> 0.62% libc-2.31.so [.] __memcmp_avx2_movbe
> 0.60% libc-2.31.so [.] __strcmp_avx2
> + 0.56% clang-13 [.] llvm::ValueHandleBase::AddToUseList
> + 0.56% clang-13 [.]
> llvm::operator==<llvm::DenseMap<llvm::BasicBlock const*, unsigned int,
> llvm::DenseMapInfo<llvm::BasicBlock const*>, llvm::detail::Dense
> 0.53% clang-13 [.]
> llvm::SmallPtrSetImplBase::insert_imp_big
--
Thanks,
~Nick Desaulniers
On Fri, Apr 30, 2021 at 5:23 PM Nick Desaulniers
<[email protected]> wrote:
>
> On Fri, Apr 30, 2021 at 5:19 PM Nick Desaulniers
> <[email protected]> wrote:
> >
> > A quick:
> > $ perf record -e cycles:pp --call-graph lbr make LLVM=1 LLVM_IAS=1
> > -j72 allmodconfig
> > $ perf report --no-children --sort=dso,symbol
> > shows:
> > 2.35% [unknown] [k] 0xffffffffabc00fc7
> > + 2.29% libc-2.31.so [.] _int_malloc
> > 1.24% libc-2.31.so [.] _int_free
> > + 1.23% ld-2.31.so [.] do_lookup_x
>
> ^ bfd
> > + 0.63% ld-2.31.so [.] _dl_lookup_symbol_x
>
> ^ bfd
Ah, no, sorry, these are the runtime link editor/loader. So probably
spending quite some time resolving symbols in large binaries.
--
Thanks,
~Nick Desaulniers
On Fri, Apr 30, 2021 at 5:25 PM Nick Desaulniers
<[email protected]> wrote:
>
> On Fri, Apr 30, 2021 at 5:23 PM Nick Desaulniers
> <[email protected]> wrote:
> >
> > On Fri, Apr 30, 2021 at 5:19 PM Nick Desaulniers
> > <[email protected]> wrote:
> > >
> > > A quick:
> > > $ perf record -e cycles:pp --call-graph lbr make LLVM=1 LLVM_IAS=1
> > > -j72 allmodconfig
> > > $ perf report --no-children --sort=dso,symbol
> > > shows:
> > > 2.35% [unknown] [k] 0xffffffffabc00fc7
> > > + 2.29% libc-2.31.so [.] _int_malloc
> > > 1.24% libc-2.31.so [.] _int_free
> > > + 1.23% ld-2.31.so [.] do_lookup_x
> >
> > ^ bfd
>
>
> > > + 0.63% ld-2.31.so [.] _dl_lookup_symbol_x
> >
> > ^ bfd
>
> Ah, no, sorry, these are the runtime link editor/loader. So probably
> spending quite some time resolving symbols in large binaries.
I don't see NOW in:
$ llvm-readelf -Wd `which clang`
so I don't think clang was linked as `-Wl,-z,now`. I also see both
.hash and .gnu.hash in
$ llvm-readelf -S `which clang`.
The presence of NOW or lack of .gnu.hash would have been my guess for
symbol lookup issues. Perhaps the length of C++ mangled symbols
doesn't help.
--
Thanks,
~Nick Desaulniers
On Fri, Apr 30, 2021 at 5:25 PM Nick Desaulniers
<[email protected]> wrote:
>
> Ah, no, sorry, these are the runtime link editor/loader. So probably
> spending quite some time resolving symbols in large binaries.
Yeah. Appended is the profile I see when I profile that "make
oldconfig", so about 45% of all time seems to be spent in just symbol
lookup and relocation.
And a fair amount of time just creating and tearing down that huge
executable (with a lot of copy-on-write overhead too), with the kernel
side of that being another 15%. The cost of that is likely also fairly
directly linked to all the dynamic linking costs, which brings in all
that data.
Just to compare, btw, this is the symbol lookup overhead for the gcc case:
1.43% ld-2.33.so do_lookup_x
0.96% ld-2.33.so _dl_relocate_object
0.69% ld-2.33.so _dl_lookup_symbol_x
so it really does seem to be something very odd going on with the clang binary.
Maybe the Fedora binary is built some odd way, but it's likely just
the default clang build.
Linus
----
23.59% ld-2.33.so _dl_lookup_symbol_x
11.41% ld-2.33.so _dl_relocate_object
9.95% ld-2.33.so do_lookup_x
4.00% [kernel.vmlinux] copy_page
3.98% [kernel.vmlinux] next_uptodate_page
3.05% [kernel.vmlinux] zap_pte_range
1.81% [kernel.vmlinux] clear_page_rep
1.68% [kernel.vmlinux] asm_exc_page_fault
1.33% ld-2.33.so strcmp
1.33% ld-2.33.so check_match
0.92% libLLVM-12.so llvm::StringMapImpl::LookupBucketFor
0.83% [kernel.vmlinux] rmqueue_bulk
0.77% conf yylex
0.75% libc-2.33.so __gconv_transform_utf8_internal
0.74% libc-2.33.so _int_malloc
0.69% libc-2.33.so __strlen_avx2
0.62% [kernel.vmlinux] pagecache_get_page
0.58% [kernel.vmlinux] page_remove_rmap
0.56% [kernel.vmlinux] __handle_mm_fault
0.54% [kernel.vmlinux] filemap_map_pages
0.54% libc-2.33.so __strcmp_avx2
0.54% [kernel.vmlinux] __free_one_page
0.52% [kernel.vmlinux] release_pages
On Fri, Apr 30, 2021 at 6:22 PM Linus Torvalds
<[email protected]> wrote:
>
> On Fri, Apr 30, 2021 at 5:25 PM Nick Desaulniers
> <[email protected]> wrote:
> >
> > Ah, no, sorry, these are the runtime link editor/loader. So probably
> > spending quite some time resolving symbols in large binaries.
>
> Yeah. Appended is the profile I see when I profile that "make
> oldconfig", so about 45% of all time seems to be spent in just symbol
> lookup and relocation.
>
> And a fair amount of time just creating and tearing down that huge
> executable (with a lot of copy-on-write overhead too), with the kernel
> side of that being another 15%. The cost of that is likely also fairly
> directly linked to all the dynamic linking costs, which brings in all
> that data.
>
> Just to compare, btw, this is the symbol lookup overhead for the gcc case:
>
> 1.43% ld-2.33.so do_lookup_x
> 0.96% ld-2.33.so _dl_relocate_object
> 0.69% ld-2.33.so _dl_lookup_symbol_x
>
> so it really does seem to be something very odd going on with the clang binary.
>
> Maybe the Fedora binary is built some odd way, but it's likely just
> the default clang build.
>
> Linus
>
> ----
> 23.59% ld-2.33.so _dl_lookup_symbol_x
> 11.41% ld-2.33.so _dl_relocate_object
> 9.95% ld-2.33.so do_lookup_x
> 4.00% [kernel.vmlinux] copy_page
> 3.98% [kernel.vmlinux] next_uptodate_page
> 3.05% [kernel.vmlinux] zap_pte_range
> 1.81% [kernel.vmlinux] clear_page_rep
> 1.68% [kernel.vmlinux] asm_exc_page_fault
> 1.33% ld-2.33.so strcmp
> 1.33% ld-2.33.so check_match
47.61% spent in symbol table lookup. Nice. (Not counting probably a
fair amount of the libc calls below).
> 0.92% libLLVM-12.so llvm::StringMapImpl::LookupBucketFor
^ wait a minute; notice how in your profile the `Shared Object` is
attributed to `libLLVM-12.so` while mine is `clang-13`? Clang can be
built with libLLVM either statically or dynamically linked; see
the cmake variables
LLVM_BUILD_LLVM_DYLIB:BOOL
LLVM_LINK_LLVM_DYLIB:BOOL
BUILD_SHARED_LIBS:BOOL
https://llvm.org/docs/CMake.html
I think those are frowned upon; they're useful for cutting down on
developer iteration time, since you don't have to relink all of llvm
when developing clang. But shipping that in production? I just checked
and it doesn't look like we do that for AOSP's build of LLVM.
Tom, is one of the above intentionally set for clang builds on Fedora?
I'm guessing it's intentional that there are packages for
libLLVM-12.so and libclang-cpp.so.12, perhaps they have other
dependents?
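(A quick, rough way for others to tell how their clang binary was linked is
to look at its runtime dependencies; no output here would suggest LLVM is
statically linked into the clang executable:
$ ldd $(which clang) | grep -iE 'libLLVM|libclang-cpp'
)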
> 0.83% [kernel.vmlinux] rmqueue_bulk
> 0.77% conf yylex
> 0.75% libc-2.33.so __gconv_transform_utf8_internal
> 0.74% libc-2.33.so _int_malloc
> 0.69% libc-2.33.so __strlen_avx2
> 0.62% [kernel.vmlinux] pagecache_get_page
> 0.58% [kernel.vmlinux] page_remove_rmap
> 0.56% [kernel.vmlinux] __handle_mm_fault
> 0.54% [kernel.vmlinux] filemap_map_pages
> 0.54% libc-2.33.so __strcmp_avx2
> 0.54% [kernel.vmlinux] __free_one_page
> 0.52% [kernel.vmlinux] release_pages
--
Thanks,
~Nick Desaulniers
On 2021-04-30, Nick Desaulniers wrote:
>On Fri, Apr 30, 2021 at 6:22 PM Linus Torvalds
><[email protected]> wrote:
>>
>> On Fri, Apr 30, 2021 at 5:25 PM Nick Desaulniers
>> <[email protected]> wrote:
>> >
>> > Ah, no, sorry, these are the runtime link editor/loader. So probably
>> > spending quite some time resolving symbols in large binaries.
>>
>> Yeah. Appended is the profile I see when I profile that "make
>> oldconfig", so about 45% of all time seems to be spent in just symbol
>> lookup and relocation.
>>
>> And a fair amount of time just creating and tearing down that huge
>> executable (with a lot of copy-on-write overhead too), with the kernel
>> side of that being another 15%. The cost of that is likely also fairly
>> directly linked to all the dynamic linking costs, which brings in all
>> that data.
>>
>> Just to compare, btw, this is the symbol lookup overhead for the gcc case:
>>
>> 1.43% ld-2.33.so do_lookup_x
>> 0.96% ld-2.33.so _dl_relocate_object
>> 0.69% ld-2.33.so _dl_lookup_symbol_x
>>
>> so it really does seem to be something very odd going on with the clang binary.
>>
>> Maybe the Fedora binary is built some odd way, but it's likely just
>> the default clang build.
>>
>> Linus
>>
>> ----
>> 23.59% ld-2.33.so _dl_lookup_symbol_x
>> 11.41% ld-2.33.so _dl_relocate_object
>> 9.95% ld-2.33.so do_lookup_x
>> 4.00% [kernel.vmlinux] copy_page
>> 3.98% [kernel.vmlinux] next_uptodate_page
>> 3.05% [kernel.vmlinux] zap_pte_range
>> 1.81% [kernel.vmlinux] clear_page_rep
>> 1.68% [kernel.vmlinux] asm_exc_page_fault
>> 1.33% ld-2.33.so strcmp
>> 1.33% ld-2.33.so check_match
>
>47.61% spent in symbol table lookup. Nice. (Not counting probably a
>fair amount of the libc calls below).
>
>> 0.92% libLLVM-12.so llvm::StringMapImpl::LookupBucketFor
>
>^ wait a minute; notice how in your profile the `Shared Object` is
>attributed to `libLLVM-12.so` while mine is `clang-13`? Clang can be
>built as either having libllvm statically linked or dynamically; see
>the cmake variables
>LLVM_BUILD_LLVM_DYLIB:BOOL
>LLVM_LINK_LLVM_DYLIB:BOOL
>BUILD_SHARED_LIBS:BOOL
>https://llvm.org/docs/CMake.html
>
>I think those are frowned upon; useful for cutting down on developers
>iteration speed due to not having to relink llvm when developing
>clang. But shipping that in production? I just checked and it doesn't
>look like we do that for AOSP's build of LLVM.
>
>Tom, is one of the above intentionally set for clang builds on Fedora?
>I'm guessing it's intentional that there are packages for
>libLLVM-12.so and libclang-cpp.so.12, perhaps they have other
>dependents?
LLVM_LINK_LLVM_DYLIB (linking against libLLVM.so instead of libLLVM*.a)
has been around for a while.
Tom added CLANG_LINK_CLANG_DYLIB in 2019
(https://reviews.llvm.org/D63503 link against libclang-cpp.so instead of
libclang*.a or libclang*.so) :) So I'd guess this is a conscious decision
for Fedora.
Arch Linux has switched to -DCLANG_LINK_CLANG_DYLIB=on as well
https://github.com/archlinux/svntogit-packages/blob/packages/clang/trunk/PKGBUILD
This is useful to make the total size of LLVM/clang dependent packages
(ccls, zig, etc) small.
If we don't let distributions use libLLVM.so libclang-cpp.so, hmmmm, I guess
their only choice will be crunchgen[1]-style
clang+lld+llvm-objcopy+llvm-objdump+llvm-ar+llvm-nm+llvm-strings+llvm-readelf+...+clang-format+clang-offload-bundler+...
(executables from packages which are usually named llvm, clang, and clang-tools)
[1]: https://www.freebsd.org/cgi/man.cgi?query=crunchgen&sektion=1
>> 0.83% [kernel.vmlinux] rmqueue_bulk
>> 0.77% conf yylex
>> 0.75% libc-2.33.so __gconv_transform_utf8_internal
>> 0.74% libc-2.33.so _int_malloc
>> 0.69% libc-2.33.so __strlen_avx2
>> 0.62% [kernel.vmlinux] pagecache_get_page
>> 0.58% [kernel.vmlinux] page_remove_rmap
>> 0.56% [kernel.vmlinux] __handle_mm_fault
>> 0.54% [kernel.vmlinux] filemap_map_pages
>> 0.54% libc-2.33.so __strcmp_avx2
>> 0.54% [kernel.vmlinux] __free_one_page
>> 0.52% [kernel.vmlinux] release_pages
>--
>Thanks,
>~Nick Desaulniers
On 4/30/21 6:48 PM, Nick Desaulniers wrote:
> On Fri, Apr 30, 2021 at 6:22 PM Linus Torvalds
> <[email protected]> wrote:
>>
>> On Fri, Apr 30, 2021 at 5:25 PM Nick Desaulniers
>> <[email protected]> wrote:
>>>
>>> Ah, no, sorry, these are the runtime link editor/loader. So probably
>>> spending quite some time resolving symbols in large binaries.
>>
>> Yeah. Appended is the profile I see when I profile that "make
>> oldconfig", so about 45% of all time seems to be spent in just symbol
>> lookup and relocation.
>>
>> And a fair amount of time just creating and tearing down that huge
>> executable (with a lot of copy-on-write overhead too), with the kernel
>> side of that being another 15%. The cost of that is likely also fairly
>> directly linked to all the dynamic linking costs, which brings in all
>> that data.
>>
>> Just to compare, btw, this is the symbol lookup overhead for the gcc case:
>>
>> 1.43% ld-2.33.so do_lookup_x
>> 0.96% ld-2.33.so _dl_relocate_object
>> 0.69% ld-2.33.so _dl_lookup_symbol_x
>>
>> so it really does seem to be something very odd going on with the clang binary.
>>
>> Maybe the Fedora binary is built some odd way, but it's likely just
>> the default clang build.
>>
>> Linus
>>
>> ----
>> 23.59% ld-2.33.so _dl_lookup_symbol_x
>> 11.41% ld-2.33.so _dl_relocate_object
>> 9.95% ld-2.33.so do_lookup_x
>> 4.00% [kernel.vmlinux] copy_page
>> 3.98% [kernel.vmlinux] next_uptodate_page
>> 3.05% [kernel.vmlinux] zap_pte_range
>> 1.81% [kernel.vmlinux] clear_page_rep
>> 1.68% [kernel.vmlinux] asm_exc_page_fault
>> 1.33% ld-2.33.so strcmp
>> 1.33% ld-2.33.so check_match
>
> 47.61% spent in symbol table lookup. Nice. (Not counting probably a
> fair amount of the libc calls below).
>
>> 0.92% libLLVM-12.so llvm::StringMapImpl::LookupBucketFor
>
> ^ wait a minute; notice how in your profile the `Shared Object` is
> attributed to `libLLVM-12.so` while mine is `clang-13`? Clang can be
> built as either having libllvm statically linked or dynamically; see
> the cmake variables
> LLVM_BUILD_LLVM_DYLIB:BOOL
> LLVM_LINK_LLVM_DYLIB:BOOL
> BUILD_SHARED_LIBS:BOOL
> https://llvm.org/docs/CMake.html
>
> I think those are frowned upon; useful for cutting down on developers
> iteration speed due to not having to relink llvm when developing
> clang. But shipping that in production? I just checked and it doesn't
> look like we do that for AOSP's build of LLVM.
>
BUILD_SHARED_LIBS is the only one that is discouraged and we don't use
that in Fedora any more. We just use LLVM_LINK_LLVM_DYLIB and the
clang equivalent.
> Tom, is one of the above intentionally set for clang builds on Fedora?
> I'm guessing it's intentional that there are packages for
> libLLVM-12.so and libclang-cpp.so.12, perhaps they have other
> dependents?
>
Yes, it's intentional. Dynamic linking libraries from other packages is
the Fedora policy[1], and clang and llvm are separate packages (in Fedora).
- Tom
[1] https://docs.fedoraproject.org/en-US/packaging-guidelines/#_statically_linking_executables
>> 0.83% [kernel.vmlinux] rmqueue_bulk
>> 0.77% conf yylex
>> 0.75% libc-2.33.so __gconv_transform_utf8_internal
>> 0.74% libc-2.33.so _int_malloc
>> 0.69% libc-2.33.so __strlen_avx2
>> 0.62% [kernel.vmlinux] pagecache_get_page
>> 0.58% [kernel.vmlinux] page_remove_rmap
>> 0.56% [kernel.vmlinux] __handle_mm_fault
>> 0.54% [kernel.vmlinux] filemap_map_pages
>> 0.54% libc-2.33.so __strcmp_avx2
>> 0.54% [kernel.vmlinux] __free_one_page
>> 0.52% [kernel.vmlinux] release_pages
On Fri, Apr 30, 2021 at 8:33 PM Tom Stellard <[email protected]> wrote:
>
> Yes, it's intentional. Dynamic linking libraries from other packages is
> the Fedora policy[1], and clang and llvm are separate packages (in Fedora).
Side note: I really wish Fedora stopped doing that.
Shared libraries are not a good thing in general. They add a lot of
overhead in this case, but more importantly they also add lots of
unnecessary dependencies and complexity, and almost no shared
libraries are actually version-safe, so it adds absolutely zero
upside.
Yes, it can save on disk use, but unless it's some very core library
used by a lot of things (ie particularly things like GUI libraries
like gnome or Qt or similar), the disk savings are often not all that
big - and disk is cheap. And the memory savings are often actually
negative (again, unless it's some big library that is typically used
by lots of different programs at the same time).
In this case, for example, it's true that a parallel build will be
running possibly hundreds of copies of clang at the same time - and
they'll all share the shared llvm library. But they'd share those same
pages even if it wasn't a shared library, because it's the same
executable! And the dynamic linking will actually cause a lot _less_
sharing because of all the fixups.
We hit this in the subsurface project too. We had a couple of
libraries that *nobody* else used. Literally *nobody*. But the Fedora
policy meant that a Fedora package had to go the extra mile to make
those other libraries be shared libraries, for actual negative gain,
and a much more fragile end result (since those libraries were in no
way compatible across different versions - so it all had to be updated
in lock-step).
I think people have this incorrect picture that "shared libraries are
inherently good". They really really aren't. They cause a lot of
problems, and the advantage really should always be weighed against
those (big) disadvantages.
Pretty much the only case shared libraries really make sense is for
truly standardized system libraries that are everywhere, and are part
of the base distro.
[ Or, for those very rare programs that end up dynamically loading
rare modules at run-time - not at startup - because that's their
extension model. But that's a different kind of "shared library"
entirely, even if ELF makes the technical distinction between
"loadable module" and "shared library" be a somewhat moot point ]
Linus
On Sat, May 01, 2021 at 09:32:25AM -0700, Linus Torvalds wrote:
> On Fri, Apr 30, 2021 at 8:33 PM Tom Stellard <[email protected]> wrote:
> >
> > Yes, it's intentional. Dynamic linking libraries from other packages is
> > the Fedora policy[1], and clang and llvm are separate packages (in Fedora).
>
> Side note: I really wish Fedora stopped doing that.
>
> Shared libraries are not a good thing in general. They add a lot of
> overhead in this case, but more importantly they also add lots of
> unnecessary dependencies and complexity, and almost no shared
> libraries are actually version-safe, so it adds absolutely zero
> upside.
>
> Yes, it can save on disk use, but unless it's some very core library
> used by a lot of things (ie particularly things like GUI libraries
> like gnome or Qt or similar), the disk savings are often not all that
> big - and disk is cheap. And the memory savings are often actually
> negative (again, unless it's some big library that is typically used
> by lots of different programs at the same time).
>
> In this case, for example, it's true that a parallel build will be
> running possibly hundreds of copies of clang at the same time - and
> they'll all share the shared llvm library. But they'd share those same
> pages even if it wasn't a shared library, because it's the same
> executable! And the dynamic linking will actually cause a lot _less_
> sharing because of all the fixups.
>
> We hit this in the subsurface project too. We had a couple of
> libraries that *nobody* else used. Literally *nobody*. But the Fedora
> policy meant that a Fedora package had to go the extra mile to make
> those other libraries be shared libraries, for actual negative gain,
> and a much more fragile end result (since those libraries were in no
> way compatible across different versions - so it all had to be updated
> in lock-step).
>
> I think people have this incorrect picture that "shared libraries are
> inherently good". They really really aren't. They cause a lot of
> problems, and the advantage really should always be weighed against
> those (big) disadvantages.
>
> Pretty much the only case shared libraries really make sense is for
> truly standardized system libraries that are everywhere, and are part
> of the base distro.
>
> [ Or, for those very rare programs that end up dynamically loading
> rare modules at run-time - not at startup - because that's their
> extension model. But that's a different kind of "shared library"
> entirely, even if ELF makes the technical distinction between
> "loadable module" and "shared library" be a somewhat moot point ]
I tend to agree with most of these arguments, but let me offer another
perspective:
# from an llvm-repo, configured to use libLLVM.so
> du -s bin
9152344 bin
# from the same repo, configured to use static libraries
> du -s bin
43777528
As a packager, I roughly need to put all these bits into packages, across
base, development and debug packages. As a user, I may need to download them.
Disk space is OK, but network bandwidth is not as cheap for everyone.
Different metrics lead to different choices, and then comes the great pleasure
of making compromises :-)
From: Linus Torvalds
> Sent: 01 May 2021 17:32
>
> On Fri, Apr 30, 2021 at 8:33 PM Tom Stellard <[email protected]> wrote:
> >
> > Yes, it's intentional. Dynamic linking libraries from other packages is
> > the Fedora policy[1], and clang and llvm are separate packages (in Fedora).
>
> Side note: I really wish Fedora stopped doing that.
>
> Shared libraries are not a good thing in general. They add a lot of
> overhead in this case, but more importantly they also add lots of
> unnecessary dependencies and complexity, and almost no shared
> libraries are actually version-safe, so it adds absolutely zero
> upside.
It's 'swings and roundabouts'...
I used a system where the libc.so the linker found was actually
an archive library, one member being libc.so.1.
The function that updated utmp and utmpx (last login details)
was in the archive part.
This code had incorrect locking and corrupted the files.
While the fix was easy, getting it 'installed' wasn't, because all
the programs that used it needed to be relinked - hard when some
were provided as commercial binaries by 3rd parties.
I've also done some experiments with the mozilla web browser.
This loaded about 30 libraries at program startup.
The elf symbol hashing rules don't help at all!
Every symbol gets looked for in every library (often checking
for a non-weak symbol having found a weak definition).
So the hash of the symbol is calculated.
It is reduced modulo the hash table size and the linked list is scanned.
Now the hash table size is the prime below the power of 2 below
the number of symbols (well, it was when I did this).
So the average hash chain has about 1.5 entries.
With 30 libraries this is ~45 string compares.
If all the strings start with similar prefixes (C++ classes)
then the strcmp() calls are quite long.
I played around with the hash table size.
It really didn't matter whether it was a prime or not.
For libc the distribution was always horrid - with some
quite long hash chains.
Making the hash table larger than the number of symbols
(perhaps 2 powers of 2 above) would be more likely to
make the hash hit an empty list - and skip all the strcmp().
The other 'trick' was a rewrite of the dynamic loader to
generate a single symbol table that contained all the symbols
of all the libraries loaded at program startup.
Process the libraries in the right order and this is easy.
That made a considerable improvement to program startup.
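(For anyone who wants to look at the actual bucket distribution, readelf can
print the chain-length histogram for the .hash/.gnu.hash sections; the path
is just an example, and the tail is only there to trim the long symbol dump:
$ readelf --dyn-syms --histogram /usr/lib64/libLLVM-12.so | tail -n 40
)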
David
On Sat, May 1, 2021 at 12:58 PM Serge Guelton <[email protected]> wrote:
>
> Different metrics lead to different choice, then comes the great pleasure of
> making compromises :-)
Even if that particular compromise might be the right one to do for
clang and llvm, the point is that the Fedora rule is garbage, and it
doesn't _allow_ for making any compromises at all.
The Fedora policy is basically "you have to use shared libraries
whether that makes any sense or not".
As mentioned, I've seen a project bitten by that insane policy. It's bogus.
Linus
On 2021-05-01, Linus Torvalds wrote:
>On Sat, May 1, 2021 at 12:58 PM Serge Guelton <[email protected]> wrote:
>>
>> Different metrics lead to different choice, then comes the great pleasure of
>> making compromises :-)
>
>Even if that particular compromise might be the right one to do for
>clang and llvm, the point is that the Fedora rule is garbage, and it
>doesn't _allow_ for making any compromises at all.
>
>The Fedora policy is basically "you have to use shared libraries
>whether that makes any sense or not".
>
>As mentioned, I've seen a project bitten by that insane policy. It's bogus.
>
> Linus
As a very safe optimization, distributions can consider
-fno-semantic-interposition (only effective on x86 in GCC and Clang,
already used by some Python packages):
it avoids GOT/PLT-generating relocations when the referenced symbol is
defined in the same translation unit. See my benchmark below: it makes
the built -fPIC clang slightly faster.
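(For a distribution build that is just a matter of adding the flag to the
compile flags; a sketch with placeholder paths, dylib options as discussed
earlier in the thread:
$ cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS=clang \
    -DLLVM_LINK_LLVM_DYLIB=ON -DCLANG_LINK_CLANG_DYLIB=ON \
    -DCMAKE_C_FLAGS=-fno-semantic-interposition \
    -DCMAKE_CXX_FLAGS=-fno-semantic-interposition \
    ~/llvm-project/llvm
)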
As a slightly aggressive optimization, consider
-DCMAKE_EXE_LINKER_FLAGS=-Wl,-Bsymbolic-functions -DCMAKE_SHARED_LINKER_FLAGS=-Wl,-Bsymbolic-functions.
The performance is comparable to a mostly statically linked PIE clang (-shared
-Bsymbolic is very similar to -pie): function calls within libLLVM.so
or libclang-cpp.so have no extra cost compared with a mostly statically linked PIE clang.
Normally I don't recommend -Bsymbolic because
* it can break C++ semantics about address uniqueness of inline functions,
type_info (exceptions) when there are multiple definitions in the
process. I believe LLVM+Clang are not subject to such issues.
We don't throw LLVM/Clang type exceptions.
* it is not compatible with copy relocations[1]. This is not an issue for -Bsymbolic-functions.
-Bsymbolic-functions should be suitable for LLVM+Clang.
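(A rough way to see the effect is to count the PLT relocations left in the
dylib when linked with and without -Bsymbolic-functions; the path is just an
example and the count is only a proxy for how many lookups the loader does:
$ llvm-readelf -r /usr/lib64/libLLVM-12.so | grep -c JUMP_SLOT
)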
LD=ld.lld -j 40 defconfig; time 'make vmlinux'
# the compile flags may be very different from the clang builds below.
system gcc
1050.15s user 192.96s system 3015% cpu 41.219 total
1055.47s user 196.51s system 3022% cpu 41.424 total
clang (libLLVM*.a libclang*.a); LLVM=1
1588.35s user 193.02s system 3223% cpu 55.259 total
1613.59s user 193.22s system 3234% cpu 55.861 total
clang (libLLVM.so libclang-cpp.so); LLVM=1
1870.07s user 222.86s system 3256% cpu 1:04.26 total
1863.26s user 220.59s system 3219% cpu 1:04.73 total
1877.79s user 223.98s system 3233% cpu 1:05.00 total
1859.32s user 221.96s system 3241% cpu 1:04.20 total
clang (libLLVM.so libclang-cpp.so -fno-semantic-interposition); LLVM=1
1810.47s user 222.98s system 3288% cpu 1:01.83 total
1790.46s user 219.65s system 3227% cpu 1:02.27 total
1796.46s user 220.88s system 3139% cpu 1:04.25 total
1796.55s user 221.28s system 3215% cpu 1:02.75 total
clang (libLLVM.so libclang-cpp.so -fno-semantic-interposition -Wl,-Bsymbolic); LLVM=1
1608.75s user 221.39s system 3192% cpu 57.333 total
1607.85s user 220.60s system 3205% cpu 57.042 total
1598.64s user 191.21s system 3208% cpu 55.778 total
clang (libLLVM.so libclang-cpp.so -fno-semantic-interposition -Wl,-Bsymbolic-functions); LLVM=1
1617.35s user 220.54s system 3217% cpu 57.115 total
LLVM's reusable component design causes us some overhead here. Almost
every cross-TU callable function is moved to a public header and
exported, libLLVM.so and libclang-cpp.so have huge dynamic symbol tables.
-Wl,--gc-sections cannot really eliminate much.
(Last, I guess it is a conscious decision that distributions build all
targets instead of just the host -DLLVM_TARGETS_TO_BUILD=host. This
makes cross compilation easy: a single clang can replace various *-linux-gnu-gcc)
[1]: Even if one design goal of -fPIE is to avoid copy relocations, and
normally there should be no issue on non-x86, there is an unfortunate
GCC 5 fallout for x86-64 ("x86-64: Optimize access to globals in PIE with copy reloc").
I'll omit words here as you can find details on https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected
-Bsymbolic-functions avoids such issues.
On Fri, Apr 30, 2021 at 06:48:11PM -0700, Nick Desaulniers wrote:
> On Fri, Apr 30, 2021 at 6:22 PM Linus Torvalds
> <[email protected]> wrote:
> >
> > On Fri, Apr 30, 2021 at 5:25 PM Nick Desaulniers
> > <[email protected]> wrote:
> > >
> > > Ah, no, sorry, these are the runtime link editor/loader. So probably
> > > spending quite some time resolving symbols in large binaries.
> >
> > Yeah. Appended is the profile I see when I profile that "make
> > oldconfig", so about 45% of all time seems to be spent in just symbol
> > lookup and relocation.
> >
> > And a fair amount of time just creating and tearing down that huge
> > executable (with a lot of copy-on-write overhead too), with the kernel
> > side of that being another 15%. The cost of that is likely also fairly
> > directly linked to all the dynamic linking costs, which brings in all
> > that data.
> >
> > Just to compare, btw, this is the symbol lookup overhead for the gcc case:
> >
> > 1.43% ld-2.33.so do_lookup_x
> > 0.96% ld-2.33.so _dl_relocate_object
> > 0.69% ld-2.33.so _dl_lookup_symbol_x
> >
> > so it really does seem to be something very odd going on with the clang binary.
> >
> > Maybe the Fedora binary is built some odd way, but it's likely just
> > the default clang build.
> >
> > Linus
> >
> > ----
> > 23.59% ld-2.33.so _dl_lookup_symbol_x
> > 11.41% ld-2.33.so _dl_relocate_object
> > 9.95% ld-2.33.so do_lookup_x
> > 4.00% [kernel.vmlinux] copy_page
> > 3.98% [kernel.vmlinux] next_uptodate_page
> > 3.05% [kernel.vmlinux] zap_pte_range
> > 1.81% [kernel.vmlinux] clear_page_rep
> > 1.68% [kernel.vmlinux] asm_exc_page_fault
> > 1.33% ld-2.33.so strcmp
> > 1.33% ld-2.33.so check_match
>
> 47.61% spent in symbol table lookup. Nice. (Not counting probably a
> fair amount of the libc calls below).
>
> > 0.92% libLLVM-12.so llvm::StringMapImpl::LookupBucketFor
>
> ^ wait a minute; notice how in your profile the `Shared Object` is
> attributed to `libLLVM-12.so` while mine is `clang-13`? Clang can be
> built as either having libllvm statically linked or dynamically; see
> the cmake variables
> LLVM_BUILD_LLVM_DYLIB:BOOL
> LLVM_LINK_LLVM_DYLIB:BOOL
> BUILD_SHARED_LIBS:BOOL
> https://llvm.org/docs/CMake.html
>
> I think those are frowned upon; useful for cutting down on developers
> iteration speed due to not having to relink llvm when developing
> clang. But shipping that in production? I just checked and it doesn't
> look like we do that for AOSP's build of LLVM.
>
> Tom, is one of the above intentionally set for clang builds on Fedora?
> I'm guessing it's intentional that there are packages for
> libLLVM-12.so and libclang-cpp.so.12, perhaps they have other
> dependents?
Have you tried building clang/llvm with -Bsymbolic-functions?
Mike
On Fri, Apr 30, 2021 at 06:48:11PM -0700, Nick Desaulniers wrote:
> On Fri, Apr 30, 2021 at 6:22 PM Linus Torvalds
> <[email protected]> wrote:
> > 0.92% libLLVM-12.so llvm::StringMapImpl::LookupBucketFor
>
> ^ wait a minute; notice how in your profile the `Shared Object` is
> attributed to `libLLVM-12.so` while mine is `clang-13`? Clang can be
> built as either having libllvm statically linked or dynamically; see
> the cmake variables
> LLVM_BUILD_LLVM_DYLIB:BOOL
> LLVM_LINK_LLVM_DYLIB:BOOL
> BUILD_SHARED_LIBS:BOOL
> https://llvm.org/docs/CMake.html
>
> I think those are frowned upon; useful for cutting down on developers
> iteration speed due to not having to relink llvm when developing
> clang. But shipping that in production? I just checked and it doesn't
> look like we do that for AOSP's build of LLVM.
There's also `-DLLVM_ENABLE_LTO=Thin`, which enables LTO for building LLVM
and Clang themselves, given that they can be bootstrapped like this
using a previous version of Clang. Combined with a non-shared library
build mode for both Clang and LLVM, the result is possibly the fastest
and most optimized build achievable. Unfortunately, I see distributions
neglecting to enable this in their packaging as well.
On a side note, I'm also a Fedora user and agree with Linus about this.
I'd like to see an opt-in bypass of the shared library policy via
something like `dnf install clang-optimized` that would install the
fastest and most optimized Clang build regardless of RPM install size.
--
Dan Aloni
On Sat, May 01, 2021 at 09:32:25AM -0700, Linus Torvalds wrote:
>...
> Yes, it can save on disk use, but unless it's some very core library
> used by a lot of things (ie particularly things like GUI libraries
> like gnome or Qt or similar), the disk savings are often not all that
> big - and disk is cheap. And the memory savings are often actually
> negative (again, unless it's some big library that is typically used
> by lots of different programs at the same time).
>...
> I think people have this incorrect picture that "shared libraries are
> inherently good". They really really aren't. They cause a lot of
> problems, and the advantage really should always be weighed against
> those (big) disadvantages.
>...
Disk and memory usage is not the biggest advantage.
The biggest advantage of shared libraries is that they enable
distributions to provide security fixes.
Distributions try hard to have only one place to patch and one package
to rebuild when a CVE has to be fixed.
It is not feasible to rebuild all users of a library in a
distribution every time a CVE gets published for a library.
Some of the new language ecosystems like Go or Rust do not offer
shared libraries.
At the end of this email are some of the recent CVEs in Rust.
Q:
What happens if you use a program provided by your distribution that is
written in Rust and handles untrusted input in a way that it might be
vulnerable to exploits based on one of these CVEs?
A:
The program has a known vulnerability that will likely stay unfixed.
This is of course not a problem for the rare software like Firefox or
the kernel that have CVEs themselves so regularly that they get rebuilt
all the time.
> Linus
cu
Adrian
CVE-2020-36317 In the standard library in Rust before 1.49.0,
String::retain() function has a panic safety problem. It allows creation
of a non-UTF-8 Rust string when the provided closure panics. This bug
could result in a memory safety violation when other string APIs assume
that UTF-8 encoding is used on the same string.
CVE-2020-36318 In the standard library in Rust before 1.49.0,
VecDeque::make_contiguous has a bug that pops the same element more than
once under certain condition. This bug could result in a use-after-free
or double free.
CVE-2020-36323 In the standard library in Rust before 1.52.0, there is
an optimization for joining strings that can cause uninitialized bytes
to be exposed (or the program to crash) if the borrowed string changes
after its length is checked.
CVE-2021-28875 In the standard library in Rust before 1.50.0,
read_to_end() does not validate the return value from Read in an unsafe
context. This bug could lead to a buffer overflow.
CVE-2021-31162 In the standard library in Rust before 1.53.0, a double
free can occur in the Vec::from_iter function if freeing the element
panics.
From: Adrian Bunk
> Sent: 02 May 2021 10:31
>
> On Sat, May 01, 2021 at 09:32:25AM -0700, Linus Torvalds wrote:
> >...
> > Yes, it can save on disk use, but unless it's some very core library
> > used by a lot of things (ie particularly things like GUI libraries
> > like gnome or Qt or similar), the disk savings are often not all that
> > big - and disk is cheap. And the memory savings are often actually
> > negative (again, unless it's some big library that is typically used
> > by lots of different programs at the same time).
> >...
> > I think people have this incorrect picture that "shared libraries are
> > inherently good". They really really aren't. They cause a lot of
> > problems, and the advantage really should always be weighed against
> > those (big) disadvantages.
> >...
>
> Disk and memory usage is not the biggest advantage.
>
> The biggest advantage of shared libraries is that they enable
> distributions to provide security fixes.
>
> Distributions try hard to have only one place to patch and one package
> to rebuild when a CVE has to be fixed.
>
> It is not feasible to rebuild all users of a library in a
> distribution every time a CVE gets published for a library.
Absolutely.
You'd also need to rebuild every application that might include
the static version of the broken function.
Good luck finding all those on a big install.
OTOH just splitting a compiler into multiple shared objects
that have no other use is, as Linus said, stupid.
Building shared libraries requires the same kind of interface discipline
as building the kernel: the user-visible interface mustn't change.
You can add new functions, but not change any existing ones.
This is easy in C, difficult in C++.
Since PLT lookups can only handle code, you really don't want
data areas shared between the program and library.
If the size ever changes 'horrid things (tm)' happen.
We compile any shared libraries with -fvisibility=hidden and
mark any entry points with __attribute__((visibility("protected"))).
This means that calls within a library are simple PC-relative calls,
and only the entry points are visible outside.
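A minimal sketch of that scheme, with invented names:

/* libentry.c -- cc -O2 -fPIC -fvisibility=hidden -shared libentry.c -o libentry.so */

/* Not static, so it would normally be exported; -fvisibility=hidden keeps
 * it out of the dynamic symbol table and calls to it stay plain
 * PC-relative calls. */
long internal_helper(long x)
{
        return x * 3;
}

/* The one deliberate entry point.  "protected" keeps it visible to outside
 * callers while guaranteeing that references from inside this library bind
 * here directly rather than through the PLT. */
__attribute__((visibility("protected")))
long lib_entry(long x)
{
        return internal_helper(x) + 1;
}

readelf --dyn-syms on the resulting library should then show lib_entry and
not much else.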
David
On Sun, May 2, 2021 at 2:31 AM Adrian Bunk <[email protected]> wrote:
>
> The biggest advantage of shared libraries is that they enable
> distributions to provide security fixes.
Adrian - you're ignoring the real argument, to the point that the
above is basically a lie.
The argument was never that things like libc or the core GUI libraries
shouldn't be shared.
The argument was that the "one-off" libraries shouldn't be shared.
Things very much like libLLVM.so.
Or things like "libdivecomputer.so". You probably have never ever
heard of that library, have you? It's used by one single project, that
project isn't even in Fedora, but when we tried to make an rpm for it,
people complained because the Fedora rules said it needed to use
shared libraries.
So the whole notion that "shared libraries are good and required by
default" is pure and utter garbage. It's simply not true.
And no, it really didn't become any more true due to "security fixes".
Your argument is a red herring.
Linus
On Sun, May 02, 2021 at 09:12:37AM -0700, Linus Torvalds wrote:
> On Sun, May 2, 2021 at 2:31 AM Adrian Bunk <[email protected]> wrote:
> >
> > The biggest advantage of shared libraries is that they enable
> > distributions to provide security fixes.
>
> Adrian - you're ignoring the real argument, to the point that the
> above is basically a lie.
>
> The argument was never that things like libc or the core GUI libraries
> shouldn't be shared.
>
> The argument was that the "one-off" libraries shouldn't be shared.
>
> Things very much like libLLVM.so.
>...
Mesa and PostgreSQL are among the packages that do use libLLVM.so,
this is a popular library for implementing compilers and JITs.
> Linus
cu
Adrian
On Sun, May 2, 2021 at 9:45 AM Adrian Bunk <[email protected]> wrote:
>
> Mesa and PostgreSQL are among the packages that do use libLLVM.so,
> this is a popular library for implementing compilers and JITs.
Yes, and it's entirely reasonable to update those packages if it turns
out libLLVM has a bug in it.
Because we're talking about a small handful of packages, not some kind
of "everything" model.
So again, what's your point?
Linus
On Sun, May 02, 2021 at 09:49:44AM -0700, Linus Torvalds wrote:
> On Sun, May 2, 2021 at 9:45 AM Adrian Bunk <[email protected]> wrote:
> >
> > Mesa and PostgreSQL are among the packages that do use libLLVM.so,
> > this is a popular library for implementing compilers and JITs.
>
> Yes, and it's entirely reasonable to update those packages if it turns
> out libLLVM has a bug in it.
>
> Because we're talking about a small handful of packages, not some kind
> of "everything" model.
>
> So again, what's your point?
Two dozen other packages are linking directly with libLLVM.so.
Are you happy about libclang.so being a shared library?
libclang.so uses libLLVM.so, which adds another 10 indirect users.
Debian ships 30k source packages that build 60k binary packages,
with 3 years of security support (plus 2 years LTS).
It makes things a lot easier from a distribution point of view if a bug
in libLLVM can be fixed just there, instead of having to additionally
find and rebuild the 30 or more source packages building binary packages
that use libLLVM in a security update for a stable release of a distribution.
> Linus
cu
Adrian
On Sun, May 2, 2021 at 10:55 AM Adrian Bunk <[email protected]> wrote:
>
> Are you happy about libclang.so being a shared library?
Honestly, considering how I don't have any other package that I care
about than clang itself, and how this seems to be a *huge* performance
problem, then no.
But you are still entirely avoiding the real issue: the Fedora rule
that everything should be a shared library is simply bogus.
Even if the llvm/clang maintainers decide that that is what they want,
I know for a fact that that rule is completely the wrong thing in
other situations where people did *not* want that.
Can you please stop dancing around that issue, and just admit that the
whole "you should always use shared libraries" is simply WRONG.
Shared libraries really can have huge downsides, and the blind "shared
libraries are good" standpoint is just wrong.
Linus
On Sun, May 02, 2021 at 10:59:10AM -0700, Linus Torvalds wrote:
> On Sun, May 2, 2021 at 10:55 AM Adrian Bunk <[email protected]> wrote:
> >
> > Are you happy about libclang.so being a shared library?
>
> Honestly, considering how I don't have any other package that I care
> about than clang itself, and how this seems to be a *huge* performance
> problem, then no.
>
> But you are still entirely avoiding the real issue: the Fedora rule
> that everything should be a shared library is simply bogus.
It is not a Fedora-specific rule; we have something similar in Debian.
And in general, static libraries in the C/C++ ecosystem often feel like
a rarely used remnant from the last millennium (except for convenience
libraries during the build).
> Even if the llvm/clang maintainers decide that that is what they want,
> I know for a fact that that rule is completely the wrong thing in
> other situations where people did *not* want that.
libdivecomputer is now a submodule of subsurface.
If this is the only copy of libdivecomputer in a distribution,
then linking subsurface statically with it and not shipping it
as a separate library at all is fine for distributions.
The Fedora policy that was linked to also states that this is OK.
The important part for a distribution is to ship only one copy of the
code and having to rebuild only the one package containing it when
fixing a bug.
> Can you please stop dancing around that issue, and just admit that the
> whole "you should always use shared libraries" is simply WRONG.
>
> Shared libraries really can have huge downsides, and the blind "shared
> libraries are good" standpoint is just wrong.
Good for distributions or good for performance?
These two are quite distinct here, and distribution rules care about
what is good for distributions.
Library packages in ecosystems like Go or Rust are copies of the source
code, and when an application package is built with these "libraries"
(might even be using LTO) this is expected to be faster than using
shared libraries.
But for distributions, not using shared libraries can be a huge pain.
Compared to LTO compilation of all code used in an application, static
linking gives the same pain to distributions with smaller benefits.
I agree that on the performance side you have a valid point regarding
the disadvantages of shared libraries, but not using them is bad for
distributions since it makes maintaining and supporting the software
much harder and security support often impossible.
> Linus
cu
Adrian
On Sat, 1 May 2021, Linus Torvalds wrote:
> > Yes, it's intentional. Dynamic linking libraries from other packages is
> > the Fedora policy[1], and clang and llvm are separate packages (in Fedora).
>
> Side note: I really wish Fedora stopped doing that.
I wish they never stop.
> Shared libraries are not a good thing in general. They add a lot of
> overhead in this case, but more importantly they also add lots of
> unnecessary dependencies and complexity, and almost no shared
> libraries are actually version-safe, so it adds absolutely zero
> upside.
I agree shared libraries are a tough compromise, but there is an
important upside. Let me quote myself from a recent discussion
<https://lore.kernel.org/linux-mips/[email protected]/>:
"Dynamic shared objects (libraries) were invented in early 1990s for two
reasons:
1. To limit the use of virtual memory. Memory conservation may not be as
important nowadays in many applications where vast amounts of RAM are
available, though of course this does not apply everywhere, and still
it has to be weighed up whether any waste of resources is justified and
compensated by a gain elsewhere.
2. To make it easy to replace a piece of code shared among many programs,
so that you don't have to relink them all (or recompile if sources are
available) when say an issue is found or a feature is added that is
transparent to applications (for instance a new protocol or a better
algorithm). This still stands very much nowadays.
People went through great efforts to support shared libraries, sacrificed
performance for it even back then when the computing power was much lower
than nowadays. Support was implemented in Linux for the a.out binary
format even, despite the need to go through horrible hoops to get a.out
shared libraries built. Some COFF environments were adapted for shared
library support too."
And the context here is a bug in the linker that caused all programs built
by Golang to be broken WRT FPU handling for the 32-bit MIPS configuration,
due to a bad ABI annotation causing the wrong per-process FPU mode to be
set up at run time (Golang seemed to have got stuck in the early 2000s as
far as the MIPS ABI goes and chose to produce what has been considered
legacy objects for some 10 years now, and nobody noticed in 10 years or so
that the GNU linker does not handle legacy MIPS objects correctly anymore).
This could have been fixed easily by rebuilding the Go runtime, but as it
happens Google chose not to create a shared Go runtime and all programs
are linked statically except for libc.
This has led to a desperate attempt to work around the issue crudely in
the kernel (which cannot be done in a completely foolproof way, because
there's missing information) so that Debian does not have to rebuild 2000+
packages in a stable distribution, which OTOH is a no-no for them.
Whether distributions package shared libraries in a reasonable manner is
another matter, and I've lost hope it will ever happen, at least widely
(there has been an attempt to address that with a distribution called PLD,
where the policy used to have it that shared libraries coming from a given
source package need to go into a separate binary package of their own, so
that several versions of the same shared library package, each with a
different SONAME, can safely coexist in a single system, but I haven't
checked in many years whether the policy has been retained, nor have I
actually ever used PLD myself).
Maciej
On Mon, 3 May 2021, Theodore Ts'o wrote:
> On Mon, May 03, 2021 at 10:38:12AM -0400, Theodore Ts'o wrote:
> > On Mon, May 03, 2021 at 03:03:31AM +0200, Maciej W. Rozycki wrote:
> > >
> > > People went through great efforts to support shared libraries, sacrificed
> > > performance for it even back then when the computing power was much lower
> > > than nowadays.
> >
> > That was because memory was *incredibly* restrictive in those days.
> > My first Linux server had one gig of memory, and so shared libraries
> > provided a huge performance boost --- because otherwise systems would
> > be swapping or paging their brains out.
>
> Correction. My bad, my first Linux machine had 16 megs of memory....
There was memory and there was storage. Back in the 1990s I maintained
Linux machines with as little as 4MiB of RAM, or even 2MiB with some
80386SX box, and as little as a 40MB HDD or a 64MB SSD (which was pretty
damn expensive and occupied the whole 3.5" drive space in the PATA form
factor). Yes, 2MiB used to be the minimum for x86 around 2.0.x, and you
could actually boot a system multiuser with that little RAM. And obviously
dynamic executables took less storage space than static ones, so if you had
more than just a couple, the saving balanced the overhead of the shared
library files used.
But I agree this is only relevant nowadays in certain specific use cases
(which will often anyway choose to run things like BusyBox plus maybe just
a bunch of tools, and won't see any benefit from memory sharing or storage
saving).
> > However, these days, many if not most developers aren't capable of the
> > discpline needed to maintained the ABI stability needed for shared
> > libraries to work well. I can think several packages where if you
> > used shared libraries, the major version number would need to be
> > bumped at every releases, because people don't know how to spell ABI,
> > never mind be able to *preserve* ABI. Heck, it's the same reason that
> > we don't promise kernel ABI compatibility for kernel modules!
> >
> > https://www.kernel.org/doc/Documentation/process/stable-api-nonsense.rst
> >
> > And in the case of Debian, use of shared libraries means that every
> > time you release a new version of, say, f2fs-tools, things can get
> > stalled for months or in one case, over a year, due to the new package
> > review process (a shared library version bump means a new binary
> > package, and that in turn requires a full review of the entire source
> > package for GPL compliance from scratch, and f2fs-tools has bumped
> > their shared library major version *every* *single* *release*) ---
> > during which time, security bug fixes were being held up due to the
> > new package review tarpit.
Well, SONAME maintenance is indeed a hassle, but to solve this problem
we've had symbol versioning for decades now, ever since we switched
from libc 5 to glibc 2.0. And glibc hasn't bumped the SONAMEs of the
individual libraries ever since, while maintaining all the old ABIs (not
necessarily available to link against) and adding new ones as required.
So it has been pretty easy to maintain ABI compatibility nowadays without
the need to carry multiple library versions along, as long as you actually
care to do so.
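For anyone who hasn't used it, a minimal sketch of the mechanism (library
and version names invented here): the version script defines the version
nodes, and .symver ties each implementation to a versioned name, with @@
marking the default that newly linked programs pick up.

/* libfoo.map -- passed to the link with -Wl,--version-script=libfoo.map
 *
 *   LIBFOO_1.0 { global: foo; local: *; };
 *   LIBFOO_2.0 { global: foo; } LIBFOO_1.0;
 */

/* foo.c -- cc -O2 -fPIC -shared foo.c -Wl,--version-script=libfoo.map -o libfoo.so.1 */

/* Old ABI, kept so existing binaries bound to foo@LIBFOO_1.0 keep working. */
int foo_v1(int x)
{
        return x + 1;
}
__asm__(".symver foo_v1, foo@LIBFOO_1.0");

/* New ABI; the double @@ makes this the default version for new links. */
int foo_v2(int x, int y)
{
        return x + y;
}
__asm__(".symver foo_v2, foo@@LIBFOO_2.0");

Old binaries keep resolving foo@LIBFOO_1.0, a fresh link against -lfoo gets
foo@@LIBFOO_2.0, and the SONAME never changes.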
> > If people could actually guarantee stable ABI's, then shared libraries
> > might make sense. E2fsprogs hasn't had a major version bump in shared
> > libraries for over a decade (although some developers whine and
> > complain about how I reject function signature changes in the
> > libext2fs library to provide that ABI stability). But how many
> > userspace packages can make that claim?
That's actually a matter of general software quality and the competence
of software developers. I have no good answer except for a suggestion to
see this talk: <https://lca2020.linux.org.au/schedule/presentation/105/>.
Maciej
On Mon, May 03, 2021 at 03:03:31AM +0200, Maciej W. Rozycki wrote:
>
> People went through great efforts to support shared libraries, sacrificed
> performance for it even back then when the computing power was much lower
> than nowadays.
That was because memory was *incredibly* restrictive in those days.
My first Linux server had one gig of memory, and so shared libraries
provided a huge performance boost --- because otherwise systems would
be swapping or paging their brains out.
However, these days, many if not most developers aren't capable of the
discipline needed to maintain the ABI stability needed for shared
libraries to work well. I can think of several packages where, if you
used shared libraries, the major version number would need to be
bumped at every release, because people don't know how to spell ABI,
never mind be able to *preserve* ABI. Heck, it's the same reason that
we don't promise kernel ABI compatibility for kernel modules!
https://www.kernel.org/doc/Documentation/process/stable-api-nonsense.rst
And in the case of Debian, use of shared libraries means that every
time you release a new version of, say, f2fs-tools, things can get
stalled for months or in one case, over a year, due to the new package
review process (a shared library version bump means a new binary
package, and that in turn requires a full review of the entire source
package for GPL compliance from scratch, and f2fs-tools has bumped
their shared library major version *every* *single* *release*) ---
during which time, security bug fixes were being held up due to the
new package review tarpit.
If people could actually guarantee stable ABI's, then shared libraries
might make sense. E2fsprogs hasn't had a major version bump in shared
libraries for over a decade (although some developers whine and
complain about how I reject function signature changes in the
libext2fs library to provide that ABI stability). But how many
userspace packages can make that claim?
- Ted
On Mon, May 03, 2021 at 10:38:12AM -0400, Theodore Ts'o wrote:
> On Mon, May 03, 2021 at 03:03:31AM +0200, Maciej W. Rozycki wrote:
> >
> > People went through great efforts to support shared libraries, sacrificed
> > performance for it even back then when the computing power was much lower
> > than nowadays.
>
> That was because memory was *incredibly* restrictive in those days.
> My first Linux server had one gig of memory, and so shared libraries
> provided a huge performance boost --- because otherwise systems would
> be swapping or paging their brains out.
Correction. My bad, my first Linux machine had 16 megs of memory....
- Ted
From: Theodore Ts'o <[email protected]>
> Sent: 03 May 2021 15:38
...
> If people could actually guarantee stable ABI's, then shared libraries
> might make sense. E2fsprogs hasn't had a major version bump in shared
> libraries for over a decade (although some developers whine and
> complain about how I reject function signature changes in the
> libext2fs library to provide that ABI stability). But how many
> userspace packages can make that claim?
Indeed. Stable ABIs are really mandatory for anything released as
a shared library.
You can add new functions, and (if careful) new features to
existing functions (if you remembered to check all those unused
fields and flags), but the function signatures must not change.
You also can't change the exported data area.
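A sketch of what checking those unused fields and flags looks like in
practice (all names invented): the exported call takes a zero-filled,
size-tagged parameter block, so the signature itself never has to change.

#include <stddef.h>
#include <stdint.h>

/* ABI-frozen parameter block: fields are only ever appended, and callers
 * must zero-fill it, so a newer library can tell which options an older
 * caller never set. */
struct foo_params {
        uint32_t size;          /* caller sets sizeof(struct foo_params) */
        uint32_t flags;         /* FOO_FLAG_* bits; unknown bits must be 0 */
        uint64_t value;
        uint64_t reserved[4];   /* spare room, must be zero for now */
};

#define FOO_FLAG_VERBOSE 0x1u

int foo_do(const struct foo_params *p)
{
        uint64_t value = 0;

        /* Refuse flag bits this version of the library doesn't know. */
        if (p->flags & ~FOO_FLAG_VERBOSE)
                return -1;

        /* Only read fields the caller's (possibly older) struct has. */
        if (p->size >= offsetof(struct foo_params, value) + sizeof p->value)
                value = p->value;

        /* ... do the real work with 'value' ... */
        return value != 0;
}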
We've got some simple drivers; they don't do anything complex,
just hardware interrupts and PCIe accesses.
It wouldn't require many structures to be fixed, and a few
non-inlined versions of some access functions, to make these
reasonably binary compatible, at least to the point that they
don't need rebuilding when a distribution releases a new minor
kernel version.
Solaris had stable kernel ABIs.
The Windows version of our drivers installs on everything
from Windows 7 (maybe even Vista) through to the latest
Windows 10 (apart from the 'driver signing' fiasco).
With multiple symbol namespaces it ought to be possible
to keep them separately stable - so that drivers that only
use some symbols are portable.
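The kernel does in fact already have named symbol namespaces for module
exports, which is the building block for that; a minimal sketch (driver and
namespace names invented):

/* In the exporting module: */
#include <linux/module.h>

int mydrv_do_io(void)
{
        return 0;
}
/* Exported into the MYDRV_HELPERS namespace, not the global export table. */
EXPORT_SYMBOL_NS_GPL(mydrv_do_io, MYDRV_HELPERS);
MODULE_LICENSE("GPL");

/* Each consumer module has to opt in explicitly, or modpost warns at build
 * time and (by default) the module is refused at load time:
 *
 *   MODULE_IMPORT_NS(MYDRV_HELPERS);
 */

That doesn't make the symbols stable by itself, but it does give each group
of symbols a name that a stability promise could be attached to.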
Of course, there are the people who only want to support
in-tree source drivers.
They clearly exist outside the commercial world.
David
On 5/1/21 10:19 PM, Dan Aloni wrote:
> On Fri, Apr 30, 2021 at 06:48:11PM -0700, Nick Desaulniers wrote:
>> On Fri, Apr 30, 2021 at 6:22 PM Linus Torvalds
>> <[email protected]> wrote:
>>> 0.92% libLLVM-12.so llvm::StringMapImpl::LookupBucketFor
>>
>> ^ wait a minute; notice how in your profile the `Shared Object` is
>> attributed to `libLLVM-12.so` while mine is `clang-13`? Clang can be
>> built as either having libllvm statically linked or dynamically; see
>> the cmake variables
>> LLVM_BUILD_LLVM_DYLIB:BOOL
>> LLVM_LINK_LLVM_DYLIB:BOOL
>> BUILD_SHARED_LIBS:BOOL
>> https://llvm.org/docs/CMake.html
>>
>> I think those are frowned upon; useful for cutting down on developers
>> iteration speed due to not having to relink llvm when developing
>> clang. But shipping that in production? I just checked and it doesn't
>> look like we do that for AOSP's build of LLVM.
>
> There's also `-DLLVM_ENABLE_LTO=Thin` that enables LTO for building LLVM
> and Clang themselves, considered they can be bootstrapped like this
> using a previous version of Clang. Combining that with a non-shared
> library build mode for both Clang and LLVM, the result is possibly the
> fastest and most optimized build that is achievable. Unfortunately I
> see distributions neglecting to enable this in packaging this as well.
>
> On a side note, I'm also a Fedora user and agree with Linus about this.
> I'd like to see an opt-in bypass of the shared library policy via
> something like `dnf install clang-optimized` that would install the
> fastest and most optimized Clang build regardless of RPM install size.
>
I have experimented with creating a static version of clang in the past,
but never really found a solution I liked enough to upstream into Fedora,
e.g. this solution[1] that we're using to bootstrap clang in the internal
clang-as-the-default-cc Fedora buildroots we use for testing.
If someone could file a bug[2] against the clang package in Fedora (or RHEL even)
with some data or other information that shows the downsides of the shared
build of clang, that would be really helpful.
-Tom
[1] https://src.fedoraproject.org/fork/tstellar/rpms/clang/c/dea2413c6822cc7aa7a08ebe73d10abf8216259f?branch=clang-minimal
[2] https://bugzilla.redhat.com/
On 2021-05-03, Tom Stellard wrote:
>On 5/1/21 10:19 PM, Dan Aloni wrote:
>>On Fri, Apr 30, 2021 at 06:48:11PM -0700, Nick Desaulniers wrote:
>>>On Fri, Apr 30, 2021 at 6:22 PM Linus Torvalds
>>><[email protected]> wrote:
>>>> 0.92% libLLVM-12.so llvm::StringMapImpl::LookupBucketFor
>>>
>>>^ wait a minute; notice how in your profile the `Shared Object` is
>>>attributed to `libLLVM-12.so` while mine is `clang-13`? Clang can be
>>>built as either having libllvm statically linked or dynamically; see
>>>the cmake variables
>>>LLVM_BUILD_LLVM_DYLIB:BOOL
>>>LLVM_LINK_LLVM_DYLIB:BOOL
>>>BUILD_SHARED_LIBS:BOOL
>>>https://llvm.org/docs/CMake.html
>>>
>>>I think those are frowned upon; useful for cutting down on developers
>>>iteration speed due to not having to relink llvm when developing
>>>clang. But shipping that in production? I just checked and it doesn't
>>>look like we do that for AOSP's build of LLVM.
>>
>>There's also `-DLLVM_ENABLE_LTO=Thin` that enables LTO for building LLVM
>>and Clang themselves, considered they can be bootstrapped like this
>>using a previous version of Clang. Combining that with a non-shared
>>library build mode for both Clang and LLVM, the result is possibly the
>>fastest and most optimized build that is achievable. Unfortunately I
>>see distributions neglecting to enable this in packaging this as well.
>>
>>On a side note, I'm also a Fedora user and agree with Linus about this.
>>I'd like to see an opt-in bypass of the shared library policy via
>>something like `dnf install clang-optimized` that would install the
>>fastest and most optimized Clang build regardless of RPM install size.
>>
>
>I have experimented with creating a static version of clang in the past,
>but never really found a solution I liked enough to upstream into Fedora.
>e.g. This solution[1] that we're using to bootstrap clang in our internal
>clang-as-the-default-cc Fedora buildroots that we use for testing.
>
>If someone could file a bug[2] against the clang package in Fedora (or RHEL even)
>with some data or other information that shows the downsides of the shared
>build of of clang, that would be really helpful.
>
>-Tom
>
>[1] https://src.fedoraproject.org/fork/tstellar/rpms/clang/c/dea2413c6822cc7aa7a08ebe73d10abf8216259f?branch=clang-minimal
>[2] https://bugzilla.redhat.com/
>
I have filed https://bugzilla.redhat.com/show_bug.cgi?id=1956484 with
information from my previous reply https://lore.kernel.org/lkml/[email protected]/
-fpic (.so) -fno-semantic-interposition -Wl,-Bsymbolic-functions is very
close to -fpic/-fpie (.a) in terms of performance.
(If Fedora is willing to use -fprofile-use (profile-guided optimization)
or ThinLTO, that's great as well, at the cost of much longer build times.)
On Sun, May 2, 2021 at 11:48 PM Adrian Bunk <[email protected]> wrote:
>
> Library packages in ecosystems like Go or Rust are copies of the source
> code, and when an application package is built with these "libraries"
> (might even be using LTO) this is expected to be faster than using
> shared libraries.
Rust libraries only need to include "copies" for generics; and only
enough information to use them. Keeping the raw source code would be
one way of doing that (like C++ header-only libraries), but it is not
required.
However, it is true that Rust does not have a stable ABI, that the
vast majority of Rust open source applications get built from source
via Cargo and that Cargo does not share artifacts in its cache.
Cheers,
Miguel
On Sun, May 2, 2021 at 11:31 AM Adrian Bunk <[email protected]> wrote:
>
> Some of the new language ecosystems like Go or Rust do not offer
> shared libraries.
This is a bit misleading. Rust offers shared libraries, including the
option to offer a C ABI.
The problem is generics, which, like C++ templates, cannot be swapped
at runtime. Distributions have had to deal with the STL, Boost, etc.
all these years too.
In fact, Rust improves things a bit: there are no headers that need to
be parsed from scratch every time.
> What happens if you use a program provided by your distribution that is
> written in Rust and handles untrusted input in a way that it might be
> vulnerable to exploits based on one of these CVEs?
>
> The program has a known vulnerability that will likely stay unfixed.
Why? I fail to see what the issue is with rebuilding (or relinking) all
packages, except for distributions lacking enough compute resources.
Cheers,
Miguel
On Mon, 3 May 2021 at 10:39, Theodore Ts'o <[email protected]> wrote:
>
> That was because memory was *incredibly* restrictive in those days.
> My first Linux server had one gig of memory, and so shared libraries
> provided a huge performance boost --- because otherwise systems would
> be swapping or paging their brains out.
(I assume you mean 1 megabyte?)
I have 16G and the way modern programs are written I'm still having
trouble avoiding swap thrashing...
This is always a foolish argument though. Regardless of the amount of
resources available, we always want to use them as efficiently as
possible. The question is not whether we have more memory today than
before, but whether the time and power saved in reducing memory usage
(and memory bandwidth usage) is more or less than other resource costs
being traded off and whether that balance has changed.
> However, these days, many if not most developers aren't capable of the
> discpline needed to maintained the ABI stability needed for shared
> libraries to work well.
I would argue you have cause and effect reversed here. The reason
developers don't understand ABI (or even API) compatibility is
*because* they're used to people just static linking (or vendoring).
If people pushed back the world would be a better place.
--
greg
On Wed, May 05, 2021 at 12:02:33AM +0200, Miguel Ojeda wrote:
> On Sun, May 2, 2021 at 11:48 PM Adrian Bunk <[email protected]> wrote:
> >
> > Library packages in ecosystems like Go or Rust are copies of the source
> > code, and when an application package is built with these "libraries"
> > (might even be using LTO) this is expected to be faster than using
> > shared libraries.
>
> Rust libraries only need to include "copies" for generics; and only
> enough information to use them. Keeping the raw source code would be
> one way of doing that (like C++ header-only libraries), but it is not
> required.
>
> However, it is true that Rust does not have a stable ABI, that the
> vast majority of Rust open source applications get built from source
> via Cargo and that Cargo does not share artifacts in its cache.
What does this mean for enterprise distributions, like RHEL, which
need to maintain a stable kernel ABI as part of their business model?
I assume it means that they will need to lock down on a specific Rust
compiler and Rust libraries? How painful will it be for them to get
security updates (or have to do backports of security bug fixes) for
7-10 years?
- Ted
On Tue, May 04, 2021 at 07:04:56PM -0400, Greg Stark wrote:
> On Mon, 3 May 2021 at 10:39, Theodore Ts'o <[email protected]> wrote:
> >
> > That was because memory was *incredibly* restrictive in those days.
> > My first Linux server had one gig of memory, and so shared libraries
> > provided a huge performance boost --- because otherwise systems would
> > be swapping or paging their brains out.
>
> (I assume you mean 1 megabyte?)
> I have 16G and the way modern programs are written I'm still having
> trouble avoiding swap thrashing...
I corrected myself in a follow-on message; I had 16 megabytes of
memory, which was generous at the time. But it was still restrictive
enough that it made sense to have shared libraries for C library, X
Windows, etc.
> This is always a foolish argument though. Regardless of the amount of
> resources available we always want to use it as efficiently as
> possible. The question is not whether we have more memory today than
> before, but whether the time and power saved in reducing memory usage
> (and memory bandwidth usage) is more or less than other resource costs
> being traded off and whether that balance has changed.
It's always about engineering tradeoffs. We're always trading off
available CPU, memory, storage device speeds --- and also programmer
time and complexity. For example, C++ and stable ABI's really don't
go well together. So if you are using a large number of C++
libraries, the ability to maintain stable ABI's is ***much*** more
difficult. This was well understood decades ago --- there was an
Ottawa Linux Symposium presentation that discussed this in the context
of KDE two decades ago.
I'll also note that technology can play a huge role here. Debian,
for example, is now much more capable of rebuilding all packages from
source with autobuilders. In addition, most desktops have easy access
to high-speed network links, and are set up to auto-update packages. In
that case, the argument that distributions have to have shared
libraries because otherwise it's too hard to rebuild all of the
binaries that are statically linked against a library with a
security fix becomes much less compelling. It should be pretty simple
to set up a system where, after a library gets a security update, the
distribution could automatically figure out which packages need to be
automatically rebuilt, and rebuild them all.
> > However, these days, many if not most developers aren't capable of the
> > discpline needed to maintained the ABI stability needed for shared
> > libraries to work well.
>
> I would argue you have cause and effect reversed here. The reason
> developers don't understand ABI (or even API) compatibility is
> *because* they're used to people just static linking (or vendoring).
> If people pushed back the world would be a better place.
I'd argue it's just that many upstream developers don't *care*.
The incentives of an upstream developer and the distribution
maintainers are quite different. ABI compatibility doesn't bring much
benefits to upstream developers, and when you have a separation of
concerns between package maintenance and upstream development, it's
pretty inevitable.
I wear both hats for e2fsprogs as the upstream maintainer as well as
the Debian maintainer for that package, and I can definitely see the
differences in the points of view of those two roles.
Cheers,
- Ted
From: Miguel Ojeda
> Sent: 04 May 2021 22:33
...
> > What happens if you use a program provided by your distribution that is
> > written in Rust and handles untrusted input in a way that it might be
> > vulnerable to exploits based on one of these CVEs?
> >
> > The program has a known vulnerability that will likely stay unfixed.
>
> Why? I fail to see what is the issue rebuilding (or relinking) all
> packages except distributions lacking enough compute resources.
The problem isn't the packages that come with the distribution.
The problem is 3rd party programs supplied as binaries.
They have 2 big requirements:
1) The same binary will run on all distributions (newer than some cutoff).
2) Any serious bug fixes in system libraries get picked up when the
distribution updates the library.
There is also the possibility that the implementation of some
function differs between distributions.
So you absolutely need to use the version from the installed system
not whatever was in some static library on the actual build machine.
Both of these need stable ABI and shared libraries.
Remember, as far as userspace is concerned, foo.h is the definition
for 'foo' and foo.so is the current implementation.
(yes, I know a little bit of info is taken from foo.so on the build
system - but that ought to be absolutely minimal.)
David
On Wed, May 5, 2021 at 1:06 PM David Laight <[email protected]> wrote:
>
> The problem isn't the packages that come with the distribution.
My question was in the context of Adrian's emails, which were mentioning
issues for Linux distributions etc.
> The problem is 3rd party programs supplied as binaries.
> They have 2 big requirements:
> 1) The same binary will run on all distributions (newer than some cutoff).
This is fine with the "everything statically linked" model.
> 2) Any serious bug fixes in system libraries get picked up when the
> distribution updates the library.
For 3rd party software, this is usually done through an auto-update
mechanism of some kind. And since the vendor typically provides
everything, including dependencies (even libc in some cases!), they
can afford to statically link the world.
That model, of course, has issues -- the vendor may go out of
business, may be slow with security updates, etc.
But this is all orthogonal to Rust -- I replied mainly because it was
mentioned that Rust brought new issues to the table, which isn't true.
> There is also the possibility that the implementation of some
> function differs between distributions.
> So you absolutely need to use the version from the installed system
> not whatever was in some static library on the actual build machine.
>
> Both of these need stable ABI and shared libraries.
Not really. If you go for the "statically linked" model for your
application, you only need to care about the syscall layer (or
equivalent higher-level layers in e.g. Windows/macOS).
If you trust vendors a bit, you can instead go for "statically linked
except for major system libraries" (like libc or libm in Linux). This
is what Rust does by default for the glibc x86_64 target.
Given that nowadays statically linking is convenient, affordable and
improves performance, it seems like the right decision.
> Remember, as far as userspace is concerned, foo.h is the definition
> for 'foo' and foo.so is the current implementation.
> (yes, I know a little bit of info is taken from foo.so on the build
> system - but that ought to be absolutely minimal.)
No, that is only the C model for shared libraries.
C++ has had templates for decades now and no "C++ ABI" so far covers
them. Thus, if you want to provide templates as a library, they cannot
be "pre-compiled" and so the implementation is kept in the header.
This actually turned out to be quite convenient and nowadays many
libraries are developed as "header-only", in fact. Moreover, recently
the C++ standard introduced new features that simplify taking this
approach, e.g. C++17 `inline` variables.
Rust has the same issue with generics, but improves the situation a
bit: there is no need to reparse everything, every time, from scratch,
for each translation unit that uses a library with templates (which is
quite an issue for C++, with big projects going out of their way to
reduce the trees of includes).
Cheers,
Miguel
From: Miguel Ojeda
> Sent: 05 May 2021 14:54
>
> On Wed, May 5, 2021 at 1:06 PM David Laight <[email protected]> wrote:
> >
> > The problem isn't the packages that come with the distribution.
>
> My question was in the context of Adrian's emails who were mentioning
> issues for Linux distribution etc.
>
> > The problem is 3rd party programs supplied as binaries.
> > They have 2 big requirements:
> > 1) The same binary will run on all distributions (newer than some cutoff).
>
> This is fine with the "everything statically linked" model.
No it isn't.
It may work for simple library calls (like string functions)
that you can write yourself, and things that are direct system calls.
But it falls foul of anything complicated that has to interact
with the rest of the system.
> > 2) Any serious bug fixes in system libraries get picked up when the
> > distribution updates the library.
>
> For 3rd party software, this is usually done through an auto-update
> mechanism of some kind.
I can already see a herd of pigs flying...
> And since the vendor typically provides
> everything, including dependencies (even libc in some cases!), they
> can afford to statically link the world.
That might work if they are supplying all the applications that run
on a given system - and probably the kernel as well.
> That model, of course, has issues -- the vendor may go out of
> business, may be slow with security updates, etc.
Many years ago the company I worked for found that the unix 'utmpx'
file was getting corrupted (due to incorrect locking).
The functions had been placed in an archive part of libc (for
various reasons).
Getting the fix onto the customers' machines (we were the OS vendor)
involved determining which applications from 3rd (4th?) parties
had been linked with the broken code and then applying enough
'gentle persuasion' to get them to relink the offending programs.
Even this can be problematic because the source control systems
of some companies aren't great (it is probably better these days).
But getting the 'previous version' rebuilt with a new libc.a
can be very problematic.
> But this is all orthogonal to Rust -- I replied mainly because it was
> mentioned that Rust brought new issues to the table, which isn't true.
>
> > There is also the possibility that the implementation of some
> > function differs between distributions.
> > So you absolutely need to use the version from the installed system
> > not whatever was in some static library on the actual build machine.
> >
> > Both of these need stable ABI and shared libraries.
>
> Not really. If you go for the "statically linked" model for your
> application, you only need to care about the syscall layer (or
> equivalent higher-level layers in e.g. Windows/macOS).
No, because there are messages sent to system daemons and file
formats that can be system dependent.
Not everything is a system call.
>
> If you trust vendors a bit, you can instead go for "statically linked
> except for major system libraries" (like libc or libm in Linux). This
> is what Rust does by default for the glibc x86_64 target.
>
> Given that nowadays statically linking is convenient, affordable and
> improves performance, it seems like the right decision.
>
> > Remember, as far as userspace is concerned, foo.h is the definition
> > for 'foo' and foo.so is the current implementation.
> > (yes, I know a little bit of info is taken from foo.so on the build
> > system - but that ought to be absolutely minimal.)
>
> No, that is only the C model for shared libraries.
>
> C++ has had templates for decades now and no "C++ ABI" so far covers
> them. Thus, if you want to provide templates as a library, they cannot
> be "pre-compiled" and so the implementation is kept in the header.
>
> This actually turned out to be quite convenient and nowadays many
> libraries are developed as "header-only", in fact. Moreover, recently
> the C++ standard introduced new features that simplify taking this
> approach, e.g. C++17 `inline` variables.
Remind me to request our management to let me remove all the C++
from most of our programs.
None of them actually need it; the reasons for C++ aren't technical.
If you want to see something really horrid, look at the inline
destructor code for iostringstream.
And don't let me look at the code for CString either.
> Rust has the same issue with generics, but improves the situation a
> bit: there is no need to reparse everything, every time, from scratch,
> for each translation unit that uses a library with templates (which is
> quite an issue for C++, with big projects going out of their way to
> reduce the trees of includes).
That sounds like it has all the same problems as pre-compiled headers.
David
On Wed, May 5, 2021 at 2:58 AM Theodore Ts'o <[email protected]> wrote:
>
> What does this mean for enterprise distributions, like RHEL, which
> need to maintain a stable kernel ABI as part of their business model.
> I assume it means that they will need to lock down on a specific Rust
> compiler and Rust libraries? How painful will it be for them to get
> security updates (or have to do backports of security bug fixes) for
> 7-10 years?
That is a good question. If a security fix requires changes in some
generic that an out-of-tree module uses, customers will need to
rebuild their module if they want that fix. So companies providing
those modules will need to understand that disadvantage if they decide
to write an out-of-tree module in Rust. And to support out-of-tree
modules, distributions will need to provide the generics metadata like
they provide headers etc.
As for freezing the compiler, some distributions already support the
Rust compiler in LTS releases etc. But now that the Rust Foundation
exists and gets sponsor money from big corporations, companies and
distributions may be able to ask for "LTS" releases of the Rust
compiler, or ask for sharing some of the burden of backporting
security fixes etc.
Cheers,
Miguel
On Wed, May 5, 2021 at 6:25 PM David Laight <[email protected]> wrote:
>
> But it is the customer's customer who comes back to you saying
> that something in your library is broken.
> This is when you really don't what static linking - ever.
In that case, you need to refer them to your (direct) customer. I
understand where you are coming from (e.g. Microsoft also encourages
developers to avoid static linking their CRT), but there is no good
solution for that -- some of your direct customers will require you
provide the version for static linking nevertheless, so your only
approach would be gating access to the static version somehow.
> Static linking is much worse because different parts of the 'system'
> are provided by different people.
> With a little care a C shared library can be implemented by different
> companies while still meeting the same ABI.
I assume you are talking about things like program plugins in the form
of shared libraries (e.g. a different renderers in 3D suites,
different chess engines, mods in a videogame, etc.).
In that case, well, if you really need a customer of yours to swap
libraries without rebuilding the host program, because you want other
companies to provide plugins, then obviously static linking is not the
way to go. But shared libraries are just one possible solution in that
space anyway, there is also IPC of different kinds, bytecode VMs,
interpreters, etc.
> It this case it was done to give the software engineers some
> experience of writing C++.
> Technically it was a big mistake.
>
> Bad C++ is also infinitely worse that bad C.
> Exception handling (which you might think of as a gain)
> is very easy to get badly wrong.
> Class member overloads make it impossible to work out where data is used.
> Function overloads are sometimes nice - but unnecessary.
Agreed! While, in general, this applies to any language, it is
especially dangerous in languages with UB. And, among those, C++ is
very complex, which in turn can produce very subtle UB issues. This
was understood by Rust designers, and the language is an attempt to
minimize UB while, at the same time, providing higher-level features
than C.
Cheers,
Miguel
On Wed, May 5, 2021 at 4:13 PM David Laight <[email protected]> wrote:
>
> Many years ago the company I worked for found that the unix 'utmpx'
> file was getting corrupted (due to incorrect locking).
> The functions had been places in an archive part of libc (for
> various reasons).
> Getting the fix onto the customers machine (we were the OS vendor)
> involved determining which applications from 3rd (4th?) parties
> had been linked with the broken code and then applying enough
> 'gentle persuasion' to get them to relink the offending programs.
> Even this can be problematic because the source control systems
> of some companies isn't great (it is probably better these days).
> But getting the 'previous version' rebuilt with a new libc.a
> can be very problematic.
If you are a library vendor and you provide the fixed library, then
you are done. It is your customer's call to rebuild their software or
not; and they are the ones choosing static linking or not.
Sure, you want to offer the best service to your clients, and some
customers will choose static linking without fully understanding the
pros/cons, but you cannot do anything against that. And you still need
to provide the static version for those clients that know they need
it.
> No because there are messages sent to system daemons and file
> formats that can be system dependant.
> Not everything is a system call.
That is orthogonal to static linking or not, which was the topic at hand.
What you are talking about now are dependencies on external entities
and services. Static linking is not better nor worse just because you
depend on a local process, a file, a networked service, a particular
piece of hardware being present, etc.
> Remind be to request our management to let me remove all the C++
> from most of our programs.
Yeah, the problem exists since before 1998 :)
A stable, common C++ ABI etc. would have had some advantages, but it
did not happen.
> None of them actually need it, the reasons for C++ aren't technical.
Well, no program "needs" any particular language, but there are
advantages and disadvantages of using languages with more features
(and more complexity, too). It is a balance.
For the kernel, we believe Rust brings enough advantages over *both* C
and C++ to merit using it. C++ also has advantages over C, but it has
a big complexity burden, it has not had the luxury of being designed
from scratch with decades of hindsight from C and C++ like Rust has
had, and it does not have a UB-free subset.
> That sounds like it has all the same problems as pre-compiled headers.
PCHs are a hack to improve build times, yes.
In Rust, however, it is a more fundamental feature and the needed
information goes encoded into your library (.rlib, .so...).
Cheers,
Miguel
From: Miguel Ojeda
> Sent: 05 May 2021 17:07
>
> On Wed, May 5, 2021 at 4:13 PM David Laight <[email protected]> wrote:
> >
> > Many years ago the company I worked for found that the unix 'utmpx'
> > file was getting corrupted (due to incorrect locking).
> > The functions had been places in an archive part of libc (for
> > various reasons).
> > Getting the fix onto the customers machine (we were the OS vendor)
> > involved determining which applications from 3rd (4th?) parties
> > had been linked with the broken code and then applying enough
> > 'gentle persuasion' to get them to relink the offending programs.
> > Even this can be problematic because the source control systems
> > of some companies isn't great (it is probably better these days).
> > But getting the 'previous version' rebuilt with a new libc.a
> > can be very problematic.
>
> If you are a library vendor and you provide the fixed library, then
> you are done. It is your customer's call to rebuild their software or
> not; and they are the ones choosing static linking or not.
But it is the customer's customer who comes back to you saying
that something in your library is broken.
This is when you really don't want static linking - ever.
> Sure, you want to offer the best service to your clients, and some
> customers will choose static linking without fully understanding the
> pros/cons, but you cannot do anything against that. And you still need
> to provide the static version for those clients that know they need
> it.
>
> > No because there are messages sent to system daemons and file
> > formats that can be system dependant.
> > Not everything is a system call.
>
> That is orthogonal to static linking or not, which was the topic at hand.
>
> What you are talking about now are dependencies on external entities
> and services. Static linking is not better nor worse just because you
> depend on a local process, a file, a networked service, a particular
> piece of hardware being present, etc.
Static linking is much worse because different parts of the 'system'
are provided by different people.
With a little care a C shared library can be implemented by different
companies while still meeting the same ABI.
> > Remind be to request our management to let me remove all the C++
> > from most of our programs.
>
> Yeah, the problem exists since before 1998 :)
>
> A stable, common C++ ABI etc. would have had some advantages, but it
> did not happen.
>
> > None of them actually need it, the reasons for C++ aren't technical.
>
> Well, no program "needs" any particular language, but there are
> advantages and disadvantages of using languages with more features
> (and more complexity, too). It is a balance.
In this case it was done to give the software engineers some
experience of writing C++.
Technically it was a big mistake.
Bad C++ is also infinitely worse than bad C.
Exception handling (which you might think of as a gain)
is very easy to get badly wrong.
Class member overloads make it impossible to work out where data is used.
Function overloads are sometimes nice - but unnecessary.
David