2015-04-07 03:37:51

by Pengfei Yuan

Subject: Why not build kernel with -O3

Hi,

I have conducted some experiments to compare kernels built with -O2
and -O3. Here are the results:

Application  Performance O2    Performance O3    Improvement
Apache       127814.14 req/s   130321.24 req/s    1.96%
Nginx        537589.08 req/s   556723.32 req/s    3.56%
MySQL         70661.38 tx/s     71008.47 tx/s     0.49%
PostgreSQL    79763.39 tx/s     79535.59 tx/s    -0.29%
Redis        352547.47 op/s    405417.24 op/s    15.0%
Memcached    844439.14 op/s    845321.79 op/s     0.10%

Geomean: +3.34%
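
For reference, the per-application improvements and the geomean can be recomputed from the table. The following is a minimal Python sketch that uses only the throughput figures quoted above; the function and variable names are illustrative and not part of the original measurements.

```python
import math

# Throughput figures copied from the table above, as (O2, O3) pairs.
results = {
    "Apache": (127814.14, 130321.24),
    "Nginx": (537589.08, 556723.32),
    "MySQL": (70661.38, 71008.47),
    "PostgreSQL": (79763.39, 79535.59),
    "Redis": (352547.47, 405417.24),
    "Memcached": (844439.14, 845321.79),
}


def improvement(o2, o3):
    """Relative change of the O3 figure over the O2 figure, in percent."""
    return (o3 / o2 - 1.0) * 100.0


def geomean_improvement(pairs):
    """Geometric mean of the O3/O2 ratios, expressed as a percent change."""
    ratios = [o3 / o2 for o2, o3 in pairs]
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return (math.exp(log_mean) - 1.0) * 100.0


for name, (o2, o3) in results.items():
    print(f"{name:<12}{improvement(o2, o3):+.2f}%")
print(f"{'Geomean':<12}{geomean_improvement(results.values()):+.2f}%")
```

The geometric mean is taken over the six O3/O2 ratios, so the single large Redis gain pulls the summary figure well above the median improvement.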

Experiment environment: Linux 3.19.3, GCC 4.9.3 prerelease, Core-i7
4770, 32G RAM, 10GbE

LMbench microbenchmark also shows reduction in various latencies, as
well as increase of throughputs.

Why not add an option to build kernel with -O3?

Regards,
YUAN, Pengfei


2015-04-07 06:43:18

by Mike Galbraith

Subject: Re: Why not build kernel with -O3

On Tue, 2015-04-07 at 11:37 +0800, Pengfei Yuan wrote:
> Hi,
>
> I have conducted some experiments to compare kernels built with -O2
> and -O3. Here are the results:
>
> Application  Performance O2    Performance O3    Improvement
> Apache       127814.14 req/s   130321.24 req/s    1.96%
> Nginx        537589.08 req/s   556723.32 req/s    3.56%
> MySQL         70661.38 tx/s     71008.47 tx/s     0.49%
> PostgreSQL    79763.39 tx/s     79535.59 tx/s    -0.29%
> Redis        352547.47 op/s    405417.24 op/s    15.0%
> Memcached    844439.14 op/s    845321.79 op/s     0.10%
>
> Geomean: +3.34%
>
> Experiment environment: Linux 3.19.3, GCC 4.9.3 prerelease, Core-i7
> 4770, 32G RAM, 10GbE
>
> LMbench microbenchmark also shows reduction in various latencies, as
> well as increase of throughputs.

Please show multiple run data for all permutations of supported gcc
version/arch ;-)
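
One way to present multiple-run data is to report the mean and spread per configuration and a Welch's t statistic for the difference. The sketch below is only an illustration under that assumption; the helper names are hypothetical, and the numbers in the usage comment are placeholders, not measurements from this thread.

```python
import math
import statistics


def summarize(samples):
    """Return (mean, sample standard deviation) for one configuration."""
    return statistics.mean(samples), statistics.stdev(samples)


def welch_t(a, b):
    """Welch's t statistic for two independent sets of runs (b relative to a)."""
    mean_a, sd_a = summarize(a)
    mean_b, sd_b = summarize(b)
    # Standard error of the difference, without assuming equal variances.
    se = math.sqrt(sd_a ** 2 / len(a) + sd_b ** 2 / len(b))
    return (mean_b - mean_a) / se


# Usage with placeholder values (NOT real benchmark results):
#   o2_runs = [100.0, 102.0, 98.0, 101.0, 99.0]
#   o3_runs = [103.0, 105.0, 101.0, 104.0, 102.0]
#   print(summarize(o2_runs), summarize(o3_runs), welch_t(o2_runs, o3_runs))
```

A difference that is small relative to the run-to-run standard deviation gives a t statistic near zero, which is the case where a single-run comparison is least trustworthy.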

-Mike

2015-04-07 07:07:20

by Boaz Harrosh

Subject: Re: Why not build kernel with -O3

On 04/07/2015 09:43 AM, Mike Galbraith wrote:
> On Tue, 2015-04-07 at 11:37 +0800, Pengfei Yuan wrote:
>> Hi,
>>
>> I have conducted some experiments to compare kernels built with -O2
>> and -O3. Here are the results:
>>
>> Application  Performance O2    Performance O3    Improvement
>> Apache       127814.14 req/s   130321.24 req/s    1.96%
>> Nginx        537589.08 req/s   556723.32 req/s    3.56%
>> MySQL         70661.38 tx/s     71008.47 tx/s     0.49%
>> PostgreSQL    79763.39 tx/s     79535.59 tx/s    -0.29%
>> Redis        352547.47 op/s    405417.24 op/s    15.0%
>> Memcached    844439.14 op/s    845321.79 op/s     0.10%
>>
>> Geomean: +3.34%
>>
>> Experiment environment: Linux 3.19.3, GCC 4.9.3 prerelease, Core-i7
>> 4770, 32G RAM, 10GbE
>>
>> LMbench microbenchmark also shows reduction in various latencies, as
>> well as increase of throughputs.
>
> Please show multiple run data for all permutations of supported gcc
> version/arch ;-)
>

He did say optional. So I'd imagine it would be a Kconfig of its own.
So the default can be as today, but people that want to experiment
need not hack the source code.

Cheers
Boaz

> -Mike

2015-04-07 07:56:40

by Pengfei Yuan

Subject: Re: Why not build kernel with -O3

I am trying legacy GCC versions.
But I am not able to try different architectures.

2015-04-07 14:43 GMT+08:00 Mike Galbraith <[email protected]>:
> On Tue, 2015-04-07 at 11:37 +0800, Pengfei Yuan wrote:
>> Hi,
>>
>> I have conducted some experiments to compare kernels built with -O2
>> and -O3. Here are the results:
>>
>> Application  Performance O2    Performance O3    Improvement
>> Apache       127814.14 req/s   130321.24 req/s    1.96%
>> Nginx        537589.08 req/s   556723.32 req/s    3.56%
>> MySQL         70661.38 tx/s     71008.47 tx/s     0.49%
>> PostgreSQL    79763.39 tx/s     79535.59 tx/s    -0.29%
>> Redis        352547.47 op/s    405417.24 op/s    15.0%
>> Memcached    844439.14 op/s    845321.79 op/s     0.10%
>>
>> Geomean: +3.34%
>>
>> Experiment environment: Linux 3.19.3, GCC 4.9.3 prerelease, Core-i7
>> 4770, 32G RAM, 10GbE
>>
>> LMbench microbenchmark also shows reduction in various latencies, as
>> well as increase of throughputs.
>
> Please show multiple run data for all permutations of supported gcc
> version/arch ;-)
>
> -Mike

2015-04-07 08:29:48

by Mike Galbraith

Subject: Re: Why not build kernel with -O3

On Tue, 2015-04-07 at 10:07 +0300, Boaz Harrosh wrote:
>
> He did say optional. So I'd imagine it would be a Kconfig of its own.
> So the default can be as today, but people that want to experiment
> need not hack the source code.

Anybody wanting to play with it will just twiddle the Makefile.
Anyone too lazy to do that is likely lying on the couch watching
cartoons or something, not building kernels :)

-Mike

2015-04-07 10:09:21

by Mike Galbraith

Subject: Re: Why not build kernel with -O3

On Tue, 2015-04-07 at 15:56 +0800, Pengfei Yuan wrote:
> I am trying legacy GCC versions.
> But I am not able to try different architectures.

The point of my reply wasn't to get you to actually test the world ;-)

I was indirectly pointing out that "works for me" is not good enough
justification. Much checking for safety/benefit required.

-Mike

2015-04-07 18:05:13

by Austin S Hemmelgarn

Subject: Re: Why not build kernel with -O3

On 2015-04-07 06:09, Mike Galbraith wrote:
> On Tue, 2015-04-07 at 15:56 +0800, Pengfei Yuan wrote:
>> I am trying legacy GCC versions.
>> But I am not able to try different architectures.
>
> The point of my reply wasn't to get you to actually test the world ;-)
>
> I was indirectly pointing out that "works for me" is not good enough
> justification. Much checking for safety/benefit required.
>
Safety especially, -O3 is known to cause perfectly standards-compliant
code to break in weird ways in user-space.




2015-04-08 01:01:00

by Pengfei Yuan

Subject: Re: Why not build kernel with -O3

Could you please provide some examples that I can investigate?
Thanks!

2015-04-08 2:05 GMT+08:00 Austin S Hemmelgarn <[email protected]>:
> On 2015-04-07 06:09, Mike Galbraith wrote:
>> On Tue, 2015-04-07 at 15:56 +0800, Pengfei Yuan wrote:
>>> I am trying legacy GCC versions.
>>> But I am not able to try different architectures.
>>
>> The point of my reply wasn't to get you to actually test the world ;-)
>>
>> I was indirectly pointing out that "works for me" is not good enough
>> justification. Much checking for safety/benefit required.
>>
> Safety especially, -O3 is known to cause perfectly standards-compliant
> code to break in weird ways in user-space.
>
>

2015-04-08 12:06:12

by Austin S Hemmelgarn

Subject: Re: Why not build kernel with -O3

On 2015-04-07 21:00, Pengfei Yuan wrote:
> Could you please provide some examples that I can investigate?
> Thanks!
>
> 2015-04-08 2:05 GMT+08:00 Austin S Hemmelgarn <[email protected]>:
>> On 2015-04-07 06:09, Mike Galbraith wrote:
>>> On Tue, 2015-04-07 at 15:56 +0800, Pengfei Yuan wrote:
>>>> I am trying legacy GCC versions.
>>>> But I am not able to try different architectures.
>>>
>>> The point of my reply wasn't to get you to actually test the world ;-)
>>>
>>> I was indirectly pointing out that "works for me" is not good enough
>>> justification. Much checking for safety/benefit required.
>>>
>> Safety especially, -O3 is known to cause perfectly standards-compliant
>> code to break in weird ways in user-space.
>>
>>
I can't remember any off the top of my head, but it does say explicitly
in the GCC manual to be careful with -O3. IIRC, most of the issues
relate to -O3 enabling -ffast-math (which tends to really mess with code
that expects strict IEEE 754 compliance), so it may not be as much of an
issue for kernel code. You might look into some of the projects that
use -O3 by default (I think most of the Mozilla software does these
days, and I know that there are others, I just can't remember what right
now).



2015-04-08 12:19:06

by Richard Weinberger

Subject: Re: Why not build kernel with -O3

On Wed, Apr 8, 2015 at 3:00 AM, Pengfei Yuan <[email protected]> wrote:
> Could you please provide some examples that I can investigate?
> Thanks!

It would be awesome if you could find out which gcc optimizations
cause the speed up.
"gcc -c -Q -O3 --help=optimizers" will help you.

Please also double check your results.
You need to do multiple runs, etc...
Especially the redis speed up looks odd. Does redis really spend that much time
in the kernel?
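
The `--help=optimizers` suggestion above can be turned into a per-flag comparison of -O2 and -O3. The following Python sketch parses the listing and reports flags whose state differs; the helper names are hypothetical, and the parsing assumes the usual "flag, then state" line layout, which may differ between GCC versions.

```python
import subprocess


def parse_optimizers(text):
    """Map each flag in `gcc -Q --help=optimizers` output to its reported state."""
    flags = {}
    for line in text.splitlines():
        parts = line.split()
        # Option lines start with a dash; the header and blank lines are skipped.
        if not parts or not parts[0].startswith("-"):
            continue
        flags[parts[0]] = " ".join(parts[1:])
    return flags


def changed_flags(low, high):
    """Return {flag: (state_in_low, state_in_high)} for flags whose state differs."""
    return {
        flag: (low.get(flag), state)
        for flag, state in sorted(high.items())
        if low.get(flag) != state
    }


def query(level, gcc="gcc"):
    """Run gcc and return its optimizer listing for one -O level."""
    cmd = [gcc, "-c", "-Q", level, "--help=optimizers"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def report(gcc="gcc"):
    """Print every flag whose state changes between -O2 and -O3."""
    o2 = parse_optimizers(query("-O2", gcc))
    o3 = parse_optimizers(query("-O3", gcc))
    for flag, (before, after) in changed_flags(o2, o3).items():
        print(f"{flag}: {before} -> {after}")
```

Calling `report()` on the build machine lists the candidate flags; each could then be added to an -O2 build individually to see which one accounts for a given speed up.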

--
Thanks,
//richard

2015-04-08 12:49:19

by Austin S Hemmelgarn

Subject: Re: Why not build kernel with -O3

On 2015-04-08 08:19, Richard Weinberger wrote:
> On Wed, Apr 8, 2015 at 3:00 AM, Pengfei Yuan <[email protected]> wrote:
>> Could you please provide some examples that I can investigate?
>> Thanks!
>
> It would be awesome if you could find out which gcc optimizations
> cause the speed up.
> "gcc -c -Q -O3 --help=optimizers" will help you.
>
> Please also double check your results.
> You need to do multiple runs, etc...
> Especially the redis speed up looks odd. Does redis really spend that much time
> in the kernel?
>
My guess would be that much of the speed up is actually related to
context switch handling and the scheduling/statistics related code.



2015-04-08 12:57:31

by Alan Cox

Subject: Re: Why not build kernel with -O3

> I can't remember any off the top of my head, but it does say explicitly
> in the GCC manual to be careful with -O3. IIRC, most of the issues
> relate to -O3 enabling -ffast-math (which tends to really mess with code
> that expects strict IEEE 754 compliance), so it may not be as much of an
> issue for kernel code. You might look into some of the projects that
> use -O3 by default (I think most of the Mozilla software does these
> days, and I know that there are others, I just can't remember what right
> now).

Historically -O3 used to produce code that used a lot more memory and was
frequently neither correct nor fast. That was however in the days of gcc
2.7.x and I don't know that anyone has taken a hard look at stuff with a
modern gcc.

At the very least I think a -O3 change for x86 as well as being
benchmarked would need to go through all the stress testers and
regression testing build systems we now have to try and catch any
surprises.

If your numbers are right then it looks well worth investigating.

Alan

2015-04-08 13:16:33

by Pengfei Yuan

Subject: Re: Why not build kernel with -O3

2015-04-08 20:19 GMT+08:00 Richard Weinberger <[email protected]>:
> On Wed, Apr 8, 2015 at 3:00 AM, Pengfei Yuan <[email protected]> wrote:
>> Could you please provide some examples that I can investigate?
>> Thanks!
>
> It would be awesome if you could find out which gcc optimizations
> cause the speed up.
> "gcc -c -Q -O3 --help=optimizers" will help you.
>

This is really helpful.
But I can only find very short description for each option from
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

> Please also double check your results.
> You need to do multiple runs, etc...
> Especially the redis speed up looks odd. Does redis really spend that much time
> in the kernel?

Redis is special among the six applications because it is single-threaded.

Yuan

2015-04-08 13:19:34

by Pengfei Yuan

Subject: Re: Why not build kernel with -O3

2015-04-08 20:06 GMT+08:00 Austin S Hemmelgarn <[email protected]>:
> I can't remember any off the top of my head, but it does say explicitly in
> the GCC manual to be careful with -O3. IIRC, most of the issues relate to
> -O3 enabling -ffast-math (which tends to really mess with code that expects
> strict IEEE 754 compliance), so it may not be as much of an issue for kernel
> code. You might look into some of the projects that use -O3 by default (I
> think most of the Mozilla software does these days, and I know that there
> are others, I just can't remember what right now).
>

I am afraid you are talking about -Ofast, not -O3.
See https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

Yuan

2015-04-08 13:21:52

by Richard Weinberger

Subject: Re: Why not build kernel with -O3

On 08.04.2015 at 15:16, Pengfei Yuan wrote:
> 2015-04-08 20:19 GMT+08:00 Richard Weinberger <[email protected]>:
>> On Wed, Apr 8, 2015 at 3:00 AM, Pengfei Yuan <[email protected]> wrote:
>>> Could you please provide some examples that I can investigate?
>>> Thanks!
>>
>> It would be awesome if you could find out which gcc optimizations
>> cause the speed up.
>> "gcc -c -Q -O3 --help=optimizers" will help you.
>>
>
> This is really helpful.
> But I can only find very short description for each option from
> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

Yeah, but if you know -fXY gives a nice speed up we can add it to our
CFLAGS if it makes sense and does not have many downsides.
Blindly enabling -O3 can be dangerous as it might make the generated code
much bigger and the asm unreadable.

>> Please also double check your results.
>> You need do to multiple runs, etc...
>> Especially the redis speed up looks odd. Does redis really spend that much time
>> in the kernel?
>
> Redis is special among the six applications because it is single-threaded.

Still it would be nice to know much more about the load and why -O3 helps.

Thanks,
//richard

2015-04-08 13:53:51

by Austin S Hemmelgarn

Subject: Re: Why not build kernel with -O3

On 2015-04-08 09:19, Pengfei Yuan wrote:
> 2015-04-08 20:06 GMT+08:00 Austin S Hemmelgarn <[email protected]>:
>> I can't remember any off the top of my head, but it does say explicitly in
>> the GCC manual to be careful with -O3. IIRC, most of the issues relate to
>> -O3 enabling -ffast-math (which tends to really mess with code that expects
>> strict IEEE 754 compliance), so it may not be as much of an issue for kernel
>> code. You might look into some of the projects that use -O3 by default (I
>> think most of the Mozilla software does these days, and I know that there
>> are others, I just can't remember what right now).
>>
>
> I am afraid you are talking about -Ofast, not -O3.
> See https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
>
> Yuan
>
You're right, I had been looking at the wrong paragraph in the info
manual. Sorry about any confusion.



2015-04-27 20:37:25

by Pavel Machek

Subject: Re: Why not build kernel with -O3

Hi!

> I have conducted some experiments to compare kernels built with -O2
> and -O3. Here are the results:
>
> Application  Performance O2    Performance O3    Improvement
> Apache       127814.14 req/s   130321.24 req/s    1.96%
> Nginx        537589.08 req/s   556723.32 req/s    3.56%
> MySQL         70661.38 tx/s     71008.47 tx/s     0.49%
> PostgreSQL    79763.39 tx/s     79535.59 tx/s    -0.29%
> Redis        352547.47 op/s    405417.24 op/s    15.0%
> Memcached    844439.14 op/s    845321.79 op/s     0.10%
>
> Geomean: +3.34%
>
> Experiment environment: Linux 3.19.3, GCC 4.9.3 prerelease, Core-i7
> 4770, 32G RAM, 10GbE
>
> LMbench microbenchmark also shows reduction in various latencies, as
> well as increase of throughputs.

What is the size difference with -O3?

Do you have mostly-userspace benchmark? (kernel build?) -O3 could hurt there
if it produces bigger code..
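
The size question could be answered by running binutils `size` on both kernel images and comparing the sections. Below is a minimal Python sketch that parses the default (Berkeley-format) `size` output; the helper names and the file names `vmlinux.O2` and `vmlinux.O3` in the usage comment are hypothetical.

```python
def parse_size_output(text):
    """Parse Berkeley-format `size` output into {filename: {section: bytes}}."""
    rows = {}
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue
        text_sz, data_sz, bss_sz = (int(f) for f in fields[:3])
        rows[fields[5]] = {"text": text_sz, "data": data_sz, "bss": bss_sz}
    return rows


def growth_percent(before, after):
    """Percent change per section going from `before` to `after`."""
    return {
        section: (after[section] / before[section] - 1.0) * 100.0
        for section in before
        if before[section]  # skip empty sections to avoid dividing by zero
    }


# Usage, assuming the output of `size vmlinux.O2 vmlinux.O3` was saved
# to a string named `out` (hypothetical file names):
#   sizes = parse_size_output(out)
#   print(growth_percent(sizes["vmlinux.O2"], sizes["vmlinux.O3"]))
```

The text section is the one most likely to grow with -O3, since inlining and loop transformations mainly add code rather than data.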
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html