2002-01-25 15:51:19

by Horst von Brand

Subject: Re: [ACPI] ACPI mentioned on lwn.net/kernel

"Moore, Robert" <[email protected]> said:
> And I'll add my comments about so-called "bloat".
>
> Given that the MS VC compiler consistently generates IA-32 code that is over
> 30% smaller than GCC, I would have to say that Linux would benefit far more
> by directing all of the energy spent complaining about code size toward
> optimizing the compiler.

Is it faster too? Or at least not slower? If not, what is the point?
--
Horst von Brand http://counter.li.org # 22616


2002-01-25 16:03:06

by Ryan Cumming

Subject: Re: [ACPI] ACPI mentioned on lwn.net/kernel

On January 25, 2002 07:50, Horst von Brand wrote:
> > Given that the MS VC compiler consistently generates IA-32 code that is
> > over 30% smaller than GCC, I would have to say that Linux would benefit
> > far more by directing all of the energy spent complaining about code size
> > toward optimizing the compiler.
>
> Is it faster too? Or at least not slower? If not, what is the point?

Storing 30% less executable pages in memory? Reading 30% less executable
pages off the disk? Performing 30% less relocations?

-Ryan

2002-01-25 16:16:12

by Andreas Schwab

Subject: Re: [ACPI] ACPI mentioned on lwn.net/kernel

Ryan Cumming <[email protected]> writes:

|> On January 25, 2002 07:50, Horst von Brand wrote:
|> > > Given that the MS VC compiler consistently generates IA-32 code that is
|> > > over 30% smaller than GCC, I would have to say that Linux would benefit
|> > > far more by directing all of the energy spent complaining about code size
|> > > toward optimizing the compiler.
|> >
|> > Is it faster too? Or at least not slower? If not, what is the point?
|>
|> Storing 30% less executable pages in memory? Reading 30% less executable
|> pages off the disk?

These are all startup costs that are lost in the noise the longer the
program runs.

|> Performing 30% less relocations?

30% less code does not imply 30% less relocations.

Andreas.

--
Andreas Schwab, SuSE Labs, [email protected]
SuSE GmbH, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

2002-01-25 20:06:47

by Ryan Cumming

Subject: Re: [ACPI] ACPI mentioned on lwn.net/kernel

On January 25, 2002 08:15, Andreas Schwab wrote:
> These are all startup costs that are lost in the noise the longer the
> program runs.

Executable size is -not- just a startup cost. Larger executables will have a
bigger memory footprint and less cache locality. A KDE desktop on 64megs of
memory would be noticeably more responsive if GCC generated executables the
same size as VC++, due to less swap thrashing alone.

-Ryan

2002-01-26 01:01:36

by Linus Torvalds

Subject: Re: [ACPI] ACPI mentioned on lwn.net/kernel

In article <[email protected]>,
Andreas Schwab <[email protected]> wrote:
>|>
>|> Storing 30% less executable pages in memory? Reading 30% less executable
>|> pages off the disk?
>
>These are all startup costs that are lost in the noise the longer the
>program runs.

That's a load of bull.

Startup costs tend to _dominate_ most applications, except for
benchmarks, scientific loads and games/multimedia.

Not surprisingly, those three categories are also the ones where lots of
optimizer tuning is regularly done. But it's a _small_ subset of the
general application load.

Note that not only do startup costs often dominate the rest, they are
psychologically very important. From a user standpoint, an application
that loads "instantly" is mostly viewed as being much more pleasant than
one that takes longer to load, even if the latter were to run faster
once loaded.

This is also something that only gets worse and worse with time. Not
only do caches get more and more important (because the CPU core gets
ever faster compared to the outside world), but they get larger as well.

And what that leads to is that the cache warmup phase (which is linear
wrt size of the problem) gets relatively _slower_ and bigger compared to
the warm cache behaviour (ie the running phase). So optimizing for size
is (a) the right thing to do and (b) going to be even more so in the
future.
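The claim that the cache warmup phase is linear in code size can be made concrete with a toy model. This is a minimal C sketch under loudly simplifying assumptions: every cache line of code is fetched exactly once at startup, each such fetch costs a fixed miss penalty, and nothing else matters. `lines_touched` and `cold_start_cycles` are hypothetical names, and the line size and penalty are parameters, not measurements.

```c
#include <stddef.h>

/* Toy model: cache lines needed to hold code_bytes of code.
 * Rounds up, because a partially used line still has to be fetched. */
size_t lines_touched(size_t code_bytes, size_t line_bytes)
{
    return (code_bytes + line_bytes - 1) / line_bytes;
}

/* Toy model: cold-start cost in cycles, assuming every line of code is
 * touched once and each first touch pays the same miss penalty.
 * The result grows linearly with code size. */
unsigned long long cold_start_cycles(size_t code_bytes, size_t line_bytes,
                                     unsigned miss_penalty)
{
    return (unsigned long long)lines_touched(code_bytes, line_bytes) * miss_penalty;
}
```

With an assumed 64-byte line and 100-cycle penalty, 100 KiB of code costs 1600 * 100 = 160,000 cycles to warm up, while the same program 30% smaller (70 KiB) costs 112,000. Changing the penalty scales both numbers, but the 30% gap stays.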

It's sad that gcc relegates "optimize for size" to a second-class
citizen. Instead of having a "-Os" (that optimizes for size and doesn't
work together with other optimizations), it would be better to have a
"-Olargecode", which explicitly enables "don't care about code size" for
those (few) applications where it makes sense.

Linus

2002-01-26 03:45:18

by Jamie Lokier

Subject: Re: [ACPI] ACPI mentioned on lwn.net/kernel

Linus Torvalds wrote:
> It's sad that gcc relegates "optimize for size" to a second-class
> citizen. Instead of having a "-Os" (that optimizes for size and doesn't
> work together with other optimizations), it would be better to have a
> "-Olargecode", which explicitly enables "don't care about code size" for
> those (few) applications where it makes sense.

Btw, there have been suggestions that -Os may actually be faster for x86
code on current processors.

-- Jamie

2002-01-26 16:38:25

by Martin Eriksson

Subject: Re: [ACPI] ACPI mentioned on lwn.net/kernel

----- Original Message -----
From: "Jamie Lokier" <[email protected]>
To: "Linus Torvalds" <[email protected]>
Cc: <[email protected]>
Sent: Saturday, January 26, 2002 4:41 AM
Subject: Re: [ACPI] ACPI mentioned on lwn.net/kernel


> Linus Torvalds wrote:
> > It's sad that gcc relegates "optimize for size" to a second-class
> > citizen. Instead of having a "-Os" (that optimizes for size and doesn't
> > work together with other optimizations), it would be better to have a
> > "-Olargecode", which explicitly enables "don't care about code size" for
> > those (few) applications where it makes sense.
>
> Btw, there have been suggestions that -Os may actually be faster for x86
> code on current processors.

Hmm.. I tried to compile the kernel with -Os (gcc 2.96-98) and I just got a
~1% smaller vmlinux and a ~3% smaller bzImage. Maybe the size optimizations
don't show on these files? Internal data structures that are much bigger
than "real" code?

This is what I did:
--- Makefile Sat Jan 26 17:15:52 2002
+++ Makefile.Os Sat Jan 26 17:15:30 2002
@@ -88,7 +88,7 @@

 CPPFLAGS := -D__KERNEL__ -I$(HPATH)
 
-CFLAGS := $(CPPFLAGS) -Wall -Wstrict-prototypes -Wno-trigraphs -O2 \
+CFLAGS := $(CPPFLAGS) -Wall -Wstrict-prototypes -Wno-trigraphs -Os \
 	-fomit-frame-pointer -fno-strict-aliasing -fno-common
 AFLAGS := -D__ASSEMBLY__ $(CPPFLAGS)
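Whether vmlinux is dominated by code or by data can be checked by summing section sizes rather than comparing file sizes (the file also carries symbol tables and other non-loaded sections). The binutils `size` tool already reports this, and `size -A` lists every section. The C sketch below is a hypothetical helper, `elf_code_vs_data`, that splits allocated sections into executable ones and everything else; it assumes the file's byte order matches the host's and ignores the extended section numbering used by files with very many sections.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Read one section header's flags and size, for either ELF class. */
static int read_shdr(FILE *f, int cls, unsigned long long off,
                     unsigned long long *flags, unsigned long long *size)
{
    if (fseek(f, (long)off, SEEK_SET) != 0)
        return -1;
    if (cls == ELFCLASS64) {
        Elf64_Shdr sh;
        if (fread(&sh, sizeof sh, 1, f) != 1)
            return -1;
        *flags = sh.sh_flags;
        *size = sh.sh_size;
    } else {
        Elf32_Shdr sh;
        if (fread(&sh, sizeof sh, 1, f) != 1)
            return -1;
        *flags = sh.sh_flags;
        *size = sh.sh_size;
    }
    return 0;
}

/*
 * Hypothetical helper: sum the sizes of allocated sections, split into
 * executable ones (code) and everything else (data, rodata, bss).
 * Returns 0 on success, -1 on error.
 */
int elf_code_vs_data(const char *path, unsigned long long *code,
                     unsigned long long *data)
{
    unsigned char ident[EI_NIDENT];
    unsigned long long shoff = 0, shentsize = 0, shnum = 0;
    FILE *f = fopen(path, "rb");
    int rc = -1;

    if (!f)
        return -1;
    *code = *data = 0;
    if (fread(ident, sizeof ident, 1, f) != 1 ||
        memcmp(ident, ELFMAG, SELFMAG) != 0)
        goto out;
    rewind(f);
    if (ident[EI_CLASS] == ELFCLASS64) {
        Elf64_Ehdr eh;
        if (fread(&eh, sizeof eh, 1, f) != 1)
            goto out;
        shoff = eh.e_shoff; shentsize = eh.e_shentsize; shnum = eh.e_shnum;
    } else if (ident[EI_CLASS] == ELFCLASS32) {
        Elf32_Ehdr eh;
        if (fread(&eh, sizeof eh, 1, f) != 1)
            goto out;
        shoff = eh.e_shoff; shentsize = eh.e_shentsize; shnum = eh.e_shnum;
    } else {
        goto out;                   /* not a recognised ELF class */
    }
    for (unsigned long long i = 0; i < shnum; i++) {
        unsigned long long flags, size;
        if (read_shdr(f, ident[EI_CLASS], shoff + i * shentsize, &flags, &size) != 0)
            goto out;
        if (!(flags & SHF_ALLOC))
            continue;               /* symbol tables, debug info: never loaded */
        if (flags & SHF_EXECINSTR)
            *code += size;
        else
            *data += size;
    }
    rc = 0;
out:
    fclose(f);
    return rc;
}
```

Running it on an -O2 and an -Os vmlinux and comparing the code totals would show whether the ~1% file-size figure understates the change in the code itself.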

_____________________________________________________
| Martin Eriksson <[email protected]>
| MSc CSE student, department of Computing Science
| Umeå University, Sweden

- ABIT BP6(RU) - 2xCeleron 400 - 128MB/PC100/C2 Acer
- Maxtor 10/5400/U33 HPT P/M - Seagate 6/5400/DMA2 HPT S/M
- 2xDE-530TX - 1xTulip - Linux 2.4.17+ide+preempt

2002-01-26 16:47:38

by Jeff Garzik

Subject: Re: [ACPI] ACPI mentioned on lwn.net/kernel

Martin Eriksson wrote:
>
> ----- Original Message -----
> From: "Jamie Lokier" <[email protected]>
> To: "Linus Torvalds" <[email protected]>
> Cc: <[email protected]>
> Sent: Saturday, January 26, 2002 4:41 AM
> Subject: Re: [ACPI] ACPI mentioned on lwn.net/kernel
>
> > Linus Torvalds wrote:
> > > It's sad that gcc relegates "optimize for size" to a second-class
> > > citizen. Instead of having a "-Os" (that optimizes for size and doesn't
> > > work together with other optimizations), it would be better to have a
> > > "-Olargecode", which explicitly enables "don't care about code size" for
> > > those (few) applications where it makes sense.
> >
> > Btw, there have been suggestions that -Os may actually be faster for x86
> > code on current processors.
>
> Hmm.. I tried to compile the kernel with -Os (gcc 2.96-98) and I just got a
> ~1% smaller vmlinux and a ~3% smaller bzImage. Maybe the size optimizations
> don't show on these files? Internal data structures that are much bigger
> than "real" code?

That doesn't tell us much unless you benchmark any speed
improvements/degradations noticed. Hidden in that 1% may be more
favorable I-cache usage, better register usage... who knows.

It would also be interesting to compile key files like kernel/sched.c or
mm/vmscan.c to assembly using -O2 and -Os, and compare the output with
diff -u.

Jeff



--
Jeff Garzik | "I went through my candy like hot oatmeal
Building 1024 | through an internally-buttered weasel."
MandrakeSoft | - goats.com

2002-01-26 17:33:31

by Felix von Leitner

Subject: Re: [ACPI] ACPI mentioned on lwn.net/kernel

Thus spake Linus Torvalds ([email protected]):
> >These are all startup costs that are lost in the noise the longer the
> >program runs.
> That's a load of bull.

Agreed. I like to plug my diet libc slides at this point which (I hope)
make a point about this with network programming as an example. See
http://www.fefe.de/dietlibc/talk.pdf for details.

> Startup costs tend to _dominate_ most applications, except for
> benchmarks, scientific loads and games/multimedia.

> Not surprisingly, those three categories are also the ones where lots of
> optimizer tuning is regularly done. But it's a _small_ subset of the
> general application load.

Exactly. However, due to these optimizations the trend is toward large,
long-running monster applications like Mozilla or GNOME and KDE. KDE
does not ask me whether I want to run those 20 processes all the time.
It just starts them. And new processes are forked off a long-running
process because the start-up cost has become so large.

> Note that not only do startup costs often dominate the rest, they are
> psychologically very important.

That is not just psychological. Most developers would do well to visit
a nearby university or school and see what kind of machines they use
there. Ever tried installing Debian on a Sparc SLC? It took a little
over 24 hours. Compiling a kernel takes over 12 hours on that box IIRC.
But that's not the point. This hardware was very much usable a few
years ago. Today it's practically futile to use it: you spend more time
waiting than working. On my desktop Athlon, the 1.3 million CPU cycles
of static start-up cost for running a dynamically linked glibc program
may not look like much. But my statically linked ls does an ls -rtl of
a directory with 10 files in less time.

> It's sad that gcc relegates "optimize for size" to a second-class
> citizen. Instead of having a "-Os" (that optimizes for size and doesn't
> work together with other optimizations), it would be better to have a
> "-Olargecode", which explicitly enables "don't care about code size" for
> those (few) applications where it makes sense.

What do you mean by "does not work together with other optimizations"?
I use -Os all the time. Actually, -Os often produces faster code than
-O2 or -O3! What other optimizations do you mean? I don't need many
other optimizer options besides -fomit-frame-pointer and -march=athlon
if you link PIC code and use an Athlon.

And since -funroll-loops and -finline-functions have to be enabled
explicitly (or, for the latter, via -O3 and above, by people who don't
know what they are doing), I think gcc already does what you want it to do ;)

Felix

2002-01-26 17:52:17

by Jamie Lokier

Subject: Re: [ACPI] ACPI mentioned on lwn.net/kernel

Jeff Garzik wrote:
> > Hmm.. I tried to compile the kernel with -Os (gcc 2.96-98) and I just got a
> > ~1% smaller vmlinux and a ~3% smaller bzImage. Maybe the size optimizations
> > doesn't show on these files? Internal data structures that are much bigger
> > than "real" code?
>
> That doesn't tell us much unless you benchmark any speed
> improvements/degradations noticed. Hidden in that 1% may be more
> favorable I-cache usage, better register usage... who knows.
>
> It would also be interesting to compile key files like kernel/sched.c or
> mm/vmscan.c to assembly using -O2 and -Os, and compare the output with
> diff -u.

It'd be good to know why it's not achieving the quoted 30% space saving
that other compilers manage for normal code, unless it's a myth, of course.

-- Jamie

2002-01-26 18:24:32

by Martin Eriksson

Subject: Re: [ACPI] ACPI mentioned on lwn.net/kernel


----- Original Message -----
From: "Jamie Lokier" <[email protected]>
To: "Jeff Garzik" <[email protected]>
Cc: "Martin Eriksson" <[email protected]>; "Linus Torvalds"
<[email protected]>; <[email protected]>
Sent: Saturday, January 26, 2002 6:48 PM
Subject: Re: [ACPI] ACPI mentioned on lwn.net/kernel


> Jeff Garzik wrote:
> > > Hmm.. I tried to compile the kernel with -Os (gcc 2.96-98) and I just got a
> > > ~1% smaller vmlinux and a ~3% smaller bzImage. Maybe the size optimizations
> > > don't show on these files? Internal data structures that are much bigger
> > > than "real" code?
> >
> > That doesn't tell us much unless you benchmark any speed
> > improvements/degradations noticed. Hidden in that 1% may be more
> > favorable I-cache usage, better register usage... who knows.
> >
> > It would also be interesting to compile key files like kernel/sched.c or
> > mm/vmscan.c to assembly using -O2 and -Os, and compare the output with
> > diff -u.
>
> It'd be good to know why it's not achieving the quoted 30% space saving
> that other compilers manage for normal code, unless it's a myth, of course.
>

So I compiled sched.c to assembly (note that I have the rml preempt patch
there too), and the results are pretty strange:

Diff between -O2 and -Os:
http://giron.wox.org/sched.s.diff

As you can see, -Os does not do much size optimization beyond -O2.

The C file:
http://giron.wox.org/sched.c

Command line:
gcc -D__KERNEL__ -Wall -Wstrict-prototypes -Wno-trigraphs -OX \
-fomit-frame-pointer -fno-strict-aliasing -fno-common -S sched.c

where -OX has been replaced in turn by -O0, -O2, -O3 and -Os.

The assembler files:
http://giron.wox.org/sched.s.o0
http://giron.wox.org/sched.s.o2
http://giron.wox.org/sched.s.o3
http://giron.wox.org/sched.s.os

The file created with -O0 (no optimization) is the biggest of all, even
bigger than the -O3 one. The -O2 and -Os files differ by only about 1%.

So either
a) -O2 does size optimization
b) -Os sucks at size optimization

_____________________________________________________
| Martin Eriksson <[email protected]>
| MSc CSE student, department of Computing Science
| Umeå University, Sweden


2002-01-26 19:43:31

by Florian Weimer

Subject: Re: [ACPI] ACPI mentioned on lwn.net/kernel

Felix von Leitner <[email protected]> writes:

> What do you mean by "does not work together with other optimizations"?

You cannot both enable -Os and prefetching support, for example (at
least with certain GCC versions).
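A sketch of the kind of code involved, assuming a GCC that provides `__builtin_prefetch`. Automatic prefetch insertion is controlled by `-fprefetch-loop-arrays`, which the GCC manual lists as disabled at `-Os`; that may be the interaction meant here. An explicit hint like the one below is written by hand instead, and whether it becomes a real prefetch instruction depends on the target (for example on `-march`). `sum_with_prefetch` is a hypothetical name and `PREFETCH_AHEAD` a hypothetical, untuned distance.

```c
#include <stddef.h>

/* Hypothetical lookahead, in elements; not tuned for any CPU. */
#define PREFETCH_AHEAD 16

long sum_with_prefetch(const long *a, size_t n)
{
    long sum = 0;

    for (size_t i = 0; i < n; i++) {
        /* Hint that a[i + PREFETCH_AHEAD] will be read soon:
         * second argument 0 = read access, third 0 = low temporal locality. */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 0);
        sum += a[i];
    }
    return sum;
}
```

The prefetch is only a hint and never changes the result; it costs extra instructions per iteration, which is exactly the kind of trade-off a size-oriented mode would tend to refuse.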

--
Florian Weimer [email protected]
University of Stuttgart http://CERT.Uni-Stuttgart.DE/people/fw/
RUS-CERT +49-711-685-5973/fax +49-711-685-5898

2002-01-26 21:43:39

by Linus Torvalds

Subject: Re: [ACPI] ACPI mentioned on lwn.net/kernel


On Sat, 26 Jan 2002, Martin Eriksson wrote:
>
> Hmm.. I tried to compile the kernel with -Os (gcc 2.96-98) and I just got a
> ~1% smaller vmlinux and a ~3% smaller bzImage.

Note that while "-Os" exists and is documented, as far as I know gcc
doesn't actually do much with it. It really acts more as a "disable
certain optimizations" switch than anything else.

In the 3.0.x tree, it seems to change some of the weights of some
instructions, and it might make more of a difference there. But at the
same time it is quite telling that "-Os" doesn't even change any of the
alignments etc - because gcc developers do not seem to really support it
as a real option. It's an after-thought, not a big performance push.

Linus

2002-01-27 13:56:57

by Martin Dalecki

Subject: Re: [ACPI] ACPI mentioned on lwn.net/kernel

Linus Torvalds wrote:

>In article <[email protected]>,
>Andreas Schwab <[email protected]> wrote:
>
>>|>
>>|> Storing 30% less executable pages in memory? Reading 30% less executable
>>|> pages off the disk?
>>
>>These are all startup costs that are lost in the noise the longer the
>>program runs.
>>
>
>That's a load of bull.
>
>Startup costs tend to _dominate_ most applications, except for
>benchmarks, scientific loads and games/multimedia.
>
Well, the situation is in fact even more embarrassing if you do true
benchmarking on really long-running (well, that's relative of course)
applications. I personally once benchmarked the good old TeX running
through a document a few hundred pages long. The -O2 version was
actually about 15% *SLOWER* than the -Os version. That's because in
real-world applications, which don't do numerical calculations but
spend most of their time "decision taking", the whole multi-pipeline
scheduling gets outweighed by the simple cache pressure thing by *far*.

The whole GCC development is badly misguided on this for *sure*. They
develop for numerics, whereas most programs are doing a
controlling/decision-taking job.
Well, I know I should try this with the kernel some time...

2002-01-29 09:20:09

by Andrey Panin

Subject: Re: [ACPI] ACPI mentioned on lwn.net/kernel

On Sat, Jan 26, 2002 at 01:42:52PM -0800, Linus Torvalds wrote:
>
> On Sat, 26 Jan 2002, Martin Eriksson wrote:
> >
> > Hmm.. I tried to compile the kernel with -Os (gcc 2.96-98) and I just got a
> > ~1% smaller vmlinux and a ~3% smaller bzImage.
>
> Note that while "-Os" exists and is documented, as far as I know gcc
> doesn't actually do much with it. It really acts more as a "disable
> certain optimizations" switch than anything else.
>

Stupid questions:
- what stops us from using the -mregparm=3 gcc switch?
- the same with -Os -malign-loops=1 -malign-jumps=1?
- is there any tool to measure the performance gain/penalty of the above?
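What -mregparm=3 changes can be seen per function with GCC's `regparm` attribute, which applies the same convention to a single function: on 32-bit x86 the first three integer arguments arrive in EAX, EDX and ECX instead of on the stack. The sketch below is illustrative only; `REGPARM3` and `add3` are hypothetical names, and the attribute is applied only when compiling for i386. The practical catch with the global switch is that every caller and callee must agree, including hand-written assembly that expects its arguments on the stack.

```c
/*
 * On 32-bit x86, regparm(3) passes the first three integer arguments in
 * registers; -mregparm=3 makes that the default for every function.
 * The attribute is x86-32 specific, so other targets get a no-op macro.
 */
#if defined(__i386__)
#define REGPARM3 __attribute__((regparm(3)))
#else
#define REGPARM3 /* no effect outside 32-bit x86 */
#endif

/* Hypothetical example function using the register calling convention. */
REGPARM3 int add3(int a, int b, int c)
{
    return a + b + c;
}
```

Compiling this with `gcc -m32 -O2 -S` and without the attribute, then diffing the assembly, shows the argument loads from the stack disappearing.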

> In the 3.0.x tree, it seems to change some of the weights of some
> instructions, and it might make more of a difference there. But at the
> same time it is quite telling that "-Os" doesn't even change any of the
> alignments etc - because gcc developers do not seem to really support it
> as a real option. It's an after-thought, not a big performance push.
>
> Linus
>

--
Andrey Panin | Embedded systems software engineer
[email protected] | PGP key: wwwkeys.eu.pgp.net



2002-01-30 07:58:55

by Andrey Panin

Subject: Re: [ACPI] ACPI mentioned on lwn.net/kernel

On Tue, Jan 29, 2002 at 02:14:32PM -0500, Mark Hahn wrote:
> > - what stops us from using the -mregparm=3 gcc switch?
>
> I dimly recall that it can generate bad code, perhaps only
> on old compilers.

Bad code, hmm... slow code or wrong code?
IIRC gcc 2.95.3 is declared as the minimum requirement for kernel
compilation, so maybe it's not an issue anymore?

>
> > - the same with -Os -malign-loops=1 -malign-jumps=1?
>
> 2 is probably always the lowest you should go, and indeed,
> I think that's all the compiler permits.
>

gcc (at least 2.95.2) permits 1, but I didn't check the generated code.

> > - is there any tool to measure the performance gain/penalty of the above?
>
> lmbench. It has to be a microbenchmark to measure this sort of thing,
> though a sanity check (kernel compile) would also be useful.
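Short of running lmbench itself, the basic shape of such a microbenchmark is easy to sketch. The C code below is a hypothetical harness, not part of lmbench: it assumes a POSIX system with `clock_gettime(CLOCK_MONOTONIC)` and reports the best of several rounds, since the fastest round is the one least disturbed by interrupts and other processes. `example_workload` is a placeholder for the code under test.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <time.h>

/* Placeholder workload; replace with the routine being compared. */
static volatile unsigned long sink;
void example_workload(void)
{
    sink++;
}

/* Nanoseconds elapsed between two CLOCK_MONOTONIC readings. */
static long long ns_between(struct timespec a, struct timespec b)
{
    return (long long)(b.tv_sec - a.tv_sec) * 1000000000LL
         + (b.tv_nsec - a.tv_nsec);
}

/*
 * Call fn() `iters` times per round for `rounds` rounds and return the
 * fastest round's average cost per call in nanoseconds.
 * Returns -1.0 if iters or rounds is not positive.
 */
double best_ns_per_call(void (*fn)(void), long iters, int rounds)
{
    double best = -1.0;

    if (iters <= 0 || rounds <= 0)
        return -1.0;
    for (int r = 0; r < rounds; r++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < iters; i++)
            fn();
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double per_call = (double)ns_between(t0, t1) / (double)iters;
        if (best < 0.0 || per_call < best)
            best = per_call;
    }
    return best;
}
```

To compare flags, build the code under test once with -O2 and once with -Os (keeping the harness itself identical) and trust only differences that repeat across runs; the kernel compile remains the sanity check.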

Thanks, I'll test these issues this weekend.

Best regards.

--
Andrey Panin | Embedded systems software engineer
[email protected] | PGP key: wwwkeys.eu.pgp.net

