2006-03-26 08:31:25

by L A Walsh

Subject: Save 320K on production machines?

I have one older system with fixed resources (upgraded
as far as it will go) that I try to use as a "stable" machine.
It's had maybe 2-3 unexplained "Oopses" over the past 3-4
years (subtracting out configuration problems, where I usually
had a debug-enabled kernel installed anyway). To minimize
problems, I disable unused hardware, and all _used_ hardware
is compiled in (no module loading overhead, no chances for
arbitrary code insertion).

I find I can save a bunch of memory if I turn off debugging
symbols and enable compile-time optimization. I know it's
not useful for development, but some people might find the
extra memory useful.

320240 bytes of memory savings come from:
  188464  turning off debugging symbols (CONFIG_KALLSYMS)
  125008  compiler optimization**
    6784  disabling unused code (HOTPLUG stubs)**2


** primarily -funit-at-a-time, though -fweb &
-frename-registers may add a bit (GCC 3.3.5 as
patched by SuSE). Maybe extra optimizations could
be a "CONFIG" option, much like regparm is now?

**2 (Please don't take this as condescending; I know many
people already know this stuff, but I find it's best
not to _assume_ what people know.) But anyway...
I've always been taught that disabling unused code
was one way to improve reliability and performance.
Generally, it is the least-used code paths and
features that are most likely to have hidden
problems. I've always been told that it is more
secure to disable whatever features and drivers you
don't need. It's similar to the concept of not
putting a C development environment or an ssh client
on your web server.

I know 320K isn't much to some people, but you have to
remember that double that amount is all some people will
"ever need"... :-) Not working with new hardware leads me
to try to squeeze the last bits of performance out of my
current hardware and software. :-)

-l



2006-03-26 09:24:54

by Jan Engelhardt

Subject: Re: Save 320K on production machines?


> no chances for arbitrary code insertion).

Uh, there is /dev/kmem. Of course it is harder than module loading, but
it's there.

> ** primarily -funit-at-a-time, though -fweb &
> -frename-registers may add a bit (GCC 3.3.5 as
> patched by SuSE). Maybe extra optimizations could
> be a "CONFIG" option, much like regparm is now?

IIRC, -funit-at-a-time with gcc3 made the compiled code bloat.


Jan Engelhardt
--

2006-03-26 10:06:45

by Adrian Bunk

Subject: Re: Save 320K on production machines?

On Sun, Mar 26, 2006 at 11:24:15AM +0200, Jan Engelhardt wrote:
>...
> > ** primarily -funit-at-a-time, though -fweb &
> > -frename-registers may add a bit (GCC 3.3.5 as
> > patched by SuSE). Maybe extra optimizations could
> > be a "CONFIG" option, much like regparm is now?
>
> IIRC, -funit-at-a-time with gcc3 made the compiled code bloat.

That's wrong, the compiled code is smaller.

> Jan Engelhardt

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2006-03-26 10:39:31

by Andre Tomt

Subject: Re: Save 320K on production machines?

Linda Walsh wrote:
<snip>
> To minimize
> problems, I disable unused hardware, and all _used_ hardware
> is compiled in (no module loading overhead, no chances for
> arbitrary code insertion).

FYI, rootkits have been able to insert kernel code without
using module support for ages. Disabling it only makes things
marginally harder.

--
André Tomt

2006-03-27 10:05:21

by L A Walsh

Subject: Re: Save 320K on production machines?

Andre Tomt wrote:
> Linda Walsh wrote:
> <snip>
>> To minimize
>> problems, I disable unused hardware, and all _used_ hardware
>> is compiled in (no module loading overhead, no chances for
>> arbitrary code insertion).
>
> FYI, rootkits have been able to insert kernel code without
> using module support for ages. Disabling it only makes things
> marginally harder.
>
---
True, but that's the point. People break into systems that
have passwords. Just because passwords aren't 100% effective
in protecting systems doesn't mean we don't use them. :-)

The point is to "minimize" the vulnerability profile.
I'm wondering why unused code is required to be compiled
into standard kernels. It seems very un-Linux-like -- more
like Windows, which has support for everything compiled in.

Reducing code bloat is not just a good idea for embedded systems.
It's good for performance and security, if for no other reason
than that there are fewer lines that could go wrong. :-)

-l

2006-03-27 10:22:44

by L A Walsh

Subject: Re: Save 320K on production machines?

Adrian Bunk wrote:
> On Sun, Mar 26, 2006 at 11:24:15AM +0200, Jan Engelhardt wrote:
>>> ** primarily -funit-at-a-time, though -fweb &
>>> -frename-registers may add a bit (GCC 3.3.5 as
>>> patched by SuSE). Maybe extra optimizations could
>>> be a "CONFIG" option, much like regparm is now?
>>>
>> IIRC, -funit-at-a-time with gcc3 made the compiled code bloat.
>>
> That's wrong, the compiled code is smaller.
>
>> Jan Engelhardt
>>
> cu
> Adrian
>
---
That's my point -- if the optimization shrinks the code size
by removing unnecessary path duplication, the remaining code
is more likely to fit in the CPU cache (gaining some performance
in the process). Isn't that a good reason to use it?

The savings were measured on a Pentium III with an SMP-configured
kernel. I'm sure the optimized code will fit just a little better
in the runtime caches, no?

The current makefile turns on the optimization only with gcc 4 or
higher, but my results were with gcc 3.3.5. Maybe the i386 defaults
should enable the optimization for some versions of gcc 3 as well?
-l

2006-03-27 11:36:49

by Paulo Marques

Subject: Re: Save 320K on production machines?

Linda Walsh wrote:
> [...]
> The current makefile turns on the optimization only with gcc 4 or
> higher, but my results were with gcc 3.3.5. Maybe the i386 defaults
> should enable the optimization for some versions of gcc 3 as well?

AFAICR, the problem with gcc3 and unit-at-a-time was stack usage of
local variables in automatically inlined functions.

For instance, if function A called B and, after B returned, called C,
the local variables of both B and C would each get reserved space on
the stack for the whole execution of A if both functions were
automatically inlined. So the space needed becomes A+B+C, whereas
before it was max(A+B, A+C).
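
For example, here is a minimal, self-contained sketch of the effect
(the function names and buffer sizes are made up for illustration,
and whether gcc3 actually inlines them depends on its heuristics):

    #include <stdio.h>
    #include <string.h>

    static int b(void)
    {
            char buf[512];                  /* 512 bytes of locals */
            memset(buf, 'b', sizeof(buf));
            return buf[0];
    }

    static int c(void)
    {
            char buf[512];                  /* another 512 bytes */
            memset(buf, 'c', sizeof(buf));
            return buf[0];
    }

    int a(void)
    {
            int x = b();    /* b()'s locals could be dead here... */
            int y = c();    /* ...but once both calls are inlined,
                               gcc3 kept both buffers allocated in
                               a()'s frame: ~1024 bytes at once
                               instead of a peak of ~512 */
            return x + y;
    }

    int main(void)
    {
            printf("%d\n", a());
            return 0;
    }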

--
Paulo Marques - http://www.grupopie.com

Pointy-Haired Boss: I don't see anything that could stand in our way.
Dilbert: Sanity? Reality? The laws of physics?

2006-03-28 14:29:59

by Jan Engelhardt

Subject: Re: Save 320K on production machines?


> True, but that's the point. People break into systems that
> have passwords. Just because passwords aren't 100% effective
> in protecting systems doesn't mean we don't use them. :-)
>

Yepp, in the end it always boils down to PEBKAC. :-]



Jan Engelhardt
--

2006-03-30 21:34:52

by L A Walsh

Subject: Re: Save 320K on production machines?

Paulo Marques wrote:
> AFAICR, the problem with gcc3 and unit-at-a-time was stack usage of
> local variables in automatically inlined functions.
>
> For instance, if function A called B and, after B returned, called C,
> the local variables of both B and C would each get reserved space on
> the stack for the whole execution of A if both functions were
> automatically inlined. So the space needed becomes A+B+C, whereas
> before it was max(A+B, A+C).
>
Hmmm... that at least makes some sense as a reason for disabling
it, but jeez, this stack "thing" is such an unknown quantity. It
would be nice if we had some clue about how much stack our systems
actually use -- I'm using a 4K stack size, and the system has been
rock-solid stable since around the release of 2.6.15.2.

If I "doubled" my stack back to 8K, that would lower the "random
probability" of hitting a stack limit, but right now, it seems like
amount of stack "needed" is nearly guesswork. Sigh. Having my
kernel fairly static and minimalistic (no unused modules; no loadable
modules, etc) I might only "need" 3K.

1) It would be nice if a "stack usage" option could be turned on
that would do some sort of run-time bounds checking and display
the maximum stack used "so far" in /proc.

2) How difficult would it be to place kernel stacks in a "pageable"
pool where the limit of valid data in a 4K page is only 3.5K? Then,
when a kernel routine tries to exceed the stack boundary, it takes
a page fault, a "note" is logged that more stack was "needed",
another 4K page is automatically mapped into the stack, and control
returns to the interrupted routine.

It sounds a bit strange -- the kernel having to call another part
of the kernel to handle a page fault within the kernel -- but
perhaps there could be another level of "partitioning" within
kernel space that would let everything outside the paging code
itself be paged in/out in a similar way to user code.
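
Userspace can already do a toy version of this dance with a guard
region and a SIGSEGV handler. A minimal sketch, assuming Linux and
4K pages (the sizes and names are illustrative; a real kernel-stack
version would be far hairier):

    #include <signal.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define PAGE 4096UL

    /* fault handler: "log a note" that more stack was needed, map
       one more page, and return so the faulting access is retried */
    static void on_fault(int sig, siginfo_t *si, void *ctx)
    {
            char *page = (char *)((uintptr_t)si->si_addr & ~(PAGE - 1));
            static const char note[] = "growing stack by one page\n";

            write(2, note, sizeof(note) - 1);
            mprotect(page, PAGE, PROT_READ | PROT_WRITE);
    }

    int main(void)
    {
            struct sigaction sa;
            char *region;

            memset(&sa, 0, sizeof(sa));
            sa.sa_sigaction = on_fault;
            sa.sa_flags = SA_SIGINFO;
            sigaction(SIGSEGV, &sa, NULL);

            /* reserve four pages of "stack", make only one usable */
            region = mmap(NULL, 4 * PAGE, PROT_NONE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            mprotect(region, PAGE, PROT_READ | PROT_WRITE);

            memset(region, 0, 3 * PAGE);   /* faults and grows twice */
            puts("survived");
            return 0;
    }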

I have *no clue* whether it is the same idea, but the NT kernel has
*parts* of itself marked pageable by default. If one wants to lock
all of the kernel in memory, there is a flag that can be set in the
registry to disable paging the "Executive". Could that be applicable
to Linux?

-l


2006-03-31 09:43:17

by Adrian Bunk

Subject: Re: Save 320K on production machines?

On Thu, Mar 30, 2006 at 01:34:07PM -0800, Linda Walsh wrote:
>...
> If I "doubled" my stack back to 8K, that would lower the random
> probability of hitting the stack limit, but right now the amount of
> stack "needed" is nearly guesswork. Sigh. With my kernel fairly
> static and minimalistic (no unused code, no loadable modules, etc.),
> I might only "need" 3K.

Things like unused modules or loadable module support should have more
or less zero impact on stack usage.

> 1) It would be nice if a "stack usage" option could be turned on
> that would do some sort of run-time bounds checking and display
> the maximum stack used "so far" in /proc.

The -rt kernel contains something like this.

> 2) How difficult would it be to place kernel stacks in a "pageable"
> pool where the limit of valid data in a 4K page is only 3.5K? Then,
> when a kernel routine tries to exceed the stack boundary, it takes
> a page fault, a "note" is logged that more stack was "needed",
> another 4K page is automatically mapped into the stack, and control
> returns to the interrupted routine.
>
> It sounds a bit strange -- the kernel having to call another part
> of the kernel to handle a page fault within the kernel -- but
> perhaps there could be another level of "partitioning" within
> kernel space that would let everything outside the paging code
> itself be paged in/out in a similar way to user code.
>...

This has been discussed to death, and the consensus was that code
with excessive stack usage should be fixed.

If you find any stack problems with 4k stacks and the automatically
enabled unit-at-a-time when using gcc 4.x in kernel 2.6.16-mm2, please
send a bug report.

Regarding unit-at-a-time with gcc 3.x, it works most of the time
for most people, but it's completely unsupported. If you want to
use unit-at-a-time on i386, please use gcc 4.x.

> -l

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2006-03-31 09:55:23

by Jörn Engel

Subject: Re: Save 320K on production machines?

On Thu, 30 March 2006 13:34:07 -0800, Linda Walsh wrote:
>
> 1) It would be nice if a "stack usage" option could be turned on
> that would do some sort of run-time bounds checking and display
> the maximum stack used "so far" in /proc.

Would CONFIG_DEBUG_STACK_USAGE=y do what you want?
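
(As I understand it, the idea there is that a fresh thread stack
starts out zero-filled, so scanning from the unused end for the
first touched word gives a high-water mark. A userspace toy of the
same trick, with made-up sizes:)

    #include <stdio.h>
    #include <string.h>

    #define STACK_WORDS 1024    /* made-up "thread stack" size */

    /* zero-filled, like a freshly allocated thread stack */
    static unsigned long fake_stack[STACK_WORDS];

    /* pretend the thread dirtied the top 100 words of its stack
       (stacks grow down, so the low words stay untouched) */
    static void use_some_stack(void)
    {
            memset(&fake_stack[STACK_WORDS - 100], 0xff,
                   100 * sizeof(unsigned long));
    }

    /* scan from the low end for the first touched word; everything
       below it was never used -- the high-water mark */
    static unsigned long stack_not_used(void)
    {
            unsigned long i = 0;

            while (i < STACK_WORDS && fake_stack[i] == 0)
                    i++;
            return i * sizeof(unsigned long);
    }

    int main(void)
    {
            use_some_stack();
            printf("untouched stack: %lu bytes\n", stack_not_used());
            return 0;
    }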

> 2) How difficult would it be to place kernel stacks in a "pageable"
> pool where the limit of valid data in a 4K page is only 3.5K? Then,
> when a kernel routine tries to exceed the stack boundary, it takes
> a page fault, a "note" is logged that more stack was "needed",
> another 4K page is automatically mapped into the stack, and control
> returns to the interrupted routine.

S390 has something a bit like that. They can specify a stack limit
and get an exception when a function tries to grow the stack beyond
it. Martin Schwidefsky might know the details better.

Jörn

--
You ain't got no problem, Jules. I'm on the motherfucker. Go back in
there, chill them niggers out and wait for the Wolf, who should be
coming directly.
-- Marsellus Wallace