2003-07-23 18:31:46

by Bernardo Innocenti

Subject: Kernel 2.6 size increase

Hello,

code bloat can be very harmful on embedded targets, but it's
generally inconvenient for any platform. I've measured the
code increase between 2.4.21 and 2.6.0-test1 on a small
kernel configuration for ColdFire:

text    data   bss     dec     hex    filename
640564  39152  134260  813976  c6b98  linux-2.4.x/linux
845924  51204  78896   976024  ee498  linux-2.5.x/vmlinux
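
To put the table in proportion, the growth can be worked out directly from the two rows. What follows is only a minimal C sketch: the struct and function names are illustrative and come from no kernel source, and the figures are copied from the size output above.

```c
#include <assert.h>

/* One row of `size` output; field names mirror the column headers. */
struct size_row {
    long text;
    long data;
    long bss;
};

/* Figures copied from the table above. */
static const struct size_row linux_24 = { 640564, 39152, 134260 };
static const struct size_row linux_25 = { 845924, 51204, 78896 };

/* Change of one column in bytes (negative means it shrank). */
long growth_bytes(long before, long after)
{
    return after - before;
}

/* Change as a whole-number percentage of the old value,
 * truncated toward zero by C integer division. */
long growth_percent(long before, long after)
{
    assert(before > 0);
    return (after - before) * 100 / before;
}

/* Text growth between the two kernels measured above. */
long text_growth_bytes(void)
{
    return growth_bytes(linux_24.text, linux_25.text);
}

/* The same growth as a percentage of the 2.4 text size. */
long text_growth_percent(void)
{
    return growth_percent(linux_24.text, linux_25.text);
}
```

With these figures the text segment grows by 205360 bytes, about 32%, while bss shrinks.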

I could provide the exact .config file for both kernels to
anybody interested. They are almost the same: no filesystems
except JFFS2, IPv4 and a bunch of small drivers. I have no
SMP, security, futexes, modules or anything else not
strictly needed to execute processes.

I've made a linker map file and compared the size of single
subsystems. These are the major contributors to the
size increase:

kernel/ +27KB
mm/ +14KB
fs/ +47KB
drivers/ +35KB
net/ +64KB

I've dug into net/ with nm -S --size-sort. It seems that
the major increase is caused by net/xfrm/. Could this module
be made optional?

In fs/, almost all modules have grown 30-40%, so the
bloat is probably caused by inlines and macros getting more
complex.

Block drivers and MTD have generally become smaller. Character
devices are responsible for most of the size increase in drivers/.

--
// Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/ http://www.develer.com/

Please don't send Word attachments - http://www.gnu.org/philosophy/no-word-attachments.html



2003-07-23 19:01:27

by Richard B. Johnson

Subject: Re: Kernel 2.6 size increase

On Wed, 23 Jul 2003, Bernardo Innocenti wrote:

> Hello,
>
> code bloat can be very harmful on embedded targets, but it's
> generally inconvenient for any platform. I've measured the
> code increase between 2.4.21 and 2.6.0-test1 on a small
> kernel configuration for ColdFire:
>
> text data bss dec hex filename
> 640564 39152 134260 813976 c6b98 linux-2.4.x/linux
^^^^^^
> 845924 51204 78896 976024 ee498 linux-2.5.x/vmlinux

[SNIPPED...]

It looks like a lot of data may have been initialized in the
newer kernel, i.e. int barf = 0; or struct vomit = {0,}.
If they just declared the static data, it would end up in
.bss which is allocated at run-time (and zeroed) and is
not in the kernel image.

You might want to check this out. There are 51204 - 39152 = 12,052 bytes
more data, but 134260 - 78896 = 55,364 bytes less bss.
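
As an illustration of the data/bss distinction described here, the following C sketch shows the three cases side by side. The variable names are made up for the example, and which section a toolchain actually picks can be checked with size or nm; the comments only restate what is said in this thread.

```c
/* Explicit zero initializer: older compilers may place this in .data,
 * where it takes up space in the image. Per the follow-up in this
 * thread, GCC 3.3 and later move it to .bss automatically. */
static int barf_explicit = 0;

/* No initializer: C guarantees the value is zero at startup, and the
 * object normally lands in .bss, which is zeroed at run time. */
static int barf_implicit;

/* Non-zero initializer: the value has to be stored in the image,
 * so this one goes to .data. */
static int barf_nonzero = 42;

/* Both zero-valued objects read as 0 whichever section holds them. */
int zero_inits_agree(void)
{
    return barf_explicit == 0 && barf_implicit == 0;
}

/* The initialized value is available without any run-time setup. */
int nonzero_value(void)
{
    return barf_nonzero;
}
```

The observable behaviour is identical in the first two cases; only the image size differs.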


Cheers,
Dick Johnson
Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.

2003-07-23 19:21:28

by Christoph Hellwig

Subject: Re: [uClinux-dev] Kernel 2.6 size increase

On Wed, Jul 23, 2003 at 08:46:46PM +0200, Bernardo Innocenti wrote:
> Hello,
>
> code bloat can be very harmful on embedded targets, but it's
> generally inconvenient for any platform. I've measured the
> code increase between 2.4.21 and 2.6.0-test1 on a small
> kernel configuration for ColdFire:
>
> text    data   bss     dec     hex    filename
> 640564  39152  134260  813976  c6b98  linux-2.4.x/linux
> 845924  51204  78896   976024  ee498  linux-2.5.x/vmlinux
>
> I could provide the exact .config file for both kernels to
> anybody interested. They are almost the same: no filesystems
> except JFFS2, IPv4 and a bunch of small drivers. I have no
> SMP, security, futexes, modules and anything else not
> strictly needed to execute processes.

Yes, we need to get this down again. What compiler and compiler
flags are you using? Could you retry with the following ripped
from include/linux/compiler.h:

#if (__GNUC__ > 3) || (__GNUC__ == 3 && __GNUC_MINOR__ >= 1)
#define inline __inline__ __attribute__((always_inline))
#define __inline__ __inline__ __attribute__((always_inline))
#define __inline __inline__ __attribute__((always_inline))
#endif

I'd especially be interested in the fs/ numbers after this.

Also -Os on both would be quite cool.
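
For readers unfamiliar with the attribute in that snippet: plain inline is a hint, while always_inline asks GCC to inline at every call site regardless of its own size heuristics. A minimal sketch of the difference follows, assuming GCC or a compatible compiler; the function names are invented for the example.

```c
/* Plain inline: the compiler may still decide to emit a real call. */
static inline int add_hinted(int a, int b)
{
    return a + b;
}

/* always_inline: the body is inlined at every call site, which is
 * what the compiler.h defines quoted above turn every `inline` into. */
static inline __attribute__((always_inline)) int add_forced(int a, int b)
{
    return a + b;
}

/* Both variants compute the same result; only code placement differs. */
int sum_both(int a, int b)
{
    return add_hinted(a, b) + add_forced(a, b);
}
```

Removing the defines therefore hands the inlining decision back to the compiler, which is the experiment being requested.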

2003-07-23 19:52:22

by David Miller

Subject: Re: Kernel 2.6 size increase

On Wed, 23 Jul 2003 15:14:22 -0400 (EDT)
"Richard B. Johnson" <[email protected]> wrote:

> On Wed, 23 Jul 2003, Bernardo Innocenti wrote:
> It looks like a lot of data may have been initialized in the
> newer kernel, i.e. int barf = 0; or struct vomit = {0,}.
> If they just declared the static data, it would end up in
> .bss which is allocated at run-time (and zeroed) and is
> not in the kernel image.

GCC 3.3 and later do this automatically.

It's weird, since we killed TONS of explicit zero initializers during
2.5.x; you'd be hard pressed to find many examples like the one you
mention.

Another thing is that the DEFINE_PER_CPU() stuff eliminated many huge
[NR_CPUS] arrays. But this probably doesn't apply to his kernel
unless he built it with SMP enabled.

2003-07-23 19:56:59

by David Miller

Subject: Re: [uClinux-dev] Kernel 2.6 size increase

On Wed, 23 Jul 2003 21:32:46 +0200
Christoph Hellwig <[email protected]> wrote:

> Could you retry with the following ripped
> from include/linux/compiler.h:
>
> #if (__GNUC__ > 3) || (__GNUC__ == 3 && __GNUC_MINOR__ >= 1)
> #define inline __inline__ __attribute__((always_inline))
> #define __inline__ __inline__ __attribute__((always_inline))
> #define __inline __inline__ __attribute__((always_inline))
> #endif

Careful, some platforms won't work with this.

I know that ppc64's switch_to() for example must be inlined or else
the kernel stops working. It's either the above defines or adding
-finline-limit=100000 to the GCC command line to force it to be
inlined.

2003-07-23 20:00:30

by Christoph Hellwig

Subject: Re: [uClinux-dev] Kernel 2.6 size increase

On Wed, Jul 23, 2003 at 01:11:54PM -0700, David S. Miller wrote:
> Careful, some platforms won't work with this.

I didn't say I want this changed again in mainline, I just
wanted to see whether gcc actually is smarter than us so we
need to remove some more inlines..

2003-07-23 20:08:22

by David Miller

Subject: Re: [uClinux-dev] Kernel 2.6 size increase

On Wed, 23 Jul 2003 21:15:33 +0100
Christoph Hellwig <[email protected]> wrote:

> I just wanted to see whether gcc actually is smarter than us
> so we need to remove some more inlines..

Drivers weren't audited much, and there's a lot of boneheaded
stuff in this area. But these should be mostly identical
to what would happen on the 2.4.x side

2003-07-23 20:15:23

by Christoph Hellwig

Subject: Re: [uClinux-dev] Kernel 2.6 size increase

On Wed, Jul 23, 2003 at 01:22:56PM -0700, David S. Miller wrote:
> Drivers weren't audited much, and there's a lot of boneheaded
> stuff in this area. But these should be mostly identical
> to what would happen on the 2.4.x side

Please read the original message again - he stated that every single
module in fs/ got a lot bigger - if it gets smaller or at least the
same size as 2.4 it's clearly a sign of inlines gone mad in the
filesystem/VM code and we need to look at that. If not we have to look
elsewhere.

2003-07-23 21:35:34

by Randy.Dunlap

Subject: Re: Kernel 2.6 size increase

On Wed, 23 Jul 2003 13:07:12 -0700 "David S. Miller" <[email protected]> wrote:

| On Wed, 23 Jul 2003 15:14:22 -0400 (EDT)
| "Richard B. Johnson" <[email protected]> wrote:
|
| > On Wed, 23 Jul 2003, Bernardo Innocenti wrote:
| > It looks like a lot of data may have been initialized in the
| > newer kernel, i.e. int barf = 0; or struct vomit = {0,}.
| > If they just declared the static data, it would end up in
| > .bss which is allocated at run-time (and zeroed) and is
| > not in the kernel image.
|
| GCC 3.3 and later do this automatically.
|
| It's weird, since we killed TONS of explicit zero initializers during
| 2.5.x, you'd be pressed to find many examples like the one you
| mention.
|
| Another thing is that the DEFINE_PER_CPU() stuff eliminated many huge
| [NR_CPUS] arrays. But this probably doesn't apply to his kernel
| unless he built it with SMP enabled.

Yes, lots were already killed off, but there are also several
kernel-janitor patches to remove many more static 0 inits.
They can be found at
http://developer.osdl.org/ogasawara/kj-patches/uninit_static/
and I'll be trying to have them merged, although I don't know
how well they will be accepted.

--
~Randy

2003-07-23 21:42:24

by Bernardo Innocenti

Subject: Re: [uClinux-dev] Kernel 2.6 size increase

On Wednesday 23 July 2003 21:32, Christoph Hellwig wrote:

> Yes, we need to get this down again. What compiler and compiler
> flags are you using? Could you retry with the following ripped
> from include/linux/compiler.h:
>
> #if (__GNUC__ > 3) || (__GNUC__ == 3 && __GNUC_MINOR__ >= 1)
> #define inline __inline__ __attribute__((always_inline))
> #define __inline__ __inline__ __attribute__((always_inline))
> #define __inline __inline__ __attribute__((always_inline))
> #endif

Not much changed:

text    data   bss    dec     hex    filename
845924  51204  78896  976024  ee498  linux-2.5.x/vmlinux-inline
840368  48392  78896  967656  ec3e8  linux-2.5.x/vmlinux-noinline

By the way: this is uClinux 2.5.75-uc0. 2.6.0-test1 should be very close.
I'm building with gcc 3.3.1-pre with some ColdFire/uClinux patches.

> I'd especially be interested in the fs/ numbers after this.

Neither did it change much here:

text    data  bss   dec     hex    filename
224145  6952  5468  236565  39c15  linux-2.5.x/fs/built-in.o.inline
223591  6952  5468  236011  399eb  linux-2.5.x/fs/built-in.o

> Also -Os on both would be quite cool.

text    data   bss    dec     hex    filename
845924  51204  78896  976024  ee498  linux-2.5.x/vmlinux-inline-O2
819276  52460  78896  950632  e8168  linux-2.5.x/vmlinux-inline-Os

text    data   bss    dec     hex    filename
840368  48392  78896  967656  ec3e8  linux-2.5.x/vmlinux-noinline-O2
815052  48316  78896  942264  e60b8  linux-2.5.x/vmlinux-noinline-Os

This is quite a saving! I'll send a patch to Greg (uClinux maintainer)
once I've tested it a bit more.


NOTE: I just noticed the 2.4.x kernel was built with -O1 because I
had symbolic debug enabled. That's not fair: 2.4.20 would probably
come out even smaller than I reported!



2003-07-23 21:52:30

by Bernardo Innocenti

Subject: Re: [uClinux-dev] Kernel 2.6 size increase

On Wednesday 23 July 2003 23:57, Bernardo Innocenti wrote:

> NOTE: I just noticed the 2.4.x kernel was built with -O1 because I
> had symbolic debug enabled. That's not fair: 2.4.20 would probably
> come out even smaller than I reported!

Here come the numbers:

text    data   bss     dec     hex    filename
640564  39152  134260  813976  c6b98  linux-2.4.x/linux-O1
633028  37952  134260  805240  c4978  linux-2.4.x/linux-Os


So the new comparison base is:

text    data   bss     dec     hex    filename
633028  37952  134260  805240  c4978  linux-2.4.x/linux-Os
819276  52460  78896   950632  e8168  linux-2.5.x/vmlinux-inline-Os
^^^^^^
2.6 still needs a hard diet... :-/



2003-07-23 22:13:16

by Willy Tarreau

Subject: Re: [uClinux-dev] Kernel 2.6 size increase

Hi !

On Thu, Jul 24, 2003 at 12:07:15AM +0200, Bernardo Innocenti wrote:

> text    data   bss     dec     hex    filename
> 633028  37952  134260  805240  c4978  linux-2.4.x/linux-Os
> 819276  52460  78896   950632  e8168  linux-2.5.x/vmlinux-inline-Os
> ^^^^^^
> 2.6 still needs a hard diet... :-/

I made the same observation a few weeks ago on 2.5.74/gcc-2.95.3. I tried
to track down the culprit, to the point that I completely disabled every
driver, networking option and file-system, just to see, and got about a 550 kB
vmlinux compiled with -Os. 550 kB for nothing :-(

I don't have the config nor the exact numbers right here now, but I can
redo the tests on 2.6.0-test1 if needed.

I was interested in using a very minimal pre-boot kernel with kexec which would
automatically select a valid image among several ones. But 500 kB overhead for
a boot loader quickly put me off...

Cheers,
Willy

2003-07-23 22:20:56

by Bernardo Innocenti

Subject: Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?

On Wednesday 23 July 2003 22:27, Christoph Hellwig wrote:

> On Wed, Jul 23, 2003 at 01:22:56PM -0700, David S. Miller wrote:
> > Drivers weren't audited much, and there's a lot of boneheaded
> > stuff in this area. But these should be mostly identical
> > to what would happen on the 2.4.x side
>
> Please read the original message again - he stated that every single
> module in fs/ got a lot bigger - if it gets smaller or at least the
> same size as 2.4 it's clearly a sign of inlines gone mad in the
> filesystem/VM code and we need to look at that. If not we have to look
> elsewhere.

Here is my humble opinion:

In 2.4.20 (m68knommu):
-------------------------------------------------------------------------
#define current _current_task
-------------------------------------------------------------------------

In 2.6.0-test1 (m68knommu):
-------------------------------------------------------------------------
#define current get_current()
static inline struct task_struct *get_current(void)
{
    return(current_thread_info()->task);
}
static inline struct thread_info *current_thread_info(void)
{
    struct thread_info *ti;
    __asm__(
        "move.l %%sp, %0 \n\t"
        "and.l %1, %0"
        : "=&d"(ti)
        : "d" (~(THREAD_SIZE-1))
    );
    return ti;
}
-------------------------------------------------------------------------

The latter expands to:

0: movel #-8192,%d0
6: movel %sp,%d2
8: andl %d0,%d2
a: moveal %d2,%a1
c: moveal %a1@,%a0
e: moveal %a0@(92),%a0
12:

It's a sequence of 6 instructions, 18 bytes long, clobbering 4 registers.
The compiler cannot see around it.
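
The address arithmetic behind that sequence can be shown in portable C. This is only a user-space sketch: THREAD_SIZE_SKETCH is a stand-in set to 8192 to match the #-8192 mask in the listing, and thread_info_base() is a made-up name, not the kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for THREAD_SIZE; 8192 matches the mask in the listing above. */
#define THREAD_SIZE_SKETCH 8192u

/* Round a stack address down to the start of its THREAD_SIZE-aligned
 * block, which is where the thread_info structure is found. */
uintptr_t thread_info_base(uintptr_t sp)
{
    /* The mask trick only works when the size is a power of two. */
    assert((THREAD_SIZE_SKETCH & (THREAD_SIZE_SKETCH - 1)) == 0);
    return sp & ~((uintptr_t)THREAD_SIZE_SKETCH - 1);
}
```

Any address inside the same 8 KB block maps to the same base, which is why the kernel stack pointer alone is enough to find the current task.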

"current" is used very casually all over the kernel, as in this
code snippet from fs/open.c:

    old_fsuid = current->fsuid;
    old_fsgid = current->fsgid;
    old_cap = current->cap_effective;
    current->fsuid = current->uid;
    current->fsgid = current->gid;
    if (current->uid)
        cap_clear(current->cap_effective);
    else
        current->cap_effective = current->cap_permitted;

This takes 18*11 = 198 bytes just for invoking the 'current'
macro so many times.

Perhaps adding __attribute__((const)) to current_thread_info() and
get_current() would help eliminate some unnecessary accesses.
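
Independent of the attribute idea, the per-use cost can also be avoided at the call site by reading the pointer once into a local. The sketch below models this in user space; every name in it is hypothetical. The counter exists only to make lookups observable, and because it is a side effect it also stops the compiler from merging calls, which is not how the real header behaves.

```c
/* Hypothetical stand-in for task_struct with just the fields used here. */
struct task_sketch {
    int uid, gid, fsuid, fsgid;
};

static struct task_sketch the_task = { 1000, 1000, 0, 0 };
static int lookups;  /* how many times "current" was evaluated */

/* Stand-in for the real lookup; counts calls so the cost is visible. */
struct task_sketch *get_current_sketch(void)
{
    lookups++;
    return &the_task;
}

#define current_sketch get_current_sketch()

/* Every mention of the macro is a fresh lookup: four in this function. */
void set_fsids_naive(void)
{
    current_sketch->fsuid = current_sketch->uid;
    current_sketch->fsgid = current_sketch->gid;
}

/* Read the pointer once and reuse the local: a single lookup. */
void set_fsids_cached(void)
{
    struct task_sketch *tsk = current_sketch;

    tsk->fsuid = tsk->uid;
    tsk->fsgid = tsk->gid;
}

int lookup_count(void)
{
    return lookups;
}

void reset_lookups(void)
{
    lookups = 0;
}
```

Applied to the fs/open.c fragment above, the same change would turn eleven expansions into one, at the price of touching each call site by hand.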



2003-07-23 22:27:30

by Alan

Subject: Re: [uClinux-dev] Kernel 2.6 size increase

On Mer, 2003-07-23 at 23:27, Willy Tarreau wrote:
> I was interested in using a very minimal pre-boot kernel with kexec which would
> automatically select a valid image among several ones. But 500 kB overhead for
> a boot loader quickly put me off...

Something like the GPL'd eCos might be a better option (or on x86 there
is the 64K Linux 8086)

2003-07-23 22:30:26

by Alan

Subject: Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?

On Mer, 2003-07-23 at 23:35, Bernardo Innocenti wrote:
> It's a sequence of 6 instructions, 18 bytes long, clobbering 4 registers.
> The compiler cannot see around it.
> This takes 18*11 = 198 bytes just for invoking the 'current'
> macro so many times.

Unless you support SMP I'm not sure I understand why m68k nommu changed
from using a global for current_task ?

2003-07-23 22:45:30

by Bernardo Innocenti

Subject: Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?

On Thursday 24 July 2003 00:37, Alan Cox wrote:

> On Mer, 2003-07-23 at 23:35, Bernardo Innocenti wrote:
> > It's a sequence of 6 instructions, 18 bytes long, clobbering 4 registers.
> > The compiler cannot see around it.
> > This takes 18*11 = 198 bytes just for invoking the 'current'
> > macro so many times.
>
> Unless you support SMP I'm not sure I understand why m68k nommu changed
> from using a global for current_task ?

The people who might know best are Greg and David from SnapGear.
I'm appending them to the Cc list.

But I noticed that most archs in 2.6 do it this way. Is it some kind
of flock-effect? Things get changed in i386 and all other archs
just follow... :-)



2003-07-24 04:53:38

by David McCullough

Subject: Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?


Jivin Bernardo Innocenti lays it down ...
> On Thursday 24 July 2003 00:37, Alan Cox wrote:
>
> > On Mer, 2003-07-23 at 23:35, Bernardo Innocenti wrote:
> > > It's a sequence of 6 instructions, 18 bytes long, clobbering 4 registers.
> > > The compiler cannot see around it.
> > > This takes 18*11 = 198 bytes just for invoking the 'current'
> > > macro so many times.
> >
> > Unless you support SMP I'm not sure I understand why m68k nommu changed
> > from using a global for current_task ?
>
> The people who might know best are Greg and David from SnapGear.
> I'm appending them to the Cc list.
>
> But I noticed that most archs in 2.6 do like this. Is it some kind
> of flock-effect? Things get changed in i386 and all other archs
> just follow... :-)

It's a little this way for sure.

Back when I first did the 2.4 uClinux port, the m68k MMU code was
dedicating a register (a2) for current. I thought that was a bad idea
given how often you run out of registers on the 68k, and made it a
global. Because it was still effectively a pointer, the code size
change was not a factor. I just didn't want to give up a register.
So that is the 2.4 history and it has served us well so far ;-)

On the 2.5/2.6 front, I think the change comes from the 8K (2 page) task
structure and everyone just masking the kernel stack pointer to get the
task pointer. Gerg would know for sure, he did the 2.5 work in this area.
We should be easily able to switch back to the current_task pointer with a
few small mods to entry.S.

A general comment on the use of inline throughout the kernel. Although
they may show gains on x86 platforms, they often perform worse on
embedded processors with limited cache, as well as adding size. I
can't see any way of coding around this though. As long as x86 is
the driving influence, other platforms will just have to deal with it as
best they can.

Cheers,
Davidm

--
David McCullough, [email protected] Ph:+61 7 34352815 http://www.SnapGear.com
Custom Embedded Solutions + Security Fx:+61 7 38913630 http://www.uCdot.org

2003-07-24 11:22:22

by Alan

Subject: Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?

On Iau, 2003-07-24 at 06:06, David McCullough wrote:
> Back when I first did the 2.4 uClinux port, the m68k MMU code was
> dedicating a register (a2) for current. I thought that was a bad idea
> given how often you run out of registers on the 68k, and made it a

On some platforms a global register current was a win, I can't speak for
m68k - current is used a lot.

> On the 2.5/2.6 front, I think the change comes from the 8K (2 page) task
> structure and everyone just masking the kernel stack pointer to get the
> task pointer. Gerg would know for sure, he did the 2.5 work in this area.
> We should be easily able to switch back to the current_task pointer with a
> few small mods to entry.S.

A lot of platforms went this way because "current" is hard to do right
on an SMP box. It's effectively per-CPU dependent, and that means you
either set up the MMU to do per-CPU pages (via segments or tables) which
is a pita, or you do the stack trick. For uniprocessor a global still
works perfectly well.

> A general comment on the use of inline throughout the kernel. Although
> they may show gains on x86 platforms, they often perform worse on
> embedded processors with limited cache, as well as adding size. I

Code size for critical paths is getting more and more performance critical
on x86 as well as on the embedded CPU systems. 3GHz superscalar processors
lose a lot of clocks to a memory stall.

2003-07-24 11:51:13

by David McCullough

Subject: Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?


Jivin Alan Cox lays it down ...
> On Iau, 2003-07-24 at 06:06, David McCullough wrote:
> > Back when I first did the 2.4 uClinux port, the m68k MMU code was
> > dedicating a register (a2) for current. I thought that was a bad idea
> > given how often you run out of registers on the 68k, and made it a
>
> On some platforms a global register current was a win, I can't speak for
> m68k - current is used a lot.


I'm sure that using a register for current was the right thing to do at
the time. One problem with a global register approach is that the more
inlining the code uses, the more likely the compiler is going to want
that extra register :-)


> > On the 2.5/2.6 front, I think the change comes from the 8K (2 page) task
> > structure and everyone just masking the kernel stack pointer to get the
> > task pointer. Gerg would know for sure, he did the 2.5 work in this area.
> > We should be easily able to switch back to the current_task pointer with a
> > few small mods to entry.S.
>
> A lot of platforms went this way because "current" is hard to do right
> on an SMP box. It's effectively per-CPU dependent, and that means you
> either set up the MMU to do per-CPU pages (via segments or tables) which
> is a pita, or you do the stack trick. For uniprocessor a global still
> works perfectly well.


Sounds like something that can at least be made conditional on SMP.
I'll look into it for m68knommu since it is more likely to care about "size"
than SMP.
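
A rough shape for such a conditional is sketched below. All identifiers are hypothetical (CONFIG_SMP_SKETCH, task_sketch, set_current_sketch and so on) and this is not the m68knommu code; the SMP branch is only declared, since it would need the arch-specific stack trick discussed earlier in the thread.

```c
/* Hypothetical stand-in for task_struct. */
struct task_sketch {
    int pid;
};

#ifdef CONFIG_SMP_SKETCH
/* SMP: each CPU runs a different task, so the lookup has to be
 * per-CPU (for example derived from the stack pointer). Declared
 * only; the arch-specific body is not part of this sketch. */
struct task_sketch *get_current_sketch(void);
#else
/* Uniprocessor: only one task runs at a time, so one global is enough. */
static struct task_sketch *current_task_sketch;

static inline struct task_sketch *get_current_sketch(void)
{
    return current_task_sketch;
}

/* Hypothetical hook the context-switch path would call. */
void set_current_sketch(struct task_sketch *next)
{
    current_task_sketch = next;
}
#endif

#define current_sketch get_current_sketch()

/* Example user: on uniprocessor this is a load from a fixed address. */
int current_pid_sketch(void)
{
    return current_sketch->pid;
}
```

On the uniprocessor branch each use of the macro costs a single memory load, at the price of keeping the global up to date on every context switch.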


> > A general comment on the use of inline throughout the kernel. Although
> > they may show gains on x86 platforms, they often perform worse on
> > embedded processors with limited cache, as well as adding size. I
>
> Code size for critical paths is getting more and more performance critical
> on x86 as well as on the embedded CPU systems. 3Ghz superscalar processors
> lose a lot of clocks to a memory stall.

So should the trend be away from inlining, especially larger functions ?

I know on m68k some of the really simple inlines are actually smaller as
an inline than as a function call. But they have to be very simple, or
only used once.

Cheers,
Davidm


2003-07-24 14:42:21

by Alan

Subject: Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?

On Iau, 2003-07-24 at 13:04, David McCullough wrote:
> So should the trend be away from inlining, especially larger functions ?
>
> I know on m68k some of the really simple inlines are actually smaller as
> an inline than as a function call. But they have to be very simple, or
> only used once.

Cool. As to trends, well, there are two conflicting ones - fewer inlines but
also more code because of adding fast paths to cut conditions down on normal
sequences of execution.


2003-07-24 15:15:25

by Hollis Blanchard

Subject: Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?

On Thursday, Jul 24, 2003, at 06:28 US/Central, Alan Cox wrote:
>
> Code size for critical paths is getting more and more performance
> critical
> on x86 as well as on the embedded CPU systems. 3Ghz superscalar
> processors
> lose a lot of clocks to a memory stall.

So you're arguing for more inlining, because icache speculative
prefetch will pick up the inlined code?

Or you're arguing for less, because code like get_current() which is
called frequently could have a single copy living in icache?

--
Hollis Blanchard
IBM Linux Technology Center

2003-07-24 19:28:59

by Alan

Subject: Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?

On Iau, 2003-07-24 at 16:30, Hollis Blanchard wrote:
> So you're arguing for more inlining, because icache speculative
> prefetch will pick up the inlined code?

I'm arguing for short inlined fast paths and non inlined unusual
paths.

> Or you're arguing for less, because code like get_current() which is
> called frequently could have a single copy living in icache?

Depends how much the jump costs you.
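
One way to read "short inlined fast paths and non inlined unusual paths" is sketched below. The buffer type and function names are invented for the example, and noinline is the GCC attribute spelling; whether the split pays off on a given CPU still has to be measured.

```c
#include <stddef.h>

/* Hypothetical fixed-size buffer used only for this illustration. */
struct buf_sketch {
    char data[16];
    size_t used;
    int overflows;
};

/* Unusual path: kept out of line so every call site stays small. */
__attribute__((noinline)) int buf_put_slow(struct buf_sketch *b, char c)
{
    (void)c;          /* the byte is dropped */
    b->overflows++;   /* remember that it happened */
    return -1;
}

/* Common path: one compare and one store, cheap enough to inline. */
static inline int buf_put(struct buf_sketch *b, char c)
{
    if (b->used < sizeof(b->data)) {
        b->data[b->used++] = c;
        return 0;
    }
    return buf_put_slow(b, c);
}

/* Store up to n bytes; returns how many actually fit. */
int buf_fill(struct buf_sketch *b, const char *s, size_t n)
{
    size_t i;
    int stored = 0;

    for (i = 0; i < n; i++)
        if (buf_put(b, s[i]) == 0)
            stored++;
    return stored;
}
```

Only the compare-and-store is duplicated at each call site; the overflow handling exists once, however many callers there are.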

2003-07-24 19:36:18

by Hollis Blanchard

Subject: Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?

On Thursday, Jul 24, 2003, at 14:37 US/Central, Alan Cox wrote:

> On Iau, 2003-07-24 at 16:30, Hollis Blanchard wrote:
>> So you're arguing for more inlining, because icache speculative
>> prefetch will pick up the inlined code?
>
> I'm arguing for short inlined fast paths and non inlined unusual
> paths.
>
>> Or you're arguing for less, because code like get_current() which is
>> called frequently could have a single copy living in icache?
>
> Depends how much the jump costs you.

And also how big your icache is, and maybe even cpu/bus ratio, etc...
which depend on the arch of course.

So as I saw Ihar suggest earlier in this thread, perhaps there should
be two inline directives: must_inline (for code whose correctness
depends on it) and could_help_performance_inline. Then different archs
could #define could_help_performance_inline as appropriate.
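
A sketch of how the two directives might be spelled is given below. The macro names are the ones proposed in this message; CONFIG_PREFER_SMALL_CODE_SKETCH is a made-up symbol standing in for whatever an architecture would actually test, and the attribute is GCC-specific.

```c
/* Correctness depends on inlining: always force it. */
#define must_inline static inline __attribute__((always_inline))

#ifdef CONFIG_PREFER_SMALL_CODE_SKETCH
/* Size-sensitive targets: drop the hint and let the compiler decide. */
#define could_help_performance_inline static
#else
/* Everyone else: keep the ordinary inline hint. */
#define could_help_performance_inline static inline
#endif

/* Example of a helper that only benefits from inlining. */
could_help_performance_inline int clamp_to_byte(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return v;
}

/* Example of a helper that is declared as requiring inlining. */
must_inline int is_enabled(const int *flag)
{
    return *flag != 0;
}

/* Small caller exercising both kinds of helper. */
int demo_filter(int v, int enabled)
{
    return is_enabled(&enabled) ? clamp_to_byte(v) : v;
}
```

The behaviour is the same under either setting; only the generated code size and call overhead change.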

--
Hollis Blanchard
IBM Linux Technology Center

2003-07-24 20:12:16

by Bernardo Innocenti

Subject: Re: [uClinux-dev] Kernel 2.6 size increase

On Thursday 24 July 2003 00:27, Willy Tarreau wrote:

> On Thu, Jul 24, 2003 at 12:07:15AM +0200, Bernardo Innocenti wrote:
> > text    data   bss     dec     hex    filename
> > 633028  37952  134260  805240  c4978  linux-2.4.x/linux-Os
> > 819276  52460  78896   950632  e8168  linux-2.5.x/vmlinux-inline-Os
> > ^^^^^^
> > 2.6 still needs a hard diet... :-/
>
> I made the same observation a few weeks ago on 2.5.74/gcc-2.95.3. I tried
> to track down the culprit, to the point that I completely disabled
> every driver, networking option and file-system, just to see, and got about
> a 550 kB vmlinux compiled with -Os. 550 kB for nothing :-(

Some of the bigger 2.6 additions cannot be configured out.
I wish sysfs and the different I/O schedulers could be removed.

There are probably many other things mostly useless for embedded
systems that I'm not aware of.



2003-07-24 21:04:54

by J.A. Magallon

Subject: Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?


On 07.24, Hollis Blanchard wrote:
> On Thursday, Jul 24, 2003, at 14:37 US/Central, Alan Cox wrote:
>
> > On Iau, 2003-07-24 at 16:30, Hollis Blanchard wrote:
> >> So you're arguing for more inlining, because icache speculative
> >> prefetch will pick up the inlined code?
> >
> > I'm arguing for short inlined fast paths and non inlined unusual
> > paths.
> >
> >> Or you're arguing for less, because code like get_current() which is
> >> called frequently could have a single copy living in icache?
> >
> > Depends how much the jump costs you.
>
> And also how big your icache is, and maybe even cpu/bus ratio, etc...
> which depend on the arch of course.
>
> So as I saw Ihar suggest earlier in this thread, perhaps there should
> be two inline directives: must_inline (for code whose correctness
> depends on it) and could_help_performance_inline. Then different archs
> could #define could_help_performance_inline as appropriate.
>

Or you just define must_inline, and let gcc inline the rest of 'inlines',
based on its own rule of functions size, adjusting the parameters
to gcc to assure (more or less) that what is inlined fits in cache of
the processor one is building for...
(this can be hard, help from gcc hackers will be needed...)

--
J.A. Magallon <[email protected]> \ Software is like sex:
werewolf.able.es \ It's better when it's free
Mandrake Linux release 9.2 (Cooker) for i586
Linux 2.4.22-pre7-jam1m (gcc 3.3.1 (Mandrake Linux 9.2 3.3.1-0.6mdk))

2003-07-25 04:12:24

by Otto Solares

Subject: Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?

On Thu, Jul 24, 2003 at 11:20:00PM +0200, J.A. Magallon wrote:
> Or you just define must_inline, and let gcc inline the rest of 'inlines',
> based on its own rule of functions size, adjusting the parameters
> to gcc to assure (more or less) that what is inlined fits in cache of
> the processor one is building for...
> (this can be hard, help from gcc hackers will be needed...)

IMO just a CONFIG_INLINE_FUNCTIONS will work: if you
want to conserve space at the expense of speed, simply
don't select this option; otherwise you get speed but
a big kernel.

-solca

2003-07-25 14:23:21

by Hollis Blanchard

Subject: Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?

On Thursday, Jul 24, 2003, at 23:22 US/Central, Otto Solares wrote:

> On Thu, Jul 24, 2003 at 11:20:00PM +0200, J.A. Magallon wrote:
>> Or you just define must_inline, and let gcc inline the rest of
>> 'inlines',
>> based on its own rule of functions size, adjusting the parameters
>> to gcc to assure (more or less) that what is inlined fits in cache of
>> the processor one is building for...
>> (this can be hard, help from gcc hackers will be needed...)
>
> IMO just a CONFIG_INLINE_FUNCTIONS will work, if you
> want to conserve space in detriment of speed simply
> don't select this option, else you have speed but
> a big kernel.

Inlines don't always help performance (depending on cache sizes, branch
penalties, frequency of code access...), but they do always increase
code size.

I believe the point Alan was trying to make is not that we should have
more or less inlines, but we should have smarter inlines. I.E. don't
just inline a function to "make it fast"; think about the implications
(and ideally measure it, though I think that becomes problematic when
so many other factors can affect the benefit of a single inlined
function). The specific example he gave was inlining code on the fast
path, while accepting branch/cache penalties for non-inlined code on
the slow path.

--
Hollis Blanchard
IBM Linux Technology Center

2003-07-25 15:31:56

by Christoph Hellwig

Subject: Re: [uClinux-dev] Kernel 2.6 size increase

On Thu, Jul 24, 2003 at 10:27:16PM +0200, Bernardo Innocenti wrote:
> Some of the bigger 2.6 additions cannot be configured out.
> I wish sysfs and the different I/O schedulers could be removed.

Removing the I/O schedulers is pretty trivial, please come up with a
patch to make both of them optional and maybe add a trivial noop one.

Removing sysfs should also be pretty trivial but I'm not sure whether
you really want that.
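
To make "a trivial noop one" concrete: the policy amounts to dispatching requests in arrival order, with no sorting or merging. The sketch below is a user-space model of that policy only. It does not use the kernel elevator interface, and every type and function name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NOOP_QUEUE_DEPTH 8

/* Hypothetical request: just the starting sector. */
struct request_sketch {
    unsigned long sector;
};

/* Fixed-size ring buffer holding requests in arrival order. */
struct noop_queue_sketch {
    struct request_sketch *slots[NOOP_QUEUE_DEPTH];
    size_t head;   /* index of the oldest queued request */
    size_t count;  /* number of requests currently queued */
};

/* Append at the tail; no sorting, no merging. Returns -1 when full. */
int noop_add_request(struct noop_queue_sketch *q, struct request_sketch *rq)
{
    assert(q != NULL && rq != NULL);
    if (q->count == NOOP_QUEUE_DEPTH)
        return -1;
    q->slots[(q->head + q->count) % NOOP_QUEUE_DEPTH] = rq;
    q->count++;
    return 0;
}

/* Hand out the oldest request, or NULL when the queue is empty. */
struct request_sketch *noop_next_request(struct noop_queue_sketch *q)
{
    struct request_sketch *rq;

    assert(q != NULL);
    if (q->count == 0)
        return NULL;
    rq = q->slots[q->head];
    q->head = (q->head + 1) % NOOP_QUEUE_DEPTH;
    q->count--;
    return rq;
}
```

Requests come back in the order they were queued, whatever their sector numbers, which is usually acceptable for flash or RAM-backed devices where seek order does not matter.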

2003-07-25 23:40:39

by Bernardo Innocenti

Subject: [PATCH] Make I/O schedulers optional (Was: Re: Kernel 2.6 size increase)

On Friday 25 July 2003 17:46, Christoph Hellwig wrote:
> On Thu, Jul 24, 2003 at 10:27:16PM +0200, Bernardo Innocenti wrote:
> > Some of the bigger 2.6 additions cannot be configured out.
> > I wish sysfs and the different I/O schedulers could be removed.
>
> Removing the I/O schedulers is pretty trivial, please come up with a
> patch to make both of them optional and maybe add a trivial noop one.

Here it is, attached below. I've tested it on both i386 and m68knommu.

Jens, could you please review this patch and push to Linus if you
like it?

> Removing sysfs should also be pretty trivial but I'm not sure whether
> you really want that.

I really don't need sysfs on uClinux. I have no programs using it.
Removing sysfs appears to be a little bit more difficult, though.
I'll try tomorrow.

--------------------------------------------------------------------------

Add Kconfig options to allow excluding either or both of the I/O schedulers.
Mostly useful for embedded systems (saves ~13KB):

With my desktop PC (i386) kernel:

text data bss dec hex filename
2210707 475856 150444 2837007 2b4a0f vmlinux_with_ioscheds
2197763 473446 150380 2821589 2b0dd5 vmlinux_without_ioscheds

With my uClinux (m68knommu) kernel:

text data bss dec hex filename
807760 47384 78884 934028 e408c linux_without_ioscheds
819276 52460 78896 950632 e8168 linux_with_ioscheds
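
As a cross-check of the ~13KB figure, the per-section savings can be
recomputed from the two `size` tables above. This is only a sketch: the
struct and function names are made up for illustration and are not part
of the patch.

```c
#include <stdio.h>

/* One row of `size` output, in bytes. */
struct size_row {
    long text;
    long data;
    long bss;
};

/* Per-section bytes saved by going from `with` to `without`. */
struct size_row size_delta(struct size_row with, struct size_row without)
{
    struct size_row d;

    d.text = with.text - without.text;
    d.data = with.data - without.data;
    d.bss  = with.bss  - without.bss;
    return d;
}

/* Sum of all three sections (the "dec" column of `size`). */
long size_total(struct size_row r)
{
    return r.text + r.data + r.bss;
}

/* Print one line of savings for a named kernel build. */
void print_savings(const char *label, struct size_row with,
                   struct size_row without)
{
    struct size_row d = size_delta(with, without);

    printf("%s: text %ld, data %ld, bss %ld, total %ld bytes\n",
           label, d.text, d.data, d.bss, size_total(d));
}
```

Fed with the rows above, this gives 12944 bytes of text (15418 in total)
for i386 and 11516 bytes of text (16604 in total) for m68knommu, so the
~13KB figure matches the i386 text segment.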


diff -Nru linux-2.6.0-test1.orig/drivers/block/Kconfig linux-2.6.0-test1/drivers/block/Kconfig
--- linux-2.6.0-test1.orig/drivers/block/Kconfig 2003-07-14 05:31:51.000000000 +0200
+++ linux-2.6.0-test1/drivers/block/Kconfig 2003-07-25 18:59:19.000000000 +0200
@@ -4,6 +4,10 @@

menu "Block devices"

+menu "I/O schedulers"
+source "drivers/block/Kconfig.iosched"
+endmenu
+
config BLK_DEV_FD
tristate "Normal floppy disk support"
depends on !X86_PC9800
diff -Nru linux-2.6.0-test1.orig/drivers/block/Kconfig.iosched linux-2.6.0-test1/drivers/block/Kconfig.iosched
--- linux-2.6.0-test1.orig/drivers/block/Kconfig.iosched 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.0-test1/drivers/block/Kconfig.iosched 2003-07-25 18:59:53.000000000 +0200
@@ -0,0 +1,8 @@
+config IOSCHED_AS
+ bool "Anticipatory I/O scheduler"
+ default y
+
+config IOSCHED_DEADLINE
+ bool "Deadline I/O scheduler"
+ default y
+
diff -Nru linux-2.6.0-test1.orig/drivers/block/Makefile linux-2.6.0-test1/drivers/block/Makefile
--- linux-2.6.0-test1.orig/drivers/block/Makefile 2003-07-14 05:37:16.000000000 +0200
+++ linux-2.6.0-test1/drivers/block/Makefile 2003-07-25 20:21:50.000000000 +0200
@@ -13,9 +13,10 @@
# kblockd threads
#

-obj-y := elevator.o ll_rw_blk.o ioctl.o genhd.o scsi_ioctl.o \
- deadline-iosched.o as-iosched.o
+obj-y := elevator.o ll_rw_blk.o ioctl.o genhd.o scsi_ioctl.o

+obj-$(CONFIG_IOSCHED_AS) += as-iosched.o
+obj-$(CONFIG_IOSCHED_DEADLINE) += deadline-iosched.o
obj-$(CONFIG_MAC_FLOPPY) += swim3.o
obj-$(CONFIG_BLK_DEV_FD) += floppy.o
obj-$(CONFIG_BLK_DEV_FD98) += floppy98.o
diff -Nru linux-2.6.0-test1.orig/drivers/block/as-iosched.c linux-2.6.0-test1/drivers/block/as-iosched.c
--- linux-2.6.0-test1.orig/drivers/block/as-iosched.c 2003-07-14 05:28:54.000000000 +0200
+++ linux-2.6.0-test1/drivers/block/as-iosched.c 2003-07-25 20:19:44.000000000 +0200
@@ -1,7 +1,7 @@
/*
* linux/drivers/block/as-iosched.c
*
- * Anticipatory & deadline i/o scheduler.
+ * Anticipatory i/o scheduler.
*
* Copyright (C) 2002 Jens Axboe <[email protected]>
* Nick Piggin <[email protected]>
@@ -1832,6 +1832,7 @@
.elevator_exit_fn = as_exit,

.elevator_ktype = &as_ktype,
+ .elevator_name = "anticipatory scheduling",
};

EXPORT_SYMBOL(iosched_as);
diff -Nru linux-2.6.0-test1.orig/drivers/block/deadline-iosched.c linux-2.6.0-test1/drivers/block/deadline-iosched.c
--- linux-2.6.0-test1.orig/drivers/block/deadline-iosched.c 2003-07-14 05:37:15.000000000 +0200
+++ linux-2.6.0-test1/drivers/block/deadline-iosched.c 2003-07-25 20:20:53.000000000 +0200
@@ -941,6 +941,7 @@
.elevator_exit_fn = deadline_exit,

.elevator_ktype = &deadline_ktype,
+ .elevator_name = "deadline",
};

EXPORT_SYMBOL(iosched_deadline);
diff -Nru linux-2.6.0-test1.orig/drivers/block/elevator.c linux-2.6.0-test1/drivers/block/elevator.c
--- linux-2.6.0-test1.orig/drivers/block/elevator.c 2003-07-14 05:36:48.000000000 +0200
+++ linux-2.6.0-test1/drivers/block/elevator.c 2003-07-25 19:27:41.000000000 +0200
@@ -409,6 +409,7 @@
.elevator_merge_req_fn = elevator_noop_merge_requests,
.elevator_next_req_fn = elevator_noop_next_request,
.elevator_add_req_fn = elevator_noop_add_request,
+ .elevator_name = "noop",
};

module_init(elevator_global_init);
diff -Nru linux-2.6.0-test1.orig/drivers/block/ll_rw_blk.c linux-2.6.0-test1/drivers/block/ll_rw_blk.c
--- linux-2.6.0-test1.orig/drivers/block/ll_rw_blk.c 2003-07-14 05:30:40.000000000 +0200
+++ linux-2.6.0-test1/drivers/block/ll_rw_blk.c 2003-07-25 19:27:02.000000000 +0200
@@ -1205,17 +1205,31 @@

static int __make_request(request_queue_t *, struct bio *);

-static elevator_t *chosen_elevator = &iosched_as;
+static elevator_t *chosen_elevator =
+#if defined(CONFIG_IOSCHED_AS)
+ &iosched_as;
+#elif defined(CONFIG_IOSCHED_DEADLINE)
+ &iosched_deadline;
+#else
+ &elevator_noop;
+#endif

+#if defined(CONFIG_IOSCHED_AS) || defined(CONFIG_IOSCHED_DEADLINE)
static int __init elevator_setup(char *str)
{
+#ifdef CONFIG_IOSCHED_DEADLINE
if (!strcmp(str, "deadline"))
chosen_elevator = &iosched_deadline;
+#endif
+#ifdef CONFIG_IOSCHED_AS
if (!strcmp(str, "as"))
chosen_elevator = &iosched_as;
+#endif
return 1;
}
+
__setup("elevator=", elevator_setup);
+#endif /* CONFIG_IOSCHED_AS || CONFIG_IOSCHED_DEADLINE */

/**
* blk_init_queue - prepare a request queue for use with a block device
@@ -1255,10 +1269,7 @@

if (!printed) {
printed = 1;
- if (chosen_elevator == &iosched_deadline)
- printk("deadline elevator\n");
- else if (chosen_elevator == &iosched_as)
- printk("anticipatory scheduling elevator\n");
+ printk("Using %s elevator\n", chosen_elevator->elevator_name);
}

if ((ret = elevator_init(q, chosen_elevator))) {
diff -Nru linux-2.6.0-test1.orig/include/linux/elevator.h linux-2.6.0-test1/include/linux/elevator.h
--- linux-2.6.0-test1.orig/include/linux/elevator.h 2003-07-14 05:29:27.000000000 +0200
+++ linux-2.6.0-test1/include/linux/elevator.h 2003-07-25 19:18:39.000000000 +0200
@@ -52,6 +52,7 @@

struct kobject kobj;
struct kobj_type *elevator_ktype;
+ const char *elevator_name;
};

/*


--
// Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/ http://www.develer.com/

Please don't send Word attachments - http://www.gnu.org/philosophy/no-word-attachments.html


2003-07-26 08:01:42

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Make I/O schedulers optional (Was: Re: Kernel 2.6 size increase)

Bernardo Innocenti <[email protected]> wrote:
>
> > Removing the I/O schedulers is pretty trivial, please come up with a
> > patch to make both of them optional and maybe add a trivial noop one.
>
> Here it is, attached below. I've tested it on both i386 and m68knommu.

Is nice, but I wonder if it should be appearing under the

General Setup -> Remove kernel features

menu? ie: CONFIG_EMBEDDED.

2003-07-26 12:26:05

by Bernardo Innocenti

[permalink] [raw]
Subject: Re: [PATCH] Make I/O schedulers optional (Was: Re: Kernel 2.6 size increase)

On Saturday 26 July 2003 10:17, Andrew Morton wrote:

> > Here it is, attached below. I've tested it on both i386 and m68knommu.
>
> Is nice, but I wonder if it should be appearing under the
>
> General Setup -> Remove kernel features
>
> menu? ie: CONFIG_EMBEDDED.

Right. Here is an updated patch. I think I've now tested it properly even
with the noop scheduler on i386. Please apply.

--------------------------------------------------------------------------

Add kconfig options to allow excluding either or both of the I/O schedulers.
Mostly useful for embedded systems (saves ~13KB):

With my desktop PC (i386) kernel:

text data bss dec hex filename
2210707 475856 150444 2837007 2b4a0f vmlinux_with_ioscheds
2197763 473446 150380 2821589 2b0dd5 vmlinux_without_ioscheds

With my uClinux (m68knommu) kernel:

text data bss dec hex filename
807760 47384 78884 934028 e408c linux_without_ioscheds
819276 52460 78896 950632 e8168 linux_with_ioscheds


diff -Nru linux-2.6.0-test1.orig/drivers/block/Kconfig.iosched linux-2.6.0-test1-with_elevator_patch/drivers/block/Kconfig.iosched
--- linux-2.6.0-test1.orig/drivers/block/Kconfig.iosched 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.0-test1-with_elevator_patch/drivers/block/Kconfig.iosched 2003-07-26 14:25:44.000000000 +0200
@@ -0,0 +1,8 @@
+config IOSCHED_AS
+ bool "Anticipatory I/O scheduler" if EMBEDDED
+ default y
+
+config IOSCHED_DEADLINE
+ bool "Deadline I/O scheduler" if EMBEDDED
+ default y
+
diff -Nru linux-2.6.0-test1.orig/drivers/block/Makefile linux-2.6.0-test1-with_elevator_patch/drivers/block/Makefile
--- linux-2.6.0-test1.orig/drivers/block/Makefile 2003-07-14 05:37:16.000000000 +0200
+++ linux-2.6.0-test1-with_elevator_patch/drivers/block/Makefile 2003-07-25 20:21:50.000000000 +0200
@@ -13,9 +13,10 @@
# kblockd threads
#

-obj-y := elevator.o ll_rw_blk.o ioctl.o genhd.o scsi_ioctl.o \
- deadline-iosched.o as-iosched.o
+obj-y := elevator.o ll_rw_blk.o ioctl.o genhd.o scsi_ioctl.o

+obj-$(CONFIG_IOSCHED_AS) += as-iosched.o
+obj-$(CONFIG_IOSCHED_DEADLINE) += deadline-iosched.o
obj-$(CONFIG_MAC_FLOPPY) += swim3.o
obj-$(CONFIG_BLK_DEV_FD) += floppy.o
obj-$(CONFIG_BLK_DEV_FD98) += floppy98.o
diff -Nru linux-2.6.0-test1.orig/drivers/block/as-iosched.c linux-2.6.0-test1-with_elevator_patch/drivers/block/as-iosched.c
--- linux-2.6.0-test1.orig/drivers/block/as-iosched.c 2003-07-14 05:28:54.000000000 +0200
+++ linux-2.6.0-test1-with_elevator_patch/drivers/block/as-iosched.c 2003-07-25 20:19:44.000000000 +0200
@@ -1,7 +1,7 @@
/*
* linux/drivers/block/as-iosched.c
*
- * Anticipatory & deadline i/o scheduler.
+ * Anticipatory i/o scheduler.
*
* Copyright (C) 2002 Jens Axboe <[email protected]>
* Nick Piggin <[email protected]>
@@ -1832,6 +1832,7 @@
.elevator_exit_fn = as_exit,

.elevator_ktype = &as_ktype,
+ .elevator_name = "anticipatory scheduling",
};

EXPORT_SYMBOL(iosched_as);
diff -Nru linux-2.6.0-test1.orig/drivers/block/deadline-iosched.c linux-2.6.0-test1-with_elevator_patch/drivers/block/deadline-iosched.c
--- linux-2.6.0-test1.orig/drivers/block/deadline-iosched.c 2003-07-14 05:37:15.000000000 +0200
+++ linux-2.6.0-test1-with_elevator_patch/drivers/block/deadline-iosched.c 2003-07-25 20:20:53.000000000 +0200
@@ -941,6 +941,7 @@
.elevator_exit_fn = deadline_exit,

.elevator_ktype = &deadline_ktype,
+ .elevator_name = "deadline",
};

EXPORT_SYMBOL(iosched_deadline);
diff -Nru linux-2.6.0-test1.orig/drivers/block/elevator.c linux-2.6.0-test1-with_elevator_patch/drivers/block/elevator.c
--- linux-2.6.0-test1.orig/drivers/block/elevator.c 2003-07-14 05:36:48.000000000 +0200
+++ linux-2.6.0-test1-with_elevator_patch/drivers/block/elevator.c 2003-07-25 19:27:41.000000000 +0200
@@ -409,6 +409,7 @@
.elevator_merge_req_fn = elevator_noop_merge_requests,
.elevator_next_req_fn = elevator_noop_next_request,
.elevator_add_req_fn = elevator_noop_add_request,
+ .elevator_name = "noop",
};

module_init(elevator_global_init);
diff -Nru linux-2.6.0-test1.orig/drivers/block/ll_rw_blk.c linux-2.6.0-test1-with_elevator_patch/drivers/block/ll_rw_blk.c
--- linux-2.6.0-test1.orig/drivers/block/ll_rw_blk.c 2003-07-14 05:30:40.000000000 +0200
+++ linux-2.6.0-test1-with_elevator_patch/drivers/block/ll_rw_blk.c 2003-07-25 19:27:02.000000000 +0200
@@ -1205,17 +1205,31 @@

static int __make_request(request_queue_t *, struct bio *);

-static elevator_t *chosen_elevator = &iosched_as;
+static elevator_t *chosen_elevator =
+#if defined(CONFIG_IOSCHED_AS)
+ &iosched_as;
+#elif defined(CONFIG_IOSCHED_DEADLINE)
+ &iosched_deadline;
+#else
+ &elevator_noop;
+#endif

+#if defined(CONFIG_IOSCHED_AS) || defined(CONFIG_IOSCHED_DEADLINE)
static int __init elevator_setup(char *str)
{
+#ifdef CONFIG_IOSCHED_DEADLINE
if (!strcmp(str, "deadline"))
chosen_elevator = &iosched_deadline;
+#endif
+#ifdef CONFIG_IOSCHED_AS
if (!strcmp(str, "as"))
chosen_elevator = &iosched_as;
+#endif
return 1;
}
+
__setup("elevator=", elevator_setup);
+#endif /* CONFIG_IOSCHED_AS || CONFIG_IOSCHED_DEADLINE */

/**
* blk_init_queue - prepare a request queue for use with a block device
@@ -1255,10 +1269,7 @@

if (!printed) {
printed = 1;
- if (chosen_elevator == &iosched_deadline)
- printk("deadline elevator\n");
- else if (chosen_elevator == &iosched_as)
- printk("anticipatory scheduling elevator\n");
+ printk("Using %s elevator\n", chosen_elevator->elevator_name);
}

if ((ret = elevator_init(q, chosen_elevator))) {
diff -Nru linux-2.6.0-test1.orig/init/Kconfig linux-2.6.0-test1-with_elevator_patch/init/Kconfig
--- linux-2.6.0-test1.orig/init/Kconfig 2003-07-14 05:37:16.000000000 +0200
+++ linux-2.6.0-test1-with_elevator_patch/init/Kconfig 2003-07-26 14:25:48.000000000 +0200
@@ -141,6 +141,8 @@
Disabling this option will cause the kernel to be built without
support for epoll family of system calls.

+source "drivers/block/Kconfig.iosched"
+
endmenu # General setup


diff -Nru linux-2.6.0-test1.orig/include/linux/elevator.h linux-2.6.0-test1/include/linux/elevator.h
--- linux-2.6.0-test1.orig/include/linux/elevator.h 2003-07-14 05:29:27.000000000 +0200
+++ linux-2.6.0-test1/include/linux/elevator.h 2003-07-25 19:18:39.000000000 +0200
@@ -52,6 +52,7 @@

struct kobject kobj;
struct kobj_type *elevator_ktype;
+ const char *elevator_name;
};

/*

--
// Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/ http://www.develer.com/

Please don't send Word attachments - http://www.gnu.org/philosophy/no-word-attachments.html


2003-07-26 14:05:23

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] Make I/O schedulers optional (Was: Re: Kernel 2.6 size increase)

On Sat, Jul 26 2003, Bernardo Innocenti wrote:
> diff -Nru linux-2.6.0-test1.orig/drivers/block/as-iosched.c linux-2.6.0-test1-with_elevator_patch/drivers/block/as-iosched.c
> --- linux-2.6.0-test1.orig/drivers/block/as-iosched.c 2003-07-14 05:28:54.000000000 +0200
> +++ linux-2.6.0-test1-with_elevator_patch/drivers/block/as-iosched.c 2003-07-25 20:19:44.000000000 +0200
> @@ -1,7 +1,7 @@
> /*
> * linux/drivers/block/as-iosched.c
> *
> - * Anticipatory & deadline i/o scheduler.
> + * Anticipatory i/o scheduler.
> *
> * Copyright (C) 2002 Jens Axboe <[email protected]>
> * Nick Piggin <[email protected]>

Huh? What is that about? AS is deadline + anticipation. Good rule is not
to make comment changes when you don't know your changes to be a fact.

About making it selectable, I'm fine with it. Please send an updated
patch.

--
Jens Axboe

2003-07-26 23:28:13

by Bernardo Innocenti

[permalink] [raw]
Subject: Re: [PATCH] Make I/O schedulers optional (Was: Re: Kernel 2.6 size increase)

On Saturday 26 July 2003 16:07, Jens Axboe wrote:

> > /*
> > * linux/drivers/block/as-iosched.c
> > *
> > - * Anticipatory & deadline i/o scheduler.
> > + * Anticipatory i/o scheduler.
> > *
> > * Copyright (C) 2002 Jens Axboe <[email protected]>
> > * Nick Piggin <[email protected]>
>
> Huh? What is that about? AS is deadline + anticipation. Good rule is not
> to make comment changes when you don't know your changes to be a fact.

Oops, I thought it was done by mistake :-)

By the way, this comment in as-iosched.c refers to a missing file:

/*
* See Documentation/as-iosched.txt
*/

Another issue: to make the I/O schedulers configurable, I had to
fiddle with ll_rw_blk.c, making it slightly harder to understand.

The MTD layer uses a nice registration-based API that makes adding
maps and chips very clean and easy.

Of course, implementing a similar registration infrastructure just
to select between two scheduling policies would be overkill.

Instead, I think it would be nice if the kernel provided some
generic API for registration of components, modules, strategies,
algorithms, etc.

It could be useful in several places, from crypto to network protocols.

Perhaps it could be done with kobjs and it should provide a way to find
an item by name.
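
To make the idea concrete, here is a minimal userspace sketch of a
registry with lookup by name. It does not use kobjs and is not an
existing kernel API: every name in it (struct strategy, registry_find,
the two sample strategies) is hypothetical and only illustrates the
shape such an interface might take.

```c
#include <stddef.h>
#include <string.h>

#define REGISTRY_MAX 8

/* A registered component: a name plus one hypothetical operation. */
struct strategy {
    const char *name;
    int (*pick)(int a, int b);
};

/* Fixed-size table of registered strategies. */
struct registry {
    const struct strategy *items[REGISTRY_MAX];
    size_t count;
};

/* Return the strategy registered under `name`, or NULL if none. */
const struct strategy *registry_find(const struct registry *r,
                                     const char *name)
{
    size_t i;

    for (i = 0; i < r->count; i++)
        if (strcmp(r->items[i]->name, name) == 0)
            return r->items[i];
    return NULL;
}

/* Add a strategy; fails on a full table, a missing or duplicate name. */
int registry_register(struct registry *r, const struct strategy *s)
{
    if (s == NULL || s->name == NULL || r->count >= REGISTRY_MAX)
        return -1;
    if (registry_find(r, s->name) != NULL)
        return -1;
    r->items[r->count++] = s;
    return 0;
}

/* Two sample strategies, standing in for real scheduling policies. */
static int pick_first(int a, int b) { (void)b; return a; }
static int pick_smaller(int a, int b) { return a < b ? a : b; }

const struct strategy strategy_noop = { "noop", pick_first };
const struct strategy strategy_deadline = { "deadline", pick_smaller };
```

With something like this, a boot option such as "elevator=" would reduce
to a single registry_find() call instead of a chain of #ifdef'd strcmp()
tests.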

> About making it selectable, I'm fine with it. Please send an updated
> patch.

Here it is:

--------------------------------------------------------------------------

Add kconfig options to allow excluding either or both of the I/O schedulers.
Mostly useful for embedded systems (saves ~13KB):

With my desktop PC (i386) kernel:

text data bss dec hex filename
2210707 475856 150444 2837007 2b4a0f vmlinux_with_ioscheds
2197763 473446 150380 2821589 2b0dd5 vmlinux_without_ioscheds

With my uClinux (m68knommu) kernel:

text data bss dec hex filename
807760 47384 78884 934028 e408c linux_without_ioscheds
819276 52460 78896 950632 e8168 linux_with_ioscheds


diff -Nru linux-2.6.0-test1.orig/drivers/block/Kconfig.iosched linux-2.6.0-test1-with_elevator_patch/drivers/block/Kconfig.iosched
--- linux-2.6.0-test1.orig/drivers/block/Kconfig.iosched 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.0-test1-with_elevator_patch/drivers/block/Kconfig.iosched 2003-07-26 14:25:44.000000000 +0200
@@ -0,0 +1,8 @@
+config IOSCHED_AS
+ bool "Anticipatory I/O scheduler" if EMBEDDED
+ default y
+
+config IOSCHED_DEADLINE
+ bool "Deadline I/O scheduler" if EMBEDDED
+ default y
+
diff -Nru linux-2.6.0-test1.orig/drivers/block/Makefile linux-2.6.0-test1-with_elevator_patch/drivers/block/Makefile
--- linux-2.6.0-test1.orig/drivers/block/Makefile 2003-07-14 05:37:16.000000000 +0200
+++ linux-2.6.0-test1-with_elevator_patch/drivers/block/Makefile 2003-07-25 20:21:50.000000000 +0200
@@ -13,9 +13,10 @@
# kblockd threads
#

-obj-y := elevator.o ll_rw_blk.o ioctl.o genhd.o scsi_ioctl.o \
- deadline-iosched.o as-iosched.o
+obj-y := elevator.o ll_rw_blk.o ioctl.o genhd.o scsi_ioctl.o

+obj-$(CONFIG_IOSCHED_AS) += as-iosched.o
+obj-$(CONFIG_IOSCHED_DEADLINE) += deadline-iosched.o
obj-$(CONFIG_MAC_FLOPPY) += swim3.o
obj-$(CONFIG_BLK_DEV_FD) += floppy.o
obj-$(CONFIG_BLK_DEV_FD98) += floppy98.o
diff -Nru linux-2.6.0-test1.orig/drivers/block/as-iosched.c linux-2.6.0-test1-with_elevator_patch/drivers/block/as-iosched.c
--- linux-2.6.0-test1.orig/drivers/block/as-iosched.c 2003-07-14 05:28:54.000000000 +0200
+++ linux-2.6.0-test1-with_elevator_patch/drivers/block/as-iosched.c 2003-07-25 20:19:44.000000000 +0200
@@ -1832,6 +1832,7 @@
.elevator_exit_fn = as_exit,

.elevator_ktype = &as_ktype,
+ .elevator_name = "anticipatory scheduling",
};

EXPORT_SYMBOL(iosched_as);
diff -Nru linux-2.6.0-test1.orig/drivers/block/deadline-iosched.c linux-2.6.0-test1-with_elevator_patch/drivers/block/deadline-iosched.c
--- linux-2.6.0-test1.orig/drivers/block/deadline-iosched.c 2003-07-14 05:37:15.000000000 +0200
+++ linux-2.6.0-test1-with_elevator_patch/drivers/block/deadline-iosched.c 2003-07-25 20:20:53.000000000 +0200
@@ -941,6 +941,7 @@
.elevator_exit_fn = deadline_exit,

.elevator_ktype = &deadline_ktype,
+ .elevator_name = "deadline",
};

EXPORT_SYMBOL(iosched_deadline);
diff -Nru linux-2.6.0-test1.orig/drivers/block/elevator.c linux-2.6.0-test1-with_elevator_patch/drivers/block/elevator.c
--- linux-2.6.0-test1.orig/drivers/block/elevator.c 2003-07-14 05:36:48.000000000 +0200
+++ linux-2.6.0-test1-with_elevator_patch/drivers/block/elevator.c 2003-07-25 19:27:41.000000000 +0200
@@ -409,6 +409,7 @@
.elevator_merge_req_fn = elevator_noop_merge_requests,
.elevator_next_req_fn = elevator_noop_next_request,
.elevator_add_req_fn = elevator_noop_add_request,
+ .elevator_name = "noop",
};

module_init(elevator_global_init);
diff -Nru linux-2.6.0-test1.orig/drivers/block/ll_rw_blk.c linux-2.6.0-test1-with_elevator_patch/drivers/block/ll_rw_blk.c
--- linux-2.6.0-test1.orig/drivers/block/ll_rw_blk.c 2003-07-14 05:30:40.000000000 +0200
+++ linux-2.6.0-test1-with_elevator_patch/drivers/block/ll_rw_blk.c 2003-07-25 19:27:02.000000000 +0200
@@ -1205,17 +1205,31 @@

static int __make_request(request_queue_t *, struct bio *);

-static elevator_t *chosen_elevator = &iosched_as;
+static elevator_t *chosen_elevator =
+#if defined(CONFIG_IOSCHED_AS)
+ &iosched_as;
+#elif defined(CONFIG_IOSCHED_DEADLINE)
+ &iosched_deadline;
+#else
+ &elevator_noop;
+#endif

+#if defined(CONFIG_IOSCHED_AS) || defined(CONFIG_IOSCHED_DEADLINE)
static int __init elevator_setup(char *str)
{
+#ifdef CONFIG_IOSCHED_DEADLINE
if (!strcmp(str, "deadline"))
chosen_elevator = &iosched_deadline;
+#endif
+#ifdef CONFIG_IOSCHED_AS
if (!strcmp(str, "as"))
chosen_elevator = &iosched_as;
+#endif
return 1;
}
+
__setup("elevator=", elevator_setup);
+#endif /* CONFIG_IOSCHED_AS || CONFIG_IOSCHED_DEADLINE */

/**
* blk_init_queue - prepare a request queue for use with a block device
@@ -1255,10 +1269,7 @@

if (!printed) {
printed = 1;
- if (chosen_elevator == &iosched_deadline)
- printk("deadline elevator\n");
- else if (chosen_elevator == &iosched_as)
- printk("anticipatory scheduling elevator\n");
+ printk("Using %s elevator\n", chosen_elevator->elevator_name);
}

if ((ret = elevator_init(q, chosen_elevator))) {
diff -Nru linux-2.6.0-test1.orig/init/Kconfig linux-2.6.0-test1-with_elevator_patch/init/Kconfig
--- linux-2.6.0-test1.orig/init/Kconfig 2003-07-14 05:37:16.000000000 +0200
+++ linux-2.6.0-test1-with_elevator_patch/init/Kconfig 2003-07-26 14:25:48.000000000 +0200
@@ -141,6 +141,8 @@
Disabling this option will cause the kernel to be built without
support for epoll family of system calls.

+source "drivers/block/Kconfig.iosched"
+
endmenu # General setup


diff -Nru linux-2.6.0-test1.orig/include/linux/elevator.h linux-2.6.0-test1/include/linux/elevator.h
--- linux-2.6.0-test1.orig/include/linux/elevator.h 2003-07-14 05:29:27.000000000 +0200
+++ linux-2.6.0-test1/include/linux/elevator.h 2003-07-25 19:18:39.000000000 +0200
@@ -52,6 +52,7 @@

struct kobject kobj;
struct kobj_type *elevator_ktype;
+ const char *elevator_name;
};

/*


--
// Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/ http://www.develer.com/

Please don't send Word attachments - http://www.gnu.org/philosophy/no-word-attachments.html


2003-07-26 23:42:02

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] Make I/O schedulers optional (Was: Re: Kernel 2.6 size increase)

On Sun, Jul 27 2003, Bernardo Innocenti wrote:
> On Saturday 26 July 2003 16:07, Jens Axboe wrote:
>
> > > /*
> > > * linux/drivers/block/as-iosched.c
> > > *
> > > - * Anticipatory & deadline i/o scheduler.
> > > + * Anticipatory i/o scheduler.
> > > *
> > > * Copyright (C) 2002 Jens Axboe <[email protected]>
> > > * Nick Piggin <[email protected]>
> >
> > Huh? What is that about? AS is deadline + anticipation. Good rule is not
> > to make comment changes when you don't know your changes to be a fact.
>
> Oops, I thought it was done by mistake :-)
>
> By the way, this comment in as-iosched.c refers to a missing file:
>
> /*
> * See Documentation/as-iosched.txt
> */

Hmm odd, maybe it never got migrated from Andrew's tree.

> Another issue: to make the I/O schedulers configurable, I had to
> fiddle with ll_rw_blk.c, making it slightly harder to understand.

Don't worry, the current addon of additional schedulers is crap already,
so your patch has to deal with that. The real modular schedulers will
change this, removing the nasty bits that are there now. It doesn't
even belong in ll_rw_blk.c.

> > About making it selectable, I'm fine with it. Please send an updated
> > patch.
>
> Here it is:

Thanks

--
Jens Axboe

2003-07-28 03:05:50

by Miles Bader

[permalink] [raw]
Subject: Re: Kernel 2.6 size increase - get_current()?

Hollis Blanchard <[email protected]> writes:
> Inlines don't always help performance (depending on cache sizes, branch
> penalties, frequency of code access...), but they do always increase
> code size.

Um, inlining can often _decrease_ code size because it gives the
compiler substantial new opportunities for optimization (the function
body is no longer opaque, so the compiler has a lot more info, and any
optimizations done on the inlined body can be context-specific).
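
A small illustration of that effect, as a sketch only (the helper names
are invented, and whether total size really drops depends on the
compiler, the flags and the number of call sites):

```c
/*
 * Generic helper: convert a count of units to bytes.
 * unit 0 = bytes, 1 = KiB, 2 = MiB; anything else yields 0.
 */
static inline unsigned long to_bytes(unsigned long n, int unit)
{
    switch (unit) {
    case 0:
        return n;
    case 1:
        return n * 1024UL;
    case 2:
        return n * 1024UL * 1024UL;
    default:
        return 0;
    }
}

/*
 * Caller with a constant `unit`. Once to_bytes() is inlined here, the
 * compiler can see that only the "case 1" arm is reachable and drop
 * the rest of the switch, leaving a single multiply or shift.
 */
unsigned long kib_to_bytes(unsigned long n)
{
    return to_bytes(n, 1);
}
```

Out of line, the call site has to load both arguments and call a
function that carries the whole switch; inlined with a constant
argument, the body at that call site can end up smaller than the call
sequence it replaces.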

-Miles
--
Is it true that nothing can be known? If so how do we know this? -Woody Allen

2003-07-28 07:58:49

by Ihar 'Philips' Filipau

[permalink] [raw]
Subject: Re: Kernel 2.6 size increase - get_current()?

Miles Bader wrote:
> Hollis Blanchard <[email protected]> writes:
>
>>Inlines don't always help performance (depending on cache sizes, branch
>>penalties, frequency of code access...), but they do always increase
>>code size.
>
>
> Um, inlining can often _decrease_ code size because it gives the
> compiler substantial new opportunities for optimization (the function
> body is no longer opaque, so the compiler has a lot more info, and any
> optimizations done on the inlined body can be context-specific).
>

Starting from -O3, gcc always tries to inline.
I observed this with gcc 3.2, and I believe I saw the same with 2.95.3.

Compile this test with -O2 and -O3:
-------------------
#include <stdio.h>

int aaa(void) { return 32; }

int main(void) {
    int b = aaa();
    printf("hello %d \n", b);
    return 0;
}
------------------

Then run objdump -d to compare the two main()s: with -O2 you
will see a function call, with -O3 just the raw number 0x20.

2003-07-28 08:47:55

by Ihar 'Philips' Filipau

[permalink] [raw]
Subject: Re: Kernel 2.6 size increase - get_current()?

Miles Bader wrote:
> "Ihar \"Philips\" Filipau" <[email protected]> writes:
>
>> Starting from -O3, gcc always tries to inline.
>> I observed this with gcc 3.2, and I believe I saw the same with 2.95.3.
>>
>> Compile this test with -O2 and -O3:
>
>
> Um, what's your point?
>

FYI only.
No point at all.
I meant that the compiler, given the freedom to do so, can inline by itself.

2003-07-28 08:44:09

by Miles Bader

[permalink] [raw]
Subject: Re: Kernel 2.6 size increase - get_current()?

"Ihar \"Philips\" Filipau" <[email protected]> writes:
> Starting from -O3, gcc always tries to inline.
> I observed this with gcc 3.2, and I believe I saw the same with 2.95.3.
>
> Compile this test with -O2 and -O3:

Um, what's your point?

-Miles
--
If you can't beat them, arrange to have them beaten. [George Carlin]

2003-07-28 16:58:22

by Nicolas Pitre

[permalink] [raw]
Subject: Re: [uClinux-dev] Kernel 2.6 size increase

On Fri, 25 Jul 2003, Christoph Hellwig wrote:

> On Thu, Jul 24, 2003 at 10:27:16PM +0200, Bernardo Innocenti wrote:
> > Some of the bigger 2.6 additions cannot be configured out.
> > I wish sysfs and the different I/O schedulers could be removed.
>
> Removing the I/O schedulers is pretty trivial, please come up with a
> patch to make both of them optional and maybe add a trivial noop one.
>
> Removing sysfs should also be pretty trivial but I'm not sure whether
> you really want that.

Being able to remove the block layer entirely, just as for the networking
layer, should be considered too, since none of ramfs, tmpfs, nfs, smbfs,
jffs and jffs2 just to name those ones actually need the block layer to
operate. This is really a big pile of dead code in many embedded setups.


Nicolas

2003-07-28 23:02:59

by Bernardo Innocenti

[permalink] [raw]
Subject: Re: [uClinux-dev] Kernel 2.6 size increase

On Monday 28 July 2003 19:13, Nicolas Pitre wrote:

> > Removing the I/O schedulers is pretty trivial, please come up with a
> > patch to make both of them optional and maybe add a trivial noop one.
> >
> > Removing sysfs should also be pretty trivial but I'm not sure whether
> > you really want that.
>
> Being able to remove the block layer entirely, just as for the networking
> layer, should be considered too, since none of ramfs, tmpfs, nfs, smbfs,
> jffs and jffs2 just to name those ones actually need the block layer to
> operate. This is really a big pile of dead code in many embedded setups.

It's a great idea.

I've read in the Kconfig help that JFFS2 still depends on mtdblock even
though it doesn't use it for I/O. I think I've also seen some promise
that this dependency will eventually be removed...

--
// Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/ http://www.develer.com/

Please don't send Word attachments - http://www.gnu.org/philosophy/no-word-attachments.html


2003-07-29 02:37:16

by Miles Bader

[permalink] [raw]
Subject: Re: Kernel 2.6 size increase

Bernardo Innocenti <[email protected]> writes:
> > Being able to remove the block layer entirely, just as for the networking
> > layer, should be considered too, since none of ramfs, tmpfs, nfs, smbfs,
> > jffs and jffs2 just to name those ones actually need the block layer to
> > operate. This is really a big pile of dead code in many embedded setups.
>
> It's a great idea.

Yup.

When I've used a debugger to trace through the kernel reading a block on
a system using only romfs, it's utterly amazing how much completely
unnecessary stuff happens.

Of course it's a lot harder to find a clean way to make it optional
than it is to complain about it ... :-)

-Miles
--
I have seen the enemy, and he is us. -- Pogo

2003-07-29 22:29:26

by Tom Rini

[permalink] [raw]
Subject: Re: [uClinux-dev] Kernel 2.6 size increase

On Thu, Jul 24, 2003 at 10:27:16PM +0200, Bernardo Innocenti wrote:
> On Thursday 24 July 2003 00:27, Willy Tarreau wrote:
>
> > On Thu, Jul 24, 2003 at 12:07:15AM +0200, Bernardo Innocenti wrote:
> > > text data bss dec hex filename
> > > 633028 37952 134260 805240 c4978 linux-2.4.x/linux-Os
> > > 819276 52460 78896 950632 e8168 linux-2.5.x/vmlinux-inline-Os
> > > ^^^^^^
> > > 2.6 still needs a hard diet... :-/
> >
> > I did the same observation a few weeks ago on 2.5.74/gcc-2.95.3. I tried
> > to track down the responsible, to the point that I completely disabled
> > every driver, networking option and file-system, just to see, and got about
> > a 550 kB vmlinux compiled with -Os. 550 kB for nothing :-(
>
> Some of the bigger 2.6 additions cannot be configured out.
> I wish sysfs and the different I/O schedulers could be removed.
>
> There are probably many other things mostly useless for embedded
> systems that I'm not aware of.

Well, from Pat's talk at OLS, it seems like sysfs would be an important
part of 'sleep', which is something at least some embedded systems care
about.

... not that 2.6 doesn't need some good pruning options now, but maybe
CONFIG_EMBEDDED isn't the right place to put them all.

--
Tom Rini
http://gate.crashing.org/~trini/

2003-07-29 22:57:14

by Alan

[permalink] [raw]
Subject: Re: [uClinux-dev] Kernel 2.6 size increase

On Maw, 2003-07-29 at 23:29, Tom Rini wrote:
>
> Well, from Pat's talk at OLS, it seems like sysfs would be an important
> part of 'sleep', which is something at least some embedded systems care
> about.

sysfs is relevant for bigger systems but for small embedded stuff the whole
PM layer is fairly "so what". At that level your hardware is tightly defined
and you *know* the power management ordering. Policy becomes critical for
performance and gets done at a very fine grained level - things like waking
up the flash for a read then turning it back off on a timer for example.

2003-07-29 23:07:01

by Tom Rini

[permalink] [raw]
Subject: Re: [uClinux-dev] Kernel 2.6 size increase

On Tue, Jul 29, 2003 at 11:48:10PM +0100, Alan Cox wrote:
> On Maw, 2003-07-29 at 23:29, Tom Rini wrote:
> >
> > Well, from Pat's talk at OLS, it seems like sysfs would be an important
> > part of 'sleep', which is something at least some embedded systems care
> > about.
>
> sysfs is relevant for bigger systems but for small embedded stuff the whole
> PM layer is fairly "so what". At that level your hardware is tightly defined
> and you *know* the power management ordering. Policy becomes critical for
> performance and gets done at a very fine grained level - things like waking
> up the flash for a read then turning it back off on a timer for example.

And wouldn't it be nice to have one 'policy enforcing tool' or whatever
that you feed it policy_desktop.txt, policy_embedded_in_my_fridge.txt or
policy_enterprise.txt ?

And while you know the ordering for your one board, it's not the same as
that other board there, nor that third board just behind you, and so on.
So getting passed in some sort of tree (or more correctly DAG) and
dealing with it with some more generic code sure sounds nice.
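
As a rough sketch of what such generic code could look like, the
following derives a power-up order from a dependency graph; the
power-down order is the same list read backwards. Nothing here is an
existing kernel interface: struct pm_graph and pm_resume_order() are
hypothetical names, and a real implementation would take the graph from
the device model rather than a fixed-size matrix.

```c
#include <stddef.h>

#define MAX_DEVS 8

/* depends_on[i][j] != 0 means device i needs device j powered first. */
struct pm_graph {
    size_t ndevs;
    unsigned char depends_on[MAX_DEVS][MAX_DEVS];
};

/*
 * Fill order[] (at least ndevs entries) with a power-up order in which
 * every device comes after all of its dependencies.
 * Returns 0 on success, -1 if the graph is too large or has a cycle.
 */
int pm_resume_order(const struct pm_graph *g, size_t order[])
{
    unsigned char done[MAX_DEVS] = {0};
    size_t placed = 0;
    size_t i, j;

    if (g->ndevs > MAX_DEVS)
        return -1;

    while (placed < g->ndevs) {
        int progress = 0;

        for (i = 0; i < g->ndevs; i++) {
            if (done[i])
                continue;
            /* Look for a dependency that is not powered yet. */
            for (j = 0; j < g->ndevs; j++)
                if (g->depends_on[i][j] && !done[j])
                    break;
            if (j == g->ndevs) {
                done[i] = 1;
                order[placed++] = i;
                progress = 1;
            }
        }
        /* A full pass that places nothing means a dependency cycle. */
        if (!progress)
            return -1;
    }
    return 0;
}
```

Each board would then only describe its own graph, while the ordering
logic stays shared.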

--
Tom Rini
http://gate.crashing.org/~trini/

2003-07-30 02:08:30

by Miles Bader

[permalink] [raw]
Subject: Re: Kernel 2.6 size increase

Tom Rini <[email protected]> writes:
> And wouldn't it be nice to have one 'policy enforcing tool' or whatever
> that you feed it policy_desktop.txt, policy_embedded_in_my_fridge.txt or
> policy_enterprise.txt ?

Sure, but not nice enough to justify requiring more memory or whatever
(of course just that one feature's not going to make much difference,
but in aggregate, they might).

-Miles
--
Run away! Run away!

2003-07-30 02:49:52

by Bernardo Innocenti

[permalink] [raw]
Subject: Re: [uClinux-dev] Kernel 2.6 size increase

On Wednesday 30 July 2003 00:29, Tom Rini wrote:

> > Some of the bigger 2.6 additions cannot be configured out.
> > I wish sysfs and the different I/O schedulers could be removed.
> >
> > There are probably many other things mostly useless for embedded
> > systems that I'm not aware of.
>
> Well, from Pat's talk at OLS, it seems like sysfs would be an important
> part of 'sleep', which is something at least some embedded systems care
> about.

I tried stripping sysfs away. I just saved 7KB and got a kernel that
couldn't boot because root device translation depends on sysfs ;-)

> ... not that 2.6 doesn't need some good pruning options now, but maybe
> CONFIG_EMBEDDED isn't the right place to put them all.

In the long term the embedded menu would get cluttered with all kinds
of disparate options... I don't think I would like it.

--
// Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/ http://www.develer.com/

Please don't send Word attachments - http://www.gnu.org/philosophy/no-word-attachments.html


2003-07-30 15:33:16

by Tom Rini

Subject: Re: Kernel 2.6 size increase

On Wed, Jul 30, 2003 at 11:07:24AM +0900, Miles Bader wrote:
> Tom Rini writes:
> > And wouldn't it be nice to have one 'policy enforcing tool' or whatever
> > that you feed it policy_desktop.txt, policy_embedded_in_my_fridge.txt or
> > policy_enterprise.txt ?
>
> Sure, but not nice enough to justify requiring more memory or whatever
> (of course just that one feature's not going to make much difference,
> but in aggregate, they might).

Well, that sort of depends on which 'embedded' board you're talking
about, really, and on the trade-off between the work needed for a
hacked-up one-off, the space that could be saved by doing this, and the
space that could be saved elsewhere. Perhaps on the
embedded_in_my_fridge machine it might make sense, but not every
embedded device is quite so tiny and strapped for space.

--
Tom Rini
http://gate.crashing.org/~trini/

2003-07-30 15:37:01

by Tom Rini

Subject: Re: [uClinux-dev] Kernel 2.6 size increase

On Wed, Jul 30, 2003 at 04:49:37AM +0200, Bernardo Innocenti wrote:
> On Wednesday 30 July 2003 00:29, Tom Rini wrote:
>
> > > Some of the bigger 2.6 additions cannot be configured out.
> > > I wish sysfs and the different I/O schedulers could be removed.
> > >
> > > There are probably many other things mostly useless for embedded
> > > systems that I'm not aware of.
> >
> > Well, from Pat's talk at OLS, it seems like sysfs would be an important
> > part of 'sleep', which is something at least some embedded systems care
> > about.
>
> I tried stripping sysfs away. I just saved 7KB and got a kernel that
> couldn't boot because root device translation depends on sysfs ;-)

Now that someone has gone down the path (and, thanks for doing it), we
know how much is saved and what needs to be done to get it to work.
Let's just hope it doesn't grow that much more.

> > ... not that 2.6 doesn't need some good pruning options now, but maybe
> > CONFIG_EMBEDDED isn't the right place to put them all.
>
> In the long term the embedded menu would get cluttered with all kinds
> of disparate options... I don't think I would like it.

Certainly. I hope to have more time to get the tweaks patch I talked
about in one of the CONFIG_TINY threads a bit further along (1
dependency left to get right) and at least get some discussion going on
it, but not now.

--
Tom Rini
http://gate.crashing.org/~trini/

2003-07-30 16:45:42

by Bernardo Innocenti

Subject: Re: [uClinux-dev] Kernel 2.6 size increase (PATCH)

On Wednesday 30 July 2003 17:35, Tom Rini wrote:

> > I tried stripping sysfs away. I just saved 7KB and got a kernel that
> > couldn't boot because root device translation depends on sysfs ;-)
>
> Now that someone has gone down the path (and, thanks for doing it), we
> know how much is saved and what needs to be done to get it to work.
> Let's just hope it doesn't grow that much more.

Here's the patch, in case someone cares to try it.
Please DON'T apply as-is to shipping kernels: as I was saying,
removing sysfs like this makes the system unable to boot.

---------------------------------------------------------------------------

Make sysfs optional for embedded systems.

Applies as-is to 2.6.0-test1.

diff -Nru linux-2.6.0-test1-with_elevator_patch/init/Kconfig linux-2.6.0-test1/init/Kconfig
--- linux-2.6.0-test1-with_elevator_patch/init/Kconfig 2003-07-26 14:25:48.000000000 +0200
+++ linux-2.6.0-test1/init/Kconfig 2003-07-26 16:02:01.000000000 +0200
@@ -141,6 +141,13 @@
Disabling this option will cause the kernel to be built without
support for epoll family of system calls.

+config SYS_FS
+ bool "/sys file system support" if EMBEDDED
+ default y
+ help
+ Disabling this option will cause the kernel to be built without
+ sysfs, which is mostly needed for power management and hot-plug support.
+
source "drivers/block/Kconfig.iosched"

endmenu # General setup
diff -Nru linux-2.6.0-test1-with_elevator_patch/fs/Makefile linux-2.6.0-test1/fs/Makefile
--- linux-2.6.0-test1-with_elevator_patch/fs/Makefile 2003-07-14 05:34:42.000000000 +0200
+++ linux-2.6.0-test1/fs/Makefile 2003-07-26 01:03:59.000000000 +0200
@@ -43,7 +43,7 @@

obj-$(CONFIG_PROC_FS) += proc/
obj-y += partitions/
-obj-y += sysfs/
+obj-$(CONFIG_SYS_FS) += sysfs/
obj-y += devpts/

obj-$(CONFIG_PROFILING) += dcookies.o
@@ -74,7 +74,7 @@
obj-$(CONFIG_NLS) += nls/
obj-$(CONFIG_SYSV_FS) += sysv/
obj-$(CONFIG_SMB_FS) += smbfs/
-obj-$(CONFIG_CIFS) += cifs/
+obj-$(CONFIG_CIFS) += cifs/
obj-$(CONFIG_NCP_FS) += ncpfs/
obj-$(CONFIG_HPFS_FS) += hpfs/
obj-$(CONFIG_NTFS_FS) += ntfs/
diff -Nru linux-2.6.0-test1-with_elevator_patch/fs/namespace.c linux-2.6.0-test1/fs/namespace.c
--- linux-2.6.0-test1-with_elevator_patch/fs/namespace.c 2003-07-14 05:35:52.000000000 +0200
+++ linux-2.6.0-test1/fs/namespace.c 2003-07-26 15:39:05.000000000 +0200
@@ -1154,7 +1154,11 @@
d++;
i--;
} while (i);
+
+#ifdef CONFIG_SYS_FS
sysfs_init();
+#endif /* CONFIG_SYS_FS */
+
init_rootfs();
init_mount_tree();
}
diff -Nru linux-2.6.0-test1-with_elevator_patch/include/linux/sysfs.h linux-2.6.0-test1/include/linux/sysfs.h
--- linux-2.6.0-test1-with_elevator_patch/include/linux/sysfs.h 2003-07-14 05:32:44.000000000 +0200
+++ linux-2.6.0-test1/include/linux/sysfs.h 2003-07-26 15:28:12.000000000 +0200
@@ -25,36 +25,101 @@
ssize_t (*write)(struct kobject *, char *, loff_t, size_t);
};

-int sysfs_create_bin_file(struct kobject * kobj, struct bin_attribute * attr);
-int sysfs_remove_bin_file(struct kobject * kobj, struct bin_attribute * attr);
-
struct sysfs_ops {
ssize_t (*show)(struct kobject *, struct attribute *,char *);
ssize_t (*store)(struct kobject *,struct attribute *,const char *, size_t);
};

+#ifdef CONFIG_SYS_FS
+
extern int
-sysfs_create_dir(struct kobject *);
+sysfs_create_bin_file(struct kobject * kobj, struct bin_attribute * attr);
+
+extern int
+sysfs_remove_bin_file(struct kobject * kobj, struct bin_attribute * attr);
+
+extern int
+sysfs_create_dir(struct kobject * kobj);

extern void
-sysfs_remove_dir(struct kobject *);
+sysfs_remove_dir(struct kobject * kobj);

extern void
-sysfs_rename_dir(struct kobject *, char *new_name);
+sysfs_rename_dir(struct kobject * kobj, char *new_name);

extern int
-sysfs_create_file(struct kobject *, struct attribute *);
+sysfs_create_file(struct kobject * kobj, struct attribute * attr);

extern int
-sysfs_update_file(struct kobject *, struct attribute *);
+sysfs_update_file(struct kobject * kobj, struct attribute * attr);

extern void
-sysfs_remove_file(struct kobject *, struct attribute *);
+sysfs_remove_file(struct kobject * kobj, struct attribute * attr);

extern int
sysfs_create_link(struct kobject * kobj, struct kobject * target, char * name);

extern void
-sysfs_remove_link(struct kobject *, char * name);
+sysfs_remove_link(struct kobject * kobj, char * name);
+
+#else /* !CONFIG_SYS_FS */
+
+static inline int
+sysfs_create_bin_file(struct kobject * kobj, struct bin_attribute * attr)
+{
+ return 0;
+}
+
+static inline int
+sysfs_remove_bin_file(struct kobject * kobj, struct bin_attribute * attr)
+{
+ return 0;
+}
+
+static inline int
+sysfs_create_dir(struct kobject * kobj)
+{
+ return 0;
+}
+
+static inline void
+sysfs_remove_dir(struct kobject * kobj)
+{
+}
+
+static inline void
+sysfs_rename_dir(struct kobject * kobj, char *new_name)
+{
+}
+
+static inline int
+sysfs_create_file(struct kobject * kobj, struct attribute * attr)
+{
+ return 0;
+}
+
+static inline int
+sysfs_update_file(struct kobject * kobj, struct attribute * attr)
+{
+ return 0;
+}
+
+static inline void
+sysfs_remove_file(struct kobject * kobj, struct attribute * attr)
+{
+}
+
+static inline int
+sysfs_create_link(struct kobject * kobj, struct kobject * target, char * name)
+{
+ return 0;
+}
+
+static inline void
+sysfs_remove_link(struct kobject * kobj, char * name)
+{
+}
+
+#endif /* !CONFIG_SYS_FS */

#endif /* _SYSFS_H_ */


--
// Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/ http://www.develer.com/

Please don't send Word attachments - http://www.gnu.org/philosophy/no-word-attachments.html
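
The header change in the patch above follows a common pattern: when a feature is configured out, its entry points become empty static inline stubs, so callers need no #ifdef of their own. A minimal standalone sketch of that pattern follows; CONFIG_DEMO_FS, struct demo_obj and the demo_* functions are hypothetical names invented for illustration and are not part of the kernel.

```c
#include <stdio.h>

/* Hypothetical object, standing in for a kobject. */
struct demo_obj {
    const char *name;
    int registered;
};

#ifdef CONFIG_DEMO_FS

/* Real implementation, built only with -DCONFIG_DEMO_FS. */
static int demo_fs_create_dir(struct demo_obj *obj)
{
    obj->registered = 1;
    printf("demo_fs: created %s\n", obj->name);
    return 0;
}

static void demo_fs_remove_dir(struct demo_obj *obj)
{
    obj->registered = 0;
}

#else /* !CONFIG_DEMO_FS */

/* Stubs: report success and do nothing, so the compiler can drop the call. */
static inline int demo_fs_create_dir(struct demo_obj *obj)
{
    (void)obj;
    return 0;
}

static inline void demo_fs_remove_dir(struct demo_obj *obj)
{
    (void)obj;
}

#endif /* !CONFIG_DEMO_FS */

/* Callers are identical in both configurations. */
int demo_register(struct demo_obj *obj)
{
    return demo_fs_create_dir(obj);
}

void demo_unregister(struct demo_obj *obj)
{
    demo_fs_remove_dir(obj);
}
```

One thing to watch with this pattern: every guard has to use exactly the symbol the Kconfig entry generates ("config SYS_FS" yields CONFIG_SYS_FS). A misspelled guard is not a compile error; the preprocessor just treats it as undefined.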


2003-07-31 01:50:10

by Miles Bader

Subject: Re: Kernel 2.6 size increase

Tom Rini writes:
> > but not nice enough to justify requiring more memory or whatever (of
> > course just that one feature's not going to make much difference,
> > but in aggregate, they might).
>
> Well, that sort-of depends on which 'embedded' board you're talking
> about really.

The point was that in _some_ embedded systems, the space savings are
wanted, and so are a useful thing for Linux to support.

-Miles
--
Suburbia: where they tear out the trees and then name streets after them.

2003-07-31 04:17:46

by Tom Rini

Subject: Re: Kernel 2.6 size increase

On Thu, Jul 31, 2003 at 10:49:06AM +0900, Miles Bader wrote:
> Tom Rini writes:
> > > but not nice enough to justify requiring more memory or whatever (of
> > > course just that one feature's not going to make much difference,
> > > but in aggregate, they might).
> >
> > Well, that sort-of depends on which 'embedded' board you're talking
> > about really.
>
> The point was that in _some_ embedded systems, the space-savings is
> wanted, and so a useful thing for linux to support.

To what end? One of the things we (== PPC folks) found at OLS was that, wow,
doing PM as some sort of one-off sucks, and if at all possible we want
to get device information (and pm dependencies) passed in so we can tell
sysfs and get any shared driver done right for free, among other
reasons.

As has been pointed out, there are things like the block layer that aren't
needed if you have just a subset of common embedded-device filesystems, and
some network stuff seems to have crept back in. All I'm trying to say
is that before you go too far down the CONFIG_SYSFS route, investigate the
others first as there's a fair chance of saving even more.

--
Tom Rini
http://gate.crashing.org/~trini/

2003-07-31 05:04:24

by Miles Bader

Subject: Re: Kernel 2.6 size increase

Tom Rini writes:
> > The point was that in _some_ embedded systems, the space-savings is
> > wanted, and so a useful thing for linux to support.
>
> As has been pointed out, there are things like the block layer that aren't
> needed if you have just a subset of common embedded-device filesystems, and
> some network stuff seems to have crept back in. All I'm trying to say
> is that before you go too far down the CONFIG_SYSFS route, investigate the
> others first as there's a fair chance of saving even more.

I'm not really trying to defend this particular config option, just
saying that the attitude of `why bother trying to cut down, it's more
featureful to include everything!' is not always valid.

You may very well be right that other subsystems offer better
gain/pain, and I'm all for attacking the low-hanging-fruit first.

> To what end? One of the things we (== PPC folks) found at OLS was that, wow,
> doing PM as some sort of one-off sucks, and if at all possible we want
> to get device information (and pm dependencies) passed in so we can tell
> sysfs and get any shared driver done right for free, among other
> reasons.

[What's PM? Power Management? What does that have to do with anything?]

-Miles
--
Would you like fries with that?

2003-07-31 15:28:18

by Tom Rini

Subject: Re: Kernel 2.6 size increase

On Thu, Jul 31, 2003 at 02:03:34PM +0900, Miles Bader wrote:
> Tom Rini writes:
> > > The point was that in _some_ embedded systems, the space-savings is
> > > wanted, and so a useful thing for linux to support.
> >
> > As has been pointed out, there are things like the block layer that aren't
> > needed if you have just a subset of common embedded-device filesystems, and
> > some network stuff seems to have crept back in. All I'm trying to say
> > is that before you go too far down the CONFIG_SYSFS route, investigate the
> > others first as there's a fair chance of saving even more.
>
> I'm not really trying to defend this particular config option, just
> saying that the attitude of `why bother trying to cut down, it's more
> featureful to include everything!' is not always valid.

I hate email sometimes. My attitude is "some things you really can't
cut out". I really am all for trying to cut things out, it's just that
some things are tied in rather well (like sysfs and root device as
opposed to the static table before).

> You may very well be right that other subsystems offer better
> gain/pain, and I'm all for attacking the low-hanging-fruit first.
>
> > To what end? One of the things we (== PPC folks) found at OLS was that, wow,
> > doing PM as some sort of one-off sucks, and if at all possible we want
> > to get device information (and pm dependencies) passed in so we can tell
> > sysfs and get any shared driver done right for free, among other
> > reasons.
>
> [What's PM? Power Management? What does that have to do with anything?]

Power Management, sysfs plays / will play a role in finding out the order
in which devices get powered down. This is important on some types of
embedded devices (and arguably important everywhere).

--
Tom Rini
http://gate.crashing.org/~trini/

2003-07-31 16:30:43

by Ihar 'Philips' Filipau

Subject: Re: Kernel 2.6 size increase

Tom Rini wrote:
>
> Power Management, sysfs plays / will play a role in finding out the order
> in which devices get powered down. This is important on some types of
> embedded devices (and arguably important everywhere).
>

You are contradicting yourself.

I have participated in the creation of two specialized embedded systems,
and am currently starting on a third one.
Every system needed some specialized shutdown sequence.
None of them needed power saving.

Please do not generalize your particular system to everything else.

No one needs another self-aware, self-configurable software subsystem
which is intended to do the engineers' task, especially when this
task takes 15 minutes to code.

2003-07-31 16:43:29

by Tom Rini

Subject: Re: Kernel 2.6 size increase

On Thu, Jul 31, 2003 at 06:31:29PM +0200, Ihar Philips Filipau wrote:
> Tom Rini wrote:
> >
> >Power Management, sysfs plays / will play a role in finding out the order
> >in which devices get powered down. This is important on some types of
> >embedded devices (and arguably important everywhere).
> >
>
> You are contradicting yourself.
>
> I have participated in the creation of two specialized embedded systems,
> and am currently starting on a third one.
> Every system needed some specialized shutdown sequence.
> None of them needed power saving.

Shutdown != sleep. If you want to wake devices up again, you need to do
them in the right order.

--
Tom Rini
http://gate.crashing.org/~trini/

2003-07-31 17:04:25

by Ihar 'Philips' Filipau

Subject: Re: Kernel 2.6 size increase

Tom Rini wrote:
>
> Shutdown != sleep. If you want to wake devices up again, you need to do
> them in the right order.
>

You didn't get my point.
My appliances do not need sleep/shutdown at all.
Not every embedded system is a handheld ;-)

Shutdown was something like:
# mount / -o ro; sync; lcd-off; \
dd if=/dev/zero seek=0xBYE of=/dev/port
For a long time it was a shell script :-)))

2003-07-31 17:20:49

by Tom Rini

Subject: Re: Kernel 2.6 size increase

On Thu, Jul 31, 2003 at 07:04:49PM +0200, Ihar Philips Filipau wrote:
> Tom Rini wrote:
> >
> >Shutdown != sleep. If you want to wake devices up again, you need to do
> >them in the right order.
>
> You didn't get my point.
> My appliances do not need sleep/shutdown at all.
> Not every embedded system is a handheld ;-)

That certainly is true, yes. They might want to power things down when
the user isn't there (or maybe they don't, I don't know what you're
making :)). And I did originally say 'some'.

--
Tom Rini
http://gate.crashing.org/~trini/

2003-07-31 18:01:33

by Ihar 'Philips' Filipau

Subject: Re: Kernel 2.6 size increase

Tom Rini wrote:
>>
>> You didn't get my point.
>> My appliances do not need sleep/shutdown at all.
>> Not every embedded system is a handheld ;-)
>
>
> That certainly is true, yes. They might want to power things down when
> the user isn't there (or maybe they don't, I don't know what you're
> making :)). And I did originally say 'some'.
>

Can you imagine a teapot?
The state of your system is on/off. Power saving is done by the user
herself: power is consumed only when the user wants to boil water ;-)

The two devices I was working on were a little bit more complicated, and
I had an internal UPS to be able to handle power-offs/failures gracefully
(like switching off the LCD + saving the state of the mechanics).

As for me - I already expressed my point on the subject: we should fix
the compiler and the language (C's inline is definitely not enough to
express our intentions to the compiler). It's the compiler that should be
told whether to trade space for performance or performance for space.

And sure, the kernel should be more configurable too.
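
For reference, GCC (and Clang) already offer per-function hints that go somewhat beyond plain "inline". The sketch below is only an illustration of those attributes, not something proposed in this thread; the function names are hypothetical. always_inline forces expansion of a tiny hot helper, while noinline keeps a rarely used error path as a single out-of-line copy so it is not duplicated at every call site.

```c
#include <stddef.h>

/* Hot, tiny helper: ask the compiler to always expand it at the call site. */
static inline __attribute__((always_inline))
unsigned int sum_step(unsigned int acc, unsigned char byte)
{
    return acc + byte;
}

/* Rare error path: keep one out-of-line copy to save space. */
static __attribute__((noinline))
int report_error(int code)
{
    return -code;
}

/* Sum the bytes of 'buf' into '*out'; returns 0, or a negative error code. */
int checksum(const unsigned char *buf, size_t len, unsigned int *out)
{
    unsigned int acc = 0;

    if (buf == NULL || out == NULL)
        return report_error(22);   /* 22 is an arbitrary example code */

    for (size_t i = 0; i < len; i++)
        acc = sum_step(acc, buf[i]);

    *out = acc;
    return 0;
}
```

Whole-file trade-offs are chosen with optimization flags such as -Os (size) or -O2 (speed); the attributes only refine that choice for individual functions.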


2003-08-08 13:25:14

by David Woodhouse

Subject: Re: [uClinux-dev] Kernel 2.6 size increase

On Tue, 2003-07-29 at 00:02, Bernardo Innocenti wrote:
> I've read in the Kconfig help that JFFS2 still depends on mtdblock even
> though it doesn't use it for I/O. I think I've also seen some promise
> that this dependency will eventually be removed...

It's already been removed from everything but the Kconfig file... :)

--
dwmw2

2003-08-08 14:37:55

by Bernardo Innocenti

Subject: Re: [uClinux-dev] Kernel 2.6 size increase

David Woodhouse wrote:
> On Tue, 2003-07-29 at 00:02, Bernardo Innocenti wrote:
>
>>I've read in the Kconfig help that JFFS2 still depends on mtdblock even
>>though it doesn't use it for I/O. I think I've also seen some promise
>>that this dependency will eventually be removed...
>
> It's already been removed from everything but the Kconfig file... :)

So, let's try to remove the block layer from the kernel.
Do you reckon it would be difficult to do?

--
// Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/ http://www.develer.com/

Please don't send Word attachments - http://www.gnu.org/philosophy/no-word-attachments.html



2003-08-08 14:43:14

by David Woodhouse

Subject: Re: [uClinux-dev] Kernel 2.6 size increase

On Fri, 2003-08-08 at 15:37, Bernardo Innocenti wrote:
> So, let's try to remove the block layer from the kernel.
> Do you reckon it would be difficult to do?

Depends how thorough you want to be. I wanted to go the whole way and
actually remove the definition of struct buffer_head, then remove all
the code which would no longer compile at all..

You could be slightly less enthusiastic about it and make life easier
for yourself, I suppose.

--
dwmw2