2018-08-23 17:45:59

by Paul E. McKenney

Subject: Kernel-only deployments?

Hello!

Does anyone do kernel-only deployments, for example, setting up an
embedded device having a Linux kernel and absolutely no userspace
whatsoever?

The reason I ask is that such a mode would be mildly useful for rcutorture.

You see, rcutorture runs entirely out of initrd, never mounting a real
root partition. The user has been required to supply the initrd, but
more people are starting to use rcutorture. This has led to confusion
and complaints about the need to supply the initrd. So I am finally
getting my rcutorture initrd act together, with significant dracut help
from Connor Shu. I added mkinitramfs support for environments such as
mine that don't support dracut, at least not without significant slashing
and burning.

The mkinitramfs approach results in about 40MB of initrd, and dracut
about 10MB. Most of this is completely useless for rcutorture, which
isn't interested in mounting filesystems, opening devices, and almost
all of the other interesting things that mkinitramfs and dracut enable.

Those who know me will not be at all surprised to learn that I went
overboard making the resulting initrd as small as possible. I started
by throwing out everything not absolutely needed by the dash and sleep
binaries, which got me down to about 2.5MB, 1.8MB of which was libc.
This situation of course prompted me to create an initrd containing
a statically linked binary named "init" and absolutely nothing else
(not even /dev or /tmp directories), which weighs in at not quite 800KB.
This is a great improvement over 10MB, to say nothing of 40MB, but 800KB
for a C-language "for" loop containing nothing more than a single call to
sleep()? Much of the code is there for things that I might do (dlopen(),
for example), but don't. All I can say is that there clearly aren't many
of us left who made heavy use of systems with naked-eye-visible bits!
(Or naked-finger-feelable, for that matter.)
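
For readers unfamiliar with such a program, a minimal sketch of a sleep-only init in C might look like the following. The helper name idle_loop() and its "rounds" parameter are assumptions made for illustration (a real init would simply loop forever); nothing here is taken from the actual rcutorture scripts.

```c
#include <unistd.h>

/*
 * Hypothetical helper: sleep in a loop. A negative `rounds` means
 * "forever", which is what a real init would want; a bounded count
 * exists only so the loop can be exercised outside of PID 1.
 * Returns the number of completed sleep() calls (0 when unbounded,
 * since that case never returns).
 */
long idle_loop(long rounds, unsigned int seconds)
{
	long done = 0;

	while (rounds < 0 || done < rounds) {
		sleep(seconds);	/* may return early if a signal arrives */
		if (rounds >= 0)
			done++;
	}
	return done;
}
```

An init built around this would call idle_loop(-1, 1000) from main() and be linked with gcc's -static option so that the initrd needs no shared libraries. The size of the result then depends almost entirely on the C library chosen.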

This further prompted the idea of modifying kernel_init() to just loop
forever, perhaps not even reaping orphaned zombies [*], given an appropriate
Kconfig option and/or kernel boot parameter. I obviously cannot justify
this to save a sub-one-megabyte initrd for rcutorture, no matter how much
a wasted 800K might have offended my 30-years-ago self. If I take this
next step, there have to be quite a few others benefiting significantly
from it.

So, does anyone in the deep embedded space already do this?

Thanx, Paul

[*] What zombies??? There is no userspace!!!



2018-08-23 18:18:29

by Geert Uytterhoeven

Subject: Re: Kernel-only deployments?

Hi Paul,

On Thu, Aug 23, 2018 at 7:44 PM Paul E. McKenney
<[email protected]> wrote:
> Does anyone do kernel-only deployments, for example, setting up an
> embedded device having a Linux kernel and absolutely no userspace
> whatsoever?

Isn't that basically the original porting guide from VxWorks to Linux?

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2018-08-23 18:44:32

by Nicolas Pitre

Subject: Re: Kernel-only deployments?

On Thu, 23 Aug 2018, Paul E. McKenney wrote:

> Hello!
>
> Does anyone do kernel-only deployments, for example, setting up an
> embedded device having a Linux kernel and absolutely no userspace
> whatsoever?

Not that I know of. For one thing, you'd lose the ability to license
your application code the way you want.

> The reason I ask is that such a mode would be mildly useful for rcutorture.
>
> You see, rcutorture runs entirely out of initrd, never mounting a real
> root partition. The user has been required to supply the initrd, but
> more people are starting to use rcutorture. This has led to confusion
> and complaints about the need to supply the initrd. So I am finally
> getting my rcutorture initrd act together, with significant dracut help
> from Connor Shu. I added mkinitramfs support for environments such as
> mine that don't support dracut, at least not without significant slashing
> and burning.
>
> The mkinitramfs approach results in about 40MB of initrd, and dracut
> about 10MB. Most of this is completely useless for rcutorture, which
> isn't interested in mounting filesystems, opening devices, and almost
> all of the other interesting things that mkinitramfs and dracut enable.

No surprise there.

> Those who know me will not be at all surprised to learn that I went
> overboard making the resulting initrd as small as possible. I started
> by throwing out everything not absolutely needed by the dash and sleep
> binaries, which got me down to about 2.5MB, 1.8MB of which was libc.

That is possibly still very big. You could probably get away with a
statically linked busybox containing only the shell facilities you
require for 100K or so.

> This situation of course prompted me to create an initrd containing
> a statically linked binary named "init" and absolutely nothing else
> (not even /dev or /tmp directories), which weighs in at not quite 800KB.

This still looks big for a custom binary, unless you do have a lot of
code in there. It is already possible to have a kernel binary about that
size, and even if that's a configured down kernel, quite some complex
code remains.

The bloat might come from the C library you use. It's been a while since
glibc stopped caring about not pulling a lot of unneeded code when all
you want to do is printf(). It carries all those locale dependencies,
etc. You should look at alternative C libs to get things small.

> This is a great improvement over 10MB, to say nothing of 40MB, but 800KB
> for a C-language "for" loop containing nothing more than a single call to
> sleep()? Much of the code is there for things that I might do (dlopen(),
> for example), but don't. All I can say is that there clearly aren't many
> of us left who made heavy use of systems with naked-eye-visible bits!
> (Or naked-finger-feelable, for that matter.)

:-)

> This further prompted the idea of modifying kernel_init() to just loop
> forever, perhaps not even reaping orphaned zombies [*], given an appropriate
> Kconfig option and/or kernel boot parameter. I obviously cannot justify
> this to save a sub-one-megabyte initrd for rcutorture, no matter how much
> a wasted 800K might have offended my 30-years-ago self. If I take this
> next step, there have to be quite a few others benefiting significantly
> from it.

You could easily do it from your init binary with less trouble than
having the kernel carry such an option.
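
As a rough illustration of handling this in the init binary, the zombie-reaping part can be sketched in C as below. The function name reap_all_children() is hypothetical; the only library call relied upon is the standard waitpid().

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: block until every current child has exited,
 * reaping each one. Returns the number of children reaped.
 * waitpid(-1, ...) waits for any child and fails with ECHILD once
 * no children remain.
 */
int reap_all_children(void)
{
	int reaped = 0;

	for (;;) {
		pid_t pid = waitpid(-1, NULL, 0);

		if (pid > 0) {
			reaped++;
			continue;
		}
		if (pid < 0 && errno == EINTR)
			continue;	/* interrupted by a signal: retry */
		break;			/* ECHILD: nothing left to reap */
	}
	return reaped;
}
```

Because waitpid() returns immediately with ECHILD when there are no children, an init using this would sleep between calls rather than spin. An init that never starts any other process has nothing to reap and can skip this entirely.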

> So, does anyone in the deep embedded space already do this?

Not that I know of. Normally, if the init process dies, you typically
want the whole system to reboot (you may force a reboot upon any kernel
panic for example).


Nicolas

2018-08-23 18:44:59

by Paul E. McKenney

Subject: Re: Kernel-only deployments?

On Thu, Aug 23, 2018 at 08:16:55PM +0200, Geert Uytterhoeven wrote:
> Hi Paul,
>
> On Thu, Aug 23, 2018 at 7:44 PM Paul E. McKenney
> <[email protected]> wrote:
> > Does anyone do kernel-only deployments, for example, setting up an
> > embedded device having a Linux kernel and absolutely no userspace
> > whatsoever?
>
> Isn't that basically the original porting guide from VxWorks to Linux?

I haven't seen that document, but if you say so. The TimeSys appnote
suggests using Linux userspace, but I can easily imagine cases where
porting the VxWorks "application" into the Linux kernel would make sense.

But do you really believe that supporting Linux-kernel-only deployments
in mainline would be something worth doing? Who aside from rcutorture
would really use such a thing?

Thanx, Paul

> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
> -- Linus Torvalds
>


2018-08-23 18:56:14

by Adam Borowski

Subject: Re: Kernel-only deployments?

On Thu, Aug 23, 2018 at 10:43:59AM -0700, Paul E. McKenney wrote:
> The mkinitramfs approach results in about 40MB of initrd, and dracut
> about 10MB. Most of this is completely useless for rcutorture, which
> isn't interested in mounting filesystems, opening devices, and almost
> all of the other interesting things that mkinitramfs and dracut enable.
>
> Those who know me will not be at all surprised to learn that I went
> overboard making the resulting initrd as small as possible. I started
> by throwing out everything not absolutely needed by the dash and sleep
> binaries, which got me down to about 2.5MB, 1.8MB of which was libc.
> This situation of course prompted me to create an initrd containing
> a statically linked binary named "init" and absolutely nothing else
> (not even /dev or /tmp directories), which weighs in at not quite 800KB.
> This is a great improvement over 10MB, to say nothing of 40MB, but 800KB
> for a C-language "for" loop containing nothing more than a single call to
> sleep()?

.globl _start
.data
req: .8byte 999999999, 999999999
.text
_start:
mov $35, %rax # syscall: nanosleep
mov $req, %rdi
xor %rsi, %rsi
syscall
jmp _start


as sl.s -o sl.o
ld sl.o -o init

'Ere you go, no libc needed. If your arch is not amd64, just say so.

If you want to do anything more complex, though -- you really want musl
or another lightweight libc instead. Glibc is utterly unfit for static
linking.


Meow!
--
⢀⣴⠾⠻⢶⣦⠀ .globl _start↵.data↵rc: .ascii "/etc/init.d/rcS\0"↵.text↵_start
⣾⠁⢰⠒⠀⣿⡁ mov $57,%rax↵syscall↵cmp $0,%rax↵jne child↵parent:↵mov $61,%rax
⢿⡄⠘⠷⠚⠋⠀ mov $-1,%rdi↵xor %rsi,%rsi↵xor %rdx,%rdx↵syscall↵jmp parent↵child:
⠈⠳⣄⠀⠀⠀⠀ mov $59,%rax↵mov $rc,%rdi↵xor %rsi,%rsi↵xor %rdx,%rdx↵syscall

2018-08-23 19:09:32

by Willy Tarreau

Subject: Re: Kernel-only deployments?

On Thu, Aug 23, 2018 at 08:54:17PM +0200, Adam Borowski wrote:
> .globl _start
> .data
> req: .8byte 999999999, 999999999
> .text
> _start:
> mov $35, %rax # syscall: nanosleep
> mov $req, %rdi
> xor %rsi, %rsi
> syscall
> jmp _start
>
>
> as sl.s -o sl.o
> ld sl.o -o init
>
> 'Ere you go, no libc needed. If your arch is not amd64, just say so.
>
> If you want to do anything more complex, though -- you really want musl
> or another lightweight libc instead. Glibc is utterly unfit for static
> linking.

Since there seems to be some interest in this, I'll repost this
here. I've developed a "nolibc" include file which implements the most
common syscalls and string functions (those I use in early boot)
as static inlines, so the resulting executable contains only the
code you really use:

http://git.formilux.org/?p=people/willy/nolibc.git;a=tree

Example:

$ echo "int main() { return sleep(3);}" | gcc -Os -nostdlib -include ../nolibc/nolibc.h -s -fno-exceptions -fno-asynchronous-unwind-tables -fno-unwind-tables -lgcc -o sleep -xc -
$ ls -l sleep
-rwxr-xr-x 1 willy users 664 Aug 23 20:37 sleep

It's actually used by my pre-init loader, which is embedded into the
initramfs of all my kernels to untar the modules and switch to the
initrd or rootfs. This way all my modules are contained in the
kernel image and I can easily use many different kernels with a rootfs
without having to install modules.

Just in case someone is curious and wants to know more about it, the
(old and horrible) preinit is here:

http://git.formilux.org/?p=dist/src/flxutils.git;a=tree;f=init;h=9dc8fbae6383d9b4d56d34cc6c3d59585318bef8;hb=HEAD

And the (old and ugly) build script is here:

http://git.formilux.org/?p=dist/techno.git;a=tree;f=scripts/kernel;hb=HEAD

Yes it's aging a lot now but it's still very convenient ;-)

Willy

2018-08-23 19:17:39

by Josh Triplett

Subject: Re: Kernel-only deployments?

On Thu, Aug 23, 2018 at 08:54:17PM +0200, Adam Borowski wrote:
> On Thu, Aug 23, 2018 at 10:43:59AM -0700, Paul E. McKenney wrote:
> > The mkinitramfs approach results in about 40MB of initrd, and dracut
> > about 10MB. Most of this is completely useless for rcutorture, which
> > isn't interested in mounting filesystems, opening devices, and almost
> > all of the other interesting things that mkinitramfs and dracut enable.
> >
> > Those who know me will not be at all surprised to learn that I went
> > overboard making the resulting initrd as small as possible. I started
> > by throwing out everything not absolutely needed by the dash and sleep
> > binaries, which got me down to about 2.5MB, 1.8MB of which was libc.
> > This situation of course prompted me to create an initrd containing
> > a statically linked binary named "init" and absolutely nothing else
> > (not even /dev or /tmp directories), which weighs in at not quite 800KB.
> > This is a great improvement over 10MB, to say nothing of 40MB, but 800KB
> > for a C-language "for" loop containing nothing more than a single call to
> > sleep()?
>
> .globl _start
> .data
> req: .8byte 999999999, 999999999
> .text
> _start:
> mov $35, %rax # syscall: nanosleep
> mov $req, %rdi
> xor %rsi, %rsi
> syscall
> jmp _start
>
>
> as sl.s -o sl.o
> ld sl.o -o init
>
> 'Ere you go, no libc needed. If your arch is not amd64, just say so.

"pause" ($34) would also suffice, and would not require an argument or a
.data section.
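
For comparison with the assembly, the same idea expressed through the C library's pause() wrapper might look like the sketch below. The helper name pause_until_alarm() and the SIGALRM-based wakeup are assumptions added only so that the behavior can be observed; an init would more likely just call pause() in an endless loop.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal;

static void on_signal(int signo)
{
	(void)signo;
	got_signal = 1;
}

/*
 * Hypothetical helper: arm SIGALRM for `seconds` (must be nonzero),
 * then block in pause(). pause() takes no arguments and returns only
 * after a signal handler has run, always with -1 and errno == EINTR,
 * which is why an init would wrap it in a loop.
 * Returns 1 if pause() behaved that way, 0 otherwise.
 */
int pause_until_alarm(unsigned int seconds)
{
	struct sigaction sa;
	int ret;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_signal;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, NULL) != 0)
		return 0;

	got_signal = 0;
	alarm(seconds);
	ret = pause();
	return ret == -1 && errno == EINTR && got_signal;
}
```

Going through the C library keeps the source portable, since not every architecture provides a dedicated pause system call, at the cost of linking that library in.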

- Josh Triplett

2018-08-23 19:23:49

by Josh Triplett

Subject: Re: Kernel-only deployments?

On Thu, Aug 23, 2018 at 10:43:59AM -0700, Paul E. McKenney wrote:
> Hello!
>
> Does anyone do kernel-only deployments, for example, setting up an
> embedded device having a Linux kernel and absolutely no userspace
> whatsoever?

I would very much *like* to do this. One day I'd like to have a
CONFIG_USERSPACE that I can disable, and then just have the kernel call
an in-kernel main() where it would normally start init.

> Those who know me will not be at all surprised to learn that I went
> overboard making the resulting initrd as small as possible. I started
> by throwing out everything not absolutely needed by the dash and sleep
> binaries, which got me down to about 2.5MB, 1.8MB of which was libc.
> This situation of course prompted me to create an initrd containing
> a statically linked binary named "init" and absolutely nothing else
> (not even /dev or /tmp directories), which weighs in at not quite 800KB.
> This is a great improvement over 10MB, to say nothing of 40MB, but 800KB
> for a C-language "for" loop containing nothing more than a single call to
> sleep()? Much of the code is there for things that I might do (dlopen(),
> for example), but don't. All I can say is that there clearly aren't many
> of us left who made heavy use of systems with naked-eye-visible bits!
> (Or naked-finger-feelable, for that matter.)

I have definitely built initramfs images containing nothing but a single
statically linked /init before.

If you want to make it even smaller, you could avoid linking in libc at
all, and just write a short assembly stub, but I don't know any way to
do that *portably* without writing raw assembly for each target
platform. That would get you down to a few kB though.

> This further prompted the idea of modifying kernel_init() to just loop
> forever, perhaps not even reaping orphaned zombies [*], given an appropriate
> Kconfig option and/or kernel boot parameter. I obviously cannot justify
> this to save a sub-one-megabyte initrd for rcutorture, no matter how much
> a wasted 800K might have offended my 30-years-ago self. If I take this
> next step, there have to be quite a few others benefiting significantly
> from it.

I would *love* to have support for omitting userspace entirely. And once
we have that, we can start ripping out so many other things...

One thought, though: that won't necessarily give you a representative
rcutorture experience, given that you need to test things like the
nohz-on-non-idle support, which interacts with "am I in userspace".

2018-08-23 19:24:36

by Ray Clinton

Subject: Re: Kernel-only deployments?

On Thu, Aug 23, 2018 at 7:44 PM Paul E. McKenney
<[email protected]> wrote:
> Does anyone do kernel-only deployments, for example, setting up an
> embedded device having a Linux kernel and absolutely no userspace
> whatsoever?

To be honest I'm a total newb to kernel dev, so much so that I copied and
pasted the above quote in the hopes that I did the formatting right. I'm such
a newb that I realize I might not even understand your question.

That being said, wouldn't building a uImage of the kernel and loading it onto
your device using tftpboot accomplish this?

Ray

On Thu, Aug 23, 2018 at 1:46 PM Paul E. McKenney
<[email protected]> wrote:
>
> Hello!
>
> Does anyone do kernel-only deployments, for example, setting up an
> embedded device having a Linux kernel and absolutely no userspace
> whatsoever?
>
> The reason I ask is that such a mode would be mildly useful for rcutorture.
>
> You see, rcutorture runs entirely out of initrd, never mounting a real
> root partition. The user has been required to supply the initrd, but
> more people are starting to use rcutorture. This has led to confusion
> and complaints about the need to supply the initrd. So I am finally
> getting my rcutorture initrd act together, with significant dracut help
> from Connor Shu. I added mkinitramfs support for environments such as
> mine that don't support dracut, at least not without significant slashing
> and burning.
>
> The mkinitramfs approach results in about 40MB of initrd, and dracut
> about 10MB. Most of this is completely useless for rcutorture, which
> isn't interested in mounting filesystems, opening devices, and almost
> all of the other interesting things that mkinitramfs and dracut enable.
>
> Those who know me will not be at all surprised to learn that I went
> overboard making the resulting initrd as small as possible. I started
> by throwing out everything not absolutely needed by the dash and sleep
> binaries, which got me down to about 2.5MB, 1.8MB of which was libc.
> This situation of course prompted me to create an initrd containing
> a statically linked binary named "init" and absolutely nothing else
> (not even /dev or /tmp directories), which weighs in at not quite 800KB.
> This is a great improvement over 10MB, to say nothing of 40MB, but 800KB
> for a C-language "for" loop containing nothing more than a single call to
> sleep()? Much of the code is there for things that I might do (dlopen(),
> for example), but don't. All I can say is that there clearly aren't many
> of us left who made heavy use of systems with naked-eye-visible bits!
> (Or naked-finger-feelable, for that matter.)
>
> This further prompted the idea of modifying kernel_init() to just loop
> forever, perhaps not even reaping orphaned zombies [*], given an appropriate
> Kconfig option and/or kernel boot parameter. I obviously cannot justify
> this to save a sub-one-megabyte initrd for rcutorture, no matter how much
> a wasted 800K might have offended my 30-years-ago self. If I take this
> next step, there have to be quite a few others benefiting significantly
> from it.
>
> So, does anyone in the deep embedded space already do this?
>
> Thanx, Paul
>
> [*] What zombies??? There is no userspace!!!
>

2018-08-23 20:40:21

by Paul E. McKenney

Subject: Re: Kernel-only deployments?

On Thu, Aug 23, 2018 at 02:42:45PM -0400, Nicolas Pitre wrote:
> On Thu, 23 Aug 2018, Paul E. McKenney wrote:
>
> > Hello!
> >
> > Does anyone do kernel-only deployments, for example, setting up an
> > embedded device having a Linux kernel and absolutely no userspace
> > whatsoever?
>
> Not that I know of. For one thing, you'd lose the ability to license
> your application code the way you want.

Good point! I could see where that might reduce the number of potential
users below the point of usefulness.

> > The reason I ask is that such a mode would be mildly useful for rcutorture.
> >
> > You see, rcutorture runs entirely out of initrd, never mounting a real
> > root partition. The user has been required to supply the initrd, but
> > more people are starting to use rcutorture. This has led to confusion
> > and complaints about the need to supply the initrd. So I am finally
> > getting my rcutorture initrd act together, with significant dracut help
> > from Connor Shu. I added mkinitramfs support for environments such as
> > mine that don't support dracut, at least not without significant slashing
> > and burning.
> >
> > The mkinitramfs approach results in about 40MB of initrd, and dracut
> > about 10MB. Most of this is completely useless for rcutorture, which
> > isn't interested in mounting filesystems, opening devices, and almost
> > all of the other interesting things that mkinitramfs and dracut enable.
>
> No surprise there.

;-)

> > Those who know me will not be at all surprised to learn that I went
> > overboard making the resulting initrd as small as possible. I started
> > by throwing out everything not absolutely needed by the dash and sleep
> > binaries, which got me down to about 2.5MB, 1.8MB of which was libc.
>
> That is possibly still very big. You could probably get away with a
> statically linked busybox containing only the shell facilities you
> require for 100K or so.

That does sound considerably more reasonable.

> > This situation of course prompted me to create an initrd containing
> > a statically linked binary named "init" and absolutely nothing else
> > (not even /dev or /tmp directories), which weighs in at not quite 800KB.
>
> This still looks big for a custom binary, unless you do have a lot of
> code in there. It is already possible to have a kernel binary about that
> size, and even if that's a configured down kernel, quite some complex
> code remains.
>
> The bloat might come from the C library you use. It's been a while since
> glibc stopped caring about not pulling a lot of unneeded code when all
> you want to do is printf(). It carries all those locale dependencies,
> etc. You should look at alternative C libs to get things small.

Yes, I really was stupid enough to be using glibc. Sounds like I have
an easy change to reduce the size further, then. ;-)

> > This is a great improvement over 10MB, to say nothing of 40MB, but 800KB
> > for a C-language "for" loop containing nothing more than a single call to
> > sleep()? Much of the code is there for things that I might do (dlopen(),
> > for example), but don't. All I can say is that there clearly aren't many
> > of us left who made heavy use of systems with naked-eye-visible bits!
> > (Or naked-finger-feelable, for that matter.)
>
> :-)
>
> > This further prompted the idea of modifying kernel_init() to just loop
> > forever, perhaps not even reaping orphaned zombies [*], given an appropriate
> > Kconfig option and/or kernel boot parameter. I obviously cannot justify
> > this to save a sub-one-megabyte initrd for rcutorture, no matter how much
> > a wasted 800K might have offended my 30-years-ago self. If I take this
> > next step, there have to be quite a few others benefiting significantly
> > from it.
>
> You could easily do it from your init binary with less trouble than
> having the kernel carry such an option.

Got it, thank you!

> > So, does anyone in the deep embedded space already do this?
>
> Not that I know of. Normally, if the init process dies, you typically
> want the whole system to reboot (you may force a reboot upon any kernel
> panic for example).

Indeed, your licensing point earlier explains quite a bit.

Thank you again!

Thanx, Paul


2018-08-23 20:40:34

by Paul E. McKenney

Subject: Re: Kernel-only deployments?

On Thu, Aug 23, 2018 at 08:54:17PM +0200, Adam Borowski wrote:
> On Thu, Aug 23, 2018 at 10:43:59AM -0700, Paul E. McKenney wrote:
> > The mkinitramfs approach results in about 40MB of initrd, and dracut
> > about 10MB. Most of this is completely useless for rcutorture, which
> > isn't interested in mounting filesystems, opening devices, and almost
> > all of the other interesting things that mkinitramfs and dracut enable.
> >
> > Those who know me will not be at all surprised to learn that I went
> > overboard making the resulting initrd as small as possible. I started
> > by throwing out everything not absolutely needed by the dash and sleep
> > binaries, which got me down to about 2.5MB, 1.8MB of which was libc.
> > This situation of course prompted me to create an initrd containing
> > a statically linked binary named "init" and absolutely nothing else
> > (not even /dev or /tmp directories), which weighs in at not quite 800KB.
> > This is a great improvement over 10MB, to say nothing of 40MB, but 800KB
> > for a C-language "for" loop containing nothing more than a single call to
> > sleep()?
>
> .globl _start
> .data
> req: .8byte 999999999, 999999999
> .text
> _start:
> mov $35, %rax # syscall: nanosleep
> mov $req, %rdi
> xor %rsi, %rsi
> syscall
> jmp _start
>
>
> as sl.s -o sl.o
> ld sl.o -o init
>
> 'Ere you go, no libc needed. If your arch is not amd64, just say so.

I need to be arch-independent, but I will save off your solution,
thank you!

> If you want to do anything more complex, though -- you really want musl
> or another lightweight libc instead. Glibc is utterly unfit for static
> linking.

Got it, thank you!

Thanx, Paul

> Meow!
> --
> ⢀⣴⠾⠻⢶⣦⠀ .globl _start↵.data↵rc: .ascii "/etc/init.d/rcS\0"↵.text↵_start
> ⣾⠁⢰⠒⠀⣿⡁ mov $57,%rax↵syscall↵cmp $0,%rax↵jne child↵parent:↵mov $61,%rax
> ⢿⡄⠘⠷⠚⠋⠀ mov $-1,%rdi↵xor %rsi,%rsi↵xor %rdx,%rdx↵syscall↵jmp parent↵child:
> ⠈⠳⣄⠀⠀⠀⠀ mov $59,%rax↵mov $rc,%rdi↵xor %rsi,%rsi↵xor %rdx,%rdx↵syscall
>


2018-08-23 20:41:59

by Paul E. McKenney

Subject: Re: Kernel-only deployments?

On Thu, Aug 23, 2018 at 12:16:04PM -0700, Josh Triplett wrote:
> On Thu, Aug 23, 2018 at 08:54:17PM +0200, Adam Borowski wrote:
> > On Thu, Aug 23, 2018 at 10:43:59AM -0700, Paul E. McKenney wrote:
> > > The mkinitramfs approach results in about 40MB of initrd, and dracut
> > > about 10MB. Most of this is completely useless for rcutorture, which
> > > isn't interested in mounting filesystems, opening devices, and almost
> > > all of the other interesting things that mkinitramfs and dracut enable.
> > >
> > > Those who know me will not be at all surprised to learn that I went
> > > overboard making the resulting initrd as small as possible. I started
> > > by throwing out everything not absolutely needed by the dash and sleep
> > > binaries, which got me down to about 2.5MB, 1.8MB of which was libc.
> > > This situation of course prompted me to create an initrd containing
> > > a statically linked binary named "init" and absolutely nothing else
> > > (not even /dev or /tmp directories), which weighs in at not quite 800KB.
> > > This is a great improvement over 10MB, to say nothing of 40MB, but 800KB
> > > for a C-language "for" loop containing nothing more than a single call to
> > > sleep()?
> >
> > .globl _start
> > .data
> > req: .8byte 999999999, 999999999
> > .text
> > _start:
> > mov $35, %rax # syscall: nanosleep
> > mov $req, %rdi
> > xor %rsi, %rsi
> > syscall
> > jmp _start
> >
> >
> > as sl.s -o sl.o
> > ld sl.o -o init
> >
> > 'Ere you go, no libc needed. If your arch is not amd64, just say so.
>
> "pause" ($34) would also suffice, and would not require an argument or a
> .data section.

Cute! ;-)

Thanx, Paul


2018-08-23 20:46:34

by Paul E. McKenney

Subject: Re: Kernel-only deployments?

On Thu, Aug 23, 2018 at 12:12:35PM -0700, Josh Triplett wrote:
> On Thu, Aug 23, 2018 at 10:43:59AM -0700, Paul E. McKenney wrote:
> > Hello!
> >
> > Does anyone do kernel-only deployments, for example, setting up an
> > embedded device having a Linux kernel and absolutely no userspace
> > whatsoever?
>
> I would very much *like* to do this. One day I'd like to have a
> CONFIG_USERSPACE that I can disable, and then just have the kernel call
> an in-kernel main() where it would normally start init.

This looks to be an easy change, though it might not seem so easy
after starting to try it out. ;-)

> > Those who know me will not be at all surprised to learn that I went
> > overboard making the resulting initrd as small as possible. I started
> > by throwing out everything not absolutely needed by the dash and sleep
> > binaries, which got me down to about 2.5MB, 1.8MB of which was libc.
> > This situation of course prompted me to create an initrd containing
> > a statically linked binary named "init" and absolutely nothing else
> > (not even /dev or /tmp directories), which weighs in at not quite 800KB.
> > This is a great improvement over 10MB, to say nothing of 40MB, but 800KB
> > for a C-language "for" loop containing nothing more than a single call to
> > sleep()? Much of the code is there for things that I might do (dlopen(),
> > for example), but don't. All I can say is that there clearly aren't many
> > of us left who made heavy use of systems with naked-eye-visible bits!
> > (Or naked-finger-feelable, for that matter.)
>
> I have definitely built initramfs images containing nothing but a single
> statically linked /init before.

Cool!

> If you want to make it even smaller, you could avoid linking in libc at
> all, and just write a short assembly stub, but I don't know any way to
> do that *portably* without writing raw assembly for each target
> platform. That would get you down to a few kB though.

I do need portability. And even 800K isn't -that- big a deal, much
though my earlier self would disbelieve this.

> > This further prompted the idea of modifying kernel_init() to just loop
> > forever, perhaps not even reaping orphaned zombies [*], given an appropriate
> > Kconfig option and/or kernel boot parameter. I obviously cannot justify
> > this to save a sub-one-megabyte initrd for rcutorture, no matter how much
> > a wasted 800K might have offended my 30-years-ago self. If I take this
> > next step, there have to be quite a few others benefiting significantly
> > from it.
>
> I would *love* to have support for omitting userspace entirely. And once
> we have that, we can start ripping out so many other things...

;-)

> One thought, though: that won't necessarily give you a representative
> rcutorture experience, given that you need to test things like the
> nohz-on-non-idle support, which interacts with "am I in userspace".

That is an excellent point. I should keep the initrd specifically to
retain userspace execution, and should also occasionally run CPU-bound
in userspace. Easy enough! And thank you!
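
One way such an occasional CPU-bound burst could be written is sketched below in C. The names now_ns() and spin_for_ms() and the inner-loop count are hypothetical choices for illustration, not anything from the rcutorture scripts.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/* Monotonic clock reading in nanoseconds. */
static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Hypothetical helper: stay busy in userspace for at least `ms`
 * milliseconds. Most of the time goes to a plain arithmetic loop,
 * with the clock checked only once per batch of iterations.
 * Returns the elapsed time in nanoseconds.
 */
long long spin_for_ms(long ms)
{
	volatile unsigned long sink = 0;	/* keeps the loop from being optimized away */
	long long start = now_ns();
	long long deadline = start + (long long)ms * 1000000LL;
	unsigned long i;

	while (now_ns() < deadline) {
		for (i = 0; i < 100000; i++)
			sink += i;
	}
	return now_ns() - start;
}
```

An init could alternate calls such as spin_for_ms(100) with long sleep() calls, so that the kernel sees both idle and busy userspace periods.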

Thanx, Paul


2018-08-23 20:50:40

by Paul E. McKenney

Subject: Re: Kernel-only deployments?

On Thu, Aug 23, 2018 at 03:22:48PM -0400, Ray Clinton wrote:
> On Thu, Aug 23, 2018 at 7:44 PM Paul E. McKenney
> <[email protected]> wrote:
> > Does anyone do kernel-only deployments, for example, setting up an
> > embedded device having a Linux kernel and absolutely no userspace
> > whatsoever?
>
> To be honest I'm a total newb to kernel dev, so much so that I copied and
> pasted the above quote in the hopes that I did the formatting right. I'm such
> a newb that I realize I might not even understand your question.

;-) ;-) ;-)

> That being said, wouldn't building a uImage of the kernel and loading it onto
> your device using tftpboot accomplish this?

I do something vaguely similar, but instead use qemu, passing it arguments
to grab the kernel from the filesystem. Here is an example qemu command
generated by the rcutorture scripts:

qemu-system-x86_64 -enable-kvm -nographic -smp 1 -serial file:/home/paulmck/public_git/linux-rcu/tools/testing/selftests/rcutorture/res/2018.08.23-10:22:45/TREE09/console.log -m 512 -kernel /home/paulmck/public_git/linux-rcu/tools/testing/selftests/rcutorture/res/2018.08.23-10:22:45/TREE09/bzImage -append "noapic selinux=0 initcall_debug debug console=ttyS0 rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15 rcutorture.shutdown_secs=600 rcutorture.test_no_idle_hz=1 rcutorture.verbose=1"

This runs single-threaded, captures console output on a file named
"console.log", provides 512MB of memory, grabs the kernel from the
specified "bzImage" file, and passes in a bunch of kernel parameters.

See tools/testing/selftests/rcutorture in recent Linux-kernel source
trees for more information, should you want more. ;-)

Thanx, Paul

> Ray
>
> On Thu, Aug 23, 2018 at 1:46 PM Paul E. McKenney
> <[email protected]> wrote:
> >
> > Hello!
> >
> > Does anyone do kernel-only deployments, for example, setting up an
> > embedded device having a Linux kernel and absolutely no userspace
> > whatsoever?
> >
> > The reason I ask is that such a mode would be mildly useful for rcutorture.
> >
> > You see, rcutorture runs entirely out of initrd, never mounting a real
> > root partition. The user has been required to supply the initrd, but
> > more people are starting to use rcutorture. This has led to confusion
> > and complaints about the need to supply the initrd. So I am finally
> > getting my rcutorture initrd act together, with significant dracut help
> > from Connor Shu. I added mkinitramfs support for environments such as
> > mine that don't support dracut, at least not without significant slashing
> > and burning.
> >
> > The mkinitramfs approach results in about 40MB of initrd, and dracut
> > about 10MB. Most of this is completely useless for rcutorture, which
> > isn't interested in mounting filesystems, opening devices, and almost
> > all of the other interesting things that mkinitramfs and dracut enable.
> >
> > Those who know me will not be at all surprised to learn that I went
> > overboard making the resulting initrd as small as possible. I started
> > by throwing out everything not absolutely needed by the dash and sleep
> > binaries, which got me down to about 2.5MB, 1.8MB of which was libc.
> > This situation of course prompted me to create an initrd containing
> > a statically linked binary named "init" and absolutely nothing else
> > (not even /dev or /tmp directories), which weighs in at not quite 800KB.
> > This is a great improvement over 10MB, to say nothing of 40MB, but 800KB
> > for a C-language "for" loop containing nothing more than a single call to
> > sleep()? Much of the code is there for things that I might do (dlopen(),
> > for example), but don't. All I can say is that there clearly aren't many
> > of us left who made heavy use of systems with naked-eye-visible bits!
> > (Or naked-finger-feelable, for that matter.)
> >
> > This further prompted the idea of modifying kernel_init() to just loop
> > forever, perhaps not even reaping orphaned zombies [*], given an appropriate
> > Kconfig option and/or kernel boot parameter. I obviously cannot justify
> > this to save a sub-one-megabyte initrd for rcutorture, no matter how much
> > a wasted 800K might have offended my 30-years-ago self. If I take this
> > next step, there have to be quite a few others benefiting significantly
> > from it.
> >
> > So, does anyone in the deep embedded space already do this?
> >
> > Thanx, Paul
> >
> > [*] What zombies??? There is no userspace!!!
> >
>


2018-08-23 20:55:28

by Bernd Petrovitsch

Subject: Re: Kernel-only deployments?

Hi all!

On Thu, 2018-08-23 at 10:43 -0700, Paul E. McKenney wrote:
> [...]
> Does anyone do kernel-only deployments, for example, setting up an
> embedded device having a Linux kernel and absolutely no userspace
> whatsoever?
[...]
> You see, rcutorture runs entirely out of initrd, never mounting a real
> root partition. The user has been required to supply the initrd, but

IMHO running programs from the initrd is in user-space, but anyways:

Ages ago at some former employer, we built an embedded Linux device on
an MPC-860 board (but that shouldn't make a significant difference to
other architectures) based on the (at that time) brand new 2.4 kernel
which ran completely out of the initrd (which obviously contained the
whole root filesystem).

[...]
> by throwing out everything not absolutely needed by the dash and sleep
> binaries, which got me down to about 2.5MB, 1.8MB of which was libc.

We had a working glibc binary (which was the largest binary on the
filesystem) and just used it (and never had the time or the need to
use something else like uClibc or newlib, or to build glibc ourselves to
leave all the unneeded stuff out).

We basically built the filesystem - the distribution as such;-) - from
scratch (only self-crafted `configure` calls around[0]) and - thus -
used busybox and ash (IIRC) - so throw dash, core-utils etc. away and
just use busybox (or something similar) for further space savings.

The whole startup and daemon management was done with busybox's "init"
via a simple /etc/inittab (those were the good old times;-) and it was
enough, as one can start one-time programs at boot time (e.g. to load
kernel modules (and then remove the module files from the
filesystem[0]) or configure stuff via sysctl) and restart daemons. We
didn't need run-levels ...
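
As a rough illustration of the inittab approach described above, here is a
minimal sketch that writes a BusyBox-style inittab. The module path, the
sysctl file, and the choice of daemon are hypothetical placeholders, not
the configuration of the device described in this message.

```shell
#!/bin/sh
# Sketch only: writes a minimal BusyBox-style inittab to the path in $1.
# The module name, sysctl file, and daemon are hypothetical placeholders.
write_inittab() {
    cat > "$1" <<'EOF'
# Format: <id>:<runlevels>:<action>:<process>
# BusyBox init ignores the runlevels field, so it is left empty.
::sysinit:/sbin/insmod /lib/modules/example.o
::sysinit:/bin/rm -f /lib/modules/example.o
::sysinit:/sbin/sysctl -p /etc/sysctl.conf
::respawn:/usr/sbin/dropbear -F
::ctrlaltdel:/sbin/reboot
EOF
}
```

Calling `write_inittab ./inittab` produces the file. The "sysinit" entries
run once at boot, and the "respawn" entry restarts the daemon whenever it
exits.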

> This situation of course prompted me to create an initrd containing
> a statically linked binary named "init" and absolutely nothing else
> (not even /dev or /tmp directories), which weighs in at not quite 800KB.

That is probably the smallest solution - if it's enough. If it's all
GPL, just link it statically against dietlibc ....

We had all of the usual directories and a somewhat filled /dev
(completely static in the initrd IIRC, no udev or similar dynamic stuff
was needed) as we had dropbear as ssh-server, a small webserver+CGI-
script for a web interface and an SNMP agent (hacked net-snmp as we had
our own configuration daemon and needed SNMP only as a transport
protocol).

[...]

MfG,
Bernd

[0]: Every byte counts and size does matter;-)
--
Bernd Petrovitsch Email : [email protected]
LUGA : http://www.luga.at


2018-08-23 20:58:25

by Paul E. McKenney

Subject: Re: Kernel-only deployments?

On Thu, Aug 23, 2018 at 09:52:17PM +0200, Bernd Petrovitsch wrote:
> Hi all!
>
> On Thu, 2018-08-23 at 10:43 -0700, Paul E. McKenney wrote:
> > [...]
> > Does anyone do kernel-only deployments, for example, setting up an
> > embedded device having a Linux kernel and absolutely no userspace
> > whatsoever?
> [...]
> > You see, rcutorture runs entirely out of initrd, never mounting a real
> > root partition. The user has been required to supply the initrd, but
>
> IMHO running programs from the initrd is in user-space, but anyways:

Agreed, rcutorture still has a userspace, albeit a small one. I was
wondering if I should take the next step and eliminate userspace entirely.
Josh Triplett pointed out that doing so would reduce my test coverage,
so the answer is that I should not eliminate userspace entirely for
rcutorture.

> Ages ago at some former employer, we built an embedded Linux device on
> an MPC-860 board (but that shouldn't make a significant difference to
> other architectures) based on the (at that time) brand new 2.4 kernel
> which ran completely out of the initrd (which obviously contained the
> whole root filesystem).

Cute! The rcutorture test scripts do something similar, but you
clearly got there long before I did.

> [...]
> > by throwing out everything not absolutely needed by the dash and sleep
> > binaries, which got me down to about 2.5MB, 1.8MB of which was libc.
>
> We had a working glibc binary (which was the largest binary on the
> filesystem) and just used it (and never got time and/or necessity to
> use something else like uClibc, newlib or build glibc ourselves to
> leave all unneeded stuff out).
>
> We basically built the filesystem - the distribution as such;-) - from
> scratch (only self-crafted `configure` calls around[0]) and - thus -
> used busybox and ash (IIRC) - so throw dash, core-utils etc. away and
> just use busybox (or something similar) for further space savings.
>
> The whole startup and daemon management was done with busybox' "init"
> via a simple /etc/inittab (that were the good old times;-) and it was
> enough as one can start one-time programs at boot time (e.g. to load
> kernel modules (and remove the file in the filesystem from the
> filesystem[0]) or configure stuff via sysctl) and restart daemons. We
> didn't need run-levels ...

Indeed, concerns about possible additional boot-time kernel-userspace
interactions led me to use dracut or mkinitramfs if available, and
hand-craft the "init" binary only if neither was present.

> > This situation of course prompted me to create an initrd containing
> > a statically linked binary named "init" and absolutely nothing else
> > (not even /dev or /tmp directories), which weighs in at not quite 800KB.
>
> That is probably the smallest solution - if it's enough. If it's all
> GPL, just link it statically against dietlibc ....

Sounds like there are a number of reduced-weight libc libraries
available.

> We had all of the usual directories and a somewhat filled /dev
> (completely static in the initrd IIRC, no udev or similar dynamic stuff
> was needed) as we had dropbear as ssh-server, a small webserver+CGI-
> script for a web interface and an SNMP agent (hacked net-snmp as we had
> our own configuration daemon and needed SNMP only as a transport
> protocol).

Cool! Me, I currently leave networking out. I compile it into the
kernel to catch build problems, but don't actually exercise the
networking code.

> [...]
>
> MfG,
> Bernd
>
> [0]: Every byte counts and size does matter;-)

;-) ;-) ;-)

Thanx, Paul

> --
> Bernd Petrovitsch Email : [email protected]
> LUGA : http://www.luga.at
>


2023-02-15 02:37:35

by Zhangjin Wu

Subject: Re: Re: Kernel-only deployments?

Hi, Willy & Paul

Thanks very much for your work on nolibc. Based on nolibc and on the
gc-sections feature from Paul Burton, I have tried to 'gc' the dead
system calls that are not used by nolibc applications.

Tests show that gc-sections shrinks a minimal RISC-V 64 config by ~10%,
and that gc-sections for syscalls shrinks it by another ~4.6% (~200k).

Since nolibc has been added to tools/include/nolibc, it may be possible
to 'gc' the dead syscalls automatically while building the nolibc-based
initrd, but this requires updating the architecture-specific system call
table automatically after building the nolibc application:

1. Eliminate the unused functions and syscalls of the nolibc application

Add -ffunction-sections, -fdata-sections and -Wl,--gc-sections when
compiling the nolibc application

2. Dump the used syscalls with the help of objdump

This is architecture dependent; a RISC-V 64 example:

riscv64-linux-gnu-objdump -d "$nolibc_bin" | \
grep -E "li[[:space:]]*a7|ecall" | \
grep -E -B1 ecall | \
grep -E "li[[:space:]]*a7" | \
rev | cut -d ' ' -f1 | rev | cut -d ',' -f2 | \
sort -u -g

Using a simple hello.c with reboot() at the end as an example, the
dumped syscall numbers are:

64
93
142

3. Update architecture specific system call table

Using RISC-V 64 as an example, arch/riscv/kernel/syscall_table.c:

diff --git a/arch/riscv/kernel/syscall_table.c b/arch/riscv/kernel/syscall_table.c
index 44b1420a2270..3b48a94c0ae8 100644
--- a/arch/riscv/kernel/syscall_table.c
+++ b/arch/riscv/kernel/syscall_table.c
@@ -14,5 +14,10 @@

void * const sys_call_table[__NR_syscalls] = {
[0 ... __NR_syscalls - 1] = sys_ni_syscall,
-#include <asm/unistd.h>
+// AUTO INSERT START
+ [64] = sys_write,
+ [93] = sys_exit,
+ [142] = sys_reboot,
+// AUTO INSERT END
+// #include <asm/unistd.h>
};

4. Build the kernel with gc-sections; the unused syscalls will be eliminated
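
Steps 2 and 3 above can be sketched as two small shell functions. This is
only an illustration under stated assumptions: the function names
used_syscalls and table_entries are hypothetical, the disassembly is
assumed to use the "li a7,N" followed by "ecall" pattern shown above, and
the mapping from syscall numbers to names is assumed to be supplied
separately as a file of "number name" pairs.

```shell
#!/bin/sh
# Sketch only; function names and the "number name" map format are assumptions.

# Step 2: read RISC-V disassembly on stdin and print each syscall number that
# was loaded into a7 by the most recent "li a7,N" before an ecall.
used_syscalls() {
    awk '
        /[ \t]li[ \t]+a7,/ { n = $NF; sub(/^a7,/, "", n); last = n; next }
        /[ \t]ecall/       { if (last != "") print last; last = "" }
    ' | sort -u -n
}

# Step 3: turn those numbers into sys_call_table initializers.
# $1 is a map file with one "number name" pair per line, e.g. "64 write".
# Numbers without an entry in the map are silently skipped.
table_entries() {
    awk 'NR == FNR { name[$1] = $2; next }
         ($1 in name) { printf "\t[%s] = sys_%s,\n", $1, name[$1] }' "$1" -
}
```

A possible invocation, assuming a hand-written map file named
syscall.map, would be:

riscv64-linux-gnu-objdump -d "$nolibc_bin" | used_syscalls | table_entries syscall.map

The output lines could then be pasted between the AUTO INSERT markers in
the diff above.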

The procedure itself is not that complicated, but mainlining such a
feature and making it support more architectures is not that easy. I
have written more about this here:
https://lore.kernel.org/linux-riscv/[email protected]/

So, is such a feature really useful? Does anyone in the deep embedded
space already do this? Your suggestions are welcome.

Thanks
- Zhangjin Wu

On Thu, 23 Aug 2018 18:38:12 -0400, Willy Tarreau wrote:
>
> On Thu, Aug 23, 2018 at 08:54:17PM +0200, Adam Borowski wrote:
> > .globl _start
> > .data
> > req: .8byte 999999999, 999999999
> > .text
> > _start:
> > mov $35, %rax # syscall: nanosleep
> > mov $req, %rdi
> > xor %rsi, %rsi
> > syscall
> > jmp _start
> >
> >
> > as sl.s -o sl.o
> > ld sl.o -o init
> >
> > 'Ere you go, no libc needed. If your arch is not amd64, just say so.
> >
> > If you want to do anything more complex, though -- you really want musl
> > or another lightweight libc instead. Glibc is utterly unfit for static
> > linking.
>
> Since there seems to be some interest about this, I'll repost this
> here. I've developed a "nolibc" include file which implements most
> common syscalls and string functions (those I use in early boot)
> as static inlines so the resulting executable only contains the
> code you really use :
>
> http://git.formilux.org/?p=people/willy/nolibc.git;a=tree
>
> Example :
>
> $ echo "int main() { return sleep(3);}" | gcc -Os -nostdlib -include ../nolibc/nolibc.h -s -fno-exceptions -fno-asynchronous-unwind-tables -fno-unwind-tables -lgcc -o sleep -xc -
> $ ls -l sleep
> -rwxr-xr-x 1 willy users 664 Aug 23 20:37 sleep
>
> It's actually used by my pre-init loader that is embedded into the
> initramfs of all my kernels, to untar the modules and switch to the
> initrd or rootfs. This way all my modules are contained into the
> kernel image and I can easily use many different kernels with rootfs
> without having to install modules.
>
> Just in case someone curious would want to know more about it, the
> (old and horrible) preinit is here :
>
> http://git.formilux.org/?p=dist/src/flxutils.git;a=tree;f=init;h=9dc8fbae6383d9b4d56d34cc6c3d59585318bef8;hb=HEAD
>
> And the (old and ugly) build script is here :
>
> http://git.formilux.org/?p=dist/techno.git;a=tree;f=scripts/kernel;hb=HEAD
>
> Yes it's aging a lot now but it's still very convenient ;-)
>
> Willy

2023-02-15 09:48:42

by Willy Tarreau

Subject: Re: Re: Kernel-only deployments?

Hi Wu,

On Wed, Feb 15, 2023 at 10:35:57AM +0800, Zhangjin Wu wrote:
> Hi, Willy & Paul
>
> Thanks very much for your work on nolibc, based on the nolibc feature
> and the gc-sections feature from Paul Burton, I have tried to 'gc' the
> dead system calls not used in the nolibc applications.
>
> Tests show that gc-sections shrinks a minimal config of RISC-V 64 by
> ~10% and the gc-sections for syscalls shrinks another ~4.6% (~200k).
>
> Since nolibc has been added into tools/include/nolibc, it may be
> possible to auto 'gc' the dead syscalls automatically while building the
> nolibc based initrd, but it requires to auto update the architecture
> specific system call table after building the nolibc application:
>
> 1. Eliminate the unused functions and syscalls of the nolibc application
>
> add -ffunction-sections -fdata-sections and -Wl,--gc-sections to
> compile the nolibc application
>
> 2. Dump the used syscalls with the help of objdump
>
> This is architecture dependent, a RISC-V 64 example:
>
> riscv64-linux-gnu-objdump -d $nolibc_bin | \
> egrep "li[[:space:]]*a7|ecall" | \
> egrep -B1 ecall | \
> egrep "li[[:space:]]*a7" | \
> rev | cut -d ' ' -f1 | rev | cut -d ',' -f2 | \
> sort -u -g
>
> Use a simple hello.c with reboot() at the end as an example, the
> dumped syscall numbers are:
>
> 64
> 93
> 142
>
> 3. Update architecture specific system call table
>
> Use RISC-V 64 as an example, arch/riscv/kernel/syscall_table.c:
>
> diff --git a/arch/riscv/kernel/syscall_table.c b/arch/riscv/kernel/syscall_table.c
> index 44b1420a2270..3b48a94c0ae8 100644
> --- a/arch/riscv/kernel/syscall_table.c
> +++ b/arch/riscv/kernel/syscall_table.c
> @@ -14,5 +14,10 @@
>
> void * const sys_call_table[__NR_syscalls] = {
> [0 ... __NR_syscalls - 1] = sys_ni_syscall,
> -#include <asm/unistd.h>
> +// AUTO INSERT START
> + [64] = sys_write,
> + [93] = sys_exit,
> + [142] = sys_reboot,
> +// AUTO INSERT END
> +// #include <asm/unistd.h>
> };
>
> 4. Build kernel with gc-sections, the unused syscalls will be eliminated
>
> It is not that complicated, but to mainline such a feature and let it
> support more architectures, it is not that easy. I have written more
> about this here:
> https://lore.kernel.org/linux-riscv/[email protected]/

Yeah, I noticed your message (though I haven't had time to respond yet). I
find it interesting from an academic perspective at least.

> So, is such a feature really useful? does anyone in the deep embedded
> space already do this? welcome your suggestion.

The thing is that you will clearly not be able to compile realistic
applications with nolibc. Its goal is just to support test programs,
ultra-basic shells, or init programs for which a libc is either
annoying (e.g. for kernel development you prefer to use the -nolibc
toolchains) or overkill (you don't always want to inflate your embedded
initramfs by hundreds of kB for a 300-byte program, especially when
your kernel size approaches the maximum size of your flash device, as
recently happened to me).

But for real applications you will definitely need to have a real libc
such as klibc or musl.

However, the value I see in your work is the ability to show the
cost of families of syscalls and features. Instead of automatically
trimming them depending on what the application uses, I think it could
be useful to spot the groups that dominate these 200kB of savings,
and possibly add build options that allow removing them. It then
becomes easy to add tests for them (including using nolibc) that are
representative of what some application would need, and to quickly verify
whether a given kernel config is likely to work with this or that application.

This approach is even better because it won't force you to limit your
analysis to syscalls, but it can also cover other optional areas and
help application developers estimate the rough amount of savings they
can make by removing some parts if it's estimated that the application
will not use them.
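
One rough way to look for such dominant groups is to total symbol sizes
by name pattern. The sketch below is only an illustration: the function
name group_size is hypothetical, and it assumes input in the "address
size type name" layout with decimal sizes, which is the format assumed
here for `nm -S -t d vmlinux`. The patterns that actually identify a
syscall family depend on the architecture and kernel version.

```shell
#!/bin/sh
# Sketch only: total the sizes of symbols whose names match the regex in $1.
# Input on stdin: "address size type name" lines with decimal sizes.
# Lines without a size column (three fields) are ignored.
group_size() {
    awk -v re="$1" 'NF == 4 && $4 ~ re { total += $2 } END { print total + 0 }'
}
```

For example, `nm -S -t d vmlinux | group_size '^sys_'` would print the
combined size in bytes of all symbols whose names start with "sys_".
Running it with several patterns gives a first estimate of which groups
are worth a build option.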

Just my two cents,
Willy

2023-02-16 13:10:15

by Zhangjin Wu

Subject: Re: Re: Kernel-only deployments?

Hi, Willy

> On Wed, 15 Feb 2023 10:47:51 +0100, Willy Tarreau wrote:
>
> Hi Wu,
>
> On Wed, Feb 15, 2023 at 10:35:57AM +0800, Zhangjin Wu wrote:
> > Hi, Willy & Paul
> >
> > Thanks very much for your work on nolibc, based on the nolibc feature
> > and the gc-sections feature from Paul Burton, I have tried to 'gc' the
> > dead system calls not used in the nolibc applications.
> >
> > Tests show that gc-sections shrinks a minimal config of RISC-V 64 by
> > ~10% and the gc-sections for syscalls shrinks another ~4.6% (~200k).
> >
> > Since nolibc has been added into tools/include/nolibc, it may be
> > possible to auto 'gc' the dead syscalls automatically while building the
> > nolibc based initrd, but it requires to auto update the architecture
> > specific system call table after building the nolibc application:
> >
> > 1. Eliminate the unused functions and syscalls of the nolibc application
> >
> > add -ffunction-sections -fdata-sections and -Wl,--gc-sections to
> > compile the nolibc application
> >
> > 2. Dump the used syscalls with the help of objdump
> >
> > This is architecture dependent, a RISC-V 64 example:
> >
> > riscv64-linux-gnu-objdump -d $nolibc_bin | \
> > egrep "li[[:space:]]*a7|ecall" | \
> > egrep -B1 ecall | \
> > egrep "li[[:space:]]*a7" | \
> > rev | cut -d ' ' -f1 | rev | cut -d ',' -f2 | \
> > sort -u -g
> >
> > Use a simple hello.c with reboot() at the end as an example, the
> > dumped syscall numbers are:
> >
> > 64
> > 93
> > 142
> >
> > 3. Update architecture specific system call table
> >
> > Use RISC-V 64 as an example, arch/riscv/kernel/syscall_table.c:
> >
> > diff --git a/arch/riscv/kernel/syscall_table.c b/arch/riscv/kernel/syscall_table.c
> > index 44b1420a2270..3b48a94c0ae8 100644
> > --- a/arch/riscv/kernel/syscall_table.c
> > +++ b/arch/riscv/kernel/syscall_table.c
> > @@ -14,5 +14,10 @@
> >
> > void * const sys_call_table[__NR_syscalls] = {
> > [0 ... __NR_syscalls - 1] = sys_ni_syscall,
> > -#include <asm/unistd.h>
> > +// AUTO INSERT START
> > + [64] = sys_write,
> > + [93] = sys_exit,
> > + [142] = sys_reboot,
> > +// AUTO INSERT END
> > +// #include <asm/unistd.h>
> > };
> >
> > 4. Build kernel with gc-sections, the unused syscalls will be eliminated
> >
> > It is not that complicated, but to mainline such a feature and let it
> > support more architectures, it is not that easy. I have written more
> > about this here:
> > https://lore.kernel.org/linux-riscv/[email protected]/
>
> Yeah I noticed your message (though didn't yet have time to respond). I
> find it interesting from an academic perspective at least.
>

Thanks very much for your kind reply and suggestions ;-)

> > So, is such a feature really useful? does anyone in the deep embedded
> > space already do this? welcome your suggestion.
>
> The thing is that you will clearly not be able to compile realistic
> applications with nolibc. Its goal is just to support test programs
> or ultra-basic shells or init programs for which a libc is either
> annoying (e.g. for kernel development you prefer to use the -nolibc
> toolchains) or overkill (you don't always want to inflate your embedded
> initramfs by hundreds of kB for a 300 bytes program, especially when
> your kernel size approaches the maximum size of your flash device like
> I recently had).
>
> But for real applications you will definitely need to have a real libc
> such as klibc or musl.
>

Yeah, that is exactly why I use nolibc as the base for thinking about
dead system call elimination. For now it is not meant for real
applications or real products, only for estimating what is possible. It
is part of my long-term community tinylinux work: https://tinylab.org/tinylinux

With nolibc, especially after its integration into the kernel source
tree, kernel+user becomes a single monolithic piece of software. It can
simply tell us which system calls it uses (it may of course use other
kernel interfaces too, such as /dev, /tmp, /proc, /sys, but here we
focus on syscalls), and then we can put the 'C' lib part aside and focus
on the kernel part.

With a real, bigger libc, even with only a small initramfs, digging out
the used system calls is very hard and time-consuming, although it is
possible. kernel+nolibc is a good simplified 'model' for this type of
kernel development.

> However the value I'm seeing in your work is to be able to show the
> cost of families of syscalls and features. Instead of automatically
> trimming them depending on what the application uses, I think it could
> be useful to spot groups that dominate the size of these 200kB savings,
> and possibly add build options to allow to remove them. In this case it
> becomes easy to add tests for them (including using nolibc) that are
> representative of what some application would need and quickly verify
> if a given kernel config has chances to work with this or that application.
>

This is really the right direction, and I have tried to add many config
options for different syscalls:
https://github.com/tinyclub/tinylinux/tree/2.6.35/dev/syscall-cfg

And under this kernel menu:

General setup --->
Configure standard kernel features (expert users) --->

There are already more than 10 syscall options, but this direction has at
least two potential issues:

- Manually splitting out a new option for each system call is very hard,
and upstreaming it to mainline is hard too. Multiplied by 451
(__NR_syscalls in the generic unistd.h), the work would be huge.

But we still need to split some syscalls manually. For example, vdso
syscalls in some architectures (e.g. MIPS) are not configurable. Some
other syscalls may be merely 'referenced' directly in kernel space
without really being 'used'; these need to be identified.

- Configuring the options is not that easy: kernel engineers have to
work cheek by jowl with application engineers and then test carefully,
possibly going through repeated cycles of fail, re-build, fail, re-build.

With 'dead system call elimination', application engineers can focus on
developing their own features, and kernel engineers can simply enable
'dead system call elimination'. The rest can be left to a script that
dumps all of the system calls used by the kernel+user monolithic software
(the same applies to a real system, but more work is needed).

I have prepared several RFC patches that implement draft support for 'dead
system call elimination'. I will send them out later and welcome your review.

> This approach is even better because it won't force you to limit your
> analysis to syscalls, but it can also cover other optional areas and
> help application developers estimate the rough amount of savings they
> can make by removing some parts if it's estimated that the application
> will not use them.

Yes, syscall elimination is only one of the options for tinylinux; here are
some others we tried: https://github.com/tinyclub/tinylinux/branches

Thanks,
- Zhangjin Wu

>
> Just my two cents,
> Willy