2000-11-09 14:43:16

by Eric W. Biederman

Subject: Q: Linux rebooting directly into linux.


I have recently developed a patch that allows linux to directly boot
into another linux kernel. With the code freeze it appears
inappropriate to submit it at this time.

Linus, in principle do you have any trouble with this kind of
functionality?

The immediate applications of this code are:
- Clusters can network boot over arbitrary network
interfaces, and the network driver only needs to be written and
maintained in one place.
- Multiplatform boot loaders can be written.
- The Linux kernel can be included in a boot ROM and you can still
boot other linux kernels.
- Kernel developers can have a fast interface for booting into a
development kernel.

The interface is designed to be simple and inflexible yet very
powerful. To that end the code just takes an elf binary, and a
command line. The started image also takes an environment generated
by the kernel of all of the unprobeable hardware details.

ELF was picked for its multiplatform support and the sheer simplicity
of its program header. Plus you can use standard tools to generate
ELF images fairly easily.

The environment passed to a loaded image is designed to expand and
handle new data types without breaking old decoders. They just break
because they don't support the new hardware :)
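
A minimal sketch of how an environment like this could stay decodable as
new record types appear, assuming a tag-length-value layout. The actual
format used by the patch is not spelled out in this message, so the layout,
the tag numbers and the function names below are all hypothetical. The point
is only that a decoder which skips tags it does not know keeps working.

```python
import struct

# Hypothetical tag numbers, chosen only for illustration.
TAG_MEMORY = 1
TAG_CMDLINE = 2


def encode(records):
    """Pack (tag, payload) records as little-endian tag, length, payload."""
    out = bytearray()
    for tag, payload in records:
        out += struct.pack("<II", tag, len(payload)) + payload
    return bytes(out)


def decode(blob, known_tags):
    """Return the records whose tag is known and skip everything else."""
    found = []
    offset = 0
    while offset + 8 <= len(blob):
        tag, length = struct.unpack_from("<II", blob, offset)
        offset += 8
        payload = blob[offset:offset + length]
        if len(payload) < length:
            raise ValueError("truncated record")
        offset += length
        if tag in known_tags:
            found.append((tag, payload))
    return found


# An environment that contains a record type an old decoder has never seen.
env = encode([
    (TAG_MEMORY, struct.pack("<QQ", 0x100000, 0x7F00000)),
    (99, b"record type from the future"),
    (TAG_CMDLINE, b"root=/dev/hda1"),
])
old_decoder_view = decode(env, {TAG_MEMORY, TAG_CMDLINE})
```

Because every record carries its own length, the old decoder can step over
tag 99 without understanding it and still reach the command line record.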

Linus, the path I envision is that this code gets integrated early in
2.5. This includes cleaning up the boot paths so all our C code has
to deal with is this new format. Then backporting the functionality
to 2.4 and possibly 2.2.

The kernel patches can be found in:
ftp://ftp.linuxnetworx.com/pub/kexec-patches-1.0.tar.gz
(This is a patchset with 4 patches:
1. Ingo Molnar's improved APIC support.
2. My enhancements upon it so we restore the APICs to their boot
   state when we shut down.
3. My 2 line patch to make certain that in smp_send_stop
   the last cpu running is the boot cpu. (Required by the MP spec...)
4. The code to support execing a new kernel.)

The code to generate an image bootable by this new syscall is in:
ftp://ftp.linuxnetworx.com/pub/mkelfImage-1.0.tar.gz
(This is a perl script that takes a kernel and possibly a ramdisk
and a command line and generates an elfimage suitable to be booted
in this new infrastructure)

Eric

P.S. Linus, the code is not included inline because I don't expect it to
be included just yet.


2000-11-11 20:18:29

by Eric W. Biederman

Subject: Re: Q: Linux rebooting directly into linux.

Michael Rothwell <[email protected]> writes:

> "Eric W. Biederman" wrote:
> >
> > I have recently developed a patch that allows linux to directly boot
> > into another linux kernel.
>
> This would rock. One place I can think of using it is with distro
> installers. The installer boots a generic i386 kernel, and then installs
> an optimized (i.e, PIII, etc.) kernel for run-time.

This would rock? It already does. Of course the installers need
to actually use this.

Eric


2000-11-11 20:18:49

by Eric W. Biederman

Subject: Re: Q: Linux rebooting directly into linux.

Wakko Warner <[email protected]> writes:

> > I have recently developed a patch that allows linux to directly boot
> > into another linux kernel. With the code freeze it appears
> > inappropriate to submit it at this time.
> >
> > Linus, in principle do you have any trouble with this kind of
> > functionality?
> >
> > The immediate applications of this code are:
> > - Clusters can network boot over arbitrary network
> > interfaces, and the network driver only needs to be written and
> > maintained in one place.
> > - Multiplatform boot loaders can be written.
> > - The Linux kernel can be included in a boot ROM and you can still
> > boot other linux kernels.
> > - Kernel developers can have a fast interface for booting into a
> > development kernel.
> >
> > The interface is designed to be simple and inflexible yet very
> > powerful. To that end the code just takes an elf binary, and a
> > command line. The started image also takes an environment generated
> > by the kernel of all of the unprobeable hardware details.
>
> Isn't this what milo does on alpha?

Similar. Milo uses kernel drivers in its own framework.
This has proved to be a major maintenance problem. Milo is nearly
a kernel fork.

The design is for the long term to get this incorporated into the
kernel, and even if not a small kernel patch should be easier to
maintain than a harness for calling kernel drivers.

Eric


2000-11-11 20:34:12

by H. Peter Anvin

Subject: Re: Q: Linux rebooting directly into linux.

Followup to: <[email protected]>
By author: [email protected] (Eric W. Biederman)
In newsgroup: linux.dev.kernel
> > >
> > > The interface is designed to be simple and inflexible yet very
> > > powerful. To that end the code just takes an elf binary, and a
> > > command line. The started image also takes an environment generated
> > > by the kernel of all of the unprobeable hardware details.
> >
> > Isn't this what milo does on alpha?
>
> Similar. Milo uses kernel drivers in its own framework.
> This has proved to be a major maintenance problem. Milo is nearly
> a kernel fork.
>
> The design is for the long term to get this incorporated into the
> kernel, and even if not a small kernel patch should be easier to
> maintain than a harness for calling kernel drivers.
>

I'm working on something similar in "Genesis". It pretty much is (or
rather, will be) a kernel *port*, not a fork; the port is such that it
can run on top of a simple BIOS extender and thus access the boot
media.

-hpa

--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

2000-11-11 22:12:14

by Adam Lazur

Subject: Re: Q: Linux rebooting directly into linux.

Eric W. Biederman ([email protected]) said:
> I have recently developed a patch that allows linux to directly boot
> into another linux kernel. With the code freeze it appears
> inappropriate to submit it at this time.

Aside from what looks to be support for SMP, how does this differ from
the two kernel monte stuff at http://scyld.com/software/monte.html ?

.adam

--
[ Adam Lazur, NOW Monkey <[email protected]> ]
[ Progeny Linux Systems http://progeny.com ]

2000-11-11 22:47:00

by Adam Lazur

Subject: Re: Q: Linux rebooting directly into linux.

Eric W. Biederman ([email protected]) said:
> Michael Rothwell <[email protected]> writes:
> > This would rock. One place I can think of using it is with distro
> > installers. The installer boots a generic i386 kernel, and then installs
> > an optimized (i.e, PIII, etc.) kernel for run-time.
>
> This would rock? It already does. Of course the installers need
> to actually use this.

Actually, along the lines of what Scyld uses two kernel monte for with
their Beowulf2 distribution.

They boot a network enabled kernel which pulls a kernel off of a server
and then uses two kernel monte to boot with that one. This allows you
to centrally admin your cluster with one server. Good stuff...

.adam

--
[ Adam Lazur, NOW Monkey <[email protected]> ]
[ Progeny Linux Systems http://progeny.com ]

2000-11-12 00:18:56

by Eric W. Biederman

Subject: Re: Q: Linux rebooting directly into linux.

"H. Peter Anvin" <[email protected]> writes:

> Followup to: <[email protected]>
> By author: [email protected] (Eric W. Biederman)
> In newsgroup: linux.dev.kernel
> > > >
> > > > The interface is designed to be simple and inflexible yet very
> > > > powerful. To that end the code just takes an elf binary, and a
> > > > command line. The started image also takes an environment generated
> > > > by the kernel of all of the unprobeable hardware details.
> > >
> > > Isn't this what milo does on alpha?
> >
> > Similar. Milo uses kernel drivers in its own framework.
> > This has proved to be a major maintenance problem. Milo is nearly
> > a kernel fork.
> >
> > The design is for the long term to get this incorporated into the
> > kernel, and even if not a small kernel patch should be easier to
> > maintain than a harness for calling kernel drivers.
> >
>
> I'm working on something similar in "Genesis". It pretty much is (or
> rather, will be) a kernel *port*, not a fork; the port is such that it
> can run on top of a simple BIOS extender and thus access the boot
> media.

Hmm. You must mean similar to milo.

Have fun. With linuxBIOS I'm working exactly the other way. Killing
off the BIOS. And letting the initial firmware be just a boot loader.
The reduction in complexity should make it more reliable.

Eric

2000-11-12 00:19:26

by Eric W. Biederman

Subject: Re: Q: Linux rebooting directly into linux.

Adam Lazur <[email protected]> writes:

> Eric W. Biederman ([email protected]) said:
> > Michael Rothwell <[email protected]> writes:
> > > This would rock. One place I can think of using it is with distro
> > > installers. The installer boots a generic i386 kernel, and then installs
> > > an optimized (i.e, PIII, etc.) kernel for run-time.
> >
> > This would rock? It already does. Of course the installers need
> > to actually use this.
>
> Actually, along the lines of what Scyld uses two kernel monte for with
> their Beowulf2 distribution.
>
> They boot a network enabled kernel which pulls a kernel off of a server
> and then uses two kernel monte to boot with that one. This allows you
> to centrally admin your cluster with one server. Good stuff...

Yep. You can also do this with etherboot flashed onto a NIC.

I also intend to use my work for this functionality as well.
FYI I work for Linux Networx, which builds hardware for linux clusters.

The fact that Scyld is using arp and a fixed network socket is a
design decision I don't agree with.

Truly slick will be when linuxBIOS is solid. Then you even get remote
control of the BIOS, and remote booting all from within the BIOS. Only
time will tell if it is worth the effort :)

Eric

2000-11-12 00:19:56

by Eric W. Biederman

Subject: Re: Q: Linux rebooting directly into linux.

Adam Lazur <[email protected]> writes:

> Eric W. Biederman ([email protected]) said:
> > I have recently developed a patch that allows linux to directly boot
> > into another linux kernel. With the code freeze it appears
> > inappropriate to submit it at this time.
>
> Aside from what looks to be support for SMP, how does this differ from
> the two kernel monte stuff at http://scyld.com/software/monte.html ?

I admit that LOBOS, two kernel monte, and the one by Werner Almesberger
were all related work that I looked at. And I acknowledge
there were some good ideas I pilfered from all of them.

There are a couple of differences.
But the big one is I'm trying to do it right. In particular
this means fixing the problem where the problem is.

Additionally I'm killing backwards compatibility with a lot of
short-sighted things.

And multiplatform support is in the plan. So long term this should
run on alpha, and x86, and sparc and everything else out there
that linux supports. This means that you can have a multiplatform
boot loader. There will have to be glue code out there to get
started from different firmware on different machines but that is it.

Additionally mine is the only one that has a real chance of booting
a non-linux kernel. Gathering the non-probeable hardware information
is hard. Currently my implementation is the only one that does not simply
copy the boot parameters page that is given to the linux kernel.

Unlike 2 kernel monte mine deliberately has no reliance upon a BIOS.

There is another major difference as well. kexec is part of the work
on the linuxBIOS project, where the goal is to have a very minimal
firmware before booting into linux, and to use that initial linux
kernel as the firmware's hardware drivers. What this means is that kexec
is being developed from a point of view that needs it. If you don't
have a BIOS, kexec is a must.

Eric

2000-11-12 00:32:58

by H. Peter Anvin

Subject: Re: Q: Linux rebooting directly into linux.

"Eric W. Biederman" wrote:
>
> Hmm. You must mean similar to milo.
>
> Have fun. With linuxBIOS I'm working exactly the other way. Killing
> off the BIOS. And letting the initial firmware be just a boot loader.
> The reduction in complexity should make it more reliable.
>

... except that you have to handle every single motherboard architecture
out there now.

-hpa

--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

2000-11-12 07:29:31

by Eric W. Biederman

Subject: Re: Q: Linux rebooting directly into linux.

"H. Peter Anvin" <[email protected]> writes:

> "Eric W. Biederman" wrote:
> >
> > Hmm. You must mean similar to milo.
> >
> > Have fun. With linuxBIOS I'm working exactly the other way. Killing
> > off the BIOS. And letting the initial firmware be just a boot loader.
> > The reduction in complexity should make it more reliable.
> >
>
> ... except that you have to handle every single motherboard architecture
> out there now.

Agreed that is a bit of a risk. Mostly you just have to handle
the chipset of the boards and there are a finite number of them.

Only time will tell if this is truly feasible. I think it is certainly
worth a try.

And I don't have to handle every single one just all of the ones
I need it to run on :)

With my kexec patch I'm just getting the infrastructure ready, and that
is functionality that can be used independently of linuxBIOS. If
booting linux from linux would help with what you are doing I would love to
work together on that.

Eric

2000-11-14 08:37:51

by Erik Andersen

Subject: Re: Q: Linux rebooting directly into linux.

On Thu Nov 09, 2000 at 01:18:24AM -0700, Eric W. Biederman wrote:
>
> I have recently developed a patch that allows linux to directly boot
> into another linux kernel.

Looks very cool. I'm curious about your decision to use ELF images. This
makes it much less convenient to use due to the kernel postprocessing, and
means that the kernel binary from which you initially boot is not
necessarily the same as the binary that you re-boot into.

Wouldn't it be more reasonable to simply try to exec whatever file is provided?
If the concern is initrds; they can be simply pasted into the kernel binary.

-Erik

--
Erik B. Andersen email: [email protected]
--This message was written using 73% post-consumer electrons--

2000-11-14 15:20:26

by Werner Almesberger

Subject: Re: Q: Linux rebooting directly into linux.

Eric W. Biederman wrote:
> There are a couple of differences.
> But the big one is I'm trying to do it right.

So why do you need a file-based interface then ? ;-)

Since this is a highly privileged operation anyway, you may as
well trust user space to use the right data format ...

I get the impression that you incur quite a lot of overhead just
to make it fit with the exec interface. I agree that it's
conceptually nice, and it looks cleanly done, but I don't quite
see the practical value. (Except, perhaps, that this allowed you
to pick the rather cute name "kexec" ;-)

> Additionally mine is the only one that has a real chance of booting
> a non-linux kernel.

Hmm, I think all approaches could boot a non-Linux kernel, but ...

As far as loading is concerned, bootimg probably has an advantage
there, because you can put things together in memory (e.g. some
OS-specific chain loader), without going to secondary storage.
(Proof of concept: bootimg is able to load all currently supported
kernel image formats on ia32.)

As far as execution is concerned, you're probably slightly better
off with an approach that goes back to real mode. (Or use a chain
loader - this can be transparent to the kernel.) But then, I'm not
sure if you can re-animate the BIOS in any consistent way, so your
choice of operating systems may be quite limited, or you have to
provide your own BIOS substitute.

Concerning complexity, you don't need to use assembler for the
copying (arch/i386/kernel/relocate_kernel.S), see bootimg,
kernel/bootimg_pic.c

Also, why did you implement your own memory management in
fs/kexec.c:kimage_get_chunk ?

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, ICA, EPFL, CH [email protected] /
/_IN_N_032__Tel_+41_21_693_6621__Fax_+41_21_693_6610_____________________/

2000-11-15 10:58:39

by Eric W. Biederman

Subject: Re: Q: Linux rebooting directly into linux.

Erik Andersen <[email protected]> writes:

> On Thu Nov 09, 2000 at 01:18:24AM -0700, Eric W. Biederman wrote:
> >
> > I have recently developed a patch that allows linux to directly boot
> > into another linux kernel.
>
> Looks very cool. I'm curious about your decision to use ELF images. This
> makes it much less convenient to use due to the kernel postprocessing, and
> means that the kernel binary from which you initially boot is not
> necessarily the same as the binary that you re-boot into.

The decision here was that I needed to pass a vector of
<physical address, length, data> pairs. The elf program header
is dead simple and provides it. So I either had to invent a
complicated argument passing mechanism for a syscall or have the
kernel parse a file.
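
A rough sketch of why the program header is enough, assuming a little-endian
ELF32 image as on i386. It walks the program header table and returns one
(physical address, length in memory, file data) entry per PT_LOAD segment.
This is an illustration only, not the code in the patch; load_segments and
tiny_image are hypothetical names, and tiny_image exists only to build a
one-segment test image in memory.

```python
import struct

EHDR = struct.Struct("<16sHHIIIIIHHHHHH")  # ELF32 file header, little-endian
PHDR = struct.Struct("<IIIIIIII")          # ELF32 program header entry
PT_LOAD = 1


def load_segments(image):
    """Return (physical address, memory length, file data) per PT_LOAD entry."""
    fields = EHDR.unpack_from(image, 0)
    ident, phoff, phentsize, phnum = fields[0], fields[5], fields[9], fields[10]
    if ident[:4] != b"\x7fELF" or ident[4] != 1 or ident[5] != 1:
        raise ValueError("not a little-endian ELF32 image")
    segments = []
    for i in range(phnum):
        (p_type, p_offset, _vaddr, p_paddr,
         p_filesz, p_memsz, _flags, _align) = PHDR.unpack_from(
            image, phoff + i * phentsize)
        if p_type == PT_LOAD:
            # p_memsz may exceed p_filesz; the remainder is zero-filled memory.
            data = image[p_offset:p_offset + p_filesz]
            segments.append((p_paddr, p_memsz, data))
    return segments


def tiny_image(payload, paddr):
    """Build a one-segment ELF32 image in memory, purely for demonstration."""
    ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
    offset = EHDR.size + PHDR.size
    ehdr = EHDR.pack(ident, 2, 3, 1, paddr, EHDR.size, 0, 0,
                     EHDR.size, PHDR.size, 1, 0, 0, 0)
    phdr = PHDR.pack(PT_LOAD, offset, paddr, paddr,
                     len(payload), len(payload) + 16, 7, 4096)
    return ehdr + phdr + payload


segments = load_segments(tiny_image(b"kernel text", 0x100000))
```

Each entry says where a piece goes in physical memory, how much room it
needs there, and which bytes to copy, which is the whole vector.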

> Wouldn't it be more reasonable to simply try to exec whatever file is provided?
> If the concern is initrds; they can be simply pasted into the kernel binary.

That's exactly what my preprocessing does.

vmlinux is also an elf binary. As is arch/i386/boot/bvmlinux but it
is compressed.

All mkelfImage does is the pasting of initrd's, command lines,
and just a touch of argument conversion code.

What I deliberately don't do is allow or need setup.S, which does
syscalls, to run. All it does is make BIOS calls and store the results
in a nasty data structure. I have replaced that data structure with
something that is maintainable.

I would like very much to not need mkelfImage. However that
requires further changes to the kernel, and I cannot boot an unpatched
kernel with that method.

Eric

2000-11-15 23:54:17

by Erik Andersen

Subject: Re: Q: Linux rebooting directly into linux.

On Tue Nov 14, 2000 at 07:59:18AM -0700, Eric W. Biederman wrote:
>
> All mkelfImage does is the pasting of initrd's, command lines,
> and just a touch of argument conversion code.

You can link in an initrd using linker magic, i.e.
$(OBJCOPY) --add-section=image=kernel --add-section=initrd=initrd.gz

This is done in ppc/boot/Makefile for example. It might be a nice thing
to add a .config option to optionally specify an initrd to link into
the kernel image. Similarly, several architectures have a CONFIG_CMDLINE
which could also do the job (see arch/ppc/config.in for example).

Presumably, by doing such things you could avoid needing to use mkelfImage.

-Erik

--
Erik B. Andersen email: [email protected]
--This message was written using 73% post-consumer electrons--

2000-11-16 07:58:46

by Eric W. Biederman

Subject: Re: Q: Linux rebooting directly into linux.

Erik Andersen <[email protected]> writes:

> On Tue Nov 14, 2000 at 07:59:18AM -0700, Eric W. Biederman wrote:
> >
> > All mkelfImage does is the pasting of initrd's, command lines,
> > and just a touch of argument conversion code.
>
> You can link in an initrd using linker magic, i.e.
> $(OBJCOPY) --add-section=image=kernel --add-section=initrd=initrd.gz

Hmm this is certainly possible.
My impression is that this doesn't currently work on x86.
I would love to be wrong.

> This is done in ppc/boot/Makefile for example. It might be a nice thing
> to add a .config option to optionally specify an initrd to link into
> the kernel image. Similarly, several architectures have a CONFIG_CMDLINE
> which could also do the job (see arch/ppc/config.in for example).
>
> Presumably, by doing such things you could avoid needing to use mkelfImage.

Agreed. And I would like to see that.
With the 2.4 code freeze it is too late to do that today.
Also mkelfImage gives me backwards compatibility for now.

Eric

2000-11-16 20:00:27

by Eric W. Biederman

Subject: Re: Q: Linux rebooting directly into linux.

Werner Almesberger <[email protected]> writes:

> Eric W. Biederman wrote:
> > There are a couple of differences.
> > But the big one is I'm trying to do it right.
>
> So why do you need a file-based interface then ? ;-)

When possible it is nice to set as much policy as possible,
without removing functionality.

> Since this is a highly privileged operation anyway, you may as
> well trust user space to use the right data format ...

Hmm. I hadn't thought of it from that angle.
I don't think I have much code tied up in format checks
so I'm not too worried. If something goes wrong it is simply
a question of where it will crash. Doing some checking simply
allows for better debugging of problems :)

One thing I'm going to have to consider though is if the memory
regions that the new kernel is going into are actually memory. The
pro argument is that checking for reserved areas of memory catches
changes to an architecture that were unexpected. The recent issues
with the extended BIOS area growing are a good example of this.
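
A small sketch of the kind of check being weighed here, assuming the map of
usable RAM is available as a list of non-overlapping (start, end) ranges with
the end exclusive. The function name and the example map are hypothetical and
are not taken from the patch.

```python
def in_usable_ram(start, length, ram_ranges):
    """True if [start, start + length) lies entirely inside one usable range."""
    end = start + length
    return any(lo <= start and end <= hi for lo, hi in ram_ranges)


# Hypothetical map: conventional memory below the extended BIOS data area,
# then extended memory starting at 1MB.
ram = [(0x00000000, 0x0009F000), (0x00100000, 0x08000000)]

# Hypothetical load destinations as (physical address, length).
destinations = [(0x00100000, 0x00200000), (0x0009E000, 0x00002000)]
rejected = [d for d in destinations if not in_usable_ram(d[0], d[1], ram)]
```

The second destination is rejected because it runs past the end of
conventional memory in this map, which is what would happen if the reserved
area below 640KB grew without the image being rebuilt.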


> I get the impression that you incur quite a lot of overhead just
> to make it fit with the exec interface. I agree that it's
> conceptually nice, and it looks cleanly done, but I don't quite
> see the practical value. (Except, perhaps, that this allowed you
> to pick the rather cute name "kexec" ;-)

Well there is that. Somehow implementing scatter/gather from
a user space process seemed like a potential mess, and extra work.

In part I am starting with a network boot loader, so building
a file format that works was needed anyway. As far as overhead, my
impression is that there is none in speed, and only one or two extra
functions in space.

> > Additionally mine is the only one that has a real chance of booting
> > a non-linux kernel.
>
> Hmm, I think all approaches could boot a non-Linux kernel, but ...

bootimg is close.

I was thinking a couple of directions here.
- Mine is the only interface that can boot a non-Linux kernel
natively. Bootimg doesn't count because it doesn't do anything
natively :)

In particular every other boot loader passes the nasty empty zero page
to the new kernel. Definitely requiring a chain loader.

With an OS neutral format, cataloging the non-probable hardware
details, and providing those details in an extensible format, I gain a lot
in easy extensibility.

I need to find time soon and write up all of the file format details
in an RFC like the GRUB multiboot spec. Possibly even submit it
to the IETF as an RFC for compatible booting and multiple platforms.

And this raises an important point. Lazy programmers tend to go
with whatever is easiest. Having a good file format, making this
the easy case, should reduce the number of formats supported
and increase boot interoperability. Most of what was said
on this score with GRUB I agree with. I would even be following
the GRUB multiboot spec except it doesn't allow passing of the
unprobeable hardware details and it doesn't allow easy expansion of
what it does pass. This is the big reason I'm not in favor
of the bootimg approach, that doesn't define anything.


> As far as loading is concerned, bootimg probably has an advantage
> there, because you can put things together in memory (e.g. some
> OS-specific chain loader), without going to secondary storage.

Well, ramfs is hardly secondary storage, though it has
a touch more overhead. And you only need to do this for the
uncommon case. Getting images to adapt to a specific bootloader
isn't too hard. Every other boot loader in the world does it.

> (Proof of concept: bootimg is able to load all currently supported
> kernel image formats on ia32.)

I do concede that bootimg has this ability as well in theory.

I actually have booted multiboot compliant images in an earlier
version of my patch and the cost to support both formats in a kernel
loader is negligible. My mkelfImage builds linux kernels that
support being booted both ways.

> As far as execution is concerned, you're probably slightly better
> off with an approach that goes back to real mode. (Or use a chain
> loader - this can be transparent to the kernel.) But then, I'm not
> sure if you can re-animate the BIOS in any consistent way, so your
> choice of operating systems may be quite limited, or you have to
> provide your own BIOS substitute.

Agreed, if the goal is to boot code that is designed to start with a single
sector loaded at 0x7c00. If I really care I might worry about that.
Since linux preserves the first page of memory, which includes the
interrupt table, reanimating the BIOS might not be so bad.

My primary non-linux target are the BSD's, and various experimental
OS's. And in those cases why go to the pain of dropping out of
protected mode if you are going to just load back into it again.

All of what I do is colored by the fact that in my most important
environment I have no BIOS. So for me I can't reanimate the BIOS
because it isn't there. Once this bullet is bitten though this
buys a lot. I can now write a multiplatform boot loader, with
sophisticated features.

> Concerning complexity, you don't need to use assembler for the
> copying (arch/i386/kernel/relocate_kernel.S), see bootimg,
> kernel/bootimg_pic.c

I don't doubt that you can build code that works. I have
yet to be convinced that the code is safe.
... Thinking ...
Compiling the code in its own file and putting it in its own section
of the kernel for size would probably do it though. Being sure
the code is PIC is a little tricky though.

>
> Also, why did you implement your own memory management in
> fs/kexec.c:kimage_get_chunk ?

That really isn't memory management because no actual memory is
allocated. All that does is find an area of memory < 4GB that is not
reserved and where no other page is going to be placed.
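
To make the idea concrete, here is a sketch of such a search over a list of
usable ranges rather than mem_map. It is not kimage_get_chunk itself; the
function name, the range representation and the example values are
assumptions made for illustration.

```python
PAGE = 4096
LIMIT = 1 << 32  # stay below 4GB


def find_chunk(size, usable, taken):
    """Return the lowest page-aligned address below 4GB where `size` bytes fit
    inside a usable range without overlapping any range in `taken`, or None."""
    for lo, hi in sorted(usable):
        hi = min(hi, LIMIT)
        addr = (lo + PAGE - 1) // PAGE * PAGE
        while addr + size <= hi:
            clash = next(((t_lo, t_hi) for t_lo, t_hi in taken
                          if addr < t_hi and t_lo < addr + size), None)
            if clash is None:
                return addr
            # Skip past the conflicting range, keeping page alignment.
            addr = (clash[1] + PAGE - 1) // PAGE * PAGE
    return None


# Hypothetical usable RAM and the destinations already claimed by the image.
usable = [(0x1000, 0x9F000), (0x100000, 0x800000)]
taken = [(0x100000, 0x400000)]
small = find_chunk(0x10000, usable, taken)
large = find_chunk(0x100000, usable, taken)
```

Nothing is allocated here either: the function only reports an address that
is neither reserved nor about to be overwritten by the new image.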

In retrospect I probably should be looking through the map of memory
I'm going to provide to the new image and not mem_map. As with most
code there is certainly room for improvement.

Eric

2000-11-19 02:55:06

by Werner Almesberger

Subject: Re: Q: Linux rebooting directly into linux.

Eric W. Biederman wrote:
> Well there is that. Somehow implementing scatter/gather from
> a user space process seemed like a potential mess, and extra work.

Did you look at kiobufs ? I think they may just have the right
functionality. I always wanted bootimg to be able to memory-map things
to reduce memory pressure, and it seems now all the ingredients are in
place. Your file-based approach could probably use brw_kiovec.

> I need to find time soon and write up all of the file format details
> in an RFC like the GRUB multiboot spec. Possibly even submit it
> to the IETF as an RFC for compatible booting and multiple platforms.

Hmm, if you succeed in selling the format as an integral part of your
network boot protocol, this may even work ;-)

> This is the big reason I'm not in favor
> of the bootimg approach, that doesn't define anything.

Oh, it does - but the policy is implemented in user space. And, of
course, it's rather simple. But I'm a little confused with your
UBE. It only seems to copy the e820 information, so you still seem
to rely on e.g. the SMP tables the BIOS stores in memory. Also, I
don't quite see where you're using the saved information. What am
I missing ?

However, parameter passing like UBE may solve the following two
potential problems:

- kernel 1 copies tables marked by "magic" numbers in memory,
then boots kernel 2, which trips over the copy
- kernel 1 doesn't know about a table and damages it, then boots
kernel 2, which recognizes the table, and trips over it

But I think we don't need to copy or even convert the entire tables for
this. After all, any OS that boots on i386 already knows how to parse
the BIOS-provided tables, so I think it's better to directly re-use
this code than to invent a new format. A few flags or maybe a short
list should be sufficient for the problems I've described above.

> My primary non-linux target are the BSD's, and various experimental
> OS's. And in those cases why go to the pain of dropping out of
> protected mode if you are going to just load back into it again.

Yep, I fully agree.

> Compiling the code in its own file and putting it in its own section
> of the kernel for size would probably do it though.

This is exactly what bootimg does :)

> Being sure the code is PIC is a little tricky though.

Yes, for now I cheat and depend on gcc to generate code that just
happens to be PIC.

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, ICA, EPFL, CH [email protected] /
/_IN_N_032__Tel_+41_21_693_6621__Fax_+41_21_693_6610_____________________/

2000-11-19 10:58:40

by Eric W. Biederman

Subject: Re: Q: Linux rebooting directly into linux.

Werner Almesberger <[email protected]> writes:

> Eric W. Biederman wrote:
> > Well there is that. Somehow implementing scatter/gather from
> > a user space process seemed like a potential mess, and extra work.
>
> Did you look at kiobufs ? I think they may just have the right
> functionality. I always wanted bootimg to be able to memory-map things
> to reduce memory pressure, and it seems now all the ingredients are in
> place. Your file-based approach could probably use brw_kiovec.

When I looked kiobufs seemed to do a good gather but not a good scatter.
The code wasn't trivially reusable, and the structures had a lot
of overhead.

>
> > I need to find time soon and write up all of the file format details
> > in an RFC like the GRUB multiboot spec. Possibly even submit it
> > to the IETF as an RFC for compatible booting and multiple platforms.
>
> Hmm, if you succeed in selling the format as an integral part of your
> network boot protocol, this may even work ;-)

Well I'd sell it to promote interoperability. What I'm doing protocol
wise has been RFC sanctioned for years. It's just that every vendor
invents their own format. So interoperability is a problem.

>
> > This is the big reason I'm not in favor
> > of the bootimg approach, that doesn't define anything.
>
> Oh, it does - but the policy is implemented in user space. And, of
> course, it's rather simple. But I'm a little confused with your
> UBE. It only seems to copy the e820 information, so you still seem
> to rely on e.g. the SMP tables the BIOS stores in memory. Also, I
> don't quite see where you're using the saved information. What am
> I missing ?

Defining all of the parameters for the UBE is a separate issue.
It comes next in a couple of weeks.

The rebooting is done; the rest is not yet.

As for where the information is used, look in do_kexec.
Right after kimage_get_chunk which figures out where it is safe
to put the information.

> However, parameter passing like UBE may solve the following two
> potential problems:
>
> - kernel 1 copies tables marked by "magic" numbers in memory,
> then boots kernel 2, which trips over the copy
> - kernel 1 doesn't know about a table and damages it, then boots
> kernel 2, which recognizes the table, and trips over it
>
> But I think we don't need to copy or even convert the entire tables for
> this. After all, any OS that boots on i386 already knows how to parse
> the BIOS-provided tables, so I think it's better to directly re-use
> this code than to invent a new format. A few flags or maybe a short
> list should be sufficient for the problems I've described above.

I agree writing the code to understand the table may be a significant
issue. On the other hand I still think it is worth a look, being
able to unify option parsing for multiple platforms is not a small
gain, nor is getting out from short sighted vendor half standards.

Besides which most tables seem to contain a lot of information that
is probeable. Which just makes them a waste of BIOS space, and
sources of bugs.

> > My primary non-linux target are the BSD's, and various experimental
> > OS's. And in those cases why go to the pain of dropping out of
> > protected mode if you are going to just load back into it again.
>
> Yep, I fully agree.
>
> > Compiling the code in it's own file and putting it in it's own section
> > of the kernel for size would probably do it though.
>
> This is exactly what bootimg does :)
>
> > Being sure the code is PIC is a little tricky though.
>
> Yes, for now I cheat and depend on gcc to generate code that just
> happens to be PIC.

Hmm. I wonder how hard it would be to add -fPIC to the compilation
line for that file. But I'm not certain that would do what I want
in this instance...

Eric

2000-11-19 13:55:59

by Werner Almesberger

Subject: Re: Q: Linux rebooting directly into linux.

Eric W. Biederman wrote:
> The code wasn't trivially reusable, and the structures had a lot
> of overhead.

There's some overhead, but I think it's not too bad. I'll give it a
try ...

> The rebooting is done the rest is not yet.

Ah, and I already wondered where in all the APIC code you've hidden
the magic to avoid the config data clobbering issues ;-)

> I agree writing the code to understand the table may be a significant
> issue. On the other hand I still think it is worth a look, being
> able to unify option parsing for multiple platforms is not a small
> gain, nor is getting out from short sighted vendor half standards.

Well, you certainly have a point where stupid vendors and BIOS nonsense
are concerned. However, if we ignore LinuxBIOS for a moment, each
platform already has a set of configuration parameter passing conventions
imposed by the firmware. So we need to be able to handle this anyway, and
most of the information is highly platform-specific.

LinuxBIOS is a special case, because you have your own firmware. But
what you're suggesting is basically yet another parameter format, which
needs to incorporate and possibly unify much of the information
contained in all those platform-specific formats. I'm not sure it's worth
the effort.

And, besides, I think it complicates the kernel, because you either
have to add a parallel set of functions extracting and processing data
from the "native" or the UBE environment, or you have to add a converter
between "native" and UBE for each platform. Or do you have a better
plan ?

When I started with bootimg, I also thought that we'd need some
parameter passing mechanism, a bit similar to UBE (although I would
have tried to be more text-based). Then I realized that there are
actually only a few tables, and we can just keep them in memory. And
some of them need to be modified before we can re-use them. (Trivial
example: the boot command line. Video modes are a similar, although
much more complicated issue.)

> Besides which most tables seem to contain a lot of information that
> is probeable. Which just makes them a waste of BIOS space, and
> sources of bugs.

Agreed with BIOS bugs ;-) Where probing is possible, is it reliable ?
I'd take some baroque BIOS parameter table over yet another mandatory
boot command line parameter any time ...

> Hmm. I wonder how hard it would be to add -fPIC to the compilation
> line for that file. But I'm not certain that would do what I want
> in this instance...

Are there actually architectures where the compiler generates
position-dependent code even if you're careful ? (I.e. all functions
inlined, only auto variables.)

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, ICA, EPFL, CH [email protected] /
/_IN_N_032__Tel_+41_21_693_6621__Fax_+41_21_693_6610_____________________/

2000-11-19 20:53:13

by Eric W. Biederman

Subject: Re: Q: Linux rebooting directly into linux.

Werner Almesberger <[email protected]> writes:

> Eric W. Biederman wrote:
> > The code wasn't trivially reusable, and the structures had a lot
> > of overhead.
>
> There's some overhead, but I think it's not too bad. I'll give it a
> try ...
>
> > The rebooting is done the rest is not yet.
>
> Ah, and I already wondered where in all the APIC code you've hidden
> the magic to avoid the config data clobbering issues ;-)

Nope. That just comes in two parts.
The first chunk is the work on the apic so the deadlock detector
can run on UP kernels, from Ingo Molnar. The second part is my
cleanups so we leave the apic in a sane state upon reboot.

> > I agree writing the code to understand the table may be a significant
> > issue. On the other hand I still think it is worth a look, being
> > able to unify option parsing for multiple platforms is not a small
> > gain, nor is getting out from short sighted vendor half standards.
>
> Well, you certainly have a point where stupid vendors and BIOS nonsense
> are concerned. However, if we ignore LinuxBIOS for a moment, each
> platform already has a set of configuration parameter passing conventions
> imposed by the firmware. So we need to be able to handle this anyway, and
> most of the information is highly platform-specific.
>
> LinuxBIOS is a special case, because you have your own firmware. But
> what you're suggesting is basically yet another parameter format, which
> needs to incorporate and possibly unify much of the information
> contained in all those platform-specific formats. I'm not sure it's worth
> the effort.
>
> And, besides, I think it complicates the kernel, because you either
> have to add a parallel set of functions extracting and processing data
> from the "native" or the UBE environment, or you have to add a converter
> between "native" and UBE for each platform. Or do you have a better
> plan ?

My initial plan was to have two parallel table parsers: the ones we
have now, and another based on UBE. If we find the information we
need via UBE, use that. If not, fall back to the old way.
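As a rough illustration of that lookup order (UBE first, native tables
second), here is a minimal sketch in C. Everything in it is a stand-in
invented for the example: struct ube_env, struct native_tables and the
three lookup functions are hypothetical and do not come from the kexec
patches.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two information sources. */
struct ube_env {
    int valid;        /* nonzero if an environment was passed at all */
    int has_mem;      /* nonzero if it carries a memory-size entry */
    uint32_t mem_kb;
};

struct native_tables {
    uint32_t mem_kb;  /* value the existing (native) parser would find */
};

/* Hypothetical UBE parser: 0 on success, -1 if the item is absent. */
static int ube_lookup_mem(const struct ube_env *env, uint32_t *mem_kb)
{
    if (!env || !env->valid || !env->has_mem)
        return -1;
    *mem_kb = env->mem_kb;
    return 0;
}

/* Hypothetical native parser: 0 on success, -1 if no tables exist. */
static int native_lookup_mem(const struct native_tables *t, uint32_t *mem_kb)
{
    if (!t)
        return -1;
    *mem_kb = t->mem_kb;
    return 0;
}

/* Prefer UBE; fall back to the old way only when UBE has no answer. */
static int lookup_mem(const struct ube_env *env,
                      const struct native_tables *t, uint32_t *mem_kb)
{
    if (ube_lookup_mem(env, mem_kb) == 0)
        return 0;
    return native_lookup_mem(t, mem_kb);
}
```

The point of the shape is that callers only ever see lookup_mem, so
the native parser can stay untouched while UBE coverage grows.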

But the tables are only half of it. Right now we have all kinds
of weirdness going through the empty_zero_page at boot time.
A lot of that I plan on just gathering in UBE format instead of as
random data in random locations. Since Setup.S implements this it
should be transparent to most everything.
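To make "gathered in one format" concrete, here is a minimal sketch
of a tagged-record buffer in C, where a decoder can skip record types
it does not know by using the size field. This is only an assumed
layout for illustration: struct rec_hdr, the REC_* tags and both
functions are hypothetical and are not the actual UBE definition.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header; NOT the real UBE layout. */
struct rec_hdr {
    uint32_t type;  /* record type tag */
    uint32_t size;  /* total record size in bytes, header included */
};

#define REC_CMDLINE 1u  /* hypothetical tag values */
#define REC_MEMKB   2u

/* Append one record; returns the new used length, or 0 if it does not fit. */
static size_t rec_append(uint8_t *buf, size_t cap, size_t used,
                         uint32_t type, const void *data, uint32_t len)
{
    struct rec_hdr h;
    size_t total;

    if (len > 0xFFFFFFF0u)
        return 0;                          /* keep size within 32 bits */
    total = sizeof(h) + (((size_t)len + 3u) & ~(size_t)3u); /* pad to 4 */
    if (used > cap || total > cap - used)
        return 0;
    h.type = type;
    h.size = (uint32_t)total;
    memcpy(buf + used, &h, sizeof(h));
    if (len)
        memcpy(buf + used + sizeof(h), data, len);
    memset(buf + used + sizeof(h) + len, 0, total - sizeof(h) - len);
    return used + total;
}

/* Find the first record of a type; unknown types are skipped by size.
 * *payload_len receives the payload length including any padding. */
static const uint8_t *rec_find(const uint8_t *buf, size_t used,
                               uint32_t type, uint32_t *payload_len)
{
    size_t off = 0;

    while (used - off >= sizeof(struct rec_hdr)) {
        struct rec_hdr h;

        memcpy(&h, buf + off, sizeof(h));
        if (h.size < sizeof(h) || h.size > used - off)
            return NULL;                   /* malformed record */
        if (h.type == type) {
            *payload_len = h.size - (uint32_t)sizeof(h);
            return buf + off + sizeof(h);
        }
        off += h.size;
    }
    return NULL;
}
```

A decoder written this way keeps working when a newer producer adds
record types, because it never needs to understand a record in order
to step over it.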

But I need to see how well that works first before I'm too committed
either way.

For x86 it isn't too big of a deal. For other platforms though,
where the firmware comes in multiple flavors, converting everything
looks like it could be a real win.

I guess what I'm most after is improving the linux BIOS abstraction layer.
We mostly have one, and only do BIOS calls before really starting the
kernel (except for some stupid BIOS standards like APM).

> When I started with bootimg, I also thought that we'd need some
> parameter passing mechanism, a bit similar to UBE (although I would
> have tried to be more text-based). Then I realized that there are
> actually only a few tables, and we can just keep them in memory. And
> some of them need to be modified before we can re-use them. (Trivial
> example: the boot command line. Video modes are a similar, although
> much more complicated issue.)

I agree that with tables we need to be careful. A lossy conversion
can be a real problem. The empty_zero_page is my first candidate,
and I'll see where it goes from there.

One of the uglier challenges that I've already run into is that
there are multiple tables for specifying how interrupts are routed.
(In a modern PC the irq numbers are dynamically assigned.) I would
rather have one good table than two that fight each other.

But the point is that looking through the parameters and figuring
out what works and what makes sense will take some doing, and
I'm not promising to do any more than clean up the empty_zero_page.

>
> > Besides which most tables seem to contain a lot of information that
> > is probeable. Which just makes them a waste of BIOS space, and
> > sources of bugs.
>
> Agreed with BIOS bugs ;-) Where probing is possible, is it reliable ?
> I'd take some baroque BIOS parameter table over yet another mandatory
> boot command line parameter any time ...
>
> > Hmm. I wonder how hard it would be to add -fPIC to the compilation
> > line for that file. But I'm not certain that would do what I want
> > in this instance...
>
> Are there actually architectures where the compiler generates
> position-dependent code even if you're careful ? (I.e. all functions
> inlined, only auto variables.)

I don't know yet. And since that part is machine specific, x86 is
really the only case that matters. I just don't quite trust the compiler.
But next rev I'll make certain to steal this code from bootimg.

On a normal architecture I believe that making no references to
global data should be sufficient to ensure the code is PIC. Inlines
are interesting because they aren't always inlined. To be really
certain you can specify -fPIC and then make certain to properly
fill in the offset table after relocation. But avoiding the
whole offset table issue is much better.
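For illustration, a routine written in that style might look like the
minimal C sketch below: it touches only its arguments and automatic
variables, has no static or global data, and calls nothing. The names
struct move_desc and relocate_chunks are hypothetical and are not
taken from the kexec or bootimg code.

```c
#include <stddef.h>

/* Hypothetical descriptor: one chunk to move to its final location. */
struct move_desc {
    unsigned char *dst;
    const unsigned char *src;
    size_t len;
};

/*
 * Copies each chunk using only arguments and automatic variables.
 * No globals, no static data and no function calls appear in the
 * source, so nothing here asks for an absolute address.
 */
void relocate_chunks(const struct move_desc *list, size_t count)
{
    size_t i, j;

    for (i = 0; i < count; i++) {
        unsigned char *d = list[i].dst;
        const unsigned char *s = list[i].src;

        for (j = 0; j < list[i].len; j++)
            d[j] = s[j];
    }
}
```

Even so, this only describes the source. An optimizing compiler may
still turn the inner loop into a call to memcpy, so the generated
code has to be inspected before relying on it.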

Eric