Hi,
Eric Biederman implemented relocatable kernel support for x86 and x86_64
and posted the patches for comments almost two months back.
http://marc.theaimsgroup.com/?l=linux-kernel&m=115443019026302&w=2
We have been testing the patches in RHEL kernels since then and things are
looking up. I think it is time for the patches to be included in -mm so that
they get more testing and the remaining issues get sorted out.
Eric is currently held up with other things, so I have taken his patches
and forward ported them to 2.6.18-git19. I did some minor cleanups and fixed a
couple of bugs found in our testing. I have also accommodated the review
comments received last time.
Currently this patch series is only for i386. I will be posting the patches
for x86_64 later.
Following is a brief account of the changes I have made since last time.
o Forward ported the patches to 2.6.18-git19
o Replaced CONFIG_PHYSICAL_START with CONFIG_PHYSICAL_ALIGN
o Added a patch to prevent section relative symbols becoming absolute
for sections with zero size.
o Dropped support for serial output debugging in decompressor code for
the time being as per the discussion last time.
o Added symbol _text to a few architectures so that compilation does not break
o Fixed a typo in a config option (CONFIG_RELOCTABLE). It was causing the wrong
piece of code to be executed, and the second kernel was stomping over the
first kernel's data.
o Modified __pa_symbol() definition as per Andi's comment.
o Put --emit-relocs under #ifdef, so that relocation sections are not
retained if CONFIG_RELOCATABLE is not set.
o Made symbol __end_rodata also section relative.
o Aligned the .data section to a 4K boundary; otherwise the data segment is
loaded at a non-4K-aligned address, and kexec-tools checks for this.
Following is the text from Eric's last post, as a reminder of why we need
a relocatable kernel.
The problem:
We can't always run the kernel at 1MB or 2MB, and so people who need
different addresses must build multiple kernels. The bzImage format
can't even represent loading a kernel at other than its default address.
With kexec on panic now starting to be used by distros, having a kernel
not running at the default load address is starting to become common.
The goal of this patch series is to build kernels that are relocatable
at run time, and to extend the bzImage format to make it capable of
expressing a relocatable kernel.
In extending the bzImage format I am replacing the existing unused bootsector
with an ELF header. To express what is going on the ELF header will
have type ET_DYN. Just like the kernel loading an ET_DYN executable,
bootloaders are not expected to process relocations. But the executable
may be shifted in the address space so long as its alignment requirements
are met.
The i386 kernel is built to process relocations generated with --emit-relocs
(after vmlinux.lds.S has been fixed up to sort out static and dynamic
relocations).
Thanks
Vivek
o CONFIG_PHYSICAL_START is now being replaced with CONFIG_PHYSICAL_ALIGN.
Hardcoding the kernel physical start value creates a problem in the
relocatable kernel context because of boot loader limitations. For example,
if somebody compiles a relocatable kernel to run from address 4MB, the
kernel will still run from 1MB: grub loads the kernel at physical address
1MB, and a relocatable kernel runs from the address it has been loaded at.
So somebody wanting to run the kernel from a 4MB-aligned location (for
improved performance) can't do that.
o Hence, Eric proposed that CONFIG_PHYSICAL_ALIGN probably makes more
sense in the relocatable kernel context. At run time the kernel will move
itself to a physical address which meets the user-specified alignment
restriction.
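To make the alignment arithmetic concrete, here is a minimal user-space sketch of the round-up used by this patch. PHYSICAL_ALIGN and the helper names are hypothetical stand-ins for illustration only (the patch itself uses CONFIG_PHYSICAL_ALIGN and LOAD_PHYSICAL_ADDR), and the 4MB value is just an example choice:

```c
#include <stdint.h>

/* Hypothetical stand-in for CONFIG_PHYSICAL_ALIGN; must be a power of two. */
#define PHYSICAL_ALIGN 0x400000u	/* 4MB, example value only */

/* Round addr up to the next multiple of align (align is a power of two). */
static inline uint32_t align_up(uint32_t addr, uint32_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Relocatable case: the kernel runs from the boot loader's load address,
 * rounded up to the configured alignment.
 */
static inline uint32_t runtime_address(uint32_t load_addr)
{
	return align_up(load_addr, PHYSICAL_ALIGN);
}

/*
 * Non-relocatable case: the kernel is compiled for 1MB rounded up to the
 * configured alignment, mirroring the LOAD_PHYSICAL_ADDR expression.
 */
static inline uint32_t compiled_address(void)
{
	return align_up(0x100000u, PHYSICAL_ALIGN);
}
```

With a 4MB alignment, a kernel loaded by grub at 1MB would run from 4MB in both the relocatable and the non-relocatable case.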
Signed-off-by: Vivek Goyal <[email protected]>
---
arch/i386/Kconfig | 33 ++++++++++++++++++---------------
arch/i386/boot/compressed/head.S | 26 ++++++++++++++------------
arch/i386/boot/compressed/misc.c | 7 ++++---
arch/i386/kernel/vmlinux.lds.S | 3 ++-
include/asm-i386/boot.h | 6 +++++-
5 files changed, 43 insertions(+), 32 deletions(-)
diff -puN arch/i386/Kconfig~i386-implement-config-physical-align-option arch/i386/Kconfig
--- linux-2.6.18-git17/arch/i386/Kconfig~i386-implement-config-physical-align-option 2006-10-02 14:21:56.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/Kconfig 2006-10-02 14:21:56.000000000 -0400
@@ -785,23 +785,26 @@ config RELOCATABLE
must live at a different physical address than the primary
kernel.
-config PHYSICAL_START
- hex "Physical address where the kernel is loaded" if (EMBEDDED || CRASH_DUMP)
-
- default "0x1000000" if CRASH_DUMP
+config PHYSICAL_ALIGN
+ hex "Alignment value to which kernel should be aligned"
default "0x100000"
+ range 0x2000 0x400000
help
- This gives the physical address where the kernel is loaded. Normally
- for regular kernels this value is 0x100000 (1MB). But in the case
- of kexec on panic the fail safe kernel needs to run at a different
- address than the panic-ed kernel. This option is used to set the load
- address for kernels used to capture crash dump on being kexec'ed
- after panic. The default value for crash dump kernels is
- 0x1000000 (16MB). This can also be set based on the "X" value as
- specified in the "crashkernel=YM@XM" command line boot parameter
- passed to the panic-ed kernel. Typically this parameter is set as
- crashkernel=64M@16M. Please take a look at
- Documentation/kdump/kdump.txt for more details about crash dumps.
+ This value puts the alignment restrictions on physical address
+ where kernel is loaded and run from. Kernel is compiled for an
+ address which meets above alignment restriction.
+
+ If bootloader loads the kernel at a non-aligned address and
+ CONFIG_RELOCATABLE is set, kernel will move itself to nearest
+ address aligned to above value and run from there.
+
+ If bootloader loads the kernel at a non-aligned address and
+ CONFIG_RELOCATABLE is not set, kernel will ignore the run time
+ load address and decompress itself to the address it has been
+ compiled for and run from there. The address for which kernel is
+ compiled already meets above alignment restrictions. Hence the
+ end result is that kernel runs from a physical address meeting
+ above alignment restrictions.
Don't change this unless you know what you are doing.
diff -puN include/asm-i386/boot.h~i386-implement-config-physical-align-option include/asm-i386/boot.h
--- linux-2.6.18-git17/include/asm-i386/boot.h~i386-implement-config-physical-align-option 2006-10-02 14:21:56.000000000 -0400
+++ linux-2.6.18-git17-root/include/asm-i386/boot.h 2006-10-02 14:21:56.000000000 -0400
@@ -12,4 +12,8 @@
#define EXTENDED_VGA 0xfffe /* 80x50 mode */
#define ASK_VGA 0xfffd /* ask for it at bootup */
-#endif
+/* Physical address where kernel should be loaded. */
+#define LOAD_PHYSICAL_ADDR ((0x100000 + CONFIG_PHYSICAL_ALIGN - 1) \
+ & ~(CONFIG_PHYSICAL_ALIGN - 1))
+
+#endif /* _LINUX_BOOT_H */
diff -puN arch/i386/kernel/vmlinux.lds.S~i386-implement-config-physical-align-option arch/i386/kernel/vmlinux.lds.S
--- linux-2.6.18-git17/arch/i386/kernel/vmlinux.lds.S~i386-implement-config-physical-align-option 2006-10-02 14:21:56.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/kernel/vmlinux.lds.S 2006-10-02 14:21:56.000000000 -0400
@@ -9,6 +9,7 @@
#include <asm/thread_info.h>
#include <asm/page.h>
#include <asm/cache.h>
+#include <asm/boot.h>
OUTPUT_FORMAT("elf32-i386", "elf32-i386", "elf32-i386")
OUTPUT_ARCH(i386)
@@ -22,7 +23,7 @@ PHDRS {
}
SECTIONS
{
- . = LOAD_OFFSET + CONFIG_PHYSICAL_START;
+ . = LOAD_OFFSET + LOAD_PHYSICAL_ADDR;
phys_startup_32 = startup_32 - LOAD_OFFSET;
/* read-only */
.text : AT(ADDR(.text) - LOAD_OFFSET) {
diff -puN arch/i386/boot/compressed/head.S~i386-implement-config-physical-align-option arch/i386/boot/compressed/head.S
--- linux-2.6.18-git17/arch/i386/boot/compressed/head.S~i386-implement-config-physical-align-option 2006-10-02 14:21:56.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/boot/compressed/head.S 2006-10-02 14:21:56.000000000 -0400
@@ -26,6 +26,7 @@
#include <linux/linkage.h>
#include <asm/segment.h>
#include <asm/page.h>
+#include <asm/boot.h>
.section ".text.head"
.globl startup_32
@@ -52,17 +53,17 @@ startup_32:
1: popl %ebp
subl $1b, %ebp
-/* Compute the delta between where we were compiled to run at
- * and where the code will actually run at.
+/* %ebp contains the address we are loaded at by the boot loader and %ebx
+ * contains the address where we should move the kernel image temporarily
+ * for safe in-place decompression.
*/
- /* Start with the delta to where the kernel will run at. If we are
- * a relocatable kernel this is the delta to our load address otherwise
- * this is the delta to CONFIG_PHYSICAL start.
- */
+
#ifdef CONFIG_RELOCATABLE
- movl %ebp, %ebx
+ movl %ebp, %ebx
+ addl $(CONFIG_PHYSICAL_ALIGN - 1), %ebx
+ andl $(~(CONFIG_PHYSICAL_ALIGN - 1)), %ebx
#else
- movl $(CONFIG_PHYSICAL_START - startup_32), %ebx
+ movl $LOAD_PHYSICAL_ADDR, %ebx
#endif
/* Replace the compressed data size with the uncompressed size */
@@ -94,9 +95,10 @@ startup_32:
/* Compute the kernel start address.
*/
#ifdef CONFIG_RELOCATABLE
- leal startup_32(%ebp), %ebp
+ addl $(CONFIG_PHYSICAL_ALIGN - 1), %ebp
+ andl $(~(CONFIG_PHYSICAL_ALIGN - 1)), %ebp
#else
- movl $CONFIG_PHYSICAL_START, %ebp
+ movl $LOAD_PHYSICAL_ADDR, %ebp
#endif
/*
@@ -150,8 +152,8 @@ relocated:
* and where it was actually loaded.
*/
movl %ebp, %ebx
- subl $CONFIG_PHYSICAL_START, %ebx
-
+ subl $LOAD_PHYSICAL_ADDR, %ebx
+ jz 2f /* Nothing to be done if loaded at compiled addr. */
/*
* Process relocations.
*/
diff -puN arch/i386/boot/compressed/misc.c~i386-implement-config-physical-align-option arch/i386/boot/compressed/misc.c
--- linux-2.6.18-git17/arch/i386/boot/compressed/misc.c~i386-implement-config-physical-align-option 2006-10-02 14:21:56.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/boot/compressed/misc.c 2006-10-02 14:21:56.000000000 -0400
@@ -15,6 +15,7 @@
#include <linux/screen_info.h>
#include <asm/io.h>
#include <asm/page.h>
+#include <asm/boot.h>
/* WARNING!!
* This code is compiled with -fPIC and it is relocated dynamically
@@ -361,12 +362,12 @@ asmlinkage void decompress_kernel(void *
insize = input_len;
inptr = 0;
- if (((u32)output - CONFIG_PHYSICAL_START) & 0x3fffff)
- error("Destination address not 4M aligned");
+ if ((u32)output & (CONFIG_PHYSICAL_ALIGN -1))
+ error("Destination address not CONFIG_PHYSICAL_ALIGN aligned");
if (end > ((-__PAGE_OFFSET-(512 <<20)-1) & 0x7fffffff))
error("Destination address too large");
#ifndef CONFIG_RELOCATABLE
- if ((u32)output != CONFIG_PHYSICAL_START)
+ if ((u32)output != LOAD_PHYSICAL_ADDR)
error("Wrong destination address");
#endif
_
o Add ELFOSABI_STANDALONE definition to elf.h
Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: Vivek Goyal <[email protected]>
---
include/linux/elf.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff -puN include/linux/elf.h~elf-Add-ELFOSABI_STANDALONE-to-elf.h include/linux/elf.h
--- linux-2.6.18-git17/include/linux/elf.h~elf-Add-ELFOSABI_STANDALONE-to-elf.h 2006-10-02 13:17:59.000000000 -0400
+++ linux-2.6.18-git17-root/include/linux/elf.h 2006-10-02 13:17:59.000000000 -0400
@@ -338,8 +338,9 @@ typedef struct elf64_shdr {
#define EV_CURRENT 1
#define EV_NUM 2
-#define ELFOSABI_NONE 0
-#define ELFOSABI_LINUX 3
+#define ELFOSABI_NONE 0
+#define ELFOSABI_LINUX 3
+#define ELFOSABI_STANDALONE 255
#ifndef ELF_OSABI
#define ELF_OSABI ELFOSABI_NONE
_
On x86_64 we have to be careful when calculating the physical
address of kernel symbols, both because of compiler oddities
and because the symbols live in a different range of the virtual
address space.
Having a definition of __pa_symbol that works on both x86_64 and
i386 simplifies writing code that has these kinds of dependencies
and must work on both architectures.
So this patch adds the trivial i386 __pa_symbol definition.
Added assembly magic similar to RELOC_HIDE as suggested by Andi Kleen.
Just picked it up from x86_64.
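For readers unfamiliar with the trick, the following is a user-space sketch of the macro, not the kernel header itself. The PAGE_OFFSET value is an assumption (the usual 3GB/1GB split), pa_of() is a hypothetical wrapper added only for the example, and the code relies on the GCC statement-expression and inline-asm extensions:

```c
/* Assumed value for illustration: the usual i386 3GB/1GB split. */
#define PAGE_OFFSET 0xC0000000UL

#define __pa(x) ((unsigned long)(x) - PAGE_OFFSET)

/*
 * The empty asm statement copies x into v through a register. The compiler
 * can no longer see where v came from, so it cannot make assumptions about
 * pointer arithmetic on the symbol when PAGE_OFFSET is subtracted.
 */
#define __pa_symbol(x) \
	({ unsigned long v; \
	   __asm__("" : "=r" (v) : "0" (x)); \
	   __pa(v); })

/* Hypothetical helper: physical address for a given kernel virtual address. */
static inline unsigned long pa_of(unsigned long vaddr)
{
	return __pa_symbol(vaddr);
}
```

The result is numerically the same as __pa(); only what the compiler is allowed to assume about the operand changes.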
Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: Vivek Goyal <[email protected]>
---
include/asm-i386/page.h | 6 ++++++
1 file changed, 6 insertions(+)
diff -puN include/asm-i386/page.h~i386-define-__pa_symbol include/asm-i386/page.h
--- linux-2.6.18-git17/include/asm-i386/page.h~i386-define-__pa_symbol 2006-10-02 13:17:58.000000000 -0400
+++ linux-2.6.18-git17-root/include/asm-i386/page.h 2006-10-02 14:36:32.000000000 -0400
@@ -124,6 +124,12 @@ extern int page_is_ram(unsigned long pag
#define VMALLOC_RESERVE ((unsigned long)__VMALLOC_RESERVE)
#define MAXMEM (-__PAGE_OFFSET-__VMALLOC_RESERVE)
#define __pa(x) ((unsigned long)(x)-PAGE_OFFSET)
+/* __pa_symbol should be used for C visible symbols.
+ This seems to be the official gcc blessed way to do such arithmetic. */
+#define __pa_symbol(x) \
+ ({unsigned long v; \
+ asm("" : "=r" (v) : "0" (x)); \
+ __pa(v); })
#define __va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET))
#define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT)
#ifdef CONFIG_FLATMEM
_
Ld knows about two kinds of symbols, absolute and section
relative. Section relative symbols change value
when a section is moved and absolute symbols do not.
Currently in the linker script we have several labels
marking the beginning and ending of sections that
are outside of sections, making them absolute symbols.
Having a mixture of absolute and section relative
symbols referring to the same data is currently harmless
but it is confusing.
This patch moves those labels inside their sections so that
they become section relative.
This must be done carefully as newer revs of ld do not place
symbols that appear in sections without data and instead
ld makes those symbols global :(
My ultimate goal is to build a relocatable kernel. The
safest and least intrusive technique is to generate
relocation entries so the kernel can be relocated at load
time. The only penalty would be an increase in the size
of the kernel binary. The problem is that if absolute and
relocatable symbols are not properly specified absolute symbols
will be relocated or section relative symbols won't be, which
is fatal.
The practical motivation is that when generating kernels that
will run from a reserved area for analyzing what caused
a kernel panic, it is simpler if you don't need to hard code
the physical memory location they will run at, especially
for the distributions.
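The distinction matters at relocation time. The sketch below is only an illustration of the rule described above, not the kernel's relocation code: struct sym and relocate_symbols() are hypothetical names, and the real kernel walks relocation entries rather than a symbol array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol record, for illustration only. */
struct sym {
	uint32_t addr;		/* link-time value of the symbol */
	int is_absolute;	/* non-zero if ld treats it as absolute */
};

/*
 * Shift every section relative symbol by the difference between the
 * address the kernel was compiled for and the address it runs from.
 * Absolute symbols must keep their link-time value.
 */
static inline void relocate_symbols(struct sym *syms, size_t n,
				    uint32_t compiled, uint32_t loaded)
{
	uint32_t delta = loaded - compiled;
	size_t i;

	if (delta == 0)
		return;		/* running at the compiled address */

	for (i = 0; i < n; i++) {
		if (!syms[i].is_absolute)
			syms[i].addr += delta;
	}
}
```

A label that was meant to track a section but was classified as absolute would be skipped by the loop and keep pointing at the old location, which is the fatal case mentioned above.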
Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: Vivek Goyal <[email protected]>
---
arch/i386/kernel/vmlinux.lds.S | 109 ++++++++++++++++++++------------------
include/asm-generic/vmlinux.lds.h | 4 -
2 files changed, 60 insertions(+), 53 deletions(-)
diff -puN arch/i386/kernel/vmlinux.lds.S~i386-vmlinux.lds.S-Distinguish-absolute-symbols arch/i386/kernel/vmlinux.lds.S
--- linux-2.6.18-git17/arch/i386/kernel/vmlinux.lds.S~i386-vmlinux.lds.S-Distinguish-absolute-symbols 2006-10-02 13:17:58.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/kernel/vmlinux.lds.S 2006-10-02 14:38:44.000000000 -0400
@@ -24,31 +24,32 @@ SECTIONS
. = __KERNEL_START;
phys_startup_32 = startup_32 - LOAD_OFFSET;
/* read-only */
- _text = .; /* Text and read-only data */
.text : AT(ADDR(.text) - LOAD_OFFSET) {
+ _text = .; /* Text and read-only data */
*(.text)
SCHED_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.fixup)
*(.gnu.warning)
- } :text = 0x9090
-
- _etext = .; /* End of text section */
+ _etext = .; /* End of text section */
+ } :text = 0x9090
. = ALIGN(16); /* Exception table */
- __start___ex_table = .;
- __ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) { *(__ex_table) }
- __stop___ex_table = .;
+ __ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {
+ __start___ex_table = .;
+ *(__ex_table)
+ __stop___ex_table = .;
+ }
RODATA
. = ALIGN(4);
- __tracedata_start = .;
.tracedata : AT(ADDR(.tracedata) - LOAD_OFFSET) {
+ __tracedata_start = .;
*(.tracedata)
+ __tracedata_end = .;
}
- __tracedata_end = .;
/* writeable */
.data : AT(ADDR(.data) - LOAD_OFFSET) { /* Data */
@@ -57,10 +58,12 @@ SECTIONS
} :data
. = ALIGN(4096);
- __nosave_begin = .;
- .data_nosave : AT(ADDR(.data_nosave) - LOAD_OFFSET) { *(.data.nosave) }
- . = ALIGN(4096);
- __nosave_end = .;
+ .data_nosave : AT(ADDR(.data_nosave) - LOAD_OFFSET) {
+ __nosave_begin = .;
+ *(.data.nosave)
+ . = ALIGN(4096);
+ __nosave_end = .;
+ }
. = ALIGN(4096);
.data.page_aligned : AT(ADDR(.data.page_aligned) - LOAD_OFFSET) {
@@ -74,8 +77,10 @@ SECTIONS
/* rarely changed data like cpu maps */
. = ALIGN(32);
- .data.read_mostly : AT(ADDR(.data.read_mostly) - LOAD_OFFSET) { *(.data.read_mostly) }
- _edata = .; /* End of data section */
+ .data.read_mostly : AT(ADDR(.data.read_mostly) - LOAD_OFFSET) {
+ *(.data.read_mostly)
+ _edata = .; /* End of data section */
+ }
#ifdef CONFIG_STACK_UNWIND
. = ALIGN(4);
@@ -93,39 +98,41 @@ SECTIONS
/* might get freed after init */
. = ALIGN(4096);
- __smp_alt_begin = .;
- __smp_alt_instructions = .;
.smp_altinstructions : AT(ADDR(.smp_altinstructions) - LOAD_OFFSET) {
+ __smp_alt_begin = .;
+ __smp_alt_instructions = .;
*(.smp_altinstructions)
+ __smp_alt_instructions_end = .;
}
- __smp_alt_instructions_end = .;
. = ALIGN(4);
- __smp_locks = .;
.smp_locks : AT(ADDR(.smp_locks) - LOAD_OFFSET) {
+ __smp_locks = .;
*(.smp_locks)
+ __smp_locks_end = .;
}
- __smp_locks_end = .;
.smp_altinstr_replacement : AT(ADDR(.smp_altinstr_replacement) - LOAD_OFFSET) {
*(.smp_altinstr_replacement)
+ . = ALIGN(4096);
+ __smp_alt_end = .;
}
- . = ALIGN(4096);
- __smp_alt_end = .;
/* will be freed after init */
. = ALIGN(4096); /* Init code and data */
- __init_begin = .;
.init.text : AT(ADDR(.init.text) - LOAD_OFFSET) {
+ __init_begin = .;
_sinittext = .;
*(.init.text)
_einittext = .;
}
.init.data : AT(ADDR(.init.data) - LOAD_OFFSET) { *(.init.data) }
. = ALIGN(16);
- __setup_start = .;
- .init.setup : AT(ADDR(.init.setup) - LOAD_OFFSET) { *(.init.setup) }
- __setup_end = .;
- __initcall_start = .;
+ .init.setup : AT(ADDR(.init.setup) - LOAD_OFFSET) {
+ __setup_start = .;
+ *(.init.setup)
+ __setup_end = .;
+ }
.initcall.init : AT(ADDR(.initcall.init) - LOAD_OFFSET) {
+ __initcall_start = .;
*(.initcall1.init)
*(.initcall2.init)
*(.initcall3.init)
@@ -133,20 +140,20 @@ SECTIONS
*(.initcall5.init)
*(.initcall6.init)
*(.initcall7.init)
+ __initcall_end = .;
}
- __initcall_end = .;
- __con_initcall_start = .;
.con_initcall.init : AT(ADDR(.con_initcall.init) - LOAD_OFFSET) {
+ __con_initcall_start = .;
*(.con_initcall.init)
+ __con_initcall_end = .;
}
- __con_initcall_end = .;
SECURITY_INIT
. = ALIGN(4);
- __alt_instructions = .;
.altinstructions : AT(ADDR(.altinstructions) - LOAD_OFFSET) {
+ __alt_instructions = .;
*(.altinstructions)
+ __alt_instructions_end = .;
}
- __alt_instructions_end = .;
.altinstr_replacement : AT(ADDR(.altinstr_replacement) - LOAD_OFFSET) {
*(.altinstr_replacement)
}
@@ -155,32 +162,32 @@ SECTIONS
.exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) { *(.exit.text) }
.exit.data : AT(ADDR(.exit.data) - LOAD_OFFSET) { *(.exit.data) }
. = ALIGN(4096);
- __initramfs_start = .;
- .init.ramfs : AT(ADDR(.init.ramfs) - LOAD_OFFSET) { *(.init.ramfs) }
- __initramfs_end = .;
+ .init.ramfs : AT(ADDR(.init.ramfs) - LOAD_OFFSET) {
+ __initramfs_start = .;
+ *(.init.ramfs)
+ __initramfs_end = .;
+ }
. = ALIGN(L1_CACHE_BYTES);
- __per_cpu_start = .;
- .data.percpu : AT(ADDR(.data.percpu) - LOAD_OFFSET) { *(.data.percpu) }
- __per_cpu_end = .;
+ .data.percpu : AT(ADDR(.data.percpu) - LOAD_OFFSET) {
+ __per_cpu_start = .;
+ *(.data.percpu)
+ __per_cpu_end = .;
+ }
. = ALIGN(4096);
- __init_end = .;
/* freed after init ends here */
- __bss_start = .; /* BSS */
- .bss.page_aligned : AT(ADDR(.bss.page_aligned) - LOAD_OFFSET) {
- *(.bss.page_aligned)
- }
.bss : AT(ADDR(.bss) - LOAD_OFFSET) {
+ __init_end = .;
+ __bss_start = .; /* BSS */
+ *(.bss.page_aligned)
*(.bss)
+ . = ALIGN(4);
+ __bss_stop = .;
+ _end = . ;
+ /* This is where the kernel creates the early boot page tables */
+ . = ALIGN(4096);
+ pg0 = . ;
}
- . = ALIGN(4);
- __bss_stop = .;
-
- _end = . ;
-
- /* This is where the kernel creates the early boot page tables */
- . = ALIGN(4096);
- pg0 = .;
/* Sections to be discarded */
/DISCARD/ : {
diff -puN include/asm-generic/vmlinux.lds.h~i386-vmlinux.lds.S-Distinguish-absolute-symbols include/asm-generic/vmlinux.lds.h
--- linux-2.6.18-git17/include/asm-generic/vmlinux.lds.h~i386-vmlinux.lds.S-Distinguish-absolute-symbols 2006-10-02 13:17:58.000000000 -0400
+++ linux-2.6.18-git17-root/include/asm-generic/vmlinux.lds.h 2006-10-02 13:17:58.000000000 -0400
@@ -11,8 +11,8 @@
#define RODATA \
. = ALIGN(4096); \
- __start_rodata = .; \
.rodata : AT(ADDR(.rodata) - LOAD_OFFSET) { \
+ VMLINUX_SYMBOL(__start_rodata) = .; \
*(.rodata) *(.rodata.*) \
*(__vermagic) /* Kernel version magic */ \
} \
@@ -124,8 +124,8 @@
VMLINUX_SYMBOL(__start___param) = .; \
*(__param) \
VMLINUX_SYMBOL(__stop___param) = .; \
+ VMLINUX_SYMBOL(__end_rodata) = .; \
} \
- __end_rodata = .; \
. = ALIGN(4096);
#define SECURITY_INIT \
_
Print the addresses of non-absolute symbols relative to _text
so that ld will generate relocations, allowing a relocatable
kernel to relocate them. We can't actually use the symbol names
because kallsyms includes static symbols that are not exported
from their object files.
Add the _text symbol definition to the architectures which don't
define it; otherwise the linker will fail.
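As a rough illustration of the output format, here is a standalone sketch of how one table entry is formatted. format_kallsyms_entry() is a hypothetical helper written for this example; the actual change is the loop in write_src() in the diff below, which prints directly with printf().

```c
#include <ctype.h>
#include <stdio.h>

/*
 * Format one kallsyms_addresses entry into buf. 'type' is the nm symbol
 * type character; 'A'/'a' marks an absolute symbol, which is emitted as a
 * plain constant. Everything else is emitted as an offset from _text so
 * that the assembler produces a relocation against _text.
 */
static inline int format_kallsyms_entry(char *buf, size_t len, char type,
					unsigned long long addr,
					unsigned long long text)
{
	if (toupper((unsigned char)type) != 'A')
		return snprintf(buf, len, "\tPTR\t_text + %#llx", addr - text);

	return snprintf(buf, len, "\tPTR\t%#llx", addr);
}
```

Because the non-absolute entries reference _text rather than a literal address, relocating _text moves all of them along with it.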
Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: Vivek Goyal <[email protected]>
---
arch/h8300/kernel/vmlinux.lds.S | 1 +
arch/m68knommu/kernel/vmlinux.lds.S | 1 +
arch/powerpc/kernel/vmlinux.lds.S | 1 +
arch/ppc/kernel/vmlinux.lds.S | 1 +
arch/sparc/kernel/vmlinux.lds.S | 1 +
arch/sparc64/kernel/vmlinux.lds.S | 1 +
arch/v850/kernel/vmlinux.lds.S | 1 +
scripts/kallsyms.c | 20 +++++++++++++++++---
8 files changed, 24 insertions(+), 3 deletions(-)
diff -puN scripts/kallsyms.c~kallsyms.c-Generate-relocatable-symbols scripts/kallsyms.c
--- linux-2.6.18-git17/scripts/kallsyms.c~kallsyms.c-Generate-relocatable-symbols 2006-10-02 13:17:59.000000000 -0400
+++ linux-2.6.18-git17-root/scripts/kallsyms.c 2006-10-02 13:17:59.000000000 -0400
@@ -43,7 +43,7 @@ struct sym_entry {
static struct sym_entry *table;
static unsigned int table_size, table_cnt;
-static unsigned long long _stext, _etext, _sinittext, _einittext, _sextratext, _eextratext;
+static unsigned long long _text, _stext, _etext, _sinittext, _einittext, _sextratext, _eextratext;
static int all_symbols = 0;
static char symbol_prefix_char = '\0';
@@ -91,7 +91,9 @@ static int read_symbol(FILE *in, struct
sym++;
/* Ignore most absolute/undefined (?) symbols. */
- if (strcmp(sym, "_stext") == 0)
+ if (strcmp(sym, "_text") == 0)
+ _text = s->addr;
+ else if (strcmp(sym, "_stext") == 0)
_stext = s->addr;
else if (strcmp(sym, "_etext") == 0)
_etext = s->addr;
@@ -265,9 +267,21 @@ static void write_src(void)
printf(".data\n");
+ /* Provide proper symbols relocatability by their '_text'
+ * relativeness. The symbol names cannot be used to construct
+ * normal symbol references as the list of symbols contains
+ * symbols that are declared static and are private to their
+ * .o files. This prevents .tmp_kallsyms.o or any other
+ * object from referencing them.
+ */
output_label("kallsyms_addresses");
for (i = 0; i < table_cnt; i++) {
- printf("\tPTR\t%#llx\n", table[i].addr);
+ if (toupper(table[i].sym[0]) != 'A') {
+ printf("\tPTR\t_text + %#llx\n",
+ table[i].addr - _text);
+ } else {
+ printf("\tPTR\t%#llx\n", table[i].addr);
+ }
}
printf("\n");
diff -puN arch/h8300/kernel/vmlinux.lds.S~kallsyms.c-Generate-relocatable-symbols arch/h8300/kernel/vmlinux.lds.S
--- linux-2.6.18-git17/arch/h8300/kernel/vmlinux.lds.S~kallsyms.c-Generate-relocatable-symbols 2006-10-02 13:17:59.000000000 -0400
+++ linux-2.6.18-git17-root/arch/h8300/kernel/vmlinux.lds.S 2006-10-02 13:17:59.000000000 -0400
@@ -70,6 +70,7 @@ SECTIONS
#endif
.text :
{
+ _text = .;
#if defined(CONFIG_ROMKERNEL)
*(.int_redirect)
#endif
diff -puN arch/m68knommu/kernel/vmlinux.lds.S~kallsyms.c-Generate-relocatable-symbols arch/m68knommu/kernel/vmlinux.lds.S
--- linux-2.6.18-git17/arch/m68knommu/kernel/vmlinux.lds.S~kallsyms.c-Generate-relocatable-symbols 2006-10-02 13:17:59.000000000 -0400
+++ linux-2.6.18-git17-root/arch/m68knommu/kernel/vmlinux.lds.S 2006-10-02 13:17:59.000000000 -0400
@@ -60,6 +60,7 @@ SECTIONS {
#endif
.text : {
+ _text = .;
_stext = . ;
*(.text)
SCHED_TEXT
diff -puN arch/powerpc/kernel/vmlinux.lds.S~kallsyms.c-Generate-relocatable-symbols arch/powerpc/kernel/vmlinux.lds.S
--- linux-2.6.18-git17/arch/powerpc/kernel/vmlinux.lds.S~kallsyms.c-Generate-relocatable-symbols 2006-10-02 13:17:59.000000000 -0400
+++ linux-2.6.18-git17-root/arch/powerpc/kernel/vmlinux.lds.S 2006-10-02 13:17:59.000000000 -0400
@@ -33,6 +33,7 @@ SECTIONS
/* Text and gots */
.text : {
+ _text = .;
*(.text .text.*)
SCHED_TEXT
LOCK_TEXT
diff -puN arch/ppc/kernel/vmlinux.lds.S~kallsyms.c-Generate-relocatable-symbols arch/ppc/kernel/vmlinux.lds.S
--- linux-2.6.18-git17/arch/ppc/kernel/vmlinux.lds.S~kallsyms.c-Generate-relocatable-symbols 2006-10-02 13:17:59.000000000 -0400
+++ linux-2.6.18-git17-root/arch/ppc/kernel/vmlinux.lds.S 2006-10-02 13:17:59.000000000 -0400
@@ -31,6 +31,7 @@ SECTIONS
.plt : { *(.plt) }
.text :
{
+ _text = .;
*(.text)
SCHED_TEXT
LOCK_TEXT
diff -puN arch/sparc64/kernel/vmlinux.lds.S~kallsyms.c-Generate-relocatable-symbols arch/sparc64/kernel/vmlinux.lds.S
--- linux-2.6.18-git17/arch/sparc64/kernel/vmlinux.lds.S~kallsyms.c-Generate-relocatable-symbols 2006-10-02 13:17:59.000000000 -0400
+++ linux-2.6.18-git17-root/arch/sparc64/kernel/vmlinux.lds.S 2006-10-02 13:17:59.000000000 -0400
@@ -13,6 +13,7 @@ SECTIONS
. = 0x4000;
.text 0x0000000000404000 :
{
+ _text = .;
*(.text)
SCHED_TEXT
LOCK_TEXT
diff -puN arch/sparc/kernel/vmlinux.lds.S~kallsyms.c-Generate-relocatable-symbols arch/sparc/kernel/vmlinux.lds.S
--- linux-2.6.18-git17/arch/sparc/kernel/vmlinux.lds.S~kallsyms.c-Generate-relocatable-symbols 2006-10-02 13:17:59.000000000 -0400
+++ linux-2.6.18-git17-root/arch/sparc/kernel/vmlinux.lds.S 2006-10-02 13:17:59.000000000 -0400
@@ -11,6 +11,7 @@ SECTIONS
. = 0x10000 + SIZEOF_HEADERS;
.text 0xf0004000 :
{
+ _text = .;
*(.text)
SCHED_TEXT
LOCK_TEXT
diff -puN arch/v850/kernel/vmlinux.lds.S~kallsyms.c-Generate-relocatable-symbols arch/v850/kernel/vmlinux.lds.S
--- linux-2.6.18-git17/arch/v850/kernel/vmlinux.lds.S~kallsyms.c-Generate-relocatable-symbols 2006-10-02 13:17:59.000000000 -0400
+++ linux-2.6.18-git17-root/arch/v850/kernel/vmlinux.lds.S 2006-10-02 13:17:59.000000000 -0400
@@ -90,6 +90,7 @@
/* Kernel text segment, and some constant data areas. */
#define TEXT_CONTENTS \
+ _text = .; \
__stext = . ; \
*(.text) \
SCHED_TEXT \
_
o The relocation patches for i386 moved the symbols in vmlinux.lds.S inside
sections so that these symbols become section relative and are no longer
absolute. If these symbols become absolute, it's bad, as they are not
relocated if the kernel is not loaded at the address it has been compiled
for.
o Ironically, just moving the symbols inside a section does not
guarantee that they will not become absolute. Recent versions of the
linker do some optimization: if a section's size is zero, the linker
gets rid of the section and makes any symbol defined in it absolute.
o This leads to a failure while the second kernel is booting.
arch/i386/alternative.c frees any pages present between __smp_alt_begin
and __smp_alt_end. In my case the size of section .smp_altinstructions is
zero, so the symbol __smp_alt_begin becomes absolute and is not relocated,
and the system crashes while it is trying to free the memory starting
from __smp_alt_begin.
o This issue is being fixed by the linker guys and they are making sure
that the linker does not get rid of an empty section if there is any
section relative symbol defined in it. But we need to fix it at the
kernel level too so that people using a linker version without the fix
are not affected.
o One possible solution is to force the section size to be
non-zero so that these symbols don't become absolute. This
patch implements that.
Signed-off-by: Vivek Goyal <[email protected]>
---
arch/i386/kernel/vmlinux.lds.S | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff -puN arch/i386/kernel/vmlinux.lds.S~i386-reloc-non-zero-size-section arch/i386/kernel/vmlinux.lds.S
--- linux-2.6.18-git17/arch/i386/kernel/vmlinux.lds.S~i386-reloc-non-zero-size-section 2006-10-02 13:17:58.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/kernel/vmlinux.lds.S 2006-10-02 14:36:32.000000000 -0400
@@ -40,6 +40,7 @@ SECTIONS
__start___ex_table = .;
*(__ex_table)
__stop___ex_table = .;
+ LONG(0)
}
RODATA
@@ -49,6 +50,7 @@ SECTIONS
__tracedata_start = .;
*(.tracedata)
__tracedata_end = .;
+ LONG(0)
}
/* writeable */
@@ -64,6 +66,7 @@ SECTIONS
*(.data.nosave)
. = ALIGN(4096);
__nosave_end = .;
+ LONG(0)
}
. = ALIGN(4096);
@@ -81,6 +84,7 @@ SECTIONS
.data.read_mostly : AT(ADDR(.data.read_mostly) - LOAD_OFFSET) {
*(.data.read_mostly)
_edata = .; /* End of data section */
+ LONG(0)
}
#ifdef CONFIG_STACK_UNWIND
@@ -89,6 +93,7 @@ SECTIONS
__start_unwind = .;
*(.eh_frame)
__end_unwind = .;
+ LONG(0)
}
#endif
@@ -104,17 +109,20 @@ SECTIONS
__smp_alt_instructions = .;
*(.smp_altinstructions)
__smp_alt_instructions_end = .;
+ LONG(0)
}
. = ALIGN(4);
.smp_locks : AT(ADDR(.smp_locks) - LOAD_OFFSET) {
__smp_locks = .;
*(.smp_locks)
__smp_locks_end = .;
+ LONG(0)
}
.smp_altinstr_replacement : AT(ADDR(.smp_altinstr_replacement) - LOAD_OFFSET) {
*(.smp_altinstr_replacement)
. = ALIGN(4096);
__smp_alt_end = .;
+ LONG(0)
}
/* will be freed after init */
@@ -124,6 +132,7 @@ SECTIONS
_sinittext = .;
*(.init.text)
_einittext = .;
+ LONG(0)
}
.init.data : AT(ADDR(.init.data) - LOAD_OFFSET) { *(.init.data) }
. = ALIGN(16);
@@ -131,6 +140,7 @@ SECTIONS
__setup_start = .;
*(.init.setup)
__setup_end = .;
+ LONG(0)
}
.initcall.init : AT(ADDR(.initcall.init) - LOAD_OFFSET) {
__initcall_start = .;
@@ -142,11 +152,13 @@ SECTIONS
*(.initcall6.init)
*(.initcall7.init)
__initcall_end = .;
+ LONG(0)
}
.con_initcall.init : AT(ADDR(.con_initcall.init) - LOAD_OFFSET) {
__con_initcall_start = .;
*(.con_initcall.init)
__con_initcall_end = .;
+ LONG(0)
}
SECURITY_INIT
. = ALIGN(4);
@@ -154,6 +166,7 @@ SECTIONS
__alt_instructions = .;
*(.altinstructions)
__alt_instructions_end = .;
+ LONG(0)
}
.altinstr_replacement : AT(ADDR(.altinstr_replacement) - LOAD_OFFSET) {
*(.altinstr_replacement)
@@ -167,12 +180,14 @@ SECTIONS
__initramfs_start = .;
*(.init.ramfs)
__initramfs_end = .;
+ LONG(0)
}
. = ALIGN(L1_CACHE_BYTES);
.data.percpu : AT(ADDR(.data.percpu) - LOAD_OFFSET) {
__per_cpu_start = .;
*(.data.percpu)
__per_cpu_end = .;
+ LONG(0)
}
. = ALIGN(4096);
/* freed after init ends here */
_
Defining __PHYSICAL_START and __KERNEL_START in asm-i386/page.h works but
it triggers a full kernel rebuild for the silliest of reasons. This
modifies the users to directly use CONFIG_PHYSICAL_START and linux/config.h
which prevents the full rebuild problem, which makes the code much
more maintainer and hopefully user friendly.
Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: Vivek Goyal <[email protected]>
---
arch/i386/boot/compressed/head.S | 7 +++----
arch/i386/boot/compressed/misc.c | 8 ++++----
arch/i386/kernel/vmlinux.lds.S | 3 ++-
include/asm-i386/page.h | 3 ---
4 files changed, 9 insertions(+), 12 deletions(-)
diff -puN arch/i386/boot/compressed/head.S~i386-CONFIG_PHYSICAL_START-cleanup arch/i386/boot/compressed/head.S
--- linux-2.6.18-git17/arch/i386/boot/compressed/head.S~i386-CONFIG_PHYSICAL_START-cleanup 2006-10-02 13:17:58.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/boot/compressed/head.S 2006-10-02 14:33:44.000000000 -0400
@@ -25,7 +25,6 @@
#include <linux/linkage.h>
#include <asm/segment.h>
-#include <asm/page.h>
.globl startup_32
@@ -75,7 +74,7 @@ startup_32:
popl %esi # discard address
popl %esi # real mode pointer
xorl %ebx,%ebx
- ljmp $(__BOOT_CS), $__PHYSICAL_START
+ ljmp $(__BOOT_CS), $CONFIG_PHYSICAL_START
/*
* We come here, if we were loaded high.
@@ -100,7 +99,7 @@ startup_32:
popl %ecx # lcount
popl %edx # high_buffer_start
popl %eax # hcount
- movl $__PHYSICAL_START,%edi
+ movl $CONFIG_PHYSICAL_START,%edi
cli # make sure we don't get interrupted
ljmp $(__BOOT_CS), $0x1000 # and jump to the move routine
@@ -125,5 +124,5 @@ move_routine_start:
movsl
movl %ebx,%esi # Restore setup pointer
xorl %ebx,%ebx
- ljmp $(__BOOT_CS), $__PHYSICAL_START
+ ljmp $(__BOOT_CS), $CONFIG_PHYSICAL_START
move_routine_end:
diff -puN arch/i386/boot/compressed/misc.c~i386-CONFIG_PHYSICAL_START-cleanup arch/i386/boot/compressed/misc.c
--- linux-2.6.18-git17/arch/i386/boot/compressed/misc.c~i386-CONFIG_PHYSICAL_START-cleanup 2006-10-02 13:17:58.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/boot/compressed/misc.c 2006-10-02 14:33:44.000000000 -0400
@@ -9,11 +9,11 @@
* High loaded stuff by Hans Lermen & Werner Almesberger, Feb. 1996
*/
+#include <linux/config.h>
#include <linux/linkage.h>
#include <linux/vmalloc.h>
#include <linux/screen_info.h>
#include <asm/io.h>
-#include <asm/page.h>
/*
* gzip declarations
@@ -303,7 +303,7 @@ static void setup_normal_output_buffer(v
#else
if ((RM_ALT_MEM_K > RM_EXT_MEM_K ? RM_ALT_MEM_K : RM_EXT_MEM_K) < 1024) error("Less than 2MB of memory");
#endif
- output_data = (unsigned char *)__PHYSICAL_START; /* Normally Points to 1M */
+ output_data = (unsigned char *)CONFIG_PHYSICAL_START; /* Normally Points to 1M */
free_mem_end_ptr = (long)real_mode;
}
@@ -326,8 +326,8 @@ static void setup_output_buffer_if_we_ru
low_buffer_size = low_buffer_end - LOW_BUFFER_START;
high_loaded = 1;
free_mem_end_ptr = (long)high_buffer_start;
- if ( (__PHYSICAL_START + low_buffer_size) > ((ulg)high_buffer_start)) {
- high_buffer_start = (uch *)(__PHYSICAL_START + low_buffer_size);
+ if ( (CONFIG_PHYSICAL_START + low_buffer_size) > ((ulg)high_buffer_start)) {
+ high_buffer_start = (uch *)(CONFIG_PHYSICAL_START + low_buffer_size);
mv->hcount = 0; /* say: we need not to move high_buffer */
}
else mv->hcount = -1;
diff -puN arch/i386/kernel/vmlinux.lds.S~i386-CONFIG_PHYSICAL_START-cleanup arch/i386/kernel/vmlinux.lds.S
--- linux-2.6.18-git17/arch/i386/kernel/vmlinux.lds.S~i386-CONFIG_PHYSICAL_START-cleanup 2006-10-02 13:17:58.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/kernel/vmlinux.lds.S 2006-10-02 14:33:13.000000000 -0400
@@ -4,6 +4,7 @@
#define LOAD_OFFSET __PAGE_OFFSET
+#include <linux/config.h>
#include <asm-generic/vmlinux.lds.h>
#include <asm/thread_info.h>
#include <asm/page.h>
@@ -21,7 +22,7 @@ PHDRS {
}
SECTIONS
{
- . = __KERNEL_START;
+ . = LOAD_OFFSET + CONFIG_PHYSICAL_START;
phys_startup_32 = startup_32 - LOAD_OFFSET;
/* read-only */
.text : AT(ADDR(.text) - LOAD_OFFSET) {
diff -puN include/asm-i386/page.h~i386-CONFIG_PHYSICAL_START-cleanup include/asm-i386/page.h
--- linux-2.6.18-git17/include/asm-i386/page.h~i386-CONFIG_PHYSICAL_START-cleanup 2006-10-02 13:17:58.000000000 -0400
+++ linux-2.6.18-git17-root/include/asm-i386/page.h 2006-10-02 13:17:58.000000000 -0400
@@ -112,12 +112,9 @@ extern int page_is_ram(unsigned long pag
#ifdef __ASSEMBLY__
#define __PAGE_OFFSET CONFIG_PAGE_OFFSET
-#define __PHYSICAL_START CONFIG_PHYSICAL_START
#else
#define __PAGE_OFFSET ((unsigned long)CONFIG_PAGE_OFFSET)
-#define __PHYSICAL_START ((unsigned long)CONFIG_PHYSICAL_START)
#endif
-#define __KERNEL_START (__PAGE_OFFSET + __PHYSICAL_START)
#define PAGE_OFFSET ((unsigned long)__PAGE_OFFSET)
_
Increasingly, the cobbled-together boot protocol that
is bzImage does not have the flexibility to deal
with booting in new situations.
Now that we no longer support the bootsector loader,
we have 512 bytes at the very start of a bzImage that
we can use for other things.
Placing an ELF header there allows us to retain
a single binary for all of x86 while at the same
time describing things that bzImage does not allow
us to describe.
The existing bugger off code for warning if we attempt to
boot from the bootsector is kept, but the error message is
made terser so we have a little more room to play with.
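To make the idea concrete, a loader could recognise the new header by looking at the first 16 bytes of the image. The following is only a sketch: sketch_is_elf32_lsb is a hypothetical helper, not part of this patch or of any existing boot loader, and it hard-codes the standard e_ident values instead of pulling in an ELF header file.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if buf starts with an ELF identification
 * for a 32-bit, little-endian image, 0 otherwise. */
static int sketch_is_elf32_lsb(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (len < 16)                   /* e_ident is 16 bytes long */
        return 0;
    if (memcmp(buf, magic, sizeof(magic)) != 0)
        return 0;
    if (buf[4] != 1)                /* EI_CLASS must be ELFCLASS32 */
        return 0;
    if (buf[5] != 1)                /* EI_DATA must be ELFDATA2LSB */
        return 0;
    if (buf[6] != 1)                /* EI_VERSION must be EV_CURRENT */
        return 0;
    return 1;
}
```

A loader that does not know about ELF is unaffected by such a check and keeps using the existing bzImage fields.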
Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: Vivek Goyal <[email protected]>
---
arch/i386/boot/Makefile | 2
arch/i386/boot/bootsect.S | 94 ++++++++++++++++++
arch/i386/boot/tools/build.c | 214 ++++++++++++++++++++++++++++++++++++++-----
include/linux/elf_boot.h | 19 +++
4 files changed, 301 insertions(+), 28 deletions(-)
diff -puN arch/i386/boot/bootsect.S~i386-boot-Add-an-ELF-header-to-bzImage arch/i386/boot/bootsect.S
--- linux-2.6.18-git17/arch/i386/boot/bootsect.S~i386-boot-Add-an-ELF-header-to-bzImage 2006-10-02 14:21:59.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/boot/bootsect.S 2006-10-02 14:21:59.000000000 -0400
@@ -13,6 +13,12 @@
*
*/
+#include <linux/version.h>
+#include <linux/utsrelease.h>
+#include <linux/compile.h>
+#include <linux/elf.h>
+#include <linux/elf_boot.h>
+#include <asm/page.h>
#include <asm/boot.h>
SETUPSECTS = 4 /* default nr of setup-sectors */
@@ -42,10 +48,92 @@ SWAP_DEV = 0 /* SWAP_DEV is now writte
.global _start
_start:
+ehdr:
+ # e_ident is carefully crafted so if this is treated
+ # as an x86 bootsector you will execute through
+ # e_ident and then print the bugger off message.
+ # The one store to bx+di is unfortunate, but it is
+ # unlikely to affect the ability to print
+ # a message and you aren't supposed to be booting a
+ # bzImage directly from a floppy anyway.
+
+ # e_ident
+ .byte ELFMAG0, ELFMAG1, ELFMAG2, ELFMAG3
+ .byte ELFCLASS32, ELFDATA2LSB, EV_CURRENT, ELFOSABI_STANDALONE
+ .byte 0xeb, 0x3d, 0, 0, 0, 0, 0, 0
+#ifndef CONFIG_RELOCATABLE
+ .word ET_EXEC # e_type
+#else
+ .word ET_DYN # e_type
+#endif
+ .word EM_386 # e_machine
+ .int 1 # e_version
+ .int LOAD_PHYSICAL_ADDR # e_entry
+ .int phdr - _start # e_phoff
+ .int 0 # e_shoff
+ .int 0 # e_flags
+ .word e_ehdr - ehdr # e_ehsize
+ .word e_phdr1 - phdr # e_phentsize
+ .word (e_phdr - phdr)/(e_phdr1 - phdr) # e_phnum
+ .word 40 # e_shentsize
+ .word 0 # e_shnum
+ .word 0 # e_shstrndx
+e_ehdr:
+.org 71
+normalize:
# Normalize the start address
jmpl $BOOTSEG, $start2
+.org 80
+phdr:
+ .int PT_LOAD # p_type
+ .int (SETUPSECTS+1)*512 # p_offset
+ .int LOAD_PHYSICAL_ADDR + __PAGE_OFFSET # p_vaddr
+ .int LOAD_PHYSICAL_ADDR # p_paddr
+ .int SYSSIZE*16 # p_filesz
+ .int 0 # p_memsz
+ .int PF_R | PF_W | PF_X # p_flags
+ .int CONFIG_PHYSICAL_ALIGN # p_align
+e_phdr1:
+
+ .int PT_NOTE # p_type
+ .int b_note - _start # p_offset
+ .int 0 # p_vaddr
+ .int 0 # p_paddr
+ .int e_note - b_note # p_filesz
+ .int 0 # p_memsz
+ .int 0 # p_flags
+ .int 0 # p_align
+e_phdr:
+
+.macro note name, type
+ .balign 4
+ .int 2f - 1f # n_namesz
+ .int 4f - 3f # n_descsz
+ .int \type # n_type
+ .balign 4
+1: .asciz "\name"
+2: .balign 4
+3:
+.endm
+.macro enote
+4: .balign 4
+.endm
+
+ .balign 4
+b_note:
+ note ELF_NOTE_BOOT, EIN_PROGRAM_NAME
+ .asciz "Linux"
+ enote
+ note ELF_NOTE_BOOT, EIN_PROGRAM_VERSION
+ .asciz UTS_RELEASE
+ enote
+ note ELF_NOTE_BOOT, EIN_ARGUMENT_STYLE
+ .asciz "Linux"
+ enote
+e_note:
+
start2:
movw %cs, %ax
movw %ax, %ds
@@ -78,11 +166,11 @@ die:
bugger_off_msg:
- .ascii "Direct booting from floppy is no longer supported.\r\n"
- .ascii "Please use a boot loader program instead.\r\n"
+ .ascii "Booting linux without a boot loader is no longer supported.\r\n"
.ascii "\n"
- .ascii "Remove disk and press any key to reboot . . .\r\n"
+ .ascii "Press any key to reboot . . .\r\n"
.byte 0
+ebugger_off_msg:
# Kernel attributes; used by setup
diff -puN arch/i386/boot/Makefile~i386-boot-Add-an-ELF-header-to-bzImage arch/i386/boot/Makefile
--- linux-2.6.18-git17/arch/i386/boot/Makefile~i386-boot-Add-an-ELF-header-to-bzImage 2006-10-02 14:21:59.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/boot/Makefile 2006-10-02 14:21:59.000000000 -0400
@@ -43,7 +43,7 @@ $(obj)/bzImage: BUILDFLAGS := -b
quiet_cmd_image = BUILD $@
cmd_image = $(obj)/tools/build $(BUILDFLAGS) $(obj)/bootsect $(obj)/setup \
- $(obj)/vmlinux.bin $(ROOT_DEV) > $@
+ $(obj)/vmlinux.bin $(ROOT_DEV) vmlinux > $@
$(obj)/zImage $(obj)/bzImage: $(obj)/bootsect $(obj)/setup \
$(obj)/vmlinux.bin $(obj)/tools/build FORCE
diff -puN arch/i386/boot/tools/build.c~i386-boot-Add-an-ELF-header-to-bzImage arch/i386/boot/tools/build.c
--- linux-2.6.18-git17/arch/i386/boot/tools/build.c~i386-boot-Add-an-ELF-header-to-bzImage 2006-10-02 14:21:59.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/boot/tools/build.c 2006-10-02 14:21:59.000000000 -0400
@@ -27,6 +27,11 @@
#include <string.h>
#include <stdlib.h>
#include <stdarg.h>
+#include <elf.h>
+#include <byteswap.h>
+#define USE_BSD
+#include <endian.h>
+#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
@@ -48,6 +53,10 @@ byte buf[1024];
int fd;
int is_big_kernel;
+#define MAX_PHDRS 100
+static Elf32_Ehdr ehdr;
+static Elf32_Phdr phdr[MAX_PHDRS];
+
void die(const char * str, ...)
{
va_list args;
@@ -57,20 +66,151 @@ void die(const char * str, ...)
exit(1);
}
+#if BYTE_ORDER == LITTLE_ENDIAN
+#define le16_to_cpu(val) (val)
+#define le32_to_cpu(val) (val)
+#endif
+#if BYTE_ORDER == BIG_ENDIAN
+#define le16_to_cpu(val) bswap_16(val)
+#define le32_to_cpu(val) bswap_32(val)
+#endif
+
+static uint16_t elf16_to_cpu(uint16_t val)
+{
+ return le16_to_cpu(val);
+}
+
+static uint32_t elf32_to_cpu(uint32_t val)
+{
+ return le32_to_cpu(val);
+}
+
void file_open(const char *name)
{
if ((fd = open(name, O_RDONLY, 0)) < 0)
die("Unable to open `%s': %m", name);
}
+static void read_ehdr(void)
+{
+ if (read(fd, &ehdr, sizeof(ehdr)) != sizeof(ehdr)) {
+ die("Cannot read ELF header: %s\n",
+ strerror(errno));
+ }
+ if (memcmp(ehdr.e_ident, ELFMAG, 4) != 0) {
+ die("No ELF magic\n");
+ }
+ if (ehdr.e_ident[EI_CLASS] != ELFCLASS32) {
+ die("Not a 32 bit executable\n");
+ }
+ if (ehdr.e_ident[EI_DATA] != ELFDATA2LSB) {
+ die("Not a LSB ELF executable\n");
+ }
+ if (ehdr.e_ident[EI_VERSION] != EV_CURRENT) {
+ die("Unknown ELF version\n");
+ }
+ /* Convert the fields to native endian */
+ ehdr.e_type = elf16_to_cpu(ehdr.e_type);
+ ehdr.e_machine = elf16_to_cpu(ehdr.e_machine);
+ ehdr.e_version = elf32_to_cpu(ehdr.e_version);
+ ehdr.e_entry = elf32_to_cpu(ehdr.e_entry);
+ ehdr.e_phoff = elf32_to_cpu(ehdr.e_phoff);
+ ehdr.e_shoff = elf32_to_cpu(ehdr.e_shoff);
+ ehdr.e_flags = elf32_to_cpu(ehdr.e_flags);
+ ehdr.e_ehsize = elf16_to_cpu(ehdr.e_ehsize);
+ ehdr.e_phentsize = elf16_to_cpu(ehdr.e_phentsize);
+ ehdr.e_phnum = elf16_to_cpu(ehdr.e_phnum);
+ ehdr.e_shentsize = elf16_to_cpu(ehdr.e_shentsize);
+ ehdr.e_shnum = elf16_to_cpu(ehdr.e_shnum);
+ ehdr.e_shstrndx = elf16_to_cpu(ehdr.e_shstrndx);
+
+ if ((ehdr.e_type != ET_EXEC) && (ehdr.e_type != ET_DYN)) {
+ die("Unsupported ELF header type\n");
+ }
+ if (ehdr.e_machine != EM_386) {
+ die("Not for x86\n");
+ }
+ if (ehdr.e_version != EV_CURRENT) {
+ die("Unknown ELF version\n");
+ }
+ if (ehdr.e_ehsize != sizeof(Elf32_Ehdr)) {
+ die("Bad Elf header size\n");
+ }
+ if (ehdr.e_phentsize != sizeof(Elf32_Phdr)) {
+ die("Bad program header entry\n");
+ }
+ if (ehdr.e_shentsize != sizeof(Elf32_Shdr)) {
+ die("Bad section header entry\n");
+ }
+ if (ehdr.e_shstrndx >= ehdr.e_shnum) {
+ die("String table index out of bounds\n");
+ }
+}
+
+static void read_phds(void)
+{
+ int i;
+ size_t size;
+ if (ehdr.e_phnum > MAX_PHDRS) {
+ die("%d program headers supported: %d\n",
+ MAX_PHDRS, ehdr.e_phnum);
+ }
+ if (lseek(fd, ehdr.e_phoff, SEEK_SET) < 0) {
+ die("Seek to %d failed: %s\n",
+ ehdr.e_phoff, strerror(errno));
+ }
+ size = sizeof(phdr[0])*ehdr.e_phnum;
+ if (read(fd, &phdr, size) != size) {
+ die("Cannot read ELF program headers: %s\n",
+ strerror(errno));
+ }
+ for(i = 0; i < ehdr.e_phnum; i++) {
+ phdr[i].p_type = elf32_to_cpu(phdr[i].p_type);
+ phdr[i].p_offset = elf32_to_cpu(phdr[i].p_offset);
+ phdr[i].p_vaddr = elf32_to_cpu(phdr[i].p_vaddr);
+ phdr[i].p_paddr = elf32_to_cpu(phdr[i].p_paddr);
+ phdr[i].p_filesz = elf32_to_cpu(phdr[i].p_filesz);
+ phdr[i].p_memsz = elf32_to_cpu(phdr[i].p_memsz);
+ phdr[i].p_flags = elf32_to_cpu(phdr[i].p_flags);
+ phdr[i].p_align = elf32_to_cpu(phdr[i].p_align);
+ }
+}
+
+unsigned long vmlinux_memsz(void)
+{
+ unsigned long min, max, size;
+ int i;
+ min = 0xffffffff;
+ max = 0;
+ for(i = 0; i < ehdr.e_phnum; i++) {
+ unsigned long start, end;
+ if (phdr[i].p_type != PT_LOAD)
+ continue;
+ start = phdr[i].p_paddr;
+ end = phdr[i].p_paddr + phdr[i].p_memsz;
+ if (start < min)
+ min = start;
+ if (end > max)
+ max = end;
+ }
+ /* Get the size reported by vmlinux */
+ size = max - min;
+ /* Add 128K for the bootmem bitmap */
+ size += 128*1024;
+ /* Add in space for the initial page tables */
+ size = ((size + (((size + 4095) >> 12)*4)) + 4095) & ~4095;
+ return size;
+}
+
void usage(void)
{
- die("Usage: build [-b] bootsect setup system [rootdev] [> image]");
+ die("Usage: build [-b] bootsect setup system rootdev vmlinux [> image]");
}
int main(int argc, char ** argv)
{
unsigned int i, sz, setup_sectors;
+ unsigned kernel_offset, kernel_filesz, kernel_memsz;
int c;
u32 sys_size;
byte major_root, minor_root;
@@ -81,30 +221,25 @@ int main(int argc, char ** argv)
is_big_kernel = 1;
argc--, argv++;
}
- if ((argc < 4) || (argc > 5))
+ if (argc != 6)
usage();
- if (argc > 4) {
- if (!strcmp(argv[4], "CURRENT")) {
- if (stat("/", &sb)) {
- perror("/");
- die("Couldn't stat /");
- }
- major_root = major(sb.st_dev);
- minor_root = minor(sb.st_dev);
- } else if (strcmp(argv[4], "FLOPPY")) {
- if (stat(argv[4], &sb)) {
- perror(argv[4]);
- die("Couldn't stat root device.");
- }
- major_root = major(sb.st_rdev);
- minor_root = minor(sb.st_rdev);
- } else {
- major_root = 0;
- minor_root = 0;
+ if (!strcmp(argv[4], "CURRENT")) {
+ if (stat("/", &sb)) {
+ perror("/");
+ die("Couldn't stat /");
+ }
+ major_root = major(sb.st_dev);
+ minor_root = minor(sb.st_dev);
+ } else if (strcmp(argv[4], "FLOPPY")) {
+ if (stat(argv[4], &sb)) {
+ perror(argv[4]);
+ die("Couldn't stat root device.");
}
+ major_root = major(sb.st_rdev);
+ minor_root = minor(sb.st_rdev);
} else {
- major_root = DEFAULT_MAJOR_ROOT;
- minor_root = DEFAULT_MINOR_ROOT;
+ major_root = 0;
+ minor_root = 0;
}
fprintf(stderr, "Root device is (%d, %d)\n", major_root, minor_root);
@@ -144,10 +279,11 @@ int main(int argc, char ** argv)
i += c;
}
+ kernel_offset = (setup_sectors + 1)*512;
file_open(argv[3]);
if (fstat (fd, &sb))
die("Unable to stat `%s': %m", argv[3]);
- sz = sb.st_size;
+ kernel_filesz = sz = sb.st_size;
fprintf (stderr, "System is %d kB\n", sz/1024);
sys_size = (sz + 15) / 16;
if (!is_big_kernel && sys_size > DEF_SYSSIZE)
@@ -168,7 +304,37 @@ int main(int argc, char ** argv)
}
close(fd);
- if (lseek(1, 497, SEEK_SET) != 497) /* Write sizes to the bootsector */
+ file_open(argv[5]);
+ read_ehdr();
+ read_phds();
+ close(fd);
+ kernel_memsz = vmlinux_memsz();
+
+ if (lseek(1, 84, SEEK_SET) != 84) /* Write sizes to the bootsector */
+ die("Output: seek failed");
+ buf[0] = (kernel_offset >> 0) & 0xff;
+ buf[1] = (kernel_offset >> 8) & 0xff;
+ buf[2] = (kernel_offset >> 16) & 0xff;
+ buf[3] = (kernel_offset >> 24) & 0xff;
+ if (write(1, buf, 4) != 4)
+ die("Write of kernel file offset failed");
+ if (lseek(1, 96, SEEK_SET) != 96)
+ die("Output: seek failed");
+ buf[0] = (kernel_filesz >> 0) & 0xff;
+ buf[1] = (kernel_filesz >> 8) & 0xff;
+ buf[2] = (kernel_filesz >> 16) & 0xff;
+ buf[3] = (kernel_filesz >> 24) & 0xff;
+ if (write(1, buf, 4) != 4)
+ die("Write of kernel file size failed");
+ if (lseek(1, 100, SEEK_SET) != 100)
+ die("Output: seek failed");
+ buf[0] = (kernel_memsz >> 0) & 0xff;
+ buf[1] = (kernel_memsz >> 8) & 0xff;
+ buf[2] = (kernel_memsz >> 16) & 0xff;
+ buf[3] = (kernel_memsz >> 24) & 0xff;
+ if (write(1, buf, 4) != 4)
+ die("Write of kernel memory size failed");
+ if (lseek(1, 497, SEEK_SET) != 497)
die("Output: seek failed");
buf[0] = setup_sectors;
if (write(1, buf, 1) != 1)
diff -puN /dev/null include/linux/elf_boot.h
--- /dev/null 2006-10-02 12:52:59.410341682 -0400
+++ linux-2.6.18-git17-root/include/linux/elf_boot.h 2006-10-02 14:21:59.000000000 -0400
@@ -0,0 +1,19 @@
+#ifndef ELF_BOOT_H
+#define ELF_BOOT_H
+
+/* Elf notes to help bootloaders identify what program they are booting.
+ */
+
+/* Standardized Elf image notes for booting... The name for all of these is ELFBoot */
+#define ELF_NOTE_BOOT "ELFBoot"
+
+#define EIN_PROGRAM_NAME 0x00000001
+/* The program in this ELF file */
+#define EIN_PROGRAM_VERSION 0x00000002
+/* The version of the program in this ELF file */
+#define EIN_PROGRAM_CHECKSUM 0x00000003
+/* ip style checksum of the memory image. */
+#define EIN_ARGUMENT_STYLE 0x00000004
+/* String identifying argument passing style */
+
+#endif /* ELF_BOOT_H */
_
This patch modifies the x86 kernel so that, if CONFIG_RELOCATABLE is
selected, it can be loaded at any 4K aligned address below
1G. The technique used is to compile the decompressor with -fPIC and
modify it so the decompressor is fully relocatable. For the main
kernel, relocations are generated, resulting in a kernel that is relocatable
with no runtime overhead and no need to modify the source code.
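How the generated relocations are consumed can be sketched in C. This mirrors my reading of the relocation loop in head.S in this patch: the table is a list of 32-bit link-time virtual addresses placed after the decompressed image, walked backwards from the end until a zero word is found. The function name, the SKETCH_ constants and the bounds guard are assumptions added for the example, and the sketch reads words in host byte order.

```c
#include <stdint.h>
#include <string.h>

/* Assumed example values only; a real build takes these from .config. */
#define SKETCH_PAGE_OFFSET    0xC0000000u
#define SKETCH_PHYSICAL_START 0x00100000u

/*
 * image     - decompressed kernel followed by the relocation table
 * image_len - total length in bytes, table included
 * load_addr - physical address at which image[0] actually sits
 *
 * Returns the number of relocations applied.
 */
static unsigned sketch_apply_relocs(uint8_t *image, uint32_t image_len,
                                    uint32_t load_addr)
{
    /* Distance between where vmlinux was linked to run and where it is. */
    uint32_t delta = load_addr - SKETCH_PHYSICAL_START;
    uint32_t pos = image_len;
    unsigned applied = 0;

    while (pos >= 4) {
        uint32_t vaddr, off, value;

        pos -= 4;
        memcpy(&vaddr, image + pos, sizeof(vaddr));
        if (vaddr == 0)
            break;                  /* zero word terminates the table */

        /* Link-time virtual address -> byte offset inside the image. */
        off = vaddr - SKETCH_PAGE_OFFSET - SKETCH_PHYSICAL_START;
        if (off > image_len - 4)
            break;                  /* guard for the sketch only */

        /* Add the load delta to the 32-bit value stored there. */
        memcpy(&value, image + off, sizeof(value));
        value += delta;
        memcpy(image + off, &value, sizeof(value));
        applied++;
    }
    return applied;
}
```

All of the work happens once at boot, which is why the relocated kernel carries no runtime overhead afterwards.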
A reserved 32bit word in the parameters has been assigned
to serve as a stack so we can figure out where we are running.
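Once the load delta is known, head.S in this patch uses it to pick the place the compressed image is copied to, so that decompression in place is safe. The arithmetic can be sketched as below. The function name is made up for this example, and the parameters stand for the values the assembly keeps in %ebx, input_len and output_len.

```c
#include <stdint.h>

/* Sketch of the offset arithmetic done on %ebx before the copy.
 * start is the delta to where the kernel will run. */
static uint32_t sketch_inplace_offset(uint32_t start, uint32_t input_len,
                                      uint32_t output_len)
{
    uint32_t off = start;

    /* Replace the compressed data size with the uncompressed size. */
    off -= input_len;
    off += output_len;
    /* 8 bytes for every 32K block, i.e. one byte per 4K of output. */
    off += output_len >> 12;
    /* 32K of slack for the worst case inside a block, plus 18 bytes
     * for the gzip header and trailer. */
    off += 32768 + 18;
    /* Round up to a 4K boundary. */
    off = (off + 4095) & ~(uint32_t)4095;
    return off;
}
```

For example, with start = 0x100000, a 2MB compressed image and a 5MB uncompressed kernel, the result is 0x409000.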
Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: Vivek Goyal <[email protected]>
---
arch/i386/Kconfig | 12
arch/i386/Makefile | 4
arch/i386/boot/compressed/Makefile | 22 +
arch/i386/boot/compressed/head.S | 190 +++++++----
arch/i386/boot/compressed/misc.c | 267 ++++++++--------
arch/i386/boot/compressed/relocs.c | 563 ++++++++++++++++++++++++++++++++++
arch/i386/boot/compressed/vmlinux.lds | 43 ++
arch/i386/boot/compressed/vmlinux.scr | 3
arch/i386/boot/setup.S | 29 +
include/linux/screen_info.h | 3
10 files changed, 920 insertions(+), 216 deletions(-)
diff -puN arch/i386/boot/compressed/head.S~i386-Relocatable-kernel-support arch/i386/boot/compressed/head.S
--- linux-2.6.18-git17/arch/i386/boot/compressed/head.S~i386-Relocatable-kernel-support 2006-10-02 14:09:14.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/boot/compressed/head.S 2006-10-02 14:33:13.000000000 -0400
@@ -25,9 +25,11 @@
#include <linux/linkage.h>
#include <asm/segment.h>
+#include <asm/page.h>
+.section ".text.head"
.globl startup_32
-
+
startup_32:
cld
cli
@@ -36,93 +38,141 @@ startup_32:
movl %eax,%es
movl %eax,%fs
movl %eax,%gs
+ movl %eax,%ss
- lss stack_start,%esp
- xorl %eax,%eax
-1: incl %eax # check that A20 really IS enabled
- movl %eax,0x000000 # loop forever if it isn't
- cmpl %eax,0x100000
- je 1b
+/* Calculate the delta between where we were compiled to run
+ * at and where we were actually loaded at. This can only be done
+ * with a short local call on x86. Nothing else will tell us what
+ * address we are running at. The reserved chunk of the real-mode
+ * data at 0x34-0x3f is used as the stack for this calculation.
+ * Only 4 bytes are needed.
+ */
+ leal 0x40(%esi), %esp
+ call 1f
+1: popl %ebp
+ subl $1b, %ebp
+
+/* Compute the delta between where we were compiled to run at
+ * and where the code will actually run at.
+ */
+ /* Start with the delta to where the kernel will run at. If we are
+ * a relocatable kernel this is the delta to our load address otherwise
+ * this is the delta to CONFIG_PHYSICAL_START.
+ */
+#ifdef CONFIG_RELOCATABLE
+ movl %ebp, %ebx
+#else
+ movl $(CONFIG_PHYSICAL_START - startup_32), %ebx
+#endif
+
+ /* Replace the compressed data size with the uncompressed size */
+ subl input_len(%ebp), %ebx
+ movl output_len(%ebp), %eax
+ addl %eax, %ebx
+ /* Add 8 bytes for every 32K input block */
+ shrl $12, %eax
+ addl %eax, %ebx
+ /* Add 32K + 18 bytes of extra slack */
+ addl $(32768 + 18), %ebx
+ /* Align on a 4K boundary */
+ addl $4095, %ebx
+ andl $~4095, %ebx
+
+/* Copy the compressed kernel to the end of our buffer
+ * where decompression in place becomes safe.
+ */
+ pushl %esi
+ leal _end(%ebp), %esi
+ leal _end(%ebx), %edi
+ movl $(_end - startup_32), %ecx
+ std
+ rep
+ movsb
+ cld
+ popl %esi
+
+/* Compute the kernel start address.
+ */
+#ifdef CONFIG_RELOCATABLE
+ leal startup_32(%ebp), %ebp
+#else
+ movl $CONFIG_PHYSICAL_START, %ebp
+#endif
/*
- * Initialize eflags. Some BIOS's leave bits like NT set. This would
- * confuse the debugger if this code is traced.
- * XXX - best to initialize before switching to protected mode.
+ * Jump to the relocated address.
*/
- pushl $0
- popfl
+ leal relocated(%ebx), %eax
+ jmp *%eax
+.section ".text"
+relocated:
+
/*
* Clear BSS
*/
xorl %eax,%eax
- movl $_edata,%edi
- movl $_end,%ecx
+ leal _edata(%ebx),%edi
+ leal _end(%ebx), %ecx
subl %edi,%ecx
cld
rep
stosb
+
+/*
+ * Setup the stack for the decompressor
+ */
+ leal stack_end(%ebx), %esp
+
/*
* Do the decompression, and jump to the new kernel..
*/
- subl $16,%esp # place for structure on the stack
- movl %esp,%eax
+ movl output_len(%ebx), %eax
+ pushl %eax
+ pushl %ebp # output address
+ movl input_len(%ebx), %eax
+ pushl %eax # input_len
+ leal input_data(%ebx), %eax
+ pushl %eax # input_data
+ leal _end(%ebx), %eax
+ pushl %eax # end of the image as third argument
pushl %esi # real mode pointer as second arg
- pushl %eax # address of structure as first arg
call decompress_kernel
- orl %eax,%eax
- jnz 3f
- popl %esi # discard address
- popl %esi # real mode pointer
- xorl %ebx,%ebx
- ljmp $(__BOOT_CS), $CONFIG_PHYSICAL_START
+ addl $20, %esp
+ popl %ecx
+
+#ifdef CONFIG_RELOCATABLE
+/* Find the address of the relocations.
+ */
+ movl %ebp, %edi
+ addl %ecx, %edi
+
+/* Calculate the delta between where vmlinux was compiled to run
+ * and where it was actually loaded.
+ */
+ movl %ebp, %ebx
+ subl $CONFIG_PHYSICAL_START, %ebx
/*
- * We come here, if we were loaded high.
- * We need to move the move-in-place routine down to 0x1000
- * and then start it with the buffer addresses in registers,
- * which we got from the stack.
- */
-3:
- movl $move_routine_start,%esi
- movl $0x1000,%edi
- movl $move_routine_end,%ecx
- subl %esi,%ecx
- addl $3,%ecx
- shrl $2,%ecx
- cld
- rep
- movsl
+ * Process relocations.
+ */
- popl %esi # discard the address
- popl %ebx # real mode pointer
- popl %esi # low_buffer_start
- popl %ecx # lcount
- popl %edx # high_buffer_start
- popl %eax # hcount
- movl $CONFIG_PHYSICAL_START,%edi
- cli # make sure we don't get interrupted
- ljmp $(__BOOT_CS), $0x1000 # and jump to the move routine
-
-/*
- * Routine (template) for moving the decompressed kernel in place,
- * if we were high loaded. This _must_ PIC-code !
- */
-move_routine_start:
- movl %ecx,%ebp
- shrl $2,%ecx
- rep
- movsl
- movl %ebp,%ecx
- andl $3,%ecx
- rep
- movsb
- movl %edx,%esi
- movl %eax,%ecx # NOTE: rep movsb won't move if %ecx == 0
- addl $3,%ecx
- shrl $2,%ecx
- rep
- movsl
- movl %ebx,%esi # Restore setup pointer
+1: subl $4, %edi
+ movl 0(%edi), %ecx
+ testl %ecx, %ecx
+ jz 2f
+ addl %ebx, -__PAGE_OFFSET(%ebx, %ecx)
+ jmp 1b
+2:
+#endif
+
+/*
+ * Jump to the decompressed kernel.
+ */
xorl %ebx,%ebx
- ljmp $(__BOOT_CS), $CONFIG_PHYSICAL_START
-move_routine_end:
+ jmp *%ebp
+
+.bss
+.balign 4
+stack:
+ .fill 4096, 1, 0
+stack_end:
diff -puN arch/i386/boot/compressed/Makefile~i386-Relocatable-kernel-support arch/i386/boot/compressed/Makefile
--- linux-2.6.18-git17/arch/i386/boot/compressed/Makefile~i386-Relocatable-kernel-support 2006-10-02 14:09:14.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/boot/compressed/Makefile 2006-10-02 14:09:14.000000000 -0400
@@ -7,19 +7,33 @@
targets := vmlinux vmlinux.bin vmlinux.bin.gz head.o misc.o piggy.o
EXTRA_AFLAGS := -traditional
-LDFLAGS_vmlinux := -Ttext $(IMAGE_OFFSET) -e startup_32
+LDFLAGS_vmlinux := -T
+CFLAGS_misc.o += -fPIC
+hostprogs-y := relocs
-$(obj)/vmlinux: $(obj)/head.o $(obj)/misc.o $(obj)/piggy.o FORCE
+$(obj)/vmlinux: $(src)/vmlinux.lds $(obj)/head.o $(obj)/misc.o $(obj)/piggy.o FORCE
$(call if_changed,ld)
@:
$(obj)/vmlinux.bin: vmlinux FORCE
$(call if_changed,objcopy)
-$(obj)/vmlinux.bin.gz: $(obj)/vmlinux.bin FORCE
+quiet_cmd_relocs = RELOCS $@
+ cmd_relocs = $(obj)/relocs $< > $@
+$(obj)/vmlinux.relocs: vmlinux $(obj)/relocs FORCE
+ $(call if_changed,relocs)
+
+vmlinux.bin.all-y := $(obj)/vmlinux.bin
+vmlinux.bin.all-$(CONFIG_RELOCATABLE) += $(obj)/vmlinux.relocs
+quiet_cmd_relocbin = BUILD $@
+ cmd_relocbin = cat $(filter-out FORCE,$^) > $@
+$(obj)/vmlinux.bin.all: $(vmlinux.bin.all-y) FORCE
+ $(call if_changed,relocbin)
+
+$(obj)/vmlinux.bin.gz: $(obj)/vmlinux.bin.all FORCE
$(call if_changed,gzip)
LDFLAGS_piggy.o := -r --format binary --oformat elf32-i386 -T
-$(obj)/piggy.o: $(obj)/vmlinux.scr $(obj)/vmlinux.bin.gz FORCE
+$(obj)/piggy.o: $(src)/vmlinux.scr $(obj)/vmlinux.bin.gz FORCE
$(call if_changed,ld)
diff -puN arch/i386/boot/compressed/misc.c~i386-Relocatable-kernel-support arch/i386/boot/compressed/misc.c
--- linux-2.6.18-git17/arch/i386/boot/compressed/misc.c~i386-Relocatable-kernel-support 2006-10-02 14:09:14.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/boot/compressed/misc.c 2006-10-02 14:33:13.000000000 -0400
@@ -14,6 +14,88 @@
#include <linux/vmalloc.h>
#include <linux/screen_info.h>
#include <asm/io.h>
+#include <asm/page.h>
+
+/* WARNING!!
+ * This code is compiled with -fPIC and it is relocated dynamically
+ * at run time, but no relocation processing is performed.
+ * This means that it is not safe to place pointers in static structures.
+ */
+
+/*
+ * Getting to provably safe in-place decompression is hard.
+ * Worst case behaviours need to be analyzed.
+ * Background information:
+ *
+ * The file layout is:
+ * magic[2]
+ * method[1]
+ * flags[1]
+ * timestamp[4]
+ * extraflags[1]
+ * os[1]
+ * compressed data blocks[N]
+ * crc[4] orig_len[4]
+ *
+ * resulting in 18 bytes of non compressed data overhead.
+ *
+ * Files divided into blocks
+ * 1 bit (last block flag)
+ * 2 bits (block type)
+ *
+ * 1 block occurs every 32K -1 bytes or when 50% compression has been achieved.
+ * The smallest block type encoding is always used.
+ *
+ * stored:
+ * 32 bits length in bytes.
+ *
+ * fixed:
+ * magic fixed tree.
+ * symbols.
+ *
+ * dynamic:
+ * dynamic tree encoding.
+ * symbols.
+ *
+ *
+ * The buffer for decompression in place is the length of the
+ * uncompressed data, plus a small amount extra to keep the algorithm safe.
+ * The compressed data is placed at the end of the buffer. The output
+ * pointer is placed at the start of the buffer and the input pointer
+ * is placed where the compressed data starts. Problems will occur
+ * when the output pointer overruns the input pointer.
+ *
+ * The output pointer can only overrun the input pointer if the input
+ * pointer is moving faster than the output pointer. A condition only
+ * triggered by data whose compressed form is larger than the uncompressed
+ * form.
+ *
+ * The worst case at the block level is a growth of the compressed data
+ * of 5 bytes per 32767 bytes.
+ *
+ * The worst case internal to a compressed block is very hard to figure.
+ * The worst case can at least be bounded by having one bit that represents
+ * 32764 bytes and then all of the rest of the bytes representing the very
+ * very last byte.
+ *
+ * All of which is enough to compute an amount of extra data that is required
+ * to be safe. To avoid problems at the block level allocating 5 extra bytes
+ * per 32767 bytes of data is sufficient. To avoid problems internal to a block
+ * adding an extra 32767 bytes (the worst case uncompressed block size) is
+ * sufficient, to ensure that in the worst case the decompressed data for a
+ * block will stop the byte before the compressed data for a block begins.
+ * To avoid problems with the compressed data's meta information an extra 18
+ * bytes are needed. Leading to the formula:
+ *
+ * extra_bytes = (uncompressed_size >> 12) + 32768 + 18 + decompressor_size.
+ *
+ * Adding 8 bytes per 32K is a bit excessive but much easier to calculate.
+ * Adding 32768 instead of 32767 just makes for round numbers.
+ * Adding the decompressor_size is necessary as it must live after all
+ * of the data as well. Last I measured the decompressor is about 14K.
+ * 10K of actual data and 4K of bss.
+ *
+ */
/*
* gzip declarations
@@ -30,15 +112,20 @@ typedef unsigned char uch;
typedef unsigned short ush;
typedef unsigned long ulg;
-#define WSIZE 0x8000 /* Window size must be at least 32k, */
- /* and a power of two */
-
-static uch *inbuf; /* input buffer */
-static uch window[WSIZE]; /* Sliding window buffer */
-
-static unsigned insize = 0; /* valid bytes in inbuf */
-static unsigned inptr = 0; /* index of next byte to be processed in inbuf */
-static unsigned outcnt = 0; /* bytes in output buffer */
+#define WSIZE 0x80000000 /* Window size must be at least 32k,
+ * and a power of two
+ * We don't actually have a window just
+ * a huge output buffer so I report
+ * a 2G window size, as that should
+ * always be larger than our output buffer.
+ */
+
+static uch *inbuf; /* input buffer */
+static uch *window; /* Sliding window buffer, (and final output buffer) */
+
+static unsigned insize; /* valid bytes in inbuf */
+static unsigned inptr; /* index of next byte to be processed in inbuf */
+static unsigned outcnt; /* bytes in output buffer */
/* gzip flag byte */
#define ASCII_FLAG 0x01 /* bit 0 set: file probably ASCII text */
@@ -89,8 +176,6 @@ extern unsigned char input_data[];
extern int input_len;
static long bytes_out = 0;
-static uch *output_data;
-static unsigned long output_ptr = 0;
static void *malloc(int size);
static void free(void *where);
@@ -100,17 +185,10 @@ static void *memcpy(void *dest, const vo
static void putstr(const char *);
-extern int end;
-static long free_mem_ptr = (long)&end;
-static long free_mem_end_ptr;
-
-#define INPLACE_MOVE_ROUTINE 0x1000
-#define LOW_BUFFER_START 0x2000
-#define LOW_BUFFER_MAX 0x90000
+static unsigned long free_mem_ptr;
+static unsigned long free_mem_end_ptr;
+
#define HEAP_SIZE 0x3000
-static unsigned int low_buffer_end, low_buffer_size;
-static int high_loaded =0;
-static uch *high_buffer_start /* = (uch *)(((ulg)&end) + HEAP_SIZE)*/;
static char *vidmem = (char *)0xb8000;
static int vidport;
@@ -151,7 +229,7 @@ static void gzip_mark(void **ptr)
static void gzip_release(void **ptr)
{
- free_mem_ptr = (long) *ptr;
+ free_mem_ptr = (unsigned long) *ptr;
}
static void scroll(void)
@@ -179,7 +257,7 @@ static void putstr(const char *s)
y--;
}
} else {
- vidmem [ ( x + cols * y ) * 2 ] = c;
+ vidmem [ ( x + cols * y ) * 2 ] = c;
if ( ++x >= cols ) {
x = 0;
if ( ++y >= lines ) {
@@ -224,58 +302,31 @@ static void* memcpy(void* dest, const vo
*/
static int fill_inbuf(void)
{
- if (insize != 0) {
- error("ran out of input data");
- }
-
- inbuf = input_data;
- insize = input_len;
- inptr = 1;
- return inbuf[0];
+ error("ran out of input data");
+ return 0;
}
/* ===========================================================================
* Write the output window window[0..outcnt-1] and update crc and bytes_out.
* (Used for the decompressed data only.)
*/
-static void flush_window_low(void)
-{
- ulg c = crc; /* temporary variable */
- unsigned n;
- uch *in, *out, ch;
-
- in = window;
- out = &output_data[output_ptr];
- for (n = 0; n < outcnt; n++) {
- ch = *out++ = *in++;
- c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8);
- }
- crc = c;
- bytes_out += (ulg)outcnt;
- output_ptr += (ulg)outcnt;
- outcnt = 0;
-}
-
-static void flush_window_high(void)
-{
- ulg c = crc; /* temporary variable */
- unsigned n;
- uch *in, ch;
- in = window;
- for (n = 0; n < outcnt; n++) {
- ch = *output_data++ = *in++;
- if ((ulg)output_data == low_buffer_end) output_data=high_buffer_start;
- c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8);
- }
- crc = c;
- bytes_out += (ulg)outcnt;
- outcnt = 0;
-}
-
static void flush_window(void)
{
- if (high_loaded) flush_window_high();
- else flush_window_low();
+ /* With my window equal to my output buffer
+ * I only need to compute the crc here.
+ */
+ ulg c = crc; /* temporary variable */
+ unsigned n;
+ uch *in, ch;
+
+ in = window;
+ for (n = 0; n < outcnt; n++) {
+ ch = *in++;
+ c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8);
+ }
+ crc = c;
+ bytes_out += (ulg)outcnt;
+ outcnt = 0;
}
static void error(char *x)
@@ -287,66 +338,8 @@ static void error(char *x)
while(1); /* Halt */
}
-#define STACK_SIZE (4096)
-
-long user_stack [STACK_SIZE];
-
-struct {
- long * a;
- short b;
- } stack_start = { & user_stack [STACK_SIZE] , __BOOT_DS };
-
-static void setup_normal_output_buffer(void)
-{
-#ifdef STANDARD_MEMORY_BIOS_CALL
- if (RM_EXT_MEM_K < 1024) error("Less than 2MB of memory");
-#else
- if ((RM_ALT_MEM_K > RM_EXT_MEM_K ? RM_ALT_MEM_K : RM_EXT_MEM_K) < 1024) error("Less than 2MB of memory");
-#endif
- output_data = (unsigned char *)CONFIG_PHYSICAL_START; /* Normally Points to 1M */
- free_mem_end_ptr = (long)real_mode;
-}
-
-struct moveparams {
- uch *low_buffer_start; int lcount;
- uch *high_buffer_start; int hcount;
-};
-
-static void setup_output_buffer_if_we_run_high(struct moveparams *mv)
-{
- high_buffer_start = (uch *)(((ulg)&end) + HEAP_SIZE);
-#ifdef STANDARD_MEMORY_BIOS_CALL
- if (RM_EXT_MEM_K < (3*1024)) error("Less than 4MB of memory");
-#else
- if ((RM_ALT_MEM_K > RM_EXT_MEM_K ? RM_ALT_MEM_K : RM_EXT_MEM_K) < (3*1024)) error("Less than 4MB of memory");
-#endif
- mv->low_buffer_start = output_data = (unsigned char *)LOW_BUFFER_START;
- low_buffer_end = ((unsigned int)real_mode > LOW_BUFFER_MAX
- ? LOW_BUFFER_MAX : (unsigned int)real_mode) & ~0xfff;
- low_buffer_size = low_buffer_end - LOW_BUFFER_START;
- high_loaded = 1;
- free_mem_end_ptr = (long)high_buffer_start;
- if ( (CONFIG_PHYSICAL_START + low_buffer_size) > ((ulg)high_buffer_start)) {
- high_buffer_start = (uch *)(CONFIG_PHYSICAL_START + low_buffer_size);
- mv->hcount = 0; /* say: we need not to move high_buffer */
- }
- else mv->hcount = -1;
- mv->high_buffer_start = high_buffer_start;
-}
-
-static void close_output_buffer_if_we_run_high(struct moveparams *mv)
-{
- if (bytes_out > low_buffer_size) {
- mv->lcount = low_buffer_size;
- if (mv->hcount)
- mv->hcount = bytes_out - low_buffer_size;
- } else {
- mv->lcount = bytes_out;
- mv->hcount = 0;
- }
-}
-
-asmlinkage int decompress_kernel(struct moveparams *mv, void *rmode)
+asmlinkage void decompress_kernel(void *rmode, unsigned long end,
+ uch *input_data, unsigned long input_len, uch *output)
{
real_mode = rmode;
@@ -361,13 +354,25 @@ asmlinkage int decompress_kernel(struct
lines = RM_SCREEN_INFO.orig_video_lines;
cols = RM_SCREEN_INFO.orig_video_cols;
- if (free_mem_ptr < 0x100000) setup_normal_output_buffer();
- else setup_output_buffer_if_we_run_high(mv);
+ window = output; /* Output buffer (Normally at 1M) */
+ free_mem_ptr = end; /* Heap */
+ free_mem_end_ptr = end + HEAP_SIZE;
+ inbuf = input_data; /* Input buffer */
+ insize = input_len;
+ inptr = 0;
+
+ if (((u32)output - CONFIG_PHYSICAL_START) & 0x3fffff)
+ error("Destination address not 4M aligned");
+ if (end > ((-__PAGE_OFFSET-(512 <<20)-1) & 0x7fffffff))
+ error("Destination address too large");
+#ifndef CONFIG_RELOCATABLE
+ if ((u32)output != CONFIG_PHYSICAL_START)
+ error("Wrong destination address");
+#endif
makecrc();
putstr("Uncompressing Linux... ");
gunzip();
putstr("Ok, booting the kernel.\n");
- if (high_loaded) close_output_buffer_if_we_run_high(mv);
- return high_loaded;
+ return;
}
diff -puN /dev/null arch/i386/boot/compressed/relocs.c
--- /dev/null 2006-10-02 12:52:59.410341682 -0400
+++ linux-2.6.18-git17-root/arch/i386/boot/compressed/relocs.c 2006-10-02 14:09:14.000000000 -0400
@@ -0,0 +1,563 @@
+#include <stdio.h>
+#include <stdarg.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <string.h>
+#include <errno.h>
+#include <unistd.h>
+#include <elf.h>
+#include <byteswap.h>
+#define USE_BSD
+#include <endian.h>
+
+#define MAX_SHDRS 100
+static Elf32_Ehdr ehdr;
+static Elf32_Shdr shdr[MAX_SHDRS];
+static Elf32_Sym *symtab[MAX_SHDRS];
+static Elf32_Rel *reltab[MAX_SHDRS];
+static char *strtab[MAX_SHDRS];
+static unsigned long reloc_count, reloc_idx;
+static unsigned long *relocs;
+
+static void die(char *fmt, ...)
+{
+ va_list ap;
+ va_start(ap, fmt);
+ vfprintf(stderr, fmt, ap);
+ va_end(ap);
+ exit(1);
+}
+
+static const char *sym_type(unsigned type)
+{
+ static const char *type_name[] = {
+#define SYM_TYPE(X) [X] = #X
+ SYM_TYPE(STT_NOTYPE),
+ SYM_TYPE(STT_OBJECT),
+ SYM_TYPE(STT_FUNC),
+ SYM_TYPE(STT_SECTION),
+ SYM_TYPE(STT_FILE),
+ SYM_TYPE(STT_COMMON),
+ SYM_TYPE(STT_TLS),
+#undef SYM_TYPE
+ };
+ const char *name = "unknown sym type name";
+ if (type < sizeof(type_name)/sizeof(type_name[0])) {
+ name = type_name[type];
+ }
+ return name;
+}
+
+static const char *sym_bind(unsigned bind)
+{
+ static const char *bind_name[] = {
+#define SYM_BIND(X) [X] = #X
+ SYM_BIND(STB_LOCAL),
+ SYM_BIND(STB_GLOBAL),
+ SYM_BIND(STB_WEAK),
+#undef SYM_BIND
+ };
+ const char *name = "unknown sym bind name";
+ if (bind < sizeof(bind_name)/sizeof(bind_name[0])) {
+ name = bind_name[bind];
+ }
+ return name;
+}
+
+static const char *sym_visibility(unsigned visibility)
+{
+ static const char *visibility_name[] = {
+#define SYM_VISIBILITY(X) [X] = #X
+ SYM_VISIBILITY(STV_DEFAULT),
+ SYM_VISIBILITY(STV_INTERNAL),
+ SYM_VISIBILITY(STV_HIDDEN),
+ SYM_VISIBILITY(STV_PROTECTED),
+#undef SYM_VISIBILITY
+ };
+ const char *name = "unknown sym visibility name";
+ if (visibility < sizeof(visibility_name)/sizeof(visibility_name[0])) {
+ name = visibility_name[visibility];
+ }
+ return name;
+}
+
+static const char *rel_type(unsigned type)
+{
+ static const char *type_name[] = {
+#define REL_TYPE(X) [X] = #X
+ REL_TYPE(R_386_NONE),
+ REL_TYPE(R_386_32),
+ REL_TYPE(R_386_PC32),
+ REL_TYPE(R_386_GOT32),
+ REL_TYPE(R_386_PLT32),
+ REL_TYPE(R_386_COPY),
+ REL_TYPE(R_386_GLOB_DAT),
+ REL_TYPE(R_386_JMP_SLOT),
+ REL_TYPE(R_386_RELATIVE),
+ REL_TYPE(R_386_GOTOFF),
+ REL_TYPE(R_386_GOTPC),
+#undef REL_TYPE
+ };
+ const char *name = "unknown type rel type name";
+ if (type < sizeof(type_name)/sizeof(type_name[0])) {
+ name = type_name[type];
+ }
+ return name;
+}
+
+static const char *sec_name(unsigned shndx)
+{
+ const char *sec_strtab;
+ const char *name;
+ sec_strtab = strtab[ehdr.e_shstrndx];
+ name = "<noname>";
+ if (shndx < ehdr.e_shnum) {
+ name = sec_strtab + shdr[shndx].sh_name;
+ }
+ else if (shndx == SHN_ABS) {
+ name = "ABSOLUTE";
+ }
+ else if (shndx == SHN_COMMON) {
+ name = "COMMON";
+ }
+ return name;
+}
+
+static const char *sym_name(const char *sym_strtab, Elf32_Sym *sym)
+{
+ const char *name;
+ name = "<noname>";
+ if (sym->st_name) {
+ name = sym_strtab + sym->st_name;
+ }
+ else {
+ name = sec_name(sym->st_shndx);
+ }
+ return name;
+}
+
+
+
+#if BYTE_ORDER == LITTLE_ENDIAN
+#define le16_to_cpu(val) (val)
+#define le32_to_cpu(val) (val)
+#endif
+#if BYTE_ORDER == BIG_ENDIAN
+#define le16_to_cpu(val) bswap_16(val)
+#define le32_to_cpu(val) bswap_32(val)
+#endif
+
+static uint16_t elf16_to_cpu(uint16_t val)
+{
+ return le16_to_cpu(val);
+}
+
+static uint32_t elf32_to_cpu(uint32_t val)
+{
+ return le32_to_cpu(val);
+}
+
+static void read_ehdr(FILE *fp)
+{
+ if (fread(&ehdr, sizeof(ehdr), 1, fp) != 1) {
+ die("Cannot read ELF header: %s\n",
+ strerror(errno));
+ }
+ if (memcmp(ehdr.e_ident, ELFMAG, 4) != 0) {
+ die("No ELF magic\n");
+ }
+ if (ehdr.e_ident[EI_CLASS] != ELFCLASS32) {
+ die("Not a 32 bit executable\n");
+ }
+ if (ehdr.e_ident[EI_DATA] != ELFDATA2LSB) {
+ die("Not a LSB ELF executable\n");
+ }
+ if (ehdr.e_ident[EI_VERSION] != EV_CURRENT) {
+ die("Unknown ELF version\n");
+ }
+ /* Convert the fields to native endian */
+ ehdr.e_type = elf16_to_cpu(ehdr.e_type);
+ ehdr.e_machine = elf16_to_cpu(ehdr.e_machine);
+ ehdr.e_version = elf32_to_cpu(ehdr.e_version);
+ ehdr.e_entry = elf32_to_cpu(ehdr.e_entry);
+ ehdr.e_phoff = elf32_to_cpu(ehdr.e_phoff);
+ ehdr.e_shoff = elf32_to_cpu(ehdr.e_shoff);
+ ehdr.e_flags = elf32_to_cpu(ehdr.e_flags);
+ ehdr.e_ehsize = elf16_to_cpu(ehdr.e_ehsize);
+ ehdr.e_phentsize = elf16_to_cpu(ehdr.e_phentsize);
+ ehdr.e_phnum = elf16_to_cpu(ehdr.e_phnum);
+ ehdr.e_shentsize = elf16_to_cpu(ehdr.e_shentsize);
+ ehdr.e_shnum = elf16_to_cpu(ehdr.e_shnum);
+ ehdr.e_shstrndx = elf16_to_cpu(ehdr.e_shstrndx);
+
+ if ((ehdr.e_type != ET_EXEC) && (ehdr.e_type != ET_DYN)) {
+ die("Unsupported ELF header type\n");
+ }
+ if (ehdr.e_machine != EM_386) {
+ die("Not for x86\n");
+ }
+ if (ehdr.e_version != EV_CURRENT) {
+ die("Unknown ELF version\n");
+ }
+ if (ehdr.e_ehsize != sizeof(Elf32_Ehdr)) {
+ die("Bad Elf header size\n");
+ }
+ if (ehdr.e_phentsize != sizeof(Elf32_Phdr)) {
+ die("Bad program header entry\n");
+ }
+ if (ehdr.e_shentsize != sizeof(Elf32_Shdr)) {
+ die("Bad section header entry\n");
+ }
+ if (ehdr.e_shstrndx >= ehdr.e_shnum) {
+ die("String table index out of bounds\n");
+ }
+}
+
+static void read_shdrs(FILE *fp)
+{
+ int i;
+ if (ehdr.e_shnum > MAX_SHDRS) {
+ die("%d section headers found, only %d supported\n",
+ ehdr.e_shnum, MAX_SHDRS);
+ }
+ if (fseek(fp, ehdr.e_shoff, SEEK_SET) < 0) {
+ die("Seek to %d failed: %s\n",
+ ehdr.e_shoff, strerror(errno));
+ }
+ if (fread(&shdr, sizeof(shdr[0]), ehdr.e_shnum, fp) != ehdr.e_shnum) {
+ die("Cannot read ELF section headers: %s\n",
+ strerror(errno));
+ }
+ for(i = 0; i < ehdr.e_shnum; i++) {
+ shdr[i].sh_name = elf32_to_cpu(shdr[i].sh_name);
+ shdr[i].sh_type = elf32_to_cpu(shdr[i].sh_type);
+ shdr[i].sh_flags = elf32_to_cpu(shdr[i].sh_flags);
+ shdr[i].sh_addr = elf32_to_cpu(shdr[i].sh_addr);
+ shdr[i].sh_offset = elf32_to_cpu(shdr[i].sh_offset);
+ shdr[i].sh_size = elf32_to_cpu(shdr[i].sh_size);
+ shdr[i].sh_link = elf32_to_cpu(shdr[i].sh_link);
+ shdr[i].sh_info = elf32_to_cpu(shdr[i].sh_info);
+ shdr[i].sh_addralign = elf32_to_cpu(shdr[i].sh_addralign);
+ shdr[i].sh_entsize = elf32_to_cpu(shdr[i].sh_entsize);
+ }
+
+}
+
+static void read_strtabs(FILE *fp)
+{
+ int i;
+ for(i = 0; i < ehdr.e_shnum; i++) {
+ if (shdr[i].sh_type != SHT_STRTAB) {
+ continue;
+ }
+ strtab[i] = malloc(shdr[i].sh_size);
+ if (!strtab[i]) {
+ die("malloc of %d bytes for strtab failed\n",
+ shdr[i].sh_size);
+ }
+ if (fseek(fp, shdr[i].sh_offset, SEEK_SET) < 0) {
+ die("Seek to %d failed: %s\n",
+ shdr[i].sh_offset, strerror(errno));
+ }
+ if (fread(strtab[i], 1, shdr[i].sh_size, fp) != shdr[i].sh_size) {
+ die("Cannot read string table: %s\n",
+ strerror(errno));
+ }
+ }
+}
+
+static void read_symtabs(FILE *fp)
+{
+ int i,j;
+ for(i = 0; i < ehdr.e_shnum; i++) {
+ if (shdr[i].sh_type != SHT_SYMTAB) {
+ continue;
+ }
+ symtab[i] = malloc(shdr[i].sh_size);
+ if (!symtab[i]) {
+ die("malloc of %d bytes for symtab failed\n",
+ shdr[i].sh_size);
+ }
+ if (fseek(fp, shdr[i].sh_offset, SEEK_SET) < 0) {
+ die("Seek to %d failed: %s\n",
+ shdr[i].sh_offset, strerror(errno));
+ }
+ if (fread(symtab[i], 1, shdr[i].sh_size, fp) != shdr[i].sh_size) {
+ die("Cannot read symbol table: %s\n",
+ strerror(errno));
+ }
+ for(j = 0; j < shdr[i].sh_size/sizeof(symtab[i][0]); j++) {
+ symtab[i][j].st_name = elf32_to_cpu(symtab[i][j].st_name);
+ symtab[i][j].st_value = elf32_to_cpu(symtab[i][j].st_value);
+ symtab[i][j].st_size = elf32_to_cpu(symtab[i][j].st_size);
+ symtab[i][j].st_shndx = elf16_to_cpu(symtab[i][j].st_shndx);
+ }
+ }
+}
+
+
+static void read_relocs(FILE *fp)
+{
+ int i,j;
+ for(i = 0; i < ehdr.e_shnum; i++) {
+ if (shdr[i].sh_type != SHT_REL) {
+ continue;
+ }
+ reltab[i] = malloc(shdr[i].sh_size);
+ if (!reltab[i]) {
+ die("malloc of %d bytes for relocs failed\n",
+ shdr[i].sh_size);
+ }
+ if (fseek(fp, shdr[i].sh_offset, SEEK_SET) < 0) {
+ die("Seek to %d failed: %s\n",
+ shdr[i].sh_offset, strerror(errno));
+ }
+ if (fread(reltab[i], 1, shdr[i].sh_size, fp) != shdr[i].sh_size) {
+ die("Cannot read relocation table: %s\n",
+ strerror(errno));
+ }
+ for(j = 0; j < shdr[i].sh_size/sizeof(reltab[0][0]); j++) {
+ reltab[i][j].r_offset = elf32_to_cpu(reltab[i][j].r_offset);
+ reltab[i][j].r_info = elf32_to_cpu(reltab[i][j].r_info);
+ }
+ }
+}
+
+
+static void print_absolute_symbols(void)
+{
+ int i;
+ printf("Absolute symbols\n");
+ printf(" Num: Value Size Type Bind Visibility Name\n");
+ for(i = 0; i < ehdr.e_shnum; i++) {
+ char *sym_strtab;
+ Elf32_Sym *sh_symtab;
+ int j;
+ if (shdr[i].sh_type != SHT_SYMTAB) {
+ continue;
+ }
+ sh_symtab = symtab[i];
+ sym_strtab = strtab[shdr[i].sh_link];
+ for(j = 0; j < shdr[i].sh_size/sizeof(symtab[0][0]); j++) {
+ Elf32_Sym *sym;
+ const char *name;
+ sym = &symtab[i][j];
+ name = sym_name(sym_strtab, sym);
+ if (sym->st_shndx != SHN_ABS) {
+ continue;
+ }
+ printf("%5d %08x %5d %10s %10s %12s %s\n",
+ j, sym->st_value, sym->st_size,
+ sym_type(ELF32_ST_TYPE(sym->st_info)),
+ sym_bind(ELF32_ST_BIND(sym->st_info)),
+ sym_visibility(ELF32_ST_VISIBILITY(sym->st_other)),
+ name);
+ }
+ }
+ printf("\n");
+}
+
+static void print_absolute_relocs(void)
+{
+ int i;
+ printf("Absolute relocations\n");
+ printf("Offset Info Type Sym.Value Sym.Name\n");
+ for(i = 0; i < ehdr.e_shnum; i++) {
+ char *sym_strtab;
+ Elf32_Sym *sh_symtab;
+ unsigned sec_applies, sec_symtab;
+ int j;
+ if (shdr[i].sh_type != SHT_REL) {
+ continue;
+ }
+ sec_symtab = shdr[i].sh_link;
+ sec_applies = shdr[i].sh_info;
+ if (!(shdr[sec_applies].sh_flags & SHF_ALLOC)) {
+ continue;
+ }
+ sh_symtab = symtab[sec_symtab];
+ sym_strtab = strtab[shdr[sec_symtab].sh_link];
+ for(j = 0; j < shdr[i].sh_size/sizeof(reltab[0][0]); j++) {
+ Elf32_Rel *rel;
+ Elf32_Sym *sym;
+ const char *name;
+ rel = &reltab[i][j];
+ sym = &sh_symtab[ELF32_R_SYM(rel->r_info)];
+ name = sym_name(sym_strtab, sym);
+ if (sym->st_shndx != SHN_ABS) {
+ continue;
+ }
+ printf("%08x %08x %10s %08x %s\n",
+ rel->r_offset,
+ rel->r_info,
+ rel_type(ELF32_R_TYPE(rel->r_info)),
+ sym->st_value,
+ name);
+ }
+ }
+ printf("\n");
+}
+
+static void walk_relocs(void (*visit)(Elf32_Rel *rel, Elf32_Sym *sym))
+{
+ int i;
+ /* Walk through the relocations */
+ for(i = 0; i < ehdr.e_shnum; i++) {
+ char *sym_strtab;
+ Elf32_Sym *sh_symtab;
+ unsigned sec_applies, sec_symtab;
+ int j;
+ if (shdr[i].sh_type != SHT_REL) {
+ continue;
+ }
+ sec_symtab = shdr[i].sh_link;
+ sec_applies = shdr[i].sh_info;
+ if (!(shdr[sec_applies].sh_flags & SHF_ALLOC)) {
+ continue;
+ }
+ sh_symtab = symtab[sec_symtab];
+ sym_strtab = strtab[shdr[sec_symtab].sh_link];
+ for(j = 0; j < shdr[i].sh_size/sizeof(reltab[0][0]); j++) {
+ Elf32_Rel *rel;
+ Elf32_Sym *sym;
+ unsigned r_type;
+ rel = &reltab[i][j];
+ sym = &sh_symtab[ELF32_R_SYM(rel->r_info)];
+ r_type = ELF32_R_TYPE(rel->r_info);
+ /* Don't visit relocations to absolute symbols */
+ if (sym->st_shndx == SHN_ABS) {
+ continue;
+ }
+ if (r_type == R_386_PC32) {
+ /* PC relative relocations don't need to be adjusted */
+ }
+ else if (r_type == R_386_32) {
+ /* Visit relocations that need to be adjusted */
+ visit(rel, sym);
+ }
+ else {
+ die("Unsupported relocation type: %d\n", r_type);
+ }
+ }
+ }
+}
+
+static void count_reloc(Elf32_Rel *rel, Elf32_Sym *sym)
+{
+ reloc_count += 1;
+}
+
+static void collect_reloc(Elf32_Rel *rel, Elf32_Sym *sym)
+{
+ /* Remember the address that needs to be adjusted. */
+ relocs[reloc_idx++] = rel->r_offset;
+}
+
+static int cmp_relocs(const void *va, const void *vb)
+{
+ const unsigned long *a, *b;
+ a = va; b = vb;
+ return (*a == *b)? 0 : (*a > *b)? 1 : -1;
+}
+
+static void emit_relocs(int as_text)
+{
+ int i;
+ /* Count how many relocations I have and allocate space for them. */
+ reloc_count = 0;
+ walk_relocs(count_reloc);
+ relocs = malloc(reloc_count * sizeof(relocs[0]));
+ if (!relocs) {
+ die("malloc of %lu entries for relocs failed\n",
+ reloc_count);
+ }
+ /* Collect up the relocations */
+ reloc_idx = 0;
+ walk_relocs(collect_reloc);
+
+ /* Order the relocations for more efficient processing */
+ qsort(relocs, reloc_count, sizeof(relocs[0]), cmp_relocs);
+
+ /* Print the relocations */
+ if (as_text) {
+ /* Print the relocations in a form that
+ * gas will like.
+ */
+ printf(".section \".data.reloc\",\"a\"\n");
+ printf(".balign 4\n");
+ for(i = 0; i < reloc_count; i++) {
+ printf("\t .long 0x%08lx\n", relocs[i]);
+ }
+ printf("\n");
+ }
+ else {
+ unsigned char buf[4];
+ buf[0] = buf[1] = buf[2] = buf[3] = 0;
+ /* Print a stop */
+ printf("%c%c%c%c", buf[0], buf[1], buf[2], buf[3]);
+ /* Now print each relocation */
+ for(i = 0; i < reloc_count; i++) {
+ buf[0] = (relocs[i] >> 0) & 0xff;
+ buf[1] = (relocs[i] >> 8) & 0xff;
+ buf[2] = (relocs[i] >> 16) & 0xff;
+ buf[3] = (relocs[i] >> 24) & 0xff;
+ printf("%c%c%c%c", buf[0], buf[1], buf[2], buf[3]);
+ }
+ }
+}
+
+static void usage(void)
+{
+ die("i386_reloc [--abs | --text] vmlinux\n");
+}
+
+int main(int argc, char **argv)
+{
+ int show_absolute;
+ int as_text;
+ const char *fname;
+ FILE *fp;
+ int i;
+
+ show_absolute = 0;
+ as_text = 0;
+ fname = NULL;
+ for(i = 1; i < argc; i++) {
+ char *arg = argv[i];
+ if (*arg == '-') {
+ if (strcmp(arg, "--abs") == 0) {
+ show_absolute = 1;
+ continue;
+ }
+ else if (strcmp(arg, "--text") == 0) {
+ as_text = 1;
+ continue;
+ }
+ }
+ else if (!fname) {
+ fname = arg;
+ continue;
+ }
+ usage();
+ }
+ if (!fname) {
+ usage();
+ }
+ fp = fopen(fname, "r");
+ if (!fp) {
+ die("Cannot open %s: %s\n",
+ fname, strerror(errno));
+ }
+ read_ehdr(fp);
+ read_shdrs(fp);
+ read_strtabs(fp);
+ read_symtabs(fp);
+ read_relocs(fp);
+ if (show_absolute) {
+ print_absolute_symbols();
+ print_absolute_relocs();
+ return 0;
+ }
+ emit_relocs(as_text);
+ return 0;
+}
diff -puN /dev/null arch/i386/boot/compressed/vmlinux.lds
--- /dev/null 2006-10-02 12:52:59.410341682 -0400
+++ linux-2.6.18-git17-root/arch/i386/boot/compressed/vmlinux.lds 2006-10-02 14:09:14.000000000 -0400
@@ -0,0 +1,43 @@
+OUTPUT_FORMAT("elf32-i386", "elf32-i386", "elf32-i386")
+OUTPUT_ARCH(i386)
+ENTRY(startup_32)
+SECTIONS
+{
+ /* Be careful: parts of head.S assume startup_32 is at
+ * address 0.
+ */
+ . = 0 ;
+ .text.head : {
+ _head = . ;
+ *(.text.head)
+ _ehead = . ;
+ }
+ .data.compressed : {
+ *(.data.compressed)
+ }
+ .text : {
+ _text = .; /* Text */
+ *(.text)
+ *(.text.*)
+ _etext = . ;
+ }
+ .rodata : {
+ _rodata = . ;
+ *(.rodata) /* read-only data */
+ *(.rodata.*)
+ _erodata = . ;
+ }
+ .data : {
+ _data = . ;
+ *(.data)
+ *(.data.*)
+ _edata = . ;
+ }
+ .bss : {
+ _bss = . ;
+ *(.bss)
+ *(.bss.*)
+ *(COMMON)
+ _end = . ;
+ }
+}
diff -puN arch/i386/boot/compressed/vmlinux.scr~i386-Relocatable-kernel-support arch/i386/boot/compressed/vmlinux.scr
--- linux-2.6.18-git17/arch/i386/boot/compressed/vmlinux.scr~i386-Relocatable-kernel-support 2006-10-02 14:09:14.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/boot/compressed/vmlinux.scr 2006-10-02 14:09:14.000000000 -0400
@@ -1,9 +1,10 @@
SECTIONS
{
- .data : {
+ .data.compressed : {
input_len = .;
LONG(input_data_end - input_data) input_data = .;
*(.data)
+ output_len = . - 4;
input_data_end = .;
}
}
diff -puN arch/i386/boot/setup.S~i386-Relocatable-kernel-support arch/i386/boot/setup.S
--- linux-2.6.18-git17/arch/i386/boot/setup.S~i386-Relocatable-kernel-support 2006-10-02 14:09:14.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/boot/setup.S 2006-10-02 14:09:14.000000000 -0400
@@ -588,11 +588,6 @@ rmodeswtch_normal:
call default_switch
rmodeswtch_end:
-# we get the code32 start address and modify the below 'jmpi'
-# (loader may have changed it)
- movl %cs:code32_start, %eax
- movl %eax, %cs:code32
-
# Now we move the system to its rightful place ... but we check if we have a
# big-kernel. In that case we *must* not move it ...
testb $LOADED_HIGH, %cs:loadflags
@@ -788,11 +783,12 @@ a20_err_msg:
a20_done:
#endif /* CONFIG_X86_VOYAGER */
-# set up gdt and idt
+# set up gdt and idt and 32bit start address
lidt idt_48 # load idt with 0,0
xorl %eax, %eax # Compute gdt_base
movw %ds, %ax # (Convert %ds:gdt to a linear ptr)
shll $4, %eax
+ addl %eax, code32
addl $gdt, %eax
movl %eax, (gdt_48+2)
lgdt gdt_48 # load gdt with whatever is
@@ -851,9 +847,26 @@ flush_instr:
# Manual, Mixing 16-bit and 32-bit code, page 16-6)
.byte 0x66, 0xea # prefix + jmpi-opcode
-code32: .long 0x1000 # will be set to 0x100000
- # for big kernels
+code32: .long startup_32 # will be set to %cs+startup_32
.word __BOOT_CS
+.code32
+startup_32:
+ movl $(__BOOT_DS), %eax
+ movl %eax, %ds
+ movl %eax, %es
+ movl %eax, %fs
+ movl %eax, %gs
+ movl %eax, %ss
+
+ xorl %eax, %eax
+1: incl %eax # check that A20 really IS enabled
+ movl %eax, 0x00000000 # loop forever if it isn't
+ cmpl %eax, 0x00100000
+ je 1b
+
+ # Jump to the 32bit entry point
+ jmpl *(code32_start - start + (DELTA_INITSEG << 4))(%esi)
+.code16
# Here's a bunch of information about your current kernel..
kernel_version: .ascii UTS_RELEASE
diff -puN arch/i386/Kconfig~i386-Relocatable-kernel-support arch/i386/Kconfig
--- linux-2.6.18-git17/arch/i386/Kconfig~i386-Relocatable-kernel-support 2006-10-02 14:09:14.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/Kconfig 2006-10-02 14:33:13.000000000 -0400
@@ -773,6 +773,18 @@ config CRASH_DUMP
PHYSICAL_START.
For more details see Documentation/kdump/kdump.txt
+config RELOCATABLE
+ bool "Build a relocatable kernel"
+ help
+ This builds a kernel image that retains relocation information
+ so it can be loaded someplace besides the default 1MB.
+ The relocations tend to make the kernel binary about 10% larger,
+ but are discarded at runtime.
+
+ One use is for the kexec on panic case where the recovery kernel
+ must live at a different physical address than the primary
+ kernel.
+
config PHYSICAL_START
hex "Physical address where the kernel is loaded" if (EMBEDDED || CRASH_DUMP)
diff -puN arch/i386/Makefile~i386-Relocatable-kernel-support arch/i386/Makefile
--- linux-2.6.18-git17/arch/i386/Makefile~i386-Relocatable-kernel-support 2006-10-02 14:09:14.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/Makefile 2006-10-02 14:13:15.000000000 -0400
@@ -26,7 +26,9 @@ endif
LDFLAGS := -m elf_i386
OBJCOPYFLAGS := -O binary -R .note -R .comment -S
-LDFLAGS_vmlinux :=
+ifdef CONFIG_RELOCATABLE
+LDFLAGS_vmlinux := --emit-relocs
+endif
CHECKFLAGS += -D__i386__
CFLAGS += -pipe -msoft-float
diff -puN include/linux/screen_info.h~i386-Relocatable-kernel-support include/linux/screen_info.h
--- linux-2.6.18-git17/include/linux/screen_info.h~i386-Relocatable-kernel-support 2006-10-02 14:09:14.000000000 -0400
+++ linux-2.6.18-git17-root/include/linux/screen_info.h 2006-10-02 14:09:14.000000000 -0400
@@ -42,7 +42,8 @@ struct screen_info {
u16 pages; /* 0x32 */
u16 vesa_attributes; /* 0x34 */
u32 capabilities; /* 0x36 */
- /* 0x3a -- 0x3f reserved for future expansion */
+ /* 0x3a -- 0x3b reserved for future expansion */
+ /* 0x3c -- 0x3f micro stack for relocatable kernels */
};
extern struct screen_info screen_info;
_
The motivation for this is that currently we have 512 bytes
at the beginning of a bzImage that are unused now that we don't
have a bootsector there. I plan on putting an ELF header
there, and generating it by hand with assembly data directives
to be minimally disruptive to the current build process.
To do that I need the ELF magic constants available to my
assembly code.
Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: Vivek Goyal <[email protected]>
---
include/linux/elf.h | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff -puN include/linux/elf.h~Make-linux-elf.h-safe-to-be-included-in-assembly-files include/linux/elf.h
--- linux-2.6.18-git17/include/linux/elf.h~Make-linux-elf.h-safe-to-be-included-in-assembly-files 2006-10-02 13:17:58.000000000 -0400
+++ linux-2.6.18-git17-root/include/linux/elf.h 2006-10-02 14:35:25.000000000 -0400
@@ -1,9 +1,11 @@
#ifndef _LINUX_ELF_H
#define _LINUX_ELF_H
+#include <linux/elf-em.h>
+
+#ifndef __ASSEMBLY__
#include <linux/types.h>
#include <linux/auxvec.h>
-#include <linux/elf-em.h>
#include <asm/elf.h>
#ifndef elf_read_implies_exec
@@ -30,6 +32,8 @@ typedef __u32 Elf64_Word;
typedef __u64 Elf64_Xword;
typedef __s64 Elf64_Sxword;
+#endif /* __ASSEMBLY__ */
+
/* These constants are for the segment types stored in the image headers */
#define PT_NULL 0
#define PT_LOAD 1
@@ -97,6 +101,8 @@ typedef __s64 Elf64_Sxword;
#define STT_COMMON 5
#define STT_TLS 6
+#ifndef __ASSEMBLY__
+
#define ELF_ST_BIND(x) ((x) >> 4)
#define ELF_ST_TYPE(x) (((unsigned int) x) & 0xf)
#define ELF32_ST_BIND(x) ELF_ST_BIND(x)
@@ -204,12 +210,16 @@ typedef struct elf64_hdr {
Elf64_Half e_shstrndx;
} Elf64_Ehdr;
+#endif /* __ASSEMBLY__ */
+
/* These constants define the permissions on sections in the program
header, p_flags. */
#define PF_R 0x4
#define PF_W 0x2
#define PF_X 0x1
+#ifndef __ASSEMBLY__
+
typedef struct elf32_phdr{
Elf32_Word p_type;
Elf32_Off p_offset;
@@ -232,6 +242,8 @@ typedef struct elf64_phdr {
Elf64_Xword p_align; /* Segment alignment, file & memory */
} Elf64_Phdr;
+#endif /* __ASSEMBLY__ */
+
/* sh_type */
#define SHT_NULL 0
#define SHT_PROGBITS 1
@@ -265,6 +277,8 @@ typedef struct elf64_phdr {
#define SHN_ABS 0xfff1
#define SHN_COMMON 0xfff2
#define SHN_HIRESERVE 0xffff
+
+#ifndef __ASSEMBLY__
typedef struct {
Elf32_Word sh_name;
@@ -292,6 +306,8 @@ typedef struct elf64_shdr {
Elf64_Xword sh_entsize; /* Entry size if section holds table */
} Elf64_Shdr;
+#endif /* __ASSEMBLY__ */
+
#define EI_MAG0 0 /* e_ident[] indexes */
#define EI_MAG1 1
#define EI_MAG2 2
@@ -338,6 +354,8 @@ typedef struct elf64_shdr {
#define NT_PRXFPREG 0x46e62b7f /* copied from gdb5.1/include/elf/common.h */
+#ifndef __ASSEMBLY__
+
/* Note header in a PT_NOTE section */
typedef struct elf32_note {
Elf32_Word n_namesz; /* Name size */
@@ -368,5 +386,7 @@ extern Elf64_Dyn _DYNAMIC [];
#endif
+#endif /* __ASSEMBLY__ */
+
#endif /* _LINUX_ELF_H */
_
Currently, when we reserve the memory the kernel text
resides in, we start at __PHYSICAL_START, which happens to be
correct but is not very obvious. In addition, when we start relocating
the kernel, __PHYSICAL_START is the wrong value, as it is an
absolute symbol that does not get relocated.
By starting the reservation at __pa_symbol(_text)
the code is clearer and will be correct when relocated.
Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: Vivek Goyal <[email protected]>
---
arch/i386/kernel/setup.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff -puN arch/i386/kernel/setup.c~i386-setup.c-Reserve-kernel-memory-starting-from-_text arch/i386/kernel/setup.c
--- linux-2.6.18-git17/arch/i386/kernel/setup.c~i386-setup.c-Reserve-kernel-memory-starting-from-_text 2006-10-02 13:17:58.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/kernel/setup.c 2006-10-02 13:17:58.000000000 -0400
@@ -1119,8 +1119,8 @@ void __init setup_bootmem_allocator(void
* the (very unlikely) case of us accidentally initializing the
* bootmem allocator with an invalid RAM area.
*/
- reserve_bootmem(__PHYSICAL_START, (PFN_PHYS(min_low_pfn) +
- bootmap_size + PAGE_SIZE-1) - (__PHYSICAL_START));
+ reserve_bootmem(__pa_symbol(_text), (PFN_PHYS(min_low_pfn) +
+ bootmap_size + PAGE_SIZE-1) - __pa_symbol(_text));
/*
* reserve physical page 0 - it's a special BIOS page on many boxes,
_
o Currently there is no specific alignment restriction in the linker script,
and in some cases the data section can be placed at a non-4K-aligned address.
This breaks kexec, which checks that each segment to be loaded is page aligned.
o I guess it does no harm for the data segment to be 4K aligned.
Signed-off-by: Vivek Goyal <[email protected]>
---
arch/i386/kernel/vmlinux.lds.S | 1 +
1 file changed, 1 insertion(+)
diff -puN arch/i386/kernel/vmlinux.lds.S~i386-force-data-section-to-4K-aligned arch/i386/kernel/vmlinux.lds.S
--- linux-2.6.18-git17/arch/i386/kernel/vmlinux.lds.S~i386-force-data-section-to-4K-aligned 2006-10-02 13:17:58.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/kernel/vmlinux.lds.S 2006-10-02 14:38:17.000000000 -0400
@@ -52,6 +52,7 @@ SECTIONS
}
/* writeable */
+ . = ALIGN(4096);
.data : AT(ADDR(.data) - LOAD_OFFSET) { /* Data */
*(.data)
CONSTRUCTORS
_
On Tue, 2006-10-03 at 13:15 -0400, Vivek Goyal wrote:
> diff -puN arch/i386/boot/compressed/misc.c~i386-CONFIG_PHYSICAL_START-cleanup arch/i386/boot/compressed/misc.c
> --- linux-2.6.18-git17/arch/i386/boot/compressed/misc.c~i386-CONFIG_PHYSICAL_START-cleanup 2006-10-02 13:17:58.000000000 -0400
> +++ linux-2.6.18-git17-root/arch/i386/boot/compressed/misc.c 2006-10-02 14:33:44.000000000 -0400
> @@ -9,11 +9,11 @@
> * High loaded stuff by Hans Lermen & Werner Almesberger, Feb. 1996
> */
>
> +#include <linux/config.h>
> #include <linux/linkage.h>
> #include <linux/vmalloc.h>
> #include <linux/screen_info.h>
Isn't config.h implicitly included everywhere by the build system now?
I don't think this is needed.
-- Dave
On Tue, Oct 03, 2006 at 11:45:09AM -0700, Dave Hansen wrote:
> On Tue, 2006-10-03 at 13:15 -0400, Vivek Goyal wrote:
> > diff -puN arch/i386/boot/compressed/misc.c~i386-CONFIG_PHYSICAL_START-cleanup arch/i386/boot/compressed/misc.c
> > --- linux-2.6.18-git17/arch/i386/boot/compressed/misc.c~i386-CONFIG_PHYSICAL_START-cleanup 2006-10-02 13:17:58.000000000 -0400
> > +++ linux-2.6.18-git17-root/arch/i386/boot/compressed/misc.c 2006-10-02 14:33:44.000000000 -0400
> > @@ -9,11 +9,11 @@
> > * High loaded stuff by Hans Lermen & Werner Almesberger, Feb. 1996
> > */
> >
> > +#include <linux/config.h>
> > #include <linux/linkage.h>
> > #include <linux/vmalloc.h>
> > #include <linux/screen_info.h>
>
> Isn't config.h implicitly included everywhere by the build system now?
> I don't think this is needed.
>
You are right. I will get rid of it.
-Vivek
Dave Hansen <[email protected]> writes:
> On Tue, 2006-10-03 at 13:15 -0400, Vivek Goyal wrote:
>> diff -puN arch/i386/boot/compressed/misc.c~i386-CONFIG_PHYSICAL_START-cleanup
> arch/i386/boot/compressed/misc.c
>> ---
> linux-2.6.18-git17/arch/i386/boot/compressed/misc.c~i386-CONFIG_PHYSICAL_START-cleanup
> 2006-10-02 13:17:58.000000000 -0400
>> +++ linux-2.6.18-git17-root/arch/i386/boot/compressed/misc.c 2006-10-02
> 14:33:44.000000000 -0400
>> @@ -9,11 +9,11 @@
>> * High loaded stuff by Hans Lermen & Werner Almesberger, Feb. 1996
>> */
>>
>> +#include <linux/config.h>
>> #include <linux/linkage.h>
>> #include <linux/vmalloc.h>
>> #include <linux/screen_info.h>
>
> Isn't config.h implicitly included everywhere by the build system now?
> I don't think this is needed.
It should be.
That is one of the issues I received feedback on, the first round.
Eric
On Tue, Oct 03, 2006 at 12:59:21PM -0600, Eric W. Biederman wrote:
> Dave Hansen <[email protected]> writes:
>
> > On Tue, 2006-10-03 at 13:15 -0400, Vivek Goyal wrote:
> >> diff -puN arch/i386/boot/compressed/misc.c~i386-CONFIG_PHYSICAL_START-cleanup
> > arch/i386/boot/compressed/misc.c
> >> ---
> > linux-2.6.18-git17/arch/i386/boot/compressed/misc.c~i386-CONFIG_PHYSICAL_START-cleanup
> > 2006-10-02 13:17:58.000000000 -0400
> >> +++ linux-2.6.18-git17-root/arch/i386/boot/compressed/misc.c 2006-10-02
> > 14:33:44.000000000 -0400
> >> @@ -9,11 +9,11 @@
> >> * High loaded stuff by Hans Lermen & Werner Almesberger, Feb. 1996
> >> */
> >>
> >> +#include <linux/config.h>
> >> #include <linux/linkage.h>
> >> #include <linux/vmalloc.h>
> >> #include <linux/screen_info.h>
> >
> > Isn't config.h implicitly included everywhere by the build system now?
> > I don't think this is needed.
>
> It should be.
>
> That is one of the issues I received feedback on, the first round.
>
I got rid of the other config.h includes but somehow missed the last two in this patch.
Please find attached the regenerated patch.
-Vivek
Defining __PHYSICAL_START and __KERNEL_START in asm-i386/page.h works but
it triggers a full kernel rebuild for the silliest of reasons. This
modifies the users to directly use CONFIG_PHYSICAL_START and linux/config.h,
which avoids the full rebuild problem and makes the code much
more maintainer friendly and, hopefully, user friendly.
Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: Vivek Goyal <[email protected]>
---
arch/i386/boot/compressed/head.S | 7 +++----
arch/i386/boot/compressed/misc.c | 7 +++----
arch/i386/kernel/vmlinux.lds.S | 2 +-
include/asm-i386/page.h | 3 ---
4 files changed, 7 insertions(+), 12 deletions(-)
diff -puN arch/i386/boot/compressed/head.S~i386-CONFIG_PHYSICAL_START-cleanup arch/i386/boot/compressed/head.S
--- linux-2.6.18-git17/arch/i386/boot/compressed/head.S~i386-CONFIG_PHYSICAL_START-cleanup 2006-10-03 14:57:50.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/boot/compressed/head.S 2006-10-03 14:57:50.000000000 -0400
@@ -25,7 +25,6 @@
#include <linux/linkage.h>
#include <asm/segment.h>
-#include <asm/page.h>
.globl startup_32
@@ -75,7 +74,7 @@ startup_32:
popl %esi # discard address
popl %esi # real mode pointer
xorl %ebx,%ebx
- ljmp $(__BOOT_CS), $__PHYSICAL_START
+ ljmp $(__BOOT_CS), $CONFIG_PHYSICAL_START
/*
* We come here, if we were loaded high.
@@ -100,7 +99,7 @@ startup_32:
popl %ecx # lcount
popl %edx # high_buffer_start
popl %eax # hcount
- movl $__PHYSICAL_START,%edi
+ movl $CONFIG_PHYSICAL_START,%edi
cli # make sure we don't get interrupted
ljmp $(__BOOT_CS), $0x1000 # and jump to the move routine
@@ -125,5 +124,5 @@ move_routine_start:
movsl
movl %ebx,%esi # Restore setup pointer
xorl %ebx,%ebx
- ljmp $(__BOOT_CS), $__PHYSICAL_START
+ ljmp $(__BOOT_CS), $CONFIG_PHYSICAL_START
move_routine_end:
diff -puN arch/i386/boot/compressed/misc.c~i386-CONFIG_PHYSICAL_START-cleanup arch/i386/boot/compressed/misc.c
--- linux-2.6.18-git17/arch/i386/boot/compressed/misc.c~i386-CONFIG_PHYSICAL_START-cleanup 2006-10-03 14:57:50.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/boot/compressed/misc.c 2006-10-03 14:59:26.000000000 -0400
@@ -13,7 +13,6 @@
#include <linux/vmalloc.h>
#include <linux/screen_info.h>
#include <asm/io.h>
-#include <asm/page.h>
/*
* gzip declarations
@@ -303,7 +302,7 @@ static void setup_normal_output_buffer(v
#else
if ((RM_ALT_MEM_K > RM_EXT_MEM_K ? RM_ALT_MEM_K : RM_EXT_MEM_K) < 1024) error("Less than 2MB of memory");
#endif
- output_data = (unsigned char *)__PHYSICAL_START; /* Normally Points to 1M */
+ output_data = (unsigned char *)CONFIG_PHYSICAL_START; /* Normally Points to 1M */
free_mem_end_ptr = (long)real_mode;
}
@@ -326,8 +325,8 @@ static void setup_output_buffer_if_we_ru
low_buffer_size = low_buffer_end - LOW_BUFFER_START;
high_loaded = 1;
free_mem_end_ptr = (long)high_buffer_start;
- if ( (__PHYSICAL_START + low_buffer_size) > ((ulg)high_buffer_start)) {
- high_buffer_start = (uch *)(__PHYSICAL_START + low_buffer_size);
+ if ( (CONFIG_PHYSICAL_START + low_buffer_size) > ((ulg)high_buffer_start)) {
+ high_buffer_start = (uch *)(CONFIG_PHYSICAL_START + low_buffer_size);
mv->hcount = 0; /* say: we need not to move high_buffer */
}
else mv->hcount = -1;
diff -puN arch/i386/kernel/vmlinux.lds.S~i386-CONFIG_PHYSICAL_START-cleanup arch/i386/kernel/vmlinux.lds.S
--- linux-2.6.18-git17/arch/i386/kernel/vmlinux.lds.S~i386-CONFIG_PHYSICAL_START-cleanup 2006-10-03 14:57:50.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/kernel/vmlinux.lds.S 2006-10-03 14:59:14.000000000 -0400
@@ -21,7 +21,7 @@ PHDRS {
}
SECTIONS
{
- . = __KERNEL_START;
+ . = LOAD_OFFSET + CONFIG_PHYSICAL_START;
phys_startup_32 = startup_32 - LOAD_OFFSET;
/* read-only */
.text : AT(ADDR(.text) - LOAD_OFFSET) {
diff -puN include/asm-i386/page.h~i386-CONFIG_PHYSICAL_START-cleanup include/asm-i386/page.h
--- linux-2.6.18-git17/include/asm-i386/page.h~i386-CONFIG_PHYSICAL_START-cleanup 2006-10-03 14:57:50.000000000 -0400
+++ linux-2.6.18-git17-root/include/asm-i386/page.h 2006-10-03 14:57:50.000000000 -0400
@@ -112,12 +112,9 @@ extern int page_is_ram(unsigned long pag
#ifdef __ASSEMBLY__
#define __PAGE_OFFSET CONFIG_PAGE_OFFSET
-#define __PHYSICAL_START CONFIG_PHYSICAL_START
#else
#define __PAGE_OFFSET ((unsigned long)CONFIG_PAGE_OFFSET)
-#define __PHYSICAL_START ((unsigned long)CONFIG_PHYSICAL_START)
#endif
-#define __KERNEL_START (__PAGE_OFFSET + __PHYSICAL_START)
#define PAGE_OFFSET ((unsigned long)__PAGE_OFFSET)
_
On Tue, 3 Oct 2006 13:25:11 -0400
Vivek Goyal <[email protected]> wrote:
> Increasingly the cobbled together boot protocol that
> is bzImage does not have the flexibility to deal
> with booting in new situations.
>
> Now that we no longer support the bootsector loader
> we have 512 bytes at the very start of a bzImage that
> we can use for other things.
>
> Placing an ELF header there allows us to retain
> a single binary for all of x86 while at the same
> time describing things that bzImage does not allow
> us to describe.
Seems that the entire kernel effort is an ongoing plot to make my poor
little Vaio stop working. This patch turns it into a black-screened rock
as soon as it does grub -> linux. Stock-standard FC5 install, config at
http://userweb.kernel.org/~akpm/config-sony.txt.
On Tue, Oct 03, 2006 at 08:13:40PM -0700, Andrew Morton wrote:
> On Tue, 3 Oct 2006 13:25:11 -0400
> Vivek Goyal <[email protected]> wrote:
>
> > Increasingly the cobbled together boot protocol that
> > is bzImage does not have the flexibility to deal
> > with booting in new situations.
> >
> > Now that we no longer support the bootsector loader
> > we have 512 bytes at the very start of a bzImage that
> > we can use for other things.
> >
> > Placing an ELF header there allows us to retain
> > a single binary for all of x86 while at the same
> > time describing things that bzImage does not allow
> > us to describe.
>
> Seems that the entire kernel effort is an ongoing plot to make my poor
> little Vaio stop working. This patch turns it into a black-screened rock
> as soon as it does grub -> linux. Stock-standard FC5 install, config at
> http://userweb.kernel.org/~akpm/config-sony.txt.
Hi Andrew,
Right now I don't have access to my test machine. Tomorrow morning,
very first thing I am going to try it out with your config file.
This patch just adds an ELF header to bzImage, which is not even used
by grub.
So without this patch you are able to boot the kernel on your laptop?
Thanks
Vivek
Vivek Goyal wrote:
>
> Hi Andrew,
>
> Right now I don't have access to my test machine. Tomorrow morning,
> very first thing I am going to try it out with your config file.
>
> This patch just adds and ELF header to bzImage which is not even used
> by grub.
>
Oh yes, it will be. See below.
> So without this patch you are able to boot the kernel on your laptop?
Danger, Will Robinson. GRUB, Etherboot, and a whole bunch of other boot
loaders will recognize an ELF binary and load it as such. They will
typically load it as an executable (not a relocatable object) -- I doubt
many of them check that appropriate part of the ELF header -- so unless
your kernel can be safely loaded *AND RUN* in that mode this is not
going to work.
The entrypoint is going to be a major headache, since the standard
kernel is entered in real mode, whereas an ELF file will typically be
entered in protected mode, quite possibly using the C calling convention
to pass the command line as (argc, argv). God only knows how they're
going to deal with an initrd.
It may very well be that the ELF magic number has to be obfuscated.
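To make the concern above concrete, here is a minimal sketch of the kind of check a boot loader could perform on the first bytes of an image. It is not taken from GRUB or Etherboot; classify_image() and the image_kind enum are hypothetical names, and the code assumes a little-endian (ELFDATA2LSB) image and a host that provides <elf.h>.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of sniffing the start of a boot image. */
enum image_kind {
	IMAGE_NOT_ELF,		/* no ELF magic: fall back to the bzImage path */
	IMAGE_ELF_EXEC,		/* ET_EXEC: fixed load address */
	IMAGE_ELF_DYN,		/* ET_DYN: may be loaded at another address */
	IMAGE_ELF_OTHER		/* ELF magic, but some other e_type */
};

static enum image_kind classify_image(const unsigned char *buf, size_t len)
{
	unsigned int type;

	/* Need e_ident plus the 16-bit e_type that follows it. */
	if (len < EI_NIDENT + 2)
		return IMAGE_NOT_ELF;
	if (memcmp(buf, ELFMAG, SELFMAG) != 0)
		return IMAGE_NOT_ELF;

	/* e_type sits right after e_ident; decode it as little-endian. */
	type = buf[EI_NIDENT] | (buf[EI_NIDENT + 1] << 8);
	if (type == ET_EXEC)
		return IMAGE_ELF_EXEC;
	if (type == ET_DYN)
		return IMAGE_ELF_DYN;
	return IMAGE_ELF_OTHER;
}
```

A loader that stops after the magic comparison and never looks at e_type would treat both ET_EXEC and ET_DYN images the same way, which is the failure mode described above.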
-hpa
On Wed, 4 Oct 2006 00:28:50 -0400
Vivek Goyal <[email protected]> wrote:
> > Seems that the entire kernel effort is an ongoing plot to make my poor
> > little Vaio stop working. This patch turns it into a black-screened rock
> > as soon as it does grub -> linux. Stock-standard FC5 install, config at
> > http://userweb.kernel.org/~akpm/config-sony.txt.
>
> Hi Andrew,
>
> Right now I don't have access to my test machine. Tomorrow morning,
> very first thing I am going to try it out with your config file.
>
> This patch just adds and ELF header to bzImage which is not even used
> by grub.
>
> So without this patch you are able to boot the kernel on your laptop?
With your other 11 patches applied and not this one, it boots OK.
With this patch applied and not the other eleven applied: no-compile.
With all 12 applied: crash.
Vivek Goyal <[email protected]> writes:
> Increasingly the cobbled together boot protocol that
> is bzImage does not have the flexibility to deal
> with booting in new situations.
>
> Now that we no longer support the bootsector loader
> we have 512 bytes at the very start of a bzImage that
> we can use for other things.
>
> Placing an ELF header there allows us to retain
> a single binary for all of x86 while at the same
> time describing things that bzImage does not allow
> us to describe.
>
> The existing bugger off code for warning if we attempt to
> boot from the bootsector is kept but the error message is
> made more terse so we have a little more room to play with.
Vivek for this first round can we please take out the ELF
note processing. Now that vmlinux has ELF notes of interest
to the bootloader we really should be getting the ELF notes
from there.
So the generation of the ELF notes needs to move into the
vmlinux and then we need to copy them to ELF header.
If we just remove the ELF note munging code from this patch
that should be a good first step in getting the ELF notes correct.
Eric
"H. Peter Anvin" <[email protected]> writes:
> Vivek Goyal wrote:
>> Hi Andrew,
>> Right now I don't have access to my test machine. Tomorrow morning,
>> very first thing I am going to try it out with your config file.
>> This patch just adds and ELF header to bzImage which is not even used
>> by grub.
>>
>
> Oh yes, it will be. See below.
>
>> So without this patch you are able to boot the kernel on your laptop?
>
> Danger, Will Robinson. GRUB, Etherboot, and a whole bunch of other boot loaders
> will recognize an ELF binary and load it as such. They will typically load it
> as an executable (not a relocatable object) -- I doubt many of them check that
> appropriate part of the ELF header -- so unless your kernel can be safely loaded
> *AND RUN* in that mode this is not going to work.
The bzImage can be safely loaded and run in that mode. The only question
is one of arguments, because there are no standards. For Etherboot we are
good. For the insanity that is GRUB I haven't the faintest clue, but we
should be ok as we don't have a multiboot header.
> The entrypoint is going to be a major headache, since the standard kernel is
> entered in real mode, whereas an ELF file will typically be entered in protected
> mode, quite possibly using the C calling convention to pass the command line as
> (argc, argv). God only knows how they're going to deal with an initrd.
>
> It may very well be that the ELF magic number has to be obfuscated.
The entry point that is exported is the kernel's protected mode entry point,
which is used after the real mode code has been run. This is to allow
bootloaders like kexec, where running the real-mode code is insane or
impossible, to be used.
The calling conventions though are not changed; this is just formalizing
something that various groups have been doing for years. Since it is
all in the bzImage we still only have a single file format to support,
so any bootloader that can load a standard bzImage and run the kernel's
real mode code should still do it that way. If you can't, the
rest of the information is available.
Eric
hi,
Sorry for the late feedback...
Vivek Goyal wrote:
>
> On x86_64 we have to be careful with calculating the physical
> address of kernel symbols. Both because of compiler odditities
> and because the symbols live in a different range of the virtual
> address space.
>
[snip]
> +#define __pa_symbol(x) \
> + ({unsigned long v; \
> + asm("" : "=r" (v) : "0" (x)); \
> + __pa(v); })
Why not simply reuse RELOC_HIDE like this?
#define __pa_symbol(x) __pa(RELOC_HIDE(x,0))
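For readers unfamiliar with the trick both variants rely on, here is a small user-space sketch, assuming GCC or Clang (it uses statement expressions and inline asm). An empty asm with a matching "0" constraint passes the value through a register, so the compiler can no longer reason about which object the address came from. HIDE_ADDR, FAKE_PAGE_OFFSET and fake_pa_symbol() are illustrative names only, not kernel definitions.

```c
#include <stdint.h>

/*
 * Copy a value through an empty asm statement. The asm emits no
 * instructions, but the compiler must assume the output is unrelated
 * to the input, which defeats symbol-based address arithmetic.
 */
#define HIDE_ADDR(x) \
	({ uintptr_t hidden_; \
	   __asm__ ("" : "=r" (hidden_) : "0" ((uintptr_t)(x))); \
	   hidden_; })

/* Stand-in for the kernel's virtual-to-physical offset (made up). */
#define FAKE_PAGE_OFFSET ((uintptr_t)0x1000)

static char sample_symbol[16];

/* Analogue of __pa_symbol(): hide the address, then subtract the offset. */
static uintptr_t fake_pa_symbol(const void *sym)
{
	return HIDE_ADDR(sym) - FAKE_PAGE_OFFSET;
}
```

The value is unchanged by the asm; only the compiler's knowledge of its origin is removed, so fake_pa_symbol(sample_symbol) equals the address of sample_symbol minus FAKE_PAGE_OFFSET.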
thanks
Franck
> /* writeable */
> @@ -64,6 +66,7 @@ SECTIONS
> *(.data.nosave)
> . = ALIGN(4096);
> __nosave_end = .;
> + LONG(0)
> }
>
> . = ALIGN(4096);
You're wasting one full page once for each of these LONG(0)s because
of the following 4096 alignment.
Isn't there some way to do this less wastefully?
-Andi
On Tuesday 03 October 2006 19:06, Vivek Goyal wrote:
>
> o Currently there is no specific alignment restriction in linker script
> and in some cases it can be placed non 4K aligned addresses. This fails
> kexec which checks that segment to be loaded is page aligned.
>
> o I guess, it does not harm data segment to be 4K aligned.
IIRC the P4 optimization guide even recommends keeping writable data
one page away from code to avoid some cache invalidations. But:
> diff -puN arch/i386/kernel/vmlinux.lds.S~i386-force-data-section-to-4K-aligned arch/i386/kernel/vmlinux.lds.S
> --- linux-2.6.18-git17/arch/i386/kernel/vmlinux.lds.S~i386-force-data-section-to-4K-aligned 2006-10-02 13:17:58.000000000 -0400
> +++ linux-2.6.18-git17-root/arch/i386/kernel/vmlinux.lds.S 2006-10-02 14:38:17.000000000 -0400
> @@ -52,6 +52,7 @@ SECTIONS
> }
>
> /* writeable */
> + . = ALIGN(4096);
> .data : AT(ADDR(.data) - LOAD_OFFSET) { /* Data */
> *(.data)
> CONSTRUCTORS
I would move the ".tracedata" section behind it first.
-Andi
Andi Kleen <[email protected]> writes:
>> /* writeable */
>> @@ -64,6 +66,7 @@ SECTIONS
>> *(.data.nosave)
>> . = ALIGN(4096);
>> __nosave_end = .;
>> + LONG(0)
>> }
>>
>> . = ALIGN(4096);
>
> You're wasting one full page once for each of these LONG(0)s because
> of the following 4096 alignment.
>
> Isn't there some way to do this less wastefull?
So the problem is that we have sections that don't get relocated, which
confuses things. If the first thing that happened was that the size was
checked to see if it was non-zero before we did anything, I think we
wouldn't care if the linker messed up in this way.
But I may be wrong on that point.
Eric
On Wed, Oct 04, 2006 at 01:08:56AM -0600, Eric W. Biederman wrote:
> Vivek Goyal <[email protected]> writes:
>
> > Increasingly the cobbled together boot protocol that
> > is bzImage does not have the flexibility to deal
> > with booting in new situations.
> >
> > Now that we no longer support the bootsector loader
> > we have 512 bytes at the very start of a bzImage that
> > we can use for other things.
> >
> > Placing an ELF header there allows us to retain
> > a single binary for all of x86 while at the same
> > time describing things that bzImage does not allow
> > us to describe.
> >
> > The existing bugger off code for warning if we attempt to
> > boot from the bootsector is kept but the error message is
> > made more terse so we have a little more room to play with.
>
> Vivek for this first round can we please take out the ELF
> note processing. Now that vmlinux has ELF notes of interest
> to the bootloader we really should be getting the ELF notes
> from there.
>
> So the generation of the ELF notes needs to move into the
> vmlinux and then we need to copy them to ELF header.
>
> If we just remove the ELF note munging code from this patch
> that should be a good first step in getting the ELF notes correct.
Hi Eric,
Sure. I will get rid of ELF note generation for the bzImage ELF header.
But would that stop bootloaders out there from treating the kernel as
an ELF executable?
I have got an FC5 machine with grub version 0.97 and everything seems
to work for me. So I am assuming that Andrew has a newer version of
GRUB which is trying to load the kernel as an ELF executable and then
running into the issues.
Thanks
Vivek
On Wed, Oct 04, 2006 at 08:07:36AM -0600, Eric W. Biederman wrote:
> Andi Kleen <[email protected]> writes:
>
> >> /* writeable */
> >> @@ -64,6 +66,7 @@ SECTIONS
> >> *(.data.nosave)
> >> . = ALIGN(4096);
> >> __nosave_end = .;
> >> + LONG(0)
> >> }
> >>
> >> . = ALIGN(4096);
> >
> > You're wasting one full page once for each of these LONG(0)s because
> > of the following 4096 alignment.
> >
> > Isn't there some way to do this less wastefull?
>
> So the problem is that we have sections that don't get relocated which
> confuses things. If the first that happened was that the size was
> check to see if it was non-zero before we did anything I think we
> wouldn't care if the linker messed up in this way.
>
Actually in this case, if the section size is zero, the linker does not even
output that section and simply gets rid of it. What is left behind is
just the symbols (which were supposed to be section relative), and the
linker simply makes those symbols absolute. Absolute symbols are not
to be relocated, so the patch just filters out those symbols and they don't
get relocated. So I am not sure where I could check the section size.
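As a rough illustration of the filtering mentioned above (not the actual code from the patch series), a relocation-processing tool could decide per symbol whether to apply a load offset by looking at the symbol's section index. sym_needs_relocation() is a hypothetical helper; SHN_ABS and Elf32_Sym come from <elf.h>.

```c
#include <elf.h>

/*
 * Return 1 if a symbol's value should move with the kernel image,
 * 0 if it is absolute and must be left alone.
 *
 * A symbol defined in a section the linker has discarded ends up with
 * st_shndx == SHN_ABS, so it is skipped here even though the linker
 * script author intended it to be section relative.
 */
static int sym_needs_relocation(const Elf32_Sym *sym)
{
	return sym->st_shndx != SHN_ABS;
}
```

This shows why the check cannot tell the two cases apart: by the time the symbol table is read, an intentionally absolute symbol and one orphaned by an empty section look identical.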
One other possible solution is to write the kernel code carefully
so that we don't run into such problems even if absolute symbols don't
get relocated. For example, if there are two symbols A and B denoting
a section's start and end, always check if (A<B) before doing anything. Also
make sure that one is not trying to handle multiple sections at the same
time. For example, if A and B represent the start and end of section 1,
C and D represent the start and end of section 2, and one wants to
free the memory between A and D, then it should be done in two steps.
if (A<B)
	free_memory(A,B)
if (C<D)
	free_memory(C,D)
So this code will be safe even if symbols for empty sections become
absolute.
But this looks to be a very awkward solution.
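The two-step check could be sketched in C roughly as follows. This is only an illustration: struct section_range, free_section() and the byte-counting free_memory() stub are made up for the example and stand in for whatever the kernel would actually call.

```c
/* Start and end addresses of one linker-defined section. */
struct section_range {
	unsigned long start;
	unsigned long end;
};

/* Stub that records how much would be freed (illustrative only). */
static unsigned long bytes_freed;

static void free_memory(unsigned long start, unsigned long end)
{
	bytes_freed += end - start;
}

/*
 * Free one section, but only if its bounds are sane. An empty section,
 * or one whose start symbol was left unrelocated and now lies above
 * its end, is silently skipped.
 */
static void free_section(const struct section_range *r)
{
	if (r->start < r->end)
		free_memory(r->start, r->end);
}
```

Each section is handled on its own, so a bogus bound in one section cannot widen the range freed for another.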
Thanks
Vivek
Eric W. Biederman wrote:
>
>> The entrypoint is going to be a major headache, since the standard kernel is
>> entered in real mode, whereas an ELF file will typically be entered in protected
>> mode, quite possibly using the C calling convention to pass the command line as
>> (argc, argv). God only knows how they're going to deal with an initrd.
>>
>> It may very well be that the ELF magic number has to be obfuscated.
>
> The entry point that is exported is the kernels protected mode entry point
> that is used after the real mode code has been run. This is to allow
> bootloaders like kexec where running the real-mode code is insane or
> impossible to be used.
>
> The calling conventions though are not changed, this is just formalizing
> something that various groups have been doing for years. Since it is
> all in the bzImage we still only have a single file format to support,
> so any bootloader that can load a standard bzImage and run the kernels
> real mode code should still do it that way but. If you can't the
> rest of the information is available.
>
Well, it doesn't help if what you end up with for some bootloader is a
nonfunctioning kernel.
-hpa
Andi Kleen wrote:
> On Tuesday 03 October 2006 19:06, Vivek Goyal wrote:
>> o Currently there is no specific alignment restriction in linker script
>> and in some cases it can be placed non 4K aligned addresses. This fails
>> kexec which checks that segment to be loaded is page aligned.
>>
>> o I guess, it does not harm data segment to be 4K aligned.
>
> iirc P4 optimization guide even recommends to keep writable data
> away one page from code to avoid some cache invalidations. But:
>
Yes, that's why the .rodata section should be in between.
It's not just the P4, either.
-hpa
On Tue, 3 Oct 2006 13:09:08 -0400
Vivek Goyal <[email protected]> wrote:
> o Relocation patches for i386, moved the symbols in vmlinux.lds.S inside
> sections so that these symbols become section relative and are no more
> absolute. If these symbols become absolute, its bad as they are not
> relocated if kernel is not loaded at the address it has been compiled
> for.
>
> o Ironically, just moving the symbols inside the section does not
> gurantee that symbols inside will not become absolute. Recent
> versions of linkers, do some optimization, and if section size is
> zero, it gets rid of the section and makes any defined symbol as absolute.
>
> o This leads to a failure while second kernel is booting.
> arch/i386/alternative.c frees any pages present between __smp_alt_begin
> and __smp_alt_end. In my case size of section .smp_altinstructions is
> zero and symbol __smpt_alt_begin becomes absolute and is not relocated
> and system crashes while it is trying to free the memory starting
> from __smp_alt_begin.
>
> o This issue is being fixed by the linker guys and they are making sure
> that linker does not get rid of an empty section if there is any
> section relative symbol defined in it. But we need to fix it at
> kernel level too so that people using the linker version without fix,
> are not affected.
>
> o One of the possible solutions is that force the section size to be
> non zero to make sure these symbols don't become absolute. This
> patch implements that.
Would it be reasonable to omit this patch and require that the small number
of people who want to build relocatable kernels install binutils
2.17.50.0.5 or later?
On Wed, Oct 04, 2006 at 09:09:46AM -0700, Andrew Morton wrote:
> On Tue, 3 Oct 2006 13:09:08 -0400
> Vivek Goyal <[email protected]> wrote:
>
> > o Relocation patches for i386, moved the symbols in vmlinux.lds.S inside
> > sections so that these symbols become section relative and are no more
> > absolute. If these symbols become absolute, its bad as they are not
> > relocated if kernel is not loaded at the address it has been compiled
> > for.
> >
> > o Ironically, just moving the symbols inside the section does not
> > gurantee that symbols inside will not become absolute. Recent
> > versions of linkers, do some optimization, and if section size is
> > zero, it gets rid of the section and makes any defined symbol as absolute.
> >
> > o This leads to a failure while second kernel is booting.
> > arch/i386/alternative.c frees any pages present between __smp_alt_begin
> > and __smp_alt_end. In my case size of section .smp_altinstructions is
> > zero and symbol __smpt_alt_begin becomes absolute and is not relocated
> > and system crashes while it is trying to free the memory starting
> > from __smp_alt_begin.
> >
> > o This issue is being fixed by the linker guys and they are making sure
> > that linker does not get rid of an empty section if there is any
> > section relative symbol defined in it. But we need to fix it at
> > kernel level too so that people using the linker version without fix,
> > are not affected.
> >
> > o One of the possible solutions is that force the section size to be
> > non zero to make sure these symbols don't become absolute. This
> > patch implements that.
>
> Would it be reasonable to omit this patch and require that the small number
> of people who want to build relocatable kernels install binutils
> 2.17.50.0.5 or later?
I think that's a reasonable thing to do for now.
Thanks
Vivek
On Wed, Oct 04, 2006 at 01:08:56AM -0600, Eric W. Biederman wrote:
> Vivek Goyal <[email protected]> writes:
>
> > Increasingly the cobbled together boot protocol that
> > is bzImage does not have the flexibility to deal
> > with booting in new situations.
> >
> > Now that we no longer support the bootsector loader
> > we have 512 bytes at the very start of a bzImage that
> > we can use for other things.
> >
> > Placing an ELF header there allows us to retain
> > a single binary for all of x86 while at the same
> > time describing things that bzImage does not allow
> > us to describe.
> >
> > The existing bugger off code for warning if we attempt to
> > boot from the bootsector is kept but the error message is
> > made more terse so we have a little more room to play with.
>
> Vivek for this first round can we please take out the ELF
> note processing. Now that vmlinux has ELF notes of interest
> to the bootloader we really should be getting the ELF notes
> from there.
>
> So the generation of the ELF notes needs to move into the
> vmlinux and then we need to copy them to ELF header.
>
> If we just remove the ELF note munging code from this patch
> that should be a good first step in getting the ELF notes correct.
Please find attached the patch. I have removed the ELF note code
from the patch.
Andrew, does this fix your boot issue with grub? What's the grub version
you are using?
Thanks
Vivek
Increasingly the cobbled together boot protocol that
is bzImage does not have the flexibility to deal
with booting in new situations.
Now that we no longer support the bootsector loader
we have 512 bytes at the very start of a bzImage that
we can use for other things.
Placing an ELF header there allows us to retain
a single binary for all of x86 while at the same
time describing things that bzImage does not allow
us to describe.
The existing bugger off code for warning if we attempt to
boot from the bootsector is kept but the error message is
made more terse so we have a little more room to play with.
Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: Vivek Goyal <[email protected]>
---
arch/i386/boot/Makefile | 2
arch/i386/boot/bootsect.S | 56 ++++++++++-
arch/i386/boot/tools/build.c | 214 ++++++++++++++++++++++++++++++++++++++-----
3 files changed, 244 insertions(+), 28 deletions(-)
diff -puN arch/i386/boot/bootsect.S~i386-boot-Add-an-ELF-header-to-bzImage arch/i386/boot/bootsect.S
--- linux-2.6.18-git17/arch/i386/boot/bootsect.S~i386-boot-Add-an-ELF-header-to-bzImage 2006-10-03 15:08:14.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/boot/bootsect.S 2006-10-04 12:38:34.000000000 -0400
@@ -13,6 +13,11 @@
*
*/
+#include <linux/version.h>
+#include <linux/utsrelease.h>
+#include <linux/compile.h>
+#include <linux/elf.h>
+#include <asm/page.h>
#include <asm/boot.h>
SETUPSECTS = 4 /* default nr of setup-sectors */
@@ -42,10 +47,55 @@ SWAP_DEV = 0 /* SWAP_DEV is now writte
.global _start
_start:
+ehdr:
+ # e_ident is carefully crafted so if this is treated
+ # as an x86 bootsector you will execute through
+ # e_ident and then print the bugger off message.
+ # The 1 stores to bx+di is unfortunate it is
+ # unlikely to affect the ability to print
+ # a message and you aren't supposed to be booting a
+ # bzImage directly from a floppy anyway.
+
+ # e_ident
+ .byte ELFMAG0, ELFMAG1, ELFMAG2, ELFMAG3
+ .byte ELFCLASS32, ELFDATA2LSB, EV_CURRENT, ELFOSABI_STANDALONE
+ .byte 0xeb, 0x3d, 0, 0, 0, 0, 0, 0
+#ifndef CONFIG_RELOCATABLE
+ .word ET_EXEC # e_type
+#else
+ .word ET_DYN # e_type
+#endif
+ .word EM_386 # e_machine
+ .int 1 # e_version
+ .int LOAD_PHYSICAL_ADDR # e_entry
+ .int phdr - _start # e_phoff
+ .int 0 # e_shoff
+ .int 0 # e_flags
+ .word e_ehdr - ehdr # e_ehsize
+ .word e_phdr - phdr # e_phentsize
+ .word 1 # e_phnum
+ .word 40 # e_shentsize
+ .word 0 # e_shnum
+ .word 0 # e_shstrndx
+e_ehdr:
+.org 71
+normalize:
# Normalize the start address
jmpl $BOOTSEG, $start2
+.org 80
+phdr:
+ .int PT_LOAD # p_type
+ .int (SETUPSECTS+1)*512 # p_offset
+ .int LOAD_PHYSICAL_ADDR + __PAGE_OFFSET # p_vaddr
+ .int LOAD_PHYSICAL_ADDR # p_paddr
+ .int SYSSIZE*16 # p_filesz
+ .int 0 # p_memsz
+ .int PF_R | PF_W | PF_X # p_flags
+ .int CONFIG_PHYSICAL_ALIGN # p_align
+e_phdr:
+
start2:
movw %cs, %ax
movw %ax, %ds
@@ -78,11 +128,11 @@ die:
bugger_off_msg:
- .ascii "Direct booting from floppy is no longer supported.\r\n"
- .ascii "Please use a boot loader program instead.\r\n"
+ .ascii "Booting linux without a boot loader is no longer supported.\r\n"
.ascii "\n"
- .ascii "Remove disk and press any key to reboot . . .\r\n"
+ .ascii "Press any key to reboot . . .\r\n"
.byte 0
+ebugger_off_msg:
# Kernel attributes; used by setup
diff -puN arch/i386/boot/Makefile~i386-boot-Add-an-ELF-header-to-bzImage arch/i386/boot/Makefile
--- linux-2.6.18-git17/arch/i386/boot/Makefile~i386-boot-Add-an-ELF-header-to-bzImage 2006-10-03 15:08:14.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/boot/Makefile 2006-10-03 15:08:14.000000000 -0400
@@ -43,7 +43,7 @@ $(obj)/bzImage: BUILDFLAGS := -b
quiet_cmd_image = BUILD $@
cmd_image = $(obj)/tools/build $(BUILDFLAGS) $(obj)/bootsect $(obj)/setup \
- $(obj)/vmlinux.bin $(ROOT_DEV) > $@
+ $(obj)/vmlinux.bin $(ROOT_DEV) vmlinux > $@
$(obj)/zImage $(obj)/bzImage: $(obj)/bootsect $(obj)/setup \
$(obj)/vmlinux.bin $(obj)/tools/build FORCE
diff -puN arch/i386/boot/tools/build.c~i386-boot-Add-an-ELF-header-to-bzImage arch/i386/boot/tools/build.c
--- linux-2.6.18-git17/arch/i386/boot/tools/build.c~i386-boot-Add-an-ELF-header-to-bzImage 2006-10-03 15:08:14.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/boot/tools/build.c 2006-10-03 15:08:14.000000000 -0400
@@ -27,6 +27,11 @@
#include <string.h>
#include <stdlib.h>
#include <stdarg.h>
+#include <elf.h>
+#include <byteswap.h>
+#define USE_BSD
+#include <endian.h>
+#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
@@ -48,6 +53,10 @@ byte buf[1024];
int fd;
int is_big_kernel;
+#define MAX_PHDRS 100
+static Elf32_Ehdr ehdr;
+static Elf32_Phdr phdr[MAX_PHDRS];
+
void die(const char * str, ...)
{
va_list args;
@@ -57,20 +66,151 @@ void die(const char * str, ...)
exit(1);
}
+#if BYTE_ORDER == LITTLE_ENDIAN
+#define le16_to_cpu(val) (val)
+#define le32_to_cpu(val) (val)
+#endif
+#if BYTE_ORDER == BIG_ENDIAN
+#define le16_to_cpu(val) bswap_16(val)
+#define le32_to_cpu(val) bswap_32(val)
+#endif
+
+static uint16_t elf16_to_cpu(uint16_t val)
+{
+ return le16_to_cpu(val);
+}
+
+static uint32_t elf32_to_cpu(uint32_t val)
+{
+ return le32_to_cpu(val);
+}
+
void file_open(const char *name)
{
if ((fd = open(name, O_RDONLY, 0)) < 0)
die("Unable to open `%s': %m", name);
}
+static void read_ehdr(void)
+{
+ if (read(fd, &ehdr, sizeof(ehdr)) != sizeof(ehdr)) {
+ die("Cannot read ELF header: %s\n",
+ strerror(errno));
+ }
+ if (memcmp(ehdr.e_ident, ELFMAG, 4) != 0) {
+ die("No ELF magic\n");
+ }
+ if (ehdr.e_ident[EI_CLASS] != ELFCLASS32) {
+ die("Not a 32 bit executable\n");
+ }
+ if (ehdr.e_ident[EI_DATA] != ELFDATA2LSB) {
+ die("Not a LSB ELF executable\n");
+ }
+ if (ehdr.e_ident[EI_VERSION] != EV_CURRENT) {
+ die("Unknown ELF version\n");
+ }
+ /* Convert the fields to native endian */
+ ehdr.e_type = elf16_to_cpu(ehdr.e_type);
+ ehdr.e_machine = elf16_to_cpu(ehdr.e_machine);
+ ehdr.e_version = elf32_to_cpu(ehdr.e_version);
+ ehdr.e_entry = elf32_to_cpu(ehdr.e_entry);
+ ehdr.e_phoff = elf32_to_cpu(ehdr.e_phoff);
+ ehdr.e_shoff = elf32_to_cpu(ehdr.e_shoff);
+ ehdr.e_flags = elf32_to_cpu(ehdr.e_flags);
+ ehdr.e_ehsize = elf16_to_cpu(ehdr.e_ehsize);
+ ehdr.e_phentsize = elf16_to_cpu(ehdr.e_phentsize);
+ ehdr.e_phnum = elf16_to_cpu(ehdr.e_phnum);
+ ehdr.e_shentsize = elf16_to_cpu(ehdr.e_shentsize);
+ ehdr.e_shnum = elf16_to_cpu(ehdr.e_shnum);
+ ehdr.e_shstrndx = elf16_to_cpu(ehdr.e_shstrndx);
+
+ if ((ehdr.e_type != ET_EXEC) && (ehdr.e_type != ET_DYN)) {
+ die("Unsupported ELF header type\n");
+ }
+ if (ehdr.e_machine != EM_386) {
+ die("Not for x86\n");
+ }
+ if (ehdr.e_version != EV_CURRENT) {
+ die("Unknown ELF version\n");
+ }
+ if (ehdr.e_ehsize != sizeof(Elf32_Ehdr)) {
+ die("Bad Elf header size\n");
+ }
+ if (ehdr.e_phentsize != sizeof(Elf32_Phdr)) {
+ die("Bad program header entry\n");
+ }
+ if (ehdr.e_shentsize != sizeof(Elf32_Shdr)) {
+ die("Bad section header entry\n");
+ }
+ if (ehdr.e_shstrndx >= ehdr.e_shnum) {
+ die("String table index out of bounds\n");
+ }
+}
+
+static void read_phds(void)
+{
+ int i;
+ size_t size;
+ if (ehdr.e_phnum > MAX_PHDRS) {
+ die("%d program headers found, only %d supported\n",
+ ehdr.e_phnum, MAX_PHDRS);
+ }
+ if (lseek(fd, ehdr.e_phoff, SEEK_SET) < 0) {
+ die("Seek to %d failed: %s\n",
+ ehdr.e_phoff, strerror(errno));
+ }
+ size = sizeof(phdr[0])*ehdr.e_phnum;
+ if (read(fd, &phdr, size) != size) {
+ die("Cannot read ELF program headers: %s\n",
+ strerror(errno));
+ }
+ for(i = 0; i < ehdr.e_phnum; i++) {
+ phdr[i].p_type = elf32_to_cpu(phdr[i].p_type);
+ phdr[i].p_offset = elf32_to_cpu(phdr[i].p_offset);
+ phdr[i].p_vaddr = elf32_to_cpu(phdr[i].p_vaddr);
+ phdr[i].p_paddr = elf32_to_cpu(phdr[i].p_paddr);
+ phdr[i].p_filesz = elf32_to_cpu(phdr[i].p_filesz);
+ phdr[i].p_memsz = elf32_to_cpu(phdr[i].p_memsz);
+ phdr[i].p_flags = elf32_to_cpu(phdr[i].p_flags);
+ phdr[i].p_align = elf32_to_cpu(phdr[i].p_align);
+ }
+}
+
+unsigned long vmlinux_memsz(void)
+{
+ unsigned long min, max, size;
+ int i;
+ min = 0xffffffff;
+ max = 0;
+ for(i = 0; i < ehdr.e_phnum; i++) {
+ unsigned long start, end;
+ if (phdr[i].p_type != PT_LOAD)
+ continue;
+ start = phdr[i].p_paddr;
+ end = phdr[i].p_paddr + phdr[i].p_memsz;
+ if (start < min)
+ min = start;
+ if (end > max)
+ max = end;
+ }
+ /* Get the reported size by vmlinux */
+ size = max - min;
+ /* Add 128K for the bootmem bitmap */
+ size += 128*1024;
+ /* Add in space for the initial page tables */
+ size = ((size + (((size + 4095) >> 12)*4)) + 4095) & ~4095;
+ return size;
+}
+
void usage(void)
{
- die("Usage: build [-b] bootsect setup system [rootdev] [> image]");
+ die("Usage: build [-b] bootsect setup system rootdev vmlinux [> image]");
}
int main(int argc, char ** argv)
{
unsigned int i, sz, setup_sectors;
+ unsigned kernel_offset, kernel_filesz, kernel_memsz;
int c;
u32 sys_size;
byte major_root, minor_root;
@@ -81,30 +221,25 @@ int main(int argc, char ** argv)
is_big_kernel = 1;
argc--, argv++;
}
- if ((argc < 4) || (argc > 5))
+ if (argc != 6)
usage();
- if (argc > 4) {
- if (!strcmp(argv[4], "CURRENT")) {
- if (stat("/", &sb)) {
- perror("/");
- die("Couldn't stat /");
- }
- major_root = major(sb.st_dev);
- minor_root = minor(sb.st_dev);
- } else if (strcmp(argv[4], "FLOPPY")) {
- if (stat(argv[4], &sb)) {
- perror(argv[4]);
- die("Couldn't stat root device.");
- }
- major_root = major(sb.st_rdev);
- minor_root = minor(sb.st_rdev);
- } else {
- major_root = 0;
- minor_root = 0;
+ if (!strcmp(argv[4], "CURRENT")) {
+ if (stat("/", &sb)) {
+ perror("/");
+ die("Couldn't stat /");
+ }
+ major_root = major(sb.st_dev);
+ minor_root = minor(sb.st_dev);
+ } else if (strcmp(argv[4], "FLOPPY")) {
+ if (stat(argv[4], &sb)) {
+ perror(argv[4]);
+ die("Couldn't stat root device.");
}
+ major_root = major(sb.st_rdev);
+ minor_root = minor(sb.st_rdev);
} else {
- major_root = DEFAULT_MAJOR_ROOT;
- minor_root = DEFAULT_MINOR_ROOT;
+ major_root = 0;
+ minor_root = 0;
}
fprintf(stderr, "Root device is (%d, %d)\n", major_root, minor_root);
@@ -144,10 +279,11 @@ int main(int argc, char ** argv)
i += c;
}
+ kernel_offset = (setup_sectors + 1)*512;
file_open(argv[3]);
if (fstat (fd, &sb))
die("Unable to stat `%s': %m", argv[3]);
- sz = sb.st_size;
+ kernel_filesz = sz = sb.st_size;
fprintf (stderr, "System is %d kB\n", sz/1024);
sys_size = (sz + 15) / 16;
if (!is_big_kernel && sys_size > DEF_SYSSIZE)
@@ -168,7 +304,37 @@ int main(int argc, char ** argv)
}
close(fd);
- if (lseek(1, 497, SEEK_SET) != 497) /* Write sizes to the bootsector */
+ file_open(argv[5]);
+ read_ehdr();
+ read_phds();
+ close(fd);
+ kernel_memsz = vmlinux_memsz();
+
+ if (lseek(1, 84, SEEK_SET) != 84) /* Write sizes to the bootsector */
+ die("Output: seek failed");
+ buf[0] = (kernel_offset >> 0) & 0xff;
+ buf[1] = (kernel_offset >> 8) & 0xff;
+ buf[2] = (kernel_offset >> 16) & 0xff;
+ buf[3] = (kernel_offset >> 24) & 0xff;
+ if (write(1, buf, 4) != 4)
+ die("Write of kernel file offset failed");
+ if (lseek(1, 96, SEEK_SET) != 96)
+ die("Output: seek failed");
+ buf[0] = (kernel_filesz >> 0) & 0xff;
+ buf[1] = (kernel_filesz >> 8) & 0xff;
+ buf[2] = (kernel_filesz >> 16) & 0xff;
+ buf[3] = (kernel_filesz >> 24) & 0xff;
+ if (write(1, buf, 4) != 4)
+ die("Write of kernel file size failed");
+ if (lseek(1, 100, SEEK_SET) != 100)
+ die("Output: seek failed");
+ buf[0] = (kernel_memsz >> 0) & 0xff;
+ buf[1] = (kernel_memsz >> 8) & 0xff;
+ buf[2] = (kernel_memsz >> 16) & 0xff;
+ buf[3] = (kernel_memsz >> 24) & 0xff;
+ if (write(1, buf, 4) != 4)
+ die("Write of kernel memory size failed");
+ if (lseek(1, 497, SEEK_SET) != 497)
die("Output: seek failed");
buf[0] = setup_sectors;
if (write(1, buf, 1) != 1)
_
On Wed, Oct 04, 2006 at 10:26:44AM +0200, Franck Bui-Huu wrote:
> hi,
>
> Sorry for the late feedback...
>
> Vivek Goyal wrote:
> >
> > On x86_64 we have to be careful with calculating the physical
> > address of kernel symbols. Both because of compiler oddities
> > and because the symbols live in a different range of the virtual
> > address space.
> >
>
> [snip]
>
> > +#define __pa_symbol(x) \
> > + ({unsigned long v; \
> > + asm("" : "=r" (v) : "0" (x)); \
> > + __pa(v); })
>
> Why not simply reusing RELOC_HIDE like this ?
>
> #define __pa_symbol(x) __pa(RELOC_HIDE(x,0))
>
Thanks. The above did not work; the compiler gave the following error
when using __pa_symbol(_text):
error: cast specifies array type
I then explicitly cast x to unsigned long and it seems to be
fine.
Regenerated patch is attached.
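For readers following the arithmetic, here is a minimal user-space sketch in C of what __pa() and __va() compute on i386. The names model_pa and model_va and the 0xC0000000 offset (the usual 3G/1G split) are assumptions for illustration only; the real macros live in include/asm-i386/page.h. As far as I understand RELOC_HIDE, it casts its result back to typeof(ptr), which is why passing an array symbol such as _text directly produces the "cast specifies array type" error, and why converting to unsigned long first avoids it.

```c
#include <stdint.h>

/* Assumed value for illustration: the common i386 3G/1G split. */
#define MODEL_PAGE_OFFSET 0xC0000000u

/* Model of __pa(): kernel virtual address -> physical address. */
static uint32_t model_pa(uint32_t vaddr)
{
    return vaddr - MODEL_PAGE_OFFSET;
}

/* Model of __va(): physical address -> kernel virtual address. */
static uint32_t model_va(uint32_t paddr)
{
    return paddr + MODEL_PAGE_OFFSET;
}
```

With these assumptions, a symbol linked at virtual address 0xC0100000 maps to physical address 0x00100000, and model_va() undoes model_pa().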
On x86_64 we have to be careful with calculating the physical
address of kernel symbols, both because of compiler oddities
and because the symbols live in a different range of the virtual
address space.
Having a definition of __pa_symbol that works on both x86_64 and
i386 simplifies writing code that works for both x86_64 and
i386 that has these kinds of dependencies.
So this patch adds the trivial i386 __pa_symbol definition.
Added assembly magic similar to RELOC_HIDE as suggested by Andi Kleen.
Just picked it up from x86_64.
Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: Vivek Goyal <[email protected]>
---
include/asm-i386/page.h | 3 +++
1 file changed, 3 insertions(+)
diff -puN include/asm-i386/page.h~i386-define-__pa_symbol include/asm-i386/page.h
--- linux-2.6.18-git17/include/asm-i386/page.h~i386-define-__pa_symbol 2006-10-02 14:39:18.000000000 -0400
+++ linux-2.6.18-git17-root/include/asm-i386/page.h 2006-10-04 14:48:54.000000000 -0400
@@ -124,6 +124,9 @@ extern int page_is_ram(unsigned long pag
#define VMALLOC_RESERVE ((unsigned long)__VMALLOC_RESERVE)
#define MAXMEM (-__PAGE_OFFSET-__VMALLOC_RESERVE)
#define __pa(x) ((unsigned long)(x)-PAGE_OFFSET)
+/* __pa_symbol should be used for C visible symbols.
+ This seems to be the official gcc blessed way to do such arithmetic. */
+#define __pa_symbol(x) __pa(RELOC_HIDE((unsigned long)x,0))
#define __va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET))
#define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT)
#ifdef CONFIG_FLATMEM
_
On Tue, Oct 03, 2006 at 09:40:56PM -0700, H. Peter Anvin wrote:
> Vivek Goyal wrote:
> >
> >Hi Andrew,
> >
> >Right now I don't have access to my test machine. Tomorrow morning,
> >very first thing I am going to try it out with your config file.
> >
> >This patch just adds an ELF header to bzImage which is not even used
> >by grub.
> >
>
> Oh yes, it will be. See below.
>
> >So without this patch you are able to boot the kernel on your laptop?
>
> Danger, Will Robinson. GRUB, Etherboot, and a whole bunch of other boot
> loaders will recognize an ELF binary and load it as such. They will
> typically load it as an executable (not a relocatable object) -- I doubt
> many of them check that appropriate part of the ELF header -- so unless
> your kernel can be safely loaded *AND RUN* in that mode this is not
> going to work.
>
> The entrypoint is going to be a major headache, since the standard
> kernel is entered in real mode, whereas an ELF file will typically be
> entered in protected mode, quite possibly using the C calling convention
> to pass the command line as (argc, argv). God only knows how they're
> going to deal with an initrd.
>
> It may very well be that the ELF magic number has to be obfuscated.
>
Eric/Peter,
How about just extending the bzImage format to include some info in the
real mode kernel header, say as protocol version 2.05? I think if we just
include two more fields, whether the kernel is relocatable and an equivalent
of ELF memsz, then this information should be enough for the kexec bzImage
loader to load and run a relocatable kernel from a different address.
Thanks
Vivek
Vivek Goyal wrote:
>
> Eric/Peter,
>
> How about just extending bzImage format to include some info in real mode
> kernel header. Say protocol version 2.05. I think if we just include two
> more fields, is kernel relocatable and equivalent of ELF memsz, then probably
> this information should be enough for kexec bzImage loader to load and run
> a relocatable kernel from a different address.
>
What would be the exact semantics of the "equivalent of ELF memsz"? I
have balked on that one in the past, because the proposed semantics were
unsafe.
I suspect we need at least one more piece of data, which is the required
alignment of a relocated kernel. Either which way, it seems clear that
there is some re-engineering that needs to be done, and I think we need
to better understand *why* the proposed patch failed.
Can this failure be reproduced in a simulator?
-hpa
On Wed, Oct 04, 2006 at 01:27:49PM -0700, H. Peter Anvin wrote:
> Vivek Goyal wrote:
> >
> >Eric/Peter,
> >
> >How about just extending bzImage format to include some info in real mode
> >kernel header. Say protocol version 2.05. I think if we just include two
> >more fields, is kernel relocatable and equivalent of ELF memsz, then
> >probably
> >this information should be enough for kexec bzImage loader to load and run
> >a relocatable kernel from a different address.
> >
>
> What would be the exact semantics of the "equivalent of ELF memsz"? I
> have balked on that one in the past, because the proposed semantics were
> unsafe.
>
memsz will contain the memory required to load the kernel image. And
probably should also include the memory used by kernel in initial boot
up code which is unaccounted and unbounded.
> I suspect we need at least one more piece of data, which is the required
> alignment of a relocated kernel.
Now with the introduction of config option CONFIG_PHYSICAL_ALIGN, it
should be easy to get.
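As a rough illustration of how a loader might use such an alignment value, here is a minimal sketch in C. The helper name align_up is hypothetical and not part of the patch or of any boot protocol; it assumes the alignment is a non-zero power of two.

```c
#include <stdint.h>

/* Round addr up to the next multiple of align.
 * Assumes align is a non-zero power of two (hypothetical helper). */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1) & ~(align - 1);
}
```

For example, with an assumed alignment of 4 MB (0x400000), a candidate load address of 1 MB would be rounded up to 4 MB, while an address that is already aligned is left unchanged.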
> Either which way, it seems clear that
> there is some re-engineering that needs to be done, and I think we need
> to better understand *why* the proposed patch failed.
>
> Can this failure be reproduced in a simulator?
I will try to reproduce in a simulator. Maybe qemu? Any suggestions?
Thanks
Vivek
Vivek Goyal wrote:
>
> memsz will contain the memory required to load the kernel image. And
> probably should also include the memory used by kernel in initial boot
> up code which is unaccounted and unbounded.
>
Right, so that's a major project to produce.
One modification that would be highly desirable is to be able to put
initrd/initramfs in highmem, since people keep adding options which
break the highmem/lowmem boundary without consideration for the
implications; the latest one being vmalloc=.
>> I suspect we need at least one more piece of data, which is the required
>> alignment of a relocated kernel.
>
> Now with the introduction of config option CONFIG_PHYSICAL_ALIGN, it
> should be easy to get.
Yes, that should be easy.
>> Either which way, it seems clear that
>> there is some re-engineering that needs to be done, and I think we need
>> to better understand *why* the proposed patch failed.
>>
>> Can this failure be reproduced in a simulator?
>
> I will try to reproduce in a simulator. Maybe qemu? Any suggestions?
I find Bochs easier to debug under, although it's substantially slower.
-hpa
On Wed, Oct 04, 2006 at 01:52:58PM -0700, H. Peter Anvin wrote:
> Vivek Goyal wrote:
> >
> >memsz will contain the memory required to load the kernel image. And
> >probably should also include the memory used by kernel in initial boot
> >up code which is unaccounted and unbounded.
> >
>
> Right, so that's a major project to produce.
>
Eric is already doing that in his patch. He goes through vmlinux
headers to determine the memory to load the various segments and then
also takes into account the memory required by bootmem bitmap (128K)
and memory consumed by initial page tables (tools/build.c). We can
audit the code more closely for anything missed and can also include
some buffer amount to be safe.
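The calculation described above can be sketched in C roughly as follows. This mirrors the arithmetic of vmlinux_memsz() in the tools/build.c patch, but the struct and function names are made up for illustration, and it takes the PT_LOAD segments as a plain array instead of reading them from vmlinux. It assumes at least one segment is passed in.

```c
#include <stdint.h>

/* Hypothetical stand-in for the PT_LOAD fields used by the estimate. */
struct load_segment {
    uint32_t paddr;  /* physical load address */
    uint32_t memsz;  /* size in memory */
};

static uint32_t estimate_memsz(const struct load_segment *seg, int n)
{
    uint32_t min = 0xffffffff, max = 0, size;
    int i;

    for (i = 0; i < n; i++) {
        uint32_t start = seg[i].paddr;
        uint32_t end = seg[i].paddr + seg[i].memsz;
        if (start < min)
            min = start;
        if (end > max)
            max = end;
    }
    size = max - min;                    /* span covered by vmlinux */
    size += 128 * 1024;                  /* bootmem bitmap */
    size += ((size + 4095) >> 12) * 4;   /* 4 bytes of page table per 4K page */
    return (size + 4095) & ~(uint32_t)4095;  /* round up to a whole page */
}
```

As a worked example under these assumptions, a kernel spanning 3 MB gives 3145728 + 131072 = 3276800 bytes, which is 800 pages and so 3200 bytes of page table entries; 3280000 rounded up to a page boundary is 3280896 bytes (0x321000).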
The only downside is that if somebody later changes the initial
bootup code, they will also have to account for it in
tools/build.c. Having said that, the initial bootup code does not
change often.
Thanks
Vivek
Vivek Goyal wrote:
> On Wed, Oct 04, 2006 at 01:52:58PM -0700, H. Peter Anvin wrote:
>> Vivek Goyal wrote:
>>> memsz will contain the memory required to load the kernel image. And
>>> probably should also include the memory used by kernel in initial boot
>>> up code which is unaccounted and unbounded.
>>>
>> Right, so that's a major project to produce.
>>
>
> Eric is already doing that in his patch. He goes through vmlinux
> headers to determine the memory to load the various segments and then
> also takes into account the memory required by bootmem bitmap (128K)
> and memory consumed by initial page tables (tools/build.c). We can
> audit the code more closely for anything missed and can also include
> some buffer amount to be safe.
>
> The only downside is that if somebody later changes the initial
> bootup code, they will also have to account for it in
> tools/build.c. Having said that, the initial bootup code does not
> change often.
>
No, but it's going to be extremely hard to get this straight unless this
is actually enforced. I suspect there needs to be a check and a message
if this is violated.
-hpa
Vivek Goyal <[email protected]> writes:
> Hi Eric,
>
> Sure. I will get rid of ELF note generation for bzImage ELF header.
>
> But would that stop bootloaders out there from treating kernel as
> an ELF executable?
No. The point of the notes is so that the bootloaders can look
at the kernel and have a strong hint what the right thing to do is.
The reason for taking them out is that we need to put the notes
into vmlinux and then copy the notes in vmlinux into the bzImage.
Taking the notes out just makes way for us to put them back in
properly.
> I have got a FC5 machine with grub version .97 and everything seems
> to work for me. So I am assuming that Andrew got a newer version of
> Grub which is trying to load the kernel as an ELF executable and then
> running into the issues.
We need to figure out how to reproduce this.
Eric
Andrew Morton <[email protected]> writes:
> On Tue, 3 Oct 2006 13:25:11 -0400
> Vivek Goyal <[email protected]> wrote:
>
>> Increasingly the cobbled together boot protocol that
>> is bzImage does not have the flexibility to deal
>> with booting in new situations.
>>
>> Now that we no longer support the bootsector loader
>> we have 512 bytes at the very start of a bzImage that
>> we can use for other things.
>>
>> Placing an ELF header there allows us to retain
>> a single binary for all of x86 while at the same
>> time describing things that bzImage does not allow
>> us to describe.
>
> Seems that the entire kernel effort is an ongoing plot to make my poor
> little Vaio stop working. This patch turns it into a black-screened rock
> as soon as it does grub -> linux. Stock-standard FC5 install, config at
> http://userweb.kernel.org/~akpm/config-sony.txt.
Ugh. I just tested this with a grub 0.97-5 from what I assume is a
standard FC5 install (I haven't touched it) and the kernel boots.
I only have a 64bit user space on that machine so init doesn't
start but I get the rest of the kernel messages.
There were several testers working at redhat so a pure redhat
incompatibility would be a surprise.
I don't think the formula is a simple grub+bzImage == death.
There is something more subtle going on here.
I'm not certain where to start looking. Andrew it might help if we
could get the dying binary just in case some weird compile or
processing problem caused insanely unlikely things like the multiboot
binary to show up in your grub install. I don't think that is it,
but it should allow us to rule out that possibility.
Eric
Eric W. Biederman wrote:
>
> Ugh. I just tested this with a grub 0.97-5 from what I assume is a
> standard FC5 install (I haven't touched it) and the kernel boots.
> I only have a 64bit user space on that machine so init doesn't
> start but I get the rest of the kernel messages.
>
> There were several testers working at redhat so a pure redhat
> incompatibility would be a surprise.
>
> I don't think the formula is a simple grub+bzImage == death.
>
> There is something more subtle going on here.
>
> I'm not certain where to start looking. Andrew it might help if we
> could get the dying binary just in case some weird compile or
> processing problem caused insanely unlikely things like the multiboot
> binary to show up in your grub install. I don't think that is it,
> but it should allow us to rule out that possibility.
>
I would try running it in a more memory-constrained environment.
-hpa
"H. Peter Anvin" <[email protected]> writes:
> Well, it doesn't help if what you end up with for some bootloader is a
> nonfunctioning kernel.
I agree. We need to look at what is happening closely.
However, just because we have some initial glitches doesn't mean we
should give up.
With grub you can say:
kernel --type=biglinux /path/to/bzImage
As I read the code it won't necessarily force the type of kernel image
grub will use but it will refuse to boot if it doesn't recognize
the kernel as the type specified.
The code for grub is in stage2/boot.c:load_image(). It tries a few
other formats before it tests for the linux magic number but
it won't recognize an ELF format executable unless it is a multiboot
or a BSD executable.
Eric
Eric W. Biederman wrote:
> "H. Peter Anvin" <[email protected]> writes:
>
>> Well, it doesn't help if what you end up with for some bootloader is a
>> nonfunctioning kernel.
>
> I agree. We need to look at what is happening closely.
> However, just because we have some initial glitches doesn't mean we
> should give up.
>
> With grub you can say:
> kernel --type=biglinux /path/to/bzImage
>
> As I read the code it won't necessarily force the type of kernel image
> grub will use but it will refuse to boot if it doesn't recognize
> the kernel as the type specified.
>
> The code for grub is in stage2/boot.c:load_image(). It tries a few
> other formats before it tests for the linux magic number but
> it won't recognize an ELF format executable unless it is a multiboot
> or a BSD executable.
>
This isn't just about Grub, though. There are probably about a dozen
bootloaders in use on i386, if not more.
-hpa
On Wed, 04 Oct 2006 22:06:27 -0600
[email protected] (Eric W. Biederman) wrote:
> > Seems that the entire kernel effort is an ongoing plot to make my poor
> > little Vaio stop working. This patch turns it into a black-screened rock
> > as soon as it does grub -> linux. Stock-standard FC5 install, config at
> > http://userweb.kernel.org/~akpm/config-sony.txt.
>
> Ugh. I just tested this with a grub 0.97-5 from what I assume is a
> standard FC5 install (I haven't touched it) and the kernel boots.
> I only have a 64bit user space on that machine so init doesn't
> start but I get the rest of the kernel messages.
>
> There were several testers working at redhat so a pure redhat
> incompatibility would be a surprise.
>
> I don't think the formula is a simple grub+bzImage == death.
>
> There is something more subtle going on here.
>
> I'm not certain where to start looking. Andrew it might help if we
> could get the dying binary just in case some weird compile or
> processing problem caused insanely unlikely things like the multiboot
> binary to show up in your grub install. I don't think that is it,
> but it should allow us to rule out that possibility.
I tested it with Vivek's fix (below) and it still dies immediately.
The grub record is
title new (2.6.19-rc1)
root (hd0,5)
kernel /boot/bzImage-2.6.19-rc1 ro root=LABEL=/ rhgb vga=0x263
initrd /boot/initrd-2.6.19-rc1.img
various binaries are at http://userweb.kernel.org/~akpm/reloc/
arch/i386/boot/bootsect.S | 42 +-----------------------------------
1 files changed, 2 insertions(+), 40 deletions(-)
diff -puN arch/i386/boot/bootsect.S~i386-boot-add-an-elf-header-to-bzimage-fix arch/i386/boot/bootsect.S
--- a/arch/i386/boot/bootsect.S~i386-boot-add-an-elf-header-to-bzimage-fix
+++ a/arch/i386/boot/bootsect.S
@@ -17,7 +17,6 @@
#include <linux/utsrelease.h>
#include <linux/compile.h>
#include <linux/elf.h>
-#include <linux/elf_boot.h>
#include <asm/page.h>
#include <asm/boot.h>
@@ -73,8 +72,8 @@ ehdr:
.int 0 # e_shoff
.int 0 # e_flags
.word e_ehdr - ehdr # e_ehsize
- .word e_phdr1 - phdr # e_phentsize
- .word (e_phdr - phdr)/(e_phdr1 - phdr) # e_phnum
+ .word e_phdr - phdr # e_phentsize
+ .word 1 # e_phnum
.word 40 # e_shentsize
.word 0 # e_shnum
.word 0 # e_shstrndx
@@ -95,45 +94,8 @@ phdr:
.int 0 # p_memsz
.int PF_R | PF_W | PF_X # p_flags
.int CONFIG_PHYSICAL_ALIGN # p_align
-e_phdr1:
-
- .int PT_NOTE # p_type
- .int b_note - _start # p_offset
- .int 0 # p_vaddr
- .int 0 # p_paddr
- .int e_note - b_note # p_filesz
- .int 0 # p_memsz
- .int 0 # p_flags
- .int 0 # p_align
e_phdr:
-.macro note name, type
- .balign 4
- .int 2f - 1f # n_namesz
- .int 4f - 3f # n_descsz
- .int \type # n_type
- .balign 4
-1: .asciz "\name"
-2: .balign 4
-3:
-.endm
-.macro enote
-4: .balign 4
-.endm
-
- .balign 4
-b_note:
- note ELF_NOTE_BOOT, EIN_PROGRAM_NAME
- .asciz "Linux"
- enote
- note ELF_NOTE_BOOT, EIN_PROGRAM_VERSION
- .asciz UTS_RELEASE
- enote
- note ELF_NOTE_BOOT, EIN_ARGUMENT_STYLE
- .asciz "Linux"
- enote
-e_note:
-
start2:
movw %cs, %ax
movw %ax, %ds
_
Andrew Morton <[email protected]> writes:
> I tested it with Vivek's fix (below) and it still dies immediately.
>
> The grub record is
>
> title new (2.6.19-rc1)
> root (hd0,5)
> kernel /boot/bzImage-2.6.19-rc1 ro root=LABEL=/ rhgb vga=0x263
> initrd /boot/initrd-2.6.19-rc1.img
>
> various binaries are at http://userweb.kernel.org/~akpm/reloc/
Thanks.
The fix was actually to remove a conflict with the other ELF notes we
are starting to generate (in the Xen context) so we can get our act
together that way. I had no reason to suspect it would have had any
connection with your boot failure.
I examined your bzImage and it does not have a multiboot signature
in the first 8k.
I pointed my grub at your bzImage and it booted as far as searching
for init. The only differences were I don't have video mode 0x263
so when prompted for something supported I told it to use video mode
0 instead. My boot partition is (hd0,0) and is just boot, so
I changed the grub configuration to:
title Andrew
root (hd0,0)
kernel /bzImage-2.6.19-rc1 ro root=LABEL=/ rhgb vga=0x263
initrd /initrd-2.6.19-rc1.img
So it feels like a subtle interaction with your hardware, or firmware.
Do things work better if you don't specify a vga=xxx mode?
This is a weird problem.
Eric
Vivek Goyal <[email protected]> writes:
Ok. I just noticed another piece that we want to change for
greater compatibility. We should make the virtual and the physical
addresses the same. Then there is no danger of some loader getting
them mixed up.
I.e. Not:
> +.org 80
> +phdr:
> + .int PT_LOAD # p_type
> + .int (SETUPSECTS+1)*512 # p_offset
> + .int LOAD_PHYSICAL_ADDR + __PAGE_OFFSET # p_vaddr
> + .int LOAD_PHYSICAL_ADDR # p_paddr
> + .int SYSSIZE*16 # p_filesz
> + .int 0 # p_memsz
> + .int PF_R | PF_W | PF_X # p_flags
> + .int CONFIG_PHYSICAL_ALIGN # p_align
> +e_phdr:
> +
but
> +.org 80
> +phdr:
> + .int PT_LOAD # p_type
> + .int (SETUPSECTS+1)*512 # p_offset
> + .int LOAD_PHYSICAL_ADDR # p_vaddr
> + .int LOAD_PHYSICAL_ADDR # p_paddr
> + .int SYSSIZE*16 # p_filesz
> + .int 0 # p_memsz
> + .int PF_R | PF_W | PF_X # p_flags
> + .int CONFIG_PHYSICAL_ALIGN # p_align
> +e_phdr:
> +
> start2:
> movw %cs, %ax
> movw %ax, %ds
> @@ -78,11 +128,11 @@ die:
Eric
On Thu, 05 Oct 2006 00:13:12 -0600
[email protected] (Eric W. Biederman) wrote:
> Do things work better if you don't specify a vga=xxx mode?
yes, without vga=0x263 it boots.
Andrew Morton <[email protected]> writes:
> On Thu, 05 Oct 2006 00:13:12 -0600
> [email protected] (Eric W. Biederman) wrote:
>
>> Do things work better if you don't specify a vga=xxx mode?
>
> yes, without vga=0x263 it boots.
Ok. It will take some digging but I suspect the problem is
that video.S is using a table or a variable placed over the original
boot sector, and expecting it to be zero initialized.
Finding that in the pile of 2000 lines of assembly could take a
little while.
Now at least we have something other people can try and reproduce
this problem with.
Eric
Andrew Morton wrote:
> On Thu, 05 Oct 2006 00:13:12 -0600
> [email protected] (Eric W. Biederman) wrote:
>
>> Do things work better if you don't specify a vga=xxx mode?
>
> yes, without vga=0x263 it boots.
vga= actually patches a specific offset in the boot sector. We don't
actually have 512 bytes, we have some 500-ish bytes plus a small patch
area at the end.
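To make the "patch area at the end" concrete, here is a minimal C sketch of how a boot loader could store a vga= value into a 512-byte bootsector image. The offsets are the ones I understand the documented Linux/i386 boot protocol to use (497 matches the offset build.c seeks to for the setup sector count); the enum names and the helper patch_vid_mode are hypothetical.

```c
#include <stdint.h>

/* Offsets within the 512-byte bootsector (assumed from the boot protocol). */
enum {
    BS_SETUP_SECTS = 0x1F1,  /* 497: number of setup sectors */
    BS_VID_MODE    = 0x1FA,  /* video mode, patched for vga= */
    BS_ROOT_DEV    = 0x1FC,  /* default root device */
    BS_BOOT_FLAG   = 0x1FE   /* 0xAA55 signature */
};

/* Store a 16-bit video mode in little-endian order (hypothetical helper). */
static void patch_vid_mode(uint8_t *bootsect, uint16_t mode)
{
    bootsect[BS_VID_MODE]     = (uint8_t)(mode & 0xff);
    bootsect[BS_VID_MODE + 1] = (uint8_t)(mode >> 8);
}
```

Under these assumptions, vga=0x263 ends up as the bytes 0x63, 0x02 at offsets 0x1FA and 0x1FB, well past the first 497 bytes.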
-hpa
In the lazy programmer school of fixes.
I haven't really tested this in any configuration.
But reading video.S, it does use variables in the bootsector.
It does seem to initialize the variables before use,
but obviously something is missed.
By zeroing the uninteresting parts of the bootsector just after we
have determined we are loaded ok, we should ensure we are
always in a known state the entire time.
Andrew if I am right about the cause of your video not working
when you set an enhanced video mode this should fix your boot
problem.
Signed-off-by: Eric Biederman <[email protected]>
diff --git a/arch/i386/boot/setup.S b/arch/i386/boot/setup.S
index 53903a4..246ac88 100644
--- a/arch/i386/boot/setup.S
+++ b/arch/i386/boot/setup.S
@@ -287,6 +287,13 @@ # Check if an old loader tries to load a
loader_panic_mess: .string "Wrong loader, giving up..."
loader_ok:
+# Zero initialize the variables we keep in the bootsector
+ xorw %di, %di
+ xorb %al, %al
+ movw $497, %cx
+ rep
+ stosb
+
# Get memory size (extended mem, kB)
xorl %eax, %eax
"H. Peter Anvin" <[email protected]> writes:
> Andrew Morton wrote:
>> On Thu, 05 Oct 2006 00:13:12 -0600
>> [email protected] (Eric W. Biederman) wrote:
>>
>>> Do things work better if you don't specify a vga=xxx mode?
>> yes, without vga=0x263 it boots.
>
> vga= actually patches a specific offset in the boot sector. We don't actually
> have 512 bytes, we have some 500-ish bytes plus a small patch area at the end.
From video.S:
It uses offset 0 in the boot sector. We have 497 bytes that can be used
before we call setup.S; after that, setup.S considers the entire bootsector
its own.
/* Positions of various video parameters passed to the kernel */
/* (see also include/linux/tty.h) */
#define PARAM_CURSOR_POS 0x00
#define PARAM_VIDEO_PAGE 0x04
#define PARAM_VIDEO_MODE 0x06
#define PARAM_VIDEO_COLS 0x07
#define PARAM_VIDEO_EGA_BX 0x0a
#define PARAM_VIDEO_LINES 0x0e
#define PARAM_HAVE_VGA 0x0f
#define PARAM_FONT_POINTS 0x10
#define PARAM_LFB_WIDTH 0x12
#define PARAM_LFB_HEIGHT 0x14
#define PARAM_LFB_DEPTH 0x16
#define PARAM_LFB_BASE 0x18
#define PARAM_LFB_SIZE 0x1c
#define PARAM_LFB_LINELENGTH 0x24
#define PARAM_LFB_COLORS 0x26
#define PARAM_VESAPM_SEG 0x2e
#define PARAM_VESAPM_OFF 0x30
#define PARAM_LFB_PAGES 0x32
#define PARAM_VESA_ATTRIB 0x34
#define PARAM_CAPABILITIES 0x36
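A small C sketch can show which of these parameter offsets land on bytes that the ELF-header patch now fills in. It assumes the layout from the bzImage ELF patch in this thread (a 52-byte 32-bit ELF header at offset 0 and a 32-byte program header at ".org 80"); the helper name is hypothetical, and it only looks at the first byte of each parameter, since the field widths are not listed here.

```c
#include <stdint.h>

/* Byte ranges [start, end) in the bootsector, per the ELF-header patch. */
#define BS_EHDR_END    52u   /* 32-bit ELF header: offsets 0..51 */
#define BS_PHDR_START  80u   /* program header placed with ".org 80" */
#define BS_PHDR_END   112u   /* 32-byte program header: offsets 80..111 */

/* Returns 1 if bootsector offset `off` falls inside the ELF header or
 * the program header (hypothetical helper). */
static int param_overlaps_elf(unsigned off)
{
    if (off < BS_EHDR_END)
        return 1;
    return off >= BS_PHDR_START && off < BS_PHDR_END;
}
```

Under these assumptions, everything from PARAM_CURSOR_POS (0x00) through PARAM_LFB_PAGES (0x32) starts inside the ELF header, while PARAM_VESA_ATTRIB (0x34) and PARAM_CAPABILITIES (0x36) start just past it.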
Eric
Eric W. Biederman wrote:
> In the lazy programmer school of fixes.
>
> I haven't really tested this in any configuration.
> But reading video.S, it does use variables in the bootsector.
> It does seem to initialize the variables before use,
> but obviously something is missed.
>
> By zeroing the uninteresting parts of the bootsector just after we
> have determined we are loaded ok, we should ensure we are
> always in a known state the entire time.
>
> Andrew if I am right about the cause of your video not working
> when you set an enhanced video mode this should fix your boot
> problem.
>
I just noticed we're using string instructions in setup.S, without
forcing DF = 0 anywhere. There should be a "cld" near the top.
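To see why the direction flag matters for a "rep stosb" like the one in the proposed fix, here is a toy model in C. It is only a sketch of the instruction's behaviour with 16-bit addressing inside a single 64 KB segment, not kernel code; the function name and the df parameter are made up for illustration.

```c
#include <stdint.h>

/* Toy model of "rep stosb" within one 64 KB segment.
 * seg must point to 65536 bytes. df models the direction flag:
 * 0 means DI counts up (the state after "cld"), 1 means DI counts down.
 * Returns the final value of DI. */
static uint16_t model_rep_stosb(uint8_t *seg, uint16_t di, uint8_t al,
                                uint16_t cx, int df)
{
    while (cx != 0) {
        seg[di] = al;
        di = (uint16_t)(df ? di - 1 : di + 1);  /* wraps at 64 KB */
        cx--;
    }
    return di;
}
```

In this model, starting at DI = 0 with CX = 497 and DF clear fills offsets 0 through 496. With DF set, the same call writes offset 0 and then wraps around to the top of the segment, filling 65535 down to 65040 and leaving offsets 1 through 496 untouched.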
-hpa
On Thu, Oct 05, 2006 at 12:25:58AM -0600, Eric W. Biederman wrote:
> Vivek Goyal <[email protected]> writes:
>
> Ok. I just noticed another piece that we want to change for
> greater compatibility. We should make the virtual and the physical
> addresses the same. Then there is no danger of some loader getting
> them mixed up.
>
Ok. I changed the virtual address to LOAD_PHYSICAL_ADDR. Please find attached
the regenerated patch.
Increasingly the cobbled together boot protocol that
is bzImage does not have the flexibility to deal
with booting in new situations.
Now that we no longer support the bootsector loader
we have 512 bytes at the very start of a bzImage that
we can use for other things.
Placing an ELF header there allows us to retain
a single binary for all of x86 while at the same
time describing things that bzImage does not allow
us to describe.
The existing bugger off code for warning if we attempt to
boot from the bootsector is kept but the error message is
made more terse so we have a little more room to play with.
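The offsets involved can be checked with a little arithmetic; the following C sketch is one way to do that. The struct simply lists the eight 32-bit fields of a 32-bit ELF program header in their standard order, and the helper names are hypothetical. It shows where p_offset, p_filesz and p_memsz fall when the program header is placed at ".org 80", and where a two-byte short jump stored at byte 8 of e_ident lands.

```c
#include <stddef.h>
#include <stdint.h>

/* Field order of a 32-bit ELF program header (mirrors Elf32_Phdr). */
struct phdr32 {
    uint32_t p_type, p_offset, p_vaddr, p_paddr;
    uint32_t p_filesz, p_memsz, p_flags, p_align;
};

#define BZ_PHDR_OFFSET 80u  /* from ".org 80" in bootsect.S */

/* File offset of a program header field inside the bootsector. */
static unsigned phdr_field_offset(size_t field_off)
{
    return BZ_PHDR_OFFSET + (unsigned)field_off;
}

/* Target of an x86 short jump (opcode 0xEB, rel8) located at insn_off:
 * the displacement is relative to the end of the 2-byte instruction. */
static unsigned short_jmp_target(unsigned insn_off, int8_t rel8)
{
    return (unsigned)((int)insn_off + 2 + rel8);
}
```

Under these assumptions, p_offset, p_filesz and p_memsz sit at file offsets 84, 96 and 100, which are the positions build.c seeks to, and the bytes 0xeb, 0x3d at offset 8 jump to offset 71, matching the ".org 71" before the normalize label.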
Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: Vivek Goyal <[email protected]>
---
arch/i386/boot/Makefile | 2
arch/i386/boot/bootsect.S | 56 ++++++++++-
arch/i386/boot/tools/build.c | 214 ++++++++++++++++++++++++++++++++++++++-----
3 files changed, 244 insertions(+), 28 deletions(-)
diff -puN arch/i386/boot/bootsect.S~i386-boot-Add-an-ELF-header-to-bzImage arch/i386/boot/bootsect.S
--- linux-2.6.18-git17/arch/i386/boot/bootsect.S~i386-boot-Add-an-ELF-header-to-bzImage 2006-10-04 14:49:13.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/boot/bootsect.S 2006-10-05 16:57:25.000000000 -0400
@@ -13,6 +13,11 @@
*
*/
+#include <linux/version.h>
+#include <linux/utsrelease.h>
+#include <linux/compile.h>
+#include <linux/elf.h>
+#include <asm/page.h>
#include <asm/boot.h>
SETUPSECTS = 4 /* default nr of setup-sectors */
@@ -42,10 +47,55 @@ SWAP_DEV = 0 /* SWAP_DEV is now writte
.global _start
_start:
+ehdr:
+ # e_ident is carefully crafted so if this is treated
+ # as an x86 bootsector you will execute through
+ # e_ident and then print the bugger off message.
+ # The 1 stores to bx+di is unfortunate it is
+ # unlikely to affect the ability to print
+ # a message and you aren't supposed to be booting a
+ # bzImage directly from a floppy anyway.
+
+ # e_ident
+ .byte ELFMAG0, ELFMAG1, ELFMAG2, ELFMAG3
+ .byte ELFCLASS32, ELFDATA2LSB, EV_CURRENT, ELFOSABI_STANDALONE
+ .byte 0xeb, 0x3d, 0, 0, 0, 0, 0, 0
+#ifndef CONFIG_RELOCATABLE
+ .word ET_EXEC # e_type
+#else
+ .word ET_DYN # e_type
+#endif
+ .word EM_386 # e_machine
+ .int 1 # e_version
+ .int LOAD_PHYSICAL_ADDR # e_entry
+ .int phdr - _start # e_phoff
+ .int 0 # e_shoff
+ .int 0 # e_flags
+ .word e_ehdr - ehdr # e_ehsize
+ .word e_phdr - phdr # e_phentsize
+ .word 1 # e_phnum
+ .word 40 # e_shentsize
+ .word 0 # e_shnum
+ .word 0 # e_shstrndx
+e_ehdr:
+.org 71
+normalize:
# Normalize the start address
jmpl $BOOTSEG, $start2
+.org 80
+phdr:
+ .int PT_LOAD # p_type
+ .int (SETUPSECTS+1)*512 # p_offset
+ .int LOAD_PHYSICAL_ADDR # p_vaddr
+ .int LOAD_PHYSICAL_ADDR # p_paddr
+ .int SYSSIZE*16 # p_filesz
+ .int 0 # p_memsz
+ .int PF_R | PF_W | PF_X # p_flags
+ .int CONFIG_PHYSICAL_ALIGN # p_align
+e_phdr:
+
start2:
movw %cs, %ax
movw %ax, %ds
@@ -78,11 +128,11 @@ die:
bugger_off_msg:
- .ascii "Direct booting from floppy is no longer supported.\r\n"
- .ascii "Please use a boot loader program instead.\r\n"
+ .ascii "Booting linux without a boot loader is no longer supported.\r\n"
.ascii "\n"
- .ascii "Remove disk and press any key to reboot . . .\r\n"
+ .ascii "Press any key to reboot . . .\r\n"
.byte 0
+ebugger_off_msg:
# Kernel attributes; used by setup
diff -puN arch/i386/boot/Makefile~i386-boot-Add-an-ELF-header-to-bzImage arch/i386/boot/Makefile
--- linux-2.6.18-git17/arch/i386/boot/Makefile~i386-boot-Add-an-ELF-header-to-bzImage 2006-10-04 14:49:13.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/boot/Makefile 2006-10-04 14:49:13.000000000 -0400
@@ -43,7 +43,7 @@ $(obj)/bzImage: BUILDFLAGS := -b
quiet_cmd_image = BUILD $@
cmd_image = $(obj)/tools/build $(BUILDFLAGS) $(obj)/bootsect $(obj)/setup \
- $(obj)/vmlinux.bin $(ROOT_DEV) > $@
+ $(obj)/vmlinux.bin $(ROOT_DEV) vmlinux > $@
$(obj)/zImage $(obj)/bzImage: $(obj)/bootsect $(obj)/setup \
$(obj)/vmlinux.bin $(obj)/tools/build FORCE
diff -puN arch/i386/boot/tools/build.c~i386-boot-Add-an-ELF-header-to-bzImage arch/i386/boot/tools/build.c
--- linux-2.6.18-git17/arch/i386/boot/tools/build.c~i386-boot-Add-an-ELF-header-to-bzImage 2006-10-04 14:49:13.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/boot/tools/build.c 2006-10-04 14:49:13.000000000 -0400
@@ -27,6 +27,11 @@
#include <string.h>
#include <stdlib.h>
#include <stdarg.h>
+#include <elf.h>
+#include <byteswap.h>
+#define USE_BSD
+#include <endian.h>
+#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
@@ -48,6 +53,10 @@ byte buf[1024];
int fd;
int is_big_kernel;
+#define MAX_PHDRS 100
+static Elf32_Ehdr ehdr;
+static Elf32_Phdr phdr[MAX_PHDRS];
+
void die(const char * str, ...)
{
va_list args;
@@ -57,20 +66,151 @@ void die(const char * str, ...)
exit(1);
}
+#if BYTE_ORDER == LITTLE_ENDIAN
+#define le16_to_cpu(val) (val)
+#define le32_to_cpu(val) (val)
+#endif
+#if BYTE_ORDER == BIG_ENDIAN
+#define le16_to_cpu(val) bswap_16(val)
+#define le32_to_cpu(val) bswap_32(val)
+#endif
+
+static uint16_t elf16_to_cpu(uint16_t val)
+{
+ return le16_to_cpu(val);
+}
+
+static uint32_t elf32_to_cpu(uint32_t val)
+{
+ return le32_to_cpu(val);
+}
+
void file_open(const char *name)
{
if ((fd = open(name, O_RDONLY, 0)) < 0)
die("Unable to open `%s': %m", name);
}
+static void read_ehdr(void)
+{
+ if (read(fd, &ehdr, sizeof(ehdr)) != sizeof(ehdr)) {
+ die("Cannot read ELF header: %s\n",
+ strerror(errno));
+ }
+ if (memcmp(ehdr.e_ident, ELFMAG, 4) != 0) {
+ die("No ELF magic\n");
+ }
+ if (ehdr.e_ident[EI_CLASS] != ELFCLASS32) {
+ die("Not a 32 bit executable\n");
+ }
+ if (ehdr.e_ident[EI_DATA] != ELFDATA2LSB) {
+ die("Not a LSB ELF executable\n");
+ }
+ if (ehdr.e_ident[EI_VERSION] != EV_CURRENT) {
+ die("Unknown ELF version\n");
+ }
+ /* Convert the fields to native endian */
+ ehdr.e_type = elf16_to_cpu(ehdr.e_type);
+ ehdr.e_machine = elf16_to_cpu(ehdr.e_machine);
+ ehdr.e_version = elf32_to_cpu(ehdr.e_version);
+ ehdr.e_entry = elf32_to_cpu(ehdr.e_entry);
+ ehdr.e_phoff = elf32_to_cpu(ehdr.e_phoff);
+ ehdr.e_shoff = elf32_to_cpu(ehdr.e_shoff);
+ ehdr.e_flags = elf32_to_cpu(ehdr.e_flags);
+ ehdr.e_ehsize = elf16_to_cpu(ehdr.e_ehsize);
+ ehdr.e_phentsize = elf16_to_cpu(ehdr.e_phentsize);
+ ehdr.e_phnum = elf16_to_cpu(ehdr.e_phnum);
+ ehdr.e_shentsize = elf16_to_cpu(ehdr.e_shentsize);
+ ehdr.e_shnum = elf16_to_cpu(ehdr.e_shnum);
+ ehdr.e_shstrndx = elf16_to_cpu(ehdr.e_shstrndx);
+
+ if ((ehdr.e_type != ET_EXEC) && (ehdr.e_type != ET_DYN)) {
+ die("Unsupported ELF header type\n");
+ }
+ if (ehdr.e_machine != EM_386) {
+ die("Not for x86\n");
+ }
+ if (ehdr.e_version != EV_CURRENT) {
+ die("Unknown ELF version\n");
+ }
+ if (ehdr.e_ehsize != sizeof(Elf32_Ehdr)) {
+ die("Bad Elf header size\n");
+ }
+ if (ehdr.e_phentsize != sizeof(Elf32_Phdr)) {
+ die("Bad program header entry\n");
+ }
+ if (ehdr.e_shentsize != sizeof(Elf32_Shdr)) {
+ die("Bad section header entry\n");
+ }
+ if (ehdr.e_shstrndx >= ehdr.e_shnum) {
+ die("String table index out of bounds\n");
+ }
+}
+
+static void read_phds(void)
+{
+ int i;
+ size_t size;
+ if (ehdr.e_phnum > MAX_PHDRS) {
+ die("Too many program headers: %d (maximum supported: %d)\n",
+ ehdr.e_phnum, MAX_PHDRS);
+ }
+ if (lseek(fd, ehdr.e_phoff, SEEK_SET) < 0) {
+ die("Seek to %d failed: %s\n",
+ ehdr.e_phoff, strerror(errno));
+ }
+ size = sizeof(phdr[0])*ehdr.e_phnum;
+ if (read(fd, &phdr, size) != size) {
+ die("Cannot read ELF program headers: %s\n",
+ strerror(errno));
+ }
+ for(i = 0; i < ehdr.e_phnum; i++) {
+ phdr[i].p_type = elf32_to_cpu(phdr[i].p_type);
+ phdr[i].p_offset = elf32_to_cpu(phdr[i].p_offset);
+ phdr[i].p_vaddr = elf32_to_cpu(phdr[i].p_vaddr);
+ phdr[i].p_paddr = elf32_to_cpu(phdr[i].p_paddr);
+ phdr[i].p_filesz = elf32_to_cpu(phdr[i].p_filesz);
+ phdr[i].p_memsz = elf32_to_cpu(phdr[i].p_memsz);
+ phdr[i].p_flags = elf32_to_cpu(phdr[i].p_flags);
+ phdr[i].p_align = elf32_to_cpu(phdr[i].p_align);
+ }
+}
+
+unsigned long vmlinux_memsz(void)
+{
+ unsigned long min, max, size;
+ int i;
+ min = 0xffffffff;
+ max = 0;
+ for(i = 0; i < ehdr.e_phnum; i++) {
+ unsigned long start, end;
+ if (phdr[i].p_type != PT_LOAD)
+ continue;
+ start = phdr[i].p_paddr;
+ end = phdr[i].p_paddr + phdr[i].p_memsz;
+ if (start < min)
+ min = start;
+ if (end > max)
+ max = end;
+ }
+ /* Get the reported size by vmlinux */
+ size = max - min;
+ /* Add 128K for the bootmem bitmap */
+ size += 128*1024;
+ /* Add in space for the initial page tables */
+ size = ((size + (((size + 4095) >> 12)*4)) + 4095) & ~4095;
+ return size;
+}
+
void usage(void)
{
- die("Usage: build [-b] bootsect setup system [rootdev] [> image]");
+ die("Usage: build [-b] bootsect setup system rootdev vmlinux [> image]");
}
int main(int argc, char ** argv)
{
unsigned int i, sz, setup_sectors;
+ unsigned kernel_offset, kernel_filesz, kernel_memsz;
int c;
u32 sys_size;
byte major_root, minor_root;
@@ -81,30 +221,25 @@ int main(int argc, char ** argv)
is_big_kernel = 1;
argc--, argv++;
}
- if ((argc < 4) || (argc > 5))
+ if (argc != 6)
usage();
- if (argc > 4) {
- if (!strcmp(argv[4], "CURRENT")) {
- if (stat("/", &sb)) {
- perror("/");
- die("Couldn't stat /");
- }
- major_root = major(sb.st_dev);
- minor_root = minor(sb.st_dev);
- } else if (strcmp(argv[4], "FLOPPY")) {
- if (stat(argv[4], &sb)) {
- perror(argv[4]);
- die("Couldn't stat root device.");
- }
- major_root = major(sb.st_rdev);
- minor_root = minor(sb.st_rdev);
- } else {
- major_root = 0;
- minor_root = 0;
+ if (!strcmp(argv[4], "CURRENT")) {
+ if (stat("/", &sb)) {
+ perror("/");
+ die("Couldn't stat /");
+ }
+ major_root = major(sb.st_dev);
+ minor_root = minor(sb.st_dev);
+ } else if (strcmp(argv[4], "FLOPPY")) {
+ if (stat(argv[4], &sb)) {
+ perror(argv[4]);
+ die("Couldn't stat root device.");
}
+ major_root = major(sb.st_rdev);
+ minor_root = minor(sb.st_rdev);
} else {
- major_root = DEFAULT_MAJOR_ROOT;
- minor_root = DEFAULT_MINOR_ROOT;
+ major_root = 0;
+ minor_root = 0;
}
fprintf(stderr, "Root device is (%d, %d)\n", major_root, minor_root);
@@ -144,10 +279,11 @@ int main(int argc, char ** argv)
i += c;
}
+ kernel_offset = (setup_sectors + 1)*512;
file_open(argv[3]);
if (fstat (fd, &sb))
die("Unable to stat `%s': %m", argv[3]);
- sz = sb.st_size;
+ kernel_filesz = sz = sb.st_size;
fprintf (stderr, "System is %d kB\n", sz/1024);
sys_size = (sz + 15) / 16;
if (!is_big_kernel && sys_size > DEF_SYSSIZE)
@@ -168,7 +304,37 @@ int main(int argc, char ** argv)
}
close(fd);
- if (lseek(1, 497, SEEK_SET) != 497) /* Write sizes to the bootsector */
+ file_open(argv[5]);
+ read_ehdr();
+ read_phds();
+ close(fd);
+ kernel_memsz = vmlinux_memsz();
+
+ if (lseek(1, 84, SEEK_SET) != 84) /* Write sizes to the bootsector */
+ die("Output: seek failed");
+ buf[0] = (kernel_offset >> 0) & 0xff;
+ buf[1] = (kernel_offset >> 8) & 0xff;
+ buf[2] = (kernel_offset >> 16) & 0xff;
+ buf[3] = (kernel_offset >> 24) & 0xff;
+ if (write(1, buf, 4) != 4)
+ die("Write of kernel file offset failed");
+ if (lseek(1, 96, SEEK_SET) != 96)
+ die("Output: seek failed");
+ buf[0] = (kernel_filesz >> 0) & 0xff;
+ buf[1] = (kernel_filesz >> 8) & 0xff;
+ buf[2] = (kernel_filesz >> 16) & 0xff;
+ buf[3] = (kernel_filesz >> 24) & 0xff;
+ if (write(1, buf, 4) != 4)
+ die("Write of kernel file size failed");
+ if (lseek(1, 100, SEEK_SET) != 100)
+ die("Output: seek failed");
+ buf[0] = (kernel_memsz >> 0) & 0xff;
+ buf[1] = (kernel_memsz >> 8) & 0xff;
+ buf[2] = (kernel_memsz >> 16) & 0xff;
+ buf[3] = (kernel_memsz >> 24) & 0xff;
+ if (write(1, buf, 4) != 4)
+ die("Write of kernel memory size failed");
+ if (lseek(1, 497, SEEK_SET) != 497)
die("Output: seek failed");
buf[0] = setup_sectors;
if (write(1, buf, 1) != 1)
_
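
The size arithmetic in vmlinux_memsz() and the hand-written little-endian
stores into the bootsector can be checked in isolation. What follows is a
minimal sketch, not part of the patch: memsz_estimate() and put_le32() are
hypothetical names, and the costs (128K for the bootmem bitmap, 4 bytes of
page table per 4K page) are simply copied from the patch above.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical helper mirroring the arithmetic of vmlinux_memsz():
 * [min, max) is the physical span covered by the PT_LOAD segments. */
uint32_t memsz_estimate(uint32_t min, uint32_t max)
{
    uint32_t size, pages;

    assert(max >= min);
    size = max - min;                 /* size reported by vmlinux */
    size += 128 * 1024;               /* bootmem bitmap */
    pages = (size + PAGE_SIZE - 1) / PAGE_SIZE;
    size += pages * 4;                /* one 4-byte page table entry per page */
    return (size + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);   /* round up to 4K */
}

/* Hypothetical helper: store a 32-bit value least significant byte first,
 * which is what the patch does by hand for each bootsector field. */
void put_le32(uint8_t *buf, uint32_t val)
{
    buf[0] = (val >> 0) & 0xff;
    buf[1] = (val >> 8) & 0xff;
    buf[2] = (val >> 16) & 0xff;
    buf[3] = (val >> 24) & 0xff;
}
```

For a 4 MB span, memsz_estimate(0x100000, 0x500000) comes to 0x422000:
0x400000 for the image, 0x20000 for the bitmap, 0x1080 for page tables,
rounded up to the next 4K boundary. put_le32() then stores that value as
the bytes 00 20 42 00.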
On Thu, Oct 05, 2006 at 12:48:10AM -0600, Eric W. Biederman wrote:
> Andrew Morton <[email protected]> writes:
>
> > On Thu, 05 Oct 2006 00:13:12 -0600
> > [email protected] (Eric W. Biederman) wrote:
> >
> >> Do things work better if you don't specify a vga=xxx mode?
> >
> > yes, without vga=0x263 it boots.
>
> Ok. It will take some digging but I suspect the problem is
> that video.S is using a table or a variable placed over the original
> boot sector, and expecting it to be zero initialized.
>
> Finding that in the pile of 2000 lines of assembly could take a
> little while.
>
> Now at least we have something other people can try and reproduce
> this problem with.
>
I have tried it on three machines with various combinations of vga=
but no luck. Can't reproduce the problem at all. :-(
Vivek
On Thu, 05 Oct 2006 09:29:42 -0600
[email protected] (Eric W. Biederman) wrote:
>
> In the lazy programmer school of fixes.
>
> I haven't really tested this in any configuration.
> But reading video.S it does use variable in the bootsector.
> It does seem to initialize the variables before use.
> But obviously something is missed.
>
> By zeroing the uninteresting parts of the bootsector just after we
> have determined we are loaded ok. We should ensure we are
> always in a known state the entire time.
>
> Andrew if I am right about the cause of your video not working
> when you set an enhanced video mode this should fix your boot
> problem.
>
> Signed-off-by: Eric Biederman <[email protected]>
>
> diff --git a/arch/i386/boot/setup.S b/arch/i386/boot/setup.S
> index 53903a4..246ac88 100644
> --- a/arch/i386/boot/setup.S
> +++ b/arch/i386/boot/setup.S
> @@ -287,6 +287,13 @@ # Check if an old loader tries to load a
> loader_panic_mess: .string "Wrong loader, giving up..."
>
> loader_ok:
> +# Zero initialize the variables we keep in the bootsector
> + xorw %di, %di
> + xorb %al, %al
> + movw $497, %cx
> + rep
> + stosb
> +
> # Get memory size (extended mem, kB)
>
> xorl %eax, %eax
That fixed the vga=0x263 crash.
Andrew Morton <[email protected]> writes:
> On Thu, 05 Oct 2006 09:29:42 -0600
> [email protected] (Eric W. Biederman) wrote:
>
>>
>> In the lazy programmer school of fixes.
>>
>> I haven't really tested this in any configuration.
>> But reading video.S it does use variable in the bootsector.
>> It does seem to initialize the variables before use.
>> But obviously something is missed.
>>
>> By zeroing the uninteresting parts of the bootsector just after we
>> have determined we are loaded ok. We should ensure we are
>> always in a known state the entire time.
>>
>> Andrew if I am right about the cause of your video not working
>> when you set an enhanced video mode this should fix your boot
>> problem.
>>
>> Signed-off-by: Eric Biederman <[email protected]>
>>
>> diff --git a/arch/i386/boot/setup.S b/arch/i386/boot/setup.S
>> index 53903a4..246ac88 100644
>> --- a/arch/i386/boot/setup.S
>> +++ b/arch/i386/boot/setup.S
>> @@ -287,6 +287,13 @@ # Check if an old loader tries to load a
>> loader_panic_mess: .string "Wrong loader, giving up..."
>>
>> loader_ok:
>> +# Zero initialize the variables we keep in the bootsector
>> + xorw %di, %di
>> + xorb %al, %al
>> + movw $497, %cx
>> + rep
>> + stosb
>> +
>> # Get memory size (extended mem, kB)
>>
>> xorl %eax, %eax
>
> That fixed the vga=0x263 crash.
Good. We still have to be paranoid and address HPA's missing cld issues,
but otherwise it looks like we are in good shape.
Eric
Vivek Goyal wrote:
> +/* __pa_symbol should be used for C visible symbols.
> + This seems to be the official gcc blessed way to do such arithmetic. */
> +#define __pa_symbol(x) __pa(RELOC_HIDE((unsigned long)x,0))
#define __pa_symbol(x) __pa(RELOC_HIDE((unsigned long)(x),0))
^^^
... should be better. You should not rely on RELOC_HIDE implementation.
Franck
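
The reason for parenthesizing the macro argument can be shown with a
generic example. This is only a sketch: the two macros below are
hypothetical and are not the kernel's __pa_symbol() or RELOC_HIDE(). They
just show how a cast binds more tightly than "+" when the argument is an
expression.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macros for illustration only. */
#define ADDR_UNPARENTHESIZED(x) ((uintptr_t)x)
#define ADDR_PARENTHESIZED(x)   ((uintptr_t)(x))

/* Returns how far apart the two expansions land for the argument `p + 1`. */
uintptr_t cast_gap(int *p)
{
    /* Expands to (uintptr_t)p + 1: the cast applies first, then an
     * integer addition of one byte. */
    uintptr_t loose = ADDR_UNPARENTHESIZED(p + 1);

    /* Expands to (uintptr_t)(p + 1): pointer arithmetic happens first,
     * advancing by sizeof(int) bytes. */
    uintptr_t tight = ADDR_PARENTHESIZED(p + 1);

    return tight - loose;
}
```

With an int pointer the two results differ by sizeof(int) - 1 bytes, so the
unparenthesized form only behaves when the caller passes a plain
identifier or when the wrapped macro happens to protect its argument.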
On Fri, Oct 06, 2006 at 03:10:58PM +0200, Franck Bui-Huu wrote:
> Vivek Goyal wrote:
>
> > +/* __pa_symbol should be used for C visible symbols.
> > + This seems to be the official gcc blessed way to do such arithmetic. */
> > +#define __pa_symbol(x) __pa(RELOC_HIDE((unsigned long)x,0))
>
> #define __pa_symbol(x) __pa(RELOC_HIDE((unsigned long)(x),0))
> ^^^
> ... should be better. You should not rely on RELOC_HIDE implementation.
>
Thanks Franck. Done.
On x86_64 we have to be careful when calculating the physical
address of kernel symbols, both because of compiler oddities
and because the symbols live in a different range of the virtual
address space.
Having a definition of __pa_symbol that works on both x86_64 and
i386 simplifies writing code with these kinds of dependencies
for both architectures.
So this patch adds the trivial i386 __pa_symbol definition.
Added assembly magic similar to RELOC_HIDE as suggested by Andi Kleen.
Just picked it up from x86_64.
Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: Vivek Goyal <[email protected]>
---
include/asm-i386/page.h | 3 +++
1 file changed, 3 insertions(+)
diff -puN include/asm-i386/page.h~i386-define-__pa_symbol include/asm-i386/page.h
--- linux-2.6.18-git17/include/asm-i386/page.h~i386-define-__pa_symbol 2006-10-02 14:39:18.000000000 -0400
+++ linux-2.6.18-git17-root/include/asm-i386/page.h 2006-10-06 13:09:25.000000000 -0400
@@ -124,6 +124,9 @@ extern int page_is_ram(unsigned long pag
#define VMALLOC_RESERVE ((unsigned long)__VMALLOC_RESERVE)
#define MAXMEM (-__PAGE_OFFSET-__VMALLOC_RESERVE)
#define __pa(x) ((unsigned long)(x)-PAGE_OFFSET)
+/* __pa_symbol should be used for C visible symbols.
+ This seems to be the official gcc blessed way to do such arithmetic. */
+#define __pa_symbol(x) __pa(RELOC_HIDE((unsigned long)(x),0))
#define __va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET))
#define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT)
#ifdef CONFIG_FLATMEM
_
On Fri, Oct 06, 2006 at 06:56:03AM -0600, Eric W. Biederman wrote:
> Andrew Morton <[email protected]> writes:
>
> > On Thu, 05 Oct 2006 09:29:42 -0600
> > [email protected] (Eric W. Biederman) wrote:
> >
> >>
> >> In the lazy programmer school of fixes.
> >>
> >> I haven't really tested this in any configuration.
> >> But reading video.S it does use variable in the bootsector.
> >> It does seem to initialize the variables before use.
> >> But obviously something is missed.
> >>
> >> By zeroing the uninteresting parts of the bootsector just after we
> >> have determined we are loaded ok. We should ensure we are
> >> always in a known state the entire time.
> >>
> >> Andrew if I am right about the cause of your video not working
> >> when you set an enhanced video mode this should fix your boot
> >> problem.
> >>
> >> Signed-off-by: Eric Biederman <[email protected]>
> >>
> >> diff --git a/arch/i386/boot/setup.S b/arch/i386/boot/setup.S
> >> index 53903a4..246ac88 100644
> >> --- a/arch/i386/boot/setup.S
> >> +++ b/arch/i386/boot/setup.S
> >> @@ -287,6 +287,13 @@ # Check if an old loader tries to load a
> >> loader_panic_mess: .string "Wrong loader, giving up..."
> >>
> >> loader_ok:
> >> +# Zero initialize the variables we keep in the bootsector
> >> + xorw %di, %di
> >> + xorb %al, %al
> >> + movw $497, %cx
> >> + rep
> >> + stosb
> >> +
> >> # Get memory size (extended mem, kB)
> >>
> >> xorl %eax, %eax
> >
> > That fixed the vga=0x263 crash.
>
> Good. We still have to be paranoid and address HPA's missing cld issues,
> But otherwise it looks like we are in good shape.
>
Hi Eric,
I have added cld in the regenerated patch below.
Also one more minor nit. stosb relies on %es being set properly. By the
time control reaches loader_ok, I could not find %es being set explicitly,
hence I am assuming we are relying on the bootloader to set it up for us.
Maybe we can be a little more paranoid and set up %es before stosb. I
have made this change too in the attached patch. Please have a look.
I know little about assembly code.
In the lazy programmer school of fixes.
I haven't really tested this in any configuration.
But reading video.S, it does use variables in the bootsector.
It does seem to initialize the variables before use,
but obviously something is missed.
By zeroing the uninteresting parts of the bootsector just after we
have determined we are loaded ok, we should ensure we are
always in a known state the entire time.
Andrew, if I am right about the cause of your video not working
when you set an enhanced video mode, this should fix your boot
problem.
Signed-off-by: Eric Biederman <[email protected]>
Signed-off-by: Vivek Goyal <[email protected]>
---
arch/i386/boot/setup.S | 11 +++++++++++
1 file changed, 11 insertions(+)
diff -puN arch/i386/boot/setup.S~i386-set-bootset-to-zero-fix arch/i386/boot/setup.S
--- linux-2.6.18-git17/arch/i386/boot/setup.S~i386-set-bootset-to-zero-fix 2006-10-06 12:42:19.000000000 -0400
+++ linux-2.6.18-git17-root/arch/i386/boot/setup.S 2006-10-06 12:49:37.000000000 -0400
@@ -287,6 +287,17 @@ good_sig:
loader_panic_mess: .string "Wrong loader, giving up..."
loader_ok:
+# Zero initialize the variables we keep in the bootsector
+ movw %cs, %ax # aka SETUPSEG
+ subw $DELTA_INITSEG, %ax # aka INITSEG
+ movw %ax, %es
+ xorw %di, %di
+ xorb %al, %al
+ movw $497, %cx
+ cld
+ rep
+ stosb
+
# Get memory size (extended mem, kB)
xorl %eax, %eax
_
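
The effect of the hunk above can be modelled in C to make the roles of
%es, %di, %cx and the direction flag explicit. This is a rough sketch
under stated assumptions, not kernel code: rep_stosb() and
zero_bootsect_vars() are hypothetical names, memory is modelled as a flat
byte array indexed by the real-mode linear address, and DELTA_INITSEG is
assumed to be 0x20 (SETUPSEG - INITSEG).

```c
#include <assert.h>
#include <stdint.h>

#define DELTA_INITSEG 0x0020u   /* assumed value of SETUPSEG - INITSEG */
#define ZERO_LEN      497u      /* byte 497 holds setup_sects; keep it */

/* Real-mode linear address of seg:off. */
uint32_t linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* Model of "rep stosb": store al at es:di, cx times. With df clear, di
 * counts up; with df set, di counts down and wraps within the segment. */
void rep_stosb(uint8_t *mem, uint16_t es, uint16_t di,
               uint8_t al, uint16_t cx, int df)
{
    while (cx--) {
        mem[linear(es, di)] = al;
        di = df ? (uint16_t)(di - 1) : (uint16_t)(di + 1);
    }
}

/* Model of the added hunk: derive INITSEG from %cs, then clear the first
 * 497 bytes of the bootsector with the direction flag cleared (cld). */
void zero_bootsect_vars(uint8_t *mem, uint16_t cs)
{
    uint16_t es = (uint16_t)(cs - DELTA_INITSEG);

    rep_stosb(mem, es, 0, 0, ZERO_LEN, 0);
}
```

The model also shows why a stray DF=1 matters: calling rep_stosb() with
df set and di starting at 0 writes one byte at offset 0 and then wraps to
the top of the 64K segment instead of walking forward through the
bootsector.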
Vivek Goyal wrote:
>>
> Hi Eric,
>
> I have added cld in the regenerated patch below.
>
No, the cld needs to be earlier. It turns out this isn't the first use
of string instructions.
I think I am going to add DF=0 as a documented entry condition; it
definitely seems all current Linux kernels require it.
> Also one more minor nit. stosb relies on being %es set properly. By the
> time control reaches loader_ok, i could not find %es being set explicitly
> hence I am assuming we are relying on bootloader to set it up for us.
>
> Maybe we can be little more paranoid and setup the %es before stosb. I
> have done this change too in the attached patch. Please have a look.
> I know little about assembly code.
%es being set is part of the requirements list, although it *would* be
better to not rely on any segment registers being set at all (only rely
on %cs.)
-hpa
Vivek Goyal <[email protected]> writes:
> Hi Eric,
>
> I have added cld in the regenerated patch below.
>
> Also one more minor nit. stosb relies on being %es set properly. By the
> time control reaches loader_ok, i could not find %es being set explicitly
> hence I am assuming we are relying on bootloader to set it up for us.
No, my bad. I was thinking it was %ds, like everything else.
> Maybe we can be little more paranoid and setup the %es before stosb. I
> have done this change too in the attached patch. Please have a look.
> I know little about assembly code.
Looks good to me.
Signed-off-by: Eric W. Biederman <[email protected]>
> In the lazy programmer school of fixes.
>
> I haven't really tested this in any configuration.
> But reading video.S it does use variable in the bootsector.
> It does seem to initialize the variables before use.
> But obviously something is missed.
>
> By zeroing the uninteresting parts of the bootsector just after we
> have determined we are loaded ok. We should ensure we are
> always in a known state the entire time.
>
> Andrew if I am right about the cause of your video not working
> when you set an enhanced video mode this should fix your boot
> problem.
>
> Signed-off-by: Eric Biederman <[email protected]>
>
> Signed-off-by: Vivek Goyal <[email protected]>
> ---
>
> arch/i386/boot/setup.S | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff -puN arch/i386/boot/setup.S~i386-set-bootset-to-zero-fix
> arch/i386/boot/setup.S
> --- linux-2.6.18-git17/arch/i386/boot/setup.S~i386-set-bootset-to-zero-fix
> 2006-10-06 12:42:19.000000000 -0400
> +++ linux-2.6.18-git17-root/arch/i386/boot/setup.S 2006-10-06 12:49:37.000000000
> -0400
> @@ -287,6 +287,17 @@ good_sig:
> loader_panic_mess: .string "Wrong loader, giving up..."
>
> loader_ok:
> +# Zero initialize the variables we keep in the bootsector
> + movw %cs, %ax # aka SETUPSEG
> + subw $DELTA_INITSEG, %ax # aka INITSEG
> + movw %ax, %es
> + xorw %di, %di
> + xorb %al, %al
> + movw $497, %cx
> + cld
> + rep
> + stosb
> +
> # Get memory size (extended mem, kB)
>
> xorl %eax, %eax
> _
"H. Peter Anvin" <[email protected]> writes:
> Vivek Goyal wrote:
>>>
>> Hi Eric,
>> I have added cld in the regenerated patch below.
>
> No, the cld needs to be earlier. It turns out this isn't the first use of
> string instructions.
Can we rely on the int calls not setting df? Otherwise we need to clear
df at each use as we do with all of the later uses.
> I think I am going to add DF=0 as a documented entry condition; it definitely
> seems all current Linux kernels require it.
>
>> Also one more minor nit. stosb relies on being %es set properly. By the
>> time control reaches loader_ok, i could not find %es being set explicitly
>> hence I am assuming we are relying on bootloader to set it up for us. Maybe we
>> can be little more paranoid and setup the %es before stosb. I
>> have done this change too in the attached patch. Please have a look.
>> I know little about assembly code.
>
> %es being set is part of the requirements list, although it *would* be better to
> not rely on any segment registers being set at all (only rely on %cs.)
Agreed.
Eric
Eric W. Biederman wrote:
> "H. Peter Anvin" <[email protected]> writes:
>
>> Vivek Goyal wrote:
>>> Hi Eric,
>>> I have added cld in the regenerated patch below.
>> No, the cld needs to be earlier. It turns out this isn't the first use of
>> string instructions.
>
> Can we rely on the int calls not setting df? Otherwise we need to clear
> df at each use as we do with all of the later uses.
>
Yes, we can, with a few exceptions. INT saves the flags and IRET
restores them.
-hpa
On Tue, 3 Oct 2006 13:04:13 -0400
Vivek Goyal <[email protected]> wrote:
> Ld knows about 2 kinds of symbols, absolute and section
> relative. Section relative symbols change value
> when a section is moved and absolute symbols do not.
>
> Currently in the linker script we have several labels
> marking the beginning and ending of sections that
> are outside of sections, making them absolute symbols.
> Having a mixture of absolute and section relative
> symbols referring to the same data is currently harmless
> but it is confusing.
>
> This must be done carefully as newer revs of ld do not place
> symbols that appear in sections without data and instead
> ld makes those symbols global :(
>
> My ultimate goal is to build a relocatable kernel. The
> safest and least intrusive technique is to generate
> relocation entries so the kernel can be relocated at load
> time. The only penalty would be an increase in the size
> of the kernel binary. The problem is that if absolute and
> relocatable symbols are not properly specified absolute symbols
> will be relocated or section relative symbols won't be, which
> is fatal.
>
> The practical motivation is that when generating kernels that
> will run from a reserved area for analyzing what caused
> a kernel panic, it is simpler if you don't need to hard code
> the physical memory location they will run at, especially
> for the distributions.
This patch causes the following warnings:
/opt/crosstool/gcc-4.1.0-glibc-2.3.6/i686-unknown-linux-gnu/bin/i686-unknown-linux-gnu-ld: .tmp_vmlinux1: warning: allocated section `.smp_altinstr_replacement' not in segment
/opt/crosstool/gcc-4.1.0-glibc-2.3.6/i686-unknown-linux-gnu/bin/i686-unknown-linux-gnu-ld: .tmp_vmlinux2: warning: allocated section `.smp_altinstr_replacement' not in segment
/opt/crosstool/gcc-4.1.0-glibc-2.3.6/i686-unknown-linux-gnu/bin/i686-unknown-linux-gnu-ld: vmlinux: warning: allocated section `.smp_altinstr_replacement' not in segment
The patch
i386-force-section-size-to-be-non-zero-to-prevent-a-symbol-becoming-absolute.patch
makes those warnings go away again, but we decided to drop that.
This:
.smp_altinstr_replacement : AT(ADDR(.smp_altinstr_replacement) - LOAD_OFFSET) {
*(.smp_altinstr_replacement)
. = ALIGN(4096);
__smp_alt_end = .;
}
looks odd. What's the point in putting a gap before __smp_alt_end? Moving
__smp_alt_end to before the ALIGN doesn't prevent the warning.
GNU ld version 2.16.1, gcc-4.1.0, config at
http://userweb.kernel.org/~akpm/config-vmm.txt
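
The distinction in the quoted patch description between absolute and
section relative symbols can be pictured with a toy model. This is a
deliberately simplified sketch: the real kernel applies relocation
entries emitted by the linker rather than walking a symbol table, and
struct sym, enum sym_kind and relocate_symbols() are hypothetical names
used only here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sym_kind { SYM_ABSOLUTE, SYM_SECTION_RELATIVE };

struct sym {
    const char *name;
    enum sym_kind kind;
    uint32_t value;
};

/* Shift every section relative symbol by the load offset `delta`.
 * Absolute symbols are constants and must keep their value. */
void relocate_symbols(struct sym *syms, size_t n, uint32_t delta)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (syms[i].kind == SYM_SECTION_RELATIVE)
            syms[i].value += delta;
}
```

If a symbol that marks the end of a section is misclassified as absolute,
it stays where it was while the section it describes moves by delta. That
is the fatal mismatch the description warns about.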
On Fri, Oct 06, 2006 at 11:35:47PM -0700, Andrew Morton wrote:
> On Tue, 3 Oct 2006 13:04:13 -0400
> Vivek Goyal <[email protected]> wrote:
>
> > Ld knows about 2 kinds of symbols, absolute and section
> > relative. Section relative symbols change value
> > when a section is moved and absolute symbols do not.
> >
> > Currently in the linker script we have several labels
> > marking the beginning and ending of sections that
> > are outside of sections, making them absolute symbols.
> > Having a mixture of absolute and section relative
> > symbols referring to the same data is currently harmless
> > but it is confusing.
> >
> > This must be done carefully as newer revs of ld do not place
> > symbols that appear in sections without data and instead
> > ld makes those symbols global :(
> >
> > My ultimate goal is to build a relocatable kernel. The
> > safest and least intrusive technique is to generate
> > relocation entries so the kernel can be relocated at load
> > time. The only penalty would be an increase in the size
> > of the kernel binary. The problem is that if absolute and
> > relocatable symbols are not properly specified absolute symbols
> > will be relocated or section relative symbols won't be, which
> > is fatal.
> >
> > The practical motivation is that when generating kernels that
> > will run from a reserved area for analyzing what caused
> > a kernel panic, it is simpler if you don't need to hard code
> > the physical memory location they will run at, especially
> > for the distributions.
>
> This patch causes the following warnings:
>
> /opt/crosstool/gcc-4.1.0-glibc-2.3.6/i686-unknown-linux-gnu/bin/i686-unknown-linux-gnu-ld: .tmp_vmlinux1: warning: allocated section `.smp_altinstr_replacement' not in segment
> /opt/crosstool/gcc-4.1.0-glibc-2.3.6/i686-unknown-linux-gnu/bin/i686-unknown-linux-gnu-ld: .tmp_vmlinux2: warning: allocated section `.smp_altinstr_replacement' not in segment
> /opt/crosstool/gcc-4.1.0-glibc-2.3.6/i686-unknown-linux-gnu/bin/i686-unknown-linux-gnu-ld: vmlinux: warning: allocated section `.smp_altinstr_replacement' not in segment
>
> The patch
> i386-force-section-size-to-be-non-zero-to-prevent-a-symbol-becoming-absolute.patch
> makes those warnings go away again, but we decided to drop that.
>
> This:
>
> .smp_altinstr_replacement : AT(ADDR(.smp_altinstr_replacement) - LOAD_OFFSET) {
> *(.smp_altinstr_replacement)
> . = ALIGN(4096);
> __smp_alt_end = .;
> }
>
> looks odd. What's the point in putting a gap before __smp_alt_end? Moving
> __smp_alt_end to before the ALIGN doesn't prevent the warning.
>
Actually this ALIGN(4096) was already present before the symbol
__smp_alt_end; that's why we kept it even when we moved __smp_alt_end
inside the section.
But now thinking about it, it looks like this ALIGN(4096) might not be
required. There is already one ALIGN(4096) present after this section
which should take care of protecting other data while this page is freed.
Please find attached a patch for the same. I am also copying Gerd Hoffmann,
who introduced this ALIGN. Gerd, can you please confirm that the above
ALIGN() is not required and that the attached patch should be fine?
As a side effect, the above warning also disappears. It looks like there is
no data in the section .smp_altinstr_replacement, but the above ALIGN()
forced the linker to create a non-empty allocatable section. The type of
the section is NOBITS and that is probably why the linker is emitting the
warning. I will write a separate mail to the linker folks to find out more
about it.
Thanks
Vivek
o There seems to be one extra ALIGN(4096) before the symbol __smp_alt_end.
The only use of __smp_alt_end is to mark the end of the smp alternative
sections so that this memory can be freed. As a physical page is freed,
one just has to make sure that there is no other data on the same page
where __smp_alt_end is pointing. There is already an ALIGN(4096) after
this section which should take care of the above issue. Hence it looks
like the ALIGN(4096) before __smp_alt_end is redundant and not required.
Signed-off-by: Vivek Goyal <[email protected]>
---
linux-2.6.19-rc1-vivek/arch/i386/kernel/vmlinux.lds.S | 1 -
1 files changed, 1 deletion(-)
diff -puN arch/i386/kernel/vmlinux.lds.S~i386-remove-unnecessary-align-option arch/i386/kernel/vmlinux.lds.S
--- linux-2.6.19-rc1/arch/i386/kernel/vmlinux.lds.S~i386-remove-unnecessary-align-option 2006-10-08 12:33:05.000000000 -0400
+++ linux-2.6.19-rc1-vivek/arch/i386/kernel/vmlinux.lds.S 2006-10-08 12:33:05.000000000 -0400
@@ -112,7 +112,6 @@ SECTIONS
}
.smp_altinstr_replacement : AT(ADDR(.smp_altinstr_replacement) - LOAD_OFFSET) {
*(.smp_altinstr_replacement)
- . = ALIGN(4096);
__smp_alt_end = .;
}
_
>> looks odd. What's the point in putting a gap before __smp_alt_end? Moving
>> __smp_alt_end to before the ALIGN doesn't prevent the warning.
>>
> Please find attached a patch for the same. I am also copying Gerd Hoffmann,
> who introduced this ALIGN. Gerd, can you please confirm that above ALIGN()
> is not required and the patch attached should be fine.
The data between __smp_alt_start and __smp_alt_end will be released at
boot time in some cases (UP machine, kernel without CPU_HOTPLUG, ...).
Releasing memory works at page granularity only, that's why I added the
alignment. I think you can't simply drop it.
> o There seems to be one extra ALIGN(4096) before symbol __smp_alt_end. The
> only usage of __smp_alt_end is to mark the end of smp alternative
> sections so that this memory can be freed. As a physical page is freed
> one has to just make sure that there is no other data on the same page
> where __smp_alt_end is pointing. There is already a ALIGN(4096) after
> this section which should take care of the above issue. Hence it looks
> like the ALIGN(4096) before __smp_alt_end is redundant and not required.
Hmm, ok, it should work then. How about adding a comment to make sure
the align after __smp_alt_end doesn't get dropped by accident?
cheers,
Gerd
--
Gerd Hoffmann <[email protected]>
http://www.suse.de/~kraxel/julika-dora.jpeg
On Mon, Oct 09, 2006 at 09:35:26AM +0200, Gerd Hoffmann wrote:
> >> looks odd. What's the point in putting a gap before __smp_alt_end? Moving
> >> __smp_alt_end to before the ALIGN doesn't prevent the warning.
> >>
>
> > Please find attached a patch for the same. I am also copying Gerd Hoffmann,
> > who introduced this ALIGN. Gerd, can you please confirm that above ALIGN()
> > is not required and the patch attached should be fine.
>
> The data between __smp_alt_start and __smp_alt_end will be released at
> boot time in some cases (UP machine, kernel without CPU_HOTPLUG, ...).
>
> Releasing memory works at page granularity only, thats why I added the
> alignment. I think you can't simply drop it.
>
> > o There seems to be one extra ALIGN(4096) before symbol __smp_alt_end. The
> > only usage of __smp_alt_end is to mark the end of smp alternative
> > sections so that this memory can be freed. As a physical page is freed
> > one has to just make sure that there is no other data on the same page
> > where __smp_alt_end is pointing. There is already a ALIGN(4096) after
> > this section which should take care of the above issue. Hence it looks
> > like the ALIGN(4096) before __smp_alt_end is redundant and not required.
>
> Hmm, ok, it should work then. How about adding a comment to make sure
> the align after __smp_alt_end doesn't get dropped by accident?
>
Thanks Gerd. I have put in a comment to make things clearer. Please find
attached the regenerated patch.
o There seems to be one extra ALIGN(4096) before the symbol __smp_alt_end.
The only use of __smp_alt_end is to mark the end of the smp alternative
sections so that this memory can be freed. As a physical page is freed,
one just has to make sure that there is no other data on the same page
where __smp_alt_end is pointing. There is already an ALIGN(4096) after
this section which should take care of the above issue. Hence it looks
like the ALIGN(4096) before __smp_alt_end is redundant and not required.
Signed-off-by: Vivek Goyal <[email protected]>
---
linux-2.6.19-rc1-vivek/arch/i386/kernel/vmlinux.lds.S | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)
diff -puN arch/i386/kernel/vmlinux.lds.S~i386-remove-unnecessary-align-option arch/i386/kernel/vmlinux.lds.S
--- linux-2.6.19-rc1/arch/i386/kernel/vmlinux.lds.S~i386-remove-unnecessary-align-option 2006-10-09 09:39:00.000000000 -0400
+++ linux-2.6.19-rc1-vivek/arch/i386/kernel/vmlinux.lds.S 2006-10-09 09:43:22.000000000 -0400
@@ -112,11 +112,15 @@ SECTIONS
}
.smp_altinstr_replacement : AT(ADDR(.smp_altinstr_replacement) - LOAD_OFFSET) {
*(.smp_altinstr_replacement)
- . = ALIGN(4096);
__smp_alt_end = .;
}
- /* will be freed after init */
+ /* will be freed after init
+ * Following ALIGN() is required to make sure no other data falls on the
+ * same page where __smp_alt_end is pointing as that page might be freed
+ * after boot. Always make sure that ALIGN() directive is present after
+ * the section which contains __smp_alt_end.
+ */
. = ALIGN(4096); /* Init code and data */
.init.text : AT(ADDR(.init.text) - LOAD_OFFSET) {
__init_begin = .;
_
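For readers following the alignment argument in the patch above, here is a
minimal sketch in C of the arithmetic involved. It is not kernel code; the
helper names page_align_up() and next_section_is_safe() are made up for
illustration, and the addresses are plain integers rather than real linker
symbols. It only shows why the ALIGN(4096) after the section is the one that
matters: the page containing __smp_alt_end may be freed, so the next section
must not start before the next page boundary.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u  /* i386 page size, matching ALIGN(4096) */

/* Round addr up to the next page boundary, like ". = ALIGN(4096)". */
static uintptr_t page_align_up(uintptr_t addr)
{
    return (addr + PAGE_SIZE - 1) & ~(uintptr_t)(PAGE_SIZE - 1);
}

/*
 * Hypothetical check: returns 1 if a section starting at next_start shares
 * no page with the freed region that ends (exclusive) at alt_end.
 */
static int next_section_is_safe(uintptr_t alt_end, uintptr_t next_start)
{
    assert(next_start >= alt_end);
    /* The last page touched by the freed region ends at this boundary. */
    return next_start >= page_align_up(alt_end);
}
```

With alt_end = 0x1234, a following section at 0x1240 would share the freed
page, while one at 0x2000 (the effect of the trailing ALIGN) would not.
Aligning __smp_alt_end itself does not change that outcome, which is the
point of the patch.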
On Fri, Oct 06, 2006 at 02:54:12PM -0700, H. Peter Anvin wrote:
> Eric W. Biederman wrote:
> >"H. Peter Anvin" <[email protected]> writes:
> >
> >>Vivek Goyal wrote:
> >>>Hi Eric,
> >>>I have added cld in the regenerated patch below.
> >>No, the cld needs to be earlier. It turns out this isn't the first use of
> >>string instructions.
> >
> >Can we rely on the int calls not setting df? Otherwise we need to clear
> >df at each use as we do with all of the later uses.
> >
>
> Yes, we can, with a few exceptions. INT saves the flags and IRET
> restores them.
>
Ok. I have added the "cld" early in the setup code. I am still retaining
the "cld" just before the string instruction to be on the safe side.
Please find attached the regenerated patch.
In the lazy programmer school of fixes.
I haven't really tested this in any configuration.
But reading video.S, it does use variables in the bootsector.
It does seem to initialize the variables before use.
But obviously something is missed.
By zeroing the uninteresting parts of the bootsector just after we
have determined we are loaded ok, we should ensure we are
always in a known state the entire time.
Andrew, if I am right about the cause of your video not working
when you set an enhanced video mode, this should fix your boot
problem.
Signed-off-by: Eric Biederman <[email protected]>
Signed-off-by: Vivek Goyal <[email protected]>
---
linux-2.6.19-rc1-vivek/arch/i386/boot/setup.S | 12 ++++++++++++
1 files changed, 12 insertions(+)
diff -puN arch/i386/boot/setup.S~i386-set-bootset-to-zero-fix arch/i386/boot/setup.S
--- linux-2.6.19-rc1/arch/i386/boot/setup.S~i386-set-bootset-to-zero-fix 2006-10-09 10:11:58.000000000 -0400
+++ linux-2.6.19-rc1-vivek/arch/i386/boot/setup.S 2006-10-09 10:27:42.000000000 -0400
@@ -167,6 +167,7 @@ trampoline: call start_of_setup
# End of setup header #####################################################
start_of_setup:
+ cld # set DF=0
# Bootlin depends on this being done early
movw $0x01500, %ax
movb $0x81, %dl
@@ -287,6 +288,17 @@ good_sig:
loader_panic_mess: .string "Wrong loader, giving up..."
loader_ok:
+# Zero initialize the variables we keep in the bootsector
+ movw %cs, %ax # aka SETUPSEG
+ subw $DELTA_INITSEG, %ax # aka INITSEG
+ movw %ax, %es
+ xorw %di, %di
+ xorb %al, %al
+ movw $497, %cx
+ cld
+ rep
+ stosb
+
# Get memory size (extended mem, kB)
xorl %eax, %eax
_
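To illustrate why the direction flag matters for the zeroing loop in the
patch above, here is a small C model of "rep stosb". It is only a sketch
under stated assumptions: the struct and function names are invented for
illustration, the segment is modelled as a 64 KiB array, and DI is treated as
a wrapping 16-bit offset. The count 497 is 0x1F1, which as far as I know is
the offset where the boot protocol header fields start (setup_sects), so the
loop clears only the bytes before them.

```c
#include <assert.h>
#include <stdint.h>

#define ZERO_COUNT 497u  /* value the patch loads into %cx (0x1F1) */

/* Minimal model of the registers "rep stosb" uses. */
struct stos_state {
    uint8_t  al;  /* byte to store */
    uint16_t di;  /* offset within the segment */
    uint16_t cx;  /* repeat count */
    int      df;  /* direction flag: 0 = ascending, 1 = descending */
};

/* Emulate "rep stosb" on a 64 KiB segment held in seg[]. */
static void rep_stosb(uint8_t *seg, struct stos_state *s)
{
    while (s->cx != 0) {
        seg[s->di] = s->al;
        /* DI moves up when DF=0 and down when DF=1, wrapping at 16 bits. */
        s->di = (uint16_t)(s->df ? s->di - 1 : s->di + 1);
        s->cx--;
    }
}

/* Hypothetical wrapper mirroring the patch: di = 0, al = 0, cx = 497, cld. */
static void zero_bootsector_vars(uint8_t *seg)
{
    struct stos_state s = { .al = 0, .di = 0, .cx = ZERO_COUNT, .df = 0 };
    rep_stosb(seg, &s);
}
```

With DF clear, offsets 0 through 496 are zeroed and offset 497 onwards is left
alone. If DF were left set by earlier code, the same loop in this model would
write offset 0 and then walk downwards from 0xFFFF, trashing the wrong memory,
which is why the "cld" is done before the string instruction.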
On Mon, 9 Oct 2006 10:33:45 -0400
Vivek Goyal <[email protected]> wrote:
> Please find attached the regenerated patch.
Somewhere amongst the six versions of this patch, the kernel broke. Seems
that the kernel command line isn't getting recognised. The machine is
running LILO and RH FC1.
I'll consolidate the patches which I have now and then I'll drop them.
They are (were), in order:
i386-distinguish-absolute-symbols.patch
i386-distinguish-absolute-symbols-fix.patch
i386-align-data-section-to-4k-boundary.patch
i386-define-__pa_symbol-2.patch
i386-setupc-reserve-kernel-memory-starting-from-_text.patch
i386-config_physical_start-cleanup.patch
make-linux-elfh-safe-to-be-included-in-assembly-files.patch
elf-add-elfosabi_standalone-to-elfh.patch
kallsyms-generate-relocatable-symbols.patch
i386-relocatable-kernel-support.patch
i386-implement-config_physical_align.patch
i386-boot-add-an-elf-header-to-bzimage.patch
i386-boot-add-an-elf-header-to-bzimage-fix.patch
i386-boot-add-an-elf-header-to-bzimage-update-2.patch
i386-boot-add-an-elf-header-to-bzimage-fix-fix.patch
i386-boot-add-an-elf-header-to-bzimage-fix-fix-fix.patch
i386-boot-add-an-elf-header-to-bzimage-fix-fix-fix-fix.patch
Andrew Morton <[email protected]> writes:
> On Mon, 9 Oct 2006 10:33:45 -0400
> Vivek Goyal <[email protected]> wrote:
>
>> Please find attached the regenerated patch.
>
> Somewhere amongst the six versions of this patch, the kernel broke. Seems
> that the kernel command line isn't getting recognised. The machine is
> running LILO and RH FC1.
Ugh. That is no fun :(
Eric
On Mon, Oct 09, 2006 at 08:14:18PM -0700, Andrew Morton wrote:
> On Mon, 9 Oct 2006 10:33:45 -0400
> Vivek Goyal <[email protected]> wrote:
>
> > Please find attached the regenerated patch.
>
> Somewhere amongst the six versions of this patch, the kernel broke. Seems
> that the kernel command line isn't getting recognised. The machine is
> running LILO and RH FC1.
>
> I'll consolidate the patches which I have now and then I'll drop them.
>
Hi Andrew,
I will find a machine running LILO and look into the issue.
Instead of dropping all the patches, can't we just drop the last patch, which
adds an ELF header? Most likely the issue is caused by that patch. The rest
of the patches should be harmless.
Thanks
Vivek
On Tue, 10 Oct 2006 10:30:44 -0400
Vivek Goyal <[email protected]> wrote:
> On Mon, Oct 09, 2006 at 08:14:18PM -0700, Andrew Morton wrote:
> > On Mon, 9 Oct 2006 10:33:45 -0400
> > Vivek Goyal <[email protected]> wrote:
> >
> > > Please find attached the regenerated patch.
> >
> > Somewhere amongst the six versions of this patch, the kernel broke. Seems
> > that the kernel command line isn't getting recognised. The machine is
> > running LILO and RH FC1.
> >
> > I'll consolidate the patches which I have now and then I'll drop them.
> >
>
> Hi Andrew,
>
> I will find a machine having lilo and look into the issue.
Thanks.
> Instead of dropping all the patches, can't we just drop the last patch which
> adds an elf header.
Yes, that patch was the cause.
> Most likely this issue should be happening because of
> that patch. Rest of the patches should be harmless.
Those patches had reached my tolerance threshold and other people want to
make changes to vmlinux.lds.S, and having large and uncertain changes in
there makes that harder for them.
Resend them when you think they're ready for another run please.
On Mon, Oct 09, 2006 at 08:14:18PM -0700, Andrew Morton wrote:
> On Mon, 9 Oct 2006 10:33:45 -0400
> Vivek Goyal <[email protected]> wrote:
>
> > Please find attached the regenerated patch.
>
> Somewhere amongst the six versions of this patch, the kernel broke. Seems
> that the kernel command line isn't getting recognised. The machine is
> running LILO and RH FC1.
>
Hi Andrew,
I tried lilo 22.7.3 with FC6 Test2 with 2.6.19-rc1 and all the relocatable
patches and things work fine for me. The command line is also being
recognized properly.
Looks like something specific to your setup. Can you please provide some
more details:
- Do you see any failure messages?
- Can you please provide your /etc/lilo.conf file.
- What lilo version are you using?
- Can you please also upload your kernel config file.
Thanks
Vivek
On Tue, 10 Oct 2006 17:40:25 -0400
Vivek Goyal <[email protected]> wrote:
> On Mon, Oct 09, 2006 at 08:14:18PM -0700, Andrew Morton wrote:
> > On Mon, 9 Oct 2006 10:33:45 -0400
> > Vivek Goyal <[email protected]> wrote:
> >
> > > Please find attached the regenerated patch.
> >
> > Somewhere amongst the six versions of this patch, the kernel broke. Seems
> > that the kernel command line isn't getting recognised. The machine is
> > running LILO and RH FC1.
> >
>
> Hi Andrew,
>
> I tried lilo 22.7.3 with FC6 Test2 with 2.6.19-rc1 and all the relocatable
> patches and things work fine for me. Command line is also being recognized
> properly.
This is FC1.
> Looks like something specific to your setup. Can you please provide some
> more details
>
> - Do you see any failure messages?
Tricky. No command line means no serial console, no netconsole.
> - Can you please provide your /etc/lilo.conf file.
Various stuff at http://userweb.kernel.org/~akpm/vmm/
> - What lilo version are you using?
vmm:/boot> lilo -v
LILO version 21.4-4, Copyright (C) 1992-1998 Werner Almesberger
'lba32' extensions Copyright (C) 1999,2000 John Coffman
> - Can you please also upload your kernel config file.
See above.