2016-04-29 00:10:03

by Kees Cook

[permalink] [raw]
Subject: [PATCH 0/6] x86/boot: Improve compressed kernel handling

This is the next batch of patches for cleaning up x86 boot to prepare
for KASLR improvements. This fixes a bug (for when relocation can be
above 2G in later patches), standardizes where the compressed kernel is
positioned for decompression, corrects (and simplies) the total kernel
run size calculation, and (using the now available offset constants)
fixes bounds-checking on relocations.

The patch series following this one will split the KASLR physical
and virtual base address randomization.

-Kees


2016-04-29 00:09:21

by Kees Cook

[permalink] [raw]
Subject: [PATCH 5/6] x86/KASLR: Clean up unused code from old "run_size"

From: Yinghai Lu <[email protected]>

Since run_size is now calculated in misc.c, the old script and associated
argument passing is no longer needed. This patch removes them, and renames
"run_size" to the more descriptive "kernel_total_size".

Cc: "H. Peter Anvin" <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Junjie Mao <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>
Signed-off-by: Baoquan He <[email protected]>
[kees: rewrote changelog, rename run_size to kernel_total_size]
Signed-off-by: Kees Cook <[email protected]>
---
arch/x86/boot/compressed/Makefile | 4 +---
arch/x86/boot/compressed/head_32.S | 3 +--
arch/x86/boot/compressed/head_64.S | 3 ---
arch/x86/boot/compressed/misc.c | 11 ++++------
arch/x86/boot/compressed/mkpiggy.c | 10 ++-------
arch/x86/tools/calc_run_size.sh | 42 --------------------------------------
6 files changed, 8 insertions(+), 65 deletions(-)
delete mode 100644 arch/x86/tools/calc_run_size.sh

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 45997805adad..adef26d22224 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -121,10 +121,8 @@ suffix-$(CONFIG_KERNEL_XZ) := xz
suffix-$(CONFIG_KERNEL_LZO) := lzo
suffix-$(CONFIG_KERNEL_LZ4) := lz4

-RUN_SIZE = $(shell $(OBJDUMP) -h vmlinux | \
- $(CONFIG_SHELL) $(srctree)/arch/x86/tools/calc_run_size.sh)
quiet_cmd_mkpiggy = MKPIGGY $@
- cmd_mkpiggy = $(obj)/mkpiggy $< $(RUN_SIZE) > $@ || ( rm -f $@ ; false )
+ cmd_mkpiggy = $(obj)/mkpiggy $< > $@ || ( rm -f $@ ; false )

targets += piggy.S
$(obj)/piggy.S: $(obj)/vmlinux.bin.$(suffix-y) $(obj)/mkpiggy FORCE
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 8a95e2f845aa..1038524270e7 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -238,7 +238,6 @@ relocated:
* Do the extraction, and jump to the new kernel..
*/
/* push arguments for extract_kernel: */
- pushl $z_run_size /* size of kernel with .bss and .brk */
pushl $z_output_len /* decompressed length, end of relocs */

movl BP_init_size(%esi), %eax
@@ -254,7 +253,7 @@ relocated:
pushl %eax /* heap area */
pushl %esi /* real mode pointer */
call extract_kernel /* returns kernel location in %eax */
- addl $28, %esp
+ addl $24, %esp

/*
* Jump to the extracted kernel.
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 09cdc0c3ee7e..7c047002950c 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -415,8 +415,6 @@ relocated:
* Do the extraction, and jump to the new kernel..
*/
pushq %rsi /* Save the real mode argument */
- movq $z_run_size, %r9 /* size of kernel with .bss and .brk */
- pushq %r9
movq %rsi, %rdi /* real mode address */
leaq boot_heap(%rip), %rsi /* malloc area for uncompression */
leaq input_data(%rip), %rdx /* input_data */
@@ -424,7 +422,6 @@ relocated:
movq %rbp, %r8 /* output target address */
movq $z_output_len, %r9 /* decompressed length, end of relocs */
call extract_kernel /* returns kernel location in %rax */
- popq %r9
popq %rsi

/*
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index cda93d16ad4d..bee6238e7cfc 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -340,9 +340,9 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
unsigned char *input_data,
unsigned long input_len,
unsigned char *output,
- unsigned long output_len,
- unsigned long run_size)
+ unsigned long output_len)
{
+ const unsigned long kernel_total_size = VO__end - VO__text;
unsigned char *output_orig = output;

/* Retain x86 boot parameters pointer passed from startup_32/64. */
@@ -364,8 +364,6 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
lines = boot_params->screen_info.orig_video_lines;
cols = boot_params->screen_info.orig_video_cols;

- run_size = VO__end - VO__text;
-
console_init();
debug_putstr("early console in extract_kernel\n");

@@ -377,7 +375,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
debug_putaddr(input_len);
debug_putaddr(output);
debug_putaddr(output_len);
- debug_putaddr(run_size);
+ debug_putaddr(kernel_total_size);

/*
* The memory hole needed for the kernel is the larger of either
@@ -385,8 +383,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
* entire decompressed kernel plus .bss and .brk sections.
*/
output = choose_random_location(input_data, input_len, output,
- output_len > run_size ? output_len
- : run_size);
+ max(output_len, kernel_total_size));

/* Validate memory location choices. */
if ((unsigned long)output & (MIN_KERNEL_ALIGN - 1))
diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
index f095ed9e7d3c..72bad2c8debe 100644
--- a/arch/x86/boot/compressed/mkpiggy.c
+++ b/arch/x86/boot/compressed/mkpiggy.c
@@ -34,13 +34,11 @@ int main(int argc, char *argv[])
{
uint32_t olen;
long ilen;
- unsigned long run_size;
FILE *f = NULL;
int retval = 1;

- if (argc < 3) {
- fprintf(stderr, "Usage: %s compressed_file run_size\n",
- argv[0]);
+ if (argc < 2) {
+ fprintf(stderr, "Usage: %s compressed_file\n", argv[0]);
goto bail;
}

@@ -65,15 +63,11 @@ int main(int argc, char *argv[])
ilen = ftell(f);
olen = get_unaligned_le32(&olen);

- run_size = atoi(argv[2]);
-
printf(".section \".rodata..compressed\",\"a\",@progbits\n");
printf(".globl z_input_len\n");
printf("z_input_len = %lu\n", ilen);
printf(".globl z_output_len\n");
printf("z_output_len = %lu\n", (unsigned long)olen);
- printf(".globl z_run_size\n");
- printf("z_run_size = %lu\n", run_size);

printf(".globl input_data, input_data_end\n");
printf("input_data:\n");
diff --git a/arch/x86/tools/calc_run_size.sh b/arch/x86/tools/calc_run_size.sh
deleted file mode 100644
index 1a4c17bb3910..000000000000
--- a/arch/x86/tools/calc_run_size.sh
+++ /dev/null
@@ -1,42 +0,0 @@
-#!/bin/sh
-#
-# Calculate the amount of space needed to run the kernel, including room for
-# the .bss and .brk sections.
-#
-# Usage:
-# objdump -h a.out | sh calc_run_size.sh
-
-NUM='\([0-9a-fA-F]*[ \t]*\)'
-OUT=$(sed -n 's/^[ \t0-9]*.b[sr][sk][ \t]*'"$NUM$NUM$NUM$NUM"'.*/\1\4/p')
-if [ -z "$OUT" ] ; then
- echo "Never found .bss or .brk file offset" >&2
- exit 1
-fi
-
-OUT=$(echo ${OUT# })
-sizeA=$(printf "%d" 0x${OUT%% *})
-OUT=${OUT#* }
-offsetA=$(printf "%d" 0x${OUT%% *})
-OUT=${OUT#* }
-sizeB=$(printf "%d" 0x${OUT%% *})
-OUT=${OUT#* }
-offsetB=$(printf "%d" 0x${OUT%% *})
-
-run_size=$(( $offsetA + $sizeA + $sizeB ))
-
-# BFD linker shows the same file offset in ELF.
-if [ "$offsetA" -ne "$offsetB" ] ; then
- # Gold linker shows them as consecutive.
- endB=$(( $offsetB + $sizeB ))
- if [ "$endB" != "$run_size" ] ; then
- printf "sizeA: 0x%x\n" $sizeA >&2
- printf "offsetA: 0x%x\n" $offsetA >&2
- printf "sizeB: 0x%x\n" $sizeB >&2
- printf "offsetB: 0x%x\n" $offsetB >&2
- echo ".bss and .brk are non-contiguous" >&2
- exit 1
- fi
-fi
-
-printf "%d\n" $run_size
-exit 0
--
2.6.3

2016-04-29 00:09:20

by Kees Cook

[permalink] [raw]
Subject: [PATCH 3/6] x86/boot: Calculate decompression size during boot not build

From: Yinghai Lu <[email protected]>

Currently z_extract_offset is calculated in boot/compressed/mkpiggy.c.
This doesn't work well because mkpiggy.c doesn't know the details of the
decompressor in use. As a result, it can only make an estimation, which
has risks:

- output + output_len (VO) could be much bigger than input + input_len
(ZO). In this case, the decompressed kernel plus relocs could overwrite
the decompression code while it is running.

- The head code of ZO could be bigger than z_extract_offset. In this case
an overwrite could happen when the head code is running to move ZO to
the end of buffer. Though currently the size of the head code is very
small it's still a potential risk. Since there is no rule to limit the
size of the head code of ZO, it runs the risk of suddenly becoming a
(hard to find) bug.

Instead, this moves the z_extract_offset calculation into header.S, and
makes adjustments to be sure that the above two cases can never happen,
and further corrects the comments describing the calculations.

Since we have (previously) made ZO always be located against the end of
decompression buffer, z_extract_offset is only used here to calculate an
appropriate buffer size (INIT_SIZE), and is not longer used elsewhere. As
such, it can be removed from voffset.h.

Additionally cleans up #if/#else #define indenting for improved
readability.

Signed-off-by: Yinghai Lu <[email protected]>
Signed-off-by: Baoquan He <[email protected]>
[kees: rewrote changelog and comments]
Signed-off-by: Kees Cook <[email protected]>
---
arch/x86/boot/Makefile | 2 +-
arch/x86/boot/compressed/mkpiggy.c | 21 ++++-----------------
arch/x86/boot/header.S | 35 +++++++++++++++++++++++++----------
3 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index b1ef9e489084..942f7dabfb1e 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -95,7 +95,7 @@ targets += voffset.h
$(obj)/voffset.h: vmlinux FORCE
$(call if_changed,voffset)

-sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|z_.*\)$$/\#define ZO_\2 0x\1/p'
+sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|_ehead\|_text\|z_.*\)$$/\#define ZO_\2 0x\1/p'

quiet_cmd_zoffset = ZOFFSET $@
cmd_zoffset = $(NM) $< | sed -n $(sed-zoffset) > $@
diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
index b980046c3329..f095ed9e7d3c 100644
--- a/arch/x86/boot/compressed/mkpiggy.c
+++ b/arch/x86/boot/compressed/mkpiggy.c
@@ -18,11 +18,10 @@
*
* H. Peter Anvin <[email protected]>
*
- * ----------------------------------------------------------------------- */
-
-/*
- * Compute the desired load offset from a compressed program; outputs
- * a small assembly wrapper with the appropriate symbols defined.
+ * -----------------------------------------------------------------------
+ *
+ * Outputs a small assembly wrapper with the appropriate symbols defined.
+ *
*/

#include <stdlib.h>
@@ -35,7 +34,6 @@ int main(int argc, char *argv[])
{
uint32_t olen;
long ilen;
- unsigned long offs;
unsigned long run_size;
FILE *f = NULL;
int retval = 1;
@@ -67,15 +65,6 @@ int main(int argc, char *argv[])
ilen = ftell(f);
olen = get_unaligned_le32(&olen);

- /*
- * Now we have the input (compressed) and output (uncompressed)
- * sizes, compute the necessary decompression offset...
- */
-
- offs = (olen > ilen) ? olen - ilen : 0;
- offs += olen >> 12; /* Add 8 bytes for each 32K block */
- offs += 64*1024 + 128; /* Add 64K + 128 bytes slack */
- offs = (offs+4095) & ~4095; /* Round to a 4K boundary */
run_size = atoi(argv[2]);

printf(".section \".rodata..compressed\",\"a\",@progbits\n");
@@ -83,8 +72,6 @@ int main(int argc, char *argv[])
printf("z_input_len = %lu\n", ilen);
printf(".globl z_output_len\n");
printf("z_output_len = %lu\n", (unsigned long)olen);
- printf(".globl z_extract_offset\n");
- printf("z_extract_offset = 0x%lx\n", offs);
printf(".globl z_run_size\n");
printf("z_run_size = %lu\n", run_size);

diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index fd85b9e4e953..fb137d4a7e8c 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -508,13 +508,10 @@ pref_address: .quad LOAD_PHYSICAL_ADDR # preferred load addr
# To avoid problems with the compressed data's meta information an extra 18
# bytes are needed. Leading to the formula:
#
-# extra_bytes = (uncompressed_size >> 12) + 32768 + 18 + decompressor_size
+# extra_bytes = (uncompressed_size >> 12) + 32768 + 18
#
# Adding 8 bytes per 32K is a bit excessive but much easier to calculate.
# Adding 32768 instead of 32767 just makes for round numbers.
-# Adding the decompressor_size is necessary as it musht live after all
-# of the data as well. Last I measured the decompressor is about 14K.
-# 10K of actual data and 4K of bss.
#
# Above analysis is for decompressing gzip compressed kernel only. Up to
# now 6 different decompressor are supported all together. And among them
@@ -524,17 +521,35 @@ pref_address: .quad LOAD_PHYSICAL_ADDR # preferred load addr
# the description in lib/decompressor_xxx.c for specific information.
#
# extra_bytes = (uncompressed_size >> 12) + 65536 + 128
-#
-# Note that this calculation, which results in z_extract_offset (below),
-# is currently generated in compressed/mkpiggy.c

-#define ZO_INIT_SIZE (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
+#define ZO_z_extra_bytes ((ZO_z_output_len >> 12) + 65536 + 128)
+#if ZO_z_output_len > ZO_z_input_len
+# define ZO_z_extract_offset (ZO_z_output_len + ZO_z_extra_bytes - \
+ ZO_z_input_len)
+#else
+# define ZO_z_extract_offset ZO_z_extra_bytes
+#endif
+
+/*
+ * The extract_offset has to be bigger than ZO head section. Otherwise when
+ * the head code is running to move ZO to the end of the buffer, it will
+ * overwrite head code itself.
+ */
+#if (ZO__ehead - ZO_startup_32) > ZO_z_extract_offset
+# define ZO_z_min_extract_offset ((ZO__ehead - ZO_startup_32 + 4095) & ~4095)
+#else
+# define ZO_z_min_extract_offset ((ZO_z_extract_offset + 4095) & ~4095)
+#endif
+
+#define ZO_INIT_SIZE (ZO__end - ZO_startup_32 + ZO_z_min_extract_offset)
+
#define VO_INIT_SIZE (VO__end - VO__text)
#if ZO_INIT_SIZE > VO_INIT_SIZE
-#define INIT_SIZE ZO_INIT_SIZE
+# define INIT_SIZE ZO_INIT_SIZE
#else
-#define INIT_SIZE VO_INIT_SIZE
+# define INIT_SIZE VO_INIT_SIZE
#endif
+
init_size: .long INIT_SIZE # kernel initialization size
handover_offset: .long 0 # Filled in by build.c

--
2.6.3

2016-04-29 00:09:17

by Kees Cook

[permalink] [raw]
Subject: [PATCH 1/6] x86/KASLR: Handle kernel relocation above 2G

From: Baoquan He <[email protected]>

When processing the relocation table, the offset used to calculate the
relocation is an int. This is sufficient for calculating the physical
address of the relocs entry on 32-bit systems and on 64-bit systems when
the relocation is under 2G. To handle relocations above 2G (seen in
situations like kexec, netboot, etc), this offset needs to be calculated
using a long to avoid wrapping and miscalculating the relocation.

Signed-off-by: Baoquan He <[email protected]>
[kees: rewrote changelog]
Signed-off-by: Kees Cook <[email protected]>
---
arch/x86/boot/compressed/misc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 6dde6ccdf00e..45145149c07d 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -232,7 +232,7 @@ static void handle_relocations(void *output, unsigned long output_len)
* So we work backwards from the end of the decompressed image.
*/
for (reloc = output + output_len - sizeof(*reloc); *reloc; reloc--) {
- int extended = *reloc;
+ long extended = *reloc;
extended += map;

ptr = (unsigned long)extended;
--
2.6.3

2016-04-29 00:09:44

by Kees Cook

[permalink] [raw]
Subject: [PATCH 6/6] x86/boot: Correctly bounds-check relocations

From: Yinghai Lu <[email protected]>

Relocation handling performs bounds checking on the resulting calculated
addresses. The existing code uses output_len (VO size plus relocs size) as
the max address. This is not right since the max_addr check should stop at
the end of VO and exclude bss, brk, etc, which follows. The valid range
should be VO [_text, __bss_start] in the loaded physical address space.

This patch adds an export for __bss_start in voffset.h and uses it to
set the correct limit for max_addr.

Signed-off-by: Yinghai Lu <[email protected]>
Cc: Baoquan He <[email protected]>
[kees: rewrote changelog]
Signed-off-by: Kees Cook <[email protected]>
---
arch/x86/boot/compressed/Makefile | 2 +-
arch/x86/boot/compressed/misc.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index adef26d22224..75f2233b8414 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -57,7 +57,7 @@ LDFLAGS_vmlinux := -T
hostprogs-y := mkpiggy
HOST_EXTRACFLAGS += -I$(srctree)/tools/include

-sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
+sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'

quiet_cmd_voffset = VOFFSET $@
cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index bee6238e7cfc..8f0253d8c7ff 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -191,7 +191,7 @@ static void handle_relocations(void *output, unsigned long output_len)
int *reloc;
unsigned long delta, map, ptr;
unsigned long min_addr = (unsigned long)output;
- unsigned long max_addr = min_addr + output_len;
+ unsigned long max_addr = min_addr + (VO___bss_start - VO__text);

/*
* Calculate the delta between where vmlinux was linked to load
--
2.6.3

2016-04-29 00:10:00

by Kees Cook

[permalink] [raw]
Subject: [PATCH 4/6] x86/boot: Fix "run_size" calculation

From: Yinghai Lu <[email protected]>

Currently, the variable "run_size" holds the total kernel size
(size of code plus brk and bss) and is calculated via the shell script
arch/x86/tools/calc_run_size.sh. It gets the file offset and mem size
of the .bss and .brk sections in vmlinux, and adds them as follows:

run_size=$(( $offsetA + $sizeA + $sizeB ))

However, this is not correct (it is too large). To illustrate, here's
a walk-through of the script's calculation, compared to the correct way
to find it.

First, offsetA is found as the starting address of the first .bss or
.brk section seen in the ELF file. The sizeA and sizeB values are the
respective section sizes.

[bhe@x1 linux]$ objdump -h vmlinux

vmlinux: file format elf64-x86-64

Sections:
Idx Name Size VMA LMA File off Algn
27 .bss 00170000 ffffffff81ec8000 0000000001ec8000 012c8000 2**12
ALLOC
28 .brk 00027000 ffffffff82038000 0000000002038000 012c8000 2**0
ALLOC

Here, offsetA is 0x012c8000, with sizeA at 0x00170000 and sizeB at
0x00027000. The resulting run_size is 0x145f000:

0x012c8000 + 0x00170000 + 0x00027000 = 0x145f000

However, if we instead examine the ELF LOAD program headers, we see a
different picture.

[bhe@x1 linux]$ readelf -l vmlinux

Elf file type is EXEC (Executable file)
Entry point 0x1000000
There are 5 program headers, starting at offset 64

Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000200000 0xffffffff81000000 0x0000000001000000
0x0000000000b5e000 0x0000000000b5e000 R E 200000
LOAD 0x0000000000e00000 0xffffffff81c00000 0x0000000001c00000
0x0000000000145000 0x0000000000145000 RW 200000
LOAD 0x0000000001000000 0x0000000000000000 0x0000000001d45000
0x0000000000018158 0x0000000000018158 RW 200000
LOAD 0x000000000115e000 0xffffffff81d5e000 0x0000000001d5e000
0x000000000016a000 0x0000000000301000 RWE 200000
NOTE 0x000000000099bcac 0xffffffff8179bcac 0x000000000179bcac
0x00000000000001bc 0x00000000000001bc 4

Section to Segment mapping:
Segment Sections...
00 .text .notes __ex_table .rodata __bug_table .pci_fixup .tracedata
__ksymtab __ksymtab_gpl __ksymtab_strings __init_rodata __param
__modver
01 .data .vvar
02 .data..percpu
03 .init.text .init.data .x86_cpu_dev.init .parainstructions
.altinstructions .altinstr_replacement .iommu_table .apicdrivers
.exit.text .smp_locks .bss .brk
04 .notes

As mentioned, run_size needs to be the size of the running kernel
including .bss and .brk. We can see from the Section/Segment mapping
above that .bss and .brk are included in segment 03 (which corresponds
to the final LOAD program header). To find the run_size, we calculate
the end of the LOAD segment from its PhysAddr start (0x0000000001d5e000)
and its MemSiz (0x0000000000301000), minus the physical load address of
the kernel (the first LOAD segment's PhysAddr: 0x0000000001000000). The
resulting run_size is 0x105f000:

0x0000000001d5e000 + 0x0000000000301000 - 0x0000000001000000 = 0x105f000

So, from this we can see that the existing run_size calculation is
0x400000 too high. And, as it turns out, the correct run_size is
actually equal to VO_end - VO_text, which is certainly easier to calculate.
_end: 0xffffffff8205f000
_text:0xffffffff81000000

0xffffffff8205f000 - 0xffffffff81000000 = 0x105f000

As a result, run_size is a simple constant, so we don't need to pass it
around; we already have voffset.h for such things. We can share voffset.h
between misc.c and header.S instead of getting run_size in other ways.
This patch moves voffset.h creation code to boot/compressed/Makefile,
and switches misc.c to use the VO_end - VO_text calculation for run_size.

Dependence before:
boot/header.S ==> boot/voffset.h ==> vmlinux
boot/header.S ==> compressed/vmlinux ==> compressed/misc.c

Dependence after:
boot/header.S ==> compressed/vmlinux ==> compressed/misc.c ==>
boot/voffset.h ==> vmlinux

Fixes: e6023367d779 ("x86, kaslr: Prevent .bss from overlaping initrd")
Cc: Junjie Mao <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>
Signed-off-by: Baoquan He <[email protected]>
[kees: rewrote changelog]
Signed-off-by: Kees Cook <[email protected]>
---
arch/x86/boot/Makefile | 11 +----------
arch/x86/boot/compressed/Makefile | 12 ++++++++++++
arch/x86/boot/compressed/misc.c | 3 +++
3 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index 942f7dabfb1e..700a9c6e6159 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -86,15 +86,6 @@ $(obj)/vmlinux.bin: $(obj)/compressed/vmlinux FORCE

SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))

-sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 0x\1/p'
-
-quiet_cmd_voffset = VOFFSET $@
- cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
-
-targets += voffset.h
-$(obj)/voffset.h: vmlinux FORCE
- $(call if_changed,voffset)
-
sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|_ehead\|_text\|z_.*\)$$/\#define ZO_\2 0x\1/p'

quiet_cmd_zoffset = ZOFFSET $@
@@ -106,7 +97,7 @@ $(obj)/zoffset.h: $(obj)/compressed/vmlinux FORCE


AFLAGS_header.o += -I$(obj)
-$(obj)/header.o: $(obj)/voffset.h $(obj)/zoffset.h
+$(obj)/header.o: $(obj)/zoffset.h

LDFLAGS_setup.elf := -T
$(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS) FORCE
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 542c92f5ca4f..45997805adad 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -57,6 +57,18 @@ LDFLAGS_vmlinux := -T
hostprogs-y := mkpiggy
HOST_EXTRACFLAGS += -I$(srctree)/tools/include

+sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
+
+quiet_cmd_voffset = VOFFSET $@
+ cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
+
+targets += ../voffset.h
+
+$(obj)/../voffset.h: vmlinux FORCE
+ $(call if_changed,voffset)
+
+$(obj)/misc.o: $(obj)/../voffset.h
+
vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
$(obj)/string.o $(obj)/cmdline.o \
$(obj)/piggy.o $(obj)/cpuflags.o
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 4b4605e94b3c..cda93d16ad4d 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -13,6 +13,7 @@

#include "misc.h"
#include "../string.h"
+#include "../voffset.h"

/*
* WARNING!!
@@ -363,6 +364,8 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
lines = boot_params->screen_info.orig_video_lines;
cols = boot_params->screen_info.orig_video_cols;

+ run_size = VO__end - VO__text;
+
console_init();
debug_putstr("early console in extract_kernel\n");

--
2.6.3

2016-04-29 00:10:55

by Kees Cook

[permalink] [raw]
Subject: [PATCH 2/6] x86/boot: Move compressed kernel to end of decompression buffer

From: Yinghai Lu <[email protected]>

This change makes later calculations about where the kernel is located
easier to reason about. To better understand this change, we must first
clarify what VO and ZO are. They were introduced in commits by hpa:

77d1a49 x86, boot: make symbols from the main vmlinux available
37ba7ab x86, boot: make kernel_alignment adjustable; new bzImage fields

Specifically:

VO:
- uncompressed kernel image
- size: VO__end - VO__text ("VO_INIT_SIZE" define)

ZO:
- bootable compressed kernel image (boot/compressed/vmlinux)
- head text + compressed kernel (VO and relocs table) + decompressor code
- size: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE" define, though see below)

The INIT_SIZE definition is used to find the larger of the two image sizes:

#define ZO_INIT_SIZE (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
#define VO_INIT_SIZE (VO__end - VO__text)
#if ZO_INIT_SIZE > VO_INIT_SIZE
#define INIT_SIZE ZO_INIT_SIZE
#else
#define INIT_SIZE VO_INIT_SIZE
#endif

The current code uses extract_offset to decide where to position the
copied ZO (i.e. ZO starts at extract_offset). (This is why ZO_INIT_SIZE
currently includes the extract_offset.)

When INIT_SIZE is bigger than VO_INIT_SIZE (uncommon but possible),
the copied ZO occupies the memory from extract_offset to the end of
decompression buffer. It overlaps with the soon-to-be-uncompressed kernel
like this:

|-----compressed kernel image------|
V V
0 extract_offset +INIT_SIZE
|-----------|---------------|-------------------------|--------|
| | | |
VO__text startup_32 of ZO VO__end ZO__end
^ ^
|-------uncompressed kernel image---------|

When INIT_SIZE is equal to VO_INIT_SIZE (likely) there's still space
left from end of ZO to the end of decompressing buffer, like below.

|-compressed kernel image-|
V V
0 extract_offset +INIT_SIZE
|-----------|---------------|-------------------------|--------|
| | | |
VO__text startup_32 of ZO ZO__end VO__end
^ ^
|------------uncompressed kernel image-------------|

To simplify calculations and avoid special cases, it is cleaner to
always place the compressed kernel image in memory so that ZO__end
is at the end of the decompression buffer, instead of placing that
start extract_offset as is currently done.

This patch adds BP_init_size (which is the INIT_SIZE as passed in from
the boot_params) into asm-offsets.c to make it visible to the assembly
code. Then when moving the ZO, it calculates the starting position of
the copied ZO (via BP_init_size and the ZO run size) so that the VO__end
will be at the end of the decompression buffer. To make the position
calculation safe, the end of ZO is page aligned (and a comment is added
to the existing VO alignment for good measure).

Signed-off-by: Yinghai Lu <[email protected]>
[kees: rewrote changelog and comments]
Signed-off-by: Kees Cook <[email protected]>
---
arch/x86/boot/compressed/head_32.S | 11 +++++++++--
arch/x86/boot/compressed/head_64.S | 8 ++++++--
arch/x86/boot/compressed/misc.c | 17 +++++++++++++++++
arch/x86/boot/compressed/mkpiggy.c | 3 ---
arch/x86/boot/compressed/vmlinux.lds.S | 1 +
arch/x86/kernel/asm-offsets.c | 1 +
arch/x86/kernel/vmlinux.lds.S | 2 +-
7 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 26dd9df19a69..8a95e2f845aa 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -176,7 +176,9 @@ preferred_addr:
1:

/* Target address to relocate to for decompression */
- addl $z_extract_offset, %ebx
+ movl BP_init_size(%esi), %eax
+ subl $_end, %eax
+ addl %eax, %ebx

/* Set up the stack */
leal boot_stack_end(%ebx), %esp
@@ -238,8 +240,13 @@ relocated:
/* push arguments for extract_kernel: */
pushl $z_run_size /* size of kernel with .bss and .brk */
pushl $z_output_len /* decompressed length, end of relocs */
- leal z_extract_offset_negative(%ebx), %ebp
+
+ movl BP_init_size(%esi), %eax
+ subl $_end, %eax
+ movl %ebx, %ebp
+ subl %eax, %ebp
pushl %ebp /* output address */
+
pushl $z_input_len /* input_len */
leal input_data(%ebx), %eax
pushl %eax /* input_data */
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index d43c30ed89ed..09cdc0c3ee7e 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -110,7 +110,9 @@ ENTRY(startup_32)
1:

/* Target address to relocate to for decompression */
- addl $z_extract_offset, %ebx
+ movl BP_init_size(%esi), %eax
+ subl $_end, %eax
+ addl %eax, %ebx

/*
* Prepare for entering 64 bit mode
@@ -338,7 +340,9 @@ preferred_addr:
1:

/* Target address to relocate to for decompression */
- leaq z_extract_offset(%rbp), %rbx
+ movl BP_init_size(%rsi), %ebx
+ subl $_end, %ebx
+ addq %rbp, %rbx

/* Set up the stack */
leaq boot_stack_end(%rbx), %rsp
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 45145149c07d..4b4605e94b3c 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -318,6 +318,23 @@ static void parse_elf(void *output)
free(phdrs);
}

+/*
+ * The compressed kernel image (ZO), has been moved so that its position
+ * is against the end of the buffer used to hold the uncompressed kernel
+ * image (VO) and the execution environment (.bss, .brk), which makes sure
+ * there is room to do the in-place decompression. (See header.S for the
+ * calculations.)
+ *
+ * |-----compressed kernel image------|
+ * V V
+ * 0 extract_offset +INIT_SIZE
+ * |-----------|---------------|-------------------------|--------|
+ * | | | |
+ * VO__text startup_32 of ZO VO__end ZO__end
+ * ^ ^
+ * |-------uncompressed kernel image---------|
+ *
+ */
asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
unsigned char *input_data,
unsigned long input_len,
diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
index d8222f213182..b980046c3329 100644
--- a/arch/x86/boot/compressed/mkpiggy.c
+++ b/arch/x86/boot/compressed/mkpiggy.c
@@ -85,9 +85,6 @@ int main(int argc, char *argv[])
printf("z_output_len = %lu\n", (unsigned long)olen);
printf(".globl z_extract_offset\n");
printf("z_extract_offset = 0x%lx\n", offs);
- /* z_extract_offset_negative allows simplification of head_32.S */
- printf(".globl z_extract_offset_negative\n");
- printf("z_extract_offset_negative = -0x%lx\n", offs);
printf(".globl z_run_size\n");
printf("z_run_size = %lu\n", run_size);

diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index 34d047c98284..e24e0a0c90c9 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -70,5 +70,6 @@ SECTIONS
_epgtable = . ;
}
#endif
+ . = ALIGN(PAGE_SIZE); /* keep ZO size page aligned */
_end = .;
}
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 5c042466f274..674134e9f5e5 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -80,6 +80,7 @@ void common(void) {
OFFSET(BP_hardware_subarch, boot_params, hdr.hardware_subarch);
OFFSET(BP_version, boot_params, hdr.version);
OFFSET(BP_kernel_alignment, boot_params, hdr.kernel_alignment);
+ OFFSET(BP_init_size, boot_params, hdr.init_size);
OFFSET(BP_pref_address, boot_params, hdr.pref_address);
OFFSET(BP_code32_start, boot_params, hdr.code32_start);

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 4c941f88d405..9297a002d8e5 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -334,7 +334,7 @@ SECTIONS
__brk_limit = .;
}

- . = ALIGN(PAGE_SIZE);
+ . = ALIGN(PAGE_SIZE); /* keep VO_INIT_SIZE page aligned */
_end = .;

STABS_DEBUG
--
2.6.3

2016-04-29 07:18:14

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86/boot: Move compressed kernel to end of decompression buffer


* Kees Cook <[email protected]> wrote:

> From: Yinghai Lu <[email protected]>
>
> This change makes later calculations about where the kernel is located
> easier to reason about. To better understand this change, we must first
> clarify what VO and ZO are. They were introduced in commits by hpa:
>
> 77d1a49 x86, boot: make symbols from the main vmlinux available
> 37ba7ab x86, boot: make kernel_alignment adjustable; new bzImage fields
>
> Specifically:
>
> VO:
> - uncompressed kernel image
> - size: VO__end - VO__text ("VO_INIT_SIZE" define)
>
> ZO:
> - bootable compressed kernel image (boot/compressed/vmlinux)
> - head text + compressed kernel (VO and relocs table) + decompressor code
> - size: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE" define, though see below)
>
> The INIT_SIZE definition is used to find the larger of the two image sizes:
>
> #define ZO_INIT_SIZE (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
> #define VO_INIT_SIZE (VO__end - VO__text)
> #if ZO_INIT_SIZE > VO_INIT_SIZE
> #define INIT_SIZE ZO_INIT_SIZE
> #else
> #define INIT_SIZE VO_INIT_SIZE
> #endif
>
> The current code uses extract_offset to decide where to position the
> copied ZO (i.e. ZO starts at extract_offset). (This is why ZO_INIT_SIZE
> currently includes the extract_offset.)

Yeah, so I rewrote the above to:

=================>
This change makes later calculations about where the kernel is located
easier to reason about. To better understand this change, we must first
clarify what 'VO' and 'ZO' are. These values were introduced in commits
by hpa:

77d1a4999502 ("x86, boot: make symbols from the main vmlinux available")
37ba7ab5e33c ("x86, boot: make kernel_alignment adjustable; new bzImage fields")

Specifically:

All names prefixed with 'VO_':

- relate to the uncompressed kernel image

- the size of the VO image is: VO__end-VO__text ("VO_INIT_SIZE" define)

All names prefixed with 'ZO_':

- relate to the bootable compressed kernel image (boot/compressed/vmlinux),
which is composed of the following memory areas:
- head text
- compressed kernel (VO image and relocs table)
- decompressor code

- the size of the ZO image is: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE" define, though see below)

The 'INIT_SIZE' value is used to find the larger of the two image sizes:

#define ZO_INIT_SIZE (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
#define VO_INIT_SIZE (VO__end - VO__text)

#if ZO_INIT_SIZE > VO_INIT_SIZE
# define INIT_SIZE ZO_INIT_SIZE
#else
# define INIT_SIZE VO_INIT_SIZE
#endif

The current code uses extract_offset to decide where to position the
copied ZO (i.e. ZO starts at extract_offset). (This is why ZO_INIT_SIZE
currently includes the extract_offset.)
<=================

Assuming the edits I made are correct, this is the point where the changelog lost
me. It does not explain why ZO_z_extract_offset exists. Why isn't the ZO copied to
offset 0?

I had to go into arch/x86/boot/compressed/mkpiggy.c, where ZO_z_extract_offset is
generated, to find the answer: it's needed because we are trying to minimize the
amount of RAM used for the whole act of creating an uncompressed, executable,
properly relocation-linked kernel image in system memory. We do this so that
kernels can be booted on even very small systems.

To achieve the goal of minimal memory consumption we have implemented an in-place
decompression strategy: instead of cleanly separating the VO and ZO images and
also allocating some memory for the decompression code's runtime needs, we instead
create this elaborate layout of memory buffers where the output (decompressed)
stream, as it progresses, overlaps with and destroys the input (compressed)
stream. This can only be done safely if the ZO image is placed to the end of the
VO range, plus a certain amount of safety distance to make sure that when the last
bytes of the VO range are decompressed, the compressed stream pointer is safely
beyond the end of the VO range. Correct?

This is a very essential central concept to the whole code, but nowhere is it
described clearly!

But more importantly, especially in view of address space randomization, we should
realize that the days of 8 MB i386-DX systems are gone, and we should get rid of
all this crazy obfuscation that is hindering development in this area. I also
suspect that the actual temporary allocation size reduction savings from this
trick are relatively small, compared to the resulting total memory size.

So my suggestion: let's just cleanly separate all the data areas and not try to do
any clever overlapping: the benefit will be minimal, and any system that has main
RAM less than twice of the VO+ZO image sizes is fundamentally unbootable and
unusable anyway.

I.e. have a really clean size calculation of:

ZO + VO + decompressor-stacks-size + decompressor-data-size

and decompress accordingly without tricks, without overlaps, without any chance
for corruption - and, most importantly, without this metric ton of obfuscation
that very few people have managed to fight their way through in the last couple of
years, and which hinders essential features ...

Agreed?

Thanks,

Ingo

2016-04-29 07:48:57

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86/boot: Move compressed kernel to end of decompression buffer

On Fri, Apr 29, 2016 at 12:18 AM, Ingo Molnar <[email protected]> wrote:
>
> * Kees Cook <[email protected]> wrote:
>
>> From: Yinghai Lu <[email protected]>
>>
>> This change makes later calculations about where the kernel is located
>> easier to reason about. To better understand this change, we must first
>> clarify what VO and ZO are. They were introduced in commits by hpa:
>>
>> 77d1a49 x86, boot: make symbols from the main vmlinux available
>> 37ba7ab x86, boot: make kernel_alignment adjustable; new bzImage fields
>>
>> Specifically:
>>
>> VO:
>> - uncompressed kernel image
>> - size: VO__end - VO__text ("VO_INIT_SIZE" define)
>>
>> ZO:
>> - bootable compressed kernel image (boot/compressed/vmlinux)
>> - head text + compressed kernel (VO and relocs table) + decompressor code
>> - size: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE" define, though see below)
>>
>> The INIT_SIZE definition is used to find the larger of the two image sizes:
>>
>> #define ZO_INIT_SIZE (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
>> #define VO_INIT_SIZE (VO__end - VO__text)
>> #if ZO_INIT_SIZE > VO_INIT_SIZE
>> #define INIT_SIZE ZO_INIT_SIZE
>> #else
>> #define INIT_SIZE VO_INIT_SIZE
>> #endif
>>
>> The current code uses extract_offset to decide where to position the
>> copied ZO (i.e. ZO starts at extract_offset). (This is why ZO_INIT_SIZE
>> currently includes the extract_offset.)
>
> Yeah, so I rewrote the above to:
>
> =================>
> This change makes later calculations about where the kernel is located
> easier to reason about. To better understand this change, we must first
> clarify what 'VO' and 'ZO' are. These values were introduced in commits
> by hpa:
>
> 77d1a4999502 ("x86, boot: make symbols from the main vmlinux available")
> 37ba7ab5e33c ("x86, boot: make kernel_alignment adjustable; new bzImage fields")
>
> Specifically:
>
> All names prefixed with 'VO_':
>
> - relate to the uncompressed kernel image
>
> - the size of the VO image is: VO__end-VO__text ("VO_INIT_SIZE" define)
>
> All names prefixed with 'ZO_':
>
> - relate to the bootable compressed kernel image (boot/compressed/vmlinux),
> which is composed of the following memory areas:
> - head text
> - compressed kernel (VO image and relocs table)
> - decompressor code
>
> - the size of the ZO image is: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE" define, though see below)
>
> The 'INIT_SIZE' value is used to find the larger of the two image sizes:
>
> #define ZO_INIT_SIZE (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
> #define VO_INIT_SIZE (VO__end - VO__text)
>
> #if ZO_INIT_SIZE > VO_INIT_SIZE
> # define INIT_SIZE ZO_INIT_SIZE
> #else
> # define INIT_SIZE VO_INIT_SIZE
> #endif
>
> The current code uses extract_offset to decide where to position the
> copied ZO (i.e. ZO starts at extract_offset). (This is why ZO_INIT_SIZE
> currently includes the extract_offset.)
> <=================
>
> Assuming the edits I made are correct, this is the point where the changelog lost
> me. It does not explain why ZO_z_extract_offset exists. Why isn't the ZO copied to
> offset 0?
>
> I had to go into arch/x86/boot/compressed/mkpiggy.c, where ZO_z_extract_offset is
> generated, to find the answer: it's needed because we are trying to minimize the
> amount of RAM used for the whole act of creating an uncompressed, executable,
> properly relocation-linked kernel image in system memory. We do this so that
> kernels can be booted on even very small systems.
>
> To achieve the goal of minimal memory consumption we have implemented an in-place
> decompression strategy: instead of cleanly separating the VO and ZO images and
> also allocating some memory for the decompression code's runtime needs, we instead
> create this elaborate layout of memory buffers where the output (decompressed)
> stream, as it progresses, overlaps with and destroys the input (compressed)
> stream. This can only be done safely if the ZO image is placed to the end of the
> VO range, plus a certain amount of safety distance to make sure that when the last
> bytes of the VO range are decompressed, the compressed stream pointer is safely
> beyond the end of the VO range. Correct?
>
> This is a very essential central concept to the whole code, but nowhere is it
> described clearly!

That would certainly be worth calling out in the description, true.

> But more importantly, especially in view of address space randomization, we should
> realize that the days of 8 MB i386-DX systems are gone, and we should get rid of
> all this crazy obfuscation that is hindering development in this area. I also
> suspect that the actual temporary allocation size reduction savings from this
> trick are relatively small, compared to the resulting total memory size.
>
> So my suggestion: let's just cleanly separate all the data areas and not try to do
> any clever overlapping: the benefit will be minimal, and any system that has main
> RAM less than twice of the VO+ZO image sizes is fundamentally unbootable and
> unusable anyway.
>
> I.e. have a really clean size calculation of:
>
> ZO + VO + decompressor-stacks-size + decompressor-data-size
>
> and decompress accordingly without tricks, without overlaps, without any chance
> for corruption - and, most importantly, without this metric ton of obfuscation
> that very few people have managed to fight their way through in the last couple of
> years, and which hinders essential features ...
>
> Agreed?

I don't agree. We do still have embedded systems running x86 kernels,
and we have cases where we're running multiple kernels in memory (like
kdump). I think the memory savings is worth the complexity, especially
since the complexity is being reduced up by this patch. But that's not
all:

If we moved the compressed kernel after the buffer, the only thing
we'd do would be taking up more memory. We'd still have the head_*.S
complexity of handling the relocation and handling the copy, we'd
still have the extraction, etc, etc. The only thing would be literally
changing extract_offset to INIT_SIZE. Everything else would be the
same.

If we moved the decompressed kernel after the compressed kernel,
(ignoring KASLR for a moment) then we'd end up in a confusing
situation where the kernel would be running somewhere other than where
the boot loader asked it to load. I don't even want to think about the
weird bug reports we might get from a change like that from old or
weird loaders.

This patch gets us a more reasonable layout with less complexity and
no change to the memory footprint without changing the expectations of
the boot loader. I really think this should stand.

-Kees

--
Kees Cook
Chrome OS & Brillo Security

Subject: [tip:x86/boot] x86/KASLR: Handle kernel relocations above 2G correctly

Commit-ID: 6f9af75faa1df61e1ee5bea8a787a90605bb528d
Gitweb: http://git.kernel.org/tip/6f9af75faa1df61e1ee5bea8a787a90605bb528d
Author: Baoquan He <[email protected]>
AuthorDate: Thu, 28 Apr 2016 17:09:03 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 29 Apr 2016 09:58:26 +0200

x86/KASLR: Handle kernel relocations above 2G correctly

When processing the relocation table, the offset used to calculate the
relocation is an 'int'. This is sufficient for calculating the physical
address of the relocs entry on 32-bit systems and on 64-bit systems when
the relocation is under 2G.

To handle relocations above 2G (seen in situations like kexec, netboot, etc),
this offset needs to be calculated using a 'long' to avoid wrapping and
miscalculating the relocation.

Signed-off-by: Baoquan He <[email protected]>
[ Rewrote the changelog. ]
Signed-off-by: Kees Cook <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/boot/compressed/misc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 6dde6cc..4514514 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -232,7 +232,7 @@ static void handle_relocations(void *output, unsigned long output_len)
* So we work backwards from the end of the decompressed image.
*/
for (reloc = output + output_len - sizeof(*reloc); *reloc; reloc--) {
- int extended = *reloc;
+ long extended = *reloc;
extended += map;

ptr = (unsigned long)extended;

2016-04-29 08:08:10

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86/boot: Move compressed kernel to end of decompression buffer


* Kees Cook <[email protected]> wrote:

> I don't agree. We do still have embedded systems running x86 kernels, and we
> have cases where we're running multiple kernels in memory (like kdump). I think
> the memory savings is worth the complexity, especially since the complexity is
> being reduced up by this patch. [...]

Hm, so can we quantify it, how much are the temporary memory savings in practice?
I'd like to see actual vmlinuz numbers with say a defconfig and with a distro
config.

Small systems tend to have smaller kernel images, so the temporary savings tend to
be smaller as well. There's no long term loss, we'd still recover all memory not
used by the resulting kernel image and make it usable as free RAM. So the only
question is the temporary memory allocation size of the decompression step.

> [...] But that's not all:
>
> If we moved the compressed kernel after the buffer, the only thing
> we'd do would be taking up more memory. We'd still have the head_*.S
> complexity of handling the relocation and handling the copy, we'd
> still have the extraction, etc, etc. The only thing would be literally
> changing extract_offset to INIT_SIZE. Everything else would be the
> same.

Yes - but arguing about all this code would cause a lot fewer headaches,
for me at least!

Also, I think once we've simplified the whole model of decompression, we can
improve on the structure even more.

> If we moved the decompressed kernel after the compressed kernel, (ignoring KASLR
> for a moment) then we'd end up in a confusing situation where the kernel would
> be running somewhere other than where the boot loader asked it to load. I don't
> even want to think about the weird bug reports we might get from a change like
> that from old or weird loaders.

Well, 'where the boot loader asked it to load' in this case is essentially the
z_extract_offset .globl, isn't it? But how to use that value is a mostly x86
kernel internal matter - in fact there's even an inversion step between the 32-bit
and 64-bit value. Is there any other boot loader environment component I missed?

Thanks,

Ingo

2016-04-29 09:51:39

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86/boot: Move compressed kernel to end of decompression buffer


* Ingo Molnar <[email protected]> wrote:

> * Kees Cook <[email protected]> wrote:
>
> > I don't agree. We do still have embedded systems running x86 kernels, and we
> > have cases where we're running multiple kernels in memory (like kdump). I think
> > the memory savings is worth the complexity, especially since the complexity is
> > being reduced up by this patch. [...]
>
> Hm, so can we quantify it, how much are the temporary memory savings in practice?
> I'd like to see actual vmlinuz numbers with say a defconfig and with a distro
> config.

Btw., with the extra explanations added to the changelog I am happy with the
series and have applied it to x86/boot and pushed it out. If we decide to simplify
the code it will be much easier to simplify correctly working, well documented
code.

It would be nice to do another pass at the in-code comments and at the general
code structure, to make sure all this is properly explained and structured.

Thanks,

Ingo

Subject: [tip:x86/boot] x86/boot: Move compressed kernel to the end of the decompression buffer

Commit-ID: 974f221c84b05b1dc2f5ea50dc16d2a9d1e95eda
Gitweb: http://git.kernel.org/tip/974f221c84b05b1dc2f5ea50dc16d2a9d1e95eda
Author: Yinghai Lu <[email protected]>
AuthorDate: Thu, 28 Apr 2016 17:09:04 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 29 Apr 2016 11:03:29 +0200

x86/boot: Move compressed kernel to the end of the decompression buffer

This change makes later calculations about where the kernel is located
easier to reason about. To better understand this change, we must first
clarify what 'VO' and 'ZO' are. These values were introduced in commits
by hpa:

77d1a4999502 ("x86, boot: make symbols from the main vmlinux available")
37ba7ab5e33c ("x86, boot: make kernel_alignment adjustable; new bzImage fields")

Specifically:

All names prefixed with 'VO_':

- relate to the uncompressed kernel image

- the size of the VO image is: VO__end-VO__text ("VO_INIT_SIZE" define)

All names prefixed with 'ZO_':

- relate to the bootable compressed kernel image (boot/compressed/vmlinux),
which is composed of the following memory areas:
- head text
- compressed kernel (VO image and relocs table)
- decompressor code

- the size of the ZO image is: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE" define, though see below)

The 'INIT_SIZE' value is used to find the larger of the two image sizes:

#define ZO_INIT_SIZE (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
#define VO_INIT_SIZE (VO__end - VO__text)

#if ZO_INIT_SIZE > VO_INIT_SIZE
# define INIT_SIZE ZO_INIT_SIZE
#else
# define INIT_SIZE VO_INIT_SIZE
#endif

The current code uses extract_offset to decide where to position the
copied ZO (i.e. ZO starts at extract_offset). (This is why ZO_INIT_SIZE
currently includes the extract_offset.)

Why does z_extract_offset exist? It's needed because we are trying to minimize
the amount of RAM used for the whole act of creating an uncompressed, executable,
properly relocation-linked kernel image in system memory. We do this so that
kernels can be booted on even very small systems.

To achieve the goal of minimal memory consumption we have implemented an in-place
decompression strategy: instead of cleanly separating the VO and ZO images and
also allocating some memory for the decompression code's runtime needs, we instead
create this elaborate layout of memory buffers where the output (decompressed)
stream, as it progresses, overlaps with and destroys the input (compressed)
stream. This can only be done safely if the ZO image is placed to the end of the
VO range, plus a certain amount of safety distance to make sure that when the last
bytes of the VO range are decompressed, the compressed stream pointer is safely
beyond the end of the VO range.

z_extract_offset is calculated in arch/x86/boot/compressed/mkpiggy.c during
the build process, at a point when we know the exact compressed and
uncompressed size of the kernel images and can calculate this safe minimum
offset value. (Note that the mkpiggy.c calculation is not perfect, because
we don't know the decompressor used at that stage, so the z_extract_offset
calculation is necessarily imprecise and is mostly based on gzip internals -
we'll improve that in the next patch.)

When INIT_SIZE is bigger than VO_INIT_SIZE (uncommon but possible),
the copied ZO occupies the memory from extract_offset to the end of
decompression buffer. It overlaps with the soon-to-be-uncompressed kernel
like this:

|-----compressed kernel image------|
V V
0 extract_offset +INIT_SIZE
|-----------|---------------|-------------------------|--------|
| | | |
VO__text startup_32 of ZO VO__end ZO__end
^ ^
|-------uncompressed kernel image---------|

When INIT_SIZE is equal to VO_INIT_SIZE (likely) there's still space
left from end of ZO to the end of decompressing buffer, like below.

|-compressed kernel image-|
V V
0 extract_offset +INIT_SIZE
|-----------|---------------|-------------------------|--------|
| | | |
VO__text startup_32 of ZO ZO__end VO__end
^ ^
|------------uncompressed kernel image-------------|

To simplify calculations and avoid special cases, it is cleaner to
always place the compressed kernel image in memory so that ZO__end
is at the end of the decompression buffer, instead of placing t at
the start of extract_offset as is currently done.

This patch adds BP_init_size (which is the INIT_SIZE as passed in from
the boot_params) into asm-offsets.c to make it visible to the assembly
code.

Then when moving the ZO, it calculates the starting position of
the copied ZO (via BP_init_size and the ZO run size) so that the VO__end
will be at the end of the decompression buffer. To make the position
calculation safe, the end of ZO is page aligned (and a comment is added
to the existing VO alignment for good measure).

Signed-off-by: Yinghai Lu <[email protected]>
[ Rewrote changelog and comments. ]
Signed-off-by: Kees Cook <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
[ Rewrote the changelog some more. ]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/boot/compressed/head_32.S | 11 +++++++++--
arch/x86/boot/compressed/head_64.S | 8 ++++++--
arch/x86/boot/compressed/misc.c | 17 +++++++++++++++++
arch/x86/boot/compressed/mkpiggy.c | 3 ---
arch/x86/boot/compressed/vmlinux.lds.S | 1 +
arch/x86/kernel/asm-offsets.c | 1 +
arch/x86/kernel/vmlinux.lds.S | 2 +-
7 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 26dd9df..8a95e2f 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -176,7 +176,9 @@ preferred_addr:
1:

/* Target address to relocate to for decompression */
- addl $z_extract_offset, %ebx
+ movl BP_init_size(%esi), %eax
+ subl $_end, %eax
+ addl %eax, %ebx

/* Set up the stack */
leal boot_stack_end(%ebx), %esp
@@ -238,8 +240,13 @@ relocated:
/* push arguments for extract_kernel: */
pushl $z_run_size /* size of kernel with .bss and .brk */
pushl $z_output_len /* decompressed length, end of relocs */
- leal z_extract_offset_negative(%ebx), %ebp
+
+ movl BP_init_size(%esi), %eax
+ subl $_end, %eax
+ movl %ebx, %ebp
+ subl %eax, %ebp
pushl %ebp /* output address */
+
pushl $z_input_len /* input_len */
leal input_data(%ebx), %eax
pushl %eax /* input_data */
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index d43c30e..09cdc0c 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -110,7 +110,9 @@ ENTRY(startup_32)
1:

/* Target address to relocate to for decompression */
- addl $z_extract_offset, %ebx
+ movl BP_init_size(%esi), %eax
+ subl $_end, %eax
+ addl %eax, %ebx

/*
* Prepare for entering 64 bit mode
@@ -338,7 +340,9 @@ preferred_addr:
1:

/* Target address to relocate to for decompression */
- leaq z_extract_offset(%rbp), %rbx
+ movl BP_init_size(%rsi), %ebx
+ subl $_end, %ebx
+ addq %rbp, %rbx

/* Set up the stack */
leaq boot_stack_end(%rbx), %rsp
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 4514514..4b4605e9 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -318,6 +318,23 @@ static void parse_elf(void *output)
free(phdrs);
}

+/*
+ * The compressed kernel image (ZO), has been moved so that its position
+ * is against the end of the buffer used to hold the uncompressed kernel
+ * image (VO) and the execution environment (.bss, .brk), which makes sure
+ * there is room to do the in-place decompression. (See header.S for the
+ * calculations.)
+ *
+ * |-----compressed kernel image------|
+ * V V
+ * 0 extract_offset +INIT_SIZE
+ * |-----------|---------------|-------------------------|--------|
+ * | | | |
+ * VO__text startup_32 of ZO VO__end ZO__end
+ * ^ ^
+ * |-------uncompressed kernel image---------|
+ *
+ */
asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
unsigned char *input_data,
unsigned long input_len,
diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
index d8222f2..b980046 100644
--- a/arch/x86/boot/compressed/mkpiggy.c
+++ b/arch/x86/boot/compressed/mkpiggy.c
@@ -85,9 +85,6 @@ int main(int argc, char *argv[])
printf("z_output_len = %lu\n", (unsigned long)olen);
printf(".globl z_extract_offset\n");
printf("z_extract_offset = 0x%lx\n", offs);
- /* z_extract_offset_negative allows simplification of head_32.S */
- printf(".globl z_extract_offset_negative\n");
- printf("z_extract_offset_negative = -0x%lx\n", offs);
printf(".globl z_run_size\n");
printf("z_run_size = %lu\n", run_size);

diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index 34d047c..e24e0a0 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -70,5 +70,6 @@ SECTIONS
_epgtable = . ;
}
#endif
+ . = ALIGN(PAGE_SIZE); /* keep ZO size page aligned */
_end = .;
}
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 5c04246..674134e 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -80,6 +80,7 @@ void common(void) {
OFFSET(BP_hardware_subarch, boot_params, hdr.hardware_subarch);
OFFSET(BP_version, boot_params, hdr.version);
OFFSET(BP_kernel_alignment, boot_params, hdr.kernel_alignment);
+ OFFSET(BP_init_size, boot_params, hdr.init_size);
OFFSET(BP_pref_address, boot_params, hdr.pref_address);
OFFSET(BP_code32_start, boot_params, hdr.code32_start);

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 4c941f8..9297a00 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -334,7 +334,7 @@ SECTIONS
__brk_limit = .;
}

- . = ALIGN(PAGE_SIZE);
+ . = ALIGN(PAGE_SIZE); /* keep VO_INIT_SIZE page aligned */
_end = .;

STABS_DEBUG

Subject: [tip:x86/boot] x86/boot: Calculate decompression size during boot not build

Commit-ID: d607251ba9acc0b5faeaa08818f60d041dd19472
Gitweb: http://git.kernel.org/tip/d607251ba9acc0b5faeaa08818f60d041dd19472
Author: Yinghai Lu <[email protected]>
AuthorDate: Thu, 28 Apr 2016 17:09:05 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 29 Apr 2016 11:03:29 +0200

x86/boot: Calculate decompression size during boot not build

Currently z_extract_offset is calculated in boot/compressed/mkpiggy.c.
This doesn't work well because mkpiggy.c doesn't know the details of the
decompressor in use. As a result, it can only make an estimation, which
has risks:

- output + output_len (VO) could be much bigger than input + input_len
(ZO). In this case, the decompressed kernel plus relocs could overwrite
the decompression code while it is running.

- The head code of ZO could be bigger than z_extract_offset. In this case
an overwrite could happen when the head code is running to move ZO to
the end of buffer. Though currently the size of the head code is very
small it's still a potential risk. Since there is no rule to limit the
size of the head code of ZO, it runs the risk of suddenly becoming a
(hard to find) bug.

Instead, this moves the z_extract_offset calculation into header.S, and
makes adjustments to be sure that the above two cases can never happen,
and further corrects the comments describing the calculations.

Since we have (in the previous patch) made ZO always be located against
the end of decompression buffer, z_extract_offset is only used here to
calculate an appropriate buffer size (INIT_SIZE), and is not longer used
elsewhere. As such, it can be removed from voffset.h.

Additionally clean up #if/#else #define to improve readability.

Signed-off-by: Yinghai Lu <[email protected]>
Signed-off-by: Baoquan He <[email protected]>
[ Rewrote the changelog and comments. ]
Signed-off-by: Kees Cook <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/boot/Makefile | 2 +-
arch/x86/boot/compressed/mkpiggy.c | 21 ++++-----------------
arch/x86/boot/header.S | 35 +++++++++++++++++++++++++----------
3 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index b1ef9e4..942f7dab 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -95,7 +95,7 @@ targets += voffset.h
$(obj)/voffset.h: vmlinux FORCE
$(call if_changed,voffset)

-sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|z_.*\)$$/\#define ZO_\2 0x\1/p'
+sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|_ehead\|_text\|z_.*\)$$/\#define ZO_\2 0x\1/p'

quiet_cmd_zoffset = ZOFFSET $@
cmd_zoffset = $(NM) $< | sed -n $(sed-zoffset) > $@
diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
index b980046..f095ed9 100644
--- a/arch/x86/boot/compressed/mkpiggy.c
+++ b/arch/x86/boot/compressed/mkpiggy.c
@@ -18,11 +18,10 @@
*
* H. Peter Anvin <[email protected]>
*
- * ----------------------------------------------------------------------- */
-
-/*
- * Compute the desired load offset from a compressed program; outputs
- * a small assembly wrapper with the appropriate symbols defined.
+ * -----------------------------------------------------------------------
+ *
+ * Outputs a small assembly wrapper with the appropriate symbols defined.
+ *
*/

#include <stdlib.h>
@@ -35,7 +34,6 @@ int main(int argc, char *argv[])
{
uint32_t olen;
long ilen;
- unsigned long offs;
unsigned long run_size;
FILE *f = NULL;
int retval = 1;
@@ -67,15 +65,6 @@ int main(int argc, char *argv[])
ilen = ftell(f);
olen = get_unaligned_le32(&olen);

- /*
- * Now we have the input (compressed) and output (uncompressed)
- * sizes, compute the necessary decompression offset...
- */
-
- offs = (olen > ilen) ? olen - ilen : 0;
- offs += olen >> 12; /* Add 8 bytes for each 32K block */
- offs += 64*1024 + 128; /* Add 64K + 128 bytes slack */
- offs = (offs+4095) & ~4095; /* Round to a 4K boundary */
run_size = atoi(argv[2]);

printf(".section \".rodata..compressed\",\"a\",@progbits\n");
@@ -83,8 +72,6 @@ int main(int argc, char *argv[])
printf("z_input_len = %lu\n", ilen);
printf(".globl z_output_len\n");
printf("z_output_len = %lu\n", (unsigned long)olen);
- printf(".globl z_extract_offset\n");
- printf("z_extract_offset = 0x%lx\n", offs);
printf(".globl z_run_size\n");
printf("z_run_size = %lu\n", run_size);

diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index fd85b9e..3dd5be3 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -508,13 +508,10 @@ pref_address: .quad LOAD_PHYSICAL_ADDR # preferred load addr
# To avoid problems with the compressed data's meta information an extra 18
# bytes are needed. Leading to the formula:
#
-# extra_bytes = (uncompressed_size >> 12) + 32768 + 18 + decompressor_size
+# extra_bytes = (uncompressed_size >> 12) + 32768 + 18
#
# Adding 8 bytes per 32K is a bit excessive but much easier to calculate.
# Adding 32768 instead of 32767 just makes for round numbers.
-# Adding the decompressor_size is necessary as it musht live after all
-# of the data as well. Last I measured the decompressor is about 14K.
-# 10K of actual data and 4K of bss.
#
# Above analysis is for decompressing gzip compressed kernel only. Up to
# now 6 different decompressor are supported all together. And among them
@@ -524,17 +521,35 @@ pref_address: .quad LOAD_PHYSICAL_ADDR # preferred load addr
# the description in lib/decompressor_xxx.c for specific information.
#
# extra_bytes = (uncompressed_size >> 12) + 65536 + 128
-#
-# Note that this calculation, which results in z_extract_offset (below),
-# is currently generated in compressed/mkpiggy.c

-#define ZO_INIT_SIZE (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
+#define ZO_z_extra_bytes ((ZO_z_output_len >> 12) + 65536 + 128)
+#if ZO_z_output_len > ZO_z_input_len
+# define ZO_z_extract_offset (ZO_z_output_len + ZO_z_extra_bytes - \
+ ZO_z_input_len)
+#else
+# define ZO_z_extract_offset ZO_z_extra_bytes
+#endif
+
+/*
+ * The extract_offset has to be bigger than ZO head section. Otherwise when
+ * the head code is running to move ZO to the end of the buffer, it will
+ * overwrite the head code itself.
+ */
+#if (ZO__ehead - ZO_startup_32) > ZO_z_extract_offset
+# define ZO_z_min_extract_offset ((ZO__ehead - ZO_startup_32 + 4095) & ~4095)
+#else
+# define ZO_z_min_extract_offset ((ZO_z_extract_offset + 4095) & ~4095)
+#endif
+
+#define ZO_INIT_SIZE (ZO__end - ZO_startup_32 + ZO_z_min_extract_offset)
+
#define VO_INIT_SIZE (VO__end - VO__text)
#if ZO_INIT_SIZE > VO_INIT_SIZE
-#define INIT_SIZE ZO_INIT_SIZE
+# define INIT_SIZE ZO_INIT_SIZE
#else
-#define INIT_SIZE VO_INIT_SIZE
+# define INIT_SIZE VO_INIT_SIZE
#endif
+
init_size: .long INIT_SIZE # kernel initialization size
handover_offset: .long 0 # Filled in by build.c


Subject: [tip:x86/boot] x86/boot: Fix "run_size" calculation

Commit-ID: 67b6662559f7f77bcbd3ac67d09aaac11785f3c1
Gitweb: http://git.kernel.org/tip/67b6662559f7f77bcbd3ac67d09aaac11785f3c1
Author: Yinghai Lu <[email protected]>
AuthorDate: Thu, 28 Apr 2016 17:09:06 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 29 Apr 2016 11:03:30 +0200

x86/boot: Fix "run_size" calculation

Currently, the "run_size" variable holds the total kernel size
(size of code plus brk and bss) and is calculated via the shell script
arch/x86/tools/calc_run_size.sh. It gets the file offset and mem size
of the .bss and .brk sections from the vmlinux, and adds them as follows:

run_size = $(( $offsetA + $sizeA + $sizeB ))

However, this is not correct (it is too large). To illustrate, here's
a walk-through of the script's calculation, compared to the correct way
to find it.

First, offsetA is found as the starting address of the first .bss or
.brk section seen in the ELF file. The sizeA and sizeB values are the
respective section sizes.

[bhe@x1 linux]$ objdump -h vmlinux

vmlinux: file format elf64-x86-64

Sections:
Idx Name Size VMA LMA File off Algn
27 .bss 00170000 ffffffff81ec8000 0000000001ec8000 012c8000 2**12
ALLOC
28 .brk 00027000 ffffffff82038000 0000000002038000 012c8000 2**0
ALLOC

Here, offsetA is 0x012c8000, with sizeA at 0x00170000 and sizeB at
0x00027000. The resulting run_size is 0x145f000:

0x012c8000 + 0x00170000 + 0x00027000 = 0x145f000

However, if we instead examine the ELF LOAD program headers, we see a
different picture.

[bhe@x1 linux]$ readelf -l vmlinux

Elf file type is EXEC (Executable file)
Entry point 0x1000000
There are 5 program headers, starting at offset 64

Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000200000 0xffffffff81000000 0x0000000001000000
0x0000000000b5e000 0x0000000000b5e000 R E 200000
LOAD 0x0000000000e00000 0xffffffff81c00000 0x0000000001c00000
0x0000000000145000 0x0000000000145000 RW 200000
LOAD 0x0000000001000000 0x0000000000000000 0x0000000001d45000
0x0000000000018158 0x0000000000018158 RW 200000
LOAD 0x000000000115e000 0xffffffff81d5e000 0x0000000001d5e000
0x000000000016a000 0x0000000000301000 RWE 200000
NOTE 0x000000000099bcac 0xffffffff8179bcac 0x000000000179bcac
0x00000000000001bc 0x00000000000001bc 4

Section to Segment mapping:
Segment Sections...
00 .text .notes __ex_table .rodata __bug_table .pci_fixup .tracedata
__ksymtab __ksymtab_gpl __ksymtab_strings __init_rodata __param
__modver
01 .data .vvar
02 .data..percpu
03 .init.text .init.data .x86_cpu_dev.init .parainstructions
.altinstructions .altinstr_replacement .iommu_table .apicdrivers
.exit.text .smp_locks .bss .brk
04 .notes

As mentioned, run_size needs to be the size of the running kernel
including .bss and .brk. We can see from the Section/Segment mapping
above that .bss and .brk are included in segment 03 (which corresponds
to the final LOAD program header). To find the run_size, we calculate
the end of the LOAD segment from its PhysAddr start (0x0000000001d5e000)
and its MemSiz (0x0000000000301000), minus the physical load address of
the kernel (the first LOAD segment's PhysAddr: 0x0000000001000000). The
resulting run_size is 0x105f000:

0x0000000001d5e000 + 0x0000000000301000 - 0x0000000001000000 = 0x105f000

So, from this we can see that the existing run_size calculation is
0x400000 too high. And, as it turns out, the correct run_size is
actually equal to VO_end - VO_text, which is certainly easier to calculate.
_end: 0xffffffff8205f000
_text:0xffffffff81000000

0xffffffff8205f000 - 0xffffffff81000000 = 0x105f000

As a result, run_size is a simple constant, so we don't need to pass it
around; we already have voffset.h for such things. We can share voffset.h
between misc.c and header.S instead of getting run_size in other ways.
This patch moves voffset.h creation code to boot/compressed/Makefile,
and switches misc.c to use the VO_end - VO_text calculation for run_size.

Dependence before:

boot/header.S ==> boot/voffset.h ==> vmlinux
boot/header.S ==> compressed/vmlinux ==> compressed/misc.c

Dependence after:

boot/header.S ==> compressed/vmlinux ==> compressed/misc.c ==> boot/voffset.h ==> vmlinux

Signed-off-by: Yinghai Lu <[email protected]>
Signed-off-by: Baoquan He <[email protected]>
[ Rewrote the changelog. ]
Signed-off-by: Kees Cook <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Junjie Mao <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: [email protected]
Fixes: e6023367d779 ("x86, kaslr: Prevent .bss from overlaping initrd")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/boot/Makefile | 11 +----------
arch/x86/boot/compressed/Makefile | 12 ++++++++++++
arch/x86/boot/compressed/misc.c | 3 +++
3 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index 942f7dab..700a9c6 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -86,15 +86,6 @@ $(obj)/vmlinux.bin: $(obj)/compressed/vmlinux FORCE

SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))

-sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 0x\1/p'
-
-quiet_cmd_voffset = VOFFSET $@
- cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
-
-targets += voffset.h
-$(obj)/voffset.h: vmlinux FORCE
- $(call if_changed,voffset)
-
sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|_ehead\|_text\|z_.*\)$$/\#define ZO_\2 0x\1/p'

quiet_cmd_zoffset = ZOFFSET $@
@@ -106,7 +97,7 @@ $(obj)/zoffset.h: $(obj)/compressed/vmlinux FORCE


AFLAGS_header.o += -I$(obj)
-$(obj)/header.o: $(obj)/voffset.h $(obj)/zoffset.h
+$(obj)/header.o: $(obj)/zoffset.h

LDFLAGS_setup.elf := -T
$(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS) FORCE
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 542c92f..4599780 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -57,6 +57,18 @@ LDFLAGS_vmlinux := -T
hostprogs-y := mkpiggy
HOST_EXTRACFLAGS += -I$(srctree)/tools/include

+sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
+
+quiet_cmd_voffset = VOFFSET $@
+ cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
+
+targets += ../voffset.h
+
+$(obj)/../voffset.h: vmlinux FORCE
+ $(call if_changed,voffset)
+
+$(obj)/misc.o: $(obj)/../voffset.h
+
vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
$(obj)/string.o $(obj)/cmdline.o \
$(obj)/piggy.o $(obj)/cpuflags.o
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 4b4605e9..cda93d1 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -13,6 +13,7 @@

#include "misc.h"
#include "../string.h"
+#include "../voffset.h"

/*
* WARNING!!
@@ -363,6 +364,8 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
lines = boot_params->screen_info.orig_video_lines;
cols = boot_params->screen_info.orig_video_cols;

+ run_size = VO__end - VO__text;
+
console_init();
debug_putstr("early console in extract_kernel\n");


Subject: [tip:x86/boot] x86/KASLR: Clean up unused code from old 'run_size' and rename it to 'kernel_total_size'

Commit-ID: 4d2d542482205d3df1a0852751f5b004cc6390cc
Gitweb: http://git.kernel.org/tip/4d2d542482205d3df1a0852751f5b004cc6390cc
Author: Yinghai Lu <[email protected]>
AuthorDate: Thu, 28 Apr 2016 17:09:07 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 29 Apr 2016 11:03:30 +0200

x86/KASLR: Clean up unused code from old 'run_size' and rename it to 'kernel_total_size'

Since 'run_size' is now calculated in misc.c, the old script and associated
argument passing is no longer needed. This patch removes them, and renames
'run_size' to the more descriptive 'kernel_total_size'.

Signed-off-by: Yinghai Lu <[email protected]>
Signed-off-by: Baoquan He <[email protected]>
[ Rewrote the changelog, renamed 'run_size' to 'kernel_total_size' ]
Signed-off-by: Kees Cook <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Junjie Mao <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/boot/compressed/Makefile | 4 +---
arch/x86/boot/compressed/head_32.S | 3 +--
arch/x86/boot/compressed/head_64.S | 3 ---
arch/x86/boot/compressed/misc.c | 11 ++++------
arch/x86/boot/compressed/mkpiggy.c | 10 ++-------
arch/x86/tools/calc_run_size.sh | 42 --------------------------------------
6 files changed, 8 insertions(+), 65 deletions(-)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 4599780..adef26d 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -121,10 +121,8 @@ suffix-$(CONFIG_KERNEL_XZ) := xz
suffix-$(CONFIG_KERNEL_LZO) := lzo
suffix-$(CONFIG_KERNEL_LZ4) := lz4

-RUN_SIZE = $(shell $(OBJDUMP) -h vmlinux | \
- $(CONFIG_SHELL) $(srctree)/arch/x86/tools/calc_run_size.sh)
quiet_cmd_mkpiggy = MKPIGGY $@
- cmd_mkpiggy = $(obj)/mkpiggy $< $(RUN_SIZE) > $@ || ( rm -f $@ ; false )
+ cmd_mkpiggy = $(obj)/mkpiggy $< > $@ || ( rm -f $@ ; false )

targets += piggy.S
$(obj)/piggy.S: $(obj)/vmlinux.bin.$(suffix-y) $(obj)/mkpiggy FORCE
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 8a95e2f..1038524 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -238,7 +238,6 @@ relocated:
* Do the extraction, and jump to the new kernel..
*/
/* push arguments for extract_kernel: */
- pushl $z_run_size /* size of kernel with .bss and .brk */
pushl $z_output_len /* decompressed length, end of relocs */

movl BP_init_size(%esi), %eax
@@ -254,7 +253,7 @@ relocated:
pushl %eax /* heap area */
pushl %esi /* real mode pointer */
call extract_kernel /* returns kernel location in %eax */
- addl $28, %esp
+ addl $24, %esp

/*
* Jump to the extracted kernel.
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 09cdc0c..7c04700 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -415,8 +415,6 @@ relocated:
* Do the extraction, and jump to the new kernel..
*/
pushq %rsi /* Save the real mode argument */
- movq $z_run_size, %r9 /* size of kernel with .bss and .brk */
- pushq %r9
movq %rsi, %rdi /* real mode address */
leaq boot_heap(%rip), %rsi /* malloc area for uncompression */
leaq input_data(%rip), %rdx /* input_data */
@@ -424,7 +422,6 @@ relocated:
movq %rbp, %r8 /* output target address */
movq $z_output_len, %r9 /* decompressed length, end of relocs */
call extract_kernel /* returns kernel location in %rax */
- popq %r9
popq %rsi

/*
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index cda93d1..bee6238 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -340,9 +340,9 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
unsigned char *input_data,
unsigned long input_len,
unsigned char *output,
- unsigned long output_len,
- unsigned long run_size)
+ unsigned long output_len)
{
+ const unsigned long kernel_total_size = VO__end - VO__text;
unsigned char *output_orig = output;

/* Retain x86 boot parameters pointer passed from startup_32/64. */
@@ -364,8 +364,6 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
lines = boot_params->screen_info.orig_video_lines;
cols = boot_params->screen_info.orig_video_cols;

- run_size = VO__end - VO__text;
-
console_init();
debug_putstr("early console in extract_kernel\n");

@@ -377,7 +375,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
debug_putaddr(input_len);
debug_putaddr(output);
debug_putaddr(output_len);
- debug_putaddr(run_size);
+ debug_putaddr(kernel_total_size);

/*
* The memory hole needed for the kernel is the larger of either
@@ -385,8 +383,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
* entire decompressed kernel plus .bss and .brk sections.
*/
output = choose_random_location(input_data, input_len, output,
- output_len > run_size ? output_len
- : run_size);
+ max(output_len, kernel_total_size));

/* Validate memory location choices. */
if ((unsigned long)output & (MIN_KERNEL_ALIGN - 1))
diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
index f095ed9..72bad2c 100644
--- a/arch/x86/boot/compressed/mkpiggy.c
+++ b/arch/x86/boot/compressed/mkpiggy.c
@@ -34,13 +34,11 @@ int main(int argc, char *argv[])
{
uint32_t olen;
long ilen;
- unsigned long run_size;
FILE *f = NULL;
int retval = 1;

- if (argc < 3) {
- fprintf(stderr, "Usage: %s compressed_file run_size\n",
- argv[0]);
+ if (argc < 2) {
+ fprintf(stderr, "Usage: %s compressed_file\n", argv[0]);
goto bail;
}

@@ -65,15 +63,11 @@ int main(int argc, char *argv[])
ilen = ftell(f);
olen = get_unaligned_le32(&olen);

- run_size = atoi(argv[2]);
-
printf(".section \".rodata..compressed\",\"a\",@progbits\n");
printf(".globl z_input_len\n");
printf("z_input_len = %lu\n", ilen);
printf(".globl z_output_len\n");
printf("z_output_len = %lu\n", (unsigned long)olen);
- printf(".globl z_run_size\n");
- printf("z_run_size = %lu\n", run_size);

printf(".globl input_data, input_data_end\n");
printf("input_data:\n");
diff --git a/arch/x86/tools/calc_run_size.sh b/arch/x86/tools/calc_run_size.sh
deleted file mode 100644
index 1a4c17b..0000000
--- a/arch/x86/tools/calc_run_size.sh
+++ /dev/null
@@ -1,42 +0,0 @@
-#!/bin/sh
-#
-# Calculate the amount of space needed to run the kernel, including room for
-# the .bss and .brk sections.
-#
-# Usage:
-# objdump -h a.out | sh calc_run_size.sh
-
-NUM='\([0-9a-fA-F]*[ \t]*\)'
-OUT=$(sed -n 's/^[ \t0-9]*.b[sr][sk][ \t]*'"$NUM$NUM$NUM$NUM"'.*/\1\4/p')
-if [ -z "$OUT" ] ; then
- echo "Never found .bss or .brk file offset" >&2
- exit 1
-fi
-
-OUT=$(echo ${OUT# })
-sizeA=$(printf "%d" 0x${OUT%% *})
-OUT=${OUT#* }
-offsetA=$(printf "%d" 0x${OUT%% *})
-OUT=${OUT#* }
-sizeB=$(printf "%d" 0x${OUT%% *})
-OUT=${OUT#* }
-offsetB=$(printf "%d" 0x${OUT%% *})
-
-run_size=$(( $offsetA + $sizeA + $sizeB ))
-
-# BFD linker shows the same file offset in ELF.
-if [ "$offsetA" -ne "$offsetB" ] ; then
- # Gold linker shows them as consecutive.
- endB=$(( $offsetB + $sizeB ))
- if [ "$endB" != "$run_size" ] ; then
- printf "sizeA: 0x%x\n" $sizeA >&2
- printf "offsetA: 0x%x\n" $offsetA >&2
- printf "sizeB: 0x%x\n" $sizeB >&2
- printf "offsetB: 0x%x\n" $offsetB >&2
- echo ".bss and .brk are non-contiguous" >&2
- exit 1
- fi
-fi
-
-printf "%d\n" $run_size
-exit 0

Subject: [tip:x86/boot] x86/boot: Correctly bounds-check relocations

Commit-ID: 4abf061bf87bbd856c8d60199b2fba8b8f9b9fd6
Gitweb: http://git.kernel.org/tip/4abf061bf87bbd856c8d60199b2fba8b8f9b9fd6
Author: Yinghai Lu <[email protected]>
AuthorDate: Thu, 28 Apr 2016 17:09:08 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 29 Apr 2016 11:03:30 +0200

x86/boot: Correctly bounds-check relocations

Relocation handling performs bounds checking on the resulting calculated
addresses. The existing code uses output_len (VO size plus relocs size) as
the max address. This is not right since the max_addr check should stop at
the end of VO and exclude bss, brk, etc, which follows. The valid range
should be VO [_text, __bss_start] in the loaded physical address space.

This patch adds an export for __bss_start in voffset.h and uses it to
set the correct limit for max_addr.

Signed-off-by: Yinghai Lu <[email protected]>
[ Rewrote the changelog. ]
Signed-off-by: Kees Cook <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/boot/compressed/Makefile | 2 +-
arch/x86/boot/compressed/misc.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index adef26d..75f2233 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -57,7 +57,7 @@ LDFLAGS_vmlinux := -T
hostprogs-y := mkpiggy
HOST_EXTRACFLAGS += -I$(srctree)/tools/include

-sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
+sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'

quiet_cmd_voffset = VOFFSET $@
cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index bee6238..8f0253d 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -191,7 +191,7 @@ static void handle_relocations(void *output, unsigned long output_len)
int *reloc;
unsigned long delta, map, ptr;
unsigned long min_addr = (unsigned long)output;
- unsigned long max_addr = min_addr + output_len;
+ unsigned long max_addr = min_addr + (VO___bss_start - VO__text);

/*
* Calculate the delta between where vmlinux was linked to load

2016-10-03 21:50:40

by Simon Glass

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86/boot: Move compressed kernel to end of decompression buffer

HI Matt,

On 16 August 2016 at 20:25, Matt Mullins <[email protected]> wrote:
> On Tue, Aug 16, 2016 at 12:19:42PM -0700, Yinghai Lu wrote:
>> On Mon, Aug 15, 2016 at 9:01 PM, Matt Mullins <[email protected]> wrote:
>> >
>> > This appears to have a negative effect on booting the Intel Edison platform, as
>> > it uses u-boot as its bootloader. u-boot does not copy the init_size parameter
>> > when booting a bzImage: it copies a fixed-size setup_header [1], and its
>> > definition of setup_header doesn't include the parameters beyond setup_data [2].
>> >
>> > With a zero value for init_size, this calculates a %rsp value of 0x101ff9600.
>> > This causes the boot process to hard-stop at the immediately-following pushq, as
>> > this platform has no usable physical addresses above 4G.
>> >
>> > What are the options for getting this type of platform to function again? For
>> > now, kexec from a working Linux system does seem to be a work-around, but there
>> > appears to be other x86 hardware using u-boot: the chromium.org folks seem to be
>> > maintaining the u-boot x86 tree.
>> >
>> > [1] http://git.denx.de/?p=u-boot.git;a=blob;f=arch/x86/lib/zimage.c;h=1b33c771391f49ffe82864ff1582bdfd07e5e97d;hb=HEAD#l156
>> > [2] http://git.denx.de/?p=u-boot.git;a=blob;f=arch/x86/include/asm/bootparam.h;h=140095117e5a2daef0a097c55f0ed10e08acc781;hb=HEAD#l24
>>
>> Then should fix the u-boot about header_size assumption.
>
> I was hoping to avoid that, since the Edison's u-boot is 10,000-line patch atop
> the upstream -- I don't trust myself to build and flash one quite yet.

It would be good to upstream that!

>
> If this turned out to affect Chromebooks, I'd spend more effort pushing for
> a kernel fix, but it seems that ChromeOS has a different kernel load procedure
> and doesn't use "zboot". For now, I'll probably just keep a local patch that
> hard-codes a value large enough to decompress and launch the kernel.
>
> I may turn that local patch into something gated by a Kconfig eventually, in
> hopes that users of the other x86 u-boot platforms will see it in a "make
> oldconfig" run.

Well, I think this patch is useful. But also, let's fix U-Boot.

Regards,
Simon

2016-11-30 16:52:31

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86/boot: Move compressed kernel to end of decompression buffer

On Tue, Oct 4, 2016 at 12:50 AM, Simon Glass <[email protected]> wrote:
> HI Matt,
>
> On 16 August 2016 at 20:25, Matt Mullins <[email protected]> wrote:
>> On Tue, Aug 16, 2016 at 12:19:42PM -0700, Yinghai Lu wrote:
>>> On Mon, Aug 15, 2016 at 9:01 PM, Matt Mullins <[email protected]> wrote:
>>> >
>>> > This appears to have a negative effect on booting the Intel Edison platform, as
>>> > it uses u-boot as its bootloader. u-boot does not copy the init_size parameter
>>> > when booting a bzImage: it copies a fixed-size setup_header [1], and its
>>> > definition of setup_header doesn't include the parameters beyond setup_data [2].
>>> >
>>> > With a zero value for init_size, this calculates a %rsp value of 0x101ff9600.
>>> > This causes the boot process to hard-stop at the immediately-following pushq, as
>>> > this platform has no usable physical addresses above 4G.
>>> >
>>> > What are the options for getting this type of platform to function again? For
>>> > now, kexec from a working Linux system does seem to be a work-around, but there
>>> > appears to be other x86 hardware using u-boot: the chromium.org folks seem to be
>>> > maintaining the u-boot x86 tree.
>>> >
>>> > [1] http://git.denx.de/?p=u-boot.git;a=blob;f=arch/x86/lib/zimage.c;h=1b33c771391f49ffe82864ff1582bdfd07e5e97d;hb=HEAD#l156
>>> > [2] http://git.denx.de/?p=u-boot.git;a=blob;f=arch/x86/include/asm/bootparam.h;h=140095117e5a2daef0a097c55f0ed10e08acc781;hb=HEAD#l24
>>>
>>> Then should fix the u-boot about header_size assumption.
>>
>> I was hoping to avoid that, since the Edison's u-boot is 10,000-line patch atop
>> the upstream -- I don't trust myself to build and flash one quite yet.
>
> It would be good to upstream that!
>
>>
>> If this turned out to affect Chromebooks, I'd spend more effort pushing for
>> a kernel fix, but it seems that ChromeOS has a different kernel load procedure
>> and doesn't use "zboot". For now, I'll probably just keep a local patch that
>> hard-codes a value large enough to decompress and launch the kernel.
>>
>> I may turn that local patch into something gated by a Kconfig eventually, in
>> hopes that users of the other x86 u-boot platforms will see it in a "make
>> oldconfig" run.
>
> Well, I think this patch is useful. But also, let's fix U-Boot.

Found this late.

Anyway, I have got same issue [1] and basically what I did, I just
deuglified a bit that monster hack pile for U-Boot to build my own
fork of most recent one [2].
For now on I have working x86_64 kernel directly and newest U-Boot.

[1] https://github.com/andy-shev/linux/issues/3
[2] https://github.com/andy-shev/u-boot/tree/edison

--
With Best Regards,
Andy Shevchenko