2013-10-01 19:38:07

by Kees Cook

Subject: [PATCH v6 0/7] Kernel base address randomization

Here is the latest version of the kASLR series. It has much improved
e820 walking code, and expands the window available on 64-bit.

This is rolled out on Chrome OS devices and is working well.

-Kees


2013-10-01 19:38:00

by Kees Cook

Subject: [PATCH 3/7] x86, kaslr: find minimum safe relocation position

Examine all the known unsafe areas and avoid them by just raising the
minimum relocation position to be past them.

Signed-off-by: Kees Cook <[email protected]>
---
arch/x86/boot/compressed/aslr.c | 50 +++++++++++++++++++++++++++++++++++++++
arch/x86/boot/compressed/misc.c | 10 ++------
arch/x86/boot/compressed/misc.h | 8 +++++++
3 files changed, 60 insertions(+), 8 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index b73cc66..ed7e9f0 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -2,6 +2,53 @@

#ifdef CONFIG_RANDOMIZE_BASE

+static unsigned long find_minimum_location(unsigned long input,
+ unsigned long input_size,
+ unsigned long output,
+ unsigned long output_size)
+{
+ u64 initrd_start, initrd_size;
+ u64 cmd_line, cmd_line_size;
+ unsigned long unsafe, unsafe_len;
+ char *ptr;
+
+ /*
+ * Mark off the region that is unsafe to overlap during
+ * decompression (see calculations at top of misc.c).
+ */
+ unsafe_len = (output_size >> 12) + 32768 + 18;
+ unsafe = (unsigned long)input + input_size - unsafe_len;
+
+ /*
+ * Locate other regions that cannot be over-written during
+ * decompression: initrd, cmd_line.
+ */
+ initrd_start = (u64)real_mode->ext_ramdisk_image << 32;
+ initrd_start |= real_mode->hdr.ramdisk_image;
+ initrd_size = (u64)real_mode->ext_ramdisk_size << 32;
+ initrd_size |= real_mode->hdr.ramdisk_size;
+ cmd_line = (u64)real_mode->ext_cmd_line_ptr << 32;
+ cmd_line |= real_mode->hdr.cmd_line_ptr;
+ /* Calculate size of cmd_line. */
+ ptr = (char *)(unsigned long)cmd_line;
+ for (cmd_line_size = 0; ptr[cmd_line_size++]; )
+ ;
+
+ /* Minimum location must be above all these regions: */
+ output = max(output, unsafe + unsafe_len);
+ output = max(output, (unsigned long)free_mem_ptr + BOOT_HEAP_SIZE);
+ output = max(output, (unsigned long)free_mem_end_ptr + BOOT_STACK_SIZE);
+ output = max(output, (unsigned long)initrd_start
+ + (unsigned long)initrd_size);
+ output = max(output, (unsigned long)cmd_line
+ + (unsigned long)cmd_line_size);
+
+ /* Make sure the location is still aligned. */
+ output = ALIGN(output, CONFIG_PHYSICAL_ALIGN);
+
+ return output;
+}
+
unsigned char *choose_kernel_location(unsigned char *input,
unsigned long input_size,
unsigned char *output,
@@ -14,6 +61,9 @@ unsigned char *choose_kernel_location(unsigned char *input,
goto out;
}

+ choice = find_minimum_location((unsigned long)input, input_size,
+ (unsigned long)output, output_size);
+
/* XXX: choose random location. */

out:
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 7138768..196eaf3 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -112,14 +112,8 @@ struct boot_params *real_mode; /* Pointer to real-mode data */
void *memset(void *s, int c, size_t n);
void *memcpy(void *dest, const void *src, size_t n);

-#ifdef CONFIG_X86_64
-#define memptr long
-#else
-#define memptr unsigned
-#endif
-
-static memptr free_mem_ptr;
-static memptr free_mem_end_ptr;
+memptr free_mem_ptr;
+memptr free_mem_end_ptr;

static char *vidmem;
static int vidport;
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 9077af7..42f71bb 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -23,7 +23,15 @@
#define BOOT_BOOT_H
#include "../ctype.h"

+#ifdef CONFIG_X86_64
+#define memptr long
+#else
+#define memptr unsigned
+#endif
+
/* misc.c */
+extern memptr free_mem_ptr;
+extern memptr free_mem_end_ptr;
extern struct boot_params *real_mode; /* Pointer to real-mode data */
void __putstr(const char *s);
#define error_putstr(__x) __putstr(__x)
--
1.7.9.5

2013-10-01 19:38:03

by Kees Cook

Subject: [PATCH 6/7] x86, kaslr: report kernel offset on panic

When the system panics, include the kernel offset in the report to assist
in debugging.

Signed-off-by: Kees Cook <[email protected]>
---
arch/x86/kernel/setup.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index f0de629..1708862 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -824,6 +824,20 @@ static void __init trim_low_memory_range(void)
}

/*
+ * Dump out kernel offset information on panic.
+ */
+static int
+dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
+{
+ pr_emerg("Kernel Offset: 0x%lx from 0x%lx "
+ "(relocation range: 0x%lx-0x%lx)\n",
+ (unsigned long)&_text - __START_KERNEL, __START_KERNEL,
+ __START_KERNEL_map, MODULES_VADDR-1);
+
+ return 0;
+}
+
+/*
* Determine if we were loaded by an EFI loader. If so, then we have also been
* passed the efi memmap, systab, etc., so we should use these data structures
* for initialization. Note, the efi init code path is determined by the
@@ -1242,3 +1256,15 @@ void __init i386_reserve_resources(void)
}

#endif /* CONFIG_X86_32 */
+
+static struct notifier_block kernel_offset_notifier = {
+ .notifier_call = dump_kernel_offset
+};
+
+static int __init register_kernel_offset_dumper(void)
+{
+ atomic_notifier_chain_register(&panic_notifier_list,
+ &kernel_offset_notifier);
+ return 0;
+}
+__initcall(register_kernel_offset_dumper);
--
1.7.9.5

2013-10-01 19:37:56

by Kees Cook

Subject: [PATCH 2/7] x86, kaslr: return location from decompress_kernel

This allows decompress_kernel to return a new location for the kernel to
be relocated to. Additionally, enforces CONFIG_PHYSICAL_START as the
minimum relocation position when building with CONFIG_RELOCATABLE.

With CONFIG_RANDOMIZE_BASE set, the choose_kernel_location routine
will select a new location to decompress the kernel, though here it is
presently a no-op. Bypassing this routine with "nokaslr" on the kernel
command line is also handled.

Signed-off-by: Kees Cook <[email protected]>
---
Documentation/kernel-parameters.txt | 4 ++++
arch/x86/Kconfig | 38 +++++++++++++++++++++++++++++++----
arch/x86/boot/compressed/Makefile | 2 +-
arch/x86/boot/compressed/aslr.c | 23 +++++++++++++++++++++
arch/x86/boot/compressed/cmdline.c | 2 +-
arch/x86/boot/compressed/head_32.S | 10 +++++----
arch/x86/boot/compressed/head_64.S | 16 +++++++++------
arch/x86/boot/compressed/misc.c | 8 ++++++--
arch/x86/boot/compressed/misc.h | 27 +++++++++++++++++++------
9 files changed, 106 insertions(+), 24 deletions(-)
create mode 100644 arch/x86/boot/compressed/aslr.c

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index fcbb736..773fc4c 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1975,6 +1975,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
noapic [SMP,APIC] Tells the kernel to not make use of any
IOAPICs that may be present in the system.

+ nokaslr [X86]
+ Disable kernel base offset ASLR (Address Space
+ Layout Randomization) if built into the kernel.
+
noautogroup Disable scheduler automatic task group creation.

nobats [PPC] Do not use BATs for mapping kernel lowmem
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ee2fb9d..992701d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1722,16 +1722,46 @@ config RELOCATABLE

Note: If CONFIG_RELOCATABLE=y, then the kernel runs from the address
it has been loaded at and the compile time physical address
- (CONFIG_PHYSICAL_START) is ignored.
+ (CONFIG_PHYSICAL_START) is used as the minimum location.

-# Relocation on x86-32 needs some additional build support
+config RANDOMIZE_BASE
+ bool "Randomize the address of the kernel image"
+ depends on RELOCATABLE
+ depends on !HIBERNATION
+ default n
+ ---help---
+ Randomizes the physical and virtual address at which the
+ kernel image is decompressed, as a security feature that
+ deters exploit attempts relying on knowledge of the location
+ of kernel internals.
+
+ Entropy is generated using the RDRAND instruction if it
+ is supported. If not, then RDTSC is used, if supported. If
+ neither RDRAND nor RDTSC are supported, then no randomness
+ is introduced.
+
+ The kernel will be offset by up to RANDOMIZE_BASE_MAX_OFFSET,
+ and aligned according to PHYSICAL_ALIGN.
+
+config RANDOMIZE_BASE_MAX_OFFSET
+ hex "Maximum ASLR offset allowed"
+ depends on RANDOMIZE_BASE
+ default "0x10000000"
+ range 0x0 0x10000000
+ ---help---
+ Determines the maximal offset in bytes that will be applied to the
+ kernel when Address Space Layout Randomization (ASLR) is active.
+ Must be less than or equal to the actual physical memory on the
+ system. This must be a power of two.
+
+# Relocation on x86 needs some additional build support
config X86_NEED_RELOCS
def_bool y
- depends on X86_32 && RELOCATABLE
+ depends on RANDOMIZE_BASE || (X86_32 && RELOCATABLE)

config PHYSICAL_ALIGN
hex "Alignment value to which kernel should be aligned"
- default "0x1000000"
+ default "0x200000"
range 0x2000 0x1000000 if X86_32
range 0x200000 0x1000000 if X86_64
---help---
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 3312f1b..ae8b5db 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -27,7 +27,7 @@ HOST_EXTRACFLAGS += -I$(srctree)/tools/include

VMLINUX_OBJS = $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
$(obj)/string.o $(obj)/cmdline.o $(obj)/early_serial_console.o \
- $(obj)/piggy.o $(obj)/cpuflags.o
+ $(obj)/piggy.o $(obj)/cpuflags.o $(obj)/aslr.o

$(obj)/eboot.o: KBUILD_CFLAGS += -fshort-wchar -mno-red-zone

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
new file mode 100644
index 0000000..b73cc66
--- /dev/null
+++ b/arch/x86/boot/compressed/aslr.c
@@ -0,0 +1,23 @@
+#include "misc.h"
+
+#ifdef CONFIG_RANDOMIZE_BASE
+
+unsigned char *choose_kernel_location(unsigned char *input,
+ unsigned long input_size,
+ unsigned char *output,
+ unsigned long output_size)
+{
+ unsigned long choice = (unsigned long)output;
+
+ if (cmdline_find_option_bool("nokaslr")) {
+ debug_putstr("KASLR disabled...\n");
+ goto out;
+ }
+
+ /* XXX: choose random location. */
+
+out:
+ return (unsigned char *)choice;
+}
+
+#endif /* CONFIG_RANDOMIZE_BASE */
diff --git a/arch/x86/boot/compressed/cmdline.c b/arch/x86/boot/compressed/cmdline.c
index bffd73b..b68e303 100644
--- a/arch/x86/boot/compressed/cmdline.c
+++ b/arch/x86/boot/compressed/cmdline.c
@@ -1,6 +1,6 @@
#include "misc.h"

-#ifdef CONFIG_EARLY_PRINTK
+#if CONFIG_EARLY_PRINTK || CONFIG_RANDOMIZE_BASE

static unsigned long fs;
static inline void set_fs(unsigned long seg)
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 5d6f689..9116aac 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -117,9 +117,11 @@ preferred_addr:
addl %eax, %ebx
notl %eax
andl %eax, %ebx
-#else
- movl $LOAD_PHYSICAL_ADDR, %ebx
+ cmpl $LOAD_PHYSICAL_ADDR, %ebx
+ jge 1f
#endif
+ movl $LOAD_PHYSICAL_ADDR, %ebx
+1:

/* Target address to relocate to for decompression */
addl $z_extract_offset, %ebx
@@ -191,14 +193,14 @@ relocated:
leal boot_heap(%ebx), %eax
pushl %eax /* heap area */
pushl %esi /* real mode pointer */
- call decompress_kernel
+ call decompress_kernel /* returns kernel location in %eax */
addl $24, %esp

/*
* Jump to the decompressed kernel.
*/
xorl %ebx, %ebx
- jmp *%ebp
+ jmp *%eax

/*
* Stack and heap for uncompression
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index c337422..c5c1ae0 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -94,9 +94,11 @@ ENTRY(startup_32)
addl %eax, %ebx
notl %eax
andl %eax, %ebx
-#else
- movl $LOAD_PHYSICAL_ADDR, %ebx
+ cmpl $LOAD_PHYSICAL_ADDR, %ebx
+ jge 1f
#endif
+ movl $LOAD_PHYSICAL_ADDR, %ebx
+1:

/* Target address to relocate to for decompression */
addl $z_extract_offset, %ebx
@@ -269,9 +271,11 @@ preferred_addr:
addq %rax, %rbp
notq %rax
andq %rax, %rbp
-#else
- movq $LOAD_PHYSICAL_ADDR, %rbp
+ cmpq $LOAD_PHYSICAL_ADDR, %rbp
+ jge 1f
#endif
+ movq $LOAD_PHYSICAL_ADDR, %rbp
+1:

/* Target address to relocate to for decompression */
leaq z_extract_offset(%rbp), %rbx
@@ -339,13 +343,13 @@ relocated:
movl $z_input_len, %ecx /* input_len */
movq %rbp, %r8 /* output target address */
movq $z_output_len, %r9 /* decompressed length */
- call decompress_kernel
+ call decompress_kernel /* returns kernel location in %rax */
popq %rsi

/*
* Jump to the decompressed kernel.
*/
- jmp *%rbp
+ jmp *%rax

.code32
no_longmode:
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 434f077..7138768 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -395,7 +395,7 @@ static void parse_elf(void *output)
free(phdrs);
}

-asmlinkage void decompress_kernel(void *rmode, memptr heap,
+asmlinkage void *decompress_kernel(void *rmode, memptr heap,
unsigned char *input_data,
unsigned long input_len,
unsigned char *output,
@@ -422,6 +422,10 @@ asmlinkage void decompress_kernel(void *rmode, memptr heap,
free_mem_ptr = heap; /* Heap */
free_mem_end_ptr = heap + BOOT_HEAP_SIZE;

+ output = choose_kernel_location(input_data, input_len,
+ output, output_len);
+
+ /* Validate memory location choices. */
if ((unsigned long)output & (MIN_KERNEL_ALIGN - 1))
error("Destination address inappropriately aligned");
#ifdef CONFIG_X86_64
@@ -441,5 +445,5 @@ asmlinkage void decompress_kernel(void *rmode, memptr heap,
parse_elf(output);
handle_relocations(output, output_len);
debug_putstr("done.\nBooting the kernel.\n");
- return;
+ return output;
}
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 674019d..9077af7 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -39,23 +39,38 @@ static inline void debug_putstr(const char *s)

#endif

-#ifdef CONFIG_EARLY_PRINTK
-
+#if CONFIG_EARLY_PRINTK || CONFIG_RANDOMIZE_BASE
/* cmdline.c */
int cmdline_find_option(const char *option, char *buffer, int bufsize);
int cmdline_find_option_bool(const char *option);
+#endif

-/* early_serial_console.c */
-extern int early_serial_base;
-void console_init(void);

+#if CONFIG_RANDOMIZE_BASE
+/* aslr.c */
+unsigned char *choose_kernel_location(unsigned char *input,
+ unsigned long input_size,
+ unsigned char *output,
+ unsigned long output_size);
#else
+static inline
+unsigned char *choose_kernel_location(unsigned char *input,
+ unsigned long input_size,
+ unsigned char *output,
+ unsigned long output_size)
+{
+ return output;
+}
+#endif

+#ifdef CONFIG_EARLY_PRINTK
/* early_serial_console.c */
+extern int early_serial_base;
+void console_init(void);
+#else
static const int early_serial_base;
static inline void console_init(void)
{ }
-
#endif

#endif
--
1.7.9.5

2013-10-01 19:38:55

by Kees Cook

Subject: [PATCH 1/7] x86, kaslr: move CPU flags out of cpucheck

Refactor the CPU flags handling out of the cpucheck routines so that
they can be reused by the future ASLR routines (in order to detect CPU
features like RDRAND and RDTSC).

This reworks has_eflag() and has_fpu() to be used on both 32-bit and
64-bit, and refactors the calls to cpuid to make them PIC-safe on 32-bit.

Signed-off-by: Kees Cook <[email protected]>
---
arch/x86/boot/Makefile | 2 +-
arch/x86/boot/boot.h | 10 +---
arch/x86/boot/compressed/Makefile | 2 +-
arch/x86/boot/compressed/cpuflags.c | 12 ++++
arch/x86/boot/cpucheck.c | 86 -----------------------------
arch/x86/boot/cpuflags.c | 104 +++++++++++++++++++++++++++++++++++
arch/x86/boot/cpuflags.h | 19 +++++++
7 files changed, 138 insertions(+), 97 deletions(-)
create mode 100644 arch/x86/boot/compressed/cpuflags.c
create mode 100644 arch/x86/boot/cpuflags.c
create mode 100644 arch/x86/boot/cpuflags.h

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index 379814b..0da2e37 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -20,7 +20,7 @@ targets := vmlinux.bin setup.bin setup.elf bzImage
targets += fdimage fdimage144 fdimage288 image.iso mtools.conf
subdir- := compressed

-setup-y += a20.o bioscall.o cmdline.o copy.o cpu.o cpucheck.o
+setup-y += a20.o bioscall.o cmdline.o copy.o cpu.o cpuflags.o cpucheck.o
setup-y += early_serial_console.o edd.o header.o main.o mca.o memory.o
setup-y += pm.o pmjump.o printf.o regs.o string.o tty.o video.o
setup-y += video-mode.o version.o
diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h
index ef72bae..50f8c5e 100644
--- a/arch/x86/boot/boot.h
+++ b/arch/x86/boot/boot.h
@@ -26,9 +26,8 @@
#include <asm/boot.h>
#include <asm/setup.h>
#include "bitops.h"
-#include <asm/cpufeature.h>
-#include <asm/processor-flags.h>
#include "ctype.h"
+#include "cpuflags.h"

/* Useful macros */
#define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
@@ -307,14 +306,7 @@ static inline int cmdline_find_option_bool(const char *option)
return __cmdline_find_option_bool(cmd_line_ptr, option);
}

-
/* cpu.c, cpucheck.c */
-struct cpu_features {
- int level; /* Family, or 64 for x86-64 */
- int model;
- u32 flags[NCAPINTS];
-};
-extern struct cpu_features cpu;
int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr);
int validate_cpu(void);

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index dcd90df..3312f1b 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -27,7 +27,7 @@ HOST_EXTRACFLAGS += -I$(srctree)/tools/include

VMLINUX_OBJS = $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
$(obj)/string.o $(obj)/cmdline.o $(obj)/early_serial_console.o \
- $(obj)/piggy.o
+ $(obj)/piggy.o $(obj)/cpuflags.o

$(obj)/eboot.o: KBUILD_CFLAGS += -fshort-wchar -mno-red-zone

diff --git a/arch/x86/boot/compressed/cpuflags.c b/arch/x86/boot/compressed/cpuflags.c
new file mode 100644
index 0000000..931cba6
--- /dev/null
+++ b/arch/x86/boot/compressed/cpuflags.c
@@ -0,0 +1,12 @@
+#ifdef CONFIG_RANDOMIZE_BASE
+
+#include "../cpuflags.c"
+
+bool has_cpuflag(int flag)
+{
+ get_flags();
+
+ return test_bit(flag, cpu.flags);
+}
+
+#endif
diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c
index 4d3ff03..e1f3c16 100644
--- a/arch/x86/boot/cpucheck.c
+++ b/arch/x86/boot/cpucheck.c
@@ -28,8 +28,6 @@
#include <asm/required-features.h>
#include <asm/msr-index.h>

-struct cpu_features cpu;
-static u32 cpu_vendor[3];
static u32 err_flags[NCAPINTS];

static const int req_level = CONFIG_X86_MINIMUM_CPU_FAMILY;
@@ -69,90 +67,6 @@ static int is_transmeta(void)
cpu_vendor[2] == A32('M', 'x', '8', '6');
}

-static int has_fpu(void)
-{
- u16 fcw = -1, fsw = -1;
- u32 cr0;
-
- asm("movl %%cr0,%0" : "=r" (cr0));
- if (cr0 & (X86_CR0_EM|X86_CR0_TS)) {
- cr0 &= ~(X86_CR0_EM|X86_CR0_TS);
- asm volatile("movl %0,%%cr0" : : "r" (cr0));
- }
-
- asm volatile("fninit ; fnstsw %0 ; fnstcw %1"
- : "+m" (fsw), "+m" (fcw));
-
- return fsw == 0 && (fcw & 0x103f) == 0x003f;
-}
-
-static int has_eflag(u32 mask)
-{
- u32 f0, f1;
-
- asm("pushfl ; "
- "pushfl ; "
- "popl %0 ; "
- "movl %0,%1 ; "
- "xorl %2,%1 ; "
- "pushl %1 ; "
- "popfl ; "
- "pushfl ; "
- "popl %1 ; "
- "popfl"
- : "=&r" (f0), "=&r" (f1)
- : "ri" (mask));
-
- return !!((f0^f1) & mask);
-}
-
-static void get_flags(void)
-{
- u32 max_intel_level, max_amd_level;
- u32 tfms;
-
- if (has_fpu())
- set_bit(X86_FEATURE_FPU, cpu.flags);
-
- if (has_eflag(X86_EFLAGS_ID)) {
- asm("cpuid"
- : "=a" (max_intel_level),
- "=b" (cpu_vendor[0]),
- "=d" (cpu_vendor[1]),
- "=c" (cpu_vendor[2])
- : "a" (0));
-
- if (max_intel_level >= 0x00000001 &&
- max_intel_level <= 0x0000ffff) {
- asm("cpuid"
- : "=a" (tfms),
- "=c" (cpu.flags[4]),
- "=d" (cpu.flags[0])
- : "a" (0x00000001)
- : "ebx");
- cpu.level = (tfms >> 8) & 15;
- cpu.model = (tfms >> 4) & 15;
- if (cpu.level >= 6)
- cpu.model += ((tfms >> 16) & 0xf) << 4;
- }
-
- asm("cpuid"
- : "=a" (max_amd_level)
- : "a" (0x80000000)
- : "ebx", "ecx", "edx");
-
- if (max_amd_level >= 0x80000001 &&
- max_amd_level <= 0x8000ffff) {
- u32 eax = 0x80000001;
- asm("cpuid"
- : "+a" (eax),
- "=c" (cpu.flags[6]),
- "=d" (cpu.flags[1])
- : : "ebx");
- }
- }
-}
-
/* Returns a bitmask of which words we have error bits in */
static int check_flags(void)
{
diff --git a/arch/x86/boot/cpuflags.c b/arch/x86/boot/cpuflags.c
new file mode 100644
index 0000000..b02544a
--- /dev/null
+++ b/arch/x86/boot/cpuflags.c
@@ -0,0 +1,104 @@
+#include <linux/types.h>
+#include "bitops.h"
+
+#include <asm/processor-flags.h>
+#include <asm/required-features.h>
+#include <asm/msr-index.h>
+#include "cpuflags.h"
+
+struct cpu_features cpu;
+u32 cpu_vendor[3];
+
+static bool loaded_flags;
+
+static int has_fpu(void)
+{
+ u16 fcw = -1, fsw = -1;
+ unsigned long cr0;
+
+ asm volatile("mov %%cr0,%0" : "=r" (cr0));
+ if (cr0 & (X86_CR0_EM|X86_CR0_TS)) {
+ cr0 &= ~(X86_CR0_EM|X86_CR0_TS);
+ asm volatile("mov %0,%%cr0" : : "r" (cr0));
+ }
+
+ asm volatile("fninit ; fnstsw %0 ; fnstcw %1"
+ : "+m" (fsw), "+m" (fcw));
+
+ return fsw == 0 && (fcw & 0x103f) == 0x003f;
+}
+
+int has_eflag(unsigned long mask)
+{
+ unsigned long f0, f1;
+
+ asm volatile("pushf \n\t"
+ "pushf \n\t"
+ "pop %0 \n\t"
+ "mov %0,%1 \n\t"
+ "xor %2,%1 \n\t"
+ "push %1 \n\t"
+ "popf \n\t"
+ "pushf \n\t"
+ "pop %1 \n\t"
+ "popf"
+ : "=&r" (f0), "=&r" (f1)
+ : "ri" (mask));
+
+ return !!((f0^f1) & mask);
+}
+
+/* Handle x86_32 PIC using ebx. */
+#if defined(__i386__) && defined(__PIC__)
+# define EBX_REG "=r"
+#else
+# define EBX_REG "=b"
+#endif
+
+static inline void cpuid(u32 id, u32 *a, u32 *b, u32 *c, u32 *d)
+{
+ asm volatile(".ifnc %%ebx,%3 ; movl %%ebx,%3 ; .endif \n\t"
+ "cpuid \n\t"
+ ".ifnc %%ebx,%3 ; xchgl %%ebx,%3 ; .endif \n\t"
+ : "=a" (*a), "=c" (*c), "=d" (*d), EBX_REG (*b)
+ : "a" (id)
+ );
+}
+
+void get_flags(void)
+{
+ u32 max_intel_level, max_amd_level;
+ u32 tfms;
+ u32 ignored;
+
+ if (loaded_flags)
+ return;
+ loaded_flags = true;
+
+ if (has_fpu())
+ set_bit(X86_FEATURE_FPU, cpu.flags);
+
+ if (has_eflag(X86_EFLAGS_ID)) {
+ cpuid(0x0, &max_intel_level, &cpu_vendor[0], &cpu_vendor[2],
+ &cpu_vendor[1]);
+
+ if (max_intel_level >= 0x00000001 &&
+ max_intel_level <= 0x0000ffff) {
+ cpuid(0x1, &tfms, &ignored, &cpu.flags[4],
+ &cpu.flags[0]);
+ cpu.level = (tfms >> 8) & 15;
+ cpu.model = (tfms >> 4) & 15;
+ if (cpu.level >= 6)
+ cpu.model += ((tfms >> 16) & 0xf) << 4;
+ }
+
+ cpuid(0x80000000, &max_amd_level, &ignored, &ignored,
+ &ignored);
+
+ if (max_amd_level >= 0x80000001 &&
+ max_amd_level <= 0x8000ffff) {
+ cpuid(0x80000001, &ignored, &ignored, &cpu.flags[6],
+ &cpu.flags[1]);
+ }
+ }
+}
diff --git a/arch/x86/boot/cpuflags.h b/arch/x86/boot/cpuflags.h
new file mode 100644
index 0000000..9bb4e25
--- /dev/null
+++ b/arch/x86/boot/cpuflags.h
@@ -0,0 +1,19 @@
+#ifndef BOOT_CPUFLAGS_H
+#define BOOT_CPUFLAGS_H
+
+#include <asm/cpufeature.h>
+#include <asm/processor-flags.h>
+
+struct cpu_features {
+ int level; /* Family, or 64 for x86-64 */
+ int model;
+ u32 flags[NCAPINTS];
+};
+
+extern struct cpu_features cpu;
+extern u32 cpu_vendor[3];
+
+int has_eflag(unsigned long mask);
+void get_flags(void);
+
+#endif
--
1.7.9.5

2013-10-01 19:39:01

by Kees Cook

Subject: [PATCH 4/7] x86, kaslr: select random base offset

Select a random location when CONFIG_RANDOMIZE_BASE is used, bounded
by CONFIG_RANDOMIZE_BASE_MAX_OFFSET. Sources of randomness currently
include RDRAND, RDTSC, or the i8254.

Signed-off-by: Kees Cook <[email protected]>
---
arch/x86/boot/compressed/aslr.c | 83 ++++++++++++++++++++++++++++++++++++++-
arch/x86/boot/compressed/misc.h | 2 +
2 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index ed7e9f0..2f63f9f 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -1,6 +1,72 @@
#include "misc.h"

#ifdef CONFIG_RANDOMIZE_BASE
+#include <asm/msr.h>
+
+#include <asm/archrandom.h>
+static inline int rdrand(unsigned long *v)
+{
+ int ok;
+ asm volatile("1: " RDRAND_LONG "\n\t"
+ "jc 2f\n\t"
+ "decl %0\n\t"
+ "jnz 1b\n\t"
+ "2:"
+ : "=r" (ok), "=a" (*v)
+ : "0" (RDRAND_RETRY_LOOPS));
+ return ok;
+}
+
+#define I8254_PORT_CONTROL 0x43
+#define I8254_PORT_COUNTER0 0x40
+#define I8254_CMD_READBACK 0xC0
+#define I8254_SELECT_COUNTER0 0x02
+#define I8254_STATUS_NOTREADY 0x40
+static inline u16 i8254(void)
+{
+ u16 status, timer;
+
+ do {
+ outb(I8254_PORT_CONTROL,
+ I8254_CMD_READBACK | I8254_SELECT_COUNTER0);
+ status = inb(I8254_PORT_COUNTER0);
+ timer = inb(I8254_PORT_COUNTER0);
+ timer |= inb(I8254_PORT_COUNTER0) << 8;
+ } while (status & I8254_STATUS_NOTREADY);
+
+ return timer;
+}
+
+static unsigned long get_random_long(void)
+{
+ unsigned long random;
+
+ if (has_cpuflag(X86_FEATURE_RDRAND)) {
+ debug_putstr("KASLR using RDRAND...\n");
+ if (rdrand(&random))
+ return random;
+ }
+
+ if (has_cpuflag(X86_FEATURE_TSC)) {
+ uint32_t raw;
+
+ debug_putstr("KASLR using RDTSC...\n");
+ rdtscl(raw);
+
+ /* Only use the low bits of rdtsc. */
+ random = raw & 0xffff;
+ } else {
+ debug_putstr("KASLR using i8254...\n");
+ random = i8254();
+ }
+
+ /* Extend timer bits poorly... */
+ random |= (random << 16);
+#ifdef CONFIG_X86_64
+ random |= (random << 32) | (random << 48);
+#endif
+ return random;
+}

static unsigned long find_minimum_location(unsigned long input,
unsigned long input_size,
@@ -55,6 +121,7 @@ unsigned char *choose_kernel_location(unsigned char *input,
unsigned long output_size)
{
unsigned long choice = (unsigned long)output;
+ unsigned long random;

if (cmdline_find_option_bool("nokaslr")) {
debug_putstr("KASLR disabled...\n");
@@ -64,8 +131,22 @@ unsigned char *choose_kernel_location(unsigned char *input,
choice = find_minimum_location((unsigned long)input, input_size,
(unsigned long)output, output_size);

- /* XXX: choose random location. */
+ /* XXX: Find an appropriate E820 hole, instead of adding offset. */
+ random = get_random_long();
+
+ /* Clip off top of the range. */
+ random &= (CONFIG_RANDOMIZE_BASE_MAX_OFFSET - 1);
+ while (random + output_size > CONFIG_RANDOMIZE_BASE_MAX_OFFSET)
+ random >>= 1;
+
+ /* Clip off bottom of range. */
+ random &= ~(choice - 1);
+
+ /* Always enforce the minimum. */
+ if (random < choice)
+ goto out;

+ choice = random;
out:
return (unsigned char *)choice;
}
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 42f71bb..24e3e56 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -60,6 +60,8 @@ unsigned char *choose_kernel_location(unsigned char *input,
unsigned long input_size,
unsigned char *output,
unsigned long output_size);
+/* cpuflags.c */
+bool has_cpuflag(int flag);
#else
static inline
unsigned char *choose_kernel_location(unsigned char *input,
--
1.7.9.5

2013-10-01 19:38:53

by Kees Cook

Subject: [PATCH 5/7] x86, kaslr: select memory region from e820 maps

Counts the available aligned slots across all e820 entries, and chooses
one randomly for the new kernel base address.

Signed-off-by: Kees Cook <[email protected]>
---
arch/x86/boot/compressed/aslr.c | 140 ++++++++++++++++++++++++++++++++++++---
1 file changed, 130 insertions(+), 10 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 2f63f9f..ab0f8a3 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -2,6 +2,7 @@

#ifdef CONFIG_RANDOMIZE_BASE
#include <asm/msr.h>
+#include <asm/e820.h>

#include <asm/archrandom.h>
static inline int rdrand(unsigned long *v)
@@ -115,6 +116,130 @@ static unsigned long find_minimum_location(unsigned long input,
return output;
}

+/*
+ * This routine is used to count how many aligned slots that can hold the
+ * kernel exist across all e820 entries. It is called in two phases, once
+ * to count valid memory regions available for the kernel image, and a
+ * second time to select one from those seen.
+ *
+ * It is first called with "counting" set to true, where it expects to be
+ * called once for each e820 entry. In this mode, it will update *count
+ * with how many slots are available in the given e820 entry. Once the walk
+ * across all e820 entries has finished, the caller will have a total count
+ * of all valid memory regions available for the kernel image.
+ *
+ * Once the first pass of entry walking is finished, the caller selects one
+ * of the possible slots (stored in *count), and performs a second walk,
+ * with "counting" set to false. In this mode, *count is decremented until
+ * the corresponding slot is found in a specific e820 region, at which
+ * point, the function returns that address, and the walk terminates.
+ */
+static unsigned long process_e820_entry(struct e820entry *entry, bool counting,
+ unsigned long minimum,
+ unsigned long image_size,
+ unsigned long *count)
+{
+ u64 addr, size;
+ unsigned long alignments;
+
+ /* Skip non-RAM entries. */
+ if (entry->type != E820_RAM)
+ return 0;
+
+ /* Ignore entries entirely above our maximum. */
+ if (entry->addr >= CONFIG_RANDOMIZE_BASE_MAX_OFFSET)
+ return 0;
+
+ /* Ignore entries entirely below our minimum. */
+ if (entry->addr + entry->size < minimum)
+ return 0;
+
+ size = entry->size;
+ addr = entry->addr;
+
+ /* Potentially raise address to minimum location. */
+ if (addr < minimum)
+ addr = minimum;
+
+ /* Potentially raise address to meet alignment requirements. */
+ addr = ALIGN(addr, CONFIG_PHYSICAL_ALIGN);
+
+ /* Did we raise the address above the bounds of this e820 region? */
+ if (addr > entry->addr + entry->size)
+ return 0;
+
+ /* Reduce size by any delta from the original address. */
+ size -= addr - entry->addr;
+
+ /* Reduce maximum image starting location to maximum limit. */
+ if (addr + size > CONFIG_RANDOMIZE_BASE_MAX_OFFSET)
+ size = CONFIG_RANDOMIZE_BASE_MAX_OFFSET - addr;
+
+ /* Ignore entries that cannot hold even a single kernel image. */
+ if (size < image_size)
+ return 0;
+
+ /*
+ * Reduce size by kernel image size so we can see how many aligned
+ * starting addresses will fit without running past the end of a
+ * region. XXX: adjacent e820 regions are not detected, so up to
+ * image_size / CONFIG_PHYSICAL_ALIGN slots may go unused across
+ * adjacent regions.
+ */
+ size -= image_size;
+
+ /* Now we know how many aligned slots can contain the image. */
+ alignments = (size / CONFIG_PHYSICAL_ALIGN) + 1;
+
+ /* In the first pass, just counting all the e820 entries? */
+ if (counting) {
+ *count += alignments;
+ return 0;
+ }
+
+ /* Otherwise we're counting down to find a specific aligned slot. */
+ if (*count < alignments) {
+ /* Desired region is in this entry. */
+ return addr + (CONFIG_PHYSICAL_ALIGN * *count);
+ } else {
+ /* Desired region is beyond this entry. */
+ *count -= alignments;
+ return 0;
+ }
+}
+
+static unsigned long find_random_e820(unsigned long minimum,
+ unsigned long size)
+{
+ int i;
+ unsigned long addr, count;
+
+ /* Make sure minimum is aligned. */
+ minimum = ALIGN(minimum, CONFIG_PHYSICAL_ALIGN);
+
+ /* Verify potential e820 positions. */
+ count = 0;
+ for (i = 0; i < real_mode->e820_entries; i++) {
+ process_e820_entry(&real_mode->e820_map[i], true, minimum,
+ size, &count);
+ }
+
+ /* Handle crazy case of nothing fitting. */
+ if (count == 0)
+ return 0;
+
+ count = get_random_long() % count;
+
+ /* Select desired e820 position. */
+ for (i = 0; i < real_mode->e820_entries; i++) {
+ addr = process_e820_entry(&real_mode->e820_map[i], false,
+ minimum, size, &count);
+ if (addr)
+ return addr;
+ }
+ return 0;
+}
+
unsigned char *choose_kernel_location(unsigned char *input,
unsigned long input_size,
unsigned char *output,
@@ -131,16 +256,11 @@ unsigned char *choose_kernel_location(unsigned char *input,
choice = find_minimum_location((unsigned long)input, input_size,
(unsigned long)output, output_size);

- /* XXX: Find an appropriate E820 hole, instead of adding offset. */
- random = get_random_long();
-
- /* Clip off top of the range. */
- random &= (CONFIG_RANDOMIZE_BASE_MAX_OFFSET - 1);
- while (random + output_size > CONFIG_RANDOMIZE_BASE_MAX_OFFSET)
- random >>= 1;
-
- /* Clip off bottom of range. */
- random &= ~(choice - 1);
+ random = find_random_e820(choice, output_size);
+ if (!random) {
+ debug_putstr("KASLR could not find suitable E820 region...\n");
+ goto out;
+ }

/* Always enforce the minimum. */
if (random < choice)
--
1.7.9.5

2013-10-01 19:38:51

by Kees Cook

[permalink] [raw]
Subject: [PATCH 7/7] x86, kaslr: raise max positions to 1GiB on x86_64

On 64-bit, this raises the maximum location to 1GiB, the current upper
limit, since the kernel fixmap page mappings would need to be moved to
make use of the other 1GiB (which would be the theoretical limit when
building with -mcmodel=kernel).

Signed-off-by: Kees Cook <[email protected]>
---
arch/x86/Kconfig | 16 +++++++++++++---
arch/x86/include/asm/page_64_types.h | 15 ++++++++++++---
arch/x86/include/asm/pgtable_64_types.h | 2 +-
arch/x86/mm/init_32.c | 3 +++
4 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 992701d..51f4399 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1746,13 +1746,23 @@ config RANDOMIZE_BASE
config RANDOMIZE_BASE_MAX_OFFSET
hex "Maximum ASLR offset allowed"
depends on RANDOMIZE_BASE
- default "0x10000000"
- range 0x0 0x10000000
+ range 0x0 0x20000000 if X86_32
+ default "0x20000000" if X86_32
+ range 0x0 0x40000000 if X86_64
+ default "0x40000000" if X86_64
---help---
Determines the maximal offset in bytes that will be applied to the
kernel when Address Space Layout Randomization (ASLR) is active.
Must be less than or equal to the actual physical memory on the
- system. This must be a power of two.
+ system. This must be a multiple of CONFIG_PHYSICAL_ALIGN.
+
+ On 32-bit this is limited to 512MiB.
+
+ On 64-bit this is limited by how the kernel fixmap page table is
+ positioned, so this cannot be larger than 1GiB currently. Normally
+ there is a 512MiB to 1.5GiB split between kernel and modules. When
+ this is raised above the 512MiB default, the modules area will
+ shrink to compensate, up to the current maximum 1GiB to 1GiB split.

# Relocation on x86 needs some additional build support
config X86_NEED_RELOCS
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 43dcd80..8de6d9c 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -39,9 +39,18 @@
#define __VIRTUAL_MASK_SHIFT 47

/*
- * Kernel image size is limited to 512 MB (see level2_kernel_pgt in
- * arch/x86/kernel/head_64.S), and it is mapped here:
+ * Kernel image size is limited to 1GiB due to the fixmap living in the
+ * next 1GiB (see level2_kernel_pgt in arch/x86/kernel/head_64.S). Use
+ * 512MiB by default, leaving 1.5GiB for modules once the page tables
+ * are fully set up. If kernel ASLR is configured, it can extend the
+ * kernel page table mapping, reducing the size of the modules area.
*/
-#define KERNEL_IMAGE_SIZE (512 * 1024 * 1024)
+#define KERNEL_IMAGE_SIZE_DEFAULT (512 * 1024 * 1024)
+#if defined(CONFIG_RANDOMIZE_BASE) && \
+ CONFIG_RANDOMIZE_BASE_MAX_OFFSET > KERNEL_IMAGE_SIZE_DEFAULT
+#define KERNEL_IMAGE_SIZE CONFIG_RANDOMIZE_BASE_MAX_OFFSET
+#else
+#define KERNEL_IMAGE_SIZE KERNEL_IMAGE_SIZE_DEFAULT
+#endif

#endif /* _ASM_X86_PAGE_64_DEFS_H */
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 2d88344..c883bf7 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -58,7 +58,7 @@ typedef struct { pteval_t pte; } pte_t;
#define VMALLOC_START _AC(0xffffc90000000000, UL)
#define VMALLOC_END _AC(0xffffe8ffffffffff, UL)
#define VMEMMAP_START _AC(0xffffea0000000000, UL)
-#define MODULES_VADDR _AC(0xffffffffa0000000, UL)
+#define MODULES_VADDR (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
#define MODULES_END _AC(0xffffffffff000000, UL)
#define MODULES_LEN (MODULES_END - MODULES_VADDR)

diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 4287f1f..5bdc543 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -806,6 +806,9 @@ void __init mem_init(void)
BUILD_BUG_ON(VMALLOC_START >= VMALLOC_END);
#undef high_memory
#undef __FIXADDR_TOP
+#ifdef CONFIG_RANDOMIZE_BASE
+ BUILD_BUG_ON(CONFIG_RANDOMIZE_BASE_MAX_OFFSET > KERNEL_IMAGE_SIZE);
+#endif

#ifdef CONFIG_HIGHMEM
BUG_ON(PKMAP_BASE + LAST_PKMAP*PAGE_SIZE > FIXADDR_START);
--
1.7.9.5

2013-10-01 20:47:29

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 4/7] x86, kaslr: select random base offset

On 10/01/2013 12:37 PM, Kees Cook wrote:
> +
> +#include <asm/archrandom.h>
> +static inline int rdrand(unsigned long *v)
> +{
> + int ok;
> + asm volatile("1: " RDRAND_LONG "\n\t"
> + "jc 2f\n\t"
> + "decl %0\n\t"
> + "jnz 1b\n\t"
> + "2:"
> + : "=r" (ok), "=a" (*v)
> + : "0" (RDRAND_RETRY_LOOPS));
> + return ok;
> +}
> +

This looks just like rdrand_long() in arch/x86/kernel/cpu/rdrand.c and
could move into the header file, no?

-hpa

2013-10-01 20:49:32

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 1/7] x86, kaslr: move CPU flags out of cpucheck

On 10/01/2013 12:37 PM, Kees Cook wrote:
> Refactor the CPU flags handling out of the cpucheck routines so that
> they can be reused by the future ASLR routines (in order to detect CPU
> features like RDRAND and RDTSC).
>
> This reworks has_eflag() and has_fpu() to be used on both 32-bit and
> 64-bit, and refactors the calls to cpuid to make them PIC-safe on 32-bit.
>
> Signed-off-by: Kees Cook <[email protected]>

Please flag the ones that specifically touch the boot code so that it is
clear. Neither the title nor the description makes that at all clear,
and at first reading it is fairly confusing as a result.

-hpa

2013-10-01 21:09:16

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 1/7] x86, kaslr: move CPU flags out of cpucheck

On Tue, Oct 1, 2013 at 1:48 PM, H. Peter Anvin <[email protected]> wrote:
> On 10/01/2013 12:37 PM, Kees Cook wrote:
>> Refactor the CPU flags handling out of the cpucheck routines so that
>> they can be reused by the future ASLR routines (in order to detect CPU
>> features like RDRAND and RDTSC).
>>
>> This reworks has_eflag() and has_fpu() to be used on both 32-bit and
>> 64-bit, and refactors the calls to cpuid to make them PIC-safe on 32-bit.
>>
>> Signed-off-by: Kees Cook <[email protected]>
>
> Please flag the ones that specifically touch the boot code so that it is
> clear. Neither the title nor the description makes that at all clear,
> and at first reading it is fairly confusing as a result.

Yes, good point. Patch 1/7 should be named "x86, boot: ..."

Thanks!

-Kees

--
Kees Cook
Chrome OS Security

2013-10-01 21:18:25

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 4/7] x86, kaslr: select random base offset

On Tue, Oct 1, 2013 at 1:46 PM, H. Peter Anvin <[email protected]> wrote:
> On 10/01/2013 12:37 PM, Kees Cook wrote:
>> +
>> +#include <asm/archrandom.h>
>> +static inline int rdrand(unsigned long *v)
>> +{
>> + int ok;
>> + asm volatile("1: " RDRAND_LONG "\n\t"
>> + "jc 2f\n\t"
>> + "decl %0\n\t"
>> + "jnz 1b\n\t"
>> + "2:"
>> + : "=r" (ok), "=a" (*v)
>> + : "0" (RDRAND_RETRY_LOOPS));
>> + return ok;
>> +}
>> +
>
> This looks just like rdrand_long() in arch/x86/kernel/cpu/rdrand.c and
> could move into the header file, no?

Yes, good idea. I'll move it into archrandom.h instead of this copy/paste.

-Kees

--
Kees Cook
Chrome OS Security

2013-10-02 00:39:15

by Hatayama, Daisuke

[permalink] [raw]
Subject: Re: [PATCH 6/7] x86, kaslr: report kernel offset on panic

(2013/10/02 4:37), Kees Cook wrote:
> When the system panics, include the kernel offset in the report to assist
> in debugging.
>
> Signed-off-by: Kees Cook <[email protected]>
> ---
> arch/x86/kernel/setup.c | 26 ++++++++++++++++++++++++++
> 1 file changed, 26 insertions(+)
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index f0de629..1708862 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -824,6 +824,20 @@ static void __init trim_low_memory_range(void)
> }
>
> /*
> + * Dump out kernel offset information on panic.
> + */
> +static int
> +dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
> +{
> + pr_emerg("Kernel Offset: 0x%lx from 0x%lx "
> + "(relocation range: 0x%lx-0x%lx)\n",
> + (unsigned long)&_text - __START_KERNEL, __START_KERNEL,
> + __START_KERNEL_map, MODULES_VADDR-1);

Using phys_base seems better.

> +
> + return 0;
> +}
> +
> +/*
> * Determine if we were loaded by an EFI loader. If so, then we have also been
> * passed the efi memmap, systab, etc., so we should use these data structures
> * for initialization. Note, the efi init code path is determined by the
> @@ -1242,3 +1256,15 @@ void __init i386_reserve_resources(void)
> }
>
> #endif /* CONFIG_X86_32 */
> +
> +static struct notifier_block kernel_offset_notifier = {
> + .notifier_call = dump_kernel_offset
> +};
> +
> +static int __init register_kernel_offset_dumper(void)
> +{
> + atomic_notifier_chain_register(&panic_notifier_list,
> + &kernel_offset_notifier);
> + return 0;
> +}
> +__initcall(register_kernel_offset_dumper);
>

The panic notifier is not executed if kdump is enabled. Maybe Chrome OS doesn't use
kdump? Anyway, kdump-related tools now calculate phys_base from the memory map
information passed as ELF PT_LOAD entries, like below.

$ LANG=C readelf -l vmcore-rhel6up4

Elf file type is CORE (Core file)
Entry point 0x0
There are 5 program headers, starting at offset 64

Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
NOTE 0x0000000000000158 0x0000000000000000 0x0000000000000000
0x0000000000000b08 0x0000000000000b08 0
LOAD 0x0000000000000c60 0xffffffff81000000 0x0000000001000000
0x000000000103b000 0x000000000103b000 RWE 0
LOAD 0x000000000103bc60 0xffff880000001000 0x0000000000001000
0x000000000009cc00 0x000000000009cc00 RWE 0
LOAD 0x00000000010d8860 0xffff880000100000 0x0000000000100000
0x0000000002f00000 0x0000000002f00000 RWE 0
LOAD 0x0000000003fd8860 0xffff880013000000 0x0000000013000000
0x000000002cffd000 0x000000002cffd000 RWE 0

Each PT_LOAD entry is assigned a virtual and a physical address. In this case,
the 1st PT_LOAD entry belongs to the kernel text mapping region, from which we can
calculate the phys_base value.

Therefore, we already have phys_base information even in the kdump case, and
as long as we use kdump-related tools such as the crash utility, we don't need
to examine the ELF PT_LOAD headers manually as I explained here, because the
tools perform the calculation I explained here automatically.

Another idea is to add the phys_base value to the VMCOREINFO information, which is
debugging information for user-land tools to filter crash dumps. This is simple string
data, so you can see the values contained in a crash dump by running the strings command on it.
For example,

$ LANG=C strings vmcore-rhel6up4
CORE
CORE
CORE
CORE
VMCOREINFO
OSRELEASE=2.6.32-345.el6.x86_64
PAGESIZE=4096
SYMBOL(init_uts_ns)=ffffffff81a8e940
SYMBOL(node_online_map)=ffffffff81c09e20
SYMBOL(swapper_pg_dir)=ffffffff81a85000
...cut...
NUMBER(PG_private)=11
NUMBER(PG_swapcache)=16
SYMBOL(phys_base)=ffffffff81a8d010
SYMBOL(init_level4_pgt)=ffffffff81a85000
SYMBOL(node_data)=ffffffff81c06280
LENGTH(node_data)=512
CRASHTIME=1355815389

The problem is that currently the phys_base information is exported as the virtual address
assigned to the phys_base symbol, not its value. User-land tools use this to determine
whether they need to calculate phys_base. But obviously, it's simpler to export the phys_base
information as its value, not the assigned address. Then, as a side-effect, we could check the
phys_base value by running the strings command on a crash dump, which is a little handy.

--
Thanks.
HATAYAMA, Daisuke

2013-10-02 01:07:51

by Hatayama, Daisuke

[permalink] [raw]
Subject: Re: [PATCH 6/7] x86, kaslr: report kernel offset on panic

(2013/10/02 9:38), HATAYAMA Daisuke wrote:
> (2013/10/02 4:37), Kees Cook wrote:
<cut>
>> @@ -1242,3 +1256,15 @@ void __init i386_reserve_resources(void)
>> }
>>
>> #endif /* CONFIG_X86_32 */
>> +
>> +static struct notifier_block kernel_offset_notifier = {
>> + .notifier_call = dump_kernel_offset
>> +};
>> +
>> +static int __init register_kernel_offset_dumper(void)
>> +{
>> + atomic_notifier_chain_register(&panic_notifier_list,
>> + &kernel_offset_notifier);
>> + return 0;
>> +}
>> +__initcall(register_kernel_offset_dumper);
>>
>
> Panic notifier is not executed if kdump is enabled. Maybe, Chrome OS doesn't use
> kdump? Anyway, kdump related tools now calculate phys_base from memory map
> information passed as ELF PT_LOAD entries like below.

Another simpler way is to print this information at boot time, not at panic.

--
Thanks.
HATAYAMA, Daisuke

2013-10-02 05:07:23

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v6 0/7] Kernel base address randomization


* Kees Cook <[email protected]> wrote:

> Here is the latest version of the kASLR series. It has much improved
> e820 walking code, and expands the window available on 64-bit.
>
> This is rolled out on Chrome OS devices, and working well.

There's one kernel debuggability detail that should be discussed I think:
should symbolic printouts (in oops messages but also in /proc/kallsyms)
and instrumentation interfaces that expose kernel addresses attempt to
de-randomize the addresses, stack contents and register values that lie
within the random range?

- it would be easier to use those addresses and look them up in a vmlinux
or in a System.map as well.

- it would be somewhat safer to post an oops publicly if it did not
contain the random offset in an easily identifiable way.

- oops patterns from distribution kernels that enable randomization would
match up better.

- this would make it safer to expose /proc/kallsyms to user-space
profiling, while keeping the random offset a kernel-internal secret.

- RIP information in profiling streams would thus not contain the
kernel random offset either.

The other approach would be what your series does, to keep all the raw,
randomized output and to assume that users who are allowed access to
logs or profiling can learn the random offset.

I tend to lean towards the 'raw' approach that you picked, but an argument
can be made for both approaches - and in any case I haven't seen this
discussed to conclusion with cons/pros listed and a consensus/decision
reached.

Thanks,

Ingo

2013-10-02 05:13:23

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH v6 0/7] Kernel base address randomization

I think that the randomization offset would be necessary in order to identify pointers.

Ingo Molnar <[email protected]> wrote:
>
> * Kees Cook <[email protected]> wrote:
>
>> Here is the latest version of the kASLR series. It has much improved
>> e820 walking code, and expands the window available on 64-bit.
>>
>> This is rolled out on Chrome OS devices, and working well.
>
> There's one kernel debuggability detail that should be discussed I think:
> should symbolic printouts (in oops messages but also in /proc/kallsyms)
> and instrumentation interfaces that expose kernel addresses attempt to
> de-randomize the addresses, stack contents and register values that lie
> within the random range?
>
> - it would be easier to use those addresses and look them up in a vmlinux
>   or in a System.map as well.
>
> - it would be somewhat safer to post an oops publicly if it did not
>   contain the random offset in an easily identifiable way.
>
> - oops patterns from distribution kernels that enable randomization would
>   match up better.
>
> - this would make it safer to expose /proc/kallsyms to user-space
>   profiling, while keeping the random offset a kernel-internal secret.
>
> - RIP information in profiling streams would thus not contain the
>   kernel random offset either.
>
> The other approach would be what your series does, to keep all the raw,
> randomized output and to assume that users who are allowed to access to
> logs or profiling can learn the random offset.
>
> I tend to lean towards the 'raw' approach that you picked, but an argument
> can be made for both approaches - and in any case I haven't seen this
> discussed to conclusion with cons/pros listed and a consensus/decision
> reached.
>
> Thanks,
>
> 	Ingo

--
Sent from my mobile phone. Please pardon brevity and lack of formatting.

2013-10-02 05:25:40

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v6 0/7] Kernel base address randomization


* H. Peter Anvin <[email protected]> wrote:

> I think that the randomization offset would be necessary in order to
> identify pointers.

I mean, for example in an oops message we print data in words: the RIP,
other registers and stack contents. If any of these values lies within the
randomization range then we could de-randomize it.

So instead of exposing randomized values, we could expose de-randomized
values.

( This isn't fool-proof: if some data value happens to lie within the
random range spuriously then we'll incorrectly transform it. In the
context of oops messages this should not be a big practical problem
though. )

For example, assume that the following oops is from a distro kernel and
contains raw randomized addresses:

[ 0.000000] ------------[ cut here ]------------
[ 0.000000] WARNING: CPU: 0 PID: 0 at init/main.c:544 start_kernel+0x1fa/0x3e8()
[ 0.000000] PANIC: double fault, error_code: 0xffffffff810e74ed
[ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.12.0-rc1-01728-gd787a30-dirty #54
[ 0.000000] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 09/04/2008
[ 0.000000] task: ffffffff81e104a0 ti: ffffffff81e00000 task.ti: ffffffff81e00000
[ 0.000000] RIP: 0246:[<0000000000000010>] [<0000000000000010>] 0xf
[ 0.000000] RSP: 0000:0000000000000000 EFLAGS: ffffffff81e01de8
[ 0.000000] RAX: 0000000005330533 RBX: 0000000000000001 RCX: 000000000000023f
[ 0.000000] RDX: 0000000000000533 RSI: 0000000000000046 RDI: ffffffff82099944
[ 0.000000] RBP: ffffffff81e01e68 R08: 000000004e524157 R09: 00000000000000ca
[ 0.000000] R10: 3a4449502030203a R11: 555043203a474e49 R12: 00000000ffffffff
[ 0.000000] R13: 0000000000000000 R14: 0000000000000044 R15: 0000000000000006
[ 0.000000] FS: 0000000000000000(0000) GS:ffff8801b9c00000(0000) knlGS:0000000000000000
[ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 0.000000] CR2: ffff88033ffff000 CR3: 0000000001e0b000 CR4: 00000000000006b0
[ 0.000000]
[ 0.000000] Kernel panic - not syncing: Machine halted.
[ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.12.0-rc1-01728-gd787a30-dirty #54
[ 0.000000] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 09/04/2008
[ 0.000000] ffff8801b9c06f60 ffff8801b9c06e80 ffffffff81822bd3 0000000000000586
[ 0.000000] ffffffff81c8e10b ffff8801b9c06f00 ffffffff8181efdb ffffffff81e00000
[ 0.000000] 0000000000000008 ffff8801b9c06f10 ffff8801b9c06eb0 6230653130303030
[ 0.000000] Call Trace:
[ 0.000000] <#DF> [<ffffffff81822bd3>] dump_stack+0x46/0x58
[ 0.000000] [<ffffffff8181efdb>] panic+0xb6/0x1bf
[ 0.000000] [<ffffffff81078980>] df_debug+0x30/0x30
[ 0.000000] [<ffffffff810e74ed>] ? vprintk_emit+0x1ed/0x4e0
[ 0.000000] [<ffffffff810e74ed>] ? vprintk_emit+0x1ed/0x4e0
[ 0.000000] [<ffffffff81046471>] do_double_fault+0x61/0x90
[ 0.000000] [<ffffffff818324d2>] double_fault+0x22/0x30
[ 0.000000] <<EOE>> [<ffffffff813ba281>] ? vsnprintf+0x471/0x650
[ 0.000000] [<ffffffff81f28c9f>] ? start_kernel+0x1fa/0x3e8
[ 0.000000] [<ffffffff8181f505>] printk+0x5c/0x5e
[ 0.000000] [<ffffffff81f28c9f>] ? start_kernel+0x1fa/0x3e8
[ 0.000000] [<ffffffff810962ec>] warn_slowpath_common+0x6c/0xb0
[ 0.000000] [<ffffffff810963d1>] warn_slowpath_fmt+0x41/0x50
[ 0.000000] [<ffffffff81f28c9f>] start_kernel+0x1fa/0x3e8
[ 0.000000] [<ffffffff81f288a4>] ? repair_env_string+0x5e/0x5e
[ 0.000000] [<ffffffff81f285a5>] x86_64_start_reservations+0x2a/0x2c
[ 0.000000] [<ffffffff81f286a4>] x86_64_start_kernel+0xfd/0x101

Anyone who reads this oops can recover the random offset by knowing the
non-randomized value:

$ grep -w printk /boot/System.map-3.9.10-100.fc17.x86_64
ffffffff81651a64 T printk

and subtracting that from the value seen in the oops:

[ 0.000000] [<ffffffff8181f505>] printk+0x5c/0x5e

So my suggestion would be to check each printed out value in an oops
message and de-randomize it if it's within the randomized range. So the
above entry would be printed as the 'static' address:

[ 0.000000] [<ffffffff81651a64>] printk+0x5c/0x5e

Note how this entry does not expose the random offset anymore. It's also
easy to stick the raw value into 'gdb vmlinux' and use it for debugging.

Something similar can be done for profiling streams where we _know_ that
it's a kernel text address (the perf profiling trace entries for example),
and the same could be done for /proc/kallsyms.

The random range is a fairly narrow region of 64-bit address space so
spurious hits should be relatively rare.

Now, a 'raw' address might still lie in places like mixed up in registers
or lying non-word-aligned on the kernel stack - but at least the 'typical'
oops would be fairly safe to post and there would be no 'obvious' places
to recover the secret from, even if you happen to have access to some logs
and some profiling.

At least that's the argument that can be made. I'm not entirely sure it's
valid and I'm leaning towards the simplicity of only outputting raw oops
values.

Thanks,

Ingo

2013-10-02 05:32:05

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH v6 0/7] Kernel base address randomization

On 10/01/2013 10:25 PM, Ingo Molnar wrote:
>
> I mean, for example in an oops message we print data in words: the RIP,
> other registers and stack contents. If any of these values lies within the
> randomization range then we could de-randomize it.
>
> So instead of exposing randomized values, we could expose de-randomized
> values.
>
> ( This isn't fool-proof: if some data value happens to lie within the
> random range spuriously then we'll incorrectly transform it. In the
> context of oops messages this should not be a big practical problem
> though. )
>

I don't agree that this isn't a big practical problem. I often find it
necessary to pick out "things that look like pointers". Overall,
derandomization would make it possible to get really confused when you
have things like half a pointer overwritten.

-hpa

2013-10-02 05:36:35

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v6 0/7] Kernel base address randomization

On Tue, Oct 1, 2013 at 10:30 PM, H. Peter Anvin <[email protected]> wrote:
> On 10/01/2013 10:25 PM, Ingo Molnar wrote:
>>
>> I mean, for example in an oops message we print data in words: the RIP,
>> other registers and stack contents. If any of these values lies within the
>> randomization range then we could de-randomize it.
>>
>> So instead of exposing randomized values, we could expose de-randomized
>> values.
>>
>> ( This isn't fool-proof: if some data value happens to lie within the
>> random range spuriously then we'll incorrectly transform it. In the
>> context of oops messages this should not be a big practical problem
>> though. )
>>
>
> I don't agree that this isn't a big practical problem. I often find it
> necessary to pick out "things that look like pointers". Overall,
> derandomization would make it possible to get really confused when you
> have things like half a pointer overwritten.

I think reflecting the reality of the system is the correct way to go.
Attempting to do the derandomization on the live system seems
extremely fragile. It's much cleaner to have a "true" view of the
running system and work from there. I don't want to have to wonder if
my kernel is lying to me about where things are in memory any more
than it already does. :)

-Kees

--
Kees Cook
Chrome OS Security

2013-10-02 07:48:58

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 6/7] x86, kaslr: report kernel offset on panic

On Tue, Oct 1, 2013 at 5:38 PM, HATAYAMA Daisuke
<[email protected]> wrote:
> (2013/10/02 4:37), Kees Cook wrote:
>> When the system panics, include the kernel offset in the report to assist
>> in debugging.
>>
>> Signed-off-by: Kees Cook <[email protected]>
>> ---
>> arch/x86/kernel/setup.c | 26 ++++++++++++++++++++++++++
>> 1 file changed, 26 insertions(+)
>>
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index f0de629..1708862 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -824,6 +824,20 @@ static void __init trim_low_memory_range(void)
>> }
>>
>> /*
>> + * Dump out kernel offset information on panic.
>> + */
>> +static int
>> +dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
>> +{
>> + pr_emerg("Kernel Offset: 0x%lx from 0x%lx "
>> + "(relocation range: 0x%lx-0x%lx)\n",
>> + (unsigned long)&_text - __START_KERNEL, __START_KERNEL,
>> + __START_KERNEL_map, MODULES_VADDR-1);
>
> Using phys_base seems better.

phys_base is not changed during relocation. For example, if I print
out phys_base during this pr_emerg call, it remains 0x0, even though
the random offset was 0xa200000.

>> +
>> + return 0;
>> +}
>> +
>> +/*
>> * Determine if we were loaded by an EFI loader. If so, then we have also been
>> * passed the efi memmap, systab, etc., so we should use these data structures
>> * for initialization. Note, the efi init code path is determined by the
>> @@ -1242,3 +1256,15 @@ void __init i386_reserve_resources(void)
>> }
>>
>> #endif /* CONFIG_X86_32 */
>> +
>> +static struct notifier_block kernel_offset_notifier = {
>> + .notifier_call = dump_kernel_offset
>> +};
>> +
>> +static int __init register_kernel_offset_dumper(void)
>> +{
>> + atomic_notifier_chain_register(&panic_notifier_list,
>> + &kernel_offset_notifier);
>> + return 0;
>> +}
>> +__initcall(register_kernel_offset_dumper);
>>
>
> Panic notifier is not executed if kdump is enabled. Maybe, Chrome OS doesn't use
> kdump? Anyway, kdump related tools now calculate phys_base from memory map
> information passed as ELF PT_LOAD entries like below.

Correct, we are not currently using kdump.

> $ LANG=C readelf -l vmcore-rhel6up4
>
> Elf file type is CORE (Core file)
> Entry point 0x0
> There are 5 program headers, starting at offset 64
>
> Program Headers:
> Type Offset VirtAddr PhysAddr
> FileSiz MemSiz Flags Align
> NOTE 0x0000000000000158 0x0000000000000000 0x0000000000000000
> 0x0000000000000b08 0x0000000000000b08 0
> LOAD 0x0000000000000c60 0xffffffff81000000 0x0000000001000000
> 0x000000000103b000 0x000000000103b000 RWE 0
> LOAD 0x000000000103bc60 0xffff880000001000 0x0000000000001000
> 0x000000000009cc00 0x000000000009cc00 RWE 0
> LOAD 0x00000000010d8860 0xffff880000100000 0x0000000000100000
> 0x0000000002f00000 0x0000000002f00000 RWE 0
> LOAD 0x0000000003fd8860 0xffff880013000000 0x0000000013000000
> 0x000000002cffd000 0x000000002cffd000 RWE 0
>
> Each PT_LOAD entry is assigned to virtual and physical address. In this case,
> 1st PT_LOAD entry belongs to kernel text mapping region, from which we can
> calculate phys_base value.

It seems like all the information you need would still be available?
The virtual address is there, so it should be trivial to see the
offset, IIUC.

> Therefore, we already have phys_base information even in case of kdump, and
> as long as using kdump-related tools such as crash utility, we don't need
> to see ELF PT_LOAD headers as I explain here because they calculate the process
> I explain here automatically.
>
> Another idea is to add phys_base value in VMCOREINFO information that is debugging
> information for user-land tools to filter crash dump. This is simple string information
> so you can see the values contained in some crash dump by using strings command to it.
> For example,

Sure, though not phys_base, but rather the offset? I think this is
similar to what coredumps of PIE binaries end up with, though I haven't
looked very closely at that in a while.

-Kees

--
Kees Cook
Chrome OS Security

2013-10-02 07:51:32

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 6/7] x86, kaslr: report kernel offset on panic

On Tue, Oct 1, 2013 at 6:06 PM, HATAYAMA Daisuke
<[email protected]> wrote:
> (2013/10/02 9:38), HATAYAMA Daisuke wrote:
>> (2013/10/02 4:37), Kees Cook wrote:
> <cut>
>>> @@ -1242,3 +1256,15 @@ void __init i386_reserve_resources(void)
>>> }
>>>
>>> #endif /* CONFIG_X86_32 */
>>> +
>>> +static struct notifier_block kernel_offset_notifier = {
>>> + .notifier_call = dump_kernel_offset
>>> +};
>>> +
>>> +static int __init register_kernel_offset_dumper(void)
>>> +{
>>> + atomic_notifier_chain_register(&panic_notifier_list,
>>> + &kernel_offset_notifier);
>>> + return 0;
>>> +}
>>> +__initcall(register_kernel_offset_dumper);
>>>
>>
>> Panic notifier is not executed if kdump is enabled. Maybe, Chrome OS doesn't use
>> kdump? Anyway, kdump related tools now calculate phys_base from memory map
>> information passed as ELF PT_LOAD entries like below.
>
> Another simpler way is to print this information at boot time, not at panic.

No, since a dump may happen temporally far enough away from boot time
that the offset would not be contained in the dmesg buffers any more.
The offset report needs to be part of the panic message, especially
for those using pstore (e.g. Chrome OS) for crash reporting (which
includes only the panic log contents).

-Kees

--
Kees Cook
Chrome OS Security

2013-10-02 09:14:50

by Hatayama, Daisuke

[permalink] [raw]
Subject: Re: [PATCH 6/7] x86, kaslr: report kernel offset on panic

(2013/10/02 16:48), Kees Cook wrote:
> On Tue, Oct 1, 2013 at 5:38 PM, HATAYAMA Daisuke
> <[email protected]> wrote:
>> (2013/10/02 4:37), Kees Cook wrote:
>>> When the system panics, include the kernel offset in the report to assist
>>> in debugging.
>>>
>>> Signed-off-by: Kees Cook <[email protected]>
>>> ---
>>> arch/x86/kernel/setup.c | 26 ++++++++++++++++++++++++++
>>> 1 file changed, 26 insertions(+)
>>>
>>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>>> index f0de629..1708862 100644
>>> --- a/arch/x86/kernel/setup.c
>>> +++ b/arch/x86/kernel/setup.c
>>> @@ -824,6 +824,20 @@ static void __init trim_low_memory_range(void)
>>> }
>>>
>>> /*
>>> + * Dump out kernel offset information on panic.
>>> + */
>>> +static int
>>> +dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
>>> +{
>>> + pr_emerg("Kernel Offset: 0x%lx from 0x%lx "
>>> + "(relocation range: 0x%lx-0x%lx)\n",
>>> + (unsigned long)&_text - __START_KERNEL, __START_KERNEL,
>>> + __START_KERNEL_map, MODULES_VADDR-1);
>>
>> Using phys_base seems better.
>
> phys_base is not changed during relocation. For example, if I print
> out phys_base during this pr_emerg call, it remains 0x0, even though
> the random offset was 0xa200000.
>

Thanks, now I'm clear. The relocation changes the addresses assigned to kernel
symbols themselves, and the kernel text region is loaded at the newly assigned
relocated address. So phys_base results in 0.

>>> +
>>> + return 0;
>>> +}
>>> +
>>> +/*
>>> * Determine if we were loaded by an EFI loader. If so, then we have also been
>>> * passed the efi memmap, systab, etc., so we should use these data structures
>>> * for initialization. Note, the efi init code path is determined by the
>>> @@ -1242,3 +1256,15 @@ void __init i386_reserve_resources(void)
>>> }
>>>
>>> #endif /* CONFIG_X86_32 */
>>> +
>>> +static struct notifier_block kernel_offset_notifier = {
>>> + .notifier_call = dump_kernel_offset
>>> +};
>>> +
>>> +static int __init register_kernel_offset_dumper(void)
>>> +{
>>> + atomic_notifier_chain_register(&panic_notifier_list,
>>> + &kernel_offset_notifier);
>>> + return 0;
>>> +}
>>> +__initcall(register_kernel_offset_dumper);
>>>
>>
>> Panic notifier is not executed if kdump is enabled. Maybe, Chrome OS doesn't use
>> kdump? Anyway, kdump related tools now calculate phys_base from memory map
>> information passed as ELF PT_LOAD entries like below.
>
> Correct, we are not currently using kdump.
>
>> $ LANG=C readelf -l vmcore-rhel6up4
>>
>> Elf file type is CORE (Core file)
>> Entry point 0x0
>> There are 5 program headers, starting at offset 64
>>
>> Program Headers:
>> Type Offset VirtAddr PhysAddr
>> FileSiz MemSiz Flags Align
>> NOTE 0x0000000000000158 0x0000000000000000 0x0000000000000000
>> 0x0000000000000b08 0x0000000000000b08 0
>> LOAD 0x0000000000000c60 0xffffffff81000000 0x0000000001000000
>> 0x000000000103b000 0x000000000103b000 RWE 0
>> LOAD 0x000000000103bc60 0xffff880000001000 0x0000000000001000
>> 0x000000000009cc00 0x000000000009cc00 RWE 0
>> LOAD 0x00000000010d8860 0xffff880000100000 0x0000000000100000
>> 0x0000000002f00000 0x0000000002f00000 RWE 0
>> LOAD 0x0000000003fd8860 0xffff880013000000 0x0000000013000000
>> 0x000000002cffd000 0x000000002cffd000 RWE 0
>>
>> Each PT_LOAD entry is assigned to virtual and physical address. In this case,
>> 1st PT_LOAD entry belongs to kernel text mapping region, from which we can
>> calculate phys_base value.
>
> It seems like all the information you need would still be available?
> The virtual address is there, so it should be trivial to see the
> offset, IIUC.
>

Partially, yes. I think it's OK to analyze a crash dump with the crash utility,
a gdb-based symbolic debugger for the kernel, since phys_base absorbs the
kernel offset caused by relocation, and phys_base is available in the way I
explained above.

However, the phys_base obtained this way is not the correct one; it is actually
phys_base + offset_by_relocation.
When analyzing a crash dump with the crash utility, we use debug information
generated during the kernel build, which we install as kernel-debuginfo on RHEL
for example. Symbols in the debuginfo have addresses statically assigned at
build time, so during debugging we see those static addresses, and we see
phys_base + offset_by_relocation as phys_base. This would be problematic if a
failure in the crash dump is relevant to the relocated addresses, though I
can't immediately come up with a crash scenario where a relocated symbol is
definitely necessary.

We can still get relocated addresses if kallsyms is enabled in the kernel, but
kallsyms and relocatable kernels are orthogonal, so I don't think it is natural
to rely on kallsyms. It seems more natural to newly export the relocation
information as debugging information.
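
As a reference for the PT_LOAD discussion above, here is a sketch of how
phys_base can be derived from the first PT_LOAD entry in the quoted readelf
output; 0xffffffff80000000 is the usual x86_64 __START_KERNEL_map value, and
the helper name is mine, not makedumpfile's:

```python
START_KERNEL_MAP = 0xffffffff80000000  # usual x86_64 kernel text mapping base

def phys_base_from_pt_load(virt, phys):
    # phys_base is how far the kernel's physical load address is displaced
    # from the physical address implied by its virtual address within the
    # kernel text mapping region.
    return phys - (virt - START_KERNEL_MAP)

# First PT_LOAD entry from the readelf output quoted above:
pb = phys_base_from_pt_load(0xffffffff81000000, 0x0000000001000000)
print(hex(pb))  # 0x0: this kernel was not physically displaced
```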

>> Therefore, we already have phys_base information even in case of kdump, and
>> as long as using kdump-related tools such as crash utility, we don't need
>> to see ELF PT_LOAD headers as I explain here because they calculate the process
>> I explain here automatically.
>>
>> Another idea is to add phys_base value in VMCOREINFO information that is debugging
>> information for user-land tools to filter crash dump. This is simple string information
>> so you can see the values contained in some crash dump by using strings command to it.
>> For example,
>
> Sure, though not phys_base, but rather the offset? I think this is
> similar to what coredumps of PIE binaries end up with, though I haven't
> looked very closely at that in a while.
>
> -Kees
>


--
Thanks.
HATAYAMA, Daisuke

2013-10-03 00:36:10

by Hatayama, Daisuke

[permalink] [raw]
Subject: Re: [PATCH 6/7] x86, kaslr: report kernel offset on panic

(2013/10/02 18:13), HATAYAMA Daisuke wrote:
> (2013/10/02 16:48), Kees Cook wrote:
<cut>
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +/*
>>>> * Determine if we were loaded by an EFI loader. If so, then we have also been
>>>> * passed the efi memmap, systab, etc., so we should use these data structures
>>>> * for initialization. Note, the efi init code path is determined by the
>>>> @@ -1242,3 +1256,15 @@ void __init i386_reserve_resources(void)
>>>> }
>>>>
>>>> #endif /* CONFIG_X86_32 */
>>>> +
>>>> +static struct notifier_block kernel_offset_notifier = {
>>>> + .notifier_call = dump_kernel_offset
>>>> +};
>>>> +
>>>> +static int __init register_kernel_offset_dumper(void)
>>>> +{
>>>> + atomic_notifier_chain_register(&panic_notifier_list,
>>>> + &kernel_offset_notifier);
>>>> + return 0;
>>>> +}
>>>> +__initcall(register_kernel_offset_dumper);
>>>>
>>>
>>> Panic notifier is not executed if kdump is enabled. Maybe, Chrome OS doesn't use
>>> kdump? Anyway, kdump related tools now calculate phys_base from memory map
>>> information passed as ELF PT_LOAD entries like below.
>>
>> Correct, we are not currently using kdump.
>>
>>> $ LANG=C readelf -l vmcore-rhel6up4
>>>
>>> Elf file type is CORE (Core file)
>>> Entry point 0x0
>>> There are 5 program headers, starting at offset 64
>>>
>>> Program Headers:
>>> Type Offset VirtAddr PhysAddr
>>> FileSiz MemSiz Flags Align
>>> NOTE 0x0000000000000158 0x0000000000000000 0x0000000000000000
>>> 0x0000000000000b08 0x0000000000000b08 0
>>> LOAD 0x0000000000000c60 0xffffffff81000000 0x0000000001000000
>>> 0x000000000103b000 0x000000000103b000 RWE 0
>>> LOAD 0x000000000103bc60 0xffff880000001000 0x0000000000001000
>>> 0x000000000009cc00 0x000000000009cc00 RWE 0
>>> LOAD 0x00000000010d8860 0xffff880000100000 0x0000000000100000
>>> 0x0000000002f00000 0x0000000002f00000 RWE 0
>>> LOAD 0x0000000003fd8860 0xffff880013000000 0x0000000013000000
>>> 0x000000002cffd000 0x000000002cffd000 RWE 0
>>>
>>> Each PT_LOAD entry is assigned to virtual and physical address. In this case,
>>> 1st PT_LOAD entry belongs to kernel text mapping region, from which we can
>>> calculate phys_base value.
>>
>> It seems like all the information you need would still be available?
>> The virtual address is there, so it should be trivial to see the
>> offset, IIUC.
>>
>
> Partially yes. I think OK to analyze crash dump by crash utility, a gdb-based
> symbolic debugger for kernel, since phys_base absorbs kernel offset caused by
> relocation and phys_base is available in the way I explained above.
>
> However, the gained phys_base is not correct one, exactly
> phys_base + offset_by_relocation.
> When analyzing crash dump by crash utility, we use debug information generated
> during kernel build, which we install as kernel-debuginfo on RHEL for example.
> Symbols in debuginfo have statically assigned addresses at build so we see
> the statically assigned addresses during debugging and we see
> phys_base + offset_by_relocation as phys_base. This would be problematic
> if failure on crash dump is relevant to the relocated addresses, though I don't
> immediately come up with a crash scenario where a relocated symbol is definitely
> necessary.
>
> Still we can get relocated addresses if kallsyms is enabled on the kernel,
> but kallsyms and relocatable kernels are orthogonal. I don't think it natural
> to rely on kallsyms. It seems natural to export relocation information newly
> as debugging information.
>

I was confused yesterday. As I said above, the kdump-related tools currently
support only phys_base on x86_64, not relocation. They treat the present kernel
offset as phys_base, so they reflect the offset caused by relocation in
physical addresses only, not in virtual addresses. This obviously affects the
tools.

BTW, the relocation mechanism looks more sophisticated than the phys_base one.
Is it possible to switch from phys_base to relocation on x86_64? On x86,
relocation is already used, so I guess x86_64 can work the same way. Is there
something missing? Is there anything phys_base can do that relocation cannot
on x86_64?

And, Dave, is there a feature in the crash utility to handle relocation now?

--
Thanks.
HATAYAMA, Daisuke

2013-10-03 13:47:45

by Dave Anderson

[permalink] [raw]
Subject: Re: [PATCH 6/7] x86, kaslr: report kernel offset on panic



----- Original Message -----
> (2013/10/02 18:13), HATAYAMA Daisuke wrote:
> > (2013/10/02 16:48), Kees Cook wrote:
> <cut>
> >>>> +
> >>>> + return 0;
> >>>> +}
> >>>> +
> >>>> +/*
> >>>> * Determine if we were loaded by an EFI loader. If so, then we have also been
> >>>> * passed the efi memmap, systab, etc., so we should use these data structures
> >>>> * for initialization. Note, the efi init code path is determined by the
> >>>> @@ -1242,3 +1256,15 @@ void __init i386_reserve_resources(void)
> >>>> }
> >>>>
> >>>> #endif /* CONFIG_X86_32 */
> >>>> +
> >>>> +static struct notifier_block kernel_offset_notifier = {
> >>>> + .notifier_call = dump_kernel_offset
> >>>> +};
> >>>> +
> >>>> +static int __init register_kernel_offset_dumper(void)
> >>>> +{
> >>>> + atomic_notifier_chain_register(&panic_notifier_list,
> >>>> + &kernel_offset_notifier);
> >>>> + return 0;
> >>>> +}
> >>>> +__initcall(register_kernel_offset_dumper);
> >>>>
> >>>
> >>> Panic notifier is not executed if kdump is enabled. Maybe, Chrome OS doesn't use
> >>> kdump? Anyway, kdump related tools now calculate phys_base from memory map
> >>> information passed as ELF PT_LOAD entries like below.
> >>
> >> Correct, we are not currently using kdump.
> >>
> >>> $ LANG=C readelf -l vmcore-rhel6up4
> >>>
> >>> Elf file type is CORE (Core file)
> >>> Entry point 0x0
> >>> There are 5 program headers, starting at offset 64
> >>>
> >>> Program Headers:
> >>> Type Offset VirtAddr PhysAddr
> >>> FileSiz MemSiz Flags Align
> >>> NOTE 0x0000000000000158 0x0000000000000000 0x0000000000000000
> >>> 0x0000000000000b08 0x0000000000000b08 0
> >>> LOAD 0x0000000000000c60 0xffffffff81000000 0x0000000001000000
> >>> 0x000000000103b000 0x000000000103b000 RWE 0
> >>> LOAD 0x000000000103bc60 0xffff880000001000 0x0000000000001000
> >>> 0x000000000009cc00 0x000000000009cc00 RWE 0
> >>> LOAD 0x00000000010d8860 0xffff880000100000 0x0000000000100000
> >>> 0x0000000002f00000 0x0000000002f00000 RWE 0
> >>> LOAD 0x0000000003fd8860 0xffff880013000000 0x0000000013000000
> >>> 0x000000002cffd000 0x000000002cffd000 RWE 0
> >>>
> >>> Each PT_LOAD entry is assigned to virtual and physical address. In this case,
> >>> 1st PT_LOAD entry belongs to kernel text mapping region, from which we can
> >>> calculate phys_base value.
> >>
> >> It seems like all the information you need would still be available?
> >> The virtual address is there, so it should be trivial to see the
> >> offset, IIUC.
> >>
> >
> > Partially yes. I think OK to analyze crash dump by crash utility, a gdb-based
> > symbolic debugger for kernel, since phys_base absorbs kernel offset caused by
> > relocation and phys_base is available in the way I explained above.
> >
> > However, the gained phys_base is not correct one, exactly phys_base + offset_by_relocation.
> > When analyzing crash dump by crash utility, we use debug information generated
> > during kernel build, which we install as kernel-debuginfo on RHEL for example.
> > Symbols in debuginfo have statically assigned addresses at build so we see
> > the statically assigned addresses during debugging and we see
> > phys_base + offset_by_relocation as phys_base. This would be problematic
> > if failure on crash dump is relevant to the relocated addresses, though I don't
> > immediately come up with a crash scenario where a relocated symbol is definitely necessary.
> >
> > Still we can get relocated addresses if kallsyms is enabled on the kernel,
> > but kallsyms and relocatable kernels are orthogonal. I don't think it natural
> > to rely on kallsyms. It seems natural to export relocation information newly
> > as debugging information.
> >
>
> I was confused yesterday. As I said above, kdump related tools now don't support
> relocation on x86_64, phys_base only. kdump related tools think of present kernel
> offset as phys_base. Then, they reflect kernel offset caused by relocation in
> physical addresses only, not in virtual addresses. This obviously affects the
> tools.
>
> BTW, relocation looks more sophisticated than phys_base one. Is it possible to
> switch from phys_base one to relocation on x86_64? On x86, relocation is used so
> I guess x86_64 can work in the same way. Is there something missing?
> Is there what phys_base can but relocation cannot on x86_64?
>
> And, Dave, is there feature for crash utility to treat relocation now?

Well, sort of; there are a couple of guessing-game kludges that can be used.

For 32-bit x86 systems configured with a CONFIG_PHYSICAL_START value
that is larger than its CONFIG_PHYSICAL_ALIGN value, such that the
vmlinux symbol values do not match their relocated virtual address
values, there are two options for analyzing dumpfiles:

(1) there is a "--reloc size" command line option, presuming that
you know what it is.
(2) take a snapshot of the /proc/kallsyms file from the crashing
system into a file, and put it on the command line, similar
to putting a System.map file on the command line in order to
override the symbol values in the vmlinux file.

In those cases, we have to alter all of the symbols seen in the
vmlinux file, and go into a backdoor into the embedded gdb module
to patch/modify the symbol values.

On live x86 systems, the two options above are not necessary if
/proc/kallsyms exists, because its contents can be checked against
the vmlinux file symbol values, and the relocation calculated.
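
The live-system check described above can be sketched as follows; the helper
names are hypothetical and only stand in for what crash does internally during
symbol initialization:

```python
def parse_symbols(text):
    # Parse "address type name" lines, the common format of both
    # /proc/kallsyms and System.map, into a {name: address} dict.
    syms = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            syms[parts[2]] = int(parts[0], 16)
    return syms

def relocation_offset(runtime_text, linktime_text, anchor="_text"):
    # The relocation is the runtime address of an anchor symbol minus
    # its link-time (vmlinux) address.
    runtime = parse_symbols(runtime_text)
    linktime = parse_symbols(linktime_text)
    return runtime[anchor] - linktime[anchor]

kallsyms_snap = "ffffffff8b200000 T _text\n"   # snapshot from the system
vmlinux_syms  = "ffffffff81000000 T _text\n"   # from the vmlinux file
print(hex(relocation_offset(kallsyms_snap, vmlinux_syms)))  # 0xa200000
```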

For x86_64, the --reloc argument has never been needed. But if
for whatever reason the "phys_base" value cannot be determined,
it can be forced with the "--machdep phys_base=addr" option,
again presuming you know what it is.

Dave

> --
> Thanks.
> HATAYAMA, Daisuke
>
>

2013-10-07 02:00:17

by Hatayama, Daisuke

[permalink] [raw]
Subject: Re: [PATCH 6/7] x86, kaslr: report kernel offset on panic

(2013/10/03 22:47), Dave Anderson wrote:
>
>
> ----- Original Message -----
>> (2013/10/02 18:13), HATAYAMA Daisuke wrote:
>>> (2013/10/02 16:48), Kees Cook wrote:
>> <cut>
>>>>>> +
>>>>>> + return 0;
>>>>>> +}
>>>>>> +
>>>>>> +/*
>>>>>> * Determine if we were loaded by an EFI loader. If so, then we have also been
>>>>>> * passed the efi memmap, systab, etc., so we should use these data structures
>>>>>> * for initialization. Note, the efi init code path is determined by the
>>>>>> @@ -1242,3 +1256,15 @@ void __init i386_reserve_resources(void)
>>>>>> }
>>>>>>
>>>>>> #endif /* CONFIG_X86_32 */
>>>>>> +
>>>>>> +static struct notifier_block kernel_offset_notifier = {
>>>>>> + .notifier_call = dump_kernel_offset
>>>>>> +};
>>>>>> +
>>>>>> +static int __init register_kernel_offset_dumper(void)
>>>>>> +{
>>>>>> + atomic_notifier_chain_register(&panic_notifier_list,
>>>>>> + &kernel_offset_notifier);
>>>>>> + return 0;
>>>>>> +}
>>>>>> +__initcall(register_kernel_offset_dumper);
>>>>>>
>>>>>
>>>>> Panic notifier is not executed if kdump is enabled. Maybe, Chrome OS doesn't use
>>>>> kdump? Anyway, kdump related tools now calculate phys_base from memory map
>>>>> information passed as ELF PT_LOAD entries like below.
>>>>
>>>> Correct, we are not currently using kdump.
>>>>
>>>>> $ LANG=C readelf -l vmcore-rhel6up4
>>>>>
>>>>> Elf file type is CORE (Core file)
>>>>> Entry point 0x0
>>>>> There are 5 program headers, starting at offset 64
>>>>>
>>>>> Program Headers:
>>>>> Type Offset VirtAddr PhysAddr
>>>>> FileSiz MemSiz Flags Align
>>>>> NOTE 0x0000000000000158 0x0000000000000000 0x0000000000000000
>>>>> 0x0000000000000b08 0x0000000000000b08 0
>>>>> LOAD 0x0000000000000c60 0xffffffff81000000 0x0000000001000000
>>>>> 0x000000000103b000 0x000000000103b000 RWE 0
>>>>> LOAD 0x000000000103bc60 0xffff880000001000 0x0000000000001000
>>>>> 0x000000000009cc00 0x000000000009cc00 RWE 0
>>>>> LOAD 0x00000000010d8860 0xffff880000100000 0x0000000000100000
>>>>> 0x0000000002f00000 0x0000000002f00000 RWE 0
>>>>> LOAD 0x0000000003fd8860 0xffff880013000000 0x0000000013000000
>>>>> 0x000000002cffd000 0x000000002cffd000 RWE 0
>>>>>
>>>>> Each PT_LOAD entry is assigned to virtual and physical address. In this case,
>>>>> 1st PT_LOAD entry belongs to kernel text mapping region, from which we can
>>>>> calculate phys_base value.
>>>>
>>>> It seems like all the information you need would still be available?
>>>> The virtual address is there, so it should be trivial to see the
>>>> offset, IIUC.
>>>>
>>>
>>> Partially yes. I think OK to analyze crash dump by crash utility, a gdb-based
>>> symbolic debugger for kernel, since phys_base absorbs kernel offset caused by
>>> relocation and phys_base is available in the way I explained above.
>>>
>>> However, the gained phys_base is not correct one, exactly phys_base + offset_by_relocation.
>>> When analyzing crash dump by crash utility, we use debug information generated
>>> during kernel build, which we install as kernel-debuginfo on RHEL for example.
>>> Symbols in debuginfo have statically assigned addresses at build so we see
>>> the statically assigned addresses during debugging and we see
>>> phys_base + offset_by_relocation as phys_base. This would be problematic
>>> if failure on crash dump is relevant to the relocated addresses, though I don't
>>> immediately come up with a crash scenario where a relocated symbol is definitely necessary.
>>>
>>> Still we can get relocated addresses if kallsyms is enabled on the kernel,
>>> but kallsyms and relocatable kernels are orthogonal. I don't think it natural
>>> to rely on kallsyms. It seems natural to export relocation information newly
>>> as debugging information.
>>>
>>
>> I was confused yesterday. As I said above, kdump related tools now don't support
>> relocation on x86_64, phys_base only. kdump related tools think of present kernel
>> offset as phys_base. Then, they reflect kernel offset caused by relocation in
>> physical addresses only, not in virtual addresses. This obviously affects the
>> tools.
>>
>> BTW, relocation looks more sophisticated than phys_base one. Is it possible to
>> switch from phys_base one to relocation on x86_64? On x86, relocation is used so
>> I guess x86_64 can work in the same way. Is there something missing?
>> Is there what phys_base can but relocation cannot on x86_64?
>>
>> And, Dave, is there feature for crash utility to treat relocation now?
>
> Well sort of, there are couple guessing-game kludges that can be used.
>
> For 32-bit x86 systems configured with a CONFIG_PHYSICAL_START value
> that is larger than its CONFIG_PHYSICAL_ALIGN value, such that the
> vmlinux symbol values do not match their relocated virtual address
> values, there are two options for analyzing dumpfiles:
>
> (1) there is a "--reloc size" command line option, presuming that
> you know what it is.
> (2) take a snapshot of the /proc/kallsyms file from the crashing
> system into a file, and put it on the command line, similar
> to putting a System.map file on the command line in order to
> override the symbol values in the vmlinux file.
>
> In those cases, we have to alter all of the symbols seen in the
> vmlinux file, and go into a backdoor into the embedded gdb module
> to patch/modify the symbol values.
>
> On live x86 systems, the two options above are not necessary if
> /proc/kallsyms exists, because its contents can be checked against
> the vmlinux file symbol values, and the relocation calculated.
>
> For x86_64, the --reloc argument has never been needed. But if
> for whatever reason the "phys_base" value cannot be determined,
> it can be forced with the "--machdep phys_base=addr" option,
> again presuming you know what it is.
>

Thanks for the detailed explanation. So there's already a feature in the crash
utility to address relocation! Though it's better for me to try it to check
whether it's really applicable to this feature. My concern is whether --reloc
works well on x86_64 too, because relocation has never been done on x86_64,
right?

Another concern is that, in the case of relocation, users need to provide
additional runtime symbol information to the crash utility. I want to avoid
that extra step; automation is preferable if possible.

I guess it's enough if runtime symbol addresses are available, because we can
get the relocation offset by comparing them with the compile-time symbol
addresses contained in a given debuginfo file. Candidates for such symbols are
the ones in the VMCOREINFO note, which contains symbol values for makedumpfile
to refer to mm-related objects in the kernel; VMCOREINFO is always present in
a vmcore generated by current kdump and also in vmcores converted from it by
makedumpfile. How about this idea?
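
A sketch of that idea: VMCOREINFO is a flat text note with SYMBOL(name)=hexaddr
lines, so one such runtime value compared against the same symbol's
compile-time address from debuginfo yields the relocation offset. The addresses
and helper below are illustrative, not code from makedumpfile or crash:

```python
def parse_vmcoreinfo_symbols(text):
    # Collect SYMBOL(x)=hexaddr entries from a VMCOREINFO note.
    syms = {}
    for line in text.splitlines():
        if line.startswith("SYMBOL(") and ")=" in line:
            name, _, val = line.partition(")=")
            syms[name[len("SYMBOL("):]] = int(val, 16)
    return syms

vmcoreinfo = "SYMBOL(swapper_pg_dir)=ffffffff8ba0c000\n"   # runtime value
debuginfo = {"swapper_pg_dir": 0xffffffff81a0c000}         # compile-time value

runtime = parse_vmcoreinfo_symbols(vmcoreinfo)
offset = runtime["swapper_pg_dir"] - debuginfo["swapper_pg_dir"]
print(hex(offset))  # 0xa000000
```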

# I added CC to crash utility mailing list

--
Thanks.
HATAYAMA, Daisuke

2013-10-07 13:22:13

by Dave Anderson

[permalink] [raw]
Subject: Re: [PATCH 6/7] x86, kaslr: report kernel offset on panic



----- Original Message -----
> (2013/10/03 22:47), Dave Anderson wrote:
> >
> >
> > ----- Original Message -----
> >> (2013/10/02 18:13), HATAYAMA Daisuke wrote:
> >>> (2013/10/02 16:48), Kees Cook wrote:
> >> <cut>
> >>>>>> +
> >>>>>> + return 0;
> >>>>>> +}
> >>>>>> +
> >>>>>> +/*
> >>>>>> * Determine if we were loaded by an EFI loader. If so, then we
> >>>>>> have also been
> >>>>>> * passed the efi memmap, systab, etc., so we should use these
> >>>>>> data structures
> >>>>>> * for initialization. Note, the efi init code path is determined
> >>>>>> by the
> >>>>>> @@ -1242,3 +1256,15 @@ void __init i386_reserve_resources(void)
> >>>>>> }
> >>>>>>
> >>>>>> #endif /* CONFIG_X86_32 */
> >>>>>> +
> >>>>>> +static struct notifier_block kernel_offset_notifier = {
> >>>>>> + .notifier_call = dump_kernel_offset
> >>>>>> +};
> >>>>>> +
> >>>>>> +static int __init register_kernel_offset_dumper(void)
> >>>>>> +{
> >>>>>> + atomic_notifier_chain_register(&panic_notifier_list,
> >>>>>> + &kernel_offset_notifier);
> >>>>>> + return 0;
> >>>>>> +}
> >>>>>> +__initcall(register_kernel_offset_dumper);
> >>>>>>
> >>>>>
> >>>>> Panic notifier is not executed if kdump is enabled. Maybe, Chrome OS
> >>>>> doesn't use
> >>>>> kdump? Anyway, kdump related tools now calculate phys_base from memory
> >>>>> map
> >>>>> information passed as ELF PT_LOAD entries like below.
> >>>>
> >>>> Correct, we are not currently using kdump.
> >>>>
> >>>>> $ LANG=C readelf -l vmcore-rhel6up4
> >>>>>
> >>>>> Elf file type is CORE (Core file)
> >>>>> Entry point 0x0
> >>>>> There are 5 program headers, starting at offset 64
> >>>>>
> >>>>> Program Headers:
> >>>>> Type Offset VirtAddr PhysAddr
> >>>>> FileSiz MemSiz Flags Align
> >>>>> NOTE 0x0000000000000158 0x0000000000000000
> >>>>> 0x0000000000000000
> >>>>> 0x0000000000000b08 0x0000000000000b08 0
> >>>>> LOAD 0x0000000000000c60 0xffffffff81000000
> >>>>> 0x0000000001000000
> >>>>> 0x000000000103b000 0x000000000103b000 RWE 0
> >>>>> LOAD 0x000000000103bc60 0xffff880000001000
> >>>>> 0x0000000000001000
> >>>>> 0x000000000009cc00 0x000000000009cc00 RWE 0
> >>>>> LOAD 0x00000000010d8860 0xffff880000100000
> >>>>> 0x0000000000100000
> >>>>> 0x0000000002f00000 0x0000000002f00000 RWE 0
> >>>>> LOAD 0x0000000003fd8860 0xffff880013000000
> >>>>> 0x0000000013000000
> >>>>> 0x000000002cffd000 0x000000002cffd000 RWE 0
> >>>>>
> >>>>> Each PT_LOAD entry is assigned to virtual and physical address. In this
> >>>>> case,
> >>>>> 1st PT_LOAD entry belongs to kernel text mapping region, from which we
> >>>>> can
> >>>>> calculate phys_base value.
> >>>>
> >>>> It seems like all the information you need would still be available?
> >>>> The virtual address is there, so it should be trivial to see the
> >>>> offset, IIUC.
> >>>>
> >>>
> >>> Partially yes. I think OK to analyze crash dump by crash utility, a
> >>> gdb-based
> >>> symbolic debugger for kernel, since phys_base absorbs kernel offset
> >>> caused by
> >>> relocation and phys_base is available in the way I explained above.
> >>>
> >>> However, the gained phys_base is not correct one, exactly phys_base +
> >>> offset_by_relocation.
> >>> When analyzing crash dump by crash utility, we use debug information
> >>> generated
> >>> during kernel build, which we install as kernel-debuginfo on RHEL for
> >>> example.
> >>> Symbols in debuginfo have statically assigned addresses at build so we
> >>> see
> >>> the statically assigned addresses during debugging and we see
> >>> phys_base + offset_by_relocation as phys_base. This would be problematic
> >>> if failure on crash dump is relevant to the relocated addresses, though I
> >>> don't
> >>> immediately come up with a crash scenario where a relocated symbol is definitely
> >>> necessary.
> >>>
> >>> Still we can get relocated addresses if kallsyms is enabled on the
> >>> kernel,
> >>> but kallsyms and relocatable kernels are orthogonal. I don't think it
> >>> natural
> >>> to rely on kallsyms. It seems natural to export relocation information
> >>> newly
> >>> as debugging information.
> >>>
> >>
> >> I was confused yesterday. As I said above, kdump related tools now don't
> >> support
> >> relocation on x86_64, phys_base only. kdump related tools think of present
> >> kernel
> >> offset as phys_base. Then, they reflect kernel offset caused by relocation
> >> in
> >> physical addresses only, not in virtual addresses. This obviously affects
> >> the
> >> tools.
> >>
> >> BTW, relocation looks more sophisticated than phys_base one. Is it
> >> possible to
> >> switch from phys_base one to relocation on x86_64? On x86, relocation is
> >> used so
> >> I guess x86_64 can work in the same way. Is there something missing?
> >> Is there what phys_base can but relocation cannot on x86_64?
> >>
> >> And, Dave, is there feature for crash utility to treat relocation now?
> >
> > Well sort of, there are couple guessing-game kludges that can be used.
> >
> > For 32-bit x86 systems configured with a CONFIG_PHYSICAL_START value
> > that is larger than its CONFIG_PHYSICAL_ALIGN value, such that the
> > vmlinux symbol values do not match their relocated virtual address
> > values, there are two options for analyzing dumpfiles:
> >
> > (1) there is a "--reloc size" command line option, presuming that
> > you know what it is.
> > (2) take a snapshot of the /proc/kallsyms file from the crashing
> > system into a file, and put it on the command line, similar
> > to putting a System.map file on the command line in order to
> > override the symbol values in the vmlinux file.
> >
> > In those cases, we have to alter all of the symbols seen in the
> > vmlinux file, and go into a backdoor into the embedded gdb module
> > to patch/modify the symbol values.
> >
> > On live x86 systems, the two options above are not necessary if
> > /proc/kallsyms exists, because its contents can be checked against
> > the vmlinux file symbol values, and the relocation calculated.
> >
> > For x86_64, the --reloc argument has never been needed. But if
> > for whatever reason the "phys_base" value cannot be determined,
> > it can be forced with the "--machdep phys_base=addr" option,
> > again presuming you know what it is.
> >
>
> Thanks for detailed explanation. So, there's already a feature in crash utility
> to address relocation!, though it's better for me to try them to check if it's
> really applicable to this feature. My concern is whether --reloc works well
> on x86_64 too, because relocation has never done on x86_64 ever, right?

Correct.

> Another concern is that in case of relocation, users need to provide additional information
> regarding runtime symbol information to crash utility. I want to avoid additional
> process, automation is preferable if possible.

Right. As I mentioned in the case of 32-bit x86 dumpfiles, there is no automation
available when CONFIG_PHYSICAL_START is larger than CONFIG_PHYSICAL_ALIGN. The user
either has to be aware of their values in order to calculate the --reloc argument,
or has to capture a copy of the /proc/kallsyms file on the crashed system. Typically
users/distros using kdump changed their x86 configurations to avoid having to deal
with that.

> I guess it's enough if there's runtime symbol addresses because we can get relocated
> offset value by comparing it with the compile-time symbol address contained in
> a given debuginfo file. Candidates for such symbols are the ones contained in
> VMCOREINFO note containing some symbol values for makedumpfile to refer to mm-related
> objects in kernel, which is always contained in vmcore generated by current kdump and
> also vmcores converted by makedumpfile from it. How about this idea?

But how would that differ from using an incorrect (non-matching) vmlinux file?

Dave

2013-10-08 09:53:17

by Hatayama, Daisuke

[permalink] [raw]
Subject: Re: [PATCH 6/7] x86, kaslr: report kernel offset on panic

(2013/10/07 22:21), Dave Anderson wrote:

> ----- Original Message -----
>> (2013/10/03 22:47), Dave Anderson wrote:

>>> ----- Original Message -----
>>>> (2013/10/02 18:13), HATAYAMA Daisuke wrote:
>>>>> (2013/10/02 16:48), Kees Cook wrote:

>>
>> Thanks for detailed explanation. So, there's already a feature in crash utility
>> to address relocation!, though it's better for me to try them to check if it's
>> really applicable to this feature. My concern is whether --reloc works well
>> on x86_64 too, because relocation has never done on x86_64 ever, right?
>
> Correct.
>
>> Another concern is that in case of relocation, users need to provide additional information
>> regarding runtime symbol information to crash utility. I want to avoid additional
>> process, automation is preferable if possible.
>
> Right. As I mentioned in the case of 32-bit x86 dumpfiles, there is no automation
> available when CONFIG_PHYSICAL_START is larger than CONFIG_PHYSICAL_ALIGN. The user
> either has to be aware of their values in order to calculate the --reloc argument,
> or has to capture a copy of the /proc/kallsyms file on the crashed system. Typically
> users/distros using kdump changed their x86 configurations to avoid having to deal
> with that.
>

Sorry, I don't understand why the relocation size cannot be calculated when
CONFIG_PHYSICAL_START > CONFIG_PHYSICAL_ALIGN. Could you explain that?

>> I guess it's enough if there's runtime symbol addresses because we can get relocated
>> offset value by comparing it with the compile-time symbol address contained in
>> a given debuginfo file. Candidates for such symbols are the ones contained in
>> VMCOREINFO note containing some symbol values for makedumpfile to refer to mm-related
>> objects in kernel, which is always contained in vmcore generated by current kdump and
>> also vmcores converted by makedumpfile from it. How about this idea?
>
> But how would that differ from using an incorrect (non-matching) vmlinux file?
>

It seems to me almost the same as what crash currently does, even with a
relocation check. The check crash currently does is trial-and-error, since
there's no information indicating that a given vmcore and vmlinux certainly
match.

For example, the process I imagine is:

1) try to match vmcore and vmlinux with no relocation. If that fails, go to 2).
2) try to match vmcore and vmlinux with relocation.

The two steps both include symbol table initialization, so it might actually be
difficult to go back from 2) to 1).
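
The two-step process above could look roughly like this; try_match is a
hypothetical stand-in for crash's symbol-table initialization and sanity check,
and the addresses are example values:

```python
def try_match(vmcore_syms, vmlinux_syms, reloc):
    # Stand-in for crash's initialization: do the vmlinux symbols,
    # shifted by the candidate relocation, line up with the dump?
    return vmcore_syms["_text"] == vmlinux_syms["_text"] + reloc

def open_dump(vmcore_syms, vmlinux_syms, candidate_reloc):
    # 1) try with no relocation; if that fails,
    # 2) retry with the candidate relocation offset.
    for reloc in (0, candidate_reloc):
        if try_match(vmcore_syms, vmlinux_syms, reloc):
            return reloc
    raise RuntimeError("vmcore and vmlinux do not match")

vmcore_syms = {"_text": 0xffffffff8b200000}
vmlinux_syms = {"_text": 0xffffffff81000000}
print(hex(open_dump(vmcore_syms, vmlinux_syms, 0xa200000)))  # 0xa200000
```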

Also, if a gap due to phys_base and a gap due to relocation can happen at the
same time, calculating the two values automatically might be further
complicated. So it would be better to add the relocation value to VMCOREINFO;
then what the crash utility should do becomes very simple.

BTW, can gaps due to phys_base and due to relocation really happen at the same
time? I feel the relocation mechanism covers the phys_base one. If there's
relocation, phys_base is not necessary.

--
Thanks.
HATAYAMA, Daisuke

2013-10-08 13:38:57

by Dave Anderson

[permalink] [raw]
Subject: Re: [PATCH 6/7] x86, kaslr: report kernel offset on panic



----- Original Message -----
> (2013/10/07 22:21), Dave Anderson wrote:
>
> > ----- Original Message -----
> >> (2013/10/03 22:47), Dave Anderson wrote:
>
> >>> ----- Original Message -----
> >>>> (2013/10/02 18:13), HATAYAMA Daisuke wrote:
> >>>>> (2013/10/02 16:48), Kees Cook wrote:
>
> >>
> >> Thanks for the detailed explanation. So, there's already a feature in the crash
> >> utility to address relocation! Though it's better for me to try it to check if
> >> it's really applicable to this feature. My concern is whether --reloc works well
> >> on x86_64 too, because relocation has never been done on x86_64, right?
> >
> > Correct.
> >
> >> Another concern is that in the case of relocation, users need to supply additional
> >> information regarding runtime symbols to the crash utility. I want to avoid
> >> additional steps; automation is preferable if possible.
> >
> > Right. As I mentioned in the case of 32-bit x86 dumpfiles, there is no automation
> > available when CONFIG_PHYSICAL_START is larger than CONFIG_PHYSICAL_ALIGN. The user
> > either has to be aware of their values in order to calculate the --reloc argument,
> > or has to capture a copy of the /proc/kallsyms file on the crashed system. Typically
> > users/distros using kdump changed their x86 configurations to avoid having to deal
> > with that.
> >
>
> Sorry, I don't understand why relocation size cannot be calculated when
> CONFIG_PHYSICAL_START > CONFIG_PHYSICAL_ALIGN. Could you explain that?

I just meant that when CONFIG_PHYSICAL_START > CONFIG_PHYSICAL_ALIGN, the 32-bit x86 kernel
gets relocated (like the secondary kdump kernel), but that information is not readily available
from the vmlinux/vmcore pair.

>
> >> I guess it's enough if there's runtime symbol addresses because we can get relocated
> >> offset value by comparing it with the compile-time symbol address contained in
> >> a given debuginfo file. Candidates for such symbols are the ones contained in
> >> VMCOREINFO note containing some symbol values for makedumpfile to refer to mm-related
> >> objects in kernel, which is always contained in vmcore generated by current kdump and
> >> also vmcores converted by makedumpfile from it. How about this idea?
> >
> > But how would that differ from using an incorrect (non-matching) vmlinux file?
> >
>
> It seems to me almost similar to what crash currently does even if we do relocation check.
> The current check crash currently does is trial-and-error since there's no information
> indicating given vmcore and vmlinux certainly match well.
>
> For example, the process I imagine is:
>
> 1) try to match vmcore and vmlinux with no relocation. If that fails, go to 2).
> 2) try to match vmcore and vmlinux with relocation.
>
> The two steps include symbol table initialization so it might actually be difficult to
> resume back from 2) to 1).
>
> Also, if gap due to phys_base and gap due to relocation can happen at the same time,
> calculating two values automatically might be further complicated. So, it would be better
> to add relocation value in VMCOREINFO. Then, what crash utility should do becomes very simple.

Yes please...

And while you're at it, the kernel's

VMCOREINFO_SYMBOL(phys_base);

is pretty much useless, at least w/respect to ELF vmcores, since we need to know its
value in order to translate the address. And I don't think that makedumpfile uses
it when it calculates the phys_base that it stores in compressed kdump headers. Why
not put its value instead of its address?

> BTW, can it really happen that gaps due to phys_base and due to relocation happen at the
> same time? I feel relocation covers phys_base mechanism. If there's relocation, phys_base
> is not necessary.
>
> --
> Thanks.
> HATAYAMA, Daisuke
>
>

2013-10-09 10:04:51

by Hatayama, Daisuke

[permalink] [raw]
Subject: Re: [PATCH 6/7] x86, kaslr: report kernel offset on panic

(2013/10/08 22:38), Dave Anderson wrote:
>
>
> ----- Original Message -----
>> (2013/10/07 22:21), Dave Anderson wrote:
>>
>>> ----- Original Message -----
>>>> (2013/10/03 22:47), Dave Anderson wrote:
>>
>>>>> ----- Original Message -----
>>>>>> (2013/10/02 18:13), HATAYAMA Daisuke wrote:
>>>>>>> (2013/10/02 16:48), Kees Cook wrote:
>>
>>>>
>>>> Thanks for the detailed explanation. So, there's already a feature in the crash
>>>> utility to address relocation! Though it's better for me to try it to check if
>>>> it's really applicable to this feature. My concern is whether --reloc works
>>>> well on x86_64 too, because relocation has never been done on x86_64, right?
>>>
>>> Correct.
>>>
>>>> Another concern is that in the case of relocation, users need to supply additional
>>>> information regarding runtime symbols to the crash utility. I want to avoid
>>>> additional steps; automation is preferable if possible.
>>>
>>> Right. As I mentioned in the case of 32-bit x86 dumpfiles, there is no automation
>>> available when CONFIG_PHYSICAL_START is larger than CONFIG_PHYSICAL_ALIGN. The user
>>> either has to be aware of their values in order to calculate the --reloc argument,
>>> or has to capture a copy of the /proc/kallsyms file on the crashed system. Typically
>>> users/distros using kdump changed their x86 configurations to avoid having to deal
>>> with that.
>>>
>>
>> Sorry, I don't understand why relocation size cannot be calculated when
>> CONFIG_PHYSICAL_START > CONFIG_PHYSICAL_ALIGN. Could you explain that?
>
> I just meant that when CONFIG_PHYSICAL_START > CONFIG_PHYSICAL_ALIGN, the 32-bit x86 kernel
> gets relocated (like the secondary kdump kernel), but that information is not readily available
> from the vmlinux/vmcore pair.
>

My understanding of CONFIG_PHYSICAL_ALIGN was only that the starting address of the kernel
text area is always rounded up to CONFIG_PHYSICAL_ALIGN. Your explanation covers a part I
don't understand well. I'll reconsider it locally...

>>
>>>> I guess it's enough if there's runtime symbol addresses because we can get relocated
>>>> offset value by comparing it with the compile-time symbol address contained in
>>>> a given debuginfo file. Candidates for such symbols are the ones contained in
>>>> VMCOREINFO note containing some symbol values for makedumpfile to refer to mm-related
>>>> objects in kernel, which is always contained in vmcore generated by current kdump and
>>>> also vmcores converted by makedumpfile from it. How about this idea?
>>>
>>> But how would that differ from using an incorrect (non-matching) vmlinux file?
>>>
>>
>> It seems to me almost similar to what crash currently does even if we do relocation check.
>> The current check crash currently does is trial-and-error since there's no information
>> indicating given vmcore and vmlinux certainly match well.
>>
>> For example, the process I imagine is:
>>
>> 1) try to match vmcore and vmlinux with no relocation. If that fails, go to 2).
>> 2) try to match vmcore and vmlinux with relocation.
>>
>> The two steps include symbol table initialization so it might actually be difficult to
>> resume back from 2) to 1).
>>
>> Also, if gap due to phys_base and gap due to relocation can happen at the same time,
>> calculating two values automatically might be further complicated. So, it would be better
>> to add relocation value in VMCOREINFO. Then, what crash utility should do becomes very simple.
>
> Yes please...
>
> And while you're at it, the kernel's
>
> VMCOREINFO_SYMBOL(phys_base);
>
> is pretty much useless, at least w/respect to ELF vmcores, since we need to know its
> value in order to translate the address. And I don't think that makedumpfile uses
> it when it calculates the phys_base that it stores in compressed kdump headers. Why
> not put its value instead of its address?
>

Yes, I've also noticed this fact. Anyway, I'll post a patch to improve this phys_base
handling and a patch to export the relocation information in VMCOREINFO.

--
Thanks.
HATAYAMA, Daisuke

2013-10-09 14:15:10

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 6/7] x86, kaslr: report kernel offset on panic

On 10/09/2013 03:04 AM, HATAYAMA Daisuke wrote:
>>>
>>> Sorry, I don't understand why relocation size cannot be calculated when
>>> CONFIG_PHYSICAL_START > CONFIG_PHYSICAL_ALIGN. Could you explain that?
>>
>> I just meant that when CONFIG_PHYSICAL_START > CONFIG_PHYSICAL_ALIGN,
>> the 32-bit x86 kernel
>> gets relocated (like the secondary kdump kernel), but that information
>> is not readily available
>> from the vmlinux/vmcore pair.
>>
>
> My understanding on CONFIG_PHYSICAL_ALIGN was that starting address of
> kernel text area
> is always rounded up to CONFIG_PHYSICAL_ALIGN, only. Your explanation
> would be part I don't
> understand well. I'll reconsider it locally...
>

If CONFIG_PHYSICAL_START == CONFIG_PHYSICAL_ALIGN, then it is very
likely that the kernel (in the absence of kASLR) will be run at the
CONFIG_PHYSICAL_START address, as the initial loading address, usually 1
MB, will be rounded up to CONFIG_PHYSICAL_ALIGN.

Since CONFIG_PHYSICAL_START is the unrelocated linking address, they end
up matching.

-hpa

2013-10-09 18:06:06

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 6/7] x86, kaslr: report kernel offset on panic

On Wed, Oct 9, 2013 at 3:04 AM, HATAYAMA Daisuke
<[email protected]> wrote:
> (2013/10/08 22:38), Dave Anderson wrote:
>>
>>
>>
>> ----- Original Message -----
>>>
>>> (2013/10/07 22:21), Dave Anderson wrote:
>>>
>>>> ----- Original Message -----
>>>>>
>>>>> (2013/10/03 22:47), Dave Anderson wrote:
>>>
>>>
>>>>>> ----- Original Message -----
>>>>>>>
>>>>>>> (2013/10/02 18:13), HATAYAMA Daisuke wrote:
>>>>>>>>
>>>>>>>> (2013/10/02 16:48), Kees Cook wrote:
>>>
>>>
>>>>>
>>>>> Thanks for the detailed explanation. So, there's already a feature in the crash
>>>>> utility to address relocation! Though it's better for me to try it to check
>>>>> if it's really applicable to this feature. My concern is whether --reloc works
>>>>> well on x86_64 too, because relocation has never been done on x86_64, right?
>>>>
>>>>
>>>> Correct.
>>>>
>>>>> Another concern is that in the case of relocation, users need to supply
>>>>> additional information regarding runtime symbols to the crash utility.
>>>>> I want to avoid additional steps; automation is preferable if possible.
>>>>
>>>>
>>>> Right. As I mentioned in the case of 32-bit x86 dumpfiles, there is no
>>>> automation
>>>> available when CONFIG_PHYSICAL_START is larger than
>>>> CONFIG_PHYSICAL_ALIGN. The user
>>>> either has to be aware of their values in order to calculate the --reloc
>>>> argument,
>>>> or has to capture a copy of the /proc/kallsyms file on the crashed
>>>> system. Typically
>>>> users/distros using kdump changed their x86 configurations to avoid
>>>> having to deal
>>>> with that.
>>>>
>>>
>>> Sorry, I don't understand why relocation size cannot be calculated when
>>> CONFIG_PHYSICAL_START > CONFIG_PHYSICAL_ALIGN. Could you explain that?
>>
>>
>> I just meant that when CONFIG_PHYSICAL_START > CONFIG_PHYSICAL_ALIGN, the
>> 32-bit x86 kernel
>> gets relocated (like the secondary kdump kernel), but that information is
>> not readily available
>> from the vmlinux/vmcore pair.
>>
>
> My understanding on CONFIG_PHYSICAL_ALIGN was that starting address of
> kernel text area
> is always rounded up to CONFIG_PHYSICAL_ALIGN, only. Your explanation would
> be part I don't
> understand well. I'll reconsider it locally...
>
>
>>>
>>>>> I guess it's enough if there's runtime symbol addresses because we can
>>>>> get relocated
>>>>> offset value by comparing it with the compile-time symbol address
>>>>> contained in
>>>>> a given debuginfo file. Candidates for such symbols are the ones
>>>>> contained in
>>>>> VMCOREINFO note containing some symbol values for makedumpfile to refer
>>>>> to mm-related
>>>>> objects in kernel, which is always contained in vmcore generated by
>>>>> current kdump and
>>>>> also vmcores converted by makedumpfile from it. How about this idea?
>>>>
>>>>
>>>> But how would that differ from using an incorrect (non-matching) vmlinux
>>>> file?
>>>>
>>>
>>> It seems to me almost similar to what crash currently does even if we do
>>> relocation check.
>>> The current check crash currently does is trial-and-error since there's
>>> no information
>>> indicating given vmcore and vmlinux certainly match well.
>>>
>>> For example, the process I imagine is:
>>>
>>> 1) try to match vmcore and vmlinux with no relocation. If that fails, go
>>> to 2).
>>> 2) try to match vmcore and vmlinux with relocation.
>>>
>>> The two steps include symbol table initialization so it might actually be
>>> difficult to
>>> resume back from 2) to 1).
>>>
>>> Also, if gap due to phys_base and gap due to relocation can happen at the
>>> same time,
>>> calculating two values automatically might be further complicated. So, it
>>> would be better
>>> to add relocation value in VMCOREINFO. Then, what crash utility should do
>>> becomes very simple.
>>
>>
>> Yes please...
>>
>> And while you're at it, the kernel's
>>
>> VMCOREINFO_SYMBOL(phys_base);
>>
>> is pretty much useless, at least w/respect to ELF vmcores, since we need
>> to know its
>> value in order to translate the address. And I don't think that
>> makedumpfile uses
>> it when it calculates the phys_base that it stores in compressed kdump
>> headers. Why
>> not put its value instead of its address?
>>
>
> Yes, I've also noticed this fact. Anyway, I'll post a patch to improve
> this phys_base handling
> and a patch to export relocation information in VMCOREINFO.

Cool; can you keep me CCed on those patches? I'm interested in seeing
and testing it too.

Thanks!

-Kees

--
Kees Cook
Chrome OS Security