2014-01-20 16:48:21

by H. Peter Anvin

Subject: [GIT PULL] x86/kaslr for v3.14

Hi Linus,

This enables kernel address space randomization for x86.

The following changes since commit d0e639c9e06d44e713170031fe05fb60ebe680af:

Linux 3.12-rc4 (2013-10-06 14:00:20 -0700)

are available in the git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-kaslr-for-linus

The head of this tree is da2b6fb990cf782b18952f534ec7323453bc4fc9.

x86, kaslr: Clarify RANDOMIZE_BASE_MAX_OFFSET (2014-01-14 10:45:56 -0800)

----------------------------------------------------------------
H. Peter Anvin (2):
x86, boot: Rename get_flags() and check_flags() to *_cpuflags()
x86, kaslr: Add a circular multiply for better bit diffusion

Kees Cook (10):
x86, boot: Move CPU flags out of cpucheck
x86, kaslr: Return location from decompress_kernel
x86, kaslr: Provide randomness functions
x86, kaslr: Select random position from e820 maps
x86, kaslr: Report kernel offset on panic
x86, kaslr: Raise the maximum virtual address to -1 GiB on x86_64
x86/relocs: Add percpu fixup for GNU ld 2.23
x86, kaslr: Mix entropy sources together as needed
x86, kaslr: Use char array to gain sizeof sanity
x86, kaslr: Clarify RANDOMIZE_BASE_MAX_OFFSET

Michael Davidson (1):
x86, relocs: Add more per-cpu gold special cases

Wei Yongjun (1):
x86, kaslr: Remove unused including <linux/version.h>

Documentation/kernel-parameters.txt | 4 +
arch/x86/Kconfig | 59 +++++-
arch/x86/boot/Makefile | 2 +-
arch/x86/boot/boot.h | 10 +-
arch/x86/boot/compressed/Makefile | 2 +-
arch/x86/boot/compressed/aslr.c | 316 ++++++++++++++++++++++++++++++++
arch/x86/boot/compressed/cmdline.c | 2 +-
arch/x86/boot/compressed/cpuflags.c | 12 ++
arch/x86/boot/compressed/head_32.S | 10 +-
arch/x86/boot/compressed/head_64.S | 16 +-
arch/x86/boot/compressed/misc.c | 18 +-
arch/x86/boot/compressed/misc.h | 37 +++-
arch/x86/boot/cpucheck.c | 100 +---------
arch/x86/boot/cpuflags.c | 104 +++++++++++
arch/x86/boot/cpuflags.h | 19 ++
arch/x86/include/asm/archrandom.h | 21 +++
arch/x86/include/asm/page_64_types.h | 15 +-
arch/x86/include/asm/pgtable_64_types.h | 2 +-
arch/x86/kernel/cpu/rdrand.c | 14 --
arch/x86/kernel/setup.c | 26 +++
arch/x86/mm/init_32.c | 3 +
arch/x86/tools/relocs.c | 20 +-
22 files changed, 654 insertions(+), 158 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index fcbb736d55fe..773fc4c077b4 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1975,6 +1975,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
noapic [SMP,APIC] Tells the kernel to not make use of any
IOAPICs that may be present in the system.

+ nokaslr [X86]
+ Disable kernel base offset ASLR (Address Space
+ Layout Randomization) if built into the kernel.
+
noautogroup Disable scheduler automatic task group creation.

nobats [PPC] Do not use BATs for mapping kernel lowmem
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ee2fb9d37745..5c9e19dccf2f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1722,16 +1722,67 @@ config RELOCATABLE

Note: If CONFIG_RELOCATABLE=y, then the kernel runs from the address
it has been loaded at and the compile time physical address
- (CONFIG_PHYSICAL_START) is ignored.
+ (CONFIG_PHYSICAL_START) is used as the minimum location.

-# Relocation on x86-32 needs some additional build support
+config RANDOMIZE_BASE
+ bool "Randomize the address of the kernel image"
+ depends on RELOCATABLE
+ depends on !HIBERNATION
+ default n
+ ---help---
+ Randomizes the physical and virtual address at which the
+ kernel image is decompressed, as a security feature that
+ deters exploit attempts relying on knowledge of the location
+ of kernel internals.
+
+ Entropy is generated using the RDRAND instruction if it is
+ supported. If RDTSC is supported, it is used as well. If
+ neither RDRAND nor RDTSC are supported, then randomness is
+ read from the i8254 timer.
+
+ The kernel will be offset by up to RANDOMIZE_BASE_MAX_OFFSET,
+ and aligned according to PHYSICAL_ALIGN. Since the kernel is
+ built using 2GiB addressing, and PHYSICAL_ALIGN must be at a
+ minimum of 2MiB, only 10 bits of entropy is theoretically
+ possible. At best, due to page table layouts, 64-bit can use
+ 9 bits of entropy and 32-bit uses 8 bits.
+
+ If unsure, say N.
+
+config RANDOMIZE_BASE_MAX_OFFSET
+ hex "Maximum kASLR offset allowed" if EXPERT
+ depends on RANDOMIZE_BASE
+ range 0x0 0x20000000 if X86_32
+ default "0x20000000" if X86_32
+ range 0x0 0x40000000 if X86_64
+ default "0x40000000" if X86_64
+ ---help---
+ The lesser of RANDOMIZE_BASE_MAX_OFFSET and available physical
+ memory is used to determine the maximal offset in bytes that will
+ be applied to the kernel when kernel Address Space Layout
+ Randomization (kASLR) is active. This must be a multiple of
+ PHYSICAL_ALIGN.
+
+ On 32-bit this is limited to 512MiB by page table layouts. The
+ default is 512MiB.
+
+ On 64-bit this is limited by how the kernel fixmap page table is
+ positioned, so this cannot be larger than 1GiB currently. Without
+ RANDOMIZE_BASE, there is a 512MiB to 1.5GiB split between kernel
+ and modules. When RANDOMIZE_BASE_MAX_OFFSET is above 512MiB, the
+ modules area will shrink to compensate, up to the current maximum
+ 1GiB to 1GiB split. The default is 1GiB.
+
+ If unsure, leave at the default value.
+
+# Relocation on x86 needs some additional build support
config X86_NEED_RELOCS
def_bool y
- depends on X86_32 && RELOCATABLE
+ depends on RANDOMIZE_BASE || (X86_32 && RELOCATABLE)

config PHYSICAL_ALIGN
hex "Alignment value to which kernel should be aligned"
- default "0x1000000"
+ default "0x200000"
range 0x2000 0x1000000 if X86_32
range 0x200000 0x1000000 if X86_64
---help---
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index 379814bc41e3..0da2e37b37c3 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -20,7 +20,7 @@ targets := vmlinux.bin setup.bin setup.elf bzImage
targets += fdimage fdimage144 fdimage288 image.iso mtools.conf
subdir- := compressed

-setup-y += a20.o bioscall.o cmdline.o copy.o cpu.o cpucheck.o
+setup-y += a20.o bioscall.o cmdline.o copy.o cpu.o cpuflags.o cpucheck.o
setup-y += early_serial_console.o edd.o header.o main.o mca.o memory.o
setup-y += pm.o pmjump.o printf.o regs.o string.o tty.o video.o
setup-y += video-mode.o version.o
diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h
index ef72baeff484..50f8c5e0f37e 100644
--- a/arch/x86/boot/boot.h
+++ b/arch/x86/boot/boot.h
@@ -26,9 +26,8 @@
#include <asm/boot.h>
#include <asm/setup.h>
#include "bitops.h"
-#include <asm/cpufeature.h>
-#include <asm/processor-flags.h>
#include "ctype.h"
+#include "cpuflags.h"

/* Useful macros */
#define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
@@ -307,14 +306,7 @@ static inline int cmdline_find_option_bool(const char *option)
return __cmdline_find_option_bool(cmd_line_ptr, option);
}

-
/* cpu.c, cpucheck.c */
-struct cpu_features {
- int level; /* Family, or 64 for x86-64 */
- int model;
- u32 flags[NCAPINTS];
-};
-extern struct cpu_features cpu;
int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr);
int validate_cpu(void);

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index dcd90df10ab4..ae8b5dbbd8c5 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -27,7 +27,7 @@ HOST_EXTRACFLAGS += -I$(srctree)/tools/include

VMLINUX_OBJS = $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
$(obj)/string.o $(obj)/cmdline.o $(obj)/early_serial_console.o \
- $(obj)/piggy.o
+ $(obj)/piggy.o $(obj)/cpuflags.o $(obj)/aslr.o

$(obj)/eboot.o: KBUILD_CFLAGS += -fshort-wchar -mno-red-zone

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
new file mode 100644
index 000000000000..90a21f430117
--- /dev/null
+++ b/arch/x86/boot/compressed/aslr.c
@@ -0,0 +1,316 @@
+#include "misc.h"
+
+#ifdef CONFIG_RANDOMIZE_BASE
+#include <asm/msr.h>
+#include <asm/archrandom.h>
+#include <asm/e820.h>
+
+#include <generated/compile.h>
+#include <linux/module.h>
+#include <linux/uts.h>
+#include <linux/utsname.h>
+#include <generated/utsrelease.h>
+
+/* Simplified build-specific string for starting entropy. */
+static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
+ LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
+
+#define I8254_PORT_CONTROL 0x43
+#define I8254_PORT_COUNTER0 0x40
+#define I8254_CMD_READBACK 0xC0
+#define I8254_SELECT_COUNTER0 0x02
+#define I8254_STATUS_NOTREADY 0x40
+static inline u16 i8254(void)
+{
+ u16 status, timer;
+
+ do {
+ outb(I8254_PORT_CONTROL,
+ I8254_CMD_READBACK | I8254_SELECT_COUNTER0);
+ status = inb(I8254_PORT_COUNTER0);
+ timer = inb(I8254_PORT_COUNTER0);
+ timer |= inb(I8254_PORT_COUNTER0) << 8;
+ } while (status & I8254_STATUS_NOTREADY);
+
+ return timer;
+}
+
+static unsigned long rotate_xor(unsigned long hash, const void *area,
+ size_t size)
+{
+ size_t i;
+ unsigned long *ptr = (unsigned long *)area;
+
+ for (i = 0; i < size / sizeof(hash); i++) {
+ /* Rotate by odd number of bits and XOR. */
+ hash = (hash << ((sizeof(hash) * 8) - 7)) | (hash >> 7);
+ hash ^= ptr[i];
+ }
+
+ return hash;
+}
+
+/* Attempt to create a simple but unpredictable starting entropy. */
+static unsigned long get_random_boot(void)
+{
+ unsigned long hash = 0;
+
+ hash = rotate_xor(hash, build_str, sizeof(build_str));
+ hash = rotate_xor(hash, real_mode, sizeof(*real_mode));
+
+ return hash;
+}
+
+static unsigned long get_random_long(void)
+{
+#ifdef CONFIG_X86_64
+ const unsigned long mix_const = 0x5d6008cbf3848dd3UL;
+#else
+ const unsigned long mix_const = 0x3f39e593UL;
+#endif
+ unsigned long raw, random = get_random_boot();
+ bool use_i8254 = true;
+
+ debug_putstr("KASLR using");
+
+ if (has_cpuflag(X86_FEATURE_RDRAND)) {
+ debug_putstr(" RDRAND");
+ if (rdrand_long(&raw)) {
+ random ^= raw;
+ use_i8254 = false;
+ }
+ }
+
+ if (has_cpuflag(X86_FEATURE_TSC)) {
+ debug_putstr(" RDTSC");
+ rdtscll(raw);
+
+ random ^= raw;
+ use_i8254 = false;
+ }
+
+ if (use_i8254) {
+ debug_putstr(" i8254");
+ random ^= i8254();
+ }
+
+ /* Circular multiply for better bit diffusion */
+ asm("mul %3"
+ : "=a" (random), "=d" (raw)
+ : "a" (random), "rm" (mix_const));
+ random += raw;
+
+ debug_putstr("...\n");
+
+ return random;
+}
+
+struct mem_vector {
+ unsigned long start;
+ unsigned long size;
+};
+
+#define MEM_AVOID_MAX 5
+struct mem_vector mem_avoid[MEM_AVOID_MAX];
+
+static bool mem_contains(struct mem_vector *region, struct mem_vector *item)
+{
+ /* Item at least partially before region. */
+ if (item->start < region->start)
+ return false;
+ /* Item at least partially after region. */
+ if (item->start + item->size > region->start + region->size)
+ return false;
+ return true;
+}
+
+static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
+{
+ /* Item one is entirely before item two. */
+ if (one->start + one->size <= two->start)
+ return false;
+ /* Item one is entirely after item two. */
+ if (one->start >= two->start + two->size)
+ return false;
+ return true;
+}
+
+static void mem_avoid_init(unsigned long input, unsigned long input_size,
+ unsigned long output, unsigned long output_size)
+{
+ u64 initrd_start, initrd_size;
+ u64 cmd_line, cmd_line_size;
+ unsigned long unsafe, unsafe_len;
+ char *ptr;
+
+ /*
+ * Avoid the region that is unsafe to overlap during
+ * decompression (see calculations at top of misc.c).
+ */
+ unsafe_len = (output_size >> 12) + 32768 + 18;
+ unsafe = (unsigned long)input + input_size - unsafe_len;
+ mem_avoid[0].start = unsafe;
+ mem_avoid[0].size = unsafe_len;
+
+ /* Avoid initrd. */
+ initrd_start = (u64)real_mode->ext_ramdisk_image << 32;
+ initrd_start |= real_mode->hdr.ramdisk_image;
+ initrd_size = (u64)real_mode->ext_ramdisk_size << 32;
+ initrd_size |= real_mode->hdr.ramdisk_size;
+ mem_avoid[1].start = initrd_start;
+ mem_avoid[1].size = initrd_size;
+
+ /* Avoid kernel command line. */
+ cmd_line = (u64)real_mode->ext_cmd_line_ptr << 32;
+ cmd_line |= real_mode->hdr.cmd_line_ptr;
+ /* Calculate size of cmd_line. */
+ ptr = (char *)(unsigned long)cmd_line;
+ for (cmd_line_size = 0; ptr[cmd_line_size++]; )
+ ;
+ mem_avoid[2].start = cmd_line;
+ mem_avoid[2].size = cmd_line_size;
+
+ /* Avoid heap memory. */
+ mem_avoid[3].start = (unsigned long)free_mem_ptr;
+ mem_avoid[3].size = BOOT_HEAP_SIZE;
+
+ /* Avoid stack memory. */
+ mem_avoid[4].start = (unsigned long)free_mem_end_ptr;
+ mem_avoid[4].size = BOOT_STACK_SIZE;
+}
+
+/* Does this memory vector overlap a known avoided area? */
+bool mem_avoid_overlap(struct mem_vector *img)
+{
+ int i;
+
+ for (i = 0; i < MEM_AVOID_MAX; i++) {
+ if (mem_overlaps(img, &mem_avoid[i]))
+ return true;
+ }
+
+ return false;
+}
+
+unsigned long slots[CONFIG_RANDOMIZE_BASE_MAX_OFFSET / CONFIG_PHYSICAL_ALIGN];
+unsigned long slot_max = 0;
+
+static void slots_append(unsigned long addr)
+{
+ /* Overflowing the slots list should be impossible. */
+ if (slot_max >= CONFIG_RANDOMIZE_BASE_MAX_OFFSET /
+ CONFIG_PHYSICAL_ALIGN)
+ return;
+
+ slots[slot_max++] = addr;
+}
+
+static unsigned long slots_fetch_random(void)
+{
+ /* Handle case of no slots stored. */
+ if (slot_max == 0)
+ return 0;
+
+ return slots[get_random_long() % slot_max];
+}
+
+static void process_e820_entry(struct e820entry *entry,
+ unsigned long minimum,
+ unsigned long image_size)
+{
+ struct mem_vector region, img;
+
+ /* Skip non-RAM entries. */
+ if (entry->type != E820_RAM)
+ return;
+
+ /* Ignore entries entirely above our maximum. */
+ if (entry->addr >= CONFIG_RANDOMIZE_BASE_MAX_OFFSET)
+ return;
+
+ /* Ignore entries entirely below our minimum. */
+ if (entry->addr + entry->size < minimum)
+ return;
+
+ region.start = entry->addr;
+ region.size = entry->size;
+
+ /* Potentially raise address to minimum location. */
+ if (region.start < minimum)
+ region.start = minimum;
+
+ /* Potentially raise address to meet alignment requirements. */
+ region.start = ALIGN(region.start, CONFIG_PHYSICAL_ALIGN);
+
+ /* Did we raise the address above the bounds of this e820 region? */
+ if (region.start > entry->addr + entry->size)
+ return;
+
+ /* Reduce size by any delta from the original address. */
+ region.size -= region.start - entry->addr;
+
+ /* Reduce maximum size to fit end of image within maximum limit. */
+ if (region.start + region.size > CONFIG_RANDOMIZE_BASE_MAX_OFFSET)
+ region.size = CONFIG_RANDOMIZE_BASE_MAX_OFFSET - region.start;
+
+ /* Walk each aligned slot and check for avoided areas. */
+ for (img.start = region.start, img.size = image_size ;
+ mem_contains(&region, &img) ;
+ img.start += CONFIG_PHYSICAL_ALIGN) {
+ if (mem_avoid_overlap(&img))
+ continue;
+ slots_append(img.start);
+ }
+}
+
+static unsigned long find_random_addr(unsigned long minimum,
+ unsigned long size)
+{
+ int i;
+ unsigned long addr;
+
+ /* Make sure minimum is aligned. */
+ minimum = ALIGN(minimum, CONFIG_PHYSICAL_ALIGN);
+
+ /* Verify potential e820 positions, appending to slots list. */
+ for (i = 0; i < real_mode->e820_entries; i++) {
+ process_e820_entry(&real_mode->e820_map[i], minimum, size);
+ }
+
+ return slots_fetch_random();
+}
+
+unsigned char *choose_kernel_location(unsigned char *input,
+ unsigned long input_size,
+ unsigned char *output,
+ unsigned long output_size)
+{
+ unsigned long choice = (unsigned long)output;
+ unsigned long random;
+
+ if (cmdline_find_option_bool("nokaslr")) {
+ debug_putstr("KASLR disabled...\n");
+ goto out;
+ }
+
+ /* Record the various known unsafe memory ranges. */
+ mem_avoid_init((unsigned long)input, input_size,
+ (unsigned long)output, output_size);
+
+ /* Walk e820 and find a random address. */
+ random = find_random_addr(choice, output_size);
+ if (!random) {
+ debug_putstr("KASLR could not find suitable E820 region...\n");
+ goto out;
+ }
+
+ /* Always enforce the minimum. */
+ if (random < choice)
+ goto out;
+
+ choice = random;
+out:
+ return (unsigned char *)choice;
+}
+
+#endif /* CONFIG_RANDOMIZE_BASE */
diff --git a/arch/x86/boot/compressed/cmdline.c b/arch/x86/boot/compressed/cmdline.c
index bffd73b45b1f..b68e3033e6b9 100644
--- a/arch/x86/boot/compressed/cmdline.c
+++ b/arch/x86/boot/compressed/cmdline.c
@@ -1,6 +1,6 @@
#include "misc.h"

-#ifdef CONFIG_EARLY_PRINTK
+#if CONFIG_EARLY_PRINTK || CONFIG_RANDOMIZE_BASE

static unsigned long fs;
static inline void set_fs(unsigned long seg)
diff --git a/arch/x86/boot/compressed/cpuflags.c b/arch/x86/boot/compressed/cpuflags.c
new file mode 100644
index 000000000000..aa313466118b
--- /dev/null
+++ b/arch/x86/boot/compressed/cpuflags.c
@@ -0,0 +1,12 @@
+#ifdef CONFIG_RANDOMIZE_BASE
+
+#include "../cpuflags.c"
+
+bool has_cpuflag(int flag)
+{
+ get_cpuflags();
+
+ return test_bit(flag, cpu.flags);
+}
+
+#endif
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 5d6f6891b188..9116aac232c7 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -117,9 +117,11 @@ preferred_addr:
addl %eax, %ebx
notl %eax
andl %eax, %ebx
-#else
- movl $LOAD_PHYSICAL_ADDR, %ebx
+ cmpl $LOAD_PHYSICAL_ADDR, %ebx
+ jge 1f
#endif
+ movl $LOAD_PHYSICAL_ADDR, %ebx
+1:

/* Target address to relocate to for decompression */
addl $z_extract_offset, %ebx
@@ -191,14 +193,14 @@ relocated:
leal boot_heap(%ebx), %eax
pushl %eax /* heap area */
pushl %esi /* real mode pointer */
- call decompress_kernel
+ call decompress_kernel /* returns kernel location in %eax */
addl $24, %esp

/*
* Jump to the decompressed kernel.
*/
xorl %ebx, %ebx
- jmp *%ebp
+ jmp *%eax

/*
* Stack and heap for uncompression
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index c337422b575d..c5c1ae0997e7 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -94,9 +94,11 @@ ENTRY(startup_32)
addl %eax, %ebx
notl %eax
andl %eax, %ebx
-#else
- movl $LOAD_PHYSICAL_ADDR, %ebx
+ cmpl $LOAD_PHYSICAL_ADDR, %ebx
+ jge 1f
#endif
+ movl $LOAD_PHYSICAL_ADDR, %ebx
+1:

/* Target address to relocate to for decompression */
addl $z_extract_offset, %ebx
@@ -269,9 +271,11 @@ preferred_addr:
addq %rax, %rbp
notq %rax
andq %rax, %rbp
-#else
- movq $LOAD_PHYSICAL_ADDR, %rbp
+ cmpq $LOAD_PHYSICAL_ADDR, %rbp
+ jge 1f
#endif
+ movq $LOAD_PHYSICAL_ADDR, %rbp
+1:

/* Target address to relocate to for decompression */
leaq z_extract_offset(%rbp), %rbx
@@ -339,13 +343,13 @@ relocated:
movl $z_input_len, %ecx /* input_len */
movq %rbp, %r8 /* output target address */
movq $z_output_len, %r9 /* decompressed length */
- call decompress_kernel
+ call decompress_kernel /* returns kernel location in %rax */
popq %rsi

/*
* Jump to the decompressed kernel.
*/
- jmp *%rbp
+ jmp *%rax

.code32
no_longmode:
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 434f077d2c4d..196eaf373a06 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -112,14 +112,8 @@ struct boot_params *real_mode; /* Pointer to real-mode data */
void *memset(void *s, int c, size_t n);
void *memcpy(void *dest, const void *src, size_t n);

-#ifdef CONFIG_X86_64
-#define memptr long
-#else
-#define memptr unsigned
-#endif
-
-static memptr free_mem_ptr;
-static memptr free_mem_end_ptr;
+memptr free_mem_ptr;
+memptr free_mem_end_ptr;

static char *vidmem;
static int vidport;
@@ -395,7 +389,7 @@ static void parse_elf(void *output)
free(phdrs);
}

-asmlinkage void decompress_kernel(void *rmode, memptr heap,
+asmlinkage void *decompress_kernel(void *rmode, memptr heap,
unsigned char *input_data,
unsigned long input_len,
unsigned char *output,
@@ -422,6 +416,10 @@ asmlinkage void decompress_kernel(void *rmode, memptr heap,
free_mem_ptr = heap; /* Heap */
free_mem_end_ptr = heap + BOOT_HEAP_SIZE;

+ output = choose_kernel_location(input_data, input_len,
+ output, output_len);
+
+ /* Validate memory location choices. */
if ((unsigned long)output & (MIN_KERNEL_ALIGN - 1))
error("Destination address inappropriately aligned");
#ifdef CONFIG_X86_64
@@ -441,5 +439,5 @@ asmlinkage void decompress_kernel(void *rmode, memptr heap,
parse_elf(output);
handle_relocations(output, output_len);
debug_putstr("done.\nBooting the kernel.\n");
- return;
+ return output;
}
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 674019d8e235..24e3e569a13c 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -23,7 +23,15 @@
#define BOOT_BOOT_H
#include "../ctype.h"

+#ifdef CONFIG_X86_64
+#define memptr long
+#else
+#define memptr unsigned
+#endif
+
/* misc.c */
+extern memptr free_mem_ptr;
+extern memptr free_mem_end_ptr;
extern struct boot_params *real_mode; /* Pointer to real-mode data */
void __putstr(const char *s);
#define error_putstr(__x) __putstr(__x)
@@ -39,23 +47,40 @@ static inline void debug_putstr(const char *s)

#endif

-#ifdef CONFIG_EARLY_PRINTK
-
+#if CONFIG_EARLY_PRINTK || CONFIG_RANDOMIZE_BASE
/* cmdline.c */
int cmdline_find_option(const char *option, char *buffer, int bufsize);
int cmdline_find_option_bool(const char *option);
+#endif

-/* early_serial_console.c */
-extern int early_serial_base;
-void console_init(void);

+#if CONFIG_RANDOMIZE_BASE
+/* aslr.c */
+unsigned char *choose_kernel_location(unsigned char *input,
+ unsigned long input_size,
+ unsigned char *output,
+ unsigned long output_size);
+/* cpuflags.c */
+bool has_cpuflag(int flag);
#else
+static inline
+unsigned char *choose_kernel_location(unsigned char *input,
+ unsigned long input_size,
+ unsigned char *output,
+ unsigned long output_size)
+{
+ return output;
+}
+#endif

+#ifdef CONFIG_EARLY_PRINTK
/* early_serial_console.c */
+extern int early_serial_base;
+void console_init(void);
+#else
static const int early_serial_base;
static inline void console_init(void)
{ }
-
#endif

#endif
diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c
index 4d3ff037201f..100a9a10076a 100644
--- a/arch/x86/boot/cpucheck.c
+++ b/arch/x86/boot/cpucheck.c
@@ -28,8 +28,6 @@
#include <asm/required-features.h>
#include <asm/msr-index.h>

-struct cpu_features cpu;
-static u32 cpu_vendor[3];
static u32 err_flags[NCAPINTS];

static const int req_level = CONFIG_X86_MINIMUM_CPU_FAMILY;
@@ -69,92 +67,8 @@ static int is_transmeta(void)
cpu_vendor[2] == A32('M', 'x', '8', '6');
}

-static int has_fpu(void)
-{
- u16 fcw = -1, fsw = -1;
- u32 cr0;
-
- asm("movl %%cr0,%0" : "=r" (cr0));
- if (cr0 & (X86_CR0_EM|X86_CR0_TS)) {
- cr0 &= ~(X86_CR0_EM|X86_CR0_TS);
- asm volatile("movl %0,%%cr0" : : "r" (cr0));
- }
-
- asm volatile("fninit ; fnstsw %0 ; fnstcw %1"
- : "+m" (fsw), "+m" (fcw));
-
- return fsw == 0 && (fcw & 0x103f) == 0x003f;
-}
-
-static int has_eflag(u32 mask)
-{
- u32 f0, f1;
-
- asm("pushfl ; "
- "pushfl ; "
- "popl %0 ; "
- "movl %0,%1 ; "
- "xorl %2,%1 ; "
- "pushl %1 ; "
- "popfl ; "
- "pushfl ; "
- "popl %1 ; "
- "popfl"
- : "=&r" (f0), "=&r" (f1)
- : "ri" (mask));
-
- return !!((f0^f1) & mask);
-}
-
-static void get_flags(void)
-{
- u32 max_intel_level, max_amd_level;
- u32 tfms;
-
- if (has_fpu())
- set_bit(X86_FEATURE_FPU, cpu.flags);
-
- if (has_eflag(X86_EFLAGS_ID)) {
- asm("cpuid"
- : "=a" (max_intel_level),
- "=b" (cpu_vendor[0]),
- "=d" (cpu_vendor[1]),
- "=c" (cpu_vendor[2])
- : "a" (0));
-
- if (max_intel_level >= 0x00000001 &&
- max_intel_level <= 0x0000ffff) {
- asm("cpuid"
- : "=a" (tfms),
- "=c" (cpu.flags[4]),
- "=d" (cpu.flags[0])
- : "a" (0x00000001)
- : "ebx");
- cpu.level = (tfms >> 8) & 15;
- cpu.model = (tfms >> 4) & 15;
- if (cpu.level >= 6)
- cpu.model += ((tfms >> 16) & 0xf) << 4;
- }
-
- asm("cpuid"
- : "=a" (max_amd_level)
- : "a" (0x80000000)
- : "ebx", "ecx", "edx");
-
- if (max_amd_level >= 0x80000001 &&
- max_amd_level <= 0x8000ffff) {
- u32 eax = 0x80000001;
- asm("cpuid"
- : "+a" (eax),
- "=c" (cpu.flags[6]),
- "=d" (cpu.flags[1])
- : : "ebx");
- }
- }
-}
-
/* Returns a bitmask of which words we have error bits in */
-static int check_flags(void)
+static int check_cpuflags(void)
{
u32 err;
int i;
@@ -187,8 +101,8 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr)
if (has_eflag(X86_EFLAGS_AC))
cpu.level = 4;

- get_flags();
- err = check_flags();
+ get_cpuflags();
+ err = check_cpuflags();

if (test_bit(X86_FEATURE_LM, cpu.flags))
cpu.level = 64;
@@ -207,8 +121,8 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr)
eax &= ~(1 << 15);
asm("wrmsr" : : "a" (eax), "d" (edx), "c" (ecx));

- get_flags(); /* Make sure it really did something */
- err = check_flags();
+ get_cpuflags(); /* Make sure it really did something */
+ err = check_cpuflags();
} else if (err == 0x01 &&
!(err_flags[0] & ~(1 << X86_FEATURE_CX8)) &&
is_centaur() && cpu.model >= 6) {
@@ -223,7 +137,7 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr)
asm("wrmsr" : : "a" (eax), "d" (edx), "c" (ecx));

set_bit(X86_FEATURE_CX8, cpu.flags);
- err = check_flags();
+ err = check_cpuflags();
} else if (err == 0x01 && is_transmeta()) {
/* Transmeta might have masked feature bits in word 0 */

@@ -238,7 +152,7 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr)
: : "ecx", "ebx");
asm("wrmsr" : : "a" (eax), "d" (edx), "c" (ecx));

- err = check_flags();
+ err = check_cpuflags();
}

if (err_flags_ptr)
diff --git a/arch/x86/boot/cpuflags.c b/arch/x86/boot/cpuflags.c
new file mode 100644
index 000000000000..a9fcb7cfb241
--- /dev/null
+++ b/arch/x86/boot/cpuflags.c
@@ -0,0 +1,104 @@
+#include <linux/types.h>
+#include "bitops.h"
+
+#include <asm/processor-flags.h>
+#include <asm/required-features.h>
+#include <asm/msr-index.h>
+#include "cpuflags.h"
+
+struct cpu_features cpu;
+u32 cpu_vendor[3];
+
+static bool loaded_flags;
+
+static int has_fpu(void)
+{
+ u16 fcw = -1, fsw = -1;
+ unsigned long cr0;
+
+ asm volatile("mov %%cr0,%0" : "=r" (cr0));
+ if (cr0 & (X86_CR0_EM|X86_CR0_TS)) {
+ cr0 &= ~(X86_CR0_EM|X86_CR0_TS);
+ asm volatile("mov %0,%%cr0" : : "r" (cr0));
+ }
+
+ asm volatile("fninit ; fnstsw %0 ; fnstcw %1"
+ : "+m" (fsw), "+m" (fcw));
+
+ return fsw == 0 && (fcw & 0x103f) == 0x003f;
+}
+
+int has_eflag(unsigned long mask)
+{
+ unsigned long f0, f1;
+
+ asm volatile("pushf \n\t"
+ "pushf \n\t"
+ "pop %0 \n\t"
+ "mov %0,%1 \n\t"
+ "xor %2,%1 \n\t"
+ "push %1 \n\t"
+ "popf \n\t"
+ "pushf \n\t"
+ "pop %1 \n\t"
+ "popf"
+ : "=&r" (f0), "=&r" (f1)
+ : "ri" (mask));
+
+ return !!((f0^f1) & mask);
+}
+
+/* Handle x86_32 PIC using ebx. */
+#if defined(__i386__) && defined(__PIC__)
+# define EBX_REG "=r"
+#else
+# define EBX_REG "=b"
+#endif
+
+static inline void cpuid(u32 id, u32 *a, u32 *b, u32 *c, u32 *d)
+{
+ asm volatile(".ifnc %%ebx,%3 ; movl %%ebx,%3 ; .endif \n\t"
+ "cpuid \n\t"
+ ".ifnc %%ebx,%3 ; xchgl %%ebx,%3 ; .endif \n\t"
+ : "=a" (*a), "=c" (*c), "=d" (*d), EBX_REG (*b)
+ : "a" (id)
+ );
+}
+
+void get_cpuflags(void)
+{
+ u32 max_intel_level, max_amd_level;
+ u32 tfms;
+ u32 ignored;
+
+ if (loaded_flags)
+ return;
+ loaded_flags = true;
+
+ if (has_fpu())
+ set_bit(X86_FEATURE_FPU, cpu.flags);
+
+ if (has_eflag(X86_EFLAGS_ID)) {
+ cpuid(0x0, &max_intel_level, &cpu_vendor[0], &cpu_vendor[2],
+ &cpu_vendor[1]);
+
+ if (max_intel_level >= 0x00000001 &&
+ max_intel_level <= 0x0000ffff) {
+ cpuid(0x1, &tfms, &ignored, &cpu.flags[4],
+ &cpu.flags[0]);
+ cpu.level = (tfms >> 8) & 15;
+ cpu.model = (tfms >> 4) & 15;
+ if (cpu.level >= 6)
+ cpu.model += ((tfms >> 16) & 0xf) << 4;
+ }
+
+ cpuid(0x80000000, &max_amd_level, &ignored, &ignored,
+ &ignored);
+
+ if (max_amd_level >= 0x80000001 &&
+ max_amd_level <= 0x8000ffff) {
+ cpuid(0x80000001, &ignored, &ignored, &cpu.flags[6],
+ &cpu.flags[1]);
+ }
+ }
+}
diff --git a/arch/x86/boot/cpuflags.h b/arch/x86/boot/cpuflags.h
new file mode 100644
index 000000000000..ea97697e51e4
--- /dev/null
+++ b/arch/x86/boot/cpuflags.h
@@ -0,0 +1,19 @@
+#ifndef BOOT_CPUFLAGS_H
+#define BOOT_CPUFLAGS_H
+
+#include <asm/cpufeature.h>
+#include <asm/processor-flags.h>
+
+struct cpu_features {
+ int level; /* Family, or 64 for x86-64 */
+ int model;
+ u32 flags[NCAPINTS];
+};
+
+extern struct cpu_features cpu;
+extern u32 cpu_vendor[3];
+
+int has_eflag(unsigned long mask);
+void get_cpuflags(void);
+
+#endif
diff --git a/arch/x86/include/asm/archrandom.h b/arch/x86/include/asm/archrandom.h
index 0d9ec770f2f8..e6a92455740e 100644
--- a/arch/x86/include/asm/archrandom.h
+++ b/arch/x86/include/asm/archrandom.h
@@ -39,6 +39,20 @@

#ifdef CONFIG_ARCH_RANDOM

+/* Instead of arch_get_random_long() when alternatives haven't run. */
+static inline int rdrand_long(unsigned long *v)
+{
+ int ok;
+ asm volatile("1: " RDRAND_LONG "\n\t"
+ "jc 2f\n\t"
+ "decl %0\n\t"
+ "jnz 1b\n\t"
+ "2:"
+ : "=r" (ok), "=a" (*v)
+ : "0" (RDRAND_RETRY_LOOPS));
+ return ok;
+}
+
#define GET_RANDOM(name, type, rdrand, nop) \
static inline int name(type *v) \
{ \
@@ -68,6 +82,13 @@ GET_RANDOM(arch_get_random_int, unsigned int, RDRAND_INT, ASM_NOP3);

#endif /* CONFIG_X86_64 */

+#else
+
+static inline int rdrand_long(unsigned long *v)
+{
+ return 0;
+}
+
#endif /* CONFIG_ARCH_RANDOM */

extern void x86_init_rdrand(struct cpuinfo_x86 *c);
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 43dcd804ebd5..8de6d9cf3b95 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -39,9 +39,18 @@
#define __VIRTUAL_MASK_SHIFT 47

/*
- * Kernel image size is limited to 512 MB (see level2_kernel_pgt in
- * arch/x86/kernel/head_64.S), and it is mapped here:
+ * Kernel image size is limited to 1GiB due to the fixmap living in the
+ * next 1GiB (see level2_kernel_pgt in arch/x86/kernel/head_64.S). Use
+ * 512MiB by default, leaving 1.5GiB for modules once the page tables
+ * are fully set up. If kernel ASLR is configured, it can extend the
+ * kernel page table mapping, reducing the size of the modules area.
*/
-#define KERNEL_IMAGE_SIZE (512 * 1024 * 1024)
+#define KERNEL_IMAGE_SIZE_DEFAULT (512 * 1024 * 1024)
+#if defined(CONFIG_RANDOMIZE_BASE) && \
+ CONFIG_RANDOMIZE_BASE_MAX_OFFSET > KERNEL_IMAGE_SIZE_DEFAULT
+#define KERNEL_IMAGE_SIZE CONFIG_RANDOMIZE_BASE_MAX_OFFSET
+#else
+#define KERNEL_IMAGE_SIZE KERNEL_IMAGE_SIZE_DEFAULT
+#endif

#endif /* _ASM_X86_PAGE_64_DEFS_H */
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 2d883440cb9a..c883bf726398 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -58,7 +58,7 @@ typedef struct { pteval_t pte; } pte_t;
#define VMALLOC_START _AC(0xffffc90000000000, UL)
#define VMALLOC_END _AC(0xffffe8ffffffffff, UL)
#define VMEMMAP_START _AC(0xffffea0000000000, UL)
-#define MODULES_VADDR _AC(0xffffffffa0000000, UL)
+#define MODULES_VADDR (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
#define MODULES_END _AC(0xffffffffff000000, UL)
#define MODULES_LEN (MODULES_END - MODULES_VADDR)

diff --git a/arch/x86/kernel/cpu/rdrand.c b/arch/x86/kernel/cpu/rdrand.c
index 88db010845cb..384df5105fbc 100644
--- a/arch/x86/kernel/cpu/rdrand.c
+++ b/arch/x86/kernel/cpu/rdrand.c
@@ -31,20 +31,6 @@ static int __init x86_rdrand_setup(char *s)
}
__setup("nordrand", x86_rdrand_setup);

-/* We can't use arch_get_random_long() here since alternatives haven't run */
-static inline int rdrand_long(unsigned long *v)
-{
- int ok;
- asm volatile("1: " RDRAND_LONG "\n\t"
- "jc 2f\n\t"
- "decl %0\n\t"
- "jnz 1b\n\t"
- "2:"
- : "=r" (ok), "=a" (*v)
- : "0" (RDRAND_RETRY_LOOPS));
- return ok;
-}
-
/*
* Force a reseed cycle; we are architecturally guaranteed a reseed
* after no more than 512 128-bit chunks of random data. This also
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index f0de6294b955..1708862fc40d 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -824,6 +824,20 @@ static void __init trim_low_memory_range(void)
}

/*
+ * Dump out kernel offset information on panic.
+ */
+static int
+dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
+{
+ pr_emerg("Kernel Offset: 0x%lx from 0x%lx "
+ "(relocation range: 0x%lx-0x%lx)\n",
+ (unsigned long)&_text - __START_KERNEL, __START_KERNEL,
+ __START_KERNEL_map, MODULES_VADDR-1);
+
+ return 0;
+}
+
+/*
* Determine if we were loaded by an EFI loader. If so, then we have also been
* passed the efi memmap, systab, etc., so we should use these data structures
* for initialization. Note, the efi init code path is determined by the
@@ -1242,3 +1256,15 @@ void __init i386_reserve_resources(void)
}

#endif /* CONFIG_X86_32 */
+
+static struct notifier_block kernel_offset_notifier = {
+ .notifier_call = dump_kernel_offset
+};
+
+static int __init register_kernel_offset_dumper(void)
+{
+ atomic_notifier_chain_register(&panic_notifier_list,
+ &kernel_offset_notifier);
+ return 0;
+}
+__initcall(register_kernel_offset_dumper);
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 4287f1ffba7e..5bdc5430597c 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -806,6 +806,9 @@ void __init mem_init(void)
BUILD_BUG_ON(VMALLOC_START >= VMALLOC_END);
#undef high_memory
#undef __FIXADDR_TOP
+#ifdef CONFIG_RANDOMIZE_BASE
+ BUILD_BUG_ON(CONFIG_RANDOMIZE_BASE_MAX_OFFSET > KERNEL_IMAGE_SIZE);
+#endif

#ifdef CONFIG_HIGHMEM
BUG_ON(PKMAP_BASE + LAST_PKMAP*PAGE_SIZE > FIXADDR_START);
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index f7bab68a4b83..11f9285a2ff6 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -722,15 +722,25 @@ static void percpu_init(void)

/*
* Check to see if a symbol lies in the .data..percpu section.
- * For some as yet not understood reason the "__init_begin"
- * symbol which immediately preceeds the .data..percpu section
- * also shows up as it it were part of it so we do an explict
- * check for that symbol name and ignore it.
+ *
+ * The linker incorrectly associates some symbols with the
+ * .data..percpu section so we also need to check the symbol
+ * name to make sure that we classify the symbol correctly.
+ *
+ * The GNU linker incorrectly associates:
+ * __init_begin
+ * __per_cpu_load
+ *
+ * The "gold" linker incorrectly associates:
+ * init_per_cpu__irq_stack_union
+ * init_per_cpu__gdt_page
*/
static int is_percpu_sym(ElfW(Sym) *sym, const char *symname)
{
return (sym->st_shndx == per_cpu_shndx) &&
- strcmp(symname, "__init_begin");
+ strcmp(symname, "__init_begin") &&
+ strcmp(symname, "__per_cpu_load") &&
+ strncmp(symname, "init_per_cpu_", 13);
}
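
As a rough aside on the entropy numbers quoted in the RANDOMIZE_BASE help
text above: the little program below is illustrative only and is not part of
the patch, but it performs the same division that aslr.c uses to size its
slots[] array (maximum offset over physical alignment), which is where the
"9 bits / 8 bits" figures come from.

#include <stdio.h>

int main(void)
{
        /* Defaults taken from the Kconfig entries added above. */
        unsigned long max_offset_64 = 0x40000000UL; /* 1 GiB   on x86_64 */
        unsigned long max_offset_32 = 0x20000000UL; /* 512 MiB on x86_32 */
        unsigned long align         = 0x200000UL;   /* CONFIG_PHYSICAL_ALIGN, 2 MiB */

        /* Candidate load slots and the bits of entropy they provide. */
        printf("x86_64: %lu slots (~9 bits)\n", max_offset_64 / align); /* 512 */
        printf("x86_32: %lu slots (~8 bits)\n", max_offset_32 / align); /* 256 */
        return 0;
}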


2014-01-20 22:54:15

by Linus Torvalds

Subject: Re: [GIT PULL] x86/kaslr for v3.14

So I pulled this, but one question:

On Mon, Jan 20, 2014 at 8:47 AM, H. Peter Anvin <[email protected]> wrote:
> +config RANDOMIZE_BASE
> + bool "Randomize the address of the kernel image"
> + depends on RELOCATABLE
> + depends on !HIBERNATION

How fundamental is that "!HIBERNATION" issue? Right now that
anti-dependency on hibernation support will mean that no distro kernel
will actually use the kernel address space randomization. Which
long-term is a problem.

I'm not sure HIBERNATION is really getting all that much use, but I
suspect distros would still want to support it.

Is it just a temporary "I wasn't able to make it work, need to get
some PM people involved", or is it something really fundamental?

Linus

2014-01-20 23:00:55

by H. Peter Anvin

Subject: Re: [GIT PULL] x86/kaslr for v3.14

On 01/20/2014 02:54 PM, Linus Torvalds wrote:
> So I pulled this, but one question:
>
> On Mon, Jan 20, 2014 at 8:47 AM, H. Peter Anvin <[email protected]> wrote:
>> +config RANDOMIZE_BASE
>> + bool "Randomize the address of the kernel image"
>> + depends on RELOCATABLE
>> + depends on !HIBERNATION
>
> How fundamental is that "!HIBERNATION" issue? Right now that
> anti-dependency on hibernation support will mean that no distro kernel
> will actually use the kernel address space randomization. Which
> long-term is a problem.
>
> I'm not sure HIBERNATION is really getting all that much use, but I
> suspect distros would still want to support it.
>
> Is it just a temporary "I wasn't able to make it work, need to get
> some PM people involved", or is it something really fundamental?
>

Kees, could you comment?

-hpa

2014-01-20 23:12:17

by Linus Torvalds

Subject: Re: [GIT PULL] x86/kaslr for v3.14

On Mon, Jan 20, 2014 at 2:54 PM, Linus Torvalds
<[email protected]> wrote:
> So I pulled this, but one question:

.. oh, and since I decided to test it, and was looking for problems:
enabling kaslr breaks "perf". The *profile* looks fine, but the
disassembly doesn't work.

I'm not entirely surprised. I decided I wanted to test it for a
reason, after all. So it's not unexpected, but perhaps people hadn't
thought about it, and clearly hadn't tested it.

Kernel modules disassemble fine, so clearly perf knows about code that
moves around, but apparently it gets surprised when the core vmlinux
file disassembly doesn't match addresses.

Linus

2014-01-20 23:14:19

by H. Peter Anvin

Subject: Re: [GIT PULL] x86/kaslr for v3.14

On 01/20/2014 03:12 PM, Linus Torvalds wrote:
> On Mon, Jan 20, 2014 at 2:54 PM, Linus Torvalds
> <[email protected]> wrote:
>> So I pulled this, but one question:
>
> .. oh, and since I decided to test it, and was looking for problems:
> enabling kaslr breaks "perf". The *profile* looks fine, but the
> disassembly doesn't work.
>
> I'm not entirely surprised. I decided I wanted to test it for a
> reason, after all. So it's not unexpected, but perhaps people hadn't
> thought about it, and clearly hadn't tested it.
>
> Kernel modules disassemble fine, so clearly perf knows about code that
> moves around, but apparently it gets surprised when the core vmlinux
> file disassembly doesn't match addresses.
>

So this is presumably something that needs to be fixed in perf?

-hpa

2014-01-21 05:18:48

by Kees Cook

Subject: Re: [GIT PULL] x86/kaslr for v3.14

On Mon, Jan 20, 2014 at 2:54 PM, Linus Torvalds
<[email protected]> wrote:
> So I pulled this, but one question:
>
> On Mon, Jan 20, 2014 at 8:47 AM, H. Peter Anvin <[email protected]> wrote:
>> +config RANDOMIZE_BASE
>> + bool "Randomize the address of the kernel image"
>> + depends on RELOCATABLE
>> + depends on !HIBERNATION
>
> How fundamental is that "!HIBERNATION" issue? Right now that
> anti-dependency on hibernation support will mean that no distro kernel
> will actually use the kernel address space randomization. Which
> long-term is a problem.
>
> I'm not sure HIBERNATION is really getting all that much use, but I
> suspect distros would still want to support it.
>
> Is it just a temporary "I wasn't able to make it work, need to get
> some PM people involved", or is it something really fundamental?

Right, this is a "need to get PM people involved" situation. When
kASLR was being designed, hibernation learning about the kernel base
looked like a separable problem, and given the very long list of
requirements for making it work at all, I carved this out as "future
work".

As for perf, it's similar -- it's another entirely solvable problem,
but perf needs to be untaught some of its assumptions.

We've had a static kernel base forever, so I'm expecting some bumps in
the road here. I'm hopeful none of it will be too painful, though.

-Kees

--
Kees Cook
Chrome OS Security

2014-01-21 09:00:43

by Peter Zijlstra

Subject: Re: [GIT PULL] x86/kaslr for v3.14

On Mon, Jan 20, 2014 at 03:13:33PM -0800, H. Peter Anvin wrote:
> On 01/20/2014 03:12 PM, Linus Torvalds wrote:
> > On Mon, Jan 20, 2014 at 2:54 PM, Linus Torvalds
> > <[email protected]> wrote:
> >> So I pulled this, but one question:
> >
> > .. oh, and since I decided to test it, and was looking for problems:
> > enabling kaslr breaks "perf". The *profile* looks fine, but the
> > disassembly doesn't work.
> >
> > I'm not entirely surprised. I decided I wanted to test it for a
> > reason, after all. So it's not unexpected, but perhaps people hadn't
> > thought about it, and clearly hadn't tested it.
> >
> > Kernel modules disassemble fine, so clearly perf knows about code that
> > moves around, but apparently it gets surprised when the core vmlinux
> > file disassembly doesn't match addresses.
> >
>
> So this is presumably something that needs to be fixed in perf?

Where do we learn about the offset from userspace?

2014-01-21 10:27:11

by Ingo Molnar

Subject: Re: [GIT PULL] x86/kaslr for v3.14


* Linus Torvalds <[email protected]> wrote:

> On Mon, Jan 20, 2014 at 2:54 PM, Linus Torvalds
> <[email protected]> wrote:
> > So I pulled this, but one question:
>
> .. oh, and since I decided to test it, and was looking for problems:
> enabling kaslr breaks "perf". The *profile* looks fine, but the
> disassembly doesn't work.
>
> I'm not entirely surprised. I decided I wanted to test it for a
> reason, after all. So it's not unexpected, but perhaps people hadn't
> thought about it, and clearly hadn't tested it.
>
> Kernel modules disassemble fine, so clearly perf knows about code
> that moves around, but apparently it gets surprised when the core
> vmlinux file disassembly doesn't match addresses.

Hm, live annotation of the kernel image is a relatively new perf
feature, and KASLR predated that (by years) - which would at least in
part explain why it went unnoticed. (Although it does not excuse the
lack of testing.)

I've Cc:-ed Adrian Hunter, the author of the /proc/kcore annotation
feature.

Adrian: enabling x86 KASLR in Linus's latestest kernel via
CONFIG_RANDOMIZE_BASE=y [this is also present in the -tip tree] breaks
/proc/kcore annotation, because the kernel image position is now per
bootup randomized and perf's annotation code gets surprised by that.

Thanks,

Ingo

2014-01-21 13:56:56

by H. Peter Anvin

Subject: Re: [GIT PULL] x86/kaslr for v3.14

On 01/21/2014 02:27 AM, Ingo Molnar wrote:
>
> Hm, live annotation of the kernel image is a relatively new perf
> feature, and KASLR predated that (by years) - which would at least in
> part explain why it went unnoticed. (Although it does not excuse the
> lack of testing.)
>

kASLR is new, but on 32 bits we have relocated the kernel for a long
time. kASLR is the first use case of relocating the 64-bit kernel, though.

-hpa

2014-01-21 14:03:18

by Ingo Molnar

Subject: Re: [GIT PULL] x86/kaslr for v3.14


* H. Peter Anvin <[email protected]> wrote:

> On 01/21/2014 02:27 AM, Ingo Molnar wrote:
> >
> > Hm, live annotation of the kernel image is a relatively new perf
> > feature, and KASLR predated that (by years) - which would at least in
> > part explain why it went unnoticed. (Although it does not excuse the
> > lack of testing.)
>
> kASLR is new, but on 32 bits we have relocated the kernel for a long
> time. [...]

I doubt many people develop on 32-bit x86, and the group of people
looking at annotated 32-bit assembly kernel profiles ought to be
another order of magnitude smaller than that ...

> [...] kASLR is the first use case of relocating the 64-bit kernel,
> though.

Yeah.

Thanks,

Ingo

2014-01-21 14:06:42

by H. Peter Anvin

Subject: Re: [GIT PULL] x86/kaslr for v3.14

On 01/21/2014 06:03 AM, Ingo Molnar wrote:
>
> * H. Peter Anvin <[email protected]> wrote:
>
>> On 01/21/2014 02:27 AM, Ingo Molnar wrote:
>>>
>>> Hm, live annotation of the kernel image is a relatively new perf
>>> feature, and KASLR predated that (by years) - which would at least in
>>> part explain why it went unnoticed. (Although it does not excuse the
>>> lack of testing.)
>>
>> kASLR is new, but on 32 bits we have relocated the kernel for a long
>> time. [...]
>
> I doubt many people develop on 32-bit x86, and the group of people
> looking at annotated 32-bit assembly kernel profiles ought to be
> another order of magnitude smaller than that ...
>

Yes... I was commenting on the statement that "kASLR predated that by
years". It hasn't been common.

-hpa

2014-01-21 14:14:26

by Ingo Molnar

Subject: Re: [GIT PULL] x86/kaslr for v3.14


* H. Peter Anvin <[email protected]> wrote:

> On 01/21/2014 06:03 AM, Ingo Molnar wrote:
> >
> > * H. Peter Anvin <[email protected]> wrote:
> >
> >> On 01/21/2014 02:27 AM, Ingo Molnar wrote:
> >>>
> >>> Hm, live annotation of the kernel image is a relatively new perf
> >>> feature, and KASLR predated that (by years) - which would at least in
> >>> part explain why it went unnoticed. (Although it does not excuse the
> >>> lack of testing.)
> >>
> >> kASLR is new, but on 32 bits we have relocated the kernel for a long
> >> time. [...]
> >
> > I doubt many people develop on 32-bit x86, and the group of people
> > looking at annotated 32-bit assembly kernel profiles ought to be
> > another order of magnitude smaller than that ...
>
> Yes... I was commenting on the statement that "kASLR predated that
> by years". It hasn't been common.

Ah, I didn't mean to suggest that it's an old upstream feature: what I
mean is that the KASLR patch is pretty old, and it has been deployed
by the Chromium guys for quite some time, and by others?

It was just never combined with perf live annotation which is a recent
perf feature.

Anyway ... I suspect it's the fixing of the bug that matters most, not
its genealogy ;)

Thanks,

Ingo

2014-01-21 14:18:46

by H. Peter Anvin

Subject: Re: [GIT PULL] x86/kaslr for v3.14

On 01/21/2014 06:14 AM, Ingo Molnar wrote:
>
> Ah, I didn't mean to suggest that it's an old upstream feature: what I
> mean is that the KASLR patch is pretty old, and it has been deployed
> by the Chromium guys for quite some time, and by others?
>
> It was just never combined with perf live annotation which is a recent
> perf feature.
>

Ah, right. Parallel convergence.

> Anyway ... I suspect it's the fixing of the bug that matters most, not
> its genealogy ;)

Indeed.

-hpa

2014-01-21 14:21:26

by H. Peter Anvin

Subject: Re: [GIT PULL] x86/kaslr for v3.14

On 01/21/2014 01:00 AM, Peter Zijlstra wrote:
>>
>> So this is presumably something that needs to be fixed in perf?
>
> Where do we learn about the offset from userspace?
>

Now this is tricky... if this offset is too easy to get it completely
defeats kASLR. On the other hand, I presume that if we are exporting
/proc/kcore we're not secure anyway. Kees, I assume that in "secure"
mode perf annotations simply wouldn't work anyway?

-hpa

2014-01-21 14:39:45

by Ingo Molnar

Subject: Re: [GIT PULL] x86/kaslr for v3.14


* H. Peter Anvin <[email protected]> wrote:

> On 01/21/2014 01:00 AM, Peter Zijlstra wrote:
> >>
> >> So this is presumably something that needs to be fixed in perf?
> >
> > Where do we learn about the offset from userspace?
> >
>
> Now this is tricky... if this offset is too easy to get it
> completely defeats kASLR. On the other hand, I presume that if we
> are exporting /proc/kcore we're not secure anyway. Kees, I assume
> that in "secure" mode perf annotations simply wouldn't work anyway?

So /proc/kcore is:

aldebaran:~> ls -l /proc/kcore
-r-------- 1 root root 140737486266368 Jan 21 15:35 /proc/kcore

i.e. root only.

The thing is, one of my first remarks on this whole KASLR series was
that tooling needs to work. I suggested that the kernel should only
expose non-randomized addresses and that all facilities need to
continue to 'just work' with those. That argument was ignored AFAICS
and the problem still isn't solved.

I'd argue that solving it in the kernel instead of making all tooling
variants aware of KASLR one by one is a far more intelligent and
efficient solution ...

Thanks,

Ingo

2014-01-21 14:52:46

by H. Peter Anvin

Subject: Re: [GIT PULL] x86/kaslr for v3.14

On 01/21/2014 06:39 AM, Ingo Molnar wrote:
>>
>> Now this is tricky... if this offset is too easy to get it
>> completely defeats kASLR. On the other hand, I presume that if we
>> are exporting /proc/kcore we're not secure anyway. Kees, I assume
>> that in "secure" mode perf annotations simply wouldn't work anyway?
>
> So /proc/kcore is:
>
> aldebaran:~> ls -l /proc/kcore
> -r-------- 1 root root 140737486266368 Jan 21 15:35 /proc/kcore
>
> i.e. root only.
>

Yes, I mean if we want to be sealed against user space intrusion I'm
assuming /proc/kcore is unavailable, but that we don't

> The thing is, one of my first remarks on this whole KASLR series was
> that tooling needs to work. I suggested that the kernel should only
> expose non-randomized addresses and that all facilities need to
> continue to 'just work' with those. That argument was ignored AFAICS
> and the problem still isn't solved.
>
> I'd argue that solving it in the kernel instead of making all tooling
> variants aware of KASLR one by one is a far more intelligent and
> efficient solution ...

Not ignored, but found not to really work all that well (we had that
discussion in the context of relocated kernels, too.) The problem you
end up with is that as soon as you run into situations where you have to
deal with pointers during debugging, be it using kgdb, stack dumps or
whatever, all the work that you have done in the kernel to try to hide
relocation from the debug infrastructure all of a sudden becomes a huge
liability, and ends up backfiring in a horrific way.

-hpa

2014-01-21 14:56:33

by Ingo Molnar

Subject: Re: [GIT PULL] x86/kaslr for v3.14


* H. Peter Anvin <[email protected]> wrote:

> > The thing is, one of my first remarks on this whole KASLR series
> > was that tooling needs to work. I suggested that the kernel should
> > only expose non-randomized addresses and that all facilities need
> > to continue to 'just work' with those. That argument was ignored
> > AFAICS and the problem still isn't solved.
> >
> > I'd argue that solving it in the kernel instead of making all
> > tooling variants aware of KASLR one by one is a far more
> > intelligent and efficient solution ...
>
> Not ignored, but found not to really work all that well (we had that
> discussion in the context of relocated kernels, too.) The problem
> you end up with is that as soon as you run into situations where you
> have to deal with pointers during debugging, be it using kgdb, stack
> dumps or whatever, all the work that you have done in the kernel to
> try to hide relocation from the debug infrastructure all of a sudden
> becomes a huge liability, and ends up backfiring in a horrific way.

The thing is, that 'huge liability' is now pushed into tooling, which
isn't in any better position to judge a piece of data in a backtrace
than the kernel - in fact it's in an arguably worse position, as it
does not generate that data.

kgdb is an entirely different animal, I'm talking about the 99%
usecase: code profiling and tooling interpreting code addresses that
come from the kernel.

Thanks,

Ingo

2014-01-21 18:37:04

by Kees Cook

Subject: Re: [GIT PULL] x86/kaslr for v3.14

On Tue, Jan 21, 2014 at 6:20 AM, H. Peter Anvin <[email protected]> wrote:
> On 01/21/2014 01:00 AM, Peter Zijlstra wrote:
>>>
>>> So this is presumably something that needs to be fixed in perf?
>>
>> Where do we learn about the offset from userspace?
>
> Now this is tricky... if this offset is too easy to get it completely
> defeats kASLR. On the other hand, I presume that if we are exporting
> /proc/kcore we're not secure anyway. Kees, I assume that in "secure"
> mode perf annotations simply wouldn't work anyway?

The goal of the kernel base address randomization is to keep it
secret from non-root users, confined processes, and/or remote systems.
For local secrecy, if you're running with kaslr and you haven't set
kptr_restrict, dmesg_restrict, and perf_event_paranoid, that's a
problem since you're likely leaking things trivially through
/proc/kallsyms, dmesg, and/or perf.
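
(For illustration, the knobs in question are the usual sysctls; something
along these lines in /etc/sysctl.conf, with the exact values being a policy
choice rather than anything prescribed by this series:

        kernel.kptr_restrict = 1
        kernel.dmesg_restrict = 1
        kernel.perf_event_paranoid = 2

kptr_restrict hides %pK-formatted pointers, such as the addresses in
/proc/kallsyms, from unprivileged users; dmesg_restrict limits dmesg to
privileged users; and perf_event_paranoid=2 restricts unprivileged perf to
user-space-only measurement.)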

-Kees

--
Kees Cook
Chrome OS Security

2014-01-23 09:39:25

by Pavel Machek

Subject: Re: [GIT PULL] x86/kaslr for v3.14

On Mon 2014-01-20 14:54:12, Linus Torvalds wrote:
> So I pulled this, but one question:
>
> On Mon, Jan 20, 2014 at 8:47 AM, H. Peter Anvin <[email protected]> wrote:
> > +config RANDOMIZE_BASE
> > + bool "Randomize the address of the kernel image"
> > + depends on RELOCATABLE
> > + depends on !HIBERNATION
>
> How fundamental is that "!HIBERNATION" issue? Right now that
> anti-dependency on hibernation support will mean that no distro kernel
> will actually use the kernel address space randomization. Which
> long-term is a problem.

Hibernation does some interesting tricks, and it did depend on image
layout staying the same.

Rafael did some great work on x86-64 to enable resuming a different
kernel version, so this should no longer be a problem (on
x86-64). Just test it; failure will not be subtle.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2014-01-26 10:16:33

by Richard Weinberger

Subject: Re: [GIT PULL] x86/kaslr for v3.14

On Mon, Jan 20, 2014 at 5:47 PM, H. Peter Anvin <[email protected]> wrote:
> Hi Linus,
>
> This enables kernel address space randomization for x86.
>
> The following changes since commit d0e639c9e06d44e713170031fe05fb60ebe680af:
>
> Linux 3.12-rc4 (2013-10-06 14:00:20 -0700)
>
> are available in the git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-kaslr-for-linus
>
> The head of this tree is da2b6fb990cf782b18952f534ec7323453bc4fc9.
>
> x86, kaslr: Clarify RANDOMIZE_BASE_MAX_OFFSET (2014-01-14 10:45:56 -0800)
>
> ----------------------------------------------------------------
> H. Peter Anvin (2):
> x86, boot: Rename get_flags() and check_flags() to *_cpuflags()
> x86, kaslr: Add a circular multiply for better bit diffusion
>
> Kees Cook (10):
> x86, boot: Move CPU flags out of cpucheck
> x86, kaslr: Return location from decompress_kernel
> x86, kaslr: Provide randomness functions
> x86, kaslr: Select random position from e820 maps
> x86, kaslr: Report kernel offset on panic
> x86, kaslr: Raise the maximum virtual address to -1 GiB on x86_64
> x86/relocs: Add percpu fixup for GNU ld 2.23
> x86, kaslr: Mix entropy sources together as needed
> x86, kaslr: Use char array to gain sizeof sanity
> x86, kaslr: Clarify RANDOMIZE_BASE_MAX_OFFSET
>
> Michael Davidson (1):
> x86, relocs: Add more per-cpu gold special cases
>
> Wei Yongjun (1):
> x86, kaslr: Remove unused including <linux/version.h>

Currently we print the kernel offset only upon a panic() using the
panic notifier list.
This way it does not show up if the kernel hits a BUG() in process
context or something less critical.
Wouldn't it make more sense to report the offset in every dump_stack() or
show_regs() call?
Or at least add a hint that the kernel was built with kaslr enabled.

I'm a heavy user of addr2line -e vmlinux <booboo-happened-here>, so I
want to make sure that I don't waste too much time finding out that the
addresses are useless. :-)
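
(As an aside, with made-up example numbers: the offset printed by the panic
notifier can be subtracted by hand. If a trace shows ffffffffa064fbf6 and the
report says "Kernel Offset: 0x1f000000 from 0xffffffff81000000", resolving

        addr2line -e vmlinux 0xffffffff8164fbf6

i.e. 0xffffffffa064fbf6 minus 0x1f000000, gets you back to the un-randomized
vmlinux address.)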

--
Thanks,
//richard

2014-01-27 05:33:39

by H. Peter Anvin

Subject: Re: [GIT PULL] x86/kaslr for v3.14

On 01/26/2014 02:16 AM, Richard Weinberger wrote:
>
> Currently we print the kernel offset only upon a panic() using the
> panic notifier list.
> This way it does not show up if the kernel hits a BUG() in process
> context or something less critical.
>> Wouldn't it make more sense to report the offset in every dump_stack() or
> show_regs() call?

No, because that information is available to user space unless we panic.

-hpa

2014-01-27 06:49:14

by Richard Weinberger

Subject: Re: [GIT PULL] x86/kaslr for v3.14

On Mon, Jan 27, 2014 at 6:33 AM, H. Peter Anvin <[email protected]> wrote:
> On 01/26/2014 02:16 AM, Richard Weinberger wrote:
>>
>> Currently we print the kernel offset only upon a panic() using the
>> panic notifier list.
>> This way it does not show up if the kernel hits a BUG() in process
>> context or something less critical.
>> Wouldn't it make more sense to report the offset in every dump_stack() or
>> show_regs() call?
>
> No, because that information is available to user space unless we panic.

Didn't you mean non-root?
I thought one has to set dmesg_restrict anyways if kASLR is used.

And isn't the offset available to perf too?
Of course only for root, but still user space.

--
Thanks,
//richard

2014-01-27 06:51:46

by H. Peter Anvin

Subject: Re: [GIT PULL] x86/kaslr for v3.14

On 01/26/2014 10:49 PM, Richard Weinberger wrote:
>>
>> No, because that information is available to user space unless we panic.
>
> Didn't you mean non-root?
> I thought one has to set dmesg_restrict anyways if kASLR is used.
>
> And isn't the offset available to perf too?
> Of course only for root, but still user space.
>

For certain system security levels one wants to protect even from a rogue
root. In those cases, leaking that information via dmesg and perf isn't
going to work, either.

With lower security settings, by all means...

-hpa

2014-01-27 06:53:23

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

Of course, stack traces themselves contain that information, so one
could argue that oops=panic is required in order for kASLR to provide
any kind of security against root. (oops=panic is probably a good idea
in secure environments anyway...)

-hpa

2014-01-27 07:34:35

by Richard Weinberger

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

Am 27.01.2014 07:52, schrieb H. Peter Anvin:
> Of course, stack traces themselves contain that information, so one
> could argue that oops=panic is required in order for kASLR to provide
> any kind of security against root. (oops=panic is probably a good idea
> in secure environments anyway...)

Now I understand your point.

/proc/<pid>/stack and a world-readable /boot also need to be disabled.
Deploying a secure kASLR is not easy, especially for end-user distros.

Maybe a CONFIG_RANDOMIZE_BASE_I_MEAN_IT which disables various sources of
information leakage would help too. ;-)

Thanks,
//richard

2014-01-27 07:38:42

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14


* H. Peter Anvin <[email protected]> wrote:

> On 01/26/2014 10:49 PM, Richard Weinberger wrote:
> >>
> >> No, because that information is available to user space unless we panic.
> >
> > Didn't you mean non-root?
> > I thought one has to set dmesg_restrict anyways if kASLR is used.
> >
> > And isn't the offset available to perf too?
> > Of course only for root, but still user space.
> >
>
> For certain system security levels one want to protect even from a
> rogue root. In those cases, leaking that information via dmesg and
> perf isn't going to work, either.
>
> With lower security settings, by all means...

The 'no' was categorical and unconditional though, so is the right
answer perhaps something more along the lines of:

'Yes, the random offset can be reported in an oops, as long as
high security setups can turn off the reporting of the offset,
in their idealistic attempt to protect the system against root.'

?

I also still think that in addition to reporting the offset,
automatically 'un-randomizing' the oopses and warnings would be useful
as well: with a clearly recognizable indicator used for every
unrandomized value, such as capitalizing its first hex digit.

Let me show a mockup of how I think it could work:

raw 64-bit original:

[ 246.085174] <IRQ> [<ffffffff8264fbf6>] dump_stack+0x46/0x58
[ 246.098352] [<ffffffff82054fb6>] warn_slowpath_fmt+0x46/0x50
[ 246.104786] [<ffffffff825710d6>] dev_watchdog+0x246/0x250
[ 246.110923] [<ffffffff82570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
[ 246.119097] [<ffffffff8206092a>] call_timer_fn+0x3a/0x110
[ 246.125224] [<ffffffff8206280f>] ? update_process_times+0x6f/0x80

64-bit un-randomized:

[ 246.085174] <IRQ> [<FFFFFFFF8164fbf6>] dump_stack+0x46/0x58
[ 246.091633] [<FFFFFFFF81054ecc>] warn_slowpath_common+0x8c/0xc0
[ 246.098352] [<FFFFFFFF81054fb6>] warn_slowpath_fmt+0x46/0x50
[ 246.104786] [<FFFFFFFF815710d6>] dev_watchdog+0x246/0x250
[ 246.110923] [<FFFFFFFF81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
[ 246.119097] [<FFFFFFFF8106092a>] call_timer_fn+0x3a/0x110
[ 246.125224] [<FFFFFFFF8106280f>] ? update_process_times+0x6f/0x80

Note how the hex values of unrandomized kernel text start with capital
letters, and how their values match up with System.map and vmlinux symbol
values.

raw 32-bit randomized:

[ 39.054098] [<c20ded55>] ? __jump_label_update+0x45/0x60
[ 39.064852] [<c2057aa2>] ? queue_work_on+0x32/0x70
[ 39.074570] [<c20085b1>] ? mark_tsc_unstable+0x21/0x60
[ 39.084980] [<c2f03af6>] ? tsc_init+0x326/0x344
[ 39.094175] [<c2eff9c5>] ? start_kernel+0x2c7/0x356

32-bit un-randomized:

[ 39.054098] [<C10ded55>] ? __jump_label_update+0x45/0x60
[ 39.064852] [<C1057aa2>] ? queue_work_on+0x32/0x70
[ 39.074570] [<C10085b1>] ? mark_tsc_unstable+0x21/0x60
[ 39.084980] [<C1f03af6>] ? tsc_init+0x326/0x344
[ 39.094175] [<C1eff9c5>] ? start_kernel+0x2c7/0x356

This looks eminently useful to me: I could plug those hex values into
gdb straight away to look up a symbol, instead of having to subtract
the random offset first.

This would do 99% of the unrandomizing job for the user/developer (and,
not least, for tooling), without obfuscating oopses, as it would be
clear which values the unrandomizing was performed on, without losing
information.

But only reporting the random offset with every raw oops and stack
dump would provide equivalent information as well. We can tweak this
as we go; it's not like these details are ABIs - I just still think
that the KASLR feature isn't nearly as utility-oriented as it should
be.

Thanks,

Ingo
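
For illustration, a minimal sketch of what such a formatting helper could
look like, assuming the random slide is recorded somewhere at boot
(kernel_slide below is a hypothetical placeholder, not an existing kernel
variable); printk's %lX and %pS formats would do the upper-casing and the
symbol lookup:

#include <linux/kernel.h>
#include <linux/printk.h>

static unsigned long kernel_slide;	/* hypothetical: recorded at boot */

/*
 * Print one trace entry with the slide subtracted; %lX renders the hex
 * letters in upper case, marking the value as un-randomized, while %pS
 * still prints the symbol+offset form for the real (randomized) address.
 */
static void printk_unrandomized(unsigned long addr)
{
	printk(KERN_DEFAULT " [<%016lX>] %pS\n",
	       addr - kernel_slide, (void *)addr);
}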

2014-01-27 07:43:41

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14


* Ingo Molnar <[email protected]> wrote:

>
> * H. Peter Anvin <[email protected]> wrote:
>
> > On 01/26/2014 10:49 PM, Richard Weinberger wrote:
> > >>
> > >> No, because that information is available to user space unless we panic.
> > >
> > > Didn't you mean non-root?
> > > I thought one has to set dmesg_restrict anyways if kASLR is used.
> > >
> > > And isn't the offset available to perf too?
> > > Of course only for root, but still user space.
> > >
> >
> > For certain system security levels one want to protect even from a
> > rogue root. In those cases, leaking that information via dmesg and
> > perf isn't going to work, either.
> >
> > With lower security settings, by all means...
>
> The 'no' was categorical and unconditional though, so is the right
> answer perhaps something more along the lines of:
>
> 'Yes, the random offset can be reported in an oops, as long as
> high security setups can turn off the reporting of the offset,
> in their idealistic attempt to protect the system against root.'
>
> ?

'reporting of the offset' should probably be 'reporting kernel data' -
there are many possible ways an oops (and its associated raw stack dump)
can leak the offset; I'm not sure this can ever be made 'safe' against
a rogue root.

Not giving out kernel-originated debug information at all would be. (At the
cost of reducing the utility of having that root user around, of
course.)

Thanks,

Ingo

2014-01-27 07:59:48

by Richard Weinberger

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

Am 27.01.2014 08:38, schrieb Ingo Molnar:
>
> * H. Peter Anvin <[email protected]> wrote:
>
>> On 01/26/2014 10:49 PM, Richard Weinberger wrote:
>>>>
>>>> No, because that information is available to user space unless we panic.
>>>
>>> Didn't you mean non-root?
>>> I thought one has to set dmesg_restrict anyways if kASLR is used.
>>>
>>> And isn't the offset available to perf too?
>>> Of course only for root, but still user space.
>>>
>>
>> For certain system security levels one want to protect even from a
>> rogue root. In those cases, leaking that information via dmesg and
>> perf isn't going to work, either.
>>
>> With lower security settings, by all means...
>
> The 'no' was categorical and unconditional though, so is the right
> answer perhaps something more along the lines of:
>
> 'Yes, the random offset can be reported in an oops, as long as
> high security setups can turn off the reporting of the offset,
> in their idealistic attempt to protect the system against root.'
>
> ?
>
> I also still think that in addition to reporting the offset,
> automatically 'un-randomizing' the oopses and warnings would be useful
> as well: with a clear to recognize indicator used for every value
> unrandomized, such as capitalizing their first hexa digit.
>
> Let me show a mockup of how I think it could work:
>
> raw 64-bit original:
>
> [ 246.085174] <IRQ> [<ffffffff8264fbf6>] dump_stack+0x46/0x58
> [ 246.098352] [<ffffffff82054fb6>] warn_slowpath_fmt+0x46/0x50
> [ 246.104786] [<ffffffff825710d6>] dev_watchdog+0x246/0x250
> [ 246.110923] [<ffffffff82570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
> [ 246.119097] [<ffffffff8206092a>] call_timer_fn+0x3a/0x110
> [ 246.125224] [<ffffffff8206280f>] ? update_process_times+0x6f/0x80
>
> 64-bit un-randomized:
>
> [ 246.085174] <IRQ> [<FFFFFFFF8164fbf6>] dump_stack+0x46/0x58
> [ 246.091633] [<FFFFFFFF81054ecc>] warn_slowpath_common+0x8c/0xc0
> [ 246.098352] [<FFFFFFFF81054fb6>] warn_slowpath_fmt+0x46/0x50
> [ 246.104786] [<FFFFFFFF815710d6>] dev_watchdog+0x246/0x250
> [ 246.110923] [<FFFFFFFF81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
> [ 246.119097] [<FFFFFFFF8106092a>] call_timer_fn+0x3a/0x110
> [ 246.125224] [<FFFFFFFF8106280f>] ? update_process_times+0x6f/0x80
>
> Note how the hex values of unrandomized kernel text start with capital
> letters, and how their values match up System.map and vmlinux symbol
> values.
>
> raw 32-bit randomized:
>
> [ 39.054098] [<c20ded55>] ? __jump_label_update+0x45/0x60
> [ 39.064852] [<c2057aa2>] ? queue_work_on+0x32/0x70
> [ 39.074570] [<c20085b1>] ? mark_tsc_unstable+0x21/0x60
> [ 39.084980] [<c2f03af6>] ? tsc_init+0x326/0x344
> [ 39.094175] [<c2eff9c5>] ? start_kernel+0x2c7/0x356
>
> 32-bit un-randomized:
>
> [ 39.054098] [<C10ded55>] ? __jump_label_update+0x45/0x60
> [ 39.064852] [<C1057aa2>] ? queue_work_on+0x32/0x70
> [ 39.074570] [<C10085b1>] ? mark_tsc_unstable+0x21/0x60
> [ 39.084980] [<C1f03af6>] ? tsc_init+0x326/0x344
> [ 39.094175] [<C1eff9c5>] ? start_kernel+0x2c7/0x356
>
> This looks eminently useful to me, I could plug those hexa values into
> gdb straight away to look up a symbol instead of having to subtract
> the random offset first.
>
> This would do 99% of the unrandomizing job for the user/developer (and
> not the least, for tooling), without obfuscating oopses as it would be
> clear on which values the unrandomizing was performed, without losing
> information.

I like this idea.

Hopefully nothing breaks if we mix lower and upper case hex numbers. =)
If something does, we could still inject a line like
"[<fffffffffffffffe>] __unrandomize_addr+0x0/0x0" into the trace
to mark an un-randomized one.
Or an <UN-RANDOM> marker, like <IRQ> on x86_64...

Thanks,
//richard

2014-01-27 17:05:28

by Kees Cook

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On Sun, Jan 26, 2014 at 10:49 PM, Richard Weinberger
<[email protected]> wrote:
> On Mon, Jan 27, 2014 at 6:33 AM, H. Peter Anvin <[email protected]> wrote:
>> On 01/26/2014 02:16 AM, Richard Weinberger wrote:
>>>
>>> Currently we print the kernel offset only upon a panic() using the
>>> panic notifier list.
>>> This way it does not show up if the kernel hits a BUG() in process
>>> context or something less critical.
>>> Wouldn't make more sense to report the offset in every dump_stack() or
>>> show_regs() call?
>>
>> No, because that information is available to user space unless we panic.
>
> Didn't you mean non-root?
> I thought one has to set dmesg_restrict anyways if kASLR is used.
>
> And isn't the offset available to perf too?
> Of course only for root, but still user space.

Setting dmesg_restrict is done mostly in an effort to try to lock down
access to dmesg since it'll likely contain enough clues to help an
attacker. System owners need to avoid dmesg getting sprayed into
/var/log world-readable, or available via privileged debugging
daemons, etc. Since keeping dmesg secret from non-root users is going
to be error-prone, I had a goal of keeping the offset out of dmesg
while the system is still running -- hence doing it only at panic
time.

Finding the offset as the (unconfined) root user is extremely easy, so
I personally see no reason to hide it from root (and it would be very
irritating for things like perf, too). I view kASLR as a tool for
statistical defense against confined processes or remote attacks.

I would argue that decoding a non-panic oops on a running system is
entirely possible as-is, since the offset can be found from
/proc/kallsyms as root. It was the dead system that needed the offset
exported: via text in the panic, or via an ELF note in a core.

-Kees

--
Kees Cook
Chrome OS Security
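
For reference, the panic-time reporting described above can be wired up
through the panic notifier list; a rough sketch, close in spirit to the
"Report kernel offset on panic" patch but simplified (kaslr_slide is a
placeholder for wherever the offset is actually recorded, not the real
setup.c code):

#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/notifier.h>

static unsigned long kaslr_slide;	/* placeholder: set during early boot */

static int dump_kernel_offset(struct notifier_block *self,
			      unsigned long event, void *data)
{
	pr_emerg("Kernel Offset: 0x%lx\n", kaslr_slide);
	return NOTIFY_DONE;
}

static struct notifier_block kernel_offset_notifier = {
	.notifier_call = dump_kernel_offset,
};

static int __init register_kernel_offset_dumper(void)
{
	atomic_notifier_chain_register(&panic_notifier_list,
				       &kernel_offset_notifier);
	return 0;
}
__initcall(register_kernel_offset_dumper);

Because the notifier only fires from panic(), the offset never hits dmesg
on a live system, which is exactly the property discussed here.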

2014-01-27 17:20:23

by Richard Weinberger

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

Am 27.01.2014 18:05, schrieb Kees Cook:
> I would argue that decoding a non-panic oops on a running system is
> entirely possible as-is, since the offset can be found from
> /proc/kallsyms as root. It was the dead system that needed the offset
> exported: via text in the panic, or via an ELF note in a core.

The problem is that you have to pick up information from two sources.
As a kernel developer, users/customers often show you a backtrace (oops or panic)
and want you to find the problem.
They barely manage to copy&paste the topmost full trace from dmesg or /var/log/messages.
If I have to ask them a bit later to tell me the offset from /proc/kallsyms or something else,
I'm lost. Mostly because they have already rebooted the box...

Thanks,
//richard

2014-01-27 17:24:19

by Kees Cook

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On Mon, Jan 27, 2014 at 9:20 AM, Richard Weinberger <[email protected]> wrote:
> Am 27.01.2014 18:05, schrieb Kees Cook:
>> I would argue that decoding a non-panic oops on a running system is
>> entirely possible as-is, since the offset can be found from
>> /proc/kallsyms as root. It was the dead system that needed the offset
>> exported: via text in the panic, or via an ELF note in a core.
>
> The problem is that you have to pickup information from two sources.
> As a kernel developer users/customers often show you a backtrace (oops or panic)
> and want you do find the problem.
> They barley manage it copy&paste the topmost full trace from dmesg or /var/log/messages.
> If I have to ask them a bit later to tell me the offset from /proc/kallsyms or something else
> I'm lost. Mostly because they have already rebooted the box...

As long as I can turn it off, I'd be happy. :)
/proc/sys/kernel/kaslr_in_oops or something?

-Kees

--
Kees Cook
Chrome OS Security

2014-01-28 06:28:12

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14


* Kees Cook <[email protected]> wrote:

> On Mon, Jan 27, 2014 at 9:20 AM, Richard Weinberger <[email protected]> wrote:
> > Am 27.01.2014 18:05, schrieb Kees Cook:
> >> I would argue that decoding a non-panic oops on a running system is
> >> entirely possible as-is, since the offset can be found from
> >> /proc/kallsyms as root. It was the dead system that needed the offset
> >> exported: via text in the panic, or via an ELF note in a core.
> >
> > The problem is that you have to pickup information from two sources.
> > As a kernel developer users/customers often show you a backtrace (oops or panic)
> > and want you do find the problem.
> > They barley manage it copy&paste the topmost full trace from dmesg or /var/log/messages.
> > If I have to ask them a bit later to tell me the offset from /proc/kallsyms or something else
> > I'm lost. Mostly because they have already rebooted the box...
>
> As long as I can turn it off, I'd be happy. :)
> /proc/sys/kernel/kaslr_in_oops or something?

Yeah, as long as it decodes by default.

Thanks,

Ingo

2014-01-28 08:26:11

by Richard Weinberger

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

Am 28.01.2014 07:28, schrieb Ingo Molnar:
>
> * Kees Cook <[email protected]> wrote:
>
>> On Mon, Jan 27, 2014 at 9:20 AM, Richard Weinberger <[email protected]> wrote:
>>> Am 27.01.2014 18:05, schrieb Kees Cook:
>>>> I would argue that decoding a non-panic oops on a running system is
>>>> entirely possible as-is, since the offset can be found from
>>>> /proc/kallsyms as root. It was the dead system that needed the offset
>>>> exported: via text in the panic, or via an ELF note in a core.
>>>
>>> The problem is that you have to pickup information from two sources.
>>> As a kernel developer users/customers often show you a backtrace (oops or panic)
>>> and want you do find the problem.
>>> They barley manage it copy&paste the topmost full trace from dmesg or /var/log/messages.
>>> If I have to ask them a bit later to tell me the offset from /proc/kallsyms or something else
>>> I'm lost. Mostly because they have already rebooted the box...
>>
>> As long as I can turn it off, I'd be happy. :)
>> /proc/sys/kernel/kaslr_in_oops or something?

Would be nice to have! :)

> Yeah, as long as it decodes by default.

Yep.
I like Ingo's idea (capital letters as indicators).
Are we all fine with that?

Thanks,
//richard

2014-01-28 15:55:21

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On 01/28/2014 12:25 AM, Richard Weinberger wrote:
>
> Yep.
> I like Ingo's idea (capital letters as indicators).
> Are we all fine with that?
>

I guess it is extremely unlikely that we'd have a kernel address without
any letters... the "general solutions only please" part of my brain
wants to scream to handle that corner case, though. The most logical
way would be to change the brackets.

It doesn't matter much, though.

-hpa

2014-01-28 16:25:43

by Richard Weinberger

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

Am 28.01.2014 16:55, schrieb H. Peter Anvin:
> On 01/28/2014 12:25 AM, Richard Weinberger wrote:
>>
>> Yep.
>> I like Ingo's idea (capital letters as indicators).
>> Are we all fine with that?
>>
>
> I guess it is extremely unlikely that we'd have a kernel address without
> any letters... the "general solutions only please" part of my brain
> wants to scream to handle that corner case, though. The most logical
> way would be to change the brackets.
>
> It doesn't matter much, though.

I fear both solutions will break various scripts.
For example scripts/markup_oops.pl looks for \[\<([a-z0-9]+)\>\].

Thanks,
//richard

2014-01-28 16:31:09

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On 01/28/2014 08:25 AM, Richard Weinberger wrote:
>
> I fear both solutions will break various scripts.
> For example scripts/markup_oops.pl looks for \[\<([a-z0-9]+)\>\].
>

I don't think there is any way to not break anything... we're
introducing a new kind of object ("normalized kernel pointer") here.

-hpa


2014-01-28 16:51:19

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On Tue, Jan 28, 2014 at 8:30 AM, H. Peter Anvin <[email protected]> wrote:
>
> I don't think there is any way to not break anything... we're
> introducing a new kind of object ("normalized kernel pointer") here.

I suspect we could just drop the addresses entirely if we have a
symbolic version. It's not like the hex addresses are all that
interesting, and in commit messages I actually end up editing them out
just because they are worthless noise.

So we could just do "schedule+0x45/0x368" without the actual hex
address in brackets..

Linus
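
As a rough illustration (not the actual dumpstack code), printk's existing
%pS specifier already produces the "symbol+offset/size" form via kallsyms,
so a trace entry could be emitted with or without the raw hex along these
lines:

#include <linux/kernel.h>
#include <linux/printk.h>
#include <linux/types.h>

/* Sketch: one stack-trace entry, first in the current style (raw hex
 * plus %pS symbol), then with the hex dropped as suggested above. */
static void print_trace_entry(unsigned long addr, bool reliable)
{
	printk(KERN_DEFAULT " [<%016lx>] %s%pS\n",
	       addr, reliable ? "" : "? ", (void *)addr);

	printk(KERN_DEFAULT " %s%pS\n",
	       reliable ? "" : "? ", (void *)addr);
}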

2014-01-28 17:05:16

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14


* Linus Torvalds <[email protected]> wrote:

> On Tue, Jan 28, 2014 at 8:30 AM, H. Peter Anvin <[email protected]> wrote:
> >
> > I don't think there is any way to not break anything... we're
> > introducing a new kind of object ("normalized kernel pointer") here.
>
> I suspect we could just drop the addresses entirely if we have a
> symbolic version. It's not like the hex addresses are all that
> interesting, and in commit messages I actually end up editing them
> out just because they are worthless noise.

Well, I often use the hex numbers to look them up and disassemble them
in a vmlinux via gdb and 'list *0x1234123412341234' - where the
vmlinux has no debuginfo. (Debuginfo takes longer to build so I
generally build without it.)

I'm quite sure other kernel developers do that as well.

> So we could just do "schedule+0x45/0x368" without the actual hex
> address in brackets..

AFAICS this won't work in a symbol-less vmlinux. Is there some trick
to do it with gdb?

Thanks,

Ingo

2014-01-28 17:12:59

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On Tue, Jan 28, 2014 at 9:05 AM, Ingo Molnar <[email protected]> wrote:
>
> Well, I often use the hex numbers to look them up and disassemble them
> in a vmlinux via gdb and 'list *0x1234123412341234' - where the
> vmlinux has no debuginfo. (Debuginfo takes longer to build so I
> generally build without it.)

Why the heck wouldn't you do that? Just do

list schedule+0x45

instead.

> AFAICS this won't work in a symbol-less vmlinux. Is there some trick
> to do it with gdb?

Why would you have a symbol-less vmlinux? The only reason to strip
vmlinux is because you were crazy enough to build with
CONFIG_DEBUG_INFO and the damn debug info is so large that it won't
fit on your root partition. But dammit, if you build with debug_info
and then strip the end result, you're just insane. You made your build
take ten times longer, use ten times more diskspace, and then you
throw it all away. Crazy.

So I don't think the symbol-less version is worth even worrying about.
You do want to build with KALLSYMS (or whatever the config option is
called), so that the symbolic name is worth something, but once you
have the symbolic name, you're good unless you did something terminally
stupid.

Btw, we should make it harder to enable CONFIG_DEBUG_INFO. It's a
f*cking pain. It's particularly nasty when you do "make allmodconfig"
and it enables debug-info and makes the build take forever and waste
diskspace - but nobody sane actually *boots* the end result, so that
debug info is all pointless.

Linus

2014-01-28 17:24:46

by Richard Weinberger

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

Am 28.01.2014 18:12, schrieb Linus Torvalds:
> On Tue, Jan 28, 2014 at 9:05 AM, Ingo Molnar <[email protected]> wrote:
>>
>> Well, I often use the hex numbers to look them up and disassemble them
>> in a vmlinux via gdb and 'list *0x1234123412341234' - where the
>> vmlinux has no debuginfo. (Debuginfo takes longer to build so I
>> generally build without it.)
>
> Why the heck wouldn't you do that? Just do
>
> list schedule+0x45
>
> instead.

Hmm, my gdb does not like that notation.

(gdb) list schedule+0x45
Function "schedule+0x45" not defined.

It thinks +0x45 is part of the function name; adding spaces does not help either.
Or am I doing something very stupid?

Thanks,
//richard

2014-01-28 17:35:43

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On Tue, Jan 28, 2014 at 9:24 AM, Richard Weinberger <[email protected]> wrote:
>
> Hmm, my gdb does not like that notion.
>
> (gdb) list schedule+0x45
> Function "schedule+0x45" not defined.

I don't have a debug build, so maybe it's something specific to gdb
actually seeing type information and then being confused..

Can you do "x/10i schedule+0x45"? That definitely works for me.
But I literally detest DEBUG_INFO builds, because it's useless crap.
99% of everything I debug is from people reporting problems on their
kernels, so it's not like my local debug info would match that anyway.

Anyway, if it's a type information thing due to DEBUG_INFO that
confuses gdb and makes it think that the "+0x45" part doesn't make
sense, you may need to add a cast to get gdb to ignore the type
information. IOW, does

list (void *)schedule + 0x45

work for you?

I happen to have gdb-7.6.50 here that I tested with, but I've used
"symbol+offset" forever afaik, so it's definitely not something new.

Linus

2014-01-28 17:52:26

by Richard Weinberger

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

Am 28.01.2014 18:35, schrieb Linus Torvalds:
> On Tue, Jan 28, 2014 at 9:24 AM, Richard Weinberger <[email protected]> wrote:
>>
>> Hmm, my gdb does not like that notion.
>>
>> (gdb) list schedule+0x45
>> Function "schedule+0x45" not defined.
>
> I don't have a debug build, so maybe it's something specific to gdb
> actually seeing type information and then being confused..
>
> Can you do "x/10i schedule+0x45" because that definitely works for me.
> But I literally detest DBUG_INFO builds, because it's useless crap.
> 99% of everything I debug is from people reporting problems on their
> kernels, so it's not like my local debug info would match that anyway.

x/10i schedule+0x45 works.

> Anyway, if it's a type information thing due to DEBUG_INFO that
> confuses gdb and makes it think that the "+0x45" part doesn't make
> sense, you may need to add a cast to get gdb to ignore the type
> information. IOW, does
>
> list (void *)schedule + 0x45
>
> work for you?

Nope.

(gdb) list (void *)schedule + 0x45
Function "(void *)schedule + 0x45" not defined.

> I happen to have gdb-7.6.50 here that I tested with, but I've used
> "symbol+offset" forever afaik, so it's definitely not something new.

I'm on 7.5.1.

Thanks,
//richard

2014-01-28 17:56:23

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On Tue, Jan 28, 2014 at 9:52 AM, Richard Weinberger <[email protected]> wrote:
>
> x/10i schedule+0x45 works.

Ok, so it's just list that is braindamaged. Maybe it wants a
*(schedule+0x45) or something. I dunno. You can obviously get the hex
number from just doing "p schedule+0x45" and then do that.

Whatever. I still claim that the hex numbers in the oopses are
worthless noise. The fact that gdb has "UI issues" is an unrelated
issue and perhaps not surprising.

Linus

2014-01-28 18:55:07

by Richard Weinberger

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14



Am 28.01.2014 18:56, schrieb Linus Torvalds:
> On Tue, Jan 28, 2014 at 9:52 AM, Richard Weinberger <[email protected]> wrote:
>>
>> x/10i schedule+0x45 works.
>
> Ok, so it's just list that is braindamaged. Maybe it wants a
> *(schedule+0x45) or something. I dunno. You can obviously get the hex
> number from just doing "p schedule+0x45" and then do that.

Ah, "list *(schedule+0x45)" works.
This reminds me why I never use gdb and like addr2line so much. ;D

Thanks,
//richard

2014-01-28 19:48:29

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14


* Linus Torvalds <[email protected]> wrote:

> On Tue, Jan 28, 2014 at 9:05 AM, Ingo Molnar <[email protected]> wrote:
> >
> > Well, I often use the hex numbers to look them up and disassemble them
> > in a vmlinux via gdb and 'list *0x1234123412341234' - where the
> > vmlinux has no debuginfo. (Debuginfo takes longer to build so I
> > generally build without it.)
>
> Why the heck wouldn't you do that? Just do
>
> list schedule+0x45
>
> instead.
>
> > AFAICS this won't work in a symbol-less vmlinux. Is there some trick
> > to do it with gdb?
>
> Why would you have a symbol-less vmlinux? The only reason to strip
> vmlinux is because you were crazy enough to build with

I don't think I ever stripped a vmlinux in my life, and I definitely
didn't strip this one:

phoenix:~/linux/linux> file vmlinux
vmlinux: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=0x239d56b40bb654ddd2dd704e6b585d5c90de4e12, not stripped

> CONFIG_DEBUG_INFO and the damn debug info is so large that it won't
> fit on your root partition. But dammit, if you build with debug_info
> and then strip the end result, you're just insane. [...]

So, to quote myself from 2 sentences ago:

> > vmlinux has no debuginfo. (Debuginfo takes longer to build so I
> > generally build without it.)

I really meant it when I said I build without debuginfo! :)

So, when I build a kernel, such as with a regular 'make defconfig',
the following happens in gdb:

Reading symbols from /home/mingo/tip/vmlinux...(no debugging symbols found)...done.
(gdb) list schedule+0x45
No symbol table is loaded. Use the "file" command.

Is there a way to resolve schedule+0x45 in a regular vmlinux? It was
an honest question.

> [...] You made your build take ten times longer, use ten times more
> diskspace, and then you throw it all away. Crazy.

It's so crazy that I in fact try to force off debuginfo for all my
builds, even randconfig ones:

config DEBUG_INFO
bool "Compile the kernel with debug info"
depends on DEBUG_KERNEL
# too slow build in QA
depends on 0

> So I don't think the symbol-less version is worth even worrying
> about. You do want to build with KALLSYMS (or whatever the config
> option is called), so that the symbolic name is worth something, but
> once you have the symbolc name, you're good unless you did something
> terminally stupid.

Hm, I have kallsyms on - it's a regular 'make defconfig':

phoenix:~/linux/linux> grep KALL .config
CONFIG_KALLSYMS=y

I might be doing something terminally stupid.

> Btw, we should make it harder to enable CONFIG_DEBUG_INFO. [...]

It's really not set:

phoenix:~/linux/linux> grep DEBUG_INFO .config
# CONFIG_DEBUG_INFO is not set

> [...] It's a f*cking pain. It's particularly nasty when you do "make
> allmodconfig" and it enables debug-info and makes the build take
> forever and waste diskspace - but nobody sane actually *boots* the
> end result, so that debug info is all pointless.

I actually boot (almost-)allmod and allyesconfigs - still I disable
DEBUG_INFO because it's such a strain on our planet's climate.

Thanks,

Ingo

2014-01-28 20:07:28

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On Tue, Jan 28, 2014 at 11:48 AM, Ingo Molnar <[email protected]> wrote:
>
> I really meant it when I said I build without debuginfo! :)

Ok, but so what?

As mentioned, nobody sane should build with DEBUG_INFO. But a normal
vmlinux file has the symbol information even without it.

> So, when I build a kernel, such as with a regular 'make defconfig',
> the following happens in gdb:
>
> Reading symbols from /home/mingo/tip/vmlinux...(no debugging symbols found)...done.
> (gdb) list schedule+0x45
> No symbol table is loaded. Use the "file" command.
>
> Is there a way to resolve schedule+0x45 in a regular vmlinux? It was
> an honest question.

That seems to be just a gdb bug (or "UI feature"), in that gdb likes
to give misleading error messages and requires odd syntax for some
things.

The symbols are there (see the first line):

Reading symbols from /home/mingo/tip/vmlinux...(no debugging symbols
found)...done.

and that "no debugging symbols found" is just because you don't have
the extra *debug* info.

The "list" command requires debug info to work (since that's where the
line number information is), and will not work with hex symbols either
if you don't have that. So when it says "No symbol table is loaded" it
really means "no debug information is loaded".

But you can see that the symbol is perfectly fine:

(gdb) list *(schedule+0x45)
No symbol table is loaded. Use the "file" command.
(gdb) x/3i schedule+0x45
0xffffffff81616d05 <schedule+69>: adc 0x1(%rsi),%bh
0xffffffff81616d0b <schedule+75>: callq 0xffffffff812c0b30
<blk_flush_plug_list>
0xffffffff81616d10 <schedule+80>: jmp 0xffffffff81616cdf <schedule+31>

(ok, so 0x45 wasn't the start of a real instruction, but you get the idea).

So my point is that the hex address doesn't give you *anything* that
the symbolic address doesn't give you. Unless you do truly crazy
things like actively strip the kernel.

Linus

2014-01-28 20:15:49

by Borislav Petkov

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On Tue, Jan 28, 2014 at 12:07:26PM -0800, Linus Torvalds wrote:
> On Tue, Jan 28, 2014 at 11:48 AM, Ingo Molnar <[email protected]> wrote:
> >
> > I really meant it when I said I build without debuginfo! :)
>
> Ok, but so what?
>
> As mentioned, nobody sane should build with DEBUG_INFO.

Shouldn't we hold that down in the Kconfig help text of DEBUG_INFO?
Something like:

"You don't need to enable this if you want symbolic names for kernel
objects. Enable CONFIG_KALLSYMS instead."

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-01-28 20:25:18

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On Tue, Jan 28, 2014 at 12:15 PM, Borislav Petkov <[email protected]> wrote:
>
> Shouldn't we hold that down in the Kconfig help text of DEBUG_INFO?
> Something like:
>
> "You don't need to enable this if you want symbolic names for kernel
> objects. Enable CONFIG_KALLSYMS instead."

Probably. And then we should make sure that allyesconfig/allmodconfig
don't pick it.

There *are* reasonable uses for DEBUG_INFO:

- if you very actively and extensively use kgdb, DEBUG_INFO is very useful.

- distros might want to build distro kernels with DEBUG_INFO for the
kernel debug package

but for most kernel developers, DEBUG_INFO really just bloats things
and makes the build much *much* slower, for very little gain. Sure,
you get line numbers, but let's face it, it's not that hard to just do
"x/20i <address>" and trying to match it up with the generated code
instead. And since "just match it up" model works with user-reported
oopses of random kernels, you had better be able and willing to do
that *anyway*.

If most of the oopses you decode are on your own machine with your own
kernel, you might want to try to learn to be more careful when writing
code. And I'm not even kidding.

Linus

2014-01-28 20:28:42

by Richard Weinberger

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

Am 28.01.2014 21:25, schrieb Linus Torvalds:
> On Tue, Jan 28, 2014 at 12:15 PM, Borislav Petkov <[email protected]> wrote:
>>
>> Shouldn't we hold that down in the Kconfig help text of DEBUG_INFO?
>> Something like:
>>
>> "You don't need to enable this if you want symbolic names for kernel
>> objects. Enable CONFIG_KALLSYMS instead."
>
> Probably. And then we should make sure that allyesconfig/allmodconfig
> don't pick it.

Let's make it depend on CONFIG_EXPERT.

Thanks,
//richard

2014-01-28 20:39:13

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On 01/28/2014 12:28 PM, Richard Weinberger wrote:
> Am 28.01.2014 21:25, schrieb Linus Torvalds:
>> On Tue, Jan 28, 2014 at 12:15 PM, Borislav Petkov <[email protected]> wrote:
>>>
>>> Shouldn't we hold that down in the Kconfig help text of DEBUG_INFO?
>>> Something like:
>>>
>>> "You don't need to enable this if you want symbolic names for kernel
>>> objects. Enable CONFIG_KALLSYMS instead."
>>
>> Probably. And then we should make sure that allyesconfig/allmodconfig
>> don't pick it.
>
> Let's make it depend on CONFIG_EXPERT.
>

That doesn't solve all*config. My scripts for all*config just override
it; maybe we need to do something like that by default?

-hpa

2014-01-28 20:49:19

by Borislav Petkov

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On Tue, Jan 28, 2014 at 12:25:15PM -0800, Linus Torvalds wrote:
> Probably. And then we should make sure that allyesconfig/allmodconfig
> don't pick it.

I'd need to think about that a bit longer as scripts/kconfig/conf.c goes
and sets those. Unless someone has a better idea...

> There *are* reasonable uses for DEBUG_INFO:
>
> - if you very actively and extensively use kgdb, DEBUG_INFO is very useful.

Right, and at least KGDB depends on it. I was going to suggest hiding
it and making only tools select it but...

> - distros might want to build distro kernels with DEBUG_INFO for the
> kernel debug package

Yes. SUSE uses crash to analyze dumps, and for that it needs -g symbols;
I'm sure other distros do too, in some fashion.

> but for most kernel developers, DEBUG_INFO really just bloats things
> and makes the build much *much* slower, for very little gain.

Right, your discussion kinda made me aware of this - I hadn't realized
that KALLSYMS is enough.

> Sure, you get line numbers, but let's face it, it's not that hard
> to just do "x/20i <address>" and trying to match it up with the
> generated code instead. And since "just match it up" model works with
> user-reported oopses of random kernels, you had better be able and
> willing to do that *anyway*.

Hohumm.

> If most of the oopses you decode are on your own machine with your own
> kernel, you might want to try to learn to be more careful when writing
> code. And I'm not even kidding.

Haha.

Ok, here's the help text hunk:

---
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index dbf94a7d25a8..73b36a1624ec 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -126,7 +126,11 @@ config DEBUG_INFO
This adds debug symbols to the kernel and modules (gcc -g), and
is needed if you intend to use kernel crashdump or binary object
tools like crash, kgdb, LKCD, gdb, etc on the kernel.
- Say Y here only if you plan to debug the kernel.
+
+ If you only want to have symbols in kernel traces and are not
+ going to need support for those tools above, you don't need
+ to enable this as it is a huge bloat and build slowdown.
+ Enable CONFIG_KALLSYMS instead.

If unsure, say N.


--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-01-28 21:10:00

by Dave Jones

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On Tue, Jan 28, 2014 at 12:25:15PM -0800, Linus Torvalds wrote:
> On Tue, Jan 28, 2014 at 12:15 PM, Borislav Petkov <[email protected]> wrote:
> >
> > Shouldn't we hold that down in the Kconfig help text of DEBUG_INFO?
> > Something like:
> >
> > "You don't need to enable this if you want symbolic names for kernel
> > objects. Enable CONFIG_KALLSYMS instead."
>
> Probably. And then we should make sure that allyesconfig/allmodconfig
> don't pick it.
>
> There *are* reasonable uses for DEBUG_INFO:
>
> - if you very actively and extensively use kgdb, DEBUG_INFO is very useful.
>
> - distros might want to build distro kernels with DEBUG_INFO for the
> kernel debug package

- objdump -S is kind of useful. I find myself using that quite often.
(like at least a dozen times each merge window)

Pretty sure it works with DEBUG_INFO_REDUCED though, which is somewhat
faster than full DEBUG_INFO.

Dave

2014-01-28 23:37:35

by Borislav Petkov

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On Tue, Jan 28, 2014 at 09:49:06PM +0100, Borislav Petkov wrote:
> On Tue, Jan 28, 2014 at 12:25:15PM -0800, Linus Torvalds wrote:
> > Probably. And then we should make sure that allyesconfig/allmodconfig
> > don't pick it.
>
> I'd need to think about that a bit longer as scripts/kconfig/conf.c goes
> and sets those. Unless someone has a better idea...

Maybe we can do something simple like this; it works here. We could
even generate a random name for the all.config file so that there are
no conflicts with existing stuff and even test for its presence. Or we
could hide all that in a script and so on and so on...

---
diff --git a/scripts/kconfig/Makefile b/scripts/kconfig/Makefile
index 844bc9da08da..76e889fc6384 100644
--- a/scripts/kconfig/Makefile
+++ b/scripts/kconfig/Makefile
@@ -77,7 +77,9 @@ update-po-config: $(obj)/kxgettext $(obj)/gconf.glade.h
PHONY += allnoconfig allyesconfig allmodconfig alldefconfig randconfig

allnoconfig allyesconfig allmodconfig alldefconfig randconfig: $(obj)/conf
- $< --$@ $(Kconfig)
+ $(shell echo "# CONFIG_DEBUG_INFO is not set" > /tmp/all.config)
+ $(Q)KCONFIG_ALLCONFIG=/tmp/all.config $< --$@ $(Kconfig)
+ $(Q)rm -rf /tmp/all.config

PHONY += listnewconfig olddefconfig oldnoconfig savedefconfig defconfig


--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-01-29 06:38:19

by Mike Galbraith

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On Tue, 2014-01-28 at 16:08 -0500, Dave Jones wrote:
> On Tue, Jan 28, 2014 at 12:25:15PM -0800, Linus Torvalds wrote:
> > On Tue, Jan 28, 2014 at 12:15 PM, Borislav Petkov <[email protected]> wrote:
> > >
> > > Shouldn't we hold that down in the Kconfig help text of DEBUG_INFO?
> > > Something like:
> > >
> > > "You don't need to enable this if you want symbolic names for kernel
> > > objects. Enable CONFIG_KALLSYMS instead."
> >
> > Probably. And then we should make sure that allyesconfig/allmodconfig
> > don't pick it.
> >
> > There *are* reasonable uses for DEBUG_INFO:
> >
> > - if you very actively and extensively use kgdb, DEBUG_INFO is very useful.
> >
> > - distros might want to build distro kernels with DEBUG_INFO for the
> > kernel debug package
>
> - objdump -S is kind of useful. I find myself using that quite often.
> (like at least a dozen times each merge window)

For those with wimpy asm muscles (/me), it's very useful indeed, as are
gdb list *foo()+0x10 or *0xfeedbabedeadbeef. I always build with full
DEBUG_INFO, just keep configs lean enough that lots of kernels fit in a
2G /boot, and _never ever_ set a box up with a microscopic root.

> pretty sure it works with DEBUG_INFO_REDUCED though, which is somewhat
> faster than full DEBUG_INFO

Yeah, it does.

-Mike

2014-01-29 08:11:34

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14


* Linus Torvalds <[email protected]> wrote:

> On Tue, Jan 28, 2014 at 11:48 AM, Ingo Molnar <[email protected]> wrote:
> >
> > I really meant it when I said I build without debuginfo! :)
>
> Ok, but so what?
>
> As mentioned, nobody sane should build with DEBUG_INFO. But a normal
> vmlinux file has the symbol information even without it.

So, your mail sure read to me as a rant directed at me, so I thought
I'd defend myself or something :)

I now realize that the whole episode was caused by me calling the
vmlinux 'symbol-less':

> > > > AFAICS this won't work in a symbol-less vmlinux. Is there some
> > > > trick to do it with gdb?

while I should have said 'debuginfo-less'. Mea culpa.

> > So, when I build a kernel, such as with a regular 'make defconfig',
> > the following happens in gdb:
> >
> > Reading symbols from /home/mingo/tip/vmlinux...(no debugging symbols found)...done.
> > (gdb) list schedule+0x45
> > No symbol table is loaded. Use the "file" command.
> >
> > Is there a way to resolve schedule+0x45 in a regular vmlinux? It
> > was an honest question.
>
> That seems to be just a gdb bug (or "UI feature"), in that gdb likes
> to give misleading error messages and requires odd syntax for some
> things.

Yeah. Almost as if they worked hard to make annoying users go away or
something. (LLVM is IMO a blessing because, despite its somewhat
broken licensing, it cured a similar attitude of the GCC folks. In a
way competition is more important than licensing details!)

> But you can see that the symbol is perfectly fine:
>
> (gdb) list *(schedule+0x45)

Oh, cool. Thanks for that trick - this will save me quite some time in
the future.

So we can strip absolute addresses just fine from oopses - cool.

I'd even argue to strip the hex on non-randomized kernels as long as
there's kallsyms around, and only print hex if we don't have any
symbols.

> So my point is that the hex address doesn't give you *anything* that
> the symbolic address doesn't give you. [...]

Yeah, and with your trick that's now the case for my debugging as
well, which is a nice touch.

> [...] Unless you do truly crazy things like actively strip the
> kernel.

Being crazy is something I try to avoid. (Beyond being a maintainer
of a software project as busy and stressful as the Linux kernel that is.)

Thanks,

Ingo

2014-01-29 08:25:12

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14


* H. Peter Anvin <[email protected]> wrote:

> On 01/28/2014 12:28 PM, Richard Weinberger wrote:
> > Am 28.01.2014 21:25, schrieb Linus Torvalds:
> >> On Tue, Jan 28, 2014 at 12:15 PM, Borislav Petkov <[email protected]> wrote:
> >>>
> >>> Shouldn't we hold that down in the Kconfig help text of DEBUG_INFO?
> >>> Something like:
> >>>
> >>> "You don't need to enable this if you want symbolic names for kernel
> >>> objects. Enable CONFIG_KALLSYMS instead."
> >>
> >> Probably. And then we should make sure that allyesconfig/allmodconfig
> >> don't pick it.
> >
> > Let's make it depend on CONFIG_EXPERT.
> >
>
> That doesn't solve all*config. My scripts for all*config just
> overrides it, maybe we need to do something like that by default?

For features we want to exclude from allyesconfig we can define them
as logical negatives. (I've used this technique in the past to hide
silly options from allyesconfig and it works well.)

So to rein in debuginfo in allyesconfig/allmodconfig builds we could
do something like:

config SAVE_TIME_AND_DISK_SPACE
bool "Faster and leaner kernel build: compile without debug info"
default y

choice
prompt "Choose DEBUGINFO bloat level"
depends on !SAVE_TIME_AND_DISK_SPACE
default DEBUG_INFO_REDUCED
help
This option allows to select the debuginfo model.

config DEBUG_INFO_REDUCED
bool "Reduced amount of debugging information"

config DEBUG_INFO
bool "Full DEBUG info, 'planet killer' version"

endchoice

(or so, perhaps with slightly more PC text.)

That would default to Y and would disable debuginfo by default.

( And yeah, it's a bit ugly, but it beats having to hack the kconfig
language! )

Thanks,

Ingo

2014-01-29 08:27:14

by Mathias Krause

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On 29 January 2014 09:11, Ingo Molnar <[email protected]> wrote:
>> But you can see that the symbol is perfectly fine:
>>
>> (gdb) list *(schedule+0x45)
>
> Oh, cool. Thanks for that trick - this will save me quite some time in
> the future.
>
> So we can strip absolute addresses just fine from oopses - cool.
>
> I'd even argue to strip the hex on non-randomized kernels as long as
> there's kallsyms around, and only print hex if we don't have any
> symbols.

Please, don't do so! I do find the hex values in the backtrace *very*
useful as I'm using 'objdump -wdr vmlinux | less' quite often to
"browse around" in the kernel binary. Grepping for addresses from a
backtrace works quite nicely this way. Having to look up symbols and do
base-16 arithmetic in my head (or a shell, for that matter) would
only slow down this process. So, please leave the hex values in place.
They do help a lot -- at least in the non-kASLR case.

Regards,
Mathias

2014-01-29 10:40:59

by Borislav Petkov

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On Wed, Jan 29, 2014 at 09:25:05AM +0100, Ingo Molnar wrote:
> That would default to Y and would disable debuginfo by default.
>
> ( And yeah, it's a bit ugly, but it beats having to hack the kconfig
> language! )

Yeah, that could be another solution - I came up with another hack last
night: http://marc.info/?l=linux-kernel&m=139095226124549

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-01-30 09:23:09

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14


* Mathias Krause <[email protected]> wrote:

> On 29 January 2014 09:11, Ingo Molnar <[email protected]> wrote:
> >> But you can see that the symbol is perfectly fine:
> >>
> >> (gdb) list *(schedule+0x45)
> >
> > Oh, cool. Thanks for that trick - this will save me quite some time in
> > the future.
> >
> > So we can strip absolute addresses just fine from oopses - cool.
> >
> > I'd even argue to strip the hex on non-randomized kernels as long as
> > there's kallsyms around, and only print hex if we don't have any
> > symbols.
>
> Please, don't do so! I do find the hex values in the backtrace
> *very* useful as I'm using 'objdump -wdr vmlinux | less' quite often
> to "browse around" in the kernel binary. Grepping for addresses from
> a backtrace works quite nicely this way. Having to lookup symbols
> and do base-16 arithmetics in the head (or a shell, for that matter)
> would only slow down this process. So, please leave the hex values
> in place. They do help a lot -- at least in the non-kASLR case.

Well, if the consensus is that they help, then we'd better make them
correct in the KASLR case as well ...

Thanks,

Ingo

2014-01-30 18:15:22

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On Thu, Jan 30, 2014 at 1:23 AM, Ingo Molnar <[email protected]> wrote:
>
> Well, if the consensus is that they help then we better make them
> correct in the KASLR case as well ...

In the kaslr case, the hex values cannot possibly help, since they are
meaningless due to the random offset (the external tools will *not* be
able to use them without relocation information anyway, and if the
tools can take relocation into account, they might as well just take
the symbol name into account anyway).

I'd really suggest dropping the damn thing entirely. Even if you do
use "objdump" (and I admit to doing that quite often myself), the
symbol version isn't really all that hard to use. And the thing is,
once again, the symbol version works even for kaslr, while the raw hex
does not.

I do agree that having to do the arithmetic can be annoying, but it
wouldn't actually be all that hard to write a script that turns the
"objdump" format from hex address to "symbol+offset" format. It would
probably be three lines of perl:

- parse the "hex: rest-of-line" thing
- if "rest-of-line" is of the format "<symbol>:", then remember the
hex and the symbol name.
- print out "symbol+offset: rest-of-line" where "offset" is "hex -
remembered_hex"

Even I could probably do it, and my perl knowledge is really limited
to "I can edit other people's perl code if I'm lucky". Somebody who
actually does perl probably goes "Three lines? That's a one-liner".

Linus
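
A quick sketch of that post-processing filter, done in C rather than the
perl Linus suggests; the assumptions about objdump's "address <symbol>:"
and "address:" line formats are mine and only lightly checked:

#include <stdio.h>
#include <string.h>

/*
 * Read "objdump -d vmlinux" output on stdin and rewrite each instruction
 * line's leading address as "symbol+offset", using the most recent
 * "address <symbol>:" line as the base.
 */
int main(void)
{
	char line[1024], sym[256] = "";
	unsigned long base = 0, addr;

	while (fgets(line, sizeof(line), stdin)) {
		char *colon = strchr(line, ':');

		if (sscanf(line, "%lx <%255[^>]>:", &addr, sym) == 2) {
			base = addr;		/* start of a new symbol */
			fputs(line, stdout);
		} else if (base && colon &&
			   sscanf(line, "%lx:", &addr) == 1) {
			printf("%s+0x%lx:%s", sym, addr - base, colon + 1);
		} else {
			fputs(line, stdout);
		}
	}
	return 0;
}

Piping objdump output through something like this would give
"schedule+0x45:" style prefixes that line up directly with the symbolic
form in an oops.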

2014-01-30 22:07:33

by Vivek Goyal

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On Sun, Jan 26, 2014 at 10:51:06PM -0800, H. Peter Anvin wrote:
> On 01/26/2014 10:49 PM, Richard Weinberger wrote:
> >>
> >> No, because that information is available to user space unless we panic.
> >
> > Didn't you mean non-root?
> > I thought one has to set dmesg_restrict anyways if kASLR is used.
> >
> > And isn't the offset available to perf too?
> > Of course only for root, but still user space.
> >
>
> For certain system security levels one want to protect even from a rogue
> root. In those cases, leaking that information via dmesg and perf isn't
> going to work, either.
>
> With lower security settings, by all means...

I am wondering if kdump functionality is impacted by this change.

Kexec tools prepares ELF headers for kernel memory ranges and for the
range where the kernel text is mapped. So it needs to know the virtual
address of the region where the kernel is mapped (obtained from /proc/kcore),
and it gets the physical address where the kernel is loaded from /proc/iomem.

So with this change, are we planning to hide the kernel text virtual address
and physical address information from root in /proc/kcore and
/proc/iomem in any way?

Thanks
Vivek

2014-01-31 16:57:07

by Kees Cook

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On Thu, Jan 30, 2014 at 2:07 PM, Vivek Goyal <[email protected]> wrote:
> On Sun, Jan 26, 2014 at 10:51:06PM -0800, H. Peter Anvin wrote:
>> On 01/26/2014 10:49 PM, Richard Weinberger wrote:
>> >>
>> >> No, because that information is available to user space unless we panic.
>> >
>> > Didn't you mean non-root?
>> > I thought one has to set dmesg_restrict anyways if kASLR is used.
>> >
>> > And isn't the offset available to perf too?
>> > Of course only for root, but still user space.
>> >
>>
>> For certain system security levels one want to protect even from a rogue
>> root. In those cases, leaking that information via dmesg and perf isn't
>> going to work, either.
>>
>> With lower security settings, by all means...
>
> I am wondering if kdump functionality is impacted with this change.
>
> Kexec tools prepares ELF headers for kernel memory ranges and for the
> range where kernel text is mapped. So it needs to know virtual address
> of the region where kernel is mapped (obtained by /proc/kcore) and
> it gets the physical address where kernel is loaded from /proc/iomem.
>
> So with this change are we planning to hide kernel text virtual address and
> physical address information information from root in /proc/kcore and
> /proc/iomem in anyway?

I have no intention of that. As mentioned earlier in the thread, hiding
it from root will be pretty ugly/hard/pointless:
https://lkml.org/lkml/2014/1/27/287
I would like to just keep the offset out of dmesg.

-Kees

--
Kees Cook
Chrome OS Security

2014-02-07 14:49:58

by Vivek Goyal

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On Fri, Jan 31, 2014 at 08:57:03AM -0800, Kees Cook wrote:

[..]
> I have no intention of that. Mentioned earlier in the thread, hiding
> it from root will be pretty ugly/hard/pointless:
> https://lkml.org/lkml/2014/1/27/287
> I would like to just keep the offset out of dmesg.

[ CC Dave Young ]

Hi Kees,

Dave Young is testing kdump with kaslr enabled. He is facing some issues.

One issue he mentioned is that when the second kernel boots, it might be
placed in an area which is outside the area reserved for the second kernel.

We reserve a certain amount of memory for the second kernel and modify the
memory map of the second kernel using the memmap=exactmap parameter. It looks
like kernel placement happens before memmap=exactmap takes effect, and that
seems to be the reason the second kernel can be placed outside the reserved
memory.

IOW, memmap=exactmap and kaslr don't work together. Is it possible to
first let memmap=exactmap take effect and then let kaslr do its job? Or is
it too late by the time memmap=exactmap is parsed?

As a workaround, Dave is currently using the "nokaslr" command line parameter
for the second kernel. He is still facing issues where makedumpfile
segfaults. He is looking into it further.

I thought I would at least bring up the issue of memmap=exactmap and kaslr
being incompatible.

Thanks
Vivek

2014-02-07 16:04:30

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On 02/07/2014 06:49 AM, Vivek Goyal wrote:
>
> Hi Kees,
>
> Dave Young is testing kdump with kaslr enabled. He is facing some issues.
>
> One issue he mentioned is that when second kernel boots, it might be
> placed in an area which is outside the reserved area for second kernel.
>
> We reserve a certain memory for second kernel. And modify memory map of
> second kernel using memmap=exactmap parameter. Looks like kernel placement
> is happening before memmap=exactmap takes effect. And that seems to be
> the reason that second kernel can be placed outside the reserved memory.
>
> IOW, memmap=exactmap and kaslr don't work together. Is it possible to
> first let memmap=exactmap take affect and then kaslr does its job. Or it
> is too late by the time memmap=exactmap is parsed.
>
> As a workaround, Dave is currently using "nokaslr" command line parameter
> for second kernel. He is still facing issues where makedumpfile segment
> faults. He is looking into it further.
>
> I thought I will atleast bring up with issue of memmap=exactmap and kaslr
> being incompatible.
>

Yes, because memmap=exactmap gets parsed too late; kaslr assumes that
the e820 information passed to it is actually correct.

Yet another instance of breakage caused by the decision on the part of
kdump to rely on command-line options.

-hpa

2014-02-07 16:25:14

by Vivek Goyal

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On Fri, Feb 07, 2014 at 08:04:13AM -0800, H. Peter Anvin wrote:
> On 02/07/2014 06:49 AM, Vivek Goyal wrote:
> >
> > Hi Kees,
> >
> > Dave Young is testing kdump with kaslr enabled. He is facing some issues.
> >
> > One issue he mentioned is that when second kernel boots, it might be
> > placed in an area which is outside the reserved area for second kernel.
> >
> > We reserve a certain memory for second kernel. And modify memory map of
> > second kernel using memmap=exactmap parameter. Looks like kernel placement
> > is happening before memmap=exactmap takes effect. And that seems to be
> > the reason that second kernel can be placed outside the reserved memory.
> >
> > IOW, memmap=exactmap and kaslr don't work together. Is it possible to
> > first let memmap=exactmap take affect and then kaslr does its job. Or it
> > is too late by the time memmap=exactmap is parsed.
> >
> > As a workaround, Dave is currently using "nokaslr" command line parameter
> > for second kernel. He is still facing issues where makedumpfile segment
> > faults. He is looking into it further.
> >
> > I thought I will atleast bring up with issue of memmap=exactmap and kaslr
> > being incompatible.
> >
>
> Yes, because memmap=exactmap gets parsed too late; kaslr assumes that
> the e820 information passed to it is actually correct.
>
> Yet another cause of breakage caused by the decision on the part of
> kdump to rely on command-line options.

[CC kexec mailing list]

Ok, I think it is high time we changed kexec-tools to stop using
memmap=exactmap and start passing the modified memory map in bootparams.
I think the only concern with that change was backward compatibility of
kexec-tools with older kernels.

IIUC, the only thing impacted by this change is users of saved_max_pfn,
which determines the highest accessible pfn in the first kernel. Some
Calgary IOMMU code seems to be the only user of it now.

So maybe we can create a new kexec-tools command line option, say
--pass-memmap-cmdline, which forces the old behavior, and pass the
memmap in bootparams by default.

Or create --do-not-pass-memmap-cmdline for new users to opt in, keeping
the old behavior as the default.

Thanks
Vivek
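
To make the bootparams idea concrete, here is a hypothetical sketch of
what kexec-tools could do instead of building memmap=exactmap strings:
describe the crash reservation directly in the e820 table of the
boot_params it hands to the second kernel. The structure below is a
cut-down stand-in, not the real struct boot_params and not actual
kexec-tools code.

#include <stdint.h>
#include <string.h>

#define E820MAX		128	/* e820 slots available in struct boot_params */
#define E820_RAM	1
#define E820_RESERVED	2

struct e820entry {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
};

/* Only the fields this sketch touches; the real struct boot_params is
 * defined in the kernel's asm/bootparam.h. */
struct fake_boot_params {
	uint8_t e820_entries;
	struct e820entry e820_map[E820MAX];
};

static int add_e820(struct fake_boot_params *bp, uint64_t addr,
		    uint64_t size, uint32_t type)
{
	if (bp->e820_entries >= E820MAX)
		return -1;	/* overflow would need setup_data, see later in the thread */
	bp->e820_map[bp->e820_entries].addr = addr;
	bp->e820_map[bp->e820_entries].size = size;
	bp->e820_map[bp->e820_entries].type = type;
	bp->e820_entries++;
	return 0;
}

/* Describe only the crash reservation (plus the low 1 MiB the boot
 * path expects) instead of appending memmap=exactmap strings to the
 * kernel command line. */
static void build_crash_memmap(struct fake_boot_params *bp,
			       uint64_t crash_base, uint64_t crash_size)
{
	memset(bp, 0, sizeof(*bp));
	add_e820(bp, 0, 0x100000, E820_RESERVED);
	add_e820(bp, crash_base, crash_size, E820_RAM);
}

int main(void)
{
	struct fake_boot_params bp;

	/* 128 MiB reservation at 768 MiB (illustrative figures) */
	build_crash_memmap(&bp, 0x30000000ULL, 0x08000000ULL);
	return bp.e820_entries == 2 ? 0 : 1;
}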

2014-02-07 19:07:13

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On 02/07/2014 06:49 AM, Vivek Goyal wrote:
>
> As a workaround, Dave is currently using "nokaslr" command line parameter
> for second kernel. He is still facing issues where makedumpfile segment
> faults. He is looking into it further.
>

Now, let's state this: kaslr for kdump is almost certainly useless (the
amount of reserved memory is not enough to provide any meaningful
randomization, so any randomization needs to happen during the memory
reservation phase.) So disabling kaslr in the kdump kernel is entirely
appropriate.

-hpa
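
Rough numbers behind that argument (all figures are illustrative
assumptions, not taken from the thread): with a 2 MiB slot size
(CONFIG_PHYSICAL_ALIGN), a 128 MiB crashkernel reservation offers about
64 positions, versus roughly 512 across a 1 GiB randomization range,
and the kernel's own footprint shrinks the kdump number further. A
tiny calculation (link with -lm):

#include <math.h>
#include <stdio.h>

int main(void)
{
	/* Assumed figures: 2 MiB slot size, 128 MiB crashkernel
	 * reservation, 1 GiB maximum randomization range for a
	 * normal boot. */
	double slot_mib = 2.0, crash_mib = 128.0, full_mib = 1024.0;

	printf("kdump:  %3.0f slots (~%.1f bits of entropy)\n",
	       crash_mib / slot_mib, log2(crash_mib / slot_mib));
	printf("normal: %3.0f slots (~%.1f bits of entropy)\n",
	       full_mib / slot_mib, log2(full_mib / slot_mib));
	return 0;
}

Under those assumptions the kdump case gets about 6 bits of entropy at
best, which supports the conclusion that disabling it there loses
little.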

2014-02-07 19:44:50

by Kees Cook

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On Fri, Feb 7, 2014 at 11:07 AM, H. Peter Anvin <[email protected]> wrote:
> On 02/07/2014 06:49 AM, Vivek Goyal wrote:
>>
>> As a workaround, Dave is currently using "nokaslr" command line parameter
>> for second kernel. He is still facing issues where makedumpfile segment
>> faults. He is looking into it further.
>>
>
> Now, let's state this: kaslr for kdump is almost certainly useless (the
> amount of reserved memory is not enough to provide any meaningful
> randomization, so any randomization needs to happen during the memory
> reservation phase.) So disabling kaslr in the kdump kernel is entirely
> appropriate.

Peter covered everything already, but yeah, kaslr and kdump may not
make a lot of sense together. Regardless, yes, kaslr only examines
e820 for memory space. It has to do all this work before the kernel is
decompressed, so it runs very early.

-Kees

--
Kees Cook
Chrome OS Security
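
The "nokaslr" workaround mentioned throughout the thread hooks in at
this same early stage. Below is a simplified stand-alone sketch of the
idea; the real decompressor checks the command line taken from
boot_params (via its cmdline_find_option_bool() helper) before doing
any e820 work, while the matcher here is deliberately naive.

#include <stdbool.h>
#include <string.h>

/* Naive whole-word match on a whitespace-delimited command line;
 * stand-in for the boot code's command-line scanning. */
static bool cmdline_has_word(const char *cmdline, const char *word)
{
	size_t len = strlen(word);
	const char *p = cmdline;

	while ((p = strstr(p, word)) != NULL) {
		bool at_start = (p == cmdline || p[-1] == ' ');
		bool at_end = (p[len] == '\0' || p[len] == ' ');

		if (at_start && at_end)
			return true;
		p += len;
	}
	return false;
}

/* Hypothetical wrapper with a simplified signature: bail out of
 * randomization early when "nokaslr" is present, otherwise fall
 * through to the e820 slot selection sketched earlier in the thread. */
unsigned long choose_load_address(const char *cmdline,
				  unsigned long default_addr)
{
	if (cmdline_has_word(cmdline, "nokaslr"))
		return default_addr;

	/* ... randomize based on the e820 map here ... */
	return default_addr;
}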

2014-02-07 23:16:41

by Dave Young

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On 02/07/14 at 11:24am, Vivek Goyal wrote:
> On Fri, Feb 07, 2014 at 08:04:13AM -0800, H. Peter Anvin wrote:
> > On 02/07/2014 06:49 AM, Vivek Goyal wrote:
> > >
> > > Hi Kees,
> > >
> > > Dave Young is testing kdump with kaslr enabled. He is facing some issues.
> > >
> > > One issue he mentioned is that when second kernel boots, it might be
> > > placed in an area which is outside the reserved area for second kernel.
> > >
> > > We reserve a certain memory for second kernel. And modify memory map of
> > > second kernel using memmap=exactmap parameter. Looks like kernel placement
> > > is happening before memmap=exactmap takes effect. And that seems to be
> > > the reason that second kernel can be placed outside the reserved memory.
> > >
> > > IOW, memmap=exactmap and kaslr don't work together. Is it possible to
> > > first let memmap=exactmap take affect and then kaslr does its job. Or it
> > > is too late by the time memmap=exactmap is parsed.
> > >
> > > As a workaround, Dave is currently using "nokaslr" command line parameter
> > > for second kernel. He is still facing issues where makedumpfile segment
> > > faults. He is looking into it further.
> > >
> > > I thought I will atleast bring up with issue of memmap=exactmap and kaslr
> > > being incompatible.
> > >
> >
> > Yes, because memmap=exactmap gets parsed too late; kaslr assumes that
> > the e820 information passed to it is actually correct.
> >
> > Yet another cause of breakage caused by the decision on the part of
> > kdump to rely on command-line options.
>
> [CC kexec mailing list]
>
> Ok, I think this is high time we change kexec-tools to not use
> memmap=exactmap and start passing modified memory map in bootparams. I
> think only concern with that change was backward compatibility of
> kexec-tools with older kernels.
>
> IIUC, only thing which will be impacted by this change is users of
> saved_max_pfn which determine the highest accessible pfn in first
> kernel. Some calgary IOMMU code seems to be the only user of it now.
>
> So may be we can create a new command line option say --pass-memmap-cmdline
> to kexec-tools which forces old behavior and by default we pass memmap
> in bootparams.
>
> Or create --do-not-pass-memmap-cmdline and new users will use it. Default
> will be old behavior.

Hi Vivek,

Chaowang <[email protected]> is looking into passing setup_data SETUP_E820_EXT
instead of using exactmap. Previously Thomas Renninger tried passing them in e820.
I could not find the old thread, but I remember it was not enough because of the
128-entry limitation.

As for kaslr, I agree that it does not make sense for the kdump kernel.
IMHO it's better to disable it automatically for the kdump kernel, but we
can use nokaslr as a workaround for now.

Thanks
Dave

2014-02-07 23:21:45

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On 02/07/2014 03:16 PM, Dave Young wrote:
>
> Hi Vivek,
>
> Chaowang <[email protected]> is looking into passing setup_data SETUP_E820_EXT
> instead of using exactmap. Previously Thomas Renninger tried passing them in e820.
> I did not find the old thread, but I remember it's not enough because of the
> 128 entries limitation.
>

THERE IS NO SUCH LIMITATION.

Only 128 slots fit in struct boot_params, but additional entries can be
added to the setup_data linked list.

-hpa
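
Concretely, the mechanism referred to here looks roughly like the
sketch below. The layouts are abbreviated versions of the structures
defined in the kernel's bootparam.h and e820.h headers, and the walker
is illustrative only: each setup_data node carries a type, a length and
a payload, and SETUP_E820_EXT payloads hold additional e820 entries
beyond the 128 that fit in struct boot_params.

#include <stdint.h>

#define SETUP_E820_EXT	1	/* value as in asm/bootparam.h */

struct setup_data {
	uint64_t next;		/* phys addr of the next node, 0 == end */
	uint32_t type;		/* e.g. SETUP_E820_EXT */
	uint32_t len;		/* length of data[] in bytes */
	uint8_t  data[];	/* for SETUP_E820_EXT: extra e820 entries */
};

struct e820entry {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
} __attribute__((packed));	/* 20 bytes, as in the boot protocol */

/*
 * Count the extra e820 entries passed through the chain hanging off
 * boot_params.hdr.setup_data.  In real code the 'next' physical
 * addresses would have to be mapped first; this sketch pretends they
 * are directly dereferenceable.
 */
unsigned long count_ext_e820(const struct setup_data *sd)
{
	unsigned long n = 0;

	for (; sd; sd = (const struct setup_data *)(uintptr_t)sd->next)
		if (sd->type == SETUP_E820_EXT)
			n += sd->len / sizeof(struct e820entry);
	return n;
}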

2014-02-07 23:28:47

by Dave Young

[permalink] [raw]
Subject: Re: [GIT PULL] x86/kaslr for v3.14

On 02/07/14 at 03:20pm, H. Peter Anvin wrote:
> On 02/07/2014 03:16 PM, Dave Young wrote:
> >
> > Hi Vivek,
> >
> > Chaowang <[email protected]> is looking into passing setup_data SETUP_E820_EXT
> > instead of using exactmap. Previously Thomas Renninger tried passing them in e820.
> > I did not find the old thread, but I remember it's not enough because of the
> > 128 entries limitation.
> >
>
> THERE IS NO SUCH LIMITATION.
>
> Only 128 slots fit in struct boot_params, but additional entries can be
> added to the setup_data linked list.

Ok, I did not describe it right; I actually meant the space limit. Thanks for
clarifying.