2013-04-04 20:08:11

by Kees Cook

Subject: [PATCH 0/3] kernel ASLR

Hello,

This patch series implements per-boot kernel base offset ASLR. It is based
on work by Dan Rosenberg, Neill Clift, Michael Davidson, and myself. Since
Dan's original thread[1], this code has been improved to work on 64-bit,
among other things.

This is presently in use at Google, and is being ported to Chrome
OS. It has several limitations currently, but I wanted to get the ball
rolling again on upstreaming this. More details are in the individual
patches. They are split into three pieces: the offset selection logic,
the 64-bit relocation logic, and finally putting the offset to use at
boot time.

Thanks,

-Kees

[1] http://lkml.indiana.edu/hypermail/linux/kernel/1105.3/index.html#00520


2013-04-04 20:08:15

by Kees Cook

Subject: [PATCH 2/3] x86: build reloc tool for both 64 and 32 bit

Add logic for 64-bit kernel relocations. Since there is no need to
handle 32 and 64 bit at the same time, refactor away most of the 32/64
bit ELF differences and split the build into producing two separate
binaries. Additionally, this switches to using realloc instead of a two-pass
approach.

Heavily based on work by Neill Clift and Michael Davidson.

Signed-off-by: Kees Cook <[email protected]>
Cc: Eric Northup <[email protected]>
---
arch/x86/boot/compressed/Makefile | 2 +-
arch/x86/realmode/rm/Makefile | 2 +-
arch/x86/tools/.gitignore | 3 +-
arch/x86/tools/Makefile | 14 +-
arch/x86/tools/relocs.c | 717 ++++++++++++++++++++++++++-----------
arch/x86/tools/relocs_32.c | 1 +
arch/x86/tools/relocs_64.c | 2 +
7 files changed, 533 insertions(+), 208 deletions(-)
create mode 100644 arch/x86/tools/relocs_32.c
create mode 100644 arch/x86/tools/relocs_64.c

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 376ef47..deaed7d 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -48,7 +48,7 @@ $(obj)/vmlinux.bin: vmlinux FORCE

targets += vmlinux.bin.all vmlinux.relocs

-CMD_RELOCS = arch/x86/tools/relocs
+CMD_RELOCS = arch/x86/tools/relocs_$(BITS)
quiet_cmd_relocs = RELOCS $@
cmd_relocs = $(CMD_RELOCS) $< > $@;$(CMD_RELOCS) --abs-relocs $<
$(obj)/vmlinux.relocs: vmlinux FORCE
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index 8869287..2b1e429 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -56,7 +56,7 @@ $(obj)/realmode.bin: $(obj)/realmode.elf $(obj)/realmode.relocs
$(call if_changed,objcopy)

quiet_cmd_relocs = RELOCS $@
- cmd_relocs = arch/x86/tools/relocs --realmode $< > $@
+ cmd_relocs = arch/x86/tools/relocs_32 --realmode $< > $@

targets += realmode.relocs
$(obj)/realmode.relocs: $(obj)/realmode.elf FORCE
diff --git a/arch/x86/tools/.gitignore b/arch/x86/tools/.gitignore
index be0ed06..51374a2 100644
--- a/arch/x86/tools/.gitignore
+++ b/arch/x86/tools/.gitignore
@@ -1 +1,2 @@
-relocs
+relocs_32
+relocs_64
diff --git a/arch/x86/tools/Makefile b/arch/x86/tools/Makefile
index bae601f..8c3b17a 100644
--- a/arch/x86/tools/Makefile
+++ b/arch/x86/tools/Makefile
@@ -37,6 +37,16 @@ $(obj)/test_get_len.o: $(srctree)/arch/x86/lib/insn.c $(srctree)/arch/x86/lib/in

$(obj)/insn_sanity.o: $(srctree)/arch/x86/lib/insn.c $(srctree)/arch/x86/lib/inat.c $(srctree)/arch/x86/include/asm/inat_types.h $(srctree)/arch/x86/include/asm/inat.h $(srctree)/arch/x86/include/asm/insn.h $(objtree)/arch/x86/lib/inat-tables.c

+$(obj)/relocs_64.o: $(srctree)/arch/x86/tools/relocs.c $(srctree)/arch/x86/tools/relocs_64.c
+$(obj)/relocs_32.o: $(srctree)/arch/x86/tools/relocs.c $(srctree)/arch/x86/tools/relocs_32.c
+
HOST_EXTRACFLAGS += -I$(srctree)/tools/include
-hostprogs-y += relocs
-relocs: $(obj)/relocs
+hostprogs-y += relocs_$(BITS)
+relocs_binaries = relocs_$(BITS)
+ifeq ($(CONFIG_64BIT),y)
+ hostprogs-y += relocs_32
+ relocs_binaries += relocs_32
+endif
+relocs: $(relocs_binaries)
+relocs_64: $(obj)/relocs_64
+relocs_32: $(obj)/relocs_32
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index 79d67bd..63c5090 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -2,6 +2,7 @@
#include <stdarg.h>
#include <stdlib.h>
#include <stdint.h>
+#include <inttypes.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
@@ -12,21 +13,78 @@
#include <regex.h>
#include <tools/le_byteshift.h>

+#ifdef CONFIG_X86_64
+#define ELF_BITS 64
+#define ELF_MACHINE EM_X86_64
+#define ELF_MACHINE_NAME "x86_64"
+#define SHT_REL_TYPE SHT_RELA
+#define Elf_Rel Elf64_Rela
+#else
+#define ELF_BITS 32
+#define ELF_MACHINE EM_386
+#define ELF_MACHINE_NAME "i386"
+#define SHT_REL_TYPE SHT_REL
+#define Elf_Rel Elf32_Rel
+#endif
+
+#if (ELF_BITS == 64)
+#define ELF_CLASS ELFCLASS64
+#define ELF_R_SYM(val) ELF64_R_SYM(val)
+#define ELF_R_TYPE(val) ELF64_R_TYPE(val)
+#define ELF_ST_TYPE(o) ELF64_ST_TYPE(o)
+#define ELF_ST_BIND(o) ELF64_ST_BIND(o)
+#define ELF_ST_VISIBILITY(o) ELF64_ST_VISIBILITY(o)
+#else
+#define ELF_CLASS ELFCLASS32
+#define ELF_R_SYM(val) ELF32_R_SYM(val)
+#define ELF_R_TYPE(val) ELF32_R_TYPE(val)
+#define ELF_ST_TYPE(o) ELF32_ST_TYPE(o)
+#define ELF_ST_BIND(o) ELF32_ST_BIND(o)
+#define ELF_ST_VISIBILITY(o) ELF32_ST_VISIBILITY(o)
+#endif
+
+#define ElfW(type) _ElfW(ELF_BITS, type)
+#define _ElfW(bits, type) __ElfW(bits, type)
+#define __ElfW(bits, type) Elf##bits##_##type
+
static void die(char *fmt, ...);

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
-static Elf32_Ehdr ehdr;
-static unsigned long reloc_count, reloc_idx;
-static unsigned long *relocs;
-static unsigned long reloc16_count, reloc16_idx;
-static unsigned long *relocs16;
+static ElfW(Ehdr) ehdr;
+
+struct relocs {
+ uint32_t *offset;
+ unsigned long count;
+ unsigned long size;
+};
+
+struct relocs relocs16;
+struct relocs relocs32;
+#ifdef CONFIG_X86_64
+struct relocs relocs64;
+#endif
+
+static void add_reloc(struct relocs *r, uint32_t offset)
+{
+ if (r->count == r->size) {
+ unsigned long newsize = r->size + 50000;
+ void *mem = realloc(r->offset, newsize * sizeof(r->offset[0]));
+
+ if (!mem)
+ die("realloc of %ld entries for relocs failed\n",
+ newsize);
+ r->offset = mem;
+ r->size = newsize;
+ }
+ r->offset[r->count++] = offset;
+}

struct section {
- Elf32_Shdr shdr;
- struct section *link;
- Elf32_Sym *symtab;
- Elf32_Rel *reltab;
- char *strtab;
+ ElfW(Shdr) shdr;
+ struct section *link;
+ ElfW(Sym) *symtab;
+ Elf_Rel *reltab;
+ char *strtab;
};
static struct section *secs;

@@ -49,6 +107,9 @@ static const char * const sym_regex_kernel[S_NSYMTYPES] = {
"^(xen_irq_disable_direct_reloc$|"
"xen_save_fl_direct_reloc$|"
"VDSO|"
+#ifdef CONFIG_X86_64
+ "__vvar_page|"
+#endif
"__crc_)",

/*
@@ -72,6 +133,11 @@ static const char * const sym_regex_kernel[S_NSYMTYPES] = {
"__end_rodata|"
"__initramfs_start|"
"(jiffies|jiffies_64)|"
+#ifdef CONFIG_X86_64
+ "__per_cpu_load|"
+ "init_per_cpu__.*|"
+ "__end_rodata_hpage_align|"
+#endif
"_end)$"
};

@@ -198,6 +264,24 @@ static const char *rel_type(unsigned type)
{
static const char *type_name[] = {
#define REL_TYPE(X) [X] = #X
+#ifdef CONFIG_X86_64
+ REL_TYPE(R_X86_64_NONE),
+ REL_TYPE(R_X86_64_64),
+ REL_TYPE(R_X86_64_PC32),
+ REL_TYPE(R_X86_64_GOT32),
+ REL_TYPE(R_X86_64_PLT32),
+ REL_TYPE(R_X86_64_COPY),
+ REL_TYPE(R_X86_64_GLOB_DAT),
+ REL_TYPE(R_X86_64_JUMP_SLOT),
+ REL_TYPE(R_X86_64_RELATIVE),
+ REL_TYPE(R_X86_64_GOTPCREL),
+ REL_TYPE(R_X86_64_32),
+ REL_TYPE(R_X86_64_32S),
+ REL_TYPE(R_X86_64_16),
+ REL_TYPE(R_X86_64_PC16),
+ REL_TYPE(R_X86_64_8),
+ REL_TYPE(R_X86_64_PC8),
+#else
REL_TYPE(R_386_NONE),
REL_TYPE(R_386_32),
REL_TYPE(R_386_PC32),
@@ -213,6 +297,7 @@ static const char *rel_type(unsigned type)
REL_TYPE(R_386_PC8),
REL_TYPE(R_386_16),
REL_TYPE(R_386_PC16),
+#endif
#undef REL_TYPE
};
const char *name = "unknown type rel type name";
@@ -240,7 +325,7 @@ static const char *sec_name(unsigned shndx)
return name;
}

-static const char *sym_name(const char *sym_strtab, Elf32_Sym *sym)
+static const char *sym_name(const char *sym_strtab, ElfW(Sym) *sym)
{
const char *name;
name = "<noname>";
@@ -253,15 +338,45 @@ static const char *sym_name(const char *sym_strtab, Elf32_Sym *sym)
return name;
}

+#ifdef CONFIG_X86_64
+static ElfW(Sym) *sym_lookup(const char *symname)
+{
+ int i;
+ for (i = 0; i < ehdr.e_shnum; i++) {
+ struct section *sec = &secs[i];
+ long nsyms;
+ char *strtab;
+ ElfW(Sym) *symtab;
+ ElfW(Sym) *sym;
+
+ if (sec->shdr.sh_type != SHT_SYMTAB)
+ continue;
+
+ nsyms = sec->shdr.sh_size/sizeof(ElfW(Sym));
+ symtab = sec->symtab;
+ strtab = sec->link->strtab;
+
+ for (sym = symtab; --nsyms >= 0; sym++) {
+ if (!sym->st_name)
+ continue;
+ if (strcmp(symname, strtab + sym->st_name) == 0)
+ return sym;
+ }
+ }
+ return 0;
+}
+#endif


#if BYTE_ORDER == LITTLE_ENDIAN
#define le16_to_cpu(val) (val)
#define le32_to_cpu(val) (val)
+#define le64_to_cpu(val) (val)
#endif
#if BYTE_ORDER == BIG_ENDIAN
#define le16_to_cpu(val) bswap_16(val)
#define le32_to_cpu(val) bswap_32(val)
+#define le64_to_cpu(val) bswap_64(val)
#endif

static uint16_t elf16_to_cpu(uint16_t val)
@@ -274,6 +389,25 @@ static uint32_t elf32_to_cpu(uint32_t val)
return le32_to_cpu(val);
}

+#if (ELF_BITS == 64)
+static uint64_t elf64_to_cpu(uint64_t val)
+{
+ return le64_to_cpu(val);
+}
+#endif
+
+#define elf_half_to_cpu(x) elf16_to_cpu(x)
+#define elf_word_to_cpu(x) elf32_to_cpu(x)
+#if (ELF_BITS == 64)
+#define elf_addr_to_cpu(x) elf64_to_cpu(x)
+#define elf_off_to_cpu(x) elf64_to_cpu(x)
+#define elf_xword_to_cpu(x) elf64_to_cpu(x)
+#else
+#define elf_addr_to_cpu(x) elf32_to_cpu(x)
+#define elf_off_to_cpu(x) elf32_to_cpu(x)
+#define elf_xword_to_cpu(x) elf32_to_cpu(x)
+#endif
+
static void read_ehdr(FILE *fp)
{
if (fread(&ehdr, sizeof(ehdr), 1, fp) != 1) {
@@ -283,8 +417,8 @@ static void read_ehdr(FILE *fp)
if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0) {
die("No ELF magic\n");
}
- if (ehdr.e_ident[EI_CLASS] != ELFCLASS32) {
- die("Not a 32 bit executable\n");
+ if (ehdr.e_ident[EI_CLASS] != ELF_CLASS) {
+ die("Not a %d bit executable\n", ELF_BITS);
}
if (ehdr.e_ident[EI_DATA] != ELFDATA2LSB) {
die("Not a LSB ELF executable\n");
@@ -293,36 +427,36 @@ static void read_ehdr(FILE *fp)
die("Unknown ELF version\n");
}
/* Convert the fields to native endian */
- ehdr.e_type = elf16_to_cpu(ehdr.e_type);
- ehdr.e_machine = elf16_to_cpu(ehdr.e_machine);
- ehdr.e_version = elf32_to_cpu(ehdr.e_version);
- ehdr.e_entry = elf32_to_cpu(ehdr.e_entry);
- ehdr.e_phoff = elf32_to_cpu(ehdr.e_phoff);
- ehdr.e_shoff = elf32_to_cpu(ehdr.e_shoff);
- ehdr.e_flags = elf32_to_cpu(ehdr.e_flags);
- ehdr.e_ehsize = elf16_to_cpu(ehdr.e_ehsize);
- ehdr.e_phentsize = elf16_to_cpu(ehdr.e_phentsize);
- ehdr.e_phnum = elf16_to_cpu(ehdr.e_phnum);
- ehdr.e_shentsize = elf16_to_cpu(ehdr.e_shentsize);
- ehdr.e_shnum = elf16_to_cpu(ehdr.e_shnum);
- ehdr.e_shstrndx = elf16_to_cpu(ehdr.e_shstrndx);
+ ehdr.e_type = elf_half_to_cpu(ehdr.e_type);
+ ehdr.e_machine = elf_half_to_cpu(ehdr.e_machine);
+ ehdr.e_version = elf_word_to_cpu(ehdr.e_version);
+ ehdr.e_entry = elf_addr_to_cpu(ehdr.e_entry);
+ ehdr.e_phoff = elf_off_to_cpu(ehdr.e_phoff);
+ ehdr.e_shoff = elf_off_to_cpu(ehdr.e_shoff);
+ ehdr.e_flags = elf_word_to_cpu(ehdr.e_flags);
+ ehdr.e_ehsize = elf_half_to_cpu(ehdr.e_ehsize);
+ ehdr.e_phentsize = elf_half_to_cpu(ehdr.e_phentsize);
+ ehdr.e_phnum = elf_half_to_cpu(ehdr.e_phnum);
+ ehdr.e_shentsize = elf_half_to_cpu(ehdr.e_shentsize);
+ ehdr.e_shnum = elf_half_to_cpu(ehdr.e_shnum);
+ ehdr.e_shstrndx = elf_half_to_cpu(ehdr.e_shstrndx);

if ((ehdr.e_type != ET_EXEC) && (ehdr.e_type != ET_DYN)) {
die("Unsupported ELF header type\n");
}
- if (ehdr.e_machine != EM_386) {
- die("Not for x86\n");
+ if (ehdr.e_machine != ELF_MACHINE) {
+ die("Not for %s\n", ELF_MACHINE_NAME);
}
if (ehdr.e_version != EV_CURRENT) {
die("Unknown ELF version\n");
}
- if (ehdr.e_ehsize != sizeof(Elf32_Ehdr)) {
+ if (ehdr.e_ehsize != sizeof(ElfW(Ehdr))) {
die("Bad Elf header size\n");
}
- if (ehdr.e_phentsize != sizeof(Elf32_Phdr)) {
+ if (ehdr.e_phentsize != sizeof(ElfW(Phdr))) {
die("Bad program header entry\n");
}
- if (ehdr.e_shentsize != sizeof(Elf32_Shdr)) {
+ if (ehdr.e_shentsize != sizeof(ElfW(Shdr))) {
die("Bad section header entry\n");
}
if (ehdr.e_shstrndx >= ehdr.e_shnum) {
@@ -333,7 +467,7 @@ static void read_ehdr(FILE *fp)
static void read_shdrs(FILE *fp)
{
int i;
- Elf32_Shdr shdr;
+ ElfW(Shdr) shdr;

secs = calloc(ehdr.e_shnum, sizeof(struct section));
if (!secs) {
@@ -349,16 +483,16 @@ static void read_shdrs(FILE *fp)
if (fread(&shdr, sizeof shdr, 1, fp) != 1)
die("Cannot read ELF section headers %d/%d: %s\n",
i, ehdr.e_shnum, strerror(errno));
- sec->shdr.sh_name = elf32_to_cpu(shdr.sh_name);
- sec->shdr.sh_type = elf32_to_cpu(shdr.sh_type);
- sec->shdr.sh_flags = elf32_to_cpu(shdr.sh_flags);
- sec->shdr.sh_addr = elf32_to_cpu(shdr.sh_addr);
- sec->shdr.sh_offset = elf32_to_cpu(shdr.sh_offset);
- sec->shdr.sh_size = elf32_to_cpu(shdr.sh_size);
- sec->shdr.sh_link = elf32_to_cpu(shdr.sh_link);
- sec->shdr.sh_info = elf32_to_cpu(shdr.sh_info);
- sec->shdr.sh_addralign = elf32_to_cpu(shdr.sh_addralign);
- sec->shdr.sh_entsize = elf32_to_cpu(shdr.sh_entsize);
+ sec->shdr.sh_name = elf_word_to_cpu(shdr.sh_name);
+ sec->shdr.sh_type = elf_word_to_cpu(shdr.sh_type);
+ sec->shdr.sh_flags = elf_xword_to_cpu(shdr.sh_flags);
+ sec->shdr.sh_addr = elf_addr_to_cpu(shdr.sh_addr);
+ sec->shdr.sh_offset = elf_off_to_cpu(shdr.sh_offset);
+ sec->shdr.sh_size = elf_xword_to_cpu(shdr.sh_size);
+ sec->shdr.sh_link = elf_word_to_cpu(shdr.sh_link);
+ sec->shdr.sh_info = elf_word_to_cpu(shdr.sh_info);
+ sec->shdr.sh_addralign = elf_xword_to_cpu(shdr.sh_addralign);
+ sec->shdr.sh_entsize = elf_xword_to_cpu(shdr.sh_entsize);
if (sec->shdr.sh_link < ehdr.e_shnum)
sec->link = &secs[sec->shdr.sh_link];
}
@@ -412,12 +546,12 @@ static void read_symtabs(FILE *fp)
die("Cannot read symbol table: %s\n",
strerror(errno));
}
- for (j = 0; j < sec->shdr.sh_size/sizeof(Elf32_Sym); j++) {
- Elf32_Sym *sym = &sec->symtab[j];
- sym->st_name = elf32_to_cpu(sym->st_name);
- sym->st_value = elf32_to_cpu(sym->st_value);
- sym->st_size = elf32_to_cpu(sym->st_size);
- sym->st_shndx = elf16_to_cpu(sym->st_shndx);
+ for (j = 0; j < sec->shdr.sh_size/sizeof(ElfW(Sym)); j++) {
+ ElfW(Sym) *sym = &sec->symtab[j];
+ sym->st_name = elf_word_to_cpu(sym->st_name);
+ sym->st_value = elf_addr_to_cpu(sym->st_value);
+ sym->st_size = elf_xword_to_cpu(sym->st_size);
+ sym->st_shndx = elf_half_to_cpu(sym->st_shndx);
}
}
}
@@ -428,7 +562,7 @@ static void read_relocs(FILE *fp)
int i,j;
for (i = 0; i < ehdr.e_shnum; i++) {
struct section *sec = &secs[i];
- if (sec->shdr.sh_type != SHT_REL) {
+ if (sec->shdr.sh_type != SHT_REL_TYPE) {
continue;
}
sec->reltab = malloc(sec->shdr.sh_size);
@@ -445,10 +579,13 @@ static void read_relocs(FILE *fp)
die("Cannot read symbol table: %s\n",
strerror(errno));
}
- for (j = 0; j < sec->shdr.sh_size/sizeof(Elf32_Rel); j++) {
- Elf32_Rel *rel = &sec->reltab[j];
- rel->r_offset = elf32_to_cpu(rel->r_offset);
- rel->r_info = elf32_to_cpu(rel->r_info);
+ for (j = 0; j < sec->shdr.sh_size/sizeof(Elf_Rel); j++) {
+ Elf_Rel *rel = &sec->reltab[j];
+ rel->r_offset = elf_addr_to_cpu(rel->r_offset);
+ rel->r_info = elf_xword_to_cpu(rel->r_info);
+#if (SHT_REL_TYPE == SHT_RELA)
+ rel->r_addend = elf_xword_to_cpu(rel->r_addend);
+#endif
}
}
}
@@ -468,19 +605,25 @@ static void print_absolute_symbols(void)
continue;
}
sym_strtab = sec->link->strtab;
- for (j = 0; j < sec->shdr.sh_size/sizeof(Elf32_Sym); j++) {
- Elf32_Sym *sym;
+ for (j = 0; j < sec->shdr.sh_size/sizeof(ElfW(Sym)); j++) {
+ ElfW(Sym) *sym;
const char *name;
sym = &sec->symtab[j];
name = sym_name(sym_strtab, sym);
if (sym->st_shndx != SHN_ABS) {
continue;
}
- printf("%5d %08x %5d %10s %10s %12s %s\n",
+#if (ELF_BITS == 64)
+ printf("%5d %016"PRIx64" %5"PRId64
+ " %10s %10s %12s %s\n",
+#else
+ printf("%5d %08"PRIx32" %5"PRId32
+ " %10s %10s %12s %s\n",
+#endif
j, sym->st_value, sym->st_size,
- sym_type(ELF32_ST_TYPE(sym->st_info)),
- sym_bind(ELF32_ST_BIND(sym->st_info)),
- sym_visibility(ELF32_ST_VISIBILITY(sym->st_other)),
+ sym_type(ELF_ST_TYPE(sym->st_info)),
+ sym_bind(ELF_ST_BIND(sym->st_info)),
+ sym_visibility(ELF_ST_VISIBILITY(sym->st_other)),
name);
}
}
@@ -495,9 +638,9 @@ static void print_absolute_relocs(void)
struct section *sec = &secs[i];
struct section *sec_applies, *sec_symtab;
char *sym_strtab;
- Elf32_Sym *sh_symtab;
+ ElfW(Sym) *sh_symtab;
int j;
- if (sec->shdr.sh_type != SHT_REL) {
+ if (sec->shdr.sh_type != SHT_REL_TYPE) {
continue;
}
sec_symtab = sec->link;
@@ -507,12 +650,12 @@ static void print_absolute_relocs(void)
}
sh_symtab = sec_symtab->symtab;
sym_strtab = sec_symtab->link->strtab;
- for (j = 0; j < sec->shdr.sh_size/sizeof(Elf32_Rel); j++) {
- Elf32_Rel *rel;
- Elf32_Sym *sym;
+ for (j = 0; j < sec->shdr.sh_size/sizeof(Elf_Rel); j++) {
+ Elf_Rel *rel;
+ ElfW(Sym) *sym;
const char *name;
rel = &sec->reltab[j];
- sym = &sh_symtab[ELF32_R_SYM(rel->r_info)];
+ sym = &sh_symtab[ELF_R_SYM(rel->r_info)];
name = sym_name(sym_strtab, sym);
if (sym->st_shndx != SHN_ABS) {
continue;
@@ -542,10 +685,16 @@ static void print_absolute_relocs(void)
printed = 1;
}

- printf("%08x %08x %10s %08x %s\n",
+#if (ELF_BITS == 64)
+ printf("%016"PRIx64" %016"PRIx64
+ " %10s %016"PRIx64" %s\n",
+#else
+ printf("%08"PRIx32" %08"PRIx32
+ " %10s %08"PRIx32" %s\n",
+#endif
rel->r_offset,
rel->r_info,
- rel_type(ELF32_R_TYPE(rel->r_info)),
+ rel_type(ELF_R_TYPE(rel->r_info)),
sym->st_value,
name);
}
@@ -555,19 +704,20 @@ static void print_absolute_relocs(void)
printf("\n");
}

-static void walk_relocs(void (*visit)(Elf32_Rel *rel, Elf32_Sym *sym),
- int use_real_mode)
+
+static void walk_relocs(int (*process)(struct section *sec, Elf_Rel *,
+ ElfW(Sym) *, const char *))
{
int i;
/* Walk through the relocations */
for (i = 0; i < ehdr.e_shnum; i++) {
char *sym_strtab;
- Elf32_Sym *sh_symtab;
+ ElfW(Sym) *sh_symtab;
struct section *sec_applies, *sec_symtab;
int j;
struct section *sec = &secs[i];

- if (sec->shdr.sh_type != SHT_REL) {
+ if (sec->shdr.sh_type != SHT_REL_TYPE) {
continue;
}
sec_symtab = sec->link;
@@ -577,101 +727,276 @@ static void walk_relocs(void (*visit)(Elf32_Rel *rel, Elf32_Sym *sym),
}
sh_symtab = sec_symtab->symtab;
sym_strtab = sec_symtab->link->strtab;
- for (j = 0; j < sec->shdr.sh_size/sizeof(Elf32_Rel); j++) {
- Elf32_Rel *rel;
- Elf32_Sym *sym;
- unsigned r_type;
- const char *symname;
- int shn_abs;

- rel = &sec->reltab[j];
- sym = &sh_symtab[ELF32_R_SYM(rel->r_info)];
- r_type = ELF32_R_TYPE(rel->r_info);
-
- shn_abs = sym->st_shndx == SHN_ABS;
-
- switch (r_type) {
- case R_386_NONE:
- case R_386_PC32:
- case R_386_PC16:
- case R_386_PC8:
- /*
- * NONE can be ignored and and PC relative
- * relocations don't need to be adjusted.
- */
- break;
+ for (j = 0; j < sec->shdr.sh_size/sizeof(Elf_Rel); j++) {
+ Elf_Rel *rel = &sec->reltab[j];
+ ElfW(Sym) *sym = &sh_symtab[ELF_R_SYM(rel->r_info)];
+ const char *symname = sym_name(sym_strtab, sym);

- case R_386_16:
- symname = sym_name(sym_strtab, sym);
- if (!use_real_mode)
- goto bad;
- if (shn_abs) {
- if (is_reloc(S_ABS, symname))
- break;
- else if (!is_reloc(S_SEG, symname))
- goto bad;
- } else {
- if (is_reloc(S_LIN, symname))
- goto bad;
- else
- break;
- }
- visit(rel, sym);
- break;
+ process(sec, rel, sym, symname);
+ }
+ }
+}

- case R_386_32:
- symname = sym_name(sym_strtab, sym);
- if (shn_abs) {
- if (is_reloc(S_ABS, symname))
- break;
- else if (!is_reloc(S_REL, symname))
- goto bad;
- } else {
- if (use_real_mode &&
- !is_reloc(S_LIN, symname))
- break;
- }
- visit(rel, sym);
- break;
- default:
- die("Unsupported relocation type: %s (%d)\n",
- rel_type(r_type), r_type);
+#ifdef CONFIG_X86_64
+
+#define PER_CPU_SECTION ".data..percpu"
+static int per_cpu_shndx = -1;
+ElfW(Addr) per_cpu_load_addr;
+
+/*
+ * The .data..percpu section is a special case for x86_64 SMP kernels.
+ * It is used to initialize the actual per_cpu areas and to provide
+ * definitions for the per_cpu variables that correspond to their offsets
+ * within the percpu area. Since the values of all of the symbols need
+ * to be offsets from the start of the per_cpu area the virtual address
+ * (sh_addr) of .data..percpu is 0 in SMP kernels.
+ *
+ * This means that:
+ *
+ * Relocations that reference symbols in the per_cpu area do not
+ * need further relocation (since the value is an offset relative
+ * to the start of the per_cpu area that does not change).
+ *
+ * Relocations that apply to the per_cpu area need to have their
+ * offset adjusted by the value of __per_cpu_load to make them
+ * point to the correct place in the loaded image (because the
+ * virtual address of .data..percpu is 0).
+ *
+ * For non SMP kernels .data..percpu is linked as part of the normal
+ * kernel data and does not require special treatment.
+ *
+ */
+static void percpu_init(void)
+{
+ int i;
+ for (i = 0; i < ehdr.e_shnum; i++) {
+ ElfW(Sym) *sym;
+ if (strcmp(sec_name(i), PER_CPU_SECTION))
+ continue;
+
+ if (secs[i].shdr.sh_addr != 0) /* non SMP kernel */
+ return;
+
+ sym = sym_lookup("__per_cpu_load");
+ if (!sym)
+ die("can't find __per_cpu_load\n");
+
+ per_cpu_shndx = i;
+ per_cpu_load_addr = sym->st_value;
+ return;
+ }
+}
+
+/*
+ * Check to see if a symbol lies in the .data..percpu section.
+ * For some as yet not understood reason the "__init_begin"
+ * symbol which immediately precedes the .data..percpu section
+ * also shows up as if it were part of it, so we do an explicit
+ * check for that symbol name and ignore it.
+ */
+static int is_percpu_sym(ElfW(Sym) *sym, const char *symname)
+{
+ return (sym->st_shndx == per_cpu_shndx) &&
+ strcmp(symname, "__init_begin");
+}
+
+
+static int do_reloc(struct section *sec, Elf64_Rela *rel, ElfW(Sym) *sym,
+ const char *symname)
+{
+ unsigned r_type = ELF64_R_TYPE(rel->r_info);
+ ElfW(Addr) offset = rel->r_offset;
+ int shn_abs = (sym->st_shndx == SHN_ABS) && !is_reloc(S_REL, symname);
+
+ if (sym->st_shndx == SHN_UNDEF)
+ return 0;
+
+ /*
+ * adjust the offset if this reloc applies to the percpu section
+ */
+ if (sec->shdr.sh_info == per_cpu_shndx)
+ offset += per_cpu_load_addr;
+
+ switch (r_type) {
+ case R_X86_64_NONE:
+ case R_X86_64_PC32:
+ /*
+ * NONE can be ignored and PC relative
+ * relocations don't need to be adjusted.
+ */
+ break;
+
+ case R_X86_64_32:
+ case R_X86_64_32S:
+ case R_X86_64_64:
+ /*
+ * References to the percpu area don't need to be adjusted.
+ */
+ if (is_percpu_sym(sym, symname))
+ break;
+
+ if (shn_abs) {
+ /*
+ * whitelisted absolute symbols
+ * do not require relocation
+ */
+ if (is_reloc(S_ABS, symname))
break;
- bad:
- symname = sym_name(sym_strtab, sym);
- die("Invalid %s %s relocation: %s\n",
- shn_abs ? "absolute" : "relative",
- rel_type(r_type), symname);
- }
+
+ die("Invalid absolute %s relocation: %s\n",
+ rel_type(r_type), symname);
+ break;
}
+
+ /*
+ * Relocation offsets for 64 bit kernels are output
+ * as 32 bits and sign extended back to 64 bits when
+ * the relocations are processed.
+ * Make sure that the offset will fit.
+ */
+ if ((int32_t)offset != (int64_t)offset)
+ die("Relocation offset doesn't fit in 32 bits\n");
+
+ if (r_type == R_X86_64_64)
+ add_reloc(&relocs64, offset);
+ else
+ add_reloc(&relocs32, offset);
+ break;
+
+ default:
+ die("Unsupported relocation type: %s (%d)\n",
+ rel_type(r_type), r_type);
+ break;
}
+
+ return 0;
}

-static void count_reloc(Elf32_Rel *rel, Elf32_Sym *sym)
+#else
+
+static int do_reloc(struct section *sec, Elf32_Rel *rel, ElfW(Sym) *sym,
+ const char *symname)
{
- if (ELF32_R_TYPE(rel->r_info) == R_386_16)
- reloc16_count++;
- else
- reloc_count++;
+ unsigned r_type = ELF32_R_TYPE(rel->r_info);
+ int shn_abs = (sym->st_shndx == SHN_ABS) && !is_reloc(S_REL, symname);
+
+ switch (r_type) {
+ case R_386_NONE:
+ case R_386_PC32:
+ case R_386_PC16:
+ case R_386_PC8:
+ /*
+ * NONE can be ignored and PC relative
+ * relocations don't need to be adjusted.
+ */
+ break;
+
+ case R_386_32:
+ if (shn_abs) {
+ /*
+ * whitelisted absolute symbols
+ * do not require relocation
+ */
+ if (is_reloc(S_ABS, symname))
+ break;
+
+ die("Invalid absolute %s relocation: %s\n",
+ rel_type(r_type), symname);
+ break;
+ }
+
+ add_reloc(&relocs32, rel->r_offset);
+ break;
+
+ default:
+ die("Unsupported relocation type: %s (%d)\n",
+ rel_type(r_type), r_type);
+ break;
+ }
+
+ return 0;
}
+#endif

-static void collect_reloc(Elf32_Rel *rel, Elf32_Sym *sym)
+
+static int do_reloc_real(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
+ const char *symname)
{
- /* Remember the address that needs to be adjusted. */
- if (ELF32_R_TYPE(rel->r_info) == R_386_16)
- relocs16[reloc16_idx++] = rel->r_offset;
- else
- relocs[reloc_idx++] = rel->r_offset;
+ unsigned r_type = ELF32_R_TYPE(rel->r_info);
+ int shn_abs = (sym->st_shndx == SHN_ABS) && !is_reloc(S_REL, symname);
+
+ switch (r_type) {
+ case R_386_NONE:
+ case R_386_PC32:
+ case R_386_PC16:
+ case R_386_PC8:
+ /*
+ * NONE can be ignored and PC relative
+ * relocations don't need to be adjusted.
+ */
+ break;
+
+ case R_386_16:
+ if (shn_abs) {
+ if (is_reloc(S_ABS, symname))
+ break;
+
+ if (is_reloc(S_SEG, symname)) {
+ add_reloc(&relocs16, rel->r_offset);
+ break;
+ }
+ } else {
+ if (!is_reloc(S_LIN, symname))
+ break;
+ }
+ die("Invalid %s %s relocation: %s\n",
+ shn_abs ? "absolute" : "relative",
+ rel_type(r_type), symname);
+ break;
+
+ case R_386_32:
+ if (shn_abs) {
+ if (is_reloc(S_ABS, symname))
+ break;
+
+ if (is_reloc(S_REL, symname)) {
+ add_reloc(&relocs32, rel->r_offset);
+ break;
+ }
+ } else {
+ if (is_reloc(S_LIN, symname))
+ add_reloc(&relocs32, rel->r_offset);
+ break;
+ }
+ die("Invalid %s %s relocation: %s\n",
+ shn_abs ? "absolute" : "relative",
+ rel_type(r_type), symname);
+ break;
+
+ default:
+ die("Unsupported relocation type: %s (%d)\n",
+ rel_type(r_type), r_type);
+ break;
+ }
+
+ return 0;
}

+
static int cmp_relocs(const void *va, const void *vb)
{
- const unsigned long *a, *b;
+ const uint32_t *a, *b;
a = va; b = vb;
return (*a == *b)? 0 : (*a > *b)? 1 : -1;
}

-static int write32(unsigned int v, FILE *f)
+static void sort_relocs(struct relocs *r)
+{
+ qsort(r->offset, r->count, sizeof(r->offset[0]), cmp_relocs);
+}
+
+static int write32(uint32_t v, FILE *f)
{
unsigned char buf[4];

@@ -679,79 +1004,62 @@ static int write32(unsigned int v, FILE *f)
return fwrite(buf, 1, 4, f) == 4 ? 0 : -1;
}

+static int write32_as_text(uint32_t v, FILE *f)
+{
+ return fprintf(f, "\t.long 0x%08"PRIx32"\n", v) > 0 ? 0 : -1;
+}
+
static void emit_relocs(int as_text, int use_real_mode)
{
int i;
- /* Count how many relocations I have and allocate space for them. */
- reloc_count = 0;
- walk_relocs(count_reloc, use_real_mode);
- relocs = malloc(reloc_count * sizeof(relocs[0]));
- if (!relocs) {
- die("malloc of %d entries for relocs failed\n",
- reloc_count);
- }
+ int (*write_reloc)(uint32_t, FILE *) = write32;

- relocs16 = malloc(reloc16_count * sizeof(relocs[0]));
- if (!relocs16) {
- die("malloc of %d entries for relocs16 failed\n",
- reloc16_count);
- }
/* Collect up the relocations */
- reloc_idx = 0;
- walk_relocs(collect_reloc, use_real_mode);
+ walk_relocs(use_real_mode ? do_reloc_real : do_reloc);

- if (reloc16_count && !use_real_mode)
+ if (relocs16.count && !use_real_mode)
die("Segment relocations found but --realmode not specified\n");

/* Order the relocations for more efficient processing */
- qsort(relocs, reloc_count, sizeof(relocs[0]), cmp_relocs);
- qsort(relocs16, reloc16_count, sizeof(relocs16[0]), cmp_relocs);
+ sort_relocs(&relocs16);
+ sort_relocs(&relocs32);
+#ifdef CONFIG_X86_64
+ sort_relocs(&relocs64);
+#endif
+
+ /* output the relocations */

- /* Print the relocations */
if (as_text) {
- /* Print the relocations in a form suitable that
- * gas will like.
- */
printf(".section \".data.reloc\",\"a\"\n");
printf(".balign 4\n");
- if (use_real_mode) {
- printf("\t.long %lu\n", reloc16_count);
- for (i = 0; i < reloc16_count; i++)
- printf("\t.long 0x%08lx\n", relocs16[i]);
- printf("\t.long %lu\n", reloc_count);
- for (i = 0; i < reloc_count; i++) {
- printf("\t.long 0x%08lx\n", relocs[i]);
- }
- } else {
- /* Print a stop */
- printf("\t.long 0x%08lx\n", (unsigned long)0);
- for (i = 0; i < reloc_count; i++) {
- printf("\t.long 0x%08lx\n", relocs[i]);
- }
- }
-
- printf("\n");
+ write_reloc = write32_as_text;
}
- else {
- if (use_real_mode) {
- write32(reloc16_count, stdout);
- for (i = 0; i < reloc16_count; i++)
- write32(relocs16[i], stdout);
- write32(reloc_count, stdout);
-
- /* Now print each relocation */
- for (i = 0; i < reloc_count; i++)
- write32(relocs[i], stdout);
- } else {
- /* Print a stop */
- write32(0, stdout);

- /* Now print each relocation */
- for (i = 0; i < reloc_count; i++) {
- write32(relocs[i], stdout);
- }
- }
+ if (use_real_mode) {
+ write_reloc(relocs16.count, stdout);
+ for (i = 0; i < relocs16.count; i++)
+ write_reloc(relocs16.offset[i], stdout);
+
+ write_reloc(relocs32.count, stdout);
+ for (i = 0; i < relocs32.count; i++)
+ write_reloc(relocs32.offset[i], stdout);
+ } else {
+#ifdef CONFIG_X86_64
+ /* Print a stop */
+ write_reloc(0, stdout);
+
+ /* Now print each relocation */
+ for (i = 0; i < relocs64.count; i++)
+ write_reloc(relocs64.offset[i], stdout);
+#endif
+ /* Print a stop */
+ write_reloc(0, stdout);
+
+ /* Now print each relocation */
+ for (i = 0; i < relocs32.count; i++)
+ write_reloc(relocs32.offset[i], stdout);
}
+
}

static void usage(void)
@@ -812,6 +1120,9 @@ int main(int argc, char **argv)
read_strtabs(fp);
read_symtabs(fp);
read_relocs(fp);
+#ifdef CONFIG_X86_64
+ percpu_init();
+#endif
if (show_absolute_syms) {
print_absolute_symbols();
goto out;
diff --git a/arch/x86/tools/relocs_32.c b/arch/x86/tools/relocs_32.c
new file mode 100644
index 0000000..8cf7e94
--- /dev/null
+++ b/arch/x86/tools/relocs_32.c
@@ -0,0 +1 @@
+#include "relocs.c"
diff --git a/arch/x86/tools/relocs_64.c b/arch/x86/tools/relocs_64.c
new file mode 100644
index 0000000..419b05a
--- /dev/null
+++ b/arch/x86/tools/relocs_64.c
@@ -0,0 +1,2 @@
+#define CONFIG_X86_64 1
+#include "relocs.c"
--
1.7.9.5

2013-04-04 20:08:29

by Kees Cook

Subject: [PATCH 1/3] x86: routines to choose random kernel base offset

This provides routines for selecting a randomized kernel base offset,
bounded by e820 details. It tries to use RDRAND and falls back to
RDTSC. If "noaslr" is on the kernel command line, no offset will be used.
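
For anyone who finds the 32-bit assembly below hard to follow, here is a
rough C sketch of the same selection logic. It is illustrative only: the
function and macro names here are placeholders, the real implementation is
get_aslr_offset in aslr.S, and the "noaslr" command line check is omitted.

#include <stdint.h>
#include <cpuid.h>

#define RANDOMIZE_BASE_MAX_OFFSET 0x10000000UL /* stand-in for the Kconfig value */

static uint32_t get_aslr_offset(void)
{
	uint32_t eax, ebx, ecx, edx, lo, hi, val = 0;
	int i;

	/* Without CPUID we cannot probe RDRAND/TSC: use no randomness. */
	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return 0;

	if (ecx & (1U << 30)) {			/* RDRAND (CPUID.1:ECX bit 30) */
		for (i = 0; i < 16; i++) {	/* retry up to 16 times */
			unsigned char ok;
			/* rdrand %eax, encoded as bytes like the patch does */
			asm volatile(".byte 0x0f,0xc7,0xf0; setc %1"
				     : "=a" (val), "=qm" (ok));
			if (ok)
				goto mask;
		}
	}
	if (edx & (1U << 4)) {			/* TSC (CPUID.1:EDX bit 4) */
		asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
		(void)hi;
		/* push the fast-moving low bits past the alignment mask */
		val = lo << 12;
	}
mask:
	return val & (RANDOMIZE_BASE_MAX_OFFSET - 1);
}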

Heavily based on work by Dan Rosenberg and Neill Clift.

Signed-off-by: Kees Cook <[email protected]>
Cc: Eric Northup <[email protected]>
---
arch/x86/boot/compressed/Makefile | 7 +-
arch/x86/boot/compressed/aslr.S | 228 +++++++++++++++++++++++++++++++++++++
2 files changed, 233 insertions(+), 2 deletions(-)
create mode 100644 arch/x86/boot/compressed/aslr.S

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 8a84501..376ef47 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -4,7 +4,10 @@
# create a compressed vmlinux image from the original vmlinux
#

-targets := vmlinux.lds vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2 vmlinux.bin.lzma vmlinux.bin.xz vmlinux.bin.lzo head_$(BITS).o misc.o string.o cmdline.o early_serial_console.o piggy.o
+targets := vmlinux.lds vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2 \
+ vmlinux.bin.lzma vmlinux.bin.xz vmlinux.bin.lzo head_$(BITS).o \
+ misc.o string.o cmdline.o early_serial_console.o piggy.o \
+ aslr.o

KBUILD_CFLAGS := -m$(BITS) -D__KERNEL__ $(LINUX_INCLUDE) -O2
KBUILD_CFLAGS += -fno-strict-aliasing -fPIC
@@ -26,7 +29,7 @@ HOST_EXTRACFLAGS += -I$(srctree)/tools/include

VMLINUX_OBJS = $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
$(obj)/string.o $(obj)/cmdline.o $(obj)/early_serial_console.o \
- $(obj)/piggy.o
+ $(obj)/piggy.o $(obj)/aslr.o

$(obj)/eboot.o: KBUILD_CFLAGS += -fshort-wchar -mno-red-zone
$(obj)/efi_stub_$(BITS).o: KBUILD_CLFAGS += -fshort-wchar -mno-red-zone
diff --git a/arch/x86/boot/compressed/aslr.S b/arch/x86/boot/compressed/aslr.S
new file mode 100644
index 0000000..37cdef4
--- /dev/null
+++ b/arch/x86/boot/compressed/aslr.S
@@ -0,0 +1,228 @@
+/*
+ * arch/x86/boot/compressed/aslr.S
+ *
+ * Support routine for Kernel Address Space Layout Randomization used by both
+ * the 32 and 64 bit boot code.
+ *
+ */
+ .text
+
+#include <asm/boot.h>
+#include <asm/asm-offsets.h>
+#include <asm/cpufeature.h>
+#include <asm/processor-flags.h>
+#include <asm/e820.h>
+
+#ifdef CONFIG_RANDOMIZE_BASE
+
+ .globl select_aslr_address
+ .code32
+
+/*
+ * Get the physical memory limit for the run from the physical load position of
+ * the kernel. The kernel loads at LOAD_PHYSICAL_ADDR and we need to know how
+ * much physical memory is available for use after that point to make sure the
+ * relocated kernel will fit. Returns the limit in eax.
+ */
+get_physical_run_end:
+ pushl %edi
+ pushl %esi
+ pushl %ebx
+ pushl %edx
+ pushl %ecx
+ movzbl BP_e820_entries(%esi), %edi
+ leal BP_e820_map(%esi), %esi
+ testl %edi, %edi
+ jz 5f
+1: cmpl $E820_RAM, E820_type(%esi)
+ jnz 4f
+ movl E820_addr(%esi), %eax
+ movl E820_addr+4(%esi), %edx
+ testl %edx, %edx /* Start address is too big for 32 bit */
+ jnz 4f
+ cmpl $LOAD_PHYSICAL_ADDR, %eax
+ ja 4f
+ movl E820_size(%esi), %ecx
+ movl E820_size+4(%esi), %ebx
+ addl %eax, %ecx
+ adcl %edx, %ebx
+ jz 2f /* end address not beyond 32bit*/
+/* For a large run set the limit as 2^32-1 */
+ xorl %ecx, %ecx
+ decl %ecx
+ jmp 3f
+2: cmpl $LOAD_PHYSICAL_ADDR, %ecx
+ jb 4f
+3:
+ movl %ecx, %eax
+ jmp 6f
+
+4: addl $E820_entry_size, %esi
+ decl %edi
+ jnz 1b
+5: xorl %eax, %eax /* Fail */
+6: popl %ecx
+ popl %edx
+ popl %ebx
+ popl %esi
+ popl %edi
+ ret
+
+/*
+ * Get a random value to be used for the ASLR kernel offset.
+ * Returns the value in eax.
+ */
+get_aslr_offset:
+ pushl %ebx
+ pushl %edx
+ pushl %ecx
+ call find_cmdline_option
+ testl %eax, %eax
+ jne 4f
+ /* Standard check for cpuid */
+ pushfl /* Push original flags */
+ pushfl
+ popl %eax
+ movl %eax, %ebx
+ xorl $X86_EFLAGS_ID, %eax
+ pushl %eax
+ popfl
+ pushfl
+ popl %eax
+ popfl /* Pop original flags */
+ cmpl %eax, %ebx
+ /* Say zero offset if we can't change the flag */
+ movl $0, %eax
+ je 4f
+
+ /* Check for cpuid 1 */
+ cpuid
+ cmpl $0x1, %eax
+ jb 4f
+
+ movl $0x1, %eax
+ cpuid
+ xor %eax, %eax
+
+ /* RDRAND is bit 30 */
+ btl $(X86_FEATURE_RDRAND & 31), %ecx
+ jc 1f
+
+ /* RDTSC is bit 4 */
+ btl $(X86_FEATURE_TSC & 31), %edx
+ jc 3f
+
+ /* Nothing is supported */
+ jmp 4f
+1:
+ /*
+ * RDRAND sets carry bit on success, otherwise we should try
+ * again up to 16 times.
+ */
+ movl $0x10, %ecx
+2:
+ /* rdrand %eax */
+ .byte 0x0f, 0xc7, 0xf0
+ jc 4f
+ loop 2b
+
+ /* Fall through: if RDRAND is supported but fails, use RDTSC,
+ * which is guaranteed to be supported.
+ */
+3:
+ rdtsc
+ /*
+ * Since this is time related get some of the least significant bits
+ * past the alignment mask
+ */
+ shll $0x0c, %eax
+ /* Fix the maximal offset allowed */
+4: andl $CONFIG_RANDOMIZE_BASE_MAX_OFFSET-1, %eax
+ popl %ecx
+ popl %edx
+ popl %ebx
+ ret
+
+/*
+ * Select the ASLR address to use. We can get called once either in 32
+ * or 64 bit mode. The latter if we have a 64 bit loader.
+ * Uses ebp as the input base and returns the result in eax.
+ */
+select_aslr_address:
+ pushl %edx
+ pushl %ebx
+ pushl %ecx
+ pushl %edi
+ call get_aslr_offset
+ pushl %eax
+ call get_physical_run_end
+ movl %eax, %edx
+ popl %eax
+1: movl %ebp, %ebx
+ addl %eax, %ebx
+ movl BP_kernel_alignment(%esi), %edi
+ decl %edi
+ addl %edi, %ebx
+ notl %edi
+ andl %edi, %ebx
+ /* Make sure we don't copy beyond run */
+ leal boot_stack_end(%ebx), %ecx
+ leal z_extract_offset(%ecx), %ecx
+ cmpl %edx, %ecx
+ jb 2f
+ shrl $1, %eax /* Shrink offset */
+ jne 1b /* Move on if offset zero */
+ mov %ebp, %ebx
+2: movl %ebx, %eax
+ popl %edi
+ popl %ecx
+ popl %ebx
+ popl %edx
+ ret
+
+/*
+ * Find the "noaslr" option if present on the command line.
+ */
+find_cmdline_option:
+
+#define ASLR_STRLEN 6
+
+ pushl %ecx
+ pushl %edi
+ xorl %eax, %eax /* Assume we fail */
+ movl BP_cmd_line_ptr(%esi), %edi
+ testl %edi, %edi
+ je 6f
+ /* Calculate string length */
+ leal -1(%edi), %ecx
+1: incl %ecx
+ cmpb $0, (%ecx)
+ jne 1b
+ subl %edi, %ecx
+2: cmpl $ASLR_STRLEN, %ecx
+ jb 6f
+ cmpl $0x73616f6e, (%edi) /* noas */
+ jne 4f
+ cmpb $0x6c, 4(%edi) /* l */
+ jne 4f
+ cmpb $0x72, 5(%edi) /* r */
+ jne 4f
+ /* If at the start then no beginning separator required */
+ cmpl %edi, BP_cmd_line_ptr(%esi)
+ je 3f
+ cmpb $0x20, -1(%edi)
+ ja 4f
+ /* If at the end then no end separator required */
+3: cmpl $ASLR_STRLEN, %ecx
+ je 5f
+ cmpb $0x20, ASLR_STRLEN(%edi)
+ jbe 5f
+4: incl %edi
+ decl %ecx
+ jmp 2b
+5: incl %eax /* Success */
+6: popl %edi
+ popl %ecx
+ ret
+
+#endif /* CONFIG_RANDOMIZE_BASE */
--
1.7.9.5

2013-04-04 20:08:28

by Kees Cook

Subject: [PATCH 3/3] x86: kernel base offset ASLR

This creates CONFIG_RANDOMIZE_BASE, so that the base offset of the kernel
can be randomized at boot.

This makes kernel vulnerabilities harder to reliably exploit, especially
from remote attacks and local processes in seccomp containers. Keeping
the location of kernel addresses secret becomes very important when using
this feature, so enabling kptr_restrict and dmesg_restrict is recommended.
Besides direct address leaks, several other attacks can be used to bypass
this on local systems, including cache timing[1]. However, the benefits of
this feature in certain environments exceed the perceived weaknesses[2].

An added security benefit is making the IDT read-only.

Current entropy is low, since the kernel has basically a minimum 2MB
alignment and has been built with -2G memory addressing. As a result,
available entropy will be 8 bits in the best case. The e820 entries on
a given system may further limit the available memory.

This feature is presently incompatible with hibernation.

When built into the kernel, the "noaslr" kernel command line option will
disable the feature.

Heavily based on work by Dan Rosenberg[3] and Neill Clift.

[1] http://www.internetsociety.org/sites/default/files/Practical%20Timing%20Side%20Channel%20Attacks%20Against%20Kernel%20Space%20ASLR.pdf
[2] http://forums.grsecurity.net/viewtopic.php?f=7&t=3367
[3] http://lkml.indiana.edu/hypermail/linux/kernel/1105.3/index.html#00520

Signed-off-by: Kees Cook <[email protected]>
Cc: Eric Northup <[email protected]>
---
Documentation/kernel-parameters.txt | 4 +
arch/x86/Kconfig | 51 +++++++++++--
arch/x86/Makefile | 3 +
arch/x86/boot/compressed/head_32.S | 21 +++++-
arch/x86/boot/compressed/head_64.S | 135 ++++++++++++++++++++++++++++++++--
arch/x86/include/asm/fixmap.h | 4 +
arch/x86/include/asm/page_32_types.h | 2 +
arch/x86/include/asm/page_64_types.h | 4 -
arch/x86/include/asm/page_types.h | 4 +
arch/x86/kernel/asm-offsets.c | 14 ++++
arch/x86/kernel/setup.c | 24 ++++++
arch/x86/kernel/traps.c | 6 ++
12 files changed, 251 insertions(+), 21 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 4609e81..e1b8993 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1839,6 +1839,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
noapic [SMP,APIC] Tells the kernel to not make use of any
IOAPICs that may be present in the system.

+ noaslr [X86]
+ Disable kernel base offset ASLR (Address Space
+ Layout Randomization) if built into the kernel.
+
noautogroup Disable scheduler automatic task group creation.

nobats [PPC] Do not use BATs for mapping kernel lowmem
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 70c0f3d..6fe1a3b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1649,8 +1649,8 @@ config PHYSICAL_START
If kernel is a not relocatable (CONFIG_RELOCATABLE=n) then
bzImage will decompress itself to above physical address and
run from there. Otherwise, bzImage will run from the address where
- it has been loaded by the boot loader and will ignore above physical
- address.
+ it has been loaded by the boot loader, using the above physical
+ address as a lower bound.

In normal kdump cases one does not have to set/change this option
as now bzImage can be compiled as a completely relocatable image
@@ -1696,15 +1696,49 @@ config RELOCATABLE

Note: If CONFIG_RELOCATABLE=y, then the kernel runs from the address
it has been loaded at and the compile time physical address
- (CONFIG_PHYSICAL_START) is ignored.
-
-# Relocation on x86-32 needs some additional build support
+ (CONFIG_PHYSICAL_START) is solely used as a lower bound.
+
+config RANDOMIZE_BASE
+ bool "Randomize the address of the kernel image"
+ depends on RELOCATABLE
+ depends on !HIBERNATION
+ default n
+ ---help---
+ Randomizes the physical and virtual address at which the
+ kernel image is decompressed, as a security feature that
+ deters exploit attempts relying on knowledge of the location
+ of kernel internals.
+
+ This feature also uses a fixed mapping to move the IDT
+ (if not already done as a fix for the F00F bug), to avoid
+ exposing the location of kernel internals relative to the
+ original IDT. This has the additional security benefit of
+ marking the new virtual address of the IDT read-only.
+
+ Entropy is generated using the RDRAND instruction if it
+ is supported. If not, then RDTSC is used, if supported. If
+ neither RDRAND nor RDTSC are supported, then no randomness
+ is introduced. Support for the CPUID instruction is required
+ to check for the availability of these two instructions.
+
+config RANDOMIZE_BASE_MAX_OFFSET
+ hex "Maximum ASLR offset allowed"
+ depends on RANDOMIZE_BASE
+ default "0x10000000"
+ range 0x0 0x10000000
+ ---help---
+ Determines the maximal offset in bytes that will be applied to the
+ kernel when Address Space Layout Randomization (ASLR) is active.
+ Physical memory layout and kernel size may limit this further.
+ This must be a power of two.
+
+# Relocation on x86-32/64 needs some additional build support
config X86_NEED_RELOCS
def_bool y
- depends on X86_32 && RELOCATABLE
+ depends on RELOCATABLE

config PHYSICAL_ALIGN
- hex "Alignment value to which kernel should be aligned" if X86_32
+ hex "Alignment value to which kernel should be aligned"
default "0x1000000"
range 0x2000 0x1000000
---help---
@@ -1724,6 +1758,9 @@ config PHYSICAL_ALIGN
end result is that kernel runs from a physical address meeting
above alignment restrictions.

+ Generally when using CONFIG_RANDOMIZE_BASE, this is safe to
+ lower to 0x200000.
+
Don't change this unless you know what you are doing.

config HOTPLUG_CPU
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 5c47726..4f280bd 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -60,6 +60,9 @@ else
# Use -mpreferred-stack-boundary=3 if supported.
KBUILD_CFLAGS += $(call cc-option,-mno-sse -mpreferred-stack-boundary=3)

+ ifdef CONFIG_RANDOMIZE_BASE
+ LDFLAGS_vmlinux := --emit-relocs
+ endif
# FIXME - should be integrated in Makefile.cpu (Makefile_32.cpu)
cflags-$(CONFIG_MK8) += $(call cc-option,-march=k8)
cflags-$(CONFIG_MPSC) += $(call cc-option,-march=nocona)
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 1e3184f..8139c2f 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -29,6 +29,7 @@
#include <asm/page_types.h>
#include <asm/boot.h>
#include <asm/asm-offsets.h>
+#include <asm/cpufeature.h>

__HEAD
ENTRY(startup_32)
@@ -111,15 +112,29 @@ preferred_addr:
*/

#ifdef CONFIG_RELOCATABLE
+#ifdef CONFIG_RANDOMIZE_BASE
+ /* Setup boot stack for calls */
+ leal boot_stack_end(%ebp), %esp
+ call select_aslr_address /* Select ASLR address */
+ movl %eax, %ebx
+ /* LOAD_PHYSICAL_ADDR is the minimum safe address we can
+ * decompress at */
+ cmpl $LOAD_PHYSICAL_ADDR, %ebx
+ jae 1f
+ movl $LOAD_PHYSICAL_ADDR, %ebx
+1:
+#else /* CONFIG_RANDOMIZE_BASE */
movl %ebp, %ebx
movl BP_kernel_alignment(%esi), %eax
decl %eax
addl %eax, %ebx
notl %eax
andl %eax, %ebx
-#else
+#endif /* CONFIG_RANDOMIZE_BASE */
+
+#else /* CONFIG_RELOCATABLE */
movl $LOAD_PHYSICAL_ADDR, %ebx
-#endif
+#endif /* CONFIG_RELOCATABLE */

/* Target address to relocate to for decompression */
addl $z_extract_offset, %ebx
@@ -235,3 +250,5 @@ boot_heap:
boot_stack:
.fill BOOT_STACK_SIZE, 1, 0
boot_stack_end:
+ .globl boot_stack_end
+
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index c1d383d..fc37910 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -59,7 +59,7 @@ ENTRY(startup_32)
1:

/*
- * Calculate the delta between where we were compiled to run
+ * Calculate the delta between where we were linked to load
* at and where we were actually loaded at. This can only be done
* with a short local call on x86. Nothing else will tell us what
* address we are running at. The reserved chunk of the real-mode
@@ -78,10 +78,10 @@ ENTRY(startup_32)

call verify_cpu
testl %eax, %eax
- jnz no_longmode
+ jnz hang

/*
- * Compute the delta between where we were compiled to run at
+ * Compute the delta between where we were linked to load at
* and where the code will actually run at.
*
* %ebp contains the address we are loaded at by the boot loader and %ebx
@@ -90,15 +90,32 @@ ENTRY(startup_32)
*/

#ifdef CONFIG_RELOCATABLE
+#ifdef CONFIG_RANDOMIZE_BASE
+ call select_aslr_address /* Select ASLR offset */
+ movl %eax, %ebx
+ /* LOAD_PHYSICAL_ADDR is the minimum safe address we can
+ * decompress at */
+ cmpl $LOAD_PHYSICAL_ADDR, %ebx
+ jae 1f
+ movl $LOAD_PHYSICAL_ADDR, %ebx
+#else /* CONFIG_RANDOMIZE_BASE */
movl %ebp, %ebx
movl BP_kernel_alignment(%esi), %eax
decl %eax
addl %eax, %ebx
notl %eax
andl %eax, %ebx
-#else
+#endif /* CONFIG_RANDOMIZE_BASE */
+
+#ifdef CONFIG_RANDOMIZE_BASE
+1: movl %ebx, %eax
+ subl $LOAD_PHYSICAL_ADDR, %eax
+ movl %eax, aslr_offset(%ebp)
+ incl aslr_in_32bit(%ebp) /* say 32 bit code ran */
+#endif /* CONFIG_RANDOMIZE_BASE */
+#else /* CONFIG_RELOCATABLE */
movl $LOAD_PHYSICAL_ADDR, %ebx
-#endif
+#endif /* CONFIG_RELOCATABLE */

/* Target address to relocate to for decompression */
addl $z_extract_offset, %ebx
@@ -266,14 +283,30 @@ preferred_addr:
/* Start with the delta to where the kernel will run at. */
#ifdef CONFIG_RELOCATABLE
leaq startup_32(%rip) /* - $startup_32 */, %rbp
+#ifdef CONFIG_RANDOMIZE_BASE
+ leaq boot_stack_end(%rip), %rsp
+ testl $1, aslr_in_32bit(%rip)
+ jne 1f
+ call select_aslr_address
+ movq %rax, %rbp
+ jmp 2f
+1: movl aslr_offset(%rip), %eax
+ addq %rax, %rbp
+ /* LOAD_PHYSICAL_ADDR is the minimum safe address we can
+ * decompress at. */
+ cmpq $LOAD_PHYSICAL_ADDR, %rbp
+ jae 2f
+ movq $LOAD_PHYSICAL_ADDR, %rbp
+2:
+#endif /* CONFIG_RANDOMIZE_BASE */
movl BP_kernel_alignment(%rsi), %eax
decl %eax
addq %rax, %rbp
notq %rax
andq %rax, %rbp
-#else
+#else /* CONFIG_RELOCATABLE */
movq $LOAD_PHYSICAL_ADDR, %rbp
-#endif
+#endif /* CONFIG_RELOCATABLE */

/* Target address to relocate to for decompression */
leaq z_extract_offset(%rbp), %rbx
@@ -343,13 +376,85 @@ relocated:
call decompress_kernel
popq %rsi

+#ifdef CONFIG_RANDOMIZE_BASE
+/*
+ * Find the address of the relocations.
+ */
+ leaq z_output_len(%rbp), %rdi
+
+/*
+ * Calculate the delta between where vmlinux was linked to load
+ * and where it was actually loaded.
+ */
+ movq %rbp, %rbx
+ subq $LOAD_PHYSICAL_ADDR, %rbx
+ je 3f /* Nothing to be done if loaded at linked addr. */
+/*
+ * The kernel contains a table of relocation addresses. Those addresses
+ * have the final load address of the kernel in virtual memory.
+ * We are currently working in the self map. So we need to create an
+ * adjustment for kernel memory addresses to the self map. This will
+ * involve subtracting out the base address of the kernel.
+ */
+ movq $-__START_KERNEL_map, %rdx /* Literal is too big for add etc */
+ addq %rbx, %rdx
+/*
+ * Process relocations. 32 bit relocations first then 64 bit after.
+ * Two sets of binary relocations are added to the end of the
+ * kernel before compression. Each relocation table entry is the kernel
+ * address of the location which needs to be updated stored as a 32 bit
+ * value which is sign extended to 64 bits.
+ *
+ * Format is:
+ *
+ * kernel bits...
+ * 0 - zero terminator for 64 bit relocations
+ * 64 bit relocation repeated
+ * 0 - zero terminator for 32 bit relocations
+ * 32 bit relocation repeated
+ *
+ * So we work backwards from the end of the decompressed image.
+ */
+1: subq $4, %rdi
+ movslq (%rdi), %rcx
+ testq %rcx, %rcx
+ je 2f
+ addq %rdx, %rcx
+/*
+ * Relocation can't be before the image or
+ * after the current position of the current relocation.
+ * This is a cheap bounds check. It could be more exact
+ * and limit to the end of the image prior to the relocations
+ * but allowing relocations themselves to be fixed up will not
+ * do any harm.
+ */
+ cmpq %rbp, %rcx
+ jb hang
+ cmpq %rdi, %rcx
+ jae hang
+ addl %ebx, (%rcx) /* 32 bit relocation */
+ jmp 1b
+2: subq $4, %rdi
+ movslq (%rdi), %rcx
+ testq %rcx, %rcx
+ je 3f
+ addq %rdx, %rcx
+ cmpq %rbp, %rcx
+ jb hang
+ cmpq %rdi, %rcx
+ jae hang
+ addq %rbx, (%rcx) /* 64 bit relocation */
+ jmp 2b
+3:
+#endif /* CONFIG_RANDOMIZE_BASE */
+
/*
* Jump to the decompressed kernel.
*/
jmp *%rbp

.code32
-no_longmode:
+hang:
/* This isn't an x86-64 CPU so hang */
1:
hlt
@@ -369,6 +474,19 @@ gdt:
.quad 0x0000000000000000 /* TS continued */
gdt_end:

+#ifdef CONFIG_RANDOMIZE_BASE
+aslr_offset:
+ .long 0 /* Offset selected for ASLR */
+/*
+ * Set if ASLR ran in 32 bit mode. With a 64 bit loader the 32 bit code
+ * doesn't run, so the offset calculation has to be done for the first
+ * time in the 64 bit code instead.
+ */
+aslr_in_32bit:
+ .long 0
+
+#endif /* CONFIG_RANDOMIZE_BASE */
+
/*
* Stack and heap for uncompression
*/
@@ -379,6 +497,7 @@ boot_heap:
boot_stack:
.fill BOOT_STACK_SIZE, 1, 0
boot_stack_end:
+ .globl boot_stack_end

/*
* Space for page tables (not in .bss so not zeroed)
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index a09c285..8cb54c1 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -106,6 +106,10 @@ enum fixed_addresses {
#endif
#ifdef CONFIG_X86_F00F_BUG
FIX_F00F_IDT, /* Virtual mapping for IDT */
+#else
+#ifdef CONFIG_RANDOMIZE_BASE
+ FIX_RANDOM_IDT, /* Virtual mapping for IDT */
+#endif
#endif
#ifdef CONFIG_X86_CYCLONE_TIMER
FIX_CYCLONE_TIMER, /*cyclone timer register*/
diff --git a/arch/x86/include/asm/page_32_types.h b/arch/x86/include/asm/page_32_types.h
index ef17af0..996582c 100644
--- a/arch/x86/include/asm/page_32_types.h
+++ b/arch/x86/include/asm/page_32_types.h
@@ -15,6 +15,8 @@
*/
#define __PAGE_OFFSET _AC(CONFIG_PAGE_OFFSET, UL)

+#define __START_KERNEL (__PAGE_OFFSET + __PHYSICAL_START)
+
#define THREAD_SIZE_ORDER 1
#define THREAD_SIZE (PAGE_SIZE << THREAD_SIZE_ORDER)

diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 8b491e6..c0dfe38 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -32,10 +32,6 @@
*/
#define __PAGE_OFFSET _AC(0xffff880000000000, UL)

-#define __PHYSICAL_START ((CONFIG_PHYSICAL_START + \
- (CONFIG_PHYSICAL_ALIGN - 1)) & \
- ~(CONFIG_PHYSICAL_ALIGN - 1))
-
#define __START_KERNEL (__START_KERNEL_map + __PHYSICAL_START)
#define __START_KERNEL_map _AC(0xffffffff80000000, UL)

diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index 54c9787..b6f9b49 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -33,6 +33,10 @@
(((current->personality & READ_IMPLIES_EXEC) ? VM_EXEC : 0 ) | \
VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)

+#define __PHYSICAL_START ((CONFIG_PHYSICAL_START + \
+ (CONFIG_PHYSICAL_ALIGN - 1)) & \
+ ~(CONFIG_PHYSICAL_ALIGN - 1))
+
#ifdef CONFIG_X86_64
#include <asm/page_64_types.h>
#else
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 2861082..7e014b7 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -70,6 +70,20 @@ void common(void) {
OFFSET(BP_pref_address, boot_params, hdr.pref_address);
OFFSET(BP_code32_start, boot_params, hdr.code32_start);

+ OFFSET(BP_scratch, boot_params, scratch);
+ OFFSET(BP_loadflags, boot_params, hdr.loadflags);
+ OFFSET(BP_hardware_subarch, boot_params, hdr.hardware_subarch);
+ OFFSET(BP_version, boot_params, hdr.version);
+ OFFSET(BP_kernel_alignment, boot_params, hdr.kernel_alignment);
+ OFFSET(BP_e820_map, boot_params, e820_map);
+ OFFSET(BP_e820_entries, boot_params, e820_entries);
+ OFFSET(BP_cmd_line_ptr, boot_params, hdr.cmd_line_ptr);
+
+ OFFSET(E820_addr, e820entry, addr);
+ OFFSET(E820_size, e820entry, size);
+ OFFSET(E820_type, e820entry, type);
+ DEFINE(E820_entry_size, sizeof(struct e820entry));
+
BLANK();
DEFINE(PTREGS_SIZE, sizeof(struct pt_regs));
}
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 90d8cc9..fd9e68f 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -801,6 +801,18 @@ static void __init trim_low_memory_range(void)
}

/*
+ * Dump out kernel offset information on panic.
+ */
+static int
+dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
+{
+ pr_emerg("Kernel Offset: 0x%lx\n",
+ (unsigned long)&_text - __START_KERNEL);
+
+ return 0;
+}
+
+/*
* Determine if we were loaded by an EFI loader. If so, then we have also been
* passed the efi memmap, systab, etc., so we should use these data structures
* for initialization. Note, the efi init code path is determined by the
@@ -1220,3 +1232,15 @@ void __init i386_reserve_resources(void)
}

#endif /* CONFIG_X86_32 */
+
+static struct notifier_block kernel_offset_notifier = {
+ .notifier_call = dump_kernel_offset
+};
+
+static int __init register_kernel_offset_dumper(void)
+{
+ atomic_notifier_chain_register(&panic_notifier_list,
+ &kernel_offset_notifier);
+ return 0;
+}
+__initcall(register_kernel_offset_dumper);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 68bda7a..c00a482 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -752,6 +752,12 @@ void __init trap_init(void)
set_bit(SYSCALL_VECTOR, used_vectors);
#endif

+#if defined(CONFIG_RANDOMIZE_BASE) && !defined(CONFIG_X86_F00F_BUG)
+ __set_fixmap(FIX_RANDOM_IDT, __pa(&idt_table), PAGE_KERNEL_RO);
+
+ /* Update the IDT descriptor. It will be reloaded in cpu_init() */
+ idt_descr.address = fix_to_virt(FIX_RANDOM_IDT);
+#endif
/*
* Should be a barrier for any external CPU state:
*/
--
1.7.9.5

2013-04-04 20:13:29

by H. Peter Anvin

Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

On 04/04/2013 01:07 PM, Kees Cook wrote:
> However, the benefits of
> this feature in certain environments exceed the perceived weaknesses[2].

Could you clarify?

-hpa

2013-04-04 20:19:23

by Julien Tinnes

Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

On Thu, Apr 4, 2013 at 1:12 PM, H. Peter Anvin <[email protected]> wrote:
> On 04/04/2013 01:07 PM, Kees Cook wrote:
>> However, the benefits of
>> this feature in certain environments exceed the perceived weaknesses[2].
>
> Could you clarify?

I think privilege reduction in general, and sandboxing in particular,
can make KASLR even more useful. A lot of the information leaks can be
mitigated in the same way as attack surface and vulnerabilities can be
mitigated.

Julien

2013-04-04 20:22:22

by H. Peter Anvin

Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

I have to admit to being somewhat skeptical toward KASLR with only 8
bits of randomness. There are at least two potential ways of
dramatically increasing the available randomness:

1. actually compose the kernel of multiple independently relocatable
pieces (maybe chunk it on 2M boundaries or something.)

2. compile the kernel as one of the memory models which can be executed
anywhere in the 64-bit address space. The cost of this would have
to be quantified, of course.

The latter is particularly something that should be considered for the
BPF JIT, to defend against JIT spray attacks.

-hpa

2013-04-04 20:23:43

by Julien Tinnes

Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

On Thu, Apr 4, 2013 at 1:19 PM, Julien Tinnes <[email protected]> wrote:
> On Thu, Apr 4, 2013 at 1:12 PM, H. Peter Anvin <[email protected]> wrote:
>> On 04/04/2013 01:07 PM, Kees Cook wrote:
>>> However, the benefits of
>>> this feature in certain environments exceed the perceived weaknesses[2].
>>
>> Could you clarify?
>
> I think privilege reduction in general, and sandboxing in particular,
> can make KASLR even more useful. A lot of the information leaks can be
> mitigated in the same way as attack surface and vulnerabilities can be
> mitigated.

Case in point:
- leaks of 64-bit kernel values to userland in compatibility
sub-mode. Sandboxing by using seccomp-bpf can restrict a process to
the 64-bit mode API.
- restricting access to the syslog() system call
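
A minimal sketch of such a filter (illustrative only, not part of this
patch set; the policy below is just an example): it kills any syscall
entering through a non-x86-64 ABI and denies syslog(2), allowing
everything else.

#include <stddef.h>
#include <stdio.h>
#include <errno.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

static struct sock_filter filter[] = {
	/* Kill anything arriving via the compat sub-mode. */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
		 offsetof(struct seccomp_data, arch)),
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
	/* Deny syslog(2) with EPERM, allow the rest. */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
		 offsetof(struct seccomp_data, nr)),
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_syslog, 0, 1),
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

int main(void)
{
	struct sock_fprog prog = {
		.len = sizeof(filter) / sizeof(filter[0]),
		.filter = filter,
	};

	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) ||
	    prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog))
		perror("seccomp");
	/* The rest of the (sandboxed) program would run here. */
	return 0;
}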

Julien

2013-04-04 20:28:30

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

On 04/04/2013 01:23 PM, Julien Tinnes wrote:
> On Thu, Apr 4, 2013 at 1:19 PM, Julien Tinnes <[email protected]> wrote:
>> On Thu, Apr 4, 2013 at 1:12 PM, H. Peter Anvin <[email protected]> wrote:
>>> On 04/04/2013 01:07 PM, Kees Cook wrote:
>>>> However, the benefits of
>>>> this feature in certain environments exceed the perceived weaknesses[2].
>>>
>>> Could you clarify?
>>
>> I think privilege reduction in general, and sandboxing in particular,
>> can make KASLR even more useful. A lot of the information leaks can be
>> mitigated in the same way as attack surface and vulnerabilities can be
>> mitigated.
>
> Case in point:
> - leaks of 64 bits kernel values to userland in compatibility
> sub-mode. Sandboxing by using seccomp-bpf can restrict a process to
> the 64-bit mode API.
> - restricting access to the syslog() system call
>

That doesn't really speak to the value proposition. My concern is that
we're going to spend a lot of time chasing/plugging infoleaks instead of
tackling bigger problems.

8 bits of entropy is not a lot.

-hpa

2013-04-04 20:47:53

by Eric Northup

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

On Thu, Apr 4, 2013 at 1:21 PM, H. Peter Anvin <[email protected]> wrote:
> I have to admit to being somewhat skeptical toward KASLR with only 8
> bits of randomness.

I agree that 8 bits is pretty low and more would be better. However,
even 8 bits provides a < 1% chance that any particular guess will be
correct. Combined with kernel crash monitoring, this amount of ASLR
means that brute-force attacks can't occur undetectably, even if they
can eventually be successful. Having a signal that indicates an
attack-in-progress is a pretty big leg up from not. Of course,
infoleaks would render this whole discussion moot, but I'm replying to
the "only 8 bits" part here.

> There are at least two potential ways of
> dramatically increasing the available randomness:
>
> 1. actually compose the kernel of multiple independently relocatable
> pieces (maybe chunk it on 2M boundaries or something.)

Without increasing the entropy bits, does this actually increase the #
of tries necessary for an attacker to guess correctly? It
dramatically increases the number of possible configurations of kernel
address space, but for any given piece there are only 256 possible
locations.

> 2. compile the kernel as one of the memory models which can be executed
> anywhere in the 64-bit address space. The cost of this would have
> to be quantified, of course.

I attempted to do this, but was limited by my knowledge of the
toolchain. I would welcome help or suggestions!

2013-04-04 20:48:48

by Julien Tinnes

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

On Thu, Apr 4, 2013 at 1:27 PM, H. Peter Anvin <[email protected]> wrote:
> On 04/04/2013 01:23 PM, Julien Tinnes wrote:
>> On Thu, Apr 4, 2013 at 1:19 PM, Julien Tinnes <[email protected]> wrote:
>>> On Thu, Apr 4, 2013 at 1:12 PM, H. Peter Anvin <[email protected]> wrote:
>>>> On 04/04/2013 01:07 PM, Kees Cook wrote:
>>>>> However, the benefits of
>>>>> this feature in certain environments exceed the perceived weaknesses[2].
>>>>
>>>> Could you clarify?
>>>
>>> I think privilege reduction in general, and sandboxing in particular,
>>> can make KASLR even more useful. A lot of the information leaks can be
>>> mitigated in the same way as attack surface and vulnerabilities can be
>>> mitigated.
>>
>> Case in point:
>> - leaks of 64 bits kernel values to userland in compatibility
>> sub-mode. Sandboxing by using seccomp-bpf can restrict a process to
>> the 64-bit mode API.
>> - restricting access to the syslog() system call
>>
>
> That doesn't really speak to the value proposition. My concern is that
> we're going to spend a lot of time chasing/plugging infoleaks instead of
> tackling bigger problems.

Certain leaks are already an issue, even without kernel base randomization.
But yeah, this would give an incentive to plug more infoleaks. I'm not
sure what cost this would incur on kernel development.

There are by-design ones (printk) and bugs. I think we would want to
correct bugs regardless?
For by-design ones, privilege-reduction can often be an appropriate answer.

I really see KASLR as the next natural step:
1. Enforce different privilege levels via the kernel
2. Attackers attack the kernel directly
3a. Allow user-land to restrict the kernel's attack surface and
develop sandboxes (seccomp-bpf, kvm..)
3b. Add more exploitation defenses to the kernel, leveraging (3a) and (1).

> 8 bits of entropy is not a lot.

It would certainly be nice to have more, but it's a good first start.
Unlike user-land segfaults, many kernel-mode panics aren't recoverable
for an attacker.

2013-04-04 20:54:39

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

On Thu, Apr 4, 2013 at 1:12 PM, H. Peter Anvin <[email protected]> wrote:
> On 04/04/2013 01:07 PM, Kees Cook wrote:
>> However, the benefits of
>> this feature in certain environments exceed the perceived weaknesses[2].
>
> Could you clarify?

I would summarize the discussion of KASLR weaknesses into two
general observations:
1- it depends on address location secrecy and leaks are common/easy.
2- it has low entropy so attack success rates may be high.

For "1", as Julien mentions, remote attacks and attacks from a
significantly contained process (via seccomp-bpf) minimize the leak
exposure. For local attacks, cache timing attacks and other things
also exist, but the ASLR can be improved to defend against that too.
So, KASLR is useful on systems that are virtualization hosts,
providing remote services, or running locally confined processes.

For "2", I think that the comparison to userspace ASLR entropy isn't
as direct. For userspace, most systems don't tend to have any kind of
watchdog on segfaulting processes, so a remote attacker could just
keep trying an attack until they got lucky, in which case low entropy
is a serious problem. In the case of KASLR, a single attack failure
means the system goes down, which makes mounting an attack much more
difficult. I think 8 bits is fine to start with, and I think starting
with base offset ASLR is a good first step. We can improve things in
the future.

-Kees

--
Kees Cook
Chrome OS Security

2013-04-04 20:58:31

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

It seems to me that you are assuming that the attacker is targeting a specific system, but a bot might as well target 256 different systems and see what sticks...

Kees Cook <[email protected]> wrote:

>On Thu, Apr 4, 2013 at 1:12 PM, H. Peter Anvin <[email protected]> wrote:
>> On 04/04/2013 01:07 PM, Kees Cook wrote:
>>> However, the benefits of
>>> this feature in certain environments exceed the perceived
>weaknesses[2].
>>
>> Could you clarify?
>
>I would summarize the discussion of KASLR weaknesses into to two
>general observations:
>1- it depends on address location secrecy and leaks are common/easy.
>2- it has low entropy so attack success rates may be high.
>
>For "1", as Julien mentions, remote attacks and attacks from a
>significantly contained process (via seccomp-bpf) minimizes the leak
>exposure. For local attacks, cache timing attacks and other things
>also exist, but the ASLR can be improved to defend against that too.
>So, KASLR is useful on systems that are virtualization hosts,
>providing remote services, or running locally confined processes.
>
>For "2", I think that the comparison to userspace ASLR entropy isn't
>as direct. For userspace, most systems don't tend to have any kind of
>watchdog on segfaulting processes, so a remote attacker could just
>keep trying an attack until they got lucky, in which case low entropy
>is a serious problem. In the case of KASLR, a single attack failure
>means the system goes down, which makes mounting an attack much more
>difficult. I think 8 bits is fine to start with, and I think start
>with a base offset ASLR is a good first step. We can improve things in
>the future.
>
>-Kees
>
>--
>Kees Cook
>Chrome OS Security

--
Sent from my mobile phone. Please excuse brevity and lack of formatting.

2013-04-04 21:00:06

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

On Thu, Apr 4, 2013 at 1:58 PM, H. Peter Anvin <[email protected]> wrote:
> It seems to me that you are assuming that the attacker is targeting a specific system, but a bot might as well target 256 different systems and see what sticks...

Certainly, but system monitoring will show 255 crashed machines, which
is a huge blip on any radar. :)

-Kees

>
> Kees Cook <[email protected]> wrote:
>
>>On Thu, Apr 4, 2013 at 1:12 PM, H. Peter Anvin <[email protected]> wrote:
>>> On 04/04/2013 01:07 PM, Kees Cook wrote:
>>>> However, the benefits of
>>>> this feature in certain environments exceed the perceived
>>weaknesses[2].
>>>
>>> Could you clarify?
>>
>>I would summarize the discussion of KASLR weaknesses into to two
>>general observations:
>>1- it depends on address location secrecy and leaks are common/easy.
>>2- it has low entropy so attack success rates may be high.
>>
>>For "1", as Julien mentions, remote attacks and attacks from a
>>significantly contained process (via seccomp-bpf) minimizes the leak
>>exposure. For local attacks, cache timing attacks and other things
>>also exist, but the ASLR can be improved to defend against that too.
>>So, KASLR is useful on systems that are virtualization hosts,
>>providing remote services, or running locally confined processes.
>>
>>For "2", I think that the comparison to userspace ASLR entropy isn't
>>as direct. For userspace, most systems don't tend to have any kind of
>>watchdog on segfaulting processes, so a remote attacker could just
>>keep trying an attack until they got lucky, in which case low entropy
>>is a serious problem. In the case of KASLR, a single attack failure
>>means the system goes down, which makes mounting an attack much more
>>difficult. I think 8 bits is fine to start with, and I think start
>>with a base offset ASLR is a good first step. We can improve things in
>>the future.
>>
>>-Kees
>>
>>--
>>Kees Cook
>>Chrome OS Security
>
> --
> Sent from my mobile phone. Please excuse brevity and lack of formatting.



--
Kees Cook
Chrome OS Security

2013-04-04 21:00:52

by Julien Tinnes

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

Natural evolution: when the cluster is the computer, kernel panics are
recoverable like segfaults in a multi-process OS.

You have a point and 8 bits isn't perfect, but it's already useful
regardless, in certain scenarios.

On Thu, Apr 4, 2013 at 1:58 PM, H. Peter Anvin <[email protected]> wrote:
> It seems to me that you are assuming that the attacker is targeting a specific system, but a bot might as well target 256 different systems and see what sticks...
>
> Kees Cook <[email protected]> wrote:
>
>>On Thu, Apr 4, 2013 at 1:12 PM, H. Peter Anvin <[email protected]> wrote:
>>> On 04/04/2013 01:07 PM, Kees Cook wrote:
>>>> However, the benefits of
>>>> this feature in certain environments exceed the perceived
>>weaknesses[2].
>>>
>>> Could you clarify?
>>
>>I would summarize the discussion of KASLR weaknesses into to two
>>general observations:
>>1- it depends on address location secrecy and leaks are common/easy.
>>2- it has low entropy so attack success rates may be high.
>>
>>For "1", as Julien mentions, remote attacks and attacks from a
>>significantly contained process (via seccomp-bpf) minimizes the leak
>>exposure. For local attacks, cache timing attacks and other things
>>also exist, but the ASLR can be improved to defend against that too.
>>So, KASLR is useful on systems that are virtualization hosts,
>>providing remote services, or running locally confined processes.
>>
>>For "2", I think that the comparison to userspace ASLR entropy isn't
>>as direct. For userspace, most systems don't tend to have any kind of
>>watchdog on segfaulting processes, so a remote attacker could just
>>keep trying an attack until they got lucky, in which case low entropy
>>is a serious problem. In the case of KASLR, a single attack failure
>>means the system goes down, which makes mounting an attack much more
>>difficult. I think 8 bits is fine to start with, and I think start
>>with a base offset ASLR is a good first step. We can improve things in
>>the future.
>>
>>-Kees
>>
>>--
>>Kees Cook
>>Chrome OS Security
>
> --
> Sent from my mobile phone. Please excuse brevity and lack of formatting.

2013-04-04 21:01:43

by Eric Northup

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

On Thu, Apr 4, 2013 at 1:58 PM, H. Peter Anvin <[email protected]> wrote:
> It seems to me that you are assuming that the attacker is targeting a specific system, but a bot might as well target 256 different systems and see what sticks...

The alarm signal from the ones that don't stick is, in my opinion, the
primary benefit from this work -- it makes certain classes of attack
much less economical. A crash dump from a panic'd machine may include
enough information to diagnose the exploited vulnerability - and once
diagnosed and fixed, knowledge about the vulnerability is much less
valuable.

>
> Kees Cook <[email protected]> wrote:
>
>>On Thu, Apr 4, 2013 at 1:12 PM, H. Peter Anvin <[email protected]> wrote:
>>> On 04/04/2013 01:07 PM, Kees Cook wrote:
>>>> However, the benefits of
>>>> this feature in certain environments exceed the perceived
>>weaknesses[2].
>>>
>>> Could you clarify?
>>
>>I would summarize the discussion of KASLR weaknesses into to two
>>general observations:
>>1- it depends on address location secrecy and leaks are common/easy.
>>2- it has low entropy so attack success rates may be high.
>>
>>For "1", as Julien mentions, remote attacks and attacks from a
>>significantly contained process (via seccomp-bpf) minimizes the leak
>>exposure. For local attacks, cache timing attacks and other things
>>also exist, but the ASLR can be improved to defend against that too.
>>So, KASLR is useful on systems that are virtualization hosts,
>>providing remote services, or running locally confined processes.
>>
>>For "2", I think that the comparison to userspace ASLR entropy isn't
>>as direct. For userspace, most systems don't tend to have any kind of
>>watchdog on segfaulting processes, so a remote attacker could just
>>keep trying an attack until they got lucky, in which case low entropy
>>is a serious problem. In the case of KASLR, a single attack failure
>>means the system goes down, which makes mounting an attack much more
>>difficult. I think 8 bits is fine to start with, and I think start
>>with a base offset ASLR is a good first step. We can improve things in
>>the future.
>>
>>-Kees
>>
>>--
>>Kees Cook
>>Chrome OS Security
>
> --
> Sent from my mobile phone. Please excuse brevity and lack of formatting.

2013-04-04 21:02:04

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

What system monitoring? Most systems don't have much...

Kees Cook <[email protected]> wrote:

>On Thu, Apr 4, 2013 at 1:58 PM, H. Peter Anvin <[email protected]> wrote:
>> It seems to me that you are assuming that the attacker is targeting a
>specific system, but a bot might as well target 256 different systems
>and see what sticks...
>
>Certainly, but system monitoring will show 255 crashed machines, which
>is a huge blip on any radar. :)
>
>-Kees
>
>>
>> Kees Cook <[email protected]> wrote:
>>
>>>On Thu, Apr 4, 2013 at 1:12 PM, H. Peter Anvin <[email protected]> wrote:
>>>> On 04/04/2013 01:07 PM, Kees Cook wrote:
>>>>> However, the benefits of
>>>>> this feature in certain environments exceed the perceived
>>>weaknesses[2].
>>>>
>>>> Could you clarify?
>>>
>>>I would summarize the discussion of KASLR weaknesses into to two
>>>general observations:
>>>1- it depends on address location secrecy and leaks are common/easy.
>>>2- it has low entropy so attack success rates may be high.
>>>
>>>For "1", as Julien mentions, remote attacks and attacks from a
>>>significantly contained process (via seccomp-bpf) minimizes the leak
>>>exposure. For local attacks, cache timing attacks and other things
>>>also exist, but the ASLR can be improved to defend against that too.
>>>So, KASLR is useful on systems that are virtualization hosts,
>>>providing remote services, or running locally confined processes.
>>>
>>>For "2", I think that the comparison to userspace ASLR entropy isn't
>>>as direct. For userspace, most systems don't tend to have any kind of
>>>watchdog on segfaulting processes, so a remote attacker could just
>>>keep trying an attack until they got lucky, in which case low entropy
>>>is a serious problem. In the case of KASLR, a single attack failure
>>>means the system goes down, which makes mounting an attack much more
>>>difficult. I think 8 bits is fine to start with, and I think start
>>>with a base offset ASLR is a good first step. We can improve things
>in
>>>the future.
>>>
>>>-Kees
>>>
>>>--
>>>Kees Cook
>>>Chrome OS Security
>>
>> --
>> Sent from my mobile phone. Please excuse brevity and lack of
>formatting.
>
>
>
>--
>Kees Cook
>Chrome OS Security

--
Sent from my mobile phone. Please excuse brevity and lack of formatting.

2013-04-04 21:04:47

by Eric Northup

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

On Thu, Apr 4, 2013 at 2:01 PM, H. Peter Anvin <[email protected]> wrote:
> What system monitoring? Most systems don't have much...

The security of an unmonitored system is going to be much lower than
that of a well-monitored system. That's true independent of whether kASLR
is deployed.

>
> Kees Cook <[email protected]> wrote:
>
>>On Thu, Apr 4, 2013 at 1:58 PM, H. Peter Anvin <[email protected]> wrote:
>>> It seems to me that you are assuming that the attacker is targeting a
>>specific system, but a bot might as well target 256 different systems
>>and see what sticks...
>>
>>Certainly, but system monitoring will show 255 crashed machines, which
>>is a huge blip on any radar. :)
>>
>>-Kees
>>
>>>
>>> Kees Cook <[email protected]> wrote:
>>>
>>>>On Thu, Apr 4, 2013 at 1:12 PM, H. Peter Anvin <[email protected]> wrote:
>>>>> On 04/04/2013 01:07 PM, Kees Cook wrote:
>>>>>> However, the benefits of
>>>>>> this feature in certain environments exceed the perceived
>>>>weaknesses[2].
>>>>>
>>>>> Could you clarify?
>>>>
>>>>I would summarize the discussion of KASLR weaknesses into to two
>>>>general observations:
>>>>1- it depends on address location secrecy and leaks are common/easy.
>>>>2- it has low entropy so attack success rates may be high.
>>>>
>>>>For "1", as Julien mentions, remote attacks and attacks from a
>>>>significantly contained process (via seccomp-bpf) minimizes the leak
>>>>exposure. For local attacks, cache timing attacks and other things
>>>>also exist, but the ASLR can be improved to defend against that too.
>>>>So, KASLR is useful on systems that are virtualization hosts,
>>>>providing remote services, or running locally confined processes.
>>>>
>>>>For "2", I think that the comparison to userspace ASLR entropy isn't
>>>>as direct. For userspace, most systems don't tend to have any kind of
>>>>watchdog on segfaulting processes, so a remote attacker could just
>>>>keep trying an attack until they got lucky, in which case low entropy
>>>>is a serious problem. In the case of KASLR, a single attack failure
>>>>means the system goes down, which makes mounting an attack much more
>>>>difficult. I think 8 bits is fine to start with, and I think start
>>>>with a base offset ASLR is a good first step. We can improve things
>>in
>>>>the future.
>>>>
>>>>-Kees
>>>>
>>>>--
>>>>Kees Cook
>>>>Chrome OS Security
>>>
>>> --
>>> Sent from my mobile phone. Please excuse brevity and lack of
>>formatting.
>>
>>
>>
>>--
>>Kees Cook
>>Chrome OS Security
>
> --
> Sent from my mobile phone. Please excuse brevity and lack of formatting.

2013-04-04 21:06:21

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

If an attacker targets multiple distinct systems across a wide range
of system owners, instead of landing the exploit against all of them,
they'll get less than 1% of them, and crash all the rest, removing
them (for a while) from the target pool. Without KASLR, they would
have landed 100% of the attacks.

If an attacker targets multiple systems belonging to a single system
owner, most service providers, if they have that many machines, will
have some level of system monitoring. And when 99% of the hosts go
down, they're going to notice.

KASLR is certainly not a silver bullet, but it dramatically changes
the landscape of what kinds of attacks can be used.

-Kees

On Thu, Apr 4, 2013 at 2:01 PM, H. Peter Anvin <[email protected]> wrote:
> What system monitoring? Most systems don't have much...
>
> Kees Cook <[email protected]> wrote:
>
>>On Thu, Apr 4, 2013 at 1:58 PM, H. Peter Anvin <[email protected]> wrote:
>>> It seems to me that you are assuming that the attacker is targeting a
>>specific system, but a bot might as well target 256 different systems
>>and see what sticks...
>>
>>Certainly, but system monitoring will show 255 crashed machines, which
>>is a huge blip on any radar. :)
>>
>>-Kees
>>
>>>
>>> Kees Cook <[email protected]> wrote:
>>>
>>>>On Thu, Apr 4, 2013 at 1:12 PM, H. Peter Anvin <[email protected]> wrote:
>>>>> On 04/04/2013 01:07 PM, Kees Cook wrote:
>>>>>> However, the benefits of
>>>>>> this feature in certain environments exceed the perceived
>>>>weaknesses[2].
>>>>>
>>>>> Could you clarify?
>>>>
>>>>I would summarize the discussion of KASLR weaknesses into to two
>>>>general observations:
>>>>1- it depends on address location secrecy and leaks are common/easy.
>>>>2- it has low entropy so attack success rates may be high.
>>>>
>>>>For "1", as Julien mentions, remote attacks and attacks from a
>>>>significantly contained process (via seccomp-bpf) minimizes the leak
>>>>exposure. For local attacks, cache timing attacks and other things
>>>>also exist, but the ASLR can be improved to defend against that too.
>>>>So, KASLR is useful on systems that are virtualization hosts,
>>>>providing remote services, or running locally confined processes.
>>>>
>>>>For "2", I think that the comparison to userspace ASLR entropy isn't
>>>>as direct. For userspace, most systems don't tend to have any kind of
>>>>watchdog on segfaulting processes, so a remote attacker could just
>>>>keep trying an attack until they got lucky, in which case low entropy
>>>>is a serious problem. In the case of KASLR, a single attack failure
>>>>means the system goes down, which makes mounting an attack much more
>>>>difficult. I think 8 bits is fine to start with, and I think start
>>>>with a base offset ASLR is a good first step. We can improve things
>>in
>>>>the future.
>>>>
>>>>-Kees
>>>>
>>>>--
>>>>Kees Cook
>>>>Chrome OS Security
>>>
>>> --
>>> Sent from my mobile phone. Please excuse brevity and lack of
>>formatting.
>>
>>
>>
>>--
>>Kees Cook
>>Chrome OS Security
>
> --
> Sent from my mobile phone. Please excuse brevity and lack of formatting.



--
Kees Cook
Chrome OS Security

2013-04-05 01:09:33

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

On 04/04/2013 01:47 PM, Eric Northup wrote:
>>
>> 1. actually compose the kernel of multiple independently relocatable
>> pieces (maybe chunk it on 2M boundaries or something.)
>
> Without increasing the entropy bits, does this actually increase the #
> of tries necessary for an attacker to guess correctly? It
> dramatically increases the number of possible configurations of kernel
> address space, but for any given piece there are only 256 possible
> locations.
>

The 2M chunk was a red herring; one would of course pack the
blocks together, probably back to back, in random order.

>> 2. compile the kernel as one of the memory models which can be executed
>> anywhere in the 64-bit address space. The cost of this would have
>> to be quantified, of course.
>
> I attempted to do this, but was limited by my knowledge of the
> toolchain. I would welcome help or suggestions!

Start by looking at the ABI document. I suspect what we need is some
variant of the small PIC model.

-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2013-04-05 07:05:36

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR


* Julien Tinnes <[email protected]> wrote:

> On Thu, Apr 4, 2013 at 1:27 PM, H. Peter Anvin <[email protected]> wrote:
> > On 04/04/2013 01:23 PM, Julien Tinnes wrote:
> >> On Thu, Apr 4, 2013 at 1:19 PM, Julien Tinnes <[email protected]> wrote:
> >>> On Thu, Apr 4, 2013 at 1:12 PM, H. Peter Anvin <[email protected]> wrote:
> >>>> On 04/04/2013 01:07 PM, Kees Cook wrote:
> >>>>> However, the benefits of
> >>>>> this feature in certain environments exceed the perceived weaknesses[2].
> >>>>
> >>>> Could you clarify?
> >>>
> >>> I think privilege reduction in general, and sandboxing in particular,
> >>> can make KASLR even more useful. A lot of the information leaks can be
> >>> mitigated in the same way as attack surface and vulnerabilities can be
> >>> mitigated.
> >>
> >> Case in point:
> >> - leaks of 64 bits kernel values to userland in compatibility
> >> sub-mode. Sandboxing by using seccomp-bpf can restrict a process to
> >> the 64-bit mode API.
> >> - restricting access to the syslog() system call
> >>
> >
> > That doesn't really speak to the value proposition. My concern is that
> > we're going to spend a lot of time chasing/plugging infoleaks instead of
> > tackling bigger problems.
>
> Certain leaks are already an issue, even without kernel base
> randomization.

Definitely. Stealth infiltration needs a high-reliability exploit,
especially if the attack vector used is a zero day kernel vulnerability.

Injecting uncertainty gives us a chance to get a crash logged and the
vulnerability exposed.

> But yeah, this would give an incentive to plug more infoleaks. I'm not
> sure what cost this would incur on kernel development.

I consider it a plus on kernel development - the more incentives to plug
infoleaks, the better.

> There are by-design ones (printk) and bugs. I think we would want to
> correct bugs regardless?

Definitely.

> For by-design ones, privilege-reduction can often be an appropriate answer.

Correct, that's the motivation behind kptr_restrict and dmesg_restrict.

> I really see KASLR as the next natural step:
>
> 1. Enforce different privilege levels via the kernel
> 2. Attackers attack the kernel directly
> 3a. Allow user-land to restrict the kernel's attack surface and
> develop sandboxes (seccomp-bpf, kvm..)
> 3b. Add more exploitation defenses to the kernel, leveraging (3a) and (1).
>
> > 8 bits of entropy is not a lot.
>
> It would certainly be nice to have more, but it's a good first start.
> Unlike user-land segfaults, many kernel-mode panics aren't recoverable
> for an attacker.

The other aspect of even just a couple of bits of extra entropy is that it
changes the economics of worms and other remote attacks: there's a
significant difference between being able to infect one machine per packet
and only 1 out of 256 machines while the other 255 get crashed.

The downside is debuggability - so things like 'debug' on the kernel boot
command line should probably disable this feature automatically.

Thanks,

Ingo

2013-04-05 07:11:51

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR


* Kees Cook <[email protected]> wrote:

> This creates CONFIG_RANDOMIZE_BASE, so that the base offset of the kernel
> can be randomized at boot.
>
> This makes kernel vulnerabilities harder to reliably exploit, especially
> from remote attacks and local processes in seccomp containers. Keeping
> the location of kernel addresses secret becomes very important when using
> this feature, so enabling kptr_restrict and dmesg_restrict is recommended.
> Besides direct address leaks, several other attacks are possible to bypass
> this on local systems, including cache timing[1]. However, the benefits of
> this feature in certain environments exceed the perceived weaknesses[2].
>
> An added security benefit is making the IDT read-only.
>
> Current entropy is low, since the kernel has basically a minimum 2MB
> alignment and has been built with -2G memory addressing. As a result,
> available entropy will be 8 bits in the best case. The e820 entries on
> a given system may further limit the available memory.
>
> This feature is presently incompatible with hibernation.
>
> When built into the kernel, the "noaslr" kernel command line option will
> disable the feature.
>
> Heavily based on work by Dan Rosenberg[3] and Neill Clift.
>
> [1] http://www.internetsociety.org/sites/default/files/Practical%20Timing%20Side%20Channel%20Attacks%20Against%20Kernel%20Space%20ASLR.pdf
> [2] http://forums.grsecurity.net/viewtopic.php?f=7&t=3367
> [3] http://lkml.indiana.edu/hypermail/linux/kernel/1105.3/index.html#00520
>
> Signed-off-by: Kees Cook <[email protected]>
> Cc: Eric Northup <[email protected]>
> ---
> Documentation/kernel-parameters.txt | 4 +
> arch/x86/Kconfig | 51 +++++++++++--
> arch/x86/Makefile | 3 +
> arch/x86/boot/compressed/head_32.S | 21 +++++-
> arch/x86/boot/compressed/head_64.S | 135 ++++++++++++++++++++++++++++++++--
> arch/x86/include/asm/fixmap.h | 4 +
> arch/x86/include/asm/page_32_types.h | 2 +
> arch/x86/include/asm/page_64_types.h | 4 -
> arch/x86/include/asm/page_types.h | 4 +
> arch/x86/kernel/asm-offsets.c | 14 ++++
> arch/x86/kernel/setup.c | 24 ++++++
> arch/x86/kernel/traps.c | 6 ++
> 12 files changed, 251 insertions(+), 21 deletions(-)

Before going into the details, I have a structural request: could you
please further increase the granularity of the patch-set?

In particular I'd suggest introducing a helper Kconfig bool that makes the
IDT readonly - instead of using CONFIG_RANDOMIZE_BASE for that.
CONFIG_RANDOMIZE_BASE can then select this helper Kconfig switch.

Users could also select a readonly-IDT - even if they don't want a
randomized kernel.

With that done, it would be nice to implement the read-only IDT changes in
a separate, preparatory patch. If it causes any problems it will be easier
to isolate.

Thanks,

Ingo

2013-04-05 07:13:28

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 2/3] x86: build reloc tool for both 64 and 32 bit


* Kees Cook <[email protected]> wrote:

> Add logic for 64-bit kernel relocations. Since there is no need to
> handle 32 and 64 bit at the same time, refactor away most of the 32/64
> bit ELF differences and split the build into producing two separate
> binaries. Additionally switches to using realloc instead of a two-pass
> approach.
>
> Heavily based on work by Neill Clift and Michael Davidson.
>
> Signed-off-by: Kees Cook <[email protected]>
> Cc: Eric Northup <[email protected]>
> ---
> arch/x86/boot/compressed/Makefile | 2 +-
> arch/x86/realmode/rm/Makefile | 2 +-
> arch/x86/tools/.gitignore | 3 +-
> arch/x86/tools/Makefile | 14 +-
> arch/x86/tools/relocs.c | 717 ++++++++++++++++++++++++++-----------
> arch/x86/tools/relocs_32.c | 1 +
> arch/x86/tools/relocs_64.c | 2 +
> 7 files changed, 533 insertions(+), 208 deletions(-)
> create mode 100644 arch/x86/tools/relocs_32.c
> create mode 100644 arch/x86/tools/relocs_64.c

This patch too is a bit large and it would be wise to split it into two
steps: first the refactoring - which is non-functional and should not
cause any problems in theory - then the change that switches to realloc.

Thanks,

Ingo

2013-04-05 07:24:12

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 1/3] x86: routines to choose random kernel base offset


* Kees Cook <[email protected]> wrote:

> This provides routines for selecting a randomized kernel base offset,
> bounded by e820 details. It tries to use RDRAND and falls back to RDTSC.
> If "noaslr" is on the kernel command line, no offset will be used.

Would it make sense to also add three other sources of entropy:

---------

1)

CMOS_READ(RTC_SECONDS);

The exact second the bootup occurred might not be known to the attacker, so
this could add a bit or two of entropy even in the worst case where the
attacker has access to precise 'uptime' and system log information and
there's no NTP active that fudges the offsets.

If the attacker is more tightly restricted, as in the sandboxed case, then
this could add a fair amount of entropy.

2)

Another source of per system entropy would be to simply mix all e820
entries into the random offset - we already parse them to place the kernel
image.

The e820 info changes per system type, amount of RAM and devices
installed.

Especially in a restricted remote environment the attacker might not know
the hardware details and the e820 map.

3)

A build time random bit. This is essentially per system as well if the
attacker does not know the precise kernel version, or if the kernel was
custom built.

---------

In the worst case an attacker can guess all of these bits - but
statistically it still improves entropy for the general Linux box that
gets attacked.
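
A rough sketch of how such sources might be folded into one seed
(illustrative only: the struct mirrors the kernel's e820 entry layout,
and mix(), gather_boot_entropy() and COMPILE_TIME_RANDOM are made-up
names standing in for whatever the real patch would use):

#include <stdint.h>

/* Hypothetical per-build random constant (source 3 above). */
#define COMPILE_TIME_RANDOM 0x243f6a8885a308d3ULL

/* Sketch of the kernel's struct e820entry fields. */
struct e820entry_sketch {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
};

/* Rotate-xor-multiply mix; nothing cryptographic, just bit spreading. */
static uint64_t mix(uint64_t seed, uint64_t val)
{
	seed = (seed << 13) | (seed >> 51);
	return seed ^ (val * 0x9e3779b97f4a7c15ULL);
}

static uint64_t gather_boot_entropy(const struct e820entry_sketch *map,
				    int nr, uint8_t rtc_seconds)
{
	uint64_t seed = COMPILE_TIME_RANDOM;
	int i;

	seed = mix(seed, rtc_seconds);		/* source 1: RTC seconds */
	for (i = 0; i < nr; i++) {		/* source 2: e820 layout */
		seed = mix(seed, map[i].addr);
		seed = mix(seed, map[i].size);
		seed = mix(seed, map[i].type);
	}
	return seed;
}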

Thanks,

Ingo

2013-04-05 07:34:55

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR


* Kees Cook <[email protected]> wrote:

> +config RANDOMIZE_BASE_MAX_OFFSET
> + hex "Maximum ASLR offset allowed"
> + depends on RANDOMIZE_BASE
> + default "0x10000000"
> + range 0x0 0x10000000
> + ---help---
> + Determines the maximal offset in bytes that will be applied to the
> + kernel when Address Space Layout Randomization (ASLR) is active.
> + Physical memory layout and kernel size may limit this further.
> + This must be a power of two.

I'm not sure this configuration option should be exposed. Is there any
reason, if the feature is enabled, not to set this to the highest
possible value?

Furthermore, when randomization is enabled, I'd suggest to default
kptr_restrict to 1 [if the user does not override it] - currently the
bootup default is 0.

I.e. make it easy to enable effective KASLR with a single configuration
step, without "forgot about kptr_restrict" gotchas.

I'd also suggest to rename RANDOMIZE_BASE to something like
RANDOMIZE_KBASE. The 'kbase' makes it really clear that this is about some
kernel base address randomization.

'Randomize base' sounds too generic, and could be misunderstood to mean
something about our random pool for example.

Thanks,

Ingo

2013-04-05 07:36:28

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 1/3] x86: routines to choose random kernel base offset


* Ingo Molnar <[email protected]> wrote:

>
> * Kees Cook <[email protected]> wrote:
>
> > This provides routines for selecting a randomized kernel base offset,
> > bounded by e820 details. It tries to use RDRAND and falls back to RDTSC.
> > If "noaslr" is on the kernel command line, no offset will be used.
>
> Would it make sense to also add three other sources of entropy:

In any case, would it be possible to also mix these bootup sources of
entropy into our regular random pool?

That would improve random pool entropy on all Linux systems, not just
those that choose to enable kernel-base-address randomization.

Thanks,

Ingo

2013-04-05 07:55:38

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR


* Eric Northup <[email protected]> wrote:

> On Thu, Apr 4, 2013 at 1:58 PM, H. Peter Anvin <[email protected]> wrote:
> > It seems to me that you are assuming that the attacker is targeting a specific system, but a bot might as well target 256 different systems and see what sticks...
>
> The alarm signal from the ones that don't stick is, in my opinion, the
> primary benefit from this work -- it makes certain classes of attack
> much less economical. A crash dump from a panic'd machine may include
> enough information to diagnose the exploited vulnerability - and once
> diagnosed and fixed, knowledge about the vulnerability is much less
> valuable.

Correct.

Beyond making worm propagation and zombie collection dynamics much less
favorable, there's another aspect to randomization: attacks against high
value Linux targets often use high value exploits, where considerable
effort is spent to make sure that the attack will succeed 100%, without
alerting anyone - or will fail safely without alerting anyone.

Probabilistically crashing the kernel does not fit that requirement.

In some cases adding even a _single bit_ of randomness will change the
economics dramatically, because as time progresses and the kernel gets
(hopefully) more secure, the value of an exploitable zero-day
vulnerability becomes inevitably much higher than the value of pretty much
any system attacked.

Injecting a significant risk of detection is a powerful concept. Think of
WWII: how much effort went into making sure that the Germans did not
detect that the encryption of Enigma was broken. Or how much effort went
into making sure that the soviets did not detect that the US got hold of
one of their nukes - etc.

So this feature really seems useful across the security spectrum, for low
and high value systems alike.

Thanks,

Ingo

2013-04-05 08:04:26

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR


* H. Peter Anvin <[email protected]> wrote:

> I have to admit to being somewhat skeptical toward KASLR with only 8
> bits of randomness. There are at least two potential ways of
> dramatically increasing the available randomness:
>
> 1. actually compose the kernel of multiple independently relocatable
> pieces (maybe chunk it on 2M boundaries or something.)
>
> 2. compile the kernel as one of the memory models which can be executed
> anywhere in the 64-bit address space. The cost of this would have
> to be quantified, of course.
>
> The latter is particularly something that should be considered for the
> BPF JIT, to defend against JIT spray attacks.

The cost of 64-bit RIPs is probably measurable both in cache footprint and
in execution speed.

Doing that might make sense - but unless it's surprisingly cheap to do it,
at least from a distro perspective, randomizing the kernel base using the
existing compact address space would probably be the preferred option -
even if a bigger build model was available.

Random runtime shuffling of the kernel image - is that possible with
existing toolchains?

Thanks,

Ingo

2013-04-05 12:12:55

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

On Thu, 4 Apr 2013, Kees Cook wrote:

> This creates CONFIG_RANDOMIZE_BASE, so that the base offset of the kernel
> can be randomized at boot.
>
> This makes kernel vulnerabilities harder to reliably exploit, especially
> from remote attacks and local processes in seccomp containers. Keeping
> the location of kernel addresses secret becomes very important when using
> this feature, so enabling kptr_restrict and dmesg_restrict is recommended.

If we are going to take the KASLR path at all, and assuming this is done
purely because of security, I'd suggest not only vaguely mentioning this
requirement in the changelog (and calling it a recommendation), but making
it a hard prerequisite.

Without kernel addresses being invisible to userspace, this feature is
completely useless, but it might provide a very false sense of security if
just blindly enabled by random Joe Bofh.

--
Jiri Kosina
SUSE Labs

2013-04-05 14:50:02

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

On Thu, Apr 04, 2013 at 01:07:35PM -0700, Kees Cook wrote:
> This creates CONFIG_RANDOMIZE_BASE, so that the base offset of the kernel
> can be randomized at boot.

Right,

if I'm reading this whole deal correctly, I have an issue with this
in the sense that if this thing is enabled by default and people are
running stripped kernels, an oops which is being reported is worth sh*t
since all the addresses there are random and one simply can't map them
back to which functions the callstack frames are pointing to, which will
majorly hinder debuggability, IMHO...

[ … ]

> When built into the kernel, the "noaslr" kernel command line option
> will disable the feature.

... so the saner thing to do, IMHO, would be to flip the meaning of
this option to "kaslr" or whatever and let people and distros enable
randomization on kernels which are bug free and don't oops (good luck
finding those :-)). Generally make the thing opt-in instead of opt-out.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-05 15:31:27

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

On 04/05/2013 01:04 AM, Ingo Molnar wrote:
>
> Random runtime shuffling of the kernel image - is that possible with
> existing toolchains?
>

Yes... the question is how much work we'd be willing to go through to
make it happen.

One approach: the kernel already contains a linker -- used for modules
-- and the bulk of the kernel could actually be composed as a "pile of
modules" that gets linked on boot. This would provide very large
amounts of randomness.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2013-04-05 18:15:58

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 1/3] x86: routines to choose random kernel base offset

On 04/05/2013 12:36 AM, Ingo Molnar wrote:
>
> * Ingo Molnar <[email protected]> wrote:
>
>>
>> * Kees Cook <[email protected]> wrote:
>>
>>> This provides routines for selecting a randomized kernel base offset,
>>> bounded by e820 details. It tries to use RDRAND and falls back to RDTSC.
>>> If "noaslr" is on the kernel command line, no offset will be used.
>>
>> Would it make sense to also add three other sources of entropy:
>
> In any case, would it be possible to also mix these bootup sources of
> entropy into our regular random pool?
>
> That would improve random pool entropy on all Linux systems, not just
> those that choose to enable kernel-base-address randomization.
>

I think we already do at least some of these, but at this point, for any
non-RDRAND-capable hardware we could almost certainly do better for any
definition of anything at all.

RDRAND is obviously the ultimate solution here.

-hpa

2013-04-05 18:18:14

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

On 04/05/2013 01:04 AM, Ingo Molnar wrote:
>
> The cost of 64-bit RIPs is probably measurable both in cache footprint and
> in execution speed.
>

Well, "probably" usually translates to "worth measuring" to me.

> Random runtime shuffling of the kernel image - is that possible with
> existing toolchains?

I wanted to point out... yes this is hard, but it has the ability to be
*much* stronger than any other form of KASLR simply because it means
that a single infoleak doesn't give everything else away.

-hpa

2013-04-05 20:01:07

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

On Thu, Apr 4, 2013 at 1:21 PM, H. Peter Anvin <[email protected]> wrote:
> I have to admit to being somewhat skeptical toward KASLR with only 8
> bits of randomness. There are at least two potential ways of
> dramatically increasing the available randomness:
>
> 1. actually compose the kernel of multiple independently relocatable
> pieces (maybe chunk it on 2M boundaries or something.)
>
> 2. compile the kernel as one of the memory models which can be executed
> anywhere in the 64-bit address space. The cost of this would have
> to be quantified, of course.

Why not just let the bootloader load the kernel at a random address instead?

For our 64-bit bzImage, the boot loader could load the kernel anywhere above 4G.

Thanks

Yinghai

2013-04-05 20:05:59

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

On 04/05/2013 01:01 PM, Yinghai Lu wrote:
> On Thu, Apr 4, 2013 at 1:21 PM, H. Peter Anvin <[email protected]> wrote:
>> I have to admit to being somewhat skeptical toward KASLR with only 8
>> bits of randomness. There are at least two potential ways of
>> dramatically increasing the available randomness:
>>
>> 1. actually compose the kernel of multiple independently relocatable
>> pieces (maybe chunk it on 2M boundaries or something.)
>>
>> 2. compile the kernel as one of the memory models which can be executed
>> anywhere in the 64-bit address space. The cost of this would have
>> to be quantified, of course.
>
> Why just let bootloader to load kernel on random address instead?
>
> For our 64bit bzImage, boot loader could load kernel to anywhere above 4G.
>

That makes zero difference, since the issue at hand is the *virtual*
addresses the kernel is running at. Currently, the 64-bit kernel
always runs at 0xffffffff81000000 virtual. We can't run out of
arbitrary bits of the 64-bit address space because of the memory model.

Furthermore, dealing with the boot loaders means dealing with the boot
loader maintainers, which can be insanely painful. Consider that Grub2,
10 years after this was implemented, still can't load more than one
initramfs component.

-hpa

2013-04-05 20:19:42

by Julien Tinnes

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

On Fri, Apr 5, 2013 at 7:49 AM, Borislav Petkov <[email protected]> wrote:
> On Thu, Apr 04, 2013 at 01:07:35PM -0700, Kees Cook wrote:
>> This creates CONFIG_RANDOMIZE_BASE, so that the base offset of the kernel
>> can be randomized at boot.
>
> Right,
>
> if I'm reading this whole deal correctly, I have an issue with this
> in the sense that if this thing is enabled by default and people are
> running stripped kernels, an oops which is being reported is worth sh*t
> since all the addresses there are random and one simply can't map them
> back to which functions the callstack frames are pointing to. Which will
> majorly hinder debuggability, IMHO...

I think it'd be perfectly ok for OOPS to print out the kernel base.

Restricting access to these oopses becomes a different problem
(privilege separation). Some existing sandboxes (Chromium, vsftpd,
openssh..) are already defending against it.

Julien

2013-04-05 20:19:41

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

On Fri, Apr 5, 2013 at 1:05 PM, H. Peter Anvin <[email protected]> wrote:

> That makes zero difference, since the issue at hand is the *virtual*
> addresses the kernel are running at. Currently, the 64-bit kernel
> always runs at 0xffffffff81000000 virtual. We can't run out of
> arbitrary bits of the 64-bit address space because of the memory model.

Not sure if I understand this.

When bzImage64 is loaded high, the kernel's high address 0xffffffff81000000
is mapped to a physical address above 4G without problem.

Also, during arch/x86/boot/compressed/head_64.S, the kernel has not parsed
the e820 map yet, so it cannot find the right place for the kernel yet.

The bootloader has already parsed the e820 map, and it already knows the
kernel's run-time size, so it should be easy for it to pick a crazy random
range for the kernel.

>
> Furthermore, dealing with the boot loaders means dealing with the boot
> loader maintainers, which can be insanely painful. Consider that Grub2,
> 10 years after this was implemented, still can't load more than one
> initramfs component.

syslinux and pxelinux could do that?

Thanks

Yinghai

2013-04-05 20:29:33

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

On 04/05/2013 01:19 PM, Yinghai Lu wrote:
> On Fri, Apr 5, 2013 at 1:05 PM, H. Peter Anvin <[email protected]> wrote:
>
>> That makes zero difference, since the issue at hand is the *virtual*
>> addresses the kernel are running at. Currently, the 64-bit kernel
>> always runs at 0xffffffff81000000 virtual. We can't run out of
>> arbitrary bits of the 64-bit address space because of the memory model.
>
> Not sure if I understand this.
>
> when bzImage64 is loaded high, the kernel high address 0xffffffff81000000
> is pointed to phys address above 4G without problem.
>

That's the problem.

KASLR is about randomizing the *virtual* address space, not the
*physical* address space. On 32 bits those are connected (which is a
problem all on its own), on 64 bits not so much.

>>
>> Furthermore, dealing with the boot loaders means dealing with the boot
>> loader maintainers, which can be insanely painful. Consider that Grub2,
>> 10 years after this was implemented, still can't load more than one
>> initramfs component.
>
> syslinux and pxelinux could do that?
>

Yes, they can.

2013-04-05 20:43:18

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

On Fri, Apr 05, 2013 at 01:19:39PM -0700, Julien Tinnes wrote:
> I think it'd be perfectly ok for OOPS to print out the kernel base.

Yeah, ok, this still would need some massaging of the oops output per
script, but it shouldn't be a big problem.

Also, you probably need to make clear in the oops itself that the
addresses have been randomized. Or, is the mere presence of kernel base
going to imply that?

> Restricting access to these oopses becomes a different problem
> (privilege separation). Some existing sandboxes (Chromium, vsftpd,
> openssh..) are already defending against it.

Ok.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-05 22:06:06

by Julien Tinnes

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

On Fri, Apr 5, 2013 at 12:11 AM, Ingo Molnar <[email protected]> wrote:
>
> * Kees Cook <[email protected]> wrote:
>
>> This creates CONFIG_RANDOMIZE_BASE, so that the base offset of the kernel
>> can be randomized at boot.
>>
>> This makes kernel vulnerabilities harder to reliably exploit, especially
>> from remote attacks and local processes in seccomp containers. Keeping
>> the location of kernel addresses secret becomes very important when using
>> this feature, so enabling kptr_restrict and dmesg_restrict is recommended.
>> Besides direct address leaks, several other attacks are possible to bypass
>> this on local systems, including cache timing[1]. However, the benefits of
>> this feature in certain environments exceed the perceived weaknesses[2].
>>
>> An added security benefit is making the IDT read-only.
>>
>> Current entropy is low, since the kernel has basically a minimum 2MB
>> alignment and has been built with -2G memory addressing. As a result,
>> available entropy will be 8 bits in the best case. The e820 entries on
>> a given system may further limit the available memory.
>>
>> This feature is presently incompatible with hibernation.
>>
>> When built into the kernel, the "noaslr" kernel command line option will
>> disable the feature.
>>
>> Heavily based on work by Dan Rosenberg[3] and Neill Clift.
>>
>> [1] http://www.internetsociety.org/sites/default/files/Practical%20Timing%20Side%20Channel%20Attacks%20Against%20Kernel%20Space%20ASLR.pdf
>> [2] http://forums.grsecurity.net/viewtopic.php?f=7&t=3367
>> [3] http://lkml.indiana.edu/hypermail/linux/kernel/1105.3/index.html#00520
>>
>> Signed-off-by: Kees Cook <[email protected]>
>> Cc: Eric Northup <[email protected]>
>> ---
>> Documentation/kernel-parameters.txt | 4 +
>> arch/x86/Kconfig | 51 +++++++++++--
>> arch/x86/Makefile | 3 +
>> arch/x86/boot/compressed/head_32.S | 21 +++++-
>> arch/x86/boot/compressed/head_64.S | 135 ++++++++++++++++++++++++++++++++--
>> arch/x86/include/asm/fixmap.h | 4 +
>> arch/x86/include/asm/page_32_types.h | 2 +
>> arch/x86/include/asm/page_64_types.h | 4 -
>> arch/x86/include/asm/page_types.h | 4 +
>> arch/x86/kernel/asm-offsets.c | 14 ++++
>> arch/x86/kernel/setup.c | 24 ++++++
>> arch/x86/kernel/traps.c | 6 ++
>> 12 files changed, 251 insertions(+), 21 deletions(-)
>
> Before going into the details, I have a structural request: could you
> please further increase the granularity of the patch-set?
>
> In particular I'd suggest introducing a helper Kconfig bool that makes the
> IDT readonly - instead of using CONFIG_RANDOMIZE_BASE for that.
> CONFIG_RANDOMIZE_BASE can then select this helper Kconfig switch.
>
> Users could also select a readonly-IDT - even if they don't want a
> randomized kernel.

Speaking of IDT, and to capture some off-thread discussion here, we
should remember that the "SGDT" and "SIDT" instructions aren't
privileged on x86, so user-land can leak these out without any way for
the kernel to intercept that.

Adding their own random offsets to these two tables would be very
useful. This could be done in a later patchset of course.
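
For reference, a user-land sketch of the leak being described here
(runs unprivileged on x86-64; the struct name is ours, the behaviour of
SIDT/SGDT is architectural):

#include <stdio.h>
#include <stdint.h>

/* 10-byte descriptor-table register image in 64-bit mode. */
struct __attribute__((packed)) table_ptr {
	uint16_t limit;
	uint64_t base;
};

int main(void)
{
	struct table_ptr idtr, gdtr;

	__asm__ volatile ("sidt %0" : "=m" (idtr));
	__asm__ volatile ("sgdt %0" : "=m" (gdtr));

	printf("IDT base: 0x%016llx\n", (unsigned long long)idtr.base);
	printf("GDT base: 0x%016llx\n", (unsigned long long)gdtr.base);
	return 0;
}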

Julien

2013-04-05 22:09:08

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

On 04/05/2013 03:06 PM, Julien Tinnes wrote:
>
> Speaking of IDT, and to capture some off-thread discussion here, we
> should remember that the "SGDT" and "SIDT" instructions aren't
> privileged on x86, so user-land can leak these out without any way for
> the kernel to intercept that.
>
> Adding their own random offsets to these two tables would be very
> useful. This could be done in a later patchset of course.
>

Yes, if the GDT or IDT position is at all correlated to the kernel
position this is pointless.

-hpa

2013-04-05 22:13:40

by Julien Tinnes

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

On Fri, Apr 5, 2013 at 3:08 PM, H. Peter Anvin <[email protected]> wrote:
> On 04/05/2013 03:06 PM, Julien Tinnes wrote:
>>
>> Speaking of IDT, and to capture some off-thread discussion here, we
>> should remember that the "SGDT" and "SIDT" instructions aren't
>> privileged on x86, so user-land can leak these out without any way for
>> the kernel to intercept that.
>>
>> Adding their own random offsets to these two tables would be very
>> useful. This could be done in a later patchset of course.
>>
>
> Yes, if the GDT or IDT position is at all correlated to the kernel
> position this is pointless.

Let's say it's less useful :) Remote attacks and from-inside-a-VM
attacks would still be mitigated.

Julien

2013-04-05 23:18:51

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

On Fri, Apr 5, 2013 at 1:43 PM, Borislav Petkov <[email protected]> wrote:
> On Fri, Apr 05, 2013 at 01:19:39PM -0700, Julien Tinnes wrote:
>> I think it'd be perfectly ok for OOPS to print out the kernel base.
>
> Yeah, ok, this still would need some massaging of the oops output per
> script, but it shouldn't be a big problem.
>
> Also, you probably need to make clear in the oops itself that the
> addresses have been randomized. Or, is the mere presence of kernel base
> going to imply that?

There is already a hook in the patch that prints the offset:

+dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
+{
+ pr_emerg("Kernel Offset: 0x%lx\n",
+ (unsigned long)&_text - __START_KERNEL);
...
+ atomic_notifier_chain_register(&panic_notifier_list,
+ &kernel_offset_notifier);

But of course, this can get improved.

-Kees

--
Kees Cook
Chrome OS Security

2013-04-06 10:10:47

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

On Fri, Apr 05, 2013 at 04:18:49PM -0700, Kees Cook wrote:
> There is already a hook in the patch that prints the offset:
>
> +dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
> +{
> + pr_emerg("Kernel Offset: 0x%lx\n",
> + (unsigned long)&_text - __START_KERNEL);
> ...
> + atomic_notifier_chain_register(&panic_notifier_list,
> + &kernel_offset_notifier);
>
> But of course, this can get improved.

Yeah, this should probably be added to dump_trace(), i.e. something
which walks stack frames and dumps addresses. Because, in the panic
notifier, you're missing all those WARN* callsites, for example.

Also, I wonder whether it wouldn't be too hard to go even a step further
and compute the original, linker vmlinux addresses from the offsets
and dump a stack trace which looks exactly the same as if KASLR is
off. It'll probably need something to say KASLR was on when the trace
happened, though:

[ 790.253365] Call Trace (KASLR):
[ 790.254121] [<ffffffff8110bc90>] ? __smpboot_create_thread+0x180/0x180
[ 790.255428] [<ffffffff810ff1df>] kthread+0xef/0x100
...

so that people who stare at this, know.

Because, in that case, you don't need both the panic notifier or the
userspace script massaging stack trace output anymore.
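
A sketch of what that reverse mapping could look like, reusing the same
offset the panic notifier prints (hypothetical helper, assuming kernel
context where _text and __START_KERNEL are visible):

/* Undo the boot-time randomization for display purposes, so traces
 * line up with the addresses the linker assigned in vmlinux. */
static unsigned long derandomize_address(unsigned long addr)
{
	unsigned long offset = (unsigned long)&_text - __START_KERNEL;

	return addr - offset;
}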

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-08 11:58:29

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR


* H. Peter Anvin <[email protected]> wrote:

> On 04/05/2013 01:04 AM, Ingo Molnar wrote:
> >
> > Random runtime shuffling of the kernel image - is that possible with
> > existing toolchains?
> >
>
> Yes... the question is how much work we'd be willing to go through to make it
> happen.
>
> One approach: the kernel already contains a linker -- used for modules -- and
> the bulk of the kernel could actually be composed to a "pile of modules" that
> gets linked on boot. This would provide very large amounts of randomness.

Is there no code generation / micro-performance disadvantage to that?

Thanks,

Ingo

2013-04-08 12:13:42

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR


* Borislav Petkov <[email protected]> wrote:

> On Fri, Apr 05, 2013 at 01:19:39PM -0700, Julien Tinnes wrote:
> > I think it'd be perfectly ok for OOPS to print out the kernel base.
>
> Yeah, ok, this still would need some massaging of the oops output per
> script, but it shouldn't be a big problem.

Doesn't kallsyms decoding make it all useful in most cases?

(I assume that this patch-set still keeps kallsyms working.)

Thanks,

Ingo

2013-04-08 14:59:09

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: kernel base offset ASLR

Not if we do it right, but there is a huge potential boot time penalty.

Ingo Molnar <[email protected]> wrote:

>
>* H. Peter Anvin <[email protected]> wrote:
>
>> On 04/05/2013 01:04 AM, Ingo Molnar wrote:
>> >
>> > Random runtime shuffling of the kernel image - is that possible
>with
>> > existing toolchains?
>> >
>>
>> Yes... the question is how much work we'd be willing to go through to
>make it
>> happen.
>>
>> One approach: the kernel already contains a linker -- used for
>modules -- and
>> the bulk of the kernel could actually be composed to a "pile of
>modules" that
>> gets linked on boot. This would provide very large amounts of
>randomness.
>
>Is there no code generation / micro-performance disadvantage to that?
>
>Thanks,
>
> Ingo

--
Sent from my mobile phone. Please excuse brevity and lack of formatting.

2013-04-11 20:53:55

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 0/3] kernel ASLR

I am going to set up a tip:x86/kaslr branch, but this patchset really
needs some work before committing it; in particular, Ingo is right that
it needs to be more fine-grained.

-hpa

2013-04-11 21:28:27

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 0/3] kernel ASLR

On Thu, Apr 11, 2013 at 1:52 PM, H. Peter Anvin <[email protected]> wrote:
> I am going to set up a tip:x86/kaslr branch, but this patchset really
> needs some work before committing it, in particular Ingo is right that
> it needs to be more finegrained.

Absolutely. I'm working on this currently. Thanks!

-Kees

--
Kees Cook
Chrome OS Security