Hello! This is an RFC for a proof of concept I've been working on to
improve KASLR by making it finer grained. I hope you will take a look at this
and give me some feedback on the general concept as well as the early
implementation. At this point in time, the code is functional, although
there is plenty of room for optimization and improvement. This patchset applies
to 5.5.0-rc7.
The TL;DR summary is: This patch set rearranges your kernel code at load time
at per-function granularity, with only around a second added to
boot time.
Background
----------
KASLR was merged into the kernel with the objective of increasing the
difficulty of code reuse attacks. Code reuse attacks repurpose existing code
snippets to get around existing memory protections. They exploit software bugs
which expose addresses of useful code snippets, using them to control the flow of
execution for their own nefarious purposes. KASLR moves the entire kernel
code text as a unit at boot time in order to make addresses less predictable.
The order of the code within the segment is unchanged - only the base address
is shifted. There are a few shortcomings to this approach:
1. Low entropy - there are only so many locations the kernel can fit in, so
an attacker could guess the base address without too much trouble.
2. Knowledge of a single address can reveal the offset of the base address,
exposing all other locations for a published/known kernel image.
3. Info leaks abound.
Finer grained ASLR has been proposed as a way to make ASLR more resistant
to info leaks. It is not a new concept at all, and there are many variations
possible.
Proposed Improvement
--------------------
This patch set proposes adding function reordering on top of the existing
KASLR base address randomization. The over-arching objective is incremental
improvement over what we already have, as well as something that can be
merged and deployed with as little disruption to our existing kernel/ecosystem
as possible. It is designed to work with the existing solution, although it
can be used independently (not sure why you would do that though...). The
implementation is really pretty simple, and there are two main areas where
changes occur:
* Build time
GCC has an option to place functions into individual .text sections. We can
use this option to implement function reordering at load time. The final
compiled vmlinux retains all the section headers, which can be used to help
us find the address ranges of each function. Using this information and
an expanded table of relocation addresses, we can shuffle the individual
text sections immediately after decompression. You are probably asking
yourself how this could possibly work given the number of tables of
addresses that exist inside the kernel today for features such as exception
handling and kprobes. Most of these tables generate relocations and
require a simple update, and some tables that have assumptions about order
require sorting after the update. In order to modify these tables, we
preserve a few key symbols from the objcopy symbol stripping process for use
after shuffling the text segments.
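As an aside, the "sorting after the update" step amounts to something like the
minimal userspace sketch below. It is not taken from the patch set; struct
addr_entry, new_address() and fixup_table() are invented names, and the xor in
new_address() is just a stand-in for the real old-to-new address mapping. The
point is that a table whose consumers assume ascending address order must be
rewritten and then re-sorted once functions have moved.

#include <stdio.h>
#include <stdlib.h>

struct addr_entry {
	unsigned long addr;		/* address recorded in the table */
};

/*
 * Stand-in for the real lookup: the actual code would map an old
 * address to its post-shuffle location via the shuffled section list.
 */
static unsigned long new_address(unsigned long old)
{
	return old ^ 0x1000;		/* arbitrary remapping for the demo */
}

static int cmp_entry(const void *a, const void *b)
{
	const struct addr_entry *ea = a, *eb = b;

	if (ea->addr < eb->addr)
		return -1;
	return ea->addr > eb->addr;
}

static void fixup_table(struct addr_entry *tbl, size_t nr)
{
	size_t i;

	/* rewrite each entry to its post-shuffle address */
	for (i = 0; i < nr; i++)
		tbl[i].addr = new_address(tbl[i].addr);

	/* restore the ascending order that consumers of the table expect */
	qsort(tbl, nr, sizeof(*tbl), cmp_entry);
}

int main(void)
{
	struct addr_entry tbl[] = { { 0x1100 }, { 0x2200 }, { 0x3300 } };
	size_t i;

	fixup_table(tbl, sizeof(tbl) / sizeof(tbl[0]));
	for (i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++)
		printf("0x%lx\n", tbl[i].addr);
	return 0;
}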
Some highlights from the build time changes to look for:
The top level kernel Makefile was modified to add the gcc flag if it
is supported. For this RFC, I am applying this flag to everything it is
possible to randomize. Future work could turn off this flag for selected
files or even entire subsystems, although obviously at the cost of security.
The relocs tool is updated to add relative relocations. This information
previously wasn't included because it wasn't necessary when moving the
entire .text segment as a unit.
A new file was created to contain a list of symbols that objcopy should
keep. We use those symbols at load time as described below.
* Load time
The boot kernel was modified to parse the vmlinux ELF file after
decompression to check for the symbols that we preserved, and to
look for any .text.* sections to randomize. We then shuffle the sections
and update any tables that need to be updated or resorted. The existing
code which updated relocation addresses was modified to account for not
just a fixed delta from the load address, but the offset that the function
section was moved to. This requires inspection of each address to see if
it was impacted by a randomization. We use a binary search to make this less
painful for performance.
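To make that lookup concrete, here is a small self-contained sketch of the
idea (illustrative only; struct moved_section and adjust_address() are made-up
names, and the real boot code obviously does not use the C library): each
relocated address is binary-searched against the sorted list of original
section ranges and shifted by that section's displacement.

#include <stdio.h>
#include <stdlib.h>

struct moved_section {
	unsigned long old_start;	/* original start of the section */
	unsigned long old_end;		/* original end of the section */
	long delta;			/* new_start - old_start */
};

static int cmp_addr(const void *key, const void *elem)
{
	unsigned long addr = *(const unsigned long *)key;
	const struct moved_section *s = elem;

	if (addr < s->old_start)
		return -1;
	if (addr >= s->old_end)
		return 1;
	return 0;
}

/* Return the adjusted address, or the original if it was not moved. */
static unsigned long adjust_address(unsigned long addr,
				    const struct moved_section *secs,
				    size_t nr)
{
	const struct moved_section *s;

	s = bsearch(&addr, secs, nr, sizeof(*secs), cmp_addr);
	return s ? addr + s->delta : addr;
}

int main(void)
{
	/* sections sorted by old_start, as the binary search requires */
	struct moved_section secs[] = {
		{ 0x1000, 0x1080, 0x5000 },
		{ 0x1080, 0x1200, -0x40 },
	};

	printf("0x%lx -> 0x%lx\n", 0x1010UL, adjust_address(0x1010, secs, 2));
	printf("0x%lx -> 0x%lx\n", 0x9000UL, adjust_address(0x9000, secs, 2));
	return 0;
}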
In this patch we utilize a pseudo-random number generator in order to allow
for known seeds to be used to create repeatable images across multiple
boots. This is purely for debugging - obviously not recommended for use
in production. This pseudo-random number generator code is very much just
an experiment and not ready for merging yet. We also block access to
/proc/kallsyms for any non-privileged user so that we don't give away our new
layout.
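For reference, the debugging property amounts to something like the toy sketch
below. This is not the PRNG from this series; xorshift64 is used purely as an
example of a seedable generator, and the modulo introduces a small bias a real
implementation would avoid. The point is simply that the same seed always
produces the same permutation.

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

static uint64_t prng_state;

/* xorshift64: deterministic and seedable, NOT cryptographically secure */
static uint64_t prng_next(void)
{
	uint64_t x = prng_state;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	prng_state = x;
	return x;
}

static void shuffle(unsigned int *idx, size_t n, uint64_t seed)
{
	size_t i, j;
	unsigned int tmp;

	prng_state = seed ? seed : 1;	/* xorshift state must not be zero */

	if (n < 2)
		return;
	for (i = n - 1; i > 0; i--) {	/* Fisher-Yates */
		j = prng_next() % (i + 1);
		tmp = idx[i];
		idx[i] = idx[j];
		idx[j] = tmp;
	}
}

int main(void)
{
	unsigned int idx[] = { 0, 1, 2, 3, 4, 5 };
	size_t i;

	shuffle(idx, 6, 42);		/* same seed -> same layout */
	for (i = 0; i < 6; i++)
		printf("%u ", idx[i]);
	printf("\n");
	return 0;
}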
Does this improve security though?
----------------------------------
The objective of this patch set is to improve a technology that is already
merged into the kernel (KASLR). Obviously, this code is not a one-stop
shop for blocking code reuse attacks, but should instead be
considered as one of several tools that can be used. In particular, this
code is meant to make KASLR more effective in the presence of info leaks.
A key point to note is that we are basically accepting that there are
many and various ways to leak addresses today and in the future, and rather
than assume that we can stop them all, we should find a method which makes
individual info leaks less important. Many people claim that standard KASLR
is good enough protection if the attacker does not have access to the host
machine, but for example, CVE-2019-0688 demonstrated that addresses can
be obtained even with remote attacks. So if you define a threat model in which
the kernel has W^X (executable memory is not writable), and ideally XOM
(execute only memory, neither readable nor writable), and does have info leaks,
this patch will make derandomizing the kernel considerably harder. How much
harder will depend on how much entropy we add to the existing entropy of
standard KASLR. There are some variables that determine this. Firstly and most
obviously, the number of functions you randomize matters. This implementation
keeps the existing .text section for code that cannot be randomized - for
example, because it was assembly code, or we opted out of randomization for
performance reasons. The fewer sections there are to randomize, the less
entropy. In addition, due to alignment (16 bytes for x86_64), the number of
bits in an address that the attacker needs to guess is reduced, as the lower
bits are identical. For future work, we could explore randomizing the starting
position of each function and padding with INT3s if we wanted to make the
lower bits unique. Having an XOM solution for the kernel such as the one
for VM guests on X86 platforms that is currently under discussion makes
finer grained randomization more effective against JIT-ROP and other
variations.
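As a rough back-of-the-envelope helper (mine, not part of the patch set):
reordering N sections can contribute at most log2(N!) bits of entropy,
independent of alignment; alignment only affects how many low bits of each
individual function address stay predictable. The small program below
computes that upper bound with lgamma() for an arbitrary section count.

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Upper bound on entropy from reordering: log2(N!) = lgamma(N+1) / ln(2) */
static double permutation_entropy_bits(unsigned long nsections)
{
	return lgamma((double)nsections + 1.0) / log(2.0);
}

int main(int argc, char **argv)
{
	unsigned long n = argc > 1 ? strtoul(argv[1], NULL, 0) : 1000;

	printf("%lu sections: at most ~%.0f bits from reordering\n",
	       n, permutation_entropy_bits(n));
	return 0;			/* link with -lm */
}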
Other solutions
---------------
CFI is another method of mitigating code reuse attacks. CFI attempts to
prevent control flow from being diverted to gadgets (snippets of code
anywhere in the text section) by restricting calls to validated
function entry points.
* Forward Edge CFI
Common forward edge CFI implementations will check the function signature to
make sure that control flow matches the expected function signature. This
reduces the number of calls that will pass a forward edge CFI check to those
that match the original function's signature. Unfortunately, the kernel has
many functions with the same signature, so forward CFI in and of itself still
allows an attacker to potentially change to a different valid function with
the same signature.
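A contrived userspace illustration of that limitation (not kernel code, and
the function names are made up): a signature-only check cannot tell these two
targets apart, so redirecting the indirect call would still pass.

#include <stdio.h>

static int legitimate_handler(int arg)
{
	return arg + 1;
}

static int attacker_chosen(int arg)	/* same signature, different code */
{
	printf("diverted with arg %d\n", arg);
	return 0;
}

int main(void)
{
	int (*handler)(int) = legitimate_handler;

	/* a memory-corruption bug could swap in attacker_chosen here */
	handler = attacker_chosen;
	return handler(42);
}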
In theory, finer grained randomization can be combined with CFI to make this
even harder, so CFI and finer grained KASLR do not need to be
competing solutions. For a kernel with both forward edge CFI and function
level randomization, an attacker would not only have to call a function that
matches the original function's signature, but would also need to take the
extra step of finding the address of the new function they want to call.
In practice, I have not tested this combination, as I used standard kernel
build tools (gcc, not clang) and CFI was not an option.
* Backward edge CFI
Backward edge CFI is not available for X86 at all today, so in the case of
modifying a return address, there is no CFI based solution for today's
X86 hardware running upstream Linux.
Performance Impact
------------------
There are two areas where function reordering can impact performance: boot
time latency, and run time performance.
* Boot time latency
This implementation of finer grained KASLR impacts the boot time of the kernel
in several places. It requires additional parsing of the kernel ELF file to
obtain the section headers of the sections to be randomized. It calls the
random number generator for each section to be randomized to determine that
section's new memory location. It copies the decompressed kernel into a new
area of memory to avoid corruption when laying out the newly randomized
sections. It increases the number of relocations the kernel has to perform at
boot time vs. standard KASLR, and it also requires a lookup on each address
that needs to be relocated to see if it was in a randomized section and needs
to be adjusted by a new offset.
Booting a test VM on a modern, well-appointed system showed an increase in
latency of approximately 1 second.
* Run time
The performance impact at run-time of function reordering varies by workload.
Using kcbench, a kernel compilation benchmark, the performance of a kernel
build with finer grained KASLR was about 1% slower than a kernel with standard
KASLR. Analysis with perf showed a slightly higher percentage of
L1-icache-load-misses. Other workloads were examined as well, with varied
results. Some workloads performed significantly worse under FGKASLR, while
others stayed the same or were mysteriously better. In general, it will
depend on the code flow whether or not finer grained KASLR will impact
your workload, and how the underlying code was designed. Future work could
identify hot areas that may not be randomized and either leave them in the
.text section or group them together into a single section that may be
randomized. If grouping things together helps, one other thing to consider
is that if we could identify text blobs that should be grouped together
to benefit a particular code flow, it could be interesting to explore
whether this security feature could also be used as a performance
feature by optimizing your kernel layout for a
particular workload at boot time. Optimizing function layout for a particular
workload has been researched and proven effective - for more information
read the Facebook paper "Optimizing Function Placement for Large-Scale
Data-Center Applications" (see references section below).
Image Size
----------
Adding additional section headers as a result of compiling with
-ffunction-sections will increase the size of the vmlinux ELF file. In
addition, the vmlinux.bin file generated in arch/x86/boot/compressed by
objcopy grows significantly with the current POC implementation. This is
because the boot heap size must be dramatically increased to support shuffling
the sections and re-sorting kallsyms. With a sample kernel compilation using a
stock Fedora config, bzImage grew about 7.5X when CONFIG_FG_KASLR was enabled,
since the boot heap area is included in the image itself.
It is possible to mitigate this issue by moving the boot heap out of .bss.
Kees Cook has a prototype of this working, and it is included in this
patchset.
Building
--------
To enable fine grained KASLR in the prototype code, you need to have the
following config options set (in addition to all the ones you would use to
build normal KASLR):
CONFIG_FG_KASLR=y
Modules
-------
The same technique can be applied to modules very easily. This patchset
demonstrates how you can randomize modules at load time by shuffling
the sections prior to moving them into memory. The module code has
some TODOs left in it and has been tested less than the kernel code.
References
----------
There are a lot of academic papers which explore finer grained ASLR and
CFI. This paper in particular contributed the most to my implementation design
as well as my overall understanding of the problem space:
Selfrando: Securing the Tor Browser against De-anonymization Exploits,
M. Conti, S. Crane, T. Frassetto, et al.
For more information on how function layout impacts performance, see:
Optimizing Function Placement for Large-Scale Data-Center Applications,
G. Ottoni, B. Maher
Kees Cook (3):
x86/boot: Allow a "silent" kaslr random byte fetch
x86/boot/KASLR: Introduce PRNG for faster shuffling
x86/boot: Move "boot heap" out of .bss
Kristen Carlson Accardi (8):
modpost: Support >64K sections
x86: tools/relocs: Support >64K section headers
x86: Makefile: Add build and config option for CONFIG_FG_KASLR
x86: make sure _etext includes function sections
x86/tools: Adding relative relocs for randomized functions
x86: Add support for finer grained KASLR
kallsyms: hide layout and expose seed
module: Reorder functions
Makefile | 4 +
arch/x86/Kconfig | 13 +
arch/x86/boot/compressed/Makefile | 8 +-
arch/x86/boot/compressed/fgkaslr.c | 751 +++++++++++++++++++++++
arch/x86/boot/compressed/head_32.S | 5 +-
arch/x86/boot/compressed/head_64.S | 7 +-
arch/x86/boot/compressed/kaslr.c | 6 +-
arch/x86/boot/compressed/misc.c | 109 +++-
arch/x86/boot/compressed/misc.h | 26 +
arch/x86/boot/compressed/vmlinux.lds.S | 1 +
arch/x86/boot/compressed/vmlinux.symbols | 15 +
arch/x86/include/asm/boot.h | 15 +-
arch/x86/include/asm/kaslr.h | 4 +-
arch/x86/kernel/vmlinux.lds.S | 18 +-
arch/x86/lib/kaslr.c | 83 ++-
arch/x86/mm/init.c | 2 +-
arch/x86/mm/kaslr.c | 2 +-
arch/x86/tools/relocs.c | 143 ++++-
arch/x86/tools/relocs.h | 4 +-
arch/x86/tools/relocs_common.c | 15 +-
include/asm-generic/vmlinux.lds.h | 2 +-
kernel/kallsyms.c | 30 +-
kernel/module.c | 85 +++
scripts/kallsyms.c | 14 +-
scripts/link-vmlinux.sh | 4 +
scripts/mod/modpost.c | 16 +-
26 files changed, 1312 insertions(+), 70 deletions(-)
create mode 100644 arch/x86/boot/compressed/fgkaslr.c
create mode 100644 arch/x86/boot/compressed/vmlinux.symbols
--
2.24.1
While it is already supported to find the total number of section
headers if we exceed 64K sections, we need to support the
extended symbol table to get section header indexes for symbols
when there are > 64K sections. Parse the ELF file to read
the extended symbol table info, and then replace all direct references
to st_shndx with calls to sym_index(), which will determine whether
we can read the value directly or whether we need to pull it out of
the extended table.
Signed-off-by: Kristen Carlson Accardi <[email protected]>
---
arch/x86/tools/relocs.c | 104 ++++++++++++++++++++++++++++++----------
1 file changed, 78 insertions(+), 26 deletions(-)
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index ce7188cbdae5..a00dc133f109 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -14,6 +14,10 @@
static Elf_Ehdr ehdr;
static unsigned long shnum;
static unsigned int shstrndx;
+static unsigned int shsymtabndx;
+static unsigned int shxsymtabndx;
+
+static int sym_index(Elf_Sym *sym);
struct relocs {
uint32_t *offset;
@@ -32,6 +36,7 @@ struct section {
Elf_Shdr shdr;
struct section *link;
Elf_Sym *symtab;
+ Elf32_Word *xsymtab;
Elf_Rel *reltab;
char *strtab;
};
@@ -265,7 +270,7 @@ static const char *sym_name(const char *sym_strtab, Elf_Sym *sym)
name = sym_strtab + sym->st_name;
}
else {
- name = sec_name(sym->st_shndx);
+ name = sec_name(sym_index(sym));
}
return name;
}
@@ -335,6 +340,23 @@ static uint64_t elf64_to_cpu(uint64_t val)
#define elf_xword_to_cpu(x) elf32_to_cpu(x)
#endif
+static int sym_index(Elf_Sym *sym)
+{
+ Elf_Sym *symtab = secs[shsymtabndx].symtab;
+ Elf32_Word *xsymtab = secs[shxsymtabndx].xsymtab;
+ unsigned long offset;
+ int index;
+
+ if (sym->st_shndx != SHN_XINDEX)
+ return sym->st_shndx;
+
+ /* calculate offset of sym from head of table. */
+ offset = (unsigned long) sym - (unsigned long) symtab;
+ index = offset/sizeof(*sym);
+
+ return elf32_to_cpu(xsymtab[index]);
+}
+
static void read_ehdr(FILE *fp)
{
if (fread(&ehdr, sizeof(ehdr), 1, fp) != 1) {
@@ -468,31 +490,60 @@ static void read_strtabs(FILE *fp)
static void read_symtabs(FILE *fp)
{
int i,j;
+
for (i = 0; i < shnum; i++) {
struct section *sec = &secs[i];
- if (sec->shdr.sh_type != SHT_SYMTAB) {
+ int num_syms;
+
+ switch (sec->shdr.sh_type) {
+ case SHT_SYMTAB_SHNDX:
+ sec->xsymtab = malloc(sec->shdr.sh_size);
+ if (!sec->xsymtab) {
+ die("malloc of %d bytes for xsymtab failed\n",
+ sec->shdr.sh_size);
+ }
+ if (fseek(fp, sec->shdr.sh_offset, SEEK_SET) < 0) {
+ die("Seek to %d failed: %s\n",
+ sec->shdr.sh_offset, strerror(errno));
+ }
+ if (fread(sec->xsymtab, 1, sec->shdr.sh_size, fp)
+ != sec->shdr.sh_size) {
+ die("Cannot read extended symbol table: %s\n",
+ strerror(errno));
+ }
+ shxsymtabndx = i;
+ continue;
+
+ case SHT_SYMTAB:
+ num_syms = sec->shdr.sh_size/sizeof(Elf_Sym);
+
+ sec->symtab = malloc(sec->shdr.sh_size);
+ if (!sec->symtab) {
+ die("malloc of %d bytes for symtab failed\n",
+ sec->shdr.sh_size);
+ }
+ if (fseek(fp, sec->shdr.sh_offset, SEEK_SET) < 0) {
+ die("Seek to %d failed: %s\n",
+ sec->shdr.sh_offset, strerror(errno));
+ }
+ if (fread(sec->symtab, 1, sec->shdr.sh_size, fp)
+ != sec->shdr.sh_size) {
+ die("Cannot read symbol table: %s\n",
+ strerror(errno));
+ }
+ for (j = 0; j < num_syms; j++) {
+ Elf_Sym *sym = &sec->symtab[j];
+
+ sym->st_name = elf_word_to_cpu(sym->st_name);
+ sym->st_value = elf_addr_to_cpu(sym->st_value);
+ sym->st_size = elf_xword_to_cpu(sym->st_size);
+ sym->st_shndx = elf_half_to_cpu(sym->st_shndx);
+ }
+ shsymtabndx = i;
+ continue;
+
+ default:
continue;
- }
- sec->symtab = malloc(sec->shdr.sh_size);
- if (!sec->symtab) {
- die("malloc of %d bytes for symtab failed\n",
- sec->shdr.sh_size);
- }
- if (fseek(fp, sec->shdr.sh_offset, SEEK_SET) < 0) {
- die("Seek to %d failed: %s\n",
- sec->shdr.sh_offset, strerror(errno));
- }
- if (fread(sec->symtab, 1, sec->shdr.sh_size, fp)
- != sec->shdr.sh_size) {
- die("Cannot read symbol table: %s\n",
- strerror(errno));
- }
- for (j = 0; j < sec->shdr.sh_size/sizeof(Elf_Sym); j++) {
- Elf_Sym *sym = &sec->symtab[j];
- sym->st_name = elf_word_to_cpu(sym->st_name);
- sym->st_value = elf_addr_to_cpu(sym->st_value);
- sym->st_size = elf_xword_to_cpu(sym->st_size);
- sym->st_shndx = elf_half_to_cpu(sym->st_shndx);
}
}
}
@@ -759,13 +810,14 @@ static void percpu_init(void)
*/
static int is_percpu_sym(ElfW(Sym) *sym, const char *symname)
{
- return (sym->st_shndx == per_cpu_shndx) &&
+ int shndx = sym_index(sym);
+
+ return (shndx == per_cpu_shndx) &&
strcmp(symname, "__init_begin") &&
strcmp(symname, "__per_cpu_load") &&
strncmp(symname, "init_per_cpu_", 13);
}
-
static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
const char *symname)
{
@@ -1088,7 +1140,7 @@ static int do_reloc_info(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
sec_name(sec->shdr.sh_info),
rel_type(ELF_R_TYPE(rel->r_info)),
symname,
- sec_name(sym->st_shndx));
+ sec_name(sym_index(sym)));
return 0;
}
--
2.24.1
We will be using -ffunction-sections to place each function in
its own text section so it can be randomized at load time. The
linker considers these .text.* sections "orphaned sections", and
will place them after the first similar section (.text). However,
we need to move _etext so that it is after both .text and .text.*.
We also need to calculate the text size to include .text AND .text.*.
Signed-off-by: Kristen Carlson Accardi <[email protected]>
---
arch/x86/kernel/vmlinux.lds.S | 18 +++++++++++++++++-
include/asm-generic/vmlinux.lds.h | 2 +-
2 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 3a1a819da137..e54e9ac5b429 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -146,8 +146,24 @@ SECTIONS
#endif
} :text =0xcccc
- /* End of text section, which should occupy whole number of pages */
+#ifdef CONFIG_FG_KASLR
+ /*
+ * -ffunction-sections creates .text.* sections, which are considered
+ * "orphan sections" and added after the first similar section (.text).
+ * Adding this ALIGN statement causes the address of _etext
+ * to be below that of all the .text.* orphaned sections
+ */
+ . = ALIGN(PAGE_SIZE);
+#endif
_etext = .;
+
+ /*
+ * the size of the .text section is used to calculate the address
+ * range for orc lookups. If we just use SIZEOF(.text), we will
+ * miss all the .text.* sections. Calculate the size using _etext
+ * and _stext and save the value for later.
+ */
+ text_size = _etext - _stext;
. = ALIGN(PAGE_SIZE);
X86_ALIGN_RODATA_BEGIN
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index e00f41aa8ec4..edf19f4296e2 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -798,7 +798,7 @@
. = ALIGN(4); \
.orc_lookup : AT(ADDR(.orc_lookup) - LOAD_OFFSET) { \
orc_lookup = .; \
- . += (((SIZEOF(.text) + LOOKUP_BLOCK_SIZE - 1) / \
+ . += (((text_size + LOOKUP_BLOCK_SIZE - 1) / \
LOOKUP_BLOCK_SIZE) + 1) * 4; \
orc_lookup_end = .; \
}
--
2.24.1
Allow the user to select CONFIG_FG_KASLR if dependencies are met. Change
the Makefile to build with -ffunction-sections if CONFIG_FG_KASLR is set.
Signed-off-by: Kristen Carlson Accardi <[email protected]>
---
Makefile | 4 ++++
arch/x86/Kconfig | 13 +++++++++++++
2 files changed, 17 insertions(+)
diff --git a/Makefile b/Makefile
index c50ef91f6136..41438a921666 100644
--- a/Makefile
+++ b/Makefile
@@ -846,6 +846,10 @@ ifdef CONFIG_LIVEPATCH
KBUILD_CFLAGS += $(call cc-option, -flive-patching=inline-clone)
endif
+ifdef CONFIG_FG_KASLR
+KBUILD_CFLAGS += -ffunction-sections
+endif
+
# arch Makefile may override CC so keep this after arch Makefile is included
NOSTDINC_FLAGS += -nostdinc -isystem $(shell $(CC) -print-file-name=include)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5e8949953660..b4b1b17076a8 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2208,6 +2208,19 @@ config RANDOMIZE_BASE
If unsure, say Y.
+config FG_KASLR
+ bool "Finer grained Kernel Address Space Layout Randomization"
+ depends on $(cc-option, -ffunction-sections)
+ depends on RANDOMIZE_BASE && X86_64
+ help
+ This option improves the randomness of the kernel text
+ over basic Kernel Address Space Layout Randomization (KASLR)
+ by reordering the kernel text at boot time. This feature
+ uses information generated at compile time to re-layout the
+ kernel text section at boot time at function level granularity.
+
+ If unsure, say N.
+
# Relocation on x86 needs some additional build support
config X86_NEED_RELOCS
def_bool y
--
2.24.1
From: Kees Cook <[email protected]>
Under earlyprintk, each RNG call produces a debug report line. When
shuffling hundreds of functions, this is not useful information (each
line is identical and tells us nothing new). Instead, allow for a NULL
"purpose" to suppress the debug reporting.
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Kristen Carlson Accardi <[email protected]>
---
arch/x86/lib/kaslr.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/arch/x86/lib/kaslr.c b/arch/x86/lib/kaslr.c
index a53665116458..2b3eb8c948a3 100644
--- a/arch/x86/lib/kaslr.c
+++ b/arch/x86/lib/kaslr.c
@@ -56,11 +56,14 @@ unsigned long kaslr_get_random_long(const char *purpose)
unsigned long raw, random = get_boot_seed();
bool use_i8254 = true;
- debug_putstr(purpose);
- debug_putstr(" KASLR using");
+ if (purpose) {
+ debug_putstr(purpose);
+ debug_putstr(" KASLR using");
+ }
if (has_cpuflag(X86_FEATURE_RDRAND)) {
- debug_putstr(" RDRAND");
+ if (purpose)
+ debug_putstr(" RDRAND");
if (rdrand_long(&raw)) {
random ^= raw;
use_i8254 = false;
@@ -68,7 +71,8 @@ unsigned long kaslr_get_random_long(const char *purpose)
}
if (has_cpuflag(X86_FEATURE_TSC)) {
- debug_putstr(" RDTSC");
+ if (purpose)
+ debug_putstr(" RDTSC");
raw = rdtsc();
random ^= raw;
@@ -76,7 +80,8 @@ unsigned long kaslr_get_random_long(const char *purpose)
}
if (use_i8254) {
- debug_putstr(" i8254");
+ if (purpose)
+ debug_putstr(" i8254");
random ^= i8254();
}
@@ -86,7 +91,8 @@ unsigned long kaslr_get_random_long(const char *purpose)
: "a" (random), "rm" (mix_const));
random += raw;
- debug_putstr("...\n");
+ if (purpose)
+ debug_putstr("...\n");
return random;
}
--
2.24.1
If a module has functions split out into separate text sections
(i.e. compiled with the -ffunction-sections flag), reorder the
functions to provide some code diversification to modules.
Signed-off-by: Kristen Carlson Accardi <[email protected]>
---
kernel/module.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 85 insertions(+)
diff --git a/kernel/module.c b/kernel/module.c
index b56f3224b161..231563e95e61 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -53,6 +53,8 @@
#include <linux/bsearch.h>
#include <linux/dynamic_debug.h>
#include <linux/audit.h>
+#include <linux/random.h>
+#include <asm/setup.h>
#include <uapi/linux/module.h>
#include "module-internal.h"
@@ -3245,6 +3247,87 @@ static int find_module_sections(struct module *mod, struct load_info *info)
return 0;
}
+/*
+ * shuffle_text_list()
+ * Use a Fisher Yates algorithm to shuffle a list of text sections.
+ */
+static void shuffle_text_list(Elf_Shdr **list, int size)
+{
+ int i;
+ unsigned int j;
+ Elf_Shdr *temp;
+
+ for (i = size - 1; i > 0; i--) {
+ /*
+ * TBD - seed. We need to be able to use a known
+ * seed so that we can non-randomly randomize for
+ * debugging.
+ */
+
+ // pick a random index from 0 to i
+ get_random_bytes(&j, sizeof(j));
+ j = j % (i + 1);
+
+ temp = list[i];
+ list[i] = list[j];
+ list[j] = temp;
+ }
+}
+
+/*
+ * randomize_text()
+ * Look through the core section looking for executable code sections.
+ * Store sections in an array and then shuffle the sections
+ * to reorder the functions.
+ */
+static void randomize_text(struct module *mod, struct load_info *info)
+{
+ int i;
+ int num_text_sections = 0;
+ Elf_Shdr **text_list;
+ int size = 0;
+ int max_sections = info->hdr->e_shnum;
+ unsigned int sec = find_sec(info, ".text");
+
+ if (!IS_ENABLED(CONFIG_FG_KASLR) || !kaslr_enabled())
+ return;
+
+ if (sec == 0)
+ return;
+
+ text_list = kmalloc_array(max_sections, sizeof(*text_list), GFP_KERNEL);
+ if (text_list == NULL)
+ return;
+
+ for (i = 0; i < max_sections; i++) {
+ Elf_Shdr *shdr = &info->sechdrs[i];
+ const char *sname = info->secstrings + shdr->sh_name;
+
+ if (!(shdr->sh_flags & SHF_ALLOC) ||
+ !(shdr->sh_flags & SHF_EXECINSTR) ||
+ strstarts(sname, ".init"))
+ continue;
+
+ text_list[num_text_sections] = shdr;
+ num_text_sections++;
+ }
+
+ shuffle_text_list(text_list, num_text_sections);
+
+ for (i = 0; i < num_text_sections; i++) {
+ Elf_Shdr *shdr = text_list[i];
+ unsigned int infosec;
+ const char *sname;
+
+ sname = info->secstrings + shdr->sh_name;
+ infosec = shdr->sh_info;
+
+ shdr->sh_entsize = get_offset(mod, &size, shdr, infosec);
+ }
+
+ kfree(text_list);
+}
+
static int move_module(struct module *mod, struct load_info *info)
{
int i;
@@ -3282,6 +3365,8 @@ static int move_module(struct module *mod, struct load_info *info)
} else
mod->init_layout.base = NULL;
+ randomize_text(mod, info);
+
/* Transfer each section which specifies SHF_ALLOC */
pr_debug("final section addresses:\n");
for (i = 0; i < info->hdr->e_shnum; i++) {
--
2.24.1
> On Feb 5, 2020, at 2:39 PM, Kristen Carlson Accardi <[email protected]> wrote:
>
> From: Kees Cook <[email protected]>
>
> Under earlyprintk, each RNG call produces a debug report line. When
> shuffling hundreds of functions, this is not useful information (each
> line is identical and tells us nothing new). Instead, allow for a NULL
> "purpose" to suppress the debug reporting.
Have you counted how many RDRAND calls this causes? RDRAND is exceedingly slow on all CPUs I’ve looked at. The whole “RDRAND has great bandwidth” marketing BS actually means that it has decent bandwidth if all CPUs hammer it at the same time. The latency is abysmal. I have asked Intel to improve this, but the latency of that request will be quadrillions of cycles :)
It wouldn’t shock me if just the RDRAND calls account for a respectable fraction of total time. The RDTSC fallback, on the other hand, may be so predictable as to be useless.
I would suggest adding a little ChaCha20 DRBG or similar to the KASLR environment instead. What crypto primitives are available there?
On Wed, Feb 05, 2020 at 02:39:44PM -0800, Kristen Carlson Accardi wrote:
> Allow user to select CONFIG_FG_KASLR if dependencies are met. Change
> the make file to build with -ffunction-sections if CONFIG_FG_KASLR
>
> Signed-off-by: Kristen Carlson Accardi <[email protected]>
> ---
> Makefile | 4 ++++
> arch/x86/Kconfig | 13 +++++++++++++
> 2 files changed, 17 insertions(+)
>
> diff --git a/Makefile b/Makefile
> index c50ef91f6136..41438a921666 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -846,6 +846,10 @@ ifdef CONFIG_LIVEPATCH
> KBUILD_CFLAGS += $(call cc-option, -flive-patching=inline-clone)
> endif
>
> +ifdef CONFIG_FG_KASLR
> +KBUILD_CFLAGS += -ffunction-sections
> +endif
The GCC manual has:
-ffunction-sections
-fdata-sections
Place each function or data item into its own section in the output
file if the target supports arbitrary sections. The name of the
function or the name of the data item determines the section’s name
in the output file.
Use these options on systems where the linker can perform
optimizations to improve locality of reference in the instruction
space. Most systems using the ELF object format have linkers with
such optimizations. On AIX, the linker rearranges sections (CSECTs)
based on the call graph. The performance impact varies.
Together with a linker garbage collection (linker --gc-sections
option) these options may lead to smaller statically-linked
executables (after stripping).
On ELF/DWARF systems these options do not degenerate the quality of
the debug information. There could be issues with other object
files/debug info formats.
Only use these options when there are significant benefits from
doing so. When you specify these options, the assembler and linker
create larger object and executable files and are also slower. These
options affect code generation. They prevent optimizations by the
compiler and assembler using relative locations inside a translation
unit since the locations are unknown until link time. An example of
such an optimization is relaxing calls to short call instructions.
In particular:
"They prevent optimizations by the compiler and assembler using
relative locations inside a translation unit since the locations are
unknown until link time."
I suppose in practise this only means tail-calls are affected and will
no longer use JMP.d8. Or are more things affected?
(Also, should not the next patch come before this one?)
On Thu, Feb 06, 2020 at 11:30:55AM +0100, Peter Zijlstra wrote:
> On Wed, Feb 05, 2020 at 02:39:44PM -0800, Kristen Carlson Accardi wrote:
> > Allow user to select CONFIG_FG_KASLR if dependencies are met. Change
> > the make file to build with -ffunction-sections if CONFIG_FG_KASLR
> >
> > Signed-off-by: Kristen Carlson Accardi <[email protected]>
> > ---
> > Makefile | 4 ++++
> > arch/x86/Kconfig | 13 +++++++++++++
> > 2 files changed, 17 insertions(+)
> >
> > diff --git a/Makefile b/Makefile
> > index c50ef91f6136..41438a921666 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -846,6 +846,10 @@ ifdef CONFIG_LIVEPATCH
> > KBUILD_CFLAGS += $(call cc-option, -flive-patching=inline-clone)
> > endif
> >
> > +ifdef CONFIG_FG_KASLR
> > +KBUILD_CFLAGS += -ffunction-sections
> > +endif
> [...]
> In particular:
>
> "They prevent optimizations by the compiler and assembler using
> relative locations inside a translation unit since the locations are
> unknown until link time."
I think this is mainly a feature of this flag, since it's those relocations
that are used to do the post-shuffle fixups. But yes, I would imagine
this has some negative impact on code generation.
> I suppose in practise this only means tail-calls are affected and will
> no longer use JMP.d8. Or are more things affected?
It's worth looking at. I'm also curious to see how this will interact
with Link Time Optimization.
--
Kees Cook
On Wed, Feb 05, 2020 at 05:08:55PM -0800, Andy Lutomirski wrote:
>
>
> > On Feb 5, 2020, at 2:39 PM, Kristen Carlson Accardi <[email protected]> wrote:
> >
> > From: Kees Cook <[email protected]>
> >
> > Under earlyprintk, each RNG call produces a debug report line. When
> > shuffling hundreds of functions, this is not useful information (each
> > line is identical and tells us nothing new). Instead, allow for a NULL
> > "purpose" to suppress the debug reporting.
>
> Have you counted how many RDRAND calls this causes? RDRAND is
> exceedingly slow on all CPUs I’ve looked at. The whole “RDRAND
> has great bandwidth” marketing BS actually means that it has decent
> bandwidth if all CPUs hammer it at the same time. The latency is abysmal.
> I have asked Intel to improve this, but the latency of that request will
> be quadrillions of cycles :)
In an earlier version of this series, it was called once per function
section (so, about 50,000 times). The (lack of) speed was quite
measurable.
> I would suggest adding a little ChaCha20 DRBG or similar to the KASLR
> environment instead. What crypto primitives are available there?
Agreed. The simple PRNG in the next patch was mostly just a POC initially,
but Kristen kept it due to its debugging properties (specifying an
external seed). Pulling in ChaCha20 seems like a good approach.
--
Kees Cook
On Wed, Feb 05, 2020 at 02:39:49PM -0800, Kristen Carlson Accardi wrote:
> If a module has functions split out into separate text sections
> (i.e. compiled with the -ffunction-sections flag), reorder the
> functions to provide some code diversification to modules.
>
> Signed-off-by: Kristen Carlson Accardi <[email protected]>
Probably a good idea to add Jessica to CC in next version:
Jessica Yu <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
-Kees
> ---
> kernel/module.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 85 insertions(+)
>
> diff --git a/kernel/module.c b/kernel/module.c
> index b56f3224b161..231563e95e61 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -53,6 +53,8 @@
> #include <linux/bsearch.h>
> #include <linux/dynamic_debug.h>
> #include <linux/audit.h>
> +#include <linux/random.h>
> +#include <asm/setup.h>
> #include <uapi/linux/module.h>
> #include "module-internal.h"
>
> @@ -3245,6 +3247,87 @@ static int find_module_sections(struct module *mod, struct load_info *info)
> return 0;
> }
>
> +/*
> + * shuffle_text_list()
> + * Use a Fisher Yates algorithm to shuffle a list of text sections.
> + */
> +static void shuffle_text_list(Elf_Shdr **list, int size)
> +{
> + int i;
> + unsigned int j;
> + Elf_Shdr *temp;
> +
> + for (i = size - 1; i > 0; i--) {
> + /*
> + * TBD - seed. We need to be able to use a known
> + * seed so that we can non-randomly randomize for
> + * debugging.
> + */
> +
> + // pick a random index from 0 to i
> + get_random_bytes(&j, sizeof(j));
> + j = j % (i + 1);
> +
> + temp = list[i];
> + list[i] = list[j];
> + list[j] = temp;
> + }
> +}
> +
> +/*
> + * randomize_text()
> + * Look through the core section looking for executable code sections.
> + * Store sections in an array and then shuffle the sections
> + * to reorder the functions.
> + */
> +static void randomize_text(struct module *mod, struct load_info *info)
> +{
> + int i;
> + int num_text_sections = 0;
> + Elf_Shdr **text_list;
> + int size = 0;
> + int max_sections = info->hdr->e_shnum;
> + unsigned int sec = find_sec(info, ".text");
> +
> + if (!IS_ENABLED(CONFIG_FG_KASLR) || !kaslr_enabled())
> + return;
> +
> + if (sec == 0)
> + return;
> +
> + text_list = kmalloc_array(max_sections, sizeof(*text_list), GFP_KERNEL);
> + if (text_list == NULL)
> + return;
> +
> + for (i = 0; i < max_sections; i++) {
> + Elf_Shdr *shdr = &info->sechdrs[i];
> + const char *sname = info->secstrings + shdr->sh_name;
> +
> + if (!(shdr->sh_flags & SHF_ALLOC) ||
> + !(shdr->sh_flags & SHF_EXECINSTR) ||
> + strstarts(sname, ".init"))
> + continue;
> +
> + text_list[num_text_sections] = shdr;
> + num_text_sections++;
> + }
> +
> + shuffle_text_list(text_list, num_text_sections);
> +
> + for (i = 0; i < num_text_sections; i++) {
> + Elf_Shdr *shdr = text_list[i];
> + unsigned int infosec;
> + const char *sname;
> +
> + sname = info->secstrings + shdr->sh_name;
> + infosec = shdr->sh_info;
> +
> + shdr->sh_entsize = get_offset(mod, &size, shdr, infosec);
> + }
> +
> + kfree(text_list);
> +}
> +
> static int move_module(struct module *mod, struct load_info *info)
> {
> int i;
> @@ -3282,6 +3365,8 @@ static int move_module(struct module *mod, struct load_info *info)
> } else
> mod->init_layout.base = NULL;
>
> + randomize_text(mod, info);
> +
> /* Transfer each section which specifies SHF_ALLOC */
> pr_debug("final section addresses:\n");
> for (i = 0; i < info->hdr->e_shnum; i++) {
> --
> 2.24.1
>
--
Kees Cook
On Wed, Feb 05, 2020 at 02:39:45PM -0800, Kristen Carlson Accardi wrote:
> We will be using -ffunction-sections to place each function in
> it's own text section so it can be randomized at load time. The
> linker considers these .text.* sections "orphaned sections", and
> will place them after the first similar section (.text). However,
> we need to move _etext so that it is after both .text and .text.*
> We also need to calculate text size to include .text AND .text.*
The dependency on the linker's orphan section handling is, I feel,
rather fragile (during work on CFI and generally building kernels with
Clang's LLD linker, we keep tripping over differences between how BFD and
LLD handle orphans). However, there is currently no way to perform a
section "pass through" where input sections retain their name as an
output section. (If anyone knows a way to do this, I'm all ears).
Right now, you can only collect sections like this:
.text : AT(ADDR(.text) - LOAD_OFFSET) {
*(.text.*)
}
or let them be orphans, which then the linker attempts to find a
"similar" (code, data, etc) section to put them near:
https://sourceware.org/binutils/docs-2.33.1/ld/Orphan-Sections.html
So, basically, yes, this works, but I'd like to see BFD and LLD grow
some kind of /PASSTHRU/ special section (like /DISCARD/), that would let
a linker script specify _where_ these sections should roughly live.
Related thoughts:
I know x86_64 stack alignment is 16 bytes. I cannot find evidence for
what function start alignment should be. It seems the linker is 16 byte
aligning these functions, when I think no alignment is needed for
function starts, so we're wasting some memory (average 8 bytes per
function, at say 50,000 functions, so approaching 512KB) between
functions. If we can specify a 1 byte alignment for these orphan
sections, that would be nice, as mentioned in the cover letter: we lose
4 bits of entropy to this alignment, since all randomized function
addresses will have their low bits set to zero.
And if we can't adjust function section alignment, or there is some
benefit to a larger alignment, I would like to have a way for the linker
to specify the inter-section padding (or fill) bytes. Right now, the
FILL(0xnn) (or =0xnn) can be used to specify the padding bytes _within_
a section, but not between sections. Right now, BFD appears to 0-pad. I'd
like that to be 0xCC so "guessing" addresses incorrectly will trigger
a trap.
-Kees
>
> Signed-off-by: Kristen Carlson Accardi <[email protected]>
> ---
> arch/x86/kernel/vmlinux.lds.S | 18 +++++++++++++++++-
> include/asm-generic/vmlinux.lds.h | 2 +-
> 2 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index 3a1a819da137..e54e9ac5b429 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -146,8 +146,24 @@ SECTIONS
> #endif
> } :text =0xcccc
>
> - /* End of text section, which should occupy whole number of pages */
> +#ifdef CONFIG_FG_KASLR
> + /*
> + * -ffunction-sections creates .text.* sections, which are considered
> + * "orphan sections" and added after the first similar section (.text).
> + * Adding this ALIGN statement causes the address of _etext
> + * to be below that of all the .text.* orphaned sections
> + */
> + . = ALIGN(PAGE_SIZE);
> +#endif
> _etext = .;
> +
> + /*
> + * the size of the .text section is used to calculate the address
> + * range for orc lookups. If we just use SIZEOF(.text), we will
> + * miss all the .text.* sections. Calculate the size using _etext
> + * and _stext and save the value for later.
> + */
> + text_size = _etext - _stext;
> . = ALIGN(PAGE_SIZE);
>
> X86_ALIGN_RODATA_BEGIN
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index e00f41aa8ec4..edf19f4296e2 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -798,7 +798,7 @@
> . = ALIGN(4); \
> .orc_lookup : AT(ADDR(.orc_lookup) - LOAD_OFFSET) { \
> orc_lookup = .; \
> - . += (((SIZEOF(.text) + LOOKUP_BLOCK_SIZE - 1) / \
> + . += (((text_size + LOOKUP_BLOCK_SIZE - 1) / \
> LOOKUP_BLOCK_SIZE) + 1) * 4; \
> orc_lookup_end = .; \
> }
> --
> 2.24.1
>
--
Kees Cook
On Thu, Feb 6, 2020 at 1:26 PM Kees Cook <[email protected]> wrote:
> I know x86_64 stack alignment is 16 bytes.
That's true for the standard sysv ABI that is used in userspace; but
the kernel uses a custom ABI with 8-byte stack alignment. See
arch/x86/Makefile:
# For gcc stack alignment is specified with -mpreferred-stack-boundary,
# clang has the option -mstack-alignment for that purpose.
ifneq ($(call cc-option, -mpreferred-stack-boundary=4),)
cc_stack_align4 := -mpreferred-stack-boundary=2
cc_stack_align8 := -mpreferred-stack-boundary=3
else ifneq ($(call cc-option, -mstack-alignment=16),)
cc_stack_align4 := -mstack-alignment=4
cc_stack_align8 := -mstack-alignment=8
endif
[...]
# By default gcc and clang use a stack alignment of 16 bytes for x86.
# However the standard kernel entry on x86-64 leaves the stack on an
# 8-byte boundary. If the compiler isn't informed about the actual
# alignment it will generate extra alignment instructions for the
# default alignment which keep the stack *mis*aligned.
# Furthermore an alignment to the register width reduces stack usage
# and the number of alignment instructions.
KBUILD_CFLAGS += $(call cc-option,$(cc_stack_align8))
> I cannot find evidence for
> what function start alignment should be.
There is no architecturally required alignment for functions, but
Intel's Optimization Manual
(<https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf>)
recommends in section 3.4.1.5, "Code Alignment":
| Assembly/Compiler Coding Rule 12. (M impact, H generality)
| All branch targets should be 16-byte aligned.
AFAIK this is recommended because, as documented in section 2.3.2.1,
"Legacy Decode Pipeline" (describing the frontend of Sandy Bridge, and
used as the base for newer microarchitectures):
| An instruction fetch is a 16-byte aligned lookup through the ITLB
and into the instruction cache.
| The instruction cache can deliver every cycle 16 bytes to the
instruction pre-decoder.
AFAIK this means that if a branch ends close to the end of a 16-byte
block, the frontend is less efficient because it may have to run two
instruction fetches before the first instruction can even be decoded.
On Thu, Feb 06, 2020 at 04:26:23AM -0800, Kees Cook wrote:
> I know x86_64 stack alignment is 16 bytes. I cannot find evidence for
> what function start alignment should be. It seems the linker is 16 byte
> aligning these functions, when I think no alignment is needed for
> function starts, so we're wasting some memory (average 8 bytes per
> function, at say 50,000 functions, so approaching 512KB) between
> functions. If we can specify a 1 byte alignment for these orphan
> sections, that would be nice, as mentioned in the cover letter: we lose
> a 4 bits of entropy to this alignment, since all randomized function
> addresses will have their low bits set to zero.
>
The default function alignment is 16-bytes for x64 at least with gcc.
You can use -falign-functions to specify a different alignment.
There was some old discussion on reducing it [1] but it doesn't seem to
have been merged.
[1] https://lore.kernel.org/lkml/[email protected]/
On Wed, Feb 05, 2020 at 02:39:41PM -0800, Kristen Carlson Accardi wrote:
> While it is already supported to find the total number of section
> headers if we exceed 64K sections, we need to support the
> extended symbol table to get section header indexes for symbols
> when there are > 64K sections. Parse the elf file to read
> the extended symbol table info, and then replace all direct references
> to st_shndx with calls to sym_index(), which will determine whether
> we can read the value directly or whether we need to pull it out of
> the extended table.
>
> Signed-off-by: Kristen Carlson Accardi <[email protected]>
Looks good to me.
Reviewed-by: Kees Cook <[email protected]>
-Kees
--
Kees Cook
On Thu, Feb 06, 2020 at 04:26:23AM -0800, Kees Cook wrote:
> On Wed, Feb 05, 2020 at 02:39:45PM -0800, Kristen Carlson Accardi wrote:
> > We will be using -ffunction-sections to place each function in
> > it's own text section so it can be randomized at load time. The
> > linker considers these .text.* sections "orphaned sections", and
> > will place them after the first similar section (.text). However,
> > we need to move _etext so that it is after both .text and .text.*
> > We also need to calculate text size to include .text AND .text.*
>
> The dependency on the linker's orphan section handling is, I feel,
> rather fragile (during work on CFI and generally building kernels with
> Clang's LLD linker, we keep tripping over difference between how BFD and
> LLD handle orphans). However, this is currently no way to perform a
> section "pass through" where input sections retain their name as an
> output section. (If anyone knows a way to do this, I'm all ears).
>
> Right now, you can only collect sections like this:
>
> .text : AT(ADDR(.text) - LOAD_OFFSET) {
> *(.text.*)
> }
>
> or let them be orphans, which then the linker attempts to find a
> "similar" (code, data, etc) section to put them near:
> https://sourceware.org/binutils/docs-2.33.1/ld/Orphan-Sections.html
>
> So, basically, yes, this works, but I'd like to see BFD and LLD grow
> some kind of /PASSTHRU/ special section (like /DISCARD/), that would let
> a linker script specify _where_ these sections should roughly live.
>
You could go through the objects that are being linked and find the
individual text sections, and generate the linker script using that?
On Thu, Feb 06, 2020 at 09:39:43AM -0500, Arvind Sankar wrote:
> On Thu, Feb 06, 2020 at 04:26:23AM -0800, Kees Cook wrote:
> > I know x86_64 stack alignment is 16 bytes. I cannot find evidence for
> > what function start alignment should be. It seems the linker is 16 byte
> > aligning these functions, when I think no alignment is needed for
> > function starts, so we're wasting some memory (average 8 bytes per
> > function, at say 50,000 functions, so approaching 512KB) between
> > functions. If we can specify a 1 byte alignment for these orphan
> > sections, that would be nice, as mentioned in the cover letter: we lose
> > a 4 bits of entropy to this alignment, since all randomized function
> > addresses will have their low bits set to zero.
> >
>
> The default function alignment is 16-bytes for x64 at least with gcc.
> You can use -falign-functions to specify a different alignment.
>
> There was some old discussion on reducing it [1] but it doesn't seem to
> have been merged.
>
> [1] https://lore.kernel.org/lkml/[email protected]/
Though I don't think the entropy loss is real. With 50k functions, you
can use at most log(50k!) = ~35 KiB worth of entropy in permuting them,
no matter what the alignment is. The only way you can get more is if you
have more than 50k slots to put them in.
On Thu, Feb 06, 2020 at 09:57:40AM -0500, Arvind Sankar wrote:
> On Thu, Feb 06, 2020 at 04:26:23AM -0800, Kees Cook wrote:
> > On Wed, Feb 05, 2020 at 02:39:45PM -0800, Kristen Carlson Accardi wrote:
> > > We will be using -ffunction-sections to place each function in
> > > it's own text section so it can be randomized at load time. The
> > > linker considers these .text.* sections "orphaned sections", and
> > > will place them after the first similar section (.text). However,
> > > we need to move _etext so that it is after both .text and .text.*
> > > We also need to calculate text size to include .text AND .text.*
> >
> > The dependency on the linker's orphan section handling is, I feel,
> > rather fragile (during work on CFI and generally building kernels with
> > Clang's LLD linker, we keep tripping over difference between how BFD and
> > LLD handle orphans). However, this is currently no way to perform a
> > section "pass through" where input sections retain their name as an
> > output section. (If anyone knows a way to do this, I'm all ears).
> >
> > Right now, you can only collect sections like this:
> >
> > .text : AT(ADDR(.text) - LOAD_OFFSET) {
> > *(.text.*)
> > }
> >
> > or let them be orphans, which then the linker attempts to find a
> > "similar" (code, data, etc) section to put them near:
> > https://sourceware.org/binutils/docs-2.33.1/ld/Orphan-Sections.html
> >
> > So, basically, yes, this works, but I'd like to see BFD and LLD grow
> > some kind of /PASSTHRU/ special section (like /DISCARD/), that would let
> > a linker script specify _where_ these sections should roughly live.
> >
>
> You could go through the objects that are being linked and find the
> individual text sections, and generate the linker script using that?
Also, one thing to note about the orphan section handling -- by default
ld will combine multiple orphan sections with the same name into a
single output section. So if you have sections corresponding to static
functions with the same name but from different files, they will get
unnecessarily combined. You may want to add --unique to the ld options
to keep them separate. That will create multiple sections with the same
name instead of merging them.
> On Feb 6, 2020, at 7:29 AM, Arvind Sankar <[email protected]> wrote:
>
> On Thu, Feb 06, 2020 at 09:39:43AM -0500, Arvind Sankar wrote:
>>> On Thu, Feb 06, 2020 at 04:26:23AM -0800, Kees Cook wrote:
>>> I know x86_64 stack alignment is 16 bytes. I cannot find evidence for
>>> what function start alignment should be. It seems the linker is 16 byte
>>> aligning these functions, when I think no alignment is needed for
>>> function starts, so we're wasting some memory (average 8 bytes per
>>> function, at say 50,000 functions, so approaching 512KB) between
>>> functions. If we can specify a 1 byte alignment for these orphan
>>> sections, that would be nice, as mentioned in the cover letter: we lose
>>> a 4 bits of entropy to this alignment, since all randomized function
>>> addresses will have their low bits set to zero.
>>>
>>
>> The default function alignment is 16-bytes for x64 at least with gcc.
>> You can use -falign-functions to specify a different alignment.
>>
>> There was some old discussion on reducing it [1] but it doesn't seem to
>> have been merged.
>>
>> [1] https://lore.kernel.org/lkml/[email protected]/
>
> Though I don't think the entropy loss is real. With 50k functions, you
> can use at most log(50k!) = ~35 KiB worth of entropy in permuting them,
> no matter what the alignment is. The only way you can get more is if you
> have more than 50k slots to put them in.
There is a security consideration here that has nothing to do with entropy per se. If an attacker locates two functions, they learn the distance between them. This constrains what can fit in the gap. Padding reduces the strength of this type of attack, as would some degree of random padding.
From: Jann Horn
> Sent: 06 February 2020 13:16
...
> > I cannot find evidence for
> > what function start alignment should be.
>
> There is no architecturally required alignment for functions, but
> Intel's Optimization Manual
> (<https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-
> optimization-manual.pdf>)
> recommends in section 3.4.1.5, "Code Alignment":
>
> | Assembly/Compiler Coding Rule 12. (M impact, H generality)
> | All branch targets should be 16-byte aligned.
>
> AFAIK this is recommended because, as documented in section 2.3.2.1,
> "Legacy Decode Pipeline" (describing the frontend of Sandy Bridge, and
> used as the base for newer microarchitectures):
>
> | An instruction fetch is a 16-byte aligned lookup through the ITLB
> and into the instruction cache.
> | The instruction cache can deliver every cycle 16 bytes to the
> instruction pre-decoder.
>
> AFAIK this means that if a branch ends close to the end of a 16-byte
> block, the frontend is less efficient because it may have to run two
> instruction fetches before the first instruction can even be decoded.
See also The microarchitecture of Intel, AMD and VIA CPUs from http://www.agner.org/optimize
My suspicion is that reducing the cache size (so more code fits in)
will almost always be a win over aligning branch targets and entry points.
If the alignment of a function matters then there are probably other
changes to that bit of code that will give a larger benefit.
David
On Wed, 2020-02-05 at 17:08 -0800, Andy Lutomirski wrote:
> > On Feb 5, 2020, at 2:39 PM, Kristen Carlson Accardi <
> > [email protected]> wrote:
> >
> > From: Kees Cook <[email protected]>
> >
> > Under earlyprintk, each RNG call produces a debug report line. When
> > shuffling hundreds of functions, this is not useful information
> > (each
> > line is identical and tells us nothing new). Instead, allow for a
> > NULL
> > "purpose" to suppress the debug reporting.
>
> Have you counted how many RDRAND calls this causes? RDRAND is
> exceedingly slow on all CPUs I’ve looked at. The whole “RDRAND has
> great bandwidth” marketing BS actually means that it has decent
> bandwidth if all CPUs hammer it at the same time. The latency is
> abysmal. I have asked Intel to improve this, but the latency of that
> request will be quadrillions of cycles :)
>
> It wouldn’t shock me if just the RDRAND calls account for a
> respectable fraction of total time. The RDTSC fallback, on the other
> hand, may be so predictable as to be useless.
I think at the moment the calls to rdrand are really not the largest
contributor to the latency. The relocations are the real bottleneck -
each address must be inspected to see if it is in the list of function
sections that have been randomized, and the value at that address must
also be inspected to see if it's in the list of function sections.
That's a lot of lookups. That said, I tried to measure the difference
between using Kees' prng vs. the rdrand calls and found little to no
measurable difference. I think at this point it's in the noise -
hopefully we will get to a point where this matters more.
>
> I would suggest adding a little ChaCha20 DRBG or similar to the KASLR
> environment instead. What crypto primitives are available there?
I will read up on this.
On Thu, 2020-02-06 at 04:26 -0800, Kees Cook wrote:
> On Wed, Feb 05, 2020 at 02:39:45PM -0800, Kristen Carlson Accardi
> wrote:
> > We will be using -ffunction-sections to place each function in
> > it's own text section so it can be randomized at load time. The
> > linker considers these .text.* sections "orphaned sections", and
> > will place them after the first similar section (.text). However,
> > we need to move _etext so that it is after both .text and .text.*
> > We also need to calculate text size to include .text AND .text.*
>
> The dependency on the linker's orphan section handling is, I feel,
> rather fragile (during work on CFI and generally building kernels
> with
> Clang's LLD linker, we keep tripping over difference between how BFD
> and
> LLD handle orphans). However, this is currently no way to perform a
> section "pass through" where input sections retain their name as an
> output section. (If anyone knows a way to do this, I'm all ears).
>
> Right now, you can only collect sections like this:
>
> .text : AT(ADDR(.text) - LOAD_OFFSET) {
> *(.text.*)
> }
>
> or let them be orphans, in which case the linker attempts to find a
> "similar" (code, data, etc) section to put them near:
> https://sourceware.org/binutils/docs-2.33.1/ld/Orphan-Sections.html
>
> So, basically, yes, this works, but I'd like to see BFD and LLD grow
> some kind of /PASSTHRU/ special section (like /DISCARD/), that would let
> a linker script specify _where_ these sections should roughly live.
>
> Related thoughts:
>
> I know x86_64 stack alignment is 16 bytes. I cannot find evidence for
> what function start alignment should be. It seems the linker is 16-byte
> aligning these functions, when I think no alignment is needed for
> function starts, so we're wasting some memory (average 8 bytes per
> function, at say 50,000 functions, so approaching 512KB) between
> functions. If we can specify a 1 byte alignment for these orphan
> sections, that would be nice, as mentioned in the cover letter: we lose
> 4 bits of entropy to this alignment, since all randomized function
> addresses will have their low bits set to zero.
So, when I was developing this patch set, I initially ignored the value
of sh_addralign and just packed the functions in one right after
another when I did the new layout. They were byte aligned :). I later
realized that I should probably pay attention to alignment and thus
started respecting the value that was in sh_addralign. There is
actually nothing stopping me from ignoring it again, other than I am
concerned that I will make runtime performance suffer even more than I
already have.
>
> And if we can't adjust function section alignment, or there is some
> benefit to a larger alignment, I would like to have a way for the linker
> to specify the inter-section padding (or fill) bytes. Right now, the
> FILL(0xnn) (or =0xnn) can be used to specify the padding bytes _within_
> a section, but not between sections. Right now, BFD appears to 0-pad. I'd
> like that to be 0xCC so "guessing" addresses incorrectly will trigger
> a trap.
Padding the space between functions with int3 is easy to add during
boot time, and I've got it on my todo list.
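A minimal sketch of that fill, assuming the boot-time shuffler copies one
function section at a time and knows the alignment it is honouring; the
helper name is illustrative, not from the patch set.

#include <string.h>

static unsigned long place_with_trap_fill(unsigned char *base,
                                          unsigned long dest,
                                          const unsigned char *src,
                                          unsigned long size,
                                          unsigned long align)
{
        unsigned long aligned = (dest + align - 1) & ~(align - 1);

        /* 0xCC is the one-byte INT3 opcode: a wrong guess that lands in
         * the padding traps instead of sliding into the next function. */
        memset(base + dest, 0xCC, aligned - dest);
        memcpy(base + aligned, src, size);
        return aligned + size;          /* next free offset */
}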
> On Feb 6, 2020, at 11:41 AM, Kristen Carlson Accardi <[email protected]> wrote:
>
> On Thu, 2020-02-06 at 04:26 -0800, Kees Cook wrote:
>> [...]
>
> So, when I was developing this patch set, I initially ignored the value
> of sh_addralign and just packed the functions in one right after
> another when I did the new layout. They were byte aligned :). I later
> realized that I should probably pay attention to alignment and thus
> started respecting the value that was in sh_addralign. There is
> actually nothing stopping me from ignoring it again, other than I am
> concerned that I will make runtime performance suffer even more than I
> already have.
If you start randomizing *data* sections, then alignment matters.
Also, in the shiny new era of Intel-CPUs-can’t-handle-Jcc-spanning-a-cacheline, function alignment may actually matter. Sigh. The symptom will be horrible maybe-exploitable crashes on old microcode and “minimal performance impact” on new microcode. In this context, “minimal” may actually mean “huge, throw away your CPU and replace it with one from a different vendor.”
Of course, there doesn’t appear to be anything resembling credible public documentation for any of this.
On Thu, Feb 06, 2020 at 12:02:36PM -0800, Andy Lutomirski wrote:
> Also, in the shiny new era of
> Intel-CPUs-can’t-handle-Jcc-spanning-a-cacheline, function alignment
> may actually matter.
*groan*, indeed. I just went and looked that up. I missed this one in
all the other fuss :/
So per:
https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf
the toolchain mitigations only work if the offset in the ifetch window
(32 bytes) is preserved. Which seems to suggest we ought to align all
functions to 32 bytes before randomizing, otherwise we're almost
guaranteed to change this offset by the act of randomizing.
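A minimal sketch of that alignment step, assuming the shuffler rounds each
destination up to the ifetch window before placing a section; the names are
illustrative, not from the patch set.

#define IFETCH_WINDOW 32UL

/* Keep the NOP padding the toolchain inserted for the Jcc erratum effective
 * by never placing a randomized function at a smaller alignment. */
static unsigned long align_to_ifetch_window(unsigned long dest)
{
        return (dest + IFETCH_WINDOW - 1) & ~(IFETCH_WINDOW - 1);
}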
On Fri, Feb 07, 2020 at 10:24:23AM +0100, Peter Zijlstra wrote:
> On Thu, Feb 06, 2020 at 12:02:36PM -0800, Andy Lutomirski wrote:
> > Also, in the shiny new era of
> > Intel-CPUs-can’t-handle-Jcc-spanning-a-cacheline, function alignment
> > may actually matter.
>
> *groan*, indeed. I just went and looked that up. I missed this one in
> all the other fuss :/
>
> So per:
>
> https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf
>
> the toolchain mitigations only work if the offset in the ifetch window
> (32 bytes) is preserved. Which seems to suggest we ought to align all
> functions to 32 bytes before randomizing, otherwise we're almost
> guaranteed to change this offset by the act of randomizing.
Wheee! This sounds like it needs to be fixed generally, yes? (And I see
the "FUNCTION_ALIGN" macro is currently 16 bytes...)
--
Kees Cook
On Sun, Feb 09, 2020 at 05:43:40PM -0800, Kees Cook wrote:
> On Fri, Feb 07, 2020 at 10:24:23AM +0100, Peter Zijlstra wrote:
> > On Thu, Feb 06, 2020 at 12:02:36PM -0800, Andy Lutomirski wrote:
> > > Also, in the shiny new era of
> > > Intel-CPUs-can’t-handle-Jcc-spanning-a-cacheline, function alignment
> > > may actually matter.
> >
> > *groan*, indeed. I just went and looked that up. I missed this one in
> > all the other fuss :/
> >
> > So per:
> >
> > https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf
> >
> > the toolchain mitigations only work if the offset in the ifetch window
> > (32 bytes) is preserved. Which seems to suggest we ought to align all
> > functions to 32 bytes before randomizing, otherwise we're almost
> > guaranteed to change this offset by the act of randomizing.
>
> Wheee! This sounds like it needs to be fixed generally, yes? (And I see
> the "FUNCTION_ALIGN" macro is currently 16 bytes...)
It depends a bit on how it all works I suppose (I'm not too clear on the
details).
If the linker appends translation units at (at least) 32-byte alignment,
but the function alignment inside the translation unit is smaller, then it
could still work, because the assembler (which is going to insert NOPs to
avoid instructions being in the 'wrong' place) can still know the offset.
If the linker is going to be fancy (say LTO) and move code around inside
sections/translation units, then this goes out the window obviously.
The same goes for this fine-grained randomization: if the section alignment
is smaller than 32 bytes, the offset is going to change and the
mitigation will be nullified.
I'll leave it to others to figure out the exact details. But afaict it
should be possible to have fine-grained-randomization and preserve the
workaround in the end.
>
> I'll leave it to others to figure out the exact details. But afaict it
> should be possible to have fine-grained-randomization and preserve the
> workaround in the end.
>
The most obvious "solution" is to compile with an alignment of 4 bytes (so
tight packing) and then have the randomizer preserve the offset within 32
bytes, no matter what it is.
That would get you an average padding of 16 bytes, which is a bit more than
now but not too insane (cue Kees' argument that tiny bits of padding are
actually good).
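A minimal sketch of that padding rule, assuming the randomizer still knows
each section's link-time address; illustrative only, not from the patch set.

/* Pad forward 0..31 bytes (15.5 on average) so the function keeps the same
 * offset modulo 32 that the toolchain gave it, rather than forcing full
 * 32-byte alignment. */
static unsigned long preserve_ifetch_offset(unsigned long dest,
                                            unsigned long link_addr)
{
        return dest + ((link_addr - dest) & 31UL);
}

The gap this leaves can then be filled with int3 just like ordinary
alignment padding.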
On Mon, Feb 10, 2020 at 07:54:58AM -0800, Arjan van de Ven wrote:
> >
> > I'll leave it to others to figure out the exact details. But afaict it
> > should be possible to have fine-grained-randomization and preserve the
> > workaround in the end.
> >
>
> the most obvious "solution" is to compile with an alignment of 4 bytes (so tight packing)
> and then in the randomizer preserve the offset within 32 bytes, no matter what it is
>
> that would get you an average padding of 16 bytes which is a bit more than now but not too insane
> (queue Kees' argument that tiny bits of padding are actually good)
>
With the patchset for adding the mbranches-within-32B-boundaries option,
the section alignment gets forced to 32. With function-sections that
means function alignment has to be 32 too.
+++ Kees Cook [06/02/20 04:41 -0800]:
>On Wed, Feb 05, 2020 at 02:39:49PM -0800, Kristen Carlson Accardi wrote:
>> If a module has functions split out into separate text sections
>> (i.e. compiled with the -ffunction-sections flag), reorder the
>> functions to provide some code diversification to modules.
>>
>> Signed-off-by: Kristen Carlson Accardi <[email protected]>
>
>Probably a good idea to add Jessica to CC in next version:
> Jessica Yu <[email protected]>
Thanks :)
>Reviewed-by: Kees Cook <[email protected]>
>
>-Kees
>
>> ---
>> kernel/module.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 85 insertions(+)
>>
>> diff --git a/kernel/module.c b/kernel/module.c
>> index b56f3224b161..231563e95e61 100644
>> --- a/kernel/module.c
>> +++ b/kernel/module.c
>> @@ -53,6 +53,8 @@
>> #include <linux/bsearch.h>
>> #include <linux/dynamic_debug.h>
>> #include <linux/audit.h>
>> +#include <linux/random.h>
>> +#include <asm/setup.h>
>> #include <uapi/linux/module.h>
>> #include "module-internal.h"
>>
>> @@ -3245,6 +3247,87 @@ static int find_module_sections(struct module *mod, struct load_info *info)
>> return 0;
>> }
>>
>> +/*
>> + * shuffle_text_list()
>> + * Use a Fisher Yates algorithm to shuffle a list of text sections.
>> + */
>> +static void shuffle_text_list(Elf_Shdr **list, int size)
>> +{
>> + int i;
>> + unsigned int j;
>> + Elf_Shdr *temp;
>> +
>> + for (i = size - 1; i > 0; i--) {
>> + /*
>> + * TBD - seed. We need to be able to use a known
>> + * seed so that we can non-randomly randomize for
>> + * debugging.
>> + */
>> +
>> + // pick a random index from 0 to i
>> + get_random_bytes(&j, sizeof(j));
>> + j = j % (i + 1);
>> +
>> + temp = list[i];
>> + list[i] = list[j];
>> + list[j] = temp;
>> + }
>> +}
>> +
>> +/*
>> + * randomize_text()
>> + * Look through the core section looking for executable code sections.
>> + * Store sections in an array and then shuffle the sections
>> + * to reorder the functions.
>> + */
>> +static void randomize_text(struct module *mod, struct load_info *info)
>> +{
>> + int i;
>> + int num_text_sections = 0;
>> + Elf_Shdr **text_list;
>> + int size = 0;
>> + int max_sections = info->hdr->e_shnum;
>> + unsigned int sec = find_sec(info, ".text");
>> +
>> + if (!IS_ENABLED(CONFIG_FG_KASLR) || !kaslr_enabled())
>> + return;
Maybe put this conditional before the call to randomize_text() in
move_module()? Just for code readability reasons (i.e., we only call
randomize_text() when CONFIG_FG_KASLR && kaslr_enabled()). But this is
just a matter of preference.
>> +
>> + if (sec == 0)
>> + return;
>> +
>> + text_list = kmalloc_array(max_sections, sizeof(*text_list), GFP_KERNEL);
>> + if (text_list == NULL)
>> + return;
>> +
>> + for (i = 0; i < max_sections; i++) {
>> + Elf_Shdr *shdr = &info->sechdrs[i];
>> + const char *sname = info->secstrings + shdr->sh_name;
>> +
>> + if (!(shdr->sh_flags & SHF_ALLOC) ||
>> + !(shdr->sh_flags & SHF_EXECINSTR) ||
>> + strstarts(sname, ".init"))
>> + continue;
>> +
>> + text_list[num_text_sections] = shdr;
>> + num_text_sections++;
>> + }
>> +
>> + shuffle_text_list(text_list, num_text_sections);
>> +
>> + for (i = 0; i < num_text_sections; i++) {
>> + Elf_Shdr *shdr = text_list[i];
>> + unsigned int infosec;
>> + const char *sname;
>> +
>> + sname = info->secstrings + shdr->sh_name;
sname doesn't appear to be used after this (?)
>> + infosec = shdr->sh_info;
>> +
>> + shdr->sh_entsize = get_offset(mod, &size, shdr, infosec);
get_offset() expects a section index as the last argument, but sh_info
is 0 for text sections (SHT_PROGBITS). It's really only used in
arch_mod_section_prepend() though, which is only defined for parisc.
We could perhaps save the section index in sh_info in the previous for
loop (hacky, I have to double check if this is actually safe, but it
seems sh_info only has significance for SHT_REL* and SHT_SYMTAB
sections).
>> + }
>> +
>> + kfree(text_list);
>> +}
>> +
>> static int move_module(struct module *mod, struct load_info *info)
>> {
>> int i;
>> @@ -3282,6 +3365,8 @@ static int move_module(struct module *mod, struct load_info *info)
>> } else
>> mod->init_layout.base = NULL;
>>
>> + randomize_text(mod, info);
Hm, I wonder if we should not incorporate the randomize_text() logic into
layout_sections() instead of move_module(), since logically speaking
layout_sections() determines all the section offsets and stores them
in sh_entsize, whereas move_module() does all the allocations and
copying to final destination. Just a thought.
Thanks!
Jessica
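A rough sketch of the call-site check and the sh_info idea discussed above,
reusing the collection loops from the patch under review; this is untested
and illustrative only.

        /* In move_module(), gate the call instead of checking inside: */
        if (IS_ENABLED(CONFIG_FG_KASLR) && kaslr_enabled())
                randomize_text(mod, info);

        /*
         * In randomize_text(), stash the real section index in sh_info
         * (otherwise unused, i.e. 0, for SHT_PROGBITS text sections) while
         * collecting, and hand that index to get_offset() later:
         */
        for (i = 0; i < max_sections; i++) {
                Elf_Shdr *shdr = &info->sechdrs[i];
                const char *sname = info->secstrings + shdr->sh_name;

                if (!(shdr->sh_flags & SHF_ALLOC) ||
                    !(shdr->sh_flags & SHF_EXECINSTR) ||
                    strstarts(sname, ".init"))
                        continue;

                shdr->sh_info = i;      /* remember the section index */
                text_list[num_text_sections++] = shdr;
        }

        shuffle_text_list(text_list, num_text_sections);

        for (i = 0; i < num_text_sections; i++) {
                Elf_Shdr *shdr = text_list[i];

                shdr->sh_entsize = get_offset(mod, &size, shdr, shdr->sh_info);
        }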
On Mon, Feb 10, 2020 at 11:36:29AM -0500, Arvind Sankar wrote:
> On Mon, Feb 10, 2020 at 07:54:58AM -0800, Arjan van de Ven wrote:
> > >
> > > I'll leave it to others to figure out the exact details. But afaict it
> > > should be possible to have fine-grained-randomization and preserve the
> > > workaround in the end.
> > >
> >
> > the most obvious "solution" is to compile with an alignment of 4 bytes (so tight packing)
> > and then in the randomizer preserve the offset within 32 bytes, no matter what it is
> >
> > that would get you an average padding of 16 bytes which is a bit more than now but not too insane
> > (queue Kees' argument that tiny bits of padding are actually good)
> >
>
> With the patchset for adding the mbranches-within-32B-boundaries option,
> the section alignment gets forced to 32. With function-sections that
> means function alignment has to be 32 too.
We should be careful about enabling -mbranches-within-32B-boundaries.
It will hurt AMD, and presumably future Intel CPUs which don't need it.
--
Josh
On Fri, Feb 21, 2020 at 01:50:39PM -0600, Josh Poimboeuf wrote:
> On Mon, Feb 10, 2020 at 11:36:29AM -0500, Arvind Sankar wrote:
> > On Mon, Feb 10, 2020 at 07:54:58AM -0800, Arjan van de Ven wrote:
> > > >
> > > > I'll leave it to others to figure out the exact details. But afaict it
> > > > should be possible to have fine-grained-randomization and preserve the
> > > > workaround in the end.
> > > >
> > >
> > > the most obvious "solution" is to compile with an alignment of 4 bytes (so tight packing)
> > > and then in the randomizer preserve the offset within 32 bytes, no matter what it is
> > >
> > > that would get you an average padding of 16 bytes which is a bit more than now but not too insane
> > > (queue Kees' argument that tiny bits of padding are actually good)
> > >
> >
> > With the patchset for adding the mbranches-within-32B-boundaries option,
> > the section alignment gets forced to 32. With function-sections that
> > means function alignment has to be 32 too.
>
> We should be careful about enabling -mbranches-within-32B-boundaries.
> It will hurt AMD, and presumably future Intel CPUs which don't need it.
>
> --
> Josh
>
And past Intel CPUs too :) As I understand it, the erratum only appears from
Skylake onwards.
On Wed, Feb 05, 2020 at 02:39:44PM -0800, Kristen Carlson Accardi wrote:
> Allow user to select CONFIG_FG_KASLR if dependencies are met. Change
> the make file to build with -ffunction-sections if CONFIG_FG_KASLR
>
> Signed-off-by: Kristen Carlson Accardi <[email protected]>
> ---
> Makefile | 4 ++++
> arch/x86/Kconfig | 13 +++++++++++++
> 2 files changed, 17 insertions(+)
>
> diff --git a/Makefile b/Makefile
> index c50ef91f6136..41438a921666 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -846,6 +846,10 @@ ifdef CONFIG_LIVEPATCH
> KBUILD_CFLAGS += $(call cc-option, -flive-patching=inline-clone)
> endif
>
> +ifdef CONFIG_FG_KASLR
> +KBUILD_CFLAGS += -ffunction-sections
> +endif
> +
With -ffunction-sections I get a few unreachable code warnings from
objtool.
arch/x86/kernel/dumpstack.o: warning: objtool: show_iret_regs()+0x10: unreachable instruction
fs/sysfs/dir.o: warning: objtool: sysfs_create_mount_point()+0x4f: unreachable instruction
kernel/time/clocksource.o: warning: objtool: __clocksource_register_scale()+0x21: unreachable instruction
drivers/tty/sysrq.o: warning: objtool: sysrq_filter()+0x2ef: unreachable instruction
arch/x86/mm/fault.o: warning: objtool: pgtable_bad()+0x3f: unreachable instruction
drivers/acpi/pci_root.o: warning: objtool: acpi_pci_osc_control_set()+0x123: unreachable instruction
drivers/rtc/class.o: warning: objtool: devm_rtc_device_register()+0x40: unreachable instruction
kernel/power/process.o: warning: objtool: freeze_processes.cold()+0x0: unreachable instruction
drivers/pnp/quirks.o: warning: objtool: quirk_awe32_resources()+0x42: unreachable instruction
drivers/acpi/utils.o: warning: objtool: acpi_evaluate_dsm()+0xf1: unreachable instruction
kernel/reboot.o: warning: objtool: __do_sys_reboot()+0x1b6: unreachable instruction
kernel/power/swap.o: warning: objtool: swsusp_read()+0x185: unreachable instruction
drivers/hid/hid-core.o: warning: objtool: hid_hw_start()+0x38: unreachable instruction
drivers/acpi/battery.o: warning: objtool: sysfs_add_battery.cold()+0x1a: unreachable instruction
arch/x86/kernel/cpu/mce/core.o: warning: objtool: do_machine_check.cold()+0x33: unreachable instruction
drivers/pcmcia/cistpl.o: warning: objtool: pccard_store_cis()+0x4e: unreachable instruction
drivers/gpu/vga/vgaarb.o: warning: objtool: pci_notify()+0x35: unreachable instruction
arch/x86/kernel/tsc.o: warning: objtool: determine_cpu_tsc_frequencies()+0x45: unreachable instruction
drivers/pcmcia/yenta_socket.o: warning: objtool: ti1250_override()+0x50: unreachable instruction
fs/proc/proc_sysctl.o: warning: objtool: sysctl_print_dir.isra.0()+0x19: unreachable instruction
drivers/iommu/intel-iommu.o: warning: objtool: intel_iommu_init()+0x4f4: unreachable instruction
net/mac80211/ibss.o: warning: objtool: ieee80211_ibss_work.cold()+0x157: unreachable instruction
drivers/net/ethernet/intel/e1000/e1000_main.o: warning: objtool: e1000_clean.cold()+0x0: unreachable instruction
net/core/skbuff.o: warning: objtool: skb_dump.cold()+0x3fd: unreachable instruction
On Tue, 2020-02-25 at 12:55 -0500, Arvind Sankar wrote:
> On Wed, Feb 05, 2020 at 02:39:44PM -0800, Kristen Carlson Accardi
> wrote:
> > [...]
>
> With -ffunction-sections I get a few unreachable code warnings from
> objtool.
>
> [...]
Thanks, I will look into this. I also get these warnings - I've been
ignoring them successfully so far, but I'll root cause the issue.
On Tue, 2020-02-25 at 12:55 -0500, Arvind Sankar wrote:
> On Wed, Feb 05, 2020 at 02:39:44PM -0800, Kristen Carlson Accardi
> wrote:
> > [...]
>
> With -ffunction-sections I get a few unreachable code warnings from
> objtool.
>
> [...]
I'm still working on a solution, but the issue here is that any .cold
function is going to be in a different section than the related
function, and when objtool is searching for instructions in
find_insn(), it assumes that it must be in the same section as the
caller.
On Tue, Mar 24, 2020 at 02:24:51PM -0700, Kristen Carlson Accardi wrote:
> On Tue, 2020-02-25 at 12:55 -0500, Arvind Sankar wrote:
> > On Wed, Feb 05, 2020 at 02:39:44PM -0800, Kristen Carlson Accardi
> > wrote:
> > > Allow user to select CONFIG_FG_KASLR if dependencies are met.
> > > Change
> > > the make file to build with -ffunction-sections if CONFIG_FG_KASLR
> > >
> > > Signed-off-by: Kristen Carlson Accardi <[email protected]>
> > > [...]
> >
> > With -ffunction-sections I get a few unreachable code warnings from
> > objtool.
> >
> > [...]
> > net/mac80211/ibss.o: warning: objtool:
> > ieee80211_ibss_work.cold()+0x157: unreachable instruction
> > drivers/net/ethernet/intel/e1000/e1000_main.o: warning: objtool:
> > e1000_clean.cold()+0x0: unreachable instruction
> > net/core/skbuff.o: warning: objtool: skb_dump.cold()+0x3fd:
> > unreachable instruction
>
> I'm still working on a solution, but the issue here is that any .cold
> function is going to be in a different section than the related
> function, and when objtool is searching for instructions in
> find_insn(), it assumes that it must be in the same section as the
> caller.
Can we teach objtool about this? It doesn't seem too unreasonable.
--
Kees Cook