2021-11-22 17:14:19

by Peter Zijlstra

[permalink] [raw]
Subject: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

Objtool based IBT validation in 3 passes:

--ibt-fix-direct:

Detect and rewrite any code/reloc from a JMP/CALL instruction
to an ENDBR instruction. This is basically a compiler bug since
neither needs the ENDBR and decoding it is a pure waste of time.

--ibt:

Report code relocs that are not JMP/CALL and don't point to ENDBR

There's a bunch of false positives, for eg. static_call_update()
and copy_thread() and kprobes. But most of them were due to
_THIS_IP_ which has been taken care of with the prior annotation.

--ibt-seal:

Find and NOP superfluous ENDBR instructions. Any function that
doesn't have it's address taken should not have an ENDBR
instruction. This removes about 1-in-4 ENDBR instructions.

All these flags are LTO like and require '--vmlinux --duplicate' to
run. As is, the output on x86_64-defconfig looks like:

defconfig-build/vmlinux.o: warning: objtool: apply_retpolines()+0x9b: relocation to !ENDBR: __x86_indirect_thunk_array+0x0
defconfig-build/vmlinux.o: warning: objtool: copy_thread()+0x3c: relocation to !ENDBR: ret_from_fork+0x0
defconfig-build/vmlinux.o: warning: objtool: machine_kexec_prepare()+0x189: relocation to !ENDBR: relocate_kernel+0x0
defconfig-build/vmlinux.o: warning: objtool: machine_kexec()+0x44: relocation to !ENDBR: relocate_kernel+0x0
defconfig-build/vmlinux.o: warning: objtool: __kretprobe_trampoline()+0x0: relocation to !ENDBR: __kretprobe_trampoline+0x0
defconfig-build/vmlinux.o: warning: objtool: arch_prepare_kretprobe()+0x16: relocation to !ENDBR: __kretprobe_trampoline+0x0
defconfig-build/vmlinux.o: warning: objtool: trampoline_handler()+0x1b: relocation to !ENDBR: __kretprobe_trampoline+0x0
defconfig-build/vmlinux.o: warning: objtool: apply_relocate_add()+0x30: relocation to !ENDBR: __memcpy+0x0
defconfig-build/vmlinux.o: warning: objtool: unwind_next_frame()+0x53d: relocation to !ENDBR: __kretprobe_trampoline+0x0
defconfig-build/vmlinux.o: warning: objtool: unwind_next_frame()+0x3d0: relocation to !ENDBR: __kretprobe_trampoline+0x0
defconfig-build/vmlinux.o: warning: objtool: unwind_next_frame()+0x232: relocation to !ENDBR: __kretprobe_trampoline+0x0
defconfig-build/vmlinux.o: warning: objtool: __unwind_start()+0x11a: relocation to !ENDBR: ret_from_fork+0x0
defconfig-build/vmlinux.o: warning: objtool: sched_dynamic_update()+0x3b: relocation to !ENDBR: preempt_schedule_thunk+0x0
defconfig-build/vmlinux.o: warning: objtool: sched_dynamic_update()+0x55: relocation to !ENDBR: preempt_schedule_notrace_thunk+0x0
defconfig-build/vmlinux.o: warning: objtool: sched_dynamic_update()+0x1e3: relocation to !ENDBR: preempt_schedule_thunk+0x0
defconfig-build/vmlinux.o: warning: objtool: sched_dynamic_update()+0x1fd: relocation to !ENDBR: preempt_schedule_notrace_thunk+0x0
defconfig-build/vmlinux.o: warning: objtool: filter_irq_stacks()+0x1f: relocation to !ENDBR: __irqentry_text_end+0x0
defconfig-build/vmlinux.o: warning: objtool: __kretprobe_find_ret_addr()+0xe: relocation to !ENDBR: __kretprobe_trampoline+0x0
defconfig-build/vmlinux.o: warning: objtool: kretprobe_find_ret_addr()+0x13: relocation to !ENDBR: __kretprobe_trampoline+0x0
defconfig-build/vmlinux.o: warning: objtool: __kretprobe_trampoline_handler()+0x43: relocation to !ENDBR: __kretprobe_trampoline+0x0
defconfig-build/vmlinux.o: warning: objtool: trace_seq_print_sym()+0x8e: relocation to !ENDBR: __kretprobe_trampoline+0x0
defconfig-build/vmlinux.o: warning: objtool: trace_seq_print_sym()+0x39: relocation to !ENDBR: __kretprobe_trampoline+0x0
defconfig-build/vmlinux.o: warning: objtool: override_function_with_return()+0x4: relocation to !ENDBR: just_return_func+0x0
defconfig-build/vmlinux.o: warning: objtool: rpm_suspend()+0x409: relocation to !ENDBR: rpm_suspend+0x56b
defconfig-build/vmlinux.o: warning: objtool: rpm_idle()+0x192: relocation to !ENDBR: rpm_idle+0x215
defconfig-build/vmlinux.o: warning: objtool: arch_hibernation_header_save()+0x17: relocation to !ENDBR: restore_registers+0x0
defconfig-build/vmlinux.o: warning: objtool: relocate_restore_code()+0x1e: relocation to !ENDBR: core_restore_code+0xfffffffffffffffc
defconfig-build/vmlinux.o: warning: objtool: relocate_restore_code()+0x29: relocation to !ENDBR: core_restore_code+0x0
defconfig-build/vmlinux.o: warning: objtool: init_real_mode()+0xf1: relocation to !ENDBR: secondary_startup_64+0x0
defconfig-build/vmlinux.o: warning: objtool: int3_exception_notify()+0x52: relocation to !ENDBR: int3_magic+0x0
defconfig-build/vmlinux.o: warning: objtool: exc_double_fault()+0x55: relocation to !ENDBR: native_irq_return_iret+0x0
defconfig-build/vmlinux.o: warning: objtool: exc_debug()+0xa3: relocation to !ENDBR: __end_entry_SYSENTER_compat+0x0
defconfig-build/vmlinux.o: warning: objtool: .text+0x5211c: relocation to !ENDBR: .text+0x52125
defconfig-build/vmlinux.o: warning: objtool: .head.text+0x80: relocation to !ENDBR: .head.text+0x89
defconfig-build/vmlinux.o: warning: objtool: .head.text+0xf4: relocation to !ENDBR: .head.text+0x107
defconfig-build/vmlinux.o: warning: objtool: .entry.text+0x1836: relocation to !ENDBR: asm_load_gs_index+0x3
defconfig-build/vmlinux.o: warning: objtool: .entry.text+0x1937: relocation to !ENDBR: .entry.text+0x19b4
defconfig-build/vmlinux.o: warning: objtool: .entry.text+0x198a: relocation to !ENDBR: .entry.text+0x19b4
defconfig-build/vmlinux.o: warning: objtool: .entry.text+0x1945: relocation to !ENDBR: .entry.text+0x19d9
IBT superfluous ENDBR: 11799/43941

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
tools/objtool/arch/x86/decode.c | 18 ++
tools/objtool/builtin-check.c | 10 +
tools/objtool/check.c | 236 +++++++++++++++++++++++++++++---
tools/objtool/include/objtool/arch.h | 2
tools/objtool/include/objtool/builtin.h | 2
tools/objtool/include/objtool/objtool.h | 3
tools/objtool/objtool.c | 1
7 files changed, 249 insertions(+), 23 deletions(-)

--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -112,7 +112,7 @@ int arch_decode_instruction(struct objto
const struct elf *elf = file->elf;
struct insn insn;
int x86_64, ret;
- unsigned char op1, op2,
+ unsigned char op1, op2, prefix,
rex = 0, rex_b = 0, rex_r = 0, rex_w = 0, rex_x = 0,
modrm = 0, modrm_mod = 0, modrm_rm = 0, modrm_reg = 0,
sib = 0, /* sib_scale = 0, */ sib_index = 0, sib_base = 0;
@@ -137,6 +137,8 @@ int arch_decode_instruction(struct objto
if (insn.vex_prefix.nbytes)
return 0;

+ prefix = insn.prefixes.bytes[0];
+
op1 = insn.opcode.bytes[0];
op2 = insn.opcode.bytes[1];

@@ -491,6 +493,11 @@ int arch_decode_instruction(struct objto
/* nopl/nopw */
*type = INSN_NOP;

+ } else if (op2 == 0x1e) {
+
+ if (prefix == 0xf3 && (modrm == 0xfa || modrm == 0xfb))
+ *type = INSN_ENDBR;
+
} else if (op2 == 0xa0 || op2 == 0xa8) {

/* push fs/gs */
@@ -691,6 +698,15 @@ const char *arch_nop_insn(int len)
return nops[len-1];
}

+const char *arch_call_insn(unsigned long ip, unsigned long target)
+{
+ static char bytes[5] = { 0xe8, 0, 0, 0, 0 };
+
+ *(int *)(&bytes[1]) = (long)(target - (ip + 5));
+
+ return bytes;
+}
+
#define BYTE_RET 0xC3

const char *arch_ret_insn(int len)
--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/builtin-check.c
@@ -20,7 +20,7 @@
#include <objtool/objtool.h>

bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
- validate_dup, vmlinux, mcount, noinstr, backup;
+ validate_dup, vmlinux, mcount, noinstr, backup, ibt, ibt_fix_direct, ibt_seal;

static const char * const check_usage[] = {
"objtool check [<options>] file.o",
@@ -45,6 +45,9 @@ const struct option check_options[] = {
OPT_BOOLEAN('l', "vmlinux", &vmlinux, "vmlinux.o validation"),
OPT_BOOLEAN('M', "mcount", &mcount, "generate __mcount_loc"),
OPT_BOOLEAN('B', "backup", &backup, "create .orig files before modification"),
+ OPT_BOOLEAN(0, "ibt", &ibt, "foobar"),
+ OPT_BOOLEAN(0, "ibt-fix-direct", &ibt_fix_direct, "foobar"),
+ OPT_BOOLEAN(0, "ibt-seal", &ibt_seal, "foobar"),
OPT_END(),
};

@@ -84,6 +87,11 @@ int cmd_check(int argc, const char **arg
argc = cmd_parse_options(argc, argv, check_usage);
objname = argv[0];

+ if (ibt && !(vmlinux && validate_dup)) {
+ fprintf(stderr, "--ibt requires: --vmlinux --duplicate\n");
+ exit(129);
+ }
+
file = objtool_open_read(objname);
if (!file)
return 1;
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -378,6 +378,7 @@ static int decode_instructions(struct ob
memset(insn, 0, sizeof(*insn));
INIT_LIST_HEAD(&insn->alts);
INIT_LIST_HEAD(&insn->stack_ops);
+ INIT_LIST_HEAD(&insn->call_node);

insn->sec = sec;
insn->offset = offset;
@@ -1170,6 +1171,13 @@ static int add_jump_destinations(struct
unsigned long dest_off;

for_each_insn(file, insn) {
+ if (insn->type == INSN_ENDBR) {
+ if (insn->func && insn->offset == insn->func->offset) {
+ list_add_tail(&insn->call_node, &file->endbr_list);
+ file->nr_endbr++;
+ }
+ }
+
if (!is_static_jump(insn))
continue;

@@ -1186,10 +1194,14 @@ static int add_jump_destinations(struct
} else if (insn->func) {
/* internal or external sibling call (with reloc) */
add_call_dest(file, insn, reloc->sym, true);
- continue;
+
+ dest_sec = reloc->sym->sec;
+ dest_off = reloc->sym->offset +
+ arch_dest_reloc_offset(reloc->addend);
+
} else if (reloc->sym->sec->idx) {
dest_sec = reloc->sym->sec;
- dest_off = reloc->sym->sym.st_value +
+ dest_off = reloc->sym->offset +
arch_dest_reloc_offset(reloc->addend);
} else {
/* non-func asm code jumping to another file */
@@ -1199,6 +1211,9 @@ static int add_jump_destinations(struct
insn->jump_dest = find_insn(file, dest_sec, dest_off);
if (!insn->jump_dest) {

+ if (insn->func)
+ continue;
+
/*
* This is a special case where an alt instruction
* jumps past the end of the section. These are
@@ -1213,6 +1228,19 @@ static int add_jump_destinations(struct
return -1;
}

+ if (ibt && insn->call_dest && insn->jump_dest->type == INSN_ENDBR) {
+ if (reloc) {
+ if (ibt_fix_direct) {
+ reloc->addend += 4;
+ elf_write_reloc(file->elf, reloc);
+ } else {
+ WARN_FUNC("Direct RELOC jump to ENDBR", insn->sec, insn->offset);
+ }
+ } else {
+ WARN_FUNC("Direct IMM jump to ENDBR", insn->sec, insn->offset);
+ }
+ }
+
/*
* Cross-function jump.
*/
@@ -1240,7 +1268,8 @@ static int add_jump_destinations(struct
insn->jump_dest->func->pfunc = insn->func;

} else if (insn->jump_dest->func->pfunc != insn->func->pfunc &&
- insn->jump_dest->offset == insn->jump_dest->func->offset) {
+ ((insn->jump_dest->offset == insn->jump_dest->func->offset) ||
+ (insn->jump_dest->offset == insn->jump_dest->func->offset + 4))) {
/* internal sibling call (without reloc) */
add_call_dest(file, insn, insn->jump_dest->func, true);
}
@@ -1250,23 +1279,12 @@ static int add_jump_destinations(struct
return 0;
}

-static struct symbol *find_call_destination(struct section *sec, unsigned long offset)
-{
- struct symbol *call_dest;
-
- call_dest = find_func_by_offset(sec, offset);
- if (!call_dest)
- call_dest = find_symbol_by_offset(sec, offset);
-
- return call_dest;
-}
-
/*
* Find the destination instructions for all calls.
*/
static int add_call_destinations(struct objtool_file *file)
{
- struct instruction *insn;
+ struct instruction *insn, *target = NULL;
unsigned long dest_off;
struct symbol *dest;
struct reloc *reloc;
@@ -1278,7 +1296,21 @@ static int add_call_destinations(struct
reloc = insn_reloc(file, insn);
if (!reloc) {
dest_off = arch_jump_destination(insn);
- dest = find_call_destination(insn->sec, dest_off);
+
+ target = find_insn(file, insn->sec, dest_off);
+ if (!target) {
+ WARN_FUNC("direct call to nowhere", insn->sec, insn->offset);
+ return -1;
+ }
+ dest = target->func;
+ if (!dest)
+ dest = find_symbol_containing(insn->sec, dest_off);
+ if (!dest) {
+ WARN_FUNC("IMM can't find call dest symbol at %s+0x%lx",
+ insn->sec, insn->offset,
+ insn->sec->name, dest_off);
+ return -1;
+ }

add_call_dest(file, insn, dest, false);

@@ -1297,10 +1329,25 @@ static int add_call_destinations(struct
}

} else if (reloc->sym->type == STT_SECTION) {
- dest_off = arch_dest_reloc_offset(reloc->addend);
- dest = find_call_destination(reloc->sym->sec, dest_off);
+ struct section *dest_sec;
+
+ dest_sec = reloc->sym->sec;
+ dest_off = reloc->sym->offset +
+ arch_dest_reloc_offset(reloc->addend);
+
+ target = find_insn(file, dest_sec, dest_off);
+ if (target) {
+ dest = target->func;
+ if (!dest)
+ dest = find_symbol_containing(dest_sec, dest_off);
+ } else {
+ WARN("foo");
+ dest = find_func_by_offset(dest_sec, dest_off);
+ if (!dest)
+ dest = find_symbol_by_offset(dest_sec, dest_off);
+ }
if (!dest) {
- WARN_FUNC("can't find call dest symbol at %s+0x%lx",
+ WARN_FUNC("RELOC can't find call dest symbol at %s+0x%lx",
insn->sec, insn->offset,
reloc->sym->sec->name,
dest_off);
@@ -1311,9 +1358,37 @@ static int add_call_destinations(struct

} else if (reloc->sym->retpoline_thunk) {
add_retpoline_call(file, insn);
+ continue;
+
+ } else {
+ struct section *dest_sec;
+
+ dest_sec = reloc->sym->sec;
+ dest_off = reloc->sym->offset +
+ arch_dest_reloc_offset(reloc->addend);
+
+ target = find_insn(file, dest_sec, dest_off);

- } else
add_call_dest(file, insn, reloc->sym, false);
+ }
+
+ if (ibt && target && target->type == INSN_ENDBR) {
+ if (reloc) {
+ if (ibt_fix_direct) {
+ reloc->addend += 4;
+ elf_write_reloc(file->elf, reloc);
+ } else {
+ WARN_FUNC("Direct RELOC call to ENDBR", insn->sec, insn->offset);
+ }
+ } else {
+ if (ibt_fix_direct) {
+ elf_write_insn(file->elf, insn->sec, insn->offset, insn->len,
+ arch_call_insn(insn->offset, dest_off + 4));
+ } else {
+ WARN_FUNC("Direct IMM call to ENDBR", insn->sec, insn->offset);
+ }
+ }
+ }
}

return 0;
@@ -3023,6 +3098,8 @@ static struct instruction *next_insn_to_
return next_insn_same_sec(file, insn);
}

+static void validate_ibt_insn(struct objtool_file *file, struct instruction *insn);
+
/*
* Follow the branch starting at the given instruction, and recursively follow
* any other branches (jumps). Meanwhile, track the frame pointer state at
@@ -3071,6 +3148,12 @@ static int validate_branch(struct objtoo

if (insn->hint) {
state.cfi = *insn->cfi;
+ if (ibt) {
+ if (insn->cfi->type == UNWIND_HINT_TYPE_REGS_ENTRY &&
+ insn->type != INSN_ENDBR) {
+ WARN_FUNC("IRET_ENTRY hint without ENDBR", insn->sec, insn->offset);
+ }
+ }
} else {
/* XXX track if we actually changed state.cfi */

@@ -3216,7 +3299,12 @@ static int validate_branch(struct objtoo
state.df = false;
break;

+ case INSN_NOP:
+ break;
+
default:
+ if (ibt)
+ validate_ibt_insn(file, insn);
break;
}

@@ -3466,6 +3554,111 @@ static int validate_functions(struct obj
return warnings;
}

+static int validate_ibt_reloc(struct objtool_file *file, struct reloc *reloc, char **name)
+{
+ struct instruction *dest;
+ struct section *sec;
+ unsigned long off;
+
+ sec = reloc->sym->sec;
+ off = reloc->sym->offset + reloc->addend;
+
+ dest = find_insn(file, sec, off);
+ if (!dest)
+ return 0;
+
+ if (name && dest->func)
+ *name = dest->func->name;
+
+ if (dest->type == INSN_ENDBR) {
+ if (!list_empty(&dest->call_node)) {
+ list_del_init(&dest->call_node);
+ return 1;
+ }
+ return 0;
+ }
+
+ if (reloc->sym->static_call_tramp)
+ return 0;
+
+ return -1;
+}
+
+static void validate_ibt_insn(struct objtool_file *file, struct instruction *insn)
+{
+ struct reloc *reloc = insn_reloc(file, insn);
+ struct instruction *target;
+ unsigned long offset;
+
+ if (!reloc)
+ return;
+
+ if (validate_ibt_reloc(file, reloc, NULL) >= 0)
+ return;
+
+ offset = reloc->sym->offset + reloc->addend;
+
+ target = find_insn(file, reloc->sym->sec, offset);
+ if (target && insn->func == target->func && target->this_ip)
+ return;
+
+ WARN_FUNC("relocation to !ENDBR: %s+0x%lx",
+ insn->sec, insn->offset,
+ target && target->func ? target->func->name : reloc->sym->name,
+ target && target->func ? target->offset - target->func->offset : reloc->addend);
+}
+
+static int validate_ibt(struct objtool_file *file)
+{
+ struct instruction *insn;
+ struct section *sec;
+ struct reloc *reloc;
+ char *name;
+ int nr = 0;
+
+ for_each_sec(file, sec) {
+ /* already done in validate_branch() */
+ if (sec->sh.sh_flags & SHF_EXECINSTR)
+ continue;
+
+ if (!sec->reloc)
+ continue;
+
+ if (!strncmp(sec->name, ".orc", 4))
+ continue;
+
+ if (!strncmp(sec->name, ".discard", 8))
+ continue;
+
+ if (!strcmp(sec->name, "_error_injection_whitelist"))
+ continue;
+
+ if (!strcmp(sec->name, "_kprobe_blacklist"))
+ continue;
+
+ list_for_each_entry(reloc, &sec->reloc->reloc_list, list) {
+ if (validate_ibt_reloc(file, reloc, &name) > 0) {
+// WARN("preserved (%s) ENDBR from section: %s", name, sec->name);
+ }
+
+ }
+ }
+
+ list_for_each_entry(insn, &file->endbr_list, call_node) {
+ if (ibt_seal) {
+ elf_write_insn(file->elf, insn->sec,
+ insn->offset, insn->len,
+ arch_nop_insn(insn->len));
+ }
+ nr++;
+ }
+
+ if (stats)
+ printf("IBT superfluous ENDBR: %d/%d\n", nr, file->nr_endbr);
+
+ return 0;
+}
+
static int validate_reachable_instructions(struct objtool_file *file)
{
struct instruction *insn;
@@ -3534,6 +3727,9 @@ int check(struct objtool_file *file)
goto out;
warnings += ret;

+ if (vmlinux && ibt)
+ ret = validate_ibt(file);
+
if (!warnings) {
ret = validate_reachable_instructions(file);
if (ret < 0)
--- a/tools/objtool/include/objtool/arch.h
+++ b/tools/objtool/include/objtool/arch.h
@@ -26,6 +26,7 @@ enum insn_type {
INSN_CLAC,
INSN_STD,
INSN_CLD,
+ INSN_ENDBR,
INSN_OTHER,
};

@@ -83,6 +84,7 @@ unsigned long arch_dest_reloc_offset(int

const char *arch_nop_insn(int len);
const char *arch_ret_insn(int len);
+const char *arch_call_insn(unsigned long ip, unsigned long target);

int arch_decode_hint_reg(u8 sp_reg, int *base);

--- a/tools/objtool/include/objtool/builtin.h
+++ b/tools/objtool/include/objtool/builtin.h
@@ -9,7 +9,7 @@

extern const struct option check_options[];
extern bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
- validate_dup, vmlinux, mcount, noinstr, backup;
+ validate_dup, vmlinux, mcount, noinstr, backup, ibt, ibt_fix_direct, ibt_seal;

extern int cmd_parse_options(int argc, const char **argv, const char * const usage[]);

--- a/tools/objtool/include/objtool/objtool.h
+++ b/tools/objtool/include/objtool/objtool.h
@@ -26,8 +26,11 @@ struct objtool_file {
struct list_head retpoline_call_list;
struct list_head static_call_list;
struct list_head mcount_loc_list;
+ struct list_head endbr_list;
bool ignore_unreachables, c_file, hints, rodata;

+ unsigned int nr_endbr;
+
unsigned long jl_short, jl_long;
unsigned long jl_nop_short, jl_nop_long;

--- a/tools/objtool/objtool.c
+++ b/tools/objtool/objtool.c
@@ -128,6 +128,7 @@ struct objtool_file *objtool_open_read(c
INIT_LIST_HEAD(&file.retpoline_call_list);
INIT_LIST_HEAD(&file.static_call_list);
INIT_LIST_HEAD(&file.mcount_loc_list);
+ INIT_LIST_HEAD(&file.endbr_list);
file.c_file = !vmlinux && find_section_by_name(file.elf, ".comment");
file.ignore_unreachables = no_unreachable;
file.hints = false;




2021-11-24 19:30:45

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

On Mon, Nov 22, 2021 at 06:03:07PM +0100, Peter Zijlstra wrote:
> +static int validate_ibt_reloc(struct objtool_file *file, struct reloc *reloc, char **name)
> +{
> + struct instruction *dest;
> + struct section *sec;
> + unsigned long off;
> +
> + sec = reloc->sym->sec;
> + off = reloc->sym->offset + reloc->addend;
> +
> + dest = find_insn(file, sec, off);
> + if (!dest)
> + return 0;
> +
> + if (name && dest->func)
> + *name = dest->func->name;

I think these checks can be further narrowed down by only looking for
X86_64_64 relocs.

> + list_for_each_entry(insn, &file->endbr_list, call_node) {
> + if (ibt_seal) {
> + elf_write_insn(file->elf, insn->sec,
> + insn->offset, insn->len,
> + arch_nop_insn(insn->len));
> + }

Like the retpoline rewriting, I'd much rather have objtool create
annotations which the kernel can then read and patch itself.

e.g. have '.ibt.direct_call_sites' and '.ibt.unused_endbr_insns'
sections.

--
Josh


2021-12-24 02:05:55

by Joao Moreira

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

On 2021-11-22 09:03, Peter Zijlstra wrote:
> Objtool based IBT validation in 3 passes:
>
> --ibt-fix-direct:
>
> Detect and rewrite any code/reloc from a JMP/CALL instruction
> to an ENDBR instruction. This is basically a compiler bug since
> neither needs the ENDBR and decoding it is a pure waste of time.
>
> --ibt:
>
> Report code relocs that are not JMP/CALL and don't point to ENDBR
>
> There's a bunch of false positives, for eg. static_call_update()
> and copy_thread() and kprobes. But most of them were due to
> _THIS_IP_ which has been taken care of with the prior annotation.
>
> --ibt-seal:
>
> Find and NOP superfluous ENDBR instructions. Any function that
> doesn't have it's address taken should not have an ENDBR
> instruction. This removes about 1-in-4 ENDBR instructions.
>

I did some experimentation with compiler-based implementation for two of
the features described here (plus an additional one). Before going into
details, just a quick highlight that the compiler has limited visibility
over handwritten assembly sources thus, depending on the feature, a
compiler-based approach will not cover as much as objtool. All the
observations below were made when compiling the kernel with defconfig, +
CLANG-related options, + LTO sometimes. Here I used kernel revision
0fcfb00b28c0b7884635dacf38e46d60bf3d4eb1 with PeterZ's IBT Beginning
patches applied on top (plus changes to Kbuild), thus, IBT was not
really enforced. Tests consisted mostly of Clang's synthetics tests +
booting a compiled kernel.

Prototypes of the features are available in:
https://github.com/lvwr/llvm/tree/joao/ibt -- I fixed as many corner
cases I could find while trying it out, but I believe some might still
be haunting. Also, I'm not very keen to Kbuild internals nor to however
the kernel patches itself during runtime, so I may have missed some
details.

Finally, I'm interested in general feedback about this... better ways of
implementing, alternative approaches, new possible optimizations and
everything. I should be AFK for a few days in the next weeks, but I'll
be glad to discuss this in January and then. Happy Holidays :)

The features:

-mibt-seal:

Add ENDBRs exclusively to address-taken functions regardless of its
linkage visibility. Only make sense together with LTO.

Observations: Reduced drastically the number of ENDBRs placed in the
kernel binary (From 44383 to 33192), but still missed some that were
later fixed by objtool (Number of fixes by objtool reduced from 11730 to
540). I did not investigate further why these superfluous ENDBRs were
still left in the binary, but at this point my hypotheses spin around
(i) false-positive address-taken conclusions by the compiler, possibly
due to things like exported symbols and such; (ii) assembly sources
which are invisible to the compiler (although this would more likely
hide address taken functions); (iii) other binary level transformations
done by objtool.

Runtime testing: The kernel was verified to properly boot after being
compiled with -mibt-seal (+ LTO).

Note: This feature was already submitted for upstreaming with the
llvm-project: https://reviews.llvm.org/D116070

-mibt-fix-direct:

Direct calls to functions which are known to have ENDBR instructions are
fixed to target the first instruction after the ENDBR (+4 offset). Does
not necessarily depend on LTO, but is meaningfully improved by it
because it depends on after-linking stuff to successfully identify
targets that will have ENDBRs (aliases and assembly functions are a big
complication here, so this currently won't try to optimize calls to
functions which exist in the compiler context as declarations).

Observations: I did a quick change on objtool to collect stats on the
number of fixes applied. Without -mibt-fix-direct objtool had to fix
227552 direct calls/jmps. Without LTO, -mibt-fix-direct reduced this
number to 218455. With LTO this number was reduced to 1728.

Runtime testing: The kernel was verified to properly boot when compiled
with -mibt-fix-direct both with and without LTO. I tried a more
aggressive approach for non-LTO compilation, which was trying to
optimize based on declarations (when functions were not visible) and
noticed that in this scenario the kernel would crash very early in the
boot process. I did not investigate further, but my hypothesis is that
this approach mistakenly optimizes calls to assembly functions without
ENDBRs, leading to random weirdness and doom.

-mibt-preceding-endbr:

Instead of placing ENDBRs after the function entry label, place it right
before. Indirect calls are fixed (using a sub instr) to target 4 bytes
before the function entry point. Address values which are reused after
the indirect call are fixed back to their original value (using an add
instr). Indirect calls that use in-memory pointers are transformed to
first load the function pointer from memory into a free register, then
do the sub, then call it. This does not depend on LTO.

Runtime testing: The runtime test was done using a kernel whose asm
functions were not fixed regarding the position of ENDBRs. Yet, perhaps
surprisingly, the kernel still boots when compiled with
-mibt-preceding-endbr. I don't really know how many indirect calls to
assembly functions happened, but I assume that these would have crashed
if IBT was being enforced due to the misplaced ENDBRs (and it could also
have crashed in these early tests due to indirect calls targeting
unknown instruction 4 bytes before assembly functions, thus the surprise
factor when I saw the kernel booting). Kernel booted alright with
-mibt-preceding-endbr with and without LTO. CONFIG_RETPOLINE was
disabled for testing this.

2022-02-09 05:26:16

by Kees Cook

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

On Tue, Feb 08, 2022 at 06:21:16PM -0800, Joao Moreira wrote:
> > > Note: This feature was already submitted for upstreaming with the
> > > llvm-project: https://reviews.llvm.org/D116070
> >
> > Ah nice; I see this has been committed now.
>
> Yes, but then some front-end changes also required this fix
> https://reviews.llvm.org/D118052, which is currently under review (posting
> this here in case someone is trying this out).
>
> >
> > Given that IBT will need to work with both Clang and gcc, I suspect the
> > objtool approach will still end up needing to do all the verification.
> >
> > (And as you say, it has limited visibility into assembly.)
>
> Agreed that at this point objtool provides more coverage. Yet, besides being
> an attempt to relief objtool and improve a bit the compiler support as
> mentioned in the series cover letter, it is still nice to reduce the
> left-over nops and fixups which end-up scattered all around.
>
> FWIIW, https://reviews.llvm.org/D118438 and https://reviews.llvm.org/D118355
> are also being cooked. Comments and ideas for new approaches or improvements
> in the compiler support for this are very welcome :)

Ah, excellent, thanks for the pointers. There's also this in the works:
https://reviews.llvm.org/D119296 (a new CFI mode, designed to play nice
to objtool, IBT, etc.)

--
Kees Cook

2022-02-09 06:15:30

by Kees Cook

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

On Wed, Nov 24, 2021 at 11:30:37AM -0800, Josh Poimboeuf wrote:
> On Mon, Nov 22, 2021 at 06:03:07PM +0100, Peter Zijlstra wrote:
> > +static int validate_ibt_reloc(struct objtool_file *file, struct reloc *reloc, char **name)
> > +{
> > + struct instruction *dest;
> > + struct section *sec;
> > + unsigned long off;
> > +
> > + sec = reloc->sym->sec;
> > + off = reloc->sym->offset + reloc->addend;
> > +
> > + dest = find_insn(file, sec, off);
> > + if (!dest)
> > + return 0;
> > +
> > + if (name && dest->func)
> > + *name = dest->func->name;
>
> I think these checks can be further narrowed down by only looking for
> X86_64_64 relocs.
>
> > + list_for_each_entry(insn, &file->endbr_list, call_node) {
> > + if (ibt_seal) {
> > + elf_write_insn(file->elf, insn->sec,
> > + insn->offset, insn->len,
> > + arch_nop_insn(insn->len));
> > + }
>
> Like the retpoline rewriting, I'd much rather have objtool create
> annotations which the kernel can then read and patch itself.
>
> e.g. have '.ibt.direct_call_sites' and '.ibt.unused_endbr_insns'
> sections.

Why have the kernel do that work at every boot when it can be known at
link time?

--
Kees Cook

2022-02-09 10:43:13

by Joao Moreira

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

>> Note: This feature was already submitted for upstreaming with the
>> llvm-project: https://reviews.llvm.org/D116070
>
> Ah nice; I see this has been committed now.

Yes, but then some front-end changes also required this fix
https://reviews.llvm.org/D118052, which is currently under review
(posting this here in case someone is trying this out).

>
> Given that IBT will need to work with both Clang and gcc, I suspect the
> objtool approach will still end up needing to do all the verification.
>
> (And as you say, it has limited visibility into assembly.)

Agreed that at this point objtool provides more coverage. Yet, besides
being an attempt to relief objtool and improve a bit the compiler
support as mentioned in the series cover letter, it is still nice to
reduce the left-over nops and fixups which end-up scattered all around.

FWIIW, https://reviews.llvm.org/D118438 and
https://reviews.llvm.org/D118355 are also being cooked. Comments and
ideas for new approaches or improvements in the compiler support for
this are very welcome :)

Tks,
Joao

2022-02-09 12:36:25

by Joao Moreira

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

> Ah, excellent, thanks for the pointers. There's also this in the works:
> https://reviews.llvm.org/D119296 (a new CFI mode, designed to play nice
> to objtool, IBT, etc.)

Oh, great! Thanks for pointing it out. I guess I saw something with a
similar name before ;)
https://www.blackhat.com/docs/asia-17/materials/asia-17-Moreira-Drop-The-Rop-Fine-Grained-Control-Flow-Integrity-For-The-Linux-Kernel.pdf

Jokes aside (and perhaps questions more targeted to Sami), from a
diagonal look it seems that this follows the good old tag approach
proposed by PaX/grsecurity, right? If this is the case, should I assume
it could also benefit from features like -mibt-seal? Also are you
considering that perhaps we can use alternatives to flip different CFI
instrumentation as suggested by PeterZ in another thread?

Tks,
Joao

2022-02-09 12:45:23

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

On Wed, Feb 09, 2022 at 12:41:41PM +0100, Peter Zijlstra wrote:
> On Tue, Feb 08, 2022 at 03:43:27PM -0800, Kees Cook wrote:

> > Why have the kernel do that work at every boot when it can be known at
> > link time?
>
> I have patches that write a 4 byte #UD there instead of a nop. That
> would make !IBT hardware splat as well when it hits a sealed function
> (and in that case actually having those extra ENDBR generated is a
> bonus).
>
> Anyway, I have some newer patches and some hardware, except it's a NUC
> and working with those things is a royal pain in the arse since they
> don't have serial. I finally did get XHCI debug port working, but
> there's no XDBC grub support, so now I managed to boot a dead kernel and
> the thing is a brick until I can be arsed to connect a keybaord and
> screen to it again :-(
>
> KVM/qemu has no IBT support merged yet, so I can't use that either.

FWIW, those patches live here:

git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/wip.ibt

2022-02-09 13:26:17

by Kees Cook

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

On Thu, Dec 23, 2021 at 06:05:50PM -0800, [email protected] wrote:
> On 2021-11-22 09:03, Peter Zijlstra wrote:
> > Objtool based IBT validation in 3 passes:
> >
> > --ibt-fix-direct:
> >
> > Detect and rewrite any code/reloc from a JMP/CALL instruction
> > to an ENDBR instruction. This is basically a compiler bug since
> > neither needs the ENDBR and decoding it is a pure waste of time.
> >
> > --ibt:
> >
> > Report code relocs that are not JMP/CALL and don't point to ENDBR
> >
> > There's a bunch of false positives, for eg. static_call_update()
> > and copy_thread() and kprobes. But most of them were due to
> > _THIS_IP_ which has been taken care of with the prior annotation.
> >
> > --ibt-seal:
> >
> > Find and NOP superfluous ENDBR instructions. Any function that
> > doesn't have it's address taken should not have an ENDBR
> > instruction. This removes about 1-in-4 ENDBR instructions.
> >
>
> I did some experimentation with compiler-based implementation for two of the
> features described here (plus an additional one). Before going into details,
> just a quick highlight that the compiler has limited visibility over
> handwritten assembly sources thus, depending on the feature, a
> compiler-based approach will not cover as much as objtool. All the
> observations below were made when compiling the kernel with defconfig, +
> CLANG-related options, + LTO sometimes. Here I used kernel revision
> 0fcfb00b28c0b7884635dacf38e46d60bf3d4eb1 with PeterZ's IBT Beginning patches
> applied on top (plus changes to Kbuild), thus, IBT was not really enforced.
> Tests consisted mostly of Clang's synthetics tests + booting a compiled
> kernel.
>
> Prototypes of the features are available in:
> https://github.com/lvwr/llvm/tree/joao/ibt -- I fixed as many corner cases I
> could find while trying it out, but I believe some might still be haunting.
> Also, I'm not very keen to Kbuild internals nor to however the kernel
> patches itself during runtime, so I may have missed some details.
>
> Finally, I'm interested in general feedback about this... better ways of
> implementing, alternative approaches, new possible optimizations and
> everything. I should be AFK for a few days in the next weeks, but I'll be
> glad to discuss this in January and then. Happy Holidays :)
>
> The features:
>
> -mibt-seal:
>
> Add ENDBRs exclusively to address-taken functions regardless of its linkage
> visibility. Only make sense together with LTO.
>
> Observations: Reduced drastically the number of ENDBRs placed in the kernel
> binary (From 44383 to 33192), but still missed some that were later fixed by
> objtool (Number of fixes by objtool reduced from 11730 to 540). I did not
> investigate further why these superfluous ENDBRs were still left in the
> binary, but at this point my hypotheses spin around (i) false-positive
> address-taken conclusions by the compiler, possibly due to things like
> exported symbols and such; (ii) assembly sources which are invisible to the
> compiler (although this would more likely hide address taken functions);
> (iii) other binary level transformations done by objtool.
>
> Runtime testing: The kernel was verified to properly boot after being
> compiled with -mibt-seal (+ LTO).
>
> Note: This feature was already submitted for upstreaming with the
> llvm-project: https://reviews.llvm.org/D116070

Ah nice; I see this has been committed now.

Given that IBT will need to work with both Clang and gcc, I suspect the
objtool approach will still end up needing to do all the verification.

(And as you say, it has limited visibility into assembly.)

-Kees

--
Kees Cook

2022-02-09 13:42:05

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

On Tue, Feb 08, 2022 at 03:43:27PM -0800, Kees Cook wrote:
> On Wed, Nov 24, 2021 at 11:30:37AM -0800, Josh Poimboeuf wrote:
> > On Mon, Nov 22, 2021 at 06:03:07PM +0100, Peter Zijlstra wrote:
> > > +static int validate_ibt_reloc(struct objtool_file *file, struct reloc *reloc, char **name)
> > > +{
> > > + struct instruction *dest;
> > > + struct section *sec;
> > > + unsigned long off;
> > > +
> > > + sec = reloc->sym->sec;
> > > + off = reloc->sym->offset + reloc->addend;
> > > +
> > > + dest = find_insn(file, sec, off);
> > > + if (!dest)
> > > + return 0;
> > > +
> > > + if (name && dest->func)
> > > + *name = dest->func->name;
> >
> > I think these checks can be further narrowed down by only looking for
> > X86_64_64 relocs.
> >
> > > + list_for_each_entry(insn, &file->endbr_list, call_node) {
> > > + if (ibt_seal) {
> > > + elf_write_insn(file->elf, insn->sec,
> > > + insn->offset, insn->len,
> > > + arch_nop_insn(insn->len));
> > > + }
> >
> > Like the retpoline rewriting, I'd much rather have objtool create
> > annotations which the kernel can then read and patch itself.
> >
> > e.g. have '.ibt.direct_call_sites' and '.ibt.unused_endbr_insns'
> > sections.
>
> Why have the kernel do that work at every boot when it can be known at
> link time?

True, but:

- The kernel is already doing several other flavors of boot-time
self-patching. IMO it's better to consolidate such craziness in one
place (or a limited number of places) if we can. It seems more
robust, and limits confusion about who's patching what and when.

- Patching text at link time has pitfalls and I'd like to avoid (as much
as reasonably possible) objtool having the ability to brick the
kernel.

--
Josh


2022-02-09 16:18:42

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

On Tue, Feb 08, 2022 at 03:43:27PM -0800, Kees Cook wrote:
> On Wed, Nov 24, 2021 at 11:30:37AM -0800, Josh Poimboeuf wrote:
> > On Mon, Nov 22, 2021 at 06:03:07PM +0100, Peter Zijlstra wrote:
> > > +static int validate_ibt_reloc(struct objtool_file *file, struct reloc *reloc, char **name)
> > > +{
> > > + struct instruction *dest;
> > > + struct section *sec;
> > > + unsigned long off;
> > > +
> > > + sec = reloc->sym->sec;
> > > + off = reloc->sym->offset + reloc->addend;
> > > +
> > > + dest = find_insn(file, sec, off);
> > > + if (!dest)
> > > + return 0;
> > > +
> > > + if (name && dest->func)
> > > + *name = dest->func->name;
> >
> > I think these checks can be further narrowed down by only looking for
> > X86_64_64 relocs.
> >
> > > + list_for_each_entry(insn, &file->endbr_list, call_node) {
> > > + if (ibt_seal) {
> > > + elf_write_insn(file->elf, insn->sec,
> > > + insn->offset, insn->len,
> > > + arch_nop_insn(insn->len));
> > > + }
> >
> > Like the retpoline rewriting, I'd much rather have objtool create
> > annotations which the kernel can then read and patch itself.
> >
> > e.g. have '.ibt.direct_call_sites' and '.ibt.unused_endbr_insns'
> > sections.
>
> Why have the kernel do that work at every boot when it can be known at
> link time?

I have patches that write a 4 byte #UD there instead of a nop. That
would make !IBT hardware splat as well when it hits a sealed function
(and in that case actually having those extra ENDBR generated is a
bonus).

Anyway, I have some newer patches and some hardware, except it's a NUC
and working with those things is a royal pain in the arse since they
don't have serial. I finally did get XHCI debug port working, but
there's no XDBC grub support, so now I managed to boot a dead kernel and
the thing is a brick until I can be arsed to connect a keybaord and
screen to it again :-(

KVM/qemu has no IBT support merged yet, so I can't use that either.

2022-02-11 17:14:08

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

On Tue, Feb 08, 2022 at 09:18:44PM -0800, Joao Moreira wrote:
> > Ah, excellent, thanks for the pointers. There's also this in the works:
> > https://reviews.llvm.org/D119296 (a new CFI mode, designed to play nice
> > to objtool, IBT, etc.)
>
> Oh, great! Thanks for pointing it out. I guess I saw something with a
> similar name before ;) https://www.blackhat.com/docs/asia-17/materials/asia-17-Moreira-Drop-The-Rop-Fine-Grained-Control-Flow-Integrity-For-The-Linux-Kernel.pdf
>
> Jokes aside (and perhaps questions more targeted to Sami), from a diagonal
> look it seems that this follows the good old tag approach proposed by
> PaX/grsecurity, right? If this is the case, should I assume it could also
> benefit from features like -mibt-seal? Also are you considering that perhaps
> we can use alternatives to flip different CFI instrumentation as suggested
> by PeterZ in another thread?

So, lets try and recap things from IRC yesterday. There's a whole bunch
of things intertwining making indirect branches 'interesting'. Most of
which I've not seen mentioned in Sami's KCFI proposal which makes it
rather pointless.

I think we'll end up with something related to KCFI, but with distinct
differences:

- 32bit immediates for smaller code
- __kcfi_check_fail() is out for smaller code
- it must interact with IBT/BTI and retpolines
- we must be very careful with speculation.

Right, so because !IBT-CFI needs the check at the call site, like:

caller:
cmpl $0xdeadbeef, -0x4(%rax) # 7 bytes
je 1f # 2 bytes
ud2 # 2 bytes
1: call __x86_indirect_thunk_rax # 5 bytes


.align 16
.byte 0xef, 0xbe, 0xad, 0xde # 4 bytes
func:
...
ret


While FineIBT has them at the landing site:

caller:
movl $0xdeadbeef, %r11d # 6 bytes
call __x86_indirect_thunk_rax # 5 bytes


.align 16
func:
endbr # 4 bytes
cmpl $0xdeadbeef, %r11d # 7 bytes
je 1f # 2 bytes
ud2 # 2 bytes
1: ...
ret


It seems to me that always doing the check at the call-site is simpler,
since it avoids code-bloat and patching work. That is, if we allow both
we'll effectivly blow up the code by 11 + 13 bytes (11 at the call site,
13 at function entry) as opposed to 11+4 or 6+13.

Which then yields:

caller:
cmpl $0xdeadbeef, -0x4(%rax) # 7 bytes
je 1f # 2 bytes
ud2 # 2 bytes
1: call __x86_indirect_thunk_rax # 5 bytes


.align 16
.byte 0xef, 0xbe, 0xad, 0xde # 4 bytes
func:
endbr # 4 bytes
...
ret

For a combined 11+8 bytes overhead :/

Now, this setup provides:

- minimal size (please yell if there's a smaller option I missed;
s/ud2/int3/ ?)
- since the retpoline handles speculation from stuff before it, the
load-miss induced speculation is covered.
- the 'je' branch is binary, leading to either the retpoline or the
ud2, both which are a speculation stop.
- the ud2 is placed such that if the exception is non-fatal, code
execution can recover
- when IBT is present we can rewrite the thunk call to:

lfence
call *(%rax)

and rely on the WAIT-FOR-ENDBR speculation stop (also 5 bytes).
- can disable CFI by replacing the cmpl with:

jmp 1f

(or an 11 byte nop, which is just about possible). And since we
already have all retpoline thunk callsites in a section, we can
trivially find all CFI bits that are always in front it them.
- function pointer sanity


Additionally, if we ensure all direct call are +4 and only indirect
calls hit the ENDBR -- as it optimal anyway, saves on decoding ENDBR. We
can replace those ENDBR instructions of functions that should never be
indirectly called with:

ud1 0x0(%rax),%eax

which is a 4 byte #UD. This gives us the property that even on !IBT
hardware such a call will go *splat*.

Further, Andrew put in the request for __attribute__((cfi_seed(blah)))
to allow distinguishing indirect functions with otherwise identical
signature; eg. cookie = hash32(blah##signature).


Did I miss anything? Got anything wrong?

2022-02-14 21:41:46

by Sami Tolvanen

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

On Fri, Feb 11, 2022 at 5:38 AM Peter Zijlstra <[email protected]> wrote:
> I think we'll end up with something related to KCFI, but with distinct
> differences:
>
> - 32bit immediates for smaller code

Sure, I don't see issues with that. Based on a quick test with
defconfig, this reduces vmlinux size by 0.30%.

> - __kcfi_check_fail() is out for smaller code

I'm fine with adding a trap mode that's used by default, but having
more helpful diagnostics when something fails is useful even in
production systems in my experience. This change results in a vmlinux
that's another 0.92% smaller.

> Which then yields:
>
> caller:
> cmpl $0xdeadbeef, -0x4(%rax) # 7 bytes
> je 1f # 2 bytes
> ud2 # 2 bytes
> 1: call __x86_indirect_thunk_rax # 5 bytes

Note that the compiler might not emit this *exact* sequence of
instructions. For example, Clang generates this for events_sysfs_show
with the modified KCFI patch:

2274: cmpl $0x4d7bed9e,-0x4(%r11)
227c: jne 22c0 <events_sysfs_show+0x6c>
227e: call 2283 <events_sysfs_show+0x2f>
227f: R_X86_64_PLT32 __x86_indirect_thunk_r11-0x4
...
22c0: ud2

In this case the function has two indirect calls and Clang seems to
prefer to emit just one ud2.

> .align 16
> .byte 0xef, 0xbe, 0xad, 0xde # 4 bytes
> func:
> endbr # 4 bytes

Here func is no longer aligned to 16 bytes, in case that's important.

> Further, Andrew put in the request for __attribute__((cfi_seed(blah)))
> to allow distinguishing indirect functions with otherwise identical
> signature; eg. cookie = hash32(blah##signature).

Sounds reasonable.

> Did I miss anything? Got anything wrong?

How would you like to deal with the 4-byte hashes in objtool? We
either need to annotate all function symbols in the kernel, or we need
a way to distinguish the hashes from random instructions, so we can
also have functions that don't have a type hash.

Sami

2022-02-14 22:53:44

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

On Mon, Feb 14, 2022 at 01:38:18PM -0800, Sami Tolvanen wrote:
> On Fri, Feb 11, 2022 at 5:38 AM Peter Zijlstra <[email protected]> wrote:
> > I think we'll end up with something related to KCFI, but with distinct
> > differences:
> >
> > - 32bit immediates for smaller code
>
> Sure, I don't see issues with that. Based on a quick test with
> defconfig, this reduces vmlinux size by 0.30%.
>
> > - __kcfi_check_fail() is out for smaller code
>
> I'm fine with adding a trap mode that's used by default, but having
> more helpful diagnostics when something fails is useful even in
> production systems in my experience. This change results in a vmlinux
> that's another 0.92% smaller.

You can easily have the exception generate a nice warning, you can even
have it continue. You really don't need a call for that.

> > Which then yields:
> >
> > caller:
> > cmpl $0xdeadbeef, -0x4(%rax) # 7 bytes
> > je 1f # 2 bytes
> > ud2 # 2 bytes
> > 1: call __x86_indirect_thunk_rax # 5 bytes
>
> Note that the compiler might not emit this *exact* sequence of
> instructions. For example, Clang generates this for events_sysfs_show
> with the modified KCFI patch:
>
> 2274: cmpl $0x4d7bed9e,-0x4(%r11)
> 227c: jne 22c0 <events_sysfs_show+0x6c>
> 227e: call 2283 <events_sysfs_show+0x2f>
> 227f: R_X86_64_PLT32 __x86_indirect_thunk_r11-0x4
> ...
> 22c0: ud2
>
> In this case the function has two indirect calls and Clang seems to
> prefer to emit just one ud2.

That will not allow you to recover from the exception. UD2 is not an
unconditional fail. It should have an out-going edge in this case too.

Heck, most of the WARN_ON() things are UD2 instructions.

Also, you really should add a CS prefix to the retpoline thunk call if
you insist on using r11 (or any of the higher regs).

> > .align 16
> > .byte 0xef, 0xbe, 0xad, 0xde # 4 bytes
> > func:
> > endbr # 4 bytes
>
> Here func is no longer aligned to 16 bytes, in case that's important.

The idea was to have the hash and the endbr in the same cacheline.

> > Did I miss anything? Got anything wrong?
>
> How would you like to deal with the 4-byte hashes in objtool? We
> either need to annotate all function symbols in the kernel, or we need
> a way to distinguish the hashes from random instructions, so we can
> also have functions that don't have a type hash.

Easiest would be to create a special section with all the hash offsets
in I suppose. A bit like -mfentry-section=name.

2022-02-15 17:06:24

by Sami Tolvanen

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

On Mon, Feb 14, 2022 at 2:25 PM Peter Zijlstra <[email protected]> wrote:
> On Mon, Feb 14, 2022 at 01:38:18PM -0800, Sami Tolvanen wrote:
> > I'm fine with adding a trap mode that's used by default, but having
> > more helpful diagnostics when something fails is useful even in
> > production systems in my experience. This change results in a vmlinux
> > that's another 0.92% smaller.
>
> You can easily have the exception generate a nice warning, you can even
> have it continue. You really don't need a call for that.

Sure, but wouldn't that require us to generate something like
__bug_table, so we know where the CFI specific traps are?

> > In this case the function has two indirect calls and Clang seems to
> > prefer to emit just one ud2.
>
> That will not allow you to recover from the exception. UD2 is not an
> unconditional fail. It should have an out-going edge in this case too.

Yes, CFI failures are not recoverable in that code. In fact, LLVM
assumes that the llvm.trap intrinsic (i.e. ud2) never returns, but I
suppose we could just use an int3 instead. I assume that's sufficient
to stop speculation?

> Also, you really should add a CS prefix to the retpoline thunk call if
> you insist on using r11 (or any of the higher regs).

I actually didn't touch the retpoline thunk call, that's exactly the
code Clang normally generates.

> > How would you like to deal with the 4-byte hashes in objtool? We
> > either need to annotate all function symbols in the kernel, or we need
> > a way to distinguish the hashes from random instructions, so we can
> > also have functions that don't have a type hash.
>
> Easiest would be to create a special section with all the hash offsets
> in I suppose. A bit like -mfentry-section=name.

OK, I'll take a look. With 64-bit hashes I was planning to use a known
prefix, but that's not really an option with a 32-bit hash.

Sami

2022-02-15 23:53:20

by Kees Cook

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

On Tue, Feb 15, 2022 at 08:56:03AM -0800, Sami Tolvanen wrote:
> On Mon, Feb 14, 2022 at 2:25 PM Peter Zijlstra <[email protected]> wrote:
> > On Mon, Feb 14, 2022 at 01:38:18PM -0800, Sami Tolvanen wrote:
> > > I'm fine with adding a trap mode that's used by default, but having
> > > more helpful diagnostics when something fails is useful even in
> > > production systems in my experience. This change results in a vmlinux
> > > that's another 0.92% smaller.
> >
> > You can easily have the exception generate a nice warning, you can even
> > have it continue. You really don't need a call for that.
>
> Sure, but wouldn't that require us to generate something like
> __bug_table, so we know where the CFI specific traps are?

It also means the trap handler needs to do a bunch of instruction
decoding to find the address that was going to be jumped to, etc.

> > > In this case the function has two indirect calls and Clang seems to
> > > prefer to emit just one ud2.
> >
> > That will not allow you to recover from the exception. UD2 is not an
> > unconditional fail. It should have an out-going edge in this case too.
>
> Yes, CFI failures are not recoverable in that code. In fact, LLVM
> assumes that the llvm.trap intrinsic (i.e. ud2) never returns, but I
> suppose we could just use an int3 instead. I assume that's sufficient
> to stop speculation?

Peter, is there a reason you want things in the specific order of:

cmp, je-to-call, trap, call

Isn't it more run-time efficient to have an out-of-line failure of
the form:

cmp, jne-to-trap, call, ...code..., trap, jmp-to-call

I thought the static label stuff allowed the "default out of line"
option, as far as pessimizing certain states, etc? The former is certainly
code-size smaller, though, yes, but doesn't it waste space in the cache
line for the unlikely case, etc?

> > Also, you really should add a CS prefix to the retpoline thunk call if
> > you insist on using r11 (or any of the higher regs).
>
> I actually didn't touch the retpoline thunk call, that's exactly the
> code Clang normally generates.
>
> > > How would you like to deal with the 4-byte hashes in objtool? We
> > > either need to annotate all function symbols in the kernel, or we need
> > > a way to distinguish the hashes from random instructions, so we can
> > > also have functions that don't have a type hash.
> >
> > Easiest would be to create a special section with all the hash offsets
> > in I suppose. A bit like -mfentry-section=name.
>
> OK, I'll take a look. With 64-bit hashes I was planning to use a known
> prefix, but that's not really an option with a 32-bit hash.

32-bit hashes would have both code size and runtime benefits: fewer
instructions for the compare therefore a smaller set of instructions
added.

--
Kees Cook

2022-02-16 04:11:55

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

On Tue, Feb 15, 2022 at 12:03:12PM -0800, Kees Cook wrote:
> On Tue, Feb 15, 2022 at 08:56:03AM -0800, Sami Tolvanen wrote:
> > On Mon, Feb 14, 2022 at 2:25 PM Peter Zijlstra <[email protected]> wrote:
> > > On Mon, Feb 14, 2022 at 01:38:18PM -0800, Sami Tolvanen wrote:
> > > > I'm fine with adding a trap mode that's used by default, but having
> > > > more helpful diagnostics when something fails is useful even in
> > > > production systems in my experience. This change results in a vmlinux
> > > > that's another 0.92% smaller.
> > >
> > > You can easily have the exception generate a nice warning, you can even
> > > have it continue. You really don't need a call for that.
> >
> > Sure, but wouldn't that require us to generate something like
> > __bug_table, so we know where the CFI specific traps are?
>
> It also means the trap handler needs to do a bunch of instruction
> decoding to find the address that was going to be jumped to, etc.

arch/x86/kernel/alternative.c:apply_retpolines() has all that, since we
need to to know that to re-write the thunk-call.

> > > > In this case the function has two indirect calls and Clang seems to
> > > > prefer to emit just one ud2.
> > >
> > > That will not allow you to recover from the exception. UD2 is not an
> > > unconditional fail. It should have an out-going edge in this case too.
> >
> > Yes, CFI failures are not recoverable in that code. In fact, LLVM
> > assumes that the llvm.trap intrinsic (i.e. ud2) never returns, but I
> > suppose we could just use an int3 instead. I assume that's sufficient
> > to stop speculation?
>
> Peter, is there a reason you want things in the specific order of:
>
> cmp, je-to-call, trap, call
>
> Isn't it more run-time efficient to have an out-of-line failure of
> the form:
>
> cmp, jne-to-trap, call, ...code..., trap, jmp-to-call
>
> I thought the static label stuff allowed the "default out of line"
> option, as far as pessimizing certain states, etc? The former is certainly
> code-size smaller, though, yes, but doesn't it waste space in the cache
> line for the unlikely case, etc?

Mostly so that we can deduce the address of the trap from the retpoline
site, also the above has a fairly high chance of using jcc.d32 which is
actually larger than jcc.d8+ud2.

2022-02-16 06:23:14

by Joao Moreira

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

>>
>> Mostly so that we can deduce the address of the trap from the
>> retpoline
>> site, also the above has a fairly high chance of using jcc.d32 which
>> is
>> actually larger than jcc.d8+ud2.
>
> Ah, yeah, that's an interesting point.
>
> Still, I worry about finding ways to convinces Clang to emit precisely
> cmp/je/trap/call, but I guess we'll catch it immediately if it doesn't.
> :P

This can probably be done more easily/precisely if implemented directly
in the compiler's arch-specific backend. At least for x86 it wasn't a
hassle to emit a defined sequence of instructions in the past. The price
is that it will require a pass specific to each supported architecture,
but I guess this isn't that bad.

Perhaps this is discussion for a different mailing list, idk... but
just pointing that it is not a huge wall.

2022-02-16 06:33:57

by Kees Cook

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

On Tue, Feb 15, 2022 at 10:05:50PM +0100, Peter Zijlstra wrote:
> On Tue, Feb 15, 2022 at 12:03:12PM -0800, Kees Cook wrote:
> > On Tue, Feb 15, 2022 at 08:56:03AM -0800, Sami Tolvanen wrote:
> > > On Mon, Feb 14, 2022 at 2:25 PM Peter Zijlstra <[email protected]> wrote:
> > > > On Mon, Feb 14, 2022 at 01:38:18PM -0800, Sami Tolvanen wrote:
> > > > > I'm fine with adding a trap mode that's used by default, but having
> > > > > more helpful diagnostics when something fails is useful even in
> > > > > production systems in my experience. This change results in a vmlinux
> > > > > that's another 0.92% smaller.
> > > >
> > > > You can easily have the exception generate a nice warning, you can even
> > > > have it continue. You really don't need a call for that.
> > >
> > > Sure, but wouldn't that require us to generate something like
> > > __bug_table, so we know where the CFI specific traps are?
> >
> > It also means the trap handler needs to do a bunch of instruction
> > decoding to find the address that was going to be jumped to, etc.
>
> arch/x86/kernel/alternative.c:apply_retpolines() has all that, since we
> need to to know that to re-write the thunk-call.

Ah, okay, well that makes things easier. :)

> > > > > In this case the function has two indirect calls and Clang seems to
> > > > > prefer to emit just one ud2.
> > > >
> > > > That will not allow you to recover from the exception. UD2 is not an
> > > > unconditional fail. It should have an out-going edge in this case too.
> > >
> > > Yes, CFI failures are not recoverable in that code. In fact, LLVM
> > > assumes that the llvm.trap intrinsic (i.e. ud2) never returns, but I
> > > suppose we could just use an int3 instead. I assume that's sufficient
> > > to stop speculation?
> >
> > Peter, is there a reason you want things in the specific order of:
> >
> > cmp, je-to-call, trap, call
> >
> > Isn't it more run-time efficient to have an out-of-line failure of
> > the form:
> >
> > cmp, jne-to-trap, call, ...code..., trap, jmp-to-call
> >
> > I thought the static label stuff allowed the "default out of line"
> > option, as far as pessimizing certain states, etc? The former is certainly
> > code-size smaller, though, yes, but doesn't it waste space in the cache
> > line for the unlikely case, etc?
>
> Mostly so that we can deduce the address of the trap from the retpoline
> site, also the above has a fairly high chance of using jcc.d32 which is
> actually larger than jcc.d8+ud2.

Ah, yeah, that's an interesting point.

Still, I worry about finding ways to convinces Clang to emit precisely
cmp/je/trap/call, but I guess we'll catch it immediately if it doesn't.
:P

--
Kees Cook

2022-02-16 06:38:34

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

On Tue, Feb 15, 2022 at 08:56:03AM -0800, Sami Tolvanen wrote:
> On Mon, Feb 14, 2022 at 2:25 PM Peter Zijlstra <[email protected]> wrote:
> > On Mon, Feb 14, 2022 at 01:38:18PM -0800, Sami Tolvanen wrote:
> > > I'm fine with adding a trap mode that's used by default, but having
> > > more helpful diagnostics when something fails is useful even in
> > > production systems in my experience. This change results in a vmlinux
> > > that's another 0.92% smaller.
> >
> > You can easily have the exception generate a nice warning, you can even
> > have it continue. You really don't need a call for that.
>
> Sure, but wouldn't that require us to generate something like
> __bug_table, so we know where the CFI specific traps are?

Yes, but since you're going to emit a retpoline, and objtool already
generates a list of retpoline sites, we can abuse that. If the
instruction after the trap is a retpoline site, we a CFI fail.

> > > In this case the function has two indirect calls and Clang seems to
> > > prefer to emit just one ud2.
> >
> > That will not allow you to recover from the exception. UD2 is not an
> > unconditional fail. It should have an out-going edge in this case too.
>
> Yes, CFI failures are not recoverable in that code. In fact, LLVM
> assumes that the llvm.trap intrinsic (i.e. ud2) never returns, but I
> suppose we could just use an int3 instead. I assume that's sufficient
> to stop speculation?

It is. int3 is also smaller by one byte, but the exception is already
fairly complicated, but I can see if we can make it work.

> > Also, you really should add a CS prefix to the retpoline thunk call if
> > you insist on using r11 (or any of the higher regs).
>
> I actually didn't touch the retpoline thunk call, that's exactly the
> code Clang normally generates.

Yeah, and it needs help, also see:

https://lore.kernel.org/lkml/[email protected]/

Specifically, clang needs to:

- CS prefix stuff the retpoline thunk call/jmp;
- stop doing conditional indirect tail-calls.

2022-02-16 06:47:03

by Joao Moreira

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

On 2022-02-11 05:38, Peter Zijlstra wrote:
> On Tue, Feb 08, 2022 at 09:18:44PM -0800, Joao Moreira wrote:
>> > Ah, excellent, thanks for the pointers. There's also this in the works:
>> > https://reviews.llvm.org/D119296 (a new CFI mode, designed to play nice
>> > to objtool, IBT, etc.)
>>
>> Oh, great! Thanks for pointing it out. I guess I saw something with a
>> similar name before ;)
>> https://www.blackhat.com/docs/asia-17/materials/asia-17-Moreira-Drop-The-Rop-Fine-Grained-Control-Flow-Integrity-For-The-Linux-Kernel.pdf
>>
>> Jokes aside (and perhaps questions more targeted to Sami), from a
>> diagonal
>> look it seems that this follows the good old tag approach proposed by
>> PaX/grsecurity, right? If this is the case, should I assume it could
>> also
>> benefit from features like -mibt-seal? Also are you considering that
>> perhaps
>> we can use alternatives to flip different CFI instrumentation as
>> suggested
>> by PeterZ in another thread?
>
> So, lets try and recap things from IRC yesterday. There's a whole bunch
> of things intertwining making indirect branches 'interesting'. Most of
> which I've not seen mentioned in Sami's KCFI proposal which makes it
> rather pointless.
>
> I think we'll end up with something related to KCFI, but with distinct
> differences:
>
> - 32bit immediates for smaller code
> - __kcfi_check_fail() is out for smaller code
> - it must interact with IBT/BTI and retpolines
> - we must be very careful with speculation.
>
> Right, so because !IBT-CFI needs the check at the call site, like:
>
> caller:
> cmpl $0xdeadbeef, -0x4(%rax) # 7 bytes
> je 1f # 2 bytes
> ud2 # 2 bytes
> 1: call __x86_indirect_thunk_rax # 5 bytes
>
>
> .align 16
> .byte 0xef, 0xbe, 0xad, 0xde # 4 bytes
> func:
> ...
> ret
>
>
> While FineIBT has them at the landing site:
>
> caller:
> movl $0xdeadbeef, %r11d # 6 bytes
> call __x86_indirect_thunk_rax # 5 bytes
>
>
> .align 16
> func:
> endbr # 4 bytes
> cmpl $0xdeadbeef, %r11d # 7 bytes
> je 1f # 2 bytes
> ud2 # 2 bytes
> 1: ...
> ret
>
>
> It seems to me that always doing the check at the call-site is simpler,
> since it avoids code-bloat and patching work. That is, if we allow both
> we'll effectivly blow up the code by 11 + 13 bytes (11 at the call
> site,
> 13 at function entry) as opposed to 11+4 or 6+13.
>
> Which then yields:
>
> caller:
> cmpl $0xdeadbeef, -0x4(%rax) # 7 bytes
> je 1f # 2 bytes
> ud2 # 2 bytes
> 1: call __x86_indirect_thunk_rax # 5 bytes
>
>
> .align 16
> .byte 0xef, 0xbe, 0xad, 0xde # 4 bytes
> func:
> endbr # 4 bytes
> ...
> ret
>
> For a combined 11+8 bytes overhead :/
>
> Now, this setup provides:
>
> - minimal size (please yell if there's a smaller option I missed;
> s/ud2/int3/ ?)
> - since the retpoline handles speculation from stuff before it, the
> load-miss induced speculation is covered.
> - the 'je' branch is binary, leading to either the retpoline or the
> ud2, both which are a speculation stop.
> - the ud2 is placed such that if the exception is non-fatal, code
> execution can recover
> - when IBT is present we can rewrite the thunk call to:
>
> lfence
> call *(%rax)
>
> and rely on the WAIT-FOR-ENDBR speculation stop (also 5 bytes).
> - can disable CFI by replacing the cmpl with:
>
> jmp 1f
>
> (or an 11 byte nop, which is just about possible). And since we
> already have all retpoline thunk callsites in a section, we can
> trivially find all CFI bits that are always in front it them.
> - function pointer sanity
>

Agreed that it is sensible to support CFI for CPUs without CET. KCFI is
a win.
Yet, I still think that we should support FineIBT and there are a few
reasons
for that (other than hopeful performance benefits).

- KCFI is more prone to the presence of unintended allowed targets.
Since the
IBT scheme always rely on the same watermark tag (the ENDBR
instruction), it
is easier to prevent these from being emitted by JIT/compilers (fwiiw,
see
https://reviews.llvm.org/rGf385823e04f300c92ec03dbd660d621cc618a271 and
https://dl.acm.org/doi/10.1145/3337167.3337175).

- Regarding the space overhead, I can try to verify if FineIBT can be
feasibly
reshaped into:

caller:
movl $0xdeadbeef,%r10 # 6 bytes
sub $0x5,%r11 # 4 bytes
call *(%r11) # 5 bytes

.align 16
endbr # 4 bytes
call __x86_fineibt_thunk_deadbeef # 5 bytes
func+4:
...
ret

__x86_fineibt_thunk_deadbeef:
xor $0xdeadbeef, %r10
je 1f
ud2
1:
retq

This scheme would require less space and less runtime patching. We would
need
one additional byte on callees to patch both the ENDBR and the 5-byte
call
instruction, plus space to hold the thunks somewhere else. The thunks
overhead
is then dilluted across the functions belonging to the same equivalence
set,
as these will use the same thunk. On a quick analysis over defconfig, it
seems
that the number of equivalence sets are ~25% of the number of functions
(thus,
space overhead is cut to a fourth). I guess this would only require the
KCFI
compiler support to emit an additional "0xcc" before the tag.

Thunks can also be placed in a special section and dropped during boot
to
reduce the runtime memory footprint.

What do you think, is this worth investigating?

Also, in my understanding, r11 does not need to be preserved as,
per-ABI, it
is a scratch register. So there is no need to add $0x5,%r11 back after
the
call. Let me know if I'm mistaken.

>
> Additionally, if we ensure all direct call are +4 and only indirect
> calls hit the ENDBR -- as it optimal anyway, saves on decoding ENDBR.
> We
> can replace those ENDBR instructions of functions that should never be
> indirectly called with:
>
> ud1 0x0(%rax),%eax
>
> which is a 4 byte #UD. This gives us the property that even on !IBT
> hardware such a call will go *splat*.

Given that the space overhead is a big concern, isn't it better to use
the
compiler support to prevent ENDBRs in the first place? Patching #UD is
kinda
cool, yeah, but I don't see it providing meaningful security guarantees
over
not having the ENDBRs emitted at all.

>
> Further, Andrew put in the request for __attribute__((cfi_seed(blah)))
> to allow distinguishing indirect functions with otherwise identical
> signature; eg. cookie = hash32(blah##signature).
>
>
> Did I miss anything? Got anything wrong?

I understand that Today there are not too many CET-enabled CPUs out
there,
and that the benefits might be a bit blurred currently. Yet, given that
this
is something that should continuously change with time, would you object
to
FineIBT as a build option, i.e., something which might not be supported
by
alternatives when using defconfig?

2022-02-16 12:30:34

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

On Tue, Feb 15, 2022 at 10:05:50PM +0100, Peter Zijlstra wrote:

> > Peter, is there a reason you want things in the specific order of:
> >
> > cmp, je-to-call, trap, call
> >
> > Isn't it more run-time efficient to have an out-of-line failure of
> > the form:
> >
> > cmp, jne-to-trap, call, ...code..., trap, jmp-to-call
> >
> > I thought the static label stuff allowed the "default out of line"
> > option, as far as pessimizing certain states, etc? The former is certainly
> > code-size smaller, though, yes, but doesn't it waste space in the cache
> > line for the unlikely case, etc?
>
> Mostly so that we can deduce the address of the trap from the retpoline
> site, also the above has a fairly high chance of using jcc.d32 which is
> actually larger than jcc.d8+ud2.

Also; and I think I mentioned this a few emails back, by having the
whole CFI thing as a single range of bytes it becomes easier to patch.

2022-03-02 09:20:19

by Joao Moreira

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

On 2022-03-01 19:06, Peter Collingbourne wrote:
> Hi Peter,
>
> One issue with this call sequence is that:
>
> On Fri, Feb 11, 2022 at 02:38:03PM +0100, Peter Zijlstra wrote:
>> caller:
>> cmpl $0xdeadbeef, -0x4(%rax) # 7 bytes
>
> Because this instruction ends in the constant 0xdeadbeef, it may
> be used as a "gadget" that would effectively allow branching to an
> arbitrary address in %rax if the attacker can arrange to set ZF=1.
>
>> je 1f # 2 bytes
>> ud2 # 2 bytes
>> 1: call __x86_indirect_thunk_rax # 5 bytes
>>
>>
>> .align 16
>> .byte 0xef, 0xbe, 0xad, 0xde # 4 bytes
>> func:
>> endbr # 4 bytes
>> ...
>> ret
>
> I think we can avoid this problem with a slight tweak to your
> instruction sequence, at the cost of 2 bytes per function prologue.
> First, change the call sequence like so:
>
> cmpl $0xdeadbeef, -0x6(%rax) # 6 bytes
> je 1f # 2 bytes
> ud2 # 2 bytes
> 1: call __x86_indirect_thunk_rax # 5 bytes
>
> The key difference is that we've changed 0x4 to 0x6.
>
> Then change the function prologue to this:
>
> .align 16
> .byte 0xef, 0xbe, 0xad, 0xde # 4 bytes
> .zero 2 # 2 bytes
> func:
>
> The end result of the above is that the constant embedded in the cmpl
> instruction may only be used to reach the following ud2 instruction,
> which will "harmlessly" terminate execution in the same way as if
> the prologue signature did not match.

FWIIW, this makes sense IMHO. These additional pre-prologue bytes are
also what might be needed to enable the smaller version of FineIBT that
I suggested in this thread some time ago.

2022-03-02 23:11:56

by Peter Collingbourne

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

Hi Peter,

One issue with this call sequence is that:

On Fri, Feb 11, 2022 at 02:38:03PM +0100, Peter Zijlstra wrote:
> caller:
> cmpl $0xdeadbeef, -0x4(%rax) # 7 bytes

Because this instruction ends in the constant 0xdeadbeef, it may
be used as a "gadget" that would effectively allow branching to an
arbitrary address in %rax if the attacker can arrange to set ZF=1.

> je 1f # 2 bytes
> ud2 # 2 bytes
> 1: call __x86_indirect_thunk_rax # 5 bytes
>
>
> .align 16
> .byte 0xef, 0xbe, 0xad, 0xde # 4 bytes
> func:
> endbr # 4 bytes
> ...
> ret

I think we can avoid this problem with a slight tweak to your
instruction sequence, at the cost of 2 bytes per function prologue.
First, change the call sequence like so:

cmpl $0xdeadbeef, -0x6(%rax) # 6 bytes
je 1f # 2 bytes
ud2 # 2 bytes
1: call __x86_indirect_thunk_rax # 5 bytes

The key difference is that we've changed 0x4 to 0x6.

Then change the function prologue to this:

.align 16
.byte 0xef, 0xbe, 0xad, 0xde # 4 bytes
.zero 2 # 2 bytes
func:

The end result of the above is that the constant embedded in the cmpl
instruction may only be used to reach the following ud2 instruction,
which will "harmlessly" terminate execution in the same way as if
the prologue signature did not match.

Peter

2022-06-08 18:19:23

by Fangrui Song

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

Hi Peter,

On Tue, Mar 1, 2022 at 7:06 PM Peter Collingbourne <[email protected]> wrote:
>
> Hi Peter,
> One issue with this call sequence is that:
>
> On Fri, Feb 11, 2022 at 02:38:03PM +0100, Peter Zijlstra wrote:
> > caller:
> > cmpl $0xdeadbeef, -0x4(%rax) # 7 bytes
>
> Because this instruction ends in the constant 0xdeadbeef, it may
> be used as a "gadget" that would effectively allow branching to an
> arbitrary address in %rax if the attacker can arrange to set ZF=1.

Do you mind elaborating how this instruction can be used as a gadget?
How does it look like?

The information will be useful to the summary of Sami's KCFI LLVM
patch: https://reviews.llvm.org/D119296

> > je 1f # 2 bytes
> > ud2 # 2 bytes
> > 1: call __x86_indirect_thunk_rax # 5 bytes
> >
> >
> > .align 16
> > .byte 0xef, 0xbe, 0xad, 0xde # 4 bytes
> > func:
> > endbr # 4 bytes
> > ...
> > ret
>
> I think we can avoid this problem with a slight tweak to your
> instruction sequence, at the cost of 2 bytes per function prologue.
> First, change the call sequence like so:
>
> cmpl $0xdeadbeef, -0x6(%rax) # 6 bytes
> je 1f # 2 bytes
> ud2 # 2 bytes
> 1: call __x86_indirect_thunk_rax # 5 bytes
>
> The key difference is that we've changed 0x4 to 0x6.
>
> Then change the function prologue to this:
>
> .align 16
> .byte 0xef, 0xbe, 0xad, 0xde # 4 bytes
> .zero 2 # 2 bytes
> func:
>
> The end result of the above is that the constant embedded in the cmpl
> instruction may only be used to reach the following ud2 instruction,
> which will "harmlessly" terminate execution in the same way as if
> the prologue signature did not match.
>
> Peter
>


--
宋方睿

2022-06-09 00:40:32

by Sami Tolvanen

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

On Wed, Jun 8, 2022 at 10:53 AM Fāng-ruì Sòng <[email protected]> wrote:
>
> Hi Peter,
>
> On Tue, Mar 1, 2022 at 7:06 PM Peter Collingbourne <[email protected]> wrote:
> >
> > Hi Peter,
> > One issue with this call sequence is that:
> >
> > On Fri, Feb 11, 2022 at 02:38:03PM +0100, Peter Zijlstra wrote:
> > > caller:
> > > cmpl $0xdeadbeef, -0x4(%rax) # 7 bytes
> >
> > Because this instruction ends in the constant 0xdeadbeef, it may
> > be used as a "gadget" that would effectively allow branching to an
> > arbitrary address in %rax if the attacker can arrange to set ZF=1.
>
> Do you mind elaborating how this instruction can be used as a gadget?
> How does it look like?

With the offset of -4, the je instruction here can be an indirect call
target because it's preceded by a valid type hash at the end of the
cmpl instruction. If we change the offset to -6, only the ud2
instruction is a potential call target in this sequence, which will be
less useful to an attacker.

> The information will be useful to the summary of Sami's KCFI LLVM
> patch: https://reviews.llvm.org/D119296

I'll add more information about the X86 preamble to the description.

Sami