2021-04-16 22:00:46

by Sami Tolvanen

[permalink] [raw]
Subject: [PATCH 00/15] x86: Add support for Clang CFI

This series adds support for Clang's Control-Flow Integrity (CFI)
checking for x86_64. With CFI, the compiler injects a runtime check
before each indirect function call to ensure the target is a valid
function with the correct static type. This restricts possible call
targets and makes it more difficult for an attacker to exploit bugs
that allow the modification of stored function pointers. For more
details, see:

https://clang.llvm.org/docs/ControlFlowIntegrity.html

The first two patches contain objtool support for CFI, and the
remaining patches disable CFI where it shouldn't be used and fix
other smaller issues, such as type conflicts that confuse the
compiler.

Note that the patches are based on next-20210416. You can also pull
the series from

https://github.com/samitolvanen/linux.git x86-cfi-v1


Kees Cook (3):
x86/extable: Do not mark exception callback as CFI
x86/alternatives: Use C int3 selftest but disable KASAN
x86, relocs: Ignore __typeid__ relocations

Sami Tolvanen (12):
objtool: Find a destination for jumps beyond the section end
objtool: Add CONFIG_CFI_CLANG support
objtool: Add ASM_STACK_FRAME_NON_STANDARD
static_call: Use global functions for the self-test
x86: Implement function_nocfi
x86: Avoid CFI jump tables in IDT and entry points
x86/ftrace: Use function_nocfi in MCOUNT_ADDR
x86/purgatory: Disable CFI
x86, module: Ignore __typeid__ relocations
x86, cpu: Use LTO for cpu.c with CFI
x86, kprobes: Fix optprobe_template_func type mismatch
x86, build: Allow CONFIG_CFI_CLANG to be selected

arch/x86/Kconfig | 1 +
arch/x86/include/asm/desc.h | 8 ++++-
arch/x86/include/asm/ftrace.h | 2 +-
arch/x86/include/asm/page.h | 14 +++++++++
arch/x86/kernel/Makefile | 3 ++
arch/x86/kernel/alternative.c | 21 +++----------
arch/x86/kernel/cpu/common.c | 8 ++---
arch/x86/kernel/idt.c | 2 +-
arch/x86/kernel/kprobes/opt.c | 4 +--
arch/x86/kernel/module.c | 4 +++
arch/x86/kernel/traps.c | 2 +-
arch/x86/mm/extable.c | 1 +
arch/x86/power/Makefile | 2 ++
arch/x86/purgatory/Makefile | 2 +-
arch/x86/tools/relocs.c | 7 +++++
arch/x86/xen/Makefile | 2 ++
include/linux/objtool.h | 5 +++
kernel/static_call.c | 4 +--
tools/include/linux/objtool.h | 5 +++
tools/objtool/check.c | 4 +++
tools/objtool/elf.c | 48 +++++++++++++++++++++++++++++
tools/objtool/include/objtool/elf.h | 2 +-
22 files changed, 119 insertions(+), 32 deletions(-)


base-commit: 18250b538735142307082e4e99e3ae5c12d44013
--
2.31.1.368.gbe11c130af-goog


2021-04-16 22:00:47

by Sami Tolvanen

[permalink] [raw]
Subject: [PATCH 04/15] static_call: Use global functions for the self-test

With CONFIG_CFI_CLANG, the compiler renames static functions. This
breaks static_call users using static functions, because the current
implementation assumes functions have stable names by hardcoding them
in inline assembly. Make the self-test functions global to prevent the
compiler from renaming them.

Signed-off-by: Sami Tolvanen <[email protected]>
---
kernel/static_call.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/static_call.c b/kernel/static_call.c
index 723fcc9d20db..d09f500c2d2a 100644
--- a/kernel/static_call.c
+++ b/kernel/static_call.c
@@ -503,12 +503,12 @@ long __static_call_return0(void)

#ifdef CONFIG_STATIC_CALL_SELFTEST

-static int func_a(int x)
+int func_a(int x)
{
return x+1;
}

-static int func_b(int x)
+int func_b(int x)
{
return x+2;
}
--
2.31.1.368.gbe11c130af-goog

2021-04-16 22:01:14

by Sami Tolvanen

[permalink] [raw]
Subject: [PATCH 02/15] objtool: Add CONFIG_CFI_CLANG support

With CONFIG_CFI_CLANG, the compiler replaces function references with
references to the CFI jump table, which confuses objtool. This change,
based on Josh's initial patch [1], goes through the list of relocations
and replaces jump table symbols with the actual function symbols.

[1] https://lore.kernel.org/r/d[email protected]redhat.com/

Reported-by: Sedat Dilek <[email protected]>
Suggested-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Sami Tolvanen <[email protected]>
---
tools/objtool/elf.c | 48 +++++++++++++++++++++++++++++
tools/objtool/include/objtool/elf.h | 2 +-
2 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
index d08f5f3670f8..5cf2c61ce7b8 100644
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -273,6 +273,10 @@ static int read_sections(struct elf *elf)
}
sec->len = sec->sh.sh_size;

+ /* Detect -fsanitize=cfi jump table sections */
+ if (!strncmp(sec->name, ".text..L.cfi.jumptable", 22))
+ sec->cfi_jt = true;
+
list_add_tail(&sec->list, &elf->sections);
elf_hash_add(elf->section_hash, &sec->hash, sec->idx);
elf_hash_add(elf->section_name_hash, &sec->name_hash, str_hash(sec->name));
@@ -548,6 +552,48 @@ static int read_rela_reloc(struct section *sec, int i, struct reloc *reloc, unsi
return 0;
}

+static int fix_cfi_relocs(const struct elf *elf)
+{
+ struct section *sec, *tmpsec;
+ struct reloc *reloc, *tmpreloc;
+
+ list_for_each_entry_safe(sec, tmpsec, &elf->sections, list) {
+ list_for_each_entry_safe(reloc, tmpreloc, &sec->reloc_list, list) {
+ struct symbol *sym;
+ struct reloc *func_reloc;
+
+ /*
+ * CONFIG_CFI_CLANG replaces function relocations to refer
+ * to an intermediate jump table. Undo the conversion so
+ * objtool can make sense of things.
+ */
+ if (!reloc->sym->sec->cfi_jt)
+ continue;
+
+ if (reloc->sym->type == STT_SECTION)
+ sym = find_func_by_offset(reloc->sym->sec,
+ reloc->addend);
+ else
+ sym = reloc->sym;
+
+ if (!sym || !sym->sec)
+ continue;
+
+ /*
+ * The jump table immediately jumps to the actual function,
+ * so look up the relocation there.
+ */
+ func_reloc = find_reloc_by_dest_range(elf, sym->sec, sym->offset, 5);
+ if (!func_reloc || !func_reloc->sym)
+ continue;
+
+ reloc->sym = func_reloc->sym;
+ }
+ }
+
+ return 0;
+}
+
static int read_relocs(struct elf *elf)
{
struct section *sec;
@@ -608,6 +654,8 @@ static int read_relocs(struct elf *elf)
tot_reloc += nr_reloc;
}

+ fix_cfi_relocs(elf);
+
if (stats) {
printf("max_reloc: %lu\n", max_reloc);
printf("tot_reloc: %lu\n", tot_reloc);
diff --git a/tools/objtool/include/objtool/elf.h b/tools/objtool/include/objtool/elf.h
index 45e5ede363b0..ef19578fc5e4 100644
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -39,7 +39,7 @@ struct section {
char *name;
int idx;
unsigned int len;
- bool changed, text, rodata, noinstr;
+ bool changed, text, rodata, noinstr, cfi_jt;
};

struct symbol {
--
2.31.1.368.gbe11c130af-goog

2021-04-16 22:01:35

by Sami Tolvanen

[permalink] [raw]
Subject: [PATCH 05/15] x86: Implement function_nocfi

With CONFIG_CFI_CLANG, the compiler replaces function addresses in
instrumented C code with jump table addresses. This change implements
the function_nocfi() macro, which returns the actual function address
instead.

Signed-off-by: Sami Tolvanen <[email protected]>
---
arch/x86/include/asm/page.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 7555b48803a8..5499a05c44fc 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -71,6 +71,20 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
extern bool __virt_addr_valid(unsigned long kaddr);
#define virt_addr_valid(kaddr) __virt_addr_valid((unsigned long) (kaddr))

+#ifdef CONFIG_CFI_CLANG
+/*
+ * With CONFIG_CFI_CLANG, the compiler replaces function address
+ * references with the address of the function's CFI jump table
+ * entry. The function_nocfi macro always returns the address of the
+ * actual function instead.
+ */
+#define function_nocfi(x) ({ \
+ void *addr; \
+ asm("leaq " __stringify(x) "(%%rip), %0\n\t" : "=r" (addr)); \
+ addr; \
+})
+#endif
+
#endif /* __ASSEMBLY__ */

#include <asm-generic/memory_model.h>
--
2.31.1.368.gbe11c130af-goog

2021-04-16 22:01:54

by Sami Tolvanen

[permalink] [raw]
Subject: [PATCH 08/15] x86/extable: Do not mark exception callback as CFI

From: Kees Cook <[email protected]>

The exception table entries are constructed out of a relative offset
and point to the actual function, not the CFI table entry. For now,
just mark the caller as not checking CFI. The failure is most visible
at boot with CONFIG_DEBUG_RODATA_TEST=y.

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Sami Tolvanen <[email protected]>
---
arch/x86/mm/extable.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index b93d6cd08a7f..a7eae1c4c59f 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -155,6 +155,7 @@ enum handler_type ex_get_fault_handler_type(unsigned long ip)
return EX_HANDLER_OTHER;
}

+__nocfi
int fixup_exception(struct pt_regs *regs, int trapnr, unsigned long error_code,
unsigned long fault_addr)
{
--
2.31.1.368.gbe11c130af-goog

2021-04-16 22:02:17

by Sami Tolvanen

[permalink] [raw]
Subject: [PATCH 10/15] x86/purgatory: Disable CFI

Disable CONFIG_CFI_CLANG for the stand-alone purgatory.ro.

Signed-off-by: Sami Tolvanen <[email protected]>
---
arch/x86/purgatory/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 95ea17a9d20c..ed46ad780130 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -31,7 +31,7 @@ KCOV_INSTRUMENT := n
# These are adjustments to the compiler flags used for objects that
# make up the standalone purgatory.ro

-PURGATORY_CFLAGS_REMOVE := -mcmodel=kernel
+PURGATORY_CFLAGS_REMOVE := -mcmodel=kernel $(CC_FLAGS_CFI)
PURGATORY_CFLAGS := -mcmodel=large -ffreestanding -fno-zero-initialized-in-bss -g0
PURGATORY_CFLAGS += $(DISABLE_STACKLEAK_PLUGIN) -DDISABLE_BRANCH_PROFILING
PURGATORY_CFLAGS += -fno-stack-protector
--
2.31.1.368.gbe11c130af-goog

2021-04-16 22:02:31

by Sami Tolvanen

[permalink] [raw]
Subject: [PATCH 03/15] objtool: Add ASM_STACK_FRAME_NON_STANDARD

To use the STACK_FRAME_NON_STANDARD macro for a static symbol
defined in inline assembly, we need a C declaration that implies
global visibility. This type mismatch confuses the compiler with
CONFIG_CFI_CLANG. This change adds an inline assembly version of
the macro to avoid the issue.

Signed-off-by: Sami Tolvanen <[email protected]>
---
include/linux/objtool.h | 5 +++++
tools/include/linux/objtool.h | 5 +++++
2 files changed, 10 insertions(+)

diff --git a/include/linux/objtool.h b/include/linux/objtool.h
index 7e72d975cb76..2f29ce48ab5f 100644
--- a/include/linux/objtool.h
+++ b/include/linux/objtool.h
@@ -66,6 +66,11 @@ struct unwind_hint {
static void __used __section(".discard.func_stack_frame_non_standard") \
*__func_stack_frame_non_standard_##func = func

+#define ASM_STACK_FRAME_NON_STANDARD(func) \
+ ".pushsection .discard.func_stack_frame_non_standard, \"aw\"\n" \
+ ".long " __stringify(func) " - .\n" \
+ ".popsection\n"
+
#else /* __ASSEMBLY__ */

/*
diff --git a/tools/include/linux/objtool.h b/tools/include/linux/objtool.h
index 7e72d975cb76..2f29ce48ab5f 100644
--- a/tools/include/linux/objtool.h
+++ b/tools/include/linux/objtool.h
@@ -66,6 +66,11 @@ struct unwind_hint {
static void __used __section(".discard.func_stack_frame_non_standard") \
*__func_stack_frame_non_standard_##func = func

+#define ASM_STACK_FRAME_NON_STANDARD(func) \
+ ".pushsection .discard.func_stack_frame_non_standard, \"aw\"\n" \
+ ".long " __stringify(func) " - .\n" \
+ ".popsection\n"
+
#else /* __ASSEMBLY__ */

/*
--
2.31.1.368.gbe11c130af-goog

2021-04-16 22:02:33

by Sami Tolvanen

[permalink] [raw]
Subject: [PATCH 06/15] x86: Avoid CFI jump tables in IDT and entry points

With CONFIG_CFI_CLANG, the compiler replaces function addresses in C
code with jump table addresses. To avoid referring to jump tables in
entry code with PTI, disable CFI for IDT and paravirt code, and use
function_nocfi() to prevent jump table addresses from being added to
the IDT or system call entry points.

Reported-by: Sedat Dilek <[email protected]>
Signed-off-by: Sami Tolvanen <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
---
arch/x86/include/asm/desc.h | 8 +++++++-
arch/x86/kernel/Makefile | 3 +++
arch/x86/kernel/cpu/common.c | 8 ++++----
arch/x86/kernel/idt.c | 2 +-
arch/x86/kernel/traps.c | 2 +-
arch/x86/xen/Makefile | 2 ++
6 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index 476082a83d1c..96666256eec2 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -381,7 +381,13 @@ static inline void set_desc_limit(struct desc_struct *desc, unsigned long limit)
desc->limit1 = (limit >> 16) & 0xf;
}

-void alloc_intr_gate(unsigned int n, const void *addr);
+/*
+ * Use function_nocfi() to prevent the compiler from replacing function
+ * addresses with jump table addresses when CONFIG_CFI_CLANG is enabled.
+ */
+#define alloc_intr_gate(n, addr) __alloc_intr_gate(n, function_nocfi(addr))
+
+void __alloc_intr_gate(unsigned int n, const void *addr);

static inline void init_idt_data(struct idt_data *data, unsigned int n,
const void *addr)
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 0704c2a94272..12a739eb208e 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -46,6 +46,9 @@ endif
# non-deterministic coverage.
KCOV_INSTRUMENT := n

+CFLAGS_REMOVE_idt.o := $(CC_FLAGS_CFI)
+CFLAGS_REMOVE_paravirt.o := $(CC_FLAGS_CFI)
+
CFLAGS_head$(BITS).o += -fno-stack-protector

CFLAGS_irq.o := -I $(srctree)/$(src)/../include/asm/trace
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 6bdb69a9a7dc..e6cff98b182a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1752,10 +1752,10 @@ DEFINE_PER_CPU(unsigned long, cpu_current_top_of_stack) = TOP_OF_INIT_STACK;
void syscall_init(void)
{
wrmsr(MSR_STAR, 0, (__USER32_CS << 16) | __KERNEL_CS);
- wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64);
+ wrmsrl(MSR_LSTAR, (unsigned long)function_nocfi(entry_SYSCALL_64));

#ifdef CONFIG_IA32_EMULATION
- wrmsrl(MSR_CSTAR, (unsigned long)entry_SYSCALL_compat);
+ wrmsrl(MSR_CSTAR, (unsigned long)function_nocfi(entry_SYSCALL_compat));
/*
* This only works on Intel CPUs.
* On AMD CPUs these MSRs are 32-bit, CPU truncates MSR_IA32_SYSENTER_EIP.
@@ -1765,9 +1765,9 @@ void syscall_init(void)
wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)__KERNEL_CS);
wrmsrl_safe(MSR_IA32_SYSENTER_ESP,
(unsigned long)(cpu_entry_stack(smp_processor_id()) + 1));
- wrmsrl_safe(MSR_IA32_SYSENTER_EIP, (u64)entry_SYSENTER_compat);
+ wrmsrl_safe(MSR_IA32_SYSENTER_EIP, (u64)function_nocfi(entry_SYSENTER_compat));
#else
- wrmsrl(MSR_CSTAR, (unsigned long)ignore_sysret);
+ wrmsrl(MSR_CSTAR, (unsigned long)function_nocfi(ignore_sysret));
wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)GDT_ENTRY_INVALID_SEG);
wrmsrl_safe(MSR_IA32_SYSENTER_ESP, 0ULL);
wrmsrl_safe(MSR_IA32_SYSENTER_EIP, 0ULL);
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index d552f177eca0..6574a742e373 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -340,7 +340,7 @@ void idt_invalidate(void *addr)
load_idt(&idt);
}

-void __init alloc_intr_gate(unsigned int n, const void *addr)
+void __init __alloc_intr_gate(unsigned int n, const void *addr)
{
if (WARN_ON(n < FIRST_SYSTEM_VECTOR))
return;
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 853ea7a80806..54616dc49116 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -403,7 +403,7 @@ DEFINE_IDTENTRY_DF(exc_double_fault)
* which is what the stub expects, given that the faulting
* RIP will be the IRET instruction.
*/
- regs->ip = (unsigned long)asm_exc_general_protection;
+ regs->ip = (unsigned long)function_nocfi(asm_exc_general_protection);
regs->sp = (unsigned long)&gpregs->orig_ax;

return;
diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index 40b5779fce21..61e2d9b2fa43 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -11,6 +11,8 @@ endif
CFLAGS_enlighten_pv.o := -fno-stack-protector
CFLAGS_mmu_pv.o := -fno-stack-protector

+CFLAGS_REMOVE_enlighten_pv.o := $(CC_FLAGS_CFI)
+
obj-y += enlighten.o
obj-y += mmu.o
obj-y += time.o
--
2.31.1.368.gbe11c130af-goog

2021-04-16 22:02:44

by Sami Tolvanen

[permalink] [raw]
Subject: [PATCH 11/15] x86, relocs: Ignore __typeid__ relocations

From: Kees Cook <[email protected]>

The __typeid__* symbols aren't actually relocations, so they can be
ignored during relocation generation.

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Sami Tolvanen <[email protected]>
---
arch/x86/tools/relocs.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index 04c5a44b9682..78516ccea0c8 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -48,6 +48,7 @@ static const char * const sym_regex_kernel[S_NSYMTYPES] = {
"^(xen_irq_disable_direct_reloc$|"
"xen_save_fl_direct_reloc$|"
"VDSO|"
+ "__typeid__|"
"__crc_)",

/*
@@ -808,6 +809,12 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
symname);
break;

+ case R_X86_64_8:
+ if (!shn_abs || !is_reloc(S_ABS, symname))
+ die("Non-whitelisted %s relocation: %s\n",
+ rel_type(r_type), symname);
+ break;
+
case R_X86_64_32:
case R_X86_64_32S:
case R_X86_64_64:
--
2.31.1.368.gbe11c130af-goog

2021-04-16 22:03:10

by Sami Tolvanen

[permalink] [raw]
Subject: [PATCH 13/15] x86, cpu: Use LTO for cpu.c with CFI

Allow LTO to be used for cpu.c when CONFIG_CFI_CLANG is enabled to avoid
indirect call failures. CFI requires Clang >= 12, which doesn't have the
stack protector inlining bug.

Signed-off-by: Sami Tolvanen <[email protected]>
---
arch/x86/power/Makefile | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/power/Makefile b/arch/x86/power/Makefile
index 379777572bc9..a0532851fed7 100644
--- a/arch/x86/power/Makefile
+++ b/arch/x86/power/Makefile
@@ -4,9 +4,11 @@
# itself be stack-protected
CFLAGS_cpu.o := -fno-stack-protector

+ifndef CONFIG_CFI_CLANG
# Clang may incorrectly inline functions with stack protector enabled into
# __restore_processor_state(): https://bugs.llvm.org/show_bug.cgi?id=47479
CFLAGS_REMOVE_cpu.o := $(CC_FLAGS_LTO)
+endif

obj-$(CONFIG_PM_SLEEP) += cpu.o
obj-$(CONFIG_HIBERNATION) += hibernate_$(BITS).o hibernate_asm_$(BITS).o hibernate.o
--
2.31.1.368.gbe11c130af-goog

2021-04-16 22:03:14

by Sami Tolvanen

[permalink] [raw]
Subject: [PATCH 14/15] x86, kprobes: Fix optprobe_template_func type mismatch

The optprobe_template_func symbol is defined in inline assembly,
but it's not marked global, which conflicts with the C declaration
needed for STACK_FRAME_NON_STANDARD and confuses the compiler when
CONFIG_CFI_CLANG is enabled.

Marking the symbol global would make the compiler happy, but as the
compiler also generates a CFI jump table entry for all address-taken
functions, the jump table ends up containing a jump to the .rodata
section where optprobe_template_func resides, which results in an
objtool warning.

Use ASM_STACK_FRAME_NON_STANDARD instead to avoid both issues.

Signed-off-by: Sami Tolvanen <[email protected]>
---
arch/x86/kernel/kprobes/opt.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
index 71425ebba98a..95375ef5deee 100644
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -103,6 +103,7 @@ static void synthesize_set_arg1(kprobe_opcode_t *addr, unsigned long val)
asm (
".pushsection .rodata\n"
"optprobe_template_func:\n"
+ ASM_STACK_FRAME_NON_STANDARD(optprobe_template_func)
".global optprobe_template_entry\n"
"optprobe_template_entry:\n"
#ifdef CONFIG_X86_64
@@ -154,9 +155,6 @@ asm (
"optprobe_template_end:\n"
".popsection\n");

-void optprobe_template_func(void);
-STACK_FRAME_NON_STANDARD(optprobe_template_func);
-
#define TMPL_CLAC_IDX \
((long)optprobe_template_clac - (long)optprobe_template_entry)
#define TMPL_MOVE_IDX \
--
2.31.1.368.gbe11c130af-goog

2021-04-16 22:03:15

by Sami Tolvanen

[permalink] [raw]
Subject: [PATCH 09/15] x86/alternatives: Use C int3 selftest but disable KASAN

From: Kees Cook <[email protected]>

Instead of using inline asm for the int3 selftest (which confuses the
Clang's ThinLTO pass), this restores the C function but disables KASAN
(and tracing for good measure) to keep the things simple and avoid
unexpected side-effects. This attempts to keep the fix from commit
ecc606103837 ("x86/alternatives: Fix int3_emulate_call() selftest stack
corruption") without using inline asm.

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Sami Tolvanen <[email protected]>
---
arch/x86/kernel/alternative.c | 21 ++++-----------------
1 file changed, 4 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 6974b5174495..669a23454c09 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -496,23 +496,10 @@ extern struct paravirt_patch_site __start_parainstructions[],
*
* See entry_{32,64}.S for more details.
*/
-
-/*
- * We define the int3_magic() function in assembly to control the calling
- * convention such that we can 'call' it from assembly.
- */
-
-extern void int3_magic(unsigned int *ptr); /* defined in asm */
-
-asm (
-" .pushsection .init.text, \"ax\", @progbits\n"
-" .type int3_magic, @function\n"
-"int3_magic:\n"
-" movl $1, (%" _ASM_ARG1 ")\n"
-" ret\n"
-" .size int3_magic, .-int3_magic\n"
-" .popsection\n"
-);
+static void __init __no_sanitize_address notrace int3_magic(unsigned int *ptr)
+{
+ *ptr = 1;
+}

extern __initdata unsigned long int3_selftest_ip; /* defined in asm below */

--
2.31.1.368.gbe11c130af-goog

2021-04-16 22:03:29

by Sami Tolvanen

[permalink] [raw]
Subject: [PATCH 01/15] objtool: Find a destination for jumps beyond the section end

With -ffunction-sections, Clang can generate a jump beyond the end of
a section when the section ends in an unreachable instruction. If the
offset matches the section length, use the last instruction as the
jump destination.

Signed-off-by: Sami Tolvanen <[email protected]>
---
tools/objtool/check.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index a0a8ce61cd4d..5fd60e055a5c 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -870,6 +870,10 @@ static int add_jump_destinations(struct objtool_file *file)
}

insn->jump_dest = find_insn(file, dest_sec, dest_off);
+
+ if (!insn->jump_dest && dest_sec->len == dest_off)
+ insn->jump_dest = find_last_insn(file, dest_sec);
+
if (!insn->jump_dest) {

/*
--
2.31.1.368.gbe11c130af-goog

2021-04-16 22:03:57

by Sami Tolvanen

[permalink] [raw]
Subject: [PATCH 12/15] x86, module: Ignore __typeid__ relocations

Ignore the __typeid__ relocations generated with CONFIG_CFI_CLANG
when loading modules.

Signed-off-by: Sami Tolvanen <[email protected]>
---
arch/x86/kernel/module.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 5e9a34b5bd74..c4aeba237eef 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -197,6 +197,10 @@ static int __apply_relocate_add(Elf64_Shdr *sechdrs,
val -= (u64)loc;
write(loc, &val, 8);
break;
+ case R_X86_64_8:
+ if (!strncmp(strtab + sym->st_name, "__typeid__", 10))
+ break;
+ fallthrough;
default:
pr_err("%s: Unknown rela relocation: %llu\n",
me->name, ELF64_R_TYPE(rel[i].r_info));
--
2.31.1.368.gbe11c130af-goog

2021-04-16 22:04:08

by Sami Tolvanen

[permalink] [raw]
Subject: [PATCH 07/15] x86/ftrace: Use function_nocfi in MCOUNT_ADDR

With CONFIG_CFI_CLANG, the compiler replaces the __fentry__ address in
MCOUNT_ADDR with the address of a CFI jump table. Use function_nocfi()
to get the actual function address.

Signed-off-by: Sami Tolvanen <[email protected]>
---
arch/x86/include/asm/ftrace.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index 9f3130f40807..0b7567994f4a 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -6,7 +6,7 @@
#ifndef CC_USING_FENTRY
# error Compiler does not support fentry?
#endif
-# define MCOUNT_ADDR ((unsigned long)(__fentry__))
+# define MCOUNT_ADDR ((unsigned long)(function_nocfi(__fentry__)))
#define MCOUNT_INSN_SIZE 5 /* sizeof mcount call */

#ifdef CONFIG_DYNAMIC_FTRACE
--
2.31.1.368.gbe11c130af-goog

2021-04-16 22:04:32

by Sami Tolvanen

[permalink] [raw]
Subject: [PATCH 15/15] x86, build: Allow CONFIG_CFI_CLANG to be selected

Select ARCH_SUPPORTS_CFI_CLANG to allow CFI to be enabled.

Signed-off-by: Sami Tolvanen <[email protected]>
---
arch/x86/Kconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index bf69d07e46b8..81d2dc568e56 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -106,6 +106,7 @@ config X86
select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
select ARCH_SUPPORTS_LTO_CLANG if X86_64
select ARCH_SUPPORTS_LTO_CLANG_THIN if X86_64
+ select ARCH_SUPPORTS_CFI_CLANG if X86_64
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_MEMTEST
select ARCH_USE_QUEUED_RWLOCKS
--
2.31.1.368.gbe11c130af-goog

2021-04-16 22:09:06

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 05/15] x86: Implement function_nocfi

On Fri, Apr 16, 2021 at 01:38:34PM -0700, Sami Tolvanen wrote:
> With CONFIG_CFI_CLANG, the compiler replaces function addresses in
> instrumented C code with jump table addresses. This change implements
> the function_nocfi() macro, which returns the actual function address
> instead.
>
> Signed-off-by: Sami Tolvanen <[email protected]>
> ---
> arch/x86/include/asm/page.h | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
> index 7555b48803a8..5499a05c44fc 100644
> --- a/arch/x86/include/asm/page.h
> +++ b/arch/x86/include/asm/page.h
> @@ -71,6 +71,20 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
> extern bool __virt_addr_valid(unsigned long kaddr);
> #define virt_addr_valid(kaddr) __virt_addr_valid((unsigned long) (kaddr))
>
> +#ifdef CONFIG_CFI_CLANG

Almost every patch is talking about this magical config symbol but it
is nowhere to be found. How do I disable it, is there a Kconfig entry
somewhere, etc, etc?

> +/*
> + * With CONFIG_CFI_CLANG, the compiler replaces function address
> + * references with the address of the function's CFI jump table
> + * entry. The function_nocfi macro always returns the address of the
> + * actual function instead.
> + */
> +#define function_nocfi(x) ({ \
> + void *addr; \
> + asm("leaq " __stringify(x) "(%%rip), %0\n\t" : "=r" (addr)); \
> + addr; \
> +})

Also, eww.

It seems all these newfangled things we get, mean obfuscating code even
more. What's wrong with using __nocfi instead? That's nicely out of the
way so slap that in front of functions.

Also, did you even build this on, say, 32-bit allmodconfig?

Oh well.

In file included from ./include/linux/ftrace.h:22:0,
from ./include/linux/init_task.h:9,
from init/init_task.c:2:
./include/linux/ftrace.h: In function ‘ftrace_init_nop’:
./arch/x86/include/asm/ftrace.h:9:40: error: implicit declaration of function ‘function_nocfi’ [-Werror=implicit-function-declaration]
# define MCOUNT_ADDR ((unsigned long)(function_nocfi(__fentry__)))
^
./include/linux/ftrace.h:671:35: note: in expansion of macro ‘MCOUNT_ADDR’
return ftrace_make_nop(mod, rec, MCOUNT_ADDR);
^~~~~~~~~~~
In file included from ./include/linux/ftrace.h:22:0,
from ./include/linux/perf_event.h:49,
from ./include/linux/trace_events.h:10,
from ./include/trace/syscall.h:7,
from ./include/linux/syscalls.h:85,
from init/main.c:21:
./include/linux/ftrace.h: In function ‘ftrace_init_nop’:
./arch/x86/include/asm/ftrace.h:9:40: error: implicit declaration of function ‘function_nocfi’ [-Werror=implicit-function-declaration]
# define MCOUNT_ADDR ((unsigned long)(function_nocfi(__fentry__)))
^
./include/linux/ftrace.h:671:35: note: in expansion of macro ‘MCOUNT_ADDR’
return ftrace_make_nop(mod, rec, MCOUNT_ADDR);
^~~~~~~~~~~
In file included from ./include/linux/ftrace.h:22:0,
from ./include/linux/perf_event.h:49,
from ./include/linux/trace_events.h:10,
from ./include/trace/syscall.h:7,
from ./include/linux/syscalls.h:85,
from init/initramfs.c:10:
./include/linux/ftrace.h: In function ‘ftrace_init_nop’:
./arch/x86/include/asm/ftrace.h:9:40: error: implicit declaration of function ‘function_nocfi’ [-Werror=implicit-function-declaration]
# define MCOUNT_ADDR ((unsigned long)(function_nocfi(__fentry__)))
^
./include/linux/ftrace.h:671:35: note: in expansion of macro ‘MCOUNT_ADDR’
return ftrace_make_nop(mod, rec, MCOUNT_ADDR);
^~~~~~~~~~~
In file included from ./include/linux/ftrace.h:22:0,
from ./include/linux/perf_event.h:49,
from arch/x86/events/core.c:15:
./include/linux/ftrace.h: In function ‘ftrace_init_nop’:
./arch/x86/include/asm/ftrace.h:9:40: error: implicit declaration of function ‘function_nocfi’ [-Werror=implicit-function-declaration]
# define MCOUNT_ADDR ((unsigned long)(function_nocfi(__fentry__)))
...


--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-04-16 22:11:03

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 04/15] static_call: Use global functions for the self-test

On Fri, Apr 16 2021 at 13:38, Sami Tolvanen wrote:

> With CONFIG_CFI_CLANG, the compiler renames static functions. This
> breaks static_call users using static functions, because the current
> implementation assumes functions have stable names by hardcoding them
> in inline assembly. Make the self-test functions global to prevent the
> compiler from renaming them.

Sorry, no.

> Signed-off-by: Sami Tolvanen <[email protected]>
> ---
> kernel/static_call.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/static_call.c b/kernel/static_call.c
> index 723fcc9d20db..d09f500c2d2a 100644
> --- a/kernel/static_call.c
> +++ b/kernel/static_call.c
> @@ -503,12 +503,12 @@ long __static_call_return0(void)
>
> #ifdef CONFIG_STATIC_CALL_SELFTEST
>
> -static int func_a(int x)
> +int func_a(int x)
> {
> return x+1;
> }
>
> -static int func_b(int x)
> +int func_b(int x)
> {
> return x+2;
> }

Did you even compile that?

Global functions without a prototype are generating warnings, but we can
ignore them just because of sekurity, right?

Aside of that polluting the global namespace with func_a/b just to work
around a tool shortcoming is beyond hillarious.

Fix the tool not the perfectly correct code.

Thanks,

tglx

2021-04-16 22:11:20

by Sami Tolvanen

[permalink] [raw]
Subject: Re: [PATCH 05/15] x86: Implement function_nocfi

On Fri, Apr 16, 2021 at 2:18 PM Borislav Petkov <[email protected]> wrote:
>
> On Fri, Apr 16, 2021 at 01:38:34PM -0700, Sami Tolvanen wrote:
> > With CONFIG_CFI_CLANG, the compiler replaces function addresses in
> > instrumented C code with jump table addresses. This change implements
> > the function_nocfi() macro, which returns the actual function address
> > instead.
> >
> > Signed-off-by: Sami Tolvanen <[email protected]>
> > ---
> > arch/x86/include/asm/page.h | 14 ++++++++++++++
> > 1 file changed, 14 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
> > index 7555b48803a8..5499a05c44fc 100644
> > --- a/arch/x86/include/asm/page.h
> > +++ b/arch/x86/include/asm/page.h
> > @@ -71,6 +71,20 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
> > extern bool __virt_addr_valid(unsigned long kaddr);
> > #define virt_addr_valid(kaddr) __virt_addr_valid((unsigned long) (kaddr))
> >
> > +#ifdef CONFIG_CFI_CLANG
>
> Almost every patch is talking about this magical config symbol but it
> is nowhere to be found. How do I disable it, is there a Kconfig entry
> somewhere, etc, etc?

As I mentioned in the cover letter, this series is based on
linux-next. I forgot to include a link to the original patch series
that adds CONFIG_CFI_CLANG, but I'll be sure to point to it in the
next version. Sorry about the confusion.

> > +/*
> > + * With CONFIG_CFI_CLANG, the compiler replaces function address
> > + * references with the address of the function's CFI jump table
> > + * entry. The function_nocfi macro always returns the address of the
> > + * actual function instead.
> > + */
> > +#define function_nocfi(x) ({ \
> > + void *addr; \
> > + asm("leaq " __stringify(x) "(%%rip), %0\n\t" : "=r" (addr)); \
> > + addr; \
> > +})
>
> Also, eww.
>
> It seems all these newfangled things we get, mean obfuscating code even
> more. What's wrong with using __nocfi instead? That's nicely out of the
> way so slap that in front of functions.

__nocfi only disables CFI checking in a function, the compiler still
changes function addresses to point to the CFI jump table, which is
why we need function_nocfi().

> Also, did you even build this on, say, 32-bit allmodconfig?
>
> Oh well.
>
> In file included from ./include/linux/ftrace.h:22:0,
> from ./include/linux/init_task.h:9,
> from init/init_task.c:2:
> ./include/linux/ftrace.h: In function ‘ftrace_init_nop’:
> ./arch/x86/include/asm/ftrace.h:9:40: error: implicit declaration of function ‘function_nocfi’ [-Werror=implicit-function-declaration]

This is defined in linux-next, but I do see another issue, which I'll
fix in v2. Note that CFI_CLANG itself cannot be selected on 32-bit
x86.

Sami

2021-04-16 22:11:42

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 05/15] x86: Implement function_nocfi

On Fri, Apr 16, 2021 at 02:49:23PM -0700, Sami Tolvanen wrote:
> __nocfi only disables CFI checking in a function, the compiler still
> changes function addresses to point to the CFI jump table, which is
> why we need function_nocfi().

So call it __func_addr() or get_function_addr() or so, so that at least
it is clear what this does.

Also, am I going to get a steady stream of patches adding that wrapper
to function names or is this it? IOW, have you built an allyesconfig to
see how many locations need touching?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-04-16 22:14:05

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH 05/15] x86: Implement function_nocfi

On Fri, Apr 16, 2021 at 3:03 PM Borislav Petkov <[email protected]> wrote:
>
> On Fri, Apr 16, 2021 at 02:49:23PM -0700, Sami Tolvanen wrote:
> > __nocfi only disables CFI checking in a function, the compiler still
> > changes function addresses to point to the CFI jump table, which is
> > why we need function_nocfi().
>
> So call it __func_addr() or get_function_addr() or so, so that at least
> it is clear what this does.
>

This seems backwards to me. If I do:

extern void foo(some signature);

then I would, perhaps naively, expect foo to be the actual symbol that
gets called and for the ABI to be changed to do the CFI checks. The
foo symbol would point to whatever magic is needed. I assume I'm
missing something.

2021-04-16 22:14:46

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 05/15] x86: Implement function_nocfi

On Fri, Apr 16 2021 at 14:49, Sami Tolvanen wrote:
> On Fri, Apr 16, 2021 at 2:18 PM Borislav Petkov <[email protected]> wrote:
>> In file included from ./include/linux/ftrace.h:22:0,
>> from ./include/linux/init_task.h:9,
>> from init/init_task.c:2:
>> ./include/linux/ftrace.h: In function ‘ftrace_init_nop’:
>> ./arch/x86/include/asm/ftrace.h:9:40: error: implicit declaration of function ‘function_nocfi’ [-Werror=implicit-function-declaration]
>
> This is defined in linux-next, but I do see another issue, which I'll
> fix in v2. Note that CFI_CLANG itself cannot be selected on 32-bit
> x86.

Sure and because of that it's overrated to make sure that it does not
break the build. I know, sekurity ...

But aside of that when looking at the rest of the series, then I really
have to ask whether the only way to address this is to make a large
amount of code unreadable like this:

- wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64);
+ wrmsrl(MSR_LSTAR, (unsigned long)function_nocfi(entry_SYSCALL_64));

plus a gazillion of similar changes.

This is unreadable garbage.

Thanks,

tglx




2021-04-16 22:15:34

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 05/15] x86: Implement function_nocfi

On Fri, Apr 16, 2021 at 03:06:17PM -0700, Andy Lutomirski wrote:
> On Fri, Apr 16, 2021 at 3:03 PM Borislav Petkov <[email protected]> wrote:
> >
> > On Fri, Apr 16, 2021 at 02:49:23PM -0700, Sami Tolvanen wrote:
> > > __nocfi only disables CFI checking in a function, the compiler still
> > > changes function addresses to point to the CFI jump table, which is
> > > why we need function_nocfi().
> >
> > So call it __func_addr() or get_function_addr() or so, so that at least
> > it is clear what this does.
> >
>
> This seems backwards to me. If I do:
>
> extern void foo(some signature);
>
> then I would, perhaps naively, expect foo to be the actual symbol that

I'm just reading the patch:

... The function_nocfi macro always returns the address of the
+ * actual function instead.
+ */
+#define function_nocfi(x) ({ \
+ void *addr; \
+ asm("leaq " __stringify(x) "(%%rip), %0\n\t" : "=r" (addr)); \
+ addr;

so it does a rip-relative load into a reg which ends up with the function
address.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-04-16 22:18:10

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 05/15] x86: Implement function_nocfi

On Sat, Apr 17, 2021 at 12:02:51AM +0200, Borislav Petkov wrote:
> On Fri, Apr 16, 2021 at 02:49:23PM -0700, Sami Tolvanen wrote:
> > __nocfi only disables CFI checking in a function, the compiler still
> > changes function addresses to point to the CFI jump table, which is
> > why we need function_nocfi().
>
> So call it __func_addr() or get_function_addr() or so, so that at least
> it is clear what this does.

FWIW, it's been renamed already. I'll CC Mark back into the thread.
https://lore.kernel.org/lkml/[email protected]/

> Also, am I going to get a steady stream of patches adding that wrapper
> to function names or is this it? IOW, have you built an allyesconfig to
> see how many locations need touching?

Nooo. Much like __nocfi, this should be extremely rare and is only used in
places that must not be doing CFI nor working on the jump tables (e.g. the
syscall MSR). There list for arm64 in -next, for example, is short:


429d9a552e81 arm64: ftrace: use function_nocfi for ftrace_call
fbcdf27674bc arm64: add __nocfi to __apply_alternatives
f2324191e959 arm64: add __nocfi to functions that jump to a physical address
c4a384170f17 arm64: use function_nocfi with __pa_symbol
5198a15901d2 psci: use function_nocfi for cpu_resume
8e284f3ebed2 bpf: disable CFI in dispatcher functions


--
Kees Cook

2021-04-16 22:21:48

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH 05/15] x86: Implement function_nocfi

On Fri, Apr 16, 2021 at 3:14 PM Borislav Petkov <[email protected]> wrote:
>
> On Fri, Apr 16, 2021 at 03:06:17PM -0700, Andy Lutomirski wrote:
> > On Fri, Apr 16, 2021 at 3:03 PM Borislav Petkov <[email protected]> wrote:
> > >
> > > On Fri, Apr 16, 2021 at 02:49:23PM -0700, Sami Tolvanen wrote:
> > > > __nocfi only disables CFI checking in a function, the compiler still
> > > > changes function addresses to point to the CFI jump table, which is
> > > > why we need function_nocfi().
> > >
> > > So call it __func_addr() or get_function_addr() or so, so that at least
> > > it is clear what this does.
> > >
> >
> > This seems backwards to me. If I do:
> >
> > extern void foo(some signature);
> >
> > then I would, perhaps naively, expect foo to be the actual symbol that
>
> I'm just reading the patch:
>
> ... The function_nocfi macro always returns the address of the
> + * actual function instead.
> + */
> +#define function_nocfi(x) ({ \
> + void *addr; \
> + asm("leaq " __stringify(x) "(%%rip), %0\n\t" : "=r" (addr)); \
> + addr;
>
> so it does a rip-relative load into a reg which ends up with the function
> address.

This is horrible.

We made a mistake adapting the kernel to GCC's nonsensical stack
protector ABI, especially on 32-bit, instead of making GCC fix it.
Let's not repeat this with clang please.

Sami, I'm assuming that:

extern void func(void);

results in anything that takes a pointer to func getting a pointer to
some special magic descriptor instead of to func, so that:

void (*ptr)(void);
ptr = func;
ptr();

does the right thing. Then void (*)(void) is no longer a raw pointer. Fine.

But obviously there is code that needs real function pointers. How
about making this a first-class feature, or at least hacking around it
more cleanly. For example, what does this do:

char entry_whatever[];
wrmsrl(..., (unsigned long)entry_whatever);

or, alternatively,

extern void func() __attribute__((nocfi));

void (*ptr)(void);
ptr = func; /* probably fails to compile -- invalid conversion */

(unsigned long)func /* returns the actual pointer */

func(); /* works like normal */

And maybe allow this too:

void (*ptr)(void) __attribute__((nocfi);
ptr = func;
ptr(); /* emits an unchecked call. maybe warns, too. anyone who
does this needs to be extremely careful. */

--Andy

2021-04-16 22:27:44

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 06/15] x86: Avoid CFI jump tables in IDT and entry points

On Fri, Apr 16 2021 at 13:38, Sami Tolvanen wrote:
> With CONFIG_CFI_CLANG, the compiler replaces function addresses in C
> code with jump table addresses.

Fine.

> To avoid referring to jump tables in entry code with PTI,

What has this to do with PTI?

> disable CFI for IDT and paravirt code, and use function_nocfi() to
> prevent jump table addresses from being added to the IDT or system
> call entry points.

How does this changelog make sense for anyone not familiar with the
matter at hand?

Where is the analysis why excluding

> +CFLAGS_REMOVE_idt.o := $(CC_FLAGS_CFI)
> +CFLAGS_REMOVE_paravirt.o := $(CC_FLAGS_CFI)

all of idt.c and paravirt.c is correct and how that is going to be
correct in the future?

These files are excluded from CFI, so I can add whatever I want to them
and circumvent the purpose of CFI, right?

Brilliant plan that. But I know, sekurity ...

Thanks,

tglx

2021-04-16 22:29:01

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 05/15] x86: Implement function_nocfi

On Fri, Apr 16, 2021 at 03:06:17PM -0700, Andy Lutomirski wrote:
> On Fri, Apr 16, 2021 at 3:03 PM Borislav Petkov <[email protected]> wrote:
> >
> > On Fri, Apr 16, 2021 at 02:49:23PM -0700, Sami Tolvanen wrote:
> > > __nocfi only disables CFI checking in a function, the compiler still
> > > changes function addresses to point to the CFI jump table, which is
> > > why we need function_nocfi().
> >
> > So call it __func_addr() or get_function_addr() or so, so that at least
> > it is clear what this does.
> >
>
> This seems backwards to me. If I do:
>
> extern void foo(some signature);
>
> then I would, perhaps naively, expect foo to be the actual symbol that
> gets called

Yes.

> and for the ABI to be changed to do the CFI checks.

Uh, no? There's no ABI change -- indirect calls are changed to do the
checking.

> The
> foo symbol would point to whatever magic is needed.

No, the symbol points to the jump table entry. Direct calls get minimal
overhead and indirect calls can add the "is this function in the right
table" checking.

> I assume I'm
> missing something.

Further symbol vs address stuff is discussed here:
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=for-next/cfi&id=ff301ceb5299551c3650d0e07ba879b766da4cc0

But note that this shouldn't turn into a discussion of "maybe Clang could
do CFI differently"; this is what Clang has.

https://clang.llvm.org/docs/ControlFlowIntegrity.html

--
Kees Cook

2021-04-16 22:39:12

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 05/15] x86: Implement function_nocfi

On Fri, Apr 16, 2021 at 03:20:17PM -0700, Andy Lutomirski wrote:
> But obviously there is code that needs real function pointers. How
> about making this a first-class feature, or at least hacking around it
> more cleanly. For example, what does this do:
>
> char entry_whatever[];
> wrmsrl(..., (unsigned long)entry_whatever);

This is just casting. It'll still resolve to the jump table entry.

> or, alternatively,
>
> extern void func() __attribute__((nocfi));

__nocfi says func() should not perform checking of correct jump table
membership for indirect calls.

But we don't want a global marking for a function to be ignored by CFI;
we don't want functions to escape CFI -- we want specific _users_ to
either not check CFI for indirect calls (__nocfi) or we want specific
passed addresses to avoid going through the jump table
(function_nocfi()).

So, instead of a cast, a wrapper is used to bypass instrumentation in
the very few cases its needed.

(Note that such a wrapper is no-op without CFI enabled.)

--
Kees Cook

2021-04-16 22:55:06

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH 05/15] x86: Implement function_nocfi

On Fri, Apr 16, 2021 at 3:28 PM Kees Cook <[email protected]> wrote:
>
> On Fri, Apr 16, 2021 at 03:06:17PM -0700, Andy Lutomirski wrote:
> > On Fri, Apr 16, 2021 at 3:03 PM Borislav Petkov <[email protected]> wrote:
> > >
> > > On Fri, Apr 16, 2021 at 02:49:23PM -0700, Sami Tolvanen wrote:
> > > > __nocfi only disables CFI checking in a function, the compiler still
> > > > changes function addresses to point to the CFI jump table, which is
> > > > why we need function_nocfi().
> > >
> > > So call it __func_addr() or get_function_addr() or so, so that at least
> > > it is clear what this does.
> > >
> >
> > This seems backwards to me. If I do:
> >
> > extern void foo(some signature);
> >
> > then I would, perhaps naively, expect foo to be the actual symbol that
> > gets called
>
> Yes.
>
> > and for the ABI to be changed to do the CFI checks.
>
> Uh, no? There's no ABI change -- indirect calls are changed to do the
> checking.

Maybe ABI is the wrong word, or maybe I'm not fully clued in. But, if I do:

extern void call_it(void (*ptr)(void));

and I define call_it in one translation unit and call it from another,
the ABI effectively changed, right? Because ptr is (depending on the
"canonical" mode) no longer a regular function pointer.

> > char entry_whatever[];
> > wrmsrl(..., (unsigned long)entry_whatever);
>
> This is just casting. It'll still resolve to the jump table entry.

How? As far as clang is concerned, entry_whatever isn't a function at
all. What jump table entry?

>
> > or, alternatively,
> >
> > extern void func() __attribute__((nocfi));
>
> __nocfi says func() should not perform checking of correct jump table
> membership for indirect calls.
>
> But we don't want a global marking for a function to be ignored by CFI;
> we don't want functions to escape CFI -- we want specific _users_ to
> either not check CFI for indirect calls (__nocfi) or we want specific
> passed addresses to avoid going through the jump table
> (function_nocfi()).

Maybe I spelled it badly, and I maybe I requested the wrong thing.
Here are actual required use cases.

1. I defined a function in asm. I want to tell clang that this
function is defined in asm, and for clang to behave accordingly:

.globl func
func:
; do stuff

later:

extern void func(void) [something here];

There really should be a way to write this correctly such that it
works regardless of the setting of
-fsanitize-cfi-canonical-jump-tables. This should not bypass CFI. It
should *work*, with CFI enforced. If I read all the various things
you linked correctly, this would be something like __cfi_noncanonical,
and I reserve the right to think that this is a horrible name.

2. I need a raw function pointer, thank you very much. I would like
to be honest about it, and I don't really want to bypass CFI, but I
need the actual bits in the actual symbol.

translation unit 1 defines func. Maybe it's C with
-fsanitize-cfi-canonical-jump-tables, maybe it's C with
-fno-sanitize-cfi-canonical-jump-tables or however it's spelled, and
maybe it's plain asm. Now translation unit 2 does:

2a. Uses a literal symbol, because it's going to modify function text
or poke an MSR or whatever:

wrmsrl(MSR_WHATEVER, func);

clang needs to give us *some* way to have a correct declaration of
func such that we can, without resorting to inline asm kludges, get
the actual bit pattern of the actual symbol.

2b. Maybe optional: convert from function pointer to bit pattern of
actual symbol.

If someone gives me a real, correctly typed C pointer representing a
function pointer, I want a way to find the address of the body of the
function. Then we can use it for things that aren't *calling* it per
se, e.g. disassembling it. This is not necessarily a fully formed
thought right now, but I think something along these lines might be
needed.

The reverse of 2b (converting from actual symbol to function pointer)
might be equivalent to 1, or it might not. I suppose there are some
subtleties here.

Be that as it may, it sounds like right now clang has some issues
interoperating with asm when CFI is enabled. If so, clang needs to be
improved.

(The unsigned long hack is not necessarily good enough. I should be able to do:

.global func
func:
; C ABI compliant code here

extern void func(void) [attribute as in 1]

unsigned long actual_address = [something clang actually understands](func);

If this thing refuses to work when fed a nonconstant function pointer
because of some genuinely good reason, fine. But, if 'func' is an
actual literal symbol name, this thing needs to be compile-time
constant expression.

>
> So, instead of a cast, a wrapper is used to bypass instrumentation in
> the very few cases its needed.

NAK. The compiler needs to cooperate IMO.

>
> (Note that such a wrapper is no-op without CFI enabled.)

> But note that this shouldn't turn into a discussion of "maybe Clang could
> do CFI differently"; this is what Clang has.
>
> https://clang.llvm.org/docs/ControlFlowIntegrity.html

If this is what Clang has, and Clang won't improve, then we can just
not apply these patches...

2021-04-16 23:00:00

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 05/15] x86: Implement function_nocfi

On Fri, Apr 16, 2021 at 03:52:44PM -0700, Andy Lutomirski wrote:
> > > char entry_whatever[];
> > > wrmsrl(..., (unsigned long)entry_whatever);
> >
> > This is just casting. It'll still resolve to the jump table entry.
>
> How? As far as clang is concerned, entry_whatever isn't a function at
> all. What jump table entry?

Whoops, sorry, I misread the [] as (). I thought you were just showing
an arbitrary function declaration, but I see what you mean now.

I am digesting the rest of your email now... :)

--
Kees Cook

2021-04-16 23:03:47

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 05/15] x86: Implement function_nocfi

On Fri, Apr 16 2021 at 15:37, Kees Cook wrote:

> On Fri, Apr 16, 2021 at 03:20:17PM -0700, Andy Lutomirski wrote:
>> But obviously there is code that needs real function pointers. How
>> about making this a first-class feature, or at least hacking around it
>> more cleanly. For example, what does this do:
>>
>> char entry_whatever[];
>> wrmsrl(..., (unsigned long)entry_whatever);
>
> This is just casting. It'll still resolve to the jump table entry.
>
>> or, alternatively,
>>
>> extern void func() __attribute__((nocfi));
>
> __nocfi says func() should not perform checking of correct jump table
> membership for indirect calls.
>
> But we don't want a global marking for a function to be ignored by CFI;
> we don't want functions to escape CFI -- we want specific _users_ to
> either not check CFI for indirect calls (__nocfi) or we want specific
> passed addresses to avoid going through the jump table
> (function_nocfi()).

And that's why you mark entire files to be exempt without any rationale
why it makes sense.

Thanks,

tglx

2021-04-16 23:43:06

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 05/15] x86: Implement function_nocfi

On Fri, Apr 16, 2021 at 03:52:44PM -0700, Andy Lutomirski wrote:
> Maybe ABI is the wrong word, or maybe I'm not fully clued in. But, if I do:
>
> extern void call_it(void (*ptr)(void));
>
> and I define call_it in one translation unit and call it from another,
> the ABI effectively changed, right? Because ptr is (depending on the
> "canonical" mode) no longer a regular function pointer.

Right, I was thinking maybe "calling convention", or really, "the
ability to still use 'ptr' as if it were a function". Which is true,
yes. It's just that 'ptr' is aimed at a jump table that jumps to the
"true" function body.

> 1. I defined a function in asm. I want to tell clang that this
> function is defined in asm, and for clang to behave accordingly:
>
> .globl func
> func:
> ; do stuff
>
> later:
>
> extern void func(void) [something here];
>
> There really should be a way to write this correctly such that it
> works regardless of the setting of
> -fsanitize-cfi-canonical-jump-tables. This should not bypass CFI. It
> should *work*, with CFI enforced. If I read all the various things
> you linked correctly, this would be something like __cfi_noncanonical,
> and I reserve the right to think that this is a horrible name.

Yes, I find the name confusing too. Without noncanonical, we'd need
C call wrappers for every single .S function that had its address
taken. This is very common in crypto, for example. That level of extra
code seemed like a total non-starter. Instead, we just get a few places
we have to mark.

> 2. I need a raw function pointer, thank you very much. I would like
> to be honest about it, and I don't really want to bypass CFI, but I
> need the actual bits in the actual symbol.
>
> translation unit 1 defines func. Maybe it's C with
> -fsanitize-cfi-canonical-jump-tables, maybe it's C with
> -fno-sanitize-cfi-canonical-jump-tables or however it's spelled, and
> maybe it's plain asm. Now translation unit 2 does:
>
> 2a. Uses a literal symbol, because it's going to modify function text
> or poke an MSR or whatever:
>
> wrmsrl(MSR_WHATEVER, func);
>
> clang needs to give us *some* way to have a correct declaration of
> func such that we can, without resorting to inline asm kludges, get
> the actual bit pattern of the actual symbol.

We don't want version of a global symbol alias of func that points to
the function body, though; it's only very specific cases where this
should be stripped (MSR, ftrace, etc).

So, if there were some Clang-specific syntax for this, it would still be
used on a case-by-case basis. It would still be something like:

wrmsrl(MSR_WAT, __builtin_gimme_body_p(func));

Which is basically what already exists, just with a different name.

> 2b. Maybe optional: convert from function pointer to bit pattern of
> actual symbol.
>
> If someone gives me a real, correctly typed C pointer representing a
> function pointer, I want a way to find the address of the body of the
> function. Then we can use it for things that aren't *calling* it per
> se, e.g. disassembling it. This is not necessarily a fully formed
> thought right now, but I think something along these lines might be
> needed.
>
> The reverse of 2b (converting from actual symbol to function pointer)
> might be equivalent to 1, or it might not. I suppose there are some
> subtleties here.
>
> Be that as it may, it sounds like right now clang has some issues
> interoperating with asm when CFI is enabled. If so, clang needs to be
> improved.
>
> (The unsigned long hack is not necessarily good enough. I should be able to do:
>
> .global func
> func:
> ; C ABI compliant code here
>
> extern void func(void) [attribute as in 1]
>
> unsigned long actual_address = [something clang actually understands](func);
>
> If this thing refuses to work when fed a nonconstant function pointer
> because of some genuinely good reason, fine. But, if 'func' is an
> actual literal symbol name, this thing needs to be compile-time
> constant expression.

Okay, you're saying you want __builtin_gimme_body_p() to be a constant
expression for the compiler, not inline asm?

Given the very few places this is expected to be used, and the fact that
it works as-is already, why is this additional requirement useful?

> > So, instead of a cast, a wrapper is used to bypass instrumentation in
> > the very few cases its needed.
>
> NAK. The compiler needs to cooperate IMO.

It's trying very hard. ;)

> > But note that this shouldn't turn into a discussion of "maybe Clang could
> > do CFI differently"; this is what Clang has.
> >
> > https://clang.llvm.org/docs/ControlFlowIntegrity.html
>
> If this is what Clang has, and Clang won't improve, then we can just
> not apply these patches...

I'm not saying Clang can't change -- I'm saying redesigning the entire
implementation of Clang's CFI isn't feasible, and I want to avoid having
that become the requirement because that's unreasonable. Clang's current
CFI works for many other projects, it's supported, it's what Android
has been using on its kernels 3 years now. The twist, obviously, is that
other projects don't use asm the way the kernel does, so that's where
things get weird, and where we've already been getting help from LLVM
folks to improve the situation.

If the solution is a new Clang builtin, okay, but I'd just like to
understand why that's justified compared to the existing solution
(especially since the resulting machine code is likely to be nearly
identical in the current uses).

-Kees

--
Kees Cook

2021-04-16 23:58:01

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 06/15] x86: Avoid CFI jump tables in IDT and entry points

On Sat, Apr 17, 2021 at 12:26:56AM +0200, Thomas Gleixner wrote:
> On Fri, Apr 16 2021 at 13:38, Sami Tolvanen wrote:
> > With CONFIG_CFI_CLANG, the compiler replaces function addresses in C
> > code with jump table addresses.
>
> Fine.
>
> > To avoid referring to jump tables in entry code with PTI,
>
> What has this to do with PTI?

Short answer: in earlier development of this series, entry routines
were attempting to jump to the (unmapped) jump tables, and IDT code had
similar issues. But yes, the commit message can be improved; I'll let
Sami explain the details.

> > disable CFI for IDT and paravirt code, and use function_nocfi() to
> > prevent jump table addresses from being added to the IDT or system
> > call entry points.
>
> How does this changelog make sense for anyone not familiar with the
> matter at hand?
>
> Where is the analysis why excluding
>
> > +CFLAGS_REMOVE_idt.o := $(CC_FLAGS_CFI)
> > +CFLAGS_REMOVE_paravirt.o := $(CC_FLAGS_CFI)
>
> all of idt.c and paravirt.c is correct and how that is going to be
> correct in the future?
>
> These files are excluded from CFI, so I can add whatever I want to them
> and circumvent the purpose of CFI, right?
>
> Brilliant plan that. But I know, sekurity ...

*sigh* we're on the same side. :P I will choose to understand your
comments here as:

"How will enforcement of CFI policy be correctly maintained here if
the justification for disabling it for whole compilation units is not
clearly understandable by other developers not familiar with the nuances
of its application?"

This is a completely justified position to take. Thank you for calling
it out; we'll make it better.

--
Kees Cook

2021-04-17 00:04:21

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 06/15] x86: Avoid CFI jump tables in IDT and entry points

On Fri, Apr 16 2021 at 16:56, Kees Cook wrote:
> On Sat, Apr 17, 2021 at 12:26:56AM +0200, Thomas Gleixner wrote:
>> Where is the analysis why excluding
>>
>> > +CFLAGS_REMOVE_idt.o := $(CC_FLAGS_CFI)
>> > +CFLAGS_REMOVE_paravirt.o := $(CC_FLAGS_CFI)
>>
>> all of idt.c and paravirt.c is correct and how that is going to be
>> correct in the future?
>>
>> These files are excluded from CFI, so I can add whatever I want to them
>> and circumvent the purpose of CFI, right?
>>
>> Brilliant plan that. But I know, sekurity ...
>
> *sigh* we're on the same side. :P I will choose to understand your
> comments here as:
>
> "How will enforcement of CFI policy be correctly maintained here if
> the justification for disabling it for whole compilation units is not
> clearly understandable by other developers not familiar with the nuances
> of its application?"

Plus, if there is a justification for disabling it for a whole
compilation unit:

Where is the tooling which makes sure that this compilation unit is not
later on filled with code which should be subject to CFI?

Thanks,

tglx


2021-04-17 00:19:12

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 04/15] static_call: Use global functions for the self-test

On Fri, Apr 16 2021 at 23:37, Thomas Gleixner wrote:
> On Fri, Apr 16 2021 at 13:38, Sami Tolvanen wrote:
>> #ifdef CONFIG_STATIC_CALL_SELFTEST
>>
>> -static int func_a(int x)
>> +int func_a(int x)
>> {
>> return x+1;
>> }
>>
>> -static int func_b(int x)
>> +int func_b(int x)
>> {
>> return x+2;
>> }
>
> Did you even compile that?
>
> Global functions without a prototype are generating warnings, but we can
> ignore them just because of sekurity, right?
>
> Aside of that polluting the global namespace with func_a/b just to work
> around a tool shortcoming is beyond hillarious.
>
> Fix the tool not the perfectly correct code.

That said, I wouldn't mind a __dont_dare_to_rename annotation to help
the compiler, but anything else is just wrong.

Thanks,

tglx

2021-04-17 10:20:14

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 05/15] x86: Implement function_nocfi

On Sat, Apr 17 2021 at 01:02, Thomas Gleixner wrote:
> On Fri, Apr 16 2021 at 15:37, Kees Cook wrote:
>
>> On Fri, Apr 16, 2021 at 03:20:17PM -0700, Andy Lutomirski wrote:
>>> But obviously there is code that needs real function pointers. How
>>> about making this a first-class feature, or at least hacking around it
>>> more cleanly. For example, what does this do:
>>>
>>> char entry_whatever[];
>>> wrmsrl(..., (unsigned long)entry_whatever);
>>
>> This is just casting. It'll still resolve to the jump table entry.
>>
>>> or, alternatively,
>>>
>>> extern void func() __attribute__((nocfi));
>>
>> __nocfi says func() should not perform checking of correct jump table
>> membership for indirect calls.
>>
>> But we don't want a global marking for a function to be ignored by CFI;
>> we don't want functions to escape CFI -- we want specific _users_ to
>> either not check CFI for indirect calls (__nocfi) or we want specific
>> passed addresses to avoid going through the jump table
>> (function_nocfi()).
>
> And that's why you mark entire files to be exempt without any rationale
> why it makes sense.

The reason why you have to do that is because function_nocfi() is not
provided by the compiler.

So you need to hack around that with that macro which fails to work
e.g. for the idt data arrays.

Is there any fundamental reason why the compiler does not provide that
in a form which allows to use it everywhere?

It's not too much asked from a tool which provides new functionality to
provide it in a way which is usable.

Thanks,

tglx

2021-04-17 11:38:16

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 09/15] x86/alternatives: Use C int3 selftest but disable KASAN

On Fri, Apr 16, 2021 at 01:38:38PM -0700, Sami Tolvanen wrote:
> From: Kees Cook <[email protected]>
>
> Instead of using inline asm for the int3 selftest (which confuses the
> Clang's ThinLTO pass), this restores the C function but disables KASAN
> (and tracing for good measure) to keep the things simple and avoid
> unexpected side-effects. This attempts to keep the fix from commit
> ecc606103837 ("x86/alternatives: Fix int3_emulate_call() selftest stack
> corruption") without using inline asm.
>
> Signed-off-by: Kees Cook <[email protected]>
> Signed-off-by: Sami Tolvanen <[email protected]>
> ---
> arch/x86/kernel/alternative.c | 21 ++++-----------------
> 1 file changed, 4 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 6974b5174495..669a23454c09 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -496,23 +496,10 @@ extern struct paravirt_patch_site __start_parainstructions[],
> *
> * See entry_{32,64}.S for more details.
> */
> -
> -/*
> - * We define the int3_magic() function in assembly to control the calling
> - * convention such that we can 'call' it from assembly.
> - */
> -
> -extern void int3_magic(unsigned int *ptr); /* defined in asm */
> -
> -asm (
> -" .pushsection .init.text, \"ax\", @progbits\n"
> -" .type int3_magic, @function\n"
> -"int3_magic:\n"
> -" movl $1, (%" _ASM_ARG1 ")\n"
> -" ret\n"
> -" .size int3_magic, .-int3_magic\n"
> -" .popsection\n"
> -);
> +static void __init __no_sanitize_address notrace int3_magic(unsigned int *ptr)
> +{
> + *ptr = 1;
> +}

I really don't like this. the compiler is free to mess this up in all
sorts of ways.

The problem is that the call-site does not respect the regular calling
convention, so calling a regular C function is just asking for trouble,
which is why it ended up being asm, then we fully control the calling
convention.

2021-04-17 14:21:09

by David Laight

[permalink] [raw]
Subject: RE: [PATCH 05/15] x86: Implement function_nocfi

From: Kees Cook
> Sent: 16 April 2021 23:28
>
> On Fri, Apr 16, 2021 at 03:06:17PM -0700, Andy Lutomirski wrote:
> > On Fri, Apr 16, 2021 at 3:03 PM Borislav Petkov <[email protected]> wrote:
> > >
> > > On Fri, Apr 16, 2021 at 02:49:23PM -0700, Sami Tolvanen wrote:
> > > > __nocfi only disables CFI checking in a function, the compiler still
> > > > changes function addresses to point to the CFI jump table, which is
> > > > why we need function_nocfi().
> > >
> > > So call it __func_addr() or get_function_addr() or so, so that at least
> > > it is clear what this does.
> > >
> >
> > This seems backwards to me. If I do:
> >
> > extern void foo(some signature);
> >
> > then I would, perhaps naively, expect foo to be the actual symbol that
> > gets called
>
> Yes.
>
> > and for the ABI to be changed to do the CFI checks.
>
> Uh, no? There's no ABI change -- indirect calls are changed to do the
> checking.
>
> > The
> > foo symbol would point to whatever magic is needed.
>
> No, the symbol points to the jump table entry. Direct calls get minimal
> overhead and indirect calls can add the "is this function in the right
> table" checking.


Isn't this a bit like one of the PPC? ABI where function addresses
aren't (always) the entry point.
IIRC is causes all sorts of obscure grief.

I'd also like to know how indirect calls are actually expected to work.
The whole idea is that the potential targets might be in a kernel module
that is loaded at run time.

Scanning a list of possible targets has to be a bad design decision.

If you are trying to check that the called function has the right
prototype, stashing a hash of the prototype (or a known random number)
before the entry point would give most of the benefits without most
of the costs.
The linker could be taught to do the same test (much like name mangling
but without the crap user experience).

That scheme only has the downside of a data cache miss and (hopefully)
statically predicted correct branch (hmm - except you don't want to
speculatively execute the wrong function) and polluting the data cache
with code.

This all seems like a ploy to force people to buy faster cpus.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2021-04-17 15:50:16

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH 05/15] x86: Implement function_nocfi



> On Apr 17, 2021, at 7:20 AM, David Laight <[email protected]> wrote:
>
> From: Kees Cook
>> Sent: 16 April 2021 23:28
>>
>>> On Fri, Apr 16, 2021 at 03:06:17PM -0700, Andy Lutomirski wrote:
>>> On Fri, Apr 16, 2021 at 3:03 PM Borislav Petkov <[email protected]> wrote:
>>>>
>>>> On Fri, Apr 16, 2021 at 02:49:23PM -0700, Sami Tolvanen wrote:
>>>>> __nocfi only disables CFI checking in a function, the compiler still
>>>>> changes function addresses to point to the CFI jump table, which is
>>>>> why we need function_nocfi().
>>>>
>>>> So call it __func_addr() or get_function_addr() or so, so that at least
>>>> it is clear what this does.
>>>>
>>>
>>> This seems backwards to me. If I do:
>>>
>>> extern void foo(some signature);
>>>
>>> then I would, perhaps naively, expect foo to be the actual symbol that
>>> gets called
>>
>> Yes.
>>
>>> and for the ABI to be changed to do the CFI checks.
>>
>> Uh, no? There's no ABI change -- indirect calls are changed to do the
>> checking.
>>
>>> The
>>> foo symbol would point to whatever magic is needed.
>>
>> No, the symbol points to the jump table entry. Direct calls get minimal
>> overhead and indirect calls can add the "is this function in the right
>> table" checking.
>
>
> Isn't this a bit like one of the PPC? ABI where function addresses
> aren't (always) the entry point.
> IIRC is causes all sorts of obscure grief.
>
> I'd also like to know how indirect calls are actually expected to work.
> The whole idea is that the potential targets might be in a kernel module
> that is loaded at run time.
>
> Scanning a list of possible targets has to be a bad design decision.
>
> If you are trying to check that the called function has the right
> prototype, stashing a hash of the prototype (or a known random number)
> before the entry point would give most of the benefits without most
> of the costs.
> The linker could be taught to do the same test (much like name mangling
> but without the crap user experience).
>
> That scheme only has the downside of a data cache miss and (hopefully)
> statically predicted correct branch (hmm - except you don't want to
> speculatively execute the wrong function) and polluting the data cache
> with code.

I admit I was quite surprised by the actual CFI implementation. I would have expected a CFI’d function pointer to actually point to a little descriptor that contains a hash of the function’s type. The whole bit vector thing seems quite inefficient.

>
> This all seems like a ploy to force people to buy faster cpus.
>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
>

2021-04-17 23:20:47

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH 05/15] x86: Implement function_nocfi

On Fri, Apr 16, 2021 at 4:40 PM Kees Cook <[email protected]> wrote:
>

> > 1. I defined a function in asm. I want to tell clang that this
> > function is defined in asm, and for clang to behave accordingly:
> >
> > .globl func
> > func:
> > ; do stuff
> >
> > later:
> >
> > extern void func(void) [something here];
> >
> > There really should be a way to write this correctly such that it
> > works regardless of the setting of
> > -fsanitize-cfi-canonical-jump-tables. This should not bypass CFI. It
> > should *work*, with CFI enforced. If I read all the various things
> > you linked correctly, this would be something like __cfi_noncanonical,
> > and I reserve the right to think that this is a horrible name.
>
> Yes, I find the name confusing too. Without noncanonical, we'd need
> C call wrappers for every single .S function that had its address
> taken. This is very common in crypto, for example. That level of extra
> code seemed like a total non-starter. Instead, we just get a few places
> we have to mark.

The patch you linked doesn't have a noncanonical attribute, though.
So I'm not sure how to reliably call into asm from C.

(The more I think about it, the less I like "canonical". What is
"canonical"? The symbol? The function body? Something else?)

>
> > 2. I need a raw function pointer, thank you very much. I would like
> > to be honest about it, and I don't really want to bypass CFI, but I
> > need the actual bits in the actual symbol.
> >
> > translation unit 1 defines func. Maybe it's C with
> > -fsanitize-cfi-canonical-jump-tables, maybe it's C with
> > -fno-sanitize-cfi-canonical-jump-tables or however it's spelled, and
> > maybe it's plain asm. Now translation unit 2 does:
> >
> > 2a. Uses a literal symbol, because it's going to modify function text
> > or poke an MSR or whatever:
> >
> > wrmsrl(MSR_WHATEVER, func);
> >
> > clang needs to give us *some* way to have a correct declaration of
> > func such that we can, without resorting to inline asm kludges, get
> > the actual bit pattern of the actual symbol.
>
> We don't want version of a global symbol alias of func that points to
> the function body, though; it's only very specific cases where this
> should be stripped (MSR, ftrace, etc).
>
> So, if there were some Clang-specific syntax for this, it would still be
> used on a case-by-case basis. It would still be something like:
>
> wrmsrl(MSR_WAT, __builtin_gimme_body_p(func));
>

> Okay, you're saying you want __builtin_gimme_body_p() to be a constant
> expression for the compiler, not inline asm?

Yes.

I admit that, in the trivial case where the asm code is *not* a
C-ABI-compliant function, giving a type that doesn't fool the compiler
into thinking that it might be is probably the best fix. Maybe we
should standardize something, e.g.:

struct raw_symbol; /* not defined anywhere */
#define DECLARE_RAW_SYMBOL(x) struct raw_symbol x[]

and then we write this:

DECLARE_RAW_SYMBOL(entry_SYSCALL_64);

wrmsrl(..., (unsigned long)entry_SYSCALL_64);

It would be a bit nifty if we didn't need a forward declaration, but
I'm not immediately seeing a way to do this without hacks that we'll
probably regret;

But this doesn't help the case in which the symbol is an actual
C-callable function and we want to be able to call it, too.

2021-04-17 23:54:58

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 05/15] x86: Implement function_nocfi

On Sat, Apr 17 2021 at 16:19, Andy Lutomirski wrote:
> On Fri, Apr 16, 2021 at 4:40 PM Kees Cook <[email protected]> wrote:
>> Okay, you're saying you want __builtin_gimme_body_p() to be a constant
>> expression for the compiler, not inline asm?
>
> Yes.
>
> I admit that, in the trivial case where the asm code is *not* a
> C-ABI-compliant function, giving a type that doesn't fool the compiler
> into thinking that it might be is probably the best fix. Maybe we
> should standardize something, e.g.:
>
> struct raw_symbol; /* not defined anywhere */
> #define DECLARE_RAW_SYMBOL(x) struct raw_symbol x[]
>
> and then we write this:
>
> DECLARE_RAW_SYMBOL(entry_SYSCALL_64);
>
> wrmsrl(..., (unsigned long)entry_SYSCALL_64);
>
> It would be a bit nifty if we didn't need a forward declaration, but
> I'm not immediately seeing a way to do this without hacks that we'll
> probably regret;
>
> But this doesn't help the case in which the symbol is an actual
> C-callable function and we want to be able to call it, too.

The right way to solve this is that the compiler provides a builtin

function_nocfi() +/- the naming bikeshed

which works for

foo = function_nocfi(bar);

and

static unsigned long foo[] = {
function_nocfi(bar1),
function_nocfi(bar2),
};

which are pretty much the only possible 2 usecases. If the compiler
wants to have function_nocfi_in_code() and function_nocfi_const()
because it can't figure it out on it's own, then I can live with that,
but that's still several orders of magnitudes better than having to work
around it by whatever nasty macro/struct magic.

I don't care about the slightly more unreadable code, but if that
builtin has a descriptive name, then it's even useful for documentary
purposes. And it can be easily grepped for which makes it subject to
static code analysers which can help to detect abuse.

Anything which requires to come up with half baken constructs to work
around the shortcomings of the compiler are wrong to begin with.

Thanks,

tglx

2021-04-18 00:16:00

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH 05/15] x86: Implement function_nocfi

On Sat, Apr 17, 2021 at 4:53 PM Thomas Gleixner <[email protected]> wrote:
>
> On Sat, Apr 17 2021 at 16:19, Andy Lutomirski wrote:
> > On Fri, Apr 16, 2021 at 4:40 PM Kees Cook <[email protected]> wrote:
> >> Okay, you're saying you want __builtin_gimme_body_p() to be a constant
> >> expression for the compiler, not inline asm?
> >
> > Yes.
> >
> > I admit that, in the trivial case where the asm code is *not* a
> > C-ABI-compliant function, giving a type that doesn't fool the compiler
> > into thinking that it might be is probably the best fix. Maybe we
> > should standardize something, e.g.:
> >
> > struct raw_symbol; /* not defined anywhere */
> > #define DECLARE_RAW_SYMBOL(x) struct raw_symbol x[]
> >
> > and then we write this:
> >
> > DECLARE_RAW_SYMBOL(entry_SYSCALL_64);
> >
> > wrmsrl(..., (unsigned long)entry_SYSCALL_64);
> >
> > It would be a bit nifty if we didn't need a forward declaration, but
> > I'm not immediately seeing a way to do this without hacks that we'll
> > probably regret;
> >
> > But this doesn't help the case in which the symbol is an actual
> > C-callable function and we want to be able to call it, too.
>
> The right way to solve this is that the compiler provides a builtin
>
> function_nocfi() +/- the naming bikeshed
>
> which works for
>
> foo = function_nocfi(bar);

I agree in general. But right now, we have, in asm/proto.h:

void entry_SYSCALL_64(void);

and that's pure nonsense. Depending on your point of view,
entry_SYSCALL_64 is a symbol that resolves to an integer or it's an
array of bytes containing instructions, but it is most definitely not
a function void (void). So, regardless of any CFI stuff, I propose
that we standardize our handling of prototypes of symbols that are
opaque to the C compiler. Here are a couple of choices:

Easy one:

extern u8 entry_SYSCALL_64[];

Slightly more complicated:

struct opaque_symbol;
extern struct opaque_symbol entry_SYSCALL_64;

The opaque_symbol variant avoids any possible confusion over the weird
status of arrays in C, and it's hard to misuse, since struct
opaque_symbol is an incomplete type.

--Andy

2021-04-18 16:18:54

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 05/15] x86: Implement function_nocfi

On Sat, Apr 17 2021 at 17:11, Andy Lutomirski wrote:
> On Sat, Apr 17, 2021 at 4:53 PM Thomas Gleixner <[email protected]> wrote:
>> which works for
>>
>> foo = function_nocfi(bar);
>
> I agree in general. But right now, we have, in asm/proto.h:
>
> void entry_SYSCALL_64(void);
>
> and that's pure nonsense. Depending on your point of view,
> entry_SYSCALL_64 is a symbol that resolves to an integer or it's an
> array of bytes containing instructions, but it is most definitely not
> a function void (void). So, regardless of any CFI stuff, I propose
> that we standardize our handling of prototypes of symbols that are
> opaque to the C compiler. Here are a couple of choices:
>
> Easy one:
>
> extern u8 entry_SYSCALL_64[];
>
> Slightly more complicated:
>
> struct opaque_symbol;
> extern struct opaque_symbol entry_SYSCALL_64;
>
> The opaque_symbol variant avoids any possible confusion over the weird
> status of arrays in C, and it's hard to misuse, since struct
> opaque_symbol is an incomplete type.

Makes sense.

2021-04-18 23:00:30

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH 05/15] x86: Implement function_nocfi

On Sun, Apr 18, 2021 at 9:17 AM Thomas Gleixner <[email protected]> wrote:
>
> On Sat, Apr 17 2021 at 17:11, Andy Lutomirski wrote:
> > On Sat, Apr 17, 2021 at 4:53 PM Thomas Gleixner <[email protected]> wrote:
> >> which works for
> >>
> >> foo = function_nocfi(bar);
> >
> > I agree in general. But right now, we have, in asm/proto.h:
> >
> > void entry_SYSCALL_64(void);
> >
> > and that's pure nonsense. Depending on your point of view,
> > entry_SYSCALL_64 is a symbol that resolves to an integer or it's an
> > array of bytes containing instructions, but it is most definitely not
> > a function void (void). So, regardless of any CFI stuff, I propose
> > that we standardize our handling of prototypes of symbols that are
> > opaque to the C compiler. Here are a couple of choices:
> >
> > Easy one:
> >
> > extern u8 entry_SYSCALL_64[];
> >
> > Slightly more complicated:
> >
> > struct opaque_symbol;
> > extern struct opaque_symbol entry_SYSCALL_64;
> >
> > The opaque_symbol variant avoids any possible confusion over the weird
> > status of arrays in C, and it's hard to misuse, since struct
> > opaque_symbol is an incomplete type.
>
> Makes sense.

Sami, do you want to do this as part of your series or should I write a patch?

2021-04-19 09:35:06

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: [PATCH 05/15] x86: Implement function_nocfi

On 17/04/2021 00.28, Kees Cook wrote:
> On Fri, Apr 16, 2021 at 03:06:17PM -0700, Andy Lutomirski wrote:

>> The
>> foo symbol would point to whatever magic is needed.
>
> No, the symbol points to the jump table entry. Direct calls get minimal
> overhead and indirect calls can add the "is this function in the right
> table" checking.
>
>
> But note that this shouldn't turn into a discussion of "maybe Clang could
> do CFI differently"; this is what Clang has.

Why not? In particular, I'd really like somebody to answer the question
"why not just store a cookie before each address-taken or
external-linkage function?".

(1) direct calls get _no_ overhead, not just "minimal".
(2) address comparison, intra- or inter-module, Just Works, no need to
disable one sanity check to get others.
(3) The overhead and implementation of the signature check is the same
for inter- and intra- module calls.
(4) passing (unsigned long)sym into asm code or stashing it in a table
Just Works.

How much code would be needed on the clang side to provide a
--cfi-mode=inline ?

The only issue, AFAICT, which is also a problem being handled in the
current proposal, is indirect calls to asm code from C. They either have
to be instrumented to not do any checking (which requires callers to
know that the pointer they have might point to an asm-implementation),
or the asm code could be taught to emit those cookies as well - which
would provide even better coverage. Something like

CFI_COOKIE(foo)
foo:
...

with CFI_COOKIE expanding to nothing when !CONFIG_CFI, and otherwise to
(something like) ".quad cfi_cookie__foo" ; with some appropriate Kbuild
and compiler magic to generate a table of cfi_cookie__* defines from a
header file with the prototypes.

Rasmus

2021-04-19 16:11:29

by Sami Tolvanen

[permalink] [raw]
Subject: Re: [PATCH 05/15] x86: Implement function_nocfi

On Sat, Apr 17, 2021 at 3:16 AM Thomas Gleixner <[email protected]> wrote:
>
> On Sat, Apr 17 2021 at 01:02, Thomas Gleixner wrote:
> > On Fri, Apr 16 2021 at 15:37, Kees Cook wrote:
> >
> >> On Fri, Apr 16, 2021 at 03:20:17PM -0700, Andy Lutomirski wrote:
> >>> But obviously there is code that needs real function pointers. How
> >>> about making this a first-class feature, or at least hacking around it
> >>> more cleanly. For example, what does this do:
> >>>
> >>> char entry_whatever[];
> >>> wrmsrl(..., (unsigned long)entry_whatever);
> >>
> >> This is just casting. It'll still resolve to the jump table entry.
> >>
> >>> or, alternatively,
> >>>
> >>> extern void func() __attribute__((nocfi));
> >>
> >> __nocfi says func() should not perform checking of correct jump table
> >> membership for indirect calls.
> >>
> >> But we don't want a global marking for a function to be ignored by CFI;
> >> we don't want functions to escape CFI -- we want specific _users_ to
> >> either not check CFI for indirect calls (__nocfi) or we want specific
> >> passed addresses to avoid going through the jump table
> >> (function_nocfi()).
> >
> > And that's why you mark entire files to be exempt without any rationale
> > why it makes sense.
>
> The reason why you have to do that is because function_nocfi() is not
> provided by the compiler.
>
> So you need to hack around that with that macro which fails to work
> e.g. for the idt data arrays.
>
> Is there any fundamental reason why the compiler does not provide that
> in a form which allows to use it everywhere?

I'm not aware of a fundamental reason why the compiler couldn't
provide a built-in here. This series attempts to work with what's
available at the moment, and admittedly that's not quite ideal on x86.

> It's not too much asked from a tool which provides new functionality to
> provide it in a way which is usable.

Sure, that's reasonable. I'll talk to our compiler folks and see how
we can proceed here.

Sami

2021-04-19 16:47:47

by Joao Moreira

[permalink] [raw]
Subject: Re: [PATCH 05/15] x86: Implement function_nocfi


> Why not? In particular, I'd really like somebody to answer the question
> "why not just store a cookie before each address-taken or
> external-linkage function?".
>
FWIIW, this was done before (at least twice): First with grsecurity/PaX
RAP (https://grsecurity.net/rap_faq) then with kCFI
(https://www.blackhat.com/docs/asia-17/materials/asia-17-Moreira-Drop-The-Rop-Fine-Grained-Control-Flow-Integrity-For-The-Linux-Kernel-wp.pdf,
https://github.com/kcfi/kcfi - which is no longer maintained).

At the time I worked on kCFI someone raised a concern regarding this
cookie-based design being mutually exclusive to execute-only memories
(XOM), what, if XOM is really relevant to someone, should be a valid
concern.

Since design is being questioned, an x86/CET-specific third design for
CFI was recently discussed here:
https://www.openwall.com/lists/kernel-hardening/2021/02/11/1 -- I assume
that, arch-dependency considered, this should be easier to integrate
when compared to clang-cfi. Also, given that it is based on CET, this
also has the benefit of constraining mispeculations (which is a nice
side-effect).

Tks, Joao

2021-04-19 19:16:58

by Sami Tolvanen

[permalink] [raw]
Subject: Re: [PATCH 09/15] x86/alternatives: Use C int3 selftest but disable KASAN

On Sat, Apr 17, 2021 at 4:37 AM Peter Zijlstra <[email protected]> wrote:
>
> On Fri, Apr 16, 2021 at 01:38:38PM -0700, Sami Tolvanen wrote:
> > From: Kees Cook <[email protected]>
> >
> > Instead of using inline asm for the int3 selftest (which confuses the
> > Clang's ThinLTO pass), this restores the C function but disables KASAN
> > (and tracing for good measure) to keep the things simple and avoid
> > unexpected side-effects. This attempts to keep the fix from commit
> > ecc606103837 ("x86/alternatives: Fix int3_emulate_call() selftest stack
> > corruption") without using inline asm.
> >
> > Signed-off-by: Kees Cook <[email protected]>
> > Signed-off-by: Sami Tolvanen <[email protected]>
> > ---
> > arch/x86/kernel/alternative.c | 21 ++++-----------------
> > 1 file changed, 4 insertions(+), 17 deletions(-)
> >
> > diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> > index 6974b5174495..669a23454c09 100644
> > --- a/arch/x86/kernel/alternative.c
> > +++ b/arch/x86/kernel/alternative.c
> > @@ -496,23 +496,10 @@ extern struct paravirt_patch_site __start_parainstructions[],
> > *
> > * See entry_{32,64}.S for more details.
> > */
> > -
> > -/*
> > - * We define the int3_magic() function in assembly to control the calling
> > - * convention such that we can 'call' it from assembly.
> > - */
> > -
> > -extern void int3_magic(unsigned int *ptr); /* defined in asm */
> > -
> > -asm (
> > -" .pushsection .init.text, \"ax\", @progbits\n"
> > -" .type int3_magic, @function\n"
> > -"int3_magic:\n"
> > -" movl $1, (%" _ASM_ARG1 ")\n"
> > -" ret\n"
> > -" .size int3_magic, .-int3_magic\n"
> > -" .popsection\n"
> > -);
> > +static void __init __no_sanitize_address notrace int3_magic(unsigned int *ptr)
> > +{
> > + *ptr = 1;
> > +}
>
> I really don't like this. the compiler is free to mess this up in all
> sorts of ways.
>
> The problem is that the call-site does not respect the regular calling
> convention, so calling a regular C function is just asking for trouble,
> which is why it ended up being asm, then we fully control the calling
> convention.

Ack. The problem here is that we can't declare an extern static
function in C. How would you feel about making int3_magic a global
instead to match the C declaration?

Sami

2021-04-19 19:26:54

by Sami Tolvanen

[permalink] [raw]
Subject: Re: [PATCH 05/15] x86: Implement function_nocfi

On Sun, Apr 18, 2021 at 3:57 PM Andy Lutomirski <[email protected]> wrote:
>
> On Sun, Apr 18, 2021 at 9:17 AM Thomas Gleixner <[email protected]> wrote:
> >
> > On Sat, Apr 17 2021 at 17:11, Andy Lutomirski wrote:
> > > On Sat, Apr 17, 2021 at 4:53 PM Thomas Gleixner <[email protected]> wrote:
> > >> which works for
> > >>
> > >> foo = function_nocfi(bar);
> > >
> > > I agree in general. But right now, we have, in asm/proto.h:
> > >
> > > void entry_SYSCALL_64(void);
> > >
> > > and that's pure nonsense. Depending on your point of view,
> > > entry_SYSCALL_64 is a symbol that resolves to an integer or it's an
> > > array of bytes containing instructions, but it is most definitely not
> > > a function void (void). So, regardless of any CFI stuff, I propose
> > > that we standardize our handling of prototypes of symbols that are
> > > opaque to the C compiler. Here are a couple of choices:
> > >
> > > Easy one:
> > >
> > > extern u8 entry_SYSCALL_64[];
> > >
> > > Slightly more complicated:
> > >
> > > struct opaque_symbol;
> > > extern struct opaque_symbol entry_SYSCALL_64;
> > >
> > > The opaque_symbol variant avoids any possible confusion over the weird
> > > status of arrays in C, and it's hard to misuse, since struct
> > > opaque_symbol is an incomplete type.
> >
> > Makes sense.
>
> Sami, do you want to do this as part of your series or should I write a patch?

I can certainly include this in the next version, but that might have
to wait a bit for compiler changes, so if you want this done sooner,
please go ahead.

Sami

2021-04-19 19:27:32

by David Laight

[permalink] [raw]
Subject: RE: [PATCH 05/15] x86: Implement function_nocfi

From: Andy Lutomirski
> Sent: 18 April 2021 01:12
..
> Slightly more complicated:
>
> struct opaque_symbol;
> extern struct opaque_symbol entry_SYSCALL_64;
>
> The opaque_symbol variant avoids any possible confusion over the weird
> status of arrays in C, and it's hard to misuse, since struct
> opaque_symbol is an incomplete type.

Maybe:

s/opaque_symbol/entry_SYSCALL_64/

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2021-04-19 21:54:35

by David Laight

[permalink] [raw]
Subject: RE: [PATCH 05/15] x86: Implement function_nocfi

From: Rasmus Villemoes
> Sent: 19 April 2021 09:40
>
> On 17/04/2021 00.28, Kees Cook wrote:
> > On Fri, Apr 16, 2021 at 03:06:17PM -0700, Andy Lutomirski wrote:
>
> >> The
> >> foo symbol would point to whatever magic is needed.
> >
> > No, the symbol points to the jump table entry. Direct calls get minimal
> > overhead and indirect calls can add the "is this function in the right
> > table" checking.
> >
> >
> > But note that this shouldn't turn into a discussion of "maybe Clang could
> > do CFI differently"; this is what Clang has.
>
> Why not? In particular, I'd really like somebody to answer the question
> "why not just store a cookie before each address-taken or
> external-linkage function?".
>
> (1) direct calls get _no_ overhead, not just "minimal".
> (2) address comparison, intra- or inter-module, Just Works, no need to
> disable one sanity check to get others.
> (3) The overhead and implementation of the signature check is the same
> for inter- and intra- module calls.
> (4) passing (unsigned long)sym into asm code or stashing it in a table
> Just Works.
>
> How much code would be needed on the clang side to provide a
> --cfi-mode=inline ?
>
> The only issue, AFAICT, which is also a problem being handled in the
> current proposal, is indirect calls to asm code from C. They either have
> to be instrumented to not do any checking (which requires callers to
> know that the pointer they have might point to an asm-implementation),
> or the asm code could be taught to emit those cookies as well - which
> would provide even better coverage. Something like
>
> CFI_COOKIE(foo)
> foo:
> ...
>
> with CFI_COOKIE expanding to nothing when !CONFIG_CFI, and otherwise to
> (something like) ".quad cfi_cookie__foo" ; with some appropriate Kbuild
> and compiler magic to generate a table of cfi_cookie__* defines from a
> header file with the prototypes.

I'm wondering what 'threat models' CFI is trying to protect from.
Adding code to code doing 'call indirect' can only protect against
calls to 'inappropriate' functions.
You may also be worried about whether functions are being called
from the right place - someone might manage to 'plant' an indirect
jump somewhere - probably more likely than overwriting an address.
But a truly malicious caller will be able to subvert most checks.
And non-malicious errors can be largely eliminated by strengthening
normal compiler type checking.

A simple run-time check would be adding an extra 'hidden' function
parameter that is a hash of the prototype.
Especially if the hash is based on a build-id - so differs
between kernel builds.

Having the caller scan a list of valid targets seems just broken.
You might as well replace the function pointer with an index
into the array of valid targets.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2021-04-19 22:13:20

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH 05/15] x86: Implement function_nocfi


> On Apr 19, 2021, at 8:26 AM, David Laight <[email protected]> wrote:
>
> From: Andy Lutomirski
>> Sent: 18 April 2021 01:12
> ..
>> Slightly more complicated:
>>
>> struct opaque_symbol;
>> extern struct opaque_symbol entry_SYSCALL_64;
>>
>> The opaque_symbol variant avoids any possible confusion over the weird
>> status of arrays in C, and it's hard to misuse, since struct
>> opaque_symbol is an incomplete type.
>
> Maybe:
>
> s/opaque_symbol/entry_SYSCALL_64/
>

Cute. OTOH, I’m not sure whether that has much benefit, and having a single type for all of this allows it to be declared just once. I suppose the magic could be wrapped in a macro, though.

> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)

2021-04-20 07:21:12

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 09/15] x86/alternatives: Use C int3 selftest but disable KASAN

On Mon, Apr 19, 2021 at 08:26:36AM -0700, Sami Tolvanen wrote:
> On Sat, Apr 17, 2021 at 4:37 AM Peter Zijlstra <[email protected]> wrote:

> > The problem is that the call-site does not respect the regular calling
> > convention, so calling a regular C function is just asking for trouble,
> > which is why it ended up being asm, then we fully control the calling
> > convention.
>
> Ack. The problem here is that we can't declare an extern static
> function in C. How would you feel about making int3_magic a global
> instead to match the C declaration?

Works for me.

2021-04-20 18:15:14

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH 01/15] objtool: Find a destination for jumps beyond the section end

On Fri, Apr 16, 2021 at 01:38:30PM -0700, Sami Tolvanen wrote:
> With -ffunction-sections, Clang can generate a jump beyond the end of
> a section when the section ends in an unreachable instruction.

Why? Can you show an example?

--
Josh

2021-04-20 19:52:32

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH 02/15] objtool: Add CONFIG_CFI_CLANG support

On Fri, Apr 16, 2021 at 01:38:31PM -0700, Sami Tolvanen wrote:
> +static int fix_cfi_relocs(const struct elf *elf)
> +{
> + struct section *sec, *tmpsec;
> + struct reloc *reloc, *tmpreloc;
> +
> + list_for_each_entry_safe(sec, tmpsec, &elf->sections, list) {
> + list_for_each_entry_safe(reloc, tmpreloc, &sec->reloc_list, list) {

These loops don't remove structs from the lists, so are the "safe"
iterators really needed?

> + struct symbol *sym;
> + struct reloc *func_reloc;
> +
> + /*
> + * CONFIG_CFI_CLANG replaces function relocations to refer
> + * to an intermediate jump table. Undo the conversion so
> + * objtool can make sense of things.
> + */

I think this comment could be clearer if it were placed above the
function.

> + if (!reloc->sym->sec->cfi_jt)
> + continue;
> +
> + if (reloc->sym->type == STT_SECTION)
> + sym = find_func_by_offset(reloc->sym->sec,
> + reloc->addend);
> + else
> + sym = reloc->sym;
> +
> + if (!sym || !sym->sec)
> + continue;

This should be a fatal error, it probably means something's gone wrong
with the reading of the jump table.

> +
> + /*
> + * The jump table immediately jumps to the actual function,
> + * so look up the relocation there.
> + */
> + func_reloc = find_reloc_by_dest_range(elf, sym->sec, sym->offset, 5);

The jump instruction's reloc is always at offset 1, so this can maybe be
optimized to:

func_reloc = find_reloc_by_dest(elf, sym->sec, sym->offset+1);

though of course that will probably break (future) arm64 objtool. Maybe
an arch-specific offset from the start of the insn would be good.

> + if (!func_reloc || !func_reloc->sym)
> + continue;

This should also be an error.

> +
> + reloc->sym = func_reloc->sym;

I think we should also do 'reloc->addend = 0', because of the
STT_SECTION case:

0000000000000025 0000259e0000000b R_X86_64_32S 0000000000000000 .text..L.cfi.jumptable.8047 + 8

which then translates to

0000000000000009 0000063200000004 R_X86_64_PLT32 0000000000000000 x2apic_prepare_cpu - 4

so the original addend of '8' is no longer valid. Copying the '-4'
wouldn't be right either, because that only applies to a PLT32 text
reloc. A '0' should be appropriate for the 32S data reloc.

This addend might not actually be read by any code, so it may not
currently be an actual bug, but better safe than sorry.

--
Josh

2021-04-20 20:26:42

by Sami Tolvanen

[permalink] [raw]
Subject: Re: [PATCH 01/15] objtool: Find a destination for jumps beyond the section end

On Tue, Apr 20, 2021 at 11:14 AM Josh Poimboeuf <[email protected]> wrote:
>
> On Fri, Apr 16, 2021 at 01:38:30PM -0700, Sami Tolvanen wrote:
> > With -ffunction-sections, Clang can generate a jump beyond the end of
> > a section when the section ends in an unreachable instruction.
>
> Why? Can you show an example?

Here's the warning I'm seeing when building allyesconfig + CFI:

vmlinux.o: warning: objtool:
rockchip_spi_transfer_one.f088382d97b74759d70e27e891fe8f1c()+0x149:
can't find jump dest instruction at
.text.rockchip_spi_transfer_one.f088382d97b74759d70e27e891fe8f1c+0x7dc

$ objdump -d -r -j
.text.rockchip_spi_transfer_one.f088382d97b74759d70e27e891fe8f1c
vmlinux.o
0000000000000000 <rockchip_spi_transfer_one.f088382d97b74759d70e27e891fe8f1c>:
...
149: 0f 85 8d 06 00 00 jne 7dc <.compoundliteral.4>
...
7d7: e8 00 00 00 00 callq 7dc <.compoundliteral.4>
7d8: R_X86_64_PLT32 __stack_chk_fail-0x4

Sami

2021-04-20 20:46:58

by Sami Tolvanen

[permalink] [raw]
Subject: Re: [PATCH 02/15] objtool: Add CONFIG_CFI_CLANG support

On Tue, Apr 20, 2021 at 12:48 PM Josh Poimboeuf <[email protected]> wrote:
>
> On Fri, Apr 16, 2021 at 01:38:31PM -0700, Sami Tolvanen wrote:
> > +static int fix_cfi_relocs(const struct elf *elf)
> > +{
> > + struct section *sec, *tmpsec;
> > + struct reloc *reloc, *tmpreloc;
> > +
> > + list_for_each_entry_safe(sec, tmpsec, &elf->sections, list) {
> > + list_for_each_entry_safe(reloc, tmpreloc, &sec->reloc_list, list) {
>
> These loops don't remove structs from the lists, so are the "safe"
> iterators really needed?
>
> > + struct symbol *sym;
> > + struct reloc *func_reloc;
> > +
> > + /*
> > + * CONFIG_CFI_CLANG replaces function relocations to refer
> > + * to an intermediate jump table. Undo the conversion so
> > + * objtool can make sense of things.
> > + */
>
> I think this comment could be clearer if it were placed above the
> function.
>
> > + if (!reloc->sym->sec->cfi_jt)
> > + continue;
> > +
> > + if (reloc->sym->type == STT_SECTION)
> > + sym = find_func_by_offset(reloc->sym->sec,
> > + reloc->addend);
> > + else
> > + sym = reloc->sym;
> > +
> > + if (!sym || !sym->sec)
> > + continue;
>
> This should be a fatal error, it probably means something's gone wrong
> with the reading of the jump table.
>
> > +
> > + /*
> > + * The jump table immediately jumps to the actual function,
> > + * so look up the relocation there.
> > + */
> > + func_reloc = find_reloc_by_dest_range(elf, sym->sec, sym->offset, 5);
>
> The jump instruction's reloc is always at offset 1, so this can maybe be
> optimized to:
>
> func_reloc = find_reloc_by_dest(elf, sym->sec, sym->offset+1);
>
> though of course that will probably break (future) arm64 objtool. Maybe
> an arch-specific offset from the start of the insn would be good.
>
> > + if (!func_reloc || !func_reloc->sym)
> > + continue;
>
> This should also be an error.
>
> > +
> > + reloc->sym = func_reloc->sym;
>
> I think we should also do 'reloc->addend = 0', because of the
> STT_SECTION case:
>
> 0000000000000025 0000259e0000000b R_X86_64_32S 0000000000000000 .text..L.cfi.jumptable.8047 + 8
>
> which then translates to
>
> 0000000000000009 0000063200000004 R_X86_64_PLT32 0000000000000000 x2apic_prepare_cpu - 4
>
> so the original addend of '8' is no longer valid. Copying the '-4'
> wouldn't be right either, because that only applies to a PLT32 text
> reloc. A '0' should be appropriate for the 32S data reloc.
>
> This addend might not actually be read by any code, so it may not
> currently be an actual bug, but better safe than sorry.

Thank you for taking a look, Josh! I'll fix these in the next version.

Sami

2021-04-20 23:01:20

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH 01/15] objtool: Find a destination for jumps beyond the section end

On Tue, Apr 20, 2021 at 01:25:43PM -0700, Sami Tolvanen wrote:
> On Tue, Apr 20, 2021 at 11:14 AM Josh Poimboeuf <[email protected]> wrote:
> >
> > On Fri, Apr 16, 2021 at 01:38:30PM -0700, Sami Tolvanen wrote:
> > > With -ffunction-sections, Clang can generate a jump beyond the end of
> > > a section when the section ends in an unreachable instruction.
> >
> > Why? Can you show an example?
>
> Here's the warning I'm seeing when building allyesconfig + CFI:
>
> vmlinux.o: warning: objtool:
> rockchip_spi_transfer_one.f088382d97b74759d70e27e891fe8f1c()+0x149:
> can't find jump dest instruction at
> .text.rockchip_spi_transfer_one.f088382d97b74759d70e27e891fe8f1c+0x7dc
>
> $ objdump -d -r -j
> .text.rockchip_spi_transfer_one.f088382d97b74759d70e27e891fe8f1c
> vmlinux.o
> 0000000000000000 <rockchip_spi_transfer_one.f088382d97b74759d70e27e891fe8f1c>:
> ...
> 149: 0f 85 8d 06 00 00 jne 7dc <.compoundliteral.4>
> ...
> 7d7: e8 00 00 00 00 callq 7dc <.compoundliteral.4>
> 7d8: R_X86_64_PLT32 __stack_chk_fail-0x4

Instead of silencing the warning by faking the jump destination, I'd
rather improve the warning to something like

"warning: rockchip_spi_transfer_one() falls through to the next function"

which is what we normally do in this type of situation.

It may be caused by UB, or a compiler bug, but either way we should
figure out the root cause.

--
Josh

2021-04-20 23:01:57

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH 01/15] objtool: Find a destination for jumps beyond the section end

On Tue, Apr 20, 2021 at 3:57 PM Josh Poimboeuf <[email protected]> wrote:
>
> On Tue, Apr 20, 2021 at 01:25:43PM -0700, Sami Tolvanen wrote:
> > On Tue, Apr 20, 2021 at 11:14 AM Josh Poimboeuf <[email protected]> wrote:
> > >
> > > On Fri, Apr 16, 2021 at 01:38:30PM -0700, Sami Tolvanen wrote:
> > > > With -ffunction-sections, Clang can generate a jump beyond the end of
> > > > a section when the section ends in an unreachable instruction.
> > >
> > > Why? Can you show an example?
> >
> > Here's the warning I'm seeing when building allyesconfig + CFI:
> >
> > vmlinux.o: warning: objtool:
> > rockchip_spi_transfer_one.f088382d97b74759d70e27e891fe8f1c()+0x149:
> > can't find jump dest instruction at
> > .text.rockchip_spi_transfer_one.f088382d97b74759d70e27e891fe8f1c+0x7dc
> >
> > $ objdump -d -r -j
> > .text.rockchip_spi_transfer_one.f088382d97b74759d70e27e891fe8f1c
> > vmlinux.o
> > 0000000000000000 <rockchip_spi_transfer_one.f088382d97b74759d70e27e891fe8f1c>:
> > ...
> > 149: 0f 85 8d 06 00 00 jne 7dc <.compoundliteral.4>
> > ...
> > 7d7: e8 00 00 00 00 callq 7dc <.compoundliteral.4>
> > 7d8: R_X86_64_PLT32 __stack_chk_fail-0x4
>
> Instead of silencing the warning by faking the jump destination, I'd
> rather improve the warning to something like
>
> "warning: rockchip_spi_transfer_one() falls through to the next function"
>
> which is what we normally do in this type of situation.
>
> It may be caused by UB, or a compiler bug, but either way we should
> figure out the root cause.

We probably want to creduce or cvise this. IIRC we still have
outstanding issues with switch statements with user-annotated
unreachable branches not getting eliminated.
--
Thanks,
~Nick Desaulniers