2021-10-13 18:19:51

by Sami Tolvanen

Subject: [PATCH v5 00/15] x86: Add support for Clang CFI

This series adds support for Clang's Control-Flow Integrity (CFI)
checking to x86_64. With CFI, the compiler injects a runtime
check before each indirect function call to ensure the target is
a valid function with the correct static type. This restricts
possible call targets and makes it more difficult for an attacker
to exploit bugs that allow the modification of stored function
pointers. For more details, see:

https://clang.llvm.org/docs/ControlFlowIntegrity.html
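The class of bug this defends against can be pictured with a short
standalone C sketch (hypothetical code, not taken from the series):
under -fsanitize=cfi, the indirect call below is preceded by a check
that the target really is an address-taken int (*)(int).

```c
/* A valid handler with the expected static type. */
int add_one(int x) { return x + 1; }

struct handler {
	const char *name;
	int (*fn)(int);		/* the pointer an attacker might overwrite */
};

/* With -fsanitize=cfi the compiler inserts a type check before this
 * indirect call; redirecting h->fn to a function of a different type
 * (or to arbitrary code) traps instead of jumping. */
int dispatch(const struct handler *h, int arg)
{
	return h->fn(arg);
}
```

Without CFI, overwriting h->fn with any code address redirects control
flow; with it, only matching-type, address-taken functions remain
reachable call targets.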

Note that v5 is based on tip/master. The first two patches contain
objtool support for CFI, the remaining patches change function
declarations to use opaque types, fix type mismatch issues that
confuse the compiler, and disable CFI where it can't be used.

You can also pull this series from

https://github.com/samitolvanen/linux.git x86-cfi-v5

---
Changes in v5:
- Renamed DECLARE_ASM_FUNC_SYMBOL() to DECLARE_NOT_CALLED_FROM_C()
after some discussion about naming.

- Added an explicit include for <linux/cfi.h> to tracepoint.c.

- Updated commit messages based on feedback.

Changes in v4:
- Dropped the extable patch after the code was refactored in -tip.

- Switched to __section() instead of open-coding the attribute.

- Added an explicit ifdef for filtering out CC_FLAGS_CFI in
purgatory for readability.

- Added a comment to arch_cfi_jump_reloc_offset() in objtool.

Changes in v3:
- Relaxed the Clang version requirement to >= 13 after the missing
compiler fix was backported there.

- Added DEFINE_CFI_IMMEDIATE_RETURN_STUB to address the issue
with tp_stub_func in kernel/tracepoint.c.

- Renamed asm_func_t to asm_func_ptr.

- Changed extable handlers to use __cficanonical instead of
disabling CFI for fixup_exception.

Changes in v2:
- Dropped the first objtool patch as the warnings were fixed in
separate patches.

- Changed fix_cfi_relocs() in objtool to not rely on jump table
symbols, and to return an error if it can't find a relocation.

- Fixed a build issue with ASM_STACK_FRAME_NON_STANDARD().

- Dropped workarounds for inline assembly references to
address-taken static functions with CFI as this was fixed in
the compiler.

- Changed the C declarations of non-callable functions to use
opaque types and dropped the function_nocfi() patches.

- Changed ARCH_SUPPORTS_CFI_CLANG to depend on Clang >=14 for
the compiler fixes.


Kees Cook (1):
x86, relocs: Ignore __typeid__ relocations

Sami Tolvanen (14):
objtool: Add CONFIG_CFI_CLANG support
objtool: Add ASM_STACK_FRAME_NON_STANDARD
linkage: Add DECLARE_NOT_CALLED_FROM_C
cfi: Add DEFINE_CFI_IMMEDIATE_RETURN_STUB
tracepoint: Exclude tp_stub_func from CFI checking
ftrace: Use an opaque type for functions not callable from C
lkdtm: Disable UNSET_SMEP with CFI
lkdtm: Use an opaque type for lkdtm_rodata_do_nothing
x86: Use an opaque type for functions not callable from C
x86/purgatory: Disable CFI
x86, module: Ignore __typeid__ relocations
x86, cpu: Use LTO for cpu.c with CFI
x86, kprobes: Fix optprobe_template_func type mismatch
x86, build: Allow CONFIG_CFI_CLANG to be selected

arch/x86/Kconfig | 1 +
arch/x86/include/asm/ftrace.h | 2 +-
arch/x86/include/asm/idtentry.h | 10 +++---
arch/x86/include/asm/page_64.h | 7 ++--
arch/x86/include/asm/paravirt_types.h | 3 +-
arch/x86/include/asm/processor.h | 2 +-
arch/x86/include/asm/proto.h | 25 ++++++-------
arch/x86/include/asm/uaccess_64.h | 9 ++---
arch/x86/kernel/alternative.c | 2 +-
arch/x86/kernel/ftrace.c | 2 +-
arch/x86/kernel/kprobes/opt.c | 4 +--
arch/x86/kernel/module.c | 4 +++
arch/x86/kernel/paravirt.c | 4 +--
arch/x86/kvm/emulate.c | 4 +--
arch/x86/kvm/kvm_emulate.h | 9 ++---
arch/x86/power/Makefile | 2 ++
arch/x86/purgatory/Makefile | 4 +++
arch/x86/tools/relocs.c | 7 ++++
arch/x86/xen/enlighten_pv.c | 6 ++--
arch/x86/xen/xen-ops.h | 10 +++---
drivers/misc/lkdtm/bugs.c | 2 +-
drivers/misc/lkdtm/lkdtm.h | 2 +-
drivers/misc/lkdtm/perms.c | 2 +-
drivers/misc/lkdtm/rodata.c | 2 +-
include/asm-generic/vmlinux.lds.h | 11 ++++++
include/linux/cfi.h | 13 +++++++
include/linux/ftrace.h | 7 ++--
include/linux/linkage.h | 13 +++++++
include/linux/objtool.h | 6 ++++
kernel/cfi.c | 24 ++++++++++++-
kernel/tracepoint.c | 6 ++--
tools/include/linux/objtool.h | 6 ++++
tools/objtool/arch/x86/decode.c | 17 +++++++++
tools/objtool/elf.c | 51 +++++++++++++++++++++++++++
tools/objtool/include/objtool/arch.h | 3 ++
tools/objtool/include/objtool/elf.h | 2 +-
36 files changed, 218 insertions(+), 66 deletions(-)


base-commit: 880e2b8e3151574b9e3419d1fbb06726ddee8b03
--
2.33.0.1079.g6e70778dc9-goog


2021-10-13 18:19:51

by Sami Tolvanen

Subject: [PATCH v5 01/15] objtool: Add CONFIG_CFI_CLANG support

The upcoming CONFIG_CFI_CLANG support uses -fsanitize=cfi, the
non-canonical version of which hijacks function entry by changing
function relocation references to point to an intermediary jump table.

For example:

Relocation section '.rela.discard.func_stack_frame_non_standard' at offset 0x37e018 contains 6 entries:
Offset Info Type Symbol's Value Symbol's Name + Addend
0000000000000000 0002944700000002 R_X86_64_PC32 00000000000023f0 do_suspend_lowlevel + 0
0000000000000008 0003c11900000001 R_X86_64_64 0000000000000008 xen_cpuid$e69bc59f4fade3b6f2b579b3934137df.cfi_jt + 0
0000000000000010 0003980900000001 R_X86_64_64 0000000000000060 machine_real_restart.cfi_jt + 0
0000000000000018 0003962b00000001 R_X86_64_64 0000000000000e18 kretprobe_trampoline.cfi_jt + 0
0000000000000020 000028f300000001 R_X86_64_64 0000000000000000 .rodata + 12
0000000000000028 000349f400000001 R_X86_64_64 0000000000000018 __crash_kexec.cfi_jt + 0

0000000000000060 <machine_real_restart.cfi_jt>:
60: e9 00 00 00 00 jmpq 65 <machine_real_restart.cfi_jt+0x5>
61: R_X86_64_PLT32 machine_real_restart-0x4
65: cc int3
66: cc int3
67: cc int3

This breaks objtool vmlinux validation in many ways, including static
call site detection and the STACK_FRAME_NON_STANDARD() macro.

Fix it by converting those relocations' symbol references back to their
original non-jump-table versions. Note this doesn't change the actual
relocations in the object itself, it just changes objtool's view of
them. This change is based on Josh's initial patch:

https://lore.kernel.org/r/d743f4b36e120c06506567a9f87a062ae03da47f.1611263462.git.jpoimboe@redhat.com/
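The offsets objtool needs fall directly out of the jump-table entry
encoding shown above: the 0xe9 jmp opcode is one byte, so the rel32
relocation lives at entry offset + 1, and a PC32/PLT32 addend of -4
cancels against rel32 being relative to the next instruction. A
userspace sketch of the arithmetic (illustrative, not objtool code):

```c
#include <stdint.h>

/* The rel32 field of a 5-byte `jmp rel32` starts one byte past the
 * 0xe9 opcode (cf. arch_cfi_jump_reloc_offset() in the patch). */
uint64_t jump_reloc_offset(uint64_t entry_offset)
{
	return entry_offset + 1;
}

/* Resolve the branch target of a `jmp rel32` at `entry`: rel32 is
 * relative to the end of the 5-byte instruction. */
uint64_t jump_target(uint64_t entry, int32_t rel32)
{
	return entry + 5 + (uint64_t)(int64_t)rel32;
}
```

For the machine_real_restart.cfi_jt entry at 0x60, the relocation sits
at 0x61, and once the linker fills in rel32 = target - 4 - 0x61, the
jump resolves to the real function.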

Reported-by: Sedat Dilek <[email protected]>
Suggested-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Sami Tolvanen <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
---
tools/objtool/arch/x86/decode.c | 17 ++++++++++
tools/objtool/elf.c | 51 ++++++++++++++++++++++++++++
tools/objtool/include/objtool/arch.h | 3 ++
tools/objtool/include/objtool/elf.h | 2 +-
4 files changed, 72 insertions(+), 1 deletion(-)

diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index 1f2ae708b223..5fe31523e51f 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -63,6 +63,23 @@ bool arch_callee_saved_reg(unsigned char reg)
}
}

+unsigned long arch_cfi_section_reloc_offset(struct reloc *reloc)
+{
+ if (!reloc->addend)
+ return 0;
+
+ if (reloc->type == R_X86_64_PC32 || reloc->type == R_X86_64_PLT32)
+ return reloc->addend + 4;
+
+ return reloc->addend;
+}
+
+unsigned long arch_cfi_jump_reloc_offset(unsigned long offset)
+{
+ /* offset to the relocation in a jmp instruction */
+ return offset + 1;
+}
+
unsigned long arch_dest_reloc_offset(int addend)
{
return addend + 4;
diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
index b18f0055b50b..cd09c93c34fb 100644
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -18,6 +18,7 @@
#include <errno.h>
#include <objtool/builtin.h>

+#include <objtool/arch.h>
#include <objtool/elf.h>
#include <objtool/warn.h>

@@ -290,6 +291,10 @@ static int read_sections(struct elf *elf)
if (sec->sh.sh_flags & SHF_EXECINSTR)
elf->text_size += sec->sh.sh_size;

+ /* Detect -fsanitize=cfi jump table sections */
+ if (!strncmp(sec->name, ".text..L.cfi.jumptable", 22))
+ sec->cfi_jt = true;
+
list_add_tail(&sec->list, &elf->sections);
elf_hash_add(section, &sec->hash, sec->idx);
elf_hash_add(section_name, &sec->name_hash, str_hash(sec->name));
@@ -575,6 +580,49 @@ static int read_rela_reloc(struct section *sec, int i, struct reloc *reloc, unsi
return 0;
}

+/*
+ * CONFIG_CFI_CLANG replaces function relocations to refer to an intermediate
+ * jump table. Undo the conversion so objtool can make sense of things.
+ */
+static int fix_cfi_relocs(const struct elf *elf)
+{
+ struct section *sec;
+ struct reloc *reloc;
+
+ list_for_each_entry(sec, &elf->sections, list) {
+ list_for_each_entry(reloc, &sec->reloc_list, list) {
+ struct reloc *cfi_reloc;
+ unsigned long offset;
+
+ if (!reloc->sym->sec->cfi_jt)
+ continue;
+
+ if (reloc->sym->type == STT_SECTION)
+ offset = arch_cfi_section_reloc_offset(reloc);
+ else
+ offset = reloc->sym->offset;
+
+ /*
+ * The jump table immediately jumps to the actual function,
+ * so look up the relocation there.
+ */
+ offset = arch_cfi_jump_reloc_offset(offset);
+ cfi_reloc = find_reloc_by_dest(elf, reloc->sym->sec, offset);
+
+ if (!cfi_reloc || !cfi_reloc->sym) {
+ WARN("can't find a CFI jump table relocation at %s+0x%lx",
+ reloc->sym->sec->name, offset);
+ return -1;
+ }
+
+ reloc->sym = cfi_reloc->sym;
+ reloc->addend = 0;
+ }
+ }
+
+ return 0;
+}
+
static int read_relocs(struct elf *elf)
{
struct section *sec;
@@ -638,6 +686,9 @@ static int read_relocs(struct elf *elf)
tot_reloc += nr_reloc;
}

+ if (fix_cfi_relocs(elf))
+ return -1;
+
if (stats) {
printf("max_reloc: %lu\n", max_reloc);
printf("tot_reloc: %lu\n", tot_reloc);
diff --git a/tools/objtool/include/objtool/arch.h b/tools/objtool/include/objtool/arch.h
index 589ff58426ab..93bde8aaf2e3 100644
--- a/tools/objtool/include/objtool/arch.h
+++ b/tools/objtool/include/objtool/arch.h
@@ -81,6 +81,9 @@ unsigned long arch_jump_destination(struct instruction *insn);

unsigned long arch_dest_reloc_offset(int addend);

+unsigned long arch_cfi_section_reloc_offset(struct reloc *reloc);
+unsigned long arch_cfi_jump_reloc_offset(unsigned long offset);
+
const char *arch_nop_insn(int len);
const char *arch_ret_insn(int len);

diff --git a/tools/objtool/include/objtool/elf.h b/tools/objtool/include/objtool/elf.h
index c48c1067797d..e9432be2a0b0 100644
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -38,7 +38,7 @@ struct section {
Elf_Data *data;
char *name;
int idx;
- bool changed, text, rodata, noinstr;
+ bool changed, text, rodata, noinstr, cfi_jt;
};

struct symbol {
--
2.33.0.1079.g6e70778dc9-goog

2021-10-13 18:19:55

by Sami Tolvanen

Subject: [PATCH v5 06/15] ftrace: Use an opaque type for functions not callable from C

With CONFIG_CFI_CLANG, the compiler changes function references to
point to the CFI jump table. Since ftrace_call, ftrace_regs_call, and
mcount_call are not called from C code, but are trampolines injected
as calls that replace the compiler-generated nops at the start of
functions, declare them with DECLARE_NOT_CALLED_FROM_C.
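The linkage.h patch that defines DECLARE_NOT_CALLED_FROM_C is not part
of this excerpt; the sketch below shows one plausible shape of the
opaque-type idea (the asm_func_ptr name follows its use elsewhere in
the series, and ftrace_call_stub is a made-up stand-in for an assembly
symbol):

```c
/* Plausible sketch only: declare an assembly symbol as opaque bytes so
 * its address can still be taken and compared, while a direct C call
 * no longer type-checks, and so no CFI jump-table entry is emitted
 * for references to it. */
typedef const unsigned char *asm_func_ptr;

#define DECLARE_NOT_CALLED_FROM_C(sym) \
	extern const unsigned char sym[]

/* Stand-in for a symbol normally defined in an .S file. */
const unsigned char ftrace_call_stub[1] = { 0xc3 };	/* ret */

DECLARE_NOT_CALLED_FROM_C(ftrace_call_stub);

/* The address remains usable as plain data, e.g. as a patch target. */
asm_func_ptr get_patch_site(void)
{
	return ftrace_call_stub;	/* array decays to asm_func_ptr */
}
```

Attempting ftrace_call_stub() under these declarations is a compile
error, which is exactly the point: the symbol is data to C, code only
to the patching machinery.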

Signed-off-by: Sami Tolvanen <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
---
include/linux/ftrace.h | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 832e65f06754..c53a00b96ba9 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -578,9 +578,10 @@ extern void ftrace_replace_code(int enable);
extern int ftrace_update_ftrace_func(ftrace_func_t func);
extern void ftrace_caller(void);
extern void ftrace_regs_caller(void);
-extern void ftrace_call(void);
-extern void ftrace_regs_call(void);
-extern void mcount_call(void);
+
+DECLARE_NOT_CALLED_FROM_C(ftrace_call);
+DECLARE_NOT_CALLED_FROM_C(ftrace_regs_call);
+DECLARE_NOT_CALLED_FROM_C(mcount_call);

void ftrace_modify_all_code(int command);

--
2.33.0.1079.g6e70778dc9-goog

2021-10-13 18:19:55

by Sami Tolvanen

Subject: [PATCH v5 07/15] lkdtm: Disable UNSET_SMEP with CFI

Disable the UNSET_SMEP test when CONFIG_CFI_CLANG is enabled, as
jumping to a call gadget would always trip CFI instead.

Signed-off-by: Sami Tolvanen <[email protected]>
Acked-by: Kees Cook <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
---
drivers/misc/lkdtm/bugs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/lkdtm/bugs.c b/drivers/misc/lkdtm/bugs.c
index 4282b625200f..6e8677852262 100644
--- a/drivers/misc/lkdtm/bugs.c
+++ b/drivers/misc/lkdtm/bugs.c
@@ -367,7 +367,7 @@ void lkdtm_STACK_GUARD_PAGE_TRAILING(void)

void lkdtm_UNSET_SMEP(void)
{
-#if IS_ENABLED(CONFIG_X86_64) && !IS_ENABLED(CONFIG_UML)
+#if IS_ENABLED(CONFIG_X86_64) && !IS_ENABLED(CONFIG_UML) && !IS_ENABLED(CONFIG_CFI_CLANG)
#define MOV_CR4_DEPTH 64
void (*direct_write_cr4)(unsigned long val);
unsigned char *insn;
--
2.33.0.1079.g6e70778dc9-goog

2021-10-13 18:19:59

by Sami Tolvanen

Subject: [PATCH v5 08/15] lkdtm: Use an opaque type for lkdtm_rodata_do_nothing

Use an opaque type for lkdtm_rodata_do_nothing to stop the compiler
from generating a CFI jump table entry that jumps to .rodata.

Signed-off-by: Sami Tolvanen <[email protected]>
Acked-by: Kees Cook <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
---
drivers/misc/lkdtm/lkdtm.h | 2 +-
drivers/misc/lkdtm/perms.c | 2 +-
drivers/misc/lkdtm/rodata.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h
index c212a253edde..35535c422939 100644
--- a/drivers/misc/lkdtm/lkdtm.h
+++ b/drivers/misc/lkdtm/lkdtm.h
@@ -137,7 +137,7 @@ void lkdtm_REFCOUNT_TIMING(void);
void lkdtm_ATOMIC_TIMING(void);

/* rodata.c */
-void lkdtm_rodata_do_nothing(void);
+DECLARE_NOT_CALLED_FROM_C(lkdtm_rodata_do_nothing);

/* usercopy.c */
void __init lkdtm_usercopy_init(void);
diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c
index 2dede2ef658f..fa2bd90bd8ee 100644
--- a/drivers/misc/lkdtm/perms.c
+++ b/drivers/misc/lkdtm/perms.c
@@ -151,7 +151,7 @@ void lkdtm_EXEC_VMALLOC(void)

void lkdtm_EXEC_RODATA(void)
{
- execute_location(lkdtm_rodata_do_nothing, CODE_AS_IS);
+ execute_location((void *)lkdtm_rodata_do_nothing, CODE_AS_IS);
}

void lkdtm_EXEC_USERSPACE(void)
diff --git a/drivers/misc/lkdtm/rodata.c b/drivers/misc/lkdtm/rodata.c
index baacb876d1d9..17ed0ad4e6ae 100644
--- a/drivers/misc/lkdtm/rodata.c
+++ b/drivers/misc/lkdtm/rodata.c
@@ -3,7 +3,7 @@
* This includes functions that are meant to live entirely in .rodata
* (via objcopy tricks), to validate the non-executability of .rodata.
*/
-#include "lkdtm.h"
+void lkdtm_rodata_do_nothing(void);

void noinstr lkdtm_rodata_do_nothing(void)
{
--
2.33.0.1079.g6e70778dc9-goog

2021-10-13 18:19:59

by Sami Tolvanen

Subject: [PATCH v5 05/15] tracepoint: Exclude tp_stub_func from CFI checking

If allocate_probes fails, func_remove replaces the old function
with a pointer to tp_stub_func, which is called using a mismatching
function pointer that will always trip indirect call checks with
CONFIG_CFI_CLANG. Use DEFINE_CFI_IMMEDIATE_RETURN_STUB to define
tp_stub_func to allow it to pass CFI checking.
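The mismatch is easy to reproduce in a standalone sketch (the probe
signature and helper names here are illustrative, not the tracepoint
code):

```c
#include <stddef.h>

/* Tracepoint probes are invoked through a specific function type. */
typedef void (*probe_fn_t)(void *data, int val);

/* Generic do-nothing stub, analogous to tp_stub_func. */
void tp_stub_func(void) { }

int probe_hits;
void real_probe(void *data, int val)
{
	(void)data;
	probe_hits += val;
}

/* Simplified stand-in for func_remove() on allocation failure: the
 * slot is redirected to the stub via a cast across function types.
 * Under CONFIG_CFI_CLANG, invoking the result through probe_fn_t is
 * exactly the mismatched indirect call the type check rejects. */
probe_fn_t removed_probe(int alloc_failed)
{
	if (alloc_failed)
		return (probe_fn_t)tp_stub_func;
	return real_probe;
}
```

The immediate-return-stub macro sidesteps this by teaching the CFI
failure handler to let such stubs through rather than papering over
the cast.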

Signed-off-by: Sami Tolvanen <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
---
kernel/tracepoint.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 64ea283f2f86..8a0d463c8507 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -9,6 +9,7 @@
#include <linux/list.h>
#include <linux/rcupdate.h>
#include <linux/tracepoint.h>
+#include <linux/cfi.h>
#include <linux/err.h>
#include <linux/slab.h>
#include <linux/sched/signal.h>
@@ -99,10 +100,7 @@ struct tp_probes {
};

/* Called in removal of a func but failed to allocate a new tp_funcs */
-static void tp_stub_func(void)
-{
- return;
-}
+static DEFINE_CFI_IMMEDIATE_RETURN_STUB(tp_stub_func);

static inline void *allocate_probes(int count)
{
--
2.33.0.1079.g6e70778dc9-goog

2021-10-13 18:20:00

by Sami Tolvanen

Subject: [PATCH v5 04/15] cfi: Add DEFINE_CFI_IMMEDIATE_RETURN_STUB

This change introduces the DEFINE_CFI_IMMEDIATE_RETURN_STUB macro,
which defines a stub function that immediately returns and, when
defined in the core kernel, always passes indirect call checking
with CONFIG_CFI_CLANG. Note that this macro should only be used when
a stub cannot be called using the correct function type.
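The collection side relies on a standard toolchain trick: addresses of
excluded stubs are planted in a dedicated section, bounded by
__start_/__stop_ symbols. The kernel spells those bounds out in its
linker script; in a plain userspace build GNU ld and LLD synthesize
them automatically for any C-identifier section name. A sketch of the
same pattern (names are illustrative):

```c
#include <stdbool.h>

void stub_a(void) { }
void stub_b(void) { }

/* Record each excluded stub's address in a dedicated section,
 * analogous to __cfi_excl_##fn landing in .cfi_excluded_stubs. */
const void *excl_a __attribute__((used, section("cfi_excluded"))) =
	(const void *)&stub_a;
const void *excl_b __attribute__((used, section("cfi_excluded"))) =
	(const void *)&stub_b;

/* Bounds synthesized by the linker for C-identifier section names. */
extern const void *__start_cfi_excluded[];
extern const void *__stop_cfi_excluded[];

/* Linear scan over the recorded addresses, mirroring the
 * is_cfi_excluded() loop in the kernel/cfi.c hunk. */
bool is_excluded(const void *ptr)
{
	const void *const *p;

	for (p = __start_cfi_excluded; p < __stop_cfi_excluded; p++)
		if (*p == ptr)
			return true;
	return false;
}
```

Since exclusion is checked only on the slow path of a failed CFI
check, a linear scan over a handful of stubs is cheap enough.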

Signed-off-by: Sami Tolvanen <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
---
include/asm-generic/vmlinux.lds.h | 11 +++++++++++
include/linux/cfi.h | 13 +++++++++++++
kernel/cfi.c | 24 +++++++++++++++++++++++-
3 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index f2984af2b85b..5b77284f7221 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -407,6 +407,16 @@
KEEP(*(.static_call_tramp_key)) \
__stop_static_call_tramp_key = .;

+#ifdef CONFIG_CFI_CLANG
+#define CFI_EXCLUDED_DATA \
+ . = ALIGN(8); \
+ __start_cfi_excluded = .; \
+ KEEP(*(.cfi_excluded_stubs)) \
+ __stop_cfi_excluded = .;
+#else
+#define CFI_EXCLUDED_DATA
+#endif
+
/*
* Allow architectures to handle ro_after_init data on their
* own by defining an empty RO_AFTER_INIT_DATA.
@@ -430,6 +440,7 @@
__start_rodata = .; \
*(.rodata) *(.rodata.*) \
SCHED_DATA \
+ CFI_EXCLUDED_DATA \
RO_AFTER_INIT_DATA /* Read only after init */ \
. = ALIGN(8); \
__start___tracepoints_ptrs = .; \
diff --git a/include/linux/cfi.h b/include/linux/cfi.h
index 879744aaa6e0..19f74af8eac2 100644
--- a/include/linux/cfi.h
+++ b/include/linux/cfi.h
@@ -20,6 +20,17 @@ extern void __cfi_check(uint64_t id, void *ptr, void *diag);
#define __CFI_ADDRESSABLE(fn, __attr) \
const void *__cfi_jt_ ## fn __visible __attr = (void *)&fn

+/*
+ * Defines a stub function that returns immediately, and when defined and
+ * referenced in the core kernel, always passes CFI checking. This should
+ * be used only for stubs that cannot be called using the correct function
+ * pointer type, which should be rare.
+ */
+#define DEFINE_CFI_IMMEDIATE_RETURN_STUB(fn) \
+ void fn(void) { return; } \
+ const void *__cfi_excl_ ## fn __visible \
+ __section(".cfi_excluded_stubs") = (void *)&fn
+
#ifdef CONFIG_CFI_CLANG_SHADOW

extern void cfi_module_add(struct module *mod, unsigned long base_addr);
@@ -35,6 +46,8 @@ static inline void cfi_module_remove(struct module *mod, unsigned long base_addr
#else /* !CONFIG_CFI_CLANG */

#define __CFI_ADDRESSABLE(fn, __attr)
+#define DEFINE_CFI_IMMEDIATE_RETURN_STUB(fn) \
+ void fn(void) { return; }

#endif /* CONFIG_CFI_CLANG */

diff --git a/kernel/cfi.c b/kernel/cfi.c
index 9594cfd1cf2c..8d931089141b 100644
--- a/kernel/cfi.c
+++ b/kernel/cfi.c
@@ -278,12 +278,34 @@ static inline cfi_check_fn find_module_check_fn(unsigned long ptr)
return fn;
}

+extern unsigned long __start_cfi_excluded[];
+extern unsigned long __stop_cfi_excluded[];
+
+static inline bool is_cfi_excluded(unsigned long ptr)
+{
+ unsigned long *p = __start_cfi_excluded;
+
+ for ( ; p < __stop_cfi_excluded; ++p)
+ if (*p == ptr)
+ return true;
+
+ return false;
+}
+
+static void __cfi_pass(uint64_t id, void *ptr, void *diag)
+{
+}
+
static inline cfi_check_fn find_check_fn(unsigned long ptr)
{
cfi_check_fn fn = NULL;

- if (is_kernel_text(ptr))
+ if (is_kernel_text(ptr)) {
+ if (unlikely(is_cfi_excluded(ptr)))
+ return __cfi_pass;
+
return __cfi_check;
+ }

/*
* Indirect call checks can happen when RCU is not watching. Both
--
2.33.0.1079.g6e70778dc9-goog

2021-10-13 18:20:10

by Sami Tolvanen

Subject: [PATCH v5 14/15] x86, kprobes: Fix optprobe_template_func type mismatch

The optprobe_template_func symbol is defined in inline assembly,
but it's not marked global, which conflicts with the C declaration
needed for STACK_FRAME_NON_STANDARD and confuses the compiler when
CONFIG_CFI_CLANG is enabled.

Marking the symbol global would make the compiler happy, but as the
compiler also generates a CFI jump table entry for all address-taken
functions, the jump table ends up containing a jump to the .rodata
section where optprobe_template_func resides, which results in an
objtool warning.

Use ASM_STACK_FRAME_NON_STANDARD instead to avoid both issues.

Signed-off-by: Sami Tolvanen <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
---
arch/x86/kernel/kprobes/opt.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
index 71425ebba98a..95375ef5deee 100644
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -103,6 +103,7 @@ static void synthesize_set_arg1(kprobe_opcode_t *addr, unsigned long val)
asm (
".pushsection .rodata\n"
"optprobe_template_func:\n"
+ ASM_STACK_FRAME_NON_STANDARD(optprobe_template_func)
".global optprobe_template_entry\n"
"optprobe_template_entry:\n"
#ifdef CONFIG_X86_64
@@ -154,9 +155,6 @@ asm (
"optprobe_template_end:\n"
".popsection\n");

-void optprobe_template_func(void);
-STACK_FRAME_NON_STANDARD(optprobe_template_func);
-
#define TMPL_CLAC_IDX \
((long)optprobe_template_clac - (long)optprobe_template_entry)
#define TMPL_MOVE_IDX \
--
2.33.0.1079.g6e70778dc9-goog

2021-10-13 18:20:21

by Sami Tolvanen

Subject: [PATCH v5 10/15] x86/purgatory: Disable CFI

Disable CONFIG_CFI_CLANG for the stand-alone purgatory.ro.

Signed-off-by: Sami Tolvanen <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
---
arch/x86/purgatory/Makefile | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 95ea17a9d20c..911954fec31c 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -55,6 +55,10 @@ ifdef CONFIG_RETPOLINE
PURGATORY_CFLAGS_REMOVE += $(RETPOLINE_CFLAGS)
endif

+ifdef CONFIG_CFI_CLANG
+PURGATORY_CFLAGS_REMOVE += $(CC_FLAGS_CFI)
+endif
+
CFLAGS_REMOVE_purgatory.o += $(PURGATORY_CFLAGS_REMOVE)
CFLAGS_purgatory.o += $(PURGATORY_CFLAGS)

--
2.33.0.1079.g6e70778dc9-goog

2021-10-13 18:20:24

by Sami Tolvanen

Subject: [PATCH v5 09/15] x86: Use an opaque type for functions not callable from C

The kernel has several assembly functions that are not directly callable
from C. Use an opaque type for these function prototypes to make misuse
harder, and to avoid the need to annotate references to these functions
for Clang's Control-Flow Integrity (CFI).

Suggested-by: Andy Lutomirski <[email protected]>
Suggested-by: Alexander Lobakin <[email protected]>
Signed-off-by: Sami Tolvanen <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
---
arch/x86/include/asm/ftrace.h | 2 +-
arch/x86/include/asm/idtentry.h | 10 +++++-----
arch/x86/include/asm/page_64.h | 7 ++++---
arch/x86/include/asm/paravirt_types.h | 3 ++-
arch/x86/include/asm/processor.h | 2 +-
arch/x86/include/asm/proto.h | 25 +++++++++++++------------
arch/x86/include/asm/uaccess_64.h | 9 +++------
arch/x86/kernel/alternative.c | 2 +-
arch/x86/kernel/ftrace.c | 2 +-
arch/x86/kernel/paravirt.c | 4 ++--
arch/x86/kvm/emulate.c | 4 ++--
arch/x86/kvm/kvm_emulate.h | 9 ++-------
arch/x86/xen/enlighten_pv.c | 6 +++---
arch/x86/xen/xen-ops.h | 10 +++++-----
14 files changed, 45 insertions(+), 50 deletions(-)

diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index 9f3130f40807..bc675d6ce4eb 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -17,7 +17,7 @@

#ifndef __ASSEMBLY__
extern atomic_t modifying_ftrace_code;
-extern void __fentry__(void);
+DECLARE_NOT_CALLED_FROM_C(__fentry__);

static inline unsigned long ftrace_call_adjust(unsigned long addr)
{
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 1345088e9902..6538bf5a47d6 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -27,8 +27,8 @@
* as well which is used to emit the entry stubs in entry_32/64.S.
*/
#define DECLARE_IDTENTRY(vector, func) \
- asmlinkage void asm_##func(void); \
- asmlinkage void xen_asm_##func(void); \
+ DECLARE_NOT_CALLED_FROM_C(asm_##func); \
+ DECLARE_NOT_CALLED_FROM_C(xen_asm_##func); \
__visible void func(struct pt_regs *regs)

/**
@@ -78,8 +78,8 @@ static __always_inline void __##func(struct pt_regs *regs)
* C-handler.
*/
#define DECLARE_IDTENTRY_ERRORCODE(vector, func) \
- asmlinkage void asm_##func(void); \
- asmlinkage void xen_asm_##func(void); \
+ DECLARE_NOT_CALLED_FROM_C(asm_##func); \
+ DECLARE_NOT_CALLED_FROM_C(xen_asm_##func); \
__visible void func(struct pt_regs *regs, unsigned long error_code)

/**
@@ -386,7 +386,7 @@ static __always_inline void __##func(struct pt_regs *regs)
* - The C handler called from the C shim
*/
#define DECLARE_IDTENTRY_DF(vector, func) \
- asmlinkage void asm_##func(void); \
+ DECLARE_NOT_CALLED_FROM_C(asm_##func); \
__visible void func(struct pt_regs *regs, \
unsigned long error_code, \
unsigned long address)
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index 4bde0dc66100..22beb80c0708 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -5,6 +5,7 @@
#include <asm/page_64_types.h>

#ifndef __ASSEMBLY__
+#include <linux/linkage.h>
#include <asm/alternative.h>

/* duplicated to the one in bootmem.h */
@@ -40,9 +41,9 @@ extern unsigned long __phys_addr_symbol(unsigned long);
#define pfn_valid(pfn) ((pfn) < max_pfn)
#endif

-void clear_page_orig(void *page);
-void clear_page_rep(void *page);
-void clear_page_erms(void *page);
+DECLARE_NOT_CALLED_FROM_C(clear_page_orig);
+DECLARE_NOT_CALLED_FROM_C(clear_page_rep);
+DECLARE_NOT_CALLED_FROM_C(clear_page_erms);

static inline void clear_page(void *page)
{
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index d9d6b0203ec4..dfaa50d20d6a 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -38,6 +38,7 @@
#include <asm/desc_defs.h>
#include <asm/pgtable_types.h>
#include <asm/nospec-branch.h>
+#include <asm/proto.h>

struct page;
struct thread_struct;
@@ -271,7 +272,7 @@ struct paravirt_patch_template {

extern struct pv_info pv_info;
extern struct paravirt_patch_template pv_ops;
-extern void (*paravirt_iret)(void);
+extern asm_func_ptr paravirt_iret;

#define PARAVIRT_PATCH(x) \
(offsetof(struct paravirt_patch_template, x) / sizeof(void *))
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 577f342dbfb2..1e6b6372b53b 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -449,7 +449,7 @@ static inline unsigned long cpu_kernelmode_gs_base(int cpu)

DECLARE_PER_CPU(void *, hardirq_stack_ptr);
DECLARE_PER_CPU(bool, hardirq_stack_inuse);
-extern asmlinkage void ignore_sysret(void);
+DECLARE_NOT_CALLED_FROM_C(ignore_sysret);

/* Save actual FS/GS selectors and bases to current->thread */
void current_save_fsgs(void);
diff --git a/arch/x86/include/asm/proto.h b/arch/x86/include/asm/proto.h
index 8c5d1910a848..55d1161c985a 100644
--- a/arch/x86/include/asm/proto.h
+++ b/arch/x86/include/asm/proto.h
@@ -2,6 +2,7 @@
#ifndef _ASM_X86_PROTO_H
#define _ASM_X86_PROTO_H

+#include <linux/linkage.h>
#include <asm/ldt.h>

struct task_struct;
@@ -11,26 +12,26 @@ struct task_struct;
void syscall_init(void);

#ifdef CONFIG_X86_64
-void entry_SYSCALL_64(void);
-void entry_SYSCALL_64_safe_stack(void);
+DECLARE_NOT_CALLED_FROM_C(entry_SYSCALL_64);
+DECLARE_NOT_CALLED_FROM_C(entry_SYSCALL_64_safe_stack);
long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2);
#endif

#ifdef CONFIG_X86_32
-void entry_INT80_32(void);
-void entry_SYSENTER_32(void);
-void __begin_SYSENTER_singlestep_region(void);
-void __end_SYSENTER_singlestep_region(void);
+DECLARE_NOT_CALLED_FROM_C(entry_INT80_32);
+DECLARE_NOT_CALLED_FROM_C(entry_SYSENTER_32);
+DECLARE_NOT_CALLED_FROM_C(__begin_SYSENTER_singlestep_region);
+DECLARE_NOT_CALLED_FROM_C(__end_SYSENTER_singlestep_region);
#endif

#ifdef CONFIG_IA32_EMULATION
-void entry_SYSENTER_compat(void);
-void __end_entry_SYSENTER_compat(void);
-void entry_SYSCALL_compat(void);
-void entry_SYSCALL_compat_safe_stack(void);
-void entry_INT80_compat(void);
+DECLARE_NOT_CALLED_FROM_C(entry_SYSENTER_compat);
+DECLARE_NOT_CALLED_FROM_C(__end_entry_SYSENTER_compat);
+DECLARE_NOT_CALLED_FROM_C(entry_SYSCALL_compat);
+DECLARE_NOT_CALLED_FROM_C(entry_SYSCALL_compat_safe_stack);
+DECLARE_NOT_CALLED_FROM_C(entry_INT80_compat);
#ifdef CONFIG_XEN_PV
-void xen_entry_INT80_compat(void);
+DECLARE_NOT_CALLED_FROM_C(xen_entry_INT80_compat);
#endif
#endif

diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index 45697e04d771..96cf72d6b75c 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -17,12 +17,9 @@
*/

/* Handles exceptions in both to and from, but doesn't do access_ok */
-__must_check unsigned long
-copy_user_enhanced_fast_string(void *to, const void *from, unsigned len);
-__must_check unsigned long
-copy_user_generic_string(void *to, const void *from, unsigned len);
-__must_check unsigned long
-copy_user_generic_unrolled(void *to, const void *from, unsigned len);
+DECLARE_NOT_CALLED_FROM_C(copy_user_enhanced_fast_string);
+DECLARE_NOT_CALLED_FROM_C(copy_user_generic_string);
+DECLARE_NOT_CALLED_FROM_C(copy_user_generic_unrolled);

static __always_inline __must_check unsigned long
copy_user_generic(void *to, const void *from, unsigned len)
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index e9da3dc71254..1a07ce172667 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -530,7 +530,7 @@ extern struct paravirt_patch_site __start_parainstructions[],
* convention such that we can 'call' it from assembly.
*/

-extern void int3_magic(unsigned int *ptr); /* defined in asm */
+DECLARE_NOT_CALLED_FROM_C(int3_magic);

asm (
" .pushsection .init.text, \"ax\", @progbits\n"
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 1b3ce3b4a2a2..a73dfe7c430d 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -589,7 +589,7 @@ void arch_ftrace_trampoline_free(struct ftrace_ops *ops)
#ifdef CONFIG_FUNCTION_GRAPH_TRACER

#ifdef CONFIG_DYNAMIC_FTRACE
-extern void ftrace_graph_call(void);
+DECLARE_NOT_CALLED_FROM_C(ftrace_graph_call);

static const char *ftrace_jmp_replace(unsigned long ip, unsigned long addr)
{
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index ebc45360ffd4..d3471c0e285a 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -138,7 +138,7 @@ void paravirt_set_sched_clock(u64 (*func)(void))
}

/* These are in entry.S */
-extern void native_iret(void);
+DECLARE_NOT_CALLED_FROM_C(native_iret);

static struct resource reserve_ioports = {
.start = 0,
@@ -403,7 +403,7 @@ struct paravirt_patch_template pv_ops = {
#ifdef CONFIG_PARAVIRT_XXL
NOKPROBE_SYMBOL(native_load_idt);

-void (*paravirt_iret)(void) = native_iret;
+asm_func_ptr paravirt_iret = native_iret;
#endif

EXPORT_SYMBOL(pv_ops);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 9a144ca8e146..91600a05b6fd 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -201,7 +201,7 @@ struct opcode {
const struct escape *esc;
const struct instr_dual *idual;
const struct mode_dual *mdual;
- void (*fastop)(struct fastop *fake);
+ fastop_t fastop;
} u;
int (*check_perm)(struct x86_emulate_ctxt *ctxt);
};
@@ -322,7 +322,7 @@ static int fastop(struct x86_emulate_ctxt *ctxt, fastop_t fop);
__FOP_RET(#name)

#define FOP_START(op) \
- extern void em_##op(struct fastop *fake); \
+ DECLARE_NOT_CALLED_FROM_C(em_##op); \
asm(".pushsection .text, \"ax\" \n\t" \
".global em_" #op " \n\t" \
".align " __stringify(FASTOP_SIZE) " \n\t" \
diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index 68b420289d7e..44c1a9324e1c 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -290,13 +290,8 @@ enum x86emul_mode {
#define X86EMUL_SMM_MASK (1 << 6)
#define X86EMUL_SMM_INSIDE_NMI_MASK (1 << 7)

-/*
- * fastop functions are declared as taking a never-defined fastop parameter,
- * so they can't be called from C directly.
- */
-struct fastop;
-
-typedef void (*fastop_t)(struct fastop *);
+/* fastop functions cannot be called from C directly. */
+typedef asm_func_ptr fastop_t;

struct x86_emulate_ctxt {
void *vcpu;
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 4f63117f09bb..d52929ac70c7 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -584,8 +584,8 @@ DEFINE_IDTENTRY_RAW(xenpv_exc_machine_check)
#endif

struct trap_array_entry {
- void (*orig)(void);
- void (*xen)(void);
+ asm_func_ptr orig;
+ asm_func_ptr xen;
bool ist_okay;
};

@@ -644,7 +644,7 @@ static bool __ref get_trap_addr(void **addr, unsigned int ist)
struct trap_array_entry *entry = trap_array + nr;

if (*addr == entry->orig) {
- *addr = entry->xen;
+ *addr = (void *)entry->xen;
ist_okay = entry->ist_okay;
found = true;
break;
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 8bc8b72a205d..a8fbf485556b 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -8,12 +8,12 @@
#include <xen/xen-ops.h>

/* These are code, but not functions. Defined in entry.S */
-extern const char xen_failsafe_callback[];
+DECLARE_NOT_CALLED_FROM_C(xen_failsafe_callback);

-void xen_sysenter_target(void);
+DECLARE_NOT_CALLED_FROM_C(xen_sysenter_target);
#ifdef CONFIG_X86_64
-void xen_syscall_target(void);
-void xen_syscall32_target(void);
+DECLARE_NOT_CALLED_FROM_C(xen_syscall_target);
+DECLARE_NOT_CALLED_FROM_C(xen_syscall32_target);
#endif

extern void *xen_initial_gdt;
@@ -139,7 +139,7 @@ __visible unsigned long xen_read_cr2(void);
__visible unsigned long xen_read_cr2_direct(void);

/* These are not functions, and cannot be called normally */
-__visible void xen_iret(void);
+DECLARE_NOT_CALLED_FROM_C(xen_iret);

extern int xen_panic_handler_init(void);

--
2.33.0.1079.g6e70778dc9-goog

2021-10-13 18:20:48

by Sami Tolvanen

Subject: [PATCH v5 12/15] x86, module: Ignore __typeid__ relocations

The R_X86_64_8 __typeid__* relocations are for constants the compiler
generates for indirect call type checking with CONFIG_CFI_CLANG. Ignore
them when loading modules.

Signed-off-by: Sami Tolvanen <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
---
arch/x86/kernel/module.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 5e9a34b5bd74..c4aeba237eef 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -197,6 +197,10 @@ static int __apply_relocate_add(Elf64_Shdr *sechdrs,
val -= (u64)loc;
write(loc, &val, 8);
break;
+ case R_X86_64_8:
+ if (!strncmp(strtab + sym->st_name, "__typeid__", 10))
+ break;
+ fallthrough;
default:
pr_err("%s: Unknown rela relocation: %llu\n",
me->name, ELF64_R_TYPE(rel[i].r_info));
--
2.33.0.1079.g6e70778dc9-goog

2021-10-13 18:20:52

by Sami Tolvanen

Subject: [PATCH v5 13/15] x86, cpu: Use LTO for cpu.c with CFI

Allow LTO to be used for cpu.c when CONFIG_CFI_CLANG is enabled to avoid
indirect call failures. CFI requires Clang >= 13, which doesn't have the
stack protector inlining bug.

Signed-off-by: Sami Tolvanen <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
---
arch/x86/power/Makefile | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/power/Makefile b/arch/x86/power/Makefile
index 379777572bc9..a0532851fed7 100644
--- a/arch/x86/power/Makefile
+++ b/arch/x86/power/Makefile
@@ -4,9 +4,11 @@
# itself be stack-protected
CFLAGS_cpu.o := -fno-stack-protector

+ifndef CONFIG_CFI_CLANG
# Clang may incorrectly inline functions with stack protector enabled into
# __restore_processor_state(): https://bugs.llvm.org/show_bug.cgi?id=47479
CFLAGS_REMOVE_cpu.o := $(CC_FLAGS_LTO)
+endif

obj-$(CONFIG_PM_SLEEP) += cpu.o
obj-$(CONFIG_HIBERNATION) += hibernate_$(BITS).o hibernate_asm_$(BITS).o hibernate.o
--
2.33.0.1079.g6e70778dc9-goog

2021-10-13 18:22:21

by Sami Tolvanen

Subject: [PATCH v5 03/15] linkage: Add DECLARE_NOT_CALLED_FROM_C

The kernel has several assembly functions that are not directly callable
from C but need to be referred to from C code. This change adds the
DECLARE_NOT_CALLED_FROM_C macro, which declares such a symbol using an
opaque type. This makes misuse harder and avoids the need to annotate
references to the functions for Clang's Control-Flow Integrity (CFI).

Suggested-by: Andy Lutomirski <[email protected]>
Suggested-by: Steven Rostedt <[email protected]>
Signed-off-by: Sami Tolvanen <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
---
include/linux/linkage.h | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/include/linux/linkage.h b/include/linux/linkage.h
index dbf8506decca..f982d5f550ac 100644
--- a/include/linux/linkage.h
+++ b/include/linux/linkage.h
@@ -48,6 +48,19 @@
#define __PAGE_ALIGNED_DATA .section ".data..page_aligned", "aw"
#define __PAGE_ALIGNED_BSS .section ".bss..page_aligned", "aw"

+/*
+ * Declares a function not callable from C using an opaque type. Defined as
+ * an array to allow the address of the symbol to be taken without '&'.
+ */
+#ifndef DECLARE_NOT_CALLED_FROM_C
+#define DECLARE_NOT_CALLED_FROM_C(sym) \
+ extern const u8 sym[]
+#endif
+
+#ifndef __ASSEMBLY__
+typedef const u8 *asm_func_ptr;
+#endif
+
/*
* This is used by architectures to keep arguments on the stack
* untouched by the compiler by keeping them live until the end.
--
2.33.0.1079.g6e70778dc9-goog

2021-10-13 18:22:46

by Sami Tolvanen

Subject: [PATCH v5 11/15] x86, relocs: Ignore __typeid__ relocations

From: Kees Cook <[email protected]>

The R_X86_64_8 __typeid__* relocations are for constants the compiler
generates for indirect call type checking with CONFIG_CFI_CLANG. They
can be ignored during relocation generation.

Signed-off-by: Kees Cook <[email protected]>
[ Sami: clarified the commit message ]
Signed-off-by: Sami Tolvanen <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
---
arch/x86/tools/relocs.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index 27c82207d387..5304a6037924 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -51,6 +51,7 @@ static const char * const sym_regex_kernel[S_NSYMTYPES] = {
"^(xen_irq_disable_direct_reloc$|"
"xen_save_fl_direct_reloc$|"
"VDSO|"
+ "__typeid__|"
"__crc_)",

/*
@@ -811,6 +812,12 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
symname);
break;

+ case R_X86_64_8:
+ if (!shn_abs || !is_reloc(S_ABS, symname))
+ die("Non-whitelisted %s relocation: %s\n",
+ rel_type(r_type), symname);
+ break;
+
case R_X86_64_32:
case R_X86_64_32S:
case R_X86_64_64:
--
2.33.0.1079.g6e70778dc9-goog

2021-10-13 18:23:34

by Sami Tolvanen

Subject: [PATCH v5 15/15] x86, build: Allow CONFIG_CFI_CLANG to be selected

Select ARCH_SUPPORTS_CFI_CLANG to allow CFI to be enabled with
Clang >= 13.

Link: https://bugs.llvm.org/show_bug.cgi?id=51588
Signed-off-by: Sami Tolvanen <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
---
arch/x86/Kconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 47023166fb7b..1f310cc4e344 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -107,6 +107,7 @@ config X86
select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
select ARCH_SUPPORTS_LTO_CLANG
select ARCH_SUPPORTS_LTO_CLANG_THIN
+ select ARCH_SUPPORTS_CFI_CLANG if X86_64 && CLANG_VERSION >= 130000
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_MEMTEST
select ARCH_USE_QUEUED_RWLOCKS
--
2.33.0.1079.g6e70778dc9-goog

2021-10-13 18:58:28

by Kees Cook

Subject: Re: [PATCH v5 12/15] x86, module: Ignore __typeid__ relocations

On Wed, Oct 13, 2021 at 11:16:55AM -0700, Sami Tolvanen wrote:
> The R_X86_64_8 __typeid__* relocations are for constants the compiler
> generates for indirect call type checking with CONFIG_CFI_CLANG. Ignore
> them when loading modules.
>
> Signed-off-by: Sami Tolvanen <[email protected]>

Reviewed-by: Kees Cook <[email protected]>

--
Kees Cook

2021-10-13 18:58:41

by Kees Cook

Subject: Re: [PATCH v5 15/15] x86, build: Allow CONFIG_CFI_CLANG to be selected

On Wed, Oct 13, 2021 at 11:16:58AM -0700, Sami Tolvanen wrote:
> Select ARCH_SUPPORTS_CFI_CLANG to allow CFI to be enabled with
> Clang >= 13.
>
> Link: https://bugs.llvm.org/show_bug.cgi?id=51588
> Signed-off-by: Sami Tolvanen <[email protected]>

Reviewed-by: Kees Cook <[email protected]>

--
Kees Cook

2021-10-13 19:02:34

by Kees Cook

Subject: Re: [PATCH v5 01/15] objtool: Add CONFIG_CFI_CLANG support

On Wed, Oct 13, 2021 at 11:16:44AM -0700, Sami Tolvanen wrote:
> The upcoming CONFIG_CFI_CLANG support uses -fsanitize=cfi, the
> non-canonical version of which hijacks function entry by changing
> function relocation references to point to an intermediary jump table.
>
> For example:
>
> Relocation section '.rela.discard.func_stack_frame_non_standard' at offset 0x37e018 contains 6 entries:
> Offset Info Type Symbol's Value Symbol's Name + Addend
> 0000000000000000 0002944700000002 R_X86_64_PC32 00000000000023f0 do_suspend_lowlevel + 0
> 0000000000000008 0003c11900000001 R_X86_64_64 0000000000000008 xen_cpuid$e69bc59f4fade3b6f2b579b3934137df.cfi_jt + 0
> 0000000000000010 0003980900000001 R_X86_64_64 0000000000000060 machine_real_restart.cfi_jt + 0
> 0000000000000018 0003962b00000001 R_X86_64_64 0000000000000e18 kretprobe_trampoline.cfi_jt + 0
> 0000000000000020 000028f300000001 R_X86_64_64 0000000000000000 .rodata + 12
> 0000000000000028 000349f400000001 R_X86_64_64 0000000000000018 __crash_kexec.cfi_jt + 0
>
> 0000000000000060 <machine_real_restart.cfi_jt>:
> 60: e9 00 00 00 00 jmpq 65 <machine_real_restart.cfi_jt+0x5>
> 61: R_X86_64_PLT32 machine_real_restart-0x4
> 65: cc int3
> 66: cc int3
> 67: cc int3
>
> This breaks objtool vmlinux validation in many ways, including static
> call site detection and the STACK_FRAME_NON_STANDARD() macro.
>
> Fix it by converting those relocations' symbol references back to their
> original non-jump-table versions. Note this doesn't change the actual
> relocations in the object itself, it just changes objtool's view of
> them. This change is based on Josh's initial patch:
>
> https://lore.kernel.org/r/d743f4b36e120c06506567a9f87a062ae03da47f.1611263462.git.jpoimboe@redhat.com/
>
> Reported-by: Sedat Dilek <[email protected]>
> Suggested-by: Josh Poimboeuf <[email protected]>
> Signed-off-by: Sami Tolvanen <[email protected]>

This looks really clean. Thanks!

Reviewed-by: Kees Cook <[email protected]>

--
Kees Cook

2021-10-13 19:03:26

by Kees Cook

Subject: Re: [PATCH v5 03/15] linkage: Add DECLARE_NOT_CALLED_FROM_C

On Wed, Oct 13, 2021 at 11:16:46AM -0700, Sami Tolvanen wrote:
> The kernel has several assembly functions that are not directly callable
> from C but need to be referred to from C code. This change adds the
> DECLARE_NOT_CALLED_FROM_C macro, which declares such a symbol using an
> opaque type. This makes misuse harder and avoids the need to annotate
> references to the functions for Clang's Control-Flow Integrity (CFI).
>
> Suggested-by: Andy Lutomirski <[email protected]>
> Suggested-by: Steven Rostedt <[email protected]>
> Signed-off-by: Sami Tolvanen <[email protected]>

I like this; I have a sense CFI won't stay the only user of this
annotation.

Reviewed-by: Kees Cook <[email protected]>

--
Kees Cook

2021-10-13 19:03:56

by Kees Cook

Subject: Re: [PATCH v5 04/15] cfi: Add DEFINE_CFI_IMMEDIATE_RETURN_STUB

On Wed, Oct 13, 2021 at 11:16:47AM -0700, Sami Tolvanen wrote:
> This change introduces the DEFINE_CFI_IMMEDIATE_RETURN_STUB macro,
> which defines a stub function that immediately returns and when
> defined in the core kernel, always passes indirect call checking
> with CONFIG_CFI_CLANG. Note that this macro should only be used when
> a stub cannot be called using the correct function type.
>
> Signed-off-by: Sami Tolvanen <[email protected]>

I remain a bit worried about this exception infrastructure, but it's the
best way forward right now.

One thought: add DEFINE_CFI_IMMEDIATE_RETURN_STUB (and maybe other
things to watch closely) to MAINTAINERS:

diff --git a/MAINTAINERS b/MAINTAINERS
index abdcbcfef73d..2c9a24fd6a3c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4553,6 +4553,7 @@ B: https://github.com/ClangBuiltLinux/linux/issues
T: git git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/clang/features
F: include/linux/cfi.h
F: kernel/cfi.c
+K: \bDEFINE_CFI_IMMEDIATE_RETURN_STUB\b

CLEANCACHE API
M: Konrad Rzeszutek Wilk <[email protected]>

Reviewed-by: Kees Cook <[email protected]>

--
Kees Cook

2021-10-13 19:05:39

by Kees Cook

Subject: Re: [PATCH v5 06/15] ftrace: Use an opaque type for functions not callable from C

On Wed, Oct 13, 2021 at 11:16:49AM -0700, Sami Tolvanen wrote:
> With CONFIG_CFI_CLANG, the compiler changes function references to
> point to the CFI jump table. As ftrace_call, ftrace_regs_call, and
> mcount_call are not called by C code, but are trampolines injected
> as calls replacing the nops at the start of functions added by the
> compiler, use DECLARE_NOT_CALLED_FROM_C to declare them.
>
> Signed-off-by: Sami Tolvanen <[email protected]>

Looks right to me...

Reviewed-by: Kees Cook <[email protected]>

--
Kees Cook

2021-10-13 19:06:56

by Kees Cook

Subject: Re: [PATCH v5 05/15] tracepoint: Exclude tp_stub_func from CFI checking

On Wed, Oct 13, 2021 at 11:16:48AM -0700, Sami Tolvanen wrote:
> If allocate_probes fails, func_remove replaces the old function
> with a pointer to tp_stub_func, which is called using a mismatching
> function pointer that will always trip indirect call checks with
> CONFIG_CFI_CLANG. Use DEFINE_CFI_IMMEDIATE_RETURN_STUB to define
> tp_stub_func to allow it to pass CFI checking.
>
> Signed-off-by: Sami Tolvanen <[email protected]>

Reviewed-by: Kees Cook <[email protected]>

--
Kees Cook

2021-10-13 19:08:57

by Kees Cook

Subject: Re: [PATCH v5 10/15] x86/purgatory: Disable CFI

On Wed, Oct 13, 2021 at 11:16:53AM -0700, Sami Tolvanen wrote:
> Disable CONFIG_CFI_CLANG for the stand-alone purgatory.ro.

If there's a v6, this commit log might do well to have a "why" added. I
assume it'd be something like:

... because purgatory is not running with a full kernel mapping with
jump tables, etc...

Reviewed-by: Kees Cook <[email protected]>

>
> Signed-off-by: Sami Tolvanen <[email protected]>
> Reviewed-by: Nick Desaulniers <[email protected]>
> Tested-by: Nick Desaulniers <[email protected]>
> Tested-by: Sedat Dilek <[email protected]>
> ---
> arch/x86/purgatory/Makefile | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> index 95ea17a9d20c..911954fec31c 100644
> --- a/arch/x86/purgatory/Makefile
> +++ b/arch/x86/purgatory/Makefile
> @@ -55,6 +55,10 @@ ifdef CONFIG_RETPOLINE
> PURGATORY_CFLAGS_REMOVE += $(RETPOLINE_CFLAGS)
> endif
>
> +ifdef CONFIG_CFI_CLANG
> +PURGATORY_CFLAGS_REMOVE += $(CC_FLAGS_CFI)
> +endif
> +
> CFLAGS_REMOVE_purgatory.o += $(PURGATORY_CFLAGS_REMOVE)
> CFLAGS_purgatory.o += $(PURGATORY_CFLAGS)
>
> --
> 2.33.0.1079.g6e70778dc9-goog
>

--
Kees Cook

2021-10-13 19:11:58

by Kees Cook

Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

On Wed, Oct 13, 2021 at 11:16:43AM -0700, Sami Tolvanen wrote:
> This series adds support for Clang's Control-Flow Integrity (CFI)
> checking to x86_64. With CFI, the compiler injects a runtime
> check before each indirect function call to ensure the target is
> a valid function with the correct static type. This restricts
> possible call targets and makes it more difficult for an attacker
> to exploit bugs that allow the modification of stored function
> pointers. For more details, see:
>
> https://clang.llvm.org/docs/ControlFlowIntegrity.html
>
> Note that v5 is based on tip/master. The first two patches contain
> objtool support for CFI, the remaining patches change function
> declarations to use opaque types, fix type mismatch issues that
> confuse the compiler, and disable CFI where it can't be used.

x86 folks: I'd prefer this series went via -tip, but I can carry it for
-next as well. What would you like to do here? I think it's ready.

Thanks!

-Kees

--
Kees Cook

2021-10-13 19:21:35

by Steven Rostedt

Subject: Re: [PATCH v5 05/15] tracepoint: Exclude tp_stub_func from CFI checking

On Wed, 13 Oct 2021 11:16:48 -0700
Sami Tolvanen <[email protected]> wrote:

> If allocate_probes fails, func_remove replaces the old function
> with a pointer to tp_stub_func, which is called using a mismatching
> function pointer that will always trip indirect call checks with
> CONFIG_CFI_CLANG. Use DEFINE_CFI_IMMEDIATE_RETURN_STUB to define
> tp_stub_func to allow it to pass CFI checking.
>
> Signed-off-by: Sami Tolvanen <[email protected]>
> Reviewed-by: Nick Desaulniers <[email protected]>
> Tested-by: Nick Desaulniers <[email protected]>
> Tested-by: Sedat Dilek <[email protected]>
> ---
> kernel/tracepoint.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index 64ea283f2f86..8a0d463c8507 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -9,6 +9,7 @@
> #include <linux/list.h>
> #include <linux/rcupdate.h>
> #include <linux/tracepoint.h>
> +#include <linux/cfi.h>
> #include <linux/err.h>
> #include <linux/slab.h>
> #include <linux/sched/signal.h>
> @@ -99,10 +100,7 @@ struct tp_probes {
> };
>
> /* Called in removal of a func but failed to allocate a new tp_funcs */
> -static void tp_stub_func(void)
> -{
> - return;
> -}
> +static DEFINE_CFI_IMMEDIATE_RETURN_STUB(tp_stub_func);

Reviewed-by: Steven Rostedt (VMware) <[email protected]>

-- Steve

>
> static inline void *allocate_probes(int count)
> {

2021-10-13 19:23:44

by Steven Rostedt

Subject: Re: [PATCH v5 06/15] ftrace: Use an opaque type for functions not callable from C

On Wed, 13 Oct 2021 11:16:49 -0700
Sami Tolvanen <[email protected]> wrote:

> With CONFIG_CFI_CLANG, the compiler changes function references to
> point to the CFI jump table. As ftrace_call, ftrace_regs_call, and
> mcount_call are not called by C code, but are trampolines injected
> as calls replacing the nops at the start of functions added by the
> compiler, use DECLARE_NOT_CALLED_FROM_C to declare them.
>
> Signed-off-by: Sami Tolvanen <[email protected]>
> Tested-by: Nick Desaulniers <[email protected]>
> Tested-by: Sedat Dilek <[email protected]>
> ---
> include/linux/ftrace.h | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 832e65f06754..c53a00b96ba9 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -578,9 +578,10 @@ extern void ftrace_replace_code(int enable);
> extern int ftrace_update_ftrace_func(ftrace_func_t func);
> extern void ftrace_caller(void);
> extern void ftrace_regs_caller(void);
> -extern void ftrace_call(void);
> -extern void ftrace_regs_call(void);
> -extern void mcount_call(void);
> +
> +DECLARE_NOT_CALLED_FROM_C(ftrace_call);
> +DECLARE_NOT_CALLED_FROM_C(ftrace_regs_call);
> +DECLARE_NOT_CALLED_FROM_C(mcount_call);
>
> void ftrace_modify_all_code(int command);
>

Reviewed-by: Steven Rostedt (VMware) <[email protected]>

-- Steve

2021-10-14 00:48:00

by Josh Poimboeuf

Subject: Re: [PATCH v5 01/15] objtool: Add CONFIG_CFI_CLANG support

On Wed, Oct 13, 2021 at 11:16:44AM -0700, Sami Tolvanen wrote:
> The upcoming CONFIG_CFI_CLANG support uses -fsanitize=cfi, the
> non-canonical version of which hijacks function entry by changing
> function relocation references to point to an intermediary jump table.
>
> For example:
>
> Relocation section '.rela.discard.func_stack_frame_non_standard' at offset 0x37e018 contains 6 entries:
> Offset Info Type Symbol's Value Symbol's Name + Addend
> 0000000000000000 0002944700000002 R_X86_64_PC32 00000000000023f0 do_suspend_lowlevel + 0
> 0000000000000008 0003c11900000001 R_X86_64_64 0000000000000008 xen_cpuid$e69bc59f4fade3b6f2b579b3934137df.cfi_jt + 0
> 0000000000000010 0003980900000001 R_X86_64_64 0000000000000060 machine_real_restart.cfi_jt + 0
> 0000000000000018 0003962b00000001 R_X86_64_64 0000000000000e18 kretprobe_trampoline.cfi_jt + 0
> 0000000000000020 000028f300000001 R_X86_64_64 0000000000000000 .rodata + 12
> 0000000000000028 000349f400000001 R_X86_64_64 0000000000000018 __crash_kexec.cfi_jt + 0
>
> 0000000000000060 <machine_real_restart.cfi_jt>:
> 60: e9 00 00 00 00 jmpq 65 <machine_real_restart.cfi_jt+0x5>
> 61: R_X86_64_PLT32 machine_real_restart-0x4
> 65: cc int3
> 66: cc int3
> 67: cc int3
>
> This breaks objtool vmlinux validation in many ways, including static
> call site detection and the STACK_FRAME_NON_STANDARD() macro.
>
> Fix it by converting those relocations' symbol references back to their
> original non-jump-table versions. Note this doesn't change the actual
> relocations in the object itself, it just changes objtool's view of
> them. This change is based on Josh's initial patch:
>
> https://lore.kernel.org/r/d743f4b36e120c06506567a9f87a062ae03da47f.1611263462.git.jpoimboe@redhat.com/
>
> Reported-by: Sedat Dilek <[email protected]>
> Suggested-by: Josh Poimboeuf <[email protected]>
> Signed-off-by: Sami Tolvanen <[email protected]>
> Reviewed-by: Nick Desaulniers <[email protected]>
> Tested-by: Nick Desaulniers <[email protected]>
> Tested-by: Sedat Dilek <[email protected]>

Acked-by: Josh Poimboeuf <[email protected]>

--
Josh

2021-10-14 10:26:34

by Peter Zijlstra

Subject: Re: [PATCH v5 01/15] objtool: Add CONFIG_CFI_CLANG support

On Wed, Oct 13, 2021 at 11:16:44AM -0700, Sami Tolvanen wrote:
> diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
> index 1f2ae708b223..5fe31523e51f 100644
> --- a/tools/objtool/arch/x86/decode.c
> +++ b/tools/objtool/arch/x86/decode.c
> @@ -63,6 +63,23 @@ bool arch_callee_saved_reg(unsigned char reg)
> }
> }
>
> +unsigned long arch_cfi_section_reloc_offset(struct reloc *reloc)
> +{
> + if (!reloc->addend)
> + return 0;
> +
> + if (reloc->type == R_X86_64_PC32 || reloc->type == R_X86_64_PLT32)
> + return reloc->addend + 4;
> +
> + return reloc->addend;
> +}
> +
> +unsigned long arch_cfi_jump_reloc_offset(unsigned long offset)
> +{
> + /* offset to the relocation in a jmp instruction */
> + return offset + 1;
> +}
> +
> unsigned long arch_dest_reloc_offset(int addend)
> {
> return addend + 4;
> diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
> index b18f0055b50b..cd09c93c34fb 100644
> --- a/tools/objtool/elf.c
> +++ b/tools/objtool/elf.c

> @@ -575,6 +580,49 @@ static int read_rela_reloc(struct section *sec, int i, struct reloc *reloc, unsi
> return 0;
> }
>
> +/*
> + * CONFIG_CFI_CLANG replaces function relocations to refer to an intermediate
> + * jump table. Undo the conversion so objtool can make sense of things.
> + */
> +static int fix_cfi_relocs(const struct elf *elf)
> +{
> + struct section *sec;
> + struct reloc *reloc;
> +
> + list_for_each_entry(sec, &elf->sections, list) {
> + list_for_each_entry(reloc, &sec->reloc_list, list) {
> + struct reloc *cfi_reloc;
> + unsigned long offset;
> +
> + if (!reloc->sym->sec->cfi_jt)
> + continue;
> +
> + if (reloc->sym->type == STT_SECTION)
> + offset = arch_cfi_section_reloc_offset(reloc);
> + else
> + offset = reloc->sym->offset;
> +
> + /*
> + * The jump table immediately jumps to the actual function,
> + * so look up the relocation there.
> + */
> + offset = arch_cfi_jump_reloc_offset(offset);
> + cfi_reloc = find_reloc_by_dest(elf, reloc->sym->sec, offset);
> +
> + if (!cfi_reloc || !cfi_reloc->sym) {
> + WARN("can't find a CFI jump table relocation at %s+0x%lx",
> + reloc->sym->sec->name, offset);
> + return -1;
> + }
> +
> + reloc->sym = cfi_reloc->sym;
> + reloc->addend = 0;
> + }
> + }
> +
> + return 0;
> +}

If that section ever gets modified and we end up running
elf_rebuild_reloc_section() on it, we're in trouble, right?

Do we want a fatal error in elf_rebuild_reloc_section() when ->cfi_jt is
set?

2021-10-14 12:13:08

by Borislav Petkov

Subject: Re: [PATCH v5 09/15] x86: Use an opaque type for functions not callable from C

On Wed, Oct 13, 2021 at 11:16:52AM -0700, Sami Tolvanen wrote:
> The kernel has several assembly functions that are not directly callable
> from C. Use an opaque type for these function prototypes to make misuse
> harder, and to avoid the need to annotate references to these functions
> for Clang's Control-Flow Integrity (CFI).
>
> Suggested-by: Andy Lutomirski <[email protected]>
> Suggested-by: Alexander Lobakin <[email protected]>
> Signed-off-by: Sami Tolvanen <[email protected]>
> Reviewed-by: Kees Cook <[email protected]>
> Tested-by: Nick Desaulniers <[email protected]>
> Tested-by: Sedat Dilek <[email protected]>
> ---
> arch/x86/include/asm/ftrace.h | 2 +-
> arch/x86/include/asm/idtentry.h | 10 +++++-----
> arch/x86/include/asm/page_64.h | 7 ++++---
> arch/x86/include/asm/paravirt_types.h | 3 ++-
> arch/x86/include/asm/processor.h | 2 +-
> arch/x86/include/asm/proto.h | 25 +++++++++++++------------
> arch/x86/include/asm/uaccess_64.h | 9 +++------
> arch/x86/kernel/alternative.c | 2 +-
> arch/x86/kernel/ftrace.c | 2 +-
> arch/x86/kernel/paravirt.c | 4 ++--
> arch/x86/kvm/emulate.c | 4 ++--
> arch/x86/kvm/kvm_emulate.h | 9 ++-------
> arch/x86/xen/enlighten_pv.c | 6 +++---
> arch/x86/xen/xen-ops.h | 10 +++++-----
> 14 files changed, 45 insertions(+), 50 deletions(-)

No matter from which direction I look at it, wrapping some functions
with a crazy macro doesn't look good.

So what's the plan here?

Every time someone adds an asm function which is not callable from C but
forgets to use that magic macro, someone from team CFI will send a patch
fixing that?

I.e., a whack-a-mole game?

If we're going to do that keep-CFI-working game, we might just as well
not do the macro but use the C code it evaluates to, so that at least it
looks ok-ish:

DECLARE_NOT_CALLED_FROM_C(int3_magic);

vs

extern const u8 int3_magic[];

(Just picked one at random).

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-10-14 19:07:52

by Borislav Petkov

Subject: Re: [PATCH v5 09/15] x86: Use an opaque type for functions not callable from C

On Thu, Oct 14, 2021 at 09:07:57AM -0700, Kees Cook wrote:
> I don't think it's a super common thing to add, so in this case, yes,
> I think doing it on a case-by-case basis will be fine.

You don't have a choice - if there's no automation which verifies
whether all the CFI annotation needed is there, people won't know what
to wrap in what macro.

> I'd _much_ prefer keeping the macro, as it explains what's going on,
> which doesn't require a comment at every "extern const u8 foo[]" usage.

You don't have to - it is just an extern.

> It serves as an annotation, etc.

Oh, that I figured.

> And, there's been a lot of discussion on the best way to do this, what
> to name it, etc.

Looking at the changelog, DECLARE_ASM_FUNC_SYMBOL makes a lot more
sense to me even if it doesn't specify the aspect that it is not called
by C but who cares - it is generic enough.

> This looks to be the best option currently.

Maybe because wrapping some random symbols in an obfuscating macro to
make the next tool happy is simply the wrong thing to do. I know, I
know, clang CFI needs it because of magical reason X but that doesn't
make it any better. Someday soon we'll have to write a tutorial for
people submitting kernel patches explaining what annotation to add where
and why.

Why can't clang be taught to ignore those symbols:

clang -fsanitize=cfi -fsanitize-cfi-ignore-symbols=<list>

?

Hmm, looking at https://clang.llvm.org/docs/ControlFlowIntegrity.html

there *is* an ignore list:

"Ignorelist

A Sanitizer special case list can be used to relax CFI checks for
certain source files, functions and types using the src, fun and type
entity types. Specific CFI modes can be specified using [section]
headers.

...

# Ignore all functions with names containing MyFooBar.
fun:*MyFooBar*
..."


So why aren't we doing that instead of those macros?
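For reference, a Sanitizer special case list for CFI is a plain text file passed to Clang via -fsanitize-ignorelist= (spelled -fsanitize-blacklist= in older releases). The entries below are illustrative only, not a proposed kernel list:

```
# cfi-ignorelist.txt (illustrative entries only)
[cfi-icall]
# Ignore indirect-call checks for a specific symbol
fun:int3_magic
# Relax checks for a whole source file
src:arch/x86/kernel/alternative.c
```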

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-10-14 20:31:08

by Kees Cook

Subject: Re: [PATCH v5 09/15] x86: Use an opaque type for functions not callable from C

On Thu, Oct 14, 2021 at 01:21:38PM +0200, Borislav Petkov wrote:
> On Wed, Oct 13, 2021 at 11:16:52AM -0700, Sami Tolvanen wrote:
> > The kernel has several assembly functions that are not directly callable
> > from C. Use an opaque type for these function prototypes to make misuse
> > harder, and to avoid the need to annotate references to these functions
> > for Clang's Control-Flow Integrity (CFI).
> >
> > Suggested-by: Andy Lutomirski <[email protected]>
> > Suggested-by: Alexander Lobakin <[email protected]>
> > Signed-off-by: Sami Tolvanen <[email protected]>
> > Reviewed-by: Kees Cook <[email protected]>
> > Tested-by: Nick Desaulniers <[email protected]>
> > Tested-by: Sedat Dilek <[email protected]>
> > ---
> > arch/x86/include/asm/ftrace.h | 2 +-
> > arch/x86/include/asm/idtentry.h | 10 +++++-----
> > arch/x86/include/asm/page_64.h | 7 ++++---
> > arch/x86/include/asm/paravirt_types.h | 3 ++-
> > arch/x86/include/asm/processor.h | 2 +-
> > arch/x86/include/asm/proto.h | 25 +++++++++++++------------
> > arch/x86/include/asm/uaccess_64.h | 9 +++------
> > arch/x86/kernel/alternative.c | 2 +-
> > arch/x86/kernel/ftrace.c | 2 +-
> > arch/x86/kernel/paravirt.c | 4 ++--
> > arch/x86/kvm/emulate.c | 4 ++--
> > arch/x86/kvm/kvm_emulate.h | 9 ++-------
> > arch/x86/xen/enlighten_pv.c | 6 +++---
> > arch/x86/xen/xen-ops.h | 10 +++++-----
> > 14 files changed, 45 insertions(+), 50 deletions(-)
>
> No matter from which direction I look at it, wrapping some functions
> with a crazy macro doesn't look good.
>
> So what's the plan here?
>
> Every time someone adds an asm function which is not callable from C but
> forgets to use that magic macro, someone from team CFI will send a patch
> fixing that?
>
> I.e., a whack-a-mole game?

I don't think it's a super common thing to add, so in this case, yes,
I think doing it on a case-by-case basis will be fine. This is common
practice in the kernel; not everyone tests all CONFIGs, so stuff gets
missed, patches are sent, life goes on. :)

> If we're going to do that keep-CFI-working game, we might just as well
> not do the macro but use the C code it evaluates to, so that at least it
> looks ok-ish:
>
> DECLARE_NOT_CALLED_FROM_C(int3_magic);
>
> vs
>
> extern const u8 int3_magic[];

I'd _much_ prefer keeping the macro, as it explains what's going on,
which doesn't require a comment at every "extern const u8 foo[]" usage.
It serves as an annotation, etc.

And, there's been a lot of discussion on the best way to do this, what
to name it, etc. This looks to be the best option currently.

-Kees

--
Kees Cook

2021-10-14 22:30:09

by Sami Tolvanen

Subject: Re: [PATCH v5 09/15] x86: Use an opaque type for functions not callable from C

On Thu, Oct 14, 2021 at 10:31 AM Borislav Petkov <[email protected]> wrote:
>
> On Thu, Oct 14, 2021 at 09:07:57AM -0700, Kees Cook wrote:
> > I don't think it's a super common thing to add, so in this case, yes,
> > I think doing it on a case-by-case basis will be fine.
>
> You don't have a choice - if there's no automation which verifies
> whether all the CFI annotation needed is there, people won't know what
> to wrap in what macro.
>
> > I'd _much_ prefer keeping the macro, as it explains what's going on,
> > which doesn't require a comment at every "extern const u8 foo[]" usage.
>
> You don't have to - it is just an extern.
>
> > It serves as an annotation, etc.
>
> Oh, that I figured.
>
> > And, there's been a lot of discussion on the best way to do this, what
> > to name it, etc.
>
> Looking at the changelog, DECLARE_ASM_FUNC_SYMBOL makes a lot more
> sense to me even if it doesn't specify the aspect that it is not called
> by C but who cares - it is generic enough.
>
> > This looks to be the best option currently.
>
> Maybe because wrapping some random symbols in an obfuscating macro to
> make the next tool happy, is simply the wrong thing to do. I know, I
> know, clang CFI needs it because of magical reason X but that doesn't
> make it any better. Someday soon we'll have to write a tutorial for
> people submitting kernel patches explaining what annotation to add where
> and why.
>
> Why can't clang be taught to ignore those symbols:
>
> clang -fsanitize=cfi -fsanitize-cfi-ignore-symbols=<list>
>
> ?
>
> Hmm, looking at https://clang.llvm.org/docs/ControlFlowIntegrity.html
>
> there *is* an ignore list:
>
> "Ignorelist
>
> A Sanitizer special case list can be used to relax CFI checks for
> certain source files, functions and types using the src, fun and type
> entity types. Specific CFI modes can be specified using [section]
> headers.
>
> ...
>
> # Ignore all functions with names containing MyFooBar.
> fun:*MyFooBar*
> ..."
>
>
> So why aren't we doing that instead of those macros?

The ignorelist can be used to disable CFI checking in the listed
functions, but the compiler still checks indirect calls made to these
functions from elsewhere, and therefore, using an opaque type is still
necessary to ensure the symbol address isn't replaced with a jump
table address.

Anyway, I thought using a macro would make the code easier to
understand. I'm happy to rename it to something that makes more sense,
but also fine with switching to a simple extern u8[] if that's
preferable.

Sami

2021-10-14 23:22:24

by Kees Cook

Subject: Re: [PATCH v5 09/15] x86: Use an opaque type for functions not callable from C

On Thu, Oct 14, 2021 at 07:31:26PM +0200, Borislav Petkov wrote:
> On Thu, Oct 14, 2021 at 09:07:57AM -0700, Kees Cook wrote:
> Looking at the changelog, DECLARE_ASM_FUNC_SYMBOL makes a lot more
> sense to me even if it doesn't specify the aspect that it is not called
> by C but who cares - it is generic enough.

Around we go. :) Josh[1] and Steven[2] explicitly disagreed with
that name, leading to the current name[3]. Do you want it to be
DECLARE_ASM_FUNC_SYMBOL() over those objections?

I'd really like to finish this shed -- I need to take the bikes in from
the rain. :P

-Kees

[1] https://lore.kernel.org/lkml/20211006032945.axlqh3vehgar6adr@treble/
[2] https://lore.kernel.org/lkml/[email protected]/
[3] https://lore.kernel.org/lkml/CABCJKufCaOXOUF43a-PQshO8aEsMNhZ2EiyGMSOp9ZGn57G=pg@mail.gmail.com/

--
Kees Cook

2021-10-14 23:43:38

by Steven Rostedt

Subject: Re: [PATCH v5 09/15] x86: Use an opaque type for functions not callable from C

On Thu, 14 Oct 2021 11:47:01 -0700
Kees Cook <[email protected]> wrote:

> On Thu, Oct 14, 2021 at 07:31:26PM +0200, Borislav Petkov wrote:
> > On Thu, Oct 14, 2021 at 09:07:57AM -0700, Kees Cook wrote:
> > Looking at the changelog, DECLARE_ASM_FUNC_SYMBOL makes a lot more
> > sense to me even if it doesn't specify the aspect that it is not called
> > by C but who cares - it is generic enough.
>
> Around we go. :) Josh[1] and Steven[2] explicitly disagreed with
> that name, leading to the current name[3]. Do you want it to be
> DECLARE_ASM_FUNC_SYMBOL() over those objections?

Just note that I was fine with the original name, but was against the
version Josh suggested ;-)

>
> I'd really like to finish this shed -- I need to take the bikes in from
> the rain. :P

Back to black it is!

-- Steve

2021-10-15 00:02:39

by Josh Poimboeuf

Subject: Re: [PATCH v5 09/15] x86: Use an opaque type for functions not callable from C

On Thu, Oct 14, 2021 at 02:52:11PM -0400, Steven Rostedt wrote:
> On Thu, 14 Oct 2021 11:47:01 -0700
> Kees Cook <[email protected]> wrote:
>
> > On Thu, Oct 14, 2021 at 07:31:26PM +0200, Borislav Petkov wrote:
> > > On Thu, Oct 14, 2021 at 09:07:57AM -0700, Kees Cook wrote:
> > > Looking at the changelog, DECLARE_ASM_FUNC_SYMBOL makes a lot more
> > > sense to me even if it doesn't specify the aspect that it is not called
> > > by C but who cares - it is generic enough.
> >
> > Around we go. :) Josh[1] and Steven[2] explicitly disagreed with
> > that name, leading to the current name[3]. Do you want it to be
> > DECLARE_ASM_FUNC_SYMBOL() over those objections?
>
> Just note that I was fine with the original name, but was against the
> version Josh suggested ;-)

Naming is important, especially for something as confusing as this. We
need to be able to read it in a few months and have some idea of what's
going on.

"DECLARE_ASM_FUNC_SYMBOL" is nonsensical. As a reader of the code I
wonder why some asm functions use it and not others. And how do I
know if I need it for my new function?

"extern const u8 int3_magic[]" is even worse. Why are some asm
functions randomly declared as arrays, and others not? Where can I go
to find out more without digging through the commit log?

--
Josh

2021-10-15 00:02:50

by Nick Desaulniers

Subject: Re: [PATCH v5 09/15] x86: Use an opaque type for functions not callable from C

On Thu, Oct 14, 2021 at 11:24 AM Sami Tolvanen <[email protected]> wrote:
>
> Anyway, I thought using a macro would make the code easier to
> understand. I'm happy to rename it to something that makes more sense,
> but also fine with switching to a simple extern u8[] if that's
> preferable.

Perhaps `extern u8 []`s with a comment that these symbols are
functions that aren't meant to be called from C, only asm (or compiler
generated code) would suffice?
--
Thanks,
~Nick Desaulniers

2021-10-15 00:26:28

by Sami Tolvanen

Subject: Re: [PATCH v5 01/15] objtool: Add CONFIG_CFI_CLANG support

On Thu, Oct 14, 2021 at 3:23 AM Peter Zijlstra <[email protected]> wrote:
>
> On Wed, Oct 13, 2021 at 11:16:44AM -0700, Sami Tolvanen wrote:
> > diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
> > index 1f2ae708b223..5fe31523e51f 100644
> > --- a/tools/objtool/arch/x86/decode.c
> > +++ b/tools/objtool/arch/x86/decode.c
> > @@ -63,6 +63,23 @@ bool arch_callee_saved_reg(unsigned char reg)
> > }
> > }
> >
> > +unsigned long arch_cfi_section_reloc_offset(struct reloc *reloc)
> > +{
> > + if (!reloc->addend)
> > + return 0;
> > +
> > + if (reloc->type == R_X86_64_PC32 || reloc->type == R_X86_64_PLT32)
> > + return reloc->addend + 4;
> > +
> > + return reloc->addend;
> > +}
> > +
> > +unsigned long arch_cfi_jump_reloc_offset(unsigned long offset)
> > +{
> > + /* offset to the relocation in a jmp instruction */
> > + return offset + 1;
> > +}
> > +
> > unsigned long arch_dest_reloc_offset(int addend)
> > {
> > return addend + 4;
> > diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
> > index b18f0055b50b..cd09c93c34fb 100644
> > --- a/tools/objtool/elf.c
> > +++ b/tools/objtool/elf.c
>
> > @@ -575,6 +580,49 @@ static int read_rela_reloc(struct section *sec, int i, struct reloc *reloc, unsi
> > return 0;
> > }
> >
> > +/*
> > + * CONFIG_CFI_CLANG replaces function relocations to refer to an intermediate
> > + * jump table. Undo the conversion so objtool can make sense of things.
> > + */
> > +static int fix_cfi_relocs(const struct elf *elf)
> > +{
> > + struct section *sec;
> > + struct reloc *reloc;
> > +
> > + list_for_each_entry(sec, &elf->sections, list) {
> > + list_for_each_entry(reloc, &sec->reloc_list, list) {
> > + struct reloc *cfi_reloc;
> > + unsigned long offset;
> > +
> > + if (!reloc->sym->sec->cfi_jt)
> > + continue;
> > +
> > + if (reloc->sym->type == STT_SECTION)
> > + offset = arch_cfi_section_reloc_offset(reloc);
> > + else
> > + offset = reloc->sym->offset;
> > +
> > + /*
> > + * The jump table immediately jumps to the actual function,
> > + * so look up the relocation there.
> > + */
> > + offset = arch_cfi_jump_reloc_offset(offset);
> > + cfi_reloc = find_reloc_by_dest(elf, reloc->sym->sec, offset);
> > +
> > + if (!cfi_reloc || !cfi_reloc->sym) {
> > + WARN("can't find a CFI jump table relocation at %s+0x%lx",
> > + reloc->sym->sec->name, offset);
> > + return -1;
> > + }
> > +
> > + reloc->sym = cfi_reloc->sym;
> > + reloc->addend = 0;
> > + }
> > + }
> > +
> > + return 0;
> > +}
>
> If that section ever gets modified and we end up running
> elf_rebuild_reloc_section() on it, we're in trouble, right?
>
> Do we want a fatal error in elf_rebuild_reloc_section() when ->cfi_jt is
> set?

That's a valid point. Since ->cfi_jt is only set for jump table
sections, I think we'll need a different flag for this though.

Sami

2021-10-15 11:15:07

by Andy Lutomirski

Subject: Re: [PATCH v5 03/15] linkage: Add DECLARE_NOT_CALLED_FROM_C



On Wed, Oct 13, 2021, at 11:16 AM, Sami Tolvanen wrote:
> The kernel has several assembly functions, which are not directly
> callable from C but need to be referred to from C code. This change adds
> the DECLARE_NOT_CALLED_FROM_C macro, which allows us to declare these
> symbols using an opaque type, which makes misuse harder, and avoids the
> need to annotate references to the functions for Clang's Control-Flow
> Integrity (CFI).
>
> Suggested-by: Andy Lutomirski <[email protected]>
> Suggested-by: Steven Rostedt <[email protected]>
> Signed-off-by: Sami Tolvanen <[email protected]>
> Tested-by: Nick Desaulniers <[email protected]>
> Tested-by: Sedat Dilek <[email protected]>
> ---
> include/linux/linkage.h | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/include/linux/linkage.h b/include/linux/linkage.h
> index dbf8506decca..f982d5f550ac 100644
> --- a/include/linux/linkage.h
> +++ b/include/linux/linkage.h
> @@ -48,6 +48,19 @@
> #define __PAGE_ALIGNED_DATA .section ".data..page_aligned", "aw"
> #define __PAGE_ALIGNED_BSS .section ".bss..page_aligned", "aw"
>
> +/*
> + * Declares a function not callable from C using an opaque type. Defined as
> + * an array to allow the address of the symbol to be taken without '&'.
> + */

I’m not convinced that taking the address without using & is a laudable goal. The magical arrays-are-pointers-too behavior of C is a mistake, not a delightful simplification.

> +#ifndef DECLARE_NOT_CALLED_FROM_C
> +#define DECLARE_NOT_CALLED_FROM_C(sym) \
> + extern const u8 sym[]
> +#endif

The relevant property of these symbols isn’t that they’re not called from C. The relevant thing is that they are just addresses and not objects of a type that the programmer cares to tell the compiler about. (Or that the compiler understands, for that matter. On a system with XO memory or if they’re in a funny section, dereferencing them may fail.)

So I think we should use incomplete structs, which can’t be dereferenced and will therefore be less error prone.

2021-10-16 22:00:19

by Sami Tolvanen

Subject: Re: [PATCH v5 03/15] linkage: Add DECLARE_NOT_CALLED_FROM_C

On Thu, Oct 14, 2021 at 7:51 PM Andy Lutomirski <[email protected]> wrote:
>
>
>
> On Wed, Oct 13, 2021, at 11:16 AM, Sami Tolvanen wrote:
> > The kernel has several assembly functions, which are not directly
> > callable from C but need to be referred to from C code. This change adds
> > the DECLARE_NOT_CALLED_FROM_C macro, which allows us to declare these
> > symbols using an opaque type, which makes misuse harder, and avoids the
> > need to annotate references to the functions for Clang's Control-Flow
> > Integrity (CFI).
> >
> > Suggested-by: Andy Lutomirski <[email protected]>
> > Suggested-by: Steven Rostedt <[email protected]>
> > Signed-off-by: Sami Tolvanen <[email protected]>
> > Tested-by: Nick Desaulniers <[email protected]>
> > Tested-by: Sedat Dilek <[email protected]>
> > ---
> > include/linux/linkage.h | 13 +++++++++++++
> > 1 file changed, 13 insertions(+)
> >
> > diff --git a/include/linux/linkage.h b/include/linux/linkage.h
> > index dbf8506decca..f982d5f550ac 100644
> > --- a/include/linux/linkage.h
> > +++ b/include/linux/linkage.h
> > @@ -48,6 +48,19 @@
> > #define __PAGE_ALIGNED_DATA .section ".data..page_aligned", "aw"
> > #define __PAGE_ALIGNED_BSS .section ".bss..page_aligned", "aw"
> >
> > +/*
> > + * Declares a function not callable from C using an opaque type. Defined as
> > + * an array to allow the address of the symbol to be taken without '&'.
> > + */
>
> I’m not convinced that taking the address without using & is a laudable goal. The magical arrays-are-pointers-too behavior of C is a mistake, not a delightful simplification.

Sure, if everyone is fine with the extra churn of adding & to the
symbol references, I can certainly switch to an incomplete struct here
instead. I stayed with extern const u8[] because you didn't comment on
the potential churn previously:

https://lore.kernel.org/lkml/CABCJKud4auwY50rO0UzH721eRyyvJ8+43+Xt9vcLgw-SMYtHEw@mail.gmail.com/

Boris, any thoughts about using an incomplete struct instead of extern u8[]?

> > +#ifndef DECLARE_NOT_CALLED_FROM_C
> > +#define DECLARE_NOT_CALLED_FROM_C(sym) \
> > + extern const u8 sym[]
> > +#endif
>
> The relevant property of these symbols isn’t that they’re not called from C. The relevant thing is that they are just addresses and not objects of a type that the programmer cares to tell the compiler about. (Or that the compiler understands, for that matter. On a system with XO memory or if they’re in a funny section, dereferencing them may fail.)

Makes sense. It sounds like the macro should be renamed then. Josh
wasn't a fan of DECLARE_ASM_FUNC_SYMBOL() and if I understand
correctly, you and Boris feel like DECLARE_NOT_CALLED_FROM_C() is not
quite right either. Suggestions?

> So I think we should use incomplete structs, which can’t be dereferenced and will therefore be less error prone.

Another option is to just drop the macro and use something like struct
opaque_symbol instead, as you suggested earlier:

https://lore.kernel.org/lkml/CALCETrVEhL9N_DEM8=rbAzp8Nb2pDitRCZGRAVcE288MBrJ4ug@mail.gmail.com/

Any preferences? I wouldn't mind having a consensus on the approach
and naming before sending v6. :)

Sami

2021-10-16 23:27:54

by Thomas Gleixner

Subject: Re: [PATCH v5 03/15] linkage: Add DECLARE_NOT_CALLED_FROM_C

On Thu, Oct 14 2021 at 19:51, Andy Lutomirski wrote:
> On Wed, Oct 13, 2021, at 11:16 AM, Sami Tolvanen wrote:
>>
>> +/*
>> + * Declares a function not callable from C using an opaque type. Defined as
>> + * an array to allow the address of the symbol to be taken without '&'.
>> + */
> I’m not convinced that taking the address without using & is a
> laudable goal. The magical arrays-are-pointers-too behavior of C is a
> mistake, not a delightful simplification.

>> +#ifndef DECLARE_NOT_CALLED_FROM_C
>> +#define DECLARE_NOT_CALLED_FROM_C(sym) \
>> + extern const u8 sym[]
>> +#endif
>

> The relevant property of these symbols isn’t that they’re not called
> from C. The relevant thing is that they are just addresses and not objects of a
> type that the programmer cares to tell the compiler about. (Or that
> the compiler understands, for that matter. On a system with XO memory
> or if they’re in a funny section, dereferencing them may fail.)

I agree.

> So I think we should use incomplete structs, which can’t be
> dereferenced and will therefore be less error prone.

While being late to that bike shed painting party, I really have to ask
the question _why_ can't the compiler provide an annotation for these
kind of things which:

1) Make the build fail when invoked directly

2) Tell CFI that this is _NOT_ something it can understand

-void clear_page_erms(void *page);
+void __bikeshedme clear_page_erms(void *page);

That still tells me:

1) This is a function

2) It has a regular argument which is expected to be in RDI

which even allows to do analyis of e.g. the alternative call which
invokes that function.

DECLARE_NOT_CALLED_FROM_C(clear_page_erms);

loses these properties and IMO it's a tasteless hack.

Thanks,

tglx

2021-10-17 02:11:04

by Andy Lutomirski

Subject: Re: [PATCH v5 03/15] linkage: Add DECLARE_NOT_CALLED_FROM_C



On Fri, Oct 15, 2021, at 8:55 AM, Thomas Gleixner wrote:
> On Thu, Oct 14 2021 at 19:51, Andy Lutomirski wrote:
>> On Wed, Oct 13, 2021, at 11:16 AM, Sami Tolvanen wrote:
>>>
>>> +/*
>>> + * Declares a function not callable from C using an opaque type. Defined as
>>> + * an array to allow the address of the symbol to be taken without '&'.
>>> + */
>> I’m not convinced that taking the address without using & is a
>> laudable goal. The magical arrays-are-pointers-too behavior of C is a
>> mistake, not a delightful simplification.
>
>>> +#ifndef DECLARE_NOT_CALLED_FROM_C
>>> +#define DECLARE_NOT_CALLED_FROM_C(sym) \
>>> + extern const u8 sym[]
>>> +#endif
>>
>
>> The relevant property of these symbols isn’t that they’re not called
>> from C. The relevant thing is that they are just addresses and not objects of a
>> type that the programmer cares to tell the compiler about. (Or that
>> the compiler understands, for that matter. On a system with XO memory
>> or if they’re in a funny section, dereferencing them may fail.)
>
> I agree.
>
>> So I think we should use incomplete structs, which can’t be
>> dereferenced and will therefore be less error prone.
>
> While being late to that bike shed painting party, I really have to ask
> the question _why_ can't the compiler provide an annotation for these
> kind of things which:
>
> 1) Make the build fail when invoked directly
>
> 2) Tell CFI that this is _NOT_ something it can understand
>
> -void clear_page_erms(void *page);
> +void __bikeshedme clear_page_erms(void *page);
>
> That still tells me:
>
> 1) This is a function
>
> 2) It has a regular argument which is expected to be in RDI
>
> which even allows to do analyis of e.g. the alternative call which
> invokes that function.
>
> DECLARE_NOT_CALLED_FROM_C(clear_page_erms);
>
> loses these properties and IMO it's a tasteless hack.
>


Ah, but clear_page_erms is a different beast entirely as compared to, say, the syscall entry. It *is* a C function. So I see two ways to handle it:

1. Make it completely opaque. Tglx doesn’t like it, and I agree, but it would *work*.

2. Make it a correctly typed function. In clang CFI land, this may or may not be “canonical” (or non canonical?).

I think #2 is far better. I complained about this quite a few versions ago, and, sorry, the word “canonical” is pretty much a non-starter. There needs to be a way to annotate a function pointer type and an extern function declaration that says “the callee follows the ABI *without CFI*” and the compiler needs to do the right thing. And whatever attribute or keyword gets used needs to give the reader at least some chance of understanding.

(If there is a technical reason why function *pointers* of this type can’t be called, perhaps involving IBT, so be it. But the type system should really be aware of C-ABI functions that come from outside the CFI world.)

It looks like clear_page might be improved by using static_call some day, and then proper typing will be a requirement.

Would it help if I file a clang bug about this?

2021-10-17 10:06:31

by Sami Tolvanen

Subject: Re: [PATCH v5 03/15] linkage: Add DECLARE_NOT_CALLED_FROM_C

On Fri, Oct 15, 2021 at 9:22 AM Andy Lutomirski <[email protected]> wrote:
>
>
>
> On Fri, Oct 15, 2021, at 8:55 AM, Thomas Gleixner wrote:
> > On Thu, Oct 14 2021 at 19:51, Andy Lutomirski wrote:
> >> On Wed, Oct 13, 2021, at 11:16 AM, Sami Tolvanen wrote:
> >>>
> >>> +/*
> >>> + * Declares a function not callable from C using an opaque type. Defined as
> >>> + * an array to allow the address of the symbol to be taken without '&'.
> >>> + */
> >> I’m not convinced that taking the address without using & is a
> >> laudable goal. The magical arrays-are-pointers-too behavior of C is a
> >> mistake, not a delightful simplification.
> >
> >>> +#ifndef DECLARE_NOT_CALLED_FROM_C
> >>> +#define DECLARE_NOT_CALLED_FROM_C(sym) \
> >>> + extern const u8 sym[]
> >>> +#endif
> >>
> >
> >> The relevant property of these symbols isn’t that they’re not called
> >> from C. The relevant thing is that they are just addresses and not objects of a
> >> type that the programmer cares to tell the compiler about. (Or that
> >> the compiler understands, for that matter. On a system with XO memory
> >> or if they’re in a funny section, dereferencing them may fail.)
> >
> > I agree.
> >
> >> So I think we should use incomplete structs, which can’t be
> >> dereferenced and will therefore be less error prone.
> >
> > While being late to that bike shed painting party, I really have to ask
> > the question _why_ can't the compiler provide an annotation for these
> > kind of things which:
> >
> > 1) Make the build fail when invoked directly
> >
> > 2) Tell CFI that this is _NOT_ something it can understand
> >
> > -void clear_page_erms(void *page);
> > +void __bikeshedme clear_page_erms(void *page);
> >
> > That still tells me:
> >
> > 1) This is a function
> >
> > 2) It has a regular argument which is expected to be in RDI
> >
> > which even allows to do analyis of e.g. the alternative call which
> > invokes that function.
> >
> > DECLARE_NOT_CALLED_FROM_C(clear_page_erms);
> >
> > loses these properties and IMO it's a tasteless hack.
> >
>
>
> Ah, but clear_page_erms is a different beast entirely as compared to, say, the syscall entry. It *is* a C function. So I see two ways to handle it:
>
> 1. Make it completely opaque. Tglx doesn’t like it, and I agree, but it would *work*.
>
> 2. Make it a correctly typed function. In clang CFI land, this may or may not be “canonical” (or non canonical?).

Technically speaking the clear_page_* declarations don't need to be
changed for CFI, they do work fine as is, but I included them in the
patch as they're not actually called from C code right now. But you're
right, we should use a proper function declarations for these. I'll
drop the changes to this file in the next version.

I wouldn't mind having a consensus on how to deal with exception
handlers etc. though. Should I still use opaque types for those?

Sami

2021-10-17 14:34:07

by Andy Lutomirski

Subject: Re: [PATCH v5 03/15] linkage: Add DECLARE_NOT_CALLED_FROM_C



On Fri, Oct 15, 2021, at 9:47 AM, Sami Tolvanen wrote:
> On Fri, Oct 15, 2021 at 9:22 AM Andy Lutomirski <[email protected]> wrote:
>>
>>
>>
>> On Fri, Oct 15, 2021, at 8:55 AM, Thomas Gleixner wrote:
>> > On Thu, Oct 14 2021 at 19:51, Andy Lutomirski wrote:
>> >> On Wed, Oct 13, 2021, at 11:16 AM, Sami Tolvanen wrote:
>> >>>
>> >>> +/*
>> >>> + * Declares a function not callable from C using an opaque type. Defined as
>> >>> + * an array to allow the address of the symbol to be taken without '&'.
>> >>> + */
>> >> I’m not convinced that taking the address without using & is a
>> >> laudable goal. The magical arrays-are-pointers-too behavior of C is a
>> >> mistake, not a delightful simplification.
>> >
>> >>> +#ifndef DECLARE_NOT_CALLED_FROM_C
>> >>> +#define DECLARE_NOT_CALLED_FROM_C(sym) \
>> >>> + extern const u8 sym[]
>> >>> +#endif
>> >>
>> >
>> >> The relevant property of these symbols isn’t that they’re not called
>> >> from C. The relevant thing is that they are just addresses and not objects of a
>> >> type that the programmer cares to tell the compiler about. (Or that
>> >> the compiler understands, for that matter. On a system with XO memory
>> >> or if they’re in a funny section, dereferencing them may fail.)
>> >
>> > I agree.
>> >
>> >> So I think we should use incomplete structs, which can’t be
>> >> dereferenced and will therefore be less error prone.
>> >
>> > While being late to that bike shed painting party, I really have to ask
>> > the question _why_ can't the compiler provide an annotation for these
>> > kind of things which:
>> >
>> > 1) Make the build fail when invoked directly
>> >
>> > 2) Tell CFI that this is _NOT_ something it can understand
>> >
>> > -void clear_page_erms(void *page);
>> > +void __bikeshedme clear_page_erms(void *page);
>> >
>> > That still tells me:
>> >
>> > 1) This is a function
>> >
>> > 2) It has a regular argument which is expected to be in RDI
>> >
>> > which even allows to do analyis of e.g. the alternative call which
>> > invokes that function.
>> >
>> > DECLARE_NOT_CALLED_FROM_C(clear_page_erms);
>> >
>> > loses these properties and IMO it's a tasteless hack.
>> >
>>
>>
>> Ah, but clear_page_erms is a different beast entirely as compared to, say, the syscall entry. It *is* a C function. So I see two ways to handle it:
>>
>> 1. Make it completely opaque. Tglx doesn’t like it, and I agree, but it would *work*.
>>
>> 2. Make it a correctly typed function. In clang CFI land, this may or may not be “canonical” (or non canonical?).
>
> Technically speaking the clear_page_* declarations don't need to be
> changed for CFI, they do work fine as is, but I included them in the
> patch as they're not actually called from C code right now. But you're
> right, we should use a proper function declarations for these. I'll
> drop the changes to this file in the next version.

If you were to call (with a regular C function call using ()) clear_page_erms, what happens? IMO it should either work or fail to compile. Crashing is no good.

>
> I wouldn't mind having a consensus on how to deal with exception
> handlers etc. though. Should I still use opaque types for those?
>

Yes, as they are not C functions.

> Sami

2021-10-17 16:37:34

by Thomas Gleixner

Subject: Re: [PATCH v5 03/15] linkage: Add DECLARE_NOT_CALLED_FROM_C

On Fri, Oct 15 2021 at 17:55, Thomas Gleixner wrote:
> On Thu, Oct 14 2021 at 19:51, Andy Lutomirski wrote:
>> On Wed, Oct 13, 2021, at 11:16 AM, Sami Tolvanen wrote:
> That still tells me:
>
> 1) This is a function
>
> 2) It has a regular argument which is expected to be in RDI
>
> which even allows to do analyis of e.g. the alternative call which
> invokes that function.
>
> DECLARE_NOT_CALLED_FROM_C(clear_page_erms);
>
> loses these properties and IMO it's a tasteless hack.

Look:

SYSCALL_DEFINE2(set_robust_list, struct robust_list_head __user *, head,
size_t, len)

Not beautiful, but it gives the information which is needed and it tells
me clearly what this is about. While the above lumps everything together
whatever it is.

Having __bikeshedme would allow to do:

__hardware_call
__xenhv_call
__inline_asm_call

or such, which clearly tells how the function should be used and it can
even be validated by tooling.

> You could do that with macros as well, but that's not what you offered.

Seriously, if you want to sell me that stuff, then you really should
offer me something which has a value on its own and makes it palatable
to me. That's not something new. See:

https://lore.kernel.org/lkml/[email protected]/

That said, I still want to have a coherent technical explanation why the
compiler people cannot come up with a sensible annotation for these
things.

I'm tired of this attitude, that they add features and then ask us to
make our code more ugly.

It's not a well hidden secret that the kernel uses these kind of
constructs. So why on earth can't they be bothered to think about these
things upfront and offer us something nice and useful?

Thanks,

tglx

2021-10-17 18:09:03

by Sami Tolvanen

Subject: Re: [PATCH v5 03/15] linkage: Add DECLARE_NOT_CALLED_FROM_C

On Fri, Oct 15, 2021 at 10:57 AM Thomas Gleixner <[email protected]> wrote:
>
> On Fri, Oct 15 2021 at 17:55, Thomas Gleixner wrote:
> > On Thu, Oct 14 2021 at 19:51, Andy Lutomirski wrote:
> >> On Wed, Oct 13, 2021, at 11:16 AM, Sami Tolvanen wrote:
> > That still tells me:
> >
> > 1) This is a function
> >
> > 2) It has a regular argument which is expected to be in RDI
> >
> > which even allows to do analyis of e.g. the alternative call which
> > invokes that function.
> >
> > DECLARE_NOT_CALLED_FROM_C(clear_page_erms);
> >
> > loses these properties and IMO it's a tasteless hack.
>
> Look:
>
> SYSCALL_DEFINE2(set_robust_list, struct robust_list_head __user *, head,
> size_t, len)
>
> Not beautiful, but it gives the information which is needed and it tells
> me clearly what this is about. While the above lumps everything together
> whatever it is.

Sure, that makes sense. Ignoring the macro for a moment, how do you
feel about using incomplete structs for the non-C functions as Andy
suggested?

> Having __bikeshedme would allow to do:
>
> __hardware_call
> __xenhv_call
> __inline_asm_call
>
> or such, which clearly tells how the function should be used and it can
> even be validated by tooling.

Previously you suggested adding a built-in function to the compiler:

https://lore.kernel.org/lkml/[email protected]/

I actually did implement this in Clang, but the feature wasn't
necessary with opaque types, so I never moved forward with those
patches. A built-in also won't make the code any cleaner, which was a
concern last time.

I do agree that a function attribute would look cleaner, but it won't
stop anyone from mistakenly calling these functions from C code, which
was something Andy wanted to address at the same time. Do you still
prefer a function attribute over using an opaque type nevertheless?

> You could to that with macros as well, but thats not what you offered.
>
> Seriously, if you want to sell me that stuff, then you really should
> offer me something which has a value on its own and makes it palatable
> to me. That's not something new. See:
>
> https://lore.kernel.org/lkml/[email protected]/
>
> That said, I still want to have a coherent technical explanation why the
> compiler people cannot come up with a sensible annotation for these
> things.

I can only assume they didn't think about this specific use case.

Sami

2021-10-17 22:04:18

by Andy Lutomirski

Subject: Re: [PATCH v5 03/15] linkage: Add DECLARE_NOT_CALLED_FROM_C

On Fri, Oct 15, 2021, at 11:42 AM, Sami Tolvanen wrote:
>> https://lore.kernel.org/lkml/[email protected]/
>>
>> That said, I still want to have a coherent technical explanation why the
>> compiler people cannot come up with a sensible annotation for these
>> things.
>
> I can only assume they didn't think about this specific use case.

I must be missing something here. Linux is full of C-ABI functions implemented in asm. Just off of a quick grep:

asm_load_gs_index, memset, memmove, basically everything in arch/x86/lib/*.S

If they're just declared and called directly from C, it should just work. But an *indirect* call needs some sort of special handling. How does this work in your patchset?

Then we get to these nasty cases where, for some reason, we need to explicitly grab the actual entry point or we need to grab the actual literal address that we can call indirectly. This might be alternative_call, where we're trying to be fast and we want to bypass the CFI magic because, despite what the compiler might try to infer, we are doing a direct call (so it can't be the wrong address due to a runtime attack, ENDBR isn't needed, etc). And I can easily believe that the opposite comes to mind. And there are things like exception entries, where C calls make no sense, CFI makes no sense, and they should be totally opaque.

So I tend to think that tglx is right *and* we need an attribute, because there really are multiple things going on here.

SYM_FUNC_START(c_callable_func)
...
ret
SYM_FUNC_END

extern __magic int c_callable_func(whatever);

Surely *something* needs to go where __magic is to tell the compiler that we have a function that wasn't generated by a CFI-aware compiler and that it's just a C ABI function. (Or maybe this is completely implicit? I can't keep track of exactly which thing generates which code in clang CFI.)

But we *also* have the read-the-address thing:

void something(void)
{
/* actual C body */
}
alternative_call(something, someotherthing, ...);

That wants to expand to assembly code that does:

CALL [target]

where [target] is the actual first instruction of real code and not a CFI prologue. Or, inversely, we want:

void (*ptr)(void) = something;

which (I presume -- correct me if I'm wrong) wants the CFI landing pad. It's not the same address.

And this all wants to work both for asm-defined functions and C-defined functions. This really is orthogonal to the is-it-asm-or-is-it-C things. All four combinations are possible.

Does this make any sense? I kind of think we want the attributes and the builtin, along the lines of:

asm("call %m", function_nocfi_address(something));

or however else we wire it up.

(And, of course, the things that aren't C functions at all, like exception entries, should be opaque.)

2021-10-18 02:35:28

by Sami Tolvanen

[permalink] [raw]
Subject: Re: [PATCH v5 03/15] linkage: Add DECLARE_NOT_CALLED_FROM_C

On Fri, Oct 15, 2021 at 12:36 PM Andy Lutomirski <[email protected]> wrote:
>
> On Fri, Oct 15, 2021, at 11:42 AM, Sami Tolvanen wrote:
> >> https://lore.kernel.org/lkml/[email protected]/
> >>
> >> That said, I still want to have a coherent technical explanation why the
> >> compiler people cannot come up with a sensible annotation for these
> >> things.
> >
> > I can only assume they didn't think about this specific use case.
>
> I must be missing something here. Linux is full of C-ABI functions implemented in asm. Just off of a quick grep:
>
> asm_load_gs_index, memset, memmove, basically everything in arch/x86/lib/*.S
>
> If they're just declared and called directly from C, it should just work. But an *indirect* call needs some sort of special handling. How does this work in your patchset?

Making indirect calls to functions implemented in assembly doesn't
require special handling. The type is inferred from the C function
declaration.

> Then we get to these nasty cases where, for some reason, we need to explicitly grab the actual entry point or we need to grab the actual literal address that we can call indirectly. This might be alternative_call, where we're trying to be fast and we want to bypass the CFI magic because, despite what the compiler might try to infer, we are doing a direct call (so it can't be the wrong address due a runtime attack, ENDBR isn't needed, etc). And I can easily believe that the opposite comes to mind. And there are things like exception entries, where C calls make no sense, CFI makes no sense, and they should be totally opaque.

Correct, this is the main issue we're trying to solve. For low-level
entry points, we want the actual symbol address instead of the CFI
magic, both because of performance and because CFI jump tables may not
be mapped into memory with KPTI. Using an opaque type cleanly
accomplishes the goal for symbols that are not callable from C code.

> So I tend to think that tglx is right *and* we need an attribute, because there really are multiple things going on here.
>
> SYM_FUNC_START(c_callable_func)
> ...
> ret
> SYM_FUNC_END
>
> extern __magic int c_callable_func(whatever);
>
> Surely *something* needs to go where __magic is to tell the compiler that we have a function that wasn't generated by a CFI-aware compiler and that it's just a C ABI function. (Or maybe this is completely implicit? I can't keep track of exactly which thing generates which code in clang CFI.)

This is implicit. Nothing is needed for this case.

> But we *also* have the read-the-address thing:
>
> void something(void)
> {
> /* actual C body */
> }
> alternative_call(something, someotherthing, ...);
>
> That wants to expand to assembly code that does:
>
> CALL [target]
>
> where [target] is the actual first instruction of real code and not a CFI prologue.

Yes, here we would ideally want to avoid the CFI stub for better
performance, but nothing actually breaks even if we don't.

> Or, inversely, we want:
>
> void (*ptr)(void) = something;
>
> which (I presume -- correct me if I'm wrong) wants the CFI landing pad. It's not the same address.

Correct.

> And this all wants to work both for asm-defined functions and C-defined functions. This really is orthogonal to the is-it-asm-or-is-it-C things. All four combinations are possible.
>
> Does this make any sense? I kind of think we want the attributes and the builtin, along the lines of:
>
> asm("call %m", function_nocfi_address(something));
>
> or however else we wire it up.
>
> (And, of course, the things that aren't C functions at all, like exception entries, should be opaque.)

I agree, there are cases where having a function attribute and/or a
built-in to stop the compiler from interfering would be useful. I'll
dust off my patch series and see how the LLVM folks feel about it.

Sami

2021-10-18 03:13:38

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v5 03/15] linkage: Add DECLARE_NOT_CALLED_FROM_C

On Fri, Oct 15 2021 at 11:42, Sami Tolvanen wrote:
> On Fri, Oct 15, 2021 at 10:57 AM Thomas Gleixner <[email protected]> wrote:
>> Not beautiful, but it gives the information which is needed and it tells
>> me clearly what this is about. While the above lumps everything together
>> whatever it is.
>
> Sure, that makes sense. Ignoring the macro for a moment, how do you
> feel about using incomplete structs for the non-C functions as Andy
> suggested?

I think I agreed with that back then when he suggested it the first
time. That still allows me to do a classification:

struct asm_exception
struct asm_xen_hv_call
....

>> Having __bikeshedme would allow to do:
>>
>> __hardware_call
>> __xenhv_call
>> __inline_asm_call
>>
>> or such, which clearly tells how the function should be used and it can
>> even be validated by tooling.
>
> Previously you suggested adding a built-in function to the compiler:
>
> https://lore.kernel.org/lkml/[email protected]/
>
> I actually did implement this in Clang, but the feature wasn't
> necessary with opaque types, so I never moved forward with those
> patches. A built-in also won't make the code any cleaner, which was a
> concern last time.
>
> I do agree that a function attribute would look cleaner, but it won't
> stop anyone from mistakenly calling these functions from C code, which
> was something Andy wanted to address at the same time. Do you still
> prefer a function attribute over using an opaque type nevertheless?

For actually callable functions, by some definition of callable,
e.g. the clear_page_*() variants, a proper attribute would definitely
be preferred.

That attribute should tell the compiler that the function is using the
register arguments correctly but is not suitable for direct invocation
because it clobbers registers.

So the compiler can just refuse to call such a function if used directly
without an inline asm wrapper which describes the clobbers, right?

But thinking more about clobbers. The only "annotation" of clobbers we
have today are the clobbers in the inline asm, which is fragile too.

Something like

__attribute__ ((clobbers ("rcx", "rax")))

might be useful by itself because it allows validation of the clobbers
in the inline asm wrappers and also allows a analysis tool to look at
the ASM code and check whether the above list is correct.

Hmm?

Thanks,

tglx

2021-10-18 03:38:44

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH v5 03/15] linkage: Add DECLARE_NOT_CALLED_FROM_C

On Sat, Oct 16, 2021 at 12:17:40AM +0200, Thomas Gleixner wrote:
> For actually callable functions, by some definition of callable,
> e.g. the clear_page_*() variants, a proper attribute would definitely
> be preferred.

See my last email, clear_page_*() has nothing to do with CFI in the
first place.

> That attribute should tell the compiler that the function is using the
> register arguments correctly but is not suitable for direct invocation
> because it clobbers registers.
>
> So the compiler can just refuse to call such a function if used directly
> without an inline asm wrapper which describes the clobbers, right?
>
> But thinking more about clobbers. The only "annotation" of clobbers we
> have today are the clobbers in the inline asm, which is fragile too.
>
> Something like
>
> __attribute__ ((clobbers ("rcx", "rax")))
>
> might be useful by itself because it allows validation of the clobbers
> in the inline asm wrappers and also allows a analysis tool to look at
> the ASM code and check whether the above list is correct.
>
> Hmm?

Functions are allowed to clobber rcx and rax anyway.

The clear_page_*() functions follow the C ABI, like (almost) every other
asm function in the kernel. I think there's a misunderstanding here, as
most of this doesn't have anything to do with CFI anyway.

--
Josh

2021-10-18 03:38:44

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH v5 03/15] linkage: Add DECLARE_NOT_CALLED_FROM_C

On Fri, Oct 15, 2021 at 01:37:02PM -0700, Sami Tolvanen wrote:
> > But we *also* have the read-the-address thing:
> >
> > void something(void)
> > {
> > /* actual C body */
> > }
> > alternative_call(something, someotherthing, ...);
> >
> > That wants to expand to assembly code that does:
> >
> > CALL [target]
> >
> > where [target] is the actual first instruction of real code and not
> > a CFI prologue.
>
> Yes, here we would ideally want to avoid the CFI stub for better
> performance, but nothing actually breaks even if we don't.

That's because there's no CFI involved in alternative_call(). It
doesn't use function pointers. It uses direct calls.

So all the discussion about clear_page_*() is completely irrelevant. It
has nothing to do with CFI.

Same for copy_user_enhanced_fast_string() and friends.

> > And this all wants to work both for asm-defined functions and
> > C-defined functions. This really is orthogonal to the
> > is-it-asm-or-is-it-C things. All four combinations are possible.
> >
> > Does this make any sense?

Not really, I think Sami debunked most of your theories :-)

I think you're misunderstanding how Clang CFI works. It doesn't
instrument the target function. It hooks into the caller's function
pointer relocation, so when I try to call func_ptr(), the compiler
hijacks the function pointer and forces me to call into a
func_ptr.cfi_jt() checking function instead.

> > I kind of think we want the attributes and the builtin, along the lines of:
> >
> > asm("call %m", function_nocfi_address(something));
> >
> > or however else we wire it up.
> >
> > (And, of course, the things that aren't C functions at all, like
> > exception entries, should be opaque.)
>
> I agree, there are cases where having a function attribute and/or a
> built-in to stop the compiler from interfering would be useful. I'll
> dust off my patch series and see how the LLVM folks feel about it.

With all the talk about function attributes I still haven't heard a
clear and specific case where one is needed.

If you take out the things that don't actually need the
DECLARE_NOT_CALLED_FROM_C() annotation then all you have left is the need
for opaque structs as far as I can tell.

--
Josh

2021-10-18 17:10:55

by Sami Tolvanen

[permalink] [raw]
Subject: Re: [PATCH v5 03/15] linkage: Add DECLARE_NOT_CALLED_FROM_C

On Sat, Oct 16, 2021 at 2:12 PM Josh Poimboeuf <[email protected]> wrote:
>
> On Fri, Oct 15, 2021 at 01:37:02PM -0700, Sami Tolvanen wrote:
> > > But we *also* have the read-the-address thing:
> > >
> > > void something(void)
> > > {
> > > /* actual C body */
> > > }
> > > alternative_call(something, someotherthing, ...);
> > >
> > > That wants to expand to assembly code that does:
> > >
> > > CALL [target]
> > >
> > > where [target] is the actual first instruction of real code and not
> > > a CFI prologue.
> >
> > Yes, here we would ideally want to avoid the CFI stub for better
> > performance, but nothing actually breaks even if we don't.
>
> That's because there's no CFI involved in alternative_call(). It
> doesn't use function pointers. It uses direct calls.

Ah, you're right. alternative_call() uses the function name instead of
the address, so it's not actually affected by CFI.

> > > I kind of think we want the attributes and the builtin, along the lines of:
> > >
> > > asm("call %m", function_nocfi_address(something));
> > >
> > > or however else we wire it up.
> > >
> > > (And, of course, the things that aren't C functions at all, like
> > > exception entries, should be opaque.)
> >
> > I agree, there are cases where having a function attribute and/or a
> > built-in to stop the compiler from interfering would be useful. I'll
> > dust off my patch series and see how the LLVM folks feel about it.
>
> With all the talk about function attributes I still haven't heard a
> clear and specific case where one is needed.
>
> If you take out the things that don't actually need the
> DECLARE_NOT_CALLED_FROM_C() annotation then all you have left is the need
> for opaque structs as far as I can tell.

Correct.

Sami

2021-10-19 10:09:55

by Alexander Lobakin

[permalink] [raw]
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

From: Sami Tolvanen <[email protected]>
Date: Wed, 13 Oct 2021 11:16:43 -0700

> This series adds support for Clang's Control-Flow Integrity (CFI)
> checking to x86_64. With CFI, the compiler injects a runtime
> check before each indirect function call to ensure the target is
> a valid function with the correct static type. This restricts
> possible call targets and makes it more difficult for an attacker
> to exploit bugs that allow the modification of stored function
> pointers. For more details, see:
>
> https://clang.llvm.org/docs/ControlFlowIntegrity.html
>
> Note that v5 is based on tip/master. The first two patches contain
> objtool support for CFI, the remaining patches change function
> declarations to use opaque types, fix type mismatch issues that
> confuse the compiler, and disable CFI where it can't be used.
>
> You can also pull this series from
>
> https://github.com/samitolvanen/linux.git x86-cfi-v5

Hi,

I found [0] while I was testing Peter's retpoline series, and wanted to
ask / double-check whether it's because I'm using Clang CFI for x86
on unsupported Clang 12. It is fixed in 13, I suppose?

[0] https://lore.kernel.org/all/[email protected]

Thanks,
Al

2021-10-19 15:42:59

by Sami Tolvanen

[permalink] [raw]
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

On Tue, Oct 19, 2021 at 3:06 AM Alexander Lobakin <[email protected]> wrote:
>
> From: Sami Tolvanen <[email protected]>
> Date: Wed, 13 Oct 2021 11:16:43 -0700
>
> > This series adds support for Clang's Control-Flow Integrity (CFI)
> > checking to x86_64. With CFI, the compiler injects a runtime
> > check before each indirect function call to ensure the target is
> > a valid function with the correct static type. This restricts
> > possible call targets and makes it more difficult for an attacker
> > to exploit bugs that allow the modification of stored function
> > pointers. For more details, see:
> >
> > https://clang.llvm.org/docs/ControlFlowIntegrity.html
> >
> > Note that v5 is based on tip/master. The first two patches contain
> > objtool support for CFI, the remaining patches change function
> > declarations to use opaque types, fix type mismatch issues that
> > confuse the compiler, and disable CFI where it can't be used.
> >
> > You can also pull this series from
> >
> > https://github.com/samitolvanen/linux.git x86-cfi-v5
>
> Hi,
>
> I found [0] while I was testing Peter's retpoline series, and wanted to
> ask / double-check whether it's because I'm using Clang CFI for x86
> on unsupported Clang 12. It is fixed in 13, I suppose?
>
> [0] https://lore.kernel.org/all/[email protected]

No, it works exactly the same in later compiler versions. I also
replied to that thread, but this looks like another instance where
using an opaque type instead of a function declaration fixes the
issue, and probably makes sense as the thunks are not directly
callable from C code.

Sami

2021-10-21 10:28:34

by Alexander Lobakin

[permalink] [raw]
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

From: Sami Tolvanen <[email protected]>
Date: Wed, 13 Oct 2021 11:16:43 -0700

> This series adds support for Clang's Control-Flow Integrity (CFI)
> checking to x86_64. With CFI, the compiler injects a runtime
> check before each indirect function call to ensure the target is
> a valid function with the correct static type. This restricts
> possible call targets and makes it more difficult for an attacker
> to exploit bugs that allow the modification of stored function
> pointers. For more details, see:
>
> https://clang.llvm.org/docs/ControlFlowIntegrity.html
>
> Note that v5 is based on tip/master. The first two patches contain
> objtool support for CFI, the remaining patches change function
> declarations to use opaque types, fix type mismatch issues that
> confuse the compiler, and disable CFI where it can't be used.
>
> You can also pull this series from
>
> https://github.com/samitolvanen/linux.git x86-cfi-v5

[ snip ]

I've been using it since the end of May on my x86_64, so for v5
(with the retpoline thunk prototypes changed to opaque types):

Reviewed-by: Alexander Lobakin <[email protected]>
Tested-by: Alexander Lobakin <[email protected]>

Thanks!
Al

2021-10-27 10:15:37

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

On Wed, Oct 13, 2021 at 11:16:43AM -0700, Sami Tolvanen wrote:
> This series adds support for Clang's Control-Flow Integrity (CFI)
> checking to x86_64. With CFI, the compiler injects a runtime
> check before each indirect function call to ensure the target is
> a valid function with the correct static type. This restricts
> possible call targets and makes it more difficult for an attacker
> to exploit bugs that allow the modification of stored function
> pointers. For more details, see:
>
> https://clang.llvm.org/docs/ControlFlowIntegrity.html

So, if I understand this right, the compiler emits, for every function
two things: 1) the actual function and 2) a jump-table entry.

Then, every time the address of a function is taken, 2) is given instead
of the expected 1), right?

But how does this work with things like static_call(), which we give a
function address (now a jump-table entry) and use that to write direct
call instructions?

Should not this jump-table thingy get converted to an actual function
address somewhere around arch_static_call_transform() ? This also seems
relevant for arm64 (which already has CLANG_CFI supported) given:

https://lkml.kernel.org/r/[email protected]

Or am I still not understanding this CFI thing?

2021-10-27 21:23:00

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v5 00/15] x86: Add support for Clang CFI

From: Peter Zijlstra
> Sent: 26 October 2021 21:16
>
> On Wed, Oct 13, 2021 at 11:16:43AM -0700, Sami Tolvanen wrote:
> > This series adds support for Clang's Control-Flow Integrity (CFI)
> > checking to x86_64. With CFI, the compiler injects a runtime
> > check before each indirect function call to ensure the target is
> > a valid function with the correct static type. This restricts
> > possible call targets and makes it more difficult for an attacker
> > to exploit bugs that allow the modification of stored function
> > pointers. For more details, see:
> >
> > https://clang.llvm.org/docs/ControlFlowIntegrity.html
>
> So, if I understand this right, the compiler emits, for every function
> two things: 1) the actual function and 2) a jump-table entry.
>
> Then, every time the address of a function is taken, 2) is given instead
> of the expected 1), right?
>
> But how does this work with things like static_call(), which we give a
> function address (now a jump-table entry) and use that to write direct
> call instructions?
>
> Should not this jump-table thingy get converted to an actual function
> address somewhere around arch_static_call_transform() ? This also seems
> relevant for arm64 (which already has CLANG_CFI supported) given:
>
> https://lkml.kernel.org/r/[email protected]
>
> Or am I still not understanding this CFI thing?

From what I remember, the compiler adds code prior to every indirect jump
to check that the function address is in the list of valid functions
(with a suitable prototype - or some similar check).

So it adds a run-time search to every indirect call.

More sensible would be a hidden extra parameter: a hash of the
prototype, saved just before the entry point. Then the caller and the
called function could both check it.
That is still a moderate cost for an indirect call.

Anything that can write asm can get around any check - it just gets a
bit harder.
But overwritten function pointers would be detected - which I assume
is the main threat.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2021-10-27 21:23:32

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

On Wed, Oct 27, 2021 at 10:02:56AM +0000, David Laight wrote:
> From: Peter Zijlstra
> > Sent: 26 October 2021 21:16
> >
> > On Wed, Oct 13, 2021 at 11:16:43AM -0700, Sami Tolvanen wrote:
> > > This series adds support for Clang's Control-Flow Integrity (CFI)
> > > checking to x86_64. With CFI, the compiler injects a runtime
> > > check before each indirect function call to ensure the target is
> > > a valid function with the correct static type. This restricts
> > > possible call targets and makes it more difficult for an attacker
> > > to exploit bugs that allow the modification of stored function
> > > pointers. For more details, see:
> > >
> > > https://clang.llvm.org/docs/ControlFlowIntegrity.html
> >
> > So, if I understand this right, the compiler emits, for every function
> > two things: 1) the actual function and 2) a jump-table entry.
> >
> > Then, every time the address of a function is taken, 2) is given instead
> > of the expected 1), right?
> >
> > But how does this work with things like static_call(), which we give a
> > function address (now a jump-table entry) and use that to write direct
> > call instructions?
> >
> > Should not this jump-table thingy get converted to an actual function
> > address somewhere around arch_static_call_transform() ? This also seems
> > relevant for arm64 (which already has CLANG_CFI supported) given:
> >
> > https://lkml.kernel.org/r/[email protected]
> >
> > Or am I still not understanding this CFI thing?
>
> From what I remember, the compiler adds code prior to every indirect jump
> to check that the function address is in the list of valid functions
> (with a suitable prototype - or some similar check).

It definitely mucks about with the address too; see here:

https://lkml.kernel.org/r/[email protected]

I'm thinking static_call() wants the real actual function address before
it writes instructions.

2021-10-27 21:25:15

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

On Wed, Oct 27, 2021 at 12:55:17PM +0000, David Laight wrote:
> From: Mark Rutland
> > Sent: 27 October 2021 13:05
> ...
> > Taking a step back, it'd be nicer if we didn't have the jump-table shim
> > at all, and had some SW landing pad (e.g. a NOP with some magic bytes)
> > in the callees that the caller could check for. Then function pointers
> > would remain callable in all cases, and we could explicitly add landing
> > pads to asm to protect those. I *think* that's what the grsecurity folk
> > do, but I could be mistaken.
>
> It doesn't need to be a 'landing pad'.
> The 'magic value' could be at 'label - 8'.

Sure; I'd intended to mean the general case of something at some fixed
offset from the entrypoint, either before or after, potentially but not
necessarily inline in the executed instruction stream.

Mark.

2021-10-27 21:25:22

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v5 00/15] x86: Add support for Clang CFI

From: Mark Rutland
> Sent: 27 October 2021 13:05
...
> Taking a step back, it'd be nicer if we didn't have the jump-table shim
> at all, and had some SW landing pad (e.g. a NOP with some magic bytes)
> in the callees that the caller could check for. Then function pointers
> would remain callable in all cases, and we could explicitly add landing
> pads to asm to protect those. I *think* that's what the grsecurity folk
> do, but I could be mistaken.

It doesn't need to be a 'landing pad'.
The 'magic value' could be at 'label - 8'.

Provided you can generate the required value it could be added
to asm functions.
(Or you could patch it at startup by stealing the value from
a C function.)

Depending on the threat model, you may even want the called function
to do some sanity checks on the caller.

I suspect that anything you do is easy to subvert by anything that
can actually write asm.
So if the real threat is overwritten function tables then something
relatively simple is adequate.

David


2021-10-27 21:25:55

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

On Wed, Oct 27, 2021 at 02:22:27PM +0200, Ard Biesheuvel wrote:
> On Wed, 27 Oct 2021 at 14:05, Mark Rutland <[email protected]> wrote:

> > > Should not this jump-table thingy get converted to an actual function
> > > address somewhere around arch_static_call_transform() ? This also seems
> > > relevant for arm64 (which already has CLANG_CFI supported) given:
> > >
> > > https://lkml.kernel.org/r/[email protected]
> >
> > Ugh, yeah, we'll need to do the function_nocfi() dance somewhere...
> >
>
> Sadly, that only works on symbol names, so we cannot use it to strip
> CFI-ness from void *func arguments passed into the static call API,
> unfortunately.

Right, and while mostly static_call_update() is used, which is a macro
and could possibly be used to wrap this, we very much rely on
__static_call_update() also working without that wrapper, and then we're
up a creek without a paddle.

2021-10-27 21:26:21

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

On Wed, 27 Oct 2021 at 14:05, Mark Rutland <[email protected]> wrote:
>
> On Tue, Oct 26, 2021 at 10:16:22PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 13, 2021 at 11:16:43AM -0700, Sami Tolvanen wrote:
> > > This series adds support for Clang's Control-Flow Integrity (CFI)
> > > checking to x86_64. With CFI, the compiler injects a runtime
> > > check before each indirect function call to ensure the target is
> > > a valid function with the correct static type. This restricts
> > > possible call targets and makes it more difficult for an attacker
> > > to exploit bugs that allow the modification of stored function
> > > pointers. For more details, see:
> > >
> > > https://clang.llvm.org/docs/ControlFlowIntegrity.html
> >
> > So, if I understand this right, the compiler emits, for every function
> > two things: 1) the actual function and 2) a jump-table entry.
> >
> > Then, every time the address of a function is taken, 2) is given instead
> > of the expected 1), right?
>
> Yes, and we had to bodge around this with function_nocfi() to get the
> actual function address.
>
> Really there should be a compiler intrinsic or attribute for this, given
> the compiler has all the relevant information available. On arm64 we
> had to use inline asm to generate the address...
>
> Taking a step back, it'd be nicer if we didn't have the jump-table shim
> at all, and had some SW landing pad (e.g. a NOP with some magic bytes)
> in the callees that the caller could check for. Then function pointers
> would remain callable in all cases, and we could explicitly add landing
> pads to asm to protect those. I *think* that's what the grsecurity folk
> do, but I could be mistaken.
>
> > But how does this work with things like static_call(), which we give a
> > function address (now a jump-table entry) and use that to write direct
> > call instructions?
> >
> > Should not this jump-table thingy get converted to an actual function
> > address somewhere around arch_static_call_transform() ? This also seems
> > relevant for arm64 (which already has CLANG_CFI supported) given:
> >
> > https://lkml.kernel.org/r/[email protected]
>
> Ugh, yeah, we'll need to do the function_nocfi() dance somewhere...
>

Sadly, that only works on symbol names, so we cannot use it to strip
CFI-ness from void *func arguments passed into the static call API,
unfortunately.

Also, function_nocfi() seems broken in the sense that it relies on the
symbol existing in the global namespace, which may not be true for
function symbols with static linkage, as they can be optimized away
entirely. I think the same might apply to function symbols with
external linkage under LTO.

2021-10-27 21:26:24

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

On Wed, 27 Oct 2021 at 15:05, Peter Zijlstra <[email protected]> wrote:
>
> On Wed, Oct 27, 2021 at 02:48:52PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 27, 2021 at 02:22:27PM +0200, Ard Biesheuvel wrote:
> > > On Wed, 27 Oct 2021 at 14:05, Mark Rutland <[email protected]> wrote:
> >
> > > > > Should not this jump-table thingy get converted to an actual function
> > > > > address somewhere around arch_static_call_transform() ? This also seems
> > > > > relevant for arm64 (which already has CLANG_CFI supported) given:
> > > > >
> > > > > https://lkml.kernel.org/r/[email protected]
> > > >
> > > > Ugh, yeah, we'll need to do the function_nocfi() dance somewhere...
> > > >
> > >
> > > Sadly, that only works on symbol names, so we cannot use it to strip
> > > CFI-ness from void *func arguments passed into the static call API,
> > > unfortunately.
> >
> > Right, and while mostly static_call_update() is used, which is a macro
> > and could possibly be used to wrap this, we very much rely on
> > __static_call_update() also working without that wrapper, and then we're
> > up a creek without a paddle.
>
> Specifically, we support code like:
>
> struct foo {
> void (*func1)(args1);
> void (*func2)(args2);
> }
>
> struct foo global_foo;
>
> ...
>
> DEFINE_STATIC_CALL_NULL(func1, *global_foo.func1);
>
> ...
>
> __init foo_init()
> {
> // whatever code that fills out foo
>
> static_call_update(func1, global_foo.func1);
> }
>
> ...
>
> hot_function()
> {
> ...
> static_call(func1)(args1);
> ...
> }
>
> cold_function()
> {
> ...
> global_foo.func1(args1);
> ...
> }
>
> And there is no way I can see this working sanely with CFI as presented.
>
> Even though the above uses static_call_update(), we can no longer use
> function_nocfi() on the @func argument, because it's not a symbol, it's
> already a function pointer.
>
> Nor can we fill global_foo.func1 with function_nocfi() because then
> cold_function() goes sideways.
>

As far as I can tell from playing around with Clang, the stubs can
actually be executed directly; they just jump to the actual function.
The compiler simply generates a jump table for each prototype that
appears in the code as the target of an indirect jump, and checks
whether the target appears in the list.

E.g., the code below

void foo(void) {}
void bar(int) {}
void baz(int) {}
void (* volatile fn1)(void) = foo;
void (* volatile fn2)(int) = bar;

int main(int argc, char *argv[])
{
fn1();
fn2 = baz;
fn2(-1);
}

produces

0000000000400594 <foo.cfi>:
400594: d65f03c0 ret

0000000000400598 <bar.cfi>:
400598: d65f03c0 ret

000000000040059c <baz.cfi>:
40059c: d65f03c0 ret

00000000004005a0 <main>:
4005a0: a9bf7bfd stp x29, x30, [sp, #-16]!

// First indirect call
4005a4: b0000088 adrp x8, 411000 <__libc_start_main@GLIBC_2.17>
4005a8: f9401508 ldr x8, [x8, #40]
4005ac: 90000009 adrp x9, 400000 <__abi_tag-0x278>
4005b0: 91182129 add x9, x9, #0x608
4005b4: 910003fd mov x29, sp
4005b8: eb09011f cmp x8, x9
4005bc: 54000241 b.ne 400604 <main+0x64> // b.any
4005c0: d63f0100 blr x8

// Assignment of fn2
4005c4: 90000009 adrp x9, 400000 <__abi_tag-0x278>
4005c8: b0000088 adrp x8, 411000 <__libc_start_main@GLIBC_2.17>
4005cc: 91184129 add x9, x9, #0x610
4005d0: f9001909 str x9, [x8, #48]

// Second indirect call
4005d4: f9401908 ldr x8, [x8, #48]
4005d8: 90000009 adrp x9, 400000 <__abi_tag-0x278>
4005dc: 91183129 add x9, x9, #0x60c
4005e0: cb090109 sub x9, x8, x9
4005e4: 93c90929 ror x9, x9, #2
4005e8: f100053f cmp x9, #0x1
4005ec: 540000c8 b.hi 400604 <main+0x64> // b.pmore
4005f0: 12800000 mov w0, #0xffffffff // #-1
4005f4: d63f0100 blr x8


4005f8: 2a1f03e0 mov w0, wzr
4005fc: a8c17bfd ldp x29, x30, [sp], #16
400600: d65f03c0 ret
400604: d4200020 brk #0x1

0000000000400608 <__typeid__ZTSFvvE_global_addr>:
400608: 17ffffe3 b 400594 <foo.cfi>

000000000040060c <__typeid__ZTSFviE_global_addr>:
40060c: 17ffffe3 b 400598 <bar.cfi>
400610: 17ffffe3 b 40059c <baz.cfi>

So it looks like taking the address is fine, although not optimal due
to the additional jump. We could fudge around that by checking the
opcode at the target of the call, or token paste ".cfi" after the
symbol name in the static_call_update() macro, but it doesn't look
like anything is terminally broken tbh.

2021-10-27 21:27:14

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

On Tue, Oct 26, 2021 at 10:16:22PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 13, 2021 at 11:16:43AM -0700, Sami Tolvanen wrote:
> > This series adds support for Clang's Control-Flow Integrity (CFI)
> > checking to x86_64. With CFI, the compiler injects a runtime
> > check before each indirect function call to ensure the target is
> > a valid function with the correct static type. This restricts
> > possible call targets and makes it more difficult for an attacker
> > to exploit bugs that allow the modification of stored function
> > pointers. For more details, see:
> >
> > https://clang.llvm.org/docs/ControlFlowIntegrity.html
>
> So, if I understand this right, the compiler emits, for every function,
> two things: 1) the actual function and 2) a jump-table entry.
>
> Then, every time the address of a function is taken, 2) is given instead
> of the expected 1), right?

Yes, and we had to bodge around this with function_nocfi() to get the
actual function address.

Really there should be a compiler intrinsic or attribute for this, given
the compiler has all the relevant information available. On arm64 we
had to use inline asm to generate the address...

Taking a step back, it'd be nicer if we didn't have the jump-table shim
at all, and had some SW landing pad (e.g. a NOP with some magic bytes)
in the callees that the caller could check for. Then function pointers
would remain callable in all cases, and we could explicitly add landing
pads to asm to protect those. I *think* that's what the grsecurity folk
do, but I could be mistaken.

> But how does this work with things like static_call(), which we give a
> function address (now a jump-table entry) and use that to write direct
> call instructions?
>
> Should not this jump-table thingy get converted to an actual function
> address somewhere around arch_static_call_transform() ? This also seems
> relevant for arm64 (which already has CLANG_CFI supported) given:
>
> https://lkml.kernel.org/r/[email protected]

Ugh, yeah, we'll need to do the function_nocfi() dance somewhere...

Thanks,
Mark.

2021-10-27 21:27:17

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

On Wed, Oct 27, 2021 at 03:30:11PM +0200, Ard Biesheuvel wrote:

> As far as I can tell from playing around with Clang, the stubs can
> actually be executed directly,

I had just finished reading the clang docs which suggest as much and was
about to try what the compiler actually generates.

> they just jump to the actual function.
> The compiler simply generates a jump table for each prototype that
> appears in the code as the target of an indirect jump, and checks
> whether the target appears in the list.
>
> E.g., the code below
>
> void foo(void) {}
> void bar(int) {}
> void baz(int) {}
> void (* volatile fn1)(void) = foo;
> void (* volatile fn2)(int) = bar;
>
> int main(int argc, char *argv[])
> {
> fn1();
> fn2 = baz;
> fn2(-1);
> }
>
> produces
>
> 0000000000400594 <foo.cfi>:
> 400594: d65f03c0 ret
>
> 0000000000400598 <bar.cfi>:
> 400598: d65f03c0 ret
>
> 000000000040059c <baz.cfi>:
> 40059c: d65f03c0 ret

Right, so these are the actual functions ^.

> 00000000004005a0 <main>:
> 4005a0: a9bf7bfd stp x29, x30, [sp, #-16]!
>
> // First indirect call
> 4005a4: b0000088 adrp x8, 411000 <__libc_start_main@GLIBC_2.17>
> 4005a8: f9401508 ldr x8, [x8, #40]
> 4005ac: 90000009 adrp x9, 400000 <__abi_tag-0x278>
> 4005b0: 91182129 add x9, x9, #0x608
> 4005b4: 910003fd mov x29, sp
> 4005b8: eb09011f cmp x8, x9
> 4005bc: 54000241 b.ne 400604 <main+0x64> // b.any
> 4005c0: d63f0100 blr x8

That's impenetrable to me, sorry.

> // Assignment of fn2
> 4005c4: 90000009 adrp x9, 400000 <__abi_tag-0x278>
> 4005c8: b0000088 adrp x8, 411000 <__libc_start_main@GLIBC_2.17>
> 4005cc: 91184129 add x9, x9, #0x610
> 4005d0: f9001909 str x9, [x8, #48]

I'm struggling here, x9 points to the branch at 400610, but then what?

x8 is in .data somewhere?

> // Second indirect call
> 4005d4: f9401908 ldr x8, [x8, #48]
> 4005d8: 90000009 adrp x9, 400000 <__abi_tag-0x278>
> 4005dc: 91183129 add x9, x9, #0x60c
> 4005e0: cb090109 sub x9, x8, x9
> 4005e4: 93c90929 ror x9, x9, #2
> 4005e8: f100053f cmp x9, #0x1
> 4005ec: 540000c8 b.hi 400604 <main+0x64> // b.pmore
> 4005f0: 12800000 mov w0, #0xffffffff // #-1
> 4005f4: d63f0100 blr x8
>
>
> 4005f8: 2a1f03e0 mov w0, wzr
> 4005fc: a8c17bfd ldp x29, x30, [sp], #16
> 400600: d65f03c0 ret
> 400604: d4200020 brk #0x1


> 0000000000400608 <__typeid__ZTSFvvE_global_addr>:
> 400608: 17ffffe3 b 400594 <foo.cfi>
>
> 000000000040060c <__typeid__ZTSFviE_global_addr>:
> 40060c: 17ffffe3 b 400598 <bar.cfi>
> 400610: 17ffffe3 b 40059c <baz.cfi>

And these are the stubs per type.

> So it looks like taking the address is fine, although not optimal due
> to the additional jump.

Right.

> We could fudge around that by checking the
> opcode at the target of the call, or token paste ".cfi" after the
> symbol name in the static_call_update() macro, but it doesn't look
> like anything is terminally broken tbh.

Agreed, since the jump table entries are actually executable it 'works'.

I really don't like that extra jump though, so I think I really do want
that nocfi_ptr() thing. And going by:

https://clang.llvm.org/docs/ControlFlowIntegrityDesign.html#forward-edge-cfi-for-indirect-function-calls

and the above, that might be possible (on x86) with something like:

/*
 * Turns a Clang CFI jump-table entry into an actual function pointer.
 * These jump-table entries are simply jmp.d32 instructions with their
 * relative offset pointing to the actual function, therefore decode the
 * instruction to find the real function.
 */
static __always_inline void *nocfi_ptr(void *func)
{
	union text_poke_insn insn = *(union text_poke_insn *)func;

	return func + sizeof(insn) + insn.disp;
}

But really, that wants to be a compiler intrinsic.
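To make the decoding concrete, the same idea can be modelled in a standalone program: a jmp.d32 entry is the 0xE9 opcode followed by a little-endian 32-bit displacement measured from the end of the 5-byte instruction. The packed struct below merely stands in for union text_poke_insn; this is a sketch, not the kernel implementation:

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the kernel's union text_poke_insn: jmp.d32 layout. */
struct jmp_d32 {
	uint8_t opcode;   /* 0xE9 for jmp rel32 */
	int32_t disp;     /* relative to the end of the 5-byte insn */
} __attribute__((packed));

/* Decode a jmp.d32 at @func; return @func unchanged if it is not one. */
static void *nocfi_ptr_model(void *func)
{
	struct jmp_d32 insn;

	memcpy(&insn, func, sizeof(insn));  /* avoid unaligned-access UB */
	if (insn.opcode != 0xE9)
		return func;

	return (char *)func + sizeof(insn) + insn.disp;
}
```

The opcode check makes the helper a no-op on pointers that do not point at a jmp.d32 entry, so it is safe to call on an address that already refers to the function body.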

2021-10-27 21:27:28

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

On Wed, Oct 27, 2021 at 02:48:52PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 27, 2021 at 02:22:27PM +0200, Ard Biesheuvel wrote:
> > On Wed, 27 Oct 2021 at 14:05, Mark Rutland <[email protected]> wrote:
>
> > > > Should not this jump-table thingy get converted to an actual function
> > > > address somewhere around arch_static_call_transform() ? This also seems
> > > > relevant for arm64 (which already has CLANG_CFI supported) given:
> > > >
> > > > https://lkml.kernel.org/r/[email protected]
> > >
> > > Ugh, yeah, we'll need to do the function_nocfi() dance somewhere...
> > >
> >
> > Sadly, that only works on symbol names, so we cannot use it to strip
> > CFI-ness from void *func arguments passed into the static call API,
> > unfortunately.
>
> Right, and while mostly static_call_update() is used, which is a macro
> and could possibly be used to wrap this, we very much rely on
> __static_call_update() also working without that wrapper, and then we're
> up a creek without a paddle.

Specifically, we support code like:

struct foo {
	void (*func1)(args1);
	void (*func2)(args2);
};

struct foo global_foo;

...

DEFINE_STATIC_CALL_NULL(func1, *global_foo.func1);

...

__init foo_init()
{
	// whatever code that fills out foo

	static_call_update(func1, global_foo.func1);
}

...

hot_function()
{
	...
	static_call(func1)(args1);
	...
}

cold_function()
{
	...
	global_foo.func1(args1);
	...
}

And there is no way I can see this working sanely with CFI as presented.

Even though the above uses static_call_update(), we can no longer use
function_nocfi() on the @func argument, because it's not a symbol, it's
already a function pointer.

Nor can we fill global_foo.func1 with function_nocfi() because then
cold_function() goes sideways.

2021-10-27 21:28:10

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

On Wed, Oct 27, 2021 at 01:05:15PM +0100, Mark Rutland wrote:
> On Tue, Oct 26, 2021 at 10:16:22PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 13, 2021 at 11:16:43AM -0700, Sami Tolvanen wrote:
> > > This series adds support for Clang's Control-Flow Integrity (CFI)
> > > checking to x86_64. With CFI, the compiler injects a runtime
> > > check before each indirect function call to ensure the target is
> > > a valid function with the correct static type. This restricts
> > > possible call targets and makes it more difficult for an attacker
> > > to exploit bugs that allow the modification of stored function
> > > pointers. For more details, see:
> > >
> > > https://clang.llvm.org/docs/ControlFlowIntegrity.html
> >
> > So, if I understand this right, the compiler emits, for every function,
> > two things: 1) the actual function and 2) a jump-table entry.
> >
> > Then, every time the address of a function is taken, 2) is given instead
> > of the expected 1), right?
>
> Yes, and we had to bodge around this with function_nocfi() to get the
> actual function address.

The patch set under consideration seems to have forgotten to provide one
for x86 :/

> Really there should be a compiler intrinsic or attribute for this, given
> the compiler has all the relevant information available. On arm64 we
> had to use inline asm to generate the address...

Agreed, this *really* shouldn't be an arch asm hack trying to undo
something the compiler did.

2021-10-27 21:28:48

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

On Wed, 27 Oct 2021 at 16:03, Peter Zijlstra <[email protected]> wrote:
>
> On Wed, Oct 27, 2021 at 03:30:11PM +0200, Ard Biesheuvel wrote:
>
> > As far as I can tell from playing around with Clang, the stubs can
> > actually be executed directly,
>
> I had just finished reading the clang docs which suggest as much and was
> about to try what the compiler actually generates.
>
> > they just jump to the actual function.
> > The compiler simply generates a jump table for each prototype that
> > appears in the code as the target of an indirect jump, and checks
> > whether the target appears in the list.
> >
> > E.g., the code below
> >
> > void foo(void) {}
> > void bar(int) {}
> > void baz(int) {}
> > void (* volatile fn1)(void) = foo;
> > void (* volatile fn2)(int) = bar;
> >
> > int main(int argc, char *argv[])
> > {
> > fn1();
> > fn2 = baz;
> > fn2(-1);
> > }
> >
> > produces
> >
> > 0000000000400594 <foo.cfi>:
> > 400594: d65f03c0 ret
> >
> > 0000000000400598 <bar.cfi>:
> > 400598: d65f03c0 ret
> >
> > 000000000040059c <baz.cfi>:
> > 40059c: d65f03c0 ret
>
> Right, so these are the actual functions ^.
>
> > 00000000004005a0 <main>:
> > 4005a0: a9bf7bfd stp x29, x30, [sp, #-16]!
> >
> > // First indirect call
> > 4005a4: b0000088 adrp x8, 411000 <__libc_start_main@GLIBC_2.17>
> > 4005a8: f9401508 ldr x8, [x8, #40]
> > 4005ac: 90000009 adrp x9, 400000 <__abi_tag-0x278>
> > 4005b0: 91182129 add x9, x9, #0x608
> > 4005b4: 910003fd mov x29, sp
> > 4005b8: eb09011f cmp x8, x9
> > 4005bc: 54000241 b.ne 400604 <main+0x64> // b.any
> > 4005c0: d63f0100 blr x8
>
> That's impenetrable to me, sorry.
>

This loads the value of fn1 in x8, and takes the address of the jump
table in x9. Since it is only one entry long, it does a simple compare
to check whether x8 appears in the jump table, and branches to the BRK
at the end if they are different.

> > // Assignment of fn2
> > 4005c4: 90000009 adrp x9, 400000 <__abi_tag-0x278>
> > 4005c8: b0000088 adrp x8, 411000 <__libc_start_main@GLIBC_2.17>
> > 4005cc: 91184129 add x9, x9, #0x610
> > 4005d0: f9001909 str x9, [x8, #48]
>
> I'm struggling here, x9 points to the branch at 400610, but then what?
>
> x8 is in .data somewhere?
>

This takes the address of the jump table entry of 'baz' in x9, and
stores it in fn2 whose address is taken in x8.


> > // Second indirect call
> > 4005d4: f9401908 ldr x8, [x8, #48]
> > 4005d8: 90000009 adrp x9, 400000 <__abi_tag-0x278>
> > 4005dc: 91183129 add x9, x9, #0x60c
> > 4005e0: cb090109 sub x9, x8, x9
> > 4005e4: 93c90929 ror x9, x9, #2
> > 4005e8: f100053f cmp x9, #0x1
> > 4005ec: 540000c8 b.hi 400604 <main+0x64> // b.pmore
> > 4005f0: 12800000 mov w0, #0xffffffff // #-1
> > 4005f4: d63f0100 blr x8
> >
> >
> > 4005f8: 2a1f03e0 mov w0, wzr
> > 4005fc: a8c17bfd ldp x29, x30, [sp], #16
> > 400600: d65f03c0 ret
> > 400604: d4200020 brk #0x1
>
>
> > 0000000000400608 <__typeid__ZTSFvvE_global_addr>:
> > 400608: 17ffffe3 b 400594 <foo.cfi>
> >
> > 000000000040060c <__typeid__ZTSFviE_global_addr>:
> > 40060c: 17ffffe3 b 400598 <bar.cfi>
> > 400610: 17ffffe3 b 40059c <baz.cfi>
>
> And these are the stubs per type.
>
> > So it looks like taking the address is fine, although not optimal due
> > to the additional jump.
>
> Right.
>

... although it does seem that function_nocfi() doesn't actually work
as expected, given that we want the address of <func>.cfi and not the
address of the stub.

> > We could fudge around that by checking the
> > opcode at the target of the call, or token paste ".cfi" after the
> > symbol name in the static_call_update() macro, but it doesn't look
> > like anything is terminally broken tbh.
>
> Agreed, since the jump table entries are actually executable it 'works'.
>
> I really don't like that extra jump though, so I think I really do want
> that nocfi_ptr() thing. And going by:
>
> https://clang.llvm.org/docs/ControlFlowIntegrityDesign.html#forward-edge-cfi-for-indirect-function-calls
>
> and the above, that might be possible (on x86) with something like:
>
> /*
> * Turns a Clang CFI jump-table entry into an actual function pointer.
> * These jump-table entries are simply jmp.d32 instruction with their
> * relative offset pointing to the actual function, therefore decode the
> * instruction to find the real function.
> */
> static __always_inline void *nocfi_ptr(void *func)
> {
> union text_poke_insn insn = *(union text_poke_insn *)func;
>
> return func + sizeof(insn) + insn.disp;
> }
>
> But really, that wants to be a compiler intrinsic.

Agreed. We could easily do something similar on arm64, but I'd prefer
to avoid that too.

2021-10-27 21:29:17

by Sami Tolvanen

[permalink] [raw]
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

On Wed, Oct 27, 2021 at 7:18 AM Ard Biesheuvel <[email protected]> wrote:
>
> On Wed, 27 Oct 2021 at 16:03, Peter Zijlstra <[email protected]> wrote:
> >
> > On Wed, Oct 27, 2021 at 03:30:11PM +0200, Ard Biesheuvel wrote:
> >
> > > As far as I can tell from playing around with Clang, the stubs can
> > > actually be executed directly,
> >
> > I had just finished reading the clang docs which suggest as much and was
> > about to try what the compiler actually generates.
> >
> > > they just jump to the actual function.
> > > The compiler simply generates a jump table for each prototype that
> > > appears in the code as the target of an indirect jump, and checks
> > > whether the target appears in the list.
> > >
> > > E.g., the code below
> > >
> > > void foo(void) {}
> > > void bar(int) {}
> > > void baz(int) {}
> > > void (* volatile fn1)(void) = foo;
> > > void (* volatile fn2)(int) = bar;
> > >
> > > int main(int argc, char *argv[])
> > > {
> > > fn1();
> > > fn2 = baz;
> > > fn2(-1);
> > > }
> > >
> > > produces
> > >
> > > 0000000000400594 <foo.cfi>:
> > > 400594: d65f03c0 ret
> > >
> > > 0000000000400598 <bar.cfi>:
> > > 400598: d65f03c0 ret
> > >
> > > 000000000040059c <baz.cfi>:
> > > 40059c: d65f03c0 ret
> >
> > Right, so these are the actual functions ^.
> >
> > > 00000000004005a0 <main>:
> > > 4005a0: a9bf7bfd stp x29, x30, [sp, #-16]!
> > >
> > > // First indirect call
> > > 4005a4: b0000088 adrp x8, 411000 <__libc_start_main@GLIBC_2.17>
> > > 4005a8: f9401508 ldr x8, [x8, #40]
> > > 4005ac: 90000009 adrp x9, 400000 <__abi_tag-0x278>
> > > 4005b0: 91182129 add x9, x9, #0x608
> > > 4005b4: 910003fd mov x29, sp
> > > 4005b8: eb09011f cmp x8, x9
> > > 4005bc: 54000241 b.ne 400604 <main+0x64> // b.any
> > > 4005c0: d63f0100 blr x8
> >
> > That's impenetrable to me, sorry.
> >
>
> This loads the value of fn1 in x8, and takes the address of the jump
> table in x9. Since it is only one entry long, it does a simple compare
> to check whether x8 appears in the jump table, and branches to the BRK
> at the end if they are different.
>
> > > // Assignment of fn2
> > > 4005c4: 90000009 adrp x9, 400000 <__abi_tag-0x278>
> > > 4005c8: b0000088 adrp x8, 411000 <__libc_start_main@GLIBC_2.17>
> > > 4005cc: 91184129 add x9, x9, #0x610
> > > 4005d0: f9001909 str x9, [x8, #48]
> >
> > I'm struggling here, x9 points to the branch at 400610, but then what?
> >
> > x8 is in .data somewhere?
> >
>
> This takes the address of the jump table entry of 'baz' in x9, and
> stores it in fn2 whose address is taken in x8.
>
>
> > > // Second indirect call
> > > 4005d4: f9401908 ldr x8, [x8, #48]
> > > 4005d8: 90000009 adrp x9, 400000 <__abi_tag-0x278>
> > > 4005dc: 91183129 add x9, x9, #0x60c
> > > 4005e0: cb090109 sub x9, x8, x9
> > > 4005e4: 93c90929 ror x9, x9, #2
> > > 4005e8: f100053f cmp x9, #0x1
> > > 4005ec: 540000c8 b.hi 400604 <main+0x64> // b.pmore
> > > 4005f0: 12800000 mov w0, #0xffffffff // #-1
> > > 4005f4: d63f0100 blr x8
> > >
> > >
> > > 4005f8: 2a1f03e0 mov w0, wzr
> > > 4005fc: a8c17bfd ldp x29, x30, [sp], #16
> > > 400600: d65f03c0 ret
> > > 400604: d4200020 brk #0x1
> >
> >
> > > 0000000000400608 <__typeid__ZTSFvvE_global_addr>:
> > > 400608: 17ffffe3 b 400594 <foo.cfi>
> > >
> > > 000000000040060c <__typeid__ZTSFviE_global_addr>:
> > > 40060c: 17ffffe3 b 400598 <bar.cfi>
> > > 400610: 17ffffe3 b 40059c <baz.cfi>
> >
> > And these are the stubs per type.
> >
> > > So it looks like taking the address is fine, although not optimal due
> > > to the additional jump.
> >
> > Right.
> >
>
> ... although it does seem that function_nocfi() doesn't actually work
> as expected, given that we want the address of <func>.cfi and not the
> address of the stub.

This is because the example wasn't compiled with
-fno-sanitize-cfi-canonical-jump-tables, which we use in the kernel.
With non-canonical jump tables, <func> continues to point to the
function and <func>.cfi_jt points to the jump table, and therefore,
function_nocfi() returns the raw function address.

> > > We could fudge around that by checking the
> > > opcode at the target of the call, or token paste ".cfi" after the
> > > symbol name in the static_call_update() macro, but it doesn't look
> > > like anything is terminally broken tbh.
> >
> > Agreed, since the jump table entries are actually executable it 'works'.
> >
> > I really don't like that extra jump though, so I think I really do want
> > that nocfi_ptr() thing. And going by:
> >
> > https://clang.llvm.org/docs/ControlFlowIntegrityDesign.html#forward-edge-cfi-for-indirect-function-calls
> >
> > and the above, that might be possible (on x86) with something like:
> >
> > /*
> > * Turns a Clang CFI jump-table entry into an actual function pointer.
> > * These jump-table entries are simply jmp.d32 instruction with their
> > * relative offset pointing to the actual function, therefore decode the
> > * instruction to find the real function.
> > */
> > static __always_inline void *nocfi_ptr(void *func)
> > {
> > union text_poke_insn insn = *(union text_poke_insn *)func;
> >
> > return func + sizeof(insn) + insn.disp;
> > }
> >
> > But really, that wants to be a compiler intrinsic.
>
> Agreed. We could easily do something similar on arm64, but I'd prefer
> to avoid that too.

I'll see what we can do. Note that the compiler built-in we previously
discussed would have semantics similar to function_nocfi(). It would
return the raw function address from a symbol name, but it wouldn't
decode the address from an arbitrary pointer, so this would require
something different.

Sami

2021-10-27 21:29:37

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

On Wed, 27 Oct 2021 at 17:50, Sami Tolvanen <[email protected]> wrote:
>
> On Wed, Oct 27, 2021 at 7:18 AM Ard Biesheuvel <[email protected]> wrote:
> >
> > On Wed, 27 Oct 2021 at 16:03, Peter Zijlstra <[email protected]> wrote:
> > >
> > > On Wed, Oct 27, 2021 at 03:30:11PM +0200, Ard Biesheuvel wrote:
> > >
> > > > As far as I can tell from playing around with Clang, the stubs can
> > > > actually be executed directly,
> > >
> > > I had just finished reading the clang docs which suggest as much and was
> > > about to try what the compiler actually generates.
> > >
> > > > they just jump to the actual function.
> > > > The compiler simply generates a jump table for each prototype that
> > > > appears in the code as the target of an indirect jump, and checks
> > > > whether the target appears in the list.
> > > >
> > > > E.g., the code below
> > > >
> > > > void foo(void) {}
> > > > void bar(int) {}
> > > > void baz(int) {}
> > > > void (* volatile fn1)(void) = foo;
> > > > void (* volatile fn2)(int) = bar;
> > > >
> > > > int main(int argc, char *argv[])
> > > > {
> > > > fn1();
> > > > fn2 = baz;
> > > > fn2(-1);
> > > > }
> > > >
> > > > produces
> > > >
> > > > 0000000000400594 <foo.cfi>:
> > > > 400594: d65f03c0 ret
> > > >
> > > > 0000000000400598 <bar.cfi>:
> > > > 400598: d65f03c0 ret
> > > >
> > > > 000000000040059c <baz.cfi>:
> > > > 40059c: d65f03c0 ret
> > >
> > > Right, so these are the actual functions ^.
> > >
> > > > 00000000004005a0 <main>:
> > > > 4005a0: a9bf7bfd stp x29, x30, [sp, #-16]!
> > > >
> > > > // First indirect call
> > > > 4005a4: b0000088 adrp x8, 411000 <__libc_start_main@GLIBC_2.17>
> > > > 4005a8: f9401508 ldr x8, [x8, #40]
> > > > 4005ac: 90000009 adrp x9, 400000 <__abi_tag-0x278>
> > > > 4005b0: 91182129 add x9, x9, #0x608
> > > > 4005b4: 910003fd mov x29, sp
> > > > 4005b8: eb09011f cmp x8, x9
> > > > 4005bc: 54000241 b.ne 400604 <main+0x64> // b.any
> > > > 4005c0: d63f0100 blr x8
> > >
> > > That's impenetrable to me, sorry.
> > >
> >
> > This loads the value of fn1 in x8, and takes the address of the jump
> > table in x9. Since it is only one entry long, it does a simple compare
> > to check whether x8 appears in the jump table, and branches to the BRK
> > at the end if they are different.
> >
> > > > // Assignment of fn2
> > > > 4005c4: 90000009 adrp x9, 400000 <__abi_tag-0x278>
> > > > 4005c8: b0000088 adrp x8, 411000 <__libc_start_main@GLIBC_2.17>
> > > > 4005cc: 91184129 add x9, x9, #0x610
> > > > 4005d0: f9001909 str x9, [x8, #48]
> > >
> > > I'm struggling here, x9 points to the branch at 400610, but then what?
> > >
> > > x8 is in .data somewhere?
> > >
> >
> > This takes the address of the jump table entry of 'baz' in x9, and
> > stores it in fn2 whose address is taken in x8.
> >
> >
> > > > // Second indirect call
> > > > 4005d4: f9401908 ldr x8, [x8, #48]
> > > > 4005d8: 90000009 adrp x9, 400000 <__abi_tag-0x278>
> > > > 4005dc: 91183129 add x9, x9, #0x60c
> > > > 4005e0: cb090109 sub x9, x8, x9
> > > > 4005e4: 93c90929 ror x9, x9, #2
> > > > 4005e8: f100053f cmp x9, #0x1
> > > > 4005ec: 540000c8 b.hi 400604 <main+0x64> // b.pmore
> > > > 4005f0: 12800000 mov w0, #0xffffffff // #-1
> > > > 4005f4: d63f0100 blr x8
> > > >
> > > >
> > > > 4005f8: 2a1f03e0 mov w0, wzr
> > > > 4005fc: a8c17bfd ldp x29, x30, [sp], #16
> > > > 400600: d65f03c0 ret
> > > > 400604: d4200020 brk #0x1
> > >
> > >
> > > > 0000000000400608 <__typeid__ZTSFvvE_global_addr>:
> > > > 400608: 17ffffe3 b 400594 <foo.cfi>
> > > >
> > > > 000000000040060c <__typeid__ZTSFviE_global_addr>:
> > > > 40060c: 17ffffe3 b 400598 <bar.cfi>
> > > > 400610: 17ffffe3 b 40059c <baz.cfi>
> > >
> > > And these are the stubs per type.
> > >
> > > > So it looks like taking the address is fine, although not optimal due
> > > > to the additional jump.
> > >
> > > Right.
> > >
> >
> > ... although it does seem that function_nocfi() doesn't actually work
> > as expected, given that we want the address of <func>.cfi and not the
> > address of the stub.
>
> This is because the example wasn't compiled with
> -fno-sanitize-cfi-canonical-jump-tables, which we use in the kernel.
> With non-canonical jump tables, <func> continues to point to the
> function and <func>.cfi_jt points to the jump table, and therefore,
> function_nocfi() returns the raw function address.
>

Ah excellent. So that means that we don't need function_nocfi() at
all, given that
- statically allocated references (i.e., DEFINE_STATIC_CALL()) will
refer to the function directly;
- runtime assignments can decode the target of the *func pointer and
strip off the initial branch.

It would still be nice to have an intrinsic for that, or some variable
attribute that signifies that assigning the address of a function to
it will produce the actual function rather than the jump table entry.

> > > > We could fudge around that by checking the
> > > > opcode at the target of the call, or token paste ".cfi" after the
> > > > symbol name in the static_call_update() macro, but it doesn't look
> > > > like anything is terminally broken tbh.
> > >
> > > Agreed, since the jump table entries are actually executable it 'works'.
> > >
> > > I really don't like that extra jump though, so I think I really do want
> > > that nocfi_ptr() thing. And going by:
> > >
> > > https://clang.llvm.org/docs/ControlFlowIntegrityDesign.html#forward-edge-cfi-for-indirect-function-calls
> > >
> > > and the above, that might be possible (on x86) with something like:
> > >
> > > /*
> > > * Turns a Clang CFI jump-table entry into an actual function pointer.
> > > * These jump-table entries are simply jmp.d32 instruction with their
> > > * relative offset pointing to the actual function, therefore decode the
> > > * instruction to find the real function.
> > > */
> > > static __always_inline void *nocfi_ptr(void *func)
> > > {
> > > union text_poke_insn insn = *(union text_poke_insn *)func;
> > >
> > > return func + sizeof(insn) + insn.disp;
> > > }
> > >
> > > But really, that wants to be a compiler intrinsic.
> >
> > Agreed. We could easily do something similar on arm64, but I'd prefer
> > to avoid that too.
>
> I'll see what we can do. Note that the compiler built-in we previously
> discussed would have semantics similar to function_nocfi(). It would
> return the raw function address from a symbol name, but it wouldn't
> decode the address from an arbitrary pointer, so this would require
> something different.
>
> Sami

2021-10-27 21:30:23

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

On Wed, Oct 27, 2021 at 04:18:17PM +0200, Ard Biesheuvel wrote:
> On Wed, 27 Oct 2021 at 16:03, Peter Zijlstra <[email protected]> wrote:

> > /*
> > * Turns a Clang CFI jump-table entry into an actual function pointer.
> > * These jump-table entries are simply jmp.d32 instruction with their
> > * relative offset pointing to the actual function, therefore decode the
> > * instruction to find the real function.
> > */
> > static __always_inline void *nocfi_ptr(void *func)
> > {
> > union text_poke_insn insn = *(union text_poke_insn *)func;

also, probably, for the paranoid amongst us:

	if (WARN_ON_ONCE(insn.opcode != JMP32_INSN_OPCODE))
		return func;

> > return func + sizeof(insn) + insn.disp;
> > }
> >
> > But really, that wants to be a compiler intrinsic.
>
> Agreed. We could easily do something similar on arm64, but I'd prefer
> to avoid that too.

Right, because on x86 CET-IBT will force that entry to have a different
form (and size), similarly on arm64 with BTI.

I was thinking the compiler really should implicitly do this conversion
when a function pointer is cast to an integer type. But barring that, we
really need an intrinsic to perform this.

Also, perhaps the compiler should admit it's doing dodgy crap and
introduce the notion of address spaces and use the type system to
separate these two forms.
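Short of compiler support, the address-space idea can be approximated today with two deliberately incompatible wrapper types, so that mixing the jump-table form and the raw form becomes a type error. The names below are purely illustrative, not an existing kernel API:

```c
/* Two deliberately incompatible pointer types: the compiler now rejects
 * accidental mixing of the jump-table form and the raw form. */
typedef struct { void *p; } cfi_func_ptr;   /* address-taken (jump-table entry) */
typedef struct { void *p; } raw_func_ptr;   /* actual function body */

/*
 * The single sanctioned conversion point. A real implementation would
 * decode the jump-table entry here (cf. nocfi_ptr() above); this sketch
 * just passes the pointer through.
 */
static raw_func_ptr cfi_to_raw(cfi_func_ptr f)
{
	return (raw_func_ptr){ .p = f.p };
}
```

An assignment between the two types without going through cfi_to_raw() now fails to compile, which is the point of the exercise.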

2021-10-27 21:31:15

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

On Wed, Oct 27, 2021 at 03:04:55PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 27, 2021 at 02:48:52PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 27, 2021 at 02:22:27PM +0200, Ard Biesheuvel wrote:
> > > On Wed, 27 Oct 2021 at 14:05, Mark Rutland <[email protected]> wrote:
> >
> > > > > Should not this jump-table thingy get converted to an actual function
> > > > > address somewhere around arch_static_call_transform() ? This also seems
> > > > > relevant for arm64 (which already has CLANG_CFI supported) given:
> > > > >
> > > > > https://lkml.kernel.org/r/[email protected]
> > > >
> > > > Ugh, yeah, we'll need to do the function_nocfi() dance somewhere...
> > > >
> > >
> > > Sadly, that only works on symbol names, so we cannot use it to strip
> > > CFI-ness from void *func arguments passed into the static call API,
> > > unfortunately.
> >
> > Right, and while mostly static_call_update() is used, which is a macro
> > and could possibly be used to wrap this, we very much rely on
> > __static_call_update() also working without that wrapper, and then we're
> > up a creek without a paddle.
>
> Specifically, we support code like:
>
> struct foo {
> void (*func1)(args1);
> void (*func2)(args2);
> };
>
> struct foo global_foo;

And global_foo is intentionally non-const?

>
> ...
>
> DEFINE_STATIC_CALL_NULL(func1, *global_foo.func1);
>
> ...
>
> __init foo_init()
> {
> 	// whatever code that fills out foo
>
> 	static_call_update(func1, global_foo.func1);
> }
>
> ...
>
> hot_function()
> {
> 	...
> 	static_call(func1)(args1);
> 	...
> }
>
> cold_function()
> {
> 	...
> 	global_foo.func1(args1);
> 	...
> }

If global_foo is non-const, then the static call stuff is just an
obfuscated indirect call. The attack CFI attempts to block is having
a controlled write flaw turn into controlled execution. For example,
in the above, an attacker would use a flaw that could aim a write at
global_foo.func1, and then get the kernel to take an execution path
that executes global_foo.func1 (i.e. through cold_function).

If static_call_update() accepts an arbitrary function argument, then it
needs to perform the same validation that is currently being injected
at indirect call sites to avoid having static calls weaken CFI. If
things are left alone, static calls will just insert a direct call to
the jump table, and things "work" (with an extra jump), but nothing
is actually protected (it just requires that the attacker locate a
narrower set of kernel call flows that includes static_call_update). Any
attacker write to global_foo.func1, followed by a static_call_update(),
will create a direct call (with no CFI checking) to the attacker's
destination. (It's likely harder to find the needed execution path to
induce a static_call_update(), but that shouldn't stop us from trying
to protect it.)

Getting a "jump table to actual function" primitive only avoids the added
jump -- all the CFI checking remains bypassed. If static call function
address storage isn't const, for CFI to work as expected the update()
routine will need to do the same checking that is done at indirect call
sites when extracting the "real" function for writing into a direct call.
(i.e. it will need to be function prototype aware, be able to find the
real matching jump table and size, and verify the given function is
actually in the table. Just dereferencing the jump table to find the
address isn't a protection: the attacker just has to write to func1 with
a pointer to an attacker-constructed jump table.)

I think it's not unreasonable to solve this in phases, though. Given that
static calls work with CFI as-is (albeit with an extra jump), and that
static calls do offer some hardening of existing indirect call targets (by
narrowing the execution path needed to actually bring the func1 value into
the kernel execution flow), I don't see either feature blocking the other.

Adding proper CFI function prototype checking to static calls seems
like a security improvement with or without CFI, and a minor performance
improvement under CFI.

To avoid all of this, though, it'd be better if static calls only
switched between one of a per-site const list of possible functions,
which would make it a much tighter hand-rolled CFI system itself. :)
(i.e. it would operate from a specific and short list of expected
functions rather than the "best effort" approach of matching function
prototypes as done by Clang CFI.)
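For illustration, a minimal user-space sketch of such a per-site allowlist (all names here are invented for the example; the kernel's static_call machinery does not look like this):

```c
#include <assert.h>
#include <stddef.h>

typedef void (*func1_t)(int);

static void impl_a(int x) { (void)x; }
static void impl_b(int x) { (void)x; }
static void impl_evil(int x) { (void)x; }	/* not on the allowlist */

/* Per-site const list of permitted targets for this call site. */
static func1_t const func1_allowed[] = { impl_a, impl_b };

static func1_t func1_current = impl_a;

/*
 * Only switch the call site to a target from the fixed list;
 * reject anything else (e.g. an attacker-supplied pointer).
 */
static int func1_update(func1_t target)
{
	size_t i;

	for (i = 0; i < sizeof(func1_allowed) / sizeof(func1_allowed[0]); i++) {
		if (func1_allowed[i] == target) {
			func1_current = target;
			return 0;
		}
	}
	return -1;
}
```

The update path then rejects any target outside the short, known-good set, which is exactly the "tighter hand-rolled CFI" property described above.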

--
Kees Cook

2021-10-27 21:37:47

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

On Wed, Oct 27, 2021 at 10:11:28AM -0700, Kees Cook wrote:
> On Wed, Oct 27, 2021 at 03:04:55PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 27, 2021 at 02:48:52PM +0200, Peter Zijlstra wrote:
> > > On Wed, Oct 27, 2021 at 02:22:27PM +0200, Ard Biesheuvel wrote:
> > > > On Wed, 27 Oct 2021 at 14:05, Mark Rutland <[email protected]> wrote:
> > >
> > > > > > Should not this jump-table thingy get converted to an actual function
> > > > > > address somewhere around arch_static_call_transform() ? This also seems
> > > > > > relevant for arm64 (which already has CLANG_CFI supported) given:
> > > > > >
> > > > > > https://lkml.kernel.org/r/[email protected]
> > > > >
> > > > > Ugh, yeah, we'll need to do the function_nocfi() dance somewhere...
> > > > >
> > > >
> > > > Sadly, that only works on symbol names, so we cannot use it to strip
> > > > CFI-ness from void *func arguments passed into the static call API,
> > > > unfortunately.
> > >
> > > Right, and while mostly static_call_update() is used, which is a macro
> > > and could possibly be used to wrap this, we very much rely on
> > > __static_call_update() also working without that wrapper and then we're
> > > up a creek without a paddle.
> >
> > Specifically, we support code like:
> >
> > struct foo {
> > 	void (*func1)(args1);
> > 	void (*func2)(args2);
> > };
> >
> > struct foo global_foo;
>
> And global_foo is intentionally non-const?

Yep, since depending on the init function it can discover and stuff in
a wild variety of functions.

> >
> > ...
> >
> > DEFINE_STATIC_CALL_NULL(func1, *global_foo.func1);
> >
> > ...
> >
> > __init foo_init()
> > {
> > 	// whatever code that fills out foo
> >
> > 	static_call_update(func1, global_foo.func1);
> > }
> >
> > ...
> >
> > hot_function()
> > {
> > 	...
> > 	static_call(func1)(args1);
> > 	...
> > }
> >
> > cold_function()
> > {
> > 	...
> > 	global_foo.func1(args1);
> > 	...
> > }
>
> If global_foo is non-const, then the static call stuff is just an
> obfuscated indirect call.

It is not. The target is determined once, at boot time, depending on the
hardware, it then never changes. The static_call() results in a direct
call to that function.

> The attack CFI attempts to block is having
> a controlled write flaw turn into controlled execution. For example,
> in the above, an attacker would use a flaw that could aim a write at
> global_foo.func1, and then get the kernel to take an execution path
> that executes global_foo.func1 (i.e. through cold_function).

I know, and CFI works for cold_function.

> If static_call_update() accepts an arbitrary function argument, then it
> needs to perform the same validation that is currently being injected
> at indirect call sites to avoid having static calls weaken CFI.

static_call_update() is a macro and has compile time signature checks,
the actual function is __static_call_update() and someone can go add
extra validation in there if they so desire.

I did have this patch:

https://lkml.kernel.org/r/[email protected]

but I never did get around to finishing it. Although looking at it now,
I suppose static_call_seal() might be a better name.

And you're worried about __static_call_update() over text_poke_bp()
because?

> Getting a "jump table to actual function" primitive only avoids the added
> jump -- all the CFI checking remains bypassed.

Exactly, so the extra jump serves no purpose and needs to go. Doubly so
because people are looking at static_call() to undo some of the
performance damage introduced by CFI :-)

> If static call function
> address storage isn't const, for CFI to work as expected the update()
> routine will need to do the same checking that is done at indirect call
> sites when extracting the "real" function for writing into a direct call.

I've mentioned static_call like a hundred times in these CFI threads..
if you want to do CFI on them, go ahead. I'm just not sure the C type
system is up for that, you'll have to somehow frob the signature symbol
into __static_call_update(), something like __builtin_type_symbol().

> To avoid all of this, though, it'd be better if static calls only
> switched between one of a per-site const list of possible functions,
> which would make it a much tighter hand-rolled CFI system itself. :)
> (i.e. it would operate from a specific and short list of expected
> functions rather than the "best effort" approach of matching function
> prototypes as done by Clang CFI.)

That sounds like a ton of painful ugly.

2021-10-27 22:17:38

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v5 00/15] x86: Add support for Clang CFI

From: Mark Rutland
> Sent: 27 October 2021 14:18
>
> On Wed, Oct 27, 2021 at 12:55:17PM +0000, David Laight wrote:
> > From: Mark Rutland
> > > Sent: 27 October 2021 13:05
> > ...
> > > Taking a step back, it'd be nicer if we didn't have the jump-table shim
> > > at all, and had some SW landing pad (e.g. a NOP with some magic bytes)
> > > in the callees that the caller could check for. Then function pointers
> > > would remain callable in call cases, and we could explcitly add landing
> > > pads to asm to protect those. I *think* that's what the grsecurity folk
> > > do, but I could be mistaken.
> >
> > It doesn't need to be a 'landing pad'.
> > The 'magic value' could be at 'label - 8'.
>
> Sure; I'd intended to mean the general case of something at some fixed
> offset from the entrypoint, either before or after, potentially but not
> necessarily inline in the executed instruction stream.

What you really want is to be able to read the value using the I-cache
so as not to pollute the D-cache with code bytes and to avoid having
both an I-cache and D-cache miss at the same time for the same memory.

Even if the I-cache read took an extra clock (or two) I suspect it
would be an overall gain.
This is also true for code that uses pc-relative instructions to
read constants - common in arm-64.

Not sure any hardware lets you do that though :-(

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2021-10-27 22:30:15

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

On Wed, Oct 27, 2021 at 11:21:26PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 27, 2021 at 10:11:28AM -0700, Kees Cook wrote:
> > On Wed, Oct 27, 2021 at 03:04:55PM +0200, Peter Zijlstra wrote:
> > > [...]
> > > cold_function()
> > > {
> > > 	...
> > > 	global_foo.func1(args1);
> > > 	...
> > > }
> >
> > If global_foo is non-const, then the static call stuff is just an
> > obfuscated indirect call.
>
> It is not. The target is determined once, at boot time, depending on the
> hardware, it then never changes. The static_call() results in a direct
> call to that function.

Right, I mean it is _effectively_ an indirect call in the sense that the
call destination is under the control of a writable memory location.
Yes, static_call_update() must be called to make the change, hence why I
didn't wave my arms around when static_call was originally added. It's a
net benefit for the kernel: actual indirect calls now become updatable
direct calls. This means reachability becomes much harder for attackers;
I'm totally a fan. :)

> > If static_call_update() accepts an arbitrary function argument, then it
> > needs to perform the same validation that is currently being injected
> > at indirect call sites to avoid having static calls weaken CFI.
>
> static_call_update() is a macro and has compile time signature checks,
> the actual function is __static_call_update() and someone can go add
> extra validation in there if they so desire.
>
> I did have this patch:
>
> https://lkml.kernel.org/r/[email protected]
>
> but I never did get around to finishing it. Although looking at it now,
> I suppose static_call_seal() might be a better name.

Right -- though wouldn't just adding __ro_after_init do the same?

DEFINE_STATIC_CALL(static_call_name, func_a) __ro_after_init;

If you wanted the clean WARN_ON, __static_call_update() could check if
the struct static_call_key is in a non-writable region.
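As a toy illustration of that check (a user-space sketch with an invented dummy array; the kernel would compare against the bounds of its actual read-only section, e.g. __start_rodata/__end_rodata):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's read-only data section. */
static const char ro_region[64];

/*
 * Would-be helper for __static_call_update(): report whether an
 * address lies inside the read-only region, so updates to keys that
 * were made __ro_after_init could be refused (or trigger a WARN).
 */
static int addr_is_ro(const void *p)
{
	uintptr_t a = (uintptr_t)p;
	uintptr_t start = (uintptr_t)ro_region;

	return a >= start && a < start + sizeof(ro_region);
}
```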

> And you're worried about __static_call_update() over text_poke_bp()
> because?

Both have a meaningful lack of exposure to common attacker-reachable
code paths (e.g., it's likely rare that there are ways attackers can
control the target/content of a text_poke_bp(): alternatives and ftrace,
both of which tend to use read-only content).

static_call_update() is limited per-site with a fixed set of hardcoded
targets (i.e. any place static_call() is used), but the content
is effectively arbitrary (coming from writable memory). AIUI, it's
intended to be used more widely than text_poke_bp(), and it seems like
call sites using static_call_update() will become less rare in future
kernels. (Currently it seems mostly focused on sched and pmu, which
don't traditionally have much userspace control).

So, they're kind of opposite, but for me the question is about examining
the changes to reachability and what kind of primitives their existence
provides an attacker. IMO, static_calls are a net gain over indirect
calls (from some usually writable function pointer) because it becomes
a direct call. Its risk doesn't match a "real" direct call, though,
since they do still have the (much more narrow) risk of having something
call static_call_update() from a writable function pointer as in your
example, but that's still a defensively better position than a regular
indirect call.

What I'm hoping to see from static_calls is maybe defaulting to being
ro_after_init, and explicitly marking those that can't be, which makes
auditing easier and puts things into a default-safe state (i.e. both
fixed targets and fixed content).

> > Getting a "jump table to actual function" primitive only avoids the added
> > jump -- all the CFI checking remains bypassed.
>
> Exactly, so the extra jump serves no purpose and needs to go. Doubly so
> because people are looking at static_call() to undo some of the
> performance damage introduced by CFI :-)

Well, sure, it's inefficient, not _broken_. :) And can you point to the
"use static_calls because CFI is slow" discussion? I'd be curious to read
through that; the past performance testing had shown that CFI performance
overhead tends to be less than the gains of LTO. So compared to a "stock"
kernel, things should be about the same if not faster.

That said, I understand I'm talking to you, and you're paying very close
attention to the scheduler, etc. :) I'm sure there are specific
workloads that look terrible under CFI!

> > If static call function
> > address storage isn't const, for CFI to work as expected the update()
> > routine will need to do the same checking that is done at indirect call
> > sites when extracting the "real" function for writing into a direct call.
>
> I've mentioned static_call like a hundred times in these CFI threads..
> if you want to do CFI on them, go ahead. I'm just not sure the C type
> system is up for that, you'll have to somehow frob the signature symbol
> into __static_call_update(), something like __builtin_type_symbol().
>
> > To avoid all of this, though, it'd be better if static calls only
> > switched between one of a per-site const list of possible functions,
> > which would make it a much tighter hand-rolled CFI system itself. :)
> > (i.e. it would operate from a specific and short list of expected
> > functions rather than the "best effort" approach of matching function
> > prototypes as done by Clang CFI.)
>
> That sounds like a ton of painful ugly.

Right, I know. That's why I'm saying that I see these features as
certainly related, but not at odds with each other. CFI doesn't protect
static_call sites, but static_call sites are already more well protected
than their earlier indirect calls.

The only note I had was that if static_call wants to dig into the jump
table, it likely needs to be static_call-specific: we don't want a general
way to do that without knowing we have some reasonable bounds on the
behavior of the code using it. I think it's fine to have static_calls
do this, though it'd be nice if they could be a little more defensive
(generally) at the same time.

--
Kees Cook

2021-10-28 11:14:44

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

On Wed, Oct 27, 2021 at 03:27:59PM -0700, Kees Cook wrote:

> Right -- though wouldn't just adding __ro_after_init do the same?
>
> DEFINE_STATIC_CALL(static_call_name, func_a) __ro_after_init;

That breaks modules (and your jump_label patch doing the same is
similarly broken).

When a module is loaded that uses the static_call(), it needs to
register its .static_call_sites range with the static_call_key, which
requires modifying it.

2021-10-28 17:15:01

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

On Thu, Oct 28, 2021 at 01:09:39PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 27, 2021 at 03:27:59PM -0700, Kees Cook wrote:
>
> > Right -- though wouldn't just adding __ro_after_init do the same?
> >
> > DEFINE_STATIC_CALL(static_call_name, func_a) __ro_after_init;
>
> That breaks modules (and your jump_label patch doing the same is
> similarly broken).

Well that's no fun. :) I'd like to understand this better so I can fix
it!

>
> When a module is loaded that uses the static_call(), it needs to
> register its .static_call_sites range with the static_call_key, which
> requires modifying it.

Reading static_call_add_module() leaves me with even more questions. ;)

It looks like module static calls need to write to kernel text? I don't
understand. Is this when a module is using an non-module key for a call
site? And in that case, this happens:

key |= s_key & STATIC_CALL_SITE_FLAGS;

Where "key" is not in the module?

And the flags can be:

#define STATIC_CALL_SITE_TAIL 1UL /* tail call */
#define STATIC_CALL_SITE_INIT 2UL /* init section */

But aren't these per-site attributes? Why are they stored per-key?

	if (!init && static_call_is_init(site))
		continue;

	...

	arch_static_call_transform(site_addr, NULL, func,
				   static_call_is_tail(site));


--
Kees Cook

2021-10-28 20:31:52

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

On Thu, Oct 28, 2021 at 10:12:32AM -0700, Kees Cook wrote:
> On Thu, Oct 28, 2021 at 01:09:39PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 27, 2021 at 03:27:59PM -0700, Kees Cook wrote:
> >
> > > Right -- though wouldn't just adding __ro_after_init do the same?
> > >
> > > DEFINE_STATIC_CALL(static_call_name, func_a) __ro_after_init;
> >
> > That breaks modules (and your jump_label patch doing the same is
> > similarly broken).
>
> Well that's no fun. :) I'd like to understand this better so I can fix
> it!
>
> >
> > When a module is loaded that uses the static_call(), it needs to
> > register its .static_call_sites range with the static_call_key, which
> > requires modifying it.
>
> Reading static_call_add_module() leaves me with even more questions. ;)

Yes, that function is highly magical..

> It looks like module static calls need to write to kernel text?

No, they need to modify the static_call_key though.

> I don't
> understand. Is this when a module is using a non-module key for a call
> site? And in that case, this happens:
>
> key |= s_key & STATIC_CALL_SITE_FLAGS;
>
> Where "key" is not in the module?
>
> And the flags can be:
>
> #define STATIC_CALL_SITE_TAIL 1UL /* tail call */
> #define STATIC_CALL_SITE_INIT 2UL /* init section */
>
> But aren't these per-site attributes? Why are they stored per-key?

They are per site, but stored in the key pointer.

So static_call has (and jump_label is nearly identical):

struct static_call_site {
	s32 addr;
	s32 key;
};

struct static_call_mod {
	struct static_call_mod *next;
	struct module *mod;
	struct static_call_site *sites;
};

struct static_call_key {
	void *func;
	union {
		unsigned long type;
		struct static_call_mod *mods;
		struct static_call_site *sites;
	};
};

__SCT_##name() trampolines (no analog with jump_label)

.static_call_sites section
.static_call_tramp_key section (no analog with jump_label)

Where the key holds the current function pointer and a pointer to either
an array of static_call_site or a pointer to a static_call_mod.

Now, a key observation is that all these data structures are word
aligned, which means we have at least 2 lsb bits to play with. For
static_call_key::{mods,sites} the LSB indicates which, 0:mods, 1:sites.

Then the .static_call_sites section is an array of struct
static_call_site sorted by the static_call_key pointer.

The static_call_site entries hold relative displacements, but each
represents:

	struct static_call_key *key;
	unsigned long call_address;

Now, since code (on x86) is variable length, there are no spare bits in
the code address, but since static_call_key is aligned, we have spare
bits. It is those bits we use to encode TAIL (Bit0) and INIT (Bit1).

If INIT, the address points to an __init section and we shouldn't try
to touch it after those have been freed, or bad stuff happens.

If TAIL, it's a tail-call and we get to write a jump instruction instead
of a call instruction.
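The spare-bit encoding can be sketched in plain C (simplified; the real code in kernel/static_call.c operates on struct static_call_site pointers rather than raw integers, but the bit manipulation is the same idea):

```c
#include <assert.h>
#include <stdint.h>

#define STATIC_CALL_SITE_TAIL 1UL	/* tail call */
#define STATIC_CALL_SITE_INIT 2UL	/* init section */
#define STATIC_CALL_SITE_FLAGS (STATIC_CALL_SITE_TAIL | STATIC_CALL_SITE_INIT)

/*
 * Because a static_call_key is word aligned, the two low bits of its
 * address are always zero and can carry the per-site TAIL/INIT flags.
 */
static uintptr_t site_pack(uintptr_t key_addr, unsigned long flags)
{
	return key_addr | (flags & STATIC_CALL_SITE_FLAGS);
}

static uintptr_t site_key(uintptr_t packed)
{
	return packed & ~(uintptr_t)STATIC_CALL_SITE_FLAGS;	/* strip flags */
}

static int site_is_tail(uintptr_t packed)
{
	return !!(packed & STATIC_CALL_SITE_TAIL);
}

static int site_is_init(uintptr_t packed)
{
	return !!(packed & STATIC_CALL_SITE_INIT);
}
```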

So, objtool builds .static_call_sites at build time, then at init (or
module load) time we sort the array by static_call_key pointer, such
that we get consecutive ranges per key. We iterate the array and every
time the key pointer changes, we -- already having the key pointer --
set key->sites to the first.

Now, kernel init of static_call happens *really* early and memory
allocation doesn't work yet, which is why we have that {mods,sites}
thing. Therefore, when the first module gets loaded, we need to allocate
a struct static_call_mod for the kernel (mod==NULL) and transfer the
sites pointer to it and change key to a mods pointer.

So one possible solution would be to have a late init (but before RO),
that, re-iterates the sites array and pre-allocates the kernel
static_call_mod structure. That way, static_call_key gets changed to a
mods pointer and wouldn't ever need changing after that, only the
static_call_mod (which isn't RO) gets changed when modules get
added/deleted.

The above is basically identical to jump_labels. However, static_call()
has one more trick:

EXPORT_STATIC_CALL_TRAMP()

That exports the trampoline symbol, but not the static_call_key data
structure. The result is that modules can use the static_call(), but
cannot use static_call_update() because they cannot get at the key.

In this case objtool cannot correctly put the static_call_key address in
the static_call_site, so what it does instead is store the trampoline
address (there's a 1:1 relation between key and trampolines). And then we
use the .static_call_tramp_key section to find a mapping from trampoline
to key and rewrite the site to be 'right'. All this happens before
sorting it on key, obviously.

Hope that clarifies things, instead of making it worse :-)

2021-10-29 20:06:04

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

On Wed, Oct 27, 2021 at 08:50:17AM -0700, Sami Tolvanen wrote:
> On Wed, Oct 27, 2021 at 7:18 AM Ard Biesheuvel <[email protected]> wrote:

> > > /*
> > > * Turns a Clang CFI jump-table entry into an actual function pointer.
> > > * These jump-table entries are simply jmp.d32 instruction with their
> > > * relative offset pointing to the actual function, therefore decode the
> > > * instruction to find the real function.
> > > */
> > > static __always_inline void *nocfi_ptr(void *func)
> > > {
> > > union text_poke_insn insn = *(union text_poke_insn *)func;
> > >
> > > return func + sizeof(insn) + insn.disp;
> > > }
> > >
> > > But really, that wants to be a compiler intrinsic.
> >
> > Agreed. We could easily do something similar on arm64, but I'd prefer
> > to avoid that too.
>
> I'll see what we can do. Note that the compiler built-in we previously
> discussed would have semantics similar to function_nocfi(). It would
> return the raw function address from a symbol name, but it wouldn't
> decode the address from an arbitrary pointer, so this would require
> something different.

So I had a bit of a peek at what clang generates:

  3fa4:	48 c7 c7 00 00 00 00 	mov    $0x0,%rdi
			3fa7: R_X86_64_32S	__SCK__x86_pmu_handle_irq
  3fab:	48 c7 c6 00 00 00 00 	mov    $0x0,%rsi
			3fae: R_X86_64_32S	__SCT__x86_pmu_handle_irq.cfi_jt
  3fb2:	e8 00 00 00 00       	call   3fb7 <init_hw_perf_events+0x1dc>
			3fb3: R_X86_64_PLT32	__static_call_update-0x4

So this then gives the trampoline jump table entry to
__static_call_update(), with the result that it will rewrite the
jump-table entry, not the trampoline!

Now it so happens that the trampoline looks *exactly* like the
jump-table entry (one jmp.d32 instruction), so in that regards it'll
again 'work'.

But this is all really, as in *really*, wrong. And I'm really sad I'm
the one to have to discover this, even though I've mentioned
static_call()s being tricky in previous reviews.


2021-10-30 07:51:42

by Peter Zijlstra

[permalink] [raw]
Subject: [PATCH] static_call,x86: Robustify trampoline patching

On Fri, Oct 29, 2021 at 10:03:24PM +0200, Peter Zijlstra wrote:

> So I had a bit of a peek at what clang generates:
>
>   3fa4:	48 c7 c7 00 00 00 00 	mov    $0x0,%rdi
> 			3fa7: R_X86_64_32S	__SCK__x86_pmu_handle_irq
>   3fab:	48 c7 c6 00 00 00 00 	mov    $0x0,%rsi
> 			3fae: R_X86_64_32S	__SCT__x86_pmu_handle_irq.cfi_jt
>   3fb2:	e8 00 00 00 00       	call   3fb7 <init_hw_perf_events+0x1dc>
> 			3fb3: R_X86_64_PLT32	__static_call_update-0x4
>
> So this then gives the trampoline jump table entry to
> __static_call_update(), with the result that it will rewrite the
> jump-table entry, not the trampoline!
>
> Now it so happens that the trampoline looks *exactly* like the
> jump-table entry (one jmp.d32 instruction), so in that regards it'll
> again 'work'.
>
> But this is all really, as in *really*, wrong. And I'm really sad I'm
> the one to have to discover this, even though I've mentioned
> static_call()s being tricky in previous reviews.

The below makes the clang-cfi build properly sick:

[ 0.000000] trampoline signature fail
[ 0.000000] ------------[ cut here ]------------
[ 0.000000] kernel BUG at arch/x86/kernel/static_call.c:65!

---
Subject: static_call,x86: Robustify trampoline patching

Add a few signature bytes after the static call trampoline and verify
those bytes match before patching the trampoline. This avoids patching
random other JMPs (such as CFI jump-table entries) instead.

These bytes decode as:

d: 53 push %rbx
e: 43 54 rex.XB push %r12

And happen to spell "SCT".

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
arch/x86/include/asm/static_call.h | 1 +
arch/x86/kernel/static_call.c | 14 ++++++++++----
tools/objtool/check.c | 3 +++
3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/static_call.h b/arch/x86/include/asm/static_call.h
index cbb67b6030f9..39ebe0511869 100644
--- a/arch/x86/include/asm/static_call.h
+++ b/arch/x86/include/asm/static_call.h
@@ -27,6 +27,7 @@
".globl " STATIC_CALL_TRAMP_STR(name) " \n" \
STATIC_CALL_TRAMP_STR(name) ": \n" \
insns " \n" \
+ ".byte 0x53, 0x43, 0x54 \n" \
".type " STATIC_CALL_TRAMP_STR(name) ", @function \n" \
".size " STATIC_CALL_TRAMP_STR(name) ", . - " STATIC_CALL_TRAMP_STR(name) " \n" \
".popsection \n")
diff --git a/arch/x86/kernel/static_call.c b/arch/x86/kernel/static_call.c
index ea028e736831..9c407a33a774 100644
--- a/arch/x86/kernel/static_call.c
+++ b/arch/x86/kernel/static_call.c
@@ -56,10 +56,15 @@ static void __ref __static_call_transform(void *insn, enum insn_type type, void
text_poke_bp(insn, code, size, emulate);
}

-static void __static_call_validate(void *insn, bool tail)
+static void __static_call_validate(void *insn, bool tail, bool tramp)
{
u8 opcode = *(u8 *)insn;

+ if (tramp && memcmp(insn+5, "SCT", 3)) {
+ pr_err("trampoline signature fail");
+ BUG();
+ }
+
if (tail) {
if (opcode == JMP32_INSN_OPCODE ||
opcode == RET_INSN_OPCODE)
@@ -74,7 +79,8 @@ static void __static_call_validate(void *insn, bool tail)
/*
* If we ever trigger this, our text is corrupt, we'll probably not live long.
*/
- WARN_ONCE(1, "unexpected static_call insn opcode 0x%x at %pS\n", opcode, insn);
+ pr_err("unexpected static_call insn opcode 0x%x at %pS\n", opcode, insn);
+ BUG();
}

static inline enum insn_type __sc_insn(bool null, bool tail)
@@ -97,12 +103,12 @@ void arch_static_call_transform(void *site, void *tramp, void *func, bool tail)
mutex_lock(&text_mutex);

if (tramp) {
- __static_call_validate(tramp, true);
+ __static_call_validate(tramp, true, true);
__static_call_transform(tramp, __sc_insn(!func, true), func);
}

if (IS_ENABLED(CONFIG_HAVE_STATIC_CALL_INLINE) && site) {
- __static_call_validate(site, tail);
+ __static_call_validate(site, tail, false);
__static_call_transform(site, __sc_insn(!func, tail), func);
}

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index fb3f251ea021..6d5951113d53 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -3310,6 +3310,9 @@ static bool ignore_unreachable_insn(struct objtool_file *file, struct instructio
if (!insn->func)
return false;

+ if (insn->func->static_call_tramp)
+ return true;
+
/*
* CONFIG_UBSAN_TRAP inserts a UD2 when it sees
* __builtin_unreachable(). The BUG() macro has an unreachable() after

2021-10-30 08:20:21

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On Sat, Oct 30, 2021 at 09:47:58AM +0200, Peter Zijlstra wrote:
> On Fri, Oct 29, 2021 at 10:03:24PM +0200, Peter Zijlstra wrote:
>
> > So I had a bit of a peek at what clang generates:
> >
> >   3fa4:	48 c7 c7 00 00 00 00 	mov    $0x0,%rdi
> > 			3fa7: R_X86_64_32S	__SCK__x86_pmu_handle_irq
> >   3fab:	48 c7 c6 00 00 00 00 	mov    $0x0,%rsi
> > 			3fae: R_X86_64_32S	__SCT__x86_pmu_handle_irq.cfi_jt
> >   3fb2:	e8 00 00 00 00       	call   3fb7 <init_hw_perf_events+0x1dc>
> > 			3fb3: R_X86_64_PLT32	__static_call_update-0x4
> >
> > So this then gives the trampoline jump table entry to
> > __static_call_update(), with the result that it will rewrite the
> > jump-table entry, not the trampoline!
> >
> > Now it so happens that the trampoline looks *exactly* like the
> > jump-table entry (one jmp.d32 instruction), so in that regards it'll
> > again 'work'.
> >
> > But this is all really, as in *really*, wrong. And I'm really sad I'm
> > the one to have to discover this, even though I've mentioned
> > static_call()s being tricky in previous reviews.
>
> The below makes the clang-cfi build properly sick:
>
> [ 0.000000] trampoline signature fail
> [ 0.000000] ------------[ cut here ]------------
> [ 0.000000] kernel BUG at arch/x86/kernel/static_call.c:65!

So fundamentally I think the whole notion that the address of a function
is something different than 'the address of that function' is an *utter*
fail.

So things like FineIBT use a scheme where they pass a hash value along
with the indirect call, which is verified at the other end. Can't we
adjust it like:


foo.cfi:
	endbr
	xorl $0xdeadbeef, %r10d
	jz foo
	ud2
	nop	# make it an even 16 bytes
foo:
	# actual function text


Then have the address of foo, be the address of foo, like any normal
sane person would expect. Have direct calls to foo, go to foo, again, as
expected.

When doing an indirect call (to r11, as clang does), then, and only
then, do:

	movl $0xdeadbeef, %r10d
	subq $0x10, %r11
	call *%r11

	# if the r11 lives, add:
	addq $0x10, %r11


Then only when caller and callee agree 0xdeadbeef is the password, does
the indirect call go through.

Why isn't this a suitable CFI scheme even without IBT?
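A C-level toy model of that handshake (illustrative only; the names and hash value are invented, and the real scheme performs the check in the instruction stream as in the assembly above, not through a struct):

```c
#include <assert.h>
#include <stdint.h>

#define FUNC_HASH 0xdeadbeefu	/* the shared "password" */

struct cfi_func {
	uint32_t hash;		/* stands in for the bytes before the entry point */
	int (*body)(int);	/* the actual function */
};

static int real_add_one(int x) { return x + 1; }

static struct cfi_func add_one = { FUNC_HASH, real_add_one };

/*
 * Indirect-call helper: caller supplies the hash it expects; the call
 * only proceeds when caller and callee agree on the value.
 */
static int cfi_icall(struct cfi_func *f, uint32_t expect, int arg, int *ok)
{
	if (f->hash != expect) {	/* mismatch: the asm version hits ud2 */
		*ok = 0;
		return 0;
	}
	*ok = 1;
	return f->body(arg);
}
```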

2021-10-30 17:22:21

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On Sat, 30 Oct 2021 at 09:49, Peter Zijlstra <[email protected]> wrote:
>
> On Fri, Oct 29, 2021 at 10:03:24PM +0200, Peter Zijlstra wrote:
>
> > So I had a bit of a peek at what clang generates:
> >
> >   3fa4:	48 c7 c7 00 00 00 00 	mov    $0x0,%rdi
> > 			3fa7: R_X86_64_32S	__SCK__x86_pmu_handle_irq
> >   3fab:	48 c7 c6 00 00 00 00 	mov    $0x0,%rsi
> > 			3fae: R_X86_64_32S	__SCT__x86_pmu_handle_irq.cfi_jt
> >   3fb2:	e8 00 00 00 00       	call   3fb7 <init_hw_perf_events+0x1dc>
> > 			3fb3: R_X86_64_PLT32	__static_call_update-0x4
> >
> > So this then gives the trampoline jump table entry to
> > __static_call_update(), with the result that it will rewrite the
> > jump-table entry, not the trampoline!
> >
> > Now it so happens that the trampoline looks *exactly* like the
> > jump-table entry (one jmp.d32 instruction), so in that regards it'll
> > again 'work'.
> >
> > But this is all really, as in *really*, wrong. And I'm really sad I'm
> > the one to have to discover this, even though I've mentioned
> > static_call()s being tricky in previous reviews.
>
> The below makes the clang-cfi build properly sick:
>
> [ 0.000000] trampoline signature fail
> [ 0.000000] ------------[ cut here ]------------
> [ 0.000000] kernel BUG at arch/x86/kernel/static_call.c:65!
>
> ---
> Subject: static_call,x86: Robustify trampoline patching
>
> Add a few signature bytes after the static call trampoline and verify
> those bytes match before patching the trampoline. This avoids patching
> random other JMPs (such as CFI jump-table entries) instead.
>
> These bytes decode as:
>
> d: 53 push %rbx
> e: 43 54 rex.XB push %r12
>
> And happen to spell "SCT".
>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>

I just realized that arm64 has the exact same problem, which is not
being addressed by my v5 of the static call support patch.

As it turns out, the v11 Clang that I have been testing with is broken
wrt BTI landing pads, and omits them from the jump table entries.
Clang 12+ adds them properly, which means that both the jump table
entry and the static call trampoline may start with BTI C + direct
branch, and we also need additional checks to disambiguate.

2021-10-30 18:05:30

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On Sat, Oct 30, 2021 at 07:19:53PM +0200, Ard Biesheuvel wrote:
> I just realized that arm64 has the exact same problem, which is not
> being addressed by my v5 of the static call support patch.

Yeah, it would.

> As it turns out, the v11 Clang that I have been testing with is broken
> wrt BTI landing pads, and omits them from the jump table entries.
> Clang 12+ adds them properly, which means that both the jump table
> entry and the static call trampoline may start with BTI C + direct
> branch, and we also need additional checks to disambiguate.

I'm not sure, why would the static_call trampoline need a BTI C ? The
whole point of static_call() is to be a direct call, we should never
have an indirect call to the trampoline, that would defeat the whole
purpose.

2021-10-30 19:02:36

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On Sat, 30 Oct 2021 at 20:03, Peter Zijlstra <[email protected]> wrote:
>
> On Sat, Oct 30, 2021 at 07:19:53PM +0200, Ard Biesheuvel wrote:
> > I just realized that arm64 has the exact same problem, which is not
> > being addressed by my v5 of the static call support patch.
>
> Yeah, it would.
>
> > As it turns out, the v11 Clang that I have been testing with is broken
> > wrt BTI landing pads, and omits them from the jump table entries.
> > Clang 12+ adds them properly, which means that both the jump table
> > entry and the static call trampoline may start with BTI C + direct
> > branch, and we also need additional checks to disambiguate.
>
> I'm not sure, why would the static_call trampoline need a BTI C ? The
> whole point of static_call() is to be a direct call, we should never
> have an indirect call to the trampoline, that would defeat the whole
> purpose.

This might happen when the distance between the caller and the
trampoline is more than 128 MB, in which case we emit a veneer that
uses an indirect call as well. So we definitely need the landing pad
in the trampoline.

2021-10-30 19:09:30

by Sami Tolvanen

[permalink] [raw]
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

On Fri, Oct 29, 2021 at 1:04 PM Peter Zijlstra <[email protected]> wrote:
>
> On Wed, Oct 27, 2021 at 08:50:17AM -0700, Sami Tolvanen wrote:
> > On Wed, Oct 27, 2021 at 7:18 AM Ard Biesheuvel <[email protected]> wrote:
>
> > > > /*
> > > > * Turns a Clang CFI jump-table entry into an actual function pointer.
> > > > * These jump-table entries are simply jmp.d32 instruction with their
> > > > * relative offset pointing to the actual function, therefore decode the
> > > > * instruction to find the real function.
> > > > */
> > > > static __always_inline void *nocfi_ptr(void *func)
> > > > {
> > > > union text_poke_insn insn = *(union text_poke_insn *)func;
> > > >
> > > > return func + sizeof(insn) + insn.disp;
> > > > }
> > > >
> > > > But really, that wants to be a compiler intrinsic.
> > >
> > > Agreed. We could easily do something similar on arm64, but I'd prefer
> > > to avoid that too.
> >
> > I'll see what we can do. Note that the compiler built-in we previously
> > discussed would have semantics similar to function_nocfi(). It would
> > return the raw function address from a symbol name, but it wouldn't
> > decode the address from an arbitrary pointer, so this would require
> > something different.
>
> So I had a bit of a peek at what clang generates:
>
> 3fa4: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 3fa7: R_X86_64_32S __SCK__x86_pmu_handle_irq
> 3fab: 48 c7 c6 00 00 00 00 mov $0x0,%rsi 3fae: R_X86_64_32S __SCT__x86_pmu_handle_irq.cfi_jt
> 3fb2: e8 00 00 00 00 call 3fb7 <init_hw_perf_events+0x1dc> 3fb3: R_X86_64_PLT32 __static_call_update-0x4
>
> So this then gives the trampoline jump table entry to
> __static_call_update(), with the result that it will rewrite the
> jump-table entry, not the trampoline!
>
> Now it so happens that the trampoline looks *exactly* like the
> jump-table entry (one jmp.d32 instruction), so in that regard it'll
> again 'work'.

Ugh, good catch!

> But this is all really, as in *really*, wrong. And I'm really sad I'm
> the one to have to discover this, even though I've mentioned
> static_call()s being tricky in previous reviews.

My bad, I didn't realize Clang does this with a typeof(func)
declaration. I'll make sure we have a reasonable fix for this before
the next version.

Sami

2021-10-31 16:26:59

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On Sat, 30 Oct 2021 at 20:55, Ard Biesheuvel <[email protected]> wrote:
>
> On Sat, 30 Oct 2021 at 20:03, Peter Zijlstra <[email protected]> wrote:
> >
> > On Sat, Oct 30, 2021 at 07:19:53PM +0200, Ard Biesheuvel wrote:
> > > I just realized that arm64 has the exact same problem, which is not
> > > being addressed by my v5 of the static call support patch.
> >
> > Yeah, it would.
> >
> > > As it turns out, the v11 Clang that I have been testing with is broken
> > > wrt BTI landing pads, and omits them from the jump table entries.
> > > Clang 12+ adds them properly, which means that both the jump table
> > > entry and the static call trampoline may start with BTI C + direct
> > > branch, and we also need additional checks to disambiguate.
> >
> > I'm not sure, why would the static_call trampoline need a BTI C ? The
> > whole point of static_call() is to be a direct call, we should never
> > have an indirect call to the trampoline, that would defeat the whole
> > purpose.
>
> This might happen when the distance between the caller and the
> trampoline is more than 128 MB, in which case we emit a veneer that
> uses an indirect call as well. So we definitely need the landing pad
> in the trampoline.

Something like the below seems to work to prevent getting the wrong
trampoline address into arch_static_call_transform:

diff --git a/arch/x86/include/asm/static_call.h b/arch/x86/include/asm/static_call.h
index cbb67b6030f9..c3704ea21bee 100644
--- a/arch/x86/include/asm/static_call.h
+++ b/arch/x86/include/asm/static_call.h
@@ -25,7 +25,9 @@
asm(".pushsection .static_call.text, \"ax\" \n" \
".align 4 \n" \
".globl " STATIC_CALL_TRAMP_STR(name) " \n" \
+ ".globl " STATIC_CALL_TRAMP_P_STR(name) " \n" \
STATIC_CALL_TRAMP_STR(name) ": \n" \
+ STATIC_CALL_TRAMP_P_STR(name) ": \n" \
insns " \n" \
".type " STATIC_CALL_TRAMP_STR(name) ", @function \n" \
".size " STATIC_CALL_TRAMP_STR(name) ", . - "
STATIC_CALL_TRAMP_STR(name) " \n" \
diff --git a/include/linux/static_call.h b/include/linux/static_call.h
index 19dc210214c0..46777a3395d3 100644
--- a/include/linux/static_call.h
+++ b/include/linux/static_call.h
@@ -143,7 +143,7 @@
*/
extern void arch_static_call_transform(void *site, void *tramp, void *func, bool tail);

-#define STATIC_CALL_TRAMP_ADDR(name) &STATIC_CALL_TRAMP(name)
+#define STATIC_CALL_TRAMP_ADDR(name) &STATIC_CALL_TRAMP_P(name)

#else
#define STATIC_CALL_TRAMP_ADDR(name) NULL
diff --git a/include/linux/static_call_types.h b/include/linux/static_call_types.h
index 5a00b8b2cf9f..98a448f5ae45 100644
--- a/include/linux/static_call_types.h
+++ b/include/linux/static_call_types.h
@@ -18,6 +18,8 @@
#define STATIC_CALL_TRAMP(name) __PASTE(STATIC_CALL_TRAMP_PREFIX, name)
#define STATIC_CALL_TRAMP_STR(name) __stringify(STATIC_CALL_TRAMP(name))

+#define STATIC_CALL_TRAMP_P(name) __PASTE(STATIC_CALL_TRAMP(name), _p)
+#define STATIC_CALL_TRAMP_P_STR(name) __stringify(STATIC_CALL_TRAMP_P(name))
/*
* Flags in the low bits of static_call_site::key.
*/
@@ -36,7 +38,8 @@ struct static_call_site {

#define DECLARE_STATIC_CALL(name, func) \
extern struct static_call_key STATIC_CALL_KEY(name); \
- extern typeof(func) STATIC_CALL_TRAMP(name);
+ extern typeof(func) STATIC_CALL_TRAMP(name); \
+ extern u8 STATIC_CALL_TRAMP_P(name);

#ifdef CONFIG_HAVE_STATIC_CALL

That leaves the 'func' argument, which ideally should not go through
the jump table either, but at least it is not terminally broken there.

2021-10-31 16:42:02

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On Sun, Oct 31, 2021 at 05:24:13PM +0100, Ard Biesheuvel wrote:
> On Sat, 30 Oct 2021 at 20:55, Ard Biesheuvel <[email protected]> wrote:
> >
> > On Sat, 30 Oct 2021 at 20:03, Peter Zijlstra <[email protected]> wrote:
> > >
> > > On Sat, Oct 30, 2021 at 07:19:53PM +0200, Ard Biesheuvel wrote:
> > > > I just realized that arm64 has the exact same problem, which is not
> > > > being addressed by my v5 of the static call support patch.
> > >
> > > Yeah, it would.
> > >
> > > > As it turns out, the v11 Clang that I have been testing with is broken
> > > > wrt BTI landing pads, and omits them from the jump table entries.
> > > > Clang 12+ adds them properly, which means that both the jump table
> > > > entry and the static call trampoline may start with BTI C + direct
> > > > branch, and we also need additional checks to disambiguate.
> > >
> > > I'm not sure, why would the static_call trampoline need a BTI C ? The
> > > whole point of static_call() is to be a direct call, we should never
> > > have an indirect call to the trampoline, that would defeat the whole
> > > purpose.
> >
> > This might happen when the distance between the caller and the
> > trampoline is more than 128 MB, in which case we emit a veneer that
> > uses an indirect call as well. So we definitely need the landing pad
> > in the trampoline.
>
> Something like the below seems to work to prevent getting the wrong
> trampoline address into arch_static_call_transform:

It is also a terribly gross hack. I really want the clang-cfi stuff to
improve, not add layers of hacks on top of it.

2021-10-31 16:45:20

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On Sun, 31 Oct 2021 at 17:39, Peter Zijlstra <[email protected]> wrote:
>
> On Sun, Oct 31, 2021 at 05:24:13PM +0100, Ard Biesheuvel wrote:
> > On Sat, 30 Oct 2021 at 20:55, Ard Biesheuvel <[email protected]> wrote:
> > >
> > > On Sat, 30 Oct 2021 at 20:03, Peter Zijlstra <[email protected]> wrote:
> > > >
> > > > On Sat, Oct 30, 2021 at 07:19:53PM +0200, Ard Biesheuvel wrote:
> > > > > I just realized that arm64 has the exact same problem, which is not
> > > > > being addressed by my v5 of the static call support patch.
> > > >
> > > > Yeah, it would.
> > > >
> > > > > As it turns out, the v11 Clang that I have been testing with is broken
> > > > > wrt BTI landing pads, and omits them from the jump table entries.
> > > > > Clang 12+ adds them properly, which means that both the jump table
> > > > > entry and the static call trampoline may start with BTI C + direct
> > > > > branch, and we also need additional checks to disambiguate.
> > > >
> > > > I'm not sure, why would the static_call trampoline need a BTI C ? The
> > > > whole point of static_call() is to be a direct call, we should never
> > > > have an indirect call to the trampoline, that would defeat the whole
> > > > purpose.
> > >
> > > This might happen when the distance between the caller and the
> > > trampoline is more than 128 MB, in which case we emit a veneer that
> > > uses an indirect call as well. So we definitely need the landing pad
> > > in the trampoline.
> >
> > Something like the below seems to work to prevent getting the wrong
> > trampoline address into arch_static_call_transform:
>
> It is also a terribly gross hack. I really want the clang-cfi stuff to
> improve, not add layers of hacks on top of it.

I'm just as annoyed as you are about the apparent need for this.
However, emitting an alias at build time is far better IMHO than
adding a magic byte sequence and having to check it at runtime.

2021-10-31 20:13:39

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On Sun, Oct 31, 2021 at 05:44:04PM +0100, Ard Biesheuvel wrote:

> > It is also a terribly gross hack. I really want the clang-cfi stuff to
> > improve, not add layers of hacks on top of it.
>
> I'm just as annoyed as you are about the apparent need for this.
> However, emitting an alias at build time is far better IMHO than
> adding a magic byte sequence and having to check it at runtime.

Oh, I'm keeping that magic sequence :-) That's hardening in general, and
I don't ever want to debug a wrong poke like that again.

Adding an extra label fixes this thing, but there's still the other
cases where we need/want/desire a *real* function pointer.

I'm very close to saying that anything that mucks up function pointers
like this is a complete non-starter. Let's start re-start this whole CFI
endeavour from the start.

2021-10-31 20:24:20

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On Sun, 31 Oct 2021 at 21:11, Peter Zijlstra <[email protected]> wrote:
>
> On Sun, Oct 31, 2021 at 05:44:04PM +0100, Ard Biesheuvel wrote:
>
> > > It is also a terribly gross hack. I really want the clang-cfi stuff to
> > > improve, not add layers of hacks on top of it.
> >
> > I'm just as annoyed as you are about the apparent need for this.
> > However, emitting an alias at build time is far better IMHO than
> > adding a magic byte sequence and having to check it at runtime.
>
> Oh, I'm keeping that magic sequence :-) That's hardening in general, and
> I don't ever want to debug a wrong poke like that again.
>
> Adding an extra label fixes this thing, but there's still the other
> cases where we need/want/desire a *real* function pointer.
>
> I'm very close to saying that anything that mucks up function pointers
> like this is a complete non-starter. Let's re-start this whole CFI
> endeavour from the start.

Well, CFI is already in mainline for arm64, whereas static call
support is not. So we have to deal with it one way or the other.

So for the static call targets, I agree that we want to support any
expression that produces a function pointer, but that part is not
actually broken, it is just sub-optimal if you are using Clang CFI.
For taking the address of the trampoline, I think the solutions we
have are sufficient (although I am not inclined to add the magic sig
to arm64 if the label is sufficient).

That means we can support static calls on arm64 now without breaking
Clang CFI, and work on a solution for the redundant jumps on a more
relaxed schedule.

2021-10-31 20:48:56

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On Sun, Oct 31, 2021 at 09:21:56PM +0100, Ard Biesheuvel wrote:

> That means we can support static calls on arm64 now without breaking
> Clang CFI, and work on a solution for the redundant jumps on a more
> relaxed schedule.

Yes, arm64 has a 'problem' with having already merged the clang-cfi
stuff :/

I'm hoping the x86 solution can be an alternative CFI scheme, I'm
starting to really hate this one. And I'm not at all convinced the
proposed scheme is the best possible scheme given the constraints of
kernel code. AFAICT it's a compromise made in userspace.

2021-10-31 23:57:09

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On Sun, 31 Oct 2021 at 21:45, Peter Zijlstra <[email protected]> wrote:
>
> On Sun, Oct 31, 2021 at 09:21:56PM +0100, Ard Biesheuvel wrote:
>
> > That means we can support static calls on arm64 now without breaking
> > Clang CFI, and work on a solution for the redundant jumps on a more
> > relaxed schedule.
>
> Yes, arm64 has a 'problem' with having already merged the clang-cfi
> stuff :/
>
> I'm hoping the x86 solution can be an alternative CFI scheme, I'm
> starting to really hate this one. And I'm not at all convinced the
> proposed scheme is the best possible scheme given the constraints of
> kernel code. AFAICT it's a compromise made in userspace.

Your scheme only works with IBT: the value of %r11 is under the
adversary's control so it could just point it at 'foo+0x10' if it
wants to call foo indirectly, and circumvent the check. So without IBT
(or BTI), I think the check fundamentally belongs in the caller, not
in the callee.
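A minimal Python sketch of that bypass (hypothetical names; the offsets
mirror the 16-byte preamble above):

```python
# With only a callee-side check, an attacker who controls the function
# pointer can target foo directly (foo.cfi + 0x10) and skip the check.

def make_checked(hash_value, body):
    def preamble(r10d):          # foo.cfi: verify hash, fall through to foo
        if r10d ^ hash_value:
            raise RuntimeError("ud2")
        return body()
    return preamble, body        # (foo.cfi, foo)

foo_cfi, foo = make_checked(0xdeadbeef, lambda: "sensitive op")

# An honest caller must present the right hash:
assert foo_cfi(0xdeadbeef) == "sensitive op"

# The attacker just branches past the preamble; the check never runs:
assert foo() == "sensitive op"
```

With IBT/BTI, the endbr in the preamble is the only valid landing pad for
an indirect branch, which is what closes this hole.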

2021-11-01 04:14:20

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

On 10/27/21 15:27, Kees Cook wrote:
> On Wed, Oct 27, 2021 at 11:21:26PM +0200, Peter Zijlstra wrote:
>> On Wed, Oct 27, 2021 at 10:11:28AM -0700, Kees Cook wrote:
>>> On Wed, Oct 27, 2021 at 03:04:55PM +0200, Peter Zijlstra wrote:
>>>> [...]
>>>> cold_function()
>>>> {
>>>> ...
>>>> global_foo->func1(args1);
>>>> ...
>>>> }
>>>
>>> If global_foo is non-const, then the static call stuff is just an
>>> obfuscated indirect call.
>>
>> It is not. The target is determined once, at boot time, depending on the
>> hardware, it then never changes. The static_call() results in a direct
>> call to that function.
>
> Right, I mean it is _effectively_ an indirect call in the sense that the
> call destination is under the control of a writable memory location.
> Yes, static_call_update() must be called to make the change, hence why I
> didn't wave my arms around when static_call was originally added. It's a
> net benefit for the kernel: actual indirect calls now become updatable
> direct calls. This means reachability becomes much harder for attackers;
> I'm totally a fan. :)

I think this means that function_nocfi() is the wrong abstraction. To
solve this properly, we want (sorry for fake C++ concept-ish syntax, but
this would be a builtin):

template<function_pointer T> uintptr_t cfi_check_get_raw_address(T ptr);

You pass in a function pointer and you get out the address of the actual
function body. And this *also* emits the same type checking code that
an actual call would emit. So:

void (*ptr)(void);

ptr();

and:

void (*ptr)(void);

asm volatile ("call *%0" :: "rmi" (cfi_check_get_raw_address(ptr)));

would emit comparably safe code, except that the latter would actually
read the code bytes and is missing clobbers and such. And
yes, this means that, without special support, the jump table can't live
in XO memory.

void foo(void);
int (*ptr)(void) = (void *)foo;
cfi_check_get_raw_address(ptr);

would crash due to a CFI violation.
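A hedged Python sketch of what such an intrinsic might do for the x86
jump-table case (the bounds test stands in for the type check, and the
entry layout follows the nocfi_ptr() snippet earlier in the thread; all
constants here are made up for illustration):

```python
import struct

ENTRY_SIZE = 8  # hypothetical jump-table entry stride

def cfi_check_get_raw_address(table_base, table_bytes, ptr):
    # "Type check": the pointer must hit an entry of this type's table.
    off = ptr - table_base
    if not (0 <= off < len(table_bytes)) or off % ENTRY_SIZE:
        raise RuntimeError("CFI violation")
    # Decode the jmp.d32: 0xe9 followed by a 32-bit displacement that is
    # relative to the end of the 5-byte instruction.
    assert table_bytes[off] == 0xE9
    disp = struct.unpack_from("<i", table_bytes, off + 1)[0]
    return ptr + 5 + disp

# Entry at 0x1000 jumping to a function body at 0x2000:
entry = bytes([0xE9]) + struct.pack("<i", 0x2000 - 0x1005) + bytes(3)
assert cfi_check_get_raw_address(0x1000, entry, 0x1000) == 0x2000
```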

For what it's worth, I feel fairly strongly that we should not merge
half-baked CFI support. Years ago, we merged half-baked stack protector
support in which we shoehorned in a totally inappropriate x86_32 gcc
implementation into the kernel, and the result was a mess. It's
*finally* cleaned up, and that only happened because we got gcc to add
the actual required support, then we waited a while, and then we dropped
stack protector support for old gcc versions. Yuck.

2021-11-01 09:08:00

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On Mon, Nov 01, 2021 at 12:36:18AM +0100, Ard Biesheuvel wrote:
> On Sun, 31 Oct 2021 at 21:45, Peter Zijlstra <[email protected]> wrote:
> >
> > On Sun, Oct 31, 2021 at 09:21:56PM +0100, Ard Biesheuvel wrote:
> >
> > > That means we can support static calls on arm64 now without breaking
> > > Clang CFI, and work on a solution for the redundant jumps on a more
> > > relaxed schedule.
> >
> > Yes, arm64 has a 'problem' with having already merged the clang-cfi
> > stuff :/
> >
> > I'm hoping the x86 solution can be an alternative CFI scheme, I'm
> > starting to really hate this one. And I'm not at all convinced the
> > proposed scheme is the best possible scheme given the constraints of
> > kernel code. AFAICT it's a compromise made in userspace.
>
> Your scheme only works with IBT: the value of %r11 is under the
> adversary's control so it could just point it at 'foo+0x10' if it
> wants to call foo indirectly, and circumvent the check. So without IBT
> (or BTI), I think the check fundamentally belongs in the caller, not
> in the callee.

How is that not true for the jump table approach? Like I showed earlier,
it is *trivial* to reconstruct the actual function pointer from a
jump-table entry pointer.

In any case, I really want the discussion to start at square one, and
show/explain why any chosen CFI scheme is actually good for the kernel.
Just because clang happened to have implemented it, doesn't make it the
most suitable scheme for the kernel.

2021-11-01 09:40:53

by David Laight

[permalink] [raw]
Subject: RE: [PATCH] static_call,x86: Robustify trampoline patching

From: Peter Zijlstra
> Sent: 01 November 2021 09:02
..
> In any case, I really want the discussion to start at square one, and
> show/explain why any chosen CFI scheme is actually good for the kernel.
> Just because clang happened to have implemented it, doesn't make it the
> most suitable scheme for the kernel.

How much overhead does it add to write("/dev/null", "", 1)?
You've got two large jump tables: one for the syscall entry
(all the syscalls have the same prototype), and a second for
selecting the correct device driver's 'write' entry point.

You really don't want to be doing any kind of search.

Hardware that supported a (say) 16-bit constant in both the
'landing pad' and the indirect call instruction, trapping if
they differed, would be useful - but I doubt any hardware
that checks landing pads is anywhere near that useful.

David.


2021-11-01 14:16:06

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On Mon, 1 Nov 2021 at 10:05, Peter Zijlstra <[email protected]> wrote:
>
> On Mon, Nov 01, 2021 at 12:36:18AM +0100, Ard Biesheuvel wrote:
> > On Sun, 31 Oct 2021 at 21:45, Peter Zijlstra <[email protected]> wrote:
> > >
> > > On Sun, Oct 31, 2021 at 09:21:56PM +0100, Ard Biesheuvel wrote:
> > >
> > > > That means we can support static calls on arm64 now without breaking
> > > > Clang CFI, and work on a solution for the redundant jumps on a more
> > > > relaxed schedule.
> > >
> > > Yes, arm64 has a 'problem' with having already merged the clang-cfi
> > > stuff :/
> > >
> > > I'm hoping the x86 solution can be an alternative CFI scheme, I'm
> > > starting to really hate this one. And I'm not at all convinced the
> > > proposed scheme is the best possible scheme given the constraints of
> > > kernel code. AFAICT it's a compromise made in userspace.
> >
> > Your scheme only works with IBT: the value of %r11 is under the
> > adversary's control so it could just point it at 'foo+0x10' if it
> > wants to call foo indirectly, and circumvent the check. So without IBT
> > (or BTI), I think the check fundamentally belongs in the caller, not
> > in the callee.
>
> How is that not true for the jump table approach? Like I showed earlier,
> it is *trivial* to reconstruct the actual function pointer from a
> jump-table entry pointer.
>

That is not the point. The point is that Clang instruments every
indirect call that it emits, to check whether the type of the jump
table entry it is about to call matches the type of the caller. IOW,
the indirect calls can only branch into jump tables, and all entries
in a given table branch to the start of some function of the same
type.

So the only thing you could achieve by adding or subtracting a
constant value from the indirect call address is either calling
another function of the same type (if you are hitting another entry in
the same table), or failing the CFI type check.

Instrumenting the callee only needs something like BTI, and a
consistent use of the landing pads to ensure that you cannot trivially
omit the check by landing right after it.
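Concretely, the caller-side test reduces to a bounds-and-alignment check
against the type's jump table; a rough Python model (table addresses and
entry size are made up for illustration):

```python
ENTRY_SIZE = 8  # one jmp.d32 jump-table entry, padded (illustrative)

def cfi_check(table_base, table_entries, target):
    # Allowed targets: entries of the jump table for this function type.
    off = target - table_base
    if not (0 <= off < table_entries * ENTRY_SIZE) or off % ENTRY_SIZE:
        raise RuntimeError("CFI type-check failure")
    return target

TABLE = 0x1000  # hypothetical table for 'void (*)(void)' with 4 entries
assert cfi_check(TABLE, 4, TABLE + 2 * ENTRY_SIZE) == TABLE + 16

# Adding or subtracting a constant either lands on another entry of the
# same type or fails the check:
for bad in (TABLE + 4, TABLE - 8, TABLE + 4 * ENTRY_SIZE):
    try:
        cfi_check(TABLE, 4, bad)
        raise AssertionError("should have failed")
    except RuntimeError:
        pass
```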

> In any case, I really want the discussion to start at square one, and
> show/explain why any chosen CFI scheme is actually good for the kernel.
> Just because clang happened to have implemented it, doesn't make it the
> most suitable scheme for the kernel.

Not disagreeing with that for x86, but again, we're already past that for arm64.

2021-11-02 13:31:34

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On Mon, Nov 01, 2021 at 03:14:41PM +0100, Ard Biesheuvel wrote:
> On Mon, 1 Nov 2021 at 10:05, Peter Zijlstra <[email protected]> wrote:

> > How is that not true for the jump table approach? Like I showed earlier,
> > it is *trivial* to reconstruct the actual function pointer from a
> > jump-table entry pointer.
> >
>
> That is not the point. The point is that Clang instruments every
> indirect call that it emits, to check whether the type of the jump
> table entry it is about to call matches the type of the caller. IOW,
> the indirect calls can only branch into jump tables, and all entries
> in a given table branch to the start of some function of the same
> type.
>
> So the only thing you could achieve by adding or subtracting a
> constant value from the indirect call address is either calling
> another function of the same type (if you are hitting another entry in
> the same table), or failing the CFI type check.

Ah, I see, so the call-site needs to have a branch around the indirect
call instruction.

> Instrumenting the callee only needs something like BTI, and a
> consistent use of the landing pads to ensure that you cannot trivially
> omit the check by landing right after it.

That does bring up another point tho; how are we going to do a kernel
that's optimal for both software CFI and hardware-aided CFI?

All questions that need answering I think.


So how insane is something like this, have each function:

foo.cfi:
endbr64
xorl $0xdeadbeef, %r10d
jz foo
ud2
nop # make it 16 bytes
foo:
# actual function text goes here


And for each hash have two thunks:


# arg: r11
# clobbers: r10, r11
__x86_indirect_cfi_deadbeef:
movl -9(%r11), %r10 # immediate in foo.cfi
xorl $0xdeadbeef, %r10 # our immediate
jz 1f
ud2
1: ALTERNATIVE_2 "jmp *%r11",
"jmp __x86_indirect_thunk_r11", X86_FEATURE_RETPOLINE
"lfence; jmp *%r11", X86_FEATURE_RETPOLINE_AMD



# arg: r11
# clobbers: r10, r11
__x86_indirect_ibt_deadbeef:
movl $0xdeadbeef, %r10
subq $0x10, %r11
ALTERNATIVE "", "lfence", X86_FEATURE_RETPOLINE
jmp *%r11



And have the actual indirect callsite look like:

# r11 - &foo
ALTERNATIVE_2 "cs call __x86_indirect_thunk_r11",
"cs call __x86_indirect_cfi_deadbeef", X86_FEATURE_CFI
"cs call __x86_indirect_ibt_deadbeef", X86_FEATURE_IBT

Although if the compiler were to emit:

cs call __x86_indirect_cfi_deadbeef

we could probably fix it up from there.


Then we can at runtime decide between:

{!cfi, cfi, ibt} x {!retpoline, retpoline, retpoline-amd}

2021-11-02 15:40:05

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On Tue, Nov 02, 2021 at 01:57:44PM +0100, Peter Zijlstra wrote:

> So how insane is something like this, have each function:
>
> foo.cfi:
> endbr64
> xorl $0xdeadbeef, %r10d
> jz foo
> ud2
> nop # make it 16 bytes
> foo:
> # actual function text goes here
>
>
> And for each hash have two thunks:
>
>
> # arg: r11
> # clobbers: r10, r11
> __x86_indirect_cfi_deadbeef:
> movl -9(%r11), %r10 # immediate in foo.cfi
> xorl $0xdeadbeef, %r10 # our immediate
> jz 1f
> ud2
> 1: ALTERNATIVE_2 "jmp *%r11",
> "jmp __x86_indirect_thunk_r11", X86_FEATURE_RETPOLINE
> "lfence; jmp *%r11", X86_FEATURE_RETPOLINE_AMD
>
>
>
> # arg: r11
> # clobbers: r10, r11
> __x86_indirect_ibt_deadbeef:
> movl $0xdeadbeef, %r10
> subq $0x10, %r11
> ALTERNATIVE "", "lfence", X86_FEATURE_RETPOLINE
> jmp *%r11
>

These two thunks could of course be one big alternative.

> And have the actual indirect callsite look like:
>
> # r11 - &foo
> ALTERNATIVE_2 "cs call __x86_indirect_thunk_r11",
> "cs call __x86_indirect_cfi_deadbeef", X86_FEATURE_CFI
> "cs call __x86_indirect_ibt_deadbeef", X86_FEATURE_IBT

Also simplifying this.

> Although if the compiler were to emit:
>
> cs call __x86_indirect_cfi_deadbeef
>
> > we could probably fix it up from there.
>
>
> Then we can at runtime decide between:
>
> {!cfi, cfi, ibt} x {!retpoline, retpoline, retpoline-amd}
>

2021-11-02 17:29:05

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v5 00/15] x86: Add support for Clang CFI

On Thu, Oct 28, 2021 at 10:29:05PM +0200, Peter Zijlstra wrote:
> Now, since code (on x86) is variable length, there are no spare bits in
> the code address, but since static_call_key is aligned, we have spare
> bits. It is those bits we use to encode TAIL (Bit0) and INIT (Bit1).
>
> If INIT, the address points to an __init section and we shouldn't try
> and touch if after those have been freed or bad stuff happens.
>
> If TAIL, it's a tail-call and we get to write a jump instruction instead
> of a call instruction.

I think this is the part that I was missing: the information is about
the _address_, but it's stored in the _key_'s low bits (regardless of
the key's actual/masked key pointer).

> [...]
> Hope that clarifies things, instead of making it worse :-)

It does help, yes, thanks! I will need to read it again and go follow
along in the code, but yes, that helps explain it.

--
Kees Cook

2021-11-02 17:37:58

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On Sat, Oct 30, 2021 at 10:16:31AM +0200, Peter Zijlstra wrote:
> foo.cfi:
> endbr
> xorl $0xdeadbeef, %r10d
> jz foo
> ud2
> nop # make it an even 16 bytes
> foo:
> # actual function text
>
>
> Then have the address of foo, be the address of foo, like any normal
> sane person would expect. Have direct calls to foo, go to foo, again, as
> expected.
>
> When doing an indirect call (to r11, as clang does), then, and only
> then, do:
>
> movl $0xdeadbeef, %r10d
> subq $0x10, %r11
> call *%r11
>
> # if the r11 lives, add:
> addq $0x10, %r11
>
>
> Then only when caller and callee agree 0xdeadbeef is the password, does
> the indirect call go through.
>
> Why isn't this a suitable CFI scheme even without IBT?

The trouble is that the callee is doing the verification. There's no
protection against calling into a callee that doesn't perform a check
(e.g. BPF JIT, or otherwise constructed executable memory, etc). The
caller needs to do the verification that what they're calling into is
safe before it makes the call.

--
Kees Cook

2021-11-02 17:49:33

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On Tue, 2 Nov 2021 at 16:15, Peter Zijlstra <[email protected]> wrote:
>
> On Tue, Nov 02, 2021 at 01:57:44PM +0100, Peter Zijlstra wrote:
>
> > So how insane is something like this, have each function:
> >
> > foo.cfi:
> > endbr64
> > xorl $0xdeadbeef, %r10d
> > jz foo
> > ud2
> > nop # make it 16 bytes
> > foo:
> > # actual function text goes here
> >
> >
> > And for each hash have two thunks:
> >
> >
> > # arg: r11
> > # clobbers: r10, r11
> > __x86_indirect_cfi_deadbeef:
> > movl -9(%r11), %r10 # immediate in foo.cfi
> > xorl $0xdeadbeef, %r10 # our immediate
> > jz 1f
> > ud2
> > 1: ALTERNATIVE_2 "jmp *%r11",
> > "jmp __x86_indirect_thunk_r11", X86_FEATURE_RETPOLINE
> > "lfence; jmp *%r11", X86_FEATURE_RETPOLINE_AMD
> >

So are these supposed to go into the jump tables? If so, there still
needs to be a check against the boundary of the table at the call
site, to ensure that we are not calling something that we shouldn't.

If they are not going into the jump tables, I don't see the point of
having them, as only happy-flow/uncompromised code would bother to use
them.


> >
> >
> > # arg: r11
> > # clobbers: r10, r11
> > __x86_indirect_ibt_deadbeef:
> > movl $0xdeadbeef, %r10
> > subq $0x10, %r11
> > ALTERNATIVE "", "lfence", X86_FEATURE_RETPOLINE
> > jmp *%r11
> >
>
> These two thunks could of course be one big alternative.
>
> > And have the actual indirect callsite look like:
> >
> > # r11 - &foo
> > ALTERNATIVE_2 "cs call __x86_indirect_thunk_r11",
> > "cs call __x86_indirect_cfi_deadbeef", X86_FEATURE_CFI
> > "cs call __x86_indirect_ibt_deadbeef", X86_FEATURE_IBT
>
> Also simplifying this.
>
> > Although if the compiler were to emit:
> >
> > cs call __x86_indirect_cfi_deadbeef
> >
> > we could probably fix it up from there.
> >
> >
> > Then we can at runtime decide between:
> >
> > {!cfi, cfi, ibt} x {!retpoline, retpoline, retpoline-amd}
> >

2021-11-02 18:11:27

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On Tue, Nov 02, 2021 at 01:57:44PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 01, 2021 at 03:14:41PM +0100, Ard Biesheuvel wrote:
> > On Mon, 1 Nov 2021 at 10:05, Peter Zijlstra <[email protected]> wrote:
>
> > > How is that not true for the jump table approach? Like I showed earlier,
> > > it is *trivial* to reconstruct the actual function pointer from a
> > > jump-table entry pointer.
> > >
> >
> > That is not the point. The point is that Clang instruments every
> > indirect call that it emits, to check whether the type of the jump
> > table entry it is about to call matches the type of the caller. IOW,
> > the indirect calls can only branch into jump tables, and all jump
> > table entries in a table each branch to the start of some function of
> > the same type.
> >
> > So the only thing you could achieve by adding or subtracting a
> > constant value from the indirect call address is either calling
> > another function of the same type (if you are hitting another entry in
> > the same table), or failing the CFI type check.
>
> Ah, I see, so the call-site needs to have a branch around the indirect
> call instruction.
>
> > Instrumenting the callee only needs something like BTI, and a
> > consistent use of the landing pads to ensure that you cannot trivially
> > omit the check by landing right after it.
>
> That does bring up another point tho; how are we going to do a kernel
> that's optimal for both software CFI and hardware aided CFI?
>
> All questions that need answering I think.

I'm totally fine with designing a new CFI for a future option,
but blocking the existing (working) one does not best serve our end
users. There are already people waiting on x86 CFI because having the
extra layer of defense is valuable for them. No, it's not perfect,
but it works right now, and evidence from Android shows that it has
significant real-world defensive value. Some of the more adventurous
are actually patching their kernels with the CFI support already, and
happily running their workloads, etc.

Supporting Clang CFI means we actually have something to evolve
from, whereas starting completely over means (likely significant)
delays leaving folks without the option available at all. I think the
various compiler and kernel tweaks needed to improve kernel support
are reasonable, but building a totally new CFI implementation is not:
it _does_ work today on x86. Yes, it's weird, but not outrageously so.
(And just to state the obvious, CFI is an _optional_ CONFIG: not
everyone wants CFI, so it's okay if there are some sharp edges under
some CONFIG combinations.)

Regardless, speaking to a new CFI design below:

> So how insane is something like this, have each function:
>
> foo.cfi:
> endbr64
> xorl $0xdeadbeef, %r10d
> jz foo
> ud2
> nop # make it 16 bytes
> foo:
> # actual function text goes here
>
>
> And for each hash have two thunks:
>
>
> # arg: r11
> # clobbers: r10, r11
> __x86_indirect_cfi_deadbeef:
> movl -9(%r11), %r10 # immediate in foo.cfi

This requires the text be readable. I have been hoping to avoid this for
a CFI implementation so we could gain the benefit of execute-only
memory (available soon on arm64, and possible today on x86 under a
hypervisor).

But, yes, FWIW, this is very similar to what PaX RAP CFI does: the
caller reads "$dest - offset" for a hash, and compares it against the
hard-coded hash at the call site, before "call $dest".

> xorl $0xdeadbeef, %r10 # our immediate
> jz 1f
> ud2
> 1: ALTERNATIVE_2 "jmp *%r11",
> "jmp __x86_indirect_thunk_r11", X86_FEATURE_RETPOLINE
> "lfence; jmp *%r11", X86_FEATURE_RETPOLINE_AMD
>
>
>
> # arg: r11
> # clobbers: r10, r11
> __x86_indirect_ibt_deadbeef:
> movl $0xdeadbeef, %r10
> subq $0x10, %r11
> ALTERNATIVE "", "lfence", X86_FEATURE_RETPOLINE
> jmp *%r11
>
>
>
> And have the actual indirect callsite look like:
>
> # r11 - &foo
> ALTERNATIVE_2 "cs call __x86_indirect_thunk_r11",
> "cs call __x86_indirect_cfi_deadbeef", X86_FEATURE_CFI
> "cs call __x86_indirect_ibt_deadbeef", X86_FEATURE_IBT
>
> Although if the compiler were to emit:
>
> cs call __x86_indirect_cfi_deadbeef
>
> > we could probably fix it up from there.

It seems like this could work for any CFI implementation, though, if
the CFI implementation always performed a call, or if the bounds of the
inline checking were known? i.e. objtool could also find the inline
checks just as well as the call, though?

> Then we can at runtime decide between:
>
> {!cfi, cfi, ibt} x {!retpoline, retpoline, retpoline-amd}

This does look pretty powerful, but I still don't think it precludes
using the existing Clang CFI. I don't want perfect to be the enemy of
good. :)

--
Kees Cook

2021-11-02 18:15:59

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On Tue, Nov 02, 2021 at 06:44:56PM +0100, Ard Biesheuvel wrote:
> On Tue, 2 Nov 2021 at 16:15, Peter Zijlstra <[email protected]> wrote:
> >
> > On Tue, Nov 02, 2021 at 01:57:44PM +0100, Peter Zijlstra wrote:
> >
> > > So how insane is something like this, have each function:
> > >
> > > foo.cfi:
> > > endbr64
> > > xorl $0xdeadbeef, %r10d
> > > jz foo
> > > ud2
> > > nop # make it 16 bytes
> > > foo:
> > > # actual function text goes here
> > >
> > >
> > > And for each hash have two thunks:
> > >
> > >
> > > # arg: r11
> > > # clobbers: r10, r11
> > > __x86_indirect_cfi_deadbeef:
> > > movl -9(%r11), %r10 # immediate in foo.cfi
> > > xorl $0xdeadbeef, %r10 # our immediate
> > > jz 1f
> > > ud2
> > > 1: ALTERNATIVE_2 "jmp *%r11",
> > > "jmp __x86_indirect_thunk_r11", X86_FEATURE_RETPOLINE
> > > "lfence; jmp *%r11", X86_FEATURE_RETPOLINE_AMD
> > >
>
> So are these supposed to go into the jump tables? If so, there still
> needs to be a check against the boundary of the table at the call
> site, to ensure that we are not calling something that we shouldn't.
>
> If they are not going into the jump tables, I don't see the point of
> having them, as only happy-flow/uncompromised code would bother to use
> them.

I don't understand. If you can scribble your own code, you can
circumvent pretty much any range check anyway. But if you can't scribble
your own code, you get to use the branch here and that checks the
callsite and callee signature.

The range check isn't fundamental to CFI, having a check is the
important thing AFAIU.

2021-11-02 18:19:56

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On Tue, Nov 02, 2021 at 07:14:25PM +0100, Peter Zijlstra wrote:
> On Tue, Nov 02, 2021 at 06:44:56PM +0100, Ard Biesheuvel wrote:
> > On Tue, 2 Nov 2021 at 16:15, Peter Zijlstra <[email protected]> wrote:
> > >
> > > On Tue, Nov 02, 2021 at 01:57:44PM +0100, Peter Zijlstra wrote:
> > >
> > > > So how insane is something like this, have each function:
> > > >
> > > > foo.cfi:
> > > > endbr64
> > > > xorl $0xdeadbeef, %r10d
> > > > jz foo
> > > > ud2
> > > > nop # make it 16 bytes
> > > > foo:
> > > > # actual function text goes here
> > > >
> > > >
> > > > And for each hash have two thunks:
> > > >
> > > >
> > > > # arg: r11
> > > > # clobbers: r10, r11
> > > > __x86_indirect_cfi_deadbeef:
> > > > movl -9(%r11), %r10 # immediate in foo.cfi
> > > > xorl $0xdeadbeef, %r10 # our immediate
> > > > jz 1f
> > > > ud2
> > > > 1: ALTERNATIVE_2 "jmp *%r11",
> > > > "jmp __x86_indirect_thunk_r11", X86_FEATURE_RETPOLINE
> > > > "lfence; jmp *%r11", X86_FEATURE_RETPOLINE_AMD
> > > >
> >
> > So are these supposed to go into the jump tables? If so, there still
> > needs to be a check against the boundary of the table at the call
> > site, to ensure that we are not calling something that we shouldn't.
> >
> > If they are not going into the jump tables, I don't see the point of
> > having them, as only happy-flow/uncompromised code would bother to use
> > them.
>
> I don't understand. If you can scribble your own code, you can
> circumvent pretty much any range check anyway. But if you can't scribble
> your own code, you get to use the branch here and that checks the
> callsite and callee signature.
>
> The range check isn't fundamental to CFI, having a check is the
> important thing AFAIU.

That is, how is a jump-table/range-check better than a hash-value match
check?
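Reduced to predicates, the two checks being compared look something like this. A sketch only: addresses are plain integers, and the layout constants are assumptions, not clang's actual table geometry.

```c
#include <assert.h>
#include <stdint.h>

/* Jump-table scheme: a valid target must lie inside the type's table
 * and be aligned to an entry boundary. */
static int range_check(uint64_t target, uint64_t table,
		       uint64_t nent, uint64_t stride)
{
	return target >= table &&
	       target < table + nent * stride &&
	       (target - table) % stride == 0;
}

/* Hash scheme: the 32-bit type hash located via the target must match
 * the hash hard-coded at the call site. */
static int hash_check(uint32_t stored, uint32_t expected)
{
	return stored == expected;
}
```

Both are a single comparison at heart; the disagreement in the thread is about where the metadata lives and what an attacker would have to forge.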

2021-11-02 18:23:51

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On Tue, Nov 02, 2021 at 10:35:30AM -0700, Kees Cook wrote:
> On Sat, Oct 30, 2021 at 10:16:31AM +0200, Peter Zijlstra wrote:
> > foo.cfi:
> > endbr
> > xorl $0xdeadbeef, %r10d
> > jz foo
> > ud2
> > nop # make it an even 16 bytes
> > foo:
> > # actual function text
> >
> >
> > Then have the address of foo, be the address of foo, like any normal
> > sane person would expect. Have direct calls to foo, go to foo, again, as
> > expected.
> >
> > When doing an indirect call (to r11, as clang does), then, and only
> > then, do:
> >
> > movl $0xdeadbeef, %r10d
> > subq $0x10, %r11
> > call *%r11
> >
> > # if the r11 lives, add:
> > addq $0x10, %r11
> >
> >
> > Then only when caller and callee agree 0xdeadbeef is the password, does
> > the indirect call go through.
> >
> > Why isn't this a suitable CFI scheme even without IBT?
>
> The trouble is that the callee is doing the verification. There's no
> protection against calling into a callee that doesn't perform a check
> (e.g. BPF JIT, or otherwise constructed executable memory, etc). The
> caller needs to do the verification that what they're calling into is
> safe before it makes the call.

Right, Ard said the same, see new crackpot scheme here:

https://lkml.kernel.org/r/[email protected]

2021-11-02 18:24:17

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On Tue, 2 Nov 2021 at 19:14, Peter Zijlstra <[email protected]> wrote:
>
> On Tue, Nov 02, 2021 at 06:44:56PM +0100, Ard Biesheuvel wrote:
> > On Tue, 2 Nov 2021 at 16:15, Peter Zijlstra <[email protected]> wrote:
> > >
> > > On Tue, Nov 02, 2021 at 01:57:44PM +0100, Peter Zijlstra wrote:
> > >
> > > > So how insane is something like this, have each function:
> > > >
> > > > foo.cfi:
> > > > endbr64
> > > > xorl $0xdeadbeef, %r10d
> > > > jz foo
> > > > ud2
> > > > nop # make it 16 bytes
> > > > foo:
> > > > # actual function text goes here
> > > >
> > > >
> > > > And for each hash have two thunks:
> > > >
> > > >
> > > > # arg: r11
> > > > # clobbers: r10, r11
> > > > __x86_indirect_cfi_deadbeef:
> > > > movl -9(%r11), %r10 # immediate in foo.cfi
> > > > xorl $0xdeadbeef, %r10 # our immediate
> > > > jz 1f
> > > > ud2
> > > > 1: ALTERNATIVE_2 "jmp *%r11",
> > > > "jmp __x86_indirect_thunk_r11", X86_FEATURE_RETPOLINE
> > > > "lfence; jmp *%r11", X86_FEATURE_RETPOLINE_AMD
> > > >
> >
> > So are these supposed to go into the jump tables? If so, there still
> > needs to be a check against the boundary of the table at the call
> > site, to ensure that we are not calling something that we shouldn't.
> >
> > If they are not going into the jump tables, I don't see the point of
> > having them, as only happy-flow/uncompromised code would bother to use
> > them.
>
> I don't understand. If you can scribble your own code, you can
> circumvent pretty much any range check anyway.

A function pointer is data not code.

> But if you can't scribble
> your own code, you get to use the branch here and that checks the
> callsite and callee signature.
>

OK, so the call site has a direct branch to this trampoline then? That
wasn't clear to me.

> The range check isn't fundamental to CFI, having a check is the
> important thing AFAIU.

Agreed. If the call site has a direct branch, it doesn't need the range check.

2021-11-02 21:07:26

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching



On Tue, Nov 2, 2021, at 11:10 AM, Kees Cook wrote:
> On Tue, Nov 02, 2021 at 01:57:44PM +0100, Peter Zijlstra wrote:
>> On Mon, Nov 01, 2021 at 03:14:41PM +0100, Ard Biesheuvel wrote:
>> > On Mon, 1 Nov 2021 at 10:05, Peter Zijlstra <[email protected]> wrote:
>>
>> > > How is that not true for the jump table approach? Like I showed earlier,
>> > > it is *trivial* to reconstruct the actual function pointer from a
>> > > jump-table entry pointer.
>> > >
>> >
>> > That is not the point. The point is that Clang instruments every
>> > indirect call that it emits, to check whether the type of the jump
>> > table entry it is about to call matches the type of the caller. IOW,
>> > the indirect calls can only branch into jump tables, and all jump
>> > table entries in a table each branch to the start of some function of
>> > the same type.
>> >
>> > So the only thing you could achieve by adding or subtracting a
>> > constant value from the indirect call address is either calling
>> > another function of the same type (if you are hitting another entry in
>> > the same table), or failing the CFI type check.
>>
>> Ah, I see, so the call-site needs to have a branch around the indirect
>> call instruction.
>>
>> > Instrumenting the callee only needs something like BTI, and a
>> > consistent use of the landing pads to ensure that you cannot trivially
>> > omit the check by landing right after it.
>>
>> That does bring up another point tho; how are we going to do a kernel
>> that's optimal for both software CFI and hardware aided CFI?
>>
>> All questions that need answering I think.
>
> I'm totally fine with designing a new CFI for a future option,
> but blocking the existing (working) one does not best serve our end
> users.

I like security, but I also like building working systems, and I think I disagree with you. There are a whole bunch of CFI schemes out there, with varying hardware requirements, and they provide varying degrees of fine grained protection and varying degrees of protection against improper speculation. We do not want to merge clang CFI just because it’s “ready” and end up with a mess that makes it harder to support other schemes in the kernel.

So, yes, a good CFI scheme needs caller-side protection, especially if IBT isn’t in use. But a good CFI scheme also needs to interoperate with the rest of the kernel, and this whole “canonical” and symbol-based lookup and static_call thing is nonsense. I think we need a better implementation, whether it uses intrinsics or little C helpers or whatever.

I’m not saying this needs to be incompatible with current clang releases, but I do think we need a clear story for how operations like static call patching are supposed to work.

FYI, Ard, many years ago we merged kernel support for the original gcc stack protector. We have since *removed* it on x86_32 in favor of a nicer implementation that requires a newer toolchain.


2021-11-02 21:21:42

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On Tue, Nov 02, 2021 at 11:10:10AM -0700, Kees Cook wrote:
> On Tue, Nov 02, 2021 at 01:57:44PM +0100, Peter Zijlstra wrote:

> > All questions that need answering I think.
>
> I'm totally fine with designing a new CFI for a future option,
> but blocking the existing (working) one does not best serve our end
> users.

Accepting a half-arsed CFI now, just because it is what we have, will
only make it ever so much more difficult to get new/better
things implemented in the toolchains, because the pressure is off.

Also, it will make enabling IBT more difficult, or we'll end up having
CFI and IBT mutually exclusive, which is a crap position to be in.

> There are already people waiting on x86 CFI because having the
> extra layer of defense is valuable for them. No, it's not perfect,
> but it works right now, and evidence from Android shows that it has
> significant real-world defensive value. Some of the more adventurous
> are actually patching their kernels with the CFI support already, and
> happily running their workloads, etc.

It works by accident, not design. Which is a giant red flag in my book.

> Supporting Clang CFI means we actually have something to evolve
> from,

I don't want to evolve from a place that's crazy and we shouldn't have
been in to begin with. Redefining what a function pointer is is insane,
and the required work-arounds are ugly at best.

The ARM64 folks have expressed regret from having merged it. Why should
x86 do something that's known to cause regret?

> whereas starting completely over means (likely significant)
> delays leaving folks without the option available at all.

See above. Maybe some of those folks will get motivated to make it
happen faster.

> I think the various compiler and kernel tweaks needed to improve
> kernel support are reasonable, but building a totally new CFI
> implementation is not: it _does_ work today on x86.

By sodding accident; see that one static call patch that makes it burn.

If someone were to use the jump-table entry after we'd scribbled it,
things will go sideways bad.

> Yes, it's weird, but not outrageously so.

Clearly we disagree.

> Regardless, speaking to a new CFI design below:
>
> > So how insane is something like this, have each function:
> >
> > foo.cfi:
> > endbr64
> > xorl $0xdeadbeef, %r10d
> > jz foo
> > ud2
> > nop # make it 16 bytes
> > foo:
> > # actual function text goes here
> >
> >
> > And for each hash have two thunks:
> >
> >
> > # arg: r11
> > # clobbers: r10, r11
> > __x86_indirect_cfi_deadbeef:
> > movl -9(%r11), %r10 # immediate in foo.cfi
>
> This requires the text be readable. I have been hoping to avoid this for
> a CFI implementation so we could gain the benefit of execute-only
> memory (available soon on arm64, and possible today on x86 under a
> hypervisor).

It's only needed for the 'legacy' case of software only CFI if that
makes you feel better. BTI/IBT based CFI doesn't need this.

(also, I'm very much a bare metal kinda guy)

> > xorl $0xdeadbeef, %r10 # our immediate
> > jz 1f
> > ud2
> > 1: ALTERNATIVE_2 "jmp *%r11",
> > "jmp __x86_indirect_thunk_r11", X86_FEATURE_RETPOLINE
> > "lfence; jmp *%r11", X86_FEATURE_RETPOLINE_AMD
> >
> >
> >
> > # arg: r11
> > # clobbers: r10, r11
> > __x86_indirect_ibt_deadbeef:
> > movl $0xdeadbeef, %r10
> > subq $0x10, %r11
> > ALTERNATIVE "", "lfence", X86_FEATURE_RETPOLINE
> > jmp *%r11
> >

IBT case ^ doesn't actually read from the code. It simply jumps in front
of the real function by a set amount to get the extra cruft /
landing-pad required for indirect calls.

Also note that direct calls never make use of that pad, which means it
can be stripped from all symbols that never have their address taken,
which means less indirect targets.

(Or poison the thing with UD2 and write 0 in the immediate)

> > And have the actual indirect callsite look like:
> >
> > # r11 - &foo
> > ALTERNATIVE_2 "cs call __x86_indirect_thunk_r11",
> > "cs call __x86_indirect_cfi_deadbeef", X86_FEATURE_CFI
> > "cs call __x86_indirect_ibt_deadbeef", X86_FEATURE_IBT
> >
> > Although if the compiler were to emit:
> >
> > cs call __x86_indirect_cfi_deadbeef
> >
> > we could probably fix it up from there.
>
> It seems like this could work for any CFI implementation, though, if

The strong suggestion is that function pointers are sane, and there's a
few other considerations, but yes.

> the CFI implementation always performed a call, or if the bounds of the
> inline checking were known? i.e. objtool could also find the inline
> checks just as well as the call, though?

Somewhat hard, it's much easier to find a single instruction than a
pattern. Also, footprint. Smaller really is better.

> > Then we can at runtime decide between:
> >
> > {!cfi, cfi, ibt} x {!retpoline, retpoline, retpoline-amd}
>
> This does look pretty powerful, but I still don't think it precludes
> using the existing Clang CFI. I don't want perfect to be the enemy of
> good. :)

As stated above, we're in violent disagreement that the proposed scheme
comes anywhere near good. I utterly hate the redefinition of function
pointers and I also think the range compare goes sideways in the face of
modules. That's two marks against jump-tables.

(Also, in the !CFI case, you can't actually free the jump-tables, since
you're unconditionally s(t)uck with them because of how &func is
wrecked.)

2021-11-02 21:49:59

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On Tue, Nov 02, 2021 at 07:18:53PM +0100, Ard Biesheuvel wrote:

> > The range check isn't fundamental to CFI, having a check is the
> > important thing AFAIU.
>
> Agreed. If the call site has a direct branch, it doesn't need the range check.

That, from the earlier email:

| And have the actual indirect callsite look like:
|
| # r11 - &foo
| ALTERNATIVE_2 "cs call __x86_indirect_thunk_r11",
| "cs call __x86_indirect_cfi_deadbeef", X86_FEATURE_CFI
| "cs call __x86_indirect_ibt_deadbeef", X86_FEATURE_IBT

So the callsite has a direct call to the hash-specific and cfi-type
specific thunk, which then does an (indirect) tail-call.

The CFI one does the hash check in the thunk and jumps to the function
proper; the IBT one does it in the landing-pad.

The !CFI one ignores it all and simply does an indirect call (retpoline
aided or otherwise) to the function proper -- in which case we can free
all the thunks.

2021-11-02 23:15:48

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On Tue, Nov 02, 2021 at 02:02:38PM -0700, Andy Lutomirski wrote:
>
>
> On Tue, Nov 2, 2021, at 11:10 AM, Kees Cook wrote:
> > On Tue, Nov 02, 2021 at 01:57:44PM +0100, Peter Zijlstra wrote:
> >> On Mon, Nov 01, 2021 at 03:14:41PM +0100, Ard Biesheuvel wrote:
> >> > On Mon, 1 Nov 2021 at 10:05, Peter Zijlstra <[email protected]> wrote:
> >>
> >> > > How is that not true for the jump table approach? Like I showed earlier,
> >> > > it is *trivial* to reconstruct the actual function pointer from a
> >> > > jump-table entry pointer.
> >> > >
> >> >
> >> > That is not the point. The point is that Clang instruments every
> >> > indirect call that it emits, to check whether the type of the jump
> >> > table entry it is about to call matches the type of the caller. IOW,
> >> > the indirect calls can only branch into jump tables, and all jump
> >> > table entries in a table each branch to the start of some function of
> >> > the same type.
> >> >
> >> > So the only thing you could achieve by adding or subtracting a
> >> > constant value from the indirect call address is either calling
> >> > another function of the same type (if you are hitting another entry in
> >> > the same table), or failing the CFI type check.
> >>
> >> Ah, I see, so the call-site needs to have a branch around the indirect
> >> call instruction.
> >>
> >> > Instrumenting the callee only needs something like BTI, and a
> >> > consistent use of the landing pads to ensure that you cannot trivially
> >> > omit the check by landing right after it.
> >>
> >> That does bring up another point tho; how are we going to do a kernel
> >> that's optimal for both software CFI and hardware aided CFI?
> >>
> >> All questions that need answering I think.
> >
> > I'm totally fine with designing a new CFI for a future option,
> > but blocking the existing (working) one does not best serve our end
> > users.
>
> I like security, but I also like building working systems, and I think
> I disagree with you. There are a whole bunch of CFI schemes out there,
> with varying hardware requirements, and they provide varying degrees
> of fine grained protection and varying degrees of protection against
> improper speculation. We do not want to merge clang CFI just because
> it’s “ready” and end up with a mess that makes it harder to support
> other schemes in the kernel.

Right, and I see the difficulties here. And speaking to Peter's
observation that CFI "accidentally" worked with static_calls, I don't
see it that way: it worked because it was designed to be as "invisible"
as possible. It's just that at a certain point of extreme binary output
control, it becomes an issue and I think that's going to be true for
*any* CFI system: they each will have different design characteristics.

One of the goals of the Clang CFI use in Linux was to make it as
minimally invasive as possible (and you can see this guiding Sami's
choices: e.g. he didn't go change all the opaque address uses to need a
"&" prefix added, etc). I think we're always going to have some
push/pull between the compiler's "general"ness and the kernel's
"specific"ness.

> So, yes, a good CFI scheme needs caller-side protection, especially if
> IBT isn’t in use. But a good CFI scheme also needs to interoperate with
> the rest of the kernel, and this whole “canonical” and symbol-based
> lookup and static_call thing is nonsense. I think we need a better
> implementation, whether it uses intrinsics or little C helpers or
> whatever.

I think we're very close already. Like I said, I think it's fine to nail
down some of these interoperability requirements; we've been doing it
all along. We got there with arm64, and it looks to me like we're almost
there on x86. There is this particular case with static_calls now, but I
don't think it's insurmountable.

> I’m not saying this needs to be incompatible with current clang
> releases, but I do think we need a clear story for how operations like
> static call patching are supposed to work.

Right -- and until very recently, it did all Just Work. I think we'll
always have these kinds of issues with whatever CFI implementations are
made available, and since we've already cleaned up so much of what's
needed to make the kernel work with *any* CFI, it makes sense to
continue with the one we've already got working for arm64.

> FYI, Ard, many years ago we merged kernel support for the original
> gcc stack protector. We have since *removed* it on x86_32 in favor of
> a nicer implementation that requires a newer toolchain.

Sure, and this is the kind of thing I mean: we had an awkward
implementation of a meaningful defense, and we improved on it. I think
it's important to be able to make these kinds of concessions to gain the
defensive features they provide. And yes, we can continue to improve it,
but in the meantime, we can stop entire classes of problems from
happening to our user base.

--
Kees Cook

2021-11-03 00:47:42

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On 11/2/21 16:13, Kees Cook wrote:
> On Tue, Nov 02, 2021 at 02:02:38PM -0700, Andy Lutomirski wrote:
>>
>>
>> On Tue, Nov 2, 2021, at 11:10 AM, Kees Cook wrote:
>>> On Tue, Nov 02, 2021 at 01:57:44PM +0100, Peter Zijlstra wrote:
>>>> On Mon, Nov 01, 2021 at 03:14:41PM +0100, Ard Biesheuvel wrote:
>>>>> On Mon, 1 Nov 2021 at 10:05, Peter Zijlstra <[email protected]> wrote:
>>>>
>>>>>> How is that not true for the jump table approach? Like I showed earlier,
>>>>>> it is *trivial* to reconstruct the actual function pointer from a
>>>>>> jump-table entry pointer.
>>>>>>
>>>>>
>>>>> That is not the point. The point is that Clang instruments every
>>>>> indirect call that it emits, to check whether the type of the jump
>>>>> table entry it is about to call matches the type of the caller. IOW,
>>>>> the indirect calls can only branch into jump tables, and all jump
>>>>> table entries in a table each branch to the start of some function of
>>>>> the same type.
>>>>>
>>>>> So the only thing you could achieve by adding or subtracting a
>>>>> constant value from the indirect call address is either calling
>>>>> another function of the same type (if you are hitting another entry in
>>>>> the same table), or failing the CFI type check.
>>>>
>>>> Ah, I see, so the call-site needs to have a branch around the indirect
>>>> call instruction.
>>>>
>>>>> Instrumenting the callee only needs something like BTI, and a
>>>>> consistent use of the landing pads to ensure that you cannot trivially
>>>>> omit the check by landing right after it.
>>>>
>>>> That does bring up another point tho; how are we going to do a kernel
>>>> that's optimal for both software CFI and hardware aided CFI?
>>>>
>>>> All questions that need answering I think.
>>>
>>> I'm totally fine with designing a new CFI for a future option,
>>> but blocking the existing (working) one does not best serve our end
>>> users.
>>
>> I like security, but I also like building working systems, and I think
>> I disagree with you. There are a whole bunch of CFI schemes out there,
>> with varying hardware requirements, and they provide varying degrees
>> of fine grained protection and varying degrees of protection against
>> improper speculation. We do not want to merge clang CFI just because
>> it’s “ready” and end up with a mess that makes it harder to support
>> other schemes in the kernel.
>
> Right, and I see the difficulties here. And speaking to Peter's
> observation that CFI "accidentally" worked with static_calls, I don't
> see it that way: it worked because it was designed to be as "invisible"
> as possible. It's just that at a certain point of extreme binary output
> control, it becomes an issue and I think that's going to be true for
> *any* CFI system: they each will have different design characteristics.
>
> One of the goals of the Clang CFI use in Linux was to make it as
> minimally invasive as possible (and you can see this guiding Sami's
> choices: e.g. he didn't go change all the opaque address uses to need a
> "&" prefix added, etc). I think we're always going to have some
> push/pull between the compiler's "general"ness and the kernel's
> "specific"ness.

The "&" thing was the wrong way around. That part of CFI was Linux
being sloppy, and the & part is a straight-up improvement. Improvements
can be invasive. The problem with invasive code is when it invades
places it doesn't belong.

>
>> So, yes, a good CFI scheme needs caller-side protection, especially if
>> IBT isn’t in use. But a good CFI scheme also needs to interoperate with
>> the rest of the kernel, and this whole “canonical” and symbol-based
>> lookup and static_call thing is nonsense. I think we need a better
>> implementation, whether it uses intrinsics or little C helpers or
>> whatever.
>
> I think we're very close already. Like I said, I think it's fine to nail
> down some of these interoperability requirements; we've been doing it
> all along. We got there with arm64, and it looks to me like we're almost
> there on x86. There is this particular case with static_calls now, but I
> don't think it's insurmountable.

So fundamentally, I think there's a sort of type system issue here, and
it's more or less the same issue with IBT and with various fine-grained
refinements to IBT. Using syntax more or less like what's been floating
around this thread, the asm works like this:

[possible magic here depending on the precise scheme]
func.indirect:
ENDBR (if it's an IBT build)
[magic here?]

[in a jump table scheme, there's a jmp here and possibly a large or
negative gap.]

func:
actual body

The point being that there's the actual function body (where one should
*direct* jump to) and there's the address that goes in a C function
pointer. Indirect calls/jumps go to (or pretend to go to, depending on
the scheme) func.indirect.

While one can plausibly kludge up the static call patching (and who
knows what else -- eBPF, kprobes, mcount stuff, etc) using symbol-based
hackery, I think we actually just want a way to convert from a function
pointer to a function address. It doesn't need to be fast, but it needs
to work. And I think this can probably be mostly done in plain C based
on clang's implementation. Ideally it would have a typecheck:

unsigned long __cfi_decode_funcpointer(funcptr type);

but that's not happening in plain C because I don't think there's a way
to get the magic metadata. But I bet we could do:

unsigned long __cfi_decode_funcpointer(funcptr val, funcptr ref)
{
BUG_ON(ref and val aren't the same type);
read insn bytes at val;
return the jump target;
}

If this won't work with current clang, let's ask to get whatever basic
functionality is needed to enable it.
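A minimal userspace sketch of the insn-decoding step, assuming the non-canonical jump-table scheme where the function pointer points at a 5-byte `jmp rel32` stub (the name `cfi_decode_jmp` is hypothetical, not a clang or kernel API, and a real helper would also need the type check described above plus whatever extra magic the precise scheme emits):

```c
#include <stdint.h>

/* Decode the target of a 5-byte "jmp rel32" (opcode 0xE9) located at
 * insn_addr. Returns 0 if the bytes don't look like a jump-table stub. */
static uintptr_t cfi_decode_jmp(const uint8_t *insn, uintptr_t insn_addr)
{
	int32_t rel;

	if (insn[0] != 0xE9)
		return 0;

	/* rel32 is little-endian and relative to the end of the insn */
	rel = (int32_t)((uint32_t)insn[1]       |
			(uint32_t)insn[2] << 8  |
			(uint32_t)insn[3] << 16 |
			(uint32_t)insn[4] << 24);

	return insn_addr + 5 + (intptr_t)rel;
}
```

In the kernel this would read the bytes at the pointer value itself, and the BUG_ON type check from the sketch above would gate the call.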

<rambling>

Sadly, in clang, this is wrapped up in the incomprehensible "canonical"
stuff. I think that's a big mistake -- any sane ENDBR-using scheme
would really prefer that ENDBR to be right next to the actual function
body, and really any scheme would benefit due to better cache locality.
But, more importantly, IMO any sane ENDBR-using scheme wants to generate
the indirect stub as part of code gen for the actual function. In an
IBT build, this really doesn't deserve to work:

non-IBT C file generating:

func:
ret

IBT-aware C file:

extern void func(void);
ptr = func;

clang has horrible magic to generate the indirect stub in the caller
translation unit, and I think that's solving a problem that doesn't
really need solving. Sure, it's nifty, but it IMO should be opt-in, at
least in a world where everyone agrees to recompile for CFI. (Sure,
using that hack for userspace IBT would be cute, and it would work.)

There's another tradeoff, though: errors like this are possible:

translation unit A:
void func(int)
{
[body here]
}

translation unit B:
extern void func(char); /* wrong signature! */
ptr = func;
ptr();

The callee-generates-the-stub model will not catch this. The
caller-generates-the-stub model will.

>
> Sure, and this is the kind of thing I mean: we had an awkward
> implementation of a meaningful defense, and we improved on it. I think
> it's important to be able to make these kinds of concessions to gain the
> defensive features they provide. And yes, we can continue to improve it,
> but in the meantime, we can stop entire classes of problems from
> happening to our user base.
>

In retrospect, we should have put our feet down with stack protector,
though. The problem with it was gcc hardcoding things that shouldn't
have been hardcoded, and the gcc fixes were trivial.

2021-11-03 08:40:51

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On Tue, Nov 02, 2021 at 05:20:05PM -0700, Andy Lutomirski wrote:
> I think that's a big mistake -- any sane ENDBR-using scheme would
> really prefer that ENDBR to be right next to the actual function body,
> and really any scheme would benefit due to better cache locality.

Agreed, IBT/BTI want the landing pad in front of the actual function.

> But, more importantly, IMO any sane ENDBR-using scheme wants to
> generate the indirect stub as part of code gen for the actual
> function.

Sorta, I really want to be able to not have a landing pad for functions
whose address is never taken. At that point it doesn't matter if it gets
generated along with the function and then stripped/poisoned later, or
generated later.

As such, the landing pad should not be part of the function proper,
direct calls should never observe it.

Less landing pads is more better.

2021-11-03 10:04:14

by David Laight

[permalink] [raw]
Subject: RE: [PATCH] static_call,x86: Robustify trampoline patching

From: Peter Zijlstra
> Sent: 03 November 2021 08:36
>
> On Tue, Nov 02, 2021 at 05:20:05PM -0700, Andy Lutomirski wrote:
> > I think that's a big mistake -- any sane ENDBR-using scheme would
> > really prefer that ENDBR to be right next to the actual function body,
> > and really any scheme would benefit due to better cache locality.
>
> Agreed, IBT/BTI want the landing pad in front of the actual function.
>
> > But, more importantly, IMO any sane ENDBR-using scheme wants to
> > generate the indirect stub as part of code gen for the actual
> > function.
>
> Sorta, I really want to be able to not have a landing pad for functions
> whose address is never taken. At that point it doesn't matter if it gets
> generated along with the function and then stripped/poisoned later, or
> generated later.
>
> As such, the landing pad should not be part of the function proper,
> direct calls should never observe it.
>
> Less landing pads is more better.

One problem is when a direct call is 'too far' for a call instruction.
IIRC this can happen on arm64 with modules (all 64-bit arches except x86?).
So an indirect call has to be used instead - which needs the landing pad.
Although it may actually be better to put a trampoline (landing pad
+ near jump) elsewhere and have the module loader do the correct fixup.
(Is the loader already generating a trampoline in the module code?)
The function body can then be cache-line aligned - with its benefits.

Can't anything that can write instructions always use a retpoline
to implement a jump indirect to an arbitrary address?
(Not to mention just generating the required code rather than a call.)

AFAICT CFI is all about detecting invalid values in function pointer tables.
It doesn't really protect in any way from JIT code doing incorrect things.

David


2021-11-03 19:33:37

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On 11/3/21 01:35, Peter Zijlstra wrote:
> On Tue, Nov 02, 2021 at 05:20:05PM -0700, Andy Lutomirski wrote:
>> I think that's a big mistake -- any sane ENDBR-using scheme would
>> really prefer that ENDBR to be right next to the actual function body,
>> and really any scheme would benefit due to better cache locality.
>
> Agreed, IBT/BTI want the landing pad in front of the actual function.
>
>> But, more importantly, IMO any sane ENDBR-using scheme wants to
>> generate the indirect stub as part of code gen for the actual
>> function.
>
> Sorta, I really want to be able to not have a landing pad for functions
> whose address is never taken. At that point it doesn't matter if it gets
> generated along with the function and then stripped/poisoned later, or
> generated later.

Stripping is conceptually straightforward even without LTO.

foo.indirect:
ENDBR
foo:
...

and the linker learns (using magic function sections?) that, if
foo.indirect is not referenced, then it should not be generated. Or a
very straightforward walk over all the relocations in an object to
poison the unused .indirect entries could be done. Making this work
with DSOs, EXPORT_SYMBOL, etc will be somewhat nontrivial, but not
impossible.
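A toy sketch of that relocation walk (purely illustrative: the structures below are made up stand-ins for real ELF symbol and relocation tables, which objtool would read with its own ELF helpers):

```c
#include <stdbool.h>
#include <string.h>

struct sym {
	const char *name;
	bool poisoned;	/* e.g. body overwritten with UD2 */
};

/* Mark every *.indirect symbol that no relocation references.
 * relocs[] holds the symbol names the object's relocations point at. */
static void poison_unused_indirect(struct sym *syms, int nsyms,
				   const char **relocs, int nrelocs)
{
	for (int i = 0; i < nsyms; i++) {
		bool used = false;

		if (!strstr(syms[i].name, ".indirect"))
			continue;

		for (int j = 0; j < nrelocs; j++)
			if (!strcmp(relocs[j], syms[i].name))
				used = true;

		if (!used)
			syms[i].poisoned = true;
	}
}
```

As noted above, exported symbols are the hard part: an EXPORT_SYMBOL target can be referenced from a module that isn't visible at this point, so its .indirect entry could only be poisoned at module-load time, if at all.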

--Andy

Subject: [tip: locking/urgent] static_call,x86: Robustify trampoline patching

The following commit has been merged into the locking/urgent branch of tip:

Commit-ID: 2105a92748e83e2e3ee6be539da959706bbb3898
Gitweb: https://git.kernel.org/tip/2105a92748e83e2e3ee6be539da959706bbb3898
Author: Peter Zijlstra <[email protected]>
AuthorDate: Sat, 30 Oct 2021 09:47:58 +02:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Thu, 11 Nov 2021 13:09:31 +01:00

static_call,x86: Robustify trampoline patching

Add a few signature bytes after the static call trampoline and verify
those bytes match before patching the trampoline. This avoids patching
random other JMPs (such as CFI jump-table entries) instead.

These bytes decode as:

d: 53 push %rbx
e: 43 54 rex.XB push %r12

And happen to spell "SCT".

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/include/asm/static_call.h | 1 +
arch/x86/kernel/static_call.c | 14 ++++++++++----
tools/objtool/check.c | 3 +++
3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/static_call.h b/arch/x86/include/asm/static_call.h
index cbb67b6..39ebe05 100644
--- a/arch/x86/include/asm/static_call.h
+++ b/arch/x86/include/asm/static_call.h
@@ -27,6 +27,7 @@
".globl " STATIC_CALL_TRAMP_STR(name) " \n" \
STATIC_CALL_TRAMP_STR(name) ": \n" \
insns " \n" \
+ ".byte 0x53, 0x43, 0x54 \n" \
".type " STATIC_CALL_TRAMP_STR(name) ", @function \n" \
".size " STATIC_CALL_TRAMP_STR(name) ", . - " STATIC_CALL_TRAMP_STR(name) " \n" \
".popsection \n")
diff --git a/arch/x86/kernel/static_call.c b/arch/x86/kernel/static_call.c
index ea028e7..9c407a3 100644
--- a/arch/x86/kernel/static_call.c
+++ b/arch/x86/kernel/static_call.c
@@ -56,10 +56,15 @@ static void __ref __static_call_transform(void *insn, enum insn_type type, void
text_poke_bp(insn, code, size, emulate);
}

-static void __static_call_validate(void *insn, bool tail)
+static void __static_call_validate(void *insn, bool tail, bool tramp)
{
u8 opcode = *(u8 *)insn;

+ if (tramp && memcmp(insn+5, "SCT", 3)) {
+ pr_err("trampoline signature fail");
+ BUG();
+ }
+
if (tail) {
if (opcode == JMP32_INSN_OPCODE ||
opcode == RET_INSN_OPCODE)
@@ -74,7 +79,8 @@ static void __static_call_validate(void *insn, bool tail)
/*
* If we ever trigger this, our text is corrupt, we'll probably not live long.
*/
- WARN_ONCE(1, "unexpected static_call insn opcode 0x%x at %pS\n", opcode, insn);
+ pr_err("unexpected static_call insn opcode 0x%x at %pS\n", opcode, insn);
+ BUG();
}

static inline enum insn_type __sc_insn(bool null, bool tail)
@@ -97,12 +103,12 @@ void arch_static_call_transform(void *site, void *tramp, void *func, bool tail)
mutex_lock(&text_mutex);

if (tramp) {
- __static_call_validate(tramp, true);
+ __static_call_validate(tramp, true, true);
__static_call_transform(tramp, __sc_insn(!func, true), func);
}

if (IS_ENABLED(CONFIG_HAVE_STATIC_CALL_INLINE) && site) {
- __static_call_validate(site, tail);
+ __static_call_validate(site, tail, false);
__static_call_transform(site, __sc_insn(!func, tail), func);
}

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index add3990..2173582 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -3310,6 +3310,9 @@ static bool ignore_unreachable_insn(struct objtool_file *file, struct instructio
if (!insn->func)
return false;

+ if (insn->func->static_call_tramp)
+ return true;
+
/*
* CONFIG_UBSAN_TRAP inserts a UD2 when it sees
* __builtin_unreachable(). The BUG() macro has an unreachable() after

2021-11-15 13:09:58

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: [PATCH] static_call,x86: Robustify trampoline patching

On 30/10/2021 10.16, Peter Zijlstra wrote:
> On Sat, Oct 30, 2021 at 09:47:58AM +0200, Peter Zijlstra wrote:

> So fundamentally I think the whole notion that the address of a function
> is something different than 'the address of that function' is an *utter*
> fail.

Agreed. IMHO, commits like 0a5b412891df, 981731129e0f should have been a
big red flag saying "ok, perhaps we shall not mess with the semantics of
function pointer comparisons". Yes yes, the commit log in cf68fffb66d6
did dutifully say "This may break code that relies on comparing
addresses passed from other components.".

But it did not spell out that this, for example, means that i2c bus
recovery now doesn't result in recovering the bus but instead in a NULL
pointer deref and kernel panic: Some i2c drivers set ->recover_bus =
i2c_generic_scl_recovery and provide ->sda_gpiod, ->scl_gpiod, then rely on

if (bri->scl_gpiod && bri->recover_bus == i2c_generic_scl_recovery)

in i2c-core-base.c causing ->get_scl/->set_scl to be filled in. That's
of course broken with ->recover_bus pointing to the module's jump table,
so when ->recover_bus is actually called, the bri->set_scl() call in
i2c_generic_scl_recovery() is a NULL pointer deref. [I've tested that
this actually happens on an imx8mp_evk board.]
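The failure mode is easy to reproduce in plain C. Below is a userspace stand-in (the names are hypothetical, not i2c code): `stub_recover` plays the role of the module's CFI jump-table entry, which behaves identically to the real function but has a different address:

```c
static int real_recover(void)
{
	return 42;
}

/* Stand-in for the module's CFI jump-table entry: calling it behaves
 * exactly like the real function, but its address differs. */
static int stub_recover(void)
{
	return real_recover();
}

/* Mimics the check in i2c-core-base.c: only returns true when the stored
 * pointer compares equal to the function the core knows about. */
static int core_check(int (*recover_bus)(void))
{
	return recover_bus == real_recover;
}
```

With the stub standing in for the jump table, `core_check(stub_recover)` is false even though calling through the pointer behaves identically, which is exactly the mismatch that leaves ->get_scl/->set_scl unset and turns the later ->recover_bus call into a NULL deref.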

And who knows how much other code happens to rely on function pointer
comparison working as specified in the C standard for correctness.

Rasmus