2020-09-25 14:59:56

by Yu-cheng Yu

[permalink] [raw]
Subject: [PATCH v13 0/8] Control-flow Enforcement: Indirect Branch Tracking

Control-flow Enforcement (CET) is a new Intel processor feature that blocks
return/jump-oriented programming attacks. Details are in "Intel 64 and
IA-32 Architectures Software Developer's Manual" [1].

This is the second part of CET and enables Indirect Branch Tracking (IBT).
It is built on top of the shadow stack series.

Changes in v13:
- Drop the patch that disables vsyscall emulation when CET is enabled, and
re-introduce an earlier patch that fixes shadow stack and IBT for the
emulation code.
- Remove "INTEL" string from CET Kconfig options, update help text, and
change IBT default configuration to N.

[1] Intel 64 and IA-32 Architectures Software Developer's Manual:

https://software.intel.com/en-us/download/intel-64-and-ia-32-
architectures-sdm-combined-volumes-1-2a-2b-2c-2d-3a-3b-3c-3d-and-4

[2] Indirect Branch Tracking patches v12.

https://lkml.kernel.org/r/[email protected]/

H.J. Lu (3):
x86/cet/ibt: Update arch_prctl functions for Indirect Branch Tracking
x86/vdso/32: Add ENDBR32 to __kernel_vsyscall entry point
x86/vdso: Insert endbr32/endbr64 to vDSO

Yu-cheng Yu (5):
x86/cet/ibt: Add Kconfig option for user-mode Indirect Branch Tracking
x86/cet/ibt: User-mode Indirect Branch Tracking support
x86/cet/ibt: Handle signals for Indirect Branch Tracking
x86/cet/ibt: ELF header parsing for Indirect Branch Tracking
x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for
vsyscall emulation

arch/x86/Kconfig | 21 +++++++
arch/x86/entry/vdso/Makefile | 4 ++
arch/x86/entry/vdso/vdso32/system_call.S | 3 +
arch/x86/entry/vsyscall/vsyscall_64.c | 34 +++++++++++
arch/x86/entry/vsyscall/vsyscall_emu_64.S | 9 +++
arch/x86/entry/vsyscall/vsyscall_trace.h | 1 +
arch/x86/include/asm/cet.h | 3 +
arch/x86/include/asm/disabled-features.h | 8 ++-
arch/x86/kernel/cet.c | 60 ++++++++++++++++++-
arch/x86/kernel/cet_prctl.c | 8 ++-
arch/x86/kernel/cpu/common.c | 17 ++++++
arch/x86/kernel/fpu/signal.c | 8 ++-
arch/x86/kernel/process_64.c | 8 +++
.../arch/x86/include/asm/disabled-features.h | 8 ++-
14 files changed, 184 insertions(+), 8 deletions(-)

--
2.21.0


2020-09-25 14:59:58

by Yu-cheng Yu

[permalink] [raw]
Subject: [PATCH v13 1/8] x86/cet/ibt: Add Kconfig option for user-mode Indirect Branch Tracking

Introduce Kconfig option X86_BRANCH_TRACKING_USER.

Indirect Branch Tracking (IBT) provides protection against CALL-/JMP-
oriented programming attacks. It is active when the kernel has this
feature enabled, and the processor and the application support it.
When this feature is enabled, legacy non-IBT applications continue to
work, but without IBT protection.

Signed-off-by: Yu-cheng Yu <[email protected]>
---
v13:
- Update help text, and change default to N.
- Change X86_INTEL_* to X86_*.

v10:
- Change build-time CET check to config depends on.

arch/x86/Kconfig | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4b28a0ce4594..15c7f2606c9d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1966,6 +1966,25 @@ config X86_SHADOW_STACK_USER

If unsure, say N.

+config X86_BRANCH_TRACKING_USER
+ prompt "Intel Indirect Branch Tracking for user-mode"
+ def_bool n
+ depends on CPU_SUP_INTEL && X86_64
+ depends on $(cc-option,-fcf-protection)
+ select X86_CET
+ help
+ Indirect Branch Tracking (IBT) provides protection against
+ CALL-/JMP-oriented programming attacks. It is active when
+ the kernel has this feature enabled, and the processor and
+ the application support it. When this feature is enabled,
+ legacy non-IBT applications continue to work, but without
+ IBT protection.
+ Support for this feature is only known to be present on
+ processors released in 2020 or later. CET features are also
+ known to increase kernel text size by 3.7 KB.
+
+ If unsure, say N.
+
config EFI
bool "EFI runtime service support"
depends on ACPI
--
2.21.0

2020-09-25 15:00:00

by Yu-cheng Yu

[permalink] [raw]
Subject: [PATCH v13 2/8] x86/cet/ibt: User-mode Indirect Branch Tracking support

Introduce user-mode Indirect Branch Tracking (IBT) support. Update setup
routines to include IBT.

Signed-off-by: Yu-cheng Yu <[email protected]>
---
v10:
- Change no_cet_ibt to no_user_ibt.

v9:
- Change cpu_feature_enabled() to static_cpu_has().

v2:
- Change noibt to no_cet_ibt.

arch/x86/include/asm/cet.h | 3 ++
arch/x86/include/asm/disabled-features.h | 8 ++++-
arch/x86/kernel/cet.c | 33 +++++++++++++++++++
arch/x86/kernel/cpu/common.c | 17 ++++++++++
.../arch/x86/include/asm/disabled-features.h | 8 ++++-
5 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
index 16870e5bc8eb..3a1cba579cb2 100644
--- a/arch/x86/include/asm/cet.h
+++ b/arch/x86/include/asm/cet.h
@@ -15,6 +15,7 @@ struct cet_status {
unsigned long shstk_base;
unsigned long shstk_size;
unsigned int locked:1;
+ unsigned int ibt_enabled:1;
};

#ifdef CONFIG_X86_CET
@@ -26,6 +27,8 @@ void cet_free_shstk(struct task_struct *p);
int cet_verify_rstor_token(bool ia32, unsigned long ssp, unsigned long *new_ssp);
void cet_restore_signal(struct sc_ext *sc);
int cet_setup_signal(bool ia32, unsigned long rstor, struct sc_ext *sc);
+int cet_setup_ibt(void);
+void cet_disable_ibt(void);
#else
static inline int prctl_cet(int option, u64 arg2) { return -EINVAL; }
static inline int cet_setup_thread_shstk(struct task_struct *p,
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index edac76ed75e7..e7096a1e2698 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -62,6 +62,12 @@
#define DISABLE_SHSTK (1<<(X86_FEATURE_SHSTK & 31))
#endif

+#ifdef CONFIG_X86_BRANCH_TRACKING_USER
+#define DISABLE_IBT 0
+#else
+#define DISABLE_IBT (1<<(X86_FEATURE_IBT & 31))
+#endif
+
/*
* Make sure to add features to the correct mask
*/
@@ -83,7 +89,7 @@
#define DISABLED_MASK15 0
#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP|DISABLE_SHSTK)
#define DISABLED_MASK17 0
-#define DISABLED_MASK18 0
+#define DISABLED_MASK18 (DISABLE_IBT)
#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19)

#endif /* _ASM_X86_DISABLED_FEATURES_H */
diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c
index b285c726bb88..e95fadb264f7 100644
--- a/arch/x86/kernel/cet.c
+++ b/arch/x86/kernel/cet.c
@@ -13,6 +13,8 @@
#include <linux/uaccess.h>
#include <linux/sched/signal.h>
#include <linux/compat.h>
+#include <linux/vmalloc.h>
+#include <linux/bitops.h>
#include <asm/msr.h>
#include <asm/user.h>
#include <asm/fpu/internal.h>
@@ -341,3 +343,34 @@ int cet_setup_signal(bool ia32, unsigned long rstor_addr, struct sc_ext *sc_ext)

return 0;
}
+
+int cet_setup_ibt(void)
+{
+ u64 msr_val;
+
+ if (!static_cpu_has(X86_FEATURE_IBT))
+ return -EOPNOTSUPP;
+
+ start_update_msrs();
+ rdmsrl(MSR_IA32_U_CET, msr_val);
+ msr_val |= (CET_ENDBR_EN | CET_NO_TRACK_EN);
+ wrmsrl(MSR_IA32_U_CET, msr_val);
+ end_update_msrs();
+ current->thread.cet.ibt_enabled = 1;
+ return 0;
+}
+
+void cet_disable_ibt(void)
+{
+ u64 msr_val;
+
+ if (!static_cpu_has(X86_FEATURE_IBT))
+ return;
+
+ start_update_msrs();
+ rdmsrl(MSR_IA32_U_CET, msr_val);
+ msr_val &= CET_SHSTK_EN;
+ wrmsrl(MSR_IA32_U_CET, msr_val);
+ end_update_msrs();
+ current->thread.cet.ibt_enabled = 0;
+}
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 084480f975aa..909b4160a2d2 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -536,6 +536,23 @@ static __init int setup_disable_shstk(char *s)
__setup("no_user_shstk", setup_disable_shstk);
#endif

+#ifdef CONFIG_X86_BRANCH_TRACKING_USER
+static __init int setup_disable_ibt(char *s)
+{
+ /* require an exact match without trailing characters */
+ if (s[0] != '\0')
+ return 0;
+
+ if (!boot_cpu_has(X86_FEATURE_IBT))
+ return 1;
+
+ setup_clear_cpu_cap(X86_FEATURE_IBT);
+ pr_info("x86: 'no_user_ibt' specified, disabling user Branch Tracking\n");
+ return 1;
+}
+__setup("no_user_ibt", setup_disable_ibt);
+#endif
+
/*
* Some CPU features depend on higher CPUID levels, which may not always
* be available due to CPUID level capping or broken virtualization
diff --git a/tools/arch/x86/include/asm/disabled-features.h b/tools/arch/x86/include/asm/disabled-features.h
index edac76ed75e7..e7096a1e2698 100644
--- a/tools/arch/x86/include/asm/disabled-features.h
+++ b/tools/arch/x86/include/asm/disabled-features.h
@@ -62,6 +62,12 @@
#define DISABLE_SHSTK (1<<(X86_FEATURE_SHSTK & 31))
#endif

+#ifdef CONFIG_X86_BRANCH_TRACKING_USER
+#define DISABLE_IBT 0
+#else
+#define DISABLE_IBT (1<<(X86_FEATURE_IBT & 31))
+#endif
+
/*
* Make sure to add features to the correct mask
*/
@@ -83,7 +89,7 @@
#define DISABLED_MASK15 0
#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP|DISABLE_SHSTK)
#define DISABLED_MASK17 0
-#define DISABLED_MASK18 0
+#define DISABLED_MASK18 (DISABLE_IBT)
#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19)

#endif /* _ASM_X86_DISABLED_FEATURES_H */
--
2.21.0

2020-09-25 15:00:01

by Yu-cheng Yu

[permalink] [raw]
Subject: [PATCH v13 4/8] x86/cet/ibt: ELF header parsing for Indirect Branch Tracking

Update arch_setup_elf_property() for Indirect Branch Tracking.

Signed-off-by: Yu-cheng Yu <[email protected]>
---
v9:
- Change cpu_feature_enabled() to static_cpu_has().

arch/x86/Kconfig | 2 ++
arch/x86/kernel/process_64.c | 8 ++++++++
2 files changed, 10 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 15c7f2606c9d..cc9876f85e91 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1972,6 +1972,8 @@ config X86_BRANCH_TRACKING_USER
depends on CPU_SUP_INTEL && X86_64
depends on $(cc-option,-fcf-protection)
select X86_CET
+ select ARCH_USE_GNU_PROPERTY
+ select ARCH_BINFMT_ELF_STATE
help
Indirect Branch Tracking (IBT) provides protection against
CALL-/JMP-oriented programming attacks. It is active when
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 8725e67bcd44..1147a1052a07 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -866,6 +866,14 @@ int arch_setup_elf_property(struct arch_elf_state *state)
r = cet_setup_shstk();
}

+ if (r < 0)
+ return r;
+
+ if (static_cpu_has(X86_FEATURE_IBT)) {
+ if (state->gnu_property & GNU_PROPERTY_X86_FEATURE_1_IBT)
+ r = cet_setup_ibt();
+ }
+
return r;
}
#endif
--
2.21.0

2020-09-25 15:00:13

by Yu-cheng Yu

[permalink] [raw]
Subject: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation

Vsyscall entry points are effectively branch targets. Mark them with
ENDBR64 opcodes. When emulating the RET instruction, unwind shadow stack
and reset IBT state machine.

Signed-off-by: Yu-cheng Yu <[email protected]>
---
v13:
- Check shadow stack address is canonical.
- Change from writing to MSRs to writing to CET xstate.

arch/x86/entry/vsyscall/vsyscall_64.c | 34 +++++++++++++++++++++++
arch/x86/entry/vsyscall/vsyscall_emu_64.S | 9 ++++++
arch/x86/entry/vsyscall/vsyscall_trace.h | 1 +
3 files changed, 44 insertions(+)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index 44c33103a955..315ee3572664 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -38,6 +38,9 @@
#include <asm/fixmap.h>
#include <asm/traps.h>
#include <asm/paravirt.h>
+#include <asm/fpu/xstate.h>
+#include <asm/fpu/types.h>
+#include <asm/fpu/internal.h>

#define CREATE_TRACE_POINTS
#include "vsyscall_trace.h"
@@ -286,6 +289,37 @@ bool emulate_vsyscall(unsigned long error_code,
/* Emulate a ret instruction. */
regs->ip = caller;
regs->sp += 8;
+
+#ifdef CONFIG_X86_CET
+ if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
+ struct cet_user_state *cet;
+ struct fpu *fpu;
+
+ fpu = &tsk->thread.fpu;
+ fpregs_lock();
+
+ if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
+ copy_fpregs_to_fpstate(fpu);
+ set_thread_flag(TIF_NEED_FPU_LOAD);
+ }
+
+ cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
+ if (!cet) {
+ fpregs_unlock();
+ goto sigsegv;
+ }
+
+ if (cet->user_ssp && ((cet->user_ssp + 8) < TASK_SIZE_MAX))
+ cet->user_ssp += 8;
+
+ if (cet->user_cet & CET_ENDBR_EN)
+ cet->user_cet &= ~CET_WAIT_ENDBR;
+
+ __fpu_invalidate_fpregs_state(fpu);
+ fpregs_unlock();
+ }
+#endif
+
return true;

sigsegv:
diff --git a/arch/x86/entry/vsyscall/vsyscall_emu_64.S b/arch/x86/entry/vsyscall/vsyscall_emu_64.S
index 2e203f3a25a7..b2fa92104cdb 100644
--- a/arch/x86/entry/vsyscall/vsyscall_emu_64.S
+++ b/arch/x86/entry/vsyscall/vsyscall_emu_64.S
@@ -17,16 +17,25 @@ __PAGE_ALIGNED_DATA
.type __vsyscall_page, @object
__vsyscall_page:

+#ifdef CONFIG_X86_BRANCH_TRACKING_USER
+ endbr64
+#endif
mov $__NR_gettimeofday, %rax
syscall
ret

.balign 1024, 0xcc
+#ifdef CONFIG_X86_BRANCH_TRACKING_USER
+ endbr64
+#endif
mov $__NR_time, %rax
syscall
ret

.balign 1024, 0xcc
+#ifdef CONFIG_X86_BRANCH_TRACKING_USER
+ endbr64
+#endif
mov $__NR_getcpu, %rax
syscall
ret
diff --git a/arch/x86/entry/vsyscall/vsyscall_trace.h b/arch/x86/entry/vsyscall/vsyscall_trace.h
index 3c3f9765a85c..7aa2101ada44 100644
--- a/arch/x86/entry/vsyscall/vsyscall_trace.h
+++ b/arch/x86/entry/vsyscall/vsyscall_trace.h
@@ -25,6 +25,7 @@ TRACE_EVENT(emulate_vsyscall,
#endif

#undef TRACE_INCLUDE_PATH
+#undef TRACE_INCLUDE_FILE
#define TRACE_INCLUDE_PATH ../../arch/x86/entry/vsyscall/
#define TRACE_INCLUDE_FILE vsyscall_trace
#include <trace/define_trace.h>
--
2.21.0

2020-09-25 15:01:14

by Yu-cheng Yu

[permalink] [raw]
Subject: [PATCH v13 5/8] x86/cet/ibt: Update arch_prctl functions for Indirect Branch Tracking

From: "H.J. Lu" <[email protected]>

Update ARCH_X86_CET_STATUS and ARCH_X86_CET_DISABLE for Indirect Branch
Tracking.

Signed-off-by: H.J. Lu <[email protected]>
Signed-off-by: Yu-cheng Yu <[email protected]>
---
arch/x86/kernel/cet_prctl.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cet_prctl.c b/arch/x86/kernel/cet_prctl.c
index bd5ad11763e4..0af1ec5d028f 100644
--- a/arch/x86/kernel/cet_prctl.c
+++ b/arch/x86/kernel/cet_prctl.c
@@ -22,6 +22,9 @@ static int copy_status_to_user(struct cet_status *cet, u64 arg2)
buf[2] = (u64)cet->shstk_size;
}

+ if (cet->ibt_enabled)
+ buf[0] |= GNU_PROPERTY_X86_FEATURE_1_IBT;
+
return copy_to_user((u64 __user *)arg2, buf, sizeof(buf));
}

@@ -42,7 +45,8 @@ int prctl_cet(int option, u64 arg2)
if (option == ARCH_X86_CET_STATUS)
return copy_status_to_user(cet, arg2);

- if (!static_cpu_has(X86_FEATURE_SHSTK))
+ if (!static_cpu_has(X86_FEATURE_SHSTK) &&
+ !static_cpu_has(X86_FEATURE_IBT))
return -EOPNOTSUPP;

switch (option) {
@@ -56,6 +60,8 @@ int prctl_cet(int option, u64 arg2)
return -EINVAL;
if (features & GNU_PROPERTY_X86_FEATURE_1_SHSTK)
cet_disable_shstk();
+ if (features & GNU_PROPERTY_X86_FEATURE_1_IBT)
+ cet_disable_ibt();
return 0;

case ARCH_X86_CET_LOCK:
--
2.21.0

2020-09-25 15:01:15

by Yu-cheng Yu

[permalink] [raw]
Subject: [PATCH v13 3/8] x86/cet/ibt: Handle signals for Indirect Branch Tracking

An indirect CALL/JMP moves the indirect branch tracking (IBT) state machine
to WAIT_ENDBR status until the instruction reaches an ENDBR opcode. If the
CALL/JMP does not reach an ENDBR opcode, the processor raises a control-
protection fault. WAIT_ENDBR status can be read from MSR_IA32_U_CET.

WAIT_ENDBR is cleared for signal handling, and restored for sigreturn.

IBT state machine is described in Intel SDM Vol. 1, Sec. 18.3.

Signed-off-by: Yu-cheng Yu <[email protected]>
---
v9:
- Fix missing WAIT_ENDBR in signal handling.

arch/x86/kernel/cet.c | 27 +++++++++++++++++++++++++--
arch/x86/kernel/fpu/signal.c | 8 +++++---
2 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c
index e95fadb264f7..1f8b72269166 100644
--- a/arch/x86/kernel/cet.c
+++ b/arch/x86/kernel/cet.c
@@ -295,6 +295,13 @@ void cet_restore_signal(struct sc_ext *sc_ext)
msr_val |= CET_SHSTK_EN;
}

+ if (cet->ibt_enabled) {
+ msr_val |= (CET_ENDBR_EN | CET_NO_TRACK_EN);
+
+ if (sc_ext->wait_endbr)
+ msr_val |= CET_WAIT_ENDBR;
+ }
+
if (test_thread_flag(TIF_NEED_FPU_LOAD))
cet_user_state->user_cet = msr_val;
else
@@ -335,9 +342,25 @@ int cet_setup_signal(bool ia32, unsigned long rstor_addr, struct sc_ext *sc_ext)
sc_ext->ssp = new_ssp;
}

- if (ssp) {
+ if (ssp || cet->ibt_enabled) {
+
start_update_msrs();
- wrmsrl(MSR_IA32_PL3_SSP, ssp);
+
+ if (ssp)
+ wrmsrl(MSR_IA32_PL3_SSP, ssp);
+
+ if (cet->ibt_enabled) {
+ u64 r;
+
+ rdmsrl(MSR_IA32_U_CET, r);
+
+ if (r & CET_WAIT_ENDBR) {
+ sc_ext->wait_endbr = 1;
+ r &= ~CET_WAIT_ENDBR;
+ wrmsrl(MSR_IA32_U_CET, r);
+ }
+ }
+
end_update_msrs();
}

diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index c0c2141cb4b3..077853ef6f48 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -57,7 +57,8 @@ int save_cet_to_sigframe(int ia32, void __user *fp, unsigned long restorer)
{
int err = 0;

- if (!current->thread.cet.shstk_size)
+ if (!current->thread.cet.shstk_size &&
+ !current->thread.cet.ibt_enabled)
return 0;

if (fp) {
@@ -89,7 +90,8 @@ static int get_cet_from_sigframe(int ia32, void __user *fp, struct sc_ext *ext)

memset(ext, 0, sizeof(*ext));

- if (!current->thread.cet.shstk_size)
+ if (!current->thread.cet.shstk_size &&
+ !current->thread.cet.ibt_enabled)
return 0;

if (fp) {
@@ -577,7 +579,7 @@ static unsigned long fpu__alloc_sigcontext_ext(unsigned long sp)
* sigcontext_ext is at: fpu + fpu_user_xstate_size +
* FP_XSTATE_MAGIC2_SIZE, then aligned to 8.
*/
- if (cet->shstk_size)
+ if (cet->shstk_size || cet->ibt_enabled)
sp -= (sizeof(struct sc_ext) + 8);

return sp;
--
2.21.0

2020-09-25 15:01:25

by Yu-cheng Yu

[permalink] [raw]
Subject: [PATCH v13 6/8] x86/vdso/32: Add ENDBR32 to __kernel_vsyscall entry point

From: "H.J. Lu" <[email protected]>

Add ENDBR32 to __kernel_vsyscall entry point.

Signed-off-by: H.J. Lu <[email protected]>
Signed-off-by: Yu-cheng Yu <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
---
arch/x86/entry/vdso/vdso32/system_call.S | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/entry/vdso/vdso32/system_call.S b/arch/x86/entry/vdso/vdso32/system_call.S
index de1fff7188aa..e331fcdebd95 100644
--- a/arch/x86/entry/vdso/vdso32/system_call.S
+++ b/arch/x86/entry/vdso/vdso32/system_call.S
@@ -14,6 +14,9 @@
ALIGN
__kernel_vsyscall:
CFI_STARTPROC
+#ifdef CONFIG_X86_BRANCH_TRACKING_USER
+ endbr32
+#endif
/*
* Reshuffle regs so that all of any of the entry instructions
* will preserve enough state.
--
2.21.0

2020-09-25 16:36:00

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation

On Fri, Sep 25, 2020 at 7:58 AM Yu-cheng Yu <[email protected]> wrote:
>
> Vsyscall entry points are effectively branch targets. Mark them with
> ENDBR64 opcodes. When emulating the RET instruction, unwind shadow stack
> and reset IBT state machine.
>
> Signed-off-by: Yu-cheng Yu <[email protected]>
> ---
> v13:
> - Check shadow stack address is canonical.
> - Change from writing to MSRs to writing to CET xstate.
>
> arch/x86/entry/vsyscall/vsyscall_64.c | 34 +++++++++++++++++++++++
> arch/x86/entry/vsyscall/vsyscall_emu_64.S | 9 ++++++
> arch/x86/entry/vsyscall/vsyscall_trace.h | 1 +
> 3 files changed, 44 insertions(+)
>
> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
> index 44c33103a955..315ee3572664 100644
> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
> @@ -38,6 +38,9 @@
> #include <asm/fixmap.h>
> #include <asm/traps.h>
> #include <asm/paravirt.h>
> +#include <asm/fpu/xstate.h>
> +#include <asm/fpu/types.h>
> +#include <asm/fpu/internal.h>
>
> #define CREATE_TRACE_POINTS
> #include "vsyscall_trace.h"
> @@ -286,6 +289,37 @@ bool emulate_vsyscall(unsigned long error_code,
> /* Emulate a ret instruction. */
> regs->ip = caller;
> regs->sp += 8;
> +
> +#ifdef CONFIG_X86_CET
> + if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
> + struct cet_user_state *cet;
> + struct fpu *fpu;
> +
> + fpu = &tsk->thread.fpu;
> + fpregs_lock();
> +
> + if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
> + copy_fpregs_to_fpstate(fpu);
> + set_thread_flag(TIF_NEED_FPU_LOAD);
> + }
> +
> + cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> + if (!cet) {
> + fpregs_unlock();
> + goto sigsegv;

I *think* your patchset tries to keep cet.shstk_size and
cet.ibt_enabled in sync with the MSR, in which case it should be
impossible to get here, but a comment and a warning would be much
better than a random sigsegv.

Shouldn't we have a get_xsave_addr_or_allocate() that will never
return NULL but instead will mark the state as in use and set up the
init state if the feature was previously not in use?

2020-09-25 16:49:59

by Yu-cheng Yu

[permalink] [raw]
Subject: Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation

On 9/25/2020 9:31 AM, Andy Lutomirski wrote:
> On Fri, Sep 25, 2020 at 7:58 AM Yu-cheng Yu <[email protected]> wrote:
>>

[...]

>> @@ -286,6 +289,37 @@ bool emulate_vsyscall(unsigned long error_code,
>> /* Emulate a ret instruction. */
>> regs->ip = caller;
>> regs->sp += 8;
>> +
>> +#ifdef CONFIG_X86_CET
>> + if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
>> + struct cet_user_state *cet;
>> + struct fpu *fpu;
>> +
>> + fpu = &tsk->thread.fpu;
>> + fpregs_lock();
>> +
>> + if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
>> + copy_fpregs_to_fpstate(fpu);
>> + set_thread_flag(TIF_NEED_FPU_LOAD);
>> + }
>> +
>> + cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
>> + if (!cet) {
>> + fpregs_unlock();
>> + goto sigsegv;
>
> I *think* your patchset tries to keep cet.shstk_size and
> cet.ibt_enabled in sync with the MSR, in which case it should be
> impossible to get here, but a comment and a warning would be much
> better than a random sigsegv.

Yes, it should be impossible to get here. I will add a comment and a
warning, but still do sigsegv. Should this happen, and the function
return, the app gets a control-protection fault. Why not let it fail early?

>
> Shouldn't we have a get_xsave_addr_or_allocate() that will never
> return NULL but instead will mark the state as in use and set up the
> init state if the feature was previously not in use?
>

We already have a static __raw_xsave_addr(), which returns a pointer to
the requested xstate. Maybe we can export __raw_xsave_addr(), if that
is needed.

2020-09-25 16:52:37

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation



> On Sep 25, 2020, at 9:48 AM, Yu, Yu-cheng <[email protected]> wrote:
>
> On 9/25/2020 9:31 AM, Andy Lutomirski wrote:
>>> On Fri, Sep 25, 2020 at 7:58 AM Yu-cheng Yu <[email protected]> wrote:
>>>
>
> [...]
>
>>> @@ -286,6 +289,37 @@ bool emulate_vsyscall(unsigned long error_code,
>>> /* Emulate a ret instruction. */
>>> regs->ip = caller;
>>> regs->sp += 8;
>>> +
>>> +#ifdef CONFIG_X86_CET
>>> + if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
>>> + struct cet_user_state *cet;
>>> + struct fpu *fpu;
>>> +
>>> + fpu = &tsk->thread.fpu;
>>> + fpregs_lock();
>>> +
>>> + if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
>>> + copy_fpregs_to_fpstate(fpu);
>>> + set_thread_flag(TIF_NEED_FPU_LOAD);
>>> + }
>>> +
>>> + cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
>>> + if (!cet) {
>>> + fpregs_unlock();
>>> + goto sigsegv;
>> I *think* your patchset tries to keep cet.shstk_size and
>> cet.ibt_enabled in sync with the MSR, in which case it should be
>> impossible to get here, but a comment and a warning would be much
>> better than a random sigsegv.
>
> Yes, it should be impossible to get here. I will add a comment and a warning, but still do sigsegv. Should this happen, and the function return, the app gets a control-protection fault. Why not let it fail early?

I’m okay with either approach as long as we get a comment and warning.

>
>>
>> Shouldn't we have a get_xsave_addr_or_allocate() that will never
>> return NULL but instead will mark the state as in use and set up the
>> init state if the feature was previously not in use?
>
> We already have a static __raw_xsave_addr(), which returns a pointer to the requested xstate. Maybe we can export __raw_xsave_addr(), if that is needed.

I don’t think that’s what we want in general — we want the whole construct of initializing the state if needed.

2020-09-28 17:00:55

by Yu-cheng Yu

[permalink] [raw]
Subject: Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation

On Fri, 2020-09-25 at 09:51 -0700, Andy Lutomirski wrote:
> > On Sep 25, 2020, at 9:48 AM, Yu, Yu-cheng <[email protected]> wrote:
> >
> > On 9/25/2020 9:31 AM, Andy Lutomirski wrote:
> > > > On Fri, Sep 25, 2020 at 7:58 AM Yu-cheng Yu <[email protected]> wrote:
> > > >
> >
> > [...]
> >
> > > > @@ -286,6 +289,37 @@ bool emulate_vsyscall(unsigned long error_code,
> > > > /* Emulate a ret instruction. */
> > > > regs->ip = caller;
> > > > regs->sp += 8;
> > > > +
> > > > +#ifdef CONFIG_X86_CET
> > > > + if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
> > > > + struct cet_user_state *cet;
> > > > + struct fpu *fpu;
> > > > +
> > > > + fpu = &tsk->thread.fpu;
> > > > + fpregs_lock();
> > > > +
> > > > + if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
> > > > + copy_fpregs_to_fpstate(fpu);
> > > > + set_thread_flag(TIF_NEED_FPU_LOAD);
> > > > + }
> > > > +
> > > > + cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> > > > + if (!cet) {
> > > > + fpregs_unlock();
> > > > + goto sigsegv;
> > > I *think* your patchset tries to keep cet.shstk_size and
> > > cet.ibt_enabled in sync with the MSR, in which case it should be
> > > impossible to get here, but a comment and a warning would be much
> > > better than a random sigsegv.
> >
> > Yes, it should be impossible to get here. I will add a comment and a warning, but still do sigsegv. Should this happen, and the function return, the app gets a control-protection fault. Why not let it fail early?
>
> I’m okay with either approach as long as we get a comment and warning.
>

Here is the updated patch. I can also re-send the whole series as v14. Thanks!

======

From 09803e66dca38d7784e32687d0693550948199ed Mon Sep 17 00:00:00 2001
From: Yu-cheng Yu <[email protected]>
Date: Thu, 29 Nov 2018 14:15:38 -0800
Subject: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch
Tracking for vsyscall emulation

Vsyscall entry points are effectively branch targets. Mark them with
ENDBR64 opcodes. When emulating the RET instruction, unwind shadow stack
and reset IBT state machine.

Signed-off-by: Yu-cheng Yu <[email protected]>
---
v13:
- Check shadow stack address is canonical.
- Change from writing to MSRs to
writing to CET xstate.

arch/x86/entry/vsyscall/vsyscall_64.c | 34 +++++++++++++++++++++++
arch/x86/entry/vsyscall/vsyscall_emu_64.S | 9 ++++++
arch/x86/entry/vsyscall/vsyscall_trace.h | 1 +
3 files changed, 44 insertions(+)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
b/arch/x86/entry/vsyscall/vsyscall_64.c
index 44c33103a955..30b166091d46 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -38,6 +38,9 @@
#include <asm/fixmap.h>
#include <asm/traps.h>
#include <asm/paravirt.h>
+#include <asm/fpu/xstate.h>
+#include <asm/fpu/types.h>
+#include <asm/fpu/internal.h>

#define CREATE_TRACE_POINTS
#include "vsyscall_trace.h"
@@ -286,6 +289,42 @@ bool emulate_vsyscall(unsigned long error_code,
/* Emulate a ret instruction. */
regs->ip = caller;
regs->sp += 8;
+
+#ifdef CONFIG_X86_CET
+ if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
+ struct cet_user_state *cet;
+ struct fpu *fpu;
+
+ fpu = &tsk->thread.fpu;
+ fpregs_lock();
+
+ if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
+ copy_fpregs_to_fpstate(fpu);
+ set_thread_flag(TIF_NEED_FPU_LOAD);
+ }
+
+ cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
+ if (!cet) {
+ /*
+ * This is an unlikely case where the task is
+ * CET-enabled, but CET xstate is in INIT.
+ */
+ WARN_ONCE(1, "CET is enabled, but no xstates");
+ fpregs_unlock();
+ goto sigsegv;
+ }
+
+ if (cet->user_ssp && ((cet->user_ssp + 8) < TASK_SIZE_MAX))
+ cet->user_ssp += 8;
+
+ if (cet->user_cet & CET_ENDBR_EN)
+ cet->user_cet &= ~CET_WAIT_ENDBR;
+
+ __fpu_invalidate_fpregs_state(fpu);
+ fpregs_unlock();
+ }
+#endif
+
return true;

sigsegv:
diff --git a/arch/x86/entry/vsyscall/vsyscall_emu_64.S
b/arch/x86/entry/vsyscall/vsyscall_emu_64.S
index 2e203f3a25a7..b2fa92104cdb 100644
--- a/arch/x86/entry/vsyscall/vsyscall_emu_64.S
+++ b/arch/x86/entry/vsyscall/vsyscall_emu_64.S
@@ -17,16 +17,25 @@ __PAGE_ALIGNED_DATA
.type __vsyscall_page, @object
__vsyscall_page:

+#ifdef CONFIG_X86_BRANCH_TRACKING_USER
+ endbr64
+#endif
mov $__NR_gettimeofday, %rax
syscall
ret

.balign 1024, 0xcc
+#ifdef CONFIG_X86_BRANCH_TRACKING_USER
+ endbr64
+#endif
mov $__NR_time, %rax
syscall
ret

.balign 1024, 0xcc
+#ifdef CONFIG_X86_BRANCH_TRACKING_USER
+ endbr64
+#endif
mov $__NR_getcpu, %rax
syscall
ret
diff --git a/arch/x86/entry/vsyscall/vsyscall_trace.h
b/arch/x86/entry/vsyscall/vsyscall_trace.h
index 3c3f9765a85c..7aa2101ada44 100644
--- a/arch/x86/entry/vsyscall/vsyscall_trace.h
+++ b/arch/x86/entry/vsyscall/vsyscall_trace.h
@@ -25,6 +25,7 @@ TRACE_EVENT(emulate_vsyscall,
#endif

#undef TRACE_INCLUDE_PATH
+#undef TRACE_INCLUDE_FILE
#define TRACE_INCLUDE_PATH ../../arch/x86/entry/vsyscall/
#define TRACE_INCLUDE_FILE vsyscall_trace
#include <trace/define_trace.h>
--
2.21.0



2020-09-28 17:39:20

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation

On Mon, Sep 28, 2020 at 9:59 AM Yu-cheng Yu <[email protected]> wrote:
>
> On Fri, 2020-09-25 at 09:51 -0700, Andy Lutomirski wrote:
> > > On Sep 25, 2020, at 9:48 AM, Yu, Yu-cheng <[email protected]> wrote:
> +
> + cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> + if (!cet) {
> + /*
> + * This is an unlikely case where the task is
> + * CET-enabled, but CET xstate is in INIT.
> + */
> + WARN_ONCE(1, "CET is enabled, but no xstates");

"unlikely" doesn't really cover this.

> + fpregs_unlock();
> + goto sigsegv;
> + }
> +
> + if (cet->user_ssp && ((cet->user_ssp + 8) < TASK_SIZE_MAX))
> + cet->user_ssp += 8;

This looks buggy. The condition should be "if SHSTK is on, then add 8
to user_ssp". If the result is noncanonical, then some appropriate
exception should be generated, probably by the FPU restore code -- see
below. You should be checking the SHSTK_EN bit, not SSP.

Also, can you point me to where any of these canonicality rules are
documented in the SDM? I looked and I can't find them.


This reminds me: this code in extable.c needs to change.

__visible bool ex_handler_fprestore(const struct exception_table_entry *fixup,
struct pt_regs *regs, int trapnr,
unsigned long error_code,
unsigned long fault_addr)
{
regs->ip = ex_fixup_addr(fixup);

WARN_ONCE(1, "Bad FPU state detected at %pB, reinitializing
FPU registers.",
(void *)instruction_pointer(regs));

__copy_kernel_to_fpregs(&init_fpstate, -1);

Now that we have supervisor states like CET, this is buggy. This
should do something intelligent like initializing all the *user* state
and trying again. If that succeeds, a signal should be sent rather
than just corrupting the task. And if it fails, then perhaps some
actual intelligence is needed. We certainly should not just disable
CET because something is wrong with the CET MSRs.

2020-09-28 19:06:22

by Yu-cheng Yu

[permalink] [raw]
Subject: Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation

On 9/28/2020 10:37 AM, Andy Lutomirski wrote:
> On Mon, Sep 28, 2020 at 9:59 AM Yu-cheng Yu <[email protected]> wrote:
>>
>> On Fri, 2020-09-25 at 09:51 -0700, Andy Lutomirski wrote:
>>>> On Sep 25, 2020, at 9:48 AM, Yu, Yu-cheng <[email protected]> wrote:
>> +
>> + cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
>> + if (!cet) {
>> + /*
>> + * This is an unlikely case where the task is
>> + * CET-enabled, but CET xstate is in INIT.
>> + */
>> + WARN_ONCE(1, "CET is enabled, but no xstates");
>
> "unlikely" doesn't really cover this.
>
>> + fpregs_unlock();
>> + goto sigsegv;
>> + }
>> +
>> + if (cet->user_ssp && ((cet->user_ssp + 8) < TASK_SIZE_MAX))
>> + cet->user_ssp += 8;
>
> This looks buggy. The condition should be "if SHSTK is on, then add 8
> to user_ssp". If the result is noncanonical, then some appropriate
> exception should be generated, probably by the FPU restore code -- see
> below. You should be checking the SHSTK_EN bit, not SSP.

The code now checks if shadow stack is on (yes, it should check SHSTK_EN
bit, I will fix it.), then adds 8 to user_ssp. If the result is
canonical, then it sets the corresponding xstate.

If the resulting address is not canonical, the kernel does not know what
the address should be either. I think the best action to take is doing
nothing about the shadow stack pointer, and let the application return
and get a control protection fault. The application should have not got
into such situation in the first place; if it does, it should fault.

>
> Also, can you point me to where any of these canonicality rules are
> documented in the SDM? I looked and I can't find them.

The SDM is not very explicit. It should have been.

>
> This reminds me: this code in extable.c needs to change.
>
> __visible bool ex_handler_fprestore(const struct exception_table_entry *fixup,
> struct pt_regs *regs, int trapnr,
> unsigned long error_code,
> unsigned long fault_addr)
> {
> regs->ip = ex_fixup_addr(fixup);
>
> WARN_ONCE(1, "Bad FPU state detected at %pB, reinitializing
> FPU registers.",
> (void *)instruction_pointer(regs));
>
> __copy_kernel_to_fpregs(&init_fpstate, -1);
>
> Now that we have supervisor states like CET, this is buggy. This
> should do something intelligent like initializing all the *user* state
> and trying again. If that succeeds, a signal should be sent rather
> than just corrupting the task. And if it fails, then perhaps some
> actual intelligence is needed. We certainly should not just disable
> CET because something is wrong with the CET MSRs.
>

Yes, but it needs more thought. Maybe a separate patch and more discussion?

Yu-cheng

2020-09-29 18:41:16

by Yu-cheng Yu

[permalink] [raw]
Subject: Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation

On 9/28/2020 10:37 AM, Andy Lutomirski wrote:
> On Mon, Sep 28, 2020 at 9:59 AM Yu-cheng Yu <[email protected]> wrote:
>>
>> On Fri, 2020-09-25 at 09:51 -0700, Andy Lutomirski wrote:
>>>> On Sep 25, 2020, at 9:48 AM, Yu, Yu-cheng <[email protected]> wrote:
>> +
>> + cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
>> + if (!cet) {
>> + /*
>> + * This is an unlikely case where the task is
>> + * CET-enabled, but CET xstate is in INIT.
>> + */
>> + WARN_ONCE(1, "CET is enabled, but no xstates");
>
> "unlikely" doesn't really cover this.
>
>> + fpregs_unlock();
>> + goto sigsegv;
>> + }
>> +
>> + if (cet->user_ssp && ((cet->user_ssp + 8) < TASK_SIZE_MAX))
>> + cet->user_ssp += 8;
>
> This looks buggy. The condition should be "if SHSTK is on, then add 8
> to user_ssp". If the result is noncanonical, then some appropriate
> exception should be generated, probably by the FPU restore code -- see
> below. You should be checking the SHSTK_EN bit, not SSP.

Updated. Is this OK? I will resend the whole series later.

Thanks,
Yu-cheng

======

From 09803e66dca38d7784e32687d0693550948199ed Mon Sep 17 00:00:00 2001
From: Yu-cheng Yu <[email protected]>
Date: Thu, 29 Nov 2018 14:15:38 -0800
Subject: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and
Indirect Branch
Tracking for vsyscall emulation

Vsyscall entry points are effectively branch targets. Mark them with
ENDBR64 opcodes. When emulating the RET instruction, unwind shadow stack
and reset IBT state machine.

Signed-off-by: Yu-cheng Yu <[email protected]>
---
v13:
- Check shadow stack address is canonical.
- Change from writing to MSRs to writing to CET xstate.

arch/x86/entry/vsyscall/vsyscall_64.c | 34 +++++++++++++++++++++++
arch/x86/entry/vsyscall/vsyscall_emu_64.S | 9 ++++++
arch/x86/entry/vsyscall/vsyscall_trace.h | 1 +
3 files changed, 44 insertions(+)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
b/arch/x86/entry/vsyscall/vsyscall_64.c
index 44c33103a955..30b166091d46 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -38,6 +38,9 @@
#include <asm/fixmap.h>
#include <asm/traps.h>
#include <asm/paravirt.h>
+#include <asm/fpu/xstate.h>
+#include <asm/fpu/types.h>
+#include <asm/fpu/internal.h>

#define CREATE_TRACE_POINTS
#include "vsyscall_trace.h"
@@ -286,6 +289,44 @@ bool emulate_vsyscall(unsigned long error_code,
/* Emulate a ret instruction. */
regs->ip = caller;
regs->sp += 8;
+
+#ifdef CONFIG_X86_CET
+ if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
+ struct cet_user_state *cet;
+ struct fpu *fpu;
+
+ fpu = &tsk->thread.fpu;
+ fpregs_lock();
+
+ if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
+ copy_fpregs_to_fpstate(fpu);
+ set_thread_flag(TIF_NEED_FPU_LOAD);
+ }
+
+ cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
+ if (!cet) {
+ /*
+ * This should not happen. The task is
+ * CET-enabled, but CET xstate is in INIT.
+ */
+ WARN_ONCE(1, "CET is enabled, but no xstates");
+ fpregs_unlock();
+ goto sigsegv;
+ }
+
+ if (cet->user_cet & CET_SHSTK_EN) {
+ if (cet->user_ssp && (cet->user_ssp + 8 < TASK_SIZE_MAX))
+ cet->user_ssp += 8;
+ }
+
+ if (cet->user_cet & CET_ENDBR_EN)
+ cet->user_cet &= ~CET_WAIT_ENDBR;
+
+ __fpu_invalidate_fpregs_state(fpu);
+ fpregs_unlock();
+ }
+#endif
+
return true;

sigsegv:
diff --git a/arch/x86/entry/vsyscall/vsyscall_emu_64.S
b/arch/x86/entry/vsyscall/vsyscall_emu_64.S
index 2e203f3a25a7..b2fa92104cdb 100644
--- a/arch/x86/entry/vsyscall/vsyscall_emu_64.S
+++ b/arch/x86/entry/vsyscall/vsyscall_emu_64.S
@@ -17,16 +17,25 @@ __PAGE_ALIGNED_DATA
.type __vsyscall_page, @object
__vsyscall_page:

+#ifdef CONFIG_X86_BRANCH_TRACKING_USER
+ endbr64
+#endif
mov $__NR_gettimeofday, %rax
syscall
ret

.balign 1024, 0xcc
+#ifdef CONFIG_X86_BRANCH_TRACKING_USER
+ endbr64
+#endif
mov $__NR_time, %rax
syscall
ret

.balign 1024, 0xcc
+#ifdef CONFIG_X86_BRANCH_TRACKING_USER
+ endbr64
+#endif
mov $__NR_getcpu, %rax
syscall
ret
diff --git a/arch/x86/entry/vsyscall/vsyscall_trace.h
b/arch/x86/entry/vsyscall/vsyscall_trace.h
index 3c3f9765a85c..7aa2101ada44 100644
--- a/arch/x86/entry/vsyscall/vsyscall_trace.h
+++ b/arch/x86/entry/vsyscall/vsyscall_trace.h
@@ -25,6 +25,7 @@ TRACE_EVENT(emulate_vsyscall,
#endif

#undef TRACE_INCLUDE_PATH
+#undef TRACE_INCLUDE_FILE
#define TRACE_INCLUDE_PATH ../../arch/x86/entry/vsyscall/
#define TRACE_INCLUDE_FILE vsyscall_trace
#include <trace/define_trace.h>
--
2.21.0

2020-09-29 20:01:36

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation

On Tue, Sep 29, 2020 at 11:37 AM Yu, Yu-cheng <[email protected]> wrote:
>
> On 9/28/2020 10:37 AM, Andy Lutomirski wrote:
> > On Mon, Sep 28, 2020 at 9:59 AM Yu-cheng Yu <[email protected]> wrote:
> >>
> >> On Fri, 2020-09-25 at 09:51 -0700, Andy Lutomirski wrote:
> >>>> On Sep 25, 2020, at 9:48 AM, Yu, Yu-cheng <[email protected]> wrote:
> >> +
> >> + cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> >> + if (!cet) {
> >> + /*
> >> + * This is an unlikely case where the task is
> >> + * CET-enabled, but CET xstate is in INIT.
> >> + */
> >> + WARN_ONCE(1, "CET is enabled, but no xstates");
> >
> > "unlikely" doesn't really cover this.
> >
> >> + fpregs_unlock();
> >> + goto sigsegv;
> >> + }
> >> +
> >> + if (cet->user_ssp && ((cet->user_ssp + 8) < TASK_SIZE_MAX))
> >> + cet->user_ssp += 8;
> >
> > This looks buggy. The condition should be "if SHSTK is on, then add 8
> > to user_ssp". If the result is noncanonical, then some appropriate
> > exception should be generated, probably by the FPU restore code -- see
> > below. You should be checking the SHSTK_EN bit, not SSP.
>
> Updated. Is this OK? I will resend the whole series later.
>
> Thanks,
> Yu-cheng
>
> ======
>
> From 09803e66dca38d7784e32687d0693550948199ed Mon Sep 17 00:00:00 2001
> From: Yu-cheng Yu <[email protected]>
> Date: Thu, 29 Nov 2018 14:15:38 -0800
> Subject: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and
> Indirect Branch
> Tracking for vsyscall emulation
>
> Vsyscall entry points are effectively branch targets. Mark them with
> ENDBR64 opcodes. When emulating the RET instruction, unwind shadow stack
> and reset IBT state machine.
>
> Signed-off-by: Yu-cheng Yu <[email protected]>
> ---
> v13:
> - Check shadow stack address is canonical.
> - Change from writing to MSRs to writing to CET xstate.
>
> arch/x86/entry/vsyscall/vsyscall_64.c | 34 +++++++++++++++++++++++
> arch/x86/entry/vsyscall/vsyscall_emu_64.S | 9 ++++++
> arch/x86/entry/vsyscall/vsyscall_trace.h | 1 +
> 3 files changed, 44 insertions(+)
>
> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
> b/arch/x86/entry/vsyscall/vsyscall_64.c
> index 44c33103a955..30b166091d46 100644
> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
> @@ -38,6 +38,9 @@
> #include <asm/fixmap.h>
> #include <asm/traps.h>
> #include <asm/paravirt.h>
> +#include <asm/fpu/xstate.h>
> +#include <asm/fpu/types.h>
> +#include <asm/fpu/internal.h>
>
> #define CREATE_TRACE_POINTS
> #include "vsyscall_trace.h"
> @@ -286,6 +289,44 @@ bool emulate_vsyscall(unsigned long error_code,
> /* Emulate a ret instruction. */
> regs->ip = caller;
> regs->sp += 8;
> +
> +#ifdef CONFIG_X86_CET
> + if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
> + struct cet_user_state *cet;
> + struct fpu *fpu;
> +
> + fpu = &tsk->thread.fpu;
> + fpregs_lock();
> +
> + if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
> + copy_fpregs_to_fpstate(fpu);
> + set_thread_flag(TIF_NEED_FPU_LOAD);
> + }
> +
> + cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> + if (!cet) {
> + /*
> + * This should not happen. The task is
> + * CET-enabled, but CET xstate is in INIT.
> + */

Can the comment explain better, please? I would say something like:

If the kernel thinks this task has CET enabled (because
tsk->thread.cet has one of the features enabled), then the
corresponding bits must also be set in the CET XSAVES region. If the
CET XSAVES region is in the INIT state, then the kernel's concept of
the task's CET state is corrupt.

> + WARN_ONCE(1, "CET is enabled, but no xstates");
> + fpregs_unlock();
> + goto sigsegv;
> + }
> +
> + if (cet->user_cet & CET_SHSTK_EN) {
> + if (cet->user_ssp && (cet->user_ssp + 8 < TASK_SIZE_MAX))
> + cet->user_ssp += 8;
> + }

This makes so sense to me. Also, the vsyscall emulation code is
intended to be as rigid as possible to minimize the chance that it
gets used as an exploit gadget. So we should not silently corrupt
anything. Moreover, this code seems quite dangerous -- you've created
a gadget that does RET without actually verifying the SHSTK token. If
SHSTK and some form of strong indirect branch/call CFI is in use, then
the existance of a CFI-bypassing return primitive at a fixed address
seems quite problematic.

So I think you need to write a function that reasonably accurately
emulates a usermode RET.

--Andy

2020-09-29 20:03:50

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation

On Tue, Sep 29, 2020 at 12:57 PM Andy Lutomirski <[email protected]> wrote:
>
> On Tue, Sep 29, 2020 at 11:37 AM Yu, Yu-cheng <[email protected]> wrote:
> >
> > On 9/28/2020 10:37 AM, Andy Lutomirski wrote:
> > > On Mon, Sep 28, 2020 at 9:59 AM Yu-cheng Yu <[email protected]> wrote:
> > >>
> > >> On Fri, 2020-09-25 at 09:51 -0700, Andy Lutomirski wrote:
> > >>>> On Sep 25, 2020, at 9:48 AM, Yu, Yu-cheng <[email protected]> wrote:
> > >> +
> > >> + cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> > >> + if (!cet) {
> > >> + /*
> > >> + * This is an unlikely case where the task is
> > >> + * CET-enabled, but CET xstate is in INIT.
> > >> + */
> > >> + WARN_ONCE(1, "CET is enabled, but no xstates");
> > >
> > > "unlikely" doesn't really cover this.
> > >
> > >> + fpregs_unlock();
> > >> + goto sigsegv;
> > >> + }
> > >> +
> > >> + if (cet->user_ssp && ((cet->user_ssp + 8) < TASK_SIZE_MAX))
> > >> + cet->user_ssp += 8;
> > >
> > > This looks buggy. The condition should be "if SHSTK is on, then add 8
> > > to user_ssp". If the result is noncanonical, then some appropriate
> > > exception should be generated, probably by the FPU restore code -- see
> > > below. You should be checking the SHSTK_EN bit, not SSP.
> >
> > Updated. Is this OK? I will resend the whole series later.
> >
> > Thanks,
> > Yu-cheng
> >
> > ======
> >
> > From 09803e66dca38d7784e32687d0693550948199ed Mon Sep 17 00:00:00 2001
> > From: Yu-cheng Yu <[email protected]>
> > Date: Thu, 29 Nov 2018 14:15:38 -0800
> > Subject: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and
> > Indirect Branch
> > Tracking for vsyscall emulation
> >
> > Vsyscall entry points are effectively branch targets. Mark them with
> > ENDBR64 opcodes. When emulating the RET instruction, unwind shadow stack
> > and reset IBT state machine.
> >
> > Signed-off-by: Yu-cheng Yu <[email protected]>
> > ---
> > v13:
> > - Check shadow stack address is canonical.
> > - Change from writing to MSRs to writing to CET xstate.
> >
> > arch/x86/entry/vsyscall/vsyscall_64.c | 34 +++++++++++++++++++++++
> > arch/x86/entry/vsyscall/vsyscall_emu_64.S | 9 ++++++
> > arch/x86/entry/vsyscall/vsyscall_trace.h | 1 +
> > 3 files changed, 44 insertions(+)
> >
> > diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
> > b/arch/x86/entry/vsyscall/vsyscall_64.c
> > index 44c33103a955..30b166091d46 100644
> > --- a/arch/x86/entry/vsyscall/vsyscall_64.c
> > +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
> > @@ -38,6 +38,9 @@
> > #include <asm/fixmap.h>
> > #include <asm/traps.h>
> > #include <asm/paravirt.h>
> > +#include <asm/fpu/xstate.h>
> > +#include <asm/fpu/types.h>
> > +#include <asm/fpu/internal.h>
> >
> > #define CREATE_TRACE_POINTS
> > #include "vsyscall_trace.h"
> > @@ -286,6 +289,44 @@ bool emulate_vsyscall(unsigned long error_code,
> > /* Emulate a ret instruction. */
> > regs->ip = caller;
> > regs->sp += 8;
> > +
> > +#ifdef CONFIG_X86_CET
> > + if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
> > + struct cet_user_state *cet;
> > + struct fpu *fpu;
> > +
> > + fpu = &tsk->thread.fpu;
> > + fpregs_lock();
> > +
> > + if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
> > + copy_fpregs_to_fpstate(fpu);
> > + set_thread_flag(TIF_NEED_FPU_LOAD);
> > + }
> > +
> > + cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> > + if (!cet) {
> > + /*
> > + * This should not happen. The task is
> > + * CET-enabled, but CET xstate is in INIT.
> > + */
>
> Can the comment explain better, please? I would say something like:
>
> If the kernel thinks this task has CET enabled (because
> tsk->thread.cet has one of the features enabled), then the
> corresponding bits must also be set in the CET XSAVES region. If the
> CET XSAVES region is in the INIT state, then the kernel's concept of
> the task's CET state is corrupt.
>
> > + WARN_ONCE(1, "CET is enabled, but no xstates");
> > + fpregs_unlock();
> > + goto sigsegv;
> > + }
> > +
> > + if (cet->user_cet & CET_SHSTK_EN) {
> > + if (cet->user_ssp && (cet->user_ssp + 8 < TASK_SIZE_MAX))
> > + cet->user_ssp += 8;
> > + }
>
> This makes so sense to me. Also, the vsyscall emulation code is
> intended to be as rigid as possible to minimize the chance that it
> gets used as an exploit gadget. So we should not silently corrupt
> anything. Moreover, this code seems quite dangerous -- you've created
> a gadget that does RET without actually verifying the SHSTK token. If
> SHSTK and some form of strong indirect branch/call CFI is in use, then
> the existance of a CFI-bypassing return primitive at a fixed address
> seems quite problematic.
>
> So I think you need to write a function that reasonably accurately
> emulates a usermode RET.
>

For what it's worth, I think there is an alternative. If you all
(userspace people, etc) can come up with a credible way for a user
program to statically declare that it doesn't need vsyscalls, then we
could make SHSTK depend on *that*, and we could avoid this mess. This
breaks orthogonality, but it's probably a decent outcome.

2020-10-01 00:25:55

by Yu-cheng Yu

[permalink] [raw]
Subject: Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation

On 9/29/2020 1:00 PM, Andy Lutomirski wrote:
> On Tue, Sep 29, 2020 at 12:57 PM Andy Lutomirski <[email protected]> wrote:
>>
>> On Tue, Sep 29, 2020 at 11:37 AM Yu, Yu-cheng <[email protected]> wrote:
>>>
>>> On 9/28/2020 10:37 AM, Andy Lutomirski wrote:
>>>> On Mon, Sep 28, 2020 at 9:59 AM Yu-cheng Yu <[email protected]> wrote:
>>>>>
>>>>> On Fri, 2020-09-25 at 09:51 -0700, Andy Lutomirski wrote:
>>>>>>> On Sep 25, 2020, at 9:48 AM, Yu, Yu-cheng <[email protected]> wrote:
>>>>> +
>>>>> + cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
>>>>> + if (!cet) {
>>>>> + /*
>>>>> + * This is an unlikely case where the task is
>>>>> + * CET-enabled, but CET xstate is in INIT.
>>>>> + */
>>>>> + WARN_ONCE(1, "CET is enabled, but no xstates");
>>>>
>>>> "unlikely" doesn't really cover this.
>>>>
>>>>> + fpregs_unlock();
>>>>> + goto sigsegv;
>>>>> + }
>>>>> +
>>>>> + if (cet->user_ssp && ((cet->user_ssp + 8) < TASK_SIZE_MAX))
>>>>> + cet->user_ssp += 8;
>>>>
>>>> This looks buggy. The condition should be "if SHSTK is on, then add 8
>>>> to user_ssp". If the result is noncanonical, then some appropriate
>>>> exception should be generated, probably by the FPU restore code -- see
>>>> below. You should be checking the SHSTK_EN bit, not SSP.
>>>
>>> Updated. Is this OK? I will resend the whole series later.
>>>
>>> Thanks,
>>> Yu-cheng
>>>
>>> ======
>>>
>>> From 09803e66dca38d7784e32687d0693550948199ed Mon Sep 17 00:00:00 2001
>>> From: Yu-cheng Yu <[email protected]>
>>> Date: Thu, 29 Nov 2018 14:15:38 -0800
>>> Subject: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and
>>> Indirect Branch
>>> Tracking for vsyscall emulation
>>>
>>> Vsyscall entry points are effectively branch targets. Mark them with
>>> ENDBR64 opcodes. When emulating the RET instruction, unwind shadow stack
>>> and reset IBT state machine.
>>>
>>> Signed-off-by: Yu-cheng Yu <[email protected]>
>>> ---
>>> v13:
>>> - Check shadow stack address is canonical.
>>> - Change from writing to MSRs to writing to CET xstate.
>>>
>>> arch/x86/entry/vsyscall/vsyscall_64.c | 34 +++++++++++++++++++++++
>>> arch/x86/entry/vsyscall/vsyscall_emu_64.S | 9 ++++++
>>> arch/x86/entry/vsyscall/vsyscall_trace.h | 1 +
>>> 3 files changed, 44 insertions(+)
>>>
>>> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
>>> b/arch/x86/entry/vsyscall/vsyscall_64.c
>>> index 44c33103a955..30b166091d46 100644
>>> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
>>> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
>>> @@ -38,6 +38,9 @@
>>> #include <asm/fixmap.h>
>>> #include <asm/traps.h>
>>> #include <asm/paravirt.h>
>>> +#include <asm/fpu/xstate.h>
>>> +#include <asm/fpu/types.h>
>>> +#include <asm/fpu/internal.h>
>>>
>>> #define CREATE_TRACE_POINTS
>>> #include "vsyscall_trace.h"
>>> @@ -286,6 +289,44 @@ bool emulate_vsyscall(unsigned long error_code,
>>> /* Emulate a ret instruction. */
>>> regs->ip = caller;
>>> regs->sp += 8;
>>> +
>>> +#ifdef CONFIG_X86_CET
>>> + if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
>>> + struct cet_user_state *cet;
>>> + struct fpu *fpu;
>>> +
>>> + fpu = &tsk->thread.fpu;
>>> + fpregs_lock();
>>> +
>>> + if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
>>> + copy_fpregs_to_fpstate(fpu);
>>> + set_thread_flag(TIF_NEED_FPU_LOAD);
>>> + }
>>> +
>>> + cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
>>> + if (!cet) {
>>> + /*
>>> + * This should not happen. The task is
>>> + * CET-enabled, but CET xstate is in INIT.
>>> + */
>>
>> Can the comment explain better, please? I would say something like:
>>
>> If the kernel thinks this task has CET enabled (because
>> tsk->thread.cet has one of the features enabled), then the
>> corresponding bits must also be set in the CET XSAVES region. If the
>> CET XSAVES region is in the INIT state, then the kernel's concept of
>> the task's CET state is corrupt.
>>
>>> + WARN_ONCE(1, "CET is enabled, but no xstates");
>>> + fpregs_unlock();
>>> + goto sigsegv;
>>> + }
>>> +
>>> + if (cet->user_cet & CET_SHSTK_EN) {
>>> + if (cet->user_ssp && (cet->user_ssp + 8 < TASK_SIZE_MAX))
>>> + cet->user_ssp += 8;
>>> + }
>>
>> This makes so sense to me. Also, the vsyscall emulation code is
>> intended to be as rigid as possible to minimize the chance that it
>> gets used as an exploit gadget. So we should not silently corrupt
>> anything. Moreover, this code seems quite dangerous -- you've created
>> a gadget that does RET without actually verifying the SHSTK token. If
>> SHSTK and some form of strong indirect branch/call CFI is in use, then
>> the existance of a CFI-bypassing return primitive at a fixed address
>> seems quite problematic.
>>
>> So I think you need to write a function that reasonably accurately
>> emulates a usermode RET.
>>
>
> For what it's worth, I think there is an alternative. If you all
> (userspace people, etc) can come up with a credible way for a user
> program to statically declare that it doesn't need vsyscalls, then we
> could make SHSTK depend on *that*, and we could avoid this mess. This
> breaks orthogonality, but it's probably a decent outcome.
>

Would an arch_prctl(DISABLE_VSYSCALL) work? The kernel then sets a
thread flag, and in emulate_vsyscall(), checks the flag.

When CET is enabled, ld-linux will do DISABLE_VSYSCALL.

How is that?

Yu-cheng

2020-10-01 01:11:28

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation

On Wed, Sep 30, 2020 at 3:33 PM Yu, Yu-cheng <[email protected]> wrote:
>
> On 9/29/2020 1:00 PM, Andy Lutomirski wrote:
> > On Tue, Sep 29, 2020 at 12:57 PM Andy Lutomirski <[email protected]> wrote:
> >>
> >> On Tue, Sep 29, 2020 at 11:37 AM Yu, Yu-cheng <[email protected]> wrote:
> >>>
> >>> On 9/28/2020 10:37 AM, Andy Lutomirski wrote:
> >>>> On Mon, Sep 28, 2020 at 9:59 AM Yu-cheng Yu <[email protected]> wrote:
> >>>>>
> >>>>> On Fri, 2020-09-25 at 09:51 -0700, Andy Lutomirski wrote:
> >>>>>>> On Sep 25, 2020, at 9:48 AM, Yu, Yu-cheng <[email protected]> wrote:
> >>>>> +
> >>>>> + cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> >>>>> + if (!cet) {
> >>>>> + /*
> >>>>> + * This is an unlikely case where the task is
> >>>>> + * CET-enabled, but CET xstate is in INIT.
> >>>>> + */
> >>>>> + WARN_ONCE(1, "CET is enabled, but no xstates");
> >>>>
> >>>> "unlikely" doesn't really cover this.
> >>>>
> >>>>> + fpregs_unlock();
> >>>>> + goto sigsegv;
> >>>>> + }
> >>>>> +
> >>>>> + if (cet->user_ssp && ((cet->user_ssp + 8) < TASK_SIZE_MAX))
> >>>>> + cet->user_ssp += 8;
> >>>>
> >>>> This looks buggy. The condition should be "if SHSTK is on, then add 8
> >>>> to user_ssp". If the result is noncanonical, then some appropriate
> >>>> exception should be generated, probably by the FPU restore code -- see
> >>>> below. You should be checking the SHSTK_EN bit, not SSP.
> >>>
> >>> Updated. Is this OK? I will resend the whole series later.
> >>>
> >>> Thanks,
> >>> Yu-cheng
> >>>
> >>> ======
> >>>
> >>> From 09803e66dca38d7784e32687d0693550948199ed Mon Sep 17 00:00:00 2001
> >>> From: Yu-cheng Yu <[email protected]>
> >>> Date: Thu, 29 Nov 2018 14:15:38 -0800
> >>> Subject: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and
> >>> Indirect Branch
> >>> Tracking for vsyscall emulation
> >>>
> >>> Vsyscall entry points are effectively branch targets. Mark them with
> >>> ENDBR64 opcodes. When emulating the RET instruction, unwind shadow stack
> >>> and reset IBT state machine.
> >>>
> >>> Signed-off-by: Yu-cheng Yu <[email protected]>
> >>> ---
> >>> v13:
> >>> - Check shadow stack address is canonical.
> >>> - Change from writing to MSRs to writing to CET xstate.
> >>>
> >>> arch/x86/entry/vsyscall/vsyscall_64.c | 34 +++++++++++++++++++++++
> >>> arch/x86/entry/vsyscall/vsyscall_emu_64.S | 9 ++++++
> >>> arch/x86/entry/vsyscall/vsyscall_trace.h | 1 +
> >>> 3 files changed, 44 insertions(+)
> >>>
> >>> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
> >>> b/arch/x86/entry/vsyscall/vsyscall_64.c
> >>> index 44c33103a955..30b166091d46 100644
> >>> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
> >>> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
> >>> @@ -38,6 +38,9 @@
> >>> #include <asm/fixmap.h>
> >>> #include <asm/traps.h>
> >>> #include <asm/paravirt.h>
> >>> +#include <asm/fpu/xstate.h>
> >>> +#include <asm/fpu/types.h>
> >>> +#include <asm/fpu/internal.h>
> >>>
> >>> #define CREATE_TRACE_POINTS
> >>> #include "vsyscall_trace.h"
> >>> @@ -286,6 +289,44 @@ bool emulate_vsyscall(unsigned long error_code,
> >>> /* Emulate a ret instruction. */
> >>> regs->ip = caller;
> >>> regs->sp += 8;
> >>> +
> >>> +#ifdef CONFIG_X86_CET
> >>> + if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
> >>> + struct cet_user_state *cet;
> >>> + struct fpu *fpu;
> >>> +
> >>> + fpu = &tsk->thread.fpu;
> >>> + fpregs_lock();
> >>> +
> >>> + if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
> >>> + copy_fpregs_to_fpstate(fpu);
> >>> + set_thread_flag(TIF_NEED_FPU_LOAD);
> >>> + }
> >>> +
> >>> + cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> >>> + if (!cet) {
> >>> + /*
> >>> + * This should not happen. The task is
> >>> + * CET-enabled, but CET xstate is in INIT.
> >>> + */
> >>
> >> Can the comment explain better, please? I would say something like:
> >>
> >> If the kernel thinks this task has CET enabled (because
> >> tsk->thread.cet has one of the features enabled), then the
> >> corresponding bits must also be set in the CET XSAVES region. If the
> >> CET XSAVES region is in the INIT state, then the kernel's concept of
> >> the task's CET state is corrupt.
> >>
> >>> + WARN_ONCE(1, "CET is enabled, but no xstates");
> >>> + fpregs_unlock();
> >>> + goto sigsegv;
> >>> + }
> >>> +
> >>> + if (cet->user_cet & CET_SHSTK_EN) {
> >>> + if (cet->user_ssp && (cet->user_ssp + 8 < TASK_SIZE_MAX))
> >>> + cet->user_ssp += 8;
> >>> + }
> >>
> >> This makes so sense to me. Also, the vsyscall emulation code is
> >> intended to be as rigid as possible to minimize the chance that it
> >> gets used as an exploit gadget. So we should not silently corrupt
> >> anything. Moreover, this code seems quite dangerous -- you've created
> >> a gadget that does RET without actually verifying the SHSTK token. If
> >> SHSTK and some form of strong indirect branch/call CFI is in use, then
> >> the existance of a CFI-bypassing return primitive at a fixed address
> >> seems quite problematic.
> >>
> >> So I think you need to write a function that reasonably accurately
> >> emulates a usermode RET.
> >>
> >
> > For what it's worth, I think there is an alternative. If you all
> > (userspace people, etc) can come up with a credible way for a user
> > program to statically declare that it doesn't need vsyscalls, then we
> > could make SHSTK depend on *that*, and we could avoid this mess. This
> > breaks orthogonality, but it's probably a decent outcome.
> >
>
> Would an arch_prctl(DISABLE_VSYSCALL) work? The kernel then sets a
> thread flag, and in emulate_vsyscall(), checks the flag.
>
> When CET is enabled, ld-linux will do DISABLE_VSYSCALL.
>
> How is that?

Backwards, no? Presumably vsyscall needs to be disabled before or
concurrently with CET being enabled, not after.

I think the solution of making vsyscall emulation work correctly with
CET is going to be better and possibly more straightforward.

>
> Yu-cheng

2020-10-01 01:18:00

by H.J. Lu

[permalink] [raw]
Subject: Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation

On Wed, Sep 30, 2020 at 4:44 PM Andy Lutomirski <[email protected]> wrote:
>
> On Wed, Sep 30, 2020 at 3:33 PM Yu, Yu-cheng <[email protected]> wrote:
> >
> > On 9/29/2020 1:00 PM, Andy Lutomirski wrote:
> > > On Tue, Sep 29, 2020 at 12:57 PM Andy Lutomirski <[email protected]> wrote:
> > >>
> > >> On Tue, Sep 29, 2020 at 11:37 AM Yu, Yu-cheng <[email protected]> wrote:
> > >>>
> > >>> On 9/28/2020 10:37 AM, Andy Lutomirski wrote:
> > >>>> On Mon, Sep 28, 2020 at 9:59 AM Yu-cheng Yu <[email protected]> wrote:
> > >>>>>
> > >>>>> On Fri, 2020-09-25 at 09:51 -0700, Andy Lutomirski wrote:
> > >>>>>>> On Sep 25, 2020, at 9:48 AM, Yu, Yu-cheng <[email protected]> wrote:
> > >>>>> +
> > >>>>> + cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> > >>>>> + if (!cet) {
> > >>>>> + /*
> > >>>>> + * This is an unlikely case where the task is
> > >>>>> + * CET-enabled, but CET xstate is in INIT.
> > >>>>> + */
> > >>>>> + WARN_ONCE(1, "CET is enabled, but no xstates");
> > >>>>
> > >>>> "unlikely" doesn't really cover this.
> > >>>>
> > >>>>> + fpregs_unlock();
> > >>>>> + goto sigsegv;
> > >>>>> + }
> > >>>>> +
> > >>>>> + if (cet->user_ssp && ((cet->user_ssp + 8) < TASK_SIZE_MAX))
> > >>>>> + cet->user_ssp += 8;
> > >>>>
> > >>>> This looks buggy. The condition should be "if SHSTK is on, then add 8
> > >>>> to user_ssp". If the result is noncanonical, then some appropriate
> > >>>> exception should be generated, probably by the FPU restore code -- see
> > >>>> below. You should be checking the SHSTK_EN bit, not SSP.
> > >>>
> > >>> Updated. Is this OK? I will resend the whole series later.
> > >>>
> > >>> Thanks,
> > >>> Yu-cheng
> > >>>
> > >>> ======
> > >>>
> > >>> From 09803e66dca38d7784e32687d0693550948199ed Mon Sep 17 00:00:00 2001
> > >>> From: Yu-cheng Yu <[email protected]>
> > >>> Date: Thu, 29 Nov 2018 14:15:38 -0800
> > >>> Subject: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and
> > >>> Indirect Branch
> > >>> Tracking for vsyscall emulation
> > >>>
> > >>> Vsyscall entry points are effectively branch targets. Mark them with
> > >>> ENDBR64 opcodes. When emulating the RET instruction, unwind shadow stack
> > >>> and reset IBT state machine.
> > >>>
> > >>> Signed-off-by: Yu-cheng Yu <[email protected]>
> > >>> ---
> > >>> v13:
> > >>> - Check shadow stack address is canonical.
> > >>> - Change from writing to MSRs to writing to CET xstate.
> > >>>
> > >>> arch/x86/entry/vsyscall/vsyscall_64.c | 34 +++++++++++++++++++++++
> > >>> arch/x86/entry/vsyscall/vsyscall_emu_64.S | 9 ++++++
> > >>> arch/x86/entry/vsyscall/vsyscall_trace.h | 1 +
> > >>> 3 files changed, 44 insertions(+)
> > >>>
> > >>> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
> > >>> b/arch/x86/entry/vsyscall/vsyscall_64.c
> > >>> index 44c33103a955..30b166091d46 100644
> > >>> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
> > >>> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
> > >>> @@ -38,6 +38,9 @@
> > >>> #include <asm/fixmap.h>
> > >>> #include <asm/traps.h>
> > >>> #include <asm/paravirt.h>
> > >>> +#include <asm/fpu/xstate.h>
> > >>> +#include <asm/fpu/types.h>
> > >>> +#include <asm/fpu/internal.h>
> > >>>
> > >>> #define CREATE_TRACE_POINTS
> > >>> #include "vsyscall_trace.h"
> > >>> @@ -286,6 +289,44 @@ bool emulate_vsyscall(unsigned long error_code,
> > >>> /* Emulate a ret instruction. */
> > >>> regs->ip = caller;
> > >>> regs->sp += 8;
> > >>> +
> > >>> +#ifdef CONFIG_X86_CET
> > >>> + if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
> > >>> + struct cet_user_state *cet;
> > >>> + struct fpu *fpu;
> > >>> +
> > >>> + fpu = &tsk->thread.fpu;
> > >>> + fpregs_lock();
> > >>> +
> > >>> + if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
> > >>> + copy_fpregs_to_fpstate(fpu);
> > >>> + set_thread_flag(TIF_NEED_FPU_LOAD);
> > >>> + }
> > >>> +
> > >>> + cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> > >>> + if (!cet) {
> > >>> + /*
> > >>> + * This should not happen. The task is
> > >>> + * CET-enabled, but CET xstate is in INIT.
> > >>> + */
> > >>
> > >> Can the comment explain better, please? I would say something like:
> > >>
> > >> If the kernel thinks this task has CET enabled (because
> > >> tsk->thread.cet has one of the features enabled), then the
> > >> corresponding bits must also be set in the CET XSAVES region. If the
> > >> CET XSAVES region is in the INIT state, then the kernel's concept of
> > >> the task's CET state is corrupt.
> > >>
> > >>> + WARN_ONCE(1, "CET is enabled, but no xstates");
> > >>> + fpregs_unlock();
> > >>> + goto sigsegv;
> > >>> + }
> > >>> +
> > >>> + if (cet->user_cet & CET_SHSTK_EN) {
> > >>> + if (cet->user_ssp && (cet->user_ssp + 8 < TASK_SIZE_MAX))
> > >>> + cet->user_ssp += 8;
> > >>> + }
> > >>
> > >> This makes so sense to me. Also, the vsyscall emulation code is
> > >> intended to be as rigid as possible to minimize the chance that it
> > >> gets used as an exploit gadget. So we should not silently corrupt
> > >> anything. Moreover, this code seems quite dangerous -- you've created
> > >> a gadget that does RET without actually verifying the SHSTK token. If
> > >> SHSTK and some form of strong indirect branch/call CFI is in use, then
> > >> the existance of a CFI-bypassing return primitive at a fixed address
> > >> seems quite problematic.
> > >>
> > >> So I think you need to write a function that reasonably accurately
> > >> emulates a usermode RET.
> > >>
> > >
> > > For what it's worth, I think there is an alternative. If you all
> > > (userspace people, etc) can come up with a credible way for a user
> > > program to statically declare that it doesn't need vsyscalls, then we
> > > could make SHSTK depend on *that*, and we could avoid this mess. This
> > > breaks orthogonality, but it's probably a decent outcome.
> > >
> >
> > Would an arch_prctl(DISABLE_VSYSCALL) work? The kernel then sets a
> > thread flag, and in emulate_vsyscall(), checks the flag.
> >
> > When CET is enabled, ld-linux will do DISABLE_VSYSCALL.
> >
> > How is that?
>
> Backwards, no? Presumably vsyscall needs to be disabled before or
> concurrently with CET being enabled, not after.
>
> I think the solution of making vsyscall emulation work correctly with
> CET is going to be better and possibly more straightforward.
>

We can do

1. Add ARCH_X86_DISABLE_VSYSCALL to disable the vsyscall page.
2. If CPU supports CET and the program is CET enabled:
a. Disable the vsyscall page.
b. Pass control to user.
c. Enable the vsyscall page when ARCH_X86_CET_DISABLE is called.

So when control is passed from kernel to user, the vsyscall page is
disabled if the program
is CET enabled.

--
H.J.

2020-10-01 01:19:16

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation

On Wed, Sep 30, 2020 at 6:01 PM H.J. Lu <[email protected]> wrote:
>
> On Wed, Sep 30, 2020 at 4:44 PM Andy Lutomirski <[email protected]> wrote:
> >
> > On Wed, Sep 30, 2020 at 3:33 PM Yu, Yu-cheng <[email protected]> wrote:
> > >
> > > On 9/29/2020 1:00 PM, Andy Lutomirski wrote:
> > > > On Tue, Sep 29, 2020 at 12:57 PM Andy Lutomirski <[email protected]> wrote:
> > > >>
> > > >> On Tue, Sep 29, 2020 at 11:37 AM Yu, Yu-cheng <[email protected]> wrote:
> > > >>>
> > > >>> On 9/28/2020 10:37 AM, Andy Lutomirski wrote:
> > > >>>> On Mon, Sep 28, 2020 at 9:59 AM Yu-cheng Yu <[email protected]> wrote:
> > > >>>>>
> > > >>>>> On Fri, 2020-09-25 at 09:51 -0700, Andy Lutomirski wrote:
> > > >>>>>>> On Sep 25, 2020, at 9:48 AM, Yu, Yu-cheng <[email protected]> wrote:
> > > >>>>> +
> > > >>>>> + cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> > > >>>>> + if (!cet) {
> > > >>>>> + /*
> > > >>>>> + * This is an unlikely case where the task is
> > > >>>>> + * CET-enabled, but CET xstate is in INIT.
> > > >>>>> + */
> > > >>>>> + WARN_ONCE(1, "CET is enabled, but no xstates");
> > > >>>>
> > > >>>> "unlikely" doesn't really cover this.
> > > >>>>
> > > >>>>> + fpregs_unlock();
> > > >>>>> + goto sigsegv;
> > > >>>>> + }
> > > >>>>> +
> > > >>>>> + if (cet->user_ssp && ((cet->user_ssp + 8) < TASK_SIZE_MAX))
> > > >>>>> + cet->user_ssp += 8;
> > > >>>>
> > > >>>> This looks buggy. The condition should be "if SHSTK is on, then add 8
> > > >>>> to user_ssp". If the result is noncanonical, then some appropriate
> > > >>>> exception should be generated, probably by the FPU restore code -- see
> > > >>>> below. You should be checking the SHSTK_EN bit, not SSP.
> > > >>>
> > > >>> Updated. Is this OK? I will resend the whole series later.
> > > >>>
> > > >>> Thanks,
> > > >>> Yu-cheng
> > > >>>
> > > >>> ======
> > > >>>
> > > >>> From 09803e66dca38d7784e32687d0693550948199ed Mon Sep 17 00:00:00 2001
> > > >>> From: Yu-cheng Yu <[email protected]>
> > > >>> Date: Thu, 29 Nov 2018 14:15:38 -0800
> > > >>> Subject: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and
> > > >>> Indirect Branch
> > > >>> Tracking for vsyscall emulation
> > > >>>
> > > >>> Vsyscall entry points are effectively branch targets. Mark them with
> > > >>> ENDBR64 opcodes. When emulating the RET instruction, unwind shadow stack
> > > >>> and reset IBT state machine.
> > > >>>
> > > >>> Signed-off-by: Yu-cheng Yu <[email protected]>
> > > >>> ---
> > > >>> v13:
> > > >>> - Check shadow stack address is canonical.
> > > >>> - Change from writing to MSRs to writing to CET xstate.
> > > >>>
> > > >>> arch/x86/entry/vsyscall/vsyscall_64.c | 34 +++++++++++++++++++++++
> > > >>> arch/x86/entry/vsyscall/vsyscall_emu_64.S | 9 ++++++
> > > >>> arch/x86/entry/vsyscall/vsyscall_trace.h | 1 +
> > > >>> 3 files changed, 44 insertions(+)
> > > >>>
> > > >>> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
> > > >>> b/arch/x86/entry/vsyscall/vsyscall_64.c
> > > >>> index 44c33103a955..30b166091d46 100644
> > > >>> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
> > > >>> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
> > > >>> @@ -38,6 +38,9 @@
> > > >>> #include <asm/fixmap.h>
> > > >>> #include <asm/traps.h>
> > > >>> #include <asm/paravirt.h>
> > > >>> +#include <asm/fpu/xstate.h>
> > > >>> +#include <asm/fpu/types.h>
> > > >>> +#include <asm/fpu/internal.h>
> > > >>>
> > > >>> #define CREATE_TRACE_POINTS
> > > >>> #include "vsyscall_trace.h"
> > > >>> @@ -286,6 +289,44 @@ bool emulate_vsyscall(unsigned long error_code,
> > > >>> /* Emulate a ret instruction. */
> > > >>> regs->ip = caller;
> > > >>> regs->sp += 8;
> > > >>> +
> > > >>> +#ifdef CONFIG_X86_CET
> > > >>> + if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
> > > >>> + struct cet_user_state *cet;
> > > >>> + struct fpu *fpu;
> > > >>> +
> > > >>> + fpu = &tsk->thread.fpu;
> > > >>> + fpregs_lock();
> > > >>> +
> > > >>> + if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
> > > >>> + copy_fpregs_to_fpstate(fpu);
> > > >>> + set_thread_flag(TIF_NEED_FPU_LOAD);
> > > >>> + }
> > > >>> +
> > > >>> + cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> > > >>> + if (!cet) {
> > > >>> + /*
> > > >>> + * This should not happen. The task is
> > > >>> + * CET-enabled, but CET xstate is in INIT.
> > > >>> + */
> > > >>
> > > >> Can the comment explain better, please? I would say something like:
> > > >>
> > > >> If the kernel thinks this task has CET enabled (because
> > > >> tsk->thread.cet has one of the features enabled), then the
> > > >> corresponding bits must also be set in the CET XSAVES region. If the
> > > >> CET XSAVES region is in the INIT state, then the kernel's concept of
> > > >> the task's CET state is corrupt.
> > > >>
> > > >>> + WARN_ONCE(1, "CET is enabled, but no xstates");
> > > >>> + fpregs_unlock();
> > > >>> + goto sigsegv;
> > > >>> + }
> > > >>> +
> > > >>> + if (cet->user_cet & CET_SHSTK_EN) {
> > > >>> + if (cet->user_ssp && (cet->user_ssp + 8 < TASK_SIZE_MAX))
> > > >>> + cet->user_ssp += 8;
> > > >>> + }
> > > >>
> > > >> This makes so sense to me. Also, the vsyscall emulation code is
> > > >> intended to be as rigid as possible to minimize the chance that it
> > > >> gets used as an exploit gadget. So we should not silently corrupt
> > > >> anything. Moreover, this code seems quite dangerous -- you've created
> > > >> a gadget that does RET without actually verifying the SHSTK token. If
> > > >> SHSTK and some form of strong indirect branch/call CFI is in use, then
> > > >> the existance of a CFI-bypassing return primitive at a fixed address
> > > >> seems quite problematic.
> > > >>
> > > >> So I think you need to write a function that reasonably accurately
> > > >> emulates a usermode RET.
> > > >>
> > > >
> > > > For what it's worth, I think there is an alternative. If you all
> > > > (userspace people, etc) can come up with a credible way for a user
> > > > program to statically declare that it doesn't need vsyscalls, then we
> > > > could make SHSTK depend on *that*, and we could avoid this mess. This
> > > > breaks orthogonality, but it's probably a decent outcome.
> > > >
> > >
> > > Would an arch_prctl(DISABLE_VSYSCALL) work? The kernel then sets a
> > > thread flag, and in emulate_vsyscall(), checks the flag.
> > >
> > > When CET is enabled, ld-linux will do DISABLE_VSYSCALL.
> > >
> > > How is that?
> >
> > Backwards, no? Presumably vsyscall needs to be disabled before or
> > concurrently with CET being enabled, not after.
> >
> > I think the solution of making vsyscall emulation work correctly with
> > CET is going to be better and possibly more straightforward.
> >
>
> We can do
>
> 1. Add ARCH_X86_DISABLE_VSYSCALL to disable the vsyscall page.
> 2. If CPU supports CET and the program is CET enabled:
> a. Disable the vsyscall page.
> b. Pass control to user.
> c. Enable the vsyscall page when ARCH_X86_CET_DISABLE is called.
>
> So when control is passed from kernel to user, the vsyscall page is
> disabled if the program
> is CET enabled.

Let me say this one more time:

If we have a per-process vsyscall disable control and a per-process
CET control, we are going to keep those settings orthogonal. I'm
willing to entertain an option in which enabling SHSTK without also
disabling vsyscalls is disallowed, We are *not* going to have any CET
flags magically disable vsyscalls, though, and we are not going to
have a situation where disabling vsyscalls on process startup requires
enabling SHSTK.

Any possible static vsyscall controls (and CET controls, for that
matter) also need to come with some explanation of whether they are
properties set on the ELF loader, the ELF program being loaded, or
both. And this explanation needs to cover what happens when old
binaries link against new libc versions and vice versa. A new
CET-enabled binary linked against old libc running on a new kernel
that is expected to work on a non-CET CPU MUST work on a CET CPU, too.

Right now, literally the only thing preventing vsyscall emulation from
coexisting with SHSTK is that the implementation eeds work.

So your proposal is rejected. Sorry.

2020-10-01 01:23:27

by H.J. Lu

[permalink] [raw]
Subject: Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation

On Wed, Sep 30, 2020 at 6:10 PM Andy Lutomirski <[email protected]> wrote:
>
> On Wed, Sep 30, 2020 at 6:01 PM H.J. Lu <[email protected]> wrote:
> >
> > On Wed, Sep 30, 2020 at 4:44 PM Andy Lutomirski <[email protected]> wrote:
> > >
> > > On Wed, Sep 30, 2020 at 3:33 PM Yu, Yu-cheng <[email protected]> wrote:
> > > >
> > > > On 9/29/2020 1:00 PM, Andy Lutomirski wrote:
> > > > > On Tue, Sep 29, 2020 at 12:57 PM Andy Lutomirski <[email protected]> wrote:
> > > > >>
> > > > >> On Tue, Sep 29, 2020 at 11:37 AM Yu, Yu-cheng <[email protected]> wrote:
> > > > >>>
> > > > >>> On 9/28/2020 10:37 AM, Andy Lutomirski wrote:
> > > > >>>> On Mon, Sep 28, 2020 at 9:59 AM Yu-cheng Yu <[email protected]> wrote:
> > > > >>>>>
> > > > >>>>> On Fri, 2020-09-25 at 09:51 -0700, Andy Lutomirski wrote:
> > > > >>>>>>> On Sep 25, 2020, at 9:48 AM, Yu, Yu-cheng <[email protected]> wrote:
> > > > >>>>> +
> > > > >>>>> + cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> > > > >>>>> + if (!cet) {
> > > > >>>>> + /*
> > > > >>>>> + * This is an unlikely case where the task is
> > > > >>>>> + * CET-enabled, but CET xstate is in INIT.
> > > > >>>>> + */
> > > > >>>>> + WARN_ONCE(1, "CET is enabled, but no xstates");
> > > > >>>>
> > > > >>>> "unlikely" doesn't really cover this.
> > > > >>>>
> > > > >>>>> + fpregs_unlock();
> > > > >>>>> + goto sigsegv;
> > > > >>>>> + }
> > > > >>>>> +
> > > > >>>>> + if (cet->user_ssp && ((cet->user_ssp + 8) < TASK_SIZE_MAX))
> > > > >>>>> + cet->user_ssp += 8;
> > > > >>>>
> > > > >>>> This looks buggy. The condition should be "if SHSTK is on, then add 8
> > > > >>>> to user_ssp". If the result is noncanonical, then some appropriate
> > > > >>>> exception should be generated, probably by the FPU restore code -- see
> > > > >>>> below. You should be checking the SHSTK_EN bit, not SSP.
> > > > >>>
> > > > >>> Updated. Is this OK? I will resend the whole series later.
> > > > >>>
> > > > >>> Thanks,
> > > > >>> Yu-cheng
> > > > >>>
> > > > >>> ======
> > > > >>>
> > > > >>> From 09803e66dca38d7784e32687d0693550948199ed Mon Sep 17 00:00:00 2001
> > > > >>> From: Yu-cheng Yu <[email protected]>
> > > > >>> Date: Thu, 29 Nov 2018 14:15:38 -0800
> > > > >>> Subject: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and
> > > > >>> Indirect Branch
> > > > >>> Tracking for vsyscall emulation
> > > > >>>
> > > > >>> Vsyscall entry points are effectively branch targets. Mark them with
> > > > >>> ENDBR64 opcodes. When emulating the RET instruction, unwind shadow stack
> > > > >>> and reset IBT state machine.
> > > > >>>
> > > > >>> Signed-off-by: Yu-cheng Yu <[email protected]>
> > > > >>> ---
> > > > >>> v13:
> > > > >>> - Check shadow stack address is canonical.
> > > > >>> - Change from writing to MSRs to writing to CET xstate.
> > > > >>>
> > > > >>> arch/x86/entry/vsyscall/vsyscall_64.c | 34 +++++++++++++++++++++++
> > > > >>> arch/x86/entry/vsyscall/vsyscall_emu_64.S | 9 ++++++
> > > > >>> arch/x86/entry/vsyscall/vsyscall_trace.h | 1 +
> > > > >>> 3 files changed, 44 insertions(+)
> > > > >>>
> > > > >>> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
> > > > >>> b/arch/x86/entry/vsyscall/vsyscall_64.c
> > > > >>> index 44c33103a955..30b166091d46 100644
> > > > >>> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
> > > > >>> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
> > > > >>> @@ -38,6 +38,9 @@
> > > > >>> #include <asm/fixmap.h>
> > > > >>> #include <asm/traps.h>
> > > > >>> #include <asm/paravirt.h>
> > > > >>> +#include <asm/fpu/xstate.h>
> > > > >>> +#include <asm/fpu/types.h>
> > > > >>> +#include <asm/fpu/internal.h>
> > > > >>>
> > > > >>> #define CREATE_TRACE_POINTS
> > > > >>> #include "vsyscall_trace.h"
> > > > >>> @@ -286,6 +289,44 @@ bool emulate_vsyscall(unsigned long error_code,
> > > > >>> /* Emulate a ret instruction. */
> > > > >>> regs->ip = caller;
> > > > >>> regs->sp += 8;
> > > > >>> +
> > > > >>> +#ifdef CONFIG_X86_CET
> > > > >>> + if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
> > > > >>> + struct cet_user_state *cet;
> > > > >>> + struct fpu *fpu;
> > > > >>> +
> > > > >>> + fpu = &tsk->thread.fpu;
> > > > >>> + fpregs_lock();
> > > > >>> +
> > > > >>> + if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
> > > > >>> + copy_fpregs_to_fpstate(fpu);
> > > > >>> + set_thread_flag(TIF_NEED_FPU_LOAD);
> > > > >>> + }
> > > > >>> +
> > > > >>> + cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> > > > >>> + if (!cet) {
> > > > >>> + /*
> > > > >>> + * This should not happen. The task is
> > > > >>> + * CET-enabled, but CET xstate is in INIT.
> > > > >>> + */
> > > > >>
> > > > >> Can the comment explain better, please? I would say something like:
> > > > >>
> > > > >> If the kernel thinks this task has CET enabled (because
> > > > >> tsk->thread.cet has one of the features enabled), then the
> > > > >> corresponding bits must also be set in the CET XSAVES region. If the
> > > > >> CET XSAVES region is in the INIT state, then the kernel's concept of
> > > > >> the task's CET state is corrupt.
> > > > >>
> > > > >>> + WARN_ONCE(1, "CET is enabled, but no xstates");
> > > > >>> + fpregs_unlock();
> > > > >>> + goto sigsegv;
> > > > >>> + }
> > > > >>> +
> > > > >>> + if (cet->user_cet & CET_SHSTK_EN) {
> > > > >>> + if (cet->user_ssp && (cet->user_ssp + 8 < TASK_SIZE_MAX))
> > > > >>> + cet->user_ssp += 8;
> > > > >>> + }
> > > > >>
> > > > >> This makes so sense to me. Also, the vsyscall emulation code is
> > > > >> intended to be as rigid as possible to minimize the chance that it
> > > > >> gets used as an exploit gadget. So we should not silently corrupt
> > > > >> anything. Moreover, this code seems quite dangerous -- you've created
> > > > >> a gadget that does RET without actually verifying the SHSTK token. If
> > > > >> SHSTK and some form of strong indirect branch/call CFI is in use, then
> > > > >> the existance of a CFI-bypassing return primitive at a fixed address
> > > > >> seems quite problematic.
> > > > >>
> > > > >> So I think you need to write a function that reasonably accurately
> > > > >> emulates a usermode RET.
> > > > >>
> > > > >
> > > > > For what it's worth, I think there is an alternative. If you all
> > > > > (userspace people, etc) can come up with a credible way for a user
> > > > > program to statically declare that it doesn't need vsyscalls, then we
> > > > > could make SHSTK depend on *that*, and we could avoid this mess. This
> > > > > breaks orthogonality, but it's probably a decent outcome.
> > > > >
> > > >
> > > > Would an arch_prctl(DISABLE_VSYSCALL) work? The kernel then sets a
> > > > thread flag, and in emulate_vsyscall(), checks the flag.
> > > >
> > > > When CET is enabled, ld-linux will do DISABLE_VSYSCALL.
> > > >
> > > > How is that?
> > >
> > > Backwards, no? Presumably vsyscall needs to be disabled before or
> > > concurrently with CET being enabled, not after.
> > >
> > > I think the solution of making vsyscall emulation work correctly with
> > > CET is going to be better and possibly more straightforward.
> > >
> >
> > We can do
> >
> > 1. Add ARCH_X86_DISABLE_VSYSCALL to disable the vsyscall page.
> > 2. If CPU supports CET and the program is CET enabled:
> > a. Disable the vsyscall page.
> > b. Pass control to user.
> > c. Enable the vsyscall page when ARCH_X86_CET_DISABLE is called.
> >
> > So when control is passed from kernel to user, the vsyscall page is
> > disabled if the program
> > is CET enabled.
>
> Let me say this one more time:
>
> If we have a per-process vsyscall disable control and a per-process
> CET control, we are going to keep those settings orthogonal. I'm
> willing to entertain an option in which enabling SHSTK without also
> disabling vsyscalls is disallowed, We are *not* going to have any CET
> flags magically disable vsyscalls, though, and we are not going to
> have a situation where disabling vsyscalls on process startup requires
> enabling SHSTK.
>
> Any possible static vsyscall controls (and CET controls, for that
> matter) also need to come with some explanation of whether they are
> properties set on the ELF loader, the ELF program being loaded, or

Kernel enables CET on CET processors only if ld.so is CET enabled.
Kernel passes control to CET enabled ld.so with CET enabled and
ld.so will check if CET should be disabled because of legacy program
or dependency libraries.

> both. And this explanation needs to cover what happens when old
> binaries link against new libc versions and vice versa. A new
> CET-enabled binary linked against old libc running on a new kernel
> that is expected to work on a non-CET CPU MUST work on a CET CPU, too.

Since kernel doesn't enable CET on ld.so from the old libc, the new
CET-enabled binary will start with CET disabled regardless whatever the
processor the binary runs on.

> Right now, literally the only thing preventing vsyscall emulation from
> coexisting with SHSTK is that the implementation eeds work.
>
> So your proposal is rejected. Sorry.



--
H.J.

2020-10-01 16:55:13

by Yu-cheng Yu

[permalink] [raw]
Subject: Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation

On 9/30/2020 6:10 PM, Andy Lutomirski wrote:
> On Wed, Sep 30, 2020 at 6:01 PM H.J. Lu <[email protected]> wrote:
>>
>> On Wed, Sep 30, 2020 at 4:44 PM Andy Lutomirski <[email protected]> wrote:

[...]

>>>>>>> From 09803e66dca38d7784e32687d0693550948199ed Mon Sep 17 00:00:00 2001
>>>>>>> From: Yu-cheng Yu <[email protected]>
>>>>>>> Date: Thu, 29 Nov 2018 14:15:38 -0800
>>>>>>> Subject: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and
>>>>>>> Indirect Branch
>>>>>>> Tracking for vsyscall emulation
>>>>>>>
>>>>>>> Vsyscall entry points are effectively branch targets. Mark them with
>>>>>>> ENDBR64 opcodes. When emulating the RET instruction, unwind shadow stack
>>>>>>> and reset IBT state machine.
>>>>>>>
>>>>>>> Signed-off-by: Yu-cheng Yu <[email protected]>
>>>>>>> ---
>>>>>>> v13:
>>>>>>> - Check shadow stack address is canonical.
>>>>>>> - Change from writing to MSRs to writing to CET xstate.
>>>>>>>
>>>>>>> arch/x86/entry/vsyscall/vsyscall_64.c | 34 +++++++++++++++++++++++
>>>>>>> arch/x86/entry/vsyscall/vsyscall_emu_64.S | 9 ++++++
>>>>>>> arch/x86/entry/vsyscall/vsyscall_trace.h | 1 +
>>>>>>> 3 files changed, 44 insertions(+)
>>>>>>>
>>>>>>> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
>>>>>>> b/arch/x86/entry/vsyscall/vsyscall_64.c
>>>>>>> index 44c33103a955..30b166091d46 100644
>>>>>>> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
>>>>>>> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
>>>>>>> @@ -38,6 +38,9 @@
>>>>>>> #include <asm/fixmap.h>
>>>>>>> #include <asm/traps.h>
>>>>>>> #include <asm/paravirt.h>
>>>>>>> +#include <asm/fpu/xstate.h>
>>>>>>> +#include <asm/fpu/types.h>
>>>>>>> +#include <asm/fpu/internal.h>
>>>>>>>
>>>>>>> #define CREATE_TRACE_POINTS
>>>>>>> #include "vsyscall_trace.h"
>>>>>>> @@ -286,6 +289,44 @@ bool emulate_vsyscall(unsigned long error_code,
>>>>>>> /* Emulate a ret instruction. */
>>>>>>> regs->ip = caller;
>>>>>>> regs->sp += 8;
>>>>>>> +
>>>>>>> +#ifdef CONFIG_X86_CET
>>>>>>> + if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
>>>>>>> + struct cet_user_state *cet;
>>>>>>> + struct fpu *fpu;
>>>>>>> +
>>>>>>> + fpu = &tsk->thread.fpu;
>>>>>>> + fpregs_lock();
>>>>>>> +
>>>>>>> + if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
>>>>>>> + copy_fpregs_to_fpstate(fpu);
>>>>>>> + set_thread_flag(TIF_NEED_FPU_LOAD);
>>>>>>> + }
>>>>>>> +
>>>>>>> + cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
>>>>>>> + if (!cet) {
>>>>>>> + /*
>>>>>>> + * This should not happen. The task is
>>>>>>> + * CET-enabled, but CET xstate is in INIT.
>>>>>>> + */
>>>>>>
[...]
>>>>>>
>>>>>
>>>>> For what it's worth, I think there is an alternative. If you all
>>>>> (userspace people, etc) can come up with a credible way for a user
>>>>> program to statically declare that it doesn't need vsyscalls, then we
>>>>> could make SHSTK depend on *that*, and we could avoid this mess. This
>>>>> breaks orthogonality, but it's probably a decent outcome.
>>>>>
>>>>
>>>> Would an arch_prctl(DISABLE_VSYSCALL) work? The kernel then sets a
>>>> thread flag, and in emulate_vsyscall(), checks the flag.
>>>>
>>>> When CET is enabled, ld-linux will do DISABLE_VSYSCALL.
>>>>
>>>> How is that?
>>>
>>> Backwards, no? Presumably vsyscall needs to be disabled before or
>>> concurrently with CET being enabled, not after.
>>>
>>> I think the solution of making vsyscall emulation work correctly with
>>> CET is going to be better and possibly more straightforward.
>>>
>>
>> We can do
>>
>> 1. Add ARCH_X86_DISABLE_VSYSCALL to disable the vsyscall page.
>> 2. If CPU supports CET and the program is CET enabled:
>> a. Disable the vsyscall page.
>> b. Pass control to user.
>> c. Enable the vsyscall page when ARCH_X86_CET_DISABLE is called.
>>
>> So when control is passed from kernel to user, the vsyscall page is
>> disabled if the program
>> is CET enabled.
>
> Let me say this one more time:
>
> If we have a per-process vsyscall disable control and a per-process
> CET control, we are going to keep those settings orthogonal. I'm
> willing to entertain an option in which enabling SHSTK without also
> disabling vsyscalls is disallowed, We are *not* going to have any CET
> flags magically disable vsyscalls, though, and we are not going to
> have a situation where disabling vsyscalls on process startup requires
> enabling SHSTK.
>
> Any possible static vsyscall controls (and CET controls, for that
> matter) also need to come with some explanation of whether they are
> properties set on the ELF loader, the ELF program being loaded, or
> both. And this explanation needs to cover what happens when old
> binaries link against new libc versions and vice versa. A new
> CET-enabled binary linked against old libc running on a new kernel
> that is expected to work on a non-CET CPU MUST work on a CET CPU, too.
>
> Right now, literally the only thing preventing vsyscall emulation from
> coexisting with SHSTK is that the implementation eeds work.
>
> So your proposal is rejected. Sorry.
>
I think, even with shadow stack/ibt enabled, we can still allow XONLY
without too much mess.

What about this?

Thanks,
Yu-cheng

======

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
b/arch/x86/entry/vsyscall/vsyscall_64.c
index 8b0b32ac7791..d39da0a15521 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -48,16 +48,16 @@
static enum { EMULATE, XONLY, NONE } vsyscall_mode __ro_after_init =
#ifdef CONFIG_LEGACY_VSYSCALL_NONE
NONE;
-#elif defined(CONFIG_LEGACY_VSYSCALL_XONLY)
+#elif defined(CONFIG_LEGACY_VSYSCALL_XONLY) || defined(CONFIG_X86_CET)
XONLY;
-#else
+#else
EMULATE;
#endif

static int __init vsyscall_setup(char *str)
{
if (str) {
- if (!strcmp("emulate", str))
+ if (!strcmp("emulate", str) && !IS_ENABLED(CONFIG_X86_CET))
vsyscall_mode = EMULATE;
else if (!strcmp("xonly", str))
vsyscall_mode = XONLY;

2020-10-01 17:29:29

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation

On Thu, Oct 1, 2020 at 9:51 AM Yu, Yu-cheng <[email protected]> wrote:
>
> On 9/30/2020 6:10 PM, Andy Lutomirski wrote:
> > On Wed, Sep 30, 2020 at 6:01 PM H.J. Lu <[email protected]> wrote:
> >>
> >> On Wed, Sep 30, 2020 at 4:44 PM Andy Lutomirski <[email protected]> wrote:
>
> [...]
>
> >>>>>>> From 09803e66dca38d7784e32687d0693550948199ed Mon Sep 17 00:00:00 2001
> >>>>>>> From: Yu-cheng Yu <[email protected]>
> >>>>>>> Date: Thu, 29 Nov 2018 14:15:38 -0800
> >>>>>>> Subject: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and
> >>>>>>> Indirect Branch
> >>>>>>> Tracking for vsyscall emulation
> >>>>>>>
> >>>>>>> Vsyscall entry points are effectively branch targets. Mark them with
> >>>>>>> ENDBR64 opcodes. When emulating the RET instruction, unwind shadow stack
> >>>>>>> and reset IBT state machine.
> >>>>>>>
> >>>>>>> Signed-off-by: Yu-cheng Yu <[email protected]>
> >>>>>>> ---
> >>>>>>> v13:
> >>>>>>> - Check shadow stack address is canonical.
> >>>>>>> - Change from writing to MSRs to writing to CET xstate.
> >>>>>>>
> >>>>>>> arch/x86/entry/vsyscall/vsyscall_64.c | 34 +++++++++++++++++++++++
> >>>>>>> arch/x86/entry/vsyscall/vsyscall_emu_64.S | 9 ++++++
> >>>>>>> arch/x86/entry/vsyscall/vsyscall_trace.h | 1 +
> >>>>>>> 3 files changed, 44 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
> >>>>>>> b/arch/x86/entry/vsyscall/vsyscall_64.c
> >>>>>>> index 44c33103a955..30b166091d46 100644
> >>>>>>> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
> >>>>>>> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
> >>>>>>> @@ -38,6 +38,9 @@
> >>>>>>> #include <asm/fixmap.h>
> >>>>>>> #include <asm/traps.h>
> >>>>>>> #include <asm/paravirt.h>
> >>>>>>> +#include <asm/fpu/xstate.h>
> >>>>>>> +#include <asm/fpu/types.h>
> >>>>>>> +#include <asm/fpu/internal.h>
> >>>>>>>
> >>>>>>> #define CREATE_TRACE_POINTS
> >>>>>>> #include "vsyscall_trace.h"
> >>>>>>> @@ -286,6 +289,44 @@ bool emulate_vsyscall(unsigned long error_code,
> >>>>>>> /* Emulate a ret instruction. */
> >>>>>>> regs->ip = caller;
> >>>>>>> regs->sp += 8;
> >>>>>>> +
> >>>>>>> +#ifdef CONFIG_X86_CET
> >>>>>>> + if (tsk->thread.cet.shstk_size || tsk->thread.cet.ibt_enabled) {
> >>>>>>> + struct cet_user_state *cet;
> >>>>>>> + struct fpu *fpu;
> >>>>>>> +
> >>>>>>> + fpu = &tsk->thread.fpu;
> >>>>>>> + fpregs_lock();
> >>>>>>> +
> >>>>>>> + if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
> >>>>>>> + copy_fpregs_to_fpstate(fpu);
> >>>>>>> + set_thread_flag(TIF_NEED_FPU_LOAD);
> >>>>>>> + }
> >>>>>>> +
> >>>>>>> + cet = get_xsave_addr(&fpu->state.xsave, XFEATURE_CET_USER);
> >>>>>>> + if (!cet) {
> >>>>>>> + /*
> >>>>>>> + * This should not happen. The task is
> >>>>>>> + * CET-enabled, but CET xstate is in INIT.
> >>>>>>> + */
> >>>>>>
> [...]
> >>>>>>
> >>>>>
> >>>>> For what it's worth, I think there is an alternative. If you all
> >>>>> (userspace people, etc) can come up with a credible way for a user
> >>>>> program to statically declare that it doesn't need vsyscalls, then we
> >>>>> could make SHSTK depend on *that*, and we could avoid this mess. This
> >>>>> breaks orthogonality, but it's probably a decent outcome.
> >>>>>
> >>>>
> >>>> Would an arch_prctl(DISABLE_VSYSCALL) work? The kernel then sets a
> >>>> thread flag, and in emulate_vsyscall(), checks the flag.
> >>>>
> >>>> When CET is enabled, ld-linux will do DISABLE_VSYSCALL.
> >>>>
> >>>> How is that?
> >>>
> >>> Backwards, no? Presumably vsyscall needs to be disabled before or
> >>> concurrently with CET being enabled, not after.
> >>>
> >>> I think the solution of making vsyscall emulation work correctly with
> >>> CET is going to be better and possibly more straightforward.
> >>>
> >>
> >> We can do
> >>
> >> 1. Add ARCH_X86_DISABLE_VSYSCALL to disable the vsyscall page.
> >> 2. If CPU supports CET and the program is CET enabled:
> >> a. Disable the vsyscall page.
> >> b. Pass control to user.
> >> c. Enable the vsyscall page when ARCH_X86_CET_DISABLE is called.
> >>
> >> So when control is passed from kernel to user, the vsyscall page is
> >> disabled if the program
> >> is CET enabled.
> >
> > Let me say this one more time:
> >
> > If we have a per-process vsyscall disable control and a per-process
> > CET control, we are going to keep those settings orthogonal. I'm
> > willing to entertain an option in which enabling SHSTK without also
> > disabling vsyscalls is disallowed, We are *not* going to have any CET
> > flags magically disable vsyscalls, though, and we are not going to
> > have a situation where disabling vsyscalls on process startup requires
> > enabling SHSTK.
> >
> > Any possible static vsyscall controls (and CET controls, for that
> > matter) also need to come with some explanation of whether they are
> > properties set on the ELF loader, the ELF program being loaded, or
> > both. And this explanation needs to cover what happens when old
> > binaries link against new libc versions and vice versa. A new
> > CET-enabled binary linked against old libc running on a new kernel
> > that is expected to work on a non-CET CPU MUST work on a CET CPU, too.
> >
> > Right now, literally the only thing preventing vsyscall emulation from
> > coexisting with SHSTK is that the implementation eeds work.
> >
> > So your proposal is rejected. Sorry.
> >
> I think, even with shadow stack/ibt enabled, we can still allow XONLY
> without too much mess.
>
> What about this?
>
> Thanks,
> Yu-cheng
>
> ======
>
> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
> b/arch/x86/entry/vsyscall/vsyscall_64.c
> index 8b0b32ac7791..d39da0a15521 100644
> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
> @@ -48,16 +48,16 @@
> static enum { EMULATE, XONLY, NONE } vsyscall_mode __ro_after_init =
> #ifdef CONFIG_LEGACY_VSYSCALL_NONE
> NONE;
> -#elif defined(CONFIG_LEGACY_VSYSCALL_XONLY)
> +#elif defined(CONFIG_LEGACY_VSYSCALL_XONLY) || defined(CONFIG_X86_CET)
> XONLY;
> -#else
> +#else
> EMULATE;
> #endif

I don't get it.

First, you can't do any of this based on config -- it must be runtime.

Second, and more importantly, I don't see how XONLY helps at all. The
(non-executable) text that's exposed to user code in EMULATE mode is
trivial to get right with CET -- your code already handles it. It's
the emulation code (that runs identically in EMULATE and XONLY mode)
that's tricky.

2020-10-06 19:12:10

by Yu-cheng Yu

[permalink] [raw]
Subject: Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation

On 10/1/2020 10:26 AM, Andy Lutomirski wrote:
> On Thu, Oct 1, 2020 at 9:51 AM Yu, Yu-cheng <[email protected]> wrote:
>>
>> On 9/30/2020 6:10 PM, Andy Lutomirski wrote:
>>> On Wed, Sep 30, 2020 at 6:01 PM H.J. Lu <[email protected]> wrote:
>>>>
>>>> On Wed, Sep 30, 2020 at 4:44 PM Andy Lutomirski <[email protected]> wrote:
>>
>> [...]
>>
>>>>>>>>> From 09803e66dca38d7784e32687d0693550948199ed Mon Sep 17 00:00:00 2001
>>>>>>>>> From: Yu-cheng Yu <[email protected]>
>>>>>>>>> Date: Thu, 29 Nov 2018 14:15:38 -0800
>>>>>>>>> Subject: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and
>>>>>>>>> Indirect Branch
>>>>>>>>> Tracking for vsyscall emulation
>>>>>>>>>
>>>>>>>>> Vsyscall entry points are effectively branch targets. Mark them with
>>>>>>>>> ENDBR64 opcodes. When emulating the RET instruction, unwind shadow stack
>>>>>>>>> and reset IBT state machine.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Yu-cheng Yu <[email protected]>

[...]

>>>>>>>>
>>>>>>>
>>>>>>> For what it's worth, I think there is an alternative. If you all
>>>>>>> (userspace people, etc) can come up with a credible way for a user
>>>>>>> program to statically declare that it doesn't need vsyscalls, then we
>>>>>>> could make SHSTK depend on *that*, and we could avoid this mess. This
>>>>>>> breaks orthogonality, but it's probably a decent outcome.
>>>>>>>
>>>>>>
>>>>>> Would an arch_prctl(DISABLE_VSYSCALL) work? The kernel then sets a
>>>>>> thread flag, and in emulate_vsyscall(), checks the flag.
>>>>>>
>>>>>> When CET is enabled, ld-linux will do DISABLE_VSYSCALL.
>>>>>>
>>>>>> How is that?
>>>>>
>>>>> Backwards, no? Presumably vsyscall needs to be disabled before or
>>>>> concurrently with CET being enabled, not after.
>>>>>
>>>>> I think the solution of making vsyscall emulation work correctly with
>>>>> CET is going to be better and possibly more straightforward.
>>>>>
>>>>
>>>> We can do
>>>>
>>>> 1. Add ARCH_X86_DISABLE_VSYSCALL to disable the vsyscall page.
>>>> 2. If CPU supports CET and the program is CET enabled:
>>>> a. Disable the vsyscall page.
>>>> b. Pass control to user.
>>>> c. Enable the vsyscall page when ARCH_X86_CET_DISABLE is called.
>>>>
>>>> So when control is passed from kernel to user, the vsyscall page is
>>>> disabled if the program
>>>> is CET enabled.
>>>
>>> Let me say this one more time:
>>>
>>> If we have a per-process vsyscall disable control and a per-process
>>> CET control, we are going to keep those settings orthogonal. I'm
>>> willing to entertain an option in which enabling SHSTK without also
>>> disabling vsyscalls is disallowed, We are *not* going to have any CET
>>> flags magically disable vsyscalls, though, and we are not going to
>>> have a situation where disabling vsyscalls on process startup requires
>>> enabling SHSTK.
>>>
>>> Any possible static vsyscall controls (and CET controls, for that
>>> matter) also need to come with some explanation of whether they are
>>> properties set on the ELF loader, the ELF program being loaded, or
>>> both. And this explanation needs to cover what happens when old
>>> binaries link against new libc versions and vice versa. A new
>>> CET-enabled binary linked against old libc running on a new kernel
>>> that is expected to work on a non-CET CPU MUST work on a CET CPU, too.
>>>
>>> Right now, literally the only thing preventing vsyscall emulation from
>>> coexisting with SHSTK is that the implementation eeds work.
>>>
>>> So your proposal is rejected. Sorry.
>>>
>> I think, even with shadow stack/ibt enabled, we can still allow XONLY
>> without too much mess.
>>
>> What about this?
>>
>> Thanks,
>> Yu-cheng
>>
>> ======
>>
>> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
>> b/arch/x86/entry/vsyscall/vsyscall_64.c
>> index 8b0b32ac7791..d39da0a15521 100644
>> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
>> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
>> @@ -48,16 +48,16 @@
>> static enum { EMULATE, XONLY, NONE } vsyscall_mode __ro_after_init =
>> #ifdef CONFIG_LEGACY_VSYSCALL_NONE
>> NONE;
>> -#elif defined(CONFIG_LEGACY_VSYSCALL_XONLY)
>> +#elif defined(CONFIG_LEGACY_VSYSCALL_XONLY) || defined(CONFIG_X86_CET)
>> XONLY;
>> -#else
>> +#else
>> EMULATE;
>> #endif
>
> I don't get it.
>
> First, you can't do any of this based on config -- it must be runtime.
>
> Second, and more importantly, I don't see how XONLY helps at all. The
> (non-executable) text that's exposed to user code in EMULATE mode is
> trivial to get right with CET -- your code already handles it. It's
> the emulation code (that runs identically in EMULATE and XONLY mode)
> that's tricky.
>

Hi,

There has been some ambiguity in my previous proposals. To make things
clear, I created a patch for arch_prctl(VSYSCALL_CTL), which controls
the TIF_VSYSCALL_DISABLE flag. It is entirely orthogonal to shadow
stack or IBT. On top of the patch, we can do SET_PERSONALITY2() to
disable vsyscall, e.g.

======
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 0e1be2a13359..c730ff00bc62 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -394,6 +394,19 @@ struct arch_elf_state {
.gnu_property = 0, \
}

+#define SET_PERSONALITY2(ex, state) \
+do { \
+ unsigned int has_cet; \
+ \
+ has_cet = GNU_PROPERTY_X86_FEATURE_1_SHSTK | \
+ GNU_PROPERTY_X86_FEATURE_1_IBT; \
+ \
+ if ((state)->gnu_property & has_cet) \
+ set_thread_flag(TIF_VSYSCALL_DISABLE); \
+ \
+ SET_PERSONALITY(ex); \
+} while (0)
+
#define arch_elf_pt_proc(ehdr, phdr, elf, interp, state) (0)
#define arch_check_elf(ehdr, interp, interp_ehdr, state) (0)
#endif

======
The is the patch.

From a124b81086122495d6837f26df99db619cd5402a Mon Sep 17 00:00:00 2001
From: Yu-cheng Yu <[email protected]>
Date: Mon, 5 Oct 2020 12:10:26 -0700
Subject: [PATCH 34/45] x86/vsyscall/64: Introduce arch_prctl(VSYCALL_CTL)

Vsyscall emulation provides compatibility to older applications. Newer
applications use the vDSO interface and do not use vsyscalls, and it is
desirable to have a per-task control of vsyscall.

One use case of the interface is when shadow stack and/or indirect branch
tracking is enabled and vsyscall emulation needs to cancel out the control-
flow protection. The cancelling code, if implemented, could become a back
door for evading the protection. Disabling vsyscall eliminates the risk.

Introduce arch_prctl(VSYSCALL_CTL), which sets/clears TIF_VSYSCALL_DISABLE
flag. When the flag is set, vsyscall is disabled.

Signed-off-by: Yu-cheng Yu <[email protected]>
---
arch/x86/entry/vsyscall/vsyscall_64.c | 3 +++
arch/x86/include/asm/thread_info.h | 2 ++
arch/x86/include/uapi/asm/prctl.h | 1 +
arch/x86/kernel/process_64.c | 17 +++++++++++++++++
tools/arch/x86/include/uapi/asm/prctl.h | 1 +
5 files changed, 24 insertions(+)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
b/arch/x86/entry/vsyscall/vsyscall_64.c
index 44c33103a955..fe8f3db6d21b 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -127,6 +127,9 @@ bool emulate_vsyscall(unsigned long error_code,
long ret;
unsigned long orig_dx;

+ if (test_thread_flag(TIF_VSYSCALL_DISABLE))
+ return false;
+
/* Write faults or kernel-privilege faults never get fixed up. */
if ((error_code & (X86_PF_WRITE | X86_PF_USER)) != X86_PF_USER)
return false;
diff --git a/arch/x86/include/asm/thread_info.h
b/arch/x86/include/asm/thread_info.h
index 267701ae3d86..c0cce3401c0f 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -98,6 +98,7 @@ struct thread_info {
#define TIF_IO_BITMAP 22 /* uses I/O bitmap */
#define TIF_FORCED_TF 24 /* true if TF in eflags artificially */
#define TIF_BLOCKSTEP 25 /* set when we want DEBUGCTLMSR_BTF */
+#define TIF_VSYSCALL_DISABLE 26 /* set when vsyscall is disallowed */
#define TIF_LAZY_MMU_UPDATES 27 /* task is updating the mmu lazily */
#define TIF_SYSCALL_TRACEPOINT 28 /* syscall tracepoint instrumentation */
#define TIF_ADDR32 29 /* 32-bit address space on 64 bits */
@@ -127,6 +128,7 @@ struct thread_info {
#define _TIF_IO_BITMAP (1 << TIF_IO_BITMAP)
#define _TIF_FORCED_TF (1 << TIF_FORCED_TF)
#define _TIF_BLOCKSTEP (1 << TIF_BLOCKSTEP)
+#define _TIF_VSYSCALL_DISABLE (1 << TIF_VSYSCALL_DISABLE)
#define _TIF_LAZY_MMU_UPDATES (1 << TIF_LAZY_MMU_UPDATES)
#define _TIF_SYSCALL_TRACEPOINT (1 << TIF_SYSCALL_TRACEPOINT)
#define _TIF_ADDR32 (1 << TIF_ADDR32)
diff --git a/arch/x86/include/uapi/asm/prctl.h
b/arch/x86/include/uapi/asm/prctl.h
index 9245bf629120..223fa382a81e 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -13,6 +13,7 @@
#define ARCH_MAP_VDSO_X32 0x2001
#define ARCH_MAP_VDSO_32 0x2002
#define ARCH_MAP_VDSO_64 0x2003
+#define ARCH_VSYSCALL_CTRL 0x2004

#define ARCH_X86_CET_STATUS 0x3001
#define ARCH_X86_CET_DISABLE 0x3002
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 1147a1052a07..eba61791c9cf 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -719,6 +719,20 @@ static long prctl_map_vdso(const struct vdso_image
*image, unsigned long addr)
}
#endif

+static long prctl_vsyscall_ctrl(unsigned int disable)
+{
+ if (IS_ENABLED(CONFIG_X86_VSYSCALL_EMULATION)) {
+ if (disable)
+ set_thread_flag(TIF_VSYSCALL_DISABLE);
+ else
+ clear_thread_flag(TIF_VSYSCALL_DISABLE);
+
+ return 0;
+ } else {
+ return disable ? 0 : -EINVAL;
+ }
+}
+
long do_arch_prctl_64(struct task_struct *task, int option, unsigned
long arg2)
{
int ret = 0;
@@ -807,6 +821,9 @@ long do_arch_prctl_64(struct task_struct *task, int
option, unsigned long arg2)
return prctl_map_vdso(&vdso_image_64, arg2);
#endif

+ case ARCH_VSYSCALL_CTRL:
+ return prctl_vsyscall_ctrl(arg2);
+
default:
ret = -EINVAL;
break;
diff --git a/tools/arch/x86/include/uapi/asm/prctl.h
b/tools/arch/x86/include/uapi/asm/prctl.h
index 9245bf629120..2476f46fa51f 100644
--- a/tools/arch/x86/include/uapi/asm/prctl.h
+++ b/tools/arch/x86/include/uapi/asm/prctl.h
@@ -13,6 +13,7 @@
#define ARCH_MAP_VDSO_X32 0x2001
#define ARCH_MAP_VDSO_32 0x2002
#define ARCH_MAP_VDSO_64 0x2003
+#define ARCH_VSYSCALL_CTL 0x2004

#define ARCH_X86_CET_STATUS 0x3001
#define ARCH_X86_CET_DISABLE 0x3002
--
2.21.0


2020-10-09 19:56:26

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and Indirect Branch Tracking for vsyscall emulation

On Tue, Oct 6, 2020 at 12:09 PM Yu, Yu-cheng <[email protected]> wrote:
>
> On 10/1/2020 10:26 AM, Andy Lutomirski wrote:
> > On Thu, Oct 1, 2020 at 9:51 AM Yu, Yu-cheng <[email protected]> wrote:
> >>
> >> On 9/30/2020 6:10 PM, Andy Lutomirski wrote:
> >>> On Wed, Sep 30, 2020 at 6:01 PM H.J. Lu <[email protected]> wrote:
> >>>>
> >>>> On Wed, Sep 30, 2020 at 4:44 PM Andy Lutomirski <[email protected]> wrote:
> >>
> >> [...]
> >>
> >>>>>>>>> From 09803e66dca38d7784e32687d0693550948199ed Mon Sep 17 00:00:00 2001
> >>>>>>>>> From: Yu-cheng Yu <[email protected]>
> >>>>>>>>> Date: Thu, 29 Nov 2018 14:15:38 -0800
> >>>>>>>>> Subject: [PATCH v13 8/8] x86/vsyscall/64: Fixup Shadow Stack and
> >>>>>>>>> Indirect Branch
> >>>>>>>>> Tracking for vsyscall emulation
> >>>>>>>>>
> >>>>>>>>> Vsyscall entry points are effectively branch targets. Mark them with
> >>>>>>>>> ENDBR64 opcodes. When emulating the RET instruction, unwind shadow stack
> >>>>>>>>> and reset IBT state machine.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Yu-cheng Yu <[email protected]>
>
> [...]
>
> >>>>>>>>
> >>>>>>>
> >>>>>>> For what it's worth, I think there is an alternative. If you all
> >>>>>>> (userspace people, etc) can come up with a credible way for a user
> >>>>>>> program to statically declare that it doesn't need vsyscalls, then we
> >>>>>>> could make SHSTK depend on *that*, and we could avoid this mess. This
> >>>>>>> breaks orthogonality, but it's probably a decent outcome.
> >>>>>>>
> >>>>>>
> >>>>>> Would an arch_prctl(DISABLE_VSYSCALL) work? The kernel then sets a
> >>>>>> thread flag, and in emulate_vsyscall(), checks the flag.
> >>>>>>
> >>>>>> When CET is enabled, ld-linux will do DISABLE_VSYSCALL.
> >>>>>>
> >>>>>> How is that?
> >>>>>
> >>>>> Backwards, no? Presumably vsyscall needs to be disabled before or
> >>>>> concurrently with CET being enabled, not after.
> >>>>>
> >>>>> I think the solution of making vsyscall emulation work correctly with
> >>>>> CET is going to be better and possibly more straightforward.
> >>>>>
> >>>>
> >>>> We can do
> >>>>
> >>>> 1. Add ARCH_X86_DISABLE_VSYSCALL to disable the vsyscall page.
> >>>> 2. If CPU supports CET and the program is CET enabled:
> >>>> a. Disable the vsyscall page.
> >>>> b. Pass control to user.
> >>>> c. Enable the vsyscall page when ARCH_X86_CET_DISABLE is called.
> >>>>
> >>>> So when control is passed from kernel to user, the vsyscall page is
> >>>> disabled if the program
> >>>> is CET enabled.
> >>>
> >>> Let me say this one more time:
> >>>
> >>> If we have a per-process vsyscall disable control and a per-process
> >>> CET control, we are going to keep those settings orthogonal. I'm
> >>> willing to entertain an option in which enabling SHSTK without also
> >>> disabling vsyscalls is disallowed, We are *not* going to have any CET
> >>> flags magically disable vsyscalls, though, and we are not going to
> >>> have a situation where disabling vsyscalls on process startup requires
> >>> enabling SHSTK.
> >>>
> >>> Any possible static vsyscall controls (and CET controls, for that
> >>> matter) also need to come with some explanation of whether they are
> >>> properties set on the ELF loader, the ELF program being loaded, or
> >>> both. And this explanation needs to cover what happens when old
> >>> binaries link against new libc versions and vice versa. A new
> >>> CET-enabled binary linked against old libc running on a new kernel
> >>> that is expected to work on a non-CET CPU MUST work on a CET CPU, too.
> >>>
> >>> Right now, literally the only thing preventing vsyscall emulation from
> >>> coexisting with SHSTK is that the implementation eeds work.
> >>>
> >>> So your proposal is rejected. Sorry.
> >>>
> >> I think, even with shadow stack/ibt enabled, we can still allow XONLY
> >> without too much mess.
> >>
> >> What about this?
> >>
> >> Thanks,
> >> Yu-cheng
> >>
> >> ======
> >>
> >> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
> >> b/arch/x86/entry/vsyscall/vsyscall_64.c
> >> index 8b0b32ac7791..d39da0a15521 100644
> >> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
> >> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
> >> @@ -48,16 +48,16 @@
> >> static enum { EMULATE, XONLY, NONE } vsyscall_mode __ro_after_init =
> >> #ifdef CONFIG_LEGACY_VSYSCALL_NONE
> >> NONE;
> >> -#elif defined(CONFIG_LEGACY_VSYSCALL_XONLY)
> >> +#elif defined(CONFIG_LEGACY_VSYSCALL_XONLY) || defined(CONFIG_X86_CET)
> >> XONLY;
> >> -#else
> >> +#else
> >> EMULATE;
> >> #endif
> >
> > I don't get it.
> >
> > First, you can't do any of this based on config -- it must be runtime.
> >
> > Second, and more importantly, I don't see how XONLY helps at all. The
> > (non-executable) text that's exposed to user code in EMULATE mode is
> > trivial to get right with CET -- your code already handles it. It's
> > the emulation code (that runs identically in EMULATE and XONLY mode)
> > that's tricky.
> >
>
> Hi,
>
> There has been some ambiguity in my previous proposals. To make things
> clear, I created a patch for arch_prctl(VSYSCALL_CTL), which controls
> the TIF_VSYSCALL_DISABLE flag. It is entirely orthogonal to shadow
> stack or IBT. On top of the patch, we can do SET_PERSONALITY2() to
> disable vsyscall, e.g.

NAK. Let me try explaining again.

>
> ======
> diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
> index 0e1be2a13359..c730ff00bc62 100644
> --- a/arch/x86/include/asm/elf.h
> +++ b/arch/x86/include/asm/elf.h
> @@ -394,6 +394,19 @@ struct arch_elf_state {
> .gnu_property = 0, \
> }
>
> +#define SET_PERSONALITY2(ex, state) \
> +do { \
> + unsigned int has_cet; \
> + \
> + has_cet = GNU_PROPERTY_X86_FEATURE_1_SHSTK | \
> + GNU_PROPERTY_X86_FEATURE_1_IBT; \
> + \
> + if ((state)->gnu_property & has_cet) \
> + set_thread_flag(TIF_VSYSCALL_DISABLE); \
> + \
> + SET_PERSONALITY(ex); \
> +} while (0)
> +

This is not what "orthogonal" means. If the bits were orthogonal, the
logic would be:

if (gnu_property & DISABLE_VSYSCALL)
disable vsyscall;
if (gnu_property & SHSTK)
enable SHSTK;
if (gnu_property & IBT);
enable IBT;

and, if necessarily (although I still think it would be preferable not
to do this):

if ((gnu_property & (DISABLE_VSYSCALL | SHSTK)) == SHSTK)
return -EINVAL;

As far as I'm concerned, you have two choices:

a) Make SHSTK work *correctly* with vsyscall emulation.

b) Add a high quality mechanism to disable vsyscall emulation and make
SHSTK depend on that.

As far as I'm concerned, (a) is preferable. Ideally we'd get (a)
*and* a high quality vsyscall emulation disable mechanism with no
dependencies.


> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c
> b/arch/x86/entry/vsyscall/vsyscall_64.c
> index 44c33103a955..fe8f3db6d21b 100644
> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
> @@ -127,6 +127,9 @@ bool emulate_vsyscall(unsigned long error_code,
> long ret;
> unsigned long orig_dx;
>
> + if (test_thread_flag(TIF_VSYSCALL_DISABLE))
> + return false;
> +

This needs to be per-mm, not per-thread. There's a patch floating
around that gets us about a quarter of the way there. I'm not
convinced that CET should wait for this to finish.