Hello All,
I've been working on linux support for shadow stack and landing pad
instructions on riscv for a while.
These patches are still RFC quality, but at least they're in a shape
that can start a discussion and get me some feedback, so I decided to
send them out.
This patch series implements the `zisslpcfi` extension, which helps
software enforce control-flow integrity (cfi) on riscv CPUs. The spec is
currently called zisslpcfi [1], which literally stands for "unprivileged
integer shadow stack & landing pad based control-flow integrity".
Three major architectures [1, 2, 3] either already support or have
indicated support for control flow integrity extensions based on a
shadow stack (for the backward edge) and landing pads for indirect
call/jmp (for the forward edge). Since these mechanisms solve one common
security issue (control flow hijacking attacks) plaguing the software
ecosystem, there are bound to be similarities between them, and thus a
need for abstractions and common code. These commonalities are:
- Concept of a shadow stack for the program.
- Protection of the shadow stack from regular stores, while still
  allowing return addresses to be stored via special mechanism(s).
- Concept of indirect branch tracking, and thus a landing pad
  instruction as the target for indirect branches.
Due to this commonality across the three arches for shadow stack and
landing pad instruction support, the patch series defines the following
arch-agnostic configs:
- CONFIG_USER_SHADOW_STACK: Selecting this config means the kernel will
  support programs compiled with an opt-in for shadow stacks.
- CONFIG_USER_INDIRECT_BR_LP: Selecting this config means the kernel
  will support programs compiled with an opt-in for landing pad
  instructions on indirect branches.
There are stubs for these abstractions on non-riscv architectures and a
concrete implementation for the riscv architecture. Architecture owners
can implement these stubs on their respective architectures to provide
shadow stack and branch tracking mechanisms.
Additional highlights of this patch series, specifically targeted
towards the implementation of `zisslpcfi`:
Shadow stack and landing pad state
----------------------------------
On riscv I chose to place the shadow stack and landing pad state in
thread_info. pt_regs is another choice, but I didn't see a compelling
reason to put the state in pt_regs. If you do have a compelling reason
to put it in pt_regs, we can make that change.
Shadow stack
------------
Shadow stacks are already present on x86 [2] and are going to be part of
the riscv and aarch64 [3] architectures. Shadow stacks are writeable
memory, yet they are not allowed to be written by regular stores; they
have a special meaning, and this patch series proposes a new `mmap`
protection flag `PROT_SHADOWSTACK` for them. Each architecture can
choose to implement this memory protection. With respect to vma flags,
the riscv implementation chooses `VM_WRITE` alone as the shadow stack
meaning, analogous to the riscv architecture's PTE encoding
(X=0,R=0,W=1) for shadow stack. On riscv, all regular stores to shadow
stack memory raise store access faults. Similarly, all shadow stack
loads and stores to non-shadow stack (but otherwise valid) memory raise
load/store access faults. This patch series creates a synthetic
exception code (=14), which is reserved in the privileged spec, and
feeds it into the common page fault handling routine.
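To make the proposed ABI concrete, here is a minimal user-space sketch
of allocating a shadow stack with this flag. The PROT_SHADOWSTACK value
(0x40) is the one proposed in this series; everything here is RFC-level
and not a stable ABI:

    #include <stdio.h>
    #include <sys/mman.h>

    #ifndef PROT_SHADOWSTACK
    #define PROT_SHADOWSTACK 0x40   /* value proposed in this series */
    #endif

    int main(void)
    {
        size_t size = 4 * 4096;
        /*
         * On riscv in this series, PROT_SHADOWSTACK must not be
         * combined with any other PROT_* bit, otherwise mmap fails
         * with EINVAL.
         */
        void *ss = mmap(NULL, size, PROT_SHADOWSTACK,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (ss == MAP_FAILED) {
            perror("mmap(PROT_SHADOWSTACK)");
            return 1;
        }
        printf("shadow stack at %p\n", ss);
        return 0;
    }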
ELF parsing
------------
There are two ways to enable forward cfi and backward cfi:
- prctls: `ld` can issue prctls to enable them at runtime.
- elf marker: Some sort of marker in the elf headers which the kernel
  can recognize to set up the forward and backward cfi state.
This patch series uses the .note.gnu.property section and assumes two
flags exist there for forward cfi and backward cfi; if they are present,
the kernel sets up the respective cfi state.
Please note that this part will change, because risc-v is moving towards
using the `.riscv.attributes` section to host such flags. So I'll change
the implementation in future revisions.
Signal & ucontext
------------------
This patch series steals 4 (32-bit) / 8 (64-bit) bytes from a padding
structure in ucontext. This padding exists for future use cases we don't
know about yet, but stealing a few bytes from it allows us to keep the
structure size the same and avoid some compatibility issues. The signal
patches still need some work (I have to work out a shadow stack token
save on signal delivery and a restore mechanism on sigreturn; this will
prevent abuse of sigreturn to hijack control flow).
Audit mode
-----------
Since this is an RFC, the current set of patches suppresses cfi
violations in the program and lets the program make forward progress.
However, this shouldn't be allowed by default and can be built into a
policy.
More on `zisslpcfi` riscv extension
------------------------------------
zisslpcfi (CFI) extends the ISA in the following manner:
Forward cfi (indirect call/jmp)
- Landing pad instruction requirement for indirect call/jmp
All indirect calls and jumps must land on the landing pad instruction
`lpcll`, else the CPU will raise an illegal instruction exception.
`lpcll` stands for "landing pad check lower label".
- Static label (25-bit label) checking instructions for indirect call/jmp
The extension provides a mechanism by which a compiler-generated label
value can be set in a designated CSR at the call site and checked at the
call target. If a mismatch happens, the CPU raises an illegal
instruction exception. The compiler can generate the label as a hash of
the function signature type. The extension provides mechanisms by which
the label value is part of the instruction itself as an immediate, and
thus is statically immutable (see the sketch below).
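As an illustration, here is a pseudo-assembly sketch of the intended
forward-edge sequence, written as a C-style comment. This is my reading
of the description above, not text taken from the spec:

    /*
     * call site:    lpsll  <label>   ; set expected lower label in CSR_LPLR
     *               jalr   t1        ; indirect call
     * call target:  lpcll  <label>   ; landing pad + lower-label check;
     *                                ; traps on a label mismatch
     */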
Backward cfi (returns)
- Shadow stack (SS) for function returns
The extension provides sspush x1/x5, sspop x1/x5 and sschkra
instructions. sspush pushes onto the shadow stack, while sspop pops from
it. sschkra succeeds only when x1 == x5 is true, else it raises an
illegal instruction exception. Shadow stacks introduce a new virtual
memory type and thus new PTE encodings. The existing reserved encoding
R=0,W=1,X=0 becomes the shadow stack PTE encoding (only if backward cfi
is enabled for the current mode). The new virtual memory type allows the
CPU to distinguish shadow stack memory, so that stores coming from
sspush or ssamoswap can succeed while regular stores raise access
violations (see the sketch below).
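Similarly, a pseudo-assembly sketch of how a compiler might protect
returns with these instructions, again based on my reading of the
description rather than the spec text:

    /*
     * prologue:  sspush x1    ; push return address onto the shadow stack
     *            ...
     * epilogue:  sspop  x5    ; pop the shadow copy into x5
     *            sschkra      ; trap if x1 != x5 (return address corrupted)
     *            ret
     */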
opcodes:
zisslpcfi opcodes are carved out of opcode encodings that were reserved
until now. A new extension called zimops turns these opcodes into
"may be operations". zimops stands for unprivileged may-be-operations
(mops), and if implemented, the default behavior is to move 0 to rd. The
zisslpcfi extension changes executables in such a way that they should
be able to run both on riscv CPUs that implement the cfi extension and
on riscv CPUs that don't. As long as zimops is implemented, all such
instructions will not fault and will simply move 0 to rd.
A hart implementing cfi must implement zimops. Any future extension can
re-purpose zimops opcodes and change their behavior while also not
breaking binary/executable compatibility. zisslpcfi is the first such
extension to modify zimops behavior.
Instructions:
zisslpcfi defines the following instructions.
Backward control flow:
sspush x1/x5:
Decrements the shadow stack pointer and pushes x1 or x5 onto the shadow stack.
sspop x1/x5:
Pops from the shadow stack into x1 or x5 and increments the shadow stack pointer.
ssprr:
Reads the current shadow stack pointer into a destination register.
sschkra:
Compares x1 with x5. Raises an illegal instruction exception if x1 != x5.
ssamoswap:
Atomically swaps a value with the top of the shadow stack.
Forward control flow:
Forward control flow extends the architecture to allow software to set
labels (25 bits of label) at the call/jmp site and check labels at the
target. The extension provides instructions to set the label as part of
the immediate in the instruction itself. Since the immediate is limited
in bit length, labels are set and checked in a ladder fashion of 9, 8
and 8 bits (a sketch of this split follows below).
lpsll, lpsml, lpsul:
Set the lower (9-bit), mid (8-bit) and upper (8-bit) label values in
CSR_LPLR respectively.
lpcll, lpcml, lpcul:
Check the lower (9-bit), mid (8-bit) and upper (8-bit) label values
against CSR_LPLR respectively. The check-label instructions raise an
illegal instruction fault when labels mismatch. `lpcll` has a dual
purpose; it acts as the landing pad instruction as well as the label
check for the lower 9 bits.
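Since the 25-bit label is consumed 9/8/8 bits at a time, the split can
be pictured with the following C sketch. This is purely illustrative;
the helper names are mine, not part of the spec or this series:

    #include <stdint.h>

    /*
     * Split a 25-bit label into the immediates set by lpsll/lpsml/lpsul
     * and checked by lpcll/lpcml/lpcul (9 + 8 + 8 = 25 bits).
     */
    static inline uint32_t label_lower9(uint32_t label) { return label & 0x1ff; }
    static inline uint32_t label_mid8(uint32_t label)   { return (label >> 9) & 0xff; }
    static inline uint32_t label_upper8(uint32_t label) { return (label >> 17) & 0xff; }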
Tests and other bits
********************
For convenience this patch series has been tested with the following
qemu implementation:
https://github.com/deepak0414/qemu/tree/scfi_menvcfg_gh_Zisslpcfi-0.1
I've been able to boot the linux kernel using this implementation and
run very basic test apps. For convenience, here is the branch which has
the implementation:
https://github.com/deepak0414/linux-riscv-cfi/tree/Zisslpcfi-0.4_v6.1-rc2
In order to perform unit tests on the qemu implementation, I've been
using riscv-tests and created unit tests to test the implementation. The
riscv-tests branch URL is below:
https://github.com/deepak0414/riscv-tests/tree/cfi_tests
[1] - https://github.com/riscv/riscv-cfi
[2] - https://www.intel.com/content/dam/develop/external/us/en/documents/catc17-introduction-intel-cet-844137.pdf
[3] - https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-2022
Deepak Gupta (20):
sslp stubs: shadow stack and landing pad stubs
riscv: zisslpcfi enumeration
riscv: zisslpcfi extension csr and bit definitions
riscv: kernel enabling user code for shadow stack and landing pad
mmap : Introducing new protection "PROT_SHADOWSTACK" for mmap
riscv: Implementing "PROT_SHADOWSTACK" on riscv
elf: ELF header parsing in GNU property for cfi state
riscv: ELF header parsing in GNU property for riscv zisslpcfi
riscv mmu: riscv shadow stack page fault handling
riscv mmu: write protect and shadow stack
mmu: maybe_mkwrite updated to manufacture shadow stack PTEs
riscv mm: manufacture shadow stack pte and is vma shadowstack
riscv: illegal instruction handler for cfi violations
riscv: audit mode for cfi violations
sslp prctl: arch-agnostic prctl for shadow stack and landing pad instr
riscv: Implements sslp prctls
riscv ucontext: adding shadow stack pointer field in ucontext
riscv signal: Save and restore of shadow stack for signal
config: adding two new config for control flow integrity
riscv: select config for shadow stack and landing pad instr support
arch/riscv/Kconfig | 4 +
arch/riscv/include/asm/csr.h | 28 ++++
arch/riscv/include/asm/elf.h | 54 ++++++++
arch/riscv/include/asm/hwcap.h | 6 +-
arch/riscv/include/asm/mman.h | 19 +++
arch/riscv/include/asm/pgtable.h | 21 ++-
arch/riscv/include/asm/processor.h | 26 ++++
arch/riscv/include/asm/thread_info.h | 5 +
arch/riscv/include/uapi/asm/ucontext.h | 32 ++++-
arch/riscv/kernel/asm-offsets.c | 5 +
arch/riscv/kernel/cpu.c | 1 +
arch/riscv/kernel/cpufeature.c | 1 +
arch/riscv/kernel/entry.S | 40 ++++++
arch/riscv/kernel/process.c | 155 +++++++++++++++++++++
arch/riscv/kernel/signal.c | 45 ++++++
arch/riscv/kernel/sys_riscv.c | 22 +++
arch/riscv/kernel/traps.c | 183 ++++++++++++++++++++++++-
arch/riscv/mm/fault.c | 23 +++-
arch/riscv/mm/init.c | 2 +-
arch/riscv/mm/pageattr.c | 7 +
fs/binfmt_elf.c | 5 +
include/linux/elf.h | 8 ++
include/linux/mm.h | 23 +++-
include/linux/pgtable.h | 4 +
include/linux/processor.h | 17 +++
include/uapi/asm-generic/mman-common.h | 6 +
include/uapi/linux/elf.h | 6 +
include/uapi/linux/prctl.h | 26 ++++
init/Kconfig | 19 +++
kernel/sys.c | 40 ++++++
mm/mmap.c | 4 +
31 files changed, 825 insertions(+), 12 deletions(-)
create mode 100644 arch/riscv/include/asm/mman.h
--
2.25.1
In the absence of the shadow stack config and the landing pad instr
config, stubs are needed to indicate whether shadow stacks & landing pad
instructions are supported.
In the absence of the configs, these stubs return false (indicating no
support).
In the presence of the configs, an extern declaration is added and the
arch-specific implementation can choose to implement detection.
Signed-off-by: Deepak Gupta <[email protected]>
---
include/linux/processor.h | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/include/linux/processor.h b/include/linux/processor.h
index dc78bdc7079a..228aa95a7cd7 100644
--- a/include/linux/processor.h
+++ b/include/linux/processor.h
@@ -59,4 +59,21 @@ do { \
#endif
+#ifndef CONFIG_USER_SHADOW_STACK
+static inline bool arch_supports_shadow_stack(void)
+{
+ return false;
+}
+#else
+extern bool arch_supports_shadow_stack(void);
+#endif
+
+#ifndef CONFIG_USER_INDIRECT_BR_LP
+static inline bool arch_supports_indirect_br_lp_instr(void)
+{
+ return false;
+}
+#else
+extern bool arch_supports_indirect_br_lp_instr(void);
+#endif
#endif /* _LINUX_PROCESSOR_H */
--
2.25.1
This patch adds support for detecting zisslpcfi. zisslpcfi stands for
unprivileged integer spec extension to support shadow stack and landing
pad instructions for indirect branches.
This patch looks for "zisslpcfi" in the device tree and accordingly
lights up the bit in the cpu feature bitmap. Furthermore, this patch
adds detection utility functions that return whether shadow stacks or
landing pads are supported by the cpu.
Signed-off-by: Deepak Gupta <[email protected]>
---
arch/riscv/include/asm/hwcap.h | 6 +++++-
arch/riscv/include/asm/processor.h | 12 ++++++++++++
arch/riscv/kernel/cpu.c | 1 +
arch/riscv/kernel/cpufeature.c | 1 +
4 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index 86328e3acb02..245fb7ffddd2 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -59,7 +59,8 @@ enum riscv_isa_ext_id {
RISCV_ISA_EXT_ZIHINTPAUSE,
RISCV_ISA_EXT_SSTC,
RISCV_ISA_EXT_SVINVAL,
- RISCV_ISA_EXT_ID_MAX
+ RISCV_ISA_EXT_ZCFI,
+ RISCV_ISA_EXT_ID_MAX,
};
static_assert(RISCV_ISA_EXT_ID_MAX <= RISCV_ISA_EXT_MAX);
@@ -72,6 +73,7 @@ enum riscv_isa_ext_key {
RISCV_ISA_EXT_KEY_FPU, /* For 'F' and 'D' */
RISCV_ISA_EXT_KEY_ZIHINTPAUSE,
RISCV_ISA_EXT_KEY_SVINVAL,
+ RISCV_ISA_EXT_KEY_ZCFI,
RISCV_ISA_EXT_KEY_MAX,
};
@@ -95,6 +97,8 @@ static __always_inline int riscv_isa_ext2key(int num)
return RISCV_ISA_EXT_KEY_ZIHINTPAUSE;
case RISCV_ISA_EXT_SVINVAL:
return RISCV_ISA_EXT_KEY_SVINVAL;
+ case RISCV_ISA_EXT_ZCFI:
+ return RISCV_ISA_EXT_KEY_ZCFI;
default:
return -EINVAL;
}
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index 94a0590c6971..bdebce2cc323 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -80,6 +80,18 @@ int riscv_of_parent_hartid(struct device_node *node, unsigned long *hartid);
extern void riscv_fill_hwcap(void);
extern int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src);
+#ifdef CONFIG_USER_SHADOW_STACK
+static inline bool arch_supports_shadow_stack(void)
+{
+ return __riscv_isa_extension_available(NULL, RISCV_ISA_EXT_ZCFI);
+}
+#endif
+#ifdef CONFIG_USER_INDIRECT_BR_LP
+static inline bool arch_supports_indirect_br_lp_instr(void)
+{
+ return __riscv_isa_extension_available(NULL, RISCV_ISA_EXT_ZCFI);
+}
+#endif
#endif /* __ASSEMBLY__ */
#endif /* _ASM_RISCV_PROCESSOR_H */
diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index 1b9a5a66e55a..fe2bb908d805 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -168,6 +168,7 @@ static struct riscv_isa_ext_data isa_ext_arr[] = {
__RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
__RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
+ __RISCV_ISA_EXT_DATA(zisslpcfi, RISCV_ISA_EXT_ZCFI),
__RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
};
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 93e45560af30..b44e258a7502 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -228,6 +228,7 @@ void __init riscv_fill_hwcap(void)
SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE);
SET_ISA_EXT_MAP("sstc", RISCV_ISA_EXT_SSTC);
SET_ISA_EXT_MAP("svinval", RISCV_ISA_EXT_SVINVAL);
+ SET_ISA_EXT_MAP("zisslpcfi", RISCV_ISA_EXT_ZCFI);
}
#undef SET_ISA_EXT_MAP
}
--
2.25.1
The zisslpcfi extension extends the xstatus CSR to hold enable bits for
shadow stack and forward cfi (landing pad instruction enforcement on
indirect call/jmp), and to record the current landing pad state of the
cpu.
zisslpcfi adds two new CSRs:
- CSR_LPLR: Strict forward control flow can be implemented by the
  compiler by matching a label at the target against the label generated
  at the call site. This CSR can be programmed with the label (preserving
  the current abi). New instructions are provided to place label values
  in this CSR.
- CSR_SSP: Return control flow is protected via the shadow stack.
  CSR_SSP contains the current shadow stack pointer.
Signed-off-by: Deepak Gupta <[email protected]>
---
arch/riscv/include/asm/csr.h | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index 0e571f6483d9..243031d1d305 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -18,6 +18,23 @@
#define SR_MPP _AC(0x00001800, UL) /* Previously Machine */
#define SR_SUM _AC(0x00040000, UL) /* Supervisor User Memory Access */
+/* zisslpcfi status bits */
+#define SR_UFCFIEN _AC(0x00800000, UL)
+#define SR_UBCFIEN _AC(0x01000000, UL)
+#define SR_SPELP _AC(0x02000000, UL)
+#define SR_MPELP _AC(0x04000000, UL)
+#ifdef CONFIG_RISCV_M_MODE
+#define SR_ELP SR_MPELP
+#else
+#define SR_ELP SR_SPELP
+#endif
+
+#ifdef CONFIG_RISCV_M_MODE
+#define CFISTATUS_MASK (SR_UFCFIEN | SR_UBCFIEN | SR_MPELP | SR_SPELP)
+#else
+#define CFISTATUS_MASK (SR_ELP | SR_UFCFIEN | SR_UBCFIEN)
+#endif
+
#define SR_FS _AC(0x00006000, UL) /* Floating-point Status */
#define SR_FS_OFF _AC(0x00000000, UL)
#define SR_FS_INITIAL _AC(0x00002000, UL)
@@ -168,6 +185,14 @@
#define ENVCFG_CBIE_INV _AC(0x3, UL)
#define ENVCFG_FIOM _AC(0x1, UL)
+/*
+ * zisslpcfi user mode csrs
+ * CSR_LPLR is a label register which holds compiler generated label that must be checked on target.
+ * CSR_SSP holds current shadow stack pointer.
+ */
+#define CSR_LPLR 0x006
+#define CSR_SSP 0x020
+
/* symbolic CSR names: */
#define CSR_CYCLE 0xc00
#define CSR_TIME 0xc01
--
2.25.1
Enables architectural support for shadow stack and landing pad
instructions for user mode on riscv.
This patch does the following:
- Defines a new structure cfi_status
- Includes cfi_status in thread_info
- Defines offsets to the new member fields of thread_info in asm-offsets.c
- Saves and restores cfi state on trap entry (U --> S) and exit (S --> U)
Signed-off-by: Deepak Gupta <[email protected]>
---
arch/riscv/include/asm/processor.h | 11 ++++++++
arch/riscv/include/asm/thread_info.h | 5 ++++
arch/riscv/kernel/asm-offsets.c | 5 ++++
arch/riscv/kernel/entry.S | 40 ++++++++++++++++++++++++++++
4 files changed, 61 insertions(+)
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index bdebce2cc323..f065309927b1 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -41,6 +41,17 @@ struct thread_struct {
unsigned long bad_cause;
};
+#if defined(CONFIG_USER_SHADOW_STACK) || defined(CONFIG_USER_INDIRECT_BR_LP)
+struct cfi_status {
+ unsigned int ufcfi_en : 1; /* Enable for forward cfi. Note that ELP goes in sstatus */
+ unsigned int ubcfi_en : 1; /* Enable for backward cfi. */
+ unsigned int rsvd1 : 30;
+ unsigned int lp_label; /* saved label value (25bit) */
+ long user_shdw_stk; /* Current user shadow stack pointer */
+ long shdw_stk_base; /* Base address of shadow stack */
+};
+#endif
+
/* Whitelist the fstate from the task_struct for hardened usercopy */
static inline void arch_thread_struct_whitelist(unsigned long *offset,
unsigned long *size)
diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
index 67322f878e0d..f74b8bd55d5b 100644
--- a/arch/riscv/include/asm/thread_info.h
+++ b/arch/riscv/include/asm/thread_info.h
@@ -65,6 +65,11 @@ struct thread_info {
*/
long kernel_sp; /* Kernel stack pointer */
long user_sp; /* User stack pointer */
+#if defined(CONFIG_USER_SHADOW_STACK) || defined(CONFIG_USER_INDIRECT_BR_LP)
+ /* user cfi state, present only when a cfi config is enabled */
+ /* note this includes the saved LPLR and SSP as well */
+ struct cfi_status user_cfi_state;
+#endif
int cpu;
};
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index df9444397908..340e6413cf3c 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -38,6 +38,11 @@ void asm_offsets(void)
OFFSET(TASK_TI_KERNEL_SP, task_struct, thread_info.kernel_sp);
OFFSET(TASK_TI_USER_SP, task_struct, thread_info.user_sp);
+#if defined(CONFIG_USER_SHADOW_STACK) || defined(CONFIG_USER_INDIRECT_BR_LP)
+ OFFSET(TASK_TI_USER_CFI_STATUS, task_struct, thread_info.user_cfi_state);
+ OFFSET(TASK_TI_USER_LPLR, task_struct, thread_info.user_cfi_state.lp_label);
+ OFFSET(TASK_TI_USER_SSP, task_struct, thread_info.user_cfi_state.user_shdw_stk);
+#endif
OFFSET(TASK_THREAD_F0, task_struct, thread.fstate.f[0]);
OFFSET(TASK_THREAD_F1, task_struct, thread.fstate.f[1]);
OFFSET(TASK_THREAD_F2, task_struct, thread.fstate.f[2]);
diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index 99d38fdf8b18..f283130c81ec 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -73,6 +73,31 @@ _save_context:
REG_S x30, PT_T5(sp)
REG_S x31, PT_T6(sp)
+#if defined(CONFIG_USER_SHADOW_STACK) || defined(CONFIG_USER_INDIRECT_BR_LP)
+ /*
+ * If U --> S, CSR_SCRATCH should be holding U TP
+ * If S --> S, CSR_SCRATCH should be holding S TP
+ * s2 == tp means, previous mode was S
+ * else previous mode U
+ * we need to save cfi status only when previous mode was U
+ */
+ csrr s2, CSR_SCRATCH
+ xor s2, s2, tp
+ beqz s2, skip_bcfi_save
+ /* load cfi status word */
+ lw s2, TASK_TI_USER_CFI_STATUS(tp)
+ andi s3, s2, 1
+ beqz s3, skip_fcfi_save
+ /* fcfi is enabled, capture ELP and LPLR state and record it */
+ csrr s3, CSR_LPLR /* record label register */
+ sw s3, TASK_TI_USER_LPLR(tp) /* save it back in thread_info structure */
+skip_fcfi_save:
+ andi s3, s2, 2
+ beqz s3, skip_bcfi_save
+ csrr s3, CSR_SSP
+ REG_S s3, TASK_TI_USER_SSP(tp) /* save user ssp in thread_info */
+skip_bcfi_save:
+#endif
/*
* Disable user-mode memory access as it should only be set in the
* actual user copy routines.
@@ -283,6 +308,21 @@ resume_userspace:
*/
csrw CSR_SCRATCH, tp
+#if defined(CONFIG_USER_SHADOW_STACK) || defined(CONFIG_USER_INDIRECT_BR_LP)
+ lw s2, TASK_TI_USER_CFI_STATUS(tp)
+ andi s3, s2, 1
+ beqz s3, skip_fcfi_resume
+ xor s3, s3, s3
+ lw s3, TASK_TI_USER_LPLR(tp)
+ csrw CSR_LPLR, s3
+skip_fcfi_resume:
+ andi s3, s2, 2
+ beqz s3, skip_bcfi_resume
+ REG_L s3, TASK_TI_USER_SSP(tp) /* restore user ssp from thread_info */
+ csrw CSR_SSP, s3
+skip_bcfi_resume:
+#endif
+
restore_all:
#ifdef CONFIG_TRACE_IRQFLAGS
REG_L s1, PT_STATUS(sp)
--
2.25.1
Major architectures (x86, arm, riscv) have introduced shadow stack
support in their architectures for return control flow integrity.
ISA extensions have special encodings so that a shadow stack page has a
special property in the page table, i.e. a readonly page that is still
writeable under special scenarios. As an example, x86 has `call` (or the
new shadow stack instructions) which can perform stores to the shadow
stack while regular stores are disallowed. Similarly riscv has the
sspush & ssamoswap instructions which can perform stores while regular
stores are not allowed.
Evidently, a page which can only be written by certain special
instructions but otherwise appears readonly to regular stores needs a
new protection flag.
This patch introduces a new mmap protection flag to indicate such
protection in a generic manner. Architectures can implement such
protection using arch-specific encodings in page tables.
Signed-off-by: Deepak Gupta <[email protected]>
---
include/uapi/asm-generic/mman-common.h | 6 ++++++
mm/mmap.c | 4 ++++
2 files changed, 10 insertions(+)
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 6ce1f1ceb432..c8e549b29a24 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -11,6 +11,12 @@
#define PROT_WRITE 0x2 /* page can be written */
#define PROT_EXEC 0x4 /* page can be executed */
#define PROT_SEM 0x8 /* page may be used for atomic ops */
+/*
+ * Major architectures (x86, aarch64, riscv) have shadow stack now. Each architecture can
+ * choose to implement different PTE encodings. x86 encodings are PTE.R=0, PTE.W=1, PTE.D=1
+ * riscv encodings are PTE.R=0, PTE.W=1. Aarch64 encodings are not published yet
+ */
+#define PROT_SHADOWSTACK 0x40
/* 0x10 reserved for arch-specific use */
/* 0x20 reserved for arch-specific use */
#define PROT_NONE 0x0 /* page can not be accessed */
diff --git a/mm/mmap.c b/mm/mmap.c
index 425a9349e610..7e877c93d711 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -46,6 +46,7 @@
#include <linux/pkeys.h>
#include <linux/oom.h>
#include <linux/sched/mm.h>
+#include <linux/processor.h>
#include <linux/uaccess.h>
#include <asm/cacheflush.h>
@@ -1251,6 +1252,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
if (!len)
return -EINVAL;
+ /* If PROT_SHADOWSTACK is specified and arch doesn't support it, return -EINVAL */
+ if ((prot & PROT_SHADOWSTACK) && !arch_supports_shadow_stack())
+ return -EINVAL;
/*
* Does the application expect PROT_READ to imply PROT_EXEC?
*
--
2.25.1
This patch implements the new mmap protection flag "PROT_SHADOWSTACK" on
riscv.
The Zisslpcfi extension on riscv uses R=0, W=1, X=0 as the shadow stack
PTE encoding. This encoding is reserved if Zisslpcfi is not implemented
or backward cfi is not enabled for the respective mode.
Signed-off-by: Deepak Gupta <[email protected]>
---
arch/riscv/include/asm/mman.h | 19 +++++++++++++++++++
arch/riscv/include/asm/pgtable.h | 1 +
arch/riscv/kernel/sys_riscv.c | 22 ++++++++++++++++++++++
arch/riscv/mm/init.c | 2 +-
4 files changed, 43 insertions(+), 1 deletion(-)
create mode 100644 arch/riscv/include/asm/mman.h
diff --git a/arch/riscv/include/asm/mman.h b/arch/riscv/include/asm/mman.h
new file mode 100644
index 000000000000..9c8499294a60
--- /dev/null
+++ b/arch/riscv/include/asm/mman.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_MMAN_H__
+#define __ASM_MMAN_H__
+
+#include <linux/compiler.h>
+#include <linux/types.h>
+#include <uapi/asm/mman.h>
+
+static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
+ unsigned long pkey __always_unused)
+{
+ unsigned long ret = 0;
+
+ ret = (prot & PROT_SHADOWSTACK)?VM_WRITE:0;
+ return ret;
+}
+#define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey)
+
+#endif /* ! __ASM_MMAN_H__ */
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 4eba9a98d0e3..74dbe122f2fa 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -159,6 +159,7 @@ extern struct pt_alloc_ops pt_ops __initdata;
#define PAGE_READ_EXEC __pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_EXEC)
#define PAGE_WRITE_EXEC __pgprot(_PAGE_BASE | _PAGE_READ | \
_PAGE_EXEC | _PAGE_WRITE)
+#define PAGE_SHADOWSTACK __pgprot(_PAGE_BASE | _PAGE_WRITE)
#define PAGE_COPY PAGE_READ
#define PAGE_COPY_EXEC PAGE_EXEC
diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
index 5d3f2fbeb33c..c3cf6b94c710 100644
--- a/arch/riscv/kernel/sys_riscv.c
+++ b/arch/riscv/kernel/sys_riscv.c
@@ -18,6 +18,28 @@ static long riscv_sys_mmap(unsigned long addr, unsigned long len,
if (unlikely(offset & (~PAGE_MASK >> page_shift_offset)))
return -EINVAL;
+ /*
+ * If only PROT_WRITE is specified then extend that to PROT_READ
+ * protection_map[VM_WRITE] is now going to select shadow stack encodings.
+ * So specifying PROT_WRITE actually should select protection_map [VM_WRITE | VM_READ]
+ * If user wants to create shadow stack then they should specify PROT_SHADOWSTACK
+ * protection
+ */
+ if (unlikely((prot & PROT_WRITE) && !(prot & PROT_READ)))
+ prot |= PROT_READ;
+
+ /*
+ * PROT_SHADOWSTACK is new protection flag. If specified with other like PROT_WRITE or
+ * PROT_READ PROT_SHADOWSTACK takes precedence. We can do either of following
+ * - ensure no other protection flags are specified along with it and return EINVAL
+ * OR
+ * - ensure we clear other protection flags.
+ * Choosing to follow former, if any other bit is set in prot, we return EINVAL
+ * Other architectures can treat different combinations for PROT_SHADOWSTACK
+ */
+ if (unlikely((prot & PROT_SHADOWSTACK) && (prot & ~PROT_SHADOWSTACK)))
+ return -EINVAL;
+
return ksys_mmap_pgoff(addr, len, prot, flags, fd,
offset >> (PAGE_SHIFT - page_shift_offset));
}
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 478d6763a01a..ba8138c90450 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -294,7 +294,7 @@ static pmd_t __maybe_unused early_dtb_pmd[PTRS_PER_PMD] __initdata __aligned(PAG
static const pgprot_t protection_map[16] = {
[VM_NONE] = PAGE_NONE,
[VM_READ] = PAGE_READ,
- [VM_WRITE] = PAGE_COPY,
+ [VM_WRITE] = PAGE_SHADOWSTACK,
[VM_WRITE | VM_READ] = PAGE_COPY,
[VM_EXEC] = PAGE_EXEC,
[VM_EXEC | VM_READ] = PAGE_READ_EXEC,
--
2.25.1
Binaries enabled with support for control-flow integrity will have new
instructions that may fault on cpus which don't implement cfi
mechanisms.
This change adds
- a stub for setting up cfi state when loading a binary. An
  architecture-specific implementation can choose to implement this stub
  and set up cfi state for the program.
- riscv ELF flag markers for forward cfi and backward cfi in
  uapi/linux/elf.h
Signed-off-by: Deepak Gupta <[email protected]>
---
fs/binfmt_elf.c | 5 +++++
include/linux/elf.h | 8 ++++++++
include/uapi/linux/elf.h | 6 ++++++
3 files changed, 19 insertions(+)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 9a780fafc539..bb431052eb01 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1277,6 +1277,11 @@ static int load_elf_binary(struct linux_binprm *bprm)
set_binfmt(&elf_format);
+#if defined(CONFIG_USER_SHADOW_STACK) || defined(CONFIG_USER_INDIRECT_BR_LP)
+ retval = arch_elf_setup_cfi_state(&arch_state);
+ if (retval < 0)
+ goto out;
+#endif
#ifdef ARCH_HAS_SETUP_ADDITIONAL_PAGES
retval = ARCH_SETUP_ADDITIONAL_PAGES(bprm, elf_ex, !!interpreter);
if (retval < 0)
diff --git a/include/linux/elf.h b/include/linux/elf.h
index c9a46c4e183b..106d28f065aa 100644
--- a/include/linux/elf.h
+++ b/include/linux/elf.h
@@ -109,4 +109,12 @@ static inline int arch_elf_adjust_prot(int prot,
}
#endif
+#if defined(CONFIG_USER_SHADOW_STACK) || defined(CONFIG_USER_INDIRECT_BR_LP)
+extern int arch_elf_setup_cfi_state(const struct arch_elf_state *state);
+#else
+static inline int arch_elf_setup_cfi_state(const struct arch_elf_state *state)
+{
+ return 0;
+}
+#endif
#endif /* _LINUX_ELF_H */
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index 4c6a8fa5e7ed..1cbd332061dc 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -468,4 +468,10 @@ typedef struct elf64_note {
/* Bits for GNU_PROPERTY_AARCH64_FEATURE_1_BTI */
#define GNU_PROPERTY_AARCH64_FEATURE_1_BTI (1U << 0)
+/* .note.gnu.property types for RISCV: */
+/* Bits for GNU_PROPERTY_RISCV_FEATURE_1_FCFI/BCFI */
+#define GNU_PROPERTY_RISCV_FEATURE_1_AND 0xc0000000
+#define GNU_PROPERTY_RISCV_FEATURE_1_FCFI (1u << 0)
+#define GNU_PROPERTY_RISCV_FEATURE_1_BCFI (1u << 1)
+
#endif /* _UAPI_LINUX_ELF_H */
--
2.25.1
Binaries enabled for Zisslpcfi will have new instructions that may fault
on risc-v cpus which don't implement Zimops or Zisslpcfi. This change
adds
- support for parsing the new backward and forward cfi flags in
  PT_GNU_PROPERTY
- setting cfi state on recognizing cfi flags in the ELF
- enabling backward cfi and forward cfi in sstatus
Signed-off-by: Deepak Gupta <[email protected]>
---
arch/riscv/include/asm/elf.h | 54 +++++++++++++++++++++++++++++
arch/riscv/kernel/process.c | 67 ++++++++++++++++++++++++++++++++++++
2 files changed, 121 insertions(+)
diff --git a/arch/riscv/include/asm/elf.h b/arch/riscv/include/asm/elf.h
index e7acffdf21d2..60ac2d2390ee 100644
--- a/arch/riscv/include/asm/elf.h
+++ b/arch/riscv/include/asm/elf.h
@@ -14,6 +14,7 @@
#include <asm/auxvec.h>
#include <asm/byteorder.h>
#include <asm/cacheinfo.h>
+#include <linux/processor.h>
/*
* These are used to set parameters in the core dumps.
@@ -140,4 +141,57 @@ extern int compat_arch_setup_additional_pages(struct linux_binprm *bprm,
compat_arch_setup_additional_pages
#endif /* CONFIG_COMPAT */
+
+#define RISCV_ELF_FCFI (1 << 0)
+#define RISCV_ELF_BCFI (1 << 1)
+
+#ifdef CONFIG_ARCH_BINFMT_ELF_STATE
+struct arch_elf_state {
+ int flags;
+};
+
+#define INIT_ARCH_ELF_STATE { \
+ .flags = 0, \
+}
+#endif
+
+#ifdef CONFIG_ARCH_USE_GNU_PROPERTY
+static inline int arch_parse_elf_property(u32 type, const void *data,
+ size_t datasz, bool compat,
+ struct arch_elf_state *arch)
+{
+ /*
+ * TODO: Do we want to support in 32bit/compat?
+ * may be return 0 for now.
+ */
+ if (IS_ENABLED(CONFIG_COMPAT) && compat)
+ return 0;
+ if ((type & GNU_PROPERTY_RISCV_FEATURE_1_AND) == GNU_PROPERTY_RISCV_FEATURE_1_AND) {
+ const u32 *p = data;
+
+ if (datasz != sizeof(*p))
+ return -ENOEXEC;
+ if (arch_supports_indirect_br_lp_instr() &&
+ (*p & GNU_PROPERTY_RISCV_FEATURE_1_FCFI))
+ arch->flags |= RISCV_ELF_FCFI;
+ if (arch_supports_shadow_stack() && (*p & GNU_PROPERTY_RISCV_FEATURE_1_BCFI))
+ arch->flags |= RISCV_ELF_BCFI;
+ }
+ return 0;
+}
+
+static inline int arch_elf_pt_proc(void *ehdr, void *phdr,
+ struct file *f, bool is_interp,
+ struct arch_elf_state *state)
+{
+ return 0;
+}
+
+static inline int arch_check_elf(void *ehdr, bool has_interp,
+ void *interp_ehdr,
+ struct arch_elf_state *state)
+{
+ return 0;
+}
+#endif
#endif /* _ASM_RISCV_ELF_H */
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 8955f2432c2d..db676262e61e 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -24,6 +24,7 @@
#include <asm/switch_to.h>
#include <asm/thread_info.h>
#include <asm/cpuidle.h>
+#include <linux/mman.h>
register unsigned long gp_in_global __asm__("gp");
@@ -135,6 +136,14 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
else
regs->status |= SR_UXL_64;
#endif
+#ifdef CONFIG_USER_SHADOW_STACK
+ if (current_thread_info()->user_cfi_state.ufcfi_en)
+ regs->status |= SR_UFCFIEN;
+#endif
+#ifdef CONFIG_USER_INDIRECT_BR_LP
+ if (current_thread_info()->user_cfi_state.ubcfi_en)
+ regs->status |= SR_UBCFIEN;
+#endif
}
void flush_thread(void)
@@ -189,3 +198,61 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
p->thread.sp = (unsigned long)childregs; /* kernel sp */
return 0;
}
+
+
+int allocate_shadow_stack(unsigned long *shadow_stack_base, unsigned long *shdw_size)
+{
+ int flags = MAP_ANONYMOUS | MAP_PRIVATE;
+ struct mm_struct *mm = current->mm;
+ unsigned long addr, populate, size;
+ *shadow_stack_base = 0;
+
+ if (!shdw_size)
+ return -EINVAL;
+
+ size = *shdw_size;
+
+ /* If size is 0, compute a default size */
+ if (size == 0)
+ size = round_up(min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G), PAGE_SIZE);
+ mmap_write_lock(mm);
+ addr = do_mmap(NULL, 0, size, PROT_SHADOWSTACK, flags, 0,
+ &populate, NULL);
+ mmap_write_unlock(mm);
+ if (IS_ERR_VALUE(addr))
+ return PTR_ERR((void *)addr);
+ *shadow_stack_base = addr;
+ *shdw_size = size;
+ return 0;
+}
+
+#if defined(CONFIG_USER_SHADOW_STACK) || defined(CONFIG_USER_INDIRECT_BR_LP)
+/* Called from load_elf_binary(). This sets up the shadow stack and forward cfi enables. */
+int arch_elf_setup_cfi_state(const struct arch_elf_state *state)
+{
+ int ret = 0;
+ unsigned long shadow_stack_base = 0;
+ unsigned long shadow_stk_size = 0;
+ struct thread_info *info = NULL;
+
+ info = current_thread_info();
+ /* setup back cfi state */
+ /* setup cfi state only if implementation supports it */
+ if (arch_supports_shadow_stack() && (state->flags & RISCV_ELF_BCFI)) {
+ info->user_cfi_state.ubcfi_en = 1;
+ ret = allocate_shadow_stack(&shadow_stack_base, &shadow_stk_size);
+ if (ret)
+ return ret;
+
+ info->user_cfi_state.user_shdw_stk = (shadow_stack_base + shadow_stk_size);
+ info->user_cfi_state.shdw_stk_base = shadow_stack_base;
+ }
+ /* setup forward cfi state */
+ if (arch_supports_indirect_br_lp_instr() && (state->flags & RISCV_ELF_FCFI)) {
+ info->user_cfi_state.ufcfi_en = 1;
+ info->user_cfi_state.lp_label = 0;
+ }
+
+ return ret;
+}
+#endif
\ No newline at end of file
--
2.25.1
Shadow stack loads/stores to valid non-shadow stack memory raise access
faults. Regular stores to shadow stack memory raise access faults as
well.
This patch implements the load and store access handlers. The load
access handler reads the faulting instruction, and if it was an
instruction issuing an ss load, it invokes the page fault handler with a
synthetic cause (marked reserved in the priv spec).
Similarly, the store access handler reads the faulting instruction, and
if it was an instruction issuing an ss store, it invokes the page fault
handler with a synthetic cause (reserved in the spec).
All other cases in the load/store access handlers lead to SIGSEGV.
There might be concerns that using a reserved exception code may create
an issue, because some riscv implementation might already be using this
code. However, the counter argument is that the linux kernel is not
using this code, and thus the linux kernel should be able to use this
exception code on such hardware.
Signed-off-by: Deepak Gupta <[email protected]>
---
arch/riscv/include/asm/csr.h | 3 ++
arch/riscv/kernel/traps.c | 99 ++++++++++++++++++++++++++++++++++++
arch/riscv/mm/fault.c | 23 ++++++++-
3 files changed, 124 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index 243031d1d305..828b1c2a74c2 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -104,6 +104,9 @@
#define EXC_SUPERVISOR_SYSCALL 10
#define EXC_INST_PAGE_FAULT 12
#define EXC_LOAD_PAGE_FAULT 13
+#ifdef CONFIG_USER_SHADOW_STACK
+#define EXC_SS_ACCESS_PAGE_FAULT 14
+#endif
#define EXC_STORE_PAGE_FAULT 15
#define EXC_INST_GUEST_PAGE_FAULT 20
#define EXC_LOAD_GUEST_PAGE_FAULT 21
diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index 549bde5c970a..5553b8d48ba5 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -94,6 +94,85 @@ static void do_trap_error(struct pt_regs *regs, int signo, int code,
}
}
+/* Zisslpcfi instructions encodings */
+#define SS_PUSH_POP 0x81C04073
+#define SS_AMOSWAP 0x82004073
+
+bool is_ss_load_store_insn(unsigned long insn)
+{
+ if ((insn & SS_PUSH_POP) == SS_PUSH_POP)
+ return true;
+ /*
+ * SS_AMOSWAP overlaps with LP_S_LL.
+ * But LP_S_LL can never raise access fault
+ */
+ if ((insn & SS_AMOSWAP) == SS_AMOSWAP)
+ return true;
+
+ return false;
+}
+
+ulong get_instruction(ulong epc)
+{
+ ulong *epc_ptr = (ulong *) epc;
+ ulong insn = 0;
+
+ __enable_user_access();
+ insn = *epc_ptr;
+ __disable_user_access();
+ return insn;
+}
+
+#ifdef CONFIG_USER_SHADOW_STACK
+extern asmlinkage void do_page_fault(struct pt_regs *regs);
+
+/*
+ * If CFI is enabled, then a load access fault can occur if an
+ * ss load (sspop/ssamoswap) happens on non-shadow stack memory.
+ * This is a valid case when we want to do COW on SS memory on `fork` or when memory is swapped out.
+ * SS memory is marked readonly and a subsequent sspop or sspush will lead to a
+ * load/store access fault. We need to decode the instruction. If it's sspop or sspush,
+ * the page fault handler is invoked.
+ */
+int handle_load_access_fault(struct pt_regs *regs)
+{
+ ulong insn = get_instruction(regs->epc);
+
+ if (is_ss_load_store_insn(insn)) {
+ regs->cause = EXC_SS_ACCESS_PAGE_FAULT;
+ do_page_fault(regs);
+ return 0;
+ }
+
+ return 1;
+}
+/*
+ * If CFI is enabled, then a store access fault can occur if
+ * -- ssstore (sspush/ssamoswap) happens on non-shadow stack memory
+ * -- regular store happens on shadow stack memory
+ */
+int handle_store_access_fault(struct pt_regs *regs)
+{
+ ulong insn = get_instruction(regs->epc);
+
+ /*
+ * if a shadow stack store insn, change cause to
+ * synthetic SS_ACCESS_PAGE_FAULT
+ */
+ if (is_ss_load_store_insn(insn)) {
+ regs->cause = EXC_SS_ACCESS_PAGE_FAULT;
+ do_page_fault(regs);
+ return 0;
+ }
+ /*
+ * Reaching here means it was a regular store.
+ * A regular access fault has always delivered SIGSEGV,
+ * and a regular store to shadow stack is also a SIGSEGV.
+ */
+ return 1;
+}
+#endif
+
#if defined(CONFIG_XIP_KERNEL) && defined(CONFIG_RISCV_ALTERNATIVE)
#define __trap_section __section(".xip.traps")
#else
@@ -113,8 +192,18 @@ DO_ERROR_INFO(do_trap_insn_fault,
SIGSEGV, SEGV_ACCERR, "instruction access fault");
DO_ERROR_INFO(do_trap_insn_illegal,
SIGILL, ILL_ILLOPC, "illegal instruction");
+#ifdef CONFIG_USER_SHADOW_STACK
+asmlinkage void __trap_section do_trap_load_fault(struct pt_regs *regs)
+{
+ if (!handle_load_access_fault(regs))
+ return;
+ do_trap_error(regs, SIGSEGV, SEGV_ACCERR, regs->epc,
+ "load access fault");
+}
+#else
DO_ERROR_INFO(do_trap_load_fault,
SIGSEGV, SEGV_ACCERR, "load access fault");
+#endif
#ifndef CONFIG_RISCV_M_MODE
DO_ERROR_INFO(do_trap_load_misaligned,
SIGBUS, BUS_ADRALN, "Oops - load address misaligned");
@@ -140,8 +229,18 @@ asmlinkage void __trap_section do_trap_store_misaligned(struct pt_regs *regs)
"Oops - store (or AMO) address misaligned");
}
#endif
+#ifdef CONFIG_USER_SHADOW_STACK
+asmlinkage void __trap_section do_trap_store_fault(struct pt_regs *regs)
+{
+ if (!handle_store_access_fault(regs))
+ return;
+ do_trap_error(regs, SIGSEGV, SEGV_ACCERR, regs->epc,
+ "store (or AMO) access fault");
+}
+#else
DO_ERROR_INFO(do_trap_store_fault,
SIGSEGV, SEGV_ACCERR, "store (or AMO) access fault");
+#endif
DO_ERROR_INFO(do_trap_ecall_u,
SIGILL, ILL_ILLTRP, "environment call from U-mode");
DO_ERROR_INFO(do_trap_ecall_s,
diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index d86f7cebd4a7..b5ecf36eba3d 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -18,6 +18,7 @@
#include <asm/ptrace.h>
#include <asm/tlbflush.h>
+#include <asm/pgtable.h>
#include "../kernel/head.h"
@@ -177,6 +178,7 @@ static inline void vmalloc_fault(struct pt_regs *regs, int code, unsigned long a
static inline bool access_error(unsigned long cause, struct vm_area_struct *vma)
{
+ unsigned long prot = 0, shdw_stk_mask = 0;
switch (cause) {
case EXC_INST_PAGE_FAULT:
if (!(vma->vm_flags & VM_EXEC)) {
@@ -194,6 +196,20 @@ static inline bool access_error(unsigned long cause, struct vm_area_struct *vma)
return true;
}
break;
+#ifdef CONFIG_USER_SHADOW_STACK
+ /*
+ * If it is a ss access page fault, the vma must have only VM_WRITE
+ * and the page prot must match PAGE_SHADOWSTACK.
+ */
+ case EXC_SS_ACCESS_PAGE_FAULT:
+ prot = pgprot_val(vma->vm_page_prot);
+ shdw_stk_mask = pgprot_val(PAGE_SHADOWSTACK);
+ if (((vma->vm_flags & (VM_WRITE | VM_READ | VM_EXEC)) != VM_WRITE) ||
+ ((prot & shdw_stk_mask) != shdw_stk_mask)) {
+ return true;
+ }
+ break;
+#endif
default:
panic("%s: unhandled cause %lu", __func__, cause);
}
@@ -274,7 +290,12 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
- if (cause == EXC_STORE_PAGE_FAULT)
+ if (cause == EXC_STORE_PAGE_FAULT
+#ifdef CONFIG_USER_SHADOW_STACK
+ || cause == EXC_SS_ACCESS_PAGE_FAULT
+ /* if config says shadow stack and cause is ss access then indicate a write */
+#endif
+ )
flags |= FAULT_FLAG_WRITE;
else if (cause == EXC_INST_PAGE_FAULT)
flags |= FAULT_FLAG_INSTRUCTION;
--
2.25.1
`fork` implements copy on write (COW) by making pages readonly in both
child and parent.
ptep_set_wrprotect and pte_wrprotect clear _PAGE_WRITE in the PTE.
The assumption is that the page is readable, and on fault, copy on write
happens.
For shadow stack pages, clearing the W bit makes them XWR = 000. This
results in a wrong PTE setting which says no permissions but V=1, with
the PFN field pointing to the final page. The desired behavior instead
is to turn it into a readable page, take an access (load/store) fault on
sspush/sspop (shadow stack), and then perform COW on such pages. This
way regular reads are still allowed and do not lead to COW, maintaining
the current behavior of COW on non-shadow stack but writeable memory.
On the other hand, it doesn't interfere with existing COW for read-write
memory. The assumption is that _PAGE_READ must already have been set,
and thus setting _PAGE_READ again is harmless.
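A compact sketch of the permission transitions described above (my
summary of this commit message, not text from the patch):

    /*
     * wrprotect transitions of the riscv PTE XWR bits:
     *
     *   regular RW page: XWR = 011 --clear W--------> 001 (read-only; store faults -> COW)
     *   shadow stack:    XWR = 010 --clear W--------> 000 (reserved encoding with V=1: wrong)
     *   shadow stack:    XWR = 010 --clear W, set R-> 001 (read-only; sspush/sspop fault -> COW,
     *                                                      regular loads still succeed)
     */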
Signed-off-by: Deepak Gupta <[email protected]>
---
arch/riscv/include/asm/pgtable.h | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 74dbe122f2fa..13b325253c99 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -334,7 +334,7 @@ static inline int pte_special(pte_t pte)
static inline pte_t pte_wrprotect(pte_t pte)
{
- return __pte(pte_val(pte) & ~(_PAGE_WRITE));
+ return __pte((pte_val(pte) & ~(_PAGE_WRITE)) | (_PAGE_READ));
}
/* static inline pte_t pte_mkread(pte_t pte) */
@@ -509,7 +509,15 @@ static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
static inline void ptep_set_wrprotect(struct mm_struct *mm,
unsigned long address, pte_t *ptep)
{
- atomic_long_and(~(unsigned long)_PAGE_WRITE, (atomic_long_t *)ptep);
+ volatile pte_t read_pte = *ptep;
+ /*
+ * ptep_set_wrprotect can be called for shadow stack ranges too.
+ * shadow stack memory is XWR = 010 and thus clearing _PAGE_WRITE will lead to
+ * encoding 000b which is wrong encoding with V = 1. This should lead to page fault
+ * but we dont want this wrong configuration to be set in page tables.
+ */
+ atomic_long_set((atomic_long_t *)ptep,
+ ((pte_val(read_pte) & ~(unsigned long)_PAGE_WRITE) | _PAGE_READ));
}
#define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
--
2.25.1
maybe_mkwrite creates PTEs with WRITE encodings for the underlying arch
if VM_WRITE is turned on in vma->vm_flags. Shadow stack memory is
writeable memory, except it can only be written by certain specific
instructions. This patch allows maybe_mkwrite to create shadow stack
PTEs if the vma is a shadow stack VMA. Each arch can define which
combination of VMA flags means a shadow stack.
Additionally, pte_mkshdwstk must be provided by the arch-specific PTE
construction headers to create shadow stack PTEs (in the arch-specific
pgtable.h).
This patch provides a dummy/stub pte_mkshdwstk if
CONFIG_USER_SHADOW_STACK is not selected.
Signed-off-by: Deepak Gupta <[email protected]>
---
include/linux/mm.h | 23 +++++++++++++++++++++--
include/linux/pgtable.h | 4 ++++
2 files changed, 25 insertions(+), 2 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8f857163ac89..a7705bc49bfe 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1093,6 +1093,21 @@ static inline unsigned long thp_size(struct page *page)
void free_compound_page(struct page *page);
#ifdef CONFIG_MMU
+
+#ifdef CONFIG_USER_SHADOW_STACK
+bool arch_is_shadow_stack_vma(struct vm_area_struct *vma);
+#endif
+
+static inline bool
+is_shadow_stack_vma(struct vm_area_struct *vma)
+{
+#ifdef CONFIG_USER_SHADOW_STACK
+ return arch_is_shadow_stack_vma(vma);
+#else
+ return false;
+#endif
+}
+
/*
* Do pte_mkwrite, but only if the vma says VM_WRITE. We do this when
* servicing faults for write access. In the normal case, do always want
@@ -1101,8 +1116,12 @@ void free_compound_page(struct page *page);
*/
static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
{
- if (likely(vma->vm_flags & VM_WRITE))
- pte = pte_mkwrite(pte);
+ if (likely(vma->vm_flags & VM_WRITE)) {
+ if (unlikely(is_shadow_stack_vma(vma)))
+ pte = pte_mkshdwstk(pte);
+ else
+ pte = pte_mkwrite(pte);
+ }
return pte;
}
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 1159b25b0542..94b157218c73 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1736,4 +1736,8 @@ pgprot_t vm_get_page_prot(unsigned long vm_flags) \
} \
EXPORT_SYMBOL(vm_get_page_prot);
+#ifndef CONFIG_USER_SHADOW_STACK
+#define pte_mkshdwstk(pte) pte
+#endif
+
#endif /* _LINUX_PGTABLE_H */
--
2.25.1
This patch implements creating shadow stack ptes (on riscv) if
CONFIG_USER_SHADOW_STACK is selected. Creating a shadow stack PTE on
riscv means clearing RWX and then setting W=1.
Additionally, this patch implements `arch_is_shadow_stack_vma`. Each
arch can decide which combination of VMA flags is treated as a shadow
stack. riscv chooses to follow the PTE encodings for VMA flags as well,
i.e. VM_WRITE only (no VM_READ or VM_EXEC) means it's a shadow stack vma
on riscv.
Signed-off-by: Deepak Gupta <[email protected]>
---
arch/riscv/include/asm/pgtable.h | 8 ++++++++
arch/riscv/mm/pageattr.c | 7 +++++++
2 files changed, 15 insertions(+)
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 13b325253c99..11a423e78d52 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -344,6 +344,14 @@ static inline pte_t pte_mkwrite(pte_t pte)
return __pte(pte_val(pte) | _PAGE_WRITE);
}
+#ifdef CONFIG_USER_SHADOW_STACK
+static inline pte_t pte_mkshdwstk(pte_t pte)
+{
+ /* shadow stack on risc-v is XWR = 010. Clear everything and only set _PAGE_WRITE */
+ return __pte((pte_val(pte) & ~(_PAGE_LEAF)) | _PAGE_WRITE);
+}
+#endif
+
/* static inline pte_t pte_mkexec(pte_t pte) */
static inline pte_t pte_mkdirty(pte_t pte)
diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c
index 86c56616e5de..582e17c4dc28 100644
--- a/arch/riscv/mm/pageattr.c
+++ b/arch/riscv/mm/pageattr.c
@@ -233,3 +233,10 @@ bool kernel_page_present(struct page *page)
pte = pte_offset_kernel(pmd, addr);
return pte_present(*pte);
}
+
+#ifdef CONFIG_USER_SHADOW_STACK
+bool arch_is_shadow_stack_vma(struct vm_area_struct *vma)
+{
+ return ((vma->vm_flags & (VM_WRITE | VM_READ | VM_EXEC)) == VM_WRITE);
+}
+#endif
\ No newline at end of file
--
2.25.1
This patch adds a per-task audit mode which suppresses cfi violations
reported as illegal instruction exceptions.
Signed-off-by: Deepak Gupta <[email protected]>
---
arch/riscv/include/asm/processor.h | 3 ++-
arch/riscv/kernel/process.c | 2 ++
arch/riscv/kernel/traps.c | 7 ++++++-
3 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index f065309927b1..39c36f739ebb 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -45,7 +45,8 @@ struct thread_struct {
struct cfi_status {
unsigned int ufcfi_en : 1; /* Enable for forward cfi. Note that ELP goes in sstatus */
unsigned int ubcfi_en : 1; /* Enable for backward cfi. */
- unsigned int rsvd1 : 30;
+ unsigned int audit_mode : 1;
+ unsigned int rsvd1 : 29;
unsigned int lp_label; /* saved label value (25bit) */
long user_shdw_stk; /* Current user shadow stack pointer */
long shdw_stk_base; /* Base address of shadow stack */
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index db676262e61e..bfd8511914d9 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -246,11 +246,13 @@ int arch_elf_setup_cfi_state(const struct arch_elf_state *state)
info->user_cfi_state.user_shdw_stk = (shadow_stack_base + shadow_stk_size);
info->user_cfi_state.shdw_stk_base = shadow_stack_base;
+ info->user_cfi_state.audit_mode = 1;
}
/* setup forward cfi state */
if (arch_supports_indirect_br_lp_instr() && (state->flags & RISCV_ELF_FCFI)) {
info->user_cfi_state.ufcfi_en = 1;
info->user_cfi_state.lp_label = 0;
+ info->user_cfi_state.audit_mode = 1;
}
return ret;
diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index a292699f4f25..1901a8b73de5 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -165,6 +165,7 @@ int handle_illegal_instruction(struct pt_regs *regs)
if (arch_supports_indirect_br_lp_instr() &&
#ifdef CONFIG_USER_INDIRECT_BR_LP
info->user_cfi_state.ufcfi_en &&
+ info->user_cfi_state.audit_mode &&
#endif
(regs->status & SR_ELP)) {
pr_warn("cfi violation (elp): comm = %s, task = %p\n", task->comm, task);
@@ -172,7 +173,11 @@ int handle_illegal_instruction(struct pt_regs *regs)
return 0;
}
/* if faulting opcode is sschkra/lpcll/lpcml/lpcul, advance PC and resume */
- if (is_cfi_violation_insn(insn)) {
+ if (is_cfi_violation_insn(insn)
+#if defined(CONFIG_USER_SHADOW_STACK) || defined(CONFIG_USER_INDIRECT_BR_LP)
+ && info->user_cfi_state.audit_mode
+#endif
+ ) {
/* no compressed form for zisslpcfi instructions */
regs->epc += 4;
return 0;
--
2.25.1
The Zisslpcfi spec proposes that cfi violations are reported as illegal
instruction exceptions. The following are the cases:
- elp missing: An indirect jmp/call landed on an instruction which is
  not `lpcll`
- label mismatch: The static label embedded in instr `lpcll/lpcml/lpcul`
  doesn't match the respective label in CSR_LPLR
- sschkra: x1 and x5 don't match.
The current changes run user code in audit mode. That means any cfi
violation is suppressed and the app is allowed to continue.
Signed-off-by: Deepak Gupta <[email protected]>
---
arch/riscv/kernel/traps.c | 79 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 77 insertions(+), 2 deletions(-)
diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index 5553b8d48ba5..a292699f4f25 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -97,6 +97,10 @@ static void do_trap_error(struct pt_regs *regs, int signo, int code,
/* Zisslpcfi instructions encodings */
#define SS_PUSH_POP 0x81C04073
#define SS_AMOSWAP 0x82004073
+#define SS_CHECKRA 0x8A12C073
+#define LP_C_LL 0x83004073
+#define LP_C_ML 0x86804073
+#define LP_C_UL 0x8B804073
bool is_ss_load_store_insn(unsigned long insn)
{
@@ -112,6 +116,71 @@ bool is_ss_load_store_insn(unsigned long insn)
return false;
}
+bool is_cfi_violation_insn(unsigned long insn)
+{
+ struct task_struct *task = current;
+ bool ss_exist = false, lp_exist = false;
+
+ ss_exist = arch_supports_shadow_stack();
+ lp_exist = arch_supports_indirect_br_lp_instr();
+
+ if (ss_exist && (insn == SS_CHECKRA)) {
+ pr_warn("cfi violation (sschkra): comm = %s, task = %p\n", task->comm, task);
+ return true;
+ }
+ if (lp_exist && ((insn & LP_C_LL) == LP_C_LL)) {
+ pr_warn("cfi violation (lpcll): comm = %s, task = %p\n", task->comm, task);
+ return true;
+ }
+ if (lp_exist && ((insn & LP_C_ML) == LP_C_ML)) {
+ pr_warn("cfi violation (lpcml): comm = %s, task = %p\n", task->comm, task);
+ return true;
+ }
+ if (lp_exist && ((insn & LP_C_UL) == LP_C_UL)) {
+ pr_warn("cfi violation (lpcul): comm = %s, task = %p\n", task->comm, task);
+ return true;
+ }
+
+ return false;
+}
+
+int handle_illegal_instruction(struct pt_regs *regs)
+{
+ /* stval should hold faulting opcode */
+ unsigned long insn = csr_read(stval);
+ struct thread_info *info = NULL;
+ struct task_struct *task = current;
+
+ info = current_thread_info();
+ /*
+ * If CFI is enabled, then the following instructions lead to an illegal instruction fault
+ * -- sschkra: x1 and x5 mismatch
+ * -- ELP = 1, any instruction other than lpcll will fault
+ * -- lpcll will fault if the lower label doesn't match LPLR.LL
+ * -- lpcml will fault if the mid label doesn't match LPLR.ML
+ * -- lpcul will fault if the upper label doesn't match LPLR.UL
+ */
+
+ /* If fcfi enabled and ELP = 1, suppress ELP (audit mode) and resume */
+ if (arch_supports_indirect_br_lp_instr() &&
+#ifdef CONFIG_USER_INDIRECT_BR_LP
+ info->user_cfi_state.ufcfi_en &&
+#endif
+ (regs->status & SR_ELP)) {
+ pr_warn("cfi violation (elp): comm = %s, task = %p\n", task->comm, task);
+ regs->status &= ~(SR_ELP);
+ return 0;
+ }
+ /* if faulting opcode is sschkra/lpcll/lpcml/lpcul, advance PC and resume */
+ if (is_cfi_violation_insn(insn)) {
+ /* no compressed form for zisslpcfi instructions */
+ regs->epc += 4;
+ return 0;
+ }
+
+ return 1;
+}
+
ulong get_instruction(ulong epc)
{
ulong *epc_ptr = (ulong *) epc;
@@ -190,8 +259,14 @@ DO_ERROR_INFO(do_trap_insn_misaligned,
SIGBUS, BUS_ADRALN, "instruction address misaligned");
DO_ERROR_INFO(do_trap_insn_fault,
SIGSEGV, SEGV_ACCERR, "instruction access fault");
-DO_ERROR_INFO(do_trap_insn_illegal,
- SIGILL, ILL_ILLOPC, "illegal instruction");
+
+asmlinkage void __trap_section do_trap_insn_illegal(struct pt_regs *regs)
+{
+ if (!handle_illegal_instruction(regs))
+ return;
+ do_trap_error(regs, SIGILL, ILL_ILLOPC, regs->epc,
+ "illegal instruction");
+}
#ifdef CONFIG_USER_SHADOW_STACK
asmlinkage void __trap_section do_trap_load_fault(struct pt_regs *regs)
{
--
2.25.1
To maintain control flow integrity of a program, the integrity of indirect
control transfers has to be maintained. In almost all architectures there
are two mechanisms for indirect control transfer
- Indirect calls relying on a memory operand.
- Returns, which pop an address from the stack and return to the caller.
Control transfers relying on memory operands are inherently susceptible to
memory corruption bugs, allowing attackers to perform code re-use attacks
that are eventually used to run the attacker's payload.
All major architectures (x86, aarch64 and riscv) have introduced hardware
assistance in the form of architectural extensions to protect returns (using
an alternate shadow/control stack) and forward control flow (by enforcing
that all indirect control transfers land on a landing pad instruction).
This patch introduces two new CONFIGs
- CONFIG_USER_SHADOW_STACK
Config to enable kernel support for user mode shadow stacks
- CONFIG_USER_INDIRECT_BR_LP
Config to enable kernel support for enforcing a landing pad instruction
at the target of an indirect control transfer.
Signed-off-by: Deepak Gupta <[email protected]>
---
init/Kconfig | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/init/Kconfig b/init/Kconfig
index 44e90b28a30f..8867ea4b074f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -121,6 +121,25 @@ config THREAD_INFO_IN_TASK
One subtle change that will be needed is to use try_get_task_stack()
and put_task_stack() in save_thread_stack_tsk() and get_wchan().
+config USER_SHADOW_STACK
+ bool
+ help
+ Select this to enable kernel support for user mode shadow stacks. Most
+ major architectures now support hardware-assisted shadow stacks. This
+ enables the non-arch-specific parts of shadow stack support in the kernel.
+ Arch-specific configuration options may also need to be enabled.
+
+config USER_INDIRECT_BR_LP
+ bool
+ help
+ Select this to allow user mode apps to opt in to requiring a landing pad
+ instruction at the target of indirect jumps or indirect calls in user mode.
+ Most major architectures now support hardware assistance for a landing pad
+ instruction on an indirect call or a jump. This config option allows the
+ non-arch-specific parts of landing pad support to be enabled separately
+ from arch-specific implementations. Arch-specific configuration options
+ may also need to be enabled.
+
menu "General setup"
config BROKEN
--
2.25.1
Three architectures (x86, aarch64, riscv) have announced support for
shadow stack and for enforcing landing pad instructions on indirect
call/jmp. This patch adds arch-agnostic prctl support to
enable/disable/get/set the status of shadow stack and forward control-flow
(landing pad) cfi; a userspace usage sketch follows the list of new prctls
below.
New prctls are
- PR_GET_SHADOW_STACK_STATUS, PR_SET_SHADOW_STACK_STATUS
- PR_GET_INDIRECT_BR_LP_STATUS, PR_SET_INDIRECT_BR_LP_STATUS
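As a rough userspace illustration (not part of this patch), a runtime could
drive these prctls as below. enable_bcfi() and ss_base are made-up names for
the sketch, the fallback defines mirror the uapi additions in this patch, and
the shadow stack is assumed to be already mapped (e.g. via mmap with
PROT_SHADOWSTACK):

#include <sys/prctl.h>

#ifndef PR_SET_SHADOW_STACK_STATUS
#define PR_GET_SHADOW_STACK_STATUS	65
#define PR_SET_SHADOW_STACK_STATUS	66
#define PR_SHADOW_STACK_LOCK		(1UL << 0)
#define PR_SHADOW_STACK_ENABLE		(1UL << 1)
#endif

/* sketch: enable backward-edge cfi on a shadow stack mapped at ss_base */
static int enable_bcfi(unsigned long ss_base)
{
	/* the status word carries the shadow stack pointer in its upper bits */
	unsigned long status = ss_base | PR_SHADOW_STACK_ENABLE;

	if (prctl(PR_SET_SHADOW_STACK_STATUS, &status, 0, 0, 0))
		return -1;

	/* read back: bit 0 = locked, bit 1 = enabled */
	if (prctl(PR_GET_SHADOW_STACK_STATUS, &status, 0, 0, 0))
		return -1;

	return (status & PR_SHADOW_STACK_ENABLE) ? 0 : -1;
}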
Signed-off-by: Deepak Gupta <[email protected]>
---
include/uapi/linux/prctl.h | 26 +++++++++++++++++++++++++
kernel/sys.c | 40 ++++++++++++++++++++++++++++++++++++++
2 files changed, 66 insertions(+)
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index a5e06dcbba13..0f401cb2d6d1 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -284,4 +284,30 @@ struct prctl_mm_map {
#define PR_SET_VMA 0x53564d41
# define PR_SET_VMA_ANON_NAME 0
+/*
+ * get shadow stack status for the current thread. Assumes the shadow stack is at least
+ * 4 byte aligned (it can be 8 byte aligned on 64bit).
+ * The lower 2 bits report the locked and enabled/disabled status; the remaining bits
+ * return the current shadow stack pointer. The size and address range of the shadow
+ * stack mapping can be obtained via /proc/<pid>/maps.
+ */
+#define PR_GET_SHADOW_STACK_STATUS 65
+/*
+ * set shadow stack status for the current thread (including enabling, disabling or
+ * locking). Note that this only sets the status and setup of the shadow stack;
+ * allocating the shadow stack itself should be done separately using mmap.
+ */
+#define PR_SET_SHADOW_STACK_STATUS 66
+# define PR_SHADOW_STACK_LOCK (1UL << 0)
+# define PR_SHADOW_STACK_ENABLE (1UL << 1)
+
+/* get the status of landing pad instruction enforcement for the current thread */
+#define PR_GET_INDIRECT_BR_LP_STATUS 67
+/*
+ * set the status of landing pad instruction enforcement for the current thread
+ * (including enabling, disabling or locking)
+ */
+#define PR_SET_INDIRECT_BR_LP_STATUS 68
+# define PR_INDIRECT_BR_LP_LOCK (1UL << 0)
+# define PR_INDIRECT_BR_LP_ENABLE (1UL << 1)
#endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 88b31f096fb2..da8c65d474df 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2284,6 +2284,26 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which,
return -EINVAL;
}
+int __weak arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *status)
+{
+ return -EINVAL;
+}
+
+int __weak arch_set_shadow_stack_status(struct task_struct *t, unsigned long __user *status)
+{
+ return -EINVAL;
+}
+
+int __weak arch_get_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
+{
+ return -EINVAL;
+}
+
+int __weak arch_set_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
+{
+ return -EINVAL;
+}
+
#define PR_IO_FLUSHER (PF_MEMALLOC_NOIO | PF_LOCAL_THROTTLE)
#ifdef CONFIG_ANON_VMA_NAME
@@ -2628,6 +2648,26 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
case PR_SET_VMA:
error = prctl_set_vma(arg2, arg3, arg4, arg5);
break;
+ case PR_GET_SHADOW_STACK_STATUS:
+ if (arg3 || arg4 || arg5)
+ return -EINVAL;
+ error = arch_get_shadow_stack_status(me, (unsigned long __user *) arg2);
+ break;
+ case PR_SET_SHADOW_STACK_STATUS:
+ if (arg3 || arg4 || arg5)
+ return -EINVAL;
+ error = arch_set_shadow_stack_status(me, (unsigned long __user *) arg2);
+ break;
+ case PR_GET_INDIRECT_BR_LP_STATUS:
+ if (arg3 || arg4 || arg5)
+ return -EINVAL;
+ error = arch_get_indir_br_lp_status(me, (unsigned long __user *) arg2);
+ break;
+ case PR_SET_INDIRECT_BR_LP_STATUS:
+ if (arg3 || arg4 || arg5)
+ return -EINVAL;
+ error = arch_set_indir_br_lp_status(me, (unsigned long __user *) arg2);
+ break;
default:
error = -EINVAL;
break;
--
2.25.1
The new prctls PR_GET_SHADOW_STACK_STATUS/PR_SET_SHADOW_STACK_STATUS and
PR_GET_INDIRECT_BR_LP_STATUS/PR_SET_INDIRECT_BR_LP_STATUS are implemented
on riscv in this patch.
Signed-off-by: Deepak Gupta <[email protected]>
---
arch/riscv/include/asm/processor.h | 4 +-
arch/riscv/kernel/process.c | 88 +++++++++++++++++++++++++++++-
2 files changed, 90 insertions(+), 2 deletions(-)
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index 39c36f739ebb..c088584580b4 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -46,7 +46,9 @@ struct cfi_status {
unsigned int ufcfi_en : 1; /* Enable for forward cfi. Note that ELP goes in sstatus */
unsigned int ubcfi_en : 1; /* Enable for backward cfi. */
unsigned int audit_mode : 1;
- unsigned int rsvd1 : 29;
+ unsigned int ufcfi_locked : 1;
+ unsigned int ubcfi_locked : 1;
+ unsigned int rsvd1 : 27;
unsigned int lp_label; /* saved label value (25bit) */
long user_shdw_stk; /* Current user shadow stack pointer */
long shdw_stk_base; /* Base address of shadow stack */
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index bfd8511914d9..1218ed4fd29f 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -257,4 +257,90 @@ int arch_elf_setup_cfi_state(const struct arch_elf_state *state)
return ret;
}
-#endif
\ No newline at end of file
+#endif
+
+#ifdef CONFIG_USER_SHADOW_STACK
+int arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *status)
+{
+ unsigned long bcfi_status = 0;
+ struct thread_info *info = NULL;
+
+ if (!arch_supports_shadow_stack())
+ return -EINVAL;
+
+ info = current_thread_info();
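+ /*
+  * bit 0 of the status reports lock and bit 1 reports enable; when enabled,
+  * the word also carries the current shadow stack pointer in the upper bits
+  */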
+ bcfi_status |= info->user_cfi_state.ubcfi_locked ? (1UL << 0) : 0;
+ bcfi_status |= info->user_cfi_state.ubcfi_en ? ((1UL << 1) |
+ (info->user_cfi_state.user_shdw_stk)) : 0;
+
+ return copy_to_user(status, &bcfi_status, sizeof(bcfi_status)) ? -EFAULT : 0;
+}
+
+int arch_set_shadow_stack_status(struct task_struct *t, unsigned long __user *status)
+{
+ unsigned long bcfi_status = 0;
+ struct thread_info *info = NULL;
+ unsigned long shdw_stk = 0;
+
+ if (!arch_supports_shadow_stack())
+ return -EINVAL;
+
+ info = current_thread_info();
+ /* bcfi status is locked and can't be modified further by the user */
+ if (info->user_cfi_state.ubcfi_locked)
+ return -EINVAL;
+
+ if (copy_from_user(&bcfi_status, status, sizeof(bcfi_status)))
+ return -EFAULT;
+ /* clear two least significant bits. Always assume min 4 byte alignment */
+ shdw_stk = (long) (bcfi_status & (~3));
+
+ if (shdw_stk >= TASK_SIZE)
+ return -EINVAL;
+
+ info->user_cfi_state.ubcfi_en = (bcfi_status & (1UL << 1)) ? 1 : 0;
+ info->user_cfi_state.ubcfi_locked = (bcfi_status & (1UL << 0)) ? 1 : 0;
+ info->user_cfi_state.user_shdw_stk = (long) shdw_stk;
+
+ return 0;
+}
+#endif
+
+#ifdef CONFIG_USER_INDIRECT_BR_LP
+int arch_get_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
+{
+ unsigned long fcfi_status = 0;
+ struct thread_info *info = NULL;
+
+ if (!arch_supports_indirect_br_lp_instr())
+ return -EINVAL;
+
+ info = current_thread_info();
+ fcfi_status |= info->user_cfi_state.ufcfi_locked ? (1UL << 0) : 0;
+ fcfi_status |= info->user_cfi_state.ufcfi_en ? (1UL << 1) : 0;
+
+ return copy_to_user(status, &fcfi_status, sizeof(fcfi_status)) ? -EFAULT : 0;
+}
+
+int arch_set_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
+{
+ unsigned long fcfi_status = 0;
+ struct thread_info *info = NULL;
+
+ if (!arch_supports_indirect_br_lp_instr())
+ return -EINVAL;
+
+ info = current_thread_info();
+ /* fcfi status is locked and can't be modified further by the user */
+ if (info->user_cfi_state.ufcfi_locked)
+ return -EINVAL;
+
+ if (copy_from_user(&fcfi_status, status, sizeof(fcfi_status)))
+ return -EFAULT;
+
+ info->user_cfi_state.ufcfi_en = (fcfi_status & (1UL << 1)) ? 1 : 0;
+ info->user_cfi_state.ufcfi_locked = (fcfi_status & (1UL << 0)) ? 1 : 0;
+
+ return 0;
+}
+#endif
--
2.25.1
The shadow stack pointer needs to be saved and restored on signal delivery
and signal return.
The ucontext structure on riscv has large existing padding for possible
future extension of uc_sigmask. This patch steals XLEN/8 bytes from that
padding so that the structure size and the offsets of existing member
fields stay the same.
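As an illustrative sketch (not part of this patch), a handler built against
the patched uapi header could read the saved pointer as below. Note that
glibc's ucontext_t does not carry this field; the example assumes the
struct ucontext layout added by this patch:

#include <signal.h>
#include <stdio.h>
#include <asm/ucontext.h>	/* patched uapi header providing uc_ss_ptr */

/* sketch only: handler installed via sigaction() with SA_SIGINFO */
static void handler(int sig, siginfo_t *si, void *ctx)
{
	struct ucontext *uc = ctx;

	/*
	 * uc_ss_ptr sits just before uc_mcontext, carved out of __unused;
	 * printf is for illustration only (it is not async-signal-safe)
	 */
	printf("shadow stack pointer at signal delivery: 0x%lx\n",
	       uc->uc_ss_ptr);
}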
Signed-off-by: Deepak Gupta <[email protected]>
---
arch/riscv/include/uapi/asm/ucontext.h | 32 +++++++++++++++++++++++---
1 file changed, 29 insertions(+), 3 deletions(-)
diff --git a/arch/riscv/include/uapi/asm/ucontext.h b/arch/riscv/include/uapi/asm/ucontext.h
index 516bd0bb0da5..72303e5618a1 100644
--- a/arch/riscv/include/uapi/asm/ucontext.h
+++ b/arch/riscv/include/uapi/asm/ucontext.h
@@ -21,9 +21,12 @@ struct ucontext {
* at the end of this structure and explicitly state it can be
* expanded, so we didn't want to box ourselves in here.
*/
- __u8 __unused[1024 / 8 - sizeof(sigset_t)];
- /*
- * We can't put uc_sigmask at the end of this structure because we need
+ __u8 __unused[1024 / 8 - sizeof(sigset_t)
+#ifdef CONFIG_USER_SHADOW_STACK
+ - sizeof(unsigned long)
+#endif
+ ];
+ /* We can't put uc_sigmask at the end of this structure because we need
* to be able to expand sigcontext in the future. For example, the
* vector ISA extension will almost certainly add ISA state. We want
* to ensure all user-visible ISA state can be saved and restored via a
@@ -31,7 +34,30 @@ struct ucontext {
* infinite extensibility. Since we know this will be extended and we
* assume sigset_t won't be extended an extreme amount, we're
* prioritizing this.
+ */
+
+ /*
+ * Zisslpcfi needs state in ucontext to save and restore across
+ * makecontext/setcontext. One such state is the shadow stack pointer. We may
+ * need to save the label (of the target function) as well, but that's to be
+ * decided. We steal 8 (64bit) / 4 (32bit) bytes from the padding (__unused)
+ * reserved for expanding sigset_t. We could have expanded the size of
+ * ucontext instead, but shadow stack would be enabled by default via ELF.
+ * ucontext expansion makes more sense for features like vector, where an app
+ * willingly opts in to special functionality and that opt-in allows for
+ * enlightening in ucontext restore. Furthermore, shadow stack doesn't need a
+ * lot of state, only the shadow stack pointer. The tax on the ecosystem from
+ * even a small (8 byte) size change of ucontext outweighs simply keeping the
+ * size the same and placing the ss pointer here. Note that the shadow stack
+ * pointer points to a shadow stack address; that address holds a shadow
+ * stack restore token, using which the shadow stack should be restored.
+ * Note also that uc_ss_ptr is kept at this location so that all other
+ * offsets stay the same, preserving compatibility.
*/
+#ifdef CONFIG_USER_SHADOW_STACK
+ unsigned long uc_ss_ptr;
+#endif
struct sigcontext uc_mcontext;
};
--
2.25.1
This patch selects the configs for shadow stack support and landing pad
instruction support. Since both rely on ELF header parsing, this change
also selects ARCH_USE_GNU_PROPERTY and ARCH_BINFMT_ELF_STATE.
Signed-off-by: Deepak Gupta <[email protected]>
---
arch/riscv/Kconfig | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index e2b656043abf..9a39ada1d9d0 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -132,6 +132,10 @@ config RISCV
select SYSCTL_EXCEPTION_TRACE
select THREAD_INFO_IN_TASK
select TRACE_IRQFLAGS_SUPPORT
+ select USER_SHADOW_STACK
+ select USER_INDIRECT_BR_LP
+ select ARCH_USE_GNU_PROPERTY
+ select ARCH_BINFMT_ELF_STATE
select UACCESS_MEMCPY if !MMU
select ZONE_DMA32 if 64BIT
select HAVE_DYNAMIC_FTRACE if !XIP_KERNEL && MMU && $(cc-option,-fpatchable-function-entry=8)
--
2.25.1
Save the shadow stack pointer in the ucontext structure while delivering a
signal, and restore it from ucontext on sigreturn.
Signed-off-by: Deepak Gupta <[email protected]>
---
arch/riscv/kernel/signal.c | 45 ++++++++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)
diff --git a/arch/riscv/kernel/signal.c b/arch/riscv/kernel/signal.c
index bfb2afa4135f..b963bbce5879 100644
--- a/arch/riscv/kernel/signal.c
+++ b/arch/riscv/kernel/signal.c
@@ -103,6 +103,7 @@ SYSCALL_DEFINE0(rt_sigreturn)
struct pt_regs *regs = current_pt_regs();
struct rt_sigframe __user *frame;
struct task_struct *task;
+ struct thread_info *info = NULL;
sigset_t set;
/* Always make any pending restarted system calls return -EINTR */
@@ -124,6 +125,27 @@ SYSCALL_DEFINE0(rt_sigreturn)
if (restore_altstack(&frame->uc.uc_stack))
goto badframe;
+#if defined(CONFIG_USER_SHADOW_STACK)
+ /*
+ * TODO: Restore the shadow stack via a token stored on the shadow stack
+ * itself, as a safe way to restore.
+ * A token on the shadow stack gives the following properties
+ * - Safe save and restore for shadow stack switching. Any save of the shadow
+ *   stack must have saved a token on the shadow stack, and any restore must
+ *   check the token before restoring. Since writing the address of the
+ *   shadow stack onto the shadow stack itself is not easily allowed, a
+ *   restore without a save is quite difficult for an attacker to perform.
+ * - A natural break. A token on the shadow stack provides a natural break,
+ *   so a single linear range can be bucketed into different shadow stack
+ *   segments. Any sspop; sschkra sequence will detect the condition and
+ *   fault to the kernel.
+ */
+ info = current_thread_info();
+ if (info->user_cfi_state.ubcfi_en &&
+ __copy_from_user(&info->user_cfi_state.user_shdw_stk, &frame->uc.uc_ss_ptr,
+ sizeof(unsigned long)))
+ goto badframe;
+#endif
+
regs->cause = -1UL;
return regs->a0;
@@ -180,6 +202,7 @@ static int setup_rt_frame(struct ksignal *ksig, sigset_t *set,
struct pt_regs *regs)
{
struct rt_sigframe __user *frame;
+ struct thread_info *info = NULL;
long err = 0;
frame = get_sigframe(ksig, regs, sizeof(*frame));
@@ -191,6 +214,23 @@ static int setup_rt_frame(struct ksignal *ksig, sigset_t *set,
/* Create the ucontext. */
err |= __put_user(0, &frame->uc.uc_flags);
err |= __put_user(NULL, &frame->uc.uc_link);
+#if defined(CONFIG_USER_SHADOW_STACK)
+ /*
+ * TODO: Save a pointer to the shadow stack itself on the shadow stack, as a
+ * token.
+ * A token on the shadow stack gives the following properties
+ * - Safe save and restore for shadow stack switching. Any save of the shadow
+ *   stack must have saved a token on the shadow stack, and any restore must
+ *   check the token before restoring. Since writing the address of the
+ *   shadow stack onto the shadow stack itself is not easily allowed, a
+ *   restore without a save is quite difficult for an attacker to perform.
+ * - A natural break. A token on the shadow stack provides a natural break,
+ *   so a single linear range can be bucketed into different shadow stack
+ *   segments. Any sspop; sschkra sequence will detect the condition and
+ *   fault to the kernel.
+ */
+ info = current_thread_info();
+ if (info->user_cfi_state.ubcfi_en)
+ err |= __put_user(info->user_cfi_state.user_shdw_stk, &frame->uc.uc_ss_ptr);
+#endif
err |= __save_altstack(&frame->uc.uc_stack, regs->sp);
err |= setup_sigcontext(frame, regs);
err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
@@ -201,6 +241,11 @@ static int setup_rt_frame(struct ksignal *ksig, sigset_t *set,
#ifdef CONFIG_MMU
regs->ra = (unsigned long)VDSO_SYMBOL(
current->mm->context.vdso, rt_sigreturn);
+#if defined(CONFIG_USER_SHADOW_STACK)
+ /* if bcfi is enabled x1 (ra) and x5 (t0) must match */
+ if (info->user_cfi_state.ubcfi_en)
+ regs->t0 = regs->ra;
+#endif
#else
/*
* For the nommu case we don't have a VDSO. Instead we push two
--
2.25.1
On 13.02.23 05:53, Deepak Gupta wrote:
> maybe_mkwrite creates PTEs with WRITE encodings for underlying arch if
> VM_WRITE is turned on in vma->vm_flags. Shadow stack memory is a write-
> able memory except it can only be written by certain specific
> instructions. This patch allows maybe_mkwrite to create shadow stack PTEs
> if vma is shadow stack VMA. Each arch can define which combination of VMA
> flags means a shadow stack.
>
> Additionally pte_mkshdwstk must be provided by arch specific PTE
> construction headers to create shadow stack PTEs. (in arch specific
> pgtable.h).
>
> This patch provides dummy/stub pte_mkshdwstk if CONFIG_USER_SHADOW_STACK
> is not selected.
>
> Signed-off-by: Deepak Gupta <[email protected]>
> ---
> include/linux/mm.h | 23 +++++++++++++++++++++--
> include/linux/pgtable.h | 4 ++++
> 2 files changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 8f857163ac89..a7705bc49bfe 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1093,6 +1093,21 @@ static inline unsigned long thp_size(struct page *page)
> void free_compound_page(struct page *page);
>
> #ifdef CONFIG_MMU
> +
> +#ifdef CONFIG_USER_SHADOW_STACK
> +bool arch_is_shadow_stack_vma(struct vm_area_struct *vma);
> +#endif
> +
> +static inline bool
> +is_shadow_stack_vma(struct vm_area_struct *vma)
> +{
> +#ifdef CONFIG_USER_SHADOW_STACK
> + return arch_is_shadow_stack_vma(vma);
> +#else
> + return false;
> +#endif
> +}
> +
> /*
> * Do pte_mkwrite, but only if the vma says VM_WRITE. We do this when
> * servicing faults for write access. In the normal case, do always want
> @@ -1101,8 +1116,12 @@ void free_compound_page(struct page *page);
> */
> static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
> {
> - if (likely(vma->vm_flags & VM_WRITE))
> - pte = pte_mkwrite(pte);
> + if (likely(vma->vm_flags & VM_WRITE)) {
> + if (unlikely(is_shadow_stack_vma(vma)))
> + pte = pte_mkshdwstk(pte);
> + else
> + pte = pte_mkwrite(pte);
> + }
> return pte;
Exactly what we are trying to avoid in the x86 approach right now.
Please see the x86 series for details; we shouldn't try reinventing the
wheel but should find a core-mm approach that fits multiple architectures.
https://lkml.kernel.org/r/[email protected]
--
Thanks,
David / dhildenb
On Mon, Feb 13, 2023 at 01:05:16PM +0100, David Hildenbrand wrote:
>On 13.02.23 05:53, Deepak Gupta wrote:
>>maybe_mkwrite creates PTEs with WRITE encodings for underlying arch if
>>VM_WRITE is turned on in vma->vm_flags. Shadow stack memory is a write-
>>able memory except it can only be written by certain specific
>>instructions. This patch allows maybe_mkwrite to create shadow stack PTEs
>>if vma is shadow stack VMA. Each arch can define which combination of VMA
>>flags means a shadow stack.
>>
>>Additionally pte_mkshdwstk must be provided by arch specific PTE
>>construction headers to create shadow stack PTEs. (in arch specific
>>pgtable.h).
>>
>>This patch provides dummy/stub pte_mkshdwstk if CONFIG_USER_SHADOW_STACK
>>is not selected.
>>
>>Signed-off-by: Deepak Gupta <[email protected]>
>>---
>> include/linux/mm.h | 23 +++++++++++++++++++++--
>> include/linux/pgtable.h | 4 ++++
>> 2 files changed, 25 insertions(+), 2 deletions(-)
>>
>>diff --git a/include/linux/mm.h b/include/linux/mm.h
>>index 8f857163ac89..a7705bc49bfe 100644
>>--- a/include/linux/mm.h
>>+++ b/include/linux/mm.h
>>@@ -1093,6 +1093,21 @@ static inline unsigned long thp_size(struct page *page)
>> void free_compound_page(struct page *page);
>> #ifdef CONFIG_MMU
>>+
>>+#ifdef CONFIG_USER_SHADOW_STACK
>>+bool arch_is_shadow_stack_vma(struct vm_area_struct *vma);
>>+#endif
>>+
>>+static inline bool
>>+is_shadow_stack_vma(struct vm_area_struct *vma)
>>+{
>>+#ifdef CONFIG_USER_SHADOW_STACK
>>+ return arch_is_shadow_stack_vma(vma);
>>+#else
>>+ return false;
>>+#endif
>>+}
>>+
>> /*
>> * Do pte_mkwrite, but only if the vma says VM_WRITE. We do this when
>> * servicing faults for write access. In the normal case, do always want
>>@@ -1101,8 +1116,12 @@ void free_compound_page(struct page *page);
>> */
>> static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
>> {
>>- if (likely(vma->vm_flags & VM_WRITE))
>>- pte = pte_mkwrite(pte);
>>+ if (likely(vma->vm_flags & VM_WRITE)) {
>>+ if (unlikely(is_shadow_stack_vma(vma)))
>>+ pte = pte_mkshdwstk(pte);
>>+ else
>>+ pte = pte_mkwrite(pte);
>>+ }
>> return pte;
>
>Exactly what we are trying to avoid in the x86 approach right now.
>Please see the x86 series on details, we shouldn't try reinventing the
>wheel but finding a core-mm approach that fits multiple architectures.
>
>https://lkml.kernel.org/r/[email protected]
Thanks David for the comment here. I looked at the x86 approach. This patch
is actually written in a way that doesn't re-invent the wheel and follows a
core-mm approach that fits multiple architectures.
The change above checks `is_shadow_stack_vma`, and only if it returns true
does it manufacture a shadow stack pte; otherwise it creates a regular
writeable mapping.
Now if we look at the `is_shadow_stack_vma` implementation, it returns false
if `CONFIG_USER_SHADOW_STACK` is not defined. If `CONFIG_USER_SHADOW_STACK`
is defined, it calls `arch_is_shadow_stack_vma`, which should be implemented
by arch-specific code. This allows each architecture to define its own vma
flag encoding for shadow stack (riscv chooses the presence of only
`VM_WRITE`, which is analogous to the chosen PTE encodings on riscv:
W=1,R=0,X=0).
Additionally, pte_mkshdwstk will be a nop if not implemented by the
architecture.
Let me know if this makes sense. If I am missing something here, let me know.
>
>--
>Thanks,
>
>David / dhildenb
>
On 13.02.23 15:37, Deepak Gupta wrote:
> On Mon, Feb 13, 2023 at 01:05:16PM +0100, David Hildenbrand wrote:
>> On 13.02.23 05:53, Deepak Gupta wrote:
>>> maybe_mkwrite creates PTEs with WRITE encodings for underlying arch if
>>> VM_WRITE is turned on in vma->vm_flags. Shadow stack memory is a write-
>>> able memory except it can only be written by certain specific
>>> instructions. This patch allows maybe_mkwrite to create shadow stack PTEs
>>> if vma is shadow stack VMA. Each arch can define which combination of VMA
>>> flags means a shadow stack.
>>>
>>> Additionally pte_mkshdwstk must be provided by arch specific PTE
>>> construction headers to create shadow stack PTEs. (in arch specific
>>> pgtable.h).
>>>
>>> This patch provides dummy/stub pte_mkshdwstk if CONFIG_USER_SHADOW_STACK
>>> is not selected.
>>>
>>> Signed-off-by: Deepak Gupta <[email protected]>
>>> ---
>>> include/linux/mm.h | 23 +++++++++++++++++++++--
>>> include/linux/pgtable.h | 4 ++++
>>> 2 files changed, 25 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index 8f857163ac89..a7705bc49bfe 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -1093,6 +1093,21 @@ static inline unsigned long thp_size(struct page *page)
>>> void free_compound_page(struct page *page);
>>> #ifdef CONFIG_MMU
>>> +
>>> +#ifdef CONFIG_USER_SHADOW_STACK
>>> +bool arch_is_shadow_stack_vma(struct vm_area_struct *vma);
>>> +#endif
>>> +
>>> +static inline bool
>>> +is_shadow_stack_vma(struct vm_area_struct *vma)
>>> +{
>>> +#ifdef CONFIG_USER_SHADOW_STACK
>>> + return arch_is_shadow_stack_vma(vma);
>>> +#else
>>> + return false;
>>> +#endif
>>> +}
>>> +
>>> /*
>>> * Do pte_mkwrite, but only if the vma says VM_WRITE. We do this when
>>> * servicing faults for write access. In the normal case, do always want
>>> @@ -1101,8 +1116,12 @@ void free_compound_page(struct page *page);
>>> */
>>> static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
>>> {
>>> - if (likely(vma->vm_flags & VM_WRITE))
>>> - pte = pte_mkwrite(pte);
>>> + if (likely(vma->vm_flags & VM_WRITE)) {
>>> + if (unlikely(is_shadow_stack_vma(vma)))
>>> + pte = pte_mkshdwstk(pte);
>>> + else
>>> + pte = pte_mkwrite(pte);
>>> + }
>>> return pte;
>>
>> Exactly what we are trying to avoid in the x86 approach right now.
>> Please see the x86 series on details, we shouldn't try reinventing the
>> wheel but finding a core-mm approach that fits multiple architectures.
>>
>> https://lkml.kernel.org/r/[email protected]
>
> Thanks David for comment here. I looked at x86 approach. This patch
> actually written in a way which is not re-inventing wheel and is following
> a core-mm approach that fits multiple architectures.
>
> Change above checks `is_shadow_stack_vma` and if it returns true then only
> it manufactures shadow stack pte else it'll make a regular writeable mapping.
>
> Now if we look at `is_shadow_stack_vma` implementation, it returns false if
> `CONFIG_USER_SHADOW_STACK` is not defined. If `CONFIG_USER_SHADOW_STACK is
> defined then it calls `arch_is_shadow_stack_vma` which should be implemented
> by arch specific code. This allows each architecture to define their own vma
> flag encodings for shadow stack (riscv chooses presence of only `VM_WRITE`
> which is analogous to choosen PTE encodings on riscv W=1,R=0,X=0)
>
> Additionally pte_mkshdwstk will be nop if not implemented by architecture.
>
> Let me know if this make sense. If I am missing something here, let me know.
See the discussion in that thread. The idea is to pass a VMA to
pte_mkwrite() and let it handle how to actually set it writable.
--
Thanks,
David / dhildenb
On Mon, Feb 13, 2023 at 03:56:22PM +0100, David Hildenbrand wrote:
>On 13.02.23 15:37, Deepak Gupta wrote:
>>On Mon, Feb 13, 2023 at 01:05:16PM +0100, David Hildenbrand wrote:
>>>On 13.02.23 05:53, Deepak Gupta wrote:
>>>>maybe_mkwrite creates PTEs with WRITE encodings for underlying arch if
>>>>VM_WRITE is turned on in vma->vm_flags. Shadow stack memory is a write-
>>>>able memory except it can only be written by certain specific
>>>>instructions. This patch allows maybe_mkwrite to create shadow stack PTEs
>>>>if vma is shadow stack VMA. Each arch can define which combination of VMA
>>>>flags means a shadow stack.
>>>>
>>>>Additionally pte_mkshdwstk must be provided by arch specific PTE
>>>>construction headers to create shadow stack PTEs. (in arch specific
>>>>pgtable.h).
>>>>
>>>>This patch provides dummy/stub pte_mkshdwstk if CONFIG_USER_SHADOW_STACK
>>>>is not selected.
>>>>
>>>>Signed-off-by: Deepak Gupta <[email protected]>
>>>>---
>>>> include/linux/mm.h | 23 +++++++++++++++++++++--
>>>> include/linux/pgtable.h | 4 ++++
>>>> 2 files changed, 25 insertions(+), 2 deletions(-)
>>>>
>>>>diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>>index 8f857163ac89..a7705bc49bfe 100644
>>>>--- a/include/linux/mm.h
>>>>+++ b/include/linux/mm.h
>>>>@@ -1093,6 +1093,21 @@ static inline unsigned long thp_size(struct page *page)
>>>> void free_compound_page(struct page *page);
>>>> #ifdef CONFIG_MMU
>>>>+
>>>>+#ifdef CONFIG_USER_SHADOW_STACK
>>>>+bool arch_is_shadow_stack_vma(struct vm_area_struct *vma);
>>>>+#endif
>>>>+
>>>>+static inline bool
>>>>+is_shadow_stack_vma(struct vm_area_struct *vma)
>>>>+{
>>>>+#ifdef CONFIG_USER_SHADOW_STACK
>>>>+ return arch_is_shadow_stack_vma(vma);
>>>>+#else
>>>>+ return false;
>>>>+#endif
>>>>+}
>>>>+
>>>> /*
>>>> * Do pte_mkwrite, but only if the vma says VM_WRITE. We do this when
>>>> * servicing faults for write access. In the normal case, do always want
>>>>@@ -1101,8 +1116,12 @@ void free_compound_page(struct page *page);
>>>> */
>>>> static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
>>>> {
>>>>- if (likely(vma->vm_flags & VM_WRITE))
>>>>- pte = pte_mkwrite(pte);
>>>>+ if (likely(vma->vm_flags & VM_WRITE)) {
>>>>+ if (unlikely(is_shadow_stack_vma(vma)))
>>>>+ pte = pte_mkshdwstk(pte);
>>>>+ else
>>>>+ pte = pte_mkwrite(pte);
>>>>+ }
>>>> return pte;
>>>
>>>Exactly what we are trying to avoid in the x86 approach right now.
>>>Please see the x86 series on details, we shouldn't try reinventing the
>>>wheel but finding a core-mm approach that fits multiple architectures.
>>>
>>>https://lkml.kernel.org/r/[email protected]
>>
>>Thanks David for comment here. I looked at x86 approach. This patch
>>actually written in a way which is not re-inventing wheel and is following
>>a core-mm approach that fits multiple architectures.
>>
>>Change above checks `is_shadow_stack_vma` and if it returns true then only
>>it manufactures shadow stack pte else it'll make a regular writeable mapping.
>>
>>Now if we look at `is_shadow_stack_vma` implementation, it returns false if
>>`CONFIG_USER_SHADOW_STACK` is not defined. If `CONFIG_USER_SHADOW_STACK is
>>defined then it calls `arch_is_shadow_stack_vma` which should be implemented
>>by arch specific code. This allows each architecture to define their own vma
>>flag encodings for shadow stack (riscv chooses presence of only `VM_WRITE`
>>which is analogous to choosen PTE encodings on riscv W=1,R=0,X=0)
>>
>>Additionally pte_mkshdwstk will be nop if not implemented by architecture.
>>
>>Let me know if this make sense. If I am missing something here, let me know.
>
>See the discussion in that thread. The idea is to pass a VMA to
>pte_mkwrite() and let it handle how to actually set it writable.
>
Thanks, I see. There are instances where `pte_mkwrite` is invoked directly
after checking VM_WRITE; instead of fixing all those instances, the idea is
to make pte_mkwrite itself take the vma flags or the vma.
I'll revise.
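If I'm reading the thread right, the direction looks roughly like the
following (rough sketch only; the helper names are illustrative and not the
actual x86 series API):

static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
{
	if (likely(vma->vm_flags & VM_WRITE))
		pte = pte_mkwrite(pte, vma);	/* vma-aware variant */
	return pte;
}

/* arch code picks the encoding based on the vma */
pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma)
{
	if (arch_is_shadow_stack_vma(vma))
		return pte_mkshdwstk(pte);
	return pte_mkwrite_novma(pte);	/* illustrative name for the old helper */
}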
>--
>Thanks,
>
>David / dhildenb
>
On 13.02.23 21:01, Deepak Gupta wrote:
> On Mon, Feb 13, 2023 at 03:56:22PM +0100, David Hildenbrand wrote:
>> On 13.02.23 15:37, Deepak Gupta wrote:
>>> On Mon, Feb 13, 2023 at 01:05:16PM +0100, David Hildenbrand wrote:
>>>> On 13.02.23 05:53, Deepak Gupta wrote:
>>>>> maybe_mkwrite creates PTEs with WRITE encodings for underlying arch if
>>>>> VM_WRITE is turned on in vma->vm_flags. Shadow stack memory is a write-
>>>>> able memory except it can only be written by certain specific
>>>>> instructions. This patch allows maybe_mkwrite to create shadow stack PTEs
>>>>> if vma is shadow stack VMA. Each arch can define which combination of VMA
>>>>> flags means a shadow stack.
>>>>>
>>>>> Additionally pte_mkshdwstk must be provided by arch specific PTE
>>>>> construction headers to create shadow stack PTEs. (in arch specific
>>>>> pgtable.h).
>>>>>
>>>>> This patch provides dummy/stub pte_mkshdwstk if CONFIG_USER_SHADOW_STACK
>>>>> is not selected.
>>>>>
>>>>> Signed-off-by: Deepak Gupta <[email protected]>
>>>>> ---
>>>>> include/linux/mm.h | 23 +++++++++++++++++++++--
>>>>> include/linux/pgtable.h | 4 ++++
>>>>> 2 files changed, 25 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>>> index 8f857163ac89..a7705bc49bfe 100644
>>>>> --- a/include/linux/mm.h
>>>>> +++ b/include/linux/mm.h
>>>>> @@ -1093,6 +1093,21 @@ static inline unsigned long thp_size(struct page *page)
>>>>> void free_compound_page(struct page *page);
>>>>> #ifdef CONFIG_MMU
>>>>> +
>>>>> +#ifdef CONFIG_USER_SHADOW_STACK
>>>>> +bool arch_is_shadow_stack_vma(struct vm_area_struct *vma);
>>>>> +#endif
>>>>> +
>>>>> +static inline bool
>>>>> +is_shadow_stack_vma(struct vm_area_struct *vma)
>>>>> +{
>>>>> +#ifdef CONFIG_USER_SHADOW_STACK
>>>>> + return arch_is_shadow_stack_vma(vma);
>>>>> +#else
>>>>> + return false;
>>>>> +#endif
>>>>> +}
>>>>> +
>>>>> /*
>>>>> * Do pte_mkwrite, but only if the vma says VM_WRITE. We do this when
>>>>> * servicing faults for write access. In the normal case, do always want
>>>>> @@ -1101,8 +1116,12 @@ void free_compound_page(struct page *page);
>>>>> */
>>>>> static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
>>>>> {
>>>>> - if (likely(vma->vm_flags & VM_WRITE))
>>>>> - pte = pte_mkwrite(pte);
>>>>> + if (likely(vma->vm_flags & VM_WRITE)) {
>>>>> + if (unlikely(is_shadow_stack_vma(vma)))
>>>>> + pte = pte_mkshdwstk(pte);
>>>>> + else
>>>>> + pte = pte_mkwrite(pte);
>>>>> + }
>>>>> return pte;
>>>>
>>>> Exactly what we are trying to avoid in the x86 approach right now.
>>>> Please see the x86 series on details, we shouldn't try reinventing the
>>>> wheel but finding a core-mm approach that fits multiple architectures.
>>>>
>>>> https://lkml.kernel.org/r/[email protected]
>>>
>>> Thanks David for comment here. I looked at x86 approach. This patch
>>> actually written in a way which is not re-inventing wheel and is following
>>> a core-mm approach that fits multiple architectures.
>>>
>>> Change above checks `is_shadow_stack_vma` and if it returns true then only
>>> it manufactures shadow stack pte else it'll make a regular writeable mapping.
>>>
>>> Now if we look at `is_shadow_stack_vma` implementation, it returns false if
>>> `CONFIG_USER_SHADOW_STACK` is not defined. If `CONFIG_USER_SHADOW_STACK is
>>> defined then it calls `arch_is_shadow_stack_vma` which should be implemented
>>> by arch specific code. This allows each architecture to define their own vma
>>> flag encodings for shadow stack (riscv chooses presence of only `VM_WRITE`
>>> which is analogous to choosen PTE encodings on riscv W=1,R=0,X=0)
>>>
>>> Additionally pte_mkshdwstk will be nop if not implemented by architecture.
>>>
>>> Let me know if this make sense. If I am missing something here, let me know.
>>
>> See the discussion in that thread. The idea is to pass a VMA to
>> pte_mkwrite() and let it handle how to actually set it writable.
>>
>
> Thanks. I see. Instances where `pte_mkwrite` is directly invoked by checking
> VM_WRITE and thus instead of fixing all those instance, make pte_mkwrite itself
> take vma flag or vma.
>
> I'll revise.
Thanks, it would be great to discuss in the other threads what else you
would need to make it work for you. I assume Rick will have something to
play with soonish (Right, Rick? :) ).
--
Thanks,
David / dhildenb
On Tue, 2023-02-14 at 13:10 +0100, David Hildenbrand wrote:
> I assume Rick will have something to
> play with soonish (Right, Rick? :) ).
Yes, Deepak and I were discussing on the x86 series. I haven't heard
anything from 0-day for a few days so looking good. There was
discussion happening with Boris on the pte_modify() patch, so might
wait a day more to post a new version.
On Sun, Feb 12, 2023 at 08:53:44PM -0800, Deepak Gupta wrote:
> Three architectures (x86, aarch64, riscv) have announced support for
> shadow stack and enforcing requirement of landing pad instructions on
> indirect call/jmp. This patch adds arch-agnostic prtcl support to enable
> /disable/get/set status of shadow stack and forward control (landing pad)
> flow cfi statuses.
>
> New prctls are
> - PR_GET_SHADOW_STACK_STATUS, PR_SET_SHADOW_STACK_STATUS
> - PR_GET_INDIRECT_BR_LP_STATUS, PR_SET_INDIRECT_BR_LP_STATUS
FWIW I had something very similar in my in-progress arm64 support for
GCS (our equivalent feature), though without the LP stuff as we don't
have that.
Reviewed-by: Mark Brown <[email protected]>
I'll pull this into my branch and redo things on top of it if that's OK,
seems sensible to avoid collisions/duplication?
On Sun, Feb 12, 2023 at 08:53:44PM -0800, Deepak Gupta wrote:
> +int __weak arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *status)
> +{
> + return -EINVAL;
> +}
Having looked at this further, is there any great reason why the status
is passed as a pointer? It seems needless effort.
I am really sorry to have missed this and for being late.
I saw your GCS patches. Thanks for picking this up.
On Wed, Jun 7, 2023 at 1:22 PM Mark Brown <[email protected]> wrote:
>
> On Sun, Feb 12, 2023 at 08:53:44PM -0800, Deepak Gupta wrote:
>
> > +int __weak arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *status)
> > +{
> > + return -EINVAL;
> > +}
>
> Having looked at this further is there any great reason why the status
> is passed as a pointer? It seems needless effort.
I was trying to be cleaner here and avoid overloading the return value
with the status, hence the pointer.
You could say that any negative value is an error. I don't have any
favorites here.
-Deepak
On Mon, Oct 09, 2023 at 02:22:51PM -0700, Deepak Gupta wrote:
> On Wed, Jun 7, 2023 at 1:22 PM Mark Brown <[email protected]> wrote:
> > On Sun, Feb 12, 2023 at 08:53:44PM -0800, Deepak Gupta wrote:
> > > +int __weak arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *status)
> > > +{
> > > + return -EINVAL;
> > > +}
> > Having looked at this further is there any great reason why the status
> > is passed as a pointer? It seems needless effort.
> I was trying to be cleaner here to not overload returned status with a pointer.
> You could say that any negative value is an error. I don't have any
> favorites here.
OK, thanks - I changed it to treat negative codes as errors, I'll leave
things like that.
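i.e. roughly the following, sketched only to show the two conventions side
by side:

/* this series: status returned through a user pointer */
int arch_get_shadow_stack_status(struct task_struct *t,
				 unsigned long __user *status);

/* the variant I went with: return the status directly, negative means error */
long arch_get_shadow_stack_status(struct task_struct *t);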