The arm64 Guarded Control Stack (GCS) feature provides support for
hardware-protected stacks of return addresses, intended to provide
hardening against return-oriented programming (ROP) attacks and to make
it easier to gather call stacks for applications such as profiling.
When GCS is active, a secondary stack called the Guarded Control Stack is
maintained, protected with a memory attribute which means that it can
only be written with specific GCS operations. The current GCS pointer
cannot be directly written to by userspace. When a BL is executed the
value stored in LR is also pushed onto the GCS, and when a RET is
executed the top of the GCS is popped and compared to LR, with a fault
being raised if the values do not match. GCS operations may only be
performed on GCS pages; a data abort is generated if they are not.
The combination of hardware enforcement and the lack of extra
instructions in the function entry and exit paths should result in
something which has less overhead and is more difficult to attack than
a purely software implementation like clang's shadow stacks.
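The following toy userspace model in C makes the push/compare rule above concrete. It is purely illustrative. The names (struct model_gcs, model_bl(), model_ret()) are hypothetical, and the model says nothing about how the hardware or this series actually implement GCS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model of GCS semantics, not the hardware or kernel. */
#define MODEL_GCS_DEPTH 64

struct model_gcs {
	uint64_t entries[MODEL_GCS_DEPTH];
	size_t top;			/* number of valid entries */
};

/* BL: the return address placed in LR is also pushed onto the GCS. */
static bool model_bl(struct model_gcs *gcs, uint64_t *lr, uint64_t ret_addr)
{
	if (gcs->top == MODEL_GCS_DEPTH)
		return false;		/* guarded stack would overflow */
	*lr = ret_addr;
	gcs->entries[gcs->top++] = ret_addr;
	return true;
}

/*
 * RET: pop the top of the GCS and compare it with LR. A false return is
 * where the hardware would raise a fault. The entry is consumed either
 * way in this model.
 */
static bool model_ret(struct model_gcs *gcs, uint64_t lr)
{
	if (gcs->top == 0)
		return false;
	return gcs->entries[--gcs->top] == lr;
}
```

A ROP attack that overwrites the saved return address (and hence LR) no longer matches the GCS copy, so the RET check fails.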
This series implements support for use of GCS by userspace, along with
support for use of GCS within KVM guests. It does not enable use of GCS
by either EL1 or EL2; this will be implemented separately. Executables
are started without GCS and must use a prctl() to enable it. It is
expected that this will be done very early in application execution by
the dynamic linker or other startup code. For dynamic linking this will
be done by checking that everything in the executable is marked as GCS
compatible.
x86 has an equivalent feature called shadow stacks. This series depends
on the x86 patches for generic memory management support for the new
guarded/shadow stack page type and shares APIs as much as possible. As
there has been extensive discussion with the wider community around the
ABI for shadow stacks, I have as far as practical kept implementation
decisions close to those for x86, anticipating that review would lead to
similar conclusions in the absence of strong reasoning for divergence.
The main divergence I am conscious of is that x86 allows shadow stack to
be enabled and disabled repeatedly, freeing the shadow stack for the
thread whenever it is disabled, while this implementation keeps the GCS
allocated after disable but refuses to reenable it. This is to avoid
races with things actively walking the GCS during a disable. We do
anticipate that some systems will wish to disable GCS at runtime but are
not aware of any demand for subsequently reenabling it.
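The following is a minimal sketch of the per-thread policy described above: the GCS is allocated on the first enable, kept on disable, and a later re-enable is refused. The struct and function names are hypothetical. The -EINVAL return for a re-enable is an assumption for illustration, not necessarily the errno the series uses.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the enable/disable policy, not kernel code. */
struct model_thread {
	bool gcs_allocated;	/* a GCS has been allocated for the thread */
	bool gcs_enabled;	/* GCS currently active */
};

static int model_set_gcs(struct model_thread *t, bool enable)
{
	if (enable == t->gcs_enabled)
		return 0;		/* no change requested */

	if (enable) {
		/* Refuse re-enable: the old GCS may still be being walked. */
		if (t->gcs_allocated)
			return -EINVAL;	/* assumed errno */
		t->gcs_allocated = true;	/* allocate on first enable */
		t->gcs_enabled = true;
		return 0;
	}

	/* Disable: gcs_allocated stays set, the stack is not freed. */
	t->gcs_enabled = false;
	return 0;
}
```

By contrast, the x86 behaviour described above would free the stack on disable and allow a fresh allocation on the next enable.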
x86 uses arch_prctl() to manage enable and disable. Since only x86 and
S/390 use arch_prctl(), a generic prctl() was proposed[1] as part of a
patch set for the equivalent RISC-V Zicfiss feature. I initially adopted
it fairly directly, but it has been revised quite a bit following review
feedback.
We currently maintain the x86 pattern of implicitly allocating a shadow
stack for threads started with shadow stack enabled. There has been some
discussion of removing this support and requiring the use of clone3()
with explicit allocation of shadow stacks instead. I have no strong
feelings either way: implicit allocation is not really consistent with
anything else we do and creates the potential for errors around thread
exit, but on the other hand it is existing ABI on x86 and minimises the
changes needed in userspace code.
There is an open issue with support for CRIU: on x86 this required the
ability to set the shadow stack mode via ptrace. This series supports
configuring mode bits other than enable via ptrace, but it still needs
to be confirmed whether this is sufficient.
The series depends on support for shadow stacks in clone3(); that series
includes the addition of ARCH_HAS_USER_SHADOW_STACK.
https://lore.kernel.org/r/[email protected]
It also depends on the addition of more waitpid() flags to nolibc:
https://lore.kernel.org/r/[email protected]
You can see a branch with the full set of dependencies against Linus'
tree at:
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc.git arm64-gcs
[1] https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Mark Brown <[email protected]>
---
Changes in v8:
- Invalidate signal cap token on stack when consuming.
- Typo and other trivial fixes.
- Don't try to use process_vm_write() on GCS, as it intentionally does
not work.
- Fix leak of thread GCSs.
- Rebase onto latest clone3() series.
- Link to v7: https://lore.kernel.org/r/[email protected]
Changes in v7:
- Rebase onto v6.7-rc2 via the clone3() patch series.
- Change the token used to cap the stack during signal handling to be
compatible with GCSPOPM.
- Fix flags for new page types.
- Fold in support for clone3().
- Replace copy_to_user_gcs() with put_user_gcs().
- Link to v6: https://lore.kernel.org/r/[email protected]
Changes in v6:
- Rebase onto v6.6-rc3.
- Add some more gcsb_dsync() barriers following spec clarifications.
- Due to ongoing discussion around clone()/clone3() I've not updated
anything there; the behaviour is the same as on previous versions.
- Link to v5: https://lore.kernel.org/r/[email protected]
Changes in v5:
- Don't map any permissions for user GCSs, as we always use EL0 accessors
or use a separate mapping of the page.
- Reduce the standard size of the GCS to RLIMIT_STACK/2.
- Enforce a PAGE_SIZE alignment requirement on map_shadow_stack().
- Clarifications and fixes to documentation.
- More tests.
- Link to v4: https://lore.kernel.org/r/[email protected]
Changes in v4:
- Implement flags for map_shadow_stack() allowing the cap and end of
stack marker to be enabled independently or not at all.
- Relax size and alignment requirements for map_shadow_stack().
- Add more blurb explaining the advantages of hardware enforcement.
- Link to v3: https://lore.kernel.org/r/[email protected]
Changes in v3:
- Rebase onto v6.5-rc4.
- Add a GCS barrier on context switch.
- Add a GCS stress test.
- Link to v2: https://lore.kernel.org/r/[email protected]
Changes in v2:
- Rebase onto v6.5-rc3.
- Rework prctl() interface to allow each bit to be locked independently.
- map_shadow_stack() now places the cap token based on the size
requested by the caller not the actual space allocated.
- Mode changes other than enable via ptrace are now supported.
- Expand test coverage.
- Various smaller fixes and adjustments.
- Link to v1: https://lore.kernel.org/r/[email protected]
---
Mark Brown (38):
arm64/mm: Restructure arch_validate_flags() for extensibility
prctl: arch-agnostic prctl for shadow stack
mman: Add map_shadow_stack() flags
arm64: Document boot requirements for Guarded Control Stacks
arm64/gcs: Document the ABI for Guarded Control Stacks
arm64/sysreg: Add definitions for architected GCS caps
arm64/gcs: Add manual encodings of GCS instructions
arm64/gcs: Provide put_user_gcs()
arm64/cpufeature: Runtime detection of Guarded Control Stack (GCS)
arm64/mm: Allocate PIE slots for EL0 guarded control stack
mm: Define VM_SHADOW_STACK for arm64 when we support GCS
arm64/mm: Map pages for guarded control stack
KVM: arm64: Manage GCS registers for guests
arm64/gcs: Allow GCS usage at EL0 and EL1
arm64/idreg: Add override for GCS
arm64/hwcap: Add hwcap for GCS
arm64/traps: Handle GCS exceptions
arm64/mm: Handle GCS data aborts
arm64/gcs: Context switch GCS state for EL0
arm64/gcs: Ensure that new threads have a GCS
arm64/gcs: Implement shadow stack prctl() interface
arm64/mm: Implement map_shadow_stack()
arm64/signal: Set up and restore the GCS context for signal handlers
arm64/signal: Expose GCS state in signal frames
arm64/ptrace: Expose GCS via ptrace and core files
arm64: Add Kconfig for Guarded Control Stack (GCS)
kselftest/arm64: Verify the GCS hwcap
kselftest/arm64: Add GCS as a detected feature in the signal tests
kselftest/arm64: Add framework support for GCS to signal handling tests
kselftest/arm64: Allow signals tests to specify an expected si_code
kselftest/arm64: Always run signals tests with GCS enabled
kselftest/arm64: Add very basic GCS test program
kselftest/arm64: Add a GCS test program built with the system libc
kselftest/arm64: Add test coverage for GCS mode locking
selftests/arm64: Add GCS signal tests
kselftest/arm64: Add a GCS stress test
kselftest/arm64: Enable GCS for the FP stress tests
kselftest: Provide shadow stack enable helpers for arm64
Documentation/admin-guide/kernel-parameters.txt | 6 +
Documentation/arch/arm64/booting.rst | 22 +
Documentation/arch/arm64/elf_hwcaps.rst | 3 +
Documentation/arch/arm64/gcs.rst | 233 +++++++
Documentation/arch/arm64/index.rst | 1 +
Documentation/filesystems/proc.rst | 2 +-
arch/arm64/Kconfig | 20 +
arch/arm64/include/asm/cpufeature.h | 6 +
arch/arm64/include/asm/el2_setup.h | 17 +
arch/arm64/include/asm/esr.h | 28 +-
arch/arm64/include/asm/exception.h | 2 +
arch/arm64/include/asm/gcs.h | 107 +++
arch/arm64/include/asm/hwcap.h | 1 +
arch/arm64/include/asm/kvm_arm.h | 4 +-
arch/arm64/include/asm/kvm_host.h | 12 +
arch/arm64/include/asm/mman.h | 23 +-
arch/arm64/include/asm/pgtable-prot.h | 14 +-
arch/arm64/include/asm/processor.h | 7 +
arch/arm64/include/asm/sysreg.h | 20 +
arch/arm64/include/asm/uaccess.h | 40 ++
arch/arm64/include/uapi/asm/hwcap.h | 1 +
arch/arm64/include/uapi/asm/ptrace.h | 8 +
arch/arm64/include/uapi/asm/sigcontext.h | 9 +
arch/arm64/kernel/cpufeature.c | 19 +
arch/arm64/kernel/cpuinfo.c | 1 +
arch/arm64/kernel/entry-common.c | 23 +
arch/arm64/kernel/idreg-override.c | 2 +
arch/arm64/kernel/process.c | 85 +++
arch/arm64/kernel/ptrace.c | 59 ++
arch/arm64/kernel/signal.c | 242 ++++++-
arch/arm64/kernel/traps.c | 11 +
arch/arm64/kvm/emulate-nested.c | 4 +
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 17 +
arch/arm64/kvm/sys_regs.c | 22 +
arch/arm64/mm/Makefile | 1 +
arch/arm64/mm/fault.c | 79 ++-
arch/arm64/mm/gcs.c | 300 +++++++++
arch/arm64/mm/mmap.c | 13 +-
arch/arm64/tools/cpucaps | 1 +
arch/x86/include/uapi/asm/mman.h | 3 -
fs/proc/task_mmu.c | 3 +
include/linux/mm.h | 16 +-
include/uapi/asm-generic/mman.h | 4 +
include/uapi/linux/elf.h | 1 +
include/uapi/linux/prctl.h | 22 +
kernel/sys.c | 30 +
tools/testing/selftests/arm64/Makefile | 2 +-
tools/testing/selftests/arm64/abi/hwcap.c | 19 +
tools/testing/selftests/arm64/fp/assembler.h | 15 +
tools/testing/selftests/arm64/fp/fpsimd-test.S | 2 +
tools/testing/selftests/arm64/fp/sve-test.S | 2 +
tools/testing/selftests/arm64/fp/za-test.S | 2 +
tools/testing/selftests/arm64/fp/zt-test.S | 2 +
tools/testing/selftests/arm64/gcs/.gitignore | 5 +
tools/testing/selftests/arm64/gcs/Makefile | 24 +
tools/testing/selftests/arm64/gcs/asm-offsets.h | 0
tools/testing/selftests/arm64/gcs/basic-gcs.c | 428 ++++++++++++
tools/testing/selftests/arm64/gcs/gcs-locking.c | 200 ++++++
.../selftests/arm64/gcs/gcs-stress-thread.S | 311 +++++++++
tools/testing/selftests/arm64/gcs/gcs-stress.c | 532 +++++++++++++++
tools/testing/selftests/arm64/gcs/gcs-util.h | 100 +++
tools/testing/selftests/arm64/gcs/libc-gcs.c | 736 +++++++++++++++++++++
tools/testing/selftests/arm64/signal/.gitignore | 1 +
.../testing/selftests/arm64/signal/test_signals.c | 17 +-
.../testing/selftests/arm64/signal/test_signals.h | 6 +
.../selftests/arm64/signal/test_signals_utils.c | 32 +-
.../selftests/arm64/signal/test_signals_utils.h | 39 ++
.../arm64/signal/testcases/gcs_exception_fault.c | 62 ++
.../selftests/arm64/signal/testcases/gcs_frame.c | 88 +++
.../arm64/signal/testcases/gcs_write_fault.c | 67 ++
.../selftests/arm64/signal/testcases/testcases.c | 7 +
.../selftests/arm64/signal/testcases/testcases.h | 1 +
tools/testing/selftests/ksft_shstk.h | 37 ++
73 files changed, 4241 insertions(+), 40 deletions(-)
---
base-commit: 50abefbf1bc07f5c4e403fd28f71dcee855100f7
change-id: 20230303-arm64-gcs-e311ab0d8729
Best regards,
--
Mark Brown <[email protected]>
Currently arch_validate_flags() is written in a very non-extensible
fashion, returning immediately if MTE is not supported and writing the MTE
check as a direct return. Since we will want to add more checks for GCS,
refactor the existing code to be more extensible. No functional change
intended.
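The structure this refactor aims for can be shown with a userspace model. Each supported feature gets its own block that can only reject the flags, and falling through every block means the flags are acceptable. All flag bits, feature switches, and the shadow stack rule below are hypothetical and are not the checks the series actually adds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits and feature switches, for illustration only. */
#define MODEL_VM_MTE		(1UL << 0)
#define MODEL_VM_MTE_ALLOWED	(1UL << 1)
#define MODEL_VM_SHADOW_STACK	(1UL << 2)
#define MODEL_VM_SHARED		(1UL << 3)

static bool model_supports_mte = true;
static bool model_supports_gcs = true;

static bool model_validate_flags(unsigned long vm_flags)
{
	if (model_supports_mte) {
		/* only allow VM_MTE if VM_MTE_ALLOWED has been set previously */
		if ((vm_flags & MODEL_VM_MTE) && !(vm_flags & MODEL_VM_MTE_ALLOWED))
			return false;
	}

	if (model_supports_gcs) {
		/* hypothetical rule: shadow stack mappings must not be shared */
		if ((vm_flags & MODEL_VM_SHADOW_STACK) && (vm_flags & MODEL_VM_SHARED))
			return false;
	}

	return true;
}
```

Unlike the original early `return true`, an unsupported feature here only skips its own block, so later checks still run.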
Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/asm/mman.h | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h
index 5966ee4a6154..c21849ffdd88 100644
--- a/arch/arm64/include/asm/mman.h
+++ b/arch/arm64/include/asm/mman.h
@@ -52,11 +52,17 @@ static inline bool arch_validate_prot(unsigned long prot,
static inline bool arch_validate_flags(unsigned long vm_flags)
{
- if (!system_supports_mte())
- return true;
+ if (system_supports_mte()) {
+ /*
+ * only allow VM_MTE if VM_MTE_ALLOWED has been set
+ * previously
+ */
+ if ((vm_flags & VM_MTE) && !(vm_flags & VM_MTE_ALLOWED))
+ return false;
+ }
+
+ return true;
- /* only allow VM_MTE if VM_MTE_ALLOWED has been set previously */
- return !(vm_flags & VM_MTE) || (vm_flags & VM_MTE_ALLOWED);
}
#define arch_validate_flags(vm_flags) arch_validate_flags(vm_flags)
--
2.30.2
In preparation for adding arm64 GCS support, make the map_shadow_stack()
SHADOW_STACK_SET_TOKEN flag generic and add SHADOW_STACK_SET_MARKER. The
existing flag indicates that a token usable for stack switching should be
added to the top of the newly mapped GCS region, while the new flag
indicates that a top of stack marker suitable for use by unwinders should
be added above that.
For arm64 the top of stack marker is all bits 0.
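The following sketch shows where the marker and cap land for a stack mapped at [base, base + size), following the placement rules documented in gcs.rst later in this series. The flag values are the ones added to asm-generic/mman.h by this patch. The helper and struct names are hypothetical, and 0 is used as a "not written" sentinel purely for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as added to include/uapi/asm-generic/mman.h here. */
#define SHADOW_STACK_SET_TOKEN	(1ULL << 0)
#define SHADOW_STACK_SET_MARKER	(1ULL << 1)

struct ss_layout {
	uint64_t marker_addr;	/* 0 if no marker is written */
	uint64_t cap_addr;	/* 0 if no cap token is written */
};

/*
 * The marker (all bits 0) takes the topmost 8 bytes; the cap takes the
 * topmost 8 bytes, or the 8 bytes below the marker if one is present.
 */
static struct ss_layout ss_layout_for(uint64_t base, uint64_t size,
				      unsigned long long flags)
{
	struct ss_layout l = { 0, 0 };
	uint64_t slot = base + size - 8;	/* topmost 8-byte entry */

	if (flags & SHADOW_STACK_SET_MARKER) {
		l.marker_addr = slot;
		slot -= 8;
	}
	if (flags & SHADOW_STACK_SET_TOKEN)
		l.cap_addr = slot;

	return l;
}
```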
Signed-off-by: Mark Brown <[email protected]>
---
arch/x86/include/uapi/asm/mman.h | 3 ---
include/uapi/asm-generic/mman.h | 4 ++++
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/uapi/asm/mman.h b/arch/x86/include/uapi/asm/mman.h
index 46cdc941f958..ac1e6277212b 100644
--- a/arch/x86/include/uapi/asm/mman.h
+++ b/arch/x86/include/uapi/asm/mman.h
@@ -5,9 +5,6 @@
#define MAP_32BIT 0x40 /* only give out 32bit addresses */
#define MAP_ABOVE4G 0x80 /* only map above 4GB */
-/* Flags for map_shadow_stack(2) */
-#define SHADOW_STACK_SET_TOKEN (1ULL << 0) /* Set up a restore token in the shadow stack */
-
#include <asm-generic/mman.h>
#endif /* _ASM_X86_MMAN_H */
diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h
index 57e8195d0b53..d6a282687af5 100644
--- a/include/uapi/asm-generic/mman.h
+++ b/include/uapi/asm-generic/mman.h
@@ -19,4 +19,8 @@
#define MCL_FUTURE 2 /* lock all future mappings */
#define MCL_ONFAULT 4 /* lock all pages that are faulted in */
+#define SHADOW_STACK_SET_TOKEN (1ULL << 0) /* Set up a restore token in the shadow stack */
+#define SHADOW_STACK_SET_MARKER (1ULL << 1) /* Set up a top of stack marker in the shadow stack */
+
+
#endif /* __ASM_GENERIC_MMAN_H */
--
2.30.2
FEAT_GCS introduces a number of new system registers. We require that
access to these registers is not trapped when the feature is detected.
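The following sketch shows how firmware or a boot wrapper might self-check the requirements. It uses the bit positions listed in the booting.rst hunk of this patch. The struct holding register snapshots and the helper names are hypothetical; reading the real registers is out of scope here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT64(n)	(UINT64_C(1) << (n))

/* Bit positions from the booting.rst hunk in this patch. */
#define REQ_SCR_EL3	BIT64(39)				/* GCSEn */
#define REQ_HFGITR_EL2	(BIT64(59) | BIT64(58) | BIT64(57))	/* nGCSEPP, nGCSSTR_EL1, nGCSPUSHM_EL1 */
#define REQ_HFGxTR_EL2	(BIT64(53) | BIT64(52))			/* nGCS_EL1, nGCS_EL0 */

/* Hypothetical snapshot of the relevant register values. */
struct boot_regs {
	bool el3_present, entered_at_el1_with_el2;
	uint64_t scr_el3, hfgitr_el2, hfgrtr_el2, hfgwtr_el2;
};

static bool all_set(uint64_t val, uint64_t mask)
{
	return (val & mask) == mask;
}

static bool gcs_boot_requirements_met(const struct boot_regs *r)
{
	if (r->el3_present && !all_set(r->scr_el3, REQ_SCR_EL3))
		return false;
	if (r->entered_at_el1_with_el2 &&
	    !(all_set(r->hfgitr_el2, REQ_HFGITR_EL2) &&
	      all_set(r->hfgrtr_el2, REQ_HFGxTR_EL2) &&
	      all_set(r->hfgwtr_el2, REQ_HFGxTR_EL2)))
		return false;
	return true;
}
```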
Signed-off-by: Mark Brown <[email protected]>
---
Documentation/arch/arm64/booting.rst | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/Documentation/arch/arm64/booting.rst b/Documentation/arch/arm64/booting.rst
index b57776a68f15..de3679770c64 100644
--- a/Documentation/arch/arm64/booting.rst
+++ b/Documentation/arch/arm64/booting.rst
@@ -411,6 +411,28 @@ Before jumping into the kernel, the following conditions must be met:
- HFGRWR_EL2.nPIRE0_EL1 (bit 57) must be initialised to 0b1.
+ - For CPUs with Guarded Control Stacks (FEAT_GCS):
+
+ - If EL3 is present:
+
+ - SCR_EL3.GCSEn (bit 39) must be initialised to 0b1.
+
+ - If the kernel is entered at EL1 and EL2 is present:
+
+ - HFGITR_EL2.nGCSEPP (bit 59) must be initialised to 0b1.
+
+ - HFGITR_EL2.nGCSSTR_EL1 (bit 58) must be initialised to 0b1.
+
+ - HFGITR_EL2.nGCSPUSHM_EL1 (bit 57) must be initialised to 0b1.
+
+ - HFGRTR_EL2.nGCS_EL1 (bit 53) must be initialised to 0b1.
+
+ - HFGRTR_EL2.nGCS_EL0 (bit 52) must be initialised to 0b1.
+
+ - HFGWTR_EL2.nGCS_EL1 (bit 53) must be initialised to 0b1.
+
+ - HFGWTR_EL2.nGCS_EL0 (bit 52) must be initialised to 0b1.
+
The requirements described above for CPU mode, caches, MMUs, architected
timers, coherency and system registers apply to all CPUs. All CPUs must
enter the kernel in the same exception level. Where the values documented
--
2.30.2
Add some documentation of the userspace ABI for Guarded Control Stacks.
Signed-off-by: Mark Brown <[email protected]>
---
Documentation/arch/arm64/gcs.rst | 233 +++++++++++++++++++++++++++++++++++++
Documentation/arch/arm64/index.rst | 1 +
2 files changed, 234 insertions(+)
diff --git a/Documentation/arch/arm64/gcs.rst b/Documentation/arch/arm64/gcs.rst
new file mode 100644
index 000000000000..c45c0326836a
--- /dev/null
+++ b/Documentation/arch/arm64/gcs.rst
@@ -0,0 +1,233 @@
+===============================================
+Guarded Control Stack support for AArch64 Linux
+===============================================
+
+This document briefly outlines the interface provided to userspace by Linux in
+order to support use of the ARM Guarded Control Stack (GCS) feature.
+
+This is an outline of the most important features and issues only and not
+intended to be exhaustive.
+
+
+
+1. General
+-----------
+
+* GCS is an architecture feature intended to provide greater protection
+ against return oriented programming (ROP) attacks and to simplify the
+ implementation of features that need to collect stack traces such as
+ profiling.
+
+* When GCS is enabled, a separate guarded control stack is maintained by the
+ PE which is writeable only through specific GCS operations. This
+ stores the call stack only: when a procedure call instruction is
+ performed the return address is pushed onto the GCS, and on RET the
+ address in the LR is verified against that on the top of the GCS.
+
+* When active, the current GCS pointer is stored in the system register
+ GCSPR_EL0. This is readable by userspace but can only be updated
+ via specific GCS instructions.
+
+* The architecture provides instructions for switching between guarded
+ control stacks with checks to ensure that the new stack is a valid
+ target for switching.
+
+* The functionality of GCS is similar to that provided by the x86 Shadow
+ Stack feature; due to sharing of userspace interfaces, the ABI refers to
+ shadow stacks rather than GCS.
+
+* Support for GCS is reported to userspace via HWCAP2_GCS in the aux vector
+ AT_HWCAP2 entry.
+
+* GCS is enabled per thread. While there is support for disabling GCS
+ at runtime, this should be done with great care.
+
+* GCS memory access faults are reported as normal memory access faults.
+
+* GCS specific errors (those reported with EC 0x2d) will be reported as
+ SIGSEGV with a si_code of SEGV_CPERR (control protection error).
+
+* GCS is supported only for AArch64.
+
+* On systems where GCS is supported GCSPR_EL0 is always readable by EL0
+ regardless of the GCS configuration for the thread.
+
+* The architecture supports enabling GCS without verifying that return values
+ in LR match those in the GCS; in that mode the LR is ignored. This mode is
+ not supported by Linux.
+
+* EL0 GCS entries with bit 63 set are reserved for use. One such use is defined
+ below for signals; such entries should be ignored when parsing the stack if
+ not understood.
+
+
+2. Enabling and disabling Guarded Control Stacks
+-------------------------------------------------
+
+* GCS is enabled and disabled for a thread via the PR_SET_SHADOW_STACK_STATUS
+ prctl(), which takes a single flags argument specifying which GCS features
+ should be used.
+
+* When set, the PR_SHADOW_STACK_ENABLE flag allocates a Guarded Control Stack
+ and enables GCS for the thread, enabling the functionality controlled by
+ GCSCRE0_EL1.{nTR, RVCHKEN, PCRSEL}.
+
+* When set, the PR_SHADOW_STACK_PUSH flag enables the functionality controlled
+ by GCSCRE0_EL1.PUSHMEn, allowing explicit GCS pushes.
+
+* When set, the PR_SHADOW_STACK_WRITE flag enables the functionality controlled
+ by GCSCRE0_EL1.STREn, allowing explicit stores to the Guarded Control Stack.
+
+* Any unknown flags will cause PR_SET_SHADOW_STACK_STATUS to return -EINVAL.
+
+* PR_LOCK_SHADOW_STACK_STATUS is passed a bitmask of features with the same
+ values as used for PR_SET_SHADOW_STACK_STATUS. Any future changes to the
+ status of the specified GCS mode bits will be rejected.
+
+* PR_LOCK_SHADOW_STACK_STATUS allows any bit to be locked, which allows
+ userspace to prevent changes to features added in future.
+
+* There is no support for a process to remove a lock that has been set for
+ it.
+
+* PR_SET_SHADOW_STACK_STATUS and PR_LOCK_SHADOW_STACK_STATUS affect only the
+ thread that called them; any other running threads will be unaffected.
+
+* New threads inherit the GCS configuration of the thread that created them.
+
+* GCS is disabled on exec().
+
+* The current GCS configuration for a thread may be read with the
+ PR_GET_SHADOW_STACK_STATUS prctl(), which returns the same flags that
+ are passed to PR_SET_SHADOW_STACK_STATUS.
+
+* If GCS is disabled for a thread after having previously been enabled then
+ the stack will remain allocated for the lifetime of the thread. At present
+ any attempt to reenable GCS for the thread will be rejected; this may be
+ revisited in future.
+
+* It should be noted that since enabling GCS will result in GCS becoming
+ active immediately, it is not normally possible to return from the function
+ that invoked the prctl() that enabled GCS. It is expected that the normal
+ usage will be that GCS is enabled very early in execution of a program.
+
+
+
+3. Allocation of Guarded Control Stacks
+----------------------------------------
+
+* When GCS is enabled for a thread, a new Guarded Control Stack of size
+ RLIMIT_STACK or 4 gigabytes, whichever is smaller, will be allocated
+ for it.
+
+* When a new thread is created by a thread which has GCS enabled then a
+ new Guarded Control Stack will be allocated for the new thread with
+ half the size of the standard stack.
+
+* When a stack is allocated by enabling GCS or during thread creation then
+ the top 8 bytes of the stack will be initialised to 0 and GCSPR_EL0 will
+ be set to point to the address of this 0 value, which can be used to
+ detect the top of the stack.
+
+* Additional Guarded Control Stacks can be allocated using the
+ map_shadow_stack() system call.
+
+* Stacks allocated using map_shadow_stack() can optionally have an end of
+ stack marker and cap placed at the top of the stack. If the flag
+ SHADOW_STACK_SET_TOKEN is specified a cap will be placed on the stack:
+ if SHADOW_STACK_SET_MARKER is not specified the cap will be the top 8
+ bytes of the stack, and if it is specified the cap will be the next 8
+ bytes below the marker. Specifying SHADOW_STACK_SET_MARKER by itself is
+ valid, but since the marker is all bits 0 it has no observable effect.
+
+* Stacks allocated using map_shadow_stack() must have a size which is a
+ multiple of 8 bytes and larger than 8 bytes, and must be 8 byte aligned.
+
+* An address can be specified to map_shadow_stack(); if one is provided
+ then it must be aligned to a page boundary.
+
+* When a thread is freed the Guarded Control Stack initially allocated for
+ that thread will be freed. Note carefully that if the stack has been
+ switched this may not be the stack currently in use by the thread.
+
+
+4. Signal handling
+--------------------
+
+* A new signal frame record gcs_context encodes the current GCS mode and
+ pointer for the interrupted context on signal delivery. This will always
+ be present on systems that support GCS.
+
+* The record contains a flag field which reports the current GCS configuration
+ for the interrupted context as PR_GET_SHADOW_STACK_STATUS would.
+
+* The signal handler is run with the same GCS configuration as the interrupted
+ context.
+
+* When GCS is enabled for the interrupted thread, a signal handling specific
+ GCS cap token will be written to the GCS. This is an architectural GCS cap
+ token with bit 63 set and the token type (bits 0..11) all clear. The
+ GCSPR_EL0 reported in the signal frame will point to this cap token.
+
+* The signal handler will use the same GCS as the interrupted context.
+
+* When GCS is enabled on signal entry, a frame with the address of the signal
+ return handler will be pushed onto the GCS, allowing return from the signal
+ handler via RET as normal. This will not be reported in the gcs_context in
+ the signal frame.
+
+
+5. Signal return
+-----------------
+
+When returning from a signal handler:
+
+* If there is a gcs_context record in the signal frame then the GCS flags
+ and GCSPR_EL0 will be restored from that context prior to further
+ validation.
+
+* If there is no gcs_context record in the signal frame then the GCS
+ configuration will be unchanged.
+
+* If GCS is enabled on return from a signal handler then GCSPR_EL0 must
+ point to a valid GCS signal cap record, which will be popped from the
+ GCS prior to signal return.
+
+* If the GCS configuration is locked when returning from a signal then any
+ attempt to change the GCS configuration will be treated as an error. This
+ is true even if GCS was not enabled prior to signal entry.
+
+* GCS may be disabled via signal return but any attempt to enable GCS via
+ signal return will be rejected.
+
+
+6. ptrace extensions
+---------------------
+
+* A new regset NT_ARM_GCS is defined for use with PTRACE_GETREGSET and
+ PTRACE_SETREGSET.
+
+* Due to the complexity surrounding allocation and deallocation of stacks and
+ the lack of practical application, it is not possible to enable GCS via
+ ptrace. GCS may be disabled via the ptrace interface.
+
+* Other GCS modes may be configured via ptrace.
+
+* Configuration via ptrace ignores locking of GCS mode bits.
+
+
+7. ELF coredump extensions
+---------------------------
+
+* NT_ARM_GCS notes will be added to each coredump for each thread of the
+ dumped process. The contents will be equivalent to the data that would
+ have been read if a PTRACE_GETREGSET of the corresponding type were
+ executed for each thread when the coredump was generated.
+
+
+
+8. /proc extensions
+--------------------
+
+* Guarded Control Stack pages will include "ss" in their VmFlags in
+ /proc/<pid>/smaps.
diff --git a/Documentation/arch/arm64/index.rst b/Documentation/arch/arm64/index.rst
index d08e924204bf..dcf3ee3eb8c0 100644
--- a/Documentation/arch/arm64/index.rst
+++ b/Documentation/arch/arm64/index.rst
@@ -14,6 +14,7 @@ ARM64 Architecture
booting
cpu-feature-registers
elf_hwcaps
+ gcs
hugetlbpage
kdump
legacy_instructions
--
2.30.2
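The locking rules documented in section 2 of gcs.rst above can be summarised as a small userspace model. Unknown flags are rejected, any bit (even an unknown one) can be locked, locks only accumulate, and changing a locked bit fails. The MODEL_SS_* values are hypothetical stand-ins for the PR_SHADOW_STACK_* constants, which are not reproduced here. The -EBUSY return for a locked bit is also an assumption.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the PR_SHADOW_STACK_* flag values. */
#define MODEL_SS_ENABLE	(1UL << 0)
#define MODEL_SS_WRITE	(1UL << 1)
#define MODEL_SS_PUSH	(1UL << 2)
#define MODEL_SS_KNOWN	(MODEL_SS_ENABLE | MODEL_SS_WRITE | MODEL_SS_PUSH)

struct model_ss_state {
	unsigned long mode;	/* currently enabled features */
	unsigned long locked;	/* bits that may no longer change */
};

/* Model of PR_SET_SHADOW_STACK_STATUS for the calling thread. */
static int model_set_status(struct model_ss_state *s, unsigned long mode)
{
	if (mode & ~MODEL_SS_KNOWN)
		return -EINVAL;		/* documented: unknown flags rejected */
	if ((mode ^ s->mode) & s->locked)
		return -EBUSY;		/* assumed errno for a locked bit */
	s->mode = mode;
	return 0;
}

/* Model of PR_LOCK_SHADOW_STACK_STATUS: locks accumulate, never clear. */
static void model_lock_status(struct model_ss_state *s, unsigned long bits)
{
	s->locked |= bits;
}
```

Bits that are not locked, such as the push bit in the usage below, can still be changed after another bit has been locked.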
The architecture defines a format for guarded control stack caps, used
to mark the top of an unused GCS in order to limit the potential for
exploitation via stack switching. Add definitions associated with these.
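As an illustration, the cap encoding in the sysreg.h hunk can be restated in plain userspace C. Address bits 63:12 are kept in place and the token field (bits 11:0) holds the token type. The helper names are hypothetical. Note that, like FIELD_GET(), the address extraction shifts the field down.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace restatement of the definitions added to sysreg.h. */
#define GCS_CAP_ADDR_MASK	(~UINT64_C(0xfff))	/* GENMASK(63, 12) */
#define GCS_CAP_ADDR_SHIFT	12
#define GCS_CAP_TOKEN_MASK	UINT64_C(0xfff)		/* GENMASK(11, 0) */
#define GCS_CAP_VALID_TOKEN	UINT64_C(0x1)

/* Equivalent of GCS_CAP(x): keep address bits 63:12, mark as valid. */
static uint64_t gcs_cap(uint64_t addr)
{
	return (addr & GCS_CAP_ADDR_MASK) | GCS_CAP_VALID_TOKEN;
}

/* Equivalent of GCS_CAP_ADDR(x): FIELD_GET() shifts the field down. */
static uint64_t gcs_cap_addr(uint64_t cap)
{
	return (cap & GCS_CAP_ADDR_MASK) >> GCS_CAP_ADDR_SHIFT;
}

/* Equivalent of GCS_CAP_TOKEN(x). */
static uint64_t gcs_cap_token(uint64_t cap)
{
	return cap & GCS_CAP_TOKEN_MASK;
}
```

For example, gcs_cap(0xffff12345678) yields 0xffff12345001.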
Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/asm/sysreg.h | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index c3b19b376c86..6ed813e856c1 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -1064,6 +1064,26 @@
#define POE_RXW UL(0x7)
#define POE_MASK UL(0xf)
+/*
+ * Definitions for Guarded Control Stack
+ */
+
+#define GCS_CAP_ADDR_MASK GENMASK(63, 12)
+#define GCS_CAP_ADDR_SHIFT 12
+#define GCS_CAP_ADDR_WIDTH 52
+#define GCS_CAP_ADDR(x) FIELD_GET(GCS_CAP_ADDR_MASK, x)
+
+#define GCS_CAP_TOKEN_MASK GENMASK(11, 0)
+#define GCS_CAP_TOKEN_SHIFT 0
+#define GCS_CAP_TOKEN_WIDTH 12
+#define GCS_CAP_TOKEN(x) FIELD_GET(GCS_CAP_TOKEN_MASK, x)
+
+#define GCS_CAP_VALID_TOKEN 0x1
+#define GCS_CAP_IN_PROGRESS_TOKEN 0x5
+
+#define GCS_CAP(x) ((((unsigned long)(x)) & GCS_CAP_ADDR_MASK) | \
+ GCS_CAP_VALID_TOKEN)
+
#define ARM64_FEATURE_FIELD_BITS 4
/* Defined for compatibility only, do not add new users. */
--
2.30.2
In order for EL1 to write to an EL0 GCS, it must use the GCSSTTR instruction
rather than a normal STTR. Provide a put_user_gcs() which does this.
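put_user_gcs() only overwrites *err when a store fails. A caller can therefore issue several writes and check a single error variable at the end. The following userspace sketch shows that calling pattern. fake_gcssttr() and fake_put_user_gcs() are hypothetical stand-ins that "fault" outside a small array, not kernel interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a store that can fault. */
#define FAKE_SLOTS 4
static uint64_t fake_gcs[FAKE_SLOTS];

static int fake_gcssttr(size_t idx, uint64_t val)
{
	if (idx >= FAKE_SLOTS)
		return -EFAULT;
	fake_gcs[idx] = val;
	return 0;
}

/* Same shape as put_user_gcs(): *err is only written on failure. */
static void fake_put_user_gcs(uint64_t val, size_t idx, int *err)
{
	int ret = fake_gcssttr(idx, val);

	if (ret != 0)
		*err = ret;
}
```

Because a later successful write never clears *err, an earlier failure is still visible when the caller finally checks it.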
Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/asm/uaccess.h | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 22e10e79f56a..e118c3d772c8 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -445,6 +445,24 @@ static inline int gcssttr(unsigned long __user *addr, unsigned long val)
return err;
}
+static inline void put_user_gcs(unsigned long val, unsigned long __user *addr,
+ int *err)
+{
+ int ret;
+
+ if (!access_ok((char __user *)addr, sizeof(u64))) {
+ *err = -EFAULT;
+ return;
+ }
+
+ uaccess_ttbr0_enable();
+ ret = gcssttr(addr, val);
+ if (ret != 0)
+ *err = ret;
+ uaccess_ttbr0_disable();
+}
+
+
#endif /* CONFIG_ARM64_GCS */
#endif /* __ASM_UACCESS_H */
--
2.30.2
Add a cpufeature for GCS, allowing other code to conditionally support it
at runtime.
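The ID_AA64PFR1_EL1.GCS field is registered with FTR_LOWER_SAFE, so the system-wide value is the lowest value seen on any CPU. A single CPU without GCS therefore hides the feature for the whole system. The following userspace sketch models that rule. The field shift is passed in as a parameter, since the real shift is not shown here, and the IMP value is assumed to be 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Extract an unsigned 4-bit ID register field starting at 'shift'. */
static unsigned int id_field(uint64_t reg, unsigned int shift)
{
	return (unsigned int)((reg >> shift) & 0xf);
}

/*
 * FTR_LOWER_SAFE model: take the minimum field value over all CPUs and
 * compare it with the required value (IMP, assumed to be 1 for GCS).
 */
static bool system_has_field(const uint64_t *regs, size_t ncpus,
			     unsigned int shift, unsigned int min_val)
{
	unsigned int safe = 0xf;

	for (size_t i = 0; i < ncpus; i++) {
		unsigned int v = id_field(regs[i], shift);

		if (v < safe)
			safe = v;
	}
	return ncpus > 0 && safe >= min_val;
}
```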
Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/asm/cpufeature.h | 6 ++++++
arch/arm64/kernel/cpufeature.c | 16 ++++++++++++++++
arch/arm64/tools/cpucaps | 1 +
3 files changed, 23 insertions(+)
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 21c824edf8ce..3f3a685fa6e2 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -825,6 +825,12 @@ static inline bool system_supports_lpa2(void)
return cpus_have_final_cap(ARM64_HAS_LPA2);
}
+static inline bool system_supports_gcs(void)
+{
+ return IS_ENABLED(CONFIG_ARM64_GCS) &&
+ alternative_has_cap_unlikely(ARM64_HAS_GCS);
+}
+
int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
bool try_emulate_mrs(struct pt_regs *regs, u32 isn);
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 8d1a634a403e..b606842ab8c1 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -255,6 +255,8 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
};
static const struct arm64_ftr_bits ftr_id_aa64pfr1[] = {
+ ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_GCS),
+ FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_EL1_GCS_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME),
FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_EL1_SME_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_EL1_MPAM_frac_SHIFT, 4, 0),
@@ -2250,6 +2252,12 @@ static void cpu_enable_mops(const struct arm64_cpu_capabilities *__unused)
sysreg_clear_set(sctlr_el1, 0, SCTLR_EL1_MSCEn);
}
+static void cpu_enable_gcs(const struct arm64_cpu_capabilities *__unused)
+{
+ /* GCS is not currently used at EL1 */
+ write_sysreg_s(0, SYS_GCSCR_EL1);
+}
+
/* Internal helper functions to match cpu capability type */
static bool
cpucap_late_cpu_optional(const struct arm64_cpu_capabilities *cap)
@@ -2739,6 +2747,14 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
.type = ARM64_CPUCAP_SYSTEM_FEATURE,
.matches = has_lpa2,
},
+ {
+ .desc = "Guarded Control Stack (GCS)",
+ .capability = ARM64_HAS_GCS,
+ .type = ARM64_CPUCAP_SYSTEM_FEATURE,
+ .cpu_enable = cpu_enable_gcs,
+ .matches = has_cpuid_feature,
+ ARM64_CPUID_FIELDS(ID_AA64PFR1_EL1, GCS, IMP)
+ },
{},
};
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index b912b1409fc0..148734504295 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -28,6 +28,7 @@ HAS_EPAN
HAS_EVT
HAS_FGT
HAS_FPSIMD
+HAS_GCS
HAS_GENERIC_AUTH
HAS_GENERIC_AUTH_ARCH_QARMA3
HAS_GENERIC_AUTH_ARCH_QARMA5
--
2.30.2
Pages used for guarded control stacks need to be described to the hardware
using the Permission Indirection Extension; GCS is not supported without
PIE. In order to support copy on write for guarded stacks, we allocate two
values, one for active GCSs and one for GCS pages marked as read only prior
to copy.
Since the actual effect is defined using PIE, the specific bit pattern used
does not matter to the hardware, but we choose two values which differ only
in PTE_WRITE in order to help share code with non-PIE cases.
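The comment table in pgtable-prot.h implies a 4-bit PIE index in which PTE_UXN contributes 0x8, PTE_PXN 0x4, PTE_WRITE 0x2 and PTE_USER 0x1. A small model of that table shows why the two GCS encodings chosen here (0xb and 0x9) differ only in PTE_WRITE. model_pi_index() is a hypothetical helper derived from the table, not the kernel's pte_pi_index().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Index implied by the pgtable-prot.h comment table:
 * UXN -> 0x8, PXN -> 0x4, WRITE -> 0x2, USER -> 0x1.
 */
static unsigned int model_pi_index(bool uxn, bool pxn, bool write, bool user)
{
	return (uxn ? 0x8u : 0) | (pxn ? 0x4u : 0) |
	       (write ? 0x2u : 0) | (user ? 0x1u : 0);
}
```

With this model, _PAGE_GCS (UXN | WRITE | USER) maps to slot b and _PAGE_GCS_RO (UXN | USER) to slot 9, matching the updated comments.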
Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/asm/pgtable-prot.h | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 483dbfa39c4c..14a33e0bece3 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -129,15 +129,23 @@ extern bool arm64_use_ng_mappings;
/* 6: PTE_PXN | PTE_WRITE */
/* 7: PAGE_SHARED_EXEC PTE_PXN | PTE_WRITE | PTE_USER */
/* 8: PAGE_KERNEL_ROX PTE_UXN */
-/* 9: PTE_UXN | PTE_USER */
+/* 9: PAGE_GCS_RO PTE_UXN | PTE_USER */
/* a: PAGE_KERNEL_EXEC PTE_UXN | PTE_WRITE */
-/* b: PTE_UXN | PTE_WRITE | PTE_USER */
+/* b: PAGE_GCS PTE_UXN | PTE_WRITE | PTE_USER */
/* c: PAGE_KERNEL_RO PTE_UXN | PTE_PXN */
/* d: PAGE_READONLY PTE_UXN | PTE_PXN | PTE_USER */
/* e: PAGE_KERNEL PTE_UXN | PTE_PXN | PTE_WRITE */
/* f: PAGE_SHARED PTE_UXN | PTE_PXN | PTE_WRITE | PTE_USER */
+#define _PAGE_GCS (_PAGE_DEFAULT | PTE_NG | PTE_UXN | PTE_WRITE | PTE_USER)
+#define _PAGE_GCS_RO (_PAGE_DEFAULT | PTE_NG | PTE_UXN | PTE_USER)
+
+#define PAGE_GCS __pgprot(_PAGE_GCS)
+#define PAGE_GCS_RO __pgprot(_PAGE_GCS_RO)
+
#define PIE_E0 ( \
+ PIRx_ELx_PERM(pte_pi_index(_PAGE_GCS), PIE_GCS) | \
+ PIRx_ELx_PERM(pte_pi_index(_PAGE_GCS_RO), PIE_R) | \
PIRx_ELx_PERM(pte_pi_index(_PAGE_EXECONLY), PIE_X_O) | \
PIRx_ELx_PERM(pte_pi_index(_PAGE_READONLY_EXEC), PIE_RX) | \
PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED_EXEC), PIE_RWX) | \
@@ -145,6 +153,8 @@ extern bool arm64_use_ng_mappings;
PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED), PIE_RW))
#define PIE_E1 ( \
+ PIRx_ELx_PERM(pte_pi_index(_PAGE_GCS), PIE_NONE_O) | \
+ PIRx_ELx_PERM(pte_pi_index(_PAGE_GCS_RO), PIE_NONE_O) | \
PIRx_ELx_PERM(pte_pi_index(_PAGE_EXECONLY), PIE_NONE_O) | \
PIRx_ELx_PERM(pte_pi_index(_PAGE_READONLY_EXEC), PIE_R) | \
PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED_EXEC), PIE_RW) | \
--
2.30.2
Define C callable functions for GCS instructions used by the kernel. In
order to avoid demanding toolchain requirements for GCS support, these are
manually encoded. This means we have fixed register numbers, which will be
a bit limiting for the compiler, but none of these should be used in
sufficiently hot paths for this to be a problem.
Note that GCSSTTR is used to store to EL0.
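Because the instructions are emitted with .inst, the register numbers are baked into the instruction word, which is why the wrappers pin their operands to x0 and x1. The sketch below recomputes the constants used in the patch from their fields. The GCSSTTR base value 0xd91f1c00 and its Rt/Rn field positions are inferred from the patch's own "GCSSTTR x1, x0" == 0xd91f1c01 comment; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* HINT #imm: CRm:op2 (7 bits) live in bits [11:5]. */
static uint32_t sk_enc_hint(unsigned int imm)
{
	return 0xd503201fu | ((imm & 0x7fu) << 5);
}

/* SYS #op1, Cn, Cm, #op2, Xt */
static uint32_t sk_enc_sys(unsigned int op1, unsigned int crn,
			   unsigned int crm, unsigned int op2, unsigned int rt)
{
	return 0xd5080000u | ((op1 & 0x7u) << 16) | ((crn & 0xfu) << 12) |
	       ((crm & 0xfu) << 8) | ((op2 & 0x7u) << 5) | (rt & 0x1fu);
}

/* GCSSTTR Xt, [Xn]: base inferred from 0xd91f1c01 == GCSSTTR x1, x0. */
static uint32_t sk_enc_gcssttr(unsigned int rt, unsigned int rn)
{
	return 0xd91f1c00u | ((rn & 0x1fu) << 5) | (rt & 0x1fu);
}

void gcs_encoding_selfcheck(void)
{
	/* The fixed registers (x1 = value, x0 = address) are part of the word. */
	assert(sk_enc_gcssttr(1, 0) == 0xd91f1c01u);
	/* gcsb_dsync()'s .inst 0xd503227f decodes as HINT #19. */
	assert(sk_enc_hint(19) == 0xd503227fu);
	/* gcsss1() uses "sys #3, C7, C7, #2, Xt"; with Xt = x0: */
	assert(sk_enc_sys(3, 7, 7, 2, 0) == 0xd50b7740u);
}
```

Changing the register allocation would mean emitting a different word, so the asm constraints and the .inst constant have to be kept in step by hand.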
Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/asm/gcs.h | 51 ++++++++++++++++++++++++++++++++++++++++
arch/arm64/include/asm/uaccess.h | 22 +++++++++++++++++
2 files changed, 73 insertions(+)
diff --git a/arch/arm64/include/asm/gcs.h b/arch/arm64/include/asm/gcs.h
new file mode 100644
index 000000000000..7c5e95218db6
--- /dev/null
+++ b/arch/arm64/include/asm/gcs.h
@@ -0,0 +1,51 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 ARM Ltd.
+ */
+#ifndef __ASM_GCS_H
+#define __ASM_GCS_H
+
+#include <asm/types.h>
+#include <asm/uaccess.h>
+
+static inline void gcsb_dsync(void)
+{
+ asm volatile(".inst 0xd503227f" : : : "memory");
+}
+
+static inline void gcsstr(u64 *addr, u64 val)
+{
+ register u64 *_addr __asm__ ("x0") = addr;
+ register long _val __asm__ ("x1") = val;
+
+ /* GCSSTTR x1, x0 */
+ asm volatile(
+ ".inst 0xd91f1c01\n"
+ :
+ : "rZ" (_val), "r" (_addr)
+ : "memory");
+}
+
+static inline void gcsss1(u64 Xt)
+{
+ asm volatile (
+ "sys #3, C7, C7, #2, %0\n"
+ :
+ : "rZ" (Xt)
+ : "memory");
+}
+
+static inline u64 gcsss2(void)
+{
+ u64 Xt;
+
+ asm volatile(
+ "SYSL %0, #3, C7, C7, #3\n"
+ : "=r" (Xt)
+ :
+ : "memory");
+
+ return Xt;
+}
+
+#endif
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 14be5000c5a0..22e10e79f56a 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -425,4 +425,26 @@ static inline size_t probe_subpage_writeable(const char __user *uaddr,
#endif /* CONFIG_ARCH_HAS_SUBPAGE_FAULTS */
+#ifdef CONFIG_ARM64_GCS
+
+static inline int gcssttr(unsigned long __user *addr, unsigned long val)
+{
+ register unsigned long __user *_addr __asm__ ("x0") = addr;
+ register unsigned long _val __asm__ ("x1") = val;
+ int err = 0;
+
+ /* GCSSTTR x1, x0 */
+ asm volatile(
+ "1: .inst 0xd91f1c01\n"
+ "2: \n"
+ _ASM_EXTABLE_UACCESS_ERR(1b, 2b, %w0)
+ : "+r" (err)
+ : "rZ" (_val), "r" (_addr)
+ : "memory");
+
+ return err;
+}
+
+#endif /* CONFIG_ARM64_GCS */
+
#endif /* __ASM_UACCESS_H */
--
2.30.2
Map pages flagged as being part of a GCS as such rather than using the
full set of generic VM flags.
This is done using a conditional rather than extending the size of
protection_map since that would make for a very sparse array.
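The mman.h and mmap.c changes can be modelled in a few lines of userspace C. The flag values below are made up for illustration (the real VM_* bits differ); what matters is that VM_SHADOW_STACK is not one of the four bits used to index protection_map, which is why a conditional is used instead of enlarging the array. The sk_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values only; the kernel's VM_* bits differ. */
#define SK_VM_READ         0x01ul
#define SK_VM_WRITE        0x02ul
#define SK_VM_EXEC         0x04ul
#define SK_VM_SHARED       0x08ul
#define SK_VM_ARM64_BTI    0x10ul
#define SK_VM_SHADOW_STACK 0x20ul

/* Stand-ins for the protections chosen by vm_get_page_prot(). */
enum sk_prot { SK_PROT_FROM_MAP, SK_PROT_GCS, SK_PROT_GCS_RO };

/* Mirrors the GCS check added to arch_validate_flags(). */
static bool sk_validate_flags(bool gcs, unsigned long vm_flags)
{
	if (gcs && (vm_flags & SK_VM_SHADOW_STACK))
		return !(vm_flags & (SK_VM_EXEC | SK_VM_ARM64_BTI | SK_VM_SHARED));
	return true;
}

/* Mirrors vm_get_page_prot(): for a GCS only VM_WRITE is interpreted. */
static enum sk_prot sk_get_page_prot(bool gcs, unsigned long vm_flags)
{
	if (gcs && (vm_flags & SK_VM_SHADOW_STACK))
		return (vm_flags & SK_VM_WRITE) ? SK_PROT_GCS : SK_PROT_GCS_RO;
	/* Otherwise protection_map[vm_flags & (READ|WRITE|EXEC|SHARED)]. */
	return SK_PROT_FROM_MAP;
}

void gcs_mmap_selfcheck(void)
{
	unsigned long gcs_rw = SK_VM_SHADOW_STACK | SK_VM_READ | SK_VM_WRITE;

	assert(sk_validate_flags(true, gcs_rw));
	assert(!sk_validate_flags(true, gcs_rw | SK_VM_EXEC));
	assert(!sk_validate_flags(true, gcs_rw | SK_VM_SHARED));
	assert(sk_get_page_prot(true, gcs_rw) == SK_PROT_GCS);
	/* A GCS VMA without VM_WRITE maps as GCS_RO. */
	assert(sk_get_page_prot(true, SK_VM_SHADOW_STACK | SK_VM_READ) ==
	       SK_PROT_GCS_RO);
	/* Without GCS support the flag is not special-cased here. */
	assert(sk_get_page_prot(false, gcs_rw) == SK_PROT_FROM_MAP);
}
```

Since BTI on a GCS is rejected at validation time, the later PTE_GP handling in vm_get_page_prot() never sees a GCS mapping.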
Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/asm/mman.h | 9 +++++++++
arch/arm64/mm/mmap.c | 13 ++++++++++++-
2 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h
index c21849ffdd88..6d3fe6433a62 100644
--- a/arch/arm64/include/asm/mman.h
+++ b/arch/arm64/include/asm/mman.h
@@ -61,6 +61,15 @@ static inline bool arch_validate_flags(unsigned long vm_flags)
return false;
}
+ if (system_supports_gcs() && (vm_flags & VM_SHADOW_STACK)) {
+ /*
+ * An executable GCS isn't a good idea, and the mm
+ * core can't cope with a shared GCS.
+ */
+ if (vm_flags & (VM_EXEC | VM_ARM64_BTI | VM_SHARED))
+ return false;
+ }
+
return true;
}
diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
index 645fe60d000f..e44ce6fcfad9 100644
--- a/arch/arm64/mm/mmap.c
+++ b/arch/arm64/mm/mmap.c
@@ -79,9 +79,20 @@ arch_initcall(adjust_protection_map);
pgprot_t vm_get_page_prot(unsigned long vm_flags)
{
- pteval_t prot = pgprot_val(protection_map[vm_flags &
+ pteval_t prot;
+
+ /* If this is a GCS then only interpret VM_WRITE. */
+ if (system_supports_gcs() && (vm_flags & VM_SHADOW_STACK)) {
+ if (vm_flags & VM_WRITE)
+ prot = _PAGE_GCS;
+ else
+ prot = _PAGE_GCS_RO;
+ } else {
+ prot = pgprot_val(protection_map[vm_flags &
(VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]);
+ }
+ /* VM_ARM64_BTI on a GCS is rejected in arch_validate_flags() */
if (vm_flags & VM_ARM64_BTI)
prot |= PTE_GP;
--
2.30.2
GCS introduces a number of system registers for EL1 and EL0. On systems
with GCS we need to context switch them and expose them to VMMs to allow
guests to use GCS, as well as describe their fine grained traps to
nested virtualisation. Traps are already disabled.
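As a rough model of what the sysreg-sr.h changes do, the sketch below treats the four GCS registers as plain fields: GCSPR_EL0 travels with the user state and the other three with the EL1 state, and nothing is touched when has_gcs() is false. The struct, the sk_hw variable standing in for the physical registers, and the function names are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU copy of the GCS registers. */
struct sk_gcs_regs {
	uint64_t gcspr_el0;   /* user state */
	uint64_t gcspr_el1;   /* EL1 state */
	uint64_t gcscr_el1;
	uint64_t gcscre0_el1;
};

/* Stand-in for the physical CPU registers. */
static struct sk_gcs_regs sk_hw;

static void sk_save_user(struct sk_gcs_regs *ctx, bool has_gcs)
{
	if (has_gcs)
		ctx->gcspr_el0 = sk_hw.gcspr_el0;
}

static void sk_save_el1(struct sk_gcs_regs *ctx, bool has_gcs)
{
	if (has_gcs) {
		ctx->gcspr_el1 = sk_hw.gcspr_el1;
		ctx->gcscr_el1 = sk_hw.gcscr_el1;
		ctx->gcscre0_el1 = sk_hw.gcscre0_el1;
	}
}

static void sk_restore_user(const struct sk_gcs_regs *ctx, bool has_gcs)
{
	if (has_gcs)
		sk_hw.gcspr_el0 = ctx->gcspr_el0;
}

static void sk_restore_el1(const struct sk_gcs_regs *ctx, bool has_gcs)
{
	if (has_gcs) {
		sk_hw.gcspr_el1 = ctx->gcspr_el1;
		sk_hw.gcscr_el1 = ctx->gcscr_el1;
		sk_hw.gcscre0_el1 = ctx->gcscre0_el1;
	}
}

void kvm_gcs_switch_selfcheck(void)
{
	struct sk_gcs_regs guest = { 0x1000, 0x2000, 0x1, 0x3 };
	struct sk_gcs_regs saved = { 0 };

	/* Load the guest's state, let it run, then save it back out. */
	sk_restore_user(&guest, true);
	sk_restore_el1(&guest, true);
	sk_hw.gcspr_el0 -= 8;             /* guest pushed one return address */
	sk_save_user(&saved, true);
	sk_save_el1(&saved, true);
	assert(saved.gcspr_el0 == 0xff8);
	assert(saved.gcscr_el1 == 0x1);
}
```

Gating every access on has_gcs() matters because on CPUs without GCS these register encodings are not implemented at all.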
Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 12 ++++++++++++
arch/arm64/kvm/emulate-nested.c | 4 ++++
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 17 +++++++++++++++++
arch/arm64/kvm/sys_regs.c | 22 ++++++++++++++++++++++
4 files changed, 55 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 21c57b812569..6c7ea7f9cd92 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -388,6 +388,12 @@ enum vcpu_sysreg {
GCR_EL1, /* Tag Control Register */
TFSRE0_EL1, /* Tag Fault Status Register (EL0) */
+ /* Guarded Control Stack registers */
+ GCSCRE0_EL1, /* Guarded Control Stack Control (EL0) */
+ GCSCR_EL1, /* Guarded Control Stack Control (EL1) */
+ GCSPR_EL0, /* Guarded Control Stack Pointer (EL0) */
+ GCSPR_EL1, /* Guarded Control Stack Pointer (EL1) */
+
/* 32bit specific registers. */
DACR32_EL2, /* Domain Access Control Register */
IFSR32_EL2, /* Instruction Fault Status Register */
@@ -1221,6 +1227,12 @@ static inline bool __vcpu_has_feature(const struct kvm_arch *ka, int feature)
#define vcpu_has_feature(v, f) __vcpu_has_feature(&(v)->kvm->arch, (f))
+static inline bool has_gcs(void)
+{
+ return IS_ENABLED(CONFIG_ARM64_GCS) &&
+ cpus_have_final_cap(ARM64_HAS_GCS);
+}
+
int kvm_trng_call(struct kvm_vcpu *vcpu);
#ifdef CONFIG_KVM
extern phys_addr_t hyp_mem_base;
diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
index 431fd429932d..24eb7eccbae4 100644
--- a/arch/arm64/kvm/emulate-nested.c
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -1098,8 +1098,12 @@ static const struct encoding_to_trap_config encoding_to_fgt[] __initconst = {
SR_FGT(SYS_ESR_EL1, HFGxTR, ESR_EL1, 1),
SR_FGT(SYS_DCZID_EL0, HFGxTR, DCZID_EL0, 1),
SR_FGT(SYS_CTR_EL0, HFGxTR, CTR_EL0, 1),
+ SR_FGT(SYS_GCSPR_EL0, HFGxTR, nGCS_EL0, 1),
SR_FGT(SYS_CSSELR_EL1, HFGxTR, CSSELR_EL1, 1),
SR_FGT(SYS_CPACR_EL1, HFGxTR, CPACR_EL1, 1),
+ SR_FGT(SYS_GCSCR_EL1, HFGxTR, nGCS_EL1, 1),
+ SR_FGT(SYS_GCSPR_EL1, HFGxTR, nGCS_EL1, 1),
+ SR_FGT(SYS_GCSCRE0_EL1, HFGxTR, nGCS_EL0, 1),
SR_FGT(SYS_CONTEXTIDR_EL1, HFGxTR, CONTEXTIDR_EL1, 1),
SR_FGT(SYS_CLIDR_EL1, HFGxTR, CLIDR_EL1, 1),
SR_FGT(SYS_CCSIDR_EL1, HFGxTR, CCSIDR_EL1, 1),
diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
index bb6b571ec627..ec34d4a90717 100644
--- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
+++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
@@ -25,6 +25,8 @@ static inline void __sysreg_save_user_state(struct kvm_cpu_context *ctxt)
{
ctxt_sys_reg(ctxt, TPIDR_EL0) = read_sysreg(tpidr_el0);
ctxt_sys_reg(ctxt, TPIDRRO_EL0) = read_sysreg(tpidrro_el0);
+ if (has_gcs())
+ ctxt_sys_reg(ctxt, GCSPR_EL0) = read_sysreg_s(SYS_GCSPR_EL0);
}
static inline bool ctxt_has_mte(struct kvm_cpu_context *ctxt)
@@ -62,6 +64,12 @@ static inline void __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
ctxt_sys_reg(ctxt, PAR_EL1) = read_sysreg_par();
ctxt_sys_reg(ctxt, TPIDR_EL1) = read_sysreg(tpidr_el1);
+ if (has_gcs()) {
+ ctxt_sys_reg(ctxt, GCSPR_EL1) = read_sysreg_el1(SYS_GCSPR);
+ ctxt_sys_reg(ctxt, GCSCR_EL1) = read_sysreg_el1(SYS_GCSCR);
+ ctxt_sys_reg(ctxt, GCSCRE0_EL1) = read_sysreg_s(SYS_GCSCRE0_EL1);
+ }
+
if (ctxt_has_mte(ctxt)) {
ctxt_sys_reg(ctxt, TFSR_EL1) = read_sysreg_el1(SYS_TFSR);
ctxt_sys_reg(ctxt, TFSRE0_EL1) = read_sysreg_s(SYS_TFSRE0_EL1);
@@ -95,6 +103,8 @@ static inline void __sysreg_restore_user_state(struct kvm_cpu_context *ctxt)
{
write_sysreg(ctxt_sys_reg(ctxt, TPIDR_EL0), tpidr_el0);
write_sysreg(ctxt_sys_reg(ctxt, TPIDRRO_EL0), tpidrro_el0);
+ if (has_gcs())
+ write_sysreg_s(ctxt_sys_reg(ctxt, GCSPR_EL0), SYS_GCSPR_EL0);
}
static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
@@ -138,6 +148,13 @@ static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
write_sysreg(ctxt_sys_reg(ctxt, PAR_EL1), par_el1);
write_sysreg(ctxt_sys_reg(ctxt, TPIDR_EL1), tpidr_el1);
+ if (has_gcs()) {
+ write_sysreg_el1(ctxt_sys_reg(ctxt, GCSPR_EL1), SYS_GCSPR);
+ write_sysreg_el1(ctxt_sys_reg(ctxt, GCSCR_EL1), SYS_GCSCR);
+ write_sysreg_s(ctxt_sys_reg(ctxt, GCSCRE0_EL1),
+ SYS_GCSCRE0_EL1);
+ }
+
if (ctxt_has_mte(ctxt)) {
write_sysreg_el1(ctxt_sys_reg(ctxt, TFSR_EL1), SYS_TFSR);
write_sysreg_s(ctxt_sys_reg(ctxt, TFSRE0_EL1), SYS_TFSRE0_EL1);
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 30253bd19917..83ba767e75d2 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2000,6 +2000,23 @@ static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
.visibility = mte_visibility, \
}
+static unsigned int gcs_visibility(const struct kvm_vcpu *vcpu,
+ const struct sys_reg_desc *rd)
+{
+ if (has_gcs())
+ return 0;
+
+ return REG_HIDDEN;
+}
+
+#define GCS_REG(name) { \
+ SYS_DESC(SYS_##name), \
+ .access = undef_access, \
+ .reset = reset_unknown, \
+ .reg = name, \
+ .visibility = gcs_visibility, \
+}
+
static unsigned int el2_visibility(const struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd)
{
@@ -2376,6 +2393,10 @@ static const struct sys_reg_desc sys_reg_descs[] = {
PTRAUTH_KEY(APDB),
PTRAUTH_KEY(APGA),
+ GCS_REG(GCSCR_EL1),
+ GCS_REG(GCSPR_EL1),
+ GCS_REG(GCSCRE0_EL1),
+
{ SYS_DESC(SYS_SPSR_EL1), access_spsr},
{ SYS_DESC(SYS_ELR_EL1), access_elr},
@@ -2462,6 +2483,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_SMIDR_EL1), undef_access },
{ SYS_DESC(SYS_CSSELR_EL1), access_csselr, reset_unknown, CSSELR_EL1 },
{ SYS_DESC(SYS_CTR_EL0), access_ctr },
+ GCS_REG(GCSPR_EL0),
{ SYS_DESC(SYS_SVCR), undef_access },
{ PMU_SYS_REG(PMCR_EL0), .access = access_pmcr, .reset = reset_pmcr,
--
2.30.2
There is a control, HCRX_EL2.GCSEn, which must be set to allow GCS
features to take effect at lower ELs, and there are also fine grained
traps for GCS usage at EL0 and EL1. Configure all of these to allow GCS
usage by EL0 and EL1.
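The HCRX_EL2 part of the el2_setup.h change boils down to "start from the host flags and add GCSEn if the ID register reports GCS". The sketch below renders that in C. The field positions are assumptions (ID_AA64PFR1_EL1.GCS in bits [47:44], HCRX_EL2.GCSEn at bit 22) and should be checked against the generated sysreg definitions; the sk_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed field positions; check them against the sysreg definitions:
 * ID_AA64PFR1_EL1.GCS is bits [47:44], HCRX_EL2.GCSEn is bit 22.
 */
#define SK_ID_AA64PFR1_EL1_GCS_SHIFT 44
#define SK_HCRX_EL2_GCSEn (UINT64_C(1) << 22)

/* C equivalent of "ubfx x1, x1, #shift, #4". */
static uint64_t sk_ubfx4(uint64_t val, unsigned int shift)
{
	return (val >> shift) & 0xf;
}

/*
 * Models the HCRX_EL2 block: start from the host flags and add GCSEn
 * only if the CPU reports GCS; otherwise leave the value alone.
 */
uint64_t sk_hcrx_el2_value(uint64_t host_flags, uint64_t id_aa64pfr1)
{
	uint64_t hcrx = host_flags;

	if (sk_ubfx4(id_aa64pfr1, SK_ID_AA64PFR1_EL1_GCS_SHIFT) != 0)
		hcrx |= SK_HCRX_EL2_GCSEn;
	return hcrx;
}

void el2_gcs_selfcheck(void)
{
	assert(sk_hcrx_el2_value(0, 0) == 0);
	assert(sk_hcrx_el2_value(0, UINT64_C(1) << 44) == SK_HCRX_EL2_GCSEn);
	/* A neighbouring field (bits [43:40]) must not enable GCS. */
	assert(sk_hcrx_el2_value(0x5, UINT64_C(0xf) << 40) == 0x5);
}
```

The fine grained trap block follows the same pattern, ORing in the nGCS_EL0 and nGCS_EL1 bits instead of GCSEn.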
Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/asm/el2_setup.h | 17 +++++++++++++++++
arch/arm64/include/asm/kvm_arm.h | 4 ++--
2 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h
index b7afaa026842..17672563e333 100644
--- a/arch/arm64/include/asm/el2_setup.h
+++ b/arch/arm64/include/asm/el2_setup.h
@@ -27,6 +27,14 @@
ubfx x0, x0, #ID_AA64MMFR1_EL1_HCX_SHIFT, #4
cbz x0, .Lskip_hcrx_\@
mov_q x0, HCRX_HOST_FLAGS
+
+ /* Enable GCS if supported */
+ mrs_s x1, SYS_ID_AA64PFR1_EL1
+ ubfx x1, x1, #ID_AA64PFR1_EL1_GCS_SHIFT, #4
+ cbz x1, .Lset_hcrx_\@
+ orr x0, x0, #HCRX_EL2_GCSEn
+
+.Lset_hcrx_\@:
msr_s SYS_HCRX_EL2, x0
.Lskip_hcrx_\@:
.endm
@@ -190,6 +198,15 @@
orr x0, x0, #HFGxTR_EL2_nPIR_EL1
orr x0, x0, #HFGxTR_EL2_nPIRE0_EL1
+ /* GCS depends on PIE so we don't check it if PIE is absent */
+ mrs_s x1, SYS_ID_AA64PFR1_EL1
+ ubfx x1, x1, #ID_AA64PFR1_EL1_GCS_SHIFT, #4
+ cbz x1, .Lset_fgt_\@
+
+ /* Disable traps of access to GCS registers at EL0 and EL1 */
+ orr x0, x0, #HFGxTR_EL2_nGCS_EL1_MASK
+ orr x0, x0, #HFGxTR_EL2_nGCS_EL0_MASK
+
.Lset_fgt_\@:
msr_s SYS_HFGRTR_EL2, x0
msr_s SYS_HFGWTR_EL2, x0
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 3c6f8ba1e479..a9354c237a97 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -103,9 +103,9 @@
#define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H)
#define HCRX_GUEST_FLAGS \
- (HCRX_EL2_SMPME | HCRX_EL2_TCR2En | \
+ (HCRX_EL2_SMPME | HCRX_EL2_TCR2En | HCRX_EL2_GCSEn |\
(cpus_have_final_cap(ARM64_HAS_MOPS) ? (HCRX_EL2_MSCEn | HCRX_EL2_MCE2) : 0))
-#define HCRX_HOST_FLAGS (HCRX_EL2_MSCEn | HCRX_EL2_TCR2En)
+#define HCRX_HOST_FLAGS (HCRX_EL2_MSCEn | HCRX_EL2_TCR2En | HCRX_EL2_GCSEn)
/* TCR_EL2 Registers bits */
#define TCR_EL2_DS (1UL << 32)
--
2.30.2
Hook up an override for GCS, allowing it to be disabled from the command
line by specifying arm64.nogcs in case there are problems.
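Conceptually, the alias is a string rewrite: arm64.nogcs becomes id_aa64pfr1.gcs=0, and the new "gcs" FIELD() entry lets the override code force the GCS field of ID_AA64PFR1_EL1 to zero, which should stop the ARM64_HAS_GCS capability (matching GCS == IMP) from being detected. The lookup below is a hypothetical userspace model of the first step only, using a trimmed copy of the table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A reduced, illustrative copy of the alias table in idreg-override.c. */
static const struct {
	const char *alias;
	const char *feature;
} sk_aliases[] = {
	{ "arm64.nobti", "id_aa64pfr1.bt=0" },
	{ "arm64.nogcs", "id_aa64pfr1.gcs=0" },
};

/* Hypothetical helper: expand a command-line alias into its override. */
const char *sk_expand_alias(const char *arg)
{
	for (size_t i = 0; i < sizeof(sk_aliases) / sizeof(sk_aliases[0]); i++) {
		if (strcmp(arg, sk_aliases[i].alias) == 0)
			return sk_aliases[i].feature;
	}
	return NULL;
}

void nogcs_alias_selfcheck(void)
{
	assert(strcmp(sk_expand_alias("arm64.nogcs"), "id_aa64pfr1.gcs=0") == 0);
	assert(sk_expand_alias("arm64.nothing") == NULL);
}
```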
Signed-off-by: Mark Brown <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 6 ++++++
arch/arm64/kernel/idreg-override.c | 2 ++
2 files changed, 8 insertions(+)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 31b3a25680d0..e86160251d23 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -429,9 +429,15 @@
arm64.nobti [ARM64] Unconditionally disable Branch Target
Identification support
+ arm64.nogcs [ARM64] Unconditionally disable Guarded Control Stack
+ support
+
arm64.nomops [ARM64] Unconditionally disable Memory Copy and Memory
Set instructions support
+ arm64.nopauth [ARM64] Unconditionally disable Pointer Authentication
+ support
+
arm64.nomte [ARM64] Unconditionally disable Memory Tagging Extension
support
diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c
index e30fd9e32ef3..00bcdad53ba9 100644
--- a/arch/arm64/kernel/idreg-override.c
+++ b/arch/arm64/kernel/idreg-override.c
@@ -110,6 +110,7 @@ static const struct ftr_set_desc pfr1 __prel64_initconst = {
.override = &id_aa64pfr1_override,
.fields = {
FIELD("bt", ID_AA64PFR1_EL1_BT_SHIFT, NULL ),
+ FIELD("gcs", ID_AA64PFR1_EL1_GCS_SHIFT, NULL),
FIELD("mte", ID_AA64PFR1_EL1_MTE_SHIFT, NULL),
FIELD("sme", ID_AA64PFR1_EL1_SME_SHIFT, pfr1_sme_filter),
{}
@@ -190,6 +191,7 @@ static const struct {
{ "arm64.nosve", "id_aa64pfr0.sve=0" },
{ "arm64.nosme", "id_aa64pfr1.sme=0" },
{ "arm64.nobti", "id_aa64pfr1.bt=0" },
+ { "arm64.nogcs", "id_aa64pfr1.gcs=0" },
{ "arm64.nopauth",
"id_aa64isar1.gpi=0 id_aa64isar1.gpa=0 "
"id_aa64isar1.api=0 id_aa64isar1.apa=0 "
--
2.30.2
Provide a hwcap to enable userspace to detect support for GCS.
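From userspace the new bit is read through the auxiliary vector. The sketch below uses getauxval(AT_HWCAP2) and defines HWCAP2_GCS locally, with the value this patch adds, for headers that predate it. sk_system_has_gcs() is a hypothetical name, and since the bit is 1UL << 48 the code assumes an LP64 target such as arm64.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/auxv.h>

#ifndef AT_HWCAP2
#define AT_HWCAP2 26
#endif

/* Bit added to uapi/asm/hwcap.h by this patch; assumes LP64 (arm64). */
#ifndef HWCAP2_GCS
#define HWCAP2_GCS (1UL << 48)
#endif

static bool sk_hwcap2_has_gcs(unsigned long hwcap2)
{
	return (hwcap2 & HWCAP2_GCS) != 0;
}

/*
 * Hypothetical helper: ask the running kernel via the auxiliary vector.
 * Only meaningful on arm64 Linux; other architectures assign AT_HWCAP2
 * bits differently.
 */
bool sk_system_has_gcs(void)
{
	return sk_hwcap2_has_gcs(getauxval(AT_HWCAP2));
}

void hwcap_gcs_selfcheck(void)
{
	assert(sk_hwcap2_has_gcs(HWCAP2_GCS));
	assert(!sk_hwcap2_has_gcs(1UL << 47)); /* HWCAP2_LSE128 */
}
```

The same information is exposed as "gcs" in /proc/cpuinfo through the cpuinfo.c hunk. A set bit only says the hardware and kernel support GCS; the process still has to enable it with prctl().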
Signed-off-by: Mark Brown <[email protected]>
---
Documentation/arch/arm64/elf_hwcaps.rst | 3 +++
arch/arm64/include/asm/hwcap.h | 1 +
arch/arm64/include/uapi/asm/hwcap.h | 1 +
arch/arm64/kernel/cpufeature.c | 3 +++
arch/arm64/kernel/cpuinfo.c | 1 +
5 files changed, 9 insertions(+)
diff --git a/Documentation/arch/arm64/elf_hwcaps.rst b/Documentation/arch/arm64/elf_hwcaps.rst
index ced7b335e2e0..86d4ace9c75c 100644
--- a/Documentation/arch/arm64/elf_hwcaps.rst
+++ b/Documentation/arch/arm64/elf_hwcaps.rst
@@ -317,6 +317,9 @@ HWCAP2_LRCPC3
HWCAP2_LSE128
Functionality implied by ID_AA64ISAR0_EL1.Atomic == 0b0011.
+HWCAP2_GCS
+ Functionality implied by ID_AA64PFR1_EL1.GCS == 0b0001.
+
4. Unused AT_HWCAP bits
-----------------------
diff --git a/arch/arm64/include/asm/hwcap.h b/arch/arm64/include/asm/hwcap.h
index cd71e09ea14d..e01e6b72a839 100644
--- a/arch/arm64/include/asm/hwcap.h
+++ b/arch/arm64/include/asm/hwcap.h
@@ -142,6 +142,7 @@
#define KERNEL_HWCAP_SVE_B16B16 __khwcap2_feature(SVE_B16B16)
#define KERNEL_HWCAP_LRCPC3 __khwcap2_feature(LRCPC3)
#define KERNEL_HWCAP_LSE128 __khwcap2_feature(LSE128)
+#define KERNEL_HWCAP_GCS __khwcap2_feature(GCS)
/*
* This yields a mask that user programs can use to figure out what
diff --git a/arch/arm64/include/uapi/asm/hwcap.h b/arch/arm64/include/uapi/asm/hwcap.h
index 5023599fa278..996b5b5d4c4e 100644
--- a/arch/arm64/include/uapi/asm/hwcap.h
+++ b/arch/arm64/include/uapi/asm/hwcap.h
@@ -107,5 +107,6 @@
#define HWCAP2_SVE_B16B16 (1UL << 45)
#define HWCAP2_LRCPC3 (1UL << 46)
#define HWCAP2_LSE128 (1UL << 47)
+#define HWCAP2_GCS (1UL << 48)
#endif /* _UAPI__ASM_HWCAP_H */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index b606842ab8c1..1a92c4502a0b 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2867,6 +2867,9 @@ static const struct arm64_cpu_capabilities arm64_elf_hwcaps[] = {
HWCAP_CAP(ID_AA64ZFR0_EL1, I8MM, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEI8MM),
HWCAP_CAP(ID_AA64ZFR0_EL1, F32MM, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEF32MM),
HWCAP_CAP(ID_AA64ZFR0_EL1, F64MM, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEF64MM),
+#endif
+#ifdef CONFIG_ARM64_GCS
+ HWCAP_CAP(ID_AA64PFR1_EL1, GCS, IMP, CAP_HWCAP, KERNEL_HWCAP_GCS),
#endif
HWCAP_CAP(ID_AA64PFR1_EL1, SSBS, SSBS2, CAP_HWCAP, KERNEL_HWCAP_SSBS),
#ifdef CONFIG_ARM64_BTI
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index 47043c0d95ec..b3ec0b89c9e0 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -128,6 +128,7 @@ static const char *const hwcap_str[] = {
[KERNEL_HWCAP_SVE_B16B16] = "sveb16b16",
[KERNEL_HWCAP_LRCPC3] = "lrcpc3",
[KERNEL_HWCAP_LSE128] = "lse128",
+ [KERNEL_HWCAP_GCS] = "gcs",
};
#ifdef CONFIG_COMPAT
--
2.30.2
A new exception code is defined for GCS specific faults other than
standard load/store faults, for example GCS token validation failures;
add handling for this. These faults are reported to userspace as
segfaults with code SEGV_CPERR (protection error), mirroring the
reporting for x86 shadow stack errors.
GCS faults due to memory load/store operations generate data aborts with
a flag set; these will be handled separately as part of the data abort
handling.
Since we do not currently enable GCS for EL1 we should not get any faults
there, but while we're at it we wire up handling there too, treating any
GCS fault as fatal.
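The patch itself does not decode the ISS: do_el0_gcs() simply delivers SIGSEGV with SEGV_CPERR. For readers of the new esr.h definitions, the sketch below shows how the exception class and the GCS ISS fields fit together. It assumes the standard ESR layout with EC in bits [31:26]; the ExType and IT positions and values are the ones from the esr.h hunk, and the sk_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SK_ESR_EC_SHIFT   26
#define SK_ESR_EC_MASK    0x3fu
#define SK_ESR_EC_GCS     0x2du  /* ESR_ELx_EC_GCS */
#define SK_ExType_SHIFT   20     /* ESR_ELx_ExType: bits [23:20] */
#define SK_ExType_MASK    0xfu
#define SK_IT_MASK        0x1fu  /* ESR_ELx_IT: bits [4:0] */

#define SK_ExType_DATA_CHECK 0
#define SK_ExType_STR        2
#define SK_IT_RET            0
#define SK_IT_GCSPOPX        7

struct sk_gcs_exc {
	unsigned int extype;
	unsigned int it;
};

/* Returns 0 and fills *out for a GCS exception, -1 for anything else. */
int sk_decode_gcs_esr(uint64_t esr, struct sk_gcs_exc *out)
{
	if (((esr >> SK_ESR_EC_SHIFT) & SK_ESR_EC_MASK) != SK_ESR_EC_GCS)
		return -1;
	out->extype = (unsigned int)((esr >> SK_ExType_SHIFT) & SK_ExType_MASK);
	out->it = (unsigned int)(esr & SK_IT_MASK);
	return 0;
}

void gcs_esr_selfcheck(void)
{
	struct sk_gcs_exc exc;
	/* A data check exception raised by a RET instruction. */
	uint64_t esr = ((uint64_t)SK_ESR_EC_GCS << SK_ESR_EC_SHIFT) |
		       ((uint64_t)SK_ExType_DATA_CHECK << SK_ExType_SHIFT) |
		       SK_IT_RET;

	assert(sk_decode_gcs_esr(esr, &exc) == 0);
	assert(exc.extype == SK_ExType_DATA_CHECK && exc.it == SK_IT_RET);
	/* EC 0x24 (data abort from a lower EL) is not a GCS exception. */
	assert(sk_decode_gcs_esr((uint64_t)0x24 << SK_ESR_EC_SHIFT, &exc) == -1);
}
```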
Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/asm/esr.h | 28 +++++++++++++++++++++++++++-
arch/arm64/include/asm/exception.h | 2 ++
arch/arm64/kernel/entry-common.c | 23 +++++++++++++++++++++++
arch/arm64/kernel/traps.c | 11 +++++++++++
4 files changed, 63 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 353fe08546cf..20ee9f531864 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -51,7 +51,8 @@
#define ESR_ELx_EC_FP_EXC32 (0x28)
/* Unallocated EC: 0x29 - 0x2B */
#define ESR_ELx_EC_FP_EXC64 (0x2C)
-/* Unallocated EC: 0x2D - 0x2E */
+#define ESR_ELx_EC_GCS (0x2D)
+/* Unallocated EC: 0x2E */
#define ESR_ELx_EC_SERROR (0x2F)
#define ESR_ELx_EC_BREAKPT_LOW (0x30)
#define ESR_ELx_EC_BREAKPT_CUR (0x31)
@@ -382,6 +383,31 @@
#define ESR_ELx_MOPS_ISS_SRCREG(esr) (((esr) & (UL(0x1f) << 5)) >> 5)
#define ESR_ELx_MOPS_ISS_SIZEREG(esr) (((esr) & (UL(0x1f) << 0)) >> 0)
+/* ISS field definitions for GCS */
+#define ESR_ELx_ExType_SHIFT (20)
+#define ESR_ELx_ExType_MASK GENMASK(23, 20)
+#define ESR_ELx_Raddr_SHIFT (10)
+#define ESR_ELx_Raddr_MASK GENMASK(14, 10)
+#define ESR_ELx_Rn_SHIFT (5)
+#define ESR_ELx_Rn_MASK GENMASK(9, 5)
+#define ESR_ELx_Rvalue_SHIFT 5
+#define ESR_ELx_Rvalue_MASK GENMASK(9, 5)
+#define ESR_ELx_IT_SHIFT (0)
+#define ESR_ELx_IT_MASK GENMASK(4, 0)
+
+#define ESR_ELx_ExType_DATA_CHECK 0
+#define ESR_ELx_ExType_EXLOCK 1
+#define ESR_ELx_ExType_STR 2
+
+#define ESR_ELx_IT_RET 0
+#define ESR_ELx_IT_GCSPOPM 1
+#define ESR_ELx_IT_RET_KEYA 2
+#define ESR_ELx_IT_RET_KEYB 3
+#define ESR_ELx_IT_GCSSS1 4
+#define ESR_ELx_IT_GCSSS2 5
+#define ESR_ELx_IT_GCSPOPCX 6
+#define ESR_ELx_IT_GCSPOPX 7
+
#ifndef __ASSEMBLY__
#include <asm/types.h>
diff --git a/arch/arm64/include/asm/exception.h b/arch/arm64/include/asm/exception.h
index ad688e157c9b..99caff458e20 100644
--- a/arch/arm64/include/asm/exception.h
+++ b/arch/arm64/include/asm/exception.h
@@ -57,6 +57,8 @@ void do_el0_undef(struct pt_regs *regs, unsigned long esr);
void do_el1_undef(struct pt_regs *regs, unsigned long esr);
void do_el0_bti(struct pt_regs *regs);
void do_el1_bti(struct pt_regs *regs, unsigned long esr);
+void do_el0_gcs(struct pt_regs *regs, unsigned long esr);
+void do_el1_gcs(struct pt_regs *regs, unsigned long esr);
void do_debug_exception(unsigned long addr_if_watchpoint, unsigned long esr,
struct pt_regs *regs);
void do_fpsimd_acc(unsigned long esr, struct pt_regs *regs);
diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 0fc94207e69a..52d78ce63a4e 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -429,6 +429,15 @@ static void noinstr el1_bti(struct pt_regs *regs, unsigned long esr)
exit_to_kernel_mode(regs);
}
+static void noinstr el1_gcs(struct pt_regs *regs, unsigned long esr)
+{
+ enter_from_kernel_mode(regs);
+ local_daif_inherit(regs);
+ do_el1_gcs(regs, esr);
+ local_daif_mask();
+ exit_to_kernel_mode(regs);
+}
+
static void noinstr el1_dbg(struct pt_regs *regs, unsigned long esr)
{
unsigned long far = read_sysreg(far_el1);
@@ -471,6 +480,9 @@ asmlinkage void noinstr el1h_64_sync_handler(struct pt_regs *regs)
case ESR_ELx_EC_BTI:
el1_bti(regs, esr);
break;
+ case ESR_ELx_EC_GCS:
+ el1_gcs(regs, esr);
+ break;
case ESR_ELx_EC_BREAKPT_CUR:
case ESR_ELx_EC_SOFTSTP_CUR:
case ESR_ELx_EC_WATCHPT_CUR:
@@ -650,6 +662,14 @@ static void noinstr el0_mops(struct pt_regs *regs, unsigned long esr)
exit_to_user_mode(regs);
}
+static void noinstr el0_gcs(struct pt_regs *regs, unsigned long esr)
+{
+ enter_from_user_mode(regs);
+ local_daif_restore(DAIF_PROCCTX);
+ do_el0_gcs(regs, esr);
+ exit_to_user_mode(regs);
+}
+
static void noinstr el0_inv(struct pt_regs *regs, unsigned long esr)
{
enter_from_user_mode(regs);
@@ -732,6 +752,9 @@ asmlinkage void noinstr el0t_64_sync_handler(struct pt_regs *regs)
case ESR_ELx_EC_MOPS:
el0_mops(regs, esr);
break;
+ case ESR_ELx_EC_GCS:
+ el0_gcs(regs, esr);
+ break;
case ESR_ELx_EC_BREAKPT_LOW:
case ESR_ELx_EC_SOFTSTP_LOW:
case ESR_ELx_EC_WATCHPT_LOW:
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 215e6d7f2df8..fb867c6526a6 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -500,6 +500,16 @@ void do_el1_bti(struct pt_regs *regs, unsigned long esr)
die("Oops - BTI", regs, esr);
}
+void do_el0_gcs(struct pt_regs *regs, unsigned long esr)
+{
+ force_signal_inject(SIGSEGV, SEGV_CPERR, regs->pc, 0);
+}
+
+void do_el1_gcs(struct pt_regs *regs, unsigned long esr)
+{
+ die("Oops - GCS", regs, esr);
+}
+
void do_el0_fpac(struct pt_regs *regs, unsigned long esr)
{
force_signal_inject(SIGILL, ILL_ILLOPN, regs->pc, esr);
@@ -838,6 +848,7 @@ static const char *esr_class_str[] = {
[ESR_ELx_EC_MOPS] = "MOPS",
[ESR_ELx_EC_FP_EXC32] = "FP (AArch32)",
[ESR_ELx_EC_FP_EXC64] = "FP (AArch64)",
+ [ESR_ELx_EC_GCS] = "Guarded Control Stack",
[ESR_ELx_EC_SERROR] = "SError",
[ESR_ELx_EC_BREAKPT_LOW] = "Breakpoint (lower EL)",
[ESR_ELx_EC_BREAKPT_CUR] = "Breakpoint (current EL)",
--
2.30.2
All GCS operations at EL0 must happen on a page which is marked as
having UnprivGCS access, including read operations. If a GCS operation
attempts to access a page without this then it will generate a data
abort with the GCS bit set in ESR_EL1.ISS2.
EL0 may validly generate such faults, for example due to copy on write,
which will cause the GCS data to be stored in a read only page with no
GCS permissions until the actual copy happens. Since UnprivGCS allows
both reads and writes to the GCS (though only through GCS operations), we
need to ensure that the memory management subsystem handles GCS accesses
as writes at all times. Do this by adding FAULT_FLAG_WRITE to any GCS
page faults, and add handling to ensure that invalid cases are identified
as such early so the memory management core does not think they will
succeed. This is needed because the core cannot distinguish between VMAs
which are generally writeable and VMAs which are only writeable through
GCS operations.
EL1 may validly write to EL0 GCS for management purposes (e.g., while
initialising with cap tokens).
We also report any GCS faults in VMAs not marked as part of a GCS as
access violations, causing a fault to be delivered to userspace if it
attempts to do GCS operations outside a GCS.
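The classification this patch adds can be summarised as a small truth table over four inputs. The sketch below mirrors the decision made by is_invalid_el0_gcs_access() and the read-to-write upgrade in do_page_fault(); the sk_fault struct, which distils the ESR and VMA state into booleans, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Inputs distilled from the ESR and the VMA; the struct is hypothetical. */
struct sk_fault {
	bool gcs_supported; /* system_supports_gcs() */
	bool vma_is_gcs;    /* vma->vm_flags & VM_SHADOW_STACK */
	bool gcs_op;        /* is_gcs_fault(esr): ISS2.GCS set */
	bool write;         /* is_write_abort(esr) */
};

/* Mirrors is_invalid_el0_gcs_access() from the patch. */
bool sk_invalid_el0_gcs_access(const struct sk_fault *f)
{
	if (!f->gcs_supported)
		return false;
	if (!f->vma_is_gcs)
		return f->gcs_op;   /* GCS operation outside a GCS */
	if (f->gcs_op)
		return false;       /* GCS operation on a GCS */
	return f->write;        /* ordinary store to a GCS */
}

/* GCS reads are upgraded to FAULT_FLAG_WRITE so that CoW is triggered. */
bool sk_fault_as_write(const struct sk_fault *f)
{
	return f->write || f->gcs_op;
}

void gcs_fault_selfcheck(void)
{
	struct sk_fault cow_read = { true, true, true, false };
	struct sk_fault plain_store = { true, true, false, true };
	struct sk_fault stray_op = { true, false, true, false };
	struct sk_fault plain_load = { true, true, false, false };

	assert(!sk_invalid_el0_gcs_access(&cow_read));
	assert(sk_fault_as_write(&cow_read));
	assert(sk_invalid_el0_gcs_access(&plain_store));
	assert(sk_invalid_el0_gcs_access(&stray_op));
	assert(!sk_invalid_el0_gcs_access(&plain_load));
}
```

Rejecting these cases before handle_mm_fault() matters because, once a GCS read has been upgraded to a write, the mm core would otherwise happily resolve a fault that the hardware will keep raising.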
Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/mm/fault.c | 79 +++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 71 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 55f6455a8284..5303d4e3457d 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -494,13 +494,30 @@ static void do_bad_area(unsigned long far, unsigned long esr,
}
}
+/*
+ * Note: not valid for EL1 DC IVAC, but we never use that such that it
+ * should fault. EL0 cannot issue DC IVAC (undef).
+ */
+static bool is_write_abort(unsigned long esr)
+{
+ return (esr & ESR_ELx_WNR) && !(esr & ESR_ELx_CM);
+}
+
+static bool is_gcs_fault(unsigned long esr)
+{
+ if (!esr_is_data_abort(esr))
+ return false;
+
+ return ESR_ELx_ISS2(esr) & ESR_ELx_GCS;
+}
+
#define VM_FAULT_BADMAP ((__force vm_fault_t)0x010000)
#define VM_FAULT_BADACCESS ((__force vm_fault_t)0x020000)
static vm_fault_t __do_page_fault(struct mm_struct *mm,
struct vm_area_struct *vma, unsigned long addr,
unsigned int mm_flags, unsigned long vm_flags,
- struct pt_regs *regs)
+ unsigned long esr, struct pt_regs *regs)
{
/*
* Ok, we have a good vm_area for this memory access, so we can handle
@@ -510,6 +527,26 @@ static vm_fault_t __do_page_fault(struct mm_struct *mm,
*/
if (!(vma->vm_flags & vm_flags))
return VM_FAULT_BADACCESS;
+
+ if (vma->vm_flags & VM_SHADOW_STACK) {
+ /*
+ * Writes to a GCS must either be generated by a GCS
+ * operation or be from EL1.
+ */
+ if (is_write_abort(esr) &&
+ !(is_gcs_fault(esr) || is_el1_data_abort(esr)))
+ return VM_FAULT_BADACCESS;
+ } else {
+ /*
+ * GCS faults should never happen for pages that are
+ * not part of a GCS and the operation being attempted
+ * can never succeed.
+ */
+ if (is_gcs_fault(esr))
+ return VM_FAULT_BADACCESS;
+ }
+
+
return handle_mm_fault(vma, addr, mm_flags, regs);
}
@@ -518,13 +555,18 @@ static bool is_el0_instruction_abort(unsigned long esr)
return ESR_ELx_EC(esr) == ESR_ELx_EC_IABT_LOW;
}
-/*
- * Note: not valid for EL1 DC IVAC, but we never use that such that it
- * should fault. EL0 cannot issue DC IVAC (undef).
- */
-static bool is_write_abort(unsigned long esr)
+static bool is_invalid_el0_gcs_access(struct vm_area_struct *vma, u64 esr)
{
- return (esr & ESR_ELx_WNR) && !(esr & ESR_ELx_CM);
+ if (!system_supports_gcs())
+ return false;
+ if (likely(!(vma->vm_flags & VM_SHADOW_STACK))) {
+ if (is_gcs_fault(esr))
+ return true;
+ return false;
+ }
+ if (is_gcs_fault(esr))
+ return false;
+ return is_write_abort(esr);
}
static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
@@ -573,6 +615,13 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
/* If EPAN is absent then exec implies read */
if (!alternative_has_cap_unlikely(ARM64_HAS_EPAN))
vm_flags |= VM_EXEC;
+ /*
+ * Upgrade read faults to write faults; GCS reads must
+ * occur on a page marked as GCS so we need to trigger
+ * copy on write always.
+ */
+ if (is_gcs_fault(esr))
+ mm_flags |= FAULT_FLAG_WRITE;
}
if (is_ttbr0_addr(addr) && is_el1_permission_fault(addr, esr, regs)) {
@@ -594,6 +643,20 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
if (!vma)
goto lock_mmap;
+ /*
+ * We get legitimate write faults for GCS pages from GCS
+ * operations, even when the initial operation was a read, as
+ * a result of upgrading GCS accesses to writes for CoW, but
+ * GCS accesses outside of a GCS must fail. Specifically check
+ * for this since the mm core isn't able to distinguish
+ * invalid GCS access from valid ones and will try to resolve
+ * the fault.
+ */
+ if (is_invalid_el0_gcs_access(vma, esr)) {
+ vma_end_read(vma);
+ goto lock_mmap;
+ }
+
if (!(vma->vm_flags & vm_flags)) {
vma_end_read(vma);
goto lock_mmap;
@@ -625,7 +688,7 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
goto done;
}
- fault = __do_page_fault(mm, vma, addr, mm_flags, vm_flags, regs);
+ fault = __do_page_fault(mm, vma, addr, mm_flags, vm_flags, esr, regs);
/* Quick path to respond to signals */
if (fault_signal_pending(fault, regs)) {
--
2.30.2
When a new thread is created by a thread with GCS enabled, the GCS needs
to be specified along with the regular stack. clone3() has been
extended to support this case, allowing userspace to explicitly specify
the size and location of the GCS. The specified GCS must have a valid
GCS token at the top of the stack, as though userspace were pivoting to
the new GCS; this token will be consumed on use. At present we do not
consume the token atomically; this will be addressed in a future
revision.
Unfortunately plain clone() is not extensible, and existing clone3()
users will not specify a GCS, so all existing code would be broken if
we mandated specifying the GCS explicitly. For compatibility with
these cases, and also with x86 (which did not initially implement
clone3() support for shadow stacks), if no GCS is specified and the
thread being created has GCS enabled, we allocate a GCS for it. We
follow the extensively discussed x86 implementation and allocate
min(RLIMIT_STACK, 4G). Since the GCS only stores the call stack and not
any variables, this should be more than sufficient for most
applications.
GCSs allocated via this mechanism will be freed when the thread
exits; those explicitly configured by the user will not.
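The default size rule is simple enough to show directly. This userspace sketch applies min(RLIMIT_STACK, 4G) to the caller's own soft limit via getrlimit(); the function names are hypothetical, and any page rounding or other adjustment the kernel may apply is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/resource.h>

#define SK_SZ_4G (UINT64_C(1) << 32)

/*
 * min(RLIMIT_STACK, 4G). On Linux RLIM_INFINITY is the largest rlim_t,
 * so an unlimited stack limit is capped at 4G.
 */
uint64_t sk_default_gcs_size(uint64_t stack_rlimit)
{
	return stack_rlimit < SK_SZ_4G ? stack_rlimit : SK_SZ_4G;
}

/* Applies the rule to the calling process's current soft limit. */
uint64_t sk_default_gcs_size_for_self(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_STACK, &rl) != 0)
		return 0;
	return sk_default_gcs_size((uint64_t)rl.rlim_cur);
}

void gcs_size_selfcheck(void)
{
	assert(sk_default_gcs_size(UINT64_C(8) << 20) == UINT64_C(8) << 20);
	assert(sk_default_gcs_size((uint64_t)RLIM_INFINITY) == SK_SZ_4G);
	assert(sk_default_gcs_size_for_self() <= SK_SZ_4G);
}
```

With an 8 MiB stack limit this yields an 8 MiB GCS. Because each GCS entry is a single 8 byte return address, that is room for roughly a million nested calls.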
Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/asm/gcs.h | 9 ++++
arch/arm64/kernel/process.c | 29 +++++++++++
arch/arm64/mm/gcs.c | 117 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 155 insertions(+)
diff --git a/arch/arm64/include/asm/gcs.h b/arch/arm64/include/asm/gcs.h
index 04594ef59dad..c1f274fdb9c0 100644
--- a/arch/arm64/include/asm/gcs.h
+++ b/arch/arm64/include/asm/gcs.h
@@ -8,6 +8,8 @@
#include <asm/types.h>
#include <asm/uaccess.h>
+struct kernel_clone_args;
+
static inline void gcsb_dsync(void)
{
asm volatile(".inst 0xd503227f" : : : "memory");
@@ -58,6 +60,8 @@ static inline bool task_gcs_el0_enabled(struct task_struct *task)
void gcs_set_el0_mode(struct task_struct *task);
void gcs_free(struct task_struct *task);
void gcs_preserve_current_state(void);
+unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
+ const struct kernel_clone_args *args);
#else
@@ -69,6 +73,11 @@ static inline bool task_gcs_el0_enabled(struct task_struct *task)
static inline void gcs_set_el0_mode(struct task_struct *task) { }
static inline void gcs_free(struct task_struct *task) { }
static inline void gcs_preserve_current_state(void) { }
+static inline unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
+ const struct kernel_clone_args *args)
+{
+ return -ENOTSUPP;
+}
#endif
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index fd80b43c2969..8bd66cde0a86 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -285,9 +285,32 @@ static void flush_gcs(void)
write_sysreg_s(0, SYS_GCSPR_EL0);
}
+static int copy_thread_gcs(struct task_struct *p,
+ const struct kernel_clone_args *args)
+{
+ unsigned long gcs;
+
+ gcs = gcs_alloc_thread_stack(p, args);
+ if (IS_ERR_VALUE(gcs))
+ return PTR_ERR((void *)gcs);
+
+ p->thread.gcs_el0_mode = current->thread.gcs_el0_mode;
+ p->thread.gcs_el0_locked = current->thread.gcs_el0_locked;
+
+ /* Ensure the current state of the GCS is seen by CoW */
+ gcsb_dsync();
+
+ return 0;
+}
+
#else
static void flush_gcs(void) { }
+static int copy_thread_gcs(struct task_struct *p,
+ const struct kernel_clone_args *args)
+{
+ return 0;
+}
#endif
@@ -303,6 +326,7 @@ void flush_thread(void)
void arch_release_task_struct(struct task_struct *tsk)
{
fpsimd_release_task(tsk);
+ gcs_free(tsk);
}
int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
@@ -369,6 +393,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
unsigned long stack_start = args->stack;
unsigned long tls = args->tls;
struct pt_regs *childregs = task_pt_regs(p);
+ int ret;
memset(&p->thread.cpu_context, 0, sizeof(struct cpu_context));
@@ -410,6 +435,10 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
p->thread.uw.tp_value = tls;
p->thread.tpidr2_el0 = 0;
}
+
+ ret = copy_thread_gcs(p, args);
+ if (ret != 0)
+ return ret;
} else {
/*
* A kthread has no context to ERET to, so ensure any buggy
diff --git a/arch/arm64/mm/gcs.c b/arch/arm64/mm/gcs.c
index b0a67efc522b..3cbc3a3d4bc7 100644
--- a/arch/arm64/mm/gcs.c
+++ b/arch/arm64/mm/gcs.c
@@ -8,6 +8,113 @@
#include <asm/cpufeature.h>
#include <asm/page.h>
+static unsigned long alloc_gcs(unsigned long addr, unsigned long size,
+ unsigned long token_offset, bool set_res_tok)
+{
+ int flags = MAP_ANONYMOUS | MAP_PRIVATE;
+ struct mm_struct *mm = current->mm;
+ unsigned long mapped_addr, unused;
+
+ if (addr)
+ flags |= MAP_FIXED_NOREPLACE;
+
+ mmap_write_lock(mm);
+ mapped_addr = do_mmap(NULL, addr, size, PROT_READ | PROT_WRITE, flags,
+ VM_SHADOW_STACK, 0, &unused, NULL);
+ mmap_write_unlock(mm);
+
+ return mapped_addr;
+}
+
+static unsigned long gcs_size(unsigned long size)
+{
+ if (size)
+ return PAGE_ALIGN(size);
+
+ /* Allocate RLIMIT_STACK/2 with limits of PAGE_SIZE..2G */
+ size = PAGE_ALIGN(min_t(unsigned long long,
+ rlimit(RLIMIT_STACK) / 2, SZ_2G));
+ return max(PAGE_SIZE, size);
+}
+
+static bool gcs_consume_token(struct task_struct *tsk, unsigned long user_addr)
+{
+ unsigned long expected = GCS_CAP(user_addr);
+ unsigned long val;
+ int ret = 0;
+
+ /* This should really be an atomic cmpxchg. It is not. */
+ __get_user_error(val, (__user unsigned long *)user_addr, ret);
+ if (ret != 0)
+ return false;
+
+ if (val != expected)
+ return false;
+
+ put_user_gcs(0, (__user unsigned long*)user_addr, &ret);
+
+ return ret == 0;
+}
+
+unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
+ const struct kernel_clone_args *args)
+{
+ unsigned long addr, size, gcspr_el0;
+
+ /* If the user specified a GCS use it. */
+ if (args->shadow_stack_size) {
+ if (!system_supports_gcs())
+ return (unsigned long)ERR_PTR(-EINVAL);
+
+ addr = args->shadow_stack;
+ size = args->shadow_stack_size;
+
+ /*
+ * There should be a token, there might be an end of
+ * stack marker.
+ */
+ gcspr_el0 = addr + size - (2 * sizeof(u64));
+ if (!gcs_consume_token(tsk, gcspr_el0)) {
+ gcspr_el0 += sizeof(u64);
+ if (!gcs_consume_token(tsk, gcspr_el0))
+ return (unsigned long)ERR_PTR(-EINVAL);
+ }
+
+ /* Userspace is responsible for unmapping */
+ tsk->thread.gcspr_el0 = gcspr_el0 + sizeof(u64);
+ } else {
+
+ /*
+ * Otherwise fall back to legacy clone() support and
+ * implicitly allocate a GCS if we need a new one.
+ */
+
+ if (!system_supports_gcs())
+ return 0;
+
+ if (!task_gcs_el0_enabled(tsk))
+ return 0;
+
+ if ((args->flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM) {
+ tsk->thread.gcspr_el0 = read_sysreg_s(SYS_GCSPR_EL0);
+ return 0;
+ }
+
+ size = args->stack_size;
+
+ size = gcs_size(size);
+ addr = alloc_gcs(0, size, 0, 0);
+ if (IS_ERR_VALUE(addr))
+ return addr;
+
+ tsk->thread.gcs_base = addr;
+ tsk->thread.gcs_size = size;
+ tsk->thread.gcspr_el0 = addr + size - sizeof(u64);
+ }
+
+ return addr;
+}
+
/*
* Apply the GCS mode configured for the specified task to the
* hardware.
@@ -30,6 +137,16 @@ void gcs_set_el0_mode(struct task_struct *task)
void gcs_free(struct task_struct *task)
{
+
+ /*
+ * When fork() with CLONE_VM fails, the child (task) already
+ * has a GCS allocated, and exit_thread() calls this function
+ * to free it. In this case the parent (current) and the
+ * child share the same mm struct.
+ */
+ if (!task->mm || task->mm != current->mm)
+ return;
+
if (task->thread.gcs_base)
vm_munmap(task->thread.gcs_base, task->thread.gcs_size);
--
2.30.2
Implement the architecture neutral prctl() interface for setting the
shadow stack status; this supports setting and reading the current GCS
configuration for the current thread.
Userspace can enable basic GCS functionality and, additionally,
support for GCS pushes and arbitrary GCS stores. It is expected that
this prctl() will be called very early in application startup, for
example by the dynamic linker, and not subsequently adjusted during
normal operation. Users should carefully note that after enabling GCS
for a thread, GCS will become active with no call stack, so it is not
normally possible to return from the function that invoked the prctl().
State is stored per thread; enabling GCS for a thread causes a GCS to be
allocated for that thread.
Userspace may lock the current GCS configuration by specifying
PR_SHADOW_STACK_ENABLE_LOCK, this prevents any further changes to the
GCS configuration via any means.
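In the code as posted, locking works per bit: gcs_check_locked() only compares the bits set in gcs_el0_locked, and arch_lock_shadow_stack_status() ORs in the requested bits, including ones not yet defined. A userspace model of that behaviour is sketched below. The SKETCH_SS_* values are placeholders assumed to follow the bit layout of PR_SHADOW_STACK_ENABLE, PR_SHADOW_STACK_WRITE and PR_SHADOW_STACK_PUSH, and struct sketch_gcs_state is a hypothetical stand-in for the thread_struct fields.

```c
#include <errno.h>

/* Placeholder bits standing in for PR_SHADOW_STACK_ENABLE/WRITE/PUSH. */
#define SKETCH_SS_ENABLE (1UL << 0)
#define SKETCH_SS_WRITE  (1UL << 1)
#define SKETCH_SS_PUSH   (1UL << 2)

struct sketch_gcs_state {
	unsigned long mode;	/* like thread.gcs_el0_mode */
	unsigned long locked;	/* like thread.gcs_el0_locked */
};

/* Same comparison as gcs_check_locked(): locked bits may not change. */
static int sketch_check_locked(const struct sketch_gcs_state *s,
			       unsigned long new_val)
{
	if ((s->mode & s->locked) != (new_val & s->locked))
		return -EBUSY;
	return 0;
}

/* Locking only ever adds bits, so it cannot be undone. */
static void sketch_lock(struct sketch_gcs_state *s, unsigned long bits)
{
	s->locked |= bits;
}

/* Apply a new mode only if no locked bit would change. */
static int sketch_set_mode(struct sketch_gcs_state *s, unsigned long new_val)
{
	int ret = sketch_check_locked(s, new_val);

	if (ret == 0)
		s->mode = new_val;
	return ret;
}
```

For example, once SKETCH_SS_ENABLE is locked while set, disabling GCS fails with -EBUSY, but the unlocked push bit can still be turned on.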
If GCS is not being enabled then all flags other than _LOCK are ignored;
it is not possible to enable stores or pushes without enabling GCS.
When disabling the GCS we do not free the allocated stack, this allows
for inspection of the GCS after disabling as part of fault reporting.
Since it is not an expected use case, and since it presents some
complications in determining what to do with previously initialised data
on the GCS, attempts to re-enable GCS after this are rejected. This can
be revisited if a use case arises.
Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/asm/gcs.h | 22 ++++++++++
arch/arm64/include/asm/processor.h | 1 +
arch/arm64/mm/gcs.c | 82 ++++++++++++++++++++++++++++++++++++++
3 files changed, 105 insertions(+)
diff --git a/arch/arm64/include/asm/gcs.h b/arch/arm64/include/asm/gcs.h
index c1f274fdb9c0..48c97e63e56a 100644
--- a/arch/arm64/include/asm/gcs.h
+++ b/arch/arm64/include/asm/gcs.h
@@ -50,6 +50,9 @@ static inline u64 gcsss2(void)
return Xt;
}
+#define PR_SHADOW_STACK_SUPPORTED_STATUS_MASK \
+ (PR_SHADOW_STACK_ENABLE | PR_SHADOW_STACK_WRITE | PR_SHADOW_STACK_PUSH)
+
#ifdef CONFIG_ARM64_GCS
static inline bool task_gcs_el0_enabled(struct task_struct *task)
@@ -63,6 +66,20 @@ void gcs_preserve_current_state(void);
unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
const struct kernel_clone_args *args);
+static inline int gcs_check_locked(struct task_struct *task,
+ unsigned long new_val)
+{
+ unsigned long cur_val = task->thread.gcs_el0_mode;
+
+ cur_val &= task->thread.gcs_el0_locked;
+ new_val &= task->thread.gcs_el0_locked;
+
+ if (cur_val != new_val)
+ return -EBUSY;
+
+ return 0;
+}
+
#else
static inline bool task_gcs_el0_enabled(struct task_struct *task)
@@ -78,6 +95,11 @@ static inline unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
{
return -ENOTSUPP;
}
+static inline int gcs_check_locked(struct task_struct *task,
+ unsigned long new_val)
+{
+ return 0;
+}
#endif
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 6fc6dcbd494c..6a3091ec0f03 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -184,6 +184,7 @@ struct thread_struct {
u64 tpidr2_el0;
#ifdef CONFIG_ARM64_GCS
unsigned int gcs_el0_mode;
+ unsigned int gcs_el0_locked;
u64 gcspr_el0;
u64 gcs_base;
u64 gcs_size;
diff --git a/arch/arm64/mm/gcs.c b/arch/arm64/mm/gcs.c
index 3cbc3a3d4bc7..95f5cf599bc6 100644
--- a/arch/arm64/mm/gcs.c
+++ b/arch/arm64/mm/gcs.c
@@ -154,3 +154,85 @@ void gcs_free(struct task_struct *task)
task->thread.gcs_base = 0;
task->thread.gcs_size = 0;
}
+
+int arch_set_shadow_stack_status(struct task_struct *task, unsigned long arg)
+{
+ unsigned long gcs, size;
+ int ret;
+
+ if (!system_supports_gcs())
+ return -EINVAL;
+
+ if (is_compat_thread(task_thread_info(task)))
+ return -EINVAL;
+
+ /* Reject unknown flags */
+ if (arg & ~PR_SHADOW_STACK_SUPPORTED_STATUS_MASK)
+ return -EINVAL;
+
+ ret = gcs_check_locked(task, arg);
+ if (ret != 0)
+ return ret;
+
+ /* If we are enabling GCS then make sure we have a stack */
+ if (arg & PR_SHADOW_STACK_ENABLE) {
+ if (!task_gcs_el0_enabled(task)) {
+ /* Do not allow GCS to be reenabled */
+ if (task->thread.gcs_base)
+ return -EINVAL;
+
+ if (task != current)
+ return -EBUSY;
+
+ size = gcs_size(0);
+ gcs = alloc_gcs(task->thread.gcspr_el0, size,
+ 0, 0);
+ if (IS_ERR_VALUE(gcs))
+ return -ENOMEM;
+
+ task->thread.gcspr_el0 = gcs + size - sizeof(u64);
+ task->thread.gcs_base = gcs;
+ task->thread.gcs_size = size;
+ if (task == current)
+ write_sysreg_s(task->thread.gcspr_el0,
+ SYS_GCSPR_EL0);
+
+ }
+ }
+
+ task->thread.gcs_el0_mode = arg;
+ if (task == current)
+ gcs_set_el0_mode(task);
+
+ return 0;
+}
+
+int arch_get_shadow_stack_status(struct task_struct *task,
+ unsigned long __user *arg)
+{
+ if (!system_supports_gcs())
+ return -EINVAL;
+
+ if (is_compat_thread(task_thread_info(task)))
+ return -EINVAL;
+
+ return put_user(task->thread.gcs_el0_mode, arg);
+}
+
+int arch_lock_shadow_stack_status(struct task_struct *task,
+ unsigned long arg)
+{
+ if (!system_supports_gcs())
+ return -EINVAL;
+
+ if (is_compat_thread(task_thread_info(task)))
+ return -EINVAL;
+
+ /*
+ * We support locking unknown bits so applications can prevent
+ * any changes in a future proof manner.
+ */
+ task->thread.gcs_el0_locked |= arg;
+
+ return 0;
+}
--
2.30.2
Use VM_HIGH_ARCH_5 for guarded control stack pages.
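With this change GCS mappings are reported with the existing "ss" mnemonic in the VmFlags line of /proc/<pid>/smaps. A minimal sketch of checking a single such line is below; it takes the line as a string (opening and splitting smaps is left out) and only matches "ss" as a whole space-separated token. The helper name is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* True if a smaps "VmFlags:" line contains "ss" as a whole token. */
static bool sketch_vmflags_has_ss(const char *line)
{
	const char *prefix = "VmFlags:";
	const char *p;

	if (strncmp(line, prefix, strlen(prefix)) != 0)
		return false;

	p = line + strlen(prefix);
	while (*p) {
		size_t len = strcspn(p, " \t\n");

		if (len == 2 && p[0] == 's' && p[1] == 's')
			return true;

		/* Skip this token and the separator after it, if any. */
		p += len;
		if (*p)
			p++;
	}
	return false;
}
```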
Signed-off-by: Mark Brown <[email protected]>
---
Documentation/filesystems/proc.rst | 2 +-
fs/proc/task_mmu.c | 3 +++
include/linux/mm.h | 12 +++++++++++-
3 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 104c6d047d9b..0392c3b74650 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -570,7 +570,7 @@ encoded manner. The codes are the following:
mt arm64 MTE allocation tags are enabled
um userfaultfd missing tracking
uw userfaultfd wr-protect tracking
- ss shadow stack page
+ ss shadow/guarded control stack page
== =======================================
Note that there is no guarantee that every flag and associated mnemonic will
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index ff2c601f7d1c..fb0633d8e309 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -702,6 +702,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
#ifdef CONFIG_ARCH_HAS_USER_SHADOW_STACK
[ilog2(VM_SHADOW_STACK)] = "ss",
+#endif
+#ifdef CONFIG_ARM64_GCS
+ [ilog2(VM_SHADOW_STACK)] = "ss",
#endif
};
size_t i;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0b1139c5df60..6cc304c90c63 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -352,7 +352,17 @@ extern unsigned int kobjsize(const void *objp);
* for more details on the guard size.
*/
# define VM_SHADOW_STACK VM_HIGH_ARCH_5
-#else
+#endif
+
+#if defined(CONFIG_ARM64_GCS)
+/*
+ * arm64's Guarded Control Stack implements similar functionality and
+ * has similar constraints to shadow stacks.
+ */
+# define VM_SHADOW_STACK VM_HIGH_ARCH_5
+#endif
+
+#ifndef VM_SHADOW_STACK
# define VM_SHADOW_STACK VM_NONE
#endif
--
2.30.2
Provide a Kconfig option allowing the user to select if GCS support is
built into the kernel.
Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/Kconfig | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index aa7c1d435139..e0048e4660cf 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2098,6 +2098,26 @@ config ARM64_EPAN
if the cpu does not implement the feature.
endmenu # "ARMv8.7 architectural features"
+menu "v9.4 architectural features"
+
+config ARM64_GCS
+ bool "Enable support for Guarded Control Stack (GCS)"
+ default y
+ select ARCH_HAS_USER_SHADOW_STACK
+ select ARCH_USES_HIGH_VMA_FLAGS
+ help
+ Guarded Control Stack (GCS) provides support for a separate
+ stack with restricted access which contains only return
+ addresses. This can be used to harden against some attacks
+ by comparing return addresses used by the program with what is
+ stored in the GCS, and may also be used to efficiently obtain
+ the call stack for applications such as profiling.
+
+ The feature is detected at runtime, and will remain disabled
+ if the system does not implement the feature.
+
+endmenu # "v9.4 architectural features"
+
config ARM64_SVE
bool "ARM Scalable Vector Extension support"
default y
--
2.30.2
Add coverage of the GCS hwcap to the hwcap selftest, using a read of
GCSPR_EL0 to generate SIGILL without having to worry about enabling GCS.
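Outside the selftest, a program would normally check the hwcap before touching GCS at all, which is also what the signal tests later in this series do. A minimal sketch, assuming the installed headers expose HWCAP2_GCS through <sys/auxv.h> as this series expects; where the headers do not define it (including non-arm64 builds) the helper simply reports no support.

```c
#include <stdbool.h>
#include <sys/auxv.h>

/* True if the kernel advertises GCS in AT_HWCAP2. */
static bool sketch_have_gcs(void)
{
#ifdef HWCAP2_GCS
	return (getauxval(AT_HWCAP2) & HWCAP2_GCS) != 0;
#else
	/* Headers predate GCS, or this is not an arm64 build. */
	return false;
#endif
}
```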
Signed-off-by: Mark Brown <[email protected]>
---
tools/testing/selftests/arm64/abi/hwcap.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/tools/testing/selftests/arm64/abi/hwcap.c b/tools/testing/selftests/arm64/abi/hwcap.c
index 1189e77c8152..bc9e3250a9df 100644
--- a/tools/testing/selftests/arm64/abi/hwcap.c
+++ b/tools/testing/selftests/arm64/abi/hwcap.c
@@ -63,6 +63,17 @@ static void fp_sigill(void)
asm volatile("fmov s0, #1");
}
+static void gcs_sigill(void)
+{
+ unsigned long *gcspr;
+
+ asm volatile(
+ "mrs %0, S3_3_C2_C5_1"
+ : "=r" (gcspr)
+ :
+ : "cc");
+}
+
static void ilrcpc_sigill(void)
{
/* LDAPUR W0, [SP, #8] */
@@ -360,6 +371,14 @@ static const struct hwcap_data {
.cpuinfo = "fp",
.sigill_fn = fp_sigill,
},
+ {
+ .name = "GCS",
+ .at_hwcap = AT_HWCAP2,
+ .hwcap_bit = HWCAP2_GCS,
+ .cpuinfo = "gcs",
+ .sigill_fn = gcs_sigill,
+ .sigill_reliable = true,
+ },
{
.name = "JSCVT",
.at_hwcap = AT_HWCAP,
--
2.30.2
Teach the framework about the GCS signal context, avoiding warnings on
the unknown context.
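For reference, the signal frame's __reserved area is a sequence of records, each starting with a 32-bit magic and a 32-bit total size, terminated by a record with a zero magic; this is what validate_reserved() walks. The sketch below finds a GCS record in such a buffer. The header layout mirrors struct _aarch64_ctx, while SKETCH_GCS_MAGIC (0x47435300) and the 32-byte record size used in the usage check are assumptions to verify against the uapi sigcontext.h.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_GCS_MAGIC 0x47435300u	/* assumed value of GCS_MAGIC */

/* Mirrors struct _aarch64_ctx: every record starts with this header. */
struct sketch_ctx_head {
	uint32_t magic;
	uint32_t size;	/* total record size in bytes, header included */
};

/*
 * Return the offset of the first record with 'magic', or -1 if the
 * terminator is reached first or a header is malformed.
 */
static long sketch_find_record(const unsigned char *buf, size_t len,
			       uint32_t magic)
{
	size_t off = 0;

	while (off + sizeof(struct sketch_ctx_head) <= len) {
		struct sketch_ctx_head head;

		memcpy(&head, buf + off, sizeof(head));
		if (head.magic == 0)
			return -1;	/* terminator record */
		if (head.magic == magic)
			return (long)off;
		if (head.size < sizeof(head) || head.size > len - off)
			return -1;	/* corrupt size, stop walking */
		off += head.size;
	}
	return -1;
}
```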
Signed-off-by: Mark Brown <[email protected]>
---
tools/testing/selftests/arm64/signal/testcases/testcases.c | 7 +++++++
tools/testing/selftests/arm64/signal/testcases/testcases.h | 1 +
2 files changed, 8 insertions(+)
diff --git a/tools/testing/selftests/arm64/signal/testcases/testcases.c b/tools/testing/selftests/arm64/signal/testcases/testcases.c
index 9f580b55b388..1cd124732be4 100644
--- a/tools/testing/selftests/arm64/signal/testcases/testcases.c
+++ b/tools/testing/selftests/arm64/signal/testcases/testcases.c
@@ -209,6 +209,13 @@ bool validate_reserved(ucontext_t *uc, size_t resv_sz, char **err)
zt = (struct zt_context *)head;
new_flags |= ZT_CTX;
break;
+ case GCS_MAGIC:
+ if (flags & GCS_CTX)
+ *err = "Multiple GCS_MAGIC";
+ if (head->size != sizeof(struct gcs_context))
+ *err = "Bad size for gcs_context";
+ new_flags |= GCS_CTX;
+ break;
case EXTRA_MAGIC:
if (flags & EXTRA_CTX)
*err = "Multiple EXTRA_MAGIC";
diff --git a/tools/testing/selftests/arm64/signal/testcases/testcases.h b/tools/testing/selftests/arm64/signal/testcases/testcases.h
index a08ab0d6207a..9b2599745c29 100644
--- a/tools/testing/selftests/arm64/signal/testcases/testcases.h
+++ b/tools/testing/selftests/arm64/signal/testcases/testcases.h
@@ -19,6 +19,7 @@
#define ZA_CTX (1 << 2)
#define EXTRA_CTX (1 << 3)
#define ZT_CTX (1 << 4)
+#define GCS_CTX (1 << 5)
#define KSFT_BAD_MAGIC 0xdeadbeef
--
2.30.2
Currently we ignore si_code unless the expected signal is a SIGSEGV, in
which case we enforce it being SEGV_ACCERR. Allow test cases to specify
exactly which si_code should be generated so we can validate this, and
test for other segfault codes.
Signed-off-by: Mark Brown <[email protected]>
---
.../testing/selftests/arm64/signal/test_signals.h | 4 +++
.../selftests/arm64/signal/test_signals_utils.c | 29 ++++++++++++++--------
2 files changed, 23 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/arm64/signal/test_signals.h b/tools/testing/selftests/arm64/signal/test_signals.h
index 7ada43688c02..ee75a2c25ce7 100644
--- a/tools/testing/selftests/arm64/signal/test_signals.h
+++ b/tools/testing/selftests/arm64/signal/test_signals.h
@@ -71,6 +71,10 @@ struct tdescr {
* Zero when no signal is expected on success
*/
int sig_ok;
+ /*
+ * expected si_code for sig_ok, or 0 to not check
+ */
+ int sig_ok_code;
/* signum expected on unsupported CPU features. */
int sig_unsupp;
/* a timeout in second for test completion */
diff --git a/tools/testing/selftests/arm64/signal/test_signals_utils.c b/tools/testing/selftests/arm64/signal/test_signals_utils.c
index 89ef95c1af0e..63deca32b0df 100644
--- a/tools/testing/selftests/arm64/signal/test_signals_utils.c
+++ b/tools/testing/selftests/arm64/signal/test_signals_utils.c
@@ -143,16 +143,25 @@ static bool handle_signal_ok(struct tdescr *td,
"current->token ZEROED...test is probably broken!\n");
abort();
}
- /*
- * Trying to narrow down the SEGV to the ones generated by Kernel itself
- * via arm64_notify_segfault(). This is a best-effort check anyway, and
- * the si_code check may need to change if this aspect of the kernel
- * ABI changes.
- */
- if (td->sig_ok == SIGSEGV && si->si_code != SEGV_ACCERR) {
- fprintf(stdout,
- "si_code != SEGV_ACCERR...test is probably broken!\n");
- abort();
+ if (td->sig_ok_code) {
+ if (si->si_code != td->sig_ok_code) {
+ fprintf(stdout, "si_code is %d not %d\n",
+ si->si_code, td->sig_ok_code);
+ abort();
+ }
+ } else {
+ /*
+ * Trying to narrow down the SEGV to the ones
+ * generated by Kernel itself via
+ * arm64_notify_segfault(). This is a best-effort
+ * check anyway, and the si_code check may need to
+ * change if this aspect of the kernel ABI changes.
+ */
+ if (td->sig_ok == SIGSEGV && si->si_code != SEGV_ACCERR) {
+ fprintf(stdout,
+ "si_code != SEGV_ACCERR...test is probably broken!\n");
+ abort();
+ }
}
td->pass = 1;
/*
--
2.30.2
Since it is not possible to return from the function that enabled GCS
without disabling GCS it is very inconvenient to use the signal handling
tests to cover GCS when GCS is not enabled by the toolchain and runtime,
something that no current distribution does. Since none of the testcases
do anything with stacks that would cause problems with GCS we can sidestep
this issue by unconditionally enabling GCS on startup and exiting with a
call to exit() rather than a return from main().
Signed-off-by: Mark Brown <[email protected]>
---
.../testing/selftests/arm64/signal/test_signals.c | 17 ++++++++++++-
.../selftests/arm64/signal/test_signals_utils.h | 29 ++++++++++++++++++++++
2 files changed, 45 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/signal/test_signals.c b/tools/testing/selftests/arm64/signal/test_signals.c
index 00051b40d71e..30e95f50db19 100644
--- a/tools/testing/selftests/arm64/signal/test_signals.c
+++ b/tools/testing/selftests/arm64/signal/test_signals.c
@@ -7,6 +7,10 @@
* Each test provides its own tde struct tdescr descriptor to link with
* this wrapper. Framework provides common helpers.
*/
+
+#include <sys/auxv.h>
+#include <sys/prctl.h>
+
#include <kselftest.h>
#include "test_signals.h"
@@ -16,6 +20,16 @@ struct tdescr *current = &tde;
int main(int argc, char *argv[])
{
+ /*
+ * Ensure GCS is at least enabled throughout the tests if
+ * supported, otherwise the inability to return from the
+ * function that enabled GCS makes it very inconvenient to set
+ * up test cases. The prctl() may fail if GCS was locked by
+ * libc setup code.
+ */
+ if (getauxval(AT_HWCAP2) & HWCAP2_GCS)
+ gcs_set_state(PR_SHADOW_STACK_ENABLE);
+
ksft_print_msg("%s :: %s\n", current->name, current->descr);
if (test_setup(current) && test_init(current)) {
test_run(current);
@@ -23,5 +37,6 @@ int main(int argc, char *argv[])
}
test_result(current);
- return current->result;
+ /* Do not return in case GCS was enabled */
+ exit(current->result);
}
diff --git a/tools/testing/selftests/arm64/signal/test_signals_utils.h b/tools/testing/selftests/arm64/signal/test_signals_utils.h
index 762c8fe9c54a..1e80808ee105 100644
--- a/tools/testing/selftests/arm64/signal/test_signals_utils.h
+++ b/tools/testing/selftests/arm64/signal/test_signals_utils.h
@@ -18,6 +18,35 @@ void test_cleanup(struct tdescr *td);
int test_run(struct tdescr *td);
void test_result(struct tdescr *td);
+#ifndef __NR_prctl
+#define __NR_prctl 167
+#endif
+
+/*
+ * The prctl takes 1 argument but we need to ensure that the other
+ * values passed in registers to the syscall are zero since the kernel
+ * validates them.
+ */
+#define gcs_set_state(state) \
+ ({ \
+ register long _num __asm__ ("x8") = __NR_prctl; \
+ register long _arg1 __asm__ ("x0") = PR_SET_SHADOW_STACK_STATUS; \
+ register long _arg2 __asm__ ("x1") = (long)(state); \
+ register long _arg3 __asm__ ("x2") = 0; \
+ register long _arg4 __asm__ ("x3") = 0; \
+ register long _arg5 __asm__ ("x4") = 0; \
+ \
+ __asm__ volatile ( \
+ "svc #0\n" \
+ : "=r"(_arg1) \
+ : "r"(_arg1), "r"(_arg2), \
+ "r"(_arg3), "r"(_arg4), \
+ "r"(_arg5), "r"(_num) \
+ : "memory", "cc" \
+ ); \
+ _arg1; \
+ })
+
static inline bool feats_ok(struct tdescr *td)
{
if (td->feats_incompatible & td->feats_supported)
--
2.30.2
There are things, such as threads, which nolibc struggles with and which
we want to add coverage for. The ABI allows us to test most of these even
if libc itself does not understand GCS, so add a test application built
using the system libc.
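The per-thread check the new test performs can be sketched as a small helper that queries the shadow stack status through the libc prctl() wrapper. It assumes PR_GET_SHADOW_STACK_STATUS and PR_SHADOW_STACK_ENABLE take the values used by gcs-util.h in this series (71 and bit 0) when the system headers lack them; on kernels or architectures without this interface the call fails and the helper returns -1. The helper name is hypothetical.

```c
#include <sys/prctl.h>

/* Fallback values as used by this series' gcs-util.h (assumptions). */
#ifndef PR_GET_SHADOW_STACK_STATUS
#define PR_GET_SHADOW_STACK_STATUS 71
#endif
#ifndef PR_SHADOW_STACK_ENABLE
#define PR_SHADOW_STACK_ENABLE (1UL << 0)
#endif

/*
 * Return 1 if GCS is enabled for the calling thread, 0 if it is not,
 * or -1 if the status could not be read.
 */
static int sketch_thread_gcs_enabled(void)
{
	unsigned long mode = 0;

	/*
	 * The kernel rejects non-zero unused arguments, so pass them all
	 * explicitly as full-width zeros through the variadic wrapper.
	 */
	if (prctl(PR_GET_SHADOW_STACK_STATUS, &mode, 0UL, 0UL, 0UL) != 0)
		return -1;

	return (mode & PR_SHADOW_STACK_ENABLE) ? 1 : 0;
}
```

Calling this from a thread started with pthread_create() is the same check gcs_test_thread() makes; only querying is shown here, since enabling GCS from a helper would make it impossible to return from that helper.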
Signed-off-by: Mark Brown <[email protected]>
---
tools/testing/selftests/arm64/gcs/.gitignore | 1 +
tools/testing/selftests/arm64/gcs/Makefile | 4 +-
tools/testing/selftests/arm64/gcs/gcs-util.h | 10 +
tools/testing/selftests/arm64/gcs/libc-gcs.c | 736 +++++++++++++++++++++++++++
4 files changed, 750 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/gcs/.gitignore b/tools/testing/selftests/arm64/gcs/.gitignore
index 0e5e695ecba5..5810c4a163d4 100644
--- a/tools/testing/selftests/arm64/gcs/.gitignore
+++ b/tools/testing/selftests/arm64/gcs/.gitignore
@@ -1 +1,2 @@
basic-gcs
+libc-gcs
diff --git a/tools/testing/selftests/arm64/gcs/Makefile b/tools/testing/selftests/arm64/gcs/Makefile
index 61a30f483429..a8fdf21e9a47 100644
--- a/tools/testing/selftests/arm64/gcs/Makefile
+++ b/tools/testing/selftests/arm64/gcs/Makefile
@@ -6,7 +6,9 @@
# nolibc.
#
-TEST_GEN_PROGS := basic-gcs
+TEST_GEN_PROGS := basic-gcs libc-gcs
+
+LDLIBS+=-lpthread
include ../../lib.mk
diff --git a/tools/testing/selftests/arm64/gcs/gcs-util.h b/tools/testing/selftests/arm64/gcs/gcs-util.h
index b37801c95604..4bafd1d7feb5 100644
--- a/tools/testing/selftests/arm64/gcs/gcs-util.h
+++ b/tools/testing/selftests/arm64/gcs/gcs-util.h
@@ -16,6 +16,16 @@
#define __NR_prctl 167
#endif
+#ifndef NT_ARM_GCS
+#define NT_ARM_GCS 0x40e
+
+struct user_gcs {
+ __u64 features_enabled;
+ __u64 features_locked;
+ __u64 gcspr_el0;
+};
+#endif
+
/* Shadow Stack/Guarded Control Stack interface */
#define PR_GET_SHADOW_STACK_STATUS 71
#define PR_SET_SHADOW_STACK_STATUS 72
diff --git a/tools/testing/selftests/arm64/gcs/libc-gcs.c b/tools/testing/selftests/arm64/gcs/libc-gcs.c
new file mode 100644
index 000000000000..937f8bee7bdd
--- /dev/null
+++ b/tools/testing/selftests/arm64/gcs/libc-gcs.c
@@ -0,0 +1,736 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 ARM Limited.
+ */
+
+#define _GNU_SOURCE
+
+#include <pthread.h>
+#include <stdbool.h>
+
+#include <sys/auxv.h>
+#include <sys/mman.h>
+#include <sys/prctl.h>
+#include <sys/ptrace.h>
+#include <sys/uio.h>
+
+#include <asm/hwcap.h>
+#include <asm/mman.h>
+
+#include <linux/compiler.h>
+
+#include "kselftest_harness.h"
+
+#include "gcs-util.h"
+
+#define my_syscall2(num, arg1, arg2) \
+({ \
+ register long _num __asm__ ("x8") = (num); \
+ register long _arg1 __asm__ ("x0") = (long)(arg1); \
+ register long _arg2 __asm__ ("x1") = (long)(arg2); \
+ register long _arg3 __asm__ ("x2") = 0; \
+ register long _arg4 __asm__ ("x3") = 0; \
+ register long _arg5 __asm__ ("x4") = 0; \
+ \
+ __asm__ volatile ( \
+ "svc #0\n" \
+ : "=r"(_arg1) \
+ : "r"(_arg1), "r"(_arg2), \
+ "r"(_arg3), "r"(_arg4), \
+ "r"(_arg5), "r"(_num) \
+ : "memory", "cc" \
+ ); \
+ _arg1; \
+})
+
+static noinline void gcs_recurse(int depth)
+{
+ if (depth)
+ gcs_recurse(depth - 1);
+
+ /* Prevent tail call optimization so we actually recurse */
+ asm volatile("dsb sy" : : : "memory");
+}
+
+/* Smoke test that a function call and return works */
+TEST(can_call_function)
+{
+ gcs_recurse(0);
+}
+
+static void *gcs_test_thread(void *arg)
+{
+ int ret;
+ unsigned long mode;
+
+ /*
+ * Some libcs don't seem to fill unused arguments with 0 but
+ * the kernel validates this so we supply all 5 arguments.
+ */
+ ret = prctl(PR_GET_SHADOW_STACK_STATUS, &mode, 0, 0, 0);
+ if (ret != 0) {
+ ksft_print_msg("PR_GET_SHADOW_STACK_STATUS failed: %d\n", ret);
+ return NULL;
+ }
+
+ if (!(mode & PR_SHADOW_STACK_ENABLE)) {
+ ksft_print_msg("GCS not enabled in thread, mode is %lu\n",
+ mode);
+ return NULL;
+ }
+
+ /* Just in case... */
+ gcs_recurse(0);
+
+ /* Use a non-NULL value to indicate a pass */
+ return &gcs_test_thread;
+}
+
+/* Verify that if we start a new thread it has GCS enabled */
+TEST(gcs_enabled_thread)
+{
+ pthread_t thread;
+ void *thread_ret;
+ int ret;
+
+ ret = pthread_create(&thread, NULL, gcs_test_thread, NULL);
+ ASSERT_TRUE(ret == 0);
+ if (ret != 0)
+ return;
+
+ ret = pthread_join(thread, &thread_ret);
+ ASSERT_TRUE(ret == 0);
+ if (ret != 0)
+ return;
+
+ ASSERT_TRUE(thread_ret != NULL);
+}
+
+/* Read the GCS until we find the terminator */
+TEST(gcs_find_terminator)
+{
+ unsigned long *gcs, *cur;
+
+ gcs = get_gcspr();
+ cur = gcs;
+ while (*cur)
+ cur++;
+
+ ksft_print_msg("GCS in use from %p-%p\n", gcs, cur);
+
+ /*
+ * We should have at least whatever called into this test so
+ * the two pointer should differ.
+ */
+ ASSERT_TRUE(gcs != cur);
+}
+
+/*
+ * We can access a GCS via ptrace
+ *
+ * This could usefully have a fixture but note that each test is
+ * fork()ed into a new child which causes issues. Might be better to
+ * lift at least some of this out into a separate, non-harness, test
+ * program.
+ */
+TEST(ptrace_read_write)
+{
+ pid_t child, pid;
+ int ret, status;
+ siginfo_t si;
+ uint64_t val, rval, gcspr;
+ struct user_gcs child_gcs;
+ struct iovec iov, local_iov, remote_iov;
+
+ child = fork();
+ if (child == -1) {
+ ksft_print_msg("fork() failed: %d (%s)\n",
+ errno, strerror(errno));
+ ASSERT_NE(child, -1);
+ }
+
+ if (child == 0) {
+ /*
+ * In child, make sure there's something on the stack and
+ * ask to be traced.
+ */
+ gcs_recurse(0);
+ if (ptrace(PTRACE_TRACEME, -1, NULL, NULL))
+ ksft_exit_fail_msg("PTRACE_TRACEME: %s\n", strerror(errno));
+
+ if (raise(SIGSTOP))
+ ksft_exit_fail_msg("raise(SIGSTOP): %s\n", strerror(errno));
+
+ return;
+ }
+
+ ksft_print_msg("Child: %d\n", child);
+
+ /* Attach to the child */
+ while (1) {
+ int sig;
+
+ pid = wait(&status);
+ if (pid == -1) {
+ ksft_print_msg("wait() failed: %s\n",
+ strerror(errno));
+ goto error;
+ }
+
+ /*
+ * This should never happen but it's hard to flag in
+ * the framework.
+ */
+ if (pid != child)
+ continue;
+
+ if (WIFEXITED(status) || WIFSIGNALED(status))
+ ksft_exit_fail_msg("Child died unexpectedly\n");
+
+ if (!WIFSTOPPED(status))
+ goto error;
+
+ sig = WSTOPSIG(status);
+
+ if (ptrace(PTRACE_GETSIGINFO, pid, NULL, &si)) {
+ if (errno == ESRCH) {
+ ASSERT_NE(errno, ESRCH);
+ return;
+ }
+
+ if (errno == EINVAL) {
+ sig = 0; /* bust group-stop */
+ goto cont;
+ }
+
+ ksft_print_msg("PTRACE_GETSIGINFO: %s\n",
+ strerror(errno));
+ goto error;
+ }
+
+ if (sig == SIGSTOP && si.si_code == SI_TKILL &&
+ si.si_pid == pid)
+ break;
+
+ cont:
+ if (ptrace(PTRACE_CONT, pid, NULL, sig)) {
+ if (errno == ESRCH) {
+ ASSERT_NE(errno, ESRCH);
+ return;
+ }
+
+ ksft_print_msg("PTRACE_CONT: %s\n", strerror(errno));
+ goto error;
+ }
+ }
+
+ /* Where is the child GCS? */
+ iov.iov_base = &child_gcs;
+ iov.iov_len = sizeof(child_gcs);
+ ret = ptrace(PTRACE_GETREGSET, child, NT_ARM_GCS, &iov);
+ if (ret != 0) {
+ ksft_print_msg("Failed to read child GCS state: %s (%d)\n",
+ strerror(errno), errno);
+ goto error;
+ }
+
+ /* We should have inherited GCS over fork(), confirm */
+ if (!(child_gcs.features_enabled & PR_SHADOW_STACK_ENABLE)) {
+ ASSERT_TRUE(child_gcs.features_enabled &
+ PR_SHADOW_STACK_ENABLE);
+ goto error;
+ }
+
+ gcspr = child_gcs.gcspr_el0;
+ ksft_print_msg("Child GCSPR 0x%lx, flags %llx, locked %llx\n",
+ gcspr, child_gcs.features_enabled,
+ child_gcs.features_locked);
+
+ /* Ideally we'd cross check with the child memory map */
+
+ errno = 0;
+ val = ptrace(PTRACE_PEEKDATA, child, (void *)gcspr, NULL);
+ ret = errno;
+ if (ret != 0)
+ ksft_print_msg("PTRACE_PEEKDATA failed: %s (%d)\n",
+ strerror(ret), ret);
+ EXPECT_EQ(ret, 0);
+
+ /* The child should be in a function, the GCSPR shouldn't be 0 */
+ EXPECT_NE(val, 0);
+
+ /* Same thing via process_vm_readv() */
+ local_iov.iov_base = &rval;
+ local_iov.iov_len = sizeof(rval);
+ remote_iov.iov_base = (void *)gcspr;
+ remote_iov.iov_len = sizeof(rval);
+ ret = process_vm_readv(child, &local_iov, 1, &remote_iov, 1, 0);
+ if (ret == -1)
+ ksft_print_msg("process_vm_readv() failed: %s (%d)\n",
+ strerror(errno), errno);
+ EXPECT_EQ(ret, sizeof(rval));
+ EXPECT_EQ(val, rval);
+
+ /* Write data via a poke */
+ ret = ptrace(PTRACE_POKEDATA, child, (void *)gcspr, NULL);
+ if (ret == -1)
+ ksft_print_msg("PTRACE_POKEDATA failed: %s (%d)\n",
+ strerror(errno), errno);
+ EXPECT_EQ(ret, 0);
+ EXPECT_EQ(0, ptrace(PTRACE_PEEKDATA, child, (void *)gcspr, NULL));
+
+ /* Restore what we had before */
+ ret = ptrace(PTRACE_POKEDATA, child, (void *)gcspr, val);
+ if (ret == -1)
+ ksft_print_msg("PTRACE_POKEDATA failed: %s (%d)\n",
+ strerror(errno), errno);
+ EXPECT_EQ(ret, 0);
+ EXPECT_EQ(val, ptrace(PTRACE_PEEKDATA, child, (void *)gcspr, NULL));
+
+ /* That's all, folks */
+ kill(child, SIGKILL);
+ return;
+
+error:
+ kill(child, SIGKILL);
+ ASSERT_FALSE(true);
+}
+
+FIXTURE(map_gcs)
+{
+ unsigned long *stack;
+};
+
+FIXTURE_VARIANT(map_gcs)
+{
+ size_t stack_size;
+ unsigned long flags;
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s2k_cap_marker)
+{
+ .stack_size = 2 * 1024,
+ .flags = SHADOW_STACK_SET_MARKER | SHADOW_STACK_SET_TOKEN,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s2k_cap)
+{
+ .stack_size = 2 * 1024,
+ .flags = SHADOW_STACK_SET_TOKEN,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s2k_marker)
+{
+ .stack_size = 2 * 1024,
+ .flags = SHADOW_STACK_SET_MARKER,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s2k)
+{
+ .stack_size = 2 * 1024,
+ .flags = 0,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s4k_cap_marker)
+{
+ .stack_size = 4 * 1024,
+ .flags = SHADOW_STACK_SET_MARKER | SHADOW_STACK_SET_TOKEN,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s4k_cap)
+{
+ .stack_size = 4 * 1024,
+ .flags = SHADOW_STACK_SET_TOKEN,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s4k_marker)
+{
+ .stack_size = 4 * 1024,
+ .flags = SHADOW_STACK_SET_MARKER,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s4k)
+{
+ .stack_size = 4 * 1024,
+ .flags = 0,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s16k_cap_marker)
+{
+ .stack_size = 16 * 1024,
+ .flags = SHADOW_STACK_SET_MARKER | SHADOW_STACK_SET_TOKEN,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s16k_cap)
+{
+ .stack_size = 16 * 1024,
+ .flags = SHADOW_STACK_SET_TOKEN,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s16k_marker)
+{
+ .stack_size = 16 * 1024,
+ .flags = SHADOW_STACK_SET_MARKER,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s16k)
+{
+ .stack_size = 16 * 1024,
+ .flags = 0,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s64k_cap_marker)
+{
+ .stack_size = 64 * 1024,
+ .flags = SHADOW_STACK_SET_MARKER | SHADOW_STACK_SET_TOKEN,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s64k_cap)
+{
+ .stack_size = 64 * 1024,
+ .flags = SHADOW_STACK_SET_TOKEN,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s64k_marker)
+{
+ .stack_size = 64 * 1024,
+ .flags = SHADOW_STACK_SET_MARKER,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s64k)
+{
+ .stack_size = 64 * 1024,
+ .flags = 0,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s128k_cap_marker)
+{
+ .stack_size = 128 * 1024,
+ .flags = SHADOW_STACK_SET_MARKER | SHADOW_STACK_SET_TOKEN,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s128k_cap)
+{
+ .stack_size = 128 * 1024,
+ .flags = SHADOW_STACK_SET_TOKEN,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s128k_marker)
+{
+ .stack_size = 128 * 1024,
+ .flags = SHADOW_STACK_SET_MARKER,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s128k)
+{
+ .stack_size = 128 * 1024,
+ .flags = 0,
+};
+
+FIXTURE_SETUP(map_gcs)
+{
+ self->stack = (void *)syscall(__NR_map_shadow_stack, 0,
+ variant->stack_size,
+ variant->flags);
+ ASSERT_FALSE(self->stack == MAP_FAILED);
+ ksft_print_msg("Allocated stack from %p-%p\n", self->stack,
+ (void *)((unsigned long)self->stack + variant->stack_size));
+}
+
+FIXTURE_TEARDOWN(map_gcs)
+{
+ int ret;
+
+ if (self->stack != MAP_FAILED) {
+ ret = munmap(self->stack, variant->stack_size);
+ ASSERT_EQ(ret, 0);
+ }
+}
+
+/* The stack has a cap token */
+TEST_F(map_gcs, stack_capped)
+{
+ unsigned long *stack = self->stack;
+ size_t cap_index;
+
+ cap_index = (variant->stack_size / sizeof(unsigned long));
+
+ switch (variant->flags & (SHADOW_STACK_SET_MARKER | SHADOW_STACK_SET_TOKEN)) {
+ case SHADOW_STACK_SET_MARKER | SHADOW_STACK_SET_TOKEN:
+ cap_index -= 2;
+ break;
+ case SHADOW_STACK_SET_TOKEN:
+ cap_index -= 1;
+ break;
+ case SHADOW_STACK_SET_MARKER:
+ case 0:
+ /* No cap, no test */
+ return;
+ }
+
+ ASSERT_EQ(stack[cap_index], GCS_CAP(&stack[cap_index]));
+}
+
+/* The top of the stack is 0 */
+TEST_F(map_gcs, stack_terminated)
+{
+ unsigned long *stack = self->stack;
+ size_t term_index;
+
+ if (!(variant->flags & SHADOW_STACK_SET_MARKER))
+ return;
+
+ term_index = (variant->stack_size / sizeof(unsigned long)) - 1;
+
+ ASSERT_EQ(stack[term_index], 0);
+}
+
+/* Writes should fault */
+TEST_F_SIGNAL(map_gcs, not_writeable, SIGSEGV)
+{
+ self->stack[0] = 0;
+}
+
+/* Put it all together, we can safely switch to and from the stack */
+TEST_F(map_gcs, stack_switch)
+{
+ unsigned long *orig_gcspr_el0, *pivot_gcspr_el0;
+ size_t cap_index;
+ cap_index = (variant->stack_size / sizeof(unsigned long));
+
+ /* Skip over the stack terminator and point at the cap */
+ switch (variant->flags & (SHADOW_STACK_SET_MARKER | SHADOW_STACK_SET_TOKEN)) {
+ case SHADOW_STACK_SET_MARKER | SHADOW_STACK_SET_TOKEN:
+ cap_index -= 2;
+ break;
+ case SHADOW_STACK_SET_TOKEN:
+ cap_index -= 1;
+ break;
+ case SHADOW_STACK_SET_MARKER:
+ case 0:
+ /* No cap, no test */
+ return;
+ }
+ pivot_gcspr_el0 = &self->stack[cap_index];
+
+ /* Pivot to the new GCS */
+ ksft_print_msg("Pivoting to %p from %p, target has value 0x%lx\n",
+ pivot_gcspr_el0, get_gcspr(),
+ *pivot_gcspr_el0);
+ gcsss1(pivot_gcspr_el0);
+ orig_gcspr_el0 = gcsss2();
+ ksft_print_msg("Pivoted to %p from %p, target has value 0x%lx\n",
+ get_gcspr(), orig_gcspr_el0,
+ *pivot_gcspr_el0);
+
+ ksft_print_msg("Pivoted, GCSPR_EL0 now %p\n", get_gcspr());
+
+ /* New GCS must be in the new buffer */
+ ASSERT_TRUE((unsigned long)get_gcspr() > (unsigned long)self->stack);
+ ASSERT_TRUE((unsigned long)get_gcspr() <=
+ (unsigned long)self->stack + variant->stack_size);
+
+ /* We should be able to use all but 2 slots of the new stack */
+ ksft_print_msg("Recursing %zu levels\n", cap_index - 1);
+ gcs_recurse(cap_index - 1);
+
+ /* Pivot back to the original GCS */
+ gcsss1(orig_gcspr_el0);
+ pivot_gcspr_el0 = gcsss2();
+
+ gcs_recurse(0);
+ ksft_print_msg("Pivoted back to GCSPR_EL0 %p\n", get_gcspr());
+}
+
+/* We fault if we try to go beyond the end of the stack */
+TEST_F_SIGNAL(map_gcs, stack_overflow, SIGSEGV)
+{
+ unsigned long *orig_gcspr_el0, *pivot_gcspr_el0;
+ size_t cap_index;
+ cap_index = (variant->stack_size / sizeof(unsigned long));
+
+ /* Skip over the stack terminator and point at the cap */
+ switch (variant->flags & (SHADOW_STACK_SET_MARKER | SHADOW_STACK_SET_TOKEN)) {
+ case SHADOW_STACK_SET_MARKER | SHADOW_STACK_SET_TOKEN:
+ cap_index -= 2;
+ break;
+ case SHADOW_STACK_SET_TOKEN:
+ cap_index -= 1;
+ break;
+ case SHADOW_STACK_SET_MARKER:
+ case 0:
+ /* No cap, no test but we need to SEGV to avoid a false fail */
+ orig_gcspr_el0 = get_gcspr();
+ *orig_gcspr_el0 = 0;
+ return;
+ }
+ pivot_gcspr_el0 = &self->stack[cap_index];
+
+ /* Pivot to the new GCS */
+ ksft_print_msg("Pivoting to %p from %p, target has value 0x%lx\n",
+ pivot_gcspr_el0, get_gcspr(),
+ *pivot_gcspr_el0);
+ gcsss1(pivot_gcspr_el0);
+ orig_gcspr_el0 = gcsss2();
+ ksft_print_msg("Pivoted to %p from %p, target has value 0x%lx\n",
+ get_gcspr(), orig_gcspr_el0,
+ *pivot_gcspr_el0);
+
+ ksft_print_msg("Pivoted, GCSPR_EL0 now %p\n", get_gcspr());
+
+ /* New GCS must be in the new buffer */
+ ASSERT_TRUE((unsigned long)get_gcspr() > (unsigned long)self->stack);
+ ASSERT_TRUE((unsigned long)get_gcspr() <=
+ (unsigned long)self->stack + variant->stack_size);
+
+ /* Now try to recurse, we should fault doing this. */
+ ksft_print_msg("Recursing %zu levels...\n", cap_index + 1);
+ gcs_recurse(cap_index + 1);
+ ksft_print_msg("...done\n");
+
+ /* Clean up properly to try to guard against spurious passes. */
+ gcsss1(orig_gcspr_el0);
+ pivot_gcspr_el0 = gcsss2();
+ ksft_print_msg("Pivoted back to GCSPR_EL0 %p\n", get_gcspr());
+}
+
+FIXTURE(map_invalid_gcs)
+{
+};
+
+FIXTURE_VARIANT(map_invalid_gcs)
+{
+ size_t stack_size;
+};
+
+FIXTURE_SETUP(map_invalid_gcs)
+{
+}
+
+FIXTURE_TEARDOWN(map_invalid_gcs)
+{
+}
+
+/* GCS must be at least 16 bytes */
+FIXTURE_VARIANT_ADD(map_invalid_gcs, too_small)
+{
+ .stack_size = 8,
+};
+
+/* GCS size must be 8 byte aligned */
+FIXTURE_VARIANT_ADD(map_invalid_gcs, unaligned_1) { .stack_size = 1024 + 1 };
+FIXTURE_VARIANT_ADD(map_invalid_gcs, unaligned_2) { .stack_size = 1024 + 2 };
+FIXTURE_VARIANT_ADD(map_invalid_gcs, unaligned_3) { .stack_size = 1024 + 3 };
+FIXTURE_VARIANT_ADD(map_invalid_gcs, unaligned_4) { .stack_size = 1024 + 4 };
+FIXTURE_VARIANT_ADD(map_invalid_gcs, unaligned_5) { .stack_size = 1024 + 5 };
+FIXTURE_VARIANT_ADD(map_invalid_gcs, unaligned_6) { .stack_size = 1024 + 6 };
+FIXTURE_VARIANT_ADD(map_invalid_gcs, unaligned_7) { .stack_size = 1024 + 7 };
+
+TEST_F(map_invalid_gcs, do_map)
+{
+ void *stack;
+
+ stack = (void *)syscall(__NR_map_shadow_stack, 0,
+ variant->stack_size, 0);
+ if (stack != MAP_FAILED)
+ munmap(stack, variant->stack_size);
+ ASSERT_TRUE(stack == MAP_FAILED);
+}
+
+FIXTURE(invalid_mprotect)
+{
+ unsigned long *stack;
+ size_t stack_size;
+};
+
+FIXTURE_VARIANT(invalid_mprotect)
+{
+ unsigned long flags;
+};
+
+FIXTURE_SETUP(invalid_mprotect)
+{
+ self->stack_size = sysconf(_SC_PAGE_SIZE);
+ self->stack = (void *)syscall(__NR_map_shadow_stack, 0,
+ self->stack_size, 0);
+ ASSERT_FALSE(self->stack == MAP_FAILED);
+ ksft_print_msg("Allocated stack from %p-%p\n", self->stack,
+ (void *)((unsigned long)self->stack + self->stack_size));
+}
+
+FIXTURE_TEARDOWN(invalid_mprotect)
+{
+ int ret;
+
+ if (self->stack != MAP_FAILED) {
+ ret = munmap(self->stack, self->stack_size);
+ ASSERT_EQ(ret, 0);
+ }
+}
+
+FIXTURE_VARIANT_ADD(invalid_mprotect, exec)
+{
+ .flags = PROT_EXEC,
+};
+
+FIXTURE_VARIANT_ADD(invalid_mprotect, bti)
+{
+ .flags = PROT_BTI,
+};
+
+FIXTURE_VARIANT_ADD(invalid_mprotect, exec_bti)
+{
+ .flags = PROT_EXEC | PROT_BTI,
+};
+
+TEST_F(invalid_mprotect, do_map)
+{
+ int ret;
+
+ ret = mprotect(self->stack, self->stack_size, variant->flags);
+ ASSERT_EQ(ret, -1);
+}
+
+TEST_F(invalid_mprotect, do_map_read)
+{
+ int ret;
+
+ ret = mprotect(self->stack, self->stack_size,
+ variant->flags | PROT_READ);
+ ASSERT_EQ(ret, -1);
+}
+
+int main(int argc, char **argv)
+{
+ unsigned long gcs_mode;
+ int ret;
+
+ if (!(getauxval(AT_HWCAP2) & HWCAP2_GCS))
+ ksft_exit_skip("SKIP GCS not supported\n");
+
+ /*
+ * Force shadow stacks on, our tests *should* be fine with or
+ * without libc support and with or without this having ended
+ * up tagged for GCS and enabled by the dynamic linker. We
+ * can't use the libc prctl() function since we can't return
+ * from enabling the stack.
+ */
+ ret = my_syscall2(__NR_prctl, PR_GET_SHADOW_STACK_STATUS, &gcs_mode);
+ if (ret) {
+ ksft_print_msg("Failed to read GCS state: %d\n", ret);
+ return EXIT_FAILURE;
+ }
+
+ if (!(gcs_mode & PR_SHADOW_STACK_ENABLE)) {
+ gcs_mode = PR_SHADOW_STACK_ENABLE;
+ ret = my_syscall2(__NR_prctl, PR_SET_SHADOW_STACK_STATUS,
+ gcs_mode);
+ if (ret) {
+ ksft_print_msg("Failed to configure GCS: %d\n", ret);
+ return EXIT_FAILURE;
+ }
+ }
+
+ /* Avoid returning in case libc doesn't understand GCS */
+ exit(test_harness_run(argc, argv));
+}
--
2.30.2
When invoking a signal handler we use the GCS configuration and stack
for the current thread.
Since we implement signal return by calling the signal handler with a
return address set up pointing to a trampoline in the vDSO, we need to
also configure any active GCS for this by pushing a frame for the
trampoline onto the GCS. If we do not do this then signal return will
generate a GCS protection fault.
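To illustrate why the trampoline frame is needed, here is a toy userspace
model (not kernel code) of the BL/RET behaviour described in the cover
letter: BL pushes the return address onto the GCS, and RET pops the top
entry and compares it with LR. Signal delivery points LR at the
trampoline without executing a BL, so unless a matching entry is pushed
the handler's RET sees a mismatch. The struct and function names are
hypothetical, and the model deliberately ignores the signal cap.

```c
/*
 * Toy userspace model (not kernel code) of why signal delivery must
 * push the trampoline address onto the GCS. All names are hypothetical.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_GCS_SLOTS 16

struct toy_gcs {
	uint64_t slot[TOY_GCS_SLOTS];
	size_t top;	/* index of the newest entry; the stack grows down */
};

/* Push an entry, as BL does with its return address. */
static void toy_push(struct toy_gcs *g, uint64_t val)
{
	assert(g->top > 0);
	g->slot[--g->top] = val;
}

/* RET: pop the newest GCS entry and compare it with LR. */
static bool toy_ret_ok(struct toy_gcs *g, uint64_t lr)
{
	assert(g->top < TOY_GCS_SLOTS);
	return g->slot[g->top++] == lr;	/* false models a GCS fault */
}

/*
 * Model one signal delivery followed by the handler returning.
 * Returns true if the handler's RET to the trampoline is accepted.
 */
static bool toy_deliver_signal(bool push_tramp, uint64_t caller_ret,
			       uint64_t sigtramp)
{
	struct toy_gcs g = { .top = TOY_GCS_SLOTS };
	uint64_t lr;

	/* A BL in the interrupted code pushed its return address. */
	toy_push(&g, caller_ret);

	/* Delivery points LR at the trampoline without executing a BL... */
	lr = sigtramp;
	/* ...so the kernel has to push the matching GCS entry itself. */
	if (push_tramp)
		toy_push(&g, sigtramp);

	/* The handler returns with RET, landing in the trampoline. */
	return toy_ret_ok(&g, lr);
}
```

For example, toy_deliver_signal(true, 0x1000, 0x2000) returns true, while
toy_deliver_signal(false, 0x1000, 0x2000) returns false because the RET
finds the interrupted code's return address on top of the GCS.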
In order to guard against attempts to bypass GCS protections via signal
return, we only allow returning with GCSPR_EL0 pointing to an address
where it was previously preempted by a signal. We do this by pushing a
cap onto the GCS. This takes the form of an architectural GCS cap token
with the top bit set and a token type of 0, which we add on signal entry
and validate and pop off on signal return. The combination of the top
bit being set and the token type means that this can't be interpreted as
a valid token or address.
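A minimal userspace sketch of encoding and checking the signal cap,
mirroring GCS_SIGNAL_CAP() and gcs_signal_cap_valid() in this patch. It
assumes that the architectural cap layout behind GCS_CAP_ADDR_MASK and
GCS_CAP_TOKEN() is address bits 63:12 plus token type bits 11:0; the
SKETCH_* names are local to the example.

```c
/*
 * Userspace sketch of the signal cap token. The field layout is an
 * assumption (address in bits 63:12, token type in bits 11:0).
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CAP_ADDR_MASK	0xfffffffffffff000ULL	/* bits 63:12 (assumed) */
#define SKETCH_CAP_TOKEN_MASK	0x0000000000000fffULL	/* bits 11:0 (assumed) */
#define SKETCH_SIGNAL_CAP_FLAG	(1ULL << 63)

/* Value stored at 'addr' on signal entry: token type 0, top bit set. */
static uint64_t sketch_signal_cap(uint64_t addr)
{
	return (addr & SKETCH_CAP_ADDR_MASK) | SKETCH_SIGNAL_CAP_FLAG;
}

/* Check the value 'val' found at 'addr' on signal return. */
static bool sketch_signal_cap_valid(uint64_t addr, uint64_t val)
{
	/* The top bit marks a cap created by signal delivery. */
	if (!(val & SKETCH_SIGNAL_CAP_FLAG))
		return false;

	val &= ~SKETCH_SIGNAL_CAP_FLAG;

	/* The token type field must be 0. */
	if (val & SKETCH_CAP_TOKEN_MASK)
		return false;

	/* The cap must record the address it was stored at. */
	return (addr & SKETCH_CAP_ADDR_MASK) == (val & SKETCH_CAP_ADDR_MASK);
}
```

As in the kernel check, only bits 63:12 of the address are compared; what
stops a valid cap being reused is that sigreturn overwrites it with 0
once it has been consumed.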
Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/asm/gcs.h | 1 +
arch/arm64/kernel/signal.c | 134 +++++++++++++++++++++++++++++++++++++++++--
arch/arm64/mm/gcs.c | 1 +
3 files changed, 131 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/include/asm/gcs.h b/arch/arm64/include/asm/gcs.h
index 48c97e63e56a..f50660603ecf 100644
--- a/arch/arm64/include/asm/gcs.h
+++ b/arch/arm64/include/asm/gcs.h
@@ -9,6 +9,7 @@
#include <asm/uaccess.h>
struct kernel_clone_args;
+struct ksignal;
static inline void gcsb_dsync(void)
{
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 0e8beb3349ea..1cca646a7479 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -25,6 +25,7 @@
#include <asm/elf.h>
#include <asm/exception.h>
#include <asm/cacheflush.h>
+#include <asm/gcs.h>
#include <asm/ucontext.h>
#include <asm/unistd.h>
#include <asm/fpsimd.h>
@@ -34,6 +35,37 @@
#include <asm/traps.h>
#include <asm/vdso.h>
+#ifdef CONFIG_ARM64_GCS
+/* Extra bit set in the address distinguishing a signal cap token. */
+#define GCS_SIGNAL_CAP_FLAG BIT(63)
+
+#define GCS_SIGNAL_CAP(addr) ((((unsigned long)(addr)) & GCS_CAP_ADDR_MASK) | \
+ GCS_SIGNAL_CAP_FLAG)
+
+static bool gcs_signal_cap_valid(u64 addr, u64 val)
+{
+ /*
+ * The top bit should be set: this is an invalid address for
+ * EL0 and will only be set for caps created by signals.
+ */
+ if (!(val & GCS_SIGNAL_CAP_FLAG))
+ return false;
+
+ /* The rest should be a standard architectural cap token. */
+ val &= ~GCS_SIGNAL_CAP_FLAG;
+
+ /* The cap must not have a token set */
+ if (GCS_CAP_TOKEN(val) != 0)
+ return false;
+
+ /* The cap must store the VA the cap was stored at */
+ if (GCS_CAP_ADDR(addr) != GCS_CAP_ADDR(val))
+ return false;
+
+ return true;
+}
+#endif
+
/*
* Do a signal return; undo the signal stack. These are aligned to 128-bit.
*/
@@ -815,6 +847,50 @@ static int restore_sigframe(struct pt_regs *regs,
return err;
}
+#ifdef CONFIG_ARM64_GCS
+static int gcs_restore_signal(void)
+{
+ u64 gcspr_el0, cap;
+ int ret;
+
+ if (!system_supports_gcs())
+ return 0;
+
+ if (!(current->thread.gcs_el0_mode & PR_SHADOW_STACK_ENABLE))
+ return 0;
+
+ gcspr_el0 = read_sysreg_s(SYS_GCSPR_EL0);
+
+ /*
+ * GCSPR_EL0 should be pointing at a capped GCS, read the cap...
+ */
+ gcsb_dsync();
+ ret = copy_from_user(&cap, (__user void*)gcspr_el0, sizeof(cap));
+ if (ret)
+ return -EFAULT;
+
+ /*
+ * ...then check that the cap is valid for this GCS before
+ * restoring it.
+ */
+ if (!gcs_signal_cap_valid(gcspr_el0, cap))
+ return -EINVAL;
+
+ /* Invalidate the token to prevent reuse */
+ put_user_gcs(0, (__user void*)gcspr_el0, &ret);
+ if (ret != 0)
+ return -EFAULT;
+
+ current->thread.gcspr_el0 = gcspr_el0 + sizeof(cap);
+ write_sysreg_s(current->thread.gcspr_el0, SYS_GCSPR_EL0);
+
+ return 0;
+}
+
+#else
+static int gcs_restore_signal(void) { return 0; }
+#endif
+
SYSCALL_DEFINE0(rt_sigreturn)
{
struct pt_regs *regs = current_pt_regs();
@@ -841,6 +917,9 @@ SYSCALL_DEFINE0(rt_sigreturn)
if (restore_altstack(&frame->uc.uc_stack))
goto badframe;
+ if (gcs_restore_signal())
+ goto badframe;
+
return regs->regs[0];
badframe:
@@ -1071,7 +1150,50 @@ static int get_sigframe(struct rt_sigframe_user_layout *user,
return 0;
}
-static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
+#ifdef CONFIG_ARM64_GCS
+
+static int gcs_signal_entry(__sigrestore_t sigtramp, struct ksignal *ksig)
+{
+ unsigned long __user *gcspr_el0;
+ int ret = 0;
+
+ if (!system_supports_gcs())
+ return 0;
+
+ if (!task_gcs_el0_enabled(current))
+ return 0;
+
+ /*
+ * We are entering a signal handler, current register state is
+ * active.
+ */
+ gcspr_el0 = (unsigned long __user *)read_sysreg_s(SYS_GCSPR_EL0);
+
+ /*
+ * Push a cap and the GCS entry for the trampoline onto the GCS.
+ */
+ put_user_gcs((unsigned long)sigtramp, gcspr_el0 - 2, &ret);
+ put_user_gcs(GCS_SIGNAL_CAP(gcspr_el0 - 1), gcspr_el0 - 1, &ret);
+ if (ret != 0)
+ return ret;
+
+ gcsb_dsync();
+
+ gcspr_el0 -= 2;
+ write_sysreg_s((unsigned long)gcspr_el0, SYS_GCSPR_EL0);
+
+ return 0;
+}
+#else
+
+static int gcs_signal_entry(__sigrestore_t sigtramp, struct ksignal *ksig)
+{
+ return 0;
+}
+
+#endif
+
+static int setup_return(struct pt_regs *regs, struct ksignal *ksig,
struct rt_sigframe_user_layout *user, int usig)
{
__sigrestore_t sigtramp;
@@ -1079,7 +1201,7 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
regs->regs[0] = usig;
regs->sp = (unsigned long)user->sigframe;
regs->regs[29] = (unsigned long)&user->next_frame->fp;
- regs->pc = (unsigned long)ka->sa.sa_handler;
+ regs->pc = (unsigned long)ksig->ka.sa.sa_handler;
/*
* Signal delivery is a (wacky) indirect function call in
@@ -1119,12 +1241,14 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
sme_smstop();
}
- if (ka->sa.sa_flags & SA_RESTORER)
- sigtramp = ka->sa.sa_restorer;
+ if (ksig->ka.sa.sa_flags & SA_RESTORER)
+ sigtramp = ksig->ka.sa.sa_restorer;
else
sigtramp = VDSO_SYMBOL(current->mm->context.vdso, sigtramp);
regs->regs[30] = (unsigned long)sigtramp;
+
+ return gcs_signal_entry(sigtramp, ksig);
}
static int setup_rt_frame(int usig, struct ksignal *ksig, sigset_t *set,
@@ -1147,7 +1271,7 @@ static int setup_rt_frame(int usig, struct ksignal *ksig, sigset_t *set,
err |= __save_altstack(&frame->uc.uc_stack, regs->sp);
err |= setup_sigframe(&user, regs, set);
if (err == 0) {
- setup_return(regs, &ksig->ka, &user, usig);
+ err = setup_return(regs, ksig, &user, usig);
if (ksig->ka.sa.sa_flags & SA_SIGINFO) {
err |= copy_siginfo_to_user(&frame->info, &ksig->info);
regs->regs[1] = (unsigned long)&frame->info;
diff --git a/arch/arm64/mm/gcs.c b/arch/arm64/mm/gcs.c
index f34821d98d85..2d8e54316fe2 100644
--- a/arch/arm64/mm/gcs.c
+++ b/arch/arm64/mm/gcs.c
@@ -6,6 +6,7 @@
#include <linux/types.h>
#include <asm/cpufeature.h>
+#include <asm/gcs.h>
#include <asm/page.h>
static unsigned long alloc_gcs(unsigned long addr, unsigned long size,
--
2.30.2
Do some testing of the signal handling for GCS, checking that a GCS
frame has the expected information in it and that the expected signals
are delivered when invalid operations are performed.
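The fault tests follow a common shape: perform an invalid operation, then
check which signal arrives. The portable sketch below shows that pattern
using a PROT_READ anonymous mapping as a stand-in for a GCS page, so the
expected result is SIGSEGV with si_code SEGV_ACCERR; the GCS tests in
this patch instead expect SIGSEGV, with SEGV_CPERR for an invalid GCS
operation. The function names are hypothetical and this is not how the
signal test framework itself is implemented.

```c
/*
 * Portable sketch: a PROT_READ page stands in for a GCS page, so the
 * expected si_code is SEGV_ACCERR rather than anything GCS specific.
 */
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;
static volatile sig_atomic_t fault_signo;
static volatile sig_atomic_t fault_code;

static void fault_handler(int sig, siginfo_t *si, void *uc)
{
	(void)uc;
	fault_signo = sig;
	fault_code = si->si_code;
	siglongjmp(fault_env, 1);	/* skip the faulting store */
}

/* Returns 0 if a write to a read-only page raised SIGSEGV/SEGV_ACCERR. */
static int check_write_fault(void)
{
	struct sigaction sa, old;
	long page_size = sysconf(_SC_PAGE_SIZE);
	volatile unsigned long *page;
	int ret = -1;

	page = mmap(NULL, page_size, PROT_READ,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (page == MAP_FAILED)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = fault_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGSEGV, &sa, &old);

	fault_signo = 0;
	if (sigsetjmp(fault_env, 1) == 0)
		page[0] = 1;	/* a normal write should fault */

	if (fault_signo == SIGSEGV && fault_code == SEGV_ACCERR)
		ret = 0;

	sigaction(SIGSEGV, &old, NULL);
	munmap((void *)page, page_size);
	return ret;
}
```

Using siglongjmp() keeps the sketch self-contained; the signal test
framework in this patch instead declares the expected signal in struct
tdescr (sig_ok, sig_ok_code) and lets the harness check it.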
Signed-off-by: Mark Brown <[email protected]>
---
tools/testing/selftests/arm64/signal/.gitignore | 1 +
.../selftests/arm64/signal/test_signals_utils.h | 10 +++
.../arm64/signal/testcases/gcs_exception_fault.c | 62 +++++++++++++++
.../selftests/arm64/signal/testcases/gcs_frame.c | 88 ++++++++++++++++++++++
.../arm64/signal/testcases/gcs_write_fault.c | 67 ++++++++++++++++
5 files changed, 228 insertions(+)
diff --git a/tools/testing/selftests/arm64/signal/.gitignore b/tools/testing/selftests/arm64/signal/.gitignore
index 839e3a252629..26de12918890 100644
--- a/tools/testing/selftests/arm64/signal/.gitignore
+++ b/tools/testing/selftests/arm64/signal/.gitignore
@@ -1,6 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only
mangle_*
fake_sigreturn_*
+gcs_*
sme_*
ssve_*
sve_*
diff --git a/tools/testing/selftests/arm64/signal/test_signals_utils.h b/tools/testing/selftests/arm64/signal/test_signals_utils.h
index 1e80808ee105..36fc12b3cd60 100644
--- a/tools/testing/selftests/arm64/signal/test_signals_utils.h
+++ b/tools/testing/selftests/arm64/signal/test_signals_utils.h
@@ -6,6 +6,7 @@
#include <assert.h>
#include <stdio.h>
+#include <stdint.h>
#include <string.h>
#include <linux/compiler.h>
@@ -47,6 +48,15 @@ void test_result(struct tdescr *td);
_arg1; \
})
+static inline __attribute__((always_inline)) uint64_t get_gcspr_el0(void)
+{
+ uint64_t val;
+
+ asm volatile("mrs %0, S3_3_C2_C5_1" : "=r" (val));
+
+ return val;
+}
+
static inline bool feats_ok(struct tdescr *td)
{
if (td->feats_incompatible & td->feats_supported)
diff --git a/tools/testing/selftests/arm64/signal/testcases/gcs_exception_fault.c b/tools/testing/selftests/arm64/signal/testcases/gcs_exception_fault.c
new file mode 100644
index 000000000000..6228448b2ae7
--- /dev/null
+++ b/tools/testing/selftests/arm64/signal/testcases/gcs_exception_fault.c
@@ -0,0 +1,62 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023 ARM Limited
+ */
+
+#include <errno.h>
+#include <signal.h>
+#include <unistd.h>
+
+#include <sys/mman.h>
+#include <sys/prctl.h>
+
+#include "test_signals_utils.h"
+#include "testcases.h"
+
+/*
+ * We should get this from asm/siginfo.h but the testsuite is being
+ * clever with redefining siginfo_t.
+ */
+#ifndef SEGV_CPERR
+#define SEGV_CPERR 10
+#endif
+
+static inline void gcsss1(uint64_t Xt)
+{
+ asm volatile (
+ "sys #3, C7, C7, #2, %0\n"
+ :
+ : "rZ" (Xt)
+ : "memory");
+}
+
+static int gcs_op_fault_trigger(struct tdescr *td)
+{
+ /*
+ * The slot below our current GCS should be in a valid GCS but
+ * must not have a valid cap in it.
+ */
+ gcsss1(get_gcspr_el0() - 8);
+
+ return 0;
+}
+
+static int gcs_op_fault_signal(struct tdescr *td, siginfo_t *si,
+ ucontext_t *uc)
+{
+ ASSERT_GOOD_CONTEXT(uc);
+
+ return 1;
+}
+
+struct tdescr tde = {
+ .name = "Invalid GCS operation",
+ .descr = "An invalid GCS operation generates the expected signal",
+ .feats_required = FEAT_GCS,
+ .timeout = 3,
+ .sig_ok = SIGSEGV,
+ .sig_ok_code = SEGV_CPERR,
+ .sanity_disabled = true,
+ .trigger = gcs_op_fault_trigger,
+ .run = gcs_op_fault_signal,
+};
diff --git a/tools/testing/selftests/arm64/signal/testcases/gcs_frame.c b/tools/testing/selftests/arm64/signal/testcases/gcs_frame.c
new file mode 100644
index 000000000000..b405d82321da
--- /dev/null
+++ b/tools/testing/selftests/arm64/signal/testcases/gcs_frame.c
@@ -0,0 +1,88 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023 ARM Limited
+ */
+
+#include <signal.h>
+#include <ucontext.h>
+#include <sys/prctl.h>
+
+#include "test_signals_utils.h"
+#include "testcases.h"
+
+static union {
+ ucontext_t uc;
+ char buf[1024 * 64];
+} context;
+
+static int gcs_regs(struct tdescr *td, siginfo_t *si, ucontext_t *uc)
+{
+ size_t offset;
+ struct _aarch64_ctx *head = GET_BUF_RESV_HEAD(context);
+ struct gcs_context *gcs;
+ unsigned long expected, gcspr;
+ uint64_t *u64_val;
+ int ret;
+
+ ret = prctl(PR_GET_SHADOW_STACK_STATUS, &expected, 0, 0, 0);
+ if (ret != 0) {
+ fprintf(stderr, "Unable to query GCS status\n");
+ return 1;
+ }
+
+ /* We expect a cap to be added to the GCS in the signal frame */
+ gcspr = get_gcspr_el0();
+ gcspr -= 8;
+ fprintf(stderr, "Expecting GCSPR_EL0 %lx\n", gcspr);
+
+ if (!get_current_context(td, &context.uc, sizeof(context))) {
+ fprintf(stderr, "Failed getting context\n");
+ return 1;
+ }
+
+ /* Ensure that the signal restore token was consumed */
+ u64_val = (uint64_t *)get_gcspr_el0() + 1;
+ if (*u64_val) {
+ fprintf(stderr, "GCS value at %p is %lx not 0\n",
+ u64_val, *u64_val);
+ return 1;
+ }
+
+ fprintf(stderr, "Got context\n");
+
+ head = get_header(head, GCS_MAGIC, GET_BUF_RESV_SIZE(context),
+ &offset);
+ if (!head) {
+ fprintf(stderr, "No GCS context\n");
+ return 1;
+ }
+
+ gcs = (struct gcs_context *)head;
+
+ /* Basic size validation is done in get_current_context() */
+
+ if (gcs->features_enabled != expected) {
+ fprintf(stderr, "Features enabled %llx but expected %lx\n",
+ gcs->features_enabled, expected);
+ return 1;
+ }
+
+ if (gcs->gcspr != gcspr) {
+ fprintf(stderr, "Got GCSPR %llx but expected %lx\n",
+ gcs->gcspr, gcspr);
+ return 1;
+ }
+
+ fprintf(stderr, "GCS context validated\n");
+ td->pass = 1;
+
+ return 0;
+}
+
+struct tdescr tde = {
+ .name = "GCS basics",
+ .descr = "Validate a GCS signal context",
+ .feats_required = FEAT_GCS,
+ .timeout = 3,
+ .run = gcs_regs,
+};
diff --git a/tools/testing/selftests/arm64/signal/testcases/gcs_write_fault.c b/tools/testing/selftests/arm64/signal/testcases/gcs_write_fault.c
new file mode 100644
index 000000000000..faeabb18c4b2
--- /dev/null
+++ b/tools/testing/selftests/arm64/signal/testcases/gcs_write_fault.c
@@ -0,0 +1,67 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023 ARM Limited
+ */
+
+#include <errno.h>
+#include <signal.h>
+#include <unistd.h>
+
+#include <sys/mman.h>
+#include <sys/prctl.h>
+
+#include "test_signals_utils.h"
+#include "testcases.h"
+
+static uint64_t *gcs_page;
+
+#ifndef __NR_map_shadow_stack
+#define __NR_map_shadow_stack 453
+#endif
+
+static bool alloc_gcs(struct tdescr *td)
+{
+ long page_size = sysconf(_SC_PAGE_SIZE);
+
+ gcs_page = (void *)syscall(__NR_map_shadow_stack, 0,
+ page_size, 0);
+ if (gcs_page == MAP_FAILED) {
+ fprintf(stderr, "Failed to map %ld byte GCS: %d\n",
+ page_size, errno);
+ return false;
+ }
+
+ return true;
+}
+
+static int gcs_write_fault_trigger(struct tdescr *td)
+{
+ /* Verify that the page is readable (ie, not completely unmapped) */
+ fprintf(stderr, "Read value 0x%lx\n", gcs_page[0]);
+
+ /* A regular write should trigger a fault */
+ gcs_page[0] = EINVAL;
+
+ return 0;
+}
+
+static int gcs_write_fault_signal(struct tdescr *td, siginfo_t *si,
+ ucontext_t *uc)
+{
+ ASSERT_GOOD_CONTEXT(uc);
+
+ return 1;
+}
+
+
+struct tdescr tde = {
+ .name = "GCS write fault",
+ .descr = "Normal writes to a GCS segfault",
+ .feats_required = FEAT_GCS,
+ .timeout = 3,
+ .sig_ok = SIGSEGV,
+ .sanity_disabled = true,
+ .init = alloc_gcs,
+ .trigger = gcs_write_fault_trigger,
+ .run = gcs_write_fault_signal,
+};
--
2.30.2
Verify that we can lock individual GCS mode bits, that other modes
aren't affected and as a side effect also that every combination of
modes can be enabled.
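The rule these tests check can be modelled in a few lines: a locked bit
may not change value, while unlocked bits can still be set and cleared.
This is a toy model under stated assumptions, not the kernel
implementation; the TOY_* bit values are assumed to mirror
PR_SHADOW_STACK_ENABLE, PR_SHADOW_STACK_WRITE and PR_SHADOW_STACK_PUSH,
and the real interface is the PR_{SET,GET,LOCK}_SHADOW_STACK_STATUS
prctl() calls.

```c
/* Toy model of shadow stack mode locking; all names are hypothetical. */
#include <assert.h>
#include <errno.h>

#define TOY_ENABLE	(1UL << 0)	/* assumed to mirror PR_SHADOW_STACK_ENABLE */
#define TOY_WRITE	(1UL << 1)	/* assumed to mirror PR_SHADOW_STACK_WRITE */
#define TOY_PUSH	(1UL << 2)	/* assumed to mirror PR_SHADOW_STACK_PUSH */
#define TOY_ALL_MODES	(TOY_ENABLE | TOY_WRITE | TOY_PUSH)

struct toy_task {
	unsigned long mode;
	unsigned long locked;
};

/* Locking is cumulative and no bit is rejected (cf. lock_all_modes). */
static int toy_lock(struct toy_task *t, unsigned long bits)
{
	t->locked |= bits;
	return 0;
}

/* Setting fails with -EBUSY if it would flip any locked bit. */
static int toy_set(struct toy_task *t, unsigned long mode)
{
	assert((mode & ~TOY_ALL_MODES) == 0);	/* model only knows 3 bits */

	if ((t->mode ^ mode) & t->locked)
		return -EBUSY;
	t->mode = mode;
	return 0;
}
```

For instance, after toy_set(&t, TOY_ENABLE) and toy_lock(&t, TOY_ENABLE),
toy_set(&t, 0) returns -EBUSY (as in enable_lock_disable), while
toy_set(&t, TOY_ENABLE | TOY_PUSH) still succeeds (as in
lock_enable_disable_others).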
Normally the inability to re-enable GCS after disabling it would be an
issue for testing, but fortunately the kselftest_harness runs each test
within a fork()ed child. This can be inconvenient for some kinds of
testing, but here it means that each test runs in a separate process and
therefore won't be affected by other tests in the suite.
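The isolation this relies on can be shown with plain fork(): state
changed in the child stays in the child. In this sketch a global
variable is a hypothetical stand-in for the GCS mode, since the real
mode can only be exercised on GCS-capable arm64 systems, and
run_isolated() is an illustrative helper rather than part of
kselftest_harness.h.

```c
/* Sketch of fork()-based test isolation; names are hypothetical. */
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static unsigned long toy_mode;	/* stand-in for the GCS mode */

/*
 * Run fn() in a child process and return its exit status, or -1 on
 * error. Any change fn() makes to toy_mode is visible only in the child.
 */
static int run_isolated(int (*fn)(void))
{
	int status;
	pid_t pid;

	assert(fn != NULL);

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(fn());

	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}

/* Simulates a test that enables and locks a mode in its own process. */
static int child_locks_mode(void)
{
	toy_mode = 0x1;
	return toy_mode == 0x1 ? 0 : 1;
}
```

After run_isolated(child_locks_mode) returns 0 in the parent, toy_mode
still holds whatever value it had before the call.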
Once we get toolchains with support for enabling GCS by default, we will
need to take care not to do that in the build system, but there are no
such toolchains yet so it is not currently an issue.
Signed-off-by: Mark Brown <[email protected]>
---
tools/testing/selftests/arm64/gcs/.gitignore | 1 +
tools/testing/selftests/arm64/gcs/Makefile | 2 +-
tools/testing/selftests/arm64/gcs/gcs-locking.c | 200 ++++++++++++++++++++++++
3 files changed, 202 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/gcs/.gitignore b/tools/testing/selftests/arm64/gcs/.gitignore
index 5810c4a163d4..0c86f53f68ad 100644
--- a/tools/testing/selftests/arm64/gcs/.gitignore
+++ b/tools/testing/selftests/arm64/gcs/.gitignore
@@ -1,2 +1,3 @@
basic-gcs
libc-gcs
+gcs-locking
diff --git a/tools/testing/selftests/arm64/gcs/Makefile b/tools/testing/selftests/arm64/gcs/Makefile
index a8fdf21e9a47..2173d6275956 100644
--- a/tools/testing/selftests/arm64/gcs/Makefile
+++ b/tools/testing/selftests/arm64/gcs/Makefile
@@ -6,7 +6,7 @@
# nolibc.
#
-TEST_GEN_PROGS := basic-gcs libc-gcs
+TEST_GEN_PROGS := basic-gcs libc-gcs gcs-locking
LDLIBS+=-lpthread
diff --git a/tools/testing/selftests/arm64/gcs/gcs-locking.c b/tools/testing/selftests/arm64/gcs/gcs-locking.c
new file mode 100644
index 000000000000..f6a73254317e
--- /dev/null
+++ b/tools/testing/selftests/arm64/gcs/gcs-locking.c
@@ -0,0 +1,200 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 ARM Limited.
+ *
+ * Tests for GCS mode locking. These tests rely on both having GCS
+ * unconfigured on entry and on the kselftest harness running each
+ * test in a fork()ed process which will have its own mode.
+ */
+
+#include <limits.h>
+
+#include <sys/auxv.h>
+#include <sys/prctl.h>
+
+#include <asm/hwcap.h>
+
+#include "kselftest_harness.h"
+
+#include "gcs-util.h"
+
+#define my_syscall2(num, arg1, arg2) \
+({ \
+ register long _num __asm__ ("x8") = (num); \
+ register long _arg1 __asm__ ("x0") = (long)(arg1); \
+ register long _arg2 __asm__ ("x1") = (long)(arg2); \
+ register long _arg3 __asm__ ("x2") = 0; \
+ register long _arg4 __asm__ ("x3") = 0; \
+ register long _arg5 __asm__ ("x4") = 0; \
+ \
+ __asm__ volatile ( \
+ "svc #0\n" \
+ : "=r"(_arg1) \
+ : "r"(_arg1), "r"(_arg2), \
+ "r"(_arg3), "r"(_arg4), \
+ "r"(_arg5), "r"(_num) \
+ : "memory", "cc" \
+ ); \
+ _arg1; \
+})
+
+/* No mode bits are rejected for locking */
+TEST(lock_all_modes)
+{
+ int ret;
+
+ ret = prctl(PR_LOCK_SHADOW_STACK_STATUS, ULONG_MAX, 0, 0, 0);
+ ASSERT_EQ(ret, 0);
+}
+
+FIXTURE(valid_modes)
+{
+};
+
+FIXTURE_VARIANT(valid_modes)
+{
+ unsigned long mode;
+};
+
+FIXTURE_VARIANT_ADD(valid_modes, enable)
+{
+ .mode = PR_SHADOW_STACK_ENABLE,
+};
+
+FIXTURE_VARIANT_ADD(valid_modes, enable_write)
+{
+ .mode = PR_SHADOW_STACK_ENABLE | PR_SHADOW_STACK_WRITE,
+};
+
+FIXTURE_VARIANT_ADD(valid_modes, enable_push)
+{
+ .mode = PR_SHADOW_STACK_ENABLE | PR_SHADOW_STACK_PUSH,
+};
+
+FIXTURE_VARIANT_ADD(valid_modes, enable_write_push)
+{
+ .mode = PR_SHADOW_STACK_ENABLE | PR_SHADOW_STACK_WRITE |
+ PR_SHADOW_STACK_PUSH,
+};
+
+FIXTURE_SETUP(valid_modes)
+{
+}
+
+FIXTURE_TEARDOWN(valid_modes)
+{
+}
+
+/* We can set the mode at all */
+TEST_F(valid_modes, set)
+{
+ int ret;
+
+ ret = my_syscall2(__NR_prctl, PR_SET_SHADOW_STACK_STATUS,
+ variant->mode);
+ ASSERT_EQ(ret, 0);
+
+ _exit(0);
+}
+
+/* Enabling, locking then disabling is rejected */
+TEST_F(valid_modes, enable_lock_disable)
+{
+ unsigned long mode;
+ int ret;
+
+ ret = my_syscall2(__NR_prctl, PR_SET_SHADOW_STACK_STATUS,
+ variant->mode);
+ ASSERT_EQ(ret, 0);
+
+ ret = prctl(PR_GET_SHADOW_STACK_STATUS, &mode, 0, 0, 0);
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(mode, variant->mode);
+
+ ret = prctl(PR_LOCK_SHADOW_STACK_STATUS, variant->mode, 0, 0, 0);
+ ASSERT_EQ(ret, 0);
+
+ ret = my_syscall2(__NR_prctl, PR_SET_SHADOW_STACK_STATUS, 0);
+ ASSERT_EQ(ret, -EBUSY);
+
+ _exit(0);
+}
+
+/* Locking then enabling is rejected */
+TEST_F(valid_modes, lock_enable)
+{
+ unsigned long mode;
+ int ret;
+
+ ret = prctl(PR_LOCK_SHADOW_STACK_STATUS, variant->mode, 0, 0, 0);
+ ASSERT_EQ(ret, 0);
+
+ ret = my_syscall2(__NR_prctl, PR_SET_SHADOW_STACK_STATUS,
+ variant->mode);
+ ASSERT_EQ(ret, -EBUSY);
+
+ ret = prctl(PR_GET_SHADOW_STACK_STATUS, &mode, 0, 0, 0);
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(mode, 0);
+
+ _exit(0);
+}
+
+/* Locking then changing other modes is fine */
+TEST_F(valid_modes, lock_enable_disable_others)
+{
+ unsigned long mode;
+ int ret;
+
+ ret = my_syscall2(__NR_prctl, PR_SET_SHADOW_STACK_STATUS,
+ variant->mode);
+ ASSERT_EQ(ret, 0);
+
+ ret = prctl(PR_GET_SHADOW_STACK_STATUS, &mode, 0, 0, 0);
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(mode, variant->mode);
+
+ ret = prctl(PR_LOCK_SHADOW_STACK_STATUS, variant->mode, 0, 0, 0);
+ ASSERT_EQ(ret, 0);
+
+ ret = my_syscall2(__NR_prctl, PR_SET_SHADOW_STACK_STATUS,
+ PR_SHADOW_STACK_ALL_MODES);
+ ASSERT_EQ(ret, 0);
+
+ ret = prctl(PR_GET_SHADOW_STACK_STATUS, &mode, 0, 0, 0);
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(mode, PR_SHADOW_STACK_ALL_MODES);
+
+
+ ret = my_syscall2(__NR_prctl, PR_SET_SHADOW_STACK_STATUS,
+ variant->mode);
+ ASSERT_EQ(ret, 0);
+
+ ret = prctl(PR_GET_SHADOW_STACK_STATUS, &mode, 0, 0, 0);
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(mode, variant->mode);
+
+ _exit(0);
+}
+
+int main(int argc, char **argv)
+{
+ unsigned long mode;
+ int ret;
+
+ if (!(getauxval(AT_HWCAP2) & HWCAP2_GCS))
+ ksft_exit_skip("SKIP GCS not supported\n");
+
+ ret = prctl(PR_GET_SHADOW_STACK_STATUS, &mode, 0, 0, 0);
+ if (ret) {
+ ksft_print_msg("Failed to read GCS state: %d\n", ret);
+ return EXIT_FAILURE;
+ }
+
+ if (mode & PR_SHADOW_STACK_ENABLE) {
+ ksft_print_msg("GCS was enabled, test unsupported\n");
+ return KSFT_SKIP;
+ }
+
+ return test_harness_run(argc, argv);
+}
--
2.30.2
While it's a bit off topic for them, the floating point stress tests do
give us some coverage of context thrashing cases, and also of active
signal delivery separate from the relatively complicated framework in
the actual signals tests. Have the tests enable GCS on startup, ignoring
failures so they continue to work as before on systems without GCS.
Signed-off-by: Mark Brown <[email protected]>
---
tools/testing/selftests/arm64/fp/assembler.h | 15 +++++++++++++++
tools/testing/selftests/arm64/fp/fpsimd-test.S | 2 ++
tools/testing/selftests/arm64/fp/sve-test.S | 2 ++
tools/testing/selftests/arm64/fp/za-test.S | 2 ++
tools/testing/selftests/arm64/fp/zt-test.S | 2 ++
5 files changed, 23 insertions(+)
diff --git a/tools/testing/selftests/arm64/fp/assembler.h b/tools/testing/selftests/arm64/fp/assembler.h
index 9b38a0da407d..7012f9f796de 100644
--- a/tools/testing/selftests/arm64/fp/assembler.h
+++ b/tools/testing/selftests/arm64/fp/assembler.h
@@ -65,4 +65,19 @@ endfunction
bl puts
.endm
+#define PR_SET_SHADOW_STACK_STATUS 72
+# define PR_SHADOW_STACK_ENABLE (1UL << 0)
+
+.macro enable_gcs
+ // Run with GCS
+ mov x0, PR_SET_SHADOW_STACK_STATUS
+ mov x1, PR_SHADOW_STACK_ENABLE
+ mov x2, xzr
+ mov x3, xzr
+ mov x4, xzr
+ mov x5, xzr
+ mov x8, #__NR_prctl
+ svc #0
+.endm
+
#endif /* ! ASSEMBLER_H */
diff --git a/tools/testing/selftests/arm64/fp/fpsimd-test.S b/tools/testing/selftests/arm64/fp/fpsimd-test.S
index 8b960d01ed2e..b16fb7f42e3e 100644
--- a/tools/testing/selftests/arm64/fp/fpsimd-test.S
+++ b/tools/testing/selftests/arm64/fp/fpsimd-test.S
@@ -215,6 +215,8 @@ endfunction
// Main program entry point
.globl _start
function _start
+ enable_gcs
+
mov x23, #0 // signal count
mov w0, #SIGINT
diff --git a/tools/testing/selftests/arm64/fp/sve-test.S b/tools/testing/selftests/arm64/fp/sve-test.S
index fff60e2a25ad..2fb4f0b84476 100644
--- a/tools/testing/selftests/arm64/fp/sve-test.S
+++ b/tools/testing/selftests/arm64/fp/sve-test.S
@@ -378,6 +378,8 @@ endfunction
// Main program entry point
.globl _start
function _start
+ enable_gcs
+
mov x23, #0 // Irritation signal count
mov w0, #SIGINT
diff --git a/tools/testing/selftests/arm64/fp/za-test.S b/tools/testing/selftests/arm64/fp/za-test.S
index 095b45531640..b2603aba99de 100644
--- a/tools/testing/selftests/arm64/fp/za-test.S
+++ b/tools/testing/selftests/arm64/fp/za-test.S
@@ -231,6 +231,8 @@ endfunction
// Main program entry point
.globl _start
function _start
+ enable_gcs
+
mov x23, #0 // signal count
mov w0, #SIGINT
diff --git a/tools/testing/selftests/arm64/fp/zt-test.S b/tools/testing/selftests/arm64/fp/zt-test.S
index b5c81e81a379..8d9609a49008 100644
--- a/tools/testing/selftests/arm64/fp/zt-test.S
+++ b/tools/testing/selftests/arm64/fp/zt-test.S
@@ -200,6 +200,8 @@ endfunction
// Main program entry point
.globl _start
function _start
+ enable_gcs
+
mov x23, #0 // signal count
mov w0, #SIGINT
--
2.30.2
Allow test programs to use the shadow stack helpers on arm64.
Signed-off-by: Mark Brown <[email protected]>
---
tools/testing/selftests/ksft_shstk.h | 37 ++++++++++++++++++++++++++++++++++++
1 file changed, 37 insertions(+)
diff --git a/tools/testing/selftests/ksft_shstk.h b/tools/testing/selftests/ksft_shstk.h
index 85d0747c1802..223e24b4eb80 100644
--- a/tools/testing/selftests/ksft_shstk.h
+++ b/tools/testing/selftests/ksft_shstk.h
@@ -50,6 +50,43 @@ static inline __attribute__((always_inline)) void enable_shadow_stack(void)
#endif
+#ifdef __aarch64__
+#define PR_SET_SHADOW_STACK_STATUS 72
+# define PR_SHADOW_STACK_ENABLE (1UL << 0)
+
+#define my_syscall2(num, arg1, arg2) \
+({ \
+ register long _num __asm__ ("x8") = (num); \
+ register long _arg1 __asm__ ("x0") = (long)(arg1); \
+ register long _arg2 __asm__ ("x1") = (long)(arg2); \
+ register long _arg3 __asm__ ("x2") = 0; \
+ register long _arg4 __asm__ ("x3") = 0; \
+ register long _arg5 __asm__ ("x4") = 0; \
+ \
+ __asm__ volatile ( \
+ "svc #0\n" \
+ : "=r"(_arg1) \
+ : "r"(_arg1), "r"(_arg2), \
+ "r"(_arg3), "r"(_arg4), \
+ "r"(_arg5), "r"(_num) \
+ : "memory", "cc" \
+ ); \
+ _arg1; \
+})
+
+#define ENABLE_SHADOW_STACK
+static inline __attribute__((always_inline)) void enable_shadow_stack(void)
+{
+ int ret;
+
+ ret = my_syscall2(__NR_prctl, PR_SET_SHADOW_STACK_STATUS,
+ PR_SHADOW_STACK_ENABLE);
+ if (ret == 0)
+ shadow_stack_enabled = true;
+}
+
+#endif
+
#ifndef __NR_map_shadow_stack
#define __NR_map_shadow_stack 453
#endif
--
2.30.2
This test program covers the basic GCS ABI, exercising each aspect of the
ABI as a standalone feature without attempting to integrate them.
Signed-off-by: Mark Brown <[email protected]>
---
tools/testing/selftests/arm64/Makefile | 2 +-
tools/testing/selftests/arm64/gcs/.gitignore | 1 +
tools/testing/selftests/arm64/gcs/Makefile | 18 ++
tools/testing/selftests/arm64/gcs/basic-gcs.c | 428 ++++++++++++++++++++++++++
tools/testing/selftests/arm64/gcs/gcs-util.h | 90 ++++++
5 files changed, 538 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/Makefile b/tools/testing/selftests/arm64/Makefile
index 28b93cab8c0d..22029e60eff3 100644
--- a/tools/testing/selftests/arm64/Makefile
+++ b/tools/testing/selftests/arm64/Makefile
@@ -4,7 +4,7 @@
ARCH ?= $(shell uname -m 2>/dev/null || echo not)
ifneq (,$(filter $(ARCH),aarch64 arm64))
-ARM64_SUBTARGETS ?= tags signal pauth fp mte bti abi
+ARM64_SUBTARGETS ?= tags signal pauth fp mte bti abi gcs
else
ARM64_SUBTARGETS :=
endif
diff --git a/tools/testing/selftests/arm64/gcs/.gitignore b/tools/testing/selftests/arm64/gcs/.gitignore
new file mode 100644
index 000000000000..0e5e695ecba5
--- /dev/null
+++ b/tools/testing/selftests/arm64/gcs/.gitignore
@@ -0,0 +1 @@
+basic-gcs
diff --git a/tools/testing/selftests/arm64/gcs/Makefile b/tools/testing/selftests/arm64/gcs/Makefile
new file mode 100644
index 000000000000..61a30f483429
--- /dev/null
+++ b/tools/testing/selftests/arm64/gcs/Makefile
@@ -0,0 +1,18 @@
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2023 ARM Limited
+#
+# In order to avoid interaction with the toolchain and dynamic linker the
+# portions of these tests that interact with the GCS are implemented using
+# nolibc.
+#
+
+TEST_GEN_PROGS := basic-gcs
+
+include ../../lib.mk
+
+$(OUTPUT)/basic-gcs: basic-gcs.c
+ $(CC) -g -fno-asynchronous-unwind-tables -fno-ident -s -Os -nostdlib \
+ -static -include ../../../../include/nolibc/nolibc.h \
+ -I../../../../../usr/include \
+ -std=gnu99 -I../.. -g \
+ -ffreestanding -Wall $^ -o $@ -lgcc
diff --git a/tools/testing/selftests/arm64/gcs/basic-gcs.c b/tools/testing/selftests/arm64/gcs/basic-gcs.c
new file mode 100644
index 000000000000..3ef957e0065f
--- /dev/null
+++ b/tools/testing/selftests/arm64/gcs/basic-gcs.c
@@ -0,0 +1,428 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 ARM Limited.
+ */
+
+#include <limits.h>
+#include <stdbool.h>
+
+#include <linux/prctl.h>
+
+#include <sys/mman.h>
+#include <asm/mman.h>
+#include <linux/sched.h>
+
+#include "kselftest.h"
+#include "gcs-util.h"
+
+/* nolibc doesn't have sysconf(), just hard code the maximum */
+static size_t page_size = 65536;
+
+static __attribute__((noinline)) void valid_gcs_function(void)
+{
+ /* Do something the compiler can't optimise out */
+ my_syscall1(__NR_prctl, PR_SVE_GET_VL);
+}
+
+static inline int gcs_set_status(unsigned long mode)
+{
+ bool enabling = mode & PR_SHADOW_STACK_ENABLE;
+ int ret;
+ unsigned long new_mode;
+
+ /*
+ * The prctl takes 1 argument but we need to ensure that the
+ * other 3 values passed in registers to the syscall are zero
+ * since the kernel validates them.
+ */
+ ret = my_syscall5(__NR_prctl, PR_SET_SHADOW_STACK_STATUS, mode,
+ 0, 0, 0);
+
+ if (ret == 0) {
+ ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS,
+ &new_mode, 0, 0, 0);
+ if (ret == 0) {
+ if (new_mode != mode) {
+ ksft_print_msg("Mode set to %lx not %lx\n",
+ new_mode, mode);
+ ret = -EINVAL;
+ }
+ } else {
+ ksft_print_msg("Failed to validate mode: %d\n", ret);
+ }
+
+ if (enabling != chkfeat_gcs()) {
+ ksft_print_msg("%senabled by prctl but %senabled in CHKFEAT\n",
+ enabling ? "" : "not ",
+ chkfeat_gcs() ? "" : "not ");
+ ret = -EINVAL;
+ }
+ }
+
+ return ret;
+}
+
+/* Try to read the status */
+static bool read_status(void)
+{
+ unsigned long state;
+ int ret;
+
+ ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS,
+ &state, 0, 0, 0);
+ if (ret != 0) {
+ ksft_print_msg("Failed to read state: %d\n", ret);
+ return false;
+ }
+
+ return state & PR_SHADOW_STACK_ENABLE;
+}
+
+/* Just a straight enable */
+static bool base_enable(void)
+{
+ int ret;
+
+ ret = gcs_set_status(PR_SHADOW_STACK_ENABLE);
+ if (ret) {
+ ksft_print_msg("PR_SHADOW_STACK_ENABLE failed %d\n", ret);
+ return false;
+ }
+
+ return true;
+}
+
+/* Check we can read GCSPR_EL0 when GCS is enabled */
+static bool read_gcspr_el0(void)
+{
+ unsigned long *gcspr_el0;
+
+ ksft_print_msg("GET GCSPR\n");
+ gcspr_el0 = get_gcspr();
+ ksft_print_msg("GCSPR_EL0 is %p\n", gcspr_el0);
+
+ return true;
+}
+
+/* Also allow writes to stack */
+static bool enable_writeable(void)
+{
+ int ret;
+
+ ret = gcs_set_status(PR_SHADOW_STACK_ENABLE | PR_SHADOW_STACK_WRITE);
+ if (ret) {
+ ksft_print_msg("PR_SHADOW_STACK_ENABLE writeable failed: %d\n", ret);
+ return false;
+ }
+
+ ret = gcs_set_status(PR_SHADOW_STACK_ENABLE);
+ if (ret) {
+ ksft_print_msg("failed to restore plain enable %d\n", ret);
+ return false;
+ }
+
+ return true;
+}
+
+/* Also allow pushes to the stack */
+static bool enable_push_pop(void)
+{
+ int ret;
+
+ ret = gcs_set_status(PR_SHADOW_STACK_ENABLE | PR_SHADOW_STACK_PUSH);
+ if (ret) {
+ ksft_print_msg("PR_SHADOW_STACK_ENABLE with push failed: %d\n",
+ ret);
+ return false;
+ }
+
+ ret = gcs_set_status(PR_SHADOW_STACK_ENABLE);
+ if (ret) {
+ ksft_print_msg("failed to restore plain enable %d\n", ret);
+ return false;
+ }
+
+ return true;
+}
+
+/* Enable GCS and allow everything */
+static bool enable_all(void)
+{
+ int ret;
+
+ ret = gcs_set_status(PR_SHADOW_STACK_ENABLE | PR_SHADOW_STACK_PUSH |
+ PR_SHADOW_STACK_WRITE);
+ if (ret) {
+ ksft_print_msg("PR_SHADOW_STACK_ENABLE with everything failed: %d\n",
+ ret);
+ return false;
+ }
+
+ ret = gcs_set_status(PR_SHADOW_STACK_ENABLE);
+ if (ret) {
+ ksft_print_msg("failed to restore plain enable %d\n", ret);
+ return false;
+ }
+
+ return true;
+}
+
+static bool enable_invalid(void)
+{
+ int ret = gcs_set_status(ULONG_MAX);
+ if (ret == 0) {
+ ksft_print_msg("GCS_SET_STATUS %lx succeeded\n", ULONG_MAX);
+ return false;
+ }
+
+ return true;
+}
+
+/* Map a GCS */
+static bool map_guarded_stack(void)
+{
+ int ret;
+ uint64_t *buf;
+ uint64_t expected_cap;
+ int elem;
+ bool pass = true;
+
+ buf = (void *)my_syscall3(__NR_map_shadow_stack, 0, page_size,
+ SHADOW_STACK_SET_MARKER |
+ SHADOW_STACK_SET_TOKEN);
+ if (buf == MAP_FAILED) {
+ ksft_print_msg("Failed to map %lu byte GCS: %d\n",
+ page_size, errno);
+ return false;
+ }
+ ksft_print_msg("Mapped GCS at %p-%p\n", buf,
+ (void *)((uint64_t)buf + page_size));
+
+ /* The top of the newly allocated region should be 0 */
+ elem = (page_size / sizeof(uint64_t)) - 1;
+ if (buf[elem]) {
+ ksft_print_msg("Last entry is 0x%lx not 0x0\n", buf[elem]);
+ pass = false;
+ }
+
+ /* Then a valid cap token */
+ elem--;
+ expected_cap = ((uint64_t)buf + page_size - 16);
+ expected_cap &= GCS_CAP_ADDR_MASK;
+ expected_cap |= GCS_CAP_VALID_TOKEN;
+ if (buf[elem] != expected_cap) {
+ ksft_print_msg("Cap entry is 0x%lx not 0x%lx\n",
+ buf[elem], expected_cap);
+ pass = false;
+ }
+ ksft_print_msg("cap token is 0x%lx\n", buf[elem]);
+
+ /* The rest should be zeros */
+ for (elem = 0; elem < page_size / sizeof(uint64_t) - 2; elem++) {
+ if (!buf[elem])
+ continue;
+ ksft_print_msg("GCS slot %d is 0x%lx not 0x0\n",
+ elem, buf[elem]);
+ pass = false;
+ }
+
+ ret = munmap(buf, page_size);
+ if (ret != 0) {
+ ksft_print_msg("Failed to unmap %lu byte GCS: %d\n",
+ page_size, errno);
+ pass = false;
+ }
+
+ return pass;
+}
+
+/* A fork()ed process can run */
+static bool test_fork(void)
+{
+ unsigned long child_mode;
+ int ret, status;
+ pid_t pid;
+ bool pass = true;
+
+ pid = fork();
+ if (pid == -1) {
+ ksft_print_msg("fork() failed: %d\n", errno);
+ pass = false;
+ goto out;
+ }
+ if (pid == 0) {
+ /* In child, make sure we can call a function, read
+ * the GCS pointer and status and then exit */
+ valid_gcs_function();
+ get_gcspr();
+
+ ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS,
+ &child_mode, 0, 0, 0);
+ if (ret == 0 && !(child_mode & PR_SHADOW_STACK_ENABLE)) {
+ ksft_print_msg("GCS not enabled in child\n");
+ ret = -EINVAL;
+ }
+
+ exit(ret);
+ }
+
+ /*
+ * In parent, check we can still do function calls then block
+ * for the child.
+ */
+ valid_gcs_function();
+
+ ksft_print_msg("Waiting for child %d\n", pid);
+
+ ret = waitpid(pid, &status, 0);
+ if (ret == -1) {
+ ksft_print_msg("Failed to wait for child: %d\n",
+ errno);
+ return false;
+ }
+
+ if (!WIFEXITED(status)) {
+ ksft_print_msg("Child exited due to signal %d\n",
+ WTERMSIG(status));
+ pass = false;
+ } else {
+ if (WEXITSTATUS(status)) {
+ ksft_print_msg("Child exited with status %d\n",
+ WEXITSTATUS(status));
+ pass = false;
+ }
+ }
+
+out:
+
+ return pass;
+}
+
+/* Check that we can explicitly specify a GCS via clone3() */
+static bool test_clone3(void)
+{
+ struct clone_args args;
+ unsigned long child_mode;
+ pid_t pid = -1;
+ int status, ret;
+ bool pass;
+
+ memset(&args, 0, sizeof(args));
+ args.flags = CLONE_VM;
+ args.shadow_stack_size = page_size;
+
+ pid = my_syscall2(__NR_clone3, &args, sizeof(args));
+ if (pid < 0) {
+ ksft_print_msg("clone3() failed: %d\n", -pid);
+ pass = false;
+ goto out;
+ }
+
+ /* In child? */
+ if (pid == 0) {
+ /* Do we have GCS enabled? */
+ ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS,
+ &child_mode, 0, 0, 0);
+ if (ret != 0) {
+ ksft_print_msg("PR_GET_SHADOW_STACK_STATUS failed: %d\n",
+ ret);
+ exit(EXIT_FAILURE);
+ }
+
+ if (!(child_mode & PR_SHADOW_STACK_ENABLE)) {
+ ksft_print_msg("GCS not enabled in child\n");
+ exit(EXIT_FAILURE);
+ }
+
+ ksft_print_msg("GCS enabled in child\n");
+
+ /* We've probably already called a function but make sure */
+ valid_gcs_function();
+
+ exit(EXIT_SUCCESS);
+ }
+
+ if (waitpid(-1, &status, __WALL) < 0) {
+ ksft_print_msg("waitpid() failed %d\n", errno);
+ pass = false;
+ goto out;
+ }
+ if (WIFEXITED(status)) {
+ if (WEXITSTATUS(status) == EXIT_SUCCESS) {
+ pass = true;
+ } else {
+ ksft_print_msg("Child returned status %d\n",
+ WEXITSTATUS(status));
+ pass = false;
+ }
+ } else if (WIFSIGNALED(status)) {
+ ksft_print_msg("Child exited due to signal %d\n",
+ WTERMSIG(status));
+ pass = false;
+ } else {
+ ksft_print_msg("Child exited uncleanly\n");
+ pass = false;
+ }
+
+out:
+ return pass;
+}
+
+typedef bool (*gcs_test)(void);
+
+static struct {
+ char *name;
+ gcs_test test;
+ bool needs_enable;
+} tests[] = {
+ { "read_status", read_status },
+ { "base_enable", base_enable, true },
+ { "read_gcspr_el0", read_gcspr_el0 },
+ { "enable_writeable", enable_writeable, true },
+ { "enable_push_pop", enable_push_pop, true },
+ { "enable_all", enable_all, true },
+ { "enable_invalid", enable_invalid, true },
+ { "map_guarded_stack", map_guarded_stack },
+ { "fork", test_fork },
+ { "clone3", test_clone3 },
+};
+
+int main(void)
+{
+ int i, ret;
+ unsigned long gcs_mode;
+
+ ksft_print_header();
+
+ /*
+ * We don't have getauxval() with nolibc so treat a failure to
+ * read GCS state as a lack of support and skip.
+ */
+ ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS,
+ &gcs_mode, 0, 0, 0);
+ if (ret != 0)
+ ksft_exit_skip("Failed to read GCS state: %d\n", ret);
+
+ if (!(gcs_mode & PR_SHADOW_STACK_ENABLE)) {
+ gcs_mode = PR_SHADOW_STACK_ENABLE;
+ ret = my_syscall5(__NR_prctl, PR_SET_SHADOW_STACK_STATUS,
+ gcs_mode, 0, 0, 0);
+ if (ret != 0)
+ ksft_exit_fail_msg("Failed to enable GCS: %d\n", ret);
+ }
+
+ ksft_set_plan(ARRAY_SIZE(tests));
+
+ for (i = 0; i < ARRAY_SIZE(tests); i++) {
+ ksft_test_result((*tests[i].test)(), "%s\n", tests[i].name);
+ }
+
+ /* One last test: disable GCS, we can do this one time */
+ ret = my_syscall5(__NR_prctl, PR_SET_SHADOW_STACK_STATUS, 0, 0, 0, 0);
+ if (ret != 0)
+ ksft_print_msg("Failed to disable GCS: %d\n", ret);
+
+ ksft_finished();
+
+ return 0;
+}
diff --git a/tools/testing/selftests/arm64/gcs/gcs-util.h b/tools/testing/selftests/arm64/gcs/gcs-util.h
new file mode 100644
index 000000000000..b37801c95604
--- /dev/null
+++ b/tools/testing/selftests/arm64/gcs/gcs-util.h
@@ -0,0 +1,90 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 ARM Limited.
+ */
+
+#ifndef GCS_UTIL_H
+#define GCS_UTIL_H
+
+#include <stdbool.h>
+
+#ifndef __NR_map_shadow_stack
+#define __NR_map_shadow_stack 453
+#endif
+
+#ifndef __NR_prctl
+#define __NR_prctl 167
+#endif
+
+/* Shadow Stack/Guarded Control Stack interface */
+#define PR_GET_SHADOW_STACK_STATUS 71
+#define PR_SET_SHADOW_STACK_STATUS 72
+#define PR_LOCK_SHADOW_STACK_STATUS 73
+
+# define PR_SHADOW_STACK_ENABLE (1UL << 0)
+# define PR_SHADOW_STACK_WRITE (1UL << 1)
+# define PR_SHADOW_STACK_PUSH (1UL << 2)
+
+#define PR_SHADOW_STACK_ALL_MODES \
+ (PR_SHADOW_STACK_ENABLE | PR_SHADOW_STACK_WRITE | PR_SHADOW_STACK_PUSH)
+
+#define SHADOW_STACK_SET_TOKEN (1ULL << 0) /* Set up a restore token in the shadow stack */
+#define SHADOW_STACK_SET_MARKER (1ULL << 1) /* Set up a top of stack marker in the shadow stack */
+
+#define GCS_CAP_ADDR_MASK (0xfffffffffffff000UL)
+#define GCS_CAP_TOKEN_MASK (0x0000000000000fffUL)
+#define GCS_CAP_VALID_TOKEN 1
+#define GCS_CAP_IN_PROGRESS_TOKEN 5
+
+#define GCS_CAP(x) (((unsigned long)(x) & GCS_CAP_ADDR_MASK) | \
+ GCS_CAP_VALID_TOKEN)
+
+static inline unsigned long *get_gcspr(void)
+{
+ unsigned long *gcspr;
+
+ asm volatile(
+ "mrs %0, S3_3_C2_C5_1"
+ : "=r" (gcspr)
+ :
+ : "cc");
+
+ return gcspr;
+}
+
+static inline void __attribute__((always_inline)) gcsss1(unsigned long *Xt)
+{
+ asm volatile (
+ "sys #3, C7, C7, #2, %0\n"
+ :
+ : "rZ" (Xt)
+ : "memory");
+}
+
+static inline unsigned long __attribute__((always_inline)) *gcsss2(void)
+{
+ unsigned long *Xt;
+
+ asm volatile(
+ "SYSL %0, #3, C7, C7, #3\n"
+ : "=r" (Xt)
+ :
+ : "memory");
+
+ return Xt;
+}
+
+static inline bool chkfeat_gcs(void)
+{
+ register long val __asm__ ("x16") = 1;
+
+ /* CHKFEAT x16 */
+ asm volatile(
+ "hint #0x28\n"
+ : "=r" (val)
+ : "r" (val));
+
+ return val != 1;
+}
+
+#endif
--
2.30.2
Add a stress test which runs one more process than there are CPUs. Each
process spins through a recursive function that makes a syscall
immediately before every return, while signals are injected every 100ms.
The goal is to flag up any scheduling related issues, for example failure
to ensure that barriers are inserted when moving a task using GCS to
another CPU. The test runs for a configurable amount of time, defaulting
to 10 seconds.
Signed-off-by: Mark Brown <[email protected]>
---
tools/testing/selftests/arm64/gcs/.gitignore | 2 +
tools/testing/selftests/arm64/gcs/Makefile | 6 +-
tools/testing/selftests/arm64/gcs/asm-offsets.h | 0
.../selftests/arm64/gcs/gcs-stress-thread.S | 311 ++++++++++++
tools/testing/selftests/arm64/gcs/gcs-stress.c | 532 +++++++++++++++++++++
5 files changed, 850 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/gcs/.gitignore b/tools/testing/selftests/arm64/gcs/.gitignore
index 0c86f53f68ad..1e8d1f6b27f2 100644
--- a/tools/testing/selftests/arm64/gcs/.gitignore
+++ b/tools/testing/selftests/arm64/gcs/.gitignore
@@ -1,3 +1,5 @@
basic-gcs
libc-gcs
gcs-locking
+gcs-stress
+gcs-stress-thread
diff --git a/tools/testing/selftests/arm64/gcs/Makefile b/tools/testing/selftests/arm64/gcs/Makefile
index 2173d6275956..d8b06ca51e22 100644
--- a/tools/testing/selftests/arm64/gcs/Makefile
+++ b/tools/testing/selftests/arm64/gcs/Makefile
@@ -6,7 +6,8 @@
# nolibc.
#
-TEST_GEN_PROGS := basic-gcs libc-gcs gcs-locking
+TEST_GEN_PROGS := basic-gcs libc-gcs gcs-locking gcs-stress
+TEST_GEN_PROGS_EXTENDED := gcs-stress-thread
LDLIBS+=-lpthread
@@ -18,3 +19,6 @@ $(OUTPUT)/basic-gcs: basic-gcs.c
-I../../../../../usr/include \
-std=gnu99 -I../.. -g \
-ffreestanding -Wall $^ -o $@ -lgcc
+
+$(OUTPUT)/gcs-stress-thread: gcs-stress-thread.S
+ $(CC) -nostdlib $^ -o $@
diff --git a/tools/testing/selftests/arm64/gcs/asm-offsets.h b/tools/testing/selftests/arm64/gcs/asm-offsets.h
new file mode 100644
index 000000000000..e69de29bb2d1
diff --git a/tools/testing/selftests/arm64/gcs/gcs-stress-thread.S b/tools/testing/selftests/arm64/gcs/gcs-stress-thread.S
new file mode 100644
index 000000000000..2a08d6bf1ced
--- /dev/null
+++ b/tools/testing/selftests/arm64/gcs/gcs-stress-thread.S
@@ -0,0 +1,311 @@
+// Program that loops forever doing lots of recursions and system calls,
+// intended to be used as part of a stress test for GCS context switching.
+//
+// Copyright 2015-2023 Arm Ltd
+
+#include <asm/unistd.h>
+
+#define sa_sz 32
+#define sa_flags 8
+#define sa_handler 0
+#define sa_mask_sz 8
+
+#define si_code 8
+
+#define SIGINT 2
+#define SIGABRT 6
+#define SIGUSR1 10
+#define SIGSEGV 11
+#define SIGUSR2 12
+#define SIGTERM 15
+#define SEGV_CPERR 10
+
+#define SA_NODEFER 1073741824
+#define SA_SIGINFO 4
+#define ucontext_regs 184
+
+#define PR_SET_SHADOW_STACK_STATUS 72
+# define PR_SHADOW_STACK_ENABLE (1UL << 0)
+
+#define GCSPR_EL0 S3_3_C2_C5_1
+
+.macro function name
+ .macro endfunction
+ .type \name, @function
+ .purgem endfunction
+ .endm
+\name:
+.endm
+
+// Print a single character x0 to stdout
+// Clobbers x0-x2,x8
+function putc
+ str x0, [sp, #-16]!
+
+ mov x0, #1 // STDOUT_FILENO
+ mov x1, sp
+ mov x2, #1
+ mov x8, #__NR_write
+ svc #0
+
+ add sp, sp, #16
+ ret
+endfunction
+.globl putc
+
+// Print a NUL-terminated string starting at address x0 to stdout
+// Clobbers x0-x3,x8
+function puts
+ mov x1, x0
+
+ mov x2, #0
+0: ldrb w3, [x0], #1
+ cbz w3, 1f
+ add x2, x2, #1
+ b 0b
+
+1: mov w0, #1 // STDOUT_FILENO
+ mov x8, #__NR_write
+ svc #0
+
+ ret
+endfunction
+.globl puts
+
+// Utility macro to print a literal string
+// Clobbers x0-x4,x8
+.macro puts string
+ .pushsection .rodata.str1.1, "aMS", @progbits, 1
+.L__puts_literal\@: .string "\string"
+ .popsection
+
+ ldr x0, =.L__puts_literal\@
+ bl puts
+.endm
+
+// Print an unsigned decimal number x0 to stdout
+// Clobbers x0-x4,x8
+function putdec
+ mov x1, sp
+ str x30, [sp, #-32]! // Result can't be > 20 digits
+
+ mov x2, #0
+ strb w2, [x1, #-1]! // Write the NUL terminator
+
+ mov x2, #10
+0: udiv x3, x0, x2 // div-mod loop to generate the digits
+ msub x0, x3, x2, x0
+ add w0, w0, #'0'
+ strb w0, [x1, #-1]!
+ mov x0, x3
+ cbnz x3, 0b
+
+ ldrb w0, [x1]
+ cbnz w0, 1f
+ mov w0, #'0' // Print "0" for 0, not ""
+ strb w0, [x1, #-1]!
+
+1: mov x0, x1
+ bl puts
+
+ ldr x30, [sp], #32
+ ret
+endfunction
+.globl putdec
+
+// Print an unsigned decimal number x0 to stdout, followed by a newline
+// Clobbers x0-x5,x8
+function putdecn
+ mov x5, x30
+
+ bl putdec
+ mov x0, #'\n'
+ bl putc
+
+ ret x5
+endfunction
+.globl putdecn
+
+// Fill x1 bytes starting at x0 with 0.
+// Clobbers x1, x2.
+function memclr
+ mov w2, #0
+endfunction
+.globl memclr
+ // fall through to memfill
+
+// Trivial memory fill: fill x1 bytes starting at address x0 with byte w2
+// Clobbers x1
+function memfill
+ cmp x1, #0
+ b.eq 1f
+
+0: strb w2, [x0], #1
+ subs x1, x1, #1
+ b.ne 0b
+
+1: ret
+endfunction
+.globl memfill
+
+// w0: signal number
+// x1: sa_action
+// w2: sa_flags
+// Clobbers x0-x6,x8
+function setsignal
+ str x30, [sp, #-((sa_sz + 15) / 16 * 16 + 16)]!
+
+ mov w4, w0
+ mov x5, x1
+ mov w6, w2
+
+ add x0, sp, #16
+ mov x1, #sa_sz
+ bl memclr
+
+ mov w0, w4
+ add x1, sp, #16
+ str w6, [x1, #sa_flags]
+ str x5, [x1, #sa_handler]
+ mov x2, #0
+ mov x3, #sa_mask_sz
+ mov x8, #__NR_rt_sigaction
+ svc #0
+
+ cbz w0, 1f
+
+ puts "sigaction failure\n"
+ b abort
+
+1: ldr x30, [sp], #((sa_sz + 15) / 16 * 16 + 16)
+ ret
+endfunction
+
+
+function tickle_handler
+ // Perhaps collect GCSPR_EL0 here in future?
+ ret
+endfunction
+
+function terminate_handler
+ mov w21, w0
+ mov x20, x2
+
+ puts "Terminated by signal "
+ mov w0, w21
+ bl putdec
+ puts ", no error\n"
+
+ mov x0, #0
+ mov x8, #__NR_exit
+ svc #0
+endfunction
+
+function segv_handler
+ // stash the siginfo_t *
+ mov x20, x1
+
+ // Disable GCS, we don't want additional faults logging things
+ mov x0, PR_SET_SHADOW_STACK_STATUS
+ mov x1, xzr
+ mov x2, xzr
+ mov x3, xzr
+ mov x4, xzr
+ mov x5, xzr
+ mov x8, #__NR_prctl
+ svc #0
+
+ puts "Got SIGSEGV code "
+
+ ldr w21, [x20, #si_code] // si_code is a 32-bit int
+ mov x0, x21
+ bl putdec
+
+ // GCS faults should have si_code SEGV_CPERR
+ cmp x21, #SEGV_CPERR
+ bne 1f
+
+ puts " (GCS violation)"
+1:
+ mov x0, '\n'
+ bl putc
+ b abort
+endfunction
+
+// Recurse x20 times
+.macro recurse id
+function recurse\id
+ stp x29, x30, [sp, #-16]!
+ mov x29, sp
+
+ cmp x20, 0
+ beq 1f
+ sub x20, x20, 1
+ bl recurse\id
+
+1:
+ ldp x29, x30, [sp], #16
+
+ // Do a syscall immediately prior to returning to try to provoke
+ // scheduling and migration at a point where coherency issues
+ // might trigger.
+ mov x8, #__NR_getpid
+ svc #0
+
+ ret
+endfunction
+.endm
+
+// Generate and use two copies so we're changing the GCS contents
+recurse 1
+recurse 2
+
+.globl _start
+function _start
+ // Run with GCS
+ mov x0, PR_SET_SHADOW_STACK_STATUS
+ mov x1, PR_SHADOW_STACK_ENABLE
+ mov x2, xzr
+ mov x3, xzr
+ mov x4, xzr
+ mov x5, xzr
+ mov x8, #__NR_prctl
+ svc #0
+ cbz x0, 1f
+ puts "Failed to enable GCS\n"
+ b abort
+1:
+
+ mov w0, #SIGTERM
+ adr x1, terminate_handler
+ mov w2, #SA_SIGINFO
+ bl setsignal
+
+ mov w0, #SIGUSR1
+ adr x1, tickle_handler
+ mov w2, #SA_SIGINFO
+ orr w2, w2, #SA_NODEFER
+ bl setsignal
+
+ mov w0, #SIGSEGV
+ adr x1, segv_handler
+ mov w2, #SA_SIGINFO
+ orr w2, w2, #SA_NODEFER
+ bl setsignal
+
+ puts "Running\n"
+
+loop:
+ // Small recursion depth so we're frequently flipping between
+ // the two recursors and changing what's on the stack
+ mov x20, #5
+ bl recurse1
+ mov x20, #5
+ bl recurse2
+ b loop
+endfunction
+
+abort:
+ mov x0, #255
+ mov x8, #__NR_exit
+ svc #0
diff --git a/tools/testing/selftests/arm64/gcs/gcs-stress.c b/tools/testing/selftests/arm64/gcs/gcs-stress.c
new file mode 100644
index 000000000000..23fd8ec37bdc
--- /dev/null
+++ b/tools/testing/selftests/arm64/gcs/gcs-stress.c
@@ -0,0 +1,532 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2022-3 ARM Limited.
+ */
+
+#define _GNU_SOURCE
+#define _POSIX_C_SOURCE 199309L
+
+#include <errno.h>
+#include <getopt.h>
+#include <poll.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/auxv.h>
+#include <sys/epoll.h>
+#include <sys/prctl.h>
+#include <sys/types.h>
+#include <sys/uio.h>
+#include <sys/wait.h>
+#include <asm/hwcap.h>
+
+#include "../../kselftest.h"
+
+struct child_data {
+ char *name, *output;
+ pid_t pid;
+ int stdout;
+ bool output_seen;
+ bool exited;
+ int exit_status;
+ int exit_signal;
+};
+
+static int epoll_fd;
+static struct child_data *children;
+static struct epoll_event *evs;
+static int tests;
+static int num_children;
+static bool terminate;
+
+static int startup_pipe[2];
+
+static int num_processors(void)
+{
+ long nproc = sysconf(_SC_NPROCESSORS_CONF);
+ if (nproc < 0) {
+ perror("Unable to read number of processors");
+ exit(EXIT_FAILURE);
+ }
+
+ return nproc;
+}
+
+static void start_thread(struct child_data *child)
+{
+ int ret, pipefd[2], i;
+ struct epoll_event ev;
+
+ ret = pipe(pipefd);
+ if (ret != 0)
+ ksft_exit_fail_msg("Failed to create stdout pipe: %s (%d)\n",
+ strerror(errno), errno);
+
+ child->pid = fork();
+ if (child->pid == -1)
+ ksft_exit_fail_msg("fork() failed: %s (%d)\n",
+ strerror(errno), errno);
+
+ if (!child->pid) {
+ /*
+ * In child, replace stdout with the pipe, errors to
+ * stderr from here as kselftest prints to stdout.
+ */
+ ret = dup2(pipefd[1], 1);
+ if (ret == -1) {
+ fprintf(stderr, "dup2() %d\n", errno);
+ exit(EXIT_FAILURE);
+ }
+
+ /*
+ * Duplicate the read side of the startup pipe to
+ * FD 3 so we can close everything else.
+ */
+ ret = dup2(startup_pipe[0], 3);
+ if (ret == -1) {
+ fprintf(stderr, "dup2() %d\n", errno);
+ exit(EXIT_FAILURE);
+ }
+
+ /*
+ * Very dumb mechanism to clean open FDs other than
+ * stdio. We don't want O_CLOEXEC for the pipes...
+ */
+ for (i = 4; i < 8192; i++)
+ close(i);
+
+ /*
+ * Read from the startup pipe, there should be no data
+ * and we should block until it is closed. We just
+ * carry on on error since this isn't super critical.
+ */
+ ret = read(3, &i, sizeof(i));
+ if (ret < 0)
+ fprintf(stderr, "read(startup pipe) failed: %s (%d)\n",
+ strerror(errno), errno);
+ if (ret > 0)
+ fprintf(stderr, "%d bytes of data on startup pipe\n",
+ ret);
+ close(3);
+
+ ret = execl("gcs-stress-thread", "gcs-stress-thread", NULL);
+ fprintf(stderr, "execl(gcs-stress-thread) failed: %d (%s)\n",
+ errno, strerror(errno));
+
+ exit(EXIT_FAILURE);
+ } else {
+ /*
+ * In parent, remember the child and close our copy of the
+ * write side of stdout.
+ */
+ close(pipefd[1]);
+ child->stdout = pipefd[0];
+ child->output = NULL;
+ child->exited = false;
+ child->output_seen = false;
+
+ ev.events = EPOLLIN | EPOLLHUP;
+ ev.data.ptr = child;
+
+ ret = asprintf(&child->name, "Thread-%d", child->pid);
+ if (ret == -1)
+ ksft_exit_fail_msg("asprintf() failed\n");
+
+ ret = epoll_ctl(epoll_fd, EPOLL_CTL_ADD, child->stdout, &ev);
+ if (ret < 0) {
+ ksft_exit_fail_msg("%s EPOLL_CTL_ADD failed: %s (%d)\n",
+ child->name, strerror(errno), errno);
+ }
+ }
+
+ ksft_print_msg("Started %s\n", child->name);
+ num_children++;
+}
+
+static bool child_output_read(struct child_data *child)
+{
+ char read_data[1024];
+ char work[1024];
+ int ret, len, cur_work, cur_read;
+
+ ret = read(child->stdout, read_data, sizeof(read_data));
+ if (ret < 0) {
+ if (errno == EINTR)
+ return true;
+
+ ksft_print_msg("%s: read() failed: %s (%d)\n",
+ child->name, strerror(errno),
+ errno);
+ return false;
+ }
+ len = ret;
+
+ child->output_seen = true;
+
+ /* Pick up any partial read */
+ if (child->output) {
+ strncpy(work, child->output, sizeof(work) - 1);
+ cur_work = strnlen(work, sizeof(work));
+ free(child->output);
+ child->output = NULL;
+ } else {
+ cur_work = 0;
+ }
+
+ cur_read = 0;
+ while (cur_read < len) {
+ work[cur_work] = read_data[cur_read++];
+
+ if (work[cur_work] == '\n') {
+ work[cur_work] = '\0';
+ ksft_print_msg("%s: %s\n", child->name, work);
+ cur_work = 0;
+ } else {
+ cur_work++;
+ }
+ }
+
+ if (cur_work) {
+ work[cur_work] = '\0';
+ ret = asprintf(&child->output, "%s", work);
+ if (ret == -1)
+ ksft_exit_fail_msg("Out of memory\n");
+ }
+
+ return false;
+}
+
+static void child_output(struct child_data *child, uint32_t events,
+ bool flush)
+{
+ bool read_more;
+
+ if (events & EPOLLIN) {
+ do {
+ read_more = child_output_read(child);
+ } while (read_more);
+ }
+
+ if (events & EPOLLHUP) {
+ close(child->stdout);
+ child->stdout = -1;
+ flush = true;
+ }
+
+ if (flush && child->output) {
+ ksft_print_msg("%s: %s<EOF>\n", child->name, child->output);
+ free(child->output);
+ child->output = NULL;
+ }
+}
+
+static void child_tickle(struct child_data *child)
+{
+ if (child->output_seen && !child->exited)
+ kill(child->pid, SIGUSR1);
+}
+
+static void child_stop(struct child_data *child)
+{
+ if (!child->exited)
+ kill(child->pid, SIGTERM);
+}
+
+static void child_cleanup(struct child_data *child)
+{
+ pid_t ret;
+ int status;
+ bool fail = false;
+
+ if (!child->exited) {
+ do {
+ ret = waitpid(child->pid, &status, 0);
+ if (ret == -1 && errno == EINTR)
+ continue;
+
+ if (ret == -1) {
+ ksft_print_msg("waitpid(%d) failed: %s (%d)\n",
+ child->pid, strerror(errno),
+ errno);
+ fail = true;
+ break;
+ }
+
+ if (WIFEXITED(status)) {
+ child->exit_status = WEXITSTATUS(status);
+ child->exited = true;
+ }
+
+ if (WIFSIGNALED(status)) {
+ child->exit_signal = WTERMSIG(status);
+ ksft_print_msg("%s: Exited due to signal %d\n",
+ child->name, child->exit_signal);
+ fail = true;
+ child->exited = true;
+ }
+ } while (!child->exited);
+ }
+
+ if (!child->output_seen) {
+ ksft_print_msg("%s no output seen\n", child->name);
+ fail = true;
+ }
+
+ if (child->exit_status != 0) {
+ ksft_print_msg("%s exited with error code %d\n",
+ child->name, child->exit_status);
+ fail = true;
+ }
+
+ ksft_test_result(!fail, "%s\n", child->name);
+}
+
+static void handle_child_signal(int sig, siginfo_t *info, void *context)
+{
+ int i;
+ bool found = false;
+
+ for (i = 0; i < num_children; i++) {
+ if (children[i].pid == info->si_pid) {
+ children[i].exited = true;
+ children[i].exit_status = info->si_status;
+ found = true;
+ break;
+ }
+ }
+
+ if (!found)
+ ksft_print_msg("SIGCHLD for unknown PID %d with status %d\n",
+ info->si_pid, info->si_status);
+}
+
+static void handle_exit_signal(int sig, siginfo_t *info, void *context)
+{
+ int i;
+
+ /* If we're already exiting then don't signal again */
+ if (terminate)
+ return;
+
+ ksft_print_msg("Got signal, exiting...\n");
+
+ terminate = true;
+
+ /*
+ * This should be redundant, the main loop should clean up
+ * after us, but for safety stop everything we can here.
+ */
+ for (i = 0; i < num_children; i++)
+ child_stop(&children[i]);
+}
+
+/* Handle any pending output without blocking */
+static void drain_output(bool flush)
+{
+ int ret = 1;
+ int i;
+
+ while (ret > 0) {
+ ret = epoll_wait(epoll_fd, evs, tests, 0);
+ if (ret < 0) {
+ if (errno == EINTR)
+ continue;
+ ksft_print_msg("epoll_wait() failed: %s (%d)\n",
+ strerror(errno), errno);
+ }
+
+ for (i = 0; i < ret; i++)
+ child_output(evs[i].data.ptr, evs[i].events, flush);
+ }
+}
+
+static const struct option options[] = {
+ { "timeout", required_argument, NULL, 't' },
+ { }
+};
+
+int main(int argc, char **argv)
+{
+ int seen_children;
+ bool all_children_started = false;
+ int gcs_threads;
+ int timeout = 10;
+ int ret, cpus, i, c;
+ struct sigaction sa;
+
+ while ((c = getopt_long(argc, argv, "t:", options, NULL)) != -1) {
+ switch (c) {
+ case 't':
+ ret = sscanf(optarg, "%d", &timeout);
+ if (ret != 1)
+ ksft_exit_fail_msg("Failed to parse timeout %s\n",
+ optarg);
+ break;
+ default:
+ ksft_exit_fail_msg("Unknown argument\n");
+ }
+ }
+
+ cpus = num_processors();
+ tests = 0;
+
+ if (getauxval(AT_HWCAP2) & HWCAP2_GCS) {
+ /* One extra thread, trying to trigger migrations */
+ gcs_threads = cpus + 1;
+ tests += gcs_threads;
+ } else {
+ gcs_threads = 0;
+ }
+
+ ksft_print_header();
+ ksft_set_plan(tests);
+
+ ksft_print_msg("%d CPUs, %d GCS threads\n",
+ cpus, gcs_threads);
+
+ if (!tests)
+ ksft_exit_skip("No tests scheduled\n");
+
+ if (timeout > 0)
+ ksft_print_msg("Will run for %ds\n", timeout);
+ else
+ ksft_print_msg("Will run until terminated\n");
+
+ children = calloc(tests, sizeof(*children));
+ if (!children)
+ ksft_exit_fail_msg("Unable to allocate child data\n");
+
+ ret = epoll_create1(EPOLL_CLOEXEC);
+ if (ret < 0)
+ ksft_exit_fail_msg("epoll_create1() failed: %s (%d)\n",
+ strerror(errno), errno);
+ epoll_fd = ret;
+
+ /* Create a pipe which children will block on before execing */
+ ret = pipe(startup_pipe);
+ if (ret != 0)
+ ksft_exit_fail_msg("Failed to create startup pipe: %s (%d)\n",
+ strerror(errno), errno);
+
+ /* Get signal handlers ready before we start any children */
+ memset(&sa, 0, sizeof(sa));
+ sa.sa_sigaction = handle_exit_signal;
+ sa.sa_flags = SA_RESTART | SA_SIGINFO;
+ sigemptyset(&sa.sa_mask);
+ ret = sigaction(SIGINT, &sa, NULL);
+ if (ret < 0)
+ ksft_print_msg("Failed to install SIGINT handler: %s (%d)\n",
+ strerror(errno), errno);
+ ret = sigaction(SIGTERM, &sa, NULL);
+ if (ret < 0)
+ ksft_print_msg("Failed to install SIGTERM handler: %s (%d)\n",
+ strerror(errno), errno);
+ sa.sa_sigaction = handle_child_signal;
+ ret = sigaction(SIGCHLD, &sa, NULL);
+ if (ret < 0)
+ ksft_print_msg("Failed to install SIGCHLD handler: %s (%d)\n",
+ strerror(errno), errno);
+
+ evs = calloc(tests, sizeof(*evs));
+ if (!evs)
+ ksft_exit_fail_msg("Failed to allocate %d epoll events\n",
+ tests);
+
+ for (i = 0; i < gcs_threads; i++)
+ start_thread(&children[i]);
+
+ /*
+ * All children started, close the startup pipe and let them
+ * run.
+ */
+ close(startup_pipe[0]);
+ close(startup_pipe[1]);
+
+ timeout *= 10;
+ for (;;) {
+ /* Did we get a signal asking us to exit? */
+ if (terminate)
+ break;
+
+ /*
+ * Timeout is counted in 100ms intervals with no output.
+ * The tests print during startup and are then silent
+ * while running, so this should ensure that they have all
+ * run far enough to install their signal handlers. This
+ * is especially useful under emulation, where we will be
+ * slow.
+ */
+ ret = epoll_wait(epoll_fd, evs, tests, 100);
+ if (ret < 0) {
+ if (errno == EINTR)
+ continue;
+ ksft_exit_fail_msg("epoll_wait() failed: %s (%d)\n",
+ strerror(errno), errno);
+ }
+
+ /* Output? */
+ if (ret > 0) {
+ for (i = 0; i < ret; i++) {
+ child_output(evs[i].data.ptr, evs[i].events,
+ false);
+ }
+ continue;
+ }
+
+ /* Otherwise epoll_wait() timed out */
+
+ /*
+ * If the child processes have not produced output they
+ * aren't actually running the tests yet.
+ */
+ if (!all_children_started) {
+ seen_children = 0;
+
+ for (i = 0; i < num_children; i++)
+ if (children[i].output_seen ||
+ children[i].exited)
+ seen_children++;
+
+ if (seen_children != num_children) {
+ ksft_print_msg("Waiting for %d children\n",
+ num_children - seen_children);
+ continue;
+ }
+
+ all_children_started = true;
+ }
+
+ ksft_print_msg("Sending signals, timeout remaining: %d00ms\n",
+ timeout);
+
+ for (i = 0; i < num_children; i++)
+ child_tickle(&children[i]);
+
+ /* Negative timeout means run indefinitely */
+ if (timeout < 0)
+ continue;
+ if (--timeout == 0)
+ break;
+ }
+
+ ksft_print_msg("Finishing up...\n");
+ terminate = true;
+
+ for (i = 0; i < tests; i++)
+ child_stop(&children[i]);
+
+ drain_output(false);
+
+ for (i = 0; i < tests; i++)
+ child_cleanup(&children[i]);
+
+ drain_output(true);
+
+ ksft_print_cnts();
+
+ return 0;
+}
--
2.30.2
There are two registers controlling the GCS state of EL0: GCSPR_EL0,
which is the current GCS pointer, and GCSCRE0_EL1, which has enable bits
for the specific GCS functionality enabled for EL0. Manage these on
context switch and process lifetime events; GCS is reset on exec(). Also
ensure that any changes to the GCS memory are visible to other PEs, and
that changes from other PEs are visible on this one, by issuing a GCSB
DSYNC when moving to or from a thread with GCS.
Since the current GCS configuration of a thread will be visible to
userspace, we store the configuration in the format used with userspace
and provide a helper which configures the system register as needed.
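To illustrate keeping the configuration in the userspace format and deriving the register value from it, here is a minimal userspace model of the translation gcs_set_el0_mode() performs. It is a sketch under stated assumptions: the MODE_* values are assumed to match PR_SHADOW_STACK_ENABLE/WRITE/PUSH, and the CR_* bit positions are hypothetical placeholders, not the architectural GCSCRE0_EL1 encodings.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace-format mode flags (assumed to match PR_SHADOW_STACK_*). */
#define MODE_ENABLE (1UL << 0)
#define MODE_WRITE  (1UL << 1)
#define MODE_PUSH   (1UL << 2)

/*
 * Hypothetical control register bits: illustrative placeholders only,
 * NOT the real GCSCRE0_EL1 bit positions.
 */
#define CR_nTR     (1ULL << 0)
#define CR_RVCHKEN (1ULL << 1)
#define CR_PCRSEL  (1ULL << 2)
#define CR_STREn   (1ULL << 3)
#define CR_PUSHMEn (1ULL << 4)

/* Derive the register value from the stored userspace-format mode. */
static uint64_t mode_to_cr(unsigned int mode)
{
	uint64_t cr = CR_nTR;	/* always set, as in gcs_set_el0_mode() */

	if (mode & MODE_ENABLE)
		cr |= CR_RVCHKEN | CR_PCRSEL;
	if (mode & MODE_WRITE)
		cr |= CR_STREn;
	if (mode & MODE_PUSH)
		cr |= CR_PUSHMEn;
	return cr;
}
```

Because the stored mode is what ptrace and signal frames report, only this one helper needs to know the register layout, and the value is simply recomputed each time a task is switched in.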
On systems that support GCS we always allow access to GCSPR_EL0. This
facilitates reporting of GCS faults if userspace implements disabling of
GCS on error: the GCS can still be discovered and examined even if GCS
has been disabled.
Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/asm/gcs.h | 24 ++++++++++++++++
arch/arm64/include/asm/processor.h | 6 ++++
arch/arm64/kernel/process.c | 56 ++++++++++++++++++++++++++++++++++++++
arch/arm64/mm/Makefile | 1 +
arch/arm64/mm/gcs.c | 39 ++++++++++++++++++++++++++
5 files changed, 126 insertions(+)
diff --git a/arch/arm64/include/asm/gcs.h b/arch/arm64/include/asm/gcs.h
index 7c5e95218db6..04594ef59dad 100644
--- a/arch/arm64/include/asm/gcs.h
+++ b/arch/arm64/include/asm/gcs.h
@@ -48,4 +48,28 @@ static inline u64 gcsss2(void)
return Xt;
}
+#ifdef CONFIG_ARM64_GCS
+
+static inline bool task_gcs_el0_enabled(struct task_struct *task)
+{
+ return task->thread.gcs_el0_mode & PR_SHADOW_STACK_ENABLE;
+}
+
+void gcs_set_el0_mode(struct task_struct *task);
+void gcs_free(struct task_struct *task);
+void gcs_preserve_current_state(void);
+
+#else
+
+static inline bool task_gcs_el0_enabled(struct task_struct *task)
+{
+ return false;
+}
+
+static inline void gcs_set_el0_mode(struct task_struct *task) { }
+static inline void gcs_free(struct task_struct *task) { }
+static inline void gcs_preserve_current_state(void) { }
+
+#endif
+
#endif
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 5b0a04810b23..6fc6dcbd494c 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -182,6 +182,12 @@ struct thread_struct {
u64 sctlr_user;
u64 svcr;
u64 tpidr2_el0;
+#ifdef CONFIG_ARM64_GCS
+ unsigned int gcs_el0_mode;
+ u64 gcspr_el0;
+ u64 gcs_base;
+ u64 gcs_size;
+#endif
};
static inline unsigned int thread_get_vl(struct thread_struct *thread,
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 7387b68c745b..fd80b43c2969 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -48,6 +48,7 @@
#include <asm/cacheflush.h>
#include <asm/exec.h>
#include <asm/fpsimd.h>
+#include <asm/gcs.h>
#include <asm/mmu_context.h>
#include <asm/mte.h>
#include <asm/processor.h>
@@ -271,12 +272,32 @@ static void flush_tagged_addr_state(void)
clear_thread_flag(TIF_TAGGED_ADDR);
}
+#ifdef CONFIG_ARM64_GCS
+
+static void flush_gcs(void)
+{
+ if (!system_supports_gcs())
+ return;
+
+ gcs_free(current);
+ current->thread.gcs_el0_mode = 0;
+ write_sysreg_s(0, SYS_GCSCRE0_EL1);
+ write_sysreg_s(0, SYS_GCSPR_EL0);
+}
+
+#else
+
+static void flush_gcs(void) { }
+
+#endif
+
void flush_thread(void)
{
fpsimd_flush_thread();
tls_thread_flush();
flush_ptrace_hw_breakpoint(current);
flush_tagged_addr_state();
+ flush_gcs();
}
void arch_release_task_struct(struct task_struct *tsk)
@@ -474,6 +495,40 @@ static void entry_task_switch(struct task_struct *next)
__this_cpu_write(__entry_task, next);
}
+#ifdef CONFIG_ARM64_GCS
+
+void gcs_preserve_current_state(void)
+{
+ if (task_gcs_el0_enabled(current))
+ current->thread.gcspr_el0 = read_sysreg_s(SYS_GCSPR_EL0);
+}
+
+static void gcs_thread_switch(struct task_struct *next)
+{
+ if (!system_supports_gcs())
+ return;
+
+ gcs_preserve_current_state();
+
+ gcs_set_el0_mode(next);
+ write_sysreg_s(next->thread.gcspr_el0, SYS_GCSPR_EL0);
+
+ /*
+ * Ensure that GCS changes are observable by/from other PEs in
+ * case of migration.
+ */
+ if (task_gcs_el0_enabled(current) || task_gcs_el0_enabled(next))
+ gcsb_dsync();
+}
+
+#else
+
+static void gcs_thread_switch(struct task_struct *next)
+{
+}
+
+#endif
+
/*
* ARM erratum 1418040 handling, affecting the 32bit view of CNTVCT.
* Ensure access is disabled when switching to a 32bit task, ensure
@@ -533,6 +588,7 @@ struct task_struct *__switch_to(struct task_struct *prev,
ssbs_thread_switch(next);
erratum_1418040_thread_switch(next);
ptrauth_thread_switch_user(next);
+ gcs_thread_switch(next);
/*
* Complete any pending TLB or cache maintenance on this CPU in case
diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
index dbd1bc95967d..4e7cb2f02999 100644
--- a/arch/arm64/mm/Makefile
+++ b/arch/arm64/mm/Makefile
@@ -10,6 +10,7 @@ obj-$(CONFIG_TRANS_TABLE) += trans_pgd.o
obj-$(CONFIG_TRANS_TABLE) += trans_pgd-asm.o
obj-$(CONFIG_DEBUG_VIRTUAL) += physaddr.o
obj-$(CONFIG_ARM64_MTE) += mteswap.o
+obj-$(CONFIG_ARM64_GCS) += gcs.o
KASAN_SANITIZE_physaddr.o += n
obj-$(CONFIG_KASAN) += kasan_init.o
diff --git a/arch/arm64/mm/gcs.c b/arch/arm64/mm/gcs.c
new file mode 100644
index 000000000000..b0a67efc522b
--- /dev/null
+++ b/arch/arm64/mm/gcs.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/syscalls.h>
+#include <linux/types.h>
+
+#include <asm/cpufeature.h>
+#include <asm/page.h>
+
+/*
+ * Apply the GCS mode configured for the specified task to the
+ * hardware.
+ */
+void gcs_set_el0_mode(struct task_struct *task)
+{
+ u64 gcscre0_el1 = GCSCRE0_EL1_nTR;
+
+ if (task->thread.gcs_el0_mode & PR_SHADOW_STACK_ENABLE)
+ gcscre0_el1 |= GCSCRE0_EL1_RVCHKEN | GCSCRE0_EL1_PCRSEL;
+
+ if (task->thread.gcs_el0_mode & PR_SHADOW_STACK_WRITE)
+ gcscre0_el1 |= GCSCRE0_EL1_STREn;
+
+ if (task->thread.gcs_el0_mode & PR_SHADOW_STACK_PUSH)
+ gcscre0_el1 |= GCSCRE0_EL1_PUSHMEn;
+
+ write_sysreg_s(gcscre0_el1, SYS_GCSCRE0_EL1);
+}
+
+void gcs_free(struct task_struct *task)
+{
+ if (task->thread.gcs_base)
+ vm_munmap(task->thread.gcs_base, task->thread.gcs_size);
+
+ task->thread.gcspr_el0 = 0;
+ task->thread.gcs_base = 0;
+ task->thread.gcs_size = 0;
+}
--
2.30.2
As discussed extensively in the changelog for the addition of this
syscall on x86 ("x86/shstk: Introduce map_shadow_stack syscall"), the
existing mmap() and madvise() syscalls do not fit well with the
security requirements for guarded control stacks, since they lead to
windows where memory is allocated but not yet protected, or to stacks
which are not properly and safely initialised. Instead, a new syscall,
map_shadow_stack(), has been defined which allocates and initialises a
shadow stack page.
Implement this for arm64. Two flags are provided, allowing applications
to request that the stack be initialised with a valid cap token at the
top of the stack and, optionally, an end of stack marker above that.
We accept a request for an end of stack marker alone, but since the
marker is a NULL pointer this is indistinguishable from not initialising
anything.
Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/mm/gcs.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 61 insertions(+)
diff --git a/arch/arm64/mm/gcs.c b/arch/arm64/mm/gcs.c
index 95f5cf599bc6..f34821d98d85 100644
--- a/arch/arm64/mm/gcs.c
+++ b/arch/arm64/mm/gcs.c
@@ -115,6 +115,67 @@ unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
return addr;
}
+SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags)
+{
+ unsigned long alloc_size;
+ unsigned long __user *cap_ptr;
+ unsigned long cap_val;
+ int ret = 0;
+ int cap_offset;
+
+ if (!system_supports_gcs())
+ return -EOPNOTSUPP;
+
+ if (flags & ~(SHADOW_STACK_SET_TOKEN | SHADOW_STACK_SET_MARKER))
+ return -EINVAL;
+
+ if (addr && (addr % PAGE_SIZE))
+ return -EINVAL;
+
+ if (size == 8 || size % 8)
+ return -EINVAL;
+
+ /*
+ * An overflow would result in attempting to write the cap token
+ * to the wrong location. Not catastrophic, but just return the right
+ * error code and block it.
+ */
+ alloc_size = PAGE_ALIGN(size);
+ if (alloc_size < size)
+ return -EOVERFLOW;
+
+ addr = alloc_gcs(addr, alloc_size, 0, false);
+ if (IS_ERR_VALUE(addr))
+ return addr;
+
+ /*
+ * Put a cap token at the end of the allocated region so it
+ * can be switched to.
+ */
+ if (flags & SHADOW_STACK_SET_TOKEN) {
+ /* Leave an extra empty frame as a top of stack marker? */
+ if (flags & SHADOW_STACK_SET_MARKER)
+ cap_offset = 2;
+ else
+ cap_offset = 1;
+
+ cap_ptr = (unsigned long __user *)(addr + size -
+ (cap_offset * sizeof(unsigned long)));
+ cap_val = GCS_CAP(cap_ptr);
+
+ put_user_gcs(cap_val, cap_ptr, &ret);
+ if (ret != 0) {
+ vm_munmap(addr, size);
+ return -EFAULT;
+ }
+
+ /* Ensure the new cap is visible for GCS */
+ gcsb_dsync();
+ }
+
+ return addr;
+}
+
/*
* Apply the GCS mode configured for the specified task to the
* hardware.
--
2.30.2
In preparation for testing GCS-related signal handling, add GCS as a
feature we check for in the signal handling support code.
Signed-off-by: Mark Brown <[email protected]>
---
tools/testing/selftests/arm64/signal/test_signals.h | 2 ++
tools/testing/selftests/arm64/signal/test_signals_utils.c | 3 +++
2 files changed, 5 insertions(+)
diff --git a/tools/testing/selftests/arm64/signal/test_signals.h b/tools/testing/selftests/arm64/signal/test_signals.h
index 1e6273d81575..7ada43688c02 100644
--- a/tools/testing/selftests/arm64/signal/test_signals.h
+++ b/tools/testing/selftests/arm64/signal/test_signals.h
@@ -35,6 +35,7 @@ enum {
FSME_BIT,
FSME_FA64_BIT,
FSME2_BIT,
+ FGCS_BIT,
FMAX_END
};
@@ -43,6 +44,7 @@ enum {
#define FEAT_SME (1UL << FSME_BIT)
#define FEAT_SME_FA64 (1UL << FSME_FA64_BIT)
#define FEAT_SME2 (1UL << FSME2_BIT)
+#define FEAT_GCS (1UL << FGCS_BIT)
/*
* A descriptor used to describe and configure a test case.
diff --git a/tools/testing/selftests/arm64/signal/test_signals_utils.c b/tools/testing/selftests/arm64/signal/test_signals_utils.c
index 0dc948db3a4a..89ef95c1af0e 100644
--- a/tools/testing/selftests/arm64/signal/test_signals_utils.c
+++ b/tools/testing/selftests/arm64/signal/test_signals_utils.c
@@ -30,6 +30,7 @@ static char const *const feats_names[FMAX_END] = {
" SME ",
" FA64 ",
" SME2 ",
+ " GCS ",
};
#define MAX_FEATS_SZ 128
@@ -329,6 +330,8 @@ int test_init(struct tdescr *td)
td->feats_supported |= FEAT_SME_FA64;
if (getauxval(AT_HWCAP2) & HWCAP2_SME2)
td->feats_supported |= FEAT_SME2;
+ if (getauxval(AT_HWCAP2) & HWCAP2_GCS)
+ td->feats_supported |= FEAT_GCS;
if (feats_ok(td)) {
if (td->feats_required & td->feats_supported)
fprintf(stderr,
--
2.30.2
Provide a new register type NT_ARM_GCS reporting the current GCS mode
and pointer for EL0. Due to the interactions with allocation and
deallocation of Guarded Control Stacks we do not permit GCS to be
enabled via ptrace; GCSPR_EL0 may be changed, as may the other mode and
lock bits.
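A debugger reads this state with PTRACE_GETREGSET and NT_ARM_GCS. The sketch below shows that call together with a userspace model of the checks gcs_set() applies to a write. struct sketch_user_gcs mirrors the struct user_gcs added here, the SS_* values are assumed to match PR_SHADOW_STACK_ENABLE/WRITE/PUSH, and read_gcs() only works against a stopped tracee on a GCS-capable arm64 kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef NT_ARM_GCS
#define NT_ARM_GCS 0x40e	/* value from the elf.h hunk in this patch */
#endif

/* Assumed to match PR_SHADOW_STACK_* in <linux/prctl.h>. */
#define SS_ENABLE    (1ULL << 0)
#define SS_WRITE     (1ULL << 1)
#define SS_PUSH      (1ULL << 2)
#define SS_SUPPORTED (SS_ENABLE | SS_WRITE | SS_PUSH)

/* Local mirror of the uapi struct user_gcs. */
struct sketch_user_gcs {
	uint64_t features_enabled;
	uint64_t features_locked;
	uint64_t gcspr_el0;
};

/* Debugger side: fetch a stopped tracee's GCS state. */
long read_gcs(pid_t pid, struct sketch_user_gcs *gcs)
{
	struct iovec iov = { .iov_base = gcs, .iov_len = sizeof(*gcs) };

	return ptrace(PTRACE_GETREGSET, pid, (void *)(uintptr_t)NT_ARM_GCS,
		      &iov);
}

/* Model of the validation gcs_set() performs before storing new state. */
static int gcs_set_check(uint64_t cur_mode, const struct sketch_user_gcs *req)
{
	if (req->features_enabled & ~SS_SUPPORTED)
		return -EINVAL;		/* unknown mode bits */
	if ((req->features_enabled & SS_ENABLE) && !(cur_mode & SS_ENABLE))
		return -EBUSY;		/* enabling via ptrace is refused */
	return 0;
}
```

As in gcs_set() as posted, the model refuses only enabling: clearing the enable bit or changing the lock bits on a task is accepted.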
Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/uapi/asm/ptrace.h | 8 +++++
arch/arm64/kernel/ptrace.c | 59 ++++++++++++++++++++++++++++++++++++
include/uapi/linux/elf.h | 1 +
3 files changed, 68 insertions(+)
diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h
index 7fa2f7036aa7..0f39ba4f3efd 100644
--- a/arch/arm64/include/uapi/asm/ptrace.h
+++ b/arch/arm64/include/uapi/asm/ptrace.h
@@ -324,6 +324,14 @@ struct user_za_header {
#define ZA_PT_SIZE(vq) \
(ZA_PT_ZA_OFFSET + ZA_PT_ZA_SIZE(vq))
+/* GCS state (NT_ARM_GCS) */
+
+struct user_gcs {
+ __u64 features_enabled;
+ __u64 features_locked;
+ __u64 gcspr_el0;
+};
+
#endif /* __ASSEMBLY__ */
#endif /* _UAPI__ASM_PTRACE_H */
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index dc6cf0e37194..c8dd489cfca8 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -34,6 +34,7 @@
#include <asm/cpufeature.h>
#include <asm/debug-monitors.h>
#include <asm/fpsimd.h>
+#include <asm/gcs.h>
#include <asm/mte.h>
#include <asm/pointer_auth.h>
#include <asm/stacktrace.h>
@@ -1411,6 +1412,51 @@ static int tagged_addr_ctrl_set(struct task_struct *target, const struct
}
#endif
+#ifdef CONFIG_ARM64_GCS
+static int gcs_get(struct task_struct *target,
+ const struct user_regset *regset,
+ struct membuf to)
+{
+ struct user_gcs user_gcs;
+
+ if (target == current)
+ gcs_preserve_current_state();
+
+ user_gcs.features_enabled = target->thread.gcs_el0_mode;
+ user_gcs.features_locked = target->thread.gcs_el0_locked;
+ user_gcs.gcspr_el0 = target->thread.gcspr_el0;
+
+ return membuf_write(&to, &user_gcs, sizeof(user_gcs));
+}
+
+static int gcs_set(struct task_struct *target, const struct
+ user_regset *regset, unsigned int pos,
+ unsigned int count, const void *kbuf, const
+ void __user *ubuf)
+{
+ int ret;
+ struct user_gcs user_gcs;
+
+ ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &user_gcs, 0, -1);
+ if (ret)
+ return ret;
+
+ if (user_gcs.features_enabled & ~PR_SHADOW_STACK_SUPPORTED_STATUS_MASK)
+ return -EINVAL;
+
+ /* Do not allow enable via ptrace */
+ if ((user_gcs.features_enabled & PR_SHADOW_STACK_ENABLE) &&
+ !(target->thread.gcs_el0_mode & PR_SHADOW_STACK_ENABLE))
+ return -EBUSY;
+
+ target->thread.gcs_el0_mode = user_gcs.features_enabled;
+ target->thread.gcs_el0_locked = user_gcs.features_locked;
+ target->thread.gcspr_el0 = user_gcs.gcspr_el0;
+
+ return 0;
+}
+#endif
+
enum aarch64_regset {
REGSET_GPR,
REGSET_FPR,
@@ -1439,6 +1485,9 @@ enum aarch64_regset {
#ifdef CONFIG_ARM64_TAGGED_ADDR_ABI
REGSET_TAGGED_ADDR_CTRL,
#endif
+#ifdef CONFIG_ARM64_GCS
+ REGSET_GCS,
+#endif
};
static const struct user_regset aarch64_regsets[] = {
@@ -1589,6 +1638,16 @@ static const struct user_regset aarch64_regsets[] = {
.set = tagged_addr_ctrl_set,
},
#endif
+#ifdef CONFIG_ARM64_GCS
+ [REGSET_GCS] = {
+ .core_note_type = NT_ARM_GCS,
+ .n = sizeof(struct user_gcs) / sizeof(u64),
+ .size = sizeof(u64),
+ .align = sizeof(u64),
+ .regset_get = gcs_get,
+ .set = gcs_set,
+ },
+#endif
};
static const struct user_regset_view user_aarch64_view = {
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index 9417309b7230..436dfc359f61 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -440,6 +440,7 @@ typedef struct elf64_shdr {
#define NT_ARM_SSVE 0x40b /* ARM Streaming SVE registers */
#define NT_ARM_ZA 0x40c /* ARM SME ZA registers */
#define NT_ARM_ZT 0x40d /* ARM SME ZT registers */
+#define NT_ARM_GCS 0x40e /* ARM GCS state */
#define NT_ARC_V2 0x600 /* ARCv2 accumulator/extra registers */
#define NT_VMCOREDD 0x700 /* Vmcore Device Dump Note */
#define NT_MIPS_DSP 0x800 /* MIPS DSP ASE registers */
--
2.30.2
Add a context for the GCS state and include it in the signal context when
running on a system that supports GCS. We reuse the same flags that the
prctl() uses to specify which GCS features are enabled and also provide the
current GCS pointer.
We do not support enabling GCS via signal return: there is a conflict
between specifying GCSPR_EL0 and allocating a new GCS, and this is not
an anticipated use case. We also enforce GCS configuration locking on
signal return.
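A signal handler that wants the GCS state can locate the record by walking the chained context records in the signal frame. This is a hedged userspace illustration: sketch_ctx_head is assumed to mirror struct _aarch64_ctx, sketch_gcs_context mirrors the struct gcs_context added here, and EXTRA_MAGIC records are not followed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GCS_MAGIC 0x47435300	/* from the sigcontext.h hunk */

/* Assumed mirror of struct _aarch64_ctx. */
struct sketch_ctx_head {
	uint32_t magic;
	uint32_t size;
};

/* Mirror of struct gcs_context. */
struct sketch_gcs_context {
	struct sketch_ctx_head head;
	uint64_t gcspr;
	uint64_t features_enabled;
	uint64_t reserved;
};

/*
 * Walk the records in a signal frame's __reserved area and return the
 * GCS record, or NULL. Records are chained by head.size and the list is
 * terminated by a zero magic.
 */
static const struct sketch_gcs_context *find_gcs(const void *base, size_t len)
{
	const char *p = base;
	size_t off = 0;

	while (len - off >= sizeof(struct sketch_ctx_head)) {
		struct sketch_ctx_head head;

		/* Copy the header out rather than dereferencing in place. */
		memcpy(&head, p + off, sizeof(head));

		if (head.magic == 0)
			return NULL;	/* end of record list */
		if (head.size < sizeof(head) || head.size > len - off)
			return NULL;	/* malformed record */
		if (head.magic == GCS_MAGIC &&
		    head.size == sizeof(struct sketch_gcs_context))
			return (const struct sketch_gcs_context *)(p + off);
		off += head.size;
	}
	return NULL;
}
```

The gcspr value found this way already accounts for the cap token the kernel pushes for the frame (preserve_gcs_context() subtracts 8 when GCS is enabled), which is what makes stack switching via sigreturn possible.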
Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/uapi/asm/sigcontext.h | 9 +++
arch/arm64/kernel/signal.c | 108 +++++++++++++++++++++++++++++++
2 files changed, 117 insertions(+)
diff --git a/arch/arm64/include/uapi/asm/sigcontext.h b/arch/arm64/include/uapi/asm/sigcontext.h
index f23c1dc3f002..7b66d245f2d2 100644
--- a/arch/arm64/include/uapi/asm/sigcontext.h
+++ b/arch/arm64/include/uapi/asm/sigcontext.h
@@ -168,6 +168,15 @@ struct zt_context {
__u16 __reserved[3];
};
+#define GCS_MAGIC 0x47435300
+
+struct gcs_context {
+ struct _aarch64_ctx head;
+ __u64 gcspr;
+ __u64 features_enabled;
+ __u64 reserved;
+};
+
#endif /* !__ASSEMBLY__ */
#include <asm/sve_context.h>
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 1cca646a7479..91bfb315bfd3 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -88,6 +88,7 @@ struct rt_sigframe_user_layout {
unsigned long fpsimd_offset;
unsigned long esr_offset;
+ unsigned long gcs_offset;
unsigned long sve_offset;
unsigned long tpidr2_offset;
unsigned long za_offset;
@@ -214,6 +215,8 @@ struct user_ctxs {
u32 za_size;
struct zt_context __user *zt;
u32 zt_size;
+ struct gcs_context __user *gcs;
+ u32 gcs_size;
};
static int preserve_fpsimd_context(struct fpsimd_context __user *ctx)
@@ -606,6 +609,83 @@ extern int restore_zt_context(struct user_ctxs *user);
#endif /* ! CONFIG_ARM64_SME */
+#ifdef CONFIG_ARM64_GCS
+
+static int preserve_gcs_context(struct gcs_context __user *ctx)
+{
+ int err = 0;
+ u64 gcspr;
+
+ /*
+ * We will add a cap token to the frame, so include it in the
+ * GCSPR_EL0 we report to support stack switching via
+ * sigreturn.
+ */
+ gcs_preserve_current_state();
+ gcspr = current->thread.gcspr_el0;
+ if (task_gcs_el0_enabled(current))
+ gcspr -= 8;
+
+ __put_user_error(GCS_MAGIC, &ctx->head.magic, err);
+ __put_user_error(sizeof(*ctx), &ctx->head.size, err);
+ __put_user_error(gcspr, &ctx->gcspr, err);
+ __put_user_error(0, &ctx->reserved, err);
+ __put_user_error(current->thread.gcs_el0_mode,
+ &ctx->features_enabled, err);
+
+ return err;
+}
+
+static int restore_gcs_context(struct user_ctxs *user)
+{
+ u64 gcspr, enabled;
+ int err = 0;
+
+ if (user->gcs_size != sizeof(*user->gcs))
+ return -EINVAL;
+
+ __get_user_error(gcspr, &user->gcs->gcspr, err);
+ __get_user_error(enabled, &user->gcs->features_enabled, err);
+ if (err)
+ return err;
+
+ /* Don't allow unknown modes */
+ if (enabled & ~PR_SHADOW_STACK_SUPPORTED_STATUS_MASK)
+ return -EINVAL;
+
+ err = gcs_check_locked(current, enabled);
+ if (err != 0)
+ return err;
+
+ /* Don't allow enabling */
+ if (!task_gcs_el0_enabled(current) &&
+ (enabled & PR_SHADOW_STACK_ENABLE))
+ return -EINVAL;
+
+ /* If we are disabling disable everything */
+ if (!(enabled & PR_SHADOW_STACK_ENABLE))
+ enabled = 0;
+
+ current->thread.gcs_el0_mode = enabled;
+
+ /*
+ * We let userspace set GCSPR_EL0 to anything here; it will
+ * be validated later in gcs_restore_signal().
+ */
+ current->thread.gcspr_el0 = gcspr;
+ write_sysreg_s(current->thread.gcspr_el0, SYS_GCSPR_EL0);
+
+ return 0;
+}
+
+#else /* ! CONFIG_ARM64_GCS */
+
+/* Turn any non-optimised out attempts to use these into a link error: */
+extern int preserve_gcs_context(void __user *ctx);
+extern int restore_gcs_context(struct user_ctxs *user);
+
+#endif /* ! CONFIG_ARM64_GCS */
+
static int parse_user_sigframe(struct user_ctxs *user,
struct rt_sigframe __user *sf)
{
@@ -622,6 +702,7 @@ static int parse_user_sigframe(struct user_ctxs *user,
user->tpidr2 = NULL;
user->za = NULL;
user->zt = NULL;
+ user->gcs = NULL;
if (!IS_ALIGNED((unsigned long)base, 16))
goto invalid;
@@ -716,6 +797,17 @@ static int parse_user_sigframe(struct user_ctxs *user,
user->zt_size = size;
break;
+ case GCS_MAGIC:
+ if (!system_supports_gcs())
+ goto invalid;
+
+ if (user->gcs)
+ goto invalid;
+
+ user->gcs = (struct gcs_context __user *)head;
+ user->gcs_size = size;
+ break;
+
case EXTRA_MAGIC:
if (have_extra_context)
goto invalid;
@@ -835,6 +927,9 @@ static int restore_sigframe(struct pt_regs *regs,
err = restore_fpsimd_context(&user);
}
+ if (err == 0 && system_supports_gcs() && user.gcs)
+ err = restore_gcs_context(&user);
+
if (err == 0 && system_supports_tpidr2() && user.tpidr2)
err = restore_tpidr2_context(&user);
@@ -954,6 +1049,13 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user,
return err;
}
+ if (system_supports_gcs()) {
+ err = sigframe_alloc(user, &user->gcs_offset,
+ sizeof(struct gcs_context));
+ if (err)
+ return err;
+ }
+
if (system_supports_sve() || system_supports_sme()) {
unsigned int vq = 0;
@@ -1047,6 +1149,12 @@ static int setup_sigframe(struct rt_sigframe_user_layout *user,
__put_user_error(current->thread.fault_code, &esr_ctx->esr, err);
}
+ if (system_supports_gcs() && err == 0 && user->gcs_offset) {
+ struct gcs_context __user *gcs_ctx =
+ apply_user_offset(user, user->gcs_offset);
+ err |= preserve_gcs_context(gcs_ctx);
+ }
+
/* Scalable Vector Extension state (including streaming), if present */
if ((system_supports_sve() || system_supports_sme()) &&
err == 0 && user->sve_offset) {
--
2.30.2
On Sat, 03 Feb 2024 12:25:39 +0000,
Mark Brown <[email protected]> wrote:
>
> GCS introduces a number of system registers for EL1 and EL0, on systems
and EL2.
> with GCS we need to context switch them and expose them to VMMs to allow
> guests to use GCS, as well as describe their fine grained traps to
> nested virtualisation. Traps are already disabled.
The latter is not true with NV, since the guest is in control of the
FGT registers.
>
> Signed-off-by: Mark Brown <[email protected]>
> ---
> arch/arm64/include/asm/kvm_host.h | 12 ++++++++++++
> arch/arm64/kvm/emulate-nested.c | 4 ++++
> arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 17 +++++++++++++++++
> arch/arm64/kvm/sys_regs.c | 22 ++++++++++++++++++++++
> 4 files changed, 55 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 21c57b812569..6c7ea7f9cd92 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -388,6 +388,12 @@ enum vcpu_sysreg {
> GCR_EL1, /* Tag Control Register */
> TFSRE0_EL1, /* Tag Fault Status Register (EL0) */
>
> + /* Guarded Control Stack registers */
> + GCSCRE0_EL1, /* Guarded Control Stack Control (EL0) */
> + GCSCR_EL1, /* Guarded Control Stack Control (EL1) */
This is subject to VNCR (0x8D0).
> + GCSPR_EL0, /* Guarded Control Stack Pointer (EL0) */
> + GCSPR_EL1, /* Guarded Control Stack Pointer (EL1) */
So is this one (0x8C0). And how about the *_EL2 versions?
> +
> /* 32bit specific registers. */
> DACR32_EL2, /* Domain Access Control Register */
> IFSR32_EL2, /* Instruction Fault Status Register */
> @@ -1221,6 +1227,12 @@ static inline bool __vcpu_has_feature(const struct kvm_arch *ka, int feature)
>
> #define vcpu_has_feature(v, f) __vcpu_has_feature(&(v)->kvm->arch, (f))
>
> +static inline bool has_gcs(void)
> +{
> + return IS_ENABLED(CONFIG_ARM64_GCS) &&
> + cpus_have_final_cap(ARM64_HAS_GCS);
> +}
> +
> int kvm_trng_call(struct kvm_vcpu *vcpu);
> #ifdef CONFIG_KVM
> extern phys_addr_t hyp_mem_base;
> diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
> index 431fd429932d..24eb7eccbae4 100644
> --- a/arch/arm64/kvm/emulate-nested.c
> +++ b/arch/arm64/kvm/emulate-nested.c
> @@ -1098,8 +1098,12 @@ static const struct encoding_to_trap_config encoding_to_fgt[] __initconst = {
> SR_FGT(SYS_ESR_EL1, HFGxTR, ESR_EL1, 1),
> SR_FGT(SYS_DCZID_EL0, HFGxTR, DCZID_EL0, 1),
> SR_FGT(SYS_CTR_EL0, HFGxTR, CTR_EL0, 1),
> + SR_FGT(SYS_GCSPR_EL0, HFGxTR, nGCS_EL0, 1),
> SR_FGT(SYS_CSSELR_EL1, HFGxTR, CSSELR_EL1, 1),
> SR_FGT(SYS_CPACR_EL1, HFGxTR, CPACR_EL1, 1),
> + SR_FGT(SYS_GCSCR_EL1, HFGxTR, nGCS_EL1, 1),
> + SR_FGT(SYS_GCSPR_EL1, HFGxTR, nGCS_EL1, 1),
> + SR_FGT(SYS_GCSCRE0_EL1, HFGxTR, nGCS_EL0, 1),
This is clearly wrong on all 4 counts (the n prefix gives it away...).
> SR_FGT(SYS_CONTEXTIDR_EL1, HFGxTR, CONTEXTIDR_EL1, 1),
> SR_FGT(SYS_CLIDR_EL1, HFGxTR, CLIDR_EL1, 1),
> SR_FGT(SYS_CCSIDR_EL1, HFGxTR, CCSIDR_EL1, 1),
> diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
> index bb6b571ec627..ec34d4a90717 100644
> --- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
> +++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
> @@ -25,6 +25,8 @@ static inline void __sysreg_save_user_state(struct kvm_cpu_context *ctxt)
> {
> ctxt_sys_reg(ctxt, TPIDR_EL0) = read_sysreg(tpidr_el0);
> ctxt_sys_reg(ctxt, TPIDRRO_EL0) = read_sysreg(tpidrro_el0);
> + if (has_gcs())
> + ctxt_sys_reg(ctxt, GCSPR_EL0) = read_sysreg_s(SYS_GCSPR_EL0);
We have had this discussion in the past. This must be based on the
VM's configuration. Guarding the check with the host capability is a
valuable optimisation, but that's nowhere near enough. See the series
that I have posted on this very subject (you're on Cc), but you are
welcome to invent your own mechanism in the meantime.
> }
>
> static inline bool ctxt_has_mte(struct kvm_cpu_context *ctxt)
> @@ -62,6 +64,12 @@ static inline void __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
> ctxt_sys_reg(ctxt, PAR_EL1) = read_sysreg_par();
> ctxt_sys_reg(ctxt, TPIDR_EL1) = read_sysreg(tpidr_el1);
>
> + if (has_gcs()) {
> + ctxt_sys_reg(ctxt, GCSPR_EL1) = read_sysreg_el1(SYS_GCSPR);
> + ctxt_sys_reg(ctxt, GCSCR_EL1) = read_sysreg_el1(SYS_GCSCR);
> + ctxt_sys_reg(ctxt, GCSCRE0_EL1) = read_sysreg_s(SYS_GCSCRE0_EL1);
> + }
> +
Same thing.
> if (ctxt_has_mte(ctxt)) {
> ctxt_sys_reg(ctxt, TFSR_EL1) = read_sysreg_el1(SYS_TFSR);
> ctxt_sys_reg(ctxt, TFSRE0_EL1) = read_sysreg_s(SYS_TFSRE0_EL1);
> @@ -95,6 +103,8 @@ static inline void __sysreg_restore_user_state(struct kvm_cpu_context *ctxt)
> {
> write_sysreg(ctxt_sys_reg(ctxt, TPIDR_EL0), tpidr_el0);
> write_sysreg(ctxt_sys_reg(ctxt, TPIDRRO_EL0), tpidrro_el0);
> + if (has_gcs())
> + write_sysreg_s(ctxt_sys_reg(ctxt, GCSPR_EL0), SYS_GCSPR_EL0);
> }
>
> static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
> @@ -138,6 +148,13 @@ static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
> write_sysreg(ctxt_sys_reg(ctxt, PAR_EL1), par_el1);
> write_sysreg(ctxt_sys_reg(ctxt, TPIDR_EL1), tpidr_el1);
>
> + if (has_gcs()) {
> + write_sysreg_el1(ctxt_sys_reg(ctxt, GCSPR_EL1), SYS_GCSPR);
> + write_sysreg_el1(ctxt_sys_reg(ctxt, GCSCR_EL1), SYS_GCSCR);
> + write_sysreg_s(ctxt_sys_reg(ctxt, GCSCRE0_EL1),
> + SYS_GCSCRE0_EL1);
> + }
> +
For the benefit of the unsuspecting reviewers, and in the absence of a
public specification (which the XML drop isn't), it would be good to
have the commit message explaining the rationale of what gets saved
when.
> if (ctxt_has_mte(ctxt)) {
> write_sysreg_el1(ctxt_sys_reg(ctxt, TFSR_EL1), SYS_TFSR);
> write_sysreg_s(ctxt_sys_reg(ctxt, TFSRE0_EL1), SYS_TFSRE0_EL1);
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 30253bd19917..83ba767e75d2 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -2000,6 +2000,23 @@ static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
> .visibility = mte_visibility, \
> }
>
> +static unsigned int gcs_visibility(const struct kvm_vcpu *vcpu,
> + const struct sys_reg_desc *rd)
> +{
> + if (has_gcs())
> + return 0;
Yet another case of exposing potentially unwanted state, to the VMM
this time.
> +
> + return REG_HIDDEN;
> +}
> +
> +#define GCS_REG(name) { \
> + SYS_DESC(SYS_##name), \
> + .access = undef_access, \
> + .reset = reset_unknown, \
> + .reg = name, \
> + .visibility = gcs_visibility, \
> +}
> +
> static unsigned int el2_visibility(const struct kvm_vcpu *vcpu,
> const struct sys_reg_desc *rd)
> {
> @@ -2376,6 +2393,10 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> PTRAUTH_KEY(APDB),
> PTRAUTH_KEY(APGA),
>
> + GCS_REG(GCSCR_EL1),
> + GCS_REG(GCSPR_EL1),
> + GCS_REG(GCSCRE0_EL1),
> +
> { SYS_DESC(SYS_SPSR_EL1), access_spsr},
> { SYS_DESC(SYS_ELR_EL1), access_elr},
>
> @@ -2462,6 +2483,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> { SYS_DESC(SYS_SMIDR_EL1), undef_access },
> { SYS_DESC(SYS_CSSELR_EL1), access_csselr, reset_unknown, CSSELR_EL1 },
> { SYS_DESC(SYS_CTR_EL0), access_ctr },
> + GCS_REG(GCSPR_EL0),
> { SYS_DESC(SYS_SVCR), undef_access },
>
> { PMU_SYS_REG(PMCR_EL0), .access = access_pmcr, .reset = reset_pmcr,
>
Thanks,
M.
On Mon, Feb 05, 2024 at 09:46:16AM +0000, Marc Zyngier wrote:
> On Sat, 03 Feb 2024 12:25:39 +0000,
> Mark Brown <[email protected]> wrote:
> > +++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
> > @@ -25,6 +25,8 @@ static inline void __sysreg_save_user_state(struct kvm_cpu_context *ctxt)
> > {
> > ctxt_sys_reg(ctxt, TPIDR_EL0) = read_sysreg(tpidr_el0);
> > ctxt_sys_reg(ctxt, TPIDRRO_EL0) = read_sysreg(tpidrro_el0);
> > + if (has_gcs())
> > + ctxt_sys_reg(ctxt, GCSPR_EL0) = read_sysreg_s(SYS_GCSPR_EL0);
> We have had this discussion in the past. This must be based on the
> VM's configuration. Guarding the check with the host capability is a
> valuable optimisation, but that's nowhere near enough. See the series
> that I have posted on this very subject (you're on Cc), but you are
> welcome to invent your own mechanism in the meantime.
Right, which postdates the version you're replying to and isn't merged
yet - the current code was what you were asking for at the time. I'm
expecting to update all these feature series to work with that once it
gets finalised and merged but it's not there yet, I do see I forgot to
put a note in v9 about that like I did for dpISA - sorry about that, I
was too focused on the clone3() rework when rebasing onto the new
kernel.
This particular series isn't going to get merged for a while yet anyway
due to the time it'll take for userspace testing, I'm expecting your
series to be in by the time it becomes an issue.
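A minimal sketch of the distinction Marc is drawing, using hypothetical names: the host capability (what has_gcs() reports) is only a cheap first filter, and the VM's own configuration decides whether guest GCS state is switched. The struct and function below are illustrative and do not correspond to KVM data structures. How KVM actually tracks per-VM features is what the idreg series mentioned above is meant to settle.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-VM view; field names are illustrative only. */
struct gcs_switch_ctx {
	bool host_has_gcs;	/* what the host capability check reports */
	bool vm_has_gcs;	/* whether this VM's configuration exposes GCS */
};

/*
 * A guest whose VM was configured without GCS should not have GCS
 * state switched, even when running on GCS-capable hardware.
 */
static bool gcs_state_needs_switch(const struct gcs_switch_ctx *ctx)
{
	return ctx->host_has_gcs && ctx->vm_has_gcs;
}
```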
> > + if (has_gcs()) {
> > + write_sysreg_el1(ctxt_sys_reg(ctxt, GCSPR_EL1), SYS_GCSPR);
> > + write_sysreg_el1(ctxt_sys_reg(ctxt, GCSCR_EL1), SYS_GCSCR);
> > + write_sysreg_s(ctxt_sys_reg(ctxt, GCSCRE0_EL1),
> > + SYS_GCSCRE0_EL1);
> > + }
> For the benefit of the unsuspecting reviewers, and in the absence of a
> public specification (which the XML drop isn't), it would be good to
> have the commit message explaining the rationale of what gets saved
> when.
What are you looking for in terms of rationale here? The KVM house
style is often very reliant on reader context so it would be good to
know what considerations you'd like to see explicitly addressed. These
registers shouldn't do anything when we aren't running the guest so
they're not terribly ordering sensitive, the EL2 ones will need a bit
more consideration in the face of nested virt.
On Mon, 05 Feb 2024 12:35:53 +0000,
Mark Brown <[email protected]> wrote:
>
> On Mon, Feb 05, 2024 at 09:46:16AM +0000, Marc Zyngier wrote:
> > On Sat, 03 Feb 2024 12:25:39 +0000,
> > Mark Brown <[email protected]> wrote:
>
> > > +++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
> > > @@ -25,6 +25,8 @@ static inline void __sysreg_save_user_state(struct kvm_cpu_context *ctxt)
> > > {
> > > ctxt_sys_reg(ctxt, TPIDR_EL0) = read_sysreg(tpidr_el0);
> > > ctxt_sys_reg(ctxt, TPIDRRO_EL0) = read_sysreg(tpidrro_el0);
> > > + if (has_gcs())
> > > + ctxt_sys_reg(ctxt, GCSPR_EL0) = read_sysreg_s(SYS_GCSPR_EL0);
>
> > We have had this discussion in the past. This must be based on the
> > VM's configuration. Guarding the check with the host capability is a
> > valuable optimisation, but that's nowhere near enough. See the series
> > that I have posted on this very subject (you're on Cc), but you are
> > welcome to invent your own mechanism in the meantime.
>
> Right, which postdates the version you're replying to and isn't merged
> yet - the current code was what you were asking for at the time.
v1 and v2 predate it. And if the above is what I did ask, then I must
have done a very poor job of explaining what was required. For which I
apologise profusely.
> I'm
> expecting to update all these feature series to work with that once it
> gets finalised and merged but it's not there yet, I do see I forgot to
> put a note in v9 about that like I did for dpISA - sorry about that, I
> was too focused on the clone3() rework when rebasing onto the new
> kernel.
>
> This particular series isn't going to get merged for a while yet anyway
> due to the time it'll take for userspace testing, I'm expecting your
> series to be in by the time it becomes an issue.
Right. Then I'll ignore it for the foreseeable future.
>
> > > + if (has_gcs()) {
> > > + write_sysreg_el1(ctxt_sys_reg(ctxt, GCSPR_EL1), SYS_GCSPR);
> > > + write_sysreg_el1(ctxt_sys_reg(ctxt, GCSCR_EL1), SYS_GCSCR);
> > > + write_sysreg_s(ctxt_sys_reg(ctxt, GCSCRE0_EL1),
> > > + SYS_GCSCRE0_EL1);
> > > + }
>
> > For the benefit of the unsuspecting reviewers, and in the absence of a
> > public specification (which the XML drop isn't), it would be good to
> > have the commit message explaining the rationale of what gets saved
> > when.
>
> What are you looking for in terms of rationale here? The KVM house
> style is often very reliant on reader context so it would be good to
> know what considerations you'd like to see explicitly addressed.
Nothing to do with style, everything to do with substance: if nothing
in the host kernel makes any use of these registers, why are they
eagerly saved/restored on nVHE/hVHE? I'm sure you have a reason for
it, but it isn't that obvious. Because these two modes need all the
help they can get in terms of overhead reduction.
> These
> registers shouldn't do anything when we aren't running the guest so
> they're not terribly ordering sensitive, the EL2 ones will need a bit
> more consideration in the face of nested virt.
The EL2 registers should follow the exact same pattern, especially once
you fix the VNCR bugs I pointed out.
M.
On Mon, Feb 05, 2024 at 03:34:05PM +0000, Marc Zyngier wrote:
> Mark Brown <[email protected]> wrote:
> > On Mon, Feb 05, 2024 at 09:46:16AM +0000, Marc Zyngier wrote:
> > > We have had this discussion in the past. This must be based on the
> > > VM's configuration. Guarding the check with the host capability is a
> > > valuable optimisation, but that's nowhere near enough. See the series
> > > that I have posted on this very subject (you're on Cc), but you are
> > > welcome to invent your own mechanism in the meantime.
> > Right, which postdates the version you're replying to and isn't merged
> > yet - the current code was what you were asking for at the time.
> v1 and v2 predate it. And if the above is what I did ask, then I must
> have done a very poor job of explaining what was required. For which I
> apologise profusely.
To be clear, it's what was asked for prior to the forthcoming switch to
the idreg parsing scheme. I haven't pulled in your idregs work yet since
it's being rapidly iterated and this is an already large series with
dependencies.
> > I'm
> > expecting to update all these feature series to work with that once it
> > gets finalised and merged but it's not there yet, I do see I forgot to
> > put a note in v9 about that like I did for dpISA - sorry about that, I
> > was too focused on the clone3() rework when rebasing onto the new
> > kernel.
> > This particular series isn't going to get merged for a while yet anyway
> > due to the time it'll take for userspace testing, I'm expecting your
> > series to be in by the time it becomes an issue.
> Right. Then I'll ignore it for the foreseeable future.
Actually, now I think about it, would you be open to merging the guest
context switching bit without the rest of the series (pending me fixing
the issues you raise, of course)? If so I'll split that bit out in the
hope that we can reduce the size of the series and the CC list for the
userspace support, which I imagine would make people a bit happier.
> > > > + write_sysreg_s(ctxt_sys_reg(ctxt, GCSCRE0_EL1),
> > > > + SYS_GCSCRE0_EL1);
> > > > + }
> > > For the benefit of the unsuspecting reviewers, and in the absence of a
> > > public specification (which the XML drop isn't), it would be good to
> > > have the commit message explaining the rationale of what gets saved
> > > when.
> > What are you looking for in terms of rationale here? The KVM house
> > style is often very reliant on reader context so it would be good to
> > know what considerations you'd like to see explicitly addressed.
> Nothing to do with style, everything to do with substance: if nothing
The style I'm referring to there is the style for documentation.
> in the host kernel makes any use of these registers, why are they
> eagerly saved/restored on nVHE/hVHE? I'm sure you have a reason for
> it, but it isn't that obvious. Because these two modes need all the
> help they can get in terms of overhead reduction.
Ah, I see - yes, they should probably be moved somewhere else. Though
I'm not clear why some of the other registers that we're saving and
restoring in the same place are handled eagerly. The userspace
TPIDRs stand out, for example: they're taken care of in
__sysreg_save_user_state(), which is called in the same paths. IIRC my
thinking there was something along the lines of "this is where we save
and restore everything else that's just a general system register, I
should be consistent".
Am I right in thinking kvm_arch_vcpu_load()/_put() would make sense?
Everything currently in there looked like it was there because it does
something more complex than a simple register save/restore, and we
weren't worrying too much about what was going on with plain sysregs.
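A crude, hypothetical cost model of the eager versus load/put question, not taken from KVM: it assumes each GCS register the host never uses needs one save and one restore per switch, and ignores everything else a real world switch does. Switching on every guest entry/exit pays that cost per run loop iteration, while switching from vcpu_load()/vcpu_put() pays it once per load/put window.

```c
#include <assert.h>

/* Accesses when switching on every guest entry/exit (assumed model). */
static unsigned long eager_gcs_accesses(unsigned long entries,
					unsigned long regs)
{
	return entries * regs * 2;	/* one save + one restore per reg per entry */
}

/* Accesses when switching once from vcpu_load()/vcpu_put() (assumed model). */
static unsigned long lazy_gcs_accesses(unsigned long regs)
{
	return regs * 2;		/* one save + one restore per reg */
}
```

With the four GCS registers from the patch and, say, 1000 guest entries in one load/put window, the model gives 8000 accesses for eager switching against 8 for the lazy approach, which is the overhead Marc is pointing at for nVHE/hVHE.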
> > These
> > registers shouldn't do anything when we aren't running the guest so
> > they're not terribly ordering sensitive, the EL2 ones will need a bit
> > more consideration in the face of nested virt.
> The EL2 registers should follow the exact same pattern, especially once
> you fix the VNCR bugs I pointed out.
Great, that's what I'd thought thanks - I hadn't checked yet.
Hello,
Mark Brown <[email protected]> writes:
> Changes in v8:
> - Invalidate signal cap token on stack when consuming.
> - Typo and other trivial fixes.
> - Don't try to use process_vm_write() on GCS, it intentionally does not
> work.
> - Fix leak of thread GCSs.
> - Rebase onto latest clone3() series.
> - Link to v7: https://lore.kernel.org/r/[email protected]
Thank you for addressing my comments. I still have a few nits and
questions in some patches, but regardless of them:
Reviewed-by: Thiago Jung Bauermann <[email protected]>
--
Thiago
Mark Brown <[email protected]> writes:
> When a new thread is created by a thread with GCS enabled the GCS needs
> to be specified along with the regular stack. clone3() has been
> extended to support this case, allowing userspace to explicitly specify
> the size and location of the GCS. The specified GCS must have a valid
> GCS token at the top of the stack, as though userspace were pivoting to
> the new GCS. This will be consumed on use. At present we do not
> atomically consume the token, this will be addressed in a future
> revision.
>
> Unfortunately plain clone() is not extensible and existing clone3()
> users will not specify a stack so all existing code would be broken if
> we mandated specifying the stack explicitly. For compatibility with
> these cases and also x86 (which did not initially implement clone3()
> support for shadow stacks) if no GCS is specified we will allocate one
> thread so when a thread is created which has GCS enabled allocate one
~~~~~~
This "thread" seems extraneous in the sentence. Remove it?
> for it. We follow the extensively discussed x86 implementation and
> allocate min(RLIMIT_STACK, 4G). Since the GCS only stores the call
Isn't it min(RLIMIT_STACK/2, 2G), as seen in gcs_size()? If true, this
size should also be fixed in Documentation/arch/arm64/gcs.rst.
> stack and not any variables this should be more than sufficient for most
> applications.
>
> GCSs allocated via this mechanism then it will be freed when the thread
> exits, those explicitly configured by the user will not.
I'm not sure I parsed this sentence correctly. Is it missing an "If" at
the beginning?
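A userspace model of the default size Thiago describes for gcs_size(), min(RLIMIT_STACK/2, 2G). The page alignment and the one-page floor are assumptions made for the sketch, as is the 4K page size; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE	4096ULL		/* assumed 4K pages */
#define MODEL_SZ_2G	(2ULL << 30)

/* min(RLIMIT_STACK / 2, 2G), page aligned, at least one page (assumed). */
static uint64_t model_default_gcs_size(uint64_t rlimit_stack)
{
	uint64_t size = rlimit_stack / 2;

	if (size > MODEL_SZ_2G)
		size = MODEL_SZ_2G;
	size = (size + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1);
	return size ? size : MODEL_PAGE_SIZE;
}
```

With a common 8MiB RLIMIT_STACK this gives 4MiB, half of what the min(RLIMIT_STACK, 4G) formula in the commit message would give, and an unlimited RLIMIT_STACK is capped at 2G.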
> +unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
> + const struct kernel_clone_args *args)
> +{
> + unsigned long addr, size, gcspr_el0;
> +
> + /* If the user specified a GCS use it. */
> + if (args->shadow_stack_size) {
> + if (!system_supports_gcs())
> + return (unsigned long)ERR_PTR(-EINVAL);
> +
> + addr = args->shadow_stack;
> + size = args->shadow_stack_size;
> +
> + /*
> + * There should be a token, there might be an end of
> + * stack marker.
> + */
> + gcspr_el0 = addr + size - (2 * sizeof(u64));
> + if (!gcs_consume_token(tsk, gcspr_el0)) {
Should this code validate the end of stack marker? Or doesn't it matter
whether the marker is correct or not?
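To make the question concrete, here is a userspace simulation of the token search in the quoted hunk. The cap encoding (bits 63:12 of the token's own address, with 0x1 in the low bits marking a valid cap) is an assumption for the sketch, and sim_consume_token() is a hypothetical stand-in for gcs_consume_token() on an ordinary array. Like the quoted code, it never inspects the slot above the token, so a bogus value where the end of stack marker should be goes unnoticed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed cap encoding: page bits of the token's address | valid bit. */
#define SIM_CAP_ADDR_MASK	(~(uint64_t)0xfff)
#define SIM_CAP_VALID		0x1ULL
#define SIM_CAP(va)		(((uint64_t)(va) & SIM_CAP_ADDR_MASK) | SIM_CAP_VALID)

struct sim_gcs {
	uint64_t base;		/* simulated address of slot[0] */
	uint64_t slot[512];	/* one 4K page of 64-bit entries */
};

static uint64_t *sim_slot(struct sim_gcs *g, uint64_t va)
{
	return &g->slot[(va - g->base) / sizeof(uint64_t)];
}

/* Accept a valid cap at va and zero it so it cannot be reused. */
static bool sim_consume_token(struct sim_gcs *g, uint64_t va)
{
	uint64_t *p = sim_slot(g, va);

	if (*p != SIM_CAP(va))
		return false;
	*p = 0;
	return true;
}

/*
 * Same order as the quoted patch: try the token two entries below the
 * top (layout with an end marker), then one entry below the top.
 * Returns the resulting GCSPR_EL0, or 0 if no valid token was found.
 */
static uint64_t sim_thread_gcspr(struct sim_gcs *g, uint64_t addr,
				 uint64_t size)
{
	uint64_t gcspr = addr + size - 2 * sizeof(uint64_t);

	if (!sim_consume_token(g, gcspr)) {
		gcspr += sizeof(uint64_t);
		if (!sim_consume_token(g, gcspr))
			return 0;
	}
	return gcspr + sizeof(uint64_t);
}
```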
> + gcspr_el0 += sizeof(u64);
> + if (!gcs_consume_token(tsk, gcspr_el0))
> + return (unsigned long)ERR_PTR(-EINVAL);
> + }
> +
> + /* Userspace is responsible for unmapping */
> + tsk->thread.gcspr_el0 = gcspr_el0 + sizeof(u64);
> + } else {
--
Thiago
Mark Brown <[email protected]> writes:
> +#ifdef CONFIG_ARM64_GCS
> +static int gcs_restore_signal(void)
> +{
> + u64 gcspr_el0, cap;
> + int ret;
> +
> + if (!system_supports_gcs())
> + return 0;
> +
> + if (!(current->thread.gcs_el0_mode & PR_SHADOW_STACK_ENABLE))
> + return 0;
> +
> + gcspr_el0 = read_sysreg_s(SYS_GCSPR_EL0);
> +
> + /*
> + * GCSPR_EL0 should be pointing at a capped GCS, read the cap...
> + */
> + gcsb_dsync();
> + ret = copy_from_user(&cap, (__user void*)gcspr_el0, sizeof(cap));
> + if (ret)
> + return -EFAULT;
> +
> + /*
> + * ...then check that the cap is the actual GCS before
> + * restoring it.
> + */
> + if (!gcs_signal_cap_valid(gcspr_el0, cap))
> + return -EINVAL;
> +
> + /* Invalidate the token to prevent reuse */
> + put_user_gcs(0, (__user void*)gcspr_el0, &ret);
> + if (ret != 0)
> + return -EFAULT;
You had mentioned that "ideally we'd be doing a compare and exchange
here to substitute in a zero". Is a compare and exchange not necessary
anymore, or is it just being left for later? In the latter case, a TODO
or FIXME comment mentioning it would be useful here.
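For comparison, the compare-and-exchange semantics can be modelled in plain C11 on ordinary memory. This is only a sketch of the intended behaviour: GCS pages can only be written with GCS operations, so a kernel version would need a GCS-specific mechanism rather than C11 atomics, and consume_cap_once() is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Replace the validated cap with zero only if the slot still holds
 * exactly that cap, so two racing consumers cannot both succeed. A
 * "check, then put 0" sequence leaves a window between the two steps.
 */
static bool consume_cap_once(_Atomic uint64_t *slot, uint64_t cap)
{
	uint64_t expected = cap;

	return atomic_compare_exchange_strong(slot, &expected, 0);
}
```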
> +
> + current->thread.gcspr_el0 = gcspr_el0 + sizeof(cap);
> + write_sysreg_s(current->thread.gcspr_el0, SYS_GCSPR_EL0);
> +
> + return 0;
> +}
--
Thiago
Mark Brown <[email protected]> writes:
> Do some testing of the signal handling for GCS, checking that a GCS
> frame has the expected information in it and that the expected signals
> are delivered with invalid operations.
>
> Signed-off-by: Mark Brown <[email protected]>
> ---
> tools/testing/selftests/arm64/signal/.gitignore | 1 +
> .../selftests/arm64/signal/test_signals_utils.h | 10 +++
> .../arm64/signal/testcases/gcs_exception_fault.c | 62 +++++++++++++++
> .../selftests/arm64/signal/testcases/gcs_frame.c | 88 ++++++++++++++++++++++
> .../arm64/signal/testcases/gcs_write_fault.c | 67 ++++++++++++++++
> 5 files changed, 228 insertions(+)
Just FYI, in v7 I reported that gcs_write_fault was failing for me.
Now all tests in this patch are passing.
--
Thiago
Mark Brown <[email protected]> writes:
> There are things like threads which nolibc struggles with which we want
> to add coverage for, and the ABI allows us to test most of these even if
> libc itself does not understand GCS so add a test application built
> using the system libc.
>
> Signed-off-by: Mark Brown <[email protected]>
> ---
> tools/testing/selftests/arm64/gcs/.gitignore | 1 +
> tools/testing/selftests/arm64/gcs/Makefile | 4 +-
> tools/testing/selftests/arm64/gcs/gcs-util.h | 10 +
> tools/testing/selftests/arm64/gcs/libc-gcs.c | 736 +++++++++++++++++++++++++++
> 4 files changed, 750 insertions(+), 1 deletion(-)
In v7, several tests weren't running in my FVT VM for some reason.
This time they do:
$ ./run_kselftest.sh -t arm64:libc-gcs
TAP version 13
1..1
# timeout set to 45
# selftests: arm64: libc-gcs
# TAP version 13
# 1..118
# # Starting 118 tests from 32 test cases.
# # RUN global.can_call_function ...
# # can_call_function: Test terminated unexpectedly by signal 11
# # FAIL global.can_call_function
# not ok 1 global.can_call_function
# # RUN global.gcs_enabled_thread ...
# # OK global.gcs_enabled_thread
# ok 2 global.gcs_enabled_thread
⋮
# # RUN invalid_mprotect.exec_bti.do_map_read ...
# # Allocated stack from 0xffffb3aa9000-0xffffb3aaa000
# # OK invalid_mprotect.exec_bti.do_map_read
# ok 118 invalid_mprotect.exec_bti.do_map_read
# # FAILED: 117 / 118 tests passed.
# # Totals: pass:117 fail:1 xfail:0 xpass:0 skip:0 error:0
not ok 1 selftests: arm64: libc-gcs # exit=1
The only issue as can be seen above is that the can_call_function test
is failing. The child is getting a GCS Segmentation fault when returning
from fork().
I tried debugging it with GDB, but I don't see what's wrong since the
address in LR matches the entry at the top of the GCS (the one GCSPR
points to). Here is the debug session:
(gdb) break libc-gcs.c:58
Breakpoint 1 at 0x3894: file libc-gcs.c, line 58.
(gdb) set follow-fork-mode child
(gdb) r
Starting program: /var/tmp/selftests/arm64/libc-gcs
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
TAP version 13
1..118
# Starting 118 tests from 32 test cases.
# RUN global.can_call_function ...
[Attaching after Thread 0xfffff7ff7e80 (LWP 9164) fork to child process 9168]
[New inferior 2 (process 9168)]
[Detaching after fork from parent process 9164]
[Inferior 1 (process 9164) detached]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
Thread 2.1 "libc-gcs" received signal SIGSEGV, Segmentation fault
Guarded Control Stack error.
[Switching to Thread 0xfffff7ff7e80 (LWP 9168)]
0x0000fffff7ec6fc0 [GCS error] in __GI__Fork () at ../sysdeps/nptl/_Fork.c:50
warning: 50 ../sysdeps/nptl/_Fork.c: No such file or directory
(gdb) bt
#0 0x0000fffff7ec6fc0 [GCS error] in __GI__Fork () at ../sysdeps/nptl/_Fork.c:50
#1 0x0000fffff7ec6be0 in __libc_fork () at ./posix/fork.c:73
#2 0x0000aaaaaaaa49b8 in __run_test (f=f@entry=0xaaaaaaab98c8 <_fixture_global>,
variant=variant@entry=0xffffffffefb8, t=t@entry=0xaaaaaaab81b0 <_can_call_function_object>)
at /home/thiago.bauermann/src/linux/tools/testing/selftests/kselftest_harness.h:1128
#3 0x0000aaaaaaaa2ac4 in test_harness_run (argv=0xfffffffff158, argc=1)
at /home/thiago.bauermann/src/linux/tools/testing/selftests/kselftest_harness.h:1199
#4 main (argc=1, argv=0xfffffffff158) at libc-gcs.c:735
(gdb) p $gcspr
$1 = (void *) 0xfffff7dfffe0
(gdb) p/x $lr
$3 = 0xfffff7ec6be0
(gdb) p/x *(unsigned long *)$gcspr
$5 = 0xfffff7ec6be0
(gdb) disassemble
Dump of assembler code for function __GI__Fork:
0x0000fffff7ec6f70 <+0>: mrs x5, tpidr_el0
0x0000fffff7ec6f74 <+4>: mov x0, #0x11 // #17
0x0000fffff7ec6f78 <+8>: sub x6, x5, #0x7c0
0x0000fffff7ec6f7c <+12>: sub x4, x5, #0x6f0
0x0000fffff7ec6f80 <+16>: movk x0, #0x120, lsl #16
0x0000fffff7ec6f84 <+20>: mov x1, #0x0 // #0
0x0000fffff7ec6f88 <+24>: mov x2, #0x0 // #0
0x0000fffff7ec6f8c <+28>: mov x3, #0x0 // #0
0x0000fffff7ec6f90 <+32>: mov x8, #0xdc // #220
0x0000fffff7ec6f94 <+36>: svc #0x0
0x0000fffff7ec6f98 <+40>: cmn x0, #0x1, lsl #12
0x0000fffff7ec6f9c <+44>: b.hi 0xfffff7ec6fc4 <__GI__Fork+84> // b.pmore
0x0000fffff7ec6fa0 <+48>: mov w2, w0
0x0000fffff7ec6fa4 <+52>: cbnz w0, 0xfffff7ec6fbc <__GI__Fork+76>
0x0000fffff7ec6fa8 <+56>: sub x0, x5, #0x6e0
0x0000fffff7ec6fac <+60>: mov x1, #0x18 // #24
0x0000fffff7ec6fb0 <+64>: mov x8, #0x63 // #99
0x0000fffff7ec6fb4 <+68>: stp x0, x0, [x6, #216]
0x0000fffff7ec6fb8 <+72>: svc #0x0
0x0000fffff7ec6fbc <+76>: mov w0, w2
=> 0x0000fffff7ec6fc0 <+80>: ret
0x0000fffff7ec6fc4 <+84>: adrp x1, 0xfffff7faa000 <sys_siglist+424>
0x0000fffff7ec6fc8 <+88>: ldr x1, [x1, #3528]
0x0000fffff7ec6fcc <+92>: neg w0, w0
0x0000fffff7ec6fd0 <+96>: mov w2, #0xffffffff // #-1
0x0000fffff7ec6fd4 <+100>: str w0, [x5, x1]
0x0000fffff7ec6fd8 <+104>: mov w0, w2
0x0000fffff7ec6fdc <+108>: ret
End of assembler dump.
(gdb) p $w0
$8 = 0
(gdb) p $_siginfo.si_signo
$12 = 11
(gdb) p $_siginfo.si_code
$13 = 10
(gdb) p $_siginfo._sifields._sigfault.si_addr
$14 = (void *) 0xfffff7ec6fc0 <__GI__Fork+80>
--
Thiago
On Sat, Feb 3, 2024, at 7:25 AM, Mark Brown wrote:
> The arm64 Guarded Control Stack (GCS) feature provides support for
> hardware protected stacks of return addresses, intended to provide
> hardening against return oriented programming (ROP) attacks and to make
> it easier to gather call stacks for applications such as profiling.
>
> When GCS is active a secondary stack called the Guarded Control Stack is
> maintained, protected with a memory attribute which means that it can
> only be written with specific GCS operations. The current GCS pointer
> can not be directly written to by userspace. When a BL is executed the
> value stored in LR is also pushed onto the GCS, and when a RET is
> executed the top of the GCS is popped and compared to LR with a fault
> being raised if the values do not match. GCS operations may only be
> performed on GCS pages, a data abort is generated if they are not.
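A toy software model of the BL/RET behaviour described in the paragraph above. It grows downwards like the real GCS, but it is an ordinary array: the hardware protection of GCS pages, and exactly when GCSPR is updated on a mismatch, are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_GCS_ENTRIES 64

struct toy_gcs {
	uint64_t mem[TOY_GCS_ENTRIES];
	size_t gcspr;	/* index of the top entry; TOY_GCS_ENTRIES when empty */
};

/* BL: the return address goes into LR and is also pushed onto the GCS. */
static bool toy_bl(struct toy_gcs *g, uint64_t ret_addr, uint64_t *lr)
{
	if (g->gcspr == 0)
		return false;		/* toy overflow */
	*lr = ret_addr;
	g->mem[--g->gcspr] = ret_addr;
	return true;
}

/* RET: pop the top entry and compare it with LR; false stands for a fault. */
static bool toy_ret(struct toy_gcs *g, uint64_t lr)
{
	if (g->gcspr == TOY_GCS_ENTRIES)
		return false;		/* toy underflow */
	return g->mem[g->gcspr++] == lr;
}
```

A return through an LR that was overwritten (for example by a ROP gadget) no longer matches the popped entry, which is the case the hardware turns into a fault.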
>
> The combination of hardware enforcement and lack of extra instructions
> in the function entry and exit paths should result in something which
> has less overhead and is more difficult to attack than a purely software
> implementation like clang's shadow stacks.
>
> This series implements support for use of GCS by userspace, along with
> support for use of GCS within KVM guests. It does not enable use of GCS
> by either EL1 or EL2, this will be implemented separately. Executables
> are started without GCS and must use a prctl() to enable it, it is
> expected that this will be done very early in application execution by
> the dynamic linker or other startup code. For dynamic linking this will
> be done by checking that everything in the executable is marked as GCS
> compatible.
>
> x86 has an equivalent feature called shadow stacks, this series depends
> on the x86 patches for generic memory management support for the new
> guarded/shadow stack page type and shares APIs as much as possible. As
> there has been extensive discussion with the wider community around the
> ABI for shadow stacks I have as far as practical kept implementation
> decisions close to those for x86, anticipating that review would lead to
> similar conclusions in the absence of strong reasoning for divergence.
>
> The main divergence I am conscious of is that x86 allows shadow stack to
> be enabled and disabled repeatedly, freeing the shadow stack for the
> thread whenever disabled, while this implementation keeps the GCS
> allocated after disable but refuses to reenable it. This is to avoid
> races with things actively walking the GCS during a disable, we do
> anticipate that some systems will wish to disable GCS at runtime but are
> not aware of any demand for subsequently reenabling it.
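A toy state machine for the arm64 policy described in the paragraph above, not the kernel code: after a disable the GCS stays allocated, and a later attempt to enable it again is refused. The -EINVAL return value is an assumed error code chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_thread_gcs {
	bool allocated;
	bool enabled;
	bool ever_disabled;
};

static int toy_gcs_set_enabled(struct toy_thread_gcs *t, bool enable)
{
	if (enable) {
		if (t->enabled)
			return 0;
		if (t->ever_disabled)
			return -EINVAL;		/* re-enable refused (assumed errno) */
		t->allocated = true;
		t->enabled = true;
		return 0;
	}
	if (t->enabled) {
		t->enabled = false;
		t->ever_disabled = true;	/* GCS memory deliberately kept */
	}
	return 0;
}
```

Keeping the allocation means anything still walking the GCS during the disable never sees it freed underneath it, which is the race the cover letter is avoiding.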
>
> x86 uses an arch_prctl() to manage enable and disable, since only x86
> and S/390 use arch_prctl() a generic prctl() was proposed[1] as part of a
> patch set for the equivalent RISC-V Zicfiss feature which I initially
> adopted fairly directly but following review feedback has been revised
> quite a bit.
While discussing the ABI implications of shadow stacks in the context of
Zicfiss and musl a few days ago, I had the following idea for how to solve
the source compatibility problems with shadow stacks in POSIX.1-2004 and
POSIX.1-2017:
1. Introduce a "flexible shadow stack handling" option. For what follows,
it doesn't matter if this is system-wide, per-mm, or per-vma.
2. Shadow stack faults on non-shadow stack pages, if flexible shadow stack
handling is in effect, cause the affected page to become a shadow stack
page. When this happens, the page is filled with invalid address tokens.
Faults from non-shadow-stack accesses to a shadow-stack page which was
created by the previous paragraph will cause the page to revert to
non-shadow-stack usage, with or without clearing.
Important: a shadow stack operation can only load a valid address from
a page if that page has been in continuous shadow stack use since the
address was written by another shadow stack operation; the flexibility
delays error reporting in cases of stray writes but it never allows for
corruption of shadow stack operation.
3. Standards-defined operations which use a user-provided stack
(makecontext, sigaltstack, pthread_attr_setstack) use a subrange of the
provided stack for shadow stack storage. I propose to use a shadow
stack size of 1/32 of the provided stack size, rounded up to a positive
integer number of pages, and place the shadow stack allocation at the
lowest page-aligned address inside the provided stack region.
Since page usage is flexible, no change in page permissions is
immediately needed; this merely sets the initial shadow stack pointer for
the new context.
If the shadow stack grew in the opposite direction to the architectural
stack, it would not be necessary to pick a fixed direction.
4. SIGSTKSZ and MINSIGSTKSZ are increased by 2 pages to provide sufficient
space for a minimum-sized shadow stack region and worst case alignment.
_Without_ doing this, sigaltstack cannot be used to recover from stack
overflows if the shadow stack limit is reached first, and makecontext
cannot be supported without memory leaks and unreportable error conditions.
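Point 3 of the proposal can be sketched as a small helper. carve_shadow_stack() and struct ss_range are hypothetical names: the helper takes 1/32 of the provided stack size, rounds it up to a whole number of pages (at least one), and places it at the lowest page-aligned address inside the region, failing if it does not fit.

```c
#include <assert.h>
#include <stdint.h>

struct ss_range {
	uint64_t start;
	uint64_t size;
};

/* Returns 0 on success, -1 if the shadow stack does not fit. */
static int carve_shadow_stack(uint64_t base, uint64_t size, uint64_t page,
			      struct ss_range *out)
{
	uint64_t start, ss;

	assert(page && (page & (page - 1)) == 0);	/* power of two */
	start = (base + page - 1) & ~(page - 1);	/* lowest aligned address */
	ss = ((size + 31) / 32 + page - 1) & ~(page - 1);	/* ceil(size/32), whole pages */
	if (ss == 0)
		ss = page;
	if (start < base || start - base > size ||
	    ss > size - (start - base))
		return -1;
	out->start = start;
	out->size = ss;
	return 0;
}
```

For an 8MiB stack this reserves 256KiB at the bottom of the region. A region too small to hold a full page after alignment is rejected, which is the worst-case alignment that point 4 budgets for.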
Kernel-allocated shadow stacks with a unique VM type are still useful since
they allow stray writes to crash at the time the stray write is performed,
rather than delaying the crash until the next shadow stack read.
The pthread and makecontext changes could be purely libc side, but we would
need kernel support for sigaltstack and page usage changes.
Luckily, there is no need to support stacks which are simultaneously used
from more than one process, so "is this a shadow stack page" can be tracked
purely at the vma/pte level without any need to involve the inode. POSIX
explicitly allows using mmap to obtain stack memory and does not forbid
MAP_SHARED; I consider this sufficiently perverse application behavior that
it is not necessary to ensure exclusive use of the underlying pages while
a shadow stack pte exists. (Applications that use MAP_SHARED for stacks
do not get the full benefit of the shadow stack but they keep POSIX.1-2004
conformance, applications that allocate stacks exclusively in MAP_PRIVATE
memory lose no security.)
The largest complication of this scheme is likely to be that the shadow
stack usage property of a page needs to survive the page being swapped out
and back in, which likely means that it must be present in the swap PTE.
I am substantially less familiar with GCS and SHSTK than with Zicfiss.
It is likely that a syscall or other mechanism is needed to initialize the
shadow stack in flexible memory for makecontext.
Is there interest on the kernel side on having mechanisms to fully support
POSIX.1-2004 with GCS or Zicfiss enabled?
-s
> We currently maintain the x86 pattern of implicitly allocating a shadow
> stack for threads started with shadow stack enabled, there has been some
> discussion of removing this support and requiring the use of clone3()
> with explicit allocation of shadow stacks instead. I have no strong
> feelings either way, implicit allocation is not really consistent with
> anything else we do and creates the potential for errors around thread
> exit but on the other hand it is existing ABI on x86 and minimises the
> changes needed in userspace code.
>
> There is an open issue with support for CRIU, on x86 this required the
> ability to set the GCS mode via ptrace. This series supports
> configuring mode bits other than enable/disable via ptrace but it needs
> to be confirmed if this is sufficient.
>
> The series depends on support for shadow stacks in clone3(), that series
> includes the addition of ARCH_HAS_USER_SHADOW_STACK.
>
>
> https://lore.kernel.org/r/[email protected]
>
> It also depends on the addition of more waitpid() flags to nolibc:
>
>
> https://lore.kernel.org/r/[email protected]
>
> You can see a branch with the full set of dependencies against Linus'
> tree at:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc.git arm64-gcs
>
> [1] https://lore.kernel.org/lkml/[email protected]/
>
> Signed-off-by: Mark Brown <[email protected]>
> ---
> Changes in v8:
> - Invalidate signal cap token on stack when consuming.
> - Typo and other trivial fixes.
> - Don't try to use process_vm_write() on GCS, it intentionally does not
> work.
> - Fix leak of thread GCSs.
> - Rebase onto latest clone3() series.
> - Link to v7:
> https://lore.kernel.org/r/[email protected]
>
> Changes in v7:
> - Rebase onto v6.7-rc2 via the clone3() patch series.
> - Change the token used to cap the stack during signal handling to be
> compatible with GCSPOPM.
> - Fix flags for new page types.
> - Fold in support for clone3().
> - Replace copy_to_user_gcs() with put_user_gcs().
> - Link to v6:
> https://lore.kernel.org/r/[email protected]
>
> Changes in v6:
> - Rebase onto v6.6-rc3.
> - Add some more gcsb_dsync() barriers following spec clarifications.
> - Due to ongoing discussion around clone()/clone3() I've not updated
> anything there, the behaviour is the same as on previous versions.
> - Link to v5:
> https://lore.kernel.org/r/[email protected]
>
> Changes in v5:
> - Don't map any permissions for user GCSs, we always use EL0 accessors
> or use a separate mapping of the page.
> - Reduce the standard size of the GCS to RLIMIT_STACK/2.
> - Enforce a PAGE_SIZE alignment requirement on map_shadow_stack().
> - Clarifications and fixes to documentation.
> - More tests.
> - Link to v4:
> https://lore.kernel.org/r/[email protected]
>
> Changes in v4:
> - Implement flags for map_shadow_stack() allowing the cap and end of
> stack marker to be enabled independently or not at all.
> - Relax size and alignment requirements for map_shadow_stack().
> - Add more blurb explaining the advantages of hardware enforcement.
> - Link to v3:
> https://lore.kernel.org/r/[email protected]
>
> Changes in v3:
> - Rebase onto v6.5-rc4.
> - Add a GCS barrier on context switch.
> - Add a GCS stress test.
> - Link to v2:
> https://lore.kernel.org/r/[email protected]
>
> Changes in v2:
> - Rebase onto v6.5-rc3.
> - Rework prctl() interface to allow each bit to be locked independently.
> - map_shadow_stack() now places the cap token based on the size
> requested by the caller not the actual space allocated.
> - Mode changes other than enable via ptrace are now supported.
> - Expand test coverage.
> - Various smaller fixes and adjustments.
> - Link to v1:
> https://lore.kernel.org/r/[email protected]
>
> ---
> Mark Brown (38):
> arm64/mm: Restructure arch_validate_flags() for extensibility
> prctl: arch-agnostic prctl for shadow stack
> mman: Add map_shadow_stack() flags
> arm64: Document boot requirements for Guarded Control Stacks
> arm64/gcs: Document the ABI for Guarded Control Stacks
> arm64/sysreg: Add definitions for architected GCS caps
> arm64/gcs: Add manual encodings of GCS instructions
> arm64/gcs: Provide put_user_gcs()
> arm64/cpufeature: Runtime detection of Guarded Control Stack (GCS)
> arm64/mm: Allocate PIE slots for EL0 guarded control stack
> mm: Define VM_SHADOW_STACK for arm64 when we support GCS
> arm64/mm: Map pages for guarded control stack
> KVM: arm64: Manage GCS registers for guests
> arm64/gcs: Allow GCS usage at EL0 and EL1
> arm64/idreg: Add overrride for GCS
> arm64/hwcap: Add hwcap for GCS
> arm64/traps: Handle GCS exceptions
> arm64/mm: Handle GCS data aborts
> arm64/gcs: Context switch GCS state for EL0
> arm64/gcs: Ensure that new threads have a GCS
> arm64/gcs: Implement shadow stack prctl() interface
> arm64/mm: Implement map_shadow_stack()
> arm64/signal: Set up and restore the GCS context for signal handlers
> arm64/signal: Expose GCS state in signal frames
> arm64/ptrace: Expose GCS via ptrace and core files
> arm64: Add Kconfig for Guarded Control Stack (GCS)
> kselftest/arm64: Verify the GCS hwcap
> kselftest/arm64: Add GCS as a detected feature in the signal tests
> kselftest/arm64: Add framework support for GCS to signal handling tests
> kselftest/arm64: Allow signals tests to specify an expected si_code
> kselftest/arm64: Always run signals tests with GCS enabled
> kselftest/arm64: Add very basic GCS test program
> kselftest/arm64: Add a GCS test program built with the system libc
> kselftest/arm64: Add test coverage for GCS mode locking
> selftests/arm64: Add GCS signal tests
> kselftest/arm64: Add a GCS stress test
> kselftest/arm64: Enable GCS for the FP stress tests
> kselftest: Provide shadow stack enable helpers for arm64
>
> Documentation/admin-guide/kernel-parameters.txt | 6 +
> Documentation/arch/arm64/booting.rst | 22 +
> Documentation/arch/arm64/elf_hwcaps.rst | 3 +
> Documentation/arch/arm64/gcs.rst | 233 +++++++
> Documentation/arch/arm64/index.rst | 1 +
> Documentation/filesystems/proc.rst | 2 +-
> arch/arm64/Kconfig | 20 +
> arch/arm64/include/asm/cpufeature.h | 6 +
> arch/arm64/include/asm/el2_setup.h | 17 +
> arch/arm64/include/asm/esr.h | 28 +-
> arch/arm64/include/asm/exception.h | 2 +
> arch/arm64/include/asm/gcs.h | 107 +++
> arch/arm64/include/asm/hwcap.h | 1 +
> arch/arm64/include/asm/kvm_arm.h | 4 +-
> arch/arm64/include/asm/kvm_host.h | 12 +
> arch/arm64/include/asm/mman.h | 23 +-
> arch/arm64/include/asm/pgtable-prot.h | 14 +-
> arch/arm64/include/asm/processor.h | 7 +
> arch/arm64/include/asm/sysreg.h | 20 +
> arch/arm64/include/asm/uaccess.h | 40 ++
> arch/arm64/include/uapi/asm/hwcap.h | 1 +
> arch/arm64/include/uapi/asm/ptrace.h | 8 +
> arch/arm64/include/uapi/asm/sigcontext.h | 9 +
> arch/arm64/kernel/cpufeature.c | 19 +
> arch/arm64/kernel/cpuinfo.c | 1 +
> arch/arm64/kernel/entry-common.c | 23 +
> arch/arm64/kernel/idreg-override.c | 2 +
> arch/arm64/kernel/process.c | 85 +++
> arch/arm64/kernel/ptrace.c | 59 ++
> arch/arm64/kernel/signal.c | 242 ++++++-
> arch/arm64/kernel/traps.c | 11 +
> arch/arm64/kvm/emulate-nested.c | 4 +
> arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 17 +
> arch/arm64/kvm/sys_regs.c | 22 +
> arch/arm64/mm/Makefile | 1 +
> arch/arm64/mm/fault.c | 79 ++-
> arch/arm64/mm/gcs.c | 300 +++++++++
> arch/arm64/mm/mmap.c | 13 +-
> arch/arm64/tools/cpucaps | 1 +
> arch/x86/include/uapi/asm/mman.h | 3 -
> fs/proc/task_mmu.c | 3 +
> include/linux/mm.h | 16 +-
> include/uapi/asm-generic/mman.h | 4 +
> include/uapi/linux/elf.h | 1 +
> include/uapi/linux/prctl.h | 22 +
> kernel/sys.c | 30 +
> tools/testing/selftests/arm64/Makefile | 2 +-
> tools/testing/selftests/arm64/abi/hwcap.c | 19 +
> tools/testing/selftests/arm64/fp/assembler.h | 15 +
> tools/testing/selftests/arm64/fp/fpsimd-test.S | 2 +
> tools/testing/selftests/arm64/fp/sve-test.S | 2 +
> tools/testing/selftests/arm64/fp/za-test.S | 2 +
> tools/testing/selftests/arm64/fp/zt-test.S | 2 +
> tools/testing/selftests/arm64/gcs/.gitignore | 5 +
> tools/testing/selftests/arm64/gcs/Makefile | 24 +
> tools/testing/selftests/arm64/gcs/asm-offsets.h | 0
> tools/testing/selftests/arm64/gcs/basic-gcs.c | 428 ++++++++++++
> tools/testing/selftests/arm64/gcs/gcs-locking.c | 200 ++++++
> .../selftests/arm64/gcs/gcs-stress-thread.S | 311 +++++++++
> tools/testing/selftests/arm64/gcs/gcs-stress.c | 532 +++++++++++++++
> tools/testing/selftests/arm64/gcs/gcs-util.h | 100 +++
> tools/testing/selftests/arm64/gcs/libc-gcs.c | 736 +++++++++++++++++++++
> tools/testing/selftests/arm64/signal/.gitignore | 1 +
> .../testing/selftests/arm64/signal/test_signals.c | 17 +-
> .../testing/selftests/arm64/signal/test_signals.h | 6 +
> .../selftests/arm64/signal/test_signals_utils.c | 32 +-
> .../selftests/arm64/signal/test_signals_utils.h | 39 ++
> .../arm64/signal/testcases/gcs_exception_fault.c | 62 ++
> .../selftests/arm64/signal/testcases/gcs_frame.c | 88 +++
> .../arm64/signal/testcases/gcs_write_fault.c | 67 ++
> .../selftests/arm64/signal/testcases/testcases.c | 7 +
> .../selftests/arm64/signal/testcases/testcases.h | 1 +
> tools/testing/selftests/ksft_shstk.h | 37 ++
> 73 files changed, 4241 insertions(+), 40 deletions(-)
> ---
> base-commit: 50abefbf1bc07f5c4e403fd28f71dcee855100f7
> change-id: 20230303-arm64-gcs-e311ab0d8729
>
> Best regards,
> --
> Mark Brown <[email protected]>
>
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Hi,
I worked on the x86 kernel shadow stack support. I think it is an
interesting suggestion. Some questions below, and I will think more on
it.
On Tue, 2024-02-20 at 11:36 -0500, Stefan O'Rear wrote:
> While discussing the ABI implications of shadow stacks in the context of
> Zicfiss and musl a few days ago, I had the following idea for how to
> solve the source compatibility problems with shadow stacks in
> POSIX.1-2004 and POSIX.1-2017:
>
> 1. Introduce a "flexible shadow stack handling" option. For what
>    follows, it doesn't matter if this is system-wide, per-mm, or per-vma.
>
> 2. Shadow stack faults on non-shadow stack pages, if flexible shadow
>    stack handling is in effect, cause the affected page to become a
>    shadow stack page. When this happens, the page is filled with invalid
>    address tokens.
Hmm, could the shadow stack underflow onto the real stack then? Not
sure how bad that is. INCSSP (incrementing the SSP register on x86)
loops are not rare so it seems like something that could happen.
>
>    Faults from non-shadow-stack accesses to a shadow-stack page which
>    was created by the previous paragraph will cause the page to revert
>    to non-shadow-stack usage, with or without clearing.
Won't this prevent catching stack overflows when they happen? An
overflow will just turn the shadow stack into normal stack and only get
detected when the shadow stack unwinds?
A related question would be how to handle the expanding nature of the
initial stack. I guess the initial stack could be special and have a
separate shadow stack.
>
>    Important: a shadow stack operation can only load a valid address
>    from a page if that page has been in continuous shadow stack use
>    since the address was written by another shadow stack operation; the
>    flexibility delays error reporting in cases of stray writes but it
>    never allows for corruption of shadow stack operation.
Shadow stacks currently have automatic guard gaps to try to prevent one
thread from overflowing onto another thread's shadow stack. This would
somewhat open that up, as the stack guard gaps are usually maintained
by userspace for new threads. It would have to be thought through
whether these could still be enforced by checking at additional spots.
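To make the "Important" rule above concrete, here is a toy model of a single flexible page. It is a hypothetical sketch, not kernel code: a page is four 64-bit slots plus a mode flag, INVALID_TOKEN is assumed to be 0, and only ordinary stores (not ordinary reads) are modeled as triggering reversion. The property it demonstrates is that converting a page to shadow use always refills it with invalid tokens, so nothing written by ordinary stores can later be popped as a valid return address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one "flexible" page; sizes and names are illustrative. */
#define SLOTS_PER_PAGE 4
#define INVALID_TOKEN  0ULL	/* assumption: never decodes as a return address */

enum page_mode { PAGE_NORMAL, PAGE_SHADOW };

struct flex_page {
	enum page_mode mode;
	uint64_t slot[SLOTS_PER_PAGE];
};

/* Any shadow stack access to a normal page converts it to shadow use and
 * fills it with invalid tokens, discarding whatever ordinary stores left. */
static void shadow_access(struct flex_page *p)
{
	if (p->mode == PAGE_NORMAL) {
		p->mode = PAGE_SHADOW;
		for (int i = 0; i < SLOTS_PER_PAGE; i++)
			p->slot[i] = INVALID_TOKEN;
	}
}

/* Shadow stack push, as performed by a call instruction. */
static void shadow_push(struct flex_page *p, int i, uint64_t ret_addr)
{
	assert(i >= 0 && i < SLOTS_PER_PAGE);
	shadow_access(p);
	p->slot[i] = ret_addr;
}

/* Shadow stack pop, as performed by a return; INVALID_TOKEN means the
 * return would fault. */
static uint64_t shadow_pop(struct flex_page *p, int i)
{
	assert(i >= 0 && i < SLOTS_PER_PAGE);
	shadow_access(p);
	return p->slot[i];
}

/* Ordinary store: the page reverts to normal use. This model keeps the
 * stored data (the proposal allows reverting with or without clearing);
 * the next shadow access wipes it either way. */
static void normal_store(struct flex_page *p, int i, uint64_t val)
{
	assert(i >= 0 && i < SLOTS_PER_PAGE);
	p->mode = PAGE_NORMAL;
	p->slot[i] = val;
}
```

For example, pushing 0x400123, overwriting that slot with the very same value through normal_store(), and then popping yields INVALID_TOKEN. The forged entry is rejected, but so is the legitimate one that preceded it, which is the delayed error reporting the proposal accepts in exchange for never loading a corrupted address.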
>
> 3. Standards-defined operations which use a user-provided stack
>    (makecontext, sigaltstack, pthread_attr_setstack) use a subrange of
>    the provided stack for shadow stack storage. I propose to use a
>    shadow stack size of 1/32 of the provided stack size, rounded up to a
>    positive integer number of pages, and place the shadow stack
>    allocation at the lowest page-aligned address inside the provided
>    stack region.
>
>    Since page usage is flexible, no change in page permissions is
>    immediately needed; this merely sets the initial shadow stack pointer
>    for the new context.
>
>    If the shadow stack grew in the opposite direction to the
>    architectural stack, it would not be necessary to pick a fixed
>    direction.
>
> 4. SIGSTKSZ and MINSIGSTKSZ are increased by 2 pages to provide
>    sufficient space for a minimum-sized shadow stack region and worst
>    case alignment.
Do all makecontext() callers ensure the size is greater than this?
I guess glibc's makecontext() could do this scheme to prevent leaking
without any changes to the kernel. Basically steal a little of the
stack address range and overwrite it with a shadow stack mapping. But
only if the apps leave enough room. If they need to be updated, then
they could be updated to manage their own shadow stacks too I think.
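Points 3 and 4 of the proposal come down to a little address arithmetic. The sketch below is hypothetical: carve_shadow_stack() is an invented name, 4 KiB pages are assumed, and the shadow stack is assumed to grow downward like the data stack, so its initial pointer is the top of the carved range. It also shows why 2 extra pages cover the worst case for stacks of up to 32 pages: at most one page minus one byte is lost to alignment, plus a one-page minimum shadow stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096UL	/* assumption: 4 KiB pages */

struct shstk_carve {
	int ok;			/* 0 if the region cannot hold the shadow stack */
	uintptr_t base;		/* lowest page-aligned address in the region */
	size_t size;		/* ceil(stack_size / 32) rounded up to pages, >= 1 page */
	uintptr_t initial_sp;	/* top of the carved range (downward growth) */
};

/* Hypothetical helper for a caller-provided stack occupying
 * [stack_lo, stack_lo + stack_size). */
static struct shstk_carve carve_shadow_stack(uintptr_t stack_lo, size_t stack_size)
{
	struct shstk_carve c = { 0 };
	/* ceil(ceil(size / 32) / page) == ceil(size / (32 * page)) */
	size_t pages = (stack_size + 32 * PAGE_BYTES - 1) / (32 * PAGE_BYTES);
	uintptr_t aligned = (stack_lo + PAGE_BYTES - 1) & ~(uintptr_t)(PAGE_BYTES - 1);
	size_t offset = aligned - stack_lo;

	assert(stack_lo != 0);
	if (pages == 0)
		pages = 1;	/* "a positive integer number of pages" */
	c.size = pages * PAGE_BYTES;
	c.base = aligned;

	/* The shadow stack must fit entirely inside the provided region. */
	if (offset > stack_size || c.size > stack_size - offset)
		return c;	/* ok stays 0 */

	c.initial_sp = aligned + c.size;
	c.ok = 1;
	return c;
}
```

With a base of 0x30001 (worst-case alignment) and a region of exactly 2 pages, the aligned base is 0x31000 and a one-page shadow stack still fits, while a one-page region at 0x20001 does not.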
>
> _Without_ doing this, sigaltstack cannot be used to recover from stack
> overflows if the shadow stack limit is reached first, and makecontext
> cannot be supported without memory leaks and unreportable error
> conditions.
FWIW, I think the makecontext() shadow stack leaking is a bad idea. I
would prefer the existing makecontext() interface just didn't support
shadow stack, rather than the leaking solution glibc does today.
The situation (for arm and riscv too I think?) is that some
applications will just not work automatically due to custom stack
switching implementations. (user level threading libraries, JITs, etc).
So I think it should be ok to ask for apps to change to enable shadow
stack and we should avoid doing anything too awkward in pursuit of
getting it to work completely transparently.
For ucontext, there was some discussion about changing the
makecontext() interface to allow the app to allocate and manage its own
shadow stacks. Apps would then be responsible for allocating and
freeing the shadow stacks. It seems a little more straightforward.
For x86, due to some existing GCC binaries that jumped ahead of the
kernel support, it will likely require an ABI opt-in to enable alt
shadow stacks. So alt shadow stack support design is still pretty open
on the x86 side. Very glad to get broader input on it.
>
> Kernel-allocated shadow stacks with a unique VM type are still useful
> since they allow stray writes to crash at the time the stray write is
> performed, rather than delaying the crash until the next shadow stack
> read.
>
> The pthread and makecontext changes could be purely libc side, but we
> would need kernel support for sigaltstack and page usage changes.
>
> Luckily, there is no need to support stacks which are simultaneously
> used from more than one process, so "is this a shadow stack page" can
> be tracked purely at the vma/pte level without any need to involve the
> inode. POSIX explicitly allows using mmap to obtain stack memory and
> does not forbid MAP_SHARED; I consider this sufficiently perverse
> application behavior that it is not necessary to ensure exclusive use
> of the underlying pages while a shadow stack pte exists. (Applications
> that use MAP_SHARED for stacks do not get the full benefit of the
> shadow stack but they keep POSIX.1-2004 conformance; applications that
> allocate stacks exclusively in MAP_PRIVATE memory lose no security.)
On x86 we don't support MAP_SHARED shadow stacks. There is a whole
snarl around the dirty bit in the PTE. I'm not sure it's impossible but
it was gladly avoided. There is also a benefit in avoiding having them
get mapped as writable in a different context.
>
> The largest complication of this scheme is likely to be that the
> shadow stack usage property of a page needs to survive the page being
> swapped out and back in, which likely means that it must be present in
> the swap PTE.
>
> I am substantially less familiar with GCS and SHSTK than with Zicfiss.
> It is likely that a syscall or other mechanism is needed to initialize
> the shadow stack in flexible memory for makecontext.
The ucontext stacks (and alt shadow stacks is the plan) need to have a
"restore token". So, yea, you would probably need some syscall to
"convert" the normal stack memory into shadow stack with a restore
token.
>
> Is there interest on the kernel side in having mechanisms to fully
> support POSIX.1-2004 with GCS or Zicfiss enabled?
Can you clarify, is the goal to meet compatibility with the spec or try
to make more apps run with shadow stack automatically?
On Tue, Feb 20, 2024 at 06:41:05PM +0000, Edgecombe, Rick P wrote:
> Hi,
>
> I worked on the x86 kernel shadow stack support. I think it is an
> interesting suggestion. Some questions below, and I will think more on
> it.
>
> On Tue, 2024-02-20 at 11:36 -0500, Stefan O'Rear wrote:
> > While discussing the ABI implications of shadow stacks in the context
> > of Zicfiss and musl a few days ago, I had the following idea for how
> > to solve the source compatibility problems with shadow stacks in
> > POSIX.1-2004 and POSIX.1-2017:
> >
> > 1. Introduce a "flexible shadow stack handling" option. For what
> >    follows, it doesn't matter if this is system-wide, per-mm, or
> >    per-vma.
> >
> > 2. Shadow stack faults on non-shadow stack pages, if flexible shadow
> >    stack handling is in effect, cause the affected page to become a
> >    shadow stack page. When this happens, the page is filled with
> >    invalid address tokens.
>
> Hmm, could the shadow stack underflow onto the real stack then? Not
> sure how bad that is. INCSSP (incrementing the SSP register on x86)
> loops are not rare so it seems like something that could happen.
Shadow stack underflow should fault on attempt to access
non-shadow-stack memory as shadow-stack, no?
> >    Faults from non-shadow-stack accesses to a shadow-stack page which
> >    was created by the previous paragraph will cause the page to revert
> >    to non-shadow-stack usage, with or without clearing.
>
> Won't this prevent catching stack overflows when they happen? An
> overflow will just turn the shadow stack into normal stack and only get
> detected when the shadow stack unwinds?
I don't think that's as big a problem as it sounds like. It might make
pinpointing the spot at which things went wrong take a little bit more
work, but it should not admit any wrong-execution.
> A related question would be how to handle the expanding nature of the
> initial stack. I guess the initial stack could be special and have a
> separate shadow stack.
That seems fine.
> >    Important: a shadow stack operation can only load a valid address
> >    from a page if that page has been in continuous shadow stack use
> >    since the address was written by another shadow stack operation;
> >    the flexibility delays error reporting in cases of stray writes but
> >    it never allows for corruption of shadow stack operation.
>
> Shadow stacks currently have automatic guard gaps to try to prevent one
> thread from overflowing onto another thread's shadow stack. This would
> somewhat open that up, as the stack guard gaps are usually maintained
> by userspace for new threads. It would have to be thought through
> whether these could still be enforced by checking at additional spots.
I would think the existing guard pages would already do that if a
thread's shadow stack is contiguous with its own data stack.
> > 3. Standards-defined operations which use a user-provided stack
> >    (makecontext, sigaltstack, pthread_attr_setstack) use a subrange of
> >    the provided stack for shadow stack storage. I propose to use a
> >    shadow stack size of 1/32 of the provided stack size, rounded up to
> >    a positive integer number of pages, and place the shadow stack
> >    allocation at the lowest page-aligned address inside the provided
> >    stack region.
> >
> >    Since page usage is flexible, no change in page permissions is
> >    immediately needed; this merely sets the initial shadow stack
> >    pointer for the new context.
> >
> >    If the shadow stack grew in the opposite direction to the
> >    architectural stack, it would not be necessary to pick a fixed
> >    direction.
> >
> > 4. SIGSTKSZ and MINSIGSTKSZ are increased by 2 pages to provide
> >    sufficient space for a minimum-sized shadow stack region and worst
> >    case alignment.
>
> Do all makecontext() callers ensure the size is greater than this?
>
> I guess glibc's makecontext() could do this scheme to prevent leaking
> without any changes to the kernel. Basically steal a little of the
> stack address range and overwrite it with a shadow stack mapping. But
> only if the apps leave enough room. If they need to be updated, then
> they could be updated to manage their own shadow stacks too I think.
From the musl side, I have always looked at the entirety of shadow
stack stuff with very heavy skepticism, and anything that breaks
existing interface contracts, introduces places where apps can get
auto-killed because a late resource allocation fails, or requires
applications to code around the existence of something that should be
an implementation detail, is a non-starter. To even consider shadow
stack support, it must truly be fully non-breaking.
> > _Without_ doing this, sigaltstack cannot be used to recover from
> > stack overflows if the shadow stack limit is reached first, and
> > makecontext cannot be supported without memory leaks and unreportable
> > error conditions.
>
> FWIW, I think the makecontext() shadow stack leaking is a bad idea. I
> would prefer the existing makecontext() interface just didn't support
> shadow stack, rather than the leaking solution glibc does today.
AIUI the proposal by Stefan makes it non-leaking because it's just
using normal memory that reverts to normal usage on any
non-shadow-stack access.
Rich
On Tue, Feb 20, 2024 at 06:41:05PM +0000, Edgecombe, Rick P wrote:
> On Tue, 2024-02-20 at 11:36 -0500, Stefan O'Rear wrote:
> > 2. Shadow stack faults on non-shadow stack pages, if flexible shadow
> >    stack handling is in effect, cause the affected page to become a
> >    shadow stack page. When this happens, the page is filled with
> >    invalid address tokens.
> Hmm, could the shadow stack underflow onto the real stack then? Not
> sure how bad that is. INCSSP (incrementing the SSP register on x86)
> loops are not rare so it seems like something that could happen.
Yes, they'd trash any pages of normal stack they touch as they do so but
otherwise seems similar to overflow.
> The situation (for arm and riscv too I think?) is that some
> applications will just not work automatically due to custom stack
> switching implementations. (user level threading libraries, JITs, etc).
> So I think it should be ok to ask for apps to change to enable shadow
> stack and we should avoid doing anything too awkward in pursuit of
> getting it to work completely transparently.
Yes, on arm64 anything that rewrites or is otherwise clever with the
stack is going to have to understand that the GCS exists on arm64 and do
matching rewrites/updates for the GCS. This includes anything that
switches stacks, it will need to use GCS specific instructions to change
the current shadow stack pointer.
> > MAP_SHARED; I consider this sufficiently perverse application
> > behavior that it is not necessary to ensure exclusive use of the
> > underlying pages while a shadow stack pte exists. (Applications that
> > use MAP_SHARED for stacks do not get the full benefit of the shadow
> > stack but they keep POSIX.1-2004 conformance; applications that
> > allocate stacks exclusively in MAP_PRIVATE memory lose no security.)
> On x86 we don't support MAP_SHARED shadow stacks. There is a whole
> snarl around the dirty bit in the PTE. I'm not sure it's impossible but
> it was gladly avoided. There is also a benefit in avoiding having them
> get mapped as writable in a different context.
Similarly for arm64, I think we can physically do it IIRC, but between
having to map via map_shadow_stack() for security reasons and it just
generally not seeming like a clever idea, the implementation shouldn't
actually let you get a MAP_SHARED GCS; it's not something that's been
considered.
> > I am substantially less familiar with GCS and SHSTK than with
> > Zicfiss. It is likely that a syscall or other mechanism is needed to
> > initialize the shadow stack in flexible memory for makecontext.
> The ucontext stacks (and alt shadow stacks is the plan) need to have a
> "restore token". So, yea, you would probably need some syscall to
> "convert" the normal stack memory into shadow stack with a restore
> token.
Similar considerations apply for GCS: we need tokens, and we don't want
userspace to be able to write them itself in the normal case.
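For illustration, here is how a GCS restore ("cap") token could be formed and checked, based on my reading of the architected cap layout that this series adds definitions for: bits [63:12] of the token hold bits [63:12] of the address of the slot the token occupies, and the low 12 bits hold the token type, with 0x1 marking a valid cap. The macro and function names are hypothetical and the layout should be checked against the Arm ARM before relying on it. Because GCS pages reject ordinary stores, something privileged has to write such a token, which in this series is map_shadow_stack() when asked to (the SHADOW_STACK_SET_TOKEN flag).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout, modeled on the arm64 GCS cap token; names are hypothetical. */
#define CAP_ADDR_MASK   (~(uint64_t)0xfff)	/* bits [63:12] */
#define CAP_TYPE_MASK   ((uint64_t)0xfff)	/* bits [11:0] */
#define CAP_VALID_TOKEN ((uint64_t)0x1)

/* Value a privileged writer would store at slot_addr to mark a restore
 * point. GCS entries are 8-byte aligned. */
static uint64_t make_cap_token(uint64_t slot_addr)
{
	assert((slot_addr & 7) == 0);
	return (slot_addr & CAP_ADDR_MASK) | CAP_VALID_TOKEN;
}

/* Check applied when switching to a shadow stack whose top entry lives at
 * slot_addr: the type must be "valid cap" and the address bits must match
 * the slot, so a token is tied to the page it sits in. */
static bool cap_token_valid(uint64_t token, uint64_t slot_addr)
{
	return (token & CAP_TYPE_MASK) == CAP_VALID_TOKEN &&
	       (token & CAP_ADDR_MASK) == (slot_addr & CAP_ADDR_MASK);
}
```

A token made for a slot at 0x7fff0ff8 has the value 0x7fff0001 and is rejected if presented for a slot in any other page.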
On Tue, 2024-02-20 at 13:57 -0500, Rich Felker wrote:
> On Tue, Feb 20, 2024 at 06:41:05PM +0000, Edgecombe, Rick P wrote:
> > Hmm, could the shadow stack underflow onto the real stack then? Not
> > sure how bad that is. INCSSP (incrementing the SSP register on x86)
> > loops are not rare so it seems like something that could happen.
>
> Shadow stack underflow should fault on attempt to access
> non-shadow-stack memory as shadow-stack, no?
Maybe I'm misunderstanding. I thought the proposal included allowing
shadow stack access to convert normal address ranges to shadow stack,
and normal writes to convert shadow stack to normal.
> >
> > Won't this prevent catching stack overflows when they happen? An
> > overflow will just turn the shadow stack into normal stack and only
> > get detected when the shadow stack unwinds?
>
> I don't think that's as big a problem as it sounds like. It might make
> pinpointing the spot at which things went wrong take a little bit more
> work, but it should not admit any wrong-execution.
Right, it's a point about debugging. I'm just trying to analyze the
pros and cons and not calling it a showstopper.
> >
> > Shadow stacks currently have automatic guard gaps to try to prevent
> > one thread from overflowing onto another thread's shadow stack. This
> > would somewhat open that up, as the stack guard gaps are usually
> > maintained by userspace for new threads. It would have to be thought
> > through whether these could still be enforced by checking at
> > additional spots.
>
> I would think the existing guard pages would already do that if a
> thread's shadow stack is contiguous with its own data stack.
The difference is that the kernel provides the guard gaps, where this
would rely on userspace to do it. It is not a showstopper either.
I think my biggest question on this is how does it change the
capability for two threads to share a shadow stack. It might require
some special rules around the syscall that writes restore tokens. So
I'm not sure. It probably needs a POC.
>
> From the musl side, I have always looked at the entirety of shadow
> stack stuff with very heavy skepticism, and anything that breaks
> existing interface contracts, introduces places where apps can get
> auto-killed because a late resource allocation fails, or requires
> applications to code around the existence of something that should be
> an implementation detail, is a non-starter. To even consider shadow
> stack support, it must truly be fully non-breaking.
The manual assembly stack switching and JIT code in the apps needs to
be updated. I don't think there is a way around it.
I agree though that the late allocation failures are not great. Mark is
working on clone3 support which should allow moving the shadow stack
allocation to happen in userspace with the normal stack. Even for riscv
though, doesn't it need to update a new register in stack switching?
BTW, x86 shadow stack has a mode where the shadow stack is writable
with a special instruction (WRSS). It enables the SSP to be set
arbitrarily by writing restore tokens. We discussed this as an option
to make the existing longjmp() and signal stuff work more transparently
for glibc.
>
> > > _Without_ doing this, sigaltstack cannot be used to recover from
> > > stack overflows if the shadow stack limit is reached first, and
> > > makecontext cannot be supported without memory leaks and
> > > unreportable error conditions.
> >
> > FWIW, I think the makecontext() shadow stack leaking is a bad idea.
> > I would prefer the existing makecontext() interface just didn't
> > support shadow stack, rather than the leaking solution glibc does
> > today.
>
> AIUI the proposal by Stefan makes it non-leaking because it's just
> using normal memory that reverts to normal usage on any
> non-shadow-stack access.
>
Right, but does it break any existing apps anyway (because of small
ucontext stack sizes)?
BTW, when I talk about "not supporting" I don't mean the app should
crash. I mean it should instead run normally, just without shadow stack
enabled. Not sure if that was clear. Since shadow stack is not
essential for an application to function, it is only security hardening
on top.
Although determining if an application supports shadow stack has turned
out to be difficult in practice. Handling dlopen() is especially hard.
On Tue, 2024-02-20 at 20:14 +0000, Mark Brown wrote:
> > Hmm, could the shadow stack underflow onto the real stack then? Not
> > sure how bad that is. INCSSP (incrementing the SSP register on x86)
> > loops are not rare so it seems like something that could happen.
>
> Yes, they'd trash any pages of normal stack they touch as they do so
> but otherwise seems similar to overflow.
I was thinking in the normal buffer overflow case there is a guard gap
at the end of the stack, but in this case the shadow stack is directly
adjacent to the regular stack. It's probably a minor point.
On Tue, Feb 20, 2024 at 11:30:22PM +0000, Edgecombe, Rick P wrote:
> On Tue, 2024-02-20 at 13:57 -0500, Rich Felker wrote:
> > On Tue, Feb 20, 2024 at 06:41:05PM +0000, Edgecombe, Rick P wrote:
> > > Hmm, could the shadow stack underflow onto the real stack then? Not
> > > sure how bad that is. INCSSP (incrementing the SSP register on x86)
> > > loops are not rare so it seems like something that could happen.
> >
> > Shadow stack underflow should fault on attempt to access
> > non-shadow-stack memory as shadow-stack, no?
>
> Maybe I'm misunderstanding. I thought the proposal included allowing
> shadow stack access to convert normal address ranges to shadow stack,
> and normal writes to convert shadow stack to normal.
As I understood the original discussion of the proposal on IRC, it was
only one-way (from shadow to normal). Unless I'm missing something,
making it one-way is necessary to catch situations where the shadow
stack would become compromised.
> > > Shadow stacks currently have automatic guard gaps to try to prevent
> > > one thread from overflowing onto another thread's shadow stack.
> > > This would somewhat open that up, as the stack guard gaps are
> > > usually maintained by userspace for new threads. It would have to be
> > > thought through whether these could still be enforced by checking at
> > > additional spots.
> >
> > I would think the existing guard pages would already do that if a
> > thread's shadow stack is contiguous with its own data stack.
>
> The difference is that the kernel provides the guard gaps, where this
> would rely on userspace to do it. It is not a showstopper either.
>
> I think my biggest question on this is how does it change the
> capability for two threads to share a shadow stack. It might require
> some special rules around the syscall that writes restore tokens. So
> I'm not sure. It probably needs a POC.
Why would they be sharing a shadow stack?
> > From the musl side, I have always looked at the entirety of shadow
> > stack stuff with very heavy skepticism, and anything that breaks
> > existing interface contracts, introduces places where apps can get
> > auto-killed because a late resource allocation fails, or requires
> > applications to code around the existence of something that should
> > be an implementation detail, is a non-starter. To even consider
> > shadow stack support, it must truly be fully non-breaking.
>
> The manual assembly stack switching and JIT code in the apps needs to
> be updated. I don't think there is a way around it.
Indeed, I'm not talking about programs with JIT/manual stack-switching
asm, just anything using existing APIs for control of stack --
pthread_attr_setstack, makecontext, sigaltstack, etc.
> I agree though that the late allocation failures are not great. Mark is
> working on clone3 support which should allow moving the shadow stack
> allocation to happen in userspace with the normal stack. Even for riscv
> though, doesn't it need to update a new register in stack switching?
If clone is called with signals masked, it's probably not necessary
for the kernel to set the shadow stack register as part of clone3.
> BTW, x86 shadow stack has a mode where the shadow stack is writable
> with a special instruction (WRSS). It enables the SSP to be set
> arbitrarily by writing restore tokens. We discussed this as an option
> to make the existing longjmp() and signal stuff work more transparently
> for glibc.
>
> >
> > > > _Without_ doing this, sigaltstack cannot be used to recover from
> > > > stack overflows if the shadow stack limit is reached first, and
> > > > makecontext cannot be supported without memory leaks and
> > > > unreportable error conditions.
> > >
> > > FWIW, I think the makecontext() shadow stack leaking is a bad idea.
> > > I would prefer the existing makecontext() interface just didn't
> > > support shadow stack, rather than the leaking solution glibc does
> > > today.
> >
> > AIUI the proposal by Stefan makes it non-leaking because it's just
> > using normal memory that reverts to normal usage on any
> > non-shadow-stack access.
> >
>
> Right, but does it break any existing apps anyway (because of small
> ucontext stack sizes)?
>
> BTW, when I talk about "not supporting" I don't mean the app should
> crash. I mean it should instead run normally, just without shadow stack
> enabled. Not sure if that was clear. Since shadow stack is not
> essential for an application to function, it is only security hardening
> on top.
>
> Although determining if an application supports shadow stack has turned
> out to be difficult in practice. Handling dlopen() is especially hard.
One reasonable thing to do, that might be preferable to overengineered
solutions, is to disable shadow-stack process-wide if an interface
incompatible with it is used (sigaltstack, pthread_create with an
attribute setup using pthread_attr_setstack, makecontext, etc.), as
well as if an incompatible library is dlopened. This is much more
acceptable than continuing to run with shadow stacks managed sloppily
by the kernel and async killing the process on OOM, and is probably
*more compatible* with apps than changing the minimum stack size
requirements out from under them.
The place where it's really needed to be able to allocate the shadow
stack synchronously under userspace control, in order to harden normal
applications that aren't doing funny things, is in pthread_create
without a caller-provided stack.
Rich
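The policy Rich describes can be sketched as a little libc-side bookkeeping. This is a hypothetical model, not musl or glibc code: the event names and shstk_on_event() are invented for illustration, and the actual request to the kernel to turn shadow stacks off for every thread is deliberately not modeled. It captures the two points above: incompatible interfaces switch shadow stacks off process-wide instead of risking a kill, and the only place a shadow stack is allocated is pthread_create with a libc-provided stack, synchronously, so that a failure can surface through pthread_create's return value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical events a libc would observe. */
enum stack_event {
	EV_PTHREAD_CREATE_DEFAULT,	/* libc allocates the thread stack */
	EV_PTHREAD_CREATE_USER_STACK,	/* attr set up with pthread_attr_setstack */
	EV_SIGALTSTACK,
	EV_MAKECONTEXT,
	EV_DLOPEN_INCOMPATIBLE,		/* library not marked shadow stack compatible */
};

struct shstk_policy {
	bool enabled;
};

/* Returns true when the caller should allocate a shadow stack itself,
 * alongside the normal stack. Disabling is one-way. */
static bool shstk_on_event(struct shstk_policy *p, enum stack_event ev)
{
	assert(p);
	switch (ev) {
	case EV_PTHREAD_CREATE_DEFAULT:
		return p->enabled;
	case EV_PTHREAD_CREATE_USER_STACK:
	case EV_SIGALTSTACK:
	case EV_MAKECONTEXT:
	case EV_DLOPEN_INCOMPATIBLE:
		/* Kernel-side disable for all threads is not modeled here. */
		p->enabled = false;
		return false;
	}
	return false;
}
```

After a single makecontext() call, later default-stack pthread_create calls simply stop allocating shadow stacks, and the process keeps running unhardened rather than being killed.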
On Tue, Feb 20, 2024, at 6:30 PM, Edgecombe, Rick P wrote:
> On Tue, 2024-02-20 at 13:57 -0500, Rich Felker wrote:
>> On Tue, Feb 20, 2024 at 06:41:05PM +0000, Edgecombe, Rick P wrote:
>> > Hmm, could the shadow stack underflow onto the real stack then? Not
>> > sure how bad that is. INCSSP (incrementing the SSP register on x86)
>> > loops are not rare so it seems like something that could happen.
>>
>> Shadow stack underflow should fault on attempt to access
>> non-shadow-stack memory as shadow-stack, no?
>
> Maybe I'm misunderstanding. I thought the proposal included allowing
> shadow stack access to convert normal address ranges to shadow stack,
> and normal writes to convert shadow stack to normal.
Ideally for riscv only writes would cause conversion; an incssp underflow,
which performs shadow stack reads, would be able to fault early.
For arm, since a syscall is needed anyway to set up the token in a new
shadow stack region, it would make sense for conversion from non-shadow
to shadow usage to never be automatic.
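The two refinements can be summarized as a decision table for an access to a flexible page. This is a hypothetical sketch with invented names; kernel-allocated dedicated shadow stacks are outside it, and reversion of a shadow page on an ordinary write is carried over from the original proposal rather than from this message.

```c
#include <assert.h>
#include <stdbool.h>

enum flex_policy {
	FLEX_PROMOTE_ON_SHADOW_WRITE,	/* riscv idea: only shadow writes convert */
	FLEX_NO_AUTO_PROMOTE,		/* arm idea: a syscall sets up the token */
};

enum access_kind { ACC_SHADOW_READ, ACC_SHADOW_WRITE, ACC_NORMAL_WRITE };

enum outcome { OUT_OK, OUT_BECOME_SHADOW, OUT_BECOME_NORMAL, OUT_FAULT };

static enum outcome flex_access(enum flex_policy pol, bool page_is_shadow,
				enum access_kind acc)
{
	assert(pol == FLEX_PROMOTE_ON_SHADOW_WRITE || pol == FLEX_NO_AUTO_PROMOTE);

	if (page_is_shadow)
		/* Ordinary writes revert the page, as in the original proposal. */
		return acc == ACC_NORMAL_WRITE ? OUT_BECOME_NORMAL : OUT_OK;

	switch (acc) {
	case ACC_NORMAL_WRITE:
		return OUT_OK;
	case ACC_SHADOW_READ:
		/* e.g. an incssp underflow: fault early instead of converting */
		return OUT_FAULT;
	case ACC_SHADOW_WRITE:
		return pol == FLEX_PROMOTE_ON_SHADOW_WRITE ? OUT_BECOME_SHADOW
							   : OUT_FAULT;
	}
	return OUT_FAULT;
}
```

Under either policy a shadow stack read of ordinary memory faults, so underflows are caught at the point they happen; the policies differ only in whether a shadow stack write may claim a normal page.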
>> >
>> > Won't this prevent catching stack overflows when they happen? An
>> > overflow will just turn the shadow stack into normal stack and only
>> > get detected when the shadow stack unwinds?
>>
>> I don't think that's as big a problem as it sounds like. It might make
>> pinpointing the spot at which things went wrong take a little bit more
>> work, but it should not admit any wrong-execution.
>
> Right, it's a point about debugging. I'm just trying to analyze the
> pros and cons and not calling it a showstopper.
It's certainly undesirable, so I'd like to have both mechanisms available
(shadow stacks in ordinary memory to support several problematic APIs,
and in dedicated mappings with guard pages otherwise).
>> >
>> > Shadow stacks currently have automatic guard gaps to try to prevent
>> > one thread from overflowing onto another thread's shadow stack. This
>> > would somewhat open that up, as the stack guard gaps are usually
>> > maintained by userspace for new threads. It would have to be thought
>> > through whether these could still be enforced by checking at
>> > additional spots.
>>
>> I would think the existing guard pages would already do that if a
>> thread's shadow stack is contiguous with its own data stack.
>
> The difference is that the kernel provides the guard gaps, where this
> would rely on userspace to do it. It is not a showstopper either.
>
> I think my biggest question on this is how does it change the
> capability for two threads to share a shadow stack. It might require
> some special rules around the syscall that writes restore tokens. So
> I'm not sure. It probably needs a POC.
I'm not quite understanding what the property you're looking for here is.
>> From the musl side, I have always looked at the entirety of shadow
>> stack stuff with very heavy skepticism, and anything that breaks
>> existing interface contracts, introduces places where apps can get
>> auto-killed because a late resource allocation fails, or requires
>> applications to code around the existence of something that should be
>> an implementation detail, is a non-starter. To even consider shadow
>> stack support, it must truly be fully non-breaking.
>
> The manual assembly stack switching and JIT code in the apps needs to
> be updated. I don't think there is a way around it.
Naturally. If an application uses nonportable functionality like JIT
and inline assembly, it's fine (within reason) for those nonportable
components to need changes for shadow stack support.
The objective of this proposal is to allow applications that do _not_
use inline assembly but rather only C APIs defined in POSIX.1-2004 to
execute correctly in an environment where shadow stacks are enabled
by default.
> I agree though that the late allocation failures are not great. Mark is
> working on clone3 support which should allow moving the shadow stack
> allocation to happen in userspace with the normal stack. Even for riscv
> though, doesn't it need to update a new register in stack switching?
>
> BTW, x86 shadow stack has a mode where the shadow stack is writable
> with a special instruction (WRSS). It enables the SSP to be set
> arbitrarily by writing restore tokens. We discussed this as an option
> to make the existing longjmp() and signal stuff work more transparently
> for glibc.
>
>>
>> > > _Without_ doing this, sigaltstack cannot be used to recover from
>> > > stack overflows if the shadow stack limit is reached first, and
>> > > makecontext cannot be supported without memory leaks and
>> > > unreportable error conditions.
>> >
>> > FWIW, I think the makecontext() shadow stack leaking is a bad idea.
>> > I would prefer the existing makecontext() interface just didn't
>> > support shadow stack, rather than the leaking solution glibc does
>> > today.
>>
>> AIUI the proposal by Stefan makes it non-leaking because it's just
>> using normal memory that reverts to normal usage on any
>> non-shadow-stack access.
>>
>
> Right, but does it break any existing apps anyway (because of small
> ucontext stack sizes)?
Possibly, but that's what SIGSTKSZ/MINSIGSTKSZ is for. This is already
variable on several platforms due to variable-length vector extensions.
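Sizing alternate stacks from SIGSTKSZ at run time, rather than from a hard-coded number, lets code pick up whatever extra room a platform needs. Below is a minimal sketch. `install_altstack()` and its `app_bytes` parameter are illustrative names, and how much extra space shadow stack support would actually require is not settled here. On newer glibc with `_GNU_SOURCE`, SIGSTKSZ may expand to a `sysconf()` call rather than a constant.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

/* Install an alternate signal stack of SIGSTKSZ plus whatever the
 * application's own handlers need. Returns 0 on success, -1 on error. */
static int install_altstack(size_t app_bytes)
{
    stack_t ss;
    size_t size = (size_t)SIGSTKSZ + app_bytes;

    ss.ss_sp = malloc(size);
    if (ss.ss_sp == NULL)
        return -1;
    ss.ss_size = size;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) {
        free(ss.ss_sp);
        return -1;
    }
    return 0;
}
```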
> BTW, when I talk about "not supporting" I don't mean the app should
> crash. I mean it should instead run normally, just without shadow stack
> enabled. Not sure if that was clear. Since shadow stack is not
> essential for an application to function, it is only security hardening
> on top.
I appreciate that. How far can we go in that direction? If we can
automatically disable shadow stacks on any call to makecontext, sigaltstack,
or pthread_attr_setstack without causing other threads to crash if they were
in the middle of shadow stack maintenance we can probably simplify this
proposal, although I need to think more about what's possible.
> Although determining if an application supports shadow stack has turned
> out to be difficult in practice. Handling dlopen() is especially hard.
How so? Is the hard part figuring out if you need to do something, or
doing it?
-s
On Tue, 2024-02-20 at 18:54 -0500, [email protected] wrote:
> On Tue, Feb 20, 2024 at 11:30:22PM +0000, Edgecombe, Rick P wrote:
> > On Tue, 2024-02-20 at 13:57 -0500, Rich Felker wrote:
> > > On Tue, Feb 20, 2024 at 06:41:05PM +0000, Edgecombe, Rick P wrote:
> > > > Hmm, could the shadow stack underflow onto the real stack then?
> > > > Not sure how bad that is. INCSSP (incrementing the SSP register
> > > > on x86) loops are not rare so it seems like something that could
> > > > happen.
> > >
> > > Shadow stack underflow should fault on attempt to access
> > > non-shadow-stack memory as shadow-stack, no?
> >
> > Maybe I'm misunderstanding. I thought the proposal included allowing
> > shadow stack access to convert normal address ranges to shadow stack,
> > and normal writes to convert shadow stack to normal.
>
> As I understood the original discussion of the proposal on IRC, it was
> only one-way (from shadow to normal). Unless I'm missing something,
> making it one-way is necessary to catch situations where the shadow
> stack would become compromised.
The original post here:
https://lore.kernel.org/lkml/[email protected]/
...has:
"Shadow stack faults on non-shadow stack pages, if flexible shadow
stack handling is in effect, cause the affected page to become a shadow
stack page. When this happens, the page filled with invalid address
tokens."
...and:
"Faults from non-shadow-stack accesses to a shadow-stack page which was
created by the previous paragraph will cause the page to revert to non-
shadow-stack usage, with or without clearing."
I see Stefan has clarified in another response. So I'll go try to
figure it out.
>
> > > > Shadow stacks currently have automatic guard gaps to try to
> > > > prevent one thread from overflowing onto another thread's shadow
> > > > stack. This would somewhat open that up, as the stack guard gaps
> > > > are usually maintained by userspace for new threads. It would
> > > > have to be thought through if these could still be enforced with
> > > > checking at additional spots.
> > >
> > > I would think the existing guard pages would already do that if a
> > > thread's shadow stack is contiguous with its own data stack.
> >
> > The difference is that the kernel provides the guard gaps, where this
> > would rely on userspace to do it. It is not a showstopper either.
> >
> > I think my biggest question on this is how does it change the
> > capability for two threads to share a shadow stack. It might require
> > some special rules around the syscall that writes restore tokens. So
> > I'm not sure. It probably needs a POC.
>
> Why would they be sharing a shadow stack?
The guard gap was introduced originally based on a suggestion that
overflowing a shadow stack onto an adjacent shadow stack could cause
corruption that could be used by an attacker to work around the
protection. There weren't any concrete demonstrated attacks, or any
suggestion that all the protection was moot.
But when we talk about capabilities for converting memory to shadow
stack with simple memory accesses, and syscalls that can write restore
tokens to shadow stacks, it's not immediately clear to me that it
wouldn't open up something like that. Like if two restore tokens were
written to a shadow stack, or two shadow stacks were adjacent with
normal memory between them that later got converted to shadow stack.
Those sorts of scenarios, but I won't lean on those specific examples.
Sorry for being hand wavy. It's just where I'm at, at this point.
>
> > > From the musl side, I have always looked at the entirety of shadow
> > > stack stuff with very heavy skepticism, and anything that breaks
> > > existing interface contracts, introduces places where apps can get
> > > auto-killed because a late resource allocation fails, or requires
> > > applications to code around the existence of something that should
> > > be an implementation detail, is a non-starter. To even consider
> > > shadow stack support, it must truly be fully non-breaking.
> >
> > The manual assembly stack switching and JIT code in the apps needs to
> > be updated. I don't think there is a way around it.
>
> Indeed, I'm not talking about programs with JIT/manual stack-switching
> asm, just anything using existing APIs for control of stack --
> pthread_setstack, makecontext, sigaltstack, etc.
Then I think WRSS might fit your requirements better than what glibc
did. It was considered a reduced security mode that made libc's job
much easier and had better compatibility, but the last discussion was
to try to do it without WRSS.
>
> > I agree though that the late allocation failures are not great. Mark
> > is working on clone3 support which should allow moving the shadow
> > stack allocation to happen in userspace with the normal stack. Even
> > for riscv though, doesn't it need to update a new register in stack
> > switching?
>
> If clone is called with signals masked, it's probably not necessary
> for the kernel to set the shadow stack register as part of clone3.
So you would want a mode of clone3 that basically leaves the shadow
stack bits alone? Mark was driving that effort, but it doesn't seem
horrible to me on first impression, if it would open up the possibility
of musl support.
>
> > BTW, x86 shadow stack has a mode where the shadow stack is writable
> > with a special instruction (WRSS). It enables the SSP to be set
> > arbitrarily by writing restore tokens. We discussed this as an option
> > to make the existing longjmp() and signal stuff work more
> > transparently for glibc.
> >
> > BTW, when I talk about "not supporting" I don't mean the app should
> > crash. I mean it should instead run normally, just without shadow
> > stack enabled. Not sure if that was clear. Since shadow stack is not
> > essential for an application to function, it is only security
> > hardening on top.
> >
> > Although determining if an application supports shadow stack has
> > turned out to be difficult in practice. Handling dlopen() is
> > especially hard.
>
> One reasonable thing to do, that might be preferable to overengineered
> solutions, is to disable shadow-stack process-wide if an interface
> incompatible with it is used (sigaltstack, pthread_create with an
> attribute setup using pthread_attr_setstack, makecontext, etc.), as
> well as if an incompatible library is dlopened.
I think it would be an interesting approach to determining
compatibility. On x86 there have been cases of binaries getting
mismarked as supporting shadow stack. So an automated way of filtering
some of those out would be very useful I think. I guess the dynamic
linker could determine this based on some list of functions?
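A loader-side check along these lines could be as simple as matching a binary's imported symbols against a deny-list. This is a hypothetical sketch. The list only contains the interfaces named in this thread, `imports_incompatible_api()` is a made-up helper, and obtaining the import list (e.g. from the dynamic symbol table) is left out. Nothing here reflects what glibc or musl actually do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical deny-list: interfaces treated as shadow-stack-incompatible. */
static const char *const ss_incompatible[] = {
    "sigaltstack",
    "makecontext",
    "pthread_attr_setstack",
};

/* Returns true if any imported symbol is on the deny-list, in which case
 * a (hypothetical) loader would leave shadow stack disabled. */
static bool imports_incompatible_api(const char *const *imports, size_t n)
{
    size_t nbad = sizeof ss_incompatible / sizeof ss_incompatible[0];

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < nbad; j++)
            if (strcmp(imports[i], ss_incompatible[j]) == 0)
                return true;
    return false;
}
```

A purely symbol-based filter cannot see stacks set up by hand in assembly, so at best it would catch some mismarked binaries, not all of them.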
The dlopen() bit gets complicated though. You need to disable shadow
stack for all threads, which presumably the kernel could be coaxed into
doing. But those threads might be using shadow stack instructions
(INCSSP, RSTORSSP, etc). These are a collection of instructions that
allow limited control of the SSP. When shadow stack gets disabled,
these suddenly turn into #UD generating instructions. So any other
threads executing those instructions when shadow stack got disabled
would be in for a nasty surprise.
Glibc's permissive mode (that disables shadow stack when dlopen()ing a
DSO that doesn't support shadow stack) is quite limited because of
this. There was a POC for working around it, but I'll stop there for
now, to not spam you with the details. I'm not sure of the arm and
risc-v details on this specific corner, but that is the situation for x86.
> This is much more
> acceptable than continuing to run with shadow stacks managed sloppily
> by the kernel and async killing the process on OOM, and is probably
> *more compatible* with apps than changing the minimum stack size
> requirements out from under them.
Yep.
>
> The place where it's really needed to be able to allocate the shadow
> stack synchronously under userspace control, in order to harden normal
> applications that aren't doing funny things, is in pthread_create
> without a caller-provided stack.
Yea most apps don't do anything too tricky. Mostly shadow stack "just
works". But it's no excuse to just crash for the others.
On Tue, Feb 20, 2024 at 06:59:58PM -0500, Stefan O'Rear wrote:
> On Tue, Feb 20, 2024, at 6:30 PM, Edgecombe, Rick P wrote:
> > Maybe I'm misunderstanding. I thought the proposal included allowing
> > shadow stack access to convert normal address ranges to shadow stack,
> > and normal writes to convert shadow stack to normal.
> Ideally for riscv only writes would cause conversion, an incssp underflow
> which performs shadow stack reads would be able to fault early.
> For arm, since a syscall is needed anyway to set up the token in a new
> shadow stack region, it would make sense for conversion from non-shadow
> to shadow usage to never be automatic.
Well, we only need the token to pivot in userspace so we could
*potentially* work something out as part of the conversion process.
It's not filling me with enthusiasm though, and I've certainly not
actually thought it through yet.
> > I agree though that the late allocation failures are not great. Mark is
> > working on clone3 support which should allow moving the shadow stack
> > allocation to happen in userspace with the normal stack. Even for riscv
> > though, doesn't it need to update a new register in stack switching?
> > BTW, x86 shadow stack has a mode where the shadow stack is writable
> > with a special instruction (WRSS). It enables the SSP to be set
> > arbitrarily by writing restore tokens. We discussed this as an option
> > to make the existing longjmp() and signal stuff work more transparently
> > for glibc.
We have this feature on arm64 too, plus a separately controllable push
instruction (though that's less useful here).
> > BTW, when I talk about "not supporting" I don't mean the app should
> > crash. I mean it should instead run normally, just without shadow stack
> > enabled. Not sure if that was clear. Since shadow stack is not
> > essential for an application to function, it is only security hardening
> > on top.
> I appreciate that. How far can we go in that direction? If we can
> automatically disable shadow stacks on any call to makecontext, sigaltstack,
> or pthread_attr_setstack without causing other threads to crash if they were
> in the middle of shadow stack maintenance we can probably simplify this
> proposal, although I need to think more about what's possible.
Aside from concerns about disabling over all the threads in the process
(which should be solvable if annoying) this would be incompatible with
policies which prevent disabling of shadow stacks, and it feels like it
might end up being a gadget people could use which will concern some
people. There's a tension here between compatibility and the security
applications of these features.
On Wed, Feb 21, 2024 at 12:35:48AM +0000, Edgecombe, Rick P wrote:
> doing. But those threads might be using shadow stack instructions
> (INCSSP, RSTORSSP, etc). These are a collection of instructions that
> allow limited control of the SSP. When shadow stack gets disabled,
> these suddenly turn into #UD generating instructions. So any other
> threads executing those instructions when shadow stack got disabled
> would be in for a nasty surprise.
> Glibc's permissive mode (that disables shadow stack when dlopen()ing a
> DSO that doesn't support shadow stack) is quite limited because of
> this. There was a POC for working around it, but I'll stop there for
> now, to not spam you with the details. I'm not sure of arm and risc-v
> details on this specific corner, but for x86.
We have the same issue with disabling GCS causing GCS instructions to
become undefined.
On Wed, Feb 21, 2024 at 12:35:48AM +0000, Edgecombe, Rick P wrote:
> On Tue, 2024-02-20 at 18:54 -0500, [email protected] wrote:
> > On Tue, Feb 20, 2024 at 11:30:22PM +0000, Edgecombe, Rick P wrote:
> > > On Tue, 2024-02-20 at 13:57 -0500, Rich Felker wrote:
> > > > On Tue, Feb 20, 2024 at 06:41:05PM +0000, Edgecombe, Rick P
> > > > > Shadow stacks currently have automatic guard gaps to try to
> > > > > prevent one thread from overflowing onto another thread's
> > > > > shadow stack. This would somewhat open that up, as the stack
> > > > > guard gaps are usually maintained by userspace for new threads.
> > > > > It would have to be thought through if these could still be
> > > > > enforced with checking at additional spots.
> > > >
> > > > I would think the existing guard pages would already do that if a
> > > > thread's shadow stack is contiguous with its own data stack.
> > >
> > > The difference is that the kernel provides the guard gaps, where
> > > this would rely on userspace to do it. It is not a showstopper
> > > either.
> > >
> > > I think my biggest question on this is how does it change the
> > > capability for two threads to share a shadow stack. It might
> > > require some special rules around the syscall that writes restore
> > > tokens. So I'm not sure. It probably needs a POC.
> >
> > Why would they be sharing a shadow stack?
>
> The guard gap was introduced originally based on a suggestion that
> overflowing a shadow stack onto an adjacent shadow stack could cause
> corruption that could be used by an attacker to work around the
> protection. There wasn't any concrete demonstrated attacks or
> suggestion that all the protection was moot.
OK, so not sharing, just happening to be adjacent.
I was thinking from a standpoint of allocating them as part of the
same range as the main stack, just with different protections, where
that would never happen; you'd always have intervening non-shadowstack
pages. But when they're kernel-allocated, yes, they need their own
guard pages.
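The same-range allocation can be made concrete with some address arithmetic. This is a sketch under stated assumptions. The layout below has a guard page under each stack and both stacks growing down; it is one plausible arrangement rather than anything a libc implements today. Turning the shadow region into shadow stack memory, which needs a separate mapping call, is omitted. `plan_layout()` and `struct thread_stack_layout` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread allocation, lowest address first:
 *
 *   [guard][shadow stack][guard][data stack]
 *
 * Both stacks grow down, so each overflows into a guard page of its own
 * allocation, and two threads' shadow stacks are never adjacent. */
struct thread_stack_layout {
    uintptr_t shadow_lo, shadow_hi;   /* [lo, hi) */
    uintptr_t data_lo, data_hi;       /* [lo, hi) */
    size_t total;                     /* bytes to reserve starting at base */
};

static struct thread_stack_layout
plan_layout(uintptr_t base, size_t page, size_t shadow_sz, size_t data_sz)
{
    struct thread_stack_layout l;

    assert(page != 0 && (page & (page - 1)) == 0);   /* power of two */
    assert(base % page == 0 && shadow_sz % page == 0 && data_sz % page == 0);

    l.shadow_lo = base + page;            /* skip the first guard page */
    l.shadow_hi = l.shadow_lo + shadow_sz;
    l.data_lo = l.shadow_hi + page;       /* skip the second guard page */
    l.data_hi = l.data_lo + data_sz;
    l.total = 2 * page + shadow_sz + data_sz;
    return l;
}
```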
> But when we talk about capabilities for converting memory to shadow
> stack with simple memory accesses, and syscalls that can write restore
> token to shadow stacks, it's not immediately clear to me that it
> wouldn't open up something like that. Like if two restore tokens were
> written to a shadow stack, or two shadow stacks were adjacent with
> normal memory between them that later got converted to shadow stack.
> Those sorts of scenarios, but I won't lean on those specific examples.
> Sorry for being hand wavy. It's just where I'm at, at this point.
I don't think it's safe to have automatic conversions back and forth;
only normal accesses should convert shadow stack to normal memory (in
which case any subsequent attempt to operate on it as shadow stack
indicates a critical bug and should be trapped to terminate the
process).
> > > > From the musl side, I have always looked at the entirety of
> > > > shadow stack stuff with very heavy skepticism, and anything that
> > > > breaks existing interface contracts, introduces places where apps
> > > > can get auto-killed because a late resource allocation fails, or
> > > > requires applications to code around the existence of something
> > > > that should be an implementation detail, is a non-starter. To
> > > > even consider shadow stack support, it must truly be fully
> > > > non-breaking.
> > >
> > > The manual assembly stack switching and JIT code in the apps needs
> > > to be updated. I don't think there is a way around it.
> >
> > Indeed, I'm not talking about programs with JIT/manual
> > stack-switching asm, just anything using existing APIs for control of
> > stack -- pthread_setstack, makecontext, sigaltstack, etc.
>
> Then I think WRSS might fit your requirements better than what glibc
> did. It was considered a reduced security mode that made libc's job
> much easier and had better compatibility, but the last discussion was
> to try to do it without WRSS.
Where can I read more about this? Some searches I tried didn't turn up
much useful information.
> > > I agree though that the late allocation failures are not great.
> > > Mark is working on clone3 support which should allow moving the
> > > shadow stack allocation to happen in userspace with the normal
> > > stack. Even for riscv though, doesn't it need to update a new
> > > register in stack switching?
> >
> > If clone is called with signals masked, it's probably not necessary
> > for the kernel to set the shadow stack register as part of clone3.
>
> So you would want a mode of clone3 that basically leaves the shadow
> stack bits alone? Mark was driving that effort, but it doesn't seem
> horrible to me on first impression. If it would open up the possibility
> of musl support.
Well I'm not sure. That's what we're trying to figure out. But I don't
think modifying it is a hard requirement, since it can be modified
from userspace if needed as long as signals are masked.
> > One reasonable thing to do, that might be preferable to
> > overengineered solutions, is to disable shadow-stack process-wide if
> > an interface incompatible with it is used (sigaltstack,
> > pthread_create with an attribute setup using pthread_attr_setstack,
> > makecontext, etc.), as well as if an incompatible library is
> > dlopened.
>
> I think it would be an interesting approach to determining
> compatibility. On x86 there has been cases of binaries getting
> mismarked as supporting shadow stack. So an automated way of filtering
> some of those out would be very useful I think. I guess the dynamic
> linker could determine this based on some list of functions?
I didn't follow this whole mess, but from our side (musl) it does not
seem relevant. There are no legacy binaries wrongly marked because we
have never supported shadow stacks so far.
> The dlopen() bit gets complicated though. You need to disable shadow
> stack for all threads, which presumably the kernel could be coaxed into
> doing. But those threads might be using shadow stack instructions
> (INCSSP, RSTORSSP, etc). These are a collection of instructions that
> allow limited control of the SSP. When shadow stack gets disabled,
> these suddenly turn into #UD generating instructions. So any other
> threads executing those instructions when shadow stack got disabled
> would be in for a nasty surprise.
This is the kernel's problem if that's happening. It should be
trapping these and returning immediately like a NOP if shadow stack
has been disabled, not generating SIGILL.
> > The place where it's really needed to be able to allocate the shadow
> > stack synchronously under userspace control, in order to harden
> > normal applications that aren't doing funny things, is in
> > pthread_create without a caller-provided stack.
>
> Yea most apps don't do anything too tricky. Mostly shadow stack "just
> works". But it's no excuse to just crash for the others.
One thing to note here is that, to enable this, we're going to need
some way to detect "new enough kernel that shadow stack semantics are
all right". If there are kernels that have shadow stack support but
with problems that make it unsafe to use (this sounds like the case),
we can't turn it on without a way to avoid trying to use it on those.
Rich
On Tue, 2024-02-20 at 20:27 -0500, [email protected] wrote:
> > Then I think WRSS might fit your requirements better than what glibc
> > did. It was considered a reduced security mode that made libc's job
> > much easier and had better compatibility, but the last discussion was
> > to try to do it without WRSS.
>
> Where can I read more about this? Some searches I tried didn't turn up
> much useful information.
There never was any proposal written down AFAIK. In the past we have
had a couple "shadow stack meetup" calls where folks who are working on
shadow stack got together to hash out some things. We discussed it
there.
But briefly, in the Intel SDM (and other places) there is documentation
on the special shadow stack instructions. The two key ones for this are
WRSS and RSTORSSP. WRSS is an instruction which can be enabled by the
kernel (and there is upstream support for this). The instruction can
write through shadow stack memory.
RSTORSSP can be used to consume a restore token, which is a special
value on the shadow stack. When this operation happens, the SSP is
moved adjacent to the token that was just consumed. So between the two
of them the SSP can be adjusted to specific spots on the shadow stack
or another shadow stack.
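The token mechanics can be made concrete with a toy model of a shadow stack as an array of 8-byte slots, with the SSP as a slot index. It is deliberately simplified. The token encoding below is made up and does not match the real x86 format documented in the Intel SDM. `write_restore_token()` stands in for what a WRSS-style write (or the kernel) would do, and `consume_restore_token()` is a rough stand-in for RSTORSSP. The model shows only two properties: a valid token must be present to be consumed, and consuming it moves the SSP to that spot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TOKEN_TAG 1u   /* low bit marks a restore token in this toy model */

/* Write a toy restore token into slot i, referring to the slot above it. */
static void write_restore_token(uint64_t *ss, size_t i)
{
    ss[i] = ((uint64_t)(i + 1) << 1) | TOY_TOKEN_TAG;
}

/* Toy RSTORSSP: consume the token in slot i and move the SSP to it.
 * Returns false (where the real instruction would fault) if slot i
 * does not hold a valid token for that position. */
static bool consume_restore_token(uint64_t *ss, size_t i, size_t *ssp)
{
    assert(ss != NULL && ssp != NULL);
    uint64_t v = ss[i];

    if ((v & TOY_TOKEN_TAG) == 0 || (v >> 1) != (uint64_t)(i + 1))
        return false;
    ss[i] = 0;   /* a token can only be used once */
    *ssp = i;    /* the SSP now sits at the consumed token */
    return true;
}
```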
Today when you longjmp() with shadow stack in glibc, INCSSP is used to
move the SSP back to the spot on the shadow stack where the setjmp()
was called. But this algorithm doesn't always work, for example,
longjmp()ing between stacks. To work around this glibc uses a scheme
where it searches from the target SSP for a shadow stack token and then
consumes it and INCSSPs back to the target SSP. Somewhat
miraculously, it just barely worked in most cases.
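The INCSSP part of longjmp() reduces to popping a computed number of 8-byte entries. Below is a sketch of just the counting. It assumes, per my reading of the Intel SDM description of INCSSPQ, that one instruction only uses the low 8 bits of its count, so at most 255 entries can be popped per execution and the unwind has to loop. `incssp_steps()` is an illustrative name, and the instruction itself appears only as a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of INCSSPQ executions needed to move the SSP from cur up to
 * target (byte addresses; the shadow stack grows down, so popping
 * entries increases the SSP). */
static size_t incssp_steps(uintptr_t cur, uintptr_t target)
{
    assert(target >= cur && (target - cur) % 8 == 0);
    size_t frames = (target - cur) / 8;
    size_t steps = 0;

    while (frames > 0) {
        size_t n = frames > 255 ? 255 : frames;
        frames -= n;   /* on real hardware: incsspq with n in a register */
        steps++;
    }
    return steps;
}
```

This only works when the target lies above the current SSP on the same shadow stack, which is why the cross-stack case needs the token search described above.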
Some specific cases that were still open were longjmp()ing off of a
custom userspace threading library stack, which may not have left a
token behind when it jumped to a new stack. And also, potentially off
of an alt shadow stack in the future, depending on whether it leaves a
restore token when handling a signal (the problem there is if there
is no room to leave it).
So that is how x86 glibc works, and I think arm was thinking along the
same lines. But if you have WRSS (and arm's version), you could just
write a restore token or anything else you need to fixup on the shadow
stack. Then you could longjmp() in one go without any high wire acts.
It's much simpler and more robust and would prevent needing to leave a
restore token when handling a signal to an alt shadow stack. Although,
nothing was ever prototyped. So "in theory".
But that is all about moving the SSP where you need it. It doesn't
resolve any of the allocation lifecycle issues. I think for those the
solutions are:
1. Not supporting ucontext/sigaltstack and shadow stack
2. Stefan's idea
3. A new interface that takes user allocated shadow stacks for those
operations
My preference has been a combination of 1 and 3. For threads, I think
Mark's clone3 enhancements will help.
Anyway, there is an attempt at a summary. I'd also point you to HJ for
more glibc context, as I mostly worked on the kernel side.
On Tue, 2024-02-20 at 18:11 -0800, Rick Edgecombe wrote:
> Some specific cases that were still open were longjmp()ing off of a
> custom userspace threading library stack, which may not have left a
> token behind when it jumped to a new stack. And also, potentially off
> of an alt shadow stack in the future, depending on whether it leaves
> a restore token when handling a signal. (the problem there, is if
> there is no room to leave it).
Ah, I remember the other one. If the token on the target shadow stack
is at the end of the shadow stack, it may not be able to handle pushing
a shadow stack signal frame if a signal hits while it is unwinding
through the token. That is, where a normal longjmp() is a direct transition, in this
case the longjmp() operation can be temporarily in a place where a
signal cannot be handled.
On Tue, 2024-02-20 at 18:59 -0500, Stefan O'Rear wrote:
>
> Ideally for riscv only writes would cause conversion, an incssp
> underflow which performs shadow stack reads would be able to fault
> early.
Why can't makecontext() just clobber part of the low address side of
the passed in stack with a shadow stack mapping? Like say it just
munmap()s part of the passed stack and calls map_shadow_stack() in its
place.
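The address arithmetic behind this idea is straightforward. The hard questions are the reuse and conversion ones that follow. The sketch assumes the shadow stack is carved from the low (overflow) end of the caller's stack and must cover whole pages. `split_user_stack()` and `struct ctx_split` are made-up names, and the actual munmap()/map_shadow_stack() calls are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical split of a caller-supplied makecontext() stack
 * [sp, sp + size): the low, page-aligned part becomes the shadow stack,
 * the rest stays a normal (downward-growing) data stack. */
struct ctx_split {
    uintptr_t shadow_lo, shadow_hi;   /* [lo, hi) */
    uintptr_t stack_lo, stack_hi;     /* [lo, hi) */
};

static int split_user_stack(uintptr_t sp, size_t size, size_t page,
                            size_t shadow_sz, struct ctx_split *out)
{
    assert(out != NULL && page != 0 && (page & (page - 1)) == 0);

    /* Only whole pages inside the caller's range can be remapped. */
    uintptr_t lo = (sp + page - 1) & ~(uintptr_t)(page - 1);
    size_t want = (shadow_sz + page - 1) & ~(size_t)(page - 1);

    /* Fail if there is no whole page, or nothing would be left over
     * for the data stack. */
    if (lo >= sp + size || want >= sp + size - lo)
        return -1;

    out->shadow_lo = lo;
    out->shadow_hi = lo + want;
    out->stack_lo = out->shadow_hi;
    out->stack_hi = sp + size;
    return 0;
}
```

One cost of this approach is visible here: the caller's usable stack shrinks by the rounded-up shadow size, which is the same "changing the minimum stack size" concern raised earlier.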
Then you could still have the shadow stack->normal conversion process
triggered by normal writes. IIUC the concern there is to make sure the
caller can reuse it as normal memory when it is done with the
ucontext/sigaltstack stuff? So the normal->shadow stack part could be
explicit.
But the more I think about this, the more I think it is a hack, and a
proper fix is to use new interfaces. It also would be difficult to
sell, if the faulting conversion stuff is in any way complex.
On Tue, Feb 20, 2024 at 08:27:37PM -0500, [email protected] wrote:
> On Wed, Feb 21, 2024 at 12:35:48AM +0000, Edgecombe, Rick P wrote:
> > (INCSSP, RSTORSSP, etc). These are a collection of instructions that
> > allow limited control of the SSP. When shadow stack gets disabled,
> > these suddenly turn into #UD generating instructions. So any other
> > threads executing those instructions when shadow stack got disabled
> > would be in for a nasty surprise.
> This is the kernel's problem if that's happening. It should be
> trapping these and returning immediately like a NOP if shadow stack
> has been disabled, not generating SIGILL.
I'm not sure that's going to work out well; all it takes is some code
that's looking at the shadow stack and expecting something to happen as
a result of the instructions it's executing and we run into trouble. A
lot of things won't notice and will just happily carry on but I expect
there are going to be things that care. We also end up with an
additional state for threads that have had shadow stacks transparently
disabled; that's manageable, but still.
> > > The place where it's really needed to be able to allocate the shadow
> > > stack synchronously under userspace control, in order to harden
> > > normal
> > > applications that aren't doing funny things, is in pthread_create
> > > without a caller-provided stack.
> > Yea most apps don't do anything too tricky. Mostly shadow stack "just
> > works". But it's no excuse to just crash for the others.
> One thing to note here is that, to enable this, we're going to need
> some way to detect "new enough kernel that shadow stack semantics are
> all right". If there are kernels that have shadow stack support but
> with problems that make it unsafe to use (this sounds like the case),
> we can't turn it on without a way to avoid trying to use it on those.
If we have this automatic conversion of pages to shadow stack then we
should have an API for enabling it, userspace should be able to use the
presence of that API to determine if the feature is there.
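On the userspace side, this reduces to a probe with a well-defined failure mode. The sketch assumes the opt-in would be a new prctl() option. No such option exists yet, so the call itself is not shown. `classify_probe()` only encodes the policy that "not recognised" means "don't use it". An unknown prctl() option fails with EINVAL, and an unknown syscall fails with ENOSYS.

```c
#include <assert.h>
#include <errno.h>

enum feature_probe { FEAT_ABSENT, FEAT_PRESENT, FEAT_PROBE_ERROR };

/* Interpret the result of probing a (hypothetical) "enable automatic
 * conversion" interface; ret/err are the call's return value and errno.
 * Old kernels that don't know the option are treated as "feature
 * absent", never as a reason to abort. */
static enum feature_probe classify_probe(long ret, int err)
{
    if (ret >= 0)
        return FEAT_PRESENT;
    if (err == EINVAL || err == ENOSYS)
        return FEAT_ABSENT;
    return FEAT_PROBE_ERROR;   /* e.g. refused by policy; let caller decide */
}
```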
On Wed, Feb 21, 2024 at 01:53:10PM +0000, Mark Brown wrote:
> On Tue, Feb 20, 2024 at 08:27:37PM -0500, [email protected] wrote:
> > On Wed, Feb 21, 2024 at 12:35:48AM +0000, Edgecombe, Rick P wrote:
>
> > > (INCSSP, RSTORSSP, etc). These are a collection of instructions that
> > > allow limited control of the SSP. When shadow stack gets disabled,
> > > these suddenly turn into #UD generating instructions. So any other
> > > threads executing those instructions when shadow stack got disabled
> > > would be in for a nasty surprise.
>
> > This is the kernel's problem if that's happening. It should be
> > trapping these and returning immediately like a NOP if shadow stack
> > has been disabled, not generating SIGILL.
>
> I'm not sure that's going to work out well, all it takes is some code
> that's looking at the shadow stack and expecting something to happen as
> a result of the instructions it's executing and we run into trouble. A
> lot of things won't notice and will just happily carry on but I expect
> there are going to be things that care. We also end up with an
> additional state for threads that have had shadow stacks transparently
> disabled, that's manageable but still.
I said NOP but there's no reason it strictly needs to be a NOP. It
could instead do something reasonable to convey the state of racing
with shadow stack being disabled.
>
> > > > The place where it's really needed to be able to allocate the shadow
> > > > stack synchronously under userspace control, in order to harden
> > > > normal
> > > > applications that aren't doing funny things, is in pthread_create
> > > > without a caller-provided stack.
>
> > > Yea most apps don't do anything too tricky. Mostly shadow stack "just
> > > works". But it's no excuse to just crash for the others.
>
> > One thing to note here is that, to enable this, we're going to need
> > some way to detect "new enough kernel that shadow stack semantics are
> > all right". If there are kernels that have shadow stack support but
> > with problems that make it unsafe to use (this sounds like the case),
> > we can't turn it on without a way to avoid trying to use it on those.
>
> If we have this automatic conversion of pages to shadow stack then we
> should have an API for enabling it, userspace should be able to use the
> presence of that API to determine if the feature is there.
Yes, or if a new prctl is needed to make disabling safe (see above)
that could probably be used.
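As a rough illustration of using "the presence of that API" as the feature check, userspace could probe whether the running kernel recognises a prctl() option before relying on it. This is a minimal C sketch assuming Linux prctl(2). `PR_HYPOTHETICAL_SS_FEATURE` is a made-up placeholder, not a real constant, and the EINVAL test is only a heuristic.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/prctl.h>

/* HYPOTHETICAL placeholder: not a real PR_* constant. Substitute the
 * option number of whatever interface a kernel actually provides. */
#define PR_HYPOTHETICAL_SS_FEATURE 0x7ffffff0

/* Heuristic probe: the kernel's prctl() dispatcher fails unknown
 * options with EINVAL. Known options can also fail with EINVAL when
 * given bad arguments, so a real probe should pass arguments that are
 * valid for the specific option being probed. */
static bool kernel_knows_prctl(int option)
{
    assert(option > 0);
    errno = 0;
    if (prctl(option, 0UL, 0UL, 0UL, 0UL) == -1 && errno == EINVAL)
        return false;
    return true;
}
```

For example, `kernel_knows_prctl(PR_GET_DUMPABLE)` should report true on any current kernel, while the placeholder option should report false.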
Rich
On Wed, Feb 21, 2024 at 09:58:01AM -0500, [email protected] wrote:
> On Wed, Feb 21, 2024 at 01:53:10PM +0000, Mark Brown wrote:
> > On Tue, Feb 20, 2024 at 08:27:37PM -0500, [email protected] wrote:
> > > On Wed, Feb 21, 2024 at 12:35:48AM +0000, Edgecombe, Rick P wrote:
> > > > (INCSSP, RSTORSSP, etc). These are a collection of instructions that
> > > > allow limited control of the SSP. When shadow stack gets disabled,
> > > > these suddenly turn into #UD generating instructions. So any other
> > > > threads executing those instructions when shadow stack got disabled
> > > > would be in for a nasty surprise.
> > > This is the kernel's problem if that's happening. It should be
> > > trapping these and returning immediately like a NOP if shadow stack
> > > has been disabled, not generating SIGILL.
> > I'm not sure that's going to work out well, all it takes is some code
> > that's looking at the shadow stack and expecting something to happen as
> > a result of the instructions it's executing and we run into trouble. A
> I said NOP but there's no reason it strictly needs to be a NOP. It
> could instead do something reasonable to convey the state of racing
> with shadow stack being disabled.
This feels like it's getting complicated and I fear it may be an uphill
struggle to get such code merged, at least for arm64. My instinct is
that it's going to be much more robust and generally tractable to let
things run to some suitable synchronisation point and then disable
there, but if we're going to do that then userspace can hopefully
arrange to do the disabling itself through the standard disable
interface anyway. Presumably it'll want to notice things being disabled
at some point anyway? TBH that's been how all the prior proposals for
process wide disable I've seen were done.
On Wed, Feb 21, 2024 at 05:36:12PM +0000, Mark Brown wrote:
> On Wed, Feb 21, 2024 at 09:58:01AM -0500, [email protected] wrote:
> > On Wed, Feb 21, 2024 at 01:53:10PM +0000, Mark Brown wrote:
> > > On Tue, Feb 20, 2024 at 08:27:37PM -0500, [email protected] wrote:
> > > > On Wed, Feb 21, 2024 at 12:35:48AM +0000, Edgecombe, Rick P wrote:
>
> > > > > (INCSSP, RSTORSSP, etc). These are a collection of instructions that
> > > > > allow limited control of the SSP. When shadow stack gets disabled,
> > > > > these suddenly turn into #UD generating instructions. So any other
> > > > > threads executing those instructions when shadow stack got disabled
> > > > > would be in for a nasty surprise.
>
> > > > This is the kernel's problem if that's happening. It should be
> > > > trapping these and returning immediately like a NOP if shadow stack
> > > > has been disabled, not generating SIGILL.
>
> > > I'm not sure that's going to work out well, all it takes is some code
> > > that's looking at the shadow stack and expecting something to happen as
> > > a result of the instructions it's executing and we run into trouble. A
>
> > I said NOP but there's no reason it strictly needs to be a NOP. It
> > could instead do something reasonable to convey the state of racing
> > with shadow stack being disabled.
>
> This feels like it's getting complicated and I fear it may be an uphill
> struggle to get such code merged, at least for arm64. My instinct is
> that it's going to be much more robust and generally tractable to let
> things run to some suitable synchronisation point and then disable
> there, but if we're going to do that then userspace can hopefully
> arrange to do the disabling itself through the standard disable
> interface anyway. Presumably it'll want to notice things being disabled
> at some point anyway? TBH that's been how all the prior proposals for
> process wide disable I've seen were done.
If it's possible to disable per-thread rather than per-process, some
things are easier. Disabling on account of using alt stacks only needs
to be done on the threads using those stacks. However, for dlopen
purposes you need a way to disable shadow stack for the whole process.
Initially this is only needed for the thread that called dlopen, but
it needs to have propagated to any thread that synchronizes with
completion of the call to dlopen by the time that synchronization
occurs, and since that synchronization can happen in lots of different
ways that are purely userspace (thanks to futexes being userspace in
the uncontended case), I don't see any way to make it work without
extremely invasive, high-cost checks.
If folks on the kernel side are not going to be amenable to doing the
things that are easy for the kernel to make it work without breaking
compatibility with existing interfaces, but that are impossible or
near-impossible for userspace to do, this seems like a dead-end. And I
suspect an operation to "disable shadow stack, but without making
threads still in SS-critical sections crash" is going to be
necessary.
Rich
On Wed, 2024-02-21 at 12:57 -0500, [email protected] wrote:
> > This feels like it's getting complicated and I fear it may be an
> > uphill struggle to get such code merged, at least for arm64. My
> > instinct is that it's going to be much more robust and generally
> > tractable to let things run to some suitable synchronisation point
> > and then disable there, but if we're going to do that then
> > userspace can hopefully arrange to do the disabling itself through
> > the standard disable interface anyway. Presumably it'll want to
> > notice things being disabled at some point anyway? TBH that's been
> > how all the prior proposals for process wide disable I've seen
> > were done.
>
> If it's possible to disable per-thread rather than per-process, some
> things are easier. Disabling on account of using alt stacks only
> needs to be done on the threads using those stacks. However, for
> dlopen purposes you need a way to disable shadow stack for the whole
> process. Initially this is only needed for the thread that called
> dlopen, but it needs to have propagated to any thread that
> synchronizes with completion of the call to dlopen by the time that
> synchronization occurs, and since that synchronization can happen in
> lots of different ways that are purely userspace (thanks to futexes
> being userspace in the uncontended case), I don't see any way to make
> it work without extremely invasive, high-cost checks.
For glibc's use, we talked about a couple of options.
1. A mode to start suppressing the #UD's from the shadow stack
instructions
2. A mode to start suppressing #CPs (the exception that happens when
the shadow stack doesn't match). So the shadow stack instructions
continue to operate normally, but if the shadow stack gets mismatched
due to lack of support, the ret is emulated. It probably is safer (but
still not perfect), but the performance penalty of emulating every RET
after things get screwed up would be a significant downside. There
also needs to be clean handling of shadow stack #PFs.
3. Per-thread locking that is used around all shadow stack operations
that could be sensitive to disabling. This could maybe be exposed to
apps in case they want to use shadow stack instructions manually. Then
during dlopen() it waits until it can cleanly disable shadow stack for
each thread. In each critical section there are checks for whether
shadow stack is still enabled.
3 is the cleanest and safest I think, and it was thought it might not
need kernel help, due to a scheme Florian had to direct signals to
specific threads. It's my preference at this point.
1 and 2 are POCed here, if you are interested:
https://github.com/rpedgeco/linux/commits/shstk_suppress_rfc/
>
> If folks on the kernel side are not going to be amenable to doing the
> things that are easy for the kernel to make it work without breaking
> compatibility with existing interfaces, but that are impossible or
> near-impossible for userspace to do, this seems like a dead-end. And I
> suspect an operation to "disable shadow stack, but without making
> threads still in SS-critical sections crash" is going to be
> necessary..
I think we have to work through all the alternatives before we can
accuse the kernel of not being amenable. Is there something that you
would like to see out of this conversation that is not happening?
On Mon, Feb 19, 2024 at 11:02:22PM -0300, Thiago Jung Bauermann wrote:
> Mark Brown <[email protected]> writes:
> > + gcspr_el0 = addr + size - (2 * sizeof(u64));
> > + if (!gcs_consume_token(tsk, gcspr_el0)) {
> Should this code validate the end of stack marker? Or doesn't it matter
> whether the marker is correct or not?
I don't think we specifically care, we're just looking for a token here.
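To make the address arithmetic concrete, here is a user-space C simulation of the check under discussion: the new thread's GCSPR_EL0 is placed two 64-bit entries below the top of the region (the top entry holding the end-of-stack marker), and the cap token found there is consumed. The token encoding in `GCS_CAP()` (address bits [63:12] plus a low "valid" value of 0x1) reflects our understanding of the arm64 GCS cap token format and should be treated as an assumption. A plain array stands in for a real GCS mapping, and the marker is deliberately not validated, matching the reply above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED token format: upper address bits of the token's own location
 * in bits [63:12], token type in bits [11:0], 0x1 = valid cap token. */
#define GCS_CAP_ADDR_MASK   (~(uint64_t)0xfff)
#define GCS_CAP_VALID_TOKEN ((uint64_t)0x1)
#define GCS_CAP(a) ((((uint64_t)(uintptr_t)(a)) & GCS_CAP_ADDR_MASK) | GCS_CAP_VALID_TOKEN)

/* Mirrors "gcspr_el0 = addr + size - (2 * sizeof(u64))": skip the
 * end-of-stack marker in the top slot and point at the slot below it. */
static uint64_t *gcs_token_slot(uint64_t *addr, size_t size)
{
    assert(size >= 2 * sizeof(uint64_t));
    return (uint64_t *)((char *)addr + size - 2 * sizeof(uint64_t));
}

/* Consume the token: atomically replace it with 0, but only if the slot
 * holds a valid cap token for its own address. The end-of-stack marker
 * is not checked. Simulation only: real GCS pages can only be written
 * by GCS operations or by the kernel. */
static bool consume_token(uint64_t *slot)
{
    uint64_t expected = GCS_CAP(slot);
    return __atomic_compare_exchange_n(slot, &expected, (uint64_t)0, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```

Clearing the token on success means a second attempt to use the same stack fails, which is the property the kernel check relies on.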
On Wed, Feb 21, 2024 at 06:12:30PM +0000, Edgecombe, Rick P wrote:
> On Wed, 2024-02-21 at 12:57 -0500, [email protected] wrote:
> > > This feels like it's getting complicated and I fear it may be an
> > > uphill struggle to get such code merged, at least for arm64. My
> > > instinct is that it's going to be much more robust and generally
> > > tractable to let things run to some suitable synchronisation point
> > > and then disable there, but if we're going to do that then
> > > userspace can hopefully arrange to do the disabling itself through
> > > the standard disable interface anyway. Presumably it'll want to
> > > notice things being disabled at some point anyway? TBH that's been
> > > how all the prior proposals for process wide disable I've seen
> > > were done.
> >
> > If it's possible to disable per-thread rather than per-process, some
> > things are easier. Disabling on account of using alt stacks only
> > needs to be done on the threads using those stacks. However, for
> > dlopen purposes you need a way to disable shadow stack for the whole
> > process. Initially this is only needed for the thread that called
> > dlopen, but it needs to have propagated to any thread that
> > synchronizes with completion of the call to dlopen by the time that
> > synchronization occurs, and since that synchronization can happen in
> > lots of different ways that are purely userspace (thanks to futexes
> > being userspace in the uncontended case), I don't see any way to
> > make it work without extremely invasive, high-cost checks.
>
> For glibc's use, we talked about a couple of options.
> 1. A mode to start suppressing the #UD's from the shadow stack
> instructions
> 2. A mode to start suppressing #CPs (the exception that happens when
> the shadow stack doesn't match). So the shadow stack instructions
> continue to operate normally, but if the shadow stack gets mismatched
> due to lack of support, the ret is emulated. It probably is safer (but
> still not perfect), but the performance penalty of emulating every RET
> after things get screwed up would be a significant down side. There
> also needs to be clean handling of shadow stack #PFs.
> 3. Per-thread locking that is used around all shadow stack operations
> that could be sensitive to disabling. This could be maybe exposed to
> apps in case they want to use shadow stack instructions manually. Then
> during dlopen() it waits until it can cleanly disable shadow stack for
> each thread. In each critical sections there are checks for whether
> shadow stack is still enabled.
>
> 3 is the cleanest and safest I think, and it was thought it might not
> need kernel help, due to a scheme Florian had to direct signals to
> specific threads. It's my preference at this point.
The operations where the shadow stack has to be processed need to be
executable from async-signal context, so this imposes a requirement to
block all signals around the lock. This makes all longjmps a heavy,
multi-syscall operation rather than O(1) userspace operation. I do not
think this is an acceptable implementation, especially when there are
clearly superior alternatives without that cost or invasiveness.
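A minimal C sketch of why option 3 gets expensive, assuming the per-thread lock must be held across every shadow-stack-sensitive operation (for example the shadow stack adjustment done by longjmp) and that signal handlers may perform such operations too: each section has to be bracketed by two signal-mask system calls. The names `ss_section_enter` and `ss_section_leave` are hypothetical, and the lock itself is only indicated by comments.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>

/* Block every blockable signal so no handler can run while the
 * per-thread shadow stack lock is held. Costs one syscall. */
static void ss_section_enter(sigset_t *saved)
{
    sigset_t all;
    sigfillset(&all);
    int rc = pthread_sigmask(SIG_BLOCK, &all, saved);
    assert(rc == 0);
    (void)rc;
    /* ...take the hypothetical per-thread lock here... */
}

/* Restore the caller's mask: a second syscall, paid on every exit,
 * including every longjmp that has to adjust the shadow stack. */
static void ss_section_leave(const sigset_t *saved)
{
    /* ...release the hypothetical per-thread lock here... */
    int rc = pthread_sigmask(SIG_SETMASK, saved, NULL);
    assert(rc == 0);
    (void)rc;
}
```

Without the masking, a handler that interrupts the section and tries to take the same lock would self-deadlock; with it, what could have been an O(1) userspace operation now always costs two syscalls.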
> 1 and 2 are POCed here, if you are interested:
> https://github.com/rpedgeco/linux/commits/shstk_suppress_rfc/
I'm not clear why 2 (suppression of #CP) is desirable at all. If
shadow stack is being disabled, it should just be disabled, with
minimal fault handling to paper over any racing operations at the
moment it's disabled. Leaving it on with extreme slowness to make it
not actually do anything does not seem useful.
Is there some way folks have in mind to use option 2 to lazily disable
shadow stack once the first SS-incompatible code is executed, when
execution is then known not to be in the middle of a SS-critical
section, instead of doing it right away? I don't see how this could
work, since the SS-incompatible code could be running from a signal
handler that interrupted an SS-critical section.
> > If folks on the kernel side are not going to be amenable to doing the
> > things that are easy for the kernel to make it work without breaking
> > compatibility with existing interfaces, but that are impossible or
> > near-impossible for userspace to do, this seems like a dead-end. And I
> > suspect an operation to "disable shadow stack, but without making
> > threads still in SS-critical sections crash" is going to be
> > necessary..
>
> I think we have to work through all the alternative before we can
> accuse the kernel of not being amenable. Is there something that you
> would like to see out of this conversation that is not happening?
No, I was just interpreting "uphill battle". I really do not want to
engage in an uphill battle for the sake of making it practical to
support something that was never my goal to begin with. If I'm
misreading this, or if others are willing to put the effort into that
"battle", I'd be happy to be mistaken about "not amenable".
Rich
On Wed, Feb 21, 2024 at 12:57:19PM -0500, [email protected] wrote:
> On Wed, Feb 21, 2024 at 05:36:12PM +0000, Mark Brown wrote:
> > This feels like it's getting complicated and I fear it may be an uphill
> > struggle to get such code merged, at least for arm64. My instinct is
> > that it's going to be much more robust and generally tractable to let
> > things run to some suitable synchronisation point and then disable
> > there, but if we're going to do that then userspace can hopefully
> > arrange to do the disabling itself through the standard disable
> > interface anyway. Presumably it'll want to notice things being disabled
> > at some point anyway? TBH that's been how all the prior proposals for
> > process wide disable I've seen were done.
> If it's possible to disable per-thread rather than per-process, some
> things are easier. Disabling on account of using alt stacks only needs
Both x86 and arm64 currently track shadow stack enablement per thread,
not per process, so it's not just possible to do per thread, it's the
only thing we're currently implementing. I think the same is true for
RISC-V but I haven't looked as closely at that yet.
> to be done on the threads using those stacks. However, for dlopen
> purposes you need a way to disable shadow stack for the whole process.
> Initially this is only needed for the thread that called dlopen, but
> it needs to have propagated to any thread that synchronizes with
> completion of the call to dlopen by the time that synchronization
> occurs, and since that synchronization can happen in lots of different
> ways that are purely userspace (thanks to futexes being userspace in
> the uncontended case), I don't see any way to make it work without
> extremely invasive, high-cost checks.
Yeah, it's not particularly nice - any whole process disable is going to
have some nasty cases, I think. Rick's message above covers the
discussion AFAIR; there were also some proposals for more limited
userspaces, I think.
> If folks on the kernel side are not going to be amenable to doing the
> things that are easy for the kernel to make it work without breaking
> compatibility with existing interfaces, but that are impossible or
> near-impossible for userspace to do, this seems like a dead-end. And I
> suspect an operation to "disable shadow stack, but without making
> threads still in SS-critical sections crash" is going to be
> necessary..
Could you be more specific as to the easy things that you're referencing
here?
On Wed, 2024-02-21 at 13:30 -0500, [email protected] wrote:
> > 3 is the cleanest and safest I think, and it was thought it might
> > not need kernel help, due to a scheme Florian had to direct signals
> > to specific threads. It's my preference at this point.
>
> The operations where the shadow stack has to be processed need to be
> executable from async-signal context, so this imposes a requirement
> to block all signals around the lock. This makes all longjmps a
> heavy, multi-syscall operation rather than O(1) userspace operation.
> I do not think this is an acceptable implementation, especially when
> there are clearly superior alternatives without that cost or
> invasiveness.
That is a good point. Could the per-thread locks be nestable to address
this? We just need to know if a thread *might* be using shadow stacks.
So we really just need a per-thread count.
>
> > 1 and 2 are POCed here, if you are interested:
> > https://github.com/rpedgeco/linux/commits/shstk_suppress_rfc/
>
> I'm not clear why 2 (suppression of #CP) is desirable at all. If
> shadow stack is being disabled, it should just be disabled, with
> minimal fault handling to paper over any racing operations at the
> moment it's disabled. Leaving it on with extreme slowness to make it
> not actually do anything does not seem useful.
The benefit is that code that is using shadow stack instructions won't
crash if it relies on them working. For example RDSSP turns into a NOP
if shadow stack is disabled, and the intrinsic is written such that a
NULL pointer is returned if shadow stack is disabled. The shadow stack
is normally readable, and this happens in glibc sometimes. So if there
was code like:
long foo = *(long *)_get_ssp();
...then it could suddenly read a NULL pointer if shadow stack got
disabled. (Notice that it's not even a "shadow stack access"
fault-wise.) So it was looked at as somewhat more robust. But neither 1
nor 2 is perfect for apps that are manually using shadow stack
instructions.
>
> Is there some way folks have in mind to use option 2 to lazily
> disable shadow stack once the first SS-incompatible code is executed,
> when execution is then known not to be in the middle of a SS-critical
> section, instead of doing it right away? I don't see how this could
> work, since the SS-incompatible code could be running from a signal
> handler that interrupted an SS-critical section.
The idea was to disable it without critical sections, and it could be
more robust, but not perfect. I was preferring option 1 between 1 and
2, which was closer to your original suggestion. But it has problems
like the example I gave above. I agree 1 is relatively simpler for the
kernel, between 1 and 2.
>
> > > If folks on the kernel side are not going to be amenable to doing
> > > the things that are easy for the kernel to make it work without
> > > breaking compatibility with existing interfaces, but that are
> > > impossible or near-impossible for userspace to do, this seems like
> > > a dead-end. And I suspect an operation to "disable shadow stack,
> > > but without making threads still in SS-critical sections crash" is
> > > going to be necessary..
> >
> > I think we have to work through all the alternative before we can
> > accuse the kernel of not being amenable. Is there something that
> > you would like to see out of this conversation that is not
> > happening?
>
> No, I was just interpreting "uphill battle". I really do not want to
> engage in an uphill battle for the sake of making it practical to
> support something that was never my goal to begin with. If I'm
> misreading this, or if others are willing to put the effort into that
> "battle", I'd be happy to be mistaken about "not amenable".
I don't think x86 maintainers have put a foot down on anything around
this at least. They would normally have concerns about complexity and
maintainability. So if we have something that has lower value
(imperfect solution), and high complexity, it starts to look like a less
promising path.
On Wed, Feb 21, 2024 at 06:53:44PM +0000, Edgecombe, Rick P wrote:
> On Wed, 2024-02-21 at 13:30 -0500, [email protected] wrote:
> > > 3 is the cleanest and safest I think, and it was thought it might
> > > not need kernel help, due to a scheme Florian had to direct
> > > signals to specific threads. It's my preference at this point.
> >
> > The operations where the shadow stack has to be processed need to be
> > executable from async-signal context, so this imposes a requirement
> > to block all signals around the lock. This makes all longjmps a
> > heavy, multi-syscall operation rather than O(1) userspace
> > operation. I do not think this is an acceptable implementation,
> > especially when there are clearly superior alternatives without
> > that cost or invasiveness.
>
> That is a good point. Could the per-thread locks be nestable to address
> this? We just need to know if a thread *might* be using shadow stacks.
> So we really just need a per-thread count.
Due to arbitrarily nestable signal frames, no, this does not suffice.
An interrupted operation using the lock could be arbitrarily delayed,
even never execute again, making any call to dlopen deadlock.
> > > 1 and 2 are POCed here, if you are interested:
> > > https://github.com/rpedgeco/linux/commits/shstk_suppress_rfc/
> >
> > I'm not clear why 2 (suppression of #CP) is desirable at all. If
> > shadow stack is being disabled, it should just be disabled, with
> > minimal fault handling to paper over any racing operations at the
> > moment it's disabled. Leaving it on with extreme slowness to make it
> > not actually do anything does not seem useful.
>
> The benefit is that code that is using shadow stack instructions won't
> crash if it relies on them working. For example RDSSP turns into a NOP
> if shadow stack is disabled, and the intrinsic is written such that a
> NULL pointer is returned if shadow stack is disabled. The shadow stack
> is normally readable, and this happens in glibc sometimes. So if there
> was code like:
>
> long foo = *(long *)_get_ssp();
>
> ...then it could suddenly read a NULL pointer if shadow stack got
> disabled. (notice, it's not even a "shadow stack access" fault-wise. So
> it was looked at as somewhat more robust. But neither 1 or 2 are
> perfect for apps that are manually using shadow stack instructions.
It's fine to turn RDSSP into an actual emulated read of the SSP, or at
least an emulated load of zero so that uninitialized data is not left
in the target register. If doing the latter, code working with the
shadow stack just needs to be prepared for the possibility that it
could be async-disabled, and check the return value.
I have not looked at all the instructions that become #UD but I
suspect they all have reasonably trivial ways to implement a
"disabled" version of them that userspace can act upon reasonably.
Rich
On Wed, Feb 21, 2024 at 06:32:20PM +0000, Mark Brown wrote:
> On Wed, Feb 21, 2024 at 12:57:19PM -0500, [email protected] wrote:
> > On Wed, Feb 21, 2024 at 05:36:12PM +0000, Mark Brown wrote:
>
> > > This feels like it's getting complicated and I fear it may be an uphill
> > > struggle to get such code merged, at least for arm64. My instinct is
> > > that it's going to be much more robust and generally tractable to let
> > > things run to some suitable synchronisation point and then disable
> > > there, but if we're going to do that then userspace can hopefully
> > > arrange to do the disabling itself through the standard disable
> > > interface anyway. Presumably it'll want to notice things being disabled
> > > at some point anyway? TBH that's been how all the prior proposals for
> > > process wide disable I've seen were done.
>
> > If it's possible to disable per-thread rather than per-process, some
> > things are easier. Disabling on account of using alt stacks only needs
>
> Both x86 and arm64 currently track shadow stack enablement per thread,
> not per process, so it's not just possible to do per thread it's the
> only thing we're currently implementing. I think the same is true for
> RISC-V but I didn't look as closely at that yet.
That's nice! It allows still keeping part of the benefit of SS in
programs which have some threads running with custom stacks. We do
however need a global-disable option for dlopen. In musl this could be
done via the same mechanism ("synccall") used for set*id -- it's
basically userspace IPI. But just having a native operation would be
nicer, and would probably help glibc where I don't think they
abstracted their set*id mechanism to do other things like this.
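For readers unfamiliar with musl's "synccall", the general idea can be sketched in C with POSIX signals: signal every other thread, have each run a callback from its signal handler, and wait until all of them have acknowledged. This is a heavily simplified illustration, not musl's implementation, and it makes several assumptions. Threads must opt in through a hypothetical `register_thread()`, the callback must be async-signal-safe, and SIGUSR1 is assumed to be otherwise unused. For a global shadow stack disable, the callback would invoke whatever per-thread disable call the kernel offers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

#define MAX_THREADS 64

static pthread_t threads[MAX_THREADS];   /* hypothetical opt-in registry */
static int nthreads;
static pthread_mutex_t reg_lock = PTHREAD_MUTEX_INITIALIZER;
static void (*sync_fn)(void);
static sem_t sync_ack;

static void register_thread(void)
{
    pthread_mutex_lock(&reg_lock);
    assert(nthreads < MAX_THREADS);
    threads[nthreads++] = pthread_self();
    pthread_mutex_unlock(&reg_lock);
}

static void sync_handler(int sig)
{
    (void)sig;
    sync_fn();              /* e.g. disable this thread's shadow stack */
    sem_post(&sync_ack);    /* async-signal-safe acknowledgement */
}

/* Run fn on every registered thread (in its signal handler) and on the
 * caller, returning only once all of them have run it. */
static void synccall(void (*fn)(void))
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sync_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    pthread_mutex_lock(&reg_lock);
    sem_init(&sync_ack, 0, 0);
    sync_fn = fn;
    int signalled = 0;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_equal(threads[i], pthread_self()))
            continue;
        if (pthread_kill(threads[i], SIGUSR1) == 0)
            signalled++;
    }
    for (int i = 0; i < signalled; i++)
        while (sem_wait(&sync_ack) != 0)
            ;               /* retry if interrupted */
    sem_destroy(&sync_ack);
    pthread_mutex_unlock(&reg_lock);
    fn();                   /* finally run it on the calling thread */
}

/* Usage support: a counting callback and an idle worker thread. */
static atomic_int calls;
static sem_t stop_workers;

static void count_call(void) { atomic_fetch_add(&calls, 1); }

static void *demo_worker(void *ready)
{
    register_thread();
    pthread_barrier_wait(ready);
    while (sem_wait(&stop_workers) != 0)
        ;                   /* EINTR from the sync signal: wait again */
    return NULL;
}
```

With three registered workers plus the registered caller, `synccall(count_call)` returns only after `count_call` has run four times. A real implementation must also cope with threads that are exiting or that never register.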
> > If folks on the kernel side are not going to be amenable to doing the
> > things that are easy for the kernel to make it work without breaking
> > compatibility with existing interfaces, but that are impossible or
> > near-impossible for userspace to do, this seems like a dead-end. And I
> > suspect an operation to "disable shadow stack, but without making
> > threads still in SS-critical sections crash" is going to be
> > necessary..
>
> Could you be more specific as to the easy things that you're referencing
> here?
Basically the ARCH_SHSTK_SUPPRESS_UD proposal.
Rich
On Mon, Feb 19, 2024 at 11:15:57PM -0300, Thiago Jung Bauermann wrote:
> The only issue as can be seen above is that the can_call_function test
> is failing. The child is getting a GCS Segmentation fault when returning
> from fork().
> I tried debugging it with GDB, but I don't see what's wrong since the
> address in LR matches the first entry in GCSPR. Here is the
> debug session:
I'm simply not seeing this in my testing. There's *something* going on
somewhere, I had another report of a similarish thing elsewhere, but not
in any way that I've ever been able to reproduce. It smells like there
might be something missing with the page tables...
On Wed, 2024-02-21 at 14:06 -0500, [email protected] wrote:
> Due to arbitrarily nestable signal frames, no, this does not suffice.
> An interrupted operation using the lock could be arbitrarily delayed,
> even never execute again, making any call to dlopen deadlock.
Doh! Yep, it is not robust to this. The only thing that could be done
would be a timeout in dlopen(). Which would make the whole thing just
better than nothing.
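The nestable per-thread count, combined with the dlopen() timeout, might look roughly like the C sketch below. `struct ss_thread`, `ss_enter`, `ss_leave` and `ss_try_disable` are hypothetical names. The `disabling` flag adds a handshake so that no thread can start a new section just after the count was seen to be zero. As pointed out earlier in the thread, a section interrupted by a signal handler that never returns keeps the count above zero indefinitely, which is why the wait is bounded.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-thread state; a real version would live in TLS. */
struct ss_thread {
    atomic_int depth;       /* nesting depth of shadow-stack-sensitive sections */
    atomic_bool disabling;  /* set by the thread running dlopen() */
};

/* Enter a sensitive section. Returns false if shadow stack is being
 * disabled, in which case the caller must take its non-SS path. */
static bool ss_enter(struct ss_thread *t)
{
    atomic_fetch_add(&t->depth, 1);
    if (atomic_load(&t->disabling)) {
        atomic_fetch_sub(&t->depth, 1);
        return false;
    }
    return true;
}

static void ss_leave(struct ss_thread *t)
{
    int old = atomic_fetch_sub(&t->depth, 1);
    assert(old > 0);
    (void)old;
}

static long elapsed_ms(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (now.tv_sec - start->tv_sec) * 1000L +
           (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/* Run by the disabling thread: block new sections, then wait for the
 * count to drain. On timeout, give up and leave shadow stack enabled;
 * threads that saw the flag meanwhile simply took their fallback path. */
static bool ss_try_disable(struct ss_thread *t, long timeout_ms)
{
    struct timespec start, nap = { 0, 1000000 };  /* poll every 1 ms */
    clock_gettime(CLOCK_MONOTONIC, &start);
    atomic_store(&t->disabling, true);
    while (atomic_load(&t->depth) != 0) {
        if (elapsed_ms(&start) >= timeout_ms) {
            atomic_store(&t->disabling, false);
            return false;
        }
        nanosleep(&nap, NULL);
    }
    return true;   /* safe to call the real per-thread disable here */
}
```

The sequentially consistent atomics ensure that either the entering thread sees `disabling` or the disabler sees a non-zero `depth`, never neither. The timeout only turns a possible deadlock into a failed dlopen(), hence "better than nothing".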
>
> >
>
> It's fine to turn RDSSP into an actual emulated read of the SSP, or
> at least an emulated load of zero so that uninitialized data is not
> left in the target register.
We can't intercept RDSSP, but it becomes a NOP by default. (disclaimer
x86-only knowledge).
> If doing the latter, code working with the
> shadow stack just needs to be prepared for the possibility that it
> could be async-disabled, and check the return value.
>
> I have not looked at all the instructions that become #UD but I
> suspect they all have reasonable trivial ways to implement a
> "disabled" version of them that userspace can act upon reasonably.
This would have to be thought through functionally and
performance-wise. I'm not opposed if we can come up with a fully
fleshed out plan. How serious are you about pursuing musl support, if
we had something like this?
HJ, any thoughts on whether glibc would use this as well?
It is probably worth mentioning that from the security side (as Mark
mentioned there is always tension in the tradeoffs on these features),
permissive mode is seen by some as something that weakens security too
much. Apps could call dlopen() on a known unsupported DSO before doing
ROP. I don't know if you have any musl users with specific shadow stack
use cases to ask about this.
On Wed, Feb 21, 2024 at 07:22:21PM +0000, Edgecombe, Rick P wrote:
> On Wed, 2024-02-21 at 14:06 -0500, [email protected] wrote:
> > Due to arbitrarily nestable signal frames, no, this does not suffice.
> > An interrupted operation using the lock could be arbitrarily delayed,
> > even never execute again, making any call to dlopen deadlock.
>
> Doh! Yep, it is not robust to this. The only thing that could be done
> would be a timeout in dlopen(). Which would make the whole thing just
> better than nothing.
>
> >
> > >
> >
> > It's fine to turn RDSSP into an actual emulated read of the SSP, or
> > at least an emulated load of zero so that uninitialized data is not
> > left in the target register.
>
> We can't intercept RDSSP, but it becomes a NOP by default. (disclaimer
> x86-only knowledge).
OK, then I think the contract just has to be that userspace, in a
process that might dynamically disable shadow stack, needs to do
something like xor %reg,%reg before rdssp so that the outcome is
deterministic in disabled case.
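The proposed contract can be illustrated with a small C simulation. `sim_rdssp()` is a hypothetical stand-in for the instruction that does nothing when shadow stack is off, as described for x86 above. `read_ssp()` zeroes its result first, and callers treat 0 as "shadow stack disabled" rather than dereferencing it. No real CET instructions or intrinsics are used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SIMULATION ONLY: models an RDSSP that behaves as a NOP (leaves its
 * destination untouched) once shadow stack has been disabled. */
static int sim_ss_enabled = 1;
static uint64_t sim_shadow_stack[4] = { 0x1111, 0x2222, 0x3333, 0x4444 };

static void sim_rdssp(uint64_t *reg)
{
    if (sim_ss_enabled)
        *reg = (uint64_t)(uintptr_t)&sim_shadow_stack[0];
}

/* The proposed contract: clear the register first ("xor %reg,%reg"), so
 * a disabled RDSSP deterministically yields 0 instead of stale data. */
static uint64_t read_ssp(void)
{
    uint64_t reg = 0;
    sim_rdssp(&reg);
    return reg;
}

/* Unlike `long foo = *(long *)_get_ssp();`, check for 0 before
 * dereferencing. Returns -1 if shadow stack is (now) disabled. */
static int peek_shadow_stack(uint64_t *out)
{
    assert(out != NULL);
    uint64_t ssp = read_ssp();
    if (ssp == 0)
        return -1;
    *out = *(const uint64_t *)(uintptr_t)ssp;
    return 0;
}
```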
> > If doing the latter, code working with the
> > shadow stack just needs to be prepared for the possibility that it
> > could be async-disabled, and check the return value.
> >
> > I have not looked at all the instructions that become #UD but I
> > suspect they all have reasonable trivial ways to implement a
> > "disabled" version of them that userspace can act upon reasonably.
>
> This would have to be thought through functionally and performance
> wise. I'm not opposed if can come up with a fully fleshed out plan. How
> serious are you in pursuing musl support, if we had something like
> this?
Up til this thread, my position was pretty much "nope" because it
looked like it could not be done in a way compatible with existing
interface requirements.
However, what's been discussed here, contingent on a dynamic-disable
(ideally allowing choice of per-thread or global, to minimize loss of
hardening properties),
Personally, I believe shadow stack has very low hardening value
relative to cost/complexity, and my leaning would be just to ignore
it. However, I also know it becomes marketing pressure, including
pressure on distros that use musl -- "Why doesn't [distro] do shadow
stack?? I thought you were security oriented!!!" -- and if it can be
done in a non-breaking and non-invasive way, I think it's reasonable
to pursue and make something work.
> HJ, any thoughts on whether glibc would use this as well?
>
> It is probably worth mentioning that from the security side (as Mark
> mentioned there is always tension in the tradeoffs on these features),
> permissive mode is seen by some as something that weakens security too
> much. Apps could call dlopen() on a known unsupported DSO before doing
> ROP. I don't know if you have any musl users with specific shadow stack
> use cases to ask about this.
Yes, this is potentially an argument for something like the option 2,
if there's a way to leave SS enabled but then trap when something goes
wrong, detect if it went wrong via SS-incompatible library code, and
lazily disable SS, otherwise terminate.
But I just realized, I'm not even sure why shared libraries need to be
explicitly SS-compatible. Unless they're doing their own asm stack
switches, shouldn't they just work by default? And since I don't
understand this reason, I also don't understand what the failure mode
is when an incompatible library is loaded, and thus whether it would
be possible to detect and attribute the failure to the library, or
whether the library would induce failure somewhere else.
Anyway, a mechanism to allow the userspace implementation to disable
SS doesn't inherently expose a means to do that. A system integrator
doing maximum hardening might choose to build all libraries as
SS-compatible, or to patch the loader to refuse to load incompatible
libraries.
Rich
On Wed, Feb 21, 2024 at 11:22 AM Edgecombe, Rick P
<[email protected]> wrote:
>
> On Wed, 2024-02-21 at 14:06 -0500, [email protected] wrote:
> > Due to arbitrarily nestable signal frames, no, this does not suffice.
> > An interrupted operation using the lock could be arbitrarily delayed,
> > even never execute again, making any call to dlopen deadlock.
>
> Doh! Yep, it is not robust to this. The only thing that could be done
> would be a timeout in dlopen(). Which would make the whole thing just
> better than nothing.
>
> >
> > >
> >
> > It's fine to turn RDSSP into an actual emulated read of the SSP, or at
> > least an emulated load of zero so that uninitialized data is not left
> > in the target register.
>
> We can't intercept RDSSP, but it becomes a NOP by default. (disclaimer
> x86-only knowledge).
>
> > If doing the latter, code working with the
> > shadow stack just needs to be prepared for the possibility that it
> > could be async-disabled, and check the return value.
> >
> > I have not looked at all the instructions that become #UD but I
> > suspect they all have reasonable trivial ways to implement a
> > "disabled" version of them that userspace can act upon reasonably.
>
> This would have to be thought through functionally and performance
> wise. I'm not opposed if you can come up with a fully fleshed out plan. How
> serious are you in pursuing musl support, if we had something like
> this?
>
> HJ, any thoughts on whether glibc would use this as well?
Assuming that we are talking about permissive mode, if kernel can
suppress UD, we don't need to disable SHSTK. Glibc can enable
ARCH_SHSTK_SUPPRESS_UD instead.
> It is probably worth mentioning that from the security side (as Mark
> mentioned there is always tension in the tradeoffs on these features),
> permissive mode is seen by some as something that weakens security too
> much. Apps could call dlopen() on a known unsupported DSO before doing
> ROP. I don't know if you have any musl users with specific shadow stack
> use cases to ask about this.
--
H.J.
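Rick's fallback is a timeout in dlopen(). It would stop a lock held by an interrupted signal frame from deadlocking the caller forever, and it could look roughly like the sketch below. The lock and helpers (`ss_state_lock`, `ss_deadline_after()`, `ss_lock_with_timeout()`, `ss_unlock()`) are hypothetical. The only real APIs used are POSIX `clock_gettime()` and `pthread_mutex_timedlock()`, which takes an absolute `CLOCK_REALTIME` deadline.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical lock serialising shadow stack state changes against dlopen(). */
static pthread_mutex_t ss_state_lock = PTHREAD_MUTEX_INITIALIZER;

/* Compute an absolute deadline `ms` milliseconds after `now`. */
static void ss_deadline_after(const struct timespec *now, long ms,
                              struct timespec *out)
{
    assert(ms >= 0);
    out->tv_sec = now->tv_sec + ms / 1000;
    out->tv_nsec = now->tv_nsec + (ms % 1000) * 1000000L;
    if (out->tv_nsec >= 1000000000L) {
        out->tv_sec += 1;
        out->tv_nsec -= 1000000000L;
    }
}

/* Take the lock, giving up after `ms` milliseconds. Returns 0 on success or
 * ETIMEDOUT if the holder never released it, for example because it was
 * interrupted by a signal handler that is itself now calling dlopen(). */
static int ss_lock_with_timeout(long ms)
{
    struct timespec now, deadline;
    clock_gettime(CLOCK_REALTIME, &now);  /* timedlock deadlines use CLOCK_REALTIME */
    ss_deadline_after(&now, ms, &deadline);
    return pthread_mutex_timedlock(&ss_state_lock, &deadline);
}

static void ss_unlock(void)
{
    pthread_mutex_unlock(&ss_state_lock);
}
```

As Rick says, this is only better than nothing. The timeout bounds the wait, but a timed-out dlopen() still has to either fail or proceed without the lock.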
On Wed, Feb 21, 2024 at 12:18 PM H.J. Lu <[email protected]> wrote:
>
> On Wed, Feb 21, 2024 at 11:22 AM Edgecombe, Rick P
> <[email protected]> wrote:
> >
> > On Wed, 2024-02-21 at 14:06 -0500, [email protected] wrote:
> > > Due to arbitrarily nestable signal frames, no, this does not suffice.
> > > An interrupted operation using the lock could be arbitrarily delayed,
> > > even never execute again, making any call to dlopen deadlock.
> >
> > Doh! Yep, it is not robust to this. The only thing that could be done
> > would be a timeout in dlopen(). Which would make the whole thing just
> > better than nothing.
> >
> > >
> > > >
> > >
> > > It's fine to turn RDSSP into an actual emulated read of the SSP, or at
> > > least an emulated load of zero so that uninitialized data is not left
> > > in the target register.
> >
> > We can't intercept RDSSP, but it becomes a NOP by default. (disclaimer
> > x86-only knowledge).
> >
> > > If doing the latter, code working with the
> > > shadow stack just needs to be prepared for the possibility that it
> > > could be async-disabled, and check the return value.
> > >
> > > I have not looked at all the instructions that become #UD but I
> > > suspect they all have reasonable trivial ways to implement a
> > > "disabled" version of them that userspace can act upon reasonably.
> >
> > This would have to be thought through functionally and performance
> > wise. I'm not opposed if you can come up with a fully fleshed out plan. How
> > serious are you in pursuing musl support, if we had something like
> > this?
> >
> > HJ, any thoughts on whether glibc would use this as well?
>
> Assuming that we are talking about permissive mode, if kernel can
> suppress UD, we don't need to disable SHSTK. Glibc can enable
> ARCH_SHSTK_SUPPRESS_UD instead.
Kernel must suppress all possible SHSTK UDs.
> > It is probably worth mentioning that from the security side (as Mark
> > mentioned there is always tension in the tradeoffs on these features),
> > permissive mode is seen by some as something that weakens security too
> > much. Apps could call dlopen() on a known unsupported DSO before doing
> > ROP. I don't know if you have any musl users with specific shadow stack
> > use cases to ask about this.
>
>
>
> --
> H.J.
--
H.J.
On Wed, Feb 21, 2024 at 12:25 PM H.J. Lu <[email protected]> wrote:
>
> On Wed, Feb 21, 2024 at 12:18 PM H.J. Lu <[email protected]> wrote:
> >
> > On Wed, Feb 21, 2024 at 11:22 AM Edgecombe, Rick P
> > <[email protected]> wrote:
> > >
> > > On Wed, 2024-02-21 at 14:06 -0500, [email protected] wrote:
> > > > Due to arbitrarily nestable signal frames, no, this does not suffice.
> > > > An interrupted operation using the lock could be arbitrarily delayed,
> > > > even never execute again, making any call to dlopen deadlock.
> > >
> > > Doh! Yep, it is not robust to this. The only thing that could be done
> > > would be a timeout in dlopen(). Which would make the whole thing just
> > > better than nothing.
> > >
> > > >
> > > > >
> > > >
> > > > It's fine to turn RDSSP into an actual emulated read of the SSP, or at
> > > > least an emulated load of zero so that uninitialized data is not left
> > > > in the target register.
> > >
> > > We can't intercept RDSSP, but it becomes a NOP by default. (disclaimer
> > > x86-only knowledge).
> > >
> > > > If doing the latter, code working with the
> > > > shadow stack just needs to be prepared for the possibility that it
> > > > could be async-disabled, and check the return value.
> > > >
> > > > I have not looked at all the instructions that become #UD but I
> > > > suspect they all have reasonable trivial ways to implement a
> > > > "disabled" version of them that userspace can act upon reasonably.
> > >
> > > This would have to be thought through functionally and performance
> > > wise. I'm not opposed if you can come up with a fully fleshed out plan. How
> > > serious are you in pursuing musl support, if we had something like
> > > this?
> > >
> > > HJ, any thoughts on whether glibc would use this as well?
> >
> > Assuming that we are talking about permissive mode, if kernel can
> > suppress UD, we don't need to disable SHSTK. Glibc can enable
> > ARCH_SHSTK_SUPPRESS_UD instead.
>
> Kernel must suppress all possible SHSTK UDs.
If SHSTK is disabled by the kernel rather than by glibc, there can be two issues:
1. Glibc and the kernel may be out of sync on SHSTK.
2. When the kernel disables SHSTK, glibc may be in the middle of reading
the shadow stack in longjmp, searching for a restore token.
> > > It is probably worth mentioning that from the security side (as Mark
> > > mentioned there is always tension in the tradeoffs on these features),
> > > permissive mode is seen by some as something that weakens security too
> > > much. Apps could call dlopen() on a known unsupported DSO before doing
> > > ROP. I don't know if you have any musl users with specific shadow stack
> > > use cases to ask about this.
> >
> >
> >
> > --
> > H.J.
>
>
>
> --
> H.J.
--
H.J.
On Wed, Feb 21, 2024 at 07:22:21PM +0000, Edgecombe, Rick P wrote:
> On Wed, 2024-02-21 at 14:06 -0500, [email protected] wrote:
> > It's fine to turn RDSSP into an actual emulated read of the SSP, or at
> > least an emulated load of zero so that uninitialized data is not left
> > in the target register.
> We can't intercept RDSSP, but it becomes a NOP by default. (disclaimer
> x86-only knowledge).
For arm64 we have a separate control, GCSCRE0_EL1.nTR, governing access to
GCSPR_EL0 (our SSP equivalent), which we can use.
> > I have not looked at all the instructions that become #UD but I
> > suspect they all have reasonable trivial ways to implement a
> > "disabled" version of them that userspace can act upon reasonably.
> This would have to be thought through functionally and performance
> wise. I'm not opposed if you can come up with a fully fleshed out plan. How
> serious are you in pursuing musl support, if we had something like
> this?
Same here: we have to be careful, since this would be defining ABI for
something we don't normally treat as ABI, but if there's a clear case
for doing it then...
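On x86 the "emulated load of zero" largely comes from the instruction encoding. RDSSPQ executes as a NOP when shadow stacks are not enabled, and also on CPUs without CET. Code that zeroes the destination register first therefore reads back 0 rather than stale data. The sketch below is minimal and assumes GCC or Clang inline assembly on x86-64; on other targets it simply reports 0. `read_ssp()` and `shadow_stack_active()` are illustrative names. arm64 gates GCSPR_EL0 access through GCSCRE0_EL1.nTR, as noted above, so it would need its own handling rather than this trick.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Read the x86-64 shadow stack pointer. The register is zeroed first because
 * RDSSPQ executes as a NOP when shadow stacks are not enabled, so "disabled"
 * reads back as 0 instead of whatever the register held before. */
static uint64_t read_ssp(void)
{
    uint64_t ssp = 0;
#if defined(__x86_64__)
    __asm__ volatile ("rdsspq %0" : "+r"(ssp));
#endif
    return ssp;  /* other architectures: always 0 in this sketch */
}

/* Callers must treat 0 as "no shadow stack right now" and fall back, and must
 * not assume the answer stays valid: the state could change right after. */
static bool shadow_stack_active(void)
{
    return read_ssp() != 0;
}
```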
On Mon, Feb 19, 2024 at 11:15:57PM -0300, Thiago Jung Bauermann wrote:
> The only issue as can be seen above is that the can_call_function test
> is failing. The child is getting a GCS Segmentation fault when returning
> from fork().
> I tried debugging it with GDB, but I don't see what's wrong since the
> address in LR matches the first entry in GCSPR. Here is the
> debug session:
I believe based on prior discussions that you're running this using
shrinkwrap - can you confirm exactly how please, including things like
which firmware configuration you're using? I'm using current git with
shrinkwrap run \
--rtvar KERNEL=arch/arm64/boot/Image \
--rtvar ROOTFS=${ROOTFS} \
--rtvar CMDLINE="${CMDLINE}" \
--overlay=arch/v9.4.yaml ns-edk2.yaml
and a locally built yocto and everything seems perfectly happy.
Mark Brown <[email protected]> writes:
> On Mon, Feb 19, 2024 at 11:15:57PM -0300, Thiago Jung Bauermann wrote:
>
>> The only issue as can be seen above is that the can_call_function test
>> is failing. The child is getting a GCS Segmentation fault when returning
>> from fork().
>
>> I tried debugging it with GDB, but I don't see what's wrong since the
>> address in LR matches the first entry in GCSPR. Here is the
>> debug session:
>
> I believe based on prior discussions that you're running this using
> shrinkwrap - can you confirm exactly how please, including things like
> which firmware configuration you're using? I'm using current git with
>
> shrinkwrap run \
> --rtvar KERNEL=arch/arm64/boot/Image \
> --rtvar ROOTFS=${ROOTFS} \
> --rtvar CMDLINE="${CMDLINE}" \
> --overlay=arch/v9.4.yaml ns-edk2.yaml
>
> and a locally built yocto and everything seems perfectly happy.
Yes, this is how I'm running it:
CMDLINE="Image dtb=fdt.dtb console=ttyAMA0 earlycon=pl011,0x1c090000 root=/dev/vda2 ip=dhcp maxcpus=1"
shrinkwrap run \
--rtvar=KERNEL=Image-gcs-v8-v6.7-rc4-14743-ga551a7d7af93 \
--rtvar=ROOTFS=$HOME/VMs/ubuntu-aarch64.img \
--rtvar=CMDLINE="$CMDLINE" \
ns-edk2.yaml
I ran the following to set up the FVP VM:
$ shrinkwrap build --overlay=arch/v9.4.yaml ns-edk2.yaml
My rootfs is Ubuntu 22.04.3. In case it's useful, my kernel config is
here:
https://people.linaro.org/~thiago.bauermann/gcs/config-v6.8.0-rc2
I tried removing "maxcpus=1" from the kernel command line, but it made
no difference.
I also tried resetting my Shrinkwrap setup and starting from scratch,
but it also made no difference: I just pulled from the current main
branch and removed Shrinkwrap's build and package directories, and also
removed all Docker images and the one container I had.
Here are some firmware versions from early boot:
NOTICE: Booting Trusted Firmware
NOTICE: BL1: v2.10.0 (release):v2.10.0
NOTICE: BL1: Built : 00:07:29, Feb 23 2024
⋮
NOTICE: BL2: v2.10.0 (release):v2.10.0
NOTICE: BL2: Built : 00:07:29, Feb 23 2024
⋮
NOTICE: BL31: v2.10.0 (release):v2.10.0
NOTICE: BL31: Built : 00:07:29, Feb 23 2024
⋮
[ edk2 ] UEFI firmware (version built at 00:06:55 on Feb 23 2024)
Press ESCAPE for boot options ...........UEFI Interactive Shell v2.2
EDK II
UEFI v2.70 (EDK II, 0x00010000)
It looks like our main differences are the kernel config and the distro.
--
Thiago
On Thu, Feb 22, 2024 at 11:24:59PM -0300, Thiago Jung Bauermann wrote:
> Mark Brown <[email protected]> writes:
> My rootfs is Ubuntu 22.04.3. In case it's useful, my kernel config is
> here:
> https://people.linaro.org/~thiago.bauermann/gcs/config-v6.8.0-rc2
Does using defconfig make a difference for you?
> Here are some firmware versions from early boot:
These are (as you'd expect) the same.
> It looks like our main differences are the kernel config and the distro.
Indeed.
Mark Brown <[email protected]> writes:
> On Thu, Feb 22, 2024 at 11:24:59PM -0300, Thiago Jung Bauermann wrote:
>> Mark Brown <[email protected]> writes:
>
>> My rootfs is Ubuntu 22.04.3. In case it's useful, my kernel config is
>> here:
>
>> https://people.linaro.org/~thiago.bauermann/gcs/config-v6.8.0-rc2
>
> Does using defconfig make a difference for you?
No, I still get the same result with the defconfig.
--
Thiago
On Thu, Feb 22, 2024 at 11:24:59PM -0300, Thiago Jung Bauermann wrote:
> Mark Brown <[email protected]> writes:
> > I believe based on prior discussions that you're running this using
> > shrinkwrap - can you confirm exactly how please, including things like
> > which firmware configuration you're using? I'm using current git with
> >
> > shrinkwrap run \
> > --rtvar KERNEL=arch/arm64/boot/Image \
> > --rtvar ROOTFS=${ROOTFS} \
> > --rtvar CMDLINE="${CMDLINE}" \
> > --overlay=arch/v9.4.yaml ns-edk2.yaml
> >
> > and a locally built yocto and everything seems perfectly happy.
>
> Yes, this is how I'm running it:
>
> CMDLINE="Image dtb=fdt.dtb console=ttyAMA0 earlycon=pl011,0x1c090000 root=/dev/vda2 ip=dhcp maxcpus=1"
>
> shrinkwrap run \
> --rtvar=KERNEL=Image-gcs-v8-v6.7-rc4-14743-ga551a7d7af93 \
I guess that's bitrotted?
> My rootfs is Ubuntu 22.04.3. In case it's useful, my kernel config is
> here:
...
> https://people.linaro.org/~thiago.bauermann/gcs/config-v6.8.0-rc2
Thanks, it seems to be something in your config that's making a
difference - I can see issues with that. Hopefully that'll help me get
to the bottom of this quickly. I spent a bunch of time fighting with
Ubuntu images to get them running but once I did they didn't seem to
make much difference.
Mark Brown <[email protected]> writes:
> On Thu, Feb 22, 2024 at 11:24:59PM -0300, Thiago Jung Bauermann wrote:
>> Mark Brown <[email protected]> writes:
>
>> > I believe based on prior discussions that you're running this using
>> > shrinkwrap - can you confirm exactly how please, including things like
>> > which firmware configuration you're using? I'm using current git with
>> >
>> > shrinkwrap run \
>> > --rtvar KERNEL=arch/arm64/boot/Image \
>> > --rtvar ROOTFS=${ROOTFS} \
>> > --rtvar CMDLINE="${CMDLINE}" \
>> > --overlay=arch/v9.4.yaml ns-edk2.yaml
>> >
>> > and a locally built yocto and everything seems perfectly happy.
>>
>> Yes, this is how I'm running it:
>>
>> CMDLINE="Image dtb=fdt.dtb console=ttyAMA0 earlycon=pl011,0x1c090000 root=/dev/vda2 ip=dhcp maxcpus=1"
>>
>> shrinkwrap run \
>> --rtvar=KERNEL=Image-gcs-v8-v6.7-rc4-14743-ga551a7d7af93 \
>
> I guess that's bitrotted?
Ah, sorry. When I renamed the Image I messed up the kernel version in the
filename, but I did confirm via "uname -r" that I was running the
correct version: 6.8.0-rc2-ga551a7d7af93.
>> My rootfs is Ubuntu 22.04.3. In case it's useful, my kernel config is
>> here:
>
> ...
>
>> https://people.linaro.org/~thiago.bauermann/gcs/config-v6.8.0-rc2
>
> Thanks, it seems to be something in your config that's making a
> difference - I can see issues with that. Hopefully that'll help me get
> to the bottom of this quickly. I spent a bunch of time fighting with
> Ubuntu images to get them running but once I did they didn't seem to
> make much difference.
In that case, it's interesting that I still run into the problem with
the defconfig. One thing I failed to mention, which is perhaps relevant
considering your result, is that I didn't copy the modules into the disk
image, so the FVP was running with just what was built into the kernel.
That was actually the main reason for me to use a custom config:
I didn't want to have to deal with kernel modules, so I created a config
that didn't have any.
--
Thiago
* Mark Brown <[email protected]> [2024-02-21 17:36:12 +0000]:
> On Wed, Feb 21, 2024 at 09:58:01AM -0500, [email protected] wrote:
> > On Wed, Feb 21, 2024 at 01:53:10PM +0000, Mark Brown wrote:
> > > On Tue, Feb 20, 2024 at 08:27:37PM -0500, [email protected] wrote:
> > > > On Wed, Feb 21, 2024 at 12:35:48AM +0000, Edgecombe, Rick P wrote:
>
> > > > > (INCSSP, RSTORSSP, etc). These are a collection of instructions that
> > > > > allow limited control of the SSP. When shadow stack gets disabled,
> > > > > these suddenly turn into #UD generating instructions. So any other
> > > > > threads executing those instructions when shadow stack got disabled
> > > > > would be in for a nasty surprise.
>
> > > > This is the kernel's problem if that's happening. It should be
> > > > trapping these and returning immediately like a NOP if shadow stack
> > > > has been disabled, not generating SIGILL.
>
> > > I'm not sure that's going to work out well, all it takes is some code
> > > that's looking at the shadow stack and expecting something to happen as
> > > a result of the instructions it's executing and we run into trouble. A
>
> > I said NOP but there's no reason it strictly needs to be a NOP. It
> > could instead do something reasonable to convey the state of racing
> > with shadow stack being disabled.
>
> This feels like it's getting complicated and I fear it may be an uphill
> struggle to get such code merged, at least for arm64. My instinct is

the aarch64 behaviour is already nop
for gcs instructions when gcs is disabled.
the isa was designed so async disable is
possible.

only x86 linux would have to emulate this.

> that it's going to be much more robust and generally tractable to let
> things run to some suitable synchronisation point and then disable
> there, but if we're going to do that then userspace can hopefully
> arrange to do the disabling itself through the standard disable
> interface anyway. Presumably it'll want to notice things being disabled
> at some point anyway? TBH that's been how all the prior proposals for
> process wide disable I've seen were done.
On Sat, Mar 2, 2024 at 6:57 AM Szabolcs Nagy <[email protected]> wrote:
>
> * Mark Brown <[email protected]> [2024-02-21 17:36:12 +0000]:
>
> > On Wed, Feb 21, 2024 at 09:58:01AM -0500, [email protected] wrote:
> > > On Wed, Feb 21, 2024 at 01:53:10PM +0000, Mark Brown wrote:
> > > > On Tue, Feb 20, 2024 at 08:27:37PM -0500, [email protected] wrote:
> > > > > On Wed, Feb 21, 2024 at 12:35:48AM +0000, Edgecombe, Rick P wrote:
> >
> > > > > > (INCSSP, RSTORSSP, etc). These are a collection of instructions that
> > > > > > allow limited control of the SSP. When shadow stack gets disabled,
> > > > > > these suddenly turn into #UD generating instructions. So any other
> > > > > > threads executing those instructions when shadow stack got disabled
> > > > > > would be in for a nasty surprise.
> >
> > > > > This is the kernel's problem if that's happening. It should be
> > > > > trapping these and returning immediately like a NOP if shadow stack
> > > > > has been disabled, not generating SIGILL.
> >
> > > > I'm not sure that's going to work out well, all it takes is some code
> > > > that's looking at the shadow stack and expecting something to happen as
> > > > a result of the instructions it's executing and we run into trouble. A
> >
> > > I said NOP but there's no reason it strictly needs to be a NOP. It
> > > could instead do something reasonable to convey the state of racing
> > > with shadow stack being disabled.
> >
> > This feels like it's getting complicated and I fear it may be an uphill
> > struggle to get such code merged, at least for arm64. My instinct is
>
> the aarch64 behaviour is already nop
> for gcs instructions when gcs is disabled.
> the isa was designed so async disable is
> possible.
>
> only x86 linux would have to emulate this.
On Linux/x86, normal instructions are used to update SSP after
checking SHSTK is enabled. If SHSTK is disabled in between,
program behavior may be undefined.
--
H.J.
On Sat, Mar 02, 2024 at 03:57:02PM +0100, Szabolcs Nagy wrote:
> * Mark Brown <[email protected]> [2024-02-21 17:36:12 +0000]:
> > > I said NOP but there's no reason it strictly needs to be a NOP. It
> > > could instead do something reasonable to convey the state of racing
> > > with shadow stack being disabled.
> > This feels like it's getting complicated and I fear it may be an uphill
> > struggle to get such code merged, at least for arm64. My instinct is
> the aarch64 behaviour is already nop
> for gcs instructions when gcs is disabled.
> the isa was designed so async disable is
> possible.
Yeah, we'd need to handle GCSPR_EL0 somehow (currently it's inaccessible
when GCS is disabled) and userspace would need to take care it's not
doing something that could get stuck if for example a pop didn't
actually *do* anything.