2023-08-07 22:03:53

by Mark Brown

Subject: [PATCH v4 00/36] arm64/gcs: Provide support for GCS in userspace

The arm64 Guarded Control Stack (GCS) feature provides support for
hardware protected stacks of return addresses, intended to provide
hardening against return oriented programming (ROP) attacks and to make
it easier to gather call stacks for applications such as profiling.

When GCS is active a secondary stack called the Guarded Control Stack is
maintained, protected with a memory attribute which means that it can
only be written with specific GCS operations. The current GCS pointer
cannot be directly written to by userspace. When a BL is executed the
value stored in LR is also pushed onto the GCS, and when a RET is
executed the top of the GCS is popped and compared to LR, with a fault
being raised if the values do not match. GCS operations may only be
performed on GCS pages; a data abort is generated if they are not.

The combination of hardware enforcement and lack of extra instructions
in the function entry and exit paths should result in something which
has less overhead and is more difficult to attack than a purely software
implementation like clang's shadow stacks.

This series implements support for use of GCS by userspace, along with
support for use of GCS within KVM guests. It does not enable use of GCS
by either EL1 or EL2; this will be implemented separately. Executables
are started without GCS and must use a prctl() to enable it; it is
expected that this will be done very early in application execution by
the dynamic linker or other startup code.
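
For illustration, a minimal sketch of such an early enable. The PR_*
constants are those added later in this series, and the SVC is issued
inline because, as noted in the prctl() patch, the new GCS is empty when
the call returns, so returning from a function entered before the enable
would fault:

  #include <sys/syscall.h>

  #define PR_SET_SHADOW_STACK_STATUS	72
  #define PR_SHADOW_STACK_ENABLE	(1UL << 0)

  /* Must be inlined into startup code which does not return. */
  static inline __attribute__((always_inline)) void enable_gcs_early(void)
  {
          register long x8 __asm__("x8") = __NR_prctl;
          register long x0 __asm__("x0") = PR_SET_SHADOW_STACK_STATUS;
          register long x1 __asm__("x1") = PR_SHADOW_STACK_ENABLE;
          register long x2 __asm__("x2") = 0;
          register long x3 __asm__("x3") = 0;
          register long x4 __asm__("x4") = 0;

          /* Failure is deliberately ignored so the binary keeps
           * running on kernels or hardware without GCS. */
          __asm__ volatile("svc #0"
                           : "+r" (x0)
                           : "r" (x1), "r" (x2), "r" (x3), "r" (x4), "r" (x8)
                           : "memory", "cc");
  }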

x86 has an equivalent feature called shadow stacks; this series depends
on the x86 patches for generic memory management support for the new
guarded/shadow stack page type and shares APIs as much as possible. As
there has been extensive discussion with the wider community around the
ABI for shadow stacks I have, as far as practical, kept implementation
decisions close to those for x86, anticipating that review would lead to
similar conclusions in the absence of strong reasoning for divergence.

The main divergence I am conscious of is that x86 allows shadow stack to
be enabled and disabled repeatedly, freeing the shadow stack for the
thread whenever disabled, while this implementation keeps the GCS
allocated after disable but refuses to reenable it. This is to avoid
races with things actively walking the GCS during a disable; we do
anticipate that some systems will wish to disable GCS at runtime but are
not aware of any demand for subsequently reenabling it.

x86 uses an arch_prctl() to manage enable and disable; since only x86
and s390 use arch_prctl() a generic prctl() was proposed[1] as part of a
patch set for the equivalent RISC-V Zisslpcfi feature, which I initially
adopted fairly directly but have revised quite a bit following review
feedback.

There is an open issue with support for CRIU; on x86 this required the
ability to set the GCS mode via ptrace. This series supports
configuring mode bits other than enable/disable via ptrace but it needs
to be confirmed if this is sufficient.

There are a few bits where I'm not convinced of where I've placed
things; in particular the GCS write operation is in the GCS header, not
in uaccess.h. I wasn't sure what was clearest there and am probably too
close to the code to have a clear opinion. The reporting of GCS in
/proc/PID/smaps is also a bit awkward.

The series depends on the x86 shadow stack support:

https://lore.kernel.org/lkml/[email protected]/

I've rebased this onto v6.5-rc4 but not included it in the series in
order to avoid confusion with Rick's work and to cut down the size of
the series; you can see the branch at:

https://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc.git arm64-gcs

[1] https://lore.kernel.org/lkml/[email protected]/

Signed-off-by: Mark Brown <[email protected]>
---
Changes in v4:
- Implement flags for map_shadow_stack() allowing the cap and end of
stack marker to be enabled independently or not at all.
- Relax size and alignment requirements for map_shadow_stack().
- Add more blurb explaining the advantages of hardware enforcement.
- Link to v3: https://lore.kernel.org/r/[email protected]

Changes in v3:
- Rebase onto v6.5-rc4.
- Add a GCS barrier on context switch.
- Add a GCS stress test.
- Link to v2: https://lore.kernel.org/r/[email protected]

Changes in v2:
- Rebase onto v6.5-rc3.
- Rework prctl() interface to allow each bit to be locked independently.
- map_shadow_stack() now places the cap token based on the size
requested by the caller, not the actual space allocated.
- Mode changes other than enable via ptrace are now supported.
- Expand test coverage.
- Various smaller fixes and adjustments.
- Link to v1: https://lore.kernel.org/r/[email protected]

---
Mark Brown (36):
prctl: arch-agnostic prctl for shadow stack
arm64: Document boot requirements for Guarded Control Stacks
arm64/gcs: Document the ABI for Guarded Control Stacks
arm64/sysreg: Add new system registers for GCS
arm64/sysreg: Add definitions for architected GCS caps
arm64/gcs: Add manual encodings of GCS instructions
arm64/gcs: Provide copy_to_user_gcs()
arm64/cpufeature: Runtime detection of Guarded Control Stack (GCS)
arm64/mm: Allocate PIE slots for EL0 guarded control stack
mm: Define VM_SHADOW_STACK for arm64 when we support GCS
arm64/mm: Map pages for guarded control stack
KVM: arm64: Manage GCS registers for guests
arm64/gcs: Allow GCS usage at EL0 and EL1
arm64/idreg: Add override for GCS
arm64/hwcap: Add hwcap for GCS
arm64/traps: Handle GCS exceptions
arm64/mm: Handle GCS data aborts
arm64/gcs: Context switch GCS state for EL0
arm64/gcs: Allocate a new GCS for threads with GCS enabled
arm64/gcs: Implement shadow stack prctl() interface
arm64/mm: Implement map_shadow_stack()
arm64/signal: Set up and restore the GCS context for signal handlers
arm64/signal: Expose GCS state in signal frames
arm64/ptrace: Expose GCS via ptrace and core files
arm64: Add Kconfig for Guarded Control Stack (GCS)
kselftest/arm64: Verify the GCS hwcap
kselftest/arm64: Add GCS as a detected feature in the signal tests
kselftest/arm64: Add framework support for GCS to signal handling tests
kselftest/arm64: Allow signals tests to specify an expected si_code
kselftest/arm64: Always run signals tests with GCS enabled
kselftest/arm64: Add very basic GCS test program
kselftest/arm64: Add a GCS test program built with the system libc
kselftest/arm64: Add test coverage for GCS mode locking
selftests/arm64: Add GCS signal tests
kselftest/arm64: Add a GCS stress test
kselftest/arm64: Enable GCS for the FP stress tests

Documentation/admin-guide/kernel-parameters.txt | 3 +
Documentation/arch/arm64/booting.rst | 22 +
Documentation/arch/arm64/elf_hwcaps.rst | 3 +
Documentation/arch/arm64/gcs.rst | 228 +++++++++
Documentation/arch/arm64/index.rst | 1 +
Documentation/filesystems/proc.rst | 2 +-
arch/arm64/Kconfig | 19 +
arch/arm64/include/asm/cpufeature.h | 6 +
arch/arm64/include/asm/el2_setup.h | 17 +
arch/arm64/include/asm/esr.h | 28 +-
arch/arm64/include/asm/exception.h | 2 +
arch/arm64/include/asm/gcs.h | 106 ++++
arch/arm64/include/asm/hwcap.h | 1 +
arch/arm64/include/asm/kvm_arm.h | 4 +-
arch/arm64/include/asm/kvm_host.h | 12 +
arch/arm64/include/asm/pgtable-prot.h | 14 +-
arch/arm64/include/asm/processor.h | 7 +
arch/arm64/include/asm/sysreg.h | 20 +
arch/arm64/include/asm/uaccess.h | 42 ++
arch/arm64/include/uapi/asm/hwcap.h | 1 +
arch/arm64/include/uapi/asm/ptrace.h | 8 +
arch/arm64/include/uapi/asm/sigcontext.h | 9 +
arch/arm64/kernel/cpufeature.c | 19 +
arch/arm64/kernel/cpuinfo.c | 1 +
arch/arm64/kernel/entry-common.c | 23 +
arch/arm64/kernel/idreg-override.c | 2 +
arch/arm64/kernel/process.c | 85 ++++
arch/arm64/kernel/ptrace.c | 59 +++
arch/arm64/kernel/signal.c | 237 ++++++++-
arch/arm64/kernel/traps.c | 11 +
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 17 +
arch/arm64/kvm/sys_regs.c | 22 +
arch/arm64/mm/Makefile | 1 +
arch/arm64/mm/fault.c | 78 ++-
arch/arm64/mm/gcs.c | 234 +++++++++
arch/arm64/mm/mmap.c | 12 +-
arch/arm64/tools/cpucaps | 1 +
arch/arm64/tools/sysreg | 55 +++
fs/proc/task_mmu.c | 3 +
include/linux/mm.h | 16 +-
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/unistd.h | 5 +-
include/uapi/linux/elf.h | 1 +
include/uapi/linux/prctl.h | 22 +
kernel/sys.c | 30 ++
kernel/sys_ni.c | 1 +
tools/testing/selftests/arm64/Makefile | 2 +-
tools/testing/selftests/arm64/abi/hwcap.c | 19 +
tools/testing/selftests/arm64/fp/assembler.h | 15 +
tools/testing/selftests/arm64/fp/fpsimd-test.S | 2 +
tools/testing/selftests/arm64/fp/sve-test.S | 2 +
tools/testing/selftests/arm64/fp/za-test.S | 2 +
tools/testing/selftests/arm64/fp/zt-test.S | 2 +
tools/testing/selftests/arm64/gcs/.gitignore | 5 +
tools/testing/selftests/arm64/gcs/Makefile | 24 +
tools/testing/selftests/arm64/gcs/asm-offsets.h | 0
tools/testing/selftests/arm64/gcs/basic-gcs.c | 356 ++++++++++++++
tools/testing/selftests/arm64/gcs/gcs-locking.c | 200 ++++++++
.../selftests/arm64/gcs/gcs-stress-thread.S | 311 ++++++++++++
tools/testing/selftests/arm64/gcs/gcs-stress.c | 532 +++++++++++++++++++++
tools/testing/selftests/arm64/gcs/gcs-util.h | 87 ++++
tools/testing/selftests/arm64/gcs/libc-gcs.c | 500 +++++++++++++++++++
tools/testing/selftests/arm64/signal/.gitignore | 1 +
.../testing/selftests/arm64/signal/test_signals.c | 17 +-
.../testing/selftests/arm64/signal/test_signals.h | 6 +
.../selftests/arm64/signal/test_signals_utils.c | 32 +-
.../selftests/arm64/signal/test_signals_utils.h | 39 ++
.../arm64/signal/testcases/gcs_exception_fault.c | 59 +++
.../selftests/arm64/signal/testcases/gcs_frame.c | 78 +++
.../arm64/signal/testcases/gcs_write_fault.c | 67 +++
.../selftests/arm64/signal/testcases/testcases.c | 7 +
.../selftests/arm64/signal/testcases/testcases.h | 1 +
72 files changed, 3823 insertions(+), 34 deletions(-)
---
base-commit: ed0e1456f04be7a93c9a186e8e13aed78b555617
change-id: 20230303-arm64-gcs-e311ab0d8729

Best regards,
--
Mark Brown <[email protected]>



2023-08-07 22:04:09

by Mark Brown

Subject: [PATCH v4 02/36] arm64: Document boot requirements for Guarded Control Stacks

FEAT_GCS introduces a number of new system registers; we require that
access to these registers is not trapped when we detect that the
feature is present.

Signed-off-by: Mark Brown <[email protected]>
---
Documentation/arch/arm64/booting.rst | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)

diff --git a/Documentation/arch/arm64/booting.rst b/Documentation/arch/arm64/booting.rst
index b57776a68f15..de3679770c64 100644
--- a/Documentation/arch/arm64/booting.rst
+++ b/Documentation/arch/arm64/booting.rst
@@ -411,6 +411,28 @@ Before jumping into the kernel, the following conditions must be met:

- HFGRWR_EL2.nPIRE0_EL1 (bit 57) must be initialised to 0b1.

+ - For CPUs with support for Guarded Control Stacks (FEAT_GCS):
+
+ - If EL3 is present:
+
+ - SCR_EL3.GCSEn (bit 39) must be initialised to 0b1.
+
+ - If the kernel is entered at EL1 and EL2 is present:
+
+ - HFGITR_EL2.nGCSEPP (bit 59) must be initialised to 0b1.
+
+ - HFGITR_EL2.nGCSSTR_EL1 (bit 58) must be initialised to 0b1.
+
+ - HFGITR_EL2.nGCSPUSHM_EL1 (bit 57) must be initialised to 0b1.
+
+ - HFGRTR_EL2.nGCS_EL1 (bit 53) must be initialised to 0b1.
+
+ - HFGRTR_EL2.nGCS_EL0 (bit 52) must be initialised to 0b1.
+
+ - HFGWTR_EL2.nGCS_EL1 (bit 53) must be initialised to 0b1.
+
+ - HFGWTR_EL2.nGCS_EL0 (bit 52) must be initialised to 0b1.
+
The requirements described above for CPU mode, caches, MMUs, architected
timers, coherency and system registers apply to all CPUs. All CPUs must
enter the kernel in the same exception level. Where the values documented

--
2.30.2


2023-08-07 22:04:59

by Mark Brown

Subject: [PATCH v4 10/36] mm: Define VM_SHADOW_STACK for arm64 when we support GCS

Use VM_HIGH_ARCH_5 for guarded control stack pages.

Signed-off-by: Mark Brown <[email protected]>
---
Documentation/filesystems/proc.rst | 2 +-
fs/proc/task_mmu.c | 3 +++
include/linux/mm.h | 12 +++++++++++-
3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 6ccb57089a06..086a0408a4d7 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -566,7 +566,7 @@ encoded manner. The codes are the following:
mt arm64 MTE allocation tags are enabled
um userfaultfd missing tracking
uw userfaultfd wr-protect tracking
- ss shadow stack page
+ ss shadow/guarded control stack page
== =======================================

Note that there is no guarantee that every flag and associated mnemonic will
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index cfab855fe7e9..e8c50848bb16 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -711,6 +711,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
#ifdef CONFIG_X86_USER_SHADOW_STACK
[ilog2(VM_SHADOW_STACK)] = "ss",
+#endif
+#ifdef CONFIG_ARM64_GCS
+ [ilog2(VM_SHADOW_STACK)] = "ss",
#endif
};
size_t i;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 43fe625b85aa..3f939ae212e5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -372,7 +372,17 @@ extern unsigned int kobjsize(const void *objp);
* having a PAGE_SIZE guard gap.
*/
# define VM_SHADOW_STACK VM_HIGH_ARCH_5
-#else
+#endif
+
+#if defined(CONFIG_ARM64_GCS)
+/*
+ * arm64's Guarded Control Stack implements similar functionality and
+ * has similar constraints to shadow stacks.
+ */
+# define VM_SHADOW_STACK VM_HIGH_ARCH_5
+#endif
+
+#ifndef VM_SHADOW_STACK
# define VM_SHADOW_STACK VM_NONE
#endif


--
2.30.2


2023-08-07 22:06:12

by Mark Brown

Subject: [PATCH v4 12/36] KVM: arm64: Manage GCS registers for guests

GCS introduces a number of system registers for EL1 and EL0; on systems
with GCS we need to context switch them and expose them to VMMs to allow
guests to use GCS. Traps are already disabled.
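
For a VMM the new state is then visible through the usual ONE_REG
interface. A hedged sketch of reading the guest's GCSPR_EL1 (the
encoding comes from the sysreg definitions in this series; vcpu_fd is
assumed to be an initialised vCPU file descriptor):

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* GCSPR_EL1 is op0=3, op1=0, CRn=2, CRm=5, op2=1 */
  static int read_gcspr_el1(int vcpu_fd, __u64 *val)
  {
          struct kvm_one_reg reg = {
                  .id   = ARM64_SYS_REG(3, 0, 2, 5, 1),
                  .addr = (__u64)(unsigned long)val,
          };

          return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
  }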

Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 12 ++++++++++++
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 17 +++++++++++++++++
arch/arm64/kvm/sys_regs.c | 22 ++++++++++++++++++++++
3 files changed, 51 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index d3dd05bbfe23..a5bb00f58108 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -364,6 +364,12 @@ enum vcpu_sysreg {
PIR_EL1, /* Permission Indirection Register 1 (EL1) */
PIRE0_EL1, /* Permission Indirection Register 0 (EL1) */

+ /* Guarded Control Stack registers */
+ GCSCRE0_EL1, /* Guarded Control Stack Control (EL0) */
+ GCSCR_EL1, /* Guarded Control Stack Control (EL1) */
+ GCSPR_EL0, /* Guarded Control Stack Pointer (EL0) */
+ GCSPR_EL1, /* Guarded Control Stack Pointer (EL1) */
+
/* 32bit specific registers. */
DACR32_EL2, /* Domain Access Control Register */
IFSR32_EL2, /* Instruction Fault Status Register */
@@ -1136,6 +1142,12 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
#define kvm_vm_has_ran_once(kvm) \
(test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &(kvm)->arch.flags))

+static inline bool has_gcs(void)
+{
+ return IS_ENABLED(CONFIG_ARM64_GCS) &&
+ cpus_have_final_cap(ARM64_HAS_GCS);
+}
+
int kvm_trng_call(struct kvm_vcpu *vcpu);
#ifdef CONFIG_KVM
extern phys_addr_t hyp_mem_base;
diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
index bb6b571ec627..ec34d4a90717 100644
--- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
+++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
@@ -25,6 +25,8 @@ static inline void __sysreg_save_user_state(struct kvm_cpu_context *ctxt)
{
ctxt_sys_reg(ctxt, TPIDR_EL0) = read_sysreg(tpidr_el0);
ctxt_sys_reg(ctxt, TPIDRRO_EL0) = read_sysreg(tpidrro_el0);
+ if (has_gcs())
+ ctxt_sys_reg(ctxt, GCSPR_EL0) = read_sysreg_s(SYS_GCSPR_EL0);
}

static inline bool ctxt_has_mte(struct kvm_cpu_context *ctxt)
@@ -62,6 +64,12 @@ static inline void __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
ctxt_sys_reg(ctxt, PAR_EL1) = read_sysreg_par();
ctxt_sys_reg(ctxt, TPIDR_EL1) = read_sysreg(tpidr_el1);

+ if (has_gcs()) {
+ ctxt_sys_reg(ctxt, GCSPR_EL1) = read_sysreg_el1(SYS_GCSPR);
+ ctxt_sys_reg(ctxt, GCSCR_EL1) = read_sysreg_el1(SYS_GCSCR);
+ ctxt_sys_reg(ctxt, GCSCRE0_EL1) = read_sysreg_s(SYS_GCSCRE0_EL1);
+ }
+
if (ctxt_has_mte(ctxt)) {
ctxt_sys_reg(ctxt, TFSR_EL1) = read_sysreg_el1(SYS_TFSR);
ctxt_sys_reg(ctxt, TFSRE0_EL1) = read_sysreg_s(SYS_TFSRE0_EL1);
@@ -95,6 +103,8 @@ static inline void __sysreg_restore_user_state(struct kvm_cpu_context *ctxt)
{
write_sysreg(ctxt_sys_reg(ctxt, TPIDR_EL0), tpidr_el0);
write_sysreg(ctxt_sys_reg(ctxt, TPIDRRO_EL0), tpidrro_el0);
+ if (has_gcs())
+ write_sysreg_s(ctxt_sys_reg(ctxt, GCSPR_EL0), SYS_GCSPR_EL0);
}

static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
@@ -138,6 +148,13 @@ static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
write_sysreg(ctxt_sys_reg(ctxt, PAR_EL1), par_el1);
write_sysreg(ctxt_sys_reg(ctxt, TPIDR_EL1), tpidr_el1);

+ if (has_gcs()) {
+ write_sysreg_el1(ctxt_sys_reg(ctxt, GCSPR_EL1), SYS_GCSPR);
+ write_sysreg_el1(ctxt_sys_reg(ctxt, GCSCR_EL1), SYS_GCSCR);
+ write_sysreg_s(ctxt_sys_reg(ctxt, GCSCRE0_EL1),
+ SYS_GCSCRE0_EL1);
+ }
+
if (ctxt_has_mte(ctxt)) {
write_sysreg_el1(ctxt_sys_reg(ctxt, TFSR_EL1), SYS_TFSR);
write_sysreg_s(ctxt_sys_reg(ctxt, TFSRE0_EL1), SYS_TFSRE0_EL1);
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 2ca2973abe66..5b2f238d33be 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1768,6 +1768,23 @@ static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
.visibility = mte_visibility, \
}

+static unsigned int gcs_visibility(const struct kvm_vcpu *vcpu,
+ const struct sys_reg_desc *rd)
+{
+ if (has_gcs())
+ return 0;
+
+ return REG_HIDDEN;
+}
+
+#define GCS_REG(name) { \
+ SYS_DESC(SYS_##name), \
+ .access = undef_access, \
+ .reset = reset_unknown, \
+ .reg = name, \
+ .visibility = gcs_visibility, \
+}
+
static unsigned int el2_visibility(const struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd)
{
@@ -2080,6 +2097,10 @@ static const struct sys_reg_desc sys_reg_descs[] = {
PTRAUTH_KEY(APDB),
PTRAUTH_KEY(APGA),

+ GCS_REG(GCSCR_EL1),
+ GCS_REG(GCSPR_EL1),
+ GCS_REG(GCSCRE0_EL1),
+
{ SYS_DESC(SYS_SPSR_EL1), access_spsr},
{ SYS_DESC(SYS_ELR_EL1), access_elr},

@@ -2162,6 +2183,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_SMIDR_EL1), undef_access },
{ SYS_DESC(SYS_CSSELR_EL1), access_csselr, reset_unknown, CSSELR_EL1 },
{ SYS_DESC(SYS_CTR_EL0), access_ctr },
+ GCS_REG(GCSPR_EL0),
{ SYS_DESC(SYS_SVCR), undef_access },

{ PMU_SYS_REG(PMCR_EL0), .access = access_pmcr,

--
2.30.2


2023-08-07 22:09:47

by Mark Brown

Subject: [PATCH v4 20/36] arm64/gcs: Implement shadow stack prctl() interface

Implement the architecture neutral prctl() interface for setting the
shadow stack status; this supports setting and reading the current GCS
configuration for the current thread.

Userspace can enable basic GCS functionality and additionally also
support for GCS pushes and arbitrary GCS stores. It is expected that
this prctl() will be called very early in application startup, for
example by the dynamic linker, and not subsequently adjusted during
normal operation. Users should carefully note that after enabling GCS
for a thread GCS will become active with no call stack, so it is not
normally possible to return from the function that invoked the prctl().

State is stored per thread; enabling GCS for a thread causes a GCS to be
allocated for that thread.

Userspace may lock the current GCS configuration using the
PR_LOCK_SHADOW_STACK_STATUS prctl(); this prevents any further changes
to the GCS configuration via any means.

If GCS is not being enabled then all other flags are ignored; it is not
possible to enable stores or pushes without enabling GCS.

When disabling the GCS we do not free the allocated stack; this allows
for inspection of the GCS after disabling as part of fault reporting.
Since it is not an expected use case, and since it presents some
complications in determining what to do with previously initialised data
on the GCS, attempts to reenable GCS after this are rejected. This can
be revisited if a use case arises.
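
A sketch of the resulting behaviour as seen from userspace (error
handling and the caveat above about returning after the enable are
elided for brevity):

  #include <sys/prctl.h>
  #include <linux/prctl.h>	/* PR_* values with this series applied */

  static void gcs_disable_is_one_way(void)
  {
          prctl(PR_SET_SHADOW_STACK_STATUS, PR_SHADOW_STACK_ENABLE, 0, 0, 0);
          /* ... run with GCS active ... */
          prctl(PR_SET_SHADOW_STACK_STATUS, 0, 0, 0, 0); /* disable, GCS kept */

          /* Rejected with -EINVAL: the thread still has its old GCS */
          prctl(PR_SET_SHADOW_STACK_STATUS, PR_SHADOW_STACK_ENABLE, 0, 0, 0);
  }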

Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/asm/gcs.h | 22 ++++++++++
arch/arm64/include/asm/processor.h | 1 +
arch/arm64/mm/gcs.c | 82 ++++++++++++++++++++++++++++++++++++++
3 files changed, 105 insertions(+)

diff --git a/arch/arm64/include/asm/gcs.h b/arch/arm64/include/asm/gcs.h
index 4371a2f99b4a..c150e76869a1 100644
--- a/arch/arm64/include/asm/gcs.h
+++ b/arch/arm64/include/asm/gcs.h
@@ -48,6 +48,9 @@ static inline u64 gcsss2(void)
return Xt;
}

+#define PR_SHADOW_STACK_SUPPORTED_STATUS_MASK \
+ (PR_SHADOW_STACK_ENABLE | PR_SHADOW_STACK_WRITE | PR_SHADOW_STACK_PUSH)
+
#ifdef CONFIG_ARM64_GCS

static inline bool task_gcs_el0_enabled(struct task_struct *task)
@@ -61,6 +64,20 @@ void gcs_preserve_current_state(void);
unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
unsigned long clone_flags, size_t size);

+static inline int gcs_check_locked(struct task_struct *task,
+ unsigned long new_val)
+{
+ unsigned long cur_val = task->thread.gcs_el0_mode;
+
+ cur_val &= task->thread.gcs_el0_locked;
+ new_val &= task->thread.gcs_el0_locked;
+
+ if (cur_val != new_val)
+ return -EBUSY;
+
+ return 0;
+}
+
#else

static inline bool task_gcs_el0_enabled(struct task_struct *task)
@@ -76,6 +93,11 @@ static inline unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
{
return -ENOTSUPP;
}
+static inline int gcs_check_locked(struct task_struct *task,
+ unsigned long new_val)
+{
+ return 0;
+}

#endif

diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index f1551228a143..e4255749844a 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -181,6 +181,7 @@ struct thread_struct {
u64 tpidr2_el0;
#ifdef CONFIG_ARM64_GCS
unsigned int gcs_el0_mode;
+ unsigned int gcs_el0_locked;
u64 gcspr_el0;
u64 gcs_base;
u64 gcs_size;
diff --git a/arch/arm64/mm/gcs.c b/arch/arm64/mm/gcs.c
index 1e059c37088d..64c9f9a85925 100644
--- a/arch/arm64/mm/gcs.c
+++ b/arch/arm64/mm/gcs.c
@@ -93,3 +93,85 @@ void gcs_free(struct task_struct *task)
task->thread.gcs_base = 0;
task->thread.gcs_size = 0;
}
+
+int arch_set_shadow_stack_status(struct task_struct *task, unsigned long arg)
+{
+ unsigned long gcs, size;
+ int ret;
+
+ if (!system_supports_gcs())
+ return -EINVAL;
+
+ if (is_compat_thread(task_thread_info(task)))
+ return -EINVAL;
+
+ /* Reject unknown flags */
+ if (arg & ~PR_SHADOW_STACK_SUPPORTED_STATUS_MASK)
+ return -EINVAL;
+
+ ret = gcs_check_locked(task, arg);
+ if (ret != 0)
+ return ret;
+
+ /* If we are enabling GCS then make sure we have a stack */
+ if (arg & PR_SHADOW_STACK_ENABLE) {
+ if (!task_gcs_el0_enabled(task)) {
+ /* Do not allow GCS to be reenabled */
+ if (task->thread.gcs_base)
+ return -EINVAL;
+
+ if (task != current)
+ return -EBUSY;
+
+ size = gcs_size(0);
+ gcs = alloc_gcs(task->thread.gcspr_el0, size,
+ 0, 0);
+ if (!gcs)
+ return -ENOMEM;
+
+ task->thread.gcspr_el0 = gcs + size - sizeof(u64);
+ task->thread.gcs_base = gcs;
+ task->thread.gcs_size = size;
+ if (task == current)
+ write_sysreg_s(task->thread.gcspr_el0,
+ SYS_GCSPR_EL0);
+
+ }
+ }
+
+ task->thread.gcs_el0_mode = arg;
+ if (task == current)
+ gcs_set_el0_mode(task);
+
+ return 0;
+}
+
+int arch_get_shadow_stack_status(struct task_struct *task,
+ unsigned long __user *arg)
+{
+ if (!system_supports_gcs())
+ return -EINVAL;
+
+ if (is_compat_thread(task_thread_info(task)))
+ return -EINVAL;
+
+ return put_user(task->thread.gcs_el0_mode, arg);
+}
+
+int arch_lock_shadow_stack_status(struct task_struct *task,
+ unsigned long arg)
+{
+ if (!system_supports_gcs())
+ return -EINVAL;
+
+ if (is_compat_thread(task_thread_info(task)))
+ return -EINVAL;
+
+ /*
+ * We support locking unknown bits so applications can prevent
+ * any changes in a future proof manner.
+ */
+ task->thread.gcs_el0_locked |= arg;
+
+ return 0;
+}

--
2.30.2


2023-08-07 22:10:11

by Mark Brown

Subject: [PATCH v4 36/36] kselftest/arm64: Enable GCS for the FP stress tests

While it's a bit off topic for them, the floating point stress tests do
give us some coverage of context thrashing cases, and also of active
signal delivery separate from the relatively complicated framework in
the actual signals tests. Have the tests enable GCS on startup, ignoring
failures so they continue to work as before on systems without GCS.

Signed-off-by: Mark Brown <[email protected]>
---
tools/testing/selftests/arm64/fp/assembler.h | 15 +++++++++++++++
tools/testing/selftests/arm64/fp/fpsimd-test.S | 2 ++
tools/testing/selftests/arm64/fp/sve-test.S | 2 ++
tools/testing/selftests/arm64/fp/za-test.S | 2 ++
tools/testing/selftests/arm64/fp/zt-test.S | 2 ++
5 files changed, 23 insertions(+)

diff --git a/tools/testing/selftests/arm64/fp/assembler.h b/tools/testing/selftests/arm64/fp/assembler.h
index 9b38a0da407d..7012f9f796de 100644
--- a/tools/testing/selftests/arm64/fp/assembler.h
+++ b/tools/testing/selftests/arm64/fp/assembler.h
@@ -65,4 +65,19 @@ endfunction
bl puts
.endm

+#define PR_SET_SHADOW_STACK_STATUS 72
+# define PR_SHADOW_STACK_ENABLE (1UL << 0)
+
+.macro enable_gcs
+ // Run with GCS
+ mov x0, PR_SET_SHADOW_STACK_STATUS
+ mov x1, PR_SHADOW_STACK_ENABLE
+ mov x2, xzr
+ mov x3, xzr
+ mov x4, xzr
+ mov x5, xzr
+ mov x8, #__NR_prctl
+ svc #0
+.endm
+
#endif /* ! ASSEMBLER_H */
diff --git a/tools/testing/selftests/arm64/fp/fpsimd-test.S b/tools/testing/selftests/arm64/fp/fpsimd-test.S
index 8b960d01ed2e..b16fb7f42e3e 100644
--- a/tools/testing/selftests/arm64/fp/fpsimd-test.S
+++ b/tools/testing/selftests/arm64/fp/fpsimd-test.S
@@ -215,6 +215,8 @@ endfunction
// Main program entry point
.globl _start
function _start
+ enable_gcs
+
mov x23, #0 // signal count

mov w0, #SIGINT
diff --git a/tools/testing/selftests/arm64/fp/sve-test.S b/tools/testing/selftests/arm64/fp/sve-test.S
index 4328895dfc87..486634bc7def 100644
--- a/tools/testing/selftests/arm64/fp/sve-test.S
+++ b/tools/testing/selftests/arm64/fp/sve-test.S
@@ -378,6 +378,8 @@ endfunction
// Main program entry point
.globl _start
function _start
+ enable_gcs
+
mov x23, #0 // Irritation signal count

mov w0, #SIGINT
diff --git a/tools/testing/selftests/arm64/fp/za-test.S b/tools/testing/selftests/arm64/fp/za-test.S
index 9dcd70911397..f789694fa3ea 100644
--- a/tools/testing/selftests/arm64/fp/za-test.S
+++ b/tools/testing/selftests/arm64/fp/za-test.S
@@ -231,6 +231,8 @@ endfunction
// Main program entry point
.globl _start
function _start
+ enable_gcs
+
mov x23, #0 // signal count

mov w0, #SIGINT
diff --git a/tools/testing/selftests/arm64/fp/zt-test.S b/tools/testing/selftests/arm64/fp/zt-test.S
index d63286397638..ea5e55310705 100644
--- a/tools/testing/selftests/arm64/fp/zt-test.S
+++ b/tools/testing/selftests/arm64/fp/zt-test.S
@@ -200,6 +200,8 @@ endfunction
// Main program entry point
.globl _start
function _start
+ enable_gcs
+
mov x23, #0 // signal count

mov w0, #SIGINT

--
2.30.2


2023-08-07 22:11:39

by Mark Brown

Subject: [PATCH v4 24/36] arm64/ptrace: Expose GCS via ptrace and core files

Provide a new register type NT_ARM_GCS reporting the current GCS mode
and pointer for EL0. Due to the interactions with allocation and
deallocation of Guarded Control Stacks we do not permit enabling GCS via
ptrace; the other mode bits and GCSPR_EL0 may be changed.
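
A hedged sketch of a debugger reading the new regset (pid is assumed to
be an attached, stopped tracee; the NT_ARM_GCS value and struct user_gcs
mirror the definitions below, guarded for older system headers):

  #include <elf.h>
  #include <sys/ptrace.h>
  #include <sys/types.h>
  #include <sys/uio.h>
  #include <linux/types.h>

  #ifndef NT_ARM_GCS
  #define NT_ARM_GCS 0x40e
  struct user_gcs {
          __u64 features_enabled;
          __u64 features_locked;
          __u64 gcspr_el0;
  };
  #endif

  static long read_gcs_regs(pid_t pid, struct user_gcs *gcs)
  {
          struct iovec iov = {
                  .iov_base = gcs,
                  .iov_len  = sizeof(*gcs),
          };

          return ptrace(PTRACE_GETREGSET, pid, (void *)(long)NT_ARM_GCS, &iov);
  }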

Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/uapi/asm/ptrace.h | 8 +++++
arch/arm64/kernel/ptrace.c | 59 ++++++++++++++++++++++++++++++++++++
include/uapi/linux/elf.h | 1 +
3 files changed, 68 insertions(+)

diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h
index 7fa2f7036aa7..0f39ba4f3efd 100644
--- a/arch/arm64/include/uapi/asm/ptrace.h
+++ b/arch/arm64/include/uapi/asm/ptrace.h
@@ -324,6 +324,14 @@ struct user_za_header {
#define ZA_PT_SIZE(vq) \
(ZA_PT_ZA_OFFSET + ZA_PT_ZA_SIZE(vq))

+/* GCS state (NT_ARM_GCS) */
+
+struct user_gcs {
+ __u64 features_enabled;
+ __u64 features_locked;
+ __u64 gcspr_el0;
+};
+
#endif /* __ASSEMBLY__ */

#endif /* _UAPI__ASM_PTRACE_H */
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index d7f4f0d1ae12..c159090bc731 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -33,6 +33,7 @@
#include <asm/cpufeature.h>
#include <asm/debug-monitors.h>
#include <asm/fpsimd.h>
+#include <asm/gcs.h>
#include <asm/mte.h>
#include <asm/pointer_auth.h>
#include <asm/stacktrace.h>
@@ -1390,6 +1391,51 @@ static int tagged_addr_ctrl_set(struct task_struct *target, const struct
}
#endif

+#ifdef CONFIG_ARM64_GCS
+static int gcs_get(struct task_struct *target,
+ const struct user_regset *regset,
+ struct membuf to)
+{
+ struct user_gcs user_gcs;
+
+ if (target == current)
+ gcs_preserve_current_state();
+
+ user_gcs.features_enabled = target->thread.gcs_el0_mode;
+ user_gcs.features_locked = target->thread.gcs_el0_locked;
+ user_gcs.gcspr_el0 = target->thread.gcspr_el0;
+
+ return membuf_write(&to, &user_gcs, sizeof(user_gcs));
+}
+
+static int gcs_set(struct task_struct *target, const struct
+ user_regset *regset, unsigned int pos,
+ unsigned int count, const void *kbuf, const
+ void __user *ubuf)
+{
+ int ret;
+ struct user_gcs user_gcs;
+
+ ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &user_gcs, 0, -1);
+ if (ret)
+ return ret;
+
+ if (user_gcs.features_enabled & ~PR_SHADOW_STACK_SUPPORTED_STATUS_MASK)
+ return -EINVAL;
+
+ /* Do not allow enable via ptrace */
+ if ((user_gcs.features_enabled & PR_SHADOW_STACK_ENABLE) &&
+ !(target->thread.gcs_el0_mode & PR_SHADOW_STACK_ENABLE))
+ return -EBUSY;
+
+ target->thread.gcs_el0_mode = user_gcs.features_enabled;
+ target->thread.gcs_el0_locked = user_gcs.features_locked;
+ target->thread.gcspr_el0 = user_gcs.gcspr_el0;
+
+ return 0;
+}
+#endif
+
enum aarch64_regset {
REGSET_GPR,
REGSET_FPR,
@@ -1418,6 +1464,9 @@ enum aarch64_regset {
#ifdef CONFIG_ARM64_TAGGED_ADDR_ABI
REGSET_TAGGED_ADDR_CTRL,
#endif
+#ifdef CONFIG_ARM64_GCS
+ REGSET_GCS,
+#endif
};

static const struct user_regset aarch64_regsets[] = {
@@ -1568,6 +1617,16 @@ static const struct user_regset aarch64_regsets[] = {
.set = tagged_addr_ctrl_set,
},
#endif
+#ifdef CONFIG_ARM64_GCS
+ [REGSET_GCS] = {
+ .core_note_type = NT_ARM_GCS,
+ .n = sizeof(struct user_gcs) / sizeof(u64),
+ .size = sizeof(u64),
+ .align = sizeof(u64),
+ .regset_get = gcs_get,
+ .set = gcs_set,
+ },
+#endif
};

static const struct user_regset_view user_aarch64_view = {
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index 0c8cf359ea5b..00f698a2ab17 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -438,6 +438,7 @@ typedef struct elf64_shdr {
#define NT_ARM_SSVE 0x40b /* ARM Streaming SVE registers */
#define NT_ARM_ZA 0x40c /* ARM SME ZA registers */
#define NT_ARM_ZT 0x40d /* ARM SME ZT registers */
+#define NT_ARM_GCS 0x40e /* ARM GCS state */
#define NT_ARC_V2 0x600 /* ARCv2 accumulator/extra registers */
#define NT_VMCOREDD 0x700 /* Vmcore Device Dump Note */
#define NT_MIPS_DSP 0x800 /* MIPS DSP ASE registers */

--
2.30.2


2023-08-07 22:11:39

by Mark Brown

Subject: [PATCH v4 04/36] arm64/sysreg: Add new system registers for GCS

FEAT_GCS introduces a number of new system registers. Add the registers
available up to EL2 to sysreg as per DDI0601 2022-12.

Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/tools/sysreg | 55 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 55 insertions(+)

diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index 65866bf819c3..20c12e65a304 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -1780,6 +1780,41 @@ Sysreg SMCR_EL1 3 0 1 2 6
Fields SMCR_ELx
EndSysreg

+SysregFields GCSCR_ELx
+Res0 63:10
+Field 9 STREn
+Field 8 PUSHMEn
+Res0 7
+Field 6 EXLOCKEN
+Field 5 RVCHKEN
+Res0 4:1
+Field 0 PCRSEL
+EndSysregFields
+
+Sysreg GCSCR_EL1 3 0 2 5 0
+Fields GCSCR_ELx
+EndSysreg
+
+SysregFields GCSPR_ELx
+Field 63:3 PTR
+Res0 2:0
+EndSysregFields
+
+Sysreg GCSPR_EL1 3 0 2 5 1
+Fields GCSPR_ELx
+EndSysreg
+
+Sysreg GCSCRE0_EL1 3 0 2 5 2
+Res0 63:11
+Field 10 nTR
+Field 9 STREn
+Field 8 PUSHMEn
+Res0 7:6
+Field 5 RVCHKEN
+Res0 4:1
+Field 0 PCRSEL
+EndSysreg
+
Sysreg ALLINT 3 0 4 3 0
Res0 63:14
Field 13 ALLINT
@@ -2010,6 +2045,10 @@ Field 4 DZP
Field 3:0 BS
EndSysreg

+Sysreg GCSPR_EL0 3 3 2 5 1
+Fields GCSPR_ELx
+EndSysreg
+
Sysreg SVCR 3 3 4 2 2
Res0 63:2
Field 1 ZA
@@ -2209,6 +2248,14 @@ Sysreg SMCR_EL2 3 4 1 2 6
Fields SMCR_ELx
EndSysreg

+Sysreg GCSCR_EL2 3 4 2 5 0
+Fields GCSCR_ELx
+EndSysreg
+
+Sysreg GCSPR_EL2 3 4 2 5 1
+Fields GCSPR_ELx
+EndSysreg
+
Sysreg DACR32_EL2 3 4 3 0 0
Res0 63:32
Field 31:30 D15
@@ -2268,6 +2315,14 @@ Sysreg SMCR_EL12 3 5 1 2 6
Fields SMCR_ELx
EndSysreg

+Sysreg GCSCR_EL12 3 5 2 5 0
+Fields GCSCR_ELx
+EndSysreg
+
+Sysreg GCSPR_EL12 3 5 2 5 1
+Fields GCSPR_ELx
+EndSysreg
+
Sysreg FAR_EL12 3 5 6 0 0
Field 63:0 ADDR
EndSysreg

--
2.30.2


2023-08-07 22:12:40

by Mark Brown

Subject: [PATCH v4 17/36] arm64/mm: Handle GCS data aborts

All GCS operations at EL0 must happen on a page which is marked as
having UnprivGCS access, including read operations. If a GCS operation
attempts to access a page without this then it will generate a data
abort with the GCS bit set in ESR_EL1.ISS2.

EL0 may validly generate such faults, for example due to copy on write
which will cause the GCS data to be stored in a read only page with no
GCS permissions until the actual copy happens. Since UnprivGCS allows
both reads and writes to the GCS (though only through GCS operations) we
need to ensure that the memory management subsystem handles GCS accesses
as writes at all times. Do this by adding FAULT_FLAG_WRITE to any GCS
page faults, and add handling to ensure that invalid cases are
identified as such early on, so the memory management core does not
think they will succeed. The core cannot distinguish between VMAs which
are generally writeable and VMAs which are only writeable through GCS
operations.

EL1 may validly write to an EL0 GCS for management purposes (e.g. while
initialising it with cap tokens).

We also report any GCS faults in VMAs not marked as part of a GCS as
access violations, causing a fault to be delivered to userspace if it
attempts to do GCS operations outside a GCS.
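
The userspace-visible result can be sketched as below; the
map_shadow_stack() syscall comes from the x86 series this depends on,
so the number used here is an assumption for illustration:

  #include <sys/syscall.h>
  #include <unistd.h>

  #ifndef __NR_map_shadow_stack
  #define __NR_map_shadow_stack 453	/* assumed, allocated in the x86 series */
  #endif

  static void demo_invalid_gcs_write(void)
  {
          unsigned long *gcs;

          gcs = (unsigned long *)syscall(__NR_map_shadow_stack, 0UL,
                                         (unsigned long)getpagesize(), 0UL);

          /*
           * A plain store is not a GCS operation and is not from EL1,
           * so this is reported as an access violation and the process
           * gets SIGSEGV rather than a copy on write succeeding.
           */
          *gcs = 0;
  }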

Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/mm/fault.c | 78 +++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 70 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 3fe516b32577..ec392207a475 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -494,13 +494,30 @@ static void do_bad_area(unsigned long far, unsigned long esr,
}
}

+/*
+ * Note: not valid for EL1 DC IVAC, but we never use that such that it
+ * should fault. EL0 cannot issue DC IVAC (undef).
+ */
+static bool is_write_abort(unsigned long esr)
+{
+ return (esr & ESR_ELx_WNR) && !(esr & ESR_ELx_CM);
+}
+
+static bool is_gcs_fault(unsigned long esr)
+{
+ if (!esr_is_data_abort(esr))
+ return false;
+
+ return ESR_ELx_ISS2(esr) & ESR_ELx_GCS;
+}
+
#define VM_FAULT_BADMAP ((__force vm_fault_t)0x010000)
#define VM_FAULT_BADACCESS ((__force vm_fault_t)0x020000)

static vm_fault_t __do_page_fault(struct mm_struct *mm,
struct vm_area_struct *vma, unsigned long addr,
unsigned int mm_flags, unsigned long vm_flags,
- struct pt_regs *regs)
+ unsigned long esr, struct pt_regs *regs)
{
/*
* Ok, we have a good vm_area for this memory access, so we can handle
@@ -510,6 +527,26 @@ static vm_fault_t __do_page_fault(struct mm_struct *mm,
*/
if (!(vma->vm_flags & vm_flags))
return VM_FAULT_BADACCESS;
+
+ if (vma->vm_flags & VM_SHADOW_STACK) {
+ /*
+ * Writes to a GCS must either be generated by a GCS
+ * operation or be from EL1.
+ */
+ if (is_write_abort(esr) &&
+ !(is_gcs_fault(esr) || is_el1_data_abort(esr)))
+ return VM_FAULT_BADACCESS;
+ } else {
+ /*
+ * GCS faults should never happen for pages that are
+ * not part of a GCS and the operation being attempted
+ * can never succeed.
+ */
+ if (is_gcs_fault(esr))
+ return VM_FAULT_BADACCESS;
+ }
+
+
return handle_mm_fault(vma, addr, mm_flags, regs);
}

@@ -518,13 +555,18 @@ static bool is_el0_instruction_abort(unsigned long esr)
return ESR_ELx_EC(esr) == ESR_ELx_EC_IABT_LOW;
}

-/*
- * Note: not valid for EL1 DC IVAC, but we never use that such that it
- * should fault. EL0 cannot issue DC IVAC (undef).
- */
-static bool is_write_abort(unsigned long esr)
+static bool is_invalid_el0_gcs_access(struct vm_area_struct *vma, u64 esr)
{
- return (esr & ESR_ELx_WNR) && !(esr & ESR_ELx_CM);
+ if (!system_supports_gcs())
+ return false;
+ if (likely(!(vma->vm_flags & VM_SHADOW_STACK))) {
+ if (is_gcs_fault(esr))
+ return true;
+ return false;
+ }
+ if (is_gcs_fault(esr))
+ return false;
+ return is_write_abort(esr);
}

static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
@@ -573,6 +615,13 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
/* If EPAN is absent then exec implies read */
if (!cpus_have_const_cap(ARM64_HAS_EPAN))
vm_flags |= VM_EXEC;
+ /*
+ * Upgrade read faults to write faults, GCS reads must
+ * occur on a page marked as GCS so we need to trigger
+ * copy on write always.
+ */
+ if (is_gcs_fault(esr))
+ mm_flags |= FAULT_FLAG_WRITE;
}

if (is_ttbr0_addr(addr) && is_el1_permission_fault(addr, esr, regs)) {
@@ -595,6 +644,19 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
if (!vma)
goto lock_mmap;

+ /*
+ * We get legitimate write faults for GCS pages from GCS
+ * operations and from EL1 writes to EL0 pages but just plain
+ * EL0 writes are invalid. Specifically check for this since
+ * as a result of upgrading read faults to write faults for
+ * CoW the mm core isn't able to distinguish these invalid
+ * writes.
+ */
+ if (is_invalid_el0_gcs_access(vma, esr)) {
+ vma_end_read(vma);
+ goto lock_mmap;
+ }
+
if (!(vma->vm_flags & vm_flags)) {
vma_end_read(vma);
goto lock_mmap;
@@ -624,7 +686,7 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
goto done;
}

- fault = __do_page_fault(mm, vma, addr, mm_flags, vm_flags, regs);
+ fault = __do_page_fault(mm, vma, addr, mm_flags, vm_flags, esr, regs);

/* Quick path to respond to signals */
if (fault_signal_pending(fault, regs)) {

--
2.30.2


2023-08-07 22:13:01

by Mark Brown

Subject: [PATCH v4 06/36] arm64/gcs: Add manual encodings of GCS instructions

Define C callable functions for GCS instructions used by the kernel. In
order to avoid ambitious toolchain requirements for GCS support these
are manually encoded; this means we have fixed register numbers, which
will be a bit limiting for the compiler, but none of these should be
used in sufficiently fast paths for this to be a problem.

Note that GCSSTTR is used to store to EL0.
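
As a sketch of how these helpers might be combined (gcs_put_user() is a
hypothetical illustration, not an interface from this series, and the
barrier placement is an assumption based on the architecture's ordering
requirements for GCS accesses; the series itself adds copy_to_user_gcs()
in a later patch):

  /* Push one value to a user GCS from kernel context. */
  static int gcs_put_user(unsigned long __user *addr, unsigned long val)
  {
          int ret;

          if (!access_ok(addr, sizeof(*addr)))
                  return -EFAULT;

          uaccess_ttbr0_enable();
          ret = gcssttr(addr, val);
          uaccess_ttbr0_disable();

          /* Order the GCS store against subsequent normal accesses */
          gcsb_dsync();

          return ret;
  }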

Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/asm/gcs.h | 51 ++++++++++++++++++++++++++++++++++++++++
arch/arm64/include/asm/uaccess.h | 22 +++++++++++++++++
2 files changed, 73 insertions(+)

diff --git a/arch/arm64/include/asm/gcs.h b/arch/arm64/include/asm/gcs.h
new file mode 100644
index 000000000000..7c5e95218db6
--- /dev/null
+++ b/arch/arm64/include/asm/gcs.h
@@ -0,0 +1,51 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 ARM Ltd.
+ */
+#ifndef __ASM_GCS_H
+#define __ASM_GCS_H
+
+#include <asm/types.h>
+#include <asm/uaccess.h>
+
+static inline void gcsb_dsync(void)
+{
+ asm volatile(".inst 0xd503227f" : : : "memory");
+}
+
+static inline void gcsstr(u64 *addr, u64 val)
+{
+ register u64 *_addr __asm__ ("x0") = addr;
+ register long _val __asm__ ("x1") = val;
+
+ /* GCSSTTR x1, x0 */
+ asm volatile(
+ ".inst 0xd91f1c01\n"
+ :
+ : "rZ" (_val), "r" (_addr)
+ : "memory");
+}
+
+static inline void gcsss1(u64 Xt)
+{
+ asm volatile (
+ "sys #3, C7, C7, #2, %0\n"
+ :
+ : "rZ" (Xt)
+ : "memory");
+}
+
+static inline u64 gcsss2(void)
+{
+ u64 Xt;
+
+ asm volatile(
+ "SYSL %0, #3, C7, C7, #3\n"
+ : "=r" (Xt)
+ :
+ : "memory");
+
+ return Xt;
+}
+
+#endif
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 14be5000c5a0..22e10e79f56a 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -425,4 +425,26 @@ static inline size_t probe_subpage_writeable(const char __user *uaddr,

#endif /* CONFIG_ARCH_HAS_SUBPAGE_FAULTS */

+#ifdef CONFIG_ARM64_GCS
+
+static inline int gcssttr(unsigned long __user *addr, unsigned long val)
+{
+ register unsigned long __user *_addr __asm__ ("x0") = addr;
+ register unsigned long _val __asm__ ("x1") = val;
+ int err = 0;
+
+ /* GCSSTTR x1, x0 */
+ asm volatile(
+ "1: .inst 0xd91f1c01\n"
+ "2: \n"
+ _ASM_EXTABLE_UACCESS_ERR(1b, 2b, %w0)
+ : "+r" (err)
+ : "rZ" (_val), "r" (_addr)
+ : "memory");
+
+ return err;
+}
+
+#endif /* CONFIG_ARM64_GCS */
+
#endif /* __ASM_UACCESS_H */

--
2.30.2


2023-08-07 22:15:18

by Mark Brown

Subject: [PATCH v4 01/36] prctl: arch-agnostic prctl for shadow stack

Three architectures (x86, aarch64, riscv) have announced support for
shadow stacks with fairly similar functionality. While x86 is using
arch_prctl() to control the functionality, neither arm64 nor riscv uses
that interface, so this patch adds arch-agnostic prctl() support to
get and set the status of shadow stacks and to lock the current
configuration to prevent further changes, with support for turning
individual subfeatures on and off so applications can limit their
exposure to features that they do not need. The features are:

- PR_SHADOW_STACK_ENABLE: Tracking and enforcement of shadow stacks,
including allocation of a shadow stack if one is not already
allocated.
- PR_SHADOW_STACK_WRITE: Writes to specific addresses in the shadow
stack.
- PR_SHADOW_STACK_PUSH: Push additional values onto the shadow stack.

These features are expected to be inherited by new threads and cleared
on exec(); unknown features should be rejected for enable but accepted
for locking (in order to allow for future proofing), as sketched below.
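
A sketch of the intended flow (error handling and the caveats around
enabling are simplified; the -EBUSY comes from the arm64 implementation
later in this series):

  #include <sys/prctl.h>
  #include <linux/prctl.h>	/* PR_* values with this series applied */

  static void lock_down_shadow_stack(void)
  {
          unsigned long status;

          prctl(PR_SET_SHADOW_STACK_STATUS, PR_SHADOW_STACK_ENABLE, 0, 0, 0);
          prctl(PR_GET_SHADOW_STACK_STATUS, &status, 0, 0, 0);

          /* Lock every bit, including currently undefined ones */
          prctl(PR_LOCK_SHADOW_STACK_STATUS, ~0UL, 0, 0, 0);

          /* Any further change is now rejected, e.g. with -EBUSY */
          prctl(PR_SET_SHADOW_STACK_STATUS, 0, 0, 0, 0);
  }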

This is based on a patch originally written by Deepak Gupta but modified
fairly heavily: support for indirect landing pads is removed, additional
modes are added and the locking interface is reworked. The set status
prctl() is also reworked to just set flags; if setting/reading the
shadow stack pointer is required this could be a separate prctl().

Signed-off-by: Mark Brown <[email protected]>
---
include/linux/mm.h | 4 ++++
include/uapi/linux/prctl.h | 22 ++++++++++++++++++++++
kernel/sys.c | 30 ++++++++++++++++++++++++++++++
3 files changed, 56 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0ead9d8013e8..43fe625b85aa 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3964,4 +3964,8 @@ static inline void accept_memory(phys_addr_t start, phys_addr_t end)

#endif

+int arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *status);
+int arch_set_shadow_stack_status(struct task_struct *t, unsigned long status);
+int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
+
#endif /* _LINUX_MM_H */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 3c36aeade991..0de3d6ee18e0 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -305,4 +305,26 @@ struct prctl_mm_map {
# define PR_RISCV_V_VSTATE_CTRL_NEXT_MASK 0xc
# define PR_RISCV_V_VSTATE_CTRL_MASK 0x1f

+/*
+ * Get the current shadow stack configuration for the current thread,
+ * this will be the value configured via PR_SET_SHADOW_STACK_STATUS.
+ */
+#define PR_GET_SHADOW_STACK_STATUS 71
+
+/*
+ * Set the current shadow stack configuration. Enabling the shadow
+ * stack will cause a shadow stack to be allocated for the thread.
+ */
+#define PR_SET_SHADOW_STACK_STATUS 72
+# define PR_SHADOW_STACK_ENABLE (1UL << 0)
+# define PR_SHADOW_STACK_WRITE (1UL << 1)
+# define PR_SHADOW_STACK_PUSH (1UL << 2)
+
+/*
+ * Prevent further changes to the specified shadow stack
+ * configuration. All bits may be locked via this call, including
+ * undefined bits.
+ */
+#define PR_LOCK_SHADOW_STACK_STATUS 73
+
#endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 2410e3999ebe..b26423a614a9 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2302,6 +2302,21 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which,
return -EINVAL;
}

+int __weak arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *status)
+{
+ return -EINVAL;
+}
+
+int __weak arch_set_shadow_stack_status(struct task_struct *t, unsigned long status)
+{
+ return -EINVAL;
+}
+
+int __weak arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status)
+{
+ return -EINVAL;
+}
+
#define PR_IO_FLUSHER (PF_MEMALLOC_NOIO | PF_LOCAL_THROTTLE)

#ifdef CONFIG_ANON_VMA_NAME
@@ -2720,6 +2735,21 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
case PR_RISCV_V_GET_CONTROL:
error = RISCV_V_GET_CONTROL();
break;
+ case PR_GET_SHADOW_STACK_STATUS:
+ if (arg3 || arg4 || arg5)
+ return -EINVAL;
+ error = arch_get_shadow_stack_status(me, (unsigned long __user *) arg2);
+ break;
+ case PR_SET_SHADOW_STACK_STATUS:
+ if (arg3 || arg4 || arg5)
+ return -EINVAL;
+ error = arch_set_shadow_stack_status(me, arg2);
+ break;
+ case PR_LOCK_SHADOW_STACK_STATUS:
+ if (arg3 || arg4 || arg5)
+ return -EINVAL;
+ error = arch_lock_shadow_stack_status(me, arg2);
+ break;
default:
error = -EINVAL;
break;

--
2.30.2


2023-08-07 22:22:11

by Mark Brown

Subject: [PATCH v4 16/36] arm64/traps: Handle GCS exceptions

A new exception code is defined for GCS specific faults other than
standard load/store faults, for example GCS token validation failures;
add handling for this. These faults are reported to userspace as
segfaults with code SEGV_CPERR (protection error), mirroring the
reporting for x86 shadow stack errors.

GCS faults due to memory load/store operations generate data aborts with
a flag set; these will be handled separately as part of the data abort
handling.

Since we do not currently enable GCS for EL1 we should not get any
faults there, but while we're at it we wire things up there too,
treating any GCS fault at EL1 as fatal.
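
For reference, a sketch of how a test might observe these faults; the
SEGV_CPERR definition is expected from the uapi headers via the x86
shadow stack work, with the value here an assumption for older headers:

  #include <signal.h>
  #include <stdio.h>
  #include <stdlib.h>

  #ifndef SEGV_CPERR
  #define SEGV_CPERR 10	/* assumed value: protection error */
  #endif

  static void segv_handler(int sig, siginfo_t *info, void *context)
  {
          if (info->si_code == SEGV_CPERR)
                  fprintf(stderr, "GCS fault at %p\n", info->si_addr);
          exit(128 + sig);
  }

  static void install_segv_handler(void)
  {
          struct sigaction sa = {
                  .sa_sigaction = segv_handler,
                  .sa_flags     = SA_SIGINFO,
          };

          sigaction(SIGSEGV, &sa, NULL);
  }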

Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/asm/esr.h | 28 +++++++++++++++++++++++++++-
arch/arm64/include/asm/exception.h | 2 ++
arch/arm64/kernel/entry-common.c | 23 +++++++++++++++++++++++
arch/arm64/kernel/traps.c | 11 +++++++++++
4 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index ae35939f395b..a87a8305051f 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -51,7 +51,8 @@
#define ESR_ELx_EC_FP_EXC32 (0x28)
/* Unallocated EC: 0x29 - 0x2B */
#define ESR_ELx_EC_FP_EXC64 (0x2C)
-/* Unallocated EC: 0x2D - 0x2E */
+#define ESR_ELx_EC_GCS (0x2D)
+/* Unallocated EC: 0x2E */
#define ESR_ELx_EC_SERROR (0x2F)
#define ESR_ELx_EC_BREAKPT_LOW (0x30)
#define ESR_ELx_EC_BREAKPT_CUR (0x31)
@@ -382,6 +383,31 @@
#define ESR_ELx_MOPS_ISS_SRCREG(esr) (((esr) & (UL(0x1f) << 5)) >> 5)
#define ESR_ELx_MOPS_ISS_SIZEREG(esr) (((esr) & (UL(0x1f) << 0)) >> 0)

+/* ISS field definitions for GCS */
+#define ESR_ELx_ExType_SHIFT (20)
+#define ESR_ELx_ExType_MASK GENMASK(23, 20)
+#define ESR_ELx_Raddr_SHIFT (10)
+#define ESR_ELx_Raddr_MASK GENMASK(14, 10)
+#define ESR_ELx_Rn_SHIFT (5)
+#define ESR_ELx_Rn_MASK GENMASK(9, 5)
+#define ESR_ELx_Rvalue_SHIFT 5
+#define ESR_ELx_Rvalue_MASK GENMASK(9, 5)
+#define ESR_ELx_IT_SHIFT (0)
+#define ESR_ELx_IT_MASK GENMASK(4, 0)
+
+#define ESR_ELx_ExType_DATA_CHECK 0
+#define ESR_ELx_ExType_EXLOCK 1
+#define ESR_ELx_ExType_STR 2
+
+#define ESR_ELx_IT_RET 0
+#define ESR_ELx_IT_GCSPOPM 1
+#define ESR_ELx_IT_RET_KEYA 2
+#define ESR_ELx_IT_RET_KEYB 3
+#define ESR_ELx_IT_GCSSS1 4
+#define ESR_ELx_IT_GCSSS2 5
+#define ESR_ELx_IT_GCSPOPCX 6
+#define ESR_ELx_IT_GCSPOPX 7
+
#ifndef __ASSEMBLY__
#include <asm/types.h>

diff --git a/arch/arm64/include/asm/exception.h b/arch/arm64/include/asm/exception.h
index ad688e157c9b..99caff458e20 100644
--- a/arch/arm64/include/asm/exception.h
+++ b/arch/arm64/include/asm/exception.h
@@ -57,6 +57,8 @@ void do_el0_undef(struct pt_regs *regs, unsigned long esr);
void do_el1_undef(struct pt_regs *regs, unsigned long esr);
void do_el0_bti(struct pt_regs *regs);
void do_el1_bti(struct pt_regs *regs, unsigned long esr);
+void do_el0_gcs(struct pt_regs *regs, unsigned long esr);
+void do_el1_gcs(struct pt_regs *regs, unsigned long esr);
void do_debug_exception(unsigned long addr_if_watchpoint, unsigned long esr,
struct pt_regs *regs);
void do_fpsimd_acc(unsigned long esr, struct pt_regs *regs);
diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 6b2e0c367702..4d86216962e5 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -400,6 +400,15 @@ static void noinstr el1_bti(struct pt_regs *regs, unsigned long esr)
exit_to_kernel_mode(regs);
}

+static void noinstr el1_gcs(struct pt_regs *regs, unsigned long esr)
+{
+ enter_from_kernel_mode(regs);
+ local_daif_inherit(regs);
+ do_el1_gcs(regs, esr);
+ local_daif_mask();
+ exit_to_kernel_mode(regs);
+}
+
static void noinstr el1_dbg(struct pt_regs *regs, unsigned long esr)
{
unsigned long far = read_sysreg(far_el1);
@@ -442,6 +451,9 @@ asmlinkage void noinstr el1h_64_sync_handler(struct pt_regs *regs)
case ESR_ELx_EC_BTI:
el1_bti(regs, esr);
break;
+ case ESR_ELx_EC_GCS:
+ el1_gcs(regs, esr);
+ break;
case ESR_ELx_EC_BREAKPT_CUR:
case ESR_ELx_EC_SOFTSTP_CUR:
case ESR_ELx_EC_WATCHPT_CUR:
@@ -621,6 +633,14 @@ static void noinstr el0_mops(struct pt_regs *regs, unsigned long esr)
exit_to_user_mode(regs);
}

+static void noinstr el0_gcs(struct pt_regs *regs, unsigned long esr)
+{
+ enter_from_user_mode(regs);
+ local_daif_restore(DAIF_PROCCTX);
+ do_el0_gcs(regs, esr);
+ exit_to_user_mode(regs);
+}
+
static void noinstr el0_inv(struct pt_regs *regs, unsigned long esr)
{
enter_from_user_mode(regs);
@@ -701,6 +721,9 @@ asmlinkage void noinstr el0t_64_sync_handler(struct pt_regs *regs)
case ESR_ELx_EC_MOPS:
el0_mops(regs, esr);
break;
+ case ESR_ELx_EC_GCS:
+ el0_gcs(regs, esr);
+ break;
case ESR_ELx_EC_BREAKPT_LOW:
case ESR_ELx_EC_SOFTSTP_LOW:
case ESR_ELx_EC_WATCHPT_LOW:
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 8b70759cdbb9..65dab959f620 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -500,6 +500,16 @@ void do_el1_bti(struct pt_regs *regs, unsigned long esr)
die("Oops - BTI", regs, esr);
}

+void do_el0_gcs(struct pt_regs *regs, unsigned long esr)
+{
+ force_signal_inject(SIGSEGV, SEGV_CPERR, regs->pc, 0);
+}
+
+void do_el1_gcs(struct pt_regs *regs, unsigned long esr)
+{
+ die("Oops - GCS", regs, esr);
+}
+
void do_el0_fpac(struct pt_regs *regs, unsigned long esr)
{
force_signal_inject(SIGILL, ILL_ILLOPN, regs->pc, esr);
@@ -884,6 +894,7 @@ static const char *esr_class_str[] = {
[ESR_ELx_EC_MOPS] = "MOPS",
[ESR_ELx_EC_FP_EXC32] = "FP (AArch32)",
[ESR_ELx_EC_FP_EXC64] = "FP (AArch64)",
+ [ESR_ELx_EC_GCS] = "Guarded Control Stack",
[ESR_ELx_EC_SERROR] = "SError",
[ESR_ELx_EC_BREAKPT_LOW] = "Breakpoint (lower EL)",
[ESR_ELx_EC_BREAKPT_CUR] = "Breakpoint (current EL)",

--
2.30.2


2023-08-07 22:22:23

by Mark Brown

Subject: [PATCH v4 19/36] arm64/gcs: Allocate a new GCS for threads with GCS enabled

We do not currently have a mechanism to specify a new GCS for a new
thread, so when a thread is created with GCS enabled allocate one
for it. Since there is no current API for specifying the size of the
GCS we follow the extensively discussed x86 implementation and allocate
min(RLIMIT_STACK, 4G). Since the GCS only stores the call stack and not
any variables this should be more than sufficient for most applications.

When allocating the stack we initialise GCSPR_EL0 to point to one entry
below the end of the region allocated; this keeps the top entry of the
stack 0 so software walking the GCS can easily detect the end of the
region.
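
Concretely, a sketch of the resulting thread state for a 4KiB
allocation starting at base:

  /* After gcs_alloc_thread_stack(), with size = 4096: */
  tsk->thread.gcs_base  = base;
  tsk->thread.gcs_size  = 4096;
  tsk->thread.gcspr_el0 = base + 4096 - 8; /* last 8 byte entry, holds 0 */

  /*
   * Pushes move GCSPR_EL0 down towards gcs_base; software walking
   * back up the GCS stops when it loads the zero top entry.
   */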

Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/asm/gcs.h | 7 ++++++
arch/arm64/kernel/process.c | 30 ++++++++++++++++++++++++
arch/arm64/mm/gcs.c | 56 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 93 insertions(+)

diff --git a/arch/arm64/include/asm/gcs.h b/arch/arm64/include/asm/gcs.h
index 04594ef59dad..4371a2f99b4a 100644
--- a/arch/arm64/include/asm/gcs.h
+++ b/arch/arm64/include/asm/gcs.h
@@ -58,6 +58,8 @@ static inline bool task_gcs_el0_enabled(struct task_struct *task)
void gcs_set_el0_mode(struct task_struct *task);
void gcs_free(struct task_struct *task);
void gcs_preserve_current_state(void);
+unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
+ unsigned long clone_flags, size_t size);

#else

@@ -69,6 +71,11 @@ static inline bool task_gcs_el0_enabled(struct task_struct *task)
static inline void gcs_set_el0_mode(struct task_struct *task) { }
static inline void gcs_free(struct task_struct *task) { }
static inline void gcs_preserve_current_state(void) { }
+static inline unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
+ unsigned long clone_flags, size_t size)
+{
+ return -ENOTSUPP;
+}

#endif

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index b8a42471aea3..1de6371ca2d8 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -284,9 +284,34 @@ static void flush_gcs(void)
}
}

+static int copy_thread_gcs(struct task_struct *p, unsigned long clone_flags,
+ size_t stack_size)
+{
+ unsigned long gcs;
+
+ if (!system_supports_gcs())
+ return 0;
+
+ if (!task_gcs_el0_enabled(p))
+ return 0;
+
+ p->thread.gcspr_el0 = read_sysreg_s(SYS_GCSPR_EL0);
+
+ gcs = gcs_alloc_thread_stack(p, clone_flags, stack_size);
+ if (IS_ERR_VALUE(gcs))
+ return PTR_ERR((void *)gcs);
+
+ return 0;
+}
+
#else

static void flush_gcs(void) { }
+static int copy_thread_gcs(struct task_struct *p, unsigned long clone_flags,
+ size_t stack_size)
+{
+ return 0;
+}

#endif

@@ -368,6 +393,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
unsigned long stack_start = args->stack;
unsigned long tls = args->tls;
struct pt_regs *childregs = task_pt_regs(p);
+ int ret;

memset(&p->thread.cpu_context, 0, sizeof(struct cpu_context));

@@ -409,6 +435,10 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
p->thread.uw.tp_value = tls;
p->thread.tpidr2_el0 = 0;
}
+
+ ret = copy_thread_gcs(p, clone_flags, args->stack_size);
+ if (ret != 0)
+ return ret;
} else {
/*
* A kthread has no context to ERET to, so ensure any buggy
diff --git a/arch/arm64/mm/gcs.c b/arch/arm64/mm/gcs.c
index b0a67efc522b..1e059c37088d 100644
--- a/arch/arm64/mm/gcs.c
+++ b/arch/arm64/mm/gcs.c
@@ -8,6 +8,62 @@
#include <asm/cpufeature.h>
#include <asm/page.h>

+static unsigned long alloc_gcs(unsigned long addr, unsigned long size,
+ unsigned long token_offset, bool set_res_tok)
+{
+ int flags = MAP_ANONYMOUS | MAP_PRIVATE;
+ struct mm_struct *mm = current->mm;
+ unsigned long mapped_addr, unused;
+
+ if (addr)
+ flags |= MAP_FIXED_NOREPLACE;
+
+ mmap_write_lock(mm);
+ mapped_addr = do_mmap(NULL, addr, size, PROT_READ, flags,
+ VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL);
+ mmap_write_unlock(mm);
+
+ return mapped_addr;
+}
+
+static unsigned long gcs_size(unsigned long size)
+{
+ if (size)
+ return PAGE_ALIGN(size);
+
+ /* Allocate RLIMIT_STACK with limits of PAGE_SIZE..4G */
+ size = PAGE_ALIGN(min_t(unsigned long long,
+ rlimit(RLIMIT_STACK), SZ_4G));
+ return max(PAGE_SIZE, size);
+}
+
+unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
+ unsigned long clone_flags, size_t size)
+{
+ unsigned long addr;
+
+ if (!system_supports_gcs())
+ return 0;
+
+ if (!task_gcs_el0_enabled(tsk))
+ return 0;
+
+ if ((clone_flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM)
+ return 0;
+
+ size = gcs_size(size);
+
+ addr = alloc_gcs(0, size, 0, 0);
+ if (IS_ERR_VALUE(addr))
+ return addr;
+
+ tsk->thread.gcs_base = addr;
+ tsk->thread.gcs_size = size;
+ tsk->thread.gcspr_el0 = addr + size - sizeof(u64);
+
+ return addr;
+}
+
/*
* Apply the GCS mode configured for the specified task to the
* hardware.

--
2.30.2


2023-08-07 22:25:22

by Mark Brown

Subject: [PATCH v4 27/36] kselftest/arm64: Add GCS as a detected feature in the signal tests

In preparation for testing GCS related signal handling, add it as a
feature we check for in the signal handling support code.

Signed-off-by: Mark Brown <[email protected]>
---
tools/testing/selftests/arm64/signal/test_signals.h | 2 ++
tools/testing/selftests/arm64/signal/test_signals_utils.c | 3 +++
2 files changed, 5 insertions(+)

diff --git a/tools/testing/selftests/arm64/signal/test_signals.h b/tools/testing/selftests/arm64/signal/test_signals.h
index 1e6273d81575..7ada43688c02 100644
--- a/tools/testing/selftests/arm64/signal/test_signals.h
+++ b/tools/testing/selftests/arm64/signal/test_signals.h
@@ -35,6 +35,7 @@ enum {
FSME_BIT,
FSME_FA64_BIT,
FSME2_BIT,
+ FGCS_BIT,
FMAX_END
};

@@ -43,6 +44,7 @@ enum {
#define FEAT_SME (1UL << FSME_BIT)
#define FEAT_SME_FA64 (1UL << FSME_FA64_BIT)
#define FEAT_SME2 (1UL << FSME2_BIT)
+#define FEAT_GCS (1UL << FGCS_BIT)

/*
* A descriptor used to describe and configure a test case.
diff --git a/tools/testing/selftests/arm64/signal/test_signals_utils.c b/tools/testing/selftests/arm64/signal/test_signals_utils.c
index 0dc948db3a4a..89ef95c1af0e 100644
--- a/tools/testing/selftests/arm64/signal/test_signals_utils.c
+++ b/tools/testing/selftests/arm64/signal/test_signals_utils.c
@@ -30,6 +30,7 @@ static char const *const feats_names[FMAX_END] = {
" SME ",
" FA64 ",
" SME2 ",
+ " GCS ",
};

#define MAX_FEATS_SZ 128
@@ -329,6 +330,8 @@ int test_init(struct tdescr *td)
td->feats_supported |= FEAT_SME_FA64;
if (getauxval(AT_HWCAP2) & HWCAP2_SME2)
td->feats_supported |= FEAT_SME2;
+ if (getauxval(AT_HWCAP2) & HWCAP2_GCS)
+ td->feats_supported |= FEAT_GCS;
if (feats_ok(td)) {
if (td->feats_required & td->feats_supported)
fprintf(stderr,

--
2.30.2


2023-08-07 22:29:04

by Mark Brown

[permalink] [raw]
Subject: [PATCH v4 03/36] arm64/gcs: Document the ABI for Guarded Control Stacks

Add some documentation of the userspace ABI for Guarded Control Stacks.

Signed-off-by: Mark Brown <[email protected]>
---
Documentation/arch/arm64/gcs.rst | 228 +++++++++++++++++++++++++++++++++++++
Documentation/arch/arm64/index.rst | 1 +
2 files changed, 229 insertions(+)

diff --git a/Documentation/arch/arm64/gcs.rst b/Documentation/arch/arm64/gcs.rst
new file mode 100644
index 000000000000..c0f43961fd4b
--- /dev/null
+++ b/Documentation/arch/arm64/gcs.rst
@@ -0,0 +1,228 @@
+===============================================
+Guarded Control Stack support for AArch64 Linux
+===============================================
+
+This document outlines briefly the interface provided to userspace by Linux in
+order to support use of the ARM Guarded Control Stack (GCS) feature.
+
+This is an outline of the most important features and issues only and not
+intended to be exhaustive.
+
+
+
+1. General
+-----------
+
+* GCS is an architecture feature intended to provide greater protection
+ against return oriented programming (ROP) attacks and to simplify the
+ implementation of features that need to collect stack traces such as
+ profiling.
+
+* When GCS is enabled a separate guarded control stack is maintained by the
+ PE which is writeable only through specific GCS operations. This
+ stores only the call stack: when a procedure call instruction is
+ executed the return address is pushed onto the GCS and on RET the
+ address in the LR is verified against the value at the top of the GCS.
+
+* When GCS is active the current GCS pointer is stored in the system
+ register GCSPR_EL0. This is readable by userspace but can only be
+ updated via specific GCS instructions.
+
+* The architecture provides instructions for switching between guarded
+ control stacks with checks to ensure that the new stack is a valid
+ target for switching.
+
+* The functionality of GCS is similar to that provided by the x86 Shadow
+ Stack feature, due to sharing of userspace interfaces the ABI refers to
+ shadow stacks rather than GCS.
+
+* Support for GCS is reported to userspace via HWCAP2_GCS in the aux vector
+ AT_HWCAP2 entry, which may be checked as in the example at the end of
+ this section.
+
+* GCS is enabled per thread. While there is support for disabling GCS
+ at runtime this should be done with great care.
+
+* GCS memory access faults are reported as normal memory access faults.
+
+* GCS specific errors (those reported with EC 0x2d) will be reported as
+ SIGSEGV with a si_code of SEGV_CPERR (control protection error).
+
+* GCS is supported only for AArch64.
+
+* On systems where GCS is supported GCSPR_EL0 is always readable by EL0
+ regardless of the GCS configuration for the thread.
+
+* The architecture supports enabling GCS without verifying that return
+ values in LR match those in the GCS, in which case the LR will be
+ ignored. This is not supported by Linux.
+
+* EL0 GCS entries with bit 63 set are reserved for special uses; one such
+ use is defined below for signals, and such entries should be ignored
+ when parsing the stack if they are not understood.
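+
+For example support may be detected with (an illustrative sketch only,
+HWCAP2_GCS is provided by the kernel uapi headers)::
+
+  #include <stdbool.h>
+  #include <sys/auxv.h>
+  #include <asm/hwcap.h>
+
+  static bool have_gcs(void)
+  {
+          /* AT_HWCAP2 advertises GCS support on capable systems */
+          return getauxval(AT_HWCAP2) & HWCAP2_GCS;
+  }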
+
+
+2. Enabling and disabling Guarded Control Stacks
+-------------------------------------------------
+
+* GCS is enabled and disabled for a thread via the PR_SET_SHADOW_STACK_STATUS
+ prctl(); this takes a single flags argument specifying which GCS features
+ should be used.
+
+* When set the PR_SHADOW_STACK_ENABLE flag allocates a Guarded Control Stack
+ for the thread and enables GCS for it, enabling the functionality controlled
+ by GCSCRE0_EL1.{nTR, RVCHKEN, PCRSEL}.
+
+* When set the PR_SHADOW_STACK_PUSH flag enables the functionality controlled
+ by GCSCRE0_EL1.PUSHMEn, allowing explicit GCS pushes.
+
+* When set the PR_SHADOW_STACK_WRITE flag enables the functionality controlled
+ by GCSCRE0_EL1.STREn, allowing explicit stores to the Guarded Control Stack.
+
+* Any unknown flags will cause PR_SET_SHADOW_STACK_STATUS to return -EINVAL.
+
+* PR_LOCK_SHADOW_STACK_STATUS is passed a bitmask of features with the same
+ values as used for PR_SET_SHADOW_STACK_STATUS. Any future changes to the
+ status of the specified GCS mode bits will be rejected.
+
+* PR_LOCK_SHADOW_STACK_STATUS allows any bit to be locked, including bits
+ which are not currently assigned to a mode, allowing userspace to lock
+ out the use of features defined in the future.
+
+* PR_SET_SHADOW_STACK_STATUS and PR_LOCK_SHADOW_STACK_STATUS affect only the
+ thread that called them, any other running threads will be unaffected.
+
+* New threads inherit the GCS configuration of the thread that created them.
+
+* GCS is disabled on exec().
+
+* The current GCS configuration for a thread may be read with the
+ PR_GET_SHADOW_STACK_STATUS prctl(); this returns the same flags that
+ are passed to PR_SET_SHADOW_STACK_STATUS.
+
+* If GCS is disabled for a thread after having previously been enabled then
+ the stack will remain allocated for the lifetime of the thread. At present
+ any attempt to reenable GCS for the thread will be rejected; this may be
+ revisited in future.
+
+* Note that since enabling GCS takes effect immediately it is not normally
+ possible to return from the function that invoked the prctl() that
+ enabled GCS, there will be no entry for that return on the GCS. It is
+ expected that GCS will normally be enabled very early in the execution
+ of a program, as in the sketch below.
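+
+A minimal sketch of early enablement (illustrative only, using the
+prctl() flags described above)::
+
+  /*
+   * Must run so early that no function entered before this point
+   * will ever return, eg in dynamic linker or other startup code.
+   */
+  prctl(PR_SET_SHADOW_STACK_STATUS, PR_SHADOW_STACK_ENABLE, 0, 0, 0);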
+
+
+
+3. Allocation of Guarded Control Stacks
+----------------------------------------
+
+* When GCS is enabled for a thread a new Guarded Control Stack will be
+ allocated for it of size RLIMIT_STACK / 2 or 2 gigabytes, whichever is
+ smaller.
+
+* When a new thread is created by a thread which has GCS enabled then a
+ new Guarded Control Stack will be allocated for the new thread with
+ half the size of the standard stack.
+
+* When a stack is allocated by enabling GCS or during thread creation then
+ the top 8 bytes of the stack will be initialised to 0 and GCSPR_EL0 will
+ be set to point to the address of this 0 value; this can be used to
+ detect the top of the stack.
+
+* Additional Guarded Control Stacks can be allocated using the
+ map_shadow_stack() system call.
+
+* Stacks allocated using map_shadow_stack() can optionally have an end of
+ stack marker and cap placed at the top of the stack. If the flag
+ SHADOW_STACK_SET_TOKEN is specified a cap will be placed on the stack:
+ in the top 8 bytes of the stack if SHADOW_STACK_SET_MARKER is not
+ specified, or in the next 8 bytes down if it is. Specifying just
+ SHADOW_STACK_SET_MARKER by itself is valid but, since the marker is all
+ bits 0, it has no observable effect. See the sketch at the end of this
+ section.
+
+* Stacks allocated using map_shadow_stack() must be larger than 16 bytes and
+ must be 16 byte aligned.
+
+* When GCS is disabled for a thread the Guarded Control Stack initially
+ allocated for that thread will be freed. Note carefully that if the
+ stack has been switched this may not be the stack currently in use by
+ the thread.
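+
+A sketch of allocating a switchable stack with map_shadow_stack() and
+locating the cap token (illustrative only)::
+
+  uint64_t *stack, *cap;
+  size_t size = 4096; /* must be a multiple of 8 */
+
+  stack = (void *)syscall(__NR_map_shadow_stack, 0, size,
+                          SHADOW_STACK_SET_MARKER | SHADOW_STACK_SET_TOKEN);
+
+  /* The top 8 bytes are the 0 end of stack marker, the cap is below it */
+  cap = &stack[(size / 8) - 2];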
+
+
+4. Signal handling
+--------------------
+
+* A new signal frame record gcs_context encodes the current GCS mode and
+ pointer for the interrupted context on signal delivery. This will always
+ be present on systems that support GCS.
+
+* The record contains a flag field which reports the current GCS configuration
+ for the interrupted context as PR_GET_SHADOW_STACK_STATUS would.
+
+* The signal handler is run with the same GCS configuration as the interrupted
+ context.
+
+* When GCS is enabled for the interrupted thread a signal handling specific
+ GCS cap token will be written to the GCS; this is an architectural GCS cap
+ token with bit 63 set. The GCSPR_EL0 reported in the signal frame will
+ point to this cap token.
+
+* The signal handler will use the same GCS as the interrupted context.
+
+* When GCS is enabled on signal entry a frame with the address of the signal
+ return handler will be pushed onto the GCS, allowing return from the signal
+ handler via RET as normal. This will not be reported in the gcs_context in
+ the signal frame.
+
+
+5. Signal return
+-----------------
+
+When returning from a signal handler:
+
+* If there is a gcs_context record in the signal frame then the GCS flags
+ and GCSPR_EL0 will be restored from that context prior to further
+ validation.
+
+* If there is no gcs_context record in the signal frame then the GCS
+ configuration will be unchanged.
+
+* If GCS is enabled on return from a signal handler then GCSPR_EL0 must
+ point to a valid GCS signal cap record; this will be popped from the
+ GCS prior to signal return.
+
+* If the GCS configuration is locked when returning from a signal then any
+ attempt to change the GCS configuration will be treated as an error. This
+ is true even if GCS was not enabled prior to signal entry.
+
+* GCS may be disabled via signal return but any attempt to enable GCS via
+ signal return will be rejected.
+
+
+6. ptrace extensions
+---------------------
+
+* A new regset NT_ARM_GCS is defined for use with PTRACE_GETREGSET and
+ PTRACE_SETREGSET.
+
+* Due to the complexity surrounding allocation and deallocation of stacks and
+ lack of practical application it is not possible to enable GCS via ptrace.
+ GCS may be disabled via the ptrace interface.
+
+* Other GCS modes may be configured via ptrace.
+
+* Configuration via ptrace ignores locking of GCS mode bits.
+
+
+7. ELF coredump extensions
+---------------------------
+
+* NT_ARM_GCS notes will be added to each coredump for each thread of the
+ dumped process. The contents will be equivalent to the data that would
+ have been read if a PTRACE_GETREGSET of the corresponding type were
+ executed for each thread when the coredump was generated.
+
+
+
+8. /proc extensions
+--------------------
+
+* Guarded Control Stack pages will include "ss" in their VmFlags in
+ /proc/<pid>/smaps.
diff --git a/Documentation/arch/arm64/index.rst b/Documentation/arch/arm64/index.rst
index d08e924204bf..dcf3ee3eb8c0 100644
--- a/Documentation/arch/arm64/index.rst
+++ b/Documentation/arch/arm64/index.rst
@@ -14,6 +14,7 @@ ARM64 Architecture
booting
cpu-feature-registers
elf_hwcaps
+ gcs
hugetlbpage
kdump
legacy_instructions

--
2.30.2


2023-08-07 22:29:07

by Mark Brown

[permalink] [raw]
Subject: [PATCH v4 34/36] selftests/arm64: Add GCS signal tests

Do some testing of the signal handling for GCS, checking that a GCS
frame has the expected information in it and that the expected signals
are delivered with invalid operations.

Signed-off-by: Mark Brown <[email protected]>
---
tools/testing/selftests/arm64/signal/.gitignore | 1 +
.../selftests/arm64/signal/test_signals_utils.h | 10 +++
.../arm64/signal/testcases/gcs_exception_fault.c | 59 ++++++++++++++++
.../selftests/arm64/signal/testcases/gcs_frame.c | 78 ++++++++++++++++++++++
.../arm64/signal/testcases/gcs_write_fault.c | 67 +++++++++++++++++++
5 files changed, 215 insertions(+)

diff --git a/tools/testing/selftests/arm64/signal/.gitignore b/tools/testing/selftests/arm64/signal/.gitignore
index 839e3a252629..26de12918890 100644
--- a/tools/testing/selftests/arm64/signal/.gitignore
+++ b/tools/testing/selftests/arm64/signal/.gitignore
@@ -1,6 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only
mangle_*
fake_sigreturn_*
+gcs_*
sme_*
ssve_*
sve_*
diff --git a/tools/testing/selftests/arm64/signal/test_signals_utils.h b/tools/testing/selftests/arm64/signal/test_signals_utils.h
index 1cea64986baa..d41f237db28d 100644
--- a/tools/testing/selftests/arm64/signal/test_signals_utils.h
+++ b/tools/testing/selftests/arm64/signal/test_signals_utils.h
@@ -6,6 +6,7 @@

#include <assert.h>
#include <stdio.h>
+#include <stdint.h>
#include <string.h>

#include "test_signals.h"
@@ -45,6 +46,15 @@ void test_result(struct tdescr *td);
_arg1; \
})

+static inline __attribute__((always_inline)) uint64_t get_gcspr_el0(void)
+{
+ uint64_t val;
+
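+ /* S3_3_C2_C5_1 is the encoding of GCSPR_EL0, usable by older toolchains */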
+ asm volatile("mrs %0, S3_3_C2_C5_1" : "=r" (val));
+
+ return val;
+}
+
static inline bool feats_ok(struct tdescr *td)
{
if (td->feats_incompatible & td->feats_supported)
diff --git a/tools/testing/selftests/arm64/signal/testcases/gcs_exception_fault.c b/tools/testing/selftests/arm64/signal/testcases/gcs_exception_fault.c
new file mode 100644
index 000000000000..532d533592a1
--- /dev/null
+++ b/tools/testing/selftests/arm64/signal/testcases/gcs_exception_fault.c
@@ -0,0 +1,59 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023 ARM Limited
+ */
+
+#include <errno.h>
+#include <signal.h>
+#include <unistd.h>
+
+#include <sys/mman.h>
+#include <sys/prctl.h>
+
+#include "test_signals_utils.h"
+#include "testcases.h"
+
+/* This should be includable from some standard header, but which? */
+#ifndef SEGV_CPERR
+#define SEGV_CPERR 10
+#endif
+
+static inline void gcsss1(uint64_t Xt)
+{
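+ /* GCSSS1 Xt: the first phase of a GCS stack switch */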
+ asm volatile (
+ "sys #3, C7, C7, #2, %0\n"
+ :
+ : "rZ" (Xt)
+ : "memory");
+}
+
+static int gcs_op_fault_trigger(struct tdescr *td)
+{
+ /*
+ * The slot below our current GCS should be in a valid GCS but
+ * must not have a valid cap in it.
+ */
+ gcsss1(get_gcspr_el0() - 8);
+
+ return 0;
+}
+
+static int gcs_op_fault_signal(struct tdescr *td, siginfo_t *si,
+ ucontext_t *uc)
+{
+ ASSERT_GOOD_CONTEXT(uc);
+
+ return 1;
+}
+
+struct tdescr tde = {
+ .name = "Invalid GCS operation",
+ .descr = "An invalid GCS operation generates the expected signal",
+ .feats_required = FEAT_GCS,
+ .timeout = 3,
+ .sig_ok = SIGSEGV,
+ .sig_ok_code = SEGV_CPERR,
+ .sanity_disabled = true,
+ .trigger = gcs_op_fault_trigger,
+ .run = gcs_op_fault_signal,
+};
diff --git a/tools/testing/selftests/arm64/signal/testcases/gcs_frame.c b/tools/testing/selftests/arm64/signal/testcases/gcs_frame.c
new file mode 100644
index 000000000000..d67cb26195a6
--- /dev/null
+++ b/tools/testing/selftests/arm64/signal/testcases/gcs_frame.c
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023 ARM Limited
+ */
+
+#include <signal.h>
+#include <ucontext.h>
+#include <sys/prctl.h>
+
+#include "test_signals_utils.h"
+#include "testcases.h"
+
+static union {
+ ucontext_t uc;
+ char buf[1024 * 64];
+} context;
+
+static int gcs_regs(struct tdescr *td, siginfo_t *si, ucontext_t *uc)
+{
+ size_t offset;
+ struct _aarch64_ctx *head = GET_BUF_RESV_HEAD(context);
+ struct gcs_context *gcs;
+ unsigned long expected, gcspr;
+ int ret;
+
+ ret = prctl(PR_GET_SHADOW_STACK_STATUS, &expected, 0, 0, 0);
+ if (ret != 0) {
+ fprintf(stderr, "Unable to query GCS status\n");
+ return 1;
+ }
+
+ /* We expect a cap to be added to the GCS in the signal frame */
+ gcspr = get_gcspr_el0();
+ gcspr -= 8;
+ fprintf(stderr, "Expecting GCSPR_EL0 %lx\n", gcspr);
+
+ if (!get_current_context(td, &context.uc, sizeof(context))) {
+ fprintf(stderr, "Failed getting context\n");
+ return 1;
+ }
+ fprintf(stderr, "Got context\n");
+
+ head = get_header(head, GCS_MAGIC, GET_BUF_RESV_SIZE(context),
+ &offset);
+ if (!head) {
+ fprintf(stderr, "No GCS context\n");
+ return 1;
+ }
+
+ gcs = (struct gcs_context *)head;
+
+ /* Basic size validation is done in get_current_context() */
+
+ if (gcs->features_enabled != expected) {
+ fprintf(stderr, "Features enabled %llx but expected %lx\n",
+ gcs->features_enabled, expected);
+ return 1;
+ }
+
+ if (gcs->gcspr != gcspr) {
+ fprintf(stderr, "Got GCSPR %llx but expected %lx\n",
+ gcs->gcspr, gcspr);
+ return 1;
+ }
+
+ fprintf(stderr, "GCS context validated\n");
+ td->pass = 1;
+
+ return 0;
+}
+
+struct tdescr tde = {
+ .name = "GCS basics",
+ .descr = "Validate a GCS signal context",
+ .feats_required = FEAT_GCS,
+ .timeout = 3,
+ .run = gcs_regs,
+};
diff --git a/tools/testing/selftests/arm64/signal/testcases/gcs_write_fault.c b/tools/testing/selftests/arm64/signal/testcases/gcs_write_fault.c
new file mode 100644
index 000000000000..126b1a294a29
--- /dev/null
+++ b/tools/testing/selftests/arm64/signal/testcases/gcs_write_fault.c
@@ -0,0 +1,67 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023 ARM Limited
+ */
+
+#include <errno.h>
+#include <signal.h>
+#include <unistd.h>
+
+#include <sys/mman.h>
+#include <sys/prctl.h>
+
+#include "test_signals_utils.h"
+#include "testcases.h"
+
+static uint64_t *gcs_page;
+
+#ifndef __NR_map_shadow_stack
+#define __NR_map_shadow_stack 452
+#endif
+
+static bool alloc_gcs(struct tdescr *td)
+{
+ long page_size = sysconf(_SC_PAGE_SIZE);
+
+ gcs_page = (void *)syscall(__NR_map_shadow_stack, 0,
+ page_size, 0);
+ if (gcs_page == MAP_FAILED) {
+ fprintf(stderr, "Failed to map %ld byte GCS: %d\n",
+ page_size, errno);
+ return false;
+ }
+
+ return true;
+}
+
+static int gcs_write_fault_trigger(struct tdescr *td)
+{
+ /* Verify that the page is readable (ie, not completely unmapped) */
+ fprintf(stderr, "Read value 0x%lx\n", gcs_page[0]);
+
+ /* A regular write should trigger a fault */
+ gcs_page[0] = EINVAL;
+
+ return 0;
+}
+
+static int gcs_write_fault_signal(struct tdescr *td, siginfo_t *si,
+ ucontext_t *uc)
+{
+ ASSERT_GOOD_CONTEXT(uc);
+
+ return 1;
+}
+
+
+struct tdescr tde = {
+ .name = "GCS write fault",
+ .descr = "Normal writes to a GCS segfault",
+ .feats_required = FEAT_GCS,
+ .timeout = 3,
+ .sig_ok = SIGSEGV,
+ .sanity_disabled = true,
+ .init = alloc_gcs,
+ .trigger = gcs_write_fault_trigger,
+ .run = gcs_write_fault_signal,
+};

--
2.30.2


2023-08-07 22:35:24

by Mark Brown

[permalink] [raw]
Subject: [PATCH v4 31/36] kselftest/arm64: Add very basic GCS test program

This test program just covers the basic GCS ABI, exercising aspects of
the ABI as standalone features without attempting to integrate things.

Signed-off-by: Mark Brown <[email protected]>
---
tools/testing/selftests/arm64/Makefile | 2 +-
tools/testing/selftests/arm64/gcs/.gitignore | 1 +
tools/testing/selftests/arm64/gcs/Makefile | 18 ++
tools/testing/selftests/arm64/gcs/basic-gcs.c | 356 ++++++++++++++++++++++++++
tools/testing/selftests/arm64/gcs/gcs-util.h | 87 +++++++
5 files changed, 463 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/arm64/Makefile b/tools/testing/selftests/arm64/Makefile
index 28b93cab8c0d..22029e60eff3 100644
--- a/tools/testing/selftests/arm64/Makefile
+++ b/tools/testing/selftests/arm64/Makefile
@@ -4,7 +4,7 @@
ARCH ?= $(shell uname -m 2>/dev/null || echo not)

ifneq (,$(filter $(ARCH),aarch64 arm64))
-ARM64_SUBTARGETS ?= tags signal pauth fp mte bti abi
+ARM64_SUBTARGETS ?= tags signal pauth fp mte bti abi gcs
else
ARM64_SUBTARGETS :=
endif
diff --git a/tools/testing/selftests/arm64/gcs/.gitignore b/tools/testing/selftests/arm64/gcs/.gitignore
new file mode 100644
index 000000000000..0e5e695ecba5
--- /dev/null
+++ b/tools/testing/selftests/arm64/gcs/.gitignore
@@ -0,0 +1 @@
+basic-gcs
diff --git a/tools/testing/selftests/arm64/gcs/Makefile b/tools/testing/selftests/arm64/gcs/Makefile
new file mode 100644
index 000000000000..61a30f483429
--- /dev/null
+++ b/tools/testing/selftests/arm64/gcs/Makefile
@@ -0,0 +1,18 @@
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2023 ARM Limited
+#
+# In order to avoid interaction with the toolchain and dynamic linker the
+# portions of these tests that interact with the GCS are implemented using
+# nolibc.
+#
+
+TEST_GEN_PROGS := basic-gcs
+
+include ../../lib.mk
+
+$(OUTPUT)/basic-gcs: basic-gcs.c
+ $(CC) -g -fno-asynchronous-unwind-tables -fno-ident -s -Os -nostdlib \
+ -static -include ../../../../include/nolibc/nolibc.h \
+ -I../../../../../usr/include \
+ -std=gnu99 -I../.. -g \
+ -ffreestanding -Wall $^ -o $@ -lgcc
diff --git a/tools/testing/selftests/arm64/gcs/basic-gcs.c b/tools/testing/selftests/arm64/gcs/basic-gcs.c
new file mode 100644
index 000000000000..0fac554a3c4d
--- /dev/null
+++ b/tools/testing/selftests/arm64/gcs/basic-gcs.c
@@ -0,0 +1,356 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 ARM Limited.
+ */
+
+#include <limits.h>
+#include <stdbool.h>
+
+#include <linux/prctl.h>
+
+#include <sys/mman.h>
+#include <asm/mman.h>
+
+#include "kselftest.h"
+#include "gcs-util.h"
+
+/* nolibc doesn't have sysconf(), just hard code the maximum */
+static size_t page_size = 65536;
+
+static __attribute__((noinline)) void valid_gcs_function(void)
+{
+ /* Do something the compiler can't optimise out */
+ my_syscall1(__NR_prctl, PR_SVE_GET_VL);
+}
+
+static inline int gcs_set_status(unsigned long mode)
+{
+ bool enabling = mode & PR_SHADOW_STACK_ENABLE;
+ int ret;
+ unsigned long new_mode;
+
+ /*
+ * The prctl takes 1 argument but we need to ensure that the
+ * other 3 values passed in registers to the syscall are zero
+ * since the kernel validates them.
+ */
+ ret = my_syscall5(__NR_prctl, PR_SET_SHADOW_STACK_STATUS, mode,
+ 0, 0, 0);
+
+ if (ret == 0) {
+ ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS,
+ &new_mode, 0, 0, 0);
+ if (ret == 0) {
+ if (new_mode != mode) {
+ ksft_print_msg("Mode set to %x not %x\n",
+ new_mode, mode);
+ ret = -EINVAL;
+ }
+ } else {
+ ksft_print_msg("Failed to validate mode: %d\n", ret);
+ }
+
+ if (enabling != chkfeat_gcs()) {
+ ksft_print_msg("%senabled by prctl but %senabled in CHKFEAT\n",
+ enabling ? "" : "not ",
+ chkfeat_gcs() ? "" : "not ");
+ ret = -EINVAL;
+ }
+ }
+
+ return ret;
+}
+
+/* Try to read the status */
+static bool read_status(void)
+{
+ unsigned long state;
+ int ret;
+
+ ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS,
+ &state, 0, 0, 0);
+ if (ret != 0) {
+ ksft_print_msg("Failed to read state: %d\n", ret);
+ return false;
+ }
+
+ return state & PR_SHADOW_STACK_ENABLE;
+}
+
+/* Just a straight enable */
+static bool base_enable(void)
+{
+ int ret;
+
+ ret = gcs_set_status(PR_SHADOW_STACK_ENABLE);
+ if (ret) {
+ ksft_print_msg("PR_SHADOW_STACK_ENABLE failed %d\n", ret);
+ return false;
+ }
+
+ return true;
+}
+
+/* Check we can read GCSPR_EL0 when GCS is enabled */
+static bool read_gcspr_el0(void)
+{
+ unsigned long *gcspr_el0;
+
+ ksft_print_msg("GET GCSPR\n");
+ gcspr_el0 = get_gcspr();
+ ksft_print_msg("GCSPR_EL0 is %p\n", gcspr_el0);
+
+ return true;
+}
+
+/* Also allow writes to stack */
+static bool enable_writeable(void)
+{
+ int ret;
+
+ ret = gcs_set_status(PR_SHADOW_STACK_ENABLE | PR_SHADOW_STACK_WRITE);
+ if (ret) {
+ ksft_print_msg("PR_SHADOW_STACK_ENABLE writeable failed: %d\n", ret);
+ return false;
+ }
+
+ ret = gcs_set_status(PR_SHADOW_STACK_ENABLE);
+ if (ret) {
+ ksft_print_msg("failed to restore plain enable %d\n", ret);
+ return false;
+ }
+
+ return true;
+}
+
+/* Also allow push and pop of the GCS */
+static bool enable_push_pop(void)
+{
+ int ret;
+
+ ret = gcs_set_status(PR_SHADOW_STACK_ENABLE | PR_SHADOW_STACK_PUSH);
+ if (ret) {
+ ksft_print_msg("PR_SHADOW_STACK_ENABLE with push failed: %d\n",
+ ret);
+ return false;
+ }
+
+ ret = gcs_set_status(PR_SHADOW_STACK_ENABLE);
+ if (ret) {
+ ksft_print_msg("failed to restore plain enable %d\n", ret);
+ return false;
+ }
+
+ return true;
+}
+
+/* Enable GCS and allow everything */
+static bool enable_all(void)
+{
+ int ret;
+
+ ret = gcs_set_status(PR_SHADOW_STACK_ENABLE | PR_SHADOW_STACK_PUSH |
+ PR_SHADOW_STACK_WRITE);
+ if (ret) {
+ ksft_print_msg("PR_SHADOW_STACK_ENABLE with everything failed: %d\n",
+ ret);
+ return false;
+ }
+
+ ret = gcs_set_status(PR_SHADOW_STACK_ENABLE);
+ if (ret) {
+ ksft_print_msg("failed to restore plain enable %d\n", ret);
+ return false;
+ }
+
+ return true;
+}
+
+static bool enable_invalid(void)
+{
+ int ret = gcs_set_status(ULONG_MAX);
+ if (ret == 0) {
+ ksft_print_msg("GCS_SET_STATUS %lx succeeded\n", ULONG_MAX);
+ return false;
+ }
+
+ return true;
+}
+
+/* Map a GCS */
+static bool map_guarded_stack(void)
+{
+ int ret;
+ uint64_t *buf;
+ uint64_t expected_cap;
+ int elem;
+ bool pass = true;
+
+ buf = (void *)my_syscall3(__NR_map_shadow_stack, 0, page_size,
+ SHADOW_STACK_SET_MARKER |
+ SHADOW_STACK_SET_TOKEN);
+ if (buf == MAP_FAILED) {
+ ksft_print_msg("Failed to map %d byte GCS: %d\n",
+ page_size, errno);
+ return false;
+ }
+ ksft_print_msg("Mapped GCS at %p-%p\n", buf,
+ (uint64_t)buf + page_size);
+
+ /* The top of the newly allocated region should be 0 */
+ elem = (page_size / sizeof(uint64_t)) - 1;
+ if (buf[elem]) {
+ ksft_print_msg("Last entry is 0x%lx not 0x0\n", buf[elem]);
+ pass = false;
+ }
+
+ /* Then a valid cap token */
+ elem--;
+ expected_cap = ((uint64_t)buf + page_size - 16);
+ expected_cap &= GCS_CAP_ADDR_MASK;
+ expected_cap |= GCS_CAP_VALID_TOKEN;
+ if (buf[elem] != expected_cap) {
+ ksft_print_msg("Cap entry is 0x%lx not 0x%lx\n",
+ buf[elem], expected_cap);
+ pass = false;
+ }
+ ksft_print_msg("cap token is 0x%lx\n", buf[elem]);
+
+ /* The rest should be zeros */
+ for (elem = 0; elem < page_size / sizeof(uint64_t) - 2; elem++) {
+ if (!buf[elem])
+ continue;
+ ksft_print_msg("GCS slot %d is 0x%lx not 0x0\n",
+ elem, buf[elem]);
+ pass = false;
+ }
+
+ ret = munmap(buf, page_size);
+ if (ret != 0) {
+ ksft_print_msg("Failed to unmap %d byte GCS: %d\n",
+ page_size, errno);
+ pass = false;
+ }
+
+ return pass;
+}
+
+/* A fork()ed process can run */
+static bool test_fork(void)
+{
+ unsigned long child_mode;
+ int ret, status;
+ pid_t pid;
+ bool pass = true;
+
+ pid = fork();
+ if (pid == -1) {
+ ksft_print_msg("fork() failed: %d\n", errno);
+ pass = false;
+ goto out;
+ }
+ if (pid == 0) {
+ /* In child, make sure we can call a function, read
+ * the GCS pointer and status and then exit */
+ valid_gcs_function();
+ get_gcspr();
+
+ ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS,
+ &child_mode, 0, 0, 0);
+ if (ret == 0 && !(child_mode & PR_SHADOW_STACK_ENABLE)) {
+ ksft_print_msg("GCS not enabled in child\n");
+ ret = -EINVAL;
+ }
+
+ exit(ret);
+ }
+
+ /*
+ * In parent, check we can still do function calls then block
+ * for the child.
+ */
+ valid_gcs_function();
+
+ ksft_print_msg("Waiting for child %d\n", pid);
+
+ ret = waitpid(pid, &status, 0);
+ if (ret == -1) {
+ ksft_print_msg("Failed to wait for child: %d\n",
+ errno);
+ return false;
+ }
+
+ if (!WIFEXITED(status)) {
+ ksft_print_msg("Child exited due to signal %d\n",
+ WTERMSIG(status));
+ pass = false;
+ } else {
+ if (WEXITSTATUS(status)) {
+ ksft_print_msg("Child exited with status %d\n",
+ WEXITSTATUS(status));
+ pass = false;
+ }
+ }
+
+out:
+
+ return pass;
+}
+
+typedef bool (*gcs_test)(void);
+
+static struct {
+ char *name;
+ gcs_test test;
+ bool needs_enable;
+} tests[] = {
+ { "read_status", read_status },
+ { "base_enable", base_enable, true },
+ { "read_gcspr_el0", read_gcspr_el0 },
+ { "enable_writeable", enable_writeable, true },
+ { "enable_push_pop", enable_push_pop, true },
+ { "enable_all", enable_all, true },
+ { "enable_invalid", enable_invalid, true },
+ { "map_guarded_stack", map_guarded_stack },
+ { "fork", test_fork },
+};
+
+int main(void)
+{
+ int i, ret;
+ unsigned long gcs_mode;
+
+ ksft_print_header();
+
+ /*
+ * We don't have getauxval() with nolibc so treat a failure to
+ * read GCS state as a lack of support and skip.
+ */
+ ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS,
+ &gcs_mode, 0, 0, 0);
+ if (ret != 0)
+ ksft_exit_skip("Failed to read GCS state: %d\n", ret);
+
+ if (!(gcs_mode & PR_SHADOW_STACK_ENABLE)) {
+ gcs_mode = PR_SHADOW_STACK_ENABLE;
+ ret = my_syscall5(__NR_prctl, PR_SET_SHADOW_STACK_STATUS,
+ gcs_mode, 0, 0, 0);
+ if (ret != 0)
+ ksft_exit_fail_msg("Failed to enable GCS: %d\n", ret);
+ }
+
+ ksft_set_plan(ARRAY_SIZE(tests));
+
+ for (i = 0; i < ARRAY_SIZE(tests); i++) {
+ ksft_test_result((*tests[i].test)(), "%s\n", tests[i].name);
+ }
+
+ /* One last test: disable GCS, we can only do this one time */
+ ret = my_syscall5(__NR_prctl, PR_SET_SHADOW_STACK_STATUS, 0, 0, 0, 0);
+ if (ret != 0)
+ ksft_print_msg("Failed to disable GCS: %d\n", ret);
+
+ ksft_finished();
+
+ return 0;
+}
diff --git a/tools/testing/selftests/arm64/gcs/gcs-util.h b/tools/testing/selftests/arm64/gcs/gcs-util.h
new file mode 100644
index 000000000000..c517f1a710c5
--- /dev/null
+++ b/tools/testing/selftests/arm64/gcs/gcs-util.h
@@ -0,0 +1,87 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 ARM Limited.
+ */
+
+#ifndef GCS_UTIL_H
+#define GCS_UTIL_H
+
+#include <stdbool.h>
+
+#ifndef __NR_map_shadow_stack
+#define __NR_map_shadow_stack 452
+#endif
+
+#ifndef __NR_prctl
+#define __NR_prctl 167
+#endif
+
+/* Shadow Stack/Guarded Control Stack interface */
+#define PR_GET_SHADOW_STACK_STATUS 71
+#define PR_SET_SHADOW_STACK_STATUS 72
+#define PR_LOCK_SHADOW_STACK_STATUS 73
+
+# define PR_SHADOW_STACK_ENABLE (1UL << 0)
+# define PR_SHADOW_STACK_WRITE (1UL << 1)
+# define PR_SHADOW_STACK_PUSH (1UL << 2)
+
+#define PR_SHADOW_STACK_ALL_MODES \
+ (PR_SHADOW_STACK_ENABLE | PR_SHADOW_STACK_WRITE | PR_SHADOW_STACK_PUSH)
+
+#define GCS_CAP_ADDR_MASK (0xfffffffffffff000UL)
+#define GCS_CAP_TOKEN_MASK (0x0000000000000fffUL)
+#define GCS_CAP_VALID_TOKEN 1
+#define GCS_CAP_IN_PROGRESS_TOKEN 5
+
+#define GCS_CAP(x) (((unsigned long)(x) & GCS_CAP_ADDR_MASK) | \
+ GCS_CAP_VALID_TOKEN)
+
+static inline unsigned long *get_gcspr(void)
+{
+ unsigned long *gcspr;
+
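+ /* S3_3_C2_C5_1 is the encoding of GCSPR_EL0, usable by older toolchains */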
+ asm volatile(
+ "mrs %0, S3_3_C2_C5_1"
+ : "=r" (gcspr)
+ :
+ : "cc");
+
+ return gcspr;
+}
+
+static inline void __attribute__((always_inline)) gcsss1(unsigned long *Xt)
+{
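+ /* GCSSS1 Xt: the first phase of a GCS stack switch */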
+ asm volatile (
+ "sys #3, C7, C7, #2, %0\n"
+ :
+ : "rZ" (Xt)
+ : "memory");
+}
+
+static inline unsigned long __attribute__((always_inline)) *gcsss2(void)
+{
+ unsigned long *Xt;
+
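+ /* GCSSS2 Xt: the second phase of a GCS stack switch */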
+ asm volatile(
+ "SYSL %0, #3, C7, C7, #3\n"
+ : "=r" (Xt)
+ :
+ : "memory");
+
+ return Xt;
+}
+
+static inline bool chkfeat_gcs(void)
+{
+ register long val __asm__ ("x16") = 1;
+
+ /* CHKFEAT x16 */
+ asm volatile(
+ "hint #0x28\n"
+ : "=r" (val)
+ : "r" (val));
+
+ return val != 1;
+}
+
+#endif

--
2.30.2


2023-08-07 22:37:30

by Mark Brown

[permalink] [raw]
Subject: [PATCH v4 11/36] arm64/mm: Map pages for guarded control stack

Map pages flagged as being part of a GCS as such rather than using the
full set of generic VM flags.

This is done using a conditional rather than extending the size of
protection_map since that would make for a very sparse array.

Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/mm/mmap.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
index 8f5b7ce857ed..8f40198cd44e 100644
--- a/arch/arm64/mm/mmap.c
+++ b/arch/arm64/mm/mmap.c
@@ -79,8 +79,18 @@ arch_initcall(adjust_protection_map);

pgprot_t vm_get_page_prot(unsigned long vm_flags)
{
- pteval_t prot = pgprot_val(protection_map[vm_flags &
+ pteval_t prot;
+
+ /* If this is a GCS then only interpret VM_WRITE. */
+ if (system_supports_gcs() && (vm_flags & VM_SHADOW_STACK)) {
+ if (vm_flags & VM_WRITE)
+ prot = _PAGE_GCS;
+ else
+ prot = _PAGE_GCS_RO;
+ } else {
+ prot = pgprot_val(protection_map[vm_flags &
(VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]);
+ }

if (vm_flags & VM_ARM64_BTI)
prot |= PTE_GP;

--
2.30.2


2023-08-07 22:39:43

by Mark Brown

[permalink] [raw]
Subject: [PATCH v4 14/36] arm64/idreg: Add override for GCS

Hook up an override for GCS, allowing it to be disabled from the command
line by specifying arm64.nogcs in case there are problems.

Signed-off-by: Mark Brown <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 3 +++
arch/arm64/kernel/idreg-override.c | 2 ++
2 files changed, 5 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a1457995fd41..86662eed3003 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -418,6 +418,9 @@
arm64.nobti [ARM64] Unconditionally disable Branch Target
Identification support

+ arm64.nogcs [ARM64] Unconditionally disable Guarded Control Stack
+ support
+
arm64.nopauth [ARM64] Unconditionally disable Pointer Authentication
support

diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c
index 2fe2491b692c..49269a5cff10 100644
--- a/arch/arm64/kernel/idreg-override.c
+++ b/arch/arm64/kernel/idreg-override.c
@@ -99,6 +99,7 @@ static const struct ftr_set_desc pfr1 __initconst = {
.override = &id_aa64pfr1_override,
.fields = {
FIELD("bt", ID_AA64PFR1_EL1_BT_SHIFT, NULL ),
+ FIELD("gcs", ID_AA64PFR1_EL1_GCS_SHIFT, NULL),
FIELD("mte", ID_AA64PFR1_EL1_MTE_SHIFT, NULL),
FIELD("sme", ID_AA64PFR1_EL1_SME_SHIFT, pfr1_sme_filter),
{}
@@ -178,6 +179,7 @@ static const struct {
{ "arm64.nosve", "id_aa64pfr0.sve=0" },
{ "arm64.nosme", "id_aa64pfr1.sme=0" },
{ "arm64.nobti", "id_aa64pfr1.bt=0" },
+ { "arm64.nogcs", "id_aa64pfr1.gcs=0" },
{ "arm64.nopauth",
"id_aa64isar1.gpi=0 id_aa64isar1.gpa=0 "
"id_aa64isar1.api=0 id_aa64isar1.apa=0 "

--
2.30.2


2023-08-07 22:40:03

by Mark Brown

[permalink] [raw]
Subject: [PATCH v4 33/36] kselftest/arm64: Add test coverage for GCS mode locking

Verify that we can lock individual GCS mode bits, that other modes
aren't affected and as a side effect also that every combination of
modes can be enabled.

Normally the inability to reenable GCS after disabling it would be an
issue with testing but fortunately the kselftest_harness runs each test
within a fork()ed child. This can be inconvenient for some kinds of
testing but here it means that each test runs in a separate process and
therefore won't be affected by other tests in the suite.

Once we get toolchains with support for enabling GCS by default we will
need to take care not to do that in the build system, but no such
toolchains exist yet so it is not yet an issue.

Signed-off-by: Mark Brown <[email protected]>
---
tools/testing/selftests/arm64/gcs/.gitignore | 1 +
tools/testing/selftests/arm64/gcs/Makefile | 2 +-
tools/testing/selftests/arm64/gcs/gcs-locking.c | 200 ++++++++++++++++++++++++
3 files changed, 202 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/arm64/gcs/.gitignore b/tools/testing/selftests/arm64/gcs/.gitignore
index 5810c4a163d4..0c86f53f68ad 100644
--- a/tools/testing/selftests/arm64/gcs/.gitignore
+++ b/tools/testing/selftests/arm64/gcs/.gitignore
@@ -1,2 +1,3 @@
basic-gcs
libc-gcs
+gcs-locking
diff --git a/tools/testing/selftests/arm64/gcs/Makefile b/tools/testing/selftests/arm64/gcs/Makefile
index a8fdf21e9a47..2173d6275956 100644
--- a/tools/testing/selftests/arm64/gcs/Makefile
+++ b/tools/testing/selftests/arm64/gcs/Makefile
@@ -6,7 +6,7 @@
# nolibc.
#

-TEST_GEN_PROGS := basic-gcs libc-gcs
+TEST_GEN_PROGS := basic-gcs libc-gcs gcs-locking

LDLIBS+=-lpthread

diff --git a/tools/testing/selftests/arm64/gcs/gcs-locking.c b/tools/testing/selftests/arm64/gcs/gcs-locking.c
new file mode 100644
index 000000000000..f6a73254317e
--- /dev/null
+++ b/tools/testing/selftests/arm64/gcs/gcs-locking.c
@@ -0,0 +1,200 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 ARM Limited.
+ *
+ * Tests for GCS mode locking. These tests rely on both having GCS
+ * unconfigured on entry and on the kselftest harness running each
+ * test in a fork()ed process which will have it's own mode.
+ */
+
+#include <limits.h>
+
+#include <sys/auxv.h>
+#include <sys/prctl.h>
+
+#include <asm/hwcap.h>
+
+#include "kselftest_harness.h"
+
+#include "gcs-util.h"
+
+#define my_syscall2(num, arg1, arg2) \
+({ \
+ register long _num __asm__ ("x8") = (num); \
+ register long _arg1 __asm__ ("x0") = (long)(arg1); \
+ register long _arg2 __asm__ ("x1") = (long)(arg2); \
+ register long _arg3 __asm__ ("x2") = 0; \
+ register long _arg4 __asm__ ("x3") = 0; \
+ register long _arg5 __asm__ ("x4") = 0; \
+ \
+ __asm__ volatile ( \
+ "svc #0\n" \
+ : "=r"(_arg1) \
+ : "r"(_arg1), "r"(_arg2), \
+ "r"(_arg3), "r"(_arg4), \
+ "r"(_arg5), "r"(_num) \
+ : "memory", "cc" \
+ ); \
+ _arg1; \
+})
+
+/* No mode bits are rejected for locking */
+TEST(lock_all_modes)
+{
+ int ret;
+
+ ret = prctl(PR_LOCK_SHADOW_STACK_STATUS, ULONG_MAX, 0, 0, 0);
+ ASSERT_EQ(ret, 0);
+}
+
+FIXTURE(valid_modes)
+{
+};
+
+FIXTURE_VARIANT(valid_modes)
+{
+ unsigned long mode;
+};
+
+FIXTURE_VARIANT_ADD(valid_modes, enable)
+{
+ .mode = PR_SHADOW_STACK_ENABLE,
+};
+
+FIXTURE_VARIANT_ADD(valid_modes, enable_write)
+{
+ .mode = PR_SHADOW_STACK_ENABLE | PR_SHADOW_STACK_WRITE,
+};
+
+FIXTURE_VARIANT_ADD(valid_modes, enable_push)
+{
+ .mode = PR_SHADOW_STACK_ENABLE | PR_SHADOW_STACK_PUSH,
+};
+
+FIXTURE_VARIANT_ADD(valid_modes, enable_write_push)
+{
+ .mode = PR_SHADOW_STACK_ENABLE | PR_SHADOW_STACK_WRITE |
+ PR_SHADOW_STACK_PUSH,
+};
+
+FIXTURE_SETUP(valid_modes)
+{
+}
+
+FIXTURE_TEARDOWN(valid_modes)
+{
+}
+
+/* We can set the mode at all */
+TEST_F(valid_modes, set)
+{
+ int ret;
+
+ ret = my_syscall2(__NR_prctl, PR_SET_SHADOW_STACK_STATUS,
+ variant->mode);
+ ASSERT_EQ(ret, 0);
+
+ _exit(0);
+}
+
+/* Enabling, locking then disabling is rejected */
+TEST_F(valid_modes, enable_lock_disable)
+{
+ unsigned long mode;
+ int ret;
+
+ ret = my_syscall2(__NR_prctl, PR_SET_SHADOW_STACK_STATUS,
+ variant->mode);
+ ASSERT_EQ(ret, 0);
+
+ ret = prctl(PR_GET_SHADOW_STACK_STATUS, &mode, 0, 0, 0);
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(mode, variant->mode);
+
+ ret = prctl(PR_LOCK_SHADOW_STACK_STATUS, variant->mode, 0, 0, 0);
+ ASSERT_EQ(ret, 0);
+
+ ret = my_syscall2(__NR_prctl, PR_SET_SHADOW_STACK_STATUS, 0);
+ ASSERT_EQ(ret, -EBUSY);
+
+ _exit(0);
+}
+
+/* Locking then enabling is rejected */
+TEST_F(valid_modes, lock_enable)
+{
+ unsigned long mode;
+ int ret;
+
+ ret = prctl(PR_LOCK_SHADOW_STACK_STATUS, variant->mode, 0, 0, 0);
+ ASSERT_EQ(ret, 0);
+
+ ret = my_syscall2(__NR_prctl, PR_SET_SHADOW_STACK_STATUS,
+ variant->mode);
+ ASSERT_EQ(ret, -EBUSY);
+
+ ret = prctl(PR_GET_SHADOW_STACK_STATUS, &mode, 0, 0, 0);
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(mode, 0);
+
+ _exit(0);
+}
+
+/* Locking then changing other modes is fine */
+TEST_F(valid_modes, lock_enable_disable_others)
+{
+ unsigned long mode;
+ int ret;
+
+ ret = my_syscall2(__NR_prctl, PR_SET_SHADOW_STACK_STATUS,
+ variant->mode);
+ ASSERT_EQ(ret, 0);
+
+ ret = prctl(PR_GET_SHADOW_STACK_STATUS, &mode, 0, 0, 0);
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(mode, variant->mode);
+
+ ret = prctl(PR_LOCK_SHADOW_STACK_STATUS, variant->mode, 0, 0, 0);
+ ASSERT_EQ(ret, 0);
+
+ ret = my_syscall2(__NR_prctl, PR_SET_SHADOW_STACK_STATUS,
+ PR_SHADOW_STACK_ALL_MODES);
+ ASSERT_EQ(ret, 0);
+
+ ret = prctl(PR_GET_SHADOW_STACK_STATUS, &mode, 0, 0, 0);
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(mode, PR_SHADOW_STACK_ALL_MODES);
+
+
+ ret = my_syscall2(__NR_prctl, PR_SET_SHADOW_STACK_STATUS,
+ variant->mode);
+ ASSERT_EQ(ret, 0);
+
+ ret = prctl(PR_GET_SHADOW_STACK_STATUS, &mode, 0, 0, 0);
+ ASSERT_EQ(ret, 0);
+ ASSERT_EQ(mode, variant->mode);
+
+ _exit(0);
+}
+
+int main(int argc, char **argv)
+{
+ unsigned long mode;
+ int ret;
+
+ if (!(getauxval(AT_HWCAP2) & HWCAP2_GCS))
+ ksft_exit_skip("SKIP GCS not supported\n");
+
+ ret = prctl(PR_GET_SHADOW_STACK_STATUS, &mode, 0, 0, 0);
+ if (ret) {
+ ksft_print_msg("Failed to read GCS state: %d\n", ret);
+ return EXIT_FAILURE;
+ }
+
+ if (mode & PR_SHADOW_STACK_ENABLE) {
+ ksft_print_msg("GCS was enabled, test unsupported\n");
+ return KSFT_SKIP;
+ }
+
+ return test_harness_run(argc, argv);
+}

--
2.30.2


2023-08-07 22:40:46

by Mark Brown

[permalink] [raw]
Subject: [PATCH v4 13/36] arm64/gcs: Allow GCS usage at EL0 and EL1

There is a control HCRX_EL2.GCSEn which must be set to allow GCS
features to take effect at lower ELs, and there are also fine grained
traps for GCS usage at EL0 and EL1. Configure all of these to allow GCS
usage by EL0 and EL1.

Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/asm/el2_setup.h | 17 +++++++++++++++++
arch/arm64/include/asm/kvm_arm.h | 4 ++--
2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h
index 8e5ffb58f83e..45f3a7dcfd95 100644
--- a/arch/arm64/include/asm/el2_setup.h
+++ b/arch/arm64/include/asm/el2_setup.h
@@ -27,6 +27,14 @@
ubfx x0, x0, #ID_AA64MMFR1_EL1_HCX_SHIFT, #4
cbz x0, .Lskip_hcrx_\@
mov_q x0, HCRX_HOST_FLAGS
+
+ /* Enable GCS if supported */
+ mrs_s x1, SYS_ID_AA64PFR1_EL1
+ ubfx x1, x1, #ID_AA64PFR1_EL1_GCS_SHIFT, #4
+ cbz x1, .Lset_hcrx_\@
+ orr x0, x0, #HCRX_EL2_GCSEn
+
+.Lset_hcrx_\@:
msr_s SYS_HCRX_EL2, x0
.Lskip_hcrx_\@:
.endm
@@ -186,6 +194,15 @@
orr x0, x0, #HFGxTR_EL2_nPIR_EL1
orr x0, x0, #HFGxTR_EL2_nPIRE0_EL1

+ /* GCS depends on PIE so we don't check it if PIE is absent */
+ mrs_s x1, SYS_ID_AA64PFR1_EL1
+ ubfx x1, x1, #ID_AA64PFR1_EL1_GCS_SHIFT, #4
+ cbz x1, .Lset_fgt_\@
+
+ /* Disable traps of access to GCS registers at EL0 and EL1 */
+ orr x0, x0, #HFGxTR_EL2_nGCS_EL1_MASK
+ orr x0, x0, #HFGxTR_EL2_nGCS_EL0_MASK
+
.Lset_fgt_\@:
msr_s SYS_HFGRTR_EL2, x0
msr_s SYS_HFGWTR_EL2, x0
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 58e5eb27da68..9c84e200217b 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -94,8 +94,8 @@
#define HCR_HOST_NVHE_PROTECTED_FLAGS (HCR_HOST_NVHE_FLAGS | HCR_TSC)
#define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H)

-#define HCRX_GUEST_FLAGS (HCRX_EL2_SMPME | HCRX_EL2_TCR2En)
-#define HCRX_HOST_FLAGS (HCRX_EL2_MSCEn | HCRX_EL2_TCR2En)
+#define HCRX_GUEST_FLAGS (HCRX_EL2_SMPME | HCRX_EL2_TCR2En | HCRX_EL2_GCSEn)
+#define HCRX_HOST_FLAGS (HCRX_EL2_MSCEn | HCRX_EL2_TCR2En | HCRX_EL2_GCSEn)

/* TCR_EL2 Registers bits */
#define TCR_EL2_RES1 ((1U << 31) | (1 << 23))

--
2.30.2


2023-08-07 22:42:24

by Mark Brown

[permalink] [raw]
Subject: [PATCH v4 22/36] arm64/signal: Set up and restore the GCS context for signal handlers

When invoking a signal handler we use the GCS configuration and stack
for the current thread.

Since we implement signal return by calling the signal handler with a
return address set up pointing to a trampoline in the vDSO we need to
also configure any active GCS for this by pushing a frame for the
trampoline onto the GCS. If we do not do this then signal return will
generate a GCS protection fault.

In order to guard against attempts to bypass GCS protections via signal
return we only allow returning with GCSPR_EL0 pointing to an address
where it was previously preempted by a signal. We do this by pushing a
cap onto the GCS; this takes the form of an architectural GCS cap token
with the top bit set which we add on signal entry and validate and pop
off on signal return. Since the top bit is set, address validation for
the token will fail if an attempt is made to use it with the stack
switch instructions; the token value is sketched below.
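
As an illustrative sketch only, using the cap field definitions added
elsewhere in this series, the token written at address gcspr - 8 for an
interrupted GCSPR_EL0 value gcspr would be:

	/* Sketch only: the value of a signal cap token */
	u64 sig_cap = ((gcspr - 8) & GCS_CAP_ADDR_MASK) |
		      GCS_CAP_VALID_TOKEN | BIT(63);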

Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/asm/gcs.h | 2 +
arch/arm64/kernel/signal.c | 130 +++++++++++++++++++++++++++++++++++++++++--
arch/arm64/mm/gcs.c | 1 +
3 files changed, 128 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/gcs.h b/arch/arm64/include/asm/gcs.h
index c150e76869a1..65496103d462 100644
--- a/arch/arm64/include/asm/gcs.h
+++ b/arch/arm64/include/asm/gcs.h
@@ -8,6 +8,8 @@
#include <asm/types.h>
#include <asm/uaccess.h>

+struct ksignal;
+
static inline void gcsb_dsync(void)
{
asm volatile(".inst 0xd503227f" : : : "memory");
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 0df8cc295ea5..1c31be0f373e 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -25,6 +25,7 @@
#include <asm/elf.h>
#include <asm/exception.h>
#include <asm/cacheflush.h>
+#include <asm/gcs.h>
#include <asm/ucontext.h>
#include <asm/unistd.h>
#include <asm/fpsimd.h>
@@ -34,6 +35,36 @@
#include <asm/traps.h>
#include <asm/vdso.h>

+#ifdef CONFIG_ARM64_GCS
+/* Extra bit set in the address distinguishing a signal cap token. */
+#define GCS_SIGNAL_CAP_FLAG BIT(63)
+
+#define GCS_SIGNAL_CAP(addr) (GCS_CAP(addr) | GCS_SIGNAL_CAP_FLAG)
+
+static bool gcs_signal_cap_valid(u64 addr, u64 val)
+{
+ /*
+ * The top bit should be set, this is an invalid address for
+ * EL0 and will only be set for caps created by signals.
+ */
+ if (!(val & GCS_SIGNAL_CAP_FLAG))
+ return false;
+
+ /* The rest should be a standard architectural cap token. */
+ val &= ~GCS_SIGNAL_CAP_FLAG;
+
+ /* The cap must have the low bits set to a token value */
+ if (GCS_CAP_TOKEN(val) != GCS_CAP_VALID_TOKEN)
+ return false;
+
+ /* The cap must store the VA the cap was stored at */
+ if (GCS_CAP_ADDR(addr) != GCS_CAP_ADDR(val))
+ return false;
+
+ return true;
+}
+#endif
+
/*
* Do a signal return; undo the signal stack. These are aligned to 128-bit.
*/
@@ -815,6 +846,45 @@ static int restore_sigframe(struct pt_regs *regs,
return err;
}

+#ifdef CONFIG_ARM64_GCS
+static int gcs_restore_signal(void)
+{
+ u64 gcspr_el0, cap;
+ int ret;
+
+ if (!system_supports_gcs())
+ return 0;
+
+ if (!(current->thread.gcs_el0_mode & PR_SHADOW_STACK_ENABLE))
+ return 0;
+
+ gcspr_el0 = read_sysreg_s(SYS_GCSPR_EL0);
+
+ /*
+ * GCSPR_EL0 should be pointing at a capped GCS, read the cap...
+ */
+ gcsb_dsync();
+ ret = copy_from_user(&cap, (void __user *)gcspr_el0, sizeof(cap));
+ if (ret)
+ return -EFAULT;
+
+ /*
+ * ...then check that the cap is the actual GCS before
+ * restoring it.
+ */
+ if (!gcs_signal_cap_valid(gcspr_el0, cap))
+ return -EINVAL;
+
+ current->thread.gcspr_el0 = gcspr_el0 + sizeof(cap);
+ write_sysreg_s(current->thread.gcspr_el0, SYS_GCSPR_EL0);
+
+ return 0;
+}
+
+#else
+static int gcs_restore_signal(void) { return 0; }
+#endif
+
SYSCALL_DEFINE0(rt_sigreturn)
{
struct pt_regs *regs = current_pt_regs();
@@ -841,6 +911,9 @@ SYSCALL_DEFINE0(rt_sigreturn)
if (restore_altstack(&frame->uc.uc_stack))
goto badframe;

+ if (gcs_restore_signal())
+ goto badframe;
+
return regs->regs[0];

badframe:
@@ -1071,7 +1144,52 @@ static int get_sigframe(struct rt_sigframe_user_layout *user,
return 0;
}

-static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
+#ifdef CONFIG_ARM64_GCS
+
+static int gcs_signal_entry(__sigrestore_t sigtramp, struct ksignal *ksig)
+{
+ unsigned long __user *gcspr_el0;
+ unsigned long cap[2];
+ int ret;
+
+ if (!system_supports_gcs())
+ return 0;
+
+ if (!task_gcs_el0_enabled(current))
+ return 0;
+
+ /*
+ * We are entering a signal handler, current register state is
+ * active.
+ */
+ gcspr_el0 = (unsigned long __user *)read_sysreg_s(SYS_GCSPR_EL0);
+
+ /*
+ * Push a cap and the GCS entry for the trampoline onto the GCS.
+ */
+ cap[1] = GCS_SIGNAL_CAP(gcspr_el0 - 1);
+ cap[0] = (unsigned long)sigtramp;
+ ret = copy_to_user_gcs(gcspr_el0 - 2, cap, ARRAY_SIZE(cap));
+ if (ret != 0)
+ return ret;
+
+ gcsb_dsync();
+
+ gcspr_el0 -= 2;
+ write_sysreg_s((unsigned long)gcspr_el0, SYS_GCSPR_EL0);
+
+ return 0;
+}
+#else
+
+static int gcs_signal_entry(__sigrestore_t sigtramp, struct ksignal *ksig)
+{
+ return 0;
+}
+
+#endif
+
+static int setup_return(struct pt_regs *regs, struct ksignal *ksig,
struct rt_sigframe_user_layout *user, int usig)
{
__sigrestore_t sigtramp;
@@ -1079,7 +1197,7 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
regs->regs[0] = usig;
regs->sp = (unsigned long)user->sigframe;
regs->regs[29] = (unsigned long)&user->next_frame->fp;
- regs->pc = (unsigned long)ka->sa.sa_handler;
+ regs->pc = (unsigned long)ksig->ka.sa.sa_handler;

/*
* Signal delivery is a (wacky) indirect function call in
@@ -1119,12 +1237,14 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
sme_smstop();
}

- if (ka->sa.sa_flags & SA_RESTORER)
- sigtramp = ka->sa.sa_restorer;
+ if (ksig->ka.sa.sa_flags & SA_RESTORER)
+ sigtramp = ksig->ka.sa.sa_restorer;
else
sigtramp = VDSO_SYMBOL(current->mm->context.vdso, sigtramp);

regs->regs[30] = (unsigned long)sigtramp;
+
+ return gcs_signal_entry(sigtramp, ksig);
}

static int setup_rt_frame(int usig, struct ksignal *ksig, sigset_t *set,
@@ -1147,7 +1267,7 @@ static int setup_rt_frame(int usig, struct ksignal *ksig, sigset_t *set,
err |= __save_altstack(&frame->uc.uc_stack, regs->sp);
err |= setup_sigframe(&user, regs, set);
if (err == 0) {
- setup_return(regs, &ksig->ka, &user, usig);
+ err = setup_return(regs, ksig, &user, usig);
if (ksig->ka.sa.sa_flags & SA_SIGINFO) {
err |= copy_siginfo_to_user(&frame->info, &ksig->info);
regs->regs[1] = (unsigned long)&frame->info;
diff --git a/arch/arm64/mm/gcs.c b/arch/arm64/mm/gcs.c
index b41700d6695e..0034d5b12971 100644
--- a/arch/arm64/mm/gcs.c
+++ b/arch/arm64/mm/gcs.c
@@ -6,6 +6,7 @@
#include <linux/types.h>

#include <asm/cpufeature.h>
+#include <asm/gcs.h>
#include <asm/page.h>

static unsigned long alloc_gcs(unsigned long addr, unsigned long size,

--
2.30.2


2023-08-07 22:50:41

by Mark Brown

[permalink] [raw]
Subject: [PATCH v4 25/36] arm64: Add Kconfig for Guarded Control Stack (GCS)

Provide a Kconfig option allowing the user to select whether GCS support
is built into the kernel.

Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/Kconfig | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a2511b30d0f6..b5ef1a698770 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2093,6 +2093,25 @@ config ARM64_EPAN
if the cpu does not implement the feature.
endmenu # "ARMv8.7 architectural features"

+menu "v9.4 architectural features"
+
+config ARM64_GCS
+ bool "Enable support for Guarded Control Stack (GCS)"
+ default y
+ select ARCH_USES_HIGH_VMA_FLAGS
+ help
+ Guarded Control Stack (GCS) provides support for a separate
+ stack with restricted access which contains only return
+ addresses. This can be used to harden against some attacks
+ by comparing return address used by the program with what is
+ stored in the GCS, and may also be used to efficiently obtain
+ the call stack for applications such as profiling.
+
+ The feature is detected at runtime, and will remain disabled
+ if the system does not implement the feature.
+
+endmenu # "v9.4 architectural features"
+
config ARM64_SVE
bool "ARM Scalable Vector Extension support"
default y

--
2.30.2


2023-08-07 22:51:10

by Mark Brown

[permalink] [raw]
Subject: [PATCH v4 07/36] arm64/gcs: Provide copy_to_user_gcs()

In order for EL1 to write to an EL0 GCS it must use the GCSSTTR instruction
rather than a normal STTR. Provide a copy_to_user_gcs() which does this.
Since it is not possible to store anything other than a 64 bit value the
interface is presented in terms of 64 bit values, using unsigned long
rather than u64 due to sparse.
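
For example a caller pushing two values onto an EL0 GCS might do the
following (an illustrative sketch only, the names are placeholders):

	unsigned long frame[2] = { lr_value, cap_value };
	int ret;

	/* addr must point into a GCS mapping and be 8 byte aligned */
	ret = copy_to_user_gcs(addr, frame, ARRAY_SIZE(frame));
	if (ret != 0)
		return ret;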

Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/asm/uaccess.h | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 22e10e79f56a..24aa804e95a7 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -445,6 +445,26 @@ static inline int gcssttr(unsigned long __user *addr, unsigned long val)
return err;
}

+static inline int copy_to_user_gcs(unsigned long __user *addr,
+ unsigned long *val,
+ int count)
+{
+ int ret = -EFAULT;
+ int i;
+
+ if (access_ok((char __user *)addr, count * sizeof(u64))) {
+ uaccess_ttbr0_enable();
+ for (i = 0; i < count; i++) {
+ ret = gcssttr(addr++, *val++);
+ if (ret != 0)
+ break;
+ }
+ uaccess_ttbr0_disable();
+ }
+
+ return ret;
+}
+
#endif /* CONFIG_ARM64_GCS */

#endif /* __ASM_UACCESS_H */

--
2.30.2


2023-08-07 22:51:34

by Mark Brown

[permalink] [raw]
Subject: [PATCH v4 21/36] arm64/mm: Implement map_shadow_stack()

As discussed extensively in the changelog for the addition of this
syscall on x86 ("x86/shstk: Introduce map_shadow_stack syscall") the
existing mmap() and madvise() syscalls do not map entirely well onto the
security requirements for guarded control stacks since they lead to
windows where memory is allocated but not yet protected or stacks which
are not properly and safely initialised. Instead a new syscall
map_shadow_stack() has been defined which allocates and initialises a
shadow stack page.

Implement this for arm64. Two flags are provided, allowing applications
to request that the stack be initialised with a valid cap token at the
top of the stack and optionally also an end of stack marker above that.
We support requesting an end of stack marker alone but since the marker
is all bits 0 this is indistinguishable from not initialising anything
at all.

Since the x86 code has not yet been rebased to v6.5-rc1 this includes
the architecture neutral parts of Rick Edgecombe's "x86/shstk: Introduce
map_shadow_stack syscall".

Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/mm/gcs.c | 58 ++++++++++++++++++++++++++++++++++++++-
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/unistd.h | 5 +++-
kernel/sys_ni.c | 1 +
4 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/gcs.c b/arch/arm64/mm/gcs.c
index 64c9f9a85925..b41700d6695e 100644
--- a/arch/arm64/mm/gcs.c
+++ b/arch/arm64/mm/gcs.c
@@ -52,7 +52,6 @@ unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
return 0;

size = gcs_size(size);
-
addr = alloc_gcs(0, size, 0, 0);
if (IS_ERR_VALUE(addr))
return addr;
@@ -64,6 +63,63 @@ unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
return addr;
}

+SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags)
+{
+ unsigned long alloc_size;
+ unsigned long __user *cap_ptr;
+ unsigned long cap_val;
+ int ret, cap_offset;
+
+ if (!system_supports_gcs())
+ return -EOPNOTSUPP;
+
+ if (flags & ~(SHADOW_STACK_SET_TOKEN | SHADOW_STACK_SET_MARKER))
+ return -EINVAL;
+
+ if (addr % 8)
+ return -EINVAL;
+
+ if (size == 8 || size % 8)
+ return -EINVAL;
+
+ /*
+ * An overflow would result in attempting to write the restore token
+ * to the wrong location. Not catastrophic, but just return the right
+ * error code and block it.
+ */
+ alloc_size = PAGE_ALIGN(size);
+ if (alloc_size < size)
+ return -EOVERFLOW;
+
+ addr = alloc_gcs(addr, alloc_size, 0, false);
+ if (IS_ERR_VALUE(addr))
+ return addr;
+
+ /*
+ * Put a cap token at the end of the allocated region so it
+ * can be switched to.
+ */
+ if (flags & SHADOW_STACK_SET_TOKEN) {
+ /* Leave an extra empty frame as a top of stack marker? */
+ if (flags & SHADOW_STACK_SET_MARKER)
+ cap_offset = 2;
+ else
+ cap_offset = 1;
+
+ cap_ptr = (unsigned long __user *)(addr + size -
+ (cap_offset * sizeof(unsigned long)));
+ cap_val = GCS_CAP(cap_ptr);
+
+ ret = copy_to_user_gcs(cap_ptr, &cap_val, 1);
+ if (ret != 0) {
+ vm_munmap(addr, size);
+ return -EFAULT;
+ }
+ }
+
+ return addr;
+}
+
/*
* Apply the GCS mode configured for the specified task to the
* hardware.
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 03e3d0121d5e..7f6dc0988197 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -953,6 +953,7 @@ asmlinkage long sys_set_mempolicy_home_node(unsigned long start, unsigned long l
asmlinkage long sys_cachestat(unsigned int fd,
struct cachestat_range __user *cstat_range,
struct cachestat __user *cstat, unsigned int flags);
+asmlinkage long sys_map_shadow_stack(unsigned long addr, unsigned long size, unsigned int flags);

/*
* Architecture-specific system calls
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index fd6c1cb585db..38885a795ea6 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -820,8 +820,11 @@ __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node)
#define __NR_cachestat 451
__SYSCALL(__NR_cachestat, sys_cachestat)

+#define __NR_map_shadow_stack 452
+__SYSCALL(__NR_map_shadow_stack, sys_map_shadow_stack)
+
#undef __NR_syscalls
-#define __NR_syscalls 452
+#define __NR_syscalls 453

/*
* 32 bit systems traditionally used different
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 781de7cc6a4e..e137c1385c56 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -274,6 +274,7 @@ COND_SYSCALL(vm86old);
COND_SYSCALL(modify_ldt);
COND_SYSCALL(vm86);
COND_SYSCALL(kexec_file_load);
+COND_SYSCALL(map_shadow_stack);

/* s390 */
COND_SYSCALL(s390_pci_mmio_read);

--
2.30.2


2023-08-07 22:53:11

by Mark Brown

[permalink] [raw]
Subject: [PATCH v4 23/36] arm64/signal: Expose GCS state in signal frames

Add a context for the GCS state and include it in the signal context when
running on a system that supports GCS. We reuse the same flags that the
prctl() uses to specify which GCS features are enabled and also provide the
current GCS pointer.

We do not support enabling GCS via signal return; there is a conflict
between specifying GCSPR_EL0 and allocating a new GCS, and this is not
an anticipated use case. We also enforce GCS configuration locking on
signal return.
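
For illustration only (not part of the patch), a sketch of how a signal
handler might locate this record by walking the __reserved area in the
usual arm64 fashion; find_gcs_context() is a hypothetical helper and
the gcs_context definition is the one added below:

	#include <asm/sigcontext.h>
	#include <stddef.h>
	#include <ucontext.h>

	static struct gcs_context *find_gcs_context(ucontext_t *uc)
	{
		struct _aarch64_ctx *head =
			(struct _aarch64_ctx *)&uc->uc_mcontext.__reserved;

		/* The record list is terminated by a 0 magic */
		while (head->magic) {
			if (head->magic == GCS_MAGIC)
				return (struct gcs_context *)head;
			head = (struct _aarch64_ctx *)((char *)head +
						       head->size);
		}

		return NULL;	/* No GCS record in this frame */
	}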

Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/uapi/asm/sigcontext.h | 9 +++
arch/arm64/kernel/signal.c | 107 +++++++++++++++++++++++++++++++
2 files changed, 116 insertions(+)

diff --git a/arch/arm64/include/uapi/asm/sigcontext.h b/arch/arm64/include/uapi/asm/sigcontext.h
index f23c1dc3f002..7b66d245f2d2 100644
--- a/arch/arm64/include/uapi/asm/sigcontext.h
+++ b/arch/arm64/include/uapi/asm/sigcontext.h
@@ -168,6 +168,15 @@ struct zt_context {
__u16 __reserved[3];
};

+#define GCS_MAGIC 0x47435300
+
+struct gcs_context {
+ struct _aarch64_ctx head;
+ __u64 gcspr;
+ __u64 features_enabled;
+ __u64 reserved;
+};
+
#endif /* !__ASSEMBLY__ */

#include <asm/sve_context.h>
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 1c31be0f373e..4cc0c7928cb3 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -87,6 +87,7 @@ struct rt_sigframe_user_layout {

unsigned long fpsimd_offset;
unsigned long esr_offset;
+ unsigned long gcs_offset;
unsigned long sve_offset;
unsigned long tpidr2_offset;
unsigned long za_offset;
@@ -213,6 +214,8 @@ struct user_ctxs {
u32 za_size;
struct zt_context __user *zt;
u32 zt_size;
+ struct gcs_context __user *gcs;
+ u32 gcs_size;
};

static int preserve_fpsimd_context(struct fpsimd_context __user *ctx)
@@ -605,6 +608,82 @@ extern int restore_zt_context(struct user_ctxs *user);

#endif /* ! CONFIG_ARM64_SME */

+#ifdef CONFIG_ARM64_GCS
+
+static int preserve_gcs_context(struct gcs_context __user *ctx)
+{
+ int err = 0;
+ u64 gcspr;
+
+ /*
+ * We will add a cap token to the frame, include it in the
+ * GCSPR_EL0 we report to support stack switching via
+ * sigreturn.
+ */
+ gcs_preserve_current_state();
+ gcspr = current->thread.gcspr_el0;
+ if (task_gcs_el0_enabled(current))
+ gcspr -= 8;
+
+ __put_user_error(GCS_MAGIC, &ctx->head.magic, err);
+ __put_user_error(sizeof(*ctx), &ctx->head.size, err);
+ __put_user_error(gcspr, &ctx->gcspr, err);
+ __put_user_error(current->thread.gcs_el0_mode,
+ &ctx->features_enabled, err);
+
+ return err;
+}
+
+static int restore_gcs_context(struct user_ctxs *user)
+{
+ u64 gcspr, enabled;
+ int err = 0;
+
+ if (user->gcs_size != sizeof(*user->gcs))
+ return -EINVAL;
+
+ __get_user_error(gcspr, &user->gcs->gcspr, err);
+ __get_user_error(enabled, &user->gcs->features_enabled, err);
+ if (err)
+ return err;
+
+ /* Don't allow unknown modes */
+ if (enabled & ~PR_SHADOW_STACK_SUPPORTED_STATUS_MASK)
+ return -EINVAL;
+
+ err = gcs_check_locked(current, enabled);
+ if (err != 0)
+ return err;
+
+ /* Don't allow enabling */
+ if (!task_gcs_el0_enabled(current) &&
+ (enabled & PR_SHADOW_STACK_ENABLE))
+ return -EINVAL;
+
+ /* If we are disabling disable everything */
+ if (!(enabled & PR_SHADOW_STACK_ENABLE))
+ enabled = 0;
+
+ current->thread.gcs_el0_mode = enabled;
+
+ /*
+ * We let userspace set GCSPR_EL0 to anything here, we will
+ * validate later in gcs_restore_signal().
+ */
+ current->thread.gcspr_el0 = gcspr;
+ write_sysreg_s(current->thread.gcspr_el0, SYS_GCSPR_EL0);
+
+ return 0;
+}
+
+#else /* ! CONFIG_ARM64_GCS */
+
+/* Turn any non-optimised out attempts to use these into a link error: */
+extern int preserve_gcs_context(void __user *ctx);
+extern int restore_gcs_context(struct user_ctxs *user);
+
+#endif /* ! CONFIG_ARM64_GCS */
+
static int parse_user_sigframe(struct user_ctxs *user,
struct rt_sigframe __user *sf)
{
@@ -621,6 +700,7 @@ static int parse_user_sigframe(struct user_ctxs *user,
user->tpidr2 = NULL;
user->za = NULL;
user->zt = NULL;
+ user->gcs = NULL;

if (!IS_ALIGNED((unsigned long)base, 16))
goto invalid;
@@ -715,6 +795,17 @@ static int parse_user_sigframe(struct user_ctxs *user,
user->zt_size = size;
break;

+ case GCS_MAGIC:
+ if (!system_supports_gcs())
+ goto invalid;
+
+ if (user->gcs)
+ goto invalid;
+
+ user->gcs = (struct gcs_context __user *)head;
+ user->gcs_size = size;
+ break;
+
case EXTRA_MAGIC:
if (have_extra_context)
goto invalid;
@@ -834,6 +925,9 @@ static int restore_sigframe(struct pt_regs *regs,
err = restore_fpsimd_context(&user);
}

+ if (err == 0 && system_supports_gcs() && user.gcs)
+ err = restore_gcs_context(&user);
+
if (err == 0 && system_supports_tpidr2() && user.tpidr2)
err = restore_tpidr2_context(&user);

@@ -948,6 +1042,13 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user,
return err;
}

+ if (system_supports_gcs()) {
+ err = sigframe_alloc(user, &user->gcs_offset,
+ sizeof(struct gcs_context));
+ if (err)
+ return err;
+ }
+
if (system_supports_sve() || system_supports_sme()) {
unsigned int vq = 0;

@@ -1041,6 +1142,12 @@ static int setup_sigframe(struct rt_sigframe_user_layout *user,
__put_user_error(current->thread.fault_code, &esr_ctx->esr, err);
}

+ if (system_supports_gcs() && err == 0 && user->gcs_offset) {
+ struct gcs_context __user *gcs_ctx =
+ apply_user_offset(user, user->gcs_offset);
+ err |= preserve_gcs_context(gcs_ctx);
+ }
+
/* Scalable Vector Extension state (including streaming), if present */
if ((system_supports_sve() || system_supports_sme()) &&
err == 0 && user->sve_offset) {

--
2.30.2


2023-08-07 22:54:04

by Mark Brown

[permalink] [raw]
Subject: [PATCH v4 26/36] kselftest/arm64: Verify the GCS hwcap

Add coverage of the GCS hwcap to the hwcap selftest, using a read of
GCSPR_EL0 to generate SIGILL without having to worry about enabling GCS.

Signed-off-by: Mark Brown <[email protected]>
---
tools/testing/selftests/arm64/abi/hwcap.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)

diff --git a/tools/testing/selftests/arm64/abi/hwcap.c b/tools/testing/selftests/arm64/abi/hwcap.c
index d4ad813fed10..38844e4c5aae 100644
--- a/tools/testing/selftests/arm64/abi/hwcap.c
+++ b/tools/testing/selftests/arm64/abi/hwcap.c
@@ -39,6 +39,17 @@ static void cssc_sigill(void)
asm volatile(".inst 0xdac01c00" : : : "x0");
}

+static void gcs_sigill(void)
+{
+ unsigned long *gcspr;
+
+ asm volatile(
+ "mrs %0, S3_3_C2_C5_1"
+ : "=r" (gcspr)
+ :
+ : "cc");
+}
+
static void mops_sigill(void)
{
char dst[1], src[1];
@@ -223,6 +234,14 @@ static const struct hwcap_data {
.cpuinfo = "cssc",
.sigill_fn = cssc_sigill,
},
+ {
+ .name = "GCS",
+ .at_hwcap = AT_HWCAP2,
+ .hwcap_bit = HWCAP2_GCS,
+ .cpuinfo = "gcs",
+ .sigill_fn = gcs_sigill,
+ .sigill_reliable = true,
+ },
{
.name = "MOPS",
.at_hwcap = AT_HWCAP2,

--
2.30.2


2023-08-07 22:55:49

by Mark Brown

[permalink] [raw]
Subject: [PATCH v4 28/36] kselftest/arm64: Add framework support for GCS to signal handling tests

Teach the framework about the GCS signal context, avoiding warnings on
the unknown context.

Signed-off-by: Mark Brown <[email protected]>
---
tools/testing/selftests/arm64/signal/testcases/testcases.c | 7 +++++++
tools/testing/selftests/arm64/signal/testcases/testcases.h | 1 +
2 files changed, 8 insertions(+)

diff --git a/tools/testing/selftests/arm64/signal/testcases/testcases.c b/tools/testing/selftests/arm64/signal/testcases/testcases.c
index 9f580b55b388..1cd124732be4 100644
--- a/tools/testing/selftests/arm64/signal/testcases/testcases.c
+++ b/tools/testing/selftests/arm64/signal/testcases/testcases.c
@@ -209,6 +209,13 @@ bool validate_reserved(ucontext_t *uc, size_t resv_sz, char **err)
zt = (struct zt_context *)head;
new_flags |= ZT_CTX;
break;
+ case GCS_MAGIC:
+ if (flags & GCS_CTX)
+ *err = "Multiple GCS_MAGIC";
+ if (head->size != sizeof(struct gcs_context))
+ *err = "Bad size for gcs_context";
+ new_flags |= GCS_CTX;
+ break;
case EXTRA_MAGIC:
if (flags & EXTRA_CTX)
*err = "Multiple EXTRA_MAGIC";
diff --git a/tools/testing/selftests/arm64/signal/testcases/testcases.h b/tools/testing/selftests/arm64/signal/testcases/testcases.h
index a08ab0d6207a..9b2599745c29 100644
--- a/tools/testing/selftests/arm64/signal/testcases/testcases.h
+++ b/tools/testing/selftests/arm64/signal/testcases/testcases.h
@@ -19,6 +19,7 @@
#define ZA_CTX (1 << 2)
#define EXTRA_CTX (1 << 3)
#define ZT_CTX (1 << 4)
+#define GCS_CTX (1 << 5)

#define KSFT_BAD_MAGIC 0xdeadbeef


--
2.30.2


2023-08-07 22:56:05

by Mark Brown

[permalink] [raw]
Subject: [PATCH v4 32/36] kselftest/arm64: Add a GCS test program built with the system libc

There are things, like threads, which nolibc struggles with and which
we want to add coverage for. The ABI allows us to test most of these
even if libc itself does not understand GCS, so add a test application
built using the system libc.

Signed-off-by: Mark Brown <[email protected]>
---
tools/testing/selftests/arm64/gcs/.gitignore | 1 +
tools/testing/selftests/arm64/gcs/Makefile | 4 +-
tools/testing/selftests/arm64/gcs/libc-gcs.c | 500 +++++++++++++++++++++++++++
3 files changed, 504 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/arm64/gcs/.gitignore b/tools/testing/selftests/arm64/gcs/.gitignore
index 0e5e695ecba5..5810c4a163d4 100644
--- a/tools/testing/selftests/arm64/gcs/.gitignore
+++ b/tools/testing/selftests/arm64/gcs/.gitignore
@@ -1 +1,2 @@
basic-gcs
+libc-gcs
diff --git a/tools/testing/selftests/arm64/gcs/Makefile b/tools/testing/selftests/arm64/gcs/Makefile
index 61a30f483429..a8fdf21e9a47 100644
--- a/tools/testing/selftests/arm64/gcs/Makefile
+++ b/tools/testing/selftests/arm64/gcs/Makefile
@@ -6,7 +6,9 @@
# nolibc.
#

-TEST_GEN_PROGS := basic-gcs
+TEST_GEN_PROGS := basic-gcs libc-gcs
+
+LDLIBS+=-lpthread

include ../../lib.mk

diff --git a/tools/testing/selftests/arm64/gcs/libc-gcs.c b/tools/testing/selftests/arm64/gcs/libc-gcs.c
new file mode 100644
index 000000000000..5d20442358ed
--- /dev/null
+++ b/tools/testing/selftests/arm64/gcs/libc-gcs.c
@@ -0,0 +1,500 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 ARM Limited.
+ */
+
+#include <pthread.h>
+#include <stdbool.h>
+
+#include <sys/auxv.h>
+#include <sys/mman.h>
+#include <sys/prctl.h>
+
+#include <asm/hwcap.h>
+#include <asm/mman.h>
+
+#include <linux/compiler.h>
+
+#include "kselftest_harness.h"
+
+#include "gcs-util.h"
+
+#define my_syscall2(num, arg1, arg2) \
+({ \
+ register long _num __asm__ ("x8") = (num); \
+ register long _arg1 __asm__ ("x0") = (long)(arg1); \
+ register long _arg2 __asm__ ("x1") = (long)(arg2); \
+ register long _arg3 __asm__ ("x2") = 0; \
+ register long _arg4 __asm__ ("x3") = 0; \
+ register long _arg5 __asm__ ("x4") = 0; \
+ \
+ __asm__ volatile ( \
+ "svc #0\n" \
+ : "=r"(_arg1) \
+ : "r"(_arg1), "r"(_arg2), \
+ "r"(_arg3), "r"(_arg4), \
+ "r"(_arg5), "r"(_num) \
+ : "memory", "cc" \
+ ); \
+ _arg1; \
+})
+
+static noinline void gcs_recurse(int depth)
+{
+ if (depth)
+ gcs_recurse(depth - 1);
+
+ /* Prevent tail call optimization so we actually recurse */
+ asm volatile("dsb sy" : : : "memory");
+}
+
+/* Smoke test that a function call and return works */
+TEST(can_call_function)
+{
+ gcs_recurse(0);
+}
+
+static void *gcs_test_thread(void *arg)
+{
+ int ret;
+ unsigned long mode;
+
+ /*
+ * Some libcs don't seem to fill unused arguments with 0 but
+ * the kernel validates this so we supply all 5 arguments.
+ */
+ ret = prctl(PR_GET_SHADOW_STACK_STATUS, &mode, 0, 0, 0);
+ if (ret != 0) {
+ ksft_print_msg("PR_GET_SHADOW_STACK_STATUS failed: %d\n", ret);
+ return NULL;
+ }
+
+ if (!(mode & PR_SHADOW_STACK_ENABLE)) {
+ ksft_print_msg("GCS not enabled in thread, mode is %u\n",
+ mode);
+ return NULL;
+ }
+
+ /* Just in case... */
+ gcs_recurse(0);
+
+ /* Use a non-NULL value to indicate a pass */
+ return &gcs_test_thread;
+}
+
+/* Verify that if we start a new thread it has GCS enabled */
+TEST(gcs_enabled_thread)
+{
+ pthread_t thread;
+ void *thread_ret;
+ int ret;
+
+ ret = pthread_create(&thread, NULL, gcs_test_thread, NULL);
+ ASSERT_TRUE(ret == 0);
+ if (ret != 0)
+ return;
+
+ ret = pthread_join(thread, &thread_ret);
+ ASSERT_TRUE(ret == 0);
+ if (ret != 0)
+ return;
+
+ ASSERT_TRUE(thread_ret != NULL);
+}
+
+/* Read the GCS until we find the terminator */
+TEST(gcs_find_terminator)
+{
+ unsigned long *gcs, *cur;
+
+ gcs = get_gcspr();
+ cur = gcs;
+ while (*cur)
+ cur++;
+
+ ksft_print_msg("GCS in use from %p-%p\n", gcs, cur);
+
+ /*
+ * We should have at least whatever called into this test so
+ * the two pointers should differ.
+ */
+ ASSERT_TRUE(gcs != cur);
+}
+
+FIXTURE(map_gcs)
+{
+ unsigned long *stack;
+};
+
+FIXTURE_VARIANT(map_gcs)
+{
+ size_t stack_size;
+ unsigned long flags;
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s2k_cap_marker)
+{
+ .stack_size = 2 * 1024,
+ .flags = SHADOW_STACK_SET_MARKER | SHADOW_STACK_SET_TOKEN,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s2k_cap)
+{
+ .stack_size = 2 * 1024,
+ .flags = SHADOW_STACK_SET_TOKEN,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s2k_marker)
+{
+ .stack_size = 2 * 1024,
+ .flags = SHADOW_STACK_SET_MARKER,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s2k)
+{
+ .stack_size = 2 * 1024,
+ .flags = 0,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s4k_cap_marker)
+{
+ .stack_size = 4 * 1024,
+ .flags = SHADOW_STACK_SET_MARKER | SHADOW_STACK_SET_TOKEN,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s4k_cap)
+{
+ .stack_size = 4 * 1024,
+ .flags = SHADOW_STACK_SET_TOKEN,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s4k_marker)
+{
+ .stack_size = 4 * 1024,
+ .flags = SHADOW_STACK_SET_MARKER,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s4k)
+{
+ .stack_size = 4 * 1024,
+ .flags = 0,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s16k_cap_marker)
+{
+ .stack_size = 16 * 1024,
+ .flags = SHADOW_STACK_SET_MARKER | SHADOW_STACK_SET_TOKEN,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s16k_cap)
+{
+ .stack_size = 16 * 1024,
+ .flags = SHADOW_STACK_SET_TOKEN,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s16k_marker)
+{
+ .stack_size = 16 * 1024,
+ .flags = SHADOW_STACK_SET_MARKER,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s16k)
+{
+ .stack_size = 16 * 1024,
+ .flags = 0,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s64k_cap_marker)
+{
+ .stack_size = 64 * 1024,
+ .flags = SHADOW_STACK_SET_MARKER | SHADOW_STACK_SET_TOKEN,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s64k_cap)
+{
+ .stack_size = 64 * 1024,
+ .flags = SHADOW_STACK_SET_TOKEN,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s64k_marker)
+{
+ .stack_size = 64 * 1024,
+ .flags = SHADOW_STACK_SET_MARKER,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s64k)
+{
+ .stack_size = 64 * 1024,
+ .flags = 0,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s128k_cap_marker)
+{
+ .stack_size = 128 * 1024,
+ .flags = SHADOW_STACK_SET_MARKER | SHADOW_STACK_SET_TOKEN,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s128k_cap)
+{
+ .stack_size = 128 * 1024,
+ .flags = SHADOW_STACK_SET_TOKEN,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s128k_marker)
+{
+ .stack_size = 128 * 1024,
+ .flags = SHADOW_STACK_SET_MARKER,
+};
+
+FIXTURE_VARIANT_ADD(map_gcs, s128k)
+{
+ .stack_size = 128 * 1024,
+ .flags = 0,
+};
+
+FIXTURE_SETUP(map_gcs)
+{
+ self->stack = (void *)syscall(__NR_map_shadow_stack, 0,
+ variant->stack_size,
+ variant->flags);
+ ASSERT_FALSE(self->stack == MAP_FAILED);
+ ksft_print_msg("Allocated stack from %p-%p\n", self->stack,
+ (void *)((unsigned long)self->stack + variant->stack_size));
+}
+
+FIXTURE_TEARDOWN(map_gcs)
+{
+ int ret;
+
+ if (self->stack != MAP_FAILED) {
+ ret = munmap(self->stack, variant->stack_size);
+ ASSERT_EQ(ret, 0);
+ }
+}
+
+/* The stack has a cap token */
+TEST_F(map_gcs, stack_capped)
+{
+ unsigned long *stack = self->stack;
+ size_t cap_index;
+
+ cap_index = (variant->stack_size / sizeof(unsigned long));
+
+ switch (variant->flags & (SHADOW_STACK_SET_MARKER | SHADOW_STACK_SET_TOKEN)) {
+ case SHADOW_STACK_SET_MARKER | SHADOW_STACK_SET_TOKEN:
+ cap_index -= 2;
+ break;
+ case SHADOW_STACK_SET_TOKEN:
+ cap_index -= 1;
+ break;
+ case SHADOW_STACK_SET_MARKER:
+ case 0:
+ /* No cap, no test */
+ return;
+ }
+
+ ASSERT_EQ(stack[cap_index], GCS_CAP(&stack[cap_index]));
+}
+
+/* The top of the stack is 0 */
+TEST_F(map_gcs, stack_terminated)
+{
+ unsigned long *stack = self->stack;
+ size_t term_index;
+
+ if (!(variant->flags & SHADOW_STACK_SET_MARKER))
+ return;
+
+ term_index = (variant->stack_size / sizeof(unsigned long)) - 1;
+
+ ASSERT_EQ(stack[term_index], 0);
+}
+
+/* Writes should fault */
+TEST_F_SIGNAL(map_gcs, not_writeable, SIGSEGV)
+{
+ self->stack[0] = 0;
+}
+
+/* Put it all together, we can safely switch to and from the stack */
+TEST_F(map_gcs, stack_switch)
+{
+ size_t cap_index;
+ unsigned long *orig_gcspr_el0, *pivot_gcspr_el0;
+ cap_index = variant->stack_size / sizeof(unsigned long);
+
+ /* Skip over the stack terminator and point at the cap */
+ switch (variant->flags & (SHADOW_STACK_SET_MARKER | SHADOW_STACK_SET_TOKEN)) {
+ case SHADOW_STACK_SET_MARKER | SHADOW_STACK_SET_TOKEN:
+ cap_index -= 2;
+ break;
+ case SHADOW_STACK_SET_TOKEN:
+ cap_index -= 1;
+ break;
+ case SHADOW_STACK_SET_MARKER:
+ case 0:
+ /* No cap, no test */
+ return;
+ }
+ pivot_gcspr_el0 = &self->stack[cap_index];
+
+ /* Pivot to the new GCS */
+ ksft_print_msg("Pivoting to %p from %p, target has value 0x%lx\n",
+ pivot_gcspr_el0, get_gcspr(),
+ *pivot_gcspr_el0);
+ gcsss1(pivot_gcspr_el0);
+ orig_gcspr_el0 = gcsss2();
+ ksft_print_msg("Pivoted to %p from %p, target has value 0x%lx\n",
+ pivot_gcspr_el0, get_gcspr(),
+ *pivot_gcspr_el0);
+
+ ksft_print_msg("Pivoted, GCSPR_EL0 now %p\n", get_gcspr());
+
+ /* New GCS must be in the new buffer */
+ ASSERT_TRUE((unsigned long)get_gcspr() > (unsigned long)self->stack);
+ ASSERT_TRUE((unsigned long)get_gcspr() <=
+ (unsigned long)self->stack + variant->stack_size);
+
+ /* We should be able to use all but 2 slots of the new stack */
+ ksft_print_msg("Recursing %d levels\n", cap_index - 1);
+ gcs_recurse(cap_index - 1);
+
+ /* Pivot back to the original GCS */
+ gcsss1(orig_gcspr_el0);
+ pivot_gcspr_el0 = gcsss2();
+
+ gcs_recurse(0);
+ ksft_print_msg("Pivoted back to GCSPR_EL0 0x%lx\n", get_gcspr());
+}
+
+/* We fault if we try to go beyond the end of the stack */
+TEST_F_SIGNAL(map_gcs, stack_overflow, SIGSEGV)
+{
+ size_t cap_index;
+ unsigned long *orig_gcspr_el0, *pivot_gcspr_el0;
+ cap_index = variant->stack_size / sizeof(unsigned long);
+
+ /* Skip over the stack terminator and point at the cap */
+ switch (variant->flags & (SHADOW_STACK_SET_MARKER | SHADOW_STACK_SET_TOKEN)) {
+ case SHADOW_STACK_SET_MARKER | SHADOW_STACK_SET_TOKEN:
+ cap_index -= 2;
+ break;
+ case SHADOW_STACK_SET_TOKEN:
+ cap_index -= 1;
+ break;
+ case SHADOW_STACK_SET_MARKER:
+ case 0:
+ /* No cap, no test but we need to SEGV to avoid a false fail */
+ orig_gcspr_el0 = get_gcspr();
+ *orig_gcspr_el0 = 0;
+ return;
+ }
+ pivot_gcspr_el0 = &self->stack[cap_index];
+
+ /* Pivot to the new GCS */
+ ksft_print_msg("Pivoting to %p from %p, target has value 0x%lx\n",
+ pivot_gcspr_el0, get_gcspr(),
+ *pivot_gcspr_el0);
+ gcsss1(pivot_gcspr_el0);
+ orig_gcspr_el0 = gcsss2();
+ ksft_print_msg("Pivoted to %p from %p, target has value 0x%lx\n",
+ pivot_gcspr_el0, get_gcspr(),
+ *pivot_gcspr_el0);
+
+ ksft_print_msg("Pivoted, GCSPR_EL0 now %p\n", get_gcspr());
+
+ /* New GCS must be in the new buffer */
+ ASSERT_TRUE((unsigned long)get_gcspr() > (unsigned long)self->stack);
+ ASSERT_TRUE((unsigned long)get_gcspr() <=
+ (unsigned long)self->stack + variant->stack_size);
+
+ /* Now try to recurse, we should fault doing this. */
+ ksft_print_msg("Recursing %d levels...\n", cap_index + 1);
+ gcs_recurse(cap_index + 1);
+ ksft_print_msg("...done\n");
+
+ /* Clean up properly to try to guard against spurious passes. */
+ gcsss1(orig_gcspr_el0);
+ pivot_gcspr_el0 = gcsss2();
+ ksft_print_msg("Pivoted back to GCSPR_EL0 0x%lx\n", get_gcspr());
+}
+
+FIXTURE(map_invalid_gcs)
+{
+};
+
+FIXTURE_VARIANT(map_invalid_gcs)
+{
+ size_t stack_size;
+};
+
+FIXTURE_SETUP(map_invalid_gcs)
+{
+}
+
+FIXTURE_TEARDOWN(map_invalid_gcs)
+{
+}
+
+/* GCS must be at least 16 bytes */
+FIXTURE_VARIANT_ADD(map_invalid_gcs, too_small)
+{
+ .stack_size = 8,
+};
+
+/* GCS size must be 8 byte aligned */
+FIXTURE_VARIANT_ADD(map_invalid_gcs, unaligned_1) { .stack_size = 1024 + 1 };
+FIXTURE_VARIANT_ADD(map_invalid_gcs, unaligned_2) { .stack_size = 1024 + 2 };
+FIXTURE_VARIANT_ADD(map_invalid_gcs, unaligned_3) { .stack_size = 1024 + 3 };
+FIXTURE_VARIANT_ADD(map_invalid_gcs, unaligned_4) { .stack_size = 1024 + 4 };
+FIXTURE_VARIANT_ADD(map_invalid_gcs, unaligned_5) { .stack_size = 1024 + 5 };
+FIXTURE_VARIANT_ADD(map_invalid_gcs, unaligned_6) { .stack_size = 1024 + 6 };
+FIXTURE_VARIANT_ADD(map_invalid_gcs, unaligned_7) { .stack_size = 1024 + 7 };
+
+TEST_F(map_invalid_gcs, do_map)
+{
+ void *stack;
+
+ stack = (void *)syscall(__NR_map_shadow_stack, 0,
+ variant->stack_size, 0);
+ ASSERT_TRUE(stack == MAP_FAILED);
+ if (stack != MAP_FAILED)
+ munmap(stack, variant->stack_size);
+}
+
+
+int main(int argc, char **argv)
+{
+ unsigned long gcs_mode;
+ int ret;
+
+ if (!(getauxval(AT_HWCAP2) & HWCAP2_GCS))
+ ksft_exit_skip("SKIP GCS not supported\n");
+
+ /*
+ * Force shadow stacks on, our tests *should* be fine with or
+ * without libc support and with or without this having ended
+ * up tagged for GCS and enabled by the dynamic linker. We
+ * can't use the libc prctl() function since we can't return
+ * from enabling the stack. Also lock GCS if not already
+ * locked so we can test behaviour when it's locked.
+ */
+ ret = my_syscall2(__NR_prctl, PR_GET_SHADOW_STACK_STATUS, &gcs_mode);
+ if (ret) {
+ ksft_print_msg("Failed to read GCS state: %d\n", ret);
+ return EXIT_FAILURE;
+ }
+
+ if (!(gcs_mode & PR_SHADOW_STACK_ENABLE)) {
+ gcs_mode = PR_SHADOW_STACK_ENABLE;
+ ret = my_syscall2(__NR_prctl, PR_SET_SHADOW_STACK_STATUS,
+ gcs_mode);
+ if (ret) {
+ ksft_print_msg("Failed to configure GCS: %d\n", ret);
+ return EXIT_FAILURE;
+ }
+ }
+
+ /* Avoid returning in case libc doesn't understand GCS */
+ exit(test_harness_run(argc, argv));
+}

--
2.30.2


2023-08-07 22:56:20

by Mark Brown

[permalink] [raw]
Subject: [PATCH v4 18/36] arm64/gcs: Context switch GCS state for EL0

There are two registers controlling the GCS state of EL0: GCSPR_EL0,
which is the current GCS pointer, and GCSCRE0_EL1, which has enable
bits for the specific GCS functionality enabled for EL0. Manage these
on context switch and process lifetime events; GCS is reset on exec().
Also ensure that any changes to the GCS memory are visible to other
PEs and that changes from other PEs are visible on this one by issuing
a GCSB DSYNC when moving to or from a thread with GCS.

Since the current GCS configuration of a thread will be visible to
userspace we store the configuration in the format used with userspace
and provide a helper which configures the system register as needed.

On systems that support GCS we always allow access to GCSPR_EL0, this
facilitates reporting of GCS faults if userspace implements disabling of
GCS on error - the GCS can still be discovered and examined even if GCS
has been disabled.

Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/asm/gcs.h | 24 +++++++++++++++++
arch/arm64/include/asm/processor.h | 6 +++++
arch/arm64/kernel/process.c | 55 ++++++++++++++++++++++++++++++++++++++
arch/arm64/mm/Makefile | 1 +
arch/arm64/mm/gcs.c | 39 +++++++++++++++++++++++++++
5 files changed, 125 insertions(+)

diff --git a/arch/arm64/include/asm/gcs.h b/arch/arm64/include/asm/gcs.h
index 7c5e95218db6..04594ef59dad 100644
--- a/arch/arm64/include/asm/gcs.h
+++ b/arch/arm64/include/asm/gcs.h
@@ -48,4 +48,28 @@ static inline u64 gcsss2(void)
return Xt;
}

+#ifdef CONFIG_ARM64_GCS
+
+static inline bool task_gcs_el0_enabled(struct task_struct *task)
+{
+ return task->thread.gcs_el0_mode & PR_SHADOW_STACK_ENABLE;
+}
+
+void gcs_set_el0_mode(struct task_struct *task);
+void gcs_free(struct task_struct *task);
+void gcs_preserve_current_state(void);
+
+#else
+
+static inline bool task_gcs_el0_enabled(struct task_struct *task)
+{
+ return false;
+}
+
+static inline void gcs_set_el0_mode(struct task_struct *task) { }
+static inline void gcs_free(struct task_struct *task) { }
+static inline void gcs_preserve_current_state(void) { }
+
+#endif
+
#endif
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 3918f2a67970..f1551228a143 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -179,6 +179,12 @@ struct thread_struct {
u64 sctlr_user;
u64 svcr;
u64 tpidr2_el0;
+#ifdef CONFIG_ARM64_GCS
+ unsigned int gcs_el0_mode;
+ u64 gcspr_el0;
+ u64 gcs_base;
+ u64 gcs_size;
+#endif
};

static inline unsigned int thread_get_vl(struct thread_struct *thread,
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 0fcc4eb1a7ab..b8a42471aea3 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -48,6 +48,7 @@
#include <asm/cacheflush.h>
#include <asm/exec.h>
#include <asm/fpsimd.h>
+#include <asm/gcs.h>
#include <asm/mmu_context.h>
#include <asm/mte.h>
#include <asm/processor.h>
@@ -271,12 +272,31 @@ static void flush_tagged_addr_state(void)
clear_thread_flag(TIF_TAGGED_ADDR);
}

+#ifdef CONFIG_ARM64_GCS
+
+static void flush_gcs(void)
+{
+ if (system_supports_gcs()) {
+ gcs_free(current);
+ current->thread.gcs_el0_mode = 0;
+ write_sysreg_s(0, SYS_GCSCRE0_EL1);
+ write_sysreg_s(0, SYS_GCSPR_EL0);
+ }
+}
+
+#else
+
+static void flush_gcs(void) { }
+
+#endif
+
void flush_thread(void)
{
fpsimd_flush_thread();
tls_thread_flush();
flush_ptrace_hw_breakpoint(current);
flush_tagged_addr_state();
+ flush_gcs();
}

void arch_release_task_struct(struct task_struct *tsk)
@@ -474,6 +494,40 @@ static void entry_task_switch(struct task_struct *next)
__this_cpu_write(__entry_task, next);
}

+#ifdef CONFIG_ARM64_GCS
+
+void gcs_preserve_current_state(void)
+{
+ if (task_gcs_el0_enabled(current))
+ current->thread.gcspr_el0 = read_sysreg_s(SYS_GCSPR_EL0);
+}
+
+static void gcs_thread_switch(struct task_struct *next)
+{
+ if (!system_supports_gcs())
+ return;
+
+ gcs_preserve_current_state();
+
+ /*
+ * Ensure that GCS changes are observable by/from other PEs in
+ * case of migration.
+ */
+ if (task_gcs_el0_enabled(current) || task_gcs_el0_enabled(next))
+ gcsb_dsync();
+
+ gcs_set_el0_mode(next);
+ write_sysreg_s(next->thread.gcspr_el0, SYS_GCSPR_EL0);
+}
+
+#else
+
+static void gcs_thread_switch(struct task_struct *next)
+{
+}
+
+#endif
+
/*
* ARM erratum 1418040 handling, affecting the 32bit view of CNTVCT.
* Ensure access is disabled when switching to a 32bit task, ensure
@@ -533,6 +587,7 @@ struct task_struct *__switch_to(struct task_struct *prev,
ssbs_thread_switch(next);
erratum_1418040_thread_switch(next);
ptrauth_thread_switch_user(next);
+ gcs_thread_switch(next);

/*
* Complete any pending TLB or cache maintenance on this CPU in case
diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
index dbd1bc95967d..4e7cb2f02999 100644
--- a/arch/arm64/mm/Makefile
+++ b/arch/arm64/mm/Makefile
@@ -10,6 +10,7 @@ obj-$(CONFIG_TRANS_TABLE) += trans_pgd.o
obj-$(CONFIG_TRANS_TABLE) += trans_pgd-asm.o
obj-$(CONFIG_DEBUG_VIRTUAL) += physaddr.o
obj-$(CONFIG_ARM64_MTE) += mteswap.o
+obj-$(CONFIG_ARM64_GCS) += gcs.o
KASAN_SANITIZE_physaddr.o += n

obj-$(CONFIG_KASAN) += kasan_init.o
diff --git a/arch/arm64/mm/gcs.c b/arch/arm64/mm/gcs.c
new file mode 100644
index 000000000000..b0a67efc522b
--- /dev/null
+++ b/arch/arm64/mm/gcs.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/syscalls.h>
+#include <linux/types.h>
+
+#include <asm/cpufeature.h>
+#include <asm/page.h>
+
+/*
+ * Apply the GCS mode configured for the specified task to the
+ * hardware.
+ */
+void gcs_set_el0_mode(struct task_struct *task)
+{
+ u64 gcscre0_el1 = GCSCRE0_EL1_nTR;
+
+ if (task->thread.gcs_el0_mode & PR_SHADOW_STACK_ENABLE)
+ gcscre0_el1 |= GCSCRE0_EL1_RVCHKEN | GCSCRE0_EL1_PCRSEL;
+
+ if (task->thread.gcs_el0_mode & PR_SHADOW_STACK_WRITE)
+ gcscre0_el1 |= GCSCRE0_EL1_STREn;
+
+ if (task->thread.gcs_el0_mode & PR_SHADOW_STACK_PUSH)
+ gcscre0_el1 |= GCSCRE0_EL1_PUSHMEn;
+
+ write_sysreg_s(gcscre0_el1, SYS_GCSCRE0_EL1);
+}
+
+void gcs_free(struct task_struct *task)
+{
+ if (task->thread.gcs_base)
+ vm_munmap(task->thread.gcs_base, task->thread.gcs_size);
+
+ task->thread.gcspr_el0 = 0;
+ task->thread.gcs_base = 0;
+ task->thread.gcs_size = 0;
+}

--
2.30.2


2023-08-07 22:56:48

by Mark Brown

[permalink] [raw]
Subject: [PATCH v4 29/36] kselftest/arm64: Allow signals tests to specify an expected si_code

Currently we ignore si_code unless the expected signal is a SIGSEGV, in
which case we enforce it being SEGV_ACCERR. Allow test cases to specify
exactly which si_code should be generated so we can validate this, and
test for other segfault codes.
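
As a hypothetical sketch (not a testcase in this series), the new
field might be used like this; get_gcspr() is assumed from the GCS
test helpers and SEGV_CPERR is the GCS protection si_code used
elsewhere in the series:

	static int gcs_write_trigger(struct tdescr *td)
	{
		/* A normal store to the GCS should fault */
		unsigned long *gcspr = get_gcspr();

		*gcspr = 0;

		return 0;
	}

	struct tdescr tde = {
		.name = "GCS write",
		.descr = "Normal writes to the GCS generate SEGV_CPERR",
		.sig_ok = SIGSEGV,
		.sig_ok_code = SEGV_CPERR,
		.trigger = gcs_write_trigger,
	};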

Signed-off-by: Mark Brown <[email protected]>
---
.../testing/selftests/arm64/signal/test_signals.h | 4 +++
.../selftests/arm64/signal/test_signals_utils.c | 29 ++++++++++++++--------
2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/arm64/signal/test_signals.h b/tools/testing/selftests/arm64/signal/test_signals.h
index 7ada43688c02..ee75a2c25ce7 100644
--- a/tools/testing/selftests/arm64/signal/test_signals.h
+++ b/tools/testing/selftests/arm64/signal/test_signals.h
@@ -71,6 +71,10 @@ struct tdescr {
* Zero when no signal is expected on success
*/
int sig_ok;
+ /*
+ * expected si_code for sig_ok, or 0 to not check
+ */
+ int sig_ok_code;
/* signum expected on unsupported CPU features. */
int sig_unsupp;
/* a timeout in second for test completion */
diff --git a/tools/testing/selftests/arm64/signal/test_signals_utils.c b/tools/testing/selftests/arm64/signal/test_signals_utils.c
index 89ef95c1af0e..63deca32b0df 100644
--- a/tools/testing/selftests/arm64/signal/test_signals_utils.c
+++ b/tools/testing/selftests/arm64/signal/test_signals_utils.c
@@ -143,16 +143,25 @@ static bool handle_signal_ok(struct tdescr *td,
"current->token ZEROED...test is probably broken!\n");
abort();
}
- /*
- * Trying to narrow down the SEGV to the ones generated by Kernel itself
- * via arm64_notify_segfault(). This is a best-effort check anyway, and
- * the si_code check may need to change if this aspect of the kernel
- * ABI changes.
- */
- if (td->sig_ok == SIGSEGV && si->si_code != SEGV_ACCERR) {
- fprintf(stdout,
- "si_code != SEGV_ACCERR...test is probably broken!\n");
- abort();
+ if (td->sig_ok_code) {
+ if (si->si_code != td->sig_ok_code) {
+ fprintf(stdout, "si_code is %d not %d\n",
+ si->si_code, td->sig_ok_code);
+ abort();
+ }
+ } else {
+ /*
+ * Trying to narrow down the SEGV to the ones
+ * generated by Kernel itself via
+ * arm64_notify_segfault(). This is a best-effort
+ * check anyway, and the si_code check may need to
+ * change if this aspect of the kernel ABI changes.
+ */
+ if (td->sig_ok == SIGSEGV && si->si_code != SEGV_ACCERR) {
+ fprintf(stdout,
+ "si_code != SEGV_ACCERR...test is probably broken!\n");
+ abort();
+ }
}
td->pass = 1;
/*

--
2.30.2


2023-08-07 22:57:57

by Mark Brown

[permalink] [raw]
Subject: [PATCH v4 05/36] arm64/sysreg: Add definitions for architected GCS caps

The architecture defines a format for guarded control stack caps, used
to mark the top of an unused GCS in order to limit the potential for
exploitation via stack switching. Add definitions associated with these.
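
To illustrate the encoding (fragment only, assuming the definitions
below):

	/*
	 * The cap for a GCS entry carries the entry's address in
	 * bits [63:12] and a token in bits [11:0].
	 */
	unsigned long *ptr = (unsigned long *)0xffff12345ff8UL;
	unsigned long cap = GCS_CAP(ptr);	/* 0xffff12345000 | 0x1 */

	/*
	 * GCS_CAP_ADDR(cap) recovers the address bits (shifted down
	 * by 12) and GCS_CAP_TOKEN(cap) yields GCS_CAP_VALID_TOKEN
	 * for a cap which may be switched to.
	 */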

Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/asm/sysreg.h | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index b481935e9314..3d7f9b25b8fb 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -730,6 +730,26 @@

#define PIRx_ELx_PERM(idx, perm) ((perm) << ((idx) * 4))

+/*
+ * Definitions for Guarded Control Stack
+ */
+
+#define GCS_CAP_ADDR_MASK GENMASK(63, 12)
+#define GCS_CAP_ADDR_SHIFT 12
+#define GCS_CAP_ADDR_WIDTH 52
+#define GCS_CAP_ADDR(x) FIELD_GET(GCS_CAP_ADDR_MASK, x)
+
+#define GCS_CAP_TOKEN_MASK GENMASK(11, 0)
+#define GCS_CAP_TOKEN_SHIFT 0
+#define GCS_CAP_TOKEN_WIDTH 12
+#define GCS_CAP_TOKEN(x) FIELD_GET(GCS_CAP_TOKEN_MASK, x)
+
+#define GCS_CAP_VALID_TOKEN 0x1
+#define GCS_CAP_IN_PROGRESS_TOKEN 0x5
+
+#define GCS_CAP(x) ((((unsigned long)x) & GCS_CAP_ADDR_MASK) | \
+ GCS_CAP_VALID_TOKEN)
+
#define ARM64_FEATURE_FIELD_BITS 4

/* Defined for compatibility only, do not add new users. */

--
2.30.2


2023-08-07 23:00:18

by Mark Brown

[permalink] [raw]
Subject: [PATCH v4 09/36] arm64/mm: Allocate PIE slots for EL0 guarded control stack

Pages used for guarded control stacks need to be described to the
hardware using the Permission Indirection Extension; GCS is not
supported without PIE. In order to support copy on write for guarded
stacks we allocate two values, one for active GCSs and one for GCS
pages marked as read only prior to copy.

Since the actual effect is defined using PIE the specific bit pattern used
does not matter to the hardware but we choose two values which differ only
in PTE_WRITE in order to help share code with non-PIE cases.

Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/asm/pgtable-prot.h | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index eed814b00a38..b157ae0420ed 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -131,15 +131,23 @@ extern bool arm64_use_ng_mappings;
/* 6: PTE_PXN | PTE_WRITE */
/* 7: PAGE_SHARED_EXEC PTE_PXN | PTE_WRITE | PTE_USER */
/* 8: PAGE_KERNEL_ROX PTE_UXN */
-/* 9: PTE_UXN | PTE_USER */
+/* 9: PAGE_GCS_RO PTE_UXN | PTE_USER */
/* a: PAGE_KERNEL_EXEC PTE_UXN | PTE_WRITE */
-/* b: PTE_UXN | PTE_WRITE | PTE_USER */
+/* b: PAGE_GCS PTE_UXN | PTE_WRITE | PTE_USER */
/* c: PAGE_KERNEL_RO PTE_UXN | PTE_PXN */
/* d: PAGE_READONLY PTE_UXN | PTE_PXN | PTE_USER */
/* e: PAGE_KERNEL PTE_UXN | PTE_PXN | PTE_WRITE */
/* f: PAGE_SHARED PTE_UXN | PTE_PXN | PTE_WRITE | PTE_USER */

+#define _PAGE_GCS (_PAGE_DEFAULT | PTE_UXN | PTE_WRITE | PTE_USER)
+#define _PAGE_GCS_RO (_PAGE_DEFAULT | PTE_UXN | PTE_USER)
+
+#define PAGE_GCS __pgprot(_PAGE_GCS)
+#define PAGE_GCS_RO __pgprot(_PAGE_GCS_RO)
+
#define PIE_E0 ( \
+ PIRx_ELx_PERM(pte_pi_index(_PAGE_GCS), PIE_GCS) | \
+ PIRx_ELx_PERM(pte_pi_index(_PAGE_GCS_RO), PIE_R) | \
PIRx_ELx_PERM(pte_pi_index(_PAGE_EXECONLY), PIE_X_O) | \
PIRx_ELx_PERM(pte_pi_index(_PAGE_READONLY_EXEC), PIE_RX) | \
PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED_EXEC), PIE_RWX) | \
@@ -147,6 +155,8 @@ extern bool arm64_use_ng_mappings;
PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED), PIE_RW))

#define PIE_E1 ( \
+ PIRx_ELx_PERM(pte_pi_index(_PAGE_GCS), PIE_RW) | \
+ PIRx_ELx_PERM(pte_pi_index(_PAGE_GCS_RO), PIE_R) | \
PIRx_ELx_PERM(pte_pi_index(_PAGE_EXECONLY), PIE_NONE_O) | \
PIRx_ELx_PERM(pte_pi_index(_PAGE_READONLY_EXEC), PIE_R) | \
PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED_EXEC), PIE_RW) | \

--
2.30.2


2023-08-07 23:02:14

by Mark Brown

[permalink] [raw]
Subject: [PATCH v4 08/36] arm64/cpufeature: Runtime detection of Guarded Control Stack (GCS)

Add a cpufeature for GCS, allowing other code to conditionally support it
at runtime.

Signed-off-by: Mark Brown <[email protected]>
---
arch/arm64/include/asm/cpufeature.h | 6 ++++++
arch/arm64/kernel/cpufeature.c | 16 ++++++++++++++++
arch/arm64/tools/cpucaps | 1 +
3 files changed, 23 insertions(+)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 96e50227f940..189783142a96 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -831,6 +831,12 @@ static inline bool system_supports_tlb_range(void)
cpus_have_const_cap(ARM64_HAS_TLB_RANGE);
}

+static inline bool system_supports_gcs(void)
+{
+ return IS_ENABLED(CONFIG_ARM64_GCS) &&
+ cpus_have_const_cap(ARM64_HAS_GCS);
+}
+
int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
bool try_emulate_mrs(struct pt_regs *regs, u32 isn);

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index f9d456fe132d..91a14a6ccb04 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -254,6 +254,8 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
};

static const struct arm64_ftr_bits ftr_id_aa64pfr1[] = {
+ ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_GCS),
+ FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_EL1_GCS_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME),
FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_EL1_SME_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_EL1_MPAM_frac_SHIFT, 4, 0),
@@ -2219,6 +2221,12 @@ static void cpu_enable_mops(const struct arm64_cpu_capabilities *__unused)
sysreg_clear_set(sctlr_el1, 0, SCTLR_EL1_MSCEn);
}

+static void cpu_enable_gcs(const struct arm64_cpu_capabilities *__unused)
+{
+ /* GCS is not currently used at EL1 */
+ write_sysreg_s(0, SYS_GCSCR_EL1);
+}
+
/* Internal helper functions to match cpu capability type */
static bool
cpucap_late_cpu_optional(const struct arm64_cpu_capabilities *cap)
@@ -2715,6 +2723,14 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
.min_field_value = ID_AA64MMFR2_EL1_EVT_IMP,
.matches = has_cpuid_feature,
},
+ {
+ .desc = "Guarded Control Stack (GCS)",
+ .capability = ARM64_HAS_GCS,
+ .type = ARM64_CPUCAP_SYSTEM_FEATURE,
+ .cpu_enable = cpu_enable_gcs,
+ .matches = has_cpuid_feature,
+ ARM64_CPUID_FIELDS(ID_AA64PFR1_EL1, GCS, IMP)
+ },
{},
};

diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index c80ed4f3cbce..ab582f592131 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -26,6 +26,7 @@ HAS_ECV
HAS_ECV_CNTPOFF
HAS_EPAN
HAS_EVT
+HAS_GCS
HAS_GENERIC_AUTH
HAS_GENERIC_AUTH_ARCH_QARMA3
HAS_GENERIC_AUTH_ARCH_QARMA5

--
2.30.2


2023-08-07 23:02:15

by Mark Brown

[permalink] [raw]
Subject: [PATCH v4 15/36] arm64/hwcap: Add hwcap for GCS

Provide a hwcap to enable userspace to detect support for GCS.
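
For example, userspace detection might look like this (the fallback
definition matches the uapi value added below, for use with older
headers):

	#include <stdio.h>
	#include <sys/auxv.h>

	#ifndef HWCAP2_GCS
	#define HWCAP2_GCS (1UL << 44)
	#endif

	int main(void)
	{
		if (getauxval(AT_HWCAP2) & HWCAP2_GCS)
			printf("GCS supported\n");
		else
			printf("GCS not supported\n");

		return 0;
	}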

Signed-off-by: Mark Brown <[email protected]>
---
Documentation/arch/arm64/elf_hwcaps.rst | 3 +++
arch/arm64/include/asm/hwcap.h | 1 +
arch/arm64/include/uapi/asm/hwcap.h | 1 +
arch/arm64/kernel/cpufeature.c | 3 +++
arch/arm64/kernel/cpuinfo.c | 1 +
5 files changed, 9 insertions(+)

diff --git a/Documentation/arch/arm64/elf_hwcaps.rst b/Documentation/arch/arm64/elf_hwcaps.rst
index 8c8addb4194c..75f3960cad39 100644
--- a/Documentation/arch/arm64/elf_hwcaps.rst
+++ b/Documentation/arch/arm64/elf_hwcaps.rst
@@ -305,6 +305,9 @@ HWCAP2_SMEF16F16
HWCAP2_MOPS
Functionality implied by ID_AA64ISAR2_EL1.MOPS == 0b0001.

+HWCAP2_GCS
+ Functionality implied by ID_AA64PFR1_EL1.GCS == 0b0001.
+
4. Unused AT_HWCAP bits
-----------------------

diff --git a/arch/arm64/include/asm/hwcap.h b/arch/arm64/include/asm/hwcap.h
index 692b1ec663b2..39f397a2b5b2 100644
--- a/arch/arm64/include/asm/hwcap.h
+++ b/arch/arm64/include/asm/hwcap.h
@@ -138,6 +138,7 @@
#define KERNEL_HWCAP_SME_B16B16 __khwcap2_feature(SME_B16B16)
#define KERNEL_HWCAP_SME_F16F16 __khwcap2_feature(SME_F16F16)
#define KERNEL_HWCAP_MOPS __khwcap2_feature(MOPS)
+#define KERNEL_HWCAP_GCS __khwcap2_feature(GCS)

/*
* This yields a mask that user programs can use to figure out what
diff --git a/arch/arm64/include/uapi/asm/hwcap.h b/arch/arm64/include/uapi/asm/hwcap.h
index a2cac4305b1e..7510c35e6864 100644
--- a/arch/arm64/include/uapi/asm/hwcap.h
+++ b/arch/arm64/include/uapi/asm/hwcap.h
@@ -103,5 +103,6 @@
#define HWCAP2_SME_B16B16 (1UL << 41)
#define HWCAP2_SME_F16F16 (1UL << 42)
#define HWCAP2_MOPS (1UL << 43)
+#define HWCAP2_GCS (1UL << 44)

#endif /* _UAPI__ASM_HWCAP_H */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 91a14a6ccb04..7b46e01140c4 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2840,6 +2840,9 @@ static const struct arm64_cpu_capabilities arm64_elf_hwcaps[] = {
HWCAP_CAP(ID_AA64ZFR0_EL1, I8MM, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEI8MM),
HWCAP_CAP(ID_AA64ZFR0_EL1, F32MM, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEF32MM),
HWCAP_CAP(ID_AA64ZFR0_EL1, F64MM, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEF64MM),
+#endif
+#ifdef CONFIG_ARM64_GCS
+ HWCAP_CAP(ID_AA64PFR1_EL1, GCS, IMP, CAP_HWCAP, KERNEL_HWCAP_GCS),
#endif
HWCAP_CAP(ID_AA64PFR1_EL1, SSBS, SSBS2, CAP_HWCAP, KERNEL_HWCAP_SSBS),
#ifdef CONFIG_ARM64_BTI
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index 58622dc85917..451fbbeffa39 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -126,6 +126,7 @@ static const char *const hwcap_str[] = {
[KERNEL_HWCAP_SME_B16B16] = "smeb16b16",
[KERNEL_HWCAP_SME_F16F16] = "smef16f16",
[KERNEL_HWCAP_MOPS] = "mops",
+ [KERNEL_HWCAP_GCS] = "gcs",
};

#ifdef CONFIG_COMPAT

--
2.30.2


2023-08-07 23:06:45

by Mark Brown

[permalink] [raw]
Subject: [PATCH v4 30/36] kselftest/arm64: Always run signals tests with GCS enabled

Since it is not possible to return from the function that enabled GCS
without disabling GCS, it is very inconvenient to use the signal
handling tests to cover GCS when GCS is not enabled by the toolchain
and runtime,
something that no current distribution does. Since none of the testcases
do anything with stacks that would cause problems with GCS we can sidestep
this issue by unconditionally enabling GCS on startup and exiting with a
call to exit() rather than a return from main().

Signed-off-by: Mark Brown <[email protected]>
---
.../testing/selftests/arm64/signal/test_signals.c | 17 ++++++++++++-
.../selftests/arm64/signal/test_signals_utils.h | 29 ++++++++++++++++++++++
2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/arm64/signal/test_signals.c b/tools/testing/selftests/arm64/signal/test_signals.c
index 00051b40d71e..30e95f50db19 100644
--- a/tools/testing/selftests/arm64/signal/test_signals.c
+++ b/tools/testing/selftests/arm64/signal/test_signals.c
@@ -7,6 +7,10 @@
* Each test provides its own tde struct tdescr descriptor to link with
* this wrapper. Framework provides common helpers.
*/
+
+#include <sys/auxv.h>
+#include <sys/prctl.h>
+
#include <kselftest.h>

#include "test_signals.h"
@@ -16,6 +20,16 @@ struct tdescr *current = &tde;

int main(int argc, char *argv[])
{
+ /*
+ * Ensure GCS is at least enabled throughout the tests if
+ * supported, otherwise the inability to return from the
+ * function that enabled GCS makes it very inconvenient to set
+ * up test cases. The prctl() may fail if GCS was locked by
+ * libc setup code.
+ */
+ if (getauxval(AT_HWCAP2) & HWCAP2_GCS)
+ gcs_set_state(PR_SHADOW_STACK_ENABLE);
+
ksft_print_msg("%s :: %s\n", current->name, current->descr);
if (test_setup(current) && test_init(current)) {
test_run(current);
@@ -23,5 +37,6 @@ int main(int argc, char *argv[])
}
test_result(current);

- return current->result;
+ /* Do not return in case GCS was enabled */
+ exit(current->result);
}
diff --git a/tools/testing/selftests/arm64/signal/test_signals_utils.h b/tools/testing/selftests/arm64/signal/test_signals_utils.h
index 222093f51b67..1cea64986baa 100644
--- a/tools/testing/selftests/arm64/signal/test_signals_utils.h
+++ b/tools/testing/selftests/arm64/signal/test_signals_utils.h
@@ -16,6 +16,35 @@ void test_cleanup(struct tdescr *td);
int test_run(struct tdescr *td);
void test_result(struct tdescr *td);

+#ifndef __NR_prctl
+#define __NR_prctl 167
+#endif
+
+/*
+ * The prctl takes 1 argument but we need to ensure that the other
+ * values passed in registers to the syscall are zero since the kernel
+ * validates them.
+ */
+#define gcs_set_state(state) \
+ ({ \
+ register long _num __asm__ ("x8") = __NR_prctl; \
+ register long _arg1 __asm__ ("x0") = PR_SET_SHADOW_STACK_STATUS; \
+ register long _arg2 __asm__ ("x1") = (long)(state); \
+ register long _arg3 __asm__ ("x2") = 0; \
+ register long _arg4 __asm__ ("x3") = 0; \
+ register long _arg5 __asm__ ("x4") = 0; \
+ \
+ __asm__ volatile ( \
+ "svc #0\n" \
+ : "=r"(_arg1) \
+ : "r"(_arg1), "r"(_arg2), \
+ "r"(_arg3), "r"(_arg4), \
+ "r"(_arg5), "r"(_num) \
+ : "memory", "cc" \
+ ); \
+ _arg1; \
+ })
+
static inline bool feats_ok(struct tdescr *td)
{
if (td->feats_incompatible & td->feats_supported)

--
2.30.2


2023-08-07 23:07:43

by Mark Brown

[permalink] [raw]
Subject: [PATCH v4 35/36] kselftest/arm64: Add a GCS stress test

Add a stress test which runs one more process than we have CPUs, each
spinning through a very recursive function with frequent syscalls
immediately prior to return and with signals injected every 100ms.
any scheduling related issues, for example failure to ensure that barriers
are inserted when moving a GCS using task to another CPU. The test runs for
a configurable amount of time, defaulting to 10 seconds.

Signed-off-by: Mark Brown <[email protected]>
---
tools/testing/selftests/arm64/gcs/.gitignore | 2 +
tools/testing/selftests/arm64/gcs/Makefile | 6 +-
tools/testing/selftests/arm64/gcs/asm-offsets.h | 0
.../selftests/arm64/gcs/gcs-stress-thread.S | 311 ++++++++++++
tools/testing/selftests/arm64/gcs/gcs-stress.c | 532 +++++++++++++++++++++
5 files changed, 850 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/arm64/gcs/.gitignore b/tools/testing/selftests/arm64/gcs/.gitignore
index 0c86f53f68ad..1e8d1f6b27f2 100644
--- a/tools/testing/selftests/arm64/gcs/.gitignore
+++ b/tools/testing/selftests/arm64/gcs/.gitignore
@@ -1,3 +1,5 @@
basic-gcs
libc-gcs
gcs-locking
+gcs-stress
+gcs-stress-thread
diff --git a/tools/testing/selftests/arm64/gcs/Makefile b/tools/testing/selftests/arm64/gcs/Makefile
index 2173d6275956..d8b06ca51e22 100644
--- a/tools/testing/selftests/arm64/gcs/Makefile
+++ b/tools/testing/selftests/arm64/gcs/Makefile
@@ -6,7 +6,8 @@
# nolibc.
#

-TEST_GEN_PROGS := basic-gcs libc-gcs gcs-locking
+TEST_GEN_PROGS := basic-gcs libc-gcs gcs-locking gcs-stress
+TEST_GEN_PROGS_EXTENDED := gcs-stress-thread

LDLIBS+=-lpthread

@@ -18,3 +19,6 @@ $(OUTPUT)/basic-gcs: basic-gcs.c
-I../../../../../usr/include \
-std=gnu99 -I../.. -g \
-ffreestanding -Wall $^ -o $@ -lgcc
+
+$(OUTPUT)/gcs-stress-thread: gcs-stress-thread.S
+ $(CC) -nostdlib $^ -o $@
diff --git a/tools/testing/selftests/arm64/gcs/asm-offsets.h b/tools/testing/selftests/arm64/gcs/asm-offsets.h
new file mode 100644
index 000000000000..e69de29bb2d1
diff --git a/tools/testing/selftests/arm64/gcs/gcs-stress-thread.S b/tools/testing/selftests/arm64/gcs/gcs-stress-thread.S
new file mode 100644
index 000000000000..4fe8695333e5
--- /dev/null
+++ b/tools/testing/selftests/arm64/gcs/gcs-stress-thread.S
@@ -0,0 +1,311 @@
+// Program that loops for ever doing lots of recursions and system calls,
+// intended to be used as part of a stress test for GCS context switching.
+//
+// Copyright 2015-2023 Arm Ltd
+
+#include <asm/unistd.h>
+
+#define sa_sz 32
+#define sa_flags 8
+#define sa_handler 0
+#define sa_mask_sz 8
+
+#define si_code 8
+
+#define SIGINT 2
+#define SIGABRT 6
+#define SIGUSR1 10
+#define SIGSEGV 11
+#define SIGUSR2 12
+#define SIGTERM 15
+#define SEGV_CPERR 10
+
+#define SA_NODEFER 1073741824
+#define SA_SIGINFO 4
+#define ucontext_regs 184
+
+#define PR_SET_SHADOW_STACK_STATUS 72
+# define PR_SHADOW_STACK_ENABLE (1UL << 0)
+
+#define GCSPR_EL0 S3_3_C2_C5_1
+
+.macro function name
+ .macro endfunction
+ .type \name, @function
+ .purgem endfunction
+ .endm
+\name:
+.endm
+
+// Print a single character x0 to stdout
+// Clobbers x0-x2,x8
+function putc
+ str x0, [sp, #-16]!
+
+ mov x0, #1 // STDOUT_FILENO
+ mov x1, sp
+ mov x2, #1
+ mov x8, #__NR_write
+ svc #0
+
+ add sp, sp, #16
+ ret
+endfunction
+.globl putc
+
+// Print a NUL-terminated string starting at address x0 to stdout
+// Clobbers x0-x3,x8
+function puts
+ mov x1, x0
+
+ mov x2, #0
+0: ldrb w3, [x0], #1
+ cbz w3, 1f
+ add x2, x2, #1
+ b 0b
+
+1: mov w0, #1 // STDOUT_FILENO
+ mov x8, #__NR_write
+ svc #0
+
+ ret
+endfunction
+.globl puts
+
+// Utility macro to print a literal string
+// Clobbers x0-x4,x8
+.macro puts string
+ .pushsection .rodata.str1.1, "aMS", @progbits, 1
+.L__puts_literal\@: .string "\string"
+ .popsection
+
+ ldr x0, =.L__puts_literal\@
+ bl puts
+.endm
+
+// Print an unsigned decimal number x0 to stdout
+// Clobbers x0-x4,x8
+function putdec
+ mov x1, sp
+ str x30, [sp, #-32]! // Result can't be > 20 digits
+
+ mov x2, #0
+ strb w2, [x1, #-1]! // Write the NUL terminator
+
+ mov x2, #10
+0: udiv x3, x0, x2 // div-mod loop to generate the digits
+ msub x0, x3, x2, x0
+ add w0, w0, #'0'
+ strb w0, [x1, #-1]!
+ mov x0, x3
+ cbnz x3, 0b
+
+ ldrb w0, [x1]
+ cbnz w0, 1f
+ mov w0, #'0' // Print "0" for 0, not ""
+ strb w0, [x1, #-1]!
+
+1: mov x0, x1
+ bl puts
+
+ ldr x30, [sp], #32
+ ret
+endfunction
+.globl putdec
+
+// Print an unsigned decimal number x0 to stdout, followed by a newline
+// Clobbers x0-x5,x8
+function putdecn
+ mov x5, x30
+
+ bl putdec
+ mov x0, #'\n'
+ bl putc
+
+ ret x5
+endfunction
+.globl putdecn
+
+// Fill x1 bytes starting at x0 with 0.
+// Clobbers x1, x2.
+function memclr
+ mov w2, #0
+endfunction
+.globl memclr
+ // fall through to memfill
+
+// Trivial memory fill: fill x1 bytes starting at address x0 with byte w2
+// Clobbers x1
+function memfill
+ cmp x1, #0
+ b.eq 1f
+
+0: strb w2, [x0], #1
+ subs x1, x1, #1
+ b.ne 0b
+
+1: ret
+endfunction
+.globl memfill
+
+// w0: signal number
+// x1: sa_action
+// w2: sa_flags
+// Clobbers x0-x6,x8
+function setsignal
+ str x30, [sp, #-((sa_sz + 15) / 16 * 16 + 16)]!
+
+ mov w4, w0
+ mov x5, x1
+ mov w6, w2
+
+ add x0, sp, #16
+ mov x1, #sa_sz
+ bl memclr
+
+ mov w0, w4
+ add x1, sp, #16
+ str w6, [x1, #sa_flags]
+ str x5, [x1, #sa_handler]
+ mov x2, #0
+ mov x3, #sa_mask_sz
+ mov x8, #__NR_rt_sigaction
+ svc #0
+
+ cbz w0, 1f
+
+ puts "sigaction failure\n"
+ b abort
+
+1: ldr x30, [sp], #((sa_sz + 15) / 16 * 16 + 16)
+ ret
+endfunction
+
+
+function tickle_handler
+ // Perhaps collect GCSPR_EL0 here in future?
+ ret
+endfunction
+
+function terminate_handler
+ mov w21, w0
+ mov x20, x2
+
+ puts "Terminated by signal "
+ mov w0, w21
+ bl putdec
+ puts ", no error\n"
+
+ mov x0, #0
+ mov x8, #__NR_exit
+ svc #0
+endfunction
+
+function segv_handler
+ // stash the siginfo_t *
+ mov x20, x1
+
+ // Disable GCS, we don't want additional faults logging things
+ mov x0, PR_SET_SHADOW_STACK_STATUS
+ mov x1, xzr
+ mov x2, xzr
+ mov x3, xzr
+ mov x4, xzr
+ mov x5, xzr
+ mov x8, #__NR_prctl
+ svc #0
+
+ puts "Got SIGSEGV code "
+
+ ldr x21, [x20, #si_code]
+ mov x0, x21
+ bl putdec
+
+ // GCS faults should have si_code SEGV_CPERR
+ cmp x21, #SEGV_CPERR
+ bne 1f
+
+ puts " (GCS violation)"
+1:
+ mov x0, '\n'
+ bl putc
+ b abort
+endfunction
+
+// Recurse x20 times
+.macro recurse id
+function recurse\id
+ stp x29, x30, [sp, #-16]!
+ mov x29, sp
+
+ cmp x20, 0
+ beq 1f
+ sub x20, x20, 1
+ bl recurse\id
+
+1:
+ ldp x29, x30, [sp], #16
+
+ // Do a syscall immediately prior to returning to try to provoke
+ // scheduling and migration at a point where coherency issues
+ // might trigger.
+ mov x8, #__NR_getpid
+ svc #0
+
+ ret
+endfunction
+.endmacro
+
+// Generate and use two copies so we're changing the GCS contents
+recurse 1
+recurse 2
+
+.globl _start
+function _start
+ // Run with GCS
+ mov x0, PR_SET_SHADOW_STACK_STATUS
+ mov x1, PR_SHADOW_STACK_ENABLE
+ mov x2, xzr
+ mov x3, xzr
+ mov x4, xzr
+ mov x5, xzr
+ mov x8, #__NR_prctl
+ svc #0
+ cbz x0, 1f
+ puts "Failed to enable GCS\n"
+ b abort
+1:
+
+ mov w0, #SIGTERM
+ adr x1, terminate_handler
+ mov w2, #SA_SIGINFO
+ bl setsignal
+
+ mov w0, #SIGUSR1
+ adr x1, tickle_handler
+ mov w2, #SA_SIGINFO
+ orr w2, w2, #SA_NODEFER
+ bl setsignal
+
+ mov w0, #SIGSEGV
+ adr x1, segv_handler
+ mov w2, #SA_SIGINFO
+ orr w2, w2, #SA_NODEFER
+ bl setsignal
+
+ puts "Running\n"
+
+loop:
+ // Small recursion depth so we're frequently flipping between
+ // the two recursors and changing what's on the stack
+ mov x20, #5
+ bl recurse1
+ mov x20, #5
+ bl recurse2
+ b loop
+endfunction
+
+abort:
+ mov x0, #255
+ mov x8, #__NR_exit
+ svc #0
diff --git a/tools/testing/selftests/arm64/gcs/gcs-stress.c b/tools/testing/selftests/arm64/gcs/gcs-stress.c
new file mode 100644
index 000000000000..23fd8ec37bdc
--- /dev/null
+++ b/tools/testing/selftests/arm64/gcs/gcs-stress.c
@@ -0,0 +1,532 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2022-3 ARM Limited.
+ */
+
+#define _GNU_SOURCE
+#define _POSIX_C_SOURCE 199309L
+
+#include <errno.h>
+#include <getopt.h>
+#include <poll.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/auxv.h>
+#include <sys/epoll.h>
+#include <sys/prctl.h>
+#include <sys/types.h>
+#include <sys/uio.h>
+#include <sys/wait.h>
+#include <asm/hwcap.h>
+
+#include "../../kselftest.h"
+
+struct child_data {
+ char *name, *output;
+ pid_t pid;
+ int stdout;
+ bool output_seen;
+ bool exited;
+ int exit_status;
+ int exit_signal;
+};
+
+static int epoll_fd;
+static struct child_data *children;
+static struct epoll_event *evs;
+static int tests;
+static int num_children;
+static bool terminate;
+
+static int startup_pipe[2];
+
+static int num_processors(void)
+{
+ long nproc = sysconf(_SC_NPROCESSORS_CONF);
+ if (nproc < 0) {
+ perror("Unable to read number of processors\n");
+ exit(EXIT_FAILURE);
+ }
+
+ return nproc;
+}
+
+static void start_thread(struct child_data *child)
+{
+ int ret, pipefd[2], i;
+ struct epoll_event ev;
+
+ ret = pipe(pipefd);
+ if (ret != 0)
+ ksft_exit_fail_msg("Failed to create stdout pipe: %s (%d)\n",
+ strerror(errno), errno);
+
+ child->pid = fork();
+ if (child->pid == -1)
+ ksft_exit_fail_msg("fork() failed: %s (%d)\n",
+ strerror(errno), errno);
+
+ if (!child->pid) {
+ /*
+ * In child, replace stdout with the pipe, errors to
+ * stderr from here as kselftest prints to stdout.
+ */
+ ret = dup2(pipefd[1], 1);
+ if (ret == -1) {
+ fprintf(stderr, "dup2() %d\n", errno);
+ exit(EXIT_FAILURE);
+ }
+
+ /*
+ * Duplicate the read side of the startup pipe to
+ * FD 3 so we can close everything else.
+ */
+ ret = dup2(startup_pipe[0], 3);
+ if (ret == -1) {
+ fprintf(stderr, "dup2() %d\n", errno);
+ exit(EXIT_FAILURE);
+ }
+
+ /*
+ * Very dumb mechanism to clean open FDs other than
+ * stdio. We don't want O_CLOEXEC for the pipes...
+ */
+ for (i = 4; i < 8192; i++)
+ close(i);
+
+ /*
+ * Read from the startup pipe, there should be no data
+ * and we should block until it is closed. We just
+ * carry on on error since this isn't super critical.
+ */
+ ret = read(3, &i, sizeof(i));
+ if (ret < 0)
+ fprintf(stderr, "read(startp pipe) failed: %s (%d)\n",
+ strerror(errno), errno);
+ if (ret > 0)
+ fprintf(stderr, "%d bytes of data on startup pipe\n",
+ ret);
+ close(3);
+
+ ret = execl("gcs-stress-thread", "gcs-stress-thread", NULL);
+ fprintf(stderr, "execl(gcs-stress-thread) failed: %d (%s)\n",
+ errno, strerror(errno));
+
+ exit(EXIT_FAILURE);
+ } else {
+ /*
+ * In parent, remember the child and close our copy of the
+ * write side of stdout.
+ */
+ close(pipefd[1]);
+ child->stdout = pipefd[0];
+ child->output = NULL;
+ child->exited = false;
+ child->output_seen = false;
+
+ ev.events = EPOLLIN | EPOLLHUP;
+ ev.data.ptr = child;
+
+ ret = asprintf(&child->name, "Thread-%d", child->pid);
+ if (ret == -1)
+ ksft_exit_fail_msg("asprintf() failed\n");
+
+ ret = epoll_ctl(epoll_fd, EPOLL_CTL_ADD, child->stdout, &ev);
+ if (ret < 0) {
+ ksft_exit_fail_msg("%s EPOLL_CTL_ADD failed: %s (%d)\n",
+ child->name, strerror(errno), errno);
+ }
+ }
+
+ ksft_print_msg("Started %s\n", child->name);
+ num_children++;
+}
+
+static bool child_output_read(struct child_data *child)
+{
+ char read_data[1024];
+ char work[1024];
+ int ret, len, cur_work, cur_read;
+
+ ret = read(child->stdout, read_data, sizeof(read_data));
+ if (ret < 0) {
+ if (errno == EINTR)
+ return true;
+
+ ksft_print_msg("%s: read() failed: %s (%d)\n",
+ child->name, strerror(errno),
+ errno);
+ return false;
+ }
+ len = ret;
+
+ child->output_seen = true;
+
+ /* Pick up any partial read */
+ if (child->output) {
+ strncpy(work, child->output, sizeof(work) - 1);
+ cur_work = strnlen(work, sizeof(work));
+ free(child->output);
+ child->output = NULL;
+ } else {
+ cur_work = 0;
+ }
+
+ cur_read = 0;
+ while (cur_read < len) {
+ work[cur_work] = read_data[cur_read++];
+
+ /* Flush on newline, or when work fills, to avoid overflow */
+ if (work[cur_work] == '\n' || cur_work == sizeof(work) - 2) {
+ if (work[cur_work] != '\n')
+ cur_work++;
+ work[cur_work] = '\0';
+ ksft_print_msg("%s: %s\n", child->name, work);
+ cur_work = 0;
+ } else {
+ cur_work++;
+ }
+ }
+
+ if (cur_work) {
+ work[cur_work] = '\0';
+ ret = asprintf(&child->output, "%s", work);
+ if (ret == -1)
+ ksft_exit_fail_msg("Out of memory\n");
+ }
+
+ return false;
+}
+
+static void child_output(struct child_data *child, uint32_t events,
+ bool flush)
+{
+ bool read_more;
+
+ if (events & EPOLLIN) {
+ do {
+ read_more = child_output_read(child);
+ } while (read_more);
+ }
+
+ if (events & EPOLLHUP) {
+ close(child->stdout);
+ child->stdout = -1;
+ flush = true;
+ }
+
+ if (flush && child->output) {
+ ksft_print_msg("%s: %s<EOF>\n", child->name, child->output);
+ free(child->output);
+ child->output = NULL;
+ }
+}
+
+static void child_tickle(struct child_data *child)
+{
+ if (child->output_seen && !child->exited)
+ kill(child->pid, SIGUSR1);
+}
+
+static void child_stop(struct child_data *child)
+{
+ if (!child->exited)
+ kill(child->pid, SIGTERM);
+}
+
+static void child_cleanup(struct child_data *child)
+{
+ pid_t ret;
+ int status;
+ bool fail = false;
+
+ if (!child->exited) {
+ do {
+ ret = waitpid(child->pid, &status, 0);
+ if (ret == -1 && errno == EINTR)
+ continue;
+
+ if (ret == -1) {
+ ksft_print_msg("waitpid(%d) failed: %s (%d)\n",
+ child->pid, strerror(errno),
+ errno);
+ fail = true;
+ break;
+ }
+
+ if (WIFEXITED(status)) {
+ child->exit_status = WEXITSTATUS(status);
+ child->exited = true;
+ }
+
+ if (WIFSIGNALED(status)) {
+ child->exit_signal = WTERMSIG(status);
+ ksft_print_msg("%s: Exited due to signal %d\n",
+ child->name);
+ fail = true;
+ child->exited = true;
+ }
+ } while (!child->exited);
+ }
+
+ if (!child->output_seen) {
+ ksft_print_msg("%s no output seen\n", child->name);
+ fail = true;
+ }
+
+ if (child->exit_status != 0) {
+ ksft_print_msg("%s exited with error code %d\n",
+ child->name, child->exit_status);
+ fail = true;
+ }
+
+ ksft_test_result(!fail, "%s\n", child->name);
+}
+
+static void handle_child_signal(int sig, siginfo_t *info, void *context)
+{
+ int i;
+ bool found = false;
+
+ for (i = 0; i < num_children; i++) {
+ if (children[i].pid == info->si_pid) {
+ children[i].exited = true;
+ children[i].exit_status = info->si_status;
+ found = true;
+ break;
+ }
+ }
+
+ if (!found)
+ ksft_print_msg("SIGCHLD for unknown PID %d with status %d\n",
+ info->si_pid, info->si_status);
+}
+
+static void handle_exit_signal(int sig, siginfo_t *info, void *context)
+{
+ int i;
+
+ /* If we're already exiting then don't signal again */
+ if (terminate)
+ return;
+
+ ksft_print_msg("Got signal, exiting...\n");
+
+ terminate = true;
+
+ /*
+ * This should be redundant, the main loop should clean up
+ * after us, but for safety stop everything we can here.
+ */
+ for (i = 0; i < num_children; i++)
+ child_stop(&children[i]);
+}
+
+/* Handle any pending output without blocking */
+static void drain_output(bool flush)
+{
+ int ret = 1;
+ int i;
+
+ while (ret > 0) {
+ ret = epoll_wait(epoll_fd, evs, tests, 0);
+ if (ret < 0) {
+ if (errno == EINTR)
+ continue;
+ ksft_print_msg("epoll_wait() failed: %s (%d)\n",
+ strerror(errno), errno);
+ }
+
+ for (i = 0; i < ret; i++)
+ child_output(evs[i].data.ptr, evs[i].events, flush);
+ }
+}
+
+static const struct option options[] = {
+ { "timeout", required_argument, NULL, 't' },
+ { }
+};
+
+int main(int argc, char **argv)
+{
+ int seen_children;
+ bool all_children_started = false;
+ int gcs_threads;
+ int timeout = 10;
+ int ret, cpus, i, c;
+ struct sigaction sa;
+
+ while ((c = getopt_long(argc, argv, "t:", options, NULL)) != -1) {
+ switch (c) {
+ case 't':
+ ret = sscanf(optarg, "%d", &timeout);
+ if (ret != 1)
+ ksft_exit_fail_msg("Failed to parse timeout %s\n",
+ optarg);
+ break;
+ default:
+ ksft_exit_fail_msg("Unknown argument\n");
+ }
+ }
+
+ cpus = num_processors();
+ tests = 0;
+
+ if (getauxval(AT_HWCAP2) & HWCAP2_GCS) {
+ /* One extra thread, trying to trigger migrations */
+ gcs_threads = cpus + 1;
+ tests += gcs_threads;
+ } else {
+ gcs_threads = 0;
+ }
+
+ ksft_print_header();
+ ksft_set_plan(tests);
+
+ ksft_print_msg("%d CPUs, %d GCS threads\n",
+ cpus, gcs_threads);
+
+ if (!tests)
+ ksft_exit_skip("No tests scheduled\n");
+
+ if (timeout > 0)
+ ksft_print_msg("Will run for %ds\n", timeout);
+ else
+ ksft_print_msg("Will run until terminated\n");
+
+ children = calloc(tests, sizeof(*children));
+ if (!children)
+ ksft_exit_fail_msg("Unable to allocate child data\n");
+
+ ret = epoll_create1(EPOLL_CLOEXEC);
+ if (ret < 0)
+ ksft_exit_fail_msg("epoll_create1() failed: %s (%d)\n",
+ strerror(errno), errno);
+ epoll_fd = ret;
+
+ /* Create a pipe which children will block on before execing */
+ ret = pipe(startup_pipe);
+ if (ret != 0)
+ ksft_exit_fail_msg("Failed to create startup pipe: %s (%d)\n",
+ strerror(errno), errno);
+
+ /* Get signal handlers ready before we start any children */
+ memset(&sa, 0, sizeof(sa));
+ sa.sa_sigaction = handle_exit_signal;
+ sa.sa_flags = SA_RESTART | SA_SIGINFO;
+ sigemptyset(&sa.sa_mask);
+ ret = sigaction(SIGINT, &sa, NULL);
+ if (ret < 0)
+ ksft_print_msg("Failed to install SIGINT handler: %s (%d)\n",
+ strerror(errno), errno);
+ ret = sigaction(SIGTERM, &sa, NULL);
+ if (ret < 0)
+ ksft_print_msg("Failed to install SIGTERM handler: %s (%d)\n",
+ strerror(errno), errno);
+ sa.sa_sigaction = handle_child_signal;
+ ret = sigaction(SIGCHLD, &sa, NULL);
+ if (ret < 0)
+ ksft_print_msg("Failed to install SIGCHLD handler: %s (%d)\n",
+ strerror(errno), errno);
+
+ evs = calloc(tests, sizeof(*evs));
+ if (!evs)
+ ksft_exit_fail_msg("Failed to allocated %d epoll events\n",
+ tests);
+
+ for (i = 0; i < gcs_threads; i++)
+ start_thread(&children[i]);
+
+ /*
+ * All children started, close the startup pipe and let them
+ * run.
+ */
+ close(startup_pipe[0]);
+ close(startup_pipe[1]);
+
+ /* Track the timeout in units of the 100ms epoll interval */
+ timeout *= 10;
+ for (;;) {
+ /* Did we get a signal asking us to exit? */
+ if (terminate)
+ break;
+
+ /*
+ * Timeout is counted in 100ms with no output, the
+ * tests print during startup then are silent when
+ * running so this should ensure they all ran enough
+ * to install the signal handler, this is especially
+ * useful in emulation where we will both be slow and
+ * likely to have a large set of VLs.
+ */
+ ret = epoll_wait(epoll_fd, evs, tests, 100);
+ if (ret < 0) {
+ if (errno == EINTR)
+ continue;
+ ksft_exit_fail_msg("epoll_wait() failed: %s (%d)\n",
+ strerror(errno), errno);
+ }
+
+ /* Output? */
+ if (ret > 0) {
+ for (i = 0; i < ret; i++) {
+ child_output(evs[i].data.ptr, evs[i].events,
+ false);
+ }
+ continue;
+ }
+
+ /* Otherwise epoll_wait() timed out */
+
+ /*
+ * If the child processes have not produced output they
+ * aren't actually running the tests yet.
+ */
+ if (!all_children_started) {
+ seen_children = 0;
+
+ for (i = 0; i < num_children; i++)
+ if (children[i].output_seen ||
+ children[i].exited)
+ seen_children++;
+
+ if (seen_children != num_children) {
+ ksft_print_msg("Waiting for %d children\n",
+ num_children - seen_children);
+ continue;
+ }
+
+ all_children_started = true;
+ }
+
+ ksft_print_msg("Sending signals, timeout remaining: %d00ms\n",
+ timeout);
+
+ for (i = 0; i < num_children; i++)
+ child_tickle(&children[i]);
+
+ /* Negative timeout means run indefinitely */
+ if (timeout < 0)
+ continue;
+ if (--timeout == 0)
+ break;
+ }
+
+ ksft_print_msg("Finishing up...\n");
+ terminate = true;
+
+ for (i = 0; i < tests; i++)
+ child_stop(&children[i]);
+
+ drain_output(false);
+
+ for (i = 0; i < tests; i++)
+ child_cleanup(&children[i]);
+
+ drain_output(true);
+
+ ksft_print_cnts();
+
+ return 0;
+}

--
2.30.2


2023-08-09 15:33:41

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v4 03/36] arm64/gcs: Document the ABI for Guarded Control Stacks

On Mon, Aug 07, 2023 at 11:00:08PM +0100, Mark Brown wrote:
> +2. Enabling and disabling Guarded Control Stacks
> +-------------------------------------------------
> +
> +* GCS is enabled and disabled for a thread via the PR_SET_SHADOW_STACK_STATUS
> + prctl(), this takes a single flags argument specifying which GCS features
> + should be used.
> +
> +* When set PR_SHADOW_STACK_ENABLE flag allocates a Guarded Control Stack for

The 'for' at the end of the line above is not needed.

> + and enables GCS for the thread, enabling the functionality controlled by
> + GCSPRE0_EL1.{nTR, RVCHKEN, PCRSEL}.

This should be GCSCRE0_EL1.

> +* When set the PR_SHADOW_STACK_PUSH flag enables the functionality controlled
> + by GCSCRE0_EL1.PUSHMEn, allowing explicit GCS pushes.
> +
> +* When set the PR_SHADOW_STACK_WRITE flag enables the functionality controlled
> + by GCSCRE0_EL1.STREn, allowing explicit stores to the Guarded Control Stack.
> +
> +* Any unknown flags will cause PR_SET_SHADOW_STACK_STATUS to return -EINVAL.
> +
> +* PR_LOCK_SHADOW_STACK_STATUS is passed a bitmask of features with the same
> + values as used for PR_SET_SHADOW_STACK_STATUS. Any future changes to the
> + status of the specified GCS mode bits will be rejected.
> +
> +* PR_LOCK_SHADOW_STACK_STATUS allows any bit to be locked, this allows
> + userspace to prevent changes to any future features.

I presume a new lock prctl() won't allow unlocking but can only extend
the lock. I haven't looked at the patches yet but it may be worth
spelling this out.

> +* PR_SET_SHADOW_STACK_STATUS and PR_LOCK_SHADOW_STACK_STATUS affect only the
> + thread the called them, any other running threads will be unaffected.

s/the called/that called/

> +* New threads inherit the GCS configuration of the thread that created them.
> +
> +* GCS is disabled on exec().
> +
> +* The current GCS configuration for a thread may be read with the
> + PR_GET_SHADOW_STACK_STATUS prctl(), this returns the same flags that
> + are passed to PR_SET_SHADOW_STACK_STATUS.
> +
> +* If GCS is disabled for a thread after having previously been enabled then
> + the stack will remain allocated for the lifetime of the thread.

Sorry if this has been discussed in other threads. What is the issue
with unmapping/freeing of the shadow stack?

> At present
> + any attempt to reenable GCS for the thread will be rejected, this may be
> + revisited in future.

What's the rationale here? Is it that function returns won't work?

> +3. Allocation of Guarded Control Stacks
> +----------------------------------------
> +
> +* When GCS is enabled for a thread a new Guarded Control Stack will be
> + allocated for it of size RLIMIT_STACK / 2 or 2 gigabytes, whichever is
> + smaller.

Is this number based on the fact that a function call would only push
the LR to GCS while standard function prologue pushes at least two
registers?

> +* When GCS is disabled for a thread the Guarded Control Stack initially
> + allocated for that thread will be freed. Note carefully that if the
> + stack has been switched this may not be the stack currently in use by
> + the thread.

Does this not contradict an earlier statement that the GCS is not freed
for a thread when disabled?

> +4. Signal handling
> +--------------------
> +
> +* A new signal frame record gcs_context encodes the current GCS mode and
> + pointer for the interrupted context on signal delivery. This will always
> + be present on systems that support GCS.
> +
> +* The record contains a flag field which reports the current GCS configuration
> + for the interrupted context as PR_GET_SHADOW_STACK_STATUS would.
> +
> +* The signal handler is run with the same GCS configuration as the interrupted
> + context.
> +
> +* When GCS is enabled for the interrupted thread a signal handling specific
> + GCS cap token will be written to the GCS, this is an architectural GCS cap
> + token with bit 63 set. The GCSPR_EL0 reported in the signal frame will
> + point to this cap token.

I lost track of the GCS spec versions. Has the valid cap token format
been updated? What I have in my spec (though most likely old) is:

An entry in the Guarded control stack is defined as a Valid cap entry,
if bits [63:12] of the value are same as bits [63:12] of the address
where the entry is stored and bits [11:0] contain a Valid cap token.


The other bits in the code look fine to me so far but I haven't looked
at the code yet.

--
Catalin

2023-08-09 15:48:55

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v4 03/36] arm64/gcs: Document the ABI for Guarded Control Stacks

On Wed, Aug 09, 2023 at 03:24:14PM +0100, Catalin Marinas wrote:
> On Mon, Aug 07, 2023 at 11:00:08PM +0100, Mark Brown wrote:

> > +* When set PR_SHADOW_STACK_ENABLE flag allocates a Guarded Control Stack for
>
> The 'for' at the end of the line above is not needed.
>
> > + and enables GCS for the thread, enabling the functionality controlled by

I find it a little clearer that it's a per thread stack here but sure.

> > +* PR_LOCK_SHADOW_STACK_STATUS allows any bit to be locked, this allows
> > + userspace to prevent changes to any future features.

> I presume a new lock prctl() won't allow unlocking but can only extend
> the lock. I haven't looked at the patches yet but it may be worth
> spelling this out.

Yes, there is no unlock prctl() - the lock prctl() just lets you set
locks.
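
For reference, from userspace the whole flow is expected to look
roughly like this (untested sketch, using the constant names this
series adds to prctl.h):

	#include <sys/prctl.h>

	/* Enable GCS for this thread then lock the enable bit so
	 * that it can't be cleared again, error handling omitted. */
	prctl(PR_SET_SHADOW_STACK_STATUS, PR_SHADOW_STACK_ENABLE, 0, 0, 0);
	prctl(PR_LOCK_SHADOW_STACK_STATUS, PR_SHADOW_STACK_ENABLE, 0, 0, 0);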

> > +* If GCS is disabled for a thread after having previously been enabled then
> > + the stack will remain allocated for the lifetime of the thread.

> Sorry if this has been discussed in other threads. What is the issue
> with unmapping/freeing of the shadow stack?

If something was in the middle of looking at the GCS (eg, you're trying to
disable GCS during signal handling that preempted something that is
logging the call stack) and you pull the GCS storage from underneath it
then things will go badly.

> > At present
> > + any attempt to reenable GCS for the thread will be rejected, this may be
> > + revisited in future.

> What's the rationale here? Is it that function returns won't work?

We have to work out where to point GCSPR_EL0 when reenabling - at the
top of the old stack, where it was before, on a new stack? It was
easier to just not support reenabling than to work out what the sensible
answer is when it's not clear that any real use case exists to inform
what makes sense. We can add support for that later (you can probe by
starting a thread, disabling and trying to reenable) if someone does
come up with a use case.

The use case for disabling is a non-enforcing mode which logs GCS faults
and then disables GCS.

> > +3. Allocation of Guarded Control Stacks
> > +----------------------------------------

> > +* When GCS is enabled for a thread a new Guarded Control Stack will be
> > + allocated for it of size RLIMIT_STACK / 2 or 2 gigabytes, whichever is
> > + smaller.

> Is this number based on the fact that a function call would only push
> the LR to GCS while standard function prologue pushes at least two
> registers?

It's actually based on bitrot that I'd initially chosen a smaller value
since it's likely that functions will push at least something as you
suggest, the patches now just use RLIMIT_STACK. I'll fix.

> > +* When GCS is disabled for a thread the Guarded Control Stack initially
> > + allocated for that thread will be freed. Note carefully that if the
> > + stack has been switched this may not be the stack currently in use by
> > + the thread.

> Does this not contradict an earlier statement that the GCS is not freed
> for a thread when disabled?

Yes, it was meant to say when the thread is freed rather than when GCS
is disabled.

> > +* When GCS is enabled for the interrupted thread a signal handling specific
> > + GCS cap token will be written to the GCS, this is an architectural GCS cap
> > + token with bit 63 set. The GCSPR_EL0 reported in the signal frame will
> > + point to this cap token.

> I lost track of the GCS spec versions. Has the valid cap token format
> been updated? What I have in my spec (though most likely old) is:

> An entry in the Guarded control stack is defined as a Valid cap entry,
> if bits [63:12] of the value are same as bits [63:12] of the address
> where the entry is stored and bits [11:0] contain a Valid cap token.

You have a draft version of the spec, this was changed and there is
now a token field reserved in the low 12 bits of the register:

#define GCS_CAP_ADDR_MASK GENMASK(63, 12)
#define GCS_CAP_ADDR_SHIFT 12
#define GCS_CAP_ADDR_WIDTH 52
#define GCS_CAP_ADDR(x) FIELD_GET(GCS_CAP_ADDR_MASK, x)

#define GCS_CAP_TOKEN_MASK GENMASK(11, 0)
#define GCS_CAP_TOKEN_SHIFT 0
#define GCS_CAP_TOKEN_WIDTH 12
#define GCS_CAP_TOKEN(x) FIELD_GET(GCS_CAP_TOKEN_MASK, x)

#define GCS_CAP_VALID_TOKEN 0x1
#define GCS_CAP_IN_PROGRESS_TOKEN 0x5

#define GCS_CAP(x) ((((unsigned long)x) & GCS_CAP_ADDR_MASK) | \
GCS_CAP_VALID_TOKEN)
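
Purely as an illustration of the layout, checking a value loaded
from the GCS against the address it was loaded from is then just:

	/* val is a valid cap for addr if the address bits match and
	 * the token field contains the valid cap token. */
	static bool gcs_entry_is_valid_cap(unsigned long addr, unsigned long val)
	{
		return val == GCS_CAP(addr);
	}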



2023-08-10 09:30:46

by Szabolcs Nagy

[permalink] [raw]
Subject: Re: [PATCH v4 03/36] arm64/gcs: Document the ABI for Guarded Control Stacks

The 08/09/2023 16:34, Mark Brown wrote:
> On Wed, Aug 09, 2023 at 03:24:14PM +0100, Catalin Marinas wrote:
> > On Mon, Aug 07, 2023 at 11:00:08PM +0100, Mark Brown wrote:
> > > +* When GCS is enabled for a thread a new Guarded Control Stack will be
> > > + allocated for it of size RLIMIT_STACK / 2 or 2 gigabytes, whichever is
> > > + smaller.
>
> > Is this number based on the fact that a function call would only push
> > the LR to GCS while standard function prologue pushes at least two
> > registers?
>
> It's actually based on bitrot that I'd initially chosen a smaller value
> since it's likely that functions will push at least something as you
> suggest, the patches now just use RLIMIT_STACK. I'll fix.

the pcs requires 16byte aligned stack frames, with 8byte per gcs entry
there is no need for same gcs size as stack size in userspace.

you can argue about a fixed size small increment (stacksize/2 + inc)
for signal handling on alt stack and special tokens, but stack size is
overkill i think.

fwiw my current makecontext patch uses roundup(stacksize/2+160).
(threads guaranteed to have about 300bytes of data on the stack in glibc
so if gcs is stacksize/2, that accounts for the increment. this is for
the theoretical case when an empty thread just tries to overflow the
stack and then handle the fault on sigaltstack.)

2023-08-10 12:03:25

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v4 03/36] arm64/gcs: Document the ABI for Guarded Control Stacks

On Thu, Aug 10, 2023 at 09:55:50AM +0100, Szabolcs Nagy wrote:
> The 08/09/2023 16:34, Mark Brown wrote:

> > It's actually based on bitrot that I'd initially chosen a smaller value
> > since it's likely that functions will push at least something as you
> > suggest, the patches now just use RLIMIT_STACK. I'll fix.

> the pcs requires 16byte aligned stack frames, with 8byte per gcs entry
> there is no need for same gcs size as stack size in userspace.

I agree that it's going to be excessive for pretty much all
applications, I adjusted it to match x86 as part of the general effort
to avoid divergence and because I was a bit concerned about non-PCS
cases (eg, JITed code) potentially running into trouble, especially with
smaller stack limits. It's not an issue I have super strong opinions on
though, as you can see I had implemented it both ways at various times.


Attachments:
(No filename) (891.00 B)
signature.asc (499.00 B)
Download all attachments

2023-08-10 15:04:17

by Szabolcs Nagy

[permalink] [raw]
Subject: Re: [PATCH v4 03/36] arm64/gcs: Document the ABI for Guarded Control Stacks

The 08/10/2023 12:41, Mark Brown wrote:
> On Thu, Aug 10, 2023 at 09:55:50AM +0100, Szabolcs Nagy wrote:
> > The 08/09/2023 16:34, Mark Brown wrote:
>
> > > It's actually based on bitrot that I'd initially chosen a smaller value
> > > since it's likely that functions will push at least something as you
> > > suggest, the patches now just use RLIMIT_STACK. I'll fix.
>
> > the pcs requires 16byte aligned stack frames, with 8byte per gcs entry
> > there is no need for same gcs size as stack size in userspace.
>
> I agree that it's going to be excessive for pretty much all
> applications, I adjusted it to match x86 as part of the general effort
> to avoid divergence and because I was a bit concerned about non-PCS
> cases (eg, JITed code) potentially running into trouble, especially with

is that even possible?

16byte alignment is not a convention but architectural:
access via unaligned sp traps (at least in userspace).

it is possible to use bl such that the stack is not involved
e.g. if there is no bl/ret pairing, but if we base the gcs
size on the stack size then i'd expect one stack frame per
bl/ret pair with 16byte alignment, or is there a programming
model possible that uses 8byte stack per bl?

> smaller stack limits. It's not an issue I have super strong opinions on
> though, as you can see I had implemented it both ways at various times.



2023-08-10 17:22:07

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v4 03/36] arm64/gcs: Document the ABI for Guarded Control Stacks

On Thu, Aug 10, 2023 at 02:34:04PM +0100, Szabolcs Nagy wrote:
> The 08/10/2023 12:41, Mark Brown wrote:

> > I agree that it's going to be excessive for pretty much all
> > applications, I adjusted it to match x86 as part of the general effort
> > to avoid divergence and because I was a bit concerned about non-PCS
> > cases (eg, JITed code) potentially running into trouble, especially with

> is that even possible?

> 16byte alignment is not a convention but architectural:
> access via unaligned sp traps (at least in userspace).

> it is possible to use bl such that the stack is not involved
> e.g. if there is no bl/ret pairing, but if we base the gcs
> size on the stack size then i'd expect one stack frame per
> bl/ret pair with 16byte alignment, or is there a programming
> model possible that uses 8byte stack per bl?

That's definitely what I'd expect most of the time. You'd need to be
tracking what needs pushing in some other register but it's possible.
Quite why you'd do this is a separate question, I think I'm being overly
cautious worrying about anyone actually having done it but it wouldn't
be the first time I was surprised by someone doing something unexpected.
Like I say I think it's excessive and was erring on the side of being
conservative.


Attachments:
(No filename) (1.27 kB)
signature.asc (499.00 B)
Download all attachments

2023-08-10 17:47:34

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v4 11/36] arm64/mm: Map pages for guarded control stack

On Mon, Aug 07, 2023 at 11:00:16PM +0100, Mark Brown wrote:
> diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
> index 8f5b7ce857ed..8f40198cd44e 100644
> --- a/arch/arm64/mm/mmap.c
> +++ b/arch/arm64/mm/mmap.c
> @@ -79,8 +79,18 @@ arch_initcall(adjust_protection_map);
>
> pgprot_t vm_get_page_prot(unsigned long vm_flags)
> {
> - pteval_t prot = pgprot_val(protection_map[vm_flags &
> + pteval_t prot;
> +
> + /* If this is a GCS then only interpret VM_WRITE. */
> + if (system_supports_gcs() && (vm_flags & VM_SHADOW_STACK)) {
> + if (vm_flags & VM_WRITE)
> + prot = _PAGE_GCS;
> + else
> + prot = _PAGE_GCS_RO;
> + } else {
> + prot = pgprot_val(protection_map[vm_flags &
> (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]);
> + }
>
> if (vm_flags & VM_ARM64_BTI)
> prot |= PTE_GP;

Some combinations here don't make sense like GCS + exec or BTI. I think
the code above (correctly) ignores exec but it still sets PTE_GP if BTI
(the architecture may allow this but you can't execute from the GCS page
anyway).

I haven't checked the x86 patches to see when VM_SHADOW_STACK is set but
if there's no additional check at a higher level, we should add
something to arch_validate_flags(), assuming it's called on those paths.

--
Catalin

2023-08-11 14:47:29

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v4 09/36] arm64/mm: Allocate PIE slots for EL0 guarded control stack

On Mon, Aug 07, 2023 at 11:00:14PM +0100, Mark Brown wrote:
> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
> index eed814b00a38..b157ae0420ed 100644
> --- a/arch/arm64/include/asm/pgtable-prot.h
> +++ b/arch/arm64/include/asm/pgtable-prot.h
> @@ -131,15 +131,23 @@ extern bool arm64_use_ng_mappings;
> /* 6: PTE_PXN | PTE_WRITE */
> /* 7: PAGE_SHARED_EXEC PTE_PXN | PTE_WRITE | PTE_USER */
> /* 8: PAGE_KERNEL_ROX PTE_UXN */
> -/* 9: PTE_UXN | PTE_USER */
> +/* 9: PAGE_GCS_RO PTE_UXN | PTE_USER */
> /* a: PAGE_KERNEL_EXEC PTE_UXN | PTE_WRITE */
> -/* b: PTE_UXN | PTE_WRITE | PTE_USER */
> +/* b: PAGE_GCS PTE_UXN | PTE_WRITE | PTE_USER */
> /* c: PAGE_KERNEL_RO PTE_UXN | PTE_PXN */
> /* d: PAGE_READONLY PTE_UXN | PTE_PXN | PTE_USER */
> /* e: PAGE_KERNEL PTE_UXN | PTE_PXN | PTE_WRITE */
> /* f: PAGE_SHARED PTE_UXN | PTE_PXN | PTE_WRITE | PTE_USER */
>
> +#define _PAGE_GCS (_PAGE_DEFAULT | PTE_UXN | PTE_WRITE | PTE_USER)
> +#define _PAGE_GCS_RO (_PAGE_DEFAULT | PTE_UXN | PTE_USER)
> +
> +#define PAGE_GCS __pgprot(_PAGE_GCS)
> +#define PAGE_GCS_RO __pgprot(_PAGE_GCS_RO)
> +
> #define PIE_E0 ( \
> + PIRx_ELx_PERM(pte_pi_index(_PAGE_GCS), PIE_GCS) | \
> + PIRx_ELx_PERM(pte_pi_index(_PAGE_GCS_RO), PIE_R) | \
> PIRx_ELx_PERM(pte_pi_index(_PAGE_EXECONLY), PIE_X_O) | \
> PIRx_ELx_PERM(pte_pi_index(_PAGE_READONLY_EXEC), PIE_RX) | \
> PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED_EXEC), PIE_RWX) | \
> @@ -147,6 +155,8 @@ extern bool arm64_use_ng_mappings;
> PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED), PIE_RW))
>
> #define PIE_E1 ( \
> + PIRx_ELx_PERM(pte_pi_index(_PAGE_GCS), PIE_RW) | \
> + PIRx_ELx_PERM(pte_pi_index(_PAGE_GCS_RO), PIE_R) | \

Had some thoughts on this. Why do we need the EL1 GCS attributes to map
to RW? The instructions we'd use to write the shadow stack are the GCS
'T' variants that run as user already.

The only instructions we have in the kernel that would run as EL1 on a
user address are the exclusives (futex code or the old deprecated
emulation but we don't care about them in this context). So I wonder
whether the kernel PIE entry could simply be PIE_NONE_O. Would this be
too restrictive for future uses? Given the coherency between a GCS
access and a standard data access, we may want to restrict it now until
we have a use-case.
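
I.e. the GCS slots in PIE_E1 would become something like (sketch,
only the GCS entries shown):

	#define PIE_E1 ( \
		PIRx_ELx_PERM(pte_pi_index(_PAGE_GCS), PIE_NONE_O) | \
		PIRx_ELx_PERM(pte_pi_index(_PAGE_GCS_RO), PIE_NONE_O) | \
		...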

--
Catalin

2023-08-11 15:18:38

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v4 17/36] arm64/mm: Handle GCS data aborts

On Mon, Aug 07, 2023 at 11:00:22PM +0100, Mark Brown wrote:
> @@ -510,6 +527,26 @@ static vm_fault_t __do_page_fault(struct mm_struct *mm,
> */
> if (!(vma->vm_flags & vm_flags))
> return VM_FAULT_BADACCESS;
> +
> + if (vma->vm_flags & VM_SHADOW_STACK) {
> + /*
> + * Writes to a GCS must either be generated by a GCS
> + * operation or be from EL1.
> + */
> + if (is_write_abort(esr) &&
> + !(is_gcs_fault(esr) || is_el1_data_abort(esr)))
> + return VM_FAULT_BADACCESS;

Related to my PIE permissions comment: when do we have a valid EL1 data
write abort that's not a GCS fault? Does a faulting GCSSTTR set the
ESR_ELx_GCS bit?

> + } else {
> + /*
> + * GCS faults should never happen for pages that are
> + * not part of a GCS and the operation being attempted
> + * can never succeed.
> + */
> + if (is_gcs_fault(esr))
> + return VM_FAULT_BADACCESS;

If one does a GCS push/store to a non-GCS page, do we get a GCS fault or
something else? I couldn't figure out from the engineering spec. If the
hardware doesn't generate such exceptions, we might as well remove this
'else' branch. But maybe it does generate a GCS-specific fault as you
added a similar check in is_invalid_el0_gcs_access().

> @@ -595,6 +644,19 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
> if (!vma)
> goto lock_mmap;
>
> + /*
> + * We get legitimate write faults for GCS pages from GCS
> + * operations and from EL1 writes to EL0 pages but just plain

What are the EL1 writes to the shadow stack? Would it not use
copy_to_user_gcs()?

--
Catalin

2023-08-11 15:54:09

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v4 18/36] arm64/gcs: Context switch GCS state for EL0

On Mon, Aug 07, 2023 at 11:00:23PM +0100, Mark Brown wrote:
> @@ -271,12 +272,31 @@ static void flush_tagged_addr_state(void)
> clear_thread_flag(TIF_TAGGED_ADDR);
> }
>
> +#ifdef CONFIG_ARM64_GCS
> +
> +static void flush_gcs(void)
> +{
> + if (system_supports_gcs()) {

Nitpick: use "if (!system_supports_gcs()) return" when we have more
than a line in the conditional block (slightly more consistent with
other places).

> + gcs_free(current);
> + current->thread.gcs_el0_mode = 0;
> + write_sysreg_s(0, SYS_GCSCRE0_EL1);
> + write_sysreg_s(0, SYS_GCSPR_EL0);
> + }
> +}

Do we need an isb() or is there one on this path? If it's only EL0
making use of this register, we should be fine with the ERET before
returning to user. Not sure whether the kernel uses this, GCSSTTR
doesn't need it.

> +static void gcs_thread_switch(struct task_struct *next)
> +{
> + if (!system_supports_gcs())
> + return;
> +
> + gcs_preserve_current_state();
> +
> + /*
> + * Ensure that GCS changes are observable by/from other PEs in
> + * case of migration.
> + */
> + if (task_gcs_el0_enabled(current) || task_gcs_el0_enabled(next))
> + gcsb_dsync();

What's this barrier for? The spec (at least the version I have) only
talks about accesses, nothing to do with the registers that we context
switch here.

--
Catalin

2023-08-11 16:49:08

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v4 19/36] arm64/gcs: Allocate a new GCS for threads with GCS enabled

On Mon, Aug 07, 2023 at 11:00:24PM +0100, Mark Brown wrote:
> diff --git a/arch/arm64/mm/gcs.c b/arch/arm64/mm/gcs.c
> index b0a67efc522b..1e059c37088d 100644
> --- a/arch/arm64/mm/gcs.c
> +++ b/arch/arm64/mm/gcs.c
> @@ -8,6 +8,62 @@
> #include <asm/cpufeature.h>
> #include <asm/page.h>
>
> +static unsigned long alloc_gcs(unsigned long addr, unsigned long size,
> + unsigned long token_offset, bool set_res_tok)
> +{
> + int flags = MAP_ANONYMOUS | MAP_PRIVATE;
> + struct mm_struct *mm = current->mm;
> + unsigned long mapped_addr, unused;
> +
> + if (addr)
> + flags |= MAP_FIXED_NOREPLACE;
> +
> + mmap_write_lock(mm);
> + mapped_addr = do_mmap(NULL, addr, size, PROT_READ, flags,
> + VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL);

Why not PROT_WRITE as well? I guess I need to check the x86 patches
since the do_mmap() called here has a different prototype than what's in
mainline.

This gets confusing since currently the VM_* flags are derived from the
PROT_* flags passed to mmap(). But you skip the PROT_WRITE in favour of
adding VM_WRITE directly.

I haven't followed the x86 discussion but did we run out of PROT_* bits
for a PROT_SHADOW_STACK?

> + mmap_write_unlock(mm);
> +
> + return mapped_addr;
> +}
> +
> +static unsigned long gcs_size(unsigned long size)
> +{
> + if (size)
> + return PAGE_ALIGN(size);
> +
> + /* Allocate RLIMIT_STACK with limits of PAGE_SIZE..4G */
> + size = PAGE_ALIGN(min_t(unsigned long long,
> + rlimit(RLIMIT_STACK), SZ_4G));
> + return max(PAGE_SIZE, size);
> +}

I saw Szabolcs commenting on the default size as well. Maybe we should
go for RLIMIT_STACK/2 but let's see how the other sub-thread is going.
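
I.e. roughly:

	/* Allocate RLIMIT_STACK / 2 with limits of PAGE_SIZE..2G */
	size = PAGE_ALIGN(min_t(unsigned long long,
				rlimit(RLIMIT_STACK) / 2, SZ_2G));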

> +
> +unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
> + unsigned long clone_flags, size_t size)
> +{
> + unsigned long addr;
> +
> + if (!system_supports_gcs())
> + return 0;
> +
> + if (!task_gcs_el0_enabled(tsk))
> + return 0;
> +
> + if ((clone_flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM)
> + return 0;

Is it safe for CLONE_VFORK not to get a new shadow stack? A syscall for
exec could push something to the stack. I guess the GCS pointer in the
parent stays the same, so it wouldn't matter.

That said, I think this check should be somewhere higher up in the
caller of gcs_alloc_thread_stack(). The copy_thread_gcs() function
already does most of the above checks. Is the GCS allocation called from
elsewhere as well?

--
Catalin

2023-08-11 17:44:27

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v4 07/36] arm64/gcs: Provide copy_to_user_gcs()

On Mon, Aug 07, 2023 at 11:00:12PM +0100, Mark Brown wrote:
> +static inline int copy_to_user_gcs(unsigned long __user *addr,
> + unsigned long *val,
> + int count)
> +{
> + int ret = -EFAULT;
> + int i;
> +
> + if (access_ok((char __user *)addr, count * sizeof(u64))) {
> + uaccess_ttbr0_enable();
> + for (i = 0; i < count; i++) {
> + ret = gcssttr(addr++, *val++);
> + if (ret != 0)
> + break;
> + }
> + uaccess_ttbr0_disable();
> + }
> +
> + return ret;
> +}

I think it makes more sense to have a put_user_gcs() of a single
element. I've only seen it used with 2 elements in the signal code but
we could as well do two put_user_gcs() calls (as we do for other stuff
that we push to the signal frame).

--
Catalin

2023-08-11 17:50:16

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v4 21/36] arm64/mm: Implement map_shadow_stack()

On Mon, Aug 07, 2023 at 11:00:26PM +0100, Mark Brown wrote:
> As discussed extensively in the changelog for the addition of this
> syscall on x86 ("x86/shstk: Introduce map_shadow_stack syscall") the
> existing mmap() and madvise() syscalls do not map entirely well onto the
> security requirements for guarded control stacks since they lead to
> windows where memory is allocated but not yet protected or stacks which
> are not properly and safely initialised. Instead a new syscall
> map_shadow_stack() has been defined which allocates and initialises a
> shadow stack page.

I guess I need to read the x86 discussion after all ;).

Given that we won't have an mmap(PROT_SHADOW_STACK), are we going to
have restrictions on mprotect()? E.g. it would be useful to reject a
PROT_EXEC on the shadow stack.

--
Catalin

2023-08-16 09:39:12

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v4 17/36] arm64/mm: Handle GCS data aborts

On Fri, Aug 11, 2023 at 04:09:33PM +0100, Catalin Marinas wrote:
> On Mon, Aug 07, 2023 at 11:00:22PM +0100, Mark Brown wrote:

> > + if (is_write_abort(esr) &&
> > + !(is_gcs_fault(esr) || is_el1_data_abort(esr)))
> > + return VM_FAULT_BADACCESS;

> Related to my PIE permissions comment: when do we have a valid EL1 data
> write abort that's not a GCS fault? Does a faulting GCSSTTR set the
> ESR_ELx_GCS bit?

Yes, it should do. The GCS instructions have access descriptors created
with CreateAccDescGCS() which results in the access being flagged as a
GCS access.
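
For reference the test boils down to checking that bit on a data
abort, roughly (modulo any rebasing):

	static bool is_gcs_fault(unsigned long esr)
	{
		if (!esr_is_data_abort(esr))
			return false;

		return esr & ESR_ELx_GCS;
	}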

> > + } else {
> > + /*
> > + * GCS faults should never happen for pages that are
> > + * not part of a GCS and the operation being attempted
> > + * can never succeed.
> > + */
> > + if (is_gcs_fault(esr))
> > + return VM_FAULT_BADACCESS;

> If one does a GCS push/store to a non-GCS page, do we get a GCS fault or
> something else? I couldn't figure out from the engineering spec. If the
> hardware doesn't generate such exceptions, we might as well remove this
> 'else' branch. But maybe it does generate a GCS-specific fault as you
> added a similar check in is_invalid_el0_gcs_access().

Yes, see AddGCSRecord() and LoadCheckGCSRecord() - all GCS initiated
accesses need to be AccDescGCS so appropriate permissions enforcement
can happen and that's what causes the fault to be flagged as GCS.

> > @@ -595,6 +644,19 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
> > if (!vma)
> > goto lock_mmap;
> >
> > + /*
> > + * We get legitimate write faults for GCS pages from GCS
> > + * operations and from EL1 writes to EL0 pages but just plain

> What are the EL1 writes to the shadow stack? Would it not use
> copy_to_user_gcs()?

They should, yes - I'll reword the comment.



2023-08-17 00:44:03

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v4 21/36] arm64/mm: Implement map_shadow_stack()

On Tue, Aug 15, 2023 at 08:42:52PM +0000, Edgecombe, Rick P wrote:
> On Mon, 2023-08-07 at 23:00 +0100, Mark Brown wrote:
> > +	if (flags & ~(SHADOW_STACK_SET_TOKEN | SHADOW_STACK_SET_MARKER))
> > +		return -EINVAL;

> Thanks for adding SHADOW_STACK_SET_MARKER. I don't see where it is
> defined in these patches though. Might have been left out on accident?

I added it to the dependency patches I've got which pull bits out of the
x86 series prior to you having rebased it, the ABI bits are mixed in
with the x86 architecture changes which I didn't feel like dealing with
the rebasing for so I pulled out the ABI portions. I'll resolve this
properly when I rebase back onto the x86 series (ideally after the next
merge window it'll be in mainline!). For these that'll probably boil
down to adding defines to prctl.h for the generic prctl API.
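
For reference the userspace usage ends up looking roughly like the
below (sketch, assuming the syscall number and the flag defines are
exported through the usual uapi headers):

	#include <unistd.h>
	#include <sys/syscall.h>

	/* Map a 64K shadow stack with a cap token ready for pivoting */
	void *gcs = (void *)syscall(__NR_map_shadow_stack, 0, 65536,
				    SHADOW_STACK_SET_TOKEN);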



2023-08-17 08:53:22

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v4 18/36] arm64/gcs: Context switch GCS state for EL0

On Fri, Aug 11, 2023 at 04:32:10PM +0100, Catalin Marinas wrote:
> On Mon, Aug 07, 2023 at 11:00:23PM +0100, Mark Brown wrote:

> > + gcs_free(current);
> > + current->thread.gcs_el0_mode = 0;
> > + write_sysreg_s(0, SYS_GCSCRE0_EL1);
> > + write_sysreg_s(0, SYS_GCSPR_EL0);
> > + }
> > +}

> Do we need and isb() or there's one on this path? If it's only EL0
> making use of this register, we should be fine with the ERET before
> returning to user. Not sure whether the kernel uses this, GCSSTTR
> doesn't need it.

They're only used by EL0, at EL1 we do read GCSPR for signal handling
but AIUI that shouldn't be any more of an issue than it is for the
TPIDRs which we don't have a barrier for. It's possible I'm
misunderstanding though.

> > + /*
> > + * Ensure that GCS changes are observable by/from other PEs in
> > + * case of migration.
> > + */
> > + if (task_gcs_el0_enabled(current) || task_gcs_el0_enabled(next))
> > + gcsb_dsync();

> What's this barrier for? The spec (at least the version I have) only
> talks about accesses, nothing to do with the registers that we context
> switch here.

Right, it's for the GCS memory rather than the registers. I'm fairly
sure it's excessive but was erring on the side of caution until I
have convinced myself that the interactions between GCS barriers and
regular barriers are doing the right thing. Until we have physical
implementations to contend with I'd guess the practical impact will be
minimal.



2023-08-18 13:57:38

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v4 07/36] arm64/gcs: Provide copy_to_user_gcs()

On Fri, Aug 11, 2023 at 05:36:05PM +0100, Catalin Marinas wrote:
> On Mon, Aug 07, 2023 at 11:00:12PM +0100, Mark Brown wrote:
> > +static inline int copy_to_user_gcs(unsigned long __user *addr,
> > + unsigned long *val,
> > + int count)

> I think it makes more sense to have a put_user_gcs() of a single
> element. I've only seen it used with 2 elements in the signal code but
> we could as well do two put_user_gcs() calls (as we do for other stuff
> that we push to the signal frame).

Right, it's just the two element array in the signals code and the one
element for the context token in map_shadow_stack(). I can refactor to
a single read/write operation, I'd originally written it that way but I
wasn't thrilled with either writing a load of fun macros to mirror the
way vanilla put_user() is written or having code that looked very
different to how the other similarly named functions were done.
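
Roughly what I have in mind for the single element version, based
on the code above (untested):

	static inline int put_user_gcs(unsigned long val,
				       unsigned long __user *addr)
	{
		int ret = -EFAULT;

		if (access_ok((char __user *)addr, sizeof(u64))) {
			uaccess_ttbr0_enable();
			ret = gcssttr(addr, val);
			uaccess_ttbr0_disable();
		}

		return ret;
	}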



2023-08-19 12:21:14

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v4 03/36] arm64/gcs: Document the ABI for Guarded Control Stacks

On Fri, Aug 18, 2023 at 06:29:54PM +0100, Catalin Marinas wrote:

> A related question - it may have been discussed intensively on the x86
> thread (I may read it sometime) - why not have the libc map the shadow

Your assumption that this is a single thread feels optimistic there.

> stack and pass the pointer/size to clone3()? It saves us from having to
> guess what the right size we'd need. struct clone_args is extensible.

I can't recall or locate the specific reasoning there right now, perhaps
Rick or someone else can? I'd guess there would be compat concerns for
things that don't go via libc which would complicate the story with
identifying and marking things as GCS/SS safe, it's going to be more
robust to just supply a GCS if the process is using it. That said
having a default doesn't preclude us using the extensibility to allow
userspace directly to control the GCS size, I would certainly be in
favour of adding support for that.

> (I plan to get back next week to this series, I'll need to read a bit
> more on the spec)

I've been making changes, mostly in response to your feedback, so there
should be a new version on Monday even if not everything is addressed
yet.



2023-08-19 13:29:02

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v4 21/36] arm64/mm: Implement map_shadow_stack()

On Fri, Aug 11, 2023 at 05:38:24PM +0100, Catalin Marinas wrote:

> Given that we won't have an mmap(PROT_SHADOW_STACK), are we going to
> have restrictions on mprotect()? E.g. it would be useful to reject a
> PROT_EXEC on the shadow stack.

mprotect() uses arch_validate_flags() which we already have covering
this case, so it's already handled.
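
For GCS that check amounts to something like (sketch rather than
the exact code from the series):

	/* An executable GCS makes no sense, reject it */
	if (system_supports_gcs() && (vm_flags & VM_SHADOW_STACK) &&
	    (vm_flags & VM_EXEC))
		return false;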



2023-08-19 22:40:42

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v4 09/36] arm64/mm: Allocate PIE slots for EL0 guarded control stack

On Fri, Aug 11, 2023 at 03:23:12PM +0100, Catalin Marinas wrote:

> > #define PIE_E1 ( \
> > + PIRx_ELx_PERM(pte_pi_index(_PAGE_GCS), PIE_RW) | \
> > + PIRx_ELx_PERM(pte_pi_index(_PAGE_GCS_RO), PIE_R) | \

> Had some thoughts on this. Why do we need the EL1 GCS attributes to map
> to RW? The instructions we'd use to write the shadow stack are the GCS
> 'T' variants that run as user already.

> The only instructions we have in the kernel that would run as EL1 on a
> user address are the exclusives (futex code or the old deprecated
> emulation but we don't care about them in this context). So I wonder
> whether the kernel PIE entry could simply be PIE_NONE_O. Would this be
> too restrictive for future uses? Given the coherency between a GCS
> access and a standard data access, we may want to restrict it now until
> we have a use-case.

Good point. I remember I originally wrote that before I checked into
how things like copying pages for ptrace worked but they don't keep
the GCSness of the page so they're fine.

I don't think we need to worry about future uses since these are slots
reserved for GCS use, if we need a different value later we can change
it then.



2023-08-23 15:07:34

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v4 03/36] arm64/gcs: Document the ABI for Guarded Control Stacks

On Wed, Aug 23, 2023 at 11:09:59AM +0100, Szabolcs Nagy wrote:
> The 08/22/2023 18:53, Mark Brown wrote:
> > On Tue, Aug 22, 2023 at 05:49:51PM +0100, Catalin Marinas wrote:

> > the what but not the why. For example roughly the current behaviour was
> > already in place in v10 of the original series:

> well the original shstk patches predate clone3 so no surprise there.
> e.g. v6 is from 2018 and clone3 is 2019 linux 5.3
> https://lore.kernel.org/lkml/[email protected]/

Ah, that'd do it. People weren't excited enough about clone3() when
reviewing the shadow stack patches, I hadn't realised clone3() was so
new.

> > I do worry about the story for users calling the underlying clone3() API
> > (or legacy clone() for that matter) directly, and we would also need to
> > handle the initial GCS enable via prctl() - that's not insurmountable,
> > we could add a size argument there that only gets interpreted during the
> > initial enable for example.

> musl and bionic currently use plain clone for threads.

> and there is user code doing raw clone threads (such threads are
> technically not allowed to call into libc) it's not immediately
> clear to me if having gcs in those threads is better or worse.

Right, that's my big worry - I hadn't realised it was extending as far
as musl/bionic.

> one difference is that userspace can then set gcspr of a new thread
> and e.g. two threads can have overlapping gcs, however i don't think
> this impacts security much since if clone3 is attacker controlled
> then likely all bets are off.

Yeah, I think that's a "you broke it, you get all the pieces" thing.

> > My sense is that the deployment story is going to be smoother with
> > defaults being provided since it avoids dealing with the issue of what
> > to do if userspace creates a thread without a GCS in a GCS enabled
> > process but like I say I'd be totally happy to extend clone3(). I will
> > put some patches together for that (probably once the x86 stuff lands).
> > Given the size of this series it might be better split out for
> > manageability if nothing else.

> i would make thread without gcs to implicitly disable gcs, since
> that's what's bw compat with clones outside of libc (the libc can
> guarantee gcs allocation when gcs is enabled).

That'd create a pretty substantial divergence with the x86 patches if
they land this time around, there's not enough time to rework them now -
I suppose it'd mainly bite people implementing libc type stuff but
still, doesn't feel great.



2023-08-23 19:39:26

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v4 03/36] arm64/gcs: Document the ABI for Guarded Control Stacks

On Wed, Aug 23, 2023 at 05:45:11PM +0100, Catalin Marinas wrote:

> I don't mind the divergence in this area if the libc folks are ok with
> it. x86 can eventually use the clone3() interface if they want more
> flexibility, they'll just have to continue supporting the old one. I
> think we already diverge around the prctl().

Yes, though that's basically the same thing - the difference is
basically just that x86 uses their custom arch_prctl() thing and we use
a regular prctl(), there's nothing conceptual going on like there is
here. I don't really mind either, it's just something where I'd
anticipate pushback.



2023-08-23 21:18:29

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v4 03/36] arm64/gcs: Document the ABI for Guarded Control Stacks

On Wed, Aug 23, 2023 at 11:09:59AM +0100, Szabolcs Nagy wrote:
> The 08/22/2023 18:53, Mark Brown wrote:
> > On Tue, Aug 22, 2023 at 05:49:51PM +0100, Catalin Marinas wrote:
> > > It would be good if someone provided a summary of the x86 decision (I'll
> > > get to those thread but most likely in September). I think we concluded
> > > that we can't deploy GCS entirely transparently, so we need a libc
> > > change (apart from the ELF annotations). Since libc is opting in to GCS,
> >
> > Right, we need changes for setjmp()/longjmp() for example.
> >
> > > we could also update the pthread_create() etc. to allocate the shadow
> > > together with the standard stack.
> > >
> > > Anyway, that's my preference but maybe there were good reasons not to do
> > > this.
> >
> > Yeah, it'd be good to understand. I've been through quite a lot of old
> > versions of the x86 series (I've not found them all, there's 30 versions
> > or something of the old series plus the current one is on v9) and the
> > code always appears to have been this way with changelogs that explain
> > the what but not the why. For example roughly the current behaviour was
> > already in place in v10 of the original series:
> >
> > https://lore.kernel.org/lkml/[email protected]/
>
> well the original shstk patches predate clone3 so no surprise there.
> e.g. v6 is from 2018 and clone3 is 2019 linux 5.3
> https://lore.kernel.org/lkml/[email protected]/

Good point, I had not realised that.

> > I do worry about the story for users calling the underlying clone3() API
> > (or legacy clone() for that matter) directly, and we would also need to
> > handle the initial GCS enable via prctl() - that's not insurmountable,
> > we could add a size argument there that only gets interpreted during the
> > initial enable for example.
>
> musl and bionic currently use plain clone for threads.
>
> and there is user code doing raw clone threads (such threads are
> technically not allowed to call into libc) it's not immediately
> clear to me if having gcs in those threads is better or worse.
>
> glibc can use clone3 args for gcs, i'd expect the unmap to be more
> annoying than the allocation, but possible (it is certainly more
> work than leaving everything to the kernel).

Unmapping is indeed more complex but I guess something similar needs to
happen for the thread stack to be reclaimed.

The thing I dislike about the kernel automatically mapping it is the
arbitrary fraction of RLIMIT_STACK size. glibc may use RLIMIT_STACK as a
hint for the thread stack size but is this the case for other libraries?
Some quick search (which I may have misinterpreted) shows that musl uses
128KB, bionic 1MB. So at this point the shadow stack size has no
relevance for the actual thread stack.

An alternative would be for the clone3() to provide an address _hint_
and size for GCS and it would still be the kernel doing the mmap (and
munmap on clearing). But at least the user has some control over the
placement of the GCS and its size (and maybe providing the address has
MAP_FIXED semantics).
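
I.e. something like the below, purely illustrative (the field names
are made up and any new fields would go at the end of the struct):

	struct clone_args {
		/* ... existing fields ... */
		__aligned_u64 shadow_stack;	/* address hint, 0 lets the kernel pick */
		__aligned_u64 shadow_stack_size;	/* 0 for the default sizing */
	};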

> > My sense is that the deployment story is going to be smoother with
> > defaults being provided since it avoids dealing with the issue of what
> > to do if userspace creates a thread without a GCS in a GCS enabled
> > process but like I say I'd be totally happy to extend clone3(). I will
> > put some patches together for that (probably once the x86 stuff lands).
> > Given the size of this series it might be better split out for
> > manageability if nothing else.
>
> i would make thread without gcs to implicitly disable gcs, since
> that's what's bw compat with clones outside of libc (the libc can
> guarantee gcs allocation when gcs is enabled).

Yes, this should work. Any invocation of clone() or clone3() without a
shadow stack would disable GCS. What about the reverse, should GCS be
enabled for a thread even if the clone3() caller has GCS disabled? I
guess we shouldn't since GCS enabling depends on the prctl() state set
previously.

--
Catalin

2023-08-30 18:53:40

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v4 03/36] arm64/gcs: Document the ABI for Guarded Control Stacks

On Wed, Aug 30, 2023 at 01:37:33PM +0100, Szabolcs Nagy wrote:
> The 08/24/2023 16:43, Catalin Marinas wrote:

> > Is there a use-case for the unlocked configuration to allow disabling
> > the GCS implicitly via a clone syscall?

> how would you handle clone or clone3 without gcs specified?
> (in the cases when clone creates a new thread with new stack)

> (1) fail.
> (2) allocate gcs.
> (3) disable gcs.

...

> problem with (2) is that the size policy and lifetime management
> is in the kernel then. (since only special cases are affected i
> guess that is ok, but i assumed we want to avoid this by moving
> to clone3 and user managed gcs).

Right, it seems like if we go with this then we may as well just allow
plain clone() too.

> the problem with (3) is escaping the security measure, however
> it only applies to very special threads that can always decide
> to opt-in to gcs, so i don't see this as such a bad option and
> at least bw compat with existing code. (in my threat model the
> attacker cannot hijack clone syscalls as that seems stronger
> than hijacking return addresses.)

It doesn't seem great to have a feature which is to a large extent a
security feature where we provide a fairly straightforward mechanism for
disabling the feature and actively expect things to be using it.

Given the timescales until this gets practically deployed on arm64 I
would be inclined to go with making things fail and forcing updates in
the users, though obviously that's less helpful for x86 where the
hardware is in user hands already so it's more of a pressing issue (and
there's already what is effectively option 2 in the code). We could
have the architectures diverge, as you say the effect is likely to be
mainly in very low level code rather than general software.



2023-09-29 01:04:54

by Szabolcs Nagy

[permalink] [raw]
Subject: Re: [PATCH v4 03/36] arm64/gcs: Document the ABI for Guarded Control Stacks

The 08/23/2023 14:11, Catalin Marinas wrote:
> On Wed, Aug 23, 2023 at 11:09:59AM +0100, Szabolcs Nagy wrote:
> > The 08/22/2023 18:53, Mark Brown wrote:
> > > I do worry about the story for users calling the underlying clone3() API
> > > (or legacy clone() for that matter) directly, and we would also need to
> > > handle the initial GCS enable via prctl() - that's not insurmountable,
> > > we could add a size argument there that only gets interpreted during the
> > > initial enable for example.
...
> > and there is user code doing raw clone threads (such threads are
> > technically not allowed to call into libc) it's not immediately
> > clear to me if having gcs in those threads is better or worse.

i think raw clone / clone3 users may be relevant so we need a
solution such that they don't fail when gcs args are missing.

userspace allocated gcs works for me, but maybe the alternative
with size only is more consistent (thread gcs is kernel mapped
with fallback size logic if gcs size is missing):

> An alternative would be for the clone3() to provide an address _hint_
> and size for GCS and it would still be the kernel doing the mmap (and
> munmap on clearing). But at least the user has some control over the
> placement of the GCS and its size (and maybe providing the address has
> MAP_FIXED semantics).

the main thread gcs is still special: the size is provided
via prctl (if at all).

2023-10-02 21:11:22

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v4 03/36] arm64/gcs: Document the ABI for Guarded Control Stacks

On Thu, Sep 28, 2023 at 05:59:25PM +0100, Szabolcs Nagy wrote:
> The 08/23/2023 14:11, Catalin Marinas wrote:

> > > and there is user code doing raw clone threads (such threads are
> > > technically not allowed to call into libc) it's not immediately
> > > clear to me if having gcs in those threads is better or worse.

> i think raw clone / clone3 users may be relevant so we need a
> solution such that they don't fail when gcs args are missing.

Are we sure about that? Old binaries shouldn't be affected since they
won't turn on GCS so we're just talking about new binaries here - are there
really so many of them that we won't be able to get them all converted
over to clone3() and GCS in the timescales we're talking about for GCS
deployment? I obviously don't particularly mind having the default size
logic but if we allow clone() then that's keeping the existing behaviour
and layering allocation via clone3() on top of it which Catalin didn't
want. Catalin?

> userspace allocated gcs works for me, but maybe the alternative
> with size only is more consistent (thread gcs is kernel mapped
> with fallback size logic if gcs size is missing):

If we have size only then the handling of GCS and normal stack in struct
clone_args would be inconsistent. Given that it seems better to have
the field present, we can allow it to be NULL and do the allocation with
the specified size but it should be there.

> > An alternative would be for the clone3() to provide an address _hint_
> > and size for GCS and it would still be the kernel doing the mmap (and
> > munmap on clearing). But at least the user has some control over the
> > placement of the GCS and its size (and maybe providing the address has
> > MAP_FIXED semantics).

> the main thread gcs is still special: the size is provided
> via prctl (if at all).

Either that or we have it do a map_shadow_stack() but that's an extra
syscall during startup.



2023-10-02 22:06:13

by Edgecombe, Rick P

[permalink] [raw]
Subject: Re: [PATCH v4 03/36] arm64/gcs: Document the ABI for Guarded Control Stacks

On Mon, 2023-10-02 at 20:49 +0100, Mark Brown wrote:
> On Thu, Sep 28, 2023 at 05:59:25PM +0100, Szabolcs Nagy wrote:
> > The 08/23/2023 14:11, Catalin Marinas wrote:
>
> > > > and there is user code doing raw clone threads (such threads are
> > > > technically not allowed to call into libc) it's not immediately
> > > > clear to me if having gcs in those threads is better or worse.
>
> > i think raw clone / clone3 users may be relevant so we need a
> > solution such that they don't fail when gcs args are missing.
>
> Are we sure about that?  Old binaries shouldn't be affected since they
> won't turn on GCS so we're just talking about new binaries here - are
> there really so many of them that we won't be able to get them all
> converted over to clone3() and GCS in the timescales we're talking
> about for GCS deployment?  I obviously don't particularly mind having
> the default size logic, but if we allow clone() then that's keeping
> the existing behaviour and layering allocation via clone3() on top of
> it, which Catalin didn't want.

On the x86 side, there have been a lot of binaries generated that have
been blindly marked as supporting shadow stack. So even if they are not
old, there may still be mismarked ones using clone().

In general there are a lot of tradeoffs with shadow stack between
security, compatibility and performance. Szabolcs had previously
discussed some ideas around all three:
- compatibility (automatic sigaltstack())
- performance/utility (creating a new libc API for makecontext())
- security (token schemes to guarantee only one user of a shadow
  stack at a time)

On the x86 kernel side, we had our hands tied a bit by the existing
userspace, and kind of ended up with a mix. I imagined that we might
see demand along one of those axes after some real world use. At which
point we could add some opt-in ABI tweaks to help.

If ARM is thinking of doing things differently than x86, you might
think about how you weight those tradeoffs. Like, it might be silly to
worry about clone() support if something else ends up breaking
compatibility majorly. But, it might be worthwhile if you end up going
to the proposed extremes around signal alt stacks, to maximize
compatibility.

Also then maybe x86 could copy the ARM ABI some day, if it ends up
chasing the tradeoff people prefer. It probably goes without saying
that the closer these features behave from the app developer
perspective, the better. So a different ABI than x86 that also targets
a mix would be a bit unfortunate. (not the end of the world though)

Anyway, just wanted to share some perspective there. Sorry for joining
this thread so late.

2023-10-03 08:46:59

by Szabolcs Nagy

[permalink] [raw]
Subject: Re: [PATCH v4 03/36] arm64/gcs: Document the ABI for Guarded Control Stacks

The 10/02/2023 20:49, Mark Brown wrote:
> On Thu, Sep 28, 2023 at 05:59:25PM +0100, Szabolcs Nagy wrote:
> > The 08/23/2023 14:11, Catalin Marinas wrote:
>
> > > > and there is user code doing raw clone threads (such threads are
> > > > technically not allowed to call into libc) it's not immediately
> > > > clear to me if having gcs in those threads is better or worse.
>
> > i think raw clone / clone3 users may be relevant so we need a
> > solution such that they don't fail when gcs args are missing.
>
> Are we sure about that? Old binaries shouldn't be affected since they
> won't turn on GCS so we're just talking about new binaries here - are
> there really so many of them that we won't be able to get them all
> converted over to clone3() and GCS in the timescales we're talking
> about for GCS deployment? I obviously don't particularly mind having
> the default size logic, but if we allow clone() then that's keeping
> the existing behaviour and layering allocation via clone3() on top of
> it, which Catalin didn't want. Catalin?

clone3 seems to have features that are only available in clone3 and
not exposed (reasonably) in libc apis so ppl will use clone3 directly
and those will be hard to fix for gcs (you have to convince upstream
to add future arm64 arch specific changes that they cannot test).
where this analysis might be wrong is that raw clone3 is more likely
used as fork/vfork without a new stack and thus no gcs issue.

even if we have time to fix code, we don't want too many ifdef hacks
just for gcs so it matters how many projects are affected.
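
e.g. the kind of raw clone3 use i mean, as a sketch (CLONE_INTO_CGROUP
is clone3-only, and no new stack is passed so there is no thread gcs to
allocate):

#include <signal.h>       /* SIGCHLD */
#include <string.h>       /* memset */
#include <unistd.h>       /* syscall */
#include <sys/syscall.h>  /* __NR_clone3 */
#include <linux/sched.h>  /* struct clone_args, CLONE_INTO_CGROUP */

/* fork-style clone3(): new process placed in a target cgroup */
static pid_t fork_into_cgroup(int cgroup_fd)
{
        struct clone_args args;

        memset(&args, 0, sizeof(args));
        args.flags = CLONE_INTO_CGROUP;
        args.exit_signal = SIGCHLD;
        args.cgroup = cgroup_fd;

        /* stack/stack_size left at 0: the child runs on a cow copy of
           the parent's stack, so there is no new thread stack and no
           thread gcs to set up */
        return syscall(__NR_clone3, &args, sizeof(args));
}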

> > userspace allocated gcs works for me, but maybe the alternative
> > with size only is more consistent (thread gcs is kernel mapped
> > with fallback size logic if gcs size is missing):
>
> If we have size only then the handling of GCS and the normal stack in
> struct clone_args would be inconsistent. Given that, it seems better
> to have the field present; we can allow it to be NULL and do the
> allocation with the specified size, but it should be there.

i see, then try the original plan.

> > the main thread gcs is still special: the size is provided
> > via prctl (if at all).
>
> Either that or we have it do a map_shadow_stack() but that's an extra
> syscall during startup.

an extra syscall is not too bad for the gcs enabled case.

2023-10-03 13:38:26

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v4 03/36] arm64/gcs: Document the ABI for Guarded Control Stacks

On Mon, Oct 02, 2023 at 09:43:25PM +0000, Edgecombe, Rick P wrote:

> If ARM is thinking of doing things differently than x86, you might
> think about how you weight those tradeoffs. Like, it might be silly to
> worry about clone() support if something else ends up breaking
> compatibility majorly. But, it might be worthwhile if you end up going
> to the proposed extremes around signal alt stacks, to maximize
> compatibility.

Yeah, I think Catalin's thinking here was that we're quite a way out
from actual hardware so it's much more tractable to fix up callers than
it is for x86 where the hardware is widely available.

> Also then maybe x86 could copy the ARM ABI some day, if it ends up
> chasing the tradeoff people prefer. It probably goes without saying
> that the closer these features behave from the app developer
> perspective, the better. So a different ABI than x86 that also targets
> a mix would be a bit unfortunate. (not the end of the world though)

If nothing else, even if we end up being stricter about things, it
would be extremely disappointing if we ended up with something where
code written for arm64 won't run when built for x86.



2023-10-03 14:27:13

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v4 03/36] arm64/gcs: Document the ABI for Guarded Control Stacks

On Tue, Oct 03, 2023 at 09:45:56AM +0100, Szabolcs Nagy wrote:

> clone3 seems to have features that are only available in clone3 and
> not exposed (reasonably) in libc apis so ppl will use clone3 directly
> and those will be hard to fix for gcs (you have to convince upstream
> to add future arm64 arch specific changes that they cannot test).

Ah, I hadn't realised that there were things that weren't available via
libc - that does change the calculation a bit here. I would hope that
anything we do for clone3() would work just as well for x86 so the test
side should be a bit easier there than if it were a future arm64 thing,
though obviously it wouldn't be mandatory on x86 in the way that Catalin
wanted it for arm64.

> where this analysis might be wrong is that raw clone3 is more likely
> used as fork/vfork without a new stack and thus no gcs issue.

> even if we have time to fix code, we don't want too many ifdef hacks
> just for gcs so it matters how many projects are affected.

My impression was that raw usage of the APIs was a specialist enough
thing that this was viable, ICBW though - I might not have been
searching well enough (clone is an annoying term to search for!).



2023-10-05 17:27:51

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v4 03/36] arm64/gcs: Document the ABI for Guarded Control Stacks

On Tue, Oct 03, 2023 at 03:26:51PM +0100, Mark Brown wrote:
> On Tue, Oct 03, 2023 at 09:45:56AM +0100, Szabolcs Nagy wrote:
> > clone3 seems to have features that are only available in clone3 and
> > not exposed (reasonably) in libc apis so ppl will use clone3 directly
> > and those will be hard to fix for gcs (you have to convince upstream
> > to add future arm64 arch specific changes that they cannot test).
>
> Ah, I hadn't realised that there were things that weren't available via
> libc - that does change the calculation a bit here. I would hope that
> anything we do for clone3() would work just as well for x86 so the test
> side should be a bit easier there than if it were a future arm64 thing,
> though obviously it wouldn't be mandatory on x86 in the way that Catalin
> wanted it for arm64.

I haven't checked how many clone() or clone3() uses there are outside
the libc (I tried a quick search in Debian but did not dig into the
specifics to see how generic that code is). I agree that having to
change valid cases outside of libc is not ideal. Even if we have the
same clone3() interface for x86 and arm64, we'd have other architectures
that need #ifdef'ing.

So I'm slightly warming up to the idea of having a default shadow stack
size (either RLIMIT_STACK or the clone3() stack size, following x86). A
clone3() extension can be added on top, though I wonder whether anyone
will use it if the kernel allocates a shadow stack by default.
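
For reference, roughly the fallback sizing the merged x86 code uses - a
sketch, not a proposal for the arm64 logic:

/* Sketch of the x86-style fallback sizing, not a proposal for arm64. */
static unsigned long gcs_default_size(unsigned long clone3_stack_size)
{
        /* a clone3() caller that passed a stack size gets a matching gcs */
        if (clone3_stack_size)
                return PAGE_ALIGN(clone3_stack_size);

        /* otherwise fall back to RLIMIT_STACK, capped at 4G */
        return PAGE_ALIGN(min_t(unsigned long, rlimit(RLIMIT_STACK), SZ_4G));
}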

It's not just the default size that I dislike (I think the x86
RLIMIT_STACK or clone3() stack_size is probably good enough) but the
kernel allocating the shadow stack and inserting it into the user
address space. The actual thread stack is managed by the user but the
shadow stack is not (and we don't do this very often). Anyway, I don't
have a better solution for direct uses of clone() or clone3(), other
than running those threads with the shadow stack disabled. Not sure
that's desirable.

--
Catalin

2023-10-06 12:17:32

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v4 03/36] arm64/gcs: Document the ABI for Guarded Control Stacks

On Thu, Oct 05, 2023 at 06:23:10PM +0100, Catalin Marinas wrote:

> It's not just the default size that I dislike (I think the x86
> RLIMIT_STACK or clone3() stack_size is probably good enough) but the
> kernel allocating the shadow stack and inserting it into the user
> address space. The actual thread stack is managed by the user but the
> shadow stack is not (and we don't do this very often). Anyway, I don't
> have a better solution for direct uses of clone() or clone3(), other
> than running those threads with the shadow stack disabled. Not sure
> that's desirable.

Running threads with the shadow stack disabled if they don't explicitly
request it feels like it's asking for trouble - as well as the escape
route from the protection it'd provide I'd expect there to be trouble
for things that do stack pivots, potentially random issues if there's a
mix of ways threads are started. It's going to be a tradeoff whatever
we do.



2023-10-06 12:30:57

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH v4 03/36] arm64/gcs: Document the ABI for Guarded Control Stacks

Mark Brown <[email protected]> writes:

> On Thu, Oct 05, 2023 at 06:23:10PM +0100, Catalin Marinas wrote:
>
>> It's not just the default size that I dislike (I think the x86
>> RLIMIT_STACK or clone3() stack_size is probably good enough) but the
>> kernel allocating the shadow stack and inserting it into the user
>> address space. The actual thread stack is managed by the user but the
>> shadow stack is not (and we don't do this very often). Anyway, I don't
>> have a better solution for direct uses of clone() or clone3(), other
>> than running those threads with the shadow stack disabled. Not sure
>> that's desirable.
>
> Running threads with the shadow stack disabled if they don't explicitly
> request it feels like it's asking for trouble - as well as the escape
> route from the protection it'd provide I'd expect there to be trouble
> for things that do stack pivots, potentially random issues if there's a
> mix of ways threads are started. It's going to be a tradeoff whatever
> we do.

Something I haven't seen in the discussion is that one of the ways I
have seen a non-libc clone used is to implement a fork with flags.
That is, a new mm is created, and effectively a new process, which
makes the characterization different.
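
Something like this, as a sketch (raw syscall because the libc clone()
wrapper insists on a new stack; the argument order shown is the
x86-64/arm64 one):

#include <sched.h>        /* CLONE_* flags */
#include <signal.h>       /* SIGCHLD */
#include <unistd.h>       /* syscall */
#include <sys/syscall.h>  /* __NR_clone */

/* "fork with flags": a new process with its own mm, not a pthread. */
static pid_t fork_with_flags(unsigned long flags)
{
        /* No CLONE_VM and a NULL child stack: the child gets a
           copy-on-write copy of the parent's memory and keeps running
           on the same stack addresses, just like fork(). */
        return syscall(__NR_clone, flags | SIGCHLD, NULL, NULL, NULL, NULL);
}

/* e.g. a fork into new mount and UTS namespaces:
 *     pid_t pid = fork_with_flags(CLONE_NEWNS | CLONE_NEWUTS);
 */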

In general creating a thread with clone and bypassing libc is
incompatible with pthreads, and the caller gets to keep both pieces.

As long as there is enough information that code can detect that
shadow stacks are in use, and the code is able to create its own, I
don't see why it shouldn't be the caller's responsibility.

On the other hand I don't see the maintainer of clone, Christian
Brauner, or the libc folks, especially Florian, cc'd on this thread.
So I really don't think you have the right folks in on this
conversation.

Eric

2023-10-06 13:23:39

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v4 03/36] arm64/gcs: Document the ABI for Guarded Control Stacks

On Fri, Oct 06, 2023 at 07:29:45AM -0500, Eric W. Biederman wrote:
> Mark Brown <[email protected]> writes:

> >> It's not just the default size that I dislike (I think the x86
> >> RLIMIT_STACK or clone3() stack_size is probably good enough) but the
> >> kernel allocating the shadow stack and inserting it into the user
> >> address space. The actual thread stack is managed by the user but the
> >> shadow stack is not (and we don't do this very often). Anyway, I don't
> >> have a better solution for direct uses of clone() or clone3(), other
> >> than running those threads with the shadow stack disabled. Not sure
> >> that's desirable.

> > Running threads with the shadow stack disabled if they don't explicitly
> > request it feels like it's asking for trouble - as well as the escape
> > route from the protection it'd provide I'd expect there to be trouble
> > for things that do stack pivots, potentially random issues if there's a
> > mix of ways threads are started. It's going to be a tradeoff whatever
> > we do.

> Something I haven't seen in the discussion is that one of the ways I
> have seen a non-libc clone used is to implement a fork with flags.
> That is, a new mm is created, and effectively a new process, which
> makes the characterization different.

> In general creating a thread with clone and bypassing libc is
> incompatible with pthreads, and the caller gets to keep both pieces.

> As long as there is enough information that code can detect that
> shadow stacks are in use, and the code is able to create its own, I
> don't see why it shouldn't be the caller's responsibility.

> On the other hand I don't see the maintainer of clone, Christian
> Brauner, or the libc folks, especially Florian, cc'd on this thread.
> So I really don't think you have the right folks in on this
> conversation.

Well, copying them in now. The discussion here is about allocation of
shadow stacks for the arm64 implementation of the feature (the arm64
feature is called Guarded Control Stack in the architecture). These
maintain a second copy of the stack containing only the return targets,
in memory allocated with special protections so userspace can't write
to it directly, and use it on returns to check that the returns haven't
been redirected. These shadow stacks can be allocated directly by
userspace using a new system call, map_shadow_stack(); doing this via
mmap() was extensively discussed but it was concluded that this was
very likely to lead to security problems, so we've got this new syscall
which ensures that shadow stack memory is never accessible to userspace
via other means.

The x86 implementation that has already been merged into mainline will
allocate a new shadow stack for newly created threads when the creating
thread has one. There was a suggestion to have arm64 diverge and
require that threads be created with clone3() and manually provide a
shadow stack, but then concerns were raised that as well as the issues
with divergence this would be too disruptive for adoption due to
non-libc thread creation. It's not controversial that it'd be good to
have clone3() be able to explicitly specify a shadow stack, just
whether it should be required.
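
For context, turning the feature on at all is a separate per-thread
prctl() in this series - a sketch using constants from the current
posting, which may change before anything is merged:

#include <sys/prctl.h>

#ifndef PR_SET_SHADOW_STACK_STATUS
#define PR_SET_SHADOW_STACK_STATUS 75   /* value from the series, not mainline */
#define PR_SHADOW_STACK_ENABLE     (1UL << 0)
#endif

/* Opt the calling thread in to GCS; expected to be done very early by
   startup code, before there are return addresses worth protecting. */
static int enable_gcs(void)
{
        return prctl(PR_SET_SHADOW_STACK_STATUS, PR_SHADOW_STACK_ENABLE,
                     0, 0, 0);
}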



2023-10-19 17:08:42

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v4 03/36] arm64/gcs: Document the ABI for Guarded Control Stacks

On Thu, Oct 05, 2023 at 06:23:10PM +0100, Catalin Marinas wrote:

> I haven't checked how many clone() or clone3() uses there are outside
> the libc (I tried a quick search in Debian but did not dig into the
> specifics to see how generic that code is). I agree that having to
> change valid cases outside of libc is not ideal. Even if we have the
> same clone3() interface for x86 and arm64, we'd have other
> architectures that need #ifdef'ing.

FTR the set of Debian source packages that have references to the string
__NR_clone (which picks up clone3 too) is below. At least some (e.g.,
kore) just have things that look like a copy of the syscall table rather
than things that look like calls, though equally it's likely we're
missing some.

aflplusplus
android-platform-tools
binutils-avr
box64
brltty
bubblewrap
chromium
chrony
crash
criu
crun
dietlibc
elogind
emscripten
fakeroot-ng
falcosecurity-libs
firefox
firefox-esr
flatpak
gcc-9
gcc-10
gcc-11
gcc-12
gcc-13
gcc-arm-none-eabi
gcc-snapshot
gdb-msp430
glibc
gnumach
hurd
klibc
kore
libpod
libseccomp
linux
llvm-toolchain-14
llvm-toolchain-15
llvm-toolchain-16
lxc
lxcfs
lxd
musl
newlib
notcurses
purelibc
pwntools
qemu
qt6-base
qt6-webengine
qtbase-opensource-src
qtbase-opensource-src-gles
qtwebengine-opensource-src
radare2
rumur
rustc
rust-linux-raw-sys
rust-rustix
strace
stress-ng
swtpm
systemd
systemtap
termpaint
thunderbird
tor
uclibc
umview
valgrind
vsftpd
wasi-libc
webkit2gtk
wpewebkit

