2018-01-04 15:08:54

by Will Deacon

Subject: [PATCH 00/11] arm64 kpti hardening and variant 2 workarounds

Hi all,

This set of patches builds on top of the arm64 kpti patches[1] queued for
4.16 and further hardens the arm64 Linux kernel against the side-channel
attacks recently published by Google Project Zero.

In particular, the series does the following:

* Enable kpti by default on arm64, based on the value of ID_AA64PFR0_EL1.CSV3

* Prevent speculative resteering of the indirect branch in the exception
trampoline page, which resides at a fixed virtual address to avoid a
KASLR leak

* Add hooks at context changes where the branch predictor, having been trained
by userspace or a guest OS, could theoretically resteer the speculative
instruction stream. These hooks are signal delivery (to prevent the branch
predictor being trained on kernel addresses from userspace), switch_mm
(return to user if SW PAN is enabled) and exit from a guest VM.

* Implement a dummy PSCI "VERSION" call as the hook for affected Cortex-A
CPUs. With the latest Arm Trusted Firmware patches, which will appear at [2],
this call invalidates the predictor state, so SoC vendors with affected
CPUs are strongly encouraged to update. We plan to switch to a more
efficient, special-purpose call when it is available and the PSCI spec
has been updated accordingly.

I'd like to get this in for 4.16, but that doesn't mean we can't improve
it further once it's merged.

For more information about the impact of this issue and the software mitigations
for Arm processors, please see http://www.arm.com/security-update.

Thanks,

Will

[1] https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/tag/?h=kpti-base
[2] https://github.com/ARM-software/arm-trusted-firmware/wiki/ARM-Trusted-Firmware-Security-Advisory-TFV-6

--->8

Marc Zyngier (3):
arm64: Move post_ttbr_update_workaround to C code
arm64: KVM: Use per-CPU vector when BP hardening is enabled
arm64: KVM: Make PSCI_VERSION a fast path

Will Deacon (8):
arm64: use RET instruction for exiting the trampoline
arm64: Kconfig: Reword UNMAP_KERNEL_AT_EL0 kconfig entry
arm64: Take into account ID_AA64PFR0_EL1.CSV3
arm64: cpufeature: Pass capability structure to ->enable callback
drivers/firmware: Expose psci_get_version through psci_ops structure
arm64: Add skeleton to harden the branch predictor against aliasing
attacks
arm64: cputype: Add missing MIDR values for Cortex-A72 and Cortex-A75
arm64: Implement branch predictor hardening for affected Cortex-A CPUs

arch/arm/include/asm/kvm_mmu.h | 10 ++++
arch/arm64/Kconfig | 30 +++++++---
arch/arm64/include/asm/assembler.h | 13 -----
arch/arm64/include/asm/cpucaps.h | 3 +-
arch/arm64/include/asm/cputype.h | 4 ++
arch/arm64/include/asm/kvm_mmu.h | 38 ++++++++++++
arch/arm64/include/asm/mmu.h | 37 ++++++++++++
arch/arm64/include/asm/sysreg.h | 2 +
arch/arm64/kernel/Makefile | 4 ++
arch/arm64/kernel/bpi.S | 79 +++++++++++++++++++++++++
arch/arm64/kernel/cpu_errata.c | 116 +++++++++++++++++++++++++++++++++++++
arch/arm64/kernel/cpufeature.c | 12 +++-
arch/arm64/kernel/entry.S | 7 ++-
arch/arm64/kvm/hyp/switch.c | 15 ++++-
arch/arm64/mm/context.c | 11 ++++
arch/arm64/mm/fault.c | 1 +
arch/arm64/mm/proc.S | 3 +-
drivers/firmware/psci.c | 2 +
include/linux/psci.h | 1 +
virt/kvm/arm/arm.c | 8 ++-
20 files changed, 366 insertions(+), 30 deletions(-)
create mode 100644 arch/arm64/kernel/bpi.S

--
2.1.4


2018-01-04 15:08:45

by Will Deacon

Subject: [PATCH 04/11] arm64: cpufeature: Pass capability structure to ->enable callback

In order to invoke the CPU capability ->matches callback from the ->enable
callback for applying local-CPU workarounds, we need a handle on the
capability structure.

This patch passes a pointer to the capability structure to the ->enable
callback.
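
As a rough sketch of the intended use (the callback and helper below are
hypothetical, not part of this patch), an ->enable callback can now re-run
->matches with local-CPU scope before applying a workaround:

static int enable_example_workaround(void *data)
{
	/* 'data' is the capability structure passed in via stop_machine() */
	const struct arm64_cpu_capabilities *entry = data;

	if (entry->matches(entry, SCOPE_LOCAL_CPU))
		apply_example_workaround();	/* hypothetical local-CPU fixup */

	return 0;
}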

Signed-off-by: Will Deacon <[email protected]>
---
arch/arm64/kernel/cpufeature.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index e11c11bb5b02..6133c14b9b01 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1151,7 +1151,7 @@ void __init enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps)
* uses an IPI, giving us a PSTATE that disappears when
* we return.
*/
- stop_machine(caps->enable, NULL, cpu_online_mask);
+ stop_machine(caps->enable, (void *)caps, cpu_online_mask);
}
}
}
@@ -1194,7 +1194,7 @@ verify_local_cpu_features(const struct arm64_cpu_capabilities *caps)
cpu_die_early();
}
if (caps->enable)
- caps->enable(NULL);
+ caps->enable((void *)caps);
}
}

--
2.1.4

2018-01-04 15:08:50

by Will Deacon

Subject: [PATCH 01/11] arm64: use RET instruction for exiting the trampoline

Speculation attacks against the entry trampoline can potentially resteer
the speculative instruction stream through the indirect branch and into
arbitrary gadgets within the kernel.

This patch defends against these attacks by forcing a misprediction
through the return stack: a dummy BL instruction loads an entry into
the stack, so that the predicted program flow of the subsequent RET
instruction is to a branch-to-self instruction which is finally resolved
as a branch to the kernel vectors with speculation suppressed.

Signed-off-by: Will Deacon <[email protected]>
---
arch/arm64/kernel/entry.S | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 031392ee5f47..b9feb587294d 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -1029,6 +1029,9 @@ alternative_else_nop_endif
.if \regsize == 64
msr tpidrro_el0, x30 // Restored in kernel_ventry
.endif
+ bl 2f
+ b .
+2:
tramp_map_kernel x30
#ifdef CONFIG_RANDOMIZE_BASE
adr x30, tramp_vectors + PAGE_SIZE
@@ -1041,7 +1044,7 @@ alternative_insn isb, nop, ARM64_WORKAROUND_QCOM_FALKOR_E1003
msr vbar_el1, x30
add x30, x30, #(1b - tramp_vectors)
isb
- br x30
+ ret
.endm

.macro tramp_exit, regsize = 64
--
2.1.4

2018-01-04 15:08:49

by Will Deacon

Subject: [PATCH 03/11] arm64: Take into account ID_AA64PFR0_EL1.CSV3

For non-KASLR kernels where the KPTI behaviour has not been overridden
on the command line we can use ID_AA64PFR0_EL1.CSV3 to determine whether
or not we should unmap the kernel whilst running at EL0.

Signed-off-by: Will Deacon <[email protected]>
---
arch/arm64/include/asm/sysreg.h | 1 +
arch/arm64/kernel/cpufeature.c | 7 ++++++-
2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 08cc88574659..ae519bbd3f9e 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -437,6 +437,7 @@
#define ID_AA64ISAR1_DPB_SHIFT 0

/* id_aa64pfr0 */
+#define ID_AA64PFR0_CSV3_SHIFT 60
#define ID_AA64PFR0_SVE_SHIFT 32
#define ID_AA64PFR0_GIC_SHIFT 24
#define ID_AA64PFR0_ASIMD_SHIFT 20
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 9f0545dfe497..e11c11bb5b02 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -145,6 +145,7 @@ static const struct arm64_ftr_bits ftr_id_aa64isar1[] = {
};

static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
+ ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_CSV3_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_SVE_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_GIC_SHIFT, 4, 0),
S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_ASIMD_SHIFT, 4, ID_AA64PFR0_ASIMD_NI),
@@ -851,6 +852,8 @@ static int __kpti_forced; /* 0: not forced, >0: forced on, <0: forced off */
static bool unmap_kernel_at_el0(const struct arm64_cpu_capabilities *entry,
int __unused)
{
+ u64 pfr0 = read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1);
+
/* Forced on command line? */
if (__kpti_forced) {
pr_info_once("kernel page table isolation forced %s by command line option\n",
@@ -862,7 +865,9 @@ static bool unmap_kernel_at_el0(const struct arm64_cpu_capabilities *entry,
if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
return true;

- return false;
+ /* Defer to CPU feature registers */
+ return !cpuid_feature_extract_unsigned_field(pfr0,
+ ID_AA64PFR0_CSV3_SHIFT);
}

static int __init parse_kpti(char *str)
--
2.1.4

2018-01-04 15:08:47

by Will Deacon

Subject: [PATCH 02/11] arm64: Kconfig: Reword UNMAP_KERNEL_AT_EL0 kconfig entry

Although CONFIG_UNMAP_KERNEL_AT_EL0 does make KASLR more robust, it's
actually more useful as a mitigation against speculation attacks that
can leak arbitrary kernel data to userspace.

Reword the Kconfig help message to reflect this, and make the option
depend on EXPERT so that it is on by default for the majority of users.

Signed-off-by: Will Deacon <[email protected]>
---
arch/arm64/Kconfig | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3af1657fcac3..efaaa3a66b95 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -834,15 +834,14 @@ config FORCE_MAX_ZONEORDER
4M allocations matching the default size used by generic code.

config UNMAP_KERNEL_AT_EL0
- bool "Unmap kernel when running in userspace (aka \"KAISER\")"
+ bool "Unmap kernel when running in userspace (aka \"KAISER\")" if EXPERT
default y
help
- Some attacks against KASLR make use of the timing difference between
- a permission fault which could arise from a page table entry that is
- present in the TLB, and a translation fault which always requires a
- page table walk. This option defends against these attacks by unmapping
- the kernel whilst running in userspace, therefore forcing translation
- faults for all of kernel space.
+ Speculation attacks against some high-performance processors can
+ be used to bypass MMU permission checks and leak kernel data to
+ userspace. This can be defended against by unmapping the kernel
+ when running in userspace, mapping it back in on exception entry
+ via a trampoline page in the vector table.

If unsure, say Y.

--
2.1.4

2018-01-04 15:10:29

by Will Deacon

Subject: [PATCH 10/11] arm64: cputype: Add missing MIDR values for Cortex-A72 and Cortex-A75

Hook up MIDR values for the Cortex-A72 and Cortex-A75 CPUs, since they
will soon need MIDR matches for hardening the branch predictor.

Signed-off-by: Will Deacon <[email protected]>
---
arch/arm64/include/asm/cputype.h | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index 235e77d98261..84385b94e70b 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -79,8 +79,10 @@
#define ARM_CPU_PART_AEM_V8 0xD0F
#define ARM_CPU_PART_FOUNDATION 0xD00
#define ARM_CPU_PART_CORTEX_A57 0xD07
+#define ARM_CPU_PART_CORTEX_A72 0xD08
#define ARM_CPU_PART_CORTEX_A53 0xD03
#define ARM_CPU_PART_CORTEX_A73 0xD09
+#define ARM_CPU_PART_CORTEX_A75 0xD0A

#define APM_CPU_PART_POTENZA 0x000

@@ -94,7 +96,9 @@

#define MIDR_CORTEX_A53 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A53)
#define MIDR_CORTEX_A57 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A57)
+#define MIDR_CORTEX_A72 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A72)
#define MIDR_CORTEX_A73 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A73)
+#define MIDR_CORTEX_A75 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A75)
#define MIDR_THUNDERX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX)
#define MIDR_THUNDERX_81XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_81XX)
#define MIDR_THUNDERX_83XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_83XX)
--
2.1.4

2018-01-04 15:10:30

by Will Deacon

Subject: [PATCH 07/11] arm64: Add skeleton to harden the branch predictor against aliasing attacks

Aliasing attacks against CPU branch predictors can allow an attacker to
redirect speculative control flow on some CPUs and potentially divulge
information from one context to another.

This patch adds initial skeleton code behind a new Kconfig option to
enable implementation-specific mitigations against these attacks for
CPUs that are affected.
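
As a minimal sketch of how an affected CPU is expected to plug into this
skeleton (all names below are illustrative, not part of this patch), the
errata ->enable hook registers a hardening callback, which
arm64_apply_bp_hardening() then invokes at the instrumented context changes:

/* Hypothetical example; my_bp_invalidate() and my_bp_vecs_{start,end}
 * stand in for an implementation-specific invalidation sequence. */
extern char my_bp_vecs_start[], my_bp_vecs_end[];

static void my_bp_invalidate(void)
{
	/* CPU-specific branch predictor invalidation goes here */
}

static int enable_my_bp_hardening(void *data)
{
	const struct arm64_cpu_capabilities *entry = data;

	install_bp_hardening_cb(entry, my_bp_invalidate,
				my_bp_vecs_start, my_bp_vecs_end);
	return 0;
}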

Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
---
arch/arm64/Kconfig | 17 +++++++++
arch/arm64/include/asm/cpucaps.h | 3 +-
arch/arm64/include/asm/mmu.h | 37 ++++++++++++++++++++
arch/arm64/include/asm/sysreg.h | 1 +
arch/arm64/kernel/Makefile | 4 +++
arch/arm64/kernel/bpi.S | 55 +++++++++++++++++++++++++++++
arch/arm64/kernel/cpu_errata.c | 74 ++++++++++++++++++++++++++++++++++++++++
arch/arm64/kernel/cpufeature.c | 1 +
arch/arm64/mm/context.c | 2 ++
arch/arm64/mm/fault.c | 1 +
10 files changed, 194 insertions(+), 1 deletion(-)
create mode 100644 arch/arm64/kernel/bpi.S

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index efaaa3a66b95..cea44b95187c 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -845,6 +845,23 @@ config UNMAP_KERNEL_AT_EL0

If unsure, say Y.

+config HARDEN_BRANCH_PREDICTOR
+ bool "Harden the branch predictor against aliasing attacks" if EXPERT
+ default y
+ help
+ Speculation attacks against some high-performance processors rely on
+ being able to manipulate the branch predictor for a victim context by
+ executing aliasing branches in the attacker context. Such attacks
+ can be partially mitigated against by clearing internal branch
+ predictor state and limiting the prediction logic in some situations.
+
+ This config option will take CPU-specific actions to harden the
+ branch predictor against aliasing attacks and may rely on specific
+ instruction sequences or control bits being set by the system
+ firmware.
+
+ If unsure, say Y.
+
menuconfig ARMV8_DEPRECATED
bool "Emulate deprecated/obsolete ARMv8 instructions"
depends on COMPAT
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index b4537ffd1018..51616e77fe6b 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -42,7 +42,8 @@
#define ARM64_HAS_DCPOP 21
#define ARM64_SVE 22
#define ARM64_UNMAP_KERNEL_AT_EL0 23
+#define ARM64_HARDEN_BRANCH_PREDICTOR 24

-#define ARM64_NCAPS 24
+#define ARM64_NCAPS 25

#endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 6f7bdb89817f..6dd83d75b82a 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -41,6 +41,43 @@ static inline bool arm64_kernel_unmapped_at_el0(void)
cpus_have_const_cap(ARM64_UNMAP_KERNEL_AT_EL0);
}

+typedef void (*bp_hardening_cb_t)(void);
+
+struct bp_hardening_data {
+ int hyp_vectors_slot;
+ bp_hardening_cb_t fn;
+};
+
+#ifdef CONFIG_HARDEN_BRANCH_PREDICTOR
+extern char __bp_harden_hyp_vecs_start[], __bp_harden_hyp_vecs_end[];
+
+DECLARE_PER_CPU_READ_MOSTLY(struct bp_hardening_data, bp_hardening_data);
+
+static inline struct bp_hardening_data *arm64_get_bp_hardening_data(void)
+{
+ return this_cpu_ptr(&bp_hardening_data);
+}
+
+static inline void arm64_apply_bp_hardening(void)
+{
+ struct bp_hardening_data *d;
+
+ if (!cpus_have_const_cap(ARM64_HARDEN_BRANCH_PREDICTOR))
+ return;
+
+ d = arm64_get_bp_hardening_data();
+ if (d->fn)
+ d->fn();
+}
+#else
+static inline struct bp_hardening_data *arm64_get_bp_hardening_data(void)
+{
+ return NULL;
+}
+
+static inline void arm64_apply_bp_hardening(void) { }
+#endif /* CONFIG_HARDEN_BRANCH_PREDICTOR */
+
extern void paging_init(void);
extern void bootmem_init(void);
extern void __iomem *early_io_map(phys_addr_t phys, unsigned long virt);
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index ae519bbd3f9e..871744973ece 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -438,6 +438,7 @@

/* id_aa64pfr0 */
#define ID_AA64PFR0_CSV3_SHIFT 60
+#define ID_AA64PFR0_CSV2_SHIFT 56
#define ID_AA64PFR0_SVE_SHIFT 32
#define ID_AA64PFR0_GIC_SHIFT 24
#define ID_AA64PFR0_ASIMD_SHIFT 20
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 067baace74a0..0c760db04858 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -53,6 +53,10 @@ arm64-obj-$(CONFIG_ARM64_RELOC_TEST) += arm64-reloc-test.o
arm64-reloc-test-y := reloc_test_core.o reloc_test_syms.o
arm64-obj-$(CONFIG_CRASH_DUMP) += crash_dump.o

+ifeq ($(CONFIG_KVM),y)
+arm64-obj-$(CONFIG_HARDEN_BRANCH_PREDICTOR) += bpi.o
+endif
+
obj-y += $(arm64-obj-y) vdso/ probes/
obj-m += $(arm64-obj-m)
head-y := head.o
diff --git a/arch/arm64/kernel/bpi.S b/arch/arm64/kernel/bpi.S
new file mode 100644
index 000000000000..06a931eb2673
--- /dev/null
+++ b/arch/arm64/kernel/bpi.S
@@ -0,0 +1,55 @@
+/*
+ * Contains CPU specific branch predictor invalidation sequences
+ *
+ * Copyright (C) 2018 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/linkage.h>
+
+.macro ventry target
+ .rept 31
+ nop
+ .endr
+ b \target
+.endm
+
+.macro vectors target
+ ventry \target + 0x000
+ ventry \target + 0x080
+ ventry \target + 0x100
+ ventry \target + 0x180
+
+ ventry \target + 0x200
+ ventry \target + 0x280
+ ventry \target + 0x300
+ ventry \target + 0x380
+
+ ventry \target + 0x400
+ ventry \target + 0x480
+ ventry \target + 0x500
+ ventry \target + 0x580
+
+ ventry \target + 0x600
+ ventry \target + 0x680
+ ventry \target + 0x700
+ ventry \target + 0x780
+.endm
+
+ .align 11
+ENTRY(__bp_harden_hyp_vecs_start)
+ .rept 4
+ vectors __kvm_hyp_vector
+ .endr
+ENTRY(__bp_harden_hyp_vecs_end)
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 0e27f86ee709..16ea5c6f314e 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -46,6 +46,80 @@ static int cpu_enable_trap_ctr_access(void *__unused)
return 0;
}

+#ifdef CONFIG_HARDEN_BRANCH_PREDICTOR
+#include <asm/mmu_context.h>
+#include <asm/cacheflush.h>
+
+DEFINE_PER_CPU_READ_MOSTLY(struct bp_hardening_data, bp_hardening_data);
+
+#ifdef CONFIG_KVM
+static void __copy_hyp_vect_bpi(int slot, const char *hyp_vecs_start,
+ const char *hyp_vecs_end)
+{
+ void *dst = lm_alias(__bp_harden_hyp_vecs_start + slot * SZ_2K);
+ int i;
+
+ for (i = 0; i < SZ_2K; i += 0x80)
+ memcpy(dst + i, hyp_vecs_start, hyp_vecs_end - hyp_vecs_start);
+
+ flush_icache_range((uintptr_t)dst, (uintptr_t)dst + SZ_2K);
+}
+
+static void __install_bp_hardening_cb(bp_hardening_cb_t fn,
+ const char *hyp_vecs_start,
+ const char *hyp_vecs_end)
+{
+ static int last_slot = -1;
+ static DEFINE_SPINLOCK(bp_lock);
+ int cpu, slot = -1;
+
+ spin_lock(&bp_lock);
+ for_each_possible_cpu(cpu) {
+ if (per_cpu(bp_hardening_data.fn, cpu) == fn) {
+ slot = per_cpu(bp_hardening_data.hyp_vectors_slot, cpu);
+ break;
+ }
+ }
+
+ if (slot == -1) {
+ last_slot++;
+ BUG_ON(((__bp_harden_hyp_vecs_end - __bp_harden_hyp_vecs_start)
+ / SZ_2K) <= last_slot);
+ slot = last_slot;
+ __copy_hyp_vect_bpi(slot, hyp_vecs_start, hyp_vecs_end);
+ }
+
+ __this_cpu_write(bp_hardening_data.hyp_vectors_slot, slot);
+ __this_cpu_write(bp_hardening_data.fn, fn);
+ spin_unlock(&bp_lock);
+}
+#else
+static void __install_bp_hardening_cb(bp_hardening_cb_t fn,
+ const char *hyp_vecs_start,
+ const char *hyp_vecs_end)
+{
+ __this_cpu_write(bp_hardening_data.fn, fn);
+}
+#endif /* CONFIG_KVM */
+
+static void install_bp_hardening_cb(const struct arm64_cpu_capabilities *entry,
+ bp_hardening_cb_t fn,
+ const char *hyp_vecs_start,
+ const char *hyp_vecs_end)
+{
+ u64 pfr0;
+
+ if (!entry->matches(entry, SCOPE_LOCAL_CPU))
+ return;
+
+ pfr0 = read_cpuid(ID_AA64PFR0_EL1);
+ if (cpuid_feature_extract_unsigned_field(pfr0, ID_AA64PFR0_CSV2_SHIFT))
+ return;
+
+ __install_bp_hardening_cb(fn, hyp_vecs_start, hyp_vecs_end);
+}
+#endif /* CONFIG_HARDEN_BRANCH_PREDICTOR */
+
#define MIDR_RANGE(model, min, max) \
.def_scope = SCOPE_LOCAL_CPU, \
.matches = is_affected_midr_range, \
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 6133c14b9b01..19ed09b0bb24 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -146,6 +146,7 @@ static const struct arm64_ftr_bits ftr_id_aa64isar1[] = {

static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_CSV3_SHIFT, 4, 0),
+ ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_CSV2_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_SVE_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_GIC_SHIFT, 4, 0),
S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_ASIMD_SHIFT, 4, ID_AA64PFR0_ASIMD_NI),
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index c1e3b6479c8f..f1d99ffc77d1 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -246,6 +246,8 @@ asmlinkage void post_ttbr_update_workaround(void)
"ic iallu; dsb nsh; isb",
ARM64_WORKAROUND_CAVIUM_27456,
CONFIG_CAVIUM_ERRATUM_27456));
+
+ arm64_apply_bp_hardening();
}

static int asids_init(void)
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 22168cd0dde7..5203b6040cb6 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -318,6 +318,7 @@ static void __do_user_fault(struct task_struct *tsk, unsigned long addr,
lsb = PAGE_SHIFT;
si.si_addr_lsb = lsb;

+ arm64_apply_bp_hardening();
force_sig_info(sig, &si, tsk);
}

--
2.1.4

2018-01-04 15:10:31

by Will Deacon

Subject: [PATCH 08/11] arm64: KVM: Use per-CPU vector when BP hardening is enabled

From: Marc Zyngier <[email protected]>

Now that we have per-CPU vectors, let's plug them into the KVM/arm64 code.

Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
---
arch/arm/include/asm/kvm_mmu.h | 10 ++++++++++
arch/arm64/include/asm/kvm_mmu.h | 38 ++++++++++++++++++++++++++++++++++++++
arch/arm64/kvm/hyp/switch.c | 2 +-
virt/kvm/arm/arm.c | 8 +++++++-
4 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index fa6f2174276b..eb46fc81a440 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -221,6 +221,16 @@ static inline unsigned int kvm_get_vmid_bits(void)
return 8;
}

+static inline void *kvm_get_hyp_vector(void)
+{
+ return kvm_ksym_ref(__kvm_hyp_vector);
+}
+
+static inline int kvm_map_vectors(void)
+{
+ return 0;
+}
+
#endif /* !__ASSEMBLY__ */

#endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 672c8684d5c2..2d6d4bd9de52 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -309,5 +309,43 @@ static inline unsigned int kvm_get_vmid_bits(void)
return (cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR1_VMIDBITS_SHIFT) == 2) ? 16 : 8;
}

+#ifdef CONFIG_HARDEN_BRANCH_PREDICTOR
+#include <asm/mmu.h>
+
+static inline void *kvm_get_hyp_vector(void)
+{
+ struct bp_hardening_data *data = arm64_get_bp_hardening_data();
+ void *vect = kvm_ksym_ref(__kvm_hyp_vector);
+
+ if (data->fn) {
+ vect = __bp_harden_hyp_vecs_start +
+ data->hyp_vectors_slot * SZ_2K;
+
+ if (!has_vhe())
+ vect = lm_alias(vect);
+ }
+
+ return vect;
+}
+
+static inline int kvm_map_vectors(void)
+{
+ return create_hyp_mappings(kvm_ksym_ref(__bp_harden_hyp_vecs_start),
+ kvm_ksym_ref(__bp_harden_hyp_vecs_end),
+ PAGE_HYP_EXEC);
+}
+
+#else
+static inline void *kvm_get_hyp_vector(void)
+{
+ return kvm_ksym_ref(__kvm_hyp_vector);
+}
+
+static inline int kvm_map_vectors(void)
+{
+ return 0;
+}
+#endif
+
#endif /* __ASSEMBLY__ */
#endif /* __ARM64_KVM_MMU_H__ */
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index f7c651f3a8c0..8d4f3c9d6dc4 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -52,7 +52,7 @@ static void __hyp_text __activate_traps_vhe(void)
val &= ~(CPACR_EL1_FPEN | CPACR_EL1_ZEN);
write_sysreg(val, cpacr_el1);

- write_sysreg(__kvm_hyp_vector, vbar_el1);
+ write_sysreg(kvm_get_hyp_vector(), vbar_el1);
}

static void __hyp_text __activate_traps_nvhe(void)
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 6b60c98a6e22..1c9fdb6db124 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -1158,7 +1158,7 @@ static void cpu_init_hyp_mode(void *dummy)
pgd_ptr = kvm_mmu_get_httbr();
stack_page = __this_cpu_read(kvm_arm_hyp_stack_page);
hyp_stack_ptr = stack_page + PAGE_SIZE;
- vector_ptr = (unsigned long)kvm_ksym_ref(__kvm_hyp_vector);
+ vector_ptr = (unsigned long)kvm_get_hyp_vector();

__cpu_init_hyp_mode(pgd_ptr, hyp_stack_ptr, vector_ptr);
__cpu_init_stage2();
@@ -1403,6 +1403,12 @@ static int init_hyp_mode(void)
goto out_err;
}

+ err = kvm_map_vectors();
+ if (err) {
+ kvm_err("Cannot map vectors\n");
+ goto out_err;
+ }
+
/*
* Map the Hyp stack pages
*/
--
2.1.4

2018-01-04 15:10:34

by Will Deacon

Subject: [PATCH 05/11] drivers/firmware: Expose psci_get_version through psci_ops structure

Entry into recent versions of ARM Trusted Firmware will invalidate the CPU
branch predictor state in order to protect against aliasing attacks.

This patch exposes the PSCI "VERSION" function via psci_ops, so that it
can be invoked outside of the PSCI driver where necessary.
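
As a sketch of the intended use (the caller below is hypothetical), an
affected CPU can then trigger a predictor invalidation simply by issuing
the VERSION call through the new hook; the returned value is irrelevant,
since the firmware entry itself does the work:

#include <linux/psci.h>

/* Hypothetical caller: the version number is discarded; entering the
 * firmware is what invalidates the branch predictor state. */
static void invalidate_bp_via_psci(void)
{
	if (psci_ops.get_version)
		psci_ops.get_version();
}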

Signed-off-by: Will Deacon <[email protected]>
---
drivers/firmware/psci.c | 2 ++
include/linux/psci.h | 1 +
2 files changed, 3 insertions(+)

diff --git a/drivers/firmware/psci.c b/drivers/firmware/psci.c
index d687ca3d5049..8b25d31e8401 100644
--- a/drivers/firmware/psci.c
+++ b/drivers/firmware/psci.c
@@ -496,6 +496,8 @@ static void __init psci_init_migrate(void)
static void __init psci_0_2_set_functions(void)
{
pr_info("Using standard PSCI v0.2 function IDs\n");
+ psci_ops.get_version = psci_get_version;
+
psci_function_id[PSCI_FN_CPU_SUSPEND] =
PSCI_FN_NATIVE(0_2, CPU_SUSPEND);
psci_ops.cpu_suspend = psci_cpu_suspend;
diff --git a/include/linux/psci.h b/include/linux/psci.h
index bdea1cb5e1db..6306ab10af18 100644
--- a/include/linux/psci.h
+++ b/include/linux/psci.h
@@ -26,6 +26,7 @@ int psci_cpu_init_idle(unsigned int cpu);
int psci_cpu_suspend_enter(unsigned long index);

struct psci_operations {
+ u32 (*get_version)(void);
int (*cpu_suspend)(u32 state, unsigned long entry_point);
int (*cpu_off)(u32 state);
int (*cpu_on)(unsigned long cpuid, unsigned long entry_point);
--
2.1.4

2018-01-04 15:10:33

by Will Deacon

Subject: [PATCH 06/11] arm64: Move post_ttbr_update_workaround to C code

From: Marc Zyngier <[email protected]>

We will soon need to invoke a CPU-specific function pointer after changing
page tables, so move post_ttbr_update_workaround out into C code to make
this possible.

Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
---
arch/arm64/include/asm/assembler.h | 13 -------------
arch/arm64/kernel/entry.S | 2 +-
arch/arm64/mm/context.c | 9 +++++++++
arch/arm64/mm/proc.S | 3 +--
4 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index c45bc94f15d0..cee60ce0da52 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -476,17 +476,4 @@ alternative_endif
mrs \rd, sp_el0
.endm

-/*
- * Errata workaround post TTBRx_EL1 update.
- */
- .macro post_ttbr_update_workaround
-#ifdef CONFIG_CAVIUM_ERRATUM_27456
-alternative_if ARM64_WORKAROUND_CAVIUM_27456
- ic iallu
- dsb nsh
- isb
-alternative_else_nop_endif
-#endif
- .endm
-
#endif /* __ASM_ASSEMBLER_H */
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index b9feb587294d..6aa112baf601 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -277,7 +277,7 @@ alternative_else_nop_endif
* Cavium erratum 27456 (broadcast TLBI instructions may cause I-cache
* corruption).
*/
- post_ttbr_update_workaround
+ bl post_ttbr_update_workaround
.endif
1:
.if \el != 0
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index 1cb3bc92ae5c..c1e3b6479c8f 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -239,6 +239,15 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
cpu_switch_mm(mm->pgd, mm);
}

+/* Errata workaround post TTBRx_EL1 update. */
+asmlinkage void post_ttbr_update_workaround(void)
+{
+ asm volatile(ALTERNATIVE("nop; nop; nop",
+ "ic iallu; dsb nsh; isb",
+ ARM64_WORKAROUND_CAVIUM_27456,
+ CONFIG_CAVIUM_ERRATUM_27456));
+}
+
static int asids_init(void)
{
asid_bits = get_cpu_asid_bits();
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 3146dc96f05b..6affb68a9a14 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -145,8 +145,7 @@ ENTRY(cpu_do_switch_mm)
isb
msr ttbr0_el1, x0 // now update TTBR0
isb
- post_ttbr_update_workaround
- ret
+ b post_ttbr_update_workaround // Back to C code...
ENDPROC(cpu_do_switch_mm)

.pushsection ".idmap.text", "ax"
--
2.1.4

2018-01-04 15:10:27

by Will Deacon

Subject: [PATCH 09/11] arm64: KVM: Make PSCI_VERSION a fast path

From: Marc Zyngier <[email protected]>

For those CPUs that require PSCI to perform a BP invalidation,
going all the way to the PSCI code for not much is a waste of
precious cycles. Let's terminate that call as early as possible.

Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
---
arch/arm64/kvm/hyp/switch.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 8d4f3c9d6dc4..4d273f6d0e69 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -17,6 +17,7 @@

#include <linux/types.h>
#include <linux/jump_label.h>
+#include <uapi/linux/psci.h>

#include <asm/kvm_asm.h>
#include <asm/kvm_emulate.h>
@@ -341,6 +342,18 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
if (exit_code == ARM_EXCEPTION_TRAP && !__populate_fault_info(vcpu))
goto again;

+ if (exit_code == ARM_EXCEPTION_TRAP &&
+ (kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_HVC64 ||
+ kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_HVC32) &&
+ vcpu_get_reg(vcpu, 0) == PSCI_0_2_FN_PSCI_VERSION) {
+ u64 val = PSCI_RET_NOT_SUPPORTED;
+ if (test_bit(KVM_ARM_VCPU_PSCI_0_2, vcpu->arch.features))
+ val = 2;
+
+ vcpu_set_reg(vcpu, 0, val);
+ goto again;
+ }
+
if (static_branch_unlikely(&vgic_v2_cpuif_trap) &&
exit_code == ARM_EXCEPTION_TRAP) {
bool valid;
--
2.1.4

2018-01-04 15:10:26

by Will Deacon

Subject: [PATCH 11/11] arm64: Implement branch predictor hardening for affected Cortex-A CPUs

Cortex-A57, A72, A73 and A75 are susceptible to branch predictor aliasing
and can theoretically be attacked by malicious code.

This patch implements a PSCI-based mitigation for these CPUs when available.
The call into firmware will invalidate the branch predictor state, preventing
any malicious entries from affecting other victim contexts.

Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
---
arch/arm64/kernel/bpi.S | 24 ++++++++++++++++++++++++
arch/arm64/kernel/cpu_errata.c | 42 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 66 insertions(+)

diff --git a/arch/arm64/kernel/bpi.S b/arch/arm64/kernel/bpi.S
index 06a931eb2673..2b10d52a0321 100644
--- a/arch/arm64/kernel/bpi.S
+++ b/arch/arm64/kernel/bpi.S
@@ -53,3 +53,27 @@ ENTRY(__bp_harden_hyp_vecs_start)
vectors __kvm_hyp_vector
.endr
ENTRY(__bp_harden_hyp_vecs_end)
+ENTRY(__psci_hyp_bp_inval_start)
+ stp x0, x1, [sp, #-16]!
+ stp x2, x3, [sp, #-16]!
+ stp x4, x5, [sp, #-16]!
+ stp x6, x7, [sp, #-16]!
+ stp x8, x9, [sp, #-16]!
+ stp x10, x11, [sp, #-16]!
+ stp x12, x13, [sp, #-16]!
+ stp x14, x15, [sp, #-16]!
+ stp x16, x17, [sp, #-16]!
+ stp x18, x19, [sp, #-16]!
+ mov x0, #0x84000000
+ smc #0
+ ldp x18, x19, [sp], #16
+ ldp x16, x17, [sp], #16
+ ldp x14, x15, [sp], #16
+ ldp x12, x13, [sp], #16
+ ldp x10, x11, [sp], #16
+ ldp x8, x9, [sp], #16
+ ldp x6, x7, [sp], #16
+ ldp x4, x5, [sp], #16
+ ldp x2, x3, [sp], #16
+ ldp x0, x1, [sp], #16
+ENTRY(__psci_hyp_bp_inval_end)
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 16ea5c6f314e..cb0fb3796bb8 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -53,6 +53,8 @@ static int cpu_enable_trap_ctr_access(void *__unused)
DEFINE_PER_CPU_READ_MOSTLY(struct bp_hardening_data, bp_hardening_data);

#ifdef CONFIG_KVM
+extern char __psci_hyp_bp_inval_start[], __psci_hyp_bp_inval_end[];
+
static void __copy_hyp_vect_bpi(int slot, const char *hyp_vecs_start,
const char *hyp_vecs_end)
{
@@ -94,6 +96,9 @@ static void __install_bp_hardening_cb(bp_hardening_cb_t fn,
spin_unlock(&bp_lock);
}
#else
+#define __psci_hyp_bp_inval_start NULL
+#define __psci_hyp_bp_inval_end NULL
+
static void __install_bp_hardening_cb(bp_hardening_cb_t fn,
const char *hyp_vecs_start,
const char *hyp_vecs_end)
@@ -118,6 +123,21 @@ static void install_bp_hardening_cb(const struct arm64_cpu_capabilities *entry,

__install_bp_hardening_cb(fn, hyp_vecs_start, hyp_vecs_end);
}
+
+#include <linux/psci.h>
+
+static int enable_psci_bp_hardening(void *data)
+{
+ const struct arm64_cpu_capabilities *entry = data;
+
+ if (psci_ops.get_version)
+ install_bp_hardening_cb(entry,
+ (bp_hardening_cb_t)psci_ops.get_version,
+ __psci_hyp_bp_inval_start,
+ __psci_hyp_bp_inval_end);
+
+ return 0;
+}
#endif /* CONFIG_HARDEN_BRANCH_PREDICTOR */

#define MIDR_RANGE(model, min, max) \
@@ -261,6 +281,28 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
MIDR_ALL_VERSIONS(MIDR_CORTEX_A73),
},
#endif
+#ifdef CONFIG_HARDEN_BRANCH_PREDICTOR
+ {
+ .capability = ARM64_HARDEN_BRANCH_PREDICTOR,
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A57),
+ .enable = enable_psci_bp_hardening,
+ },
+ {
+ .capability = ARM64_HARDEN_BRANCH_PREDICTOR,
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A72),
+ .enable = enable_psci_bp_hardening,
+ },
+ {
+ .capability = ARM64_HARDEN_BRANCH_PREDICTOR,
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A73),
+ .enable = enable_psci_bp_hardening,
+ },
+ {
+ .capability = ARM64_HARDEN_BRANCH_PREDICTOR,
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A75),
+ .enable = enable_psci_bp_hardening,
+ },
+#endif
{
}
};
--
2.1.4

2018-01-04 15:39:46

by Christoph Hellwig

Subject: Re: [PATCH 02/11] arm64: Kconfig: Reword UNMAP_KERNEL_AT_EL0 kconfig entry

On Thu, Jan 04, 2018 at 03:08:26PM +0000, Will Deacon wrote:
> Although CONFIG_UNMAP_KERNEL_AT_EL0 does make KASLR more robust, it's
> actually more useful as a mitigation against speculation attacks that
> can leak arbitrary kernel data to userspace.
>
> Reword the Kconfig help message to reflect this, and make the option
> depend on EXPERT so that it is on by default for the majority of users.
>
> Signed-off-by: Will Deacon <[email protected]>

Why is this not reusing the PAGE_TABLE_ISOLATION setting in
security/Kconfig?

2018-01-04 16:09:20

by Lorenzo Pieralisi

Subject: Re: [PATCH 05/11] drivers/firmware: Expose psci_get_version through psci_ops structure

On Thu, Jan 04, 2018 at 03:08:29PM +0000, Will Deacon wrote:
> Entry into recent versions of ARM Trusted Firmware will invalidate the CPU
> branch predictor state in order to protect against aliasing attacks.
>
> This patch exposes the PSCI "VERSION" function via psci_ops, so that it
> can be invoked outside of the PSCI driver where necessary.
>
> Signed-off-by: Will Deacon <[email protected]>
> ---
> drivers/firmware/psci.c | 2 ++
> include/linux/psci.h | 1 +
> 2 files changed, 3 insertions(+)

Acked-by: Lorenzo Pieralisi <[email protected]>

> diff --git a/drivers/firmware/psci.c b/drivers/firmware/psci.c
> index d687ca3d5049..8b25d31e8401 100644
> --- a/drivers/firmware/psci.c
> +++ b/drivers/firmware/psci.c
> @@ -496,6 +496,8 @@ static void __init psci_init_migrate(void)
> static void __init psci_0_2_set_functions(void)
> {
> pr_info("Using standard PSCI v0.2 function IDs\n");
> + psci_ops.get_version = psci_get_version;
> +
> psci_function_id[PSCI_FN_CPU_SUSPEND] =
> PSCI_FN_NATIVE(0_2, CPU_SUSPEND);
> psci_ops.cpu_suspend = psci_cpu_suspend;
> diff --git a/include/linux/psci.h b/include/linux/psci.h
> index bdea1cb5e1db..6306ab10af18 100644
> --- a/include/linux/psci.h
> +++ b/include/linux/psci.h
> @@ -26,6 +26,7 @@ int psci_cpu_init_idle(unsigned int cpu);
> int psci_cpu_suspend_enter(unsigned long index);
>
> struct psci_operations {
> + u32 (*get_version)(void);
> int (*cpu_suspend)(u32 state, unsigned long entry_point);
> int (*cpu_off)(u32 state);
> int (*cpu_on)(unsigned long cpuid, unsigned long entry_point);
> --
> 2.1.4
>

2018-01-04 16:24:24

by Ard Biesheuvel

Subject: Re: [PATCH 01/11] arm64: use RET instruction for exiting the trampoline

On 4 January 2018 at 15:08, Will Deacon <[email protected]> wrote:
> Speculation attacks against the entry trampoline can potentially resteer
> the speculative instruction stream through the indirect branch and into
> arbitrary gadgets within the kernel.
>
> This patch defends against these attacks by forcing a misprediction
> through the return stack: a dummy BL instruction loads an entry into
> the stack, so that the predicted program flow of the subsequent RET
> instruction is to a branch-to-self instruction which is finally resolved
> as a branch to the kernel vectors with speculation suppressed.
>
> Signed-off-by: Will Deacon <[email protected]>
> ---
> arch/arm64/kernel/entry.S | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index 031392ee5f47..b9feb587294d 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -1029,6 +1029,9 @@ alternative_else_nop_endif
> .if \regsize == 64
> msr tpidrro_el0, x30 // Restored in kernel_ventry
> .endif
> + bl 2f
> + b .
> +2:

This deserves a comment, I guess?

Also, is deliberately unbalancing the return stack likely to cause
performance problems, e.g., in libc hot paths?

> tramp_map_kernel x30
> #ifdef CONFIG_RANDOMIZE_BASE
> adr x30, tramp_vectors + PAGE_SIZE
> @@ -1041,7 +1044,7 @@ alternative_insn isb, nop, ARM64_WORKAROUND_QCOM_FALKOR_E1003
> msr vbar_el1, x30
> add x30, x30, #(1b - tramp_vectors)
> isb
> - br x30
> + ret
> .endm
>
> .macro tramp_exit, regsize = 64
> --
> 2.1.4
>

2018-01-04 16:25:53

by Ard Biesheuvel

Subject: Re: [PATCH 06/11] arm64: Move post_ttbr_update_workaround to C code

On 4 January 2018 at 15:08, Will Deacon <[email protected]> wrote:
> From: Marc Zyngier <[email protected]>
>
> We will soon need to invoke a CPU-specific function pointer after changing
> page tables, so move post_ttbr_update_workaround out into C code to make
> this possible.
>
> Signed-off-by: Marc Zyngier <[email protected]>
> Signed-off-by: Will Deacon <[email protected]>
> ---
> arch/arm64/include/asm/assembler.h | 13 -------------
> arch/arm64/kernel/entry.S | 2 +-
> arch/arm64/mm/context.c | 9 +++++++++
> arch/arm64/mm/proc.S | 3 +--
> 4 files changed, 11 insertions(+), 16 deletions(-)
>
> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
> index c45bc94f15d0..cee60ce0da52 100644
> --- a/arch/arm64/include/asm/assembler.h
> +++ b/arch/arm64/include/asm/assembler.h
> @@ -476,17 +476,4 @@ alternative_endif
> mrs \rd, sp_el0
> .endm
>
> -/*
> - * Errata workaround post TTBRx_EL1 update.
> - */
> - .macro post_ttbr_update_workaround
> -#ifdef CONFIG_CAVIUM_ERRATUM_27456
> -alternative_if ARM64_WORKAROUND_CAVIUM_27456
> - ic iallu
> - dsb nsh
> - isb
> -alternative_else_nop_endif
> -#endif
> - .endm
> -
> #endif /* __ASM_ASSEMBLER_H */
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index b9feb587294d..6aa112baf601 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -277,7 +277,7 @@ alternative_else_nop_endif
> * Cavium erratum 27456 (broadcast TLBI instructions may cause I-cache
> * corruption).
> */
> - post_ttbr_update_workaround
> + bl post_ttbr_update_workaround
> .endif
> 1:
> .if \el != 0
> diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
> index 1cb3bc92ae5c..c1e3b6479c8f 100644
> --- a/arch/arm64/mm/context.c
> +++ b/arch/arm64/mm/context.c
> @@ -239,6 +239,15 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
> cpu_switch_mm(mm->pgd, mm);
> }
>
> +/* Errata workaround post TTBRx_EL1 update. */
> +asmlinkage void post_ttbr_update_workaround(void)
> +{
> + asm volatile(ALTERNATIVE("nop; nop; nop",

What does 'volatile' add here?

> + "ic iallu; dsb nsh; isb",
> + ARM64_WORKAROUND_CAVIUM_27456,
> + CONFIG_CAVIUM_ERRATUM_27456));
> +}
> +
> static int asids_init(void)
> {
> asid_bits = get_cpu_asid_bits();
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 3146dc96f05b..6affb68a9a14 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -145,8 +145,7 @@ ENTRY(cpu_do_switch_mm)
> isb
> msr ttbr0_el1, x0 // now update TTBR0
> isb
> - post_ttbr_update_workaround
> - ret
> + b post_ttbr_update_workaround // Back to C code...
> ENDPROC(cpu_do_switch_mm)
>
> .pushsection ".idmap.text", "ax"
> --
> 2.1.4
>

2018-01-04 16:28:53

by Ard Biesheuvel

Subject: Re: [PATCH 08/11] arm64: KVM: Use per-CPU vector when BP hardening is enabled

On 4 January 2018 at 15:08, Will Deacon <[email protected]> wrote:
> From: Marc Zyngier <[email protected]>
>
> Now that we have per-CPU vectors, let's plug them into the KVM/arm64 code.
>

Why does bp hardening require per-cpu vectors?

> Signed-off-by: Marc Zyngier <[email protected]>
> Signed-off-by: Will Deacon <[email protected]>
> ---
> arch/arm/include/asm/kvm_mmu.h | 10 ++++++++++
> arch/arm64/include/asm/kvm_mmu.h | 38 ++++++++++++++++++++++++++++++++++++++
> arch/arm64/kvm/hyp/switch.c | 2 +-
> virt/kvm/arm/arm.c | 8 +++++++-
> 4 files changed, 56 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index fa6f2174276b..eb46fc81a440 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -221,6 +221,16 @@ static inline unsigned int kvm_get_vmid_bits(void)
> return 8;
> }
>
> +static inline void *kvm_get_hyp_vector(void)
> +{
> + return kvm_ksym_ref(__kvm_hyp_vector);
> +}
> +
> +static inline int kvm_map_vectors(void)
> +{
> + return 0;
> +}
> +
> #endif /* !__ASSEMBLY__ */
>
> #endif /* __ARM_KVM_MMU_H__ */
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 672c8684d5c2..2d6d4bd9de52 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -309,5 +309,43 @@ static inline unsigned int kvm_get_vmid_bits(void)
> return (cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR1_VMIDBITS_SHIFT) == 2) ? 16 : 8;
> }
>
> +#ifdef CONFIG_HARDEN_BRANCH_PREDICTOR
> +#include <asm/mmu.h>
> +
> +static inline void *kvm_get_hyp_vector(void)
> +{
> + struct bp_hardening_data *data = arm64_get_bp_hardening_data();
> + void *vect = kvm_ksym_ref(__kvm_hyp_vector);
> +
> + if (data->fn) {
> + vect = __bp_harden_hyp_vecs_start +
> + data->hyp_vectors_slot * SZ_2K;
> +
> + if (!has_vhe())
> + vect = lm_alias(vect);
> + }
> +
> + return vect;
> +}
> +
> +static inline int kvm_map_vectors(void)
> +{
> + return create_hyp_mappings(kvm_ksym_ref(__bp_harden_hyp_vecs_start),
> + kvm_ksym_ref(__bp_harden_hyp_vecs_end),
> + PAGE_HYP_EXEC);
> +}
> +
> +#else
> +static inline void *kvm_get_hyp_vector(void)
> +{
> + return kvm_ksym_ref(__kvm_hyp_vector);
> +}
> +
> +static inline int kvm_map_vectors(void)
> +{
> + return 0;
> +}
> +#endif
> +
> #endif /* __ASSEMBLY__ */
> #endif /* __ARM64_KVM_MMU_H__ */
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index f7c651f3a8c0..8d4f3c9d6dc4 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -52,7 +52,7 @@ static void __hyp_text __activate_traps_vhe(void)
> val &= ~(CPACR_EL1_FPEN | CPACR_EL1_ZEN);
> write_sysreg(val, cpacr_el1);
>
> - write_sysreg(__kvm_hyp_vector, vbar_el1);
> + write_sysreg(kvm_get_hyp_vector(), vbar_el1);
> }
>
> static void __hyp_text __activate_traps_nvhe(void)
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 6b60c98a6e22..1c9fdb6db124 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -1158,7 +1158,7 @@ static void cpu_init_hyp_mode(void *dummy)
> pgd_ptr = kvm_mmu_get_httbr();
> stack_page = __this_cpu_read(kvm_arm_hyp_stack_page);
> hyp_stack_ptr = stack_page + PAGE_SIZE;
> - vector_ptr = (unsigned long)kvm_ksym_ref(__kvm_hyp_vector);
> + vector_ptr = (unsigned long)kvm_get_hyp_vector();
>
> __cpu_init_hyp_mode(pgd_ptr, hyp_stack_ptr, vector_ptr);
> __cpu_init_stage2();
> @@ -1403,6 +1403,12 @@ static int init_hyp_mode(void)
> goto out_err;
> }
>
> + err = kvm_map_vectors();
> + if (err) {
> + kvm_err("Cannot map vectors\n");
> + goto out_err;
> + }
> +
> /*
> * Map the Hyp stack pages
> */
> --
> 2.1.4
>

2018-01-04 16:31:39

by Ard Biesheuvel

Subject: Re: [PATCH 11/11] arm64: Implement branch predictor hardening for affected Cortex-A CPUs

On 4 January 2018 at 15:08, Will Deacon <[email protected]> wrote:
> Cortex-A57, A72, A73 and A75 are susceptible to branch predictor aliasing
> and can theoretically be attacked by malicious code.
>
> This patch implements a PSCI-based mitigation for these CPUs when available.
> The call into firmware will invalidate the branch predictor state, preventing
> any malicious entries from affecting other victim contexts.
>
> Signed-off-by: Marc Zyngier <[email protected]>
> Signed-off-by: Will Deacon <[email protected]>
> ---
> arch/arm64/kernel/bpi.S | 24 ++++++++++++++++++++++++
> arch/arm64/kernel/cpu_errata.c | 42 ++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 66 insertions(+)
>
> diff --git a/arch/arm64/kernel/bpi.S b/arch/arm64/kernel/bpi.S
> index 06a931eb2673..2b10d52a0321 100644
> --- a/arch/arm64/kernel/bpi.S
> +++ b/arch/arm64/kernel/bpi.S
> @@ -53,3 +53,27 @@ ENTRY(__bp_harden_hyp_vecs_start)
> vectors __kvm_hyp_vector
> .endr
> ENTRY(__bp_harden_hyp_vecs_end)
> +ENTRY(__psci_hyp_bp_inval_start)
> + stp x0, x1, [sp, #-16]!
> + stp x2, x3, [sp, #-16]!
> + stp x4, x5, [sp, #-16]!
> + stp x6, x7, [sp, #-16]!
> + stp x8, x9, [sp, #-16]!
> + stp x10, x11, [sp, #-16]!
> + stp x12, x13, [sp, #-16]!
> + stp x14, x15, [sp, #-16]!
> + stp x16, x17, [sp, #-16]!
> + stp x18, x19, [sp, #-16]!

Would it be better to update sp only once here?
Also, do x18 and x19 need to be preserved/restored here?

> + mov x0, #0x84000000
> + smc #0
> + ldp x18, x19, [sp], #16
> + ldp x16, x17, [sp], #16
> + ldp x14, x15, [sp], #16
> + ldp x12, x13, [sp], #16
> + ldp x10, x11, [sp], #16
> + ldp x8, x9, [sp], #16
> + ldp x6, x7, [sp], #16
> + ldp x4, x5, [sp], #16
> + ldp x2, x3, [sp], #16
> + ldp x0, x1, [sp], #16
> +ENTRY(__psci_hyp_bp_inval_end)
> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> index 16ea5c6f314e..cb0fb3796bb8 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -53,6 +53,8 @@ static int cpu_enable_trap_ctr_access(void *__unused)
> DEFINE_PER_CPU_READ_MOSTLY(struct bp_hardening_data, bp_hardening_data);
>
> #ifdef CONFIG_KVM
> +extern char __psci_hyp_bp_inval_start[], __psci_hyp_bp_inval_end[];
> +
> static void __copy_hyp_vect_bpi(int slot, const char *hyp_vecs_start,
> const char *hyp_vecs_end)
> {
> @@ -94,6 +96,9 @@ static void __install_bp_hardening_cb(bp_hardening_cb_t fn,
> spin_unlock(&bp_lock);
> }
> #else
> +#define __psci_hyp_bp_inval_start NULL
> +#define __psci_hyp_bp_inval_end NULL
> +
> static void __install_bp_hardening_cb(bp_hardening_cb_t fn,
> const char *hyp_vecs_start,
> const char *hyp_vecs_end)
> @@ -118,6 +123,21 @@ static void install_bp_hardening_cb(const struct arm64_cpu_capabilities *entry,
>
> __install_bp_hardening_cb(fn, hyp_vecs_start, hyp_vecs_end);
> }
> +
> +#include <linux/psci.h>
> +
> +static int enable_psci_bp_hardening(void *data)
> +{
> + const struct arm64_cpu_capabilities *entry = data;
> +
> + if (psci_ops.get_version)
> + install_bp_hardening_cb(entry,
> + (bp_hardening_cb_t)psci_ops.get_version,
> + __psci_hyp_bp_inval_start,
> + __psci_hyp_bp_inval_end);
> +
> + return 0;
> +}
> #endif /* CONFIG_HARDEN_BRANCH_PREDICTOR */
>
> #define MIDR_RANGE(model, min, max) \
> @@ -261,6 +281,28 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
> MIDR_ALL_VERSIONS(MIDR_CORTEX_A73),
> },
> #endif
> +#ifdef CONFIG_HARDEN_BRANCH_PREDICTOR
> + {
> + .capability = ARM64_HARDEN_BRANCH_PREDICTOR,
> + MIDR_ALL_VERSIONS(MIDR_CORTEX_A57),
> + .enable = enable_psci_bp_hardening,
> + },
> + {
> + .capability = ARM64_HARDEN_BRANCH_PREDICTOR,
> + MIDR_ALL_VERSIONS(MIDR_CORTEX_A72),
> + .enable = enable_psci_bp_hardening,
> + },
> + {
> + .capability = ARM64_HARDEN_BRANCH_PREDICTOR,
> + MIDR_ALL_VERSIONS(MIDR_CORTEX_A73),
> + .enable = enable_psci_bp_hardening,
> + },
> + {
> + .capability = ARM64_HARDEN_BRANCH_PREDICTOR,
> + MIDR_ALL_VERSIONS(MIDR_CORTEX_A75),
> + .enable = enable_psci_bp_hardening,
> + },
> +#endif
> {
> }
> };
> --
> 2.1.4
>

2018-01-04 17:04:05

by Marc Zyngier

Subject: Re: [PATCH 08/11] arm64: KVM: Use per-CPU vector when BP hardening is enabled

On 04/01/18 16:28, Ard Biesheuvel wrote:
> On 4 January 2018 at 15:08, Will Deacon <[email protected]> wrote:
>> From: Marc Zyngier <[email protected]>
>>
>> Now that we have per-CPU vectors, let's plug them into the KVM/arm64 code.
>>
>
> Why does bp hardening require per-cpu vectors?

The description is not 100% accurate. We have per *CPU type* vectors.
This stems from the following, slightly conflicting requirements:

- We have systems with more than one CPU type (think big-little)
- Different implementations require different BP hardening sequences
- The BP hardening sequence must be executed before doing any branch

The natural solution is to have one set of vectors per CPU type,
containing the BP hardening sequence for that particular implementation,
ending with a branch to the common code.
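
Roughly, as a condensed, illustrative view of what the kvm_get_hyp_vector()
hunk does (the helper name here is made up, and the !VHE lm_alias() detail
is omitted):

/* Each CPU records the 2K vector slot holding its implementation's
 * hardening sequence; CPUs without a callback use the common vectors. */
static void *pick_hyp_vector(void)
{
	struct bp_hardening_data *d = arm64_get_bp_hardening_data();

	if (!d->fn)
		return kvm_ksym_ref(__kvm_hyp_vector);

	return __bp_harden_hyp_vecs_start + d->hyp_vectors_slot * SZ_2K;
}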

M.
--
Jazz is not dead. It just smells funny...

2018-01-04 17:05:19

by Ard Biesheuvel

Subject: Re: [PATCH 08/11] arm64: KVM: Use per-CPU vector when BP hardening is enabled

On 4 January 2018 at 17:04, Marc Zyngier <[email protected]> wrote:
> On 04/01/18 16:28, Ard Biesheuvel wrote:
>> On 4 January 2018 at 15:08, Will Deacon <[email protected]> wrote:
>>> From: Marc Zyngier <[email protected]>
>>>
>>> Now that we have per-CPU vectors, let's plug them into the KVM/arm64 code.
>>>
>>
>> Why does bp hardening require per-cpu vectors?
>
> The description is not 100% accurate. We have per *CPU type* vectors.
> This stems from the following, slightly conflicting requirements:
>
> - We have systems with more than one CPU type (think big-little)
> - Different implementations require different BP hardening sequences
> - The BP hardening sequence must be executed before doing any branch
>
> The natural solution is to have one set of vectors per CPU type,
> containing the BP hardening sequence for that particular implementation,
> ending with a branch to the common code.
>

Crystal clear, thanks.

2018-01-04 17:14:13

by Marc Zyngier

Subject: Re: [PATCH 11/11] arm64: Implement branch predictor hardening for affected Cortex-A CPUs

On 04/01/18 16:31, Ard Biesheuvel wrote:
> On 4 January 2018 at 15:08, Will Deacon <[email protected]> wrote:
>> Cortex-A57, A72, A73 and A75 are susceptible to branch predictor aliasing
>> and can theoretically be attacked by malicious code.
>>
>> This patch implements a PSCI-based mitigation for these CPUs when available.
>> The call into firmware will invalidate the branch predictor state, preventing
>> any malicious entries from affecting other victim contexts.
>>
>> Signed-off-by: Marc Zyngier <[email protected]>
>> Signed-off-by: Will Deacon <[email protected]>
>> ---
>> arch/arm64/kernel/bpi.S | 24 ++++++++++++++++++++++++
>> arch/arm64/kernel/cpu_errata.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 66 insertions(+)
>>
>> diff --git a/arch/arm64/kernel/bpi.S b/arch/arm64/kernel/bpi.S
>> index 06a931eb2673..2b10d52a0321 100644
>> --- a/arch/arm64/kernel/bpi.S
>> +++ b/arch/arm64/kernel/bpi.S
>> @@ -53,3 +53,27 @@ ENTRY(__bp_harden_hyp_vecs_start)
>> vectors __kvm_hyp_vector
>> .endr
>> ENTRY(__bp_harden_hyp_vecs_end)
>> +ENTRY(__psci_hyp_bp_inval_start)
>> + stp x0, x1, [sp, #-16]!
>> + stp x2, x3, [sp, #-16]!
>> + stp x4, x5, [sp, #-16]!
>> + stp x6, x7, [sp, #-16]!
>> + stp x8, x9, [sp, #-16]!
>> + stp x10, x11, [sp, #-16]!
>> + stp x12, x13, [sp, #-16]!
>> + stp x14, x15, [sp, #-16]!
>> + stp x16, x17, [sp, #-16]!
>> + stp x18, x19, [sp, #-16]!
>
> Would it be better to update sp only once here?

Maybe. I suppose that's quite uarch dependent, but worth trying.

> Also, do x18 and x19 need to be preserved/restored here?

My bad. I misread the SMCCC and thought I needed to save them too. For
reference, the text says:

"Registers X18-X30 and stack pointers SP_EL0 and SP_ELx are saved by
the function that is called, and must be preserved over the SMC or HVC
call."

I'll amend the patch.

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2018-01-04 18:32:00

by Will Deacon

Subject: Re: [PATCH 01/11] arm64: use RET instruction for exiting the trampoline

Hi Ard,

On Thu, Jan 04, 2018 at 04:24:22PM +0000, Ard Biesheuvel wrote:
> On 4 January 2018 at 15:08, Will Deacon <[email protected]> wrote:
> > Speculation attacks against the entry trampoline can potentially resteer
> > the speculative instruction stream through the indirect branch and into
> > arbitrary gadgets within the kernel.
> >
> > This patch defends against these attacks by forcing a misprediction
> > through the return stack: a dummy BL instruction loads an entry into
> > the stack, so that the predicted program flow of the subsequent RET
> > instruction is to a branch-to-self instruction which is finally resolved
> > as a branch to the kernel vectors with speculation suppressed.
> >
> > Signed-off-by: Will Deacon <[email protected]>
> > ---
> > arch/arm64/kernel/entry.S | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> > index 031392ee5f47..b9feb587294d 100644
> > --- a/arch/arm64/kernel/entry.S
> > +++ b/arch/arm64/kernel/entry.S
> > @@ -1029,6 +1029,9 @@ alternative_else_nop_endif
> > .if \regsize == 64
> > msr tpidrro_el0, x30 // Restored in kernel_ventry
> > .endif
> > + bl 2f
> > + b .
> > +2:
>
> This deserves a comment, I guess?

Yeah, I suppose ;) I'll lift something out of the commit message.

> Also, is deliberately unbalancing the return stack likely to cause
> performance problems, e.g., in libc hot paths?

I don't think so, because it remains balanced after this code. We push an
entry on with the BL and pop it with the RET; the rest of the return stack
remains unchanged. That said, I'm also not sure what we could do differently
here!

Will

2018-01-04 18:35:03

by Ard Biesheuvel

Subject: Re: [PATCH 01/11] arm64: use RET instruction for exiting the trampoline

On 4 January 2018 at 18:31, Will Deacon <[email protected]> wrote:
> Hi Ard,
>
> On Thu, Jan 04, 2018 at 04:24:22PM +0000, Ard Biesheuvel wrote:
>> On 4 January 2018 at 15:08, Will Deacon <[email protected]> wrote:
>> > Speculation attacks against the entry trampoline can potentially resteer
>> > the speculative instruction stream through the indirect branch and into
>> > arbitrary gadgets within the kernel.
>> >
>> > This patch defends against these attacks by forcing a misprediction
>> > through the return stack: a dummy BL instruction loads an entry into
>> > the stack, so that the predicted program flow of the subsequent RET
>> > instruction is to a branch-to-self instruction which is finally resolved
>> > as a branch to the kernel vectors with speculation suppressed.
>> >
>> > Signed-off-by: Will Deacon <[email protected]>
>> > ---
>> > arch/arm64/kernel/entry.S | 5 ++++-
>> > 1 file changed, 4 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
>> > index 031392ee5f47..b9feb587294d 100644
>> > --- a/arch/arm64/kernel/entry.S
>> > +++ b/arch/arm64/kernel/entry.S
>> > @@ -1029,6 +1029,9 @@ alternative_else_nop_endif
>> > .if \regsize == 64
>> > msr tpidrro_el0, x30 // Restored in kernel_ventry
>> > .endif
>> > + bl 2f
>> > + b .
>> > +2:
>>
>> This deserves a comment, I guess?
>
> Yeah, I suppose ;) I'll lift something out of the commit message.
>
>> Also, is deliberately unbalancing the return stack likely to cause
>> performance problems, e.g., in libc hot paths?
>
> I don't think so, because it remains balanced after this code. We push an
> entry on with the BL and pop it with the RET; the rest of the return stack
> remains unchanged.

Ah, of course. For some reason I had it in my mind that the failed
prediction affects the state of the return stack, but that doesn't make
sense.

> That said, I'm also not sure what we could do differently
> here!
>
> Will

2018-01-04 23:15:19

by Laura Abbott

[permalink] [raw]
Subject: Re: [PATCH 03/11] arm64: Take into account ID_AA64PFR0_EL1.CSV3

On 01/04/2018 07:08 AM, Will Deacon wrote:
> For non-KASLR kernels where the KPTI behaviour has not been overridden
> on the command line we can use ID_AA64PFR0_EL1.CSV3 to determine whether
> or not we should unmap the kernel whilst running at EL0.
>
> Signed-off-by: Will Deacon <[email protected]>
> ---
> arch/arm64/include/asm/sysreg.h | 1 +
> arch/arm64/kernel/cpufeature.c | 7 ++++++-
> 2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 08cc88574659..ae519bbd3f9e 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -437,6 +437,7 @@
> #define ID_AA64ISAR1_DPB_SHIFT 0
>
> /* id_aa64pfr0 */
> +#define ID_AA64PFR0_CSV3_SHIFT 60
> #define ID_AA64PFR0_SVE_SHIFT 32
> #define ID_AA64PFR0_GIC_SHIFT 24
> #define ID_AA64PFR0_ASIMD_SHIFT 20
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 9f0545dfe497..e11c11bb5b02 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -145,6 +145,7 @@ static const struct arm64_ftr_bits ftr_id_aa64isar1[] = {
> };
>
> static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
> + ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_CSV3_SHIFT, 4, 0),
> ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_SVE_SHIFT, 4, 0),
> ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_GIC_SHIFT, 4, 0),
> S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_ASIMD_SHIFT, 4, ID_AA64PFR0_ASIMD_NI),
> @@ -851,6 +852,8 @@ static int __kpti_forced; /* 0: not forced, >0: forced on, <0: forced off */
> static bool unmap_kernel_at_el0(const struct arm64_cpu_capabilities *entry,
> int __unused)
> {
> + u64 pfr0 = read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1);
> +
> /* Forced on command line? */
> if (__kpti_forced) {
> pr_info_once("kernel page table isolation forced %s by command line option\n",
> @@ -862,7 +865,9 @@ static bool unmap_kernel_at_el0(const struct arm64_cpu_capabilities *entry,
> if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
> return true;
>
> - return false;
> + /* Defer to CPU feature registers */
> + return !cpuid_feature_extract_unsigned_field(pfr0,
> + ID_AA64PFR0_CSV3_SHIFT);
> }
>
> static int __init parse_kpti(char *str)
>

Nit: we only print a message if KPTI is forced on the command line.
Can we get a message similar to x86, regardless of state, to clearly
indicate whether KPTI is enabled?
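
Something along these lines, perhaps (a sketch only; it assumes the
arm64_kernel_unmapped_at_el0() helper from the kpti base series, and
the call site and wording are illustrative):

	/* e.g. from a late initcall, once capability state is final */
	pr_info("Kernel page table isolation: %s\n",
		arm64_kernel_unmapped_at_el0() ? "enabled" : "disabled");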

Thanks,
Laura

2018-01-05 10:24:29

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH 03/11] arm64: Take into account ID_AA64PFR0_EL1.CSV3

On 04/01/18 15:08, Will Deacon wrote:
> For non-KASLR kernels where the KPTI behaviour has not been overridden
> on the command line we can use ID_AA64PFR0_EL1.CSV3 to determine whether
> or not we should unmap the kernel whilst running at EL0.
>
> Signed-off-by: Will Deacon <[email protected]>
> ---
> arch/arm64/include/asm/sysreg.h | 1 +
> arch/arm64/kernel/cpufeature.c | 7 ++++++-
> 2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 08cc88574659..ae519bbd3f9e 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -437,6 +437,7 @@
> #define ID_AA64ISAR1_DPB_SHIFT 0
>
> /* id_aa64pfr0 */
> +#define ID_AA64PFR0_CSV3_SHIFT 60
> #define ID_AA64PFR0_SVE_SHIFT 32
> #define ID_AA64PFR0_GIC_SHIFT 24
> #define ID_AA64PFR0_ASIMD_SHIFT 20
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 9f0545dfe497..e11c11bb5b02 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -145,6 +145,7 @@ static const struct arm64_ftr_bits ftr_id_aa64isar1[] = {
> };
>
> static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
> + ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_CSV3_SHIFT, 4, 0),
> ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_SVE_SHIFT, 4, 0),
> ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_GIC_SHIFT, 4, 0),
> S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_ASIMD_SHIFT, 4, ID_AA64PFR0_ASIMD_NI),
> @@ -851,6 +852,8 @@ static int __kpti_forced; /* 0: not forced, >0: forced on, <0: forced off */
> static bool unmap_kernel_at_el0(const struct arm64_cpu_capabilities *entry,
> int __unused)
> {
> + u64 pfr0 = read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1);
> +
> /* Forced on command line? */
> if (__kpti_forced) {
> pr_info_once("kernel page table isolation forced %s by command line option\n",
> @@ -862,7 +865,9 @@ static bool unmap_kernel_at_el0(const struct arm64_cpu_capabilities *entry,
> if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
> return true;
>
> - return false;
> + /* Defer to CPU feature registers */
> + return !cpuid_feature_extract_unsigned_field(pfr0,
> + ID_AA64PFR0_CSV3_SHIFT);
> }
>

The cpufeature bit changes look good to me. FWIW,

Reviewed-by: Suzuki K Poulose <[email protected]>

2018-01-05 10:29:26

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH 04/11] arm64: cpufeature: Pass capability structure to ->enable callback

On 04/01/18 15:08, Will Deacon wrote:
> In order to invoke the CPU capability ->matches callback from the ->enable
> callback for applying local-CPU workarounds, we need a handle on the
> capability structure.
>
> This patch passes a pointer to the capability structure to the ->enable
> callback.
>
> Signed-off-by: Will Deacon <[email protected]>
> ---
> arch/arm64/kernel/cpufeature.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index e11c11bb5b02..6133c14b9b01 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -1151,7 +1151,7 @@ void __init enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps)
> * uses an IPI, giving us a PSTATE that disappears when
> * we return.
> */
> - stop_machine(caps->enable, NULL, cpu_online_mask);
> + stop_machine(caps->enable, (void *)caps, cpu_online_mask);
> }
> }
> }
> @@ -1194,7 +1194,7 @@ verify_local_cpu_features(const struct arm64_cpu_capabilities *caps)
> cpu_die_early();
> }
> if (caps->enable)
> - caps->enable(NULL);
> + caps->enable((void *)caps);
> }
> }
>
>

Reviewed-by: Suzuki K Poulose <[email protected]>
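
For illustration, passing the capability pointer allows ->enable
callbacks of roughly the following shape (a sketch only; the workaround
helper is hypothetical and SCOPE_LOCAL_CPU is assumed from the existing
cpufeature code; this is not one of the later patches in the series):

#include <asm/cpufeature.h>

static int enable_example_workaround(void *data)
{
	const struct arm64_cpu_capabilities *entry = data;

	/*
	 * ->enable is invoked on every online CPU; re-check ->matches
	 * with local-CPU scope so that only affected CPUs apply the
	 * workaround.
	 */
	if (entry->matches(entry, SCOPE_LOCAL_CPU))
		apply_example_workaround();	/* hypothetical helper */

	return 0;
}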