2021-08-12 19:16:14

by Marc Zyngier

Subject: [PATCH 0/5] arm64: Survival kit for SCR_EL3.HCE==0 conditions

Anyone vaguely familiar with the ARMv8 architecture would quickly
understand that entering the kernel at EL2 without enabling the HVC
instruction is... living dangerously. But as it turns out [0], there
is a whole range of (*cough*) "high quality" (*cough*) Broadcom
systems out there configured exactly like that.

If you are speechless, I'm right with you.

These machines have stopped being able to boot an upstream kernel
since 5.12, where we changed the way we switch from nVHE to VHE, as
this relies on the HVC instruction being usable... It is also worth
noting that these systems have never been able to use KVM. Or kexec.

This small series addresses the issue by detecting an UNDEFing HVC in
a fairly controlled environment, and in this case pretend that we have
booted at EL1. It also documents the requirement for SCR_EL3.HCE to be
set to *1* if the kernel is entered at EL2. Turns out that we really
have to state the obvious.

This has been tested on a FVP model with a hacked-up boot-wrapper.

Note that I really don't think any of this is -stable material, except
maybe for the documentation. It isn't 5.14 material either. Best case,
this is 5.15, or maybe even later. If ever.

M. (drink required)

[0] https://lore.kernel.org/r/[email protected]

Marc Zyngier (5):
arm64: Directly expand __init_el2_nvhe_prepare_eret where needed
arm64: Handle UNDEF in the EL2 stub vectors
arm64: Detect disabled HVC early
arm64: Warn on booting at EL2 with HVC disabled
arm64: Document the requirement for SCR_EL3.HCE

Documentation/arm64/booting.rst | 5 +++++
arch/arm64/include/asm/el2_setup.h | 6 ------
arch/arm64/include/asm/virt.h | 10 +++++++++
arch/arm64/kernel/head.S | 34 ++++++++++++++++++++++++++++++
arch/arm64/kernel/hyp-stub.S | 19 ++++++++++++++++-
arch/arm64/kernel/smp.c | 3 +++
6 files changed, 70 insertions(+), 7 deletions(-)

--
2.30.2


2021-08-12 19:16:52

by Marc Zyngier

Subject: [PATCH 1/5] arm64: Directly expand __init_el2_nvhe_prepare_eret where needed

As we are about to enable taking exceptions extremely early,
it becomes pointless to pre-populate SPSR_EL2 in init_el2_state.

We already have the stuck-in-VHE handling code that needs some
special casing, so let's bite the bullet and get rid of the macros
by expanding them where needed.

Signed-off-by: Marc Zyngier <[email protected]>
---
arch/arm64/include/asm/el2_setup.h | 6 ------
arch/arm64/kernel/head.S | 2 ++
2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h
index 21fa330f498d..0b5c8033eaf2 100644
--- a/arch/arm64/include/asm/el2_setup.h
+++ b/arch/arm64/include/asm/el2_setup.h
@@ -164,11 +164,6 @@
.Lskip_fgt_\@:
.endm

-.macro __init_el2_nvhe_prepare_eret
- mov x0, #INIT_PSTATE_EL1
- msr spsr_el2, x0
-.endm
-
/**
* Initialize EL2 registers to sane values. This should be called early on all
* cores that were booted in EL2. Note that everything gets initialised as
@@ -189,7 +184,6 @@
__init_el2_nvhe_cptr
__init_el2_nvhe_sve
__init_el2_fgt
- __init_el2_nvhe_prepare_eret
.endm

#endif /* __ARM_KVM_INIT_H__ */
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index c5c994a73a64..9d5aa56a98cc 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -547,6 +547,8 @@ SYM_INNER_LABEL(init_el2, SYM_L_LOCAL)
mov_q x0, INIT_SCTLR_EL1_MMU_OFF
msr sctlr_el1, x0

+ mov_q x0, INIT_PSTATE_EL1
+ msr spsr_el2, x0
msr elr_el2, lr
mov w0, #BOOT_CPU_MODE_EL2
eret
--
2.30.2

2021-08-12 19:17:13

by Marc Zyngier

Subject: [PATCH 2/5] arm64: Handle UNDEF in the EL2 stub vectors

As we want to handle the silly case where HVC has been disabled
from EL3, augment our ability to handle exceptions at EL2.

Check for unknown exceptions (usually UNDEF) coming from EL2,
and treat them as a failing HVC call into the stubs. While
this isn't great and obviously doesn't cater for the gigantic
range of possible exceptions, it isn't any worse than what we
have today.

Just don't try and use it for anything else.

Signed-off-by: Marc Zyngier <[email protected]>
---
arch/arm64/kernel/hyp-stub.S | 19 ++++++++++++++++++-
1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
index 43d212618834..026a34515b21 100644
--- a/arch/arm64/kernel/hyp-stub.S
+++ b/arch/arm64/kernel/hyp-stub.S
@@ -46,7 +46,16 @@ SYM_CODE_END(__hyp_stub_vectors)
.align 11

SYM_CODE_START_LOCAL(elx_sync)
- cmp x0, #HVC_SET_VECTORS
+ mrs x4, spsr_el2
+ and x4, x4, #PSR_MODE_MASK
+ orr x4, x4, #1
+ cmp x4, #PSR_MODE_EL2h
+ b.ne 0f
+ mrs x4, esr_el2
+ eor x4, x4, #ESR_ELx_IL
+ cbz x4, el2_undef
+
+0: cmp x0, #HVC_SET_VECTORS
b.ne 1f
msr vbar_el2, x1
b 9f
@@ -71,6 +80,14 @@ SYM_CODE_START_LOCAL(elx_sync)

9: mov x0, xzr
eret
+
+el2_undef:
+ // Assumes this was a HVC that went really wrong...
+ mrs x0, elr_el2
+ add x0, x0, #4
+ msr elr_el2, x0
+ mov_q x0, HVC_STUB_ERR
+ eret
SYM_CODE_END(elx_sync)

// nVHE? No way! Give me the real thing!
--
2.30.2
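
Reader's note: the filter added to elx_sync can be easier to follow as a
small C model. This is only a sketch of the logic in the hunk above, not
kernel code. The struct el2_regs type and both function names are
hypothetical; the constant values are the ones the arm64 headers are
believed to use (PSR_MODE_EL2t/EL2h/MASK, ESR_ELx_IL as bit 25, and
HVC_STUB_ERR as 0xbadca11).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values assumed to match the arm64 headers at the time of the patch. */
#define PSR_MODE_EL2t   0x8u
#define PSR_MODE_EL2h   0x9u
#define PSR_MODE_MASK   0xfu
#define ESR_ELx_IL      (UINT64_C(1) << 25)
#define HVC_STUB_ERR    UINT64_C(0xbadca11)

/* The "orr x4, x4, #1" trick only works because EL2t and EL2h differ in bit 0. */
static_assert((PSR_MODE_EL2t | 1) == PSR_MODE_EL2h, "EL2t folds into EL2h");

/* Hypothetical snapshot of the registers the stub inspects. */
struct el2_regs {
	uint64_t spsr_el2;
	uint64_t esr_el2;
	uint64_t elr_el2;
	uint64_t x0;
};

/*
 * True when the exception was taken from EL2 (EL2t or EL2h) and ESR_EL2
 * holds nothing but the IL bit, i.e. EC == 0 (unknown) and ISS == 0.
 */
static bool looks_like_undef_from_el2(const struct el2_regs *r)
{
	uint64_t mode = (r->spsr_el2 & PSR_MODE_MASK) | 1;

	if (mode != PSR_MODE_EL2h)
		return false;
	return (r->esr_el2 ^ ESR_ELx_IL) == 0;
}

/* Mirrors el2_undef: skip the faulting instruction, report failure in x0. */
static void handle_el2_undef(struct el2_regs *r)
{
	r->elr_el2 += 4;
	r->x0 = HVC_STUB_ERR;
}
```

Anything that fails the filter falls through to the existing HVC dispatch,
which is why the commit message warns against using it for anything else.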

2021-08-12 20:26:04

by Marc Zyngier

Subject: [PATCH 5/5] arm64: Document the requirement for SCR_EL3.HCE

It is amazing that we never documented this absolutely basic
requirement: if you boot the kernel at EL2, you'd better
enable the HVC instruction from EL3.

Really, just do it.

Signed-off-by: Marc Zyngier <[email protected]>
---
Documentation/arm64/booting.rst | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/Documentation/arm64/booting.rst b/Documentation/arm64/booting.rst
index a9192e7a231b..6c729d0c4bc2 100644
--- a/Documentation/arm64/booting.rst
+++ b/Documentation/arm64/booting.rst
@@ -212,6 +212,11 @@ Before jumping into the kernel, the following conditions must be met:
- The value of SCR_EL3.FIQ must be the same as the one present at boot
time whenever the kernel is executing.

+ For all systems:
+ - If EL3 is present and the kernel is entered at EL2:
+
+ - SCR_EL3.HCE (bit 8) must be initialised to 0b1.
+
For systems with a GICv3 interrupt controller to be used in v3 mode:
- If EL3 is present:

--
2.30.2
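
Reader's note: on the firmware side, the documented requirement boils down
to one bit. The sketch below is an illustration only; the helper names are
hypothetical and it operates on a plain integer rather than the real
SCR_EL3 system register, which can only be written from EL3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SCR_EL3.HCE is bit 8, as stated in the documentation hunk above. */
#define SCR_EL3_HCE (UINT64_C(1) << 8)

static_assert(SCR_EL3_HCE == 0x100, "HCE must be bit 8");

/* Hypothetical helper: SCR_EL3 value to program before entering the kernel at EL2. */
static uint64_t scr_el3_for_el2_entry(uint64_t scr)
{
	return scr | SCR_EL3_HCE;
}

/* Hypothetical helper: would HVC be usable with this SCR_EL3 value? */
static bool hvc_enabled(uint64_t scr)
{
	return (scr & SCR_EL3_HCE) != 0;
}
```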

2021-08-12 20:27:29

by Marc Zyngier

Subject: [PATCH 4/5] arm64: Warn on booting at EL2 with HVC disabled

Now that we are able to paper over the gigantic stupidity that
booting at EL2 with SCR_EL3.HCE==0 is, let's WARN_TAINT()
when detecting this situation.

Yes, this is *LOUD*.

Signed-off-by: Marc Zyngier <[email protected]>
---
arch/arm64/include/asm/virt.h | 10 ++++++++++
arch/arm64/kernel/head.S | 10 ++++++++++
arch/arm64/kernel/smp.c | 3 +++
3 files changed, 23 insertions(+)

diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index 7379f35ae2c6..89bf5ae522da 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -49,6 +49,9 @@
#define BOOT_CPU_MODE_EL1 (0xe11)
#define BOOT_CPU_MODE_EL2 (0xe12)

+/* Flags associated to the boot mode */
+#define BOOT_CPU_MODE_DOWNGRADED (1 << 0)
+
#ifndef __ASSEMBLY__

#include <asm/ptrace.h>
@@ -67,6 +70,13 @@
*/
extern u32 __boot_cpu_mode[2];

+/*
+ * __boot_cpu_mode_flags records events that are associated with CPUs
+ * coming online. A CPU having been downgraded from EL2 to EL1 because
+ * of HVC not being enabled will have BOOT_CPU_MODE_DOWNGRADED set.
+ */
+extern u32 __boot_cpu_mode_flags[1];
+
void __hyp_set_vectors(phys_addr_t phys_vector_base);
void __hyp_reset_vectors(void);

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index d6b2b05f5d3a..fdad6805868b 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -530,7 +530,13 @@ SYM_INNER_LABEL(init_el2, SYM_L_LOCAL)
/*
* HVC is unusable, so pretend we actually booted at EL1.
* Once we have left EL2, there will be no going back.
+ * set_cpu_boot_mode_flag will do the necessary CMOs for us.
*/
+ adr_l x1, __boot_cpu_mode_flags
+ ldr w0, [x1]
+ orr w0, w0, BOOT_CPU_MODE_DOWNGRADED
+ str w0, [x1]
+
mov_q x0, INIT_SCTLR_EL1_MMU_OFF
msr sctlr_el1, x0

@@ -623,6 +629,10 @@ SYM_DATA_START(__early_cpu_boot_status)
.quad 0
SYM_DATA_END(__early_cpu_boot_status)

+SYM_DATA_START(__boot_cpu_mode_flags)
+ .long 0
+SYM_DATA_END(__boot_cpu_mode_flags)
+
.popsection

/*
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 6f6ff072acbd..43fad7ca9110 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -425,6 +425,9 @@ static void __init hyp_mode_check(void)
else if (is_hyp_mode_mismatched())
WARN_TAINT(1, TAINT_CPU_OUT_OF_SPEC,
"CPU: CPUs started in inconsistent modes");
+ else if (__boot_cpu_mode_flags[0] & BOOT_CPU_MODE_DOWNGRADED)
+ WARN_TAINT(1, TAINT_CPU_OUT_OF_SPEC,
+ "CPU: CPUs downgraded to EL1, HVC disabled");
else
pr_info("CPU: All CPU(s) started at EL1\n");
if (IS_ENABLED(CONFIG_KVM) && !is_kernel_in_hyp_mode()) {
--
2.30.2
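
Reader's note: the flag handling in this patch can be modelled in a few
lines of C. This is a sketch under assumptions, not the kernel code: it
covers only the three branches visible in the smp.c hunk, and the enum and
function names are hypothetical. Only BOOT_CPU_MODE_DOWNGRADED comes from
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BOOT_CPU_MODE_DOWNGRADED (1u << 0)

static_assert(BOOT_CPU_MODE_DOWNGRADED != 0, "flag must be a non-zero bit");

/* Hypothetical stand-ins for the three outcomes shown in hyp_mode_check(). */
enum hyp_report {
	REPORT_MISMATCH_WARN,   /* "CPUs started in inconsistent modes" */
	REPORT_DOWNGRADE_WARN,  /* "CPUs downgraded to EL1, HVC disabled" */
	REPORT_EL1_INFO,        /* "All CPU(s) started at EL1" */
};

/* Models the ldr/orr/str sequence added to head.S. */
static void mark_downgraded(uint32_t *flags)
{
	*flags |= BOOT_CPU_MODE_DOWNGRADED;
}

/* Same ordering as the hunk: a mode mismatch wins over the downgrade warning. */
static enum hyp_report hyp_mode_report(bool mismatched, uint32_t flags)
{
	if (mismatched)
		return REPORT_MISMATCH_WARN;
	if (flags & BOOT_CPU_MODE_DOWNGRADED)
		return REPORT_DOWNGRADE_WARN;
	return REPORT_EL1_INFO;
}
```

Because the downgrade check sits after the mismatch check, a system with
both problems reports only the mismatch.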

2021-08-12 21:59:49

by Marc Zyngier

Subject: [PATCH 3/5] arm64: Detect disabled HVC early

Having HVC disabled from EL3 while the kernel is entered at EL2
is a complete nightmare.

We end up taking an UNDEF at the worst possible moment (checking
for VHE) and even if we didn't, having KVM enabled would signify
the premature end of the kernel.

Instead, try and detect this stupid case by issuing a HVC
for HVC_RESET_VECTORS, which does nothing when the stubs
are live. If we get HVC_STUB_ERR back, that's because the
UNDEF handler has kicked in.

In this situation, close your eyes, block your nose, and gracefully
pretend we have booted at EL1.

Reported-by: Rafał Miłecki <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
---
arch/arm64/kernel/head.S | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 9d5aa56a98cc..d6b2b05f5d3a 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -519,6 +519,28 @@ SYM_INNER_LABEL(init_el2, SYM_L_LOCAL)
msr vbar_el2, x0
isb

+ // Check that HVC actually works...
+ mov x0, #HVC_RESET_VECTORS
+ hvc #0
+
+ mov_q x1, HVC_STUB_ERR
+ cmp x0, x1
+ b.ne 0f
+
+ /*
+ * HVC is unusable, so pretend we actually booted at EL1.
+ * Once we have left EL2, there will be no going back.
+ */
+ mov_q x0, INIT_SCTLR_EL1_MMU_OFF
+ msr sctlr_el1, x0
+
+ mov_q x0, INIT_PSTATE_EL1
+ msr spsr_el2, x0
+ msr elr_el2, lr
+ mov_q w0, BOOT_CPU_MODE_EL1
+ eret
+
+0:
/*
* Fruity CPUs seem to have HCR_EL2.E2H set to RES1,
* making it impossible to start in nVHE mode. Is that
--
2.30.2
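
Reader's note: the probe is easiest to see as "call the stub, look at the
return value". The C sketch below models that with a function pointer
standing in for the hvc instruction. It is a simplification: the real
code continues into the VHE checks when HVC works, while this model just
returns BOOT_CPU_MODE_EL2. The hvc_fn type and the three function names
are hypothetical; HVC_RESET_VECTORS is assumed to be 2 and HVC_STUB_ERR
0xbadca11, as in the arm64 headers.

```c
#include <assert.h>
#include <stdint.h>

#define HVC_RESET_VECTORS  2
#define HVC_STUB_ERR       UINT64_C(0xbadca11)
#define BOOT_CPU_MODE_EL1  0xe11u
#define BOOT_CPU_MODE_EL2  0xe12u

/* The probe relies on the error value being distinct from the stub's 0 return. */
static_assert(HVC_STUB_ERR != 0, "stub error must differ from success");

/* Hypothetical stand-in for "hvc #0": takes x0, returns the new x0. */
typedef uint64_t (*hvc_fn)(uint64_t func);

/* HVC enabled: the stub handles HVC_RESET_VECTORS and returns 0. */
static uint64_t hvc_working(uint64_t func)
{
	(void)func;
	return 0;
}

/* HVC disabled at EL3: the instruction UNDEFs and the handler returns the error. */
static uint64_t hvc_undef(uint64_t func)
{
	(void)func;
	return HVC_STUB_ERR;
}

/* Boot mode the CPU reports after the probe. */
static uint32_t probe_boot_mode(hvc_fn hvc)
{
	if (hvc(HVC_RESET_VECTORS) == HVC_STUB_ERR)
		return BOOT_CPU_MODE_EL1;  /* pretend we booted at EL1 */
	return BOOT_CPU_MODE_EL2;
}
```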

2021-08-12 22:13:18

by Rafał Miłecki

Subject: Re: [PATCH 4/5] arm64: Warn on booting at EL2 with HVC disabled

On 12.08.2021 21:02, Marc Zyngier wrote:
> Now that we are able to paper over the gigantic stupidity that
> booting at EL2 with SCR_EL3.HCE==0 is, let's taint WARN_TAINT()
> when detecting this situation.
>
> Yes, this is *LOUD*.
>
> Signed-off-by: Marc Zyngier <[email protected]>

Tested-by: Rafał Miłecki <[email protected]>

This replaces:
CPU: All CPU(s) started at EL1

with:
------------[ cut here ]------------
CPU: CPUs downgraded to EL1, HVC disabled
WARNING: CPU: 0 PID: 1 at arch/arm64/kernel/smp.c:429 smp_cpus_done+0xac/0xe8
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-rc5-g86fc10657896 #41
Hardware name: Asus GT-AC5300 (DT)
pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
pc : smp_cpus_done+0xac/0xe8
lr : smp_cpus_done+0xac/0xe8
sp : ffffffc01002be00
x29: ffffffc01002be00 x28: 0000000000000000 x27: 0000000000000000
x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
x23: ffffffc010ab4000 x22: 0000000000000000 x21: 0000000000000000
x20: ffffffc0107b7e74 x19: ffffffc010a78000 x18: 0000000000000001
x17: ffffffc010a9ee40 x16: 0000000000000000 x15: 000042496b0a18f2
x14: fffffffffffc0f87 x13: 0000000000000039 x12: ffffff80010b03b0
x11: 00000000ffffffea x10: ffffffc010a5eb50 x9 : 0000000000000001
x8 : 0000000000000001 x7 : 0000000000017fe8 x6 : c0000000ffffefff
x5 : 0000000000057fa8 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 00000000ffffffff x1 : a8d68d1fd96fdc00 x0 : 0000000000000000
Call trace:
smp_cpus_done+0xac/0xe8
smp_init+0x68/0x78
kernel_init_freeable+0xd0/0x214
kernel_init+0x24/0x120
ret_from_fork+0x10/0x18
---[ end trace a7d4af835e9d6e6b ]---


BEFORE:

smp: Bringing up secondary CPUs ...
Detected VIPT I-cache on CPU1
CPU1: Booted secondary processor 0x0000000001 [0x420f1000]
Detected VIPT I-cache on CPU2
CPU2: Booted secondary processor 0x0000000002 [0x420f1000]
Detected VIPT I-cache on CPU3
CPU3: Booted secondary processor 0x0000000003 [0x420f1000]
smp: Brought up 1 node, 4 CPUs
SMP: Total of 4 processors activated.
CPU features: detected: 32-bit EL0 Support
CPU features: detected: 32-bit EL1 Support
CPU features: detected: CRC32 instructions
CPU: All CPU(s) started at EL1


AFTER:

smp: Bringing up secondary CPUs ...
Detected VIPT I-cache on CPU1
CPU1: Booted secondary processor 0x0000000001 [0x420f1000]
Detected VIPT I-cache on CPU2
CPU2: Booted secondary processor 0x0000000002 [0x420f1000]
Detected VIPT I-cache on CPU3
CPU3: Booted secondary processor 0x0000000003 [0x420f1000]
smp: Brought up 1 node, 4 CPUs
SMP: Total of 4 processors activated.
CPU features: detected: 32-bit EL0 Support
CPU features: detected: 32-bit EL1 Support
CPU features: detected: CRC32 instructions
------------[ cut here ]------------
CPU: CPUs downgraded to EL1, HVC disabled
WARNING: CPU: 0 PID: 1 at arch/arm64/kernel/smp.c:429 smp_cpus_done+0xac/0xe8
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-rc5-g86fc10657896 #41
Hardware name: Asus GT-AC5300 (DT)
pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
pc : smp_cpus_done+0xac/0xe8
lr : smp_cpus_done+0xac/0xe8
sp : ffffffc01002be00
x29: ffffffc01002be00 x28: 0000000000000000 x27: 0000000000000000
x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
x23: ffffffc010ab4000 x22: 0000000000000000 x21: 0000000000000000
x20: ffffffc0107b7e74 x19: ffffffc010a78000 x18: 0000000000000001
x17: ffffffc010a9ee40 x16: 0000000000000000 x15: 000042496b0a18f2
x14: fffffffffffc0f87 x13: 0000000000000039 x12: ffffff80010b03b0
x11: 00000000ffffffea x10: ffffffc010a5eb50 x9 : 0000000000000001
x8 : 0000000000000001 x7 : 0000000000017fe8 x6 : c0000000ffffefff
x5 : 0000000000057fa8 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 00000000ffffffff x1 : a8d68d1fd96fdc00 x0 : 0000000000000000
Call trace:
smp_cpus_done+0xac/0xe8
smp_init+0x68/0x78
kernel_init_freeable+0xd0/0x214
kernel_init+0x24/0x120
ret_from_fork+0x10/0x18
---[ end trace a7d4af835e9d6e6b ]---

2021-08-12 22:13:18

by Rafał Miłecki

Subject: Re: [PATCH 3/5] arm64: Detect disabled HVC early

On 12.08.2021 21:02, Marc Zyngier wrote:
> Having HVC disabled from EL3 while the kernel is entered at EL2
> is a complete nightmare.
>
> We end-up taking an UNDEF at the worse possible moment (checking
> for VHE) and even if we didn't, having KVM enabled would signify
> the premature end of the kernel.
>
> Instead, try and detect this stupid case by issuing a HVC
> for HVC_RESET_VECTORS, which does nothing when the stubs
> are live. If we get HVC_STUB_ERR back, that's because the
> UNDEF handler has kicked in.
>
> In this situation, close your eyes, block your nose, and gracefully
> pretend we have booted at EL1.
>
> Reported-by: Rafał Miłecki <[email protected]>
> Signed-off-by: Marc Zyngier <[email protected]>

This does the trick.

Tested-by: Rafał Miłecki <[email protected]>

2021-08-13 09:10:17

by Will Deacon

Subject: Re: [PATCH 3/5] arm64: Detect disabled HVC early

On Thu, Aug 12, 2021 at 08:02:11PM +0100, Marc Zyngier wrote:
> Having HVC disabled from EL3 while the kernel is entered at EL2
> is a complete nightmare.
>
> We end-up taking an UNDEF at the worse possible moment (checking
> for VHE) and even if we didn't, having KVM enabled would signify
> the premature end of the kernel.
>
> Instead, try and detect this stupid case by issuing a HVC
> for HVC_RESET_VECTORS, which does nothing when the stubs
> are live. If we get HVC_STUB_ERR back, that's because the
> UNDEF handler has kicked in.
>
> In this situation, close your eyes, block your nose, and gracefully
> pretend we have booted at EL1.
>
> Reported-by: Rafał Miłecki <[email protected]>
> Signed-off-by: Marc Zyngier <[email protected]>
> ---
> arch/arm64/kernel/head.S | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 9d5aa56a98cc..d6b2b05f5d3a 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -519,6 +519,28 @@ SYM_INNER_LABEL(init_el2, SYM_L_LOCAL)
> msr vbar_el2, x0
> isb
>
> + // Check that HVC actually works...
> + mov x0, #HVC_RESET_VECTORS
> + hvc #0
> +
> + mov_q x1, HVC_STUB_ERR
> + cmp x0, x1
> + b.ne 0f
> +
> + /*
> + * HVC is unusable, so pretend we actually booted at EL1.
> + * Once we have left EL2, there will be no going back.
> + */

This comment got me thinking...

.macro host_hvc0
mrs xzr, actlr_el1
.endm

then set HCR_EL2.TACR=1 while we still can and match the ISS against a
constant in the handler. Too awful?

Will
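
Reader's note: "match the ISS against a constant" can be made concrete.
With HCR_EL2.TACR set, an EL1 read of ACTLR_EL1 traps to EL2 as a system
register access (EC 0x18), and the ISS encodes the register, Rt, and the
direction. The sketch below computes that constant for
"mrs xzr, actlr_el1". It is an illustration of the idea only, was not
part of the thread, and the macro and function names are hypothetical.
The field layout is the architectural one for trapped MSR/MRS accesses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HCR_EL2_TACR        (UINT64_C(1) << 21)
#define ESR_ELx_EC_SHIFT    26
#define ESR_ELx_EC_SYS64    0x18u
#define ESR_ELx_ISS_MASK    0x1ffffffu  /* ISS is ESR bits [24:0] */

/* ISS layout for a trapped MSR/MRS: Op0, Op2, Op1, CRn, Rt, CRm, direction. */
#define SYS64_ISS(op0, op1, crn, crm, op2, rt, is_read)            \
	(((uint32_t)(op0) << 20) | ((uint32_t)(op2) << 17) |       \
	 ((uint32_t)(op1) << 14) | ((uint32_t)(crn) << 10) |       \
	 ((uint32_t)(rt) << 5)   | ((uint32_t)(crm) << 1)  |       \
	 (uint32_t)(is_read))

/* mrs xzr, actlr_el1: ACTLR_EL1 is S3_0_C1_C0_1, Rt = 31 (xzr), read. */
#define HOST_HVC0_ISS  SYS64_ISS(3, 0, 1, 0, 1, 31, 1)

static_assert(HOST_HVC0_ISS == 0x3207e1, "unexpected ISS for mrs xzr, actlr_el1");

/* True when ESR_EL2 describes the trapped pseudo-hypercall. */
static bool is_host_hvc0(uint64_t esr)
{
	uint32_t ec = (uint32_t)(esr >> ESR_ELx_EC_SHIFT) & 0x3f;
	uint32_t iss = (uint32_t)esr & ESR_ELx_ISS_MASK;

	return ec == ESR_ELx_EC_SYS64 && iss == HOST_HVC0_ISS;
}
```

Using xzr as the destination keeps the ISS a single fixed value, so the
handler needs one comparison and no register decoding.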

2021-08-13 14:16:13

by Robin Murphy

Subject: Re: [PATCH 2/5] arm64: Handle UNDEF in the EL2 stub vectors

On 2021-08-12 20:02, Marc Zyngier wrote:
> As we want to handle the silly case where HVC has been disabled
> from EL3, augment our ability to handle exception at EL2.
>
> Check for unknown exceptions (usually UNDEF) coming from EL2,
> and treat them as a failing HVC call into the stubs. While
> this isn't great and obviously doesn't catter for the gigantic
> range of possible exceptions, it isn't any worse than what we
> have today.
>
> Just don't try and use it for anything else.
>
> Signed-off-by: Marc Zyngier <[email protected]>
> ---
> arch/arm64/kernel/hyp-stub.S | 19 ++++++++++++++++++-
> 1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
> index 43d212618834..026a34515b21 100644
> --- a/arch/arm64/kernel/hyp-stub.S
> +++ b/arch/arm64/kernel/hyp-stub.S
> @@ -46,7 +46,16 @@ SYM_CODE_END(__hyp_stub_vectors)
> .align 11
>
> SYM_CODE_START_LOCAL(elx_sync)
> - cmp x0, #HVC_SET_VECTORS
> + mrs x4, spsr_el2
> + and x4, x4, #PSR_MODE_MASK
> + orr x4, x4, #1
> + cmp x4, #PSR_MODE_EL2h
> + b.ne 0f
> + mrs x4, esr_el2
> + eor x4, x4, #ESR_ELx_IL
> + cbz x4, el2_undef

Hmm, might it be neater to check ESR_EL2.ISS to see if we landed here
for any reason *other* than a successfully-executed HVC?

Robin.

> +
> +0: cmp x0, #HVC_SET_VECTORS
> b.ne 1f
> msr vbar_el2, x1
> b 9f
> @@ -71,6 +80,14 @@ SYM_CODE_START_LOCAL(elx_sync)
>
> 9: mov x0, xzr
> eret
> +
> +el2_undef:
> + // Assumes this was a HVC that went really wrong...
> + mrs x0, elr_el2
> + add x0, x0, #4
> + msr elr_el2, x0
> + mov_q x0, HVC_STUB_ERR
> + eret
> SYM_CODE_END(elx_sync)
>
> // nVHE? No way! Give me the real thing!
>

2021-08-13 17:38:45

by Marc Zyngier

Subject: Re: [PATCH 3/5] arm64: Detect disabled HVC early

On Fri, 13 Aug 2021 10:05:40 +0100,
Will Deacon <[email protected]> wrote:
>
> On Thu, Aug 12, 2021 at 08:02:11PM +0100, Marc Zyngier wrote:
> > Having HVC disabled from EL3 while the kernel is entered at EL2
> > is a complete nightmare.
> >
> > We end-up taking an UNDEF at the worse possible moment (checking
> > for VHE) and even if we didn't, having KVM enabled would signify
> > the premature end of the kernel.
> >
> > Instead, try and detect this stupid case by issuing a HVC
> > for HVC_RESET_VECTORS, which does nothing when the stubs
> > are live. If we get HVC_STUB_ERR back, that's because the
> > UNDEF handler has kicked in.
> >
> > In this situation, close your eyes, block your nose, and gracefully
> > pretend we have booted at EL1.
> >
> > Reported-by: Rafał Miłecki <[email protected]>
> > Signed-off-by: Marc Zyngier <[email protected]>
> > ---
> > arch/arm64/kernel/head.S | 22 ++++++++++++++++++++++
> > 1 file changed, 22 insertions(+)
> >
> > diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> > index 9d5aa56a98cc..d6b2b05f5d3a 100644
> > --- a/arch/arm64/kernel/head.S
> > +++ b/arch/arm64/kernel/head.S
> > @@ -519,6 +519,28 @@ SYM_INNER_LABEL(init_el2, SYM_L_LOCAL)
> > msr vbar_el2, x0
> > isb
> >
> > + // Check that HVC actually works...
> > + mov x0, #HVC_RESET_VECTORS
> > + hvc #0
> > +
> > + mov_q x1, HVC_STUB_ERR
> > + cmp x0, x1
> > + b.ne 0f
> > +
> > + /*
> > + * HVC is unusable, so pretend we actually booted at EL1.
> > + * Once we have left EL2, there will be no going back.
> > + */
>
> This comment got me thinking...
>
> .macro host_hvc0
> mrs xzr, actlr_el1
> .endm
>
> then set HCR_EL2.TACR=1 while we still can and match the ISS against a
> constant in the handler. Too awful?

I had a similar idea, using an ID register instead (though ACTLR_EL1
is much neater). It would indeed allow the kernel to go back to EL2.

But for what purpose? Yes, we could enable KVM that way. However, the
guest would need to know about it as well, as even simple things like
PSCI wouldn't work. That would be defining a new guest ABI, and I'm
not overly keen on it.

Unless you are thinking of something else?

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2021-08-13 17:43:19

by Marc Zyngier

Subject: Re: [PATCH 2/5] arm64: Handle UNDEF in the EL2 stub vectors

On Fri, 13 Aug 2021 14:08:23 +0100,
Robin Murphy <[email protected]> wrote:
>
> On 2021-08-12 20:02, Marc Zyngier wrote:
> > As we want to handle the silly case where HVC has been disabled
> > from EL3, augment our ability to handle exception at EL2.
> >
> > Check for unknown exceptions (usually UNDEF) coming from EL2,
> > and treat them as a failing HVC call into the stubs. While
> > this isn't great and obviously doesn't catter for the gigantic
> > range of possible exceptions, it isn't any worse than what we
> > have today.
> >
> > Just don't try and use it for anything else.
> >
> > Signed-off-by: Marc Zyngier <[email protected]>
> > ---
> > arch/arm64/kernel/hyp-stub.S | 19 ++++++++++++++++++-
> > 1 file changed, 18 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
> > index 43d212618834..026a34515b21 100644
> > --- a/arch/arm64/kernel/hyp-stub.S
> > +++ b/arch/arm64/kernel/hyp-stub.S
> > @@ -46,7 +46,16 @@ SYM_CODE_END(__hyp_stub_vectors)
> > .align 11
> > SYM_CODE_START_LOCAL(elx_sync)
> > - cmp x0, #HVC_SET_VECTORS
> > + mrs x4, spsr_el2
> > + and x4, x4, #PSR_MODE_MASK
> > + orr x4, x4, #1
> > + cmp x4, #PSR_MODE_EL2h
> > + b.ne 0f
> > + mrs x4, esr_el2
> > + eor x4, x4, #ESR_ELx_IL
> > + cbz x4, el2_undef
>
> Hmm, might it be neater to check ESR_EL2.ISS to see if we landed here
> for any reason *other* than a successfully-executed HVC?

We absolutely could. However, the sixpence question (yes, that's the
Brexit effect for you) is "what do you do with exceptions that are
neither UNDEF nor HVC?".

We are taking a leap of faith by assuming that the only thing that
will UNDEF at EL2 while the stubs are installed is HVC. If anything
else occurs, I have no idea what to do with it. I guess we could always
ignore it instead of treating it as a HVC (as it is done at the
moment).

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2021-08-13 18:18:58

by Robin Murphy

Subject: Re: [PATCH 2/5] arm64: Handle UNDEF in the EL2 stub vectors

On 2021-08-13 18:41, Marc Zyngier wrote:
> On Fri, 13 Aug 2021 14:08:23 +0100,
> Robin Murphy <[email protected]> wrote:
>>
>> On 2021-08-12 20:02, Marc Zyngier wrote:
>>> As we want to handle the silly case where HVC has been disabled
>>> from EL3, augment our ability to handle exception at EL2.
>>>
>>> Check for unknown exceptions (usually UNDEF) coming from EL2,
>>> and treat them as a failing HVC call into the stubs. While
>>> this isn't great and obviously doesn't catter for the gigantic
>>> range of possible exceptions, it isn't any worse than what we
>>> have today.
>>>
>>> Just don't try and use it for anything else.
>>>
>>> Signed-off-by: Marc Zyngier <[email protected]>
>>> ---
>>> arch/arm64/kernel/hyp-stub.S | 19 ++++++++++++++++++-
>>> 1 file changed, 18 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
>>> index 43d212618834..026a34515b21 100644
>>> --- a/arch/arm64/kernel/hyp-stub.S
>>> +++ b/arch/arm64/kernel/hyp-stub.S
>>> @@ -46,7 +46,16 @@ SYM_CODE_END(__hyp_stub_vectors)
>>> .align 11
>>> SYM_CODE_START_LOCAL(elx_sync)
>>> - cmp x0, #HVC_SET_VECTORS
>>> + mrs x4, spsr_el2
>>> + and x4, x4, #PSR_MODE_MASK
>>> + orr x4, x4, #1
>>> + cmp x4, #PSR_MODE_EL2h
>>> + b.ne 0f
>>> + mrs x4, esr_el2
>>> + eor x4, x4, #ESR_ELx_IL
>>> + cbz x4, el2_undef
>>
>> Hmm, might it be neater to check ESR_EL2.ISS to see if we landed here
>> for any reason *other* than a successfully-executed HVC?
>
> We absolutely could. However, the sixpence question (yes, that's the
> Brexit effect for you) is "what do you do with exceptions that are
> neither UNDEF now HVC?".
>
> We are taking a leap of faith by assuming that the only thing that
> will UNDEF at EL2 while the stubs are installed is HVC. If anything
> else occurs, I have no idea what to do with it. I guess we could always
> ignore it instead of treating it as a HVC (as it is done at the
> moment).

Right, I think that concern applies pretty much equally whichever way
you slice it. "Any exception other than an unknown from EL2 must imply
HVC" doesn't seem any less sketchy than "Any exception other than HVC
implies something is horribly wrong and abandoning EL2 might be wise" to
me, but it was primarily that the latter avoids having to faff with the
SPSR as well. No big deal either way, just one of my "I reckon this
could be shorter..." musings; it's been particularly Friday today :)

Robin.

2021-08-14 09:40:48

by Marc Zyngier

Subject: Re: [PATCH 2/5] arm64: Handle UNDEF in the EL2 stub vectors

On Fri, 13 Aug 2021 19:17:56 +0100,
Robin Murphy <[email protected]> wrote:
>
> On 2021-08-13 18:41, Marc Zyngier wrote:
> > On Fri, 13 Aug 2021 14:08:23 +0100,
> > Robin Murphy <[email protected]> wrote:
> >>
> >> On 2021-08-12 20:02, Marc Zyngier wrote:
> >>> As we want to handle the silly case where HVC has been disabled
> >>> from EL3, augment our ability to handle exception at EL2.
> >>>
> >>> Check for unknown exceptions (usually UNDEF) coming from EL2,
> >>> and treat them as a failing HVC call into the stubs. While
> >>> this isn't great and obviously doesn't catter for the gigantic
> >>> range of possible exceptions, it isn't any worse than what we
> >>> have today.
> >>>
> >>> Just don't try and use it for anything else.
> >>>
> >>> Signed-off-by: Marc Zyngier <[email protected]>
> >>> ---
> >>> arch/arm64/kernel/hyp-stub.S | 19 ++++++++++++++++++-
> >>> 1 file changed, 18 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
> >>> index 43d212618834..026a34515b21 100644
> >>> --- a/arch/arm64/kernel/hyp-stub.S
> >>> +++ b/arch/arm64/kernel/hyp-stub.S
> >>> @@ -46,7 +46,16 @@ SYM_CODE_END(__hyp_stub_vectors)
> >>> .align 11
> >>> SYM_CODE_START_LOCAL(elx_sync)
> >>> - cmp x0, #HVC_SET_VECTORS
> >>> + mrs x4, spsr_el2
> >>> + and x4, x4, #PSR_MODE_MASK
> >>> + orr x4, x4, #1
> >>> + cmp x4, #PSR_MODE_EL2h
> >>> + b.ne 0f
> >>> + mrs x4, esr_el2
> >>> + eor x4, x4, #ESR_ELx_IL
> >>> + cbz x4, el2_undef
> >>
> >> Hmm, might it be neater to check ESR_EL2.ISS to see if we landed here
> >> for any reason *other* than a successfully-executed HVC?
> >
> > We absolutely could. However, the sixpence question (yes, that's the
> > Brexit effect for you) is "what do you do with exceptions that are
> > neither UNDEF now HVC?".
> >
> > We are taking a leap of faith by assuming that the only thing that
> > will UNDEF at EL2 while the stubs are installed is HVC. If anything
> > else occurs, I have no idea what to do with it. I guess we could always
> > ignore it instead of treating it as a HVC (as it is done at the
> > moment).
>
> Right, I think that concern applies pretty much equally whichever way
> you slice it. "Any exception other than an unknown from EL2 must imply
> HVC" doesn't seem any less sketchy than "Any exception other than HVC
> implies something is horribly wrong and abandoning EL2 might be wise"
> to me, but it was primarily that the latter avoids having to faff with
> the SPSR as well.

Actually, that's not a bad idea at all. Here's my take on the theme,
completely untested:

diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
index 43d212618834..5783dbab529f 100644
--- a/arch/arm64/kernel/hyp-stub.S
+++ b/arch/arm64/kernel/hyp-stub.S
@@ -46,6 +46,23 @@ SYM_CODE_END(__hyp_stub_vectors)
.align 11

SYM_CODE_START_LOCAL(elx_sync)
+ // tpidr_el2 isn't used for anything while the stubs are
+ // installed, so use it to save x0 while we guess the
+ // exception type. No, we don't have a stack...
+ msr tpidr_el2, x0
+ mrs x0, esr_el2
+ ubfx x0, x0, #26, #6
+ cmp x0, #ESR_ELx_EC_HVC64
+ b.eq elx_hvc
+ cbz x0, elx_unknown
+
+ // For anything else, we have no reasonable way to handle
+ // the exception. Go back to the faulting instruction...
+ mrs x0, tpidr_el2
+ eret
+
+elx_hvc:
+ mrs x0, tpidr_el2
cmp x0, #HVC_SET_VECTORS
b.ne 1f
msr vbar_el2, x1
@@ -71,6 +88,14 @@ SYM_CODE_START_LOCAL(elx_sync)

9: mov x0, xzr
eret
+
+elx_unknown:
+ // Assumes this was a HVC that went really wrong...
+ mrs x0, elr_el2
+ add x0, x0, #4
+ msr elr_el2, x0
+ mov_q x0, HVC_STUB_ERR
+ eret
SYM_CODE_END(elx_sync)

// nVHE? No way! Give me the real thing!


> No big deal either way, just one of my "I reckon this could be
> shorter..." musings; it's been particularly Friday today :)

Well, I just made it a lot longer! :D Let me know what you think.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.
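
Reader's note: the three-way dispatch in the untested diff above can be
summarised in C. This is a sketch of the decision only; the enum and
function names are hypothetical, and the exception-class values are the
ones believed to be in the arm64 headers (ESR_ELx_EC_UNKNOWN 0x00,
ESR_ELx_EC_HVC64 0x16).

```c
#include <assert.h>
#include <stdint.h>

#define ESR_ELx_EC_SHIFT    26
#define ESR_ELx_EC_UNKNOWN  0x00u
#define ESR_ELx_EC_HVC64    0x16u

/* The assembly uses cbz for the unknown class, which requires it to be 0. */
static_assert(ESR_ELx_EC_UNKNOWN == 0, "cbz relies on EC_UNKNOWN being 0");

/* Hypothetical labels for the three paths in the reworked elx_sync. */
enum stub_action {
	STUB_HANDLE_HVC,  /* elx_hvc: restore x0 and run the normal dispatch */
	STUB_FAIL_HVC,    /* elx_unknown: skip the instruction, return HVC_STUB_ERR */
	STUB_RETURN,      /* anything else: eret back to the faulting instruction */
};

static enum stub_action classify_sync(uint64_t esr_el2)
{
	/* Equivalent of "ubfx x0, x0, #26, #6". */
	uint32_t ec = (uint32_t)(esr_el2 >> ESR_ELx_EC_SHIFT) & 0x3f;

	if (ec == ESR_ELx_EC_HVC64)
		return STUB_HANDLE_HVC;
	if (ec == ESR_ELx_EC_UNKNOWN)
		return STUB_FAIL_HVC;
	return STUB_RETURN;
}
```

Compared with the version in patch 2, this keys on the exception class
alone and no longer needs to inspect SPSR_EL2.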

2021-08-15 07:30:47

by Florian Fainelli

Subject: Re: [PATCH 0/5] arm64: Survival kit for SCR_EL3.HCE==0 conditions



On 8/12/2021 9:02 PM, Marc Zyngier wrote:
> Anyone vaguely familiar with the ARMv8 architecture would quickly
> understand that entering the kernel at EL2 without enabling the HVC
> instruction is... living dangerously. But as it turns out [0], there
> is a whole range of (*cough*) "high quality" (*cough*) Broadcom
> systems out there configured exactly like that.

Only some Broadcom systems, namely the 4908 and all of those using CFE.
They later switched to u-boot and ATF and got it right.

>
> If you are speechless, I'm right with you.
>
> These machines have stopped being able to boot an upstream kernel
> since 5.12, where we changed the way we switch from nVHE to VHE, as
> this relies on the HVC instruction being usable... It is also worth
> noting that these systems have never been able to use KVM. Or kexec.
>
> This small series addresses the issue by detecting an UNDEFing HVC in
> a fairly controlled environment, and in this case pretend that we have
> booted at EL1. It also documents the requirement for SCR_EL3.HCE to be
> set to *1* if the kernel is entered at EL2. Turns out that we really
> have to state the obvious.
>
> This has been tested on a FVP model with a hacked-up boot-wrapper.
>
> Note that I really don't think any of this is -stable material, except
> maybe for the documentation. It isn't 5.14 material either. Best case,
> this is 5.15, or maybe even later. If ever.

While I am very appreciative of the work you have done here to try to
get the dysfunctional systems to warn and continue to boot, I would
rather we try to load a minimal shim at EL3 capable of fixing up any
incorrect EL3 register settings ahead of loading the kernel, provided
this is possible at all on a commercially available system. Rafal, is
this something that CFE allows you to do (I could not get a straight
answer from that team)? If so, have you tried it?
--
Florian

2021-08-15 09:30:59

by Marc Zyngier

Subject: Re: [PATCH 0/5] arm64: Survival kit for SCR_EL3.HCE==0 conditions

On Sun, 15 Aug 2021 08:28:47 +0100,
Florian Fainelli <[email protected]> wrote:
>
>
>
> On 8/12/2021 9:02 PM, Marc Zyngier wrote:
> > Anyone vaguely familiar with the ARMv8 architecture would quickly
> > understand that entering the kernel at EL2 without enabling the HVC
> > instruction is... living dangerously. But as it turns out [0], there
> > is a whole range of (*cough*) "high quality" (*cough*) Broadcom
> > systems out there configured exactly like that.
>
> Some Broadcom systems, namely the 4908 and all of those using CFE.
> Broadcom later switched to u-boot and ATF and got it right.

Do we have a list of the affected systems?

>
> >
> > If you are speechless, I'm right with you.
> >
> > These machines have stopped being able to boot an upstream kernel
> > since 5.12, where we changed the way we switch from nVHE to VHE, as
> > this relies on the HVC instruction being usable... It is also worth
> > noting that these systems have never been able to use KVM. Or kexec.
> >
> > This small series addresses the issue by detecting an UNDEFing HVC in
> > a fairly controlled environment, and in this case pretend that we have
> > booted at EL1. It also documents the requirement for SCR_EL3.HCE to be
> > set to *1* if the kernel is entered at EL2. Turns out that we really
> > have to state the obvious.
> >
> > This has been tested on a FVP model with a hacked-up boot-wrapper.
> >
> > Note that I really don't think any of this is -stable material, except
> > maybe for the documentation. It isn't 5.14 material either. Best case,
> > this is 5.15, or maybe even later. If ever.
>
> While I am very appreciative of the work you have done here to try to
> get the dysfunctional systems to warn and continue to boot, I would
> rather we try to load a minimal shim at EL3 capable of fixing up any
> incorrect EL3 register settings ahead of loading the kernel, provided
> this is possible at all on a commercially available system.

That would be the best thing to do, and would make the machine fully
usable. I still think we need to have something in the kernel to at
least let the user know that their system is misconfigured though.

If CFE allows a payload to be loaded at EL3 and executed on all CPUs,
that would be absolutely awesome. It would even allow switching over
to ATF...

Thanks,

M.

> Rafal, is this something that CFE allows you to do (I could not get
> a straight answer from that team)? If so, have you tried it?
> --
> Florian
>

--
Without deviation from the norm, progress is not possible.
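
As a rough illustration of the fallback discussed in this thread (notice
that HVC UNDEFs, then carry on as if the kernel had been entered at EL1,
and warn the user), here is a minimal host-side C sketch. It is not the
code from the series: the enum, effective_boot_mode() and the hvc_usable
flag are hypothetical names chosen for this example, and the real
detection is done in early assembly rather than in C.

```c
#include <stdbool.h>

/* Hypothetical model of the exception level the kernel was entered at. */
enum boot_mode { BOOT_MODE_EL1 = 1, BOOT_MODE_EL2 = 2 };

/*
 * Decide which mode the kernel should behave as.
 * entered:    the exception level the kernel was actually entered at.
 * hvc_usable: false if a probe HVC took an UNDEF (SCR_EL3.HCE == 0).
 * warn:       set to true when the firmware configuration is broken.
 */
static enum boot_mode effective_boot_mode(enum boot_mode entered,
                                          bool hvc_usable, bool *warn)
{
	*warn = false;
	if (entered == BOOT_MODE_EL2 && !hvc_usable) {
		/* EL2 without HVC: pretend we booted at EL1, and say so. */
		*warn = true;
		return BOOT_MODE_EL1;
	}
	return entered;
}
```

The warning is kept separate from the returned mode so that the caller
can report the misconfiguration later, once a console is available.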

2021-08-22 11:37:18

by Florian Fainelli

Subject: Re: [PATCH 0/5] arm64: Survival kit for SCR_EL3.HCE==0 conditions



On 8/15/2021 11:27 AM, Marc Zyngier wrote:
> On Sun, 15 Aug 2021 08:28:47 +0100,
> Florian Fainelli <[email protected]> wrote:
>>
>>
>>
>> On 8/12/2021 9:02 PM, Marc Zyngier wrote:
>>> Anyone vaguely familiar with the ARMv8 architecture would quickly
>>> understand that entering the kernel at EL2 without enabling the HVC
>>> instruction is... living dangerously. But as it turns out [0], there
>>> is a whole range of (*cough*) "high quality" (*cough*) Broadcom
>>> systems out there configured exactly like that.
>>
>> Some Broadcom systems, namely the 4908 and all of those using CFE.
>> Broadcom later switched to u-boot and ATF and got it right.
>
> Do we have a list of the affected systems?

That is going to be hard to come up with, given that multiple devices
from OEMs such as Asus, TP-Link, and Netgear are affected. I would start
with this list, which is far from exhaustive: Netgear R8000P, TP-Link
Archer C2300, Asus GT AC5300.

>
>>
>>>
>>> If you are speechless, I'm right with you.
>>>
>>> These machines have stopped being able to boot an upstream kernel
>>> since 5.12, where we changed the way we switch from nVHE to VHE, as
>>> this relies on the HVC instruction being usable... It is also worth
>>> noting that these systems have never been able to use KVM. Or kexec.
>>>
>>> This small series addresses the issue by detecting an UNDEFing HVC in
>>> a fairly controlled environment, and in this case pretend that we have
>>> booted at EL1. It also documents the requirement for SCR_EL3.HCE to be
>>> set to *1* if the kernel is entered at EL2. Turns out that we really
>>> have to state the obvious.
>>>
>>> This has been tested on a FVP model with a hacked-up boot-wrapper.
>>>
>>> Note that I really don't think any of this is -stable material, except
>>> maybe for the documentation. It isn't 5.14 material either. Best case,
>>> this is 5.15, or maybe even later. If ever.
>>
>> While I am very appreciative of the work you have done here to try to
>> get the dysfunctional systems to warn and continue to boot, I would
>> rather we try to load a minimal shim at EL3 capable of fixing up any
>> incorrect EL3 register settings ahead of loading the kernel, provided
>> this is possible at all on a commercially available system.
>
> That would be the best thing to do, and would make the machine fully
> usable. I still think we need to have something in the kernel to at
> least let the user know that their system is misconfigured though.
>
> If CFE allows a payload to be loaded at EL3 and executed on all CPUs,
> that would be absolutely awesome. It would even allow switching over
> to ATF...

Supposedly there are GPL tarballs containing a u-boot that is suitable
for the 4908; however, I have no idea where to find those. In principle
you could chainload those from CFE and get a seemingly functional system.
--
Florian

2021-08-24 10:50:44

by Catalin Marinas

Subject: Re: [PATCH 5/5] arm64: Document the requirement for SCR_EL3.HCE

On Thu, Aug 12, 2021 at 08:02:13PM +0100, Marc Zyngier wrote:
> It is amazing that we never documented this absolutely basic
> requirement: if you boot the kernel at EL2, you'd better
> enable the HVC instruction from EL3.
>
> Really, just do it.
>
> Signed-off-by: Marc Zyngier <[email protected]>
> ---
> Documentation/arm64/booting.rst | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/Documentation/arm64/booting.rst b/Documentation/arm64/booting.rst
> index a9192e7a231b..6c729d0c4bc2 100644
> --- a/Documentation/arm64/booting.rst
> +++ b/Documentation/arm64/booting.rst
> @@ -212,6 +212,11 @@ Before jumping into the kernel, the following conditions must be met:
> - The value of SCR_EL3.FIQ must be the same as the one present at boot
> time whenever the kernel is executing.
>
> + For all systems:
> + - If EL3 is present and the kernel is entered at EL2:
> +
> + - SCR_EL3.HCE (bit 8) must be initialised to 0b1.
> +
> For systems with a GICv3 interrupt controller to be used in v3 mode:
> - If EL3 is present:

I'll queue this patch only for now.

A nitpick, I think we should move "For all systems" and "If EL3 is
present..." above the lines describing the SCR_EL3.FIQ requirement (I
can make the change locally).

--
Catalin
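
For illustration only, the documented rule (SCR_EL3.HCE, bit 8, must be
1 when EL3 is present and the kernel is entered at EL2) can be written
as a small predicate over a saved SCR_EL3 value. This is a sketch, not
kernel or firmware code: scr_el3_hce_ok() and its parameters are
hypothetical names, and only the bit position comes from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* SCR_EL3.HCE is bit 8, as stated in the patch above. */
#define SCR_EL3_HCE (UINT64_C(1) << 8)

/* Hypothetical check: does this SCR_EL3 value satisfy the documented rule? */
static bool scr_el3_hce_ok(uint64_t scr_el3, bool el3_present,
                           bool entered_at_el2)
{
	/* The rule only applies when EL3 exists and the kernel is entered at EL2. */
	if (!el3_present || !entered_at_el2)
		return true;
	return (scr_el3 & SCR_EL3_HCE) != 0;
}
```

SCR_EL3 can only be read at EL3, so a check of this kind would have to
live in firmware rather than in the kernel, which is why the kernel side
of the series probes for an UNDEFing HVC instead.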

2021-08-24 10:54:01

by Mark Rutland

Subject: Re: [PATCH 5/5] arm64: Document the requirement for SCR_EL3.HCE

On Tue, Aug 24, 2021 at 11:49:01AM +0100, Catalin Marinas wrote:
> On Thu, Aug 12, 2021 at 08:02:13PM +0100, Marc Zyngier wrote:
> > It is amazing that we never documented this absolutely basic
> > requirement: if you boot the kernel at EL2, you'd better
> > enable the HVC instruction from EL3.
> >
> > Really, just do it.
> >
> > Signed-off-by: Marc Zyngier <[email protected]>
> > ---
> > Documentation/arm64/booting.rst | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/Documentation/arm64/booting.rst b/Documentation/arm64/booting.rst
> > index a9192e7a231b..6c729d0c4bc2 100644
> > --- a/Documentation/arm64/booting.rst
> > +++ b/Documentation/arm64/booting.rst
> > @@ -212,6 +212,11 @@ Before jumping into the kernel, the following conditions must be met:
> > - The value of SCR_EL3.FIQ must be the same as the one present at boot
> > time whenever the kernel is executing.
> >
> > + For all systems:
> > + - If EL3 is present and the kernel is entered at EL2:
> > +
> > + - SCR_EL3.HCE (bit 8) must be initialised to 0b1.
> > +
> > For systems with a GICv3 interrupt controller to be used in v3 mode:
> > - If EL3 is present:
>
> I'll queue this patch only for now.
>
> A nitpick, I think we should move "For all systems" and "If EL3 is
> present..." above the lines describing the SCR_EL3.FIQ requirement (I
> can make the change locally).


FWIW, with that:

Acked-by: Mark Rutland <[email protected]>

Mark.

2021-08-24 16:21:14

by Catalin Marinas

Subject: Re: (subset) [PATCH 0/5] arm64: Survival kit for SCR_EL3.HCE==0 conditions

On Thu, 12 Aug 2021 20:02:08 +0100, Marc Zyngier wrote:
> Anyone vaguely familiar with the ARMv8 architecture would quickly
> understand that entering the kernel at EL2 without enabling the HVC
> instruction is... living dangerously. But as it turns out [0], there
> is a whole range of (*cough*) "high quality" (*cough*) Broadcom
> systems out there configured exactly like that.
>
> If you are speechless, I'm right with you.
>
> [...]

Applied to arm64 (for-next/misc), thanks!

[5/5] arm64: Document the requirement for SCR_EL3.HCE
https://git.kernel.org/arm64/c/e3849765037b

--
Catalin