Subject: [PATCH] efi/x86: use naked RET on mixed mode call wrapper

When running with return thunks enabled under 32-bit EFI, the system
crashes with:

[ 0.137688] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[ 0.138136] BUG: unable to handle page fault for address: 000000005bc02900
[ 0.138136] #PF: supervisor instruction fetch in kernel mode
[ 0.138136] #PF: error_code(0x0011) - permissions violation
[ 0.138136] PGD 18f7063 P4D 18f7063 PUD 18ff063 PMD 190e063 PTE 800000005bc02063
[ 0.138136] Oops: 0011 [#1] PREEMPT SMP PTI
[ 0.138136] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc6+ #166
[ 0.138136] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[ 0.138136] RIP: 0010:0x5bc02900
[ 0.138136] Code: Unable to access opcode bytes at RIP 0x5bc028d6.
[ 0.138136] RSP: 0018:ffffffffb3203e10 EFLAGS: 00010046
[ 0.138136] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000048
[ 0.138136] RDX: 000000000190dfac RSI: 0000000000001710 RDI: 000000007eae823b
[ 0.138136] RBP: ffffffffb3203e70 R08: 0000000001970000 R09: ffffffffb3203e28
[ 0.138136] R10: 747563657865206c R11: 6c6977203a696665 R12: 0000000000001710
[ 0.138136] R13: 0000000000000030 R14: 0000000001970000 R15: 0000000000000001
[ 0.138136] FS: 0000000000000000(0000) GS:ffff8e013ca00000(0000) knlGS:0000000000000000
[ 0.138136] CS: 0010 DS: 0018 ES: 0018 CR0: 0000000080050033
[ 0.138136] CR2: 000000005bc02900 CR3: 0000000001930000 CR4: 00000000000006f0
[ 0.138136] Call Trace:
[ 0.138136] <TASK>
[ 0.138136] ? efi_set_virtual_address_map+0x9c/0x175
[ 0.138136] efi_enter_virtual_mode+0x4a6/0x53e
[ 0.138136] start_kernel+0x67c/0x71e
[ 0.138136] x86_64_start_reservations+0x24/0x2a
[ 0.138136] x86_64_start_kernel+0xe9/0xf4
[ 0.138136] secondary_startup_64_no_verify+0xe5/0xeb
[ 0.138136] </TASK>

That's because it cannot jump to the return thunk from the 32-bit code.
Using a naked RET and marking it as safe allows the system to proceed
booting.

Fixes: aa3d480315ba ("x86: Use return-thunk in asm code")
Reported-by: Guenter Roeck <[email protected]>
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: <[email protected]>
---

Does this leave one potential attack vector open? Perhaps, since this is
running under a different mapping (AFAIU), the risk is reduced? Or rather, the
attacker could attack using the firmware RETs anyway?

Alternatively, we could use IBPB when available when using the wrapper.

Thoughts?

---
arch/x86/platform/efi/efi_thunk_64.S | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/platform/efi/efi_thunk_64.S b/arch/x86/platform/efi/efi_thunk_64.S
index 9ffe2bad27d5..4e5257a4811b 100644
--- a/arch/x86/platform/efi/efi_thunk_64.S
+++ b/arch/x86/platform/efi/efi_thunk_64.S
@@ -23,6 +23,7 @@
#include <linux/objtool.h>
#include <asm/page_types.h>
#include <asm/segment.h>
+#include <asm/nospec-branch.h>

.text
.code64
@@ -75,7 +76,9 @@ STACK_FRAME_NON_STANDARD __efi64_thunk
1: movq 0x20(%rsp), %rsp
pop %rbx
pop %rbp
- RET
+ ANNOTATE_UNRET_SAFE
+ ret
+ int3

.code32
2: pushl $__KERNEL_CS
--
2.34.1


2022-07-15 23:13:50

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH] efi/x86: use naked RET on mixed mode call wrapper

On Fri, Jul 15, 2022 at 04:45:50PM -0300, Thadeu Lima de Souza Cascardo wrote:
> When running with return thunks enabled under 32-bit EFI, the system
> crashes with:
>
> [ 0.137688] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> [ 0.138136] BUG: unable to handle page fault for address: 000000005bc02900
> [ 0.138136] #PF: supervisor instruction fetch in kernel mode
> [ 0.138136] #PF: error_code(0x0011) - permissions violation
> [ 0.138136] PGD 18f7063 P4D 18f7063 PUD 18ff063 PMD 190e063 PTE 800000005bc02063
> [ 0.138136] Oops: 0011 [#1] PREEMPT SMP PTI
> [ 0.138136] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc6+ #166
> [ 0.138136] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> [ 0.138136] RIP: 0010:0x5bc02900
> [ 0.138136] Code: Unable to access opcode bytes at RIP 0x5bc028d6.
> [ 0.138136] RSP: 0018:ffffffffb3203e10 EFLAGS: 00010046
> [ 0.138136] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000048
> [ 0.138136] RDX: 000000000190dfac RSI: 0000000000001710 RDI: 000000007eae823b
> [ 0.138136] RBP: ffffffffb3203e70 R08: 0000000001970000 R09: ffffffffb3203e28
> [ 0.138136] R10: 747563657865206c R11: 6c6977203a696665 R12: 0000000000001710
> [ 0.138136] R13: 0000000000000030 R14: 0000000001970000 R15: 0000000000000001
> [ 0.138136] FS: 0000000000000000(0000) GS:ffff8e013ca00000(0000) knlGS:0000000000000000
> [ 0.138136] CS: 0010 DS: 0018 ES: 0018 CR0: 0000000080050033
> [ 0.138136] CR2: 000000005bc02900 CR3: 0000000001930000 CR4: 00000000000006f0
> [ 0.138136] Call Trace:
> [ 0.138136] <TASK>
> [ 0.138136] ? efi_set_virtual_address_map+0x9c/0x175
> [ 0.138136] efi_enter_virtual_mode+0x4a6/0x53e
> [ 0.138136] start_kernel+0x67c/0x71e
> [ 0.138136] x86_64_start_reservations+0x24/0x2a
> [ 0.138136] x86_64_start_kernel+0xe9/0xf4
> [ 0.138136] secondary_startup_64_no_verify+0xe5/0xeb
> [ 0.138136] </TASK>
>
> That's because it cannot jump to the return thunk from the 32-bit code.
> Using a naked RET and marking it as safe allows the system to proceed
> booting.
>
> Fixes: aa3d480315ba ("x86: Use return-thunk in asm code")
> Reported-by: Guenter Roeck <[email protected]>
> Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
> Cc: Peter Zijlstra (Intel) <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Josh Poimboeuf <[email protected]>
> Cc: <[email protected]>

No more crashes with this patch applies on top of the mainline kernel
(sha e5d523f1ae8f).

Tested-by: Guenter Roeck <[email protected]>

Guenter

2022-07-18 11:54:52

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] efi/x86: use naked RET on mixed mode call wrapper

On Fri, Jul 15, 2022 at 04:45:50PM -0300, Thadeu Lima de Souza Cascardo wrote:
> When running with return thunks enabled under 32-bit EFI, the system
> crashes with:
>
> [ 0.137688] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> [ 0.138136] BUG: unable to handle page fault for address: 000000005bc02900
> [ 0.138136] #PF: supervisor instruction fetch in kernel mode
> [ 0.138136] #PF: error_code(0x0011) - permissions violation
> [ 0.138136] PGD 18f7063 P4D 18f7063 PUD 18ff063 PMD 190e063 PTE 800000005bc02063
> [ 0.138136] Oops: 0011 [#1] PREEMPT SMP PTI
> [ 0.138136] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc6+ #166
> [ 0.138136] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> [ 0.138136] RIP: 0010:0x5bc02900
> [ 0.138136] Code: Unable to access opcode bytes at RIP 0x5bc028d6.
> [ 0.138136] RSP: 0018:ffffffffb3203e10 EFLAGS: 00010046
> [ 0.138136] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000048
> [ 0.138136] RDX: 000000000190dfac RSI: 0000000000001710 RDI: 000000007eae823b
> [ 0.138136] RBP: ffffffffb3203e70 R08: 0000000001970000 R09: ffffffffb3203e28
> [ 0.138136] R10: 747563657865206c R11: 6c6977203a696665 R12: 0000000000001710
> [ 0.138136] R13: 0000000000000030 R14: 0000000001970000 R15: 0000000000000001
> [ 0.138136] FS: 0000000000000000(0000) GS:ffff8e013ca00000(0000) knlGS:0000000000000000
> [ 0.138136] CS: 0010 DS: 0018 ES: 0018 CR0: 0000000080050033
> [ 0.138136] CR2: 000000005bc02900 CR3: 0000000001930000 CR4: 00000000000006f0
> [ 0.138136] Call Trace:
> [ 0.138136] <TASK>
> [ 0.138136] ? efi_set_virtual_address_map+0x9c/0x175
> [ 0.138136] efi_enter_virtual_mode+0x4a6/0x53e
> [ 0.138136] start_kernel+0x67c/0x71e
> [ 0.138136] x86_64_start_reservations+0x24/0x2a
> [ 0.138136] x86_64_start_kernel+0xe9/0xf4
> [ 0.138136] secondary_startup_64_no_verify+0xe5/0xeb
> [ 0.138136] </TASK>
>
> That's because it cannot jump to the return thunk from the 32-bit code.
> Using a naked RET and marking it as safe allows the system to proceed
> booting.
>
> Fixes: aa3d480315ba ("x86: Use return-thunk in asm code")
> Reported-by: Guenter Roeck <[email protected]>
> Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
> Cc: Peter Zijlstra (Intel) <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Josh Poimboeuf <[email protected]>
> Cc: <[email protected]>
> ---
>
> Does this leave one potential attack vector open? Perhaps, since this is
> running under a different mapping (AFAIU), the risk is reduced? Or rather, the
> attacker could attack using the firmware RETs anyway?
>
> Alternatively, we could use IBPB when available when using the wrapper.
>
> Thoughts?

What actual uarch are you running this on? Is this AMD hardware?

For Intel we'll enable IBRS for firmware if it is not otherwise enabled
(upstream will always enable IBRS for the SKL family chips, but Thomas
just posted the retbleed=stuff approach yesterday that will not)

On AMD I think you're stuck with IBPB, but that has to be issued
*before* calling the firmware muck.

In either case, I think the patch as proposed is fine. But perhaps we
want something like the below on top.

---
Subject: x86/amd: Use IBPB for firmware calls

On AMD IBRS does not prevent Retbleed; as such use IBPB before a
firmware call to flush the branch history state.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/nospec-branch.h | 2 ++
arch/x86/kernel/cpu/bugs.c | 11 ++++++++++-
3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 00f5227c8459..a77b915d36a8 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -302,6 +302,7 @@
#define X86_FEATURE_RETPOLINE_LFENCE (11*32+13) /* "" Use LFENCE for Spectre variant 2 */
#define X86_FEATURE_RETHUNK (11*32+14) /* "" Use REturn THUNK */
#define X86_FEATURE_UNRET (11*32+15) /* "" AMD BTB untrain return */
+#define X86_FEATURE_USE_IBPB_FW (11*32+16) /* "" Use IBPB during runtime firmware calls */

/* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
#define X86_FEATURE_AVX_VNNI (12*32+ 4) /* AVX VNNI instructions */
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 10a3bfc1eb23..f934dcdb7c0d 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -297,6 +297,8 @@ do { \
alternative_msr_write(MSR_IA32_SPEC_CTRL, \
spec_ctrl_current() | SPEC_CTRL_IBRS, \
X86_FEATURE_USE_IBRS_FW); \
+ altnerative_msr_write(MSR_IA32_PRED_CMD, PRED_CMD_IBPB, \
+ X86_FEATURE_USE_IBPB_FW); \
} while (0)

#define firmware_restrict_branch_speculation_end() \
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index aa34f908c39f..78c9082242a9 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1516,7 +1516,16 @@ static void __init spectre_v2_select_mitigation(void)
* the CPU supports Enhanced IBRS, kernel might un-intentionally not
* enable IBRS around firmware calls.
*/
- if (boot_cpu_has(X86_FEATURE_IBRS) && !spectre_v2_in_ibrs_mode(mode)) {
+ if (boot_cpu_has_bug(X86_BUG_RETBLEED) &&
+ (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
+ boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)) {
+
+ if (retbleed_cmd != RETBLEED_CMD_IBPB) {
+ setup_force_cpu_cap(X86_FEATURE_USE_IBPB_FW);
+ pr_info("Enabling Speculation Barrier for firmware calls\n");
+ }
+
+ } else if (boot_cpu_has(X86_FEATURE_IBRS) && !spectre_v2_in_ibrs_mode(mode)) {
setup_force_cpu_cap(X86_FEATURE_USE_IBRS_FW);
pr_info("Enabling Restricted Speculation for firmware calls\n");
}

Subject: Re: [PATCH] efi/x86: use naked RET on mixed mode call wrapper

On Mon, Jul 18, 2022 at 01:41:37PM +0200, Peter Zijlstra wrote:
> On Fri, Jul 15, 2022 at 04:45:50PM -0300, Thadeu Lima de Souza Cascardo wrote:
> > When running with return thunks enabled under 32-bit EFI, the system
> > crashes with:
> >
> > [ 0.137688] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> > [ 0.138136] BUG: unable to handle page fault for address: 000000005bc02900
> > [ 0.138136] #PF: supervisor instruction fetch in kernel mode
> > [ 0.138136] #PF: error_code(0x0011) - permissions violation
> > [ 0.138136] PGD 18f7063 P4D 18f7063 PUD 18ff063 PMD 190e063 PTE 800000005bc02063
> > [ 0.138136] Oops: 0011 [#1] PREEMPT SMP PTI
> > [ 0.138136] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc6+ #166
> > [ 0.138136] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> > [ 0.138136] RIP: 0010:0x5bc02900
> > [ 0.138136] Code: Unable to access opcode bytes at RIP 0x5bc028d6.
> > [ 0.138136] RSP: 0018:ffffffffb3203e10 EFLAGS: 00010046
> > [ 0.138136] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000048
> > [ 0.138136] RDX: 000000000190dfac RSI: 0000000000001710 RDI: 000000007eae823b
> > [ 0.138136] RBP: ffffffffb3203e70 R08: 0000000001970000 R09: ffffffffb3203e28
> > [ 0.138136] R10: 747563657865206c R11: 6c6977203a696665 R12: 0000000000001710
> > [ 0.138136] R13: 0000000000000030 R14: 0000000001970000 R15: 0000000000000001
> > [ 0.138136] FS: 0000000000000000(0000) GS:ffff8e013ca00000(0000) knlGS:0000000000000000
> > [ 0.138136] CS: 0010 DS: 0018 ES: 0018 CR0: 0000000080050033
> > [ 0.138136] CR2: 000000005bc02900 CR3: 0000000001930000 CR4: 00000000000006f0
> > [ 0.138136] Call Trace:
> > [ 0.138136] <TASK>
> > [ 0.138136] ? efi_set_virtual_address_map+0x9c/0x175
> > [ 0.138136] efi_enter_virtual_mode+0x4a6/0x53e
> > [ 0.138136] start_kernel+0x67c/0x71e
> > [ 0.138136] x86_64_start_reservations+0x24/0x2a
> > [ 0.138136] x86_64_start_kernel+0xe9/0xf4
> > [ 0.138136] secondary_startup_64_no_verify+0xe5/0xeb
> > [ 0.138136] </TASK>
> >
> > That's because it cannot jump to the return thunk from the 32-bit code.
> > Using a naked RET and marking it as safe allows the system to proceed
> > booting.
> >
> > Fixes: aa3d480315ba ("x86: Use return-thunk in asm code")
> > Reported-by: Guenter Roeck <[email protected]>
> > Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
> > Cc: Peter Zijlstra (Intel) <[email protected]>
> > Cc: Borislav Petkov <[email protected]>
> > Cc: Josh Poimboeuf <[email protected]>
> > Cc: <[email protected]>
> > ---
> >
> > Does this leave one potential attack vector open? Perhaps, since this is
> > running under a different mapping (AFAIU), the risk is reduced? Or rather, the
> > attacker could attack using the firmware RETs anyway?
> >
> > Alternatively, we could use IBPB when available when using the wrapper.
> >
> > Thoughts?
>
> What actual uarch are you running this on? Is this AMD hardware?
>
> For Intel we'll enable IBRS for firmware if it is not otherwise enabled
> (upstream will always enable IBRS for the SKL family chips, but Thomas
> just posted the retbleed=stuff approach yesterday that will not)
>
> On AMD I think you're stuck with IBPB, but that has to be issued
> *before* calling the firmware muck.
>
> In either case, I think the patch as proposed is fine. But perhaps we
> want something like the below on top.
>

I was testing this on Intel, because Guenter Roeck had reported such failures
when booting with 32-bit EFI. My patch fixes the boot problem, but then I asked
whether it would leave us vulnerable and, then, I was thinking about AMD
mostly, as you pointed out.

And I think you nailed what I had in mind for using IBPB when doing firmware
calls, and perhaps this is wanted even when we ignore this naked RET here.

There is a typo on your patch below, but I will give it a try and see if it
doesn't blow up on AMD systems without IBPB (by way of emulation).

Thanks.
Cascardo.

> ---
> Subject: x86/amd: Use IBPB for firmware calls
>
> On AMD IBRS does not prevent Retbleed; as such use IBPB before a
> firmware call to flush the branch history state.
>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> ---
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/include/asm/nospec-branch.h | 2 ++
> arch/x86/kernel/cpu/bugs.c | 11 ++++++++++-
> 3 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 00f5227c8459..a77b915d36a8 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -302,6 +302,7 @@
> #define X86_FEATURE_RETPOLINE_LFENCE (11*32+13) /* "" Use LFENCE for Spectre variant 2 */
> #define X86_FEATURE_RETHUNK (11*32+14) /* "" Use REturn THUNK */
> #define X86_FEATURE_UNRET (11*32+15) /* "" AMD BTB untrain return */
> +#define X86_FEATURE_USE_IBPB_FW (11*32+16) /* "" Use IBPB during runtime firmware calls */
>
> /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
> #define X86_FEATURE_AVX_VNNI (12*32+ 4) /* AVX VNNI instructions */
> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> index 10a3bfc1eb23..f934dcdb7c0d 100644
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -297,6 +297,8 @@ do { \
> alternative_msr_write(MSR_IA32_SPEC_CTRL, \
> spec_ctrl_current() | SPEC_CTRL_IBRS, \
> X86_FEATURE_USE_IBRS_FW); \
> + altnerative_msr_write(MSR_IA32_PRED_CMD, PRED_CMD_IBPB, \
> + X86_FEATURE_USE_IBPB_FW); \
> } while (0)
>
> #define firmware_restrict_branch_speculation_end() \
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index aa34f908c39f..78c9082242a9 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -1516,7 +1516,16 @@ static void __init spectre_v2_select_mitigation(void)
> * the CPU supports Enhanced IBRS, kernel might un-intentionally not
> * enable IBRS around firmware calls.
> */
> - if (boot_cpu_has(X86_FEATURE_IBRS) && !spectre_v2_in_ibrs_mode(mode)) {
> + if (boot_cpu_has_bug(X86_BUG_RETBLEED) &&
> + (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
> + boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)) {
> +
> + if (retbleed_cmd != RETBLEED_CMD_IBPB) {
> + setup_force_cpu_cap(X86_FEATURE_USE_IBPB_FW);
> + pr_info("Enabling Speculation Barrier for firmware calls\n");
> + }
> +
> + } else if (boot_cpu_has(X86_FEATURE_IBRS) && !spectre_v2_in_ibrs_mode(mode)) {
> setup_force_cpu_cap(X86_FEATURE_USE_IBRS_FW);
> pr_info("Enabling Restricted Speculation for firmware calls\n");
> }

2022-07-18 16:24:16

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] efi/x86: use naked RET on mixed mode call wrapper

On Mon, Jul 18, 2022 at 10:59:18AM -0300, Thadeu Lima de Souza Cascardo wrote:

> And I think you nailed what I had in mind for using IBPB when doing firmware
> calls, and perhaps this is wanted even when we ignore this naked RET here.
>
> There is a typo on your patch below, but I will give it a try and see if it
> doesn't blow up on AMD systems without IBPB (by way of emulation).

Yeah, Boris pointed out the same. Typing is hard ;-)

2022-07-18 16:40:04

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] efi/x86: use naked RET on mixed mode call wrapper

On Mon, Jul 18, 2022 at 01:41:37PM +0200, Peter Zijlstra wrote:
> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> index 10a3bfc1eb23..f934dcdb7c0d 100644
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -297,6 +297,8 @@ do { \
> alternative_msr_write(MSR_IA32_SPEC_CTRL, \
> spec_ctrl_current() | SPEC_CTRL_IBRS, \
> X86_FEATURE_USE_IBRS_FW); \
> + altnerative_msr_write(MSR_IA32_PRED_CMD, PRED_CMD_IBPB, \
> + X86_FEATURE_USE_IBPB_FW); \
> } while (0)

So I'm being told we need to untrain on return from EFI to protect the
kernel from it. Ontop of yours.

Asm looks correct but lemme run it through the test builds first...

---
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 38a3e86e665e..e58f18555022 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -62,6 +62,12 @@
dec reg; \
jnz 771b;

+#ifdef CONFIG_CPU_UNRET_ENTRY
+#define CALL_ZEN_UNTRAIN_RET "call zen_untrain_ret"
+#else
+#define CALL_ZEN_UNTRAIN_RET ""
+#endif
+
#ifdef __ASSEMBLY__

/*
@@ -128,12 +134,6 @@
.Lskip_rsb_\@:
.endm

-#ifdef CONFIG_CPU_UNRET_ENTRY
-#define CALL_ZEN_UNTRAIN_RET "call zen_untrain_ret"
-#else
-#define CALL_ZEN_UNTRAIN_RET ""
-#endif
-
/*
* Mitigate RETBleed for AMD/Hygon Zen uarch. Requires KERNEL CR3 because the
* return thunk isn't mapped into the userspace tables (then again, AMD
@@ -169,6 +169,16 @@ extern void __x86_return_thunk(void);
extern void zen_untrain_ret(void);
extern void entry_ibpb(void);

+#if defined(CONFIG_CPU_UNRET_ENTRY) || defined(CONFIG_CPU_IBPB_ENTRY)
+# define UNTRAIN_RET \
+ asm volatile(ALTERNATIVE_2(ANNOTATE_RETPOLINE_SAFE "nop", \
+ CALL_ZEN_UNTRAIN_RET, X86_FEATURE_UNRET, \
+ "call entry_ibpb", X86_FEATURE_ENTRY_IBPB) \
+ ::)
+#else
+# define UNTRAIN_RET
+#endif
+
#ifdef CONFIG_RETPOLINE

#define GEN(reg) \
@@ -306,6 +316,7 @@ do { \
alternative_msr_write(MSR_IA32_SPEC_CTRL, \
spec_ctrl_current(), \
X86_FEATURE_USE_IBRS_FW); \
+ UNTRAIN_RET; \
preempt_enable(); \
} while (0)


--
Regards/Gruss,
Boris.

SUSE Software Solutions Germany GmbH
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman
(HRB 36809, AG Nürnberg)

2022-07-18 16:42:07

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] efi/x86: use naked RET on mixed mode call wrapper

On Mon, Jul 18, 2022 at 06:28:27PM +0200, Borislav Petkov wrote:
> On Mon, Jul 18, 2022 at 01:41:37PM +0200, Peter Zijlstra wrote:
> > diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> > index 10a3bfc1eb23..f934dcdb7c0d 100644
> > --- a/arch/x86/include/asm/nospec-branch.h
> > +++ b/arch/x86/include/asm/nospec-branch.h
> > @@ -297,6 +297,8 @@ do { \
> > alternative_msr_write(MSR_IA32_SPEC_CTRL, \
> > spec_ctrl_current() | SPEC_CTRL_IBRS, \
> > X86_FEATURE_USE_IBRS_FW); \
> > + altnerative_msr_write(MSR_IA32_PRED_CMD, PRED_CMD_IBPB, \
> > + X86_FEATURE_USE_IBPB_FW); \
> > } while (0)
>
> So I'm being told we need to untrain on return from EFI to protect the
> kernel from it. Ontop of yours.

I don't think there's any credible way we can protect against EFI taking
over the system if it wants to. It runs at CPL0 and has access to the
direct map. If EFI wants it can take over the system without trying.

2022-07-18 18:52:51

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] efi/x86: use naked RET on mixed mode call wrapper

On Mon, Jul 18, 2022 at 9:28 AM Borislav Petkov <[email protected]> wrote:
>
> So I'm being told we need to untrain on return from EFI to protect the
> kernel from it.

Why would we have to protect the kernel from EFI?

If we can't trust EFI, then the machine is already compromised. We
just *called* an EFI routine, if EFI is untrusted, it did something
random.

I mean, it could have already done something bad at boot time when it
loaded the kernel.

So no, let's not "protect ourselves from EFI".

Linus

2022-07-18 18:59:06

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] efi/x86: use naked RET on mixed mode call wrapper

On Mon, Jul 18, 2022 at 11:34:02AM -0700, Linus Torvalds wrote:
> Why would we have to protect the kernel from EFI?

Yes, we cleared this up on IRC in the meantime.

This was raised as a concern in case we don't trust EFI. But we cannot
not (double negation on purpose) trust EFI because it can do whatever it
likes anyway, "underneath" the OS.

I'm keeping the UNTRAIN_RET-in-C diff in my patches/ folder, though - I
get the feeling we might need it soon for something else.

:-)

--
Regards/Gruss,
Boris.

SUSE Software Solutions Germany GmbH
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman
(HRB 36809, AG Nürnberg)

2022-07-19 16:11:13

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH] efi/x86: use naked RET on mixed mode call wrapper

On Mon, 18 Jul 2022 at 20:46, Borislav Petkov <[email protected]> wrote:
>
> On Mon, Jul 18, 2022 at 11:34:02AM -0700, Linus Torvalds wrote:
> > Why would we have to protect the kernel from EFI?
>
> Yes, we cleared this up on IRC in the meantime.
>
> This was raised as a concern in case we don't trust EFI. But we cannot
> not (double negation on purpose) trust EFI because it can do whatever it
> likes anyway, "underneath" the OS.
>
> I'm keeping the UNTRAIN_RET-in-C diff in my patches/ folder, though - I
> get the feeling we might need it soon for something else.
>
> :-)
>

The code in question is a little trampoline that executes from the EFI
mixed mode 1:1 mapping of the kernel text, and never via the kernel
mapping, so we should just move it into .rodata instead (and fix up
the mixed mode virtual address map logic accordingly). I don't think
mapping the kernel text and rodata into the 1;1 EFI map is needed at
all, tbh, and the only thing we ever access via that mapping is that
little trampoline.

Something like

--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -176,7 +176,7 @@ virt_to_phys_or_null_size(void *va, unsigned long size)

int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
{
- unsigned long pfn, text, pf, rodata;
+ unsigned long pfn, pf, rodata;
struct page *page;
unsigned npages;
pgd_t *pgd = efi_mm.pgd;
@@ -222,7 +222,7 @@ int __init efi_setup_page_tables(unsigned long
pa_memmap, unsigned num_pages)
/*
* When making calls to the firmware everything needs to be 1:1
* mapped and addressable with 32-bit pointers. Map the kernel
- * text and allocate a new stack because we can't rely on the
+ * rodata and allocate a new stack because we can't rely on the
* stack pointer being < 4GB.
*/
if (!efi_is_mixed())
@@ -236,21 +236,11 @@ int __init efi_setup_page_tables(unsigned long
pa_memmap, unsigned num_pages)

efi_mixed_mode_stack_pa = page_to_phys(page + 1); /* stack grows down */

- npages = (_etext - _text) >> PAGE_SHIFT;
- text = __pa(_text);
- pfn = text >> PAGE_SHIFT;
-
- pf = _PAGE_ENC;
- if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, pf)) {
- pr_err("Failed to map kernel text 1:1\n");
- return 1;
- }
-
npages = (__end_rodata - __start_rodata) >> PAGE_SHIFT;
rodata = __pa(__start_rodata);
pfn = rodata >> PAGE_SHIFT;

- pf = _PAGE_NX | _PAGE_ENC;
+ pf = _PAGE_ENC;
if (kernel_map_pages_in_pgd(pgd, pfn, rodata, npages, pf)) {
pr_err("Failed to map kernel rodata 1:1\n");
return 1;
diff --git a/arch/x86/platform/efi/efi_thunk_64.S
b/arch/x86/platform/efi/efi_thunk_64.S
index 854dd81804b7..d4ee75ebfad6 100644
--- a/arch/x86/platform/efi/efi_thunk_64.S
+++ b/arch/x86/platform/efi/efi_thunk_64.S
@@ -71,7 +71,9 @@ STACK_FRAME_NON_STANDARD __efi64_thunk
pushq $__KERNEL32_CS
pushq %rdi /* EFI runtime service address */
lretq
+SYM_FUNC_END(__efi64_thunk)

+ .section ".rodata", "a"
1: movq 0x20(%rsp), %rsp
pop %rbx
pop %rbp
@@ -81,7 +83,6 @@ STACK_FRAME_NON_STANDARD __efi64_thunk
2: pushl $__KERNEL_CS
pushl %ebp
lret
-SYM_FUNC_END(__efi64_thunk)

.bss
.balign 8

2022-07-19 18:37:07

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] efi/x86: use naked RET on mixed mode call wrapper

On Tue, Jul 19, 2022 at 05:22:28PM +0200, Ard Biesheuvel wrote:
> The code in question is a little trampoline that executes from the EFI
> mixed mode 1:1 mapping of the kernel text, and never via the kernel
> mapping, so we should just move it into .rodata instead (and fix up
> the mixed mode virtual address map logic accordingly). I don't think
> mapping the kernel text and rodata into the 1;1 EFI map is needed at
> all, tbh, and the only thing we ever access via that mapping is that
> little trampoline.
>
> Something like

I'm obviously always for simplifications like that. I'm guessing this
should be tested for a full next release before it goes to Linus?

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Subject: [tip: x86/urgent] x86/amd: Use IBPB for firmware calls

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 28a99e95f55c61855983d36a88c05c178d966bb7
Gitweb: https://git.kernel.org/tip/28a99e95f55c61855983d36a88c05c178d966bb7
Author: Peter Zijlstra <[email protected]>
AuthorDate: Mon, 18 Jul 2022 13:41:37 +02:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Mon, 18 Jul 2022 15:38:09 +02:00

x86/amd: Use IBPB for firmware calls

On AMD IBRS does not prevent Retbleed; as such use IBPB before a
firmware call to flush the branch history state.

And because in order to do an EFI call, the kernel maps a whole lot of
the kernel page table into the EFI page table, do an IBPB just in case
in order to prevent the scenario of poisoning the BTB and causing an EFI
call using the unprotected RET there.

[ bp: Massage. ]

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/nospec-branch.h | 2 ++
arch/x86/kernel/cpu/bugs.c | 11 ++++++++++-
3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 00f5227..a77b915 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -302,6 +302,7 @@
#define X86_FEATURE_RETPOLINE_LFENCE (11*32+13) /* "" Use LFENCE for Spectre variant 2 */
#define X86_FEATURE_RETHUNK (11*32+14) /* "" Use REturn THUNK */
#define X86_FEATURE_UNRET (11*32+15) /* "" AMD BTB untrain return */
+#define X86_FEATURE_USE_IBPB_FW (11*32+16) /* "" Use IBPB during runtime firmware calls */

/* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
#define X86_FEATURE_AVX_VNNI (12*32+ 4) /* AVX VNNI instructions */
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 10a3bfc..38a3e86 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -297,6 +297,8 @@ do { \
alternative_msr_write(MSR_IA32_SPEC_CTRL, \
spec_ctrl_current() | SPEC_CTRL_IBRS, \
X86_FEATURE_USE_IBRS_FW); \
+ alternative_msr_write(MSR_IA32_PRED_CMD, PRED_CMD_IBPB, \
+ X86_FEATURE_USE_IBPB_FW); \
} while (0)

#define firmware_restrict_branch_speculation_end() \
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index aa34f90..78c9082 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1516,7 +1516,16 @@ static void __init spectre_v2_select_mitigation(void)
* the CPU supports Enhanced IBRS, kernel might un-intentionally not
* enable IBRS around firmware calls.
*/
- if (boot_cpu_has(X86_FEATURE_IBRS) && !spectre_v2_in_ibrs_mode(mode)) {
+ if (boot_cpu_has_bug(X86_BUG_RETBLEED) &&
+ (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
+ boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)) {
+
+ if (retbleed_cmd != RETBLEED_CMD_IBPB) {
+ setup_force_cpu_cap(X86_FEATURE_USE_IBPB_FW);
+ pr_info("Enabling Speculation Barrier for firmware calls\n");
+ }
+
+ } else if (boot_cpu_has(X86_FEATURE_IBRS) && !spectre_v2_in_ibrs_mode(mode)) {
setup_force_cpu_cap(X86_FEATURE_USE_IBRS_FW);
pr_info("Enabling Restricted Speculation for firmware calls\n");
}