When running identity mapped and depending on the kernel configuration,
it is possible that cc_platform_has() can have compiler generated code
that uses jump tables. This causes a boot failure because the jump table
uses un-mapped kernel virtual addresses, not identity mapped addresses.
This has been seen with CONFIG_RETPOLINE=n.
Similar to sme_encrypt_kernel(), use an open-coded direct check for the
status of SNP rather than trying to eliminate the jump table. This
preserves any code optimization in cc_platform_has() that can be useful
post boot. It also limits the changes to SEV-specific files so that
future compiler features won't necessarily require possible build changes
just because they are not compatible with running identity mapped.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216385
Link: https://lore.kernel.org/all/[email protected]/
Cc: <[email protected]> # 5.19.x
Reported-by: Sean Christopherson <[email protected]>
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kernel/sev.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 63dc626627a0..4f84c3f11af5 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -701,7 +701,13 @@ static void __init early_set_pages_state(unsigned long paddr, unsigned int npage
void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
unsigned int npages)
{
- if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
+ /*
+ * This can be invoked in early boot while running identity mapped, so
+ * use an open coded check for SNP instead of using cc_platform_has().
+ * This eliminates worries about jump tables or checking boot_cpu_data
+ * in the cc_platform_has() function.
+ */
+ if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
return;
/*
@@ -717,7 +723,13 @@ void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
unsigned int npages)
{
- if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
+ /*
+ * This can be invoked in early boot while running identity mapped, so
+ * use an open coded check for SNP instead of using cc_platform_has().
+ * This eliminates worries about jump tables or checking boot_cpu_data
+ * in the cc_platform_has() function.
+ */
+ if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
return;
/* Invalidate the memory pages before they are marked shared in the RMP table. */
--
2.37.2
On 8/23/22 14:55, Tom Lendacky wrote:
> When running identity mapped and depending on the kernel configuration,
> it is possible that cc_platform_has() can have compiler generated code
> that uses jump tables. This causes a boot failure because the jump table
> uses un-mapped kernel virtual addresses, not identity mapped addresses.
> This has been seen with CONFIG_RETPOLINE=n.
So, we don't have *ANY* control over where the compiler uses jump
tables. The kernel just happened to add some code that uses them, fell
over, and this adds a hack to get booting again.
Isn't this a bigger problem?
On Wed, Aug 24, 2022 at 11:43:10AM -0700, Dave Hansen wrote:
> So, we don't have *ANY* control over where the compiler uses jump
> tables. The kernel just happened to add some code that uses them, fell
> over, and this adds a hack to get booting again.
>
> Isn't this a bigger problem?
I had the same question already. Was thinking of maybe disabling
the compiler from producing jump tables in the ident-mapped code.
Tom's argument is that that might prevent the compiler from doing
optimizations but I haven't talked to compiler folks whether those
optimizations are even worth the effort.
Regardless, the potential problem is limited:
"# (jump-tables are implicitly disabled by RETPOLINE)"
i.e., only RETPOLINE=n builds for now which should be a minority?
I guess when this explodes somewhere else again, we will have to
generalize a fix.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 8/24/22 11:48, Borislav Petkov wrote:
> On Wed, Aug 24, 2022 at 11:43:10AM -0700, Dave Hansen wrote:
>> So, we don't have *ANY* control over where the compiler uses jump
>> tables. The kernel just happened to add some code that uses them, fell
>> over, and this adds a hack to get booting again.
>>
>> Isn't this a bigger problem?
> I had the same question already. Was thinking of maybe disabling
> the compiler from producing jump tables in the ident-mapped code.
> Tom's argument is that that might prevent the compiler from doing
> optimizations but I haven't talked to compiler folks whether those
> optimizations are even worth the effort.
>
> Regardless, the potential problem is limited:
>
> "# (jump-tables are implicitly disabled by RETPOLINE)"
Ahh, I missed the connection with retpoline. The ubiquity of
RETPOLINE=y probably means we'll see more of these issues because people
won't find them unless they're building and running weirdo configurations.
> i.e., only RETPOLINE=n builds for now which should be a minority?
>
> I guess when this explodes somewhere else again, we will have to
> generalize a fix.
Yep. It also reminds me to add RETPOLINE=n build to my tests.