2024-02-13 04:08:37

by Kevin Loughlin

Subject: [PATCH] x86/kernel: Validate ROM before DMI scanning when SEV-SNP is active

SEV-SNP requires encrypted memory to be validated before access. The
kernel is responsible for validating the ROM memory range because the
range is not part of the e820 table and therefore not pre-validated by
the BIOS.

While the current SEV-SNP code attempts to validate the ROM range in
probe_roms(), this does not suffice for all existing use cases. In
particular, if EFI_CONFIG_TABLES are not enabled and
CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK is set, the kernel will
attempt to access the memory at SMBIOS_ENTRY_POINT_SCAN_START (which
falls in the ROM range) prior to validation. The specific problematic
call chain occurs during dmi_setup() -> dmi_scan_machine() and results
in a crash during boot if SEV-SNP is enabled under these conditions.

Fix this by moving the ROM range validation from probe_roms() to just
before dmi_setup(), so that a SEV-SNP guest in the configuration above
boots successfully.

Fixes: 9704c07bf9f7 ("x86/kernel: Validate ROM memory before accessing when SEV-SNP is active")
Signed-off-by: Kevin Loughlin <[email protected]>
---
arch/x86/include/asm/setup.h | 6 ++++++
arch/x86/kernel/probe_roms.c | 19 +++++++++----------
arch/x86/kernel/setup.c | 10 ++++++++++
3 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 5c83729c8e71..5c8f5b0d0f9f 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -117,6 +117,12 @@ void *extend_brk(size_t size, size_t align);
__section(".bss..brk") __aligned(1) __used \
static char __brk_##name[size]

+#ifdef CONFIG_AMD_MEM_ENCRYPT
+void snp_prep_rom_range(void);
+#else
+static inline void snp_prep_rom_range(void) { }
+#endif
+
extern void probe_roms(void);

void clear_bss(void);
diff --git a/arch/x86/kernel/probe_roms.c b/arch/x86/kernel/probe_roms.c
index 319fef37d9dc..83b192f5e3cc 100644
--- a/arch/x86/kernel/probe_roms.c
+++ b/arch/x86/kernel/probe_roms.c
@@ -196,6 +196,15 @@ static int __init romchecksum(const unsigned char *rom, unsigned long length)
return !length && !sum;
}

+#ifdef CONFIG_AMD_MEM_ENCRYPT
+void __init snp_prep_rom_range(void)
+{
+ snp_prep_memory(video_rom_resource.start,
+ ((system_rom_resource.end + 1) - video_rom_resource.start),
+ SNP_PAGE_STATE_PRIVATE);
+}
+#endif
+
void __init probe_roms(void)
{
unsigned long start, length, upper;
@@ -203,16 +212,6 @@ void __init probe_roms(void)
unsigned char c;
int i;

- /*
- * The ROM memory range is not part of the e820 table and is therefore not
- * pre-validated by BIOS. The kernel page table maps the ROM region as encrypted
- * memory, and SNP requires encrypted memory to be validated before access.
- * Do that here.
- */
- snp_prep_memory(video_rom_resource.start,
- ((system_rom_resource.end + 1) - video_rom_resource.start),
- SNP_PAGE_STATE_PRIVATE);
-
/* video rom */
upper = adapter_rom_resources[0].start;
for (start = video_rom_resource.start; start < upper; start += 2048) {
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 84201071dfac..19f870728486 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -902,6 +902,16 @@ void __init setup_arch(char **cmdline_p)
efi_init();

reserve_ibft_region();
+
+ /*
+ * The ROM memory range is not part of the e820 table and is therefore not
+ * pre-validated by BIOS. The kernel page table maps the ROM region as encrypted
+ * memory, and SNP requires encrypted memory to be validated before access.
+ * This should be done before dmi_setup(), which may access the ROM region
+ * even before probe_roms() is called.
+ */
+ snp_prep_rom_range();
+
dmi_setup();

/*
--
2.43.0.687.g38aa6559b0-goog


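For context on the failure path described in the patch: with CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK, dmi_scan_machine() searches the 64 KiB legacy BIOS window starting at SMBIOS_ENTRY_POINT_SCAN_START (0xF0000) for an SMBIOS entry point, stepping in 16-byte increments. The following is a simplified user-space sketch of that search over an ordinary buffer, not the kernel's code; find_smbios_anchor(), SCAN_START and SCAN_LEN are hypothetical names. The kernel maps the window instead of receiving a buffer and also checks the SMBIOS 3 "_SM3_" anchor and checksums. It is these reads that crash an SEV-SNP guest when the range has not yet been validated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same value as SMBIOS_ENTRY_POINT_SCAN_START in the kernel. */
#define SCAN_START 0xF0000UL
#define SCAN_LEN   0x10000UL  /* 64 KiB legacy BIOS window */

/*
 * Simplified model of the non-EFI fallback search: walk a copy of the
 * legacy window in 16-byte steps and return the physical address of the
 * first "_SM_" or "_DMI_" anchor, or 0 if none is found. Checksums and
 * the "_SM3_" anchor are deliberately left out.
 */
static unsigned long find_smbios_anchor(const unsigned char *win, size_t len)
{
    assert(win != NULL);
    for (size_t off = 0; off + 16 <= len; off += 16) {
        if (memcmp(win + off, "_SM_", 4) == 0 ||
            memcmp(win + off, "_DMI_", 5) == 0)
            return SCAN_START + off;
    }
    return 0;
}
```

For example, placing "_SM_" at offset 0x30 of a zeroed 64 KiB buffer makes the function return 0xF0030, while the same bytes at offset 0x31 are not found, because only 16-byte boundaries are examined.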

2024-02-13 20:03:20

by Michael Roth

Subject: Re: [PATCH] x86/kernel: Validate ROM before DMI scanning when SEV-SNP is active

Quoting Kevin Loughlin (2024-02-12 22:07:46)
> SEV-SNP requires encrypted memory to be validated before access. The
> kernel is responsible for validating the ROM memory range because the
> range is not part of the e820 table and therefore not pre-validated by
> the BIOS.
>
> While the current SEV-SNP code attempts to validate the ROM range in
> probe_roms(), this does not suffice for all existing use cases. In
> particular, if EFI_CONFIG_TABLES are not enabled and
> CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK is set, the kernel will
> attempt to access the memory at SMBIOS_ENTRY_POINT_SCAN_START (which
> falls in the ROM range) prior to validation. The specific problematic
> call chain occurs during dmi_setup() -> dmi_scan_machine() and results
> in a crash during boot if SEV-SNP is enabled under these conditions.

AFAIK, QEMU doesn't actually include any legacy ROMs as part of the initial
encrypted guest image, and I'm not aware of any VMM implementations that
do this either. As a result, it seems like snp_prep_rom_range() would
only result in the guest seeing ciphertext in these ranges.

If dmi_setup() similarly scans these ranges, it seems likely the same
issue would be present: the validated/private regions would only contain
ciphertext rather than the expected ROM data. Does that agree with the
behavior you are seeing?

If so, maybe instead probe_roms should just be skipped in the case of SNP?
And perhaps dmi_setup() should similarly skip the legacy ROM ranges for
the kernel configs in question?

-Mike

2024-02-13 23:11:09

by Kevin Loughlin

Subject: Re: [PATCH] x86/kernel: Validate ROM before DMI scanning when SEV-SNP is active

On Tue, Feb 13, 2024 at 12:03 PM Michael Roth <[email protected]> wrote:
>
> AFAIK, QEMU doesn't actually include any legacy ROMs as part of the initial
> encrypted guest image, and I'm not aware of any VMM implementations that
> do this either.

I'm using a VMM implementation that uses (non-EFI) Oak stage0 firmware [0].

[0] https://github.com/project-oak/oak/tree/main/stage0_bin

> If dmi_setup() similarly scans these ranges, it seems likely the same
> issue would be present: the validated/private regions would only contain
> ciphertext rather than the expected ROM data. Does that agree with the
> behavior you are seeing?
>
> If so, maybe instead probe_roms should just be skipped in the case of SNP?

If probe_roms() is skipped, SEV-SNP guest boot also currently crashes;
I just quickly tried that (though admittedly haven't looked into why).
Apparently though, the fix for early ROM range accesses is not as
simple as just skipping probe_roms() if SEV-SNP is enabled.
Furthermore, skipping probe_roms() was also *not* the route taken in
the initial attempt that prevents this issue for EFI use cases [1].

[1] https://lore.kernel.org/lkml/[email protected]/

> And perhaps dmi_setup() should similarly skip the legacy ROM ranges for
> the kernel configs in question?

Given (a) non-EFI firmware is supported in other SME/SEV boot code
patches [2], (b) this patch does not seem to introduce significant
complexity (it just moves [1] to earlier in the boot process to
additionally handle the non-EFI case), and (c) skipping
probe_roms()+dmi_setup() doesn't work without additional changes, I'm
currently still inclined to simply validate the legacy ROM ranges
early enough to prevent this issue (as is already done when using EFI
firmware).

[2] https://lore.kernel.org/lkml/CAMj1kXFZKM5wU8djcVBxDmnCJwV4Xpest6u1EbE=7wyLUUeUUQ@mail.gmail.com/

2024-02-16 22:51:08

by Michael Roth

Subject: Re: [PATCH] x86/kernel: Validate ROM before DMI scanning when SEV-SNP is active

On Tue, Feb 13, 2024 at 03:10:46PM -0800, Kevin Loughlin wrote:
> On Tue, Feb 13, 2024 at 12:03 PM Michael Roth <[email protected]> wrote:
> >
> > AFAIK, QEMU doesn't actually include any legacy ROMs as part of the initial
> > encrypted guest image, and I'm not aware of any VMM implementations that
> > do this either.
>
> I'm using a VMM implementation that uses (non-EFI) Oak stage0 firmware [0].
>
> [0] https://github.com/project-oak/oak/tree/main/stage0_bin
>
> > If dmi_setup() similarly scans these ranges, it seems likely the same
> > issue would be present: the validated/private regions would only contain
> > ciphertext rather than the expected ROM data. Does that agree with the
> > behavior you are seeing?
> >
> > If so, maybe instead probe_roms should just be skipped in the case of SNP?
>
> If probe_roms() is skipped, SEV-SNP guest boot also currently crashes;
> I just quickly tried that (though admittedly haven't looked into why).

default_find_smp_config() will also call smp_scan_config() on
0xF0000-0x100000, so that might be the additional issue you're hitting.
If I skip that in addition to probe_roms(), then boot works for me.

The dmi_setup() case you hit would also need similar handling if taking
this approach.
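The three early readers of the legacy region identified in this thread can be checked against the 0xC0000-0x100000 window with a few lines of C. This is an illustrative user-space sketch: struct early_scan and touches_legacy_rom() are hypothetical, the probe_roms() entry uses the 0xC0000-0xFFFFF span that the patch's snp_prep_rom_range() validates, and only the 0xF0000 scan of default_find_smp_config() is listed. Its other scans, of the first KiB of memory and of the KiB just below 640 KiB, fall outside the window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Legacy ROM window discussed in the thread: [0xC0000, 0x100000). */
#define LEGACY_START 0xC0000UL
#define LEGACY_END   0x100000UL

struct early_scan {
    const char *who;      /* kernel function that performs the read */
    unsigned long start;  /* first physical address read */
    unsigned long len;    /* number of bytes scanned */
};

/* True if [start, start + len) overlaps the legacy ROM window. */
static bool touches_legacy_rom(const struct early_scan *s)
{
    assert(s != NULL && s->len > 0);
    return s->start < LEGACY_END && s->start + s->len > LEGACY_START;
}

/* Ranges as described in this thread. */
static const struct early_scan early_scans[] = {
    { "probe_roms",       0xC0000UL, 0x40000UL }, /* 0xC0000-0xFFFFF */
    { "dmi_scan_machine", 0xF0000UL, 0x10000UL }, /* SMBIOS fallback */
    { "smp_scan_config",  0xF0000UL, 0x10000UL }, /* MP table search */
};
```

All three entries overlap the window, whereas a scan such as smp_scan_config(0x0, 0x400) does not.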

> Apparently though, the fix for early ROM range accesses is not as
> simple as just skipping probe_roms() if SEV-SNP is enabled.
> Furthermore, skipping probe_roms() was also *not* the route taken in
> the initial attempt that prevents this issue for EFI use cases [1].
>
> [1] https://lore.kernel.org/lkml/[email protected]/

It seems the current handling has a bug that has been in place since the
original SEV guest code was added. If you dump the data that probe_roms()
sees while it is scanning for instances of ROMSIGNATURE (0xaa55) in the
region, you'll see that it is random data that changes on every boot.
The root issue is that this region does not contain encrypted data, and
is only being accessed that way because the early page table has the
encryption bit set for this range.

The effects are subtle: if the code ever sees a pair of bytes that look
like ROMSIGNATURE, it will reserve that memory so it can be accessed
later, generally just 0xc0000-0xc7fff. In extremely rare cases where the
ciphertext's data has a checksum that happens to match the contents, it
will use a random byte, multiply it by 512, and reserve up to 64k for
this bogus ROM region.
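The false positives described above can be reproduced with a simplified user-space model of the video ROM loop in probe_roms(). This is not the kernel code; video_rom_claim(), rom_checksum_ok() and VIDEO_ROM_SPAN are hypothetical names, and unlike the kernel, a checksum that would read past the supplied buffer is treated as failing. Given bytes that happen to start with 0x55 0xAA on a 2 KiB boundary, it claims the rest of the default 0xC0000-0xC7FFF span, or, if the checksum also happens to pass, the length byte times 512.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROMSIGNATURE   0xaa55
#define VIDEO_ROM_SPAN 0x8000  /* default video ROM resource: 0xC0000-0xC7FFF */

/* Like romchecksum(): a valid option ROM's bytes sum to 0 (mod 256). */
static int rom_checksum_ok(const uint8_t *rom, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += rom[i];
    return sum == 0;
}

/*
 * `mem` stands in for physical memory starting at 0xC0000 and holds
 * `size` bytes. Returns how many bytes would be reserved as "video ROM",
 * or 0 if no 0xaa55 signature appears on a 2 KiB boundary.
 */
static size_t video_rom_claim(const uint8_t *mem, size_t size)
{
    assert(mem != NULL);
    for (size_t start = 0; start < VIDEO_ROM_SPAN && start + 3 <= size; start += 2048) {
        uint16_t sig = (uint16_t)(mem[start] | (mem[start + 1] << 8));
        if (sig != ROMSIGNATURE)
            continue;

        size_t end = VIDEO_ROM_SPAN - 1;               /* default end, 0xC7FFF */
        size_t length = (size_t)mem[start + 2] * 512;  /* length byte * 512 */

        /* If the checksum passes, the length byte is trusted. */
        if (length && start + length <= size && rom_checksum_ok(mem + start, length))
            end = start + length - 1;
        return end - start + 1;
    }
    return 0;
}
```

With uniformly random bytes, each 2 KiB boundary matches the signature with probability 1/65536, so across the 16 boundaries in 0xC0000-0xC7FFF a spurious match is expected on roughly 1 in 4096 boots.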

For SNP this resulted in a more obvious failure: a #VC exception because
the supposedly encrypted memory was in fact not encrypted, and thus not
PVALIDATED. Unfortunately the fix you linked to involved maintaining the
broken SEV behavior rather than fixing this mismatch.

>
> > And perhaps dmi_setup() should similarly skip the legacy ROM ranges for
> > the kernel configs in question?
>
> Given (a) non-EFI firmware is supported in other SME/SEV boot code
> patches [2], (b) this patch does not seem to introduce significant
> complexity (it just moves [1] to earlier in the boot process to
> additionally handle the non-EFI case), and (c) skipping
> probe_roms()+dmi_setup() doesn't work without additional changes, I'm
> currently still inclined to simply validate the legacy ROM ranges
> early enough to prevent this issue (as is already done when using EFI
> firmware).

The 2 options I see are:

a) Skipping accesses to these regions for SEV. It is vaguely possible
some implementation out there actually did measure/load the ROM as
part of the initial guest image for SEV, but for SNP this would
have been impossible since it would have led to the guest crashing
when snp_prep_roms() was called, since RMPUPDATE on the host only
rescinds the validated bit if there is a change to the RMP entry.
If it was already assigned/private/validated then the guest code
would detect that PVALIDATE resulted in no changes, and so it
would have failed with PVALIDATE_FAIL_NOUPDATE. So if you want to
be super sure you don't break legacy SEV implementations then you
could limit the change to SNP guests where it's essentially
guaranteed these regions are not being utilized in any functional
way.

b) Modifying the early page table setup by early_make_pgtable() to
clear the encrypted bit for 0xC0000-0x100000 legacy region. The
challenge there is everything is PMD-mapped at that stage of boot
and there's no infrastructure for splitting page tables to handle
non-2MB-aligned/sized regions.

But I don't think continuing to propagate the broken SEV behavior is
the right fix. At some point those random scans may trigger something
more problematic than wasted memory reservations. It may even be the
case already since I haven't audited the dmi_setup()/smp_scan_config()
paths yet, but nothing good/useful can come of it.

-Mike

>
> [2] https://lore.kernel.org/lkml/CAMj1kXFZKM5wU8djcVBxDmnCJwV4Xpest6u1EbE=7wyLUUeUUQ@mail.gmail.com/

2024-02-22 20:21:00

by Michael Roth

Subject: Re: [PATCH] x86/kernel: Validate ROM before DMI scanning when SEV-SNP is active

On Wed, Feb 21, 2024 at 02:50:00PM -0800, Kevin Loughlin wrote:
> On Fri, Feb 16, 2024 at 2:50 PM Michael Roth <[email protected]> wrote:
> >
> > On Tue, Feb 13, 2024 at 03:10:46PM -0800, Kevin Loughlin wrote:
> > > On Tue, Feb 13, 2024 at 12:03 PM Michael Roth <[email protected]> wrote:
> > > >
> > > > AFAIK, QEMU doesn't actually include any legacy ROMs as part of the initial
> > > > encrypted guest image, and I'm not aware of any VMM implementations that
> > > > do this either.
> > >
> > > I'm using a VMM implementation that uses (non-EFI) Oak stage0 firmware [0].
> > >
> > > [0] https://github.com/project-oak/oak/tree/main/stage0_bin
> > >
> > > > If dmi_setup() similarly scans these ranges, it seems likely the same
> > > > issue would be present: the validated/private regions would only contain
> > > > ciphertext rather than the expected ROM data. Does that agree with the
> > > > behavior you are seeing?
> > > >
> > > > If so, maybe instead probe_roms should just be skipped in the case of SNP?
> > >
> > > If probe_roms() is skipped, SEV-SNP guest boot also currently crashes;
> > > I just quickly tried that (though admittedly haven't looked into why).
> >
> > default_find_smp_config() will also call smp_scan_config() on
> > 0xF0000-0x100000, so that might be the additional issue you're hitting.
> > If I skip that in addition to probe_roms(), then boot works for me.
>
> Yeah, smp_scan_config() was the culprit. Thanks.
>
> > It seems the current handling has a bug that has been in place since the
> > original SEV guest code was added. If you dump the data that probe_roms()
> > sees while it is scanning for instances of ROMSIGNATURE (0xaa55) in the
> > region, you'll see that it is random data that changes on every boot.
> > The root issue is that this region does not contain encrypted data, and
> > is only being accessed that way because the early page table has the
> > encryption bit set for this range.
> >
> > The effects are subtle: if the code ever sees a pair of bytes that look
> > like ROMSIGNATURE, it will reserve that memory so it can be accessed
> > later, generally just 0xc0000-0xc7fff. In extremely rare cases where the
> > ciphertext's data has a checksum that happens to match the contents, it
> > will use a random byte, multiply it by 512, and reserve up to 64k for
> > this bogus ROM region.
> >
> > For SNP this resulted in a more obvious failure: a #VC exception because
> > the supposedly encrypted memory was in fact not encrypted, and thus not
> > PVALIDATED. Unfortunately the fix you linked to involved maintaining the
> > broken SEV behavior rather than fixing this mismatch.
> >
> > >
> > > > And perhaps dmi_setup() should similarly skip the legacy ROM ranges for
> > > > the kernel configs in question?
> > >
> > > Given (a) non-EFI firmware is supported in other SME/SEV boot code
> > > patches [2], (b) this patch does not seem to introduce significant
> > > complexity (it just moves [1] to earlier in the boot process to
> > > additionally handle the non-EFI case), and (c) skipping
> > > probe_roms()+dmi_setup() doesn't work without additional changes, I'm
> > > currently still inclined to simply validate the legacy ROM ranges
> > > early enough to prevent this issue (as is already done when using EFI
> > > firmware).
> >
> > The 2 options I see are:
> >
> > a) Skipping accesses to these regions for SEV. It is vaguely possible
> > some implementation out there actually did measure/load the ROM as
> > part of the initial guest image for SEV, but for SNP this would
> > have been impossible since it would have led to the guest crashing
> > when snp_prep_roms() was called, since RMPUPDATE on the host only
> > rescinds the validated bit if there is a change to the RMP entry.
> > If it was already assigned/private/validated then the guest code
> > would detect that PVALIDATE resulted in no changes, and so it
> > would have failed with PVALIDATE_FAIL_NOUPDATE. So if you want to
> > be super sure you don't break legacy SEV implementations then you
> > could limit the change to SNP guests where it's essentially
> > guaranteed these regions are not being utilized in any functional
> > way.
>
> Based on your explanation, I agree that (at a minimum) it makes sense
> to rectify the behavior for SEV-SNP guests.
>
> On that note, as you describe here, I skipped the 3 ROM region scans
> on platforms with CC_ATTR_GUEST_SEV_SNP (and deleted the call to
> snp_prep_memory()) and successfully booted. I can send that as v2.

Sounds good. Please add me to the Cc, happy to test/review.

>
> Note that I have *not* tried skipping the scans for all SEV guest
> variants (CC_ATTR_GUEST_MEM_ENCRYPT) since those boots appear to be
> functioning without the change (and there is a risk of breaking the
> sorts of implementations that you described); also note that
> clang-built SEV-SNP guests still require [0] and [1] to function.
>
> [0] https://lore.kernel.org/all/[email protected]/
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=1c811d403afd73f04bde82b83b24c754011bd0e8
>
> > b) Modifying the early page table setup by early_make_pgtable() to
> > clear the encrypted bit for 0xC0000-0x100000 legacy region. The
> > challenge there is everything is PMD-mapped at that stage of boot
> > and there's no infrastructure for splitting page tables to handle
> > non-2MB-aligned/sized regions.
>
> If ever needed/desired, a slight variant of this second option might
> also be providing a temporary unencrypted mapping on the fly during
> the few times the regions are scanned during early boot, similar to
> how __sme_early_map_unmap_mem() is already used for sme_map_bootdata()
> in head64.c. I haven't tried it, but I just wanted to note it down in
> case it becomes relevant.

True, that might be another option to consider if needed.

-Mike

2024-02-21 22:55:42

by Kevin Loughlin

Subject: Re: [PATCH] x86/kernel: Validate ROM before DMI scanning when SEV-SNP is active

On Fri, Feb 16, 2024 at 2:50 PM Michael Roth <[email protected]> wrote:
>
> On Tue, Feb 13, 2024 at 03:10:46PM -0800, Kevin Loughlin wrote:
> > On Tue, Feb 13, 2024 at 12:03 PM Michael Roth <michael.roth@amd.com> wrote:
> > >
> > > AFAIK, QEMU doesn't actually include any legacy ROMs as part of the initial
> > > encrypted guest image, and I'm not aware of any VMM implementations that
> > > do this either.
> >
> > I'm using a VMM implementation that uses (non-EFI) Oak stage0 firmware [0].
> >
> > [0] https://github.com/project-oak/oak/tree/main/stage0_bin
> >
> > > If dmi_setup() similarly scans these ranges, it seems likely the same
> > > issue would be present: the validated/private regions would only contain
> > > ciphertext rather than the expected ROM data. Does that agree with the
> > > behavior you are seeing?
> > >
> > > If so, maybe instead probe_roms should just be skipped in the case of SNP?
> >
> > If probe_roms() is skipped, SEV-SNP guest boot also currently crashes;
> > I just quickly tried that (though admittedly haven't looked into why).
>
> default_find_smp_config() will also call smp_scan_config() on
> 0xF0000-0x100000, so that might be the additional issue you're hitting.
> If I skip that in addition to probe_roms(), then boot works for me.

Yeah, smp_scan_config() was the culprit. Thanks.

> It seems the current handling has a bug that has been in place since the
> original SEV guest code was added. If you dump the data that probe_roms()
> sees while it is scanning for instances of ROMSIGNATURE (0xaa55) in the
> region, you'll see that it is random data that changes on every boot.
> The root issue is that this region does not contain encrypted data, and
> is only being accessed that way because the early page table has the
> encryption bit set for this range.
>
> The effects are subtle: if the code ever sees a pair of bytes that look
> like ROMSIGNATURE, it will reserve that memory so it can be accessed
> later, generally just 0xc0000-0xc7fff. In extremely rare cases where the
> ciphertext's data has a checksum that happens to match the contents, it
> will use a random byte, multiply it by 512, and reserve up to 64k for
> this bogus ROM region.
>
> For SNP this resulted in a more obvious failure: a #VC exception because
> the supposedly encrypted memory was in fact not encrypted, and thus not
> PVALIDATED. Unfortunately the fix you linked to involved maintaining the
> broken SEV behavior rather than fixing this mismatch.
>
> >
> > > And perhaps dmi_setup() should similarly skip the legacy ROM ranges for
> > > the kernel configs in question?
> >
> > Given (a) non-EFI firmware is supported in other SME/SEV boot code
> > patches [2], (b) this patch does not seem to introduce significant
> > complexity (it just moves [1] to earlier in the boot process to
> > additionally handle the non-EFI case), and (c) skipping
> > probe_roms()+dmi_setup() doesn't work without additional changes, I'm
> > currently still inclined to simply validate the legacy ROM ranges
> > early enough to prevent this issue (as is already done when using EFI
> > firmware).
>
> The 2 options I see are:
>
> a) Skipping accesses to these regions for SEV. It is vaguely possible
> some implementation out there actually did measure/load the ROM as
> part of the initial guest image for SEV, but for SNP this would
> have been impossible since it would have led to the guest crashing
> when snp_prep_roms() was called, since RMPUPDATE on the host only
> rescinds the validated bit if there is a change to the RMP entry.
> If it was already assigned/private/validated then the guest code
> would detect that PVALIDATE resulted in no changes, and so it
> would have failed with PVALIDATE_FAIL_NOUPDATE. So if you want to
> be super sure you don't break legacy SEV implementations then you
> could limit the change to SNP guests where it's essentially
> guaranteed these regions are not being utilized in any functional
> way.

Based on your explanation, I agree that (at a minimum) it makes sense
to rectify the behavior for SEV-SNP guests.

On that note, as you describe here, I skipped the 3 ROM region scans
on platforms with CC_ATTR_GUEST_SEV_SNP (and deleted the call to
snp_prep_memory()) and successfully booted. I can send that as v2.

Note that I have *not* tried skipping the scans for all SEV guest
variants (CC_ATTR_GUEST_MEM_ENCRYPT) since those boots appear to be
functioning without the change (and there is a risk of breaking the
sorts of implementations that you described); also note that
clang-built SEV-SNP guests still require [0] and [1] to function.

[0] https://lore.kernel.org/all/[email protected]/
[1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=1c811d403afd73f04bde82b83b24c754011bd0e8
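The direction described above (skip the three legacy-region scans when CC_ATTR_GUEST_SEV_SNP is set, but leave other SEV variants alone) can be modeled in user space as below. cc_platform_has() and the CC_ATTR_* names mirror kernel identifiers, but here they are local mocks driven by a hypothetical struct platform, and a counter stands in for the actual scans. This is a sketch of the gating idea only, not the v2 patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the kernel's CC_ATTR_* attributes. */
enum cc_attr { CC_ATTR_GUEST_MEM_ENCRYPT, CC_ATTR_GUEST_SEV_SNP };

/* Hypothetical description of the guest for this model. */
struct platform {
    bool mem_encrypt;  /* any SEV variant */
    bool sev_snp;      /* SEV-SNP specifically */
};

static struct platform plat;
static int legacy_scans_run;  /* how many legacy-region scans executed */

/* Mock of cc_platform_has(); the kernel derives this from CPU/firmware state. */
static bool cc_platform_has(enum cc_attr attr)
{
    assert(!plat.sev_snp || plat.mem_encrypt);  /* SNP implies encrypted memory */
    switch (attr) {
    case CC_ATTR_GUEST_MEM_ENCRYPT:
        return plat.mem_encrypt;
    case CC_ATTR_GUEST_SEV_SNP:
        return plat.sev_snp;
    }
    return false;
}

/* Guard in front of each scan: only SNP guests skip; other SEV guests still scan. */
static bool legacy_scan_allowed(void)
{
    return !cc_platform_has(CC_ATTR_GUEST_SEV_SNP);
}

/* Placeholders for probe_roms(), the DMI fallback and smp_scan_config(0xF0000, ...). */
static void early_legacy_scans(void)
{
    for (int i = 0; i < 3; i++)
        if (legacy_scan_allowed())
            legacy_scans_run++;
}
```

Gating on CC_ATTR_GUEST_SEV_SNP rather than CC_ATTR_GUEST_MEM_ENCRYPT keeps the existing behavior for SEV and SEV-ES guests, matching the caution above about implementations that might load ROMs into the initial image.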

> b) Modifying the early page table setup by early_make_pgtable() to
> clear the encrypted bit for 0xC0000-0x100000 legacy region. The
> challenge there is everything is PMD-mapped at that stage of boot
> and there's no infrastructure for splitting page tables to handle
> non-2MB-aligned/sized regions.

If ever needed/desired, a slight variant of this second option might
also be providing a temporary unencrypted mapping on the fly during
the few times the regions are scanned during early boot, similar to
how __sme_early_map_unmap_mem() is already used for sme_map_bootdata()
in head64.c. I haven't tried it, but I just wanted to note it down in
case it becomes relevant.
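The alignment problem behind option (b), which also motivates the temporary-mapping variant above, is simple arithmetic. The whole legacy window lies inside the first 2 MiB PMD, and neither of its ends is 2 MiB aligned, so clearing the encryption bit for just that window would require splitting the PMD into 4 KiB entries, 64 of which cover 0xC0000-0xFFFFF. A small C check of those numbers follows; the macro and helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define PMD_BYTES    (2UL << 20)  /* one 2 MiB large-page mapping */
#define PTE_BYTES    4096UL       /* one 4 KiB page */
#define LEGACY_START 0xC0000UL
#define LEGACY_END   0x100000UL   /* exclusive */

/* Which 2 MiB mapping covers a physical address. */
static unsigned long pmd_index_of(unsigned long pa)
{
    return pa / PMD_BYTES;
}

/* A range can change its C-bit at PMD granularity only if both ends are 2 MiB aligned. */
static bool pmd_granular(unsigned long start, unsigned long end)
{
    assert(start <= end);
    return start % PMD_BYTES == 0 && end % PMD_BYTES == 0;
}
```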