2024-04-10 10:27:58

by Ard Biesheuvel

Subject: [PATCH] x86/boot/64: Clear CR4.PGE to disable global 1:1 mappings

From: Ard Biesheuvel <[email protected]>

The early 64-bit boot code must be entered with a 1:1 mapping of the
bootable image, but it cannot operate without a 1:1 mapping of all the
assets in memory that it accesses, and therefore, it creates such
mappings for all known assets upfront, and additional ones on demand
when a page fault happens on a memory address.

These mappings are created with the global bit G set, as the flags used
to create page table descriptors are based on __PAGE_KERNEL_LARGE_EXEC
defined by the core kernel, even though the context where these mappings
are used is very different.

This means that the TLB maintenance carried out by the decompressor is
not sufficient if it is entered with CR4.PGE enabled, which has been
observed to happen with the stage0 bootloader of project Oak. While this
is a dubious practice if no global mappings are being used to begin
with, the decompressor is clearly at fault here for creating global
mappings and not performing the appropriate TLB maintenance.

Since commit

f97b67a773cd84b ("x86/decompressor: Only call the trampoline when changing paging levels")

CR4 is no longer modified by the decompressor if no change in the number
of paging levels is needed. Before that, CR4 would always be set to a
known value with PGE cleared.

So clear CR4.PGE explicitly in the decompressor before switching to the
new 1:1 mapping which uses the G bit.

Cc: Conrad Grobler <[email protected]>
Cc: Kevin Loughlin <[email protected]>
Fixes: f97b67a773cd84b ("x86/decompressor: Only call the trampoline when ...")
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/ident_map_64.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index d040080d7edb..e746bf2efdf7 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -179,6 +179,9 @@ void initialize_identity_maps(void *rmode)

 	sev_prep_identity_maps(top_level_pgt);
 
+	/* Disable global mappings */
+	asm("mov %0, %%cr4" :: "r"(native_read_cr4() & ~X86_CR4_PGE));
+
 	/* Load the new page-table. */
 	write_cr3(top_level_pgt);

--
2.44.0.478.gd926399ef9-goog
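
The failure mode described in the commit message can be illustrated with a toy user-space model. This is only a sketch, not decompressor code: the names toy_cpu, toy_tlb_fill(), toy_write_cr3() and toy_write_cr4() are hypothetical. It assumes the architectural behaviour that a CR3 write leaves global TLB entries in place while CR4.PGE is set, and that a CR4 write which changes PGE flushes every entry, global ones included.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CR4_PGE   (1u << 7)  /* CR4.PGE: enables global pages */
#define PTE_G     (1u << 8)  /* G bit in a page table entry */
#define TLB_SLOTS 4          /* arbitrary size for the toy model */

struct tlb_entry {
	bool valid;
	bool global;
	uint64_t vaddr;
};

struct toy_cpu {
	uint32_t cr4;
	struct tlb_entry tlb[TLB_SLOTS];
};

/* Cache a translation. The G bit only takes effect while CR4.PGE is set. */
static void toy_tlb_fill(struct toy_cpu *cpu, size_t slot, uint64_t vaddr,
			 uint32_t pte_flags)
{
	assert(slot < TLB_SLOTS);
	cpu->tlb[slot].valid = true;
	cpu->tlb[slot].global = (cpu->cr4 & CR4_PGE) && (pte_flags & PTE_G);
	cpu->tlb[slot].vaddr = vaddr;
}

static void toy_flush(struct toy_cpu *cpu, bool include_global)
{
	for (size_t i = 0; i < TLB_SLOTS; i++)
		if (include_global || !cpu->tlb[i].global)
			cpu->tlb[i].valid = false;
}

/* Models a CR3 write: global entries survive. */
static void toy_write_cr3(struct toy_cpu *cpu)
{
	toy_flush(cpu, false);
}

/* Models a CR4 write: changing PGE flushes everything. */
static void toy_write_cr4(struct toy_cpu *cpu, uint32_t val)
{
	if ((cpu->cr4 ^ val) & CR4_PGE)
		toy_flush(cpu, true);
	cpu->cr4 = val;
}

/* Number of translations still cached. */
static size_t toy_tlb_valid(const struct toy_cpu *cpu)
{
	size_t n = 0;

	for (size_t i = 0; i < TLB_SLOTS; i++)
		n += cpu->tlb[i].valid;
	return n;
}
```

In this model, a mapping filled with PTE_G while CR4_PGE is set is still cached after toy_write_cr3(), and only goes away once toy_write_cr4() clears PGE. That is the ordering the patch relies on: clear PGE first, then load the new page table.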



2024-04-10 13:01:30

by Ingo Molnar

Subject: Re: [PATCH] x86/boot/64: Clear CR4.PGE to disable global 1:1 mappings


* Ard Biesheuvel <[email protected]> wrote:

> From: Ard Biesheuvel <[email protected]>
>
> The early 64-bit boot code must be entered with a 1:1 mapping of the
> bootable image, but it cannot operate without a 1:1 mapping of all the
> assets in memory that it accesses, and therefore, it creates such
> mappings for all known assets upfront, and additional ones on demand
> when a page fault happens on a memory address.
>
> These mappings are created with the global bit G set, as the flags used
> to create page table descriptors are based on __PAGE_KERNEL_LARGE_EXEC
> defined by the core kernel, even though the context where these mappings
> are used is very different.
>
> This means that the TLB maintenance carried out by the decompressor is
> not sufficient if it is entered with CR4.PGE enabled, which has been
> observed to happen with the stage0 bootloader of project Oak. While this
> is a dubious practice if no global mappings are being used to begin
> with, the decompressor is clearly at fault here for creating global
> mappings and not performing the appropriate TLB maintenance.
>
> Since commit
>
> f97b67a773cd84b ("x86/decompressor: Only call the trampoline when changing paging levels")
>
> CR4 is no longer modified by the decompressor if no change in the number
> of paging levels is needed. Before that, CR4 would always be set to a
> known value with PGE cleared.

So if we do this for robustness & historical pre-f97b67a773cd84b
quirk-reliance's sake, I'd prefer if we loaded a known CR4 value again,
instead of just turning off the PGE bit.

It's probably also a tiny bit faster, as no CR4 read has to be performed.

Thanks,

Ingo
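
The difference between the two approaches can be sketched as plain bit arithmetic. This is a hedged illustration only: the helper names cr4_clear_pge() and cr4_known_value() are hypothetical, and the choice of bits in the "known" value (PAE, which long mode requires, plus LA57 when 5-level paging is active) is an assumption made for the example, not something this thread specifies.

```c
#include <stdint.h>

#define X86_CR4_PAE   (1ul << 5)   /* must stay set in long mode */
#define X86_CR4_PGE   (1ul << 7)   /* global pages enable */
#define X86_CR4_LA57  (1ul << 12)  /* 5-level paging */

/*
 * Approach of the patch as posted: keep whatever the bootloader left
 * in CR4 and clear only PGE. Needs a CR4 read first.
 */
static unsigned long cr4_clear_pge(unsigned long inherited)
{
	return inherited & ~X86_CR4_PGE;
}

/*
 * Approach suggested in the reply: build the value from scratch so
 * nothing inherited from the bootloader survives. Which bits belong
 * here is an assumption of this sketch.
 */
static unsigned long cr4_known_value(int five_level)
{
	unsigned long cr4 = X86_CR4_PAE;

	if (five_level)
		cr4 |= X86_CR4_LA57;
	return cr4;
}
```

With the first helper, any other bit the bootloader happened to set is carried along; with the second, the result depends only on the paging mode.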

2024-04-10 13:54:38

by Ard Biesheuvel

Subject: Re: [PATCH] x86/boot/64: Clear CR4.PGE to disable global 1:1 mappings

On Wed, 10 Apr 2024 at 14:58, Ingo Molnar <[email protected]> wrote:
>
>
> * Ard Biesheuvel <[email protected]> wrote:
>
> > From: Ard Biesheuvel <[email protected]>
> >
> > The early 64-bit boot code must be entered with a 1:1 mapping of the
> > bootable image, but it cannot operate without a 1:1 mapping of all the
> > assets in memory that it accesses, and therefore, it creates such
> > mappings for all known assets upfront, and additional ones on demand
> > when a page fault happens on a memory address.
> >
> > These mappings are created with the global bit G set, as the flags used
> > to create page table descriptors are based on __PAGE_KERNEL_LARGE_EXEC
> > defined by the core kernel, even though the context where these mappings
> > are used is very different.
> >
> > This means that the TLB maintenance carried out by the decompressor is
> > not sufficient if it is entered with CR4.PGE enabled, which has been
> > observed to happen with the stage0 bootloader of project Oak. While this
> > is a dubious practice if no global mappings are being used to begin
> > with, the decompressor is clearly at fault here for creating global
> > mappings and not performing the appropriate TLB maintenance.
> >
> > Since commit
> >
> > f97b67a773cd84b ("x86/decompressor: Only call the trampoline when changing paging levels")
> >
> > CR4 is no longer modified by the decompressor if no change in the number
> > of paging levels is needed. Before that, CR4 would always be set to a
> > known value with PGE cleared.
>
> So if we do this for robustness & historical pre-f97b67a773cd84b
> quirk-reliance's sake, I'd prefer if we loaded a known CR4 value again,
> instead of just turning off the PGE bit.
>
> It's probably also a tiny bit faster, as no CR4 read has to be performed.
>

Fair enough. I'll go and change that.

2024-04-10 13:56:35

by Ingo Molnar

Subject: Re: [PATCH] x86/boot/64: Clear CR4.PGE to disable global 1:1 mappings


* Ard Biesheuvel <[email protected]> wrote:

> On Wed, 10 Apr 2024 at 14:58, Ingo Molnar <[email protected]> wrote:
> >
> >
> > * Ard Biesheuvel <[email protected]> wrote:
> >
> > > From: Ard Biesheuvel <[email protected]>
> > >
> > > The early 64-bit boot code must be entered with a 1:1 mapping of the
> > > bootable image, but it cannot operate without a 1:1 mapping of all the
> > > assets in memory that it accesses, and therefore, it creates such
> > > mappings for all known assets upfront, and additional ones on demand
> > > when a page fault happens on a memory address.
> > >
> > > These mappings are created with the global bit G set, as the flags used
> > > to create page table descriptors are based on __PAGE_KERNEL_LARGE_EXEC
> > > defined by the core kernel, even though the context where these mappings
> > > are used is very different.
> > >
> > > This means that the TLB maintenance carried out by the decompressor is
> > > not sufficient if it is entered with CR4.PGE enabled, which has been
> > > observed to happen with the stage0 bootloader of project Oak. While this
> > > is a dubious practice if no global mappings are being used to begin
> > > with, the decompressor is clearly at fault here for creating global
> > > mappings and not performing the appropriate TLB maintenance.
> > >
> > > Since commit
> > >
> > > f97b67a773cd84b ("x86/decompressor: Only call the trampoline when changing paging levels")
> > >
> > > CR4 is no longer modified by the decompressor if no change in the number
> > > of paging levels is needed. Before that, CR4 would always be set to a
> > > known value with PGE cleared.
> >
> > So if we do this for robustness & historical pre-f97b67a773cd84b
> > quirk-reliance's sake, I'd prefer if we loaded a known CR4 value again,
> > instead of just turning off the PGE bit.
> >
> > It's probably also a tiny bit faster, as no CR4 read has to be performed.
> >
>
> Fair enough. I'll go and change that.

Thanks!

Ingo