2021-10-01 16:10:04

by Joerg Roedel

Subject: [PATCH v3 0/4] x86/mm: Fix some issues with using trampoline_pgd

From: Joerg Roedel <[email protected]>

Hi,

here are a couple of fixes and documentation improvements for the use
of the trampoline_pgd in the kernel. Most importantly, they fix the
issue that switching to the trampoline_pgd unmaps the kernel stack
and the real_mode_header, making crashes likely before the code can
actually jump to real mode.

The first patch adds a comment to document that the trampoline_pgd
aliases kernel page-tables in the user address range, establishing
global TLB entries for these addresses. The next two patches add
global TLB flushes when switching to and from the trampoline_pgd.

The last patch extends the trampoline_pgd to cover the whole kernel
address range. This is needed to make sure the stack and the
real_mode_header are still mapped after the switch and that the code
flow can safely reach real-mode.

Please review.

Thanks,

Joerg

Changes v2->v3:

- Addressed review comments from Dave Hansen

Link to v2: https://lore.kernel.org/lkml/[email protected]/

Joerg Roedel (4):
x86/realmode: Add comment for Global bit usage in trampoline_pgd
x86/mm/64: Flush global TLB on boot and AP bringup
x86/mm: Flush global TLB when switching to trampoline page-table
x86/64/mm: Map all kernel memory into trampoline_pgd

arch/x86/include/asm/realmode.h | 1 +
arch/x86/kernel/head64.c | 15 ++++++++++++++
arch/x86/kernel/head_64.S | 19 +++++++++++++++++-
arch/x86/kernel/reboot.c | 12 ++---------
arch/x86/mm/init.c | 5 +++++
arch/x86/realmode/init.c | 35 ++++++++++++++++++++++++++++++++-
6 files changed, 75 insertions(+), 12 deletions(-)


base-commit: 5816b3e6577eaa676ceb00a848f0fd65fe2adc29
--
2.33.0


2021-10-01 16:10:08

by Joerg Roedel

Subject: [PATCH v3 4/4] x86/64/mm: Map all kernel memory into trampoline_pgd

From: Joerg Roedel <[email protected]>

The trampoline_pgd only maps the 0xffffff8000000000-0xffffffffffffffff
range of kernel memory (PGD entry 511 with 4-level paging). This range
contains the kernel's text+data+bss mappings and the module mapping
space, but not the direct mapping and the vmalloc area.
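To make the address ranges concrete, here is a small user-space sketch of the 4-level PGD index arithmetic. It assumes the default 4-level layout without KASLR, as listed in the kernel's x86-64 memory map documentation (direct map at 0xffff888000000000, vmalloc at 0xffffc90000000000, kernel text at 0xffffffff80000000). pgd_index() here is a standalone re-implementation of the kernel macro of the same name; pgd_entry_start() and pgd_entry_end() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: x86-64 with 4-level paging (no LA57). */
#define PGDIR_SHIFT  39   /* each PGD entry covers 2^39 bytes = 512 GiB */
#define PTRS_PER_PGD 512

/* Index of the top-level (PGD) entry that translates virtual address va. */
static unsigned int pgd_index(uint64_t va)
{
	return (unsigned int)((va >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}

/* First canonical virtual address translated by PGD entry idx. */
static uint64_t pgd_entry_start(unsigned int idx)
{
	assert(idx < PTRS_PER_PGD);
	uint64_t va = (uint64_t)idx << PGDIR_SHIFT;

	/* Entries 256..511 map the upper half: sign-extend bit 47. */
	if (va & (1ULL << 47))
		va |= 0xffff000000000000ULL;
	return va;
}

/* Last virtual address translated by PGD entry idx. */
static uint64_t pgd_entry_end(unsigned int idx)
{
	return pgd_entry_start(idx) + ((1ULL << PGDIR_SHIFT) - 1);
}
```

With these helpers, kernel text (e.g. 0xffffffff81000000) falls into entry 511, which spans 0xffffff8000000000-0xffffffffffffffff, while the assumed direct-map base falls into entry 273 and the vmalloc base into entry 402. Copying only entry 511 therefore leaves both of them unmapped.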

This is enough to get application processors out of real mode, but
for code that switches back to real mode the trampoline_pgd is missing
important parts of the address space. For example, consider this code
from machine_real_restart() in arch/x86/kernel/reboot.c, as built for a
64-bit kernel:

	#ifdef CONFIG_X86_32
		load_cr3(initial_page_table);
	#else
		write_cr3(real_mode_header->trampoline_pgd);

		/* Exiting long mode will fail if CR4.PCIDE is set. */
		if (boot_cpu_has(X86_FEATURE_PCID))
			cr4_clear_bits(X86_CR4_PCIDE);
	#endif

		/* Jump to the identity-mapped low memory code */
	#ifdef CONFIG_X86_32
		asm volatile("jmpl *%0" : :
			     "rm" (real_mode_header->machine_real_restart_asm),
			     "a" (type));
	#else
		asm volatile("ljmpl *%0" : :
			     "m" (real_mode_header->machine_real_restart_asm),
			     "D" (type));
	#endif

The code switches to the trampoline_pgd, which unmaps the direct mapping
and also the kernel stack. The call to cr4_clear_bits() will find no
stack and crash the machine. The real_mode_header pointer below points
into the direct mapping, and dereferencing it also causes a crash.

This does not always crash only because kernel mappings are global and
the CR3 switch does not flush them. But if these mappings are not
already in the TLB, the above code will crash before it can jump to the
real-mode stub.
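The "does not always crash" behaviour can be illustrated with a toy TLB model. This is a hypothetical user-space simulation, not kernel code, and it assumes CR4.PGE is set and ignores PCIDs: toy_write_cr3() models the rule that a CR3 write leaves Global entries in place, and toy_flush_all() stands in for a global flush such as the __flush_tlb_all() that patch 3/4 adds. All toy_* names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_SLOTS 8

/* One cached translation; 'global' mirrors the PTE Global bit. */
struct tlb_entry {
	uint64_t vpn;     /* virtual page number */
	bool valid;
	bool global;
};

struct toy_tlb {
	struct tlb_entry e[TLB_SLOTS];
};

/* Model of a CR3 write: drops non-global entries, keeps global ones. */
static void toy_write_cr3(struct toy_tlb *t)
{
	for (size_t i = 0; i < TLB_SLOTS; i++)
		if (!t->e[i].global)
			t->e[i].valid = false;
}

/* Model of a global flush: drops every entry, global or not. */
static void toy_flush_all(struct toy_tlb *t)
{
	for (size_t i = 0; i < TLB_SLOTS; i++)
		t->e[i].valid = false;
}

/* Pretend the CPU cached a translation for vpn in the given slot. */
static void toy_fill(struct toy_tlb *t, size_t slot, uint64_t vpn, bool global)
{
	assert(slot < TLB_SLOTS);
	t->e[slot] = (struct tlb_entry){ .vpn = vpn, .valid = true, .global = global };
}

/*
 * An access succeeds on a TLB hit (no page walk happens, so a stale
 * entry "works"), otherwise only if the current page table maps it.
 */
static bool toy_access(const struct toy_tlb *t, uint64_t vpn, bool mapped_in_pgtable)
{
	for (size_t i = 0; i < TLB_SLOTS; i++)
		if (t->e[i].valid && t->e[i].vpn == vpn)
			return true;
	return mapped_in_pgtable;
}
```

In the model, a global entry for a stack page survives toy_write_cr3(), so an access succeeds even though the new table does not map the page; after toy_flush_all() the same access fails, which is the reliable failure a global flush is meant to expose.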

Extend the trampoline_pgd to contain all kernel mappings to prevent
these crashes and to make code which runs on this page-table more
robust.
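A user-space sketch of the copy performed by the patched setup_real_mode() follows, using plain arrays of u64-style entries in place of real page tables. TOY_PAGE_OFFSET assumes the default non-KASLR 4-level direct-map base; in the kernel __PAGE_OFFSET can differ (for example with KASLR or 5-level paging), so the sketch takes the offset as a parameter. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PGDIR_SHIFT  39
#define PTRS_PER_PGD 512

/* Assumption: default 4-level direct-map base without KASLR. */
#define TOY_PAGE_OFFSET 0xffff888000000000ULL

static unsigned int toy_pgd_index(uint64_t va)
{
	return (unsigned int)((va >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}

/*
 * Mirror of the patched logic: entry 0 holds the identity mapping for
 * the real-mode stub, and every entry from the direct map upwards is
 * copied from the kernel's top-level page table.
 */
static void toy_build_trampoline_pgd(uint64_t tramp[PTRS_PER_PGD],
				     const uint64_t kernel[PTRS_PER_PGD],
				     uint64_t identity_entry,
				     uint64_t page_offset)
{
	/* The copy must not overwrite the identity-mapping slot. */
	assert(toy_pgd_index(page_offset) > 0);

	tramp[0] = identity_entry;
	for (unsigned int i = toy_pgd_index(page_offset); i < PTRS_PER_PGD; i++)
		tramp[i] = kernel[i];
}
```

With TOY_PAGE_OFFSET, entries 273 through 511 are copied (covering the direct map, vmalloc area and kernel text), entry 0 holds the identity mapping, and entries 1 through 272 stay untouched.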

Cc: [email protected]
Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/realmode/init.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index b9802b18f504..77617cd624fe 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -95,6 +95,7 @@ static void __init setup_real_mode(void)
#ifdef CONFIG_X86_64
u64 *trampoline_pgd;
u64 efer;
+ int i;
#endif

base = (unsigned char *)real_mode_header;
@@ -151,8 +152,17 @@ static void __init setup_real_mode(void)
trampoline_header->flags = 0;

trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
+
+ /* Map the real mode stub as virtual == physical */
trampoline_pgd[0] = trampoline_pgd_entry.pgd;
- trampoline_pgd[511] = init_top_pgt[511].pgd;
+
+ /*
+ * Include the entirety of the kernel mapping into the trampoline
+ * PGD. This way, all mappings present in the normal kernel page
+ * tables are usable while running on trampoline_pgd.
+ */
+ for (i = pgd_index(__PAGE_OFFSET); i < PTRS_PER_PGD; i++)
+ trampoline_pgd[i] = init_top_pgt[i].pgd;
#endif

sme_sev_setup_real_mode(trampoline_header);
--
2.33.0

2021-10-01 18:38:47

by Joerg Roedel

Subject: [PATCH v3 3/4] x86/mm: Flush global TLB when switching to trampoline page-table

From: Joerg Roedel <[email protected]>

Move the switching code into a function so that it can be re-used and
add a global TLB flush. This makes sure that usage of memory which is
not mapped in the trampoline page-table is reliably caught.

Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/include/asm/realmode.h | 1 +
arch/x86/kernel/reboot.c | 12 ++----------
arch/x86/realmode/init.c | 23 +++++++++++++++++++++++
3 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index 5db5d083c873..331474b150f1 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -89,6 +89,7 @@ static inline void set_real_mode_mem(phys_addr_t mem)
}

void reserve_real_mode(void);
+void load_trampoline_pgtable(void);

#endif /* __ASSEMBLY__ */

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 0a40df66a40d..fa700b46588e 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -113,17 +113,9 @@ void __noreturn machine_real_restart(unsigned int type)
spin_unlock(&rtc_lock);

/*
- * Switch back to the initial page table.
+ * Switch to the trampoline page table.
*/
-#ifdef CONFIG_X86_32
- load_cr3(initial_page_table);
-#else
- write_cr3(real_mode_header->trampoline_pgd);
-
- /* Exiting long mode will fail if CR4.PCIDE is set. */
- if (boot_cpu_has(X86_FEATURE_PCID))
- cr4_clear_bits(X86_CR4_PCIDE);
-#endif
+ load_trampoline_pgtable();

/* Jump to the identity-mapped low memory code */
#ifdef CONFIG_X86_32
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 31b5856010cb..b9802b18f504 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -17,6 +17,29 @@ u32 *trampoline_cr4_features;
/* Hold the pgd entry used on booting additional CPUs */
pgd_t trampoline_pgd_entry;

+void load_trampoline_pgtable(void)
+{
+#ifdef CONFIG_X86_32
+ load_cr3(initial_page_table);
+#else
+ /* Exiting long mode will fail if CR4.PCIDE is set. */
+ if (boot_cpu_has(X86_FEATURE_PCID))
+ cr4_clear_bits(X86_CR4_PCIDE);
+
+ write_cr3(real_mode_header->trampoline_pgd);
+#endif
+
+ /*
+ * The CR3 write above will not flush global TLB entries.
+ * Stale, global entries from previous sets of page tables may
+ * still be present. Flush those stale entries.
+ *
+ * This ensures that memory accessed while running with
+ * trampoline_pgd is *actually* mapped into trampoline_pgd.
+ */
+ __flush_tlb_all();
+}
+
void __init reserve_real_mode(void)
{
phys_addr_t mem;
--
2.33.0

2021-10-27 21:23:32

by Borislav Petkov

Subject: Re: [PATCH v3 3/4] x86/mm: Flush global TLB when switching to trampoline page-table

On Fri, Oct 01, 2021 at 05:48:16PM +0200, Joerg Roedel wrote:
> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
> index 31b5856010cb..b9802b18f504 100644
> --- a/arch/x86/realmode/init.c
> +++ b/arch/x86/realmode/init.c
> @@ -17,6 +17,29 @@ u32 *trampoline_cr4_features;
> /* Hold the pgd entry used on booting additional CPUs */
> pgd_t trampoline_pgd_entry;
>
> +void load_trampoline_pgtable(void)
> +{
> +#ifdef CONFIG_X86_32
> + load_cr3(initial_page_table);
> +#else
> + /* Exiting long mode will fail if CR4.PCIDE is set. */

So this comment is not valid anymore if this is a separate function - it
is valid only when that function is called in reboot.c so I guess you
should leave that comment there.

> + if (boot_cpu_has(X86_FEATURE_PCID))
> + cr4_clear_bits(X86_CR4_PCIDE);
> +
> + write_cr3(real_mode_header->trampoline_pgd);

Is there any significance to the reordering of those calls here? The
commit message doesn't say...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-12-02 12:59:00

by Jörg Rödel

Subject: Re: [PATCH v3 3/4] x86/mm: Flush global TLB when switching to trampoline page-table

On Wed, Oct 27, 2021 at 11:58:45AM +0200, Borislav Petkov wrote:
> On Fri, Oct 01, 2021 at 05:48:16PM +0200, Joerg Roedel wrote:
> > +void load_trampoline_pgtable(void)
> > +{
> > +#ifdef CONFIG_X86_32
> > + load_cr3(initial_page_table);
> > +#else
> > + /* Exiting long mode will fail if CR4.PCIDE is set. */
>
> So this comment is not valid anymore if this is a separate function - it
> is valid only when that function is called in reboot.c so I guess you
> should leave that comment there.

Okay, but in the caller it is not visible the CR4.PCID is disabled in
this function. I'd rather update the comment to tell that the function
is called before transitioning to real mode?

>
> > + if (boot_cpu_has(X86_FEATURE_PCID))
> > + cr4_clear_bits(X86_CR4_PCIDE);
> > +
> > + write_cr3(real_mode_header->trampoline_pgd);
>
> Is there any significance to the reordering of those calls here? The
> commit message doesn't say...

Yes, the call to cr4_clear_bits() is no longer safe on the trampoline
page-table, because the per-cpu areas are not fully mapped there.

This changes with the next patch, but it is nevertheless more robust to
minimize the code running on the trampoline page-table.
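The ordering argument can be modelled the same way. In this hypothetical sketch, toy_cr4_clear_pcide() stands in for cr4_clear_bits(), which updates a per-CPU shadow copy of CR4 and therefore needs the per-CPU area to be mapped; toy_switch_to_trampoline() stands in for the CR3 write plus global flush. The percpu_in_tramp flag is an assumption used to represent the state after patch 4/4, where the trampoline_pgd maps all kernel memory.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_cpu {
	bool on_trampoline;    /* CR3 points at trampoline_pgd */
	bool percpu_in_tramp;  /* does trampoline_pgd map the per-CPU area? */
	bool pcide;            /* CR4.PCIDE */
	bool faulted;          /* an unmapped access happened */
};

static bool percpu_reachable(const struct toy_cpu *c)
{
	return !c->on_trampoline || c->percpu_in_tramp;
}

/* Clearing PCIDE goes through per-CPU CR4 shadow data. */
static void toy_cr4_clear_pcide(struct toy_cpu *c)
{
	if (!percpu_reachable(c)) {
		c->faulted = true;
		return;
	}
	c->pcide = false;
}

/* CR3 write plus global flush: no stale entries remain afterwards. */
static void toy_switch_to_trampoline(struct toy_cpu *c)
{
	assert(!c->on_trampoline);  /* the switch happens once */
	c->on_trampoline = true;
}

/* Order used by the patch: clear PCIDE first, then switch. */
static void toy_load_clear_first(struct toy_cpu *c)
{
	toy_cr4_clear_pcide(c);
	toy_switch_to_trampoline(c);
}

/* Original order: switch first, then clear PCIDE on the new tables. */
static void toy_load_switch_first(struct toy_cpu *c)
{
	toy_switch_to_trampoline(c);
	toy_cr4_clear_pcide(c);
}
```

Clearing PCIDE before the switch never touches the trampoline page-table, so it works regardless of what that table maps; the original order only works in the model once the per-CPU area is also mapped by the trampoline_pgd.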

I will add that to the commit message.

Regards,

--
Jörg Rödel
[email protected]

SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany

(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


2021-12-02 18:26:58

by Borislav Petkov

Subject: Re: [PATCH v3 3/4] x86/mm: Flush global TLB when switching to trampoline page-table

On Thu, Dec 02, 2021 at 01:58:51PM +0100, Joerg Roedel wrote:
> Okay, but in the caller it is not visible the CR4.PCID is disabled in
> this function. I'd rather update the comment to tell that the function
> is called before transitioning to real mode?

Well, if something calls load_trampoline_pgtable(), it kinda assumes
that the function will do all the steps necessary to load it,
including clearing PCIDE.

Why does the caller even need to know that that function clears
CR4.PCIDE?
